Using OCR with my scanner - chunks of text missing
Thread poster: Wendy Cummings
Wendy Cummings | United Kingdom | Local time: 11:56 | Spanish to English + ...
I have an HP Scanjet 4850 and it came with OCR software. Great, I thought, a solution to all my PDF problems.
However, when scanning a good-quality document, with all the text of even size, the same font and same quality, the OCR just "misses" out whole paragraphs.
I cannot see any reason why it would do so, since it picks up other paragraphs perfectly - without any errors whatsoever - even though these paragraphs are, to my eyes, identical in quality to the ones it missed.
It's not even a case of garbled text - the paragraphs simply aren't there.
Is there a reason why the software would do this? And, more importantly, can it be fixed?

Uldis Liepkalns | Latvia | Local time: 13:56 | Member (2003) | English to Latvian + ...
Can't tell without knowing what this OCR software is | Mar 8, 2009
However, some OCR software came with my scanner too; I think it was I.R.I.S.
Compared with FineReader, which I already had, it was no use at all. It recognises some text, but nothing like FineReader does.
Uldis
Wendy Leech wrote:
Is there a reason why the software would do this? And, more importantly, can it be fixed?

Bogdan Burghelea
Possible explanation | Mar 8, 2009
Wendy Leech wrote:
However, when scanning a good quality document, with all the text of even size, the same font and same quality, the OCR just "misses" out whole paragraphs.
I cannot see any reason why it would do so, since it picks up other paragraphs perfectly - without any errors whatsoever - even though these paragraphs are, to my eyes, identical in quality to the ones it missed.
Is there a reason why the software would do this? And, more importantly, can it be fixed?
It would help if you provided the name of the OCR software. They might all quack like ducks, but not all of them are ducks.
Secondly, it might be because the software decides all by itself which part of the scanned text is going to be recognised. Therefore, you should check which part of the scanned image is "active", i.e. due to be read.

Wendy Cummings | United Kingdom | Local time: 11:56 | Spanish to English + ... | Topic starter
It's the freeware OCR that came as part of the scanner software, but being HP I would have hoped that it would be OK.
As I said, I could understand if it was producing garbled text, but not that it misses out whole paragraphs completely.

Wendy Cummings | United Kingdom | Local time: 11:56 | Spanish to English + ... | Topic starter
active sections | Mar 8, 2009
Bogdan Burghelea wrote:
Secondly, it might be due to the fact that the software decides all by itself what part of the scanned text is going to be recognised. Therefore, you should check what part of the scanned image is "active", i.e. due to be read.
How do I do this?

Uldis Liepkalns | Latvia | Local time: 13:56 | Member (2003) | English to Latvian + ...
Manual recognition | Mar 8, 2009
Even a freeware OCR program should have an option to draw (mark) recognition areas manually, and it should then read those areas. OTOH, in my experience these "complimentary" programs are not of much practical use.
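To picture what "recognition areas" do, here is a minimal Python sketch. It is not the HP software's actual logic: it simply assumes that the OCR program only reads text that falls inside a zone, and all names and coordinates (`Box`, `paragraphs_outside_zones`, the page layout) are made up for illustration. A paragraph that no zone covers is skipped entirely rather than garbled, which would match the symptom described in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page pixels, from (left, top) to (right, bottom)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other: "Box") -> bool:
        # True only if `other` lies completely inside this rectangle.
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


def paragraphs_outside_zones(paragraphs, zones):
    """Return the names of paragraphs that no single zone fully covers."""
    return [name for name, box in paragraphs.items()
            if not any(zone.contains(box) for zone in zones)]


# Hypothetical page: three paragraphs stacked down the page.
paragraphs = {
    "para 1": Box(100, 100, 1500, 400),
    "para 2": Box(100, 450, 1500, 800),
    "para 3": Box(100, 850, 1500, 1200),
}

# Assumed automatic zoning that happened to skip the middle paragraph.
auto_zones = [Box(90, 90, 1510, 410), Box(90, 840, 1510, 1210)]
# One manually drawn zone around the whole text area.
manual_zones = [Box(90, 90, 1510, 1210)]

missed_auto = paragraphs_outside_zones(paragraphs, auto_zones)
missed_manual = paragraphs_outside_zones(paragraphs, manual_zones)
print(missed_auto)    # ['para 2']
print(missed_manual)  # []
```

In a real program the zones are usually shown as frames over the page preview, so a paragraph with no frame around it is the first thing to look for.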
Though I have not tried it myself, I've heard that Office XP to 2007 already contains a built-in OCR feature (as well as speech recognition, which is not widely known, but I can certify that the latter does indeed work).
You might want to Google "OCR in Office".
Uldis
Wendy Leech wrote:
It's the freeware OCR that came as part of the scanner software, but being HP I would have hoped that it would be OK.
As I said, I could understand if it was producing garbled text, but not that it misses out whole paragraphs completely.

Russell Jones
HP Scanjet uses Omnia OCR | Mar 8, 2009
HP Scanjet uses Omnia OCR

Uldis Liepkalns | Latvia | Local time: 13:56 | Member (2003) | English to Latvian + ...
My scanner is also an HP, but the OCR certainly was not Omnia.
Uldis
Russell Jones wrote:
HP Scanjet uses Omnia OCR

Hi Wendy
If you have Vista + Word, you can try this:
Using the preview function, carefully select the text to be scanned, then scan it. When a file with your image appears in the folder, open it with a double click. At the top of the page (on the right) you can see the "save" function: click on it and save in *.TIFF format. Then open that file, again with a double click. At the top (on the right) you can see the "open" function; use it to open the file in Microsoft Office Document Imaging. On the toolbar (at the top), click on the 8th button and ... afterwards on the 9th button. It works for me. Sorry for my English.
Franela

Brandis (X) | Local time: 12:56 | English to German + ...
Hi! That, I must admit, is a great piece of software. With the capture resolution set at 200 dpi you get almost all the content in one step; for the rest, you may have to do some copy-editing here and there. The best combination for a translator, I find, is Acrobat 9 (with plug-ins) and ABBYY 9.0. BR Brandis

Wendy Cummings | United Kingdom | Local time: 11:56 | Spanish to English + ... | Topic starter
franela wrote:
Sorry for my English.
It is a little hard to follow your instructions. I see French is one of your languages - write in French if it is easier.

Much simpler | Mar 9, 2009
So, to scan, you need to carefully select the part of the text that will then be processed by OCR, using the "preview" function at the bottom of the "new scan" window. For a newspaper article, that might be one column. Once scanned, your image will be in the "Scanned Documents" folder. Double-click to open your file. Once your text is on the screen, you will see several functions at the top of the page. At the far right is the "save as" function: click on it. A new window opens; choose the *.TIFF option. Save again in the same folder as before (you have only changed the format). Open this file. At the top of the page, at the far right, you have the "open" function; click on it. The opening options will drop down; choose Microsoft Office Document Imaging. In the toolbar, click on the 8th button (recognise text using OCR). Once the text has been recognised, click on the 9th button (to save your document in Word).

Jing Nie | China | Local time: 19:56 | Member (2011) | English to Chinese + ...
I often meet the same problem. | Mar 9, 2009
I have found that it is due to background color or background images.
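A toy illustration of why a tinted background can make text vanish, as a hedged sketch in Python. It assumes the OCR step first turns the scan into pure black and white using a fixed threshold of 128; real engines are usually smarter than this, and it is not a description of what any particular "auto adjust" feature does. The function names and pixel values are hypothetical.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so the darkest becomes 0 and the lightest 255."""
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Completely flat image: nothing to stretch.
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in pixels]


def binarize(pixels, threshold=128):
    """Mark each pixel as 1 (ink) if darker than the threshold, else 0 (paper)."""
    return [[1 if value < threshold else 0 for value in row] for row in pixels]


# Hypothetical scan: grey text (90) on a dark tinted background (120).
tinted = [
    [120, 120, 120, 120],
    [120,  90,  90, 120],
    [120, 120, 120, 120],
]

before = binarize(tinted)                   # everything counts as ink
after = binarize(stretch_contrast(tinted))  # only the text counts as ink

print(before)  # [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
print(after)   # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```

Before stretching, text and background both fall on the same side of the threshold, so the whole block looks like a solid patch with no letters in it. After stretching, the text separates cleanly from the background.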
To improve the OCR quality, you can adjust the color of the scanned images in "Microsoft Office Picture Manager" before the OCR step; it has an "auto adjust" function, which will improve the contrast of your images. Some other freeware, such as GIMP, can do something similar.

Piotr Bienkowski | Poland | Local time: 12:56 | Member (2005) | English to Polish + ...
Recognize PDF as image? | Mar 9, 2009
If the OCR software is based on FineReader in any way, you might be able to try these two options:
* extract text from PDF, if available
* recognize PDF as image
and see which one yields better results.
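One way to compare the two results, sketched in Python with the standard `difflib` module. The two sample strings stand in for the text you would get from each option; `compare_outputs` is a hypothetical helper, not part of FineReader or any other OCR product.

```python
import difflib


def compare_outputs(text_a, text_b):
    """Return (lines only in A, lines only in B) for two text results."""
    # Ignore blank lines and surrounding whitespace so layout noise is not reported.
    lines_a = [line.strip() for line in text_a.splitlines() if line.strip()]
    lines_b = [line.strip() for line in text_b.splitlines() if line.strip()]
    only_a, only_b = [], []
    for entry in difflib.ndiff(lines_a, lines_b):
        if entry.startswith("- "):
            only_a.append(entry[2:])
        elif entry.startswith("+ "):
            only_b.append(entry[2:])
    return only_a, only_b


# Stand-in samples: the second result has lost the middle paragraph.
extracted = "First paragraph.\nSecond paragraph.\nThird paragraph."
recognized = "First paragraph.\nThird paragraph."

only_extracted, only_recognized = compare_outputs(extracted, recognized)
print(only_extracted)   # ['Second paragraph.']
print(only_recognized)  # []
```

Lines that appear in only one list are the ones worth checking against the original page.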
HTH
Piotr

You may wish to take a look at the link below. If not using PDF, save your scanned copies as "TIFF", and do as the video says.
http://www.proz.com/videos/ocr