Character Recognition Program that's Word-Compatible
Thread poster: BrianHayden
BrianHayden United States Russian to English
Is there any way I could scan the pages of a dictionary and then convert them into a (massive) Word file? If so, what would be the cheapest and simplest way?

Vadim Kadyrov Ukraine Local time: 01:33 Member (2011) English to Russian + ...
The best application (I believe) is ABBYY FineReader (you can use version 8, which should be much cheaper than the newest one). You just scan pages into JPEG files and then use this application to OCR the images.
Still, this is an extremely time-consuming task. Even the best OCR applications won't be able to perfectly reproduce the layout of dictionary pages.

BrianHayden United States Russian to English TOPIC STARTER More detail... | Jan 2, 2014 |
I should probably explain my plan better, feasible or not. I like Microsoft Word, and I think it's fairly straightforward to use. I've been keeping a dictionary of idioms as a Word file, adding new entries as I encounter new idioms. Keeping a dictionary of idioms and phrases in a Word file is especially convenient, since you can do a Ctrl + F search for a word within the phrase, which is easier than looking up each word of an idiom separately in a standard dictionary that still may not list the idiom. I've recently found an especially good dictionary with a lot of idioms, and I want to scan it in and add it to the Word file somehow. Hand-typing the entries from the dictionary would be murderous. Anything less laborious than hand-typing is okay in my book.
And I forgot to mention that I need a program that can read Cyrillic. Since this is a dictionary, it also needs to read Cyrillic with accent marks. Does ABBYY FineReader do that? And is it user-friendly?
[Edited at 2014-01-02 08:38 GMT]
[Edited at 2014-01-02 08:39 GMT]

Rolf Keller Germany Local time: 00:33 English to German OCR needs know-how | Jan 2, 2014 |
Vadim Kadyrov wrote:
You just scan pages into jpeg files and then use this application to OCR the images.
This is possible, but must be done cautiously. JPEG files can be (and, with default settings, are) lossily compressed, so the OCR results will not be optimal. BTW, any OCR application should be able to take scanner input directly, so there is no need to scan to files beforehand.
Even the best OCR applications won't be able to perfectly reproduce the layout of dictionary pages.
??? For the purpose described, you probably don't want to reproduce the original layout but rather a clean table (one table row per dictionary item).
In the worst case you have to mark up the columns manually (in the OCR software) and ignore everything else. Such markup takes about 30 seconds per page, so 240 pages take 2 hours. In many cases the OCR software will do that automatically, though.
Depending on the dictionary, you might have to write a Word macro that tidies up the resulting Word table. This might take one hour or one day.
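
As a rough illustration of the tidy-up step, here is a minimal Python sketch (not a Word macro). It assumes, hypothetically, that the OCR output was saved as plain text, that each entry starts at the left margin in the form `headword - translation`, and that indented lines continue the previous entry. Real dictionaries will differ, so the separator pattern and the names `entries_to_rows` and `rows_to_tsv` are illustrative only.

```python
import re

# Assumed entry separator: a hyphen, en dash or em dash with spaces around it.
SEPARATOR = re.compile(r"\s+[-\u2013\u2014]\s+")


def entries_to_rows(ocr_text):
    """Turn OCR'd dictionary text into (headword, definition) pairs.

    Assumption: a new entry starts at the left margin; an indented line
    continues the previous entry. Lines without a separator are kept with
    an empty definition so they can be fixed by hand later.
    """
    rows = []
    for line in ocr_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and rows:
            # Continuation line: append it to the previous definition.
            head, definition = rows[-1]
            rows[-1] = (head, (definition + " " + line.strip()).strip())
            continue
        parts = SEPARATOR.split(line.strip(), maxsplit=1)
        head = parts[0]
        definition = parts[1] if len(parts) > 1 else ""
        rows.append((head, definition))
    return rows


def rows_to_tsv(rows):
    """Join the rows as tab-separated lines, one entry per line."""
    return "\n".join(f"{head}\t{definition}" for head, definition in rows)
```

Tab-separated text of this kind can usually be pasted into Word and turned into a two-column table with Word's "Convert Text to Table" command, which keeps one table row per dictionary item.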
Vadim Kadyrov Ukraine Local time: 01:33 Member (2011) English to Russian + ... The thing I suggested | Jan 2, 2014 |
Rolf Keller wrote:
Vadim Kadyrov wrote:
You just scan pages into jpeg files and then use this application to OCR the images.
This is possible, but must be done cautiously. JPEG files can be (and, with default settings, are) lossily compressed, so the OCR results will not be optimal. BTW, any OCR application should be able to take scanner input directly, so there is no need to scan to files beforehand.
Even the best OCR applications won't be able to perfectly reproduce the layout of dictionary pages.
??? For the purpose described, you probably don't want to reproduce the original layout but rather a clean table (one table row per dictionary item).
In the worst case you have to mark up the columns manually (in the OCR software) and ignore everything else. Such markup takes about 30 seconds per page, so 240 pages take 2 hours. In many cases the OCR software will do that automatically, though.
Depending on the dictionary, you might have to write a Word macro that tidies up the resulting Word table. This might take one hour or one day.
What I suggested is a general scenario, with the details to be worked out later. When I read the topic starter's message, I assumed he wanted to reproduce the hard copy of the dictionary in electronic form (say, some old and really precious edition).
If he only wants some entries from the dictionary digitized, the task becomes much easier, of course.
A few words about JPEG images: if the resolution is high, the quality issues of this file type no longer matter, I believe.
But these are details. I think the topic starter has already seen the "path".

esperantisto Local time: 02:33 Member (2006) English to Russian + ... SITE LOCALIZER
BrianHayden wrote:
Keeping a dictionary of idioms and phrases in a Word file is especially convenient, since you can do a Ctrl + F search
If you use one dictionary, this may be fine. However, a translator normally needs more than one dictionary. In such a case, using a dictionary shell is a better solution. My favorite is GoldenDict.
program that can read Cyrillic with accent marks. Does ABBYY FineReader do that?
No, FineReader can't produce good output for accented Cyrillic letters. Versions 8 and 9 simply produce unaccented letters; the later versions 10 and 11 produce recognition errors.
[Edited at 2014-01-02 11:38 GMT]

BrianHayden United States Russian to English TOPIC STARTER Dictionary Shell? | Jan 2, 2014 |
esperantisto wrote:
BrianHayden wrote:
Keeping a dictionary of idioms and phrases in a Word file is especially convenient, since you can do a Ctrl + F search
If you use one dictionary, this may be fine. However, a translator normally needs more than one dictionary. In such a case, using a dictionary shell is a better solution. My favorite is GoldenDict.
program that can read Cyrillic with accent marks. Does ABBYY FineReader do that?
No, FineReader can't produce good output for accented Cyrillic letters. Versions 8 and 9 simply produce unaccented letters; the later versions 10 and 11 produce recognition errors.
What is a dictionary shell?

BrianHayden United States Russian to English TOPIC STARTER Accent marks... | Jan 2, 2014 |
esperantisto wrote:
No, FineReader can't produce good output for accented Cyrillic letters. Versions 8 and 9 simply produce unaccented letters; the later versions 10 and 11 produce recognition errors.
Is there any way around that? It seems that a product that complicated would have some way of dealing with it, especially since in Russian accent marks are occasionally used to disambiguate words in everyday, non-dictionary texts (think of за́мок, замо́к).
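
For context on why stress marks are awkward for both OCR output and Ctrl + F style searching: in Unicode text a stressed Russian vowel is normally stored as the plain letter followed by U+0301 COMBINING ACUTE ACCENT, so за́мок and замок are different strings. Below is a minimal Python sketch, assuming the dictionary text is available as a Unicode string, that strips the stress mark to build a search key while the stressed form is kept for display. The function names are illustrative, not part of any OCR product.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # the stress mark used in dictionaries


def strip_stress(text):
    """Remove stress marks only; letters such as й and ё are left intact."""
    return text.replace(COMBINING_ACUTE, "")


def find_entries(entries, query):
    """Return the entries whose unstressed form contains the unstressed query."""
    key = strip_stress(query).lower()
    return [entry for entry in entries if key in strip_stress(entry).lower()]


# Both stressed spellings are found by one unstressed query.
entries = [
    "за" + COMBINING_ACUTE + "мок (castle)",
    "замо" + COMBINING_ACUTE + "к (lock)",
    "ключ (key)",
]
matches = find_entries(entries, "замок")

# NFC normalization leaves the vowel and the accent as two code points,
# because Unicode has no precomposed stressed Russian vowels.
normalized = unicodedata.normalize("NFC", entries[0])
```

Removing only U+0301, rather than every combining mark, is a deliberate choice here: it keeps й and ё unchanged even if the text happens to store them in decomposed form.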
esperantisto Local time: 02:33 Member (2006) English to Russian + ... SITE LOCALIZER
BrianHayden wrote:
What is a dictionary shell?
Well, a dictionary program. A program used to access dictionaries.
BrianHayden wrote:
Is there any way around that? It seems that a product that complicated would have some sort of way of dealing with that, especially since in Russian accent marks are occasionally used to disambiguate words in everyday, non-dictionary texts (think of за́мок, замо́к).
No idea. FineReader can be trained to recognize specific languages with specific characters, but I don’t know if it’s applicable to Russian accents as there are no pre-composed accented Cyrillic letters in Unicode.

Emma Goldsmith Spain Local time: 00:33 Member (2004) Spanish to English Russian is in the drop-down list of languages in ABBYY | Jan 3, 2014 |
esperantisto wrote:
No idea. FineReader can be trained to recognize specific languages with specific characters, but I don’t know if it’s applicable to Russian accents as there are no pre-composed accented Cyrillic letters in Unicode.
I've got no idea either, but Russian is definitely included in the list of languages that ABBYY will recognise (version 11.0).
You can also add a host of symbols/letters as a "user language". For example, I've added µ, α and β because ABBYY doesn't recognise them out of the box.