Character Recognition Program that's Word-Compatible (Software applications)

Thread poster: BrianHayden

BrianHayden
United States
Russian to English

Jan 2, 2014

Is there any way I could scan the pages of a dictionary and then convert them into a (massive) Word file? If so, what would be the cheapest and simplest way?

Vadim Kadyrov

Ukraine
Member (2011)
English to Russian

Yes, you can

Jan 2, 2014

The best application (I believe) is ABBYY FineReader (you can use version 8, which should be much cheaper than the newest one). You just scan the pages into JPEG files and then use this application to OCR the images.

Still, this is an extremely time-consuming task. Even the best OCR applications won't be able to perfectly reproduce the layout of dictionary pages.

BrianHayden
United States
Russian to English

Topic starter

More detail...

Jan 2, 2014

I should probably explain my plan better, feasible or not. I like Microsoft Word, and I think it's fairly straightforward to use. I've been keeping a dictionary of idioms as a Word file, adding new entries as I encounter new idioms. Keeping a dictionary of idioms and phrases in a Word file is especially convenient, since you can do a Ctrl + F search for a word within the phrase, which is easier than looking up each word of an idiom separately in a standard dictionary that still may not list the idiom. I've recently found an especially good dictionary with a lot of idioms, and I want to scan it in and add it to the Word file somehow. Hand-typing the entries from the dictionary would be murderous. Anything less laborious than hand-typing is okay in my book.

I forgot to mention that I need a program that can read Cyrillic and, since this is a dictionary, Cyrillic with accent marks. Does ABBYY FineReader do that? And is it user-friendly?


Rolf Keller
Germany
English to German

OCR needs know-how

Jan 2, 2014

Vadim Kadyrov wrote:

You just scan pages into jpeg files and then use this application to OCR the images.

This is possible, but it must be done with care. JPEG files can be (and, with default settings, are) compressed lossily, so the OCR results will not be optimal. BTW, any OCR application should be able to take scanner input directly, so there is no need to scan beforehand.

Even the best OCR applications won't be able to perfectly reproduce the layout of dictionary pages.

??? For the purpose mentioned, you probably don't want to reproduce the original layout but rather to produce a clean table (one table row per dictionary item).

In the worst case you have to mark up the columns manually (in the OCR software) and ignore everything else. Such markup takes about 30 seconds per page, so 240 pages take 2 hours. In many cases the OCR software will do that automatically, though.
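The estimate above is simple arithmetic; a minimal Python sketch, assuming the 30 seconds per page quoted in the post (the function name `markup_hours` is made up for illustration):

```python
def markup_hours(pages: int, seconds_per_page: float = 30.0) -> float:
    """Return the total manual column-markup time in hours."""
    return pages * seconds_per_page / 3600


# 240 pages * 30 s = 7200 s, i.e. 2 hours of markup work
print(markup_hours(240))
```

Changing `seconds_per_page` shows how quickly the total grows if the pages need more careful markup.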

Depending on the dictionary you might have to write a Word macro that tidies up the resulting Word table. This might take one hour or one day.
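The post suggests a Word macro for this clean-up; as an alternative illustration, here is a rough Python sketch of the same idea applied to plain-text OCR output. It assumes a hypothetical entry format in which each entry starts on its own line as "headword - definition" and wrapped lines continue the previous definition. A real dictionary will almost certainly need different rules, and all names here are made up.

```python
import csv
import io


def rows_from_ocr_text(text: str, separator: str = " - ") -> list[tuple[str, str]]:
    """Turn raw OCR lines into (headword, definition) rows.

    A line containing the separator starts a new entry; any other non-empty
    line is treated as a wrapped continuation of the previous definition.
    """
    rows: list[list[str]] = []
    for raw_line in text.splitlines():
        line = " ".join(raw_line.split())  # collapse stray OCR whitespace
        if not line:
            continue
        if separator in line:
            headword, definition = line.split(separator, 1)
            rows.append([headword.strip(), definition.strip()])
        elif rows:
            rows[-1][1] += " " + line
    return [(headword, definition) for headword, definition in rows]


def to_tsv(rows: list[tuple[str, str]]) -> str:
    """Serialize rows as tab-separated text, one dictionary item per line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical OCR output: the second line is a wrapped continuation.
sample = (
    "бить баклуши - to twiddle one's thumbs,\n"
    "   to idle\n"
    "вешать нос - to lose heart\n"
)
rows = rows_from_ocr_text(sample)
print(to_tsv(rows))
```

Tab-separated text like this can be pasted into Word and turned into a table with Convert Text to Table. A continuation line that happens to contain the separator would be split incorrectly, which is the kind of case that makes the tidy-up take "one hour or one day".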

Vadim Kadyrov

Ukraine
Member (2011)
English to Russian

The thing I suggested

Jan 2, 2014

Rolf Keller wrote:

Vadim Kadyrov wrote:

You just scan pages into jpeg files and then use this application to OCR the images.

Even the best OCR applications won't be able to perfectly reproduce the layout of dictionary pages.

The thing I suggested is a general scenario, with all the details to be discussed (or suggested) later on. The thing I assumed when I saw the message of the topic starter was his wish to reproduce the hard copy of the dictionary in electronic form (ok, some old and really precious edition of this dictionary).

If he wants only some entries from this dictionary to be digitized, the task becomes much easier, of course.

A few words about JPEG images: if the resolution is high, the quality issues of this file type no longer matter, I believe.

But these are details. I think the topic starter has already seen the "path".

esperantisto

Member (2006)
English to Russian

SITE LOCALIZER

No

Jan 2, 2014

BrianHayden wrote:

Keeping a dictionary of idioms and phrases in a Word file is especially convenient, since you can do a Ctrl + F search

If you use one dictionary, this may be fine. However, a translator normally needs more than one dictionary. In such a case, using a dictionary shell is a better solution. My favorite is GoldenDict.

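BrianHayden wrote:

program that can read Cyrillic with accent marks. Does ABBYY FineReader do that?

No, FineReader can't produce good output for accented Cyrillic letters. Versions 8 and 9 simply produce unaccented letters; the later versions 10 and 11 produce recognition errors.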

BrianHayden
United States
Russian to English

Topic starter

Dictionary Shell?

Jan 2, 2014

esperantisto wrote:

BrianHayden wrote:

Keeping a dictionary of idioms and phrases in a Word file is especially convenient, since you can do a Ctrl + F search

If you use one dictionary, this may be fine. However, a translator normally needs more than one dictionary. In such a case, using a dictionary shell is a better solution. My favorite is GoldenDict.

program that can read Cyrillic with accent marks. Does ABBYY FineReader do that?

What is a dictionary shell?

BrianHayden
United States
Russian to English

Topic starter

Accent marks...

Jan 2, 2014

esperantisto wrote:

No, FineReader can't produce good output for accented Cyrillic letters. Versions 8 and 9 simply produce unaccented letters; the later versions 10 and 11 produce recognition errors.

Is there any way around that? It seems that a product that complicated would have some sort of way of dealing with that, especially since in Russian accent marks are occasionally used to disambiguate words in everyday, non-dictionary texts (think of за́мок, замо́к).

esperantisto

Member (2006)
English to Russian

SITE LOCALIZER

Answers

Jan 3, 2014

BrianHayden wrote:

What is a dictionary shell?

Well, a dictionary program. A program used to access dictionaries.
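As a toy illustration of the idea only (this is not how GoldenDict or any real shell is implemented), here is a Python sketch that searches several dictionaries at once for entries containing a word. The function name and the sample data are hypothetical.

```python
def search_dictionaries(
    dictionaries: dict[str, dict[str, str]], word: str
) -> list[tuple[str, str, str]]:
    """Return (dictionary name, entry, definition) for every entry containing word."""
    needle = word.casefold()
    hits = []
    for name, entries in dictionaries.items():
        for entry, definition in entries.items():
            if needle in entry.casefold():
                hits.append((name, entry, definition))
    return hits


# Hypothetical sample data standing in for dictionaries loaded from files.
dictionaries = {
    "idioms": {"бить баклуши": "to idle", "вешать нос": "to lose heart"},
    "general": {"нос": "nose"},
}

for name, entry, definition in search_dictionaries(dictionaries, "нос"):
    print(f"[{name}] {entry}: {definition}")
```

This is the same "search for a word within the phrase" that Ctrl + F gives in a single Word file, but run across more than one dictionary in a single query.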

BrianHayden wrote:

Is there any way around that? It seems that a product that complicated would have some sort of way of dealing with that, especially since in Russian accent marks are occasionally used to disambiguate words in everyday, non-dictionary texts (think of за́мок, замо́к).

No idea. FineReader can be trained to recognize specific languages with specific characters, but I don’t know if it’s applicable to Russian accents as there are no pre-composed accented Cyrillic letters in Unicode.
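The Unicode point can be checked with Python's standard `unicodedata` module. This minimal sketch shows that a stressed Cyrillic vowel stays as two code points (letter plus combining acute accent, U+0301) even after NFC normalization, and how the stress mark could be stripped so that a plain-text search matches both spellings. The helper name `strip_stress` is made up for illustration.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # combining acute accent, written after the vowel


def strip_stress(text: str) -> str:
    """Remove only the combining acute accent, leaving й and ё intact."""
    return unicodedata.normalize("NFC", text).replace(COMBINING_ACUTE, "")


castle = "за\u0301мок"  # за́мок, "castle"
lock = "замо\u0301к"    # замо́к, "lock"

# NFC has no single code point for Cyrillic а + acute, so the pair stays
# as two code points; Latin a + acute composes to the one character á.
cyrillic_length = len(unicodedata.normalize("NFC", "\u0430\u0301"))
latin_length = len(unicodedata.normalize("NFC", "a\u0301"))
print(cyrillic_length, latin_length)

# With the stress marks removed, both words become the same search key.
print(strip_stress(castle) == strip_stress(lock))
```

Only U+0301 is removed, rather than every combining mark, because a blanket strip on decomposed text would also turn й into и and ё into е.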

Emma Goldsmith

Spain
Member (2004)
Spanish to English

Russian is in the drop-down list of languages in ABBYY

Jan 3, 2014

esperantisto wrote:

No idea. FineReader can be trained to recognize specific languages with specific characters, but I don’t know if it’s applicable to Russian accents as there are no pre-composed accented Cyrillic letters in Unicode.

I've got no idea either, but Russian is definitely included in the list of languages that ABBYY will recognise (version 11.0).

You can also add a host of symbols/letters as a "user language". For example, I've added µ, α and β because ABBYY doesn't recognise them out of the box.
