Machine translation changes the order of a list
Thread poster: Oliver Pekelharing
Oliver Pekelharing  Identity Verified
Netherlands
Local time: 10:44
Dutch to English
May 15, 2014

I keep a machine translator (Microsoft) open on the side in my CAT tools, as it often comes up with useful suggestions, can be a helpful reminder in 'it's on the tip of my tongue' situations, and can save time reproducing lists of countries etc. What I've been wondering of late is why Microsoft MT tends to change the order of lists. So if the source says 'Belgium, Germany and France', Microsoft MT is apt to suggest 'Germany, Belgium and France'. It also fails to stick to a single spelling convention, e.g. using both -ise and -ize. Why? And do the competitors do this too?

Just wondering,
Thanks,
Olly


 
Patrick Porter
United States
Local time: 05:44
Spanish to English
+ ...
It's all math May 15, 2014

I think most of these web-based MT engines are based on statistical models. One advantage of this kind of engine is that it avoids the meticulous work of creating a rules-based model. All you have to do is feed it data. A disadvantage is that it is less consistent than a rules-based engine.

Statistical MT involves computing translation probabilities based on the "training" data. There are also "language models" trained on monolingual target texts to weight the probabilities of word order and the like in the target language. A "reordering model" is also generally involved.

When MS Translator outputs "Germany, Belgium and France", it has calculated that this order "scores" the highest based on the probabilities/weights computed from the input data when the engine was trained/tuned. It is inconsistent with "ize" vs. "ise" because the data used to train it was also inconsistent. It chooses one over the other in a given context based on the number of times the result was seen in a similar context in the training data. That's one reason why widely used general MT engines trained on mountains of data are less likely to be useful for specific purposes.
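
To make the scoring idea concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: the four "training" sentences are invented, the score is a bare add-one-smoothed bigram count product, and no real engine (Microsoft's included) is claimed to work this simply. It shows how an order that was seen more often in the target-language data can outscore the order in the source.

```python
from collections import Counter
from itertools import permutations

# Hypothetical monolingual "training" data, invented for this illustration.
TRAINING_SENTENCES = [
    "germany belgium and france",
    "germany belgium and france",
    "germany belgium and france",
    "belgium germany and france",
]


def bigram_counts(sentences):
    """Count adjacent word pairs, with sentence start/end markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def score(tokens, counts):
    """Unnormalised toy score: product of add-one-smoothed bigram counts."""
    padded = ["<s>"] + tokens + ["</s>"]
    total = 1.0
    for pair in zip(padded, padded[1:]):
        total *= counts[pair] + 1
    return total


def best_order(items, counts):
    """Try every order of the list items and keep the highest-scoring one."""
    candidates = []
    for perm in permutations(items):
        tokens = list(perm[:-1]) + ["and", perm[-1]]
        candidates.append((score(tokens, counts), tokens))
    return max(candidates)[1]


COUNTS = bigram_counts(TRAINING_SENTENCES)

# The source order is Belgium, Germany, France, but the toy model prefers
# the order it saw most often in its data.
print(" ".join(best_order(["belgium", "germany", "france"], COUNTS)))
# prints: germany belgium and france
```

The same mechanism would explain the spelling: if "ize" was more frequent after one word and "ise" after another in the data, the highest-scoring output will mix the two.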


 
Jeff Allen  Identity Verified
France
Local time: 10:44
Multiple languages
+ ...
good statement Patrick May 16, 2014

Patrick, thanks for your very well stated analysis of what was happening. It's all based on probability.

Jeff


 
Oliver Pekelharing  Identity Verified
Netherlands
Local time: 10:44
Dutch to English
TOPIC STARTER
Thanks Patrick May 16, 2014

Thanks for that explanation. Am I to assume it is too expensive or technologically challenging to insert a few rules, such as 'keep word order in lists', 'make spelling consistent'? Do you know if other MT providers do this? It would be an improvement, though it wouldn't tempt me to take up post-editing!

 
Patrick Porter
United States
Local time: 05:44
Spanish to English
+ ...
There are some options May 16, 2014

Olly Pekelharing wrote:

Am I to assume it is too expensive or technologically challenging to insert a few rules


I don't think this is possible with the big public MT providers, not directly. There are ways, however, to tweak the output a bit. I'm actually working on this kind of solution right now and plan to offer a limited version of this functionality in upcoming updates to my two MT plugins for SDL Studio, in case that happens to be the CAT tool you use.
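
As a rough idea of what "tweaking the output" could look like, here is a minimal Python sketch of two post-processing checks applied to MT output after the fact. It is not the approach used in any particular plugin: the spelling map, the glossary, and the function names are all hypothetical and would need to come from your own style guide and termbase.

```python
import re

# Hypothetical house-style map; a real one would come from a style guide.
PREFERRED_SPELLING = {
    "organize": "organise",
    "recognize": "recognise",
    "realize": "realise",
}


def normalise_spelling(text, preferred=PREFERRED_SPELLING):
    """Replace listed variants with the preferred spelling, word by word."""
    def swap(match):
        word = match.group(0)
        replacement = preferred.get(word.lower())
        if replacement is None:
            return word
        # Keep an initial capital if the original word had one.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


def list_order_preserved(source_items, glossary, target_text):
    """True if the glossary translations occur in target_text in source order."""
    positions = [target_text.find(glossary[item]) for item in source_items]
    return -1 not in positions and positions == sorted(positions)


# Hypothetical Dutch-to-English glossary for the example in this thread.
GLOSSARY = {"België": "Belgium", "Duitsland": "Germany", "Frankrijk": "France"}
SOURCE_LIST = ["België", "Duitsland", "Frankrijk"]

print(normalise_spelling("We recognize and Organize."))
print(list_order_preserved(SOURCE_LIST, GLOSSARY, "Germany, Belgium and France"))
```

An explicit word list is used instead of a blanket "-ize to -ise" rule, since a pattern-based rule would also hit words like "size" or "prize". The order check only flags a suspect segment; fixing it is still left to the translator.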

It's also possible to just build and train your own MT engine, which would not necessarily be that technically challenging and/or expensive. If you are interested in this kind of thing and are comfortable with Linux, check out the "Moses" open source project (http://www.statmt.org/moses/?n=Moses.Overview). Moses uses statistical models, but since you decide on the training data, you have more control.

I've had fairly good results using Moses to train an MT engine with texts translated by me for one company. The output is pretty good on similar texts for the same company. It is still only good as a reference, but the nice thing is that having the output in view while I'm translating helps keep my style and terminology consistent. The results of other experiments with different training texts have not been as good, but the possibility of leveraging my previous work this way, beyond just translation memory/concordance, is interesting.
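
For anyone curious about the data side: statistical toolkits such as Moses train on sentence-aligned parallel text, i.e. two plain-text files with one segment per line, where line N of one file is the translation of line N of the other. The Python sketch below shows one way to produce such a pair of files from segments already exported from a TM. The function names, file names, and sample segments are made up for illustration, and the actual TM export step and the Moses training commands are deliberately left out.

```python
from pathlib import Path


def clean_pairs(segments):
    """Collapse whitespace and drop pairs where either side is empty."""
    pairs = []
    for source, target in segments:
        source = " ".join(source.split())
        target = " ".join(target.split())
        if source and target:
            pairs.append((source, target))
    return pairs


def write_parallel_corpus(segments, stem, src_lang, tgt_lang):
    """Write <stem>.<src_lang> and <stem>.<tgt_lang>, one segment per line."""
    pairs = clean_pairs(segments)
    src_path = Path(f"{stem}.{src_lang}")
    tgt_path = Path(f"{stem}.{tgt_lang}")
    src_path.write_text("".join(s + "\n" for s, _ in pairs), encoding="utf-8")
    tgt_path.write_text("".join(t + "\n" for _, t in pairs), encoding="utf-8")
    return src_path, tgt_path, len(pairs)


# Hypothetical TM segments (source, target); the empty-source pair is dropped.
SAMPLE_SEGMENTS = [
    ("Hola  mundo", "Hello world"),
    ("", "orphan target"),
    ("Buenos días\n", "Good morning"),
]
```

Calling `write_parallel_corpus(SAMPLE_SEGMENTS, "corpus", "es", "en")` would write `corpus.es` and `corpus.en` with two aligned lines each. Collapsing internal line breaks matters because a stray newline inside a segment would shift every later line out of alignment.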

though it wouldn't tempt me to take up post-editing!


I agree. I don't even believe there is such a thing. It's all just translating to me. There should probably be a better term for translating while using MT merely as an aid. The phrase "post-editing" seems to be something LSPs have latched onto to justify lowering costs, but in my experience MT, at the current state of the art, offers no significant gain in productivity that would merit lowering rates for translation.

[Edited at 2014-05-16 14:23 GMT]


 
Giovanni Guarnieri MITI, MIL  Identity Verified
United Kingdom
Local time: 09:44
Member (2004)
English to Italian
So... May 16, 2014

since you work in the field, Patrick, you don't believe in all this hype about incredibly well-tuned MT engines that are going to replace us (technical translators) in the not-too-distant future? Is it just a rumor spread by the big LSPs to catch unwary clients looking for a cheaper alternative in their nets? Are these mysterious engines a myth or a near reality? Even GT is very good for some texts, but I wonder how good these specifically developed engines really are...

 
Patrick Porter
United States
Local time: 05:44
Spanish to English
+ ...
My opinion, for what it's worth May 17, 2014

Giovanni Guarnieri MITI, MIL wrote:

since you work in the field, Patrick, you don't believe in all this hype about incredibly well-tuned MT engines that are going to replace us (technical translators) in the not-too-distant future? Is it just a rumor spread by the big LSPs to catch unwary clients looking for a cheaper alternative in their nets? Are these mysterious engines a myth or a near reality? Even GT is very good for some texts, but I wonder how good these specifically developed engines really are...


I'm no expert in MT and not sure how qualified I am to comment on this. I'm just a translator interested in technology and have been experimenting a bit with using my TMs and termbases to train MT engines. I've also been reading some academic and commercial literature on the topic lately.

My experience is only anecdotal, but some agency clients of mine have occasionally offered me projects involving some kind of domain-specific or company-specific custom MT output. I've mostly declined these flat-out, but in a recent instance went so far as to ask to see the source and target MT output. From what I could see it would have taken me just as long to translate the documents with or without the MT, but the agency was looking to drastically cut my rate. Since the agency has been a particularly good client of mine, I offered to work on a small sample portion of the text in good faith for 80% of my normal rate and then re-evaluate. They declined. There was probably another person willing to take the job with no negotiation, which is fine with me, since my calendar is already quite full with traditional translation work.

But what if the MT output had been incredibly good? That would be fine too. I'm running a service business and my time cost is the largest portion of my overhead. If a client could give me a tool that would, for example, double my productivity, then they would be adding some kind of value. Even if it was my own tool, in trying to stay competitive, I might want to consider passing on some of my savings to clients. I've just not run across that kind of tool.

And I think that even if a company were to succeed in developing its own specific MT with excellent output, it wouldn't be any good for another company, or on certain kinds of texts. To some extent MT may prove useful for translating some of the mountains of text that would otherwise go untranslated, but it's hard for me to imagine a scenario where this would put a dent in the demand for high-quality human translation.


 
Giovanni Guarnieri MITI, MIL  Identity Verified
United Kingdom
Local time: 09:44
Member (2004)
English to Italian
thanks... May 18, 2014

for your reply, Patrick. I got the same impression the couple of times I was asked to do some MTPE. I looked at the texts and turned them down. To me, it seems that what the LSPs are prepared to pay for MTPE doesn't really match the effort required. The rate is unrealistically low. I'm sure some companies have developed very good, fine-tuned systems, but I have yet to see one...

 
Martin Benjamin
Switzerland
Local time: 10:44
Swahili to English
+ ...
inside the gears of Google Translate May 19, 2014

Giovanni Guarnieri MITI, MIL wrote:

Even GT is very good for some texts, but I wonder how good these specifically developed engines really are...


I've been doing some reverse engineering of GT, and have a soft-publication version of the discussion on the Kamusi blog at http://kamusi.org/google_translate. (Hoping for some comment from GT before finalizing.) The piece is fairly long, hopefully in the service of offering some useful new analysis. I'd be interested in additional thoughts from members of this forum before releasing a final version.

[Edited at 2014-05-19 14:30 GMT]


 
Noud van Oeteren
Netherlands
Local time: 10:44
English to Dutch
+ ...
MT can have dangerous quirks Sep 12, 2014

I find that Google Translate handles things like lists and bullets better than Bing, but it can still be tricky.

When translated into Dutch with GT, the following sentence:

It is generally used to accept the existing programming option.

becomes

Het wordt algemeen gebruikt om de bestaande mogelijkheid programmeren te beëindigen.


Google leaves out 'accept' and introduces 'ending' ('beëindigen').




[Edited at 2014-09-12 18:14 GMT]


 

