Fuzzy matching and consistency - devil’s inventions against translators (translation itself)?
Thread poster: Wladyslaw Janowski
Wladyslaw Janowski
Wladyslaw Janowski  Identity Verified
Local time: 01:58
German to Polish
+ ...
Oct 26, 2013

Ever since CAT tools appeared, translators have faced the problem of how to deal with existing translations. Is it sensible to recycle them? If so, how do we assess what can sensibly be recycled, and how?
Of course, a kind of recycling existed before the CAT era. The process took place in the translator's brain, and the resources were his/her memory, dictionaries and so on. There were some losses, as human memory is far more flexible than RAM but not as “failproof”. If my brain is in good condition, I can find “hits” there faster than any hardware or software can. Moreover, I will find them in the real context, not in the statistical context or the extremely primitive formal context (based on neighbouring segments) that CAT tools call Perfect Match, Context Match and so on. Now we must forget that kind of “research” and rely on the hits we receive from “resources” (TM, TB ...). Because these resources are practically never maintained and internal consistency is a rarity, we must rely on things which are not reliable per se. Yet so-called Quality Control (named differently in different CAT tools) will check our work against those resources: no brain, no real context. And our customers will simply look at the QC statistics. They are not linguists and frequently not even native speakers of the target language.
And now we get “fuzzy matches”. If even 100% matches, CMs or PMs are not really reliable (and it is not rare for customers to ask us to “check” them too), what are we to think of a purely formal, statistical “match” assessed by some algorithm that is not identical with the algorithm of human thinking? If 7 words out of 10 in a sentence match another sentence, we get a 70% fuzzy match. Does this mean that the work we have to do on such a sentence is 30% of the work we would have when translating from scratch? Even if the real context were negligible (of course it is not), we still cannot be sure that the matching 70% is correct.
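To make the 7-of-10 arithmetic concrete, here is a minimal sketch of a word-level match score. It is an illustration under stated assumptions only: commercial CAT tools use their own algorithms, which are not described in this thread, and the function name `word_match_percent` and both sample sentences are made up.

```python
from difflib import SequenceMatcher


def word_match_percent(new_segment: str, tm_segment: str) -> int:
    """Toy fuzzy score: in-order matching words as a share of the longer segment."""
    new_words = new_segment.lower().split()
    tm_words = tm_segment.lower().split()
    if not new_words or not tm_words:
        return 0
    # Compare word lists (not characters) and add up the sizes of all matching runs.
    matcher = SequenceMatcher(None, new_words, tm_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return round(100 * matched / max(len(new_words), len(tm_words)))


# Hypothetical segments: 7 of 10 words match, yet the meaning is the opposite.
NEW_SEGMENT = "Do not open the cover while the motor is running"
TM_SEGMENT = "You may open the cover while the motor is stopped"

if __name__ == "__main__":
    print(word_match_percent(NEW_SEGMENT, TM_SEGMENT))
```

In this sketch the two sample sentences score 70 although one forbids what the other allows, so the translation stored for the TM segment cannot simply be patched word by word.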
In the end it comes down to the algorithm designed by the developers of the CAT tool. As we all know, different tools, and even different versions of the same tool, generate different match figures. These algorithms are more or less arbitrary, not linguistically justified, and in the worst case (which we sometimes MUST suspect) they can be intentionally manipulated, because the highest “value” in every business, the translation business included, is money. So we have a fundamental conflict here: what is good for the translator is inevitably bad for the customer (agency, end customer or both).
The story could be much longer, but let’s sum up. The following is my idea, but I have one (1) agency (whose owner is a translator, and the agency is rather small) which has adopted this scheme and is able to convince the end customers.
1. 0-98% matches are no-matches
2. 99-99% matches are the only fuzzy matches, paid at 60% of the main rate
3. 100% matches, CM and PM must be excluded from the process by locking them before the translator starts his/her work. If the customer wishes the translator to “kindly” check them, they must be treated as 99% matches
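The three rules above can be sketched as a small pricing function. This is only one reading of the scheme: the names `pay_fraction` and `job_price`, the tuple layout and the sample word counts are assumptions made for illustration, not part of any CAT tool.

```python
def pay_fraction(match_percent: int, locked: bool = False) -> float:
    """Fraction of the main rate paid for one segment under the three rules."""
    if locked:
        return 0.0  # rule 3: locked 100%/CM/PM segments are outside the job
    if match_percent >= 99:
        return 0.6  # rule 2, and rule 3 when 100% matches must still be checked
    return 1.0      # rule 1: 0-98% counts as no match, full rate


def job_price(segments, rate_per_word: float) -> float:
    """segments: iterable of (word_count, match_percent, locked) tuples."""
    weighted_words = sum(
        words * pay_fraction(percent, locked) for words, percent, locked in segments
    )
    return weighted_words * rate_per_word


# Hypothetical job: word counts and the rate are invented sample values.
SAMPLE_JOB = [
    (100, 70, False),   # fuzzy match below 99%: full rate
    (50, 99, False),    # 99% match: 60%
    (200, 100, True),   # locked 100% match: not paid, not touched
    (20, 100, False),   # unlocked 100% match to be checked: 60%
]

if __name__ == "__main__":
    print(round(job_price(SAMPLE_JOB, 0.10), 2))
```

With these sample values the job counts as 142 weighted words out of 370, because only the 99% and unlocked 100% segments are discounted and the locked segment drops out entirely.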
Consistency is a more complicated matter: the solution depends on the individual agreement between the customer and the translator. Either QA decides or the translator does (by excluding some categories from QA or by ignoring (= rejecting) error messages).
What do you think?
Regards
WJ


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 01:58
Member (2006)
English to Afrikaans
+ ...
@Wladyslaw Oct 26, 2013

Wladyslaw Janowski wrote:
The story could be much longer, but let’s sum up.


If I understand the "story" correctly, you are saying that since different CAT tools calculate fuzzy matches differently, translators can't be expected to give discounts for fuzzy matches, is that right?

Well, I don't think that the fact that different tools have different algorithms for matching means that translators can't be asked to give discounts for matches. The difference between CAT tools' matches will only become a financial issue for the translator when very, very large amounts of text are involved. However, if you feel differently, i.e. if you feel that some CAT tool has dramatically poorer matching, then you should simply charge a higher rate for jobs that involve that tool.

And if I understand the "story" correctly, you are also saying that translators should not give any discounts for fuzzy matches below a very high threshold (you mention 98% later in your post), is that right?

Well, I tend to agree with you that some fuzzy matching grids are a little optimistic and also mostly too complex. In reality, low fuzzy matches do not save time and should therefore not be discounted, and I don't think that one can clearly correlate the exact match percentage of higher fuzzy matches with an exact amount of effort savings, which means that it is equally silly to use a complex sliding scale of discount percentages.

Beginners in this industry and agencies who don't know any better might think that fuzzy match discounts should work like this:

* full rate for non-matches
* small discount for low fuzzy matches
* mid-range discount for high fuzzy matches
* high discount for near exact matches or exact matches
* no charge for locked segments

But once you gain some experience with texts and with matching, you'll realise that that does not hold water. A more appropriate scale would be something like this:

* full rate for non-matches and also for low fuzzy matches
* small discount for high fuzzy matches
* mid-range discount for near exact matches or exact matches
* no charge or high discount for locked segments

What you define as "low fuzzy" and "high fuzzy" is up to you and the agency, but it is rather dependent on the type of text, the text itself, and the language combination. You mention "98%" later in your post as what I assume you must think is the threshold for "low fuzzy match", but when I say "low fuzzy match" I have in mind percentages like 85% or less, not 98% or less.
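The difference between the two scales can be sketched by pricing the same job under both. Everything numeric here is an assumption for illustration: the pay fractions in both grids, the 99% cut-off for "near exact" and the 74% cut-off for "non-match" are invented, and only the 85% boundary for "low fuzzy" comes from the paragraph above.

```python
# Fraction of the full rate paid per band. All fractions are made-up sample values.
BEGINNER_GRID = {"none": 1.0, "low": 0.8, "high": 0.5, "exact": 0.2, "locked": 0.0}
ADJUSTED_GRID = {"none": 1.0, "low": 1.0, "high": 0.8, "exact": 0.5, "locked": 0.0}


def band(match_percent: int, locked: bool = False,
         low_fuzzy_max: int = 85, no_match_max: int = 74) -> str:
    """Sort a segment into a band; thresholds other than 85 are hypothetical."""
    if locked:
        return "locked"
    if match_percent >= 99:
        return "exact"   # near exact or exact
    if match_percent > low_fuzzy_max:
        return "high"
    if match_percent > no_match_max:
        return "low"
    return "none"


def payable_words(segments, grid) -> float:
    """segments: iterable of (word_count, match_percent, locked) tuples."""
    return sum(words * grid[band(percent, locked)] for words, percent, locked in segments)


# Hypothetical job with 100 words in each band.
SAMPLE_JOB = [
    (100, 50, False),
    (100, 80, False),
    (100, 90, False),
    (100, 100, False),
    (100, 100, True),
]

if __name__ == "__main__":
    print(payable_words(SAMPLE_JOB, BEGINNER_GRID), payable_words(SAMPLE_JOB, ADJUSTED_GRID))
```

With these invented fractions the same 500 words come out as 250 payable words on the first scale and 330 on the second. The gap depends entirely on where the band boundaries sit, which is why the thresholds need to be agreed per text type and language combination.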

The following is my idea, [and] I have one agency ... which has adopted this scheme and is able to convince the end-customers:
1. 0-98% matches are no-matches
2. 99-99% matches are the only fuzzy matches, paid at 60% of the main rate
3. 100% matches, CM and PM must be excluded from the process by locking them before the translator starts his/her work. If the customer wishes the translator to “kindly” check them, they must be treated as 99% matches.


Many people will agree with you that if a client wants a 100% discount for 100% matches, then he should lock those segments (or hide them) and not expect the translator to check them. Other translators have no objection to it, as long as it is clear what the client wants them to do.

However, I think you are being overoptimistic with your no-discount threshold of 98%.


 
Wladyslaw Janowski
Wladyslaw Janowski  Identity Verified
Local time: 01:58
German to Polish
+ ...
TOPIC STARTER
@Samuel Oct 26, 2013

If I understand the "story" correctly, you are saying that since different CAT tools calculate fuzzy matches differently, translators can't be expected to give discounts for fuzzy matches, is that right?

Not exactly. The main reason for trying not to give discounts for fuzzy matches is what they are actually worth, that is, whether they really mean more or less effort. On the other hand, there are also cases where fuzzy matches are not beneficial for customers, and sometimes I decide myself to give more discount than the customer expects. But such cases are of course rather rare. IMO the idea of fuzzy matching is generally correct only if we define the quality of a translation as its level of consistency rather than its level of linguistic quality and terminological correctness. Consistency is sometimes useful, but it should not be seen as the highest value of a translation, and yet it is the very core and foundation of the fuzzy matching concept.

And if I understand the "story" correctly, you are also saying that translators should not give any discounts for fuzzy matches below a very high threshold (you mention 98% later in your post), is that right?

Of course not necessarily 98 or 99; possibly 97 would be sufficient. But this depends on the algorithms, and these differ between CAT tools (the differences in fuzzy matching results between SDL Studio and memoQ are significant, and I would earn about 20-30% more for a project calculated in memoQ than for the same project calculated in Studio). I cannot say anything about other tools, as I don't use them. But I expect there are people on PROZ who can tell us how it looks in other CAT programs.

I have in mind percentages like 85% or less, not 98% or less.

This would possibly be a more realistic (but still not very realistic) limit for a fuzzy-based calculation of the price.

Other translators have no objection to it, as long as it is clear what the client wants them to do.

Well, all parties must be flexible to an extent that makes for a fair compromise, but there is of course no strict definition of what a "fair compromise" is.

However, I think you are being overoptimistic with your no-discount threshold of 98%.

As I wrote, I have only one customer (agency) starting at 99% (everything below being treated as a no-match). But there are also agencies that let translators "bid" less or more than a "start price" offered by the agency. My experience in this case is that translators with a high "priority" (assessed by the end customer and categorized as "primary preferred", "secondary preferred" and so on, or by the agency itself) will be selected for the job even if there are many offers with a lower price. This is the "value" of the chance: they earn less in the short term but more in the long term, because of low or zero proofreading costs, higher customer satisfaction and loyalty, and so on.

Regards
WJ


 
LilianNekipelov
LilianNekipelov  Identity Verified
United States
Local time: 19:58
Russian to English
+ ...
I am not sure if you have similar views, Wladyslaw Oct 27, 2013

but I would say that fuzzy matches are a total waste of time and human effort. There are no 100% matches in nature -- perhaps just 99.9 (going to infinity). What do you need those matches for at all? An experienced translator knows most of the phrases -- like 99.9% in his or her specialization. The task is just to write the text in another language. You have to check a few words sometimes -- but this is really sporadic. I would say no more than 5 words in 10,000 words on average. Maybe not even that.

Now, are you supposed to check whether the suggestions are correct? How would you know?

Also, one word in English, let's say, may have to be translated into a few different words in another language, like the Slavic languages, for example. First of all, because the word or phrase may mean different things in different contexts. Secondly, because word repetition is considered poor style in those languages, in contexts other than legal documents or strictly technical or medical translation.

The matches may have some use -- once in a blue moon, when you have absolutely no idea how to translate a particular sentence. This does not happen that often, though, which is why the use of matches, or any discounts for them, should not be required on an everyday basis -- ever. It should be entirely up to the translator whether to use CAT tools, or even whether to turn the suggestions on while using them.


 
Wladyslaw Janowski
Wladyslaw Janowski  Identity Verified
Local time: 01:58
German to Polish
+ ...
TOPIC STARTER
Very similar :) Oct 27, 2013

Well Lilian, I wholly agree with you and could present more arguments. For instance this: where is the "real" context if we must translate sentences, or some wild fragments of them, and not the whole text? Using CAT we are more technical correctors than translators. So in fact CAT tools are not translation tools but rather proofing tools, for proofing consistency, not for proofing translation quality.
But this probably leads our discussion in a direction where the thread could be banned as incompatible with the forum.

Regards
WJ
PS: the best translations I have ever produced were made with classic tools, from paper and pencil to Word or direct translation in DTP programs. And I was able to translate 30 pages (= about 6,000-7,000 words) incl. proofreading and even some correction of the resulting layout in PageMaker or Quark. So in fact CAT tools have not brought any significant progress in the translation process. But we should not forget where CAT comes from: large American global enterprises such as Caterpillar, who invented so-called "controlled language". So we should now say that our language pair is Controlled Polish/Controlled German or so.


 
xxLecraxx (X)
xxLecraxx (X)
Germany
Local time: 01:58
French to German
+ ...
discounts Oct 27, 2013

Why do agencies expect translators to give discounts, anyway? I think it's unfair to be obliged to allow discounts only because you've invested in an expensive CAT tool. Something doesn't feel right here...

 
Wladyslaw Janowski
Wladyslaw Janowski  Identity Verified
Local time: 01:58
German to Polish
+ ...
TOPIC STARTER
The same with copyrights Oct 27, 2013

Marcel G. wrote:

Why do agencies expect translators to give discounts, anyway? I think it's unfair to be obliged to allow discounts only because you've invested in an expensive CAT tool. Something doesn't feel right here...

Hi Marcel,
Of course. You should earn all the profits (if any) from using CAT, or possibly share them with the agency if the agency is not a simple middleman but does something you could not do for lack of know-how, namely preparing the end customer's documents for translation and maintaining termbases and translation memories.
But in fact these are our tasks. An agency preparing terminology for a project and maintaining translation memories (I mean REALLY maintaining, so you can rely on them 100%) would be a "casus rarus".
So we invest in CAT tools, prepare terminology and maintain our own TMs so that we can work with reliable resources, even with lower leverage, because the resources that come from the end customer are mostly passed directly to the translator "as is" and are anything but reliable. And then we are supposed to give discounts, because the valuable customer (mostly a large enterprise) has a "limited budget".
So our copyrights have a minus sign.
Regards
WJ
PS: Why don't we establish a kind of "trade union" to save our rights?

[Edited at 2013-10-27 17:37 GMT]


 

