Machine Translation Quality Differs Between Languages

Machine translation can be very good, very bad or anything in between. The most popular machine translation services produce slightly different translations, but their overall quality is roughly the same. The variance is much larger when you compare translation quality between languages: English–Finnish is still far behind English–Spanish in machine translation quality.

Major machine translation services have always targeted the major languages, and the reasons are quite rational. Developing machine translation is a business. The languages with the most speakers offer the largest potential user bases, and the largest potential incomes too. Even when the service itself is free, the goal is often to attract as many users as possible. Popularity brings influence in the industry and helps improve the system, because users give feedback and correct mistakes.

In addition to wide audiences, the big languages also have the largest collections of professionally translated material. This is especially important when developing statistical machine translation, which requires massive amounts of data. Many machine translation developers would be interested in working even with the smallest languages, but the lack of material limits their work.

Like major languages, closely related languages also tend to translate well automatically. Similar languages are easier to translate automatically than very different ones because their grammatical rules and vocabularies are close to each other. That shared structure makes it easier for a machine to learn to produce good translations. Mistakes become inevitable when one language is forced into the frame of a very different one.

Despite all this, automatic translation services such as Google Translate and Microsoft Translator have built language selections covering dozens of languages. Although they don't sort or rank the available languages, translation quality is not the same across all of them. In some cases the differences are so large that anyone can see them.

These services face the same challenge of having too little translation data. One common way to work around it is to use English as an intermediate (pivot) language when translating from or to a small language. In practice, instead of translating directly from Arabic to Finnish, the text is translated from Arabic to English and then from English to Finnish. Each step can introduce its own errors, and anything lost in the English version cannot be recovered in the final Finnish translation.
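To make the information loss concrete, here is a toy word-level sketch. It is a deliberate simplification, not how Google Translate or Microsoft Translator work internally. The word tables and the transliterations "anta" (singular "you") and "antum" (plural "you") are illustrative assumptions. Arabic and Finnish both distinguish singular and plural "you", but English collapses them into one word, so the pivot route has to guess.

```python
# Toy word tables (hypothetical, for illustration only).
# Arabic and Finnish both distinguish singular/plural "you"; English does not.
AR_TO_EN = {"anta": "you", "antum": "you"}  # singular, plural -> same English word
EN_TO_FI = {"you": "sinä"}  # English "you" is ambiguous, so one reading is picked
AR_TO_FI = {"anta": "sinä", "antum": "te"}  # a direct pairing keeps the distinction


def pivot(word: str) -> str:
    """Translate Arabic -> English -> Finnish via the pivot language."""
    return EN_TO_FI[AR_TO_EN[word]]


def direct(word: str) -> str:
    """Translate Arabic -> Finnish without an intermediate language."""
    return AR_TO_FI[word]


for w in ("anta", "antum"):
    print(w, "direct:", direct(w), "pivot:", pivot(w))
```

For the plural "antum", the direct route gives "te" while the pivot route gives "sinä", because the number distinction disappeared in the English step.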

Minor languages are caught in a vicious circle. When quality is low, there are few users. When there are few users, companies don't invest in development, and the quality doesn't improve. As a result, the differences in machine translation quality between languages are not going away anytime soon.

It is important to note that this discussion applies only to general machine translators that are widely available. Customized, purpose-built machine translators are designed for specific situations, texts and language pairs, and their quality can differ greatly from what general machine translation services produce.


