We May Soon Get Better Machine Translation Quality

The chances that machine translation (MT) quality will improve more quickly in the near future have grown. Some time ago we wrote that, according to our MT quality research, MT improvement had stalled in some languages. However, the situation is changing, and we may well already be seeing a considerable improvement in MT quality.

The currently dominant technology in MT is so-called statistical machine translation. The idea is that one fairly general MT engine is trained with huge amounts of bilingual text, i.e. collections of texts in two languages. Lately the engine itself has not developed very quickly; quality improvements have come mainly from increasing the amount of training data.

However, we have been quickly approaching the limits of improvement in statistical MT. For example, Google has said that increasing the amount of data by 50% improves MT quality by only 0.5%. That is, to improve MT quality by 2% we would need four such increases, or about 5 times as much data as we currently have. Such amounts of data simply do not exist.
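The arithmetic behind the "5 times" figure can be checked with a short sketch. It assumes, as the estimate above implicitly does, that every 50% increase in data yields the same 0.5% gain; the function name and parameters are hypothetical and only illustrate the calculation.

```python
def data_multiplier(target_gain_pct, gain_per_step_pct=0.5, growth_per_step=1.5):
    """Estimate how many times the current data volume is needed for a target gain.

    Assumption (not a measured law): each multiplication of the data by
    `growth_per_step` adds `gain_per_step_pct` percentage points of quality.
    """
    # Number of 50% increases needed to reach the target gain.
    steps = target_gain_pct / gain_per_step_pct
    # Each step multiplies the data volume, so the growth compounds.
    return growth_per_step ** steps


multiplier = data_multiplier(2.0)
print(f"A 2% gain needs about {multiplier:.2f} times the current data")
```

Under this assumption a 2% gain takes four steps, and 1.5 multiplied by itself four times is roughly 5, which matches the figure in the text.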

Now this situation seems to be changing. A new technology for MT engines is emerging: neural networks. This is a new type of approach and may well achieve a significant improvement in MT quality. Neural network technology can quite possibly get more out of the already existing data, i.e. the next improvements would come from the MT engine instead of the amount of data. For example, Facebook is already using neural network technology in its own MT engine, with good results. Google also plans to follow.

If neural network technology indeed proves to be better than the current statistical MT engines, there is good reason to expect better MT quality in the very near future.


Published by

Multilizer / Niko Papula

I am the managing director of Multilizer, a Finnish software company specialising in software for improving translation quality, speed and cost.