Guidelines for better machine translation quality

Do you always get a bad machine translation? Nowadays machine translation is often quite good when translating between the most used languages. So, if you think the quality is very low every time you’ve used automatic translation, the reason may not be the machine. It might be you.

You can affect the machine translation quality. And it’s not even hard. All you need to do is to take a look at your original text before you start the translation. A couple of small changes can make a big difference.

Here are some of the easiest things you can do to improve the machine translation quality:

Write simple text. Simple text doesn’t have to be stupid. Simple text is well thought out, properly edited and nice to read. Both simplicity and complexity are style choices, but a complex style makes it harder to deliver the message. Authors of difficult text may want to make a smart and intellectual impression. However, they only make readers feel stupid because the readers don’t get the point of the text. Machines and readers both like to work with simple text. So if you have a message to deliver to an international audience, write simple text with short sentences and think about your reader.

Fix typos. Machine translators aren’t very good at recognizing typos and other small mistakes. Wordswithoutspaces or worsd wiht smaal misstakse are gibberish to most machines. When we read our own texts, our brains fix the typos and we don’t see them. Luckily, word processing tools can fix some of them. Maybe in the future a spell checker will be integrated into all machine translators. Today we still need to check our text with a separate spell checker before using any machine translation.
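As a rough illustration of such a pre-check, here is a minimal Python sketch that flags words missing from a word list before the text goes to a machine translator. The function name `flag_possible_typos` and the tiny `KNOWN_WORDS` set are made up for this example. A real check would load a full dictionary or use a proper spell checker, and this version only handles plain English letters.

```python
import difflib
import re

# Hypothetical mini word list for illustration only.
# A real check would load a full dictionary for the source language.
KNOWN_WORDS = {"words", "with", "small", "mistakes", "are", "hard", "to", "translate"}


def flag_possible_typos(text, known_words=KNOWN_WORDS):
    """Return (word, suggestions) pairs for words that are not in known_words."""
    flagged = []
    # Lowercase the text and pick out runs of letters (and apostrophes).
    for word in re.findall(r"[A-Za-z']+", text.lower()):
        if word not in known_words:
            # Suggest the closest known word, if any is similar enough.
            suggestions = difflib.get_close_matches(word, sorted(known_words), n=1)
            flagged.append((word, suggestions))
    return flagged
```

Calling `flag_possible_typos("worsd wiht smaal misstakse")` returns one entry per unknown word, each with a list of suggested replacements that may be empty. Clean text returns an empty list.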

Use clear sentence structures. As mentioned earlier, short sentences are easier to translate than long sentences. A step further is to reconsider the sentence structures. If possible, always use the active voice. It is common in most languages. Because the active voice works similarly across many languages, machine translators can often parse active sentences and find correct translations for them. The passive voice is more challenging to translate because its use varies between languages. The passive voice also often leads to complex references from one sentence to another, which causes trouble for any machine translator.
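Sentence length is the easiest part of this advice to check automatically. The Python sketch below lists sentences that exceed a word limit so you can split them before translating. The name `long_sentences` and the limit of 20 words are assumptions made for this example, not a rule from any translation tool, and the sentence splitting is deliberately naive (it breaks on ".", "!" and "?" followed by whitespace).

```python
import re

# Assumed threshold for illustration; pick a limit that suits your text.
MAX_WORDS = 20


def long_sentences(text, max_words=MAX_WORDS):
    """Return the sentences in text that contain more than max_words words."""
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]
```

Any sentence the function returns is a candidate for splitting into two or more shorter ones. Checking for the passive voice would need a grammar-aware tool and is beyond this sketch.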

Choose vocabulary wisely. The fourth thing that affects the machine translation quality is vocabulary. Idioms and sayings rarely translate correctly. Likewise, words with several meanings are challenging. If there’s a chance that a sentence has an ambiguous meaning, rewrite it. Also avoid words that don’t have an equivalent in the target language. Statistical machine translation plays with probabilities, so make sure that the odds are on your side.

These simple tricks help you to get the best possible quality out of machine translation. However, automatic translation is still made by a machine and the quality probably won’t be totally flawless even with these tips. Machine translation is often enough when the goal is just to understand. If you need a perfect translation, a professional translator is your solution.


6 thoughts on “Pre-editing Guidelines for Better Machine Translation Quality”

  1. The problem is that guidelines such as “Write simple text” and “Use clear sentence structures” are so vague that they are pretty useless. If writers knew how to do those two things, then they would probably be doing them. My book, The Global English Style Guide, contains much more specific guidelines, most of which can help improve MT output. If you are not aware of the book, you might want to check it out. It’s available on Amazon and all the other major book e-tailers. Best regards, John Kohl

  2. Thank you for your comment, John Kohl.
    It is true that these guidelines are not very specific. However, I doubt that machine translators know your style book and its guidelines. Don’t get me wrong. I’m sure your book is good resource for people who are writing English but it may not lead to better machine translation quality. And even the best guidelines for English won’t help the majority of machine translation users because they use other source languages. The main point is to know that machine translation quality is not a given and that we can all affect the quality by editing the original text.

  3. I agree that many machine translations are from languages other than English. However, in the context of machine translations from English, your reply to John contradicts your blog post.

    In your blog post, you wrote, “A couple of small changes can make a big difference.”

    In your reply to John, you wrote, “I’m sure your book is good resource for people who are writing English but it may not lead to better machine translation quality.”

    Please clarify the contradiction.

  4. Thanks Mike for your question. I’m sorry that my comment wasn’t clear enough.

    My point was that machine translators don’t know the rules of any style book. Machine translation is often more or less statistical nowadays and the translation is based on probabilities. If the goal is to get the best possible quality out of machine translation, the original text needs to be either simple or include very common phrases that are well represented in the database.

    If one writes a piece of text based on John’s or someone else’s style guide, it will be written for humans. The text will be nice to read but it won’t be optimized for machine translation. And that’s ok. This blog post, however, concentrates on machine translation.

  5. I see your point, Mike. Those who need more guidance for writing English for machine translation may find John’s guidelines in Appendix B useful.

    Overall, it is important to see the difference between guidelines that are for human communication and those that are for machine translation.

