
Why can PDF be such a difficult format to translate?

PDF is a very popular file format. It’s easy to use and compact, which makes it perfect for sharing information. Yet for a format that is so pleasant to share and read, translating a PDF document can be a real challenge for both machines and professional translators.

Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it. (Wikipedia)

If all PDF documents matched the Wikipedia definition above, translating them into other languages would always be easy. Unfortunately, this isn’t the case. Despite sharing the .pdf extension, PDF documents can differ greatly in how they are built internally.

Professional translators usually prefer the original format, such as Word or PowerPoint, because these files are easy to process. When PDF is the only available format, the translation work can be very slow. Struggling with a difficult file isn’t any translator’s dream task.

This is why many translators charge higher rates for PDF translation work. The reason isn’t to earn more money but to compensate for the extra work and headache that a PDF translation often requires.

Many translation professionals use computer-assisted translation (CAT) tools and translation memories to improve their work efficiency. Unfortunately, almost all of these tools are useless if the computer can’t read the text of the original PDF file.

This is the main reason why machine translation tools can’t process all PDF files. What looks like text to the human eye may not be text from a technical perspective. For example, scanned documents are technically images. Without optical character recognition (OCR) software, and sometimes even with it, a machine cannot read scanned text or text embedded in images.
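As a rough illustration of the point above, the following Python sketch guesses whether a PDF is a pure scan by looking at the raw bytes of the file. It is only a heuristic under stated assumptions: the function name `looks_scanned` is made up for this example, and the check assumes that the file’s object dictionaries are stored uncompressed, which is not true for every PDF. A scan that has already been through OCR usually carries a hidden text layer with fonts, so it would be reported as not scanned.

```python
import re


def looks_scanned(pdf_bytes: bytes) -> bool:
    """Crude guess: the file contains image objects but no font resources.

    Hypothetical helper for illustration only. It inspects raw bytes and
    does not parse the PDF structure, so compressed object streams can
    hide the markers it looks for.
    """
    # Any font-related name (/Font, /FontDescriptor, ...) suggests real text.
    has_fonts = b"/Font" in pdf_bytes
    # Embedded pictures are declared as image XObjects.
    has_images = re.search(rb"/Subtype\s*/Image", pdf_bytes) is not None
    return has_images and not has_fonts


def file_looks_scanned(path: str) -> bool:
    """Read a file from disk and apply the same heuristic."""
    with open(path, "rb") as handle:
        return looks_scanned(handle.read())
```

If the function returns True, a translation tool would most likely need an OCR step before it could see any text at all.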

There are also different types of tools for editing PDF files, but they tend to be very expensive. Furthermore, the same technical difficulties that affect translation tools apply to these editors too. Some PDF files can be modified, but not all.

Luckily, some documents can be read automatically. Still, the tools that do this aren’t perfect, and the outcome may not look exactly like the original. Often the original layout and alignment need to be preserved in the translation. Some information might be lost if the layout changes or any of the original elements are removed.

On top of everything else, PDF files can be protected with a password. This means that machines and tools can’t read the content without unlocking the file first. Passwords are lost all the time.
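Whether a file is locked can often be seen before trying to read it. An encrypted PDF declares an `/Encrypt` entry in its trailer, so a minimal Python sketch can simply search the raw bytes for that name. The function names below are hypothetical, and a plain byte search can give a false positive if the same bytes happen to occur inside binary data; a proper PDF library would parse the trailer instead.

```python
def is_probably_encrypted(pdf_bytes: bytes) -> bool:
    """Return True if the raw file content mentions an /Encrypt entry.

    Hypothetical helper for illustration only: it does not parse the
    trailer, it only looks for the name that an encrypted PDF declares.
    """
    return b"/Encrypt" in pdf_bytes


def file_is_probably_encrypted(path: str) -> bool:
    """Read a file from disk and apply the same check."""
    with open(path, "rb") as handle:
        return is_probably_encrypted(handle.read())
```

Note that `/Encrypt` also appears in files that open without a password but restrict editing or copying, so a True result means "encrypted", not necessarily "asks for a password".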

Sometimes the only option is to do the work manually all the way from start to finish. The process involves many steps: reading, rewriting, translating, editing, designing and proofreading. All this takes time and increases the total translation costs. Machine translation can be fast and cheap but, as mentioned earlier, the layout or other visual information may be lost during the process.

Overall, even if the translation can be made, the original layout of the document may not work for the translated content. PDF is a visual format, and the layout is designed for the original language. A new language may need more or less space, and the direction of the text may change, for example from left-to-right to right-to-left.
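The space problem can be made concrete with a little arithmetic. The Python sketch below compares the character counts of a source string and its translation and checks the result against a tolerance. Both function names and the 10% default tolerance are assumptions chosen for this example, and character count is only a proxy: the real width of a line depends on the font and the script.

```python
def expansion_ratio(source: str, translation: str) -> float:
    """Return the translated length divided by the source length, in characters."""
    if not source:
        raise ValueError("source text must not be empty")
    return len(translation) / len(source)


def fits_in_layout(source: str, translation: str, slack: float = 0.10) -> bool:
    """True if the translation is at most `slack` longer than the source.

    `slack` is a hypothetical tolerance (10% by default), not a standard value.
    """
    return expansion_ratio(source, translation) <= 1.0 + slack
```

For example, the English button label "Save" has 4 characters while the German "Speichern" has 9, so `expansion_ratio("Save", "Speichern")` is 2.25 and the translation would not fit a box sized for the original.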

All this may make PDF sound like the worst file format on the planet. It isn’t. As said at the beginning, PDF documents are very useful. Many of them are clear and easy to handle from a technical perspective as well. The greatest benefit of choosing PDF is that the document will look the same to everyone regardless of their equipment. It’s just good to know that there are many types of PDFs and some of them are very challenging to translate.

 


 


4 thoughts on “Why PDF can be such a difficult format to translate?”

  1. PDF is a flexible format to work with, but whether a document can be processed depends on the document itself and on the quality of the photograph. In some PDFs the material is embedded as a photograph, and in those cases you first have to work with an image editor to reach an acceptable, legible contrast. It is a somewhat complicated process, but it is worth it, and in the end the “hieroglyphs” make sense.

  2. PDF is one of the most widely used formats today, alongside .doc, and making it difficult to translate really wouldn’t be a smart thing to do. Excellent article.

  3. PDF is difficult to translate, but some companies prefer PDF translations because the formatting of the documents stays constant. For professional translators, however, it is very hard compared to Word format.
