Problem with /ToUnicode map only on macOS Preview

I have a weird issue with a PDF and the contained /ToUnicode CMap that only affects macOS Preview, all other tested viewers work fine. The thing is I don't know whether the contained /ToUnicode CMap is at fault or Preview.

Here is the PDF in question and the GitHub issue where this problem popped up.

If that PDF is opened in macOS Preview and the text selected and copied, everything after "Hello from HexaPD" is wrong. Other viewers copy the whole text just fine.

Current status:

  • HexaPDF, the library generating the PDF, uses an optimization that avoids creating character codes containing the ASCII characters \r, (, ) and \. The reason is that those bytes would need to be escaped when serializing the text as a PDF literal string.

  • If this optimization is turned off, the resulting file (see https://github.com/user-attachments/files/19575820/example.pdf) works perfectly in macOS Preview (i.e. copy and paste works).

  • Removing the /ToUnicode CMap entirely leads to uncopyable text. This means that macOS Preview is indeed using this CMap and that it is the most likely culprit.

  • After reading the respective parts of the PDF specification and the "5014 Adobe CMap and CIDFont Files Specification" I think that the /ToUnicode CMap in both linked files above is correct.

  • Adding a dummy entry like <000D><0044> to the /ToUnicode CMap doesn't work.
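
For illustration, here is a minimal Python sketch of the escaping that the first bullet refers to. It is not HexaPDF's code (HexaPDF is written in Ruby), and the function names are made up for this example. The relevant detail is that a conforming parser reads an unescaped CR inside a literal string as a line feed, so a character code such as 0x000D only survives a literal string if the CR is written as \r. A hex string needs no escaping at all.

```python
def to_literal_string(data: bytes) -> bytes:
    """Serialize bytes as a PDF literal string, escaping \\r, (, ) and \\."""
    # Hypothetical helper: always escaping parentheses is safe, even though
    # balanced parentheses may legally appear unescaped.
    escapes = {0x0D: b"\\r", 0x28: b"\\(", 0x29: b"\\)", 0x5C: b"\\\\"}
    body = b"".join(escapes.get(byte, bytes([byte])) for byte in data)
    return b"(" + body + b")"


def to_hex_string(data: bytes) -> bytes:
    """Serialize the same bytes as a PDF hex string; no escaping is needed."""
    return b"<" + data.hex().upper().encode("ascii") + b">"


code = b"\x00\x0d"  # a two-byte character code whose low byte is CR
print(to_literal_string(code))
print(to_hex_string(code))
```

Comparing the two outputs for a code like 0x000D shows why a generator might either skip such codes or fall back to hex strings.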

Any insights into whether the generated /ToUnicode CMap is invalid or whether it is macOS Preview's fault are appreciated!

Answer

The problem seems simple enough: the character codes are hex values, but they have been written into a ( literal string ) rather than a < hex-encoded string >. Some viewers are able to "fix that on the fly".

The fix is simple: we convert the raw bytes of the literal string into a hex-encoded string.
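
A minimal sketch of that conversion in Python, assuming the literal string (including its outer parentheses) has already been located and cut out of the content stream. `decode_literal` and `literal_to_hex` are hypothetical helper names, not part of any library, and finding the string in a real file is left out.

```python
def decode_literal(body: bytes) -> bytes:
    """Decode the bytes between the outer parentheses of a PDF literal string."""
    simple = {ord("n"): 0x0A, ord("r"): 0x0D, ord("t"): 0x09, ord("b"): 0x08,
              ord("f"): 0x0C, ord("("): 0x28, ord(")"): 0x29, ord("\\"): 0x5C}
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c == 0x5C:  # a backslash starts an escape sequence
            i += 1
            if i >= len(body):
                break
            c = body[i]
            if c in simple:
                out.append(simple[c])
                i += 1
            elif 0x30 <= c <= 0x37:  # one to three octal digits
                j = i
                while j < len(body) and j < i + 3 and 0x30 <= body[j] <= 0x37:
                    j += 1
                out.append(int(body[i:j].decode("ascii"), 8) & 0xFF)
                i = j
            elif c in (0x0D, 0x0A):  # backslash + end-of-line: line continuation
                i += 2 if body[i:i + 2] == b"\r\n" else 1
            else:  # unknown escape: the backslash is dropped
                out.append(c)
                i += 1
        elif c == 0x0D:  # an unescaped CR or CRLF is read as a single LF
            out.append(0x0A)
            i += 2 if body[i:i + 2] == b"\r\n" else 1
        else:
            out.append(c)
            i += 1
    return bytes(out)


def literal_to_hex(token: bytes) -> bytes:
    """Rewrite a complete '(...)' literal string token as a '<...>' hex string."""
    if not (token.startswith(b"(") and token.endswith(b")")):
        raise ValueError("expected a literal string token")
    return b"<" + decode_literal(token[1:-1]).hex().upper().encode("ascii") + b">"


print(literal_to_hex(b"(\\000\\r)"))
```

Note that the hex form is usually longer than the literal form, so every byte that follows it in the file moves.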

The problem now is that this edit has shifted every byte offset after this point, so the addresses stored in the cross-reference table no longer match. The easiest way to fix that is to ask Acrobat Reader or another library to save the file again.
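
To see why the addresses break, here is a rough Python sketch that lists where each object starts and formats a classic 20-byte cross-reference entry for it. It assumes an uncompressed file with a classic xref table and no "N G obj" look-alikes inside stream data; `object_offsets` and `xref_entry` are hypothetical names. For real files, re-saving with a PDF library remains the safer route.

```python
import re

# Matches an object header such as "12 0 obj" (object number, generation).
OBJ_RE = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")


def object_offsets(pdf: bytes) -> dict[int, tuple[int, int]]:
    """Map object number -> (byte offset, generation) for each object header."""
    return {int(m.group(1)): (m.start(), int(m.group(2)))
            for m in OBJ_RE.finditer(pdf)}


def xref_entry(offset: int, generation: int) -> bytes:
    """Format one in-use entry of a classic cross-reference table (20 bytes)."""
    return b"%010d %05d n \n" % (offset, generation)


# Tiny made-up fragment: swapping the literal string for a hex string in
# object 1 adds one byte, which moves object 2.
before = b"%PDF-1.7\n1 0 obj\n(a)\nendobj\n2 0 obj\n<61>\nendobj\n"
after = before.replace(b"(a)", b"<61>", 1)

print(object_offsets(before)[2], object_offsets(after)[2])
print(xref_entry(*object_offsets(after)[2]))
```

Any entry that still holds the old offset points into the middle of the previous object, which is why the file has to be rewritten rather than patched in place.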

The file size will sadly increase using Reader. So we may prefer a different method.

However, Acrobat Reader rewrote the file whilst maintaining the PDF/A-3u integrity.
