I have several PDF documents (such as this one) that appear to be written in standard Chinese characters, but when I extract the text, it turns out to be encoded with characters from the Unicode Supplementary Private Use Areas (Planes 15 and 16, U+F0000–U+FFFFD and U+100000–U+10FFFD).
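To confirm which code points in the extracted text actually fall in those two areas, a quick check like the following can help. It is a minimal sketch that assumes the extracted text is already available as a Python string. The remapping table `PUA_TO_CJK` is hypothetical: its single entry is a placeholder, and a real table would have to be built some other way, for example by comparing the rendered glyphs.

```python
def supplementary_pua_chars(text: str) -> list[str]:
    """Return the code points in `text` that lie in Supplementary Private Use Area-A or -B."""
    found = []
    for ch in text:
        cp = ord(ch)
        # SPUA-A: U+F0000..U+FFFFD (Plane 15); SPUA-B: U+100000..U+10FFFD (Plane 16)
        if 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD:
            found.append(f"U+{cp:05X}")
    return found


# Hypothetical mapping table; the entry below is a placeholder for illustration only.
PUA_TO_CJK = {0xF0001: "中"}


def remap(text: str) -> str:
    # str.translate maps code points (ints) to replacement strings; unmapped characters pass through.
    return text.translate(PUA_TO_CJK)
```

For example, `supplementary_pua_chars("abc\U000F0001")` returns `['U+F0001']`. Ordinary BMP private-use characters (U+E000–U+F8FF) are deliberately not reported.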
Is there any reliable way to map from the private use characters back to the appropriate CJK characters?