Lately I have noticed that Scribd makes it very difficult for free users to browse a document hosted on the site. There is no way to search within a document, let alone download it.
Pages are loaded on demand with JavaScript, so the browser's "Save as" feature does not help much.
To my amazement, even copying and pasting text puts gibberish on the clipboard! To find out what was wrong, I turned off JavaScript in the browser and loaded the same document again. Voilà, the page itself showed the gibberish. So it looks like Scribd's JavaScript somehow decodes the gibberish text before it is displayed in the browser.
What puzzles me is this: even with JavaScript enabled and the text rendered properly in the browser, if I inspect the DOM objects corresponding to the text I select, I still see the gibberish.
So now I am confused. The text is displayed correctly to the user, yet the DOM objects still contain gibberish. What kind of JavaScript hooks or code is the site using to keep the gibberish in the DOM objects and still render the decoded text?
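To make the puzzle concrete, here is a minimal sketch of one way a page could keep scrambled characters in the DOM and still show readable text: a substitution table applied to the stored text, paired with a custom font whose glyphs undo the substitution at draw time. This is purely an assumption for illustration; nothing above confirms that Scribd works this way, and `buildCipher` and `translate` are hypothetical names. The decode step below only stands in for what such a font would do visually.

```javascript
// Hypothetical illustration only: NOT Scribd's actual code.
// Builds a substitution table mapping each real character to a stand-in,
// plus the inverse table a remapped font would effectively apply when drawing.
function buildCipher(alphabet, shift) {
  const encode = new Map();
  const decode = new Map();
  for (let i = 0; i < alphabet.length; i++) {
    const plain = alphabet[i];
    const scrambled = alphabet[(i + shift) % alphabet.length];
    encode.set(plain, scrambled);
    decode.set(scrambled, plain);
  }
  return { encode, decode };
}

// Replaces every character found in the table; other characters pass through.
function translate(text, table) {
  return Array.from(text, (ch) => (table.has(ch) ? table.get(ch) : ch)).join("");
}

const alphabet = "abcdefghijklmnopqrstuvwxyz";
const { encode, decode } = buildCipher(alphabet, 7);

// What the server would put in the DOM (and what copy/paste would pick up).
const domText = translate("hello world", encode);
// What the reader would see if the font draws each stand-in as the original glyph.
const onScreen = translate(domText, decode);

console.log(domText);
console.log(onScreen);
```

Under this assumption the decoded text is never stored anywhere as text: it exists only as drawn glyphs, which would explain why the DOM and the clipboard both hold gibberish.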
Is there a way to access the decoded text? My intention is not to reverse engineer the decoding algorithm, only to locate where the decoded text is being stored.
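A possible starting point for that search is to compare the raw text of the selected node with the font used to draw it. The helper below is a sketch, assuming the page leaves the standard `getSelection` and `getComputedStyle` methods intact; `describeSelection` is a hypothetical name, and `win` is expected to be the browser's `window` object.

```javascript
// Sketch: report the raw DOM text under the current selection and the font
// used to draw it. `win` is a browser window (or a stand-in with the same methods).
function describeSelection(win) {
  const selection = win.getSelection();
  if (!selection || selection.rangeCount === 0) return null;
  const node = selection.anchorNode;
  if (!node) return null;
  // Text nodes (nodeType 3) carry no style of their own; use the parent element.
  const element = node.nodeType === 3 ? node.parentElement : node;
  return {
    selectedText: selection.toString(),
    domText: node.textContent,
    fontFamily: win.getComputedStyle(element).fontFamily,
  };
}

// In a browser console, after selecting some text:
//   console.log(describeSelection(window));
```

If `fontFamily` turns out to name a web font served by the site rather than a system font, that would hint that the decoding happens in the font's glyph mapping rather than in a JavaScript variable, though this alone does not prove it.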
Example document is:
See what happens when you turn JavaScript on and off!
It's sort of simple. They created a JavaScript PDF viewer. Mozilla did something similar with Firefox. Since your own PDF viewer is not actually being used to display the content, they can control nearly every aspect of the viewing experience. – Ramhound – 2013-06-19T11:53:51.097