Is it possible to get the text of a document being shown by the Flash Player?

7

3

I was searching for something and found a document on a website that displays it using the Flash Player. When I tried to save a copy by clicking "Download this document", the site said I needed to register, and that registration was free.

So I entered my email address and a simple password to register. It then told me that I needed to pay US$0.30 to buy and download the document.

Besides feeling a little cheated, I wondered: is there a way to save the document as a .txt or PDF file when it is being shown by the Flash Player?

nonopolarity

Posted 2010-01-20T17:16:04.880

Reputation: 7 932

Answers

6

It depends on the site's engine. Flash viewers often download the real content from the site with a dynamic HTTP request. You can try to intercept such a request and view its contents with a Mozilla add-on named Firebug: http://www.getfirebug.com/.

Install the add-on, go to the site, open the Firebug panel (click the grey bug on the right side of the status bar), select the 'Net' tab and click the 'Flash' button. Keep the panel open and reload the whole page. All requests made by the Flash plugin will then be listed in Firebug, and you can save their contents from the context menu.

I suggest analyzing the contents of the intercepted requests with a program that can identify a file by its magic values: on *nix that is certainly file, and on Windows probably WinHex (though I have not checked the latter).
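As a rough illustration of what such a magic-value check does, here is a minimal Python sketch. The signature table is a small, hand-picked subset chosen for this example (it is nowhere near what file knows), and the helper names identify and identify_file are hypothetical.

```python
# Illustrative subset of well-known file signatures; not exhaustive.
MAGIC_SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"CWS", "Flash SWF (zlib-compressed)"),
    (b"FWS", "Flash SWF (uncompressed)"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x1f\x8b", "gzip stream"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
]


def identify(data: bytes) -> str:
    """Return a format name if the data starts with a known signature."""
    for magic, name in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of a saved response and identify it."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```

If a saved response comes back as "unknown", that only means it matches none of the few signatures listed here; it may still be a compressed, encrypted, or proprietary format.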

You can also try to analyze the <object> tag that embeds Flash Player in the page. Sometimes a bare link to some file is passed to the player, but this is unlikely to happen on a paid site.
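To look at what is passed to the player without reading the page source by hand, something like the following Python sketch could list the relevant attributes. It assumes the page uses the usual <object>/<param>/<embed> embedding and that you already have the page's HTML as a string; the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class FlashParamCollector(HTMLParser):
    """Collect values that an <object>/<embed> block hands to the Flash plugin."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute or param name, value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "object" and attrs.get("data"):
            self.found.append(("object", "data", attrs["data"]))
        elif tag == "param":
            name = attrs.get("name") or ""
            if name.lower() in ("movie", "src", "flashvars"):
                self.found.append(("param", name, attrs.get("value") or ""))
        elif tag == "embed":
            for key in ("src", "flashvars"):
                if attrs.get(key):
                    self.found.append(("embed", key, attrs[key]))


def flash_params(html: str):
    """Return every player-related value found in the given HTML."""
    parser = FlashParamCollector()
    parser.feed(html)
    return parser.found
```

Any file path that shows up in the movie or FlashVars values is a candidate for the "bare link" mentioned above; on a paid site it is more likely to be an opaque document ID.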

whitequark

Posted 2010-01-20T17:16:04.880

Reputation: 14 146

You can find those documents on docin and doc88 via http://www.google.com/search?hl=en&q=%22%E4%B8%80%E6%B4%BE%E7%9B%B8%E4%BF%A1%E6%89%8B%E6%AE%B5%22&btnG=Search&aq=f&aqi=&oq= and view the PDF on the website. The difficult part is downloading them with Firebug and then being able to display them locally.

– nonopolarity – 2010-02-04T08:08:02.060

Okay. The encryption that docin.com uses is completely unknown to me, but I determined that doc88.com probably uses software from http://cryptbot.com, though I was unable to extract the key: it is probably buried deep inside the Flash viewer.

– whitequark – 2010-02-04T19:22:20.763

Tagging along on this comment thread: for those of us working with docin/doc88 files, is there any solution to get these into PDF? – William Entriken – 2012-09-18T15:49:11.893

It really does work. Now I have a 2.2 MB file, but the catch is that Adobe Acrobat cannot open it, and if I rename it to .txt and open it in Firefox, no meaningful text shows up with UTF-8, Big5, GB, or UTF-16. So I suspect it is actually encrypted. For example, it could simply XOR each byte with some constant, and that would be too troublesome to try to decrypt. Or maybe it XORs bytes 1, 3, 5, 7, 9 with one number and bytes 2, 4, 6, 8, 10... with a different number, and it would be really too troublesome to find out. – nonopolarity – 2010-01-20T19:06:21.507
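The alternating-XOR guess in the comment above is actually cheap to test if you assume what the plaintext starts with. The Python sketch below is only that: it assumes, without any confirmation from the thread, that the file is a PDF beginning with "%PDF-1." and that the key repeats with a short period (2 for the odd/even scheme described). The function names are hypothetical.

```python
def guess_xor_key(cipher: bytes, known_prefix: bytes = b"%PDF-1.", period: int = 2):
    """Derive a repeating XOR key from an assumed plaintext prefix.

    Returns the key as bytes, or None if the prefix is inconsistent with
    a repeating key of this period (so the guess is probably wrong).
    """
    if len(cipher) < len(known_prefix):
        return None
    key = [None] * period
    for i, (c, p) in enumerate(zip(cipher, known_prefix)):
        candidate = c ^ p  # key byte implied by this position
        slot = i % period
        if key[slot] is None:
            key[slot] = candidate
        elif key[slot] != candidate:
            return None  # two positions disagree: not this scheme
    if None in key:
        return None  # prefix too short to cover every key position
    return bytes(key)


def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """XOR every byte with the repeating key (the same call also encrypts)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

A None result rules out only this particular scheme and prefix. It says nothing about real ciphers or about files that were never PDFs in the first place.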

I think this is simply not a PDF or plain text. Humanity has created thousands of formats, and not all of them are readable by Adobe Reader... Can you post the first 64 bytes of this file somewhere, maybe as a hex dump, so that a 'magic number' check can be done? (Magic numbers are special byte sequences that identify a file format almost unambiguously.) – whitequark – 2010-01-20T19:26:19.607
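If no hex editor is at hand, a dump like the one requested above can be produced with a few lines of Python. This is a minimal sketch; hex_dump and dump_head are names invented for this example, and the layout only loosely imitates common hex-dump tools.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex column, and printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def dump_head(path: str, count: int = 64) -> str:
    """Hex-dump only the first `count` bytes of a file."""
    with open(path, "rb") as f:
        return hex_dump(f.read(count))
```

Calling print(dump_head("saved_response.bin")) on the saved file (the file name here is a placeholder) gives four lines of 16 bytes each that can be pasted into a comment.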