I'm looking for an easy tool that can crawl a website I give it and extract all of its text elements. It would be nice if it could handle plain text, the alt and title attributes of images, and the header section, all of them separately if possible. The output should be searchable in some way, or be a text file (XML) for every page it crawled. I need this text so I can pass it to translators.
There are plenty of web crawlers. Here are a few, open source and in Python.
– Praveen – 2012-10-18T13:09:22.080
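If none of the existing crawlers fits, the job is small enough to sketch with only the Python standard library. The code below is a minimal sketch under several assumptions. "Header section" is interpreted as the page's `<head>` (the `<title>` text plus `description`/`keywords` meta tags). The crawler stays on the starting host, follows plain `<a href>` links breadth-first, does not read `robots.txt` or run JavaScript, and writes one XML file per page. The names `TextExtractor`, `page_to_xml`, `crawl` and the `crawl_out` directory are hypothetical, chosen for illustration only.

```python
import http.client
import urllib.request
import xml.etree.ElementTree as ET
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse


class TextExtractor(HTMLParser):
    """Collects translatable text from one HTML page into separate buckets."""

    SKIP = {"script", "style", "noscript"}  # content that is never shown as text

    def __init__(self):
        super().__init__()
        self.head, self.body, self.images, self.links = [], [], [], []
        self._in_head = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self._in_head = True
        elif tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            # Keep alt and title separately, as (attribute, value) pairs.
            for name in ("alt", "title"):
                if attrs.get(name):
                    self.images.append((name, attrs[name].strip()))
        elif tag == "meta" and attrs.get("name") in ("description", "keywords"):
            if attrs.get("content"):
                self.head.append(attrs["content"].strip())
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or self._skip_depth:
            return
        (self.head if self._in_head else self.body).append(text)


def page_to_xml(url, ex):
    """Builds <page url=...> with <head>, <text> and <images> sections."""
    root = ET.Element("page", url=url)
    for section, items in (("head", ex.head), ("text", ex.body)):
        el = ET.SubElement(root, section)
        for item in items:
            ET.SubElement(el, "t").text = item
    imgs = ET.SubElement(root, "images")
    for attr, value in ex.images:
        ET.SubElement(imgs, "img", attr=attr).text = value
    return ET.ElementTree(root)


def crawl(start_url, out_dir="crawl_out", max_pages=50):
    """Breadth-first crawl of start_url's host; one XML file per HTML page."""
    host = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    saved = 0
    while queue and saved < max_pages:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if "text/html" not in resp.headers.get("Content-Type", ""):
                    continue  # skip images, PDFs and other non-HTML responses
                charset = resp.headers.get_content_charset() or "utf-8"
                html = resp.read().decode(charset, errors="replace")
        except (OSError, ValueError, http.client.HTTPException) as err:
            print(f"skipping {url}: {err}")
            continue
        ex = TextExtractor()
        ex.feed(html)
        ex.close()
        saved += 1
        path = out / f"page_{saved:04d}.xml"
        page_to_xml(url, ex).write(str(path), encoding="utf-8", xml_declaration=True)
        for href in ex.links:
            link = urldefrag(urljoin(url, href))[0]  # absolute URL, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

Calling `crawl("https://example.com/")` (a placeholder URL) would write `crawl_out/page_0001.xml` and so on. Keeping `<head>`, `<text>` and `<images>` as separate elements lets translators, or an XML-aware search tool, work on each kind of text independently. Pages that build their content with JavaScript would need a browser-based crawler instead.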