Data Toolbar

Developer(s)	DataTool Services
Operating system	Microsoft Windows
Type	Browser toolbar, Web scraping
Website	www.datatoolbar.com

Data Toolbar is a Web scraping add-on for the Internet Explorer, Mozilla Firefox, and Google Chrome Web browsers. It collects structured data from Web pages and converts it into a tabular format that can be loaded into a spreadsheet or database management program.[1]

Algorithm

The program implements a variation of the generalized tree matching algorithm for nested lists.[2] That is, on a given Web page, the program recursively traverses the branches of the page's DOM tree, looking for nested lists of data items that match the format of the specified content. Because this approach compares the structure of page elements rather than their raw text, it has several advantages over a simple string-matching algorithm.[3]
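The recursive search described above can be illustrated with a minimal Python sketch. It is not Data Toolbar's own code, which is not reproduced here. As a stand-in for the generalized algorithm it uses the classic simple tree matching (STM) score, and the `Node` class, `el` helper, `threshold`, and `min_items` parameters are hypothetical. Adjacent sibling subtrees whose normalized match score reaches the threshold are grouped into a candidate list, and recursing into every child finds lists nested inside list items.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified DOM node: only the tag name and children."""
    tag: str
    children: list[Node] = field(default_factory=list)


def el(tag: str, *children: Node) -> Node:
    """Convenience constructor for building small test trees."""
    return Node(tag, list(children))


def simple_tree_match(a: Node, b: Node) -> int:
    """Simple tree matching: size of the largest order-preserving match of two trees."""
    if a.tag != b.tag:
        return 0  # roots must match for any subtree to match
    m, n = len(a.children), len(b.children)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1] + simple_tree_match(a.children[i - 1], b.children[j - 1]),
            )
    return dp[m][n] + 1  # +1 for the matched roots


def tree_size(t: Node) -> int:
    return 1 + sum(tree_size(c) for c in t.children)


def similarity(a: Node, b: Node) -> float:
    """Normalized STM score in [0, 1]."""
    return 2 * simple_tree_match(a, b) / (tree_size(a) + tree_size(b))


def find_repeated_lists(node: Node, threshold: float = 0.8, min_items: int = 2) -> list[list[Node]]:
    """Recursively report runs of adjacent, structurally similar siblings."""
    found: list[list[Node]] = []
    run: list[Node] = []
    for child in node.children:
        if run and similarity(run[-1], child) >= threshold:
            run.append(child)  # extend the current run of similar items
        else:
            if len(run) >= min_items:
                found.append(run)
            run = [child]  # start a new run
    if len(run) >= min_items:
        found.append(run)
    for child in node.children:  # descend to find lists nested in items
        found.extend(find_repeated_lists(child, threshold, min_items))
    return found


page = el("body",
          el("h1"),
          el("ul",
             el("li", el("span"), el("ul", el("li"), el("li"))),
             el("li", el("span"), el("ul", el("li"), el("li"), el("li")))))

for run in find_repeated_lists(page):
    print(len(run), run[0].tag)  # prints: 2 li, then 2 li, then 3 li
```

The outer two `li` items are grouped even though their inner lists differ in length, because the similarity score tolerates partial structural mismatches. An exact string comparison of the two items would not group them.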

Features

Collection of data and images directly from Internet Explorer
Collection of information from detail pages linked from the catalog
Automatic processing of multi-page catalogs
Support for irregular multi-row catalogs interspersed with advertisements
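Following linked detail pages, walking a multi-page catalog, and writing records whose fields differ into one table can be sketched in Python. This is a hypothetical illustration, not Data Toolbar's implementation: the `SITE` and `DETAILS` dictionaries stand in for pages that a real tool would fetch and parse, and `crawl_catalog` and `to_csv` are illustrative names. CSV is used here only as one example of a tabular format that a spreadsheet can load.

```python
import csv
import io

# Hypothetical pre-parsed catalog pages: each lists (name, detail URL) items
# and the URL of the next catalog page, or None on the last page.
SITE = {
    "/catalog?page=1": {"items": [("Widget", "/item/1"), ("Gadget", "/item/2")],
                        "next": "/catalog?page=2"},
    "/catalog?page=2": {"items": [("Gizmo", "/item/3")], "next": None},
}
# Hypothetical pre-parsed detail pages; note that fields differ between items.
DETAILS = {
    "/item/1": {"price": "9.99"},
    "/item/2": {"price": "19.50", "color": "red"},
    "/item/3": {"price": "4.25"},
}


def crawl_catalog(start: str) -> list[dict[str, str]]:
    """Follow 'next' links page by page and merge each item with its detail page."""
    records = []
    url, seen = start, set()
    while url and url not in seen:  # 'seen' guards against pagination loops
        seen.add(url)
        page = SITE[url]
        for name, detail_url in page["items"]:
            records.append({"name": name, **DETAILS.get(detail_url, {})})
        url = page["next"]
    return records


def to_csv(records: list[dict[str, str]]) -> str:
    """Write records to CSV, using the union of all fields as columns."""
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)  # keep first-seen column order
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="")  # blank for missing fields
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


print(to_csv(crawl_catalog("/catalog?page=1")))
```

Taking the union of keys as the header lets items with extra attributes, such as `color` above, share one table with items that lack them.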

Similar tools

Automation Anywhere - Web Extractor, part of a larger automation system
Easy Web Extract - Standalone application for Windows
Mozenda - Web-based service
Newprosoft - Standalone application for Windows, includes an agent
OutWit - Standalone application and Firefox extension
Data Scraping Studio - Standalone application for Windows and Chrome extension
Diggernaut - Web platform with a standalone application for Windows, Linux, and macOS, and a Google Chrome extension

Sources

[1] "A guide to the mortgage banking industry's leading providers of high-tech products and services". The Journal for Mortgage Banking Professionals. Zackin Publications. 25 (2): 14. January 2011.
[2] Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran S. da Silva, Juliana S. Teixeira. "A Brief Survey of Web Data Extraction Tools". ACM SIGMOD Record. 31 (2).
[3] Nitin Jindal, Bing Liu. "A Generalized Tree Matching Algorithm Considering Nested Lists for Web Data Extraction". Proceedings of the Tenth SIAM International Conference on Data Mining. 2010.

External links

http://datatoolbar.com/

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
