Web Scrapper Tool for Data Extraction
Author(s):
Dhanse Sufyan, Al-Ameen College of Engineering; Khan Abdul Qayume, Al-Ameen College of Engineering; Malik Arjumand, Al-Ameen College of Engineering
Keywords:
Web Data Extraction, Multiple Tree Merging, Schema, Vision-Based Page Segmentation, Web Page, Wrapper Generation, Web Mining
Abstract:
Web databases contain a huge amount of structured data that can be accessed only through their query interfaces. Query results are presented in dynamically generated web pages, usually in the form of data records, for human consumption. Automatic web data extraction is critical for web data integration, and a number of approaches have been proposed. Early work was mostly based on the source code or the tag tree of the page. More recent approaches use visual features to extract data and perform better than this earlier work; however, they still have inherent limitations. In this paper, we propose a novel approach that uses visual features to extract data from web pages, including both data records and data items. Experimental results on a large set of query result pages from different domains show that the proposed approach is highly effective.
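The abstract does not spell out the algorithm, so the following Python sketch only illustrates what "using visual features" to locate data records can look like; it is not the authors' method. It assumes the page has already been rendered and segmented into blocks with pixel coordinates (for example, by a vision-based page segmentation step), and it applies one common heuristic: data records on a query result page tend to be repeated blocks that share a left edge and have similar widths. `Block`, `find_data_records`, and the tolerance parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A rendered page region and its visual features (hypothetical input format)."""
    text: str
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float   # rendered width, in pixels
    height: float  # rendered height, in pixels


def find_data_records(blocks, x_tol=5.0, width_ratio_tol=0.1, min_records=2):
    """Return the largest set of blocks that share a left edge and a similar
    width, ordered top to bottom, or [] if no repeated pattern is found."""
    best = []
    for seed in blocks:
        # Collect every block visually aligned with this seed block.
        group = [
            b for b in blocks
            if abs(b.x - seed.x) <= x_tol
            and abs(b.width - seed.width) <= width_ratio_tol * seed.width
        ]
        if len(group) > len(best):
            best = group
    if len(best) < min_records:
        return []  # a single block is not a repeated record pattern
    return sorted(best, key=lambda b: b.y)


if __name__ == "__main__":
    page = [
        Block("Site logo", 10, 0, 200, 40),
        Block("Book A | $12 | Author X", 50, 100, 600, 60),
        Block("Book B | $15 | Author Y", 50, 170, 602, 58),
        Block("Book C | $9 | Author Z", 51, 240, 598, 61),
        Block("Sidebar ads", 700, 100, 150, 400),
    ]
    for record in find_data_records(page):
        print(record.text)
```

On the sample page, the three book entries are grouped as data records while the logo and sidebar are rejected because they do not align with any repeated block. A real system would add further visual cues (font, spacing, background color) and then split each record into its data items.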
Other Details:
Manuscript Id: IJSTEV2I12029
Published in: Volume 2, Issue 12
Publication Date: 01/07/2016
Page(s): 64-71