Title:
Omini: A Fully Automated Object Extraction System for the World Wide Web
dc.contributor.author | Buttler, David John | en_US |
dc.contributor.author | Liu, Ling | |
dc.contributor.author | Pu, Calton | |
dc.date.accessioned | 2005-06-17T17:45:34Z | |
dc.date.available | 2005-06-17T17:45:34Z | |
dc.date.issued | 2000 | en_US |
dc.description.abstract | This paper presents Omini, a fully automated object extraction system. A distinct feature of Omini is its suite of algorithms and automatically learned information extraction rules for discovering and extracting objects from dynamic or static Web pages that contain multiple object instances. We evaluated the system on more than 2,000 Web pages from over 40 sites. It achieves 100% precision (it returns only correct objects) and excellent recall (between 93% and 98%, with very few significant objects left out). With a simple optimization, the object boundary identification algorithms are fast, taking about 0.1 second per page. | en_US |
dc.format.extent | 531766 bytes | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/1853/6590 | |
dc.language.iso | en_US | |
dc.publisher | Georgia Institute of Technology | en_US |
dc.relation.ispartofseries | CC Technical Report; GIT-CC-00-22 | en_US |
dc.subject | Object extraction system | |
dc.subject | Web page discovery | |
dc.title | Omini: A Fully Automated Object Extraction System for the World Wide Web | en_US |
dc.type | Text | |
dc.type.genre | Technical Report | |
dspace.entity.type | Publication | |
local.contributor.author | Liu, Ling | |
local.contributor.author | Pu, Calton | |
local.contributor.corporatename | College of Computing | |
local.relation.ispartofseries | College of Computing Technical Report Series | |
relation.isAuthorOfPublication | 96391b98-ac42-4e2c-93ee-79a5e16c2dfb | |
relation.isAuthorOfPublication | fc48a3de-da43-4d32-af59-414047eb7cd7 | |
relation.isOrgUnitOfPublication | c8892b3c-8db6-4b7b-a33a-1b67f7db2021 | |
relation.isSeriesOfPublication | 35c9e8fc-dd67-4201-b1d5-016381ef65b8 | |