Title
Omini: A Fully Automated Object Extraction System for the World Wide Web
Authors
Buttler, David John
Liu, Ling
Pu, Calton
Abstract
This paper presents a fully automated object extraction system, Omini. A
distinct feature of Omini is the suite of algorithms and the automatically
learned information extraction rules for discovering and extracting objects
from dynamic Web pages or static Web pages that contain multiple object
instances. We evaluated the system using more than 2,000 Web pages over 40
sites. It achieves 100% precision (returns only correct objects) and
excellent recall (between 93% and 98%, with very few significant objects
left out). The object boundary identification algorithms are fast, about
0.1 second per page with a simple optimization.
Date Issued
2000
Extent
531766 bytes
Resource Type
Text
Resource Subtype
Technical Report