Dagstuhl-Seminar "Declarative Data Access on the Web",
Schloss Dagstuhl, Germany, September 12-17, 1999.
Dagstuhl-Report No. 251.
Information Extraction from the Web with FLORID
Wolfgang May
Abstract:
The talk presents an integrated architecture where Web exploration,
wrapping, mediation, and querying are done in a monolithic system. The
system is based on a unified framework -- i.e., data model and
language -- in which all tasks are done. We regard the Web and its
contents as a unit, represented in an object-oriented data model: the
Web structure, given by its hyperlinks, the parse-trees of Web pages,
and its contents all become part of the internal world model of the
system. The advantage of this unified view is that the same data
manipulation and querying language can be used for the Web structure
and the application-semantic model. The model is complemented by a
rule-based, object-oriented language, extended with Web access
capabilities and structured document analysis, which supports accessing
the Web as well as wrapping, mediating, and querying information. Due to this
integration, a system in this architecture can be equipped with Web
navigation and exploration functionality. We present generic rule
patterns for typical extraction, integration, and restructuring tasks
using this framework. We demonstrate the practicability of the approach
with the FLORID system and illustrate it with two case studies.
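The unified view described in the abstract can be illustrated with a minimal Python sketch. It is not FLORID code and does not reproduce the rule language of the talk; it only shows the idea that hyperlinks, parse-tree nodes, and extracted application data can live in one fact store and be queried with one mechanism. Everything here is an assumption made for illustration: the in-memory PAGES dictionary stands in for real Web access, and the names FactBuilder, query, explore, and wrap_capitals, as well as the predicates links_to, child, tag, text, and capital, are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "Web" (URL -> HTML); a real system would fetch pages.
INDEX = "http://example.org/index.html"
GERMANY = "http://example.org/de.html"
PAGES = {
    INDEX: '<html><body><a href="http://example.org/de.html">Germany</a></body></html>',
    GERMANY: '<html><body><h1>Germany</h1><p class="capital">Berlin</p></body></html>',
}


class FactBuilder(HTMLParser):
    """Adds one page's parse tree and hyperlinks to a shared set of
    (subject, predicate, object) facts. Assumes well-nested markup."""

    def __init__(self, url, facts):
        super().__init__()
        self.url, self.facts = url, facts
        self.stack = [url]  # the page itself is the root of its parse tree
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1
        node = f"{self.url}#n{self.count}"
        self.facts.add((self.stack[-1], "child", node))
        self.facts.add((node, "tag", tag))
        for name, value in attrs:
            self.facts.add((node, "@" + name, value))
            if tag == "a" and name == "href":
                # Web structure: a hyperlink between two page objects.
                self.facts.add((self.url, "links_to", value))
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.facts.add((self.stack[-1], "text", data.strip()))


def query(facts, s=None, p=None, o=None):
    """Single query mechanism for structure, parse trees, and content."""
    return [f for f in facts
            if (s is None or f[0] == s)
            and (p is None or f[1] == p)
            and (o is None or f[2] == o)]


def explore(start):
    """Exploration: follow links_to facts and load every reachable page."""
    facts, todo, seen = set(), [start], set()
    while todo:
        url = todo.pop()
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        builder = FactBuilder(url, facts)
        builder.feed(PAGES[url])
        builder.close()
        todo.extend(o for _, _, o in query(facts, s=url, p="links_to"))
    return facts


def wrap_capitals(facts):
    """Wrapping: derive an application-level fact from parse-tree facts."""
    for node, _, _ in query(facts, p="@class", o="capital"):
        page = node.split("#")[0]
        for _, _, text in query(facts, s=node, p="text"):
            facts.add((page, "capital", text))


facts = explore(INDEX)
wrap_capitals(facts)
```

After these two calls, the same query function can be asked about link structure (p="links_to"), about document structure (p="tag"), or about the derived application data (p="capital"), which is the point of keeping all three in one model.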