Project Seminar (Projektseminar)
Web Data Integration and Data Management
Summer 2018
Technical Data
- Advanced Bachelor or Master/Diploma in
Applied Computer Science or Information Systems (Wirtschaftsinformatik)
- Prerequisites: basic knowledge of, e.g., XML and/or RDF
- 6 ECTS
- Number of participants: max. 10
- Language: German and English are allowed. Reading English texts/documentation
is required.
Contents
There is a lot of data available in the Web and in the Semantic Web. Web data is usually
provided in the human-readable form of Web pages (including forms, the so-called Deep Web),
so users cannot process it in a database-style way. Data extraction,
e.g. from the CIA World Factbook or from Wikipedia, is thus a never-ending "hot topic".
Apart from pattern-based approaches, natural language processing approaches are
also used.
The Semantic Web (cf. lecture Semantic Web) attempts
to provide, extend, and/or annotate Web data in a machine-readable form.
For this, the RDF data format is used, together with the OWL ontology language for
describing metadata.
Form of the Seminar
The intention of the seminar is to get an overview of the state of the art in data integration
from the Web and background data management.
For each topic, the following has to be done:
- write a tutorial-style paper that gives an overview of an
approach,
- evaluate some tools and write a report (installation, functionality,
usability, ...) [German or English],
- prepare an illustrative medium-size case study using one or more tools
(optionally comparing them),
- give a presentation that delivers the tutorial and demonstrates how to
use the tool(s) (about 90 minutes incl. discussion; German or English).
Time Schedule
- First meeting at the beginning of the semester:
Monday 16.4., 10h c.t., SR 2.101, IFI.
Assignment of topics and papers.
- May: individual meeting with each participant;
by that date there must be evidence that the
tutorial/presentation/case study will be successful.
- May/June: preparation of case studies and presentations, individual meetings
- Registration/Deregistration in FlexNever is open until 30.6.2018.
- 16. July: presentations (10:30-12:00; 13:00-...)
Potential Topics
- Query Languages (mostly for RDF):
- Mark Kaminski and
Egor V. Kostylev: Beyond Well-designed SPARQL (ICDT 2016).
The paper is based on the SPARQL theory-practice papers by
Arenas/Gutierrez/Perez that have been discussed in the Semantic Web lecture [Requires SemWeb]
- Paolo Guagliardo and Leonid Libkin: Correctness
of SQL Queries on Databases with Nulls (SIGMOD Record,
Sept. 2017). The paper is about SQL; the talk should evaluate the paper and draw conclusions for SPARQL-style
queries (where nulls are common). [Requires SemWeb]
- An approach that presents a true algebra over RDF data:
Leonid Libkin, Juan L. Reutter, Domagoj Vrgoc:
TriAL for RDF: Adapting Graph Query Languages for RDF Data. PODS 2013: 201-212;
and Martin Przyjaciel-Zablocki, Alexander Schätzle, and Georg Lausen:
TriAL-QL: Distributed Processing of Navigational Queries
(WebDB 2015) [Requires SemWeb]
- Jing Wang, Nikos Ntarmos, Peter Triantafillou:
GraphCache: A Caching System for Graph Queries
(EDBT 2017) [Requires SemWeb]
- Other Topics:
- Web Scraping with OXPath/Diadem (Oxford Univ.)
[assigned]
Comment: Evaluate the tools, do a case study, and describe the approach. Based on XPath/XML. No RDF knowledge is required, but XML/XPath
knowledge is.
- K. Venkatesh Emani, Karthik Ramachandra, Subhro Bhattacharya, S. Sudarshan:
Extracting Equivalent SQL from Imperative Code in Database Applications and
DBridge: Translating Imperative Code to SQL
DBridge is a system for optimizing data access in database applications by using static program analysis and program transformations.
Tool might not be available...
- RDBMS
Kareem El Gebaly and Jimmy Lin
Afterburner: The Case for In-Browser Analytics
[assigned]
Afterburner implements an analytical RDBMS in pure JavaScript so
that it runs completely inside a browser with no external
dependencies. This approach should be compared to the
classical MonetDB (many papers about MonetDB are available). Explain
the overall structure of both, point out the differences, and as far
as possible identify the individual strengths and weaknesses of
each. Both applications should be freely available (as
download/web tool).
- SQL Query Equivalence Prover
Shumo Chu, Daniel Li, Chenglong Wang, Alvin Cheung and Dan Suciu
Demonstration of the Cosette Automated SQL Prover and Cosette: An Automated Prover for SQL
These papers present a new formalism and an implementation for reasoning about the equivalence of SQL queries.
Explain the overall structure and the formalisms introduced, and try to find the limitations of the approach. We will provide some example queries, which should be extended until they reach the limits of the prover.
The application can be found here.
- Natural Language Interfaces: Compare ATHENA and the classical NaLix
Diptikalyan Saha, Avrilia Floratou, Karthik Sankaranarayanan, Umar Farooq Minhas, Ashish R. Mittal, Fatma Ozcan
ATHENA: An Ontology Driven System for Natural Language Querying over Relational Data Stores and
Constructing a Generic Natural Language Interface for an XML Database
Explain how ATHENA works, and show the similarities, differences, and limitations of both approaches; in particular, point out the advantages of using XML or SQL.
As far as we know, neither application is freely available, but it cannot hurt to try.
- Incomplete reasoning [assigned]
Xiaoxiao Guo, Tim Klinger, Clemens Rosenbaum, Joseph P. Bigus, Murray Campbell, Ban Kawas,
Kartik Talamadupula, Gerald Tesauro and Satinder Singh / Jason Weston.
Dialog-based Language Learning and
Learning to Query, Reason, and Answer Questions On Ambiguous Texts
Explain the general problem and show the strengths and weaknesses of both approaches.
Note: Papers can be found via the DBLP
http://www.dblp.org
(originally, DBLP meant "Databases and Logic Programming", but by now
it covers all topics in Computer Science), or simply by searching for
the paper title with Google (this often yields the PDF directly). A
list of other papers by the same authors can then be found via
DBLP. Note that many publishers' pages are only freely accessible from
university computers.