Semistructured Data and XML Summer 2020
Prof. Dr. Wolfgang May
Lars Runge, M.Sc.,
Sebastian Schrage, M.Sc.
Date and Time:
- Monday 14-16 ct, IFI SR 2.101
- Wednesday 10-12 ct, IFI SR 2.101
- Virtual Meetings: We will use BigBlueButton provided by GWDG;
the rooms/meetings can be entered via StudIP. The recordings can also be found there
(they cannot be exported or edited).
Please also read the general and technical information
about DBIS virtual teaching.
Lectures and exercises are mixed (see the announcements on this page). There will be optional
exercise sheets whose solutions will be discussed as part of the lecture.
Module M.Inf.1141, 4 SWS, 6 ECTS.
The module belongs to the MSc programme in
Applied CS. It can also be credited in the BSc programme in Applied CS
(as "Vertiefung Softwaresysteme und Daten"),
and in several other programmes:
BSc/MSc Wirtschaftsinformatik (Business Informatics), Mathematics (BSc/MSc), Teaching/2-Fach-Bachelor, PhD GAUSS, ...
Course Description
One of the most important factors that led to the overall success of XML
is that the "XML world" combines many already known concepts in an
optimal way to cope with a broad spectrum of requirements.
The course will first review some of these preceding (some of them even historic)
concepts (network database model, relational databases, object-oriented
databases) and the integration of data and metadata (SchemaSQL). Then,
the idea of "semistructured data" is introduced by presenting early
representatives that helped to shape the XML world (F-Logic, OEM).
In the main part, XML is presented as a data model and a markup meta-language,
and the current languages of the XML world are systematically
investigated and applied: DTD, XPath, XQuery, XSLT, XLink, XML Schema,
and SQL/XML.
The lecture uses the geographical sample database "Mondial"
in its XML version for illustrations.
For practical exercises, the XML software is installed in the IFI CIP Pool.
The software playground page can be found here;
the XPath/XQuery/XSLT Web interface is available here.
The sample code fragments can be found in the CIP pool under
/afs/informatik.uni-goettingen.de/course/xml-lecture/.
Dates & Topics
- Mon 20.4. 14ct-16: No official meeting. Instead, we have
an unofficial test meeting (also joined by the participants of the Deductive Databases lecture
and the SQL lab course)
using the BBB tool (enter via StudIP) where DBIS people are present and
can answer questions.
- Wed 22.4.:
[I am happy that we met virtually: 54 participants in StudIP. The seminar room would have been much too small.]
Administrativa, Overview, ...
The lecture is intended to have three dimensions:
- A typical technology course (XML and its languages) with a lot of practical content.
Perfect for self-learning and experimenting.
- XML is a perfect example of many computer science concepts (practical: data,
data exchange and interoperability; theoretical: trees, language design).
By now (2020), most of
you already have practical experience with many of them: ant, maven (XML), JSON (a
much newer, "lightweight" reduction of XML).
Learning goal: always be aware of, and analyze, the underlying concepts and structures.
- "History": how did (do, and will do) computer science and IT concepts evolve?
The "XML world" (like many other things, e.g. most basic Web technologies) was
developed as a response to new requirements in the mid/late 1990s.
A very interesting time, in which many already existing ideas and experiments were improved
and combined.
Learning goal: learn how to analyze and question requirements and existing concepts/software.
As I have nothing to write on ... I reuse old Smartboard notes:
document and data trees [ss16]
This graphic (mainly its left side on data models) from yesterday's DD lecture:
"concepts and buzzwords".
- 27.4.:
timeline of data models [1718]
Introductory Presentation "XML"
[Slides]
- 29.4.:
Intro, General concepts and notions of the database area.
Slides: Relational Model
Earlier database models, concepts and extensions:
Basic Concepts and Notions; example and recall: relational model.
Slides: data models
- Reuse old Smartboard notes:
Rule-based language design, Network data model as "book"
Reification in Modeling [17/18]
Reification in Modeling (UML Assoc Classes) [ss19]
CORBA [ss16]
- 4.5.: Earlier database models, concepts and extensions:
Network data model, ...
- Some references to read about database history (optional):
- 6.5.: "History" continued: ... ODMG
- 11.5.: "History" continued - academic prototypes: SchemaSQL (an
extremely powerful, yet syntactically minimal "opening" of SQL to metadata) and
early semistructured data models (Tsimmis/OEM/F-Logic)
Slides: early semistructured data models
- 13.5.: Early semistructured data models (OEM, F-Logic);
- 18.5.: XML: data model, language, DTDs etc.
Slides: XML basics
- 20.5.: XML DTDs
- 25.5.: XML: data model, language, DTDs etc. (cont'd)
notes
Exercise Sheet 1
(XML basics, parsing, grammar aspects)
- 27.5.: XPath: navigation and addressing language for XML
Slides: XPath
- 1.6. NO LECTURE (Holiday)
- 3.6.: XPath (cont'd)
Recall slide from last time: XML Axes for XPath
XPath position functions (local) with graphics
- 8.6.: XPath (cont'd)
- Discussion of Exercise Sheet 1:
Solutions to Exercise Sheet 1. If some of the solutions should be
discussed or presented, or if you have questions, send me an e-mail.
- Concerning Exercise Sheet 2: if there are questions etc.,
the RocketChat dbis channel can be used
(participants are also encouraged to answer questions from others); let us use the
advantages of online courses.
- 10.6.: On demand: comments on the solutions of Exercise Sheet 1.
XPath (conclusions), XML Query Languages: History/Evolution - XQL
XPath tree navigation sketch (pdf)
Slides: XQuery. Note: for
experimenting with XQuery, running saxon from the command line (on
the IFI CIP Pool computers, or installed on your own computer)
gives better error messages than the Web service.
- 12.6. first "news" about the exam (see below)
- 15.6.: XML Query Languages (cont'd): XML-QL and then XQuery
Notes of today (rule metaconcept, index-based eval)
- 17.6.: XQuery (cont'd)
- 22.6.: XQuery (cont'd)
Solutions to Exercise Sheet 2
Sketch for Ex. 2.5 (axes), and functional "if"/"case" statement
Exercise Sheet 3 (XQuery)
- 24.6.: XQuery (cont'd)
- 29.6.: XQuery (cont'd)
- 1.7.: XML Query Languages (cont'd), XSLT
Slides: XSLT
- news on the planning of the exam, see below
- 6.7.: XSLT
Exercise Sheet 4 (XSLT)
Solutions to Exercise Sheet 3
- 8.7.: XSLT
Comment on the "failed" XSLT-to-XQuery example: after "for $y in $c//city", the "return" was missing.
This was easy to debug with saxon in the command shell, which reports line numbers.
It is still unclear why the error sometimes also occurred in the Web service when these lines were commented out ...
- 8.7. 14-16h,
(optional, belonging to the SQL lab course, the meeting will be shared for both courses in StudIP): XML in SQL/SQLX
Slides: XML and Databases
SQL Web interface (in addition to the
Mondial tables, the tables mondial, countryXML, and cityXML used on the slides are available)
- Announcement:
Praktikum/Lab course XML WS2020/21
- Announcement:
Semantic Web WS2020/21
- (General) information about E-Exams
(click upper right for German language)
- 13.7.: XSLT
- 15.7.: XML Schema
Slides: XML Schema
(note: the XML Schema atomic datatypes are also used in the RDF data model)
Solutions to Exercise Sheet 4
- Note: the rest of the slide set belongs to the XML lab course
which takes place in the winter term 2020/21.
- 17.7.2020 End of lecture period.
- We will add further information on the exam here.
We plan to have one or more (optional, recorded) online meetings before the exam.
Maybe we can share a video showing what the exam will look like.
- (added 15.7.): The exam will be an "open-book exam", i.e., you can use whatever documentation
you want (but it is intended that you should not need it).
Recommended: print the slides, just to have them and to get a better feeling for them. Put
notes wherever you need them. List the keywords and page numbers that you need on the first
page.
Strongly recommended: prepare a "cheat sheet" (German: Spickzettel) where you put
everything that you may want to look up quickly. This preparation also helps you to become
aware of the material.
Details of the syntax of the pre-XML history section are not relevant. That section is
for understanding the concepts and the problems.
- to be extended
- 21.8. 10-12 Exam, see below
Exams
This year, the exam will (most probably) be a written exam (last
year, the number of oral exams became too high). Depending on the
number of participants and the situation, we might also consider
some kind of online exam using an online-education tool.
The current situation (4.7.) is as follows:
- The exam will most probably take place on August 21st:
- Plan A: a computer-based exam (as in the "C programming course" in our BSc) in the E-Exams room
(Central Campus, building MZG "Blue Tower", room 1.116).
Two rooms have been booked for 9-13h; the concrete "working time" is currently planned for 10-12.
Due to the entrance regulations, there might be additional requirements that will be communicated
later.
Technically an E-exam as intended (using an editor and command line saxon under the e-exams-Windows
environment) is possible.
- 10.7.: There is some progress: Saxon has been installed. It will be called via Command-Line (cmd), as
usual for the E-exams under Windows. For this, and for the editor (Notepad++), buttons will be
added to the "Safe Exam Browser" that is used in e-exams.
- 13.7.: some more progress: ...
- 25.7.: We visited the E-exams room yesterday. Basically, everything is clear:
- Use the big stairs in the entrance hall; there, the waiting zones and the procedure are explained.
- Shortly before the exam, everybody will receive a user name/number and a password.
- when you enter the room, there is an identity check booth where you have to "check in".
- wear a mask until you sit down at your place. Note that only healthy persons are allowed to
participate.
- Technical setup:
- The (Windows!) screen then shows 3 windows: photo 1,
photo 2.
You can move and resize them (all not very surprising):
- (1) (white) the Windows Notepad++ editor.
- (2) (black) Windows command shell, in working directory "...\Desktop\saxon". On the first picture,
you see (hard to read) that
this was prepared for testing with mondial-europe.xml, mondial.dtd, a stylesheet and a query file.
For the exam, it contains (pre-prepared) files exam.xml (with the XML prologue and reference to exam.dtd,
so you don't have to learn this syntax stuff), exam.dtd (empty), exam.xsl
(with a <xsl:stylesheet xmlns:xsl="..."> ... </xsl:stylesheet> frame).
The saxon.jar is there, and executable files saxonValid.bat, saxonXQ.bat and saxonXSL.bat are given.
The call syntax is the same as in the course. It will also be given in the pdf.
- (3) (which shows the place number "53" there) the "safe exam browser" where the "ILIAS" exam system runs.
When you arrive on your place, do not yet log in. This will be done for all at the same time.
After opening it, you can resize the answer fields.
We cannot have a test exam there, but the SS2008 exam
will be available (in English, next week?) in a "testing ILIAS" via StudIP.
- internet access is disabled in the "safe exam browser" used in the E-exams room. In general,
only very restricted applications are available. Basically, there are the three above-mentioned
windows. Note that only one Notepad++ instance is possible, but you can open and see several files in
subwindows (tabs) or windows (e.g. XML and DTD), as demonstrated in this
Video.
- work on the files exam.xml, exam.dtd, exam1.xq, ..., exam.xsl using the editor, call saxon in the
command shell, and at the end copy the files into the ILIAS answer fields (we will explicitly
mention this at the end of the exam).
This is what we finally receive for grading.
- There will be a pdf exam sheet with the text and maybe some questions to write/draw on the paper.
So bring a pen with you.
- Exam:
- The teaching staff sits in the middle of the room, and we see all participants from behind.
So it is possible to contact us by raising the hand and maybe a short "hello!".
- We do not plan to come to your places, but there is a chat function. It can only be used after raising
the hand, because we must then explicitly
open the chat channel to a certain person (from a computer inside a glass box, so the chatting
person will not be able to see you). It is also possible to share the screen, so in case you have
a problem (mainly small ones, like the typical programming
"I don't see why this syntax is wrong" problems), contact us by raising the hand and/or a short
"hello" (we will see how this works).
- As in a "paper exam", solutions that do not (completely) work can also be submitted and
will receive appropriate partial credit.
The above chat help is primarily intended for "small" syntax problems, not for
bigger questions.
- Hints for training:
- you need to design a small, but useful XML instance from the given text (which will contain
sample data as in the earlier exams). Note that for queries in the "all x such that for all y ..."
style, it is helpful to have such x,y in the instance.
- Have a plan (and some experience) of how to edit an XML file quickly by copying and pasting elements.
This can be much faster than in a paper exam, where the verbose XML markup must be written
several times.
Choose short element names and attribute names. You may also abbreviate text contents like
person names to initials.
- Current state: 26 participants registered in FlexNow
- 31.7.
- added the English pdf version of the 2008 exam (which serves as a base for a
mock online "test" exam in StudIP/ILIAS).
- A computer-based ILIAS mock exam is now available via StudIP:
Go to the lecture's StudIP page, use the "Learning Modules"/"Lernmodule" button,
then, "Exam SSD SS2008" should appear. Click on it and it should work. Experiment with it.
In case of any questions, or in case something does not work, preferably use the
RocketChat:dbis channel.
- Official document on E-exams during the Corona time
- This general E-exam info sheet gives some abstract overview.
It will also be available at the exam (same in German).
- Current state: 28 participants registered in FlexNow
- 12.8.: Note that internet access is not possible with the "safe exam browser" used in the E-exams room.
current state: still 28 registrations.
- 17.8.:
- Video about handling Notepad++
- For the SSD/XML exam, the english language variants of Notepad++ and ILIAS are activated, but you can
switch to German.
- Notepad++ provides restricted syntax highlighting and completion for XML; it recognizes
XML files by their file extension (e.g., "bla.xml").
- In the E-Exams room, the keyboards have GERMAN layout. The symbols [,],{,},\ etc. depicted on
the lower right of the keys are invoked with "AltGr" and the respective key.
- 18.8.: The keyboards in the E-exams room are all German-layout keyboards.
The current state is that the participants cannot switch them to
US layout by themselves, but the organizers there can switch individual
keyboards IN ADVANCE to US layout (note that the keys will then still show the
German symbols ...).
Persons who want to have their keyboard switched to US layout should send me a mail by
tomorrow (19.8.), 18:00. I will then forward these names
to the organizers.
- 19.8.: each participant can switch the keyboard layout (software) individually.
Note that the hardware will then still show QWERTZ...
- 20.8. Organizational issues (these could not be sent
directly via FlexNever, since quite a few characters are not accepted there, and not
all exam participants are registered in StudIP)
- Plan B fallback: a classical written exam.
For that we got a reservation for 21.8. 9-13h in ZHG 011 (big enough).
In that case also, the concrete "working time" is currently planned for 10-12.
- In both cases, the exam will be of the same style (similar to the written exams in the
years 2004-2011, see below).
In case we have a computer-based exam, the plan is that you use the "answer fields" as usual
like "programming on paper", but that optionally you have the possibility to run saxon
for DTD-validation, XPath/XQuery and XSLT.
- Exam 2020 without solutions
- Exam 2020 with solutions
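The training hint about queries of the form "all x such that for all y ..." can be made concrete with a small sketch. It uses Python's standard `xml.etree.ElementTree` instead of XQuery, and a hypothetical, Mondial-like instance (the names `country`, `city`, `name`, `population` and the numbers are made up for illustration, not exam data). In XQuery, the two queries would use the quantified expressions `every ... satisfies` and `some ... satisfies`; the point is that only an instance containing a country where some, but not all, cities qualify can tell the two apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical, Mondial-like test instance (not the real Mondial data).
# Country "A": all cities are large; country "B": only some cities are large.
DOC = """<mondial>
  <country name="A">
    <city name="a1" population="200000"/>
    <city name="a2" population="150000"/>
  </country>
  <country name="B">
    <city name="b1" population="300000"/>
    <city name="b2" population="5000"/>
  </country>
</mondial>"""


def countries_all_cities_above(root, threshold):
    """Names of countries x such that FOR ALL cities y of x: population(y) > threshold."""
    return [c.get("name") for c in root.findall("country")
            if all(int(y.get("population")) > threshold for y in c.findall("city"))]


def countries_some_city_above(root, threshold):
    """Names of countries x such that THERE EXISTS a city y of x with population(y) > threshold."""
    return [c.get("name") for c in root.findall("country")
            if any(int(y.get("population")) > threshold for y in c.findall("city"))]


root = ET.fromstring(DOC)
print(countries_all_cities_above(root, 100000))  # ['A']
print(countries_some_city_above(root, 100000))   # ['A', 'B']
```

If every country in the instance had only large cities, both functions would return the same result, and accidentally writing "some" instead of "every" would go unnoticed. Note also that `all()` over an empty sequence is `True`, just like XQuery's `every` over an empty sequence, so a country without any city would qualify for the first query; adding such a country to the instance tests this case as well.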
Some hints for Exam preparation
- Understanding the important concepts and knowing how to apply them.
- Theoretical concepts around XML (DTD, regular expressions).
- "Programming" on Paper (in case we get the E-exam running: in practice)
with XML, DTD, XPath, XQuery, and basic XSLT constructs.
- "History"-Sections: understanding of the crucial issues of each of the concepts; how they
contributed to XML, and what can be found of them in XML&friends.
- The written exams from previous years are also a good preparation (most of them are only
available in German, but they still give an overview of what the exams looked like).
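For the theoretical part (DTDs and regular expressions) listed in the hints above, it may help to see that the content model of a DTD element declaration is a regular expression over the names of the child elements. The following sketch uses plain Python with the standard `re` module; it is not a real validator, and the model `(name, population*, city+)` is a made-up example. It translates such a model into a Python regex and checks sequences of child element names against it.

```python
import re


def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate a DTD content model such as '(name, population*, city+)' into a
    Python regex over a space-terminated sequence of child element names.
    Sketch only: handles names, ',', '|', '?', '*', '+' and parentheses;
    mixed content (#PCDATA), EMPTY and ANY are not supported."""
    parts = []
    for tok in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if tok == "(":
            parts.append("(?:")          # non-capturing group
        elif tok == ",":
            continue                     # sequence = plain concatenation
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)            # same meaning as in regexes
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one child element
    return re.compile("".join(parts))


def matches(model: str, children: list) -> bool:
    """True iff the sequence of child element names conforms to the content model."""
    word = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(word) is not None


model = "(name, population*, city+)"
print(content_model_to_regex(model).pattern)  # (?:(?:name )(?:population )*(?:city )+)
print(matches(model, ["name", "city", "city"]))  # True
print(matches(model, ["name", "population"]))    # False
```

The XML 1.0 specification additionally requires content models to be deterministic (e.g., `((b, c) | (b, d))` is not allowed), which this sketch does not check.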