Efficient Keyword
Search for Smallest Lowest Common Ancestors in XML Databases
- We have developed the Indexed Lookup Eager
Algorithm, Scan Eager Algorithm to answer keyword search queries.
Experiments have shown that the Indexed Lookup Eager algorithm outperforms
the Scan Eager algorithm and the Stack algorithm from prior work, often by
orders of magnitude when the keywords in the query have quite different
frequencies, and loses only by a small margin when the keywords have similar
frequencies. Hence in the demo, only the Indexed Lookup Eager
algorithm is used.
- The DBLP data used in this demo is from http://www.informatik.uni-trier.de/~ley/db/index.html
, contains publication up to the year of 2001. The size is 83MB.
As an example, the first 10KB
DBLP data is provided.
- The DBLP data have been first grouped by
conference and then grouped by year under each conference. If
two author names like ``John Ben'' are queried, they may have several
relationships : they might be coauthors of some papers; they might have
published in the same conference. In the second case, their
relationship is loose and the lowest common ancestor of the corresponding
keyword occurrences is a conference element. Since the basic
presentation unit of this demo is designed to be ``paper'' element, the
second relationship
is not visualized in the result of the query. Instead the papers of both
authors from the same conferences are presented as the result.
However conference names are shown along with papers, which gives a hint to the
relationship of the authors.
- Notice that because of the encoding used in the
DBLP data, keywords like "Jérôme", "Jerome" lead to different answers.
XKSearch uses Berkeley DB from Sleepycat Company as the underlying B tree
implementation.
If you have any comments, please send email to yxu@cs.ucsd.edu.
back to Yu Xu's homepage.