Query, indexing and benchmark for XML-based bibliography databases

Pan, Chunrong
Computer Science

XML is becoming the de facto standard for data exchange over the Internet. In particular, many XML databases consist of a large set of small XML documents with similar structure, which creates a need to store and retrieve information from these documents efficiently. An XML-based bibliography file is one such collection. The problem we address is how to build indexes and support queries over this kind of XML collection quickly and effectively. Because each document is small, a query result need not be a tree or subtree of an XML document; instead, each query can return the IDs of the matching documents, which are then used to retrieve the entire documents. To solve this problem, we propose storing the set of XML documents in B+ tree inverted files and answering queries through the B+ tree structure.

This project uses XML as a bridge that combines databases and information retrieval into one application supporting storage, indexing and querying. Indexes can be built on entire paths or on keywords, so the system can return document IDs for both path queries and keyword queries. Under the assumption that documents are small, it answers queries with simple path structures or terms very efficiently. We have constructed a DBLP B+ tree, a DBLP Author B+ tree and a DBLP Title B+ tree, which together let users search and retrieve information from a large XML-based bibliography database efficiently. We have conducted many experiments to measure the construction time of each B+ tree and to evaluate query efficiency.
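The idea of an inverted file that maps index keys (full element paths or keywords) to the IDs of the small documents containing them can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from this work: a sorted Python list searched with `bisect` stands in for an on-disk B+ tree (it offers the same ordered point lookups and prefix scans, but not the paging behavior), path keys use a hypothetical `/root/child` format, and keywords are lowercase alphanumeric tokens. The names `SortedIndex`, `index_document` and `keyword_query` are hypothetical illustrations only.

```python
import bisect
import re
import xml.etree.ElementTree as ET


class SortedIndex:
    """Ordered map from key to a sorted list of document IDs.

    Stand-in for a B+ tree inverted file (assumption): keys are kept
    sorted so lookups and prefix scans use binary search.
    """

    def __init__(self):
        self._keys = []      # sorted list of distinct keys
        self._postings = {}  # key -> sorted list of document IDs

    def insert(self, key, doc_id):
        if key not in self._postings:
            bisect.insort(self._keys, key)
            self._postings[key] = []
        postings = self._postings[key]
        i = bisect.bisect_left(postings, doc_id)
        if i == len(postings) or postings[i] != doc_id:
            postings.insert(i, doc_id)  # keep postings sorted, no duplicates

    def lookup(self, key):
        return list(self._postings.get(key, []))

    def prefix_scan(self, prefix):
        """Union of postings for all keys starting with prefix."""
        result = set()
        i = bisect.bisect_left(self._keys, prefix)
        # Keys sharing a prefix are contiguous in sorted order.
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            result.update(self._postings[self._keys[i]])
            i += 1
        return sorted(result)


def index_document(doc_id, xml_text, path_index, term_index):
    """Add every element path and every text keyword of one document."""
    root = ET.fromstring(xml_text)

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        path_index.insert(path, doc_id)
        if elem.text and elem.text.strip():
            for term in re.findall(r"[a-z0-9]+", elem.text.lower()):
                term_index.insert(term, doc_id)
        for child in elem:
            walk(child, path)

    walk(root, "")


def keyword_query(term_index, *terms):
    """IDs of documents containing every term (AND semantics)."""
    sets = [set(term_index.lookup(t.lower())) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    docs = {
        1: "<article><author>Jane Doe</author><title>XML Indexing</title></article>",
        2: "<inproceedings><author>John Roe</author><title>B+ Tree Search</title></inproceedings>",
        3: "<article><author>John Roe</author><title>XML Query Processing</title></article>",
    }
    path_index, term_index = SortedIndex(), SortedIndex()
    for doc_id, xml_text in docs.items():
        index_document(doc_id, xml_text, path_index, term_index)

    print(path_index.lookup("/article/author"))      # [1, 3]
    print(keyword_query(term_index, "xml"))           # [1, 3]
    print(keyword_query(term_index, "john", "xml"))   # [3]
    print(path_index.prefix_scan("/in"))              # [2]
```

Both queries return only document IDs, matching the approach described above of fetching whole documents by ID rather than returning subtrees. Separate per-field indexes, similar in spirit to the Author and Title trees, could be built by restricting `term_index` inserts to particular element tags; that variation is likewise an assumption, not a description of the original implementation.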
