Keyword search has been seen in recent years as an attractive way for querying data with some form of structure. Indeed, it allows simple users to extract information from databases without mastering a complex structured query language and without having knowledge of the schema of the data. It also allows for integrated search of heterogeneous data sources. However, as keyword queries are ambiguous and not expressive enough, keyword search cannot scale satisfactorily on big datasets and the answers are, in general, of low accuracy. Therefore, flat keyword search alone cannot efficiently return high quality results on large data with structure. In this dissertation, the algorithm improve keyword search over databases by exploiting semantic information of the data and by extending flat keyword queries with semantic information. First, it develop an algorithm for keyword search over graph databases which exploits tree canonical forms and techniques developed for mining tree patterns. The algorithm substantially reduces the number of redundant intermediate results generated, which is the bottleneck of query evaluation algorithms. Our experiments show that it outperforms previous algorithms by one to two orders of magnitude in terms of efficiency and memory consumption and displays smooth scalability. Furthermore, the algorithm leverages semantic information of the data to address the aforementioned problems. The method follows a schema-based approach for evaluating keyword queries on relational databases, which computes patterns mapped onto the schema graph of the database. Pattern graphs are representatives for clusters of query results. As such, they are much less numerous than the actual query results and can be translated into SQL queries on the relational database which can produce the results in the cluster. Our pattern graphs allow keywords to match schema elements and capture key-foreign key relationships and inclusion relationships. The the system employ information-retrieval-based and semantics-based techniques for scoring query pattern graphs and design an efficient top-k algorithm for computing the patterns graphs of a keyword query. Finally, the dissertation study employing keyword queries enhanced with cohesiveness constraints (cohesive keyword queries) to query relational databases. Cohesive keyword queries bridge the gap between flat keyword queries and structured queries. The dissertation formally define semantics for cohesive queries on relational databases and design an efficient evaluation algorithm. The experimental results show that cohesive keyword queries substantially improve the quality of the results of flat keyword queries and the performance of their evaluation. Most importantly, these improvements are attained without compromising the simplicity and convenience of traditional keyword search.
|