Adapting information retrieval systems to contexts: the case of query difficulty


Université de Toulouse Paul Sabatier


Prof. Josiane Mothe

Defense date

June 15th, 2015


Information Retrieval, Machine Learning, Difficult Queries, Selective Information Retrieval, Query Expansion, Disambiguation, Classification


In a search engine environment, users submit queries that express their information need. In response to a query, the system retrieves and displays a list of documents that it considers of interest to the user. The user examines the results and decides which information is actually relevant to their information need. However, for some queries the retrieved documents may not satisfy the user. Queries for which the search engine cannot retrieve relevant information are called difficult. Query difficulty is the research context of this thesis. Specifically, we aim to adapt information retrieval systems to difficult queries in order to improve retrieval quality.

Term ambiguity may be a cause of difficulty. For example, the query “orange” is ambiguous, since the engine does not know whether it refers to the fruit, the color, or the telephone company. We have developed a query disambiguation method that yields better search results. This method works best on difficult queries, a finding that motivates our research on predicting query difficulty. We also propose combinations of difficulty predictors in order to improve on the prediction quality of the individual predictors. Search results can also be improved by query expansion, a process that adds terms to the initial query. We propose an automatic learning method that classifies queries according to query expansion variants. Finally, we seek to optimize a parameter of a query expansion model in order to improve its results.
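To make the idea of adding terms to the initial query concrete, here is a minimal Python sketch of one generic expansion technique, pseudo-relevance feedback: it assumes the top-ranked documents are relevant and appends the most frequent new terms found in them. The function `expand_query`, its parameters `k` and `n_terms`, and the toy documents are hypothetical illustrations; this is not the expansion model studied in the thesis.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, k=3, n_terms=2, stopwords=frozenset()):
    """Hypothetical pseudo-relevance feedback sketch (not the thesis's model).

    Treats the top-k documents of an initial ranking as relevant and appends
    the n_terms most frequent terms from them that are not already in the
    query and are not stopwords.
    """
    counts = Counter()
    for doc in ranked_docs[:k]:  # only the top-k feedback documents
        for term in doc.lower().split():
            if term not in query_terms and term not in stopwords:
                counts[term] += 1
    # most_common sorts by frequency; equal counts keep first-seen order
    new_terms = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + new_terms


# Toy initial ranking for the ambiguous query "orange" (illustrative data)
docs = [
    "orange juice fruit vitamin",
    "orange fruit citrus tree",
    "orange fruit juice recipe",
    "orange telecom mobile offer",
]
print(expand_query(["orange"], docs, k=3, n_terms=2))  # ['orange', 'fruit', 'juice']
```

Because the added terms come entirely from the top-ranked documents, the expanded query is only as good as that initial ranking: here the feedback documents happen to be about the fruit, so the expansion pulls the query toward that sense. The values of `k` and `n_terms` are arbitrary choices for the example.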