2015
Conferences
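DeShaTo: Describing the Shape of Cumulative Topic Distributions to Rank Retrieval Systems without Relevance Judgments
Radu Tudor Ionescu; Adrian-Gabriel Chifu; Josiane Mothe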
International Symposium on String Processing and Information Retrieval (SPIRE 2015), pp. 75–82, Springer, 2015.
Tags: Document Topic Distribution, Information Retrieval, Kurtosis, LDA, Ranking Retrieval Systems, Skewness, Topic Modeling
@conference{ionescu2015deshato,
title = {{DeShaTo}: Describing the Shape of Cumulative Topic Distributions to Rank Retrieval Systems without Relevance Judgments},
author = {Radu Tudor Ionescu and Adrian-Gabriel Chifu and Josiane Mothe},
url = {https://oatao.univ-toulouse.fr/15354/1/ionescu_15354.pdf},
year = {2015},
date = {2015-09-01},
urldate = {2015-01-01},
booktitle = {International Symposium on String Processing and Information Retrieval},
pages = {75--82},
publisher = {Springer},
series = {SPIRE2015},
abstract = {This paper investigates an approach for estimating the effectiveness of any IR system. The approach is based on the idea that a set of documents retrieved for a specific query is highly relevant if there are only a small number of predominant topics in the retrieved documents. The approach first determines the topic probability distribution of each document offline, using Latent Dirichlet Allocation (LDA). Then, for a retrieved set of documents, two probability distribution shape descriptors, namely the skewness and the kurtosis, are used to compute a score based on the shape of the cumulative topic distribution of that set of documents. The proposed model is termed DeShaTo, which is short for Describing the Shape of cumulative Topic distributions. In this work, DeShaTo is used to rank retrieval systems without relevance judgments. In most cases, the empirical results are better than those of the state-of-the-art approach. Unlike other approaches, DeShaTo works independently for each system. Therefore, it remains reliable even when there are fewer systems to be ranked by relevance.},
keywords = {Document Topic Distribution, Information Retrieval, Kurtosis, LDA, Ranking Retrieval Systems, Skewness, Topic Modeling},
pubstate = {published},
tppubtype = {conference}
}
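The abstract outlines a pipeline: LDA topic distributions per document, a cumulative topic distribution per retrieved set, and a score derived from the skewness and kurtosis of that distribution, which is then used to rank systems. The Python sketch below illustrates such a pipeline under assumptions that the abstract does not state: the cumulative distribution is taken as the renormalised sum of the document topic vectors, the K topic probabilities are treated as a sample of values when computing moments, the per-set score is a hypothetical product of skewness and (Pearson) kurtosis, and a system's overall score is its mean over queries. All function names are hypothetical, LDA inference is not shown (topic vectors are assumed to be precomputed), and this is not the authors' implementation. Four topics keep the example small; a real LDA model would use many more.

```python
from statistics import fmean

# Hypothetical DeShaTo-style sketch. The way skewness and kurtosis are
# combined here (a product) is an assumption, not the paper's formula.

EPS = 1e-15  # variance below this is treated as a perfectly flat distribution


def cumulative_topic_distribution(doc_topics):
    """Sum per-document topic vectors (assumed precomputed with LDA) and
    renormalise so the result sums to 1."""
    k = len(doc_topics[0])
    totals = [sum(doc[t] for doc in doc_topics) for t in range(k)]
    total = sum(totals)
    return [v / total for v in totals]


def _central_moment(xs, order):
    """Population central moment of the given order."""
    mu = fmean(xs)
    return fmean([(x - mu) ** order for x in xs])


def skewness(xs):
    """Population skewness m3 / m2**1.5 of the topic probabilities."""
    m2 = _central_moment(xs, 2)
    if m2 < EPS:
        return 0.0
    return _central_moment(xs, 3) / m2 ** 1.5


def kurtosis(xs):
    """Population (Pearson, non-excess) kurtosis m4 / m2**2."""
    m2 = _central_moment(xs, 2)
    if m2 < EPS:
        return 0.0
    return _central_moment(xs, 4) / m2 ** 2


def shape_score(doc_topics):
    """Score one retrieved set; intended to grow when a few topics dominate.
    Assumption: score = skewness * kurtosis."""
    dist = cumulative_topic_distribution(doc_topics)
    return skewness(dist) * kurtosis(dist)


def rank_systems(runs):
    """runs maps system name -> {query id -> list of document topic vectors}.
    Each system is scored from its own runs only (mean over queries)."""
    means = {
        name: fmean([shape_score(docs) for docs in per_query.values()])
        for name, per_query in runs.items()
    }
    return sorted(means, key=means.get, reverse=True)


runs = {
    # Retrieved documents spread evenly over all four topics.
    "diffuse": {"q1": [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]},
    # Retrieved documents dominated by a single topic.
    "focused": {"q1": [[0.85, 0.05, 0.05, 0.05], [0.85, 0.05, 0.05, 0.05]]},
}
print(rank_systems(runs))  # ['focused', 'diffuse']
```

Because each system's score is computed only from its own retrieved documents, adding or removing other systems leaves that score unchanged; only the final ordering depends on which systems are compared. This mirrors the per-system independence that the abstract highlights.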