01 Jul 2020

Fair Exposure of Documents in Information Retrieval: a Community Detection Approach

CIRCLE2020

Keywords: Information systems, Information retrieval, Fair document exposure, Document network, Document communities, Document re-ranking

Conference Papers / International Conference Papers: Adrian Chifu, Josiane Mothe, Md Zia Ullah

Although search engines and recommendation systems are designed mainly to answer users’ needs, they do not necessarily guarantee the exposure of the data they store and index, even though such exposure can be essential for information providers. A recent research direction, the so-called “fair” exposure of documents, tackles this problem in information retrieval. It has mainly been cast as a re-ranking problem with constraints and optimization functions. This paper presents the first steps toward a new framework for fair document exposure. This framework is based on document linking and document community detection; communities are used to rank the documents to be retrieved according to an information need. In addition to the first step of this new framework, we present its potential through both a toy example and a few illustrative examples from the 2019 TREC Fair Ranking Track data set.
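To make the idea of community-based ranking concrete, here is a minimal sketch. The abstract does not say which community detection algorithm or ranking strategy the paper uses, so both choices below are assumptions made only for illustration: communities are approximated by connected components of the document link graph, and exposure is spread by taking one document per community in turn. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def connected_components(doc_ids, links):
    """Group documents into communities.

    Assumption: a community is simply a connected component of the
    link graph (a stand-in for a real community detection algorithm).
    """
    parent = {d: d for d in doc_ids}

    def find(d):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for a, b in links:
        # Ignore links that point to documents outside the ranked list.
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for d in doc_ids:  # iteration order preserves the original ranking
        groups[find(d)].append(d)
    return list(groups.values())


def interleave_by_community(ranked_docs, links):
    """Re-rank a relevance-ordered list, one document per community in turn."""
    communities = connected_components(ranked_docs, links)
    rounds = zip_longest(*communities)
    return [d for rnd in rounds for d in rnd if d is not None]


ranked = ["d1", "d2", "d3", "d4", "d5"]
links = [("d1", "d2"), ("d2", "d3")]
reranked = interleave_by_community(ranked, links)
print(reranked)
```

In this toy run, d1, d2 and d3 form one community, so d4 and d5 are promoted ahead of d2 and d3 instead of being pushed below three documents from the same community.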

01 Dec 2018

Feature Selection for Spectral Clustering: to Help or Not to Help Spectral Clustering when Performing Sense Discrimination for IR?

Open Computer Science, Volume 8, Issue 1, p. 218–227.

Keywords: word sense discrimination; information retrieval; query disambiguation; spectral clustering

International Journal Papers / Journal Papers: Adrian Chifu, Florentina Hristea

Whether or not word sense disambiguation (WSD) can improve information retrieval (IR) results is a topic that has been intensely debated over the years, with many inconclusive or contradictory conclusions. The most rarely used type of WSD for this task is the unsupervised one, although it has been proven to be beneficial at a large scale. Our study builds on existing research and tries to improve the most recent unsupervised method, which is based on spectral clustering. It investigates the possible benefits of “helping” spectral clustering through feature selection when it performs sense discrimination for IR. Results obtained so far, involving large data collections, encourage us to point out the importance of feature selection even in the case of this advanced, state-of-the-art clustering technique that is known for performing its own feature weighting. By suggesting an improvement of what we consider the most promising approach to the use of WSD in IR, and by commenting on its possible extensions, we state that WSD still holds promise for IR and hope to stimulate continuation of this line of research, perhaps at an even more successful level.

01 Dec 2015

Statistical Analysis to Establish the Importance of Information Retrieval Parameters

Journal of Universal Computer Science, Consortium J.UCS, Special Issue Information Retrieval and Recommendation, Vol. 21 N. 13 (2015), p. 1767-1789.

Keywords: Information Retrieval, query difficulty, query clustering, IR system parameters, Random Forest.

International Journal Papers / Journal Papers: Julie Ayter, Adrian Chifu, Sebastien Déjean, Cecile Desclaux, Josiane Mothe

Search engines are based on models to index documents, match queries and documents, and rank documents. Research in Information Retrieval (IR) aims at defining these models and their parameters in order to optimize the results. Using benchmark collections, it has been shown that there is no single best system configuration that works for any query, but rather that performance varies from one query to another. It would be interesting if a meta-system could decide which system configuration should process a new query by learning from the context of previous queries. This paper reports a deep analysis considering more than 80,000 search engine configurations applied to 100 queries and the corresponding performance. The goal of the analysis is to identify which configuration responds best to a certain type of query. We considered two approaches to define query types: one is post-evaluation, based on query clustering according to the performance measured with Average Precision, while the second approach is pre-evaluation, using query features (including query difficulty predictors) to cluster queries. Globally, we identified two parameters that should be optimized: retrieving model and TrecQueryTags process. One could expect such results, as these two parameters are major components of the IR process. However, our work results in two main conclusions: (1) based on the post-evaluation approach, we found that retrieving model is the most influential parameter for easy queries, while TrecQueryTags process is the most influential for hard queries; (2) for pre-evaluation, current query features do not allow queries to be clustered in a way that identifies differences in the influential parameters.

01 Mar 2015

Word Sense Discrimination in Information Retrieval: A Spectral Clustering-based Approach

Information Processing & Management, Elsevier, Vol. 51, p. 16-31

Keywords: Information retrieval, Word sense disambiguation, Word sense discrimination, Spectral clustering, High precision

International Journal Papers / Journal Papers / Selected: Adrian Chifu, Florentina Hristea, Josiane Mothe, Marius Popescu

Word sense ambiguity has been identified as a cause of poor precision in information retrieval (IR) systems. Word sense disambiguation and discrimination methods have been defined to help systems choose which documents should be retrieved in relation to an ambiguous query. However, the only approaches that show a genuine benefit for word sense discrimination or disambiguation in IR are generally supervised ones. In this paper we propose a new unsupervised method that uses word sense discrimination in IR. The method we develop is based on spectral clustering and reorders an initially retrieved document list by boosting documents that are semantically similar to the target query. For several TREC ad hoc collections we show that our method is useful in the case of queries which contain ambiguous terms. We are interested in improving precision after 5, 10 and 30 retrieved documents (P@5, P@10 and P@30, respectively). We show that precision can be improved by 8% over current state-of-the-art baselines. We also focus on poorly performing queries.
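The P@5, P@10 and P@30 measures mentioned above follow the standard definition of precision at a cutoff k: the fraction of the top k retrieved documents that are relevant. A minimal sketch, where the function name and the toy document identifiers are hypothetical:

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant.

    The denominator is always k, so a ranking with fewer than k
    documents is penalized for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


ranking = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "z"}
print(precision_at_k(ranking, relevant, 5))  # 2 relevant among the top 5
```

Re-ranking methods such as the one described in this abstract aim to move relevant documents above the cutoff, which raises this value without changing the set of retrieved documents.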

28 Dec 2012

Word Sense Disambiguation to Improve Precision for Ambiguous Queries

Central European Journal of Computer Science, Versita, co-published with Springer Verlag, London, UK, Vol. 2 N. 4, p. 398-411

Keywords: Information retrieval, Word sense disambiguation, Naïve bayes classification, Difficult queries, Ambiguous queries, Document clustering, Fusion functions

International Journal Papers / Journal Papers: Adrian Chifu, Radu Tudor Ionescu

Success in Information Retrieval (IR) depends on many variables. Several interdisciplinary approaches try to improve the quality of the results obtained by an IR system. In this paper we propose a new way of using word sense disambiguation (WSD) in IR. The method we develop is based on Naïve Bayes classification and can be used both as a filtering and as a re-ranking technique. We show on the TREC ad-hoc collection that WSD is useful in the case of queries which are difficult due to sense ambiguity. We aim to improve precision after 5, 10 and 30 retrieved documents (P@5, P@10 and P@30, respectively) for such lowest-precision queries.

01 Dec 2020

Prédire l’intensité de contradiction dans les commentaires : faible, forte ou très forte ? [Predicting contradiction intensity in reviews: low, strong or very strong?] (to appear)

Revue d'Intelligence Artificielle (RIA 2020)


Journal Papers / National Journal Papers: Ismaïl Badache, Sébastien Fournier, Adrian Chifu

01 Dec 2019

Prédire l’intensité de contradiction dans les commentaires : faible, forte ou très forte ? [Predicting contradiction intensity in reviews: low, strong or very strong?]

Le Bulletin de l'Association Française pour l'Intelligence Artificielle (AFIA 2019)

Keywords: Sentiment analysis, Aspects detection, Criteria evaluation, Contradiction intensity

Journal Papers / National Journal Papers: Ismaïl Badache, Sébastien Fournier, Adrian Chifu

Reviews on web resources (e.g. courses, movies) are increasingly exploited in text analysis tasks (e.g. opinion detection, controversy detection). This paper investigates contradiction intensity in reviews by exploiting different features, such as variation of ratings and variation of polarities around specific entities (e.g. aspects, topics). Firstly, aspects are identified according to the distributions of the emotional terms in the vicinity of the most frequent nouns in the review collection. Secondly, the polarity of each review segment containing an aspect is estimated. Only resources containing these aspects with opposite polarities are considered. Finally, some features are evaluated, using feature selection algorithms, to determine their impact on the effectiveness of contradiction intensity detection. The selected features are used to train state-of-the-art learning approaches. The experiments are conducted on the Massive Open Online Courses data set containing 2244 courses and their 73,873 reviews, collected from coursera.org. Results showed that variation of ratings, variation of polarities, and review quantity are the best predictors of contradiction intensity. Also, J48 was the most effective learning approach for this type of classification.

11 May 2020

DeepNLPF: A Framework for Integrating Third Party NLP Tools

LREC2020

Keywords: Natural Language Processing, NLP tools integration, Framework

Conference Papers / International Conference Papers: Francisco Rodrigues, Rinaldo Lima, William Domingues, Robson Fidalgo, Adrian Chifu, Bernard Espinasse, Sébastien Fournier

Natural Language Processing (NLP) of textual data is usually broken down into a sequence of several subtasks, where the output of one of the subtasks becomes the input to the following one, which constitutes an NLP pipeline. Many third-party NLP tools are currently available, each performing distinct NLP subtasks. However, it is difficult to integrate several NLP toolkits into a pipeline due to many problems, including different input/output representations or formats, distinct programming languages, and tokenization issues. This paper presents DeepNLPF, a framework that enables easy integration of third-party NLP tools, allowing the user to preprocess natural language texts at lexical, syntactic, and semantic levels. The proposed framework also provides an API for complete pipeline customization, including the definition of input/output formats, integration plugin management, transparent multiprocessing execution strategies, corpus-level statistics, and database persistence. Furthermore, the user-friendly DeepNLPF GUI allows its use even by a non-expert NLP user. We conducted a runtime performance analysis showing that DeepNLPF not only easily integrates existing NLP toolkits but also significantly reduces processing time compared to executing the same NLP pipeline in a sequential manner.

01 Jun 2019

The R2I_LIS Team Proposes Majority Vote for VarDial’s MRC Task

Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, co-located with NAACL 2019 (VarDial2019 @ NAACL2019)

Keywords: Dialect classification, Feature engineering, Majority vote, Competition

Conference Papers / International Conference Papers: Adrian Chifu

This article presents the model that generated the runs submitted by the R2I_LIS team to the VarDial2019 evaluation campaign, more particularly to the binary classification by dialect sub-task of the Moldavian vs. Romanian Cross-dialect Topic identification (MRC) task. The team proposed a model based on a majority vote among five supervised machine learning models trained on forty manually crafted features. One of the three submitted runs was ranked second in the binary classification sub-task, with a performance of 0.7963 in terms of the macro-F1 measure. The other two runs were ranked third and fourth, respectively.
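The voting step described above can be sketched as follows. The abstract does not name the five classifiers or the features, so the example starts from already predicted labels; the label strings "MD" and "RO" and the function names are assumptions for illustration only.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by most models for one sample.

    With five voters and two classes there is never a tie.
    """
    return Counter(labels).most_common(1)[0][0]


def vote_per_sample(model_outputs):
    """Combine per-model prediction lists into one label per sample.

    model_outputs holds one list of labels per model, all of equal length.
    """
    return [majority_vote(sample_labels) for sample_labels in zip(*model_outputs)]


# Hypothetical predictions of five models on three samples.
model_outputs = [
    ["MD", "RO", "RO"],
    ["MD", "RO", "MD"],
    ["RO", "RO", "MD"],
    ["MD", "MD", "MD"],
    ["MD", "RO", "RO"],
]
print(vote_per_sample(model_outputs))
```

Using an odd number of voters on a binary task is what makes the vote unambiguous; with an even number, a tie-breaking rule would be needed.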

07 Apr 2019

On the Use of Dependencies in Relation Classification of Text with Deep Learning

20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing2019)

Keywords: Dependencies, Relation Classification, Deep Learning, Word Embedding, Compositional Word Embedding

Conference Papers / International Conference Papers: Bernard Espinasse, Sébastien Fournier, Adrian Chifu, Gaël Guibon, René Azcurra, Valentin Mace

Deep Learning is increasingly used in NLP tasks, such as relation classification of texts. This paper assesses the impact of syntactic dependencies in this task at two levels. The first level concerns the generic Word Embedding (WE) used as input of the classification model; the second level concerns the corpus whose relations have to be classified. In this paper, two classification models are studied. The first one is based on a CNN using a generic WE and does not take into account the dependencies of the corpus to be processed. The second one is based on a compositional WE combining a generic WE with syntactic annotations of the corpus to be classified. The impact of dependencies in relation classification is estimated using two different WEs. The first one is essentially lexical and trained on the English Wikipedia corpus, while the second one is also syntactic, trained on the same corpus previously annotated with syntactic dependencies. The two classification models are evaluated on the SemEval 2010 reference corpus using these two generic WEs. The experiments show the importance of taking dependencies into account at different levels in relation classification.

01 Jul 2018

Query Performance Prediction Focused on Summarized Letor Features

41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR2018

Keywords: Query performance prediction, Query difficulty prediction, Query features, Post retrieval features, Letor features

Conference Papers / International Conference Papers / Selected: Adrian Chifu, Léa Laporte, Josiane Mothe, Md Zia Ullah

Query performance prediction (QPP) aims at automatically estimating the information retrieval system effectiveness for any user’s query. Previous work has investigated several types of pre- and post-retrieval query performance predictors; the latter have been shown to be more effective. In this paper we investigate the use of features that were initially defined for learning to rank in the task of QPP. While these features have been shown to be useful for learning to rank documents, they have never been studied as query performance predictors. We developed more than 350 variants of them based on summary functions. Conducting experiments on four TREC standard collections, we found that Letor-based features appear to be better query performance predictors than predictors from the literature. Moreover, we show that combining the best Letor features outperforms the state-of-the-art query performance predictors. This is the first study that considers such an amount and variety of Letor features for QPP and that demonstrates they are appropriate for this task.
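The idea of turning per-document Letor features into per-query predictors through summary functions can be sketched as below. The abstract does not list the summary functions or the features used, so the five functions and the feature names "bm25" and "tf" are assumptions chosen for illustration.

```python
import statistics

# Assumed set of summary functions; the paper's actual set is not given here.
SUMMARY_FUNCTIONS = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "std": statistics.pstdev,
}


def summarize_letor_features(doc_features):
    """Collapse per-document features into per-query predictor values.

    doc_features is a list with one dict per retrieved document,
    mapping a feature name to its numeric value for that document.
    """
    summary = {}
    for name in doc_features[0]:
        values = [doc[name] for doc in doc_features]
        for fn_name, fn in SUMMARY_FUNCTIONS.items():
            summary[f"{name}_{fn_name}"] = fn(values)
    return summary


# Two retrieved documents for one query, with hypothetical feature values.
docs = [{"bm25": 1.0, "tf": 2.0}, {"bm25": 3.0, "tf": 4.0}]
for key, value in summarize_letor_features(docs).items():
    print(key, value)
```

Each feature yields one candidate predictor per summary function, which is how a modest number of Letor features can grow into several hundred variants.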

01 Jul 2018

Predicting Contradiction Intensity: Low, Strong or Very Strong?

41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR2018

Keywords: Sentiment, Aspect, Feature evaluation, Contradiction intensity

Conference Papers / International Conference Papers / Selected: Ismaïl Badache, Sébastien Fournier, Adrian Chifu

Reviews on web resources (e.g. courses, movies) are increasingly exploited in text analysis tasks (e.g. opinion detection, controversy detection). This paper investigates contradiction intensity in reviews by exploiting different features, such as variation of ratings and variation of polarities around specific entities (e.g. aspects, topics). Firstly, aspects are identified according to the distributions of the emotional terms in the vicinity of the most frequent nouns in the review collection. Secondly, the polarity of each review segment containing an aspect is estimated. Only resources containing these aspects with opposite polarities are considered. Finally, some features are evaluated, using feature selection algorithms, to determine their impact on the effectiveness of contradiction intensity detection. The selected features are used to train state-of-the-art learning approaches. The experiments are conducted on the Massive Open Online Courses data set containing 2244 courses and their 73,873 reviews, collected from coursera.org. Results showed that variation of ratings, variation of polarities, and review quantity are the best predictors of contradiction intensity. Also, J48 was the most effective learning approach for this type of classification.

09 Jul 2018

Challenges to knowledge organization in the era of social media. The case of social controversies

15th International ISKO Conference, ISKO2018

Keywords: controversy mediation, social media, Twitter, post-truth, social capital, societal challenges to knowledge organization

Conference Papers / International Conference Papers: Nathanaëla Andrianasolo, Adrian-Gabriel Chifu, Sébastien Fournier, Fidelia Ibekwe-SanJuan

In this paper, we look at how social media, in particular Twitter, are used to trigger, propagate and regulate opinions, and social controversies. Social media platforms are displacing the mainstream media and traditional sources of knowledge by facilitating the propagation of ideologies and causes championed by different groups of people. This results in pressures being brought to bear on institutions in the real world which are forced to make hasty decisions based on social media campaigns. The new forms of activism and the public arena enabled by social media platforms have also facilitated the propagation of so-called “post-truth” and “alternative facts” that obfuscate the traditional processes of knowledge elaboration which took decades to arrive at. This poses serious challenges for Knowledge Organization systems (KOs) that the KO community needs to find ways to address.

01 Mar 2018

Contradiction in Reviews: is it Strong or Low?

40th European Conference on Information Retrieval, ECIR 2018 - BroDyn

Keywords: sentiment analysis, aspect detection, contradiction intensity

Conference Papers / International Conference Papers: Ismaïl Badache, Sébastien Fournier, Adrian Chifu

Analysis of opinions (reviews) generated by users is increasingly exploited by a variety of applications. It makes it possible to follow the evolution of opinions or to carry out investigations on web resources (e.g. courses, movies, products). The detection of contradictory opinions is an important task for evaluating such resources. This paper focuses on the problem of detecting and estimating contradiction intensity based on the sentiment analysis around specific aspects of a resource. Firstly, certain aspects are identified according to the distributions of the emotional terms in the vicinity of the most frequent nouns across the whole set of reviews. Secondly, the polarity of each review segment containing an aspect is estimated using the state-of-the-art approach SentiNeuron. Then, only the resources containing these aspects with opposite polarities (positive, negative) are considered. Thirdly, a measure of the intensity of the contradiction is introduced. It is based on the joint dispersion of the polarity and the rating of the reviews containing the aspects within each resource. The evaluation of the proposed approach is conducted on the Massive Open Online Courses collection containing 2244 courses and their 73,873 reviews, collected from Coursera. The results revealed the effectiveness of the proposed approach to detect and quantify contradictions.
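One way to read "joint dispersion of the polarity and the rating" is as the spread of (polarity, rating) points around their centroid. The sketch below follows that reading; the exact formula used in the paper is not given in this abstract, so the rescaling of ratings to [-1, 1], the Euclidean distance, and the function name are all assumptions.

```python
import math


def contradiction_intensity(reviews, min_rating=1, max_rating=5):
    """Mean distance of (polarity, rating) points to their centroid.

    reviews is a list of (polarity, rating) pairs for one aspect of one
    resource, with polarity assumed to lie in [-1, 1]. Ratings are
    rescaled to [-1, 1] so both dimensions weigh equally (an assumption).
    """
    if not reviews:
        raise ValueError("at least one review is required")
    span = max_rating - min_rating
    points = [(pol, 2 * (rat - min_rating) / span - 1) for pol, rat in reviews]
    centroid_pol = sum(p for p, _ in points) / len(points)
    centroid_rat = sum(r for _, r in points) / len(points)
    distances = [math.hypot(p - centroid_pol, r - centroid_rat) for p, r in points]
    return sum(distances) / len(distances)


agreeing = [(0.9, 5), (0.8, 5), (0.9, 4)]
opposed = [(1.0, 5), (-1.0, 1)]
print(contradiction_intensity(agreeing))
print(contradiction_intensity(opposed))
```

Under this reading, reviews that agree cluster near their centroid and give a value close to zero, while reviews with opposite polarities and ratings lie far from it and give a larger value.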

01 Oct 2017

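Les réactions des stakeholders aux allégations d’irresponsabilité organisationnelle : le cas du scandale Volkswagen [Stakeholder reactions to allegations of organizational irresponsibility: the case of the Volkswagen scandal]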

12th RIODD Congress


Conference Papers / International Conference Papers: Amélie Bohas, Marie-Aude Abid-Dupont, Tarek Abid, Adrian Chifu, Sébastien Fournier

01 Nov 2017

Finding and Quantifying Temporal-Aware Contradiction in Reviews

The 13th Asia Information Retrieval Societies Conference, AIRS2017

Keywords: Sentiment analysis, Aspect detection, Contradiction intensity

Conference Papers / International Conference Papers: Ismaïl Badache, Sébastien Fournier, Adrian Chifu

Opinions (reviews) on web resources (e.g., courses, movies), generated by users, are increasingly exploited in text analysis tasks, the detection of contradictory opinions being one of them. This paper focuses on the quantification of sentiment-based contradictions around specific aspects in reviews. However, it is necessary to study the contradictions with respect to the temporal dimension of reviews (their sessions). In general, for web resources such as online courses (e.g. Coursera or edX), reviews are often generated during the course sessions. Between sessions, users stop reviewing courses, and courses may be updated. So, in order to avoid the confusion of contradictory reviews coming from two or more different sessions, the reviews related to a given resource should first be grouped according to their corresponding session. Secondly, aspects are identified according to the distributions of the emotional terms in the vicinity of the most frequent nouns in the review collection. Thirdly, the polarity of each review segment containing an aspect is estimated. Then, only resources containing these aspects with opposite polarities are considered. Finally, the contradiction intensity is estimated based on the joint dispersion of polarities and ratings of the reviews containing aspects. The experiments are conducted on the Massive Open Online Courses data set containing 2244 courses and their 73,873 reviews, collected from coursera.org. The results confirm the effectiveness of our approach to find and quantify contradiction intensity.

01 Sep 2017

Role of social media in propagating controversies: the case of cultural microblog feeds

The 8th Conference and Labs of the Evaluation Forum - CLEF Microblog Cultural Contextualization, CLEF2017

Keywords: Focus IR, opinion mining, information visualization

Conference Papers / International Conference Papers: Adrian Chifu, Fidelia Ibekwe-SanJuan, Nathanaëla Andrianasolo

The aim of this research is to investigate how social media mediate social controversies in the public arena. To this end, we will use the CLEF MC2 corpus of microblogs, which captured long-term political and cultural controversies, in order to follow the birth and development of controversies across time and pinpoint the increasing role that social media play in their propagation, regulation and resolution.

01 Sep 2017

Harnessing Ratings and Aspect-Sentiment to Estimate Contradiction Intensity in Temporal-Related Reviews

21st International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES2017

Keywords: Sentiment Analysis, Aspect Extraction, Rating, Review, Time, Contradiction Intensity

Conference Papers / International Conference Papers: Ismaïl Badache, Sébastien Fournier, Adrian Chifu

Analysis of opinions (reviews) generated by users is increasingly exploited by a variety of applications. It makes it possible to follow the evolution of opinions or to carry out investigations on products. The detection of contradictory opinions about a web resource (e.g., courses, movies, products) is an important task for evaluating that resource. This paper focuses on the problem of detecting contradictions in reviews based on the sentiment analysis around specific aspects of a resource (document). In general, for web resources such as online courses (e.g. on Coursera or edX), reviews are often generated during course sessions. Between sessions, users stop reviewing the course, and the course may be updated. So, in order to avoid the confusion of contradictory reviews coming from two or more different sessions, the reviews related to a given resource should first be grouped according to their session. Secondly, certain aspects are extracted according to the distributions of the emotional terms in the vicinity of the most frequent nouns in the review collection. Thirdly, the polarity of each review segment containing an aspect is identified. Then, only the resources containing these aspects with opposite polarities (positive, negative) are retained. Finally, we propose a measure of contradiction intensity based on the joint dispersion of the polarity and the rating of the reviews containing the aspects within each resource. The evaluation of our approach is conducted on the Massive Open Online Courses (MOOC) collection containing 2244 courses and their 73,873 reviews, collected from Coursera. The experimental results revealed the effectiveness of the proposed approach to capture and quantify contradiction intensity.

01 Apr 2017

Human-Based Query Difficulty Prediction

39th European Conference on Information Retrieval, ECIR 17

Keywords: Free Text, Query Term, Free Text Comment, Human Annotator, Query Suggestion

Conference Papers / International Conference Papers / Selected: Adrian-Gabriel Chifu, Sébastien Déjean, Stefano Mizzaro, Josiane Mothe

The purpose of an automatic query difficulty predictor is to decide whether an information retrieval system is able to provide the most appropriate answer for a current query. Researchers have investigated many types of automatic query difficulty predictors. These are mostly related to how search engines process queries and documents: they are based on the inner workings of searching/ranking system functions, and therefore they do not provide any really insightful explanation as to the reasons for the difficulty, and they neglect user-oriented aspects. In this paper we study if humans can provide useful explanations, or reasons, of why they think a query will be easy or difficult for a search engine. We run two experiments with variations in the TREC reference collection, the amount of information available about the query, and the method of annotation generation. We examine the correlation between the human prediction, the reasons they provide, the automatic prediction, and the actual system effectiveness. The main findings of this study are twofold. First, we confirm the result of previous studies stating that human predictions correlate only weakly with system effectiveness. Second, and probably more important, after analyzing the reasons given by the annotators we find that: (i) overall, the reasons seem coherent, sensible, and informative; (ii) humans have an accurate picture of some query or term characteristics; and (iii) yet, they cannot reliably predict system/query difficulty.

01 Sep 2016

SegChainW2V: Towards a generic automatic video segmentation framework, based on lexical chains of audio transcriptions and word embeddings

20th International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES2016

Keywords: Video retrieval, Story segmentation, Lexical chains, Word embeddings, Transcriptions

Conference Papers / International Conference Papers: Adrian Chifu, Sébastien Fournier

With the advances in multimedia broadcasting through a rich variety of channels and with the popularization of video production, it becomes essential to be able to provide reliable means of retrieving information within videos, not only the videos themselves. Research in this area has been widely focused on the context of TV news broadcasts, for which the structure itself provides clues for story segmentation. The systematic employment of these clues would lead to thematically driven systems that would not be easily adaptable in the case of videos of other types. The systems are therefore dependent on the type of videos for which they have been designed. In this paper we aim at introducing SegChainW2V, a generic unsupervised framework for story segmentation, based on lexical chains from transcriptions and their vectorization. SegChainW2V takes into account the topic changes by perceiving the fluctuations of the most frequent terms throughout the video, as well as their semantics through the word embedding vectorization.

01 Jun 2016

SegChain: Towards a generic automatic video segmentation framework, based on lexical chains of audio transcriptions

6th International Conference on Web Intelligence, Mining and Semantics (WIMS'2016)

Keywords: Video retrieval, Story segmentation, Lexical chains, Transcriptions

Conference Papers / International Conference Papers: Adrian Chifu, Sébastien Fournier

About The Publication

With the advances in multimedia broadcasting through a rich variety of channels and with the popularization of video production, it becomes essential to provide reliable means of retrieving information within videos, not only the videos themselves. Research in this area has largely focused on TV news broadcasts, for which the structure itself provides clues for story segmentation. The systematic use of these clues leads to thematically driven systems that are not easily adaptable to videos of other types. The systems are therefore dependent on the type of videos for which they have been designed. In this paper we introduce SegChain, a generic unsupervised framework for story segmentation, based on lexical chains from transcriptions. SegChain takes topic changes into account by tracking the fluctuations of the most frequent terms throughout the video.

01 Sep 2015

DeShaTo: Describing the Shape of Cumulative Topic Distributions to Rank Retrieval Systems without Relevance Judgments

Symposium on String Processing and Information Retrieval (SPIRE 2015)

Keywords: information retrieval, topic modeling, LDA, document topic distribution, skewness, kurtosis, ranking retrieval systems

Conference Papers / International Conference Papers: Radu Tudor Ionescu, Adrian Chifu, Josiane Mothe

About The Publication

This paper investigates an approach for estimating the effectiveness of any IR system. The approach is based on the idea that a set of documents retrieved for a specific query is highly relevant if there are only a small number of predominant topics in the retrieved documents. The proposed approach is to determine the topic probability distribution of each document offline, using Latent Dirichlet Allocation. Then, for a retrieved set of documents, a set of probability distribution shape descriptors, namely the skewness and the kurtosis, are used to compute a score based on the shape of the cumulative topic distribution of the respective set of documents. The proposed model is termed DeShaTo, which is short for Describing the Shape of cumulative Topic distributions. In this work, DeShaTo is used to rank retrieval systems without relevance judgments. In most cases, the empirical results are better than those of the state-of-the-art approach. Compared to other approaches, DeShaTo works independently for each system. Therefore, it remains reliable even when there are fewer systems to be ranked.
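As a rough illustration of the shape descriptors mentioned in this abstract, the sketch below computes the skewness and kurtosis of a cumulative topic distribution for a retrieved set. It is not the authors' implementation: the abstract does not say how the per-document distributions are aggregated or how the two descriptors are combined into the final score, so the summation-and-renormalisation step, the function names, and the toy topic vectors are all assumptions made for illustration.

```python
def cumulative_topic_distribution(doc_topics):
    """Sum per-document topic probabilities and renormalise to sum to 1.

    Assumption: the abstract does not specify the aggregation; a plain sum is used here.
    """
    n_topics = len(doc_topics[0])
    totals = [sum(doc[t] for doc in doc_topics) for t in range(n_topics)]
    norm = sum(totals)
    return [value / norm for value in totals]


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardised moment; 0.0 by convention for a flat distribution."""
    m2 = central_moment(values, 2)
    if m2 < 1e-12:
        return 0.0
    return central_moment(values, 3) / m2 ** 1.5


def kurtosis(values):
    """Excess kurtosis; 0.0 by convention for a flat distribution."""
    m2 = central_moment(values, 2)
    if m2 < 1e-12:
        return 0.0
    return central_moment(values, 4) / m2 ** 2 - 3.0


# Hypothetical LDA outputs over four topics for two retrieved sets.
focused = [[0.90, 0.05, 0.03, 0.02], [0.85, 0.10, 0.03, 0.02]]
diffuse = [[0.30, 0.20, 0.30, 0.20], [0.20, 0.30, 0.20, 0.30]]

for name, retrieved in (("focused", focused), ("diffuse", diffuse)):
    cumulative = cumulative_topic_distribution(retrieved)
    print(name, skewness(cumulative), kurtosis(cumulative))
```

In this toy setting, the set dominated by one topic yields a more skewed cumulative distribution than the set whose topics are spread evenly, which is the intuition the abstract describes.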

01 Jul 2018

Prédire l’intensité de contradiction dans les commentaires : faible, forte ou très forte ?

29es journées francophones d'Ingénierie des Connaissances (IC2018)

Keywords: Sentiment analysis, Aspect detection, Criteria evaluation, Contradiction intensity.
(2nd best paper)

Conference Papers / National Conference Papers: Ismaïl Badache, Sébastien Fournier, Adrian Chifu

About The Publication

Comments on Web resources (e.g., courses, films) are increasingly exploited in text analysis tasks (e.g., opinion detection, controversy detection). This paper studies the intensity of contradiction in comments by exploiting different criteria, such as the variation of ratings and the variation of polarities around specific entities (e.g., aspects, topics). First, aspects are identified according to the distributions of emotional terms in the vicinity of the most frequent nouns in the comment collection. Second, the polarity is estimated for each comment segment containing an aspect. Only resources whose comments contain aspects with opposite polarities are taken into account. Finally, the criteria are evaluated, using feature selection algorithms, to determine their impact on the effectiveness of contradiction intensity detection. The selected criteria are then fed into learning models to predict the contradiction intensity. The experimental evaluation is conducted on a collection containing 2244 courses and their 73873 comments, collected from coursera.org. The results show that the variation of ratings, the variation of polarities, and the number of comments are the best predictors of contradiction intensity. In addition, J48 is the most effective learning approach for this task.

29 Mar 2017

Détection de contradiction dans les commentaires

COnférence en Recherche d'Information et Applications (CORIA2017)

Keywords: Sentiment analysis, User generated content, Contradiction

Conference Papers / National Conference Papers: Ismaïl Badache, Sébastien Fournier, Adrian Chifu

About The Publication

Abstract:

The analysis of opinions (reviews) generated by users is increasingly exploited by a variety of applications. It makes it possible to follow the evolution of opinions or to carry out surveys on products. Detecting contradictory opinions about a Web resource (e.g., courses, movies, products) is an important task for evaluating that resource. In this paper, we focus on the problem of detecting contradictions and measuring their intensity, based on sentiment analysis around specific aspects of a resource (document). First, we identify certain aspects according to the distributions of emotional terms in the vicinity of the most frequent nouns in the whole set of reviews. Second, we estimate the polarity of each review segment containing an aspect. We then keep only the resources containing these aspects with opposite polarities (positive, negative). Third, we introduce a measure of contradiction intensity based on the joint dispersion of the polarity and the rating of the reviews containing the aspects within each resource. We evaluate the effectiveness of our approach on a Massive Open Online Courses (MOOC) collection containing 2244 courses and their 73873 reviews, collected from Coursera. Our results show the effectiveness of the proposed approach in capturing contradictions significantly.
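To make the idea of a "joint dispersion of the polarity and the rating" more concrete, here is a minimal sketch under stated assumptions. The abstract does not give the formula, so the choices below (rescaling star ratings to [-1, 1], treating each review as a two-dimensional point, and using the mean Euclidean distance to the centroid as the dispersion) are illustrative only, as are the function names and the toy reviews.

```python
from math import hypot


def rescale_rating(stars, lowest=1, highest=5):
    """Map a star rating onto [-1, 1] so it is comparable with a polarity score."""
    midpoint = (lowest + highest) / 2
    half_range = (highest - lowest) / 2
    return (stars - midpoint) / half_range


def joint_dispersion(reviews):
    """Mean Euclidean distance of (polarity, rescaled rating) points to their centroid.

    `reviews` is a list of (polarity, stars) pairs for the review segments that
    mention one aspect of one resource. This dispersion measure is an assumption,
    not the measure defined in the paper.
    """
    points = [(polarity, rescale_rating(stars)) for polarity, stars in reviews]
    n = len(points)
    centre_polarity = sum(p for p, _ in points) / n
    centre_rating = sum(r for _, r in points) / n
    return sum(hypot(p - centre_polarity, r - centre_rating) for p, r in points) / n


# Hypothetical reviews about one aspect: (polarity in [-1, 1], stars in 1..5).
consensual = [(0.8, 5), (0.7, 4), (0.9, 5)]
contradictory = [(0.9, 5), (-0.8, 1), (0.7, 4), (-0.9, 2)]

print("consensual:", joint_dispersion(consensual))
print("contradictory:", joint_dispersion(contradictory))
```

With these toy values, reviews that disagree in both sentiment and rating are spread further from their centroid than reviews that agree, so the dispersion is larger for the contradictory set.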

18 Oct 2016

MyBestQuery: A serious game to collect manual query reformulation

Colloque Veille Stratégique Scientifique et Technologique (VSST 2016), Rabat (Morocco)

Keywords: Information retrieval, Query reformulation, Serious game, Human annotation

Conference Papers / National Conference Papers: Adrian Chifu, Serge Molina, Josiane Mothe

About The Publication

This paper presents MyBestQuery, a serious game designed to collect query reformulations from players. Query reformulation is a hot topic in information retrieval and covers many aspects. One of them is query reformulation analysis, which is based on users’ sessions. It can be used to understand a user’s intent or to measure their satisfaction with the results obtained when querying the search engine. Automatic query reformulation is another aspect. It automatically expands the user’s initial query in order to improve the quality of the retrieved document set. This mechanism relies on document analysis but could also benefit from the analysis of manually reformulated queries. Web search engines collect millions of search sessions and possible query reformulations, but this information is hardly accessible to academics. MyBestQuery is designed as a serious game in order to collect the various reformulations that users suggest. The longer-term objective of this work is to analyse human-produced query reformulations and to compare them with automatically produced ones. Preliminary results are reported in this paper.

01 Mar 2016

MyBestQuery : un jeu sérieux pour apprendre des utilisateurs

Conférence francophone en Recherche d'Information et Applications (CORIA 2016), Toulouse

Keywords: Serious game, Crowdsourcing, User study, Search engine, Query annotation, User assistance

Conference Papers / National Conference Papers: Adrian Chifu, Serge Molina, Josiane Mothe

About The Publication

Abstract:

MyBestQuery is a serious game that collects information about queries submitted to a search engine: (i) the player’s prediction of the query difficulty; (ii) possible reasons explaining this difficulty; and (iii) reformulation proposals.

18 Mar 2015

La prédiction efficace de la difficulté des requêtes : une tâche impossible ?

Conférence francophone en Recherche d'Information et Applications (CORIA 2015), Paris

Keywords: Information retrieval, query difficulty predictor, data mining, evaluation

Conference Papers / National Conference Papers: Adrian Chifu, Léa Laporte, Josiane Mothe

About The Publication

Abstract:

Search engines return answers whatever the user query is, but some queries are more difficult than others: the system does not achieve good performance in terms of information retrieval (IR) measures. For difficult queries, ad hoc treatments must be applied. Predicting query difficulty is therefore crucial, and different predictors have been proposed. In this paper, we revisit these predictors. First, we study the variety of information captured by existing predictors, and thus check that they are not statistically redundant. Then, we show that the correlations between predictor values and system performance give little hope for the ability of these predictors to be truly effective. Finally, we study the ability of predictors to predict query difficulty classes, relying on a variety of exploratory and learning methods. We show that, despite the (low) correlations observed with performance measures, current predictors yield variable prediction performance and are not robust enough to be used in practical IR applications.

01 Jun 2014

Performance Analysis of Information Retrieval Systems

Spanish Conference on Information Retrieval, Coruna

Keywords: Information Retrieval, Classification, Query difficulty, Optimization, Random Forest, Adaptive Information Retrieval

Conference Papers / National Conference Papers: Julie Ayter, Cecile Desclaux, Adrian Chifu, Josiane Mothe, Sébastien Déjean

About The Publication

It has been shown that there is no single best information retrieval system configuration that works for every query; rather, performance can vary from one query to another. It would be interesting if a meta-system could decide which system should process a new query by learning from the context of previously submitted queries. This paper reports an in-depth analysis considering more than 80,000 search engine configurations applied to 100 queries and the corresponding performance. The goal of the analysis is to identify which search engine configuration responds best to a certain type of query. We considered two approaches to define query types: one is based on query clustering according to the query performance (their difficulty), while the other approach uses various query features (including query difficulty predictors) to cluster queries. We identified two parameters that should be optimized first. An important outcome is that we could not obtain strong conclusive results; considering the large number of systems and methods we used, this result could lead to the conclusion that current query features do not fit the optimization problem.

19 Mar 2014

Expansion sélective de requêtes par apprentissage

Conférence francophone en Recherche d'Information et Applications (CORIA 2014), Nancy, France

Keywords: Selective information retrieval, Difficulty predictors, Query difficulty, Query expansion, Machine learning

Conference Papers / National Conference Papers: Adrian Chifu, Josiane Mothe

About The Publication

Abstract:

Automatic query expansion (QE) improves retrieval quality on average, even though it can dramatically decrease performance for certain queries. This observation has motivated selective approaches that choose the search or expansion function to apply for each query. Most selective approaches use a learning process on past query features and the performance obtained. This paper presents a new selective QE method that relies on query difficulty predictors, combining statistically and linguistically based predictors. The decision model is learned by an SVM. We demonstrate the effectiveness of the proposed method on a number of standard TREC collections. The learned models classified the test queries with more than 90% accuracy. In addition, our approach improves MAP by more than 11% compared to non-selective methods.

03 Apr 2013

Prédire la difficulté des requêtes : la combinaison de mesures statistiques et sémantiques

Conférence francophone en Recherche d'Information et Applications (CORIA 2013), Neuchatel, Suisse

Keywords: Information Retrieval, performance prediction, query difficulty, query ambiguity, combined predictors, measure correlation

Conference Papers / National Conference Papers: Adrian Chifu

About The Publication

Abstract:

The performance of an Information Retrieval System (IRS) is closely related to the query. The queries that lead to retrieval failure are referred to in the literature as “difficult queries”. This study aims at analysing, adapting and combining several query difficulty predictors. We considered three predictors: one related to term ambiguity, one based on term frequency (the IDF measure), and a score distribution measure over the results. The evaluation of the prediction is based on the correlation between the predicted difficulty and the actual IRS performance. We show that combining these predictors produces good results. The evaluation framework consists of the TREC7 and TREC8 ad hoc collections.
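The evaluation protocol described in this abstract (correlating a predictor's values with actual system performance) can be sketched as follows. This is only an illustration: the smoothed IDF variant, the toy collection, the query set, and the average precision values are all hypothetical, and the ambiguity and score distribution predictors from the paper are not reproduced here.

```python
from math import log, sqrt


def mean_idf(query_terms, documents):
    """Average inverse document frequency of the query terms.

    Uses the smoothed variant log((N + 1) / (df + 1)) so unseen terms do not
    divide by zero; the exact IDF formula used in the paper is not stated.
    """
    n_docs = len(documents)
    idfs = []
    for term in query_terms:
        doc_freq = sum(1 for doc in documents if term in doc)
        idfs.append(log((n_docs + 1) / (doc_freq + 1)))
    return sum(idfs) / len(idfs)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical toy collection: each document is a set of terms.
documents = [
    {"jaguar", "car", "speed"},
    {"jaguar", "cat", "jungle"},
    {"car", "engine"},
    {"car", "road", "speed"},
]

# Hypothetical queries and made-up average precision values per query.
queries = [["car"], ["jaguar", "speed"], ["jungle", "cat"]]
average_precision = [0.20, 0.35, 0.60]

predicted = [mean_idf(query, documents) for query in queries]
print("predictor values:", predicted)
print("correlation with performance:", pearson(predicted, average_precision))
```

A predictor is considered useful in this protocol when its values correlate strongly with the measured performance across queries; combining several predictors, as the paper does, would add further columns of predictor values before the correlation step.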

30 Jan 2017

Vers une personnalisation des environnements d’apprentissages à l’expérience émotionnelle de l’apprenant

ORPHEE RDV 2017, Font Romeu (France)

Workshop: Réalités mixtes, virtuelles et augmentées pour l'apprentissage : perspectives et challenges pour la conception, l'évaluation et le suivi

Position Papers: Magalie Ochs, Adrian Chifu, Sebastien Fournier, Evelyne Lombardo, Ivan Madjarov, Patrice Bellot

About The Publication

A learner’s emotions play a decisive role in learning, strongly influencing their cognitive abilities (Lafortune et al., 2004; Cuisinier and Pons, 2011). Today, one of the major challenges for learning environments is to integrate a form of emotional intelligence (Mayer et al., 2001) that makes it possible to automatically adapt learning to the learner’s emotions (Harley et al., 2015; Ochs and Frasson, 2004). The issues underlying the creation of an “emotionally intelligent” learning environment coincide with those of Affective Computing (Picard, 2003):

  1. the automatic recognition of emotions;
  2. the management of the user’s emotions;
  3. the expression of emotions by interactive systems (e.g., via the verbal and non-verbal behaviours of virtual characters or humanoid robots).

In this position paper, we focus more specifically on the first two points: the recognition and the management of the user’s emotions. The objective is to model the learner’s emotional experience (understanding the causes and effects of their emotions during the learning process) in order to adapt learning to the learner’s automatically detected emotions and so optimize knowledge acquisition. The underlying research issues and directions are described in the following section.

17 Jun 2017

Beyond meaning

Invited speaker @RAAI 2017. Bucharest (RO)


Other Publications: Adrian Chifu

01 Dec 2016

Presentation, Thesis research & SegChainW2V: Towards a Generic Automatic Video Segmentation Framework, based on Lexical Chains of Audio Transcriptions and Word Embeddings

Seminar (Séminaire d'accueil des enseignants-chercheurs de la FEG)


Other Publications: Adrian Chifu

01 Feb 2014

Expansion sélective de requêtes par apprentissage

Seminar


Other Publications: Adrian Chifu

01 Feb 2014

Difficult query predictors: combining statistical and semantic measures

Seminar


Other Publications: Adrian Chifu