# List of our seminars

*(reverse chronological order)*

**DAPA Seminar** on **15 January 2015** at **14:00**

*Tensor factorization for multi-relational learning*

Raphael Bailly (Heudiasyc, Université de Technologie de Compiègne, France)

Location: room 101, corridor 25-26, 4 place Jussieu, 75005 Paris

Learning from relational data has attracted growing interest in fields as diverse as social network modeling, the semantic web, and bioinformatics. To some extent, a network can be seen as multi-relational data, where a particular relation represents a particular type of link between entities. It can be modeled as a three-way tensor.

Tensor factorization has been shown to be a very efficient way to learn from such data. It can be done either in a 3-way factorization style (trigram, e.g. RESCAL) or as a sum of 2-way factorizations (bigram, e.g. TransE). These methods usually achieve state-of-the-art accuracy on benchmarks. However, they all rely on regularization schemes that are not always adequate.

We show that both 2-way and 3-way factorization of a relational tensor can be formulated as a simple matrix factorization problem. This class of problems can naturally be relaxed in a convex way. We show that this new method outperforms RESCAL on two benchmarks.
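As an illustration of the two factorization styles mentioned in the abstract, the sketch below scores a (subject, relation, object) triple with a RESCAL-style trilinear form and a TransE-style translation distance, then checks that the squared TransE distance expands into a sum of pairwise (2-way) terms. The embeddings are random, and the sizes and function names are assumptions made for this example; it does not implement the convex relaxation presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 5, 2, 3   # toy sizes (assumed)

E = rng.normal(size=(n_entities, dim))             # entity embeddings
R_rescal = rng.normal(size=(n_relations, dim, dim))  # one matrix per relation
R_transe = rng.normal(size=(n_relations, dim))       # one vector per relation


def rescal_score(s, k, o):
    """3-way (trilinear) score: e_s^T R_k e_o."""
    return E[s] @ R_rescal[k] @ E[o]


def transe_score(s, k, o):
    """Translation score: -||e_s + r_k - e_o||."""
    return -np.linalg.norm(E[s] + R_transe[k] - E[o])


def transe_sq_as_bigrams(s, k, o):
    """-||e_s + r_k - e_o||^2 written as unary plus pairwise (2-way) terms."""
    es, r, eo = E[s], R_transe[k], E[o]
    unary = es @ es + r @ r + eo @ eo
    pairwise = 2 * (es @ r) - 2 * (es @ eo) - 2 * (r @ eo)
    return -(unary + pairwise)


# Slice k of the reconstructed tensor under the RESCAL-style model.
X_1 = E @ R_rescal[1] @ E.T
```

The pairwise expansion makes explicit why a translation model can be read as a sum of 2-way interactions between subject, relation, and object.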

**Bio**

R. Bailly has been a postdoc at Heudiasyc, Compiègne, since March 2014. He works with Antoine Bordes and Nicolas Usunier on multi-relational learning and word embeddings. He was previously a postdoc in Barcelona with Xavier Carreras, with whom he worked on spectral methods applied to unsupervised settings.

*More information about Raphael Bailly: https://www.hds.utc.fr/~baillyra/*

**DAPA Seminar** on **27 November 2014** at **10:00**

*The Frank-Wolfe Algorithm: Recent Results and Applications to High-Dimensional Similarity Learning and Distributed Optimization*

Aurélien Bellet (Télécom ParisTech)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

The topic of this talk is the Frank-Wolfe (FW) algorithm, a greedy procedure for minimizing a convex and differentiable function over a compact convex set. FW has its roots in the 1950s but has recently regained a lot of interest in machine learning and related communities. In the first part of the talk, I will introduce the FW algorithm and review some recent results that motivate its appeal in the context of large-scale learning problems. In the second part, I will describe two applications of FW in my own work: (i) learning a similarity/distance function for sparse high-dimensional data, and (ii) learning sparse combinations of elements that are distributed over a network.
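For readers unfamiliar with FW, here is a minimal sketch of the basic iteration on a toy problem: minimizing a quadratic over the probability simplex. The objective, the target vector, and the function name `frank_wolfe_simplex` are assumptions chosen for the example, and the step size 2/(t+2) is the standard textbook choice; this is not one of the applications described in the talk.

```python
import numpy as np


def frank_wolfe_simplex(grad, x0, n_iters=500):
    """Basic Frank-Wolfe over the probability simplex.

    Each step solves the linear subproblem min_s <grad(x), s> over the
    simplex (its solution is a vertex) and moves toward that vertex.
    """
    x = np.asarray(x0, dtype=float).copy()
    for t in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # vertex minimizing the linear model
        gamma = 2.0 / (t + 2.0)        # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * s  # convex combination stays feasible
    return x


# Toy objective (assumed): f(x) = 0.5 * ||x - target||^2, target in the simplex.
target = np.array([0.2, 0.3, 0.5])
grad = lambda x: x - target
x_fw = frank_wolfe_simplex(grad, x0=np.array([1.0, 0.0, 0.0]))
```

Because every iterate is a convex combination of simplex vertices, no projection step is needed, which is part of the appeal of the method for large-scale problems.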

**Bio**

Aurélien Bellet is currently a postdoc at Télécom ParisTech. Previously, he worked as a postdoc at the University of Southern California and received his Ph.D. from the University of Saint-Etienne in 2012. His main research topic is statistical machine learning, with particular interests in metric/similarity learning and large-scale/distributed learning.

*More information about Aurélien Bellet: http://perso.telecom-paristech.fr/~abellet/*

**DAPA Seminar** on **13 November 2014** at **10:00**

*Computer-Aided Breast Tumor Diagnosis in DCE-MRI Images*

Baishali Chaudhury (Department of Computer Science and Engineering, University of South Florida)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

The overall goal of our project is to quantify tumor heterogeneity with advanced image analysis, in order to provide useful information about tumor biology and valuable insight into patient treatment strategies and prognosis. We introduced a CAD (computer-aided diagnosis) system to characterize breast cancer heterogeneity through spatially explicit maps using DCE-MRI images. Through quantitative image analysis, we examined the presence of differing tumor habitats defined by initial and delayed contrast patterns within the tumor. The heterogeneity within each habitat was quantified through textural kinetic features at different scales and quantization levels. The functionality of this CAD system was then evaluated by applying it in a multi-objective framework. We will discuss various common problems in breast DCE-MRI analysis (such as datasets that are extremely small compared to the number of extracted texture features, and highly imbalanced datasets) and the data mining techniques applied in our project to deal with them.

**Bio**

Fourth-year PhD candidate at the University of South Florida, Tampa, USA, currently working on the “Analysis of DCE-MRI breast tumor images for stratifying patient prognosis”. Broader research interests include computer vision, data mining and machine learning, and sparse data representation.

*More information about Baishali Chaudhury: http://baishalichaudhury.wix.com/baishali*

**DAPA Seminar** on **30 October 2014** at **11:00**

*WaterFowl: a Compact, Self-Indexed RDF Store based on Succinct Data Structures*

Olivier Curé (Laboratoire d'informatique Gaspard-Monge, Université Marne-la-Vallée)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

**DAPA Seminar** on **2 October 2014** at **10:00**

*Subgoal Discovery and Language Learning in Reinforcement Learning Agents*

Marie desJardins (Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, USA)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

As intelligent agents and robots become more commonly used, methods to make interaction with the agents more accessible will become increasingly important. In this talk, I will present a system for intelligent agents to learn task descriptions from linguistically annotated demonstrations, using a reinforcement learning framework based on object-oriented Markov decision processes (OO-MDPs). Our framework learns how to ground natural language commands into reward functions, using as input demonstrations of different tasks being carried out in the environment. Because language is grounded to reward functions, rather than being directly tied to the actions that the agent can perform, commands can be high-level and can be carried out autonomously in novel environments. Our approach has been empirically validated in a simulated environment with both expert-created natural language commands and commands gathered from a user study.

I will also describe a related, ongoing project to develop novel option discovery methods for OO-MDP domains. These methods permit agents to identify new subgoals in complex environments that can be transferred to new tasks. We have developed a framework called Portable Multi-policy Option Discovery for Automated Learning (P-MODAL), an approach that extends the PolicyBlocks option discovery approach to OO-MDPs.

This work is collaborative research with Dr. Michael Littman and Dr. James MacGlashan of Brown University and Dr. Smaranda Muresan of Columbia University. A number of UMBC students have contributed to the project: Shawn Squire, Nicholay Topin, Nick Haltemeyer, Tenji Tembo, Michael Bishoff, Rose Carignan, and Nathaniel Lam.

**Bio**

Dr. Marie desJardins is a Professor in the Department of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County, where she has been a member of the faculty since 2001. She is a 2013-14 American Council of Education Fellow, the 2014-17 UMBC Presidential Teaching Professor, and an inaugural Hrabowski Academic Innovation Fellow. Her research is in artificial intelligence, focusing on the areas of machine learning, multi-agent systems, planning, interactive AI techniques, information management, reasoning with uncertainty, and decision theory. Current research projects include learning in the context of planning and decision making, analyzing and visualizing uncertainty in machine learning, trust modeling in multiagent systems, and computer science education.

Dr. desJardins has published over 120 scientific papers in journals, conferences, and workshops. She is an Associate Editor of the Journal of Artificial Intelligence Research, is a member of the editorial board of AI Magazine, and was the Program Cochair for AAAI-13. She has previously served as AAAI Liaison to the Board of Directors of the Computing Research Association, Vice-Chair of ACM's SIGART, and AAAI Councillor. She is an ACM Distinguished Member, is a AAAI Senior Member, holds an appointment at the University of Maryland Institute for Advanced Studies, is a member and former chair of UMBC's Honors College Advisory Board, is the former chair of UMBC's Faculty Affairs Committee, and serves on the advisory board of UMBC's Center for Women in Technology.

*More information about Marie desJardins: http://www.csee.umbc.edu/~mariedj/*

**DAPA Seminar** on **11 September 2014** at **14:00**

*Clustering-based Models from Model-based Clustering*

Mika Sato-Ilic (Faculty of Engineering, Information and Systems, University of Tsukuba)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

**DAPA Seminar** on **3 July 2014** at **10:00**

*Clustering of temporal data, with application to the analysis of social media data*

Julien Velcin

*ERIC laboratory, Université Lyon 2*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

**DAPA Seminar** on **1 July 2014** at **17:00**

*PLUIE (Probability and Logic Unified for Information Extraction): Interim Report*

Stuart Russell (University of California, Berkeley)

Ole Torp Lassen (LIP6, UPMC)

Wei Wang (LIP6, UPMC)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

The goal of the PLUIE project is to investigate an old approach to understanding language: the idea that declarative text expresses information about the world. This idea is captured in the form of a probability model that describes how sentences are generated from worlds. A very simple model of this kind exhibits a number of interesting properties including robust bootstrap inferences and relation discovery. The talk will summarize the approach and cover two specific subproblems: efficient split-merge MCMC inference in an entity-mention model and flexible mention grammars for named entities.

*S. Russell is supported by, and this presentation is given under the auspices of, the Blaise Pascal International Research Chair, funded by the French State and the Île-de-France Region and managed by the Fondation de l'École Normale Supérieure.*

**DAPA Seminar** on **5 June 2014** at **14:00**

*Overlapping clustering by k-means revisited*

Guillaume Cleuziou

*IUT Informatique d'Orléans*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

**DAPA Seminar** on **22 May 2014** at **10:00**

*Collaborative activity in learning situations: forms and processes*

Michael Baker

*CNRS*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

**DAPA Seminar** on **15 May 2014** at **10:00**

*A normal hierarchical model for random intervals / The silhouette index - an extension to fuzzy clustering and applications to feature selection*

Dan Ralescu / Anca Ralescu

*Department of Mathematical Sciences, University of Cincinnati, USA / Computer Sciences, EE & CS Dept., College of Engineering, University of Cincinnati, USA*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

(10:30-11:20) Dan Ralescu, Professor, Department of Mathematical Sciences, University of Cincinnati, USA: *A normal hierarchical model for random intervals*

Existing methods for analyzing interval-valued data include regressions in the metric space of intervals and symbolic data analysis, the latter being proposed in a more general setting. However, there has been a lack of literature on the parametric modeling and distribution-based inferences for interval-valued data.

(11:20-12:10) Anca Ralescu, Professor, Computer Sciences, EE & CS Dept., College of Engineering, University of Cincinnati, USA: *The silhouette index - an extension to fuzzy clustering and applications to feature selection*

**DAPA Seminar** on **20 February 2014** at **10:00**

*Robust recommendations and their explanation in multi-criteria decision aiding*

Christophe Labreuche

*Thales Group, France*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Multi-Criteria Decision Aid (MCDA) aims at helping an individual to make choices among alternatives described by several attributes, from a (small) set of learning data representing her preferences. MCDA has a wide range of applications in smart cities, engineering, recommender systems and so on. Among the variety of available decision models, one can cite the weighted majority, additive utility, weighted sum or the Choquet integral.

Once the expression of the decision model has been chosen, the generation of choices among alternatives is classically done as follows. In a constraint approach, from a set of learning data (representing for instance comparisons of alternatives), one then looks for the value of the model parameters compatible with the learning data, which maximizes some functional, e.g. an entropy or a separation variable on the learning data. The comparisons among alternatives are then obtained by applying the model with the previously constructed parameters. The major difficulty the decision maker faces is that there usually does not exist one unique value of the parameters compatible with the learning data. Hence this approach introduces much arbitrariness since the generated preferences are much stronger than the learning data.

Robust preference relations have been recently introduced in MCDA to overcome this difficulty. An alternative is said to be necessarily preferred to another one if the first one dominates the second for any value of the parameters compatible with the learning data. In Artificial Intelligence, this operator is often called entailment. It is actually a closure operator. This necessity preference relation is usually incomplete, unless the model is completely specified from the preferential information of the decision maker.
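To make the necessity relation concrete, here is a minimal sketch for one particular decision model, the weighted sum, which is an assumption chosen for simplicity. The compatible weight vectors form a polytope (non-negative weights summing to one that reproduce every learned comparison), and since a linear function attains its minimum over a polytope at a vertex, checking the vertices suffices. The function names and the toy data are hypothetical, and brute-force vertex enumeration is only practical for a handful of criteria; this is not the algorithm presented in the talk.

```python
import itertools
import numpy as np


def compatible_vertices(preferences, n_criteria, tol=1e-9):
    """Vertices of {w >= 0, sum(w) = 1, w.(a - b) >= 0 for each learned (a, b)}."""
    # Every row g of G encodes one inequality g . w >= 0.
    rows = [np.eye(n_criteria)[i] for i in range(n_criteria)]
    rows += [np.asarray(a, float) - np.asarray(b, float) for a, b in preferences]
    G = np.array(rows)
    rhs = np.zeros(n_criteria)
    rhs[0] = 1.0  # first equation: weights sum to one
    vertices = []
    # A vertex makes n_criteria - 1 inequalities tight (plus the sum constraint).
    for idx in itertools.combinations(range(len(G)), n_criteria - 1):
        M = np.vstack([np.ones(n_criteria), G[list(idx)]])
        try:
            w = np.linalg.solve(M, rhs)
        except np.linalg.LinAlgError:
            continue  # tight constraints are not independent
        if np.all(G @ w >= -tol):
            vertices.append(w)
    return vertices


def necessarily_preferred(x, y, vertices, tol=1e-9):
    """True if w.x >= w.y for every compatible weight vector w."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return all(w @ d >= -tol for w in vertices)


# Learning data (toy): alternative (0.8, 0.2) is preferred to (0.4, 0.5).
V = compatible_vertices([((0.8, 0.2), (0.4, 0.5))], n_criteria=2)
robust = necessarily_preferred((0.9, 0.3), (0.5, 0.5), V)
forward = necessarily_preferred((0.6, 0.3), (0.3, 0.8), V)
backward = necessarily_preferred((0.3, 0.8), (0.6, 0.3), V)
```

In this toy case the first pair is robustly ordered, while the second pair is ordered in neither direction, which illustrates why the necessity relation is usually incomplete.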

The introduction of robust preference relations brings many new challenges:

- algorithmic aspects: how to design efficient algorithms to construct it?
- explanation: how to explain to the decision maker the recommended robust preferences? In other words, how are the recommendations derived from the learning data?

We will address these points in the talk.

**DAPA Seminar** on **6 February 2014** at **10:00**

*Active learning in constrained evidential clustering*

Violaine Antoine

*ISIMA, LIMOS*

Location: room 101, corridor 25-26, 4 place Jussieu, 75005 Paris

Evidential clustering is characterized by the use of belief functions, and in particular by the notion of credal partition. This notion generalizes the concepts of hard, fuzzy, probabilistic, and possibilistic partitions. It thus makes it possible to measure precisely the uncertainty about the assignment of an object to a class.

Constrained clustering, also called semi-supervised clustering, is an approach that introduces prior knowledge in the form of constraints on the sought partition. Here we focus on object-level constraints: a Must-Link constraint specifies that two objects must be in the same class, whereas a Cannot-Link constraint indicates that two objects are in different classes. Adding constraints noticeably improves clustering results. However, in real applications it is sometimes difficult to obtain a useful set of constraints. Active learning therefore aims to obtain this information at the lowest cost.

In this presentation, we propose two new constrained clustering algorithms using the theoretical framework of belief functions. Thanks to the credal partition they return, we can precisely identify the objects that are problematic for clustering. A new active learning algorithm is then proposed to reduce the classification error.
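The two kinds of object-level constraints described above can be illustrated with a short sketch that lists which constraints a given hard partition violates. The function name, the labels, and the constraint pairs are hypothetical toy values; the sketch handles hard partitions only and does not implement the credal partitions or the algorithms presented in the talk.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the constraints that a hard partition fails to satisfy.

    labels[i] is the cluster of object i; constraints are pairs of indices.
    """
    violations = []
    for i, j in must_link:
        if labels[i] != labels[j]:       # should share a cluster but do not
            violations.append(("must-link", i, j))
    for i, j in cannot_link:
        if labels[i] == labels[j]:       # should be separated but are not
            violations.append(("cannot-link", i, j))
    return violations


# Toy example (assumed data): five objects in two clusters.
labels = [0, 0, 1, 1, 0]
must_link = [(0, 1), (2, 4)]
cannot_link = [(0, 2), (3, 2)]
result = violated_constraints(labels, must_link, cannot_link)
```

The number of violations gives a simple measure of how well a candidate partition agrees with the available prior knowledge.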

**DAPA Seminar** on **19 December 2013** at **10:00**

*The rise of graph databases/dataspaces and their relations with Linked Data and Ontologies*

André Santanchè

*Universidade Estadual de Campinas, Brazil*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

**DAPA Seminar** on **16 December 2013** at **16:00**

*Extended Logic Programming and Intelligent System Development*

Atsushi Inoue

*University of Cincinnati*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

A long-term effort toward a general application framework for intelligent systems is introduced. Many intelligent systems adopt a knowledge-based system architecture, and their development thus differs from other application development. Expressing knowledge as rules shifts one's perspective from data manipulation to relation investigation. We focus on our recent progress on two components: Extended Logic Programming (ELP), the keystone of this framework, and a multi-view visualization scheme that effectively and efficiently visualizes the reasoning processes of ELP. A few representative applications are showcased as time allows.

**Reference**:

K. Springer, M. Henry, A. Inoue, "A General Application Framework for Intelligent Systems," The 20th Midwest Artificial Intelligence and Cognitive Science Conference (MAICS2009), Fort Wayne, IN, pp. 188-195, 2009.

**DAPA Seminar** on **5 December 2013** at **10:00**

*Granular Models for Time Series Forecasting*

Rosangela Ballini

*Institute of Economics, University of Campinas, Brazil*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

[...] time series forecasting. These models are constructed in two phases. The first one uses clustering algorithms to find group structures in a historical database. Two different approaches are discussed: fuzzy c-means clustering and participatory learning algorithms. Fuzzy c-means clustering, which is a supervised clustering algorithm, is used to explore similar data characteristics, such as trend or cyclical components. Participatory learning induces unsupervised dynamic fuzzy clustering algorithms and provides an effective alternative to construct adaptive fuzzy systems.

In the second phase, two cases are considered. In the first case, a regression model is adjusted for each cluster and forecasts are produced by a weighted combination of the local regression models. In the second case, prediction data are classified according to the group structure found in the database. Then, forecasts are produced using the cluster centers weighted by the degree with which prediction data match the groups. The weighted combination of local models constitutes a forecasting approach called granular functional forecasting modeling, and the approach based on a weighted combination of cluster centers comprises granular relational forecasting modeling. The effectiveness of the granular forecasting approaches is verified using three different applications: average streamflow forecasting, option pricing estimation and modeling of regime changes in Brazilian nominal interest rates.
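The second case, a forecast built from cluster centers weighted by degree of match, can be sketched as follows. The membership formula is the standard fuzzy c-means one; the split of each center into an input part and an output value, the function names, and the toy numbers are assumptions made for the example rather than details taken from the talk.

```python
import numpy as np


def fcm_memberships(x, centers, m=2.0, eps=1e-12):
    """Standard fuzzy c-means membership of point x in each cluster.

    u_i is proportional to d_i^(-2/(m-1)), where d_i is the distance to center i.
    """
    d = np.linalg.norm(centers - x, axis=1) + eps  # eps avoids division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()


def relational_forecast(x, input_centers, output_centers, m=2.0):
    """Forecast as the membership-weighted combination of cluster outputs."""
    u = fcm_memberships(x, input_centers, m)
    return u @ output_centers


# Toy clusters (assumed): two groups of past patterns and their typical next value.
input_centers = np.array([[0.0, 0.0], [2.0, 0.0]])
output_centers = np.array([1.0, 3.0])

midway = relational_forecast(np.array([1.0, 0.0]), input_centers, output_centers)
at_center = relational_forecast(np.array([0.0, 0.0]), input_centers, output_centers)
```

A pattern lying exactly on a cluster center reproduces that cluster's output, while a pattern halfway between two centers yields the average of their outputs.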

**DAPA Seminar** on **7 November 2013** at **10:00**

*Automated Feature Weighting in Naive Bayes for High-dimensional Data Classification*

Shengrui Wang

*Université de Sherbrooke*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

This talk is about our recent work in the area of feature weighting for high-dimensional data classification (and clustering). The first part of my talk relates to the Naive Bayes (NB for short) classifier. Currently, in many real-world applications, high dimensionality poses a major challenge to conventional NB classifiers, due to noisy or redundant features and the local relevance of these features to classes. In this work, we propose an automated feature weighting solution to enable the NB method to deal effectively with high-dimensional data. First, a locally weighted probability model will be presented for implementing a soft feature selection scheme. Then an optimization algorithm will be presented to find the weights in linear time complexity, based on the logit-normal prior distribution and the Maximum a Posteriori principle. Experimental studies will show the effectiveness and suitability of the proposed model for high-dimensional data classification.

In the second part of this talk, I will briefly present our work on central clustering of categorical data with automated feature weighting. A novel kernel-density-based definition of cluster center is proposed using a Bayes-type probability estimator. Then, an algorithm called k-centers is proposed, incorporating a new feature weighting scheme by which each attribute is automatically assigned a weight measuring its individual contribution to the clusters.

**DAPA Seminar** on **4 November 2013** at **10:00**

*Abductive reasoning made easy with Prolog and Constraint Handling Rules*

Henning Christiansen

*Roskilde University*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Abductive reasoning, or "abduction", means to find a best explanation for some unexpected observation. In a logical setting, an explanation can be a set of facts which, when added to our current knowledge base, makes it possible to prove the truth of the observation and, at the same time, is not inconsistent with the knowledge base. Abduction in this sense is a useful metaphor for many sorts of reasoning aiming at answering "why" or "what" questions such as medical diagnosis, language understanding and decoding of biological sequence data. Furthermore, models of abductive reasoning can lead to practical implementation techniques.

Introduced by Peirce, the notion has attracted much attention in philosophy, detective stories and computer science, most notably in logic programming. Until the turn of the millennium, abduction in logic programming was realized through complex meta-interpreters written in Prolog, which may have led to a view of abduction as being some hairy, difficult stuff, far too inefficient for any realistic applications. In this talk, we demonstrate how a fairly powerful version of abductive reasoning can be exercised through a direct use of Prolog, using its extension by Constraint Handling Rules as the engine to take care of abducible hypotheses.

**DAPA Seminar** on **10 October 2013** at **15:00**

*New Perspectives in Social Data Management / Understanding Similarity Metrics in Neighbour-based Recommender Systems*

Sihem Amer-Yahia / Arjen de Vries

*Laboratoire d'Informatique de Grenoble / Centrum Wiskunde & Informatica, Amsterdam*

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

*New Perspectives in Social Data Management*

The web has evolved from a technology platform to a social milieu where a mix of factual, opinion and behavior data interleave. A number of social applications are being built to analyze and extract value from this data, which encourages us to do data-driven research.

I will describe a perspective on why and how social data management is fundamentally different from data management as it is taught in school today. More specifically, I'll talk about social data preparation, social data exploration and social application validation.

This talk is based on published and ongoing work with colleagues at LIG, UT Austin, U. of Trento, U. of Tacoma, and Google Research.

*Understanding Similarity Metrics in Neighbour-based Recommender Systems*

Recommender systems aim to predict the content that a user would like based on observations of the online behaviour of its users. Research in the CWI Information Access group addresses different aspects of this problem, ranging from how to measure recommendation results and how recommender systems relate to information retrieval models to how to build effective recommender systems (note: we won the ACM RecSys 2013 News Recommender Systems challenge!). We would like to develop a general methodology to diagnose weaknesses and strengths of recommender systems. In this talk, I discuss the initial results of an analysis of the core component of collaborative filtering recommenders: the similarity metric used to find the most similar users (neighbours) that will provide the basis for the recommendation to be made. The purpose is to shed light on the question of why certain user similarity metrics have been found to perform better than others. We have studied statistics computed over the distance distribution in the neighbourhood as well as properties of the nearest neighbour graph. The features identified correlate strongly with measured prediction performance. However, we have not yet discovered how to deploy this knowledge to actually improve the recommendations made.
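To show how the choice of similarity metric shapes the neighbourhood, here is a minimal sketch of user-based collaborative filtering with two common metrics, cosine and Pearson, computed on co-rated items. The rating matrix, the function names, and the convention that 0 means "not rated" are assumptions made for the example; it is not the analysis carried out in the talk.

```python
import numpy as np


def cosine_sim(u, v):
    """Cosine similarity restricted to items rated by both users."""
    mask = (u > 0) & (v > 0)
    if not mask.any():
        return 0.0
    a, b = u[mask], v[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson_sim(u, v):
    """Pearson correlation restricted to items rated by both users."""
    mask = (u > 0) & (v > 0)
    if mask.sum() < 2:
        return 0.0
    a = u[mask] - u[mask].mean()
    b = v[mask] - v[mask].mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def predict(ratings, user, item, sim, k=2):
    """Similarity-weighted average over the k most similar users who rated the item."""
    candidates = [(sim(ratings[user], ratings[v]), v)
                  for v in range(len(ratings))
                  if v != user and ratings[v, item] > 0]
    top = [(s, v) for s, v in sorted(candidates, reverse=True)[:k] if s > 0]
    if not top:
        return None  # no positively correlated neighbour available
    return sum(s * ratings[v, item] for s, v in top) / sum(s for s, _ in top)


# Toy user-by-item rating matrix (assumed data); 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [5, 5, 4, 1],
    [1, 2, 5, 5],
    [4, 5, 3, 0],
], dtype=float)

pred_cosine = predict(ratings, user=0, item=2, sim=cosine_sim)
pred_pearson = predict(ratings, user=0, item=2, sim=pearson_sim)
```

On this toy matrix the two metrics select different neighbours for the same user, so the predicted ratings differ even though the aggregation rule is identical.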

**DAPA Seminar** on **19 September 2013** at **15:00**

*Fuzzy Semantic Sentence Similarity Measures*

Keeley A Crockett

*The Intelligent Systems Group, School of Computing, Maths and Digital Technology, Manchester Metropolitan University*

Location: 25-26:105

A problem in the field of semantic sentence similarity is the inability of sentence similarity measures to accurately represent perception-based (fuzzy) words that are commonly used in natural language. Given the wide use of fuzzy words in natural language, this limits the strength of these measures in the areas where they are practically applied.

This talk briefly reviews traditional semantic word and sentence similarity measures and then describes a new fuzzy measure known as FAST (Fuzzy Algorithm for Similarity Testing). FAST is an ontology based similarity measure that uses concepts of fuzzy logic and computing with words to allow for the accurate representation of fuzzy based words. Through empirical human experimentation fuzzy sets were created for six categories of words based on their levels of association with particular concepts. These fuzzy sets were then defuzzified and the results used to create new ontological relations between the fuzzy words. These relationships allowed for the creation of a new ontology based semantic text similarity algorithm that is able to show the effect of fuzzy words on computing sentence similarity as well as the effect that fuzzy words have on non-fuzzy words within a sentence. Initial experiments using FAST are described on two possible future benchmark “fuzzy” datasets. The results show that there was an improved level of correlation between FAST and human test results compared with two traditional sentence similarity measures.

The talk concludes by looking at one potential application area where semantic similarity measures are utilised in a Student Debt Advisor Conversational Agent to remove the need for extensive scripting and maintenance.