List of our seminars

(reverse chronological order)

DAPA seminar of 11 / 9 / 2014 at 14h

Clustering-based Models from Model-based Clustering

Mika Sato-Ilic (Faculty of Engineering, Information and Systems, University of Tsukuba)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Recent advances in information science have enabled the collection of vast amounts of multi-source and complex data, and data analysis is increasingly tasked with handling such data. Clustering is a type of data analysis that detects and characterizes the latent structure of data by classifying objects according to the similarities among them. Model-based clustering is a framework of clustering methods whose central idea is to assume a model for the data; fitting the model to the data yields an estimated partition. Although this approach has the benefit of producing a clear partition grounded in mathematical theory, it carries the unavoidable risk that the assumed model does not fit the latent classification structure of the data. We therefore propose a framework called clustering-based models, in which we use the obtained clustering result as a scale of the latent structure of the data, apply it to the observed data, and then fit a model to the modified data in order to obtain a more accurate result. This talk will introduce several methods in this framework, together with several applications.

DAPA seminar of 3 / 7 / 2014 at 10h

Clustering of temporal data, with application to the analysis of social media data

Julien Velcin

ERIC laboratory, Université Lyon 2

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Graphical models have become very popular for automatic classification problems. In this talk, I will present recent work carried out at the ERIC laboratory on two different clustering problems. The first problem we tackled draws on probabilistic topic models to jointly capture the evolution of the topics and opinions expressed in a text corpus. The second consists in adapting mixture models to capture the dynamics of categories. The models presented will be illustrated on real data from social media. I will also give an overview of their application within the ImagiWeb project, which aims to extract and track the image of entities on the Web.

DAPA seminar of 1 / 7 / 2014 at 17h

PLUIE (Probability and Logic Unified for Information Extraction): Interim Report

Stuart Russell (University of California, Berkeley)
Ole Torp Lassen (LIP6, UPMC)
Wei Wang (LIP6, UPMC)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

The goal of the PLUIE project is to investigate an old approach to
understanding language: the idea that declarative text expresses
information about the world. This idea is captured in the form of a
probability model that describes how sentences are generated from
worlds. A very simple model of this kind exhibits a number of
interesting properties including robust bootstrap inferences and
relation discovery. The talk will summarize the approach and cover two
specific subproblems: efficient split-merge MCMC inference in an
entity-mention model and flexible mention grammars for named entities.

S. Russell is supported by, and this presentation is given under the auspices of, the Blaise Pascal International Research Chair, funded by the French State and the Île-de-France Region and managed by the Fondation de l'Ecole Normale Supérieure.

DAPA seminar of 5 / 6 / 2014 at 14h

Overlapping clustering with k-means, revisited

Guillaume Cleuziou

IUT Informatique d'Orléans

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Overlapping clustering consists in extracting, from a data set, classes of similar individuals while allowing each individual to belong fully to several classes. In this presentation we will show why this unconventional kind of structure is essential in many application domains and why overlapping clustering is a research problem in its own right. The well-known k-means partitioning algorithm will then serve as a basis for introducing different models of cluster overlap, the associated strategies for exploring the solution space, and finally possible extensions to kernel methods.

DAPA seminar of 22 / 5 / 2014 at 10h

Collaborative activity in learning situations: forms and processes

Michael Baker


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Diverse research fields are concerned with modelling collaborative activity, from artificial intelligence and evolutionary anthropology, to several branches of psychology, notably organisational psychology, social psychology and educational psychology. Particular visions of cooperation and collaboration have been elaborated in the field of (computer-supported) collaborative learning (e.g. Dillenbourg, Baker, Blaye & O’Malley, 1996), where models of collaboration are required for interpreting experimental results in terms of how the students interacted together, and for design of technologies for collaboration. In this context, a general distinction between cooperation and collaboration (Roschelle & Teasley, 1995) is now generally accepted: collaboration involves the mostly synchronous joint attempt to elaborate a shared representation of the problem to be solved, whereas cooperation tends towards less synchronous work, with division of sub-task responsibilities between participants. The main questions raised by this definition are: what is the nature of such “shared representations”, and what are the forms and processes by which they are co-elaborated? This paper deepens and extends these definitions in three main ways. Firstly, a definition of what “shared representation” means is proposed, as mutual acceptance, distinguished from belief (Cohen, 1992). Secondly, forms of cooperative activity are defined in terms of combinations of three gradual dimensions: (a)symmetry of interactive roles, (dis)agreement, and alignment (or coordination) on several levels (problem-solving stage, language, discursive representations). Finally, the discursive operations that constitute collaboration are described, in terms of four broad classes: extensional, cumulative, foundational and reformulative. 
The set of forms of collaboration associated with the specific case of argumentation dialogue will be described in particular detail, with elements of the model being illustrated with examples taken from several corpora of interactions between students.

DAPA seminar of 15 / 5 / 2014 at 10h

A normal hierarchical model for random intervals / The silhouette index - an extension to fuzzy clustering and applications to feature selection

Dan Ralescu / Anca Ralescu

Department of Mathematical Sciences, University of Cincinnati, USA / Computer Sciences, EE & CS Dept., College of Engineering, University of Cincinnati, USA

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

(10h30-11h20) Dan Ralescu, Professor, Department of Mathematical Sciences, University of Cincinnati, USA
A normal hierarchical model for random intervals

Many statistical data are imprecise due to factors such as measurement errors, computation errors, and lack of information. In such cases, data are better represented by intervals rather than by single numbers.
Existing methods for analyzing interval-valued data include regressions in the metric space of intervals and symbolic data analysis, the latter being proposed in a more general setting. However, there has been a lack of literature on the parametric modeling and distribution-based inferences for interval-valued data.

(11h20-12h10) Anca Ralescu, Professor, Computer Sciences, EE & CS Dept., College of Engineering, University of Cincinnati, USA
The silhouette index - an extension to fuzzy clustering and applications to feature selection

Introduced in 1986 by Peter J. Rousseeuw as a visualization tool for the results of a clustering algorithm, the silhouette index has since found wider use as a clustering validity index. Various features of this index recommend it. In this talk I will discuss two topics: (1) recent work on the extension of the silhouette index to fuzzy clustering, and (2) applications of the silhouette index to feature selection for a classifier.
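
As a reminder of what the index measures, here is a minimal sketch of the classical (crisp) silhouette value in Python. It is not the fuzzy extension discussed in the talk; the function name, the Euclidean distance and the toy data are assumptions chosen for illustration.

```python
from math import dist

def silhouette_values(points, labels):
    """Silhouette value s(i) = (b - a) / max(a, b) for every point.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of another cluster.
    Requires at least two distinct labels.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Toy data (hypothetical): two tight pairs of points, far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_values(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_values(points, [0, 1, 0, 1])   # each cluster mixes far-apart points
```

Values close to 1 indicate points that sit well inside their cluster, while negative values (as in the second labelling) indicate points that are closer, on average, to another cluster.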

DAPA seminar of 20 / 2 / 2014 at 10h

Robust recommendations and their explanation in multi-criteria decision aiding

Christophe Labreuche

Thales Group, France

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Multi-Criteria Decision Aid (MCDA) aims at helping an individual to make choices among alternatives described by several attributes, from a (small) set of learning data representing her preferences. MCDA has a wide range of applications in smart cities, engineering, recommender systems and so on. Among the variety of available decision models, one can cite the weighted majority, additive utility, weighted sum or the Choquet integral.

Once the expression of the decision model has been chosen, the generation of choices among alternatives is classically done as follows. In a constraint approach, from a set of learning data (representing for instance comparisons of alternatives), one then looks for the value of the model parameters compatible with the learning data, which maximizes some functional, e.g. an entropy or a separation variable on the learning data. The comparisons among alternatives are then obtained by applying the model with the previously constructed parameters. The major difficulty the decision maker faces is that there usually does not exist one unique value of the parameters compatible with the learning data. Hence this approach introduces much arbitrariness since the generated preferences are much stronger than the learning data.

Robust preference relations have been recently introduced in MCDA to overcome this difficulty. An alternative is said to be necessarily preferred to another one if the first one dominates the second for any value of the parameters compatible with the learning data. In Artificial Intelligence, this operator is often called entailment. It is actually a closure operator. This necessity preference relation is usually incomplete, unless the model is completely specified from the preferential information of the decision maker.
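
The necessity relation can be illustrated in the simplest possible setting. The sketch below assumes a weighted-sum model with only two criteria, so that the weight vector is (t, 1 - t) with t in [0, 1]; this restriction, the function names and the example data are assumptions made for illustration and are not taken from the talk. Each learning comparison restricts t to an interval, and because the score difference is linear in t, checking the two endpoints of that interval is enough.

```python
def feasible_interval(comparisons):
    """Interval of t such that every (a, b) pair satisfies score(a) >= score(b).

    score(x) = t * x[0] + (1 - t) * x[1]. Returns None if no t is compatible.
    """
    lo, hi = 0.0, 1.0
    for a, b in comparisons:
        d1, d2 = a[0] - b[0], a[1] - b[1]
        slope = d1 - d2              # constraint reads: t * slope + d2 >= 0
        if slope > 0:
            lo = max(lo, -d2 / slope)
        elif slope < 0:
            hi = min(hi, -d2 / slope)
        elif d2 < 0:
            return None              # constant and negative: never satisfied
    return (lo, hi) if lo <= hi else None


def necessarily_preferred(x, y, interval):
    """True if x scores at least as well as y for every compatible t."""
    d1, d2 = x[0] - y[0], x[1] - y[1]
    # The difference is linear in t, so its minimum is reached at an endpoint.
    return all(t * d1 + (1 - t) * d2 >= 0 for t in interval)


# Hypothetical learning data: alternative (3, 1) is preferred to (1, 2).
interval = feasible_interval([((3, 1), (1, 2))])

clear = necessarily_preferred((4, 0), (0, 1), interval)
undecided_xy = necessarily_preferred((1, 0), (0, 1), interval)
undecided_yx = necessarily_preferred((0, 1), (1, 0), interval)
```

In this example the last two calls both return False, which shows the incompleteness mentioned above: neither alternative is necessarily preferred to the other. With more criteria or other models such as the Choquet integral, the endpoint check would have to be replaced by an optimization over the set of compatible parameters.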

The introduction of robust preference relations brings many new challenges:

  • algorithmic aspects: how to design efficient algorithms to construct it?
  • explanation: how to explain to the decision maker the recommended robust preferences? In other words, how are the recommendations derived from the learning data?

We will address these points in the talk.

DAPA seminar of 6 / 2 / 2014 at 10h

Active learning in constrained evidential clustering

Violaine Antoine


Location: room 101, corridor 25-26, 4 place Jussieu, 75005 Paris

Evidential clustering (unsupervised evidential classification) is characterized by the use of belief functions, and in particular by the notion of credal partition. This notion generalizes the concepts of hard, fuzzy, probabilistic and possibilistic partitions. It thus makes it possible to measure precisely the uncertainty about the assignment of an object to a class.

Constrained clustering, also called semi-supervised clustering, is an approach that introduces prior knowledge in the form of constraints on the partition sought. We focus here on instance-level constraints: a Must-Link constraint specifies that two objects must be in the same class, whereas a Cannot-Link constraint indicates that two objects belong to different classes. Adding constraints noticeably improves clustering results. However, in real applications it is sometimes difficult to obtain a useful set of constraints. Active learning therefore aims to obtain this information at the lowest cost.
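
To make the two kinds of pairwise constraints concrete, here is a minimal Python sketch that checks a hard partition against Must-Link and Cannot-Link constraints. It deliberately ignores credal partitions and belief functions; the function name and the toy data are assumptions for illustration only.

```python
def violated_constraints(labels, must_link, cannot_link):
    """List the pairwise constraints that a hard partition fails to satisfy.

    labels[i] is the class of object i; constraints are pairs of object indices.
    """
    violated = []
    for i, j in must_link:
        if labels[i] != labels[j]:       # should share a class but do not
            violated.append(("must-link", i, j))
    for i, j in cannot_link:
        if labels[i] == labels[j]:       # should be separated but are not
            violated.append(("cannot-link", i, j))
    return violated


# Hypothetical example: four objects split into two classes.
labels = [0, 0, 1, 1]
result = violated_constraints(
    labels,
    must_link=[(0, 1), (1, 2)],
    cannot_link=[(0, 3), (2, 3)],
)
```

Objects that appear in violated constraints are natural candidates for the kind of query an active learning strategy might send to an expert, although the selection criterion used in the talk relies on the credal partition rather than on this simple check.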

In this presentation, we propose two new constrained clustering algorithms that use the theoretical framework of belief functions. Thanks to the credal partition they return, we can precisely identify the objects that are problematic for the clustering. A new active learning algorithm is then proposed in order to reduce the classification error.

DAPA seminar of 19 / 12 / 2013 at 10h

The rise of graph databases/dataspaces and their relations with Linked Data and Ontologies

André Santanchè

Universidade Estadual de Campinas, Brazil

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Graphs as a data model to represent, store and link data have been receiving increasing attention. Social networks, Linked Data and ontologies share common challenges, which foster research in topics like graph databases and dataspaces. This talk will present an overview of this scenario, emphasizing the following topics: exploiting latent semantics in "social content"; link-driven integration, Linked Data, dataspaces and "pay-as-you-go" integration; topology-aware, IR-inspired metrics for declarative graph querying; from graphs to ontologies. We will present examples in the biology domain.

DAPA seminar of 16 / 12 / 2013 at 16h

Extended Logic Programming and Intelligent System Development

Atsushi INOUE

University of Cincinnati

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

A long-term effort toward a general application framework
for intelligent systems is introduced. Many intelligent systems
adopt a knowledge-based system architecture, and their development
thus differs from other application development. Expressing knowledge
as rules shifts one's perspective from data manipulation to relation
investigation. The talk focuses on our recent progress on two components:
Extended Logic Programming (ELP), the keystone of this framework,
and a multi-view visualization scheme that effectively and
efficiently visualizes the reasoning processes of ELP. A few representative
applications will be showcased as time allows.


K. Springer, M. Henry, A. Inoue, "A General Application Framework for Intelligent Systems,"
The 20th Midwest Artificial Intelligence and Cognitive Science Conference (MAICS2009),
Fort Wayne, IN, pp. 188-195, 2009.

DAPA seminar of 5 / 12 / 2013 at 10h

Granular Models for Time Series Forecasting

Rosangela Ballini

Institute of Economics, University of Campinas, Brazil

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Granular models based on fuzzy clustering are presented as an approach for
time series forecasting. These models are constructed in two phases.
The first one uses the clustering algorithms to find group structures in a
historical database. Two different approaches are discussed: fuzzy c-means
clustering and participatory learning algorithms. Fuzzy c-means clustering,
which is a supervised clustering algorithm, is used to explore similar
data characteristics, such as trend or cyclical components. Participatory
learning induces unsupervised dynamic fuzzy clustering algorithms and
provides an effective alternative to construct adaptive fuzzy systems.
In the second phase, two cases are considered. In the first case, a
regression model is adjusted for each cluster and forecasts are produced
by a weighted combination of the local regression models. In the second
case, prediction data are classified according to the group structure
found in the database. Then, forecasts are produced using the cluster
centers weighted by the degree with which prediction data match the
groups. The weighted combination of local models constitutes a forecasting
approach called granular functional forecasting modeling, and the approach
based on the weighted combination of cluster centers comprises granular
relational forecasting modeling. The effectiveness of the granular
forecasting approaches is verified using three different applications:
average streamflow forecasting, option pricing estimation and modeling of
regime changes in Brazilian nominal interest rates.
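
The second case described above (forecasts built from cluster centers weighted by
degrees of match) can be sketched as follows. This is only an illustration under
stated assumptions: cluster centers are taken as already computed, each center is
split into a hypothetical input part and output value, and the degree of match is
the standard fuzzy c-means membership formula with fuzzifier m.

```python
from math import dist

def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of x in each cluster, given the centers.

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), with d_k the distance to center k.
    """
    d = [dist(x, c) for c in centers]
    if any(v == 0.0 for v in d):
        # x coincides with one or more centers: share full membership among them
        hits = [1.0 if v == 0.0 else 0.0 for v in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in d) for dk in d]


def relational_forecast(x, input_centers, output_centers, m=2.0):
    """Forecast as the membership-weighted combination of the center outputs."""
    u = fcm_memberships(x, input_centers, m)
    return sum(uk * yk for uk, yk in zip(u, output_centers))


# Hypothetical centers: the input part is one lagged value, the output is the next value.
input_centers = [(0.0,), (2.0,)]
output_centers = [10.0, 20.0]

midway = relational_forecast((1.0,), input_centers, output_centers)
near_first = relational_forecast((0.5,), input_centers, output_centers)
```

Replacing each center output with a local regression model evaluated at x would give
a sketch of the first case, the weighted combination of local models.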

DAPA seminar of 7 / 11 / 2013 at 10h

Automated Feature Weighting in Naive Bayes for High-dimensional Data Classification

Shengrui Wang

Université de Sherbrooke

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

This talk is about our recent work in the area of feature weighting for high-dimensional data classification (and clustering). The first part of my talk relates to the Naive Bayes (NB for short) classifier. Currently, in many real-world applications, high dimensionality poses a major challenge to conventional NB classifiers, due to noisy or redundant features and the local relevance of these features to classes. In this work, we propose an automated feature weighting solution to enable the NB method to deal effectively with high-dimensional data. First, a locally weighted probability model will be presented for implementing a soft feature selection scheme. Then an optimization algorithm will be presented to find the weights in linear time complexity, based on the logit-normal prior distribution and the Maximum a Posteriori principle. Experimental studies will show the effectiveness and suitability of the proposed model for high-dimensional data classification.
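
To show how feature weights can act as a soft feature selection in Naive Bayes, here is a minimal Python sketch of a generic weighted NB decision rule for categorical features. It is not the locally weighted model of the talk: the weights are supplied by hand rather than learned by a MAP procedure, and the function names, the Laplace smoothing and the toy data are all assumptions.

```python
from collections import Counter, defaultdict
from math import log

def fit_counts(rows, classes):
    """Count class frequencies and per-class, per-feature value frequencies."""
    class_counts = Counter(classes)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, c in zip(rows, classes):
        for d, v in enumerate(row):
            value_counts[(c, d)][v] += 1
    return class_counts, value_counts


def weighted_nb_predict(x, class_counts, value_counts, weights, n_values):
    """Pick the class maximizing log P(c) + sum_d w[c][d] * log P(x_d | c)."""
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = log(n_c / total)
        for d, v in enumerate(x):
            # Laplace-smoothed estimate of P(x_d = v | c)
            p = (value_counts[(c, d)][v] + 1) / (n_c + n_values[d])
            score += weights[c][d] * log(p)   # weight 0 switches the feature off
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical data: feature 0 determines the class, feature 1 is pure noise.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
classes = ["p", "p", "q", "q"]
class_counts, value_counts = fit_counts(rows, classes)
weights = {"p": [1.0, 0.0], "q": [1.0, 0.0]}   # class-specific weights, set by hand

pred_a = weighted_nb_predict(("a", "y"), class_counts, value_counts, weights, [2, 2])
pred_b = weighted_nb_predict(("b", "x"), class_counts, value_counts, weights, [2, 2])
```

Because the weights are indexed by class as well as by feature, a feature can be treated as relevant for one class and ignored for another, which is one way to read the "local relevance" mentioned in the abstract.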

In the second part of this talk, I will briefly present our work on central clustering of categorical data with automated feature weighting. A novel kernel-density-based definition of cluster center is proposed using a Bayes-type probability estimator. Then, an algorithm called k-centers is proposed, incorporating a new feature weighting scheme by which each attribute is automatically assigned a weight measuring its individual contribution to the clusters.

DAPA seminar of 4 / 11 / 2013 at 10h

Abductive reasoning made easy with Prolog and Constraint Handling Rules

Henning Christiansen

Roskilde University

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Abductive reasoning, or "abduction", means to find a best explanation for some unexpected observation. In a logical setting, an explanation can be a set of facts which, when added to our current knowledge base, makes it possible to prove the truth of the observation and, at the same time, is not inconsistent with the knowledge base. Abduction in this sense is a useful metaphor for many sorts of reasoning aiming at answering "why" or "what" questions such as medical diagnosis, language understanding and decoding of biological sequence data. Furthermore, models of abductive reasoning can lead to practical implementation techniques.
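
The logical definition above can be made concrete with a small brute-force sketch in Python. It is not the Prolog and Constraint Handling Rules technique presented in the talk: it simply enumerates candidate sets of hypotheses over propositional Horn rules, and the function names, rule encoding and example are assumptions for illustration.

```python
from itertools import combinations

def closure(facts, rules):
    """Forward-chain propositional Horn rules (head, body) to a fixed point."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return known


def abduce(observation, rules, abducibles, constraints):
    """Return the smallest sets of abducibles that explain the observation.

    A hypothesis set is an explanation if the observation follows from it and
    no integrity constraint (a group of atoms that must not all hold) is violated.
    """
    for size in range(len(abducibles) + 1):
        found = []
        for hypothesis in combinations(sorted(abducibles), size):
            known = closure(hypothesis, rules)
            consistent = not any(all(a in known for a in group) for group in constraints)
            if observation in known and consistent:
                found.append(set(hypothesis))
        if found:
            return found
    return []


# Hypothetical knowledge base: the grass is wet if it rained or the sprinkler ran.
rules = [("wet_grass", ("rain",)), ("wet_grass", ("sprinkler",))]
abducibles = {"rain", "sprinkler"}
constraints = [("rain", "clear_sky")]   # rain and a clear sky cannot both hold

explanations = abduce("wet_grass", rules, abducibles, constraints)
# Adding the fact clear_sky (a rule with an empty body) rules out rain.
explanations_clear = abduce("wet_grass", rules + [("clear_sky", ())], abducibles, constraints)
```

The enumeration is exponential in the number of abducibles, which is acceptable only for toy examples such as this one.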

Introduced by Peirce, the notion has attracted much attention in philosophy, detective stories and computer science, most notably in logic programming. Until the turn of the millennium, abduction in logic programming was realized through complex meta-interpreters written in Prolog, which may have led to a view of abduction as being some hairy, difficult stuff, far too inefficient for any realistic applications. In this talk, we demonstrate how a fairly powerful version of abductive reasoning can be exercised through a direct use of Prolog, using its extension by Constraint Handling Rules as the engine to take care of abducible hypotheses.

DAPA seminar of 10 / 10 / 2013 at 15h

New Perspectives in Social Data Management / Understanding Similarity Metrics in Neighbour-based Recommender Systems

Sihem Amer-Yahia / Arjen de Vries

Laboratoire d'Informatique de Grenoble / Centrum Wiskunde & Informatica, Amsterdam

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

(15h00-15h45) CNRS Research Director, Sihem Amer-Yahia, Laboratoire d'Informatique de Grenoble
New Perspectives in Social Data Management

The web has evolved from a technology platform to a social milieu where a mix of factual, opinion and behavior data interleave. A number of social applications are being built to analyze and extract value from this data, which encourages us to do data-driven research.
I will describe a perspective on why and how social data management is fundamentally different from data management as it is taught in school today. More specifically, I'll talk about social data preparation, social data exploration and social application validation.
This talk is based on published and ongoing work with colleagues at LIG, UT Austin, U. of Trento, U. of Tacoma, and Google Research.

(15h45-16h30) Professor, Arjen de Vries, Centrum Wiskunde & Informatica, Amsterdam
Understanding Similarity Metrics in Neighbour-based Recommender Systems

Recommender systems aim to predict the content that a user would like, based on observations of the online behaviour of the system's users. Research in the CWI Information Access group addresses different aspects of this problem, ranging from how to measure recommendation results and how recommender systems relate to information retrieval models, to how to build effective recommender systems (note: we won the ACM RecSys 2013 News Recommender Systems challenge!). We would like to develop a general methodology to diagnose the weaknesses and strengths of recommender systems. In this talk, I discuss the initial results of an analysis of the core component of collaborative filtering recommenders: the similarity metric used to find the most similar users (neighbours), who provide the basis for the recommendation to be made. The purpose is to shed light on the question of why certain user similarity metrics have been found to perform better than others. We have studied statistics computed over the distance distribution in the neighbourhood as well as properties of the nearest neighbour graph. The features identified correlate strongly with measured prediction performance; however, we have not yet discovered how to deploy this knowledge to actually improve the recommendations made.
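
For readers unfamiliar with neighbour-based recommenders, the following Python sketch shows two commonly used user similarity metrics (cosine and Pearson, both computed on co-rated items) and the neighbour selection step. The abstract does not say which metrics were analysed, so the choice of metrics, the function names and the ratings are assumptions.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the rating vectors of two users, restricted to co-rated items."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    nu = sqrt(sum(u[i] ** 2 for i in common))
    nv = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson_similarity(u, v):
    """Pearson correlation of two users' ratings over their co-rated items."""
    common = sorted(set(u) & set(v))
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    cov = sum((u[i] - mu) * (v[i] - mv) for i in common)
    su = sqrt(sum((u[i] - mu) ** 2 for i in common))
    sv = sqrt(sum((v[i] - mv) ** 2 for i in common))
    return cov / (su * sv) if su and sv else 0.0


def nearest_neighbours(target, ratings, k, similarity):
    """Names of the k users most similar to the target user."""
    scored = [
        (similarity(ratings[target], r), name)
        for name, r in ratings.items()
        if name != target
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"a": 5, "b": 1, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5},
    "cat": {"a": 1, "b": 5, "c": 2},
}
neighbours = nearest_neighbours("ann", ratings, 1, pearson_similarity)
```

On this toy data both metrics select the same neighbour, but cosine similarity rates even the user with opposite tastes as fairly similar, while Pearson correlation gives that user a negative score. Differences of this kind are one reason the choice of metric can affect prediction quality.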

DAPA seminar of 19 / 9 / 2013 at 15h

Fuzzy Semantic Sentence Similarity Measures

Keeley A Crockett

The Intelligent Systems Group, School of Computing, Maths and Digital Technology, Manchester Metropolitan University

Location: 25-26:105

A problem in the field of semantic sentence similarity is the inability of sentence similarity measures to accurately represent perception based (fuzzy) words that are commonly used in natural language. Given the wide use of fuzzy words in natural language this limits the strength of these measures in the areas where they are practically applied.

This talk briefly reviews traditional semantic word and sentence similarity measures and then describes a new fuzzy measure known as FAST (Fuzzy Algorithm for Similarity Testing). FAST is an ontology-based similarity measure that uses concepts of fuzzy logic and computing with words to allow for the accurate representation of fuzzy words. Through empirical human experimentation, fuzzy sets were created for six categories of words based on their levels of association with particular concepts. These fuzzy sets were then defuzzified and the results used to create new ontological relations between the fuzzy words. These relationships allowed for the creation of a new ontology-based semantic text similarity algorithm that is able to show the effect of fuzzy words on computing sentence similarity as well as the effect that fuzzy words have on non-fuzzy words within a sentence. Initial experiments using FAST are described on two possible future benchmark “fuzzy” datasets. The results show that there was an improved level of correlation between FAST and human test results compared with two traditional sentence similarity measures.

The talk concludes by looking at one potential application area where semantic similarity measures are utilised in a Student Debt Advisor Conversational Agent to remove the need for extensive scripting and maintenance.

DAPA seminar of 19 / 7 / 2013 at 14h

Applications of a new effort based model of software usability

Dan E. Tamir

Texas State University, San Marcos, Texas

Location: 26-00:101

Abstract:
The effort-based model for software usability stems from the notion that usability is an inverse function of effort. This new model of usability can be used for evaluating user interfaces, developing usable software, and pinpointing software usability defects. In this presentation, the underlying theory of the effort-based model, along with pattern recognition techniques, is used to introduce a framework for pinpointing usability deficiencies in software via automatic classification of segments of a video file containing eye tracking results. In addition, we demonstrate how these principles can be used to construct a nondestructive user interface where the user can effectively navigate the web with minimum attention. The approach presented enables deriving web browsers for vehicle drivers and potentially for the blind.

Biography:
Dr. Tamir is an associate professor in the Department of Computer Science, Texas State University, San Marcos, Texas (2005 - to date). He obtained the PhD-CS from Florida State University in 1989, and the MS/BS-EE from Ben-Gurion University, Israel.

From 1996 to 2005, he managed applied research and design in DSP Core technology in Motorola SPS. From 1989 to 1996, he served as an assistant/associate professor in the CS Department at Florida Tech. Between 1983 and 1986, he worked in the applied research division, Tadiran, Israel.

Dr. Tamir is conducting research in combinatorial optimization, computer vision, audio, image, and video compression, human computer interaction, and pattern recognition.

DAPA seminar of 28 / 6 / 2013 at 10h

Modeling Topics and Opinions in Asynchronous Conversations

Giuseppe Carenini

Department of Computer Science, University of British Columbia

Location: Salle Champarnaud 26-00:124

Due to the Internet revolution, human conversational data--in written forms--are accumulating at a phenomenal rate, as more and more people engage in email exchanges, blogging, texting and other social media activities. In this talk, we will present automatic methods for analyzing conversational text generated in asynchronous conversations, i.e., where participants communicate with each other at different times (e.g., email, blog, forum). Our focus will be on novel techniques to detect the topics covered in the conversation, and to identify whether an utterance in the conversation is expressing an opinion and what its polarity is.

Giuseppe Carenini, Associate Professor
Department of Computer Science
University of British Columbia

Giuseppe is an Associate Professor in Computer Science at the University of British Columbia (BC, Canada). He is also a member of the UBC Institute for Computing, Information, and Cognitive Systems (ICICS) and an Associate member of the UBC Institute for Resources, Environment and Sustainability (IRES). Giuseppe has broad interdisciplinary interests. His work on natural language processing and information visualization to support decision making has been published in over 80 peer-reviewed papers. Dr. Carenini was the area chair for “Sentiment Analysis, Opinion Mining, and Text Classification” of ACL 2009 and the area chair for “Summarization and Generation” of NAACL 2012. He has recently co-edited an ACM-TIST Special Issue on “Intelligent Visual Interfaces for Text Analysis”. In July 2011, he published a co-authored book on “Methods for Mining and Summarizing Text Conversations”. In his work, Dr. Carenini has also extensively collaborated with industrial partners, including Microsoft and IBM. Giuseppe was awarded a Google Research Award and an IBM CASCON Best Exhibit Award in 2007 and 2010 respectively.

DAPA seminar of 27 / 6 / 2013 at 10h

Exploration and Exploitation of Scratch Games

Raphaël Féraud

Orange Labs

Location: 25-26:105

We consider a variant of the multi-armed bandit model, which we call scratch games, where the sequences of rewards are finite and drawn in advance with unknown starting dates. This new problem is motivated by online advertising applications where the number of ad displays is fixed according to a contract between the advertiser and the publisher, and where a new ad may appear at any time. The drawn-in-advance assumption is natural for the adversarial approach, where an oblivious adversary is supposed to choose the reward sequences in advance. For the stochastic setting, it is functionally equivalent to an urn where draws are performed without replacement. The non-replacement assumption is suited to the sequential design of non-reproducible experiments, which is often the case in the real world. By adapting the standard multi-armed bandit algorithms to take advantage of this setting, we propose three new algorithms: the first one is designed for adversarial rewards; the second one assumes a stochastic urn model; and the last one is based on a Bayesian approach. For the adversarial and stochastic approaches, we provide upper bounds on the regret that compare favorably with those of Exp3 and UCB1. We also confirm experimentally that these algorithms compare favorably with Exp3, UCB1 and Thompson Sampling by simulation with synthetic models and ad-serving data.

Keywords: adversarial multi-armed bandits; stochastic multi-armed bandits; finite sequences; scratch games.
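
To illustrate the stochastic urn setting, the sketch below runs the standard UCB1 baseline on finite urns drawn without replacement. It is not one of the three algorithms proposed in the talk, and it ignores unknown starting dates; the function name, the `horizon` parameter and the example urns are assumptions.

```python
import random
from math import log, sqrt

def ucb1_on_urns(urns, horizon, seed=0):
    """Play UCB1 for at most `horizon` rounds on finite reward urns.

    Each urn is a list of rewards in [0, 1] drawn without replacement;
    an exhausted urn can no longer be played.
    Returns (total reward, number of draws per urn).
    """
    rng = random.Random(seed)
    tickets = [list(u) for u in urns]
    for t in tickets:
        rng.shuffle(t)  # fixes the (unknown) draw order in advance
    counts = [0] * len(tickets)
    sums = [0.0] * len(tickets)
    total = 0.0
    for step in range(1, horizon + 1):
        live = [i for i, t in enumerate(tickets) if t]
        if not live:
            break  # every urn is empty
        untried = [i for i in live if counts[i] == 0]
        if untried:
            arm = untried[0]  # play each urn once before using the index
        else:
            arm = max(
                live,
                key=lambda i: sums[i] / counts[i] + sqrt(2 * log(step) / counts[i]),
            )
        reward = tickets[arm].pop()
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


# Hypothetical urns: one that always pays, one that never does.
total, counts = ucb1_on_urns([[1.0] * 50, [0.0] * 50], horizon=40)
```

UCB1 treats each urn as if rewards were drawn with replacement, so it does not use the fact that the remaining content of an urn changes as tickets are scratched; exploiting that information is what the setting described above makes possible.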

DAPA seminar of 13 / 6 / 2013 at 10h

Towards handling imprecision from the construction of geographic information systems through to data visualization: an approach based on fuzzy set theory

Cyril de Runz

CReSTIC, IUT de Reims Châlons Charleville

Location: 25-26:105

This work addresses the handling of real spatiotemporal data while taking into account their imperfection, and more specifically their imprecision. The approach presented aims at better management of such data, for their representation as well as for their analysis and visualization. In this context, our contributions concern both the design and construction of geographic information systems and the querying and (possibly visual) exploration of the data. Our methods rely in particular on the definition of visual pictograms extending UML, fuzzy temporal indices, graphs, ranks, data-driven colouring, etc. The approach will be illustrated with applications to data from preventive archaeology and with prospective case studies in agronomy and urban planning.

DAPA seminar of 6 / 6 / 2013 at 10h

Experiments with Probabilistic Logic Programming applied to Biological Sequence Analysis

Ole Torp Lassen

Roskilde University, Denmark - at LIP6 since 01/04/2013

Location: 25-26:105

Systems that combine logic programming and statistical inference in theory allow machine learning systems to deal with both relational and statistical information. In practice, however, such applications do not scale very well. The LoSt project was concerned with a compositional approach to overcome those challenges. In particular, we experimented with applying one probabilistic logic programming system, PRISM (Taisuke Sato & Yoshitaka Kameya), based on B-Prolog, to complex, large-scale bioinformatics problems. Firstly, some important aspects of the PRISM system and its underlying implementation were optimised for application to large-scale data. Secondly, we developed a compositional method of analysis, Bayesian Annotation Networks, where the complex overall task is approximated by identifying and negotiating interdependent constituent subtasks and, in turn, integrating their analytical results according to their interdependencies. Finally, we experimented extensively with the developed framework in the domain of prokaryotic gene-finding. As part of the general domain of DNA annotation, the task of gene-finding is characterized by large sets of extremely long and highly ambiguous sequences of data and, thus, represents a suitably challenging setting for efficient analysis. In general, we concluded that with the computing power of today, probabilistic logic programming systems, as exemplified by PRISM, can be applied efficiently, even in large-scale domains. As such, probabilistic logic programming offers extremely expressive models with very clear semantics, facilitating increased focus on domain properties and less on programming complexity.