List of our seminars

(reverse chronological order)


DAPA seminar of 27/6/2013 at 10:00

Exploration and Exploitation of Scratch Games

Raphaël Féraud


Orange Labs

Location: room 25-26:105

We consider a variant of the multi-armed bandit model, which we call scratch games, where the sequences of rewards are finite and drawn in advance with unknown starting dates. This new problem is motivated by online advertising applications where the number of ad displays is fixed according to a contract between the advertiser and the publisher, and where a new ad may appear at any time. The drawn-in-advance assumption is natural for the adversarial approach, where an oblivious adversary is supposed to choose the reward sequences in advance. For the stochastic setting, it is functionally equivalent to an urn from which draws are performed without replacement. The non-replacement assumption is suited to the sequential design of non-reproducible experiments, which is often the case in the real world. By adapting the standard multi-armed bandit algorithms to take advantage of this setting, we propose three new algorithms: the first one is designed for adversarial rewards; the second one assumes a stochastic urn model; and the last one is based on a Bayesian approach. For the adversarial and stochastic approaches, we provide upper bounds on the regret that compare favorably with those of Exp3 and UCB1. We also confirm experimentally, through simulations with synthetic models and ad-serving data, that these algorithms compare favorably with Exp3, UCB1 and Thompson Sampling.

Keywords: adversarial multi-armed bandits; stochastic multi-armed bandits; finite sequences; scratch games.
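
To make the without-replacement setting concrete, here is a minimal Python sketch (an illustration only, not one of the three algorithms proposed in the talk) of a UCB1-style policy playing "scratch games", i.e. arms whose finite reward sequences are drawn in advance and consumed without replacement; the arm lengths and success probabilities are invented for the example.

    import math
    import random

    def ucb1_on_scratch_games(reward_sequences):
        """Play a UCB1-style policy on 'scratch games': each arm is a finite
        reward sequence drawn in advance and consumed without replacement.
        Illustrative sketch only, not the algorithms from the talk."""
        counts = [0] * len(reward_sequences)
        sums = [0.0] * len(reward_sequences)
        total_reward, t = 0.0, 0
        while any(counts[i] < len(seq) for i, seq in enumerate(reward_sequences)):
            t += 1
            live = [i for i, seq in enumerate(reward_sequences) if counts[i] < len(seq)]
            untried = [i for i in live if counts[i] == 0]
            if untried:
                arm = untried[0]          # play every live arm once first
            else:
                arm = max(live, key=lambda i: sums[i] / counts[i]
                          + math.sqrt(2.0 * math.log(t) / counts[i]))
            reward = reward_sequences[arm][counts[arm]]   # next pre-drawn reward
            counts[arm] += 1
            sums[arm] += reward
            total_reward += reward
        return total_reward

    random.seed(0)
    # Two finite arms with different lengths and success probabilities.
    arms = [[1 if random.random() < p else 0 for _ in range(n)]
            for p, n in [(0.1, 500), (0.3, 200)]]
    print(ucb1_on_scratch_games(arms))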


DAPA seminar of 13/6/2013 at 10:00

Towards managing imprecision from the construction of geographic information systems to data visualization: an approach based on fuzzy set theory

Cyril de Runz


CReSTIC, IUT de Reims Châlons Charleville

Location: room 25-26:105

This work addresses the manipulation of real spatiotemporal data while taking their imperfection, and more specifically their imprecision, into account. The approach presented is driven by the goal of better handling such data, for their representation as well as for their analysis and visualization. In this context, our contributions concern both the design and construction of geographic information systems and the querying and (possibly visual) exploration of the data. Our methods rely in particular on the definition of visual pictograms extending UML, fuzzy temporal indices, graphs, ranks, data-driven coloring, and so on. The approach will be illustrated with applications to data from preventive archaeology and with prospective case studies in agronomy and urban planning.


DAPA seminar of 6/6/2013 at 10:00

Experiments with Probabilistic Logic Programming applied to Biological Sequence Analysis

Ole Torp Lassen


Roskilde University, Denmark; at LIP6 since 01/04/2013

Location: room 25-26:105

Systems that combine logic programming and statistical inference in theory allow machine learning systems to deal with both relational and statistical information. In practice, however, such applications do not scale very well. The LoSt project was concerned with a compositional approach to overcoming those challenges. In particular, we experimented with applying one probabilistic logic programming system, PRISM (Taisuke Sato & Yoshitaka Kameya), based on B-Prolog, to complex, large-scale bioinformatics problems. Firstly, some important aspects of the PRISM system and its underlying implementation were optimised for application to large-scale data. Secondly, we developed a compositional method of analysis, Bayesian Annotation Networks, where the complex overall task is approximated by identifying and negotiating interdependent constituent subtasks and, in turn, integrating their analytical results according to their interdependencies. Finally, we experimented extensively with the developed framework in the domain of prokaryotic gene finding. As part of the general domain of DNA annotation, the task of gene finding is characterized by large sets of extremely long and highly ambiguous sequences of data and thus represents a suitably challenging setting for efficient analysis. In general, we concluded that with the computing power of today, probabilistic logic programming systems, as exemplified by PRISM, can be applied efficiently, including in large-scale domains. As such, probabilistic logic programming offers extremely expressive models with very clear semantics, allowing increased focus on domain properties and less on programming complexity.


DAPA seminar of 30/5/2013 at 10:30

Exploring Categories of Uncertainty: Toward a Structure of Uncertainty

Michio Sugeno


Tokyo Institute of Technology, Japan and European Centre for Soft Computing, Spain

Location: room 25-26:105

As a conventional concept of uncertainty, we are familiar with the 'probability' of a phenomenon, introduced in the 17th century. We also often discuss the 'uncertainty' of knowledge. More recently, Fuzzy Theory has brought a hidden uncertainty, 'fuzziness', to light. Reflections on these ideas lead to a fundamental question: what kinds of uncertainty are we aware of? Motivated by this question, this study aims to explore categories and modalities of uncertainty. For instance, we have found that (i) 'form' is a category of uncertainty; (ii) 'inconsistency' is a modality of uncertainty; (iii) the inconsistency of form is one of the major uncertainties. Through the classification of adjectives implying various uncertainties, we elucidate seven uncertainties (or nine if subcategories are counted) and identify three essential ones among them, such as the fuzziness of wording. Finally, the structure of uncertainty will be shown. The obtained structure is verified by psychological experiments, while the validity of the three essential uncertainties is examined by linguistic analysis.


DAPA seminar of 16/5/2013 at 10:00

Introduction to Active Sets

Germano Resconi


Department of Mathematics and Physics, Catholic University, Brescia, Italy

Location: room 25-26:105

An active set is a unifying space able to act as a "bridge" for transferring information, ideas and results between distinct types of uncertainty and different types of applications. An active set is a set of agents who independently deliver true or false values for a given proposition. An active set is not simply a vector of logic values for different propositions: the result of an evaluation is a vector, but the set itself is not.
The difference between an ordinary set and an active set is that an ordinary set has passive elements whose attribute values are defined by an external agent, whereas in an active set every element is an agent that internally defines the value of a given attribute for a passive element.
The agents in an active set each give, according to their own criteria, a logic value for the same attribute. Agents are therefore often in logical conflict, and this generates semantic uncertainty in the logic evaluation. Criteria and agents are the two variables through which different logic values are assigned to the same attribute or proposition. Active sets go beyond modal logic. In modal logic, a proposition can only be evaluated once we know the world in which it is located, and when we evaluate a proposition in one world we cannot evaluate the same proposition in another world. In epistemic logic, every world is an agent that knows whether the proposition is true or false. An active set is a set of agents, as in epistemic logic, but the difference with modal logic is that the agents (worlds) are not separate: they are joined in the evaluation of the given proposition. In an active set, for one agent and one criterion we have one logic value, but for many agents and criteria the evaluation is not a single true or false value but a matrix of true and false values. This matrix is not only a logic evaluation, as in modal logic, but also reveals the conflict structure of the active-set evaluation. The agent matrix is a vector subspace of the multi-dimensional true/false agent space. Operations on active sets include the operations of traditional sets, fuzzy sets and rough sets as special cases. The multi-dimensional agent space used to evaluate active sets also includes the multi-dimensional Hilbert space, where it is possible to simulate quantum logic gates. New logic operations become possible, such as fuzzy gate operations and more complex operations such as conflict resolution, consensus operations, syntactic inconsistency, semantic inconsistency and knowledge integration. In the space of agent evaluations, morphotronic geometric operations are a new frontier for modelling new types of computers and new models for wireless communication such as cognitive radio. In conclusion, active sets open up new possibilities and new models for logic.
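
As a toy illustration of the "matrix of true and false" evaluation described above, the following sketch (agent names, criteria and the evaluation rule are all invented for the example) builds the agent-by-criterion truth matrix for one proposition and a crude measure of how conflicting the answers are.

    # Toy "active set": every agent (row), under every criterion (column),
    # assigns its own truth value to the same proposition, so the
    # evaluation is a matrix of True/False rather than a single value.
    agents = ["ann", "bob", "carol"]
    criteria = ["visual", "statistical"]

    def evaluate(agent, criterion, proposition):
        # Stand-in for each agent's internal, criterion-dependent logic.
        return (len(agent) + len(criterion) + len(proposition)) % 2 == 0

    proposition = "x is tall"
    matrix = [[evaluate(a, c, proposition) for c in criteria] for a in agents]

    # Fraction of agent/criterion pairs answering True: values strictly
    # between 0 and 1 indicate a logical conflict among the agents.
    agreement = sum(v for row in matrix for v in row) / (len(agents) * len(criteria))
    print(matrix, agreement)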


DAPA seminar of 15/5/2013 at 10:00

Neuromuscular Modelling and Analysis of Handwriting: from Automatic Generation to Biomedical and Neurocognitive Applications

Réjean Plamondon


Laboratoire Scribens, Département de Génie Électrique, École Polytechnique de Montréal

Location: room 25-26:105

Many models have been proposed over the years to study human movements in general and handwriting in particular: models relying on neural networks, dynamic models, psychophysical models, kinematic models and models exploiting minimization principles. Among the models that can be used to provide analytical representations of a pen stroke, the Kinematic Theory of rapid human movements and its family of lognormal models has often served as a guide in the design of pattern recognition systems relying on the exploitation of fine neuromotricity, such as on-line handwriting segmentation and signature verification, as well as in the design of intelligent systems involving, in one way or another, the global processing of human movements. Among other things, this lecture aims at elaborating a theoretical background for many handwriting applications as well as providing some basic knowledge that could be integrated or taken into account in the development of new automatic pattern recognition systems to be exploited in biomedical engineering and cognitive neurosciences.

More specifically, we will overview the basic neuromotor properties of single strokes and explain how they can be superimposed vectorially to generate complex pen tip trajectories. In doing so, we will report on various projects conducted by our team and our collaborators. First, we will present a brief comparative survey of the different lognormal models. Then, from a practical perspective, we will describe some parameter extraction algorithms suitable for the reverse engineering of individual strokes as well as of complex handwriting signals. We will show how the resulting representation can be employed to characterize signers and writers and how the corresponding feature sets can be exploited to study the effects of various factors, such as aging and health problems, on handwriting variability. We will also describe some methodologies for automatically generating huge on-line handwriting databases for either writer-dependent or writer-independent applications, as well as for the production of synthetic signature databases. From a theoretical perspective, we will explain how, using an original psychophysical setup, we have been able to validate the basic hypothesis of the Kinematic Theory and to test its most distinctive predictions. We will complete this survey by explaining how the Kinematic Theory could be utilized to improve some signal processing techniques, opening a window on novel potential applications for on-line handwriting processing, particularly to provide benchmarks for analyzing children's handwriting learning, to study aging effects on neuromotor control, and to develop diagnostic systems for neuromuscular disorders. To illustrate this latter point, we will report typical results obtained so far for the assessment of the most important modifiable risk factors for brain stroke (diabetes, hypertension, hypercholesterolemia, obesity, cardiac problems, cigarette smoking).
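
For readers unfamiliar with the Kinematic Theory, the sketch below shows the general shape of the idea: each rapid stroke has a lognormal speed profile, and a pen-tip trajectory is obtained by superimposing strokes vectorially. It is a deliberately simplified illustration (fixed stroke directions, made-up parameter values); the full Sigma-Lognormal model lets the direction evolve along each stroke.

    import numpy as np

    def lognormal_speed(t, D, t0, mu, sigma):
        """Speed profile of one rapid stroke in the Kinematic Theory:
        a lognormal impulse response of amplitude D starting at time t0."""
        s = np.zeros_like(t)
        m = t > t0
        x = t[m] - t0
        s[m] = D / (sigma * np.sqrt(2 * np.pi) * x) * \
            np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2))
        return s

    t = np.linspace(0.0, 1.5, 1501)
    # Two illustrative strokes superimposed vectorially along fixed directions.
    strokes = [dict(D=5.0, t0=0.05, mu=-1.5, sigma=0.30, angle=0.0),
               dict(D=3.0, t0=0.25, mu=-1.2, sigma=0.25, angle=np.pi / 3)]
    vx = sum(lognormal_speed(t, s["D"], s["t0"], s["mu"], s["sigma"]) * np.cos(s["angle"])
             for s in strokes)
    vy = sum(lognormal_speed(t, s["D"], s["t0"], s["mu"], s["sigma"]) * np.sin(s["angle"])
             for s in strokes)
    # Pen-tip trajectory obtained by integrating the summed velocity.
    x = np.cumsum(vx) * (t[1] - t[0])
    y = np.cumsum(vy) * (t[1] - t[0])
    print(x[-1], y[-1])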


DAPA seminar of 11/4/2013 at 10:00

Co-clustering from different perspectives: models and algorithms

Mohamed Nadif


LIPADE, Université Paris Descartes

Location: room 25-26:105

Clustering has become an important tool and has developed considerably in recent years. Although clustering procedures are numerous and most of them aim to build an optimal partition of the objects (rows) or of the variables (columns), there are other methods, known as co-clustering methods, that consider the two sets simultaneously and seek to organize the data into homogeneous blocks. Compared with classical clustering methods, co-clustering algorithms have proven effective at discovering structure in large data matrices (in rows and/or columns), whether sparse or not. In my presentation, I will consider several approaches, with particular emphasis on the mixture approach based on Latent Block Models and the factorization approach based on Nonnegative Matrix Tri-Factorization.
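
As a reference point for the factorization approach mentioned above, here is a compact sketch of plain multiplicative updates for nonnegative matrix tri-factorization, X ~ F S G^T, where F and G can be read as row- and column-cluster memberships. It is a generic formulation with invented data, not the specific models of the talk, and it omits the orthogonality constraints usually added for co-clustering.

    import numpy as np

    def nmtf(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
        """Nonnegative matrix tri-factorization X ~ F @ S @ G.T with
        multiplicative updates minimizing the squared Frobenius error.
        F: row-cluster memberships, G: column-cluster memberships."""
        rng = np.random.default_rng(seed)
        n, m = X.shape
        F = rng.random((n, k_rows))
        S = rng.random((k_rows, k_cols))
        G = rng.random((m, k_cols))
        for _ in range(n_iter):
            F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
            G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
            S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        return F, S, G

    # Tiny block-structured matrix: two row blocks x two column blocks.
    X = np.block([[np.full((4, 3), 5.0), np.full((4, 5), 0.5)],
                  [np.full((6, 3), 0.5), np.full((6, 5), 4.0)]])
    F, S, G = nmtf(X, 2, 2)
    print(np.round(F @ S @ G.T - X, 1))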


DAPA seminar of 28/3/2013 at 11:00

Spike-based computing and learning in brains, machines, and visual systems in particular

Timothée Masquelier


Adaptive NeuroComputation Group, Lab. of Neurobiology of Adaptive Processes, UPMC

Location: room 25-26:105

Using simulations, we have first shown that, thanks to the physiological learning mechanism referred to as Spike Timing-Dependent Plasticity (STDP), neurons can detect and learn repeating spike patterns, in an unsupervised manner, even when those patterns are embedded in noise [1,2]. Importantly, the spike patterns do not need to repeat exactly: it also works when only a firing probability pattern repeats, provided this profile has narrow (10-20 ms) temporal peaks [3]. Brain oscillations may help in getting the required temporal precision [4], in particular when dealing with slowly changing stimuli. All together, these studies show that some envisaged problems associated with spike timing codes, in particular noise resistance, the need for a reference time, or the decoding issue, might not be as severe as once thought. These generic STDP-based mechanisms are probably at work in the visual system in particular, where they can explain how selectivity to visual primitives emerges [5,6], leading to very reactive systems. I am now investigating whether they are also at work in the somatosensory system. Finally, these mechanisms are also appealing for neuromorphic engineering: they can be efficiently implemented in hardware, leading to fast systems with self-learning abilities [7].

References
1 Masquelier, T. et al. (2008) Spike timing dependent plasticity finds the start of repeating patterns in continuous spike trains. PLoS ONE 3, e1377
2 Masquelier, T. et al. (2009) Competitive STDP-Based Spike Pattern Learning. Neural Comput 21, 1259–1276
3 Gilson, M. et al. (2011) STDP allows fast rate-modulated coding with Poisson-like spike trains. PLoS Comput Biol 7, e1002231
4 Masquelier, T. et al. (2009) Oscillations, phase-of-firing coding, and spike timing-dependent plasticity: an efficient learning scheme. The Journal of neuroscience 29, 13484–93
5 Masquelier, T. and Thorpe, S.J. (2007) Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Comput Biol 3, e31
6 Masquelier, T. (2012) Relative spike time coding and STDP-based orientation selectivity in the early visual system in natural continuous and saccadic vision: a computational model. Journal of computational neuroscience 32, 425–41
7 Zamarreño-Ramos, C. et al. (2011) On Spike-Timing-Dependent-Plasticity, Memristive Devices, and Building a Self-Learning Visual Cortex. Frontiers in neuroscience 5, 22
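
As a pointer for readers new to STDP, the sketch below implements a standard pair-based rule with exponential windows, the generic mechanism the abstract refers to; the time constants, learning rates and spike times are illustrative and do not reproduce the models of the cited papers.

    import math

    def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=0.017, tau_minus=0.034):
        """Pair-based STDP: weight change as a function of dt = t_post - t_pre
        (in seconds). Potentiation when the presynaptic spike precedes the
        postsynaptic one, depression otherwise."""
        if dt >= 0:
            return a_plus * math.exp(-dt / tau_plus)
        return -a_minus * math.exp(dt / tau_minus)

    # Apply the rule to a few pre/post spike-time pairs, clipping the weight.
    pairs = [(0.100, 0.105), (0.200, 0.195), (0.300, 0.340)]
    w = 0.5
    for t_pre, t_post in pairs:
        w = min(1.0, max(0.0, w + stdp_dw(t_post - t_pre)))
    print(w)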


DAPA seminar of 14/3/2013 at 10:00

Intelligent Applications for Innovative Design

Jean-François Omhover


Arts et Métiers ParisTech, Laboratoire CPI

Location: room 25-26:105

Research in product innovation and design has progressively moved towards the development of intelligent digital tools to support designers' activity: information retrieval adapted to design professions, generation of concepts or shapes for design, indexing and manipulation of semantic data linked to the product model, intelligent middleware for programming design tools, and so on. The need is particularly acute in the upstream phases of the innovation process (requirements gathering, creativity, conceptual design), in which the data being manipulated are ill-defined, ambiguous or fuzzy.


The convergence between the industrial engineering community and the machine learning community is therefore under way, but it is not without problems. On the design side, the opportunities offered by these technologies are poorly mastered: designers' needs have to be translated into technological terms while keeping feasibility under control. On the machine learning side, designers' needs cannot be used directly for research and technology development, and the poor definition of the problem often prevents the underlying technological obstacles from being identified.


We are therefore interested in the question of how to identify, design and experiment with intelligent technologies in order to build systems suited to designers' needs.

Beyond designers, this question of adapting machine learning technologies to users' needs is now amplified by the emergence of new application areas that will require cooperation between product design and artificial intelligence (the Internet of Things, open data).


We will therefore tackle this problem through four examples of building systems that support design activity, identifying the critical phases that the development of such intelligent tools goes through:

- Requirements gathering and formalization: a system to support the choice of product design methods [PhD thesis of Nathalie Lahonde]


- Requirements translation: an experiment on categorizing keywords taken from designers' verbatim reports [KENSYS project]


- Conceptual and detailed design: a proposed system to support the creative generation of sketches for designers [ANR GENIUS project, www.genius-anr.org]


- Evaluation: a system to support the search for inspiration [TRENDS project, www.trendsproject.org]


DAPA seminar of 22/11/2012 at 10:00

Global Seismic Monitoring: A Bayesian Approach

Stuart Russell*


Computer Science Division, University of California, Berkeley and LIP6, Université Pierre et Marie Curie

Location: room 25-26:105

The United Nations Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO) has developed the International Monitoring System, a global network of seismic stations, to detect potential treaty violations. CTBTO software analyses the signals from this network to detect and localize the seismic events that caused them.  This analysis problem can be reformulated in a Bayesian framework.  I will describe a Bayesian seismic monitoring system, NET-VISA, based on generative probabilistic models of event occurrence and signal transmission and detection. NET-VISA reduces the number of missed events by a factor of 2 to 3 compared to the currently deployed system. It also finds events that are missed even by CTBTO's expert analysts.
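
To give a flavour of what "generative probabilistic models of event occurrence and signal transmission and detection" means, here is a toy sketch: events are sampled at random, and each station detects an event with a distance-dependent probability and records a noisy arrival time. Every number and distribution in it is an invented placeholder, not part of NET-VISA.

    import math
    import random

    random.seed(1)

    def generate_episode(max_events=5, n_stations=5, velocity=0.05):
        """Toy generative model: sample seismic events, then noisy and
        possibly missed detections at each station (illustration only)."""
        stations = [(random.uniform(-1, 1), random.uniform(-1, 1))
                    for _ in range(n_stations)]
        events, detections = [], []
        for _ in range(random.randint(0, max_events)):      # event occurrence
            ev = (random.uniform(-1, 1), random.uniform(-1, 1),
                  random.uniform(0.0, 100.0))               # (x, y, origin time)
            events.append(ev)
            for s, (sx, sy) in enumerate(stations):
                dist = math.hypot(ev[0] - sx, ev[1] - sy)
                if random.random() < math.exp(-dist):        # detection probability
                    arrival = ev[2] + dist / velocity + random.gauss(0.0, 1.0)
                    detections.append((s, arrival))          # noisy arrival time
        return events, detections

    events, detections = generate_episode()
    print(len(events), "events,", len(detections), "detections")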

--
* The speaker is supported by, and this talk is given under the auspices of, the Blaise Pascal International Research Chair, funded by the French State and the Île-de-France Region and administered by the Fondation de l'École Normale Supérieure.


DAPA seminar of 8/11/2012 at 14:00

Artificial Evolution and Adaptive Collective Robotics

Nicolas Bredèche


Institut des Systèmes Intelligents et de Robotique (ISIR), UPMC

Location: room 25-26:105

Within the European project Symbrion, we study the possibility of designing robot swarms capable of surviving in unknown, and potentially changing, environments. In this talk, I will present our recent work on the design of distributed algorithms that guarantee the online self-adaptation of a robot swarm, set within the framework of so-called artificial evolution algorithms. Artificial evolution makes it possible to tackle problems for which the task is only partially or incompletely defined, and where the search space bears little relation to the solution space. In particular, I will present the results of an experiment conducted with about twenty real robots, as well as several recent studies involving the interaction between a constrained environment and the self-adaptation process.


DAPA seminar of 25/10/2012 at 10:00

Prototypicality gradients, semantic similarity and proximity measures: a contribution to Ontology Engineering

Xavier Aimé


INSERM

Location: room 25-26:105

In cognitive psychology, the notion of prototype plays a central role in conceptual representations. In our work, we propose to introduce this notion into the activities of Ontology Engineering and its representation models. The semiotic approach we have developed is based on the three dimensions of a conceptualization: intension (the properties), expression (the terms) and extension (the instances). It integrates, within the ontology, additional knowledge specific to the user (weighting of properties, corpora, instances). In practice, this amounts to weighting the "is-a" links, the terms and the instances of a concept hierarchy by means of conceptual, lexical and extensional prototypicality gradients, respectively. We have extended our approach to the definition of two new semantic measures, drawing inspiration from the similarity and proximity laws of perception theory. We will present various use cases of these measures (information retrieval, ontology validation, etc.).


DAPA seminar of 11/10/2012 at 10:00

Stock prices, uncertainty and media coverage

Marie-Aude Laguna


DRM-Finance, Université Paris-Dauphine

Location: room 25-26:105

After describing how stock prices react to financial announcements in the presence of uncertainty, we will present the main empirical challenges raised by the study of the role of the media in stock markets: in particular, how to distinguish the informational content of announcements from the informational context of the markets.
The presentation will draw on the particular case of the response of markets and of the media to industrial accidents.


DAPA seminar of 8/10/2012 at 14:00

Global Technology Outlook (GTO) 2012

C. Mohan


IBM Fellow and Former IBM India Chief Scientist

Location: room 25-26:101

The Global Technology Outlook (GTO) is IBM Research's vision of the future of information technology (IT) and its impact on industries that use IT. This annual exercise highlights emerging software, hardware, and services technology trends that are expected to significantly impact the IT sector in the next 3-7 years. In particular, the GTO identifies technologies that may be disruptive to an existing business, have the potential to create new opportunities, and can provide new business value to our customers. The 2012 GTO builds not only on its 30 predecessors but also on 100 years of IBM innovation. It reports on six key findings that share a common theme: analytics. The explosion of unstructured, and increasingly uncertain, data will amplify the need for new models and new classes of computing systems that can handle the unique demands of analytics. The 2012 GTO focuses on six topics: Managing Uncertain Data at Scale, Systems of People, Outcome Based Business, Resilient Business and Services, Future of Analytics, and The Future Watson. In this talk, I will share the GTO 2012 findings with the audience. This talk should be of interest not only to technical people but also to a much broader audience.

Bio:

Dr. C. Mohan has been an IBM researcher for 30 years in the information management area, impacting numerous IBM and non-IBM products, the research community and standards, especially with his invention of the ARIES family of locking and recovery algorithms and the Presumed Abort commit protocol. This IBM, ACM and IEEE Fellow has also served as the IBM India Chief Scientist. In addition to receiving the ACM SIGMOD Innovation Award, the VLDB 10 Year Best Paper Award and numerous IBM awards, he has been elected to the US and Indian National Academies of Engineering and has been named an IBM Master Inventor. This distinguished alumnus of IIT Madras received his PhD at the University of Texas at Austin. He is an inventor on 37 patents. He serves on the advisory board of IEEE Spectrum and on the IBM Software Group Architecture Board's Council. More information can be found on his home page at http://bit.ly/cmohan


DAPA seminar of 4/10/2012 at 17:00

Analytics, Cloud-Computing, and Crowdsourcing – or How To Destroy My Job...

Piero P. Bonissone

Chief Scientist, SSA, GE Global Research

Biography:

A Chief Scientist at GE Global Research, Dr. Bonissone has been a pioneer in the field of fuzzy logic, AI, soft computing, and approximate reasoning systems applications since 1979. Recently he has led a Soft Computing (SC) group in the development of SC applications to diagnostics and prognostics of processes and products, including the prediction of the remaining life of each locomotive in a fleet in order to perform efficient asset selection. His current interests are the development of multi-criteria decision-making systems for PHM and the automation of the intelligent systems lifecycle to create, deploy, and maintain SC-based systems, providing customized performance while adapting to avoid obsolescence.

He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), of the Association for the Advancement of Artificial Intelligence (AAAI), of the International Fuzzy Systems Association (IFSA), and a Coolidge Fellow at GE Global Research. He is the recipient of the 2012 Fuzzy Systems Pioneer Award from the IEEE Computational Intelligence Society. Since 2010, he has been the President of the Scientific Committee of the European Centre for Soft Computing. In 2008 he received the II Cajastur International Prize for Soft Computing from the European Centre for Soft Computing. In 2005 he received the Meritorious Service Award from the IEEE Computational Intelligence Society. He has received two Dushman Awards from GE Global Research. He served as Editor-in-Chief of the International Journal of Approximate Reasoning for 13 years. He is on the editorial board of five technical journals and is Editor-at-Large of the IEEE Computational Intelligence Magazine. He has co-edited six books and has over 150 publications in refereed journals, book chapters, and conference proceedings, with an H-Index of 30 (by Google Scholar). He has 65 patents issued by the US Patent Office (plus 15 pending). From 1982 until 2005 he was an Adjunct Professor at Rensselaer Polytechnic Institute, in Troy, NY, where he supervised 5 PhD theses and 33 Master's theses. He has co-chaired 12 scientific conferences and symposia focused on Multi-Criteria Decision-Making, Fuzzy Sets, Diagnostics, Prognostics, and Uncertainty Management in AI. Dr. Bonissone is very active in the IEEE, where he was a member of the Fellow Evaluation Committee from 2007 to 2009. In 2002, while serving as President of the IEEE Neural Networks Society (now CIS), he was also a member of the IEEE Technical Activities Board (TAB). He has been an Executive Committee member of the NNC/NNS/CIS society since 1993 and an IEEE CIS Distinguished Lecturer since 2004.


General Electric Global Research

Location: LIP6 (UPMC), room 25-26:105

We are witnessing the resurgence of analytics as a key differentiator for creating new services, the emergence of cloud computing as a disrupting technology for service delivery, and the growth of crowdsourcing as a new phenomenon in which people play critical roles in creating information and shaping decisions in a variety of problems. After introducing the first two (well-known) concepts, we will analyze some of the opportunities created by the advent of crowdsourcing. Then, we will explore the intersections of these three concepts. We will examine their evolution from the perspective of a professional machine-learning researcher and try to understand how his job and roles have evolved over time. In the past, analytic model building was an artisanal process, as models were handcrafted by an experienced, knowledgeable model-builder. More recently, the use of meta-heuristics (such as evolutionary algorithms) has provided us with limited levels of automation in model building and maintenance. In the not so distant future, we expect analytic models to become a commodity. We envision having access to a large number of data-driven models, obtained by a combination of crowdsourcing, crowdservicing, cloud-based evolutionary algorithms, outsourcing, in-house development, and legacy models. In this new context, the critical issue will be model ensemble selection and fusion, rather than model generation. We address this issue by proposing customized model ensembles on demand, inspired by Lazy Learning. In our approach, referred to as Lazy Meta-Learning, for a given query we find the most relevant models in a database of models, using their meta-information. After retrieving the relevant models, we select a subset of models with highly uncorrelated errors (unless diversity was injected in their design process). With these models we create an ensemble and use their meta-information for dynamic bias compensation and relevance weighting. The output is a weighted interpolation or extrapolation of the outputs of the models in the ensemble. The confidence interval around the output shrinks as we increase the number of uncorrelated models in the ensemble. This approach is agnostic with respect to the genesis of the models, making it scalable and suitable for a variety of applications. We have successfully tested this approach on a regression problem for a power plant management application, using two different sources of models: bootstrapped neural networks, and GP-created symbolic regressors evolved on a cloud.
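
A stripped-down sketch of the on-demand ensemble idea described above: given a query, retrieve the models whose meta-information marks them as most relevant, then return a relevance-weighted combination of their predictions. The model "database", the distance-based relevance weights and the absence of bias compensation are simplifying assumptions for illustration, not GE's implementation.

    # Hypothetical "database" of models: each entry holds a predictor plus
    # meta-information (here, simply the centre of its region of competence).
    model_db = [
        {"centre": 0.0,  "predict": lambda x: 2.0 * x},
        {"centre": 5.0,  "predict": lambda x: x + 5.0},
        {"centre": 10.0, "predict": lambda x: 0.5 * x + 10.0},
    ]

    def lazy_ensemble(query, db, k=2):
        """Select the k models whose meta-information is most relevant to the
        query and return a relevance-weighted combination of their outputs."""
        ranked = sorted(db, key=lambda m: abs(query - m["centre"]))[:k]
        weights = [1.0 / (1e-6 + abs(query - m["centre"])) for m in ranked]
        total = sum(weights)
        return sum(w / total * m["predict"](query) for w, m in zip(weights, ranked))

    print(lazy_ensemble(4.2, model_db))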


DAPA seminar of 5/7/2012 at 10:00

Managing inconsistency and uncertainty in information fusion via maximal coherent subsets: concept and illustration

Sébastien Destercke


Laboratoire Heuristique et Diagnostic des Systèmes Complexes, Compiègne

Location: room 25-26:105

In this talk, I will discuss information fusion problems in the presence of significant uncertainty (scarce data, lack of reliability) and of conflict between information sources. I will consider imprecise probability theories as a means of modelling uncertainty, and I will apply the principle of maximal coherent subsets to fuse the various (conflicting) pieces of information. I will then illustrate the application of this principle with a few examples.
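
To illustrate the principle on the simplest possible representation, the sketch below takes interval-valued statements from three conflicting sources and extracts the maximal subsets of sources whose intervals still intersect; a cautious fused result then keeps the union of those intersections. The interval model and the example values are illustrative, one simple instance of the imprecise-probability representations mentioned in the abstract.

    from itertools import combinations

    def maximal_coherent_subsets(intervals):
        """Return the maximal subsets of sources whose intervals have a
        non-empty common intersection, together with that intersection."""
        n = len(intervals)
        coherent = []
        for size in range(n, 0, -1):
            for subset in combinations(range(n), size):
                lo = max(intervals[i][0] for i in subset)
                hi = min(intervals[i][1] for i in subset)
                if lo <= hi and not any(set(subset) <= set(big) for big, _ in coherent):
                    coherent.append((subset, (lo, hi)))
        return coherent

    # Three partially conflicting sources providing intervals for one quantity.
    sources = [(0.0, 4.0), (3.0, 7.0), (6.0, 9.0)]
    for subset, inter in maximal_coherent_subsets(sources):
        print(subset, "->", inter)
    # A cautious fused result keeps the union of the maximal intersections.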


DAPA seminar of 26/6/2012 at 10:00

A Calculus for Practical Reasoning

Alexander Artikis


National Centre for Scientific Research "Demokritos"

Location: room 25-26:105

In the field of complex event processing, among others, there is a need for computational frameworks supporting real-time reasoning, reasoning under uncertainty and automated knowledge construction. I will present a dialect of the Event Calculus that meets these requirements. The Event Calculus is a logic programming language for representing and reasoning about events and their effects. Our dialect includes novel caching techniques that allow for efficient temporal reasoning, scalable to large data streams. Furthermore, when ported to probabilistic frameworks, it may support various types of uncertainty, such as noisy data streams and imprecise knowledge. To avoid the time-consuming, error-prone process of manual knowledge construction, I will present techniques for incremental structure learning that take advantage of large datasets. The Event Calculus dialect will be illustrated with two real-world applications: complex event recognition for city transport management and public space surveillance.


DAPA seminar of 24/5/2012 at 10:30

Transverse Subjectivity Classification

Gaël Harry Dias


Professor at the Université de Caen Basse-Normandie

Location: room 25-26:105

In this talk, we will present our research on learning models for subjectivity classification across domains. After a short introduction to related work and the challenges of sentiment analysis, we will start by presenting new features for subjectivity analysis. We will then present two different paradigms of multi-view learning strategies for learning transfer models: multi-view learning with agreement and guided multi-view learning. We will then present an exhaustive evaluation of both paradigms, including two state-of-the-art algorithms, and show that accuracy over 91% can be obtained using three views. In our concluding remarks, we will talk about future extensions of the presented methodology.


DAPA seminar of 3/5/2012 at 10:00

Mining search spaces in combinatorial optimization

Pascale Kuntz


Laboratoire d'Informatique de Nantes Atlantique, École Polytechnique de l'Université de Nantes

Location: room 25-26:105 (large room)

In this talk, we will discuss the value of exploratory data mining for trying to discover structure in the search spaces associated with combinatorial optimization problems, and thereby improve the experimental performance of certain metaheuristics. Illustrations will be given, in particular on the graph coloring problem.


DAPA seminar of 5/4/2012 at 14:00

Attribution d’auteur : Une approche basée sur le vocabulaire spécifique

Jacques Savoy


Institut d'informatique, Université de Neuchâtel – Switzerland

Location: room 25-26:105

In this presentation, we will discuss the families of methods proposed to solve the authorship attribution problem (given a set of texts written by known authors, can we determine the author of a new document?). After an overview of the various questions related to authorship attribution, we will present classical solutions to this problem. Our model follows this line of work and relies on the concept of the specific vocabulary of a text or of a part of a corpus. This gives us the possibility of defining the lexical specificity of a text (or of an author). We will then show how this specificity can be compared with author profiles in order to determine the possible author of a text. To evaluate our approach, we conducted two experiments on newspaper corpora (Glasgow Herald: 5,408 articles written by 20 journalists; La Stampa: 4,326 articles written by 20 authors). These experiments demonstrate the relative merits of the Delta and chi-square methods and of the method based on the Kullback-Leibler divergence.
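
As a minimal, self-contained illustration of the profile-based idea (not the specific-vocabulary model of the talk), the sketch below builds a smoothed word distribution per known author and attributes a query text to the author whose profile minimizes the Kullback-Leibler divergence from the text's distribution; the texts, smoothing and tokenization are deliberately naive.

    import math
    from collections import Counter

    def distribution(text, vocab, alpha=0.1):
        """Smoothed word distribution of a text over a fixed vocabulary."""
        counts = Counter(text.lower().split())
        total = sum(counts[w] for w in vocab) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    def kl_divergence(p, q):
        return sum(p[w] * math.log(p[w] / q[w]) for w in p)

    # Invented toy corpora standing in for the per-author profiles.
    authors = {"A": "the market rose sharply as investors bought shares",
               "B": "the team scored late and the crowd sang loudly"}
    query = "investors sold shares as the market fell"
    vocab = set(" ".join(list(authors.values()) + [query]).lower().split())

    profiles = {a: distribution(t, vocab) for a, t in authors.items()}
    q_dist = distribution(query, vocab)
    best = min(authors, key=lambda a: kl_divergence(q_dist, profiles[a]))
    print(best)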