List of our seminars

(in reverse chronological order)


DAPA seminar of 9 March 2017 at 10:00

Massive Online Analytics for the Internet of Things (IoT)

Albert Bifet (Telecom ParisTech)


Location: room 405, corridor 24-25, 4 place Jussieu, 75005 Paris

Big Data and the Internet of Things (IoT) have the potential to fundamentally shift the way we interact with our surroundings. The challenge of deriving insights from the Internet of Things (IoT) has been recognized as one of the most exciting and key opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing
increases. Dealing with the evolution over time of such data streams, i.e., with concepts that drift or change completely, is one of the core issues in stream mining. In this talk, I will present an overview of data stream mining, and I will introduce
some popular open source tools for data stream mining.
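
As a purely illustrative aside (not one of the tools discussed in the talk, such as MOA or Apache SAMOA), the following Python sketch shows the basic shape of drift-aware stream processing: it tracks a classifier's recent accuracy against its long-run accuracy and flags a possible drift when the two diverge.

    from collections import deque

    class SimpleDriftMonitor:
        """Toy drift monitor: compare recent accuracy to long-run accuracy.

        An illustrative sketch only; real stream miners (e.g. MOA) use
        principled detectors such as ADWIN or DDM."""
        def __init__(self, window=100, tolerance=0.15):
            self.window = deque(maxlen=window)   # recent 0/1 prediction outcomes
            self.correct = 0                     # long-run counts
            self.total = 0
            self.tolerance = tolerance

        def add(self, was_correct):
            self.window.append(1 if was_correct else 0)
            self.correct += 1 if was_correct else 0
            self.total += 1

        def drift_suspected(self):
            if self.total < 2 * self.window.maxlen:
                return False                     # not enough evidence yet
            recent = sum(self.window) / len(self.window)
            overall = self.correct / self.total
            return recent < overall - self.tolerance

    # Usage: feed the monitor with the outcomes of an online classifier and
    # reset or retrain the model whenever drift_suspected() returns True.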

Bio

Albert Bifet is Associate Professor at Telecom ParisTech and Honorary Research Associate at the WEKA Machine Learning Group at the University of Waikato. Previously he worked at Huawei Noah's Ark Lab in Hong Kong, Yahoo Labs in Barcelona, the University of Waikato and UPC BarcelonaTech. He is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, and one of the leaders of the MOA and Apache SAMOA software environments for implementing algorithms and running experiments for online learning from evolving data streams. He served as Co-Chair of the Industrial track of IEEE MDM 2016 and ECML PKDD 2015, and as Co-Chair of BigMine (2015, 2014, 2013, 2012) and the ACM SAC Data Streams Track (2017, 2016, 2015, 2014, 2013, 2012).

More information about Albert Bifet: http://albertbifet.com


DAPA seminar of 22 February 2017 at 14:00

Riding the Big IoT Data Wave: Complex Analytics for IoT Data Series

Themis Palpanas (LIPADE, Université Paris-Descartes)


Location: room 101, corridor 26-00, 4 place Jussieu, 75005 Paris

The realization of the Internet of Things (IoT) is creating an unprecedented tidal data wave, consisting of the collection of continuous measurements from an enormous number of sensors. The goal is to better understand, model, and analyze real-world phenomena, interactions, and behaviors. Consequently, there is an increasingly pressing need for developing techniques able to index and mine very large collections of sequences, or data series. This need is also present across several applications in diverse domains, ranging (among others) from engineering, telecommunications, and finance, to astronomy, neuroscience, and the web. It is not unusual for the applications mentioned above to involve numbers of data series in the order of hundreds of millions to billions, which are oftentimes not analyzed in their full detail due to their sheer size.

In this talk, we describe recent efforts in designing techniques for indexing and mining truly massive collections of data series that will enable scientists to easily analyze their data. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we thus introduce solutions to this problem. Furthermore, we discuss novel techniques that adaptively create data series indexes, allowing users to correctly answer queries before the indexing task is finished. We also show how our methods allow mining on datasets that would otherwise be completely untenable, including the first published experiments using one billion data series.

Finally, we present our vision for the future in big sequence management research.
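
For readers unfamiliar with data series representations, here is a minimal sketch of a piecewise aggregate approximation followed by SAX-style discretization, two standard building blocks behind many data series indexes; it is an illustration under those assumptions, not the indexing techniques presented in the talk.

    import numpy as np

    def paa(series, n_segments):
        """Piecewise Aggregate Approximation: the mean value of each equal-width segment."""
        segments = np.array_split(np.asarray(series, dtype=float), n_segments)
        return np.array([seg.mean() for seg in segments])

    def sax_symbols(series, n_segments=8, alphabet="abcd"):
        """Toy SAX: z-normalise, reduce with PAA, then discretise with Gaussian breakpoints."""
        series = np.asarray(series, dtype=float)
        z = (series - series.mean()) / (series.std() + 1e-12)
        # breakpoints splitting a standard normal into 4 equiprobable regions
        breakpoints = np.array([-0.674, 0.0, 0.674])
        return "".join(alphabet[np.searchsorted(breakpoints, v)] for v in paa(z, n_segments))

    # Series with similar shapes map to similar symbolic words, which is what makes
    # tree-structured indexes over such words (e.g. iSAX-style trees) effective.
    print(sax_symbols(np.sin(np.linspace(0, 6.28, 256))))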

Bio

Themis Palpanas is a professor of computer science at the Paris
Descartes University (France), where he is a director of the Data
Intensive and Knowledge Oriented Systems (diNo) group. He received
the BS degree from the National Technical University of Athens,
Greece, and the MSc and PhD degrees from the University of Toronto,
Canada. He has previously held positions at the University of Trento
and the IBM T.J. Watson Research Center. He has also worked for the
University of California, Riverside, and visited Microsoft Research
and the IBM Almaden Research Center. His research solutions have been
implemented in world-leading commercial data management products
and he is the author of nine US patents. He is the recipient of
three Best Paper awards (including ICDE and PERCOM), and the IBM
Shared University Research (SUR) Award in 2012, which represents
a recognition of research excellence at a worldwide level. He has been
a member of the IBM Academy of Technology Study on Event Processing,
and is a founding member of the Event Processing Technical Society.
He has served as General Chair for VLDB 2013, the top international
conference on databases. His research has been supported by the EU,
CNRS, NSF, Facebook, IBM Research, Hewlett Packard Labs, and Telecom
Italia.

More information about Themis Palpanas: http://www.mi.parisdescartes.fr/~themisp/


DAPA seminar of 2 February 2017 at 14:00

From mining under constraints to mining with constraints

Ahmed Samet (IRISA, University of Rennes 1)


Location: room 101, corridor 26-00, 4 place Jussieu, 75005 Paris

The mining of frequent itemsets from uncertain databases has become a very hot topic within the data mining community over the last few years. Although the extraction process in binary databases is a deterministic problem, the uncertain case is based on expectation. Recently, a new type of database, referred to as evidential databases, has emerged to handle data that are both uncertain and imprecise. In this talk, we present an applied case study of the use of evidential databases in the field of chemistry. We then shed light on the WEvAC approach for predicting the properties of amphiphile molecules.

Furthermore, most existing approaches to pattern mining, which are based on procedural programs (as we often use/develop), would require specific and lengthy development to support the addition of extra constraints. Given this lack of flexibility, such systems are not suitable for experts to analyze their data. Recent research on pattern mining has suggested using declarative paradigms such as SAT, ASP or CP to provide more flexible pattern mining tools. The ASP framework has proven to be a good candidate for developing flexible pattern mining tools: it provides a simple and principled way of incorporating experts' constraints into programs.
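
To make the expectation-based view concrete, the following hypothetical sketch computes the expected support of an itemset in an uncertain (probabilistic) transaction database; evidential databases generalize this by attaching belief masses to items rather than single probabilities.

    from math import prod

    # Uncertain database: each transaction maps an item to its existence probability.
    uncertain_db = [
        {"a": 0.9, "b": 0.7, "c": 0.1},
        {"a": 0.5, "b": 0.8},
        {"b": 0.6, "c": 0.9},
    ]

    def expected_support(itemset, db):
        """Expected support: sum over transactions of the product of item probabilities
        (assuming item independence, the usual assumption in uncertain itemset mining)."""
        return sum(prod(t.get(item, 0.0) for item in itemset) for t in db)

    print(expected_support({"a", "b"}, uncertain_db))  # 0.9*0.7 + 0.5*0.8 + 0 = 1.03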

Bio

Ahmed Samet is a post-doctoral researcher at the University of Rennes 1. He received his M.Sc. degree in Computer Science from the Université de Tunis (Tunisia) in 2010. He then obtained a Ph.D. in Computer Science under a cotutelle agreement between the Université de Tunis (Tunisia) and the Université d'Artois (France). He first held a postdoctoral position with Sorbonne University (Université de technologie de Compiègne, France). His research topics involve decision making, machine learning under uncertainty and data mining.

More information about Ahmed Samet: http://people.rennes.inria.fr/Ahmed.Samet/


DAPA seminar of 14 December 2016 at 10:00

Challenges and issues with data quality measurement

Antoon Bronselaer (DDCM research group, Ghent University, Belgium)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Over the past years, challenges in data management have gained more and more attention. Assessment of data quality is one such challenge, and one with tremendous potential. In this talk, we review the current state of the art on the measurement of data quality and argue that there is a great need for fundamental research to establish formal systems for measuring data quality. We revisit a formal framework that was proposed very recently and expresses quality in an ordinal manner. We then show the role of uncertainty modelling within this framework. We conclude the talk by discussing the role of fusion functions within such measurement systems.
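
As a rough, hypothetical illustration of ordinal quality measurement (the predicates and scale below are invented and are not the framework discussed in the talk), attribute-level assessments can be expressed on an ordinal scale and fused conservatively:

    from enum import IntEnum

    class Quality(IntEnum):          # an ordinal scale: higher is better
        POOR = 0
        ACCEPTABLE = 1
        GOOD = 2

    def assess_email(value):
        if not value:
            return Quality.POOR
        return Quality.GOOD if "@" in value else Quality.ACCEPTABLE

    def assess_age(value):
        if value is None:
            return Quality.POOR
        return Quality.GOOD if 0 <= value <= 120 else Quality.ACCEPTABLE

    def record_quality(record):
        """Conjunctive fusion: a record is only as good as its worst attribute."""
        return min(assess_email(record.get("email")), assess_age(record.get("age")))

    print(record_quality({"email": "a@b.org", "age": 34}))        # Quality.GOOD
    print(record_quality({"email": "not-an-email", "age": 34}))   # Quality.ACCEPTABLE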

Bio

Antoon Bronselaer is assistant professor at Ghent University and member of the DDCM research group (http://ddcm.ugent.be). Over the past ten years, he has been conducting research in the field of data quality, with an emphasis on the application of uncertainty models.

More information about Antoon Bronselaer: https://biblio.ugent.be/person/802000047526


DAPA seminar of 8 December 2016 at 11:00

Linguistic summaries of process data

Anna Wilbik (Eindhoven University of Technology, The Netherlands)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Linguistic summarization techniques make it easy to gain insight into large amounts of data by describing the main properties of the data linguistically. We focus on a specific type of data, namely process data, i.e., event logs that contain information about when some activities were performed for a particular customer case. An event log may contain many different sequences, because actions or events are often performed in slightly different orders for different customer cases.

We discuss protoforms that are designed to capture process specific information. Linguistic summaries can capture information on the tasks or sequences of tasks that are frequently executed as well as properties of these tasks or sequences, such as their throughput and service time. Such information is of specific interest in the context of process analysis and diagnosis.
Through a case study with data from practice, we show that the knowledge derived from these linguistic summaries is useful for identifying problems in processes and establishing best practices.
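
A minimal sketch, assuming Zadeh-style quantified propositions (the standard machinery behind such protoforms), of how the truth of a summary like "Most cases have a short throughput time" can be computed; the membership functions are invented for illustration:

    import numpy as np

    def short_membership(hours):
        """Fuzzy set 'short throughput time': 1 below 2h, 0 above 8h, linear in between."""
        return np.clip((8.0 - hours) / 6.0, 0.0, 1.0)

    def most(proportion):
        """Fuzzy quantifier 'most': 0 below 0.4, 1 above 0.8, linear in between."""
        return np.clip((proportion - 0.4) / 0.4, 0.0, 1.0)

    def truth_of_summary(throughput_hours):
        """Truth of 'Most cases have a short throughput time' (Zadeh's calculus)."""
        return most(np.mean(short_membership(np.asarray(throughput_hours))))

    print(truth_of_summary([1.5, 3.0, 2.5, 9.0, 4.0, 1.0]))   # about 0.84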

Bio

Anna Wilbik received her Ph.D. degree in computer science from the Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland, in 2010. She is currently an Assistant Professor at the School of Industrial Engineering, Eindhoven University of Technology, The Netherlands. In 2011 she was a Post-doctoral Fellow at the Electrical and Computer Engineering Department, University of Missouri, Columbia, MO, USA. In 2012 she participated in the TOP 500 Innovators: Science - Management - Commercialization Program of the Polish Ministry of Science and Higher Education. Her research interests include linguistic summaries, data analysis, machine learning, and computational intelligence, with a focus on applications in healthcare.

More information about Anna Wilbik: https://www.tue.nl/en/university/departments/industrial-engineering-innovation-sciences/the-department/staff/detail/ep/e/d/ep-uid/20139476/ep-tab/4/


DAPA seminar of 20 July 2016 at 10:00

Soft Hierarchical Analytics for Discrete Event Sequences

Trevor Martin (Artificial Intelligence Group, Bristol University)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Over recent years, increasing quantities of data have been generated and recorded about many aspects of our lives. In cases such as internet logs, physical access logs, transaction records, email and phone records, the data consists of multiple overlapping sequences of events related to different individuals and entities. Identification and analysis of such event sequences is an important task which can be used to find similar groups, predict future behaviour and to detect anomalies. It is ideally suited to a collaborative intelligence approach, in which human analysts provide insight and interpretation, while machines perform data collection, repetitive processing and visualisation. An important aspect of this process is the common definition of terms used by humans and machines to identify and categorise similar (and dissimilar) events.

In this talk we will argue that fuzzy set theory gives a natural framework for the exchange of information, and interaction, between analysts and machines. We will describe a new approach to the definition of fuzzy hierarchies, and show how this enables event sequences to be extracted, compared and mined at different levels of resolution.
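
A small hypothetical sketch of the kind of fuzzy categorisation this enables; the hierarchy, category names and membership degrees below are invented for illustration and are not Prof. Martin's definitions:

    # Each low-level event type belongs to higher-level categories with a degree in [0, 1].
    # Moving up the hierarchy coarsens the description of an event sequence.
    FUZZY_HIERARCHY = {
        "failed_login":   {"authentication": 1.0, "suspicious": 0.7},
        "password_reset": {"authentication": 1.0, "suspicious": 0.2},
        "large_download": {"data_access": 1.0, "suspicious": 0.6},
        "page_view":      {"data_access": 0.8, "suspicious": 0.0},
    }

    def lift(sequence, hierarchy=FUZZY_HIERARCHY):
        """Re-describe an event sequence at the parent level, keeping membership degrees."""
        return [max(hierarchy[e].items(), key=lambda kv: kv[1]) for e in sequence]

    def suspicion_score(sequence, hierarchy=FUZZY_HIERARCHY):
        """Aggregate membership in the fuzzy category 'suspicious' (here: the mean)."""
        return sum(hierarchy[e].get("suspicious", 0.0) for e in sequence) / len(sequence)

    seq = ["failed_login", "failed_login", "large_download"]
    print(lift(seq))              # a coarser description of the same sequence
    print(suspicion_score(seq))   # about 0.67, comparable across sequences and resolutions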

Bio

Trevor Martin (M’07) is a Professor of artificial intelligence at the University of Bristol, U.K. He received the B.Sc. degree in chemical physics from the University of Manchester, in 1978, and the Ph.D. degree in quantum chemistry from the University of Bristol, in 1984. Since 2001, he has been funded by British Telecommunications (BT) as a Senior Research Fellow, for his research on soft computing in intelligent information management, including areas such as the semantic Web, soft concept hierarchies, and user modeling.

More information about Trevor Martin: http://seis.bristol.ac.uk/~entpm/


DAPA seminar of 9 June 2016 at 10:00

Towards an Agile Business Intelligence Approach Based on Soft Computing

Gregory Smits (IUT de Lannion, R&T department)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

The added value of a dataset lies in the knowledge that a domain expert can extract from it. To cope with the constant growth in the volume of the datasets an expert has to handle, efficient tools must be developed, in particular to generate concise, relevant and intelligible explanations describing the data and their structure. The term Agile Business Intelligence (ABI) refers to techniques aimed at helping experts (insurers, decision makers, communication specialists, etc.) analyse business data. In this seminar, an ABI approach based on theories and techniques from soft computing will be presented. In this applicative setting, soft computing is used to build an interface between the numerical/categorical space in which the data are described and the conceptual/linguistic space of human reasoning. Based on a model of the expert's subjective vocabulary, linguistic and personalised explanations are generated efficiently to provide a synthetic view of the data and their intrinsic structure. These linguistic explanations are then translated into graphical visualisations, which also constitute an expressive interface for exploring the data. The results of initial experiments show the relevance and efficiency of using soft computing in this context.
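
As a rough sketch of such a soft-computing interface (the trapezoidal vocabulary below is invented and is not the one used in the presented approach), numeric attribute values can be mapped to personalised linguistic terms with degrees of truth:

    def trapezoid(x, a, b, c, d):
        """Trapezoidal membership function: 0 outside [a, d], 1 on [b, c], linear in between."""
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    # A hypothetical subjective vocabulary of a domain expert for the attribute "price".
    PRICE_VOCABULARY = {
        "cheap":     (0, 0, 20, 40),
        "moderate":  (20, 40, 80, 100),
        "expensive": (80, 100, 500, 500),
    }

    def describe(price):
        """Return the linguistic terms that apply to a value, with their degrees."""
        degrees = {term: trapezoid(price, *p) for term, p in PRICE_VOCABULARY.items()}
        return {term: d for term, d in degrees.items() if d > 0}

    print(describe(30))   # {'cheap': 0.5, 'moderate': 0.5}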

Bio

Grégory Smits received a PhD in computer science, in the area of natural language processing, from the University of Caen (France) in 2008. He is currently an associate professor at the IUT de Lannion (Université de Rennes 1) and a member of the IRISA laboratory (Institut de Recherche en Informatique et Systèmes Aléatoires). Within the SHAMAN team (Data and Knowledge Management department), his research mainly concerns flexible database querying and cooperative answering strategies.

More information about Gregory Smits: http://people.irisa.fr/Gregory.Smits/


DAPA seminar of 7 March 2016 at 17:00

A Survey of Applications and Future Directions of Computational Intelligence

Gary Fogel (Natural Selection, Inc.)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Computational intelligence (CI) has a rich history of inspiration from natural systems. While the field continues to grow with new inspiration and methods, the application of CI to real-world problems also continues to increase. CI is used on a daily basis in everything from rice cookers to electronic games, by transportation companies and financial analysts, and in drug design and diagnostics. As these "non-traditional" methods of decision making accelerate into the marketplace, their acceptance requires an understanding of the advantages and limitations of CI approaches. This survey will introduce the audience to CI methods and provide the background necessary to appreciate and understand their current and possible future applications in society.

Bio

Gary B. Fogel is Chief Executive Officer of Natural Selection, Inc. in San Diego, California. He received a B.A. in Biology from the University of California, Santa Cruz and a Ph.D. in Biology from the University of California, Los Angeles in 1998. His current research interests are the broad application of computational intelligence approaches to industry, medicine, and defense, focusing mainly on biomedical and chemical applications. He has authored over 100 peer-reviewed publications, including the co-edited books Evolutionary Computation in Bioinformatics (Morgan Kaufmann, 2003) and Computational Intelligence in Bioinformatics (Wiley-IEEE Press, 2008). He is an IEEE Fellow, serves as Editor-in-Chief of the Elsevier journal BioSystems, and has served on the editorial boards of seven other journals. He currently serves on the Administrative Committee of the IEEE Computational Intelligence Society and will soon receive the 2016 IEEE CIS Meritorious Service Award.


DAPA seminar of 22 October 2015 at 10:00

Ensemble Approaches in Learning

Xin Yao (University of Birmingham, United Kingdom, President of the IEEE Computational Intelligence Society)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Designing a monolithic system for a large and complex learning task is hard.
Divide-and-conquer is a common strategy in tackling such large and complex
problems. Ensembles can be regarded as an automatic approach to
divide-and-conquer. Many ensemble methods, including boosting, bagging,
negative correlation, etc., have been used in machine learning and data mining
for many years. This talk will describe three examples of ensemble methods,
i.e., multi-objective learning, online learning with concept drift, and
multi-class imbalance learning. Given the important role of diversity in
ensemble methods, some discussions and analyses will be given to gain a better
understanding of how and when diversity may help ensemble learning.
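
For concreteness, a minimal bagging example (scikit-learn is used here purely as a generic illustration of the ensemble idea; the talk's own examples concern multi-objective, online and imbalance-aware ensembles):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    single = DecisionTreeClassifier(random_state=0)
    ensemble = BaggingClassifier(n_estimators=50, random_state=0)  # bagged decision trees

    print("single tree :", cross_val_score(single, X, y, cv=5).mean())
    print("bagged trees:", cross_val_score(ensemble, X, y, cv=5).mean())
    # Diversity among the bootstrapped trees is what typically makes the ensemble
    # more accurate than any single tree.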

Some materials used in the talk are based on the following papers:

  • A. Chandra and X. Yao, "Ensemble learning using multi-objective evolutionary
    algorithms," Journal of Mathematical Modelling and Algorithms, 5(4):417-445,
    December 2006.
  • L. L. Minku and X. Yao, "DDD: A New Ensemble Approach For Dealing With Concept
    Drift," IEEE Transactions on Knowledge and Data Engineering, 24(4):619-633,
    April 2012.
  • S. Wang and X. Yao, "Multi-Class Imbalance Problems: Analysis and Potential
    Solutions," IEEE Transactions on Systems, Man and Cybernetics, Part B,
    42(4):1119-1130, August 2012.

Bio

Xin Yao is a Chair (Professor) of Computer Science and the Director of CERCIA
(Centre of Excellence for Research in Computational Intelligence and
Applications) at the University of Birmingham, UK. He is an IEEE Fellow and
the President (2014-15) of IEEE Computational Intelligence Society (CIS). His
work won the 2001 IEEE Donald G. Fink Prize Paper Award, 2010 and 2015 IEEE
Transactions on Evolutionary Computation Outstanding Paper Awards, 2010 BT
Gordon Radley Award for Best Author of Innovation (Finalist), 2011 IEEE
Transactions on Neural Networks Outstanding Paper Award, and many other best
paper awards. He won the prestigious Royal Society Wolfson Research Merit Award
in 2012 and the 2013 IEEE CIS Evolutionary Computation Pioneer Award.
His major research interests include evolutionary computation, ensemble
learning, and their applications, especially in software engineering.

More information about Xin Yao: http://www.cs.bham.ac.uk/~xin/


DAPA seminar of 8 October 2015 at 10:00

Contributions to the mining of complex data: imprecise geographic data and/or massive data

Cyril de Runz (Université de Reims Champagne-Ardenne, Signal, Image and Knowledge group - SIC)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

The objective is to take veracity into account in the mining, visual or otherwise, of imprecise geographic data and/or massive data. Our work focuses, first, on the representativeness of a fuzzy data item within a set; second, on nuancing classification decisions through the use of colour; and third, on large-scale collaboration between approaches to improve the validity of decisions. Our methods rely in particular on the definition of fuzzy temporal indices, graphs, ranks, representativeness measures, data-driven colouring, and so on. To this end, we have also partly relied on tools and methods compatible with MapReduce, such as Kohonen maps, and on distributing operators over the cloud.
These contributions lead to a better understanding of complex data (imprecise, massive, spatial, temporal) in the context of their mining and visualisation.
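
As an aside on one of the tools mentioned above, here is a minimal, purely illustrative sketch of the Kohonen self-organizing map update rule; the presented work concerns MapReduce-compatible variants applied to imprecise spatio-temporal data, which this toy version does not attempt to reproduce.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.random((1000, 3))               # toy data: 1000 points in 3-D
    grid_w, grid_h = 8, 8
    weights = rng.random((grid_w, grid_h, 3))  # one prototype per map cell

    def train_som(weights, data, epochs=10, lr=0.5, sigma=2.0):
        coords = np.dstack(np.meshgrid(np.arange(grid_w), np.arange(grid_h), indexing="ij"))
        for _ in range(epochs):
            for x in data:
                # best matching unit: the cell whose prototype is closest to x
                bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(axis=2)),
                                       (grid_w, grid_h))
                # neighbourhood function: Gaussian around the BMU on the grid
                dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
                h = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
                weights += lr * h * (x - weights)
            lr *= 0.9
            sigma *= 0.9
        return weights

    weights = train_som(weights, data)   # prototypes now approximate the data topology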

Bio

After obtaining a research master's degree in artificial intelligence at Université Paul Sabatier (Toulouse, France) in 2005, I defended a PhD in computer science at the Université de Reims Champagne-Ardenne in 2008. Since September 2009, I have been an associate professor at the Université de Reims Champagne-Ardenne, carrying out my research at CReSTIC. For the 2015-2016 academic year I am on CNRS research leave (délégation) at LIP6 (LFI team).

My work deals with the management of imperfect spatio-temporal information. In this context, my interests include complex data mining, the analysis and processing of imperfect spatio-temporal data, geographic information systems, visual data exploration and, for the past two years, Big Data issues.

More information about Cyril de Runz: https://sites.google.com/site/cyrilderunz/


DAPA seminar of 28 May 2015 at 11:15

A Fuzzy Rule-Based Approach to Single Frame Super Resolution

Nikhil Pal (Indian Statistical Institute, Calcutta)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

High quality image zooming is an important problem, and the literature is rich with methods for it. Some of the methods use multiple low resolution (LR) images of the same scene with different sub-pixel shifts as input to generate the high resolution (HR) images, while others use just one LR image to obtain the HR image. In this talk we shall discuss a novel fuzzy rule-based single frame super resolution scheme. This is a patch-based method, where for zooming each LR patch is replaced by an HR patch generated by a Takagi-Sugeno type fuzzy rule-based system. We shall discuss in detail the generation of the training data, the initial generation of the fuzzy rules, the refinement of the rules, and how such rules are used to generate SR images. In this context we shall also discuss a Gaussian Mixture Regression (GMR) model for the same problem. To demonstrate the effectiveness and superiority of the proposed fuzzy rule-based system, we shall compare its performance with that of six methods, including the GMR method, in terms of multiple quality criteria.
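
To make the rule-based machinery concrete, a toy Takagi-Sugeno inference sketch with a scalar input and invented rules; in the super-resolution setting the input would be an LR patch and each consequent a linear map producing an HR patch:

    import numpy as np

    def gauss(x, c, s):
        return np.exp(-((x - c) ** 2) / (2 * s ** 2))

    # Each rule: (antecedent membership over the input, linear consequent (a, b)).
    RULES = [
        (lambda x: gauss(x, c=0.0, s=1.0), (0.5, 0.0)),
        (lambda x: gauss(x, c=3.0, s=1.0), (2.0, 1.0)),
    ]

    def ts_output(x):
        """Takagi-Sugeno inference: firing-strength-weighted average of linear consequents."""
        strengths = np.array([mu(x) for mu, _ in RULES])
        outputs = np.array([a * x + b for _, (a, b) in RULES])
        return float((strengths * outputs).sum() / strengths.sum())

    print(ts_output(1.5))   # blends the two local linear models according to rule firing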

Bio

Nikhil R. Pal is a Professor in the Electronics and Communication Sciences Unit of the Indian Statistical Institute. His current research interest includes bioinformatics, brain science, fuzzy logic, pattern analysis, neural networks, and evolutionary computation.
He is currently the Vice President for Publications of the IEEE CIS.
He is a Fellow of the National Academy of Sciences, India, a Fellow of the Indian National Academy of Engineering, a Fellow of the Indian National Science Academy, a Fellow of the International Fuzzy Systems Association (IFSA), and a Fellow of the IEEE, USA.
Nikhil R. Pal is an invited professor at UPMC from May 22 to June 20, 2015.

More information about Nikhil Pal: http://www.isical.ac.in/~nikhil/


DAPA seminar of 28 May 2015 at 10:00

Does it all add up? A study of fuzzy protoform linguistic summarization of time series

Jim Keller (University of Missouri (USA))


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Producing linguistic summaries of large databases or temporal sequences of measurements is an endeavor that is receiving increased attention. These summaries can be used in a continuous monitoring situation, like eldercare, where it is important to ascertain if the current summaries represent an abnormal condition. The recipient of the set of summaries describing a time range, for example last night's activities, is primarily a human, such as a caregiver in the eldercare example. However, as the number of sensors and monitored conditions grows, sorting through a fairly large number of summaries can be a burden for that person, i.e., the summaries stop being information and become yet one more pile of data. It is therefore necessary to automatically process sets of summaries to condense the data into more manageable chunks.
The first step towards automatically comparing sets of digests is to determine similarity. For fuzzy protoform based summaries, we developed a natural similarity and proved that the associated dissimilarity is a metric over the space of protoforms. Utilizing that distance measure, we defined and examined several fuzzy set methods to compute dissimilarity between sets of summaries, and most recently utilized these measures to define prototypical behavior over a large number of normal time periods.

In this talk, I will cover the definition of fuzzy protoforms, define our (dis)similarity, outline the proof that it is a metric, discuss the fuzzy aggregation methods for sets of summaries, and show how prototypes are formed and can be used to detect abnormal nights. The talk will be loaded with actual examples from our eldercare research. There is much work to be done and, hopefully, more questions than answers will result from the discussion.
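
A rough sketch of one generic way to lift a per-summary distance to sets of summaries (a Hausdorff-style aggregation with an invented toy distance; the measures actually developed in this work are specific to fuzzy protoforms and are only alluded to here):

    def set_dissimilarity(set_a, set_b, d):
        """Hausdorff-style dissimilarity between two sets of summaries,
        given a distance d between individual summaries."""
        if not set_a or not set_b:
            return float("inf")
        to_b = max(min(d(a, b) for b in set_b) for a in set_a)
        to_a = max(min(d(a, b) for a in set_a) for b in set_b)
        return max(to_b, to_a)

    # Toy per-summary distance: summaries represented as (quantifier, attribute, value) tuples.
    def toy_distance(s1, s2):
        return sum(1 for x, y in zip(s1, s2) if x != y)

    night1 = {("most", "bed_restlessness", "low"), ("few", "bathroom_visits", "many")}
    night2 = {("most", "bed_restlessness", "high"), ("few", "bathroom_visits", "many")}
    print(set_dissimilarity(night1, night2, toy_distance))   # 1: one summary changed slightly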

Sponsored by the Computational Intelligence Society under its Distinguished Lecturer Program

Bio

James M. Keller is a Curators Professor in the Electrical and Computer Engineering and Computer Science departments at the University of Missouri as well as R. L. Tatum Professor for the college. Keller’s research interests are in computational intelligence with current applications to eldercare technology, bioinformatics, geospatial intelligence and landmine detection.

James M. Keller is a CIS Distinguished Lecturer.

More information about Jim Keller: http://engineering.missouri.edu/person/kellerj/


DAPA seminar of 12 March 2015 at 14:00

A bivariate grid for change detection in a labeled stream

Vincent Lemaire (Orange Labs)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

This talk will present:

  1. the context of concept drift detection in a labeled stream: in predictive analytics and machine learning, concept drift refers to a situation where the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because predictions become less accurate as time passes. Concept drift is one of the constraints of data stream mining.

  2. an online method for detecting concept change in a labeled stream: it is based on a bivariate supervised criterion that determines whether or not the data in two windows come from the same distribution. Our method has the advantage of making no assumption about the data distribution or the type of change, and it is able to detect changes of different kinds (change in the mean, in the variance, etc.). Experiments show that our method is more accurate and more robust than the state-of-the-art methods tested. A simplified two-window sketch is given below.
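
The sketch below is hypothetical and deliberately simplified: it compares the two windows with a per-class Kolmogorov-Smirnov test rather than with the bivariate supervised criterion proposed in the talk.

    import numpy as np
    from scipy.stats import ks_2samp

    def drift_between_windows(ref, cur, alpha=0.01):
        """Flag a drift if, for some class label, the feature distribution differs
        significantly between a reference window and the current window.
        ref, cur: lists of (x, y) pairs with scalar feature x and class label y."""
        labels = {y for _, y in ref} | {y for _, y in cur}
        for label in labels:
            x_ref = np.array([x for x, y in ref if y == label])
            x_cur = np.array([x for x, y in cur if y == label])
            if len(x_ref) > 10 and len(x_cur) > 10:
                if ks_2samp(x_ref, x_cur).pvalue < alpha:
                    return True
        return False

    rng = np.random.default_rng(0)
    ref = [(rng.normal(0, 1), 0) for _ in range(500)] + [(rng.normal(3, 1), 1) for _ in range(500)]
    cur = [(rng.normal(0, 2), 0) for _ in range(500)] + [(rng.normal(3, 1), 1) for _ in range(500)]
    print(drift_between_windows(ref, cur))   # True: the variance of class 0 changed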

Bio

Vincent Lemaire is a senior expert in data mining. His research interests are the application of machine learning in various areas for telecommunication companies, with a current main application in data mining for business intelligence. He has developed exploratory data analysis and classification interpretation tools.

More information about Vincent Lemaire: http://www.vincentlemaire-labs.fr/


DAPA seminar of 15 January 2015 at 14:00

Tensor factorization for multi-relational learning

Raphael Bailly (Heudiasyc, Université de Technologie de Compiègne, France)


Location: room 101, corridor 25-26, 4 place Jussieu, 75005 Paris

Learning relational data has been of growing interest in fields as diverse as social network modeling, the semantic web, and bioinformatics. To some extent, a network can be seen as multi-relational data, where a particular relation represents a particular type of link between entities. It can be modeled as a three-way tensor.

Tensor factorization has been shown to be a very efficient way to learn such data. It can be done either in a 3-way factorization style (trigram, e.g. RESCAL) or as a sum of 2-way factorizations (bigram, e.g. TransE). These methods usually achieve state-of-the-art accuracy on benchmarks. However, all these learning methods suffer from regularization schemes that are not always adequate.

We show that both 2-way and 3-way factorization of a relational tensor can be formulated as a simple matrix factorization problem. This class of problems can naturally be relaxed in a convex way. We show that this new method outperforms RESCAL on two benchmarks.
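
To fix ideas, the two scoring schemes in NumPy (a sketch of the scoring functions only, with random embeddings; it shows neither the learning procedure nor the convex relaxation proposed in the talk):

    import numpy as np

    rng = np.random.default_rng(0)
    n_entities, n_relations, dim = 100, 5, 16

    E = rng.normal(size=(n_entities, dim))         # entity embeddings
    W = rng.normal(size=(n_relations, dim, dim))   # one matrix per relation (3-way, RESCAL-style)
    R = rng.normal(size=(n_relations, dim))        # one vector per relation (2-way, TransE-style)

    def score_trigram(s, r, o):
        """RESCAL-style score: e_s^T W_r e_o."""
        return E[s] @ W[r] @ E[o]

    def score_bigram(s, r, o):
        """TransE-style score: -||e_s + r - e_o|| (higher is better)."""
        return -np.linalg.norm(E[s] + R[r] - E[o])

    print(score_trigram(0, 2, 7), score_bigram(0, 2, 7))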

Bio

R. Bailly is currently a post-doc at Heudiasyc, Compiègne (since March 2014). He works with Antoine Bordes and Nicolas Usunier on multi-relational learning and word embeddings. He was previously in Barcelona for a post-doc with Xavier Carreras, with whom he worked on spectral methods applied to the unsupervised setting.

More information about Raphael Bailly: https://www.hds.utc.fr/~baillyra/


DAPA seminar of 27 November 2014 at 10:00

The Frank-Wolfe Algorithm: Recent Results and Applications to High-Dimensional Similarity Learning and Distributed Optimization

Aurélien Bellet (Télécom ParisTech)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

The topic of this talk is the Frank-Wolfe (FW) algorithm, a greedy procedure for minimizing a convex and differentiable function over a compact convex set. FW finds its roots in the 1950's but has recently regained a lot of interest in machine learning and related communities. In the first part of the talk, I will introduce the FW algorithm and review some recent results that motivate its appeal in the context of large-scale learning problems. In the second part, I will describe two applications of FW in my own work: (i) learning a similarity/distance function for sparse high-dimensional data, and (ii) learning sparse combinations of elements that are distributed over a network.
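
A minimal Frank-Wolfe sketch for a least-squares objective over the probability simplex, where the linear minimization step reduces to picking the best vertex; this toy instance is only for illustration and is unrelated to the specific applications discussed in the talk:

    import numpy as np

    def frank_wolfe_simplex(A, b, n_iters=200):
        """Minimize f(x) = 0.5 * ||Ax - b||^2 over the probability simplex."""
        n = A.shape[1]
        x = np.ones(n) / n                        # start at the simplex barycentre
        for k in range(n_iters):
            grad = A.T @ (A @ x - b)              # gradient of f
            s = np.zeros(n)
            s[np.argmin(grad)] = 1.0              # linear minimization oracle: best vertex
            gamma = 2.0 / (k + 2.0)               # standard step size
            x = (1 - gamma) * x + gamma * s       # convex update keeps x feasible (and sparse)
        return x

    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 20))
    x_true = np.zeros(20); x_true[[3, 7]] = [0.6, 0.4]
    x_hat = frank_wolfe_simplex(A, A @ x_true)
    print(np.round(x_hat, 2))                     # mass concentrates on coordinates 3 and 7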

Bio

Aurélien Bellet is currently a postdoc at Télécom ParisTech. Previously, he worked as a postdoc at the University of Southern California and received his Ph.D. from the University of Saint-Etienne in 2012. His main research topic is statistical machine learning, with particular interests in metric/similarity learning and large-scale/distributed learning.

More information about Aurélien Bellet: http://perso.telecom-paristech.fr/~abellet/


DAPA seminar of 13 November 2014 at 10:00

Computer-Aided Breast Tumor Diagnosis in DCE-MRI Images

Baishali Chaudhury (Department of Computer Science and Engineering, University of South Florida)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

The overall goal of our project is to quantify tumor heterogeneity with advanced image analysis, to provide useful information about tumor biology and unique, valuable insight into patient treatment strategies and prognosis. We introduced a CAD (computer aided diagnosis) system to characterize breast cancer heterogeneity through spatially-explicit maps using DCE-MRI images. Through quantitative image analysis, we examined the presence of differing tumor habitats defined by initial and delayed contrast patterns within the tumor. The heterogeneity within each habitat was quantified through textural kinetic features at different scales and quantization levels. The functionality of this CAD system was then evaluated by applying it in a multi-objective framework. Various common problems in breast DCE-MRI analysis (such as an extremely small dataset relative to the number of extracted texture features, and a highly imbalanced dataset) will be discussed, along with the data mining techniques applied in our project to deal with them.
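
As a generic, hypothetical illustration of one way to deal with a highly imbalanced dataset (random oversampling; not necessarily the technique used in this project):

    import numpy as np

    def random_oversample(X, y, random_state=0):
        """Naively rebalance a binary dataset by resampling the minority class with replacement.
        (A generic baseline; more refined options include SMOTE or cost-sensitive learning.)"""
        rng = np.random.default_rng(random_state)
        X, y = np.asarray(X), np.asarray(y)
        classes, counts = np.unique(y, return_counts=True)
        minority = classes[np.argmin(counts)]
        deficit = counts.max() - counts.min()
        idx = rng.choice(np.flatnonzero(y == minority), size=deficit, replace=True)
        return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

    X = np.arange(20).reshape(10, 2)
    y = np.array([0] * 8 + [1] * 2)
    X_bal, y_bal = random_oversample(X, y)
    print(np.bincount(y_bal))   # [8 8]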

Bio

Fourth-year PhD candidate at the University of South Florida, Tampa, USA. Currently working on the "Analysis of DCE-MRI breast tumor images for stratifying patient prognosis". Broader research interests include: computer vision, data mining and machine learning, sparse data representation.

More information about Baishali Chaudhury: http://baishalichaudhury.wix.com/baishali


DAPA seminar of 30 October 2014 at 11:00

WaterFowl: a Compact, Self-Indexed RDF Store based on Succinct Data Structures

Olivier Curé (Laboratoire d'informatique Gaspard-Monge, Université Marne-la-Vallée)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

This talk will start with an introduction to the main strategies for storing and indexing RDF data sets, covering solutions based on a native RDF approach as well as approaches using a relational or NoSQL storage backend. Then, I will present the main features of ongoing work that aims to distribute highly compressed structures adapted to the storage and querying of RDF triples. The compactness of the represented data is supported by an architecture based on Succinct Data Structures (SDS), which makes it possible to store large datasets in main memory. A special form of entity encoding enables inferences in the RDFS entailment regime.
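
As background on the building block named above, a deliberately naive rank/select sketch; real succinct data structures answer these queries in constant time with only o(n) extra bits (e.g. via wavelet trees), which this toy version does not attempt:

    class ToyBitVector:
        """Naive rank/select over a bit vector: the primitive operations on which
        succinct self-indexes are built (here without the succinct part)."""
        def __init__(self, bits):
            self.bits = list(bits)

        def rank1(self, i):
            """Number of 1-bits in positions [0, i)."""
            return sum(self.bits[:i])

        def select1(self, k):
            """Position of the k-th 1-bit (1-based), or -1 if there is none."""
            count = 0
            for pos, b in enumerate(self.bits):
                count += b
                if count == k:
                    return pos
            return -1

    bv = ToyBitVector([1, 0, 1, 1, 0, 0, 1])
    print(bv.rank1(4), bv.select1(3))   # 3 ones before position 4; third 1-bit at position 3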


DAPA seminar of 2 October 2014 at 10:00

Subgoal Discovery and Language Learning in Reinforcement Learning Agents

Marie desJardins (Department of Computer Science and Electrical Engineering at the University of Maryland, USA)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

As intelligent agents and robots become more commonly used, methods to make interaction with the agents more accessible will become increasingly important. In this talk, I will present a system for intelligent agents to learn task descriptions from linguistically annotated demonstrations, using a reinforcement learning framework based on object-oriented Markov decision processes (OO-MDPs). Our framework learns how to ground natural language commands into reward functions, using as input demonstrations of different tasks being carried out in the environment. Because language is grounded to reward functions, rather than being directly tied to the actions that the agent can perform, commands can be high-level and can be carried out autonomously in novel environments. Our approach has been empirically validated in a simulated environment with both expert-created natural language commands and commands gathered from a user study.

I will also describe a related, ongoing project to develop novel option discovery methods for OO-MDP domains. These methods permit agents to identify new subgoals in complex environments that can be transferred to new tasks. We have developed a framework called Portable Multi-policy Option Discovery for Automated Learning (P-MODAL), an approach that extends the PolicyBlocks option discovery approach to OO-MDPs.
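
As a generic illustration of the "option" abstraction that underlies such subgoal discovery (a textbook-style sketch with an invented grid-world subgoal, not the P-MODAL framework itself):

    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class Option:
        """A temporally extended action (Sutton, Precup & Singh):
        where it can start, how it behaves, and when it terminates."""
        initiation: Callable[[Any], bool]      # I: states where the option may be invoked
        policy: Callable[[Any], Any]           # pi: state -> primitive action
        termination: Callable[[Any], float]    # beta: state -> probability of terminating

    # A hypothetical subgoal "reach the doorway" in a grid world.
    doorway = (4, 2)
    go_to_doorway = Option(
        initiation=lambda s: True,                                  # available everywhere
        policy=lambda s: "right" if s[0] < doorway[0] else "up",    # crude navigation policy
        termination=lambda s: 1.0 if s == doorway else 0.0,         # stop at the subgoal
    )

    state = (1, 2)
    while go_to_doorway.termination(state) < 1.0:
        action = go_to_doorway.policy(state)
        state = (state[0] + 1, state[1]) if action == "right" else (state[0], state[1] + 1)
    print(state)   # (4, 2): the option has reached its subgoal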

This work is collaborative research with Dr. Michael Littman and Dr. James MacGlashan of Brown University, and Dr. Smaranda Muresan of Columbia University. A number of UMBC students have contributed to the project: Shawn Squire, Nicholay Topin, Nick Haltemeyer, Tenji Tembo, Michael Bishoff, Rose Carignan, and Nathaniel Lam.

Bio


Dr. Marie desJardins is a Professor in the Department of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County, where she has been a member of the faculty since 2001. She is a 2013-14 American Council of Education Fellow, the 2014-17 UMBC Presidential Teaching Professor, and an inaugural Hrabowski Academic Innovation Fellow. Her research is in artificial intelligence, focusing on the areas of machine learning, multi-agent systems, planning, interactive AI techniques, information management, reasoning with uncertainty, and decision theory. Current research projects include learning in the context of planning and decision making, analyzing and visualizing uncertainty in machine learning, trust modeling in multiagent systems, and computer science education.

Dr. desJardins has published over 120 scientific papers in journals, conferences, and workshops. She is an Associate Editor of the Journal of Artificial Intelligence Research, is a member of the editorial board of AI Magazine, and was the Program Cochair for AAAI-13. She has previously served as AAAI Liaison to the Board of Directors of the Computing Research Association, Vice-Chair of ACM's SIGART, and AAAI Councillor. She is an ACM Distinguished Member, is a AAAI Senior Member, holds an appointment at the University of Maryland Institute for Advanced Studies, is a member and former chair of UMBC's Honors College Advisory Board, is the former chair of UMBC's Faculty Affairs Committee, and serves on the advisory board of UMBC's Center for Women in Technology.

More information about Marie desJardins: http://www.csee.umbc.edu/~mariedj/


DAPA seminar of 11 September 2014 at 14:00

Clustering-based Models from Model-based Clustering

Mika Sato-Ilic (Faculty of Engineering, Information and Systems, University of Tsukuba)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Recent advances in information science have enabled the collection of multi-source and complex data in vast amounts, and data analysis has been tasked with the increasingly significant mission of dealing with such data. Clustering is one type of data analysis used to detect and characterize the latent structure of data by classifying objects based on the similarities among them. Model-based clustering is one framework of clustering methods; its main idea is to assume a model for the data and, by fitting this model to the data, to estimate an adjusted partition. Although this approach has the benefit of yielding a clear, mathematically grounded solution for the partition, we cannot avoid the risk that the assumed model may not match the latent classification structure of the data. We therefore propose a framework called clustering-based models, in which we exploit an obtained clustering result as a scale describing the latent structure of the data, apply it to the observed data, and then fit the modified data to a model in order to obtain a more accurate result. In this talk, several methods in this clustering-based framework will be introduced, together with several applications.
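
A crude sketch of the general idea, assuming k-means and a linear model purely for illustration (the talk covers a family of such clustering-based models, not this particular combination):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 4))
    y = X[:, 0] * (X[:, 1] > 0) + 0.1 * rng.normal(size=300)   # latent two-regime structure

    # Step 1: estimate the latent classification structure of the data.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    # Step 2: use the clustering result as an additional scale describing the data,
    # and fit the model on the modified (augmented) data.
    dist = km.transform(X)                      # distances to each cluster centre
    X_aug = np.hstack([X, dist])
    model = LinearRegression().fit(X_aug, y)

    # Training R^2 with and without the cluster-based features, for comparison.
    print(model.score(X_aug, y), LinearRegression().fit(X, y).score(X, y))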


DAPA seminar of 3 July 2014 at 10:00

Clustering of temporal data, with an application to the analysis of social media data

Julien Velcin (ERIC laboratory, Université Lyon 2)


Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Graphical models have become very popular for tackling automatic classification problems. In this talk, I will present work recently carried out at the ERIC laboratory on two different unsupervised classification problems. The first problem we tackled draws on probabilistic topic models to jointly capture the evolution of the topics and of the opinions expressed in a corpus of texts. The second problem consists in adapting mixture models to capture the dynamics of categories. The models presented will be illustrated on real data from social media. I will also give an overview of their application, within the ImagiWeb project, to extracting and tracking the image of entities on the Web.
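
As a loose, hypothetical illustration of tracking category dynamics with mixture models (not the models presented in the talk), one can fit a mixture per time slice and warm-start each fit from the previous slice:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Toy stream: one category slowly drifts from mean 0 to mean 3 over 5 time slices,
    # while a second category stays around mean 8.
    slices = [np.concatenate([rng.normal(0 + 0.75 * t, 1, 200),
                              rng.normal(8, 1, 200)]).reshape(-1, 1)
              for t in range(5)]

    previous_means = None
    for t, data in enumerate(slices):
        gm = GaussianMixture(n_components=2, means_init=previous_means, random_state=0)
        gm.fit(data)
        previous_means = gm.means_            # warm-start the next slice on this one
        print(f"t={t}", np.sort(gm.means_.ravel()).round(2))   # the drifting mean moves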