Real Time Machine Learning

**Seminars**

Inspired by university laboratory seminars, we organize a monthly event that can take different forms:

- presentation of a new LumenAI algorithm,

- application of an algorithm in an industrial context,

- introduction to a promising new subject in math, computer science or ML,

- introduction to a fundamental concept or approach in math or computer science,

- summary of a recent article proposing a new approach.

**Next sessions**

Here are the topics for our next sessions:

"Clustering and media identification"

February 2019, Pau

Paul GAY

Summary: With the Internet and the digital transition, the media face challenges in managing their digital archives and in exploiting information on the Internet, whether it comes from competing media or from social networks. In this context, multimedia indexing provides tools for handling this data, which combines text, video, and audio. Several companies and research projects have been created in recent years to meet these needs, combining skills in NLP, speech processing, and vision. I will present the person-indexing systems developed during my thesis, covering clustering as well as supervised and unsupervised speaker and face identification. In particular, one of the tracks explored is to automatically extract the context of the video, for example the type of shot (studio, vox pop) and the role of the people appearing in it. The challenge is then to find a method generic enough to adapt to different contents and situations: debates, reports, etc.
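A common building block for unsupervised speaker or face clustering is agglomerative clustering of embeddings. The sketch below assumes each speech segment or face track has already been turned into a fixed-size embedding vector (the vectors and the `threshold` value are hypothetical); it illustrates the general technique, not the system developed in the thesis.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative_clustering(embeddings, threshold):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per embedding and repeatedly merges the two
    closest clusters until the smallest average distance exceeds `threshold`.
    Returns a list of clusters, each a sorted list of embedding indices.
    """
    clusters = [[i] for i in range(len(embeddings))]
    # Pairwise distances between individual embeddings, computed once.
    dist = [[cosine_distance(a, b) for b in embeddings] for a in embeddings]

    def linkage(c1, c2):
        # Average distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # no remaining pair is close enough to merge
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x stays valid
    return clusters
```

Each resulting cluster would then correspond to one hypothesized identity, which a supervised identification step could name.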

"NLP & Community Detection"

February 2019, Paris

Wajdi Farhani & Chemsdin Naamane

Summary: Classical community detection algorithms work on graphs with numerical edge weights (integers or floats). In real-life graphs, such as social networks or document relationships, edges can carry textual information, sentiments, or large documents. During this seminar, Wajdi and Chems will present their work on community detection with textual edge weights. The principle of the method is to derive a polarity index for each edge using a large model pre-trained on a large corpus. Then, an exponential weight is applied to each edge and standard community detection is performed. The modularity index and accuracy are evaluated, and the method is applied to a Twitter dataset filtered on the hashtag Gilets Jaunes. LDA is also proposed to characterize each community.
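The weighting and evaluation steps can be sketched as follows. This assumes the per-edge polarity scores have already been produced by the pre-trained model, and it assumes the exponential weight takes the form `exp(beta * polarity)` with a hypothetical parameter `beta`; the detection step itself (e.g. Louvain) is not shown, only the weighted modularity used to evaluate a partition.

```python
import math
from collections import defaultdict


def exponential_weights(polarities, beta=1.0):
    """Map each edge's polarity score to a positive weight exp(beta * polarity)."""
    return {edge: math.exp(beta * p) for edge, p in polarities.items()}


def modularity(weights, partition):
    """Weighted Newman modularity of a partition of an undirected graph.

    weights: dict mapping (u, v) edge tuples to positive weights.
    partition: dict mapping each node to a community label.
    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where m is the
    total edge weight, L_c the weight inside c and d_c the total weighted
    degree of the nodes in c.
    """
    m = sum(weights.values())
    inside = defaultdict(float)  # L_c
    degree = defaultdict(float)  # d_c
    for (u, v), w in weights.items():
        degree[partition[u]] += w
        degree[partition[v]] += w
        if partition[u] == partition[v]:
            inside[partition[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

With all polarities at 0 every weight is 1, so the score reduces to ordinary unweighted modularity.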

**Past sessions**

Here is the history of our seminars:

"Community Detection and Model Parallelization"

March 20, 2018

Yves Darmaillac

Summary: Community detection aims at identifying groups of nodes with a relatively high density of edges inside and relatively few connections outside. We present a novel algorithm inspired by the popular Louvain algorithm. Our version can work with a stream of data and dynamically updates the partition of nodes. This algorithm is hierarchical: each level is the aggregation of the previous level. The design is based on an architecture of parallel, similar workers, one per level. We explain this model parallelism (in contrast to vanilla data parallelism) and detail the software engineering and computer science involved.
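The phrase "each level is the aggregation of the previous level" refers to the Louvain-style step where communities become the nodes of the next level. A minimal sketch of that aggregation, assuming integer community labels; the streaming partition updates and the per-level workers described in the talk are not shown.

```python
from collections import defaultdict


def aggregate(edges, partition):
    """Build the next hierarchy level from the current one.

    edges: dict mapping (u, v) node pairs to weights at the current level.
    partition: dict mapping each node to its (integer) community label.
    Returns a dict mapping (community_u, community_v) pairs to summed weights;
    intra-community edges become self-loops (c, c).
    """
    next_level = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = partition[u], partition[v]
        key = (cu, cv) if cu <= cv else (cv, cu)  # undirected: canonical order
        next_level[key] += w
    return dict(next_level)
```

In a one-worker-per-level design, each worker could apply this kind of aggregation to its current partition and pass the resulting weighted edges to the worker of the level above.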

"Mathematical Introduction to Optimal Transport and Applications in Machine Learning/Statistics"

April 10, 2018

Sébastien Loustau

Summary: In this presentation, we will introduce the optimal transport problem between two discrete probability measures. After a brief historical review, the presentation will focus on the latest theoretical and algorithmic advances proposed by Cuturi (2013), which consist in regularizing the optimization problem with an entropic penalty. This allows the optimal transport between two discrete measures of size n to be computed approximately in O(n^2 log n). Finally, applications will be presented in clustering (comparing two clustering results) and in text mining (computing a distance between two documents). If time permits, we will also discuss an application to density estimation (or generative modeling) using a GPU implementation.
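The entropic regularization leads to the Sinkhorn iterations, which alternately rescale the rows and columns of the Gibbs kernel `K = exp(-C / eps)`; each iteration costs O(n^2) for two measures of size n. The sketch below is a plain-Python illustration under the assumption of small inputs; the parameter names `eps`, `n_iter`, and `tol` are illustrative choices.

```python
import math


def sinkhorn(a, b, cost, eps=0.1, n_iter=1000, tol=1e-9):
    """Entropy-regularized optimal transport between discrete measures a and b.

    a, b: lists of nonnegative weights, each summing to 1.
    cost: len(a) x len(b) cost matrix as nested lists.
    eps: entropic regularization strength (smaller = closer to exact OT).
    Returns (plan, transport_cost).
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Columns now match b exactly; stop once rows also match a.
        row_err = max(
            abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - a[i])
            for i in range(n)
        )
        if row_err < tol:
            break
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total
```

For large costs or very small `eps`, `exp(-C / eps)` underflows to zero; practical implementations then work in the log domain.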

"WebApp/API: how does it work? VueJS"

May 15, 2018

Thomas Baldaquin

Summary: How do you create an end-to-end web application? Concretely, we will talk about REST! How can we build applications that are scalable, performant, simple, modular, portable, and resilient?

First, we will talk about the front-end application, and more precisely VueJS, a JavaScript framework competing with Angular. How can application logic be delegated to the client's browser, and what does that involve?

Then we will talk about creating an API that serves the data for our application, and about how these bricks can be tested and assembled into an architecture based on multiple services. We will use Pyramid, a Python web framework.
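Pyramid applications are WSGI applications, so the shape of a JSON REST endpoint can be shown with a plain WSGI callable and no third-party dependency. This is a swapped-in, standard-library sketch rather than Pyramid's own API; the `/items` resource and the in-memory `ITEMS` store are hypothetical.

```python
import json

# Hypothetical in-memory resource store; a real service would use a database.
ITEMS = {1: {"id": 1, "name": "first item"}}


def app(environ, start_response):
    """Minimal WSGI app exposing GET /items and GET /items/<id> as JSON."""
    method = environ["REQUEST_METHOD"]
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]

    if method == "GET" and parts == ["items"]:
        status, body = "200 OK", list(ITEMS.values())
    elif (method == "GET" and len(parts) == 2
          and parts[0] == "items" and parts[1].isdigit()):
        item = ITEMS.get(int(parts[1]))
        if item is not None:
            status, body = "200 OK", item
        else:
            status, body = "404 Not Found", {"error": "not found"}
    else:
        # Any other route or method is reported as not found in this sketch.
        status, body = "404 Not Found", {"error": "not found"}

    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

To serve it locally, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` from the standard library is enough; a VueJS front end would then fetch `/items` over HTTP.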

Finally, we will present Lumoney, the administrative management application based on these tools.

"Edge-prediction problem for social networks"

July 20, 2018

Teo Nguyen

Summary: This presentation will report on a two-month internship whose goal was to implement an edge-prediction algorithm. In a graph that evolves over time (typically a social network), we try to predict which edges will form in the next period. The presentation will therefore explain how to compute the probability that two nodes form an edge, and which metric is used to validate a prediction model. Finally, test results on a manually constructed 200k-edge graph will be presented.
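The summary does not say which scoring method or validation metric the internship used. As an illustration only, the sketch below uses two common choices: the Adamic-Adar neighborhood score to rank candidate pairs, and ROC AUC (the probability that a future edge outranks a non-edge) to validate the ranking.

```python
import math
from itertools import combinations


def neighbors(edges):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over the common neighbors of u and v.

    A common neighbor is adjacent to both u and v, so its degree is >= 2
    and the logarithm is strictly positive.
    """
    common = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / math.log(len(adj[w])) for w in common)


def rank_candidate_pairs(adj):
    """Non-adjacent node pairs sorted by decreasing Adamic-Adar score."""
    pairs = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    return sorted(pairs, key=lambda p: adamic_adar(adj, *p), reverse=True)


def auc(scores_pos, scores_neg):
    """Probability that a random true future edge outranks a random non-edge."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))
```

In a temporal setting, the scores would be computed on the graph of one period, and the AUC measured against the edges that actually appear in the next period.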

"Some tools for the use of the MCMC algorithm on time series"

August 28, 2018

Claire Gayral

Summary: To group time series, we use a Scala program based on Markov chain Monte Carlo (MCMC) methods. This algorithm is complex; among other things, its results depend on several parameters that are specific to each dataset. The idea was to write a set of functions to quickly determine the initial parameters. Some questions deserved particular attention: How do we know whether a clustering is "good"? When can we say that the algorithm has converged? Is it "stable"? This short presentation introduces these questions.
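For the question "when can we say that the algorithm has converged?", one standard diagnostic is the Gelman-Rubin potential scale reduction factor (R-hat), which compares within-chain and between-chain variance across several independent chains. The source does not say which criterion was used; this is a Python sketch of that one diagnostic, not the Scala program itself.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one scalar parameter.

    chains: list of at least two chains, each a list of n >= 2 post-burn-in
    draws of the same length. Values close to 1 suggest the chains have mixed;
    values well above 1 (a common rule of thumb is > 1.1) suggest they have
    not converged. Assumes the within-chain variance is nonzero.
    """
    n = len(chains[0])
    within = mean([variance(c) for c in chains])         # W
    between = n * variance([mean(c) for c in chains])    # B
    var_hat = (n - 1) / n * within + between / n
    return (var_hat / within) ** 0.5
```

Chains started from dispersed initial parameters and stuck in different regions give a large between-chain term, hence an R-hat well above 1.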

"Large-scale prediction via sparsity induced by variable binarization"

September 28, 2018

Mokhtar Zahdi

Summary: We consider the supervised linear learning problem where a large number of continuous variables are available. To address it, we combine one-hot encoding of these variables with a new penalty called binarsity. Within each group of binary variables resulting from the one-hot encoding of a single raw variable, this penalty applies total-variation regularization together with an additional linear constraint that avoids collinearity within the group. We establish non-asymptotic oracle inequalities for generalized linear models, and we illustrate the numerical performance of our approach on several datasets.
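The two ingredients, binarization of a continuous variable and the binarsity penalty on each resulting group, can be sketched as below. Assumptions: bins are cut at empirical quantiles, and the total-variation term is unweighted (the published penalty uses data-driven weights); the linear constraint is taken to be that each group's coefficients sum to zero.

```python
import math
from bisect import bisect_right


def quantile_cuts(values, n_bins):
    """Interior cut points splitting `values` into roughly equal-frequency bins."""
    s = sorted(values)
    return [s[len(s) * k // n_bins] for k in range(1, n_bins)]


def one_hot(x, cuts):
    """One-hot encode x into len(cuts) + 1 bins delimited by `cuts`."""
    vec = [0] * (len(cuts) + 1)
    vec[bisect_right(cuts, x)] = 1
    return vec


def binarsity(groups, tol=1e-9):
    """Unweighted binarsity penalty for a list of coefficient groups.

    Each group holds the coefficients of the one-hot columns of one raw
    variable. The penalty is the total variation sum |theta_k - theta_{k-1}|
    within each group, plus the indicator of the linear constraint
    sum(theta) == 0 (infinite penalty when it is violated).
    """
    total = 0.0
    for theta in groups:
        if abs(sum(theta)) > tol:
            return math.inf
        total += sum(abs(theta[k] - theta[k - 1]) for k in range(1, len(theta)))
    return total
```

The total-variation term favors piecewise-constant coefficients across adjacent bins, which is what produces sparsity in the differences between consecutive bins.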

"Classification of e-commerce site pages and intelligent scraping"

October 1, 2018

Chemsdin Naamane

Summary: A brief overview of DOM trees and the constraints involved in working with them, before turning to the classification of e-commerce site pages using NLP. Then, intelligent scraping using NLP and computer vision.
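The summary does not detail the models used. As a minimal illustration of NLP-based page classification, the sketch extracts visible text from the DOM with the standard-library `html.parser.HTMLParser` and classifies it with a multinomial naive Bayes model; the page labels and example pages are hypothetical, and the computer-vision part is not covered.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects lowercase words from text nodes, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.words, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(data.lower().split())


def page_words(html):
    parser = TextExtractor()
    parser.feed(html)
    return parser.words


class NaiveBayes:
    """Multinomial naive Bayes over bags of words with Laplace smoothing."""

    def fit(self, docs, labels):
        self.counts = {c: Counter() for c in set(labels)}
        self.priors = Counter(labels)
        for words, c in zip(docs, labels):
            self.counts[c].update(words)
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, words):
        n_docs = sum(self.priors.values())

        def log_score(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            return math.log(self.priors[c] / n_docs) + sum(
                math.log((self.counts[c][w] + 1) / total) for w in words
            )

        return max(self.counts, key=log_score)
```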

"Dependency inversion in IT"

October 1, 2018

Yves Darmaillac + Peio Roth

Summary: I will present dependency inversion, which is key to keeping software sustainable in the medium to long term. When well mastered, this concept eliminates "rigidity"; in other words, the software can evolve (and therefore be maintained) under good conditions. It is also a way to achieve productivity and quality gains from the very first development cycles of a project, even though a small investment is required at the beginning.
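A small illustration of the principle: the high-level policy depends on an abstraction it owns, and the low-level storage detail implements that abstraction, so the storage can change without touching the policy. The invoice domain and all class names below are hypothetical.

```python
from abc import ABC, abstractmethod


class InvoiceRepository(ABC):
    """Abstraction owned by the business layer; storage details depend on it."""

    @abstractmethod
    def save(self, invoice: dict) -> None: ...

    @abstractmethod
    def all(self) -> list: ...


class InMemoryInvoiceRepository(InvoiceRepository):
    """Low-level detail: could be replaced by a SQL-backed class
    without any change to BillingService."""

    def __init__(self):
        self._rows = []

    def save(self, invoice):
        self._rows.append(invoice)

    def all(self):
        return list(self._rows)


class BillingService:
    """High-level policy: depends only on the InvoiceRepository abstraction."""

    def __init__(self, repository: InvoiceRepository):
        self._repository = repository  # injected, not constructed here

    def bill(self, customer: str, amount: float) -> dict:
        invoice = {"customer": customer, "amount": amount}
        self._repository.save(invoice)
        return invoice

    def total_billed(self) -> float:
        return sum(inv["amount"] for inv in self._repository.all())
```

Because the repository is injected, unit tests can pass the in-memory implementation while production code passes a database-backed one.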

In the second part, Peio Roth, an experienced developer, will introduce you to the challenges of technical excellence in software development.

"Median of Means"

November 2018

Édouard Genetay

Summary: Due to the rise of digital technologies in our societies, statisticians are now typically confronted with masses of raw data from which they must extract information. This data, coming for example from sensors or summarizing behavioural history, is often corrupted: it contains measurement or data-entry errors or missing values, or it may even have been deliberately manipulated (cyber-attack). Therefore, the use of robust statistical algorithms, i.e. algorithms that are not very sensitive to the presence of outliers, is crucial: it saves the statistician valuable time and ensures the accuracy of their conclusions. Sometimes the challenge is precisely to identify and understand the anomalous data.

In this thesis project, we are interested in a general principle of robust estimation called Median-of-Means (MOM), which yields statistical procedures that are not very sensitive to outliers while also detecting corrupted data. We propose to use the MOM principle for unsupervised classification tasks. Unsupervised classification is a data analysis technique often used as an exploratory method on raw data, which particularly justifies the use of robust algorithms. We envisage a practical implementation and a mathematical analysis of the statistical guarantees of new unsupervised classification algorithms based on the MOM principle.
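The MOM principle is easiest to see on the simplest task, estimating a mean: split the sample into blocks, average each block, and take the median of the block averages. The sketch below shows only this basic estimator (the thesis applies the principle to unsupervised classification); the `seed` parameter for optional shuffling is an illustrative choice.

```python
import random
from statistics import mean, median


def median_of_means(data, n_blocks, seed=None):
    """Median-of-Means estimate of the mean of `data`.

    Splits the sample into `n_blocks` disjoint blocks of equal size (any
    remainder is discarded), averages each block and returns the median of
    the block means. Outliers can only spoil the blocks they fall into, so
    if fewer than half of the blocks are corrupted, the estimate stays
    between the smallest and largest clean block means.
    If `seed` is given, the data are shuffled before splitting.
    """
    values = list(data)
    if seed is not None:
        random.Random(seed).shuffle(values)
    block_size = len(values) // n_blocks
    blocks = [values[k * block_size:(k + 1) * block_size] for k in range(n_blocks)]
    return median([mean(b) for b in blocks])
```

For example, with nine values equal to 1.0 and one outlier at 1000.0, the empirical mean is 100.9, while five blocks confine the outlier to a single block and the MOM estimate stays at 1.0.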

"Community detection with overlaps"

January 2019

Sébastien Loustau

Summary: Traditional community detection algorithms (and clustering algorithms in general) produce a partition, which is a significant restriction: each element can belong to only one community. In this presentation, we present three algorithmic approaches to overcome this limitation:

• Louvain Fuzzy algorithm, which combines several runs of Louvain and exploits Louvain's hesitations

• Genetic algorithm that predicts a nodes × communities membership matrix

• Local algorithm that looks for the community of a node at several resolution scales

"AI helping surgeons in Operating Rooms"

April 2018

Julien Peyras

Summary: AI first spread through marketing automation and, more recently, through industry in its "Industry 4.0" form. However, introducing AI into operating rooms (ORs) is a subject prone to reluctance: it is still being debated and is far from commonly adopted.

During the https://www.gynecologic-surgery.com/ Congress, we presented how an approach similar to marketing automation could actually bring value to the OR.

© 2019 - Legal notice
