Training courses

To help your teams become autonomous, we can upskill your data scientists in specific areas.
We deliver in-house training in machine learning and deep learning, tailored to your needs.
Experts and researchers working at the cutting edge of the field can also intervene within your organization.
Several formats are possible: short courses lasting a few days, or coaching over a full program, sometimes spread over several months at a few hours per week.

“Machine Learning” Cycle

Graph Mining: Theory (7h)

Introduction to Graph Mining
Mathematical modeling and random graph models
Spectral theory and spectral clustering algorithm
Greedy optimization of modularity
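As a worked illustration of modularity, the quantity that greedy methods optimize: for a partition into communities, Q = Σ_c [L_c / m - (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside community c and d_c the sum of the degrees of its nodes. The Python sketch below is a minimal, hypothetical helper (the name `modularity` and the edge-list/dict input format are our own assumptions), not the course material itself:

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a node partition for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    m = len(edges)
    degree = defaultdict(int)    # node -> degree
    internal = defaultdict(int)  # community -> number of edges inside it
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree = defaultdict(int)  # community -> sum of its nodes' degrees
    for node, d in degree.items():
        total_degree[community[node]] += d
    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))                        # ≈ 0.357 (exactly 5/14)
print(modularity(edges, {n: "all" for n in range(6)}))  # 0.0
```

Greedy algorithms such as the Louvain method repeatedly move or merge nodes whenever doing so increases Q; a value near 0 means the partition captures no more internal structure than a random graph with the same degrees.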

Graph Mining: Practical Work (7h)

Generation of random graphs
Spectral decomposition and community detection
Reading larger real-world graphs
Louvain algorithm
Applications to heterogeneous data analysis
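Spectral community detection typically splits a graph using the eigenvector of the Laplacian L = D - A associated with its second-smallest eigenvalue (the Fiedler vector). A minimal pure-Python sketch, assuming a connected, undirected graph stored as a dict of neighbour sets; it approximates the Fiedler vector by power iteration on cI - L with the constant vector projected out, whereas a real implementation would call an eigensolver (NumPy or SciPy). `fiedler_bisection` is a hypothetical name:

```python
import math


def fiedler_bisection(adj, n_iter=500):
    """Split a connected undirected graph in two by the sign of an approximate Fiedler vector.

    adj: dict mapping each node to the set of its neighbours.
    """
    nodes = sorted(adj)
    pos = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)
    deg = [len(adj[u]) for u in nodes]
    c = 2 * max(deg)  # upper bound on the largest eigenvalue of L = D - A
    # Start vector orthogonal to the all-ones vector (the trivial eigenvector of L).
    x = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(n_iter):
        # y = (c*I - L) x; its dominant eigenvector orthogonal to 1 is the Fiedler vector.
        y = [(c - deg[i]) * x[i] + sum(x[pos[v]] for v in adj[nodes[i]]) for i in range(n)]
        mean = sum(y) / n
        y = [yi - mean for yi in y]  # deflate: remove the constant component
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    left = {u for u in nodes if x[pos[u]] >= 0}
    return left, set(nodes) - left


# Two triangles joined by the bridge edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(fiedler_bisection(adj))
```

For more than two communities, spectral clustering instead embeds the nodes with several eigenvectors and runs k-means on that embedding.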

NLP: Theory (7h)

NLP preprocessing: tokenization, lemmatization, part of speech, stopwords
Text vectorization
Similarity and distance measures for text
Text clustering and classification, Latent Dirichlet Allocation
Generative models and neural networks
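Two common text measures, sketched on bag-of-words representations: cosine similarity between word-count vectors and Jaccard distance between word sets. The whitespace tokenizer and the function names are illustrative assumptions:

```python
import math
from collections import Counter


def tokens(text):
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors."""
    a, b = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_distance(text_a, text_b):
    """1 - |A ∩ B| / |A ∪ B| on the sets of words."""
    a, b = set(tokens(text_a)), set(tokens(text_b))
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


print(cosine_similarity("the cat sat", "the dog sat"))  # ≈ 0.667
print(jaccard_distance("the cat sat", "the dog sat"))   # 0.5
```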

NLP: Practical Work (7h)

NLP preprocessing
One-hot encoding, tf-idf
Use of word embeddings (word2vec, doc2vec, fasttext, …)
Application to text classification and clustering
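A from-scratch sketch of TF-IDF weighting, assuming whitespace tokenization, term frequency = count / document length and idf = log(N / df). This is one textbook variant among several; library implementations differ (scikit-learn's `TfidfVectorizer`, for instance, smooths the idf by default and L2-normalizes each row). The function name is hypothetical:

```python
import math
from collections import Counter


def tf_idf(docs):
    """Plain TF-IDF: tf = count / doc length, idf = log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for words in tokenized for word in set(words))
    idf = {word: math.log(n_docs / count) for word, count in df.items()}
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append({w: (c / len(words)) * idf[w] for w, c in counts.items()})
    return vectors


docs = ["the cat sat", "the dog sat", "the cat ran"]
for vec in tf_idf(docs):
    print(vec)
```

Words present in every document, such as "the" here, receive a weight of 0, while rarer words such as "dog" are emphasized.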

Case study: creating a chatbot (7h)

“Artificial Intelligence” Cycle

Statistical and real-time learning (4 hours)

In this course, we introduce two machine learning paradigms: statistical learning and online learning. Statistical learning is the conventional ML framework, in which a model or algorithm is fitted on a static set of observed data. It relies on the standard i.i.d. assumption, which limits its relevance for many current applications. We therefore introduce a more realistic framework, online learning, in which observations arrive sequentially. We present algorithms that adapt as each new observation arrives, introduce standard methods such as the exponentially weighted average forecaster and bandit algorithms, and highlight their practical performance across different scenarios.
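The exponentially weighted average forecaster can be sketched in a few lines for prediction with expert advice. This is a minimal illustration assuming squared loss and a fixed learning rate `eta` (both our own choices; the course may use other losses or tuning schemes), with a hypothetical function name:

```python
import math


def ewa_forecaster(expert_predictions, outcomes, eta=0.5):
    """Exponentially weighted average forecaster under squared loss.

    expert_predictions[t][i] is expert i's prediction at round t.
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0 / n_experts] * n_experts
    forecasts = []
    for preds, y in zip(expert_predictions, outcomes):
        # Predict with the weighted average of the experts' advice...
        forecasts.append(sum(w * p for w, p in zip(weights, preds)))
        # ...then, once y is revealed, shrink each weight by exp(-eta * loss).
        weights = [w * math.exp(-eta * (p - y) ** 2) for w, p in zip(weights, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize to avoid underflow
    return forecasts, weights


preds = [(1.0, 0.0)] * 20  # expert 0 is always right, expert 1 always off by 1
outcomes = [1.0] * 20
forecasts, weights = ewa_forecaster(preds, outcomes)
print(forecasts[0], forecasts[-1])
```

Experts that incur large losses see their weight shrink exponentially, so the forecast quickly concentrates on the best expert.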

Program

  • Statistical learning: supervised and unsupervised cases, over-fitting, cross-validation and generalization.
  • Online learning: prediction with expert advice, online regression and clustering, UCB
  • Applications: time series forecasting (oil and gas), A/B testing (digital marketing)
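For the A/B-testing application, a bandit algorithm allocates traffic adaptively instead of splitting it evenly. The sketch implements UCB1 (empirical mean plus an exploration bonus of sqrt(2 ln t / n_a)) on a simulated test; the conversion rates, seed, horizon and the function name are made-up values for illustration:

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """UCB1: play the arm with the highest empirical mean + exploration bonus."""
    counts = [0] * n_arms  # times each arm was played
    sums = [0.0] * n_arms  # total reward collected per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Simulated A/B test: variant B (arm 1) has the higher, but unknown, conversion rate.
rng = random.Random(42)
rates = [0.2, 0.8]  # hypothetical conversion rates, unknown to the algorithm
counts = ucb1(lambda arm: 1.0 if rng.random() < rates[arm] else 0.0, n_arms=2, horizon=2000)
print(counts)
```

Most of the traffic ends up on the better variant, while the weaker one is still sampled occasionally so that its estimate keeps improving.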

Recommender systems (6 hours)

Recommender systems became widely known through the Netflix Prize. Historically, the challenge focused on approaches that deliver accurate personalized content by predicting users' movie ratings, using only the ratings users had previously given. Recommender systems are also used for information retrieval and content discovery: combined with search and browsing, they help users faced with a huge amount of information navigate it efficiently and satisfactorily. Finally, the need for real-time decisions, together with the desire to offer content that is both accurate and diverse, has driven the evolution of traditional approaches such as collaborative filtering, from factorization machines to bandit models. This historical evolution towards bandit models is at the heart of the course. We explain the main statistical approaches from both a theoretical and an algorithmic point of view; factorization machines and bandit models are presented and tested on real data sets.

Program

  • Factorization machines: theory and algorithms
  • Bandit models for online prediction: theory and algorithms
  • Applications: food supermarket recommendations, and real-time bandit models on data collected from social media
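A degree-2 factorization machine predicts ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} <v_i, v_j> x_i x_j, and the pairwise term can be computed in O(kn) rather than O(kn²). A minimal prediction-only sketch, assuming dense Python lists and hand-picked parameters (training, e.g. by SGD or ALS, is left out; `fm_predict` is a hypothetical name):

```python
def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine prediction.

    x: feature vector (length n), w0: global bias, w: linear weights (length n),
    V: n x k matrix of latent factors, one row per feature.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # Pairwise term sum_{i<j} <V_i, V_j> x_i x_j, computed in O(k n) via
    # 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i (V_if x_i)^2].
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


# One-hot encoding of (user, item): features 0-1 are users, 2-3 are items.
x = [1.0, 0.0, 0.0, 1.0]  # user 0 interacting with item 1
w0 = 0.5
w = [0.1, 0.2, 0.3, 0.4]
V = [[1.0, 2.0], [0.5, 0.5], [0.0, 1.0], [3.0, -1.0]]
print(fm_predict(x, w0, w, V))  # ≈ 2.0
```

With one-hot user and item features, the pairwise term reduces to the dot product of the user's and the item's latent vectors, which is how factorization machines generalize matrix factorization.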

Natural language processing (22 hours)

This training offers an overview of traditional methods as well as recent models and algorithms for analyzing textual data. You will discover the main issues and levers of NLP, word vectorization and sequence-to-sequence (seq2seq) models. You will practice these concepts in Python during hands-on sessions. A complete session is dedicated to building a chatbot, drawing on all the concepts covered during the NLP training.

Program

  • NLP preprocessing: tokenization, lemmatization, part of speech, stopwords
  • Vectorization of the text
  • Similarity and distance measures between texts
  • Text clustering and classification, Latent Dirichlet Allocation
  • Generative models and neural networks
  • One-hot encoding, tf-idf
  • Use of word embeddings (word2vec, doc2vec, fasttext, …)
  • Application to text classification and clustering
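A minimal sketch of the first preprocessing steps and of one-hot encoding, assuming a regex tokenizer and a tiny hand-written stopword list (lemmatization and part-of-speech tagging, which usually rely on libraries such as spaCy or NLTK, are omitted; the function names are hypothetical):

```python
import re

# Tiny illustrative stopword list; real projects use a curated list (e.g. from NLTK or spaCy).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase, tokenize on letters/digits/apostrophes, drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def one_hot(tokens, vocabulary):
    """One vector per token, with a single 1 at the word's vocabulary index."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for token in tokens:
        vec = [0] * len(vocabulary)
        if token in index:  # out-of-vocabulary tokens stay all-zero
            vec[index[token]] = 1
        vectors.append(vec)
    return vectors


tokens = preprocess("The cat is on the mat.")
vocabulary = sorted(set(tokens))
print(tokens)                        # ['cat', 'on', 'mat']
print(one_hot(tokens, vocabulary))   # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

One-hot vectors are as long as the vocabulary and treat all words as equally distant, which motivates tf-idf weighting and dense embeddings such as word2vec or fastText.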

Deep learning (16 hours)

The training provides the elements needed to understand deep learning and how it is implemented. A practical part gives participants the opportunity to manipulate these new concepts by solving cases based on real data sets.

Program

  • ERM (empirical risk minimization), generalization and regularization
  • Bias reduction: from linear to kernel methods
  • CNN (Convolutional Neural Network) for images: architecture and complexity
  • Generalization and effective dimensionality of a deep network
  • Unsupervised lexical extension
  • Supervised learning and the curse of dimensionality
  • Neural networks for large-scale problems
  • Preliminary stage: supervised architecture
  • TensorFlow: a Python framework for deep learning
  • RNN (Recurrent Neural Networks) with TensorFlow
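Empirical risk minimization with regularization, the first topic of this program, can be illustrated on the simplest possible model. The sketch fits a one-dimensional linear model by gradient descent on the mean squared error plus an L2 penalty lam * w² (ridge regression); the data, step size, number of steps and the name `ridge_gd` are illustrative assumptions:

```python
def ridge_gd(xs, ys, lam=0.1, lr=0.01, n_steps=5000):
    """Regularized ERM for a 1-D linear model y ≈ w*x + b by gradient descent.

    Objective: (1/n) * sum_i (w*x_i + b - y_i)^2 + lam * w^2 (the bias is not penalized).
    """
    n = len(xs)
    w = b = 0.0
    for _ in range(n_steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n + 2 * lam * w
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # noiseless data with true slope 2, intercept 1
print(ridge_gd(xs, ys, lam=0.0))  # close to (2, 1)
print(ridge_gd(xs, ys, lam=0.1))  # slope shrunk towards 0
```

Increasing `lam` pulls the slope towards 0 (for this data, to 4 / (2 + lam) instead of 2), trading a little bias for lower variance; the same penalty applied to a neural network's weights corresponds to weight decay.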

Graph mining (16 hours)

This course provides an overview of traditional and more recent methods, models and algorithms for analyzing large graphs. You will discover the main issues and levers of graph mining, random graph models, and the theoretical and algorithmic foundations of graph analysis and community detection. You will also practice these concepts in Python during hands-on sessions.

Program

  • Mathematical modeling and random graph models
  • Spectral theory and spectral clustering algorithm
  • Modularity optimization
  • Generation of random graphs
  • Spectral decomposition and community detection
  • Reading larger real-world graphs
  • Louvain algorithm
  • Applications to heterogeneous data analysis and social networks
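The Louvain algorithm alternates two phases: nodes are moved greedily between neighbouring communities while modularity increases, then each community is collapsed into a single node and the process repeats. A pure-Python sketch of the first (local moving) phase only, assuming a weighted adjacency dict and a fixed node visiting order; `louvain_local_moving` is a hypothetical name, and for real graphs NetworkX provides a full implementation (`louvain_communities`):

```python
from collections import defaultdict


def louvain_local_moving(adj):
    """First phase of the Louvain method: greedy local moving of nodes.

    adj: dict node -> dict neighbour -> edge weight (undirected, symmetric).
    Returns a dict node -> community label.
    """
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())  # 2m
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    community = {u: u for u in adj}  # start with singleton communities
    tot = dict(degree)               # community -> sum of member degrees
    improved = True
    while improved:
        improved = False
        for u in adj:
            old = community[u]
            # Weight of the links from u to each neighbouring community.
            links = defaultdict(float)
            for v, w in adj[u].items():
                if v != u:
                    links[community[v]] += w
            tot[old] -= degree[u]  # temporarily remove u from its community
            # Modularity gain of inserting u into community c, up to a
            # positive constant factor: k_u,in(c) - tot(c) * k_u / 2m.
            best, best_gain = old, links[old] - tot[old] * degree[u] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[u] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += degree[u]
            if best != old:
                community[u] = best
                improved = True
    return community


# Two triangles joined by the bridge edge 2-3, all weights equal to 1.
adj = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
print(louvain_local_moving(adj))
```

On these two triangles the local moving phase already recovers the two natural communities; on larger graphs the aggregation phase is needed to reach a good partition.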

Organisation of training courses:

Pedagogical and technical means

  • Trainees are welcomed in a room dedicated to training.
  • Training materials projected on screen.
  • Theoretical presentations.
  • Case studies.
  • Training materials made available online after the course.

Monitoring of course delivery and evaluation of training results.

  • Attendance sheets.
  • Oral or written questions (MCQ).
  • Case scenarios.
  • Training evaluation forms.