Laboratoire Angevin de Recherche en Ingénierie des Systèmes


OCTAVE Research Project

Extraction and management of customer knowledge by unsupervised learning methods.
PhD student: Axel GUERIN
Thesis director: Frédéric Saubion; Co-director: Pierre Chauvet
Industrial supervision: C. COURTOIS (OCTAVE)

Thesis start: March 2021

Group: Information, Signal, Image and Life Sciences

Contacts: frederic.saubion @ and chauvet @

Objective of thesis

This PhD work has two objectives. The main one is to add predictive segmentation to a decision-support tool used by the sales staff of Octave's client companies, so that they can target marketing campaigns at specific customer typologies. The second is to build a purchase recommendation engine, based on the same segmentation, for the websites of Octave's clients. Because the product will eventually serve several companies, a generic methodology applicable to the retail domain must be defined. In particular, the tool must be able to use both the internal data of these companies and external databases.

Scientific and technical challenges

Definition of a generic data-preparation methodology for retail:

The first difficulty for this system is building a data-management brick that can collect very heterogeneous data from different sources. This brick must handle missing, incomplete or even incorrect data. The data may also change over time: the client company's commercial activity may change, the type of data it stores may be enriched or modified, or the labels and/or associated formatting may be altered. Substantial research is therefore needed on tools and algorithms that detect erroneous data (or data that does not comply with the expected format), replace missing data, or remove insufficiently qualified samples. This research is also closely tied to the choice of segmentation and prediction models, such as machine learning (ML) models, that will be implemented in the data-analysis brick.
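As a minimal sketch of the three operations named above (detecting non-compliant values, removing insufficiently qualified samples, replacing missing data), the Python snippet below assumes records arrive as dictionaries with three hypothetical fields, `age`, `postcode` and `total_spent`. The field names, the validation rules (including a five-digit postcode), the 50% missing-value threshold and the median imputation are all illustrative assumptions, not the project's actual schema or method.

```python
import re
from statistics import median
from typing import Any, Callable

# Hypothetical schema: field names and formatting rules are illustrative only.
VALIDATORS: dict[str, Callable[[Any], bool]] = {
    "age": lambda v: type(v) is int and 0 <= v <= 120,
    "postcode": lambda v: isinstance(v, str) and re.fullmatch(r"\d{5}", v) is not None,
    "total_spent": lambda v: type(v) in (int, float) and v >= 0,
}
NUMERIC_FIELDS = ("age", "total_spent")


def clean(records: list[dict[str, Any]], max_missing: float = 0.5) -> list[dict[str, Any]]:
    """Detect invalid values, drop poorly qualified samples, impute numeric gaps."""
    # 1. Detection: a value that fails its validator is treated as missing (None).
    flagged = [
        {field: (rec.get(field) if check(rec.get(field)) else None)
         for field, check in VALIDATORS.items()}
        for rec in records
    ]
    # 2. Deletion: drop samples whose share of missing fields exceeds max_missing.
    kept = [row for row in flagged
            if sum(v is None for v in row.values()) / len(row) <= max_missing]
    # 3. Replacement: fill numeric gaps with the median of the remaining valid values.
    for field in NUMERIC_FIELDS:
        valid = [row[field] for row in kept if row[field] is not None]
        if valid:
            fill = median(valid)
            for row in kept:
                if row[field] is None:
                    row[field] = fill
    return kept
```

Median imputation is only one possible replacement strategy; categorical fields such as `postcode` are left as missing here, which is where a distance that tolerates gaps can take over.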

Construction of unsupervised learning algorithms:

ML algorithms must also be adapted to all these data typologies. Even after pre-processing, the data will call for changes within the unsupervised learning methods themselves, for example the choice of a relevant distance. These changes go hand in hand with the preparation phase and, in some cases, may help overcome the problem of missing data. Part of the work will be devoted to evaluating the relevance of these algorithms: measuring inter-cluster separation and intra-cluster inertia, comparing the results with business expertise, and so on. The last part will be devoted to visualising the results: since the tool is intended for people with no background in artificial intelligence, the results must be visual and easy to understand.
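One common way to let the distance itself absorb missing data is a partial distance: the Euclidean distance is computed only over features observed in both vectors, then rescaled by the ratio of total to observed features. The sketch below is an illustration under that assumption, not the distance chosen in this project. It combines the partial distance with two of the evaluation measures mentioned above, taking intra-cluster inertia as the sum of squared distances to each cluster centroid and inter-cluster separation as the smallest distance between two centroids (other definitions of separation exist).

```python
import math
from typing import Optional, Sequence

Vector = Sequence[Optional[float]]  # None marks a missing feature


def partial_distance(x: Vector, y: Vector) -> float:
    """Euclidean distance over features observed in both vectors,
    rescaled by p / p_observed so rows with gaps stay comparable."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return math.inf  # no feature in common: treat as incomparable
    squared = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(squared * len(x) / len(pairs))


def centroid(points: list[Vector]) -> list[Optional[float]]:
    """Per-feature mean of observed values (None if a feature is never observed)."""
    center = []
    for j in range(len(points[0])):
        values = [p[j] for p in points if p[j] is not None]
        center.append(sum(values) / len(values) if values else None)
    return center


def evaluate(points: list[Vector], labels: list[int]) -> tuple[float, float]:
    """Return (intra-cluster inertia, inter-cluster separation)."""
    clusters: dict[int, list[Vector]] = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    centers = {k: centroid(members) for k, members in clusters.items()}
    # Inertia: sum of squared distances from each point to its own centroid.
    inertia = sum(partial_distance(p, centers[k]) ** 2 for p, k in zip(points, labels))
    # Separation: smallest distance between two distinct centroids.
    keys = sorted(centers)
    separation = min(
        (partial_distance(centers[a], centers[b])
         for i, a in enumerate(keys) for b in keys[i + 1:]),
        default=math.inf,
    )
    return inertia, separation
```

Lower inertia and larger separation both point to more compact, better-separated segments; comparing these figures across candidate distances is one way to check that a distance choice actually helps.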

Analysis and explanation of data:

Logical data analysis can provide explanations that help users understand the structural characteristics of groups of data. We plan to apply, in this new context, approaches we have already used on biological data. These methods also extract characteristic patterns that can then be presented to users.
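In logical analysis of data, a group is typically described by patterns: conjunctions of attribute conditions that hold for many members of the group and for no one outside it. The sketch below is a simplified, brute-force illustration on hypothetical binary customer attributes; it is not the approach previously used on biological data, and the maximum pattern degree and minimum coverage threshold are assumptions.

```python
from itertools import combinations

# Hypothetical representation: each customer is a dict of binary attributes,
# e.g. {"buys_online": True, "loyalty_card": False}.
Customer = dict[str, bool]
Literal = tuple[str, bool]  # (attribute, value required by the pattern)


def covers(pattern: tuple[Literal, ...], customer: Customer) -> bool:
    """True when the customer satisfies every literal of the pattern."""
    return all(customer[attr] == value for attr, value in pattern)


def characteristic_patterns(cluster: list[Customer], others: list[Customer],
                            max_degree: int = 2, min_coverage: float = 0.5):
    """Brute-force search for pure patterns: conjunctions of at most
    `max_degree` literals covering at least `min_coverage` of the cluster
    and no customer outside it."""
    attrs = sorted(cluster[0])
    literals = [(a, v) for a in attrs for v in (True, False)]
    found: list[tuple[tuple[Literal, ...], float]] = []
    for degree in range(1, max_degree + 1):
        for pattern in combinations(literals, degree):
            if len({attr for attr, _ in pattern}) < degree:
                continue  # two literals on the same attribute: redundant or contradictory
            if any(set(p) <= set(pattern) for p, _ in found):
                continue  # skip extensions of an already accepted pattern
            if any(covers(pattern, c) for c in others):
                continue  # a pure pattern must not cover any outsider
            coverage = sum(covers(pattern, c) for c in cluster) / len(cluster)
            if coverage >= min_coverage:
                found.append((pattern, coverage))
    return found
```

Each returned pattern reads as a short rule ("buys online") with the share of the segment it describes, which is the kind of output that can be shown to non-specialist users.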

Development of a recommendation engine based on the results obtained by unsupervised learning methods:

This will improve the relevance of recommendations by taking into account parameters that have so far been unexploited. Today, e-commerce recommendation engines mainly rely on user ratings or on links between products. Tracing the entire customer journey and cross-referencing it with external data opens up far more possibilities. The recommendation engine would be a direct application of the previous phase: classifying customers on the basis of these data would yield recommendations that better reflect the whole customer journey.
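A simple way to tie recommendations to the segmentation is to suggest the products most often bought by other customers of the same segment. The sketch below assumes a hypothetical `segments` mapping (customer to cluster label, for example produced by the clustering step) and a hypothetical `purchases` mapping (customer to purchased product identifiers). It deliberately ignores external data and journey features, which a real engine would add as extra signals.

```python
from collections import Counter


def recommend(customer: str, segments: dict[str, int],
              purchases: dict[str, list[str]], k: int = 3) -> list[str]:
    """Return up to k products most often bought by other customers of the
    same segment, excluding products the customer already bought."""
    segment = segments[customer]
    already_bought = set(purchases.get(customer, []))
    counts = Counter(
        item
        for other, items in purchases.items()
        if other != customer and segments.get(other) == segment
        for item in set(items)  # each peer counts at most once per product
    )
    # Rank by popularity within the segment; ties are broken alphabetically.
    ranked = sorted((item for item in counts if item not in already_bought),
                    key=lambda item: (-counts[item], item))
    return ranked[:k]
```

Counting each peer once per product keeps a single heavy buyer from dominating the ranking; a customer alone in a segment simply receives no segment-based suggestion.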