I hold a Ph.D. in computer science. Currently, I work as a teacher at ENSSAT and as a researcher with the IRISA/EXPRESSION team. The team is located in Lannion and Vannes, Brittany, France. My primary topics of interest are TTS (Text-To-Speech) and ASR (Automatic Speech Recognition). My Ph.D. work focused on the study of unit selection algorithms in Text-To-Speech synthesis. After that, I worked as a Language Technology Scientist at Voysis in Dublin, Ireland. At Voysis, I worked on TTS, ASR, NLP (Natural Language Processing) and DSP (Digital Signal Processing). Then, I came back to IRISA to work on TTS again. I subsequently joined ViaDialog to work on a new TTS product before going back to ENSSAT.
After my Baccalauréat scientifique (French degree) in 2007, I first obtained a DUT (Diplôme Universitaire de Technologie) in computer science in 2009. I then entered ENSSAT (École Nationale Supérieure des Sciences Appliquées et de Technologie), where I obtained a "diplôme d'ingénieur" as well as a master's degree (from the University of Rennes) in 2012. Finally, I joined the IRISA/CORDIAL team (which became the IRISA/EXPRESSION team) for my Ph.D. Between February and July 2015, I worked as an intern at the IDIAP research center in Switzerland. I defended my Ph.D. on the 22nd of September 2016.
Besides research, I taught computer science at ENSSAT between 2013 and 2016. I was mainly (but not only) involved in the following courses: C and Java languages, Web technologies (mostly client side), UNIX/Linux programming, distributed programming and artificial intelligence. I taught a total of 380 hours of courses during that period.
At ViaDialog, my main task was the development and maintenance of a TTS capability. I served as lead for all TTS-related work. This included developing a working prototype capable of fast speech synthesis at a level of quality in line with 2021 standards. The system, based on a neural end-to-end architecture, required voice data for training. Hence, I worked on the recording and post-production of the data. Finally, I turned the prototype into a usable product and worked on its maintenance and on extending its capabilities, such as adding SSML support, security features and much more.
Here are a few key points:
At Voysis, I worked on a wide variety of tasks across a large part of our stack: ASR, voice activity detection (and query endpoint detection), wakeword detection, NLP, audio analysis and processing, and TTS. I ran several data collections (audio and text utterances) and several evaluation campaigns.
A few key points:
Image: My office at ENSSAT (10/2015).
I work in the field of automatic speech synthesis. The subject of my Ph.D. was the "Study of Unit Selection Text-To-Speech Synthesis Algorithms".
Two main types of strategy are currently under consideration in this field. The first relies on a statistical parametric approach, in which one tries to model speech signals. The models are then used generatively to produce speech utterances. This is called Statistical Parametric Speech Synthesis (often shortened to "SPSS", though this conflicts with the name of IBM's statistics software). The second approach, an evolution of concatenation-based synthesis, consists in storing and annotating a large speech corpus (usually several hours or even tens of hours), then extracting fragments (called units) and pasting them together to reproduce the utterance to be synthesized (the target utterance). The non-trivial mechanism by which these fragments are selected is referred to as Unit Selection. The general technique is called Corpus-Based Speech Synthesis.
The goal of my thesis was to explore and diagnose the Unit Selection mechanism and to suggest improvements. To meet these objectives, a corpus-based speech synthesis system was needed. For reasons of independence and flexibility, and to ensure control over the whole application, it was decided to build a completely new system rather than use and modify an existing tool. I therefore spent considerable time during my thesis contributing to the implementation of the resulting engine within the team and adding features to it.
On the pure research side, I first took an interest in evaluating the impact of the search algorithm on Unit Selection. In particular, the question was whether the optimality of the solution (i.e. the sequence of corpus units to be concatenated) is important and, if not, which search strategy is best. My conclusion was that the search algorithm noticeably affects the selection process only when searching for the optimal (or near-optimal) solution. Optimality of the solution is not necessary, however: even a heavily pruned unit selection can be used with few noticeable flaws.
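To make the search question concrete, here is a minimal sketch of a Viterbi-style unit selection with optional beam pruning. It is not the engine developed during the thesis: the names `select_units`, `target_cost`, `concat_cost` and `beam_width` are hypothetical, and each unit is reduced to a single number (think of a pitch value) so that both costs become simple absolute differences.

```python
from typing import List, Optional, Sequence, Tuple


def target_cost(unit: float, target: float) -> float:
    """Toy target cost (assumption): distance between a unit and its target value."""
    return abs(unit - target)


def concat_cost(prev_unit: float, unit: float) -> float:
    """Toy concatenation cost (assumption): discontinuity between two units."""
    return abs(unit - prev_unit)


def select_units(
    candidates: Sequence[Sequence[float]],
    targets: Sequence[float],
    beam_width: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Return (indices of the chosen candidates, total cost).

    candidates[i] lists the corpus units available for target position i.
    With beam_width=None the search is exhaustive (optimal path); otherwise
    only the beam_width cheapest partial paths survive at each position.
    """
    # One hypothesis per candidate of the first position: (cost so far, path).
    hyps = [(target_cost(u, targets[0]), [j]) for j, u in enumerate(candidates[0])]
    for i in range(1, len(targets)):
        hyps.sort(key=lambda h: h[0])
        if beam_width is not None:
            hyps = hyps[:beam_width]  # pruning: drop the most expensive paths
        new_hyps = []
        for j, unit in enumerate(candidates[i]):
            # Viterbi step: keep only the best predecessor for this candidate.
            best_cost, best_path = min(
                ((c + concat_cost(candidates[i - 1][p[-1]], unit), p) for c, p in hyps),
                key=lambda x: x[0],
            )
            new_hyps.append((best_cost + target_cost(unit, targets[i]), best_path + [j]))
        hyps = new_hyps
    cost, path = min(hyps, key=lambda h: h[0])
    return path, cost


targets = [1.0, 2.0, 3.0]
candidates = [[1.0, 5.0], [2.5, 2.0], [3.0, 0.0]]
optimal_path, optimal_cost = select_units(candidates, targets)
pruned_path, pruned_cost = select_units(candidates, targets, beam_width=1)
```

A pruned search can never return a cheaper path than the exhaustive one; the practical question is how much more expensive its result is, and whether the difference is audible.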
I also took an interest in the formulation of the cost function that the search algorithm uses to evaluate corpus units. An adapted preselection filtering method (i.e. not considering the units deemed least useful) and a Unit Selection influenced by a Vocalic Sandwich criterion (trying to avoid concatenations at points where they can degrade the signal) were tested. New target costs (judging the dissimilarity between a unit and its desired characteristics) and concatenation costs (judging the capacity of a unit to be pasted to the previous one without generating problems) have been implemented and tested. I am currently continuing my work in that direction, paying particular attention to the atom-based intonation decomposition technique, which I discovered during my stay at IDIAP, and its possible applications to target costs.
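As a rough illustration of these notions, the sketch below shows a weighted target cost, a join (concatenation) cost and a simple preselection step. The features (F0 and duration), the weights and the names `Unit`, `weighted_target_cost`, `join_cost` and `preselect` are all assumptions made for the example; they are not the costs or filters studied in the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Unit:
    """Hypothetical corpus unit described by a few acoustic features."""
    phone: str
    f0_start: float   # pitch at the left edge, in Hz
    f0_end: float     # pitch at the right edge, in Hz
    duration: float   # in seconds


def weighted_target_cost(
    unit: Unit, wanted_f0: float, wanted_duration: float, weights: Dict[str, float]
) -> float:
    """Weighted dissimilarity between a unit and its desired characteristics."""
    f0_mid = (unit.f0_start + unit.f0_end) / 2
    return (
        weights["f0"] * abs(f0_mid - wanted_f0)
        + weights["duration"] * abs(unit.duration - wanted_duration)
    )


def join_cost(left: Unit, right: Unit) -> float:
    """Pitch discontinuity at the point where two units would be pasted."""
    return abs(left.f0_end - right.f0_start)


def preselect(
    units: List[Unit],
    phone: str,
    wanted_f0: float,
    wanted_duration: float,
    weights: Dict[str, float],
    keep: int,
) -> List[Unit]:
    """Keep the `keep` cheapest units carrying the right phone label."""
    matching = [u for u in units if u.phone == phone]
    matching.sort(key=lambda u: weighted_target_cost(u, wanted_f0, wanted_duration, weights))
    return matching[:keep]


corpus = [
    Unit("a", 100.0, 110.0, 0.10),
    Unit("a", 200.0, 210.0, 0.10),
    Unit("b", 100.0, 100.0, 0.10),
    Unit("a", 118.0, 122.0, 0.12),
]
weights = {"f0": 1.0, "duration": 100.0}
kept = preselect(corpus, "a", wanted_f0=120.0, wanted_duration=0.10, weights=weights, keep=2)
```

Preselection shrinks the candidate lists before the search starts, which is why it interacts with the search strategy: the fewer candidates remain, the cheaper an exhaustive search becomes.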
Prior to working in the field of speech synthesis, I worked briefly on analogy relations, first through an internship and then in my free time. In particular, I developed a search algorithm for analogical proportions in concept lattices, which led to a publication. This work was a collaboration with Laurent Miclet and Henri Prade.
Keywords (thesis related): Unit Selection ; Corpus-Based Speech Synthesis ; Text-To-Speech Synthesis ; Cost Function ; Concatenation Cost ; Target Cost ; Neural Networks ; Deep Learning.
Image: My teaching/research place, ENSSAT engineering school in Lannion, Brittany, France
Courses I gave as part of a "Mission d'enseignement" (teaching mission) during my second year as a PhD student at ENSSAT engineering school.
Courses I gave as part of a "Mission d'enseignement" during my third year as a PhD student at ENSSAT engineering school.
Courses I gave as part of my ATER (Attaché Temporaire d'Enseignement et de Recherche) position at ENSSAT.
Courses I am currently giving at ENSSAT.
July 2015 - September 2015 • Travel
A few photos of places I visited & loved with a few comments.
December 2015 • Music
One of my favorite interests is music, but one cannot always go to concerts. So, to listen to music recordings, you need either speakers or headphones. Here are my impressions concerning the latter.