David Guennec's Personal Pages

About me

Who am I?

I hold a Ph.D. in computer science. I currently work as a teacher at ENSSAT and as a researcher with the IRISA/EXPRESSION team, which is located in Lannion and Vannes, Brittany, France. My primary topics of interest are TTS (Text-To-Speech) and ASR (Automatic Speech Recognition). My Ph.D. work focused on the study of unit selection algorithms in Text-To-Speech synthesis. After that, I worked as a Language Technology Scientist at Voysis in Dublin, Ireland, on TTS, ASR, NLP (Natural Language Processing) and DSP (Digital Signal Processing). I then came back to IRISA to work on TTS again, and subsequently joined ViaDialog to work on a new TTS product before returning to ENSSAT.

My Background

David Guennec at TSD 2014 in Brno, Czech Republic

After my Baccalauréat scientifique (French degree) in 2007, I obtained a DUT (Diplôme Universitaire de Technologie) in computer science in 2009. I then entered ENSSAT (École Nationale Supérieure des Sciences Appliquées et de Technologie), where I obtained a "diplôme d'ingénieur" as well as a master's degree (from the University of Rennes) in 2012. Finally, I joined the IRISA/CORDIAL team (which became the IRISA/EXPRESSION team) for my Ph.D. Between February and July 2015, I worked as an intern at the IDIAP research center in Switzerland. I defended my Ph.D. on the 22nd of September 2016.

Besides research, I taught computer science at ENSSAT between 2013 and 2016. I was mainly (but not only) involved in the following courses: C and Java languages, Web technologies (client side mostly), UNIX/Linux programming, distributed programming and artificial intelligence. I taught a total of 380h of courses during that period.

Resume

My CV in French can be downloaded here.

Experience

  • Since Sept. 2022 - University Lecturer and Researcher at Université de Rennes 1, ENSSAT, France.
    Teaching at ENSSAT Lannion. Working on speech production and audio deepfake detection with IRISA/Expression team.
  • Dec. 2020-Jul. 2022 - Data Scientist Voice AI at ViaDialog, France
    Working on TTS & ASR problems (mainly neural TTS).
  • 2020-2021 (9 months) - Research Engineer at IRISA (ENSSAT Lannion), France
    Building a new end-to-end neural Text-To-Speech Synthesis system and training pipeline for the Expression team of IRISA.
  • 2017-2020 (3 years) - Language Technology Scientist at Voysis, Dublin, Ireland.
    Work on User Simulation, Voice Activity Detection and Query Endpoint Detection, Text-To-Speech synthesis, Wakeword Detection, Audio Analysis & Processing and NLP tasks.
  • Feb. - Jul. 2015 (5 months) - Research visit at IDIAP research center, Martigny, Switzerland
    Analysis and synthesis of prosody for expressive speech synthesis. Work on F0 modelling using Atom-based intonation decomposition.
  • 2013-2016 (3 years) - Teaching at ENSSAT Lannion, France
    Two "missions d'enseignement" in 2013-2015 (service limited to 64h a year) and one ATER mission in 2015-2016 (full service).
    Taught in several bachelor's and master's courses (UNIX, distributed programming, AI)
  • 2012-2016 (4 years) - Ph.D. at IRISA (ENSSAT Lannion), France
    Study of Unit Selection algorithms for Text-To-Speech Synthesis.
  • 2012 (6 months) - Research master’s degree internship at IRISA (ENSSAT Lannion)
    Evaluation of Sandwich Units performance in Text-To-Speech Synthesis.
  • 2011 (4 months) - Research in my free time with Laurent Miclet
    Work on analogical proportions and formal concept analysis. Creation of an analogy-finding algorithm in concept lattices.
  • 2009 (3 months) - DUT degree Research internship at ENSSAT Lannion
    Processing written symbols using analogy. Study of natural analogical proportions.
  • 2008-2009 (7 months) - Student project IUT Lannion
    Developing a rendering engine for a 2D game.

Skills

  • Research: Recently: Text-To-Speech Synthesis, Natural Language Processing, Voice Activity Detection, Query Endpoint Detection, Audio Analysis. Older experience: Analogy Relations, Concept Lattices.
  • Languages (computing): Recently: C/C++, Python, Bash, LaTeX. Older experience: Java, Perl, PHP, HTML/CSS/JavaScript, Prolog, SQL.
  • OS: UNIX/Linux, macOS, Windows.
  • Analysis techniques: UML, MERISE.
  • Management:
    - Head of staff for organizing the 2011 Festival des Langues (festival on world languages) in Lannion, France.
    - Member of staff organizing Jap’And Trégor 2011 (cultural event on Japanese culture).

Spoken Languages

  • French: Native Speaker.
  • English: Cambridge CAE (Certificate of Advanced English) - European level C1. 3 years living in the Republic of Ireland.
  • German: European level B1 (Estimated).
  • Italian: Basics.

Education

  • 2012-2016 - Ph.D. at ENSSAT - Doctoral school Matisse - University of Rennes 1.
  • 2011-2012 - Research master's degree in computer science at ISTIC, University of Rennes 1
  • 2009-2012 - Diplôme d’ingénieur in computer science (Master’s degree + management skills) at ENSSAT, Lannion.
  • 2007-2009 - DUT in computer science (Foundation Degree) IUT of Lannion (Institute of Technology).
  • 2007 - Baccalauréat S with honors (equivalent to A Levels).

Research & Industry

My work at ViaDialog (2020 - 2022)

At ViaDialog, my main task was the development and maintenance of a TTS capability. I served as lead for all TTS-related work. This included developing a working prototype capable of performing fast speech synthesis with a level of quality in line with 2021 standards. The system, based on a neural end-to-end architecture, required voice data in order to be trained. Hence, I worked on the recording and post-production of the data. Finally, I led the transformation of the prototype into a usable product and worked on its maintenance and on extending its capabilities, such as adding SSML support, security features and much more.

Here are a few key points:

  • Developed a new TTS product: Built a TTS prototype and pursued productization of the tool, performance enhancement, packaging and added security features.
  • Recording of TTS voices: creation of the recording script, coaching of voice talents, quality checks and review of advancement.
  • Voices post-production, model training and enhancement: Creation and continuous enhancement of the TTS voices with internal evaluation and feedback from users.
  • Interactions with clients: Presentations, technical discussions with our clients and talks at tech events.
  • Management of interns and annotators: Managed several master's-level interns who helped with the TTS prototype as well as annotators performing TTS-related tasks.

My work at Voysis (2017 - 2020)

At Voysis, I focused on a wide variety of tasks, across a large part of our stack. I worked on ASR, Voice Activity Detection (and query endpoint detection), wakeword detection, NLP, Audio Analysis and Processing and TTS. I ran several data collections (audio and text utterances) and several evaluation campaigns.

A few key points:

  • Developed tooling for core pipeline: Built a user simulator to generate an initial corpus to bootstrap training of a new token tagger, intent classifier, named entity recognizer and language model.
  • Worked on assessing audio quality of queries: Analyzed queries manually (waveform, spectrogram...) and designed tools to simulate and automatically detect common types of issues.
  • Established evaluation strategies and wrote tooling: Done for TTS, VAD (Voice Activity Detection), Query Endpointing, Wakeword Detection, ASR, etc.
  • Performed data collections, cleaning: Created new Text-To-Speech (TTS) voice, training, evaluation and complementary corpora.
  • Gained experience as a cross-functional team member: Experience on core projects and most of our stack. Moved between teams: teams were recomposed based on needs and skill sets.

My PhD

My office at ENSSAT (2016)

Image: My office at ENSSAT (10/2015).

I am working in the automatic speech synthesis field. The subject of my PhD has been the "Study of Unit Selection Text-To-Speech Synthesis Algorithms".

Two main strategy types are currently under consideration in this field. The first one relies on a statistical parametric approach where one tries to model speech signals. Models are then used in a generative way to produce speech utterances. It is called the Statistical Parametric Speech Synthesis approach (often shortened as "SPSS", though this conflicts with the name of IBM's statistics software). The second approach, which is an evolution of concatenation-based synthesis, consists in preserving and annotating a large speech corpus (usually several hours or even tens of hours), then extracting fragments (called units) and pasting them together to reproduce the utterance to be synthesized (the target utterance). The (non-trivial) mechanism by which these fragments are selected is referred to as Unit Selection. The general technique is called Corpus-Based Speech Synthesis.

The aim of my thesis was to explore and diagnose the Unit Selection mechanism and to suggest improvements. To meet these objectives, a corpus-based speech synthesis system was needed. For reasons of independence and flexibility, and to keep full control over the application, it was decided to build a completely new system rather than using and modifying an existing tool. I therefore spent a considerable amount of time during my thesis contributing to the implementation of the resulting engine within the team and adding features to it.

On the pure research side, I first took an interest in evaluating the impact of the search algorithm on Unit Selection. In particular, the question was to identify whether or not optimality of the solution (i.e. the sequence of corpus units to be concatenated) was important and, if not, which search strategy was the best. My conclusion was that the search algorithm noticeably impacts the selection process only when searching for the optimal (or near-optimal) solution. Optimality of the solution is not necessary, however. Even a heavily pruned unit selection can be used with few noticeable flaws.
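To make the trade-off between optimal and pruned search concrete, here is a minimal sketch of a Viterbi-style unit selection with an optional beam. It is only an illustration under simplifying assumptions, not the engine built during the thesis: the function `select_units`, the `(name, f0)` unit representation and both toy cost functions are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

ToyUnit = Tuple[str, int]  # hypothetical unit: (name, pitch in Hz)


def select_units(
    candidates: Sequence[Sequence[ToyUnit]],
    target_cost: Callable[[int, ToyUnit], float],
    concat_cost: Callable[[ToyUnit, ToyUnit], float],
    beam_width: Optional[int] = None,
) -> Tuple[List[ToyUnit], float]:
    """Return the cheapest unit sequence found and its total cost.

    With beam_width=None every hypothesis is kept (exact dynamic
    programming); otherwise only the beam_width cheapest hypotheses
    survive at each position (pruned, possibly sub-optimal search).
    """
    # One hypothesis per candidate unit: (accumulated cost, path so far).
    beam = [(target_cost(0, u), [u]) for u in candidates[0]]
    for t in range(1, len(candidates)):
        if beam_width is not None:
            beam = sorted(beam, key=lambda h: h[0])[:beam_width]
        new_beam = []
        for u in candidates[t]:
            # Cheapest way to reach u from any surviving hypothesis.
            cost, path = min(
                ((c + concat_cost(p[-1], u), p) for c, p in beam),
                key=lambda h: h[0],
            )
            new_beam.append((cost + target_cost(t, u), path + [u]))
        beam = new_beam
    best_cost, best_path = min(beam, key=lambda h: h[0])
    return best_path, best_cost


# Toy data: two positions, two candidate units each.
targets_f0 = [100, 120]
toy_candidates = [[("a1", 100), ("a2", 112)], [("b1", 120), ("b2", 90)]]


def toy_target_cost(t: int, u: ToyUnit) -> float:
    return abs(u[1] - targets_f0[t])  # distance to the desired pitch


def toy_concat_cost(prev: ToyUnit, u: ToyUnit) -> float:
    return 2 * abs(prev[1] - u[1])  # pitch jump at the join


full_path, full_cost = select_units(toy_candidates, toy_target_cost, toy_concat_cost)
pruned_path, pruned_cost = select_units(
    toy_candidates, toy_target_cost, toy_concat_cost, beam_width=1
)
print([name for name, _ in full_path], full_cost)      # ['a2', 'b1'] 28
print([name for name, _ in pruned_path], pruned_cost)  # ['a1', 'b1'] 40
```

In this toy case the beam of width 1 discards the unit that the optimal path needs and settles for a costlier sequence. Whether such a gap is audible is an empirical question that costs alone cannot answer.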

I also took an interest in the formulation of the cost function that allows the search algorithm to evaluate corpus units. An adapted preselection filtering method (i.e. not considering the units deemed least useful) and Unit Selection influenced by a Vocalic Sandwich criterion (trying to avoid concatenations at points where they can degrade the signal) were tested. New target costs (judging the dissimilarity between a unit and its desired characteristics) and concatenation costs (judging the capacity of a unit to be pasted to the previous one without generating problems) have been implemented and tested. I am currently continuing my work in that direction, paying particular attention to the Atom-based intonation decomposition technique, which I discovered during my stay at IDIAP, and its possible applications to Target Costs.
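The following sketch shows what a target cost, a concatenation cost and a preselection filter can look like in their simplest form. Everything in it is an assumption made for illustration: the `Unit` and `Target` fields, the weights and the pitch-based join cost are hypothetical and are not the costs developed in the thesis, and the Vocalic Sandwich criterion is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    """A corpus fragment (hypothetical feature set)."""
    phone: str
    f0_start: float     # pitch in Hz at the left edge
    f0_end: float       # pitch in Hz at the right edge
    duration_ms: float


@dataclass(frozen=True)
class Target:
    """Desired characteristics for one position of the utterance."""
    phone: str
    f0: float
    duration_ms: float


def target_cost(target: Target, unit: Unit,
                w_f0: float = 1.0, w_dur: float = 0.5) -> float:
    """Weighted dissimilarity between a unit and its desired characteristics."""
    mean_f0 = (unit.f0_start + unit.f0_end) / 2
    return (w_f0 * abs(mean_f0 - target.f0)
            + w_dur * abs(unit.duration_ms - target.duration_ms))


def concat_cost(left: Unit, right: Unit) -> float:
    """Pitch discontinuity at the join between two consecutive units."""
    return abs(left.f0_end - right.f0_start)


def preselect(target: Target, units: List[Unit], keep: int) -> List[Unit]:
    """Keep only the `keep` units with the lowest target cost for this phone."""
    matching = [u for u in units if u.phone == target.phone]
    return sorted(matching, key=lambda u: target_cost(target, u))[:keep]


# Toy corpus for a single target position.
target = Target(phone="a", f0=120.0, duration_ms=100.0)
u1 = Unit("a", 118.0, 122.0, 100.0)   # matches pitch and duration
u2 = Unit("a", 100.0, 110.0, 100.0)   # pitch too low
u3 = Unit("a", 119.0, 121.0, 200.0)   # twice too long
u4 = Unit("e", 120.0, 120.0, 100.0)   # wrong phone, filtered out
shortlist = preselect(target, [u3, u4, u2, u1], keep=2)
print([target_cost(target, u) for u in shortlist])  # [0.0, 15.0]
print(concat_cost(u1, u2))                          # 22.0
```

Preselection shrinks the candidate lists before the search runs, so the search has less to explore. The price is that a unit with a mediocre target cost but a very smooth join may be discarded too early.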

Prior to working in the field of speech synthesis, I worked briefly on analogy relations, first through an internship and then in my free time. In particular, I developed a search algorithm for analogical proportions in concept lattices, which led to a publication. This work was a collaboration with Laurent Miclet and Henri Prade.

Keywords (thesis related): Unit Selection ; Corpus-Based Speech Synthesis ; Text-To-Speech Synthesis ; Cost Function ; Concatenation Cost ; Target Cost ; Neural Networks ; Deep Learning.

Publications

PhD Thesis

International conferences with a reading committee

International Journals with a reading committee in French

International Conferences with a reading committee in French

Recent Reviews for International Conferences and Journals

  • Interspeech 2018, 2019, 2020, 2021, 2022, 2023.
  • ICASSP 2021, 2022, 2023 (Outstanding Reviewer), 2024.
  • Speech Prosody 2018, 2020, 2022.
  • Machine Translation Journal.

Teaching

ENSSAT engineering school in Lannion, Brittany, France

Image: My teaching/research place, ENSSAT engineering school in Lannion, Brittany, France

Year 2013-2014: 60h

Courses I gave as part of a "Mission d'enseignement" (teaching mission) during my second year as a PhD student at ENSSAT engineering school. 

  • Artificial Intelligence (Master's degree level): 8h + 12h Project
  • UNIX/Linux programming (Master's degree level): 20h
  • Distributed programming (Master's degree level): 14h Project
  • Introduction to Linux (Bachelor's degree level): 6h Labs

Year 2014-2015: 70h

Courses I gave as part of a "Mission d'enseignement" during my third year as a PhD student at ENSSAT engineering school.

  • Several courses in Algorithmics, C language and UNIX/Linux programming (Bachelor's degree level): 8h + 34h Labs + 14h Project
  • Web programming (Bachelor's degree level): 4h + 10h Project

Year 2015-2016: 250h

Courses I gave as part of my ATER (Attaché Temporaire d'Enseignement et de Recherche) position at ENSSAT.

  • Algorithmics, C language (Bachelor's degree level): 16h + 42h Labs/Project
  • Object Oriented programming (Master's degree level): 22h Labs + 8h Project
  • Advanced Algorithmics and Complexity (Master's degree level): 14h + 16h Project
  • Data structures (Bachelor's degree level): 20h Labs
  • Web Programming (Introduction+Client side) (Bachelor's degree level): Head of course: 6h Lectures + 12h Labs + 16h Project. Here are the lecture slides: Course.
  • Web Technologies (Bachelor's degree level): Head of course: 8h Lectures + 18h Project
  • Specification, Validation, Test, Profiling and Debug (Master's degree level): 20h Labs
  • Distributed Computing (Master's degree level): 2h Lectures + 16h Project. Here is the project subject: Subject.
  • UNIX/Linux programming (Bachelor's degree level): 10h Labs + 4h Project

Year 2022-2023: 257h (Anticipated)

Courses I am in the process of giving at ENSSAT.

  • Algorithmics, C language (Bachelor's degree level): 16h + 42h Labs/Project
  • UNIX/Linux programming (Bachelor's degree level): 10h Labs + 4h Project
  • Data structures (Bachelor's degree level): 20h Labs

Elsewhere on the web:

LinkedIn: Profile
ORCID: 0009-0006-3265-6321