David Guennec's Personal Pages

About me

Who am I?

I hold a Ph.D. in computer science. Currently, I work as a teacher at ENSSAT and as a researcher with the IRISA/EXPRESSION team. The team is located in Lannion and Vannes, Brittany, France. My primary topics of interest are TTS (Text-To-Speech) and ASR (Automatic Speech Recognition). My Ph.D. work focused on the study of unit selection algorithms in Text-To-Speech synthesis. After that, I worked as a Language Technology Scientist at Voysis in Dublin, Ireland. At Voysis, I worked on TTS, ASR, NLP (Natural Language Processing) and DSP (Digital Signal Processing). Then, I came back to IRISA to work on TTS again. I subsequently joined ViaDialog to work on a new TTS product before going back to ENSSAT.

My Background

Image: David Guennec at TSD 2014 in Brno, Czech Republic.

After my Baccalauréat scientifique (French degree) in 2007, I first obtained a DUT (Diplôme Universitaire de Technologie) in computer science in 2009. I then entered ENSSAT (École Nationale Supérieure des Sciences Appliquées et de Technologie), where I obtained a "diplôme d'ingénieur" as well as a master's degree (from the University of Rennes) in 2012. Finally, I joined the IRISA/CORDIAL team (which became the IRISA/EXPRESSION team) for my Ph.D. Between February and July 2015, I worked as an intern at the IDIAP research center in Switzerland. I defended my Ph.D. on the 22nd of September 2016.

Besides research, I taught computer science at ENSSAT between 2013 and 2016. I was mainly (but not only) involved in the following courses: C and Java languages, Web technologies (mostly client side), UNIX/Linux programming, distributed programming and artificial intelligence. I taught a total of 380h of courses during that period.

Resume

My CV in French is available for download.

Experience

Since Sept. 2022 - University Lecturer and Researcher at Université de Rennes 1, ENSSAT, France.
Teaching at ENSSAT Lannion. Working on speech production and audio deepfake detection with IRISA/Expression team.
Dec. 2020-Jul. 2022 - Data Scientist Voice AI at ViaDialog, France
Working on TTS & ASR problems (mainly neural TTS).
2020-2021 (9 months) - Research Engineer at IRISA (ENSSAT Lannion), France
Building a new end-to-end neural Text-To-Speech Synthesis system and training pipeline for the Expression team of IRISA.
2017 - 2020 (3 years) - Language Technology Scientist at Voysis, Dublin, Ireland.
Work on User Simulation, Voice Activity Detection and Query Endpoint Detection, Text-To-Speech synthesis, Wakeword Detection, Audio Analysis & Processing and NLP tasks.
Feb. - Jul. 2015 (5 months) - Research visit at IDIAP research center, Martigny, Switzerland
Analysis and synthesis of prosody for expressive speech synthesis. Work on F0 modelling using Atom-based intonation decomposition.
2013-2016 (3 years) - Teaching at ENSSAT Lannion, France
Two "missions d'enseignement" in 2013-2015 (service limited to 64h a year) and one ATER mission in 2015-2016 (full service).
Taught in several bachelor's and master's courses (UNIX, distributed programming, AI).
2012-2016 (4 years) - Ph.D. at IRISA (ENSSAT Lannion), France
Study of Unit Selection algorithms for Text-To-Speech Synthesis.
2012 (6 months) - Research master’s degree internship at IRISA (ENSSAT Lannion)
Evaluation of the performance of Sandwich Units in Text-To-Speech Synthesis.
2011 (4 months) - Research in my free time with Laurent Miclet
Work on analogical proportions and formal concept analysis. Creation of an analogy-finding algorithm in concept lattices.
2009 (3 months) - DUT degree Research internship at ENSSAT Lannion
Processing written symbols using analogy. Study of natural analogical proportions.
2008-2009 (7 months) - Student project IUT Lannion
Developing a rendering engine for a 2D game.

Skills

Research: Recently: Text-To-Speech Synthesis, Natural Language Processing, Voice Activity Detection, Query Endpoint Detection, Audio Analysis. Older experience: Analogy Relations, Concept Lattices.
Languages (computing): Recently: C/C++, Python, Bash, LaTeX. Older experience: Java, Perl, PHP, HTML/CSS/JavaScript, Prolog, SQL.
OS: UNIX/Linux, macOS, Windows.
Analysis techniques: UML, MERISE.
Management:
- Head of staff for organizing the 2011 Festival des Langues (festival on world languages) in Lannion, France.
- Member of staff organizing Jap’And Trégor 2011 (cultural event on Japanese culture).

Spoken Languages

French: Native Speaker.
English: Cambridge CAE (Certificate of Advanced English) - European level C1. 3 years living in the Republic of Ireland.
German: European level B1 (Estimated).
Italian: Basics.

Education

2012-2016 - Ph.D. at ENSSAT - Doctoral school Matisse - University of Rennes 1.
2011-2012 - Research master’s degree in computer science at ISTIC, University of Rennes 1.
2009-2012 - Diplôme d’ingénieur in computer science (Master’s degree + management skills) at ENSSAT, Lannion.
2007-2009 - DUT in computer science (Foundation Degree) IUT of Lannion (Institute of Technology).
2007 - Baccalauréat S with honors (equivalent to A Levels).

Research & Industry

My work at ViaDialog (2020 - 2022)

At ViaDialog, my main task was the development and maintenance of a TTS capability. I served as lead for all TTS-related work. This included developing a working prototype capable of fast speech synthesis with a level of quality in line with 2021 standards. The system, based on a neural end-to-end architecture, required voice data for training, so I worked on the recording and post-production of that data. Finally, I turned the prototype into a usable product and worked on its maintenance and on extending its capabilities, such as adding SSML support, security features and more.

Here are a few key points:

Developed a new TTS product: Built a TTS prototype and pursued productization of the tool, performance enhancement, packaging and added security features.
Recording of TTS voices: creation of the recording script, coaching of voice talents, quality checks and review of advancement.
Voices post-production, model training and enhancement: Creation and continuous enhancement of the TTS voices with internal evaluation and feedback from users.
Interactions with clients: Presentations, technical discussions with our clients and talks at tech events.
Management of interns and annotators: Managed several master's-level interns who helped with the TTS prototype, as well as annotators performing TTS-related tasks.

My work at Voysis (2017 - 2020)

At Voysis, I worked on a wide variety of tasks across a large part of our stack: ASR, voice activity detection (and query endpoint detection), wakeword detection, NLP, audio analysis and processing, and TTS. I ran several data collections (audio and text utterances) and several evaluation campaigns.

A few key points:

Developed tooling for the core pipeline: Built a user simulator to generate an initial corpus for bootstrapping the training of a new token tagger, intent classifier, named entity recognizer and language model.
Worked on assessing audio quality of queries: Analyzed queries manually (waveform, spectrogram...) and designed tools to simulate and automatically detect common types of issues.
Established evaluation strategies and wrote tooling: Done for TTS, VAD (Voice Activity Detection), Query Endpointing, Wakeword Detection, ASR, etc.
Performed data collection and cleaning: Created a new Text-To-Speech (TTS) voice as well as training, evaluation and complementary corpora.
Gained experience as a cross-functional team member: Experience on core projects and most of our stack. Moved between teams, which were recomposed based on needs and skill sets.

My PhD

Image: My office at ENSSAT (10/2015).

I am working in the automatic speech synthesis field. The subject of my PhD has been the "Study of Unit Selection Text-To-Speech Synthesis Algorithms".

Two main types of strategy are currently under consideration in this field. The first one relies on a statistical parametric approach in which one tries to model speech signals. The models are then used generatively to produce speech utterances. It is called the Statistical Parametric Speech Synthesis approach (often shortened to "SPSS", though this conflicts with the name of IBM's statistics software). The second approach, which is an evolution of concatenation-based synthesis, consists in preserving and annotating a large speech corpus (usually several hours or even tens of hours), then extracting fragments (called units) and pasting them together to reproduce the utterance to be synthesized (the target utterance). The (non-trivial) mechanism by which these fragments are selected is referred to as Unit Selection. The general technique is called Corpus-Based Speech Synthesis.

The goal of my thesis is to explore and diagnose the Unit Selection mechanism and to suggest improvements. To meet these objectives, a corpus-based speech synthesis system was needed. For reasons of independence and flexibility, and to ensure control over the whole application, it was decided to build a completely new system rather than use and modify an existing tool. I therefore spent considerable time during my thesis contributing to the implementation of the resulting engine within the team and adding features to it.

On the pure research side, I first took an interest in evaluating the impact of the search algorithm on Unit Selection. In particular, the question was whether the optimality of the solution (i.e. the sequence of corpus units to be concatenated) matters and, if not, which search strategy is best. My conclusion was that the search algorithm noticeably affects the selection process only when searching for the optimal (or near-optimal) solution. Optimality of the solution is not necessary, however: even a heavily pruned unit selection can be used with few noticeable flaws.

I also took an interest in the formulation of the cost function that the search algorithm uses to evaluate corpus units. Adapted preselection filtering methods (i.e. discarding the units deemed least useful) and Unit Selection guided by a Vocalic Sandwich criterion (which tries to avoid concatenations at points where they can degrade the signal) were tested. New target costs (measuring the dissimilarity between a unit and its desired characteristics) and concatenation costs (measuring how well a unit can be joined to the previous one without generating problems) have been implemented and tested. I am currently continuing my work in that direction, paying particular attention to the Atom-based intonation decomposition technique, which I discovered during my stay at IDIAP, and its possible applications to target costs.
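To make the roles of the two costs and of the search more concrete, here is a minimal Python sketch of a unit selection search. It is only a toy illustration, not the engine built at IRISA: the function name select_units, the representation of a unit as a (start F0, end F0) pair, the two toy cost functions and the beam parameter are all assumptions introduced for this example. Without a beam, the dynamic-programming search returns the sequence with the lowest summed target and concatenation cost; with a beam, hypotheses are pruned at each position and the result may no longer be optimal.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical toy representation: a unit is (F0 at its start, F0 at its end).
Unit = Tuple[float, float]


def select_units(
    targets: Sequence[float],
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[float, Unit], float],
    concat_cost: Callable[[Unit, Unit], float],
    beam: Optional[int] = None,
) -> Tuple[List[Unit], float]:
    """Return (unit sequence, total cost); assumes non-empty inputs."""
    # Each hypothesis is (cumulative cost, units chosen so far).
    hyps = [(target_cost(targets[0], u), [u]) for u in candidates[0]]
    for t in range(1, len(targets)):
        if beam is not None:
            # Pruning: keep only the `beam` cheapest partial sequences.
            hyps = sorted(hyps, key=lambda h: h[0])[:beam]
        new_hyps = []
        for u in candidates[t]:
            # Cheapest way to reach unit u from any surviving hypothesis.
            cost, path = min(
                ((c + concat_cost(p[-1], u), p) for c, p in hyps),
                key=lambda x: x[0],
            )
            new_hyps.append((cost + target_cost(targets[t], u), path + [u]))
        hyps = new_hyps
    best_cost, best_path = min(hyps, key=lambda h: h[0])
    return best_path, best_cost


def target_cost(target: float, unit: Unit) -> float:
    # Toy target cost: distance between desired pitch and the unit's mean pitch.
    return abs(target - (unit[0] + unit[1]) / 2)


def concat_cost(prev: Unit, unit: Unit) -> float:
    # Toy concatenation cost: pitch jump at the join between two units.
    return abs(prev[1] - unit[0])


# Desired mean pitch for three consecutive positions of the target utterance.
targets = [100, 120, 110]
# Candidate corpus units available for each position.
candidates = [
    [(100, 105), (90, 130)],
    [(118, 122), (130, 110)],
    [(110, 110), (121, 99)],
]

print(select_units(targets, candidates, target_cost, concat_cost))
print(select_units(targets, candidates, target_cost, concat_cost, beam=1))
```

On this toy data the full search picks a first unit with a worse target cost because it joins smoothly with what follows (total cost 10.0), whereas the search pruned to a single hypothesis commits to the locally best first unit and ends with a higher total cost (16.5). The numbers only compare cost values; they say nothing about perceived quality.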

Prior to working in the field of speech synthesis, I worked briefly on analogy relations, first through an internship and then in my free time. In particular, I developed a search algorithm for analogical proportions in concept lattices, which led to a publication. This work was a collaboration with Laurent Miclet and Henri Prade.

Keywords (thesis related): Unit Selection ; Corpus-Based Speech Synthesis ; Text-To-Speech Synthesis ; Cost Function ; Concatenation Cost ; Target Cost ; Neural Networks ; Deep Learning.

Publications

PhD Thesis

D. Guennec, "Study of Unit Selection Text-To-Speech Synthesis Algorithms". Defended in September 2016 at Université de Rennes 1 (University of Rennes 1), France. Obtained with the "European Label".

An enhanced version of the document is available as a book (ISBN: 978-3-639-56032-9).

International conferences with a reading committee

[1] L. Miclet, H. Prade and D. Guennec, "Looking for analogical proportions in a formal concept analysis setting" in 8th International Conference on Concept Lattices and Their Applications, 2011.
[2] D. Guennec and D. Lolive, "Unit Selection Cost Function Exploration Using an A* based Text-to-Speech System" in 17th International Conference on Text, Speech and Dialogue, 2014.
[3] D. Guennec, J. Chevelu and D. Lolive, "Defining a Global Adaptive Duration Target Cost for Unit Selection Speech Synthesis" in 18th International Conference on Text, Speech and Dialogue, 2015.
[4] J. Chevelu, D. Lolive, S. Le Maguer and D. Guennec, "How to Compare TTS Systems: A New Subjective Evaluation Methodology Focused on Differences" in 16th Interspeech Conference, 2015.
[5] P. Alain, J. Chevelu, D. Guennec, G. Lecorvé and D. Lolive, "The IRISA Text-To-Speech System for the Blizzard Challenge 2015" in Blizzard Challenge workshop, 2015.
[6] E. Delais-Roussarie, D. Lolive, H. Yoo and D. Guennec, "How to improve rhythmic patterns according to literary genre in synthesized speech" in 8th Speech Prosody Conference, 2016.
[7] M. Sečujski, B. Gerazov, T. G. Csapó, V. Delić, P. N. Garner, A. Gjoreski, D. Guennec, Z. Ivanovski, A. Melov, G. Németh, A. Stojković and G. Szaszák, "Design of a Speech Corpus for Research on Cross-Lingual Prosody Transfer" in 18th International Conference on Speech and Computer, 2016.
[8] D. Guennec and D. Lolive, "On the suitability of vocalic sandwiches in a corpus-based TTS engine" in 17th Interspeech Conference, 2016.
[9] P. Alain, J. Chevelu, D. Guennec, G. Lecorvé and D. Lolive, "The IRISA Text-To-Speech System for the Blizzard Challenge 2016" in Blizzard Challenge workshop, 2016.
[10] D. Guennec, H. Hajipoor, G. Lecorvé, P. Lintanf, D. Lolive, A. Perquin, G. Vidal, "BreizhCorpus: A Large Breton Language Speech Corpus and Its Use for Text-to-Speech Synthesis" in Odyssey workshop, 2022.
[11] D. Guennec, L. Wadoux, A. Sini, N. Barbot and D. Lolive, "Voice Cloning : Training Speaker Selection with Limited Multi-Speaker Corpus" in Speech Synthesis Workshop (SSW), 2023.

International Journals with a reading committee in French

[1] A. Sini, L. Wadoux, A. Perquin, G. Vidal, D. Guennec, D. Lolive, P. Alain, N. Barbot, J. Chevelu, A. Delhay, "Regards croisés sur la production automatique de la parole pour des livres audio" in Revue TAL : TAL intermodal, 2022 (published in 04-05/2023).

International Conferences with a reading committee in French

[1] D. Guennec and D. Lolive, "Utilisation d’un algorithme A* pour l’analyse de la sélection d’unités en synthèse de la parole" in 30th Journées d’Études sur la Parole, 2014.
[2] D. Guennec and D. Lolive, "Une pénalité floue fondée phonologiquement pour améliorer la Sélection d’Unité" in 31st Journées d’Études sur la Parole, 2016.
[3] E. Delais-Roussarie, D. Lolive, H. Yoo and D. Guennec, "Patrons Rythmiques et Genres Littéraires en Synthèse de la Parole" in 31st Journées d’Études sur la Parole, 2016.
[4] J. Chevelu, D. Lolive, S. Le Maguer and D. Guennec, "Se concentrer sur les différences : une méthode d’évaluation subjective efficace pour la comparaison de systèmes de synthèse" in 31st Journées d’Études sur la Parole, 2016.

Recent Reviews for International Conferences and Journals

Interspeech 2018, 2019, 2020, 2021, 2022, 2023.
ICASSP 2021, 2022, 2023 (Outstanding Reviewer), 2024.
Speech Prosody 2018, 2020, 2022.
Machine Translation Journal.

Teaching

Image: My teaching/research place, ENSSAT engineering school in Lannion, Brittany, France

Year 2013-2014: 60h

Courses I gave as part of a "Mission d'enseignement" (teaching mission) during my second year as a PhD student at ENSSAT engineering school.

Artificial Intelligence (Master's degree level): 8h + 12h Project
UNIX/Linux programming (Master's degree level): 20h
Distributed programming (Master's degree level): 14h Project
Introduction to Linux (Bachelor's degree level): 6h Labs

Year 2014-2015: 70h

Courses I gave as part of a "Mission d'enseignement" during my third year as a PhD student at ENSSAT engineering school.

Several courses in Algorithmics, C language and UNIX/Linux programming (Bachelor's degree level): 8h + 34h Labs + 14h Project
Web programming (Bachelor's degree level): 4h + 10h Project

Year 2015-2016: 250h

Courses I gave as part of my ATER (Attaché Temporaire d'Enseignement et de Recherche) position at ENSSAT.

Algorithmics, C language (Bachelor's degree level): 16h + 42h Labs/Project
Object Oriented programming (Master's degree level): 22h Labs + 8h Project
Advanced Algorithmics and Complexity (Master's degree level): 14h + 16h Project
Data structures (Bachelor's degree level): 20h Labs
Web Programming (Introduction+Client side) (Bachelor's degree level): Head of course: 6h Lectures + 12h Labs + 16h Project. Here are the lecture slides: Course.
Web Technologies (Bachelor's degree level): Head of course: 8h Lectures + 18h Project
Specification, Validation, Test, Profiling and Debug (Master's degree level): 20h Labs
Distributed Computing (Master's degree level): 2h Lectures + 16h Project. Here is the project subject: Subject.
UNIX/Linux programming (Bachelor's degree level): 10h Labs + 4h Project

Year 2022-2023: 257h (Anticipated)

Courses I am in the process of giving at ENSSAT.

Algorithmics, C language (Bachelor's degree level): 16h + 42h Labs/Project
UNIX/Linux programming (Bachelor's degree level): 10h Labs + 4h Project
Data structures (Bachelor's degree level): 20h Labs
