Example imitated utterances for each approach
Description
Each row shows a demonstration waveform followed by the outputs of three imitation approaches given the same text. Demonstration waveforms are 5 minutes long (for LibriSpeech) or 10 minutes long (for VCTK).
Sentence spoken
Here are some pages for who sells Toms shoes.
Real demonstration utterance | Both adapt speaker embedding and finetune network | Only adapt speaker embedding | Encoder network
LibriSpeech 2300
LibriSpeech 3575
LibriSpeech 7729
VCTK Speaker p301
VCTK Speaker p318
VCTK Speaker p360
Sentence spoken
Modern birds are classified as coelurosaurs by nearly all palaeontologists.
Real demonstration utterance | Both adapt speaker embedding and finetune network | Only adapt speaker embedding | Encoder network
LibriSpeech 2300
LibriSpeech 3575
LibriSpeech 7729
VCTK Speaker p301
VCTK Speaker p318
VCTK Speaker p360
Sentence spoken
There were many editions of these works still being used in the 19th century.
Real demonstration utterance | Both adapt speaker embedding and finetune network | Only adapt speaker embedding | Encoder network
LibriSpeech 2300
LibriSpeech 3575
LibriSpeech 7729
VCTK Speaker p301
VCTK Speaker p318
VCTK Speaker p360
Sentence spoken
The town is further intersected by numerous small canals with tree-bordered quays.
Real demonstration utterance | Both adapt speaker embedding and finetune network | Only adapt speaker embedding | Encoder network
LibriSpeech 2300
LibriSpeech 3575
LibriSpeech 7729
VCTK Speaker p301
VCTK Speaker p318
VCTK Speaker p360
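The three approaches compared in the tables above differ in which parameters the demonstration data is allowed to change. The following is a toy sketch of that difference only, not the model used to produce these samples: the "network" is a single scalar weight `w`, the "speaker embedding" is a single scalar `e`, and the demonstration data, the pretrained values, and the `encode` function (a closed-form stand-in for a trained encoder network) are all made up for illustration.

```python
# Toy illustration only. Prediction is w * x + e, where w plays the role of
# the shared network weights and e plays the role of the speaker embedding.

def mse(w, e, data):
    """Mean squared error of the toy model on (x, y) pairs."""
    return sum((w * x + e - y) ** 2 for x, y in data) / len(data)

def adapt(w, e, data, finetune_network, steps=2000, lr=0.05):
    """Gradient descent on the demonstration data.

    The embedding e is always updated; the weight w is updated only when
    finetune_network is True.
    """
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + e - y) * x for x, y in data) / n
        grad_e = sum(2 * (w * x + e - y) for x, y in data) / n
        if finetune_network:
            w -= lr * grad_w
        e -= lr * grad_e
    return w, e

def encode(w, data):
    """Hypothetical stand-in for an encoder: one pass over the demonstration
    data predicts the embedding, with no gradient steps at adaptation time."""
    return sum(y - w * x for x, y in data) / len(data)

# Made-up demonstration data from a "new speaker" that the pretrained
# weight w0 does not fit exactly.
demo = [(k / 10, 1.2 * (k / 10) + 0.5) for k in range(-10, 11)]
w0, e0 = 1.0, 0.0  # made-up pretrained values

w_all, e_all = adapt(w0, e0, demo, finetune_network=True)   # adapt embedding and finetune network
w_emb, e_emb = adapt(w0, e0, demo, finetune_network=False)  # only adapt embedding
e_enc = encode(w0, demo)                                    # encoder predicts embedding

loss_all = mse(w_all, e_all, demo)
loss_emb = mse(w_emb, e_emb, demo)
loss_enc = mse(w0, e_enc, demo)
```

In this toy setting the embedding-only and encoder variants land on the same embedding because the stand-in encoder is exact; that is an artifact of the example and says nothing about how the real approaches compare in audio quality.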
Effect of demo waveform length on quality
Description
Comparison of different lengths of demonstration waveform. All samples use the first approach above, which both adapts the speaker embedding and finetunes the network.
Sentence spoken
Here are some pages for who sells Toms shoes.
Real demonstration utterance | Using 10 seconds of demo waveform | Using 1 minute of demo waveform | Using 5 minutes (LibriSpeech) / 10 minutes (VCTK) of demo waveform
LibriSpeech 2300
LibriSpeech 3575
LibriSpeech 7729
VCTK Speaker p301
VCTK Speaker p318
VCTK Speaker p360