TPU Voice Quality database v1
A recorded database of laryngeal voice qualities
A freely downloadable database of speech in 9 laryngeal voice qualities, recorded by 16 naive (i.e., non-professional) Japanese talkers.
BACKGROUND
Voice quality is an important part of human vocal communication, adding expressivity to both verbal and non-verbal interactions. However, it remains a challenge for human-machine interface technologies.
John Laver (1980) presented a systematic framework for describing voice quality, based on auditory impressions and physiological considerations. However, as far as we know, no multi-talker recording of these voice qualities is widely available. To help accelerate research on voice quality and its many applications, we therefore made such a recording, focusing on a chosen subset of laryngeal voice qualities as attempted by naive (non-professional) native Japanese talkers.
Version 1 of the database includes the voices of 16 young Japanese adults (10 males and 6 females, aged 19-22), each reading the same material in 9 laryngeal voice qualities. These voice qualities are nominal, in the sense that each talker tried to produce the intended voice quality after a live demonstration of it. However, as the talkers were not professional voice actors, they had varying degrees of success in producing each voice quality. The voice qualities actually produced were therefore labelled auditorily afterwards, and these labels are also included in the database.
DOWNLOAD
The entire database (Toyama Prefectural University, Voice Quality, ver 1) is here:
TPUVQ_v1.zip [filesize: 1.13 GB]
SPECS
Talkers: 16 naive (non-professional) Japanese, including 10 males and 6 females
Material: /hai/, /a/, /i/, /u/, /e/, /o/, /aiueo/, and a 1-min story "The North Wind and the Sun" (in Japanese, "Kita Kaze to Taiyou")
Voice Qualities: Modal Voice, Whisper, Whispery Voice, Falsetto, Creak, Creaky Voice, Breathy Voice, Tense Voice, Harsh Voice
Equipment: B&K 4190 mic, B&K Nexus amp, RME ADI-2 Pro A/D converter, sound-proof room
Sampling: 44.1 kHz, 24 bits/sample
Post-processing: polarity corrected, 40 Hz high-pass filtered
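The specs above give the sampling format (44.1 kHz, 24 bits/sample) but not the audio file format. Assuming the recordings are uncompressed PCM WAV files (an assumption, not stated here), the minimal sketch below reads one with Python's standard-library wave module, which returns raw bytes, and unpacks each 3-byte little-endian sample into a float in [-1, 1). The helper name read_pcm24 is hypothetical.

```python
import wave


def read_pcm24(path):
    """Read a 24-bit PCM WAV file (hypothetical helper, not part of TPUVQ).

    Returns (sample_rate, n_channels, samples), where samples are floats in
    [-1.0, 1.0) and, for multichannel files, interleaved frame by frame.
    Assumes an uncompressed PCM WAV file; the TPUVQ file format is not
    stated in the specs.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 3:
            raise ValueError("expected 24-bit (3-byte) samples")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())

    full_scale = 2 ** 23  # magnitude of the most negative 24-bit value
    samples = [
        # Each sample is a little-endian, two's-complement 3-byte integer.
        int.from_bytes(raw[i:i + 3], "little", signed=True) / full_scale
        for i in range(0, len(raw), 3)
    ]
    return rate, channels, samples
```

For a file matching the specs, the returned rate should be 44100. Depending on the Python version, wave may reject 24-bit files whose header uses WAVE_FORMAT_EXTENSIBLE; in that case a third-party reader such as soundfile is an alternative.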
TERMS OF USE
The data can be used freely for research purposes, with the following citation:
Parham Mokhtari & Daisuke Morikawa (2022) "Introducing a Japanese multi-talker database of laryngeal voice qualities," in Proceedings of the Spring Meeting of the Acoustical Society of Japan, Paper 2-3Q-10, pp.1165-1166.
Redistribution in any form is prohibited.
If you download the data, we would love to hear from you, whether or not you found it useful.
Please do not hesitate to contact us.
ACKNOWLEDGMENT
This work was partially supported by a grant from the Casio Science Promotion Foundation.