Vocal-Tract Simulation
Time-domain acoustic simulation of the vocal tract with side branches
By mathematically reformulating Maeda's famous 1D time-domain model of vocal-tract acoustics, we extended it to allow any number of internal side branches, such as the bilateral piriform fossae, thus enabling more natural-sounding articulatory synthesis of speech.
BACKGROUND


Algorithms for computer simulation of vocal-tract acoustics can be categorized broadly as either frequency-domain or time-domain. One of the best-known and most widely used time-domain methods is the one elaborated by Shinji Maeda (Speech Communication, 1982), which in turn was derived from the earlier work of James Flanagan and colleagues (Bell System Tech. J., 1975). Time-domain acoustic modeling of the vocal tract has the advantage of handling transient sounds such as consonants more naturally, and of accounting more directly for acoustic interactions between vocal-fold and vocal-tract airflows.
However, time-domain models have been limited to the vocal tract's main airway (from glottis to lips) plus at most one side branch (usually the nasal tract). Meanwhile, increasing evidence points to the acoustic importance of vocal-tract side branches such as the piriform fossae (which extend bilaterally from the main airway in the hypopharynx): such detailed geometries are increasingly believed to lend naturalness and individuality to a person's voice. In this project, we therefore aimed to overcome the single-side-branch limitation by examining Maeda's time-domain model in detail and re-deriving its mathematical formulation to allow any number of side branches.

SINGLE-MATRIX FORMULATION


The basis of the established time-domain vocal-tract model is the discretization in space and time of the three partial differential equations that describe acoustic wave propagation in a tube with non-rigid walls (the equations of momentum, continuity, and wall vibration). This leads to an equivalent representation of the vocal tract as an electrical lumped transmission line (as shown above), with each T-section representing the length, cross-sectional area, and perimeter of its physical counterpart.
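As a minimal illustration of the lumped elements involved, the following sketch computes a T-section's acoustic inductance (air inertance) and compliance (air compressibility) from its length and cross-sectional area, using standard textbook formulas. The constants and function names are our own, not code from the model itself; in the full model, wall vibration additionally contributes a shunt branch per section, which is omitted here.

```python
import numpy as np

RHO = 1.2      # air density (kg/m^3), assumed value
C = 350.0      # speed of sound in warm moist air (m/s), assumed value

def t_section(length, area):
    """Lumped elements of one T-section of the transmission line:
    the acoustic inductance (inertance of the air plug, kg/m^4)
    and the acoustic compliance (compressibility of the enclosed
    air volume, m^3/Pa) of a short uniform tube."""
    L_a = RHO * length / area              # inertance: L = rho*l/A
    C_a = length * area / (RHO * C**2)     # compliance: C = l*A/(rho*c^2)
    return L_a, C_a
```

Narrow sections (small area) thus carry large inertance, and wide sections large compliance, which is what shapes the tube's resonances.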
Maeda (1982) used the trapezoid rule for time discretization and rearranged the transmission-line equations into three sets of linear algebraic equations, ultimately expressed as three matrix equations involving mainly volume velocities: one set for the sections from the glottis up to the nasal-tract junction, a second for the sections from that junction to the lips, and a third for the sections from the junction to the nostrils. Thus in the original formulation, the simulator was hard-wired to deal with only one side branch: the nasal tract, attached to the main tract at a particular junction corresponding physically to the velar opening.
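To illustrate the time discretization (in our own notation, not Maeda's), applying the trapezoid rule to one inductor branch equation, L_a du/dt = Δp, averages the pressure drop across the step. Because the current-step pressure drop is itself unknown, collecting such updates over all sections is what yields the sets of simultaneous linear algebraic equations.

```python
def trapezoid_update(u_prev, dp_prev, dp_curr, L_a, dt):
    """One trapezoid-rule step for L_a * du/dt = dp: the volume
    velocity advances by the time-averaged pressure drop, using
    both the previous and the (implicit) current step values."""
    return u_prev + (dt / (2.0 * L_a)) * (dp_prev + dp_curr)
```

In the simulator, dp_curr depends on neighboring unknowns at the current step, so these updates are not evaluated one by one but solved jointly as a matrix equation at each sample.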
As detailed in Mokhtari (2006) and in Mokhtari et al. (2008), we generalized the algorithm to any desired number of side branches by reformulating the sets of equations as a single matrix equation. As shown above, our specific example retained the nasal tract while adding two more side branches: the left and right piriform fossae. A comparison of vocal-tract transfer functions calculated by the new model for five vowels revealed a good match with transfer functions calculated by an equivalent frequency-domain model.
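The structure of such a single matrix can be sketched as follows: a tridiagonal core for the concatenated sections (main tract followed by each side branch), plus junction coupling terms that fall outside the tridiagonal band. This is only a schematic with placeholder coefficients, not the actual coefficient values of the published formulation.

```python
import numpy as np

def assemble_system(diag, off, branches):
    """Schematic single-matrix assembly.  `diag` and `off` give the
    tridiagonal part for all sections stacked in one vector of
    unknowns; `branches` lists (junction, branch_start, coupling)
    triples whose entries land off the tridiagonal band."""
    A = np.diag(diag).astype(float)
    A += np.diag(off, 1) + np.diag(off, -1)
    for junction, branch_start, coupling in branches:
        # couple the branch's first section to its junction section
        A[junction, branch_start] = coupling
        A[branch_start, junction] = coupling
        # the branch block does not physically adjoin the block
        # stacked before it, so remove the spurious tridiagonal link
        A[branch_start, branch_start - 1] = 0.0
        A[branch_start - 1, branch_start] = 0.0
    return A
```

At each time step, a right-hand side collecting source terms and trapezoid-rule history is formed and a single linear solve yields all volume velocities simultaneously, in place of the original three separate solves.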
Our single-matrix formulation of the time-domain model thus proved to be accurate in terms of the expected spectral peaks (poles) and dips (zeros) of steady-state vocalic configurations, while also promising all the transient- and dynamics-related advantages of time-domain models in general. Indeed, based on this work we also proposed a time-domain speech synthesis system based on MRI-measured vocal-tract geometries (Kitamura et al., 2006).
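For reference, frequency-domain transfer functions of the kind used in such comparisons can be sketched with chain (ABCD) matrices of concatenated uniform tubes. The sketch below is lossless and terminates in an ideal open end (zero load impedance), so it is only an idealized check, not the frequency-domain model actually used in the papers.

```python
import numpy as np

def chain_transfer(lengths, areas, freqs, rho=1.2, c=350.0):
    """U_out/U_in of a chain of lossless uniform tube sections with an
    ideal open termination (P_load = 0), via ABCD matrices relating
    (P_in, U_in) to (P_out, U_out)."""
    H = np.empty(len(freqs), dtype=complex)
    for i, f in enumerate(freqs):
        k = 2.0 * np.pi * f / c                       # wavenumber
        M = np.eye(2, dtype=complex)
        for l, a in zip(lengths, areas):
            Z0 = rho * c / a                          # characteristic impedance
            M = M @ np.array([[np.cos(k * l), 1j * Z0 * np.sin(k * l)],
                              [1j * np.sin(k * l) / Z0, np.cos(k * l)]])
        H[i] = 1.0 / M[1, 1]    # with P_load = 0:  U_in = D * U_out
    return H
```

For a single 17.5 cm tube this places the first resonance near the quarter-wave frequency c/(4L) ≈ 500 Hz, a standard sanity check before comparing multi-section, multi-branch configurations.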

IMPACT


It gives us great pleasure to know that the single-matrix formulation has sparked renewed interest and prompted further developments:
- Ho et al. (2011) extended the computational model to include the branching structure of the tracheobronchial airways below the glottis, and also coupled the model with the classical, self-oscillating two-mass vocal-fold model.
- Elie & Laprie (2016) extended the computational model to include the possibility of bilateral channels within the main vocal tract (as occur during production of lateral consonants), and also connected a self-oscillating vocal-fold model with a glottal chink.

REFERENCES (chronological)


J. L. Flanagan, K. Ishizaka & K. L. Shipley (1975)
Synthesis of speech from a dynamic model of the vocal cords and vocal tract
Bell System Technical Journal, 54(3), 485-506.
S. Maeda (1982)
A digital simulation method of the vocal-tract system
Speech Communication, 1(3-4), 199-229.
T. Kitamura, H. Takemoto, P. Mokhtari & T. Hirai (2006)
An MRI-based time-domain speech synthesis system
poster presented at the 4th Joint Meeting of the Acoust. Soc. of America and the Acoust. Soc. of Japan, Hawaii, USA
abstract 1pSC5 in J. Acoust. Soc. Am., 120(5) Pt.2, 3037.
P. Mokhtari (2006)
Direct computational method of including piriform fossae and nasal cavity in a time-domain acoustic model of the vocal tract
poster presented at the 4th Joint Meeting of the Acoust. Soc. of America and the Acoust. Soc. of Japan, Hawaii, USA
abstract 5pSC6 in J. Acoust. Soc. Am., 120(5) Pt.2, 3372.
P. Mokhtari, H. Takemoto & T. Kitamura (2008)
Single-matrix formulation of a time domain acoustic model of the vocal tract with side branches
Speech Communication, 50(3), 179-190.
J. C. Ho, M. Zañartu & G. R. Wodicka (2011)
An anatomically-based, time-domain acoustic model of the subglottal system for speech production
J. Acoust. Soc. Am., 129(3), 1531-1547.
B. Elie & Y. Laprie (2016)
Extension of the single-matrix formulation of the vocal tract: Consideration of bilateral channels and connection of self-oscillating models of the vocal folds with a glottal chink
Speech Communication, 82, 85-96.

