The phonology of Vietnamese features 19 consonant phonemes, with 5 additional consonant phonemes used in the Southern dialect and 4 exclusive to the Northern dialect. Vietnamese also has 14 vowel nuclei and 6 tones that are integral to the interpretation of the language. Older interpretations of Vietnamese tones differentiated between "sharp" and "heavy" entering and departing tones. This article is a technical description of the sound system of the Vietnamese language, including phonetics and phonology. Two main varieties of Vietnamese, Hanoi and Ho Chi Minh City, which differ slightly from each other, are described below.
Initial consonants which exist only in the Northern dialect are in red, while those that exist only in the Southern dialect are in blue.
The table below summarizes these sound correspondences:
|Diaphoneme||Hanoi||Ho Chi Minh City||Example word||Hanoi pronunciation||Ho Chi Minh City pronunciation|
|/v/||/v/||/j/ or /v/||vợ 'wife'||[və˨˩ˀ]||[jə˨˧] or [və˨˧]|
|gia 'to add'|
|/r/||/r/||ra 'to go out'||[ɺa˧]|
|/ʈ/||/ʈ/ or /c/||trẻ 'young'||[ʈɛ˩˥] or [cɛ˩˥]|
|/ʂ/||/ʂ/ or /s/||sinh 'born'||[ʂɪ̈n˧] or [sɪ̈n˧]|
|Close||/i/ ⟨i, y⟩||/ɨ/ ⟨ư⟩||/u/ ⟨u⟩|
|/e/ ⟨ê⟩||/ə/ ⟨ơ⟩
|Centering||/iə̯/ ⟨ia~iê⟩||/ɨə̯/ ⟨ưa~ươ⟩||/uə̯/ ⟨ua~uô⟩|
The IPA chart of vowel nuclei above is based on the sounds in Hanoi Vietnamese; other regions may have slightly different inventories. Vowel nuclei consist of monophthongs (simple vowels) and three centering diphthongs.
|/w/ offglide||/j/ offglide|
|Centering||/iə̯w/ ⟨iêu⟩||/ɨə̯w/ ⟨ươu⟩||/ɨə̯j/ ⟨ươi⟩||/uə̯j/ ⟨uôi⟩|
|Close||/iw/ ⟨iu⟩||/ɨw/ ⟨ưu⟩||/ɨj/ ⟨ưi⟩||/uj/ ⟨ui⟩|
|/ɛw/ ⟨eo⟩||/aw/ ⟨ao⟩
Thompson (1965) says that in Hanoi, words spelled with ưu and ươu are pronounced /iw, iəw/, respectively, whereas other dialects in the Tonkin delta pronounce them as /ɨw/ and /ɨəw/. This observation is also made by Phạm (2008) and Kirby (2011).
When stops /p, t, k/ occur at the end of words, they have no audible release ([p̚, t̚, k̚]):
|đọc||'to read'||/ɗɔk/||→||[ɗăwk͡p̚], [ɗăwk̚ʷ]|
The pronunciation of syllable-final ch and nh in Hanoi Vietnamese has been analyzed in different ways. One analysis, that of Thompson (1965), treats them as the phonemes /c, ɲ/, where /c/ contrasts with both syllable-final t /t/ and c /k/, and /ɲ/ contrasts with syllable-final n /n/ and ng /ŋ/. Final /c, ɲ/ are then identified with syllable-initial /c, ɲ/.
Another analysis has final ⟨ch⟩ and ⟨nh⟩ as representing different spellings of the velar phonemes /k/ and /ŋ/ that occur after upper front vowels /i/ (orthographic ⟨i⟩) and /e/ (orthographic ⟨ê⟩). This analysis interprets orthographic ⟨ach⟩ and ⟨anh⟩ as an underlying /ɛ/, which becomes phonetically open and diphthongized: /ɛk/ → [ăjk̟̚], /ɛŋ/ → [ăjŋ̟]. This diphthongization also affects ⟨êch⟩ and ⟨ênh⟩: /ek/ → [ə̆jk̟̚], /eŋ/ → [ə̆jŋ̟].
Arguments for the second analysis include the limited distribution of final [c] and [ɲ], the gap in the distribution of [k] and [ŋ] which do not occur after [i] and [e], the pronunciation of ⟨ach⟩ and ⟨anh⟩ as [ɛc] and [ɛɲ] in certain conservative central dialects, and the patterning of [k]~[c] and [ŋ]~[ɲ] in certain reduplicated words. Additionally, final [c] is not articulated as far forward as the initial [c]: [c] and [ɲ] are pre-velar [k̟, ŋ̟] with no alveolar contact.
The first analysis closely follows the surface pronunciation of a slightly different Hanoi dialect than the second. In this dialect, the /a/ in /ac/ and /aɲ/ is not diphthongized but is actually articulated more forward, approaching a front vowel [æ]. This results in a three-way contrast between the rimes ăn [æ̈n] vs. anh [æ̈ɲ] vs. ăng [æ̈ŋ]. For this reason, a separate phonemic /ɲ/ is posited.
|ong, oc||/awŋ/, /awk/||→||[ăwŋ͡m], [ăwk͡p̚]|
|ông, ôc||/əwŋ/, /əwk/||→||[ə̆wŋ͡m], [ə̆wk͡p̚]|
|ung, uc||/uŋ/, /uk/||→||[ʊŋ͡m], [ʊk͡p̚]|
|ưng, ưc, ưn, ưt||/ɨŋ/, /ɨk/, /ɨn/, /ɨt/||→||[ɯ̽ŋ], [ɯ̽k̟̚], [ɯ̽n], [ɯ̽t̚]|
|anh, ach||/ɛŋ/, /ɛk/||→||[ăjŋ̟], [ăjk̟̚]|
|ênh, êch||/eŋ/, /ek/||→||[ə̆jŋ̟], [ə̆jk̟̚]|
|inh, ich||/iŋ/, /ik/||→||[ɪŋ̟], [ɪk̟̚]|
With the above phonemic analyses, the following is a table of rimes ending in /n, t, ŋ, k/ in the Hanoi dialect:
While the variety of Vietnamese spoken in Hanoi has retained finals faithfully from Middle Vietnamese, the variety spoken in Ho Chi Minh City has drastically changed its finals. Rimes ending in /k, ŋ/ have merged with those ending in /t, n/, respectively. The merged finals are always pronounced /t, n/ after the short front vowels /i, e, a/ (for /a/, only before "nh") and always pronounced /k, ŋ/ after the other vowels /u, o, ɔ, iː, ɨ, aw, a, aː, ɛ, ə, əː/. After the rounded vowels /aw, u, o/, many speakers close their lips, i.e. they pronounce /k, ŋ/ as [k͡p, ŋ͡m]. Subsequently, vowels of rimes ending in labiovelars have been diphthongized, while vowels of rimes ending in alveolars have been centralized. In addition, some Southern speakers distinguish /k, ŋ/ from /t, n/ after /u, o, ɔ, iː, ɨ, aw, a, aː, ɛ, ə, əː/ in formal speech, but no Southern speakers pronounce syllable-final "ch" and "nh" as /k, ŋ/.
The short back vowels in these rimes have been diphthongized and centralized, while the following consonants have been labialized. Similarly, the short front vowels have been centralized and are realized as the central vowels /ă, ə, ɨ/, and the "unspecified" consonants have undergone coronal spreading from the preceding front vowels, surfacing as the coronals (alveolars) /n, t/.
|ung, uc||/uŋ/, /uk/||→||[ʊwŋ͡m], [ʊwk͡p̚]|
|ông, ôc||/oŋ/, /ok/||→||[ăwŋ͡m], [ăwk͡p̚]|
|ong, oc||/ɔŋ/, /ɔk/||→|
|anh, ach||/an/, /at/||→||[ăn], [ăt̚]|
|ênh, êch||/en/, /et/||→||[ɤn], [ɤt̚]|
|in ~ inh, it ~ ich||/in/, /it/||→||[ɪ̈n], [ɪ̈t̚]|
|um, up||/um/, /up/||→||[ʊm], [ʊp̚]|
|ưng ~ ưn, ưc ~ ưt||/ɨŋ/, /ɨk/||→||[ɯ̽ŋ], [ɯ̽k̟̚]|
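The mergers in the table above can be sketched as a small lookup. This is only an illustration built from the rows listed here, not a complete model of the Ho Chi Minh City rime system; the names `HCMC_RIMES`, `hcmc_rime`, and `merged_in_hcmc` are hypothetical, and the shared realization of ong, oc with ông, ôc is assumed from the merged table cell.

```python
import unicodedata

# Each row: orthographic rimes that share one surface form in the table above.
# Assumption: "ong"/"oc" share the realization of "ông"/"ôc" (merged cell).
_ROWS = [
    (("ung",), "ʊwŋ͡m"), (("uc",), "ʊwk͡p̚"),
    (("ông", "ong"), "ăwŋ͡m"), (("ôc", "oc"), "ăwk͡p̚"),
    (("anh",), "ăn"), (("ach",), "ăt̚"),
    (("ênh",), "ɤn"), (("êch",), "ɤt̚"),
    (("in", "inh"), "ɪ̈n"), (("it", "ich"), "ɪ̈t̚"),
    (("um",), "ʊm"), (("up",), "ʊp̚"),
    (("ưng", "ưn"), "ɯ̽ŋ"), (("ưc", "ưt"), "ɯ̽k̟̚"),
]


def _nfc(text: str) -> str:
    """Normalize so precomposed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", text)


# Orthographic rime -> phonetic realization (without brackets).
HCMC_RIMES = {_nfc(sp): ipa for spellings, ipa in _ROWS for sp in spellings}


def hcmc_rime(spelling: str) -> str:
    """Look up the realization of a rime; raises KeyError if it is not listed."""
    return HCMC_RIMES[_nfc(spelling.lower())]


def merged_in_hcmc(rime_a: str, rime_b: str) -> bool:
    """True if the two spellings have the same listed realization."""
    return hcmc_rime(rime_a) == hcmc_rime(rime_b)
```

For example, `merged_in_hcmc("in", "inh")` is true because both spellings map to the same centralized vowel plus alveolar nasal, whereas `merged_in_hcmc("ung", "um")` is false.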
|Hue||Quang Nam||Binh Dinh||Ho Chi Minh City|
|ung, uc||[ʊwŋ͡m], [ʊwk͡p̚]||[ʊwŋ͡m], [ʊwk͡p̚]||[ʊwŋ͡m], [ʊwk͡p̚]||[ʊwŋ͡m], [ʊwk͡p̚]|
|un, ut||[uːŋ͡m], [uːk͡p̚]||[uːŋ͡m], [uːk͡p̚]|
|ênh, êch||[ən], [ət̚]||[ən], [ət̚]||[ən], [ət̚]||[ɤːn], [ɤːt̚]|
|ên, êt||[eːn], [eːt̚]||[eːn], [eːt̚]||[eːn], [eːt̚]|
|inh, ich||[ɪ̈n], [ɪ̈t̚]||[ɪ̈n], [ɪ̈t̚]||[ɪ̈n], [ɪ̈t̚]||[ɪ̈n], [ɪ̈t̚]|
|in, it||[in], [it̚]||[in], [it̚]||[in], [it̚]|
Many Southern speakers merge the ông, ôc rimes into ong, oc as [ăwŋ͡m], [ăwk͡p̚], but not with ôn, ôt, which are pronounced [oːŋ͡m], [oːk͡p̚]. The oong, ooc and eng, ec rimes are few and occur mostly in loanwords or onomatopoeia. The ôông, ôôc rimes (as well as oong, ooc, eng, ec, êng, êc) are the "archaic" forms that preceded ông, ôc before diphthongization, and they still exist in many placenames in the North Central dialect. In the North Central dialect these rimes are articulated as [oːŋ], [oːk̚], without simultaneous bilabial closure or labialization.
|on, ot||/ɔn/, /ɔt/||→||[ɔːŋ], [ɔːk]|
|oong, ooc||/ɔŋ/, /ɔk/||→|
|ôn, ôt||/on/, /ot/||→||[oːŋ͡m], [oːk͡p̚]|
|ôông, ôôc||/oŋ/, /ok/||→|
|ong, oc||/awŋ/, /awk/||→||[ăwŋ͡m], [ăwk͡p̚]|
|ông, ôc||/əwŋ/, /əwk/||→|
With the above phonemic analyses, the following is a table of rimes ending in /n, t, ŋ, k, ŋ͡m, k͡p/ in the Ho Chi Minh City dialect:
ong / ông
oc / ôc
|Combinations that have changed their pronunciation due to merger are bolded.|
Vietnamese vowels are all pronounced with an inherent tone. Tones differ in
Unlike tones in many Native American, African, and Chinese languages, Vietnamese tones do not rely solely on pitch contour. Instead, Vietnamese often uses a register complex (a combination of phonation type, pitch, length, vowel quality, etc.). A better description may therefore be that Vietnamese is a register language rather than a "pure" tonal language.
In Vietnamese orthography, tone is indicated by diacritics written above or below the vowel.
There is much variation among speakers concerning how tone is realized phonetically. There are differences between varieties of Vietnamese spoken in the major geographic areas (northern, central, southern) and smaller differences within the major areas (e.g. Hanoi vs. other northern varieties). In addition, there seems to be variation among individuals. More research is needed to determine the remaining details of tone realization and the variation among speakers.
The six tones in the Hanoi and other northern varieties are:
|Tone name||Tone ID||Vni/telex/Viqr||Description||Chao Tone Contour||Diacritic||Example|
|ngang "flat"||A1||[default]||mid level||˧ (33)||◌||ba ('three')|
|huyền "deep"||A2||2 / f / `||low falling (breathy)||˨˩ (21) or (31)||◌̀||bà ('grandmother')|
|sắc "sharp"||B1||1 / s / '||mid rising, tense||˧˥ (35)||◌́||bá ('to embrace')|
|nặng "heavy"||B2||5 / j / .||mid falling, glottalized, heavy||˧ˀ˨ʔ (3ˀ2ʔ) or ˧ˀ˩ʔ (3ˀ1ʔ)||◌̣||bạ ('to strengthen')|
|hỏi "asking"||C1||3 / r / ?||mid falling(-rising), emphasis||˧˩˧ (313) or (323) or (31)||◌̉||bả ('bait')|
|ngã "tumbling"||C2||4 / x / ~||mid rising, glottalized||˧ˀ˥ (3ˀ5) or (4ˀ5)||◌̃||bã ('residue')|
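Because each tone has its own diacritic, the tone of a written syllable can be recovered by decomposing it into base letters and combining marks. The following is a minimal sketch using Python's standard `unicodedata` module; the names `TONE_MARKS`, `TELEX_KEYS`, and `tone_id` are hypothetical, and the function assumes its input is a single well-formed orthographic syllable.

```python
import unicodedata

# Combining marks that encode tone (from the Diacritic column above).
# Vowel-quality marks (circumflex, breve, horn) are deliberately absent.
TONE_MARKS = {
    "\u0300": "A2",  # grave      -> huyền
    "\u0301": "B1",  # acute      -> sắc
    "\u0323": "B2",  # dot below  -> nặng
    "\u0309": "C1",  # hook above -> hỏi
    "\u0303": "C2",  # tilde      -> ngã
}

# Telex tone keys from the table, mapped to the same tone IDs.
TELEX_KEYS = {"f": "A2", "s": "B1", "j": "B2", "r": "C1", "x": "C2"}


def tone_id(syllable: str) -> str:
    """Return the tone ID (A1..C2) of one orthographic syllable."""
    # NFD splits letters such as "ặ" into "a" plus separate combining marks.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "A1"  # no tone mark: ngang
```

Decomposing first means that letters carrying both a vowel-quality mark and a tone mark, such as ề or ặ, are handled without a separate table of precomposed characters.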
|Tone name||Tone ID||Vni/telex/Viqr||Description||Chao Tone Contour (Quảng Nam)||Chao Tone Contour (Bình Định)||Chao Tone Contour (Ho Chi Minh City)||Diacritic||Example|
|ngang "flat"||A1||[default]||mid flat level||˦˨ (42)||˧ (33)||˦ (44)||◌||ba ('three')|
|huyền "deep"||A2||2 / f / `||low falling||˧˩ (31)||˧˩ (31)||˧˩ (31)||◌̀||bà ('lady')|
|hỏi "asking"||C1||3 / r / ?||mid falling-rising||˧˨˦ (324)||˧˨˦ (324)||˨˩˦ (214)||◌̉||bả ('poison')|
|ngã "tumbling"||C2||4 / x / ~||◌̃||bã ('residue')|
|sắc "sharp"||B1||1 / s / '||high rising||˦˥ (45)||˦˧˥ (435)||˧˥ (35)||◌́||bá ('governor')|
|nặng "heavy"||B2||5 / j / .||low falling-rising||˧˨˧ (323)||˦˧˦ (313)||˨˩˨ (212)||◌̣||bạ ('at random')|
In Southern varieties, tones ngang, sắc, huyền have similar contours to Northern tones; however, these tones are produced with normal voice instead of breathy voice.
The nặng tone is pronounced as a low rising tone (12) [˩˨] in fast speech or as a low falling-rising tone (212) [˨˩˨] in more careful speech.
The ngã and hỏi tones are merged into a mid falling-rising tone (214) [˨˩˦], which is somewhat similar to the hỏi tone of the non-Hanoi Northern accent mentioned above.
North-central and Central Vietnamese varieties are fairly similar with respect to tone although within the North-central dialect region there is considerable internal variation.
It is sometimes said (by people from other provinces) that people from Nghệ An pronounce every tone as a nặng tone.
An older analysis assumes eight tones rather than six. This follows the lead of traditional Chinese phonology. In Middle Chinese, syllables ending in a vowel or nasal allowed for three tonal distinctions, but syllables ending with /p/, /t/ or /k/ had no tonal distinctions. Rather, they were consistently pronounced with a short high tone, which was called the entering tone and considered a fourth tone. Similar considerations lead to the identification of two additional tones in Vietnamese for syllables ending in /p/, /t/, /c/ and /k/. These are not phonemically distinct from the sắc and nặng tones, however; modern linguists therefore do not consider them separate tones, and they are not distinguished in the orthography.
|Traditional Tone Category||Register||Tone name||Tone ID||Vni/telex/Viqr||Description||Chao Tone Contour by Location||Diacritic||Example|
|Hanoi||Quảng Nam||Bình Định||Ho Chi Minh City|
|bằng 平 "even"||bình 平 "level"||phù "high"||ngang "flat"||A1||[default]||mid flat level||˧ (33)||˦˨ (42)||˧ (33)||˦ (44)||◌||ba ('three')|
|trầm "low"||huyền "deep"||A2||2 / f / `||low falling||˨˩ (21)||˧˩ (31)||˧˩ (31)||˧˩ (31)||◌̀||bà ('lady')|
|trắc 仄 "oblique"||thượng 上 "rising"||high||hỏi "asking"||C1||3 / r / ?||mid falling-rising||˧˩˧ (313)||˧˨˦ (324)||˧˨˦ (324)||˨˩˦ (214)||◌̉||bả ('poison')|
|low||ngã "tumbling"||C2||4 / x / ~||mid rising, glottalized||˧ˀ˥ (3ˀ5~4ˀ5)||◌̃||bã ('residue')|
|khứ 去 "departing"||high||sắc "sharp"||B1||1 / s / '||high rising||˧˥ (35)||˦˥ (45)||˦˧˥ (435)||˧˥ (35)||◌́||bá ('governor')|
|low||nặng "heavy"||B2||5 / j / .||low falling-rising||˧ˀ˩ʔ (3ˀ1ʔ)||˧˨˧ (323)||˦˧˦ (313)||˨˩˨ (212)||◌̣||bạ ('at random')|
|nhập 入 "entering"||high||sắc "sharp"||D1||1 / s / '||high checked rising||˧˥ (35)||˦˥ (45)||◌́||bác ('uncle')|
|low||nặng "heavy"||D2||5 / j / .||low checked falling||˧ˀ˩ʔ (3ˀ1ʔ)||˨˩ (21)||◌̣||bạc ('silver')|
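The eight-tone classification in the table above differs from the six-tone one only for syllables that end in a stop. A rough sketch of that mapping follows; it assumes the orthographic stop finals are p, t, c, and ch, and the names `BASE_TONE`, `CHECKED`, and `traditional_tone_id` are hypothetical rather than part of any established library.

```python
import unicodedata

# Tone diacritics (combining marks) -> six-tone IDs; no mark means A1.
BASE_TONE = {"\u0300": "A2", "\u0301": "B1", "\u0323": "B2",
             "\u0309": "C1", "\u0303": "C2"}
# In the eight-tone analysis, sắc and nặng on stop-final syllables are D1, D2.
CHECKED = {"B1": "D1", "B2": "D2"}
# Assumption: orthographic finals that spell the stops /p, t, k, c/.
STOP_FINALS = ("p", "t", "c", "ch")


def traditional_tone_id(syllable: str) -> str:
    """Return the tone ID (A1..D2) of a syllable under the eight-tone analysis."""
    tone = "A1"
    letters = []
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in BASE_TONE:
            tone = BASE_TONE[ch]
        elif not unicodedata.combining(ch):
            letters.append(ch)  # keep base letters, drop vowel-quality marks
    if "".join(letters).endswith(STOP_FINALS):
        return CHECKED.get(tone, tone)
    return tone
```

With this sketch, bá is classified as B1 but bác as D1, and bạ as B2 but bạc as D2, matching the examples in the table.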
According to Hannas (1997), there are 4,500 to 4,800 possible spoken syllables (depending on dialect), and the standard national orthography (Quốc Ngữ) can represent 6,200 syllables (Quốc Ngữ orthography represents more phonemic distinctions than are made by any one dialect). A description of syllable structure and exploration of its patterning according to the Prosodic Analysis approach of J.R. Firth is given in Henderson (1966).
The Vietnamese syllable structure follows the scheme:
(C1)(w)V(G|C2)+T
More explicitly, the syllable types are as follows:
|V||ê "eh"||wV||uể "sluggish"|
|VC||ám "possess (by ghosts, etc.)"||wVC||oán "bear a grudge"|
|VC||ớt "capsicum"||wVC||oắt "little imp"|
|CV||nữ "female"||CwV||huỷ "cancel"|
|CVC||cơm "rice"||CwVC||toán "math"|
|CVC||tức "angry"||CwVC||hoặc "or"|
C1: Any consonant may occur as an onset with the following exceptions:
w: the onglide /w/ (sometimes transcribed instead as labialization [ʷ] on a preceding consonant):
V: The vowel nucleus V may be any of the following 14 monophthongs or diphthongs: /i, ɨ, u, e, ə, o, ɛ, ə̆, ɔ, ă, a, iə̯, ɨə̯, uə̯/.
G: The offglide may be /j/ or /w/. Together, V and G must form one of the diphthongs or triphthongs listed in the section on Vowels.
C2: The optional coda C2 is restricted to labial, coronal, and velar stops and nasals /p, t, k, m, n, ŋ/, which cannot cooccur with the offglides /j, w/.
T: Syllables are spoken with an inherent tone contour:
|Zero coda||Off-glide coda||Nasal consonant coda||Stop consonant coda|
|/a/||ạ, (gi)à, (gi)ả, (gi)ã, (gi)á
|/iə/||ịa, (g)ịa, ỵa
|Labiovelar on-glide followed by vowel nucleus||/ʷă/||oạy, (q)uạy
|Tone||a /a/, à /â/, á /ǎ/, ả /a᷉/, ã /ǎˀ/, ạ /âˀ/||á /á/, ạ /à/|
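The constraints on V, G, and C2 described above can be sketched as a simple check on phonemic components. This is a minimal illustration under stated assumptions: the name `fits_template` is hypothetical, onset exceptions and tone are not modeled, and the check does not verify that a given V and G combination is one of the diphthongs or triphthongs listed in the section on Vowels.

```python
import unicodedata
from typing import Optional


def _nfc(text: str) -> str:
    """Normalize so precomposed and decomposed symbols compare equal."""
    return unicodedata.normalize("NFC", text)


# The 14 vowel nuclei, the two offglides, and the six permitted codas.
NUCLEI = {_nfc(v) for v in [
    "i", "ɨ", "u", "e", "ə", "o", "ɛ", "ə̆", "ɔ", "ă", "a",
    "iə̯", "ɨə̯", "uə̯",
]}
OFFGLIDES = {"j", "w"}
CODAS = {"p", "t", "k", "m", "n", "ŋ"}


def fits_template(nucleus: str,
                  offglide: Optional[str] = None,
                  coda: Optional[str] = None) -> bool:
    """Check a rime against the V, G, and C2 restrictions only."""
    if _nfc(nucleus) not in NUCLEI:
        return False
    if offglide is not None and offglide not in OFFGLIDES:
        return False
    if coda is not None and coda not in CODAS:
        return False
    # An offglide and a coda consonant cannot co-occur.
    return not (offglide is not None and coda is not None)
```

For instance, `fits_template("a", coda="m")` passes, while `fits_template("a", offglide="j", coda="n")` fails because the coda cannot follow an offglide.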
Below is a table comparing four linguists' different transcriptions of Vietnamese vowels as well as the orthographic representation. Note that this article mostly follows Han (1966), except that it marks short vowels as short.
Thompson (1965) says that the vowels [ʌ] (orthographic â) and [ɐ] (orthographic ă) are shorter than all of the other vowels, which is shown here with the length mark [ː] added to the other vowels. His vowels above are only the basic vowel phonemes. Thompson gives a very detailed description of each vowel's various allophonic realizations.
Han (1966) uses acoustic analysis, including spectrograms and formant measuring and plotting, to describe the vowels. She states that the primary difference between orthographic ơ & â and a & ă is a difference of length (a ratio of 2:1). ơ = /ɜː/, â = /ɜ/; a = /ɐː/, ă = /ɐ/. Her formant plots also seem to show that /ɜː/ may be slightly higher than /ɜ/ in some contexts (but this would be secondary to the main difference of length).
Han's studies also use a rather small number of participants, and although her participants are native speakers of the Hanoi variety, all of them have lived outside Hanoi for a significant period of their lives (e.g. in France or Ho Chi Minh City).
Nguyễn (1997) has a simpler, more symmetrical description. He says that his work is not a "complete grammar" but rather a "descriptive introduction." His chart above is therefore more a phonological vowel chart than a phonetic one.