Since the concept of "Mon-Khmer" (Munda vs. the rest) is now almost universally rejected, native words from Austroasiatic languages ought to be edited accordingly when an actual reconstruction of Proto-Austroasiatic comes out, hopefully in the not-too-distant future. Entries for reconstructed forms created using Shorto (2006) can be redirected.
When it comes to Sino-Vietnamese morphemes, it seems that nowadays the Northern forms dominate to a great extent, with non-Northern forms such as sanh, nhơn and ngãi becoming much less used or disappearing completely. On the other hand, forms such as chánh, lãnh, thì are still maintained, and many compounds involving these are no longer considered dialectal, existing alongside Northern chính, lĩnh, thời.
There is also the interesting triplet thực, thật, thiệt. Not too much can be said about the first two, as their distribution was already all over the place when the Latin alphabet started to become widely used during the colonial period, but it seems that thực was quite a bit more common in Northern texts. thực and thật are now the two main forms seen in compounds, while thiệt still has its relevance in colloquial Southern speech.
Also peculiar is the case of hảo and hiếu: while the adjectival hảo (< 好 (MC xawX)) is the Northern form, the verbal hiếu (< 好 (MC xawH)) is Southern.
As far as I know, there is no comprehensive study of the dialectal differences in Sino-Vietnamese morphemes, which is unfortunate. Considering how very Sino-centric the study of Vietnamese was (and to some extent, still is), I'm surprised no one has taken up the task of examining the body of Vietnamese Latin-script texts written in the late 19th and early 20th centuries, when dialectal features in written texts were even more pervasive than they are nowadays.
This applies to texts not written in Latin script too, of course: the character 性 (xìng) in a text by a Southern writer would almost certainly not be pronounced tính as often transliterated these days, but tánh. In the same way, although we can't be absolutely sure considering his life and that his mother was a Northerner, it's possible that the character 𠊛 in the first line in Truyện Kiều would be read as , not , by the author himself.
In Vietnamese, there seems to be a very noticeable tendency for "suffixes" in rime replacement reduplication patterns to bear tone C. Discounting -iếc, which is not subject to tonal assimilation, only three of the "suffixes" listed in Appendix:Vietnamese reduplication (-ăn in B tones, -ang in A tones, -âp in B/D tones) do not bear tone C. The other five "suffixes" all bear tone C.
Is there an explanation for this?
The dialects spoken in Đà Nẵng, Quảng Nam, and Quảng Ngãi are notable for their numerous vowel and rime shifts. From my very limited interaction with speakers of these dialects, I can say that the phonetic realizations of their vowels are extremely different from the rest of Vietnamese, with and bouncing around everywhere and the weaker presence or absence of labial-velar allophones. There have been some preliminary studies on the phonetics of these dialects () available in English, as well as a long dissertation by Tooyama () in Japanese.
I am not sure if these dialects should be classified as belonging to the Central or Southern dialect region, as they share many features with both the North Central and Southern dialects, but they sure blow my mind every time I interact with a speaker of one.
Needless to say, there is no such thing as a single "Thanh Hoá dialect". In the northern half of the province, a variety of Northern dialects are spoken, while the Central dialects are spoken in the south. Some of the features commonly associated with the Southern dialects seem to originate from this province, including the merger of the two C tones (the hỏi-ngã merger) and the transphonologization of the morpheme ấy into the hỏi tone used in pronouns (i.e. ảnh, chỉ, etc.). As in some other coastal Northern dialects, the merger of initial tr- with t- has taken place in some coastal areas of northern Thanh Hoá (although with the ongoing dialect leveling, these forms might disappear within the next 100 years), so that there are te (“bamboo”) for mainstream tre, teo (“to hang”) for mainstream treo, tâu (“buffalo”) for mainstream trâu.
Also needless to say, these dialects are horribly under-researched.
In Northern and Southern Vietnamese, each of the three vowels *aː *ɛː *ɔː has two reflexes: *aː > /aː ~ ɨə/, *ɛː > /ɛ ~ iə/, *ɔː > /ɔ ~ uə/. (Note that, like Modern Vietnamese, Proto-Vietic only had a length contrast for *aː a and *əː ə, so the length of *ɛː *ɔː was phonetic rather than phonemic, and they were probably pronounced no differently from Modern Vietnamese /ɛ ɔ/; Ferlus assumed that they were automatically phonetically short before *-h.) If we start from Proto-Vietic, there is one way to predict whether these vowels would diphthongize:
But other than that, it is literally random, and even the above rule has exceptions: ỉa has a diphthong, while bẻ has a conservative monophthong, although both ended in *-h at the Proto-Vietic stage.
What we do know for sure is that the North Central dialects mostly escaped diphthongization and usually maintain the conservative monophthongs.
The Northern dialects can be defined as having undergone these innovations in vocalism:
It is fairly well known that some North Central dialects (not all, probably not even the majority) exhibit a merger of the tones ngã (C2) and nặng (B2) into one tone that is usually perceived as nặng. I wonder if this has anything to do with pairs like giẫm-giậm, sẫm-sậm. The question is whether there were cases of borrowing from the dialects with this merger into the "mainstream" dialects without it.
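A minimal sketch of how such candidate pairs could be picked out of a word list: it strips the tone mark from each orthographic syllable via Unicode decomposition and flags two syllables that are identical except for ngã vs. nặng. The helper names (`split_tone`, `is_nga_nang_doublet`) are made up for illustration, and the check is purely orthographic; it says nothing about whether a flagged pair is really a dialect borrowing.

```python
import unicodedata

NGA = "ngã"
NANG = "nặng"

# Combining marks that write the five marked tones in Vietnamese orthography.
# Vowel-quality marks (circumflex, breve, horn) are deliberately not listed.
TONE_MARKS = {
    "\u0300": "huyền",
    "\u0301": "sắc",
    "\u0303": NGA,
    "\u0309": "hỏi",
    "\u0323": NANG,
}


def split_tone(syllable):
    """Return (syllable without its tone mark, tone name)."""
    tone = "ngang"  # default when no tone mark is present
    kept = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    return unicodedata.normalize("NFC", "".join(kept)), tone


def is_nga_nang_doublet(a, b):
    """True if a and b differ only in carrying ngã vs. nặng."""
    base_a, tone_a = split_tone(a)
    base_b, tone_b = split_tone(b)
    return base_a == base_b and {tone_a, tone_b} == {NGA, NANG}


for pair in [("giẫm", "giậm"), ("sẫm", "sậm")]:
    print(pair, is_nga_nang_doublet(*pair))
```

Decomposing to NFD first means the function works the same whether the input uses precomposed or combining characters.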
... Why is khiêu the Sino-Vietnamese reading of 挑 (MC thew) and 跳 (MC dew)?
Note that other characters within the same phonetic series, such as 兆 (MC drjewX) (> triệu), 桃 (MC daw) (> đào), 逃 (MC daw) (> đào), have the expected readings.
The innovative aspirated forms are a marked feature of the North Central dialects, and some of them are also present in the Southern dialects. One curious thing about these is that they are almost all verbs or adjectives, with bỏng/phỏng being the only noun.
Note that not all of the aspirated forms occur in all North Central dialects; rather, this is a compilation from various sources. Some of these are orthographic inferences from forms given in IPA in Nguyễn Thị Thuỷ's (2022) paper on the Cao Lao Hạ lect; for example, "to scratch" was originally given as kʰɑːj¹¹ˀ and is rendered as khải here.
Lenited/plain | Aspirated | Specific correspondence |
---|---|---|
vót | phót | ⟨v⟩ vs. ⟨ph⟩ |
vỗ | phổ | ⟨v⟩ vs. ⟨ph⟩ |
vụng | phúng | ⟨v⟩ vs. ⟨ph⟩ |
(sưng) vù | (sưng) phù | ⟨v⟩ vs. ⟨ph⟩ |
vắt (cơm) | phắt (cơm) | ⟨v⟩ vs. ⟨ph⟩ |
vỡ | phỡ | ⟨v⟩ vs. ⟨ph⟩. The aspirated form seems to be obsolete. |
bỏng | phỏng | ⟨b⟩ vs. ⟨ph⟩ |
banh | phanh | ⟨b⟩ vs. ⟨ph⟩ |
bết | phết | ⟨b⟩ vs. ⟨ph⟩ |
bứt | phứt | ⟨b⟩ vs. ⟨ph⟩ |
bắt | phắt | ⟨b⟩ vs. ⟨ph⟩ |
dột | thốt | ⟨d⟩ vs. ⟨th⟩ |
dỗ | thỗ | ⟨d⟩ vs. ⟨th⟩ |
đừ | thừ | ⟨đ⟩ vs. ⟨th⟩. Also đờ vs. thờ (> thẫn thờ) with more conservative vocalism. |
gảy | khảy | ⟨g⟩ vs. ⟨kh⟩ |
gãi | khải | ⟨g⟩ vs. ⟨kh⟩ |
gỡ | khở | ⟨g⟩ vs. ⟨kh⟩ |
quấy | khuấy | ⟨c/k/q⟩ vs. ⟨kh⟩ |
cứa | khứa | ⟨c/k/q⟩ vs. ⟨kh⟩. Sinitic loan. |
quỵ | khuỵ | ⟨c/k/q⟩ vs. ⟨kh⟩. Sinitic loan. |
(ống) quyển | (ống) khuyển | ⟨c/k/q⟩ vs. ⟨kh⟩. Almost certainly a Sinitic loan. |
Potentially also đeo vs. theo, though this is really uncertain.
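To get a quick overview of how the pairs in the table above are distributed, the following sketch tallies them by initial correspondence. The pair list is copied from the table (without the bracketed context words and without the uncertain đeo/theo); the `onset` helper and the grouping of ⟨c⟩, ⟨k⟩, ⟨qu⟩ under one label are illustrative assumptions based on the spelling only.

```python
from collections import Counter

# (lenited/plain form, aspirated form), as listed in the table.
PAIRS = [
    ("vót", "phót"), ("vỗ", "phổ"), ("vụng", "phúng"), ("vù", "phù"),
    ("vắt", "phắt"), ("vỡ", "phỡ"),
    ("bỏng", "phỏng"), ("banh", "phanh"), ("bết", "phết"),
    ("bứt", "phứt"), ("bắt", "phắt"),
    ("dột", "thốt"), ("dỗ", "thỗ"),
    ("đừ", "thừ"),
    ("gảy", "khảy"), ("gãi", "khải"), ("gỡ", "khở"),
    ("quấy", "khuấy"), ("cứa", "khứa"), ("quỵ", "khuỵ"),
    ("quyển", "khuyển"),
]

# Digraphs come first so that "ph" is not read as "p" + "h", etc.
ONSETS = ("ph", "th", "kh", "qu", "v", "b", "d", "đ", "g", "c", "k")
# The three spellings of the plain velar stop are counted together.
VELAR = {"c": "c/k/q", "k": "c/k/q", "qu": "c/k/q"}


def onset(word):
    """Return the orthographic initial of a syllable (hypothetical helper)."""
    for candidate in ONSETS:
        if word.startswith(candidate):
            return VELAR.get(candidate, candidate)
    raise ValueError(f"unrecognized initial in {word!r}")


counts = Counter((onset(plain), onset(asp)) for plain, asp in PAIRS)

for (plain, asp), n in counts.most_common():
    print(f"{plain} vs. {asp}: {n}")
```

Counting on spelling rather than on IPA keeps the sketch in line with the table, which is itself partly an orthographic inference.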
Pretty much every language with a written tradition has at least three speech registers: colloquial, formal, and literary (many languages have more, and unwritten languages can still have multiple formal registers). There seem to be register-dependent preferences as to whether a "category word" is included in the words used to name certain animals.
Gloss | Colloquial and formal | Literary |
---|---|---|
carp | cá chép | chép |
snakehead | cá lóc | lóc |
perch | cá rô | rô |
catfish | cá trê | trê |
loach | cá chạch | chạch |
grass carp | cá trắm | trắm |
Gloss | Colloquial and literary | Formal |
---|---|---|
cobra | hổ mang | rắn hổ mang |
banded krait | cạp nong | rắn cạp nong |
C. rhodostoma | chàm quạp | rắn chàm quạp |
pareids | hổ mây | rắn hổ mây |
Gloss | Colloquial | Formal | Literary |
---|---|---|---|
sparrow | sẻ, chim sẻ | chim sẻ | sẻ |
pheasant | trĩ, chim trĩ | trĩ | trĩ, chim trĩ |
peafowl | công | công | công, chim công |
dove, pigeon | bồ câu | bồ câu, chim bồ câu | chim bồ câu |
pelican | bồ nông, chim bồ nông | bồ nông | bồ nông |
falcon | cắt, chim cắt | cắt | cắt, chim cắt |
The extremely common classifier cái is a weird case. It's often considered a loan from the well-known Sinitic classifier 個 / 个, with which it shares a notable phonetic similarity, and for some time I also took that to be the origin of the Vietnamese word.
However, this etymon is present not only in Vietnamese but also in other Vietic languages: Muong Bi cảy, Tho keː³, Chut kɛ⁴, Chut kɛ⁴ (Nguyễn Văn Lợi, 1993), all used as classifiers.
The dominant and general trend in Vietic languages when it comes to vowel shifts is diphthongization, not monophthongization (there is monophthongization in the Southern dialects of Vietnamese, but that is obviously unrelated), so the written Vietnamese form and the Muong Bi form can be taken as innovative. The most likely original vowel was *eː, as preserved intact in Cuối Chăm (cf. Cuối Chăm keː¹ vs. written Viet. gai (“thorn”)).
Now I think 個 / 个 is out of the picture because of the vowel mismatch. That doesn't mean I think the Chinese classifier had no semantic influence on the Vietnamese word (on the contrary, I absolutely think it did), but what's the next most likely etymology? A native Vietnamese speaker would probably connect it with Proto-Vietic *-keːʔ (“female”) (> homophonous cái (“female”), and gái (“girls”)), and I think that's not bad either: "female" is usually connected with "great, main" in Vietic, so I can see the development.
Anyway, I think there's a third option: the demonstrative "that" in Austroasiatic languages (Proto-Katuic *kii (“that”) > Pacoh ki, Semelai ke and the like). The Katuic cognate of Vietnamese "thorn" is Proto-Katuic *kii, *ʔakii (“horn”) > Pacoh ki (“horn on nose, single tusk of rhino”), so the vowel correspondence is not a problem. If (big if) this is true, I don't think the item still had the meaning "that" at the Proto-Vietic stage; it might have been just some kind of "focus/topic" marker (the thing there, you there). As seen in Northern Middle Vietnamese and modern Muong Bi, apart from classifying inanimate things, it also marks/marked some animals, indicating that its function, at least at the Proto-Viet-Muong stage, used to be broader than it currently is in Vietnamese. This might not be the most compelling argument for the "that"/focus marker hypothesis, but I do think it points in that direction. Also, I am not sure whether the modern use of the Vietnamese word as a focus marker placed before another classifier is a trace of this possible old usage, although I lean towards thinking it is not.
Tuệ Tĩnh had some formulaic poems (that are mostly) in Chinese, but with the first line providing the translation of the medical ingredient in Vietnamese:
The first line means "The 木棉 (mùmián) is colloquially (i.e. in Vietnamese) called cây gạo".
In 天花粉 (thiên hoa phấn), the first line is:
The last character is 巴⿱例 (巴 (MC pae) + 例 (MC ljejH)), a compound phonogram (or phonogram with double phonetics, called chữ kép). It is a small puzzle why "sky" was written with 例 (lệ) as the phonetic in semantophonograms or a phonetic in compound phonograms, instead of something more suitable like 利 (lợi). If Shimizu was correct, poems like this were altered later by scribes in order to avoid the 利 (lợi) in the name of Lê Lợi (黎利); after all, Tuệ Tĩnh died quite a while before Lê Lợi's ascent to the throne.
His Nam dược quốc ngữ phú (南薬國語賦) is a work in Vietnamese that also lists the Vietnamese names of many medical ingredients along with their Chinese translations.
The early Vietnamese texts include Phật thuyết đại báo phụ mẫu ân trọng kinh (佛說大報父母恩重經), Cư trần lạc đạo phú (居塵樂道賦), Đắc thú lâm tuyền thành đạo ca (得趣林泉成道歌), Giáo tử phú (敎子賦), Thiền tông khoá hư ngữ lục (禅宗課虚語録), Nam dược quốc ngữ phú (南薬國語賦), Quốc âm thi tập (國音詩集). These texts are all obviously written in Northern dialects; some observations can be made:
"Place" series, attributive. At an earlier stage, these always had to modify a noun, hence nơi này (“this place”), ông nớ (“that man”), người nào (“which person”). nó was probably originally a member, as a variant of nọ; cf. French il, Japanese 彼 (kare).
"Place" series, nominal. Can be used on their own, hence đây (“here, this place”), đó (“that thing”), đâu (“where”). đí disappeared from common use some time before the colonial period; it survived for a while as part of đí gì, but is now fully obsolete.
"Manner" series. ru disappeared some time in the 20th century; rằng continues in common use in all dialects. sao is probably also a member.
"Extent" series.
"Extent" series.
ấy appears to be a stray.
All of these series have at least one "proximal": này/ni, đây, rày/ri, bây, vầy. Their nuclei all certainly go back to a high front vowel, with Central ni and ri preserving the vowel as is, and they all have an A tone.
Four series have an interrogative pronoun/question particle: nào, đâu, ru, bao. Their nuclei go back to a high or mid back vowel, with ru preserving the vowel as is, and they also all have an A tone.
sao might in fact be a part of the r-series: it was extensively spelled with 牢 (MC law), indicating earlier *C-r-. If so, it and ru might technically be just variants of each other.
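The tone claims above are easy to check mechanically from the spelling. The sketch below maps each orthographic tone mark to a tone class (ngang/huyền = A, hỏi/ngã = C, sắc/nặng = B or D depending on the final) and confirms that the proximal and interrogative forms all come out as A. The function name `tone_class` and the word lists are illustrative; sao is included among the interrogatives only on the tentative assumption that it belongs to the r-series.

```python
import unicodedata

# Tone class signalled by each combining tone mark in the orthography.
# sắc and nặng are B in open/sonorant-final syllables and D before a stop,
# so they are labelled "B/D" here without looking at the final.
CLASS_BY_MARK = {
    "\u0300": "A",    # huyền
    "\u0301": "B/D",  # sắc
    "\u0323": "B/D",  # nặng
    "\u0309": "C",    # hỏi
    "\u0303": "C",    # ngã
}


def tone_class(syllable):
    """Return 'A', 'B/D' or 'C' for one orthographic syllable."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in CLASS_BY_MARK:
            return CLASS_BY_MARK[ch]
    return "A"  # unmarked syllables carry ngang


PROXIMAL = ["này", "ni", "đây", "rày", "ri", "bây", "vầy"]
INTERROGATIVE = ["nào", "đâu", "ru", "bao", "sao"]  # sao: tentative member

print(all(tone_class(w) == "A" for w in PROXIMAL))
print(all(tone_class(w) == "A" for w in INTERROGATIVE))
```

The circumflex, breve and horn are vowel-quality marks, so they are ignored and forms like bây and đâu correctly count as unmarked.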
Series | Proximal (*-iː) | Distal 1 (*-iːʔ) | Distal 2 (*-əːʔ) | Distal 3/Remote (*-ɔːʔ) | Interrogative (*-uː or *-oː) |
---|---|---|---|---|---|
Place, attributive n- | ni, nì, này | nấy | nớ | nọ | nào |
Place, nominal đ- | đây | đí, đấy | đó | | đâu |
Manner r- | ri, rày | rứa | | | ru, sao |
Extent 1 b- | bây | bấy | | | bao |
Extent 2 v- | vầy | vậy | | | |
Same chart, but with Nôm characters:
Series | Proximal (*-iː) | Distal 1 (*-iːʔ) | Distal 2 (*-əːʔ) | Distal 3/Remote (*-ɔːʔ) | Interrogative (*-uː or *-oː) |
---|---|---|---|---|---|
Place, attributive n- | 尼 (MC nrij); 奈 (MC najH) | 乃 (MC nojX) | 女 (MC nrjoX) | 奴 (MC nu) | 芇, phonetic 鬧 (MC nraewH) |
Place, nominal đ- | 低 (MC tej) | 帝 (MC tejH) | 妬 (MC tuH) | | 兜 (MC tuw) |
Manner r- | 夷 (MC yij); 𣈙, phonetic 例 (MC ljejH) | 呂 (MC ljoX) | | | 𠱋, phonetic 由 (MC yuw); 牢 (MC law) |
Extent 1 b- | 悲 (MC pij) | 閉 (MC pejH) | | | 包 (MC paew) |
Extent 2 v- | 丕 (MC phij) | 丕 (MC phij) | | | |
Nôm texts don't contain only semantophonograms, but were usually a mix of phonograms and semantophonograms, plus, of course, Sino-Vietnamese elements written with good ol' Chinese characters. Some pure semantograms also appeared occasionally. The choice of whether to use phonograms or semantophonograms was entirely up to the writer: which spellings they preferred, how much ambiguity they felt they could tolerate when using phonograms, and of course, conventions (yes, Nôm characters as a whole were unstandardized, but that doesn't mean that there weren't conventions). This is the same sort of deal as the choice between phonograms and semantograms for Old Japanese writers: some logic, but a lot of whims. Later Nôm texts show a tendency to use more semantophonograms, although phonograms did not go away and many words were still spelled predominantly with phonograms, similarly to how modern Japanese mostly uses phonograms for personal names and place names only, with content words spelled mostly with semantograms and kana (which are descendants of phonograms).
Here're some lines that were spelled with almost only phonograms (and regular Chinese characters for Sinitic elements): (Nguyễn Trãi, QÂTT)
唏 (hơi, “breath, air”) (semantic 口 (khẩu, “mouth”) + phonetic 希 (MC xj+j)) is the only semantophonogram here, or maybe 唏 (MC xj+jX|xj+jH) as a whole was used as a phonogram, per Miyake (2003).
On the other hand, this line was spelled with only semantophonograms: (Nam quốc phương ngôn tục ngữ bị lục)
Most often, it's a mix of both: (Nguyễn Trãi, QÂTT)
Demonstratives, personal pronouns, question words, final particles, as well as some common adjectives and verbs, were all chiefly spelled with phonograms; that is to say, the more common a word was, the more likely it was to be spelled with phonograms. For example, the particle nữa (“furthermore”) was spelled with 女 (MC nrjoX) the vast majority of the time, the common classifier cái with 丐 (MC kajH), đấy (“that thing”) with 帝 (MC tejH), tốt (“good”) with 卒 (MC tswot), có (“to have”) with 固 (MC kuH), ai (“who”) with 埃 (MC 'oj). Some common verbs are usually spelled with semantophonograms, however, like ăn (“to eat”) with 咹 (this one shows the power of convention: it's unlikely that literati would have misread ăn if it had been spelled 安 (MC 'an) all the time, but because 咹 became the conventional character, it was used).
This work is attributed to Huyền Quang (1254-1334). There seem to be two versions available on the Internet, this (1) and this (2). I've only compared the beginning of each, but there are very obvious differences between the two.
Also, shouldn't 認𫀅 be read as nhìn xem instead of nhận xem?
Gloss | Proto-Vietic (Ferlus) | Vietnamese reflex | Null initial reflexes |
---|---|---|---|
fire, firewood | *guːs ~ kuːs | củi | Semai òòs, Pacoh uih, Khmer អុស (ʼoh), Pear ʔɔːs |
excrement | *kəc ~ kɨc | cứt | Semai èc, Nyaheun ʔic, Khmer អាចម៍ (ʼac), Pear ʔec, Korku ic |
to swell | *kas | cảy | Semai as, Pacoh ayh/éih, Khasi at, Khmu ʔɛh |
Only these few examples. Likely innovation in Vietic, no clear environment.
Gloss | Proto-Vietic (Ferlus) | Vietnamese reflex | Null initial reflexes | Note |
---|---|---|---|---|
two | *haːr | hai | Khasi ar, Riang (kᵊ)ʔɑr¹ | All other branches have implosive *ɓ-. |
to sniff | *huːɲ (“to kiss”) | hun/hôn | Temiar ʔuːɲ, | Katuic also has *h-, cf. Pacoh hunh. Munda languages have similar words with *s-. |
stinky, foul-smelling | *hoːj | hôi | U ʔùj, Khmer ស្អុយ (sʼoy) | |
to finish | *heːt | hết | Khmer អត់ (ʼɑt), Mon အိုတ်, Pear ʔet, Sa'och ʔeːt, Sora əd- | In Sora and Pear, this evolved into a negative affix/negator. Khmer also has the negator usage, but kept the verbal senses. |
(that >) 3sg. | *hanʔ | hắn | Pacoh án, Nancowry an | The reflexes in Munda languages have both null initial and h-. |
to open | *haːŋʔ (“to open (mouth)”), *haːŋ (“cave”) | (Liha haːŋ³,) Vietnamese hang | Khasi ang, Riang ʔɑŋ¹, Khmu ʔaːŋ | "Cave" is probably a development from "opening". |
More complicated than *k-, which seems obviously innovative. In "to sniff" and "foul-smelling" at least, Proto-Vietic *h- could potentially be a reflex of pre-Proto-Vietic *sʔ-, while for "(that >) 3sg.", the various forms with h- might indicate a grammatical function.
There is also at least one case of post-Proto-Vietic innovation for *h-: Vietnamese hàm (“jaw”), Cuối Chăm hɐːm² vs. Arem ʔæːm, Rục təŋʔaːm¹, Pacoh tang-am, etc.