
Shortcut:
WT:LTR

This is the page for proposing changes to Wiktionary's language treatment practices, including language renaming, merging and splitting.

Use this page if you want to propose a non-trivial change to:

Wiktionary's list of languages (WT:LOL) or list of families (WT:LOF)
The language treatment practices documented at Wiktionary:Language treatment

For issues pertaining to a single language, such as orthography, start a conversation on the discussion page of the language considerations page (the so-called "About LANG" page), or the beer parlour if no such page exists.

Archiving: Language treatment requests, once closed and (if applicable) acted upon, are archived on Wikipedia-style archive subpages. These can be found at Wiktionary:Language treatment requests/Archives and in the list below:

Language treatment requests: Archive index
Language treatment requests/Archives/pre-2015
Language treatment requests/Archives/2015-19
Language treatment requests/Archives/2020-24
Language treatment requests/Archives/2025-29

2016

Nkore-Kiga

As can be seen at w:Nkore-Kiga language, Kiga should definitely be merged into Nyankore. Unfortunately, this might require a rename to something that is both hyphenated and considerably less common than just plain "Nyankore" (though that is, strictly speaking, merely the name of the main dialect). —Μετάknowledge^{discuss/deeds} 05:21, 18 September 2016 (UTC)

I'm not sure. WP suggests the merger was politically motivated, but many reference works do follow it. Ethnologue says there is "Lexical similarity 78%–96% between Nyankore, Nyoro, and their dialects; 84%–94% with Chiga, 81% with Zinza" (Kiga, meanwhile, is said to be "77% with Nyoro"), as if to suggest nyn is about as similar to cgg as to nyo, and indeed many early references treat Nkore-Nyoro like one language, where later references instead prefer to group Nkore with Kiga. Ethnologue mentions that some authorities merge all three into a "Standardized form of the western varieties (Nyankore-Chiga and Nyoro-Tooro) called Runyakitara taught at the University and used in internet browsing, but is a hybrid language." (For comparison, Ethnologue says English has 60% lexical similarity to German.) - -sche (discuss) 00:16, 2 June 2017 (UTC)

	Input needed
	This discussion needs further input in order to be successfully closed. Please take a look!

Itneg lects

See w:Itneg language. All the dialects have different codes, but we really should give them a single code and unify them. I came across this problem with the entry balaua, which means "spirit house" (but I can't tell in which specific dialect). It's also known as Tinggian (with various different spellings), and this may be a better name for it than Itneg. —Μετάknowledge^{discuss/deeds} 02:09, 23 September 2016 (UTC)Reply

Support. Itnëg is listed as just one language by the KWF with its variants listed, and we know that their database has been consulted by actual speakers of the language especially pertaining to its dialects.

This has been a problem for me as well with Kankanaey, wherein it also has separate ISO codes, and is reflected here in Wiktionary. Noting that this merger request is from 2016, I hope this gets action soon. — 🍕 Yivan000 ^view_talk 05:51, 17 May 2025 (UTC)Reply

@Yivan000 Here's the thing, I cannot say that Itneg is one language at all. The dialects are far too different from one another, especially if it is spoken by a certain tribe. An example is how Maeng Itneg is very much closer to Kankanaey than to Inlaud Itneg, which is closer to Ilocano. While Maeng Itneg uses Min-, Inlaud Itneg uses Ag-. Adasen and Binongan also use Ag- while Masadiit, Moyadan, Gobang, Mabaka, and Banao tribes use Man-. Itneg/Tinguian cannot be called one language alone. That's why plenty of Itneg people who go to other parts of IP municipalities in Abra use Ilocano to communicate with other tribes, because their Itneg will be so different. They will understand some, but not much. Amianero (talk) 06:38, 8 June 2025 (UTC)

	Input needed
	This discussion needs further input in order to be successfully closed. Please take a look!

Paraguayan Guaraní

I just noticed that we have this for some reason. Guaraní is a dialect continuum that is quite extensive, both in inter-dialect differences and in geography, and certain varieties have been heavily influenced by Spanish or Portuguese. That said, our Guaraní content is, as far as I can tell, pretty much entirely on Paraguayan Guaraní, which for some reason has a different code, . My attention was brought to this by User:Guillermo2149 changing L2 headers (I have not reverted his edits, but they do cause header-code mismatch). We could try splitting up the Guaraní dialects, but it would be hard to choose cutoffs and would definitely confuse potential editors, of which we have had more since Duolingo released a Guaraní course. I think the best choice is to merge into and mark words extensively for which dialects or countries they are used in. @-sche —Μετάknowledge^{discuss/deeds} 01:29, 1 November 2016 (UTC)

Support and are the codes of the macrolanguage, is the code for the specific dialect spoken in Paraguay, also, until now, I haven't found any lemma to be out of . --Guillermo2149 (talk) 01:52, 1 November 2016 (UTC)Reply
Support. — Ungoliant ^(falai) 11:00, 1 November 2016 (UTC)

Support merging gn and gug. - -sche (discuss) 14:33, 1 November 2016 (UTC)

Support —Aɴɢʀ (talk) 15:02, 1 November 2016 (UTC)

@Guillermo2149, Ungoliant MMDCCLXIV, -sche, Angr: I see now that there are three more Guaraní dialect codes that we have: Mbyá Guaraní , Chiripá , and Western Bolivian Guaraní . I presume that we should merge these into as well, but the case is arguably less clear given that in our current state, all our lemmas are really . What do you all think? —Μετάknowledge^{discuss/deeds} 22:51, 14 November 2016 (UTC)Reply
I stick by my motto, "When in doubt, merge". —Aɴɢʀ (talk) 09:53, 15 November 2016 (UTC)

I think we should actually merge into and not vice versa. By the way, is the only one that should be merged, has similar and some equal words but the language is very different, and is similar and very close to but it's slightly different and always confused with --Guillermo2149 (talk) 00:37, 7 December 2016 (UTC)

Don't forget there's also and apparently also . - -sche (discuss) 04:28, 16 May 2017 (UTC)Reply

2017

Merger into Scandoromani

I propose that the Para-Romani lects Traveller Norwegian, Traveller Danish and Tavringer Swedish (rmg, rmd and rmu) be merged into Scandoromani. TN, TD and TS are almost identical, mostly differing in spelling (e.g. tjuro (Sweden) vs. kjuro (Norway) meaning 'knife', gräj vs. grei 'horse' etc.). WP treats them as variants of Scandoromani. My langcode proposal could be rom-sca, or maybe we could just use rmg, which already has a category. --176.23.1.95 20:19, 25 January 2017 (UTC)Reply

I support it. Traveller Norwegian is sometimes referred to as Tavring and, to be honest, I've never heard anybody use the term Traveller Norwegian for a language. People rather call it Taterspråk or Fantemål, even though books describe that as a derogatory term. The other problem is that we've in fact got two different Norwegian Traveller languages (the Romani-based and the Månsing-based). So it looks like a total mess right now. Tollef Salemann (talk) 07:55, 2 April 2023 (UTC)

I don't think this makes sense if the orthographies are consistently different, which seems to be the case. Otherwise, we could use the same logic to merge quite a few of the Slavic languages, which obviously doesn't make sense. Theknightwho (talk) 13:43, 2 April 2023 (UTC)Reply

OK, but Traveller Norwegian is not quite the right term, because the Romani-based TN has two or more branches, which are quite different from each other, while the main one is almost the same as the Swedish one and often had the same name(s). Meanwhile, there is also a Germanic TN version, unrelated to the Romani-ish TN variations. I mean, we need at least two more L2s in this case, even if we are going to merge TN and Swedish Tavring.

PS: there is also Swedish stuff like Knoparmoj and Loffarspråk and more, and they still have remnants in some rare Swedish/Norwegian sociolects. Maybe they also need their own L2? Or can we treat them as sociolects? Tollef Salemann (talk) 13:59, 2 April 2023 (UTC)

Yenish

The Yenish "language" (which we call Yeniche) was given the ISO code yec, despite being clearly not a separate language from German. Instead, it is a jargon which Wikipedia compares to Cockney (which has never had a code) and Polari (which had a code that we deleted in a mostly off-topic discussion). The case of Gayle, which is similar, is still under deliberation at RFM as of now. Most tellingly, German Wiktionary considers this to be German, and once we delete the code, we should make a dialect label for it and add the contents of de:Kategorie:Jenisch to English Wiktionary. @-sche —Μετάknowledge^{discuss/deeds} 00:49, 7 April 2017 (UTC)Reply

I don't see how that's most tellingly; I don't know about the German Wiktionary, but major language works frequently treat things as dialects of their language that outsiders consider separate languages.--Prosfilaes (talk) 03:01, 10 April 2017 (UTC)Reply

The (linked) English Wikipedia article even says "It is a jargon rather than an actual language; meaning, it consists of a significant number of unique specialized words, but does not have its own grammar or its own basic vocabulary." Despite the citation needed that follows, that sentence is about accurate, as such this should be deleted. -- Pedrianaplant (talk) 10:53, 30 April 2017 (UTC)Reply

(If kept, it should be renamed.)
There are those who argue that Yenish should have recognition (which it indeed gets, in Switzerland) as a separate language. And it can be quite divergent from Standard German, with forms that are as different as those of some of the regiolects we consider distinct. Many examples from Alemannic or Bavarian-speaking areas are better considered Alemannic or Bavarian than Standard German. But then, that's a sign that it is, as some put it, a cant overlaid onto the local grammar, rather than a language per se. Ehh... - -sche (discuss) 03:22, 9 July 2017 (UTC)Reply

2018

Category:Nahuatl language

Nahuatl is sometimes treated as a language, and sometimes as a family of languages. Right now, Wiktionary is treating it as both simultaneously, which doesn't make sense. "Nahuatl" should be removed as a language. --Lvovmauro (talk) 11:55, 30 August 2018 (UTC)Reply

I agree the current arrangement doesn't make sense; it is a relic of very early days on Wiktionary, and has persisted mostly because it's not entirely clear how intelligible the varieties are and hence whether it's better to lump them all into nah, or retire nah and separate everything. But enough varieties are not intelligible that I agree with retiring nah (or perhaps finally converting it to a family code). - -sche (discuss) 20:34, 31 August 2018 (UTC)Reply

I think a family code for Nahuan languages is really needed since there are many cases where we don't know specifically which variety a word was borrowed from. --Lvovmauro (talk) 09:55, 9 September 2018 (UTC)Reply

@Lvovmauro: OK, thanks to you and a few other editors, all words with ==Nahuatl== sections have been given more specific headers. However, as many as a thousand translations remain to be dealt with before the code can be made a family code and Category:Nahuatl language moved on over to Category:Nahuan languages. - -sche (discuss) 06:48, 19 September 2018 (UTC)Reply

A disturbingly large number of these translations are neologisms with no actual usage. Some of them don't even obey the rules of Nahuatl word formation. --Lvovmauro (talk) 11:03, 19 September 2018 (UTC)Reply

@Lvovmauro: Feel free to remove obvious errors / unattested neologisms. If a high proportion of the translations are bad, it might even be reasonable to start presuming they're bad and just removing them, since they already suffer from the problem of using an overbroad code. - -sche (discuss) 00:28, 21 October 2018 (UTC)Reply

Someone with more time on their hands than me at the moment will need to delete all the subcategories of Category:Nahuatl language, and then the category itself, in preparation for moving 'nah' from the language-code module to the family-code module so the categories won't be recreated by careless misuse of 'nah' in the labels etc of 'nci' entries. - -sche (discuss) 00:24, 21 October 2018 (UTC)Reply

Five years on, I've reviewed the situation here. There are no Nahuatl entries anymore, which is good progress. However, two pressing issues are stopping us from fully retiring this language code:

There are still about 450 "Nahuatl" (nah) translations in English entries. I suppose these need manual review. This should not be too difficult if one can find word lists for some of the best-attested Nahuatls.
Many languages have at least one word said to be derived from Nahuatl (presumably this is the word for "chocolate" in most cases). This could be solved by making Nahuatl an etymology-only language, or by changing these etymologies to refer generically to "a Nahuan language".

This, that and the other (talk) 09:25, 1 November 2023 (UTC)Reply

Language request: Old Cahita

Mayo and Yaqui are mutually intelligible and sometimes considered to be a single language called Cahita. But their speakers apparently consider them to be distinct languages, and they have distinct ISO codes (mfy and yaq) and are currently treated distinctly by Wiktionary.

I'm not requesting that they be merged, but separating them is a problem because an important early source, the Arte de la lengua cahita conforme à las reglas de muchos peritos en ella (published 1737 but written earlier) treats them as a single language, and also includes an extinct dialect called Tehueco. I'd like to add words from the Arte but I can't list them specifically as either Mayo or Yaqui.

One solution would be to treat the language of the Arte as a distinct historical language, "Old Cahita", which would then be the ancestor of Mayo and Yaqui. The downside is there only seems to be one linguist currently using this name. --Lvovmauro (talk) 11:32, 4 November 2018 (UTC)

On linguistic grounds, it seems like we should merge Yaqui and Mayo. Jacqueline Lindenfeld's 1974 Yaqui Syntax says "Yaqui and Mayo are sufficiently similar to be mutually intelligible", the Handbook of Middle American Indians says "the modern known representatives of Cahitan—Yaqui and Mayo—are mutually intelligible", and various more general references say "Yaqui and Mayo are mutually intelligible dialects of the Cahitan language", "The Yaqui and Mayo speak mutually intelligible dialects of Cahita". (There are political considerations behind the split, which a merger might upset, so adding Old Cahita would also work, but we have tended to be lumpers...) - -sche (discuss) 23:03, 18 November 2018 (UTC)Reply

I wouldn't object to merging them. --Lvovmauro (talk) 08:58, 19 November 2018 (UTC)

Merging Classical Mongolian into Mongolian

"Classical Mongolian" refers to the literary language of Mongolia used from 17th to 19th century created through a language reform associated with increased Buddhist cultural production (this started in the 16th century, but language standardization took place later). In the 20th century, (outer) Mongolia became independent from China and later adopted a Cyrillic orthography based on the spoken language, while Inner Mongolia kept her Uyghur script.

The literary language of Inner Mongolia continues Classical Mongolian in terms of its orthography as well as most of its grammar (to an extent that Janhunen (?) calls the situation bilingual). Modern varieties, in both Outer and Inner Mongolia, have greatly expanded their lexicons through borrowing of modern terms, but they also both consider all of Classical Mongolian lexicon to be a part of their language, and will put it in their dictionaries, even transcribed into Cyrillic.

The actual problem I have with this division is that when it comes to borrowings from (Classical) Mongolian, we sometimes cannot ascertain whether they precede the 20th century or not, or more common still, we know they precede the 19th century (and post-date the 16th), but they obviously come from a spoken variety and not "Classical Mongolian" as a literary language. Crom daba (talk) 17:14, 15 November 2018 (UTC)Reply

Yes. I find it also strange that Wiktionary distinguishes Ottoman Turkish from Turkish, it’s like distinguishing pre-1918 Russian from “Russian”, or like one reads about “Ottoman Turks” instead of “Turks”. Also Kazakh and the other Turkic languages do not get extra codes for Arabic spelling, this situation is even more comparable, innit. Kazakhs in China write in Arabic script, Mongols in China in Mongolian script, but the languages are two and not four. Or also it sounds as with Pali. Am I correct to assume that Classical Mongolian texts get reedited in Cyrillic script? Then you could base all on Cyrillic and make Mongolian script soft redirects, because even words that died out before the introduction of Cyrillic can be found in Cyrillic. Fay Freak (talk) 15:23, 17 November 2018 (UTC)

@Fay Freak, the situation is similar to Turkish, but it creates less problems there since the Arabic script Turkish is obsolete and most relevant loans are pre-Republican.

In principle it could be possible to collapse all of Mongolian into Cyrillic, but this would be extremely politically incorrect.

Collapsing everything (potentially even Buryat, Daur and Middle Mongolian) into Uyghur script, like we do with Chinese, would perhaps make more sense, but 1) it's a pain to enter 2) Cyrillic is generally more accessible and useful to our users and (Outer) Mongolians 3) most of my materials are in Cyrillic 4) it corresponds poorly to the spoken forms 5) its Unicode encoding corresponds poorly to its actual form 6) the encoding doesn't correspond that well to the spoken form either. Crom daba (talk) 16:50, 18 November 2018 (UTC)Reply

This is tricky, because as far as language headers and having entries for terms in the language, it seems like we could often resolve which language a word is in(?) by knowing the date of the texts it's attested in. It is, as you say, etymologies where it's hardest to ascertain dates. (Still, if we merged the lects, we could retain an "etymology only" code for borrowings that were clearly from Classical Mongolian, like is done for Classical Persian, etc.) I'm having a hard time finding any references on the mutual intelligibility of the two stages; most references are concerned with the intelligibility or non-intelligibility of modern Khalkha, Kalmyk, etc. If we kept the stages separate, etymologies could always say something like "from Mongolian foo, or a Classical Mongolian forerunner". - -sche (discuss) 22:50, 18 November 2018 (UTC)Reply

@-sche, yes, the Persian model would be desirable.

It doesn't make much sense to speak of intelligibility between Classical and Modern Mongolian, Classical Mongolian is exclusively a written language, its spelling reflects the phonology of 13th-century Mongolian (early Middle Mongolian). The same spelling is used in Modern Mongolian as written in Uyghur script.

The biggest problem with Classical Mongolian is how redundant it is. For any word that is shared between modern and classical periods, and that is probably most of the lexicon, we would need to make two identical entries in Uyghur script for modern and classical Mongolian. Crom daba (talk) 11:18, 19 November 2018 (UTC)Reply

That seems not unlike how we handle Serbo-Croatian and Hindi-Urdu. — Zack. — 14:25, 30 November 2018 (UTC)

Indeed. The way we handle them sucks. Crom daba (talk) 12:52, 1 December 2018 (UTC)

I agree. All this duplication is a huge waste of resources. Per utramque cavernam 13:22, 1 December 2018 (UTC)

Not exactly; Serbo-Croatian and Hindi-Urdu have redundant entries in different scripts on different pages, while I understand Crom daba's point to be that we would need to have redundant ==Mongolian== and ==Classical Mongolian== entries on the same pages for most Mongolian/Uyghur script words, which would be more like having duplicate Bosnian and Croatian entries on the same pages, not our current system. And Serbo-Croats are testier about their language(s) being lumped than speakers of Classical Mongolian... ;) - -sche (discuss) 17:29, 3 December 2018 (UTC)Reply

OK, does anyone object to the merge? If not, I can try to do it with AutoWikiBrowser later, or Crom or others could start reheadering our small number of Classical Mongolian entries, fixing any wayward translations, etc. For etymologies of terms that are known to derive from Classical Mongolian, we should be able to just move cmg over to Module:etymology languages/data. - -sche (discuss) 17:29, 3 December 2018 (UTC)Reply

@Crom daba, Fay Freak I made the few ==Classical Mongolian== entries we had into ==Mongolian== entries (labelled "Classical Mongolian" unless there was already a modern Mongolian section on the same page), but many of the categories still need to be deleted, and one needs to check whether anything else is left that would break before "cmg" is moved from being a language code to being an etymology-only code. - -sche (discuss) 02:46, 27 September 2020 (UTC)

There's no full correspondence between different Mongolian scripts and none of the scripts is totally phonetic. It's not just the spelling, the phonologies are different but sometimes one script represents the true or historical pronunciation and it's not necessarily Cyrillic, which is strange. There are words that only exist in one or the other, which is quite understandable, cf. modern ᠱᠠᠹᠠ (šafa, “sofa”) in Inner Mongolia (from 沙發／沙发 (shāfā)) and софа (sofa, “sofa”) in Outer Mongolia (from софа́ (sofá)). I support the merge, though I am curious if classical Mongolian terms are equally representable in Cyrillic and Arabic scripts. In other words, are there terms in classical Mongolian, which are different from modern and there's no Cyrillic form for them? I think I saw them.

Duplication of entries is a waste. You may think I am biased but I think Mongolian should be presented/lemmatised in Cyrillic (Uyghurjin should also be available in all entries where it can be found) - for which resources are much more accessible. (Serbo-Croatian should be lemmatised on the Roman alphabet, on the other hand, let's finish the senseless duplications of entries)

Also supporting the Ottoman Turkish/Turkish merge. --Anatoli T. (обсудить/вклад) 03:25, 27 September 2020 (UTC)

@Atitarev In Mongol khelnii ikh tailbar toli we see the term уйгуржин бичиг is described as ‘монгол бичгийн дундад эртний үеийн хэлбэр’ (‘early form of the Mongolian/Khudam script’). Middle Mongolian in uigurjin with its own rules shall not be equated with the later ‘Classical’-Modern script and orthography. I maintain uigurjin (with its specific glyph forms and spelling rules) shall be treated as a term only for Middle Mongolian.

Similarly I also object to treating Northern Yuan – Qing (‘Classical’) Mongolian and Modern Mongolian-script Mongolian as one literary language standard. In fact orthographic standardisations and modifications make written Modern Mongolian quite different from Classical. Personally I’d like to display a historical feature of this language collectively under ‘Classical Mongolian’, as only this term directly interlinks with an Inner Asian historical and linguistic tradition. LibCae (talk) 16:40, 7 May 2021 (UTC)

Renaming agu

We currently call this "Aguacateca", but "Aguacateco" is much more common. (Wikipedia opts for "Awakatek", which is rapidly becoming more common but is probably not there yet — not that we can't be crystal-ballsy if we want to when it comes to names rather than entries.) —Μετάknowledge^{discuss/deeds} 05:42, 19 December 2018 (UTC)Reply

You're right that several modern (and a few older) sources seem to use Awakatek. In turn, historically Aguacatec has been used in the titles of many reference works on it, and seems like it may be the most common name (ngrams), although it's also the name of the people-group. (Others: Awakateko, Awaketec, Qa'yol, Kayol, and various spellings of Chalchitec sometimes considered a distinct lect.) - -sche (discuss) 04:31, 19 August 2020 (UTC)

Indeed, the most common name by a long shot is Aguacatec, followed by Awakatek (but these are also names of the people-group), followed by Awakateko, then Aguacateco, and in dead last, our current name of Aguacateca. Can we rename to Aguacatec? - -sche (discuss) 07:02, 28 December 2023 (UTC)

Support renaming to Aguacatec. Also being the name of the "people-group" is hardly an argument against it; the same is true of a huge number of languages including French, Welsh, Manx and the vast majority of language names ending in -ish. —Mahāgaja · talk 07:22, 28 December 2023 (UTC)Reply
Oh, to clarify, I didn't intend that as an argument against using that name, but as a qualification on the data; comparing which term is more common can't easily determine which is the most common name of the language if one term is also used for something else (the name of the people). But Aguacatec seems to be the most common name in e.g. the books about it in Glottolog's bibliography, too. Who has a bot that does renames? This one involves few enough entries that it could be done by hand, but it seems like the tasks that would need to be done are the same for many (all?) language renames, so it should be bottable... - -sche (discuss) 07:51, 28 December 2023 (UTC)Reply

2020

Retiring Moroccan Amazigh

Discussion moved from Wiktionary:Requests for moves, mergers and splits#Retiring Moroccan Amazigh [zgh].

We renamed this code from "Standard Moroccan Amazigh" to "Moroccan Amazigh", but failed to note that the "standard" part was key. This is a standardised register of the dialect continuum of Berber languages in Morocco, promoted by the Moroccan government since 2011 as an official language. Marijn van Putten says this is essentially Central Atlas Tamazight, but most of the people producing texts in it are native speakers of Tashelhit, so there is a bit of re-koineisation. However, if we move forward with good coverage of the Berber languages, every entry in will be a duplicate of or else a duplicate of marked with some sort of dialectal context label. By the way, the fact that there is an ISO code seems to be a political consideration rather than a linguistic one; compare the case of "Filipino", which we merged into Tagalog, or "Standard Estonian", which we merged into Estonian. @Fenakhay, -sche —Μετάknowledge^{discuss/deeds} 21:31, 16 March 2020 (UTC)

Hmm, I see it's a rather recent attempt at standardization, too. I don't feel like I know enough about Tamazight to be confident about what to do, but it does seem like, if this is based on tzm, it could be handled as tzm (perhaps even, instead of putting "non-tzm" entries at shi+label, they could be tzm+label, unless they're obviously shi words). - -sche (discuss) 15:44, 19 March 2020 (UTC)Reply

Generally, it seems the words are quite obvious; the main differences between and are lexical (as far as I can tell, has more internal diversity w/r/t phonology than differences with ). But they're in a continuum anyway, and WP claims that there's debate on where to draw the dividing line. —Μετάknowledge^{discuss/deeds} 16:35, 19 March 2020 (UTC)Reply

And “Moroccan Amazigh” does not sound like a language name anyway if you have not been told it is one, it seems like “Berber as spoken in Morocco”, another reason to remove it. Fay Freak (talk) 15:59, 21 March 2020 (UTC)

2021

Canonical name of "mep"

Currently, the canonical name of the language in WT is spelled Miriwung, even though every primary/secondary source I could find recommended the spelling Miriwoong, as that is consistent with the language's own orthography, while the spellings "Miriwung" and "Miriuwung" are considered nonstandard. Can someone fix it? --Numberguy6 (talk) 14:47, 8 May 2021 (UTC)Reply

It's not exactly hard to find sources spelling it as Miriwung, but I'm sure you're right. @-sche? —Μετάknowledge^{discuss/deeds} 22:52, 21 July 2021 (UTC)

Names of `sah`, `alt`, `xgn-kha` and request for Soyot

The Constitution of the Republic of Sakha (Yakutia) (https://iltumen.ru/constitution) officially uses язык саха (‘Sakha language’) to refer to the language sah. A government decree («О Правилах орфографии и пунктуации языка саха», ‘On the Rules of Orthography and Punctuation of the Sakha Language’), which approved the language’s current orthography, uses язык саха instead of якутский язык (‘Yakut language’) in its annexe. However, this usage is not mandatorily popularised. I suggest Sakha be adopted instead of Yakut due to the Constitution reference.

Since atv ‘Northern Altai’ is not a single language/dialect but a group of several (Kumandy, Chelkan & Tubalar), atv should be split into subcodes. Furthermore, Southern Altai is only a classifying term; Altai, as the official term, is suggested for alt.

As for Khamnigan xgn-kha, a transitional dialect (with conservative phonology) between Buryat and Mongolian, its simple name is unlikely to create ambiguity.

In addition I also request a code for Soyot. It will help contrasting Sayan Turkic languages. LibCae (talk) 06:36, 2 September 2021 (UTC)Reply

The Constitution of the Republic of Sakha is not our guide to using English names. In the case of , most scholarly descriptions use "Yakut" (e.g. The Turkic Languages), there are far more raw Google hits for "Yakut language" than "Sakha language", and Google Ngrams show a preference for "Yakut" that has not waned over time (but we don't know past 2008, after which the data are incomplete).

I can't comment on the other code requests, but it would be more convincing if there were some evidence in favour of the need for these codes and their distinctiveness from their closest relatives. —Μετάknowledge^{discuss/deeds} 16:11, 2 September 2021 (UTC)Reply

I don’t see how more information would come to light if we split Northern Altai. Surely Northern Altai and Southern Altai are also the most usual names, in either English or Russian. Given the number of speakers Northern Altai has, how could there be a benefit? The major factor for editors is what sources they use, whether they indicate the sources and whether those are clear about the place of origin. I had many books about “the Aramaic dialect of ” where I didn’t know which damn language code of Wiktionary it was supposed to belong to, Wiktionary making codes centered around city A and B but not village X; in the end I refrained from adding anything. Fay Freak (talk) 17:00, 2 September 2021 (UTC)Reply

Oppose renaming Yakut

Support splitting atv

Support renaming alt to Altai

Abstain regarding xgn-kha

Support creating a code for Soyot, quite strongly so. Allahverdi Verdizade (talk) 17:13, 2 September 2021 (UTC)Reply

Renaming

Wikipedia uses the phrase "Ngul (including Ngwi)" to describe this language, which we currently call "Ngul", but this paper indicates that these are just two of several synonyms, and uses "Ngwi" as the primary name. We should follow suit. —Μετάknowledge^{discuss/deeds} 00:19, 21 December 2021 (UTC)Reply

Renaming

We currently call this language "Hamer-Banna", after two of its dialects; WP uses "Hamer". This hyphenated name is found in the literature, though it excludes the third dialect, Bashaɗɗa. Modern publications, following the lead of Petrollino's grammar, use the spelling "Hamar" for that dialect. As I see it, if we stick with the hyphenated name, we should change it to "Hamar-Banna", but we could also consider elevating the name of the primary dialect to cover the language as a whole, as WP does, though in that case we should use "Hamar" instead. —Μετάknowledge^{discuss/deeds} 07:56, 22 December 2021 (UTC)Reply

Indus Valley Language

We currently have this language, which Wikipedia refers to as the Harappan language, as xiv. I suggest that we retire the code, because the language is undeciphered and its script has not been encoded, so there is nothing to add to Wiktionary in the foreseeable future. I also suggest that we retire the script code Inds, which is only used for this language. @AryamanA —Μετάknowledge^{discuss/deeds} 07:14, 28 December 2021 (UTC)Reply

Support retiring both xiv and Inds. If the script should be deciphered and the language interpreted someday, we can always unretire them then. —Mahāgaja · talk 10:09, 3 December 2022 (UTC)Reply

Merging Yoruba dialects

Currently, we have codes for "Mokole" (see Mokole language (Benin)), "Ede Cabe", "Ede Ica", "Ede Idaca", "Ede Ije", "Ede Nago", "Kura Ede Nago", "Manigri-Kambolé Ede Nago", and "Ifè" (all of which are lumped into Ede language). These lects are all very close to Yoruba proper (which they use for formal and liturgical purposes), and spoken by people who are considered ethnic Yorubas; moreover, they are included in the Global Yoruba Lexical Database. I have added them as dialects of "Yoruba" in MOD:labels/data/subvarieties, but treating Yoruba as a macrolanguage means we must remove these codes. (Note: the family code would have to be removed as well.) @AG202, Oniwe, Oníhùmọ̀ —Μετάknowledge^{discuss/deeds} 07:29, 28 December 2021 (UTC)Reply

Merge, obviously again Ethnologue’s fabrications, which were then copied over from Wikipedia and some other “encyclopedias” with their impractical credulity towards this reference. Fay Freak (talk) 07:54, 28 December 2021 (UTC)Reply

If anything I would keep the Ede family code and change the lects to be etymology-only languages (edit: excluding probably Ifè since it is much more documented), but putting them all under Yoruba I unfortunately oppose for now. The Western Ede languages as seen here have a higher degree of separation from Nuclear Yoruba, and it checks out more when comparing, at the very least, the words and phrases of Ifè to nuclear Yoruba: Ifè-French Dictionary, Peace Corps - IFÈ O.P.L. WORKBOOK, J'apprends l'ife: Langue Benue-Congo du Togo. While there are obviously words that are shared due to them being related languages, it doesn't feel like a dialect of Yoruba (to me at least), so I feel uncomfortable grouping it under Yoruba. Though I do admit that I haven't really looked into the other Ede languages nearly as much. Edit: This paper may be helpful and at least shows some of the differences between Ifè & Yoruba and some aspects of the dialect continuum. Obviously some Ede varieties are much closer to Yoruba, but then I wonder what to do about the other ones. AG202 (talk) 15:09, 28 December 2021 (UTC)Reply

@AG202: Thanks for the sources. The question of whether to lump a code is in part based on how much extra work is entailed; would you be willing to work through a subsample to see how much we would just be duplicating Yoruba entries, and how much would be distinct? I'm not sure what you're actually advocating, because making them etymology-only languages (which you say you support) would require merging them (which you say you oppose). —Μετάknowledge^{discuss/deeds} 07:18, 29 December 2021 (UTC)Reply

@Metaknowledge Yea, sorry for that being unclear. I oppose the merger under solely Yoruba. Regarding the etymology-only part, I would support having all the Ede lects (excluding Ifè) under the header "Ede" and then differentiating on the definition line which Ede lect it is, mainly because they have much less coverage than Ifè, and it's harder to tell their mutual intelligibility. (Though as mentioned I'm not as well-versed with the other lects, so I might be entirely wrong about their continuum.) In terms of working through a subsample, I am up to do so, though I am swamped at the moment so it'd definitely take a while, but from what I've seen so far, I'd be worried about putting possible Ifè terms like ɖíɖì (“belt”) or àntã̀ (“chair”) under a Yoruba header and keeping nice clear entries for readers. AG202 (talk) 07:52, 29 December 2021 (UTC)Reply

Looks reasonable. To clarify, my main note relates to the observation that the language names currently in the data are too unnatural to find use and do not even meet our CFI, which again means there is no entrotopy for those who know the languages to assign material to the designations with little doubt, as there is little to confirm the meanings of the language names. That should be a consideration if you devise new namings, in so far as you would like to have not a private language but something more or less obvious to new editors about what the language codes are for. So I did not mean that there cannot be a split in a different manner, or a smaller merge, but the current ones should be recognized as off the wall, and then there will have to be something that interrelates the remaining codes if one stumbles upon one, else it will be a recurring problem that an editor did not see the distinction between the available language codes. Fay Freak (talk) 01:36, 30 December 2021 (UTC)Reply

2022

Category:Gansu Chinese→Category:Gansu Mandarin? Category:Gansu Dungan?

Members:

價關: Gansu Dungan
可價: Gansu Dungan
綿魚: Gansu

@Justinrleung, RcAlex36, 沈澄心 —Fish bowl (talk) 05:55, 6 February 2022 (UTC)Reply

@Fish bowl: Gansu means actual Gansu in China, but Gansu Dungan should be its own label perhaps. I'm not sure why those entries are labelled specifically as Gansu Dungan, though, because do we know if it's not used in other varieties of Dungan? Pinging @Mar vin kaiser to know why he chose to label it as Gansu Dungan specifically. — justin(r)leung _{{ (t...) | c=› }} 06:03, 6 February 2022 (UTC)Reply

@Justinrleung: There's this website, I can't find the link now, that was like a mini Dungan dictionary, and for some of its words, it has a dialectal label. I think I got it from there. --Mar vin kaiser (talk) 08:39, 6 February 2022 (UTC)Reply

@Mar vin kaiser: This? I know these words are marked as Gansu here, but I wonder if we need to specify it as Gansu specifically when we don't know if other Dungan varieties use it. — justin(r)leung _{{ (t...) | c=› }} 09:02, 6 February 2022 (UTC)Reply

@Justinrleung: Oh, I added the label Gansu with the assumption that it's specifying that it's only used in Gansu. Aren't there just two dialects, Gansu and Shaanxi? --Mar vin kaiser (talk) 14:03, 6 February 2022 (UTC)Reply

Merge Category:Hokkien, Category:Hokkien Chinese; and perhaps move Category:Hainanese depending on the result of the previous

Category:Hokkien is an etymology language, while Category:Hokkien Chinese belongs to the {{dialectboiler}} system.

Category:Hainanese is presently both.

—Fish bowl (talk) 11:10, 7 February 2022 (UTC)Reply

@Fish bowl @Justinrleung @RcAlex36 @沈澄心 @AG202 IMO we should delete Category:Hokkien Chinese and recategorize the lemmas under it to Category:Hokkien. This is consistent with the treatment of other etymology languages, particularly since Hokkien is considered a dialect of the Min Nan language and not a dialect of "Chinese" (which is not a language). If you don't mind, I will go ahead and do this. (While we're at it, we should rename the Amoy etymology language to Xiamen Hokkien, which is currently a dialect category but not an etymology language, and give it a standardly-formed etymology code. Its current code is nan-xm, which is badly formatted; etymology codes should consist of sections of three letters, hence nan-xia. Same goes for nan-ph -> nan-phi, nan-qz -> nan-qua, nan-zz -> nan-zha, nan-jj -> nan-jin.) Benwing2 (talk) 05:29, 16 September 2023 (UTC)Reply

I also think we should upgrade Hokkien to a full language, esp. seeing as Min Nan is itself not a language but a macrolanguage. Benwing2 (talk) 05:30, 16 September 2023 (UTC)Reply

Agree that we should treat Hokkien as a full language - this feels long overdue. I think in general each lect listed in {{zh-pron}} should be treated as a full language in its own right, which means Sichuanese (currently with etymology code ) and Leizhou (currently lacks a code, I would suggest or ) would be upgraded. We might also want to add more etymology codes, but that might warrant a separate discussion.

I however oppose changing the 3-2 letter codes, which are much easier to memorise (since this is just taken from the first letter of each syllable) and also are consistent with the location codes used in {{zh-pron}}. Changing them means that we would need to deal with two separate, inconsistent systems.

Regarding the category name issue, for some reason we also have categories like Cat:Mandarin Chinese, Cat:Cantonese Chinese, Cat:Hakka Chinese, Cat:Min Nan Chinese, etc. alongside the regular lemma categories. I don't really care about their treatment as long as the approach is consistent. – wpi (talk) 17:18, 16 September 2023 (UTC)Reply

@Wpi It is a pain to have nonstandard etym codes like this, as it requires adding code to various places to handle them. I don't see why the 3-2 codes are easier to memorize; the proposed 3-3 codes consistently use the first three letters of the lect in question, which is standard practice at Wiktionary, whereas the 3-2 codes aren't consistent (nan-ph is not the first two syllables of "Philippine"). In terms of the location codes in {{zh-pron}}, we should rename the latter to match the 3-3 codes. However, as a first step if you don't object, I will promote Hokkien to a full language, and we can continue the discussion on etym codes; in this case we should maybe eliminate Category:Hokkien in favor of Category:Hokkien Chinese for consistency with the other such categories, although in general we need to rethink the naming of these categories. Benwing2 (talk) 19:21, 17 September 2023 (UTC)Reply

I think one reason that 3-2 codes are easier to memorize is that {{zh-pron}} uses 2-letter codes for dialects of Hokkien. However, if it makes more sense for codes to be 3-3 to be consistent with other languages, I wouldn't mind it. I agree that whatever we do, we should make it consistent with CAT:Mandarin Chinese, CAT:Gan Chinese, CAT:Xiang Chinese, etc., (which means the easiest thing to do is to have CAT:Hokkien Chinese). — justin(r)leung _{{ (t...) | c=› }} 18:09, 19 September 2023 (UTC)Reply
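The 3-2 versus 3-3 code formats debated above can be illustrated with a minimal Python sketch. It assumes, following the description in this thread, that a standard etymology code is a base code of two or three lowercase letters followed by hyphenated sections of exactly three lowercase letters. The names `STANDARD_ETYM_CODE`, `PROPOSED_RENAMES`, `is_standard_etym_code` and `nonstandard_codes` are hypothetical and are not part of any Wiktionary module; the rename table only restates the mapping proposed above.

```python
import re

# Assumption: a "standard" etymology-only code is a base code of two or
# three lowercase letters plus one or more hyphenated sections of exactly
# three lowercase letters (e.g. "nan-xia"). This mirrors the description
# in the thread, not an actual Wiktionary validation rule.
STANDARD_ETYM_CODE = re.compile(r"[a-z]{2,3}(?:-[a-z]{3})+")

# Renames proposed in the thread: current 3-2 code -> proposed 3-3 code.
PROPOSED_RENAMES = {
    "nan-xm": "nan-xia",
    "nan-ph": "nan-phi",
    "nan-qz": "nan-qua",
    "nan-zz": "nan-zha",
    "nan-jj": "nan-jin",
}


def is_standard_etym_code(code: str) -> bool:
    """Return True if the whole code matches the assumed 3-3 pattern."""
    return STANDARD_ETYM_CODE.fullmatch(code) is not None


def nonstandard_codes(codes):
    """Return the codes that do not match the assumed pattern, in order."""
    return [code for code in codes if not is_standard_etym_code(code)]
```

Under these assumptions, every current code in the table fails the check and every proposed replacement passes it, which is the consistency argument made for the 3-3 format; the sketch says nothing about which format is easier to memorise.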

Slavic phylogeny

Old Slovak ?

How about adding code for the Old Slovak (zlw-osk) as well. In the same {{R:sla:ESSJa}} (ЭССЯ), especially in recent editions, Old Slovak is constantly listed separately. In this case, etymology-only code is sufficient. --ZomBear (talk) 07:32, 21 March 2022 (UTC)Reply

@ZomBear @Thadh @Sławobóg @Vininn126 What is the current state of this? I notice that Middle Russian is an etym-only language of Russian and has two codes zle-mru and zle-oru, which looks very suspect. I also think Middle Polish has in fact been made an etym language of Polish. Benwing2 (talk) 06:24, 19 September 2023 (UTC)Reply

I still believe that at least there should be an etym-code for the Old Slovak language. It is also necessary to combine Czech & Slovak into the “Czech–Slovak family” in Slavic languages tree, as was done with Lechitic (zlw-lch) F. ZomBear (talk) 06:54, 19 September 2023 (UTC)Reply

I know that Sławobóg also wanted to split Old Slovak. As to grouping them and giving them a family lang code, I'm not sure. Perhaps Moravian should also be split and placed in this family. @Zhnka Vininn126 (talk) 07:51, 19 September 2023 (UTC)Reply

I'm pretty certain that this question isn't as straightforward as you make it out to be, and I read on multiple occasions that the similarities between Standard Czech and Standard Slovak arose due to Czech's influence on Slovak and that dialectal evidence shows no evidence of genetic relationship closer than on the West Slavic level. So I would like a more detailed discussion on this. Thadh (talk) 08:10, 19 September 2023 (UTC)Reply

@Benwing2 @Thadh @ZomBear @Sławobóg I was reading up on w:sk:Dejiny slovenčiny#11. až 18. storočie, and it seems like there were huge phonological and grammatical changes, IMO upon reading it enough to split Old Slovak into an L2. There also appears to be a dictionary Historický slovník slovenského jazyka that could be used as a source. So I propose that we split Old Slovak. Vininn126 (talk) 10:15, 1 October 2023 (UTC)Reply

Also @Zhnka for the tactical ping. Vininn126 (talk) 10:49, 1 October 2023 (UTC)Reply

Support. @Vininn126 I just created a template for Historical Dictionary of the Slovak Language {{R:sk:HSSJ}}. It contains more than 70,000 words from the pre-literary period (before the 18th century) of the Slovak language. This is a really good source for Old Slovak. ZomBear (talk) 11:29, 1 October 2023 (UTC)Reply

@ZomBear We should be careful, however, Old Slovak is best described as 9th-14th centuries. Vininn126 (talk) 11:33, 1 October 2023 (UTC)Reply

@Vininn126 it’s just great what’s in this dictionary, when quoting, the year or century when the word was recorded is indicated. For example, voda (“water”), it can be seen that the oldest evidence for this word is 1473, 1585 and 1376. ZomBear (talk) 11:50, 1 October 2023 (UTC)Reply

Support. Sławobóg (talk) 13:14, 1 October 2023 (UTC)Reply

I have split Old Slovak and given it the code zlw-osk. Vininn126 (talk) 19:26, 3 October 2023 (UTC)Reply

East Slavic codes

Following up a long discussion on the Old East Slavic About: page, I'd like to propose the following splits:

Split off Old Ruthenian (zle-ort)
Set Old Ukrainian (zle-ouk) and Old Belarusian (zle-obe) as etymology-only descendants and labels of Old Ruthenian
Set Ukrainian (uk), Belarusian (be) and Rusyn (rue) as descendants of Old Ruthenian
Change Old Russian (zle-oru) to Middle Russian (zle-mru) and set this as a label of Russian (ru)

On the final point there was quite some discussion, and I personally support making Middle Russian a full-fledged code, but since we couldn't reach consensus, I propose making that a separate discussion if need be.

The proposed historical borders of the languages are as follows:

Old East Slavic (until the 14th century)
Middle Russian (=Moscow Literary language; 14th century-18th century)
Old Ruthenian (='West Russian' Literary language; 14th century-19th century)

Pinging @Atitarev, ZomBear, Useigor, Ентусиастъ, Benwing2, Rua, Ogrezem. I apologise if I forgot anyone. Thadh (talk) 12:43, 2 March 2022 (UTC)Reply

I still support only the introduction of Old Ruthenian, which is missing but as before, I don’t claim to be an expert on the matter. The Russian corpus in the other discussion was helpful. When I filtered on “Middle Russian”, I think I was able to find a couple of words, which are now considered obsolete. The rest were words, which just need to be respelled to find quotes in (early) Modern Russian. I found a few different ways to abbreviate and also numerous misspellings. Overall I sort of feel why these additional splits are not so popular - little strong evidence to work with. Middle Russian may be allowed to be added, let’s just look for good cases.

To make decisions easier, why don’t we add a couple of specific examples for each new language code proposed - something to work with. (They can be vocab, grammar or pronunciation cases). The proponents should have examples in mind to make the case(s) stronger. We can work together on confirming or disputing those cases. --Anatoli T. ^{(обсудить}/^вклад) 22:57, 2 March 2022 (UTC)Reply

I'll see if I can make a list of features that distinguish Middle Russian from (Modern) Russian. In any case, for the time being, treating Middle Russian like Old East Slavic makes little sense to me, especially if we're splitting off Ruthenian (otherwise we get some kind of Dutch-Afrikaans situation), so we could go ahead with that now and in the meantime continue discussing MR's position as a separate code. Thadh (talk) 23:30, 2 March 2022 (UTC)Reply

(edit conflict) You can use any of the examples already in discussions used as evidence, e.g. онтарь/оньтарь, агистъ, etc. BTW, I see that "Old Russian" was used incorrectly by ZomBear when actually talking about Middle Russian. "Old Russian" = "Old East Slavic". The Russian term for Middle Russian is старору́сский (starorússkij) but Old East Slavic (Old Russian) is древнеру́сский (drevnerússkij). --Anatoli T. ^{(обсудить}/^вклад) 00:21, 3 March 2022 (UTC)Reply

Quick update, I've found a relevant discussion from three years ago, Wiktionary talk:About Russian#Middle Russian?. Also, The Russian Language before 1700 (Matthews 1953) argues your and Fay Freak's point (that Middle Russian is too similar to modern Russian to warrant a linguistic distinction). Fun point: it also provides съмьрть's accentuation :0. I'll still look for differences in the corpora, but if the languages are too similar I guess I don't mind keeping the two together - as long as the descendants sections don't get too cluttered, I'm fine. Thadh (talk) 00:02, 3 March 2022 (UTC)Reply

BTW, I didn’t get back to you on the concern I have in regards to introduction of word stresses in Old East Slavic. My reason being there are many cases where assumptions can go wrong based on descendants. We should only use referenced data. Well, we don’t have native speakers to prove us wrong, do we? —Anatoli T. ^{(обсудить}/^вклад) 23:03, 2 March 2022 (UTC)Reply

Sure, but of course we can still use sound laws for words without referencing the specific word's reconstruction. A word like съмь́рть will have the stress on the second syllable, because otherwise the Russian term would be something like **со́мерть rather than сме́рть. However, I wouldn't know where to look for any reference on this specific word, and googling "съмь́рть" returns no results. Thadh (talk) 23:30, 2 March 2022 (UTC)Reply

Of course, there could be strong (?) assumptions on vowels, which became silent (i.e. they are unstressed) but I wouldn't be so sure even on e.g. вода́ (vodá) (if it weren't referenced), since the word is stressed on the first syllable in some Ukrainian dialects, if you know what I mean. --Anatoli T. ^{(обсудить}/^вклад) 00:21, 3 March 2022 (UTC)Reply

@Thadh: I support your suggestions. Ентусиастъ (talk) 16:19, 3 March 2022 (UTC)Reply

I have already spoken before. I'm for it too.--ZomBear (talk) 00:57, 4 March 2022 (UTC)Reply

@Thadh: Unfortunately, I see that the discussion has stopped again. It's been almost a month since anyone has written anything. Every day I look forward to the solution of this issue with the Old Ruthenian language. --ZomBear (talk) 07:32, 21 March 2022 (UTC)Reply

Done. What we need now is to split all pages into either Old East Slavic, Russian (with the Middle Russian label) or Old Ruthenian (with or without the Old Belarusian/Old Ukrainian label). Thadh (talk) 18:43, 21 March 2022 (UTC)Reply

I also removed Old Novgorodian as a child of Old East Slavic. Vininn126 (talk) 08:52, 4 October 2023 (UTC)Reply

@Thadh how about adding more etymology-only language codes? Modern dictionaries use more than just Old Belarusian/Ukrainian. I saw Middle Bulgarian, Old Slovak, Old Slovene, Old Serbian, Old Croatian, Old Serbo-Croatian, Old Bulgarian, Old Upper Sorbian, Old Lower Sorbian. Possibly Middle Czech and Middle Polish would also be useful sometimes. Old Sorbian was also used by Boryś (Old Sorbian peleš as cognate for Polish pielesze); however, we can't just link to both Lower and Upper Sorbian at once, so that would require full support for this language (?). Scientific publications mention Old Polabian as the language of the Polabian Slavs in the Middle Ages. It is usually used for proper nouns like given names, theonyms and toponyms, sometimes for ordinary words mentioned in Latin texts, and it is always a reconstructed language; I would like to have it, though. Sławobóg (talk) 14:32, 28 May 2022 (UTC)Reply

@Sławobóg What I'll need from you in order to determine whether the splits are worth it is:

- Exact boundaries of the languages' stages

- You need to check how much literature there is in the earlier stages of the language.

- You need to check how much the languages differ from their modern stages.

Once you do that, we can continue the conversation about splitting them. It seems pointless to split a language off just because there are two inscriptions in some dusty old book. Thadh (talk) 15:15, 28 May 2022 (UTC)Reply

@Thadh: IMO Middle Polish would benefit greatly from the split.

Boundaries: As it is with extinct languages, there aren't really any exact boundaries, but it's usually defined as between the 16th and the 18th century; Polish Wiktionary has settled on years 1500 to 1750 to account for Doroszewski's dictionary.
Literature: There are two major corpora, accessible on the SPXVI and ESXVII websites.
Differences: I reckon the spelling and pronunciation differences, especially the employment of "slanted vowels" (samogłoski pochylone, I have no idea what their name is in English), should be enough.

Plus, like, this would help with attestation. Hythonia (talk) 11:08, 30 July 2022 (UTC)Reply

Middle Polish is also thusly defined on Wikipedia. I also think it would make more sense to have Middle Polish as an LDL. The alternative would be having a label. If we split, we'd have to add Middle Polish both to Proto-Slavic descendant lists and as an intermediate stage in etymologies. Vininn126 (talk) 11:52, 30 July 2022 (UTC)Reply

Also pinging @KamiruPL, as an editor for Old Polish. Do you think we should fully split Middle Polish, create a label, or some other alternative? Vininn126 (talk) 13:44, 30 July 2022 (UTC)Reply

@Vininn126: I treat Arabic before the spread of printing in the Arab world, which is from 1800 (Napoléon brought the press to Egypt, which was then a state business that over time was rented by privates who would copy it), as LDL. The reason becomes more obvious for Hebrew where we are eager to include hapax legomena in the Tanakh and due to lacking distinctness of the Modern to the Biblical language, from which the former has been resurrected, have little desire to split. This is in analogy to the split of English from Middle and Old English, where basically the split happens following the new medium of printed books—accordingly if Polish literacy in the same fashion starts only somewhere in the 18th century then we become stricter only then.

Circumventing attestation criteria is no reason to split language headers, as your perception about whether something is another language is the same and only disingenuously modified by that consideration of its description. So more appropriate attestation criteria – and I think of the many carefully collected variants sadly left even unmentioned as a consequence of no sense of proportion applied to the teleology of our rules – by no means should serve motivation to split languages; we can already derive them by the accepted statutory interpretation methods.

To be clear, since legal thinking is unwonted and mysteriously strange to many in spite of people rightly being appointed for it in any society: In this case this is really just systematic interpretation: Since the community authoring the policies was biased towards English but the splits of other languages wrought comparative inconsistency with its situation according to which it has been split by chronolects, we break the criteria down to be suited for the languages they were only roughly devised for. Fay Freak (talk) 09:51, 31 July 2022 (UTC)Reply

In all honesty a label is likely the best option. Vininn126 (talk) 10:05, 31 July 2022 (UTC)Reply

@Hythonia @Sławobóg @KamiruPL I've gone ahead and added Middle Polish as a label. Vininn126 (talk) 12:11, 8 August 2022 (UTC)Reply

I've thought about this more, and I think there might be a case for Middle Polish as an L2. If we agree it should be split, I can help convert the existing entries to Middle Polish.

Here is my reasoning:

Old Polish, Middle Polish, modern Polish, and Silesian are four lects that are hard to separate accurately. Part of this argument hinges on Silesian, which we currently treat as an L2, and I don't see that changing. There are political, historical, and linguistic reasons.

===Why Silesian should be an L2===

Its speakers feel strongly that it is a language, not a dialect. The Polish linguists pushing the view that it is a dialect include Jan Miodek, a notable prescriptivist who pushes more nationalistic views of how languages should be treated, and I believe that treating Silesian as a dialect is done partially to stifle any sense of individuality to further Polish control. However, I recognize that theory has some tinfoil-hat conspiracist vibes to it, so I'll stick to the point that its speakers strongly feel it is a language.
Significant linguistic difference: Silesian has a different phonology to Polish, and other grammatical features, such as retaining the Proto-Slavic aorist in an analytical past tense, as opposed to a more agglutinative/morphological one in Polish. It also recently has undergone strong standardization, as can be seen on silling.org and the ślabikŏrzowy szrajbōnek.
Significant lexical differences: Silesian differs quite a bit from Polish in terms of lexical information. Core inherited words are of course similar, but look at other Slavic languages. It's also been heavily "Policized", but so has Kashubian, which we also treat as an L2 and is recognized as a separate minority language in Poland, and both Kashubian and Silesian are recognized by ISO and Glottolog.
Finally, the key point to the overall argument: Silesian is a descendant of Middle Polish. Most claims that it is Czechoslovakian are refuted by Silesian philologists.

===Why Middle Polish should maybe be an L2===

So if we decide that Silesian is an L2, that would give Middle Polish multiple descendants. This would "fix" many inherited etymologies, such as wszystek. This would also fix Latinate borrowings, where Silesian inherited an older pronunciation of Latinate words, and also the chain generally works better as Learned borrowing into Middle/Old Polish -> Polish + Silesian, as opposed to setting multiple Learned borrowings.

Furthermore, Middle Polish was significantly different from Modern Polish in terms of phonology and grammar (I recently updated the Middle Polish Wikipedia page). In terms of lexical content - there were significant shifts, I would say less than the standard differences between Slavic languages, but there were still trends, and dictionaries such as {{R:pl:SXVI}}, {{R:pl:SXVII}}, and occasionally {{R:pl:SJP1807}} or {{R:pl:SJP1900}} would be key in this. Furthermore, Middle Polish is otherwise resource poor, and should be treated as an LDL, label or not. Having it as an L2 is cleaner in terms of citations.

If we agree that this should be done, I would recommend setting the cutoff dates as c. 1500-c. 1780, with a language code of zlw-mpl. Vininn126 (talk) 12:39, 24 April 2023 (UTC)Reply

@Atitarev@Fay Freak@Hythonia@Sławobóg@Thadh@ZomBear@Ентусиастъ Vininn126 (talk) 17:30, 24 April 2023 (UTC)Reply

Update: there is debate as to whether Silesian should be listed as from Old Polish or Middle Polish, which really affects the above argument. Vininn126 (talk) 14:53, 25 April 2023 (UTC)Reply

Just flagging up that it's possible to give Middle Polish an etymology-only language code, and to set it as the ancestor of Polish (and Silesian, if desired). This would be a way to keep its entries under the Polish L2, while allowing etymologies to formally mention it. In turn, Middle Polish could have Old Polish set as its ancestor.

Of note is the fact we already have Middle Russian, Old Ukrainian, Old Belarusian, Middle Bulgarian and Early Modern Czech, which are all currently handled in the same way. Theknightwho (talk) 16:14, 25 April 2023 (UTC)Reply

Old Slovak ?

How about adding code for the Old Slovak (zlw-osk) as well. In the same {{R:sla:ESSJa}} (ЭССЯ), especially in recent editions, Old Slovak is constantly listed separately. In this case, etymology-only code is sufficient. --ZomBear (talk) 07:32, 21 March 2022 (UTC)Reply

@ZomBear @Thadh @Sławobóg @Vininn126 What is the current state of this? I notice that Middle Russian is an etym-only language of Russian and has two codes zle-mru and zle-oru, which looks very suspect. I also think Middle Polish has in fact been made an etym language of Polish. Benwing2 (talk) 06:24, 19 September 2023 (UTC)Reply

I still believe that at least there should be an etym-code for the Old Slovak language. It is also necessary to combine Czech & Slovak into the “Czech–Slovak family” in Slavic languages tree, as was done with Lechitic (zlw-lch) F. ZomBear (talk) 06:54, 19 September 2023 (UTC)Reply

I know that Sławobóg also wanted to split Old Slovak. As to grouping them and giving them a family lang code, I'm not sure. Perhaps Moravian should also be split and placed in this family. @Zhnka Vininn126 (talk) 07:51, 19 September 2023 (UTC)Reply

I'm pretty certain that this question isn't as straightforward as you make it out to be, and I read on multiple occasions that the similarities between Standard Czech and Standard Slovak arose due to Czech's influence on Slovak and that dialectal evidence shows no evidence of genetic relationship closer than on the West Slavic level. So I would like a more detailed discussion on this. Thadh (talk) 08:10, 19 September 2023 (UTC)Reply

@Benwing2 @Thadh @ZomBear @Sławobóg I was reading up on w:sk:Dejiny slovenčiny#11. až 18. storočie, and it seems like there were huge phonological and grammatical changes, IMO upon reading it enough to split Old Slovak into an L2. There also appears to be a dictionary Historický slovník slovenského jazyka that could be used as a source. So I propose that we split Old Slovak. Vininn126 (talk) 10:15, 1 October 2023 (UTC)Reply

Also @Zhnka for the tactical ping. Vininn126 (talk) 10:49, 1 October 2023 (UTC)Reply

Support. @Vininn126 I just created a template for Historical Dictionary of the Slovak Language {{R:sk:HSSJ}}. It contains more than 70,000 words from the pre-literary period (before the 18th century) of the Slovak language. This is a really good source for Old Slovak. ZomBear (talk) 11:29, 1 October 2023 (UTC)Reply

@ZomBear We should be careful, however: Old Slovak is best described as 9th-14th centuries. Vininn126 (talk) 11:33, 1 October 2023 (UTC)Reply

@Vininn126 What's great about this dictionary is that, when quoting, it indicates the year or century when the word was recorded. For example, for voda (“water”), it can be seen that the oldest evidence for this word is from 1473, 1585 and 1376. ZomBear (talk) 11:50, 1 October 2023 (UTC)Reply

Support. Sławobóg (talk) 13:14, 1 October 2023 (UTC)Reply

I have split Old Slovak and given it the code zlw-osk. Vininn126 (talk) 19:26, 3 October 2023 (UTC)Reply

Proposal to rename Ottawa (otw) to Odawa


I think Ottawa should be renamed to Odawa; it's the more common English name used to refer to the language nowadays, and preferred by speakers. What do you think? /mof.va.nes/ (talk) 15:47, 15 April 2022 (UTC)Reply

Support —Mahāgaja · talk 07:26, 18 April 2022 (UTC)Reply

Re-merge Kven and Meänkieli into Finnish


@-sche, Chuck Entz, Rua, Tropylium, Hekaheka, Surjection, Brittletheories, Mölli-Möllerö

In the previous discussion on this topic it seems everyone agreed that it's best to merge Kven and Meänkieli into Finnish. However, the discussion was closed without actually merging the codes, and currently we (again) have 40 Kven and 30 Meänkieli lemmas, many of which are also duplicated as Finnish for the reasons discussed there. Has anyone changed their opinion, does anyone have anything to add, or can we actually go ahead and merge the languages?

I guess related to this is also the question of how to handle dialectal morphology of Finnish dialects, but maybe that's a bit out of scope for this discussion. Thadh (talk) 16:24, 23 September 2022 (UTC)Reply

The strongest arguments in favour of splitting them are political and should therefore be ignored. Our task is to best present the most information, and that would best be achieved by merging the three lects. The few dozen new dialectal terms will fit in quite well with the 1250 pre-existing ones. brittletheories (talk) 16:49, 23 September 2022 (UTC)Reply

Incubator says "Wikimedia does not decide for itself what is a language and what is a dialect. We follow the ISO 639 standard." This means that it's up to the agency that grants language codes, not to us, right? Meänkieli and Kven have written standards so they should stay as they are. (In my view, Tver Karelian should also be treated as a language so I could add Tver Karelian words without knowing if they're used in the more usual "vienankarjala" dialect.) Mölli-Möllerö (talk) 19:55, 23 September 2022 (UTC)Reply

The Incubator standards are not the same as our standards. Our language treatment does not strictly follow ISO 639. — SURJECTION ^{/ T / C / L /} 20:33, 23 September 2022 (UTC)Reply

@Mölli-Möllerö: On the Tver Karelian issue, you could also just leave the first parameter of {{krl-regional}} empty or |1=? it, and it will automatically be sorted in Category:Karelian term requests, and I'll be able to add the terms later. Or you could use either {{R:krl:KKS}} or another Viena source, the correspondences are usually quite easy. Thadh (talk) 20:44, 23 September 2022 (UTC)Reply

Wrong. There's a big difference between Wikimedia's administrative needs and the lexical needs of a dictionary. As for written standards: the world is full of languages with multiple written standards: Brazilian and European Portuguese, European and Canadian French, Austrian and German German, etc. We can't let others decide for us- each case needs to be considered on its own. We've chosen to merge languages treated as separate by ISO and recognize languages with no ISO codes. In other cases we've gone with the ISO. Chuck Entz (talk) 20:59, 23 September 2022 (UTC)Reply

For outsiders, Meänkieli (in Sweden) and Kven (in Norway) are languages or rather dialects that have become languages by virtue of being across the border (the Finnish-Swedish border and the Finnish-Norwegian border, respectively). Finnish speakers can easily understand nearly 99% of Meänkieli or Kven, and the main differences are either dialectal features also found in Far Northern Ostrobothnian dialects or (the lack of) recent developments within the past 200 years (in one or the other).

Linguistically they are 100% dialects, but politically both Sweden and Norway respectively have recognized them as separate languages, which is also what their speakers think. A more cynical person might say that they have deluded themselves into thinking their language is not Finnish in order to avoid persecution of Finnish that was prevalent in Sweden and Norway in the 19th and 20th centuries ("Finnish? what Finnish? we're not speaking Finnish, it's Meänkieli/Kven").

However Wiktionary best handles cases like these, I don't know. 200 years is not enough for what is generally a phonologically conservative language for it to become anywhere near unrecognizable. It could be compared to how Karelian is now almost universally treated as a separate language, even though it forms a dialect continuum and has been diverging now for at least about 800 years (ever since the 1323 Treaty of Nöteborg).

Finnish sources almost exclusively consider Meänkieli and Kven to be dialects, even more so when these sources are linguistic-oriented (some other sources take a political stance and recognize that they are considered "minority languages" in their respective countries). — SURJECTION ^{/ T / C / L /} 20:34, 23 September 2022 (UTC)Reply

"The main differences are either dialectal features also found in Far Northern Ostrobothnian dialects or (the lack of) recent developments within the past 200 years (in one or the other)"... and the additional Swedish/Norwegian loanwords found in Meänkieli/Kven, of course. But many of these are also found in Finnish dialects. — SURJECTION ^{/ T / C / L /} 21:37, 23 September 2022 (UTC)Reply

The divergence of Karelian from Finnish, FWIW, almost certainly goes back at least 1200 years (to the archeological / mentioned-in-Novgorod-sources Old Karelian culture). The initial split-off of Northern Finnish dialects is probably about as old too.

What I would think of as the best argument against treating Meänkieli and Kven as languages is that they're not even internally well-defined; typically they're just catch-all terms for "Northern Finnish in Sweden" and "Northern Finnish in Finnmark", with fairly varied dialects encompassed by each. There are some efforts (schoolbooks, etc.) towards a "standard" Meänkieli based on the Torne Valley dialect, but I don't think it could be called actually standardized just yet. I suppose one thing we could do is to document whatever is done on this specifically under "Meänkieli" and leave anything else as dialectal Finnish, but that might be a bit premature still too. --Tropylium (talk) 07:44, 24 September 2022 (UTC)Reply

I would not say that "everybody" agreed on the merger. I didn't. I can only comment Meänkieli but I would not be surprised if similar argumentation would also apply for Kven:

The overall small number of Meänkieli words in Wiktionary only proves that we don't have an active editor in Meänkieli. There seem to be some 30,000 entries in this Meänkieli-Finnish-Swedish dictionary.
The small sample of words we have proves nothing of similarity of the vocabularies. If you study the dictionary I mentioned (press "tutki") you'll find that there are considerable differences between Finnish and Meänkieli. In addition to vocabulary, conjugation of verbs seems to differ (e.g. Meänkieli: tukeat - Finnish: tuet - English: you support).
This article promotes the opinion that Meänkieli is a dialect. However the writers admit that the two are not readily mutually understandable: Finnish-speakers usually understand Meänkieli relatively well, partly because of their knowledge of Swedish, but for Meänkieli speakers Finnish isn't as easy. If we took a Finn who does not know a word of Swedish, they would be lost with a Meänkieli speaker.
This article starts from the maxim that Meänkieli is a dialect of Finnish but finishes with the conclusion that at the end of the day it is the speakers of a language themselves who decide the status of a language/dialect. Meänkieli speakers have made their opinion clear: they want it treated as a language. How competent are we to second-guess their point of view? Has any of us studied Meänkieli more than superficially?

Here is also a link to a Kven-Norwegian dictionary. --Hekaheka (talk) 09:44, 24 September 2022 (UTC)Reply

To be fair all these points would still hold for Ingrian and Savonian dialects, too, and of Ingrian dialects I'm fairly certain no Finnish speaker would readily understand them much better than, say, Izhorian or Karelian. Thadh (talk) 09:51, 24 September 2022 (UTC)Reply

A clear-cut solution would be to stick to ISO. Ingrian has an ISO code, Savo hasn't. Is Ingrian currently treated as Finnish dialect? I think it shouldn't. --Hekaheka (talk) 12:05, 24 September 2022 (UTC)Reply

You're confusing Ingrian (inkeroinen) and Ingrian (inkerin (suomalainen)). The first one is the same as Izhorian and is handled as a distinct language, has an ISO code, and is spoken by the Orthodox Izhorians. The latter one is the same as Ingrian Finnish and is handled as a Finnish dialect, does not have an ISO code, and is spoken by the Lutheran Ingrian Finns. My remark concerned the latter. Thadh (talk) 13:46, 24 September 2022 (UTC)Reply

I've come around to say that I think they should be merged. We don't consider Valencian, Ulster Scots or Lemko (the linguistic case is very similar between those examples and this one) to be their own languages despite political arguments that they should be considered as such (and even some recognition like in the ECRML). We shouldn't do so here either. And don't even mention the whole thing going on with Serbo-Croatian... The general trend on en.wikt seems to be to consider the linguistic argument more important than any political ones (which I can appreciate). — SURJECTION ^{/ T / C / L /} 11:51, 3 October 2022 (UTC)Reply

As a Norwegian, I find it odd that there is a proposal to merge Kven with Finnish - as Kven is an officially recognized minority language in Norway (Finnish is not). I do not agree with this merge, for the following reasons:

At least in Norway, Kven and Finnish are considered separate languages. You are able to get elementary school education and books in Kven (but not in Finnish, as far as I know) - you can even study Kven at the University of Tromsø and receive a bachelor's and master's degree in the language (there is a Finnish one as well, and they are considered two separate degrees). Kven people are considered a separate ethnicity, along with their language, descended from Finns/Finnish.
Political reasons are of course relevant, not just linguistic ones. The average Kven speaker has never set foot in Finland, never studied any Finnish, nor consumed any part of Finnish culture and media (music, literature, etc.). An argument was that Finnish speakers understand 99% of Kven - as a Norwegian I understand up to 99% of Swedish and Danish, but they are not getting merged into one language called Scandinavian (for political reasons).
If merged, then in theory thousands of new Finnish entries on Wiktionary would emerge, in the form of "dialectal" words which are actually Kven words. If someone bothered to add them all (I, stubbornly, might) - then every Kven word and declension would need to be added under Finnish, and certain words and forms which don't even exist in Finnish dialects in Finland would be present. Every Kven word, even if the nominative singular is identical to Finnish, has a separate declension chart, every single one - there would then need to be a separate template to show these (I think Finnish Wiktionarians would be quite annoyed by this).
Kvens in Norway have fought very hard for their language; they have gotten their own language institute with a promotion of literature and culture in the Kven language - erasing their language from Wiktionary and treating it as a dialect of a language they don't even speak would be a huge slap in the face. Finns in Finland who speak a dialect of Finnish also all know standard Finnish; Kven people do not. If a Kven person handed in an essay at a school in Finland, every other word would be marked as wrong or a typo. Supevan (talk) 22:49, 2 November 2022 (UTC)Reply

This entire argument can be boiled down to "Kven is standardized". So are Valencian and Croatian, but we still don't treat them as separate languages. — SURJECTION ^{/ T / C / L /} 14:57, 5 November 2022 (UTC)Reply

@Surjection: Actually, Kven isn't firmly standardised afaik. Thadh (talk) 14:58, 5 November 2022 (UTC)Reply

We should. Supevan (talk) 17:35, 5 November 2022 (UTC)Reply

@Supevan Most of these points were already raised for Meänkieli. I will try to answer them anyways.

1) First, our standard procedure is to emphasise linguistics over politics, even when much more controversial (see WT:Serbo-Croatian).

2) Secondly, and most importantly, you claim all Kven inflection should be incorporated into Finnish. This is false. There is already a ridiculous amount of variation in the inflection of the various Finnish dialects, and none of it is represented here. We simply do not have the capacity to maintain 30 different tables containing dozens of inflected forms. Additionally, natives do not stick to one variety of Finnish but mix standard Finnish grammar with that from various dialects and registers. It would also be naive to assume that Kven speakers all use one well-defined standard themselves. A language with a morphology as rich as that of Finnish leaves much space for variation.

3) You say, "thousands of new Finnish entries would emerge, in the form of 'dialectal' words which are actually Kven words", but this is only true if one assumes Kven not to be a collection of Finnish dialects, which is not a popular opinion among linguists. Besides, only a small number of these terms are exclusive to the Ruija dialects.

brittletheories (talk) 13:46, 27 January 2023 (UTC)Reply

2023

Church Slavonic and Moravian


Technically Old Church Slavonic and Church Slavonic should be two separate languages (?), but we only have the former, probably because of the small number of editors. These languages are always treated as two different languages in etymology. For now, in etymologies and on Proto-Slavic pages (*viňaga), we fake it as Church Slavonic: {{l|cu|асдф}} or Church Slavonic: {{desc|cu|асдф|nolb=1}}. That is not very convenient; we should have a separate etycode for Church Slavonic.

We should also have an etycode for Czech Moravian, which is also used pretty often on Proto-Slavic pages (and in many etymological dictionaries); Serbo-Croatian has templates like that (ckm, sh-kaj, sh-tor). Sławobóg (talk) 12:53, 5 February 2023 (UTC)Reply

@Павло Сарт, Atitarev, Kamen Ugalj, Skiulinamo, Rua, ZomBear, Bezimenen, IYI681, Vininn126 pinging some people that might be interested. Thadh (talk) 13:03, 5 February 2023 (UTC)Reply

Support @Sławobóg I completely agree with you, we need a separate etymological code for the usual Church Slavonic language. I constantly thought about it, why is it not there.. --ZomBear (talk) 19:32, 5 February 2023 (UTC)Reply

Support for Church Slavonic Безименен (talk) 13:45, 7 February 2023 (UTC)Reply

Oppose for Czech Moravian: there would be 20-30 more regional varieties that could spring if one started Balkanizing Slavic languages + I don't want to give food for thought to Z-Russians. There are already talks for forging Novorussian, Transnistrian, or Lipovan Russian in order to justify their expansive aspirations over former Imperial Russian territories. Безименен (talk) 13:45, 7 February 2023 (UTC)Reply

Support for Church Slavonic AshFox (talk) 11:51, 6 January 2025 (UTC)Reply

Support for Church Slavonic, or at least there should be a concrete way to handle non canonical words. Chihunglu83 (talk) 11:55, 6 January 2025 (UTC)Reply

@Sławobóg @AshFox @Павло Сарт, @Atitarev, @Rua, @ZomBear, @Bezimenen, @IYI681, @Thadh

It's been a long time, but rereading this thread, we can see at least 5 people for splitting Church Slavonic. I propose to split Church Slavonic and give it etymology codes for the two variants, which I think best matches consensus by number of votes, even if there is disagreement within that. As to Moravian, I think it would be a safe split, but we only had 2 people speak up on it. I'd like the input of other Czech editors. I'll also add Wiktionary:Language_treatment_requests#East_Lechitic_typology and say that the dialect groups all got etymology codes and it has not led to more codes and has overall been a massive benefit. Vininn126 (talk) 12:11, 6 January 2025 (UTC)Reply

@Vininn126: What two variants do you mean? Russian and Serbian? Russian and Croatian? I think this was always the issue with splitting, because we don't have enough people that could comment on which varieties can and cannot be handled together. Thadh (talk) 16:07, 6 January 2025 (UTC)Reply

Perhaps those in favor of splitting could comment. Vininn126 (talk) 16:15, 6 January 2025 (UTC)Reply

It appears it should have 4 variants. Vininn126 (talk) 09:48, 7 January 2025 (UTC)Reply

I have made zls-chs at Module:languages/data/exceptional. As far as etycodes for the recensions and setting east South-Slavic as descendants, this thread should be expanded. At this time an etycode for Moravian needs more input as well. Vininn126 (talk) 10:24, 7 January 2025 (UTC)Reply

> Etym-codes for recensions of Church Slavonic. AshFox (talk) 12:09, 11 January 2025 (UTC)Reply

I also propose to do away with similar problems in the tree of Slavic languages once and for all. I suggest:

South Slavic:

1. Add etymological code for Old Serbo-Croatian (zls-osh). With a redirect to modern Serbo-Croatian. It occurs regularly in {{R:sla:ESSJa}}.

2. Add etymological code for Old Slovene (zls-osl). With a redirect to modern Slovene. It occurs regularly in {{R:sla:ESSJa}}.

3. Move the Macedonian language to the descendant of Old Church Slavonic, as it was done some time ago with the Bulgarian language.

4. Add etymological code for Church Slavonic (cu-chu). Perhaps even with a division into Russian Church Slavonic (cu-rcu), Serbian Church Slavonic (cu-scu) and others, if any.

West Slavic:

1. Add etymological code for Middle Polish (zlw-mpl). With a redirect to modern Polish or (?). @KamiruPL, Vininn126

2. Add etymological code for Old Slovak (zlw-osk). With a redirect to modern Slovak. It is high time to do it! It occurs regularly in {{R:sla:ESSJa}}. Especially if even Early Modern Czech (cs-ear) was awarded a separate code.

3. Possibly add a family code for Czech–Slovak languages (zlw-csk)? Just like there is Lechitic (zlw-lch) F.

4. It's possible: add etymological code for "Old Sorbian" (see Wendish/Lusatian ?) (zlw-osb)? Perhaps with a redirect to Upper Sorbian or (?).

East Slavic:

1. Rename etymological codes Old Ukrainian (zle-ouk) & Old Belarusian (zle-obe) → Middle Ukrainian (zle-muk) & Middle Belarusian (zle-mbe), respectively. Another user made a similar request about six months ago (Wiktionary:Beer parlour/2022/September#“Old Ruthenian” language). With "Old" in the name, those languages would be "parts" of Old East Slavic until the 14th c. (as indicated on the English Wikipedia).

2. Probably it is worth removing the Old Novgorod from the descendants of the Old East Slavic. Make it a separate and parallel ancient language in the East Slavic subgroup. --ZomBear (talk) 19:32, 5 February 2023 (UTC)Reply

3. Add etymological code for Pannonian Rusyn with a redirect to Rusyn (rue).

PS: LOL, I'm serious, add an etymological code for "Early Proto-Slavic" (sla-ear) (?) with a redirect to Proto-Balto-Slavic (?). Because Wiktionary "for the standard" uses a rather late version of the Proto-Slavic language. And sometimes in the Etymology section it may be necessary to indicate an earlier form, and the presence of a separate etym-code for "Early PSl." would not be superfluous. --ZomBear (talk) 19:50, 5 February 2023 (UTC)Reply

I don't think any "Old Sorbian" is attested. Both Upper Sorbian and Lower Sorbian are attested only from the 16th century, and they were already distinct at that point. In theory there could be a code for Proto-Sorbian, but it would have to be a full-fledged protolanguage, not an etymology-only language. —Mahāgaja · talk 20:17, 5 February 2023 (UTC)Reply

@Mahagaja Yeah, I'm not sure about "Old Sorbian" either. This suggestion is only possible. I relied on the fact that in {{R:sla:ESSJa}} sometimes there are words with abbreviations "ст.-луж."/"др.-серболуж." ("старолужицкий"/"древнесерболужицкий" = translation "Old Sorbian") without specifying where the word belongs - to the Upper or Lower Sorbian language. --ZomBear (talk) 21:09, 5 February 2023 (UTC)Reply

@ZomBear: I agree with most of your suggestions, except for Old Serbo-Croatian and Old Sorbian. Serbs and Croats never had an organized shared language until the 17th-18th century. One could perhaps talk about an Old Serbo-Croatian stage in the development of the Dinaric Slavic complex, but there never was a common language that could be associated with this period (leaving aside the Bosno-Rascian recension of Church Slavonic or Glagolitic Croatian). The same holds in even greater magnitude for Sorbian. Sorbs may self-identify as one people ethnically, but linguistically their languages are noticeably divergent.

PS I also don't see much educational value in copying all the distinctions that you can find in ESSJa. Note that it often gives old spellings that precede various spelling reforms, dialectal forms which don't follow any orthographic standard, morphological variants (like diminutive forms, etc.) which don't contribute much additional insight, local colloquial meanings which are clearly recent innovations, etc. I personally prefer a more concise and economical presentation for reconstructed terms rather than having 10-15 dialectal spellings of Serbo-Croatian or those monstrosities that are given as dialectal variants of Polish/Bulgarian/Slovenian by ESSJa. In my opinion, such information should go on the respective page of the daughter language, rather than bloating the Proto-Slavic Descendants section.

PS2 Early proto-Slavic is a useful designation; however, I don't know exactly where one should draw the border between Early, Middle and Late proto-Slavic and what notation should be applied. Безименен (talk) 13:30, 7 February 2023 (UTC)Reply

As it stands, Middle Polish is listed as a variant of Modern Polish. We do see some significant phonological changes and a few semantic ones as well, however, it's hard to say whether it should have its own code or not. Even if it did, it would certainly be a redirect to Modern Polish, seeing as it's a period of only about 1250 years. (1500-1750). Vininn126 (talk) 13:36, 7 February 2023 (UTC)Reply

@Vininn126: That's 250 years. —Mahāgaja · talk 15:16, 7 February 2023 (UTC)Reply

The one and the two are right next to each other.

Polish Silesian and Silesian


@Shumkichi @KamiruPL The Cieszyn Silesia Polish category has many terms that should probably be moved to Silesian proper. Can we figure out which ones we need to fix? Vininn126 (talk) 12:29, 8 March 2023 (UTC)Reply

Also maybe @Hythonia, @Sławobóg Vininn126 (talk) 12:30, 8 March 2023 (UTC)Reply

Idk where Silesian proper starts and Silesian Polish ends so I don't think I'll be of much help o_ _ _ _ _ _ _ _ _ _ _ _ O Maybe let's just assume they'd all be used in Silesian anyway, and then we can add Polish headers to the few entries that can be considered dialectal Polish after we find some sources later??? Shumkichi (talk) 13:33, 8 March 2023 (UTC)Reply

@Vininn126, Shumkichi Not to throw a monkey wrench into this discussion but ... I read the Wikipedia article on Silesian and it seems there's debate over whether it's a separate language as well as a not-yet-established writing system. Given this, I wonder if it wouldn't be better to unify Silesian and Polish similarly to the way that all Chinese lects as well as Serbo-Croatian are unified. The motivation here is practical: it's significantly more difficult to implement and maintain all the infrastructure for two separate L2's vs. one unified L2, and the minority status of Silesian means it's likely to not get much love as a separate L2 (compare the situation with Jeju vs. Korean and Scots vs. English). Benwing2 (talk) 06:19, 16 March 2023 (UTC)Reply

@Benwing2 I've actually been trying to do some research on this. One problem with that system is the politics involved - there is a considerable Silesian group that considers it separate. I've also been trying to do some research on the pronunciation, and there are some major differences that point to Silesian having come from an older variant of Polish, as opposed to a modern one. And as to the orthography, recently, Ślabikorz śląski was introduced and has been fairly widely adopted, even silling.org has a normalizer - I've included all of this in WT:About Silesian, and I would actually like to go through all the entries and do a major cleanup. I've even been trying to set up other infrastructure. Vininn126 (talk) 09:59, 16 March 2023 (UTC)Reply

As to the fact of it coming from an older variant - there are significant sound differences, such as maintaining distinctions from previous long vowels, having more of a 7 vowel system like in Italian, and some significant grammatical differences like continuing the old aorist in a past tense system that's completely different. Vininn126 (talk) 10:22, 16 March 2023 (UTC)Reply

@Vininn126 I think it's a mistake to conflate whether language A and B are different languages with whether they need separate L2's in Wiktionary. IMO the latter question should be determined by what makes for less work and duplication. If the majority of terms in Silesian are the same as in Polish (which I suspect they are), it might make sense to unify them. The current set of lemmas is non-representative in that it mostly covers lemmas that are different in Silesian. Benwing2 (talk) 15:25, 16 March 2023 (UTC)Reply

@Benwing2 In order to determine that we need more data, and currently there aren't any major Silesian dictionaries aside from Silling, which is relatively new and is currently doing a massive import of words. Currently they are importing a Polish-Silesian dictionary, so based on that alone it would suggest a lot of sharing. However, further work needs to be done to determine how different they really are. As someone who works with it more, I'd say it's not any more different than some of the differences between other Slavic languages, which are remarkably similar. Vininn126 (talk) 15:34, 16 March 2023 (UTC)Reply

@Vininn126: Makes sense, thanks. Benwing2 (talk) 15:42, 16 March 2023 (UTC)Reply

@Benwing2 And I think you didn't understand his point. Silesian is not a dialect of Polish since it doesn't come from modern Polish - they both come from Middle Polish (or you could call it Middle Silesian, it doesn't matter, it's just that Polish's always had more speakers, hence the privileged position of Polish over other dialects). That's why your comparison to Serbo-Croatian makes no sense since S-C. is a single language with most of its officially recognised "varieties" not even being different dialects nor even subdialects but simple local variants with at most a few different words, lol. Silesian and Polish, on the other hand, are full of seemingly small but SYSTEMATIC differences that all add up to them being sufficiently different (more so than e.g. Czech and Slovak, I'd say). And the important thing is that they differ not only in vocabulary but also in syntax.

"If the majority of terms in Silesian are the same as in Polish (which I suspect they are)" - no, they are not the same, and your suspicion is wrong. It's as if you looked at the spelling of some Kashubian words and compared them to their Polish cognates - yes, their orthographies are quite similar but it's just a superficial similarity. Shumkichi (talk) 20:17, 16 March 2023 (UTC)Reply

@Shumkichi Don't get all worked up over this. You didn't even read the first line of my comment: "I think it's a mistake to conflate whether language A and B are different languages with whether they need separate L2's in Wiktionary." Benwing2 (talk) 20:33, 16 March 2023 (UTC)Reply

@Benwing2 I'm not worked up??? And I did read it, that's why I said the orthographies are different, and that's enough NOT to merge Silesian entries with Polish ones. Polish has an official body that regulates its orthography so it can't use two different spelling norms that also differ in pronunciation. Capisci? Shumkichi (talk) 20:55, 16 March 2023 (UTC)Reply

Also, according to your argument, we should merge Czech and Slovak. But KKK, as they say in Polent. Shumkichi (talk) 20:56, 16 March 2023 (UTC)Reply

Alright, let's cool it here. It seems like Silesian is here to stay at least for the time being. Vininn126 (talk) 21:17, 16 March 2023 (UTC)Reply

Renaming Proto-Mon-Khmer to Proto-Austroasiatic


Proto-Mon-Khmer is deprecated. The name of Category:Proto-Mon-Khmer language needs to be changed to Category:Proto-Austroasiatic language, just like how we have Category:Proto-Sino-Tibetan language rather than Category:Proto-Tibeto-Burman language. See the Wikipedia article on Austroasiatic languages to get an idea of why Mon-Khmer is no longer valid, because Munda and Nicobarese are simply regular branches that are sisters of the other so-called Mon-Khmer languages.

The page names can simply be renamed, and the lemmas do not need to be changed. Category:Proto-Sino-Tibetan language is a perfect example of this. The Proto-Sino-Tibetan lemmas are actually all Proto-Tibeto-Burman reconstructed forms by James A. Matisoff, who considers Tibeto-Burman to be a branch of Sino-Tibetan. Now, more scholars are thinking that Chinese is simply another regular sister branch of the various Sino-Tibetan languages out there, rather than its own special branch. Same goes for Mon-Khmer.

So how can this name change be done? Ngôn Ngữ Học (talk) 22:23, 18 March 2023 (UTC)Reply

Formerly:

Austroasiatic
- Munda
- Mon-Khmer (which Shorto reconstructed)
  - (about a dozen branches)

Now the consensus is that the tree has a rake-like structure (per Sidwell):

Austroasiatic
- (about a dozen branches including Munda)

That's why Mon-Khmer is an obsolete term now.

Similarly, with Sino-Tibetan, it formerly was:

Sino-Tibetan
- Chinese
- Tibeto-Burman (which Matisoff reconstructed)
  - (dozens of branches)

Now the consensus among many scholars is that the tree has a rake-like structure with many "fallen leaves" (quoting George van Driem), making Tibeto-Burman obsolete:

Sino-Tibetan
- (dozens of branches including Chinese)

Ngôn Ngữ Học (talk) 22:27, 18 March 2023 (UTC)Reply

Support. If this change happens we should delete Category:Mon-Khmer languages. Benwing2 (talk) 23:41, 18 March 2023 (UTC)Reply

Abstain. I prefer to wait until an actual new reconstruction of Proto-Austroasiatic is published to do the move (see what I wrote at Wiktionary:About Proto-Mon-Khmer), but I do not actually oppose moving now. However, if the move does happen, I would like to see a line like "This reconstruction is from Shorto (2006) for the obsolete concept of Proto-Mon-Khmer, and should not be treated as actual reconstruction of Proto-Austroasiatic, which as of now has not yet fully materialized, and is simply "placeholder" for the actual Austroasiatic etymologies" (probably as a template) added as a warning to every reconstruction item. I very much want the same thing to happen to "Proto-Sino-Tibetan", considering a lot of them are nowhere near actual Proto-Sino-Tibetan, and the reconstruction items themselves are "icky" to say the least. PhanAnh123 (talk) 01:52, 19 March 2023 (UTC)Reply

@PhanAnh123: Take a look at Sidwell's Proto-Austroasiatic reconstruction and Shorto's Proto-Mon-Khmer reconstruction. Sidwell's inclusion of Munda and Nicobarese had virtually no impact on his Proto-Austroasiatic reconstruction (versus if he had only included the "Mon-Khmer" languages) because he considered Munda to be highly innovative and restructured, with few original retentions from Proto-Austroasiatic. Furthermore, it would be very confusing to have duplicates for both Proto-Austroasiatic and Proto-Mon-Khmer. I would just merge them as Proto-Austroasiatic. Ngôn Ngữ Học (talk) 19:25, 19 March 2023 (UTC)Reply

I have no intention of keeping Proto-Austroasiatic and Proto-Mon-Khmer separated (I consider Proto-Mon-Khmer to be likely a ghost after all); what I mean is that we should either keep the entries as they are until an actual Proto-Austroasiatic reconstruction comes about, or move the "Proto-Mon-Khmer" items to Proto-Austroasiatic but with the warning added. I know what you mean by "inclusion of Munda and Nicobarese had virtually no impact", because like Sidwell, I do think these branches are quite innovative; however, that does not mean I agree to move Shorto's Proto-Mon-Khmer reconstruction to Proto-Austroasiatic without any warning, since Austroasiatic linguistics has progressed quite a lot even outside of those two branches. The vocalism in Shorto (2006) was reconstructed in a very rudimentary way, which the reconstructions of the descendant branches as well as Sidwell's recent "sneak peek" at a Proto-Austroasiatic reconstruction improved upon; furthermore, the syllable structure itself has also changed slightly: it is now thought that a glottal stop was phonetically present in any Proto-Austroasiatic word that ended in a pure vowel (meaning any word ending in *aːj would still have *aːj, but those ending in **aː would automatically become *aːʔ), plus there is the status of *ʄ-, which very much awaits assessment in the actual reconstruction of Proto-Austroasiatic. Like I said, I don't oppose moving, but there must be strings attached. PhanAnh123 (talk) 01:53, 20 March 2023 (UTC)Reply

@PhanAnh123, Ngôn Ngữ Học Such a warning can be added by bot to the top of all entries if both of you agree. Benwing2 (talk) 03:30, 20 March 2023 (UTC)Reply

@Benwing2: Agree, a warning placed by a bot should be sufficient. Also @PhanAnh123, we can use Sidwell & Rau (2015) for some of the basic Swadesh list words, but a full reconstruction of Proto-Austroasiatic is currently being done by Sidwell. It should come out in a few years. Ngôn Ngữ Học (talk) 10:19, 20 March 2023 (UTC)Reply

We are all in agreement then, so obviously now I support moving. With this Munda cognates can be directly added to the entries. PhanAnh123 (talk) 10:29, 20 March 2023 (UTC)Reply

Agree on the support.

~~Abstain~~ Support. I've seen assertions that Mon and Khmer actually form a subgroup within the traditional Mon-Khmer grouping. Of course, it could be something messy as with Indo-European, where we have at least Indo-Iranian and Balto-Slavonic. --RichardW57m (talk) 16:19, 21 March 2023 (UTC)Reply

There is no such thing as a Mon+Khmer grouping within Mon-Khmer. Some classifications propose Eastern, Southern, and Northern groupings within Mon-Khmer, but none of them put Monic and Khmeric together. Please consult the Austroasiatic languages article on Wikipedia for a basic refresher on all the major previous classifications. Ngôn Ngữ Học (talk) 15:04, 23 March 2023 (UTC)Reply

The cited articles do show that their crown group is larger than Monic + Khmeric, but it does look as though we don't need to worry about anyone using 'Mon-Khmer' to denote their (weak) association. --RichardW57m (talk) 11:36, 28 March 2023 (UTC)Reply

Renaming Proto-Hmong to Proto-Hmongic


Category:Proto-Hmong language needs to be changed to Category:Proto-Hmongic language. See Hmongic languages and Hmong language on Wikipedia.
Category:Proto-Mien language needs to be changed to Category:Proto-Mienic language. See Mienic languages and Iu Mien language on Wikipedia.

The Hmong-Mien language tree is like this:

Hmong-Mien
- Hmongic
  - Hmong
  - (dozens of languages)
- Mienic
  - Iu Mien
  - (several languages)

Proto-Hmong thus refers only to Hmong, not Hmongic. There are dozens of Hmongic languages that are not Hmong. They include Hmu, Pa Hng, Bunu, She, and others.

Same goes for Proto-Mienic. Proto-Mien technically refers to Proto-Iu Mien, but does not include Kim Mun, Biao Min, and Dzao Min.

Ngôn Ngữ Học (talk) 22:23, 18 March 2023 (UTC)Reply

Support. If we make this change we also need to rename the families, i.e. Category:Hmong languages -> Category:Hmongic languages and Category:Mien languages -> Category:Mienic languages. This is similar to the change from Category:Korean languages -> Category:Koreanic languages, which was implemented in Jan 2022. Benwing2 (talk) 23:45, 18 March 2023 (UTC)Reply

Support. Theknightwho (talk) 17:57, 1 June 2023 (UTC)Reply

Okinoerabu and Tokunoshima


Discussion moved from Wiktionary:Beer parlour/2023/June.

These are two Ryukyuan languages that we currently call Oki-No-Erabu and Toku-No-Shima, because that’s how they’re spelled in ISO 639. However, literature invariably uses the unhyphenated forms, and they’re also much easier to read.

Could we please therefore rename them to the unhyphenated forms? Theknightwho (talk) 19:39, 4 June 2023 (UTC)Reply

I dislike the EN penchant for glomming Japanese names into long undifferentiated strings, as I find that this instead makes them harder to read, and it erases the distinction between the actual component terms.

In some cases, the resulting interpretation or partial-expansion goes sideways, as we see at w:Tokunoshima, where the English text describes this as "Tokuno Island" -- the no portion is simply the genitive particle の (no), so as Japanese, this is better thought of as "Toku Island".

Name derivation, for those inclined to dive into the details...

The Japanese historical record bears this out, with the first mention in a 699 text as 度感. At the time, this may have been pronounced as something like twokom or dwokom, based on the Middle Chinese readings and known man'yōgana sound values, although some sites render this as toku or doku; it is not clear to me where the ku reading for 感 comes from. At any rate, the no is not part of the base of the name.
For those interested and who can read Japanese, here are several references at the Kotobank aggregator site. Search the page for 度感.
See also this entry at Nihon Jiten, which also lists 度感嶋 as an attested spelling with the pronunciation Toku Shima, further evidence that the base name is simply Toku and that the no is the particle.

That aside, I do see that w:Tokunoshima language lists the alternative rendering "Toku-No-Shima", and the w:Okinoerabu dialect cluster similarly lists the alternative rendering "Oki-no-Erabu". A quick-and-dirty Google hits comparison (including "the" to filter for English hits):

On the English-language web, the allthewordsruntogether renderings appear to be most common. Meanwhile, the Language Subtag Registry based on ISO 639 and maintained by IANA (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) does indeed use the hyphenated descriptors.

Meh. After digging into this some, I realize I just don't care all that much one way or the other. ‑‑ Eiríkr Útlendi │ Tala við mig 22:09, 9 June 2023 (UTC)Reply

Searching on Google Scholar, it seems the unhyphenated forms are more common, but I concur with Eirikr's views that they look worse.

However, I would suggest that if we were to retain the hyphens, the two languages should be renamed to "Oki-no-Erabu" and "Toku-no-shima" (or the rarer "Toku-no-Shima"), since these are more common on Google Scholar, and also because "no" is a particle that shouldn't be capitalised in a proper noun, cf. Southend-on-Sea, Stoke-on-Trent or von, de, etc. in surnames. – Wpi (talk) 11:20, 21 June 2023 (UTC)Reply

Correct language names


Could you correct Juǀ'hoan to Juǀʼhoan, Kwak'wala to Kwakʼwala, and K'iche' to Kʼicheʼ? There's no punctuation in the ethnonyms. If we want to use assimilated English forms, then the latter would be Quiché; I'm not sure about Juǀʼhoan. kwami (talk) 19:16, 13 July 2023 (UTC)Reply

Support. To clarify for people using low-resolution screens: the request is to use the modifier letter apostrophe character ʼ rather than the typewriter apostrophe '; the categories are currently at Category:Juǀ'hoan language (ktz) and Category:K'iche' language (quc). Our usual practice is to use the spelling most common in contemporary English-language discussions of the language. Which is more common in current books and journal articles, Kʼicheʼ or Quiché? —Mahāgaja · talk 19:30, 13 July 2023 (UTC)Reply
Just to be clear, I personally don't care about ASCII substitutions in category names; what I'm concerned about is proper headers in the dictionary entries. But it's fine by me if the two go together.

As for Kʼicheʼ or Quiché, the English-language lit has been moving from the Spanish form to the ethnonym. That's an ongoing trend, though of course not universal (e.g. 'German', 'Greek', 'Armenian' etc.). kwami (talk) 21:15, 13 July 2023 (UTC)Reply

The L2 headers and category names do need to match, at least for readers using tabbed browsing. Otherwise, the categories won't appear in the correct language tab. I think there are also bots that require the L2 header to be the canonical language name in order to work properly. —Mahāgaja · talk 22:20, 13 July 2023 (UTC)Reply

Okay. Works for me. kwami (talk) 22:24, 13 July 2023 (UTC)Reply

@Kwamikagami Normally at Wiktionary we use typewriter apostrophes rather than curly single quotes, and this issue is somewhat controversial, so this change is unlikely to happen without significant further discussion and consensus. Benwing2 (talk) 04:27, 24 July 2023 (UTC)Reply

I'm not requesting quote marks. That would also be incorrect. Rather, since we are attempting to use the endonym, IMO it should be the glottal stop or ejective diacritic that's in the orthography. kwami (talk) 04:41, 24 July 2023 (UTC)Reply

Indeed, no one is advocating curly single quotes. The modifier letter apostrophe is a different character; it's a letter, not a punctuation mark. There are several other language names besides these two that ought to be using it. —Mahāgaja · talk 06:23, 24 July 2023 (UTC)Reply
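The letter-versus-punctuation distinction made above can be checked against the Unicode character database. The following is a minimal sketch using only Python's standard `unicodedata` module; the helper names `describe` and `is_letter` are illustrative and not part of any Wiktionary tooling. It reports the code point, name and general category of the look-alike characters mentioned in this thread.

```python
import unicodedata

# Look-alike characters mentioned in this discussion:
# typewriter apostrophe, curly single quote, and three modifier letters.
CANDIDATES = ["'", "\u2019", "\u02bc", "\u02bb", "\u02bd"]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def is_letter(ch):
    """True if Unicode classifies the character as a letter (category L*)."""
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    for ch in CANDIDATES:
        kind = "letter" if is_letter(ch) else "not a letter"
        print(*describe(ch), kind)
```

The modifier letters fall under general category `Lm`, while the typewriter apostrophe and the curly quote fall under punctuation categories, which is the technical basis for treating them as different characters rather than typographic variants.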

Sarci, for example, which was just moved to its endonym (minus tone marking). But I thought I'd wait to see how things went before attempting a more comprehensive proposal. kwami (talk) 06:27, 24 July 2023 (UTC)Reply

Support - this isn't a matter of using curly quotes vs straight ones; it's a matter of using the correct letter instead of punctuation. We already do this extensively in entries for languages that use it anyway. Theknightwho (talk) 15:39, 24 July 2023 (UTC)Reply

Going through WT:LOL, these are the languages whose names have the modifier letter apostrophe at Wikipedia but the typewriter apostrophe here:


aah: Abu' Arapesh
bcr: Babine-Witsuwit'en
bei: Bekati'
bbj: Ghomala'
bko: Kwa'
byd: Benyadu'
caa: Ch'orti'
crq: Iyo'wujwa Chorote
crt: Iyojwa'ja Chorote
fmp: Fe'fe'
gaq: Gata'
gwi: Gwich'in
ilu: Ili'uun
kek: Q'eqchi
kjb: Q'anjob'al
ktz: Juǀ'hoan
kuk: Kepo'
kuy: Kuuku-Ya'u
kwk: Kwak'wala
lni: Daantanai'
lra: Rara Bakati'
lul: Olu'bo
mgo: Meta'
mhy: Ma'anyan
mlu: To'abaita
mtk: Mbe'
muc: Mbu'
mym: Me'ne
nea: Eastern Ngad'a
nnz: Nda'nda'
ood: O'odham
pav: Wari'
phq: Phana'
poh: Poqomchi'
pqa: Pa'a
quc: K'iche'
rob: Tae'
sda: Toraja-Sa'dan
srs: Tsuut'ina
ssq: So'a
stv: Silt'e
tfn: Dena'ina
tln: Talondo'
tyh: O'du
tzj: Tz'utujil
ulm: Ulumanda'
ulu: Uma' Lung
wmh: Waima'a
xkk: Kaco'
xky: Uma' Lasan
xoc: O'chi'chi'
myn-chl: Ch'olti'

Other languages with typewriter apostrophe whose Wikipedia article uses a different character include:

gez Ge'ez → Geʽez with ʽ (U+02BD modifier letter reversed comma)
hps Hawai'i Pidgin Sign Language → Hawaiʻi Pidgin Sign Language with ʻ (U+02BB modifier letter turned comma)
num Niuafo'ou language → Niuafoʻou with ʻ (U+02BB modifier letter turned comma)
tct T'en → Tʻen with ʻ (U+02BB modifier letter turned comma)
tsl Ts'ün-Lao → Tsʻün-Lao with ʻ (U+02BB modifier letter turned comma)

I support making all of these changes. —Mahāgaja · talk 19:54, 24 July 2023 (UTC)Reply

I oppose these changes. What is the actual benefit? From the above discussion, there are at least three different Unicode apostrophe-like characters involved, which are easily confused, and it will make it significantly harder to type the language names into headers, categories and the like. This is going to be a major pain in the ass for people like me who will have to clean up wrongly-typed apostrophes in language headers in innumerable articles created by IP's and other occasional contributors, who are unlikely to be able to type the right character. Furthermore, even with these changes, the language names in many cases will not actually match their endonym spelling; cf. the proposed Oʼodham, which is actually spelled ʼOʼodham natively with two apostrophes. Similarly, as pointed out by User:Kwamikagami, our spelling of the CAT:Tsuut'ina language doesn't include the tone mark that is present in the native orthography, and wouldn't even with the change in apostrophe. I should add that Wikipedia uses these Unicode chars specifically because Kwami went around renaming all the articles (formerly they used the straight apostrophes), and is not consistent, e.g. the article on the name of the people is still at O'odham with a straight apostrophe. Glottolog uses straight apostrophes for O'odham; so does the Endangered Languages Project. In general, our policy is to use the *English* names for languages; we are not forced to use the exact native spelling. While I agree it's a good idea to approximate the spelling (e.g. avoiding exonyms where possible), I disagree that we have to take this to the extreme of using the "correct" Unicode apostrophes (which I bet you will find native speakers not using in many cases as well). Benwing2 (talk) 20:22, 24 July 2023 (UTC)Reply

Other people's carelessness in using Unicode is no excuse for us to be careless, and anyway, language names can always be inserted by typing {{subst:\|xyz}}, which doesn't involve any non-ASCII characters. Latin a and Cyrillic а look identical in every font and font style too, but substituting one for the other is an error; it's no different with ' and ʼ. —Mahāgaja · talk 07:05, 25 July 2023 (UTC)Reply

I think you're missing the point. We don't include Cyrillic letters in language names, either. Benwing2 (talk) 07:13, 25 July 2023 (UTC)Reply

I know that. My point is that using ' where ʼ belongs is as bad as using Cyrillic letters in Latin-script language names. —Mahāgaja · talk 07:24, 25 July 2023 (UTC)Reply

I would support the changes, but only if they're truly the most used forms in terms of literature. Ideally we'd have people from each community give their opinions here, but alas, we're not afforded that. If the specific respective unicode apostrophe is used in literature, then we can use it here too. I can see the problem with inputting the apostrophes that's been brought up, but let's be real here, how many people are actually working on these languages to where this'd be a serious problem? I feel like this could be fixed with just an about:XYZ page or something. These languages unfortunately don't get enough traction. But again, I'd only support this if it can be proven that they're the forms used in English literature. AG202 (talk) 01:49, 17 August 2023 (UTC)Reply

@AG202 I agree with you, that is one of the points I made above, which has gotten lost in this thread. Benwing2 (talk) 02:08, 17 August 2023 (UTC)Reply

Ahh, got it, missed that, apologies. AG202 (talk) 02:11, 17 August 2023 (UTC)Reply

Hmm... like Benwing, my initial inclination is to oppose this, because the odds of anyone being able to type names with the fancy characters when adding entries is low (and given recent events, I wonder if one or more admins would block people for 'adding wrong language names' if people keep typing the names they're able to type). OTOH, I recognize that we require entries themselves to be input using correct spellings (with accents etc) and not in hacky ways... If we had a system like the French Wiktionary where no-one had to type the language names (instead only typing language codes, which only consist of easily-typeable ASCII characters), then changing the displayed character would be less of a problem (though still hard for navigating to categories, etc). Do we have a template with a simple short name people could subst: to produce the untypeable names, so they could write =={{subst:langname|foo-bar}}== to get ==Fooʾbar==? Or if we took this type of functionality and had a button people could periodically press (hosted on here like that Javascript is, not as a Python script on the computer of a user who might leave the project or be too busy to run it) that would search the database for instances of the typeable names and update them to the untypeable names, then it would be less of a problem (although it'd still be creating an unending maintenance task). - -sche (discuss) 16:22, 16 August 2023 (UTC)Reply
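The "search for the typeable names and update them" idea could look roughly like the sketch below. It is an illustration only, not an existing gadget or bot: the mapping table `TYPEABLE_TO_CANONICAL` and the function `normalize_headers` are hypothetical, and a real tool would presumably derive the mapping from the site's language data rather than hard-coding it.

```python
import re

# Hypothetical mapping from easily typed names (typewriter apostrophe)
# to names using U+02BC MODIFIER LETTER APOSTROPHE.
TYPEABLE_TO_CANONICAL = {
    "K'iche'": "K\u02bciche\u02bc",
    "Kwak'wala": "Kwak\u02bcwala",
}

# Matches a level-2 header such as "==K'iche'==" occupying a whole line.
# Level-3 and deeper headers ("===Noun===") do not match, because the
# captured name may not contain "=".
L2_HEADER = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.MULTILINE)


def normalize_headers(wikitext):
    """Rewrite known typeable language headers to their canonical form."""

    def fix(match):
        canonical = TYPEABLE_TO_CANONICAL.get(match.group(1))
        # Leave headers for unknown names exactly as they were.
        return f"=={canonical}==" if canonical else match.group(0)

    return L2_HEADER.sub(fix, wikitext)
```

Because only whole level-2 header lines are rewritten, apostrophes in definitions, quotations and other running text are left alone; the maintenance burden mentioned above would remain, since the pass has to be re-run as new entries arrive.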

We do have {{subst:x2i}} that will convert the string _> to ʼ, but more helpfully we have (as I mentioned above) {{subst:\}}, which converts a language code to its canonical name. —Mahāgaja · talk 21:55, 16 August 2023 (UTC)Reply

Even with these workarounds, it seems extra work for no gain. There is no rule that says we need to follow native orthography to the T in our English names for languages; otherwise we'd have Deutsch in place of German, and русский in place of Russian, etc. I have seen no arguments that indicate why having these special apostrophes in language names gains us anything except some nebulous sense of "correctness". Benwing2 (talk) 23:07, 16 August 2023 (UTC)Reply

Deutsch is the endonym. What we're talking about here is using the proper Unicode characters for whichever name we decide to use. The apostrophe is a punctuation mark, and the glottal stop is not punctuation. Using the letter for glottal stop is analogous to using en-dashes and minus signs rather than hyphens. kwami (talk) 00:28, 17 August 2023 (UTC)Reply

"Deutsch is the endonym"

Yes exactly. The exonym can have apostrophes while the endonym has Unicode whatever. Nothing wrong with that. Benwing2 (talk) 00:56, 17 August 2023 (UTC)Reply

@Benwing2 I think we’re getting too focused on Unicode. The thing we should care about is what character is actually intended, which isn’t necessarily the same as what they actually wrote. To use an analogy: we don’t lemmatise the palochka with the numeral 1 or Latin l, even though both are probably more common than the actual palochka character, and that’s because we all know that the writer intended to use a palochka irrespective of what character they actually wrote in Unicode. Theknightwho (talk) 02:18, 17 August 2023 (UTC)Reply

@Theknightwho I think we'll just have to agree to disagree here. I don't think the analogy you are making here with palochka is very applicable and you're still missing the point made by User:AG202 about what's the most common usage in scholarly and other English sources. Benwing2 (talk) 02:24, 17 August 2023 (UTC)Reply

@Benwing2 The whole reason I brought it up is as an example of when the most common usage isn’t necessarily an indicator of what’s most appropriate. I’ve also seen plenty of typography mistakes in scholarly sources, too, or fonts that map common characters to a glyph of what is actually intended. You can’t just rely on the codepoint. Theknightwho (talk) 02:27, 17 August 2023 (UTC)Reply

Just to be clear, when I said common usage, I meant what character is actually intended, not necessarily parsing specifically based on codepoints. However, this isn't an easy task for sure, unfortunately. AG202 (talk) 02:49, 17 August 2023 (UTC)Reply

Doesn't matter whether it's the endonym or exonym: the apostrophe is a punctuation mark, and these are not punctuation marks. Yes, we can substitute, and that's common enough. We could also use a hyphen for a minus or a double hyphen for an em dash -- those substitutions are common too -- but that doesn't mean we should do that. We could substitute click letters with exclamation marks and pipes. But if we want Wiktionary to look professional, then IMO we should typeset it professionally, and not use ASCII substitutes just because they're easier to type. kwami (talk) 04:06, 17 August 2023 (UTC)Reply

Ktunaxa, Secwepemctsín


Could we rename Kutenai (kut) to Ktunaxa, and Shuswap (shs) to Secwepemctsín please? The first names are the Anglicized terms for the languages, and are somewhat outdated and/or not in use among speakers. GKON (talk) 22:46, 12 August 2023 (UTC)Reply

@-sche Can you weigh in here? There is nothing wrong per se with having exonyms for languages (we say "German" not "Deutsch" for example), and I note that Wikipedia still uses Kutenai and Shuswap. The main issues in my view are to (a) avoid pejorative terms, and (b) use the most common terms as found in English-language sources. Benwing2 (talk) 23:37, 15 August 2023 (UTC)Reply

For Shuswap, almost no-one uses Secwepemctsín in English, either in books overall as tracked by Ngram Viewer, or in reference works about the language at Glottolog. For kut, Kutenai was the main name (in reference works/Glottolog and overall/Ngrams) until a few years ago, when Ktunaxa started to just barely overtake it. - -sche (discuss) 17:45, 16 August 2023 (UTC)Reply

That is true; however, I would argue that for Shuswap, the use of this term is declining, as seen in Ngram. The replacement looks like it will be Secwepemc, which is another word for the language and a good middle ground between Shuswap and Secwepemctsín, wouldn't you say? Also, the actual communities in Secwepemc traditional territory mostly use Secwepemc. For example, if there is some quote or phrase on a billboard in Shuswap, the billboard will say that it's in Secwepemc. Another real-life example was a board in Banff town, which had greetings in multiple languages. Among them were Blackfoot, Stoney, Ktunaxa, and Plains Cree; apart from Ktunaxa, these are all Anglicized terms. However, the greeting in Shuswap was said to be in Secwepemc.

Shouldn't we be using this term, seeing as it gets the most use in these modern times? GKON (talk) 17:09, 20 August 2023 (UTC)Reply

Akan varieties


@-sche This is another mess. Wikipedia has an article Akan languages yet according to both Glottolog and Ethnologue, all varieties are mutually intelligible and better classified as dialects, and indeed we have a single Category:Akan language (code 'ak'). The correct family tree seems to include a top level division into Fante, Twi and Wasa, all of which have ISO 639-3 codes (respectively fat, twi, wss; and Twi has the ISO 639-1 code 'tw' as well). Twi in turn is divided into Asante, Akuapem and Bono. Fante and all three Twi varieties have their own literary standards, and there is also a unified Akan literary standard based primarily on Akuapem. Up until recently, we had {{dialectboiler}} categories for Fante and Twi, called Category:Fante Akan and Category:Twi Akan. I added etym-only varieties for those two as well as for the Twi lects of Asante, Akuapem and Bono. Then I discovered we also have separate languages under Akan for Category:Abron language (= Bono), Category:Wasa language and Tchumbuli (which has no lemmas, and I have no idea what it is). None of these Akan languages have very many lemmas (< 10 each), and as mentioned Tchumbuli has none. I would recommend either we convert Akan into a family and fix up the hierarchy appropriately, or (preferably) we maintain the single Akan language and convert the sublanguages into etym-only varieties. The list of varieties under Category:Akan language is also somewhat messed up (e.g. what is 'Twi-Fante'?), but that is less important. Benwing2 (talk) 18:10, 17 September 2023 (UTC)Reply

Looking into the history (of the codes, on Wiktionary), I think the sub-dialects simply escaped notice at the time Twi, Fante, and Akan were merged. I note that the two Wasa entries we have are identical to Akan, and the Abron ones are very similar. I would merge them; AFAIK the difference was historically in spelling, not in speech, and since the 70s also not anymore in spelling. (I entered the Abron entries a year before the lects were merged, using a reference published two years before the speakers of Abron and the other dialects of Akan unified their orthographies. The Wasa entries were added in 2021 by a Japanese editor, also using an old pre-reform ref, which the user also used for the Akan spelling: we should check what the modern spelling is...) Re "Twi-Fante" being listed as a "variety" of Akan: it was originally listed as an alternative name of Akan; when 'alternative names for the language' and 'names of varieties' were split into being separate parameters, someone must've mis-assigned it. - -sche (discuss) 06:02, 23 September 2023 (UTC)Reply

New language codes for nested Persian translations


Per Wiktionary:Beer_parlour/2023/October#Persian_nested_translations_-_split_or_labelled?

@Sameerhameedy, @Benwing2, @Theknightwho.

New codes and labels, under "Persian" to work with MediaWiki:Gadget-TranslationAdder.js

"prs" - Dari
"fa-cls" - Classical Persian

Considering "fa-ira" for Iranian Persian. Anatoli T. (обсудить/вклад) 05:13, 4 October 2023 (UTC)Reply

Don't we normally use ISO 3166 codes for countries? I'd say it should be "fa-IR". —Mahāgaja · talk 09:24, 4 October 2023 (UTC)Reply

@Mahagaja: Not sure what is right in this case but it must have been done.

Both "prs" and "fa-ira" seem to be working already, but {{t+|اَفْغانِسْتان}} fails to link to fa:افغانستان

Persian:
Dari: اَفْغانِسْتان (fa) (afġānistān)

Iranian Persian: اَفْغانِسْتان (fa) (afġânestân)

Since the code is already working (apart from the interwiki links), automatic nesting should be possible as well.

Need to make "fa-ira" link to "fa" Wiktionary, just like "cmn" links to "zh" Wiktionary. {{t+|cmn|阿富汗}} to zh:阿富汗

Chinese:
Mandarin: 阿富汗 (zh) (Āfùhàn)

@Benwing2, @Sameerhameedy, @Theknightwho: can someone please fix the interwiki link? I think it was @Ruakh who made it work for Mandarin. I'll take a look at nesting. Anatoli T. (обсудить/вклад) 00:07, 13 October 2023 (UTC)Reply
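The behaviour being asked for (a variety code such as "fa-ira" resolving to the "fa" Wiktionary, the way "cmn" resolves to "zh") amounts to a lookup with a fallback. The sketch below is only an illustration of that idea in Python; it is not the actual module code, and the table `INTERWIKI_OVERRIDES` and both function names are hypothetical. The "cmn" to "zh" pair comes from the example above; the "prs" and "fa-ira" entries reflect the requested behaviour, not a confirmed configuration.

```python
# Hypothetical override table: Wiktionary language code -> interwiki prefix
# of the foreign-language Wiktionary that should be linked.
INTERWIKI_OVERRIDES = {
    "cmn": "zh",     # Mandarin links to the Chinese Wiktionary
    "prs": "fa",     # Dari, nested under Persian
    "fa-ira": "fa",  # Iranian Persian, nested under Persian
}


def interwiki_prefix(code):
    """Return the interwiki prefix for a code, defaulting to the code itself."""
    return INTERWIKI_OVERRIDES.get(code, code)


def interwiki_link(code, page_title):
    """Build an interwiki link target such as 'zh:阿富汗'.

    page_title is assumed to be the final page name already, i.e. with
    any vowel diacritics stripped beforehand.
    """
    return f"{interwiki_prefix(code)}:{page_title}"
```

For instance, `interwiki_link("cmn", "阿富汗")` yields `zh:阿富汗`, matching the Mandarin example, and any code without an override simply links to the wiki of the same code.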

Actually the new codes still don't work with the translation-adder. Some changes to Module:languages/data submodules need to happen. Anatoli T. (обсудить/вклад) 00:19, 13 October 2023 (UTC)Reply

@Mahagaja: "fa-ira" is correct per Module:etymology_languages/data Anatoli T. (обсудить/вклад) 00:33, 13 October 2023 (UTC)Reply

Update: @Sameerhameedy: Language code "prs" can now be used for automatic nested translations: Persian\Dari. Just use the language code "prs" in the translation adder. However, I wasn't able to tweak the modules for "fa-ira" or "fa-cls". Anatoli T. (обсудить/вклад) 02:58, 13 October 2023 (UTC)Reply

Splitting Mazurian


I would like to open a discussion about the pros and cons of splitting Masurian as an L2 with the langcode zlw-maz and as a descendant of Old Polish. I would also like to preface this by saying that while I am leaning towards a split, I am not dead set on it. The argument is as follows:

w:Masurian dialects would benefit a lot from having a separate L2. There are significant differences in pronunciation (extra vowels nonexistent in Polish and a loss of quite a few consonants), grammar (different endings from standard Polish), and vocabulary, especially outside the "core" vocabulary. Even a significant number of basic forms end up looking different from Polish, and it has many inflections and conjugations. I could place them in the tables for Polish, but it might get cluttered. I would also like to point out that {{R:pl:SgOWiM}} exists as a good, reliable source for entries.

Problems of splitting - most people do consider this specifically a dialect, even most speakers, and most forms of it today are heavily Polonized. However, at least up until the 20th century it was distinct and much more difficult to understand in comparison to standard Polish. My problem is that some of these differences are so vast that it might not make sense to put them all under Polish. Vininn126 (talk) 21:43, 12 November 2023 (UTC)Reply

A point for not splitting is that some other dialects of Polish, such as Łowicz, might be equally divergent in some respects. So what might be better is including multiple declension tables and the like. (Notifying KamiruPL, BigDom, Hythonia, Tashi, Sławobóg), @Benwing2, @PUC, @Thadh Vininn126 (talk) 12:58, 14 November 2023 (UTC)Reply

Here is some sample text: The Little Prince in Mazurian. This channel has some other examples. As someone with high proficiency in Polish, I can understand large parts of it but there's also a significant portion that is very difficult, maybe 65% for me. Vininn126 (talk) 17:28, 14 November 2023 (UTC)Reply

@Vininn126 As you know, I tend to lean towards not splitting in cases of doubt, while Thadh leans towards splitting. Comparisons to multi-dialect languages like Occitan and Ancient Greek might be useful. In this case I don't know, but I think we're hampered by the lack of standardization. Benwing2 (talk) 23:06, 14 November 2023 (UTC)Reply

@Benwing2 There is a notation system widely used for Masurian, present in the Wikipedia article, that I'd be able to use for WT:About Masurian if split. Also, using this system would yield 1) a different pagename; 2) a different pronunciation section (as the notation system is based on the different pronunciation); 3) a different definition section, at least outside of "core" vocab, and core vocab would only share 1-2 defs, as opposed to all of the obsolete senses as well; 4) a different conjugation/declension section as well. Vininn126 (talk) 08:53, 15 November 2023 (UTC)Reply

I’d be in favour of the split. As a native Polish speaker I find it difficult to understand some Mazurian texts, and e.g. parts of this Mazurian rendition of Colors of the Wind, Farbi Zietrżu, would be straight-up ungrammatical in Polish (the infinitive construction in cÿsz ti słicháł zilkä wicz ‘have you heard the wolf howl’, which looks more Czech than Polish and in Polish would have to be reworded as ‘czyś ty słyszał jak wilk wyje’ or ‘wycie wilka’ or something; the infinitive doesn’t work).

Also I’ll note that Mazurian also keeps some phonemes long gone from standard Polish (like the /r̝/ phoneme written rż in the song above which Polish merged with ż /ʐ/).

And, @Vininn126, could you include me too in Polish-related discussions when pinging people? I feel left out ;-) // Silmeth ^@talk 12:28, 15 November 2023 (UTC)Reply

@Silmethule I can add you to the Polish ping group. Yes, the completely different set of phonology and grammar are both big points for me. Masurian also keeps reflexes of Old/Middle Polish pochylone vowels while getting rid of quite a few consonants. Reading up on the Wikipedia article, quite a few experts also claim it's a language. Vininn126 (talk) 14:16, 15 November 2023 (UTC)Reply

Having boned up more on Polish dialectology, I'm definitely leaning more towards a split now. I haven't been able to find another dialect (that we would mark as such) as divergent as Masurian. There's also a big gap in mutual intelligibility. Vininn126 (talk) 15:48, 20 November 2023 (UTC)Reply

I'll also add I was wrong in the original post - the Masurians had a stronger sense of identity even more so than the neighboring regions. Vininn126 (talk) 16:47, 20 November 2023 (UTC)Reply

I'm still wavering, upon listening to more recordings. It might be possible to automatically generate pronunciation sections (even though they would be very, very different), and then it would just be a matter of giving special definitions a label and then I suppose conjugation/declensions... Vininn126 (talk) 09:17, 28 November 2023 (UTC)Reply

I'm no expert, but from what I've read here and elsewhere, I'm inclined to split. We need to decide on whether we want to spell it Masurian or Mazurian; bgc ngrams suggests Masurian is somewhat more common in English. —Mahāgaja · talk 21:20, 4 December 2023 (UTC)Reply
@Mahagaja Definitely Masurian - I write Mazurian because in Polish it's with a z. This is 2.5 votes for splitting - I'd be willing to write up an about page explaining the normalization. Vininn126 (talk) 21:24, 4 December 2023 (UTC)Reply

@Silmethule @Mahagaja Another question would be the langcode. Is the one I proposed best? I doubt it. At this point I'm fairly sure we are splitting. Vininn126 (talk) 13:48, 7 December 2023 (UTC)Reply

@Vininn126 Depending on the choice of Mazurian vs. Masurian, it should be zlw-maz or zlw-mas. Benwing2 (talk) 22:21, 7 December 2023 (UTC)Reply

@Benwing2 You're right, so it's probably gonna end up being zlw-mas. Vininn126 (talk) 22:23, 7 December 2023 (UTC)Reply

I'm going to go ahead with this today and make an entry. I've also been able to contact someone educated in this lect and they'll be able to check anything that I (or potentially we, me and him) make. There is a weak consensus it should be split, and if it's handled right I think it will be much better than smushing everything into Polish. Vininn126 (talk) 17:55, 8 December 2023 (UTC)Reply

@Benwing2 @Mahagaja @Silmethule Sorry for all the pings as of late. I figured now would be a good time to take a pause and look at the current state of things after the decision. We currently have 428 Masurian lemmas, Appendix:Masurian pronunciation, Appendix:Masurian Swadesh list, along with various infrastructure. I know this is a lot of material, I ask you to please take a look at some of these and give your input, and I thought now would be a good time before things got too big, and also at this point I am going to slow down.

Of the existing lemmas, I added mostly cognates, so there aren't many words unique to Masuria, but there are plenty of definitions and of course, pronunciations. I haven't been able to do any work with declensions, as Masurian declensions are too complicated for me at the moment, but I can assure you there are plenty of differences.

I also know I gave the impression I was gung-ho for a split, and also for a split for Goral, which isn't the case, I simply found resistance everywhere I went when trying to add Masurian information - some felt it clogged up the main Polish entry, didn't want particular information, other times I heard that it's remarkably different.

Having added all these terms, I can still see it going either way. On one hand, having it split as a language is a view held by some linguists, but not all (always a problem), and I think the orthography we few Masurian editors have been using easily demonstrates the phonemic difference (the template is phonemic except for (literally) 1 or (potentially) 2 phones, those being the ones represented by <ä> (which might be phonemic) and <ÿ> (which I believe is phonemically /i/)).

However, if we merged, as I have seen various reactions to the split, and understandably so, I'd have a few questions.

What would be the best way to represent Masurian pronunciation? We could ignore spelling and put everything under the Polish spelling, using a respelling in the pronunciation module. This is the approach I take with Middle Polish, and it serves me well. For Masurian-only terms (such as szmanta), I'd prefer to keep {{zlw-mas-IPA}}, similar to what we currently have with {{zlw-mpl-IPA}}. However, this leaves us with the issue of <ä> and <ÿ>.

Another potential approach would be to keep the spellings, but I'd be less sure about this, as it works better for British/American English. One potential issue this would solve is the problem of standard Polish definitions absent from Masurian.

One other potential issue is the fact that Masurian would ideally be treated as an LDL. Currently Middle Polish is (not standardly!) treated as an LDL, despite being part of Polish, and it would be a shame to see the potential for someone to RFV all of them (perhaps they won't, but the option exists) and have certain very real terms deleted just because it's considered part of a WDL.

I know there's been a lot of talk about this lately, hopefully there isn't too much fatigue. That is why I decided it might make more sense to review this now and press on later. Vininn126 (talk) 23:40, 18 January 2024 (UTC)Reply

I was asked by Vininn to add my two cents on the issue, so here I go.

I must say I am worried about using language splits in order to circumvent the WT:WDL policy. I understand the frustration of having dialectal terms left undocumented, but there is no way to objectively draw a line between one dialect and another. In the end the smallest unit of a complete language system is an idiolect, and between that and a language family any grouping is ultimately either political or arbitrary.

I'm not sure how to define what is and isn't a language. I would say ISO codes are a good start, and after that splits may be warranted provided that there is abundant literature in the lect, a solid written language, or some major problems in mutual intelligibility... Knowing how Slavic languages are, the last one is probably not the case with these Polish lects. I don't know enough about them to comment on the first two.

With historical lects, a different issue comes up. In my opinion, it is only possible to treat a standard language as a WDL after its standardisation, and so I would prefer lects like Middle Polish to stand separate, like Old Ruthenian, and in my opinion the same should be done with Middle Russian (although this discussion led nowhere). Thadh (talk) 13:26, 19 January 2024 (UTC)Reply

@Thadh As to intelligibility, as mentioned above, I'd say that Masurian (and to a lesser extent Goral) is as intelligible as two other Slavic languages, so somewhat, but also quite difficult for a lot of people. Middle Polish is also the period when standardization really began and, to some extent, solidified. Vininn126 (talk) 13:42, 19 January 2024 (UTC)Reply

@Thadh, Vininn126: regarding mutual intelligibility, my subjective opinion is that Middle Polish is easier for a modern Polish speaker than Masurian (if not because of anything else, then due to exposure in school to 16th and 17th century texts) – but since modern standard Polish does continue the standard that was established during Middle Polish period, I think there’s more to it. Masurian truly feels “foreign”. So if we’re willing to keep Middle Polish as a separate lang, IMO Masurian deserves the treatment too.

But then, regarding the factors of attestation in literature, separate grammar, recognition in separate ISO code, etc. – we’ve merged Classical Gaelic with modern Gaelic langs and it’s still not split – despite having its own ISO code, having very rich literature in 13th–18th centuries, its own grammar schooling tradition, established (if changing in time) spelling conventions, etc. So even if we acknowledge those factors provide good guidance, we definitely don’t always follow it very closely. // Silmeth ^@talk 14:20, 19 January 2024 (UTC)Reply

Proposal for several languages without ISO codes

Tagging @-sche and @Benwing2 who are likely to be interested in this. Here is a list of languages that currently lack ISO codes, with a brief explanation as to why they probably justify an L2 code. In a couple of cases, we're never likely to have more than a handful of entries for the language in question due to the scant number of attestations we have, but I don't think that should be used as justification for exclusion.

Baltic

Splitting Galindian (xgl) into East Galindian (xgl-eas) and West Galindian (xgl-wes).
This seems to have been a genuine mistake by the ISO: "Galindian" refers to two separate extinct languages within the Baltic family, which don't even seem likely to have been part of the same sub-branch. Both are poorly attested, however.
What is there to add in either language? WP says both are "poorly attested", but I'm having trouble finding whether they are actually attested or this is just an editor's euphemism for "not attested". (All I've found so far is a random website mentioning that some placenames are known or inferred for "Galindian".) This would help with deciding whether to just retire xgl, add full codes for East and West, or add etymology-only codes for them. - -sche (discuss) 19:29, 16 January 2024 (UTC)Reply

Creoles and pidgins

Scots-Yiddish (crp-syi)
A Scots-Yiddish creole spoken in the first half of the 20th century. Attestations are scanty, but some records do exist.
I'd like to see good evidence that this is a genuine creole (or even pidgin) rather than Scots with some Yiddish loanwords or simple code-switching. Pidgins rarely arise when there are only two languages in contact, and not all pidgins undergo creolization. —Mahāgaja · talk 07:36, 8 December 2023 (UTC)Reply
Yeah, I don't think we have enough evidence of this being a real, distinct language to add it. (Several of the relatively few works "in" the "language" appear to be inventing, or as they put it, "reimagining" it like a conlang.) - -sche (discuss) 19:05, 16 January 2024 (UTC)Reply

Dravidian

~~Beary (dra-bry)~~
What looks to be a creole between Malayalam and Tulu, with around 1.5 million speakers.
Seems reasonable to add; I can find a couple papers about it ("Linguistic features of Byari Language" and "Beary Language: Descriptive Grammar and Comparative Study") with various vocabulary. - -sche (discuss) 19:38, 16 January 2024 (UTC)Reply

Support —Mahāgaja · talk 19:43, 16 January 2024 (UTC)Reply

Created. Theknightwho (talk) 01:18, 3 February 2024 (UTC)Reply

Malamuthan (dra-mal)
A small tribal language related to Malayalam - we have quite a few of these already, and I see no obvious reason to exclude this one.
I'm having trouble finding any reference works about this; Mikhail S. Andronov (in A Comparative Grammar of the Dravidian Languages and A Grammar of the Malayalam Language in Historical Treatment) speaks of "the Malamuttan dialect". Perhaps we should just wait until someone has content they're wanting to add in this lect, to judge how distinct it is. - -sche (discuss) 19:38, 16 January 2024 (UTC)Reply
@-sche I'm not sure if you've seen it, but pages 37 to 39 of Tribal Languages of Kerala has some information about it, which notes a number of distinctive qualities; not least because they have a very strong tradition of isolating themselves from outsiders. That paper cites a 1981 reference work, but I assume it's in Malayalam. Theknightwho (talk) 14:35, 20 February 2024 (UTC)Reply

Germanic

~~Greenlandic Norse (gmq-grn)~~
A descendant of Old Norse spoken in Greenland until sometime in the 15th century, which diverged likely due to isolation (compare Icelandic and Norn). Some linguistic innovations and conservations have been noted, though the number of attestations is relatively small.
Oppose: This is considered a dialect of Old West Norse, for which we already have a code: non-own. --{{victar|talk}} 19:22, 7 December 2023 (UTC)Reply
@Victar That's an etymology-only code, not a full language code. Theknightwho (talk) 20:22, 7 December 2023 (UTC)Reply
I'm aware. This is a subdialect of a larger dialect. --{{victar|talk}} 20:30, 7 December 2023 (UTC)Reply

My initial inclination is to keep treating this as ==Old Norse== as far as L2s go (or if we really want to, treat it as ==Old West Norse== and upgrade OWN to being attested like Proto-Norse). Various Old Norse dialects including this one have some differences from one another, but I do not know that it makes sense to speak of Greenlandic Norse as a "descendant" of Old Norse when it was contemporaneous and stopped being spoken at around the same time as other Old Norse, and other members of the dialect continuum do not seem to have had trouble understanding it, or at least modern scholars don't (given the uncertainty over whether various texts or inscriptions represent Greenlandic Norse or e.g. the Icelandic dialect of Old Norse, and that it sometimes even comes down to just the shapes of runes rather than anything about which letters or words are used); it seems like we can continue to treat it as a dialect in the dialect continuum. It would be reasonable to add an etymology-only code, for use in various Greenlandic terms' etymologies (since we are extremely free with these, and have ety-only codes even for things like en-NNN vs en-US ... I see we even have "en-US-CA" although this does not appear to be used anywhere and I am going to suggest it be deleted along with Template:User en-us-ca...). - -sche (discuss) 20:12, 16 January 2024 (UTC)Reply

Closing this by giving it the etymology-only code non-grn under Old West Norse. Theknightwho (talk) 01:33, 7 February 2024 (UTC)Reply

Indo-Aryan

Kishtwari (inc-kst)
Closely related to Kashmiri (and sometimes classified as a dialect), but only retains partial mutual intelligibility, and (unlike Kashmiri) appears to be written using the Takri script.
Oppose: I have never seen Ka/ishtwari referred to as anything other than a dialect of Kashmiri, alongside Kohistani, Poguli, Rambani, and Siraji. --{{victar|talk}} 08:32, 8 December 2023 (UTC)Reply
@Victar Poguli has an ISO code, so I’m not sure how much value your assertion has. Theknightwho (talk) 08:42, 8 December 2023 (UTC)Reply
And just because an ISO code exists, doesn't mean we on the project should create a language for it. Often times, village dialects have codes just because someone put out a paper on it, not because it's any more unique than any other dialect on the continuum of dialects. --{{victar|talk}} 09:30, 8 December 2023 (UTC)Reply
@Victar It calls into question the value of your statement that you have never seen it referred to as a language, if you’re putting it on the same level as a lect which does, in fact, have a language code. It also directly contradicts your previous statement as to the weight we should put on language codes. There is also the matter of the Takri script. Theknightwho (talk) 09:44, 8 December 2023 (UTC)Reply
It doesn't contradict my opinion at all. In my experience, particularly when it comes to Indo-Iranian, ISO over-assigns language codes, so trying to give a language code to a dialect when even ISO doesn't is saying something. --{{victar|talk}} 10:22, 8 December 2023 (UTC)Reply
@Victar None of which is relevant to the fact there is evidence it isn’t even written with the same script - please present something more substantive than a personal hunch, or a selective approach to the weight you put on language codes. Theknightwho (talk) 10:29, 8 December 2023 (UTC)Reply
A language written in multiple scripts is practically a hallmark of Indo-Iranian languages and to cite that as a reason to call it a different language would be naive. --{{victar|talk}} 10:39, 8 December 2023 (UTC)Reply
@Victar You’re being highly misleading: when a “dialect” is written in a different script, its speakers do not consider themselves to be speaking the same language, and it’s also highly divergent (to the point where it is tonal, unlike Kashmiri), then it creates a compelling case for separating it out. Theknightwho (talk) 10:44, 8 December 2023 (UTC)Reply
That is such an absurd statement. Script usage is frequently dependent on region and religion. Most literate Kashmiri speakers write in Perso-Arabic but the Hindu population uses Devanagari, regardless of any dialectal differences. Also I can't find any paper that states Kishtwari is any more or less tonal than standard Kashmiri. You're overreliant on a Wikipedia article for your facts. --{{victar|talk}} 11:41, 8 December 2023 (UTC)Reply
@Victar Except this is the Takri script and it is directly related to “dialectal” differences, so your comparison is nonsensical because it shows that script usage in this case is affected by the lect, not other factors like religion. Standard Kashmiri isn’t tonal at all, as you very well know. Theknightwho (talk) 11:48, 8 December 2023 (UTC)Reply
Yes and the Kishtwari dialect is spoken in the region of the Kishtwar Valley, and the use of Takri is regional. Again, no paper I read remarks anything on tone. Unless you can provide a paper, your statement is meaningless. --{{victar|talk}} 11:57, 8 December 2023 (UTC)Reply

@Victar We also have a code for Haryanvi, considered a dialect of Hindi. So should it be removed? Word0151 (talk) 12:48, 8 December 2023 (UTC)Reply
🤷 Plenty of Hindi project users that can decide that. --{{victar|talk}} 01:33, 9 December 2023 (UTC)Reply
Urtsuniwar (inc-unr)
Closely related to Kalasha, but appears to be divergent enough to constitute a separate language with around 70% mutual intelligibility (compare Spanish/Portuguese with 85-90%).
Oppose: Urtsuniwar is a synonym for Kalasha, see Decker (1992). Some speakers just use more Khowar borrowings than others. --{{victar|talk}} 08:32, 8 December 2023 (UTC)Reply
@Victar Patently untrue - numerous references in the sources provided by WP (and elsewhere), and you’ve failed to explain the issue of mutual intelligibility. Theknightwho (talk) 08:45, 8 December 2023 (UTC)Reply
How is it "patently untrue"? Did you read Decker (1992): "Kalasha speakers in the Urtsun Valley sometimes call their language Urtsuniwar." I did explain the "issue of mutual intelligibility" -- speakers of Kalasha use varying degrees of Khowar borrowings. --{{victar|talk}} 09:30, 8 December 2023 (UTC)Reply
@Victar 70% mutual intelligibility is far below the threshold typically used to classify something as a dialect (80-85%) - the fact that one citation says they are the same does not discount the wealth of evidence to the contrary. Theknightwho (talk) 09:44, 8 December 2023 (UTC)Reply
What "wealth of evidence"? The first reference on the Wiki page literally lists Urtsuniwar under "Other Names" for Kalasha, beside Bashgali, Kalashwar, Kalashamon, and Kalash. Shall we make Kalashwar its own language as well? Another reference there is titled, I shit you not, "Kalasha of Urtsun". --{{victar|talk}} 10:22, 8 December 2023 (UTC)Reply
@Victar Insufficient levels of mutual intelligibility, as stated several times. Theknightwho (talk) 10:32, 8 December 2023 (UTC)Reply

Iranian

Gorgani (ira-gor)
An extinct Caspian language attested in the 14th century, which appears to have formed a dialect continuum with Mazanderani. Previous discussion here.
Oppose: The few texts we have in Gorgani are almost indistinguishable from Old Tabari, the ancestor of Mazanderani, and should be considered a dialect of it, not its own language. There are actually more differences between Old Tabari and Mazanderani, but, like Classical Persian and Modern Persian, we treat them as the same language, in large part due to their use of an abjad alphabet. @Fay Freak --{{victar|talk}} 19:35, 7 December 2023 (UTC)Reply

@Victar In all seriousness: given you clearly respect the views of Borjian, how do you explain his apparent change in view from the line you quoted from 2004 and his 2008 paper on Gorgani in which he invariably refers to it as a language (not a dialect)? Theknightwho (talk) 22:43, 7 December 2023 (UTC)Reply
By its only being apparent. If you search for such a distinction. I’ve just looked into the 2008 paper again just for you. Normal(ly) people don’t look upon the statistical distribution of the employment of “language” and “dialect” in previous publications to find “changes in view” of linguists. Their views are rarely that sophisticated that one could make meta publications as one does on philosophers, and even then following such a bright shiny object is not an argument. language has multiple languages like sublanguage, including dialect, and one is not only not always anxious to make a distinction, there is usually nothing gained at all from such a “turf war”. All is language and words, rarely isolects or lexemes. Whether or not something should be treated separately is decided long before you realize you could beat the topics of this dichotomy again to fill your publication history.

In this case the talk of “language”, I may argue, is purposefully misleading people, to market one’s publication career. It’s just much more zhoosh to publish about whole “languages” than dialects. But it’s okay to embellish things a bit since the core message of a paper does not hinge on these concepts. All historical sciences use to be much less exact in their design than that of the jurist who has the peculiar task to weigh or find a balance for a final decision. Like how I formulate etymologies in probability terms is secondary to what information is provided, in other words: it is mostly rhetorics to present the material, the related forms, reconstructions, and bibliography—this is the science, the result is of little practical relevance, unlike in the legal art where in the end you get a sentence or recommend an action. There is a principal misunderstanding of what linguistic papers are about here I can make out. Benwing noticed. You take publications of an author and read them with an exactitude that they don’t provide, with “research results” that they didn’t care about. One could enjoy that there are still naive academics whose subjects are recondite enough for their not bewaring of a lawyer around the corner attempting to misinterpret them. Fay Freak (talk) 00:35, 8 December 2023 (UTC)Reply
@Fay Freak This seems like a very cynical answer, and it’s difficult to see how you’re not simply accusing Borjian of academic dishonesty. Also Benwing2 didn’t add anything on this topic - he simply asked for consensus. Theknightwho (talk) 08:48, 8 December 2023 (UTC)Reply

Nuristani

Zemiaki (iir-zem)
Spoken by around 500 people and related to Waigali, but I'm not seeing any indication it should be treated as a dialect in the literature.
Oppose: Morgenstierne (1974) calls it a dialect of Waigali, and Edelman (1999) is unsure, labeling it "jazyk/dialekt". We should play it safe and treat it like a dialect. --{{victar|talk}} 21:46, 7 December 2023 (UTC)Reply

Tungusic

~~Alchuka (tuw-alk)~~
A language in the Jurchenic branch (i.e. close to Jurchen and Manchu), which went extinct at some point in the 1980s. Records of the language aren't great, but there are a handful of works which go into detail.
~~Bala (tuw-bal)~~
A very similar situation to Alchuka above, though the language may still be moribund.
~~Kili (tuw-kli)~~
Formerly thought to be a dialect of Nanai (a Southern Tungusic language), but now thought to be a Northern Tungusic language influenced by Nanai due to geographical proximity; it had 40 speakers in 1990, and is likely moribund.

With no objections, creating these three. Theknightwho (talk) 18:28, 4 February 2024 (UTC)Reply

Yeniseian(?)

~~Jie (qfa-yen-jie)~~
Likely to be a Yeniseian language (though possibly Turkic), with only a single attestation from the 4th century (though it wouldn't be the first).

In the absence of objections, I'll create this, given the number of potential entries is capped at 4. Given the contention over its affiliation, und-jie is preferable as a code. Theknightwho (talk) 16:57, 4 February 2024 (UTC)Reply

Unknown

Xiongnu (und-xnu)
Attested ~~only via~~ in Old Chinese records of the language, but nevertheless, a handful of terms have been recorded (and we can, at least, make broad reconstructions as to how they would have been read): e.g. the Old Chinese borrowing 谷蠡.

Theknightwho (talk) 16:03, 4 December 2023 (UTC)Reply

Oppose Xiongnu (Old Chinese is Old Chinese). West Galindian is also unattested. Is East Galindian attested outside of borrowings? If not, maybe keep as a substrate language?

Provisional support Zemiaki, Kishtwari, Urtsuniwar, based on the assumption there are no good arguments to keep these together.

Abstain for the others: poorly attested, extinct languages are usually subject to a lot of debate and usually dictionary entries in these don't turn out well, but they at least seem valid. Thadh (talk) 16:25, 4 December 2023 (UTC)Reply

@Thadh The issue with Galindian is that we need to deal with the present situation, since having a single language code for both is simply incorrect. Re Xiongnu, I'm not referring to borrowings - I'm referring to specific records of the Xiongnu language in Old Chinese sources. Theknightwho (talk) 16:30, 4 December 2023 (UTC)Reply

@Theknightwho: Do you mean mentions of terms à la Uindiorix, or do you actually mean texts à la Luwian? Because in the former case, I'm inclined to call it a borrowing rather than an attestation, whereas the second one is fair enough. Thadh (talk) 17:18, 4 December 2023 (UTC)Reply

@Thadh It's a bit tricky - for example, see , where Vovin argues (quite convincingly) that they're inscriptions in Xiongnu which used Old Chinese characters for their semantic values, except for terms that needed to be transcribed phonetically, such as titles or personal names. There's obviously precedent for this - compare Japanese, Korean, Vietnamese etc. Theknightwho (talk) 18:01, 4 December 2023 (UTC)Reply

@Thadh: Discussion will be considerably less confusing if people put their Supports, Opposes and Abstains under each individual case rather than grouping them together at the bottom. —Mahāgaja · talk 18:06, 4 December 2023 (UTC)Reply

@Mahagaja: I had quite general remarks: Living languages - split. Unattested languages - no split. Rest - abstain. I think repeating this ten times is a bit overkill. Thadh (talk) 21:12, 4 December 2023 (UTC)Reply

I'm usually sympathetic to adding extinct language X even if it's only attested as quotations/mentions/etc in old records in language Y, as long as we're sure X was a language (and different from, not just a dialect of, Y or another language). With Xiongnu, it seems like no one is sure which of various unrelated ethnolinguistic families the Xiongnu people and language(s) might have been from, or even if it was composed of multiple ethnolinguistic groups. That last part gives me pause. Are scholars generally in agreement that the attested words from the Xiongnu are all in one language, or is this like e.g. "Loup" where it's multiple different languages? (We currently have Category:Loup B language, but this is questionable and it seems good that we don't have any entries.) - -sche (discuss) 21:15, 4 December 2023 (UTC)Reply

@-sche A lot of that lack of certainty comes from two factors:

Because Xiongnu is filtered through Old Chinese characters, any kind of reconstruction therefore relies on us being able to accurately reconstruct the readings of those characters. This is something that is gradually improving, and - for example - we are in a much better position to make this kind of judgment than Pulleyblank was in the 1960s.
There’s been a huge amount of (understandable) speculation as to whether the Xiongnu and the Huns were one and the same. If I had to put money on it I’d say they probably were related, but I strongly suspect there was a large dialect continuum involved (just as there was with the Mongolian languages a millennium later). However, I’m certainly not proposing we merge Hunnic with Xiongnu or anything as radical as that. What we do know is that the inscriptions which were found were created by the same Xiongnu who are written about in Old Chinese sources, because they were excavated in the old Xiongnu capital of Longcheng in Mongolia, which was discovered quite recently. The question is whether they’re in Old Chinese or Xiongnu, but I’m inclined to agree with Vovin that the evidence suggests the latter.

Theknightwho (talk) 03:36, 5 December 2023 (UTC)Reply

2024

Medieval Greek from Ancient Greek

Please, as in Wiktionary:Beer_parlour/2024/January#Petition_to_upgrade_Medieval_Greek, from Category:Ancient Greek language. (I am sorry that my browser has difficulty reading much of this page.) ‑‑Sarri.greek ^♫ I 09:45, 2 January 2024 (UTC)Reply

Support. The request is to split grk-gkm Medieval Greek out of grc Ancient Greek. Previous discussion at Wiktionary:Beer parlour/2023/March#Medieval Greek. @Fay Freak, Al-Muqanna, Nicodene, Vahagn Petrosyan, JohnC5, Benwing2, -sche, the people who participated in that discussion which (like most discussions at Wiktionary, unfortunately) ended inconclusively. By the way, we've been using gkm as if it were an ISO 639-3 code, but in fact it isn't one. A request was made for that code many years ago, but it's never been approved or denied. Therefore if the split is approved, we need to use the exceptional code grk-gkm. —Mahāgaja · talk 11:10, 2 January 2024 (UTC)Reply

Note: The proposal in question was rejected on Hallowe’en 2023. 0DF (talk) 19:54, 19 June 2024 (UTC)Reply

Support, but only if any editors are willing to clean up the mess left behind by the split, otherwise this should wait a bit. Also, we have to first figure out which of the many modern Greek varieties (Standard Greek, Mariupol Greek, Pontic Greek, Italiot Greek, Tsakonian, etc.) are to be descendants of Medieval Greek, and which shouldn't. Thadh (talk) 11:39, 2 January 2024 (UTC)Reply

I'm fairly familiar with Attic Greek, but not with Medieval apart from what I've read on Wikipedia. The sources that I've typically used for Ancient Greek entries when I used to create them don't cover Medieval. I wouldn't be opposed if you and a team of other people familiar with Medieval want to split it. I don't know if I can be of much use unless there are bugs in modules or something. — Eru·tuon 08:25, 4 January 2024 (UTC)Reply

Thank you. I will "clean up the mess left behind the split", @Thadh. It is only 248 words that need fixing, plus all related Modern Greek (el) etymologies; I have a list of 711 corrections. I do a lot of Medieval Greek at el.wiktionary, please do not worry, I will not destroy anything. I need one week to fix everything. Please, (@Erutuon) also Module:grc-pronunciation, Section Period for Template:grc-ipa-rows, Template:grc-ipa-rows-byz, Template:grc-ipa-rows-koi needs to say 10th century Medieval (or Mediaeval, according to your HomeRules) not 'Byzantine'. Adding med1 and med2 at its /data would also be a nice addition. I am very happy to resume work for med.greek! ‑‑Sarri.greek ^♫ I 04:51, 6 January 2024 (UTC)Reply

I suppose actually the lines for Medieval Greek should be removed from {{grc-IPA}} and moved into a separate {{grk-gkm-IPA}}. Likewise the option for |dial=gkm needs to be removed from all grc inflection tables and new grk-gkm inflection tables created. —Mahāgaja · talk 08:19, 6 January 2024 (UTC)Reply

@Mahagaja, no, not needed. IPA will be with parameter period=byz1 (or period=med1, if Erutuon might give an alias to this parameter). Also: learned medieval inflections are identical to the standard ancient inflections and there is no need to provide them separately. Nothing different. At el.wikt, if we care to repeat them, we add title: learned medieval inflection as in ancient greek. But we shall not provide any of that now. Never mind for vulgar inflections (I'll let you know about these) Thank you for your concern. ‑‑Sarri.greek ^♫ I 08:26, 6 January 2024 (UTC)Reply

We're really not supposed to use one language's templates in another language's entries, so if grk-gkm and grc are two different languages, then we're really not supposed to use things like {{grc-IPA}}, {{grc-decl}}, {{grc-adecl}}, and {{grc-conj}} in grk-gkm entries. And there may still be some differences; for example, does Medieval Greek ever use the dual number? If not then the dual shouldn't be shown in {{grk-gkm-decl}} and {{grk-gkm-conj}} as it is in {{grc-decl}} and {{grc-conj}}. —Mahāgaja · talk 09:10, 6 January 2024 (UTC)Reply

Thank you (sorry, this page makes my Chrome browser show "page unresponsive", and it is often difficult to write here). Thank you @Mahagaja. The code gkm is in wide use, although it is still not activated by ISO; there have been attempts to draw attention to its acceptance, and I will notify you if something changes officially. At el.wikt there are also dialectal gkm‑crt and gkm‑cyp as subordinate codes.
Thank you @Thadh, I will check all instances of insource:xxx and intitle:xxx occurrences of relevant words and correct them. For the update of Module:families/data/hierarchy#Hellenic and Module:etymology languages/data#gkm I submit here (quoted) the official Greek source: Modern Greek Dialects What is a dialect? - Research Centre for Modern Greek Dialects, Academy of Athens

Nowadays we consider as dialects the Pontiac (in which the Greek of Crimea-Mariupol are included), the Cappadocian, the Tsakonian and the Southern Italian. All the other regional variants of the Modern Greek Standard are known as idioms. In particular, the Cretan and Cypriot idioms are exceptionally known as dialects, thus acknowledging an intermediate level of language variation.

All the modern Greek dialects (Cappadocian.cpg, Italiot.grk-ita, Pontic.pnt, which includes the Mariupol idiom) and Modern Greek.el itself come from Medieval Greek, except Tsakonian.tsd, which is a special case. Thank you ‑‑Sarri.greek ^♫ I 13:07, 2 January 2024 (UTC)

A bit off-topic, but most researchers I have read claim Mariupol Greek is, in fact, not a Pontic lect and doesn't share much, if anything, in common with Pontic that it doesn't with other Greek lects. Thadh (talk) 13:34, 2 January 2024 (UTC)

I kinda doubt editors are willing to clean up, or review the dialectology of the Abstandsprachen. The ideological distinction is barely worth the effort for that and for always checking in which chronolect a word has been used, an argument I often use, as we do not go completely without distinction if we don’t split at the L2 level: now it means we write a label if we know and abstain if we don’t bother. The result could become more often that someone doesn’t add a valid entry or etymological note due to fear of making a mistake. Fay Freak (talk) 19:46, 2 January 2024 (UTC)

I oppose the change in name from “Byzantine Greek” to “Medi(a)eval Greek” for referring to this chronolect. I’m undecided about the split itself. @Sarri.greek: Could you point us to some well-developed Byzantine Greek entries in το Βικιλεξικό to give us some idea what they’d look like, and to what extent they’d contrast with Ancient Greek and Modern Greek entries, please? 0DF (talk) 02:19, 7 January 2024 (UTC)

@0DF. _For the term, professors of linguistics might answer your question (ref). _Examples Παραδείγματα at wikt:el:Κατηγορία:Μεσαιωνικά ελληνικά. ‑‑Sarri.greek ^♫ I 08:45, 7 January 2024 (UTC)

┌────────────────────────────────────────────────────────────────────────────────────────────────────┘ @Sarri.greek: Thank you for your response. I'll address the παραδείγματα first.
That category you linked (el:Κατηγορία:Μεσαιωνικά ελληνικά = “Category:Mediaeval Greek”) contains 1,804 entries, so I hope you'll forgive me that I only checked out the first column of entries (from el:ἀβαμπαρλιέρης to el:ἀλλάγιον — 63 pages). Of those, none of the gkm entries contained IPA transcriptions, and the only ones with inflection tables are el:αἰγοβοσκός and el:ἀλλάγιον. Those don't appear to be what I'd call "well-developed". As to contrast, the declension tables in αἰγοβοσκός and ἀλλάγιον are identical to Ancient Greek ones, even including the δυϊκός (duïkós, “dual”). As they are, those 63 entries suggest there would be no benefit to splitting gkm out of grc and that doing so would only create useless redundancy. That being said, I suspect that there could be some value in the split in the cases of entries like el:-άγρα, el:-αινα, and el:-αλγία, which present (currently unseized) opportunities to explain the loss of the accusative -ν, the loss of the dative entirely, and the collapse of the Ancient nominative–vocative plural -αι and accusative plural -ᾱς into the Modern -ες. I also see cases like the Modern Greek entry καλοκαίρι (kalokaíri, “summertime, summer”), which currently traces the word's etymology, via Byzantine Greek καλοκαίριν (kalokaírin, “good season, good weather”), to Ancient Greek καλοκαίριον (kalokaírion, “fine weather”). It would be great to know how καλοκαίριν (kalokaírin) declines; that being said, is there any reason why its declension couldn't be showcased perfectly well as a {{lb|grc|Byzantine}} {{alternative form of|grc|καλοκαίριον}}?
Now to the nomenclatural issue.
I've taken a look at the authority you cited; for the benefit of others reading this, here are its bibliographical details:

David Holton with Geoffrey Horrocks, Marjolijne Janssen, Tina Lendari, Io Manolessou, and Notis Toufexis (2019) The Cambridge Grammar of Medieval and Early Modern Greek, four volumes, Cambridge · New York · Port Melbourne · New Delhi · Singapore: Cambridge University Press

The authors' rationale for their disuse of the term Byzantine Greek is to be found in the introduction to the work, in this paragraph from page xix:

The system of periodization that we have used is not based on external criteria, which might relate to historically significant dates, such as wars, conquest or independence. For this reason we do not employ the term “Byzantine Greek”: for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not “Byzantine” in a political sense. Our criteria are instead internal ones, based on clusters of important linguistic changes that we see as occurring around 1100, 1500 and 1700 (for details see Holton 2010, Holton/Manolessou 2010). Consequently, we employ the following terminology in order to denote sub-periods of the history of Greek, terms that also conveniently correspond to those widely used for periodization in Western historical thought: Early Medieval (EMedG) from about 500 to 1100; Late Medieval (LMedG) from about 1100 to 1500; Early Modern (EMG) from about 1500 to 1700.

Appeals to authority are all well and good, but that is poor reasoning. Yes, politics affect language, and the Byzantine Empire, whilst it existed, was (I think you'll agree) the political, cultural, and linguistic "centre of gravity" of the Greek world. The authors write that “for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not ‘Byzantine’ in a political sense” (my emphasis); however, a person's language doesn't (immediately) change with political borders. Earlier op. cit., on page iii, there occurs the sentence “The geographical area where Greek has been spoken stretches from the Aegean Islands to the Black Sea and from Southern Italy and Sicily to the Middle East, largely corresponding to former territories of the Byzantine Empire and its successor states.” Doesn't that show the centrality of that polity to the history of the Greek language during this period? The authors' reason is weak, and I reject it.
I see another problem here, which is that Holton et al. seem to be treating this chronolect as existing between AD ~500 and ~1700. As you probably know, the Middle Ages (a.k.a. the Mediaeval period) are traditionally bookended by the falls of two Roman Empires, starting with the fall of the Western Roman Empire in AD 476 and ending with the fall of the Eastern Roman Empire (i.e. the Byzantine Empire) in 1453; it's not too much of a stretch to push it later, to 500–1500, but I don't know any informed person who calls the seventeenth century mediaeval, so we couldn't call this chronolect “Medi(a)eval Greek”. Holton et al. are not alone in this, either: on page xviii op. cit. they mention the “dictionary of Kriaras and the Vienna-based Lexikon zur byzantinischen Gräzität”; that “dictionary of Kriaras” is Emmanuel Kriaras' Λεξικό της Μεσαιωνικής Ελληνικής Δημώδους Γραμματείας, 1100–1669 (“Dictionary of Mediaeval Greek Vernacular Literature, 1100–1669”, my emphasis). Maybe the Greek Μεσαίωνας (Mesaíonas) is conceived of differently from the English Middle Ages. It would be possible to call the chronolect “Mesaeonic Greek”, but we'd very much be neologising there; I could only find one instance of meseonic, so the adjective alone wouldn't even satisfy the criteria for inclusion.
Finally, I note that the other dictionary mentioned alongside Kriaras' is entitled Lexikon zur byzantinischen Gräzität (“Lexicon of Byzantine Graecity”), so it's apparent that not everyone rejects the term Byzantine Greek. Indeed, a text search for the string byzantin (case- and diacritic-indifferent) in the bibliography of The Cambridge Grammar of Medieval and Early Modern Greek (which occupies pages xxxvii–clxvi thereof) finds 201 instances. Some of those may be false positives, but that search would also have missed any instances hyphenated across a line break (byz-antin, byzan-tin, vel sim.) or in languages that spell the word bizant- or otherwise. My point is that Byzantine Greek is still a common term and one we should use.
0DF (talk) 09:23, 8 January 2024 (UTC)

A bit of a nitpick, Byzantine Greek isn't any better than Medieval Greek as a label for the language after the fall of the Byzantine Empire. Strictly speaking it wasn't Byzantine Greek at that point, but Ottoman. But either term applies well to the majority of the period. — Eru·tuon 00:42, 9 January 2024 (UTC)

I don't like the term Byzantine Greek because a naive reader could think it referred to a regional dialect rather than a chronolect. It would be easy for someone to think it referred to Greek as spoken in Byzantium as early as the time of Alexander the Great, and that it would not refer to Greek as written in Athens or Alexandria in AD 600. Also, 0DF, Holton et al. explicitly do not call the period from 1500 to 1700 medieval; they call it Early Modern Greek, just as we call the English of the same period Early Modern English. Wiktionary already uses 1453 as the border between grc and el; there's no reason separating grk-gkm out from grc should entail shifting the starting date of el later than it currently is. —Mahāgaja · talk 08:02, 9 January 2024 (UTC)

I don't have much of a stake in this but I also favour Medieval Greek, though I wouldn't be opposed to having Byzantine Greek as an etym-only language attached to it. Theknightwho (talk) 08:43, 9 January 2024 (UTC)

Side issue: if we split Medieval from Ancient, I suppose the Byzantine flag which is currently used for Ancient Greek in the "Add country flags next to language headers" gadget will need to be moved to Medieval Greek, and Ancient Greek will either need a new flag or no flag. - -sche (discuss) 19:48, 12 January 2024 (UTC)

Preferably none. —Mahāgaja · talk 22:23, 12 January 2024 (UTC)

@Mahagaja, Erutuon, Thadh, since I do not see any more objections: _phase_1: I have already cleaned up Modern Greek etymologies involving gkm (70 more remain to be done, also supplying sources, IPA etc.), to be ready for the term Medieval instead of Byzantine. The rename itself is

done by an administrator: Module:etymology languages/data, to be set to "Medieval Greek", with aliases = {"Byzantine Greek"},
after this is done, I can similarly correct the term from Byzantine to Medieval at Module:grc:Dialects, Module:etymology_languages/canonical_names, Module:etymology_languages/code_to_canonical_name, also period=byzByzantine at Module:grc-pronunciation, labels at Module:accent qualifier/data, at Module:labels/data/lang/grc, Template:grc-IPA, Template:grc-ipa-rows, Template:grc-ipa-rows-koi, Template:langname-lite, Template:grc-ipa-rows-byz, Template:grc-IPA/sandbox/table and at texts Module:grc-decl/sandbox/decl/data, Wiktionary:Language treatment, Wiktionary:About Ancient Greek, Wiktionary:About Greek
after that, I can move Etymological and a few other Categories including the term Byzantine to Medieval. Of course -do not worry- I will not change Categories about historical Byzantine subjects, like Byzantine Empire etc.

These steps are for the name change. If you give permission and agree to upgrade it from grc, then in _phase_2 (from Module:languages/data/3/g to Module:languages/data/exceptional) the working alias gkm is already in place, and I will be able to proceed with corrections to the titles of Sections wherever needed, sources, etc., especially where Modern etymologies need a Medieval lemma. Thank you for your help. ‑‑Sarri.greek ^♫ I 10:56, 3 February 2024 (UTC)
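For readers unfamiliar with how a canonical name and its aliases interact, here is a minimal sketch in Python (the actual Wiktionary modules are written in Lua and have more fields). The table layout, the field names `canonical_name`, `aliases` and `parent`, and both helper functions are assumptions made purely for illustration; only the pairing of "Medieval Greek" with the alias "Byzantine Greek" under the code gkm comes from the proposal above.

```python
# Hypothetical stand-in for a language data table. The field names are
# illustrative assumptions, not the layout of the real Lua module.
ETYMOLOGY_LANGUAGES = {
    "gkm": {
        "canonical_name": "Medieval Greek",  # proposed new name
        "aliases": ["Byzantine Greek"],      # old name kept as an alias
        "parent": "grc",                     # assumed: still attached to Ancient Greek
    },
}


def build_name_index(data):
    """Map every canonical name and alias to its language code."""
    index = {}
    for code, entry in data.items():
        for name in [entry["canonical_name"], *entry.get("aliases", [])]:
            index[name] = code
    return index


def canonical_name(name, data=ETYMOLOGY_LANGUAGES):
    """Resolve a name or alias to the canonical name, or None if unknown."""
    code = build_name_index(data).get(name)
    return None if code is None else data[code]["canonical_name"]
```

Under these assumptions, `canonical_name("Byzantine Greek")` resolves to "Medieval Greek", so a lookup by the old name would keep working after the rename, while displayed text picks up the new canonical name.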

There are objections. I would like to add that I too oppose renaming from Byzantine Greek or extending its time frame past the 15th century. Nicodene (talk) 02:17, 4 February 2024 (UTC)

@Nicodene, I have suggested nothing about post-15th-century = Early Modern Greek, which we deal with in polytonic at el.wikt, not monotonic. But we are at _phase_1 now, which is to rename 'Byzantine language' to Medieval Greek. I am glad that you are interested in the periodisation of the Hellenic language; it is rare that non-Hellenists are interested or take time to study this. We can discuss it, if you wish, at our Talk pages. Thank you ‑‑Sarri.greek ^♫ I 02:35, 4 February 2024 (UTC)

(Why not here?)

I see. For the record I do support splitting it out of Ancient Greek, even if the (prescriptively correct, 'learned') inflections are going to be largely the same.

So far I don't see any real argument against the label 'Byzantine'. The point about political control is a bit spurious as the label 'Byzantine' is in no way limited to the political level. It is civilisational.

The point about 'Byzantine Greek' being misinterpretable as 'the dialect of the colony of Byzantion' might be convincing if not for the unlikelihood of someone being simultaneously knowledgeable enough about history to even be aware of the (let's be honest) rather unimportant pre-Constantine city, yet also historically illiterate enough to be unaware of what 'Byzantine' means 99 times out of 100. Nicodene (talk) 02:58, 4 February 2024 (UTC)

┌────────────────────────────────────────────────────────────────────────────────────────────────────┘ @Sarri.greek: Respectfully, I think you're being too hasty with this. I acknowledge I've been slow to respond; that has in large part been due to my work researching Atticism (see it, Citations:Atticism, and some of the word's relations) in connection with the more substantive question of whether there is value in splitting gkm from grc. My understanding of that matter is more-or-less in line with this paragraph from the website for Trinity College Dublin's 2024 International Byzantine Greek Summer School (IBGSS):

Byzantine Greek is the dominant form of Greek written during the Byzantine Empire (AD 330–1453). The spoken language changed significantly in this period and came close to Modern Greek, but most Byzantine authors use conservative forms of Greek that looked back to Classical Attic, the Hellenistic Koine and Biblical Greek. Therefore much of the vocabulary, morphology and syntax of Byzantine Greek are not significantly different from Classical Greek, which makes this course a suitable preparation also for reading Classical literature and the New Testament.

But to the matter of the nomenclature: I had previously been arguing that Byzantine Greek is just as good a term as Medieval Greek, but it appears that they may not be entirely synonymous. Please see the quotations I've collected at Citations:Medieval Greek. You'll see that Evangelinos Apostolides Sophocles uses the term Byzantine Greek (for 330–1453) and remarks that “if the expression Mediæval Greek is to be used at all, it should be restricted to the language of ” (622–1099), whereas Irach Jehangir Sorabji Taraporewala states that “Byzantine Greek is a direct development from the literary dialect of the second transition period ” but that “[L]iterary Mediaeval Greek is a development of the colloquial of the previous (Neo-Hellenic ) period ”; those two sources directly contradict each other on the details, but they both distinguish the two chronolects. Edward Augustus Freeman speaks explicitly of “a literature, mediæval Greek or Romaic, as distinguished from Byzantine” and the writer for UNESCO discusses in a single sentence borrowings into “Byzantine Greek”, “mediaeval Greek”, and “Neo-Greek”; they appear to have particular time periods in mind, but I'm not sure what they are. And George Leonard Huxley refers to “Byzantine Greek” and “mediaeval Greek language and literature” in consecutive sentences, presumably synonymously, but not obviously so. Many more sources use both terms within the same work, without it being clear whether the terms mean different things or whether they're making a distinction without a difference. Can you explain these distinctions? Are they valid? If not, why not? If so, do you propose more than one offshoot to grc? If not, why not? If so, how many, and what should they be?
@Erutuon: I would argue that, in the same way that Greek writers contemporaneous with but geographically outside the bounds of the Byzantine Empire may nevertheless conform to Byzantine literary norms, Greeks writing after the Empire's fall may, from inertia or nostalgia, also conform to Byzantine literary norms, despite the change in their political context. By contrast, the Middle Ages are strictly chronological and have an exact terminus in the 1453 fall of Constantinople.
@Mahagaja: In my experience, Byzantium is used far more frequently to refer to the Byzantine Empire than it is to refer to the city; most people are unaware that the usage is originally a synecdoche and, whilst a lot of people know Istanbul used to be called Constantinople, far fewer know that Constantinople used to be called Byzantium (and fewer still know that Byzantium used to be called Lygos, but I digress). As such, I don't think that it is at all likely that a naïve reader would make that mistake. A mistake I know some people make, however, is with the qualifiers High or Upper and Low or Lower in geographical and geographically-based terms like Upper Egypt vs. Lower Egypt and High German vs. Low German, with High and Upper mistaken to mean "north(ern)" and Low and Lower used to mean "south(ern)"; I assume the confusion arises from the conventional orientation of maps in the Anglosphere. Despite that confusion, I would not, and I doubt you would, advocate replacing those terms with ones less susceptible to such naïve confusion. For another example, I'm sure a naïve reader could mistake Andalusian Arabic for Arabic spoken in the (present-day) Spanish region of Andalusia; the synonym Moorish Arabic is not susceptible to that confusion, so should we use that instead? There are other confusables as well, I'm sure. ⸻ Re Holton et al., I know they don't call Greek 1500–1700 "Medieval"; the fact that I quoted above a paragraph of theirs that ends "Early Modern (EMG) from about 1500 to 1700" should make that clear. My meaning was that Holton et al. are treating Greek 500–1700 as a single chronolect, which they call "Medieval and Early Modern Greek" and which Kriaras calls Μεσαιωνική Ελληνική (Mesaionikí Ellinikí). Holton et al. 
make a point of saying that their “system of periodization…is not based on external criteria” and that their “criteria are instead internal ones, based on clusters of important linguistic changes that [they] see as occurring around 1100, 1500 and 1700”. If we did the same, that might indeed entail shifting the starting date of el later than it currently is.
@-sche: I don't have country flags beside language headers turned on and neither am I inclined to turn them on, but if you're interested in having them, you could use the Argead star (commons:File:Vergina Sun WIPO.svg) for Ancient Greek; the English Wikipedia uses that image in its country infoboxes as the flag of the Empire of Alexander the Great, as well as in many other places.
@Nicodene: I largely agree with you, but if we're going to split out gkm, wouldn't it be better to give the inflections that show the changes taking place between Ancient and Modern Greek? Wouldn't it be rather redundant if they had the same inflectional information as that given in Ancient Greek entries?
0DF (talk) 03:46, 4 February 2024 (UTC)

More than one set of inflections could be shown - the learned and Atticising versus the humble and 'demotic', at least by the time of the Digenes Akritas. Or, working with one set of inflection tables, cases or endings falling out of vernacular use could be placed in brackets with an explanatory note regarding register. Apart from that there would be differences in phonology and in various cases semantics as well. Nicodene (talk) 03:56, 4 February 2024 (UTC)

┌────────────────────────────────────────────────────────────────────────────────────────────────────┘ @Nicodene: To give us all some idea of the kind of inflectional variability we're dealing with, I added a table to βαθύς (bathús) of its Byzantine forms. There's already a lot there, but that's an underrepresentation, if anything. Annoyingly for our purposes, Holton et al. specifically omit the dative from their paradigms, despite the fact it occurs:

Nominative, genitive and accusative cases continue to exist in LMedG and EMG. The dative case, however, had gradually disappeared from the spoken language during the first millennium and its main functions were reassigned (see Humbert 1930, Lendari/Manolessou 2003, Horrocks ²2010: 183–5, 284, Holton/Manolessou 2010: 546–7). Nonetheless, datives survive in many of the written texts that this Grammar is based on, though mainly in documents and other texts in mixed or higher registers, and they may have a range of inherited functions. Particularly common are datives governed by the prepositions ἐν and σύν. Because the dative had ceased to be part of the spoken vernacular by about the 10th c., dative forms are not included in the paradigms set out in the chapters that follow.¹
¹The only exception that has been made is the dative reciprocal pronoun ἀλλήλοις, on the basis that its occurrence, which is quite rare, seems to be as much a lexical survival as a morphosyntactic feature (see 5.12).
—volume II, § 1.1, pages 241–242

and even in novel formations:

In addition to instances like the above, which could be deemed grammatically “correct” (i.e. in accordance with AG morphology and syntax), we also find dative forms with innovative phonology, stress or morphology, or new lexical items:
Of particular interest is the use of dative forms for loanwords:
—ibidem, pages 242–243

Moreover, Holton et al. exclude Atticist texts entirely (“the texts on which this Grammar is based – i.e. texts that are not systematically archaizing” — ibidem, page 243); accordingly, if we're to produce accurate and (in aspiration) exhaustive inflection tables, we shall have to supply the missing Attic forms and datives.
Holton et al. mention the dual number, as far as I can tell, exactly twice in their entire four-volume grammar:

The AG reciprocal pronoun (“one another”) had dual (gen. ἀλλήλοιν) and plural (gen. ἀλλήλων) numbers, and was declined for gender and case (genitive, accusative and dative).
—volume II, § 5.12, page 1,183

The London manuscript can be consulted at http://www.bl.uk/manuscripts/. Ms. Athous Pandel. 538, edited by Vasileiou (2003) has the unusual form εγκρεμίζεσθον Varl. & Ioas. (Pantel.) 303, which is unlikely to be an archaic dual (as the subject is 2 sg.), and probably a writing mistake for ἐγκρεμίζεσουν.
—volume III, § 4.3.1.2, page 1,551, footnote 54

so I don't know whether to infer from their silence that the dual saw no use in Byzantine Greek, or that its use was restricted to Atticist texts, and that it is for that reason that Holton et al. make no mention of it.
Certainly, we can't rely on Holton et al. alone to guide what we do about Byzantine Greek. Nevertheless, that table at βαθύς (bathús) is something concrete to work from. 0DF (talk) 23:37, 7 February 2024 (UTC)

@0DF The effort is quite admirable, thank you. I can't imagine it is sustainable across hundreds of entries, so generating variants with an automated template would be the long-term approach. The tables would probably include a prominent disclaimer like 'not all forms necessarily attested'. The automated romanisation can probably be prevented somehow to alleviate crowding. Nicodene (talk) 23:19, 8 February 2024 (UTC)

@Nicodene: I agree that the transliterations take up too much space and that they probably are best removed by default from Byzantine inflection tables. I also agree with including a prominent disclaimer of the kind you describe. I got the forms of βαθύς I added to that table from Holton et al., volume II, pages 746–757, wherein βαθύς serves as their paradigm for “Adjectives with Originally 3rd-Declension Endings” (§ 3.3), specifically “Oxytone Adjectives in -ύς” (§ 3.3.1). On the basis of Holton et al., volume I, page xxxiii (“When whole words are enclosed in brackets in the tables, the forms in question may reasonably be assumed to have existed, but no example has been located in the LMedG and EMG texts examined, e.g. (μιανοῦ), (χρυσοῦ).”), I presented each form which they give in parentheses instead with a preceding asterisk, as is standard in historical linguistics. Despite there being many forms already, the forms given by Holton et al. are an under-representation, if anything (Holton et al., volume I, page xxxvii, prefacing the Bibliography: “Classical, post-classical, early medieval and other learned Byzantine texts are not included below.”; volume II, page 746, below the synoptic table for βαθύς: “Residual forms, e.g. βαθέος, βαθεῖς, are not included in the above table, but will be discussed below where relevant.”; ibidem, page 242: “dative forms are not included in the paradigms set out in the chapters that follow”; ibidem, page 243: “the texts on which this Grammar is based – i.e. texts that are not systematically archaizing”); the forms Holton et al. give are only those non-dative forms which occur in lower-register texts written 1100–1700: a rather limited subset of the “Medieval and Early Modern Greek” whole that you'd reasonably expect that they're trying to describe. 
On the other hand, if we are to adhere to the 1453 cut-off for Byzantine Greek, we need to be careful to exclude those forms that occur only in texts from the seventeenth, the sixteenth, and/or the latter half of the fifteenth centuries.
I am increasingly recognising that inflection tables for Byzantine Greek terms ideally require certain features that are different from those that befit inflection tables either for Ancient Greek terms or for Modern Greek terms. One of those features would be the indication of pronunciation for each form, because the vocalic mergers of Byzantine Greek render its graphemes surjective upon its phonemes (i.e., with the exception of the bijective α ↔ /a/ and ου ↔ /u/, each vowel may be written in multiple ways, namely: αι, ε → /e̞/; ο, ω → /o̞/; ει, η, ι → /i/; οι, υ, υι → /y/, then, upon the completion of iotacism in the eleventh century, /i/) and because the significant phonological processes that Byzantine Greek underwent (synizesis and various deletions) are only haphazardly reflected in spelling; this would call for a tie-in between a module such as that behind {{grc-IPA}} on the one hand and modules such as those behind {{grc-decl}}, {{grc-adecl}}, and {{grc-conj}} on the other. Another desirable feature, in the light of Holton et al., volume I, page xxxiii (“smaller tables classify the allomorphs as ‘General’ (if they occur widely in the texts examined), ‘Restricted’ (if they are found in only part of the period covered by the Grammar, or only in certain areas or certain types of text), or ‘Rare’ (if their occurrence is very limited)”), would be the means seamlessly to mark each form for its respective period, locale, genre, register, and frequency. The need for bespoke inflection tables, distinct from those designed for Ancient and Modern Greek, is an infrastructural and thematic argument in favour of treating Byzantine Greek terms separately from Ancient Greek terms on the one hand and Modern Greek terms on the other. 0DF (talk) 22:51, 6 March 2024 (UTC)
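To make the many-to-one spelling point concrete, the vowel correspondences listed in the comment above can be written out as a small lookup table. This is only a rough sketch in Python, not how {{grc-IPA}} or any existing module works; the names `VOWEL_PHONEMES`, `phoneme_for` and `spellings_by_phoneme` are hypothetical, and the `iotacism_complete` flag is an assumed way of modelling the eleventh-century shift of /y/ to /i/.

```python
# Grapheme-to-phoneme pairs exactly as listed in the comment above.
VOWEL_PHONEMES = {
    "α": "a",
    "ου": "u",
    "αι": "e̞",
    "ε": "e̞",
    "ο": "o̞",
    "ω": "o̞",
    "ει": "i",
    "η": "i",
    "ι": "i",
    "οι": "y",
    "υ": "y",
    "υι": "y",
}


def phoneme_for(grapheme, iotacism_complete=False):
    """Return the vowel phoneme for a spelling.

    iotacism_complete is a hypothetical switch for the period after /y/
    has merged into /i/.
    """
    phoneme = VOWEL_PHONEMES[grapheme]
    if iotacism_complete and phoneme == "y":
        return "i"
    return phoneme


def spellings_by_phoneme(iotacism_complete=False):
    """Invert the table: list every spelling that yields each phoneme."""
    result = {}
    for grapheme in VOWEL_PHONEMES:
        key = phoneme_for(grapheme, iotacism_complete)
        result.setdefault(key, []).append(grapheme)
    return result
```

Inverting the table shows why spelling alone cannot be recovered from pronunciation: before the merger /i/ has three spellings (ει, η, ι), and with `iotacism_complete=True` it has six, which is the argument for giving a transcription next to each inflected form.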

phase 1

notifying administrators for grc @Mahagaja, JohnC5, Erutuon also @Thadh, Theknightwho, Benwing2 More than one month has passed. Am I to proceed with _phase_1: rename Byzantine to Medieval? Do I have permission from administrators to start? Would an admin help with Module:etymology languages/data, setting "Medieval Greek" and aliases = {"Byzantine Greek"}? (Because I am not an administrator, I cannot intervene.) Thank you.
On some other points: (I did not expect σχοινοτενεῖς, prolix discussions in this page, but at the corresponding Beer talk. Nevertheless, I am obliged to respond and clarify:)

Early Modern Greek: @Mahagaja, yes, the phase 1453-1669 (termination of Cretan literature) is Early Modern Greek (πρώιμη νεοελληνική), interchangeably 'Late Medieval' (όψιμη μεσαιωνική). Why? _1. Because of its retained mediaevalisms, many prominent linguists use interchangeably the terms 'Late Medieval' and 'Early Mod.Gr' -we can discuss further-. And mainly _2. I would not propose a split of Modern Greek or further splits in general. We study it under Med.Gr. because its original script is polytonic, and all modules, translit, ipa, etc are already in place -probably some modifications, or a few templates will develop-.
Period versus Style/Register. Hellenistic Koine -or even Attic dialect- is used by authors long after the 6th century (even until the 20th century in the form of Katharevousa). The typology (inflections etc) of their words is as in the Grammatical rules of Ancient.Gr. (a label like learned may be used for some medieval Koine-style neologisms that might interest Med.). We will not duplicate existing Ancient Greek inflections.
Polytonic original script. Please note that conservative Greek linguists of the past century would 'correct' forms in their editions according to Anc.Gr. rules, while the progressive ones (who were prosecuted during these polemic times -Trial of Accents) like Kriaras, at some point switched to monotonic. Nowadays, it is inconceivable to change the script of an original source in a critical edition. Please note that everything Greek up to 1982 was written polytonic. Nowadays, everything -ancient too- might be seen (e.g. on the internet, in new books) written monotonically, because it is easy-to-type/cheap-to-print.
Polytypy in Hellenic: @Nicodene, yes, it is a fact. It is a stubborn language: fluctuation in suffixes runs through all grk. See modern verbs like εκλέγω#Conjugation, Template:el-conjug-'ακούω'. One cannot avoid Modern Greek inflections because of their too many allomorphs -and see how much is omitted! Appendix:Greek_verbs#Omitted.-
Will Medieval Greek acquire inflections at once? No. It takes some time for discussions, proposals, trials, to crystallise a method. A Working/Trials/Feedback.for.Med page would be a good way to start (please check some first attempts at wikt:el:παλληκάριον, wikt:el:σκοῦπα, wikt:el:Template:gkm-κλίση-ουσ, and a neologism but learned = in the ancient fashion at wikt:el:ἀπόκρεως). Med.Greek does not have Dative. The table by our learned friend 0DF at βαθύς is a fusion of Koine datives into Med. for the difficult categories -ύς, -ής of adjectives with lots of learned forms preserved. We do not add Dative at Mod.Greek either (Mod.terms with dative@el.wikt but not at its Tables.).
I have not proposed any trial for an Appendix of clitic paradigmata and/or tables with the distinction of 'expected versus attested' forms yet.

Why try to form a neoteric Section 'Medieval Greek' here rather than at el.wikt? Because here there are so many learned and informed editors: experts -some, professionals-, who can help with their bibliography and their valued opinion in this project. At el.wikt I am totally alone in this project and I found it exhausting to update and patrol, make trials, and have no feedback, no help for Med.Greek. All experts are assembling here. This wiktionary is the avant-garde of all wikts.
Admins! Please help to begin this project. Give me permission to start with _phase_1: rename Byzantine to Medieval. Allow _phase_2 (upgrade from etymol.language to an autonomous section), so that I can use the title Medieval Greek for poor Τζέτζης who has been waiting for this for a long time. Help, please, please, allow this long phase of Greek at en.wikt to exist! Thank you. ‑‑Sarri.greek ^♫ I 04:58, 9 February 2024 (UTC)

@Sarri.greek I tried to read through this discussion. It is confusing because there are two different issues (rename Byzantine -> Medieval, and split out Byzantine/Medieval from Ancient Greek). For issue 1 (rename), it looks like maybe two people (User:0DF and User:Nicodene) disagree with the name change and up to four are in support (User:Sarri.greek, User:Thadh, User:Mahagaja and User:Theknightwho). This is possibly enough for a rename but I feel uncomfortable without a clearer consensus, esp. given that I'm not sure whether User:Fay Freak opposes the name change and/or split (their prose is, as is typical, somewhat impenetrable). User:Erutuon and User:-sche seem willing to accept one or both changes but without a strong opinion. For issue 2 (split), it looks like User:Sarri.greek, User:Thadh, User:Mahagaja and User:Nicodene are in favor of a split, while User:0DF is undecided, User:Fay Freak possibly opposes (?), and User:Theknightwho has not expressed an opinion. Can all the people I just named let me know (1) did I get your opinion correct on both issues and (2) if not, what is your opinion, both about issue 1 (the rename) and issue 2 (the split)? Benwing2 (talk) 05:29, 9 February 2024 (UTC)Reply

Yes, support. Thadh (talk) 11:19, 9 February 2024 (UTC)

Support — long overdue! — Salt marsh ^🢃 06:26, 9 February 2024 (UTC)Reply

@Benwing2: I was more warning with respect to the ambiguous consequences, without obstructing. If people are willing to invest work for a split, it is not my due to oppose it, since I do not expect to do Greek in the medium-term anyway, as it is low on my priority list, relatively to other interesting languages – I have not even followed the forthgoing of the discussion and don’t know what you all exactly intend, especially with respect to the 300–600 time, when I have derived Arabic terms from Byzantine Greek when I am not really sure whether they are from before Islam or right after it or a century later etc. and it might be split to Late Koine and Medieval Greek, which I am not particularly keen to revisit either and Greek editors might be good enough to pinpoint. Fay Freak (talk) 07:07, 9 February 2024 (UTC)Reply

@Fay Freak OK thank you, that clears things up. Benwing2 (talk) 07:12, 9 February 2024 (UTC)

Thank you @Fay Freak for not opposing. Indeed the period of Late Koine 300-600 (600 being accepted as the turning point, languagewise, with the original-Greek parts of the Novellae in Iustinianos' legal reforms, while history has a different periodisation) is under the jurisdiction of Ancient Greek administrators. As seen at {{R:DGE}} and Bailly2020, these dictionaries extend to authors of up to the 6-9th, 10th, 13th centuries, when such authors use Koine as high register. ‑‑Sarri.greek ^♫ I 07:36, 9 February 2024 (UTC)

@Benwing2, sorry to bother you again: what is going to happen? Would you like me to call more people to vote? Mr @A. T. Galenitis, who edits all phases of Greek including Medieval, is away. As you see, not many are interested in Greek. But I am, I am: I am willing and available! Every year, fewer and fewer people will be voting. In the end, I will be the only voter! I am waiting, anxious to start editing. Thank you. ‑‑Sarri.greek ^♫ I 18:26, 15 February 2024 (UTC)

@Sarri.greek I'd like to give it a couple of weeks. As it is looking, the split seems pretty clear and the name change is leaning towards support, although User:0DF has not added their votes yet. Note that in general you should not canvass votes, i.e. ping people specifically for voting purposes, esp. if you believe they will vote in a particular way that you desire. Benwing2 (talk) 00:51, 18 February 2024 (UTC)

@Benwing2, of course, of course! People vote if they agree, not because I called them. I am just informing people with whom we have been discussing this for more than a year, people who have edited, or want to edit, Greek. Just Mr A. T. Galenitis, an excellent editor, who supports strongly. But they do not come very often, and they do not get messages except on their Talkpages. I always check Related changes for el and for grc, and I am sorry to say that there are very few people interested. Perhaps some editors who do very many languages create some exotic lemmata. Thank you very much, I can wait, I know how busy you are. ‑‑Sarri.greek ^♫ I 08:53, 18 February 2024 (UTC)

Thank you very much @Sarri.greek for bringing this into my attention and for putting once again the effort for this very worthwhile change. Indeed, I have been rather inactive lately, but as the creator of many gkm lemmata I am adamant on the need for this split with arguments which have been repeated multiple times. I would be more than happy to put the required work for my own lemmata and create more while at it. Regarding the naming, both approaches have some historical value (with varying power of persuasion) to them yet from a functional point of view it doesn't make much sense to oppose the recent literature and main body of research within the field where "Medieval Greek" has become dominant (vide Holton's et al. recent monumental Cambridge Grammar of Medieval and Early Modern Greek) A. T. Galenitis (talk) 17:09, 16 March 2024 (UTC)Reply

I am sorry that I write so schoenotenically; I have difficulty with concision. I'm also sorry that I have taken so long to respond; I have done a lot of research regarding this topic since η Δις Σαρρή first petitioned the Beer parlour for these changes. As you'll see below, whilst I still oppose the change of this chronolect's name, I have come to support its split into a lect with its own L2 header, at least in principle. I feel I should explain my position, especially regarding my “concern…that η Δις Κατερίνα Σαρρή has a different understanding of what this vote endorses from the understanding of the other voters here”.

Ὦ Δις Σαρρή· When you write things like:

I am left with the impression that you want the label “Medieval Greek” to refer only to the relevant period's basilect of the Greek diglossia. If that is your position, what then happens to the acrolect of that period? Does it remain part of Ancient Greek (grc)? And if so, should Katharevousa be treated similarly? Ultimately, is post-Classical Greek to be split primarily by register? Perhaps I've misinterpreted you, but if so, please clarify your position. If this is your position, you should make it explicit, so that everyone knows exactly what's being voted on. Perhaps this is what Fay Freak meant by “ambiguous consequences”. I could support either one, be it a split by period or by register. Here's a litmus test: In what variety of the Greek macrolanguage was the Suda originally written?

What I could not support is a split by period that excludes from Byzantine Greek its higher-register elements. You seem to want to do that when you say “Med.Greek does not have Dative.” and “We do not add Dative at Mod.Greek either”. It is untrue that Byzantine Greek does not have the dative; on the contrary, as Staffan Wahlgren writes, “The most important observation…is that the dative is so surprisingly alive and productive in such a wide range of Byzantine texts.” (Wahlgren 2014: Abstract) Even Holton et al. (2019: II, 241–243), whom I've already quoted at length above, acknowledge that “datives survive in many of the written texts that th[e] Grammar is based on” and that “[p]articularly common are datives governed by the prepositions ἐν and σύν”, before recording their decision nevertheless to exclude all datives (except ἀλλήλοις) with the single sentence “Because the dative had ceased to be part of the spoken vernacular by about the 10th c., dative forms are not included in the paradigms set out in the chapters that follow.” — Blink and you'll miss it! And those datives aren't all just learned preservations; especially noteworthy is the Early Modern Cretan Greek noun ἐμπιστευτιός (empisteutiós), which is one of the “[w]ords belonging to paradigm have only been found in LMedG and EMG texts from Cyprus. In all cases these words are local variants of masculine words in -τής…. The earliest examples are from Assizes B (15th-c. ms).” (Holton et al. 2019: II, 451), and which has the dative plural form ἐμπιστευτηόδες (empisteutēódes) attested in a sixteenth-century text.

As a general concern, I think you lean on Holton et al. too much: their work has a far more limited scope than is immediately apparent. As Martin Hinterberger writes, despite the recent appearance of the Cambridge Grammar of Medieval and Early Modern Greek, it is not the “comprehensive linguistic description of written Byzantine Greek (in all its multifarious variants) [that] remains one of the desiderata of Byzantine literary studies” (Hinterberger 2021: 21); in my opinion, though not (explicitly) Hinterberger's, Holton et al. have treated the Greek of 1100–1700 “as a degenerated, deficient form of classical Greek, or as an immature form of modern Greek” (Hinterberger 2021: 37). We should not do the same.

I want to end this on a note of praise. I admire the enthusiasm and hard work you pour into this. If I have the effect of applying brakes, please understand that I do so only to ensure clarity prevails and that the best decisions are taken, even if it might not seem that way to you. I notice that you are writing a module to handle the declension of all Greek nouns. I think this is a worthwhile effort, and it has a precedent in Module:zlw-lch-headword. It would certainly be good to have a common theme for all Greek nominal declension, since that would avoid such aesthetically objectionable clashing as currently exists in Λεϊβνίτιος (Leïvnítios). Keep up the good work! 0DF (talk) 01:51, 24 March 2024 (UTC)Reply

Rename to Medieval Greek

Support ‑‑Sarri.greek ^♫ I 05:54, 9 February 2024 (UTC)
Support — Salt marsh ^🢃 06:26, 9 February 2024 (UTC)
Oppose - Byzantine is the more common term and no valid argument has been given against it. Nicodene (talk) 08:19, 9 February 2024 (UTC)Reply
Thank you @Nicodene for your support for this language. Yes, the term Byzantine is extremely common because we have Byzantine studies, Etudes Byzantines at the Sorbonne, Byzantine Music, Byzantine Iconography, Byzantine Empire and so on. But I do not recall any language taking its name from an empire, e.g. Roman Empire Latin, British Empire English? Is there any example? Mandarin perhaps, as a non-linguistic term? The term was used pre-2000 under the influence of the very common epithet 'Byzantine'. Greek linguists also used it, but later preferred the term 'μεσαιωνικός', medieval. But thanks anyway. ‑‑Sarri.greek ^♫ I 08:38, 9 February 2024 (UTC)
The actual comparison to * would be *, which nobody says either. And it'd be strange to argue that British English, British music, and British art are all "named after an empire" just because there was also a British Empire. They're all named after Britain and the British people, just as all the things you mention are named after Byzantium and the Byzantines. Nicodene (talk) 09:08, 9 February 2024 (UTC)Reply
@Sarri.greek: As Nicodene wrote, Byzantine Greek isn't named for the Byzantine Empire; rather, both are named for the Byzantines, who are named for Byzantium. Languages are usually named for people, places, or polities (and polities are usually named for either of the former). Because of what people and places can be named for, this can result in pretty weird language names. For example, Big Nambas (nmb) and Nez Perce (nez) are named for peoples with the same designation, and those peoples are named for their codpieces and misnamed for the Chinooks' nose piercings, respectively. Toponymically, East, South, and West Bird's Head are named for Bird's Head, a peninsula of Papua that looks, indeed, like a bird's head; I can only assume that Port Sandwich (psw) was named for the Vanuatuan coastal settlement that has since been renamed Lamap; and Western Desert (nine dialect codes) is named for desert areas in western Australia (chiefly Western Australia). Many creoles have strange names. Other language names are odd for etymological reasons; for example, Ukrainian (uk, literally “borderlandese”, although this etymology is disputed) and Zamboanga Chavacano (cbk, literally “poor-taste mooring-place”). And then there are names that are picturesque, like Cœur d’Alêne (crd, literally “heart of awl”), Hill (mrj) and Meadow Mari (mhr), Large (hmd) and Small Flowery Miao (sfm), and Blue (hnj), Green (also hnj), and White Hmong (mww). By comparison, Byzantine Greek is not at all strange or particularly romantic (pun intended).
I admit I got a bit carried away with the examples there. Sign languages are generally more clearly named for polities; for example, American (ase) and British Sign Language (bfi); compare the more obscure Maritime Sign Language (nsr). Dari (prs and gbz) supposedly derives from Classical Persian (darbār, “royal court”) and one could argue that Dano-Norwegian is named for the political union Denmark–Norway. However, the language name most unambiguously named for an empire is probably Imperial Aramaic (arc), named for the Neo-Assyrian, Chaldean, and especially Achaemenid Empires. Finally, consider Ashokan Prakrit, which goes one step further by being named for a specific emperor, namely the Mauryan Emperor Ashoka the Great (regnavit circa 268–232 BC). 0DF (talk) 00:34, 7 March 2024 (UTC)Reply
Support ~~{{abstain}} Both names seem about equally common, and I don't really care which one we use. I'm not opposed to either name.~~ Thinking about it some more, I've decided I prefer "Medieval". —Mahāgaja · talk 09:53, 9 February 2024 (UTC)Reply
Support Thadh (talk) 18:32, 15 February 2024 (UTC)
Abstain ~~{{support}}~~ Following the contributions of user 0DF to the discussion, I also see the merit of the term Byzantine Greek. Most importantly, I understand that I require additional reading before coming to a final conclusion. For the time being, abstaining (i.e. agreeing with either terminology to be adopted). A. T. Galenitis (talk) 21:28, 21 March 2024 (UTC)Reply
Oppose To avoid further perceptions of prolixity, I shall be terse:
Reasons for “Byzantine Greek”:
1. As I've argued before, the language should be called Byzantine Greek “because its production is inextricably linked to Byzantine civilization” (Hinterberger 2021: 22).
2. Other things being equal, endonymy is desirable. However, ready apprehensibility by Anglophone readers often supersedes this consideration. The Byzantines usually called themselves Ῥωμαῖοι (Rhōmaîoi, literally “Romans”), their country Ῥωμανία (Rhōmanía), and their language Ῥωμαϊκή (Rhōmaïkḗ). English Romaic and Rhomaic exist, but I wager they're little-known, and likely to be mistaken as relating to Romani or Romanian. Ancient Greek Ἕλληνες (Héllēnes) exists, but is not specific to the Byzantine period, and “Hellenic Greek” ⩰ Hellenistic Greek ≡ Koine Greek. There's Ancient Greek Γραικοί (Graikoí), but that's used for the macrolanguage “Greek”. There is marginal self-reference by Byzantines to their histories as Βυζαντιακαὶ (Buzantiakaì) and to themselves as Βυζάντιοι (Buzántioi), so “Byzantine Greek” is endonymic. By contrast, no people in the Middle Ages called themselves “Mediaeval” anything.
3. “Byzantine” is a fairly familiar term to the average educated Anglophone. It is an epithet applied to a great many disciplines, journals, and phenomena pertaining to the empire of that name (v. e.g. , , ), the vast majority of the primary sources for which are written in Byzantine Greek. Cet. par., it is desirable that referents systematically related in such a manner should share a nomenclature. I doubt that those various disciplines would adopt the relatively cumbersome “Mediaeval Greek X” nomenclature to replace the relatively concise “Byzantine X” nomenclature, and it would be ungrammatical to do so in compound modifiers such as Serbo-Byzantine.
4. The alphabetical and chronological orders of the three chronolects of Greek (that are written in the Greek alphabet) are the same. For any word homographic in the three chronolects — many (most?) consonant-initial ((pro)par)oxytones — this allows one to trace its development from Ancient Greek, through Byzantine Greek, and all the way up to the Greek of the present day by scrolling down the page and reading in order: a boon for comprehension. This serendipity would be lost if Byzantine Greek were renamed Mediaeval Greek.
Reasons against “Mediaeval Greek”:
1. Mediaeval means “of or pertaining to the Middle Ages (Latin Medium Aevum)”, but those Middle Ages were not universally significant. Traditionally, the Middle Ages are regarded as beginning in 476 with the fall of the Roman Empire in the West and as ending in 1453 with the fall of the Roman Empire in the East. Linguistically, the former had a considerable impact on Medieval Latin: the dissolution of Roman institutions, radical decentralisation, vernacular drift, development of feudalism, and immigration of unassimilated peoples led to linguistic innovations and borrowing on a massive scale; often regarded as corruptions, various attempts were made to restore Classical Latinity, as in the Carolingian Renaissance, but these saw only partial success until the triumph of humanist Ciceronianism in the Italian Renaissance. Thus, Mediaeval Latin was succeeded by Renaissance Latin and then by New Latin. This makes the epithet “Mediaeval” highly suited to that chronolect of Latin. By contrast, Byzantine Greek saw no such dissolution, decentralisation, or feudalism, at least not until the Fourth Crusade; for Greek, the fall of 1453 was vastly more consequential than the fall of 476 — the opposite was true for Latin. This makes the epithet “Mediaeval” highly unsuited to that chronolect of Greek. For more, see Kaldellis 2019: ch. 4 (“Byzantium Was Not Medieval”), pp. 75–92.
2. The adjective has four justifiable spellings: mediaeval, medieval, mediæval, mediëval. Byzantine has only one. Cet. par., that a term's spelling be uncontested is desirable.
3. The English Wikipedia has three articles entitled “Medieval X” for languages (Medieval Greek, Hebrew, and Latin); in other articles I saw, they give Medieval Catalan as a synonym of Old Catalan, Medieval Spanish and Old Castilian as synonyms of Old Spanish, and for Galician–Portuguese they give the five synonyms Medieval Galician, Medieval Portuguese, Old Galician, Old Galician–Portuguese, and Old Portuguese. That would give the impression that, in language names, medieval and old are synonymous; not so Medieval Greek, which has the synonym Middle Greek (alongside Byzantine Greek and Romaic). Middle and Old are much more common as chronolect descriptors than Medieval (CAT:en:Languages has 2 members named “Medieval X”, 25 named “Middle X”, and 64 named “Old X”). AFAIK, no one calls Byzantine Greek “Old Greek”. IMO, “Middle X” only really works for languages with a threefold chronolectal division designated “Old–Middle–New X” or “Old–Middle–∅ X”. Greek, however, has a four- or even six-fold division — Mycenaean–Ancient–Byzantine–Modern or Mycenaean–Homeric–Classical–Koine–Byzantine–Modern — one would be hard-pressed, especially in the latter, to describe the Byzantine chronolect as being in the “Middle”.
4. Pace Κ. Α. Τ. Γαληνίτη, it is not at all apparent that the term “‘Medieval Greek’ has become dominant”, and contra Holton et al., here are uses of Byzantine Greek from three authors, with many more available. The ISO received three proposals in 2006–2009 to create new codes for Medieval Greek gkm, Ecclesiastical Greek ecg, and Katharevousa Greek elr; last year, the ISO rejected them all, partly due to “the lack of consensus among them” (p. 2). It is noteworthy that § 4 of the original change request for Medieval Greek gkm gave the language's name as “Middle Greek” and said of it that “[t]he language is distinct from Ancient Greek in vocabulary, phonology, and grammar, and displays linguistic attributes which are characteristically Byzantine and uncharacteristic of Ancient Greek”, whereas the first page of the request for the new language code element gkm gave, as the reason for preferring the name “Middle Greek” over the autonym “Romaiki” and the alternative names “Byzantine Greek” and “Medieval Greek”, that “Middle Greek” was the “[m]ost common amongst scholars” (!); it's only because Anastassia Loukina emailed SIL International to write that “the more common term used in Greek linguistics to refer to this stage of Greek is ‘Medieval Greek’ rather than ‘Middle Greek’” that the proposal was changed (by the ISO?) to one for “Medieval Greek”, although Δις Loukina merely asserted her claim, not citing anything. Is there any real evidence that any one term predominates?
Alas! So much for avoiding prolixity…
@A. T. Galenitis, Benwing2, Erutuon, Fay Freak, Mahagaja, Nicodene, Saltmarsh, Sarri.greek, -sche, Thadh, Theknightwho: For those of you who have voted or who intend to vote, I humbly request that you consider what I've written. For those of you not voting, I ping you in case you're interested and because you've taken part in this discussion before. To all of you, I apologise for the length of this post; I seem not to be very good at brevity. 0DF (talk) 07:37, 20 March 2024 (UTC)Reply
I've read all you wrote above but am not convinced by it, certainly not enough to change my vote. Points 2 and 4 pro Byzantine strike me as irrelevant, and point 3 sounds like it could equally be an argument to use the term "Anglo-Saxon" instead of "Old English", which I trust no one in this day and age still wants to do. None of the arguments contra Medieval strike me as particularly strong. —Mahāgaja · talk 07:56, 20 March 2024 (UTC)Reply
And what argument for 'medieval' struck you as strong? Nicodene (talk) 08:27, 20 March 2024 (UTC)Reply
I think somewhere in this discussion or an earlier one I said I prefer "medieval" because it makes it clear that the lect in question is a chronolect, not a regiolect. —Mahāgaja · talk 08:37, 20 March 2024 (UTC)Reply
Wut, even if Greek writing is located far off in Arabia or Ethiopia, I still call it Byzantine Greek provided it matches the period. Fay Freak (talk) 11:23, 20 March 2024 (UTC)
Right, but calling it Medieval Greek makes it clearer that what's relevant is the time period, not the location. —Mahāgaja · talk 11:39, 20 March 2024 (UTC)Reply
The case can be made that 'Medieval' is chronologically explicit, but it is simply unimaginable that anyone could know the term Byzantine yet mistake Byzantine Greek for a regional label. Nicodene (talk) 11:59, 20 March 2024 (UTC)Reply
I don't find that unimaginable at all. It's certainly more plausible than someone thinking Byzantine Greek referred to overly complex or intricate Greek, but we can't entirely rule that interpretation out either. —Mahāgaja · talk 12:56, 20 March 2024 (UTC)Reply
It would require someone who knows about the city of Byzantium and yet is unaware of the existence of the Byzantine Empire, in other words a person that does not exist. As for the other potential sense of ‘Byzantine’, that is simply not an argument as it applies just as well to someone mistaking ‘medieval Greek’ as referring to a brutal or savage dialect. Nicodene (talk) 13:16, 20 March 2024 (UTC)Reply
Was Byzantine Greek also used outside the borders of the Empire? —Mahāgaja · talk 13:32, 20 March 2024 (UTC)Reply
Certainly, as it doesn't have to do with borders either.

If anyone has ever actually used ‘Byzantine Greek’ to distinguish one variety of Greek from another based on region or geopolitical control I've yet to see any sign of it. Nicodene (talk) 13:59, 20 March 2024 (UTC)Reply
So the language in question is used outside of the geographical area denoted by "Byzantine" but not outside of the chronological era denoted by "Medieval". That's why I prefer to call it Medieval Greek. —Mahāgaja · talk 14:11, 20 March 2024 (UTC)Reply
‘Byzantine’ is not a geographical area.

The one, and only, valid point in this is as stated above - that ‘Medieval’ is more chronologically transparent. Nicodene (talk) 14:21, 20 March 2024 (UTC)Reply
@Mahāgaja: Thank you for reading my rather overlong post. Responding to your points:
Do you regard point 2 pro Byzantine as irrelevant because you disagree with the statement “other things being equal, endonymy is desirable”? If so, I understand you, since that statement is my axiom for that point. Otherwise, I would appreciate a rationale.

I don't see how you could call point 4 pro Byzantine irrelevant for this project. In a dictionary of Byzantine Greek only, it indeed would be irrelevant, but since that's not what Wiktionary is, it's simply an error to call that point “irrelevant”.

AFAICT, “Anglo-Saxon” — itself a compound modifier — is on all fours with “Old English” in terms of its suitability for forming compound modifiers. That seems like a disanalogy to me.

Whereas “mediaeval” is traditionally clear vis-à-vis period (viꝫ 476–1453), a lot of usage muddies the waters. Jacques Le Goff throughout his career (or at least from 1977 onward) sought to extend the Middle Ages into “the eighteenth century, when, he believe[d], the European nation-states properly emerged” (Kaldellis 2019: ch. 4, p. 77). And conversely, some scholars of chronologically preceding and succeeding fields annex parts of the Middle Ages to their own periods: “The field of ‘late antiquity’ has been pushed by some to the early Carolingians (i.e., to the ninth century), whereas at the other end some historians of early modernity have reached back to claim everything after the twelfth century, when the European economy embarked upon a trajectory that would arc to modernity. With late antique and early modern historians claiming so much territory, that leaves only a rump Middle Ages squeezed around the turn of the millennium. Byzantium has little standing or stake in this debate.” (ibidem: pp. 77–78)

0DF (talk) 15:27, 20 March 2024 (UTC)Reply
I do disagree with the statement "other things being equal, endonymy is desirable". At Wiktionary, as at Wikipedia, what matters is what a language is commonly known as in English, not what its native name is. That's why we call German German, not Deutsch, and Dutch Dutch, not Nederlands. And no ancient language was known to its speakers with modifiers like "Old", "Ancient", "Classical", "Primitive" and so forth. And you yourself point out that Greek speakers of the era under discussion generally referred to their languages as (the Greek equivalent of) Romaic; but absolutely no one here is suggesting that Wiktionary's canonical name for this language should be Romaic. So that point is actually not an argument in favor of Byzantine at all; it's an argument against both Byzantine and Medieval. Point 4 is irrelevant because that's simply not a consideration we have ever had or ever should have. The names "Old Irish", "Middle Irish" and "Irish" are in reverse alphabetical order; so what? —Mahāgaja · talk 15:57, 20 March 2024 (UTC)Reply
@Mahagaja: Re “what matters is what a language is commonly known as in English”, I already wrote that “ready apprehensibility by Anglophone readers often supersedes th[at] consideration”, so we don't disagree on the overriding importance of that. However, given a choice between two English names identical in their recognisability (which is an instance of that “other things being equal” qualifier), would you really maintain that endonymy wouldn’t even be a consideration to break the tie? That's not a strictly irrational position, but I would be surprised if you held it. Anyway, with regard to Romaic–Byzantine–Mediaeval, my point is that Romaic would be best in terms of endonymy, but its obscurity disqualifies it; whereas Byzantine and Mediaeval are comparably familiar to educated Anglophones, so Byzantine’s endonymy can break that tie. Is my position on this point any clearer now? That “Point 4” is nothing other than a consideration about page layouts which has some bearing on this issue; I'm not saying that it's a be-all and end-all, just that it's a relevant consideration, even if other considerations are primary. 0DF (talk) 00:09, 21 March 2024 (UTC)

Split from Ancient Greek

Support, as creator of this proposal ‑‑Sarri.greek ^♫ I 05:54, 9 February 2024 (UTC)
Support — Salt marsh ^🢃 06:26, 9 February 2024 (UTC)
Thank you @Saltmarsh, my guru, mentor and administrator at Modern Greek! I promise to work as you have taught me. ‑‑Sarri.greek ^♫ I 06:33, 9 February 2024 (UTC)
Support Nicodene (talk) 08:13, 9 February 2024 (UTC)
Support —Mahāgaja · talk 08:26, 9 February 2024 (UTC)
Support Thadh (talk) 18:32, 15 February 2024 (UTC)
Support A. T. Galenitis (talk) 16:46, 16 March 2024 (UTC)
Support in principle — I am concerned, however, that η Δις Κατερίνα Σαρρή has a different understanding of what this vote endorses from the understanding of the other voters here. 0DF (talk) 07:46, 20 March 2024 (UTC)Reply
See § phase 1 (above) for an explanation of this comment. 0DF (talk) 01:55, 24 March 2024 (UTC)

?

Happy month: καλό μήνα (kaló mína), @Benwing2, Mahagaja and everyone! Are we still on hold? I would like so much to come back, but how? having to write {m|gkm|xxx} all the time in pages with Ancient title... for example, @παπᾶς. I need: a month to review what exists. A year to do some labels for Learned Medieval (=archaisms and Hellenistic style), for Early Modern Greek (with medievalisms), some ready-to-fill-in inflection tables, some reference templates etc. I cannot even start without a code. Thank you. ‑‑Sarri.greek ^♫ I 17:00, 1 March 2024 (UTC)Reply

@Sarri.greek: I'm working on responses. Sorry for the delay. Please bear with me. 0DF (talk) 02:06, 2 March 2024 (UTC)

Oh, M @0DF. What do you mean 'working on responses'? Please do not flood this page? We understand you are against. I shall make a special workpage-plan for MedGr once it is allowed. And with a talk page, and sections for every subject about it, where you can write as long texts as you like. Thank you. ‑‑Sarri.greek ^♫ I 06:04, 2 March 2024 (UTC)Reply

@Sarri.greek It looks like we have consensus for both changes, esp. for the split: 6-0 plus one undecided (User:0DF) for the split, 5-2 for the rename (User:Nicodene and User:0DF opposing). User:0DF, you never gave a response concerning the rename. Do you have anything you'd like to register (e.g. concerns, alternative suggestions, etc.)? Keep in mind that renames are easier to do than splits, so if for some reason it's decided in the future to undo the rename or switch to a third term, it wouldn't be such a big deal. Benwing2 (talk) 01:46, 17 March 2024 (UTC)Reply

Thank you all, thank you M @Benwing2! Great Sunday! I'm ready to start work! and will be checking the changes. I have prepared a trial-User:Sarri.greek/About Medieval Greek (in the pattern of WT:About Ancient Greek), a trial Template:User:Sarri.greek/gkm-IPA which needs to 'show' visibility, and more. Proposals and suggestions for the first-time-presentation of MedGr are welcome and needed from everyone, especially the administrators of Ancient and Modern Greek. e.g. at User About's Talkpage (or open an extra page?, please tell me, Sir, and everyone.) Thank you. ‑‑Sarri.greek ^♫ I

I don't have a ton to add to this discussion, any work to offer, or any great expertise - but in terms of periodizing, I wonder if it would also make sense to periodize Koine or classical as separate from ancient (in the sense that I guess Ancient Greek sort of goes until 300 BC, and Koine/Classical goes until whenever we consider Byzantine/Medieval to start). My main thought here is that beyond new vocabulary borrowings replacing other vocabulary, or changes in grammatical forms or pronunciations, it is my extremely amateur perception that meanings, of some words at least, gradually shifted over the Ancient->Classical->->Modern period, especially as a result of Christianization. Or possibly that most attested pre-medieval Greek texts are Classical rather than Ancient texts. I also think it's fine to call it Medieval Greek; that seems to be what English Wikipedia uses anyway. Anyway, that's my very late 2 cents to add to this discussion. -Furicorn (talk) 09:57, 31 August 2024 (UTC)

@Furicorn: Thank you for your contribution. Currently, Ancient Greek is all Greek prior to 1453 except for that written in Linear B (which is Mycenaean Greek). Classical Greek and Koine Greek are not synonyms. The core of Classical Greek is the Attic Greek of the 5th century BC. Koine Greek is the form of Greek that developed as a consequence of the language's spread by the empire of Alexander the Great. It would certainly be possible to split Greek five ways — Mycenaean–Ancient/Classical–Koine–Byzantine/Mediaeval–Modern — but I expect that would result in a lot of redundancy, and I'm not sure it would be worth it. In reality, there is more difference between Homeric Greek and the rest of what we currently call Ancient Greek than there is between Classical Greek and Byzantine Greek, but that is not the split that was originally proposed in this discussion. As to what to call this chronolect, “that seems to be what English Wikipedia uses” is not a very strong argument unless you can tell us why it uses that name. 0DF (talk) 19:17, 12 September 2024 (UTC)Reply

Continuation (originally on Sarri's talk page)

(moved from User_talk:Sarri.greek/2024#The_old_discussion_in_its_new_place)

Hello, Sarri. It was good of you to create that updated signpost. How is your health nowadays? Do you feel up to answering those questions I posed you at WT:LTR#Medieval Greek from Ancient Greek yet? No pressure if not; I just thought I'd check. All the best. 0DF (talk) 05:00, 27 September 2024 (UTC)Reply

Hello M @0DF, thank you for your interest. Healthwise, I am under therapies (sometimes very hectic). I apologise, that I cannot remember your questions. I am typing with difficulty and I cannot participate in discussions that are too long.

If they rename 'Byzantine' to 'Medieval' or 'Mediaeval Greek', I will be able to check all occurrences. If they split Medieval Greek from Ancient Greek, I will slowly edit the not so many pages involved, marking the too many unmarked Koine entries too.

Attn @Benwing2, Chuck Entz as linguists and bureaucrats: I believe I will have the time to do it. It is simple: There IS a mediaeval period for Greek (grk) (working code everywhere: gkm, or more 'officially' proposed here as grk-gkm). Please include it in en.wiktionary, filling a gap of some 10 centuries in grk's history of c. 3,000+ years. I might make a few very simple templates when necessary.

I will be happy to answer questions here; excuse my short answers. Thank you ‑‑Sarri.greek ^♫ I 13:28, 27 September 2024 (UTC)Reply

@Sarri.greek: I'm sorry to hear it's still rough for you. The questions are all in WT:LTR#Medieval Greek from Ancient Greek, but that became quite a long discussion by the end. It doesn't seem very ethical to subject you to more questioning in your current condition. Since renaming the chronolect was and is a matter of some contention, what's stopping you from being “able to check all occurrences” of Byzantine ? 0DF (talk) 01:16, 28 September 2024 (UTC)Reply

@0DF, thank you. gkm automatically gives 'Byzantine'. I have already cleaned up all old manual edits for the language. If administrators that are professional linguists do not prefer the title 'Medieval' to 'Byzantine', there is nothing I can do. ‑‑Sarri.greek ^♫ I 06:00, 28 September 2024 (UTC)Reply

@0DF @Sarri.greek I don't think either of you is going to change their mind with further discussion, so I don't think more questions are in order. Given that there is a (bare) supermajority of 4-2 (pro: @Sarri.greek @Saltmarsh @Thadh @Mahagaja; con: @Nicodene; @0DF) with one abstention, I am inclined to go ahead with the rename of Byzantine -> Medieval. More importantly, there is strong support for splitting Medieval/Byzantine out of Ancient Greek, and I don't want this name dispute to be a blocking issue. If it turns out that later on we decide to go back to the name Byzantine, that is not hard to do and I can do it by bot (I've done plenty of such renames before). @Sarri.greek Please be aware that logistically, splitting out gkm into its own L2 language requires adopting a temporary code for either the new L2 language or the old etym-only language while both are coexisting, until everything is moved. My inclination is actually to do the following:

Set up tracking for both the gkm and newly adopted grc-gkm codes.
Rename the etym-only code gkm -> grc-gkm by bot. Leave its name as "Byzantine Greek".
Remove the etym-only tracking for gkm once there are no more references. Leave the tracking for grc-gkm; this will make Sarri's job easier below.
Create a new L2 language gkm named "Medieval Greek". (Having them have different names is fortuitous as it will avoid some complaints about duplicate language names.)
Sarri, over time, moves the relevant entries to the ==Medieval Greek== header and cleans up any existing references to grc-gkm.
When all references to grc-gkm are gone, we can remove this etym-only code.

Also pinging @Theknightwho and @Surjection (who have been involved in prior language splits) for any technical comments. Benwing2 (talk) 07:12, 28 September 2024 (UTC)Reply
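For readers following the six steps above, here is a rough sketch of the ordering in Python. It is only an illustration under stated assumptions: `LanguageRegistry`, `migrate`, and the reference count of 3 are hypothetical and do not correspond to Wiktionary's actual modules or bot code, and dropping the tracking for grc-gkm in the last step is an assumption not spelled out in the plan.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageRegistry:
    """Toy stand-in for the language data (hypothetical, not the real layout)."""
    full: dict = field(default_factory=dict)        # code -> name of L2 languages
    etym_only: dict = field(default_factory=dict)   # code -> (name, parent code)
    tracked: set = field(default_factory=set)       # codes with tracking enabled
    references: dict = field(default_factory=dict)  # code -> remaining uses in entries


def migrate(reg):
    """Apply the six steps in order and return a log of what was done."""
    steps = []

    # Step 1: track both the old and the temporary code.
    reg.tracked.update({"gkm", "grc-gkm"})
    steps.append("track gkm and grc-gkm")

    # Step 2: rename the etym-only code; its name stays "Byzantine Greek".
    reg.etym_only["grc-gkm"] = reg.etym_only.pop("gkm")
    reg.references["grc-gkm"] = reg.references.pop("gkm", 0)
    steps.append("rename etym-only gkm -> grc-gkm")

    # Step 3: stop tracking gkm only once nothing refers to it any more.
    assert reg.references.get("gkm", 0) == 0
    reg.tracked.discard("gkm")
    steps.append("remove tracking for gkm")

    # Step 4: the freed code becomes a new L2 language with a different name.
    reg.full["gkm"] = "Medieval Greek"
    steps.append("create L2 language gkm")

    # Step 5: entries are moved by hand over time; modelled here as one bulk move.
    reg.references["gkm"] = reg.references.pop("grc-gkm")
    steps.append("move entries to the Medieval Greek header")

    # Step 6: with no references left, the temporary code can go
    # (also dropping its tracking is an assumption of this sketch).
    del reg.etym_only["grc-gkm"]
    reg.tracked.discard("grc-gkm")
    steps.append("remove etym-only grc-gkm")
    return steps


registry = LanguageRegistry(
    full={"grc": "Ancient Greek"},
    etym_only={"gkm": ("Byzantine Greek", "grc")},
    references={"gkm": 3},  # made-up count, for illustration only
)
completed = migrate(registry)
```

The point of the ordering is that the code gkm is never an etym-only language and an L2 language at the same time: the temporary code grc-gkm holds the old meaning while gkm is reassigned.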

M @Benwing2, thank you so much for your reply and thorough plan! I can see how busy you are, dealing with so many languages. I appreciate your work, and your decision; a true gift to grk but also to me, personally. Please note that admin @Mahagaja has proposed the official code grk-gkm, not grc-gkm, as it is a period of the Hellenic language (grk). I'll follow your edits closely and will do my best, a bit slowly, but diligently; I shall rename Categories, update wikidata and do all the work where administrators need not be bothered. I search with insource: and intitle: I might make a little label to produce: Late Medieval or Early Modern Greek +cat, if I encounter words of 1500, 1600. High register (with datives etc, similar to Koine), can be covered by {lb|gkm|learned}. Thank you, thank you. ‑‑Sarri.greek ^♫ I 08:31, 28 September 2024 (UTC)

@Sarri.greek I'm not sure the context behind grk-gkm but grc-gkm is a temporary label used because it refers to a variety of grc. The temporary label will go away once everything is converted to the L2 language gkm. Benwing2 (talk) 08:40, 28 September 2024 (UTC)Reply

@Sarri.greek with apologies - this is really beyond my "pay grade" and terra incognita to me. I rarely venture there. To you personally Sarri - I have friends who have been through similar tribulations and worries, my best wishes. — Salt marsh ^☮ 17:48, 28 September 2024 (UTC)Reply

@Benwing2: I'll agree that no one objects to splitting Byzantine Greek out of Ancient Greek, and that the split should go ahead; however, I don't see how “this name dispute” is “a blocking issue”. Why can't the split take place without changing the name? Unfortunately, I think the original discussion suffered for its obscure location in WT:RFM (now moved to the even more specialised WT:LTR); perhaps there would have been greater engagement had it taken place in WT:BP, where Sarri.greek initially posted about it. To remedy this, shall I draft a vote about the naming issue?

AFAICT, most of what needs to be done on “the front end” is to edit the 289 member-entries of Category:Byzantine Greek to rename, split out, or duplicate their contents as appropriate; of its member-subcategories, only Category:Byzantine surnames needs (presumably) to be renamed Category:Byzantine Greek surnames. As for the changes on “the back end”, I don't really understand why that six-stage process is necessary. Would it not be sufficient to make {{lb|grc|Byzantine}} (and its aliases) categorise into the temporary topic category Category:grc:Byzantine Greek until all the relevant entries are edited to use the new L2 header? I apologise if that is a naïve question. 0DF (talk) 09:14, 30 September 2024 (UTC)Reply

M @0DF. They are not Byzantine. Not necessarily. I would edit under such a title only for historical, artistic fields, probably at wikipedias. Thank you. ‑‑Sarri.greek ^♫ I 09:29, 30 September 2024 (UTC)Reply

@Sarri.greek: Sorry, what aren't Byzantine? 0DF (talk) 09:39, 30 September 2024 (UTC)

@Sarri.greek: Do you mean the surnames? Are they, rather, Koine? Shall I recategorise them? 0DF (talk) 14:14, 30 September 2024 (UTC)

M @0DF, aimez-vous les byzantinismes? ‑‑Sarri.greek ^♫ I 16:44, 30 September 2024 (UTC)

@Sarri.greek: I'm sure that's very witty, and perhaps I should respond « Pas du tout ! » or something; I'm also sure you're better acquainted with French literature than I am. But to interpret you literally, rather than literarily, no, I don't tend to adopt overcomplicated solutions, and I fail to see how my technically naïve suggestion to Benwing is in any way more complicated than the plan he laid out. 0DF (talk) 17:40, 30 September 2024 (UTC)Reply

@0DF I don't honestly see the need to relitigate this with a formal vote. WT:RFM is the normal place where language moves and splits used to happen (and now WT:LTR). AFAIK everyone has been pinged and had a chance to comment, and the process requesting yes/no votes has been open far longer than a standard vote. However, I will defer to what User:-sche says, who has been the overall person shepherding language moves through; do you think a formal vote is needed? As for the plan I suggested, yes this is necessary because of the existence of the current gkm code; the labels are not the only place that Byzantine/Medieval Greek is being referred to. Benwing2 (talk) 19:08, 30 September 2024 (UTC)Reply

@Benwing2: I figured I'd wait a while before replying, with a view to letting things cool off and in the hope that Ms Sarri might acquiesce to my supplication for a rationale, but no such luck. If this were a līs, the prosecution would be expected at least to make a case. That has yet to happen; consequently, a judgment notwithstanding the verdict is appropriate. There has also been canvassing (see Special:Diff/78033207, Special:Diff/78033252); those are grounds for a “retrial”, surely. We might expect a safer finding from a superior court. In the meantime, I reiterate my suggestion that we make the split without changing the name, since there are no objections to doing that. 0DF (talk) 00:53, 21 October 2024 (UTC)Reply

Plan for Medieval Greek

Dear M @Benwing2, I keep checking Watchlists and your contributions, awaiting your #plan for Medieval Greek. (WT:LTR#Medieval Greek from Ancient Greek) I know how busy you are with more important languages. But I am available for Greek and waiting... Long hours in front of the computer... Thank you ‑‑Sarri.greek ^♫ I 20:02, 18 October 2024 (UTC)

@Sarri.greek: I ask again that you present some actual reasons for this proposed name-change, especially since you said in February that you “wouldn't mind terribly either” name. What's changed? 0DF (talk) 00:53, 21 October 2024 (UTC)Reply

The essence of my proposal (as at Jan2024petition & sources for documenting our lemmata in March2023) was to humbly inform the community of editors of the English Wiktionary of the newest developments on Medieval Greek language studies, defined now as a Medieval period of a language instead of "Byzantine" language as we often have been seeing at chairs of "Byzantine Studies" in universities all over the West. This was not MY opinion, but the opinion of professors like w:David Holton, w:Geoffrey Horrocks and many others. I humbly asked the linguists of en.wikt to take a look at their introduction at T:Cambridge Grammar of Medieval and Early Modern Greek
p.xvii … "as Greek scholarship was relatively slow to catch up with the advances made in textual criticism and editorial practice for medieval texts in other major European languages. Over the past thirty or so years much has changed in relation to the situation described above"
And they conclude at p.xix "For this reason we do not employ the term “Byzantine Greek”: for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not “Byzantine” in a political sense. Our criteria are instead internal ones, based on clusters of important linguistic changes that we see as occurring around 1100, 1500 and 1700"
Whether there are linguists specialising in Greek (ancient, medieval or modern) who object, I did not hear any references from editors who object to the name "Medieval". The vote for naming languages is not about how one feels about it, but to give a chance to all the community to bring in references and enrich the information available.
Who is the linguist that opposes to the term "Medieval Greek" or the existence of it as a distinct period? ‑‑Sarri.greek ^♫ I 20:17, 21 October 2024 (UTC)Reply

@Sarri.greek @0DF We're looking for consensus rather than litigating something in a court of law, so I am reluctant to do something like a judgment notwithstanding the verdict, which would involve overturning the consensus. I asked on Discord in the #hellenic channel and several additional people expressed support to one degree or another for the term "Medieval" and none for "Byzantine", so I am going to go ahead with the rename. Keep in mind that consensus does not have to be (and often is not) completely unanimous, and that renames are relatively easy to undo if for some reason this needs to happen. At the same time, however, a few people on Discord expressed strong reservations about the split and also said the amount of work required to effect the split might be a lot more than we think. Combined with the fact that in my experience, merges are even harder than splits, suggests to me that we should go slow in putting a split into practice. Let's first effect the rename, get the kinks worked out, and only then revisit the issue and look more deeply into what the split will involve. Benwing2 (talk) 20:38, 21 October 2024 (UTC)Reply

No problem with "slow", thank you M @Benwing2. Who are the linguists (contemporary ones) referred to who think that it is not a distinct period? ‑‑Sarri.greek ^♫ I 20:55, 21 October 2024 (UTC)

@Sarri.greek I'll let them reveal themselves if they want; they said they didn't want to participate in this discussion to avoid causing further upset and distress. Benwing2 (talk) 01:02, 22 October 2024 (UTC)Reply

@Sarri.greek, Benwing2: My apologies for taking a while to reply here. I am exceptionally busy in real life ATM, so haven't had the combination of time and lucidity to respond until now.

@Sarri.greek: In response to your question (“Who is the linguist that opposes to the term ‘Medieval Greek’…?”): I already cited above the Byzantinist professor Αντώνιος Καλδέλλης (Antónios Kaldéllis), whose 2019 book, Byzantium Unbound, has a chapter explicitly entitled “Byzantium Was Not Medieval”, which I excerpted in Citations:medieval. (Kaldellis is an Athens-born Greek, which I mention because it seems to matter to you.) The gist of his argument is that the Greek world didn't undergo the Middle Ages (the approximate millennium of notional benightedness intermediate between the dissolution of Classical civilisation and its Renaissance, literally “rebirth”), so the adjective “mediaeval” is improper when applied to it. Really, the closest analogue to the Middle Ages in the Greek world was the Τουρκοκρατία (Tourkokratía) of 1453–1821. Sure, you can call 1821 Greece's “delayed Renaissance”, though only if you call 1453 its “delayed Fall”, but then it becomes clear that Latin Europe's Middle Ages and the Greek world's “Middle Ages” have basically nothing to do with each other. I have more to say, but I'll leave it there to keep it short.

@Benwing2: I applied the “court of law” analogy because you used the term relitigate, that's all. Still, I think it was illuminating: I'm sure you won't deny that this whole proposal is ill-conceived and ill-pursued. I do find it galling that Ms Sarri can so grossly mischaracterise the foregoing discussion with statements like “I did not hear any references from editors who object to the name ‘Medieval’” having, by dint of her sheer obstinacy, eventually found a forum that will wave her proposal through; independently of the merits of her proposals, that kind of evasive and dishonest behaviour should not be rewarded. Finally, you wrote that you are “reluctant to do something like a judgment notwithstanding the verdict, which would involve overturning the consensus,” before immediately invoking a conversation on Discord in order to overturn the consensus! How is such a double standard justifiable? But as a general principle, there is no way that “people on Discord said” should carry such weight, least of all when people on-wiki have not been made privy to the text of the discussion and those who opined on Discord decline to “own” their comments in the publicly-accessible on-wiki record. Wiktionary:Discord server neither permits nor prohibits such trust-me-bro invocations, but w:Wikipedia:Discord#Consensus clearly states that the relevant part of Wikipedia's policy on consensus (“Consensus is reached through on-wiki discussion or by editing. Discussions elsewhere are not taken into account. In some cases, such off-wiki communication may generate suspicion and mistrust.”) applies. We should have the same regulation, and judging by the comments by AG202, Mnemosientje, Sgconlaw, CitationsFreak, and koavf in Wiktionary:Beer parlour/2024/October#change in color to nyms, usex, affixusex, such a regulation has considerable community support.

0DF (talk) 17:56, 3 November 2024 (UTC)Reply

Solombala English


Howdy folks! Am wondering if it may be a good or a bad idea to add a new language code for Solombala English, which is a very sparsely attested pidgin that has some features in common with Russenorsk. It has only 20 known words, and two of them were obviously misunderstood by later translators (but can be seen in the original sources). All the words, as far as I know, are presented here: w:ru:Соломбальский английский язык (I added some commentary and sources there as well, but a long time ago). The main reason for my request is that Solombala may be useful in the etymology of some Russenorsk words. Tollef Salemann (talk) 17:43, 9 February 2024 (UTC)

Support. Theknightwho (talk) 08:54, 25 February 2024 (UTC)

Created as crp-slb, since this has been open for a couple of weeks, and no-one else seems to have much to say. @Tollef Salemann.

I have given it the Cyrillic and Latin script codes because, having checked, the original 1849 source uses (pre-reform) Russian Cyrillic, but modern sources seem to prefer a Latin transcription exclusively: e.g. "vat ju vanted, asej!" is actually "ватъ ю вантетъ, асей!" in the 1849 source (pp. 406-7); note that вантетъ (vantet) has been transcribed as vanted, for instance. I can't find the 1867 source referred to, but I assume it's also in Cyrillic.

Please let me know if you think we should be handling the scripts in a different way, though. Theknightwho (talk) 09:30, 25 February 2024 (UTC)Reply
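To make the script point concrete, here is a small Python sketch comparing a letter-by-letter romanisation of the 1849 Cyrillic phrase with the Latin transcription found in modern sources. The mapping table is a toy assumption covering only the letters in this one phrase (with every hard sign simply dropped); it is not Wiktionary's Russian transliteration module, and the names `TOY_MAP`, `romanise` and `mismatches` are hypothetical.

```python
# Toy letter map: only the letters that occur in the quoted phrase.
# Dropping every hard sign (ъ) is a simplification for this example.
TOY_MAP = {
    "в": "v", "а": "a", "т": "t", "ъ": "", "ю": "ju",
    "н": "n", "е": "e", "с": "s", "й": "j",
}


def romanise(text):
    """Romanise letter by letter; anything not in the map passes through unchanged."""
    return "".join(TOY_MAP.get(ch, ch) for ch in text)


def mismatches(a, b):
    """Return pairs of words that differ between two transcriptions."""
    words_a = [w.strip(",!") for w in a.split()]
    words_b = [w.strip(",!") for w in b.split()]
    return [(x, y) for x, y in zip(words_a, words_b) if x != y]


source_1849 = "ватъ ю вантетъ, асей!"
modern_latin = "vat ju vanted, asej!"

romanised = romanise(source_1849)
differences = mismatches(romanised, modern_latin)
```

Under these assumptions the only word that differs is the third one, vantet against vanted, which is the discrepancy noted above.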

Thank you! There is also "my" instead of "tu". This was a mistake of Broch's, I guess, and it seems like I'm the only one who noticed it. There is also a funny story with his translation of "milek", because it was used in some adult context. As far as I remember, there is no original Latin-script Solombala, but I'm going to check through all the sources first to be sure. The 1867 source took me a while to find last year, but I remember it wasn't impossible. Tollef Salemann (talk) 11:07, 25 February 2024 (UTC)

@Tollef Salemann Alrighty - let me know if you think we should remove Latn. I should have also said that I've also set it to use Russian transliteration, for obvious reasons. Theknightwho (talk) 03:04, 27 February 2024 (UTC)Reply

Converting Min Nan into a family


Currently, we classify Min Nan (nan) as a language, despite it being a family of several Chinese lects. Because of this, the way we treat those lects is arbitrary and inconsistent.

Hokkien and Hainanese are both classified as etymology-only languages, despite Hokkien covering several major (dia)lects in its own right, and it being very common for entries to have a large number of Hokkien readings. It's not currently possible to add Hainanese to {{zh-pron}}, but it's also on the roadmap. In terms of how they are used, nothing distinguishes them from how we handle any of the full languages under the Chinese header, so there's no reason to classify them like this.
On the other hand, Teochew and Leizhou Min are classed as full languages, but they both have Min Nan set as their "ancestor", which is nonsense. I assume this was done so that the family tree looked right (see Category:Old Chinese language), but this has clearly happened because editors think of Min Nan as a family, not a singular language.

Currently, there is a pending request at the ISO in order to split Min Nan into a macrolanguage (though I won't address those which we don't currently have codes for, since that discussion is for another time).

nan should be converted to a family code.
Hainanese (nan-hai) should be converted to a full language.
Hainanese, Hokkien (nan-hok), Leizhou Min (zhx-lui) and Teochew (zhx-teo) should be on the immediate level below.
Given the large number of entries with numerous Hokkien readings, there are two options:
1. Convert Hokkien to a full language, with Quanzhou, Zhangzhou and Xiamen etymology-only languages, possibly with the addition of Taiwanese Hokkien.
2. Convert Hokkien to a family, and have Quanzhou Hokkien, Zhangzhou Hokkien and Xiamen Hokkien as full languages on the level below. I have no opinion on whether Taiwanese Hokkien (which is split out in the ISO proposal) should be treated separately if we do this.

Theknightwho (talk) 13:07, 17 February 2024 (UTC)Reply
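The proposed arrangement (with option 1 for Hokkien) can be sketched as a small tree in Python and checked against a simple nesting rule. This is only an illustration: the codes nan, nan-hai, nan-hok, zhx-lui and zhx-teo come from the proposal, but the `?`-prefixed keys are placeholders rather than real codes, the levels above nan are omitted, and the rule in `ALLOWED_PARENT` is a simplification assumed for this sketch, not Wiktionary's actual validation.

```python
# code -> (name, kind, parent code); "?..." keys are placeholders, not real codes.
PROPOSED = {
    "nan": ("Min Nan", "family", None),
    "nan-hai": ("Hainanese", "language", "nan"),
    "nan-hok": ("Hokkien", "language", "nan"),
    "zhx-lui": ("Leizhou Min", "language", "nan"),
    "zhx-teo": ("Teochew", "language", "nan"),
    "?quanzhou": ("Quanzhou Hokkien", "etym-only", "nan-hok"),
    "?zhangzhou": ("Zhangzhou Hokkien", "etym-only", "nan-hok"),
    "?xiamen": ("Xiamen Hokkien", "etym-only", "nan-hok"),
}

# Simplified model of the arrangement described as inconsistent above.
CURRENT = {
    "nan": ("Min Nan", "language", None),
    "nan-hai": ("Hainanese", "etym-only", "nan"),
    "nan-hok": ("Hokkien", "etym-only", "nan"),
    "zhx-lui": ("Leizhou Min", "language", "nan"),
    "zhx-teo": ("Teochew", "language", "nan"),
}

# Assumed rule: families and full languages sit under a family;
# etymology-only varieties sit under a language or another variety.
ALLOWED_PARENT = {
    "family": {"family"},
    "language": {"family"},
    "etym-only": {"language", "etym-only"},
}


def problems(tree):
    """List the codes whose parent kind breaks the assumed nesting rule."""
    issues = []
    for code, (_name, kind, parent) in tree.items():
        if parent is None:
            continue
        parent_kind = tree[parent][1]
        if parent_kind not in ALLOWED_PARENT[kind]:
            issues.append(code)
    return sorted(issues)


def children(tree, code):
    """Return the codes placed immediately below the given code."""
    return sorted(c for c, (_n, _k, p) in tree.items() if p == code)


current_issues = problems(CURRENT)
proposed_issues = problems(PROPOSED)
```

In this model the current setup flags Leizhou Min and Teochew, because a full language is nested under another language, while the proposed tree passes the same check.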

Support the first three bullet points, but

Weak oppose on the fourth:

a potential slippery slope: Singapore, Penang, Longyan, etc. could warrant full languagehood if ZXQ and Taiwan are split
treatment of the above would be ambiguous due to the nature of Hokkien potentially not being monophyletic and the fact that eg. Taiwanese can’t really be called “a dialect of Amoynese” despite their shared transitionary nature
to draw a parallel with Northern Wu, Shanghainese and Suzhounese, both not being full languages, occupy a very similar genealogical level when compared to ZXQ, though as far as the current trajectory is going, they will not be gaining full language-hood any time soon

Just my two cents — 義順 (talk) 02:57, 18 February 2024 (UTC)Reply

@ND381 Just to be clear, does that mean you support option 1 of point 4? Theknightwho (talk) 14:42, 18 February 2024 (UTC)

ah yeah I misread what that said — yes, I would be in support of option 1 of the fourth point — 義順 (talk) 16:38, 18 February 2024 (UTC)

@ND381 What do you mean by "transitionary"? 釆 (talk) 11:31, 28 February 2024 (UTC)

I don't particularly know too much about Hokkien linguistics (I do Northern Wu) but from what I understand Amoynese and Taiwanese both exhibit features of both Zhangzhou and Quanzhou lects — 義順 (talk) 12:01, 28 February 2024 (UTC)

@ND381 I see. This is the common wisdom, I guess.

In truth, it makes little sense to pretend that "Zhangzhou" & "Quanzhou" are cardinal dialects. For one thing, there is a great deal of variation within what are supposed to be "Zhangzhou" Hokkien & "Quanzhou" Hokkien. Quemoy & Tâng-oaⁿ 同安 dialects of "Quanzhou" Hokkien, as a clear example, are themselves "transitional to Zhangzhou". So the entire "Zhangzhou-Quanzhou" framework is made of duct tape. "Zhangzhou-Quanzhou" reflects Confucian administrative loyalties more than anything else, as the English terminology (via Mandarin Pinyin) suggests. And the exclusion of Amoy Hokkien from "Quanzhou" is arbitrary & inconsistent in itself. So, there's "nothing there", even if certain isoglosses unsurprisingly bundle along the old prefectural border. 釆 (talk) 08:59, 29 February 2024 (UTC)Reply

Similar to ND381,

Support the first three points. The second subpoint of point 4 is a terrible idea, since it leaves out Zhangzhou-Quanzhou mixed varieties of Hokkien, which is one of the reasons why "Hokkien" isn't monophyletic. It's also unclear whether dialects like Jinjiang and Philippine Hokkien would be subsumed under Quanzhou. While we're at this, we would also need to see how certain other varieties of Min Nan are dealt with under the structure based on the first three points, namely Longyan (including Zhangping), Datian, Youxi, southern Zhejiang and Zhangzhou-based varieties spoken in Guangdong/Guangxi. While the Language Atlas of China groups Longyan with other Quan-Zhang varieties, it seems that it traditionally isn't considered "Hokkien". We might also want to see where Hailufeng Min fits here. (I'm writing this in a little rush, so there might be more points that come along after.) — justin(r)leung _{{ (t...) | c=› }} 14:30, 18 February 2024 (UTC)Reply

@Justinrleung No, "Longyan" is most definitely not part of Hokkien, either linguistically or sociolinguistically.

Hai Lok Hong Hoklo is clearly parallel to Hokkien & Teochew.

The Hokkien dialects of southern Zhejiang are clearly part of Hokkien.

Many or most pieces seem poised to fall into place. 釆 (talk) 11:36, 28 February 2024 (UTC)Reply

@釆 I agree with you on this - Longyan should definitely be treated separately. I omitted it from the proposal because I specifically wanted to address the issue of whether we should treat Southern Min as a family, so I only mentioned the codes we currently have. It’s not supposed to be comprehensive, and in fact I was hoping it could set the stage for further additions, as I thought this change should probably happen before we add anything else. Theknightwho (talk) 13:07, 28 February 2024 (UTC)Reply

No particular vote as I don't think I'm qualified enough to discuss about Southern Min here as I very rarely edit it, but I share similar views with ND and Justin based on my limited understanding of the internal structure of Southern Min after reading Kwok (2018).

I reckon the treatment of Zhongshan Min should perhaps also be discussed here, given that Glottolog treats it as a subbranch of Southern Min, although it seems like some of it is Eastern Min. Either way I think it will need a code. – wpi (talk) 14:09, 23 February 2024 (UTC)

Seconding this. Apparently, so-called "Zhongshan Min" is three mutually unintelligible languages, two of which may not belong to the NAN family (?) at all. 釆 (talk) 11:42, 28 February 2024 (UTC)Reply

I don't have much knowledge of the relationship between ZQX Hokkien and other Hoklo varieties like Chaozhou and Hainan.

However, the Amoy, Quanzhou, and Zhangzhou varieties are mutually intelligible to some extent. Linguistically, Amoy should be treated as a dialect of the ZQX language, just as the entry for Irish deirfiúr contains various pronunciations from different dialect locations in Ireland.

As for whether Taiwanese (Taigi) should be treated as a full language or as a dialect of Hokkien, it's something like the Serbo-Croatian language separation issue.--Yoxem (talk) 10:50, 28 February 2024 (UTC)

@Theknightwho

Supporting Item 2.

Not opposing Item 1 (nor Item 3) in this context, but — even disregarding misplaced outliers — how much evidence is there that these languages (say, Hainanese & Hokkien) belong to one family in a historical sense? (Wikipedia doesn’t treat Singlish & Jamaican Creole, for instance, as being in the same language family as English. Or do we use the term “family” differently around here?)

Supporting Item 4.1, excluding Taiwanese.

The “Zhangzhou-Quanzhou-Amoy” split reflects the mapping of Confucian loyalties. It corresponds somewhat to linguistic reality, but attempts to package “Zhangzhou” Hokkien & “Quanzhou” Hokkien in a systematic manner seem to give off more smoke than light, as suggested by Mar_vin_kaiser’s comment clarifying what “Zhangzhou Hokkien” should mean.

So so-called “Zhangzhou” Hokkien or “Quanzhou” Hokkien or Amoy Hokkien are all just Hokkien. The “Zhangzhou-Quanzhou” split reflects Confucian psychology, not linguistic reality, and “Amoy” was set up as a third group not for linguistic but for Confucian or face-related (“face truce”) reasons. If some words have lots of pronunciations, in part this reflects the sociolinguistic reality of a wide range of dialects being recognized as a single language. Also, marginal pronunciations seem to find their way into Wiktionary for Hokkien much more than for most other languages, but as long as they exist (and not just idiolectally) & are non-extinct, this is good & well. If extinct or poorly attested pronunciations are swelling the ranks, methods may need examined, but that’s for some other day.

There is something to be said for treating Penang-Medan Hokkien as another language. Even w/o getting into the genesis of Penang Hokkien, the phonology of the variety seems to bend the rules of plain Hokkien. But the convention seems to be to treat it as a dialect within Hokkien, and this in turn reflects the sociolinguistic reality. 釆 (talk) 11:54, 28 February 2024 (UTC)Reply

Pinging @Mar vin kaiser, Singaporelang, Mlgc1998, 幻光尘, LeCharCanon, MistiaLorrelay, Kangtw, The dog2, TagaSanPedroAko, Janinga Chang, Yoxem, 汩汩银泉, RcAlex36, Geographyinitiative for comment, who are all users who've edited recently that have some knowledge of Min Nan. Theknightwho (talk) 11:16, 27 February 2024 (UTC)Reply

Thanks for calling - but actually I'm not proficient on the historical & comparative linguistics of Minnan, so I'll report the opinion from @S.G.Junge1997 who is currently working on various Southern Han varieties (I'm doing so because he's currently suffering from IP block).

“As almost all the Sinitic languages that we discuss here, including Southern Min, Northern Wu and so-on, are de facto macrolanguages, it would be not proper to list just some variety of these macrolanguages as distinct languages while to consider other least-concerned languages a part of the huge dialect continuum, not mentioned the phonological, lexicological or genetic differences between the least-concerned varieties are much larger than these varieties with metropolitan native speakers. Janinga Chang (talk) 15:55, 27 February 2024 (UTC)Reply

...Taking Southern Min as an example, the macrolanguage Southern-Min itself is emerged among a group of coastal Min varieties in Dàtián, Fújiàn and surrounding area. Genetically, Southern Min can be divided into three varieties, the Western varieties used in Lóngyán and Zhāngpíng, Fújiàn Province, some remnants in Guǎngdōng Province (namely Zhōngshān Hokkien and some varieties of Leizhou Min), while the majority of Southern Min languages are in fact dialects of the massive Eastern varieties, including Chaozhou, Southern Min proper and Taiwanese Southern Min, these varieties shared a huge amounts of vocabularies and intelligibility, with only some of the characteristic vocabularies shared inside different branches. I'm not arguing about not list Chaozhou and Southern Min proper as different languages, but if one should consider listing Chaozhou and Quanzhou-Zhangzhou Southern Min or even Taiwanese Southern Min as separate languages appropriate, they must consider listing Dàtián qiánlù, Dàtián hòulù, Kǒngfūhuà, Sūbǎnhuà, Yànshí-Báishā, Lóngyán proper, Yǒngfú-Héxī, Zhāngpíng proper, Xīnqiáo-Xīnán and other small varieties concerned way less as distinct languages as well, (apart from Dàtián qiánlù and Dàtián hòulù, all these languages are different varieties of Western branch of the Southern Min which are using in different valleys around Lóngyán, most of which have less native speakers than 10k and are critically endangered, and although most of these languages share some common features, their differences in vocabularies and phonologies make them less intelligible internally than most of Eastern branch varieties, even not considering Chaozhou and Southern Min proper as different languages, some of these languages are still so diverse to be okay to be listed as separated) as it wouldn't be so appropriate to have "endangered" language varieties with often more than 1000k metropolitan native speakers listing as different languages while ignoring the real endangered languages with less than 10k native speakers and trying to hide their differences using a leftover garbage can discarded by the metropolitan people who think their language is absolutely unique.”

Although this might sound offensive to some who value the traditional Quanzhou-Zhangzhou-Amoy-Taiwan layout more, his opinion is definitely worth considering, since he has actually been to Longyan for fieldwork several times. Janinga Chang (talk) 16:05, 27 February 2024 (UTC)Reply

Hi! I Support the first three points, same as the ones above. I also reject the second subpoint of point 4 for the reasons mentioned. For the first subpoint of point 4, I support making Hokkien a full language. As for "etymology-only languages", I find it vague to say that a word from language X originates from "Zhangzhou Hokkien" when we've been using the term "Zhangzhou Hokkien" for the dialect specific to Zhangzhou city proper, and the word might not have been borrowed from Zhangzhou city proper. Seeing the reply of S.G.Junge1997, I'd be open to proposing Datian Min be listed as a separate language. --Mar vin kaiser (talk) 16:13, 27 February 2024 (UTC)Reply

@Mar vin kaiser Just FYI: "etymology-only language" is a misnomer; a much better description is "variant", as it covers everything from written standards like British English (en-GB) to chronolects like Old Latin (itc-ola) to regional varieties like Penang Hokkien (nan-pen). The thing that matters is that they're "part of" a full language (or, in some cases, another etym-only language). We already have codes for a few varieties of Hokkien, so that part isn't proposing anything new; just that they're nested under the new language code for Hokkien, instead of as sub-variants like they are now. Theknightwho (talk) 17:00, 27 February 2024 (UTC)Reply

@Theknightwho: Thanks for explaining! Then I see no problem with it. If ever, my question is why it should not be extended to Penang Hokkien, Singapore Hokkien, and Philippine Hokkien. --Mar vin kaiser (talk) 17:07, 27 February 2024 (UTC)Reply

@Janinga Chang Seconding parts of this. It was careless for all these varieties to have been anonymously swept into NAN w/o careful examination & debate beforehand. 釆 (talk) 12:09, 28 February 2024 (UTC)Reply

Support as well 1., 2., 3., and 4.1. as per further explanation of Theknightwho about variants under/part of Hokkien as a full language, e.g. Quanzhou, Zhangzhou, Xiamen, Penang, Singaporean, Philippine, Taiwanese, etc. etc. and also later expansion of no. 3 as well for the others under nan as a family to be their own as full languages under the nan family/branch of Min of Sinitic if they show divergent enough linguistic features and are realistically practically socially regarded by their speakers as separate from their closest of kin anyways by now, such as those mentioned above by Justinrleung and S.G.Junge1997 and those listed in the ISO pending request and other more there may be. Also, 4.2 is a bad idea due to there still being a lot of structurally similar or reasonably identical enough terms shared with these variants (ZXQ++) still tying them together despite some observable differences, whether in phonemic structure, vocabulary choices, tonal differences, and other tendencies of these variants. The gulf of difference with these variants (ZXQ++) is not yet like the difference with say what makes nan-hok, zhx-teo, nan-hai, zhx-lui, etc. different from each other, enough to definitively split them.

Also pinging as well other users I remember seeing them edit or create nan entries before: @Fish bowl, @Wikijb, @釆, @TongcyDai, @A-cai, @Hongthay for comment on this. Mlgc1998 (talk) 20:45, 27 February 2024 (UTC)Reply

Support the first three bullet points. RcAlex36 (talk) 04:30, 29 February 2024 (UTC)Reply

Thanks for calling and sorry for my bad English.

For point 4, I Support option 1 and do not support option 2.

As a native speaker (of ZC), I think the differences (between Zhangzhou Hokkien and other Hokkien tongues) are too small to split them into separate languages. I dare say they are just accents of Minnan/Hokkien.

For Teochew, Leizhou-ish and Hainanese, indeed their "ancestor" is not Min Nan, but they are also southern descendant languages of ancient Min, different from northern descendants like Fuzhou-ish.

(ZC: the Zhangzhou City accent of Hokkien)

MistiaLorrelay (talk) 10:06, 29 February 2024 (UTC)Reply

Split with option 1 of point 4, given the overwhelming support in the last two weeks. Taking inspiration from @Benwing2's process to split the Khanty languages above (see #Splitting Khanty Languages), I think this is what needs to happen:

Assign new language codes to Hokkien (nan-hbl) and Hainanese (nan-hnm), and change over Leizhou Min (zhx-lui → nan-luh) and Teochew (zhx-teo → nan-tws). For the sake of forward-compatibility, I've used the proposed codes from the pending ISO proposal, since that will make things simpler if they're accepted.
Assign a temporary family code to Min Nan (zhx-nan), which will be used while nan still exists as a language code.
Track any uses of the nan code.
Move all current {{nan-*}} templates to {{nan-hbl-*}}, since they all relate to Hokkien.
Convert any existing entries with the Min Nan headword to the relevant language (which I suspect will be Hokkien in the vast majority of cases, if not 100%).
Change any references to nan to use the appropriate code. Again, I suspect Hokkien will predominate.
Change any references to the existing etymology-only codes to use the appropriate code.
Delete nan as a language code, and add it as a family code, replacing the temporary code zhx-nan mentioned above.

At this point, I also suggest that we start a new thread to discuss any additional languages which should be added to the Min Nan family, as several have been suggested above. Theknightwho (talk) 18:52, 2 March 2024 (UTC)Reply
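
The migration steps listed above can be sketched as a small code-remapping helper. This is only an illustrative sketch in Python, not Wiktionary's actual module code: the function names `migrate_code` and `migrate_template_name` and the template name `nan-example` are hypothetical, and only the language codes themselves are taken from the list. Mapping every bare nan to Hokkien while flagging it for manual review is an assumption based on the expectation stated above that Hokkien will predominate.

```python
from typing import Tuple

# Direct renames named in the plan (old code -> new code).
DIRECT_RENAMES = {
    "zhx-lui": "nan-luh",  # Leizhou Min
    "zhx-teo": "nan-tws",  # Teochew
}

# Assumption: a bare "nan" is provisionally treated as Hokkien,
# but must still be checked by a human editor.
DEFAULT_FOR_NAN = "nan-hbl"


def migrate_code(code: str) -> Tuple[str, bool]:
    """Return (new_code, needs_review) for a language code."""
    if code in DIRECT_RENAMES:
        return DIRECT_RENAMES[code], False
    if code == "nan":
        return DEFAULT_FOR_NAN, True
    return code, False  # unrelated codes pass through unchanged


def migrate_template_name(name: str) -> str:
    """Rename nan-* templates to nan-hbl-*.

    Assumes, as the plan states, that every existing nan-* template
    relates to Hokkien; names already migrated are left alone.
    """
    if name.startswith("nan-") and not name.startswith("nan-hbl-"):
        return "nan-hbl-" + name[len("nan-"):]
    return name
```

Returning a review flag rather than silently rewriting nan keeps the "track any uses" step separate from the actual replacement, so ambiguous cases can be listed before anything is changed.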

I Support points 1-3. However, ZXQ, Taiwanese, Penang, Singapore, and Philippine are really just variable accents with some regional vocabulary, like English dialects throughout England (are all those words recorded in Wiktionary too? They can't be as separate languages though?). Here in Taiwan, Taiwanese is getting more and more standardisation as the years pass, but I agree with another post comparing it to Serbo-Croatian (all accents of a single Stokavian dialect). There are different regional words used in Taiwan, but we start to understand them all as synonyms and I don't even know anymore which words belong to which specific location, like 日頭花 vs 太陽花, or 葉仔 vs 樹葉 vs 樹仔葉 vs 樹葉仔. I frequently travel throughout Southeast Asia and try to use Taiwanese in Penang and Singapore as much as possible. As someone mentioned, Penang has some interesting phonology, but I'm still able to hold conversations with taxi drivers--they speak in their way and I in mine. Though in Penang I've encountered drivers who talk freely at length and at times I find it hard to understand some of the details--they probably understand Taiwanese better than the other way around due to television dramas. But this interaction would not be possible for Chaozhou, which I consider so different as a separate language, and also Hainan and Leizhou--the phonology is far too different and they grammatically use different words. I feel that adding all the various regional pronunciations for ZXQ/Taigi clutters Wiktionary, and I believe that a better unifying meta-spelling would be better that enables regional pronunciations to be deduced through a few simple rules. I think it's best to mention whether a location has a completely separate word for something, rather than providing multiple pronunciations of the same word/字/morphemes. I also dislike the clutter and use of "invented" alternate romanisations that are not widely used or accepted, nor can anybody actually read. 
POJ or better, TâiLô, function just fine. Kangtw (talk) 09:36, 5 March 2024 (UTC)Reply

Sorry, when I posted support above, the green + button did not automatically appear when I posted. In spite of that, please consider my vote. Kangtw (talk) 09:39, 5 March 2024 (UTC)Reply

@Kangtw The vote has actually already closed, but everyone seems to have shared your view that Hokkien shouldn’t be split and should be treated as one language, so that’s how I’ve been carrying it out. Theknightwho (talk) 18:34, 7 March 2024 (UTC)Reply

@Theknightwho what remains to be done here? Cat:Min Nan language looks mostly empty. This, that and the other (talk) 09:51, 2 October 2024 (UTC)Reply

Add etymology-only codes for Proto-Anglo-Frisian and Proto-North Sea Germanic

Latest comment: 1 year ago · 17 comments · 4 people in discussion

As variants of Proto-West Germanic. This should hopefully be relatively uncontroversial, since we already have a healthy number of entries in Category:Anglo-Frisian Germanic and Category:North Sea Germanic, and there's a need for these due to both (sub-)families being mentioned in various etymology sections:

Anglo-Frisian: English welkin (and potentially Viking), Old English hrīþer and metegian, Old Frisian hrīther and Saterland Frisian dusse.
North Sea Germanic: English bastard and this (and potentially geck and geek), and Old English efest and nigonwintre (and potentially uton and wīcing).

No doubt there are many more entries where these could be referred to. Theknightwho (talk) 02:19, 27 February 2024 (UTC)Reply

@Theknightwho Anglo-Frisian is a well-established clade but I'm not so sure about North Sea Germanic. Cf. Wikipedia's comment:

North Sea Germanic, also known as Ingvaeonic /ˌɪŋviːˈɒnɪk/, is a postulated grouping of the northern West Germanic languages that consists of Old Frisian, Old English, and Old Saxon, and their descendants.

Ingvaeonic is named after the Ingaevones, a West Germanic cultural group or proto-tribe along the North Sea coast that was mentioned by both Tacitus and Pliny the Elder (the latter also mentioning that tribes in the group included the Cimbri, the Teutoni and the Chauci). It is thought of as not a monolithic proto-language but as a group of closely related dialects that underwent several areal changes in relative unison.

Benwing2 (talk) 04:36, 27 February 2024 (UTC)Reply

@Victar as a major PWG editor.

Not to mention the fact that PWG is already pretty controversial (@Mårtensås had some strong opinions on the topic).

I don't think an etym-only code for either is needed at this time, as the supposed differences were very minor, and we don't represent it in our PWG entries afaik. So while the label signifies a term's distribution, it is still supposedly the same language as any other PWG reconstruction in the model we handle. Thadh (talk) 07:24, 27 February 2024 (UTC)Reply

I've never had a need for either, and North Sea Germanic is generally considered an areal grouping. -- Sokkjō 07:39, 27 February 2024 (UTC)Reply

I can see the argument against NSG, but there is very clearly a need for Proto-Anglo-Frisian based on the etymologies mentioned above. It’s not about whether any particular editor has a need for it themselves, and nobody is suggesting we create separate entries for them outside of PWG. Theknightwho (talk) 11:00, 27 February 2024 (UTC)Reply

@Theknightwho I see you created a category Category:Old Frisian terms derived from North Sea Germanic languages as well as Category:Elfdalian terms derived from North Sea Germanic languages and Category:Elfdalian terms derived from Anglo-Frisian languages. Why did you do that, since this discussion is far from resolved? Benwing2 (talk) 22:29, 27 February 2024 (UTC)Reply

@Benwing2 I've already removed the North Sea Germanic family, as I thought better of it. The question of whether we have an Anglo-Frisian clade is separate from whether we have a protolanguage for it (and that category was created back in November). Theknightwho (talk) 22:35, 27 February 2024 (UTC)Reply

Ignoring the fact that a genetic Anglo-Frisian family is disputed, as far as I'm aware, no one has published "Proto-Anglo-Frisian" reconstructions, not even Boutkan or Siebinga, so we wouldn't even have anyone to cite. -- Sokkjō 00:57, 28 February 2024 (UTC)Reply

@Sokkjo Then someone will need to deal with the etymology sections in those entries. Either we mention Anglo-Frisian reconstructions with a proper language code, or we don't mention them at all. Theknightwho (talk) 01:43, 28 February 2024 (UTC)Reply

Which entries, these: CAT:Anglo-Frisian Germanic? -- Sokkjō 02:11, 28 February 2024 (UTC)Reply

@Sokkjo English welkin (which refers to an "Anglo-Frisian Germanic" term), while Old English hriþer and metegian, Old Frisian hrither, and Saterland Frisian dusse all explicitly give Anglo-Frisian reconstructions. Theknightwho (talk) 02:15, 28 February 2024 (UTC)Reply

Amended. -- Sokkjō 04:16, 28 February 2024 (UTC)Reply

@Sokkjo You should also look at the entries mentioned in the North Sea Germanic list at the top of the thread. Once they're dealt with, I'll close this request as resolved. Theknightwho (talk) 06:44, 28 February 2024 (UTC)Reply

@Theknightwho Before resolving this, we need to clear up whether to let the existing 'Anglo-Frisian' family stand. You created it in November without discussion and it's not clear to me from this discussion whether there's consensus in its favor. Benwing2 (talk) 07:11, 28 February 2024 (UTC)Reply

@Benwing2 To explain the reasoning: I understood it to be an uncontroversial clade, which was reinforced by the existence of Category:Anglo-Frisian Germanic. I may have misunderstood the implications of that category, though. Theknightwho (talk) 07:26, 28 February 2024 (UTC)Reply

@Theknightwho I think what this shows is that all additions of clades, and more generally any addition of languages or families, needs discussion beforehand, no matter how uncontroversial it seems. Benwing2 (talk) 07:53, 28 February 2024 (UTC)Reply

@Theknightwho I see you also created the "High German" family back in November. Let me reiterate, you need to not create any more languages or families without discussion. Benwing2 (talk) 01:25, 1 March 2024 (UTC)Reply

Merging Tupinambá (tpn) into Old Tupi (tpw)

Latest comment: 1 year ago · 6 comments · 3 people in discussion

Tupinambá has only 3 entries, i, pá and ý, which are already covered by the Old Tupi entries i, pá and 'y/y. Also, Old Tupi is used as an umbrella term for all Tupi dialects on Wiktionary, so having a separate heading for Tupinambá doesn't make much sense. Trooper57 (talk) 17:11, 9 March 2024 (UTC)Reply

I also wanted to merge Tupinikin (tpk) for the same reason; I just realised there's a page for it. This one is basically blank, except for an empty maintenance category. Trooper57 (talk) 21:15, 9 March 2024 (UTC)Reply

tpw (Old Tupi) got merged into tpn (Tupinambá) in 2022, so we should probably follow suit. I don’t really understand why Tupinikin (tpk) should be merged, though. Theknightwho (talk) 21:52, 9 March 2024 (UTC)Reply

It's the same case as Tupinambá: what they call "Tupinikin language" is the variant of Old Tupi spoken by the Tupinikin people. I called them dialects, but the difference is like General American to Southern American English: they differ in pronunciation at some points and call some things by different words, but aren't languages on their own. The category is just gonna stay blank forever as all lemmas will be put in Old Tupi anyway. Also, both Tupinambá language and Tupiniquim language redirect to Tupi language on Wikipedia.

About the code, I chose tpw over tpn because I prefer the name "Old Tupi", since it's neutral. I don't mind changing the code if we keep the name. Trooper57 (talk) 22:44, 9 March 2024 (UTC)Reply

@Trooper57 For reference, ISO merged Old Tupi and Tupinambá into tpn, and the code tpw was deprecated. It also seems that all varieties of Tupi are extinct. If Tupinambá & Old Tupi are not significantly different from Tupiniquim, perhaps they should all be merged into Tupi? - سَمِیر | Sameer (مشارکت‌ها · بحث) 21:54, 9 March 2024 (UTC)Reply

It seems theknightwho already said that while I was typing so my comment is now pointless 😞. - سَمِیر | Sameer (مشارکت‌ها · بحث) 21:56, 9 March 2024 (UTC)Reply

Additional Southern Min languages

Latest comment: 1 year ago · 8 comments · 5 people in discussion

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381, Benwing2): Following the various discussions relating to Min in the last month or so, now seems a good time to propose the additional Southern Min varieties which we've been missing:

Zhenan Min (nan-zhn)
Datian Min (nan-dtn)
Longyan Min (nan-lnx) - sometimes grouped as part of Hokkien
Sanxiang Min (nan-zsh) - one of the Zhongshan Min lects; the other two are apparently Eastern Min
Swatow Min (nan-swt) - also known as Shantou
Hoklo Min (nan-hlh) - also known as Hailufeng or Haklau Min; currently etym-only but should be made a full language
Proto-Southern Min (nan-pro) - see Appendix:Proto-Southern Min reconstructions

Although we will want codes for all of these, it might not be desirable to count all of them as separate languages. I also suspect the list is far from complete. Theknightwho (talk) 19:32, 10 March 2024 (UTC)Reply

Support although (a) are we stuck with the above codes (i.e. they are proposed ISO 639 standard codes)? If not some of them could stand to be rationalized; (b) we should clarify earlier rather than later whether these should be full or etym codes (although for Chinese I suppose it makes less difference than elsewhere as the L2 header used is always "Chinese"). Benwing2 (talk) 19:37, 10 March 2024 (UTC)Reply

Swatow Min is classified under Teochew, so we do not need additional codes for it. The term "Hoklo" is a bit ambiguous because Hokkien speakers will consider "Hoklo" to refer to Hokkien. The dog2 (talk) 19:44, 10 March 2024 (UTC)Reply

@The dog2 The difficulty with "Teochew" as a name is that it refers to two different things: (1) what Wikipedia calls Chaoshan Min as a whole, and (2) the specific lect as spoken in Chaozhou, which it calls the Teochew dialect. We will still need a code for it either way, but the question is whether it should be an etymology-only code or a full language code. Theknightwho (talk) 19:54, 10 March 2024 (UTC)Reply

The first definition of "Teochew" already has a code for it. It is "zhx-teo". But I'd be open to changing it to be in line with that of the other Southern Min dialect. In Southeast Asia, the term "Teochew" in common parlance is generally understood to refer to the first definition. The dog2 (talk) 20:00, 10 March 2024 (UTC)Reply

@The dog2 Yeah, that makes sense. Just as a side point, the Teochew code was changed to nan-tws with the split of Min Nan, because it makes sense to give all the Southern Min codes the nan prefix, and the pending ISO code is tws. Theknightwho (talk) 20:21, 10 March 2024 (UTC)Reply

@Theknightwho: Thanks for starting this discussion. There are a few issues here.

Zhenan Min might be a confusing name because Southern Zhejiang has both Southern Min and Eastern Min varieties; we may want to look into what other names we can use.
Datian Min might need to split further into Qianlu and Houlu dialects.
Does Longyan Min cover all Southern Min varieties spoken in the prefecture city of Longyan? Otherwise, there are several (sub)varieties of Longyan Min.
Swatow/Shantou should probably not be separate from Teochew - it's rare to consider them different varieties.
I personally prefer Hailufeng over Hoklo for the varieties of Southern Min spoken in Haifeng/Lufeng, since Hokkien may also be called Hoklo.

— justin(r)leung { (t...) | c=› } 20:11, 10 March 2024 (UTC)Reply

@Theknightwho

1. “Zhenan Southern Min” lies within Hokkien, both sociolinguistically & in terms of intelligibility. It’s pretty much an overseas cluster of Hokkien (and not only b/c it arrived by sea), and should be discussed in that context.

2. Yes, but “Datian Min” is not one language. Which “Datian Mins” belong within “Southern Min” (in any meaningful sense) is a question yet to be thoroughly considered.

3. Yes. “Longyan Min” is sociolinguistically not-Hokkien as well as mutually unintelligible vs Hokkien.

4. Yes. (Not sure if the other two are “Eastern Min”, but that’s a whole other ballgame.)

5. Swatow “Min” is part of Teochew, as others have pointed out.

6. Yes, most definitely. BTW, “Hoklo” refers to the language cluster that includes this language, Hokkien, Teochew, Taiwanese, & maybe others. So “Hoklo” & “Haklau” would be cognate non-synonyms, kind of like “Thai” & “Tai”, but not as striking.

7. Maybe the supposed proto-language should be fleshed out first? (+ I apologise if this is obvious, but Kwok’s “reconstructions” seem to be something quite different from what we usually mean by reconstruction. Also note (as with the ONESELF line) how much data it just flat-out ignores or omits (in this case perhaps in order to hang on to the presumed characters-of-etymology 家 & 己).) 釆 (talk) 13:45, 11 March 2024 (UTC)Reply

Beserman

Latest comment: 1 year ago · 11 comments · 4 people in discussion

(Notifying Thadh, Tropylium, Surjection): Recently I’ve been adding Beserman Udmurt entries (Category:Beserman Udmurt), and Beserman seems less similar to Udmurt than I initially expected (at least in terms of vocabulary and phonology). Beserman is usually considered to be a 'special' dialect of Udmurt, and it has recently also acquired its own written standard. As far as I can see, it definitely seems more convenient to create separate Beserman entries. I'm afraid that, if we don't, Udmurt might get pretty messy, with a Beserman alternative form for most Udmurt entries. A lot of information on the Beserman dialect can be found on http://beserman.ru/. I'll be glad to hear your opinions on this. Илья А. Латушкин (talk) 19:52, 13 March 2024 (UTC)Reply

At minimum, most of the Beserman entries so far should not be listed as synonyms. Most are simply the result of a regular sound change from ы /ɨ/ to ө. Currently it seems this is also transcribed on here as /ʌ/ and transliterated as å, where at least the latter seems weird; most often I have seen the sound described as /ə/ (= Finno-Ugric transcription ə̑, which beserman.ru also seems to use). In any case, these could be easily accommodated similar to differences between e.g. English dialects, as alternate pronunciations + spellings (besides, this is not unique to Beserman but is paralleled by other dialects). A few other phenomena also come down to simple systematic pronunciation differences, e.g. the replacement of ӧ by /e/. It is unclear to me (and per current literature, it seems, also to Uralistics at large) how much else really differs between Beserman and even standard Udmurt. --Tropylium (talk) 20:07, 13 March 2024 (UTC)Reply

@Tropylium: The usage of {{synonym of}} stems from my usage of that format in Komi Izhma entries, e.g. асывыы (asyvyy). It's probably indeed a good idea to mark them as altforms, but the issue I have is mostly that Komi Izhma is actually semi-standardised alongside standard Komi, and the same issue is also present in Beserman.

On the differences between it and standard Udmurt, I honestly can't say a lot as I haven't worked too much with the language. It does feature some unique sound changes from the Proto-Permic language that set it apart from the other Udmurt dialects, like being the only Permic lect to (consistently) differentiate between the reflexes of *u and *ü. It also seems to have a national identity separate from other Udmurts. But other than that I would have to refer to Ilya, as they've worked with the language more closely. Thadh (talk) 20:47, 13 March 2024 (UTC)Reply

Sorry, whose *ü and where? Beserman has a few unique-looking cases of /ə/ (< ? *ɨ), but only in words where southeastern Udmurt more generally also shows /ʉ/ (the generally accepted historical scenario is that Beserman arises from the SE dialects of Udmurt, after a migration towards the north leaves them slightly isolated). --Tropylium (talk) 21:03, 13 March 2024 (UTC)Reply

Lytkin's. I'm talking of words like мөнөнө (månånå, “to go”) and зөмөнө (zåmånå, “to dive”). And I do take issue with your identification of the vowel as being a schwa, it most definitely isn't one. If you listen to actual recordings I think you'll agree that it is a low vowel, sometimes even as open as . Thadh (talk) 21:30, 13 March 2024 (UTC)Reply

/ə/ is not my identification but what reference literature insists calling it, e.g. the late Keľmakov's monographs on Udmurt dialectology like Udmurtin murteet (1994), Диалектная и историческая фонетика удмуртского языка (2003). A lot of beserman.ru's recordings do sound more like or , I agree. This could be a recent development, also e.g. the loss of ӧ is only post-WW1. --Tropylium (talk) 20:43, 14 March 2024 (UTC)Reply

Overall Permic languages have undergone some shifts in the recent century, also including the delabialisation of ӧ (ö) in practically all varieties of Komi. Since we are primarily a descriptive dictionary of the modern languages (earlier stages are a bonus!) I think we should stick to the modern pronunciation. The transcription of the vowel as å was taken over from Komi-Yazva, which has a very similar vowel written the same way. Thadh (talk) 09:07, 15 March 2024 (UTC)Reply

I know nothing about Udmurt, but I do agree that unless and until Beserman is considered a separate language, its entries should be formatted along the lines of {{alt form|udm|аску|from=Beserman}} rather than as synonyms of primary-dialect forms. —Mahāgaja · talk 21:40, 13 March 2024 (UTC)Reply

@Tropylium I have found some other sound correspondences between Udmurt and Beserman:

1. йырси ~ йөрчө 'hair', кырси ~ көрчө 'son-in-law'

2. кеч ~ кесь 'goat', ӟуч ~ дюсь 'Russian'

3. син ~ синь 'eye', кин ~ кинь 'who', нин ~ нинь 'linden'

4. тэй ~ тей 'louse', дӥсь ~ дись 'clothes', дэрем ~ дерем 'shirt'

5. ӝӧк ~ ӟек 'table', ӝыт ~ ӟөт 'evening', ӝужыт ~ ӟужөт 'high'

6. ньөм ~ ним 'name', йөвор ~ ивор 'news'

7. сылал ~ слал 'salt', плем ~ пилем 'cloud'

Илья А. Латушкин (talk) 18:24, 14 March 2024 (UTC)Reply

FWIW most of this is also within normal phonetic variation for Udmurt dialects, the /Te/ > /Tʲe/ change is the only systematic feature I don't recall seeing reported before (makes sense though, helps for not entirely losing the э/ӧ contrast).

One thing to consider is that even if we created Beserman separately, we'd then still want to note all forms like these in Udmurt entries, just now as etymological cognates rather than pronunciation variants. It might not save substantial work altogether. The etymologist in me at least thinks this would be probably the nicer option though, if you're already creating separate entries anyway. And it would be more consistent also with how we have split Komi-Zyrian and Komi-Permyak, instead of treating them as variants of single "Komi". --Tropylium (talk) 19:43, 14 March 2024 (UTC)Reply

The same thing has come to my mind as well, and at first sight the differences between Komi-Zyrian and Komi-Permyak do not seem to be much larger than those between Udmurt and Beserman.

I've found two more sound correspondences (1. ӟуч ~ дюсь 'Russian', ӟеч ~ десь ‘good’, 2. ньыль ~ ниль ‘four’, выль ~ виль ‘new’) and some Beserman words not found in standard Udmurt (most of them Turkic loanwords), eg. бикем ‘aunt’, биягам ‘husband's older brother’, бийөм ‘mother-in-law’, ўармиська ‘brother-in-law’, писяй ‘cat’ (also found as ‘писэй’ in dial. Udmurt), … Also some other, more sporadic, vowel correspondences have come up: изьыны ~ узьөнө ‘to sleep’, губи ~ гиби ‘mushroom’, чорыг ~ чорог ‘fish’, сюрес ~ сьөрес ‘road’, бугро ~ бөгра ‘felling’, … Илья А. Латушкин (talk) 08:50, 15 March 2024 (UTC)Reply

More etym codes for Chinese varieties, part 1

Latest comment: 1 year ago · 4 comments · 3 people in discussion

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho Hopefully this ping isn't too noisy. There are two more sources of Chinese lects here at Wiktionary that I have found that may need etym-only codes: qualifiers in thesaurus entries and labels in Module:labels/data/lang/zh. The following table is derived from thesaurus qualifiers (I computed this as part of converting nan codes and qualifiers to appropriate lect codes):

Qualifier	Count	Comment	Wikidata entry (if any)
ACG	1	Does this mean "Anime, Comics, Gaming"? Not a lect.
Anxi Hokkien	2	Need lect code?
Australia	1	Ambiguous
Buddhism	5	Not a lect
Buddhist temple	8	Not a lect
Chinese landscape garden	1	Not a lect
Christianity	1	Not a lect
Classical Chinese or in compounds	1	Ambiguous
Classical Chinese	59	Ambiguous
Classical	8	Ambiguous
Eastern Min; Southern Min	1	Ambiguous
Fuzhou	1	Ambiguous
Guangdong	1	Ambiguous?
Guiyang	1	Need lect code? Per w:Southwestern Mandarin, a subvariety of the Kun-Gui variety of Southwestern Mandarin	Q15911623
Harbin Mandarin	1	Need lect code; a variety of Northeastern Mandarin	Q1006919
Harbin	2	(same as above)
Hong Kong	24	Ambiguous
Hong Kong><tr:pot¹	1	Ambiguous
Hsinchu & Taichung Hokkien	1	??? Do we need two lect codes? Wikidata has a "Taichung Accent" (Q10914070) but it is a variety of Mandarin; can't find Hsinchu Hokkien in Wikipedia or Wikidata
Internet slang	9	Not a lect
Internet	2	Not a lect
Japanese calligraphy	1	Not a lect
Jilu Mandarin	1	Need lect code; primary subdivision of Mandarin	Q516721
Jinhua Wu	1	Need lect code	Q13583347
Korean calligraphy	1	Not a lect
Liuzhou Mandarin	2	Need lect code?	Q7224853
Liuzhou	1	(same as above)
Longyan Min	2	Need lect code (but will likely be transitioning to a full language, see #Additional Southern Min languages); per Wikipedia, a variety of Hokkien, but that may be wrong	Q6674568
Luoyang Mandarin	1	Need lect code; a variety of Central Plains Mandarin	Q3431347
Luoyang	3	(same as above)
Macau	2	a variety of Cantonese? Do we need a lect code?
Mainland China	3	Ambiguous
Mainland	2	Ambiguous
Malaysia	11	Ambiguous
Mandalay Taishanese	1	an overseas variety of Taishanese; Do we need a lect code?
Min	12	Ambiguous
Muping Mandarin	1	Do we need a lect code? This may be a variety of Shandong Mandarin (Q3285432)
Muping	2	(same as above)
Nanchang Gan	1	Need lect code	Q3497239
Northern China	1	Ambiguous
Northern Mandarin	2	Ambiguous
Philippines	1	Ambiguous
Pinghua	1	Ambiguous
Pingxiang Gan	3	Do we need a lect code? A variety of Yiliu Gan Chinese (Q8053438)
Qing Dynasty	1	Not a lect
Sichuanese or Internet slang	1	Sichuanese = zhx-sic; Internet slang = not a lect
Singapore	13	Ambiguous
Son of Heaven	2	What is this? Not a lect.
Southeast Asia; dated or dialectal in Mainland China	1	Ambiguous
Southwestern Mandarin	2	Need lect code	Q2609239
TCM	3	Traditional Chinese Medicine? Not a lect.
Taichung & Tainan Hokkien	1	Do we need a lect code or two? See above under "Hsinchu & Taichung Hokkien" for Taichung Hokkien. Tainan Hokkien is mentioned in Wikipedia as being the prestige dialect of Taiwanese Hokkien but can't find it in Wikidata.
Tainan Hokkien	1	(see above)
Taiwan	24	Ambiguous
Taiwanese	2	Ambiguous
Taiyuan	1	Need lect code? Variety of Jin Chinese	Q10941068
Taoism	1	Not a lect
Thailand	2	Ambiguous
Urumqi	2	Need lect code? Variety of Lanyin Mandarin	Q10878256
Wanrong	1	~~This is a mountain indigenous township in Taiwan; I don't know what lect is being referred to, and whether it's even Chinese~~ Refers to Wanrong County in Shanxi; a variety of Central Plains Mandarin, mentioned in the Great Dictionary of Modern Chinese Dialects; apparently a subvariety of Fenhe Mandarin (Q10379509)
Xi'an Mandarin	1	subvariety of Guanzhong Mandarin (Q3431648); not sure if it needs to be distinguished from Guanzhong	Q123700130
Xi'an	1	(same as above)
Xinzhou	3	Need lect code? Variety of Jin Chinese, doesn't seem to have Wikidata entry
Yinchuan	1	Need lect code? Variety of Lanyin Mandarin
Yongchun Hokkien	1	Need lect code?	Q65118728
Yudu Hakka	1	Need lect code?	Q19856416

There are 14 lects among the above qualifiers with Wikidata entries that I could find, and some others apparently without Wikidata entries that might need a code. Benwing2 (talk) 03:12, 18 March 2024 (UTC)

@Benwing2 Thanks for putting this together. On Longyan Min in particular, it's likely going to be separated out as a full language as per #Additional Southern Min languages, despite Wikipedia calling it a variety of Hokkien. Theknightwho (talk) 03:27, 18 March 2024 (UTC)

@Theknightwho Ah, I see that now, thanks. Benwing2 (talk) 03:33, 18 March 2024 (UTC)

@Benwing2: Wanrong refers to Wanrong County in Shanxi; this is a variety of Mandarin (Central Plains IIRC). – justin(r)leung (talk) 03:32, 18 March 2024 (UTC)

More etym codes for Chinese varieties, part 2

@Theknightwho, Justinrleung Only pinging the people who responded to part 1 above. Here are the uncoded Chinese varieties with labels in Module:labels/data/lang/zh. As above, some have Wikidata items and some are too unspecific or ambiguous to turn into etym-only lects. Some are also clearly full languages or even families.

Canonical label	Label aliases	Comment	Wikidata item (if any)
`dialectal Cantonese`	—	Not specific enough
`Changzhounese`	`Changzhou dialect`, `Changzhou Wu`	subvariety of Northern (Taihu) Wu	Q1021819
`Chuzhou Wu`	`Chuzhou dialect`, `Lishuinese`, `Lishui dialect`, `Fujian Wu`, `Lishui Wu`	a variety of Chu-Qu Wu, a Southern Wu language; confusable with Quzhou Wu; not in Wikidata?
`Coastal Min`	`coastal Min`	Not specific enough
`Datian Min`	—	likely becoming a full language	Q19855572
`dialectal Eastern Min`	`dialectal Min Dong`	Not specific enough
`Gansu Dungan`	—	basis of the Soviet written standard for Dungan; not in Wikidata?
`dialectal Gan`	—	Not specific enough
`Guangxi Mandarin`	—	This is possibly the same as Guiliu (Gui-Liu) Mandarin (supervariety of Guilin Mandarin)	Q11111664
`dialectal Guangxi Mandarin`	—	Not specific enough
`dialectal Hakka`	—	Not specific enough
`Hong Kong Hakka`	—	Mentioned in the Wikipedia w:Hakka Chinese article	Q2675834
`Huzhounese`	`Huzhou dialect`, `Huzhou Wu`	subvariety of Northern (Taihu) Wu	Q15901269
`Inland Min`	`inland Min`	Not specific enough
`Jianghuai Mandarin`	`Jiang-Huai Mandarin`, `Lower Yangtze Mandarin`, `Huai`	primary branch of Mandarin	Q2128953
`Jiaoliao Mandarin`	`Jiao-Liao Mandarin`	primary branch of Mandarin	Q2597550
`Jilu Mandarin`	`Ji-Lu Mandarin`	primary branch of Mandarin?	Q516721
`dialectal Jin`	—	Not specific enough
`Korean Classical Chinese`	—	Not quite sure what this is and how to classify it; one of the Module:zh-usex/data lects that was skipped
`Linshao Wu`	`Linshao`, `Linshao dialect`, `Lin-Shao Wu`, `Lin-Shao dialect`, `Lin-Shao`	subvariety of Northern (Taihu) Wu; not in Wikidata?
`Liuzhou Mandarin`	—	a variety of Southwestern Mandarin	Q7224853
`dialectal Mandarin`	—	Not specific enough
`Min`	—	Not specific enough
`Nanning Pinghua`	—	a variety of Southern Pinghua Chinese; not in Wikidata?
`North America`	`North American`	Not specific enough
`Pinghua`	—	A family, not a language
`Shaoxing Wu`	`Shaoxingnese`, `Shaoxingese`, `Shaoxing dialect`	variety of Linshao Wu, in turn a variety of Northern (Taihu) Wu	Q7489194
`Shehua`	—	its own branch of Chinese	Q24841605
`Shuangfeng`	—	dialect of Old Xiang	Q10911980
`Siyi`	—	a Yue language? Includes Taishanese	Q2391679
`Southern Min`	`Min Nan`	Not specific enough
`dialectal Southern Min`	`dialectal Min Nan`	Not specific enough
`Southern Wu`	—	appears to be a Wu subfamily, including at least three languages
`Standard Written Chinese`	`SWC`	Per User:justinrleung, this refers to Standard Mandarin = Putonghua, different from Written vernacular Chinese which refers to the standard written vernacular varieties of the Qing and Ming dynasties, as opposed to Classical/Literary Chinese (NOTE: Wikipedia's Standard Written Chinese confusingly redirects to Written vernacular Chinese, and Wikipedia's article on that covers time periods from the Ming dynasty to the present, not just through the end of the 19th century)	Q727694
`Sujiahu`	`Su-Jia-Hu Wu`, `Sujiahu Wu`, `Su-Jia-Hu`	a subvariety of Northern (Taihu) Wu
`Vietnamese Classical Chinese`	—	Not quite sure what this is and how to classify it; one of the Module:zh-usex/data lects that was skipped
`dialectal Wu`	—	Not specific enough
`Wuzhou Wu`	`Jinhua dialect`, `Jinhuanese`, `Wuzhou`, `Wuzhou dialect`, `Jinhua Wu`	one of the Southern Wu languages	Q2779891
`dialectal Xiang`	—	Not specific enough
`Xinjiang`	—	subvariety of Lanyin Mandarin? Includes Urumqi Mandarin (Q10878256)
`Xinqu Wu`	`Quzhounese`, `Quzhou dialect`, `Shangraonese`, `Shangrao dialect`, `Xinzhou dialect`, `Xinzhou Wu`, `Quzhou Wu`, `Shangrao Wu`	a variety of Chu-Qu Wu, a Southern Wu language	Q6112429

Benwing2 (talk) 04:32, 18 March 2024 (UTC)

@Benwing2: Huzhounese is Q15901269. Guangxi Mandarin should be approximately the same as Guiliu Mandarin, which is Q11111664. Hong Kong Hakka is Q2675834. Standard Written Chinese is usually referring to the modern standard, whereas Written Vernacular Chinese seems to refer to written vernacular Mandarin in the Yuan, Ming and Qing dynasties.

BTW, Xinzhou dialect as an alias for Xinqu Wu is problematic, since Xinzhou is ambiguous. Xinzhou Jin is a completely different variety from a different Xinzhou. – justin(r)leung (talk) 06:19, 18 March 2024 (UTC)

@Justinrleung Thank you for finding those entries! I think we should remove all aliases that read 'Foo dialect' and consider only allowing aliases that include the language name in them. It is unfortunate that Wikipedia puts the primary entries for various Chinese lects under 'Foo dialect' instead of 'Foo Wu', 'Foo Jin', etc. for precisely the reason you mention. Even in the case of the same location mentioned, it's quite possible for a given location to have multiple dialects of different languages. Benwing2 (talk) 07:02, 18 March 2024 (UTC)

@Benwing2: Thanks for tabulating these.

re: removing aliases that read 'Foo dialect', there are some dialects whose affiliation is not extremely clear, e.g. Huizhou dialect (not to be confused with Huizhou Chinese which is czh) and so we labelled it as "Huicheng dialect" ("Huizhou dialect" would also work but that will certainly be confused with czh).

Often the labels are used to achieve the desired display text rather than for categorization, which is why there is a relatively large number of |_| in {{lb|zh}}. One slightly extreme example would be 鐳#Etymology 2 sense 3,

{{lb|zh|Malaysia|&|Singapore|_|Cantonese|Hakka|Southern Min|;|Xiamen|Quanzhou|Zhangzhou|_|Hokkien|;|slang|_|in|_|Hong Kong Cantonese}}

, which is actually representing a large number of lects but it's not categorised properly due to the limits of {{lb}}. This is why sometimes you will find labels like {{lb|zh|Taiwan Hokkien and Hakka}} so that the desired result is achieved, even though it should actually be {{lb|zh|Taiwanese Hokkien|Taiwanese Hakka}}.

I would suggest searching for additional items in the form of {{lb|zh|Foo|_|Cantonese}} or {{lb|zh|Bar|_|Wu}}, which should unveil more uncoded dialects, some of which may already be covered in the previous section (e.g. something as mundane as {{lb|zh|Xiamen Hokkien}} isn't a recognized label, so it is often entered as {{lb|zh|Xiamen|_|Hokkien}}). (This is also why there is a relative abundance of Wu dialects in the labels data, probably the result of some dedicated user who added them.)

I'll go over the actual individual lects later. – wpi (talk) 12:55, 18 March 2024 (UTC)

Personally I prefer to assign full language codes to a group, while the representative dialect(s) spoken in a specific place will have an etym-only code.

Australia, Malaysia, Singapore, Thailand etc.: these may need a code for each lect (as appropriate), e.g. Malaysian Cantonese, Thailand Teochew (Malaysia may need to be further subdivided by location; we already have Penang Hokkien)
Guangdong: usually means Cantonese + Teochew + maybe Taishanese + maybe Leizhou + maybe Hainan; this should be replaced accordingly
Hong Kong, Macau: usually refers to the standard form of Chinese (not necessarily Cantonese, but often somewhat influenced by Cantonese) spoken in HK/Macau respectively
Taiwan: similar to above
Hsinchu & Taichung Hokkien: there may be some need to create code for the Taiwanese Hokkien dialects, but I'll defer to others for this (but IIRC Hsinchu is predominantly Hakka speaking?)
Mandalay Taishanese: might need a code but probably won't be used much
Shehua: a branch parallel to Neo-Hakka (which we call Hakka/which is the only part of "Hakka" that we have coverage of), "She" is likely the more common academic term (but this clashes with She the Hmong-Mien language, both names share the same etymology).
- (the ancestor of Neo-Hakka and She is parallel to Paleo-Hakka, but this is another rabbit hole, plus coverage of it is relatively poor)
Anxi Hokkien, Yongchun Hokkien, Muping Mandarin, Wanrong: these seem too minor to be assigned a code? I'm not certain, however.

Some comments (partly based on my observation of the usage in {{lb|zh}} and also based on our plans to increase coverage of dialects), grouped by branch:

Gan: label-wise we usually have Nanchang, Lichuan, Pingxiang, Taining, Yongxiu. These are all locations rather than subgroups (my understanding is that the subgrouping of Gan is quite undeveloped). It's worth noting that our Gan coverage is extremely lacking (due to both lack of data and lack of motivated editors), and most likely we will only have these four locations in the foreseeable future.
Hakka: Sixian may need to be divided into North Sixian/South Sixian. We might also want to add the rest of the Taiwanese Hakka dialects. Coverage of Yudu Hakka and Hong Kong Hakka seems OK.
Huizhou: this group is too small to have any meaningful subdivision; I think at most we can assign a code to Jixi.
Jin: I think we could have Taiyuan and Xinzhou. The other dialects have poorer coverage. (I didn't find any usage of Xinzhou Wu.)
Wu: besides the mentioned ones, we may also need Danyang Wu? I'll defer to ND381 and Musetta6729.
Eastern Min: the representative dialect is Fuzhou; other possible inclusions would be Fuqing and maybe Ningde. The rest seem too sporadic.
Xiang: Changsha, Shuangfeng, Loudi and Hengyang are major dialects. The coverage situation is similar to Gan.
Mandarin: the ones mentioned should be added generally.
Pinghua: Southern Pinghua is usually considered to be part of Yue. Worth noting Nanning Pinghua and Nanning Cantonese are different though.
Cantonese/Yue: I think we should add Siyi Yue and demote Taishanese to a variety of it. The usage of to refer to Cantonese or Yue is pending discussion. Other ones that could be added include Yangjiang and Dongguan, while the rest seem to have relatively poor coverage.
Southern Min is already dealt with elsewhere
Puxian Min: I believe this can have Putian and Xianyou?

– wpi (talk) 16:37, 18 March 2024 (UTC)

@Wpi Thank you for all the details! I just realized there is a third source of varieties here at Wiktionary, which is the dialectal data found in the data modules for {{zh-dial}}, specifically Module:zh/data/dial. For example, under 討食 / 讨食 you have a whole set of "dialectal synonyms of 要飯 / 要饭 (yàofàn, “to beg for food”)" in addition to the Thesaurus entries for 乞討 / 乞讨 (qǐtǎo) fetched using {{syn-saurus}}. Ultimately IMO we should probably merge the dialectal data in the {{zh-dial}} modules with the Thesaurus entries, but that is another can of worms. For now I'll just note that the {{zh-dial}} data conveniently comes with links to English or Chinese Wikipedia entries so it should be easy to find the relevant Wikidata items. *HOWEVER*, there are an absolute ton of varieties listed; I count 1,122 of them currently. (Of these, 969 have Wikipedia links, but many of these links are to geographic entries rather than dialectal entries.) I doubt all of these varieties need to be assigned etym-only codes. I think one way to pare them down is to go through the dialectal data and count how many synonyms there are for each variety. This should reveal which varieties are important enough to warrant codes (I imagine a lot of the varieties listed have no synonyms at all in the data). Benwing2 (talk) 22:32, 18 March 2024 (UTC)

Please see User:Benwing2/zh-dialect-counts. This table lists all the varieties/dialects found among the dialectal synonym data along with counts, the Chinese dialect group they're in and the Wikipedia link, if any. (There are 2,787 terms currently listed in the data.) I'm thinking we can start with the first 100 or 200 varieties listed, figure out what to do with them, and go from there. Also, the script I wrote to combine the counts with the variety data in Module:zh/data/dial output the following warnings concerning varieties for which there are synonyms but which aren't in Module:zh/data/dial:

WARNING: Found variety 'Luoyang' not in variety data
WARNING: Found variety 'Zhumadian' not in variety data
WARNING: Found variety 'Pingdingshan' not in variety data
WARNING: Found variety 'Zhoukou' not in variety data
WARNING: Found variety 'Xuchang' not in variety data
WARNING: Found variety 'Nanyang' not in variety data
WARNING: Found variety 'Luohe' not in variety data

Benwing2 (talk) 23:24, 18 March 2024 (UTC)
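
(Editorial illustration, not part of the original comment.) The counting step described above can be sketched roughly as follows. This is a minimal sketch, not the actual script: the function name `count_synonyms_per_variety`, the shape of the two dictionaries, and the toy data are all assumptions made for illustration, and the real data in Module:zh/data/dial is structured differently.

```python
from collections import Counter


def count_synonyms_per_variety(synonym_data, variety_data):
    """Count synonyms per variety and flag varieties missing from the variety data.

    synonym_data: hypothetical mapping {term: {variety_name: [synonym, ...]}}
    variety_data: hypothetical mapping {variety_name: metadata}
    """
    counts = Counter()
    for by_variety in synonym_data.values():
        for variety, synonyms in by_variety.items():
            if synonyms:  # skip varieties listed with no synonyms
                counts[variety] += len(synonyms)
    # Varieties that have synonyms but no entry in the variety data
    warnings = [
        f"WARNING: Found variety '{variety}' not in variety data"
        for variety in counts
        if variety not in variety_data
    ]
    return counts, warnings


# Toy input with placeholder synonyms (not real dialect data)
synonym_data = {
    "term-1": {"Beijing": ["syn-a"], "Luoyang": ["syn-b"], "Taipei": []},
    "term-2": {"Beijing": ["syn-c"]},
}
variety_data = {"Beijing": {}, "Taipei": {}}

counts, warnings = count_synonyms_per_variety(synonym_data, variety_data)
for variety, n in counts.most_common():
    print(variety, n)
for line in warnings:
    print(line)
```

Sorting by count (here via `most_common`) is what would let the top 100 or 200 varieties be reviewed first.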

@Wpi In response to some of your comments:

As for 'Foo dialect' issues, I think in cases like 'Huicheng dialect' where the affiliation isn't clear, we should just identify them as 'Huicheng Chinese'. It's true that we usually do that for top-level groups but I think it's better in this case than using "dialect".
I will search for labels specified using _ and such. Hopefully the usage isn't too inconsistent.
Concerning your statement "I prefer to assign full language codes to a group, while the representative dialect(s) spoken in a specific place will have an etym-only code", what is the alternative you are responding to? Is it further full-language splits (e.g. with Southern Min)?
For zh-HK, zh-MO, you say "standard language". If this is Cantonese, maybe we should use yue-HK, yue-MO?
For the specific lect comments, I don't know enough to respond but it all looks reasonable. User:Theknightwho, what do you think of the proposal to demote Taishanese to a variety of Siyi Yue?

Benwing2 (talk) 05:25, 19 March 2024 (UTC)

In re point #2, see User:Benwing2/zh-label-sets. Benwing2 (talk) 06:41, 19 March 2024 (UTC)

OK, only a few uses of labels involving 'Foo dialect', and only one involving a label actually listed in Module:labels/data/lang/zh, which was 𠀫𠀪 (which, BTW, is being RFV'd) using 'Hangzhou dialect':

  28 Huicheng dialect
   4 eye dialect
   3 ancient Chu dialect
   1 title=zh:Grammaire du dialect
   1 southern dialect
   1 some Mandarin with a Southern Chinese dialect
   1 of one's speech of the local dialect
   1 ancient Qi or Wu dialect
   1 ancient Qi dialect
   1 [[w:Luoyang dialect
   1 Sòng-Lǔ dialect
   1 Sichuan dialect
   1 Shaanxi dialect
   1 Northeastern dialect
   1 Ningyuan dialect
   1 Hangzhou dialect

I changed that one usage to 'Hangzhounese' and deleted all the 'Foo dialect' labels. We might want to add something for the 'Huicheng dialect' labels (cf. your mention above of this). Benwing2 (talk) 08:10, 19 March 2024 (UTC)
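
(Editorial illustration, not part of the original comment.) A tally like the one above could be produced along these lines. This is only a sketch under stated assumptions: the regular expression, the function name `count_dialect_labels`, and the sample wikitext are hypothetical, and the sketch only looks at simple, non-nested {{lb|zh|...}} calls, whereas the tally above evidently also picked up parameters from other templates and links.

```python
import re
from collections import Counter

# Matches simple, non-nested {{lb|zh|...}} calls (an assumption for this sketch)
LB_ZH = re.compile(r"\{\{lb\|zh\|([^{}]*)\}\}")


def count_dialect_labels(wikitext):
    """Count label parameters of {{lb|zh|...}} that end in 'dialect'."""
    counts = Counter()
    for match in LB_ZH.finditer(wikitext):
        for param in match.group(1).split("|"):
            param = param.strip()
            if param.endswith("dialect"):
                counts[param] += 1
    return counts


# Hypothetical sample wikitext
sample = (
    "{{lb|zh|Huicheng dialect}} ... {{lb|zh|Huicheng dialect|slang}} ... "
    "{{lb|zh|Hangzhou dialect}} ... {{lb|zh|dialectal Wu}}"
)
for label, n in count_dialect_labels(sample).most_common():
    print(f"{n:4d} {label}")
```

Matching on the suffix rather than on the substring keeps labels such as "dialectal Wu" out of the count.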

@Benwing2:

re #3, I'm referring to when we are assigning the codes, i.e. groups like Siyi will have a full code whereas local dialect points like Taishanese will have etym-only codes.

re #4, it's basically Standard Written Chinese as used in Hong Kong/Macau. It should be "written/used", not "spoken" as I previously mentioned. There's a difference between yue-HK (Hong Kong Cantonese) and zh-HK (Hong Kong); it's a bit like Norwegian Nynorsk vs Norwegian Bokmål.

Also pinging @Justinrleung for comments on specific lects. – wpi (talk) 11:31, 19 March 2024 (UTC)

@Wpi OK thanks. As for #3, I agree with your idea of the separation between full and etym-only languages going along group lines. As for #4, didn't realize there is this difference but it makes sense. Benwing2 (talk) 15:04, 19 March 2024 (UTC)

Thoughts on Wu codes (locality codes are just suggestions):

Northern Wu subbranches imo don't really need codes but individual localities would be beneficial. Of which:

Changzhounese wuu-chz

Danyangese wuu-dan

Shaoxingese wuu-shx

are in need of codes (due to relative abundance of data, and will also be gaining zh-pron support soon). Some others to consider may include

Cixinese wuu-cix

Huzhounese wuu-huz

and all the other lects currently in Module:wuu-pron/sandbox. We are currently still working on it so it may be worth delaying the addition of these lect codes until we finish the Northern Wu overhaul.

Currently extant Northern Wu localities (Hangzhounese, Ningbonese, Shadi Wu, Shanghainese, Suzhounese) should all be listed under Northern Wu (wuu-nor) in the family tree on Category:Wu language (and any other system that may handle language families).
Southern Wu wise, I believe these would be helpful to have in the future, as we will be adding pages/making modules for them as soon as possible:

Jinhuanese / Wuzhou Wu wuu-jih

Taizhounese / Taizhou Wu wuu-tai

Lishuinese / Chuzhou Wu wuu-lis

Shangraonese / Xinzhou Wu wuu-shr

in descending order of importance. I decided to split "Chuqu Wu" as is described on the chart as there is no clear consensus as to how the non-coastal non-Northern Wu bits should be split, but in general these three areas (Wuzhou, Chuzhou, Xinzhou) can be seen reflected in some way.

A Southern Wu code (wuu-sou) should not be made. It is likely not a familial grouping but rather just a term to use to contrast it with Northern Wu. There have been some preliminary studies that investigate whether it does form a coherent family, but results are mixed and sample sizes are small.

Regarding why there are so many Northern Wu localities, yes, muset & I added them, as unlike Hokkien for instance, the sociolinguistic attitude towards these lects is first and foremost the locality rather than the family (which contrasts with the "Hokkien" identity).

@Musetta6729 - only other active Wu editor: let us know if you have any other/conflicting ideas – nd381 (talk) 19:38, 19 March 2024 (UTC)

@ND381 Thank you! I will probably take all your suggestions. Benwing2 (talk) 20:26, 19 March 2024 (UTC)

Just only got the chance to look at this thread now - in terms of Wu I definitely agree with everything that ND has said so far, just two things I would like to mention:

First: Having Urban Shanghainese as a variety (maybe under something like wuu-ush) along with simply "Shanghainese" (wuu-sha) might be useful. This is due to a variety of reasons, but mainly that Contemporary "Urban" Shanghainese has showcased more convergent evolution with say, Ningbonese or Suzhounese during the last century, and has become more sociolinguistically and identity-wise distinct from many Non-Urban varieties surrounding it. With only the label "Shanghainese" now it is tricky to disambiguate between categories such as:

Primarily urban inventions not used in non-urban varieties, or that have spread out to non-urban regions as still recognisably "urbanite" speech
Common invention/retention in Non-Urban Shanghai varieties that are rare/obsolete/not used in Urban Shanghainese
Inventions in Non-Urban Shanghainese that are not geographically restricted to one specific region of Shanghai
Usage attested in both 1850s City-Center Shanghainese and contemporary Non-Urban, but not Contemporary Urban Shanghainese

Especially because all of this variance is also deeply interconnected with notions of locality, of new and old, of class, ethnicity and other sociolinguistic variables when looked at from an Urban Shanghainese standpoint. All of this has led to the use of ad hoc labels along with the Shanghainese tag like "old-period", "chiefly non-urban/suburban", "rare or obsolete" etc which is definitely not ideal. By having Urban Shanghainese as a variety I expect that this would be easier to manage - and as we go on to add more coverage on Non-Urban Shanghainese varieties we should hopefully be able to have more specific variety codes for lots of the Non-urban Shanghainese varieties too.

The second thing is a bit more minor - Suhujia (蘇滬嘉 - see linked Chinese Wikipedia article) might be a more commonly used term than Sujiahu (蘇嘉滬), which we seem to have now. The grouping seems to be somewhat areal and vaguely defined to me and I am doubtful of the extent to which having it might be useful, but nevertheless it's a fairly widely accepted grouping so thought I would bring this up in case we end up making the decision to add it. Musetta6729 (talk) 04:38, 24 March 2024 (UTC)

Redid Chinese labels

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho I redid the label structure in Module:labels/data/lang/zh. I added missing labels corresponding to the new lects in Module:etymology languages/data, canonicalized the labels to include the group name (e.g. Xiamen Hokkien instead of just Xiamen), and added shorter aliases. Duplication is avoided in something like {{lb|zh|Xiamen Hokkien|Quanzhou Hokkien|and|Zhangzhou Hokkien}} (or equivalently, {{lb|zh|Xiamen|Quanzhou|and|Zhangzhou}}) by a new Chinese-specific label postprocessing function in Module:labels/data/lang/zh/functions, which attempts to remove duplicate group names as well as duplicate occurrences of "Taiwanese" in {{lb|zh|Taiwanese Hokkien|and|Taiwanese Hakka}} or similar. Please let me know if you don't like the output in specific situations and I will tweak the function. Note that I removed the label Taiwanese Hokkien and Hakka and all its aliases, after converting all occurrences to use multiple labels like {{lb|zh|Taiwanese Hokkien|and|Taiwanese Hakka}} or similar. I also changed a few categories to better reflect the lect name, e.g. the label Philippine Hokkien now categorizes into Category:Philippine Hokkien instead of Category:Philippine Chinese. Benwing2 (talk) 00:50, 20 March 2024 (UTC)

@Benwing2: Thanks for setting this up. The function looks like it works well generally, but there are some cases where it might lead to confusion, such as {{lb|zh|Taiwanese Hokkien|Taiwanese Hakka}} showing up as "Taiwanese Hokkien, Hakka", which could mean the unintended "Hakka (in general) and Taiwanese Hokkien". Perhaps one way to prevent this is to only remove duplicate group names when there is an "and" somewhere in the chain? Is that something that could be done? – justin(r)leung (talk) 06:56, 20 March 2024 (UTC)

@Justinrleung Yup, I can do that, thanks for the suggestion. Benwing2 (talk) 17:08, 20 March 2024 (UTC)

@Justinrleung This should be done. Let me know if you see anything else needing fixing. Benwing2 (talk) 03:25, 22 March 2024 (UTC)
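
(Editorial illustration, not part of the original thread.) The rule agreed on above, dropping a repeated group name only when an "and" appears somewhere in the label chain, can be sketched as follows. This is not the code in Module:labels/data/lang/zh/functions (which is a Lua module); the function names, the `GROUPS` list, and the list-based output format are assumptions for illustration, and the sketch handles only trailing group names, not the repeated "Taiwanese" prefix.

```python
# Hypothetical list of group names; the real module derives these from its label data
GROUPS = ("Hokkien", "Hakka", "Cantonese", "Wu")


def split_group(label):
    """Split 'Xiamen Hokkien' into ('Xiamen', 'Hokkien'); group is None if absent."""
    for group in GROUPS:
        if label.endswith(" " + group):
            return label[: -len(group) - 1], group
    return label, None


def dedupe_group_names(labels):
    """Drop a trailing group name when the next label has the same group.

    Nothing is changed unless the chain contains an explicit 'and'.
    """
    if "and" not in labels:
        return list(labels)
    out = list(labels)
    real = [i for i, label in enumerate(labels) if label != "and"]
    for current, following in zip(real, real[1:]):
        place, group = split_group(labels[current])
        _, next_group = split_group(labels[following])
        if group is not None and group == next_group:
            out[current] = place
    return out


print(dedupe_group_names(["Xiamen Hokkien", "Quanzhou Hokkien", "and", "Zhangzhou Hokkien"]))
print(dedupe_group_names(["Taiwanese Hokkien", "Taiwanese Hakka"]))
```

Gating on "and" means a chain such as Taiwanese Hokkien, Taiwanese Hakka is left untouched, which avoids the ambiguous "Taiwanese Hokkien, Hakka" reading raised above.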

Ramifying/filling out Yue Chinese

(Notifying Atitarev, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏): Apologies once again for the wide ping, as I haven't received any responses to some of my other pings. I added a bunch of labels for Yue Chinese lects, but it is revealing some issues:

We correctly classify Yue as a family, but it contains only two languages (Cantonese language and Taishanese language). Meanwhile per Wikipedia and Glottolog there are something like seven primary branches:
1. Yuehai Yue, which is more or less Cantonese proper.
2. Siyi Yue, which includes Taishanese.
3. Goulou Yue, most notably including Yulin dialect and its sublect Bobai dialect.
4. Yongxun Yue, with Nanning Yue as the representative dialect.
5. Gaoyang Yue, most notably including Yangjiang Yue.
6. Wuhua Yue.
7. Qinlian Yue, partly intelligible with standard Cantonese.
We are using the code yue for Cantonese proper and zhx-yue for the Yue family, which is inconvenient and contrary to ISO 639-3 usage.

I propose:

Change to using yue for the family and use some more specific code for Cantonese, either yue-can or yue-yue (for Yuehai Yue).
Create L2 languages for each of the above seven groups. We can reuse the "Cantonese language" for Yuehai Yue. This shouldn't entail any real splitting per se as we already have Yue as a family rather than a language.
Demote Taishanese to an etym-only variety of Siyi Yue and assign it a code yue-tai in place of zhx-tai.

Please also note, in the labels I created, the canonical name for each label has "Cantonese" in it for all sublects of Yuehai Yue but "Yue" for Yuehai Yue itself and for all other lects. Almost everything called "Foo Cantonese" (except for variants of standard Cantonese) has an alias "Foo Yue", but not the other way around. For example, the Dongguan dialect is called "Dongguan Cantonese" because it is a variety of Yuehai Yue, and has "Dongguan Yue" as an alias; but the Yulin dialect is called "Yulin Yue" and does NOT have "Yulin Cantonese" as an alias, since it is a variety of Goulou Yue rather than Yuehai Yue. Benwing2 (talk) 22:17, 28 March 2024 (UTC)

Thanks for the ping. Here are some of my questions, to make sure I understand this better:

What would the categories of a normal entry like 不嬲 look like? I'm asking this because "Cantonese" and "Taishanese" are more recognisable than "Yuehai Yue" and "Siyi Yue" and I'm wondering if these more obscure names would end up in the entry. If this works like the other Chinese splits, I suppose the categories would not change, and just the categories of the categories would change?
We have plans (maybe) to include more Yue languages than just Cantonese and Taishanese, which primarily means expanding the scope of the "pronunciation" section of the entries, and this would also generate more categories. Would your proposal benefit this project because we could more easily categorise the new Yue languages to come?
While normal entries written using Chinese characters have the "Chinese" L2 header, romanisations have their respective header per language, such as xiànglái having the Mandarin L2 header and boán-liân having the Hokkien L2 header. We don't seem to do the same for Cantonese, and the pronunciation sections also don't link to the Cantonese romanisations, and I also can't seem to find any Cantonese L2 header. This might have been decided in an earlier policy that I don't know about, so I guess my question is, would it create problems if you demote Taishanese to an etym-only language?
Per your last point I tried to google "Yulin Yue" but the main results are about someone named Yulin Yue, so I tried to google "Yulin Yue" + language and got 235 hits, while "Yulin Cantonese" got me 73 hits (and "Yulin Cantonese" + language got me only 8 hits). This isn't a question per se, just a comment about how little-known other Yue languages are.
I feel like I just have to insert a comment about the choice of Mandarin exonyms vs. Cantonese exonyms vs. endonyms. I think the first option is generally how we do things (except for the names of the main branches), and I suppose this is just the result of the general scholarship, and I'm not really trying to subvert this practice, but I would just like to raise some awareness to this phenomenon.

The above. Apologies if 1999. --kc_kennylau (talk) 23:01, 28 March 2024 (UTC)

@Kc kennylau Thanks much for the detailed questions! In response to your questions, let me see if I can answer:

There are two types of categories: (1) L2 language categories (e.g. Category:Mandarin lemmas); (2) etym-language categories (e.g. Category:Xi'an Mandarin). Under my proposal, we would probably use "Cantonese" in place of "Yuehai Yue" as the L2 language name, since they seem more or less equivalent; but "Siyi Yue" would be the L2 language subsuming Taishanese. This means that a Taishanese term would be categorized both under Category:Siyi Yue lemmas and Category:Taishanese Yue (or maybe just Category:Taishanese; there is some flexibility in the choice of etym-language categories). So essentially, things like Category:Taishanese lemmas would go away in favor of Category:Siyi Yue lemmas + Category:Taishanese Yue, but Category:Cantonese lemmas would remain (possibly with additional more specific categories like Category:Guangzhou Cantonese or Category:Hong Kong Cantonese, both of which already exist).
This proposal is somewhat orthogonal to how we handle the pronunciation section entries; the ones for Cantonese and Taishanese can remain as-is, but might categorize differently (as explained above).
If there were romanizations under a Taishanese header, they would have to be renamed to have Siyi Yue as the header and a label Taishanese attached, to make it clear that the romanizations are specifically Taishanese. (Similarly, entries like boán-liân used to be under a Min Nan header before Hokkien got split out as an L2 language.) But since we don't seem to have any such romanizations, this issue won't arise (at least for now).
As for the obscurity of Yue varieties other than Cantonese and Taishanese, I completely agree. The terminology isn't well-worked out and the term "Cantonese" is particularly problematic since it variously refers specifically to (a) the speech of Guangzhou specifically; (b) the more general Yuehai Yue language that Guangzhou speech is part of ; and (c) the entire Yue family. This issue doesn't seem to come up so much for other groups like Mandarin and Wu.
As for Mandarin vs. Cantonese/Yue naming, I am not wedded to using the Mandarin terms; I just chose them because that is what Glottolog and Wikipedia largely use. If the consensus is to use Cantonese-language terms for all lects or to use native terms (endonyms), we can do that as well. I am guessing the Mandarin terms see more usage just out of a sort of default familiarity (pretty much everyone who works with Chinese languages is familiar with Mandarin but many aren't familiar with Cantonese or other varieties, and several Yue varieties don't even have standard romanization schemes). Benwing2 (talk) 23:50, 28 March 2024 (UTC)

I support the move in general (with a strong preference of using yue-can), however here's a couple of problems I can foresee with this proposal:

Goulou actually forms a dialect continuum with Southern Pinghua, which is therefore nowadays usually thought of as part of Yue, but weirdly it has a separate language code. Should it be included as well?
Yongxun is a (quite recent) descendant of Cantonese spoken in the major towns and cities in the Pearl River with minor influences from the substrate Goulou varieties. Personally I don't think it should be a separate branch.
As I mentioned before, there are (at least) two distinct varieties of Yue spoken in Nanning, we currently call them Nanning Cantonese (under Yongxun) and Nanning Pinghua (under Goulou-Southern Ping). How can the two be distinguished if it is renamed to "Nanning Yue"?

– wpi (talk) 04:19, 29 March 2024 (UTC)

@Wpi Thanks very much for responding. In response to your issues:

I don't know enough about Pinghua to answer, but I note that Wikipedia's Pinghua article asserts that Pinghua has been treated as its own dialect group, separate from Yue, in most textbooks and surveys written since the 1980's. As for dialect continuums, there are many places where different branches form dialect continuums with each other but are still separated. (As an example, Western Bulgarian forms a dialect continuum with Torlakian, which in turn forms a dialect continuum with (other varieties of) Serbo-Croatian. Serbo-Croatian is considered a Western South Slavic language and Bulgarian an Eastern South Slavic language; despite what the Wikipedia article on Torlakian says, it's more often considered part of Serbo-Croatian than Bulgarian.) Maybe User:Justinrleung or User:沈澄心 can comment? There's an additional issue that if we group Southern Pinghua with Yue, what do we do with Northern Pinghua?
Likewise I don't know enough about Yongxun Yue to have a firm opinion; in any case it seems like we won't have any lemmas in it, so whether we make it its own L2 or group it with some other L2 (which one? Cantonese or Goulou?) wouldn't make much difference.
I think this is only an issue if (1) we leave Yongxun as its own group and (2) we put Southern Pinghua under Yue. If Yongxun is e.g. grouped with Cantonese and Pinghua left as-is, the current names are fine. If both dialects get considered non-Cantonese Yue, then one solution is to clarify them as 'Nanning Yongxun Yue' and 'Nanning Pinghua Yue' or something.

Benwing2 (talk) 04:55, 29 March 2024 (UTC)

I would prefer to have Southern Pinghua be kept as its own group separate from Yue. It seems that generally speakers of Southern Pinghua would call their varieties Pinghua, distinguished from Baihua (traditionally Yue varieties). The situation in Nanning is a case in point.
I don't have a strong opinion on whether Yongxun should be a branch. The Language Atlas of China does mention a few criteria for separating Yongxun out as its own branch, but it seems like those criteria are retentions rather than innovations (from a cursory glance).

— justin(r)leung _{{ (t...) | c=› }} 18:43, 20 May 2024 (UTC)

There have been some discussions, and for reference this is our current categorization:

Gwangfu Yue (廣府片) / Yuehai Yue (粵海片): the "main" branch of Yue that contains Cantonese (廣東話), which is the dominant language (besides Mandarin) within the Yue Chinese lects. Our current approach is to group other (more recent) descendents as sub-branches of this branch.
1. Guan-Bao Yue (莞寶片/莞寶小片): contains Dongguan Cantonese (東莞話) which is genetically close to Cantonese but might be a bit hard to understand for Cantonese speakers because of the differences in phonology. Some classify it as a sister-branch of Gwangfu, but I think we prefer to group it under Gwangfu.
2. Yong-Xun Yue (邕潯片/邕潯小片): contains Nanning Cantonese (南寧白話). Again this branch is sometimes considered separate from Gwangfu.
3. Sanyi Yue (三邑小片): the Cantonese spoken in Sanyi (literally "three counties") is highly intelligible with Cantonese, but I want to group them together because they share the innovation that their Tone 4 ("light level") is particularly high.
4. Xiangshan Yue (香山小片): contains Shiqi Cantonese (石岐話).
Siyi Yue (四邑片): the second most famous branch of Yue that contains Taishanese (台山話). This branch is particularly distinct within Yue, and there should be no debate over the status of this branch.
Gao-Lian Yue / Gao-Lei Yue (高廉片/高雷片): (the Lian 廉 here refers to the River Lian 廉江, which is unrelated to the Lianzhou 廉州 below, which is 145 km apart.) this branch is a merger of the traditional categories Gao-Yang Yue (高陽片) and Wu-Hua Yue (吳化片). The brief reason for this merge is that Gaozhou Cantonese (高州白話, the Gao of Gao-Yang) is also sometimes classified with Wu-Hua Yue, so I think it's better to just merge the two branches. I chose this name because it was also used in earlier classifications for more-or-less the same span. This covers the Yue lects spoken in the Prefectures Yangjiang (陽江), Maoming (茂名), and Zhanjiang (湛江).
Qin-Lian Yue (欽廉片): this category has more-or-less stayed the same across different classifications, but there are also (scholarly) opinions that this is more a regional grouping instead of a proper genetic branch. The following sub-branches have also been proposed in a paper where Qin-Lian is challenged (where I have removed Qinzhou Cantonese (欽州白話) which we consider to be a descendent of Cantonese instead):
1. Lianzhou Yue (廉州小片)
2. Lingshan Yue (靈山小片)
3. Xiaojiang (小江小片)
4. Liuwanshan (六萬山小片)
Gou-Lou Yue (勾漏片): this category is also quite consistent, with the main distinguishing feature being that voiced stop initials in Middle Chinese tend to become unaspirated. It is also quite distinct among the Yue lects. This lect is primarily spoken in Gwangxi instead of Gwangdong.
1. Luo-Guang Yue (羅廣小片): this is the Gou-Lou Yue which is spoken in Gwangdong. It might be a misnomer because the Luo stands for the City Luoding (羅定) in the Prefecture Yunfu (雲浮), but there might be no Gou-Lou Yue spoken here.

(Notes for non-Chinese speakers: 片 = branch, 小片 = sub-branch, 話 = dialect.)

There are some remaining problems:

Where does the name "Cantonese" belong? Should the sub-branches of Gwangfu Yue also bear the label "Cantonese"?
I support using yue for the whole branch and yue-can for "Cantonese" proper.
How should we treat sub-branches? Should they have their own codes?
Should the names be A-B Yue or AB Yue?

I am also pinging the Chinese editors again for more opinions. (Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏, LittleWhole): --kc_kennylau (talk) 14:24, 23 May 2024 (UTC)Reply

Note that the proposed tree above is solely proposed by Kenny, and certain parts of it lack any sort of substantial discussion.

I strongly disagree with the proposed "Gao-Lian"/"Gao-Lei" group, as it clearly includes at least two groups with vastly distinct phonological features: Wu-Hua (1) has a three way contrast with its voiced/implosive stops and (2) pronounces MC affricates (精 series) as dentals, while the Gao-Lei and Liangyang groups (1) only have a two-way contrast and (2) pronounce MC affricates (精 series) as affricates - among many other differences. Note that the reason why Wu-Hua is sometimes described as Gao-Lei (e.g. in Zhan Bowei's 廣東粵方言概要) is most likely due to the lack of data on Wu-Hua. I should also note that Wu-Hua is sometimes considered to be an incoherent group, but regardless that should not result in placing the entirety of Wu-Hua with Gao-Lei. As to the question of whether Liangyang is distinct or not, it seems to me that the arguments for a separate Liangyang group is stronger, especially because it has a tone system distinct from the surrounding dialects and an inflectional personal pronoun system for 1/2/3pl that is much more similar to Siyi.

Essentially, my view is identical to the divisions in Language Atlas of China (but not the classification of certain lects), with the exception of placing Yong-Xun under Guangfu (since the Yong-Xun "features" are also found in a lot of modern Guangfu lects or historical dictionaries/rime books, and it is well known that Yong-Xun is descended from Guangfu) and splitting out Liangyang from Gao-Yang (Yangjiang data is not mentioned at all in the Atlas!), and perhaps also splitting out Guan-Bao and Xiangshan (according to 廣東粵方言概要), but I am uncertain as to their position within the tree.

Moreover, it would be splitting hairs when we go for the subgroups (小片), as research is often lacking beyond first level groups (even if there is research being done, often there is only one work to reference from).

Some further comments:

I think the usage of "Cantonese" among Yue lects should be relatively liberal - the general rule would be to apply it to any Guangfu lect and any dialect described as 白話, e.g. Qinzhou, Gaozhou, Nanning.
Agree with the use of yue for the whole branch and yue-can for Standard Cantonese (i.e. what we are currently using yue for).
Regarding the use of hyphen, it should be present when the name is a combination of two names. Goulou is named after the mountain of Goulou, so there shouldn't be a hyphen.

– wpi (talk) 16:10, 23 May 2024 (UTC)

Thanks, Kenny and Wpi. I generally agree with Wpi's points. Kenny's Gao-Lian/Gao-Lei should be at least two groups: Gao-Yang and Wu-Hua. I don't have a strong opinion on whether Gao-Yang should be split further. As for the structure of the tree, such as whether certain groups belong under certain groups, I feel like we can be agnostic and have them placed under Yue without thinking too much about the internal groupings; this would mean we could have Yong-Xun, Guan-Bao, Xiangshan, etc. as sisters to Guangfu unless we have really strong feelings about the grouping. Luo-Guang seems to be a very erroneous idea that we should not bother adopting at all. — justin(r)leung _{{ (t...) | c=› }} 17:38, 23 May 2024 (UTC)Reply

Indeed, I should have emphasized that the tree above is not final, and I only posted it here to attract more discussion. Thank you for bringing that up.

I will talk about the Gao-Lian/Gao-Lei group here first and leave the other points to later replies.

The "three-way contrast" is not as simple as it seems. The evolution of Middle Chinese stops in Wu-Hua is not consistent. According to 粤语“吴化片”商榷 (2016) by 邵慧君, Middle Chinese *b- became /pʰ/ in Wuyang, and in Huazhou it was distributed (irregularly) between /p/ and /pʰ/. Using Jyutdict I was able to verify this (see table below). Note how 婆 became /p-/ in Shangjiang and /pʰ-/ in Xiajiang, and 抱 is the other way round. According to the paper, *p- became /ɓ-/ in Wuyang just like in Huazhou, but even so, since *b- became universally /pʰ-/ in Wuyang, that would only be a two-way contrast. Of course, the "number" of labial plosives isn't the important point here, but rather "how" they correspond with Middle Chinese and with each other. The situation becomes even more complicated if we account for the influence of dominant languages in this area, and I believe that *b- > /pʰ-/ in Wuyang is the effect of Hakka.
In summary, if you take *p- > /ɓ-/ as the defining feature of Wu-Hua, then it fails because it is not universal (even though you might attribute the remaining lects that have /p-/ as Cantonese influence); if you take the evolution of *b- instead, then it also fails because it is inconsistent between the lects.
As for pronouncing 精 as dental, if you look at the map in 醉 in Jyutdict, you will find that indeed the four Wu-Hua languages recorded all have a dental /t-/. However, if you keep going up from there, you will find that the dental initials continue to Yulin (鬱林) of Goulou Yue, and then even to Wuzhou (梧州) of Gwangfu Yue. To the right, though disconnected, you will find that Taishanese and Kaiping (開平) of Siyi Yue also have a dental initial. Indeed, it is possible that the dental initial spread from Wu-Hua to Yulin, just like how the guttural "R" spread all throughout Europe. However, I don't see an argument of why it has to be genetic in Wu-Hua in the first place.
According to the paper, Li Jian (李健) said that "鉴江源出粤西信宜市北部山区,南流经信宜、高州、化州、吴川四市入海。......整个流域粤语不但极为相似,而且南北渐变的痕迹也十分明显。" (paraphrase: the dialects of Xinyi, Gaozhou, Huazhou, and Wuchuan form a continuum). I don't think this observation can be attributed to a "lack of data". While the dialect in Gaozhou seems to me to be highly similar to Cantonese, I did find that interestingly the character 坐 has an /-ɛ/ final in Gaozhou and also in the Wu-Hua lects.
As for the Liangyang group, I have not looked a lot into this, so I will take your side and assume that Liangyang should indeed form a group. However, this does not contradict with my proposed Gao-Lei group, where there can simply be a Liangyang sub-branch. I do wonder though how you view the "inflectional personal pronoun system" as you mentioned that is "much more similar to Siyi". Do you think Liangyang split off from Siyi, or do you think Proto-Cantonese had such a system that was lost in other lects, or do you think this feature arose by contact between Liangyang and Siyi?

Character	Middle Chinese initial	Tone Category	Zhanjiang (湛江)	Wuyang (吳陽)	Huazhou Shangjiang (化州上江)	Huazhou Xiajiang (化州下江)
巴	*p-	level (平)	/pa/	/pa/	/ɓa/	/ɓa/
怕	*ph-	departing (去)	/pʰa/	/pʰa/	/pʰa/	/pʰa/
皮	*b-	level (平)	/pʰei/	/pʰei/	/pɛi/	/pɛi/
婆	*b-	level (平)	/pʰɔ/	/pʰɔ/	/pɔ/	/pʰɔ/
抱	*b-	rising (上)	/pʰoɐu/	/pʰoɐu/	/pʰɔu/	/pɔ̯ɒu/
鼻	*b-	departing (去)	/pʰei/	/pʰei/	/ɓɛi/	/pɛi/
白	*b-	entering (入)	/pʰaʔ/	/pʰaʔ/	/ɓak/	/pak/
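
The point about the *b- reflexes can be checked mechanically against the table. Below is a minimal sketch in Python, assuming only the forms listed in the table above; the names `B_ROWS`, `initial` and `reflexes_by_lect` are made up for illustration and the onset detection is deliberately crude (it only knows the three labial initials that occur here).

```python
# Forms copied from the table above, Middle Chinese *b- rows only.
# Column order: Zhanjiang, Wuyang, Huazhou Shangjiang, Huazhou Xiajiang.
LECTS = ["Zhanjiang", "Wuyang", "Huazhou Shangjiang", "Huazhou Xiajiang"]
B_ROWS = {
    "皮": ["pʰei", "pʰei", "pɛi", "pɛi"],
    "婆": ["pʰɔ", "pʰɔ", "pɔ", "pʰɔ"],
    "抱": ["pʰoɐu", "pʰoɐu", "pʰɔu", "pɔ̯ɒu"],
    "鼻": ["pʰei", "pʰei", "ɓɛi", "pɛi"],
    "白": ["pʰaʔ", "pʰaʔ", "ɓak", "pak"],
}


def initial(syllable):
    """Return the labial onset of a syllable (aspirated checked before plain)."""
    for onset in ("pʰ", "ɓ", "p"):
        if syllable.startswith(onset):
            return onset
    raise ValueError(f"unexpected onset in {syllable!r}")


def reflexes_by_lect():
    """Collect the set of *b- reflexes attested in each lect."""
    result = {lect: set() for lect in LECTS}
    for forms in B_ROWS.values():
        for lect, form in zip(LECTS, forms):
            result[lect].add(initial(form))
    return result


if __name__ == "__main__":
    for lect, onsets in reflexes_by_lect().items():
        print(lect, sorted(onsets))
```

On these seven-row data Zhanjiang and Wuyang come out with a single reflex, while the two Huazhou lects each show more than one, which is the irregularity described above. With so few characters this is only an illustration, not evidence in itself.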

--kc_kennylau (talk) 19:53, 23 May 2024 (UTC)

By the way, we have three Yue lects currently covered by zh-pron (see 五), which are Dongguan Cantonese, Yangjiang Yue, and Yulin Yue.^{(COI: I added them.)} Should we have language codes for these three varieties? Something like yue-dgx, yue-yjx, yue-ylx? --kc_kennylau (talk) 14:58, 25 May 2024 (UTC)Reply

(Addendum: we just removed Yulin Yue) --kc_kennylau (talk) 15:00, 25 May 2024 (UTC)Reply

(You mean in addition to the two lects that have been here longer, so actually a total of four Yue lects now.) — justin(r)leung _{{ (t...) | c=› }} 15:12, 25 May 2024 (UTC)Reply

Just to help me understand the "lay of the land", are there papers that specifically group the dialects traditionally classified as Gao-Yang and Wu-Hua together? If so, what is the name they use for such a grouping? (From the way this was described above, it feels a little original-researchy, which we don't want to do.) — justin(r)leung _{{ (t...) | c=› }} 15:20, 25 May 2024 (UTC)Reply

(cc @Benwing2) After more discussion, @Justinrleung and @wpi have mostly agreed with the following tree (the codes are added by me):

Guangfu Yue (廣府片) yue-guf
Guan-Bao Yue (莞寶片) yue-gub
Xiangshan Yue (香山片) yue-xis
Yong-Xun Yue (邕潯片) yue-yox
Siyi Yue (四邑片) yue-siy
Liangyang Yue (兩陽片) yue-liy
Gao-Lei Yue (高雷片) yue-gal (defined as Gao-Yang in the Atlas minus Liangyang)
Wu-Hua Yue (吳化片) yue-wuh
Qin-Lian Yue (欽廉片) yue-qil
Goulou Yue (勾漏片) yue-gol
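
For anyone who wants to sanity-check the proposed codes (uniqueness, the `yue-` prefix), the list above can be written down as a plain mapping. This is only a sketch in Python, not the format of the actual language data modules; `YUE_GROUPS` and `check_codes` are hypothetical names.

```python
# Proposed codes and names, copied from the list above.
YUE_GROUPS = {
    "yue-guf": "Guangfu Yue",
    "yue-gub": "Guan-Bao Yue",
    "yue-xis": "Xiangshan Yue",
    "yue-yox": "Yong-Xun Yue",
    "yue-siy": "Siyi Yue",
    "yue-liy": "Liangyang Yue",
    "yue-gal": "Gao-Lei Yue",
    "yue-wuh": "Wu-Hua Yue",
    "yue-qil": "Qin-Lian Yue",
    "yue-gol": "Goulou Yue",
}


def check_codes(groups):
    """Return a list of problems found in a code -> name mapping."""
    problems = []
    for code in groups:
        if not code.startswith("yue-") or len(code) != len("yue-xxx"):
            problems.append(f"malformed code: {code}")
    names = list(groups.values())
    for name in set(names):
        if names.count(name) > 1:
            problems.append(f"duplicate name: {name}")
    return problems


if __name__ == "__main__":
    print(len(YUE_GROUPS), "groups;", check_codes(YUE_GROUPS) or "no problems")
```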

I also mostly agree with this, but I would just like to note that Guan-Bao, Xiangshan, and Yong-Xun (and likely Gao-Lei as well) are descended from Guangfu, and the last four (Gao-Lei, Wu-Hua, Qin-Lian, Goulou) branches are more areal than genetic. From what I can gather, the reason this structure is preferred over a more nested one is because currently all the genetic relationships are still not clear, as Justinrleung explained above.

I also don't know if some of the above branches should have "~ Cantonese" as an alias.

--kc_kennylau (talk) 13:30, 26 May 2024 (UTC)

Agree with the above list of groups. For Wiktionary purposes, we would simply treat all ten of them as direct descendants of Yue without being specific on their relationship. (yue "Yue" would be a family)

On top of these I think we should have the following full code:

yue-can, "Cantonese", equivalent to (some of) the current use of yue, parent yue-guf

and the following etymology codes:

yue-gzh, "Guangzhou Cantonese", equivalent to existing yue-gua, parent yue-can
yue-hkg, "Hong Kong Cantonese", equivalent to existing yue-HK, parent yue-can
yue-tai or yue-hsv, "Taishanese", equivalent to some of the existing zhx-tai, parent yue-siy
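
The parent relations proposed here can be sketched as a small lookup, which makes it easy to see what each code would inherit from. This is a Python illustration under stated assumptions: `PARENT` and `ancestors` are hypothetical names, only the codes mentioned in this comment are included, and `yue-tai` is used as a placeholder since the choice between `yue-tai` and `yue-hsv` is still open.

```python
# Child code -> parent code, as proposed above (placeholder: "yue-tai").
PARENT = {
    "yue-guf": "yue",
    "yue-siy": "yue",
    "yue-can": "yue-guf",
    "yue-gzh": "yue-can",
    "yue-hkg": "yue-can",
    "yue-tai": "yue-siy",
}


def ancestors(code):
    """Return the chain of parents from the nearest one up to the family."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain


if __name__ == "__main__":
    for code in ("yue-hkg", "yue-tai"):
        print(code, "->", " -> ".join(ancestors(code)))
```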

The "Cantonese" suffix could be applied to (dialects of) Guangfu, Guanbao, Xiangshan, Yongxun, and other "Baihua" varieties such as Qinzhou and Gaozhou, all of which are often considered to be related to Standard Cantonese.

– wpi (talk) 14:11, 26 May 2024 (UTC)

Agree. --kc_kennylau (talk) 21:24, 29 May 2024 (UTC)

Manipuri vs Meitei language


I propose we change it to Meitei as the language is predominantly spoken by the Meitei people. Meitei is not the only language indigenous to Manipur. There are other ethnic groups in Manipur who speak different languages. So there are many Manipuri languages, Meitei is only one of them. 178.120.0.250 10:40, 9 May 2024 (UTC)Reply

FWIW; this is about renaming what we call Manipuri to Meitei. I told the IP to come here, but in hindsight, perhaps WT:RFM would be a better venue.

At least the English Wikipedia seems to use Meitei as the primary name for the language. — SURJECTION ^{/ T / C / L /} 11:10, 9 May 2024 (UTC)Reply

Sure, btw you can call me 178 if you want. It's a bit more specific. 178.120.0.250 11:31, 9 May 2024 (UTC)Reply

Yes, WT:RFM is the usual place for discussions about renaming languages. —Mahāgaja · talk 13:34, 9 May 2024 (UTC)Reply

I oppose the proposal as unneeded; the rename is unnecessary, as it neither adds nor removes anything valuable. There aren't any active editors in the language, and if such a user comes along and finds a problem with the name, they will point it out naturally and the discussion will be fruitful. Discussing it now will only waste time, given that in this case the current name is obviously not obstructive. Word0151 (talk) 14:42, 9 May 2024 (UTC)

Support seems like Wikipedia already changed the name. Not that we need to match Wikipedia, but if they changed it and the only interested editors here wanna change it too... why not? — Sameer ^{﴾مشارکت‌ها・بحث﴿} 15:52, 9 May 2024 (UTC)Reply

FWIW:

Google Ngrams shows "Manipuri language" having about 4x the usage of "Meitei language" and over 12x the usage of "Meithei language" in the most recent year (2019).
Wikipedia says that "Meitei" is now used by most Western scholars, although it's sourced to a single source (Chelliah), so take it with a grain of salt.
Wikipedia says that Indian government sources and the Indian constitution call it Manipuri, which is probably easily verifiable.
Ethnologue calls it "Meitei".
Glottolog calls it "Manipuri".
"Meitei" is closer to the endonym for the language.
As for Wikipedia's name choice, this happened in 2016 or earlier, and there is debate on the talk page about whether to call it Meitei or Manipuri, with the people in favor of Manipuri claiming it is the common name in English.

Benwing2 (talk) 08:36, 13 May 2024 (UTC)

Please help to sort out Scandoromani


See also: #Merger into Scandoromani

Lattjo dives! I have started to make some more Scandoromani entries, and there are four main problems I need to ask for advice about before I can go on.

Problem 1. As far as I understand, Tavringer Romani is Swedish Scandoromani, also known as Traveller Swedish. Tavring is not something exclusively Swedish, and we already have Traveller Norwegian. Would it be a good idea to rename Tavringer Romani to Traveller Swedish? In any case, there is almost no difference between TS and TN, so might it be an even better idea to merge them into one L2 (Scandoromani)? See also the related Problem 4 about Månsing.

"orthographies are consistently different, which seems to be the case", Theknightwho once said about this problem. But is that really a good reason?

Problem 2. A more serious one. Some of my first edits on Wiktionary were in Scandoromani, and I was dumb enough not to include sources in most of the entries I created. Many of my sources are now completely gone from the internet. I also remember that some entries (I don't remember exactly which) are not from sources at all; I created them together with my former neighbor, an old drunk guy who spoke the language. I did check them in dictionaries and found most of them, but not all, and now I don't remember exactly which ones, and some of the dictionaries are gone.

Dictionaries I remember but cannot find: an old Web 1.0 Norwegian website with a black background; a long English PDF in an ugly monospaced font comparing Scandoromani and Kalo; a scan of an old Swedish book with big fat letters.

Problem 3. What is the "Tavringer Romani terms in nonstandard scripts" category? The script is unspecified, so why is this category coming up?

Problem 4. What to do with Rodi and Månsing? They are jargons of Swedish and Norwegian, so how should we refer to them? I usually refer to them as jargons, using the code "sv" (Swedish) and specifying that the term is also used in Norwegian. I hope it's OK to do so. Otherwise, we may need them as independent L2s. Tollef Salemann (talk) 19:42, 15 June 2024 (UTC)

Glottonym tweaks: Franco-Provençal, Venetian → Francoprovençal, Venetan


(moved from Wiktionary:Beer parlour/2024/August#Glottonym tweaks: Franco-Provençal, Venetian → Francoprovençal, Venetan)

These changes would bring Wiktionary in line with the naming conventions of modern English scholarship, as found in for instance the Oxford Guide to the Romance languages (2016).

Context:

Francoprovençal has been the name used in French scholarship since the 1970's. Removing the older hyphen lessened the misleading impression that the language is some sort of secondary blend of French and Provençal (Occitan). There is also an element of typographical convenience.
Veneto has always been the name used in Italian scholarship, if I'm not mistaken, with Veneziano predominantly or exclusively reserved for the varieties spoken in Venice and environs, as opposed to the rest of the Venetan domain (Ve1, Ve3‒7).

Nicodene (talk) 22:05, 9 August 2024 (UTC)

Support, the Venetan proposal in particular has been a long-awaited change, and given that a part of modern Anglophone scholarship handles this sensibly, we have little reason to stay behind. Catonif (talk) 22:15, 9 August 2024 (UTC)

Support. Never heard of Venetan but if this is the accepted term, so be it. Benwing2 (talk) 07:40, 10 August 2024 (UTC)Reply

Thoughts, @Apisite, IvanScrooge98, Samubert96, Sartma, Ultimateria, Urszag, Word dewd544?

(Active users who speak Venetan or have contributed to its entries.)

Nicodene (talk) 20:52, 13 August 2024 (UTC)

Thanks for pinging me. I am pretty indifferent to the hyphen question for Francoprovençal, while I am not fully convinced about Venetan; after all, Venetia is the anglicized name for the region of Veneto (if the linguistic reasoning is to distinguish the specific dialect of Venice from the language as a whole). But if Venetan is now most common in English-language professional literature, then I don’t think there is much to debate. _{(parla con me)} 21:21, 13 August 2024 (UTC)Reply

The region's name occurs ~15 times more often in English as Veneto than Venetia, according to a Google search for “region of ____” (119000 results versus 7960). The latter occurs generally in historical as opposed to modern contexts.

Also at the moment we have no (reasonable) way to indicate a term used in Venice proper, as opposed to, say, Padua. A dialect label like Venetian would be identical to the name we currently use for the overall language (contra, as mentioned, the name used in linguistics). Nicodene (talk) 22:05, 13 August 2024 (UTC)Reply

Yeah, as I said, I get the reasoning. The thing is Venetian, despite being most commonly a word for stuff from Venice specifically, is not a strictly technical term like Venetan is—which is what comes to me a bit off given that this project is not directed to linguists but rather to the general public. And we could still label entries from the dialect of Venice as Venice, Venice dialect, Venice Venetian or something along those lines. But, again, it doesn’t mean I strongly oppose changing Venetian to Venetan. _{(parla con me)} 22:19, 13 August 2024 (UTC)Reply

The general public in Italy would be surprised to hear the dialect of, say, Padua described as veneziano. E.g. on Italian Wiki Dialetto padovano redirects to this page, where veneziano is mentioned solely as an external entity: “le parlate dei centri più importanti…sono state influenzate dal veneziano”.

So this is more about the general public of English-speaking countries, which isn't aware that such a language exists, as opposed to a local variety of (Standard) Italian. Nicodene (talk) 23:00, 13 August 2024 (UTC)Reply

Fair enough. _{(parla con me)} 23:09, 13 August 2024 (UTC)Reply

How do you pronounce "Venetan"? Benwing2 (talk) 23:20, 13 August 2024 (UTC)Reply

For me it's /ˈvɛnətən/ < /ˈvɛnətəʊ/ (≈Italian /ˈvɛneto/) + /-ən/. Nicodene (talk) 23:31, 13 August 2024 (UTC)Reply

@Benwing2: I would rather pronounce the term as /ˈvɛneɪtʌn/. --Apisite (talk) 10:49, 14 August 2024 (UTC)Reply

Support If we are not going to have separate h2 for the main dialect groups of the Venetan language, then we must go for Venetan. As @Nicodene said, Venetian is the dialect of Venetan spoken in and around Venice. For instance, Paduans, Vicentines and Trevisans speak Paduan, Vicentine and Trevisan respectively, not Venetian. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 15:27, 15 August 2024 (UTC)Reply

@Benwing2 Shall we go ahead, then? Nicodene (talk) 18:00, 22 August 2024 (UTC)Reply

@Nicodene I'm finally getting around to this. For reference, here is (I think) the correct way to rename a language (e.g. "Venetian" -> "Venetan"):

First, list all the categories in Wiktionary (this takes a little while as there are ~ 1,000,000 categories and the listing is only 5,000 per second). Then find all the categories containing the word "Venetian", e.g. using python3 list_pages.py --namespaces Category (it is not sufficient to use the prefix-listing functionality to list categories starting with "Venetian" because there are other categories with "Venetian" in it elsewhere than at the beginning). Use this list to generate a list of category renames to supply to a script such as my rename.py script.
Then, download the latest dump file from https://dumps.wikimedia.org/ (beware, it may be up to 20 days out of date) and search through it for all occurrences of 'Venetian' (e.g. like this: bzcat enwiktionary-20241001-pages-articles.xml.bz2 | python3 find_regex.py -e '^.*Venetian.*$' --all --stdin > find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1).
Then, change the name in the language module itself (e.g. Module:languages/data/3/v for 'vec' = Venet(i)an), then regenerate the code <-> canonical name caches by going to Module:languages/code to canonical name and clicking on the Update button.
Then, rename the categories containing the old name, using the script input created in step #1. You want to do this soon after renaming the language itself. It should follow the language rename rather than precede, so that when each page gets regenerated as it's renamed, the {{auto cat}} regeneration succeeds.
Then, rename the language in the header of the lemmas and non-lemma forms, e.g. like this: python3 rewrite.py --from '==*Venetian*==' --to '==Venetan==' --cats 'Venetian lemmas,Venetian non-lemma forms,Venetan lemmas,Venetan non-lemma forms' --diff --track-seen --comment 'rename Venetian language headers to Venetan per ]' --save > rewrite.venetan-venetian-lemmas-non-lemma-forms.venetian-to-venetan.out.1.save. This should follow the category renames so that e.g. the new categories don't end up in Category:Empty categories. Note that we loop over both "Venetian" and "Venetan" lemmas and non-lemma forms (the latter last) so that we get any terms that were regenerated and moved categories between this step and the previous one, or while this step is in progress.
Then, rename the language in references to it in various places (especially but not exclusively in translation sections), using the output of step #2 as a guide. To do this, download the pages containing the word "Venetian", something like this: python3 find_regex.py --pagefile <(extract_pagename.sh < find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1) -e 'Venetian' --text > find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.orig. Copy the file, e.g. cp find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.orig find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1. Edit the latter file appropriately to change all occurrences of Venetian to Venetan that need to be changed. Push the changes using e.g. python3 push_find_regex_changes.py --direcfile find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1 --origfile find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.orig --comment 'Venetian -> Venetan per ]' --diff --save > push_find_regex_changes.find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.out.1.save.
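
As an illustration of the end of step #1 (turning the category listing into rename pairs), here is a minimal sketch in Python. It is not the actual rename.py or list_pages.py logic, which is not shown in this thread; `build_category_renames` is a hypothetical helper, and it assumes the category titles have already been collected into a list of strings.

```python
import re


def build_category_renames(categories, old, new):
    """Return (old_title, new_title) pairs for titles containing `old` as a whole word.

    Matching on word boundaries catches the name anywhere in the title
    (not just at the start) without touching longer words such as "Venetians".
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    renames = []
    for title in categories:
        renamed = pattern.sub(new, title)
        if renamed != title:
            renames.append((title, renamed))
    return renames


if __name__ == "__main__":
    sample = [
        "Venetian lemmas",
        "Terms derived from Venetian",
        "Venetians",
        "Italian lemmas",
    ]
    for old_title, new_title in build_category_renames(sample, "Venetian", "Venetan"):
        print(f"{old_title} -> {new_title}")
```

The resulting list still needs a manual pass before it is fed to any rename script: a purely textual match would also pick up categories that should keep the old name, such as the regional-Italian category mentioned further down in this thread.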

Benwing2 (talk) 05:53, 15 October 2024 (UTC)

@Benwing2 Nice! Thank you for your work, this is a good day. :) Catonif (talk) 21:09, 15 October 2024 (UTC)Reply

@Catonif Thank you! @Nicodene I tried to find all the remaining instances of Venetian that should be changed to Venetan, but some I'm not sure about, e.g. the "Venetian" dialect of Italian (should that be "Venetan"? is this actually referring to the Venetan language?). The remaining instances are here: User:Benwing2/venetian-to-venetan Please look over them and change any pages needing changing. Thanks! Benwing2 (talk) 21:17, 15 October 2024 (UTC)Reply

@Benwing2 I went through that list, only a few needed to be changed, very well done! By the Venetian dialect of Italian, do you mean CAT:Venetian Italian? That's fine, it is the regional Italian of the city of Venice. Catonif (talk) 21:51, 15 October 2024 (UTC)Reply

@Catonif Thank you! Yes, I was referring to that category. Benwing2 (talk) 21:54, 15 October 2024 (UTC)Reply

Thank you. I had no idea the process was so complicated.

I’ve gone through the list and made one correction. The other cases were already addressed by Catonif. Nicodene (talk) 22:29, 15 October 2024 (UTC)Reply

@Nicodene Also, I'd like to get more input before renaming 'Franco-Provençal' -> 'Francoprovençal'. No one above commented on this change, and the Wikipedia article on the language (which has a hyphen in it) says this:

Although the name Franco-Provençal appears misleading, it continues to be used in most scholarly journals for the sake of continuity. Suppression of the hyphen between the two parts of the language name in French (francoprovençal) was generally adopted following a conference at the University of Neuchâtel in 1969; however, most English-language journals continue to use the traditional spelling.

Benwing2 (talk) 21:29, 15 October 2024 (UTC)Reply

It seems roughly 50/50 in English, judging by results from the last few years in Google Scholar. There doesn’t seem to be an official spelling in English, but there is one in both French and Italian (in both cases without the hyphen). The closest thing to an official English spelling that I could imagine is the one preferred by Oxford University, which is more or less the “capital” of anglophone scholarship in Romance Linguistics. Nicodene (talk) 22:49, 15 October 2024 (UTC)Reply

@Benwing2: In your edits here and here you changed Venetian to Venetan, citing this discussion. The thing is that you changed the names of external Wikimedia projects, which are still the "Venetian Wiktionary" and "Venetian Wikipedia" regardless of the spelling convention we use in our own entries. So I'm not sure those edits are worthwhile. Ioaxxere (talk) 23:06, 15 October 2024 (UTC)Reply

@Ioaxxere Oops, I didn't realize those are external links. Please undo them, thanks! Benwing2 (talk) 23:12, 15 October 2024 (UTC)Reply

OK went ahead and did this. Benwing2 (talk) 23:13, 15 October 2024 (UTC)Reply

Rename wca from Yanomámi to Yanomam


I suggest we rename wca Yanomámi → Yanomam.

Our current name for this language (Yanomámi) is extremely confusing, given that its close relative guu, which we call Yanomamö, is also commonly called Yanomami (with or without various diacritics). In addition, the language family to which both of these languages belong is also called Yanomami, even by us (cf. Category:Yanomami languages). (The accent mark on Yanomámi is irrelevant; it may be present or not in any of these uses, so it doesn't help in distinguishing one from the other.)

Current practice in the academic literature is to call wca Yanomam, avoiding this confusion. See Helder Perri Ferreira, Yanomama Clause Structure, page 6: 'To avoid confusion then, the following terms are used in this thesis: Yanomam = either refers to a language of the Yanomami family or to its speakers. It corresponds to what Ramirez (1994a: 35) called the “Oriental super-dialect of Yanomami” or “Oriental Yanomami” (Yor). Migliazza (1972: 34) calls this language “Yanomam” as well.' Glottolog uses the similar term Yanomám; see here. Jacques Lizot's work tentatively follows Migliazza and also labels the variety as Yanomam, as does the Endangered Languages Project; see here. 'Yanomam' seems by far the most common designation for this lect in the current literature; it would make sense to rename the language accordingly. — Vorziblix (talk · contribs) 14:21, 28 August 2024 (UTC)Reply

Since I am intending to do some work with this language in the immediate future, I’m going to go ahead and make this change now to avoid having to make many more changes down the line. If there end up being any objections to the move, we can still discuss and undo the change then if needed. — Vorziblix (talk · contribs) 13:27, 3 September 2024 (UTC)Reply

East Lechitic typology

Latest comment: 9 months ago · 9 comments · 3 people in discussion

Pings: (Notifying KamiruPL, BigDom, Hythonia, Tashi, Sławobóg, Silmethule, Rakso43243, Skerillion): @Benwing2, @Mahagaja, @PUC, @Thadh
Previous discussions:

Relevant Wikipedia articles:

In this thread I would like once and for all to try and determine what should be and what shouldn't be an L2 on en.wiktionary based on linguistic, technical, and other criteria.

It's no secret that determining all of this can be a headache when dealing with dialect clusters and groups.

When it comes to Lechitic, the West (Polabian)/North (Pomeranian)/East isn't even a strong grouping anyway; much of East Lechitic didn't even undergo the so-called Lechitic ablaut (some linguists argue that it was later levelled, some argue it never took place), and Old Polish, like many other "Old" languages, is not a single language, but rather a group of dialects with varying phonological features and changes that can be shown to go to a single etymological form, even if that form wasn't omnipresent across all the lects it represents (for example Masuration is a very early change).

The lects in question are Silesian, Masurian, and Goral.

In terms of linguistics, Silesian doesn't differ from other dialect groups as much and shares much in common with Greater Polish and Lesser Polish. However, it has undergone a huge standardization recently, and the socio-linguistic aspect of all this cannot be ignored, either. In terms of technical aspects on Wiktionary, there's not much that it needs that is special, to be honest, but I feel its status as an L2 is fairly safe. I mention this for later points and for context. Mutual intelligibility between Silesian and Polish can vary vastly - depending on the vocabulary used it may be intelligible or not, typical of other Slavic languages.

Masurian was split initially for being incredibly divergent from Polish. It shares a fair amount with some neighboring dialects such as Kurpian, however, to a much greater extent, and mutual intelligibility between Masurian and Polish is limited. Even when using more common vocabulary, it can be difficult to understand, and also a large number of everyday terms differ either by etymology or by a significant number of phonemes. I feel the Appendix:Masurian Swadesh list demonstrates this well (Appendix:Polish Swadesh list for reference). As far as the orthography goes, Masurian is not a widely spoken lect, so levels of normalization within the culture are not high, and neither is its daily usage. It could be possible to normalize to a Polish orthography with a few additions (namely áéóôû, which we are going to need for other dialects anyway; an explanation can be found at w:Dialects of Polish). In terms of technical aspects, many Masovian dialects, such as Kurpian, might need similar support, such as a different declension module, as many more consonant alternations exist due to the decomposition of soft bilabials (i.e. budowa > budozie). Its status as an L2 is debatable.

Goral sits in between Silesian and Masurian in most regards. Culturally, it is one of the most spoken dialect groups (itself being a dialect group WITHIN the Lesser Polish dialect group, but the number of differences between dialects here is smaller than between other dialects within a dialect group) and its mutual intelligibility is much like that of the relationship between Silesian and Polish. Depending on the vocabulary used as well as the "thickness" of the speaker's accent, mutual intelligibility can vary wildly. In terms of orthography, pagenames would differ about as much as some other dialect entries. What I mean is that in Middle Polish you had so-called "slanted vowels" (áéó), which all developed differently in different dialects, as well as w:Masuration. Goral dialect would be spelled on the whole very similarly to other Lesser Polish dialect words, so lekarz would be lykorz for both groups. In terms of technical support, it would also need new declension templates, but it could be handled using most of the same infrastructure as the rest of the Polish dialects. However, one big difference is that many Goral dialects have initial stress, which stands in huge contrast to the rest of East Lechitic, which has penultimate stress.

Solutions:

1. Split all. Keep Silesian and Masurian split and split Goral as well, setting it as a descendant of Old Polish.
2. Status quo. Keep Silesian and Masurian split and do not split Goral.
3. Remerge Masurian. Silesian remains an L2, and Masurian and Goral would be dialects of Polish.
4. Remerge all.

I personally can see the first three options, or more specifically options 1 or 3. I'm strongly against merging Silesian, and I suspect most people here would be as well, but I am placing the option here for the sake of completeness. I have already set Polish dialects as LDL's on WT:About Polish, so questions of attestation can be put aside.

I am opting to leave out anything about Old Polish and Middle Polish here. Vininn126 (talk) 12:56, 1 September 2024 (UTC)Reply

I would prefer option 3. Almost no language is homogenous, and we can't endlessly split, we need to stop somewhere; I think written language is the most important thing for languages in (western) Eurasia: I'm pretty sure an average Masurian speaker will not see Standard Polish as a language separate from the one they write in day-to-day, and will have little problem to encode their variety in written Polish to a satisfactory degree. You can write a word like ony and pronounce it as /ónÿ/ without much of a problem. You can write /ôwtén/ as owten (which is probably attestable by the way!) and show you're a dialectal speaker. Just in the same way Finnish speakers write Finnish, Scots write Gaelic and Italians write Italian. Thadh (talk) 13:29, 1 September 2024 (UTC)Reply

Second idea: (Notifying KamiruPL, BigDom, Hythonia, Tashi, Sławobóg, Silmethule, Rakso43243, Skerillion): @Benwing2, @PUC, @Thadh How would you feel about having etymology codes for the major dialect groups? We have already one for Middle Polish which has been very useful. I could see it being very useful for having for example pl-GP for Greater Polish, pl-LP for Lesser Polish, pl-MS (or something similar) for Masovian, maybe pl-BOR or something for both Borderlands (but I'm not sure we need that one) and also potentially pl-gor for Goral. This would be very useful for etymologies as each group has different tendencies for borrowing and its relation to other languages, such as some Greater Polish dialects having some vocab in common with Kashubian, for example. Vininn126 (talk) 10:21, 12 September 2024 (UTC)Reply

Only for those that have an actual demonstrably significant number of borrowings into another language that set them apart from Standard Polish or other groups. Thadh (talk) 10:26, 12 September 2024 (UTC)Reply

This would be fairly easy to do if we consider dialectal borrowings - (dialectal) Prussian German often borrowed from Masovian dialects, Slovak dialects often borrowed from Goral/Lesser Polish. Greater Polish most assuredly gave certain words in dialects of Kashubian. I'm fairly sure we could find examples of non Standard Polish words for each, and the given lects mentioned are unlikely to have borrowed from other dialect groups. Vininn126 (talk) 10:31, 12 September 2024 (UTC)Reply

@Vininn126 I just received your "Second idea" ping: 4 days late. No objections to adding etym codes for the major dialect groups, but they should follow the standard etym code notation, hence pl-gre for Greater Polish, pl-les for Lesser Polish, pl-mas maybe for Masovian, pl-bor for Borderlands, pl-gor for Goral. Benwing2 (talk) 22:44, 15 September 2024 (UTC)Reply

And I assume you are for option 3 in the first. Vininn126 (talk) 05:21, 16 September 2024 (UTC)Reply

Yes, not strongly though; I trust whatever you think is best. Benwing2 (talk) 05:28, 16 September 2024 (UTC)Reply

Okay, I think everyone who's going to say something has said their piece. I have tried asking everyone for their opinion. The decision is: remerge Masurian, don't split Goral, and don't merge Silesian. Greater Polish, Lesser Polish, Masovian, and Goral will get their own etymology codes. I can implement this starting this week. Vininn126 (talk) 17:48, 25 September 2024 (UTC)
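The etymology-only codes agreed on above could be recorded roughly as follows. This is only a minimal sketch in Lua, assuming a plain table keyed by code with a display name and a parent; the table `etym_codes` and the helper `full_language` are hypothetical and do not reproduce Wiktionary's actual data modules. Goral is nested under Lesser Polish here, following the description of Goral as a group within Lesser Polish; that nesting is an assumption, and the Borderlands code is left out because the decision does not include it.

```lua
-- Hypothetical table of etymology-only codes (layout is an assumption).
local etym_codes = {
	["pl-gre"] = { name = "Greater Polish", parent = "pl" },
	["pl-les"] = { name = "Lesser Polish", parent = "pl" },
	["pl-mas"] = { name = "Masovian", parent = "pl" },
	-- Assumption: Goral sits under Lesser Polish rather than directly under Polish.
	["pl-gor"] = { name = "Goral", parent = "pl-les" },
}

-- Walk up the parent links until a code that is not etymology-only is reached.
-- That code is the language whose L2 header the entries would live under.
local function full_language(code)
	local seen = {}
	while etym_codes[code] do
		if seen[code] then
			error("cycle in parent links at " .. code)
		end
		seen[code] = true
		code = etym_codes[code].parent
	end
	return code
end

print(full_language("pl-gor")) --> pl
```

Under this layout an etymology can name the dialect group precisely while the entry itself stays under the Polish header.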

Paraguayan Guaraní (again)

Latest comment: 9 months ago · 7 comments · 3 people in discussion

Guaraní is a mess. Its problems include a broken pronunciation module, dozens of conjugation templates with no documentation of what they are for, and a complete lack of references, but the worst one is the language codes. Currently, we have codes for both Guaraní (gn) and each one of its "varieties": Chiripá (nhd), Classical Guaraní (gn-cls), Eastern Bolivian Guaraní (gui), Mbyá Guaraní (gun), Paraguayan Guaraní (gug) and Western Bolivian Guaraní (gnw). All of these are treated as distinct languages with their own L2 heading, which raises the question: if we have a heading for each variety, what even is Guaraní? Looking through the lemmas, it seems to be a duplicate of Paraguayan Guaraní, an issue that was already raised seven(!) years ago, with no consensus on changing anything. Also, Classical Guaraní is currently listed as a descendant of Guaraní and a sister language of Paraguayan Guaraní, which is not ideal.

My proposal is:

Making gn a family code, similarly to Tupi-Guarani (tup-gua), putting Classical Guaraní as the ancestor of Paraguayan Guaraní and moving everything from the Guaraní L2 to Paraguayan Guaraní.
- The position of the ancestor is still not clear to me, though. To my understanding, what Wiktionary calls Classical Guaraní is the language used in the 17th-18th century Jesuit missions of Paraguay, Argentina and South Brazil. It's the ancestor of Paraguayan Guaraní for sure, but its relation to Mbyá and Chiripá is not well explained, and authors just calling everything "Guaraní" doesn't really help...
Another way would be doing the opposite: merge everything into gn, make Classical Guaraní its ancestor and use {{lb|gn|x}} for the different varieties. This would be especially counterproductive because we would end up merging Mbyá and Paraguayan, and they certainly aren't the same language. The problem is aggravated by Mbyá having a different spelling that uses X instead of CH.

Taggin' the only active Guaraní editors I know @RodRabelo7, Ovey 56 and @Theknightwho who seemed interested :p. Trooper57 (talk) 17:27, 13 September 2024 (UTC)Reply

Thanks! Finally someone spoke about it! Yes, the Wiktionary pages on Guarani are certainly a mess, but I'd say I liked more your first proposal, since the Guarani varieties are already considered by some as different languages.

Just some things on the language used by the Jesuits in their missions, it was its own language, just like the Jesuitic Nahuatl (I don't remember the language's official name).

I've wanted for so long for Guarani to be recognized as a group of languages rather than a single language with many different dialects, not only because of the differences in their vocabulary, pronunciation and intelligibility with one another, but because the contemporary Guarani peoples do not consider the group of varieties a single language.

I totally agree on editing the pages to show they are different languages, as well as changing the automatic name that pops up when the code "nhd" is used. It should be either "Nhandeva", "Yandeva" or "Nandeva", since "Chiripa" is an outdated term that some Yandeva people consider derogatory/insensitive. Junior Santos (talk) 13:17, 14 September 2024 (UTC)

Interesting, so Classical and Paraguayan Guaraní were actually spoken at the same time, with the first being like a "formal" version used by the Jesuits?

And I think the categories were created when these names were still in use lol, most of the Tupian languages have been left untouched for years. The Kaapor don't seem fond of "Urubu", too. Trooper57 (talk) 14:52, 14 September 2024 (UTC)Reply

Also pinging @Rodrigo5260 who commented on the issue on Discord. Trooper57 (talk) 14:54, 14 September 2024 (UTC)Reply

Thank you, Trooper57, for pinging me into this discussion. First of all, I would like to mention that I have indeed noticed this mess with the Guarani entries. I have worked on some (Paraguayan) Guarani entries, and from my experience, almost all Guarani (gn) entries are actually Paraguayan Guarani (gug). However, since it's more common to see just Guarani, I opted to record them that way... I agree with the question: if we have a code for each variety, what is actually Guarani? I must admit that I only know the differences between (Paraguayan) Guarani, the Mbyá, and the Kaiwá (to which I recently added some entries, such as yrygwasu). I am less familiar with the other varieties. Regarding Classical Guarani (I prefer the term Old Guarani, by analogy to Old Tupi), this is the origin of (Paraguayan) Guarani, Mbyá, and Kaiwá, at the very least. Old Guarani is to these varieties what Old Tupi is to Nheengatu, for example. I also note that I have created the very first entries for Old Guarani, such as cabayu and ĭgaratá. What to do? I'm not sure yet, but I would like others to share their ideas. By the way, it would be interesting if we could gather at least one dictionary for each variety to get a better idea of what we are dealing with. I have a dictionary for (Paraguayan) Guarani, Mbyá, Kaiwá, and, of course, the Montoya's vocabulary on the so-called Classical Guarani, Tesoro de la lengua guaraní. RodRabelo7 (talk) 04:05, 15 September 2024 (UTC)Reply
Oh, and I'd support removing the diacritic from "Guaraní". "Guarani" is way better... RodRabelo7 (talk) 04:08, 15 September 2024 (UTC)Reply

About the last part, I haven't found any dictionaries yet, but there's some Eastern Bolivian Guaraní vocab in this pdf by UNIBOL Guarani. Trooper57 (talk) 16:22, 15 September 2024 (UTC)Reply

Add Guachí

Latest comment: 9 months ago · 3 comments · 2 people in discussion

Guachí is an extinct language known to have been spoken in Argentina in the 19th century; the only record is a word list of 145 words, from 1845. Apparently, it's usually classified as Guaicuruan, but WP says the data is insufficient to demonstrate that. For reference, we already have Appendix:Guachí word list. Theknightwho (talk) 14:18, 17 September 2024 (UTC)Reply

Hi, in the future I'd recommend not adding a language even if you want to, but no one replies to your suggestion to add it in 10 days. In general you need at least one other person to look over and agree with your suggestion. Please don't take silence as consent. In this case you should have pinged User:-sche, who can give you thoughts. I'm personally a bit skeptical as to whether a single word list is enough data to indicate even that it's a separate language as opposed to either a dialect of an existing language or a mishmash of randomly collected words. Benwing2 (talk) 10:19, 28 September 2024 (UTC)Reply

Same thing goes for Kalašma, which you recently added with a similar "silence = consent" assumption. Benwing2 (talk) 10:20, 28 September 2024 (UTC)Reply

Changing the canonical name of `kla` from "Klamath-Modoc" to "Klamath"

Latest comment: 8 months ago · 5 comments · 3 people in discussion

Wiktionary's canonical name for the language kla, spoken by the Klamath and Modoc peoples, is currently "Klamath-Modoc", which reflects the fact that the two peoples spoke different dialects. I propose that it be renamed "Klamath", which is the name that sources discussing the language predominantly (though not universally) call it.

The Klamath Tribes themselves call the language "Klamath". (The Modoc Nation could conceivably have a stake in the language being called "Klamath-Modoc", but I can't find any references to the language by name on their website.)
Most of the academic literature I can find about the language identifies it as "Klamath". In particular, the works of Albert S. Gatschet and M. A. R. Barker, who each produced by far the most extensive and most cited documentation of the language, call it "Klamath".
- The search string "Klamath language" yields significantly more results in both Google Scholar and JSTOR than the string "Klamath Modoc language".
The English Wikipedia article for the language has been titled "Klamath language" since 2011. Also, almost all sources in that article's bibliography refer to the language as "Klamath".

(In the interest of a fully informed discussion, it's worth noting that the following sources use the name "Klamath-Modoc": SIL International, Ethnologue, Glottolog, OLAC, and the California Language Archive.)

— Äþelwulf (talk) 20:56, 24 September 2024 (UTC)Reply

Is there anything I can do to elicit input on this matter? — Äþelwulf (talk) 20:19, 15 October 2024 (UTC)Reply

@Athelwulf Maybe ping User:-sche, who is often involved in these discussions? -sche, can you ping anyone else who you think might have relevant comments? Benwing2 (talk) 21:20, 15 October 2024 (UTC)Reply

BTW the fact that both Ethnologue and Glottolog use the name "Klamath-Modoc" is significant, although not decisive. Benwing2 (talk) 21:22, 15 October 2024 (UTC)Reply

You are right that "Klamath" is the more common term, and although it is hard to be sure how many uses of it mean the language and how many mean the dialect ("Klamath-Modoc" is arguably clearer about the scope), probably our preference for using the most common name should lead us to use Klamath here.
It is interesting that there are almost no uses of the native name. ("Klamath" is derived from the Upper Chinook designation for all the natives of the Klamath River Basin, including the Klamath and Karuk and Shasta and Yurok; Modoc is at least a Klamath-Modoc word for that variety. Victor Golla, in California Indian Languages (2022), page 135, notes that after "Gatschet used 'Klamath' as the specific ethnographic name for the Indians of the reservation on Upper Klamath Lake and for their dialect of Klamath-Modoc, this usage soon became standard among anthropologists there was reluctance, however, to extend the term to the Modocs, who had been treated as a separate tribe since the Modoc War of 1872-1873 and their subsequent removal to Oklahoma.") - -sche (discuss) 21:26, 21 October 2024 (UTC)

Ancestor of Azerbaijani

Latest comment: 6 months ago · 1 comment · 1 person in discussion

Hello, I wrote Wiktionary entries for Azerbaijani words written in the Azerbaijani Abjad (Turco-Perso-Arabic alphabet), but some other Azerbaijani users revert all my edits because the words are "too old for Azerbaijani". Whenever I add entries in the Azerbaijani Abjad, they are rolled back with the comment "this word does not exist in modern Azerbaijani". This happens because the ancestor of Azerbaijani is not properly defined in Wiktionary; or rather, it is defined as Old Anatolian Turkish, which is too ancient an ancestor. For comparison, the ancestor of Turkish (of the Turkish Republic) is given as Ottoman Turkish and then Old Anatolian Turkish. This is logical, since Ottoman Turkish was used until the 1920s, and it completely solves the problem for Turkish. There is no such solution for Azerbaijani: its ancestor is given in Wiktionary as Old Anatolian Turkish, which was used until the 14th century at the latest. Azerbaijani therefore has no ancestor for the period from the 15th century to the early 1920s (according to various sources, modern Azerbaijani can be said to begin in 1922-1923, when the USSR occupied Azerbaijan, or in 1928, when the USSR switched Azerbaijani to the Latin alphabet).
However, historically, the ancestor of Azerbaijani was considered to be Ajami Turkish (trk-ajm, "Turkish of Persia"). It was the language of the Qajars, Afshars, Qizilbashs and Qashqayi; it is also the ancestor of the Iraqi Turkmen and Sonqori languages, and possibly of the Khorasani Turkish and Khalaji languages. For example, in the book The Turkic varieties of Iran, Christine Bulut says (page 406) that the written language for these languages has been Ajam Turkic since the 16th century. It is a good term. I could write Azerbaijani entries in the Abjad alphabet under this language so as not to run into restrictions, but as I understand it, that is not possible at the moment. Please help me with this issue: I have a lot of literature and want to create pages for these words, but I keep running into restrictions from other users.

At the moment the Azerbaijani language page says that Azerbaijani comes from:

Proto-Turkic > Proto-Oghuz > Old Anatolian Turkish

but it should be:

Proto-Turkic > Proto-Oghuz > Old Anatolian Turkish > Ajami Turkish

Please create the language category for Ajami Turkish (https://www.wikidata.org/wiki/Q110812703) and make it the ancestor of Azerbaijani. It will look like this: Azerbaijani comes from Ajami Turkish (trk-ajm), which comes from Old Anatolian Turkish:

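m["trk-ajm"] = {
	"Ajami Turkish", -- canonical name
	110812703, -- Wikidata item (Q110812703)
	"trk-ogz", -- family code
	"fa-Arab", -- script code
	ancestors = "trk-oat", -- Old Anatolian Turkish
	entry_name = { = "ar-entryname"},
}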

Sebirkhan (talk) 19:31, 8 December 2024 (UTC)Reply
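The ancestry proposed above can be checked with a small self-contained sketch. It assumes a simplified table in which each language has at most one `ancestors` value, as in the proposed entry; the table `langs`, the function `ancestor_chain`, and the codes `trk-ogz-pro` and `trk-pro` for Proto-Oghuz and Proto-Turkic are assumptions made for illustration and may not match the real data modules.

```lua
-- Simplified language data; only the fields needed for the ancestry walk.
local langs = {
	["az"] = { name = "Azerbaijani", ancestors = "trk-ajm" },
	["trk-ajm"] = { name = "Ajami Turkish", ancestors = "trk-oat" },
	["trk-oat"] = { name = "Old Anatolian Turkish", ancestors = "trk-ogz-pro" },
	["trk-ogz-pro"] = { name = "Proto-Oghuz", ancestors = "trk-pro" },
	["trk-pro"] = { name = "Proto-Turkic" },
}

-- Return the ancestors of a language, oldest first, joined with " > ".
local function ancestor_chain(code)
	local names = {}
	local seen = {}
	local current = langs[code] and langs[code].ancestors
	while current and langs[current] and not seen[current] do
		seen[current] = true
		-- Insert at the front so the oldest ancestor ends up first.
		table.insert(names, 1, langs[current].name)
		current = langs[current].ancestors
	end
	return table.concat(names, " > ")
end

print(ancestor_chain("az"))
--> Proto-Turkic > Proto-Oghuz > Old Anatolian Turkish > Ajami Turkish
```

Removing the `trk-ajm` row and pointing `az` directly at `trk-oat` reproduces the chain currently shown on the Azerbaijani language page.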

January 2025

Etym-codes for recensions of Church Slavonic

Latest comment: 4 months ago · 8 comments · 5 people in discussion

Branch from: Church Slavonic and Moravian.

Recently Church Slavonic (zls-chs) was created as L2. Everything is fine, but Church Slavonic (CS) is divided into "dialects" (recensions or redactions), the spelling of which is very different in places. There is obviously a need for etymological codes for different variants of CS. The most famous recensions of CS are Croatian, Serbian and Russian. But in reality there are more. What are your suggestions, what can be done in this situation? What codes could be created? AshFox (talk) 09:00, 11 January 2025 (UTC)Reply

I recently studied the situation in the East Slavic variants of CS. Often the East Slavic variant of CS is simply called "Russian Church Slavonic" (RuCS), but this is a very, very simplified term. Because the East Slavic variant of CS that existed in the times of Rus (around 988‒1450) is very different from RuCS that exists now, which is very developed and whose spelling is extremely different from the archaic spelling of the times of Rus. The rules for using letters and spelling in modern RuCS can be found, for example, in Смирнова А. Е. (2024), Церковнославянский язык в таблицах. For example, the Greek name "Xenia" in modern RuCS orthography is written as Ѯе́нїѧ (Ksénija), while in the times of Rus it would have been *Ѯениꙗ (*Ksenija) < Gr. ξενία (xenía). Another significant difference is the presence of the reduced ъ (ŭ) / ь (ĭ) in RuCS during the Rus times and their complete absence now. And so on... In general, there are many differences between the modern RuCS and the RuCS of the Rus times, which does not allow us to perceive the Eastern Slavic version of CS from 988 AD to the present day as one "Russian Church Slavonic".

Recension of CS used in the times of Rus is called "ru:Древнерусский извод церковнославянского языка", which is roughly literally "Old Russian Church Slavonic". But the term "Old Russian" is now obsolete, so I propose calling it "Old East Church Slavonic" with code (zls-chs-orv), by analogy with Old East Slavic (orv). The term "Old East Church Slavonic" is already used on Wiktionary about 230+ times. "Old East Church Slavonic" will designate CS entries of words that are not Old Slavonic and at the same time have characteristic East Slavic features (ru:Древнерусский извод церковнославянского языка#Характеристика).
Afterwards, the "Old Moscow Church Slavonic" (ru:Старомосковский извод церковнославянского языка) comes from it, which was used in Muscovy approximately in the period 1450‒1650. I don't know, maybe it shouldn't be singled out separately, but considered as part of the next stage ‒ from 1650 to the present day ‒ ru:Синодальный извод церковнославянского языка : "Russian Synodal Church Slavonic" (also called "Новомосковский извод церковнославянского языка", literally "New Moscow Church Slavonic"). This modern version of Russian Church Slavonic is very developed, and there is a lot of literature and dictionaries on it, for example {{R:cu:Dyachenko:1900}}, {{R:ru:STsSRJa}} or "Большой словарь церковнославянского языка Нового времени". (Different spelling norms and forms of words do not allow the modern Synodal Church Slavonic to be united into one whole with the CS used in the times of Rus ‒ "Old East Church Slavonic".) I propose to name this particular variant "Russian Church Slavonic" with code (zls-chs-ru), from 1650 ‒ present day. Or you can call it "Russian Synodal Church Slavonic" or "Synodal Church Slavonic".
Also, "Old East Church Slavonic" gave rise not only to "Old Moscow Church Slavonic" (> "Russian Synodal Church Slavonic"), but also to the Church Slavonic language on the territory of modern Belarus and Ukraine. There is very little information about it, but it is often called "Ruthenian Church Slavonic" (ru:Украинско-белорусский извод церковнославянского языка), which is also called "Ukrainian Church Slavonic". For completeness, an etym-code can be added for it too.

The tree part looks like this:

─┬ Church Slavonic (zls-chs)
 ├┬ Old East Church Slavonic (zls-chs-orv)
 │   ├──── Russian Church Slavonic (zls-chs-ru)
 │   └──── Ruthenian Church Slavonic (zls-chs-rt)

AshFox (talk) 11:50, 11 January 2025 (UTC)Reply
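The proposed tree can also be written down as data and rendered back out. This is a minimal sketch, assuming each code stores a single `parent`; the table `recensions` and the helpers `children_of` and `render` are hypothetical names, not part of any existing module.

```lua
-- Proposed codes from the tree above, each pointing at its parent.
local recensions = {
	["zls-chs"] = { name = "Church Slavonic" },
	["zls-chs-orv"] = { name = "Old East Church Slavonic", parent = "zls-chs" },
	["zls-chs-ru"] = { name = "Russian Church Slavonic", parent = "zls-chs-orv" },
	["zls-chs-rt"] = { name = "Ruthenian Church Slavonic", parent = "zls-chs-orv" },
}

-- Collect the codes whose parent is the given code, sorted for stable output.
local function children_of(code)
	local out = {}
	for child, data in pairs(recensions) do
		if data.parent == code then
			out[#out + 1] = child
		end
	end
	table.sort(out)
	return out
end

-- Render the subtree below a code as indented lines, two spaces per level.
local function render(code, depth, lines)
	lines = lines or {}
	depth = depth or 0
	lines[#lines + 1] = string.rep("  ", depth) .. recensions[code].name .. " (" .. code .. ")"
	for _, child in ipairs(children_of(code)) do
		render(child, depth + 1, lines)
	end
	return lines
end

print(table.concat(render("zls-chs"), "\n"))
```

Further recensions would only need one more row each, with `parent` set to whichever code they are taken to descend from.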

Sounds sound. Fay Freak (talk) 12:57, 11 January 2025 (UTC)Reply

Support. Etymology codes are an easy way to add precision without too much complexity. Vininn126 (talk) 12:59, 11 January 2025 (UTC)Reply

In the future, I think it would be desirable to have codes like this (names and codes themselves can be clarified/changed):

Czech-Moravian Church Slavonic (zls-chs-cs)
Bulgarian Church Slavonic (zls-chs-bg)
Macedonian Church Slavonic (zls-chs-mk)
Serbian Church Slavonic (zls-chs-sr)
Croatian Church Slavonic (zls-chs-cr)
Wallacho-Moldavian Church Slavonic (zls-chs-ro)

AshFox (talk) 18:43, 11 January 2025 (UTC)Reply

This division is slowly devolving into the tumultuous state of affairs from the times of Belić and Mladenov. I struggle to see how one can comprehensively differentiate between all of these renditions? Are we going to follow linguistic, historical, or geographical criteria?

For example, what label one should give, say, to Gregory Tsamblak writings? Did he write in Bulgarian, in Serbian, in Wallacho-Moldavian, or in Ruthenian ChSl.? After all, in different periods of his life, he worked in different places.

Or what label should be given, e.g., to Didactic gospels? Its author is Constantine Preslavsky, but all surviving copies were written in Medieval Ruthenia?

IMO, if specification is required, just write the concrete source of the word. It will save us the hassle of splitting hairs.

All in all:

Support for renditions that can be localized (Czecho-Moravian, Croatian, Rascian) and

Oppose for the rest. Безименен (talk) 10:25, 19 February 2025 (UTC)Reply

PS Bulgarian/Macedonian Church Slavonic is split into Literary Schools and into different time periods. It is misleading to clump them all together. For example, a text written by Preslav Literary School would have more in common with Czecho-Moravian from the same time period, rather than with the Tarnovo Literary School which emerged after from XII cent. Безименен (talk) 10:41, 19 February 2025 (UTC)Reply

Support. Chihunglu83 (talk) 21:45, 14 January 2025 (UTC)Reply

Old Albanian

Latest comment: 2 months ago · 9 comments · 4 people in discussion

Discussion moved from Wiktionary:Beer parlour/2025/January#Old Albanian.

For the purpose of helping users understand the Albanian language and its history, I believe there should be a new and separate language added to Wiktionary: Old Albanian. The Old Albanian language is considerably different from modern Albanian and having both would better help in studying the Albanian language. I think that if there are both Armenian and Old Armenian as separate languages, as well as many others, Old Albanian, too, should be added as separate. It is simply impossible to include such differing eras of the Albanian language under simply "Albanian", which includes modern Albanian of all dialects, as well as Old Albanian from over 500 years ago. The language has evolved considerably in phonology, forms, lexicon, and much more.

A single sentence in Old Albanian has become almost alien to modern Albanian speakers, although not entirely. Take, for example, Old Albanian, "Ënço, tyy të lusmë, Zot, të mujtunitë tat, e eja përse ti tue klenë të lutunitë tanë e të shpëtuomitë tanë na të jemi të denjë ën kësi perikuli të kuatëvet tinëve na me klenë dëlirunë e shelbuom." Shqypëtari (talk) 21:27, 14 January 2025 (UTC)Reply

If so, in view of likeness of attestation situation, the comparison should be made with Old Lithuanian, rather than Old Armenian. Fay Freak (talk) 21:31, 14 January 2025 (UTC)Reply

If anyone has the knowledge/time to critically analyze Old Gheg works of Buzuku (1555), Budi (1618–1621), Bogdani (1685), and the Old Tosk writings by Matranga (1592) and Variboba (1762), why not? I am just concerned it would be unsourced and turned into a huge mess. The initiator would also need to propose a way of spelling normalization. — This unsigned comment was added by Chihunglu83 (talk • contribs) at 14:13, 28 February 2025 (UTC).Reply

I think it's best to write it in the modern Albanian alphabet as the sounds are identical, as far as I know. But I'm not really used to wiki and maybe we can use the script used in its earliest attestation, or use the script it was most commonly written in. However I think that would be a bit messy and so I'd say use the modern Albanian alphabet. Shqypëtari (talk) 16:11, 17 April 2025 (UTC)Reply

Also I'm not sure if this is possible but would we be able to make it a language where sources are required before creating entries? Shqypëtari (talk) 20:22, 17 April 2025 (UTC)Reply

Yeah, just write it as you think it to be modern, if the equivalent is sufficiently certain (probably is not if from Ottoman Turkish alphabet at least).

The driveby editors don’t follow our taste in sourcing standards anyhow and splitting languages would give leeway to double standards, so it is easier to stress best practices from the perspective of modernity. But this is my opinion not knowing Albanian, I just know the kind of people who edit here. Fay Freak (talk) 21:15, 17 April 2025 (UTC)Reply

Hi Shqypëtari! :) Unfortunately my first time greeting you is an opposition. I'm glad you're interested in contributing to the historical side of the Albanian language, but I believe the best way to do it is under a single header. I don't think a split such as this one would make sense either in theory or in practice.

In theory, the claim of the language being considerably different stands only for the works of Buzuku and perhaps Matranga, which are indeed pretty wacky, but the following 17th c. works are much closer to the modern language. The sentence you give neatly shows the peculiarity of Buzuku's language, but it does not well represent the rest of what I assume you would like to treat as Old Albanian, notably Budi, Bogdani and Bardhi. There are modern-day varieties with a considerably higher discrepancy in phonology, forms, lexicon, and much more from the modern literary language than these authors are.

And in practice, as well, I am of the idea that building a robust infrastructure and editing habits that allow us to document the many dialectal and historical nuances of the language is a much preferable option to dividing information into different places. I have been adding alternative forms and quotes from the early authors under the Albanian header, and nothing in this process has been hindering, IMO, the efficiency of how the information is conveyed. On the contrary, the entries under the main header are enriched, and it is easier to see all in one place the various phonetic correspondences between historical attestations and modern dialectal forms, as well as semantic evolution. Catonif (talk) 21:31, 18 April 2025 (UTC)Reply

Hello, thank you for replying. I'm not too familiar with the works of Budi, Bogdani, and Bardhi, and I think the main idea for me was that Old Albanian should be focused more on the older works, like in Meshari. If you think there are better alternative ways of treating Old Albanian, I understand. My main goal is really trying to add more detail to Albanian etymologies. For now, though not as often, I've been adding the forms seen in Buzuku's work under certain words, either as alternative forms or mentioning them in the etymology. Are there any example entries where I could see how you added quotes? Again, thank you. Shqypëtari (talk) 22:09, 18 April 2025 (UTC)Reply

@Shqypëtari Yes, I can get behind why Buzuku feels somewhat separate, there is a good number of books and papers dealing with its language and grammar alone, however it is not a good idea to have an entire L2 for a single work. These archaic forms can be added under ==Alternative forms== as well as ==Etymology== as you have been doing (even though personally it is my habit to keep the etymology section as clean as possible from the attested forms already mentioned in the altform section above). As for quotes, the parameter |norm= is very useful to bring the orthography to a more readable scheme, as examples from Buzuku: thërrmijë, zdryp, mnerë, asgjë. Catonif (talk) 13:30, 19 April 2025 (UTC)Reply

So generally if we are creating a historical word that did not survive till now, should we put that word under the same header but marked obsolete or maybe better to use head=* ? What would be the best way to handle it? Words such as mamës, nasip are currently unmarked (while marked with * on DPWA) and it would really cause confusion for Albanian learners... Also probably it needs to be in a separate category.

"Old Turkic" and "Bulgar"

See Wiktionary:Beer_parlour/2025/January#"Old"_and_"Orkhon"_Turkic,_plus_some_more for the discussion leading up to this

I request the creation of these following language headers:

Proto-Bulgaric
Danube Bulgar
Volga Bulgar
Orkhon Turkic
Ajem Turkic (see also the prior discussion Wiktionary:Language_treatment_requests#Ancestor_of_Azerbaijani)

AmaçsızBirKişi (talk) 10:06, 25 January 2025 (UTC)Reply

@Benwing2 can u help with this? Zbutie3.14 (talk) 01:50, 15 February 2025 (UTC)Reply

Are these new L2 languages or etymology variants of existing L2 languages? The former require a lot more consensus than the latter. Benwing2 (talk) 02:06, 15 February 2025 (UTC)Reply

I think they are etymology variants. We already have the tags and , is an etymology tag that deals with Bulgaric languages and , excluding , us having the ability to distinguish these would be nice. is also an etymological language, for and .

AmaçsızBirKişi (talk) 14:15, 15 February 2025 (UTC)Reply

Also there are these 2 siberian turkic languages with no code

https://en.wikipedia.org/wiki/Soyot_language

https://en.wikipedia.org/wiki/Dukhan_language Zbutie3.14 (talk) 21:52, 15 February 2025 (UTC)Reply

What is proto-bulgaric for? Isn't trk-ogr what we use for oghuric? Zbutie3.14 (talk) 15:17, 15 February 2025 (UTC)Reply

It's mainly for distinguishing which stage of Bulgar the loanwords into other languages derive from, at least in theory.

Mongolic loans from Oghur branch would separate from Oghuric, but Hungarian/Church Slavonic loans would separate from Bulgaric for instance:

Oghur
- Mongolic (borrowed from Oghur)
- (...) (borrowed from Oghur)
- Proto-Bulgaric
  - Church Slavonic (borrowed from Bulgar)
  - Hungarian (borrowed from Bulgar)
  - (...)
    - Chuvash

AmaçsızBirKişi (talk) 16:24, 15 February 2025 (UTC)Reply

OK I need a little more help. Etymology variants are always variants of something else (either another etymology variant or an L2 language) and can have their ancestor set separately (e.g. Old Italian is considered an etymology variant of Italian, but Italian has Old Italian as an ancestor). Currently xbo (Bulgar) is an L2 language with trk-pro (Proto-Turkic, another L2 language) as its ancestor. Would trk-blg-pro (Proto-Bulgaric) be an etym variant of trk-pro and would the ancestor chain go trk-pro -> trk-blg-pro -> xbo? And would xbo-dnb (Danube Bulgar) and xbo-vol (Volga Bulgar) be etym variants of xbo (Bulgar) and also have trk-blg-pro as their ancestor? (In a case like this I suspect we don't have to set the ancestor explicitly; xbo-dnb and xbo-vol would automatically have their ancestor as the same as xbo. @Theknightwho for verification.) And since we already have otk (Old Turkic) as an L2 language with trk-pro as its ancestor, would Orkhon Turkic (otk-ork) be an etym variant of otk and have trk-pro as its ancestor? Finally, where does trk-ajm (Ajem Turkic) fit? Currently, Azerbaijani has Old Anatolian Turkish (trk-oat) as its ancestor; presumably Ajem Turkic would slot in between Azerbaijani and Old Anatolian Turkish in the ancestor chain; would trk-ajm be an etym variant of trk-oat or of az? (i.e. which one is it more similar to?) And should the name be "Ajem Turkic" or "Ajami Turkic"? Benwing2 (talk) 22:09, 15 February 2025 (UTC)Reply
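The inheritance rule asked about in this comment can be sketched in Python. This is only an illustration under stated assumptions, not how Wiktionary's language data modules are actually written: the `Lect` class, the `LECTS` table and both helper functions are hypothetical names, the codes are the ones proposed in this thread at this point (`trk-blg-pro`, `xbo-dnb`, `xbo-vol`), and the sketch assumes the answer to each question above is "yes".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Lect:
    code: str
    name: str
    # For an etymology variant: the code of the lect it is a variant of.
    parent: Optional[str] = None
    # Explicitly set ancestor; None means "inherit from the parent, if any".
    ancestor: Optional[str] = None


# Hypothetical registry mirroring the proposal discussed in the comment above.
LECTS = {
    lect.code: lect
    for lect in [
        Lect("trk-pro", "Proto-Turkic"),
        # Etym variant of Proto-Turkic with its ancestor set separately.
        Lect("trk-blg-pro", "Proto-Bulgaric", parent="trk-pro", ancestor="trk-pro"),
        Lect("xbo", "Bulgar", ancestor="trk-blg-pro"),
        # Etym variants of Bulgar with no explicit ancestor of their own.
        Lect("xbo-dnb", "Danube Bulgar", parent="xbo"),
        Lect("xbo-vol", "Volga Bulgar", parent="xbo"),
    ]
}


def resolve_ancestor(code: str) -> Optional[str]:
    """Return the explicit ancestor, else the one inherited from the parent."""
    lect = LECTS[code]
    while lect.ancestor is None and lect.parent is not None:
        lect = LECTS[lect.parent]
    return lect.ancestor


def ancestor_chain(code: str) -> List[str]:
    """Follow resolved ancestors from `code` back to the oldest stage."""
    chain = [code]
    while (ancestor := resolve_ancestor(chain[-1])) is not None:
        chain.append(ancestor)
    return chain
```

In this sketch `xbo-dnb` and `xbo-vol` pick up `trk-blg-pro` without it being set on them, which is the "automatic" behaviour the comment suspects; `xbo` itself does not appear in their chains because it is their parent lect, not their ancestor.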

@BurakD53 wrote a paragraph about bulgar in the original thread

Looking at the family tree at https://en.wiktionary.org/wiki/Category:Proto-Turkic_language, Classical Azeri (az-cls) is a descendant of Azeri (az), so it goes az -> az-cls. It says that Classical Azeri is the form of Azeri used in the 16th - 20th century. Shouldn't it be the other way around and renamed to ajem? So it should be: old anatolian turkish -> ajem -> azeri Zbutie3.14 (talk) 23:47, 15 February 2025 (UTC)Reply

What is going on with that page???? old turkic is supposed to descend from south siberian, salar is supposed to descend from oghuz, why are the descendents of common turkic not listed as its descendents, there is no Sayan or Yenisei under south siberian. Am I missing something or is the page just garbage? Zbutie3.14 (talk) 00:03, 16 February 2025 (UTC)Reply

It's not garbage; it just needs some ancestors set. But as someone not familiar with the whole Turkic family tree, I need specific settings from you, @AmaçsızBirKişi and @BurakD53 before I make any changes. It sounds like there are some issues still to be worked out. Benwing2 (talk) 00:24, 16 February 2025 (UTC)Reply

I think the structure I've been working on this past week in https://en.wiktionary.org/wiki/User:Zbutie3.14/trtable is the most accurate we have right now, @AmaçsızBirKişi and @BurakD53 please look at it and tell me if anything needs to be changed Zbutie3.14 (talk) 00:44, 16 February 2025 (UTC)Reply

All right but I still need you and @AmaçsızBirKişi to review my suggestions above for how to put this info into language codes. Benwing2 (talk) 00:51, 16 February 2025 (UTC)Reply

Frankly, I'm not sure if there is a difference between Proto-Bulgar and Proto-Oghur. When we look at it, we would reconstruct Proto-Turkic *öküz as Proto-Bulgaric *ökür because in Hungarian, which contains loanwords from Bulgaric, it appears as ökör. Similarly, we would reconstruct the Turkish word yemiş as Proto-Bulgaric *yémilč based on Hungarian and Chuvash, and this would be the same in Proto-Oghur. In this case, I’m not sure if we can speak of two separate languages. If there is a difference between Proto-Bulgar and Proto-Oghur, you (@AmaçsızBirKişi) should explain what that difference is. As things stand, unfortunately, there don’t seem to be any differences.

The Soyot language is considered a dialect of Tofa, and the Dukhan language is considered a dialect of Tuvan. If you believe these languages are distinct enough that they shouldn't be classified as dialects, then you should identify and describe the specific points of divergence. Then we can decide whether they should be considered dialects or not.

Regarding Bulgaric, we know that Danube Bulgar and Volga Bulgar are clearly distinct from each other. I have mentioned this before. Their writing systems, languages, and the cultures they were influenced by are all different. A people who adopted Islam, the Arabic script, and fell under Mongol rule cannot be equated with a people who adopted Christianity, Greek and Cyrillic scripts, and eventually became Slavicized. That’s why, as I said before, the distinction between xbo-vol and xbo-dnb can be made. In fact, we could also add xbo-kbn (Kuban Bulgar), which we will discuss in relation to Hungarian loanwords. However we don't have any inscription in this language.

On the site, Old Turkic includes both Orkhon Turkic and Yenisei Kyrgyz. There is already a separate code for Old Kyrgyz, but one could also be added for Orkhon Turkic. When placing it in the Descendants list, we should not forget that Yenisei Kyrgyz is a continuation of Orkhon Turkic. Even today, the Khakas people, who still bear the name "Kyrgyz" in the region, should be their descendants. After all, when we look at them, the Khakas, just like the Old Kyrgyz, are not Buddhists.

I'm not sure if Ajem Turkic and Classical Azerbaijani Turkic were actually a distinct language. Previously, I argued that they should be considered separate from Old Anatolian Turkish, but when I examined works supposedly written in the Azerbaijani region, I found nothing other than Old Anatolian Turkish. Whichever text I looked at, the language was OAT. This makes sense because there were two literary languages: one was Chagatai Turkic, also known as Eastern Turkic, and the other was Old Anatolian Turkic, also known as Western Turkic, which later evolved into Ottoman Turkish. Writers produced works in these two languages.

In short, I now believe that Azerbaijani should be classified under Old Anatolian Turkish. As for the term Ajem Turkic, it can be used not to indicate a distinct language but rather to refer to both Azerbaijani Turkic and Qashqai, since Ajami means Iranian in our language. Given that it refers to Turkic spoken in the Iranian region, this naming can be justified.

Regarding Classical Azerbaijani, there is no clear-cut distinction between it, Old Anatolian Turkish, and Ottoman Turkish. However, if a logical framework can be established and its distinction from other languages is clearly defined, perhaps a code could be assigned. Personally, I don't see this distinction clearly. Even in Fuzuli’s works, both ben and men appear within the same couplet. Maybe one distinguishing feature could be the use of -em instead of -üm as a suffix. Idk. BurakD53 (talk) 08:54, 16 February 2025 (UTC)Reply

@BurakD53 Thank you very much for your detailed comments. Keep in mind that etym variant codes can be assigned for lects that are not distinct enough to warrant treatment as a separate L2 language but where there is enough of a distinction where it makes sense to make a distinct lect code. As for the five proposed codes above, I think you're saying that Proto-Bulgar isn't needed; xbo-vol and xbo-dnb can be etym variants of xbo; Orkhon Turkic can be an etym variant of otk; and Ajem/Ajami Turkic is not a separate language hence a code isn't needed, or at most it needs to be an etym variant of Ottoman Turkish. Is that right? Benwing2 (talk) 09:03, 16 February 2025 (UTC)Reply

Yes, you're right. I don't think Ajem Turkic is necessary. The Oghuz classification in User:Zbutie3.14/trtable is completely suitable for me. I sincerely thank @AmaçsızBirKişi and @Zbutie3.14 for their efforts, and you as well for your evaluations. BurakD53 (talk) 09:35, 16 February 2025 (UTC)Reply

OK, just to clarify that I have this right:

Proto-Bulgar (or should it be Proto-Bulgaric?): Same as Proto-Oghur, but we have no language for this. Should we create Proto-Oghur and assign it trk-ogr-pro? If so, should this be an L2 language or an etym variant of trk-pro?
Danube Bulgar : Make an etym variant of xbo (Bulgar).
Volga Bulgar : Make an etym variant of xbo (Bulgar).
Orkhon Turkic : Make an etym variant of otk (Old Turkic)? Confusingly, we have Old Turkic and Old Uyghur as separate L2 languages, but Wikipedia says that Old Uyghur is a later dialect of Old Turkic. In that case, what is the difference between Wiktionary's Old Turkic and Old Uyghur lemmas? Should Old Uyghur have Old Turkic as an ancestor? Should Old Uyghur be merged into Old Turkic?
Ajem/Ajami Turkic: Make it an alias of Classical Azerbaijani. Make Classical Azerbaijani the ancestor of modern Azerbaijani and Qashqai.

Benwing2 (talk) 09:55, 16 February 2025 (UTC)Reply

1. Proto-Oghur would be a more inclusive term, allowing us to include the Khazars as well. Additionally, there is a separate Oghur dialect known as the s-dialect, traces of which can be found in Hungarian and some Uralic languages. These loanwords contain sz instead of gy in word-initial position, as they were borrowed from a different dialect.

2.

Support

3.

Support

4. In linguistic literature, Old Turkic includes Orkhon Turkic, Old Kyrgyz and Old Uyghur. However, Orkhon Turkic and Old Kyrgyz use the same script (although Old Kyrgyz has some special letters) and share the same religion, but belong to different regions. Old Uyghur, at least on the site, refers to texts written in the Old Uyghur script and associated with Manichaean or Buddhist traditions. The Old Uyghurs also produced works in the Orkhon script, but we classify those inscriptions under Orkhon Turkic on the site. For example, Irk Bitig, despite being a Manichaean divination book, is categorized as Orkhon Turkic. As far as I know, its language does not differ significantly from Orkhon Turkic. I would classify like this:

Old Turkic:
- Orkhon Turkic: (written in Old Turkic script around the Orkhon Basin between the 7th and 9th centuries)
  - Yenisei Kyrgyz: (written in Old Turkic script with Yenisei variants around the Yenisei Basin between the 8th and 13th centuries)
  - Old Uyghur: (written in Old Uyghur script, which derived from Sogdian script, around the Mongolia, Hami, Turpan and Gansu regions between the 9th and 14th centuries)
    - Western Yugur:

Or this:

Old Turkic:
- Orkhon Turkic:
  - Yenisei Kyrgyz:
- Old Uyghur:
  - Western Yugur:

5.

Support BurakD53 (talk) 10:40, 16 February 2025 (UTC)Reply

Old Turkic entries are the entries of both Orkhon Turkic and Yenisei Kyrgyz. But Old Uyghur has different entries. Old Turkic is used as an umbrella term here, but Old Uyghur entries are treated separately. BurakD53 (talk) 10:45, 16 February 2025 (UTC)Reply

The reason I wanted separate headers for Proto-Bulgar and Proto-Oghur is that they are definitely not the same language. pOghur is thought to have been spoken before the 1st century AD, while Proto-Bulgaric is much more recent (6th-13th centuries).

Proto-Bulgar is also known as West Old Turkic, which was concurrent with the East Old Turkic (i.e. Orkhon, Yenisei, Uyghur, Karakhanid)

For an example, I'd point to *bugday. The ideal way for the Oghuric descendants to be written would be like this:

pTurkic: *bugday
- Early pOghur: *bugday ~ *buday
  - (bor) pMongolic: *buguday
  - (bor) pMongolic: *budagan
    - Late pOghur: *buɣδay
      - Early pBulgaric: *buɣzai̯
        Late pBulgaric: *būza
        (bor) Old Hungarian: buʒa
        Hungarian: búza
        
        Old Chuvash (MČ1): *pŭraĭ
        Middle Chuvash (MČ2): *pŭri
        Chuvash: pări

Whether or not we need as much detail as this one is up for debate, but having two different language codes for Proto-Bulgar and Oghur seems like a no brainer for me.

(By the way, I have used 'Old Chuvash' in that entry for Proto-Bulgar, and that page also has some problems, but the desclist I've written above must be correct; the sources are the two references below.)

^ Agyágasi, Klára (2019) Chuvash Historical Phonetics (Turcologica; 117), Wiesbaden: Harrassowitz, page 240

^ Róna-Tas, András, Berta, Árpád, Károly, László (2011) West Old Turkic: Turkic Loanwords in Hungarian (Turcologica; 84), volume 1, Wiesbaden: Harrassowitz Verlag, pages 186-188

AmaçsızBirKişi (talk) 11:28, 16 February 2025 (UTC)Reply

Is the dh > z change mentioned by Kashgari considered Proto-Bulgaric here? Do the Hungarian loanwords follow the dh > z pattern, or is this specific to just this word? BurakD53 (talk) 11:47, 16 February 2025 (UTC)Reply

That δ > z is just a step in the larger Bulgaric sound shift of *-d- > -r-. In the book by Róna-Tas, it's dubbed the "second rhotacism" and the following chain of sound changes are given: pTurkic cluster *-Vgd- leniates to *-Vgδ- > *-Vɣz- > *-V̄z- > *-Vr- and finally to -V̆r-.

I guess it is independent of the *-d- > *-y- sound shift present in other Turkic languages, but they may have affected each other.

The -ɣ- deletion and the lengthening of the previous vowel seem to be a common theme before -d- in Bulgar. I don't know enough to call it regular, but see these for example:

pTurkic: *edgü ("good")
- pOghur: *ed(ɣ?)i ~ -ü
  - pBulgaric: *edV
    - (bor) Old Hungarian: idʲ ("holy")
      - Hungarian: egyház ("church")
pTurkic: *yogur- ("to knead")
- pOghur: *ǯuɣur-
  - pBulgaric: **Cūr- (?)
    - (bor) Old Hungarian: dʲǖr-öd
      - Hungarian: gyúr ("to knead, pug")
- pOghur: **ǯiɣur- (?)
  - pMongolic: *ǯigura-
  - pBulgaric: **Cǖr- (?)
    - (?bor) Hungarian: gyűr ("to crumple")

Source: Same book and volume by Róna-Tas and Árpád, pages 307-310, 411

AmaçsızBirKişi (talk) 12:18, 16 February 2025 (UTC)Reply

Forgot to add that these examples should also remove any doubt as to whether or not to have a distinct Bulgar language code, apart from Oghur. Using Old Chuvash (c. 13-15th century, following the Volga Bulgar) for this would not be accurate at all. AmaçsızBirKişi (talk) 12:20, 16 February 2025 (UTC)Reply

Since we have adhine > ايرنى "erne" in Volga Bulgar, we can say that there is no trace of this z-shift in VB. Unfortunately, there are no recorded Volga Bulgar words that could serve as examples of this change. We can only confirm that the r-form exists for this specific word. However, if it is claimed that there was an intermediate stage *azne, considered Proto-Bulgaric, then this intermediate phase must have been significant, so we should have a language code. If we accept this, wouldn't Kashgari’s 11th-century record of azak (instead of ayak) for the Bulgars, Yemeks, Suvars, and some Kipchaks be classified as Proto-Bulgaric? But why wouldn’t Kuban Bulgaric *z < Proto-Bulgaric *dh > Volga Bulgaric r be considered a valid transition? Are we certain that Volga Bulgar evolved from an earlier *z? BurakD53 (talk) 12:43, 16 February 2025 (UTC)Reply

I mean why not this:

pTurkic: *bugday
- Early pOghur: *bugday
  - Late pOghur: *buɣδai̯
    - Kuban Bulgaric: *buɣzai̯
      - Late Kuban Bulgaric: *būza
        (bor) Old Hungarian: buʒa
        Hungarian: búza
    - Old Chuvash (MČ1): *pŭraĭ
      - Middle Chuvash (MČ2): *pŭri
        Chuvash: pări

BurakD53 (talk) 12:53, 16 February 2025 (UTC)Reply

Hungarian and Slavic loanwords from Bulgar have a quite noticeable cut-off date, around the late 10th and early 11th century. Volga Bulgar, however, is attested 2 centuries later. Also considering that the *-z- we are talking about would probably be a volatile and unstable sound, I don't see a problem with 10-11th century Bulgar *-z- shifting to 13-14th century Bulgar *-r-. Agyágasi also gives this chain of descendants for irne in Chuvash, for your information:

New Persian āδīna
- (bor) Late Proto-Bulgar: **azinʲa ~ **arʲinʲa
  - Volga Bulgar: ايرنى (érne)
    - Middle Chuvash (MČ1): *erne
      - Chuvash: irne

There are good reasons for the palatalization of Proto-Bulgar -r-, and this chain of sound shifts is consistent with what I've given above (*-Vd- > *-Vδ- > *-V/V̄z- > (some intermediary shift) > *-Vr- and finally to -V̆r-).

Source for the New Persian to Chuvash sound shifts: Agyágasi's book I've ref'd above, page 191.

Maybe it's actually the Kuban Bulgar which is responsible for that shift, but I'd like to see some sources on Kuban Bulgar, if we even have any substantial material on that.

AmaçsızBirKişi (talk) 13:11, 16 February 2025 (UTC)Reply

┌────────────────────────────────────────────────────────────────────────────────────────────────────┘

The Kuban Bulgars seem to be the ancestors of the Volga Bulgars because, according to Tekin, they contributed words to Hungarian before the 8th century. We know that the Volga Bulgars migrated to the Volga Bulgar region from the Khazar state around the 9th century. Either the Kuban Bulgars were their ancestors or their cousins. As for Danube Bulgar, considering that the First Bulgarian Empire was founded in the 7th century, we can assume a similar background for them as well. To get the necessary answers, it would be useful to examine the Bulgar loanwords in Old Church Slavonic, which evolved into modern Bulgarian.

However, I want to highlight an important point: foreign languages adapt and adopt sounds that are not present in their own languages. How can we be sure that the Hungarians didn't adapt the δ sound as z in their language? One of the strongest arguments supporting this theory appears to be Kaşgarî’s record. However, since Kaşgarî never actually visited the Bulgar and Suvar lands, this record is generally considered inaccurate.

If all of this points to a proto-language with *z, I conclude that the Proto-Bulgar language should also have a code. Moreover, Proto-Bulgar already seems to refer to Kuban Bulgar. Danube Bulgar and Volga Bulgar must have evolved from it. @Benwing2

Proto-Oghur: *(r,l,lç,dh)
- Proto-Bulgar: *(r,l,lç,z)
  - (bor) Old Hungarian:
    - Hungarian:
  - Volga Bulgar: (r,l,(l)ç,r)
    - (...)
      - Chuvash:
  - Danube Bulgar: *(r,l,?,?)
    - (bor) Old Church Slavonic
      - Bulgarian:

BurakD53 (talk) 13:46, 16 February 2025 (UTC)Reply

@AmaçsızBirKişi @BurakD53 OK, there is no code for either Proto-Oghur or Proto-Bulgar(ic). And I'm still not sure what the ask is in terms of L2 languages. Do you want two new L2 langs, one new L2 lang or no L2 langs? Keep in mind that just because there is borrowing at different stages doesn't mean we need different L2 langs in all cases; etym variants may be enough. For example, we currently have no L2 codes for Proto-anything in the Romance family (although there is a pending proposal for Proto-Romanian or similar), and in the Slavic family we have only one L2 code for Proto-Slavic. In Germanic we have two L2 codes, for Proto-Germanic and Proto-West Germanic (although Proto-West Germanic is still somewhat controversial as a concept; it was mainly Victar pushing for PWG as a separate L2 language). Benwing2 (talk) 20:16, 16 February 2025 (UTC)Reply

I'm a bit confused. If oghur and proto-oghur are 2 different codes then shouldn't common-turkic and proto-common-turkic also be 2 different codes? We have a common-turkic code but no proto-common-turkic code. Common-turkic and oghur are both unattested so shouldn't we have only a proto-oghur and proto-common-turkic, no oghur and common-turkic code? Zbutie3.14 (talk) 21:15, 16 February 2025 (UTC)Reply

This is correct; the same situation exists in the Oghuz languages as well. Yes, Proto-Oghuz is a necessity, but if we already have an Oghuz code and can add reconstruction to it in the descendants list, why would we need a separate Proto-Oghuz language code? I think we should add the Proto-Bulgar code, and if necessary, we can add the reconstruction next to the Oghur heading. BurakD53 (talk) 09:14, 17 February 2025 (UTC)Reply

I figured out what the problem is. Right now common-turkic is a language. It should be a family, not a language. proto-common-turkic is the name of the language. This is why I was confused. Same should be done with oghur, oghur is a family and proto-oghur is a language. @Benwing2 first before adding new languages we should fix the stuff that's broken right now, so common-turkic should be made a family, proto-common-turkic should be a language, proto-oghur should be a language, the oghuz/kipchak/karluk/siberian families should be part of the common-turkic family, old turkic should be part of the south siberian family, and salar should be part of oghuz. Zbutie3.14 (talk) 14:21, 18 February 2025 (UTC)Reply

@Zbutie3.14 I went ahead and renamed "Common Turkic" to "Proto-Common Turkic" and changed its code from trk-cmn to trk-cmn-pro, so that trk-cmn can be used as the code for the Common Turkic family (currently it's still an alias for trk-cmn-pro). I realize now I should have pinged @AmaçsızBirKişi and @BurakD53 for confirmation but it seems like an obvious thing to do. Benwing2 (talk) 23:55, 18 February 2025 (UTC)Reply

I tried to convert all of the existing uses of trk-cmn based on the dump file and/or tracking in Special:WhatLinksHere/Wiktionary:Tracking/languages/trk-cmn. I am going to wait a day or two to see if any more uses pop up, and then create a Common Turkic family using the trk-cmn code. I'll deal with the other stuff at that point. Benwing2 (talk) 01:53, 19 February 2025 (UTC)Reply
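A conversion of this kind could be sketched as follows in Python. This is a hypothetical helper, not the script that was actually run: the function name `migrate_code` is made up, the template calls in the usage note are only illustrative, and the sketch assumes the code always fills a whole pipe-delimited template parameter with no surrounding whitespace.

```python
import re

OLD_CODE = "trk-cmn"
NEW_CODE = "trk-cmn-pro"

# Match OLD_CODE only when it fills a whole template parameter, i.e. it is
# preceded by "|" and followed by "|" or "}}". Because of the lookahead,
# an already converted "trk-cmn-pro" is left untouched.
WHOLE_PARAM = re.compile(r"(?<=\|)" + re.escape(OLD_CODE) + r"(?=\||\}\})")


def migrate_code(wikitext: str) -> str:
    """Replace the old language code with the new one in template parameters."""
    return WHOLE_PARAM.sub(NEW_CODE, wikitext)
```

For instance, `migrate_code("{{der|tr|trk-cmn|*bugday}}")` rewrites the second parameter to `trk-cmn-pro`, and running the function a second time on its own output changes nothing, so a re-run after new uses pop up would be safe under these assumptions.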

thanks ur the best! <3 Zbutie3.14 (talk) 02:07, 19 February 2025 (UTC)Reply

I think etym variants will suffice, in a similar vein to the cv-old and cv-mid we already have @Benwing2.

AmaçsızBirKişi (talk) 12:02, 17 February 2025 (UTC)Reply

@AmaçsızBirKişi OK, can you specify exactly which codes you want and what should be their parent language? Benwing2 (talk) 21:21, 17 February 2025 (UTC)Reply

I think we all agreed about xbo-vol, xbo-dnb, and otk-ork at least.

Oghur: (trk-ogr)¹
- Bulgar: (xbo)²
  - Volga Bulgar: (xbo-vol, etym variant of xbo)
    - (...)
      - Chuvash: (cv)
  - Danube Bulgar: (xbo-dnb, etym variant of xbo)

----

Common Turkic:
- Old Turkic: (otk)
  - Orkhon Turkic: (otk-ork, etym variant of otk)
  - Old Kyrgyz/Yenisei Kyrgyz: (otk-kir, etym variant of otk)

To see if they will support it or have any suggestions to solve the problem @Bartanaqa @Yorınçga573 @Ardahan Karabağ @Blueskies006 @Vahagn Petrosyan @Samubert96 @Əkrəm Cəfər

¹@AmaçsızBirKişi thinks Proto-Oghur should come after this node. I support that.

²@Amaçsızbirkişi thinks here should be Proto-Bulgar, and I support it instead of the reconstruction. Because I still think Proto-Bulgar is a -ð- language, not -z-.

Guys If you are here, plz see also Wiktionary:Language treatment requests#Proto-Oghuz and Proto-Arghu to be able to enter recorded lemmas, if you support or not. Thanks.

BurakD53 (talk) 07:19, 19 February 2025 (UTC)Reply

@BurakD53 Can you redo your table, making the following distinctions:

clearly distinguish full languages, etym languages and families;
include all the intermediate nodes;
boldface the stuff that needs adding;
indicate, when language B is indented under language A, whether A is ancestral to B.

In this case, I take it:

Oghur (trk-ogr) is a family which already exists, but Proto-Oghur does not exist and needs to be added. Proto-Oghur (trk-ogr-pro) would be an etym variant of Proto-Turkic (trk-pro), just like Proto-Oghuz is.
Bulgar is a full language which already exists, and has Proto-Oghur as its ancestor.
Volga Bulgar and Danube Bulgar are etym variants of Bulgar, but there is not an ancestral relationship. NOTE: I am going to use xbo-dan instead of xbo-dnb, for consistency.
Old Chuvash has Volga Bulgar as its ancestor; Middle Chuvash has Old Chuvash as its ancestor; Chuvash has Middle Chuvash as its ancestor. Anatri and Viryal are Chuvash etym variants but there is not an ancestral relationship.
Common Turkic is a family that will be created. Proto-Common Turkic already exists and is an etym variant of Proto-Turkic.
The Oghuz, Kipchak, Karluk and Siberian Turkic families will be placed under the Common Turkic family.
Old Turkic will be placed under the South Siberian Turkic family, which is under Siberian Turkic.
Orkhon Turkic will be created as an etym variant of Old Turkic, as Old Kirghiz already is.
Are there any ancestor/descendant relationships among Old Turkic, Orkhon Turkic, Old Kirghiz and Old Uyghur?
Salar will be placed under the Oghuz family per @Zbutie3.14.
Pecheneg (an L2 language), Salchuq (an L2 language), Khazar (an L2 language) and Arghu (an etym variant of Proto-Turkic, with L2 language Khalaj as its descendant) are currently hanging directly off of Proto-Turkic. Should they be moved elsewhere?

Benwing2 (talk) 07:49, 19 February 2025 (UTC)Reply
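The family placements in the list above (as proposed at this point in the thread, not necessarily as finally agreed) can be sketched as a simple parent map in Python. The names `PARENT_FAMILY`, `family_path` and `in_family` are hypothetical, and language and family names are used as keys so that no language codes have to be assumed.

```python
from typing import List

# Family placement as proposed in the list above (names only; no codes assumed).
PARENT_FAMILY = {
    "Oghuz": "Common Turkic",
    "Kipchak": "Common Turkic",
    "Karluk": "Common Turkic",
    "Siberian Turkic": "Common Turkic",
    "South Siberian Turkic": "Siberian Turkic",
    "Old Turkic": "South Siberian Turkic",
    "Salar": "Oghuz",
}


def family_path(name: str) -> List[str]:
    """Return the families that contain `name`, innermost first."""
    path = []
    seen = {name}
    while name in PARENT_FAMILY:
        name = PARENT_FAMILY[name]
        if name in seen:
            # Guard against a mistaken circular placement in the data.
            raise ValueError(f"cycle in family data at {name!r}")
        seen.add(name)
        path.append(name)
    return path


def in_family(name: str, family: str) -> bool:
    """True if `family` appears anywhere above `name` in the tree."""
    return family in family_path(name)
```

With this data, `in_family("Salar", "Common Turkic")` holds while `in_family("Salar", "Siberian Turkic")` does not, which is the kind of check that makes a misplaced node easy to spot when reviewing the category tree.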

Going off of Burak's comment, here is the full descendants list (based on ancestry):

Proto-Turkic:
- Oghur(ic): (FAMILY)
  - Proto-Oghur: (ETYM) (#1)
    - Proto-Bulgar: (ETYM) (#2)
      - Volga Bulgar: (ETYM) ( should also work here #3)
        Old Chuvash: (ETYM)
        (...)
        Chuvash:
      - Danube Bulgar: (ETYM) ( should also work here #3)
        (...) (borrowings)
    - Khazar:
      - (...)
- Common Turkic: (FAMILY)
  - Siberian Turkic: (FAMILY)
    - Old Turkic:
      - Orkhon Turkic: (ETYM) (#4)
      - Yenisei Turkic: (ETYM) (#4)
      - Old Uyghur
        (...)
    - (...)
  - Arghu: (FAMILY)
  - Oghuz: (FAMILY)
  - Kipchak: (FAMILY)
  - Karluk: (FAMILY)

---

/// Footnotes: ///

'#1: Proto-Oghur, like you said, can be an etym variant of Proto-Turkic. It will have Proto-Bulgar and Khazar as its descendants. We might need to add Tuoba, Apar and so on if we reach a consensus or if the need arises. But those are very tentative, so I digress.

'#2: Proto-Bulgar is the theoretical reconstruction of the Bulgaric languages, Danube and Volga (and also Kuban, but that's unattested) Bulgar. Its ancestor is Proto-Oghur and its descendants are Volga and Danube Bulgar variants, alongside the unsplintered Bulgar .

'#3: The new Volga variant of Bulgar will have Old Chuvash (and the contemporary Chuvash) as its descendants. Danube Bulgar does not need a descendant, since it is a dead branch. It's there mainly because of loanwords into Hungarian, Church Slavonic and Romanian.

'#4: Both Orkhon and Yenisei Turkic should have Old Turkic as their ancestor. We also might need to add Old Uyghur as a descendant of too. There is a recurring issue of previous edits confusing Orkhon Turkic and Old Uyghur, and people immediately assume a text to be Orkhon if it has runes, which is simply not the case. For example, almost half of the lemmas in Orkhon Turkic mainspace cite Ïrḳ Bitig, a work in Old Uyghur. Separating these would be more accurate.

---

/// Some more: ///

Arghu is a descendant from the Common Turkic branch, as far as I am aware. The confusion stems from the fact that it is the earliest branch to diverge from other Turkics, but it is firmly in the Common Turkic family.
I don't think it would be appropriate if we placed Old Turkic under South Siberian, that would be anachronistic. Yakuts and Dolgans have not migrated northwards at the time when Old Turkic was spoken.
Orkhon - Yenisei - Uyghur has no ancestral relation to one another. They all stem from Old Turkic, that's all.
Khazar is an Oghuric language. I've already talked about Arghu, and I do not know much about Salchuq or Pecheneg. We don't have any entries in either, so I don't think chopping them off from the family table (for now) is that much of an issue.

Please let me know if I got something wrong!

AmaçsızBirKişi (talk) 11:58, 19 February 2025 (UTC)Reply

About #4: After the collapse of the Göktürk State, the language used in the Old Uyghur runic inscriptions was no different from Orkhon Turkic. It was a continuation of the same written tradition in the same region, around the Orkhon basin. Therefore, I believe that texts written in the Orkhon script, such as Irk Bitig, should not be included under Old Uyghur entries. In academia, Old Uyghur Turkic is often used to refer to texts written in the Old Uyghur script, while Irk Bitig is frequently classified as Old Turkic. In his book Irk Bitig: The Book of Omens, Talat Tekin did not use the term "Uyghur" even once for Irk Bitig. Instead, he simply referred to it as "Old Turkic" and described it as a Manichaean ny dialect. As we know, Orkhon Turkic is also a ny dialect. I think that's why Yorınçga includes it under Old Turkic instead of Old Uyghur. He can explain better. As stated in the source linked, all the Old Turkic texts written in the Orkhon script are referred to as the Manichaean ny dialect. BurakD53 (talk) 15:53, 19 February 2025 (UTC)Reply

Very well. I'll remove the quotations from Ïrḳ Bitig I added for Old Uyghur. Thanks for correcting me!

The Dergipark article you linked is dead, by the way.

AmaçsızBirKişi (talk) 16:18, 19 February 2025 (UTC)Reply

here. After reading a bit more about it, though, I'm not sure. Perhaps it would be more accurate to add it as Old Uyghur. Although it is written in the ny dialect, there are other differences from Orkhon Turkic, for example the use of the -gAy suffix for the future tense and of the ablative suffix -dIn. I take my words back. BurakD53 (talk) 19:28, 19 February 2025 (UTC)

I mean, sure why not? It was written in either year 930 or 942, way outside the range of other Turkic inscriptions (8th century).

We can remove the IB from the quotations part and the entries that rely only on IB when we deprecate the in favor of and . For example, yél ("mane") is only attested in IB and nowhere else in the Orkhon script. Entries like that will need removal.

AmaçsızBirKişi (talk) 19:58, 19 February 2025 (UTC)Reply

Probably not all the Runic inscriptions after the collapse of the Gokturk state, but Irk Bitig should be considered as Old Uyghur. BurakD53 (talk) 19:30, 19 February 2025 (UTC)Reply

I support the table.

Support. BurakD53 (talk) 16:05, 19 February 2025 (UTC)Reply

@AmaçsızBirKişi @BurakD53 @Zbutie3.14 OK, I tried to implement everything in the above table. Please review the results. Arghu is not currently a family but an etym variant of Khalaj, so I just set its ancestor to Proto-Common Turkic. Also, I gave Proto-Bulgar the code trk-bul-pro instead of trk-blg-pro, for consistency. Possibly it should be xbo-pro, but I don't know if it's kosher to have a protolanguage that is "Proto-" of a language rather than a family. Benwing2 (talk) 06:21, 20 February 2025 (UTC)

Salar should be under Oghuz branch, just like Turkmen. Other than that, it's perfect. Thanks for resolving this issue.

AmaçsızBirKişi (talk) 10:55, 20 February 2025 (UTC)Reply

Is Khazar Oghur? According to Wikipedia it's disputed: https://en.wikipedia.org/wiki/Khazar_language Zbutie3.14 (talk) 13:36, 20 February 2025 (UTC)

We are making quite a few requests, but may I ask for one more thing? Could we create three variants for qwm, just like we did for otk?

Proto-Turkic:
- Proto-Common Turkic:
  - Kipchak: (FAMILY)
    - Cuman-Kipchak:
      - Kipchak:
        Cuman: (etym variant of qwm) (this is what I am asking for)
        Crimean Tatar:
        Urum:
        
        Karachay-Balkar:
        
        Karaim:
        
        Krymchak:
        
        Kumyk:
        
        Armeno-Kipchak: (etym variant of qwm)
        
        Mamluk-Kipchak: (etym variant of qwm)

BurakD53 (talk) 07:30, 20 February 2025 (UTC)Reply

what do you think? @AmaçsızBirKişi BurakD53 (talk) 07:33, 20 February 2025 (UTC)Reply

Armeno-Kipchak must be a descendant of Cuman too, since Cuman was written in Crimea in the 14th century and Armeno-Kipchak was written in Crimea in the 17th century. Mamluk-Kipchak, written in Egypt in the 13th-16th centuries, can't be a descendant of Cuman. I will just edit the table so as not to cause more confusion. BurakD53 (talk) 07:44, 20 February 2025 (UTC)

I added Cuman as an etym variant of qwm (Kipchak) and put Armeno-Kipchak under it, but I'm not sure about putting Crimean Tatar, Karachay-Balkar, etc. under Cuman. Currently the Kipchak-Cuman family (what you call Cuman-Kipchak) is under (a descendant of) the Kipchak language, whereas your tree above has them reversed. Can you edit your tree and label everything that's a family with the label "FAMILY" so we are completely clear what's going on? Also, Wikipedia asserts that "Cuman" and "Kipchak" are the same thing; see w:Cuman language. Benwing2 (talk) 23:15, 20 February 2025 (UTC)Reply

Also ping @AmaçsızBirKişi @Zbutie3.14. Benwing2 (talk) 23:16, 20 February 2025 (UTC)Reply

Proto-Turkic:
- Proto-Common Turkic:
  - Kipchak: (FAMILY)
    - Cuman-Kipchak: (FAMILY)
      - Kipchak:
        Cuman: (etym variant of qwm, location Crimea)
        Armeno-Kipchak: (etym variant of qwm, location Crimea)
        Mamluk-Kipchak: (etym variant of qwm, location Egypt)
      - Crimean Tatar: (location Crimea)
        Urum: (location Southeast Ukraine)
      - Krymchak: (location Crimea)
      - Karachay-Balkar: (location Caucasus)
      - Karaim: (location Crimea, Poland)
      - Kumyk: (location Caucasus)

BurakD53 (talk) 08:21, 21 February 2025 (UTC)

@BurakD53 This appears to not properly indicate the ancestor/descendant relationships. Presumably Armeno-Kipchak is a descendant of Cuman? What about Crimean Tatar, Krymchak and/or Karaim? Can you explicitly indicate the ancestor of each lect where it differs from the containment relationships shown in the above table? Benwing2 (talk) 08:56, 21 February 2025 (UTC)

I don't have enough knowledge about these languages, so any comment I make could be incorrect. Yes, one is probably the ancestor or descendant of the other, but I'm saying this just based on location and the period. BurakD53 (talk) 09:03, 21 February 2025 (UTC)
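
As an illustration of the distinction being asked for here, containment (which family a lect is filed under) and descent (which lect is its direct ancestor) can be recorded as separate fields. The following is only a minimal Python sketch: the `Lect` class, its field names and the helper functions are hypothetical and are not Wiktionary's actual module format, and the data only restate what is said in this thread (Armeno-Kipchak is given Cuman as its ancestor because that is what was proposed above; every other ancestor is left unstated).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lect:
    """Hypothetical record; not Wiktionary's real language-data schema."""
    name: str
    family: str                       # containment: family the lect is filed under
    variant_of: Optional[str] = None  # etym variant of which full language, if any
    ancestor: Optional[str] = None    # descent: direct ancestor, None if not stated


# Data restated from the tree above; ancestors are filled in only where
# the thread states them explicitly.
_LECTS = [
    Lect("Kipchak", "Cuman-Kipchak"),
    Lect("Cuman", "Cuman-Kipchak", variant_of="Kipchak"),
    Lect("Armeno-Kipchak", "Cuman-Kipchak", variant_of="Kipchak", ancestor="Cuman"),
    Lect("Mamluk-Kipchak", "Cuman-Kipchak", variant_of="Kipchak"),
    Lect("Crimean Tatar", "Cuman-Kipchak"),
    Lect("Urum", "Cuman-Kipchak"),
    Lect("Krymchak", "Cuman-Kipchak"),
    Lect("Karachay-Balkar", "Cuman-Kipchak"),
    Lect("Karaim", "Cuman-Kipchak"),
    Lect("Kumyk", "Cuman-Kipchak"),
]
LECTS = {lect.name: lect for lect in _LECTS}


def ancestor_chain(name):
    """Return the stated ancestors of a lect, nearest first."""
    chain = []
    current = LECTS[name].ancestor
    while current is not None:
        chain.append(current)
        current = LECTS[current].ancestor
    return chain


def lects_without_stated_ancestor():
    """List the lects whose ancestor still has to be specified."""
    return sorted(n for n, lect in LECTS.items() if lect.ancestor is None)
```

Calling `lects_without_stated_ancestor()` lists exactly the lects for which the tree alone does not answer the question, which is the information still missing from the proposal.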

I think stands for the language of Codex Cumanicus right? If so yes we need that.

AmaçsızBirKişi (talk) 10:53, 20 February 2025 (UTC)Reply

Etymology-only codes and dialect labels for South Sumatran Malayic

I would like to request etymology-only codes and dedicated dialect labels (not sure if this is the right place?) for South Sumatran Malayic varieties under the Musi and Central Malay dialect groups. These varieties used to have their own ISO 639-3 codes before they (except , , and ) were merged into and in 2008. Per McDowell & Anderbeck (2020), many of these lects do have their own salient distinguishing features, and they remain treated as separate languages in most Indonesian publications. Specific words from several of these varieties have been borrowed into Indonesian, and they need to be etymologized properly (attested terms only, per Wiktionary:About Indonesian#Regional Languages).

Etymology-only languages currently needed:

or Palembang (formerly )
or Sekayu (formerly in Ethnologue 13, pre-ISO)
Lematang (formerly )
or Bengkulu (formerly )

Not necessary, but may be useful for tracing etymon reflexes:

Serawai (formerly )

Given the lack of universally accepted standard varieties in both and groupings, we also need to carefully label and categorize their entries according to their specific dialectal origin. I propose we adopt the classification given in McDowell & Anderbeck (2020), which retains most of the familiar local "language" labels (in Italics).

Musi dialect group

Upper Musi
- Musi Proper (= Musi, formerly in the narrow sense)
  - Kelingi
  - Penukal
  - Sekayu
- Pegagan (often misidentified as a dialect of Ogan )
- Rawas (formerly )
- Col
Palembang–Lowland
- Palembang (formerly )
  - Palembang Lama (traditional variety which includes a polite register akin to Javanese krama, taught locally in Palembang schools since 2024)
  - Palembang Pasar (urban koiné used as a regional lingua franca within and beyond the city of Palembang)
  - Pesisir (rural coastal variety, formerly listed under )
- Lowland
  - Belide (formerly under and )
  - Lematang Ilir (= Lematang, formerly )
  - Penesak (formerly )

Central Malay dialect group

Oganic
- Ogan (formerly )
- Rambang
- Enim (formerly )
Highland
- Bengkulu (formerly )
- Besemah (formerly in the narrow sense)
- Lematang Ulu (identical to Besemah)
- Lintang (formerly )
- Semende (formerly )
- Benakat
- Serawai (formerly )
  - Talo (*-a > , used by Adelaar to reconstruct Proto-Malayic)
  - Manna (*-a > )
- Kaur
- Pekal

Currently I have started using some of these labels in entries, cf. katek, rete, and muanai. At the very least, I think we need dedicated labels and categories for the etymology-only languages proposed above + the already existing (Besemah). The category names for dialects of and may be appended with "Malay", e.g. Palembang Malay, Musi Malay, Ogan Malay, Semende Malay, etc.

Note that prior to the merger of the codes (and up until now in Indonesia), the term "Palembang Malay" or "Palembang language" (bahasa Palembang) can only refer to the dialects under "Palembang" in particular, while "Musi language" (bahasa Musi) refers to dialects under "Musi Proper". The rest of the dialects are either treated as languages on their own, as dialects of Malay, or occasionally under other umbrella terms such as "Bengkulu language" (bahasa Bengkulu) for Highland dialects spoken in Bengkulu.

I am indifferent to the issue of whether we should lump together and with , and with . In particular, is sometimes placed closer to than to other lects (e.g. in Glottolog). Haji is an isolate within Malayic, sharing only ~60% of its lexicon with neighboring South Sumatran varieties, and is best treated as its own language. All , , and lects should be written in as the default script, but also uses , and is occasionally written with . Swarabakti (talk) 21:21, 25 January 2025 (UTC)Reply

Reconstruction:Common Romanian

Common Romanian, also called ‘Proto-Romanian’, is the reconstructed common ancestor of Aromanian, Istro-Romanian, Megleno-Romanian, and Romanian. There is considerable scholarship on the subject. Sala 1976 treats the phonological aspects of the reconstruction in detail.

We already host such reconstructions under ‘Reconstruction:Latin’, which is problematic for a number of reasons:

The name. No scholar refers to this reconstruction as ‘Latin’, and that name can easily mislead our readers.
The orthography. Spellings like *⟨oestricula⟩ are quite out-of-step with reconstructions like /ˈstrekʎe/.

Proposed orthography: the phonemic transcriptions as they are now, except with some other way of indicating stress. For instance *strékʎe.

Pinging @Word dewd544, @Catonif, @Bogdan, @Benwing2 as potentially interested parties.

Nicodene (talk) 20:30, 30 January 2025 (UTC)Reply

No objection here except possibly to the name; "Proto-Romanian" sounds a bit better IMO although I'm not familiar with the scholarship to know what's the most common term. Benwing2 (talk) 21:07, 30 January 2025 (UTC)Reply

Is there any chance we could call it "Proto-Eastern Romance" since we group the languages in question together as the Eastern Romance languages? It gets a good number of Google hits. Also, both "Proto-Romanian" and "Common Romanian" are likely to be perceived as the ancestor of Romanian alone, not the other ones. —Mahāgaja · talk 21:09, 30 January 2025 (UTC)Reply

I agree that it would be useful to have Proto-Romanian (or however we decide to call it), not just for Latin words, but also for borrowings from Albanian. But about this particular case, while I agree that there can be a reconstruction before the split into Romanian and Aromanian pronounced /strekʎe/, the word itself is older, a Late Latin *oestricula must have existed, as the diminutive suffix was no longer productive at the later stage of the language (Proto-Romanian). I also wonder if we can find an obscure descendant of *oestricula in some dialect of Northern Italian, as often happens with Romanian words that are from Late Latin. Bogdan (talk) 23:02, 30 January 2025 (UTC)Reply

I’m not sure we can regard the criterion for Latin as ‘still having a productive reflex of -iculum’ in light of, for instance, Spanish -ejo.

@Mahagaja: Italian is often included under the label Eastern Romance, unfortunately. A possible option without this issue is Proto-Balkan-Romance.

Nicodene (talk) 03:43, 31 January 2025 (UTC)Reply

Even if others include Italian under Eastern Romance, we don't. We already use that term with the label roa-eas for a family consisting of ro, ruo, rup, and ruq. Calling the protolanguage of that family Proto-Eastern Romance would be internally consistent. That other people define the Eastern Romance family differently doesn't really have any relevance to what we call the protolanguage. —Mahāgaja · talk 07:13, 31 January 2025 (UTC)Reply

There has never been a discussion or vote on defining the label Eastern Romance, or using it on Wiktionary to begin with.

The vast majority of the time the term Eastern Romance has a broader scope than those four languages.

Nicodene (talk) 10:21, 31 January 2025 (UTC)Reply

Personally, I would be fine with this if it only implies that we would still handle the situation exactly as we do now, the only differences being the language name ("Common Romanian" instead of "Latin") and a more fitting orthography, which are the two issues listed here. But I oppose this if, as I understand it, this would take the role of a full-fledged language and hence also have terms inherited from attested Latin terms and terms borrowed from Slavic or some other Balkan language. This would increase the number of reconstructions excessively (to approximately two thousand), an immense amount of work for little usefulness and greater informational clutter.

Regarding the name, were the first approach I mentioned to go through, I would support "Common Romanian" or, if we find it more coherent with the rest of the bunch, "Proto-Romanian". I would vote against any mention of "Eastern" or "Balkan Romance". Catonif (talk) 18:15, 31 January 2025 (UTC)

February 2025

Proto-Oghuz and Proto-Arghu to be able to enter recorded lemmas

We have probably discussed this before on Discord and maybe on other discussion pages, but the Proto-Oghuz language should be eligible for entries. Some proto-languages can be added to Wiktionary without formal reconstruction. The same is needed for Proto-Arghu. We don't have a code for it, but Kashgarî recorded words in Proto-Arghu and we currently add these words as if they were Karakhanid, which is wrong. The same should be possible for Proto-Oghuz: non-reconstructed entries should be allowed for the Oghuz dialect recorded in the 11th century and earlier. As an example, I can add recorded lemmas in the Proto-Norse language, but can't for Proto-Oghuz. >>ᚺᚼᛁᛞᛉ<< BurakD53 (talk) 13:30, 18 February 2025 (UTC)

@Benwing2 @AmaçsızBirKişi @Zbutie3.14 @Bartanaqa @Yorınçga573 @Ardahan Karabağ BurakD53 (talk) 17:15, 18 February 2025 (UTC)Reply

yeah completely agree for having separate entries for non-reconstructed proto-oghuz/arghu entries cuz they are quite literally attested. Bartanaqa (talk) 23:36, 18 February 2025 (UTC)Reply

@BurakD53 I'm not opposed but I assume there are very few such terms, is that right? If so, rather than simply turning off the "reconstructed" type for Proto-Oghuz and for Arghu, we (I) should implement the "anti-asterisk" feature mentioned in Wiktionary:Beer_parlour/2024/April#Mainspace_Proto-West-Germanic?. That way, they are still identified as reconstructed languages but you can create mainspace entries provided you identify them with the appropriate symbol (which might be a double exclamation point, !!). Benwing2 (talk) 07:54, 19 February 2025 (UTC)Reply

Alright, that's great! I believe that more than 200 words from Kashgarî's Diwan, written in the 11th century, should be considered Proto-Oghuz because this language belonged to the *Tağlığ group, whereas today all Oghuz languages, including Salar, belong to the *Tağlı group. The information provided by Arab travelers about the Oghuz people and their language in the 11th century and earlier can also be considered. More than a dozen words from Proto-Arghu must have been recorded in the Diwan as well, since I can recall about a dozen myself. BurakD53 (talk) 14:58, 19 February 2025 (UTC)Reply

This is داغ#Karakhanid, the word in Arghu. Can you show me how to change it to Proto-Arghu, so that I can edit Oghuz and Arghu lemmas in the same way? @Benwing2 BurakD53 (talk) 19:39, 19 February 2025 (UTC)

@BurakD53 We have no Proto-Arghu or Arghu family yet; Arghu is currently an etym variety of Khalaj. Should the Arghu family be added? See my comments above. Benwing2 (talk) 07:38, 20 February 2025 (UTC)Reply

It should be. The most distinctive feature of Arghu is that it changes the Old Turkic -ny- sound to -n-. Also, instead of using *emez and its variants like all other Turks or *degül like the Oghuz, it has its own way of saying "not." It is a language that has preserved primary long vowels and does not belong to the ayak group like the Oghuz, which makes it a whole different branch. BurakD53 (talk) 07:56, 20 February 2025 (UTC)

In that case what should happen to the Arghu etym variety of Khalaj? Should it disappear in favor of Proto-Arghu? Benwing2 (talk) 08:06, 20 February 2025 (UTC)Reply

It should be:

Arghu:
- Proto-Arghu: (some words are attested by Kaşgarî in 11th century)
  - Khalaj:

BurakD53 (talk) 09:48, 20 February 2025 (UTC)Reply

Are you saying that the Arghu language (etym variety) should be converted into a family? It isn't clear to me. BTW I don't think there are any actual Arghu language entries being referenced currently, because there's no page Category:Arghu or Category:Arghu Turkic or any such thing. Benwing2 (talk) 22:57, 20 February 2025 (UTC)Reply

Arghu Turkic is attested in Diwanu Lügatit Türk, in the 11th century. There is no such category because we haven't added Arghu lemmas yet. The Arghu languages are a subfamily. See Argu languages. I have never heard of an etym variant of Khalaj called Arghu. Let's also ask @Xenos melophilos. The only record of Arghu is in the Diwan, and that's why the group is called Arghu. BurakD53 (talk) 23:51, 20 February 2025 (UTC)

Arghu is a Common Turkic language, so it is z-Turkic. It is an adak-group Turkic language, as in *adak. It has primary long vowels. It is an -n- group language, as in *koń. BurakD53 (talk) 00:00, 21 February 2025 (UTC)

@BurakD53 So what should we do exactly? Should we rename the Arghu language to Proto-Arghu? Should we instead add Proto-Arghu and keep the Arghu language? And what should Proto-Arghu be an etym variant of? How distinct is it from Proto-Common Turkic and/or modern Khalaj? Benwing2 (talk) 08:58, 21 February 2025 (UTC)Reply

Sorry to keep asking you the same questions but you need to be extremely explicit about all the various relationships. Maybe @AmaçsızBirKişi can help you. Benwing2 (talk) 08:59, 21 February 2025 (UTC)Reply

Proto-Common Turkic is a -ny- language, while Proto-Arghu is an -n- language. Proto-Common Turkic uses *ermez for "not", Proto-Arghu uses da:g, Proto-Oghuz uses *degül. The difference between Khalaj and Arghu is that Arghu was attested in the 11th century, while Khalaj is a modern language spoken in the 21st century. So Proto-Arghu is the ancestor of Khalaj. Khalaj has Azerbaijani borrowings, while Arghu doesn't. Khalaj has probably been influenced quite a lot by Persian. There are lemmas attested in Arghu that do not survive in modern Khalaj. Do we have the words balık "mud", teşrüm "string ball", bitrik "peanut" in Khalaj? @Xenos melophilos can help better. I'm not sure we have all the attested Arghu words in Khalaj. That's normal, because there is literally a millennium between them. Of course there must be phonological and maybe morphological differences too. BurakD53 (talk) 09:32, 21 February 2025 (UTC)

I think we should remove the Arghu language and add Proto-Arghu, because no text written in the Arghu language has been found to date. The Arghu language must be Proto-Arghu, and we can reconstruct it with the help of the attested words in DLT. If we have an Arghu language without any reconstruction, there will be only 15 or 30, at most 50, lemmas, and that will be the whole language. So I think Proto-Arghu is better. BurakD53 (talk) 09:40, 21 February 2025 (UTC)

If you think the Proto language is unnecessary, just make it Arghu language. The only thing that matters to me is that Arghu can be presented as a separate branch from Karakhanid and as the ancestor of Khalaj. I honestly don't care about the rest. BurakD53 (talk) 10:13, 21 February 2025 (UTC)Reply

Well, you guys are mentioning me. What do I think?

Arghu or Proto-Arghu, I don't care; either way it'll look like Proto-Norse (sometimes attested, sometimes not). What matters is that there should be a language section called Arghu or Proto-Arghu.

Arghu is attested in Perso-Arabic script. In Khalaj some dialects are more conservative than others, so Arghu words that seem not to exist in Khalaj could actually be preserved in some dialect.

My point is that Arghu should be a language apart, and not reconstructed, because there are words attested in the Divan (just like Proto-Norse). Xenos melophilos (talk) 14:50, 21 February 2025 (UTC)

There should not be an "Arghu" group because we have just one language with one descendant. An Arghu or Proto-Arghu language is fine. Xenos melophilos (talk) 14:52, 21 February 2025 (UTC)

Support.

AmaçsızBirKişi (talk) 19:02, 19 February 2025 (UTC)Reply

Medieval Greek 2025

Pending from 2024 (Benwing plan)

Waiting... . It would be useful for reviewing correctly etymologies, Cat:Koine Greek and Cat:Modern Greek simultaneously. At the moment I feel 'blocked' because it will be hectic to have to go back to my reviews to rereview them. I usually write a MedGr.reminder every January of every year since 2023. The stylistic use of Koine through centuries as high register & diglossia should not discourage or confuse this decision. Thank you. ‑‑Sarri.greek ^♫ I 15:39, 19 February 2025 (UTC)Reply

@Sarri.greek: Hello again. Could you explain what you refer to by “The stylistic use of Koine through centuries as high register & diglossia should not discourage or confuse this decision.”, please? 0DF (talk) 01:17, 20 February 2025 (UTC)Reply

March 2025

Code for Volga Turki?

Proto-Turkic: (trk-pro)
- Proto-Common Turkic: (trk-cmn-pro)
  - Kipchak: (trk-kip) (FAMILY)
    - Kipchak-Bulgar: (trk-kbu) (FAMILY)
      - Volga Turki: (?)
        Bashkir: (ba)
        Tatar: (tt)
>>Volga Türki<<

BurakD53 (talk) 18:34, 3 March 2025 (UTC)Reply

Data: Qul Ali Kıssa-i Yusuf and Volga Tatar tombstones (like Volga Bulgar inscriptions). - BurakD53 (talk) 18:37, 3 March 2025 (UTC)Reply

@AmaçsızBirKişi @Zbutie3.14 - BurakD53 (talk) 18:39, 3 March 2025 (UTC)Reply

fine with me Zbutie3.14 (talk) 23:28, 3 March 2025 (UTC)Reply

Why not?

Support.

@Benwing2 (I know we keep pinging you to add new lang codes for Turkic languages, but the thing is that the previous ones were very inaccurate and incomplete.)

AmaçsızBirKişi (talk) 10:08, 4 March 2025 (UTC)Reply

@Benwing2 We need a code for Volga Turki. It is an L2 language under the Kipchak-Bulgar family, and it is the parent of Bashkir and Tatar. There are also a lot of other things that need to be changed, but we can deal with those later, I guess. Zbutie3.14 (talk) 19:14, 13 March 2025 (UTC)

Some Sino-Tibetan considerations

New sub-proto-languages

I would like to propose some Sino-Tibetan sub-proto-languages:

Proto-Bodish (sit-bdi-pro), for Category:Bodish languages; a long list of Proto-Bodish forms is provided in Bodt's "East Bodish Revisited".
Proto-Tangkhulic (sit-tng-pro), for Category:Tangkhulic languages; David Mortensen has published extensively on this
Proto-Naish (sit-nas-pro), for Category:Naish languages; several reconstructions are given by Jacques and Michaud's "Approaching the historical phonology of three highly eroded Sino-Tibetan languages: Naxi, Na and Laze" (and also Li Zihe has his own reconstruction scattered across separate papers).
Ersuic languages (sit-ers) composed of ers (Ersu) and sit-liz (Lizu); its proto-language (which would thus be sit-ers-pro) is reconstructed by Yu 2012.

— Ceso femmuin mbolgaig mbung, mellohi! (投稿) 16:48, 6 March 2025 (UTC)Reply

@Benwing2 @Thadh @Justinrleung for consideration. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 16:49, 6 March 2025 (UTC)Reply

Definitely support for Proto-Bodish.

I'm not very familiar with the rest, so I can't readily say how good the reconstructions are (but on first glance, they seem fine). For Naish, I'm a bit worried that Proto-Naish may not turn out to be that different from Proto-Naic - are there any reconstructions of the latter? When dealing with just five languages, I think a higher-order reconstruction would potentially be more interesting than a lower-order if the languages are closely related. On the other hand, if there's no work being done on these and it's not likely to be the case in the future, we might as well include Proto-Naish now, if it's already reconstructed. Thadh (talk) 17:31, 6 March 2025 (UTC)Reply

Couldn't find any advancements to a Proto-Naic stage beyond Proto-Naish, no. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 17:42, 6 March 2025 (UTC)Reply

No objections but I don't know a lot about the intermediate structure of Sino-Tibetan. I have heard there is a good deal of uncertainty, are we pretty sure these families are valid? If so, I'm fine with creating the relevant proto-languages. Benwing2 (talk) 20:13, 6 March 2025 (UTC)Reply

Naish, Tangkhulic, and Ersuic are unchallenged. They should proceed without a hitch.

Bodish... has an annoying terminology problem. Everyone agrees that Tibetic and East Bodish belong together, but the terminology used to refer to such a grouping varies wildly:

Tibetic + East Bodish alone are the basis of Bodt's "Proto-Bodic" reconstruction. But Bodt defines "Bodish" as synonymous with Tibetic and "Bodic" as Tibetic + East Bodish + Tamangic + West Himalayish.
Bodish = Tibetic + East Bodish according to Hill (Hill actually rejects East Bodish as a genetic group, but still considers its components overall Bodish); consequently Proto-Bodish is the ancestor of this grouping. A similar definition of "Bodish" is also used in Glottolog.
Shafer uses "Bodish" for two levels of grouping, the lower level consisting of what is now accepted as Tibetic + East Bodish.
The current definition of Category:Bodish languages is Tibetic + East Bodish + Tshangla (and for some bizarre reason 'Olekha, which certainly doesn't look Bodish at all to me).

So basically, Bodt's "Proto-Bodic" is not a valid reconstruction for what he defines as "Bodic" (since it only uses two of the four Bodic branches for reconstruction), but it is valid for what Hill and Glottolog call "Bodish" (East Bodish + Tibetic) which essentially is a subgroup of what Tournadre and others call "Bodish" and Bodt calls "Bodic" (East Bodish + Tibetic + West Himalayish + Tamangic).

In the end, I would like for Tshangla and 'Olekha to be removed from Category:Bodish languages since their Bodishness is dubious. This leaves behind East Bodish and Tibetic, whose proto-language will be Bodt's Proto-Bodic = Hill's Proto-Bodish with code sit-bdi-pro. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 00:53, 7 March 2025 (UTC)Reply

Thanks for the details. So I take it you prefer the terminology "Bodish" for the narrower Tibetic + East Bodish, and "Bodic" for the wider group that includes Bodish + Tamangic + West Himalayish? Should we create an intermediate family "Bodic languages", since it doesn't currently exist? Benwing2 (talk) 01:01, 7 March 2025 (UTC)

No. Absolutely do not use "Bodic" that way. "Bodic" is used in other literature taking after Bradley which adds an additional branch (whatever branch Kiranti is in) on top of the four-branch "Bodish" sensu lato. I have no good ideas on what to call four-branch Bodish; "Bodish (sensu lato)"? — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 01:28, 7 March 2025 (UTC)Reply

"Macro-Bodish" or "Greater-Bodish" is probably better. – wpi (talk) 13:31, 18 April 2025 (UTC)Reply

Rearranging rGyalrongic and Tangut

Tangut (txg) should be placed inside Category:Rgyalrongic languages, not treated like a sister to it.

The whole rGyalrongic branch should have two subdivisions: West rGyalrongic (sit-wgy) consisting of Tangut txg, Horpa ero, and Khroskyabs jiq; and the other languages like Japhug sit-jap, Situ sit-sit, Zbu sit-zbu and Tshobdun sit-tsh belong to East rGyalrongic (sit-gya I guess).

— Ceso femmuin mbolgaig mbung, mellohi! (投稿) 17:09, 6 March 2025 (UTC)Reply
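
The proposed two-way split can be written out as a small lookup table, which makes it easy to check where each code would end up. This is only a Python sketch of the proposal as worded above: the dictionary layout and the `subfamily_of` helper are hypothetical, and `sit-gya` is the tentative code suggested in the proposal, not a settled one.

```python
# Proposed rGyalrongic subdivisions, restated from the proposal above.
# "sit-gya" is the tentative code from the proposal, not a confirmed one.
RGYALRONGIC = {
    "sit-wgy": {
        "name": "West rGyalrongic",
        "members": {"txg": "Tangut", "ero": "Horpa", "jiq": "Khroskyabs"},
    },
    "sit-gya": {
        "name": "East rGyalrongic",
        "members": {
            "sit-jap": "Japhug",
            "sit-sit": "Situ",
            "sit-zbu": "Zbu",
            "sit-tsh": "Tshobdun",
        },
    },
}


def subfamily_of(code):
    """Return the proposed subfamily code for a language code, or None."""
    for family_code, info in RGYALRONGIC.items():
        if code in info["members"]:
            return family_code
    return None
```

For example, `subfamily_of("txg")` gives `"sit-wgy"`, reflecting the proposal that Tangut sits inside rGyalrongic rather than beside it.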

Is there a reason to prefer sit-gya instead of sit-egy parallel to sit-wgy? - -sche (discuss) 07:13, 19 March 2025 (UTC)Reply

This is because "Gyalrong" itself refers to East rGyalrongic alone. But I don't mind "egy" as a qualifier. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 16:44, 28 March 2025 (UTC)Reply

Done Tangkhulic, Ersuic, Naish and rGyalrongic reorganizations and proto-languages, since nobody objected. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 16:56, 5 April 2025 (UTC)Reply

Done Bodish rearrangements as well. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 03:30, 7 April 2025 (UTC)Reply

'Fingallian' and 'Yola' i.e. the Early Modern English dialects of Fingal, and Forth and Bargy

My argument here is that treating Fingallian and Yola as languages is inaccurate and that they are better classed as dialects of Early Modern English.

As detailed by Hickey (2005, pp. 196-198), 'the dialect of Fingal' is attested in three 17th century poems which display a small number of features showing the influence of Irish Gaelic and a couple of relatively conservative features (namely Middle English /i:/ and past participle 'y-'). 'The dialect of Forth and Bargy' is attested slightly more substantively from the end of the 18th century with two longer glossaries and some short texts, mostly poems/songs. These display a larger number of divergent or conservative features (2005, pp. 199-202). Hickey (2002, 2005) is essentially the primary scholar on historical varieties of English in Ireland and is clear in referring to these as dialects. No reliable sources make any mention of Yola/Fingallian languages. Similarly, Oxford English Dictionary notes Forth and Bargy words as variant forms under Irish English (Wexford), 1800s such as 'af' and 'av' for 'if' here.

These two dialects give a glimpse into the development of English in Ireland prior to the large-scale language shift that came in the following centuries. Whilst I recognise that the 'language vs. dialect' argument is mostly contrived and relative, it does not make any sense for these two to be classed as languages on Wiktionary, and any entries would be better described as dialectal Early Modern English. My view is that these varieties are not actually that different from contemporary or later dialects of English, e.g. the Yorkshire or Cumbrian dialects, which are traditionally quite divergent from varieties of Southern English yet fall under English all the same. Further, the majority of words currently under the heading Fingallian are cited from a 20th-century glossary of dialectal words and aren't strictly Fingallian anyway.

Sources:

Hickey, R. (2002). A source book for Irish English. J. Benjamins Publishing Company.
Hickey, R. (2005). Dublin English: evolution and change. J. Benjamins Publishing Company.
Oxford English Dictionary, s.v. “if (conj. & n.), Forms,” accessed December 2024,

MolingLuachra (talk) 21:00, 11 March 2025 (UTC)Reply

@MolingLuachra These diverged before Early Modern English (which begins c. 1500), so what is the basis for treating them as part of it? They also have a separate ancestry to modern Irish English, as they developed out of the forms of Middle English brought over centuries earlier, so putting them under the heading "English" (which strictly refers to Early Modern English onwards) feels contrived. The fact that many Fingallian entries are wrong isn't really relevant, either - those entries just need to be corrected.

Note also that Wiktionary is not Wikipedia - we aren't limited by whatever reliable sources choose to describe as a language. We merge some traditionally treated as separate (e.g. Serbo-Croatian, Catalan and Valencian), and separate others that are usually grouped together (e.g. Low German is split into Dutch Low Saxon and German Low German). Theknightwho (talk) 12:14, 12 March 2025 (UTC)Reply

@Zff19930930 as a prolific Yola editor. —Mahāgaja · talk 06:22, 13 March 2025 (UTC)Reply

Yola is much more conservative than Early Modern English. For instance, baake (bake) /baːk/ was heard and recorded in A Modern Glossary of the Dialect of Forth and Bargy, page 154. Thus, Yola can't be classified as a dialect of Early Modern English.

There is a comment about Fingallian in A North-County Dublin Glossary, page 262.

This district, Fingal, had in former times a dialect based on the 13th century colonial South-Western English of the Pale. Fingallian, of which we have only the slightest records, must have closely resembled the Forth dialect, recorded by Poole early in the last century; but, owing no doubt to its nearness to the capital, it did not keep its peculiarities so long. One naturally looks for traces of this ancient speech in a North-Dublin glossary, but they are few and doubtful.

Some words give a flavour of Fingallian, particularly forms like fat for "what", fen for "when", ame for "them" or plack-keet for "placket". Fingallian did exist, and was extinct by the mid-19th century.

I will clean up some Irish English entries under the heading Fingallian. Zff19930930 (talk) 13:01, 13 March 2025 (UTC)

You could get fat, fen, etc. (pronounced with a voiceless bilabial fricative, the Irish slender /f´/) in most of rural Ireland up to the 20th century, hence e.g. making fun of phwat is yer nam?! in An Béal Bocht by Myles na gCopaleen (and I wouldn’t be extremely surprised if there were still some old people with that in the strongest Gaeltacht areas). // Silmeth ^@talk 19:13, 13 March 2025 (UTC)

A single conservative feature is not nearly enough to object to 'Yola' being a dialect of English. The conservative lack of vowel shift /iː/ → /ai/ and /aː/ → /eː/ is interesting and I'm not sure about that in particular, but all of the features of the dialects in Forth and Bargy/Fingal are widely attested elsewhere or are clear substrate features of Irish. As I said, scholarly consensus is unambiguous in referring to these as dialects of English, and my contention is that the classification for Wiktionary's purposes as 'languages' makes no sense when other much better attested and much more divergent varieties of historical and modern 'Englishes' are not 'languages'. My argument is essentially that 'Early Modern English' as used by Burnley (1992) refers to a period of the language's history c. 1500-1800. As 'Fingallian' and 'Yola' are dialects of English attested during this period, I think it makes sense to call them 'dialects of Early Modern English', especially given that this is the convention taken with other instances of dialectal or historical variation in Old English, Middle English and English, such as here, where variant spellings reflecting regional or historical differences are given as 'Alternative forms' under the headword 'fader'.

I'm not sure what the relevance of the quote is, but the examples you give are sort of beside the point. As Silmeth pointed out, the replacement of English /ʍ/ with Gaelic /ɸ/ is not limited to Fingal/F&B and continued to be a feature of Irish English until very recently, if not still amongst some people. The shift in stress in a word like 'placket' is a feature of F&B, not Fingal (argued by O'Rahilly 1932 to be a result of Norman influence). You say that 'Fingallian' 'was extinct by the mid-19th century'; do you have any source for that? No reliable sources I can find say anything but that the dialect was only attested in three poems in the 17th century.

MolingLuachra (talk) 15:00, 14 March 2025 (UTC)

Coming here from Wikipedia, if such dialects as traditional Somerset and Dorset English aren't considered separate, then neither should the dialect of Forth and Bargy be. Many authors, even while it was alive, talked about how similar they are. Fingallian even less so. Not to mention half the Fingallian etymologies are just wrong and made up by someone who clearly doesn't know much about Irish or the English of Ireland. I'm after correcting one that very clearly comes from Irish, and there's several others.

Also, as @MolingLuachra said, Fingallian is attested by three poems in the 17th century and likely didn't last out the century; indeed, it's unknown if those poems were even written by speakers or rather people making fun of it. There's really absolutely no reason it should be considered separate here. Sionnachnaréaltaí (talk) 19:02, 14 March 2025 (UTC)

Support. Dialects form a continuum, and it makes no sense to categorize them solely based on point of divergence. In this instance, it seems clear that Yola and Fingallian belong to the broader English continuum; they differ from prestige varieties, but so do most other traditional dialects, especially in the region. 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 22:47, 17 March 2025 (UTC)

Support. As I said earlier, there's no real reason to consider it separate apart from comparing it solely with the prestige language. Even the people who wrote about it while it was alive or recently deceased never considered the Forth and Bargy dialect (Yola) or Fingallian to be separate (and we have scant information on Fingallian to begin with). If we consider these two separate on linguistic features, we might as well consider every traditional dialect of English a separate language; if we consider them separate based on geographic location, why are American English, Canadian English, Nigerian English, Indian English, modern Hiberno-English, et al. not considered separate languages? Really, it seems the differences are exaggerated because the nearest dialects to it in the continuum aren't spoken as strongly anymore, and because we compare it to modern prestige English, not to the continuum of its time.

Sionnachnaréaltaí (talk) 14:07, 18 March 2025 (UTC)

Comparing it to the differences between American English, Canadian English, Nigerian English, Indian English, modern Hiberno-English seems like a pretty big exaggeration, given they are all widely mutually-intelligible with each other. Theknightwho (talk) 14:35, 18 March 2025 (UTC)Reply

All the more reason not to consider it separate. As far as we know it was mutually intelligible with other dialects on the spectrum. If they're considered a single dialect continuum because of mutual intelligibility, then the Forth and Bargy dialect should be too, based on what we know. Sionnachnaréaltaí (talk) 19:44, 18 March 2025 (UTC)

Modern prestige BrEng is completely intelligible to me but many traditional Scottish and Irish lects aren't. Hell, many English lects aren't, either, and as a US Southerner I still often struggle to understand rural AAVE. Are these all separate languages? ;P Chances are, most everyone can understand people from the next town over (at least as far as traditional dialectal boundaries go); that's what a dialect continuum is, no? 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 20:33, 18 March 2025 (UTC)Reply

Prestige dialects of English (those you have mentioned, pretty much) have developed in parallel for a long time with consistent cross-pollination, thus in this era we cannot consider geography as the sole factor in the continuum. We instead need to examine each individual lect and compare it to all other lects that share similar features; in this case, as has been raised above, both Fingallian and Yola share many features with other contemporaneous Irish lects, putting them in a similar position to other traditional regiolects; that is to say, I don't think anyone would be opposed to treating these as separate languages if e.g. traditional Yorkshire were also treated as such, but it isn't. 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 20:38, 18 March 2025 (UTC)

(Would you also merge Scots under ==English==?) I'm somewhat ambivalent, but am inclined to keep Yola separate; it has a divergent history (like Scots, which past discussions have also strongly though not unanimously kept separate), and has an ISO code, distinguishing it from some of the dialects people have pointed to above. Furthermore, it seems like handling it as ==English== would in practice mean deleting coverage of it, since it's not clear to me it would meet the criteria for inclusion (three uses-not-mentions) that English words are subject to. Fingallian OTOH (added without discussion) seems more dubious, particularly because it seems the only works "attesting" it may be parodies of it rather than actual records of it, and thus not reliable bases for entries (consider the differences between African-American English and parodies of it). - -sche (discuss) 03:22, 19 March 2025 (UTC)Reply

Oppose merging for now for the above reasons. We have pretty strict inclusion criteria for English per WT:CFI & WT:WDL, so if merging means losing the coverage we have, I cannot support it. And FWIW the last time we tried to make a carve-out for variants of WDLs, it unfortunately did not pass: Wiktionary:Votes/2022-08/Regional and Obsolete variations as LDL's. AG202 (talk) 03:50, 19 March 2025 (UTC)Reply

Merge Northern Kankanay to just Kankanaey

I would like for these to be merged (i.e. deleting Northern Kankanay `xnn` and moving them all to just Kankanaey `kne`). My reasons include:

Resources for Kankanaey don't differentiate whether they are just `kne` or specifically `xnn`, aside from a few SIL wordlists.
The most comprehensive dictionary source ({{R:kne:Vanoverbergh 1933}}, and basically all of Vanoverbergh's works) was created in Bauko. Bauko is located right in the middle of known `kne` speakers (northern Benguet) and `xnn` speakers (Sagada). I arbitrarily decided to use it under `kne`, but it is also valid to put it under `xnn`.
- Bauko, Tadian, and Sabangan are the three municipalities where I find it very difficult to categorize whether the language is `kne` or `xnn`. Some sources say they are `kne`, while others contradict it.
From my research, the most defining difference is the way to say "yes" (`kne` is aw while `xnn` is owen). Other than that, it is really difficult to decide what language code to put a term under.
The SIL wordlists definitely have differences, however I think they are just dialectal differences, especially the "s" – "h", "r" – "l", and "man-" – "men-" allophones. I have documented some of these as just pronunciation variants.
Both sides have almost the same vocabulary. They just differ in pronunciation. It is unnecessary to have two entries for the same word, one for `kne` and one for `xnn`.
The speakers of `xnn`, known as Applai, still refer to the language that they speak as "Kankanaey".
{{R:kne:Ortograpiya 2016}} is the standard orthography of Kankanaey. Does it also apply to `xnn`?

All of these `kne`-`xnn` headaches can be resolved by just merging them into `kne`. — 🍕 Yivan000 ^view_talk 09:33, 12 March 2025 (UTC)Reply
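
As a rough illustration of what such a merge means for entries, here is a minimal Python sketch. It is not how Wiktionary's modules actually work; the entry list, the placeholder headword `term-a`, and the function name `merge_codes` are all assumptions made up for the example. Only the codes `kne`/`xnn` and the words aw/owen come from the proposal above.

```python
from collections import defaultdict

# Hypothetical sample data: (language code, headword) pairs.
# "aw" and "owen" are the words for "yes" mentioned above; "term-a" is a
# made-up placeholder standing for any word that both codes share.
entries = [
    ("kne", "aw"),
    ("xnn", "owen"),
    ("kne", "term-a"),
    ("xnn", "term-a"),
]


def merge_codes(entries, old_code, new_code):
    """Move every entry under old_code to new_code, folding duplicates.

    Entries that end up with the same (code, headword) key become a single
    record that remembers which original codes it was listed under.
    """
    merged = defaultdict(set)
    for code, headword in entries:
        target = new_code if code == old_code else code
        merged[(target, headword)].add(code)
    return {key: sorted(sources) for key, sources in merged.items()}


result = merge_codes(entries, "xnn", "kne")
for (code, headword), sources in sorted(result.items()):
    print(code, headword, "formerly listed under:", sources)
```

In this toy data the shared word collapses into one `kne` entry instead of two parallel ones, while aw and owen both survive under `kne` and could be told apart with dialect labels.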

As added points:

The KWF (the language body for all Philippine languages) lists Northern Kankanaëy (Kankanaëy Aplay) as a dialect of Kankanay.
The physical book of {{RQ:kne:Kalin Diyos}} has a section wherein it notes that some words may not be understood by speakers of other Kankanaey dialects (Kali Ay Adi Maawatan Sin Am-in Ay Ili, page xi) and provides a word equivalence list to help (page 1535).

— 🍕 Yivan000 ^view_talk 06:30, 17 May 2025 (UTC)Reply

Proto-Luwian?

I occasionally encounter reconstructions for Proto-Luwian. Kloekhorst has some, Dunkel also has at least one. So shouldn't this be added as a language? Exarchus (talk) 14:06, 23 March 2025 (UTC)

I think generally Proto-Luwian doesn't have a lot of difference from Proto-Anatolian, and the number of different lexemes is very limited. The classification of lower-branch Anatolian is also unclear, so we'll have a problem with deciding which languages are Luwian and which are not. Thadh (talk) 16:52, 28 March 2025 (UTC)Reply

We already have a category Luwic languages. But it seems both 'Proto-Luwic' and 'Proto-Luwian' are in use, and Kloekhorst differentiates between them here: "This means that Lycian stems from a sister language to Proto-Luwian and that both can be regarded as distinct daughters of Proto-Luwic." But Kloekhorst in his earlier dictionary apparently considers Lycian part of 'PLuw.', given as "Proto-Luwian" on page xii. So I'm indeed not sure how established this classification is. Exarchus (talk) 17:24, 28 March 2025 (UTC)Reply

Bali-Sasak-Sumbawa

Bali-Sasak-Sumbawa languages don't belong to the Malayo-Chamic branch (since Malayo-Chamic only includes Malayic and Chamic languages). Can we remove Proto-Malayo-Chamic from their "ancestors" on their language data? Alfarizi M (talk) 23:55, 30 March 2025 (UTC)Reply

Hello @User:Fenakhay, can you help me with this? I can't edit the modules. Here are the module pages Module:languages/data/3/b (Balinese), Module:languages/data/3/s (Sasak and Sumbawa). Alfarizi M (talk) 09:13, 20 May 2025 (UTC)Reply

@Alfarizi M:

Done. I've put them under Category:Bali-Sasak-Sumbawa languages (according to Wikipedia). Is that correct? — Fenakhay ^{(حيطي · مساهماتي)} 09:39, 20 May 2025 (UTC)Reply

Thank you so much! And it's correct. Alfarizi M (talk) 09:41, 20 May 2025 (UTC)Reply
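
For readers unfamiliar with what the change amounts to, here is a minimal Python sketch. The real data lives in Lua modules whose exact fields are not shown in this thread, so the record layout, the field names `ancestors` and `family`, and the function `drop_ancestor` are assumptions for illustration only; the codes `ban`, `sas` and `smw` are assumed to be the ISO 639-3 codes for Balinese, Sasak and Sumbawa.

```python
# Hypothetical, simplified language records (not the real module data).
languages = {
    "ban": {"name": "Balinese", "ancestors": ["Proto-Malayo-Chamic"]},
    "sas": {"name": "Sasak", "ancestors": ["Proto-Malayo-Chamic"]},
    "smw": {"name": "Sumbawa", "ancestors": ["Proto-Malayo-Chamic"]},
}


def drop_ancestor(records, codes, ancestor, new_family):
    """Return a copy of records where `ancestor` is removed for `codes`
    and the family is set to `new_family`; the input is left untouched."""
    updated = {}
    for code, record in records.items():
        record = dict(record)  # shallow copy so the original dict is not edited
        if code in codes:
            record["ancestors"] = [a for a in record["ancestors"] if a != ancestor]
            record["family"] = new_family
        updated[code] = record
    return updated


fixed = drop_ancestor(
    languages,
    codes={"ban", "sas", "smw"},
    ancestor="Proto-Malayo-Chamic",
    new_family="Bali-Sasak-Sumbawa",
)
for code, record in fixed.items():
    print(code, record["name"], record["ancestors"], record["family"])
```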

April 2025

Merge codes cir and meg

In 2023, ISO 639-3 merged code meg "Mea" into cir "Tîrî". I propose we do the same. This should hopefully be non-controversial. I also propose we use the spelling "Tiri" without accents, which is more common in the literature (which uses "Tiri" or "Tinrin" over "Tîrî" or "Tĩrĩ"). The only reason the Wikipedia article is at Tîrî language is because Kwami moved it there; putting random accents in Wikipedia article names is his m.o. Benwing2 (talk) 07:05, 7 April 2025 (UTC)Reply

Support a merger. For the name, poking around Google Books and Google Scholar, the spelling I see most often is Tinrin, but I defer to you if Tiri is more common in more recent or more linguistic works, which I did not have time to try to quantify. - -sche (discuss) 18:32, 10 April 2025 (UTC)Reply

I don't actually know whether Tiri or Tinrin is preferred; I was just expressing my dispreference for the forms 'Tîrî' (as Wikipedia has it) or 'Tĩrĩ' (as it also is written). I think we should go with 'Tinrin', which in any case is less likely to clash with other languages (for example, the northern Somali dialect is also known as 'Maxaa Tiri'). Benwing2 (talk) 19:10, 10 April 2025 (UTC)Reply

Treat Category:E language as a Category:Tai languages

Wikipedia claims that E language is a "Tai–Chinese mixed language".

Luo & Deng (1998) (which argues for that mixed language stance) identifies 53 out of 98 Swadesh list items as Kra-Dai (which is still a majority of the vocabulary), and 33 out of 98 as Sinitic, but the latter includes several words that are miscategorised, e.g. sɔŋ¹ (purportedly from 雙 / 双) is from *soːŋᴬ and ultimately from Middle Chinese 雙 (sraewng), and ku¹ (purportedly from 孤 (gū)) is from *kuːᴬ.
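
To put the quoted figures in proportion, here is a small Python sketch of the arithmetic. The counts 53, 33 and 98 are from the paragraph above; the label "neither" for the remaining items and the adjustment that moves exactly the two example words from Sinitic to Kra-Dai are assumptions for illustration, since the comment says "several" words without giving a number.

```python
TOTAL_ITEMS = 98

# Counts as quoted from Luo & Deng (1998); "neither" is simply the remainder
# not assigned to either group in the figures quoted (an assumed label).
counts = {"Kra-Dai": 53, "Sinitic": 33}
counts["neither"] = TOTAL_ITEMS - sum(counts.values())


def shares(tally, total):
    """Convert raw counts to percentages rounded to one decimal place."""
    return {label: round(100 * n / total, 1) for label, n in tally.items()}


reported = shares(counts, TOTAL_ITEMS)

# Illustration only: reassign the two example words named above from
# Sinitic to Kra-Dai and recompute.
adjusted_counts = dict(counts)
adjusted_counts["Sinitic"] -= 2
adjusted_counts["Kra-Dai"] += 2
adjusted = shares(adjusted_counts, TOTAL_ITEMS)

print(reported)  # {'Kra-Dai': 54.1, 'Sinitic': 33.7, 'neither': 12.2}
print(adjusted)  # {'Kra-Dai': 56.1, 'Sinitic': 31.6, 'neither': 12.2}
```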

{{R:eee:Wei & Wei 2011}} suggests that the many supposedly Sinitic features in E suggested by Luo & Deng (1998) can also be found in other Tai or Kra-Dai languages. They further propose that E actually constitutes a third group of Zhuang, but I don't find this extremely convincing.

Overall my impression is that E is just a Tai language with a very strong Sinitic influence, and I suggest that we simply set the parent of E (eee) to Tai (tai), with the added benefit of only having to link to the Proto-Tai entry instead of listing cognates. – wpi (talk) 18:34, 12 April 2025 (UTC)

Meh. It seems impossible to tell whether it is underlyingly a Tai language which has heavily mixed with Chinese, or a Tai-Chinese mixed language. I have no strong feelings, but it seems like it should be possible (and if it is not currently possible, we should make it possible) to say that a term in a mixed (e.g. Tai-Chinese) language derives from a (e.g. Tai) protolanguage root—and link there for cognates—even if we don't reclassify the language as a descendant of solely that protolanguage. - -sche (discuss) 03:59, 19 April 2025 (UTC)Reply

@-sche: I don't have very strong feelings for this, but I simply find that (a) from a linguistic perspective, the mixed language argument is less convincing than the other – by the same logic one could say that English is a mixed language due to its large French/Latinate vocabulary and French-influenced morphology (well, there are some who take such a view, but the general consensus is that English is a Germanic language), and (b) from an editing perspective, it will be easier to work on etymologies for a normal language (as opposed to a mixed language); see for example the often inconsistent etymology template usage in our pidgin and creole entries. – wpi (talk) 13:49, 20 April 2025 (UTC)

Support. With languages lacking historical documentation there is often not a solid line to be drawn between mixed language vs. creole vs. heavy loanage vs. substratum effect, et cetera, but if a reasonably clear leaning can be discerned and is being commented on in literature, there should be no issue (for our purposes) treating it as straight inheritance. 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 08:29, 21 April 2025 (UTC)Reply

Rename Wára language to Upper Morehead

Wikipedia has the language at Upper Morehead language, which is likely to be a more distinctive name than Wára (there are two other Wara languages given in Wikipedia), but even more to the point, the name Wára actually refers to one of the dialects of this language, not to the language itself. Although Ethnologue (and hence ISO 639-3) appears to use the term Wára for the language as a whole, Glottolog calls it Anta-Komnzo-Wára-Wérè-Kémä based on the five identifiable dialects. The only comment in Wikipedia about this language is:

Upper Morehead, also known as Wára, is a Papuan language of New Guinea. Varieties are Wára (Vara), Kómnjo (Rouku), Anta, and Wèré (Wärä); these are divergent enough to sometimes be listed as distinct languages.

So maybe at some point some of the dialects will be split into separate languages but at this point given the single ISO code and Glottolog's view, I would keep as a single language and use a term that does not match any individual dialect. @-sche? Benwing2 (talk) 05:53, 14 April 2025 (UTC)Reply

Oof, the fact that not one but two dialects/lects of this language are sometimes spelled Wara (give or take some diacritics) seems confusing, but Wikipedia says "Upper Morehead" is also polysemous, sometimes denoting Arammba instead. I will try to find out how commonly it denotes this language vs Arammba. (Exonymic placename language names like "Upper Morehead" always feel a little bit kludgy to me, but sometimes it can't be avoided.) - -sche (discuss) 03:59, 23 April 2025 (UTC)Reply

Add Hanlao language

Hanlao language (漢佬話 or 旱澇話 in Chinese, both romanised as Hanlao) is spoken in the northern parts of Qinzhou, Guangxi, China. The primary sources are Luo (2016) (Bulletin of Chinese Linguistics #9, pp. 121-150, accessible via https://www.academia.edu/59519239/) and the Qinzhou City Annals.

Its affiliation is unclear; most sources claim that it is either a Zhuang-ised Sinitic language or a mixed language between Tai and Sinitic. However, based on the description in Luo (2016) (e.g. 57 out of 97 Swadesh list words are cognate with Zhuang and 22 out of 97 with Sinitic), I believe the case is likely similar to Category:E language above, i.e. Hanlao is a heavily Sinitised Tai language. At any rate, it is clearly distinct from other Sinitic or Tai languages.

There is no ISO code, so I propose tai-han. – wpi (talk) 18:37, 14 April 2025 (UTC)Reply

Support on adding the language.

Weak support on Tai inheritance; agree in principle per E above, but this specimen seems to have received less relevant coverage in the literature. 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 08:30, 21 April 2025 (UTC)

Support adding it; neutral on how to classify it. I tried to do my part / due diligence and look for sources about it (so things don't just sit on this page getting little input); the few mentions of it I could find do support that it is a language (as opposed to e.g. a dialect of another Zhuang language; a main concern whenever anyone proposes to add a new language is to be sure it isn't already / better covered as another language). Luo's paper suggests that more central vocabulary is Zhuangic and more peripheral words come from Cantonese, Pinghua, and Hakka, yes? which would perhaps suggest it is indeed underlyingly Tai. - -sche (discuss) 23:03, 21 April 2025 (UTC)Reply

Add Podlachian Language

The Podlachian language is the East Slavic language spoken between the Narew and the Bug. This language has its own website and an article on Wikipedia, but there isn't anything on Wiktionary. Could I create some entries for it here? PGałązka (talk) 10:34, 22 April 2025 (UTC)

@Underfell Flowey @AshFox @Ssvb @Sławobóg @Thadh @Benwing2 as users with any sort of regular contact with East Slavic. Vininn126 (talk) 10:53, 22 April 2025 (UTC)Reply

Ok. I have any sort of regular contact with East Slavic too, I'm a half-Podlashuk. PGałązka (talk) 13:27, 22 April 2025 (UTC)Reply

@PGałązka: One of the biggest problems is that the w:Podlachian language doesn't seem to have the ISO 639-3 code yet (look at the "Proposal for several languages without ISO codes" topic above). Also there are questions about the availability of citations in durably archived sources and about the number of potential Wiktionary contributors in this language. If it's just you alone and you eventually lose interest, then the Podlachian content may become a liability. Additionally, your "half-Podlashuk" self-assessed status is not very reassuring, as there were some hot topics Wiktionary:Beer_parlour/2025/April#Prohibit_AI-generated_content and Wiktionary:Beer_parlour/2025/April#Formally_allowing_removal_of_Babel_boxes_by_other_users_if_proficiency_is_contradicted recently. These are the details that would be useful to clarify. --Ssvb (talk) 14:24, 22 April 2025 (UTC)Reply

"Once again they want to divide Ukrainian dialects into separate micro languages...". Actually I'm not against adding Podlachian, and even West Polesian... but East Slavic languages should be tidied up... he tree of East Slavic languages should ideally look like this (see below), which would maximally please reality... under such conditions I am only for adding Podlachian and West Polesian... and not any other inadequate options with "attempts to deduce Podlachian from the times of Kievan Rus". — AshFox (talk) 14:35, 22 April 2025 (UTC)Reply

* East Slavic:
** Old East Slavic:
*** Middle Russian: 
**** Russian:
*** Old Ruthenian:
**** Middle Belarusian: 
***** Belarusian:
**** Middle Ukrainian: 
***** Carpathian Rusyn:
***** Podlachian:
***** Ukrainian:
***** West Polesian:
** Old Novgorodian:
*** Old Pskovian:
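
The proposed tree can also be read as a child-to-parent table, which makes it easy to see what ancestor chain each lect would get. The following Python sketch only transcribes the tree as posted above; the names `PARENT` and `ancestor_chain` are made up for the example, and this is not how Wiktionary's language data is actually stored.

```python
# Child -> parent, transcribed from the tree as proposed above.
PARENT = {
    "Old East Slavic": "East Slavic",
    "Middle Russian": "Old East Slavic",
    "Russian": "Middle Russian",
    "Old Ruthenian": "Old East Slavic",
    "Middle Belarusian": "Old Ruthenian",
    "Belarusian": "Middle Belarusian",
    "Middle Ukrainian": "Old Ruthenian",
    "Carpathian Rusyn": "Middle Ukrainian",
    "Podlachian": "Middle Ukrainian",
    "Ukrainian": "Middle Ukrainian",
    "West Polesian": "Middle Ukrainian",
    "Old Novgorodian": "East Slavic",
    "Old Pskovian": "Old Novgorodian",
}


def ancestor_chain(lect):
    """Walk up the tree and return the ancestors, nearest first."""
    chain = []
    while lect in PARENT:
        lect = PARENT[lect]
        chain.append(lect)
    return chain


print(ancestor_chain("Podlachian"))
# ['Middle Ukrainian', 'Old Ruthenian', 'Old East Slavic', 'East Slavic']
```

Under this proposal Podlachian would inherit through Middle Ukrainian, which is exactly the point a later reply questions when it suggests going no further than Old Ruthenian.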

OK. But I don't understand. Can I add this language or not? PGałązka (talk) 17:28, 23 April 2025 (UTC)

I'm pretty sure only template editors and admins can add a language. —Mahāgaja · talk 20:18, 23 April 2025 (UTC)Reply

I don't think that's what they were asking. @PGałązka: On a technical level, if you want to add Podlachian entries, the language needs to be added to Module:languages (which, yeah, only a template editor or admin can do), but you need consensus from other people who make entries in East Slavic languages before it's added, hence the pings above. You should wait for their input; in my experience, though, it might be a little difficult to split off a new language code, since you'll also have to go through Ukrainian entries and decide whether they fall under Podlachian or not. Saph (talk) 21:59, 23 April 2025 (UTC)

Sorry to revive this nearly-month-old thread, but my two cents as someone who does stuff with Carpathian and Pannonian Rusyn (formerly also Belarusian), has no ethno-linguistic dog in the fight, and has had a bit of contact with Podlachian media: I wouldn't classify Podlachian under Ukrainian or Belarusian. Old Ruthenian is as far as I'd confidently go in terms of ancestors, but Podlachian displays both features of Belarusian and Ukrainian, most notably akanie and /d͡zʲ/ from the Belarusian perspective, but also a greater prominence of /ɫe/ from the Ukrainian perspective. Not to mention that different varieties classified under the broad umbrella of Podlachian display different degrees of Ukrainian and Belarusian characteristics. The Maksymiuk standard of Podlachian largely doesn't take akanie into account for example (like Ukrainian), but Niczos from Sw@da x Niczos sings in a variety that has akanie (like Belarusian).

Nonetheless I think separate classification is still a good idea, precisely because of this etymological ambiguity as to whether it belongs under Belarusian or Ukrainian or both or neither. In addition, Podlachian seems to be written in both the Latin and Cyrillic scripts (contrary to Maksymiuk's best efforts), and classifying them under either Belarusian or Ukrainian would just clog up the "Belarusian/Ukrainian terms spelled with X" categories, as it already is doing. Instead, one could look towards Serbo-Croatian as an example, and list Podlachian as written in both Cyrillic and Latin so it doesn't generate a million "spelled with" categories. Of course it does need to have a distinct ISO code first, or some code needs to be invented for classification within Wiktionary.

@Ssvb: about availability of citations: the Wikipedia page indicates that there are several texts in Podlachian being published regularly, as well as novels, poetry and memoirs. That's more potential citations than the entirety of Solombala English, which seems to rely entirely on a small handful of sentences from the 1800s for actual usage. My concern is that relying on the Svoja.org website too much would create a disproportionate image and under-represent certain varieties of Podlachian.

But that's just my two cents. Insaneguy1083 (talk) 11:22, 19 May 2025 (UTC)Reply

@Insaneguy1083 Thanks for taking a look and posting your opinion. I also have done my own research of the available public information, but I still would like @PGałązka to first provide a lot more details (their self-assessed language competence and the geographical location of the place where they learned it, since there are many local variants), and then make a practical proposal for their vision of how things should be preferably handled. Ssvb (talk) 04:54, 20 May 2025 (UTC)Reply

May 2025

Rename Kulon-Pazeh to Pazeh–Kaxabu?

“Kulon–Pazeh” refers to a linguistic subgroup containing Pazeh and the extinct Kulon (if it ever existed as a language). AFAIC this is obsolete terminology. The Kaxabu people are culturally related to the Pazeh. Should we use the modern term Pazeh–Kaxabu instead, as Wikipedia does? I find it a bit confusing to use Kulon-Pazeh. Chihunglu83 (talk) 02:22, 10 May 2025 (UTC)

Update: code uun is now a retired code; it was split into pzh and uon in 2022, while Kaxabu has not been recognized as a language. Chihunglu83 (talk) 15:09, 14 May 2025 (UTC)

Old Slovene

I propose to add a new South Slavic language: Old Slovene.

It is attested only in the Freising manuscripts, a text from the 10th century. It is the first Slavic-language text written in the Latin script.
Text with original notation
Text with critical transcription
Phonetic transcription of the text
Complete dictionary + IPA
A few features:
- -dl- cluster preserved: modliti : Slovene moliti < Proto-Slavic *modlìti
- aorist still exists: bui < Proto-Slavic *by
- imperfect still exists: stradacho < Proto-Slavic *strada(a)xǫ
- nasals still exist: zueti < Proto-Slavic *svę̑tъ, zodni < Proto-Slavic *sǫdьnъ
- jat' (ě) still exists: delati < Proto-Slavic *dělati
- some words don't exist in modern Slovene: *natruti "to feed" (impf. 3rd pl natrovuechu) < Proto-Slavic *natruti, *sankt "holy" < sānctus (in 16th century only in placenames), *sotonin "of Satan" (cf. Old Church Slavonic сотонинъ (sotoninŭ))
code: zls-osl
Not sure which notation we should use
It would be nice if we used expected nominative/infinitive forms as pagenames

Pinging @Vininn126, Linyker¹²³, Chihunglu83. Sławobóg (talk) 12:32, 14 May 2025 (UTC)Reply

So far I am of two minds - the very early attestation is indeed noteworthy. There are a few other issues I have.

It seems to be an exceedingly small number of lemmas.
There are indeed unique features, but enough that we couldn't modify Slovene structure to support it?

Linyker mentioned on the discord he wants to do some reading up on the subject matter soon. Vininn126 (talk) 07:17, 15 May 2025 (UTC)Reply

I would not oppose as long as someone edits it. Chihunglu83 (talk) 16:29, 15 May 2025 (UTC)Reply

I once suggested adding Old Slovene a year or more ago, but for a different reason. I'm glad someone else suggested this idea again. I support it! (although I understand that with my zero reputation, no one will take my opinion into account). AshFox (talk) 00:47, 21 June 2025 (UTC)Reply

Some information: 1) w:Slovenes#History, 2) w:Carantanians#Language, 3) w:Slovene dialects#Evolution. AshFox (talk) 18:46, 21 June 2025 (UTC)Reply

I made a template for quotes from "Freising manuscripts": {{RQ:zls-osl:BS|3|40|||}}

c. 972 CE – 1000 , Freising manuscripts, Möll valley, Carinthia, text III, folio 161, col. 1, line 40:

AshFox (talk) 00:08, 23 June 2025 (UTC)Reply

I added a dictionary template: {{R:zls-osl:BS}}

Ogrin, Matija (2007) “Slovar besedja Brižinskih spomenikov”, in Brižinski spomeniki: Monumenta Frisingensia‎ (in Slovene), ISLLV, ZRC SAZU, →ISBN

AshFox (talk) 11:49, 23 June 2025 (UTC)Reply

@Benwing2, good day. Help us, "we are stuck in this hole" and can't move further... Sławobóg and I would still like to get a separate language code for Old Slovene and together with him I will formalize in Wiktionary all the known lemmas (there are about half a thousand of them and the task has a final goal) of this unique, very archaic Slavic language of the 900s AD, which existed parallel to Old Church Slavonic. I hope you find some free time to give all this attention. Best regards, AshFox (talk) 13:52, 22 June 2025 (UTC)Reply

I made a guideline. Sławobóg (talk) 20:59, 22 June 2025 (UTC)

@AshFox I don't have the requisite background to know whether this is a good idea or not. I do know we have an awful lot of Slavic and Baltic languages and I want to make sure adding another one is the right thing to do. @Vininn126 @Thadh thoughts? Benwing2 (talk) 21:16, 22 June 2025 (UTC)Reply

I'm not very knowledgeable either. I think it may be best to finish the above-linked guideline and work out the issues before adding the code, but other than that I can't really say whether this code makes sense or not. Thadh (talk) 21:27, 22 June 2025 (UTC)Reply

The guideline is finished. Sławobóg (talk) 06:57, 23 June 2025 (UTC)

Why aren't we using a scholar's transcription? One is sourceable and reflects scholarly work on the field. Vininn126 (talk) 07:12, 23 June 2025 (UTC)Reply

There are many transcriptions; the website mentions just 2 of them. Igor Grdina's transcription is not that good, it's close to the original orthography, and for example ⟨s⟩ represents and ⟨z⟩ can represent and which is annoying; ⟨u⟩ represents and , ⟨c⟩ can be or etc. My transcription fixes all these problems, and is close to the transcription made by Alexandr Vasiljevič Isačenko, to which I have no access anyway. Sławobóg (talk) 11:11, 23 June 2025 (UTC)

@Sławobóg: Look at English: orthographies don't have to be phonetic, they just have to be consistent. Thadh (talk) 11:20, 23 June 2025 (UTC)Reply

I know, but there are far fewer letters used than in English. And why would we prefer this transcription over Isačenko's? Transcription I made is better, easier to work with, and more compatible with modern Slovene, like Slovincian is with Kashubian. Plus it's going to be only me and AshFox who are going to be working on this language. Sławobóg (talk) 11:28, 23 June 2025 (UTC)Reply

The Slovincian transcription is based on one, in fact the only one, used by some people who have worked with the lect. Vininn126 (talk) 11:40, 23 June 2025 (UTC)Reply

Nice. And my eťe bi dět naš ne segrěšil is very close to Isačenko's eťe bi dêt naš ne sɘgrêšil. Sławobóg (talk) 11:53, 23 June 2025 (UTC)Reply

Clarification for those who did not understand, the example of transliteration of Исаченко is taken from here w:pl:Zabytki fryzyńskie#Transkrypcje. AshFox (talk) 12:14, 23 June 2025 (UTC)Reply

We are ready. Sławobóg (talk) 19:59, 23 June 2025 (UTC)Reply

@Sławobóg, I suggested yesterday that perhaps we should preserve such a feature of the original text's spelling as the method of conveying Proto-Slavic *y through the digraph ⟨ui⟩ . It looks like it was inspired by Old Church Slavonic with its ъ (ŭ) + і (i) > ꙑ (ŭi = y). Examples from the text: ⟨buiti⟩ w/ alt. form ⟨biti⟩ “to be” (OCS бꙑти (byti), PSl *byti) or ⟨mui⟩ “we” (OCS мꙑ (my), PSl *my). Instead of the ⟨byti⟩ and ⟨my⟩ you suggested. Moreover, there are not many such words. And in the original text the letter ⟨y⟩ is not used even once. AshFox (talk) 12:10, 23 June 2025 (UTC)Reply

No reason to keep it. Sławobóg (talk) 12:50, 23 June 2025 (UTC)Reply

I also commented on discord that I'm not sure this needs a split or not and that more research is needed. It might be possible to include this in Slovene, maybe not. Vininn126 (talk) 21:32, 22 June 2025 (UTC)Reply

Slovene was and still is a heterogenous linguistic grouping. To build a prescriptive account of the whole Old Slovene language based on a single document from a single dialect is methodologically ungrounded. At the very least, it has to be clarified that listed entries are in the Old Carinthian dialect. Even though no other dialects have been attested (so far), they certainly did exist. Безименен (talk) 10:14, 28 June 2025 (UTC)Reply

Add Khuzestani Arabic

Spoken in Southern Khuzestan, Iran, as a branch from Iraqi Arabic with influences from Gulf Arabic, Luri and Persian. Around half a million speakers.

Wikipedia article: Khuzestani Arabic
No ISO code, acm-IR in IETF but we could use acm-ira like fa-ira

Saam-andar (talk) 17:24, 25 May 2025 (UTC)Reply

IMO we have too many Arabic L2's already and don't need another one, esp. as Wikipedia explicitly describes this as a dialect of Gilit Mesopotamian Arabic and not its own language. Benwing2 (talk) 19:34, 6 June 2025 (UTC)Reply

Hmmm actually are you proposing this to be an etym-only language? That's probably OK. Benwing2 (talk) 19:36, 6 June 2025 (UTC)Reply

@Benwing2 Fair enough, would be thankful to have it as an etym-only. Saam-andar (talk) 12:11, 7 June 2025 (UTC)Reply

I've already responded to this request on Discord. It is merely a subdialect of Gelet-type dialects which are grouped under Iraqi Arabic. — Fenakhay ^{(حيطي · مساهماتي)} 23:36, 6 June 2025 (UTC)Reply

June 2025

Adding codes for Ohlone and Miwok families and proto-language reconstructions

According to the book Ohlone/Costanoan Indians of the San Francisco Peninsula and their Neighbors, Yesterday and Today, the term Utian is derived from Proto-Costanoan uţxi ("two") + -ian, created by William F. Shipley in 1978. I wanted to add that to the Etymology section on Wiktionary, but Wiktionary doesn't have an appropriate language code.

"Proto-Costanoan", more accurately Proto-Ohlone, is a lower-order reconstruction of Proto-Utian (nai-utn-pro), and the reconstructed proto-language of the Ohlone languages; its numerals were in 1990 reconstructed by Catherine A. Callaghan, who has also referenced it in her other publications. Callaghan uses the name "Proto-Costanoan", but the standard modern-day term for the family is Ohlone, and the term "Proto-Ohlone" is often used nowadays.

With that in mind, I'd like to request that the Proto-Ohlone language be added to WT:LOL/S, with the code nai-ohl-pro. The Ohlone language family is not currently on Wiktionary, so I'd also like to request that a corresponding code be added to WT:LOF, with the code nai-ohl. For completeness, I also request the addition of the Miwok family (nai-miw) and Proto-Miwok (nai-miw-pro, also worked on by Callaghan), the other major subdivision of the Utian languages. Ookap (talk) 19:40, 12 June 2025 (UTC)Reply

@Ookap I moved your requests to WT:LTR as that is where these sorts of requests are normally made. I don't know anything about Ohlone or Miwok or even who to ping other than @-sche; you might poke around to see who has edited terms in these families and ping them. Benwing2 (talk) 20:43, 12 June 2025 (UTC)Reply

Thanks! I've updated Help:Adding and removing languages to mention this page instead of BP. Ookap (talk) 20:51, 12 June 2025 (UTC)Reply

@Benwing2: I'm certainly not an expert (my main interest is ethnobiology), but I'm familiar with pretty much all of the languages of California in very general terms, so I will often at least have an opinion on them. California has been inhabited for a long time, has lots of geological barriers (not to mention covering a very large area) and has been out of reach of all the pre-Columbian civilizations of the Americas, so the historical linguistics is very complicated and hard to resolve into anything large-scale. You have some families that are better known elsewhere, like Uto-Aztecan, Na-Dene and Algic, but then you have a number of isolates and smaller families. There were a couple of ambitious proposals in the early days, the Hokan languages and Penutian languages, that still haven't been proven on the highest level, but there's been some progress on demonstrating the validity of many of the parts.

The Utian languages are one of those "Penutian" parts where progress has been made. The Miwokan languages have always been accepted as a valid group and the Ohlone languages (I've always known them as Costanoan) as well (with some debate as to whether the latter are languages or dialects). I don't know much on the substance, but it seems to me like Utian should be worthy of Wiktionary recognition, and maybe the Yok-Utian languages. — This unsigned comment was added by Chuck Entz (talk • contribs) at 04:59, 13 June 2025 (UTC).Reply

Adding family codes for Miwok and Costanoan/Ohlone seems reasonable, and adding Proto-Miwok and Proto-Costanoan—unfortunately, I can find very few sources calling it "Proto-Ohlone" (which may mean we should also call the family "Costanoan" for consistency). - -sche (discuss) 03:03, 14 June 2025 (UTC)Reply

IMO given the weight of sources we should be using "Costanoan" unless there is strong evidence of a recent shift towards "Ohlone", and the name of the proto-language needs to match the name of the family unless there's a really good reason for the divergence (which I don't see here). Benwing2 (talk) 03:18, 14 June 2025 (UTC)Reply

I agree with Chuck Entz in that these are pretty accepted groupings. People aren't completely sure about Penutian and Yok-Utian (and I find at least Penutian a bit dubious), but Utian has long been very proven and accepted (and, along with Proto-Utian, is in fact already on Wiktionary). Similarly, the Miwok (or Miwokan) and Ohlone language families are very clearly accepted groupings, and for me should be on Wiktionary.

With regard to whether to use "Ohlone" or "Costanoan"...unfortunately, almost all sources on Proto-Ohlone (Proto-Costanoan) are from the 1990s, meaning they use the name "Costanoan". Living in the area nowadays, I can say that the name "Costanoan" has completely fallen out of use for the ethnic group and language family, perhaps partially as part of recent efforts to revitalize their culture and languages. Most people here would likely not know what "Costanoan" meant, but know "Ohlone" well, and from what I know Ohlone people, while they might know the word, disclaim it as a colonizer term—even Wikipedia lists the ethnic group as "formerly known as Costanoan". With that said, given formal linguistic sources, most of which are older, mostly calling the language family Costanoan, I suppose I can understand why Wiktionary might want to call it that. My personal preference having grown up in the homeland of the Ohlone leans heavily toward "Proto-Ohlone" and "Ohlone languages", but my more important personal preference is that the proto-language is added, no matter the name. Ookap (talk) 08:07, 20 June 2025 (UTC)Reply

Adding code for Proto-Ainu

Several Ainu entries (such as プリ and アㇷ゚ト) show derivations from Proto-Ainu, but there's no corresponding code on WT:LOL/S so they can use templates as is the norm. Proto-Ainu, the reconstructed proto-language of the various Ainu dialects (or of the Ainuic family, already in Wiktionary as qfa-ain), has been reconstructed by Vovin (if not others), and we even have an appendix of reconstructions. Therefore, I request a code, likely ain-pro or qfa-ain-pro, be added, so Ainu entries can use proper Wiktionary formatting. Ookap (talk) 19:52, 12 June 2025 (UTC)Reply

request from User:AmazingJus: Updating languages found in Module:languages/data/2 and Module:languages/data/3/k

(moved from User talk:Theknightwho)

Hi, could you update the data for these two languages to add a bit more flexibility?

For Ewe (ee), it’d be great if diacritics are stripped at the entry level, specifically acute, grave, circumflex and caron? They correspond to high, low, falling and rising tones respectively.

For Krio (kri), likewise, remove diacritics (but only acute, grave, circumflex) and also add a sort key with the following order:

ɛ after e
gb after g
kp after k (digraphs gb and kp are both treated as separate phonemes)
ɔ after o

Cheers heaps — ^{oi yeah nah mate} amazing JUSSO ... ! 01:11, 9 June 2025 (UTC)Reply
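For anyone weighing the request above, here is a rough Python sketch of what the two behaviours amount to. It is not the actual Module:languages data (which is Lua), and every name in it is made up for illustration. It assumes the tone marks are the combining grave, acute, circumflex and caron, that the Krio order is plain a–z with only the four requested insertions, and that every gb/kp sequence counts as a digraph.

```python
import string
import unicodedata

# Combining marks named in the request (assumed code points).
GRAVE, ACUTE, CIRCUMFLEX, CARON = "\u0300", "\u0301", "\u0302", "\u030c"
EWE_MARKS = {GRAVE, ACUTE, CIRCUMFLEX, CARON}
KRIO_MARKS = {GRAVE, ACUTE, CIRCUMFLEX}


def strip_marks(text, marks):
    """Drop the given combining marks, keeping all other diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)


def _krio_order():
    """a-z with the requested insertions: ɛ after e, gb after g, kp after k, ɔ after o."""
    insert_after = {"e": "ɛ", "g": "gb", "k": "kp", "o": "ɔ"}
    order = []
    for letter in string.ascii_lowercase:
        order.append(letter)
        if letter in insert_after:
            order.append(insert_after[letter])
    return order


KRIO_RANK = {token: i for i, token in enumerate(_krio_order())}


def krio_sort_key(text):
    """Hypothetical sort key: tone marks ignored, gb and kp ranked as single letters."""
    text = strip_marks(text.lower(), KRIO_MARKS)
    key, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        token = pair if pair in ("gb", "kp") else text[i]
        # Characters outside the assumed alphabet sort after everything else.
        key.append(KRIO_RANK.get(token, len(KRIO_RANK) + ord(token[0])))
        i += len(token)
    return key
```

With this key, a word starting with kp sorts after every word starting with plain k, which is the point of treating the digraphs as separate letters. Whether a gb or kp sequence can ever be two separate consonants in Krio is not addressed here.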

Also pinging user @Fenakhay — ^{oi yeah nah mate} amazing JUSSO ... ! 22:44, 11 June 2025 (UTC)Reply

Before implementing this, I'd like to hear some confirmation from other knowledgeable editors that these changes are correct, or at the very least, sources showing that (a) these diacritics are used in dictionaries, (b) the diacritics are not used in running text outside of dictionaries. Benwing2 (talk) 03:20, 14 June 2025 (UTC)Reply

@Benwing2 For the Ewe language, the tone markings are based on Nuseline's Ewe-English dictionary and Basic Ewe for Foreign Students. In the latter source, it says "Note that native speakers of Ewe often leave the marking of tones aside. For learners of the language, however, the marking of tones is essential".

For the Krio entries, the tones and letter orders are based on A Krio-English dictionary by Clifford Nelson Fyle. The Wikipedia article also says for the tones: "Three tones can be distinguished in Krio and are sometimes marked with grave (à), acute (á), and circumflex (â) accents over the vowels for low, high, and falling tones respectively but these accents are not employed in normal usage." — ^{oi yeah nah mate} amazing JUSSO ... ! 23:08, 14 June 2025 (UTC)Reply

It seems like there isn't any update on this so far — feel free to have a look at these sources for reference @Benwing2 — ^{oi yeah nah mate} amazing JUSSO ... ! 01:22, 20 June 2025 (UTC)Reply

Levantine merger

I found out from here that the split between North and South Levantine Arabic was dissolved by the ISO two and a half years ago. The 2023 discussion about merging them on Wiktionary didn't end up going anywhere because of a lack of available contributors, and I think that situation is even worse today, as the South Levantine Arabic project has left Wiktionary and North Levantine Arabic never sees much concerted activity.

I think it's good for the Wiktionary merger to happen, but I have some concerns. I want to solicit ideas for making the work manageable, keeping in mind that it could easily fizzle out this time too.

After merging, I don't get how to account for all of Levantine. The expansion to Levantine Arabic adds a burden on everyone to know things about Levantine varieties they probably don't have knowledge of in order to make an entry complete.

This burden is technically there now too, but I feel like the old ISO split creates small enough halves that it feels okay to get away with only focusing on one subvariety within those halves: North Levantine seems to mostly have had Lebanese Arabic contributors (just with a translit convention that's kind of? inclusive of Damascene) and South Levantine focuses on Palestinian because it was part of User:AdrianAbdulBaha's push to improve online resources for Palestinian Arabic. This feels harder to handwave away now.

I do feel like focusing on as small of a comprehensive area as possible is how you get things done, which is why I'm concerned about expanding the scope of Levantine.

Relatedly I'm worried about the module/template infrastructure growing unmanageably large compared to the amount of people available to maintain it (let alone maybe being unreadably spammy on transclusion). I'm lazily working on Module:User:Still, when you think about it/apc-IPA to account for some of the variation in what "North Levantine" was supposed to cover, where South Levantine Arabic never had something similar, but it seems like now it'd be good for an {{apc-IPA}} to also cover Palestinian and Jordanian varieties that I don't have very much knowledge of how to divide. The same goes double for {{ajp-conj}}.

Supposing the merger does go ahead, South Levantine Arabic has 3016 entries (128 non-lemmas) and North Levantine Arabic has 505 entries (9 non-lemmas). I feel like all of those entries will need to be checked for whether they're exclusively Palestinian/South Levantine, exclusively North Levantine, or shared in order to add the right term, sense, or accent labels.

I think this has to be done by checking terms against published references (even for those of us currently active who do speak Levantine Arabic — at least for my part I don't know all of what's used and not used outside of my own dialect). Would it be of use to add some kind of "warning, this term needs to be assigned to a location — you can help by locating it in these references and then removing this warning" template under the L2 header of all 3k-ish merged entries to allow the effort to continue even as contributors come and go over time? (Just saw that this is what User:A455bcd9 suggested during the initial conversation as well)
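As a rough illustration of the triage step described above, the following Python sketch sorts a page's wikitext into "north-only", "south-only" or "both" by its L2 headers, so that a warning could later be attached to the right entries. The function names and the category strings are hypothetical, and real pages would need a proper wikitext parser rather than a regular expression.

```python
import re

# Matches level-2 headers such as "==South Levantine Arabic==" but not "===Noun===".
L2_HEADER = re.compile(r"^==\s*([^=].*?)\s*==\s*$", re.MULTILINE)

NORTH = "North Levantine Arabic"
SOUTH = "South Levantine Arabic"


def l2_languages(wikitext):
    """Return the language names of all level-2 sections on a page."""
    return L2_HEADER.findall(wikitext)


def classify(wikitext):
    """Hypothetical triage: which of the two pre-merger sections does the page have?"""
    names = set(l2_languages(wikitext))
    has_north, has_south = NORTH in names, SOUTH in names
    if has_north and has_south:
        return "both"
    if has_north:
        return "north-only"
    if has_south:
        return "south-only"
    return "neither"
```

Pages classified as "both" are the ones that would need a manual merge of the two sections. The other two groups are the ones where a "needs to be assigned to a location" warning could record which half the entry came from.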

Might be overthinking. Pinging some Arabic contributors, including old ajp editors that may still be around: User:Fayçalmf, User:Fenakhay, User:Benwing2, User:SarahFatimaK

Possible references

J. Elihay's 2004 Olive Tree Dictionary of Palestinian Arabic
Lughatuna's tags for Syrian/Palestinian/Lebanese Arabic
- يَاسِين عَبْد الرَّحِيم، مَوْسُوعَةُ العَامِّيَّةِ السُّورِيَّة (yāsīn ʕabd ar-raḥīm, mawsūʕatu l-ʕāmmiyyati s-sūriyya) from 2012 for (urban?) Syrian Arabic, also available on Lughatuna
Anis Freiha's 1947 Lebanese Arabic dictionary ({{R:apc:Freiha:1947}})
Roger Makhlouf's 2018 Lebanese/English lexicon
Maybe Barthélemy's 1890 Aleppine dictionary
Jordanian references??

Still, when you think about it (talk) 18:40, 15 June 2025 (UTC)Reply

@Still, when you think about it In my experience, mergers are always harder than splits, and you're running up against this reality. I think in practice it's fine to have a warning indicating that a given term was originally North Levantine or South Levantine and hasn't been assigned appropriate labels. I also think focusing on a limited set of dialects is sufficient, maybe just urban Syrian, Lebanese and Palestinian, or the three + Jordanian (I don't know how different Jordanian Levantine is from Palestinian Levantine). As for designing the templates themselves, we can maybe follow the approach of Occitan, which has been able to handle several dialects under one L2, and design the templates so that if someone knows the correct inflections for only a limited set of dialects, only those dialects get displayed. I can help with the coding aspects. There's also the Richard Harrell series of Syrian Arabic grammar and dictionaries, I know these are a bit old but generally I have found the series reliable. Benwing2 (talk) 18:55, 15 June 2025 (UTC)Reply

Also, {{pt-IPA}} is able to handle several different Brazilian and European Portuguese dialects, and might produce some ideas as to how to handle the pronunciation differences. The general approach followed is to prefer a single spec that gives the maximal information (e.g. I know that some Levantine dialects have merged short ĭ and ŭ but others keep them apart; the "maximal information" would distinguish these two and the underlying code would merge them appropriately for the dialects that merge them), but allow different specs for different dialects. Benwing2 (talk) 18:58, 15 June 2025 (UTC)Reply
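To make the "maximal information" idea concrete, here is a toy Python sketch. It is not how {{pt-IPA}} is implemented. It assumes a single transliteration-like spec that keeps short i and u apart, two made-up dialect labels, and a deliberately crude merger rule (non-final short i/u become schwa) that ignores stress and syllable structure.

```python
import re
import unicodedata

# Hypothetical dialect table: does the dialect merge short i and u?
MERGES_SHORT_I_U = {"conservative": False, "merging": True}

# Short i/u followed by at least one more character, i.e. not word-final.
NON_FINAL_SHORT_I_U = re.compile(r"[iu](?=.)")


def realize(spec, dialect):
    """Derive a dialect form from one maximal spec that distinguishes i and u."""
    # NFC keeps long vowels (ī, ū) as single characters, so they are not touched.
    spec = unicodedata.normalize("NFC", spec)
    if MERGES_SHORT_I_U[dialect]:
        return NON_FINAL_SHORT_I_U.sub("ə", spec)
    return spec
```

The editor writes one spec such as ḍufr, and the code decides per dialect whether the distinction survives. A dialect that needs something the rule cannot derive would still get its own spec, as described above.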

Finally, there is the issue of how to represent the script. All of the resources I'm familiar with use transcription, but Wiktionary prefers using the original script. I don't even know if there's a standard for how to represent the various dialects of Levantine Arabic in Arabic script, much less the specifics of how this works if it exists. Maybe you can help me understand this. Benwing2 (talk) 19:00, 15 June 2025 (UTC)Reply

There's no top-down standard for dialects specifically, but in real life people write their dialects in the Arabic script, which I say is good for a non-specialist dictionary to reflect. (The exception is mostly Lebanese speakers in their early 30s and younger, who practically exclusively write in 3arabizi online, which would be good to document but has too many random variables). Descriptively/impressionistically, spelling matches Standard Arabic spelling, except for

stopped interdentals (always spelled with the plosive letter)
emphatics that all dialects have deemphasized (always spelled with the plain letter, not the emphatic letter, like ركد (rakad, “to run”))
feminine -i (almost always spelled ـي to match morphological reanalysis, as in ـكي ـتي)
3ms -o, which Lebanese often spell ـو instead of ـه
other sound shifts that collapse a distinction that the Arabic script is supposed to indicate, where it's more correct to match the Fusha spelling but commonplace to respell phonetically (e.g. ق، ـوا، ـة, assibilated interdentals, emphatics in dialects that are losing emphasis across the board, etc)

Still, when you think about it (talk) 14:25, 16 June 2025 (UTC)Reply

Speaking of standards, this reminds me that translits are an issue I forgot about. We don't have room to make them show different variants like we do with IPA. Would it honestly be acceptable to just do without translits? If not, I'm imagining a weird amalgam of different pronunciations, and it's a little awkward because even though they're trans-"liter"-ations they kind of suggest pronunciation info nevertheless:

بيض (bayð̣, “eggs”)
In terms of pronunciation, the combo of -ay- and interdentals is rare, but in terms of translit that's the highest-info representation of what these letters spell
تلة (talle, “hill”)
There are dialects with invariable ة (/⁠-a⁠/), but you can always derive that from the form of a dialect with ة (/⁠-a, -e⁠/) and not vice versa, hence the symbol -e here. (Or in the true spirit of translit do we want a special symbol only for ة?)
قبضاي (qabaḍāy, “macho man”)
This is from an Ottoman Turkish /d/, which formally was loaned as /dˤ/, but some speakers with interdentals went on to associate this /dˤ/ with their /ðˤ/. Is the proper form قبضاي (qabaḍāy /⁠-ḍāy, -ð̣āy⁠/)? (I would prefer قبضاي (qabaḍāy /⁠qabaḍāy, qabað̣āy⁠/) but I don't want /q/ in |ts=)
تعتير (taʕtīr, tiʕtīr, “miserable situation”) ~ تعثير (taʕṯīr, tiʕṯīr)
This interdental seems fine. The ت only spells t and the ث is only used for dialects with interdentals, in which it can spell ṯ. But I'm not sure how to smooth over the templatic variation in the first vowel. I guess a real "transliteration" would be تعتير (tʕtyr) ~ تعثير (tʕṯyr) (same goes for all examples, of course), but I don't think anyone would want that...
قاظان (qāẓān, “water heater”)
(I actually thought nobody said this with an interdental, the IPA on that page is new to me. If they do, then قاظان (qāẓān /⁠-ẓān, -ð̣ān⁠/, “water heater”) in the same vein as قبضاي (qabaḍāy /⁠-ḍāy, -ð̣āy⁠/, “macho man”) above?)
ظروف (ẓrūf, “circumstances”)?
Or ظروف (ẓrūf, ð̣rūf, “circumstances”), or ظروف (ẓrūf /⁠ẓrūf, ð̣rūf⁠/, “circumstances”)?
صغير (zḡīr, ẓḡīr, “small”) ~ زغير (zḡīr, ẓḡīr)

Anything stand out as super wrong? The q feels a bit unfortunate because it's impossible not to try to pronounce it (and it's a minority pronunciation). I'm not sure how useful it is to have to make up our own WT:AR TR system like this vs. just not doing translits, if Wiktionary would allow that, and leaning only on the IPA available in entries. Still, when you think about it (talk) 15:32, 16 June 2025 (UTC)Reply

We can remove the interdental pronunciation on قاظان if you're unsure about it. I'm not 100% either.

About transliteration, we should probably stick to a standard. Probably one that matches urban South Levantine/urban Syrian. Alternate pronunciations can be represented by IPA, but having a standard would probably be more helpful. The amalgamation approach would probably be confusing. Fayçalmf (talk) 15:47, 16 June 2025 (UTC)Reply

Okay, I think that's a better idea. Potential translit guidelines:

No imala, only ا (ā)
Only ق (ʔ) in native vocab
No interdentals, only ظ ذ ث (ẓ z s) and ض د ت (ḍ d t)
Distinguish lax from tense -e -i and -o -u as urban Syrians or urban Palestinians/Jordanians do (I guess by majority rule, e.g. I know a coastal Syrian guy with sani for سنة (“year”) but the default ought to be سنة (sine, sane, “year”))
- What to do about -y -w? My own dialect has /maʃe/ مَشي (“walking”), /raʔe/ رأي (“opinion”), /ħelo/ حلو (“sweet, pretty, nice”), /ʒaru/ جرو (“puppy”) (MSA loan), but my understanding is these are all /-i -u/ in dialects that distinguish lax from tense final vowels. The Olive Tree Dictionary also gives ḥil^ew as an option for حلو, apparently.
  - Can we do مَشي (mašy, “walking”), رأي (raʔy, “opinion”), حلو (ḥilw, “sweet, pretty, nice”), جرو (jarw, “puppy”)?
  - Or just مَشي (maši, “walking”), رأي (raʔi, “opinion”), حلو (ḥilu, “sweet, pretty, nice”), جرو (jaru, “puppy”) and leave the details to the IPA? I actually like this better visually.
No diphthongs, except in cases where a dialect like Damascene will have them, like of course أو (ʔaw, “or”) and I think elatives like أوضح (ʔawḍaḥ, “clearer, clearest”) instead of ʔōḍaḥ
- Can we actually just do diphthongs unconditionally? It wouldn't be faithful to most dialects' pronunciation but I'm wondering if it's an OK tradeoff.
  - بيت (bayt, “house”), بيضة (bayḍa, “egg”), لون (lawn, “color”), لوقا (lawʔa, “crooked”), فير (fēr, “hair straightener”), بوش (bōš, “nada”), بنطلون (banṭalōn, banṭalawn)?
    - I think still شلون (šlōn, “how”) though (see last bullet below)
  - Or just بيت (bēt, “house”), بيضة (bēḍa, “egg”), لون (lōn, “color”), لوقا (lōʔa, “crooked”), فير (fēr, “hair straightener”), بوش (bōš, “nada”), بنطلون (banṭalōn)?
Violate these guidelines if the word itself is in violation of them, especially e.g. if it's restricted to or in imitation of a dialect that has differing features

Still, when you think about it (talk) 16:30, 16 June 2025 (UTC)Reply

I like these guidelines.

I agree with maši, ḥilu instead of mašy, ḥilw.

I'm not sure what to do about diphthongs. It could be a case by case basis like what Wikipedia does with British v American spellings & just leave it how the person who wrote the article put it in or we could standardise (only diphthongs vs no diphthongs except where they are in Damascene). Fayçalmf (talk) 17:17, 16 June 2025 (UTC)Reply

Forgot one more thing to consider that's a whole headache of its own, which is the treatment of kasra and damma. I think when they're medial and closed or stressed we'd do best by trying to adhere to original i/u, as in:

متل (mitl, “like”), ضفر (ḍufr, “fingernail, toenail”), صدر (ṣidr, “chest”), جملة (jumle, “sentence”), خلص (xiliṣ, “he finished”)?

Quick review of other options, though:

Schwa them both like a lot of Lebanese and Syrian references do, but importantly not a lot of Palestinian Arabic references:
متل (mətl, “like”), ضفر (ḍəfr, “fingernail, toenail”), صدر (ṣədr, “chest”), جملة (jəmle, “sentence”), خلص (xəleṣ, “he finished”)
Same but with ⟨i⟩ to be more inclusive of non-schwa-ing lects (like my idiolect, so this is just a personal pet peeve). This gets a bit strained when it comes to terms like بكرة ("bikra", bukra) that are still predominantly with u.
متل (mitl, “like”), ضفر (ḍifr, “fingernail, toenail”), صدر (ṣidr, “chest”), جملة (jimle, “sentence”), خلص (xileṣ, “he finished”)
Adhere to one's own lect, but at least in my case this isn't of much use: I mostly merge to kasra outside of some sporadic retentions of damma, I systematically round this vowel around emphatics, and I don't feel like I have schwa.
- If I were using ⟨i u⟩ I'd transcribe my dialect as: متل (mitl, “like”), ضفر (ḍufr, “fingernail, toenail”), صدر (ṣudr, “chest”), جملة (jumle, “sentence”), خلص (xuliṣ, “he finished”)
- Otherwise if it were totally up to me my dialect would be: متل (metl, “like”), ضفر (ḍofr, “fingernail, toenail”), صدر (ṣodr, “chest”), جملة (jomle, “sentence”), خلص (xoleṣ, “he finished”)
- This seems like something to care about in the IPA section, though, not in the translit.

There's also -iC -uC in final syllables, which Cowell (Damascene) and the South Levantine project on here represent with e o. I would prefer i u for symmetry with the above and because as a bonus it's more inclusive of a common type of Lebanese variety that really does have (which I believe often although not universally comes with -uC as well). We can separately figure out how to get more granular with lax/tense kasra and damma in terms of the IPA, though.

خلص (xiliṣ, “he finished”)

Lastly, there's the epenthetic vowel, which despite the fact that I sorta believe it's phonemic in many varieties I still believe shouldn't be represented in translit:

متل (mitl, “like”)

May have still forgotten other stuff, which I'll try add as it comes to mind (+I'll ask anyone reading to do the same too!).

Still, when you think about it (talk) 17:43, 16 June 2025 (UTC)Reply

I'm not a Levantine speaker but my vote is to maintain the i and u in transliteration according to conservative dialects that still maintain the original distinction clearly, except for words that don't exist in such dialects, where either i or schwa is fine. Dialects that merge the two can just ignore the distinction in pronunciation. This is problematic for dialects like yours where some u have merged with i but not all; either just ignore those dialects or show two transliterations, one with i and one with u. Hope this makes sense. Benwing2 (talk) 17:56, 16 June 2025 (UTC)Reply

I'm not completely opposed to a merger, but I'm not really for it either. It's mostly personal bias because I do like them being split, but if most contributors agreed to merging, I wouldn't have an issue with that. Fayçalmf (talk) 22:50, 15 June 2025 (UTC)Reply

Regarding IPA for different dialects, we could do what the main Arabic articles do when listing other dialect pronunciation & have the "main" pronunciation along with subdialectal pronunciations listed under it

Example using قاضي
IPA^(key): /ʔaː.dˤi/,
- (Druze, Coastal Syria) IPA^(key): /qaː.dˤi/,
- (Bedouin) IPA^(key): /ɡaː.dˤi/,
- (Fellahi) IPA^(key): /kˤaː.ðˤi/,

This would allow showing diversity in pronunciation while not needing contributors to have extensive knowledge on different Levantine dialects. Fayçalmf (talk) 02:23, 16 June 2025 (UTC)Reply

@Fayçalmf Yes, this is very similar to how {{pt-IPA}} handles Portuguese pronunciations. We have a "general Brazilian" pronunciation (reflecting an amalgam of the most common features cross-dialectally, and approximately the way newscasters in Brazil speak) and a "general Portugal" pronunciation (approximately reflecting a cultured Lisbon pronunciation), and nested underneath each are specific Brazil and Portugal regional pronunciations. This is also similar to how {{es-IPA}} works. So this approach is definitely feasible. Benwing2 (talk) 02:28, 16 June 2025 (UTC)Reply

I also like the split just because it gives us a smaller area to work with, but I can see why it's arbitrary and you can come up with other isoglosses to create whatever other split you would like. Relatedly, the other day I wanted to edit North Levantine Arabic منشان to add Western Neo-Aramaic miššōn- as a descendant and found that that page only has a South Levantine Arabic entry, and it felt bad to duplicate that whole thing to North Levantine Arabic just to add one tangential note. So this is my personal thinking. Still, when you think about it (talk) 14:30, 16 June 2025 (UTC)Reply

Yeah. Realistically speaking, after the initial hurdle of tidying everything up post-merge, it would be really nice to have everything contained into one Levantine Arabic section. Would it be possible to have categories for terms that are either exclusively South or North Levantine like we already have for Lebanese Arabic, Syrian Arabic, Palestinian & Jordanian? Things like هم can be placed into the South Levantine category & هن in the North?

Like:

هم • (homme) (enclitic form ـهم (-hom))

(South Levantine) they

--

هن • (hinne) (enclitic form ـهن (-hon, -yon, -on))

(North Levantine, Galilee) they

Fayçalmf (talk) 15:34, 16 June 2025 (UTC)Reply

@Fayçalmf Yes we can easily create such categories. Benwing2 (talk) 17:23, 16 June 2025 (UTC)Reply

@Still, when you think about it @Fayçalmf I moved this topic to WT:LTR, which is where we normally handle language splits and mergers. In order for this topic not to stall, it would help if one of you could create a list of what is needed compared with what we currently have, and think about drafting a plan of action. I can help with the latter, but am somewhat unsure about the former as I have not studied Levantine Arabic much (I took a couple of years of MSA classes back awhile ago when I was in school, and have studied Egyptian and Moroccan Arabic on my own in fair depth). Benwing2 (talk) 05:12, 17 June 2025 (UTC)Reply

Of course. Could I get a little more elaboration on "a list of what is needed compared with what we currently have?" Fayçalmf (talk) 05:22, 17 June 2025 (UTC)Reply

@Fayçalmf Ultimately what we want is a specific plan of action regarding steps to take to implement the merger. See for example Wiktionary:Grease_pit/2023/January#apc_and_ajp_merged, where I enumerated a possible plan of action for merging Levantine Arabic, and Wiktionary:Language_treatment_requests/Archives/2020-24#RFM_discussion:_February–March_2024, which has a similar but more recent plan of action for splitting Khanty into separate languages that was actually put into practice. Part of the work will be creating new modules and templates to handle the combined language, and we need at least a preliminary working version of these modules and templates before we put a lot of work into actually merging the lemmas. In order to create those templates, we need to know how they should behave, and this requires some input from Levantine speakers. The current North and South Levantine Arabic headword templates appear to be based on the Standard Arabic templates that @Fenakhay and I (among others) put together, but there are also South Levantine Arabic verb conjugation templates (there don't seem to be any such templates for North Levantine Arabic). The current templates are not designed for a multi-dialect language, so there will need to be some thinking about how to design them to handle the differences among Levantine dialects. One relatively simple way of handling different dialects is to have one headword line per dialect; see for example Galician querer, which has a line for the standard norm and another line for the reintegrationist norm, and similarly has two conjugation tables. Another approach is to not have anything in the headword; see for example Occitan alenar, which has 5 conjugation tables but nothing in the headword. This latter approach might not make sense for adjectives and nouns, because it requires a declension table for each adjective and noun, which might be overkill (e.g. for nouns, all you need to list is the plural).
So what I would need as a start is a specific design for the noun, verb and adjective headword templates, with some examples of what the input would be and how it might display. I would start with nouns (which are easier than adjectives) and start with examples, rather than trying to come up with a design right away. Pick some common nouns and think about how to best display them, and then come up with a template syntax for specifying the relevant forms. I can help with the template syntax if I have several examples of the nouns and their plurals (both in Arabic script and transliteration). After that we can tackle adjectives, and then verbs. Benwing2 (talk) 06:00, 17 June 2025 (UTC)Reply

I might as well throw in a couple of other multistandard systems: آب has Urdu (sister lect to Hindi) and Persian (Classical Persian/Iranian Persian/Dari/Tajik) entries where you can see different approaches and examples of infrastructure used to present the different scripts and pronunciations. Note that some of these have their own language codes, scripts and L2 headers, but there are templates that tie them together. Not that I'm specifically recommending any of these for the case at hand, but it may spark some ideas. Chuck Entz (talk) 06:55, 17 June 2025 (UTC)Reply

- Create a bot to turn all ajp articles into apc, alongside changing headers to ==Levantine Arabic==. We either have it leave lemmas with both to be dealt with manually like the original thread suggests, or we manually merge any terms in both ajp & apc to apc before running the bot.

- I'm not really sure how the ajp conjugation table works. If it can handle variations of the same form already (i.e. اطلع for South Levantine & طلاع for North), then great. If not, there need to be accommodations made.

- New categories created for 'North Levantine Arabic' & 'South Levantine Arabic' to hold region exclusive terms à la the country categories.

- There has to be an agreement regarding ر in IPA. ajp tends to use r/rˤ, while apc uses r/ɾ. I suggested earlier having a "standard" for IPA with regional variations below it, so we could use r for the standard and ɾ and rˤ for the regional pronunciations.

Those are some bullet points I have for now. I'd still like to hear input from @Still, when you think about it as well as what to do with tables & modules that already exist for apc/ajp to make a finalised plan for merging. I'll add more if I think of more things we need later on. Fayçalmf (talk) 11:31, 17 June 2025 (UTC)Reply

For sure, the ajp-conj template can be used as a base, but it needs updating to be able to handle variation. I like the Occitan example with multiple tables and I'll try to think about how to implement something similar.

About the categories, I'm wondering if we can avoid recreating the North/South Levantine split. Would it be possible to stick to "chiefly Syrian, Lebanese", "chiefly Palestinian, Jordanian", alongside transitional areas? I was convinced by User:A455bcd9's reasoning in the ISO proposal that said that the division was somewhat arbitrary and not derived from literature.

For IPA, I want to give all major pronunciation standards equal weight instead of deciding on one standard ourselves. This doesn't solve the issue of how to transcribe ر, but it does leave it up to individual accents to have it transcribed in their own way without interfering with ours. I'm coming up blank for now on what to do about it, though. Still, when you think about it (talk) 07:43, 18 June 2025 (UTC)Reply

Not sure countries are the best boundaries. Rural vs urban is often bigger than country A vs B. There are also sectarian differences (esp. Druze). So unless we know that a word is only widespread inside one country's border, it's better to stick to traditional areas ("Jerusalem", "Damascus", "Beqaa Valley", etc.). A455bcd9 (talk) 07:52, 18 June 2025 (UTC)Reply

Thanks! It's more daunting but you're right. Sorry about the two username mentions, I had mistaken you for inactive. Still, when you think about it (talk) 10:57, 24 June 2025 (UTC)Reply

A standard for IPA doesn't have to be based on one country/region's pronunciation. It can be a generalisation, then any deviations from the generalised pronunciation can be accounted for as well in the IPA underneath the "standard" one.

Some examples:

تقيل / ثقيل

IPA^(key): /tʔiːl/
- (Fellahi) IPA^(key): /θkˤiːl/
- (Druze) IPA^(key): /tqiːl/
- (Bedouin) IPA^(key): /tɡiːl/

-

برداية

IPA^(key): /-/,
- (Lebanon, Imāla) IPA^(key): /bɪɾˈdeːje/, /bɪɾdeːj/
- (Galilee) IPA^(key): /bur.dæːj/

-

شطرنج

IPA^(key): /-/, ,
- (hyperforeign) IPA^(key): /ʃɑ.tˤɑˈɾɒ̃ːʒ/
- (Jordan, regional) IPA^(key): /ʃɑ.tˤɑ.ɾɑndʒ/

-

I like this method personally because contributors can just put in the most basic form of the pronunciation & nuance can be added in later by other contributors. If we do go with this method, then we have to figure out an order for them to go in so they're not random in every article. Fayçalmf (talk) 13:32, 18 June 2025 (UTC)Reply

Draft of a plan to merge North & South Levantine Arabic:

1. Rename apc to "Levantine Arabic"

2. Merge tables. Edit declension table to be able to accommodate North Levantine as well. (Note: I can't code, so I don't know the logistics of this step.)

3. Create a bot to merge ajp into apc. Leave articles with both apc & ajp entries alone to be dealt with manually.

4. Once everything is converted to apc, delete ajp from the language list.

5. Levantine-speaking contributors will have to work on tidying things like IPA & formatting to meet new standards.

@Still, when you think about it, @Benwing2, @A455bcd9, @Fay Freak

Any thoughts/criticisms? Anything that needs to be added or something else that needs to be taken into account that I missed? Fayçalmf (talk) 20:20, 21 June 2025 (UTC)Reply
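For illustration only, step 3 of the draft plan could be sketched roughly as follows. This is a minimal sketch, not an actual bot: it assumes the current L2 headers are "South Levantine Arabic" (ajp) and "North Levantine Arabic" (apc), it only rewrites the language code when it is the first template parameter, and the function name `convert_page` is hypothetical. Pages that already carry both entries are returned as `None` so they can be merged manually, as the plan suggests.

```python
import re

# Assumed header names; the real bot would take these from the language data.
OLD_HEADER = "South Levantine Arabic"   # current ajp L2 header (assumption)
KEEP_HEADER = "North Levantine Arabic"  # current apc L2 header (assumption)
NEW_HEADER = "Levantine Arabic"

# Matches level-2 headers only (==X==), not ===X=== or deeper.
L2_RE = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)
# Matches "ajp" when it is the first parameter of a template, e.g. {{head|ajp|noun}}.
CODE_RE = re.compile(r"(\{\{[^{}|]+\|)ajp(?=[|}])")


def convert_page(wikitext):
    """Return converted wikitext, or None if the page needs manual merging."""
    headers = set(L2_RE.findall(wikitext))
    if OLD_HEADER in headers and KEEP_HEADER in headers:
        return None  # both entries exist: leave for a human editor

    def rename(match):
        if match.group(1) in (OLD_HEADER, KEEP_HEADER):
            return "==" + NEW_HEADER + "=="
        return match.group(0)

    text = L2_RE.sub(rename, wikitext)
    return CODE_RE.sub(r"\1apc", text)
```

Template names such as ajp-conj are deliberately left untouched here, since what replaces them depends on step 2.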

Looks like you guys can do it. I generally only have bookish knowledge so only added a few, often obsolete, Levantine terms, to finish some etymologies or circumstantial curiosity, when I could not think of much argument to sneak them in under the general Arabic header, and depending on the source it was sometimes left open which Levantine a term was gathered from (e.g. {{R:ar:Berggren}} claiming to have both Damascus and Jerusalem). Fay Freak (talk) 20:42, 21 June 2025 (UTC)Reply

Hey, sorry I dodged this. Randomly don't have the free time I've had for the last month or so. Will do my best to keep on top of this nonetheless because I don't want to leave it half-finished. These steps look good to me, I'm just still hung up on the small details: the specifics of IPA formatting (and getting my apc-IPA template to not be broken, although that's more of a side project) and, annoyingly, the i/u thing when it comes to verb headwords, as the "North Levantine" dialects I'm aware of leveled almost all Form I verbs to yiCCuC vs yiCCaC* whereas "South Levantine" dialects seem to maintain a robust distinction between yiCCiC and yiCCuC that's completely impenetrable to me. Fortunately this doesn't affect the lemma (which will be the past form), but I guess this means either spamming multiple headwords per entry or just not doing headword lines? I prefer the former.

Verb conjugation tables may be easier to deal with. The different systems I'm aware of are

1. Coastal Syrian katbit, katbīto
- Also up in the mountains they use originally imperative forms like طلاع as the base of the 1sg
2. Nearby Lebanese katbit, katbíto due to these dialects' a-elision, not as a purely morphological thing (do these dialects also have katbto?)
3. North-ish Lebanese ʕaṭyit, staḥyit, ḡilyit (typically w/ nonpast 3fs+2fs taʕṭe, tistíḥe, tiḡle but somewhat rarely 2fs tistiḥye); ḥkī "speak!" (احكي not احكيه), ktōb, kōl
4. Typical Lebanese and urban Syrian ʕaṭit, štarit, ḡilyit; katabit, katabíto; bimši, bḡanni, biḡannu (ignoring -i -u vs -e -o); tistíḥi~tistáḥi; stʕart, ḵtart; yitruk; ʔiḥki, ktōb, kōl
5. South Lebanese tistḥi; stʕirt, ḵtirt
6. Transitional South Lebanese~Galilean katabit~katabat, katabáto, kátabato; ʔiktub~ʔuktub, kōl
7. Palestinian/Jordanian and Aleppine bamši, baḡanni, b(i)ḡannu (last one also found in Lebanese areas); tistáḥi~tistaḥi
8. Palestinian and Jordanian yitrik; ʔuktub, kul "eat!"
9. Jordanian and regional Palestinian? katbato

There's some stuff I don't know the details of like the Palestinian distribution of eg ramaw vs ramu or what dialects do ḡilit.

Conjugation tables technically don't need to show connecting forms so we can ignore the -o stuff to start with. I would like to represent 4 and 7, of course, plus 3 and 5 if possible. (My knowledge of lesser-used forms is poor the further south in the region we get.) This actually seems doable with minimal to no Lua, unless we want some logic to automatically show multiple tables.

Still, when you think about it (talk) 10:55, 24 June 2025 (UTC)Reply

Regarding the i/u thing, it's possible to put both in the transliteration or just leave it as whatever the contributor put it in as. بكرة is already just (bukra), so leaving it as is would be fine, for example. Fayçalmf (talk) 11:47, 24 June 2025 (UTC)Reply

I really dislike multiple headwords on the same word. It's ugly. I think we could simply have the transliteration reflect the different pronunciations.

مشى • (maša) (non-past بمشي (bamši, bimši))

The ajp article for أخد has 2 declension tables for بوخد & باخد. Perhaps we could do something similar?

===Conjugation===

====Chiefly Lebanon====

Just a suggestion. The current ajp table did a good enough job with إجا on apc (with much more coding), so this approach could work. Any other ideas? Fayçalmf (talk) 22:47, 24 June 2025 (UTC)Reply

I see your point about multiple headwords -- maybe whenever it's needed we can equally just add a new L3 with {{alternative form of}} (just did this at جاتوه) -- and the fact that small variations seem easy enough to represent within one tr:

كبس • (kabas) (non-past يكبس (yikbis, yikbus), active participle كابس (kābis))

The one last tr-related thing on my mind is when it comes to usexes and quotes. I believe the trans-"lit" for quotes should also follow pronunciation, like I did for Salam el-Rassi at أما (or to a lesser extent the yṣaḥḥ at واوا). I think usex translits can also just be in whatever dialect the usexer is most comfortable using or transcribing, especially because I don't see a reason to want to change the translits for all the ajp usexes. Can we enforce the use of an accent qualifier for usexes and quotes, like the (Lebanon) at the bottom of عبكرة?

Also, that last part and the IPA business seems like it means it's worth sitting down and figuring out an acceptable set of discrete sub-accents/dialects to enforce consistent representations of, which should be a priority but not block the merger from happening to start with. Still, when you think about it (talk) 17:59, 26 June 2025 (UTC)Reply
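As a rough illustration of the single-headword approach with several transliterations, the display lines quoted above could be produced by something like the following. This is only a sketch in Python (the real templates would be wikitext/Lua), and the function names and the `(label, script, transliterations)` input shape are assumptions made up for the example.

```python
def format_forms(forms):
    """forms: list of (label, script_form, [transliterations]) tuples."""
    parts = []
    for label, script, translits in forms:
        # All dialectal transliterations share one pair of parentheses.
        parts.append(f"{label} {script} ({', '.join(translits)})")
    return ", ".join(parts)


def format_headword(lemma, translit, forms):
    """Build one headword line listing every form with its variant readings."""
    return f"{lemma} • ({translit}) ({format_forms(forms)})"


line = format_headword(
    "كبس",
    "kabas",
    [
        ("non-past", "يكبس", ["yikbis", "yikbus"]),
        ("active participle", "كابس", ["kābis"]),
    ],
)
print(line)
```

The point of the design is that dialectal variation lives in the transliteration list, so the Arabic-script form is given once per slot.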

I agree with your points about translit.

Druze/Coastal Syria is already being represented in apc, and Bedouin & Galilee pronunciations in ajp. We could represent Fellahi accents too, and then for anything else have cities to represent them if needed (which Galilee already is doing).

دكتور

IPA^(key): /dokˈtoːɾ/, /dʊkˈtoːr/
- (Beirut) IPA^(key): /ˈdʊk.tʊɾ/, /dɔkˈtœɾ/

Should Imāla be its own subsection? I think it should be, with the exception of Lebanon-only words like ڤيتاس.

I mentioned order before, should the subsections go alphabetically or do you think there's a better way to arrange them? No matter what, if we have multiple city specific pronunciations, those should definitely go alphabetically. Fayçalmf (talk) 11:40, 27 June 2025 (UTC)Reply

Does the order they're in really matter? Most words won't require more than 3 variations anyway Fayçalmf (talk) 02:35, 29 June 2025 (UTC)Reply

Actually, in terms of making a template, it does. How about the "standard," then Imāla, Druze/Coastal Syria (separated if need be), Bedouin, Fellahi, then anything else like hyperforeignisms or Galilee can be manually added underneath.

IPA^(key): /-/
- (Imāla, chiefly Lebanon) IPA^(key): /-/
- (Druze/Coastal Syria) IPA^(key): /-/
- (Bedouin) IPA^(key): /-/
- (Fellahi) IPA^(key): /-/

Fayçalmf (talk) 03:53, 30 June 2025 (UTC)Reply
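To make the ordering question concrete, here is a minimal sketch of how a template could sort pronunciation lines into the order proposed in this comment. It is written in Python purely for illustration; the qualifier strings, the `"standard"` label for the unqualified line, and the function name are assumptions, and any qualifier not in the fixed list (hyperforeignisms, Galilee, cities) simply keeps its input order underneath.

```python
# Order proposed in the comment above; labels are illustrative assumptions.
PROPOSED_ORDER = ["standard", "Imāla", "Druze/Coastal Syria", "Bedouin", "Fellahi"]


def order_pronunciations(entries):
    """entries: list of (qualifier, ipa) pairs; returns a reordered list."""
    rank = {qualifier: i for i, qualifier in enumerate(PROPOSED_ORDER)}
    # sorted() is stable, so unlisted qualifiers keep their relative order
    # and land after every listed qualifier.
    return sorted(entries, key=lambda e: rank.get(e[0], len(PROPOSED_ORDER)))
```

If the order is later changed (for example to put urban varieties first, with no base form), only the list needs editing.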

Personal preference: no base form, each variant we list goes next to the others, but we put the more-urban options up top like you're doing here. Damascene, metro Lebanese, ?urban central Palestinian/Jordanian?, and then Druze, coastal Syrian, Beqaa/Qalamoun, Fellahi, and others? I found this classification of Palestinian and Jordanian dialects by Palva that may help decide on representative forms from down there, although it seems a bit outdated (on the one hand 1984 isn't at all long ago but on the other hand it says Galilean dialects predominantly preserve interdentals and /q/, which I know exists but I'm not sure it's predominant?).

I am wondering if we can get by without the imala tag. I see the merit in referencing the common name for the phenomenon visible in some pronunciations, but it'll also add clutter.

I'm admittedly dragging my feet on looking into botting the ajp->apc conversion but I believe that the only things we'll need in order to get started are that and maybe updated declension tables (since that infrastructure already exists). IPA pronunciations (since not much infrastructure already exists for them) can maybe be left as is to start with, with "Palestinian" appended to the current ajp accent quals and "Damascene" added as an accent qual for the current unlabeled apc pronunciations? Still, when you think about it (talk) 16:35, 1 July 2025 (UTC)Reply

I can do without a base form & leaving the translit to be the "generalised" pronunciation instead. The Imāla tag would essentially be the same as "metro Lebanese," so if we're doing the latter, we don't need the former.

I agree with the last part about adding quals. I think it would be helpful to have on articles before we get to manually adjusting things. Fayçalmf (talk) 18:55, 1 July 2025 (UTC)Reply

+ We'll have to specify in the Levantine Arabic terms with /ɡ/ category that it's only for words that are pronounced with it in the majority of dialects. Otherwise, almost every word with ق would be viable to include. Also adding pre-existing ajp terms to the category that fit the criteria like جمبري, جول, أغورة, مزچان, etc.

Same with /p/ (i.e. دبرس) and /v/ (i.e. فيديو) Fayçalmf (talk) 19:02, 1 July 2025 (UTC)Reply

IPA transcription of Pannonian Rusyn "в" before a consonant

Latest comment: 20 days ago · 7 comments · 4 people in discussion

I have discovered, both through a Pannonian Rusyn grammar book and listening to actual Pannonian speakers, that "в" before a consonant does not usually make a /v/ or /f/ sound, but rather it is more like the Ukrainian/Carpathian Rusyn/Slovak realization, where it's closer to /w/. The problem is, I don't know exactly which IPA symbol to use. The grammar book transcribes the sound as the Belarusian ў (it literally gives праўда (praŭda) as an example of realization). The Carpathian Rusyn pronunciation template uses /w/. Ukrainian and Belarusian IPA templates use /u̯/. Whereas there doesn't seem to be a consistent way of transcribing it in Slovak IPA.

Sources: In this video, at 7:28, the narrator says жовта (žovta), and at 7:38, the narrator says правдиве (pravdive). There's probably more examples in that video but those are just two in immediate succession. And I found the grammar book here, the в stuff is on page 16. (It's in Pannonian Rusyn, but I just wanted to prove that the ў stuff is actually in the book and that I'm not making it up.) Although the book does note that в before ч or ш is pronounced /f/, in words like вчас (včas) or вшелїяк (všeljijak).

So which IPA symbol, /u̯/ or /w/, do we think should be used for в before a consonant in the rsk-IPA module? Pinging @Sławobóg, @Vininn126, @AshFox for your thoughts. Insaneguy1083 (talk) 23:37, 16 June 2025 (UTC)Reply

A question you should be asking yourself is whether this is phonemic or just phonetic. If it's the latter, your use of // is wrong. Vininn126 (talk) 04:09, 17 June 2025 (UTC)Reply

Well it's square brackets in the IPA template. I'm just used to writing forward slashes more casually. So should it be [u̯] or [w] then, you reckon? Insaneguy1083 (talk) 06:12, 17 June 2025 (UTC)Reply

I agree with what Ben said. Vininn126 (talk) 06:18, 17 June 2025 (UTC)Reply

I don't think it matters that much; [w] might be better simply because it's more familiar to the average reader and easier to type. Benwing2 (talk) 05:15, 17 June 2025 (UTC)Reply

I agree. There isn't a real difference between /u̯/ and /w/; the choice between them is more about what aspects you want to emphasize. Use /u̯/ if you want to categorize it as a vowel that is nonsyllabic in this environment; use /w/ if you want to categorize it as a consonant. In this case, we probably want to categorize it as a consonant since it alternates with /v/ in syllable-initial position, so /w/ is probably the better choice. But that doesn't mean /u̯/ is "wrong". —Mahāgaja · talk 06:29, 17 June 2025 (UTC)Reply

Well put. Vininn126 (talk) 06:30, 17 June 2025 (UTC)Reply
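For reference, the rule as described in this thread (в before a consonant is roughly [w], except before ч or ш where it is [f]) can be sketched as below. This is a Python illustration rather than the actual rsk-IPA module; the consonant list and the function name are assumptions, and cases the thread does not discuss (such as word-final в) simply fall through to [v].

```python
# Consonant letters of Pannonian Rusyn Cyrillic (assumed list for this sketch).
CONSONANTS = set("бвгґджзйклмнпрстфхцчшщ")


def transcribe_v(word):
    """Return one phonetic symbol per 'в' in the word, left to right."""
    word = word.lower()
    symbols = []
    for i, ch in enumerate(word):
        if ch != "в":
            continue
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt in ("ч", "ш"):
            symbols.append("f")   # per the grammar book: вчас, вшелїяк
        elif nxt in CONSONANTS:
            symbols.append("w")   # preconsonantal в, as in жовта
        else:
            symbols.append("v")   # before a vowel; other positions not covered
    return symbols
```

Swapping "w" for "u̯" would be a one-character change, which fits the point that the choice is about categorization rather than a difference in sound.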

Pannonian Rusyn nonvirile?

Latest comment: 16 days ago · 10 comments · 3 people in discussion

Sorry that I'm adding another topic on Pannonian after such a short interval, but whose idea was it to add nonvirile as a separate noun gender? None of the Pannonian dictionaries that I use specifically define nonvirile as opposed to masculine pluralia tantum. There's masculine p.t., there's feminine p.t., and neuter p.t. in Pannonian. Not even Czech or Slovak use nonvirile on here. Did someone follow the Polish model a little too hard? If anyone can show me specific and definitive Pannonian documentation that nonvirile is defined as a noun gender, then fine, but otherwise I'll be reverting all the existing NV nouns into their respective pluralia tantums. Insaneguy1083 (talk) 16:27, 19 June 2025 (UTC)Reply

@Insaneguy1083 Before you just revert everything, see who added them and ping them to get their views. Maybe they had some reason, maybe not. Benwing2 (talk) 17:37, 20 June 2025 (UTC)Reply

@Thadh Hi, I've removed the nonvirile noun gender for Pannonian Rusyn nouns, since none of the dictionaries I use specifically mention nonvirile as a gender as opposed to just pluralia tantum. Even череґи (čeregi), the noun which you specifically changed to be NV, is listed in the dictionary as, and I quote, ж. мн. (ž. mn.). And there are neuter pluralia tantum like уста (usta) which are listed as с. мн. (s. mn.) in the same dictionary. Czech and Slovak don't use NV on here either, nor any other Slavic languages outside of the immediate Polish-sphere. Insaneguy1083 (talk) 18:11, 20 June 2025 (UTC)Reply

Did you specifically ignore what I said? I said ping them before reverting. Benwing2 (talk) 20:51, 20 June 2025 (UTC)Reply

Well, I had already reverted before you sent the initial message. It's not as if there are that many NV nouns anyway. There's like 14 of them or something, if even that, and it's just a matter of changing a few characters in the rsk-noun template if there exists an actual justification to use NV as opposed to just pluralia tantum. Insaneguy1083 (talk) 21:05, 20 June 2025 (UTC)Reply

@Insaneguy1083, Benwing2: Unlike Czech and standard Slovak, Pannonian Rusyn and afaik Eastern Slovak have a completely different gender system, where masculine human nouns have a different inflection than masculine animate, masculine inanimate, feminine or neuter:

я жем желєного мужу // я жем желєних мужох

я жем желєного коня // я жем желєни конї

я жем желєни лимун // я жем желєни лимуни

я жем желєне яблуко // я жем желєни яблука

я жем желєну вишню // я жем желєни вишнї

Now, I don't know if you notice this, but this is exactly the same system as in Polish. And as in Polish, it is impossible to tell from agreement whether a plural-only noun is masculine non-human, feminine or neuter, except for its inflection class, where mixed classes are still present. Now, it's nice that the Rusyn dictionaries you use have decided on some arbitrary gender for these nouns, but unfortunately we should be able to document any Pannonian Rusyn noun, which includes those that do not have an earlier dictionary entry. Furthermore, just like in Polish, there is nothing that makes череґи inherently feminine rather than masculine or neuter unless a singular *череґа exists. The third-person singular pronoun is the same for all genders, as are verbal endings.

I would appreciate it if you did not unilaterally remove such things from modules without first understanding the motivation behind it. Thadh (talk) 23:43, 20 June 2025 (UTC)Reply
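The agreement pattern in the five example sentences above can be summarized in a small lookup. This is only a sketch built from that single adjective (stem желєн-, an assumption for illustration), with hypothetical names; it is not how rsk-decl-adj is implemented.

```python
# Accusative endings read off the example sentences in this comment.
ACC_SG_ENDINGS = {
    "masculine personal": "ого",
    "masculine animate": "ого",
    "masculine inanimate": "и",
    "neuter": "е",
    "feminine": "у",
}


def accusative(stem, gender, number):
    """Return the accusative adjective form for the given gender and number."""
    if number == "pl":
        # Only masculine personal nouns trigger a distinct plural form.
        return stem + ("их" if gender == "masculine personal" else "и")
    return stem + ACC_SG_ENDINGS[gender]


plural_forms = {g: accusative("желєн", g, "pl") for g in ACC_SG_ENDINGS}
print(plural_forms)
```

In the plural every gender other than masculine personal yields the same form, which is the observation both sides of this discussion start from.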

@Thadh: That adjectival declension separating masculine personal (i.e. virile) and all others was already implemented in rsk-decl-adj. And if you had checked the referenced 2010 dictionary, you'll find that there does, in fact, exist череґа (čerega). To quote directly from the 2010 Rusyn-Serbian dictionary:

череґи ж. мн. (єд. череґа) кул. листови, мафиши

As you can see, it does point out the existence of a череґа (čerega), which on Wiktionary we can decline fully using rsk-decl-noun-f. And personally, I feel like if a Rusyn dictionary, written by Rusyns, indicates a singular form with a specified gender, then maybe we should take their word for it and implement this word as a feminine noun (arguably not even pluralia tantum to be honest, more like a feminine noun that is chiefly in the plural).

I've read the 1997 dictionary's grammar section, and I've also read the entire nouns and adjectives sections of the 2005 edition of the dedicated Pannonian Rusyn grammar book Ґраматика руского язика (quote from page 35: §28.1. Меновнїки можу буц хлопского, женского и стреднього род. (§28.1. Menovnjiki možu buc xlopskoho, ženskoho i strednʹoho rod.)). By all indications, even Rusyns themselves writing about Rusyn grammar do not specifically differentiate a "non-masculine-personal" gender for any context, other than pointing out that the plural accusative form of adjectives has a different form based on whether the noun is masculine personal.

It's nice that you'd like to document any Pannonian Rusyn noun, "which includes those that do not have an earlier dictionary entry". But the 2010 dictionary is pretty comprehensive (other than proscribed colloquial words like да (da)), and gives a gender for every pluralia tantum. And I feel like specifying the gender, e.g. to harmonize with etymology and cognates in the case of уста (usta), is rather important even if the resultant declension is the same as that of, say, a feminine p.t. or masculine inanimate p.t. Insaneguy1083 (talk) 08:28, 21 June 2025 (UTC)Reply

@Insaneguy1083: How don't you see that the fact череґа exists is the reason the noun is feminine? Not all plural nouns have a singular though, even hypothetically. Those are the nonvirile nouns. Thadh (talk) 08:48, 21 June 2025 (UTC)Reply

@Thadh: Bottom line, languages in the immediate Polish-sphere use nonvirile as a grammatical gender because it is specifically laid out as one (niemęskoosobowy) in the official Polish grammatical canon. For Pannonian Rusyn, nonvirile is NOT in itself defined as its own gender, there doesn't exist any *хлопскоособови (*xlopskoosobovi), and Rusyn dictionaries do as much to provide the gender of pluralia tantum like уста (usta) or дзвери (dzveri), even if the declensions for non-masculine-personal nouns are all the same in the plural, even if there doesn't exist a singular form. It seems disingenuous (and frankly unnecessary) to group a series of nouns using the noun gender system of a completely different paradigm, just because there are perceived similarities to the Polish system. If Pannonian Rusyns themselves decide one day that they will start using the nonvirile classification and classifying nouns as such in their own dictionaries, fine. But for the time being, differentiating the adjectival declension using rsk-decl-adj seems very much sufficient to me to address the differences between virile and nonvirile nouns. I'm just following the official line here.

@Vininn126 @Sławobóg as people who interact more with Polish-related entries, what are your thoughts on this? Insaneguy1083 (talk) 09:36, 21 June 2025 (UTC)Reply

I think it's disingenuous to follow Slovak grammar (which the dictionaries in question in this case seemingly follow) to explain Pannonian Rusyn grammar. If a word is a pluralia tantum, not attested in the singular, and uses the same case agreement in the nominative and accusative, then it is simply not part of any gender other than "not masculine personal". There's no way to otherwise see what the gender is, and using etymology or other languages is not only not sustainable, it's dishonest. Thadh (talk) 09:48, 21 June 2025 (UTC)Reply

Update Baltic (Golyad language & Dnieper Baltic group / classification of Galindian)

Latest comment: 10 days ago · 12 comments · 7 people in discussion

Wiktionary has a Galindian language and a code xgl for it. But the problem is that this language code combines 2 different languages. (w:Galindians)

Galindian xgl (synonym: West Galindian) ‒ language of West Baltic group, from Northeastern Poland. (w:Galindian language)
Golyad language (synonym: East Galindian) ‒ language of Dnieper Baltic group (which was forgotten on Wiktionary), from area near Moscow, Russia. (w:Golyad language)

The problem is similar to... if, for example, let's imagine... that Wiktionary ignored the divisions into Low German nds and (High) German de ‒ indicating both of them as "German". Explaining this by saying that "after all, both come from Proto-West Germanic gmw-pro and the names are similar, so specifying which is which is unnecessary."

I suggest:

1. Galindian xgl which is on Wiktionary:
1.1. Add a synonym "West Galindian" to it (or rename it altogether, this is at the discretion of the admins, but I think renaming is a bad idea).

1.2. Move it to West Baltic bat-wes, next to Old Prussian prg. Galindian xgl is currently outside the group ‒ precisely because it combines 2 completely different languages from different Baltic groups.
2. Add a 3rd group of Baltic languages (in addition to West Baltic and East Baltic) that was previously missing from Wiktionary:

Proposed name: Dnieper Baltic (w:Dnieper Balts / w:Dnieper-Oka language)

Synonyms: Dnieper-Oka Baltic, Eastern Peripheral Baltic

Code: e.g. bat-dni (similar to East Baltic bat-eas and West Baltic bat-wes)

3. Add new L2 language ‒ Golyad language (w:Golyad language):

Proposed name: Golyad

Synonyms: East Galindian / Scripts: Latin script (Latn)

Code: e.g. xgl-eas or considering the group ~ bat-dni-gol ~ bat-dni-gld.

─┬ Baltic (bat)
 ├┬ Dnieper Baltic (bat-dni)
 │   └── Golyad (xgl-eas)
 ├─ East Baltic (bat-eas)
 └┬ West Baltic (bat-wes)
     └── Galindian (xgl)

A stand-alone code for Golyad language will help to better indicate the etymologies of some words: e.g. Old East Slavic голѧдь (golędĭ); many hydronyms in Russian and toponyms (e.g. city Russian Волокола́мск (Volokolámsk)), some dialectal and not only Russian words, e.g. Russian кромса́ть (kromsátʹ, “to shred”). I don't know who actively edits the Baltic languages and who is better to ping... @Vininn126, what do you say? Sorry if this is not your section and I pinged you in vain. AshFox (talk) 20:39, 20 June 2025 (UTC)Reply
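The proposed hierarchy can also be written as simple parent links, which makes it easy to check where two lects would sit relative to each other. This is a Python sketch for illustration only (Wiktionary's language data is not stored this way); "bat-dni" and "xgl-eas" are the tentative codes from the proposal, not existing codes, and the function name is hypothetical.

```python
# Parent links for the proposed tree; bat-dni and xgl-eas are tentative codes.
PARENT = {
    "bat-dni": "bat",
    "bat-eas": "bat",
    "bat-wes": "bat",
    "xgl-eas": "bat-dni",  # Golyad (East Galindian), as proposed
    "xgl": "bat-wes",      # Galindian (West Galindian), as proposed
}


def ancestors(code):
    """Return the chain of parents from the nearest group up to the root."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain


print(ancestors("xgl-eas"))
print(ancestors("xgl"))
```

Under this layout the only ancestor the two Galindian lects share is the Baltic family itself.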

UPDATE: Just noticed that @-sche raised a similar issue in January 2024: Wiktionary:Language treatment requests#Proposal for several languages without ISO codes. He also noticed that ISO has a mistake... they mistakenly combine 2 different languages in one Galindian xgl or something like that. AshFox (talk) 20:46, 20 June 2025 (UTC)Reply

Just a comment: We have sooooooooo many different Balto-Slavic codes. Do we really need more? Can we get away with etym-only codes? Benwing2 (talk) 20:53, 20 June 2025 (UTC)Reply

@Benwing2 Sorry, I understand that this is unfortunately not another code for a new dialect of the Polish language... but here is a situation where 2 different languages from completely different groups are mistakenly hidden under one language code. These are not 2 dialects of one language. These are separate languages that were territorially separated by 970‒1020 km. They did not even touch...

I understand that we have many language codes for other Baltic extinct languages, half of which are not attested. But is this an excuse to completely forget about the separate Dnieper Baltic group and not fix the error with the ISO code xgl? AshFox (talk) 21:19, 20 June 2025 (UTC)Reply

Wikipedia says neither (West) Galindian nor Golyad is actually attested in writing. If that's true, I don't think we should make L2's out of them, since there will never be lemmas in main space for them. On the other hand, making them etym-only means deciding what languages they're etym-only variants of. WP says (West) Galindian is "thought to have been a dialect of Old Prussian, or a Western Baltic language similar to Old Prussian", so it could be a variant of prg. But Golyad is apparently a variety of Dnieper-Oka, which is also unattested and has no ISO code, and which seems to be a branch of Baltic unto itself. Could we make it an etym-only variant of Proto-Baltic, which is itself already an etym-only variant of Proto-Balto-Slavic? Would we even want to, since however Dnieper-Oka and Golyad have been reconstructed, they're bound to be very different from PBS? Are there even published reconstructions of Dnieper-Oka and/or Golyad? —Mahāgaja · talk 21:28, 20 June 2025 (UTC)Reply

@Mahagaja Who said that if a language is not attested in writing, it cannot be L2? If so, then we need to be consistent and other Baltic languages (Skalvian svx, Curonian xcu, Selonian sxl, Semigallian xzm) that are not documented in writing should be converted into etymological codes... but this is, in my opinion, a stupid limitation. AshFox (talk) 21:37, 20 June 2025 (UTC)Reply

As for reconstructions of the Golyad language, I have not found a separate "Golyad dictionary" before, and I did not look for it. However, I have come across individual reconstructions in the entries of Russian etymological dictionaries under certain Russian words/toponyms/hydronyms that have Dnieper Baltic (Golyad) origin. For the sake of interest, I simply opened the Russian Wikipedia w:ru:Голядский язык article and chose the first link that came up ‒ Топоров В. Н. О балтийском элементе в Подмосковье // Baltistica and even so, some reconstructions of the Dnieper-Oka Baltic roots that were the source of hydronyms near Moscow are already visible there. But this does not mean at all that it will be necessary to create purposefully reconstructed entries for Golyad on Wiktionary. It is possible only for those Russian words having Dnieper-Oka Baltic origin... as an example. AshFox (talk) 21:59, 20 June 2025 (UTC)Reply

I agree with Mahagaja. These are substrates, not attested nor comparatively reconstructed languages, and shouldn't have L2s. If there are any other such languages we do consider L2s, then yes, these should also be removed. Thadh (talk) 23:47, 20 June 2025 (UTC)Reply

The problem is that Balto-Slavic classification is thoroughly muddled by politics because Slavs ruled and subjugated Baltic speakers and exaggerated the importance of Slavic. There are those who consider the division of East and West Baltic to be at the same level as the Slavic branch and there are those in Baltic linguistics who refuse to even consider the idea because they don't want to be in the same language family with Slavic. I'm not qualified to say who's right, but I think it's better to spell things out clearly and thoroughly so that we don't get caught up in such disputes. Better to treat everything as separate so it can be located in the classifications of either version. This is especially true with Galindian, since it could be argued that any grouping that contained both parts would also contain everything else in the entire Balto-Slavic family. Chuck Entz (talk) 21:58, 20 June 2025 (UTC)Reply

It is just that a demonym has been used twice, as Serbs and Sorbs are actually the same word, or we have to distinguish two Moldavias, and lots of oikonyms parallelly formed due to the limits of human creativity as applied on the limited language inventories, sometimes realistically close to each other as Schröttinghausen (33 kilometres or like seven hours by foot between each).

And AshFox is fully competent an editor to express needful distinctions. It is a well-known fact that Slavic speakers displaced some Baltic and Uralic languages. Some would only have been spoken in a few of such villages but between a tad greater distances, as we know in detail from descriptions of Africa or Papua in the recent century. You also imagine, after the published notes of less armchairy linguists than we are, how much fun the field-work of describing them all is, and that hence the historical material offers some diluted randomness, but no one longs to establish reconstructions, almost guaranteed as national myths don't depend on these peripheralia, so I think Mahāgaja misunderstood the purpose of filling the language data tree with languages we know to have existed, without prejudice to their contents. Mereological disagreement? There is a point in keeping their addition low-threshold by already vouchsafing their template codes; one already expends some motivation to outline and request them all, that putting in the effort as well to scratch the bottom of the barrel for the last content in Trümmersprachen would appear prohibitive. Fay Freak (talk) 23:21, 20 June 2025 (UTC)Reply

FTR, I think the unsigned January 2024 proposal to split Galindian was by Theknightwho; my contribution was "What is there to add in either language?"; I see now that the answer is "mentions in the etymologies of other languages". Galindian (Western Galindian) could be considered a variety of Old Prussian; if someone were proposing to add a code for it, I'd say make it an ety-only variant of prg; since the ISO/SIL and we already have a full code for it, we could just as well leave it as-is, unless someone particularly wants to reclassify it (it doesn't change much; either way, the only place the code's used is in etymologies). For Eastern Galindian / Dnieper Baltic, if it's not attested but it needs to be mentioned in some etymology sections, then any code we add should indeed be an etymology-only code, like Mahagaja and others have said; make the Baltic language family its parent (like Suevic has West Germanic as its parent). Do reference works, e.g. etymological dictionaries of the languages that are thought to have borrowed from the lect(s), tend to reconstruct Eastern Galindian and Dnieper Baltic as separate things that different terms derive from? I am wondering if we really need two separate codes or just one. - -sche (discuss) 01:17, 21 June 2025 (UTC)Reply

Can anyone speak to whether both "Eastern Galindian"/"Golyad" and "Dnieper Baltic" need etymology-only codes, e.g. if they are commonly reconstructed separately by dictionaries of Russian etymologies and would need to be mentioned separately in our entries' etymology sections, or whether we could get by with one etymology-only code? E.g. if we only need to say that some Russian (and other) words may derive from EG/Golyad, then we don't necessarily need a code for Dnieper Baltic, we can just have an etymology-only code for EG/G with the Baltic family as its parent. - -sche (discuss) 22:04, 26 June 2025 (UTC)Reply

Proto-Oghuz and Proto-Arghu


see similar heading in February 2025

If it's not possible or difficult, I have another idea. Instead of making Proto-Oghuz anti-asterisk, we can try this:

Oghuz:
- Proto-Oghuz: (trk-ogz-pro)
  - Middle Oghuz: (xqa-ogz) / (mid-ogz)
    - Old Anatolian Turkish: (trk-oat)
      - Gagauz: (gag)
      - Ottoman Turkish: (ota)
        - Balkan Gagauz Turkish: (bgx)
        - Turkish: (tr)
    - Classical Azerbaijani: (az-cls)
      - Azerbaijani: (az)
      - Qashqai: (qxq)
    - Salar: (slr)
    - Turkmen: (tk)

Arghu:
- Middle Arghu: (xqa-arg) / (mid-arg)
  - Khalaj: (klj)

"Middle" because they belong to the Middle Turkic languages. @Benwing2 @Surjection @AmaçsızBirKişi @Rttle1 @Ardahan Karabağ @Bartanaqa

BurakD53 (talk) 01:53, 21 June 2025 (UTC)Reply

If it were up to me, I would also want Chigil (xqa-chi) and Yaghma (xqa-yag) as Karakhanid dialects; unlike Oghuz and Arghu, these directly point to the Karakhanid language because they are from the same branch. What I mean is, according to Kaşgarlı, both belong to the Karluk tribal confederation. I would want them, but the problem is, you won’t give them to me. BurakD53 (talk) 02:15, 21 June 2025 (UTC)

Actually, the Proto-Oghuz period roughly corresponds to Middle Oghuz, most likely around the same time, but unfortunately the anti-asterisk doesn’t work yet. Either you’re too busy, or you’re not sure, or you prefer it to remain a reconstructed language. I would prefer to use Proto-Oghuz for the entry lemmas, but if it has to remain a reconstruction, then Middle Oghuz is okay. BurakD53 (talk) 02:26, 21 June 2025 (UTC)

Yes, they should be roughly from the same time period, but if someone wants to keep them separate, I understand and support that too. I just want the Oghuz-related entries in a dialectical dictionary written in Karakhanid to be separated out and assigned to the Oghuz category. That’s my point of contention. BurakD53 (talk) 02:32, 21 June 2025 (UTC)Reply

I lost my PDF of DLT a long time ago and am too lazy to download it again tbh, so I won't be able to take a look at the Chigil and Yaghma words. But I think it is unnecessary to add language codes for dialects. We can mention them with the code of Karakhanid. Ardahan Karabağ (talk) 09:52, 21 June 2025 (UTC)

Partially support, but maybe we should use a different name than "Middle ...". We could also just change it into a non-asterisking descendant instead of adding new language codes.

AmaçsızBirKişi (talk) 06:41, 21 June 2025 (UTC)Reply

Oghuz is also totally okay for me, but I'm not sure if it's appropriate because it's also the name of the language family. BurakD53 (talk) 10:41, 21 June 2025 (UTC)Reply

I agree. BurakD53 (talk) 10:43, 21 June 2025 (UTC)Reply

Since I don't have enough information about Arghu, I'm assuming that "Middle Arghu" is the one that is attested in DLT? If so, then I propose Proto-Arghu > Middle Arghu > Arghu, but as @AmaçsızBirKişi said, we could use a different name for Middle Arghu. Ardahan Karabağ (talk) 09:45, 21 June 2025 (UTC)

I think Middle Arghu is a good choice because this kind of name is also used in other language families. For example, Middle Chinese, Middle Mongolian, Middle English, etc. @AmaçsızBirKişi @Ardahan Karabağ BurakD53 (talk) 10:46, 21 June 2025 (UTC)

I mean, it is, if there is no Proto-Arghu without asterisks. If there is, I'm not sure how old the Arghu branch is, maybe also... 🤷‍♂️ BurakD53 (talk) 10:50, 21 June 2025 (UTC)

@BurakD53 My apologies, I have been busy but I will look into the feasibility of implementing the anti-asterisk feature today or tomorrow at the latest. I don't think it will be difficult but I need to verify this in the code. Benwing2 (talk) 19:35, 21 June 2025 (UTC)Reply

My request is:

Proto-Turkic: (trk-pro)
- Oghuz: (family)
  - Proto-Oghuz: (trk-ogz-pro) <<<<<<<<<<
    - Salar: (slr)
    - Turkmen: (tk)
    - Old Anatolian Turkish: (trk-oat)
      - ?
        - Azerbaijani (az)
        - Qashqai (qxq)
      - Gagauz (gag)
      - Ottoman Turkish (ota)
        - Turkish (tr)
        - Balkan Gagauz Turkish (bgx)
        - Cypriot Turkish (tr-CY)

– BurakD53 (talk) 18:53, 30 June 2025 (UTC)

Are all of these meant to be separate L2's? Even Cypriot Turkish? Can you clarify this? Benwing2 (talk) 19:03, 30 June 2025 (UTC)Reply

No, it shouldn't: Cypriot Turks formally use the Turkish language. BurakD53 (talk) 19:08, 30 June 2025 (UTC)

It's just a dialect close to the southwestern dialects. BurakD53 (talk) 19:09, 30 June 2025 (UTC)

You need to make a full proposal indicating what is an etym-only language, what is an L2 language, who is the parent and ancestor of what, and what the existing situation is. It's too confusing in the form you've presented it, for someone like me who is not an expert on Old Turkic languages. Benwing2 (talk) 20:00, 30 June 2025 (UTC)Reply

If we are going to add new langcodes, we should add Ajem-Turkic/Classical Azerbaijani where you left a question mark too, is my suggestion. There's previous discussion over it too, and I remember it was favored somewhat.

AmaçsızBirKişi (talk) 19:14, 30 June 2025 (UTC)Reply

There is an az-cls code, but it's set as a descendant of az; maybe that was a mistake. BurakD53 (talk) 19:19, 30 June 2025 (UTC)

I don't have a clear opinion on this topic. As I said before, someone should extract the az-cls sources and clearly define what data this language is based on. BurakD53 (talk) 19:21, 30 June 2025 (UTC)Reply

I believe it would correspond to 14th to 18th/19th-century Azerbaijani literature, like Chagatai. I'm sure someone in the future might take an interest and start creating such entries, like the bulk of Ottoman entries we have.

AmaçsızBirKişi (talk) 19:35, 30 June 2025 (UTC)Reply

I don't have the works in this language. I've heard of some, but I can't find their PDFs. – BurakD53 (talk) 23:09, 30 June 2025 (UTC)

Also, while we're at it, Fuzuli, Şah İsmail etc. would be Classical Azerbaijani. And since some archaic words like yügüş and şol are technically unattested in the Latin script, and therefore in Modern Azerbaijani, they would belong at Classical Azerbaijani. Bartanaqa (talk) 19:26, 3 July 2025 (UTC)

Sorry, not my area of expertise, but how does literature (both Turkish and English) usually label these DLT attestations? Looking up "Proto-Oghuz" "al-Kashgari" on Google Books did not give me the results I hoped for. I was wondering, we could keep Proto-Oghuz trk-ogz-pro as an etym-only code to Proto-Turkic, while keeping these forms under a new code and a new name, like Old Oghuz for example (trk-oog?), by analogy of the contemporary branches of Old Turkic, like Old Uyghur. Name and code may vary, take this generally as the two-code suggestion. Catonif (talk) 19:49, 30 June 2025 (UTC)Reply

I don’t really think this is necessary, but honestly it doesn’t matter at all, because what we call it isn’t that important. I had actually suggested this at the beginning of the discussion. I thought Middle Oghuz might be an appropriate term. The naming was criticized, yet the Turkic languages of this period are referred to as Middle Turkic languages. So, calling it Middle Oghuz isn’t wrong, and calling it Old Oghuz isn’t wrong either. – BurakD53 (talk) 23:16, 30 June 2025 (UTC)Reply

Since the languages we refer to as Middle Turkic lasted until the end of the Middle Ages, perhaps we made a mistake by using a broader term. So, Old Oghuz (trk-oog) is reasonable and appropriate. – BurakD53 (talk) 23:22, 30 June 2025 (UTC)Reply

I didn’t quote the sentence that Kashgari wrote in Arabic, I included the usage example he provided in the sentence as a usage example. – BurakD53 (talk) 23:35, 30 June 2025 (UTC)Reply


Proto-Turkic	Old Oghuz (11th ce.)	Modern Oghuz except Salar	Salar
-g	-g	-Ø	-Ø
k-	k-	g- generally	g- generally
-gAn	-An	-An	-An for old words, -gAn as a suffix
-gU	-AsI	-AsI	-gUsI as a suffix
-gAk	-Ak	-Ak	-Ak
*yarısgu	yarısa	yarasa	yarasan, yarsan, yersan
?	-gsI/-gsAk	?	?
-gUçI	-dAçI	-IcI	-gUçI
-dI	-dA	-DI	-ci

Old Oghuz is different than trk-oat or any other. – BurakD53 (talk) 00:15, 1 July 2025 (UTC)Reply

How many words are we talking about? If it's like 10, it might not make sense to create a L2 language just for that. Benwing2 (talk) 03:22, 1 July 2025 (UTC)Reply

More than 250. BurakD53 (talk) 09:36, 1 July 2025 (UTC)

There are 111 on my user page, and that's not even half of it. BurakD53 (talk) 09:43, 1 July 2025 (UTC)

OK, in that case it should be a separate L2 I think. The anti-asterisk feature is intended for cases where a language is primarily reconstructed but has a small number of scattered attestations, like Proto-West-Germanic. What you're describing sounds more like Proto-Norse, which we consider an attested language despite the "Proto-" prefix because it has a corpus of several hundred words. Benwing2 (talk) 20:32, 1 July 2025 (UTC)Reply

OK, I see. BurakD53 (talk) 00:07, 2 July 2025 (UTC)Reply

Am I going to have it? (trk-oog) – BurakD53 (talk) 21:54, 2 July 2025 (UTC)Reply

I think this is reasonable. Proto-Turkic is something like 500 BC right? So Old Oghuz would be 1500 years later, which is a long time for linguistic developments to occur. @Catonif @AmaçsızBirKişi @BurakD53 what do you think? In order to create this I need to know:

(a) which script(s) was/were the language written in? (Arabic? anything else? and is it Perso-Arabic specifically? We have a whole lot of different Arabic script varieties listed in Module:scripts/data)

(b) what are the ancestor(s)? presumably just Proto-Oghuz?

(c) what is the correct name? Old Oghuz or Middle Oghuz?

(d) what are the direct descendant(s)? maybe Turkmen and Old Anatolian Turkish? Is Salar a descendant or does it descend from a sister language?

(e) how different is this from Old Anatolian Turkish? Could we alternatively make this an etym-only variant of OAT?

Benwing2 (talk) 02:27, 3 July 2025 (UTC)Reply

(a) Arabic

(b) Proto-Turkic > Proto-Oghuz > Old Oghuz

(c) Old Oghuz or just Oghuz

(d) Salar is a descendant. Turkmen, OAT and Salar are exact descendants.

(e) No, we can’t. It is more archaic than Old Anatolian Turkish. The part I wrote as Modern Oghuz in the table above also applies to OAT: Old Oghuz temürgen, OAT demren; OO arqamak, OAT aramaq; OO tuğrağ, OAT tuğra; OO satğaşmaq, OAT sataşmaq; OO bâqırmaq, OAT bağırmaq; OO ö(:)tünç, OAT ödünç... So it is really different. BurakD53 (talk) 02:49, 3 July 2025 (UTC)Reply

OO çekük, OAT çeküç BurakD53 (talk) 02:54, 3 July 2025 (UTC)Reply

OK. Keep in mind it's possible for an ancestor of a language to be an etym-only variant of it (as with Old Italian vs. Italian) but if you think they're different enough that this doesn't make sense, I'll follow your advice. However we need to establish the time periods clearly; Wikipedia says that OAT was spoken from the 11th to the 15th centuries, which overlaps with the 11th century time frame for Old Oghuz. Benwing2 (talk) 02:59, 3 July 2025 (UTC)Reply

Then Wikipedia is wrong, because the earliest Old Anatolian Turkish work was only written in the 13th century. Location: Eastern Anatolia. There are no written works before that. Mahmud al-Kashgari wrote his dictionary in the 11th century. Location: Central Asia and probably part of Iran. If we take Old Oghuz to be the language of the Oghuz Yabgu state, we can place it in the 9th-11th centuries, thus including the data recorded by Arab travelers passing through the Oghuz Yabgu territory during those years. BurakD53 (talk) 03:28, 3 July 2025 (UTC)

All right, once I hear from @Catonif and @AmaçsızBirKişi I will create the L2. Benwing2 (talk) 03:36, 3 July 2025 (UTC)Reply

Support

AmaçsızBirKişi (talk) 06:28, 3 July 2025 (UTC)Reply

Support, thank you for the involvement. :) So what code are we settled on in the end? Because looking back at it xqa-ogz made perhaps more sense (not that it's a crucial detail, I don't mean to slow this down). Catonif (talk) 10:09, 3 July 2025 (UTC)Reply

OK. Let's get xqa-ogz as an L2, so I can create its entries. Later, if I have time to deal with xqa-arg and to find out how many lemmas there are, I’ll request that one too.

Support – BurakD53 (talk) 18:36, 3 July 2025 (UTC)Reply

Shouldn't it be Old Oghuz rather than just Oghuz, which is properly the name of a family? What is the name in the literature? Benwing2 (talk) 18:44, 3 July 2025 (UTC)Reply

Makes sense. xqa-ogz should be Old Oghuz. – BurakD53 (talk) 18:47, 3 July 2025 (UTC)Reply

Careful with the name "Old Oghuz" though: some Turkish scholars use it to refer to Old Anatolian Turkish. Bartanaqa (talk) 18:47, 3 July 2025 (UTC)

@Bartanaqa Do you know how literature usually calls DLT Oghuz? Catonif (talk) 18:53, 3 July 2025 (UTC)Reply

According to a book I have, titled "Ana-Oğuzca Durum Morfemleri" by Kenan Azılı:

Proto-Oghuz
- Salar
- Selchuk Oghuz
  - Turkmen
  - Horasan Turkmen
  - Old Anatolian Turkish
    - Turkish (<Ottoman)
    - Gagauz
    - Azerbaijani

BurakD53 (talk) 19:12, 3 July 2025 (UTC)Reply

Alternatively Medieval Oghuz is an option but Old Anatolian Turkish is also a medieval language. – BurakD53 (talk) 18:56, 3 July 2025 (UTC)Reply

But I really liked it. It is also used for the early Oghuz in academia. – BurakD53 (talk) 19:00, 3 July 2025 (UTC)

I mean can't we just call it "Oghuz" or just like we are doing "Proto-Oghuz". Or alternatively "Middle Oghuz" but icl Old Oghuz might be the most fit. Bartanaqa (talk) 19:06, 3 July 2025 (UTC)Reply

Other possible names are "Common Oghuz" (if this is truly the ancestor of all attested Oghuz languages) or "Early Oghuz". Benwing2 (talk) 19:13, 3 July 2025 (UTC)Reply

Alright, sources I checked simply call these attestations "Oghuz", which for our scopes is too generic. "Middle Oghuz" would make sense but it is confusing to have a "Middle" older than an "Old" (OAT). "Common" and "Proto-" usually refer to theoretical concepts rather than attested languages. "Medieval" is undeniably true but perhaps too vague for a language with a definition this specific (i.e. DLT). "Early" is a synonym of "Old" much less common in language names. I say we adopt "Old", and if it isn't a recognised label we will make it one. Catonif (talk) 19:30, 3 July 2025 (UTC)Reply

All right, we should just go with Old Oghuz or maybe Early Old Oghuz. If no further discussion I'll go with Old Oghuz. Benwing2 (talk) 20:03, 3 July 2025 (UTC)Reply

Early Old Oghuz is a perfect match, this way it won't be confused with Old Anatolian Turkish. – BurakD53 (talk) 20:09, 3 July 2025 (UTC)Reply

@BurakD53 @Catonif @Bartanaqa @AmaçsızBirKişi I created this language under the name "Early Old Oghuz" and put Salar, OAT and Turkmen as descendants. I don't know if putting Salar as the descendant is correct; it wasn't even specified as an Oghuz language, which I changed. Benwing2 (talk) 23:37, 3 July 2025 (UTC)Reply
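
As a rough illustration of the ancestry agreed in this thread (Proto-Turkic > Proto-Oghuz > Early Old Oghuz, with Salar, Turkmen and Old Anatolian Turkish as descendants), here is a minimal Python sketch. The dictionary layout and the function names are assumptions made for illustration only; this is not the format of Wiktionary's language data modules, and using xqa-ogz for the new language assumes the code settled on earlier in the discussion.

```python
# Parent links as described in the thread. The dict layout is an
# illustrative assumption, not Wiktionary's actual data format.
ANCESTOR = {
    "trk-ogz-pro": "trk-pro",  # Proto-Oghuz descends from Proto-Turkic
    "xqa-ogz": "trk-ogz-pro",  # Early Old Oghuz (code assumed from the thread)
    "slr": "xqa-ogz",          # Salar
    "tk": "xqa-ogz",           # Turkmen
    "trk-oat": "xqa-ogz",      # Old Anatolian Turkish
}


def ancestor_chain(code):
    """Return the ancestors of `code`, nearest first."""
    chain = []
    while code in ANCESTOR:
        code = ANCESTOR[code]
        chain.append(code)
    return chain


def descendants(code):
    """Return the codes whose direct ancestor is `code`, sorted."""
    return sorted(child for child, parent in ANCESTOR.items() if parent == code)


print(ancestor_chain("trk-oat"))
print(descendants("xqa-ogz"))
```

Storing only the direct parent keeps each relationship stated once, so moving Salar to a sister branch later (if it turns out not to descend from Early Old Oghuz) would be a one-line change.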

Thank you for all your efforts on this topic. And I also thank everyone else who has been involved with this topic for their support. Salar is an Oghuz language, and that's academically correct. It's not wrong this way. But we’ll get a clearer idea over time whether it actually descends from Early Old Oghuz. – BurakD53 (talk) 23:51, 3 July 2025 (UTC)

Thank you. Although "Early Old Oghuz" seems to imply the presence of "Late Old Oghuz" as another label. For what it's worth, in an informal vote on Discord "Old Oghuz" got 3/5 votes. Catonif (talk) 09:39, 4 July 2025 (UTC)Reply

No, actually there is no need. Late Old Oghuz is Old Anatolian Turkish. Old Oghuz has been used to refer to OAT in academia. – BurakD53 (talk) 13:06, 4 July 2025 (UTC)

Yeah, since Turkmens used Chagatai to produce documents until the 18th century and the Salar just didn't produce any, the only other medieval Oghuz language is OAT. Bartanaqa (talk) 13:25, 4 July 2025 (UTC)

I think Early Oghuz would be better. Yes, they are descendants of all Oghuz, because we're not sure if Oghuz in Karakhanid were homogeneous. Actually, we know that it wasn't. – BurakD53 (talk) 19:38, 3 July 2025 (UTC)Reply

Name of the Yalë / Yale language


I was looking at the page for bo and saw an entry for the language "Yale". I didn't know if this was an actual language, some kind of secret code used by Yale University students, or a spurious entry. When I tried to look for the language here on Wiktionary, I didn't find it at Yale, but I eventually found it at Category:Yale language which links to the Wikipedia page for the Yalë language. Before I can create an entry for the language's name, there is a question: should the language be canonically called Yale or Yalë on Wiktionary? And I believe this is a correct place to ask.

Evidence: Currently the categories and entry section headers on Wiktionary do not use the diaeresis. The Wikipedia page, Wikidata Yalë (Q2992915), and ELP use Yalë. Glottolog does not use the diaeresis in the page title but the comment on endangerment does. SIL and a 2020 paper from SIL-PNG authors do not use the diaeresis. Note that this paper uses Yade in the pdf name, though it primarily uses Yale. The paper is authored by Aannestad based on data left by the Campbells, who died before the grammar could be written up formally (mentioned on pg 5). On page 10 it says: "Yale has also been called ‘Yade’ and ‘Yare’, due to different transcriptions of the sound ; and in the Campbells’ orthography, it is properly spelled ‘Yalë’." This suggests to me that the canonical name here should be Yalë, but I don't think that's a decision I can or should make unilaterally.

A decision may entail:

Creating a Yalë page with hatnote links between it and the Yale page
Updating section headings for existing entries (e.g. bo#Yale would be changed to bo#Yalë)
Updating category names (e.g. Category:Yale lemmas to Category:Yalë lemmas)
Creating a Wiktionary:About Yalë page, if it would contain something not already at Category:Yale language
Creating redirects or entries for common variant spellings/names for the language (e.g. Yale (language), Yadë, Yade, Yare, Nagatman, Nagatiman) that point to the canonical language name

Misc:

My interest in this topic is mainly just internal consistency: if words of this language are present in Wiktionary, then the word naming the language should itself have an entry. (I'm not planning to make long term contributions on the topic.)
The language currently only has 8 terms here on Wiktionary, and the English Wikipedia page actually defines more words from this language than Wiktionary does. I don't know if that means the language is essentially "out of scope" for Wiktionary or if it is currently a "stub".

Solid kalium (talk) 23:51, 28 June 2025 (UTC)Reply

Usually we don't include diacritic marks in the canonical names of languages if there is doubt as to whether the diacritic belongs. Wikipedia is not a good reference to use for this because they have a strong bias (based largely on Wikipedia user Kwamikagami) towards including diacritical marks regardless of what the literature prefers. Since Glottolog, SIL/Ethnologue and numerous sources agree on not including the diacritic, and there is no possibility of ambiguity or confusion, I would oppose a rename. Benwing2 (talk) 19:58, 30 June 2025 (UTC)Reply

Thanks! I've added definition entries for this language and people at Yale#Etymology 2.

I don't know if there's value in adding redirects or mentions at the alternate spellings/names. I haven't looked to see where they might be used other than in enumerated lists of alternate names. If you'd like me to add these, just let me know. Solid kalium (talk) 02:42, 1 July 2025 (UTC)Reply

Also, just adding that I realized "in the Campbells’ orthography, it is properly spelled ‘Yalë’" means that the name of the language, in the language itself, in a particular orthography, is spelled with the diacritic. Which is distinct from the name of the language when writing about it in English. Solid kalium (talk) 03:00, 1 July 2025 (UTC)Reply

Yup, exactly. As for creating entries for common variant spellings of the language name, that is completely fine as long as WT:CFI is respected, which means in practice that there exist at least three uses of the given spelling in "durably-archived media" or whatever (academic papers, etc.). I assume that this is the case for all of the spellings you list above. As for there being only 8 terms here, that just means no one has gotten around to adding more of them; no natural languages are out of scope for Wiktionary. The only thing is that if there's a practical orthography that has any use at all, it's best to cite the terms in that orthography (if possible) rather than using ad-hoc IPA-based spelling. I don't know what orthography the terms in the Wikipedia wordlist are written in so you'd have to poke around a bit to see if it matches the Campbells' orthography. Benwing2 (talk) 03:17, 1 July 2025 (UTC)Reply

Thanks for taking the time to inform me! I'll keep this in mind when I contribute in the future. Solid kalium (talk) 15:53, 1 July 2025 (UTC)Reply

July 2025

Yeniseian languages


The family tree is currently laid out like this, following an older classification scheme:

Proto-Yeniseian:
- Northern-Yeniseian:
  - (...)
- Southern-Yeniseian:
  - (...)

...but it should be like this instead, as given in the Wiktionary:Proto-Yeniseian entry guidelines (and as per Vajda 2024:371, which is also the source we use on Wiktionary):

Proto-Yeniseian:
- Ketic:
  - Ket:
  - Yug:
- Kottic:
  - Assan:
  - Kott:
- Arinic:
  - Arin:
- Pumpokolic:
  - Pumpokol:

I do not request language codes for the family branches, but they could be useful in the future where we might have more branch-specific reconstructions.

AmaçsızBirKişi (talk) 10:00, 2 July 2025 (UTC)Reply
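
For readers who want the proposed grouping in a machine-readable form, the four-branch layout above can be sketched as a small Python mapping. The variable and function names are hypothetical and chosen only for this illustration; nothing here reflects how Wiktionary's modules actually store families.

```python
# Proposed Yeniseian branches and their member languages, as listed above.
# Names are illustrative assumptions, not Wiktionary module identifiers.
YENISEIAN_BRANCHES = {
    "Ketic": ["Ket", "Yug"],
    "Kottic": ["Assan", "Kott"],
    "Arinic": ["Arin"],
    "Pumpokolic": ["Pumpokol"],
}


def branch_of(language):
    """Return the branch a language belongs to, or None if it is not listed."""
    for branch, members in YENISEIAN_BRANCHES.items():
        if language in members:
            return branch
    return None


print(branch_of("Yug"))
print(branch_of("Kott"))
```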

I don't think there's any way to reorganize the subfamilies of Yeniseian without creating codes for them. Also, I recently edited и and ит, both of which mention Proto-Ketic in their etymology sections, for which we have no code. If we make a code for the Ketic family, we may as well add "-pro" to it and create a protolanguage for the family while we're at it. —Mahāgaja · talk 10:56, 2 July 2025 (UTC)Reply

Having something like (for Proto-Ketic) would be useful. Could a family/branch language code double as a proto-language code, too? To prevent clutter, of course.

AmaçsızBirKişi (talk) 11:21, 2 July 2025 (UTC)Reply

Proto-language codes are (almost?) always just the family code followed by -pro. They do need to be distinct. —Mahāgaja · talk 14:43, 2 July 2025 (UTC)Reply
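
The naming convention just described can be sketched in a few lines of Python. This is only an illustration of the "family code plus -pro" pattern; the function names are made up for the example, and since the convention is described above as holding "(almost?) always", exceptions may exist that this sketch does not handle.

```python
PROTO_SUFFIX = "-pro"


def proto_code(family_code):
    """Build a proto-language code from a family code by appending -pro."""
    return family_code + PROTO_SUFFIX


def family_code_of(code):
    """Return the family code behind a proto-language code, or None
    if the code does not follow the -pro pattern."""
    if code.endswith(PROTO_SUFFIX):
        return code[: -len(PROTO_SUFFIX)]
    return None


# Codes taken from earlier in this page: trk-pro and trk-ogz-pro.
print(proto_code("trk"))
print(family_code_of("trk-ogz-pro"))
```

Because the two codes must stay distinct, the family code and its proto-language code are always different strings under this pattern.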

If so, then we would need:

Ketic, Proto-Ketic: (We have a ton of reconstructed Proto-Ketic lemmas.)

Kottic, Proto-Kottic: (Not as much as Ketic, but it's quite distinct from the rest.)

Arinic, Proto-Arinic: (This one will also be useful for Xiong-nu lemmas, in literature it's called Old Arin, but Proto-Arinic works I guess.)

Pumpokolic, Proto-Pumpokolic: (This one will also be useful for Xiong-nu and Jié lemmas.)

Could you do this? I doubt if anyone would object to this at all.

AmaçsızBirKişi (talk) 16:06, 2 July 2025 (UTC)Reply

I know next to nothing about Yeniseian but in the interests of parsimony, can any of the above proto-languages be made etym-only varieties of Proto-Yeniseian? I don't know how old this family is or how different the various branches are. Benwing2 (talk) 03:02, 3 July 2025 (UTC)Reply

Yes, we don't need more than just etym-only variants.

AmaçsızBirKişi (talk) 06:25, 3 July 2025 (UTC)Reply

Cumbric


(Notifying RichardW57, Arafsymudwr, Llusiduonbach, Linguoboy, Silmethule, Brutal Russian, Mellohi!, Silmethule, AryamanA, Caoimhin ceallach, Exarchus, Mellohi!, Pulimaiyi, Victar): Although we have some lemmas in Cumbric in main space, the language is in fact totally unattested, only reconstructed. And it's not even reconstructed on the basis of an attested daughter language, but solely on the basis of place names in England and Scotland. Not enough is known about the language for us to say with any certainty how it differed from Proto-Brythonic, so I propose that we change Cumbric from being a full-fledged L2 language to being an etymology-only variant of Proto-Brythonic. Thoughts? —Mahāgaja · talk 07:15, 3 July 2025 (UTC)Reply

For what it's worth, Jackson (1994, Language and history of early Britain, 4th ed.) states that three words are definitely Cumbric: kelchyn, galnes/galnys, and mercheta. That brings us to the argument we've had here before about whether we should consider these true Cumbric words or Latin words with Cumbric roots (because they occur in Latin texts). I'm not sure where we stand on this. – Caoimhin ceallach (talk) 22:50, 6 July 2025 (UTC)

Oh, that's true. I had forgotten about those. I don't think we've ever come to a consensus about how to handle words in barely attested languages that are only mentioned (not used) in a text in another language. Personally, I'm willing to keep Cumbric as a full language for the sake of these four terms (three different lexemes). But I do still think that any other Cumbric words should be listed as (reconstructed) Proto-Brythonic rather than reconstructed Cumbric, as we just don't know enough about it to distinguish reconstructed Cumbric from PBr. —Mahāgaja · talk 08:21, 7 July 2025 (UTC)Reply

As I've stated in some earlier discussions, I am of the opinion that languages that are only attested through another language are unattested. An etym-only code seems fine specifically for these terms that seem to be borrowed from the language, but the attestation through another language also means the language is reshaped so much that any analysis becomes tricky.

On that note, I think we have a bunch of other languages in a similar situation. For instance CAT:Thracian lemmas and CAT:Dacian lemmas seem to be filled with reconstructions that are based on borrowings, even though the attested material is so scarce, that a good reconstruction seems very difficult. I would also like to treat such terms as basically substrate terminology. Thadh (talk) 11:38, 7 July 2025 (UTC)Reply

Update Baltic #2 (unattested L2 languages → etym-only code)


Wiktionary:Language treatment requests#Update Baltic (Golyad language & Dnieper Baltic group / classification of Galindian)

Sorry for pinging everyone who responded in the previous topic. @Benwing2, @Mahagaja, @Thadh, @Chuck Entz, @Fay Freak.

Since adding a new L2 code for the Balto-Slavic languages is an "impossible quest", I propose, as was suggested in the previous discussion, to convert all Baltic languages that are not attested in writing from L2 languages to etymology-only languages as part of Proto-Balto-Slavic ine-bsl-pro:

Curonian xcu
Selonian sxl
Semigallian xzm
Skalvian svx
Galindian xgl (+ add synonym "West Galindian")

And also to add, as I asked earlier, a code for the language of the Dnieper Baltic subgroup, but this time as an etymology-only code:
