Wiktionary:Language treatment requests/Archives/2020-24


RFM discussion: March–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

We currently call this "Mpuono"; "Mbuun" is much more common in the (admittedly limited) scholarly literature. For example, compare google books:"Mbuun" grammar and google books:"Mpuono" grammar. —Μετάknowledge^{discuss/deeds} 05:16, 10 March 2020 (UTC)

Renamed. —Μετάknowledge^{discuss/deeds} 23:26, 12 May 2020 (UTC)

RFM discussion: March–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

We currently refer to the other languages most commonly referred to as "Lele" with parenthetical geographic disambiguators (Lele (Guinea), Lele (Papua New Guinea), Lele (Chad)). I suggest we do the same for and change "Bashilele" to "Lele (Congo)". There is an argument that "Lele (Congo)" should be "Lele (Democratic Republic of the Congo)", but that seems too long and none of these languages are spoken in the Republic of the Congo, so I think it's safe. —Μετάknowledge^{discuss/deeds} 05:30, 10 March 2020 (UTC)

Renamed. —Μετάknowledge^{discuss/deeds} 23:33, 12 May 2020 (UTC)

RFM discussion: March–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming the Sangu languages~~

We currently call "Chango", which is quite infrequently used. We could go for "Isangu" (with the language prefix), because there is another Bantu language called Sangu — or we could bite the bullet and go for geographical disambiguation with "Sangu (Gabon)" for and "Sangu (Tanzania)" for , which is what Wikipedia does, and is probably least misleading. —Μετάknowledge^{discuss/deeds} 04:29, 31 March 2020 (UTC)

Renamed. —Μετάknowledge^{discuss/deeds} 04:41, 11 May 2020 (UTC)

RFM discussion: March–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge into~~

This is a case where a single language spoken in two countries has been given two codes. Mushungulu is the dialect of Zigula spoken in Somalia. Maarten Mous says: "I interviewed some of the refugees . They claim there is no difference between the way they speak Zigula and the Zigula of those who never left the area; the Zigula speakers present agreed to this. Noting down some expressions, I had the same impression when comparing this to my Zigula field notes from 1994." —Μετάknowledge^{discuss/deeds} 19:45, 31 March 2020 (UTC)

Merged. —Μετάknowledge^{discuss/deeds} 04:49, 11 May 2020 (UTC)

RFM discussion: March–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Split Bafaw and Balong~~

Currently, the code is called "Bafaw-Balong" in an attempt to cover the two dialects. However, these "dialects" are quite divergent, though fairly closely related. Maho (2009) separates them, as does the Linguistic Atlas of Cameroon. Emmanuel Chia suggests that heavy borrowing has confused linguists into considering them the same language in The Bafaw Language (Bantu A10) (which treats Bafaw and not Balong, considering it completely separate). I suggest that we rename to "Bafaw" and make a new code for Balong, maybe bnt-bal. —Μετάknowledge^{discuss/deeds} 20:12, 31 March 2020 (UTC)

Renamed and added . —Μετάknowledge^{discuss/deeds} 05:06, 11 May 2020 (UTC)

RFM discussion: March–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Adding Lwel and Mpiin~~

The Boma-Dzing languages are a bit of a mess, as Dzing was given a code but none of its sister languages were initially. Lwel is considered a distinct language by Maho (2009) and by {{R:nzd:Crane}}, and seems to have a grammar (Éléments de grammaire morphologique de la langue lwel) that I can't find anywhere, but the forms cited from it look like those of a distinct language. I suggest a new code, bnt-lwl. —Μετάknowledge^{discuss/deeds} 21:02, 31 March 2020 (UTC)

As for Mpiin, it's also considered distinct by Maho (2009) (and, I should add, these opinions are upheld by Van de Velde et al. (ch.2, by Hammarström)), and a selection of words in Koni Muluwa (2010) seem similar to, but consistently distinct from, other Boma-Dzing languages. I suggest a new code, bnt-mpi. —Μετάknowledge^{discuss/deeds} 05:44, 1 April 2020 (UTC)

Both added. —Μετάknowledge^{discuss/deeds} 05:29, 11 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Retire~~

This code for "Mengisa" is listed as a duplicate code by WP. Van de Velde et al. (ch.2, by Hammarström) say: "The Eton that most Mengisa now speak is not linguistically remarkable and therefore we count it as an Eton variety". (The Mengisa originally spoke .) —Μετάknowledge^{discuss/deeds} 00:46, 1 April 2020 (UTC)

Support. - -sche (discuss) 06:15, 6 April 2020 (UTC)

Removed. —Μετάknowledge^{discuss/deeds} 05:16, 11 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge and as Central Teke~~

Van de Velde et al. (ch.2, by Hammarström) say: "are commonly enumerated separately but the recent comparison by Raharimanantsoa (2012) shows that such a distinction is untenable, wherefore we count them as the same". I am agnostic as to which code to choose, but the language should be called "Central Teke". —Μετάknowledge^{discuss/deeds} 00:55, 1 April 2020 (UTC)

Merged into . —Μετάknowledge^{discuss/deeds} 06:33, 11 May 2020 (UTC)

RFM discussion: March–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

Almost everybody calls it "Tiv", except us (we have "Tivi"). Compare google books:"Tiv language" and google books:"Tivi language" (which has almost nothing except Library of Congress categorisation), for example. —Μετάknowledge^{discuss/deeds} 06:10, 30 March 2020 (UTC)

Support. —Mahāgaja · talk 05:48, 1 April 2020 (UTC)

Support. - -sche (discuss) 06:16, 6 April 2020 (UTC)

Renamed. —Μετάknowledge^{discuss/deeds} 04:37, 11 May 2020 (UTC)

RFM discussion: March–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

Currently "Augila"; many spellings exist, but all modern linguistic work on the language in English (and most of the material on the town itself) refers to it as "Awjila". —Μετάknowledge^{discuss/deeds} 23:10, 19 March 2020 (UTC)

@Benwing2: There are a bunch of language renames on this page that I'd like to do (this being one of them). If I just ignore the categories, will your regular bot runs create the new ones and delete the old, or should I move the categories instead? —Μετάknowledge^{discuss/deeds} 04:17, 11 May 2020 (UTC)

@Metaknowledge I do a bot run every 3 days to create new categories, but it doesn't delete old ones. I have periodically deleted old categories listed in CAT:Empty categories, but I don't do it on a regular basis. I *think* all the old categories will eventually end up on that page. It's probably better to move the categories and manually fix the lemmas belonging to the renamed languages to have the new language name in their headers. For languages with very few lemmas it's probably not a big deal to do it manually; otherwise, let me know and I'll do it by bot. Benwing2 (talk) 04:29, 11 May 2020 (UTC)

Yeah, none of them are a pain in the ass individually, I was just hopeful due to how many languages there are to rename. (Awjila only has one entry, but I had to move 10 categories for it!) Thanks for the offer, I'll let you know if any have a significant number of entries. —Μετάknowledge^{discuss/deeds} 04:36, 11 May 2020 (UTC)

Renamed. —Μετάknowledge^{discuss/deeds} 04:36, 11 May 2020 (UTC)

RFM discussion: March–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming the Kota languages~~

The name "Kota" is commonly used for two very small languages, so this seems like another good opportunity to use parenthetical disambiguators rather than force one to use a name with a language prefix, rarely encountered in the literature. I suggest that we rename from "Kota" to "Kota (India)", and from "Ikota" to "Kota (Gabon)". —Μετάknowledge^{discuss/deeds} 05:35, 10 March 2020 (UTC)

Both renamed. —Μετάknowledge^{discuss/deeds} 23:36, 12 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge into and rename~~

Van de Velde et al. (ch.2, by Hammarström) say that "studies in the field emphasise that Lalia is simply a variety in the Bangando area with no special status vis-a-vis other Bongando varieties". While we're at it, we currently call "Longandu", which seems to be one of the rarer names. In the tradition of lopping off language prefixes and matching the usual final vowel, I suggest we rename it to "Ngando". —Μετάknowledge^{discuss/deeds} 01:21, 1 April 2020 (UTC)

Merged. As for the renaming, the issue is more complex than I had realised: there is also (Ngando language (Central African Republic)), which we perplexingly call "Bagandou". We could change the parentheticals later, but for now I'm going with "Ngando (Congo)" for and "Ngando (Central African Republic)" for . —Μετάknowledge^{discuss/deeds} 21:38, 11 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

We currently call this "Longandu". I assume the reason for avoiding "Ngandu" was originally potential conflict with , which we call "Bangandu". Given that "Bangandu" seems like a perfectly fine name, I suggest we rename to "Ngandu". —Μετάknowledge^{discuss/deeds} 01:27, 1 April 2020 (UTC)

Well, I seem to have gotten these confused while attempting to clean them up, which is as good an argument as any for better names. We actually call "Lingombe", from which we should remove the language prefix. Unfortunately, that brings it into potential conflict with the poorly named "Bangando-Ngombe" , variously considered a dialect of Bangandu . I am instead renaming it to "Ngombe (Congo)" for maximal clarity, and will create a new thread to discuss . —Μετάknowledge^{discuss/deeds} 23:17, 11 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge into~~

Van de Velde et al. (ch.2, by Hammarström) say: "the only source on this language (Hackett and van Bulck 1956:74) has it as a dialect of Nyanga-li ." If that's the sole source, I have no idea why it ever got a code. —Μετάknowledge^{discuss/deeds} 01:33, 1 April 2020 (UTC)

Merged. —Μετάknowledge^{discuss/deeds} 23:32, 11 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge and into~~

Van de Velde et al. (ch.2, by Hammarström) say: "Pelende and Lonzo denote political rather than ethnolinguistic sub-entities of Yaka ... as explicit linguistic data, whenever available, confirms". —Μετάknowledge^{discuss/deeds} 01:38, 1 April 2020 (UTC)

Merged. —Μετάknowledge^{discuss/deeds} 23:33, 11 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge into~~

Van de Velde et al. (ch.2, by Hammarström) say: "the field research of Kraal (2005 1–7) finds this distinction untenable from a linguistic and ethnographic point of view". "Makonde" is the usual term. —Μετάknowledge^{discuss/deeds} 02:02, 1 April 2020 (UTC)

Merged. —Μετάknowledge^{discuss/deeds} 23:36, 11 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge Shona dialects , , into Shona~~

Ehret and Kinsman (1981) "Shona Dialect Classification and Its Implications for Iron Age History in Southern Africa" have helpful trees of the Shona dialects, which confirm that , , are all mutually intelligible to a high degree with the other dialects of Shona . (In fact, and are subdialects of the larger dialects that most Shona dictionaries treat, and isn't even a valid clade, but merely a geographic convenience.) —Μετάknowledge^{discuss/deeds} 02:17, 1 April 2020 (UTC)

Merged. —Μετάknowledge^{discuss/deeds} 23:38, 11 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Lala languages~~

The Bantu Lala language (South Africa) is certainly not the same as Zulu, as should be clear just from the sample in the WP article. Unfortunately, our current "Lala language" will have to be moved to make way. In the emerging tradition of geographic parentheticals, I suggest that be renamed to "Lala (New Guinea)" and a new code, maybe , be created for "Lala (South Africa)". (As a side note, there are also Lala-Roba and Lala-Bisa, which thankfully have hyphenated disambiguators.) —Μετάknowledge^{discuss/deeds} 02:33, 1 April 2020 (UTC)

renamed and created. —Μετάknowledge^{discuss/deeds} 21:07, 11 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merging into~~

There's a lengthy SIL report on the Nyiha languages, and skimming it suggests that the Malawian () and Tanzanian () dialects are mutually intelligible. Chances are that the codes haven't been used to separate the two countries' dialects in a consistent way anyway, because they've just been called "Shinyiha" (with the language prefix) and "Nyiha" (without) — the latter is what we should use. —Μετάknowledge^{discuss/deeds} 05:21, 1 April 2020 (UTC)

Merged. —Μετάknowledge^{discuss/deeds} 22:15, 12 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

We currently call "Kikumu"; the language is most commonly referred to as "Komo", but we already have "Komo", which is not easily disambiguated (it's spoken in three countries and belongs to an obscure group). We can still do better than "Kikumu", though — I suggest we lop off the language prefix and call it "Kumu", which is more common than the name we currently use. —Μετάknowledge^{discuss/deeds} 06:10, 1 April 2020 (UTC)

Renamed. —Μετάknowledge^{discuss/deeds} 22:16, 12 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

We currently call this "Oluwanga"; we should drop the language prefix, as we normally do and as matches the scholarly literature, and call it "Wanga". (Note that we need to reconsider the Luhya codes altogether, but that can be dealt with later.) —Μετάknowledge^{discuss/deeds} 20:23, 1 April 2020 (UTC)

Support removing the prefix. Comparing google books:"Wanga" Bantu OR Kenya language to google books:"Hanga" Bantu OR Kenya language, I notice that although the tribe seems to be mostly called the Wanga, the language is not infrequently "Hanga". (Another alt name is "Luwanga"). - -sche (discuss) 06:04, 6 April 2020 (UTC)

Renamed. —Μετάknowledge^{discuss/deeds} 22:23, 12 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

We currently call "Bolo", which refers to a peripheral dialect of this language. This SIL report says that a conference decided on "Kibala", and Angola's Institute of National Languages followed suit (the alternative suggestion of "Kibala-Ngoya" having been rejected due to its historically derogatory usage). I suggest we do the same and use "Kibala". —Μετάknowledge^{discuss/deeds} 20:47, 1 April 2020 (UTC)

Renamed. —Μετάknowledge^{discuss/deeds} 23:10, 12 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

We currently call "Shamay", which is a rather rare name for it. WP uses "Samay", but I suggest we follow Van de Velde et al. (ch.2, by Hammarström) and use "Osamayi". —Μετάknowledge^{discuss/deeds} 20:57, 1 April 2020 (UTC)

Renamed to "Osamayi". —Μετάknowledge^{discuss/deeds} 23:12, 12 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge into~~

Van de Velde et al. (ch.2, by Hammarström) and indeed every linguistic source I can find has this as synonymous with or a dialect of Mashi . I can only see a bit of the relevant part in Laranjo Medeiros (1981) VaKwandu on BGC, where he seems rather confused about the language in general, but there may be useful information there if someone can see it (or if anyone still has academic libraries operating in their part of the world). —Μετάknowledge^{discuss/deeds} 21:16, 1 April 2020 (UTC)

Merged. —Μετάknowledge^{discuss/deeds} 23:17, 12 May 2020 (UTC)

RFM discussion: April–May 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge into~~

The Yauma are ethnographically distinct, but speak a Mbunda dialect according to Fleisch (2000). I see a journal article in 1970 that says Yauma is synonymous with "newer Mbunda" (in contrast with "Old Mbunda"), and I'm not really sure what's going on there, but it also supports a merger. I cannot find Yauma mentioned in the usual sources otherwise. —Μετάknowledge^{discuss/deeds} 21:32, 1 April 2020 (UTC)

Merged. —Μετάknowledge^{discuss/deeds} 23:21, 12 May 2020 (UTC)

RFM discussion: May–June 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

Currently called "Chuwabu". There are quite a lot of spellings, influenced by English, Swahili, Portuguese, and Africanist notation, and none of them are unquestionably more common, but this shows that the case for "Chuabo" is the best, which is supported by paging through the Google Books results as well. —Μετάknowledge^{discuss/deeds} 17:11, 5 May 2020 (UTC)

Support. - -sche (discuss) 01:39, 12 May 2020 (UTC)

Renamed. —Μετάknowledge^{discuss/deeds} 21:33, 15 June 2020 (UTC)

RFM discussion: May–June 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

We currently call "Bangando-Ngombe". It is variously considered a dialect of Bangandu : Wikipedia cites Glottolog to support a claim that the code is "spurious", and the Atlas linguistique de l'Afrique centrale calls it "une variété très voisine" to Bangandu. Without a clearer statement about mutual intelligibility or specific fieldwork, I am hesitant to merge it. However, the current name, which hyphenates a (differently spelt!) version of the macrolanguage with the dialect's name, is clearly undesirable. I suggest we rename it to "Ngombe (Central African Republic)" (the national parenthetical serving to disambiguate from "Ngombe (Congo)"). @-sche —Μετάknowledge^{discuss/deeds} 23:24, 11 May 2020 (UTC)

Seems reasonable; go for it. - -sche (discuss) 01:38, 12 May 2020 (UTC)

Renamed. —Μετάknowledge^{discuss/deeds} 22:38, 15 June 2020 (UTC)

RFM discussion: January 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Merging Gaulish and Lepontic

I'd like to address merging Lepontic with Gaulish because it is already included in the English article provided and the terms are frequently the same except for some differences in endings development without any necessary declension change. Comparing this to Latin, as it is merged with Old Latin, this may also be extended to Gaulish and Lepontic. Furthermore, Cisalpine Gaulish is already encompassed by Gaulish even though it was written in the Old Italic alphabet just like Lepontic. HeliosX (talk) 19:55, 15 January 2020 (UTC)

Having read both Lepontic language and Cisalpine Gaulish, I'm not convinced they're similar enough to warrant merging. —Mahāgaja · talk 21:06, 15 January 2020 (UTC)

I generally push for a conservative treatment for extinct languages with small, finite corpora. We can afford to over-split a little, if that even is the case here, and be no worse off for doing so. —Μετάknowledge^{discuss/deeds} 03:13, 17 January 2020 (UTC)

RFM discussion: February 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Adding Old Omagua

This helpful source strongly suggests that early texts written are sufficiently distinct from Omagua (omg) to be considered a separate language. I don't know much about it, but if this is right, we need a new code for Old Omagua, maybe tup-oom. @Lvovmauro, -sche. —Μετάknowledge^{discuss/deeds} 06:24, 14 February 2020 (UTC)

Meh. I don't object, but I'm not convinced it's necessary. We're talking about a time difference less than that between modern and Middle English, many of the differences seem to be in whether various morphemes remain attested into the present in various positions (or, conversely, whether modern ones happened to be found in the sparse early corpus), spelling variations are clearly attributable to texts being written by potentially non-fluent outsiders vs modern linguists and natives, and modern Omagua is small and dying. (Fox is another language which underwent changes in a relatively short time period that might be considered substantial, but it seems to still be treated as one language.) It seems to me like we could adequately, and perhaps even more sensibly, handle the differences by labelling words and spellings obsolete where necessary, lemmatizing the modern forms (and conversely labelling words/forms as modern developments in cases where we can tell that they are as opposed to that they just weren't attested in the old texts). - -sche (discuss) 21:12, 18 February 2020 (UTC)

Some languages do change a lot faster than others due to social pressures (hence why glottochronology doesn't really work). I found the differences in morphology and lexicon significant, especially in the hopes that someday, someone will build the infrastructure for (modern) Omagua here. But if you're not convinced, I'm fine with shelving this; I certainly won't be working on Tupian stuff myself. —Μετάknowledge^{discuss/deeds} 03:11, 19 February 2020 (UTC)

RFM discussion: April–July 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge into~~

Van de Velde et al. (ch.2, by Hammarström) say: "Dombe is a derogatory nickname for Tonga found in Hwange district of Zimbabwe". This raises an issue with , which we currently call "Tonga (Zambia)" (to distinguish it from "Tonga (Malawi)" and "Tonga (Mozambique)") — it's spoken in Zimbabwe too. The WP article is called "Tonga (Zambia and Zimbabwe)", which is an awfully awkward name for a language spoken by 1.5 million people. I think we may have to cut our losses and stick with "Tonga (Zambia)" even after the merger. —Μετάknowledge^{discuss/deeds} 01:51, 1 April 2020 (UTC)

@-sche, Smashhoof, anyone have feelings about "Tonga (Zambia)" rather than "Tonga (Zambia and Zimbabwe)"? —Μετάknowledge^{discuss/deeds} 20:59, 11 May 2020 (UTC)

@-sche, Smashhoof, Mahagaja: Reminder that I'm still looking for input on this one. When you see a country name in a parenthetical disambiguation in a language's canonical name, do you assume that the language is exclusively spoken in that country? —Μετάknowledge^{discuss/deeds} 06:35, 15 June 2020 (UTC)

At least I would assume it's spoken primarily in Zambia, as in maybe at least 67% of speakers are in Zambia. But why not use subfamilies instead of countries as disambiguators? / could be "Tonga (Botatwe)", could be "Tonga (Southern Bantu)", and could be "Tonga (Nyasa)". —Mahāgaja · talk 06:41, 15 June 2020 (UTC)

@Mahagaja: "Primarily" is certainly correct here. But the idea of using subfamilies is an intriguing suggestion that I hadn't considered. My only fear is that those subfamilies are not very well known by any terms, as such classification is relatively recent, and to someone unfamiliar with the classification system, the word "Botatwe" would be meaningless and the concept of "Southern Bantu" would be ambiguous. —Μετάknowledge^{discuss/deeds} 06:51, 15 June 2020 (UTC)

I would personally prefer the more accurate label Tonga (Zambia and Zimbabwe), even if it's a bit clumsy. But I'm not opposed to keeping it as Tonga (Zambia) if you prefer. What I'm confused about is why you want to merge and . The chapter you're referencing splits as a separate language Toka-Leya, which suggests that language isn't mutually intelligible with Tonga. Smashhoof (Talk · Contributions) 17:25, 15 June 2020 (UTC)

@Smashhoof: Thank you for checking! I read that footnote as meaning that the "Toka-Leya dialects" were to be thought of as dialects of Tonga, but I see that Hammarström does give the sum of Toka, Leya, and "Dombe" its own line as a language. I went to check Hachipola (1991), who did a study on eight Tonga-group lects, including Toka, and says: " Toka informants all insisted that they were all legitimately Tonga. Valley speakers in particular consider their speech as the 'purer' form of Tonga of which Plateau is the corrupted version . The speakers of Plateau, Valley and Toka can also understand speakers of Ila, but with some difficulty. However, conversation can be carried out between, say, a Plateau speaker and an Ila speaker each using their own speech form. This is also true of Lenje on the one hand and Plateau on the other ". I glanced through the lexical items in the second half of the dissertation and found that these three lects were quite similar, and Toka might be closer to Valley Tonga than to Plateau Tonga. I'm really not sure where to draw the line, especially since we (by default) consider Valley and Plateau to both be . What do you think? —Μετάknowledge^{discuss/deeds} 19:17, 15 June 2020 (UTC)

@Metaknowledge I'm not sure what "Valley" refers to specifically. But I would follow Hammarström (and Glottolog) by classifying Toka, Leya, and "Dombe" as . Smashhoof (Talk · Contributions) 20:50, 19 June 2020 (UTC)

Renamed as "Toka-Leya". —Μετάknowledge^{discuss/deeds} 21:03, 25 July 2020 (UTC)

RFM discussion: August 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Adding Ashraaf

Ashraaf is the sister language to Somali, and currently included in its code despite being mutually unintelligible. Wikipedia has an entry at Ashraf dialect, despite the fact that they quote two sources that make the case for it being its own language, and their references use the usual spelling with a double a. It is somewhat poorly documented (there is no dedicated grammar), but multiple scholarly sources include vocabulary. I suggest the code cus-ash. —Μετάknowledge^{discuss/deeds} 01:33, 11 August 2020 (UTC)

I am inclined to agree/support; Green and Jones' piece outright says, elsewhere from the snippet WP quotes, that "despite all classifications of Ashraaf treating it as a dialect of Somali, our Marka speakers have intimated to us that both Marka/Somali and Marka/Maay intelligibility presents a challenge" and "despite the fact that some classifications treat Ashraaf as a dialect of Somali, Marka and Somali appear not to have a high degree of mutual intelligibility"; they say plurals are almost all different from Somali, and singulars also show some differences. (They mention the otherNames = "Af-Ashraaf", and the varieties {"Marka, Lower Shabelle"} and "Shingani", saying — as you probably saw, but as I will mention here for easier findability later — that for Shingani there is one 'theoretical article' on 'theme construction', Ajello 1984, one short grammatical sketch, Moreno 1953, and one book of pedagogical material, Abo 2007, and for Marka there is an unpublished grammatical sketch, Lamberti 1980, and an article on verb inflection, Ajello 1988.) And I see that Maay already has a separate code, reasonably (the IEL says standard Somali is "difficult or unintelligible to Maay and Digil speakers" unless they've learned it). The only thing that gives me pause is that, as they admit, many prior (albeit non-Ashraaf-specific) classifications have viewed Somali as a single language, as WP claims speakers also view it, despite the lack of mutual intelligibility. - -sche (discuss) 20:46, 11 August 2020 (UTC)

That's a matter of linguistic nationalism, coupled with an unfortunate habit of scholars like Lamberti to count everything spoken by people who identify as belonging to Somali clans as "Somali dialects". Maay, Jiddu, Garre, and many others were in this boat, but have all gotten ISO codes, except for Ashraaf. —Μετάknowledge^{discuss/deeds} 21:07, 11 August 2020 (UTC)

Added (as cus-ash with an initial entry at ilig; you may want to add diacritic stripping rules to the code and gender to the entry). - -sche (discuss) 04:20, 18 August 2020 (UTC)

RFM discussion: April 2018–October 2019

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Renaming svm

From ‘Molise Croatian’ to ‘Molise Slavic’. It seems most scholars in the field, excluding some in Croatia itself, have been switching to the latter name; see, for instance, the papers of Krzysztof Borowski or the more recent works of Walter Breu. The endonym is naš jezik (literally “our language”) or na-našo (literally “in our (manner)”), without any specific ethnonational designation; the other names are apparently recent impositions. For this see for instance Sujoldžić 2004:

Along with the institutional support provided by the Italian government and Croatian institutions based on bilateral agreements between the two states, the Slavic communities also received a new label for their language and a new ethnic identity — Croatian — and there have been increasing tendencies to standardize the spoken idiom on the basis of Standard Croatian. It should be stressed, however, that although they regarded their different language as a source of prestige and self-appreciation, these communities have always considered themselves to be Italians who in addition have Slavic origins and at best accept to be called Italo-Slavi, while the term »Molise Croatian« emerged recently as a general term in scientific and popular literature to describe the Croatian-speaking population living in the Molise.

Information about current scholarly usage is given by Walter Breu in the request for an ISO code here:

Slavomolisano: In scientific work, this name is predominantly used, either in its Italian form or in translations. As the language is "genetically" affiliated to the Serbo-Croatian macrolanguage with its dialectal continuum and the problems of its segmentation, a denomination, referring to one of the individual Standard languages of this group, e.g. Croatian or Bosnian, should be avoided, the more so, as its individual character is mainly due to the language contact with Italian and its dialects, especially that of Lower Molise.

Ethnologue, Glottolog, and SIL (as well as Wikipedia) all followed the ISO’s lead and list the language under ‘Slavomolisano’, the Italian form of ‘Molise Slavic’ (which would also be fine). AFAIK I’m the only contributor to Wiktionary in this language to date. — Vorziblix (talk · contribs) 19:57, 15 April 2018 (UTC)

If most scholars as well as Ethnologue, Glottolog, SIL, and Wikipedia all call it Slavomolisano, shouldn't we do the same, rather than call it Molise Slavic? —Mahāgaja (formerly Angr) · talk 11:46, 19 April 2018 (UTC)

Sure, that would also be fine. The two are used pretty much interchangeably in scholarly publications (so the ISO note says ‘either in its Italian form or in translations’). It’s a bit of an odd situation, given the most prominent scholar in the field (Breu) submitted the language under ‘Slavomolisano’ to the ISO (hence the adoption by all the other organizations) but uses ‘Molise Slavic’ in his own recent English publications. But if you think it’s preferable to directly follow Ethnologue and the others I wouldn’t object. — Vorziblix (talk · contribs) 12:57, 21 April 2018 (UTC)

I’ll start switching this over to ‘Slavomolisano’ one of these days if no one has any objections. — Vorziblix (talk · contribs) 03:04, 24 September 2019 (UTC)

Done. — Vorziblix (talk · contribs) 16:05, 17 October 2019 (UTC)

RFM discussion: September–November 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Recreation of Template:hwc (Hawaiian Pidgin)

I would like to propose splitting HWC (Hawaiian Pidgin) off of EN (English) due to the following reasons:

1. The discussion in Template talk:hwc is outdated. It took place back in 2012/2013. This is before the US Census recognized Pidgin as a separate language in 2015 (2-3 years later). As such, the general linguistic view of HWC's status may have changed as a result of federal recognition.

2. Many of the arguments these people used were flawed (at least in my opinion).

- They implied Hawaiian Pidgin shouldn't be classified as a creole due to its high levels of mutual intelligibility with Standard English, even going as far as to compare it with "gangsta slang" from NYC or LA. The problem with this argument is that intelligibility doesn't define a creole. Creoles are naturalized pidgins, and pidgins are: "an amalgamation of two disparate languages, used by two populations having no common language as a lingua franca to communicate with each other, lacking formalized grammar and having a small, utilitarian vocabulary and no native speakers". That perfectly defines the origins of Hawaiian Pidgin. It formed as a result of many immigrant groups learning English. In fact, Tok Pisin has a very similar origin.

- The central claim that Hawaiian Pidgin can be understood by English speakers is only a half-truth to begin with. Many English speakers say they understand "Pidgin", but that's because the majority of Pidgin speakers mix a lot of their speech with Standard English, even more so in formal contexts. Full-on Pidgin can be a tough nut to crack. Furthermore, it's interesting that they use the argument of mutual intelligibility on Wiktionary, the same website that lists Bulgarian and Macedonian as separate languages. Those two languages are more similar to each other than Standard English is to Hawaiian Pidgin, yet for some reason, the BUL-MAC languages are separated while HWC is listed as a dialect of EN.

3. Even in Wiktionary, it gets referred to as a "creole language". In the page Hawaiian Pidgin, it literally says "A creole language based in part on English used by most "local" residents of Hawaii." It's kind of odd that Wiktionary defines it as a creole language, but doesn't actually list it as one.

These are my main reasons why I do think Wiktionary should split hwc off of en. — Coastaline (talk) 00:25, 11 September 2020 (UTC)

I started that discussion in 2012, and I don't really agree with how I approached it then. Hopefully I can do a better job now. Your first and third points aren't really relevant here: the political status of hwc doesn't have any practical bearing on its linguistic status, and our entry has no authority over our language treatment; it could well be changed. But your second point gets to the crux of the matter, and I am no longer certain how to proceed. Ultimately, we need to identify texts or recordings that we can agree are "pure" hwc, as opposed to being mixed with (colloquial) English. You mention Bulgarian and Macedonian; one reason it's easy to separate them is that there are lots of texts in each, clearly belonging to one or the other. Mutual intelligibility may be one of our best guides if we can't even decide what to put in which category. —Μετάknowledge^{discuss/deeds} 02:45, 11 September 2020 (UTC)

My point wasn’t that the political status changes the linguistic status. I tried to say that federal recognition was a result of linguists changing their attitudes towards HWC, since the government wouldn’t randomly recognize it out of nowhere if it weren’t for linguistic perception. That’s why I thought the discussion needed to be revamped because there’s likely been a change in linguistic consensus from 2012 to 2020, if we’re seeing different developments like this.

And fair point about the third argument. But regarding the second one, I do think there are texts that are in pure hwc, such as the Hawaiian Pidgin Bible (Da Jesus Book). The book “Pidgin Grammar: An Introduction to the Creole English of Hawaii” also showcases pure Pidgin (although the explanations are in English). — Coastaline (talk) 04:27, 11 September 2020 (UTC)

Just to clarify: even if there's consensus to reinstate Hawaiian Pidgin as a separate language, the template {{hwc}} will not be recreated, because that's not how we work with languages anymore. Rather, an entry for hwc will be added to Module:languages/data3/h. I don't know anything about HP beyond what Wikipedia tells me, but if the entire Bible is written in the language illustrated by File:Hawaii Pidgin crop.jpg, then I do think it's appropriate to call that a separate language rather than a variety of English. It may not be as different from standard English as Bislama is (as Meta said 8 years ago), but I do get the impression that it's about as different from it as Jamaican Creole is, one big difference being that Jamaican spells things phonetically while Hawaiian sticks closer to standard English orthography (Jamaican Creole fuud vs. HP food), which makes HP easier to read for English speakers. —Mahāgaja · talk 15:24, 11 September 2020 (UTC)

Poking around the copy Google Books has snippets of, it does seem to all be like that, including e.g. "But wen da police guys wen come to da jail, dey neva find Jesus guys ova dea. So da police guys wen go back an tell da leadas: 'we wen find da jail door still yet stay lock, an da security guard standing by da doors, but wen we wen open um, nobody stay inside.' Wen da leada guys hear dat, da captain fo da temple guard an da main priest guys wen come mix up, an dey wen tink plenny wat goin happen." (I also found a book called "Pidgin to Da Max" which mentions a number of loanwords from Samoan, etc, and has its own usexes like "aalas dollahs means no mo' kala", and suggests the pronunciation of some words differs from English even when the spelling does not.) On one hand, it seems mostly intelligible, and there are substantially less intelligible dialects of English that we clearly treat as such; on the other hand, I see the argument above that it has a different origin. I have no strong opinion, but will say that on a practical level we do a poor job of covering most things we've merged into English, often having only a little vocabulary, little information on e.g. how verbs conjugate, and few usexes, since putting "wen da police guys wen come to da jail, dey neva find peopo" as a usex under police#English (as long as Hawaiian Pidgin is considered English) would probably be frowned on as inappropriately niche. - -sche (discuss) 09:42, 12 September 2020 (UTC)

Thank you for sharing your thoughts. I’d just like to mention that earlier, Metaknowledge said Bulgarian and Macedonian are separate languages since both lects can be easily identified in writing. But I do in fact believe Pidgin can be identified as well. Sure, not all Pidgin books spell things the same way but it generally keeps basic rules (such as “where” being “wea” and “the kind” being “da kine”). Pidgin writing as a result should be easily identifiable from American English. It’s kind of like Neapolitan, since it’s easy to identify it from Standard Italian but not everyone spells its words the same way, even among those who speak the same dialect of Neapolitan. Much like Pidgin, it’s not standardized but it still can be differentiated. — Coastaline (talk) 01:06, 13 September 2020 (UTC)

HWC Recognition (take 2)

@Mahagaja, @-sche In September, I requested that Wiktionary add hwc as a language. However, the discussion died down and the status quo continued. Once again, I'd like to propose recognizing hwc as a separate code from en. The main counter-argument to my first post was that there's no agreed-upon way of separating pure Pidgin from Standard English. I actually disagree with this claim, because material in pure Pidgin does exist. The entire New Testament was translated into Hawaiian Pidgin, and they're actually close to finishing the Old Testament. If you go to a Hawaii bookstore or search in Google Books, you can find more literature in pure Pidgin. The spelling tends to be uniform for most common words, with "where" being "wea" or "the" being "da". "Hamajang" and "da kine" are also uniform, and so are most Pidgin-exclusive terms. Spelling variations may occur but I don't think that stops it from being listed as a language. Wiktionary lists Neapolitan as one, yet spelling in that language is not standardized and many variations of the same word/pronunciation occur.

So what are your thoughts? Also, in case there's any confusion, Coastaline was my previous account. I switched to this current one since I recently recovered it. ― Haimounten (talk) 21:15, 16 October 2020 (UTC)

@Haimounten: These discussions can sometimes take a while. I'd encourage you to merge this into the original conversation and ping people who contributed, rather than creating a new section, which may confuse people new to this conversation. I think I now lean toward reinstating the code, but I still want to hear others' positions. —Μετάknowledge^{discuss/deeds} 21:44, 16 October 2020 (UTC)

I apologize. I will merge it with the previous discussion. ― Haimounten (talk) 22:24, 16 October 2020 (UTC)

I’m going to ping others just in case this discussion got missed. @-sche @Mahagaja — Haimounten (talk) 01:57, 3 November 2020 (UTC)

It's a grey area: much of the vocabulary is either identical to, or readily intelligible to a speaker of, standard English, and because Wiktionary is a dictionary (with entries for individual words) rather than a collection of long texts written in the lect (the way a hwc.Wikisource or even hwc.Wikipedia would be), differences in things other than vocabulary (such as grammar) are relatively less prominent / significant. However, I'm inclined to treat it as its own language (a) because of the argument that its origin suggests it's a distinct lect rather than a variant of English, and because treating it as its own thing would be consistent with how we treat Jamaican Creole or West African pidgin(s) as their own lects, and (b) because we're not meaningfully covering it as English, and I can't see us starting to: I can't see anyone putting up with wen da police guys wen come to da jail, dey neva find peopo as a usex on police, and we don't do a good job of indicating when a "general English" word is also used in a dialect, so while peopo might have its own entry with {{lb|en|Hawaiian Pidgin}}, it's unclear how anyone would know that e.g. police was used in Pidgin, since we wouldn't label it

{{lb|en|US|including|Hawaiian Pidgin|and|Midwest|and|NYC|and|New England|and|Southern US|and|AAVE|Canada|UK|Ireland|Australia|NZ|India}}.

And we have someone who wants to add content in it, and it does have its own ISO code, which we're no longer sure of our decision to retire. - -sche (discuss) 05:26, 3 November 2020 (UTC)

I'm still in favor of including hwc as a separate language, if for no other reason than that "wen da police guys wen come to da jail, dey neva find peopo" is not a sentence of English, not even a nonstandard variety. —Mahāgaja · talk 12:48, 4 November 2020 (UTC)

OK, I restored it, under code hwc and name "Hawaiian Creole" (since Ngrams suggests that's more common, and WP suggests it's also more correct, than Hawaiian Pidgin). I will say, though, there are things less "English" than "wen da police..." which we treat as en, like various dialects of England. (To pluck two examples out of the texts in the EDD, "A dooat like t'thowts a bin ower gyversum an hankeran eftre it", "Lükee zee tü 'er, 'er'th agot a rat! My eymers, 'ow 'er shak'th 'n!" These are far from the most extreme examples, just two examples I found in a quick search.) - -sche (discuss) 17:57, 4 November 2020 (UTC)

(Notifying @Coastaline, Haimounten that the code has been restored.) - -sche (discuss) 17:59, 4 November 2020 (UTC)

I would not necessarily be opposed to creating language codes for some traditional dialects of England as they are certainly as different from standard English as Scots and Yola are. —Mahāgaja · talk 19:22, 4 November 2020 (UTC)

RFM discussion: August–October 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Category:Jê languages

The names we currently use for most languages of the Northern branch of the Jê family are not the ones actually in use (either by linguists or by their speakers), and for a couple of languages not even Glottolog (let alone Ethnologue) gets it right. I suggest:

renaming Suyá (suy) as Kĩsêdjê (of which Suyá should be an alias);
renaming Kayapó (txu) as Mẽbêngôkre (of which Kayapó and Xikrin are two dialects) — if this is deemed too diacritic-heavy, Mẽbengokre is also acceptable, but Mẽbêngôkre is the standard in the field nowadays;
renaming Pará Gavião (gvp) as Parkatêjê;
renaming Apinayé (apn) as Apinajé;
creating codes at least for Pykobjê and Tapayuna;
creating a code for Proto-Northern Jê, which would be a daughter of Proto-Jê and an ancestor of all the aforementioned languages as well as Krahô and Canela.

I would also be happy to know if I can make any of the above changes myself without being an admin (I am new to Wiktionary so not everything is clear to me yet). Degoiabeira (talk) 01:19, 23 August 2020 (UTC)

In answer to the last question, on a technical level you can't make these changes without being an admin or Template Editor, since they require changing Module:languages. But at least three admins have eyes on this now (User:Mahagaja, User:Metaknowledge, and me) so any changes that need to be made will get made after discussion here. :)

As for the suggested renames: AFAICT Suyá is much more common than Kĩsêdjê (or Kisedje) overall, though most occurrences refer to the tribe, and many of the occurrences which refer to the language are from 25+ years ago, as you brought up on Mahagaja's talk page. Even then, judging by google books:Suyá OR Kĩsêdjê OR Kisedje language, the two names seem roughly equally(?) common in books from the last 25 years. On one hand, Kĩsêdjê being the autonym speaks in favor of it; on the other hand, Suyá being historically more common speaks in favor of it. (From the perspective of ease of input, both names contain diacritics, so neither is per se easy to type.) The situation with Kayapó / Mẽbêngôkre is much the same.

For Parkatêjê, it seems like that name might indeed be most common, aided by the fact that there are so many alternatives ("Gavião Perkatêjê", bare "Gavião", "Gavião do Pará", etc) that no one name is that common AFAICT.

For Apinayé, judging by both books and scholarly journal articles, Apinayé seems to be the most common name even in recent sources, and it is also most common among the sources Glottolog lists which are specifically about the language and which use one spelling or the other in their names, so that is fine as-is and should not be renamed.

I will look into Pykobjê and Tapayuna.

Creating a code for Proto-Northern Jê seems like a good idea (there are reconstructions of it), especially if you will be adding entries in it or mentioning it in etymologies of other words. :)

Pinging User:Ungoliant_MMDCCLXIV, in case you have anything to add.

- -sche (discuss) 07:41, 29 August 2020 (UTC)

I assume the other admins I mentioned above are following this, having participated in it when it was on the user talk page of one of them, but I'm going to re-ping User:Ungoliant MMDCCLXIV because I recall that pings must(?) be added in the same line as signatures in order to work. - -sche (discuss) 07:48, 29 August 2020 (UTC)

@-sche: The mention doesn't have to be in the same line as the signature, but mw:Manual:Echo seems to say that it has to be in a chunk of added lines that contains a signature. — Eru·tuon 19:22, 31 August 2020 (UTC)

Not much more than what you’ve researched yourself. Ultimately each language will have to be researched individually to find its ideal canonical name. I do recall reading that our canonical name of a native Brazilian language was considered an ethnic slur by the people who speak it. I think it was one with a bird’s name, maybe Gavião or Urubu Kaapor.

I’ll add that I believe the main criterion in choosing a canonical name should be the name used in English-language media only. If multiple English-language names exhibit roughly the same amount of use, only then should its status as an autonym or not be considered as a “tiebreaker” (as should practical concerns such as the non-use of diacritics). — Ungoliant ^(falai) 22:21, 29 August 2020 (UTC)

Thanks for looking into it. The situation with Kayapó vs. Mẽbêngôkre is arguably not the same, however: the label Kayapó has been ambiguously used for designating both (i) a specific Mẽbêngôkre-speaking people (that is, to the exclusion of the Xikrin) as well as the dialect they speak and (ii) as a synonym of Mẽbêngôkre (that is, encompassing both the Kayapó and the Xikrin). In this sense, Mẽbêngôkre is preferable because it helps avoid this ambiguity. Kayapó and Xikrin could be listed as varieties of Mẽbêngôkre (perhaps as different from each other as are Quebec French and Hexagonal French). Regarding the remaining comments, I realize I might be an interested party (I've been doing comparative and some descriptive work on Northern Jê for ~6 years and I've heard too many linguists' and native speakers' comments regarding their preferred ways to spell the names of these languages), so I guess I'll just wait for a consensus among unbiased Wiktionarians to form. I can also confirm that the Proto-Northern Jê reconstructions you might have seen out there are most probably mine and yes, I would be willing to add the respective entries (guess it's not a problem as long as the reconstructions are published in peer-reviewed outlets). Degoiabeira (talk) 18:43, 4 September 2020 (UTC)

@Degoiabeira I got a bit distracted, but returning to this, I added a code for the family Northern Jê, sai-nje, and a code for the reconstructed language Proto-Northern Jê, sai-nje-pro.
You may already be familiar with how to use them, I don't know, but if not: reconstructions in it would go in the Reconstruction namespace (Reconstruction:Proto-Algonquian/askyi is an example entry in that language), and are mentioned in etymologies in the way you see in e.g. hàki, while the family code can be used if you don't know the precise language (as in sagamité).
- -sche (discuss) 10:12, 12 September 2020 (UTC)

I've also added codes for Tapayuna (sai-tap) and Pykobjê (sai-pyk). - -sche (discuss) 10:41, 12 September 2020 (UTC)

Brilliant, thank you! Degoiabeira (talk) 04:56, 5 October 2020 (UTC)

RFM discussion: March–July 2021

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Chuvantsy and Chuvan~~

Hi, I propose renaming Chuvantsy (xcv) to Chuvan. The advantage of this renaming is consistency between the WM projects (both the English Wikipedia and Wikidata favour this name). Furthermore, it's a simpler name that is easily deducible from the native speakers' ethnonym. I don't personally see any downside to this, since all the primary (descriptive) sources on this language are written in Russian, and the term Chuvan seems to be in use just as often as Chuvantsy, if not more. Thadh (talk) 23:43, 22 March 2021 (UTC)

Renamed to "Chuvan". —Μετάknowledge^{discuss/deeds} 23:13, 21 July 2021 (UTC)

Judeo-Arabic

(See Category_talk:Arabic_language#RFM_discussion:_September_2016–August_2021.) - -sche (discuss) 04:24, 3 October 2021 (UTC)

RFM discussion: August 2020–July 2021

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Adding Fogaha~~

It's time to start chipping away at the mess that is our Berber codes. Wikipedia considers the Sokna language to be spoken in three oases: Sokna, Fogaha, and Tmessa. According to van Putten, the claim of any Berber variety spoken in Tmessa is wholly unsubstantiated. That leaves Sokna and Fogaha, which were each documented by different Italian scholars using their problematic style of notation, but do seem to be significantly different nonetheless. I suggest that we add a code ber-fog for Fogaha (variously spelled "Fuqaha", "Foqaha"; the definite article is often tacked on, including by its only documenter Paradisi, who wrote "El-Fogaha", but this seems contrary to one's expectations when it comes to alphabetisation). —Μετάknowledge^{discuss/deeds} 07:39, 10 August 2020 (UTC)

Support. (I can even find such spellings as "Fojaha", "Al-Fojaha", "Fodjaha",...)
Off of the immediate topic, but re Tmessa: I can find quite a few overviews which mention Tmessa, but AFAICT none provide detail about it. The Oxford Handbook of African Languages (Rainer Vossen, Gerrit J. Dimmendaal, 2020), page 285, says it is merely undocumented, not nonexistent, as part of a lengthy lament about Berber ("an unconvincing classification has been provided by Aikhenvald and Militarev (1991), at many points this classification seems to be arbitrary. Moreover, at points it classifies dialects which are fully undocumented (e.g. Tmessa in Libya) or which are not Berber at all (e.g. Tadaksahak, which is Northern Songhay, and the Kufra oases, which are Teda-speaking). Unfortunately, some of the main lines of this classification have been taken over by the Ethnologue"). I reckon these changes are fine, though, since if any documentation subsequently emerges attesting Tmessa, it'll either show that it should be (re)included in an existing code or given its own. - -sche (discuss) 21:28, 11 August 2020 (UTC)

Here's a blogpost that explains the deal with Tmessa. As for the spellings with (d)j, they merely betray people who have made a faulty transcriptional assumption because they can't actually read the Arabic script... —Μετάknowledge^{discuss/deeds} 22:17, 11 August 2020 (UTC)

Added. —Μετάknowledge^{discuss/deeds} 06:06, 22 July 2021 (UTC)

RFM discussion: August 2020–August 2021

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Overhaul of Berber language codes~~

The current state of our Berber (Amazigh) codes is the ISO codes without any modifications made to their scheme, despite its many problems. The Berber languages have comparable diversity to Romance, but specialists are distinctly uncomfortable with the idea of dividing up what is mostly a dialect continuum into over 25 languages, as Ethnologue does, despite the fact that nobody disputes this practice for Romance (even though on grounds of mutual intelligibility, it's hard to say why, say, Galician and Portuguese should be considered separate). For the most part, I think we should follow the Romance practice and be (mild) splitters; in cases like Tashelhit and Central Atlas Tamazight, you wouldn't gain much by merging (as most entries would still be on separate pages), and the remarkable internal dialectal diversity in these "languages" means that good coverage will still require a great deal of dialect-tagging. I have worked through the Berber classification used by Kossmann (2013) below, using his numbered "blocks", with my notes on changes to our scheme that I recommend.

Two languages, Zenaga and Tetserret , both separate from the continuum and each other. No changes needed.
The Tuareg group, which has a macrolanguage code , but also codes for Tahaggart , Tawallammat , Tayart , Tamasheq . Tamasheq is meant to encompass both Adagh and Taneslemt, and the sole sedentary Tuareg lect, Ghat (spoken in Libya) is missing a code. I struggled to find data on mutual intelligibility; Adam (2017) indicated that speakers from Ghat had a lot of trouble understanding non-Libyan dialects, but that this might be a recent phenomenon. There is a written Tuareg standard using Tifinagh, although it does not indicate vowels and does not seem widely used any more. Clearly, the current state is redundant, and I think collapsing into a single "Tuareg" would be workable, but perhaps not ideal.
South-Central Moroccan group, Tashelhit and Central Atlas Tamazight , both written in Tifinagh. There is also a poorly defined Judeo-Berber , which I don't want to get into (but reminds me of the issues with Judeo-Arabic). These are all fine, but ISO also gave a politically-motivated code for "Standard Moroccan Tamazight" , which is a redundant literary koiné that I have already posted a request to delete above.
Northwestern Moroccan group, Ghomara and Senhaja de Srair . No changes needed.
The Zenatic group, and here is where things get messy.
- The existing codes in the West are Tarifit , Chenoua , Tachawit . Missing are Iznasen (sometimes considered "Eastern Riffian", so presumably merged under ) and Eastern Central Atlas Tamazight (a kind of contact language between and , presumably merged under ).
- In the Moroccan and Algerian oases, Tagargrent , Tugurt/Temacine , Taznatit , Tumzabt , and Tidikelt (which is almost undocumented!) all have codes, thus excluding Sud-oranais (principally the Figuig dialect, which has a grammar). Mutual intelligibility in this cluster of dialects is known to be high, and this seems ridiculously oversplit; I would recommend one code, although I'm not sure what to call it. Ethnologue calls it "Mzab–Wargla", which are only two of the dialects and is obviously not great, while Kossmann calls it "Northern Saharan oasis" Berber, which is highly clunky, and my "North Saharan Berber" is a protologism.
- In the East, there is a single code for Sened , an extinct dialect in Tunisia, and no codes for the extant Tunisian varieties (principally Djerbi), nor for the closely related Zuwara dialect over the border in Libya. I recommend that be repurposed as "Tunisian Berber", and a new code be added for Zuwara (also known as "Zuaran"), which has a grammar by Mitchell; this could be considered oversplitting, but it would be odder for "Tunisian Berber" to have entries that are almost all from Libya.
The remaining languages of the East belong to various blocks, and are mostly covered appropriately: Kabyle , Nafusi , Siwi , Sokna , Ghadames , and Awjila . The only one missing is (the presumably extinct, and thus finite) Fogaha, which I recommended for a new code in its own section above.

Feedback would be appreciated, especially on the naming of North Saharan Berber. Note that I intend to give etymology-only codes for all the dialects that end up getting merged. —Μετάknowledge^{discuss/deeds} 22:04, 10 August 2020 (UTC)

Re North Saharan Berber: I think naming it "Northern Saharan Berber" is OK; we're not inventing the grouping, it's a descriptive name, and it's similar to and simply trims an excess word off of an existing name used in literature. - -sche (discuss) 21:06, 11 August 2020 (UTC)

Reviewing Tuareg further, I see that Heath (2005) refuses to take a stand: "it is very difficult to decide whether we are dealing with a single "Tuareg" language (with many dialects), or two or more languages (each with some internal dialectal variation)." That said, he endorses the four-way distinction (where "Tahaggart" should be renamed to "Tamahaq" to include the Kel Ajjer), supports the bundling of Tamasheq, and makes no mention of Ghat, presumably lumping it into Tamahaq as well. The 4 lects with codes all have good dictionaries (ttq and thz share a dictionary, but all forms are marked for which lect they belong to), and 3 of the 4 have grammars, so they won't be hard to sort out or attest. As a result, I now definitively lean toward retiring the macrolanguage and keeping the split codes. —Μετάknowledge^{discuss/deeds} 06:26, 26 August 2020 (UTC)

I still haven't dealt with this, and I found reason to merge Tuareg instead of splitting it. Sudlow (2008) says: "Tamasheq has split into three major dialects which the Tamasheq themselves refer to as ‘sha’, ‘za’ and ‘ha’, or Tamasheq, Tamajeq, and Tamahaq." His grammar itself supports the utility of that view, as it treats two dialects spoken together (often in a mixed form) in Burkina Faso, which would have to go under two different language codes in the splitter scenario. We would have to dialect-tag our entries carefully, of course. —Μετάknowledge^{discuss/deeds} 23:33, 8 March 2021 (UTC)

Kossmann, who is a leading Berberist, offers this judgement on Tuareg: "In spite of important differences, it would be exaggerated to consider these variants distinct languages." This makes me feel very secure in merging them, so I will finally get all of this in working order soon. —Μετάknowledge^{discuss/deeds} 19:41, 24 August 2021 (UTC)

I have made all the changes recommended in this section. I chose as the code for Northern Saharan Berber, and preserved the Tuareg dialect codes as etymology-only codes (adding an etymology-only code for Ghat). I added Zuwara as . I also merged Judeo-Berber into Tashelhit, following our practice with Judeo-Arabic (see Category:Judeo-Berber). —Μετάknowledge^{discuss/deeds} 05:37, 25 August 2021 (UTC)

RFM discussion: July–October 2021

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Renaming~~

We currently call this language "Bangi Me"; the only grammar of it, and most recent scholarly work, uses the spelling "Bangime" instead. —Μετάknowledge^{discuss/deeds} 19:14, 14 July 2021 (UTC)

Support. (I notice Appendix:Bangime word list already uses the unspaced form, unlike the entry Bangi Me and categories.) - -sche (discuss) 04:57, 15 July 2021 (UTC)

Done —Μετάknowledge^{discuss/deeds} 07:27, 2 October 2021 (UTC)

RFD discussion: August–October 2021

The following discussion has been moved from Wiktionary:Requests for deletion (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Proto-Dardic entries~~

Currently there are 2, but none are sourced. ·~ dictátor·mundꟾ 19:03, 26 August 2021 (UTC)

That alone is not a reason to delete them. Perhaps @Victar will comment. —Μετάknowledge^{discuss/deeds} 03:30, 6 September 2021 (UTC)

So long as the child entries are on the Sanskrit entry, I don't mind if they're deleted, not because they're unsourced, but because Proto-Dardic is controversial. --{{victar|talk}} 06:33, 6 September 2021 (UTC)

Delete per

Wiktionary:Beer_parlour/2020/August#Unsourced_and_poorly_formatted_Proto-Dardic_entries_by_2401:4900:448F:3291:0:3F:8DB4:7F01_(talk)

Also, Reconstruction:Proto-Dardic/múkʰa- is poorly formatted. Kutchkutch (talk) 11:25, 6 September 2021 (UTC)

To clarify, I did not nominate them for deletion simply because of them being unsourced, but because of the language: seeing that some Proto-Indo-Aryan entries have also been deleted lately. ·~ dictátor·mundꟾ 06:56, 7 September 2021 (UTC)

@Kutchkutch: As far as the descendants of categorized Proto-Dardic entries are concerned, there are no longer any instances of a Sanskrit entry wanting Dardic descendants. We have thus far spotted 4 Proto-Dardic entries (, , , ), and all of them are prepared for deletion. ·~ dictátor·mundꟾ 13:57, 6 September 2021 (UTC)
Delete Svārtava² • 14:01, 6 September 2021 (UTC)
@Bhagadatta. ·~ dictátor·mundꟾ 15:35, 27 September 2021 (UTC)
Delete. Those entries are unnecessary and involve complete guesswork. We can do without them. -- 𝓑𝓱𝓪𝓰𝓪 𝓭𝓪𝓽𝓽𝓪^{(𝓽𝓪𝓵𝓴)} 16:11, 27 September 2021 (UTC)

@Inqilābī, Victar, Kutchkutch, Svartava2, Bhagadatta: Does this mean that we should not be reconstructing Proto-Dardic at all? If so, the real solution would be to remove the protolanguage code entirely — is that desired? Also, before they can be deleted, someone has to move the descendants. —Μετάknowledge^{discuss/deeds} 22:05, 8 October 2021 (UTC)

@Metaknowledge I would be supportive of that. No problem in removing it. I'll move the descendants. Svartava2 (talk) 02:54, 9 October 2021 (UTC)

@Metaknowledge: The descendants had already been moved one month ago, but I am not sure if we should get rid of the lang code, as it’s used in the Descendants section. @Bhagadatta, Kutchkutch, Victar: While the Proto-Dardic entries are to be deleted, should we keep the reconstructed forms in the Descendants section? @Svartava2 has removed the terms without prior discussion. ·~ dictátor·mundꟾ 06:48, 9 October 2021 (UTC)

I did see when I thought of moving the descendants that they were already moved. We can simply show Dardic in the descendants, name of the family (cat:Dardic languages) instead of {{desc|inc-dar-pro|-}}. Svartava2 (talk) 06:54, 9 October 2021 (UTC)

@Inqilābī, Svartava2: Even if the protolanguage code is removed and the reconstructed forms in the Descendants section are not needed, the Dardic label would still be necessary to separate those languages from other Sanskrit descendants. Kutchkutch (talk) 13:23, 9 October 2021 (UTC)

@Metaknowledge: Proto Dardic did most likely exist but there aren't enough published sources to cite while creating reconstructions. The study in the area is quite murky, with some authors even confusing them with Nuristani languages (Nuristani languages are geographically close to the Dardic languages but form a separate subfamily within Indo-Iranian unlike Dardic which is part of the Indo-Aryan subfamily). So yeah, Wiktionary would presently lose nothing by removing the inc-dar-pro code. It's like having Proto-Bengali-Assamese or Proto-Tamil-Kannada. If some good reliable sources that deal in Proto-Dardic become available later on, we can always add the code back. -- 𝓑𝓱𝓪𝓰𝓪 𝓭𝓪𝓽𝓽𝓪^{(𝓽𝓪𝓵𝓴)} 13:02, 23 October 2021 (UTC)

I have removed the Proto-Dardic language from the module and its entries. Note to archiver: please archive this discussion to WT:LTD. —Μετάknowledge^{discuss/deeds} 20:38, 23 October 2021 (UTC)
@Bhagadatta, Inqilābī, Svartava2 The errors in Category:Pages with module errors that need to be addressed. Should every instance of the code inc-dar-pro in descendants trees be replaced by Dardic: or should this stage be removed altogether? Kutchkutch (talk) 12:43, 25 October 2021 (UTC)
@Kutchkutch for now it's ok to change all these to Dardic:, so that there is little change (just Proto- and recons term/ removed) and because we wouldn't have to change the ordering like if there is "{{desc|inc-dar-pro}}" before Gujarati we'd have to move the, say, Kashmiri descendant after Gujarati one in alphabetical order; hence it's easier to be done by bot (most of these replacements were done by NadandoBot). From now it's up to the editor what to do; like with Bengali-Assamese: — diff and * {{desc|as}} * {{desc|bn}} * {{desc|syl}} are both valid and neither is wrong. However I do think that if there is only one desc like it would be better to show that one separate without label Dardic:. Svartava2 (talk) 15:09, 25 October 2021 (UTC)
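
The replacement discussed above could be sketched roughly as follows. This is only an illustration in Python, not NadandoBot's actual code: the function name `replace_proto_dardic` and the regular expression are assumptions, and the sketch only handles the simple `{{desc|inc-dar-pro|…}}` form, leaving the order of the other descendants untouched.

```python
import re

# Hypothetical pattern: a {{desc}} call whose first parameter is the removed
# code inc-dar-pro, optionally followed by further parameters such as "-"
# or a reconstructed term.
PROTO_DARDIC_DESC = re.compile(r"\{\{desc\|inc-dar-pro(?:\|[^{}]*)?\}\}")


def replace_proto_dardic(wikitext: str) -> str:
    """Replace each matching template call with the plain label 'Dardic:'."""
    return PROTO_DARDIC_DESC.sub("Dardic:", wikitext)


# TERM is a placeholder, not a real entry.
sample = "* {{desc|inc-dar-pro|-}}\n** {{desc|ks|TERM}}\n* {{desc|gu|TERM}}"
print(replace_proto_dardic(sample))
```

Other `{{desc}}` calls are not matched, so only the lines that used the removed code change.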

RFM discussion: April 2020–March 2021

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Remove avl “Eastern Egyptian Bedawi Arabic language”~~

An absurd language name apparently copied from glottologue. This is not even attestable, nor its synonym Levantine Bedawi Arabic, and no native would ever use this to create an entry. “Bedawi” is = bedouin. The “characteristics” in the Wikipedia article Northwest Arabian Arabic linked are in every or many Arabic dialects. There are differences in urban and desert-dweller speech everywhere but one sorts these speeches under the regiolects, so it would be Egyptian Arabic, South Levantine Arabic, South Levantine Arabic, Najdi Arabic. Fay Freak (talk) 13:11, 5 April 2020 (UTC)

Removed. —Μετάknowledge^{discuss/deeds} 01:39, 29 March 2021 (UTC)

RFM discussion: April 2020–March 2021

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Remove aao “Algerian Saharan Arabic”~~

Unattestable or SOP language name fudged by the for-profit language database Ethnologue; the linked Wikipedia pages will always stay stubs. The Verkehrsanschauung sorts this under Algerian Arabic. There is a continuum with Moroccan Arabic. Pinging @Fenakhay following our considerations at Talk:زرودية; nobody has ever thought such a category. One can attest the term “Saharan Arabic” but this is of course not meant to denote one language. As distinguished from Hassānīya Arabic one could need codes eastwards for some dialects spoken in the Sahara and Sahel, like for Malian and Nigerien Arabic – Chadian Arabic we have as shu. Fay Freak (talk) 13:31, 5 April 2020 (UTC)

I think the idea here is that Algerian Arabic as spoken in Algiers belongs on a continuum that speakers in the southern and western desert of Algeria (including many who are not ethnically Arab) do not belong on. I don't see the use of this code, though, because that speech definitely counts as Hassaniya. (As for the other countries on the fringe, we seem to count Nigeria and Cameroon under Chadian Arabic, which is less than ideal, but then again, I think that our whole system needs to be more like Chinese to be functional at all.) —Μετάknowledge^{discuss/deeds} 18:18, 5 April 2020 (UTC)

Removed. —Μετάknowledge^{discuss/deeds} 01:41, 29 March 2021 (UTC)

RFM discussion: April 2020–March 2021

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge “Saidi Arabic” into “Egyptian Arabic”~~

Hardly attestable language name. Occurrences for it turn out as “Saʿiidi Arabic” for example. “Upper Egyptian Arabic” (I don’t see use of “Upper Egypt Arabic”, but this is a detail) is sometimes posited, variously ill-defined because of superstratum influence from the Nile Delta, but this is usually not accepted as a distinct language and can well be a sum-of-parts term; Arabic Wikipedia clearly says Ṣaʿīdīy Arabic is “within the Egyptian Arabic dialects”. There are but some isoglosses dividing Arabic-speaking Egypt latitudinally, but so one can shed urban and bedouin speech and Muslim and Christian speech. English Wikipedia says “the realisation of /q/ as is retained (normally realised in Egyptian Arabic as ” but this is the speech of the desert vs. the speech in the largest cities and for ق (q) can be found in northern Egyptian bedouins. Most natives of Upper Egypt would use the code for Egyptian Arabic, and I would too sometimes with label “Upper Egypt” or more specific labels. I have created the single entry in “Saidi Arabic” and only because there was this code. Fay Freak (talk) 13:54, 5 April 2020 (UTC)

Removed, with a new category Category:Upper Egyptian Arabic. —Μετάknowledge^{discuss/deeds} 01:43, 29 March 2021 (UTC)

RFM discussion: April 2020–March 2021

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merge “Sanaani Arabic”, “Ta'izzi-Adeni Arabic”, “Hadrami Arabic” into a new “Yemeni Arabic”~~

The categories are ill-defined; the second one is unattestable. Yemen is a dialectologically complicated area – the material is even partially collected outside of Yemen, as for instance the Dictionary of Post-Classical Yemeni Arabic 1990 is surveyed in Israel – with many isoglosses and the number of distinct languages is multiplied at will if one is oblivious of SOP designations of lects so for this reason, apart from nobody being able to safely use these categories, it is not wise to distinguish at the L2 level already. If you look into the Dialect Atlas of North Yemen and Adjacent Areas published by Brill four years ago and probably also Saint Elbakyan you will forget those categories and won’t be able to map content onto the codes. Apart from that the distinction is inconsistent with the fact that we have a code for “Judeo-Yemeni Arabic” jye – as if Jewish speech in Yemen were one language while Muslim speech were three! I just mention here that I find “Judeo-Arabic” and its sub-languages suspicious, it could all be only Classical Arabic or Arabic dialects with some peculiarities written in Hebrew script, remaining from a time when Wiktionary did not use {{spelling of}} for this purpose. The Jews might deal with it themselves. Fay Freak (talk) 14:23, 5 April 2020 (UTC)

I pinged you in a separate discussion above about Judeo-Arabic. As for Yemen... it is true that the codes are oversplit, but it is also true that the WAD attests to the fact that when Hejazi and Omani speakers disagree about a word, chances are that North Yemen agrees with the Hejaz or Egypt, and that South Yemen agrees with Oman. The existence of Piamenta's dictionary gives me hope for treating Yemen as a unit, but I don't know how he does it (do you have a PDF?). —Μετάknowledge^{discuss/deeds} 18:33, 5 April 2020 (UTC)

Nay, I do not. Fay Freak (talk) 18:36, 5 April 2020 (UTC)

I find this “South Yemen” and “North Yemen” part difficult and think that one needs an elevation profile of Southwest Arabia to dissect the Arabic dialects of and close to Yemen. Fay Freak (talk) 18:41, 5 April 2020 (UTC)

Deboo's Jemenitisches Wörterbuch is another good example of treating all Yemeni varieties together, with scrupulous dialect-tagging. —Μετάknowledge^{discuss/deeds} 01:54, 29 March 2021 (UTC)

Merged. I have arbitrarily chosen as the new code for "Yemeni Arabic". —Μετάknowledge^{discuss/deeds} 01:54, 29 March 2021 (UTC)

Adding Bwala

This doesn't seem to merit a discussion, but I am putting this here for posterity: I am adding Bwala, a previously undocumented Bantu language, as based on Bollaert et al. (2021). —Μετάknowledge^{discuss/deeds} 20:05, 21 December 2021 (UTC)

RFM discussion: August 2021

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Yavapai vs. Havasupai-Walapai-Yavapai

We are currently treating Yavapai nai-yav as a separate language from Havasupai-Walapai-Yavapai yuf, but some of our yuf entries (e.g. 'ha) are tagged as Yavapai. Is there any difference between nai-yav and the Yavapai dialect of yuf? If not, can we merge nai-yav into yuf? And if we do want them separate, can we rename yuf "Havasupai-Hualapai" (as per Wikipedia) or "Havasupai-Walapai" rather than keeping a different language's name in there? We only have one entry for nai-yav, namely ke#Yavapai. @-sche seems to be the only person who's dealt with these languages in the past. —Mahāgaja · talk 11:07, 30 August 2021 (UTC)

Interesting; I dug into the edit histories, and I added Yavapai in February 2016, at which time I also added a word. I added Walapai- and Yavapai-specific content under the trinitarian name Havasupai-Walapai-Yavapai a couple months later; I must've been looking to add the Walapai content, saw we had a code for Havasupai-Walapai-Yavapai, since Stephen G Brown had previously added Havasupai-Walapai-Yavapai content like T:yuf-personal pronouns and entries for those pronouns without dialectal labelling, and forgot about the Yavapai-specific code.
Wikipedia has Havasupai-Walapai and Yavapai as distinct but similar languages (a 1978 United States Indian Claims Commission decision says "the Walapai and Havasupai spoke a language and had a culture very similar to the Yavapai"); OTOH, Thomas Sebeok's Native Languages of the Americas, volume 1 (2013), page 467, cites them as an example of ethnic groups that fought "even though spoke the same Yuman language". Christopher Moseley's Encyclopedia of the World's Endangered Languages (2008) calls the language "Upland Yuman" ("Upland Yuman is a Yuman language"), spoken by the "Hualapai, the Havasupai, and the Yavapai, the last traditionally divided into four regional subtribes. Each community speaks a distinct variety, with the Yavapai varieties forming a well defined dialect, although all varieties are mutually intelligible with little difficulty." I suppose they should be merged. - -sche (discuss) 18:53, 30 August 2021 (UTC)

@-sche: OK, I moved our one nai-yav entry to yuf, deleted all the nai-yav categories, and deleted nai-yav from the language modules. —Mahāgaja · talk 10:39, 31 August 2021 (UTC)

RFD discussion: November 2021–January 2022

The following discussion has been moved from Wiktionary:Requests for deletion (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Knaanic~~

Knaanic hasn't been definitively proven to exist as a distinct language yet, and its few attestations (and entries here) seem indistinguishable from standard Slavic languages in Hebrew transcription. Here's a pretty good article by Dovid Katz on the topic. airy—zero (talk) 03:22, 4 November 2021 (UTC)

Knaanic is a useful holding category for the scattered attestations of West Slavic words in Hebrew script, which have to be included in the dictionary some way or another. This discussion doesn't belong at RFDO, because there is no chance we will delete all those entries, which pass CFI and definitely belong here. The question is whether we want to merge Knaanic into another language code, presumably Old Czech, although in all honesty, I rather doubt that all the Knaanic attestations can really be called Old Czech in good faith. The difficulty of deciding where they ought to go, and whether people would even think to look for them there, is why Knaanic exists as a separate language here in the first place. —Μετάknowledge^{discuss/deeds} 06:11, 4 November 2021 (UTC)

Oh, okay — thanks for the explanation! airy—zero (talk) 12:57, 4 November 2021 (UTC)

RFD-resolved. If you want to merge lects, propose something in WT:BP. — Fytcha〈 T | L | C 〉 12:13, 22 January 2022 (UTC)

RFM discussion: January–February 2022

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Adding Early Modern Korean (ko-ear) as an L2~~

(Notifying TAKASUGI Shinji, Atitarev, HappyMidnight, Tibidibi, Quadmix77, Kaepoong, Mahagaja, Metaknowledge): @Fish bowl, @Solarkoid, @Lunabunn, @Echo Heo (feel free to tag anyone else). Currently, Early Modern Korean (EMK) is listed as an etymology-only language, limiting its usage as a header and leading all EMK terms to be listed under the "Korean" header with an "Early Modern" label. However, through several discussions on the matter (thank you @Tibidibi), it's been made more and more clear that it needs to be separated out from Modern Korean in order for its coverage to be done well. Here are a few reasons (feel free to make any corrections if needed):

Contemporary Korean Linguistics and Korean dictionaries almost consistently separate out EMK as a separate language. For Korean linguistics done in Korean, the word 근대국어 (geundaegugeo) is used exclusively for it, and it's analyzed as its own. As for English sources, Cho, Sungdai; Whitman, John (2019), Korean: A Linguistic Introduction and Lee, Ki-Moon; Ramsey, S. Robert (2011), A History of the Korean Language along with several other resources also make the distinction.
Early Modern Korean used a rather different orthography compared to Korean with letters like ㅼ (st) and ㆉ (yoi) becoming obsolete after that time period, leading Modern Korean speakers without training in the language to struggle to comprehend it.
There are terms that are solely used in EMK such as 툐상 (tyosang) that deserve to have their own space and coverage rather than thrown under the "Korean" header and left alone.
Currently our etymologies are unclear in terms of showing the development of words from Middle Korean (MK) to EMK to Modern Korean (MoK) such as in ᄃᆞ리〮다〮 (tòlítá), or often skipping EMK altogether due to the lack of clarity. Additionally, with words such as 구무 (kwumwu), we'd be able to show more clearly the relation between EMK terms and their dialectal descendants.
Separating EMK out will allow us to more easily create pronunciation & transcription modules for the language (such as what was done with Jeju in Module:jje-pron & Module:jje-translit), rather than the somewhat inconsistent transcriptions that have been used in the past or attempting to include all the EMK use-cases within the current system.

Overall, having Early Modern Korean as its own L2 would greatly improve its coverage if done well, make life easier for the Koreanic editor community, make our content clearer and more precise for readers, and finally put Wiktionary in line with Koreanic Linguistics as a whole. AG202 (talk) 09:06, 23 January 2022 (UTC)

@AG202: Strong support on all points. As a way of division, I suggest that:

The split between Middle Korean and EMK is generally a hard line at 1600, but the few texts that use MK-only characters may be treated as MK.
The split between EMK and modern Korean is more fluid.
- Texts made by foreign learners of the language in the late nineteenth century and onwards are considered modern Korean, despite being written in an EMK orthography, because they are often the first attestations of archetypally modern Korean words not attested in EMK, such as 때문 (ttaemun) and 언니 (eonni).
- 개화기 국어 (開化期國語, gaehwagi gugeo), the language of “modernizing” texts from the period from c. 1890 to 1910, is considered modern Korean despite having EMK orthography, because they are the first attestations of many of the Japanese orthographic borrowings (wasei kango) totally absent in EMK but a defining feature of modern Korean vocabulary.
- “Traditional” texts from the 1890–1910 period, such as personal letters of provincial people less affected by Westernizing trends or traditional novels untouched by Western and Japanese influence, are still considered EMK. The quotation at 섬섬옥수 (seomseomoksu), despite the late date, is a typically EMK material and would hence be quoted at Early Modern Korean 셤셤옥슈 (syemsyemwoksywu).
For IPA, I suggest aiming at a 1750 pronunciation, with the assumption that the vowel qualities were the same as in MK.--Tibidibi (talk) 09:39, 23 January 2022 (UTC)

Being that it's been three weeks, the only active EMK editor is in strong support, and there hasn't been any opposition (along with tagging all of the Korean editors in the working group), I'd like to close this as RFM-Split. @Erutuon, @Surjection, if you have the time, please. Edit: Apologies for forgetting to sign, and thank you @J3133, for pinging them for me. AG202 (talk) 01:07, 13 February 2022 (UTC)

Done. Entries need to be moved over manually though — SURJECTION ^{/ T / C / L /} 11:32, 14 February 2022 (UTC)

RFM discussion: April–May 2022

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Formosan Kulon-Pazeh (uun) split to Kulon (uon) and Pazeh (pzh)

According to this ISO change request (Kulon-Pazeh Change Request Documentation), Kulon (uon) and Pazeh (pzh) are now separate languages. How do we update Wiktionary to add data specifically for pzh from now on (given that data is nonexistent for the long-extinct Kulon language)? I'm in the process of adding lemmas and example sentences for several of the western Formosan languages. Kangtw (talk) 19:57, 3 April 2022 (UTC)

@Austronesier Chuck Entz (talk) 04:20, 4 April 2022 (UTC)

Given that all (uun)-lemmas are actually Pazeh lemmas, the easiest thing would be just to rename "Kulon-Pazeh" to "Pazeh". –Austronesier (talk) 19:06, 6 April 2022 (UTC)

Well, and to recode entries to use the right code. (Is there really no data on Kulon?) Other ISO code changes discussed at Wiktionary:Beer_parlour/2022/March#2021_ISO_code_changes btw. - -sche (discuss) 22:16, 6 April 2022 (UTC)

Follow-up: Wiktionary:Etymology scriptorium/2022/May#ISO_639-3_code_uun_split_into_uon_an_pzh. - -sche (discuss) 17:52, 7 May 2022 (UTC)

See that discussion for more, but: I added uon and pzh; once instances of uun are updated, it can be removed. - -sche (discuss) 20:38, 8 May 2022 (UTC)

RFM discussion: Moving Finno-Ugric families to Uralic families

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Moving Finno-Ugric families to Uralic families

We don't recognize Finno-Ugric as a valid family; just Uralic. Hence urj is a valid code, but fiu isn't. Nevertheless, we're using fiu- as the prefix for four branches: fiu-fin for Finnic, fiu-mdv for Mordvinic, fiu-prm for Permic, and fiu-ugr for Ugric. I propose we use urj- for these instead, thus moving as follows:

fiu-fin → urj-fin
fiu-mdv → urj-mdv
fiu-prm → urj-prm
fiu-ugr → urj-ugr

At the same time, we should move the codes for the corresponding protolanguages:

fiu-fin-pro → urj-fin-pro
fiu-mdv-pro → urj-mdv-pro
fiu-prm-pro → urj-prm-pro
fiu-ugr-pro → urj-ugr-pro

as well as the code for the etymology-only lect Proto-Finno-Permic:

fiu-fpr-pro → urj-fpr-pro

I suppose we can keep fiu-pro as an etymology-only variant of urj-pro if it's important. What do others think? —Aɴɢʀ (talk) 14:29, 23 December 2016 (UTC)
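
The proposed moves amount to swapping the `fiu-` prefix for `urj-` on a fixed list of codes. A minimal sketch in Python is given here for illustration only; the names `OLD_CODES`, `RENAMES` and `recode` are hypothetical and are not part of any Wiktionary module.

```python
# Codes listed in the proposal above. fiu-pro is deliberately absent, since
# the proposal suggests it could stay as an etymology-only variant of urj-pro.
OLD_CODES = [
    "fiu-fin", "fiu-mdv", "fiu-prm", "fiu-ugr",
    "fiu-fin-pro", "fiu-mdv-pro", "fiu-prm-pro", "fiu-ugr-pro",
    "fiu-fpr-pro",
]

# Build the old -> new mapping by replacing only the leading "fiu-" prefix.
RENAMES = {old: "urj-" + old[len("fiu-"):] for old in OLD_CODES}


def recode(code: str) -> str:
    """Return the proposed new code, or the code unchanged if it is not listed."""
    return RENAMES.get(code, code)


for old, new in RENAMES.items():
    print(f"{old} -> {new}")
```

Using an explicit list rather than a blanket prefix replacement keeps any code not named in the proposal, such as `fiu-pro`, untouched.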

That seems like a lot of disruption for a small theoretical benefit: we've always used codes like aus, cau, nai and sai that we don't recognize as families for making exception codes, so it's not a huge violation of our naming logic. In this case, though, it looks to me like we don't recognize fiu more because it's too much like urj, not because it's invalid, per se (though I don't know a lot on the subject). We do have gmw-fri rather than gem-fri, for instance. Of course, I'd rather follow those who actually work in this area- especially @Tropylium. Chuck Entz (talk) 20:59, 23 December 2016 (UTC)

I have no opinion on moving around family codes either way (it doesn't seem they actually come up much, whatever they are), but if we start moving around the proto-language codes, I would like to suggest simple two-part codes. Proto-Samic and Proto-Samoyedic are already smi-pro and syd-pro, so is there any reason we couldn't make do with e.g. fin-pro, fpr-pro, ugr-pro etc.?

Also, as long as we're on this topic, at some point we are going to need the following:

Proto-Mansi: (ugr-/urj-?)mns-pro
Proto-Khanty: (ugr-/urj-?)kca-pro
Proto-Selkup: (ugr-/urj-?)sel-pro

No rush though, since so far we do not even have separate codes for their subdivisions. The only distinction that comes up in practice is distinguishing Northern Khanty from Eastern Khanty (Mansi and Selkup only have one main variety that is not extinct or nearly extinct). --Tropylium (talk) 10:53, 24 December 2016 (UTC)

There is, actually, a reason: our exception codes are designed to avoid conflict with the ISO 639 codes, so they start with an existing ISO 639 code or a code in the qaa-qtz range set aside by ISO 639 for private use. fin is one of the codes for the Finnish language. fpr and ugr are apparently unassigned- for now. As for the three proto-language codes, those don't need a family prefix because they already start with an ISO 639 code. Chuck Entz (talk) 13:06, 24 December 2016 (UTC)
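
The prefix rule described in the comment above can be sketched as a small check. This is an illustrative Python sketch under stated assumptions: `SAMPLE_ISO_CODES` is a hand-picked sample of codes mentioned in this discussion, not the full ISO 639 list, and the function names are hypothetical.

```python
# Small illustrative sample only; a real check would need the full ISO 639 list.
SAMPLE_ISO_CODES = {"urj", "fiu", "fin", "nai", "smi", "syd", "mns", "kca", "sel"}


def is_private_use(code: str) -> bool:
    """True for three-letter codes in the qaa-qtz range set aside for private use."""
    return (
        len(code) == 3
        and code.isascii()
        and code.isalpha()
        and code.islower()
        and "qaa" <= code <= "qtz"
    )


def has_safe_prefix(code: str, iso_codes: set) -> bool:
    """True if the part before the first hyphen is a known ISO code or private-use."""
    prefix = code.split("-", 1)[0]
    return prefix in iso_codes or is_private_use(prefix)


for candidate in ["urj-fin-pro", "mns-pro", "qfa-xxx", "fpr-pro", "ugr-pro"]:
    print(candidate, has_safe_prefix(candidate, SAMPLE_ISO_CODES))
```

Note that such a check would accept `fin-pro`, because `fin` is an assigned code; as the comment above points out, that code belongs to Finnish rather than Finnic, so a prefix check alone cannot catch that kind of clash.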

The only codes in the qaa-qtz range we actually use are qfa as a prefix for otherwise unclassified families and qot for Sahaptin (a macrolanguage that wasn't given an ISO code of its own), right? —Aɴɢʀ (talk) 07:55, 26 December 2016 (UTC)

Right. And I didn't notice that we used qot although it is not an ISO code; it seems we followed Linguist List in using it. For consistency, I suggest changing it to fit our usual scheme, so nai-spt or similar (nai-shp is already in use as the family code). - -sche (discuss) 05:10, 27 December 2016 (UTC)

Support, for consistency. fiu is different from nai, because fiu is agreed to be encompassed by a higher-level genetic family which also has an ISO code (urj), and that code can be used if we drop fiu. nai and sai are placeholders rather than genetic groupings, and they're useful ones, because if we dropped them we'd have to recode everything as qfa- (and might conceivably run out of recognizable/mnemonic codes at that point). - -sche (discuss) 05:10, 27 December 2016 (UTC)

Support per -sche, both the main issue being suggested here, as well as recoding Sahaptin. —Μετάknowledge^{discuss/deeds} 05:57, 19 January 2017 (UTC)

I've recoded Sahaptin and all the Finno-Ugric lects except fiu-fin-pro which requires moving a lot of categories, which I will get to later. - -sche (discuss) 17:20, 8 March 2017 (UTC)

I can volunteer to carry out the move from fiu-fin-pro to urj-fin-pro if there's still consensus that it should be done (which there is, as far as I can tell). — SURJECTION ^{/ T / C / L /} 09:19, 12 January 2023 (UTC)

Now there are lots of module errors in Cat:E as a result of these language code changes. It might be easiest to fix them by bot. @DTLHS, what do you think? — Eru·tuon 22:45, 9 March 2017 (UTC)

I'll see what I can do. DTLHS (talk) 23:09, 9 March 2017 (UTC)

@Erutuon I've done a bunch of them- I think the reconstructions should be fixed by hand. DTLHS (talk) 23:22, 9 March 2017 (UTC)

@DTLHS: Three incorrect language codes remain, I think: fiu-ugr, fiu-fpr-pro, urj-fin-pro. I couldn't figure out what fiu-fpr-pro should be; it seems to refer to Proto-Finno-Permic, but I searched various language data modules and didn't find a match. Is there someone who can look through and fix the remaining module errors that relate to incorrect language codes? @Tropylium, @Angr, @-sche? — Eru·tuon 04:52, 11 March 2017 (UTC)

Proto-Finno-Permic is an etymology-only language (and a kind of a legacy concept) that we encode as a variety of Proto-Uralic, if that helps. --Tropylium (talk) 13:58, 11 March 2017 (UTC)

The categories were easy to deal with: you just change the {{derivcatboiler}} to {{auto cat}} and the template plugs in the correct language code, if it exists. That also makes it a quick way to check whether there is a correct language code. By the time I finished that, there were only a dozen or so entries left in CAT:E due to everyone else's efforts, so I finished off the remainder by hand. It would have been easier if there hadn't been hundreds of other module errors cluttering up CAT:E- yet another reason for you to be more careful. Chuck Entz (talk) 21:10, 11 March 2017 (UTC)

I apologise for not catching and fixing those uses at the time I renamed the codes. I searched all pages on the site for each of the old codes, and some pages turned up, so I forgot to also do an "insource:" search to catch other uses inside templates like {{m}}. We so rarely change language codes compared to changing language names. - -sche (discuss) 23:36, 11 March 2017 (UTC)
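For anyone working from a dump of raw wikitext rather than the on-site search, the same kind of check can be sketched as below. This is a rough illustration, not an existing tool: `find_code_uses` is a hypothetical helper, it only sees non-nested templates, and the sample reconstruction `*example` is a placeholder.

```python
import re


def find_code_uses(wikitext, code):
    """Return every non-nested template call in raw wikitext that passes
    `code` as a positional or named parameter value."""
    hits = []
    for match in re.finditer(r"\{\{([^{}]*)\}\}", wikitext):
        parts = match.group(1).split("|")
        # parts[0] is the template name; the rest are parameters.
        values = [p.split("=", 1)[-1].strip() for p in parts[1:]]
        if code in values:
            hits.append(match.group(0))
    return hits


sample = "From {{m|fiu-fin-pro|*example}}, compare {{m|urj-pro|*example}}."
old_uses = find_code_uses(sample, "fiu-fin-pro")
```

Here `old_uses` contains only the first template call, which is the one still using the retired code.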

Moved, some earlier, but Finnic is now also in process of being moved from fiu-fin to urj-fin. — SURJECTION ^{/ T / C / L /} 17:38, 14 January 2023 (UTC)

RFM discussion: New Permic codes

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

New Permic codes

Hi, I would like to request three new codes:

~~Perm for Anbur~~
prm-koo for Old Komi (Writing systems: Perm and Cyrs; Family: urj-prm)
prm-kya for Komi-Yazva (Writing system: Cyrl; Ancestor: prm-koo; Family: urj-prm)

Furthermore, the newly made prm-koo should be given as the ancestor of Komi-Permyak (koi) and Komi-Zyrian (kpv). The writing system Perm should be removed from both these languages.

The specifics of the codes' namings can be tweaked if there are any concerns. Pinging @Tropylium. Thadh (talk) 20:18, 16 November 2021 (UTC)

We already have Perm for the Old Permic script. Are you requesting that we change its canonical name to "Anbur"? —Mahāgaja · talk 08:04, 17 November 2021 (UTC)

Oh, couldn't find it for some reason... No, in that case it's okay. Thadh (talk) 08:20, 17 November 2021 (UTC)

Done. Thadh (talk) 13:17, 15 January 2023 (UTC)

RFM discussion: Mari phylogeny

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Mari phylogeny

It seems Eastern Mari (chm) is treated as an ancestor of Western Mari (mrj). Historically, that makes little sense though, because both are different written standards of the same continuum, so this is about akin to setting Nynorsk as a descendant of Bokmål, Komi-Permyak as a descendant of Komi-Zyrian or setting Livvi as a descendant of Karelian. I think we should set both as direct descendants of Proto-Uralic (urj-pro), or, alternatively, create a code for Proto-Mari. I know too little about the history of Mari languages to say anything sensible on that topic. Anyway, pinging some editors that might be interested, @Tropylium, Atitarev, -sche. Thadh (talk) 18:03, 20 February 2022 (UTC)

Previous discussions at Wiktionary:Beer parlour/2013/September#Merging Mari and Buryat varieties and Wiktionary:Language treatment/Discussions#Merging Buryat dialects; also, merging Mari dialects. —Mahāgaja · talk 18:25, 20 February 2022 (UTC)

I changed the one line in Module:languages/data3/m that derived mrj from chm. AFAICT this is all that needed to be changed? - -sche (discuss) 04:31, 1 June 2022 (UTC)

Yeah, that was the minimum requirement. I thought maybe if there were any solid Proto-Mari reconstructions, we could add a code for that, but seeing as nobody answered it's better to leave that for the future. Thadh (talk) 07:18, 1 June 2022 (UTC)

Making a family code

@-sche, Tropylium, Mahagaja Would anyone oppose moving chm to mean "Mari" (as a family), creating chm-pro for Proto-Mari and adding the iso-intended code mhr for Eastern Mari? If I understand correctly, @Surjection has volunteered to handle the bot jobs if needed. Thadh (talk) 18:52, 7 November 2022 (UTC)

No objection from me. —Mahāgaja · talk 18:57, 7 November 2022 (UTC)

No objection from me. Looking into the history, it seems we used the macrolanguage code chm for the standard variety (Eastern/Standard Mari) because of this discussion years ago, so I'll ping User:Atitarev who was involved in that old discussion. - -sche (discuss) 19:23, 13 November 2022 (UTC)

No objection from me either. Anatoli T. ^{(обсудить}/^вклад) 22:03, 13 November 2022 (UTC)

I'll begin the move probably later today, since there are no objections. — SURJECTION ^{/ T / C / L /} 08:57, 11 January 2023 (UTC)

Move ongoing. Please archive this discussion to Wiktionary talk:Language_treatment/Discussions, under the heading "Mari phylogeny". — SURJECTION ^{/ T / C / L /} 11:34, 11 January 2023 (UTC)

RFM discussion: January 2023

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Change L2 language name: Kurtop to Kurtöp~~

The Kurtöp language is a small Sino-Tibetan language spoken in Bhutan. We currently omit the umlaut from the name, but I suspect this is simply because ISO language names don't have diacritics in them. On the other hand, academic texts such as A Grammar of Kurtöp and An Overview of Kurtöp Morphophonemics do use it, and I think we should follow that trend. Theknightwho (talk) 15:57, 4 January 2023 (UTC)

Support. Thadh (talk) 16:58, 4 January 2023 (UTC)

Support. Vininn126 (talk) 17:01, 4 January 2023 (UTC)

Support. — Fenakhay ^{(حيطي · مساهماتي)} 17:03, 4 January 2023 (UTC)

Support —Mahāgaja · talk 11:28, 6 January 2023 (UTC)

RFM moved. Theknightwho (talk) 10:51, 14 January 2023 (UTC)

RFM discussion: November–December 2022

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Change "Middle Mongolian" to "Middle Mongol" (language code xng)~~

Discussion moved from WT:BP.

Literature on the form of common Mongolic spoken between the 13th and 16th centuries increasingly uses the term "Middle Mongol" rather than "Middle Mongolian". This is primarily because it was the ancestor to several extant Mongolic languages, not all of which are spoken by people who we would usually call Mongolians (though they are sometimes considered part of the Mongol people - especially historically). For example, Buryat (spoken in Buryatia, just north of Mongolia) and Kalmyk (spoken in Kalmykia, around 4,000km to the west of Mongolia). There are several others.

I also feel that this would (slightly) reduce the faulty implication that Mongolian is the "primary" descendant of Middle Mongol, too. To give a comparison, using the name "Middle Mongolian" is roughly equivalent to using the name "Old Russian" for Old East Slavic: obviously less than ideal.

This is supported by Glottolog, and I can provide recent academic sources using the term if needed. Theknightwho (talk) 18:08, 4 November 2022 (UTC)

Support, though these proposals are usually done in WT:RFM. AG202 (talk) 18:15, 4 November 2022 (UTC)

Support Hromi duabh (talk) 14:32, 25 November 2022 (UTC)

Support Just wondering when we're changing Old Portuguese to Old Galician-Portuguese for similar reasons...--Ser be être 是^talk/stalk 21:16, 10 December 2022 (UTC)

RFM moved. Theknightwho (talk) 15:26, 23 December 2022 (UTC)

RFM discussion: July 2022–June 2023

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Rename Ulukwumi (ulb) to Olukumi, add Yoruba as an ancestor for Lucumí (luq), & the case of Itsekiri (its)~~

As we're moving towards the creation of a more robust coverage of Yoruboid languages, see: agutan for a rough start, we'll need some changes made earlier rather than later.

"Olukumi" is the name often, if not most used, to describe the language, unsure why Ethnologue and SIL went with "Ulukwumi". Some sources: Entry for Olùkùmi in the Olùkùmi Talking Dictionary, Olukumi Bilingual Dictionary, Ngram.
Lucumi (luq) should be changed to have its name have the accent mark on the "i" becoming "Lucumí", and then being that it is a liturgical language derived from Yoruba, it should have Yoruba as its ancestor.
Additionally, there's the case of Isekiri (its), which looks as if it's spelled "Itsekiri" the most in English, including by the main driver of teaching the language nowadays "Learn Itsekiri". The spelling "Isekiri" seems to come from the spelling in the language which involves a ṣ, but being as that does not exist in English, it's often been adapted as a "ts". At the moment, I am leaning towards the "Itsekiri" spelling, though I would like more input if possible.

Those are the changes that we need for now, and hopefully we can continue increasing the coverage of these languages. AG202 (talk) 20:30, 6 July 2022 (UTC)

(Notifying Oníhùmọ̀, Oniwe, Egbingíga): AG202 (talk) 15:12, 21 July 2022 (UTC)

Sorry ooo, I don't usually pay attention to the notifications from Wiktionary and Wikipedia, I always think that it'll be difficult to use. About Ìṣẹkírì, in my opinion I think that you write it as Ìtsẹkírì because it isn't normal for them to write it with "ṣ" like us Yorùbá do. Egbingíga (talk) 02:44, 16 September 2022 (UTC)

I went ahead and changed Olukumi, per the persuasive evidence above. Not sure whether Lucumí really needs the accent; ngrams might suggest Lucumi was recently overtaken by Lucumí but it's not as overwhelming as with Ulukwumi not even getting enough hits to plot. - -sche (discuss) 23:25, 23 July 2022 (UTC)

@-sche Thank you for the changes! Hmmm for Lucumí, I'm still leaning towards including the accent, but I see your point. I do wish that there were more folks active here to comment on this though. AG202 (talk) 19:52, 13 August 2022 (UTC)

@-sche, seeing as though I'm just now seeing Wiktionary:Beer parlour/2019/September § Isekiri or Itsekiri? which had unanimous support, I'd think it'd be good to go ahead and make the change for Itsekiri. AG202 (talk) 15:19, 11 September 2022 (UTC)

I think Lucumí should be used as it more accurately transcribes how it is said Egbingíga (talk) 02:52, 16 September 2022 (UTC)

With this, I'm going to push forward with the changes. AG202 (talk) 22:26, 5 March 2023 (UTC)

@Benwing2, is it possible that we could move forward on these ones as well? The Olukumi change has already been made, but the others have not. AG202 (talk) 22:16, 21 March 2023 (UTC)

Isekiri → Itsekiri is now done. — SURJECTION ^{/ T / C / L /} 12:17, 30 May 2023 (UTC)

Thanks! Lucumi → Lucumí is also now done. Closing this discussion since I think everything was covered, and will archive in a week. AG202 (talk) 03:11, 1 June 2023 (UTC)

RFM discussion: October 2019–August 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Canonical name of "fan"

Our canonical name for fan is "Fang (Guinea)", which is unfortunate since it isn't spoken in Guinea. It's spoken primarily in Gabon and Equatorial Guinea. ~~I'd recommend calling it "Fang (Gabon)" since there seem to be more speakers in Gabon than in Eq.G. and since the name of Gabon is shorter. —Mahāgaja · talk 11:53, 17 October 2019 (UTC)~~ I'd recommend calling it "Fang (Equatorial Guinea)" since according to Ethnologue there are more speakers in that country than in Gabon. —Mahāgaja · talk 12:09, 22 October 2019 (UTC)

The fact that there are even some speakers in Cameroon (according to Wikipedia) but it's not the same as "Fang (Cameroon)" is icing on the confusion-cake... and they're both Bantoid languages, so disambiguating by family doesn't help, and both spoken in Central Africa, so we can't disambiguate by mere region as we sometimes do. - -sche (discuss) 16:00, 24 October 2019 (UTC)

@-sche: They are different branches of Bantoid, though. We could call "Fang (Beboid)" and "Fang (Bantu)". —Mahāgaja · talk 22:04, 24 October 2019 (UTC)

Ah, true; I saw that Wikipedia classified Fang language (Cameroon) as "Western Beboid" but caveated that it was not necessarily a valid family, but if Beboid overall is valid, then that works and is clearer (IMO) than picking just one of the countries to list. I would support renaming them in that way. The only other languages that come to mind which are disambiguated by family are "Austronesian Mor" and "Papuan Mor" (both spoken in West Papua), which are mentioned on WT:LANG; probably we should change those to "Mor (Austronesian)" and "Mor (Papuan)" for consistency. - -sche (discuss) 18:00, 25 October 2019 (UTC)

Moved to Fang (Bantu) language and Fang (Beboid) language. —Mahāgaja · talk 18:46, 1 November 2019 (UTC)

I was going to rename Austronesian and Papuan Mor to use the 'parenthetical, postpositive' naming format for consistency with the Fangs and with how languages with country-name disambiguators are named, but I notice we also have e.g. Austronesian and Papuan Gimi, Austronesian and Sepik Mari (and some other Maris), and several other such languages, and I don't have time to rename all of those, so consistency will have to wait. (There is also "Sepik Iwam", but it appears to actually get called that, to distinguish it from the other Sepik language called Iwam which is spoken in the same place and belongs to the same Iwam subfamily of Upper Sepik. Confusing!) - -sche (discuss) 08:41, 28 November 2019 (UTC)

The reason I haven't archived this is that several other languages, mentioned above, still need to be renamed to fit the format of the other languages. I or someone else just need(s) to find the time... - -sche (discuss) 01:48, 6 August 2020 (UTC)

RFM discussion: May–September 2023

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Proto-Witotoan and Proto-Huitoto-Ocaina~~

Currently, the way they are treated makes no sense: Proto-Huitoto-Ocaina is handled as the parent of Proto-Witotoan and nothing else. In our word list, Proto-Witotoan is used to denote Proto-Bora-Witoto, which is a macroreconstruction that is very speculative. I propose we merge these into one language, Proto-Witotoan, which seems to be the more common term.

Notifying @-sche, Mahagaja. Thadh (talk) 15:32, 9 May 2023 (UTC)

No objection here. —Mahāgaja · talk 15:36, 9 May 2023 (UTC)

If I recall and understand correctly (?) the current setup is based on your request here, where I noted the issue of the broader/older grouping having been set in some entries as the child of a smaller/younger grouping. No objection to changing it. I see that a handful of Murui Huitoto entries use Proto-Witotoan in their etymologies, but these should be unaffected by removing Boran from its scope. - -sche (discuss) 16:53, 9 May 2023 (UTC)

Oof, I didn't even recall that discussion - I guess I was a bit too hasty, sorry. If there are no further objections, I'll just manually remove and/or change the reconstructions from the Murui Huitoto etymologies and fix this. Thadh (talk) 17:05, 9 May 2023 (UTC)

Merged. Thadh (talk) 20:34, 8 September 2023 (UTC)

RFM discussion: August 2015

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

West African Pidgin English varieties

Ethnologue has assigned codes to some but not all of the varieties of West African Pidgin English, and we in turn have incorporated some (e.g. pcm) but not all (e.g. not gpe) of those codes. As WP notes, the "contemporary English-based pidgin and creole languages are so similar that they are sometimes grouped together under the name 'West African Pidgin English'" (a name which also denotes their predecessor which developed in the 1700s). WP's examples are illustrative, particularly in that its Ghanaian and Nigerian Pidgin English examples are identical. I propose to merge at least the following three varieties into wes, renaming it "West African Pidgin English":

Ghanaian Pidgin English (gpe)
Nigerian Pidgin English (pcm)
Cameroonian Pidgin English (wes)

We could also discuss whether or not to merge Sierra Leone Krio (kri, which WP notes is often mistaken for English slang due to its similarity to English, but which has a somewhat distinct alphabet), Pichinglis / Fernando Po Creole (fpe), and Liberian Kreyol / Liberian Pidgin English (lir). - -sche (discuss) 21:11, 11 August 2015 (UTC)

The question is a very complex one. Firstly (but of least importance), scholars are divided on which lects have creolised and which have not, but it is generally agreed upon that at least some of the languages you mentioned are not pidgins, which would make the name "West African Pidgin English" somewhat of a misnomer (the more neutral name "Wes-Kos" has been suggested as an alternative, but even linguists haven't fully adopted it). Secondly, all these lects are remarkably similar on a lexical level, but that's unsurprising; after all, they resulted from separate but very similar language contact events, and then probably modified each other (one scholar posits that Krio and Cameroonian Pidgin English relexified each other to some degree after pidginisation). The similarities are also obscured by the fact that there is nothing close to an agreed orthography for most of these, and pronunciation does differ a bit across West Africa. Linguistically, I'd probably merge them all, but practically that may not be the best decision. I know we have entries in pcm, but probably next to nothing for the rest, and if somebody wants to add them, given how each lect is very neatly assigned to a certain West African country, at least it won't be confusing for them to do so. Conclusion: the literature is schizophrenic, the lects mutually intelligible, and the existing situation remarkably unproblematic. Therefore I abstain. —Μετάknowledge^{discuss/deeds} 21:19, 16 August 2015 (UTC)

RFM discussion: July 2016–December 2020

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Balkan Gagauz Turkish

I see no evidence that this exists as a separate language, and move that it be merged with tr. The literature which references it seems to describe the dialect of Turkish which may be spoken by Gagauz people in the Balkan Peninsula. —Μετάknowledge^{discuss/deeds} 20:17, 3 July 2016 (UTC)

Wikipedia, citing Ethnologue, insists that Balkan Gagauz Turkish, Gagauz, and Turkish are all separate, and a few sources do seem to take that view, e.g. Cem Keskin, Subject agreement-dependency of accusative case in Turkish, or, Jump-starting grammatical machinery (2009) speaks of "Balkan Gagauz Turkish, Gagauz, Turkish, Iraqi Turkmen, North and South Azerbaijani, Salchuq, Aynallu, Qashqay, Khorasan Turkic, Turkmen, Oghuz Uzbek, Afshar, and possibly Crimean Tatar". Other references speak of Balkan Gagauz Turkish as a variety of Gagauz, e.g. James Minahan's Encyclopedia of the Stateless Nations says "The Gagauz speak a Turkic language also called Balkan Gagauz or Balkan Turkic, is spoken in two major dialects, Central and Southern, with the former the basis of the literary language. Other dialects Maritime Gagauz" (which comports with w:Gagauz's list of its dialects). Matthias Brenzinger's Language Diversity Endangered also treats Balkan Gagauz "or slightly misleading, Balkan Turkic" in his entry on Gagauz, but says that the Balkan "varieties might deserve the status of outlying languages but very little information is available about them." (A few generalist references seem to subsume all gag into tr.) I would leave them all separate, pending more conclusive evidence that they should be merged. - -sche (discuss) 23:58, 3 July 2016 (UTC)

I think there's some confusion about what exactly we're talking about, and whether it's Gagauz or Turkish. Just because they use the term "Balkan Gagauz Turkish" doesn't mean that they're referring to the language with ISO 639-3 code bgx. When I look at who's citing the references listed for bgx at Glottolog, Manević (the reference for its classification) is cited in papers clearly talking about the dialects of tr. These are the only actual words attributed to this lect that I can find. —Μετάknowledge^{discuss/deeds} 00:33, 4 July 2016 (UTC)

@Tropylium, on the subject of Turkic languages spoken in Europe, do you know anything about this one, and about its differences or similarity to Gagauz and standard Turkish? - -sche (discuss) 01:08, 11 May 2017 (UTC)

I'm not previously familiar with this dispute, but here are a few handbooks on the topic:

Menges in The Turkic Languages and Peoples has the following slightly complicated quote (p. 11): "The Turkic languages spoken farthest west are the Balkanic dialects of Osman and Gagauz in Bosnia, Bulgaria and Macedonia. These seem to form two groups, one of possibly pre-Osman origin, and a later Osman one. To the former belong the Gaǯaly in Deli-Orman (Eastern Bulgaria), who, according to V. A. Moškov, are descended from the Päčänäg, Uz, and Torci (?), the Surguč, numbering about 7000 people in the district (vilājät) of Edirnä, who call themselves Gagauz. In Moškov's opinion, they, too, go back to the Päčänägs (?) and the Macedonian Gagauz; they number ca. 4000 people in southeastern Macedonia." — It seems clear that some group(s) corresponding to "Balkan Gagauz" is being identified here, but I am not even sure how to parse the sentence structure; e.g. are "Uz" and "Torci" some of the pre-Osman Turkic groups, or some of the alleged ancestors of the Gaǯaly? ("Osman" is, of course, Turkish.)
Hendrik Boeschoten in a classificatory chapter in Routledge's The Turkic Languages mentions that "a few speakers in northern Bulgaria, Romania and Greece, adhere to the Orthodox faith, and have their own history." This again seems to refer to "Balkan Gagauz", but with no indication of being its own language.

So far I would gather from this that "Balkan Gagauz" is at most a sister language of "non-Balkan Gagauz", and perhaps indeed just a different dialect group (perhaps one whose features are not reflected in written standard Gagauz). But the Manević 1954 paper would be more informative on this topic, if anyone wants to hunt it down. --Tropylium (talk) 11:55, 11 May 2017 (UTC)

@Allahverdi Verdizade, Crom daba: Here's an old, unresolved issue that could benefit from Turkicist eyes. —Μετάknowledge^{discuss/deeds} 23:59, 8 September 2018 (UTC)

I think Balkan Gagauz should be merged with gag, especially since it contains no entries. The few terms that would be specific for Gagauz spoken outside of the traditional Gagauz area in Moldova/Romania/Bulgaria can be dealt with within gag entries. The only thing is that some etymologies of other Turkic languages sometimes refer to Balkan Gagauz instead of Gagauz, because editors didn't know the difference between the two. Otherwise I don't see any problems with merging the two.

On the other hand, Gagauz should definitely NOT be merged with Turkish, that is pretty obvious to me. Allahverdi Verdizade (talk) 05:09, 9 September 2018 (UTC)

@Metaknowledge This is a hard question, I can offer only guesswork.

I can't find any good maps for the distribution of Gagauz and (Muslim) Turks proper in the Balkans, most don't show Balkan Gagauz at all although we know they exist at least in Bulgaria and Macedonia.

It seems that they are not easily separated geographically from Muslim Turks although they presumably live in different localities. I'm guessing this means that their languages ("Balkan Gagauz Turkish" and "Rumelian Turkish") could be the same, although maybe only the latter call their language "Turkish", so I guess that they (would?) use Standard Turkish in education and administration.

This would be a good argument to merge Balkan Gagauz into Turkish, except that this paper shows that Balkan Turkic (if this really is a single language) is quite distinct from Anatolian Turkish and perhaps worth considering a different language. Baskakov also considers Balkan Turkish and (Moldovan) Gagauz to form a clade within Oghuz and Anatolian Turkish and Azerbaijani to form another. Crom daba (talk) 21:35, 30 September 2018 (UTC)

@Anylai, can you find anything in Turkish on the possible differences between Balkan Gagauz and Rumelian Turkish? Crom daba (talk) 21:38, 30 September 2018 (UTC)

Merge / delete it. The distribution of the name, the way it is “mentioned”, points towards it being a ghost language. The name is not attestable as used by anyone having particular information about it; nobody can add anything under it either in such a situation where it is a content-filled concept for nobody. Its alleged synonyms “Balkan Turkish” and “Rumelian Turkish” show it is just an SOP term for Turkish as spoken on the Balkans respectively Rumelia, i.e. remnant speakers of the Ottoman rule. German Balkantürkisch, distinguished from Türkeitürkisch as a regiolect. Fay Freak (talk) 13:38, 2 December 2020 (UTC)


RFM discussion: July 2016

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Marrithiyel

Maridan, Maridjabin, Marimanindji, Maringarr, Marithiel, Mariyedi, Marti Ke: should these be merged? References speak of a singular Marrithiyel language. - -sche (discuss) 21:30, 20 July 2016 (UTC)

RFM discussion: February–March 2017

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Chinese Pidgin English (cpi)

This is not a separate language at all, it's just English with different grammar and some loanwords, but other than that it's completely intelligible with standard English. As such, it should be moved to Category:Chinese English. -- Pedrianaplant (talk) 15:19, 8 February 2017 (UTC)

That's not at all the impression I get from Chinese Pidgin English. It seems to be a distinct language to me, as much as any other English-based pidgin. —Aɴɢʀ (talk) 16:45, 8 February 2017 (UTC)

We did delete Hawaiian Pidgin English in the past though (see Template talk:hwc). I don't see how this case is any different. -- Pedrianaplant (talk)

I know we did, but I didn't participate in that discussion (only 3 people did), and I disagree with it too, probably even more strongly than I disagree with merging cpi. —Aɴɢʀ (talk) 17:02, 8 February 2017 (UTC)

Basically, this is a terminological problem. There may have been a true pidgin in each of these cases, but it has not been recorded. What is called a pidgin in many descriptive works is instead a dialect of English that is very easy to understand, nothing like the real English-based pidgins and creoles that I have studied. If you look at the actual quotations used to support lemmas in Chinese Pidgin English, you find that it is Chinese English. Support merge, but leave as an etymology-only code. —Μετάknowledge^{discuss/deeds} 23:16, 8 February 2017 (UTC)

At least some texts seem very distinct, to the point of unintelligibility; consider "Joss pidgin man chop chop begin" (Whedon's translator begins chopping things? or "god's businessman begins right away"?). On the other hand, other sentences given by Wikipedia are quite intelligible...and possibly not attestable under the stricter CFI to which English is subject. I'm not sure what to do. (Our short previous discussion also didn't reach a firm resolution.) - -sche (discuss) 17:46, 8 March 2017 (UTC)
I mean, I use joss and chop chop in English normally (having grown up in a fairly Chinese environment likely has something to do with that)... and I think that was chosen as an especially extreme example. —Μετάknowledge^{discuss/deeds} 03:32, 25 March 2017 (UTC)

RFM discussion: November 2018–December 2023

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Cleanup suggestions for some badly attested Semitic languages, needing admin action~~

Discussion moved from Wiktionary:Grease_pit/2018/November#Cleanup suggestions for some badly attested Semitic languages, needing admin action.

Pray somebody add |scripts = {"Narb"} to Module:languages/data3/x after line 1026 for xna. (Otherwise mentions of words in it are shown in slanted letters.)
Added. DTLHS (talk) 03:17, 14 November 2018 (UTC)
It seems that even MediaWiki:Common.css needs a new class for Narb added, to get font-style: normal; Sarb is there and has it, Narb is not there. If the mention of a North Arabian word in عَنْكَبُوت (ʕankabūt) works then it is complete. Also I see that in Module:scripts/data Narb does not have direction = "rtl" while Sarb has. Fay Freak (talk) 14:43, 15 November 2018 (UTC)
Good catch. I've updated Common.css and Mobile.css and set it to display rtl. Sadly, it seems there are no fonts that display it. If you or I could find a good image of what the letters are supposed to look like, I might have time to make a basic font iff the letters don't have to be joined the way they do in Arabic. - -sche (discuss) 22:08, 18 November 2018 (UTC)
As an Arch Linux user, I got a big update three weeks ago that adds display support for Old North Arabian, among other things that improved Arabic and Syriac script rendering everywhere. gucharmap gives the name of the font as “Noto Sans Old North Arabian”, which I find in the file list of the noto-fonts package. @-sche Fay Freak (talk) 22:29, 18 November 2018 (UTC)
I think everything under Category:Old North Arabian script languages should be “Ancient North Arabian” (xna); it is a wonder that Dadanitic (sem-dad), Hismaic (sem-his), Safaitic (sem-saf), Taymanitic (sem-tay), Dumaitic (sem-dum), Hasaitic (sem-has), Thamudic (sem-tha) are separate languages on Wiktionary (some also with no script assigned). (Probably someone went through some lects and added all he found.) Those lects are at a level of attestation or study where it does not even matter whether they are dialects or languages, and “Thamudic” is even a collective term for any of the Ancient North Arabian lects not further classified. Many inscriptions cannot be classified into more specific lects anyway (you know, people also were nomads and wrote graffiti here and there) and they can only be entered as “Ancient North Arabian”. With words being found randomly and in concise consonantal writing I don’t see why one would pursue separation other than by stating the find spot.
Also, “Qatabanian” (xqt), “Sabaean” (xsa), “Minaean” (inm), “Harami” (xha, redirects to “Minaean” on Wikipedia), Hadrami (xhd) – likewise otiose distinctions, regarding form and amount of attestation of Epigraphic South Arabian, as the name says only epigraphically attested, without any vowels –, have been unpopular in use already; entries and etymologies use the header “Old South Arabian” (sem-srb). I suggest crossing those out. Etymology-only is possible so one can use those in {{cog}} when in an individual case a word is known to be attested as of one of the dialects. North Arabian epigraphy categorization is more complex and it is better anyway to mention in each etymology where a lexeme has been encountered.
1. Himyaritic (sem-him), as an attested language, is rather mythical because the Ḥimyarites wrote Sabaean. Wikipedia mentions “three Himyaritic texts”, at the same time in the Encyclopedia of Arabic Language and Linguistics s.v. we read about two: “It is not even possible to establish whether they were written in the same language. The first text dates from around 100 C.E. and the second from around 300 C.E.” And about the secondary material from Early Medieval Arabs: “It is easy to see that quotations from Himyaritic offer very different readings according to the manuscripts.” Or according to others, mentioned in the EALL, Ḥimyarite is the same as Arabic, only with peculiar features (which might as well derive from Arabicized transmission, or later language fusion or whatever, much that could fool us). It could be grouped with those spurious languages if this category held languages from Antiquity.
Gurage is according to Wolf Leslau, its most eminent scholar, one language with twelve dialects; others share this view. The material for this language, particularly by Leslau across his works, only lists words as “Gurage”, without qualifying if they are “Inor”, “Mesqan” or some other Gurage, so on Wiktionary one cannot simply give “Gurage” words (which has recently been done in Semitic comparisons by abusing the code of the largest dialect Sebat Bet Gurage, in spite of the source saying “Gurage”). The following dialects I find on en.Wiktionary as languages: Kistane/Soddo (gru), Mesqan (mvz), Sebat Bet Gurage (sgw), Silt'e (stv), Inor (ior), Muher (sem-mhr), Mesmes (mys), Chaha (sem-cha), Wolane (wle), Zay (zwa); some of these are considered subdialects of Sebat Bet Gurage. There are more I don’t find on Wiktionary. It’s perhaps like with the Aramaic dialects of yore or the Low German dialects today. People publish Westphalian dictionaries but it’s still Low German and so treated by Wiktionary. I suspect that instead of holding controversial subdivisions deriving from Ethnologue we should, holding to the sources, keep the Wiktionary-language level higher. The source for a certain word can be further qualified by labels as with Coptic. I mean that with language, unlike with biological taxonomy, one cannot simply assume that distinctiveness of a taxon is ascertained by experiments and then authoritatively published in some reference. As the individual forms are described in this dictionary, one must weigh if the data allows distinction at all. Currently it looks to me that hence Gurage must be lumped; I don’t know if, with new data or emerging different literary standards, separating the lects with separate codes will later be convenient (the increase in language material will be disappointing and unlikely someone will come and add Gurage in thousands of entries anyway, let’s be realistic), but I doubt that it would be comfortable.
See also Why is Old Novgorodian a separate language in Wiktionary? This is the question: Is the difference in data enough to justify separation? The actual language-dialect distinction does not matter, it must be seen functionally, for dictionary purposes. And if linguists publish material as “Gurage” the distinction is probably not good for Wiktionary headers. Isn’t it out of scope of Wiktionary to distinguish lect clusters when they are generally unwritten and chiefly written by and variously lumped and split by linguists? That’s a difficult question. Also I fear that such distinctions might be precisely the cause why nobody comes and pours out his rich Gurage knowledge. An adept would not be sure to distinguish, pendulating between two extremes, not witting if he should split as much as he can by all kinds of criteria or if to standardize and to abstract. To help though first all mentioned codes need the Ge'ez and Latin scripts both assigned, and the macrolanguage created. Maybe there will be late order from early ambiguity. Though I would perhaps do the order by lumping and labelling by location, were I that certain aficionado.

The obese Wiktionary:List of languages currently comprising 8055 lects needs cuts however. Fay Freak (talk)

This discussion really belongs at rfm, because that's where we normally discuss changes to whether or how we recognize a language. The Grease pit is for discussing how to implement something along those lines- not whether it should be implemented. The other option would be at the Beer parlour, but this seems like something that would benefit from the more specialized focus of rfm. Chuck Entz (talk) 03:39, 14 November 2018 (UTC)

Good distinction. I hesitated at 4:13 AM where to put it because of the mixed content. Moved. Fay Freak (talk) 14:16, 14 November 2018 (UTC)

Some prior discussion of Thamudic et al is on Category talk:Hismaic language; IIRC they were separated because literature does mention them as distinct entities, but if they were very similar or often treated as one language, and especially if there's difficulty in assigning specific texts to specific ones due to similarity, that would be an argument for reversing that decision and going back to the conservative approach of treating them all as one language with 'dialect'/'region' labels where appropriate.
(As to the venue, yes, these discussions tend to happen on RFM for quirky historical reasons — originally the discussions entailed actually merging or splitting language templates — although some have proposed the Beer Parlour as a more logical venue. There are minor benefits and drawbacks to either venue; this venue does have the advantage that discussions stay on the page until resolved.) - -sche (discuss) 17:20, 14 November 2018 (UTC)

I avoided Beer Parlour because I thought it is only for matters already affecting people, but it would not affect anyone we know now. Fay Freak (talk) 14:43, 15 November 2018 (UTC)

Who is likely to have access to resources on Africa's Semitic languages that could help judge what to do with Gurage? User:Metaknowledge, User:Wikitiki89? Wikipedia insists "The Gurage languages do not constitute a coherent linguistic grouping", which seems incompatible with merging them. William A. Shack, in his book on The Gurage, writes that "each Gurage dialect is usually understood only by its own speakers, and there is a rough correlation between the contiguity of dialect groups and the extent to which their dialects are mutually intelligible." (Steven Danver, in his (general-focus) encyclopedia, says "the languages of the different groups of Ethiopian Gurage are seldom mutually intelligible.") Marvin Lionel Bender, in his 1976 Language in Ethiopia, says "Although seventeen varieties of Gurage dialects are listed, mutual intelligibility reduces this to four languages and three dialect clusters as follows (Hetzron classification):
  Gogot, Misqan, Muxir, Soddo
  East Gurage (Inneqor, Silti, Urbareg, Weleni, Zway)
  Central West Gurage (Chaha, Gumer, Gura Izha)
  Peripheral West Gurage (Ener, Geto, Indegegn, Innemor)"
However, his very next sentence is: "Gogot, Muxir, Soddo comprise a geographical (non-genetic) grouping of non-mutually-intelligible languages known as 'North Gurage'", all of which seems to suggest that merging all of the Gurages would not be sound.
- -sche (discuss) 17:28, 14 November 2018 (UTC)

The cited grouping of course adds to the confusion. Three languages, but four dialect clusters, not mentioning their intersections? Well, we will not find out how one should see them without deep-diving. But the question is which direction Wiktionary should go: likely the current division is not correct. Should Wiktionary just add all possible splits so they can be cleaned up later when someone would commit himself to add the whole Gurage and judge about which distinctions are most convenient, or should we have one macro-code because distinction is hopeless? The reason why I have even mentioned Gurage is that for example Leslau’s Etymological Dictionary of Geʿez, which I like to use, just gives words as “Gurage”, which sounds like there is a common vocabulary. Fay Freak (talk) 14:43, 15 November 2018 (UTC)

Perhaps you can deduce from Leslau's literature list which Gurage language he gets his data from? He seems to have written an etymological dictionary of Gurage as well, presumably its foreword could clear things up.

His own field studies. I had linked his Etymological Dictionary of Gurage (“according to Wolf Leslau” etc.). Fay Freak (talk) 15:23, 17 November 2018 (UTC)

As a volunteer project (run on fancy), we really have no other choice than to wait for someone to investigate the matter deeply and order the languages in a manner that facilitates their lexicographical work.

Maybe we need non-genetic language group categories and ways to give forms in unidentified languages belonging to language groups. Crom daba (talk) 15:49, 15 November 2018 (UTC)

@Fay Freak, -sche: A bit late, but here are my responses to the three outstanding problems (your #2–4):
1. It is fairly evident that Ancient North Arabian is not a single language, and I advocate that xna be abolished rather than the specific language codes; read Al-Jallad (2018), "What is Ancient North Arabian?". He sees Safaitic (which he has written a grammar of) and Hismaic as being of the same continuum as Old Arabic, but they are obviously too distinct from Classical Arabic for lexicographical purposes. He supports the distinctness of the others as languages, and of the various "Thamudic" lects. Based on Al-Jallad, I would prefer we split Thamudic B, C, D, etc as necessary; each language will have a very small corpus, but it seems like the most honest way to do it, and if more inscriptions are found, the lettered Thamudic wastebaskets will probably get their own names as the others did.
2. Old South Arabian is also not a single language, though Sabaean was the standard that the other lects imitated, and I advocated that sem-srb be abolished as well. Multhoff (2019) in The Semitic Languages makes the case for four distinct languages: Sabaean, Minaean, Qatabanian, and Hadrami. She makes no mention, however, of Harami. Macdonald (2000), "Reflections on the linguistic map of pre-Islamic Arabia" explains that "Harami" is a name given to a few Sabaean texts that seem to have been contaminated by other Semitic languages, which is not at all an unusual feature and not unique to that site, so I suggest we remove that code.
  1. As for Himyaritic, I now think I was wrong to include it. There are three texts often attributed to it, but see Stein (2008), "The ‘Himyaritic’ Language in Pre-Islamic Yemen", which makes a strong argument to consider these as simply very late examples of Sabaean, which is indisputably the language of the other texts of the region in that script.
3. Finally, for Gurage, the chief problem is that some scholars follow Hetzron in saying that Gurage is polyphyletic, in which case lumping would be committing a grave error (and the same charge has been levelled for Aramaic, with perhaps more evidence). Meyer (2011) in the International Handbook does seem to support the unity of Gurage, and treats the lects together, which gives me hope for lumping, but he is unwilling to commit to whether they should be considered dialects or languages. I think your Gurage-adding genius is mythical, so we have to choose which is least bad: many languages with scanty coverage, because their forms may be similar to forms entered under a different L2 header; or one Gurage language with decent coverage, but many forms that are not marked for what dialect they belong to and therefore a poor resource. I hesitantly support merger, given those choices. —Μετάknowledge^{discuss/deeds} 03:13, 10 August 2020 (UTC)
4. An addendum: "Hadrami" is a terrible name for xhd, and invites confusion with Hadrami Arabic. Wikipedia uses "Hadramautic", but N-grams and a quick literature review suggests that "Hadramitic" is more common. @Fay Freak, -sche again (yes, I know I'm pestering, but I don't want to move forward on all this alone, both because I am fallible and because some of these, particularly splitting OSA, would require a bit of work, although in that case there is an online corpus that will help immensely). —Μετάknowledge^{discuss/deeds} 02:40, 17 August 2020 (UTC)

Re North Arabian: Many works I browsed through speak of Old North Arabian as a unit with dialects, but also carefully specify what lects (including Thamudic B vs C, etc) words are attested in. Some imply, in their presentations, that a large number of words are identical between dialects, at least in the sample of vocabulary that they're treating (e.g., the pronouns treated in Roger D. Woodard, The Ancient Languages of Syria-Palestine and Arabia (2008), pages 197-198), though this seems to be because the authors are presenting 'normal', normalized and romanized forms, given Al-Jallad's evidence that words (even the supposedly distinctive definite article) varied not just among dialects but even within the writings of individual speakers. The native script also loses many possible differences in pronunciation, but then, we are a written, writing-based dictionary. I find slightly more works speaking of "Ancient North Arabian dialects" than "Ancient North Arabian languages", and the fact that some authors have argued the varieties are the same language not only as each other but even as Arabic itself does suggest a high degree of similarity (or that the scholars in question are lumpers). As we're dealing with small, extinct and apparently clearly delineated corpora, it seems like the conservative approach of treating each under its own L2 could be better, and we could retire xna ... unless we need it as a wastebasket for unsorted things, which Al-Jallad (and Fay Freak, above) suggests we would. (Bah, It's messy business, deciding what's a language and what's a dialect...) I will try to dig into the rest later. - -sche (discuss) 04:10, 19 August 2020 (UTC)

Well myself I have added Sabaean, Minaean, Qatabanian entries meanwhile, understanding and quoting a few inscriptions, although apart from some occasional features I noticed little how such an inscription can be classified as either, other than by provenience or rulers or gods mentioned—but that must be due to my blasé comparative approach that also makes me read Romance without recognizing the individual language. So somehow the volition to a merge is gone, though the lumping codes “Old South Arabian” and “Old North Arabian” must be kept for inscriptions no one has classified. Both are useful.

For Himyaritic, however, nothing is left. As here said already, the three alleged Himyaritic inscriptions don’t even need to be in the same language, and they aren’t even from anything to be called Ḥimyar (there are “Lesser Himyarites” and “Greater Himyarites” and the ethnic identity is fragmentary, too, by the way). In the “Critical Reevaluation” of the Ḥimyaritic language – cited by Wikipedia on Himyaritic language one does not know what for: their “undeciphered-k language” header recently introduced is surely a made-up term, oddly suggesting that these inscriptions are yet another language when those “k-language” inscriptions are exactly those otherwise claimed for Himyaritic, so we see Wikipedia editors had no clue and phantasize together languages due to their disdain for primary sources – helpfully includes a map, also coming to the conclusion “we have no reason to assume the existence of some “non-Ṣayhadic” language in pre-Islamic Yemen that was spoken besides the (Late) Sabaic idiom known from the inscriptions.” That from the fact that “Himyaritic” words typically given from Arabic sources are all also found in Sabaic, and the grammar found in the three inscriptions, including the prefixed instead of postfixed article which is only found in two of them, is too either found in Sabaean or can well be ascribed to their being poetry, which is also the reason for their being poorly understood. Many Arabic poems are also hard to understand and mostly helped by the copious material for the language which is not the case for languages with so limited a corpus, like Old South Arabian. Even in the Digests, Latin prose, not all passages are of discoverable meaning.

What would hinder man though to add understood words with quotes from the ominous inscriptions as Sabaean? Or anything from Arabic sources transmitted as Himyaritic instead of Arabic as Sabaean? For there is no evidence for it being a particular language. You see, from the corpus-based standpoint Wiktionary takes Himyaritic must go. Nothing can get the header “Himyaritic”, it can only be mentioned at Sabaean or Old South Arabian entries that Himyaritic nature is suggested by those who have come to believe in this extraordinary claim for which extraordinary evidence is not provided. Fay Freak (talk) 04:18, 6 August 2021 (UTC)

I went on and moved our only “Ḥimyaritic” entry after that famous sentence to Yemeni Arabic in which the word طَيِّب (ṭayyib) for “gold” turns out otherwise known, and to be nothing else than Classical Arabic طَيِّب (ṭayyib, “good”) meaning “refined” and therefore gold, while Old South Arabian could not have developed such sense, so it is clear the famous quote one has been so inept to classify is at best only macaronic Sabaean-Yemenite Arabic. It is well put by Marijn van Putten:

The Arab grammarians were interested in describing correct usage of language of Classical Arabic. It is quite clear that Himyaritic (and by extension Yemeni Arabic) did not fall in the category of 'correct usage'. Within this context, it is of course not surprising that anything that is "wrong" and from Yemen might be denoted as Himyaritic. This would then include both varieties of Yemeni Arabic and some surviving vestiges of Ancient South Arabian. Fay Freak (talk) 04:59, 13 September 2021 (UTC)

Now also in a new article by Koutchoukali like communis opinio, though his blogs transpire by him stalking Wiktionary: later Muslim historians would refer to anything related to South Arabia’s pre-Islamic history as “Himyaritic,” all memory of its other states having passed away. Fay Freak (talk) 01:01, 18 September 2021 (UTC)

@-sche, Theknightwho: So are we gonna delete Himyaritic now? I promise there is nothing to add in it, and I have ever been averse to use the code, as on بطيخ; everything one needs to know about it is in the dictionary entry Himyaritic. Everything else in this five-year old topic is resolved:

Old South Arabian and Ancient North Arabian are both necessary evils in addition to individual languages of the Old South Arabian and Ancient North Arabian families because inscriptions are all over the place and sometimes not informative enough for specific identification, so we can’t remove the languages like we have removed Nahuatl, but like for Nahuatls the language treatments use to be sharp. Gurage nobody will reliably untangle in the near future; we may have a family under sem-eth for the “Gurage cluster”. Fay Freak (talk) 03:07, 6 December 2023 (UTC)

OK, per the discussion above I have removed Himyaritic. - -sche (discuss) 14:43, 7 December 2023 (UTC)

I struck the thread then. Fay Freak (talk) 00:52, 8 December 2023 (UTC)

RFM discussion: September 2018–February 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Arawak and Island Carib~~

Any objections to me renaming Arawak arw (4 entries) and Island Carib crb (0 entries) to Lokono and Kalhiphona, respectively? Arawak is easily confused with the Arawak/Arawakan proto language and family, and Carib is one of two often confounded languages, the Carib language and the Island Carib language. --Victar (talk) 04:03, 6 September 2018 (UTC)

No objection to renaming Arawak, but I'm not sure about Kalhiphona, which seems to be quite rare even on a Google web search, and which seems to invite as much possible confusion (in its various spellings) with the various spellings of Garifuna as it avoids with other "Carib"s. - -sche (discuss) 06:56, 19 September 2018 (UTC)

Arawak → Lokono Done. Island Carib → Kalinago (rather than Kalhiphona, after further discussion at #Arawak and Carib again) Done. — Vorziblix (talk · contribs) 20:38, 2 February 2024 (UTC)

RFM discussion: July–August 2021

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Lenape

The decision to deprecate Lenape as a canonical language needs to be revisited. It seems to me that removing Lenape as a canonical language is like removing French as a canonical language and instead having only Quebecois and Parisian. Classifying Wënami and Munsee as separate canonical languages is a narrow linguistic differentiation that quickly spills into the political. In a 2013 discussion (archived at Category talk:Unami language) the difference between Wënami and Munsee was compared to English and Scots. This is a faulty comparison because the political and cultural situation is in no way analogous (and language versus dialect is always a deeply political not just academic subject). The continuum of dialects once spoken in Southern New York, New Jersey, Delaware, and Pennsylvania was on the brink of disappearing forever. What we are witnessing now is a process of standardization that could revive the Lenape language. I hope the Wiktionary community would not want to contribute to fracturing such efforts. Unnecessary rigidity about what is to be considered a canonical language could jeopardize such standardization and revival. I strongly urge everyone to reconsider having a canonical language category for Lenape, and specifying the dialect of origin for specific terms, usages, and grammatical conventions. Note that all orthographies for Lenape are relatively recent inventions based on various European languages, and that standardization here is more than warranted. Andreas.b.olsson (talk) 05:17, 29 July 2021 (UTC)

@-sche Chuck Entz (talk) 05:45, 29 July 2021 (UTC)

Also pinging @DCDuring as a participant in the previous discussion. —Mahāgaja · talk 06:06, 29 July 2021 (UTC)

This is tricky. Some people want to document each language as such, especially with an eye to their past as mutually unintelligible languages (Marianne Mithun, The Languages of Native North America, 2001, page 331; Siebert even seems to suggest Munsee was more closely related or similar to Mahican than to Unami) with extensive differences in orthography and phonology (e.g. in having l vs r in their reflexes of PA's *r/*l phoneme). (Archaeological as well as linguistic evidence suggests the distinction between the two groups goes back to prehistoric times.) Merging northerly Munsee into the now more dominant southern Unami could complicate documentation of that critically endangered language, which some people are studying and learning from its remaining speakers in Canada. On the other hand, the most prominent revitalization efforts in the United States, based mostly on southern Unami, do seem to speak of Lenape as a single language (and FWIW, Canadian Munsee efforts, albeit with a different spelling, also seem to speak of a Lunaape language), apparently aiming to standardize the two into one language with an eye towards keeping it alive into the future. We have to think carefully about what to do here. - -sche (discuss) 17:52, 29 July 2021 (UTC)

I have nothing to contribute beyond the wish for some precision in the etymologies of toponyms, which can easily be accomplished with qualifiers or labels without complicating the creation of good language entries. DCDuring (talk) 18:33, 29 July 2021 (UTC)

You say on Talk:mochipwis that "I'm a learner of modernized Lenape, a language in the process of being revived. I am not an expert in the differences between Wënami versus Munsee. What I do know is that there are no first-language speakers of either dialect left." This goes to the heart of the issue, I think: Wiktionary does not only cover modern languages as they are currently spoken, but also covers languages that existed in the past. Although (as noted in the earlier discussion) I initially created "Lenape" content under a unified code, following the "lumper" approach of the US revivalists, the differences between the lects are (as Chuck noted in that same discussion) historically extensive, to the point that the modern linguistic literature I've been able to find that says anything about their intelligibility accepts Mithun's statement that they were mutually unintelligible, which militates against combining them. (OTOH, if modern speakers are trying to merge them, that militates in favor of a merger.) Wiktionary does merge e.g. many Sinitic languages even when they're mutually unintelligible when spoken (and conversely we split e.g. Bokmål and Nynorsk even though they're not merely mutually intelligible but the same language), so either approach could be made to work. - -sche (discuss) 19:29, 29 July 2021 (UTC)

@-sche: What about recognizing all three codes? We could have Lenape (either using del or creating a new code for it, e.g. del-len, so as to keep del for the family) for the modern language undergoing revitalization, and also have unm and umu for the historical stages. —Mahāgaja · talk 19:57, 29 July 2021 (UTC)

Adding a link here to the important Lënape Talking Dictionary maintained by the Tribe of Delaware Indians. It should be recognized that the code del (for Delaware) is potentially problematic and controversial. If I say in New York City “I speak a little Delaware” I’m likely to get quizzical looks. But if I say “I’m learning Lënape” an educated resident of the city is likely to remember social studies classes in elementary school, contextualize, and understand what I mean. Wouldn’t it be better to have a new code for modernized Lënape that contains the letters in Lënape? Andreas.b.olsson (talk) 21:56, 29 July 2021 (UTC)

We use the ISO 639-3 codes for languages, so that's out of our control. They're not going to change the code; they've refused for www, which has some significant real-life problems. In practice, the codes may sometimes look like abbreviations, but in theory, there's no necessary connection; it's just a three letter code.--Prosfilaes (talk) 01:30, 30 July 2021 (UTC)

@Prosfilaes there is no ISO designation for the Lënape variants. So what language does the ISO code del stand for? Old Wënami? Modern Lënape? Munsee dialects still spoken in Moraviantown? Andreas.b.olsson (talk) 02:50, 30 July 2021 (UTC)

@Andreas.b.olsson: In the ISO itself, del stands for a macrolanguage called Delaware whose individual languages are Munsee umu and Unami unm (see https://iso639-3.sil.org/code/del). We're not obliged to use ISO's names, though, so we are free to call del Lenape or Lënape rather than Delaware. Don't get hung up on the letters used for the code; they don't have to bear any relation to the name of the language (xcl stands for Old Armenian, for example, and the code for Mapudungun is arn). —Mahāgaja · talk 07:40, 30 July 2021 (UTC)

@Mahagaja: Fair enough about the abstraction. I did not know that ISO had a more complex sub-categorization. Thank you for providing the link and helping my knowledge about these standardizations grow. My concern was related to the fact that del is derived from Delaware – the European name for the region – and not Lënapehokink, the Lënape name. However, as far as I understand the Lënape are themselves comfortable with the syncretic designation and retain it in their name (Nation of Delaware Indians). I should therefore not assume any offending connotation in the designation del. I’m trying to reach out to the Oklahoma branch of the Lënape. Let’s see what they say about this issue. As a dominant constituency they should have a say about this actively evolving linguistic situation. On an additional note, there is a significant difference in the orthography being actively taught by the mainly Munsee branch in Moraviantown Canada. Andreas.b.olsson (talk) 08:49, 30 July 2021 (UTC)

I’m not sure there is anything wrong in the abstract splitting of the dialects/languages. It’s the naming in Wiktionary for these lects that does not reflect how these lects are referred to in actuality (where there is unfortunately ambiguity). A proper and respectful name is needed for the following variants:

Variant currently taught at Talking Lënape Dictionary and extensively documented by Jim Rementer et al in recent history (late 20th century).
Variant currently taught in Moraviantown where there seems to be wide variations between the last handful of first-language speakers. I’m not sure any of them are still alive but the community seems to be sticking with their own historical orthographical variant rather than using the revised system of Rementer et al. (the one I’m learning and using).
Historical variant as recorded in the time of David Zeisberger (18th century).

The modern variant in Moraviantown seems to be referred to alternatively as Lunaape and Munsee Language. The modern variant based on Rementer et al.’s work is consistently referred to as Lënape or alternatively without the schwa (Lenape). Andreas.b.olsson (talk) 13:06, 30 July 2021 (UTC)

How about the following naming:

Lënape
Lunaape
Old Lenape

The distance between the lects spoken by the Unami and Munsee tribes at the time of Zeisberger and prior seems largely speculative. Andreas.b.olsson (talk) 16:27, 30 July 2021 (UTC)

For the purpose of historical linguistics Old Lenape can be divided into:

Southern Unami
Northern Unami
Munsee

Note here that Unami seems to be a term derived from historical linguistics. I believe it should not be used to denote any of the revitalized lects currently in use. However, the Stockbridge-Munsee do refer to their Lunaape variant in English as the “Munsee language”. Andreas.b.olsson (talk) 14:54, 31 July 2021 (UTC)

Unless the Stockbridge-Munsee Community object, I would suggest using Lunaape as the name of their orthographic variant and dialect. Andreas.b.olsson (talk) 15:04, 31 July 2021 (UTC)

Here is what I propose in terms of naming, categorization, and ISO encoding.

Old Lenape: del
1. Munsee: umu
2. Unami: unm
Lënape: lnp
Lunaape: lne

LNP and LNE are not used yet by ISO 639-3. Andreas.b.olsson (talk) 15:29, 1 August 2021 (UTC)

North and South Unami can be treated as minor variants (i.e. dialects). Munsee is treated as distinct, leaving the question open about whether it was more closely related to Mohican. Andreas.b.olsson (talk) 15:35, 1 August 2021 (UTC)

We can't be making up our own three-letter codes, but we can put them after a family code, separated by a hyphen. So if we wanted to, we could in theory create alg-lnp, alg-lne, etc. —Mahāgaja · talk 10:08, 2 August 2021 (UTC)
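The code shapes mentioned in this exchange can be illustrated with a minimal sketch. It assumes only the two shapes named above (a bare three-letter code such as unm, and a family code plus hyphen plus three-letter suffix such as alg-lnp); the function name and the category labels are hypothetical, and real Wiktionary codes may take other shapes that this sketch does not cover.

```python
import re

# Assumed shapes, taken only from the discussion above:
#   "unm", "del"  -> bare three-letter code (ISO 639-3 style)
#   "alg-lnp"     -> family code + hyphen + locally chosen three-letter suffix
ISO_STYLE = re.compile(r"[a-z]{3}")
LOCAL_STYLE = re.compile(r"[a-z]{3}-[a-z]{3}")


def classify_code(code: str) -> str:
    """Return a hypothetical label describing the shape of a language code."""
    if ISO_STYLE.fullmatch(code):
        return "iso-style"
    if LOCAL_STYLE.fullmatch(code):
        return "local"
    return "unknown"


for candidate in ("unm", "lnp", "alg-lnp"):
    print(candidate, classify_code(candidate))
```

A bare lnp is classified the same way as unm, so nothing in its shape marks it as locally invented. That is the collision risk raised in the discussion, whereas the hyphenated alg-lnp form cannot be mistaken for an ISO 639-3 code.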

Why can’t we be making up codes if they are not used? This can be in conjunction with a request to ISO that these lects be added with those codes. It’s one thing to request a change, another to treat the ISO 639 as extensional. With its increased influence Wiktionary ought to have some influence on ISO. Andreas.b.olsson (talk) 16:40, 2 August 2021 (UTC)

We can't make up codes because ISO might assign them to some other languages in the future. I think you might be overestimating Wiktionary's influence; also, the ISO committee would take a whole lot of convincing before they agreed to two more codes. Considering you started out this discussion arguing against treating Unami and Munsee as two different languages, it seems like it would be quite a stretch to convincingly argue that they are, in fact, five different languages. From everything you've explained above, I'm starting to think we should recognize only del as a canonical language and reassign unm and umu to be etymology-only codes for subvarieties of del; if necessary, we can add more etymology-only codes (since these are local to Wiktionary and have nothing to do with ISO) for other chronolects and regiolects. —Mahāgaja · talk 18:38, 2 August 2021 (UTC)

@Mahagaja I think you’re low-balling Wiktionary’s growing influence, and a few people’s ability to influence seemingly immobile institutions. My point has evolved based on @-sche’s input about linguistic historicity and the strong difference between orthography used by those who learn Lunaape versus Lënape. Importantly, there is extremely strong merit in treating the language recorded by Zeisberger et al in the 18th century as a different language. It is grammatically quite distant from Lënape and Lunaape. Note that the del, unm, and umu seem based on the historical study by Moravian missionaries (who referred to the language as “Delawarische Sprache”). There needs to be a clear distinction between these historical lects (which I’m referring to as “Old Lenape”) and the languages spoken today. Andreas.b.olsson (talk) 03:26, 3 August 2021 (UTC)

Seeming immobile to whom? They've seemed relatively responsive to me; they offer a final decision on most requests in an annual wrapup. I certainly question any need to bluster ahead and then expect SIL to follow; that seems likely to be counterproductive.--Prosfilaes (talk) 09:31, 4 August 2021 (UTC)

I would also note here that the German entry in Wikipedia is titled “Delawarisch” and talks about it first as a single language, and then alternatively two languages depending on one’s view. The equivalent English entry is “Delaware languages”. However, in both entries the focus is on the language(s) studied by Moravian missionaries et al, and then linguists studying those initial European studies (which haphazardly invented wildly different orthographies based on the authors’ own mother tongues). @-sche is right: it’s complicated. Here are the facts:

Old Lenape is grammatically very different from modern variants.
There are two very different modern orthographical variants.

If this were Norwegian, you would without hesitation have Gammelnorsk, Bokmål, and Nynorsk. Andreas.b.olsson (talk) 04:01, 3 August 2021 (UTC)

So if we are going to base decisions on precedent – which one ought to – then the way Wiktionary has approached the orthographically distinct Norwegian Bokmål and Nynorsk speaks for having distinct language codes for Lënape and Lunaape regardless of whether they should be treated as the same language or not. Unfortunately, del really in the end refers to Old Lenape since the ISO further subdivides it into Unami and Munsee, which are really references to the language(s) studied by Zeisberger et al and those who continue to study their studies. To use it for Lënape and Lunaape would be confusing and improper. Andreas.b.olsson (talk) 04:25, 3 August 2021 (UTC)

This is not Norwegian, and there's no other language that is presented like Norwegian. The closest are Ottoman Turkish and modern Turkish and Urdu and Hindi, and in both of those cases you're looking at different scripts, Arabic and Latin or Devanagari plus vocabulary divergence. One has to look at all the precedent, not just one example.--Prosfilaes (talk) 09:31, 4 August 2021 (UTC)

On a last note for tonight, the codes alg-lnp and alg-lne proposed by @Mahagaja for Lënape and Lunaape could be a reasonable compromise. Andreas.b.olsson (talk) 04:34, 3 August 2021 (UTC)

IMO, as someone who has added a good portion of the Munsee entries, unless someone can demonstrably show the need, I would say the way things are set up right now is perfectly fine. More work should be made to show the linguistic differences before any new codes are created. This all seems like hypothetical conjecture to me. --{{victar|talk}} 05:25, 3 August 2021 (UTC)

Having looked closer at Zeisberger’s work, I should say that how distant Old Lenape is from Lënape and Lunaape is uncertain to me. There seems to have been a lot of simplification in how verbs are used, with certain modes falling greatly out of favor (e.g. the subordinate mode). The issue is complicated by the very different orthography used by Zeisberger, who used German-inspired phonological conventions. If Wiktionary were to want to document these changes though, it ought to leave a space for Old Lenape. The issue reminds me of Swedish, which went through a drastic change in orthography in the late 19th and early 20th century.

I think what I get back to is that Unami is not the proper L2 name for a language. The proper name for this language is Lënape. So even if we keep the coding structure intact, the name of that lect needs to be changed (Unami => Lënape). Andreas.b.olsson (talk) 09:24, 3 August 2021 (UTC)

Out of curiosity @Victar, how come you chose Munsee as the name of the L2 lect and not Lunaape? I see that the community in Moraviantown uses either designation. Nonetheless, what made you choose one over the other? If you are a member of the Stockbridge-Munsee community, please help me better understand and my sincerest apologies for potentially overstepping. Andreas.b.olsson (talk) 09:36, 3 August 2021 (UTC)

We (Wiki/Wikt) base much of our language nomenclature and classifications on the recommendations of Ethnologue. --{{victar|talk}} 19:15, 3 August 2021 (UTC)

I guess another thing that bothers me is that the code unm – however abstract in the mind of the Wiktionary community – carries the stigma of a split. Its continued use will cause continued sclerosis. It does not seem a term that all parties – Lënape in Oklahoma, Munsee in Canada, and enthusiasts like myself in New York City – can all rally around. It seems to me that there is something quite new going on, an amalgamation and standardization that should be captured in vivo by Wiktionary. Andreas.b.olsson (talk) 10:14, 3 August 2021 (UTC)

(Sorry, I was busy/distracted.) If we could reach out to Lenape folks, including Munsee speakers or learners, for clarification of whether they consider the lects one language (in particular, if Munsee consider their language to be the same as Unami, or if it is only the US-based folks running the Unami-based Lenape Talking Dictionary who name it as if it speaks for all Lenape), that'd provide useful evidence: if speakers want to consider the lects one language, it'd be evidence that we should merge (and distinguish with qualifiers, etc); otherwise, the current setup is fine. It seems some of the objection comes down to just the names, preferring "Lenape" to "Unami"; in another discussion Andreas notes that speakers would sooner say the equivalent of "I speak Lenape" than "I speak Unami", but this is not evidence that they are one language (consider various Dogon languages, which are all Dogons though not all mutually intelligible, including Bangime language whose speakers consider themselves Dogon who "seem unaware that is not mutually intelligible with any Dogon language"). Simply renaming "Unami" to "Unami Lenape" (and "Munsee" to "Munsee Lenape") is an option, although we usually try to use the most commonly used names for languages, which may be the current names. The shape of the codes is a non-issue since they're mostly not visible to readers, only editors. This discussion does make me realize that they're reader-facing in the names of certain categories, and that this is probably confusing, but that's a general issue not specific to Lenape. (How many people looking at our Khanty content would guess that the list of berries is at "Category:kca:Berries"? Shouldn't we always use language names?) There's no linguistic basis for introducing more than one new code (e.g. a five-way split!), as floated above. - -sche (discuss) 02:24, 5 August 2021 (UTC)

I’m the newcomer here @-sche. Though I have used Wiktionary for a long time, I did not prior to this participate in its maintenance. I will defer to you all on the best back-end encoding, and whether proliferating codes for anticipated future linguistic differentiating makes little sense. My objection – and this was not clear to me before either – seems reduced to the fact that “Lënape” was stripped from how the lect(s) are referred to. I think there is sufficient online evidence to make the affirmation that the so-called Unami branch is almost exclusively referred to as Lënape. Therefore I think it would be warranted to make this change as soon as possible. Having looked at quite a few of the “Unami” entries it seems to me that they are almost all derived from the work of the Talking Lënape Dictionary project. Hence the issue of the distance between the lect recorded by Zeisberger et al versus the more recent collected speech samples of the Talking Lënape team is moot, and can be resolved later. Note though this may mean having to encode for another lect at some point in the future (especially since the spelling used by the Moravian missionaries was based on German and very different). I agree fully that the Munsee community should have a say in whether to treat their lect as the same as that taught by the Talking Lënape Dictionary project. It should be noted though that it can be factually stated that they have to date used a very different orthography in lessons and materials they have posted online. Andreas.b.olsson (talk)

@-sche, @Mahagaja, @Victar can we split out the decision about the naming for the lect with the code unm and rename it from Unami to Lënape? There is overwhelming evidence that Lënape should be the proper name of the language, and not Unami. The code issue and whether to treat the Munsee branch as a separate language (and what to call it) can be dealt with separately and later.

@-sche, @Mahagaja, @Victar asking again about splitting out the issue of the language's proper name and making an initial narrower decision. I would like to continue building out Wiktionary's Lënape entries, but first I would like to make sure the language currently referred to as "Unami" is referred to by its proper name. Andreas.b.olsson (talk) 13:01, 12 August 2021 (UTC)

I have no objection (other than that it's hard to type) to renaming unm Lënape. —Mahāgaja · talk 13:10, 12 August 2021 (UTC)

AFAICT "Lenape" is a far more common name than "Lënape", so renaming "Lenape" to "Lënape" does not seem appropriate. And because Unami is only one variety of Lenape, renaming Unami to Lenape/Lënape would be confusing. The difficulty seems to be that the main US-based language project produces its Talking Dictionary seemingly based on Unami but named as if it's one overarching "Lenape"/"Lënape" language. (I know this kind of thing has come up before, where a prominent dictionary of what it considers "a language" doesn't specify which of two actually-distinct lects its words are, or conflates them; I will track down other examples if necessary, but it seems tangential.) I don't want to stand in the way of a Native language revitalization effort, so the above-discussed option of leaving Unami and Munsee for the historical (and historically not mutually intelligible) varieties and making Lenape a language code (rather than a family code) for the "unified" language has some appeal, but it would entail duplicating a lot of content which is currently Unami. There is also the option of renaming Unami to "Unami Lenape" to get the "Lenape" name in there. - -sche (discuss) 19:19, 30 August 2021 (UTC)

Again, I agree -- renaming Unami to Lenape would simply cause confusion whilst adding no benefit and renaming it to "Unami Lenape" does not serve to disambiguate. --{{victar|talk}} 20:13, 30 August 2021 (UTC)

I’m not sure I understand for whom this would be confusing @Victar. I agree @-sche that Lenape may be simpler to type than Lënape. Anyway, the schwa indicator is optional in the dominant modern dialect of Lënape. I have a deep hunch here that Linnaean taxonomic orthodoxy and perpetuation of erroneous past anthropological claims are at fault for the strict linguistic distinction between Unami and Munsee. The more I study Lënape, the fewer distinctions I see in the various dialects except for in the choice of orthography (which noticeably stems from quite recent European attempts to record an unwritten language). Andreas.b.olsson (talk)

I suspect that Wiktionary is perpetuating a European construct erected between the late 18th and early 20th century. Of course, these faulty past errors of European linguists have had real effect, seemingly causing “two languages” to appear to exist. I need to further study the Munsee branch to substantiate my claims.

I continue to believe Unami is a culturally and linguistically inappropriate term for any dialect of Lënape. If the language spoken by the descendants of the Unami tribe of the Lënape were to have any other name than Lenape – which I insist is the appropriate designation – it might be Delaware. This is a name I believe the (predominantly) Unami descendants living in Oklahoma accept and sometimes use in English despite its European origins.

Wanishi Andreas.b.olsson (talk)

RFM discussion: January–February 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Arawak and Carib again~~

Resurrecting an old discussion on the same subject above, I propose we rename Arawak (arw) and Island Carib (crb) to Lokono and Kalinago, respectively. As User:Victar noted in that discussion, Arawak is easily confused with the Arawak/Arawakan family. (Indeed, I have run into many etymologies where someone has mislabelled a word from a different Arawakan language as Arawak). Island Carib, meanwhile, is (1) not a Cariban language, making etymological discussions occasionally confusing; (2) no longer generally called by that name, since the people are now officially called the Kalinago by the Dominican government; (3) easily confused with the unrelated 'true' Carib language, which we call Galibi Carib.

I would also add this latter language (the one we currently call Galibi Carib (car)) to the discussion, and suggest it should be renamed either Kari'na or Kari'nja, names that are both more common in the current literature and closer to the native name (karìna in Courtz’s orthography). (The most common name for this language in the literature is actually simply Carib, but this may itself be confusing.) — Vorziblix (talk · contribs) 15:56, 3 January 2024 (UTC)

Support: Obviously still support renaming both to Lokono and Kalinago, respectively. I can't speak to Galibi Carib as I'm not familiar with the academic literature. --{{victar|talk}} 16:22, 5 January 2024 (UTC)

I hope that the entries Arawak, Island Carib, Cariban, Kalinago, Galibi Carib, Kari'na, Kari'nia, and Carib have or will have all the definitions and attestation to help relatively normal users through this. DCDuring (talk) 18:54, 5 January 2024 (UTC)

I’ve added most of those missing entries (and improved one or two others), and have also changed Galibi Carib (car) to Kari'na, since I am actively working on that language and would rather make the change while our coverage is still small. I will leave the discussion to run for a few more weeks before changing the others, and we can always also revert/change Kari'na to something else if the discussion turns that way. — Vorziblix (talk · contribs) 13:42, 16 January 2024 (UTC)

All three of these moves are now complete. The change away from ‘Arawak’ came not a moment too soon; a large amount of references to ‘Arawak’ (arw) in our entries were entirely wrong and needed to be corrected to ‘Arawakan’ (awd), and indeed there may be some remaining references to arw that are still wrong. — Vorziblix (talk · contribs) 06:42, 2 February 2024 (UTC)

Arawak → Lokono, Island Carib → Kalinago, and Galibi Carib → Kari'na Done. — Vorziblix (talk · contribs) 20:38, 2 February 2024 (UTC)

RFM discussion: January–March 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Mansi Languages Split~~

Since the Mansi language represented in Wiktionary isn't a thing, since it is several languages compiled under one name and code, I would like to propose a split. It would be beneficial to split it along the 4 main dialects, (the ones still extant) Northern, Eastern and (extinct) Western, Southern. The subdialectal markings can be represented with labels, already in place in the current category for Mansi. Ewithu (talk) 18:19, 9 January 2024 (UTC)

Done. Ewithu (talk) 13:22, 9 March 2024 (UTC)

RFM discussion: January–March 2023

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Rename Pomeranian to Proto-Pomeranian

"Pomeranian" is essentially a term for the family consisting of the Kashubian and Slovincian languages. In fact, Pomeranian (or its Polish counterpart, język pomorski) has always been used as a synonym of Kashubian (Polish język kaszubski), as witnessed by the 1893 dictionary called Słownik języka pomorskiego czyli kaszubskiego ("Dictionary of the Pomeranian a.k.a. Kashubian language").

There are no written records of an ancestor of both Kashubian and Slovincian, and any attestation of a Pomeranian lect will automatically fall into either of the two categories according to our current handling. However, the two languages do share an ancestor, and this ancestor did influence other languages. As such, it seems only logical to set this language as a proto-language, and rename it to Proto-Pomeranian in reference to its being the unattested ancestor of a language family, rather than being an attested language.

Pinging @Sławobóg, Vininn126, Gnosandes. Thadh (talk) 13:55, 9 January 2023 (UTC)

Support. Vininn126 (talk) 13:59, 9 January 2023 (UTC)

Support. Sławobóg (talk) 14:10, 9 January 2023 (UTC)

Support. // Silmeth ^@talk 15:09, 9 January 2023 (UTC)

Oppose. Gnosandes ✿ (talk) 16:47, 9 January 2023 (UTC)

Support. —Mahāgaja · talk 08:24, 11 January 2023 (UTC)

Support. — Fenakhay ^{(حيطي · مساهماتي)} 11:56, 11 January 2023 (UTC)

Support. I have looked into usage and it appears that this ancestor (not only in English, but also Russian and other current tongues of linguistic science) can only be called “Pomeranian” in the same way as Proto-Slavic can be called “Slavic” language, and in the same way as “Slavic” is any Slavic language, or “Turkic” any Turkic language etc., “Pomeranian” is any language of the said group. Fay Freak (talk) 12:35, 11 January 2023 (UTC)

Done. @Vininn126, Thadh, Sławobóg, Silmethule, Gnosandes, Mahagaja, Fenakhay, Fay Freak Currently there is no family corresponding to Proto-Pomeranian and the code for this language is 'zlw-pom', which is exceptional in lacking the '-pro' suffix normally given to proto-languages. I'm thinking we should add a 'Pomeranian' family with code 'zlw-pom' and give Proto-Pomeranian the code 'zlw-pom-pro'. Thoughts? Benwing2 (talk) 07:25, 28 February 2023 (UTC)

Yes, absolutely. zlw-pom should be a family code and zlw-pom-pro the protolanguage code. —Mahāgaja · talk 08:13, 28 February 2023 (UTC)

@ZomBear, Mahagaja Can you take a look at Reconstruction:Proto-Slavic/žarъ? @ZomBear listed a non-reconstructed descendant for Proto-Pomeranian and I'm not quite sure how to fix this as I'm not sure where that term comes from. Benwing2 (talk) 23:18, 28 February 2023 (UTC)

@Benwing2 error corrected by adding * (in *žarъ). Pomeranian term I met here: Martynaŭ, V. U., editor (1985), “жар”, in Этымалагічны слоўнік беларускай мовы (in Belarusian), volume 3 (га! – інчэ́), Minsk: Navuka i technika, page 210 -- ZomBear (talk) 03:16, 1 March 2023 (UTC)

BTW I renamed the codes. Benwing2 (talk) 23:24, 28 February 2023 (UTC)

Obviously, there is no “Proto-Pomeranian”. There are no reconstructions, no comparisons of paradigmatic morphology, and so on. Gnosandes ❀ (talk) 08:41, 1 March 2023 (UTC)

The little reality of the field of study does not equal the irreality of the language itself. Subbranch proto forms are often left unreconstructed for being too small a fish when a larger one is available. Fay Freak (talk) 22:28, 4 March 2023 (UTC)

RFM discussion: January–February 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Merging Proto-Pomeranian~~

As it stands, Proto-Pomeranian is next to useless and I don't see it becoming any more useful. There is a single lemma that we could treat as Pre-Kashubian. I don't see any reconstructions coming in the future as a lot of linguists consider Slovincian a dialect of Kashubian (this needs further analysis and it's something I'm working on). The next highest level that linguists agree on is something like Pomeranian-Polabian, they share a lot of sound changes in common, but I think it would make more sense to treat descendants sections with something like:

* Pomeranian-Polabian:

** {{desc|csb}}

** {{desc|pox}}

** {{desc|zlw-slv}}

I propose we remove Proto-Pomeranian and merge with Kashubian. Vininn126 (talk) 12:42, 30 January 2024 (UTC)

Support This was worth a try but didn't turn out well. Thadh (talk) 12:53, 30 January 2024 (UTC)

There is no such thing as "Pomeranian-Polabian", hard no to that. Sławobóg (talk) 14:06, 30 January 2024 (UTC)

And to the other, more pressing issue? Vininn126 (talk) 14:43, 30 January 2024 (UTC)

I support removing Proto-Pomeranian, adding it was a mistake. We have nothing to make it work. It is probably not different from Proto-Polish and stuff like that. Sławobóg (talk) 15:00, 30 January 2024 (UTC)

Support. I wrote a long time ago that "Proto-Pomeranian" is nonsense. I also now write that "Pomeranian-Polabian" is nonsense. ɶLerman (talk) 18:28, 30 January 2024 (UTC)

I'm going to speedy delete this tomorrow - I think everyone who has an opinion has spoken. As to the grouping, I have a new idea that I will bring up elsewhere. Vininn126 (talk) 13:27, 5 February 2024 (UTC)

Deleted. Vininn126 (talk) 13:42, 6 February 2024 (UTC)

@Vininn126: It turns out there were at least half a dozen entries linking to Proto-Pomeranian words in either the etymology or the descendants. There were a couple of those where the proto-form was followed by its alleged descendants, which I fixed by simply replacing it with the word "Pomeranian:". The rest need to be dealt with. Chuck Entz (talk) 04:56, 7 February 2024 (UTC)

@Chuck Entz I thought I had caught them - can you provide a list? Vininn126 (talk) 08:12, 7 February 2024 (UTC)

@Vininn126 See CAT:E. Benwing2 (talk) 09:33, 7 February 2024 (UTC)

Ah, those hadn't popped yet when I checked, guess the server needed time to refresh. Vininn126 (talk) 09:37, 7 February 2024 (UTC)

@Vininn126: Sometimes it takes a few days (widely transcluded modules can take longer: there are still errors popping up from a mistake someone made and immediately fixed last week).

A better method is to use insource:"zlw-pom-pro" in Special:Search with the option for searching all namespaces checked. The single remaining one is in the Reconstruction namespace (There's also the archived rfv discussion for the reconstruction you deleted, but I'm not sure what to do about that). Chuck Entz (talk) 15:42, 7 February 2024 (UTC)
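The search described above can be sketched as a URL builder. This is only an illustration: the base URL and the use of the `profile=all` parameter to stand for the "all namespaces" option are assumptions, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def insource_search_url(
    code: str,
    base: str = "https://en.wiktionary.org/w/index.php",  # assumed base URL
) -> str:
    """Build a Special:Search URL that looks for a code in page source."""
    params = {
        "title": "Special:Search",
        "search": f'insource:"{code}"',  # quoted so the code is matched as a phrase
        "profile": "all",  # assumed to correspond to searching all namespaces
    }
    return f"{base}?{urlencode(params)}"


print(insource_search_url("zlw-pom-pro"))
```

Opening the printed URL in a browser should be equivalent to typing the insource query into Special:Search by hand, provided the assumptions above hold.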

@Chuck Entz I don't see any reconstruction? Vininn126 (talk) 15:47, 7 February 2024 (UTC)

@Vininn126: Reconstruction:Proto-Slavic/vydati Chuck Entz (talk) 15:52, 7 February 2024 (UTC)

RFM discussion: February–March 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Splitting Khanty Languages~~

A long-standing issue with Ob-Ugric languages as a whole (which I tried pointing out with the request for splitting Mansi as well) is that they are categorized under one language, while they could be considered separate languages, one reason being their mutual unintelligibility.

But this discussion concerns the Khanty languages. We could follow Glottolog, where Khantic is the root of all the other Khanty languages. You can also find other sources for this on the List of languages of Russia (v2023) (in Russian) site, at numbers 143-146 on the list.

@Nyuhn I know you've only started to work on the Khanty language not so long ago, but I especially would love to hear your thoughts on this matter :] Ewithu (talk) 13:02, 6 February 2024 (UTC)

@Thadh Can you comment (or ping the appropriate editors, if you know them)? You seem to know a lot about obscure languages. Benwing2 (talk) 03:28, 7 February 2024 (UTC)

I'm not read-up on Mansi and Khanty and don't have the time to at the moment, but maybe @Tropylium, Rua, Surjection may have a clue. Thadh (talk) 08:37, 7 February 2024 (UTC)

Yes almost all western researchers have been on board for decades now with "Khanty" as being at least three languages. Russian scholars are catching up to it too (e.g. a study was recently advertized as "Khanty dialects found to differ more than Slavic languages"). The minimal division would be Northern Khanty, Southern Khanty, Eastern Khanty as still used by e.g. Salminen in Routledge's recent (2023) The Uralic Languages, 2nd Edition handbook. In terms of practicality for someplace like Wiktionary, there's three or four essentially different literary standards for Northern (Kazym, Shuryshkar, Obdorsk; also the by now little used Middle Ob) and two or three for Eastern (Surgut, Vakh-Vasyugan, nascently Salym) which each could be treated as different languages too. The division of Northern is not highly in line with what comparative studies would have to say (many field doculects could not be nicely fit to it) but for anything documented from literary use, this would save a lot of "(Vakh dialect)" "(Obdorsk dialect)" comments. --Tropylium (talk) 12:48, 7 February 2024 (UTC)

If the literary standards represent mutually intelligible dialects I would suggest keeping them unified. This is somewhat similar to the situation with Galician, where there are two competing literary standards ("standard" and "reconstructionist", with the former made to look like Spanish and the latter to look like Portuguese), and we do use labels to handle the differences (and if the same spelling is used for both, there are two verb tables under Conjugation). The main issue I see with creating separate languages is it leads to duplication as you can't simply point the word in one literary standard to another using {{alt form}}. Benwing2 (talk) 21:18, 7 February 2024 (UTC)

Compare also Occitan, handled as a single language with six or so dialects. Benwing2 (talk) 21:21, 7 February 2024 (UTC)

I don't know too much about Galician or Occitan, but if the vocabulary difference were our biggest problem between the Khanty dialects, we would be ok to keep it unified.

But since even the grammar differs between Khanty dialects, I don't know how we could handle the different ways nouns decline or verbs conjugate if we keep it unified. Ewithu (talk) 22:10, 7 February 2024 (UTC)

@Ewithu What I mean is, split based on mutual intelligibility, hence maybe a 3-way split or however-many-way split, but not split dialects just because they have different literary standards, if the literary standards refer to mutually intelligible dialects. Grammar also differs among Galician and Occitan standards and it's handled just fine. Benwing2 (talk) 22:49, 7 February 2024 (UTC)

Coptic is another example where we handle multiple (4) literary standards under a single L2. Benwing2 (talk) 22:56, 7 February 2024 (UTC)

Komi is split into three languages, Udmurt into two, Selkup also into two, Mari into three (there should be a Northwestern one too, by the way), Koibal has its own section despite being essentially a Kamassian dialect. I think splitting it into big chunks would really help reduce messiness, especially if there are also going to be dialects written down.

Two of the Mansi languages are extinct, to me it just feels wrong to mix everything together. Kaarkemhveel (talk) 23:27, 7 February 2024 (UTC)

@Kaarkemhveel I am not proposing mixing everything together but splitting based on mutual intelligibility. Glottolog for example has 5 Khantyic languages and 3 Mansic languages. What I'm concerned about is splitting every proposed literary standard into its own microlanguage; this would give us 8+ Khanty languages and I don't know how many Mansi ones. Things like "Koibal has its own section despite being essentially a Kamassian dialect" IMO should not be considered precedents for micro-splitting. Benwing2 (talk) 00:57, 8 February 2024 (UTC)

Right, that sounds reasonable. In my head at least it looks like Northern/Eastern/(Salym?)/Southern Khanty and for Mansi either Northern/Eastern/Western/Southern or Northern/Eastern/Western&Southern (though the last suggestion is based rather on their current use status).

Kaarkemhveel (talk) 06:29, 8 February 2024 (UTC)

Dear EnWiktionary bosses! I must inform you that the so-called 'Khanty', 'Koryak', 'Selkup', 'Nenets' and some other languages that are usually treated as unified languages have never existed as such in reality; they are only a product of early Stalin-era national policy. As for the Khanty subfamily, the main sources have already been provided above. It consists of at least 4 languages: Vakh-Vasyugan, with the Vakh literary norm; Surgut, with its own literary norm; Southern, with no actual written use afaik; and Northern, with the previously used Middle Ob literary norm and the currently used Kazym, Shuryshkary and Lower Ob literary norms. All the literary standards I mentioned have, for instance, schoolbooks currently being published, some of which I have on paper at my right hand. The languages of the Khanty group (like the Sami languages, to give an understandable parallel) are not mutually intelligible. For instance, Northern Khanty morphology is very close to that of Northern Mansi, with 3-4 noun cases, while Vakh-Vasyugan has 10-13 cases like Southern Selkup, which is believed to be a remnant of the archaic state retained because of close contact with the Selkups. So the people asking to split Khanty are simply arranging things as they really are; moreover, they are ready to work. Grigoriy Korotkih (talk) 09:50, 8 February 2024 (UTC)

@Grigoriy Korotkih Hi Grigoriy, thanks for your comments. I don't think anyone is objecting to splitting Khanty or Mansi, it's just a case of working out what the splits will be. Benwing2 (talk) 21:42, 8 February 2024 (UTC)

The Khanty languages could be considered separate languages. Most dictionaries describe them in terms of dialects or subdialects (говоры). Those divisions are already represented in Wiktionary using regional categories.
How would it benefit Wiktionary to have three Khanty L2 sections on the "тур" page instead of one?
We will also need to somehow handle translations and etymology links from other languages that refer to unspecified Khanty. Nyuhn (talk) 09:23, 9 February 2024 (UTC)

Should we merge the East Slavic languages because they have words like нога or зуб, then? Don’t different Khanty lects have different inflection paradigms, pronunciation, and sometimes even usage?
Speaking of referring to "unspecified Khanty": wouldn’t it be beneficial to Wiktionary to specify it, rather than pile everything into "Khanty"? Kaarkemhveel (talk) 12:47, 10 February 2024 (UTC)

So in light of all that has been discussed, we should be good to split both languages according to the cardinal directions: North, East, South and West. We can use labels to further specify which dialect a term came from. Naturally, each separate language (represented by one of the 4 cardinal directions) will need, where one can be found, an {{alt form}} section to make switching between dialects (represented by the river their speakers traditionally lived beside) easier. Support or Oppose? Ewithu (talk) 09:59, 24 February 2024 (UTC)

@Ewithu Northern/Southern/Eastern/Western seems correct for Mansi but for Khanty maybe it should be Northern/Southern/Eastern per Wikipedia; Western Khanty appears to be a larger grouping consisting of Southern and Northern. I agree with the idea of identifying dialects by rivers. (We also ended up doing this for the CAT:Ye'kwana language, inspired by the Khanty and Mansi situation, as using river names was the only unambiguous and accurate way of referring to different dialects that we could come up with.) Benwing2 (talk) 10:20, 24 February 2024 (UTC)

@Benwing2 Alright, that sounds good. Now we just need to get to it haha Ewithu (talk) 13:52, 26 February 2024 (UTC)

@Ewithu I think what we need to do is as follows (for Khanty, similarly for Mansi):

Assign language codes kca-nor, kca-sou and kca-eas to the split-out languages.
Assign a temporary family code urj-kca to the new Khantyic (Khantic?) family; see below.
Set up tracking for all uses of kca as a language code. (This will make it easier to find the references to this code so they can be changed.)
Split the current {{kca-*}} templates into per-new-language templates. Thankfully there are only 4 templates. I assume the pronouns listed in {{kca-table-ppron}} are Kazym pronouns. I'm not sure if all the Khantyic languages use the same set of cases; if not we need to modify the new declension templates to have the appropriate sets of cases in them.
Split the Khanty modules. There are only three of them: Module:kca:Dialects, Module:labels/data/lang/kca and Module:kca-translit. I think the translit module can stay as-is.
Move all the lemmas and non-lemma forms to the new languages, and change template references accordingly.
Change all references to language kca to use the appropriate code. This will principally be in Etymology, Translations and Descendants sections.
Delete the Khanty language from Module:languages/data/3/k.
Repurpose the kca code for the Khantyic family.

Benwing2 (talk) 03:41, 27 February 2024 (UTC)
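The code-changing steps in the list above (tracking uses of kca, then changing references to the appropriate new code) could be sketched roughly as follows. This is only an illustrative sketch in Python, not the bot or module code actually used: the function name `retarget`, the dialect table and the `{{desc}}` sample call are hypothetical, and the assignment of Surgut to the Eastern language is an assumption. Only the codes kca-nor, kca-sou and kca-eas come from the discussion.

```python
import re

# New language codes proposed in the discussion.
NEW_CODES = {"Northern": "kca-nor", "Southern": "kca-sou", "Eastern": "kca-eas"}

# Hypothetical dialect-to-language table. Kazym and Shuryshkary are described
# in the thread as Northern norms; placing Surgut under Eastern is an assumption.
DIALECT_BRANCH = {"Kazym": "Northern", "Shuryshkary": "Northern", "Surgut": "Eastern"}

# Matches the bare code kca used as a template parameter, e.g. {{desc|kca|...}},
# but not the already-split codes such as kca-nor.
OLD_CODE_PARAM = re.compile(r"\|kca(?=[|}])")


def retarget(wikitext, dialect):
    """Return (new_wikitext, needs_attention).

    If the dialect is known, every bare kca parameter is replaced by the new
    code. Otherwise the text is left untouched and flagged for manual review
    whenever it still contains the old code.
    """
    branch = DIALECT_BRANCH.get(dialect)
    if branch is None:
        return wikitext, OLD_CODE_PARAM.search(wikitext) is not None
    return OLD_CODE_PARAM.sub("|" + NEW_CODES[branch], wikitext), False
```

Leaving unknown cases unchanged but flagged mirrors the idea of tracking remaining references so that a human can decide which of the split languages applies.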

@Ewithu Steps 1-3 are done. Tracking can be found here for Khanty: Special:WhatLinksHere/Template:tracking/languages/kca. There is also tracking for Mansi here: Special:WhatLinksHere/Template:tracking/languages/mns. I have not done the corresponding split for Mansi yet because Glottolog only has a three-way split for Mansic (Southern, Northern, Central) instead of a four-way split (Southern, Northern, Eastern and Western), preferring to consider East Mansi and West Mansi as dialects of Central Mansi. This means we need a bit more discussion. Benwing2 (talk) 05:05, 28 February 2024 (UTC)

@Benwing2 Wonderful, thank you!

So I would say split it 4 ways, because the western dialect is extinct, while the eastern one (also extinct in theory, but not proven yet) is treated as extant, and western lacks Cyrillic dictionaries or sources (except that one Gospel of Matthew, most likely written in Lower Lozva), while Eastern even has audio pronunciations on Lingvodoc. On the other hand, I could also say keep it as Central Mansi, because those two languages have very few resources, but I don't think this is a really valid argument. Ewithu (talk) 06:17, 28 February 2024 (UTC)

@Ewithu My general take is we should go by mutual intelligibility unless there's really a good reason to do it otherwise. Since Glottolog seems to think the two varieties are mutually intelligible (at least that's what I assume by the Central grouping) and there are few resources overall, I think it's easier to have fewer splits; we can always split later if the need arises. But I don't feel super strongly about it; maybe someone else can chime in. Benwing2 (talk) 06:23, 28 February 2024 (UTC)

In any case let me know if you need help with the Khanty splitting effort (in particular anything best done by bot). Benwing2 (talk) 06:24, 28 February 2024 (UTC)

@Benwing2 Minor point, but are we sure about the name "Khantyic"? It sounds really odd to me, and I think "Khantic" is more common going by GBooks. Theknightwho (talk) 06:51, 28 February 2024 (UTC)

@Theknightwho Good point. I used "Khantyic" because this is what Glottolog uses. Benwing2 (talk) 07:06, 28 February 2024 (UTC)

I can't actually confirm your assertion about Google Books; "Khantic" does seem to occur more often but most of the hits seem to be garbage. Benwing2 (talk) 07:09, 28 February 2024 (UTC)

@Benwing2 I was just going by the number of confirmed hits I could see. I'll have a proper look later today. Theknightwho (talk) 12:21, 28 February 2024 (UTC)

@Benwing2 Can't we just call it "Khanty languages"? Everyone calls the protolanguage "Proto-Khanty", nobody calls it "Proto-Khantyic" or "Proto-Khantic". — SURJECTION ^{/ T / C / L /} 13:42, 28 February 2024 (UTC)

I agree with this.

Also, steps 4 and 5 are done for Khanty. I also redirected the reference templates according to the language they use. Ewithu (talk) 14:29, 28 February 2024 (UTC)

@Surjection I renamed Khantyic -> Khanty. I see you split all the lemmas and carried out step 9 above but step 7 isn't done. Benwing2 (talk) 22:03, 28 February 2024 (UTC)

7 should be done. A few cases were left with etymology templates that used kca as a source, but those should work, as it now means "from one of the Khanty languages". Most references were updated; some others were indiscriminately changed to kca-nor with attention templates, as I was requested to do. — SURJECTION ^{/ T / C / L /} 22:06, 28 February 2024 (UTC)

@Surjection See e.g. роман. There is an error in the Descendants section due to the use of language code kca. See the tracking link above for all remaining references. Benwing2 (talk) 22:12, 28 February 2024 (UTC)

Done. The remaining instances in mainspace at least are valid uses of the family code. I'll check the other namespaces (at least the ones that matter) as well. — SURJECTION ^{/ T / C / L /} 22:22, 28 February 2024 (UTC)

OK thanks. Benwing2 (talk) 22:26, 28 February 2024 (UTC)

@Ewithu: If the differences in grammar, phonology and morphology of Eastern Mansi and Western Mansi are not too large, then the fact one is little-attested and extinct is all the more reason to keep the two together. Thadh (talk) 09:49, 28 February 2024 (UTC)

@Thadh Alright, keep 'em one. Could you do the honors if you are free? Also should we add Proto-Mansi and Proto-Khanty as well? Ewithu (talk) 09:54, 28 February 2024 (UTC)

@Ewithu I have added the Mansi family along with Northern/Southern/Central Mansi languages. Benwing2 (talk) 22:35, 28 February 2024 (UTC)

Alright thank you! I'll move the templates to the Northern one. Ewithu (talk) 22:39, 28 February 2024 (UTC)

Considering how often etymological sources distinguish between Western and Eastern Mansi though, we may want to consider adding them as etymology-only languages. — SURJECTION ^{/ T / C / L /} 08:13, 29 February 2024 (UTC)

@Surjection Sounds good, I'll go ahead and do that. Benwing2 (talk) 08:15, 29 February 2024 (UTC)

Done. Split into: Eastern Khanty, Northern Khanty, Southern Khanty, Central Mansi, Northern Mansi, Southern Mansi. Ewithu (talk) 13:29, 9 March 2024 (UTC)

RFM discussion: February 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Some missing Sino-Tibetan languages~~

Hi, this is a proposal to add three new language codes, relating to a small family of Bai(-adjacent) languages which have been dubbed the Cai-Long languages (from Chinese 蔡龍語支), spoken in western Guizhou.

@Stevey7788 very kindly created an expansive vocabulary list of ~~Greater Bai~~ Macro-Bai languages several years ago, which covers all of these in detail, including a number of sources (all of which are in Chinese, unfortunately).

Caijia (蔡家話) sit-cai

The only extant member of the family, with approximately 1,000 speakers in western Guizhou, with the numbers having declined from over 10,000 in the 1980s. Just over 950 terms are listed in the vocabulary list, including those of the dialects spoken in Shuicheng and Yangjiazhai.

Longjia (龍家語) sit-lnj

Extinct from around 2010. Just under 600 terms are listed in the vocabulary list, including those of the dialects spoken in Anshun and Dafang.

Luren (盧人語) sit-lrn

Extinct from sometime between 1960 and 1980. 61 terms are listed in the vocabulary list.

Theknightwho (talk) 18:59, 3 February 2024 (UTC)

No objections from me although I know little-to-nothing about these languages. Benwing2 (talk) 03:52, 7 February 2024 (UTC)

Support. What should their ancestor/family be listed as? – wpi (talk) 17:28, 11 February 2024 (UTC)

@Wpi Macro-Bai, and then giving the core Bai languages (i.e. the ones we already have language codes for) their own language family within that simply called "Bai languages". I've checked, and I can't find any evidence the term "Greater Bai" is actually used outside of Wiktionary or Wikipedia, so we should definitely fix that. Theknightwho (talk) 17:17, 16 February 2024 (UTC)

Created. Theknightwho (talk) 02:05, 17 February 2024 (UTC)

RFM discussion: February 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Splitting Northern Bai properly~~

Currently, we use the code bfc for Northern Bai, classifying it as one of the Bai languages. However, in 2014, bfc was split into Panyi Bai and Lama Bai. Lama Bai was given the new code lay, while Panyi Bai retained the old code bfc. At some point since then we added Lama Bai to our list of languages, but we never renamed bfc to Panyi Bai, which seems to have been an oversight. From the documents, it seems like they still do form a "Northern Bai" grouping, however. Theknightwho (talk) 02:25, 17 February 2024 (UTC)

We only have two lemmas for Northern Bai, so I don't think this would be a difficult job. Theknightwho (talk) 02:25, 17 February 2024 (UTC)

@Theknightwho

Support. Benwing2 (talk) 02:56, 17 February 2024 (UTC)

Split. Having checked the source, both current lemmas should be under Panyi Bai. Theknightwho (talk) 15:35, 24 February 2024 (UTC)

RFM discussion: March 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Anglicising the names we use for the branches of Min Chinese~~

On the back of the recent split of Min Nan into a family, I thought it would be a good idea to consider renaming the various branches of Min to the names usually used in English. With the exception of Coastal Min and Inland Min, the names we currently use are transliterations from Mandarin, whereas most sources - including academic sources - prefer to translate them into English. I have no doubt we only do this because the ISO uses the transliterated names, for whatever reason. As such, I propose the following:

mnp: Min Bei → Northern Min
cdo: Min Dong → Eastern Min
nan: Min Nan → Southern Min (note: currently being converted into a family)
czo: Min Zhong → Central Min
cpx: Puxian → Puxian Min (edit: added per the discussion below)

It's difficult to use Google Ngrams with this, since there are a ton of false positives, but the general trend is very strongly towards the English names historically, with the transliterations increasing in popularity in recent publications. However, taking a look at Google Books results, a lot of the transliterations come from poor quality sources, or in many cases texts that simply contain the ISO language names. Google Scholar is somewhat more helpful, though, and shows a clear trend towards the English names even with very recent papers. Theknightwho (talk) 18:02, 4 March 2024 (UTC)

Support; this is also consistent with general practice here at Wiktionary to prefer English terminology rather than native-language terminology. Benwing2 (talk) 00:15, 5 March 2024 (UTC)

Support; this is what i normally do outside of wiktionary anyways — 義順 (talk) 21:21, 7 March 2024 (UTC)

Neutral / Support —Fish bowl (talk) 05:43, 10 March 2024 (UTC)

Question: Should cpx "Puxian" also be moved to "Pu–Xian Min" (cf. wikipedia:Pu–Xian Min?) —Fish bowl (talk) 05:43, 10 March 2024 (UTC)

@Fish bowl Oppose use of em-dash or en-dash in the name of the language. Benwing2 (talk) 05:54, 10 March 2024 (UTC)

Yeah that's sensible. I copy-and-pasted it from Wikipedia and didn't notice it was there. Imagine it's an ASCII hyphen. —Fish bowl (talk) 05:58, 10 March 2024 (UTC)

Not sure here. Wikipedia uses dashes in the names of many languages (e.g. most of the Yue languages) where Glottolog uses closed compounds. Glottolog uses the form Pu-Xian with a hyphen not en-dash (although Omniglot uses Puxian). In general I prefer less punctuation than more so my gut would say Puxian but properly you'd have to look at scholarly sources to see which one is most common. Benwing2 (talk) 06:19, 10 March 2024 (UTC)

It is because the name 莆仙 Pu(-)Xian is formed from two placenames, 莆田 Putian and 仙遊 Xianyou. This is also the case for the Yue dialects listed on Wikipedia. —Fish bowl (talk) 06:24, 10 March 2024 (UTC)

@Fish bowl "Puxian Min" seems to be a lot more common than "Pu-Xian Min" in English-language sources, but I agree it should be renamed as well. I'll add it to the list at the top. Theknightwho (talk) 06:34, 10 March 2024 (UTC)

Support. (Also the Bei, Dong, Nan, Zhong are not native anyway; they're Mandarin). — justin(r)leung _{{ (t...) | c=› }} 18:17, 10 March 2024 (UTC)

Moved, given this isn't controversial. Theknightwho (talk) 14:12, 12 March 2024 (UTC)

RFM discussion: February 2022–March 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Change "Chinese" from descendant of "Middle Chinese" to ancestor of "Old Chinese"~~

The family tree looks weird.
Old Chinese is subsumed under ==Chinese==.

@Justinrleung, RcAlex36, 沈澄心

—Fish bowl (talk) 06:34, 6 February 2022 (UTC)

I wonder if we should push it even further and have Proto-Sino-Tibetan be the ancestor? Even Old Chinese being the ancestor of Old Chinese may be weird. — justin(r)leung _{{ (t...) | c=› }} 06:54, 6 February 2022 (UTC)

I note that Category:English language similarly belongs to Category:Middle English language, although a major difference in Wiktionary treatment is that ==English== does not cover Category:Middle English language or Category:Old English language. —Fish bowl (talk) 11:19, 7 February 2022 (UTC)

@Fish bowl I'm not a Chinese editor, but from an outside perspective, that'd feel more weird to see (how can Chinese be the ancestor of Old Chinese?). It'd also make the Chinese lects go like "Chinese -> Old Chinese -> Middle Chinese -> lect", which seems more confusing to me. Honestly, at this rate, I'd just remove Chinese from the family tree entirely with its current treatment. Or at least put it on the same level as Old Chinese, rather than making it an ancestor. AG202 (talk) 04:33, 4 March 2022 (UTC)

I don't see a problem with the current setup...? As AG says, making modern Chinese an ancestor of Old Chinese would be weird (and wrong). I think the current setup is fine...? The only awkwardness is that we group all kinds of Chinese under one L2. I'm going to <s>strike</s> this as no action taken so it can be archived soon, since the discussion is very old and this page is very large, but you can reopen it if you object... - -sche (discuss) 05:33, 31 March 2024 (UTC)

RFM discussion: December 2023–February 2024

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~Language codes for the proto forms of upper branches of Dravidian (south, south-central, central, north)~~

This has been discussed many times, and it was also agreed to add them, as here and here.

The proposal was made because there are tons of terms which are restricted to certain branches, sometimes attested in only about 4 languages of 1 branch (*cinki, for example) in a family of 80+ languages with 4 branches, yet are reconstructed to the proto stage. {{R:dra:DL}} has many reconstructions for the inner branches. {{R:dra:Southworth}} mentions many cases where BK reconstructs terms to PD when they are restricted to certain branches like PSD or PCD, and says such terms should be reconstructed only to the proto-language of the branch. AleksiB 1945 (talk) 13:21, 21 December 2023 (UTC)

Support. Theknightwho (talk) 23:06, 30 December 2023 (UTC)

Created (several days ago now): @AleksiB 1945 we have:

Proto-Central Dravidian dra-cen-pro
Proto-North Dravidian dra-nor-pro
Proto-South Dravidian dra-sou-pro
- Proto-South Dravidian I dra-sdo-pro (covering what we formerly called South Dravidian)
- Proto-South Dravidian II dra-sdt-pro (covering former South-Central Dravidian)

This is to properly align with the format our Proto-Dravidian entries have been following anyway. Theknightwho (talk) 00:03, 24 February 2024 (UTC)
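The nesting in the list above can be pictured as a small parent table. The following is only an illustrative Python sketch, not the actual module data: the names `PARENT` and `ancestor_chain` are hypothetical, and the code dra-pro for Proto-Dravidian, along with the idea that the three top-level proto-languages descend directly from it, is an assumption not stated in the list itself.

```python
# Parent of each new proto-language code, following the indentation of the list.
# "dra-pro" (Proto-Dravidian) is an assumed code for the top of the tree.
PARENT = {
    "dra-cen-pro": "dra-pro",      # Proto-Central Dravidian
    "dra-nor-pro": "dra-pro",      # Proto-North Dravidian
    "dra-sou-pro": "dra-pro",      # Proto-South Dravidian
    "dra-sdo-pro": "dra-sou-pro",  # Proto-South Dravidian I
    "dra-sdt-pro": "dra-sou-pro",  # Proto-South Dravidian II
}


def ancestor_chain(code):
    """Return the ancestors of a code, nearest first."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain
```

Under these assumptions, a term restricted to South Dravidian II languages would be reconstructed at dra-sdt-pro, with dra-sou-pro and then dra-pro above it.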

RFM discussion: Adding Proto-Bai (February 2024)

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Adding Proto-Bai

While we do currently have a family covering the Bai languages (sit-bai), we don't have the code sit-bai-pro for Proto-Bai. There are a few publications out there which give Proto-Bai reconstructions, and our Macro-Bai comparative vocabulary list contains 469 from Wang Feng's Comparison of languages in contact: the distillation method and the case of Bai (2006). I'm not suggesting that we blindly create entries for all of these, but we already reference Proto-Bai in four entries anyway: Lama Bai ɕy³³, Southern Bai ɕy³³, Central Bai xuix and Chinese 山 (shān), so there's already a need for the code. Theknightwho (talk) 03:13, 17 February 2024 (UTC)

Support — 義順 (talk) 02:46, 18 February 2024 (UTC)

Created - given there's a real need for it, and no objections have been forthcoming. Theknightwho (talk) 04:58, 23 February 2024 (UTC)

RFM discussion: Moving Nung to Nùng (February 2024)

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Moving Nung to Nùng

Three reasons:

Both of these names are common in the literature, but newer and more professional publications seem to prefer Nùng, which I suspect is down to typesetting no longer being an issue.
It matches our general treatment of other languages in Vietnam, such as Tày, Ná-Meo, Ts'ün-Lao etc.
Other than headings, this spelling is already used in entries in the form of labels (e.g. bân, nãhm, nưhng, slao, slíhm).

Theknightwho (talk) 06:08, 17 February 2024 (UTC)

Moved - given it's consistent with our handling of it elsewhere. Theknightwho (talk) 04:08, 25 February 2024 (UTC)

@Theknightwho I think you should have waited on this. You only waited a week and no one supported the move. I don't agree with the general principle that we should include all the native-language accents and other Unicode chars in the Wiktionary names of languages. These names should reflect the *English* usage of such names, not the native-language usage. Possibly the name change was justified in this particular case but I don't want any precedent set that would justify e.g. moving the name O'odham to ʼOʼodham (or even worse, a half-ass rename like Kwami did to the Wikipedia article Oʼodham language, which includes the "Unicodified" apostrophe in the middle of the name but omits the apostrophe in the beginning). Benwing2 (talk) 05:02, 25 February 2024 (UTC)

@Benwing2 I'm not trying to set any precedent - I just noted that it's generally spelled with the diacritic in more recent (English) publications, and our entries already spelled it that way outside of the headings. Given no-one opposed it, it didn't seem like an issue. I'll wait longer in future, though. Theknightwho (talk) 05:21, 25 February 2024 (UTC)

RFM discussion: Move Hachijo Japanese to Hachijo language & more (February–March 2024)

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Move Hachijo Japanese to Hachijo language & more

Discussion suggested by @Theknightwho. Please share any thoughts you may have.

Recent scholarly consensus is that Hachijo is distinct from Japanese, being one of the earliest branches of if not outright parallel to (Old) Japanese. Hachijo should thus warrant a conversion into a language.

Similarly, Tugaru-ben and Satuma-ben could also be elevated to language level, though the exact classification of the Kyuusyuu lects and the affinity of Tugaru-ben with the rest of Japanese are still not widely agreed upon. This is to be discussed. I am personally ambivalent on the issue, though it has already been brought up in preliminary discussions on the enwikt Discord.

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): — 義順 (talk) 21:06, 18 February 2024 (UTC)

Support. I am not knowledgeable enough to have an opinion on this, but we will need to make a decision on whether it's a descendant of Old Japanese or not before adding this. (Also, less important, but I'd prefer we used the name Hachijō with the macron.) Theknightwho (talk) 01:50, 19 February 2024 (UTC)

Abstain, as a person who cares about historical linguistics, but doesn't actually keep up on the literature. Cnilep (talk) 03:12, 19 February 2024 (UTC)

Technical Queries:

Do we have an ISO code?
If we don't, how will we organize this in our infrastructure?

Also, +1 for @Theknightwho's points that 1) we need to be clear about the provenance of Hachijō (daughter language of OJP? or niece, sharing an even older parent?), and 2) we should use the macron spelling as "Hachijō". ‑‑ Eiríkr Útlendi │^{Tala við mig} 17:22, 19 February 2024 (UTC)

There's no ISO code for it, from what I can see, so we'd have to treat it with an exceptional code, per the guidelines at WT:Languages. Something like jpx-hcj. AG202 (talk) 17:37, 19 February 2024 (UTC)

Good choice. I will raise it in a separate thread, but the issue of spelling also affects some of the Ryukyuan languages, where the names we've lifted from the ISO are suboptimal and aren't really used outside of contexts directly related to the ISO standard itself. Theknightwho (talk) 19:40, 20 February 2024 (UTC)

Since there hadn't been any activity in a while, I've gone ahead and created Hachijō with the langcode jpx-hcj, as a descendant of Eastern Old Japanese (an etym-only language part of Old Japanese). Currently, it has the two templates {{jpx-hcj-head}} and {{jpx-hcj-kanjitab}}, though more language-specific templates can be created as needed. I've converted most terms that were in Category:Hachijō Japanese, but the remaining ones are more complex, so I wanted to leave them to somebody with a better idea of what should be done with them. Pinging (Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2, ND381, AG202): — This unsigned comment was added by Theknightwho (talk • contribs) at 06:19, 15 March 2024 (UTC).

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): Redoing pings, since they didn't work as I forgot to sign (which should teach me not to do this kind of thing just before I go to bed). Theknightwho (talk) 16:33, 15 March 2024 (UTC)

RFM discussion: Making Sichuanese a full language (February 2024)

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Making Sichuanese a full language

We currently handle Sichuanese (cmn-sic) as an etymology-only variant of Mandarin. Four things to consider:

Its treatment in Chinese entries is on the same level as that of Mandarin (proper) and Dungan: i.e. it is listed in the "Mandarin" section, in which "Mandarin" is used in the broad sense. Given that Dungan (which is treated as a full language) is also listed there, there's no reason why Sichuanese should be any different. See 女, which gives Mandarin 女 (nǚ), Sichuanese 女 (nyü³) and Dungan 女 (nü, II). Note that Sichuanese transliteration is already handled automatically, since the transliteration module handles it as an independent lect.
Sichuanese is Southwestern Mandarin, whereas we use Mandarin to refer pretty much exclusively to the modern standard of Beijing Mandarin (except for that heading in the Chinese pronunciation table, already mentioned). They are pretty distinct.
As with Hokkien and Hainanese (as mentioned in #Converting Min Nan into a family), the designation of etymology-only language seems to have been arbitrary, and was likely influenced by the fact it doesn't have its own ISO code.
The line is much blurrier here, but it was raised in a past thread about Shanghainese (re the category Category:Shanghainese Chinese) that categories with names such as Category:Sichuanese Mandarin cause confusion, because of the distinction between Sichuanese and standard Mandarin as spoken in Sichuan.

Two further considerations:

We might instead want to make Southwestern Mandarin a full language, with Sichuanese itself being part of that.
The code would likely need to be changed, as retaining cmn-sic would be misleading if we split it out of cmn.

Theknightwho (talk) 19:29, 20 February 2024 (UTC)

Support Sichuanese,

Oppose SW Mando — 義順 (talk) 20:25, 20 February 2024 (UTC)

@ND381 How do you propose we handle SW Mandarin that isn't Sichuanese in the future? Theknightwho (talk) 04:07, 25 February 2024 (UTC)

Either along provincial boundaries and make areal groups (they're all similar enough) or just use Li's atlas. It just feels kind of weird to have "Southwestern Mandarin" as a language since it's very much a linguistic concept — 義順 (talk) 08:13, 25 February 2024 (UTC)

@ND381 Wouldn't that amount to splitting it into multiple languages, though? I'm not sure if that's warranted. Theknightwho (talk) 08:17, 25 February 2024 (UTC)

Yes, it would. If you don't see it being practical I can make it practical by adding a few hundred Wuhanese/Kunmingese entries next Sunday. Plus, I really don't see any other functional alternative for resolving this issue — 義順 (talk) 08:25, 25 February 2024 (UTC)

@ND381 I have no opinion - I just want to make sure we don't add Sichuanese and then find we need to move everything to SW Mandarin later on when adding other SW Mandarin lects. If you think they need to be separate languages then that's totally fine. Theknightwho (talk) 08:49, 25 February 2024 (UTC)

Right, thanks for the clarification. I really don't see Southwestern Mandarin needing a language-level code any time in the future, as we have Wuhanese implementation and expansion set as a long term goal (see here) and, as mentioned, Southwestern Mandarin doesn't really act like a single coherent unit outside of linguistics even when compared to ones we have like Xiang or Jin, which have "Hunanese" and "Shanxinese" connotations respectively — 義順 (talk) 08:57, 25 February 2024 (UTC)

Created as zhx-sic. The practical effect of this is relatively minor, since we've already been treating it like a full language anyway. Theknightwho (talk) 18:54, 27 February 2024 (UTC)

Middle Polish (yet again)

The following discussion has been moved from Wiktionary:Language treatment requests (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

I propose we make Middle Polish an Etymology only code with the language code zlw-mpl. Would be very useful for linking and mentioning. @Sławobóg @ZomBear @Mahagaja @Thadh @KamiruPL Vininn126 (talk) 13:11, 11 May 2023 (UTC)

I know this is a boring topic that many people find irrelevant, but in what contexts would this code be used? (e.g. are there a lot of Ruthenian borrowings from Middle Polish?) Thadh (talk) 16:03, 11 May 2023 (UTC)

Or a lot of Middle Polish only inheritances from Old Polish, or Silesian derivations. Vininn126 (talk) 16:13, 11 May 2023 (UTC)

@Thadh In Old Ruthenian there are a lot of Polish borrowings, specifically from the period of the 1500s-1700s. This is, to some extent, one of the features that alienated the modern Ukrainian and Belarusian languages first from Middle Russian and then from modern Russian. --ZomBear (talk) 16:46, 11 May 2023 (UTC)

To me, as the editor of the Old Ruthenian entries, this would be helpful. In the Old Ruthenian zle-ort language (existed in the period ~ 1387-1798), there are extremely many Polish borrowings. Words borrowed in the 1400s (before 1500) have to be indicated from Old Polish zlw-opl, everything is fine here. But borrowings in the period of 1500-1700 have to be indicated as borrowed from the modern Polish language. The presence of a separate code for Middle Polish would solve this inaccuracy. --ZomBear (talk) 16:42, 11 May 2023 (UTC)

@ZomBear @Sławobóg Done, thanks @Theknightwho! Vininn126 (talk) 14:00, 18 May 2023 (UTC)

@Vininn126 thank you and everyone who contributed to this. I have already created the first one, the Old Ruthenian гартова́ти (hartováti), where it is listed as a borrowing from Middle Polish. ZomBear (talk) 17:55, 18 May 2023 (UTC)

Renaming Wiradhuri (wrh) to Wiradjuri

The following discussion has been moved from Wiktionary:Language treatment requests (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

I think we need to change this one because—so far as I can tell—"Wiradjuri" is the most current and by far most common English spelling for this language since at least the 1980s, as opposed to our current spelling (see "Category:Wiradhuri language"). "Wiradjuri" is also the form used in official signage and communication (see for instance: local shire boundary signage; a city council webpage; a unit from NSW state school curriculum; cultural information from the National Indigenous Australians Agency—a federal government agency). Helrasincke (talk) 03:44, 29 March 2023 (UTC)

Support. Even on Glottolog, where they use the -dh- form, the -dj- form (and then -dg- forms) is more common in the names of the reference works about it they have catalogued. - -sche (discuss) 20:54, 1 April 2023 (UTC)

@Benwing2? This, that and the other (talk) 12:44, 27 December 2023 (UTC)

@This, that and the other

Done. Benwing2 (talk) 04:49, 1 January 2024 (UTC)

Oriya (or) -> Odia (or)

The following discussion has been moved from Wiktionary:Language treatment requests (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Previous discussions on the matter:

Same issue, Oriya should be renamed to Odia (same thing applies to the script). According to multiple discussions, our language names are based on common English usage, and it looks like the consensus is Odia as of late. I'll copy-paste what I said last time:

"Glottolog, Collins Dictionary, Wikidata, Wikimedia, and the first instance of "Odia" or "Oriya" on the language's own Wiktionary seen at the top here all have made the change to and use "Odia" as the (primary) spelling as well. Also, see this discussion on the English Wikipedia about the matter, with the eventual decision being to move to "Odia" based on the rapidly increasing usage at the time 6 years ago (which is bound to have increased by now). More sources: Cambridge Dictionary, the Oxford English-Odia Dictionary (ISBN: 9780199474554), Microsoft, Google (another Google source), the Concise Oxford Dictionary of Linguistics, and multiple Wikimedia blog posts made by Odia natives which label Odia wiki projects using "Odia".

The main primary source, dictionary-wise, that I've been able to find in support of "Oriya" is the OED, but even in the old Lexico dictionary they preferred "Odia". Dictionary.com lemmatizes both, but interestingly enough only puts the name of the ethnic group at "Odia". Ethnologue also prefers "Odia" for the name of the language. As stated two years ago, it feels very strange that we're very much in the minority that still uses the old name. CC: @Benwing2 AG202 (talk) 15:09, 6 September 2023 (UTC)

Support. Common name for it in English. CitationsFreak (talk) 15:16, 6 September 2023 (UTC)

Support. I don't think the OED is that relevant in this case, their wheels turn slowly at the best of times and their last citation is from 2001. —Al-Muqanna المقنع (talk) 15:42, 6 September 2023 (UTC)

I think the rename from Oriya -> Odia was a bit pointless but I won't oppose a rename on common-usage grounds. Benwing2 (talk) 23:39, 6 September 2023 (UTC)

Support per above, this change is well supported by sources. MSG17 (talk) 00:16, 7 September 2023 (UTC)

Oppose I now prefer the term 'Odious', but am prepared to compromise and use the term 'Oriya' for the script. Note that the official character names all contain 'ORIYA'. --RichardW57 (talk) 10:26, 9 September 2023 (UTC)

Unicode is notoriously slow to update names if that’s what you’re referencing, but I can’t take the first point seriously. AG202 (talk) 23:49, 9 September 2023 (UTC)

Unicode character names are immutable. The closest they come is aliases, as when it was pointed out that several pairs of Lao character names were the wrong way round. --RichardW57 (talk) 18:08, 10 September 2023 (UTC)

So, they don't change with the times? CitationsFreak (talk) 21:47, 10 September 2023 (UTC)

@CitationsFreak They’re never changed, even when they’re completely wrong. It’s about ensuring backwards compatibility. Theknightwho (talk) 02:45, 18 September 2023 (UTC)

Done. Benwing2 (talk) 04:28, 1 November 2023 (UTC)

Late oppose, Odia is a non-standard misspelling that has become widespread due to propagation by the stupid, English-illiterate government. Oriya is the original, standard English term for the language and ethnicity. The change in the transliteration is reminiscent of the biggest genocider in history changing his country’s name from Cambodia to Kampuchea. It’s a shame we treaded the same non-humanist path. Inqilābī 14:53, 28 June 2024 (UTC)

For what it's worth, "Odia" is now preferred by the OED as well. I'll also be archiving this at some point. AG202 (talk) 15:20, 28 June 2024 (UTC)

Split into and

The following discussion has been moved from Wiktionary:Language treatment requests (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

is an etym-only code added back in 2014 (diff) as and later renamed to in 2019. and are ISO 639-3 codes added in January 2020. Note that the current data module incorrectly suggests (Cantonese) to be the parent of , but they are generally considered to be distinct, which is mentioned in ISO's comment on the change request. -- Wpi31 (talk) 14:40, 23 August 2022 (UTC)

Support 沈澄心 ✉ 12:29, 6 October 2022 (UTC)

Support — justin(r)leung _{{ (t...) | c=› }} 16:10, 6 October 2022 (UTC)

Support Theknightwho (talk) 15:39, 13 December 2023 (UTC)

Given this has been open for over a year, I'm going to close this as split. Theknightwho (talk) 15:39, 13 December 2023 (UTC)

etymology codes for remaining Chinese varieties in Module:zh-usex/data

The following discussion has been moved from Wiktionary:Language treatment requests (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Current variety code	Description	Current langcode	Proposed langcode	Current romanization	Comment
MSC	MSC	cmn	cmn	Pinyin
M-BJ	Beijing Mandarin	cmn	cmn-bei?	Pinyin
M-TW	Taiwanese Mandarin	cmn	cmn-TW?	Pinyin
M-MY	Malaysian Mandarin	cmn	cmn-MY?	Pinyin
M-SG	Singaporean Mandarin	cmn	cmn-SG?	Pinyin
M-PH	Philippine Mandarin	cmn	cmn-PH?	Pinyin
M-TJ	Tianjin Mandarin	cmn	cmn-tia?	Pinyin
M-NE	Northeastern Mandarin	cmn	~~cmn-nea~~ cmn-noe?	Pinyin
M-CP	Central Plains Mandarin	cmn	~~cmn-cpl~~ cmn-cep?	Pinyin
M-GZ	Guanzhong Mandarin	cmn	cmn-gua?	Pinyin	Guanzhong
M-LY	Lanyin Mandarin	cmn	cmn-lan?	Pinyin
M-S	Sichuanese	zhx-sic	zhx-sic	Sichuanese Pinyin
M-NJ	Nanjing Mandarin	cmn-njn	~~cmn-njn~~ cmn-nan?	Nankinese Pinyin
M-YZ	Yangzhou Mandarin	cmn-yaz	~~cmn-yaz~~ cmn-yan or cmn-yzh?	IPA	IPA as a placeholder
M-W	Wuhanese	cmn-wuh	cmn-wuh?	IPA
M-GL	Guilin Mandarin	cmn-gli	~~cmn-gli~~ cmn-gui?	IPA	IPA as a placeholder
M-XN	Xining Mandarin	cmn-xin	cmn-xin?	IPA	IPA as a placeholder
M-UIB	dialectal Mandarin	cmn	cmn-bei-unk?	Pinyin	UIB stands for "unidentified Beijingesque"; this is only used for dialects with similar phonology to one of Beijing dialect or MSC
M-DNG	Dungan	dng	dng	Cyrillic
CL	Classical Chinese	cmn	~~cmn-cla~~ lzh-cmn?	Pinyin
CL-TW	Classical Chinese	cmn	~~cmn-cla-TW~~ lzh-cmn-TW?	Pinyin (Taiwanese Mandarin)
CL-C	Classical Chinese	yue	~~cmn-cla-TW~~ lzh-yue?	Jyutping
CL-C-T	Classical Chinese	zhx-tai	~~zhx-tai-cla~~ lzh-tai?	Wiktionary
CL-VN	Vietnamese Literary Sinitic	vi	???	Sino-Vietnamese
CL-KR	Korean Literary Sinitic	ko	???	Sino-Korean
CL-C	Classical Chinese	yue	~~yue-cla~~ lzh-yue?	Jyutping
CL-PC	Pre-Classical Chinese	cmn	~~cmn-pcl~~ lzh-pre?	Pinyin
CL-L	Literary Chinese	cmn	~~cmn-lit~~ lzh-lit?	Pinyin
CI	Ci	cmn	~~cmn-cip~~ lzh-cip?	Pinyin
WVC	Written Vernacular Chinese	cmn	~~cmn-wrv~~ cmn-wvc?	Pinyin
WVC-C	Written Vernacular Chinese	yue	~~yue-wrv~~ yue-wvc?	Jyutping
WVC-C-T	Written Vernacular Chinese	zhx-tai	~~zhx-tai-wrv~~ zhx-tai-wvc?	Wiktionary
C	Cantonese	yue	yue	Jyutping
C-GZ	Guangzhou Cantonese	yue	yue-gua?	Jyutping
C-LIT	Literary Cantonese	yue	yue-lit?	Jyutping
C-HK	Hong Kong Cantonese	yue	yue-HK?	Jyutping
C-T	Taishanese	zhx-tai	zhx-tai	Wiktionary
C-DZ	Danzhou dialect	yue-dan	yue-dan?	IPA	IPA as a placeholder
J	Jin	cjy	cjy	Wiktionary
MB	Min Bei	mnp	mnp	Kienning Colloquial Romanized
MD	Min Dong	cdo	cdo	Bàng-uâ-cê / IPA
MN	Hokkien	nan-hbl	nan-hbl	Pe̍h-ōe-jī
TW	Taiwanese Hokkien	nan-hbl	nan-hbl-TW?	Pe̍h-ōe-jī
MN-PN	Penang Hokkien	nan-hbl	nan-pen	Pe̍h-ōe-jī
MN-PH	Philippine Hokkien	nan-hbl	nan-hbl-PH?	Pe̍h-ōe-jī
MN-T	Teochew	nan-tws	nan-tws	Peng'im
MN-L	Leizhou Min	luh	luh	Leizhou Pinyin
MN-HLF	Haklau Min	nan-hlh	nan-hlh	IPA	IPA as a placeholder
MN-H	Hainanese	hnm	hnm	Guangdong Romanization
W	Wu	wuu	wuu	Wugniu
SH	Shanghainese	wuu	wuu-sha	Wugniu
W-SZ	Suzhounese	wuu	wuu-szh	Wugniu
W-HZ	Hangzhounese	wuu	wuu-hzh	Wugniu
W-CM	Shadi Wu	wuu	~~wuu-sha~~ wuu-chm?	Wugniu	wuu-cm? including Chongming, Haimen, Changyinsha etc
W-NB	Ningbonese	wuu	wuu-ngb	Wugniu
W-N	Northern Wu	wuu	wuu-nor?	Wugniu	general northern wu, incl. transitionary varieties
W-WZ	Wenzhounese	wuu-wz	wuu-wen?	Wugniu
G	Gan	gan	gan	Wiktionary
X	Xiang	hsn	hsn	Wiktionary
H	Sixian Hakka	hak	hak-six?	Pha̍k-fa-sṳ
H-HL	Hailu Hakka	hak	hak-hai?	Taiwanese Hakka Romanization System
H-DB	Dabu Hakka	hak	hak-dab?	Taiwanese Hakka Romanization System
H-MX	Meixian Hakka	hak	hak-mei?	Hakka Transliteration Scheme
H-MY-HY	Malaysian Huiyang Hakka	hak	hak-hui?	IPA	IPA as a placeholder
H-EM	Hakka	hak	~~hak-emo~~ hak-eam?	IPA	Early Modern Hakka, IPA as a placeholder
H-ZA	Zhao'an Hakka	hak	hak-zha?	Taiwanese Hakka Romanization System
WX	Waxiang	wxa	wxa	IPA

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho We currently have 66 bespoke "variety codes" for Chinese lects in Module:zh-usex/data for use by {{zh-x}} (there are 67 entries in the data module, but CL-C occurs twice). The module maps them to language codes (full and etymology-only), but in a non-unique fashion. I haven't counted but I'm guessing maybe 40% of the variety codes have existing full or etymology-only language codes. I propose creating etymology-only codes for the remaining ones and then doing a bot run to replace the bespoke codes with Wiktionary language codes. In the above table I list my suggestions. Note that some of the currently listed lang codes don't exist and some of them are badly named or formatted and should be renamed. Benwing2 (talk) 05:52, 10 March 2024 (UTC)

lmao are we finally getting around to extirpating these?

Personally I believe that giving every dialect branch and family (and location?) its own code will result in an unwieldy alphabet soup. I propose using full names for clarity and simplicity.

The Classical Chinese codes seem silly; perhaps we should use a bipartite system giving the text language and the pronunciation language, such as lzh/cmn-TW (Literary Chinese in Taiwanese Mandarin pronunciation).

(Can we do {{zh-pron}} next?) —Fish bowl (talk) 06:16, 10 March 2024 (UTC)

Yeah I don't know much about Classical Chinese; I just added codes for them to correspond to the existing variety codes. I have no objection to merging some of the variety codes. The main disadvantage to using full names in etym codes is that they're long to type. Benwing2 (talk) 06:23, 10 March 2024 (UTC)

(Counterpoint to the "alphabet soup" concern: these 2-letter ad-hoc codes have worked so far(?) —Fish bowl (talk) 06:33, 10 March 2024 (UTC))

@Fish bowl We're going to need codes for quite a few of these anyway, when they eventually get added to {{zh-pron}}, so we might as well give them proper codes now. Quite a few of them will need to be made full languages at some point, but giving them etym-only codes is probably fine for now. Theknightwho (talk) 06:42, 10 March 2024 (UTC)

Support the proposal generally. I don't have strong feelings about the particular etymology codes that are proposed here. While we're at it, perhaps what is called Literary Cantonese should be renamed to Hong Kong Written Chinese or the like. "Literary Cantonese" is kind of a confusing label because it's more like a Mandarin-based register with varying degrees of influence from Cantonese and Literary Chinese. — justin(r)leung _{{ (t...) | c=› }} 06:37, 10 March 2024 (UTC)

Support in general. CL-* and CI should perhaps be lzh-* instead of cmn-*, or what Fish Bowl has said above. The WVC ones could use *-wvc instead of *-wrv - the former would be easier to remember. Danzhou should be zhx-dan, since it is just often grouped under Yue for convenience but not really a Yue lect. Penang Hokkien should be nan-hbl-pen as it's a Hokkien dialect. Other than these no strong feelings, though I would caution against overusing syllables that are frequently found in place names (e.g. zhou as zh in Suzhou wuu-szh and Hangzhou wuu-hzh, these are already defined so it's fine), otherwise we will run out of possible letter combinations very soon. Also concur with Justin's view on "Literary Cantonese". – wpi (talk) 07:00, 10 March 2024 (UTC)

@Wpi @Theknightwho Yeah I think we should try to write up a proposed set of conventions for new etymology language codes. Generally I try to use the first three letters of the lect name unless that creates ambiguity (e.g. I used nea for Northeastern because nor would be ambiguous with Northern), but beyond that some thought is required. Benwing2 (talk) 07:28, 10 March 2024 (UTC)

@Benwing2 @Wpi I’d oppose having 9 letter codes except where it’s unavoidable (e.g. some proto-languages), since they’re awkward and difficult to remember. In cases like Penang Hokkien, just using the family code as a prefix is probably fine. Theknightwho (talk) 14:36, 10 March 2024 (UTC)

@Theknightwho Yes that is totally reasonable. Benwing2 (talk) 19:39, 10 March 2024 (UTC)

Support

Shadi could be wuu-chm (ie. Chongming) to avoid overlap with Shanghainese if you so wish. wuu-nor looks fine. — 義順 (talk) 09:23, 10 March 2024 (UTC)

Support the general proposal, but I'm wondering if we should have something for colloquial putonghua. There are some colloquial terms that are not specific to the Beijing dialect of Mandarin, and probably some that are more used among non-native Mandarin speakers in southern China. The dog2 (talk) 14:29, 10 March 2024 (UTC)

Comment: I updated some of the suggested codes based on comments and based on my attempts to be more consistent. I am using the following logic for defining codes:

Use the first three letters of the variety/dialect/lect name if possible.
If that causes ambiguity:
1. If the lect name has three components, use the first letter of each.
2. If the lect name has two components and one of them begins with a digraph, use the digraph along with the first letter of the other, e.g. Yangzhou could be abbreviated yzh.
3. Otherwise, if the lect name has two components, use the first two letters of the first component followed by the first letter of the second, e.g. Early Modern could be abbreviated eam and Northeast(ern) could be abbreviated noe.
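
The rules above can be sketched roughly as follows. This is only an illustration, not code from any Wiktionary module: the function name `suggest_suffix`, the `DIGRAPHS` list, and the idea of passing the lect name pre-split into components together with a set of already-taken suffixes are all assumptions made for the example.

```python
# Hypothetical sketch of the abbreviation rules described above.
# DIGRAPHS is an assumed, non-exhaustive list of romanization digraphs.
DIGRAPHS = ("zh", "ch", "sh", "ng")


def suggest_suffix(components, taken):
    """Suggest a three-letter suffix for a lect name.

    components: the lect name split into its parts by the caller,
                e.g. ["Yang", "zhou"] or ["Early", "Modern"].
    taken:      set of suffixes that would be ambiguous or are already in use.
    Returns None when none of the rules apply.
    """
    parts = [c.lower() for c in components]
    default = "".join(parts)[:3]
    # Default rule: first three letters of the name, if unambiguous.
    if default not in taken:
        return default
    # Rule 1: three components, use the first letter of each.
    if len(parts) == 3:
        return "".join(p[0] for p in parts)
    if len(parts) == 2:
        first, second = parts
        # Rule 2: one component begins with a digraph.
        if first.startswith(DIGRAPHS):
            return first[:2] + second[0]
        if second.startswith(DIGRAPHS):
            return first[0] + second[:2]
        # Rule 3: first two letters of the first component plus
        # the first letter of the second.
        return first[:2] + second[0]
    return None


print(suggest_suffix(["Yang", "zhou"], {"yan"}))      # yzh
print(suggest_suffix(["Early", "Modern"], {"ear"}))   # eam
print(suggest_suffix(["North", "eastern"], {"nor"}))  # noe
```

Splitting a name such as "Yangzhou" into components is left to the caller here, since the rules as written do not say how that should be done automatically.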

In response to User:The dog2's comments about colloquial Putonghua, would the current M-UIB variety code suffice? The comment next to it reads: UIB stands for "unidentified Beijingesque"; this is only used for dialects with similar phonology to one of Beijing dialect or MSC.

In response to User:Wpi: I updated the Classical Chinese codes to use lzh-*. What about 'Vietnamese Literary Sinitic' and 'Korean Literary Sinitic'? Are these actual lects that are essentially Vietnamese/Korean-influenced usage of Classical Chinese, or are they merely the use of Chinese terms in Vietnamese and Korean? In the former case we could adopt codes lzh-VI and lzh-KO; in the latter case I'm not sure we need any codes. Benwing2 (talk) 23:32, 10 March 2024 (UTC)

@Theknightwho @Chuck Entz We have errors coming from the undefined codes that currently occur in Module:zh-usex/data. We either need to temporarily change the undefined codes to one of the currently in-use codes, or go ahead and define etymology-only language codes corresponding to the undefined codes (not necessarily using the codes already present; see my suggestions above). The set of undefined codes causing errors is wuu-wz (Wenzhounese), cmn-gli (Guilin), cmn-xin (Xining), cmn-njn (Nanjing), cmn-yaz (Yangzhou), yue-dan (Danzhou). Benwing2 (talk) 23:39, 10 March 2024 (UTC)

@Benwing2 Of those, I'm pretty sure Danzhou and Nanjing should be full languages for sure, but I'm unsure about the others. @wpi, justinrleung? Theknightwho (talk) 23:43, 10 March 2024 (UTC)

@Theknightwho Full languages take longer to gain consensus. IMO if there are no objections we can define them for now as etym-only languages and upgrade them later when the discussion has played out. (E.g. I have heard it said that Wenzhounese itself consists of several mutually incomprehensible dialects, meaning potentially we would need several full languages at some point.) Benwing2 (talk) 23:49, 10 March 2024 (UTC)

@Benwing2 Sure. In the case of Danzhou, it should probably be made a child of Chinese (zh) for now, then. It's traditionally been counted as a Yue lect, but more recently it's been treated as an unclassified divergent lect; however, we treat Yue as a family (zhx-yue), and reserve the code yue for Cantonese. Whether or not Danzhou is part of Yue, it's definitely not part of Cantonese in the sense we're defining it as. Theknightwho (talk) 23:54, 10 March 2024 (UTC)

I don't know about calling it "Beijing-esque". One thing is that you don't really get the erhua in Taiwan, or when people from southern China speak Mandarin. And someone from Beijing will always distinguish between 咱們 and 我們 when speaking standard Mandarin, but you don't see that among people from southern China when they speak Mandarin. Should we just have a generic "Southern Chinese Mandarin" then? The dog2 (talk) 00:09, 11 March 2024 (UTC)

@The dog2 "Southern Chinese Mandarin" seems problematic because Mandarin is a huge area with lots of diversity, and here you don't mean "Mandarin as spoken in the southern part of the Mandarin-speaking area" so much as "Southern Standard Mandarin". Benwing2 (talk) 00:30, 11 March 2024 (UTC)

@Benwing2: What I meant is Mandarin as spoken today in traditionally non-Mandarin-speaking areas like Fujian and Guangdong. The dog2 (talk) 01:40, 11 March 2024 (UTC)

@The dog2 Right, but conceptually your "Southern Chinese Mandarin" is completely different from e.g. Southwestern Mandarin. The latter refers to the lects spoken natively in the southwestern part of the Mandarin-speaking area but the former refers not to the native Mandarin lects in the southern part of the Mandarin-speaking area (which would be something like Jianghuai Mandarin) but to the variety of Standard Mandarin (which is a northern Mandarin variety) as spoken in southern regions that natively don't speak Mandarin at all. Benwing2 (talk) 01:46, 11 March 2024 (UTC)

@Benwing2: It's a little more complicated than that these days. In some places like Nanning and Fuzhou, most of the younger generations can't speak the local dialect anymore, and now speak standard Mandarin as their native language. The dog2 (talk) 04:27, 11 March 2024 (UTC)

@The dog2 OK sure (what you are describing is unfortunately happening everywhere), but are you objecting to the term "Southern Standard Mandarin"? ("Southern Chinese Mandarin" seems to me both problematic for the reasons I have outlined, and redundant in that "Mandarin" is a variety of "Chinese".) Benwing2 (talk) 04:34, 11 March 2024 (UTC)

@Benwing2: Maybe let's ask Justinrleung what he thinks is a good name, because I can't really think of one. And it really depends on which part of China. In the Teochew-speaking areas, the dialect is still going very strong. And people from Fuzhou often lament that people from southern Fujian have preserved the dialects much better than in Fuzhou. The dog2 (talk) 05:05, 11 March 2024 (UTC)

@Benwing2, The dog2: I don't really know what the purpose of this "Southern Standard Mandarin" would be for zh-x or elsewhere on Wiktionary. Terms that are chiefly used in the south but still considered standard aren't usually marked as southern, and there isn't really a cohesive variety, just possibly some shared tendencies. — justin(r)leung _{{ (t...) | c=› }} 05:17, 11 March 2024 (UTC)

The "Beijingesque" tag is chiefly used by @Dokurrat. Dokurrat, can you explain this tag a bit? —Fish bowl (talk) 22:32, 11 March 2024 (UTC)

@Fish bowl: Sometimes a word or expression exists in both Beijing dialect and my native lexicon and I would construct an example sentence for it. In such a case, I feel weird labelling my example sentence as "Beijing dialect", as I'm not a Beijing dialect speaker. And hence I created this "UIB" tag thing. Now that I review this thing, I think I could've just used a tag that says "Mandarin" instead. I have no issue retiring the "UIB" tag or renaming it or doing whatever y'all see fit with it. Dokurrat (talk) 08:20, 13 March 2024 (UTC)

@Benwing2, Theknightwho: I'm not sure about Nanjing being a full language but not other varieties of Mandarin (Yangzhou is also Jianghuai, for example, so why would it be differentially treated?) Danzhou can be a full language since its status is disputed. BTW, I'm not exactly sure about "Cantonese" being not the same as "Yue" in current practice on Wiktionary. It just seems like that because all Cantonese entries in jyutping are based on Standard Cantonese (because Jyutping is inherently created for Standard Cantonese) and translations are 99.9% in Standard Cantonese because of our editors' knowledge; however, in "zh-dial" and "zh-pron", "Cantonese" means "Yue". I'm not exactly down with the idea that yue = Cantonese and zhx-yue = Yue, since it's kind of different from how we're treating nan, for example. — justin(r)leung _{{ (t...) | c=› }} 01:00, 11 March 2024 (UTC)

@Justinrleung I think it would be sensible to have a major thread each for Mandarin and Yue to hash out how we handle them, in a similar fashion to what we've been doing for (Southern) Min, since it would help to iron out these kind of issues, as I think the current piecemeal approach leads to a lot of confusion. In particular, there's the issue you point out as to what we should be using the yue and cmn codes for. Theknightwho (talk) 01:06, 11 March 2024 (UTC)

@Theknightwho: Yes, I agree. — justin(r)leung _{{ (t...) | c=› }} 01:12, 11 March 2024 (UTC)

I added etym codes for the lects in Module:zh-usex/data that were causing errors, using my proposed codes above. They are marked as temporary, pending further discussion. Benwing2 (talk) 22:38, 11 March 2024 (UTC)

Done. I omitted the things listed as "DO WE NEED THIS?" above and also omitted "Literary Chinese" because I have no idea how it differs from just lzh, which is also "Literary Chinese". Please note, there are lots more lects mentioned in Module:labels/data/lang/zh and in qualifiers in Chinese thesaurus entries; I will post separately about these. Benwing2 (talk) 23:55, 17 March 2024 (UTC)