Autoformat has identified a number of entries that have the non-conforming language name "Chinese (traditional/simplified)". There are also others that it has not yet flagged. I could not be trusted to correct this properly. DCDuring TALK 17:32, 9 May 2010 (UTC)
As a rule, assume it's Mandarin. Traditional/Simplified entries go on a single, they're not really different 'scripts' but more like the French spelling reforms, where paraître becomes paraitre as the circumflex doesn't serve any purpose. Mglovesfun (talk) 08:26, 29 September 2011 (UTC)
?
(Note: I don't know a thing about Chinese.) A few questions/issues:
WT:AZH#Min_Nan says that Min Nan "has four main branches... This poses a problem for Wiktionary, since these dialects are not mutually intelligible, and only one L2 header may be used per ISO 639 code. ... To date, virtually all entries for Min Nan have been based on the Amoy dialect, which is widely considered to be a de facto standard. The disposition of other dialects such as Teochew and Qiongwen Hainanese remains undecided at this time." I'm pretty sure that standard practice for branches among languages is to use context labels for words that don't exist in some branches. Why should this language be different?
I seem to recall some consensus about not allowing toneless pinyin entries? If there was, shouldn't this be mentioned on WT:About Chinese?
WT:AZH lists {{infl}} as being the standard template to use, and repeats it many times for all the languages that do not yet have templates built for them specifically. Rather than showing an explanation for {{infl}} over and over again, wouldn't it make sense to make the page say that for dialects that don't have specific templates yet, use infl, and then explain how to use it once?
Are these languages treated as separate languages or as dialects of one language? If they're separate languages, why do things like Category:Chinese templates exist, instead of being split into sections?
What is the Wiktionary code for Mandarin, zh or cmn?
Just one answer for the moment, to "What is the Wiktionary code for Mandarin, zh or cmn?": this is annoying, but the assisted method doesn't work well with cmn. It creates {{tø|cmn| for translations, so they can't be linked to zh:wiki. zh works better, but bots change it to cmn. ZH is short for Chinese, 中文 (Zhōngwén); CMN is Chinese Mandarin, but both have the word Mandarin in templates. I learned to live with this :) The reasons for the existence of both Chinese and Mandarin are historical. Mandarin is standard Chinese and most written Chinese material is in Mandarin. There are no YUE, NAN, etc. Wiktionaries, but there are some new Wikipedias in dialects. --Anatoli 12:36, 24 May 2010 (UTC)
I proposed on WT:BP, and still propose, eliminating zh, zh-cn and zh-tw from category names. zh is used for translations because the Mandarin Wikiprojects use the code zh, not cmn. Mglovesfun (talk) 08:28, 29 September 2011 (UTC)
I support moving to About Chinese languages. IMO as long as there is no Mandarin-specific information to be split off of that page, hard-redirect from About Mandarin. Precedent, fwiw, is About sign languages, redirected to from both About American Sign Language and WT:AASE (ase is American Sign Language) as well as from WT:ASGN (sgn is the group (or whatever it's called) code for sign languages).—msh210℠ (talk) 21:03, 10 November 2010 (UTC)Reply
I propose making it a language policy to ban all proper names used in a Mandarin context if they are not in Hanzi, regardless of whether there are citations. Chinese speakers do occasionally write in a foreign language, but these foreign words don't become Chinese. Foreign words should be, and are, transliterated into Chinese characters; otherwise they should not be considered Mandarin. The complexity is not a justification for not following this rule. This is to avoid entries such as Thames河, Alps山, Alzheimer病, etc. once and for all. PRC and RC policies both regard using names in Roman letters as incorrect, which is widely accepted. --Anatoli 05:18, 29 September 2011 (UTC)
I support this. Japanese speakers also use Latin-based foreign words in their writing occasionally, when there is a perfect katakana equivalent. Sometimes, it's done for stylistic reasons (as, very unfortunately, Western cultures are considered trendy in Asian countries), sometimes, well, some just want to show off. You can find this aspect especially in their song lyrics. Quite often the English lines don't even make sense whatsoever. Anyway, I digress. As I noted, writing in foreign scripts especially Latin-based languages is especially trendy among younger generations. Ok let me put it another way. I have seen English speakers putting words in Japanese hira or kata characters in their writing, when the same concept can be written in English perfectly. It's the result of a change in people's perception towards the Japanese (language or otherwise), which is now considered trendy and also the proliferation of Japanese learners in the past decade. Again, does it mean these words are now considered borrowed into English? If you say yes, then I have no problem with Thames河 being included in this dictionary. Jamesjiao → T ◊ C06:00, 29 September 2011 (UTC)Reply
Re: setting up a vote (something mentioned in the BP): do you want to set up a vote that would only ban proper nouns? Or do you think common nouns like e-mail地址 should be banned, too? If so, then the vote could be broader. But your comments on RFV suggest you wouldn't delete all mixed-script entries (eg Y字). Presuming you'd like to ban e-mail地址 but not Y字, how can the vote be worded, so that it does that? - -sche(discuss)06:03, 29 September 2011 (UTC)Reply
@-sche, don't get me wrong, mixed scripts are perfectly normal, like the ones you listed and many more, e.g. AA制 (ēi'ēi zhì). Karaoke can only be written as 卡拉OK in Mandarin. I'm talking about proper nouns: I don't want to mislead users into believing that Oslo is Oslo市 in Chinese, even if you find examples of usage. I have seen a Chinese map of Australia on a Chinese site on the internet where only the biggest cities were translated into Chinese. A user like Engirst would start quoting the untranslated names as Mandarin, which is wrong.
I was just comparing the use of Japanese hiragana/katakana in English (esp. among Japanophiles) with the use of English (or other Latin-script-based languages) words in Chinese (due to trendiness, probably?). This might not be a perfect analogy, but it's a start. You will also find that people are more inclined to use Latin characters, especially for proper nouns, when using a computer keyboard (as opposed to handwriting). I also mentioned the fact that monolingual Chinese speakers wouldn't understand a mixed construction like this. Jamesjiao → T ◊ C 06:45, 29 September 2011 (UTC)
Oh another thing is pronunciation. For a word to exist in a language, there has to be a way to pronounce it. I can't imagine a non-English speaking Chinese speaker trying to pronounce Thames河 even if he/she is able to recognize and even pronounce the individual letters. Jamesjiao → T ◊ C06:52, 29 September 2011 (UTC)Reply
I definitely don't think that Kana words in English are to be considered English but I haven't seen it, that's why I couldn't understand what you mean. Yes, you're right, most Chinese speakers wouldn't have a clue how to pronounce Thames河 or Seine河, Hudson河 or Volga河. --Anatoli09:47, 29 September 2011 (UTC)Reply
There is not only one standard for the Chinese language. Chinese is not only for Mainland China, but also for Taiwan, Hong Kong, Macau, Singapore and overseas. For example, President Bush is written as 布什, 布殊 and Bush as well. 2.25.212.4 13:02, 30 September 2011 (UTC)
Wow, I get such a strong sense of déjà vu here... Engirst, do you have any original arguments? Your points above have been refuted. As noted elsewhere:
we already have a record of Thames and a record of 河(hé);
using a term from one language in a sentence of another language may represent w:code-switching instead of borrowing;
there is nothing intrinsically Chinese about Thames;
the use of Thames in Thames河 is an example of an English term used as an English term in a Chinese context;
the use of Thames in Thames河 is a collocation of two independent terms;
So, to extrapolate a basic list of criteria for including any word from Language A under the heading for Language B, not just proper nouns:
Is the term used in Language B to convey any meaning that is different from its meaning in Language A?
Alternately, is the term used widely enough in Language B that most speakers and/or readers of Language B should be expected to know and readily use the term?
Well, that's it, actually. I can't think of any other solid reasons for including a term from one language under the heading for another language. Use in Language B does not necessarily mean that the term has been adopted into that language. As soon as the term is used as Language B, i.e. where it has some meaning that is specific to that language or where it is well-known and widely used, then I am happy to advocate listing under both Language A and Language B headings.
-- HTH, Eiríkr Útlendi | Tala við mig 23:04, 30 September 2011 (UTC)
This is a very comprehensive list. Code-switching is what I had in mind, but I couldn't remember the term at the time. Code-switching occurs extremely often in Taiwan, not just between Mandarin and English, but Japanese, Korean and even their local flavour of Hokkien dialect as well. I often see short Japanese phrases like かわいいね。。。 in Taiwanese online blogs mixed in with Chinese characters. This is a very typical case of code-switching in writing. Jamesjiao → T ◊ C02:06, 5 October 2011 (UTC)Reply
Not being a speaker of Mandarin or Japanese, I have a question which might help to clarify the issue for those in a similar position. Which of the following examples in English best equates to "Thames河" in Mandarin: "résumé" (a French word, wholly adopted but retaining glyphs which are not properly in the English alphabet), άλφα (a Greek word which, when used, is italicized to indicate that it is from a different language), or something completely different? I do think it might be a bit early for voting, since in all of the discussions around this topic I have only seen 5 or 6 contributors. - DaveRoss 02:37, 5 October 2011 (UTC)
TheDaveRoss, it's only one user, not many, who creates and recreates them (trust me), with different IPs. The issue at hand is that this user claims that "Thames河" (English "Thames" + 河, "river") is a Mandarin word, citing examples from books. Note that river names are always followed by 河 or other similar words in Mandarin. There are other examples where foreign names are written in Mandarin without translating, showing the foreign name in the original script. My argument is that the Chinese word for Thames is 泰晤士河 (Google Books: 3,150 hits) and there is no reason to include the SoP term Thames河; there is nothing Chinese in Thames. The rule and common practice is to transliterate or translate the names of people, cities, etc., no matter how small. There are borrowings into Mandarin, and a very few also contain Roman letters (三K黨 / 三K党, Ku Klux Klan), but writing full names in Roman letters is a case of code-switching. OK#Mandarin is a common noun, not a proper name; it has become partially naturalised. Like any other language, Mandarin uses its native script to write words, using other scripts only when it absolutely has to. "London市" and "Hyde公园" are not exceptions; they are cases of code-switching (simply Chinglish). The correct and common terms are "伦敦" and "海德公园". The issue is not just Mandarin-specific. Some argue that bluetooth should be the right way to write the word in Russian. A similar situation could arise for Japanese, Russian, Hindi, Korean, Arabic or others, where people insert names in Roman letters. I believe these names don't become naturalised. I hope I expressed myself well. If a word in Roman letters becomes naturalised, then we can include it; we are still discussing pizza#Mandarin (a common word). --Anatoli 03:07, 5 October 2011 (UTC)
Pinyin with no tra or sim
Is there any sensible way to find these? I have been speedy deleting some of these; given that {{pinyin reading of}} links to the tra and sim, it seems reasonable. For example, we don't allow plurals that don't have a singular ({{plural of|xyz}} when xyz doesn't exist yet). If anyone wants to create Hanzi entries for these and then recreate the pinyin, it is with my blessing. Mglovesfun (talk) 12:27, 2 October 2011 (UTC)
He is saying that we don't allow a plural form entry for English words when the singular form does not yet exist. He is asking if that also means that we shouldn't have the pinyin form when the traditional or simplified Mandarin forms do not yet exist. He has been deleting them when he sees them. - DaveRoss02:39, 5 October 2011 (UTC)Reply
I think Engirst considers the character entries too complex and not worth his time creating. I digress. There is in fact a vote here (That a pinyin entry, using the tone-marking diacritics, be allowed whenever we have an entry for a traditional-characters or simplified-characters spelling.). It doesn't, however, explicitly exclude pinyin entries when there are no character entries present. Maybe the wording can be changed to something like: That a pinyin entry, using the tone-marking diacritics, only be allowed whenever we have an entry for a traditional-characters or simplified-characters spelling. Jamesjiao → T ◊ C 02:47, 5 October 2011 (UTC)
Sounds like a reasonable suggestion. There aren't enough resources for validating romanisation entries (SoP, attestability, etc.), let alone the Chinese characters (often one version is omitted). Not sure how this can be done, but I support voting on this. Maybe Engirst will start creating some Chinese character entries before adding pinyin? (wishful thinking) --Anatoli 03:11, 5 October 2011 (UTC)
I'm OK with users adding valid pinyin (attestable / with correct tone-markings) without adding hanzi, I'm also OK with users creating valid plurals (attestable) without creating the singulars... we allow that on de.Wikt, we even have bots to create forms without regard to the presence of the lemmata, because in that way, a user who looks up the form or the pinyin will at least have a bit of information, better than nothing. Having said that, I think all of you, as the active Chinese editors, could form a consensus and agree that you interpret the vote as requiring hanzi to exist first (this is how I always interpreted the vote), and delete pinyin entries that have no hanzi form, without having a new vote. - -sche(discuss)03:36, 5 October 2011 (UTC)Reply
Seems like without any vote, nothing can be achieved in Mandarin space, most active Chinese editors (except for this user) all disagree with Engirst (he may now be avoiding his own user account) but changing or deleting his entries causes edit wars or someone may think he is just being bullied. --Anatoli03:55, 5 October 2011 (UTC)Reply
Just as another reference for comparison --
If I understand it correctly, the current policy for Japanese entries is to have the main entry with most of the information located under the kanji headword when there is one, or under the kana headword otherwise, and for the romaji (Japanese pinyin, as it were) entries to *only* serve as disambig pages pointing users to the relevant other headwords. Consequently, romaji entries should not have any "See also", "Derived terms", "Usage notes", or other headings. The kōgai(kōgai) entry is a good example of this in action. -- Eiríkr Útlendi | Tala við mig04:56, 5 October 2011 (UTC)Reply
Pinyin romanisation rules went further - parts of speech are not allowed but we do have many pinyin entries without hanzi. --Anatoli00:23, 7 October 2011 (UTC)Reply
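As a rough illustration of the question that opened this thread (how to find pinyin entries whose traditional and simplified targets do not exist yet), the check boils down to a set lookup. This is only a sketch under stated assumptions: the function name, the input shapes and the sample data are hypothetical, and nothing here queries the live wiki or reflects how any existing bot works.

```python
def pinyin_without_hanzi(pinyin_targets, existing_pages):
    """Return pinyin titles for which none of the linked hanzi pages exist.

    pinyin_targets: dict mapping a pinyin entry title to the hanzi titles
                    it links to (hypothetical input shape).
    existing_pages: set of page titles that already exist.
    """
    orphans = []
    for title, targets in pinyin_targets.items():
        # Keep the entry only if every linked hanzi page is missing.
        if not any(target in existing_pages for target in targets):
            orphans.append(title)
    return sorted(orphans)


# Hypothetical sample data, not taken from the live wiki.
pinyin_targets = {
    "bàoyuàn": ["抱怨"],
    "Zhōngwén": ["中文"],
}
existing_pages = {"抱怨"}

print(pinyin_without_hanzi(pinyin_targets, existing_pages))
```

With this sample data only the second title is reported, because its single target is absent from the set of existing pages.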
Are the tone-markings on these words correct?
Talk:Nèi Ménggǔ, Talk:Ménggǔ. (Other editors: feel free to list entries in this section if you doubt they have correct tone-markings. It should be helpful to have a single place to gather them for cleanup. If there is such a place already, other than the clogged WT:RFC page, please move these there.) - -sche(discuss)12:01, 3 October 2011 (UTC)Reply
Your examples show tone sandhi, where an original third tone is pronounced as a second tone in front of another third tone, but this is usually not reflected in pinyin romanisation. --Anatoli 12:54, 3 October 2011 (UTC)
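The rule described in the reply above (a third tone pronounced as a second tone before another third tone) can be sketched in a few lines. This is a simplified illustration only: it uses tone-number pinyin, looks at each adjacent pair of underlying tones, and ignores the prosodic grouping that affects longer runs of third tones. The function name is hypothetical.

```python
def apply_third_tone_sandhi(syllables):
    """Return the spoken form of a list of tone-number pinyin syllables.

    A syllable written with underlying tone 3 is changed to tone 2 when
    the next syllable also carries underlying tone 3. The written pinyin
    would normally keep the original tone.
    """
    spoken = []
    for i, syllable in enumerate(syllables):
        following = syllables[i + 1] if i + 1 < len(syllables) else ""
        if syllable.endswith("3") and following.endswith("3"):
            spoken.append(syllable[:-1] + "2")
        else:
            spoken.append(syllable)
    return spoken


print(apply_third_tone_sandhi(["ni3", "hao3"]))
print(apply_third_tone_sandhi(["nei4", "meng3", "gu3"]))
```

The second call assumes underlying tones 3-3 for the last two syllables, which is the reading the reply above implies for the listed entries.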
Templates like {{cmn-noun}} allow p for pinyin as a first parameter. This should be phased out. There's an effort to remove all pinyin from part of speech categories and have them only in Category:Mandarin pinyin and subcategories, at some point the templates will have to follow suit, though we're months away from being ready. So this is a heads up. --Mglovesfun (talk) 21:22, 10 November 2011 (UTC)Reply
But this parameter serves the same purpose as tr (transliteration), and the hyperlink makes it possible to see whether there are other hanzi with the same pinyin. I have no strong opinion on your suggestion at the moment.
I've been checking your list at User:MglovesfunBot/cmn-parts-of-speech-Latn, as you have noticed. It's quite big and very time-consuming, so I am inviting other Sinophone editors to join the effort. If the entry's hanzi are red-linked, it can be deleted rather than converted. Sometimes I also leave entries if they only have a Japanese but no Mandarin entry (planning to add them later). --Anatoli 21:54, 10 November 2011 (UTC)
I have some doubts about your request. The main reason is the many homophones; the request should also specify whether we want jiantizi, fantizi or both (there are variant characters too). The conversion is far from straightforward. Perhaps using audio files based on toned pinyin was the right choice, even if it's more complicated to use bots to add audio files to hanzi entries. I see that some of the audio entries are missing tone marks. --Anatoli (обсудить) 21:43, 19 February 2012 (UTC)
I think the audio files should stay at the pinyin filenames, because if I am not mistaken, multiple characters with the same pinyin romanization X have the same pronunciation. Giving the file a pinyin filename allows it to be uploaded to all characters that have pinyin X. It seems easier to write a bot to do that, than to host the same file under dozens of names. - -sche(discuss)21:48, 19 February 2012 (UTC)Reply
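The argument above (one file per pinyin reading, attached to every character with that reading) can be sketched as a simple grouping step. This is a minimal sketch, not an actual bot: the function names, the input format and the filename pattern are all hypothetical, and the three sample readings are only there to show the grouping.

```python
from collections import defaultdict


def plan_audio_links(readings, filename_for):
    """Group characters by pinyin so one audio file serves each group.

    readings:     iterable of (hanzi, pinyin) pairs (hypothetical input).
    filename_for: function turning a pinyin string into a file name.
    Returns a dict mapping each file name to the characters that use it.
    """
    by_pinyin = defaultdict(list)
    for hanzi, pinyin in readings:
        by_pinyin[pinyin].append(hanzi)
    return {filename_for(pinyin): chars for pinyin, chars in by_pinyin.items()}


def example_filename(pinyin):
    # Hypothetical naming pattern, chosen only for this illustration.
    return f"zh-{pinyin}.ogg"


readings = [("馬", "mǎ"), ("碼", "mǎ"), ("媽", "mā")]
plan = plan_audio_links(readings, example_filename)
for filename, chars in plan.items():
    print(filename, chars)
```

Under this layout a file is stored once per reading, and homophones only add entries to the list for that file rather than new copies of the audio.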
If Commonsrad seems not clear enough, I have no objection to giving the name 5 more bytes; the data space won't mind 2000 more bytes either. I chose a name close to {{Commonscat}} because it is very similar to it. In fact, Commonsrad can be called a variation of Commonscat, but with a display better suited to its usage and the possibility of easy expansion whenever wanted. If it may then be used to link non-radical Chinese glyph Wiktionary pages to their Commons categories, Commonsrad is less misleading than a clearer descriptive name like {{commons radical}}. -- sarang♥사랑 18:09, 11 April 2012 (UTC)
It seems to be the practice at Wiktionary to have template names with lower case initials (with upper case redirects)? Another question to decide! -- sarang♥사랑 05:48, 12 April 2012 (UTC)
The transliteration and four-corner number, respectively, of these characters were tagged {{fact}}; can anyone verify them? They and the Japanese character 䋖 (the On-reading of which has been questioned) are the last remaining Han characters tagged {{fact}}. - -sche (discuss) 00:01, 27 December 2012 (UTC)
Toneless pinyin usage notes
Currently, our toneless pinyin entries all have a usage note at the bottom which says:
English transcriptions of Chinese speech often fail to distinguish between the critical tonal differences employed in the Chinese language, using words such as this one without the appropriate indication of tone.
I don't have much of a problem with it (although maybe "Chinese" should be changed to "Mandarin"), but I realized that if we do want to change it, it will be somewhat difficult, and some of them may be edited and fall out of synch. To solve that, I propose that we create a template called {{cmn-toneless-note}} or something similar and ask an editor with an AWB account to change all instances of the text into a template call. What do you guys think? —Μετάknowledgediscuss/deeds19:13, 6 January 2013 (UTC)Reply
Support. Also, "using words" should probably be "writing syllables". (We don't have toneless-pinyin entries for whole words, only for individual syllables.) —Ruakh TALK 20:28, 6 January 2013 (UTC)
Well... sort of. On one hand, you are correct that this is only used for specific syllables, but OTOH the syllables are words, in the loose Chinese way of looking at what constitutes a word. (One Chinese man was trying arduously to convince me that all words in Mandarin are one syllable long. I was unsuccessful in my attempts to get him to revise his native definition of what a word is to the Western linguistic concept.) Incidentally, the entries (like nu#Mandarin) also point to forms like nǚ, which not only is marked for tone but also has a different vowel, and perhaps the note should reflect that. (Of course, I'm not sure how useful that is anyway, because when my friends don't have access to the character 女, they type nv3, not the equally inaccessible diacritic form.) —Μετάknowledgediscuss/deeds21:13, 6 January 2013 (UTC)Reply
Well, if our goal were to conform to "the loose Chinese way of looking at" their languages, then we'd treat all of them as dialects of a single language. It isn't, so we don't. By most linguistically-well-informed accounts, the vast majority of Mandarin words are bisyllabic. —RuakhTALK22:50, 6 January 2013 (UTC)Reply
I don't find it arrogant, but one needs to know that Chinese (also Vietnamese, Thai, etc.) is traditionally called monosyllabic, as all or almost all polysyllabic words are made of component words. Exceptions are phonetic transcriptions and characters that have lost their meaning over time, but that is less of a case with Mandarin. --Anatoli (обсудить/вклад) 04:44, 10 January 2013 (UTC)
I was referring to the "dialect/language" comment, where he regarded "we" as identical to himself in having the personal stance of considering "Chinese is not a single language" to be false. It is a language, by Wikipedia at least. 129.78.32.21 05:04, 10 January 2013 (UTC)
Views on this differ, but I agree that Chinese topolects are more like dialects than separate languages. Even if they may not be mutually comprehensible when spoken and are quite different on the written level, they are often closer than dialects of other languages (provided they are written the Chinese way, using hanzi, not Roman, Cyrillic, Arabic or other scripts). Wiktionary treats Chinese topolects differently as per language headers, but translations are all nested under "Chinese", e.g. Chinese/Mandarin, Chinese/Cantonese, etc. --Anatoli (обсудить/вклад) 05:13, 10 January 2013 (UTC)
Please note that full words in toneless pinyin were explicitly forbidden by votes and almost unanimous agreements; it happened before Metaknowledge became active. --Anatoli (обсудить/вклад) 22:54, 6 January 2013 (UTC)
Delete all pinyin, whether toned or not. Move it to Appendix at least. It is merely a transcription scheme, not even official orthography. 129.78.32.21 05:06, 10 January 2013 (UTC)
It doesn't work this way. IP users (anonymous) with no or little contributions have little influence and structure is decided after discussions, votes, etc. Entries in Category:Mandarin pinyin do not claim they are proper writing, they are a helpful tool for users to help them find hanzi entries. They have limited information, all information is contained in hanzi entries. Compare bàoyuàn and 抱怨(bàoyuàn). --Anatoli(обсудить/вклад)05:19, 10 January 2013 (UTC)Reply
I knew they contain limited information. Still, they should not exist in the main namespace. This is a dictionary, much more specific than a "tool". The search function is sufficient in directing users to character entries for polysyllabics. With the monosyllabics a link to an Appendix page is all that is necessary. Keeping everything in the main namespace is unworthily energy-consuming. 60.240.101.246 06:40, 10 January 2013 (UTC)
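For readers following the toneless-pinyin discussion in this thread, the three written forms mentioned earlier (diacritic nǚ, typed nv3, and toneless nu) can be related mechanically with Unicode decomposition. The sketch below is an illustration only: the function names are hypothetical, it handles one lower-case syllable at a time, and it uses Python's standard `unicodedata` module rather than any Wiktionary template or module.

```python
import unicodedata

# Combining marks used for the four Mandarin tones in pinyin.
TONE_MARKS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}


def to_numbered(syllable):
    """Convert one diacritic pinyin syllable to tone-number form (nǚ -> nv3)."""
    tone = ""
    letters = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)
    # Recompose so that u + diaeresis becomes a single ü, then type it as v.
    base = unicodedata.normalize("NFC", "".join(letters)).replace("\u00fc", "v")
    return base + tone


def to_toneless(syllable):
    """Strip every diacritic, giving the toneless entry title (nǚ -> nu)."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_numbered("nǚ"), to_toneless("nǚ"))
print(to_numbered("bào"), to_toneless("bào"))
```

Note that the toneless form loses both the tone and the ü/u distinction, which is the point raised above about entries like nu#Mandarin pointing to nǚ.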
Proposal to change topical categories for Mandarin to match other languages, sort by pinyin, not radical
How did you find them, at random, or do you have a script for that? User:Tooironic used to add IPA but he is less active now. User:Wyang has developed an entry creation template, Template:cmn new, which also generates the IPA, so for 积累, the IPA is /t͡ɕi⁵⁵ leɪ̯²¹⁴⁻²¹⁽⁴⁾/. My preference is to delete the IPA altogether (replace with {{rfp}}) rather than showing the wrong info. --Anatoli (обсудить/вклад) 23:36, 23 May 2013 (UTC)
I found them at random(ish). I used WP:AWB to find entries containing deprecated IPA characters, and happened to notice that in addition to containing deprecated characters, all of these entries also lacked vowels. - -sche(discuss)23:47, 23 May 2013 (UTC)Reply
Unified Chinese vote
Demonyms and language names are common nouns in Chinese. I suggest using lower case for pinyin and no space, even if dictionaries are inconsistent. Please vote below and invite anyone who might be interested. So, for example: for 中國人/中国人 (Zhōngguórén), zhōngguórén; for 中文 (Zhōngwén), zhōngwén; not Zhōngguórén/Zhōngguó rén and Zhōngwén.
I do, in English, but nihongo or nihonjin is not capitalised. Russian and Finnish don't capitalise those. French only capitalises demonyms, not languages. It can go both ways with language names and demonyms; dictionaries have one or the other way. That's why this discussion. --Anatoli (обсудить/вклад) 10:20, 8 May 2014 (UTC)
@Geographyinitiative: Can you make up your mind if you support or oppose this proposal? You voted twice on the same day. It's illegal (your vote won't count) and supporting both options is not part of this vote but you can comment or abstain. Also, please check the topic of the vote. This is about capitalisation of Mandarin pinyin only and only for demonyms e.g. 中國人/中国人 (Zhōngguórén) and language names e.g. 中文 (Zhōngwén). Country, city names, etc. are not part of this vote - they are capitalised. --Anatoli T.(обсудить/вклад)01:01, 27 May 2019 (UTC)Reply
Whoops, I see what you are saying. But anyway, here's my response: 汉语拼音正词法基本规则 (2012) 6.3.3 says "专有名词成分与普通名词成分连写在一起,是专有名词或视为专有名词的,首字母大写。例如:Míngshǐ(明史)Hànyǔ(汉语)Yuèyǔ(粤语)Guǎngdōnghuà(广东话)Fójiào(佛教)Tángcháo(唐朝)"
Xiandai Hanyu Cidian 7 p513 Hànyǔ / p1620 yuèyǔ (but Yuè by itself is capitalized- don't know what's going on there) / p488 No entry for 广东话, but there is one for "Guǎngdōng níngméng" and "Guǎngdōng yīnyuè" / p396 Fójiào / p1273 No entry for 唐朝, but Táng by itself is capitalized
(Withdrawn) 01:32, 27 May 2019 (UTC)
(Withdrawn) 07:47, 28 May 2019 (UTC)
I find your comments in this section about "do them all" disturbing. Do you actually realise we're building a dictionary? It's not a playground. I don't think this minivote is going anywhere, anyway. --Anatoli T.(обсудить/вклад)08:24, 28 May 2019 (UTC)Reply
It's a bit extreme, but I think there is sense to it. We could avoid debate about whether chengyu are hyphenated or whether 不知道 is bu zhidao or buzhidao. And including bu zhi dao AND buzhidao might make searches easier. —Suzukaze-c◇◇08:48, 28 May 2019 (UTC)Reply
I ran into a capitalized pinyin entry today: Lai2 (linked to from 萊). Capitalized tone-number pinyin like that does look weird to me. Capitalized diacritical pinyin looks less weird. - -sche(discuss)20:11, 31 May 2014 (UTC)Reply
Capitalisation and part of speech of month names
==Historical languages==
{{wikipedia|History of the Chinese language}}
{{wikipedia|Historical Chinese phonology}}
Historical Sinitic languages include the spoken languages {{w|Middle Chinese}} (ltc) and {{w|Old Chinese}} (och), the written language {{w|Literary Chinese}} (lzh), and the protolanguage {{w|Proto-Sino-Tibetan}}. Entries for words in these languages are used, except for Proto-Sino-Tibetan, which is a protolanguage and thus in the Reconstruction namespace. These terms can also appear in etymologies for entries in modern Sinitic languages, and in entries for languages that have borrowed from Chinese, notably Japanese, Korean, and Vietnamese.
Finer distinctions are possible, such as Late Middle Chinese and Early Middle Chinese for the spoken language, and Literary Chinese versus earlier Classical Chinese for the written language. These distinctions can be made in the text of etymologies, but these do not have ISO 639 codes, and thus are not used for level 2 headings.
The precise meaning and status of these “languages” is complicated: narrowly speaking “Middle Chinese” and “Old Chinese” refer to various phonological reconstructions, notably based on rime dictionaries, and do not necessarily refer to a specific historical dialect or common language. Nevertheless, they are useful designations for historical periods.
Most modern Sinitic languages descend from Middle Chinese, with the notable exception of Min, which diverged earlier, with Proto-Min also descending from Old Chinese; see ]. A notable example of this difference is {{m|zh|茶}}: English {{m|en|tea}} comes from Min, while {{m|en|chai}} comes from other Chinese varieties.
Literary Chinese is significantly different from the spoken languages; this may be compared with Medieval Latin versus Romance languages. Literary Chinese (lzh) is the correct source language for literary terms in modern Sinitic languages, notably {{w|chengyu}} ({{w|four-character idiom}}s), and in borrowings such as the corresponding Japanese {{w|yojijukugo}}.
===Middle Chinese===
{{wikipedia|Middle Chinese}}
As Middle Chinese phonology is not attested (it is only reconstructed), please be sure to mark pronunciations with *.
===Old Chinese===
{{wikipedia|Old Chinese}}
{{wikipedia|Old Chinese phonology}}
As {{w|Old Chinese phonology}} is not attested (it is only reconstructed), please be sure to mark pronunciations with *. As sources differ, please carefully cite specific references (author and year) for any reconstructions.
Obsolete policies on cognates and stubs
==Cognates and stubs==
Across Sinitic languages, a single written form is very frequently shared across a long historical period and wide geographical area. Thus cognate entries in different languages appear on the same page; this occurs quite frequently for cognates in closely related languages in other scripts, but to nowhere near the same degree as in Sinitic languages. Due to this, it is generally unhelpful, and possibly incorrect, to create an entry for one Sinitic language simply by copying the heading and definitions for Mandarin. It is unhelpful because this adds no information beyond what a reader could guess for themselves (cognate, so probably the same meaning), and possibly incorrect because words do differ between these languages; blindly copying without a reference is not reliable.
Thus, when creating a new Sinitic entry, please try to add ''some'' information distinctive to the particular language, particularly pronunciation, references, or citations.
For etymologies, each entry should include an Etymology section indicating its immediate ancestor term. For native words in modern Sinitic languages this is either Middle Chinese (most) or Proto-Min (thence Old Chinese) for Min languages. Per usual practice (see ]), it is acceptable to include full etymologies back to Proto-Sino-Tibetan in modern entries. However, unless there is something specific to the etymology of a term in a given language, this is tedious to repeat for all modern languages. It is thus preferred (and sufficient) to only include the full history at representative languages, namely Mandarin and Min Nan (most used in each branch), with other languages just indicating the immediate predecessor and having a link reading “more at Mandarin/Min Nan”.
Similarly, it is tedious and not helpful to list contemporary cognate terms ''unless'' some particular relationship or contrast is being given. Instead, ancestral relationships can be given both backwards (in the Etymology section), to Middle Chinese, Old Chinese, and Proto-Sino-Tibetan, and forwards (in the Descendants section), from Middle Chinese, Old Chinese, and Proto-Sino-Tibetan to later forms. In these Descendants sections, listing pronunciations of descendant terms along with the spelling allows easy comparison, and avoids the duplication of the same listing in all modern forms. These are more useful than sibling relationships between cognates.
New font for Chinese?
Latest comment: 8 years ago, 12 comments, 3 people in discussion
@Justinrleung, Suzukaze-c Is it just me who feels the font for Chinese is not as pretty as Japanese? I updated my Mac and it has become even uglier. It lacks the 'weight' (is this the correct term?) in comparison. For example, 產 - even the cangjie input looks prettier than the Chinese font. Thoughts? (Disclaimer: I know nothing about fonts...) Wyang (talk) 06:25, 12 October 2016 (UTC)Reply
It looks like the browser default (and thus probably the best choice) is "PingFang SC". SimSun seems to be imposed on readers by MediaWiki:Common.css. (man, there are some questionable font choices there...) —suzukaze (t・c) 07:41, 12 October 2016 (UTC)Reply
(holy shit someone shares my hatred for code2000) It's also weird how the fonts for .Hans and .Hant are defined a second time later on. —suzukaze (t・c) 07:48, 12 October 2016 (UTC)Reply
I think it may have been me who messed it up before (羞慚). Any recommendations on what the
Ooohhh, I like this. It definitely looks better and more solid than before. If no one objects, we will change it to this until someone proposes an improvement. Wyang (talk) 07:49, 13 October 2016 (UTC)Reply
Simplified Chinese in all templates and modules
Latest comment: 8 years ago, 6 comments, 4 people in discussion
I think we should stick to the promise of providing simplified Chinese in all templates and modules. The dialectal data tables currently don't show simplified forms. Do people think we need to cater for that? I understand there will be formatting and other work involved, but simplified Chinese users shouldn't feel neglected. --Anatoli T.(обсудить/вклад)09:31, 14 October 2016 (UTC)Reply
Yeah, it is disabled for now. Displaying both made the table look very cluttered. I was thinking about developing a js switch for all Chinese entries, allowing the user to choose trad/simp in all Chinese texts (zh-l, zh-x, zh-der, zh-dial, etc.). Wyang (talk) 11:01, 14 October 2016 (UTC)Reply
But that will only work for registered users. How about we have the simplified characters display as ruby, like this: 我們, 妳們? (We might want to increase the size of the ruby.) — justin(r)leung{ (t...) | c=› }16:36, 14 October 2016 (UTC)Reply
The switch may be a dropdown underneath the ==Chinese== header, similar to how this page hides the romanisation on a click. The Ruby method is potentially good too, if we can increase the size and align them well, though making links may be more complicated. I think User:Suzukaze-c was trying to write some sort of gadget for this some time ago, but I can't find it now. Wyang (talk) 21:21, 14 October 2016 (UTC)Reply
Why not just display 我們/我们 with a suppressed romanisation? The columns may need to get wider and care should be taken to have correct conversions with the ability to override. What does everybody think? --Anatoli T.(обсудить/вклад)02:53, 15 October 2016 (UTC)Reply
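The "我們/我们" display with an override could be sketched roughly as follows. This is only a minimal illustration in Python, not how the templates or modules actually work: the names `TRAD_TO_SIMP`, `to_simplified` and `display_forms` are hypothetical, and the two-entry conversion table stands in for a full, reviewed mapping.

```python
from typing import Optional

# Tiny illustrative table only (assumption); a real converter needs a complete mapping.
TRAD_TO_SIMP = {"們": "们", "產": "产"}


def to_simplified(trad: str) -> str:
    """Convert character by character, leaving unknown characters unchanged."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in trad)


def display_forms(trad: str, simp_override: Optional[str] = None) -> str:
    """Return 'trad/simp', or just 'trad' when both forms are identical."""
    # A manual override wins over the automatic conversion.
    simp = simp_override if simp_override is not None else to_simplified(trad)
    return trad if simp == trad else f"{trad}/{simp}"


print(display_forms("我們"))
print(display_forms("乾淨", simp_override="干净"))
```

The override matters for characters whose simplified form depends on the word, such as 乾, which becomes 干 in 乾淨 but stays 乾 in 乾坤; a per-character table alone cannot decide such cases.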
Latest comment: 7 years ago, 12 comments, 6 people in discussion
Hi all. I'm thinking about overhauling the format of Chinese definitions, by using a templated approach which strictly associates word information (part of speech, synonyms, antonyms, measure words, examples, dialectal equivalents, etc.) with the individual senses. It may be along the lines of User:Wyang/zh-def. I think this is more conducive to the efficient expansion of the Chinese content with more synonyms, antonyms, ... etc. information. What does everyone think about the changes? Wyang (talk) 09:16, 23 November 2016 (UTC)Reply
Thanks guys. It is a big change, but my feeling is that this sort of sense-synonym/antonym/... integration has to be done sooner or later; there were some calls before (for example User:DTLHS/export, which was referenced in this layout), but no one has really tested doing it. The reason for the integration is that synonyms etc. are only valid on a sense-specific basis, the same as classifiers (which have already been adapted to be sense-specific) and dialectal equivalents. Moedict and Cantodict also do the same.
The code can probably be simplified, such as switching pos to argument 1, and definition to argument 2. The enclosing zh-def template may be omittable too - if we can automatically generate the <ol> ~ </ol> using some css magic. If a JavaScript gadget could be designed to allow GUI editing of the individual senses, while the raw code remains unchanged, that would be most fantastic. The increase in Lua memory usage seems quite small - I tested with the equivalent current code, which was 18.96 MB, slightly smaller than the new version (19.62 MB). A good thing about enclosing senses is that sense ids can be created and used to reference individual senses elsewhere. Bot conversion of the definitions should be reasonably straightforward too. Wyang (talk) 14:24, 23 November 2016 (UTC)Reply
I removed the need for the outer enclosing template, and integrated all the code into a single template. It looks like this:
where the different senses are separated by |-, and the effect is the same. It should be easier to use now. The memory requirement is slightly reduced in the process: 18.97 MB, nearly the same as the current format (18.96 MB). Wyang (talk) 23:18, 23 November 2016 (UTC)Reply
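The idea of separating senses with |- can be illustrated with a rough sketch in Python. It assumes the template arguments have already been split on `|`, so that `|-` arrives as an argument whose value is `-`. The parameter names `pos` and `def` and the function `split_senses` are hypothetical and are not taken from the actual template.

```python
from typing import Dict, List


def split_senses(args: List[str]) -> List[Dict[str, str]]:
    """Group key=value arguments into senses, starting a new sense at each '-'."""
    senses: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for arg in args:
        arg = arg.strip()
        if arg == "-":
            # Separator: close the current sense if it has any content.
            if current:
                senses.append(current)
            current = {}
        elif "=" in arg:
            key, value = arg.split("=", 1)
            current[key.strip()] = value.strip()
    if current:
        senses.append(current)
    return senses


# Placeholder values, not real definitions.
example = ["pos=noun", "def=first sense", "-", "pos=verb", "def=second sense"]
for number, sense in enumerate(split_senses(example), start=1):
    print(number, sense)
```

Numbering the grouped senses in order, as the loop does, is also one simple way to derive the sense ids mentioned above.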
Hi. Sorry, I got the ping but I'm a bit confused. It's a good effort but I agree that this is a radical change from the current format and, again, too different from other languages. Displaying PoS's in front of definitions (translations) is definitely worth considering. --Anatoli T.(обсудить/вклад)09:38, 24 November 2016 (UTC)Reply
Thanks Anatoli. I accidentally discovered something I wrote > 2 yrs ago on Talk:一致, and it seems my desire to change the format has been long-standing... The division of definitions by part of speech is really not ideal for analytic and less inflecting languages (努力, 保險, 可能). IMO treating synonyms, antonyms and so forth as belonging to senses is also important, as we add more and more of these see-also-type of words. At the moment 生 (shēng) looks fairly neat (albeit not as clear as if the PoS info were next to the senses), but if I add synonyms, antonyms, see-also terms as in User:Wyang/zh-def#生, the page could become quite confusing. Wyang (talk) 11:22, 24 November 2016 (UTC)Reply
As already mentioned above, these are radical changes you are proposing. Since they go to the heart of Wiktionary's layout, you'd be better off seeing if you can carry them out by getting support from other members of the community for ALL languages, not just Chinese. ---> Tooironic (talk) 12:49, 24 November 2016 (UTC)Reply
I really have no faith in the Wiktionary community in this. Haiz.
If we, as the Chinese-editing community, believe that a current practice is unfittingly designed for Chinese, we should strive to achieve what we think is most suitable. I myself only have limited power in making a difference. It's like the opposition to {{zh-pron}} formatting and the Chinese merger before; other people are unfamiliar with this, so unless we adopt what is right, we won't progress efficiently. Wyang (talk) 11:58, 25 November 2016 (UTC)Reply
Sichuanese
Latest comment: 7 years ago, 5 comments, 3 people in discussion
I'd like to add entries for Sichuanese, but it appears that it doesn't have an ISO code, so one would have to be created. I don't think it should be included under the Mandarin section for zh-pron and other places, due to the differences between the two (47.8% lexical similarity and < 60% intelligibility) and also the sheer number of potential listings that could be under Mandarin (i.e. Shandong, Shaanxi, Dongbei etc.). Maybe listing under Southwest Mandarin would be okay though. Most of the coverage would probably be on the Chengdu dialect, but I'm not sure how other dialects, some of which are quite different, would be accounted for.--Prisencolin (talk) 02:13, 10 December 2016 (UTC)Reply
It is a variety of Mandarin though - it would make more sense to group it under Mandarin and reorganise the Standard Mandarin tags accordingly. Wyang (talk) 16:44, 13 December 2016 (UTC)Reply
I'm not contesting that it's part of the Mandarin branch, I'm just concerned that at some point we might have over a dozen different entries under "Mandarin" that could appear and it might be a bit disorganized. Why don't we just create a new module for "Southwest Mandarin" and put it under that? It still has "Mandarin" in the name after all. It also has the benefit of being able to group related varieties together in specific subcategories.--Prisencolin (talk) 04:55, 15 December 2016 (UTC)Reply
'The body of this page needs to be updated to explain the new policy'
Latest comment: 7 years ago, 9 comments, 5 people in discussion
Hi, regarding the message reading 'The body of this page needs to be updated to explain the new policy.', I'd like to know when the update is going to be carried out, or at least where I can read the new policy. Thanks in advance. --Backinstadiums (talk) 15:48, 8 June 2017 (UTC)Reply
Yes, the notice can go now. I put it there after we moved to the unified Chinese L2 header but the policy described the old standards. Now it's matching what we are doing.--Anatoli T.(обсудить/вклад)22:15, 8 June 2017 (UTC)Reply
@Justinrleung I don't know if this policy is a draft. It's official - either endorsed by a vote or unchallenged by the community. The format of soft redirects wasn't endorsed, but it wasn't challenged either. There are still thousands of unconverted Mandarin and Cantonese hanzi entries, which are hard to convert for obvious reasons. Things to discuss are pinyin, jyutping and POJ entries (headers, categories and templates), Cyrillic Dungan and Arabic Xiao'erjing, and what to do with topolects that lack an established writing system and transliteration standards.--Anatoli T.(обсудить/вклад)23:47, 8 June 2017 (UTC)Reply
@Justinrleung I see your point, thanks. Yes, leave it as is. Another thing we need to do for Chinese (and any language in scriptio continua, including Vietnamese) is to define CFI. The definitions of "word" and "sum of parts" are not exactly the same in Chinese as in languages with spaces. Even German or Finnish criteria for inclusion differ from English. --Anatoli T.(обсудить/вклад)
On that note, I would like to suggest that we relax the part of CFI on personal names slightly, to allow names which are directly found in idioms and set phrases, such as
I'm still curious as to how irregular the phonology (esp. tone sandhis) is - this will determine the kind of system that would be ideal for use. Wyang (talk) 23:01, 5 July 2017 (UTC)Reply
Well, the tone sandhi is pretty complicated. I've made a lookup table for 2-word sandhi, but it's based on my accent, not the de facto "standard" accent in urban Wenzhou. Other than that, the phonology is not that irregular. Mteechan (talk) 04:38, 6 July 2017 (UTC)Reply
Latest comment: 6 years ago, 6 comments, 4 people in discussion
Wiktionary:Votes/pl-2014-04/Unified Chinese decided that words written in Chinese characters should be unified under the Chinese header. However, it also says that the formats of templates in words written in non-Han scripts devised specifically for particular topolects above are not the subject of the vote and can be discussed separately if needed.
Sinitic terms (lemma or not) written in non-Han scripts include:
Pinyin romanization of words
Jyutping romanization of characters
POJ form of words
Cyrillic Dungan
Xiao'erjing words
others, like zhuyin fuhao
There are two different headings that could be used:
Use the topolect as heading (e.g. Mandarin, Cantonese, Min Nan, Dungan)
Use Chinese as heading for all terms (like this and this)
I propose to migrate all Sinitic terms (lemma or not) to the Chinese header and eliminate any topolect header, to finish the unification of Chinese. Any thoughts? Note that this proposal only concerns the header and says nothing about categories.
--Zcreator (talk) 02:24, 4 February 2018 (UTC)Reply
Weak Oppose, since romanizations like Jyutping are made specifically for Cantonese, unlike hanzi spellings, which can be shared across dialects.
AcceleratedFormCreation.js seems to be using "Chinese" because the accelerated creation links are found under a Chinese header. —Suzukaze-c◇◇07:05, 3 May 2018 (UTC)Reply
The only difference from the current Běijīng entry would be a different L2 header (Chinese) and a linked name of the romanisation. Since Hanyu Pinyin is only used for Mandarin, it is obvious which lect the romanisation applies to. --Anatoli T.(обсудить/вклад)07:29, 3 May 2018 (UTC)Reply
My understanding is that unification of Chinese reduces duplication due to the large number of shared written forms across lects.
There is no such concern for romanizations, which are unique to a certain lect, so I think they should not use the "Chinese" header. Min Nan is Chinese, but I am not yet convinced that chai-iáⁿ#Chinese is helpful. I imagine that a "unified Chinese" plan would never have taken place if China used phonetic scripts, and there were no hanzi to "bind" lects together.
Thanks for the response. Let's see what other people think. Converting Min Nan Pe̍h-ōe-jī to Chinese L2 was looked at favourably but not everyone thinks we should have Hanyu pinyin entries in the first place. --Anatoli T.(обсудить/вклад)13:22, 3 May 2018 (UTC)Reply
Dungan Cyrillic transliteration
Latest comment: 6 years ago, 3 comments, 2 people in discussion
I also think we should review the method itself, which is not quite standard, anyway - e.g. get rid of Cyrillics in the translit and make it more meaningful. --Anatoli T.(обсудить/вклад)03:20, 11 April 2018 (UTC)Reply
By all means do please. Something similar to Xiaoxuetang would be ideal, but it would also mean a large maintenance requirement. Wyang (talk) 05:00, 25 September 2018 (UTC)Reply
Romanisations of Chinese
Latest comment: 5 years ago, 2 comments, 2 people in discussion
According to the present policy, Pinyin romanisations of monosyllables and polysyllables for Standard Mandarin (aka Putonghua), such as "yī" and "bùguò", are allowed. However, for Standard Cantonese, only Jyutping romanisations of monosyllables are allowed (e.g. jyut6, ping3), while those of polysyllables are disallowed. Why is there such unequal treatment of the two languages? I believe that Jyutping romanisations of polysyllables should be allowed and mass-created, as Pinyin romanisations of polysyllables are allowed and exist in large quantity. Jonashtand (talk) 06:34, 9 December 2018 (UTC)Reply
If it's really that confusing, then we should split up all Chinese into different dialects. Right now we are basically saying, if an entry has a Mandarin pronunciation, we can tell people about that, but if it has only dialect pronunciations, we're just going to ignore those and not let you see them.
(Withdrawn) 23:33, 4 April 2019 (UTC)
Of course we can, but it needs to be split by dialects, if they are not Mandarin. I don't know whether we have a version of {{zh-l}} for specific lects; if not, the transliterations need to be provided manually. --Anatoli T.(обсудить/вклад)23:54, 4 April 2019 (UTC)Reply
This is easier said than done. Something we need to consider is which lect to display (if we're only displaying one). Mandarin has that status of being the standard for Chinese, but there's no rule to say which one should be chosen if a term is non-Mandarin but used in many other lects. Do we want to display several lects at a time? Do we need a separate parameter to specify which lect(s) we want to display? There are just so many things to consider that editors like me haven't actually bothered to address it. — justin(r)leung{ (t...) | c=› }03:08, 5 April 2019 (UTC)Reply
I don't like your e.g. diff because, again, you are mixing dialects together without supplying the important information, eg. {{q|Cantonese}}: (Cantonese) next to the word (before or after) or under a different subheader. --Anatoli T.(обсудить/вклад)04:39, 5 April 2019 (UTC)Reply
Problem with zh-see- simplified character version 有边读边
Latest comment: 5 years ago, 5 comments, 4 people in discussion
As a temporary measure, what about making extract_gloss return "" if extracting failed (that is, if the gloss contains characters like |, =, {, etc.)? --Dine2016 (talk) 07:15, 23 April 2019 (UTC)Reply
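The suggested fallback could look roughly like the following. This is a hedged sketch in Python rather than the module's own Lua, and it does not reproduce the real extract_gloss: the helper name `gloss_or_empty` is hypothetical, and the exact set of characters treated as markup (here `|`, `=`, `{`, `}`) is an assumption based on the "etc." above.

```python
# Characters taken as a sign that the gloss still contains template markup (assumed set).
MARKUP_CHARS = set("|={}")


def gloss_or_empty(gloss: str) -> str:
    """Return the gloss unchanged, or "" if it contains wikitext markup characters."""
    if any(ch in MARKUP_CHARS for ch in gloss):
        return ""
    return gloss


print(repr(gloss_or_empty("to follow")))
print(repr(gloss_or_empty("{{lb|zh|Cantonese}} as well; also; too")))
```

Returning an empty string means the soft redirect would simply show no gloss for such entries instead of displaying broken markup.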
(Withdrawn) 22:10, 24 April 2019 (UTC)
(Withdrawn) 02:01, 26 April 2019 (UTC)
(Withdrawn) 02:05, 26 April 2019 (UTC)
(Withdrawn) 04:05, 26 April 2019 (UTC)
You need to have a more positive attitude. The template can't handle other templates inside it at the moment, which can, I'm sure, be resolved. What can YOU do to help this "nonsense dictionary"? Edits are linked like this: diff. --Anatoli T.(обсудить/вклад)04:11, 26 April 2019 (UTC)Reply
(Withdrawn) 04:32, 26 April 2019 (UTC)
This is also a technical issue, not specific to Chinese. WT:GP is the place to raise such problems, not necessarily with the Chinese editors. Wyang has left and the rest of us may not be Lua and template savvy, not on the same level, anyway. I didn't ask for full definitions to be added to {{zh-see}}, which caused the issue in the first place. We can revert to a simple soft redirect without any definitions, which will invariably cause problems. --Anatoli T.(обсудить/вклад)04:37, 26 April 2019 (UTC)Reply
Latest comment: 5 years ago, 2 comments, 2 people in discussion
(Withdrawn) 06:06, 1 May 2019 (UTC)
@Geographyinitiative: Using zh-see for POJ is still experimental, but I think they all should be converted to zh-see eventually. (I can't seem to find the relevant discussions though.) About the quotation, it should either be put on the 綴 page or in Citations:tòe. BTW, tòe does not mean "with" but "to follow". — justin(r)leung{ (t...) | c=› }07:57, 1 May 2019 (UTC)Reply
We are a dictionary too. The references give the correct pronunciations. As for capitalisation, spacing and hyphenation of the transliterations, these are up to each dictionary's own rules. Pinyin is not that important and the hyphen was used for etymological purposes in those dictionaries; it has nothing to do with the Chinese spelling or how the word is pronounced. --Anatoli T.(обсудить/вклад)08:10, 21 May 2019 (UTC)Reply
There's no such thing as "the right thing" if policies and conventions differ between different sources. Using correct Chinese is important. Pinyin is a tool, not a language.
As dictionary editors, we can and should decide on rules, make a proposal, have a vote, and write it up as a language policy. Pinyin is not a writing system and should be treated accordingly, even if we allow soft-redirect entries for pinyin.
I oppose adding all romanisations from all dictionaries. We may want to add zhuyin for Min Nan and Hakka, but we have covered almost all romanisations in use for the varieties; adding all possible variants is silly, and no dictionary does it. Every dictionary defines the romanisations it uses and sticks to them consistently. --Anatoli T.(обсудить/вклад)07:34, 26 May 2019 (UTC)Reply
Latest comment: 5 years ago, 1 comment, 1 person in discussion
I'll be concise for those knowledgeable, and refer to brief and basic bibliography for those who are not.
Chinese elasticity/flexibility is a lexical property of Chinese terms, two sides of the same coin, which must be reflected in the very same entry for a given lemma.
Therefore, for example the fifth version of the prestigious XDHYCD (Xiandai Hanyu Cidian) applies mutual annotations in the respective entries, so that the entry for 煤 mei ‘coal’ reads "noun, … also called 煤炭 mei-tan ‘coal-charcoal’", and the entry for 煤炭 meitan ‘coal-charcoal’ is annotated as "noun, 煤 mei ‘coal’".
Unfortunately, in Wiktionary this is currently reflected wrongly: in the broadly termed 'compounds' section, as a synonym or after 'see also', and only for the monosyllabic version.
Please, before commenting read the following brief article (and if necessary further references within it); if you still have any questions, I'll be glad to try and answer them.
Latest comment: 5 years ago, 2 comments, 2 people in discussion
I'd like to add information about how tone contours are orally distributed along Mandarin rhyme diphthongs. I've tried to find a graphic with variables such as time, volume of speech, pitch levels, etc. to no avail --Backinstadiums (talk) 09:07, 22 October 2019 (UTC)Reply
@Geographyinitiative: The Chinese header's not going any time soon, even if it should be. There's no quick and dirty way to change existing framework to separate lects, whatever that looks like. Sometimes it's more about usability than "correctness", whatever that may look like. — justin(r)leung{ (t...) | c=› }01:51, 17 December 2019 (UTC)Reply
(Withdrawn) 01:58, 17 December 2019 (UTC)
@Justinrleung: Thanks for pinging. Can we see some {{diff}}'s, please?
Latest comment: 4 years ago, 1 comment, 1 person in discussion
(Withdrawn) 13:07, 29 December 2019 (UTC)
(Withdrawn) 13:31, 29 December 2019 (UTC)
The translit without the “u” is not WG but another phonetic non-standard transliteration based on WG. For English and other speakers, it’s easier to make sense of eg “ko” than “kuo”, etc, hence "Komindang". Anatoli T.(обсудить/вклад)14:23, 29 December 2019 (UTC)Reply
Jyutping /-a/, /-oet/
Latest comment: 4 years ago, 2 comments, 2 people in discussion
Latest comment: 4 years ago, 32 comments, 7 people in discussion
I noticed that WT:WDL lists "Chinese" as a WDL, which seems contrary to our usual practice (as far as I can tell) of treating dictionaries as sufficient for most topolects and for classical vocabulary. Does anyone object to changing it to "Standard Mandarin", to clarify that non-Mandarin Chinese only requires one use or reliable mention for attestation? —Μετάknowledgediscuss/deeds23:22, 14 May 2020 (UTC)Reply
@Metaknowledge: Standard Mandarin should be good. That being said, I'm not sure if that would include Standard Chinese from Hong Kong (not really Cantonese proper, but usually read in Cantonese), which is sufficiently documented for sure. Written Cantonese may also be included, I think - it's pretty robustly attested. Other topolects would not qualify for WDL status as of now AFAICT. — justin(r)leung{ (t...) | c=› }00:17, 15 May 2020 (UTC)Reply
(edit conflict) @Justinrleung: Hi. Do we really have enough attested material in Written Cantonese? I am surprised. It's always been talked about as a mostly spoken lect with comics and other informal writings occasionally using it. I may have been out of touch with the latest developments, though. --Anatoli T.(обсудить/вклад)01:20, 15 May 2020 (UTC)Reply
@Atitarev: I mean there are always newspaper/magazine articles (usually tabloids) that may mix in Cantonese or use Cantonese entirely. Would that be enough for it to be considered WDL? — justin(r)leung{ (t...) | c=› }01:26, 15 May 2020 (UTC)Reply
@Suzukaze-c: Thanks. Yes, if the spoken media is considered good for CFI, then yes, but we are a written dictionary, so if a word is written in MSM (in movie or news subtitles) but pronounced in Cantonese, how do you reconcile that with what we are doing here? --Anatoli T.(обсудить/вклад)01:40, 15 May 2020 (UTC)Reply
@Justinrleung: Re: your question - I don't know if they are enough. In the past, from what I read and heard discussions about, it wasn't. The Cantonese version of diglossia makes it difficult to separate what is standard and written, since when it's written and is standard, then it's MSM, not Cantonese. --Anatoli T.(обсудить/вклад)01:43, 15 May 2020 (UTC)Reply
Yes, I object because this policy favors Mandarin as the standard and disregards the presence of other languages, and also it makes it easier to create dialectal entries using only one mention which will introduce many errors to this site. Can I know why non-Mandarin Chinese has been omitted? How about Mandarin read using Cantonese? Will that be considered part of Standard Mandarin? I want to see other Chinese languages such as Cantonese treated the same as Mandarin. All languages equal, not one superior over the rest. Iambluemon (talk) 01:14, 15 May 2020 (UTC)Reply
Where do you see favouring Mandarin? They are talking about attestations. Standard Mandarin is easier to attest than other forms, since Chinese speakers write much less in the other varieties. Chinese contributors have put a lot of effort into actually improving the coverage of other Chinese varieties, not the other way around. --Anatoli T.(обсудить/вклад)01:20, 15 May 2020 (UTC)Reply
Non-MSM lects are often not "well-documented". I think it's fairly straightforward. Keeping stricter regulations will make it harder for non-MSM content to be on the site, which would lead to a fairly unequal picture on Wiktionary where MSM would dominate even more than it already does. —Suzukaze-c◇◇01:24, 15 May 2020 (UTC)Reply
I support adding Cantonese as a WDL. If you put Standard Mandarin as the only "well-documented" variety, it creates the impression that Mandarin is the dominant variety, that Standard Mandarin is Chinese, and that other Chinese varieties are inferior to Mandarin. User:Iambluemon02:56, 15 May 2020 (UTC)Reply
Iambluemon, it looks like you misunderstood my intention. This policy would favour the non-Mandarin languages by giving them more lenient coverage. You claim it will "introduce many errors", which seems like a straw man argument — I challenge you to find even a single unambiguous error in an otherwise reliable source that would be entered into Wiktionary as a result. —Μετάknowledgediscuss/deeds01:54, 15 May 2020 (UTC)Reply
One example of error is the diglossia situation in Cantonese. If you read a Mandarin article using Cantonese, it doesn't mean that every Mandarin word is automatically transformed into a Cantonese word. We can also read Classical Chinese poems using Mandarin, Cantonese, Hokkien, etc., but it doesn't mean that all those words automatically become Mandarin, Cantonese, Hokkien. No, they are just readings, not actual words, not dictionary material. We need stricter criteria; don't have editors copy and paste individual character readings into compound words. Use material from spoken Cantonese (not MSM) that is available in written form. User:Iambluemon02:54, 15 May 2020 (UTC)Reply
@Iambluemon: Please don't make assumptions, you have made a few today. We know all that. We don't include Cantonese words simply because there is a word in Mandarin. I'm talking about a common situation when a newsreader speaks Cantonese while their teleprompters and subtitles on the screen are in standard Chinese (automatically converting what they say into the correct Cantonese). The written and pronounced words will mismatch and written words will not be added as Cantonese, if they don't have Cantonese readings and they are used. --Anatoli T.(обсудить/вклад)04:03, 15 May 2020 (UTC)Reply
While MSC/MSM words are not Cantonese in a strict sense, they can be considered Cantonese in a broader sense. Cantonese is the main spoken language of instruction of Chinese classes in Hong Kong, so MSC/MSM texts are always read in Cantonese. Hong Kongers also write MSM texts that are meant to be read in Cantonese (although they can also be read in Mandarin). — justin(r)leung{ (t...) | c=› }03:53, 15 May 2020 (UTC)Reply
@Justinrleung: Consensus among the editing community (i.e. this discussion) suffices. That said, it's a bit unclear to me whether there's consensus regarding the status of Cantonese as a WDL, and we have to decide these together. (I don't know enough about the quantity of Cantonese material that is easily searchable and meets CFI to have an opinion.) —Μετάknowledgediscuss/deeds04:07, 15 May 2020 (UTC)Reply
I forgot to say this, but back in 2012, when Cantonese and Hokkien had their own headers, they were treated as well-documented languages. Why downgrade their status now in 2020? User:Iambluemon03:00, 15 May 2020 (UTC)Reply
Add Teochew to the list of Chinese dialects that have a growing amount of writing. And there's currently quite a fair bit of Teochew media that is produced in China. I'd probably consider it the third most well-documented dialect of Chinese after Cantonese and Hokkien. The dog2 (talk) 01:55, 24 May 2020 (UTC)Reply
This isn't a competition. We're essentially talking about languages with strong publishing industries, or else which have such an absurd amount of other durably archived media that it makes up for a lack of written material. —Μετάknowledgediscuss/deeds02:08, 24 May 2020 (UTC)Reply
AFAICT, Teochew is nothing close to Hokkien when it comes to the amount of material out there (whether written or spoken). After some thought, although written vernacular Cantonese has a good amount of publication, it cannot compare to the amount of publication in Standard Mandarin / Standard Written Chinese. Thus, the only variety we should consider as a WDL should be Standard Mandarin / Standard Written Chinese. — justin(r)leung{ (t...) | c=› }03:54, 24 May 2020 (UTC)Reply
For sure there is a lot less Teochew than Hokkien material, but it's still one of the better documented dialects of Chinese. And yes, none of the dialects even come close to Mandarin when it comes to written material. I've been to Hong Kong and Macau and even there, written material is mostly in standard Mandarin. The dog2 (talk) 04:26, 24 May 2020 (UTC)Reply
@The dog2: Teochew is irrelevant to this discussion. I don't know if you fully understand the implications of this discussion. If Teochew were to be listed as a WDL, a lot of Teochew entries would have to go for lack of attestation per WT:ATTEST. No one is doubting the existence of Teochew material, which is quite a lot compared to other dialects like Changsha Xiang, just as an example, but it's just not eligible for consideration as a well-documented language. — justin(r)leung{ (t...) | c=› }05:08, 24 May 2020 (UTC)Reply
┌────────────────────────────────────────────────────────────────────────────────────────────────────┘OK, Teochew certainly cannot be considered well-documented. Hokkien and Cantonese are somewhat on the fence, but I'd lean towards not considering then well-documented. The dog2 (talk) 05:18, 24 May 2020 (UTC)Reply
This is better, make it more specific (Standard Written Chinese) rather than changing Chinese to Mandarin which is bad suggestion. Does Standard Written Chinese also include literary/formal Cantonese, the type of Cantonese used in official functions and ceremonies that is based on written Mandarin? User talk:iambluemon08:42, 8 June 2020 (UTC)Reply
Latest comment: 4 years ago, 25 comments, 6 people in discussion
Unified Chinese allows us to document the non-Mandarin languages faster: all you need to do is add the pronunciation, and the meaning if different from Mandarin. But this has the disadvantage that only the difference from Mandarin is documented. For example, the 寫字樓 entry currently has the following definitions:
Now it is clear that sense 2 is Cantonese only, but is sense 1 Mandarin only or both Mandarin and Cantonese? And does the absence of the "Hokkien" label mean "this sense is not in Hokkien" or "the editors have not yet considered that language"?
One way to solve the problem is to build {{zh-dial}} data. For example, although 都 has the definitions "all; both" and "(Cantonese) as well; also; too", the {{zh-dial}} tells us that the first sense is also in Cantonese. But this may not be feasible for the smaller entries. Another way is to add examples, but this is not intuitive (直觀) for the reader. It is much better to build the senses separately and in full for each language, while retaining the common pronunciation base. (A common pronunciation base is necessary or there will be no place for dialectal readings of Mandarin.) This can be achieved by the following entry layout:
==Chinese==
{{zh-forms|...}}
===Pronunciation===
{{zh-pron
|m=...
|c=...
...
}}
===Definitions===
...
----
==Cantonese==
===Pronunciation===
{{yue-pron}} // transcludes the pronunciation in {{zh-pron}}, showing only |c=
===Definitions===
...
(But this requires more typing, and part of the advantage of Unified Chinese is lost.)
What do you think of this layout? I think it's better to adopt it incrementally, focusing on words with varying meanings across languages first. Most terms such as 粵語 surely don't need splitting unless they get lots of examples in a variety of languages.
Yes, I am very happy to read this suggestion. I want to be able to differentiate between Chinese words that are used in all Chinese varieties and Chinese words that are limited to certain dialects, and this is a very good solution to the problem. It only involves extra typing. I definitely support this proposal. User talk:iambluemon 08:55, 8 June 2020 (UTC)Reply
I'm not a big fan of either proposal. I can see many edge cases where the first option would be problematic. In colloquial Cantonese, 奶奶 refers to "one's husband's mother" or "madam", but in the written language, Cantonese speakers may use this word to refer to "paternal grandmother" as well and it would be read in Cantonese (even though it would not be used in actual speech). I also can't imagine the mess it would be if we adopt the first option for long single character entries with multiple etymologies/pronunciations. The second option is essentially reverting back to disunified Chinese with a redundant Chinese section, which is even messier than before. — justin(r)leung{ (t...) | c=› }09:09, 8 June 2020 (UTC)Reply
I like the first option. If it is messy for long single-character entries, then we only apply this format to compound entries. If there is different usage in spoken and colloquial Cantonese, we can use usage notes to explain the difference. Maybe the second option is not so nice. Iambluemon (talk) 09:17, 8 June 2020 (UTC)Reply
Another problem with the first option is that even usage within the dialects of Mandarin or any other major grouping of dialects would have variation. Just look at 阿婆 or 阿公. Are we gonna split it up to every single dialect possible? I don't see how our status quo is much different from other languages like English, where there are lexical differences between different dialects of English. — justin(r)leung{ (t...) | c=› }09:32, 8 June 2020 (UTC)Reply
I don't mind usage differences within dialects of Mandarin, or variations within the same dialect group. At least people will be more careful when adding definitions. Right now people just copy and paste pronunciation without bothering whether it is literary or colloquial or which dialect group the word belongs to. Iambluemon (talk) 10:34, 8 June 2020 (UTC)Reply
Yet another problem with the first option is that we have language varieties in L3 headers. This would automatically push PoS to L4 headers (and when we have more than one etymology, they'd get pushed to L5). Also, if definitions are separated by topolect groups, I don't see why pronunciations need to be grouped together. — justin(r)leung{ (t...) | c=› }10:44, 8 June 2020 (UTC)Reply
奶奶 meaning "paternal grandmother" in literary Cantonese: is it a borrowing from Mandarin? Does it occur only in Mandarin contexts (奶奶的……) or does it also occur in Cantonese contexts (奶奶嘅……)? I think we need only cover the colloquial language in ==Cantonese==, ==Hokkien==, etc. The literary language based on Mandarin is already covered under ==Chinese==; the additional L2 headers are for those who want to study the colloquial language without Mandarin influence.
Messiness: I agree that the first option doesn't look good (which is why I reverted this proposal initially). Under the second option, you can still focus on Unified Chinese if you want to. The additional language headers are for those interested in the individual languages (like @Geographyinitiative and me), and they don't need to come with glyph origins, etymologies, etc. The main motivation for having additional language headers is because the current Unified Chinese format doesn't treat the individual languages well: is the "office building" sense of 寫字樓 also in Cantonese? If not, should I add "(Mandarin)" or "(not Cantonese)" if I don't want to research the other languages? etc.
How to split under the first option: Splitting by the so-called 一級方言 (Mandarin, Cantonese, Gan, etc.) should be enough. The current 阿公 entry looks fine and doesn't need splitting. But if it gets dozens of examples in a variety of dialects it might be useful to split by language.
PoS headers pushed to L4 under the first option: Wyang thinks that PoS headers should be abolished for Chinese, and they're currently replaced by the dummy ===Definitions=== header for single-character entries. The L3 language headers were intended to take that place. If this is not possible, one can use the following format instead:
if definitions are separated by topolect groups, I don't see why pronunciations need to be grouped together: terms like 我們 may have more pronunciations than definitions.
The current Unified Chinese format is Modern Standard Written Chinese oriented. If we don't solve the problem with additional language headers, what about an extended version of User:Wyang/zh-def that displays a little matrix showing which senses apply to which languages? The cells of the matrix can be simple yeses and noes or they could contain labels like "dialectal" (as in "dialectal Mandarin") or "morpheme" (襪 is a morpheme in Mandarin but a noun in Cantonese). --Nyarukoseijin (talk) 12:17, 10 June 2020 (UTC)Reply
@Nyarukoseijin: 奶奶 is probably not a good example because I personally wouldn't use it in writing for "paternal grandmother". It's possible to see it in Chinese textbooks/books (taught in Cantonese) in Hong Kong though. It's definitely unlikely for people to say 奶奶嘅 to mean "paternal grandmother's". So under your proposals (especially the second option), does that mean we'll have Chinese as Standard Written Chinese and Mandarin, Cantonese, etc. as covering only the colloquial versions? This is very difficult to determine, as the dialects are in a continuum from colloquial to formal (often closer to Standard Written Chinese). Take the word 太陽 as an example. In Hong Kong, 太陽 is quite commonly used in everyday speech and has kind of replaced the more colloquial words (熱頭 or 日頭), but in Taiwanese Hokkien, it seems to be restricted to literary/poetic registers (like in a song). Would we have to split 太陽, and if so, what's the most appropriate way of doing so?
About 寫字樓, in Hong Kong, it's usually the "office" sense, but the "office building" sense is also possible (at least according to some Cantonese dictionaries). The usual implication of not labelling is that it's totally fine at least in Standard Written Chinese across regions. Whatever additional lects listed in {{zh-pron}} would be okay to use the words (to varying extents). Of course, we need to do a better job at labelling and writing usage notes so that we have a representation of all the lects that is as accurate as possible.
Back to your proposals. They seem to allow several formats to coexist (no splitting for 粵語 but splitting for 寫字樓 maybe). How do we decide which format to use? There are always edge cases that would be hard to define. As for the third option you just proposed, I don't think abolishing PoS across the board is the way to go. Chinese may be more "flexible" with PoS due to the lack of overt morphology, but that doesn't mean we should abandon PoS, especially for entries with more than one character. And where would we put literary Chinese (文言文) under your proposals?
About 一級方言, we would definitely need to define these properly. Do we group Min as just Min, or do we follow Ethnologue and split it into Min Nan (which includes Hainanese, Leizhou Min, and maybe Zhongshan Min), Min Dong, Puxian Min, Min Bei (which includes Min Bei proper and Shaojiang Min) and Min Zhong? Do we group Pinghua under Cantonese? Do we group Shehua with Hakka? What do we do about varieties that Ethnologue doesn't seem to deal with (Xiangnan Tuhua, Shaozhou Tuhua)?
奶奶: If a text has Cantonese pronunciation but Mandarin vocabulary and grammar, I think it should be subsumed under the ==Chinese== header. The Taiwanese Min Nan and Hakka dictionaries by the Ministry of Education ROC have only 阿媽/阿妈 (a-má) and 阿婆 (â-phò), not 奶奶.
Which words to split: My proposal isn't about splitting. It's about building separate dictionaries for the Chinese languages alongside a dictionary of the Chinese macrolanguage. This means that ==Chinese== still has the fullest coverage, just at a higher level. It's similar to the English dictionary market where we have both the Oxford English Dictionary and the Middle English Dictionary. You don't have to build ==Cantonese== if you don't want to.
PoS: I didn't say abolish PoS. Abolish PoS headers and use labels instead. Single-character entries need labels anyway.
Literary Chinese: ==Literary Chinese== of course. The primary meaning of 走 in Literary Chinese is different from that in Modern Chinese, but the current ==Chinese== entry doesn't mention it. --Nyarukoseijin (talk) 08:49, 17 June 2020 (UTC)Reply
@Nyarukoseijin: Thanks for clarifying. So would it be fair to say that what you're proposing is close to how Arabic is treated as of now, i.e. Chinese is reserved for Modern Standard Chinese, and language headers for other varieties should coexist with the Chinese header? For example, would 太陽 be formatted like this: a Chinese header with "sun" and "sunshine" (and "greater yang"?), a Mandarin header with "sun", "sunshine" and "temple" (labelled as SW Mandarin), a Cantonese header with "sun", "sunshine" and "temple" (labelled as Guangxi), a Gan header with "sun", "sunshine" and "temple", a Jin header with "sun" and "sunshine", a Min Bei header with "temple", a Min Nan header with "sun" (labelled as formal/literary) and "temple" (labelled as Leizhou) and a Literary Chinese header with "sun" and "greater yang"? You said 阿公 doesn't need to be split unless we have examples in many dialects. I don't think this is a good criterion for splitting. Entry layout should not revolve around examples. Essentially, we need to refine the proposal by specifying the scope that each of "Chinese", "Mandarin", "Cantonese", "Gan", "Hakka", "Jin", "Min Bei", "Min Dong", "Min Nan", "Min Zhong", "Puxian Min", "Wu", "Xiang" and "Literary Chinese" covers if this is to be put through a formal vote. But of course, I still see the current layout as better (and would like to see Arabic follow suit). — justin(r)leung{ (t...) | c=› }16:42, 17 June 2020 (UTC)Reply
I shouldn't have used the word "splitting". It's not about splitting. It's about adding individual languages, not subtracting anything from ==Chinese==. The individual languages would probably be independent from ==Chinese==, like the Dictionary of Old English, the Middle English Dictionary, etc. from the Oxford English Dictionary. This means that (1) ==Chinese== won't be restricted to MSC due to the additional headers, just as the OED isn't restricted to Modern English due to the other dictionaries. (2) Additional headers don't have to be built all at once; they can have their own paces. (3) You don't have to work on them if you don't want to. Building ==Chinese== and {{zh-dial}} content is still the most efficient way to document those languages (and, in an age of language suppression, more ethical). The additional headers are for info which ==Chinese== doesn't handle well (e.g. 'run' being the primary sense of 走 in Literary Chinese), and are completely opt-in. --Nyarukoseijin (talk) 17:58, 17 June 2020 (UTC)Reply
This doesn't sound to me like a fully fledged idea. The OED covers both modern and Middle English (but not Old English), because they don't care that they overlap considerably with the Middle English Dictionary — they're different enterprises, and they have no interest in being consistent. At Wiktionary, we want to make one dictionary for all languages, and being consistent isn't optional — it's necessary. You're saying that if I want information about Cantonese, then sometimes I should look under a 'Chinese' header and sometimes I should look under a 'Cantonese' header, and there will be no way to predict which. That is antithetical to how Wiktionary is organised, and it makes it less usable for both humans and machines. —Μετάknowledgediscuss/deeds18:13, 17 June 2020 (UTC)Reply
I totally agree with @Metaknowledge. We're not an anthology of dictionaries, but one dictionary with many languages. There would be significant overlap if we start allowing Chinese in addition to other language headers unless we define Chinese as something different from what it is now. — justin(r)leung{ (t...) | c=› }21:31, 17 June 2020 (UTC)Reply
@Nyarukoseijin: Hello. You haven't presented a case where the CURRENT structure doesn't work. I don't understand why you want to change something that's not broken (other than in one of our troll's mind). If there are senses (or PoS), which are only specific to Mandarin (not applicable to Cantonese, etc.), they can be marked/labelled so. The current definitions of e.g. 走 are good. If you want to specifically say that the sense is only applicable to Mandarin (or maybe a bunch of other varieties, they can be labelled so), e.g. {{lb|zh|Mandarin|Jin|...}} ... --Anatoli T.(обсудить/вклад)03:32, 18 June 2020 (UTC)Reply
One of the disadvantages of adding labels is that it's all-or-nothing. If you add "(Mandarin)" to the "walk" sense of 走, it would suggest absence in Cantonese, Gan, etc. So you have to research all the languages and dialects at once. Also there are no filters that allow me to see only Cantonese senses and examples/quotations. --Nyarukoseijin (talk) 05:25, 18 June 2020 (UTC)Reply
@Nyarukoseijin: It is a fair point, thank you very much, but it's not a show stopper. If the number of such cases was really large, then the unified approach wouldn't work. The differences between lects are of interest to contributors and generally addressed as a matter of priority. The differences often show in very frequent but common words. The higher (more formal) the level of writing, the fewer differences you find (very true for Arabic varieties as well). There is no clear line there, anywhere, as dialects borrow from each other. One statement is generally true: the formal written variety of Mandarin is generally applicable to other Chinese lects. No, you don't have to know or research the usage in other lects. Editors either add what they know or what they find in dictionaries. --Anatoli T.(обсудить/вклад)07:07, 18 June 2020 (UTC)Reply
Even if Unified Chinese is undone, it has still done more good than bad. When several groups of people are under persecution, saving them at once is more ethical than saving them separately if it allows more people to be saved. And that's what happened: the Mandarin nouns : Cantonese nouns : Wu nouns ratio has changed from 20,467 : 317 : 10 to 95,620 : 73,447 : 6,193. My proposal is about experimenting with new ways to document the languages, not abandoning the good old way. --Nyarukoseijin (talk) 06:17, 18 June 2020 (UTC)Reply
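To put the counts quoted above in perspective, the per-variety growth can be worked out directly. This is only a small arithmetic sketch in Python; the figures are the ones given in the comment, while the variable and function names are made up for illustration.

```python
# Noun counts quoted in the comment above (before and after Unified Chinese).
before = {"Mandarin": 20467, "Cantonese": 317, "Wu": 10}
after = {"Mandarin": 95620, "Cantonese": 73447, "Wu": 6193}


def growth_factors(old, new):
    """Return how many times larger each count became (new / old)."""
    return {lect: new[lect] / old[lect] for lect in old}


factors = growth_factors(before, after)
for lect, factor in factors.items():
    # One decimal place is enough to compare the varieties.
    print(f"{lect}: x{factor:.1f}")
```

On these numbers Mandarin grew by a factor of roughly 5, while Cantonese and Wu grew by factors in the hundreds, which is the asymmetry the comment is pointing at.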
I wonder if a system where contributors can confirm/deny that X word is used in Y lect (or specify: it is literary, etc.), and the result would be calculated to produce {{lb}} is feasible. —Suzukaze-c (talk) 06:31, 18 June 2020 (UTC)Reply
I agree. We should always try to improve and fine-tune what we have in place. If someone from the UK labels an English word as British, another editor can add Oz or NZ labels if the same usage applies to Australia or New Zealand. --Anatoli T.(обсудить/вклад)07:07, 18 June 2020 (UTC)Reply
A note from last year that I found today by chance: no: "used in Hokkien" // yes: "Cantonese: no; Hokkien: yes; Teochew: unknown; ..." —Suzukaze-c (talk) 05:11, 20 June 2020 (UTC)Reply
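The three-valued note above (yes / no / unknown per lect) can be sketched as a small data structure. This is a minimal Python illustration only: `Status`, `summarize` and `confirmed_lects` are hypothetical names, not an existing Wiktionary module or template, and the sample data simply mirrors the example in the note.

```python
from enum import Enum


class Status(Enum):
    """Whether a sense is attested in a lect; UNKNOWN means nobody has checked yet."""
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


def summarize(sense_status, lects):
    """Spell out the status for every lect, defaulting to UNKNOWN when unrecorded."""
    return "; ".join(
        f"{lect}: {sense_status.get(lect, Status.UNKNOWN).value}" for lect in lects
    )


def confirmed_lects(sense_status):
    """Lects that could safely be turned into a label (only explicit YES entries)."""
    return [lect for lect, status in sense_status.items() if status is Status.YES]


# Hypothetical record for one sense; Teochew is deliberately left unrecorded.
LECTS = ["Cantonese", "Hokkien", "Teochew"]
status = {"Cantonese": Status.NO, "Hokkien": Status.YES}

print(summarize(status, LECTS))
print(confirmed_lects(status))
```

The point of the separate `UNKNOWN` value is that a missing label no longer has to be read as "not used in this lect", which is the ambiguity raised earlier in the thread.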
Language treatment: Only the macrolanguage is treated as a language?
Latest comment: 4 years ago, 7 comments, 4 people in discussion
Wiktionary:Language treatment says that for Chinese: "Only the macrolanguage is treated as a language". Are we sure about this? Does this mean other varieties are not treated as languages in Wiktionary? I think this contradicts current practice. In an example such as Hsi-ning (https://en.wiktionary.org/w/index.php?title=Hsi-ning&type=revision&diff=59443856&oldid=59443845), the language code for "Mandarin" is preferred over the language code for "Chinese".
The Unified Chinese vote is about treating Chinese varieties under a single header and using the "zh" language code. Does it abolish other Chinese varieties or disallow their language codes? Can someone explain? User talk:iambluemon 09:00, 8 June 2020 (UTC)
@Iambluemon: "Treat as a language" in that context refers to how entries are made. The current practice is that we don't have Cantonese, Mandarin, Xiang, etc. entries, but Chinese entries with Mandarin, Cantonese and/or Xiang subsumed under it. (There are exceptions to this, but I digress.) This does not "abolish other varieties" as there are no other varieties if all Chinese varieties are treated as Chinese. This also has nothing to do with how lects are treated in etymologies. There are many "etymology-only" languages/lects/varieties. — justin(r)leung{ (t...) | c=› }09:25, 8 June 2020 (UTC)Reply
The page for Wiktionary:Language treatment mentions that it is to "document cases where Wiktionary's treatment of lects deviates from that of the ISO/SIL". It doesn't mention that language treatment is in the context of how entries are made. And isn't there a header for Min Nan entry in Wiktionary based on Latin POJ? Maybe the description can be more specific, such as "only the macrolanguage is treated as a language for lects written in Han script". Iambluemon (talk) 11:16, 8 June 2020 (UTC)Reply
(Withdrawn) 01:49, 9 June 2020 (UTC)
Please stop your baseless accusations about any ideologies deeply ingrained at Wiktionary. If anything is missing or is incorrect in a Chinese dialect, it means nobody has added it yet. The use of one L2 "Chinese" for all Chinese varieties only brought positive development to the varieties, otherwise miserably neglected. All the promises to provide a separate treatment for each individual word in any given Chinese lect have been kept. You can define not only pronunciations but usage, part of speech, which are specific to Cantonese, Min Nan, etc. Wikipedia, which you quote so much as superior to Wiktionary, uses zh-min-nan language code for Min Nan, we are fairer, we just use "nan". We provide all readings, Min Nan Wikipedia only uses POJ.
Another thing Geographyinitiative fails to notice is that the 中文 (= Chinese) version of Wikipedia is written in Written Standard Chinese, based on Mandarin. This isn't much different from us in treating Mandarin-based Written Standard Chinese as de facto Chinese. — justin(r)leung{ (t...) | c=› }02:45, 9 June 2020 (UTC)Reply
One big disadvantage of Wikipedia is that their Hokkien version is written in the Latin alphabet, when by far the most common way of writing Hokkien is with Chinese characters. I think this makes their Hokkien version far less accessible to the average Hokkien speaker than it could be. The dog2 (talk) 03:52, 9 June 2020 (UTC)Reply
Yes, I mentioned that above. It fits the agenda of those who prefer the separate treatment. At Wiktionary we provide both the Chinese characters and the romanisation (POJ for Min Nan). The infrastructure is there for editors. Editors are free to focus only on the terms written in Chinese or POJ. Editing the Chinese characters only doesn't exclude the romanisation from being used (providing the manual or automated transliterations). --Anatoli T.(обсудить/вклад)04:08, 9 June 2020 (UTC)Reply
Latest comment: 4 years ago, 4 comments, 2 people in discussion
@Justinrleung Hey, I remember somewhere you talked about the possibility of an in-house romanization for the various Mandarin dialects. I was thinking that it's possible to adapt Sichuanese pinyin as a romanization for most of the Mandarin dialects available in 現代漢語方言大詞典. Maybe the only "complications" are checked tones and number of tones. For checked tones, an "-h" final can be added (like for Nanjing and Yangzhou), and for number of tones, most have 4, but a few have 3 (no problem I guess), and some southern ones have 5 (because they have checked tones). What do you think? --Mar vin kaiser (talk) 13:51, 24 June 2020 (UTC)Reply
@Mar vin kaiser: We can definitely look into it, but we'll have to look at them one by one and check other sources. We should start with one representative from each major grouping:
Northeastern: Harbin
Jilu: Jinan
Jiaoliao: Muping
Central Plains: Luoyang, Wanrong, Xi'an, Xining, Xuzhou
Harbin should be pretty straightforward - we could just use pinyin for it. For the other groupings, let's look at Jinan, Muping, Ürümqi and Nanjing for now. We already have coverage of Central Plains with Dungan and Southwestern with Chengdu, so we can probably worry about those later. What do you think? — justin(r)leung{ (t...) | c=› }21:31, 24 June 2020 (UTC)Reply
@Justinrleung: Yeah, that looks good for Nanjing. And maybe it can be used for Yangzhou also; their phonologies look almost identical. For Southwestern Mandarin, I was just looking into it, and it looks like Sichuanese Pinyin can be used with Guiyang and Wuhan, except maybe it doesn't have "l". I'm gonna look into Jinan next. --Mar vin kaiser (talk) 10:47, 26 June 2020 (UTC)Reply
Allowing Jyutping polysyllabic entries as non-lemmas
Latest comment: 4 years ago, 1 comment, 1 person in discussion
I started a discussion in Beer Parlour in April but no one has responded, so I post it here again:
Under the current policy, Jyutping transliterations for Cantonese are only allowed for monosyllables, such as zoeng1, but not polysyllables; while Pinyin transliterations for Mandarin are allowed for both, as in zhāng and jǐnzhāng. I propose that Jyutping should be given the equal status as Pinyin that polysyllables be allowed as non-lemma entries, since Jyutping has acquired the status as the standard phonetic transliteration for Cantonese in Hong Kong, considering that:
Latest comment: 4 years ago, 6 comments, 1 person in discussion
(Withdrawn) 00:33, 16 September 2020 (UTC)
@Geographyinitiative: In general usage, it seems to be considered a variant of 璇. In the case of names, basically many variant characters may pop up, so it's not surprising that it's written as 刘璿 rather than 刘璇 (although we should always take Wikipedia with a grain of salt). — justin(r)leung{ (t...) | c=› }00:55, 16 September 2020 (UTC)Reply
(Withdrawn) 01:02, 16 September 2020 (UTC)
@Geographyinitiative: I'll have to look at what other sources say. My gut feeling is that 璇 shouldn't be listed as a simplified form of 璿, but we should have the definition say "alternative form of 璇". It's probably even better to collapse it as {{zh-forms}} if it's an exact equivalent (which is what I need to check) so that everything is centralized at 璇. — justin(r)leung{ (t...) | c=› }01:05, 16 September 2020 (UTC)Reply
One thing that I guess can be done is to have a special table for mass words (uncountable words) that can be expanded right next to the definition. The dog2 (talk) 22:20, 28 September 2020 (UTC)Reply
@Suzukaze-c: I guess that's one way to do it. Another issue is that we could have a lot of arbitrary ones, like 克, 磅, 盒, 箱, 杯, etc., based on how we group things. What stops us from adding these to {{zh-mw}}? (In other words, should we have constraints on what we allow in the template?) — justin(r)leung{ (t...) | c=› }23:11, 28 September 2020 (UTC)Reply
@Tooironic: 水 is one. We list 瓶; 滴; 池; 盆; 杯 - but these don't refer to the same amounts of water, and these are not intrinsic to 水 (since water isn't really a count noun). We can list even more, like 滴, 鍋, 口, etc., depending on how much water we're referring to. It's not quite good that we list them without explanation. Also, most physical objects could be "classified by" 箱 because we can technically put any physical object in a box. — justin(r)leung{ (t...) | c=› }05:24, 29 September 2020 (UTC)Reply
I see. I have thought about this before. What you say makes sense. Maybe we could vote to only include classifiers for words/senses that are actually countable? ---> Tooironic (talk) 05:51, 29 September 2020 (UTC)Reply
Latest comment: 4 years ago, 3 comments, 3 people in discussion
Currently, there's an unwritten rule that {{zh-forms}} and images should be placed right under the Chinese header. However, this leads to two issues.
For entries with long pronunciations, the images are no longer visible once the user scrolls down to the definition. As Wiktionary keeps on expanding, all Chinese entries will have long pronunciation sections as they are filled out. For instance, see 鹿.
{{zh-forms}} breaks up the Etymology section making it harder to read. 蝴蝶 is a good example of this issue.
Proposal 1
{{zh-forms}} and links to Wikipedia should be placed at the end of the Etymology section.
Images should be placed directly under the part of speech section, e.g. under Noun
Proposal 2
{{zh-forms}}, links to Wikipedia, and Images should be placed under the part of speech section, e.g. under Noun
The position of {{zh-forms}} shouldn't change. Images can be placed wherever it's logical. That would mean they could either be with definitions or right under {{zh-forms}} (and {{zh-wp}}, if it's there). They should not be placed under pronunciation because they are not part of the pronunciation. — justin(r)leung{ (t...) | c=› }06:46, 3 November 2020 (UTC)Reply
In the case of 小, it is not the same across all dialects. Many southern dialects use 細. 大 seems to be the same across all dialects, but if you know any dialects that use a different word, go ahead and create the table. Ditto for 水. The dog2 (talk) 17:24, 12 November 2020 (UTC)Reply
I do think it'd be useful to have these even if there's no dialectal difference at all (although if we look hard enough, there may be some somewhere; maybe not in the synchronic data, but in the diachronic data). — justin(r)leung{ (t...) | c=› }17:28, 12 November 2020 (UTC)Reply
@Suzukaze-c: Obviously, I can't speak for all dialects because I don't speak all of them, but yes, for the latter, all the dialects I know use 大. For 小, that is not the case, because Cantonese, Hokkien and Teochew all use 細. I'm not sure having a dialectal module is necessary if it's the same word across all dialects, though I'm not vehemently opposed to having one either. The dog2 (talk) 17:35, 12 November 2020 (UTC)Reply
Latest comment: 3 years ago, 9 comments, 2 people in discussion
@Frigoris, Suzukaze-c, and others; while editing under the Etymology sections of specific Sino-Japanese readings, I have somewhat intuitively reconstructed some Middle Chinese pronunciations for characters that have none in their modules, using patterns in the reconstructed Old Chinese pronunciations. Here are the characters that I have edited with the notes: 樣 (*jɨɐŋH), 婿 (*seiH)
Does anyone know any others that have Old Chinese reconstructions but no known Middle Chinese attestation? There should be a category for that. Thanks, ~ POKéTalker(═◉═) 21:24, 23 February 2021 (UTC)Reply
Slap in the face... last time I edited the shinjitai (new character form in Japan) 様(yō) with the etymology section there was probably no MC pronunciation for 樣--how long was that...
There are some Chinese characters that have a reconstructed Old Chinese pronunciation but no Middle Chinese most likely due to lack of information added to their proper modules in here. Any recommended websites with such information? ~ POKéTalker(═◉═) 21:36, 23 February 2021 (UTC)Reply
@Poketalker: There should theoretically not be such cases because the OC reconstructions should always have MC reflexes. There are two issues: (1) Zhengzhang sometimes reconstructs anachronistically without evidence from early (pre-Han) texts and perhaps reconstructs OC for "late" words, and (2) Guangyun uses different variants than the modern-day standard. In both cases you mentioned, it was the case that Guangyun used a different variant (㨾 and 壻) instead; in this case, we would simply have to move the module or copy the module over to the modern-day standard. The issue of a "late" word not found in Guangyun can't be solved because there probably isn't any "standard" way to reconstruct MC in those cases. — justin(r)leung{ (t...) | c=› }21:47, 23 February 2021 (UTC)Reply
@Poketalker: For 淘, I'm not sure what the basis of Zhengzhang's reconstruction is. It might be based on Jiyun, but I can't seem to find this word in Guangyun. For 瑪瑙, it was written as 碼碯, so I've created the modules for 瑪 and 瑙 based on 碼 and 碯. — justin(r)leung{ (t...) | c=› }17:05, 27 February 2021 (UTC)Reply
Many newspapers published in Mainland China (such as Guangming Daily, 中华读书报 and 文摘报) and many TV programs in Mainland China are available online for free.
Latest comment: 1 year ago, 2 comments, 2 people in discussion
(Withdrawn) 16:32, 6 April 2022 (UTC)
Standard Mandarin is much more widely spoken and studied than any other variety of Chinese, so it makes sense that it should appear first in the list of pronunciations. This is part of a broader principle that also applies to other languages: it makes sense to give more focus to the "standard" variety than other varieties. This is not because of "a special status in the realm of linguistics", but rather a practical consideration to help our readers. —Granger (talk·contribs) 18:50, 6 April 2022 (UTC)Reply
(Withdrawn) 19:38, 8 May 2022 (UTC)
My bad if this discussion was taken elsewhere. From here it looks like @Geographyinitiative's concern wasn't addressed.
Russian is "much more widely spoken and studied" (in @Granger's words) than Bulgarian and may have more contributors, but the inequality — if we can call it that — ends there. And so it is for any number of language pairs. How does it "help the users of Wiktionary" that Cantonese is in effect treated as a dialect of Mandarin? Is there a non-circular justification for this? 釆 (talk) 01:45, 21 December 2022 (UTC)Reply
Why are many Chinese words that are from Japanese wasei-kangos not treated as wasei-kangos?
Latest comment: 1 year ago, 4 comments, 3 people in discussion
In Wiktionary, only part of the Chinese words that are from Japanese wasei-kangos are added with template "wasei kango" in their section "Etymology" (such as "電話", "進化", "宗教", etc.), while others are not. In fact, words like "階級", "社會", "文明", "主義", "獨裁", etc. are also wasei-kangos. Why are they not treated as wasei-kangos in this website? --NasalCavityRespiratory (talk) 09:44, 10 April 2022 (UTC)Reply
@Fish bowl: So the templates in many entries have been removed because their etymologies have not been verified? Then how do they make sure that words like "電話" and "進化" are verified to be wasei-kangos? (My native language is not English and I am a new user. Please forgive me.) —NasalCavityRespiratory (talk) 10:35, 10 April 2022 (UTC)
Latest comment: 2 years ago, 2 comments, 2 people in discussion
Does the requirement to have a Traditional Chinese form as the lemma still hold for words that exist only in Simplified Chinese form and don't have a Traditional Chinese form (in which case the Traditional-Chinese-as-lemma requirement would force us to invent a Traditional Chinese form out of whole cloth)? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 20:27, 15 April 2022 (UTC)
Latest comment: 2 years ago, 1 comment, 1 person in discussion
Currently the synonym, compounds, derived terms, etc. sections contain a large number of links to other entries, which are often accompanied by the pinyin of the linked entry. (This includes pinyin automatically created by usage of {{zh-l}}, {{zh-m}}, etc., as well as pinyin in plain wikitext or that used in {{zh-der}}.) There are several issues caused by this:
It belittles and ignores the non-Mandarin languages, which do not use pinyin. Note that the pronunciation/romanisation/transliteration of these is only listed on the entry pages and (rarely) in example sentences, but not under these sections (there may be some, but they are so rare that I haven't found one yet), even when the link is specified to be limited to that language (e.g. in 吹水#synonyms).
When generated automatically, it assumes that the entry has a Mandarin pronunciation, even when the entry is not (commonly) used in Mandarin (e.g. 靚#Compounds under Etymology 2).
Editors sometimes do not check the correctness of the automatically generated pinyin, especially when there is a large amount of it in these sections.
This creates clutter, inflates the page size considerably, and uses relatively expensive functions to derive pinyin from the provided words/characters
This causes inconsistencies where only some links have pinyin (e.g. see 口/derived terms).
The pinyin can be confused with the language names and/or glosses, since both are in the exact same font style (e.g. in 吹水#Synonyms)
The readers can already check the (often-more-detailed) pronunciation tables via the links, so I believe that removing them will not cause considerable usability issues. -- Wpi31 (talk) 14:52, 4 July 2022 (UTC)
CJKV Character list by Ideographic Description Characters
Latest comment: 2 years ago, 1 comment, 1 person in discussion
"(This includes names derived at an older stage of the language.)"
Latest comment: 2 years ago, 1 comment, 1 person in discussion
(Withdrawn) 20:13, 3 October 2022 (UTC)
The parenthetical appears to be boilerplate, also used in categories such as Category:English surnames from Spanish. It presumably applies to Chinese too; surely there are, for instance, English surnames derived from Cantonese back in the 19th century when Cantonese phonemically distinguished place of articulation for sibilants. Not everything is a political conspiracy. —Granger (talk·contribs) 20:29, 3 October 2022 (UTC)
"Topolects" vs. "Unified Chinese"
Latest comment: 11 months ago, 9 comments, 5 people in discussion
Recently, I began to use "Min Nan" as the header for several Min Nan entries, based on the existence of the following cases:
Min Nan lemmas of Japanese origin, e.g. 歐兜邁 should be more appropriately expressed in Pe̍h-ōe-jī as o͘-tó͘-bái.
Min Nan lemmas with an uncertain etymon and with diverse choices of Han characters (as semantic or phonological loans), e.g. siâⁿ written as 唌, 饗, 眩, 城, 邪, 炫, etc.
Min Nan lemmas with an uncertain etymon and without clearly widespread use of Han characters, e.g. phián in phián-thô͘ (to scratch on the ground; to dig soil and turn it over).
Min Nan should not be used as a heading for Chinese character entries, even if the word is exclusively used in Min Nan.
I don't completely object to this practice; I even support the usage of {{zh-see|xx|poj}} to direct Pe̍h-ōe-jī entries to Han character entries. What I object to is the attempt to "unify Chinese" to the extreme and suppress the subjectivity of various topolects in the meantime.
I understand some people want to merge all entries of Chinese languages for the sake of simplicity or even unity. But it is well-known that languages themselves do not possess inherent simplicity, let alone unity. While the identification, naming and classification of languages are considered objective and scientific, the unification of languages should be considered subjective and arbitrary, and often with a political purpose.
Regardless of the motivation for unifying the language, its impact is too severe to ignore. If one attempts to "unify" a language, some varieties will definitely gain prestige and some will be marginalized, as blatantly pointed out and preached by the rationale in Wiktionary:Votes/pl-2014-04/Unified_Chinese. However, the fact that 99% of Mandarin lemmas are cross-topolectal is a consequence of Mandarin having long dominated the written corpus of Chinese languages and the national language policies of many Chinese-speaking countries. If the practice of "unifying Chinese" in Wiktionary continues to be implemented while "Chinese" entries are always centered on Modern Standard Mandarin, then the diversity of Chinese languages will only diminish.
Taking a stance to preserve the diversity of language, not to eliminate it, I recommend encouraging any demonstration of the subjectivity of a topolect as a language. After all, since we allow Okinawan and Yonaguni lemmas written in Kana (see 大和 for example), what justifies prohibiting Hakka written in Han characters from having its own lemmas?
Based on the above reasons, I would like to propose some improvements to the current practice:
About heading
If a lemma belongs exclusively to one topolect listed in ISO 639-3, no matter whether it is written in Han characters or in any allowed romanization, it should be placed under an L2 header specifying that topolect on its own. The "exclusive usage" should always be attested, of course. For example, 𢯭手 should be placed under the header ==Hakka==, and 鬥跤手 under ==Min Nan==.
If a lemma is used in two or more topolects listed in ISO 639-3, then it can be placed under the header ==Chinese==, in the sense that Chinese is a macrolanguage to which the lemma belongs. For example, 幫忙 (used in Mandarin, Cantonese, etc.) and 鬥相共 (used in Min Nan and Zhao'an Hakka) can be placed under the header ==Chinese==, as currently presented.
Using ISO 639-3 as a criterion is advisory, not mandatory. More nuanced distinctions are also welcome.
The templates for specific topolects should be accordingly updated.
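The heading rule proposed above can be sketched in a few lines of Python. This is only an illustration of the proposal, not existing Wiktionary code: the function name `choose_l2_header`, the `HEADER_BY_CODE` table, and the fallback to "Chinese" for unlisted codes are all assumptions made for the example.

```python
from typing import Iterable

# Hypothetical table from ISO 639-3 codes to L2 header names. Only the
# topolects named in this discussion are listed.
HEADER_BY_CODE = {
    "cmn": "Mandarin",
    "yue": "Cantonese",
    "nan": "Min Nan",
    "hak": "Hakka",
}


def choose_l2_header(attested_codes: Iterable[str]) -> str:
    """Return the L2 header the proposal would assign to a lemma.

    attested_codes: ISO 639-3 codes of the topolects in which the lemma
    is attested.
    """
    codes = set(attested_codes)
    if not codes:
        raise ValueError("at least one attested topolect is required")
    if len(codes) == 1:
        (code,) = codes
        # Assumption: a code missing from the table falls back to "Chinese".
        return HEADER_BY_CODE.get(code, "Chinese")
    # Attested in two or more topolects: keep the macrolanguage header.
    return "Chinese"


print(choose_l2_header(["hak"]))          # 𢯭手
print(choose_l2_header(["nan"]))          # 鬥跤手
print(choose_l2_header(["cmn", "yue"]))   # 幫忙
print(choose_l2_header(["nan", "hak"]))   # 鬥相共
```

The sketch also shows the maintenance cost raised by the opposing comments: once a second code is attested for a lemma, the result flips from a topolect header to ==Chinese==.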
About category
Topolect-specific thematic categories (e.g. Category:nan:Technology) should be allowed for all topolects.
Template:zh-pron should be modified such that a term is automatically categorized into multiple topolects when the corresponding pronunciations are given.
About the Han character variants
Template:zh-forms, Template:zh-see and their respective modules should be modified so that the variants of Chinese characters in different Chinese topolects can be processed and categorized. For example, the recent addition of the available value trc in Template:zh-see aims to indicate the Taiwanese Southern Min Recommended Characters. Such specification only appears in Min Nan terms. An expedient approach would be adding a parameter to specify the language code and to display the name of the topolect.
Creating modules or glosses specific to each topolect is also feasible.
About etymology
When a Han character is proven to be a phonological or semantic loan in a certain topolect and therefore has its own variant forms, it is better to handle the etymology separately. Multiple L3 headers labelled ===Pronunciation x=== or ===Etymology x=== can be used. See 絚 for an example.
Simply put, if a term is cross-topolectal, it can be treated as a "Chinese" term, taking into account the topolectal varieties. If a term is specific to just one topolect, it is treated independently.
I am not trying to overturn the decision made in Wiktionary:Votes/pl-2014-04/Unified_Chinese. My suggestions are certainly not perfect and require more discussion. I just hope to inspire everyone to take the variety of Chinese language more seriously.
Oppose. The current practice based on the vote is to have all topolect terms under ==Chinese==, including a large number of Cantonese- and Min Nan-specific terms. {{zh-pron}} takes care of categorisations, which won't include Mandarin, etc. if only |mn= (Min Nan) was specified. Besides, we have the {{lb|zh|Min Nan}} labelling technique to make it even more specific. ==Min Nan== L2 headers are only used for POJ soft redirects. The rationale on the vote includes an example of a Cantonese-specific word, which is never used outside Cantonese. Anatoli T. (обсудить/вклад) 23:31, 6 December 2023 (UTC)
Oppose. It leads to confusion in formatting, giving editors an additional hurdle in learning entry creation/formatting. When a term is later found to be used not only in Min Nan but also in other varieties, there would also be several places where changes need to be made, leading to a higher probability of malformed entries; these are usually small changes that only affect categorization, which makes them hard to detect. I really warn you against implementing any of these ideas you mention above until consensus has been reached. — justin(r)leung { (t...) | c=› } 00:37, 7 December 2023 (UTC)
One area where we could do better if we are to continue with the unified Chinese approach is with categorization with {{C}}. We should probably have both {{C|zh|X|Y|Z}} and {{C|nan|X|Y|Z}} when a term is used in Min Nan. — justin(r)leung { (t...) | c=› } 00:41, 7 December 2023 (UTC)
I just re-read the proposals above, and there are a few other things I would support. The points under pronunciation are definitely something we should pursue; I don't think there would be anyone opposed to having support for additional varieties. Hailu is already one of the things we're planning to implement in the future.
Partial oppose. I share basically the same views with Justin. While I am certainly in opposition to the "let's dump everything under Chinese" approach, I don't think the suggestions regarding "heading" would be feasible, and perhaps even worse than the existing approach.
On top of that I think that the points under "Han character variants" are symptoms of the problems of the templates {{zh-see}} and {{zh-forms}} - I also find them rather problematic, but I disagree with the suggestions; they should instead be rewritten/replaced with better templates. – wpi (talk) 03:00, 7 December 2023 (UTC)
Oppose. Given that you only make reference to Southern Min and Neo-Hakka, I would like to fill you in that (at the very least) a lot of ISO 639-3 groups are questionable at best, and some, like wuu, would just end up with the same problem we started with. Splitting headers, as wpi and justin have already said, would cause a big mess. While your ideas for zh-pron improvement are interesting, at the current point in time I don't think any form of what you have proposed would be ergonomic or even feasible in Northern Wu, due to how varied and complex the tone sandhi systems are. — 義順 (talk) 08:08, 7 December 2023 (UTC)
too many label aliases?
Latest comment: 7 months ago, 5 comments, 3 people in discussion
(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho Hi everyone. I just implemented functionality to display all the labels in all languages that categorize into a given category. See Category:Taiwanese Hokkien for an example. As can be seen in this category, we have a ton of aliases that produce the Taiwanese Hokkien category. Do we really need all of these aliases? It makes bot work a pain to have to account for all of them and it seems needless. What do people think of cutting down the number to something more reasonable? E.g. do we really need a Taiwanese Hokkien and Hakka label (with 20 or so aliases) at all? Why not just put Taiwanese Hokkien and Hakka separately? Benwing2 (talk) 04:09, 17 March 2024 (UTC)
Please also decide on labels. I can see the former "Min Nan" label changing back and forth within a couple of days(?), and it is still inconsistent across translations (new, recent, old) and Chinese entries. I am confused between "Southern Min" and "Hokkien" for the "nan-hbl" language code.
@Atitarev Yup, my script to fix up translation tables now sorts things alphabetically in nested translations as well as at the top level. Formerly it didn't change the order of nested translations but that's been fixed. In the former state I ran it on all translation tables but I haven't rerun it universally since the fix, only on pages where I renamed "Min Nan" to Hokkien. If you see any translation tables with Southern Min or Min Nan in them, please let me know; I suspect they've been added recently (i.e. after the Mar 1 dump I used to find translation tables with Min Nan translations). Benwing2 (talk) 04:20, 17 March 2024 (UTC)
@Benwing2 Just on the "Taiwanese Hokkien and Hakka" label, I think it's supposed to be used instead of having "Taiwanese Hokkien" and "Taiwanese Hakka" separately. I agree it's silly, though. Theknightwho (talk) 04:43, 17 March 2024 (UTC)
@Theknightwho Yeah I suppose it was added to avoid a bit of redundancy with separate labels Taiwanese Hokkien and Taiwanese Hakka displaying the word "Taiwanese" twice. But that seems hardly enough reason to have the label and if this is really an issue, we can add a capability in Module:labels to compress adjacent labels of certain sorts in certain ways. Benwing2 (talk) 04:47, 17 March 2024 (UTC)
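One way the "compress adjacent labels" idea could work is sketched below in Python. This is not how Module:labels is written (that module is Lua, and its interface is not shown in this discussion); the function name `compress_adjacent_labels` and the rule of merging only pairs that share a first word are assumptions made for illustration.

```python
from typing import List


def compress_adjacent_labels(labels: List[str]) -> List[str]:
    """Merge neighbouring two-part labels that share their first word.

    Example: "Taiwanese Hokkien" followed by "Taiwanese Hakka" becomes
    "Taiwanese Hokkien and Hakka". Only pairs are merged in this sketch.
    """
    result: List[str] = []
    for label in labels:
        if result:
            prev_prefix, _, prev_rest = result[-1].partition(" ")
            prefix, _, rest = label.partition(" ")
            already_merged = " and " in result[-1]
            if prev_rest and rest and prefix == prev_prefix and not already_merged:
                # Drop the repeated first word from the second label.
                result[-1] = f"{prev_prefix} {prev_rest} and {rest}"
                continue
        result.append(label)
    return result


print(compress_adjacent_labels(["Taiwanese Hokkien", "Taiwanese Hakka"]))
print(compress_adjacent_labels(["Cantonese", "Taiwanese Hokkien"]))
```

With a display-time step like this, the two plain labels could be stored separately and the combined label with its aliases would no longer be needed, which is the trade-off being discussed above.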
Shuangfeng = Loudi?
Latest comment: 3 months ago, 2 comments, 2 people in discussion
Let's categorize single-character entries by radical
Latest comment: 3 months ago, 2 comments, 2 people in discussion
This seems like something that must have been thought of and dismissed before, but here goes...
I would like to propose a set of categories that would be added to entries for all hanzi characters covered by the radical and strokes system.
The top category would be called something like "Han characters by radical", and the others would be called "Han characters with radical 一", etc.
This would cover the same ground as Appendix:Chinese radical and its subpages, but I see it as a complement to them rather than a replacement: one would be able to switch back and forth between them depending on personal preference and convenience.
Now for the technical part: all it would require to implement this would be to modify {{Han char}} to generate the categories with sortkeys, and to add code for the categories to the appropriate category modules so {{auto cat}} would know what to do with them.
As for the sort keys: I have created a non-Lua template, {{1chn}}, that will convert any number from 0 to 50 to a single character using Unicode enclosed characters.
      +0  +1  +2  +3  +4  +5  +6  +7  +8  +9
 0    ⓪  ①  ②  ③  ④  ⑤  ⑥  ⑦  ⑧  ⑨
10    ⑩  ⑪  ⑫  ⑬  ⑭  ⑮  ⑯  ⑰  ⑱  ⑲
20    ⑳  ㉑  ㉒  ㉓  ㉔  ㉕  ㉖  ㉗  ㉘  ㉙
30    ㉚  ㉛  ㉜  ㉝  ㉞  ㉟  ㊱  ㊲  ㊳  ㊴
40    ㊵  ㊶  ㊷  ㊸  ㊹  ㊺  ㊻  ㊼  ㊽  ㊾
50    ㊿
This would allow section headers for stroke counts that would be sorted in the correct order, since MediaWiki categories use the first character of the sortkey for those. It would be a simple matter of generating sortkeys from the |as= parameter and the character itself. The only peculiarity is that the Unicode characters for the digits 0-9 are different in appearance from those of the higher numbers. There may also be issues of font coverage for those blocks.
I'm assuming the technical changes will be relatively minor, but I might as well ping @Theknightwho in case I've overlooked something about the backend. I don't know enough Lua to do the module work myself, so I want to be sure I'm not asking too much, and I don't want to add too much system overhead to already overloaded pages. Chuck Entz (talk) 01:12, 27 July 2024 (UTC)
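The number-to-character conversion and sortkey described above can be sketched in Python as follows. This is an illustration only, not the code of {{1chn}} or {{Han char}}: the names `enclosed_number` and `radical_sortkey` are hypothetical, and the additional stroke count is simply passed in by the caller (standing in for the |as= parameter).

```python
def enclosed_number(n: int) -> str:
    """Return the single Unicode circled-number character for 0..50."""
    if n == 0:
        return "\u24ea"                  # ⓪ CIRCLED DIGIT ZERO
    if 1 <= n <= 20:
        return chr(0x2460 + n - 1)       # ① .. ⑳
    if 21 <= n <= 35:
        return chr(0x3251 + n - 21)      # ㉑ .. ㉟
    if 36 <= n <= 50:
        return chr(0x32B1 + n - 36)      # ㊱ .. ㊿
    raise ValueError(f"no single enclosed character for {n}")


def radical_sortkey(additional_strokes: int, character: str) -> str:
    """Build a sortkey: circled additional stroke count, then the character."""
    return enclosed_number(additional_strokes) + character


print(radical_sortkey(3, "字"))
print([enclosed_number(n) for n in (0, 9, 10, 20, 21, 50)])
```

The characters come from three separate Unicode ranges, and in raw code-point order ⓪ (U+24EA) falls after ⑳ (U+2473). Whether the section headers end up in the intended order therefore depends on the collation the wiki uses for categories, which this sketch does not check.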
@Chuck Entz I agree with categorising by radical. In terms of the sortkey, I agree that it's a good idea to just use the additional stroke count, since otherwise everything would be in the same section.
There are some potential complications, as certain characters have different stroke counts depending on the language (or jurisdiction): e.g. 着 has 11 strokes in mainland China and 12 everywhere else, while 漢 has 14 strokes in China, Taiwan and Korea, but 13 in Japan etc. Theknightwho (talk) 01:27, 27 July 2024 (UTC)