As far as I know, no one has expressed an opinion on the edits you tried to retract. I accidentally blocked the IP that added them (presumably you), but I thought I was blocking another IP when I did it. I unblocked the first IP as soon as I discovered my error. My apologies for any misunderstanding. Chuck Entz (talk) 21:12, 18 January 2013 (UTC)Reply
I'm sure there are lots of people from other countries who disagree with the way we drive on the right side of the road in the US, but I've never heard of any of them driving on the left side just to make a point- it wouldn't change anything important, but it sure would make a mess of things... Chuck Entz (talk) 11:29, 22 January 2013 (UTC)Reply
You are using an inappropriate analogy. Driving on which side of the road is unimportant because the purpose of driving is to reach a destination, and the options are basically equally efficient in this respect. But here the right side of the road is full of bumps and hollows and may not even lead to the desired destination, whilst the left side is well-paved and -targeted. I know this has been raised a zillion times but whenever the issue was raised, the decision-making coterie has been reluctant to realise the benefits and opposed it fiercely. Yes the issue is complicated, but asking people to comply with whatever rules the previous people came up with but not the most logical rules is not the way to resolve this. This issue is going to be raised another zillion times and people need to examine the issue impartially and accept that a change in format is hugely beneficial to further editing. Wyang (talk) 10:38, 24 January 2013 (UTC)Reply
zh-n. is not an intuitive template, and traditionally any Wiktionary templates that use mouseover to show more information, like certain Persian conjugational templates (if my memory serves me well), have an obvious explanatory line at the top. —Μετάknowledgediscuss/deeds02:16, 25 January 2013 (UTC)Reply
điều bí mật just means "secret things"; điều emphasises that "bí mật", which has both adjective and noun senses, is used here as a noun. The same goes for sự, which is used if the following word has both verb and noun senses. Wyang (talk) 10:39, 2 February 2013 (UTC)
The deal with words such as điều or sự is to display them, anyway. One can use alt=điều bí mật, e.g. điều bí mật (changing now). Oh, please don't be too sensitive on us using "cmn" instead of "zh". Your language hasn't been destroyed. We cover dialects as well, especially "yue" and "nan". We had long discussions and votes, so we had to compromise. If you add "cmn" in the Babel, users will know that you speak Chinese, especially its standard and most known form - Mandarin. --Anatoli(обсудить/вклад)11:43, 3 February 2013 (UTC)Reply
I have replaced 中文 with 普通話/國語 and 普通话/国语 in language user templates, even though it doesn't cover all names for Mandarin, these templates separate Mandarin speakers from 粵語/粤语 speakers, etc. --Anatoli(обсудить/вклад)00:54, 4 February 2013 (UTC)Reply
I speak non-Mandarin Chinese natively (and I don't really know which branch of Chinese this dialect falls under under ISO 639), and can communicate well (4) in Modern Standard Chinese (MSC). I can't find templates to accurately describe this situation. {{User zh}} doesn't exist any more, and {{User cmn-4}} is confusing MSC (no code) with Mandarin (a group of Chinese dialects, code: cmn). Wyang (talk) 01:41, 4 February 2013 (UTC)Reply
The branches ("dialects") are somewhat determinable by region... I'm sure you could figure out what it's called in English by the ISO if you look it up on Wikipedia. If you feel comfortable telling me where you grew up, I could help finding possibilities for you to check. I think grouping MSC with actual Beijing region dialects is probably the best way to solve that particular subproblem, just based on coverage and similarity. —Μετάknowledgediscuss/deeds01:47, 4 February 2013 (UTC)Reply
We don't have templates for all dialects. That may be a problem, but small dialects are usually not in big demand. You can add your dialect to your user page so people can find you. Here we merge Mandarin and MSC, treating entries/translations in standard Mandarin and Northern Chinese as one language, using {{qualifier}} to mark words as regional, and other means. Everything is solvable; you can even create templates for your dialect, but discuss the technical details first. --Anatoli (обсудить/вклад) 01:52, 4 February 2013 (UTC)
Which dialect is unimportant, because that is too specific. It doesn't make sense that {{User zh}} doesn't exist, while {{User ar}} or {{User ms}} do. All are macrolanguages. Arabic or Malay speakers are likely to perceive themselves as speaking Arabic or Malay (not some variety that has an ISO code) when encountering non-speakers, the same way that Chinese speakers do. Wyang (talk) 02:37, 4 February 2013 (UTC)Reply
Standard or most common Arabic (including certain colloquialisms common to various dialects, and loanwords) is "ar". For dialects we have "ary", "arz", etc. Arabic wasn't heavily discussed; we didn't have battles and multiple votes about it. As I said, a while ago we reached a compromise for translations:
* Chinese:
*: Cantonese:
*: Mandarin:
*: Min Nan:
etc.
Having "Chinese" as the main header for entries was rejected by some of your compatriots, Taiwanese people and others. "Mandarin" is more specific than "Chinese". When one says "an Arabic word", no one immediately questions which variety; "a Chinese word" raises questions like "Mandarin or Cantonese?". If we used zh instead of cmn, yue and nan, and "Chinese" instead of "Mandarin", "Cantonese" and "Min Nan", we would have a mix-up. In any case, things are the way they are; if you want to open this can of worms, you can start a discussion in the Beer parlour. I personally don't want any change, and other Chinese editors (native or learners) have got used to the status quo. --Anatoli (обсудить/вклад) 03:03, 4 February 2013 (UTC)
Or,
* Chinese:
*: Cantonese:
*: Classical Chinese:
*: Gan:
*: Hakka:
*: Huizhou:
*: Jinyu:
*: Mandarin:
*: Middle Chinese:
*: Min Bei:
*: Min Dong:
*: Min Nan:
*: Min Zhong:
*: Old Chinese:
*: Pu Xian:
*: Xiang:
*: Wu:
From experience I know that very large switch statements like the one in that template are very slow. I hope you'll take that into account and not use this template often, or always substitute it. You can also try an alternative approach, by using subpages, one for each character, in the same way as {{langrev}}. That would be faster I think, especially when there are many options. —CodeCat03:44, 8 February 2013 (UTC)Reply
နေ
First off, many thanks for your various ZH and KO term additions! (I don't suppose you have any more detail about 아귀(agwi) etym 3, like first appearance or quotes or anything?)
I read above that “equating "Mandarin" with (or using it to denote) "Standard Chinese" or "Written vernacular Chinese" is just outright wrong.” However, the EN WP article on Standard Chinese says right in the first line that MSC == Mandarin, leaving me confused. I ask purely out of ignorance -- I studied some 普通话 for a couple semesters in university, but most of my time is taken up with Japanese. Given my meager understanding of the wide varieties of Chinese, I'm left wondering what MSC as a spoken lect would equate to, if not Mandarin? I thought Mandarin was the proper English label for 普通话, and I thought too that 普通话 was the same thing as MSC, but perhaps I'm way off the mark? Does "Mandarin", as you understand it, mean the Beijing dialects more specifically? Curious, -- Eiríkr Útlendi │ Tala við mig07:46, 17 February 2013 (UTC)Reply
Thanks. For agwi, I could only find quotations and dialectal forms (agu, akku), not Middle Korean forms. The addition of the obsolete sense of "mouth" by KYPark seems reasonable but needs checking though (may be dialectal instead; I only know of agari). A possible etymological connection between these is interesting: ag- (, ağız) is the common Altaic word for "mouth", "surviving" (from an Altaicist's POV) in Modern Korean as agari (derogatory: "mouth").
Wrt MSC, "Mandarin" is the name for a group of Chinese dialects, while MSC is a standardised variety of Chinese. There is no "Standard Mandarin" really, as MSC and written vernacular Chinese (the standardised written form of MSC) serve as de facto standards for spoken (in PRC, ROC, sg) and written (all Chinese-speaking regions) Chinese. Wyang (talk) 01:10, 18 February 2013 (UTC)Reply
Thanks for both answers. Interesting about ag-; I note also that Turkish ağız purportedly derives from Proto-Turkic *āgıŕ, and that final "r" appears to have echoes in KO agari and JA anguri (“agape, gawping”). That said, JA anguri looks like it might ultimately derive from verb aku, “to open”. That might still be traceable to Altaic “mouth” words, but it seems to get tenuous, unless Altaic also has words of similar sound that have to do with “opening”. I note that KO 열다(yeolda) doesn't seem to include any such ak or ag elements, though I suppose this might be the result of some phonetic change from an earlier form. That said, it looks like Old Turkic had aç- (“to open”), from Proto-Turkic*aç-, *ač-(“to open”), which is interestingly close to JA root ak “to open”. -- Eiríkr Útlendi │ Tala við mig01:24, 18 February 2013 (UTC)Reply
Sorry, Wyang, but I'm sure your answer to the second question is biased. In the Western world, "Mandarin" (language) stands for two things: 1) the most common Chinese dialect (or group of dialects) - 官话 (Guānhuà), 北方话 (Běifānghuà) and 2) the standard Chinese (Putonghua, Guoyu, Huayu) - 普通话 (Pǔtōnghuà), 国语 (Guóyǔ), 华语 (Huáyǔ). It's just a reality. People study Mandarin at universities. Even though "standard Chinese" would be a more correct term, it's seldom used, even in academic circles. Dictionary names still use just "Chinese", e.g. Chinese-English dictionary. --Anatoli (обсудить/вклад) 01:27, 18 February 2013 (UTC)
Yes, but the majority of people who call it that probably think of Chinese as a simple dichotomy between Mandarin and Cantonese.. Wyang (talk) 02:57, 18 February 2013 (UTC)Reply
There may be some who do, but dictionaries only describe the language as it is used. I can attest that Mandarin classes, where people are especially aware of what Mandarin actually is, still use either Mandarin or Chinese to refer to the standard Chinese language they study, even when they study standard Chinese. There are too many names and too many language codes. The current practice is not based on a lack of knowledge or on confusion but on a compromise. We use the "Mandarin" header even if we talk about Northern Chinese dialects (not a standard Chinese term), like 啥, etc. --Anatoli (обсудить/вклад) 03:33, 18 February 2013 (UTC)
I believe it is an inappropriate and inefficient compromise, as the 15 or so headings for Chinese will largely turn out to be reduplications of each other eventually. Wyang (talk) 03:43, 18 February 2013 (UTC)Reply
You probably mean a different issue now. Words that are 95-99% identical in dialects but are split into Mandarin, Cantonese, etc.? There are not many editors eager to develop dialects. Min Nan and Cantonese are a slight exception. I don't think you'll have luck persuading the community to merge them into one language but if you stay longer, you may get a case. --Anatoli(обсудить/вклад)03:51, 18 February 2013 (UTC)Reply
That's what I meant. Using "Mandarin" to denote something that should be more appropriately labelled "Chinese" only seems fine now because there are currently practically no additions in other varieties, but it will appear increasingly less appropriate as the category of "Mandarin" starts to become saturated and other varieties grow. Wyang (talk) 04:02, 18 February 2013 (UTC)
Well, the community warmed up over time to merging Serbo-Croatian varieties, and Romanian and Moldavian; merging Indonesian and Malay wasn't successful, and Hindi and Urdu was never attempted. You can always try and raise it again at the Beer parlour. What are you suggesting? Having a ==Chinese== header and listing all dialect pronunciations? --Anatoli (обсудить/вклад) 04:12, 18 February 2013 (UTC)
FWIW, I agree as well, not least as there is simply so much overlap between the various Chinese languages/dialects. It just seems wrong to have the headword and etym duplicated so many times over one single page. And most defs, too, at that.
Hmmm. 'Chinese' is linguistically inaccurate and likely to cause a godawful mess. On the bright side, we already have a godawful mess that is arguably worse. The pronunciation section could also be solved by means of Lua if all topolects go straight from romanisation to IPA without a hitch (Lua will hopefully also remove the need for overly complex templates like py-to-ipa and grc-ipa-rows). —Μετάknowledgediscuss/deeds05:02, 18 February 2013 (UTC)Reply
@Wyang. All depends how strong the case is, how you present it. You need to know the moods of other Chinese editors and your possible opponents - their arguments. The arguments will need to be addressed. I'd hate to set up votes myself, since my only vote on banning entries like "Planck常数" in Chinese failed (Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries), even with a compromised solution to having them as soft redirects. I would probably support your idea in the vote. --Anatoli(обсудить/вклад)05:04, 18 February 2013 (UTC)Reply
Well, I'll try to explain more clearly. This approach, or to be exact an approach very similar to this approach is, what they use at zh.wikt, right? It works at zh.wikt because Chinese is something that everybody there knows and that enough people are willing to upkeep. Around here, the merge wouldn't be pretty. Sure, Hakka and Wu will go without a fight. But Cantonese entries, for example, will sometimes diverge from other languages or have a different level of detail and merging those will often take a human, not to mention that I'm already assuming that somebody is running a bot to do all the easy parts and to import data from zh.wikt, most likely. There are a lot of characters and shared words. So, if you're volunteering to write and run a bot, to sift through entries and to edit your way through massive categories, I still might not support, but I wouldn't oppose. The problem is that if we don't have someone, we could get a mess at least as bad as this one, especially if our format got frozen half-and-half Chinese-based/topolect-based or something horrible like that. —Μετάknowledgediscuss/deeds16:04, 18 February 2013 (UTC)Reply
Is the m/s actually a part of the reconstruction, or does it indicate alternative reconstructions? Normally, alternative forms get their own page. —CodeCat23:47, 23 February 2013 (UTC)Reply
*m- and *s- were the prefixes reconstructable that could be added to the stem/root to form derivatives. So the whole etymon (treated as a single unit, a word family which would contain multiple allofams) is written *m/s-. Wyang (talk) 23:57, 23 February 2013 (UTC)Reply
Doesn't that technically mean that no single form is actually reconstructable for Proto-Sino-Tibetan proper? If this word has two different prefixes that cannot be cognates, then it seems to me that this word didn't actually exist in PST but was only formed later, and that one branch used s- while the other used m-. —CodeCat00:16, 24 February 2013 (UTC)Reply
It is possible for differently prefixed forms to exist in a language simultaneously, with the derived words having divergent or largely identical meanings. For example: ཉལ་བ. Wyang (talk) 00:21, 24 February 2013 (UTC)Reply
As you requested on the talk page, I have just added an etymology to that word (plus examples, derived terms, and a picture). --Pereru (talk) 09:51, 26 February 2013 (UTC)
The more I work with Lua, the more I realise how useful it is. It allows you to remove templates and parameters that aren't actually necessary because Lua is able to split strings and look at individual characters. A Lua function that converts, say, Hangul to IPA could be written in only a few lines of code. —CodeCat00:09, 27 February 2013 (UTC)Reply
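As a rough illustration of the kind of character-level work described here, the sketch below splits a precomposed Hangul syllable into its initial, vowel and final indices using the standard Unicode arithmetic. It is written in Python rather than Lua, the function name is hypothetical, and mapping the indices to IPA is left out because that depends on phonological rules not discussed in this thread.

```python
# Sketch only: Unicode decomposition of precomposed Hangul syllables.
# The constants come from the Unicode Hangul syllable layout;
# decompose_hangul is a hypothetical name, not an existing module function.
HANGUL_BASE = 0xAC00   # code point of the first syllable block
NUM_INITIALS = 19
NUM_VOWELS = 21
NUM_FINALS = 28        # includes the "no final consonant" slot at index 0


def decompose_hangul(syllable):
    """Return (initial, vowel, final) indices for one Hangul syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < NUM_INITIALS * NUM_VOWELS * NUM_FINALS:
        raise ValueError("not a precomposed Hangul syllable: %r" % syllable)
    initial, rest = divmod(offset, NUM_VOWELS * NUM_FINALS)
    vowel, final = divmod(rest, NUM_FINALS)
    return initial, vowel, final
```

A converter would then look each index up in a small table of phonemes and apply sound-change rules across syllable boundaries; those tables are not given here.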
I dimly remember reading about the reason that the afterworld is associated with yellow springs, but it's been quite a while and I've forgotten most of the story. Could you add something about that to the etym, assuming of course that you're familiar with the tale? It's a bit obscure otherwise. :) -- Eiríkr Útlendi │ Tala við mig15:29, 13 March 2013 (UTC)Reply
I was looking around the Tubes for a handy chart showing how Middle Chinese initials, medials, and finals generally change upon entrance into Japanese, Korean, and Vietnamese, but I could only find charts with example words, for the most part, not generalizations about the phonemes. To what degree is it in fact predictable, and if that degree is high enough, is there a chart anywhere? Thank you —Μετάknowledgediscuss/deeds17:51, 23 March 2013 (UTC)Reply
I'm pretty confused... *nyijH and *sijH have the same syllable coda and tone AFAICT. The Sino-Japanese descendants have the same vowel, but the Korean and Vietnamese ones don't. Why is that? —Μετάknowledgediscuss/deeds18:20, 23 March 2013 (UTC)Reply
There are detailed explanations of how each Middle Chinese initial/final/tone corresponds to modern Chinese and Sinoxenic readings in many publications written in CJKV languages, however maybe not so much in English. I had a search of Wikipedia and could only find this rather (unnecessarily) rudimentary page at Sino-Japanese vocabulary; the other language versions are more detailed: ja:音読み, ja:漢音, ja:呉音, ko:한국 한자음, zh:漢越音. As for predictability, I estimate 90-95% of modern readings to be regular. The percentage is a lot less in Chinese varieties with prominent literary/colloquial distinctions. The reason for the difference in the vowels is the difference in MC initials. This -ij rhyme corresponds to:
Japanese: -i
Middle Korean: -uy (velar and laryngeal initials) (> Modern i), -o (coronal sibilant initials) (> a), -i (else)
I see, excellent. And thank you for the Wikipedia links; they are very slow reading for me, but better than the English resources by far. How often does the initial affect the vowel like that? (PS: All you have to do is learn Lua and you will be worthy of worship. Your knowledge is impressive in the extreme.) —Μετάknowledgediscuss/deeds16:14, 24 March 2013 (UTC)Reply
I'm flattered :) Influence by initial or medial glide is very common. Almost every rime corresponds to multiple finals in the modern language, with the exact reflex depending on initial or glide (-y-, -w-, -ɣ-). Just found a good Wikipedia article explaining the correspondence between MC finals and Beijing Mandarin ones: Middle Chinese finals. Wyang (talk) 00:31, 25 March 2013 (UTC)Reply
Found an issue
|e1= , |e2= : new etymology section, definitions for the first and second characters
|c1= , |c2= : if components for etymology are different from the first and second characters
| : more definitions
|wp= : link to zh.wikipedia
eg.
市政厅: {{subst:cmn new/a|p1=shì|p2=zhèng|p3=tīng|n|] ]|type=21|c1=市政|c2=厅|e1=municipal government|e2=hall; designation for a certain level of government|wp=y}}
落實: {{subst:cmn new/a|p1=luò|p2=shí|a|]; ]|v|to ]; to ]|e1=to fall, to settle|e2=true, real}}
Thank you, that's extremely helpful. I don't actually speak IPA but, I dunno, it doesn't look right to me. E.g. at 内地 is that really how it's written? So many numbers... ---> Tooironic (talk) 01:15, 9 April 2013 (UTC)Reply
That's just the tones and the tone sandhis in Beijing Mandarin. Superscripts 1-5 are the same as tone symbols ˩˨˧˦˥; they are easier to recognise typographically. Wyang (talk) 01:19, 9 April 2013 (UTC)Reply
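The equivalence between the numbers and the tone letters can be made concrete with a small mapping. This is only a sketch in Python; the function name is hypothetical, and it assumes contours are written as plain or superscript digits 1-5 as described above.

```python
# Sketch only: convert Chao tone numbers (plain or superscript digits 1-5)
# into the equivalent tone letters. tone_digits_to_letters is a hypothetical name.
TONE_LETTERS = {
    "1": "˩", "2": "˨", "3": "˧", "4": "˦", "5": "˥",
    "¹": "˩", "²": "˨", "³": "˧", "⁴": "˦", "⁵": "˥",
}


def tone_digits_to_letters(contour):
    """Map each digit of a tone contour to its tone letter."""
    return "".join(TONE_LETTERS[digit] for digit in contour)
```

For instance, a falling contour written 51 corresponds to ˥˩.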
I got it, thanks. Have you documented the code somewhere? c1 and c2 don't seem to work. I tried {{subst:cmn new/a|p1=shì|p2=zhèng|p3=tīng|n|] ]|type=21|c1=市政|c2=厅|e1=municipal government|e2=hall; designation for a certain level of government|wp=y}}. Also, how do you create entries for simp. or trad. only (not both)? --Anatoli (обсудить/вклад) 03:58, 9 April 2013 (UTC)
It worked when I tried on 市政厅 (). The instructions are at {{cmn new}}. This code is for one entry only, so for simp+trad, you have to submit the code on both pages, and for one of simp/trad, it's one submission at the missing entry. Wyang (talk) 04:10, 9 April 2013 (UTC)Reply
Nice tool! I got it to work with the shorter version (cmn new/a) on 废除 and 廢除.
User:Ruakh has developed a nifty tool (User:Ruakh/Tbot.js) for accelerated Russian entries creation from translation sections. I enabled it here. Do you think you could create the same for Mandarin? Just clicking on a red link (green if the script is enabled) in translations creates an entry in Russian. I'm just filling the rest manually (inflection, etc.). --Anatoli(обсудить/вклад)04:28, 9 April 2013 (UTC)Reply
The cases are a little different. In translations, {{t|cmn}} for trad is not assigned a transcription, so it would not (?) be possible to extract pinyin for the missing trad entry. I'm not used to writing .js things like that, so digesting what's written there will probably take me days. The current code is simple enough, for me... Wyang (talk) 04:41, 9 April 2013 (UTC)
I thought I'd let you know. The transcription for trad. is the same as for simpl., so tr= can be copied from simplified.
After enabling 'cmn' and clicking on the green link (减弱 - appears green on mine in Translation section) in abate#Translations I instantaneously got this:
=={{subst:cmn}}==
===Verb===
{{head|cmn|verb|tr=jiǎnruò}}
# ] {{gloss|to bring down or reduce to a lower state}}
The gloss, the part of speech and tr are all there; only the code is generic and uses {{head}}. I find both your work and his amazing for accelerated development; keep up the good job. --Anatoli (обсудить/вклад) 04:52, 9 April 2013 (UTC)
With cmn, the trad-simp conversion has to be done in addition, which is preferably achieved through substitution of existing trad-simp lists (which is what {{cmn new}} is doing). I envisaged filling a missing entry with the code {{subst:cmn new/a|p1=PINYIN|PoS|defn}} when clicked, then having to decompose the PINYIN into syllables separated by |p2= etc. Since the trad form is not assigned a |tr= in translation sections, one would have to copy the pinyin manually. So for simp it's one extra step of decomposing PINYIN, while for trad it's two extra steps. The overall process is not hugely simpler than substituting {{cmn new}} from scratch, and that way the definition and SoPness are also checked (which is important for cmn, as the definitions are likely to differ from the translation glosses). Wyang (talk) 05:04, 9 April 2013 (UTC)
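To make the parameter layout concrete, here is a small Python sketch that assembles the substitution call from syllables that have already been split. The function name is hypothetical, and the hard part (segmenting a run of pinyin into syllables) is deliberately assumed to have been done beforehand.

```python
# Sketch only: build a {{subst:cmn new/a|...}} call from pre-split pinyin
# syllables. cmn_new_call is a hypothetical helper; syllable segmentation
# is assumed to have happened already.
def cmn_new_call(syllables, pos, definition):
    """Number the syllables as p1=, p2=, ... and append PoS and definition."""
    params = ["p%d=%s" % (i, syl) for i, syl in enumerate(syllables, start=1)]
    return "{{subst:cmn new/a|" + "|".join(params + [pos, definition]) + "}}"
```

Calling it with the syllables shì, zhèng, tīng, the part of speech n and a placeholder definition yields a call of the same shape as the 市政厅 example earlier in this section.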
Adding new languages
Hi,
Sorry to be a nuisance. Could you please write a basic script to create new Russian and Japanese entries (in this order) like you did for Mandarin, Korean and Vietnamese?
A basic Russian (ru) noun entry is very simple but it needs gender (g) and transliteration. For example: "лютик" - a buttercup (leaving the entry uncreated)
Interjections, conjunctions, particles, prepositions use {{head||ru|preposition...
Japanese entries are more complicated, divided into hiragana, katakana, kanji and I don't know if it's feasible. --Anatoli(обсудить/вклад)00:45, 11 April 2013 (UTC)Reply
No worries. Russian one done at {{ru new}}. Don't really know how Japanese entries should be formatted. If you can point to me all format possibilities, that'd be great. Wyang (talk) 01:12, 11 April 2013 (UTC)Reply
Wow! That was quick. Thank you! Will test and come with some feedback. Japanese can be basic and more complex, depends how far you're willing to go. Didn't give as I didn't know if you would agree. --Anatoli(обсудить/вклад)01:36, 11 April 2013 (UTC)Reply
Japanese should be alright. Forms are easy to detect (see if it's a pure hiragana or katakana string, assign it as such if so; otherwise, kanji if no kana present, mixed if kana present), and script conversion shouldn't be an issue as well (hira to kana and to romaji, or the reverses, depending on requirement). Wyang (talk) 01:40, 11 April 2013 (UTC)Reply
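The detection order described above could look roughly like this in Python. It is a sketch under assumptions: the function names are hypothetical, the Unicode ranges cover only the basic hiragana and katakana blocks (with the prolonged sound mark counted as katakana), and anything without kana is labelled kanji as the comment suggests.

```python
# Sketch only: classify a Japanese term by script, following the order
# "pure hiragana, pure katakana, else kanji if no kana, else mixed".
# Function names and the exact ranges are assumptions for illustration.
def is_hiragana(ch):
    return "\u3041" <= ch <= "\u3096"


def is_katakana(ch):
    # Basic katakana letters plus the prolonged sound mark (U+30FC).
    return "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc"


def classify_japanese(term):
    """Return 'hiragana', 'katakana', 'kanji' or 'mixed' for a non-empty term."""
    if not term:
        raise ValueError("empty term")
    if all(is_hiragana(ch) for ch in term):
        return "hiragana"
    if all(is_katakana(ch) for ch in term):
        return "katakana"
    if any(is_hiragana(ch) or is_katakana(ch) for ch in term):
        return "mixed"
    return "kanji"
```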
JA noun example 愛国心, kanji, need to provide kanjitab (this is done simpler than Chinese hanzi), wikified hiragana, romaji:
JA noun example アニメ , katakana, need to provide romaji and, hidden index (convert katakana to hiragana アニメ -> あにめ but without voiced consonants (e.g. が (ga)-> か (ka)) - this part may be hard, will check with Haplology or Eirikr). It's fine if just a parameter, without any tricks.
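The hidden-index step may be less hard than feared if Unicode normalisation is used. The Python sketch below shifts katakana to hiragana and then removes voicing marks by decomposing the characters. The function name is hypothetical, and dropping the semi-voiced mark (handakuten) as well as the voiced mark is an assumption that should be checked against the actual sorting convention.

```python
import unicodedata

# Sketch only: katakana -> hiragana, then strip voicing marks via NFD.
# hidden_index is a hypothetical name; removing handakuten too is an assumption.
VOICING_MARKS = "\u3099\u309a"  # combining dakuten and handakuten


def hidden_index(katakana):
    """Return an unvoiced hiragana sort key for a katakana string."""
    hiragana = "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in katakana
    )
    decomposed = unicodedata.normalize("NFD", hiragana)
    stripped = "".join(ch for ch in decomposed if ch not in VOICING_MARKS)
    return unicodedata.normalize("NFC", stripped)
```

With this, アニメ becomes あにめ and a voiced kana such as ガ becomes か.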
Hi, no pressure at all, just reminding that I still need Japanese templates. A simple template, without IPA or script conversions will do, as long as the formatting matches the above. --Anatoli(обсудить/вклад)03:25, 12 April 2013 (UTC)Reply
Fixed. (I didn't know what's the parameter for mixed script, so I just put 'm') Other PoS enabled too. Code has been altered, see the template for example usages. Wyang (talk) 04:48, 12 April 2013 (UTC)Reply
Script detecting
I am a serial pest and I'd like to ask you for two more very simple templates - pinyin and romaji, if you haven't created them yet. (I hope I can learn from these how to create my own).
Pinyin is standard, romaji is still in a bit of limbo but we have many thousands romaji entries, so I don't see them reverted soon.
Pinyin can have one or two parameters (and more) per line, e.g. dòngnéng:
They are not as important and the templates are already very easy. Perhaps this could be done differently, by some accelerated method like English plurals (green links) or something.
--Anatoli(обсудить/вклад)06:50, 12 April 2013 (UTC)Reply
Hi,
I think Template:cmn new incorrectly generated the rs value for "纯洁". It should be 纟04, not 糸04. But I used the "t" parameter, not "s" (by mistake). It didn't matter on "厂长" though, where I also used "t" (I wonder how the script figured it out). --Anatoli (обсудить/вклад) 01:12, 17 April 2013 (UTC)
I see. They are basically the same radical, one is the combining or simplified form (纟) of the radical form (糸). Characters should be listed under the radical forms. I don't think I'm able to separate the two at the code level, since the subpages of Index:Chinese radical (where I extracted my rs values from) do not differentiate the two. I suggest merging the combining or simplified forms into the radical forms (like those subpages), which can be done if you replace all the {{{rs|}}} in Category:Mandarin headword-line templates with {{#invoke:zh|sortkey_conv|{{{rs|}}}}} (They are protected so I can't edit them). Wyang (talk) 01:39, 17 April 2013 (UTC)Reply
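The merging suggested here amounts to a small lookup on the first character of the rs value. The Python sketch below illustrates the idea only; it borrows the name sortkey_conv from the module call quoted above, but the body and the (incomplete) table of radical forms are assumptions, not the module's actual code.

```python
# Sketch only: fold simplified/combining radical forms into the basic radical
# at the start of an rs sort key. The table is a small illustrative sample;
# the real module's behaviour is not shown in this thread.
RADICAL_BASE = {
    "纟": "糸",
    "门": "門",
    "讠": "言",
    "钅": "金",
    "饣": "食",
}


def sortkey_conv(rs):
    """Replace a leading variant radical with its basic form, if known."""
    if not rs:
        return rs
    return RADICAL_BASE.get(rs[0], rs[0]) + rs[1:]
```

Under this mapping 纟04 and 糸04 produce the same key, so entries with either form sort together.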
Thank you. Automatic script detection seems to work well. I know that 纟 is a simplified form of 糸. They've usually been sorted differently though. You lost me with your suggestion. Are you basically suggesting that both simplified and radical forms be sorted the same way, if they use radicals, not pinyin, for sorting? Will this also affect characters like 門 and 开? Not sure if I understand this correctly. --Anatoli (обсудить/вклад) 02:17, 17 April 2013 (UTC)
I was suggesting that the combining forms and simplified forms of the radicals be treated as identical to the basic radical forms. It would affect 門 but not 开. Or, even better, just sort everything in pinyin and get rid of this parameter altogether (currently {{cmn-noun}} uses an awful mix of sorting methods if you have a look at the code). Wyang (talk) 03:35, 17 April 2013 (UTC)Reply
From a Western perspective, I was wondering this whole time why we don't just sort by pinyin. I assumed it is out of respect to traditional Chinese lexicography. In any case, I'd support that. —Μετάknowledgediscuss/deeds03:57, 17 April 2013 (UTC)Reply
Sorry, I put the wrong character, I meant 開 and 开, the two equivalents (t/s).
@Metaknowledge. The arguable benefit of sorting by radical, not pinyin, was meant for people (primarily Chinese) not familiar with pinyin, such as speakers of dialects. It is especially applicable to overseas Chinese, where pinyin was not taught and various input methods exist which don't rely on the pronunciation of characters. The designer of this method, User:A-cai, comes from Taiwan, where dictionaries are structured more around radicals, and where romanisation systems, including two variants of pinyin, have changed over time and still don't enjoy the full support of the population.
I have already expressed my support for sorting words by numbered pinyin. If you're able to do it, go ahead. All Chinese-speaking editors have already expressed support. I think for people not knowing Chinese, it's still possible to find the words they want just by entering them in the search window. Sorting by pinyin is more beneficial for learners than for native speakers.
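For reference, deriving a numbered-pinyin key from a tone-marked syllable is mostly a matter of Unicode decomposition. The Python sketch below handles one syllable at a time; the function name is hypothetical, and using 5 for the neutral tone is an assumption rather than an agreed convention.

```python
import unicodedata

# Sketch only: turn one tone-marked pinyin syllable into numbered pinyin.
# numbered_pinyin is a hypothetical name; "5" for neutral tone is an assumption.
TONE_MARKS = {
    "\u0304": "1",  # macron
    "\u0301": "2",  # acute
    "\u030c": "3",  # caron
    "\u0300": "4",  # grave
}


def numbered_pinyin(syllable):
    """Strip the tone diacritic and append the tone number."""
    tone = "5"
    letters = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)  # keeps the diaeresis of ü
    return unicodedata.normalize("NFC", "".join(letters)) + tone
```

A sort key for a whole word could then be the concatenation of the keys of its syllables, for example shi4 zheng4 ting1 for shì, zhèng, tīng.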
I don't think I support creating such categories (for Chinese and Japanese). It's going to be more troublesome to maintain those categories for Chinese. All one has to do to find all compounds and related pages of 始 is to go to or some page like this. Wyang (talk) 05:08, 17 April 2013 (UTC)Reply
The trouble with those links is that they mix in all other languages using the same character, plus translations and pinyin entries. The format is not user-friendly either.
Daniel might be able to help with making it work. When a Japanese entry is created, the categories are added automatically. Their structure is identical and only differs by the kanji and sort value. The category Category:Japanese_terms_spelled_with_始 lists only Japanese words, no other. Just looking at the list is educational and shows how words can be created with the character, especially common infixes or suffixes. I think creation of characters can be automated as well. --Anatoli(обсудить/вклад)05:24, 17 April 2013 (UTC)Reply
Creating these is easy, but I am not particularly fond of the idea. These should be information provided at the character page, which should explain the definitions and the relevant compounds. It shouldn't be done with thousands of categories. Wyang (talk) 05:29, 17 April 2013 (UTC)Reply
It may be hard to generate correct IPA for erhua (儿化) by your template. Besides, the reading can be as expected, like 女儿. Anyway, could you fix the IPA and check the entries otherwise, please?
|p1= should be 'jīnr' and |p2= is empty (although the template automatically generates pronunciation for the second character too... will solve this with the new Module:zh-based template). The pronunciation is still generatable by substituting {{py-to-ipa}}, eg. {{subst:py-to-ipa|jīn|er1=y}}. I've created {{erhua form}}. These are not really alternative forms, they are diminutive forms (like Dutch -je, English -ling) with sometimes different meanings. Wyang (talk) 01:58, 19 April 2013 (UTC)Reply
Thanks. Just on the erhua forms. It may be worth separating words that simply attach 儿 (没事 -> 没事儿) from those that replace another form: 今儿 = 今天, 这儿 = 这里. I think 今 is seldom used as "today" and 这 doesn't mean "here". --Anatoli(обсудить/вклад)02:05, 19 April 2013 (UTC)Reply
这儿 is formed from 这 ("this, such, here"), not 这里. The development of 这 -> 这里 is parallel to the development of 这 -> 这儿. The original un-erhua-ed forms may not be in use colloquially, but etymologically this is how erhua forms are generated. Wyang (talk) 02:10, 19 April 2013 (UTC)Reply
Latest comment: 11 years ago, 1 comment, 1 person in discussion
I was hoping you might be able to help with the etymology for Japanese 年度. I suspect this was in use in China some time back, but there's a slim chance it's a more modern coinage. Do you have any insight? If so, please change the etymology there as appropriate. TIA, -- Eiríkr Útlendi │ Tala við mig16:49, 15 May 2013 (UTC)Reply
Come back
Latest comment: 11 years ago, 1 comment, 1 person in discussion
Latest comment: 11 years ago, 2 comments, 2 people in discussion
I'm chewing on the etymologies for various kinds of 如来. I'm working on the assumption that Buddhist terms would have been imported from Chinese wholesale; JA-JA dictionaries give etymologies traced back to Sanskrit, which must have come via China, especially considering the history of Buddhism and even literacy in Japan.
However, I'm uncertain if names like 大日如来 were brought into Japanese as an integral unit, or if the name portion of 大日 came into Japanese, with the 如来 added in Japan. I ask because I'm noticing that some of the w:Five Dhyani Buddhas show up on the ZH WT with the epithet 佛 instead of 如來/如来(rúlái), such as zh:w:不空成就佛 vs. ja:w:不空成就如来. I do see that zh:w:不空成就如來 redirects to the zh:w:不空成就佛 entry, and google:"不空成就如來是" does find over 100K hits, but in terms of etymologies, I'm not sure what's best.
Latest comment: 11 years ago, 3 comments, 2 people in discussion
Thank you for your etymological activities of late, I very much appreciate the fuller picture of various KO terms.
Along similar lines, I was wondering about 아버지 (abeoji). Modern JA has 祖父, 翁(ōji, “grandfather; old man”), deriving from OJP opoji, which at first glance looks like a possible relative to KO 아버지 (abeoji).
That said, OJP opoji is itself a compound of opo “big, great; many” (root of modern JA 大きい(ōkii, “big, great”), 多い(ōi, “many”)) + ji < chi = 父(chi, honorific form of address for males). Any chance that 아버지 (abeoji) is also of compound derivation?
abeoji was abi in the 15th century. The form abeoji clearly violates vowel harmony and seems to be of late origin, formed from abi + some sort of suffix -Aji. The -Aji (-아지/어지) suffix is probably the same suffix as in 바가지 (< 박, "gourd"), 싸가지 (< 싹, "hope" < "bud"), or even the diminutive suffix -ngaji in 송아지, 강아지. The first component abi was probably Altaic: Turkish aba, Mongolian abu. I'm not sure of the etymology of the Japanese words. Wyang (talk) 00:28, 26 July 2013 (UTC)Reply
Brilliant, exactly the kind of detail I was hoping for. It sounds pretty clear from your description then that opoji and abeoji are only superficially similar, and that OJP chi, "male" is no match for Middle Korean aji, "diminutive suffix". :) Thank you! ‑‑ Eiríkr Útlendi │ Tala við mig01:12, 26 July 2013 (UTC)Reply
“Phonosemantic interpretation” of Chinese characters
Latest comment: 11 years ago, 1 comment, 1 person in discussion
Thank you! Do the rules match those described by Shinji, his Korean link and the French templates? If you don't know, then I guess I will need to run test cases with your templates as well and see if they are acceptable. Can it work for single words only, not strings with spaces? --Anatoli(обсудить/вклад)03:22, 11 November 2013 (UTC)Reply
It basically matches the official guidelines, except that it uses dashes in consonant + vowel syllable divisions, i.e. bur-yaseong instead of buryaseong (This can be changed by replacing all "-" in the produced string). This template was written before the advent of Lua, so it is quite slow and may not be useful with long strings, but for simple strings like 5 - 6 characters this should be sufficient. To incorporate Lua into this template would probably involve a quite substantial rewrite. Wyang (talk) 03:29, 11 November 2013 (UTC)Reply
I just wanted to verify what it does. If it can work with short strings but accurately, it can still be used to verify [...]
Yes double consonants in codas were not included - couldn't be bothered to do research and generate a large matrix of how to romanise all combinations (also because exceptions are quite common), so just left out this bit altogether. Wyang (talk) 03:58, 11 November 2013 (UTC)Reply
"measure word", "counter" and "classifier" - headers
Latest comment: 10 years ago, 1 comment, 1 person in discussion
Latest comment: 10 years ago, 6 comments, 2 people in discussion
Thank you for adding an etymology to this entry. I'd love to add etymologies like these to the Vietnamese Wiktionary, but I've had very little luck finding etymologies apart from modern loanwords. What sources do you consult?
Also, I noticed that the Mường word cal³ is given in an orthography I'm not familiar with. The Vietnamese Wiktionary has been using the Vietnamese-based orthography that seems to be ubiquitous among Vietnamese academic and government sources, since the Mường live in Vietnam. (This orthography uses Vietnamese tone marks rather than tone numbers.) Where can I find out about the orthography you're using?
Hi Minh. The Sealang Mon-Khmer Comparative Dictionary is a very useful resource for this purpose, which includes results from Shorto's Mon-Khmer Comparative Dictionary, and Ferlus' unpublished 2007 manuscript "Lexique de racines Proto Viet-Muong" (from the POV of Vietnamese). The notation is per Ferlus (2007) - The Mường form using Vietnamese diacritics appears to be chẳl. Wyang (talk) 06:00, 16 December 2013 (UTC)Reply
Wow, that's awesome! What's the copyright status on the database? Some of the citations have tooltips that say, "Do not cite entries from this manuscript!" What's that about?
If I'm not mistaken, spellings like cal³ in the database are just IPA transcriptions with Chinese-style tone numbers. It would be more appropriate to use the "local orthography" field in {{term}}. Some linguists use ad-hoc transcriptions, but the Vietnamese-based one seems to be prevalent in the media and other dictionaries. Would you mind if I changed transcriptions like cal³ to chẳl when I see them?
Sure, I have changed it myself. Are there any good online resources describing the phonology or orthography of tieng Muong, or other Vietic languages? Googling in English and Vietnamese does not seem to yield much that is useful. The tooltip note means the work is a preliminary unpublished manuscript and is subject to errors. Wyang (talk) 21:38, 16 December 2013 (UTC)Reply
Latest comment: 10 years ago, 3 comments, 2 people in discussion
Hi,
Thanks for your efforts on the conversion module! I posted a question there (described as I personally see it), copying here:
Issue with erhua: Although erhua is very unpopular in Taiwan, we still need to convert erhua forms correctly, but there may be back-conversion problems. ㄦ is equivalent to a full syllable "ēr" (first tone, written without a tone mark). To convert Pinyin like wánr and dàir, we probably need to produce ㄨㄢˊㄦ˙ and ㄉㄞˋㄦ˙. Converting them backward would give wáner (wán+er) and dàier (dài+er). I can't find a definitive explanation of how to transliterate erhua using Zhuyin, but Pleco uses ㄦ˙ (with a neutral tone marker) to render the "-r" suffix.
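The back-conversion problem described above can be sketched in a few lines of Python. This is only a toy illustration, not the code of Module:PinyinBopo-convert: the two-entry syllable table, the function names and the `erhua_aware` flag are all assumptions made for the example, and the ㄦ˙ rendering is the Pleco-style convention mentioned in the comment.

```python
# Toy syllable table covering only the two examples from the thread (assumption).
PINYIN_TO_ZHUYIN = {"wán": "ㄨㄢˊ", "dài": "ㄉㄞˋ"}
ZHUYIN_TO_PINYIN = {zhuyin: pinyin for pinyin, zhuyin in PINYIN_TO_ZHUYIN.items()}

# Pleco-style rendering of the "-r" suffix, as described in the thread.
ERHUA_ZHUYIN = "ㄦ˙"


def pinyin_to_zhuyin(word):
    """Convert one Pinyin word, treating a trailing "r" as the erhua suffix."""
    if word.endswith("r") and word[:-1] in PINYIN_TO_ZHUYIN:
        return PINYIN_TO_ZHUYIN[word[:-1]] + ERHUA_ZHUYIN
    return PINYIN_TO_ZHUYIN[word]


def zhuyin_to_pinyin(text, erhua_aware):
    """Convert back; a naive converter reads the suffix as a full syllable "er"."""
    if text.endswith(ERHUA_ZHUYIN):
        base = ZHUYIN_TO_PINYIN[text[: -len(ERHUA_ZHUYIN)]]
        return base + ("r" if erhua_aware else "er")
    return ZHUYIN_TO_PINYIN[text]


zhuyin = pinyin_to_zhuyin("wánr")
naive = zhuyin_to_pinyin(zhuyin, erhua_aware=False)   # loses the erhua spelling
aware = zhuyin_to_pinyin(zhuyin, erhua_aware=True)    # restores the original
```

Treating a word-final ㄦ˙ as the suffix restores the original spelling in these two cases, but it would misread any word that genuinely ends in a neutral-tone "er" syllable, so a real converter would need more context than this sketch uses.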
Latest comment: 10 years ago, 15 comments, 2 people in discussion
Hi,
Could you take a look at Module:PinyinBopo-convert/testcases, please? "dì'èr shǒu" becomes "ㄉㄧˋ 'ㄜˋㄦ ㄕㄡˇ" but should be "ㄉㄧˋ ㄦˋ ㄕㄡˇ". Perhaps if apostrophes are removed before the conversion it'll work. Also what would PinyinZhuyin for Pinyin "hm" look like, as in 噷, also hèn? I'm just trying to cover all corner cases, not trying to bombard you with requests :). --Anatoli(обсудить/вклад)22:09, 28 January 2014 (UTC)Reply
Hi, no worries. The former was taken into account and {{#invoke:PinyinBopo-convert|convert|dì'èr shǒu}} works as expected: . However apostrophe in PAGENAME fails to be recognised no matter what I do to the module. Thus {{#invoke:PinyinBopo-convert|convert|{{PAGENAME}}}} fails at dì'èr shǒu. I am not sure what can be done to fix this. As for the latter, how should 'hm' etc. be transcribed in Zhuyin? Wyang (talk) 22:40, 28 January 2014 (UTC)Reply
The apostrophe only fails to be recognised when it is part of PAGENAME, using it inside the string is not buggy (as I said above). For the latter, I am not sure I understand what you mean in the question.. Wyang (talk) 23:59, 28 January 2014 (UTC)Reply
No-no. It makes sense, I hadn't read it carefully the first time. Maybe a silly suggestion, but have you tried storing PAGENAME in a variable, displaying it, removing the apostrophe, displaying it again, then converting and checking the result, in this order? It's not easy to debug Lua, I know. --Anatoli(обсудить/вклад)00:21, 29 January 2014 (UTC)Reply
No, I haven't tried that. I guess I will wait for the more knowledgeable to kindly shed some light on the problem first, and resort to my dilettantish skills if all else fails... Wyang (talk) 00:31, 29 January 2014 (UTC)Reply
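One possible cause, which the thread does not confirm, is that MediaWiki's {{PAGENAME}} can return some characters, the apostrophe among them, as HTML entities such as `&#39;`, so the module never sees a literal apostrophe. The Python sketch below illustrates the "inspect, normalise, then convert" debugging order suggested above; the function name is hypothetical, and the real module is written in Lua, not Python.

```python
import html
import re


def pinyin_syllable_groups(page_name):
    """Decode HTML entities, then split on apostrophes and whitespace."""
    decoded = html.unescape(page_name)          # "&#39;" becomes "'"
    parts = re.split(r"[\s'’]+", decoded)       # apostrophes only mark boundaries
    return [part for part in parts if part]


# The same title as typed directly and as an entity-encoded page name.
typed = pinyin_syllable_groups("dì'èr shǒu")
encoded = pinyin_syllable_groups("dì&#39;èr shǒu")
```

Both inputs yield the same syllable groups, so a converter fed from this normalised form would behave identically whether the string was typed in or came from the page title.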
More "weird" Pinyin and Zhuyin: ng=兀 (with various tones or neutral, Pleco lists a few) as in 嗯 and 㕶. 兀 is both a Han character and a Zhuyin symbol reserved for non-Mandarin sounds and interjections like this. --Anatoli(обсудить/вклад)00:56, 29 January 2014 (UTC)Reply
I don't know, I just thought it sounded strange, so I assumed it is a Taiwanese usage. To me it just means "secondary (information)". The more common term is 二手. It seems 二手 is more common in Taiwan as well. Wyang (talk) 01:59, 29 January 2014 (UTC)Reply
I've changed to "rare" for the correct categorisation (Mandarin terms with rare senses, not rare forms), it doesn't mean I agree it's rare. I don't know. --Anatoli(обсудить/вклад)
I've been seeking used bookshops in Taipei recently and they all seem to use just 二手 rather than 第二手, or the other purported Taiwanese term 中古 for that matter. — 18:57, 5 February 2014 (UTC)
Hi, it seems Lua is having a bit of a meltdown at the moment, if you try {{#invoke:PinyinBopo-convert|convert|anything}}. I don't know what is going on. Wyang (talk) 00:58, 6 February 2014 (UTC)Reply
Yes, they do. I am on a lookout for these. Thank you!
Sorry to be a serial pest. I have 2 requests for {{ko new}} and {{ja new}}, when you have time and if you have interest. In Korean, the hangulisation template should be orphaned and deleted, IMO. We should use the standard {{etyl}} for loanwords (doesn't apply to Sino-Korean, Sino-Japanese, etc.). With Japanese, the template should produce simpler output, hiragana being the first parameter.
==Japanese==
{{ja-kanjitab|招|猫}}
===Noun===
{{ja-noun|kk|hira=まねきねこ|rom=manekineko}}
# beckoning cat; figure of a cat with one paw raised
Should be just:
==Japanese==
{{ja-kanjitab}}
===Noun===
{{ja-noun|まねき ねこ}}
# beckoning cat; figure of a cat with one paw raised
Note that hiragana and katakana may have spaces ("まねき ねこ"), which are not displayed but produce a more user-friendly romaji. Not urgent but it would be great to have. I will take a "no" for an answer if you'd rather not change it. Appreciate your efforts! --Anatoli(обсудить/вклад)01:26, 6 February 2014 (UTC)Reply
There have been many changes to the standard format of a Japanese entry, thanks to the simplification efforts by User:Haplology. I have changed Template:ja new to adapt (it seems) to those changes. As for Korean, you can use |ee=league in 리그. To change it to other languages, you can use |el=fr ... |el= is 'en' by default. Wyang (talk) 02:03, 6 February 2014 (UTC)Reply
Latest comment: 10 years ago, 1 comment, 1 person in discussion
Thanks for your comments on my edits.
I tried to correct them following your comments.
I kept some of the wrong etymologies and labelled them as mnemonics. Is this good practice for the Chinese character etymology section?
Feel free to comment on or modify my edits, as I am a beginner at Wiktionary and English is not my native language.
Hi, the template can handle pinyin with apostrophes correctly. The small superscript in the IPA represents the semi-glottal stop, found in the onset of certain null-initial syllables. Wyang (talk) 03:05, 26 February 2014 (UTC)Reply
Hi. There is only one pronunciation (not variant pronunciations), and Pinyin only writes the non-sandhi form (yi1ban1) . Please see the page now. Cheers, Wyang (talk) 11:05, 2 March 2014 (UTC)Reply
Looks great now, thanks. Is there a way to mention tone sandhi in the pronunciation box? I think that would be helpful to users who may not understand why the change occurs. ---> Tooironic (talk) 08:48, 4 March 2014 (UTC)Reply
On a side note, I'm having second thoughts about this not being a variant pronunciation. The 國語辭典 (a trusted Taiwan dictionary) lists 一般 as yībān, even in the pronunciation sample, along with example sentences, listen here: Do you think this is a Taiwan variant perhaps? I can't recall ever hearing a mainlander pronouncing it as yībān. ---> Tooironic (talk) 08:52, 4 March 2014 (UTC)Reply
To me the two pronunciations sound like the pronouncer's attempt to pronounce the two syllables as if they are in isolation, probably as a consequence of the Pinyin orthography being the non-sandhi version (yi1ban1) (i.e. spelling pronunciation). Instead, the two 一般's in example sentences show tone sandhi. I don't think it is a Taiwan variant, at most a rare one. There are many online resources describing the tone sandhi patterns of 一 and 不 in Taiwanese Mandarin - . Wyang (talk) 11:28, 4 March 2014 (UTC)Reply
@Tooironic Ah, another thing. For the pronunciation template, it is not necessary to specify the audio file if the filename is 'zh-PINYIN.ogg'. Using '|a=y' suffices. Wyang (talk) 11:30, 4 March 2014 (UTC)Reply
That is part of where the pun is. There was a typo in 乞人憎 which I have fixed. This is a xiehouyu in Chinese, the first part being 非洲和尚 (a monk from Africa), and the last part being 黑人僧 (a black monk) (another synonymous way of putting it) - 乞人憎 (makes people hate) (the near-homophone). Someone might say "something is really 非洲和尚", to mean "something is really annoying". Cheers, Wyang (talk) 03:40, 8 March 2014 (UTC)Reply
I don't know as I rarely edit hanzi entries too. I just dislike the current format. It is too distant from the ideal logical format I have in mind. Wyang (talk) 07:17, 11 March 2014 (UTC)Reply
Also, do you mind adding Zhuyin to {{Pinyin-IPA}}, next to Pinyin? I hope it's not too hard for you. The table is a little too tall; maybe it could be simplified a bit with the bullets, considering that we will include dialects as well. --Anatoli(обсудить/вклад)07:05, 14 March 2014 (UTC)Reply
Hi. The apostrophe issue is fixed. I have added Zhuyin to Pinyin-IPA and made it a little more compact (The extra line at the top and bottom of the table is something I plan to remove when my bot gets granted bot rights). Wyang (talk) 13:20, 14 March 2014 (UTC)Reply
Putting a homophone field in the pronunciation header template
Latest comment: 10 years ago, 14 comments, 4 people in discussion
Hi Wyang, is it possible to put a homophone field in the pronunciation header template? I think it would be useful. Here are two sets of entries that could benefit from it: 營利/营利 VS 盈利 and 迷路 VS 麋鹿. ---> Tooironic (talk) 00:07, 18 March 2014 (UTC)Reply
See Russian homophones привести́(privestí) and привезти́(priveztí). They can simply be added manually with:
* Homophones: 營利/营利(yínglì), 营利(yínglì) at the bottom of "====Pronunciation====" section. --~~
Yes I'm aware of that. But I was hoping there was a way to integrate it into the new pronunciation template. It looks strange and ugly to have it listed as a bullet-point under the lovely box. Here's another example: 便利 VS 遍歷/遍历. ---> Tooironic (talk) 00:24, 18 March 2014 (UTC)Reply
I think homophones should be manually parameterised then, so that not the template but the entries are maintained - 遍歷/遍历 - {{Pinyin-IPA|biànlì|遍歷/遍历|a=y}} maybe? But if we keep the simple bulleted style then homophones could fit nicely into the format. Of course, pinyin entries should be kept up-to-date with homophones (no problem listing currently missing entries). Actually, homophones in Chinese are an issue we should discuss separately. There could be too many, and fitting them into the template will become problematic. --Anatoli(обсудить/вклад)00:58, 18 March 2014 (UTC)Reply
(E.C.) Possibly. It seems the structure of templates with homophones is unsustainable. Editors should be able to add/remove them manually in entries, or ignore them altogether and let categories and Pinyin entries list them. If we move away from one complexity with Chinese entries, such as the "rs" value, we shouldn't create a new one. Topolectal pronunciation/transliteration should be optional, of course. --Anatoli(обсудить/вклад)01:24, 18 March 2014 (UTC)Reply
Re:Anatoli: How about now (collapsed)...? Pinyin entries can be made to have zero information, only links to these templates (in another format). Re:DTLHS: Too many of them... Probably looking at >10000 of these. Templates are probably easier to manage. Wyang (talk) 01:18, 18 March 2014 (UTC)Reply
It might be harder to do manipulations of the data... For example, if one is interested in finding out all near-homophones (minimal pairs wrt tones) of shi4shi4 (i.e. shiNshiN), templates would seem easier in producing the list, no? Wyang (talk) 01:42, 18 March 2014 (UTC)Reply
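Whichever storage is chosen, the lookup itself is a grouping problem. The Python sketch below is a minimal illustration under stated assumptions: the `READINGS` table is a hand-made stand-in for whatever central data source would exist, the first four readings are transcribed from the homophone pairs mentioned in this thread, and the three "shishi" words are extra illustrative examples that do not come from the thread.

```python
import re
from collections import defaultdict

# Hand-made sample data (assumption); numbered Pinyin, one reading per word.
READINGS = {
    "營利": "ying2li4",
    "盈利": "ying2li4",
    "迷路": "mi2lu4",
    "麋鹿": "mi2lu4",
    "事實": "shi4shi2",   # illustrative, not from the thread
    "實施": "shi2shi1",   # illustrative, not from the thread
    "逝世": "shi4shi4",   # illustrative, not from the thread
}


def group_words(readings, key):
    """Group words by key(pinyin) and keep only groups with several members."""
    groups = defaultdict(list)
    for word, pinyin in readings.items():
        groups[key(pinyin)].append(word)
    return {k: words for k, words in groups.items() if len(words) > 1}


def strip_tones(pinyin):
    """Drop tone digits, so shi4shi4 and shi2shi1 share the key 'shishi'."""
    return re.sub(r"\d", "", pinyin)


homophones = group_words(READINGS, lambda pinyin: pinyin)
near_homophones = group_words(READINGS, strip_tones)
```

Exact homophones and tone-only minimal pairs differ only in the key function, which is why the same data can serve both a template-based and a module-based design.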
I don't think so- either way you're passing the pinyin through a module that can generate and parse it any way you like. The only disadvantage is you can't add terms we don't have an entry for yet. DTLHS (talk) 01:45, 18 March 2014 (UTC)Reply
Not sure about your first question. Could you give me a link. Who will maintain templates? Pinyin entries are much simpler and they have been used to find homophones or choose the right Hanzi entry, anyway. If a template could read Pinyin entries, then it's probably better but seems too complex to me, anyway. In short, status quo is better for homophones, IMHO. --Anatoli(обсудить/вклад)01:24, 18 March 2014 (UTC)Reply
@Wyang Perhaps it's time you change your position on Pinyin entries? :) They work exactly as Pinyin indices in published dictionaries. I saw your expanded example. It looks OK but would be hard to maintain in the long run. --Anatoli(обсудить/вклад)01:37, 18 March 2014 (UTC)Reply
I meant the template format if you have a look at 犀利. The Pinyin entries in their current state cannot be called by the pronunciation template to generate a list of homophones. The Pinyin information should be kept centralised somewhere, such that both Pinyin entries and character entries can call these templates without the need to synchronise their contents (especially homophones, which can be quite a headache to keep identical). The conversion would not be hard, and it would make Pinyin entries even more unjustified. Wyang (talk) 01:42, 18 March 2014 (UTC)Reply
"Yi" in Zhuyin
Latest comment: 10 years ago, 7 comments, 3 people in discussion
Yes, it's ㄧˋ ㄧˋ as well in 國語辭典 entry. If you copy-paste, you'll see ㄧˋ ㄧˋ. Interesting that it appears horizontally. Does the symbol appear horizontally in horizontal writing and vertical in vertical, similar to the Japanese elongation symbol ー, which appears as a vertical stroke in vertical writing? I've only seen Zhuyin symbol ㄧ(yī) as a vertical sign before. --Anatoli(обсудить/вклад)05:19, 24 March 2014 (UTC)Reply
Theoretically it should be ㄧ in horizontal writing and 丨 in vertical writing. In reality there are often exceptions to this rule. Wyang (talk) 05:22, 24 March 2014 (UTC)Reply
Interesting that in w:Bopomofo 瓶子 appears in vertical as:
ㄆ
ㄧ (appears horizontally, in vertical writing ?!, can't render here)
ㄥˊ
˙
ㄗ
and in horizontal as ㄆ丨ㄥˊ ㄗ˙ (the opposite of what you suggested). Note ㄧ and the position of the neutral marker as well. Is there any rule in these examples? --Anatoli(обсудить/вклад)05:29, 24 March 2014 (UTC)Reply
Ah, yeah. I tried to search for the official rules when I edited Module:PinyinBopo-convert, but did not appear to have found anything very useful. There are also multiple versions of Zhuyin. The Wikipedia example might have been what the official rule (if there is one) considers as correct. I don't know about the rule for tonelessness. As the moedict.tw example above shows, in reality there are often exceptions to how ㄧ/丨 are supposed to be used, if Wikipedia is correct (chances are). Wyang (talk) 05:40, 24 March 2014 (UTC)Reply
I got it from less reputable dictionaries and I thought it was a valid variant. Anyway, I don't recall how you handle words with multiple readings, such as 瘦削. What's the right way? --Anatoli(обсудить/вклад)10:21, 24 March 2014 (UTC)Reply
My plan after User:Wyangbot gets granted a bot flag by a bureaucrat is to finish the format change on pages using Pinyin-IPA (i.e. ). Currently there are >3000 pages using the old format of Template:Pinyin-IPA, which requires each syllable to be fed into the template separately. Once that is done, I will modify Template:Pinyin-IPA to make it accept alternative readings as second, third, ... parameters and enable one to write comments for each pronunciation (like Taiwan/Mainland, standard/colloquial, if one needs to), and use that to end the template awkwardness in Category:cmn:Variant pronunciations pages. Template:cmn-new will also be modified, so that one can use the parameter |py2=... to add a second pronunciation, although it would be better if one modifies the page afterwards, since the readings are often used in different contexts. By the way, 觉 is one of the few characters in Standard Chinese which show different literary and colloquial readings, with jiao4/jiao2 being the colloquial reading (limited to 睡觉), and jue2 being the literary reading (all other situations). People speaking other dialects may use the colloquial reading in compounds in which Standard Chinese normally uses the literary one, eg. 觉得jiao2de, 自觉zi4jiao2, and this would typically be considered heavily accented or colloquial. I haven't heard 错觉 being pronounced cuo4jiao4 or cuo4jiao2 though. Wyang (talk) 22:48, 24 March 2014 (UTC)Reply
Multiple pronunciations often have different statuses. E.g. Russian до́гово́р(dógovór) when stressed on the first syllable is considered less educated, so is свекла́(sveklá), which is also an alternative spelling of a more standard свёкла(svjókla). Manual feeding of templates is fine with me but I'd like to see an example. At Wiktionary, it's OK to list all acceptable but verifiable forms, even if they are colloquial. As for different contexts, words could be split into etymologies, like 得了. No rush. I can see you're busy. Please consider that we will need templates for words, which ARE NEVER USED in Mandarin as well - Cantonese, Min Nan (including non-Han scripts - Latin, Cyrillic, Arabic), where Pinyin/Zhuyin may not be appropriate. See Talk:老番. Perhaps Cantonese 佢哋 could be a good example, how this type of entries are going to look (after a change).
It shows tone sandhi in IPA, but not in Pinyin. I only did tone sandhi for Pinyin for words containing 一 and 不, not other cases, since the effects cannot be represented well by Pinyin tone marks. In the case of 指指点点, all syllables undergo tone sandhi: the first three undergo third-to-second tone sandhi, which you can represent using the acute accent, but the last syllable undergoes third-to-half-third (half third: only the first half of third tone, only dipping, no rising), which you cannot represent using Pinyin tone markers. There are also half fourth-tone and tone sandhi of neutral syllables, for which there are no Pinyin diacritics available either.
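The 一/不 case mentioned here follows the commonly described rule for Standard Mandarin: 一 is read with the fourth tone before a first, second or third tone and with the second tone before a fourth tone, while 不 is read with the second tone before a fourth tone. The Python sketch below applies just that rule to numbered Pinyin. It is an assumption-laden illustration, not the logic of Template:Pinyin-IPA: the function name and input format are invented, and cases such as a following neutral tone or ordinal 一 in the middle of a word are deliberately left untouched.

```python
def apply_yi_bu_sandhi(hanzi, syllables):
    """Return numbered-Pinyin syllables with 一/不 tone sandhi applied.

    hanzi and syllables must be aligned one character per syllable.
    """
    result = list(syllables)
    for i, char in enumerate(hanzi):
        if i + 1 >= len(syllables):
            break                           # word-final: keep the citation tone
        next_tone = syllables[i + 1][-1]    # last character is the tone digit
        if char == "一" and syllables[i] == "yi1":
            if next_tone == "4":
                result[i] = "yi2"
            elif next_tone in "123":
                result[i] = "yi4"
        elif char == "不" and syllables[i] == "bu4" and next_tone == "4":
            result[i] = "bu2"
    return result


yiban = apply_yi_bu_sandhi("一般", ["yi1", "ban1"])
```

For 一般 this turns the citation form yi1ban1 into yi4ban1, which matches the sandhi pronunciation discussed earlier on this page, while the Pinyin spelling itself stays in the non-sandhi form.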
Yes, I think so. You could release the pronunciation template without having to wait for the vote. Since you're not breaking anything with it. I guess "==Mandarin==" and "lǎofān" looks confusing on 老番, even if Mandarin usage might be attestable as well.--Anatoli(обсудить/вклад)00:23, 25 March 2014 (UTC)Reply
Possibly but "OK" is spoken quite a lot by Chinese (it's questionably considered the most common word in the world!), even if it can be argued as "code-switching", nobody found reasonably acceptable Hanzi to render the sounds, so that it was accepted by the majority, besides "OK" is so easy to type, compared to anything else. OK in 卡拉OK is usually pronounced identically, the Chinese way, that's all (with some variations in both). Of course, they have nothing in common otherwise. IMHO, rendering foreign "/k/" sounds seems problematic in standard Chinese with some exceptions, since many words have "j" via Cantonese or otherwise, even "卡" is not common for foreign /ka/. --Anatoli(обсудить/вклад)05:03, 28 March 2014 (UTC)Reply
Yeah, I'm not sure. It's probably related to my edits to the pronunciation template. I'll see what I can do. By the way, the template now can handle variant pronunciations (eg. 骨頭, 普遍) and can generate Mainland-Taiwan differences automatically (eg. 星期, 乳酪). Cheers, Wyang (talk) 06:42, 29 March 2014 (UTC)Reply
Latest comment: 10 years ago, 5 comments, 2 people in discussion
The code in these templates is pretty much unreadable right now, it's just a big giant blob of code. Could you clean it up please? —CodeCat18:14, 29 March 2014 (UTC)Reply
I've reorganised a lot of the code in these templates, to make it easier to maintain. The /essence template really contained the same code 5 times with some small variations, so I split that code out into a separate template, Template:Pinyin-IPA/table. The /essence template is no longer needed now. —CodeCat00:07, 30 March 2014 (UTC)Reply
Yes, the variant pronunciation feature added two days ago involved a lot of duplications. I wanted to put it entirely into Module:Pinyin-IPA when I added it, but I just opted for the easier option out of laziness. Thanks for doing that. By the way, there was a minor error in your code of the main template, which is now fixed. Wyang (talk) 22:36, 30 March 2014 (UTC)Reply
More specifically
Latest comment: 10 years ago, 10 comments, 2 people in discussion
Using User:Wyang/歷史 as an exemplar, this is the xml which I would be processing - the most current revision of the article:
<page>
<title>User:Wyang/歷史</title>
<ns>2</ns>
<id>4354968</id>
<revision>
<id>25957600</id>
<parentid>25957507</parentid>
<timestamp>2014-03-28T00:38:39Z</timestamp>
<contributor>
<username>Atitarev</username>
<id>27724</id>
</contributor>
<comment>/* Chinese */ add {{temp|zh-hanzi-box|]|]]}}, rm Wikipedia, etymology, irrelevant to the proposal</comment>
<text xml:space="preserve" bytes="438">==Chinese==
{{zh-hanzi-box|]|]]}}
===Pronunciation===
{{zh-pron
|m=lìshǐ
|c=lik6 si2
|mn=le̍k-sú
|w=5liq sr
|ma=y|ca=y|wa=y
}}
===Noun===
{{User:Wyang/zh-noun|hsk=b}}
# {{cx|obsolete}} ]s of past events; ] records
# ], ]
# past ]s of a person, the history of a person
# ], the ] of history, usually {{l|cmn|歷史學|tr=lìshǐxué}}</text>
<sha1>tp7iuzb46po75ssvxo3n88s5lbl3ouu</sha1>
<model>wikitext</model>
<format>text/x-wiki</format>
</revision>
</page>
For my current project this would result in the title (User:Wyang/歷史) being added to the list of words for "Chinese". When creating captcha images, having mixed scripts can result in text in one script appearing much smaller than the other, usually illegibly. This can be worked around, but it's an additional investment of time and effort. Every wiki whose language would be collapsed to "Chinese" would end up with, possibly, all the words in that classification being used.
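The kind of dump processing described here can be sketched as follows. This is a guess at the general approach, not Amgine's actual code: the function name is hypothetical, the sample is a cut-down version of the page shown above wrapped in an assumed `<mediawiki>` root, and real dump files also carry an XML namespace that this sketch ignores.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Matches level-2 headers such as "==Chinese==" but not "===Noun===".
L2_HEADER = re.compile(r"^==\s*([^=]+?)\s*==\s*$", re.MULTILINE)


def titles_by_language(pages_xml):
    """Map each L2 language header to the titles of the pages that contain it."""
    root = ET.fromstring(pages_xml)
    word_lists = defaultdict(list)
    for page in root.iter("page"):
        title = page.findtext("title")
        text = page.findtext("revision/text") or ""
        for language in L2_HEADER.findall(text):
            word_lists[language].append(title)
    return dict(word_lists)


# Cut-down sample page (assumption: no XML namespace, one revision per page).
SAMPLE = """<mediawiki>
  <page>
    <title>User:Wyang/歷史</title>
    <revision>
      <text xml:space="preserve">==Chinese==
===Pronunciation===
{{zh-pron|m=lìshǐ|c=lik6 si2}}
===Noun===
# history</text>
    </revision>
  </page>
</mediawiki>"""

word_lists = titles_by_language(SAMPLE)
```

Because only the header text is read, the topolect information inside {{zh-pron}} (|m=, |c=, |mn=, |w=) is invisible to this kind of extraction unless the template parameters are parsed as well.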
My particular project is very WMF-focused, and you can easily say that it would not matter on other Sinitic WMF projects. But Wiktionary's data is not intended solely for use inside the WMF. A researcher may wish to use a Wiktionary dump to create pools of zh-classical 'words', or a teacher might wish to create a booklet of Min-Nan zoological terms, or a developer might pull solely Cantonese translations and want solely Cantonese senses to go with them. Doing so under the proposed model would not be possible using the dump data, because the relevant information is carried solely within the parsed templates. Working with the dump takes about 12 minutes to build my 1612 word lists; building the same from the API apparently missed about 4 million entries, and took 36 hours.
In the city I live in, Richmond in BC, Canada, the majority of people speak one or another Sinitic language, but most students must speak English in school. Even very young students use Wiktionary to clarify both their English and their Chinese language use. While I do not have specific evidence, I would expect a speaker of a Chinese language would look on the page for (example) Cantonese first, and Chinese second (or not at all.) In my opinion, Wiktionary should strive to help that student find what they are looking for on their first attempt. - Amgine/t·e06:07, 31 March 2014 (UTC)Reply
Hi, Amgine. Thanks for the clarification. I see what you mean in your comment now. Let me get back to you in two or three hours. Wyang (talk) 06:20, 31 March 2014 (UTC)Reply
I agree that data maintenance of multi-scripted (digraphic) languages is typically particularly troublesome, and people on Wiktionary working with those languages (eg. Serbo-Croatian, Chinese, Japanese) can certainly relate to that. However, digraphia in Serbo-Croatian, Chinese and Japanese is unrelated to the amalgamation or separation of its varieties. If the grouping of Serbo-Croatian is not in place, the issue of multiple scripts would still pose a problem for your captcha work, as both the Cyrillic and Latin alphabets are used to write the Serbian language, with Serbian being the only European language which has synchronic digraphia. Similarly, both Simplified Chinese and Traditional Chinese are used to write every Chinese variety. Pulling out all entries of a Chinese topolect, before and after the amalgamation of varieties, would both inevitably run into the problem of having to deal with both sets of Chinese characters. The difference in font size between simplified and traditional is probably not significant, if any, fortunately.
As you probably know, the Chinese varieties share a common written form - in the past it was Classical Chinese, and now it is Written vernacular Chinese. Consequently there is not much point in generating a topolect-specific captcha, say a Wu-language captcha. It would be more realistic to generate a captcha based on Written vernacular Chinese, or just Chinese characters in general. I'm not sure about the details of your captcha project. Do you use the presence of '== ==' language headers to pull out all entries in a particular language? If so, then the merger would be great for your project, since it only applies to Chinese character-scripted entries here. You could pull out the title of every page which contains the header '==Chinese==' in its content, as they are guaranteed to be the same script.
If you have attempted generating captcha for non-Mandarin Chinese topolects using Wiktionary data, you may have noticed that at present, such data is remarkably meagre on Wiktionary. For example, Category:Wu nouns only has 10 pages, Category:Gan nouns has 1 page, and Category:Xiang nouns has only 1 page as well. Generating page title data for Category:Min Nan nouns would have resulted in a terrible mix of three scripts, whereas the unified Chinese approach would eliminate this script multiplicity, as said above. I am curious, though, regarding how you would handle Japanese data on Wiktionary? It is written in three (Kanji, Hiragana, Katakana) different scripts here... well actually, four (plus Romaji) if you pull out everything. It must be a headache to try to analyse this.
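For the script-mixing concern, one rough way to flag page titles that mix scripts is to look at the first word of each letter's Unicode character name. The Python sketch below is only a heuristic under that assumption; the function names are hypothetical, and it does not separate Simplified from Traditional characters, since both are reported as CJK.

```python
import unicodedata


def scripts_in(title):
    """Return a rough set of script labels for the letters in a title."""
    scripts = set()
    for ch in title:
        if ch.isalpha():  # skips digits, punctuation and combining marks
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split()[0])  # e.g. "CJK", "HIRAGANA", "LATIN"
    return scripts


def is_mixed_script(title):
    """True when a title draws on more than one script label."""
    return len(scripts_in(title)) > 1


kanji_kana = scripts_in("招き猫")
min_nan_latin = scripts_in("le\u030dk-s\u00fa")
```

A word list for captcha generation could then be limited to titles for which `is_mixed_script` is false, or bucketed by the single label that `scripts_in` returns.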
I'm not sure I agree with your point on language self-identification. Most of the Chinese people I know identify their speeches to be 'Chinese' when asked. It is when people enquire further which division of Chinese it is that they give the 'Cantonese' or 'Mandarin' answer.
For my specific project it is, in fact, important to identify which script is used as the identities of the Wikipedia communities is in part based on their use. It may be offensive to, for example, a Bosnian wikipedian if xe is given a cyrillic captcha, while it would not matter for sh.wikipedia and would possibly be offensive not to do so on sr.wikipedia. My personal opinion for Chinese languages would be to use vernacular Chinese, as you suggest, but how would such be identifiable under your proposal? I do use the L2 header ('== ==') to identify the language, but this is exactly why your proposal is a problem, as I will explain below.
As I understand your proposal, script-wise the article titles would probably not face great difficulties. The breadth of characters available in any given family member language may, however, be more limited than the total number of entries in "Chinese". Being able to easily identify a relevant vocabulary - in some ways to limit the expressions to those in common use by the target reader population - is often an important reuse of Wiktionary data. Would it be possible to include in your model an unambiguous method of identifying language codes which would commonly be expected to use the entry?
Yes, I have generated word lists for all L2 on en.WT; for Wu I found exactly 45 entries, 19 in Gan, and two for Xiang. Although Min-Nan does include a mix of scripts, this appears to be normal across the spectrum for written Min-Nan, although I found references to efforts to standardize Hokkien in any of several writing systems. With your proposal, each of the Chinese languages would suddenly seem to have a very large number of entries if one assumed that, for example, Min-Nan = Min-Nan + Chinese. But it doesn't. Likewise I, as a person not familiar with these languages but working with the en.Wiktionary data, would not know if Bopomofo entries are Chinese or not, or if terms written in Taiwanese Kana or Latin are included. If they are not, would Min-Nan suddenly consist solely of entries in these other writing systems?
For generating word lists for Min-Nan your proposal would have no effect on reducing the multiplicity of scripts actually used in written Min-Nan. It would, however, possibly erroneously limit (and/or expand) the list of terms found for Min-Nan in en.Wiktionary data. Consolidating terms under a single L2 header "Chinese", while excluding Chinese family language headers, will likely result in confusing data for later use. Having an unambiguous language code list identifying which languages in which a term is in common use would reduce this confusion, but not entirely alleviate it. Put another way, it is likely to cause future errors, requiring a greater investment of effort in order to use Wiktionary data.
I think what I am trying to say is that although English may occasionally use words from many related languages, especially German, French, and Latin, these words are not commonly considered 'part' of the English language except here on en.Wiktionary. These words and phrases make up a much larger vocabulary for English than is actually recognized or understood by a large percentage of the population, even though the terms may follow the linguistic rules and rôles of English. I approach the concept of Chinese written language and Japanese Kanji in much this way - intelligibly part of the larger classification, but not always part of the vernacular - which may be completely ignorant.
For this project I am using all entries in any writing system, so all four writing systems of Japanese are valid. The generation of captcha images, however, is failing mostly due to the Kanji, which have a high percentage of illegibility due to the complexity of the characters versus the distortion effect used. This is also a problem with Chinese scripts - the highly refined characters become illegible when even slightly distorted. For other analyses I have done with Wiktionary data having multiple writing systems is a distinct benefit, allowing use of a larger corpus of source documents. The limitations become creativity and time, rather than what can be analysed. - Amgine/t·e15:22, 31 March 2014 (UTC)Reply
Hi, Amgine. Thanks for the reply. I would like to mention a few things:
Content under the heading '==Chinese==' will not be absentmindedly assigned to every ISO-coded Chinese variety under the proposal. The unambiguous language code list for a term is the pronunciation template {{zh-pron}}, and pronunciation in each variety will be fed into that template. In the xml code above, the varieties for which pronunciations have been given include: Mandarin, Cantonese, Min Nan and Wu, hence the entry will be categorised into those respective categories, sorted by the appropriate romanisation. One could parse through the template code to extract the topolect-specific page titles, eg. regex \{\{zh\-pron\n\|(+\n)*\|c=(.+)(\n+)*\n\}\} or something similar, to extract all the Cantonese pages. Even easier perhaps, one could recursively extract the titles of all pages from the category Category:Cantonese parts of speech and its subcategories, which would be much more convenient.
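The template-parsing route described above can be sketched in Python. This is a minimal sketch and not the regex quoted above: it assumes {{zh-pron}} is written with one parameter per line, the helper names are hypothetical, and the sample entry text is illustrative only.

```python
import re

# Matches a {{zh-pron ...}} call written one parameter per line (assumed layout).
ZH_PRON = re.compile(r"\{\{zh-pron\s*\n(.*?)\n\}\}", re.DOTALL)


def zh_pron_params(wikitext):
    """Return the named parameters of the first {{zh-pron}} call, or {} if absent."""
    match = ZH_PRON.search(wikitext)
    if match is None:
        return {}
    params = {}
    for line in match.group(1).splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            name, _, value = line[1:].partition("=")
            params[name.strip()] = value.strip()
    return params


def has_variety(wikitext, code):
    """True if the entry gives a non-empty pronunciation for the variety code."""
    return bool(zh_pron_params(wikitext).get(code))


# Hypothetical sample entry; the parameter values are placeholders.
SAMPLE = "==Chinese==\n===Pronunciation===\n{{zh-pron\n|c=do1 si2\n|mn=to-su\n|cat=n\n}}\n"
```

With this, `has_variety(SAMPLE, "c")` would select the page for a Cantonese list, while a page without a `|c=` line would be skipped.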
I concur with your point of using the appropriate script so as to avoid offending specific subpopulations of a larger speaker population. However, the circumstance may be different for languages with speech-writing separation. The imposition of a modern literary standard is in place in all countries which designate Chinese as one of the official languages, and texts written in Written vernacular Chinese would be understandable to any educated person. The scope of other orthographies would be very limited. For example, the Pe̍h-ōe-jī romanisation of Min Nan is generally only understood by some seniors. Young people in Taiwan, who are fully conversant in Min Nan, are mostly illiterate in Pe̍h-ōe-jī. Even if they are able to read it, it is unlikely that they would have the appropriate input method for it.
Generating Chinese character captchas seems quite uncommon, and I can imagine it will be much more difficult than the Latin alphabet. A Google image search suggests most have basically unobscurified characters, or just characters in different fonts. Most Chinese fora are simply not bothered and just opt to use Latin-alphabet captchas. For the captcha project, I think the best approach would be to generate captcha based on Written vernacular Chinese. It could be produced by pulling out titles of all entries containing '==Chinese=='. Alternatively, you might want to pull out titles which are used in both Simplified Chinese and Traditional Chinese, as the reader population for any Chinese variety will be a mix of the two, and it may not be possible for them to have the input method for both sets of characters. This could be done by parsing through the simp-trad form template ({{zh-hanzi-box}}, as alternatives have been made obsolete by User:Wyangbot), and generating all mainspace pages transcluding the template but lacking a second parameter - eg. \{\{zh\-hanzi\-box\|(+)\}\}. Or, you could use a frequency list for Chinese characters (eg. ) and do combinations and modifications on characters which lack a simp-trad distinction. The text probably does not have to be meaning-conveying for Chinese character captchas.
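The "lacking a second parameter" check could look roughly like this in Python. It is a sketch under assumptions: the pattern is my own rather than the one quoted above, the function names are hypothetical, and the sample pages are made up to show the two cases.

```python
import re

# A {{zh-hanzi-box|X}} call whose single parameter contains no pipe,
# i.e. the template has no second parameter.
SINGLE_PARAM_BOX = re.compile(r"\{\{zh-hanzi-box\|([^|{}]+)\}\}")


def lacks_simp_trad_distinction(wikitext):
    """True if the page transcludes zh-hanzi-box with exactly one parameter."""
    return SINGLE_PARAM_BOX.search(wikitext) is not None


def titles_without_distinction(pages):
    """pages maps page title -> wikitext; return the qualifying titles, sorted."""
    return sorted(t for t, text in pages.items() if lacks_simp_trad_distinction(text))


# Illustrative input only; the parameter order in the second entry is assumed.
PAGES = {
    "人": "==Chinese==\n{{zh-hanzi-box|[[人]]}}\n",
    "发": "==Chinese==\n{{zh-hanzi-box|[[发]]|[[發]]}}\n",
}
```

Here `titles_without_distinction(PAGES)` keeps only the first title, since the second entry passes two forms to the template.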
Thank you for this, Wyang. I've linked some of this information on the captcha bugs.
Having a second regex inside the L2 loop to check for the presence of the zh-pron template doubles the amount of work the parsing script is required to perform. It also more than doubles the amount of content processing, and a basic test of your example page 1000000 shows a time-to-process increase of just over 18x on average. (I could do a formal benchmark if you would like.) The template {{zh-pron}} is not documented. The codes used are not unambiguous, and do not follow a reference standard. What this means is that it cannot be trusted to reliably identify which languages the entry can be used for, nor can any metadata processor future-proof their code.
Using the API to recursively iterate over Category:Cantonese parts of speech would take, I estimate, two or three days. Multiply this for each language which is at least equal in size. This time expense is prohibitive; it is not an option. Additionally, previous parsing for European languages found about 98% of terms were properly categorized; the remaining 1-2% were uncategorized or miscategorized.
Yesterday a new dump of wiktionary was produced, and I'm working on automating a process to update the word lists generated from it. However, I will be recommending that we not use en.Wiktionary data in the future, and instead derive word lists from the wikipedia dumps.
Hi, User:Amgine. I have added some more detailed descriptions of the template {{zh-pron}}. Please forgive me if it is insufficiently detailed and unambiguous; the template itself rests on the assumption that a unified Chinese approach is agreed upon. I am wondering what the outputs for your runs are like? Are they page title lists for entries satisfying a particular criterion, without any page content or history information? If so, there are probably better ways to achieve this. Wyang (talk) 04:27, 2 April 2014 (UTC)Reply
I have, previously, processed en.Wiktionary content for many different purposes ranging from a mediawiki gadget to (as a proof of concept) to various structured dump processing scripts for linguistic research and cross-referencing all wiktionaries to a private corpus. My current project's output is a set of simple term lists, generated by a couple of quick hacks which produce output like this from the 2014/03/28 dump of en.WT.
In short, I manipulate Wiktionary content in many different ways for diverse clients. Unlike many project members, I am aware of how exceptionally relevant en.WT data can be in real-world applications, both inside and out of acadæmia. And how useless.
To answer your question directly regarding the current request, I am creating lists of terms or phrases which are considered vulgar or obscene, and lists of terms which are *not* considered vulgar or obscene which meet the further requirements of being not confusable, single scripted, and/or non-spoofed (invisible || single- or multi-script equivalencies to non-linguistic terms.) This is related to Mediawiki bugs #32695#5309 (primary), #63216, #63217, #62960 (prototypes via GSOC 2014) and of course mw:CAPTCHA. - Amgine/t·e05:35, 2 April 2014 (UTC)Reply
@Amgine Thanks, I see. For the current project, it seems Mandarin.txt is a mix of Simplified Chinese, Traditional Chinese, Latin letters (some with diacritics), numbers, and special symbols. Just a thought: In multi-script cases, you could perhaps use AWB for simple tasks like generating word lists. This is my take on Mandarin.txt (via recursively extracting pages under Category:Mandarin parts of speech three times) and Mandarin_NoSimpTradDistinction.txt (using the second latest en.WT dump, finding mainspace transclusions of {{zh-hanzi-box}} lacking a second parameter). In both cases non-Chinese characters and symbols have been filtered off. These are effectively Written vernacular Chinese wordlists, and the latter is probably good for producing captchas targeted at speakers of Chinese varieties. Wyang (talk) 06:24, 2 April 2014 (UTC)
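One way to do the filtering step is sketched below in Python. It assumes that "Chinese characters" means the CJK Unified Ideographs block plus Extension A, and that a title containing anything else is dropped rather than trimmed; both choices, and the function names, are assumptions of this sketch and not a description of how the lists above were made.

```python
def is_han(ch):
    """True for code points in CJK Unified Ideographs or Extension A (assumed scope)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def han_only(titles):
    """Keep only titles made up entirely of Han characters."""
    return [t for t in titles if t and all(is_han(c) for c in t)]
```

For example, `han_only(["多士", "T恤", "abc", "體育"])` keeps 多士 and 體育 and drops the two titles containing Latin letters.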
Not sure that AWB can run as an unattended event on a *nix server, but I'll ask Reedy about how automatable the process could be. - Amgine/t·e16:23, 2 April 2014 (UTC)Reply
Thank you so much for this and sorry for the confusion and making you work! You seem to be able to do the formatting work with your bot as well for the Japanese. --Anatoli(обсудить/вклад)22:43, 3 April 2014 (UTC)Reply
@Wyang I just ran across this a second time, and realized that Wyangbot is the one doing it: diff for one example. I can't remember the earlier example of where else I've seen this, but just now checking Wyangbot's contribs, I also found diff and diff. Could you look into this? ‑‑ Eiríkr Útlendi │ Tala við mig19:55, 8 April 2014 (UTC)Reply
"rs" is not references but "radical sort". Mandarin entries should now be sorted by numbered pinyin instead- "pint" by an earlier agreement with al Chinese editors. Suffixes "in simplified script"/ "in traditional script" are removed in topical categories. The bot should actually replace rs with pint, IMO. Maybe Wyang wants to do it in stages? --Anatoli(обсудить/вклад)20:11, 8 April 2014 (UTC)Reply
@Atitarev Anatoli, have another look -- Wyangbot is deleting the ===References=== header from some, but not all, Japanese entries that are above a Mandarin section that was edited by the bot. I'm not sure why Wyangbot is only doing this some of the time. ‑‑ Eiríkr Útlendi │ Tala við mig20:14, 8 April 2014 (UTC)Reply
There was an error in the code looking for empty reference sections, and I have fixed this. Sorry and thanks. @Eirikr Could you please have a look at bot edits of the 'references' section of other articles in your watchlist? I will search for other affected articles too once the new en.wikt dump is available in about ten days' time. Thanks. Wyang (talk) 04:27, 9 April 2014 (UTC)
The automated changes were started approximately 24 hours ago. I am automatically re-adding the references header to lines of <references/> not preceded by ===(=)References===(=). I have checked all pages linking to {{ja-kref}}, which is used in the reference section of ~65 pages. All reference headers followed by bulletpointed references are immune from the attack. It seems those are the major types of Japanese references... Wyang (talk) 07:04, 9 April 2014 (UTC)
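The repair described here (restoring a header in front of a bare <references/> line) can be sketched in Python as follows. This is not Wyangbot's code; the function name and the choice to skip blank lines when looking for a preceding header are assumptions of the sketch.

```python
import re

# A level-3 or level-4 References header, e.g. ===References=== or ====References====.
HEADER = re.compile(r"^={3,4}\s*References\s*={3,4}\s*$")


def readd_reference_headers(wikitext, level=3):
    """Insert a References header before each <references/> line that lacks one."""
    out = []
    for line in wikitext.split("\n"):
        if line.strip() == "<references/>":
            # Nearest preceding non-blank line that has already been emitted.
            previous = next((p for p in reversed(out) if p.strip()), "")
            if not HEADER.match(previous.strip()):
                out.append("=" * level + "References" + "=" * level)
        out.append(line)
    return "\n".join(out)
```

Running it twice gives the same result as running it once, which matters for a bot that may revisit a page.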
Categories and sorting
@Wyang I see you have already done some work on this, thanks! Are you able to set the bot to do the following, e.g. 二 (if you're not doing it already)?
1. to add "pint" value to each Mandarin category after a pipe "|", e.g. ]
2. to add "pint" (if missing), e.g. "|sort=er2" to any contexts and labels, e.g. {{temp|context|slang|lang=cmn|sort=er2}}, {{temp|cx|slang|lang=cmn|lang=cmn|sort=er2}}, {{temp|label|slang|lang=cmn|lang=cmn|sort=er2}}.
3. to remove " in simplified script" and " in traditional script" from category name inside entries? It's OK if some categories are red-linked.
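The category parts of this request (a sort key after the pipe, and dropping the script suffix) could be sketched in Python like this. It is only an illustration: the category names in the usage note are hypothetical, the function name is made up, and the sketch overwrites any sort key already present, which a real bot run might not want.

```python
import re

# Category links whose name starts with "cmn:" or "Mandarin " (assumed naming).
CATEGORY = re.compile(r"\[\[Category:((?:cmn:|Mandarin )[^\]|]+)(?:\|[^\]]*)?\]\]")
SCRIPT_SUFFIX = re.compile(r" in (?:simplified|traditional) script$")


def normalise_categories(wikitext, sort_key):
    """Strip the script suffix and set the sort key on Mandarin category links."""
    def fix(match):
        name = SCRIPT_SUFFIX.sub("", match.group(1))
        return "[[Category:%s|%s]]" % (name, sort_key)
    return CATEGORY.sub(fix, wikitext)
```

For instance, with the sort key er2 a link such as `[[Category:cmn:Numbers in simplified script]]` would become `[[Category:cmn:Numbers|er2]]`, while links for other languages are left alone.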
It is doing #3 now. I held off doing the rest because I wasn't sure that this is the best approach. I know it is how it is traditionally done, but there is a lot of duplication involved, and I am sure there is a better way to do this (section DEFAULTSORT) now that Lua is possible. Maybe we can embed some magic in {{zh-hanzi-box}} or {{Pinyin-IPA}} to make it look for the first parameter of Template:Pinyin-IPA. I don't know though. Wyang (talk) 06:51, 9 April 2014 (UTC)
Thank you. I don't know myself, sorry. I've only listed things that I think need to be done. I've been fixing some of them manually. Obviously after conversion, you may get duplications like ] appearing twice. Sorting in categories is a pain in Chinese and Japanese, but without "|pint" the categories will have the first character as the header, e.g. compare 百 with 八 in Category:cmn:Cardinal numbers. --Anatoli(обсудить/вклад) 23:09, 9 April 2014 (UTC)
单一
What's your source on this as a 輕聲? I've never heard it pronounced this way, and all the dictionaries and online sources I check don't indicate it is. ---> Tooironic (talk) 22:33, 11 April 2014 (UTC)Reply
It's a colloquial variant, especially when 體育 is followed by other nouns, as in 體育場, 體育館, 體育頻道, 體育新聞, 體育人生, 體育中心, 體育總局 (http://youtu.be/oou4Mp-6khE?t=3m1s). It is listed in 《大漢俄詞典》: "体育 tǐyu физическое воспитание, физическая культура; спорт; физкультурный, спортивный 體育運動 физкультурное движение, физкультура и спорт 體育比賽 спортивные соревнования". Wyang (talk) 01:10, 12 April 2014 (UTC)
Your bot breaks stuff
Those are using the wrong template. All Korean adjectives and verbs (lemma forms) have to end in 다. I will fix them later. Wyang (talk) 08:51, 22 April 2014 (UTC)Reply
Chinese entries by Lo Ximiendo
Please help me to do some testing because I'll be simplifying this Module by a lot. I'm also thinking about putting the data on a separate page but I don't know enough Lua to be able to do that. --kc_kennylau (talk) 10:23, 22 April 2014 (UTC)
Hi. It is good to see someone willing to tackle that module. My programming experience prior to Lua here is close to non-existent and I'm not familiar with everything in Lua, but I am happy to give whatever help I can regarding these modules. Another reason as to why I haven't got around to doing the rewrite, is that Module:zh is chiefly used substitutively. When you make changes to Module:zh, please make sure that the various functions in it still work correctly like before when called via {{cmn-new}}. Please let me know if I can be of any help. Wyang (talk) 11:37, 22 April 2014 (UTC)Reply
Hi,
Could I ask you for a favour, please? Could you convert all pinyin entries in Category:Mandarin pinyin with diacritics to use {{cmn-pinyin}} with Wyangbot to make single-syllable pinyin be the same as multisyllabic ones, e.g. biào? There are other problems with those entries, though, but that's a first step.
Optionally, if it's not too hard, could you also remove anything (short definitions, descriptions) after {{pinyin reading of}} or brackets? I know the entries are far from perfect. --Anatoli(обсудить/вклад)01:28, 23 April 2014 (UTC)Reply
I don't know this character... Nonetheless, it's added to Module:zh/data now. I sometimes use |p1=... (qí) for characters I don't know (and hence characters that are likely to fail). Wyang (talk) 03:51, 23 April 2014 (UTC)Reply
OK, thanks again. I wasn't sure if you need to know about any missing character. I'll keep using |p1=..., etc. 埼 may be a Japanese invention, even if it's not marked so. BTW, on Chinese Wikipedia I tried to change 栃木 to 枥木/櫪木 (simp/trad) but was corrected. 栃 is a Japanese character (or ancient Chinese), AFAIK but it's now used in Chinese, at least in Wikipedia. --Anatoli(обсудить/вклад)04:41, 23 April 2014 (UTC)Reply
埼 referred to "bent coastline" in Classical Chinese. 栃 is kokuji, not shinjitai. The Chinese-language version of the official government website uses the unchanged character too, therefore the Wikipedia people did not change the title. Wyang (talk) 04:53, 23 April 2014 (UTC)Reply
It's the opposite, as the 'sortkeys' function is to be used in conjunction. 玉 was written like 王 in the seal script and the two radicals were traditionally merged under one radical. Unihan does the same, as does Wiktionary. Wyang (talk) 04:43, 23 April 2014 (UTC)
It was used in a previous version when I tried to generate the content of the Etymology section all in one go. But it's been replaced by compdecompetym. It's gone now. Wyang (talk) 12:37, 24 April 2014 (UTC)Reply
Hi,
If you accept the nomination, could you please edit Wiktionary:Votes/sy-2014-05/User:Wyang for admin and set your languages and the time zone, please? This also has to be on your user page. I believe you also need to make yourself contactable via email but I'm not 100% sure about this. The vote can start after your acceptance or whenever it's edited to be open. Good luck! --Anatoli(обсудить/вклад)11:50, 25 April 2014 (UTC)Reply
Thanks Anatoli. I have accepted the nomination, specified languages and timezone, and enabled email. What should I do next? Wyang (talk) 00:39, 26 April 2014 (UTC)Reply
It's again to do with @kc_kennylau's unfaithful simplification edits of Module:zh and Module:Pinyin-IPA. I have fixed them. To disable varpron, you can replace the character with Pinyin. To Kenny: Not all compounds of characters which are pronounced differently in Mainland and Taiwan should be interpreted by default as variant pronunciations. An example is 发/髮 (not 發). Basically if a character is used in the pronunciation template, it will be interpreted as being pronounced differently across the strait. cmn-new should not keep every character in data/MT, but only a subset (which is now lost in the code). Wyang (talk) 00:20, 26 April 2014 (UTC)Reply
There was originally a leading whitespace at example_transform when the module used iterative word assignment. You rewrote it to use mw.text.split but did not remove the leading whitespace, hence it causes problems when the capitalisation feature appends a '^' at the start of translit. When Anatoli reported that the capitalisation feature failed, I saw that the problem was caused by '^ %l' (instead of '^%l'), but didn't realise what caused that. I have fixed it now. Wyang (talk) 07:55, 3 May 2014 (UTC)Reply
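The failure mode can be reproduced in a few lines of Python. This is an analogue and not the Lua in the module: the function name is hypothetical, and reading '^' as a marker placed in front of the letter to be capitalised is my interpretation of the comment above.

```python
import re


def capitalise_marked(translit):
    """Uppercase a lowercase letter that directly follows a '^' marker, dropping the marker."""
    return re.sub(r"\^([a-z])", lambda m: m.group(1).upper(), translit)


clean = "běijīng"
padded = " běijīng"  # leading whitespace left over from the earlier word-by-word assembly

works = capitalise_marked("^" + clean)            # marker sits right before the letter
fails = capitalise_marked("^" + padded)           # marker is followed by a space, so no match
fixed = capitalise_marked("^" + padded.strip())   # stripping restores the expected shape
```

The middle case is the '^ %l' situation: the marker is still present in the output because the pattern never sees a letter directly after it.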
Thanks. I may have used a comma somewhere. I don't remember now. I'd prefer to have a bit more consistency, since Mandarin uses comma. Too many differences and things to remember. :)
BTW, converting existing Cantonese, Min Nan to ==Chinese== is quite time consuming. I'm not complaining but it will take time. We should try and get more people doing it. @Jamesjiao, @Tooironic, @Kc kennylau do you think you can help a bit? You don't really have to know Cantonese, just need to know what to do and know Mandarin. Some entries are tricky but this diff was quite straightforward and simple edit on 多士, which merged Cantonese and Mandarin into one Chinese entry, which now has both Mandarin and Cantonese pronunciations and categories. --Anatoli(обсудить/вклад)00:18, 5 May 2014 (UTC)Reply
@Tooironic Converting all by hand would be a huge task but it's better to convert varieties (topolects) first, because they duplicate info and it's hard to do by a bot, e.g. starting at Category:Cantonese_nouns (I'm on letter "H") and other parts of speech and a much bigger list - Category:Min Nan nouns (verbs, adjectives, etc.) (only in Hanzi, don't do romanised forms). Please take a look at some complete entries to see how it's done. You can use Wyang's {{zh-new}} for this; "|c=" stands for Cantonese reading (romanisation syllables should have spaces) and |mn= for Min Nan POJ. Wyang might be able to do most of the remaining Mandarin entries by bot. Do only multisyllabic ones for the moment; single-character ones can be done later and they are much more complicated and messy. --Anatoli(обсудить/вклад) 04:33, 5 May 2014 (UTC)
Hi Frank,
I have two questions.
Wikipedia seems to use Pe̍h-ōe-jī and Tâi-lô as synonyms but they are not, apparently. What's the difference and which is more common or standard? Is it POJ?
POJ and TL are two different romanisation schemes. POJ is the more popular one, although the TW government seems to be promoting the latter and the amount of material printed using TL has been increasing.
Hi Wyang, I noticed a problem with the pronunciation template. It seems that 血 comes up automatically as mainland=xuè and Taiwan=xiě (e.g. in the entry I just created for 血運), but actually xuè is the standard pronunciation, while xiě is a (very) common variant in both mainland and Taiwan. Currently the template gives the impression that xuè is only used in mainland, and xiě only used in Taiwan, which is incorrect. Would we be able to fix this? ---> Tooironic (talk) 02:37, 6 May 2014 (UTC)
I would just use xuè,xiě (both equally) with xuè being the more common pronunciation. It's probably complicated, some words may use one or the other only, is that right? --Anatoli(обсудить/вклад)05:22, 6 May 2014 (UTC)Reply
Xiě is definitely the most common pronunciation; very few Chinese pronounce it as xuè. As it stands now the design of the pronunciation header is incorrect in that it assumes that there is regionality when there is none. ---> Tooironic (talk) 10:03, 6 May 2014 (UTC)Reply
I agree that 封建社會 is probably a word. The Chinese love the word 社會 and use it very flexibly, sometimes even creating words in the process. Also, both MOE and 現代漢語規範詞典 list it. ---> Tooironic (talk) 22:48, 6 May 2014 (UTC)Reply
Chinese idioms
Hi Wyang. Was just wondering how Chinese idioms are dealt with under the new system? I tried making a new entry at 厚積薄發 but it didn't turn out well. Any suggestions? ---> Tooironic (talk) 10:00, 6 May 2014 (UTC)Reply
You should have used id instead of idiom. However, I have added codes to adapt to names already, so you're safe to use idiom now. --kc_kennylau (talk) 10:23, 6 May 2014 (UTC)Reply
Look at 唔該: the Yale romanization is not functioning properly because the grave accent cannot be displayed on top of the letter m. Any idea how to fix this?
What parameters should be included in {{zh-noun}}? I feel so awful deleting every detail in the head.
1) This is due to the <tt> formatting. We could remove that, although it wouldn't look as nice typographically. 2) I would go for no parameter at all. Anything included would be duplicative of something that is already present. Wyang (talk) 09:32, 7 May 2014 (UTC)
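A small Python check shows why "m" with a grave accent is more fragile than, say, "a" with one: Unicode has a precomposed à, but "m" plus grave exists only as a base letter followed by a combining mark, so the display depends on how the font in use positions that mark. This is offered as background consistent with the explanation above, not as a diagnosis of the template; the function name is hypothetical.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"


def nfc_length(base):
    """Number of code points left after NFC-normalising base + combining grave."""
    return len(unicodedata.normalize("NFC", base + COMBINING_GRAVE))


a_grave = nfc_length("a")  # composes to the single precomposed character à
m_grave = nfc_length("m")  # stays as two code points: m + U+0300
```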
return your head
Cantonese multisyllabic entries are now converted/merged/fixed to use "Chinese" L2 (every PoS, if they used the proper templates)! Now the turn is for Min Nan - a much larger set. I'm not familiar with Min Nan but I can tread carefully and check, though I may not be able to spot wrong entries - transliteration, senses, etc. Do you think you can run your bot again (I saw you merged Min Nan entries as well)? Min Nan entries seem a bit more complicated than Cantonese, though. --Anatoli(обсудить/вклад) 04:59, 9 May 2014 (UTC)
I'm probably the wrong person to ask for this...I'm against having Mandarin Pinyin in the way they are now, or having any romanised entries at all. Wyang (talk) 05:28, 10 May 2014 (UTC)Reply
Hi, please use {{zh-pron}} instead of {{Pinyin-IPA}}. Mandarin audios have the parameter '|ma=', which works exactly like the parameter '|a=' in Pinyin-IPA. Please see my change on that page. Wyang (talk) 01:29, 10 May 2014 (UTC)Reply
It's a minor issue, but I just made the above change to zh-pron in 4 entries to empty Category:Mandarin con and its sister categories: you evidently told your bot to use "con", and zh-pron didn't recognize it. Fortunately I knew where to find the correct abbreviation, but others won't- so someone will probably make that or similar mistakes as long as there's no list in the documentation. Chuck Entz (talk) 00:21, 12 May 2014 (UTC)Reply
Please if you can. It's all right since the Wu pronunciations are quite unintuitive for anyone not familiar with it. Wyang (talk) 01:11, 12 May 2014 (UTC)Reply
Middle Chinese and Hakka transliterations
Hi Frank,
Is there a way to transliterate Middle Chinese? I've merged 我 but not happy about Middle Chinese (ŋấ, ngɑ̌) and Hakka (a big list with a reference to a dictionary). I'd like to do 好. It also has Middle Chinese transliterations: *xaù, *xǎu and a list of Hakka. Not sure about the best way to add them. --Anatoli(обсудить/вклад)14:41, 13 May 2014 (UTC)Reply
Please use the |mc= parameter in {{zh-pron}}. Please use this page to look up MC pronunciations, the parameter value is "中古声母(1 syl)-中古韵母(1 syl)-中古等(1 or 3 syl, no "等" character)-中古开合(1 syl)-中古摄(1 syl)-中古声调(1 syl)-中古反切(2 syl)". Multiple readings ("后一条") are separated by ",". Please see my edits at 我 and 好. Wyang (talk) 05:01, 14 May 2014 (UTC)Reply
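The shape of the |mc= value described above can be checked mechanically. The Python below is a sketch only: the field names are my own English labels, "syl" is read as "character", and the sample value is an illustrative placeholder in the right shape that should be checked against the lookup page before use.

```python
# Field order as described: initial, final, division, openness, rhyme group, tone, fanqie.
FIELDS = ("initial", "final", "division", "openness", "rhyme_group", "tone", "fanqie")
ALLOWED_LENGTHS = {
    "initial": (1,), "final": (1,), "division": (1, 3), "openness": (1,),
    "rhyme_group": (1,), "tone": (1,), "fanqie": (2,),
}


def parse_mc(value):
    """Split an |mc= value into one dict per reading, validating field lengths."""
    readings = []
    for reading in value.split(","):
        parts = reading.strip().split("-")
        if len(parts) != len(FIELDS):
            raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
        record = dict(zip(FIELDS, parts))
        for name, text in record.items():
            if len(text) not in ALLOWED_LENGTHS[name]:
                raise ValueError("unexpected length for %s: %r" % (name, text))
        readings.append(record)
    return readings


# Illustrative placeholder value, not verified against the lookup page.
SAMPLE_MC = "疑-歌-一-开-果-上-五可"
```

A value with two readings is just two such groups joined by a comma, and `parse_mc` returns one dictionary for each.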
Hi. I did a couple but adding Middle Chinese transliterations seems such a hassle using ] Perhaps, we should just adopt one or two of the transliterations there without extra info? Same with Hakka, actually, perhaps a simple list would do, not sure if every word/character can be found in the used references. --Anatoli(обсудить/вклад)01:29, 20 May 2014 (UTC)Reply
The process of extracting those values can be automated. The "one or two of the transliterations there" are for Old Chinese, not Middle Chinese. Wyang (talk) 01:47, 20 May 2014 (UTC)Reply
Was just wondering if you had a suggestion about how to translate the extended meaning of 備胎? The best I could do was "a possible replacement for one's current partner". It's a terrible translation, but I'm not sure if there is any equivalent for this in English. ---> Tooironic (talk) 04:17, 14 May 2014 (UTC)Reply
Aha, good one. Don't think an exact equivalent exists in English - maybe "a backup", "a second choice", "a just-in-case", "a plan B", "a contingency"? Wyang (talk) 04:26, 14 May 2014 (UTC)Reply
Hi Frank,
Could you check this entry please - specifically word boundaries and Min Nan transliteration? There is some Wu specific grammar and words I don't understand in this usage example. --Anatoli(обсудить/вклад)00:06, 15 May 2014 (UTC)Reply
I have checked Wu. It doesn't seem to be used in Min Nan. Which bit of the grammar do you not understand? 立(站)-辣(在)-窗口頭(窗口前)-額(的)-搿(這)-個-人-是-㑚(你)-經理,對- 𠲎(嗎)? Wyang (talk) 03:13, 15 May 2014 (UTC)Reply
Thanks for adding Mandarin, I understand now. I hoped there is a Min Nan reading, also for 你们, even if the words are not used in Min Nan. Should 窗口頭(窗口前) be split or is it synonymic to 窗口? Also, Qian Nairong says 我 is also pronounced as "whu23" by young people, normally "ngu34", that's "3ngu" and "3hhu", right? Which tone is right? Can I add the alternative "hhu" pronunciation? --Anatoli(обсудить/вклад)03:22, 15 May 2014 (UTC)Reply
When converting topolects to the new format, the most time-consuming step is converting from {{Pinyin-IPA}} to {{zh-pron}}. Could you run a bot to change those on existing Mandarin entries? I don't know if it's hard and if it may cause other problems, though. --Anatoli(обсудить/вклад) 01:18, 16 May 2014 (UTC)
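A bot pass for this conversion might look roughly like the Python below. It rests on several assumptions: that {{Pinyin-IPA}} takes the pinyin as its first parameter and audio as |a=, that {{zh-pron}} takes Mandarin as |m= (an assumption of this sketch) and Mandarin audio as |ma=, and that any call with other parameters should be left untouched for manual review. The function name is hypothetical.

```python
import re

PINYIN_IPA = re.compile(r"\{\{Pinyin-IPA\|([^{}]*)\}\}")


def convert_pinyin_ipa(wikitext):
    """Rewrite simple {{Pinyin-IPA|...}} calls as multi-line {{zh-pron}} calls."""
    def convert(match):
        parts = match.group(1).split("|")
        pinyin, extras = parts[0], parts[1:]
        if "=" in pinyin:
            return match.group(0)  # no positional pinyin: leave unchanged
        audio = None
        for part in extras:
            if part.startswith("a=") and audio is None:
                audio = part[2:]
            else:
                return match.group(0)  # unknown parameter: leave for manual review
        lines = ["{{zh-pron", "|m=" + pinyin]
        if audio:
            lines.append("|ma=" + audio)
        lines.append("}}")
        return "\n".join(lines)
    return PINYIN_IPA.sub(convert, wikitext)
```

Leaving unrecognised calls alone keeps the pass conservative, at the cost of some entries still needing to be converted by hand.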
I'm not sure what you mean. I am only doing it manually (copy/paste) or re-generate with {{zh-new}}, which adds {{zh-pron}}. Can you show, please? --Anatoli(обсудить/вклад)
Wyang is very busy with merging topolects. I'm also hassling him to add Wu pronunciations, which I attempt to do myself. For some topolects without a developed transliteration system it's especially complicated and a transliteration may not even be available. If IPA or a sound recording is found, then it's possible, but this information has to be found. Having said this, we still need a straightforward way to add topolects which are not handled yet, for cases where IPA or a sound recording is found.
In Shanghainese Wu, it follows phrase tone sandhi rules, as its individual parts are evident. It's ka44 hhieu23. Wyang (talk) 02:28, 16 May 2014 (UTC)Reply
I don't hear /ɦ/ either. I've got a little book on Shanghainese. They speak very fast, though, and I don't seem to get Wu sounds well. Here's a nice recording on the site Wyang gave me. --Anatoli(обсудить/вклад) 03:01, 16 May 2014 (UTC)
/ɦ/ is the slight constriction of the glottis in the recording. Apart from the constriction, the presence of 'hh' also causes the tone to be lower when the character is pronounced in isolation. Compare 椅 i and 夷 hhi, as well as 矮 a and 鞋 hha. Null-initial and /ɦ/ are found in complementary distribution, occurring in characters which had voiceless and voiced initials in MC respectively. 油 (you2) had voiced initial in MC, which is why it is tone 2 in Mandarin (阳平) not tone 1 (阴平). 幽 (you1) would have voiceless initial in MC, and its Shanghainese pronunciation would therefore lack 'hh' and be just 'ieu'. Wyang (talk) 04:42, 16 May 2014 (UTC)Reply
Makes a bit of sense but how do you know if it's 阳平 or 阴平 tone? Do you know Middle Chinese pronunciation for these characters? For 油 Wu minidict only shows "yeu" 平/1. So it can be either 1yeu or 3hhieu? --Anatoli(обсудить/вклад)05:23, 16 May 2014 (UTC)Reply
For 油 it is 3hhieu (MD: yeu 平/1, 阳平), and for 幽 it is 1ieu (MD: ieu 平/1, 阴平). You can use the MC pronunciation or other dialectal information. For the level tone it is easy, Mandarin 1st tone = 阴平, 2nd tone = 阳平; Cantonese 1st tone = 阴平, 4th tone = 阳平. So compare: 幽 (M you1, C jau1, W 1ieu), 油 (M you2, C jau4, W 3hhieu). Wyang (talk) 05:28, 16 May 2014 (UTC)Reply
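The cross-dialect check described above could be sketched roughly as follows. This is only an illustration of the level-tone (平) rule stated in this thread; the function name `level_tone_register` and the table layout are hypothetical, and the sketch assumes the character is already known to belong to the 平 category.

```python
# Sketch of the level-tone (平) rule from the comment above.
# Assumption: the character is already known to be a 平 character
# (e.g. from MiniDict's 平 label). The function name is hypothetical.

MANDARIN_LEVEL = {1: "阴平", 2: "阳平"}   # Mandarin tone number -> register
CANTONESE_LEVEL = {1: "阴平", 4: "阳平"}  # Cantonese tone number -> register
WU_TONE = {"阴平": 1, "阳平": 3}          # Shanghainese tone numbers used in this thread


def level_tone_register(mandarin_tone=None, cantonese_tone=None):
    """Return '阴平' or '阳平' for a 平 character, or None if undetermined."""
    guesses = set()
    if mandarin_tone in MANDARIN_LEVEL:
        guesses.add(MANDARIN_LEVEL[mandarin_tone])
    if cantonese_tone in CANTONESE_LEVEL:
        guesses.add(CANTONESE_LEVEL[cantonese_tone])
    # Only answer when the available evidence agrees.
    return guesses.pop() if len(guesses) == 1 else None


for hanzi, mandarin, cantonese in [("幽", 1, 1), ("油", 2, 4)]:
    register = level_tone_register(mandarin, cantonese)
    print(hanzi, register, WU_TONE[register])
```

For the two characters compared above, this gives 阴平 (Wu tone 1) for 幽 and 阳平 (Wu tone 3) for 油. When the Mandarin and Cantonese evidence disagrees, the sketch returns None rather than guessing.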
So, you basically can use Mandarin pronunciation + 平/去/入 from MD to determine the tone of isolated hanzi? I was only relying on MD for tones when I couldn't use Qian's book. --Anatoli(обсудить/вклад)05:34, 16 May 2014 (UTC)Reply
You don't need to use Mandarin. The voicedness of the initial and 平上去入 is enough for knowing which tonal category the character belongs to in Shanghainese. MiniDict's 'y' is 'hhi', so for the voiced initial 'hh', the tonal category of 油 is tone 3 (voiced, 平, i.e. light level). Wyang (talk) 05:37, 16 May 2014 (UTC)Reply
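As a rough sketch, the rule "voicedness of the initial plus 平上去入" can be written as a lookup table. Only the two 平 rows (tone 1 for voiceless, tone 3 for voiced) come from this thread; the remaining rows are an assumption based on the common five-tone analysis of urban Shanghainese and should be checked before being relied on. The names `SHANGHAINESE_TONE` and `shanghainese_tone` are hypothetical.

```python
# Keys are (MC tone category, voiced initial?) and values are Shanghainese tone numbers.
# From this thread: voiceless 平 -> 1, voiced 平 -> 3.
# ASSUMPTION (not stated in the thread): the other rows follow the usual
# five-tone analysis, i.e. 2 = 阴去, 3 = 阳去, 4 = 阴入, 5 = 阳入.
SHANGHAINESE_TONE = {
    ("平", False): 1,
    ("平", True): 3,
    ("上", False): 2,
    ("上", True): 3,
    ("去", False): 2,
    ("去", True): 3,
    ("入", False): 4,
    ("入", True): 5,
}


def shanghainese_tone(category, voiced_initial):
    """Look up the tone number for an isolated character."""
    return SHANGHAINESE_TONE[(category, voiced_initial)]


# 油: MiniDict 'yeu' 平, and 'y' corresponds to voiced 'hhi'.
print(shanghainese_tone("平", True))   # 3
# 幽: MiniDict 'ieu' 平, with no 'hh', so a voiceless (null) initial.
print(shanghainese_tone("平", False))  # 1
```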
I still find it hard to convert what I find in wu-minidict to what you have described. I'm not giving up, but it's kind of difficult to combine learning and editing. Even when I get an audio file of Shanghainese words, I can currently pick up only some tones, and phrasal tones make little sense to me. I'm more or less comfortable with reproducing and picking up Mandarin tones, but I never really bothered with IPA, since I used pinyin and characters. I'm still a bit uncomfortable with the numbers used to represent tones in IPA, but I'm getting a better understanding. My exposure to Cantonese is much shorter; I used lessons and listened to recordings, but I'm not comfortable with Cantonese tones. Still, Cantonese doesn't sound as alien as Shanghainese, and my former Chinese classmates taught me some too. After the merger, I'll do a bit more Shanghainese. Sorry for bugging you about transliterations, and thank you very much for your help. If it's not a burden, I'll keep adding words to my list of words to transliterate in Wu.
On the topic of Xiang, Gan, etc.: since there's so little documentation and no standard or official transliteration, are we going to handle those at all? If yes, in what way? Currently, there's almost nothing in Wiktionary outside Mandarin and the major popular topolects (Cantonese, Min Nan, Wu and Hakka). What if there's a sourced audio recording or IPA in Xiang Chinese? Can we have a simple framework for those? E.g. something as simple as x=IPA, such as /siɔ̃44 ny31/, etc. in 湘语#Pronunciation? Just a thought. --Anatoli(обсудить/вклад)01:50, 20 May 2014 (UTC)Reply
Shanghainese is a bit unusual among Chinese dialects. It arose as sort of a creole of different Wu and Mandarin dialects in the past century, which is why its phonology is a lot simplified compared with the neighbouring dialects. Its tone system is on the verge of breakdown (or from another perspective, on the path to a pitch accent system), and there is so much homophony and multisyllabification. For example, the listener wouldn't know whether the person who said 我买/卖过汽车 has the experience of buying or selling a car. No worries about the transliteration checks.
Tones are hard to get used to, especially when there are too many of them in the language.
With regard to the other groups, the only one with some printed romanisation material would probably be Min Dong. The romanisation is Foochow Romanized or "Bàng-uâ-cê" (same characters as POJ). However, the phonology of Min Dong is notoriously difficult, arguably the hardest in theory among Chinese dialects. There are complex sandhi rules not only for tones, but for initials and finals (!) as well (see how Fuzhou dialect#Rimes has two sets of values for each rime). Luckily I had some exposure to it before. The amount of printed material using that romanisation is meagre, although I am looking for ways of obtaining those materials either electronically or in print.
The other ones - I would just set the parameter |x=, |g= to IPA. The parameter will be passed to a function which converts numbers to superscripts: x=siɔ̃44 ny31. Audios can be added using |xa=, |ga=; see 中国. Wyang (talk) 02:22, 20 May 2014 (UTC)Reply
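The number-to-superscript conversion mentioned above might look something like this. It is a minimal stand-in, not the function actually used by {{zh-pron}} (which is not shown in this thread); the name `superscript_tones` is hypothetical.

```python
# Map ASCII digits to their Unicode superscript counterparts.
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def superscript_tones(ipa):
    """Turn tone digits such as '44' into superscripts, leaving other characters alone."""
    return ipa.translate(SUPERSCRIPT_DIGITS)


print(superscript_tones("siɔ̃44 ny31"))  # siɔ̃⁴⁴ ny³¹
```

A plain character-by-character translation is enough here because tone numbers are the only digits expected in the parameter value.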
It's a shame Wu/Shanghainese has so few resources. The site you gave me - doesn't use consistent spelling and there's so little about grammar. Min Dong seems scary, and there must be very little written in this dialect, or only in Roman letters. Another problem is that dialectal words may not pass RFV if they only appear in chats and dubious websites, and the pronunciation/transliteration provided is amateurish or otherwise incompatible with the way we write IPA/transliteration here. So some dialects, even big ones, may miss out completely. --Anatoli(обсудить/вклад)02:36, 20 May 2014 (UTC)Reply
It's not easy to access them. I don't see myself mass-adding entries in smaller dialects; I may become more comfortable with Wu later, and Min Nan and Cantonese are available enough. I think we should create a simple enough framework, though (like you said, x=IPA(x)). Please also answer my question above about the format of Xiang IPA, or let me know if you're undecided yet. --Anatoli(обсудить/вклад)03:50, 20 May 2014 (UTC)Reply
Categorised now. Gan, Jin and Xiang promoted (it looks a bit weird though, having a mix of romanisations and IPA). Wyang (talk) 04:37, 20 May 2014 (UTC)Reply
Thank you. I think it looks OK given the lack of romanisation and because there could be multiple IPA transcriptions for other varieties. We can document it later.
Without actually suppressing any dialect, there should probably be a technical limit on what can go into {{zh-pron}} and what can be added to PoS categories. What if a small regional entry with a pronunciation is added by a contributor, e.g. Sichuanese Mandarin 横顺 (huan2 sen1) (=反正), or one from an even smaller, less known dialect? Wiktionary's principle is all words in all languages, though. What do you think? --Anatoli(обсудить/вклад)04:56, 20 May 2014 (UTC)Reply
Good job! There are some multisyllabic adjectives, interjections, pronouns and prepositions. I have just cleaned a few proper nouns. Well, when all varieties are done, you can do Mandarin? --Anatoli(обсудить/вклад)08:45, 20 May 2014 (UTC)Reply
You are right... For some reason I erroneously filtered some articles off the list. I'm now generating a still-to-do list from the dump, and I'm probably looking at >100 pages here. Wyang (talk) 10:52, 20 May 2014 (UTC)Reply
Could you run your AWB again, please? It's just not efficient to do it manually.
I have saved the file but I have no idea how AWB works and I don't have it. You'd probably have to spend much time explaining. Tomorrow's fine or any other time, as long as you're planning to do it. --Anatoli(обсудить/вклад)13:28, 21 May 2014 (UTC)Reply
At any rate, if you would like to learn to use it any time, I'm more than happy to help. All you do is download it, put the file in (File > Open Settings), log in (File > Log in/Profiles > Add), and run (Start > Start). I will do some when I have time. Wyang (talk) 23:25, 21 May 2014 (UTC)Reply
Sorting
Latest comment: 10 years ago2 comments2 people in discussion
Just bringing this to your attention. All these entries have come up with "(At least one of the forms in the hanzi box is uncreated...)" at the top of the page. ---> Tooironic (talk) 04:29, 21 May 2014 (UTC)Reply
It goes away if you save the page with an empty edit. It's a server lag problem. I'm using my bot to do null edits on these, so it should go away soon. Wyang (talk) 04:32, 21 May 2014 (UTC)Reply
zh-usex
Latest comment: 10 years ago7 comments2 people in discussion
Frank, could you please edit the entry yourself, specifically the Wu transliteration, and perhaps add some usage examples? Just one of them is okay - 個. I'll fix the other one (trad./simp.).
I have added a few Wu entries without updating the checklist. Some are from the Wu dictionary (astronomy, weather), so I have some confidence in the tones, but the initials/consonants may need checking, although the IPA generated looked similar (though not identical) to that of the Wiktionary methods you designed. I also used existing verified entries for reference. I can't easily access the dictionary, though, so other entries still need more attention, both for tones and for the rest. Would you prefer me to add any new Wu entry to the checklist? Thanks for regularly checking it! It's really helpful.
I'd like to do more Wu, and I'd appreciate it if you checked my edits. The more entries we have, the easier it gets to add more content.
I'll leave the remaining work on the topolect merger to you, since you're better equipped with tools and skills (there are still multisyllabic entries remaining, but you need to update your list, since I have done a few). I will work gradually on single-character entries, though; they probably can't be done automatically?
I don't have a strong preference for this... To me they are just different ways of looking at the phonotactics. I prefer /y/, as I think there isn't a semivocalic component that is worth notating, but that might just be my idiosyncrasy. If you change it, make sure you change /i/ and /u/ as well. Wyang (talk) 10:00, 23 May 2014 (UTC)Reply