October was a busy month, and the issue of the month reflects the online and offline activities well with the presentation of a printed dictionary based on the French Wiktionary, the report of a weekend of contribution to the French Wiktionary in Grenoble, a significant improvement in the help on definition writing, Vongnes, the opinion of two newly accessible online dictionaries, a dozen briefs, lots of statistics and a lot of nuts.
This issue was written by eleven people and was translated for you by me. This translation can still be improved by readers (wiki-spirit). We hope you enjoy reading it, and we'll be happy to answer any questions you may have about our publication or the articles in it. Pamputt (talk) 17:47, 2 November 2019 (UTC)
Yeah! It has been a while since the last issue we translated into English for you! This one is pretty unique, with our first two days of pure contributions with 15 people and a great nut pie. Also, a published dictionary based on Wiktionary is something rare. I hope you will find this reading very interesting. Noé 18:29, 2 November 2019 (UTC)
Yeah, I know. I don't get why the floating div does not jump below the other div. I may have used imperfect CSS, or MediaWiki is not really fond of this kind of display trick. Noé 10:18, 3 November 2019 (UTC)
Community Wishlist 2020
The 2020 Community Wishlist Survey is now open! This survey is the process where communities decide what the Community Tech team should work on over the next year. We encourage everyone to submit proposals until the deadline on November 11, 2019, or comment on other proposals to help make them better.
This year, we’re exclusively focusing on smaller projects (i.e., Wikibooks, Wiktionary, Wikiquote, Wikisource, Wikiversity, Wikispecies, Wikivoyage, and Wikinews). We want to help these projects and provide meaningful improvements to diverse communities. If you’re a member of any of these projects, please participate in the survey! To submit proposals, see the guidelines on the survey page. You can write proposals in any language, and we will translate them for you. Thank you, and we look forward to seeing your proposals!
Hey, only five days left and there are very few proposals for Wiktionary. I am quite sad, because I have plenty of ideas. So here are some, up for adoption:
Problem: Some scripts are not activated because they are heavy, and new contributors may not find out how to pimp their interface so that they can contribute easily. Proposed solution: Having a Reading mode and a Contribution mode with dedicated and open parameters, including specific gadgets and CSS. Posted by Noé
Problem: We don't have decent statistics for most Wiktionary projects. Proposed solution: Having better metrics, such as the ones we have in French Wiktionary: example counts, picture counts, the number of nouns and adjectives, how many people have contributed to the thesaurus in the past months, and so on. Posted by Lyokoï
Problem: There is no reuse of Wiktionary content in electronic readers. Proposed solution: Having an app to use Wiktionary as a dictionary in reading apps. Posted by DaraDaraDara
Problem: There is very little reuse of Wiktionary content because the dump doesn't fit the needs. Proposed solution: Having more export formats, including set-ups to select a set of languages or the set of the 50k most viewed pages. Posted by Lyokoï
Problem: Wiktionaries offer definitions, but there are many ways to describe a word, and it is complicated to point to other dictionaries with other definitions. Proposed solution: Having a tab with automatic import of dictionary entries transcluded from Wikisource. Posted by DaraDaraDara
Problem: The search engine is not made for dictionary needs. Proposed solution: Having an internal anagram and advanced search such as the one we have in French Wiktionary, to find anagrams but also words with a specific sound, or with a sequence of letters and a grammatical class, for example. Posted by Lyokoï
Problem: We are not capable of detecting all new words in the press. Proposed solution: Having a tool that records words in newspapers that are missing from our projects. Posted by DaraDaraDara
Problem: Have you ever tried VisualEditor on Wiktionary? It works quite badly. Proposed solution: Having a visual editor adapted for Wiktionary. Posted by Romainbehar
Problem: If I want to select a sample of definitions to learn them, it is quite difficult and boring now. Proposed solution: Having a tool to create a list of entries that is easily exportable, with selected information, to help memorize or learn a language.
Problem: Little discrimination between definitions by frequency of current use. Proposed solution: KWIC indexing of very large corpora to use collocations of words, synonym groups, and other semantic groupings to identify potential meanings in current use and their current frequency of use. DCDuring (talk) 14:41, 8 November 2019 (UTC)
That's something bigger than my most substantial proposals, something that could be handled by an external research team maybe. I think a grant for a scholar could be a good way to develop that, similar to Esther's project Etytree. Well, please post it on Meta; it may reach someone able to do it. Noé 15:02, 8 November 2019 (UTC)
"At that time, there were 4046 languages on Wiktionary." Manifestly inaccurate?
The statement "At that time, there were 4046 languages on Wiktionary." appears on this page. However, some Wiktionary editors say that 'Translingual' is not a language, and I of course agree. I think the statement needs to be modified to something more like, "At that time, there were 4046 language headers on Wiktionary." I know you may not like it, but I want to make sure we are honest here. If the dictionary's homepage has a link to an obviously mistaken sentence like that, then what credibility does the website have? (I discussed this with Mnemosientje on the user's page.) I'm trying to keep you all honest here, please don't take offense. --Geographyinitiative (talk) 15:08, 5 November 2019 (UTC)
Normal people care more about languages than language headers. Why not just say "At that time, there were 4045 languages on Wiktionary."? In a discussion of the distinction between, say, language and dialect "Translingual" is just a distraction. DCDuring (talk) 15:34, 5 November 2019 (UTC)
Whether or not you like me or want to make fun of me, the website is spouting manifestly objectively mistaken information as far as I can tell. Of course normal people care more about languages than 'language headers'. But these normal people you are referring to also want accurate information, don't they? Or do normal people just want to pretend Wikipedia has 4046 languages and not know what the actual facts are? The only objective thing we can really say is that there are 4046 language headers as far as I can tell (I may be wrong). If you want to say Chinese is a language, you need to change this page, where Chinese is not considered a language. It's kind of a concept similar to 'Translingual'. "Chinese is a group of related, but in many cases mutually unintelligible, language varieties, forming the Sinitic branch of the Sino-Tibetan language family." "There is no unique "Chinese language". There is a group of related ways of speaking, which some may call dialects, others call "topolects" (a calque of Chinese 方言, fāngyán; DeFrancis uses the term "regionalects"), and still others would regard as separate languages, many of which are not mutually intelligible." If normal people want lies, then count me out. --Geographyinitiative (talk) 16:05, 5 November 2019 (UTC)
I’ll delay this month’s statistics update until a consensus is reached or the discussion fizzles out. Could someone send me a ping once we’re done here? Thanks in advance. — Ungoliant (falai) 16:18, 5 November 2019 (UTC)
It seems clear from their comments that DCDuring & Mnemosientje (and myself of course) don't see the 'Translingual' language header as a header for any 'language' per se, so the statement "At that time, there were 4046 languages on Wiktionary." is already seen as dubious or maybe even erroneous by community members. Updating the page using its current form as a template may be immoral or unethical due to this and other related problems of factual accuracy on the page, so I understand your reaction. --Geographyinitiative (talk) 22:58, 5 November 2019 (UTC)
It should probably say "unique language headers". Because of typos etc. there're also likely some errors in there, so I wouldn't freak out about a count difference of +/- 1. – Jberkel 23:38, 5 November 2019 (UTC)
@Equinox Misrepresenting the number of languages on Wiktionary is a grave error. What is more likely, that "Translingual" is a language, or that we are satisfied in our position/privilege in society and just don't care enough about the misrepresentation that was going on and want to meekly allow a factually mistaken statement to persist on the website? Your attitude (as I am perceiving it) of nonchalance and not caring if the statistic is accurate is what ruins the statistic. I'm the one doing the right thing and you're advocating for nonsense as far as I can tell (personal opinion). --Geographyinitiative (talk) 01:28, 6 November 2019 (UTC)(modified)
Misrepresentation of the concept of Translingualism as a language is just silliness of course. Misrepresentation of Chinese language as one language is not a nuanced academic perspective. My edit (languages->language headers) is not really that good, but the inertia of the status quo is very strong as we can see from the attacks against my motives. 'something is rotten in the state of Denmark' --Geographyinitiative (talk) 01:44, 6 November 2019 (UTC)
I am here to make a dictionary, and why would the dictionary need to incorporate blatantly inaccurate statistical information? Have mercy on the people we are misinforming. --Geographyinitiative (talk) 01:52, 6 November 2019 (UTC)
GI, take some advice. If you're here to make a dictionary, then make it. This kind of thing really doesn't matter — all it does is waste your own time. Remember: the dictionary is the goal, and the rest (including the main page) is nice, but ultimately orthogonal. —Μετάknowledge discuss/deeds 02:07, 6 November 2019 (UTC)
A simple thing would be to sample the languages with very small numbers of entries and find out how many are problematic. That way we could:
have a fairly high-quality estimate of the total number of languages to justify any claims we make in places where non- and low-volume-contributors might be misled and
determine whether there is a problem worth being addressed by all of us persons of privilege.
We might also be able to determine how to manipulate the process to ensure the continuation of our world domination. DCDuring (talk) 02:38, 6 November 2019 (UTC)
Secure in our world domination, I will use my position of privilege to make blatantly inaccurate assertions about some characteristics of the list of languages created by my similarly privileged henchman Ungoliant.
About 45% of the languages have a single entry, usually for their term for water. Only about 15% of the languages have more than 100 entries. About 5% of the languages have names of the form 'Proto-'. None of this should come as a shock to any regular contributor here. An outsider might not expect that so many languages would have only one entry. Perhaps it would deserve mention that a small number of contributors worked hard to accumulate the words for water in as many languages as they could, incidentally thereby testing the technical limits of our software architecture. Perhaps it would be nice to note how many reconstructed, dead, and endangered languages are on our list. But I don't see how any of this rises to make the data as it is now 'manifestly' and 'blatantly' 'inaccurate', 'unethical', and 'immoral'. In fact none of those adjectives and adverbs seem close to appropriate. All efforts to collect real-world data about human behavior inevitably are faced with conceptual problems like what is a language (vs a dialect)?, should reconstructed languages count?, etc. I hope no one thinks that corporate accounting data or national economic statistics are much better than the statistics we use to encourage and monitor progress.
OTOH, we are getting near the end of the year, so manipulating statistics and suppressing dissent may help me get another big bonus, a raise, and stock options. DCDuring (talk) 03:31, 6 November 2019 (UTC)
More hyperbole. The boundary between languages and dialects is very hard to define. If you asked 100 linguists, you'd get 200 guesstimates of how many languages we have. The difference between Chinese as one language and as a family is a mere rounding error. This is nothing but another excuse for you to crank up the drama over an arbitrary choice made for practical reasons. Chuck Entz (talk) 04:00, 6 November 2019 (UTC)
Wiktionary uses a weird definition of a language: an entity that has a code and a "canonical name" assigned to it in one of the submodules of Module:languages. Under that definition, we have 8092 languages. About half of them have entries (pages in the main and Reconstruction namespaces, and some Appendix pages, in which the canonical name of the language is a level-2 header), and some of them, like Translingual and Chinese, might not be what a normal person, or even a linguist, would call a language.
So it's just a weird convenient practical definition of "language" for the purposes of pages in this wiki. It seems unproductive and beside the point to argue about whether it's utterly unacceptably misleading. It's just a stupid internal convention we have to call these things "languages". It's not meant as a serious contribution in the long history of arguments among linguists and philosophers of language and nationalists and whoever else over what "language" is.
So I support "language" being changed to "language header", or something else, in Wiktionary:Statistics/generated if it seems somewhat clearer and less likely to be misunderstood, and particularly if it will cut short unfruitful discussions. — Eru·tuon 03:48, 6 November 2019 (UTC)
Regardless of how the “what” of the large number gets named, to give a more meaningful statistic we ought to add something along the lines of “, although only 155 languages are represented by 1000 or more entries”. (A better statistic is the number of lemmas; is that not available?) --Lambiam 09:14, 6 November 2019 (UTC)
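The kind of breakdown discussed above (share of single-entry languages, share with more than 100 entries, number with 1000 or more) can be sketched as follows. This is only an illustration: the function name and the input mapping are hypothetical, and the sample figures are made up rather than taken from the actual statistics page.

```python
from typing import Dict, Union


def summarize_entry_counts(entries_per_language: Dict[str, int]) -> Dict[str, Union[int, float]]:
    """Summarize how entries are distributed across language headers.

    `entries_per_language` is a hypothetical mapping from a language
    header to its number of entries; it is not a real Wiktionary API.
    """
    total = len(entries_per_language)
    if total == 0:
        raise ValueError("no languages given")
    counts = list(entries_per_language.values())
    return {
        "languages": total,
        # Fraction of languages with exactly one entry.
        "share_single_entry": sum(1 for c in counts if c == 1) / total,
        # Fraction of languages with more than 100 entries.
        "share_over_100": sum(1 for c in counts if c > 100) / total,
        # Absolute number of languages with 1000 or more entries.
        "languages_with_1000_plus": sum(1 for c in counts if c >= 1000),
    }


# Made-up sample data, for illustration only.
sample = {"A": 1, "B": 1, "C": 150, "D": 2000}
summary = summarize_entry_counts(sample)
```

Whether "Translingual" or "Chinese" should be in the input at all is exactly the question debated above; the sketch only counts whatever headers it is given.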
I'm a little skiddish (as I pronounce it) about getting blocked, so I will leave this to greater minds for now- but I'm pretty sure I have pointed out a major error of wording on that page. Good luck --Geographyinitiative (talk) 11:10, 6 November 2019 (UTC)
{{surname}} automatically adds a final period to its output unless |dot= or |nodot= is specified. {{given name}} does not add a final period. I propose making {{surname}} not add a period, for consistency. I would use a bot to add a final period wherever dot= and nodot= aren't present, to preserve the same output. Benwing2 (talk) 03:20, 6 November 2019 (UTC)
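The bot pass described above might look roughly like the sketch below. It is a sketch under assumptions, not the actual bot: the function name is hypothetical, it only handles {{surname}} calls without nested templates, and it leaves calls that already carry dot= or nodot= untouched, since the proposal does not say how those would be migrated.

```python
import re

# Matches {{surname}} or {{surname|...}} as long as the parameters
# contain no nested braces (a simplifying assumption).
SURNAME_CALL = re.compile(r"\{\{surname((?:\|[^{}]*)?)\}\}")


def add_explicit_period(wikitext: str) -> str:
    """Append '.' after {{surname}} calls that lack dot= and nodot=."""

    def fix(match: "re.Match[str]") -> str:
        params = match.group(1).split("|")[1:]
        names = {p.split("=", 1)[0].strip() for p in params if "=" in p}
        if names & {"dot", "nodot"}:
            # Explicit punctuation control is present: leave untouched.
            return match.group(0)
        return match.group(0) + "."

    return SURNAME_CALL.sub(fix, wikitext)
```

Note that the function is not idempotent: running it twice over the same text would add a second period, so a real run would need to process each page exactly once.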
Pali Masculines in -ī - Citation Form?
Pali masculine nouns traditionally considered as nouns in -ī, such as sāmī, are historically consonant stems in -in; the PTS dictionary cites the word as sāmin, which is not a case form, whereas sāmī is a case form. We currently don't have many such words.
What has created an issue is that @Lbdñk has added sāmin as an alternative form, and the descendant entry {{desctree|pi|sāmī}} in Sanskrit स्वामिन् (svāmin) is therefore listing both forms. I don't believe that double listing is correct - they are just alternative citation forms.
What should we do? I see three possibilities.
List sāmin as an alternative citation form in the entry for sāmī, and add an entry for sāmin as an 'inflected' form.
Give sāmin its own entry, referring to sāmī for inflection, and don't mention it on the sāmī page, as I have done with pitu and pitar.
Change the citation form to sāmin, and demote the pages for sāmī and its attested equivalents in other scripts (including as yet unattested cases where the spelling is used in other scripts for a different word, perhaps in a different language). This will need a bit of fiddling with the inflection templates, but isn't anything that isn't already being done for Pali declension.
@RichardW57: Sorry for my carelessness and the inconvenience I have caused. I also agree that sāmī and sāmin should not be listed on the same page with one called an alternative form of the other. Well, I did not understand what you mean by "case form". Also, sāmin is not there in the declension table of sāmī as an inflected form (as you have mentioned). At any rate, sāmin is the more historical representation of the word: Turner also mentions this form. So it is better that sāmin be created as the main entry; in fact, it is स्वामिन् (svāmin), not स्वामी (svāmī), that is the lemma form for the Sanskrit word. You may go ahead as you wish. —Lbdñk (talk) 14:07, 7 November 2019 (UTC)
@Lbdñk: sāmī is a 'case form' because it is the nominative singular and the short nominative, vocative and accusative plural.
Your etymological argument hits a problem with the neuter of adjectives in -in. They are declined in exactly the same way as the etymologically distinct category of neuter nouns in -i. Besides, we are documenting Pali, not its sister Sanskrit.
Another issue is the citation forms in other scripts. A Thai dictionary will give the citation form in -ī, not in -in. Indeed, in the (Thai) Royal Institute Dictionary, the sources of the Thai word บาบี are given as Pali ปาปี (pāpī) and Sanskrit ปาปินฺ (pāpin) (corresponding to pāpī and pāpin respectively), even though you argue for the same citation form. Given that Pali and Sanskrit are both written in the Thai script, should one be able to look up the cited Pali source form directly? It's not impossible to choose different citation forms for different scripts, but it will complicate the template, or require lots of manual overrides.
We already have the issue that for Pali stems in -ant, the font urged on us by the web pages cannot render the Burmese script forms! (See, for example pacant.) --RichardW57 (talk) 20:04, 7 November 2019 (UTC)
I never see Pali words ending with -n in any source. Pali is not directly derived from Sanskrit, but from Prakrit in between. Pali has its own grammar and declension, which do not refer to Sanskrit. So we should not "Sanskritize" Pali. --Octahedron80 (talk) 00:57, 8 November 2019 (UTC)
Using forms that aren't words as the citation 'forms' is a bit unusual. Some words with stems in -ar were entered by their stems so that the declension generator would work (in Latin), and @AryamanA insisted that bhagavant be the lemma for bhagavā. Building on that, the declension templates now work with neuters in -as and -an, and rājan is the lemma for rājā. None of these lemmas occur as actual words. One consequence is that we have little precedent for handling alternative citation forms. For Latin verbs, the problem can be handled by making sure that both the possible citation forms, the first singular present indicative and the infinitive, have entries. --RichardW57 (talk) 17:35, 8 November 2019 (UTC)
@RichardW57, Octahedron80: Using the stem form as the lemma is a well-established practice for all of the Prakrits and Pali, not only Sanskrit. These words have unusual declension and the stem form is useful in deriving them, while the nominative singular can sometimes be ambiguous. I agree that sāmin should be the lemma, but Richard is right to point out that the nominative singular ought to have an entry for convenience. —AryamanA (मुझसे बात करें • योगदान) 14:22, 13 November 2019 (UTC)
I agree too. For Sanskrit, we list nominative singulars as inflected forms and use the stem as the lemma form, e.g. at अर्धपथः (ardhapathaḥ), अंशकरणः (aṃśakaraṇaḥ), and दाता (dātā). I see no reason not to do the same for Pali. —Mahāgaja · talk 15:50, 13 November 2019 (UTC)
Can one write Sanskrit stems in Sanskrit? (The Sanskrit Burmese script rendering problem seems to be a result of a misspelling - virama instead of asat at the end of a word. Like our Pali script conversion, you'll still hit a rendering problem with -nt stems.) The Pali Pali stems don't end in normal consonants in Pali - the Pali stems in -nt end in -ntu in Pali! We're not sure how to write consonant stems as stems. --RichardW57 (talk) 21:11, 13 November 2019 (UTC)
Expressing a Pali masculine stem as -in offers no advantage - masculine and neuter stems in -i, -u and -ū all use -n- to separate some case endings from the stem vowel. (I'm not aware of any spelling issues with this declension, whereas there are with -a stems.) Or will you insist that regular masculines in -ū are actually in -un because they have -n- in the NVA plural but sayambhū doesn't? --RichardW57 (talk) 21:11, 13 November 2019 (UTC)
The word minute, as a period of time, means “sixty seconds”. But when you hear someone say, “Hang on a minute please”, they do not beseech you to wait for sixty seconds. It just means, “a (relatively) short period”. The meaning “sixty seconds” is the strict sense of the term. However, in the idiomatic use in a request, that meaning is relaxed, so then it is used in a loose sense. Thesaurus.com gives strict as one of the antonyms of loose. You can also contrast “strict” morals with “loose” morals. --Lambiam 20:14, 7 November 2019 (UTC)
Help with creating wanted topical categories
I'm looking for help creating topical categories listed in Special:WantedCategories. I have a bot that automatically creates all categories that can be created using {{auto cat}}; the remaining topical categories on the list are for topics that don't currently exist. The full list of such categories, grouped by topic and sorted by the number of wanted categories for each topic, is here: User:Benwing2/wanted-topical-categories-2019-nov-7. For convenience, following are the top 50 wanted topics:
As can be seen even from this list, a lot of the wanted categories relate to plants; I think User:DCDuring is the plant expert here. A lot of the categories also relate to geographic entities of various sorts; it looks like User:Carl Francis is responsible for the Cebuano entries, and User:TagaSanPedroAko for the Tagalog entries.
BTW some of the topics above are clearly erroneous; the way to fix them is to click on the category/categories in question and edit the pages that are calling for those categories. Benwing2 (talk) 20:05, 9 November 2019 (UTC)
Chuck Entz has been creating topical categories rather diligently for quite some time, which is why there are so few (2,167). Among "Wanted" things, the more serious problem is wanted pages, for which the last item is a Chinese term displayed as a redlink on 22 pages. Wanted templates is cluttered by thousands of wanted "tracking templates". Wanted files is cluttered by all the wikicommons files that show as wanted because they do not reside on this wiki. DCDuring (talk) 20:22, 9 November 2019 (UTC)
@DCDuring I could write a script that would go through the most recent dump and find all the nonexistent templates and other pages linked to, and count them up, if that would help. This would necessarily skip all templates and pages that are generated by other templates or by Lua code, which would automatically skip all the tracking categories. Benwing2 (talk) 20:48, 9 November 2019 (UTC)
What might be useful would be a listing of the redlinked entries, sorted by language, preferably with the number of redlinks. They could be linked to from the About pages for the associated languages. If we can't get all the mainspace redlinks, then getting all the ones from templates like {{l}} and {{m}} would be a good start. DCDuring (talk) 00:56, 10 November 2019 (UTC)
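A listing of that sort could start from something like the following sketch, which counts missing link targets per language code for {{l}} and {{m}} calls only. The function name and both inputs (an iterable of page title and wikitext pairs, and a set of existing page titles) are hypothetical stand-ins for whatever a dump-processing script would provide. It also assumes that a page's existence is good enough, although an existing page may still lack the relevant language section.

```python
import re
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

# Captures the language code and the first positional target of
# {{l|LANG|TARGET...}} and {{m|LANG|TARGET...}}.
LINK_TEMPLATE = re.compile(r"\{\{(?:l|m)\|([^|{}]+)\|([^|{}]+)")


def count_redlinks(
    pages: Iterable[Tuple[str, str]],
    existing_titles: Set[str],
) -> DefaultDict[str, Counter]:
    """Return, per language code, a Counter of missing link targets."""
    wanted: DefaultDict[str, Counter] = defaultdict(Counter)
    for _title, wikitext in pages:
        for lang, target in LINK_TEMPLATE.findall(wikitext):
            target = target.strip()
            if target not in existing_titles:
                wanted[lang.strip()][target] += 1
    return wanted


# Illustrative sample input, not real dump data.
pages = [
    ("a", "{{l|sw|maji}} and {{m|sw|kitabu}}"),
    ("b", "{{l|sw|kitabu}} {{l|en|water}}"),
]
wanted = count_redlinks(pages, existing_titles={"water", "maji"})
```

Calling `wanted["sw"].most_common()` would then give the missing targets for one language, sorted by number of redlinks, which is the shape of list suggested for the About pages.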
The plant categories should be empty now. Most of them were either categories I never created because they would never have more than one or two entries in any language, or categories where I used a common name rather than the taxonomic name. In one case I changed the category to "Plants" rather than creating "Aquifoliales order plants" to accommodate a single term in a single language that wouldn't belong in "Hollies". Then I would have been faced with the choice of either having a parent category that's superfluous in all but one language or to merge the intuitive and user-friendly "Hollies" into "Aquifoliales order plants", which is neither. Chuck Entz (talk) 21:04, 9 November 2019 (UTC)
“Hollies” might be intuitive and user-friendly but the whole picture of having English vernacular names on some places but taxonomical translingual designations at other places is not, and maybe people do not want it intuitive in the first place but want to apply categories mechanically – everything that is not mechanical interrupts work and zeal. The recent example you have seen, Vtgnoq7238rmqco did just this by categorizing in Latin merely, where you see too that the mashup is hard to explain to new editors one invites to use the category system. Thou perhaps sortst the plant world by English vernacular names but for obvious reasons I don’t, and particularly for the commonly used taxa I took care to map the botanical order of nature in translingual terms into the mind, only to find botanical thought foiled by this mashup. English-speakers of course also excuse the use of vernacular names because the spelling of the taxonomical names is hard to remember especially if they were learned in the **** pronunciation Latin is commonly uttered by them, but that is even more reason to use the names, for the sake of repetition, just like for reconstructed languages errors are thrown out outright if the asterisk before a word is missed, so one gets used to using it. Gentle push. (systemd fan here, yes.) Fay Freak (talk) 21:49, 9 November 2019 (UTC)
@Benwing2: A category like Category:Pali nouns in Lao script is currently a subcategory of Category:Pali nouns; dual parenting by script and by part of speech would be nice if it can be done. The categories by part of speech and script have their uses - there can be script-specific issues in the spelling of inflected forms. For example, the ablative/instrumental plural forms of a Pali noun in Lao script depend on the consonant repertoire of the Lao script orthography used, and the feminine declensions may have their unique complications. (For maintenance, splitting nouns by 'declension' may help - we have a trans-script issue with the declension of feminine nouns in -i. More data is required before it can be fixed.) Splitting by script helps until we get random access to alphabetically sorted batches of lemmas. --RichardW57 (talk) 22:10, 10 November 2019 (UTC)
@RichardW57 Thanks. I took the liberty of fixing some typos in your comment. Dual category parenting is easily possible, although only one of them will be displayed in the "breadcrumbs" at the top; I think this should be "terms by script" -> "FOO script" -> "POSs" rather than "POSs" -> "by script" -> "FOO script". Note that there's nothing preventing one language from splitting by POS and another language not, esp. if we adopt the former breadcrumb style, as I advocate; {{spelling of}} could have an optional part-of-speech param and categorize into e.g. "LANG nouns by FOO script" if |pos=noun, but "LANG terms by FOO script" if |pos= is unspecified. For Pali, the categorization happens automatically by the headword template, in Module:pi-headword (and the same for that matter could and probably should be done for Azerbaijani, Kazakh and other languages with multiple scripts). Benwing2 (talk) 22:24, 10 November 2019 (UTC)
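The optional part-of-speech behaviour suggested for {{spelling of}} can be illustrated with a small sketch. The function below is hypothetical and is not the logic of Module:pi-headword or any existing module. It simply follows the category names quoted in the comment above and pluralizes the part of speech by appending "s", which is an assumption that only works for regular cases such as "noun" or "verb".

```python
from typing import Optional


def script_category(lang: str, script: str, pos: Optional[str] = None) -> str:
    """Build a per-script category name, optionally split by part of speech."""
    # Fall back to the generic "terms" bucket when no pos is given.
    kind = f"{pos}s" if pos else "terms"
    return f"{lang} {kind} by {script} script"
```

A language that does not want the split would simply never pass `pos`, so both styles can coexist as described.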
Special:WantedPages contains at least 5,000 pages that have 22 redlinks or more. The intent of the special pages is to assist us in adding entries that have the most value, assuming that the frequency of wants correlates to true value. In the last four years or so, the threshold has nearly doubled from 12 redlinks to 22. The total number of missing entries must, of course, be vastly more than 5,000, probably more than 500,000. As it is now, it is hard to find English terms on this list. Many of those that can be found are linked only from spaces other than mainspace, usually user space, mainspace talk, and Wiktionary pages. Thus, the important job of finding and adding the most frequently "wanted" English entries is made difficult because adding FL entries from the wanted lists seems to be lagging behind the processes that add redlinks.
There are two approaches to reducing this list. The first is to add the missing entries. The second is to remove some of the more frivolous links. Removal of frivolous links would involve reviewing user space, templates, and large tables of derived terms. Unfortunately the most frivolous links would occur at very low frequencies, far below 22 redlinks, not helping with the problem of the top 5,000.
One source of redlinks is templates that are transcluded in significant frequency. Some of these are inflection/declension templates that contain redlinked grammatical terms. One really doesn't have to know the language to make a stub entry as the tables usually have the corresponding English term. I suspect that there are red-link generating templates and, worse, modules that generate redlinks in non-obvious ways.
Why are there 143 pages of the form 'Wiktionary: transliteration', but no visible redlink on the supposedly linked-from pages? Is that careless module-writing or something else? DCDuring (talk) 03:42, 10 November 2019 (UTC)
@DCDuring Whenever such a page exists, it's linked using a black dot in the headword between the native script and transliteration. I think the wanted pages entries for the non-existent pages come as a side effect of the module's checking for the existence of the page. Benwing2 (talk) 05:34, 10 November 2019 (UTC)
So this link is designed to be invisible to all but the cognoscenti, but conveniently available on every page where one of the cognoscenti might need it? Are there sensible stub pages that could be added, for example, generic ones for the script of the language referred to? Something simple like "See article on Devanagari script on Wikipedia." if there were no better article or section of an article, e.g., in the article on the language involved. I did something similar for some proto-language links. DCDuring (talk) 13:45, 10 November 2019 (UTC)
I wouldn't say the hidden links are exactly meant for anyone, even the cognoscenti. They're just a result of the weird way the site is designed. I guess in this case it's useful because it gives an idea of which transliteration appendices might need to be created.
However, I think some languages should not have transliteration appendices. Azerbaijani and Vietnamese don't have transliterations, they have Latin alphabets, so Wiktionary:Azerbaijani transliteration and Wiktionary:Vietnamese transliteration shouldn't be linked to at all. So Module:headword needs to learn how to tell which languages have transliterations and which have Latin alphabets. Or maybe the layout of Azerbaijani and Vietnamese headwords needs to change.... — Eru·tuon 18:15, 10 November 2019 (UTC)
Thanks for considering this. Testing for the existence of a page each time something loads, when that existence rarely changes (both creation and deletion are rare events for any given page), seems ridiculous, no matter how fast and cheap it is. I'm thinking of removing such tests from {{taxlink}} and {{vern}}. The required monitoring of categories is easy enough. BTW, the taxlink and vern templates only exist because Special:WantedPages is useless for items only "wanted" directly from in-entry links, because such items are usually wanted from fewer than 10 entries at this stage of Wiktionary's development. DCDuring (talk) 19:39, 10 November 2019 (UTC)
@Benwing2 Is that appropriate? I can certainly envisage Arabic script Azerbaijani containing etymological information not represented in the modern Latin script.
@Benwing2 I took 'them' as referring to the system nagging that someone should do something about implementing the transliterations. In some cases, would it be appropriate to create an 'about' page referencing to a discussion in the talk pages? Not every non-Roman script writing system has an established transliteration (or transcription), and we may have to choose or even create the system.
Indexes for Lemmas
This relates both to wishlists (probably too late to discuss) and the much 'red'-linked pages.
One of the categories of page that came up is Index:Swahili/q and its like, which are missing members of series. The series exist because there are no indexes for navigating pages such as Category:Swahili lemmas. Are we missing a trick for adding such lemmas so that we can get navigation capabilities similar to those for Index:Swahili/b? Or do we need very serious programming help (or dedication) to build such a page on the fly?
Possibly we just need someone to maintain @Conrad.Irwin's index-generating bot? There are two immediate issues - it hasn't run for years, and it doesn't handle unused initial letters well. --RichardW57 (talk) 16:17, 10 November 2019 (UTC)
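As a rough illustration of the "unused initial letters" issue mentioned above, here is a minimal Python sketch of per-letter index building. It is not Conrad.Irwin's bot; the function name `build_index`, the sample lemmas and the alphabet string are assumptions made for the example. The point is only that seeding the index with every letter of the alphabet gives unused letters (such as q) an empty page instead of a missing one.

```python
def build_index(lemmas, alphabet):
    """Group lemmas by initial letter.

    Every letter in `alphabet` gets a key up front, so a letter with no
    lemmas maps to an empty list rather than being absent.
    """
    index = {letter: [] for letter in alphabet}
    for lemma in sorted(lemmas):
        if not lemma:
            continue  # skip empty titles
        # Initials outside `alphabet` are still kept, under their own key.
        index.setdefault(lemma[0].lower(), []).append(lemma)
    return index


# Illustrative sample only, not taken from the real category.
sample_lemmas = ["maji", "baba", "kitabu", "bahari"]
index = build_index(sample_lemmas, "bkmq")

for letter, words in index.items():
    print(letter, words if words else "(no entries)")
```

A real run would read the lemma list from a dump or from the category members rather than from a hard-coded list.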
Pages with unrecognized language entries
@DCDuring Currently implementing a script to find wanted pages by language. First step is finding all existing entries in a dump. This found some entries with unrecognized languages. The following is the complete list of pages with unrecognized languages (surprisingly few, implying that someone, like User:NadandoBot maybe, has been patrolling for this). Could you take a look? Some of them are easy to fix, but for others it's not so obvious (e.g. Middle Japanese):
Page 34215 かめ: WARNING: Unrecognized language: Middle Japanese
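A check of this kind might look roughly like the Python sketch below. This is not the script described above: the function name, the regular expression and the hard-coded set of known language names are assumptions for illustration, and the pages are passed in as plain (title, wikitext) pairs rather than read from a dump.

```python
import re

# Matches level-2 headers such as "==Japanese==" but not "===Noun===".
L2_HEADER = re.compile(r"^==[ \t]*([^=\n][^\n]*?)[ \t]*==[ \t]*$", re.MULTILINE)


def unrecognized_languages(pages, known_languages):
    """Return (title, header) pairs whose level-2 header is not a known language.

    `pages` is an iterable of (title, wikitext) pairs.
    """
    warnings = []
    for title, wikitext in pages:
        for match in L2_HEADER.finditer(wikitext):
            name = match.group(1)
            if name not in known_languages:
                warnings.append((title, name))
    return warnings


# Hypothetical sample data.
sample_pages = [
    ("かめ", "==Middle Japanese==\n===Noun===\n# turtle\n"),
    ("water", "==English==\n===Noun===\n# a liquid\n"),
]
known = {"English", "Japanese"}

for title, name in unrecognized_languages(sample_pages, known):
    print(f"{title}: WARNING: Unrecognized language: {name}")
```

In practice the set of known names would come from the site's language data rather than a literal.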
@Erutuon Hmm, what I was implementing was essentially the same as what User:Jberkel has already implemented, although I'm not sure which templates are used to generate the lists. User:DCDuring asked for this, maybe they weren't aware of these lists? 22:30, 10 November 2019 (UTC)
@Benwing2: It would be great to add more templates. The CJKV templates require more complex logic, but there are a lot of links in them, so Chinese, Japanese, Korean, and Vietnamese are very underrepresented in the wanted entries list currently. — Eru·tuon00:45, 11 November 2019 (UTC)
@Benwing2, Erutuon: The initial plan for this was to have something to replace the redlink categories; I wasn't trying to cover all cases. There are so many custom templates that it's going to be a lot of maintenance work. I'm still hoping we'll get full HTML dumps one day; then we can just perform the same task without implementing all the template-specific parsing. I've mentioned this before: perhaps we could try to avoid language-specific templates, and have just a few entry templates for all links (and delegate from within the modules). I'm not so worried about 1.5 million missing inflection links – these are less useful as guidelines for which entries to create (manually, that is). – Jberkel 21:55, 11 November 2019 (UTC)
@Benwing2: Language-specific referencers also include {{pi-sc}}. There's also {{pi-alt}}, but I only follow up forms if I have an attestation or if the link leads to a different word. (The spellings aren't always obvious.) Of course, it might be useful to list cases where the word referenced by {{pi-alt}} isn't on its page. I presume you're deliberately excluding inflection templates. --RichardW57 (talk) 01:32, 11 November 2019 (UTC)
@RichardW57 There's an infinite variety of language-specific headword and inflection templates, and it's way too much work to try and enumerate and handle all of them. The best I can do is do this for specific requested languages. Benwing2 (talk) 01:42, 11 November 2019 (UTC)
@Benwing2: My thought was rather that one shouldn't aim to have a page for everything referenced by these templates. Rather, it should be relatively straightforward to get from the inflected form to the lemma. --RichardW57 (talk) 01:57, 11 November 2019 (UTC)
Thanks for the ping. I think the "Middle Quenya" entry can just be removed. "Middle Scots" is an etymology-only language; User:Mahagaja can probably advise whether threschald should be ==Scots== or ==Middle English== (and I'll also ping User:Lbdñk, as the entry's creator, to make him aware of this discussion). AFAIK we treat Middle Japanese as ==Japanese==. - -sche (discuss) 02:15, 11 November 2019 (UTC)
Well, the Middle Japanese entries have been there quite a while. The Japanese editors are aware of them which is good enough for me- hopefully to be resolved eventually. DTLHS (talk) 04:11, 11 November 2019 (UTC)
No. The code "sco" is Scots, so what you have is Early Scots and Middle Scots being variants of Scots. Wikipedia gives 1500 as the rough end of Middle English, and 1700 as the start of Modern Scots. The indirectly referenced citations for the word date to the period between 1500 and 1700, so calling it Middle English is not appropriate. Middle Scots seems to be formally descended from Middle English. In this entry, the editor used templates for Scots, so I'd suggest labelling it as an early (or 'middle') form of Scots if you object to having Middle Scots as a language. --RichardW57 (talk) 19:04, 11 November 2019 (UTC)
I was merely pointing out what our structure actually says; I wasn't arguing that it's right or should stay that way. I do agree that the fact that Middle Scots is roughly contemporaneous with Early Modern English, and not with Middle English, is a good argument in favor of making Middle Scots an etymology-only variant of Scots rather than an etymology-only variant of Middle English. But Early Scots can stay an etymology-only variant of Middle English, I think. —Mahāgaja · talk19:33, 11 November 2019 (UTC)
@Mahagaja: It is actually Early Scots that is a variant of northern Middle English, having descended from Northumbrian Old English. On t'other hand, Middle Scots was roughly contemporary with Early Modern English. The best option would be to make Middle Scots a full-fledged language (rather than keep it as an etymology-only language). So should I start a vote, for it to get the approval? —Lbdñk (talk) 11:45, 13 November 2019 (UTC)
@Benwing2, Jberkel: One idea I've had is to separate the parsing of the XML dump file and the wikitext from the parsing of template instances. I currently do processing of templates (like locating incorrect characters or counting script combinations in link templates) by creating dump files of template instances (names and parameters with page names), and processing them rather than the whole XML dump. I generate the template dumps with a Rust program, which is fast and memory-efficient, probably much more so than if it were written in one of the other languages that have wikitext parsers, like Java and Python. I imagine the wanted entries lists could be generated from template dumps and the script would be a bit simpler, and faster to re-run when the template parameter processing is modified.
This would also allow the parameter parsing to be done in any language that can parse the template dump format, either with Java as in Jberkel's version, or with Lua, in which case the module code for the templates could be used with less modification, because it would not have to be translated into another programming language. (However, both of you probably dislike Lua.)
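To make the idea of separating template extraction from template processing more concrete, here is a small Python sketch of the extraction step. It is not the Rust program described above, and the output shape (a template name plus a parameter dictionary) is an assumption for illustration. It handles nested templates and piped links, but deliberately ignores triple-brace parameters, comments and nowiki sections.

```python
def split_top_level(body):
    """Split a template body on '|' that is not inside {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth = max(depth - 1, 0)
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_body(body):
    """Turn 'name|a|k=v' into ('name', {'1': 'a', 'k': 'v'})."""
    name, *args = split_top_level(body)
    params, positional = {}, 0
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep and "{{" not in key and "[[" not in key:
            params[key.strip()] = value.strip()
        else:
            positional += 1
            params[str(positional)] = arg.strip()
    return name.strip(), params


def extract_templates(wikitext):
    """Return (name, params) for each template; inner templates come first."""
    results, stack, i = [], [], 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            stack.append(i + 2)  # remember where the body starts
            i += 2
        elif pair == "}}" and stack:
            start = stack.pop()
            results.append(parse_body(wikitext[start:i]))
            i += 2
        else:
            i += 1
    return results


print(extract_templates("{{l|en|word|t={{lang|la|x}}}}"))
```

Once instances are extracted like this, the per-template parameter logic can be rerun on the much smaller list without reparsing the whole dump, which is the benefit argued for above.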
The libraries look interesting, a good excuse to finally start looking at Rust. There are many Wikitext parsers out there, but they all end up unmaintained. Same with the Java library I used. I like the idea of running Lua directly, but there are so many dependencies that it is difficult to embed into a project. The wanted entries process is currently very slow, but it performs a full parse of the Wikicode, which is then serialized to disk, and picked up by another run. Unfortunately toolforge has really bad I/O performance (NFS), so the whole thing takes very long. I could use your template dumps to just parse the templates, that would make it faster. Are they published somewhere? – Jberkel22:56, 8 December 2019 (UTC)
No, but I can get them to you if you'd like. (In that case I guess send me an email to respond to. Or maybe there is a better way....) The current format is a series of CBOR maps (like JSONL), each with a schema equivalent to the JSON {"title":"woordenboek","templates":}, with the strings represented as text strings. The "text" field isn't needed in this case. — Eru·tuon08:53, 9 December 2019 (UTC)
Comparing to "Category:Mandarin pinyin" and "Category:Cantonese jyutping", then "Category:Min Nan peh-oe-ji" (without diacritics) might be preferable. --Octahedron80 (talk) 04:27, 11 November 2019 (UTC)
I don't have a lot of time at this moment, but I would like to point out that the statement, "POJ (Pe̍h-ōe-jī) is not a script, but a transliteration scheme." may be uncertain. The Min Nan Wikipedia is nothin' BUT Pe̍h-ōe-jī with CJKV characters in parenthesis when needed. Their Wiktionary is POJ too. When I make new POJ titled pages on Wiktionary, they sometimes match up with a Ban-lam-gu page in the left hand side where it shows you different language options you can select from. POJ is a transliteration scheme, but so is CJKV characters. That's my two cents for the moment. Not sure how that would influence the rest of the proposal mentioned here. --Geographyinitiative (talk) 04:44, 11 November 2019 (UTC)
(edit conflict) I do agree with Geographyinitiative that the statement "POJ (Pe̍h-ōe-jī) is not a script, but a transliteration scheme" is problematic. It is true that it isn't a script in the technical sense - POJ is written in the Latin script, but it's not simply a transliteration scheme; it's an actual orthography on par with the Han character orthography. One way to deal with this is to treat POJ like Simplified Chinese; in fact, there have recently been attempts to have Simplified Chinese-like soft redirects from POJ entries to Han character entries (under the Chinese header). I do think this is a generally appropriate move as Han characters are generally more common/accessible to Hokkien speakers than POJ. If we go through all the POJ entries and convert them to soft redirects to Han character entries (where possible), this would essentially move them all to Category:Min Nan Pe̍h-ōe-jī forms. I don't think we need the different PoS categories, just like we don't have analogous categories for Traditional Chinese or Simplified Chinese. — justin(r)leung{ (t...) | c=› }04:58, 11 November 2019 (UTC)
OK, I stand corrected about my statement about POJ being a transliteration scheme; it sounds like it's rather an orthography. But it's definitely not a script, which was my primary point; the script is Latin, and category names like Category:Min Nan adjectives in POJ script are wrong. I then suggest renaming the categories to e.g. Category:Min Nan adjectives in Latin script, which parallels e.g. Category:Pali adjectives in Latin script. If the POS-splitting is unnecessary, we can combine all categories into Category:Min Nan terms in Latin script, which parallels e.g. Category:Azerbaijani terms in Arabic script. There is proper poscatboiler support for categories of these names, unlike for "POJ script". I don't think it's really necessary for the category to say "POJ" or "Pe̍h-ōe-jī" (with or without diacritics) in it; the entry itself certainly will say that. But if everyone else feels it's necessary to include the orthography in the category name, I'd go with Category:Min Nan peh-oe-ji (no diacritics), as suggested by User:Octahedron80.
I also agree with User:Justinrleung that these should all be converted into soft redirects; it looks like the Han-orthography entries are better-maintained and contain more information. Having the information duplicated between the corresponding POJ and Han entries is less than ideal. Benwing2 (talk) 06:04, 11 November 2019 (UTC)
@Benwing2: The problem with having all POJ entries labelled as Latin script is that there is more than one Latin orthography for Min Nan. The obvious competitor for POJ is Tâi-lô, and there are other less common ones like Daighi tongiong pingim, Phofsit Daibuun and Bbánlám pìngyīm. Are we going to allow entries in these other systems, and if so, are we going to lump these all under the same category? — justin(r)leung{ (t...) | c=› }06:41, 11 November 2019 (UTC)
@Justinrleung If there are 5 competing "orthographies", then IMO those aren't full-fledged orthographies so much as transcription systems hoping one day to be promoted into orthographies. (For example, do books regularly get written in all 5 of them, and are significant fractions of the population literate in them?) If only 1 or 2 of them are real orthographies, I don't have a problem lumping them under the same category. This has not been an issue for other languages with multiple orthographies in different scripts, and undoubtedly similar concerns come up there. In practice the point seems moot as only POJ entries are getting created. If significant numbers of Tâi-Lô entries get created as well and it becomes an issue, at that point we can see about creating additional categories such as Category:Min Nan terms in Peh-oh-ji orthography vs. Category:Min Nan terms in Tai-Lo orthography. As for the remaining systems, I would say we should bar them or at best treat them as Romanizations, not orthographies, i.e. the entries in those systems do not qualify as lemmas and are nothing more than soft redirects. Benwing2 (talk) 07:02, 11 November 2019 (UTC)
@Benwing2: Unicode Thai does have a word separator - ZWSP (U+200B), but 8-bit Thai doesn't. The space is the commonest form of visible punctuation in Thai. Conveniently for this purpose, Thai polysyllables generally need respelling for phonetics. Thai text to pronunciation is hard, and, strictly speaking, impossible even for words spelt regularly; เพลา(pee-laa) and แหน(hɛ̌ɛn) are the standard examples. In phonetic respelling, syllables are normally separated by hyphens. However, the boundary between phrase and word is very fuzzy in Thai. Note, however, that space is not a word separator in Vietnamese, e.g. ô tô(“car”). One would have to tag the Vietnamese IPA invocation to alert the system to phrases. --RichardW57m (talk) 12:53, 11 November 2019 (UTC)
I have no comment about the topic (because my wiki already uses 'term' for it). By the way, we Thais do not actually use ZWSP when typing or in databases. ZWSP is also not recommended, because no one can see it and it makes automation fail. --Octahedron80 (talk) 04:22, 12 November 2019 (UTC)
So how do Thais override incorrect divisions between words? Those errors affect line-breaking as well as spell-checking. (The racists at Unicode have decided that WJ may only be relied upon for line-breaking.) --RichardW57 (talk) 08:47, 12 November 2019 (UTC)
The problem has been researched for decades. There are also theses specifically on optimizing Thai word-breaking. Try this and this. I cannot go into detail; we are just users. Nowadays, many applications do well at breaking Thai text. It is (almost) no longer a problem. --Octahedron80 (talk) 09:02, 12 November 2019 (UTC)
PS. In word processor, if incorrect breaking occurs, we just press space or enter where to split. On web sites, we just ignore them. lol --Octahedron80 (talk) 09:23, 12 November 2019 (UTC)
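For readers unfamiliar with ZWSP (U+200B), the following Python sketch shows how text that does contain it could be split into words, and why the character is easy to overlook. The sample string simply joins the two Thai words cited above with a ZWSP; it is an assumption for illustration, not real running text. As noted in the thread, most Thai text does not contain ZWSP at all, so this is not a general word-breaking method.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def split_on_zwsp(text):
    """Split text at ZWSP characters, dropping empty pieces."""
    return [piece for piece in text.split(ZWSP) if piece]


def strip_zwsp(text):
    """Remove ZWSP, e.g. before comparing strings or looking up an entry."""
    return text.replace(ZWSP, "")


sample = "เพลา" + ZWSP + "แหน"

print(split_on_zwsp(sample))
# The separator is invisible when rendered, yet it still counts as a character:
print(len(sample) - len(strip_zwsp(sample)))
```

The length difference is the kind of thing that makes automated matching fail when an invisible separator is present in one string but not the other.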
Proposal: Delete all Old English words with -ƿ-
This is just a graphical variant of -w-. Any word with -w- can be spelled with -ƿ- as well, leading to an explosion of entries, e.g. acƿelan, acƿeorran, acƿeþan, acƿician, acƿinan, acƿincan just under acw-, all of which are just soft redirects. Every single dictionary I've ever encountered, including older ones, spells these words with -w- not -ƿ-, so it seems unlikely that some random user would type in the ƿ-variant form of a word. I propose deleting all the Old English entries with -ƿ- in them, after checking that the corresponding entry with -w- exists (all of this done by bot, of course). This is similar to how we don't allow Latin words with æ and œ in them in place of ae and oe, and don't allow Latin words written ALL-CAPS EVEN THOUGH THAT'S HOW THE ROMANS WROTE THINGS. Benwing2 (talk) 04:02, 12 November 2019 (UTC)
I think they did not do anything wrong. They are alternative forms, and their entries can exist just as such entries do in other languages. --Octahedron80 (talk) 04:11, 12 November 2019 (UTC)
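The check described in the proposal (only delete a ƿ-entry when the matching w-entry exists) could be sketched in Python as follows. This is a hedged illustration, not the actual bot: the function names are made up, and the titles are passed in as a plain list rather than fetched from the wiki.

```python
def wynn_to_w(title):
    """Replace wynn with w, preserving case."""
    return title.replace("ƿ", "w").replace("Ƿ", "W")


def deletion_candidates(titles):
    """Split wynn-spelled titles into those with and without a w-counterpart.

    Returns (deletable, missing): `deletable` titles have a matching
    w-spelled entry; `missing` titles do not and would need review.
    """
    existing = set(titles)
    deletable, missing = [], []
    for title in sorted(existing):
        if "ƿ" not in title and "Ƿ" not in title:
            continue  # not a wynn spelling
        if wynn_to_w(title) in existing:
            deletable.append(title)
        else:
            missing.append(title)
    return deletable, missing


# Hypothetical title list for illustration.
titles = ["acƿelan", "acwelan", "acƿeþan", "weard"]
deletable, missing = deletion_candidates(titles)
print("safe to delete:", deletable)
print("no w-entry yet:", missing)
```

Keeping the `missing` list separate means nothing is removed before its counterpart has been confirmed.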
Yes, I think it is a good idea to delete all Old English words with -ƿ-. I've always wondered whether it is necessary to have these graphical variant entries. Would it be possible to make the templates in Category:Old English headword-line templates automatically display an alternative -ƿ- variant spelling so that the word remains searchable? KevinUp (talk) 04:59, 12 November 2019 (UTC)
It's not "necessary", we just need to decide what exactly the standards for Old (and Middle) English entry titles are. Most editors of those languages seem to be operating under the assumption that all attested spellings get entries. That policy needs to be written down and possibly voted on. DTLHS (talk) 05:10, 12 November 2019 (UTC)
@DTLHS IMO, Old English and Middle English are totally different. Middle English had no standardized spelling so it makes total sense to list all the attested spellings. Old English, however, was fairly standardized. Furthermore, -ƿ- is not really a separate letter from -w- so much as a mere graphic variant (similar to regular g vs. g with a swash tail), and I will bet you $100 that the editors entering the -ƿ- variants are not going by attestation but merely mechanically entering -ƿ- variants for all -w- words. It's exactly parallel to the situation with -þ- vs. -ð-, where dictionaries generally standardize on -þ-, except that for whatever reason no editor felt the need to go and mechanically enter -ð- variants of all -þ- words (for example, there is no entry broðor corresponding to broþor). It looks like all the -ƿ- variants are the work of User:Birdofadozentides, whose entire output for the past year at least has consisted of mechanically entering -ƿ- variants of Old English words with -w-. Benwing2 (talk) 06:33, 12 November 2019 (UTC)
@KevinUp Yes, the headword templates could easily be made to automatically display ƿ-variant text (even hidden text is searchable, I think, and that might be the best approach). Benwing2 (talk) 06:35, 12 November 2019 (UTC)
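The automatic variant could be derived with a simple character substitution. The Python sketch below only shows that substitution; how the real headword templates or modules would emit the (possibly hidden) text is not specified in this thread, so the display format here is an assumption.

```python
def w_to_wynn(term):
    """Return the wynn spelling of a w-spelled Old English term."""
    return term.replace("w", "ƿ").replace("W", "Ƿ")


def headword_with_variant(term):
    """Append the wynn variant in parentheses when it differs from the term."""
    variant = w_to_wynn(term)
    if variant == term:
        return term  # no w in the term, nothing to add
    return f"{term} ({variant})"


print(headword_with_variant("weard"))
print(headword_with_variant("broþor"))
```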
@Benwing2 As I understand it, "vv" is the rare variant in the actual manuscripts, and "ƿ" is almost universally attested. The use of "w" in printed editions has more to do with typesetting limitations of years past than any lexical or academic reason. Since you mentioned Latin I would suggest you compare this and this, which is much closer to this issue than "ae" vs. "æ" (that's more like "vv" vs. "w" in Old English). I'm not saying that we shouldn't implement this, but the decision needs to be made by the Old English community, not by you or by me. Chuck Entz (talk) 07:34, 12 November 2019 (UTC)
"Most editors of those languages seem to be operating under the assumption that all attested spellings get entries. That policy needs to be written down and possibly voted on." -- yes we do work under those assumptions and with good reason. For the study of a lot of extinct languages, attested spellings are in some ways more important than their standardized dictionary spellings. Standardization schemes, while convenient when looking up definitions, can obscure certain kinds of extra information conveyed by spelling variation (e.g. subtle variants in pronunciation, scribal or dialectal oddities, etc.) in itself. That is why that assumption - that all attested spellings deserve entries - comes entirely naturally to anyone who regularly works with medieval and ancient corpora, especially old Germanic languages where spelling variation is common and often very meaningful.
Regarding wynn, my thoughts:
As has been pointed out, virtually all Old English texts use wynn instead of <w> (and those that don't use wynn use <uu>); the use of <w> is a product of modern typesetting limitations.
<w> is very much the standard for representing wynn and has been for centuries since people started studying Old English again, for reasons noted above (typesetting difficulties and there is no real additional value to using the exact Old English letter shape). Nobody uses the wynn unicode character in creating even scholarly/diplomatic editions of texts. Everyone uses <w> to represent wynn. An important reason why this is unproblematic is because <w> did not exist in Old English, so using it instead of wynn does not in any way obscure any information (except that wynn was used, which is a given to anyone who knows the first thing about Old English anyway tbh). Very rarely you will find (mostly old) texts that try to use wynn or something that looks like it.
For old Germanic languages with much spelling variation, I think it is absolutely useful not only to have a standardized-spelling main entry but also link as alt-forms to actually attested manuscript spellings, which often differ in meaningful ways. People may arrive at our dictionary looking for a certain alternative (manuscript) spelling, and having those soft redirects (+links to them from the main entry) is then very useful in my view. As a matter of fact, I have found this to be useful in my own use of Wiktionary many times in the past.
In the case of Old English, people still won't type the letter wynn even if they're looking for weird manuscript spellings. They'll come here looking for the spelling with <w>, even if the manuscript has wynn. This is because nobody types the letter wynn, it's a hassle (you have to copy-paste it from somewhere on the net) and even diplomatic editions will replace wynn with <w>. Put quite simply: everyone is used to <w>, it is infinitely easier to type, it is used basically everywhere instead of wynn in editions of OE text these days and no information is really lost by representing wynn as <w>.
Regarding <uu>, I think that spelling may in fact be used in e.g. diplomatic editions as well, although I cannot confirm this because, while I know a fair bit about Old English, I am not that well read in the scholarly nitty-gritty of it. Therefore, I think it is still useful to have alt-form entries of attested spellings with <uu> (uncommon). People definitely might come here looking for such spellings with <uu> and then actually write it that way too (because unlike wynn, <uu> is easy to type and has been the standard in diplomatic editions and elsewhere for many scholars and others, notably those working on other medieval languages without wynn where <uu> is the standard).
Speaking of attestation, I don't think each of the very, very many spellings with wynn that has been added by User:Birdofadozentides is actually attested. That user seems to indiscriminately add a wynn-entry to correspond to every Old English <w>-entry we have, while a lot of those lemma entries themselves are not attested in the spelling used for the lemma entry, which is the result of modern standardization conventions and does not necessarily reflect the actual attested Old English spelling of that word (especially in the case of stuff like dialectal hapaxes etc.). So some wynn-entries created by Birdofadozentides may essentially be useless even if you're looking to be more true to the manuscripts.
All in all, I'd say wynn-entries can probably go (see discussion below; not entirely sure right now); there is no compelling reason in this case to really keep them, the use of wynn instead of <w> introduces no possible new information and is mostly just a hassle (and clogs up the alt-forms lists even more!). However, I do not think that the assumption that all attested spellings for pre-modern languages should have an entry is an invalid one at all. That should not be the rationale for the deletion of those entries, and that is not why I wouldn't mind seeing the wynn-spelled entries go at all. — Mnemosientje (t · c) 09:18, 12 November 2019 (UTC)
100% dump wynn (ƿ). It's hard to read, which is why it was dumped historically, and no dictionaries bother with it, and nobody is googling for entries with it. Sidenote: I hate it when users check minor edit for all their edits. Please don't. --{{victar|talk}}18:53, 12 November 2019 (UTC)
I've certainly come across printed texts that use wynn instead of 'w', though it does seem the older style. The use rather relies on fonts that make 'p' clearly distinct from wynn - wynn v. 'p' v. thorn is a balancing act. --RichardW57 (talk) 09:02, 12 November 2019 (UTC)
@Chuck Entz I'm not quite sure what your reference to Latin J was referring to. Romans didn't have J, true, but they didn't have U either, and we consistently spell Latin words with U not V where it represents a vowel. Imagine if someone had an ideological aversion to the letter U, just as User:Birdofadozentides seems to have to the letter W (based on the statements on his/her user page), and started mechanically creating entries like assvmo in place of assumo, bonvs in place of bonus etc. (or even worse, entered them in all caps because "that's how the Romans spelled things"). Maybe you are referring to the soft redirects from J-spelled words to I-spelled words? In this case, the switch from J-spelling to I-spelling was quite recent (within the last 50 or so years I think), and many dictionaries still use J-spelling. OTOH, as User:Mnemosientje points out, the use of w instead of wynn has been the standard for hundreds of years (on top of which, wynn is easily confused with p and thorn). Benwing2 (talk) 09:34, 12 November 2019 (UTC)
Not just the user page but their talk page, where they have unilaterally replaced all of the wynns in another user's comment! That is obsessive. Equinox◑13:48, 12 November 2019 (UTC)
I suspect Mnemosientje of hyperbole when he writes, "the use of w instead of wynn has been the standard for hundreds of years". Wheelocke's 1644 edition of the Anglo-Saxon Chronicle uses wynn and archaic letter styles. I doubt that 'w' has been the 'standard' for as much as 200 years. Wikipedia dates the replacement of wynn by 'w' to the start of the 20th century, which is consistent with my recollections. --RichardW57m (talk) 13:58, 12 November 2019 (UTC)
My reference to Latin J was merely to illustrate that there's precedent for having a fairly complete set of duplicate entries based on an arbitrary choice of representation of the same letter. I don't really want to get into a debate on this, because my personal views are more in line with those of Mnemosientje. My main concern was that this was going to be decided and done before anyone who actually works with Old English entries had a chance to weigh in. I'm still waiting to hear from those people- so far we've only heard from one. Chuck Entz (talk) 14:38, 12 November 2019 (UTC)
@Leasnam Speaking of which, I'll tag Leasnam, who didn't mind (at least back in 2018) the wynn-forms; as I believe he has more experience with Old English than I and most people in this thread do, I'm curious about his opinion. — Mnemosientje (t · c) 20:30, 12 November 2019 (UTC)
I have no real preference either way. I'm fine with w or ƿ, even uu. As mentioned, w seems to be more the "norm" used in modern sources, and that is a tremendous plus, especially for users. I am however sensitive to remaining true to a language's original character; however, not at the cost of making working with it a nightmare, in this case by presenting unfamiliar and ambiguous bookstaves that could enhance potential difficulties. Leasnam (talk) 23:22, 21 November 2019 (UTC)
Please don't delete these pages. I know the spelling is the same, but the orthography also matters. Yes, I create pages with ƿynn every time when a new word written with w appears 'cause I want them to be written with ƿynn also, just like in Old English. Maybe some words were not attested, but they would be written like this.
No, ƿynn is not "just a graphical variant of -w-", they are absolutely different letters, and w the way it exists now didn't exist in Old English, uu - yes, it was, and I don't mind having pages with it, only not instead of ƿynn pages. "W has been the standard for hundreds of years 'cause ƿynn can be confused with þorn and p." With p maybe - at first glance, if you look more carefully, you'd see they're written differently - but with þorn, here, on Internet pages, not in the manuscripts? How could they be confused with each other?
People who look for Old English words may never have heard about ƿynn, and finding these pages may help them learn more about Old English orthography. I know there are also pages with ð and œ; what's so wrong with that? What harm could it do if there are alternative aesthetic pages? They will never become the main ones; they just show that things can be written and look differently. --Birdofadozentides (talk) 12:17, 12 November 2019 (UTC)
We could modify the headword templates to automatically display ƿ-variant text, so that words with ƿynn are still searchable. And if you do come across Old English texts with ƿ stated explicitly, you are encouraged to add the original sentence containing ƿynn as a quotation. Alternatively, consider creating entries with ƿynn only if it is attestable. It is not a good idea to create pages with ƿynn if you are not sure whether they are spelt that way as it tends to bump up the number of entries in Category:Old English lemmas. KevinUp (talk) 13:17, 12 November 2019 (UTC)
But how can someone know for sure whether the word is attested or not? I just know that in Old English these words would be written with ƿynn or with uu or with u.
"And if you do come across Old English texts with ƿ stated explicitly..." In Beoƿulf it's almost always explicit, but adding such quotes is impossible 'cause it's impossible to find editions that have this text printed with ƿynn. And most printed editions of manuscripts use w.
"it tends to bump up the number of entries in Category:Old English lemmas". Is this a problem? So, maybe it's possible to make these pages not appear in this category, but they still would exist the way they are now with no changes.
Is wynn used in any language other than Old English (and maybe early Middle English)? If not, then we could consider hard redirects from spellings with wynn to spellings without it. That way if anyone does look for a wynn spelling, they'll still find the entry they want. —Mahāgaja · talk19:14, 15 November 2019 (UTC)
We should not delete them. And wynn is not "a graphical variant of W", it is a completely separate letter. Modern editions which use W are, in fact, simply modern transcriptions in that sense; manuscripts of OE almost always use wynn, and since that is the actual attestable form of most of these words, we should absolutely keep them. Ƿidsiþ15:08, 17 November 2019 (UTC)
W appears in exactly the positions in Middle English where Ƿ did in earlier Middle English and Old English. Different in origin, yes, but not functionally different. I'm more concerned about the modern editions, since it seems unlikely a manuscript reader is going to be using Wiktionary, and manuscript readers are going to be searching for the w variants, since a wynn is hard to type in.--Prosfilaes (talk) 17:44, 17 November 2019 (UTC)
But what about online editions? People might easily copy/paste words with wynns from, for example, Wikisource - e.g. this. Anyway, modern editions are neither here nor there since no one is proposing not to use Ws, only that we should also cover the actual attested spellings as well. Ƿidsiþ13:11, 18 November 2019 (UTC)
What about people reading the Anglo-Saxon Wikipedia? There's a chaotic mix of the two letters at the article on Norway.
I for one had not considered online editions, and you raise a good point, wynn is used online. Nonetheless, the spelling variation within Old English yields enough alt-form entries as it is; having all those alt forms doubled due to wynn tends to yield extremely long alt-form lists on entries with w; e.g wifmann, þrescold. Not sure whether that is desirable/whether we should display alt forms differently/whether this would justify reducing them to hard redirects. — Mnemosientje (t · c) 13:42, 18 November 2019 (UTC)
I'm unsure about the direction to go here. I believe that attested words should be includable in their original form without question, whether as the lemma or as an alternative spelling. But at the same time, the chances of someone actually ever looking these up are extremely slim. To what degree does "all words in all languages" apply here? Would Wiktionary be incomplete by omitting them? —Rua (mew) 10:17, 18 November 2019 (UTC)
I don't do very much editing in Old English, but find the existence of both wynn- and w-entries a bit irritating, but it does seem possible that people will encounter words with wynn online – as mentioned above, Wikisource sometimes uses wynn. What's really a bad idea is to have separate full entries for the wynn-words and the w-words: ƿeard and weard, for instance. There don't seem to be many though. (Currently these search queries to find entries with wynn that are not alternative forms return 11 and 79 results: : incategory:"Old English lemmas" intitle:/ƿ/ -hastemplate:"alternative form of" -hastemplate:"alternative spelling of", : incategory:"Old English non-lemma forms" intitle:/ƿ/ -hastemplate:"alternative form of" -hastemplate:"alternative spelling of".) At the very least it would be a good idea to come up with some policy on whether the full entry should be located at the wynn or the w spelling and what each should look like (for instance, perhaps wynn entries should use {{alternative spelling of}}). — Eru·tuon16:38, 18 November 2019 (UTC)
The vast majority of wynn entries currently already use some sort of alternative form of template already and link to the w-spelled entry as a main lemma. The ones that don't should absolutely be fixed to a soft redirect. The discussion here is about whether they should have entries at all. (I think that the main entry should be at the w-spelling is obvious for reasons of user friendliness and standardization with other dictionaries.) — Mnemosientje (t · c) 10:29, 19 November 2019 (UTC)
@Mnemosientje A problem I see is that many wynn-entries say alternative form rather than spelling, and also duplicate the pronunciation of the main entry. I tried to persuade the editor not to do this, but they kept doing it. This is something that would need fixing as well. —Rua (mew) 11:50, 21 November 2019 (UTC)
As someone who often reads other languages on the Internet and copy-pastes words to look them up, I think I would appreciate having redirects from entries spelled with wynn. Even hard redirects would be fine, but I don't think we should neglect the original spellings entirely. As someone with limited knowledge of Old English, however, I won't take a strong position one way or the other. Andrew Sheedy (talk) 22:40, 21 November 2019 (UTC)
If a wynn form of a word existed, as I believe was mentioned, it could be added as a quote. In that way, it would still come up on the Googles. Same goes for uu spellings. --{{victar|talk}}06:21, 22 November 2019 (UTC)
FWIW I would support hard-redirecting these, somewhat similar to what we do with long s. In that case, the software redirects people who search for "ſoup" automatically to soup, and even if you actually navigate to , it also redirects you from that page after a minute. Typing e.g. "ƿorld" into the search bar does result in being sent to world, but navigating to results in no such redirect; if we could make that also automatically redirect so we wouldn't even need to have actual hard redirect entries (blue links), great. - -sche(discuss)07:49, 31 August 2020 (UTC)
So, I'd suggest we first figure out if the technical thing is doable, making it so that a user who reaches a nonexistent page gets automatically redirected to after a few seconds (the way someone who lands on a long-s spelling gets redirected). Because if it is, then presumably a vote to simply delete all the wynn spellings is more likely to pass, whereas if auto-redirection won't work (I seem to recall that one reason it works for long-s is that long-s can be converted into uppercase S and then converted back into lowercase as "s", whereas lowercase wynn probably converts to uppercase wynn and then back again), then it's more likely more people might want to keep hard redirects. - -sche(discuss)18:57, 8 September 2020 (UTC)
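The case round-trip described above can be checked outside MediaWiki. Here is a minimal sketch that uses Python's built-in Unicode case mappings as a stand-in; this is an assumption, since the site's actual title handling may work differently, and `case_round_trip` is a name made up for illustration.

```python
def case_round_trip(text: str) -> str:
    """Uppercase the text, then lowercase it again."""
    return text.upper().lower()


LONG_S = "\u017f"  # ſ LATIN SMALL LETTER LONG S
WYNN = "\u01bf"    # ƿ LATIN LETTER WYNN

# Long s uppercases to a plain "S", so the round trip lands on ordinary "s".
print(case_round_trip(LONG_S + "oup"))

# Wynn uppercases to capital wynn (U+01F7), which lowercases back to wynn,
# so the round trip never produces a "w".
print(case_round_trip(WYNN + "orld"))
```

Under these mappings the long-s spelling collapses to the ordinary one on its own, while the wynn spelling needs an explicit ƿ-to-w substitution.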
Shoot, I'd forgotten to post about this. Here is the edit, which affects the text that shows on a nonexistent page. The automatic redirects are something we created here, and an interface administrator or template editor can create them, but the method is a bit abstruse. If a suggested title is marked up in a particular way (as content of a HTML tag that matches the selector #did-you-mean a), our site JavaScript (search "auto redirect") will change the users to the w title after a certain number of seconds. Because this is implemented in JavaScript, it won't work for people who've turned off JavaScript or aren't using a browser. — Eru·tuon05:35, 9 September 2020 (UTC)
@Metaknowledge Going to ƿorld tries to create that page, but going to says "did you mean world?" and redirects after several seconds. Does the latter link not work for you? It works for me using Chrome on Mac OS X. Benwing2 (talk) 06:13, 9 September 2020 (UTC)
@Benwing2, Erutuon: The latter link didn't work for me as of my last edit, but now does, so that's good! The former link still doesn't work, but are you saying it isn't supposed to? I suppose that's fine, but in that case we'll want to make our linking templates convert wynn to w for OE, so new wynn redlinks don't crop up. —Μετάknowledgediscuss/deeds15:54, 9 September 2020 (UTC)
@Metaknowledge Good to hear. Yes, the former link to ƿorld doesn't redirect, it just brings up a page allowing you to create that entry if you really want to. Your idea of making the linking templates convert ƿ to w is a good one, and I can implement that once we have consensus to delete the existing wynn pages. @-sche Feel free to create a vote if you want. Benwing2 (talk) 02:36, 10 September 2020 (UTC)
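As a rough sketch of the "convert ƿ to w in links" idea, assuming the conversion applies only to Old English (language code `ang`): the function name and signature below are hypothetical and do not reflect how the linking modules are actually written.

```python
# Map lowercase and capital wynn to their w counterparts.
WYNN_TO_W = str.maketrans({"\u01bf": "w", "\u01f7": "W"})


def normalize_old_english_link(lang_code: str, target: str) -> str:
    """Return the page title a link should point to.

    Only Old English ("ang") targets are rewritten; every other
    language is passed through untouched.
    """
    if lang_code != "ang":
        return target
    return target.translate(WYNN_TO_W)


print(normalize_old_english_link("ang", "\u01bforld"))  # wynn spelling of "world"
```

The displayed text could still keep the wynn spelling; only the link target would change, so no new wynn redlinks would be created.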
I'd put in for someone to come up with a less resource-intensive method to do what we do in Lua. I'm sure we will blow through any amount of memory we'd be allowed. DCDuring (talk) 02:01, 14 November 2019 (UTC)
I'm inclined to think so too. Pages with translations have a special potential for increase in memory, because we have more than 8000 languages and in theory every one of them could have a translation, but it is easier to reduce memory there because they have a more restricted structure. Entries for short words, and for letters of the alphabet (which I'm starting to agree with Rua are a bad idea), are a bigger problem because they have a wider variety of different stuff in them. The more complicated the wikitext, the harder it is to develop a memory-optimized version of it. — Eru·tuon20:36, 16 November 2019 (UTC)
A great deal of time has been spent trying to reduce the Lua memory for entries with short words such as the Han character entries, and information such as citations, transliterations were removed just to reduce the Lua memory. Can we find out what is the actual amount of Lua memory needed by entries such as do, 一, 人, 水, 月, 生, 我 (basic words) to solve the error? KevinUp (talk) 23:16, 16 November 2019 (UTC)
The only way I can think of is by setting up a MediaWiki installation that we could tinker with the settings of, and add the necessary pages from the latest dump and try rendering them and check the reported memory usage. I haven't ever done anything that ambitious. — Eru·tuon20:29, 17 November 2019 (UTC)
@KevinUp, Victar (and others) I've implemented support for aliases and varieties in language entries (including etymology languages but not yet scripts or families), and converted (as best I could) all the otherNames entries in Module:languages/data2 to use aliases and varieties instead. I'm now requesting help for converting the remaining language entries. Note that the format of varieties can include nested lists in the case where a variety itself has multiple aliases; see the entry for Azerbaijani for an example that uses both aliases and varieties, including nested lists. (I could expand the structure of varieties further to allow subvarieties to be explicitly specified, but at a certain point you get diminishing returns; in such cases, it might be better to convert the varieties into proper etymology languages.)
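To illustrate the nested-list shape described above, here is a hedged Python sketch. The real data lives in Lua modules; the field names follow this post, but the language entry, its values and the helper `all_names` are invented placeholders.

```python
from typing import List, Union

# A variety is either a single name or a list of names for the same variety.
Variety = Union[str, List[str]]

# Hypothetical entry; not taken from Module:languages/data2.
entry = {
    "canonical_name": "Examplish",
    "aliases": ["Examplese"],
    "varieties": [
        "Northern Examplish",
        ["Hill Examplish", "Upland Examplish"],  # one variety, two names
    ],
}


def all_names(language: dict) -> List[str]:
    """Flatten canonical name, aliases and (possibly nested) varieties."""
    names = [language["canonical_name"], *language.get("aliases", [])]
    for variety in language.get("varieties", []):
        if isinstance(variety, str):
            names.append(variety)
        else:
            names.extend(variety)
    return names


print(all_names(entry))
```

A flat list like this is what a lookup from any known name to the canonical language would need; the nesting only records which names belong to the same variety.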
In a few places in Module:languages/data2 I wasn't sure what to do so I kept some entries in otherNames. Maybe others more knowledgeable can help:
"Modern Standard Arabic", "Standard Arabic", "Literary Arabic", "Classical Arabic": aliases or varieties of Arabic?
"Farsi": alias or variety of Persian?
"Frisian": otherName of "West Frisian" but this is a family and should probably be deleted
"Hindavi", "Khariboli", "Khari Boli", "Manak Hindi": aliases of varieties of Hindi?
"Kartvelian": otherName of "Georgian" but this is a family and should probably be deleted
"Netherlandic", "Flemish": I listed them as varieties of Dutch but this needs to be verified
{"Pukhto", "Pakhto", "Pakkhto"}: I listed them as aliases of the same variety of Pashto, but this needs to be verified
"Central Thai": alias or variety of Thai? (BTW I listed "Siamese" as an alias of Thai.)
"Standard Zhuang", "Dai Zhuang", "Wenma Zhuang", "Wenma Thu", "Wenma", "Nong Zhuang", "Youjiang Zhuang", "Yongbei Zhuang", "Yang Zhuang", "Yongnan Zhuang", "Zuojiang Zhuang", "Central Hongshuihe Zhuang", "Eastern Hongshuihe Zhuang", "Guibei Zhuang", "Minz Zhuang", "Guibian Zhuang", "Liujiang Zhuang", "Lianshan Zhuang", "Liuqian Zhuang", "Qiubei Zhuang", "Chongzuo Zhuang", "Shangsi Zhuang": all listed as distinct varieties of Zhuang. Which ones (if any) are aliases of each other? "Wenma Zhuang", "Wenma Thu" and "Wenma" in particular look suspect.
Farsi is the name Persian calls itself (alias). Central Thai is an alias of Thai. Zhuang has a lot of varieties, and they even have distinct ISO codes, but Wiktionary groups them all (almost all of them are spelled the same). To separate varieties, I suggest referring to the ISO codes first. --Octahedron80 (talk) 07:53, 12 November 2019 (UTC)
Although Persian-speakers call their language 'Farsi', in English 'Farsi' seems to refer to a version from Iran. This makes it a variety. --RichardW57m (talk) 12:26, 12 November 2019 (UTC)
I don't believe that's usually true. It's hard to prove, since most Persian speakers live in Iran, so most use of Persian or Farsi is going to refer to Iranian Persian. It is a bad idea to use Farsi as a variety of Persian, since that's the endoname for Persian in all dialects. If someone means Farsi as Iranian Persian, it should be changed to Iranian Persian.--Prosfilaes (talk) 15:29, 12 November 2019 (UTC)
Are you sure you aren't getting confused with the Persian word? The Wikipedia article records Farsi as meaning 'Iranian Persian'. --RichardW57 (talk) 20:27, 12 November 2019 (UTC)
The Wikipedia article? Which Wikipedia article? w:en:Farsi redirects to w:en:Persian language, which says "Persian, also known by its endonym Farsi ... is a pluricentric language predominantly spoken ... in three mutually intelligible standard varieties, namely Iranian Persian, Dari Persian (officially named Dari since 1958) and Tajiki Persian (officially named Tajik since the Soviet era)...".--Prosfilaes (talk) 10:11, 13 November 2019 (UTC)
That's not what I find; it lists one dialect as "Iranian Persian (Persian, Western Persian, or Farsi)" and another as "Eastern Persian (Dari Persian, Afghan Persian, or Dari)". w:en:Dari lists "Farsi" as an alternative as does w:en:Western Persian. The latter does mention that "Farsi ... has also been used widely in English in recent decades, more commonly to refer to the standard Persian of Iran." It's obviously not consistently used one way or the other.
I'm really skeptical; I think it's usually more true that Farsi/Persian are more likely to refer to Iranian Persian, as the dominant dialect.--Prosfilaes (talk) 14:31, 13 November 2019 (UTC)
To most English speaking Iranians, Farsi refers to the modern language, and Persian denotes the historical language. Since we don't distinguish between the two on the project, Farsi should be listed as an alias. The same logic follows for Arabic. --{{victar|talk}}18:45, 12 November 2019 (UTC)
My impression is that it is a term for Persian fashionable or politically correct in the 1980s, hence it occurs in the names of characters Unicode created based on materials about a decade old. Fay Freak (talk) 19:10, 12 November 2019 (UTC)
Has become and has ceased? Of course some might be stuck in the 1980s and consider it standard while else one uses “Persian”. Fay Freak (talk) 19:21, 12 November 2019 (UTC)
"Stuck in the 1980s"? No, the usage of "Farsi" as the name is very strong today, and I would say the default for young people and online. "Persian" is, as I said, more historical and stuffy sounding to most native speakers. --{{victar|talk}}20:04, 12 November 2019 (UTC)
I'm not sure Iranian usage should be binding for the English Wikipedia. I used the name "Farsi" back in the 1990s, but got the impression that Iranians were frustrated with it, felt it as way to separate Iran from the historically great Persian Empire, so I switched to using "Persian". Sites like President.ir, Al Jazeera and the Tehran Times seem to use both.--Prosfilaes (talk) 10:11, 13 November 2019 (UTC)
w:en:Western Persian says "The Academy of Persian Language and Literature has called for avoiding the use of the endonym Farsi in foreign languages and has maintained that Persian is the appropriate designation of the language in English, as it has the longer tradition in western languages and better expresses the role of the language as a mark of cultural and national continuity. Eminent Iranian historian and linguist Ehsan Yarshater, founder of Encyclopædia Iranica and the Center for Iranian Studies at Columbia University, mentions the same concern in an academic journal on Iranology, rejecting the use of Farsi in foreign languages." I think a dictionary should tend historical and stuffy over cool and trendy.--Prosfilaes (talk) 14:31, 13 November 2019 (UTC)
Here's a current list of languages that have otherNames, if people want to search for particular languages that they can help with. — Eru·tuon08:25, 12 November 2019 (UTC)
@Erutuon The "varieties" field should not include etymology languages. The original purpose of the "otherNames" field is to specify language aliases/varieties that should be mapped to the canonical language, and necessarily this should not happen for etymology languages. Benwing2 (talk) 01:07, 13 November 2019 (UTC)
Changed “language” to “language header” in some key positions.
Added a footnote explaining what counts as a language.
Updated the footnote that explains the difference between gloss and non-gloss definitions (is it clear enough?)
Added a new footnote with a couple of caveats about the list of new languages.
The paragraph about parenthesised data now says “appendix and reconstruction namespaces”.
The footnotes are now <ref> links.
And to address the real con of the page, I’ve added a breakdown section that lists how many languages have 10000+, 1000+, 100+ and 99- gloss definitions.
Let me know if there is anything else that you would like to see added, or if you think the changes are not good, or not good enough. — Ungoliant(falai)20:05, 12 November 2019 (UTC)
I'm probably being overly picky, but is an invocation of {{inflection of}} that provides a gloss a gloss-definition? FWIW, I think it isn't. (I think these glosses are useful for people who've forgotten the word. It saves a click.)— This unsigned comment was added by RichardW57 (talk • contribs) at 20:35, 12 November 2019 (UTC).
Definition lines with {{inflection of}} count as non-gloss even if they have a gloss. I know this looks contradictory but that’s how the term “gloss” was used even before I took over the statistics page. — Ungoliant(falai)16:48, 14 November 2019 (UTC)
I think you covered lots rather well, but I have minor quibbles.
Is a "synonym of" definition a gloss definition?
I'd go down another two powers of ten to include those with 10 or fewer and 1 e
I think "translingual—for terms that cannot be said to belong to one or more particular languages" is highly ambiguous. "How about Translingual-for terms used in more than one language that cannot be said to belong to any one language." I realize that you are trying to make sure that you are excluding borrowed terms. The underlying problem is that many terms that meet any reasonable definition of translingual are not included. Examples include modern scientific terms (units of measure, chemical names, etc), scientific, legal, and medical Latin, and probably many others beyond my ken. Maybe the clause should be much briefer and include "such as CJKV characters and scientific names of organisms." DCDuring (talk) 20:54, 12 November 2019 (UTC)
It does, but I don’t think it should. Does anyone disagree?
Thanks for doing all this. I could see going either way with "synonym of" definitions. But at present it would depend only on whether {{synonym of}} was used. We have an awful lot of one-word definitions as well as definitions that are mere lists of synonyms, eg. many taken from MW 1913. Maybe we should accept such definitions.
The definitions that use {{n-g}} and {{non-gloss definition}} should count as real definitions. They are often the only practical kind of definition for function words. Thus first column heading would be a misnomer. I don't have a suggested alternative header at the moment. DCDuring (talk) 17:01, 14 November 2019 (UTC)
Great now. Especially the explanations of the column headers. They were a riddle to me until the end. The stats finally start to mean something. Fay Freak (talk) 17:20, 14 November 2019 (UTC)
Leiden Indo-European Etymological Dictionary Series
Hey, I would like to start reconstructed terms on Polish Wiktionary. Can I cite the "Leiden Indo-European Etymological Dictionary Series", which is used on English Wiktionary (e.g. ěsti)? The author stated that the book can't be quoted without permission, but since English Wiktionary uses it I assume Polish can too. Sławobóg (talk) 19:36, 13 November 2019 (UTC)
You can't quote long passages directly from the book without violating copyright, but you can assert that a certain term comes from a certain root on the basis of the book. The wording of the book is copyrighted, but not the facts. —Mahāgaja · talk20:50, 13 November 2019 (UTC)
So I am allowed to make entries based on these books and their further etymology? What about problematic etymologies like here? Three possible etymologies are actually quotes from the book. Sławobóg (talk) 21:35, 13 November 2019 (UTC)
Is the character "ꝰ" a special Unicode entry for this abbreviation or a superscript nine? If the former, then there should perhaps be a note stating which it is, and a reference to a page here for this sign. If not (and if there is no provision in Unicode), is the abbreviation really considered a superscript nine, or is it an approximation of a different hand-written sign? It seems important to me to state this correctly in the entry. PJTraill (talk) 17:31, 14 November 2019 (UTC)
Well, one way to identify a character is to enter it in Wiktionary's search box – in this case there's an entry on it. There are also websites like codepoints.net. (I use a little program that I wrote to identify characters.) — Eru·tuon20:47, 14 November 2019 (UTC)
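A small script of the kind mentioned can be sketched with Python's standard `unicodedata` module. This is only an illustration, not the program referred to above, and it assumes the character in the entry title is U+A770.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the code point and Unicode name of a single character."""
    name = unicodedata.name(char, "<unnamed>")
    return f"U+{ord(char):04X} {name}"


# The abbreviation sign in question (assumed to be U+A770) ...
print(describe("\ua770"))
# ... compared with an actual superscript nine.
print(describe("\u2079"))
```

Comparing the two names shows whether the sign has its own code point or is merely a superscript digit.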
Incidentally, the quote in the similibꝰ entry uses the long S, which in our current font often looks a heck of a lot like a lower-case L. Could we please find a way of using a glyph for long S that is less visually ambiguous? Historical texts are much clearer than our current crappy font, and show the long S extending below the line and with a clear ending hook on the bottom, rather than a straight vertical bar ending abruptly at the text bottom line. ‑‑ Eiríkr Útlendi │Tala við mig17:59, 14 November 2019 (UTC)
The long s (ſ) looks fine to me in my browser and operating system, and if anything looks more like an f than a lowercase l. At the moment, the quote is displayed in the default font, which isn't very specific: it seems to be "sans-serif". That is resolved to an actual font family (such as Arial, DejaVu Sans, etc.) by the browser, and the font chosen depends on many factors, including operating system and browser preferences. I guess if anyone has an idea of which fonts have a particularly visually distinct glyph for the long s character, they could add inline CSS in the entry to specify those fonts, but even then users will have to have one of those fonts installed. — Eru·tuon19:18, 14 November 2019 (UTC)
I confess I'm confused about your proposed layout.
What does the arrow mean?
Where would this go? In the 抛物線 entry, or in the 放物線 entry?
We already have the alt parameter for {{ja-kanjitab}}, which allows us to indicate alternative spellings. Details such as "this particular kanji was superseded in official Tōyō spelling lists by this other kanji" seem more appropriate for the single-kanji entry, rather than duplicated (sporadically, irregularly) across every single entry that includes that character.
I note over in the JA Wikt entry for the kyūjitai (pre-spelling-reform variety) character ja:拋 that they use the term 代替字(daitai ji, “replacement character”) rather than 代用字(daiyō ji, “substitute character”). Meanwhile, my local copy of the KDJ uses the term 代替漢字(daitai kanji, “replacement Chinese character”).
The term 当用漢字(tōyō kanji, “current use Chinese characters, present use Chinese characters”) is obsolete. This refers to the 1946 spelling reform, which was superseded by the 常用漢字(jōyō kanji, “general-use Chinese characters”). The list of Jōyō kanji was first promulgated in 1981, and then updated multiple times thereafter. I don't think we should use the term tōyō anywhere in {{ja-kanjitab}}, although it would be appropriate to include on the individual kanji's entry page as a record of historical usage patterns.
@Eirikr: This is a kind of etymological information, i.e. 放物線 is from earlier 抛物線. Although 当用漢字 is no longer used, most of the replacements of kanji it caused (w:ja:同音の漢字による書きかえ) were kept by its successor 常用漢字. Simply adding it to alt does not make this clear.
As for the "just single-kanji page" or "every single entry that includes that character" question, the problem is, only some of the new characters are results of replacement, like 放物線. Some are not, like 放心. "just single-kanji page" is not enough to explicitize this difference. -- Huhu9001 (talk) 01:23, 15 November 2019 (UTC)
As etymological information, that should go in the ===Etymology=== section, presumably of the superseded form where it's more relevant. This is where we've put similar spelling-related information for non-lemma forms, with the lemma page linking to the other spelling as an alternative form. ‑‑ Eiríkr Útlendi │Tala við mig20:13, 15 November 2019 (UTC)
It is better to show that it is an etymological alt-form, rather than just a common one, as t:ja-kanjitab itself is partly an etymology template. Information of rendaku, kun'yomi, etc. is in fact etymological per se. The template sometimes just directly lies within the ===Etymology=== section. -- Huhu9001 (talk) 05:08, 16 November 2019 (UTC)
I defer judgment. I will note that your proposed visual layout is confusing to me, whereas using alt in {{ja-kanjitab}} seems visually much cleaner and clearer.
Information on 代用字 should be shown just like kyujitai. In the page of 国語 we have only 国 and 語 in a kanji box, and we don’t have an indication like 國 → 国. — TAKASUGI Shinji (talk) 09:42, 16 November 2019 (UTC)
I think this sort of display is more suitable to indicate graphical variants, such as 異体字(itaiji) (if the previous form was more popular in pre-1946 literature) and 旧字体(kyūjitai). It is not suitable for 代用字(daiyōji) because it confuses people into thinking that the former kanji is obsolete and has been superseded. This is not true because 同音の漢字による書きかえ is only applicable for certain compounds. I agree with User:Eirikr that alt under {{ja-kanjitab}} would be more suitable.
@Huhu9001: I'm afraid {{ja-kanjitab}} is not an etymology template. It is just a template that allows language learners to look up the constituent kanji. For Japanese entries, statements such as {{compound|ja|常用|漢字|t1=daily use|t2=kanji}} are still needed under the etymology section to clarify word formation, unlike Chinese entries which have the functionality covered under {{zh-see|type=22}}.
I think a separate template similar to {{wasei eigo}} with automatic categorization can be used at the etymology section to indicate that a particular kanji of that entry is a 代用字(daiyōji). KevinUp (talk) 10:21, 16 November 2019 (UTC)
The name of the template is also slightly problematic: using depre is not immediately intuitive. While I might see ja-toyo-depre in the wikicode and have a good chance at guessing its meaning, as an editor, I would have a harder time remembering this if I wanted to use this template. Should we (the JA-entry editing community at least) decide to keep this template, I recommend renaming it to ja-toyo-deprecated.
(@Huhu9001, in re-reading this thread, I realized that my posts might be read as opposition. On the contrary, I fully support your effort to add this kind of information. My concern is that this information 1) be added in the correct place, and 2) be added in a manner that is clear and understandable to users. ‑‑ Eiríkr Útlendi │Tala við mig18:38, 18 November 2019 (UTC))
I actually support the box-like display on top for {{ja-kanjitab}} because it can be used to indicate graphical variants such as 旧字体(kyūjitai) or 異体字(itaiji) that are more common in pre-1946 literature.
I think {{ja-toyo-depre}} needs to be renamed and its contents reworded. I would suggest changing the current statement "... is replaced by ... after the 1946 tōyō kanji reform" to become "the kanji ... is a 代用字(daiyōji)" with automatic categorization into "Category:Japanese entries containing daiyōji".
If it's a graphical kanji reform, i.e. kyūjitai to shinjitai conversion, we can use the box-display with arrows suggested above.
If it's not a graphical kanji reform, e.g. 抛 → 放, we use {{ja-kanjitab|alt=}} to specify its previous form and a statement in the etymology section to specify which kanji is a 代用字(daiyōji).
@Huhu9001: By "box-like display for {{ja-kanjitab}}" I'm referring to your original proposal at the very top of this discussion where "抛 → 放" is listed under "Kanji in this term". Note that I disagree with the usage of this format for compounds with 代用字(daiyōji). I only support such a format for graphical variants such as 旧字体(kyūjitai) or 異体字(itaiji) that have been superseded by modern forms. KevinUp (talk) 13:12, 19 November 2019 (UTC)
@Huhu9001: I don't think this template is necessary for single character kanji entries. Are you sure that the replacement of kanji A by kanji B in modern Japanese is due to the 1946 tōyō kanji reform? This sort of replacement is only true for 旧字体(kyūjitai) characters.
The document at 同音の漢字による書きかえ recommends substituting Sino-Japanese compounds containing kanji not found in the 1946 tōyō kanji list with other homophones found in the tōyō kanji list. This replacement only affects Japanese compounds. The kanji itself (when used independently) is not deprecated or replaced by other kanji, so the template {{ja-toyo-depre}} is not suitable for single character kanji entries.
"...not suitable for single character kanji entries" If this is true, the "usage notes" section of un-, sub- and partly -ish should all be removed, just because exceptions exist. -- Huhu9001 (talk) 13:24, 19 November 2019 (UTC)
Please read 同音の漢字による書きかえ. The decision to use which kanji to replace uncommon kanji was not confirmed until 1956 (昭和31年).
Different languages have different considerations. We're discussing Japanese here, not English. The idea that kanji such as 抛(hō, “to hurl”) has been replaced by 放(hō, “to release; to set free”) is incorrect because the swap from 抛(hō) to 放(hō) only occurs in compound words and is limited to a small group of kanji that are homophones. The meaning of the individual kanji remains different, unlike kyūjitai/shinjitai pairs.
@KevinUp, Huhu9001, I think it would clarify the situation, and make things more useful to our readers, if we were to also list all compounds where the older kanji was replaced, including this list on the individual kanji's page. Since the four kanji currently using {{ja-toyo-depre}} are not that extensively used anyway, this should not be all that difficult to do. I'll have a go at mocking up 抛 later today as an example of what I'm thinking. ‑‑ Eiríkr Útlendi │Tala við mig19:13, 20 November 2019 (UTC)
I'm not sure how 同音字による書き換え should be handled either. The problem is that the new character compound (e.g. 函数 → 関数) may give rise to different kyūjitai and historical kana (e.g. 函數 → *關數 and かんすう → *くわんすう), so they are probably best treated as different alternative forms that have different headword templates, rather than soft-redirects. (The forms marked with asterisks appear in dictionaries, though I wonder if they have been used at all.) --Dine2016 (talk) 04:45, 23 November 2019 (UTC)
(1) I am not only asking about pronunciation, but think it would be helpful for other information, e.g. etymology, as well. (2) It took a little more persistence to find how to do it than should perhaps be expected from a casual user. (3) Once you reach Wiktionary:Requested entries it is clear what it is for, but that was not what I was looking for, and that can be frustrating if you are not helped along. PJTraill (talk) 17:43, 14 November 2019 (UTC)
Families and scripts split into otherNames, aliases and varieties
I added support for aliases and varieties to families and scripts, as for languages and etymology languages. I also went through and converted Module:families/data from otherNames to aliases/varieties as much as I could. I will probably do that for Module:scripts/data, but one confusing thing is that we have several scripts with the same name (esp. "Arabic"). @Rua, Erutuon Do you know why this is? Why not give each script a unique canonical name, hence "Pashto Arabic", "Kazakh Arabic", etc.? Confusingly, script code xzh-Tibt is called "Zhang-Zhung" not "Tibetan", and pa-Arab is called "Shahmukhi" (with "Arabic" listed in otherNames), inconsistently with other script variants. Benwing2 (talk) 01:50, 16 November 2019 (UTC)
I guess it's for subcategories with "in Arabic script", which would be the most common term to use in contrast to any non-Arabic script. I don't believe any language uses multiple variants of Arabic. —Rua (mew) 09:29, 16 November 2019 (UTC)
Hi. First of all let me state that I love dictionaries and the possibility of exploring words, senses and translations like I can with our current Wiktionary.
I believe information wants to be free. I believe in combining and evolving structure and massage our data into a form that the community finds most useful. Our current structure is from a programming perspective unfortunately quite hard to work and interoperate with.
Say e.g. I want to create an Anki deck for my daughter to learn more Danish words. Say with a picture in front and word + sense/definition + sound on the back card. How would you do that with wiktionary? In short you don't because it involves writing a custom parser before you might get good results.
With Wikidata this should be fairly trivial to generate from the database via some sparql queries or maybe a simple script that stores the results in a form that anki can read.
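As a rough illustration of the "simple script" idea, the sketch below builds a SPARQL query for lemmas and sense glosses of one language and turns result rows into tab-separated text, a format Anki can import. It is not a tested pipeline: the item ID `Q9035` for Danish is an assumption to verify, the query relies on the lexeme predicates used by the Wikidata Query Service (`dct:language`, `wikibase:lemma`, `ontolex:sense`, `skos:definition`), the function names are made up for this example, and pictures and sound are left out.

```python
import csv
import io

DANISH_QID = "Q9035"  # assumption: Wikidata item for the Danish language


def build_lexeme_query(language_qid, limit=100):
    """Build a SPARQL query for lemmas and sense glosses of one language."""
    return f"""
SELECT ?lemma ?gloss WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
          wikibase:lemma ?lemma ;
          ontolex:sense ?sense .
  ?sense skos:definition ?gloss .
}}
LIMIT {limit}
""".strip()


def rows_to_anki_tsv(rows):
    """Turn (lemma, gloss) pairs into tab-separated lines, one card per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for lemma, gloss in rows:
        writer.writerow([lemma, gloss])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_lexeme_query(DANISH_QID, limit=5))
    # The rows here are hand-written placeholders, not query results.
    print(rows_to_anki_tsv([("hund", "dog"), ("kat", "cat")]))
```

Sending the query to an endpoint and fetching media are deliberately omitted; the point is only that structured lexeme data needs no custom wikitext parser.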
I just wrote why I find the Wikidata Lexeme project superior to Wiktionary in the long run. That said we have some information that is currently not put in Wikidata e.g. usage notes and it would be sensible to interoperate until we find out how to store and edit the data in a way that is attractive to users/readers/editors.
I'm interested in discussing how we can integrate our current wiktionary CC-BY-SA with wikidata CC0.
We have multiple subjects to discuss that I will outline below.--So9q (talk) 14:16, 18 November 2019 (UTC)
It is interesting to note that they have not been deleted from WD; that is, it seems that the Wikidata collective has de facto accepted the data as being CC0.
Both Wikipedia and OpenStreetMap have succeeded in changing the license for the whole project without any great loss of quality. I think we should talk about relicensing part or all of Wiktionary to make a migration of selected content into Wikidata possible. I don't think it is a good idea to waste good human resources on keeping WD and Wiktionary apart and updated. WDYT?--So9q (talk) 14:16, 18 November 2019 (UTC)
To understand why data are not deleted on Wikidata, please read Wikilegal/Lexicographical Data. It explains what can be protected by a CC by-sa licence and what cannot be. In very short, only senses are protected; it explains why there are mainly lexemes and forms (and not senses). Pamputt (talk) 20:25, 18 November 2019 (UTC)
Bad UI and no API support for lexemes in WD
The UI of en.wiktionary is very nice and clear, and our old NEC and the newly rewritten NEC make it really easy to enter new lexemes. Wikidata, on the other hand, presents you with a page that requires you to learn new concepts and, in addition, makes it hard by default to search for lexemes and to link translations together. Not nice. So while their structure is quite nice, their UI currently sucks compared to ours. They tried to make it a little better with a wizard, but it does not let users enter all the data you would want, e.g. sense, definition, forms, translation, etc. This makes me think that WD lexemes are still in a premature state. The differences in the handling of translations of senses also point me to that conclusion. The fact that senses and lexemes have not been separated (see my questions to the community here) also makes me worry that their data model is premature. These tickets are relevant:
API: Support editing statements on senses via wbeditentity (open)
Error when creating lexemes in languages which have an unsupported ISO 639-1 code (open)
Expose number of senses of a lexeme to SPARQL (open)
Example of what you can do with Wikidata compared to Wiktionary
I like this application a lot as it shows the power and usefulness of having data with relations (as in Wikidata) in contrast to static text (as in Wiktionary). From a child perspective I find exploring the former much more useful and fun than the latter. I also believe this will decrease the need for editing and maintenance over time as relations change more seldom than representations of data. WDYT?--So9q (talk) 14:16, 18 November 2019 (UTC)
I guess it was inevitable that Wikidata were going to steal our content, regardless of silly little things like "licenses". DTLHS (talk) 16:19, 18 November 2019 (UTC)
There are still many entries on Wiktionary that need cleanup and contain errors such as incorrect pronunciation. Copying all of these to Wikidata would only duplicate existing errors. I'm not sure how both of these projects can be merged in a way that would decrease the need for editing and maintenance over time. If we copy everything over, does this mean I have to edit both projects every time I spot an error on Wiktionary? KevinUp (talk) 19:15, 18 November 2019 (UTC)
As both a Wiktionary editor (on the French Wiktionary) and a Wikidata editor, my opinion is that Wikidata will be very useful only (or mainly) for grammatical information (declension, conjugation and so on). This information is duplicated on each Wiktionary version, and centralizing it in a central hub would benefit everyone (French people would work on the French conjugations, Spanish people on the Spanish conjugations, Russian people on the Russian declensions and so on). This information is not protectable by licences (see the link I gave above). About senses and translations, I am much more dubious. Pamputt (talk) 20:34, 18 November 2019 (UTC)
A problem I encountered is that Wikidata has no way to automatically generate inflections the way we do with templates and modules. Everything has to be entered by hand. —Rua (mew) 10:00, 19 November 2019 (UTC)
Since these Braille symbols mean things like "question", "here", and "know" — do they not have parts of speech? I suppose they don't inflect, so they are better seen as an "encoding" of the letters... but we should be consistent about Braille anyhow. Equinox◑22:32, 20 November 2019 (UTC)
Hi, I created a vote some time ago to move Akkadian lemmas to their transliteration, and the vote passed. Unfortunately, I haven't been able to move the content, because I have been very busy until now. However, now that I'm not so occupied, I would like to take this opportunity to review the subject. I'd like to know your opinions on whether transliterations are the right way to go, or whether it would be better to follow other dictionaries (like the CAD) that lemmatize at transcriptions. – Tom 144 (𒄩𒇻𒅗𒀸) 17:44, 20 November 2019 (UTC)
Chiming in very much from the sidelines, as I know next to schmotz (i.e. nothing) about Akkadian -- we've had some discussions recently about where to lemmatize Japanese terms. Dine2016 came up with a very clever approach, such that soft-redirect entries to the lemma (perhaps analogous to an Akkadian transcription entry that points the user to the lemma at the transliteration?) still contain useful information, without the need to duplicate data across multiple pages. As an example, the Japanese entry at さくら(sakura) is a soft-redirect to three different lemma entries that share the same kana spelling.
No, he does not even know which transcriptions or transliteration system to use, and how to encode them. I have elucidated it in the talk page of that vote (especially with examples in the post starting “So what will be the page titles?”). Also I do not see that content according to that vote must be moved at all or no new words may be under cuneiform, if the “consensus” is so crude and inconcrete that perhaps there is none. Fay Freak (talk) 23:08, 20 November 2019 (UTC)
I do intend to redirect other forms, such as the cuneiform, and transliterations/transcriptions to the lemma. A possible format for entries lemmatized at transcriptions is abašmû, the one for transliterations is the same as the format for cuneiform lemmas, as in pa-rá-su-um. – Tom 144 (𒄩𒇻𒅗𒀸) 23:46, 20 November 2019 (UTC)
Category:Micronesia vs. Category:Federated States of Micronesia
The reason the FSM is so-named is because it comprises four different nations within the former U.S.-administered Trust Territory of the Pacific Islands—Chuuk, Kosrae, Pohnpei and Yap—that banded together as one federated union rather than go their separate independent ways as the Marshall Islands and Palau had. Nauru and Kiribati were not part of the U.S.-administered region and became independent under separate circumstances. But all these places are part of the Micronesia region.
I propose that all boilerplate categories currently simply labeled "Micronesia" be renamed "Federated States of Micronesia". I don't know if it's current practice to allow other categories for subregions like Polynesia, Melanesia or Micronesia, but a category simply named "Micronesia" ought to refer to wider Micronesia (just as the Wikipedia article does), including not only FSM but also the wider Caroline Islands (including both FSM and the separately independent country of Palau), Kiribati, the Mariana Islands (both Guam and the Northern Mariana Islands), Nauru and Wake Island. - Gilgamesh~enwiki (talk) 04:15, 22 November 2019 (UTC)
@Gilgamesh~enwiki: It's the right place, but maybe nobody had any opinion on the topic. It makes sense enough to me, and I'd say just go ahead and do it if you can figure it out. If not I can help. — Eru·tuon22:16, 7 December 2019 (UTC)
That's the thing—I would do it, but I have no idea how the automatic categories are set up under the hood. All I've been doing is create missing parent categories and adding {{auto cat}}. What I want to do is have Category:Micronesia and its language-associated subcategories apply to the international region, and Category:Federated States of Micronesia refer only to the federated Yap-Chuuk-Pohnpei-Kosrae polity. This and other extant categories for polities in the Micronesian region would also be subcategories of Category:Micronesia. And if there's a Category:Micronesia, it may also be appropriate to have separate categories for Category:Polynesia, Category:Melanesia and Category:Australasia, with some overlap between the three. (New Guinea is in both Australasia and Melanesia, New Zealand is in both Australasia and Polynesia, etc.) - Gilgamesh~enwiki (talk) 22:55, 7 December 2019 (UTC)
@Gilgamesh~enwiki: Well, the module that creates the category descriptions can be accessed by the "Edit category data" button at the top right of the category description. For Category:Micronesia, that's Module:category tree/topic cat/data/Earth. There, you can search "Micronesia" and edit the category description, and add other categories for regions or nations using the currently existing categories as templates. I think the module format is fairly self-explanatory.
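The category layout proposed in this thread, where a polity can sit under more than one region, can be pictured as a simple parent map. This is a hypothetical sketch in Python and is not the format of Module:category tree/topic cat/data/Earth; the memberships follow the posts above, and the names `PARENTS` and `children_of` are invented for illustration.

```python
# Hypothetical parent map based on the memberships named in the discussion.
# A real change would be made in the category data module, not in Python.
PARENTS = {
    "Federated States of Micronesia": ["Micronesia"],
    "Palau": ["Micronesia"],
    "Nauru": ["Micronesia"],
    "Kiribati": ["Micronesia"],
    "New Guinea": ["Australasia", "Melanesia"],
    "New Zealand": ["Australasia", "Polynesia"],
}


def children_of(region, parents=PARENTS):
    """List the categories that name `region` as one of their parents."""
    return sorted(child for child, regions in parents.items() if region in regions)


if __name__ == "__main__":
    print(children_of("Micronesia"))
    print(children_of("Australasia"))
```

Storing a list of parents per category, rather than a single parent, is what allows the overlap between Australasia, Melanesia and Polynesia mentioned above.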
It's time to vote in the Community Wishlist Survey 2020, and this year is very special as all proposals are for small community projects, such as ours. It may be the only year when it's possible to have one of our proposals in the top 5. Last year, the best one about Wiktionary had less support than the last of the Wikisource proposals. Let's not let this happen again! Please consider having a look at the proposals, at least the 20 for Wiktionary, and support those you think are useful for you and for your community! Noé 09:08, 22 November 2019 (UTC)
Scores are changing fast, and the Wiktionary proposal in the top 5 could be ejected by only 2 votes. Help is still appreciated. Noé 10:41, 1 December 2019 (UTC)
Please support Insert attestation using Wikisource as a corpus. This is the most likely winner among the Wiktionary proposals. It would be a great help to provide attestation for the older terms we have, whether completely uncited, cited only by terse reference to an author, or cited by a too-short passage and no link to fuller context. We already have more than 10,000 pages with {{rfquotek}}, many with multiple uses per page, for which Wikisource would be a good source. This excludes the many noun and verb entries to which I have not yet systematically applied the template and also the pages that reference various bible editions. DCDuring (talk) 19:39, 1 December 2019 (UTC)
Other proposals of use to Wiktionary could win because participation across projects is low, Wikipedia projects being excluded this year. DCDuring (talk) 19:39, 1 December 2019 (UTC)
Hey, the vote is over and the results are to be published tomorrow, but it sounds like the proposal mentioned by DCDuring is about to be selected. That's great news. I think it could be a nice tool for bringing in new participants at non-dedicated edit-a-thons: "Hey, look how easy contributing to Wiktionary is!" Still, it's only one development, and the 19 others remain unsolved. I thought our interlingual user group might be of some help in this survey, but it wasn't so effective. I initiated a conversation about the demotivational effect of this survey on contributors in Wikimedia Space. This is a service provided by the Wikimedia Foundation to set up conversations with functionalities from forums and mailing lists. There are mostly people from the Wikimedia Foundation there, and it's interesting to reach them, but there are also contributors from various communities. Feel free to express your opinion here or there. Noé 15:32, 5 December 2019 (UTC)
Unsourced Claims about the Voynich manuscript and Chinese
Dear all- I have just now deleted a large section of material on the Voynich manuscript page which was comprised of three paragraphs and an image which were nothing but unsourced claims about alleged connections between Asiatic languages and the Voynich manuscript. The content had been on the website essentially unchallenged since 2004.
I invite you to take a look at my triage work on that page.
I think they should simply be merged. I also think “spelled” reflects the situation more accurately than “read”. --Lambiam 22:40, 23 November 2019 (UTC)
category = "Japanese terms read with on'yomi",
category = "Japanese terms read with kun'yomi",
category = "Japanese terms read with nanori",
category = "Japanese terms read with yutōyomi",
category = "Japanese terms read with jūbakoyomi",
category = "Japanese terms read with jukujikun",
category = "Japanese terms with irregular kanji readings",
category = "Japanese terms read with kan'yōon",
The categories listed above belong to Category:Japanese terms by reading pattern. I think the current definition of 熟字訓 (jukujikun) is slightly incorrect. Jukujikun refers to the reading of a kanji compound based on its meaning, and not to the kanji compound containing such a reading pattern.
Basically, the "Japanese terms read with ..." categories are not incorrect because they are referring to the reading pattern, e.g. on'yomi, kun'yomi, etc.
So I take it you see jukujikun as kind of short for jukuji-kun’yomi. I took it as meaning: the kun is that of the jukuji (and kun’yomi: the yomi is that of the kun). As to the overlap, shouldn’t 守宮 (yamori) be categorized as jukujikun? --Lambiam 10:50, 24 November 2019 (UTC)
Include "Reconstruction notes" and "Alternative reconstructions" in WT:EL?
These sections are occasionally used on pages of reconstructed entries (e.g. *kʷel-), and I would like to standardise them. So far I've placed them in the same location as the usual "Usage notes" and "Alternative forms" sections, but I don't know what relative ordering should be used for entries that have both "Alternative forms" and "Alternative reconstructions". They are not mutually exclusive, since "Alternative forms" lists forms that were synchronically alternatives in the language, while "Alternative reconstructions" handles uncertainties in the reconstruction itself. I would also like to codify that "Alternative reconstructions" isn't meant for differences in notation that don't actually amount to a phonologically different word; words in this section should be spelled in Wiktionary's agreed-upon notation for the language, if there is any. Finally, some entries use "Reconstruction" alone instead of "Reconstruction notes". I propose renaming the former to the latter where applicable. —Rua (mew) 09:32, 24 November 2019 (UTC)
I certainly agree that "Reconstruction notes" should be used rather than "Reconstruction", because the entire entry is a reconstruction. I also agree that "Alternative reconstructions" should be used primarily for substantive differences in reconstruction rather than mere orthographic differences, but there may be times when including mere orthographic differences is acceptable, especially for lesser-researched proto-languages where it might not be clear whether two scholars' reconstructions are different in substance or only in notation. (Especially in cases where the two scholars in question might be the only two people on the planet who have worked on reconstructing that language.) In such cases I'd err on the side of including both spellings. —Mahāgaja · talk16:02, 24 November 2019 (UTC)
This is descriptive of accepted usage. As long as one knows where it’s from it’s okay, or do you contest the header “Alternative reconstructions”? (A bit late, innit …) However if I had added it I would have added it in the section “Headings before the definitions” too; putting it under “Headings after the definitions” only seems to reflect Rua’s personal preference. And now I even see that “Reconstruction notes” can come before the definitions as well, as for having it near the etymology, as I have done at *čǫbьrъ – but Rua of course correspondingly suggested to have etymologies under part-of-speech headers so following these logics it does not go before, however these logics are so far not cogent and ’tis lamentable that she has not added it as so far actually acceptable. Fay Freak (talk) 08:27, 4 January 2020 (UTC)
@Fay Freak: I missed your reply. Using "Alternative reconstructions" as a header has been suggested several times and I used to add them myself, but I stopped because I was getting reverted (ironic if it was Rua, but sadly I don't recall). We should have a two-fold vote on it: 1) should "Reconstruction notes" and "Alternative reconstructions" become accepted headers, and 2) where should they be placed? I was hoping @Rua would create one. --{{victar|talk}} 20:51, 6 March 2020 (UTC)
@Victar The layout of your vote in those code boxes is strange: Alternative forms to: Alternative reconstructions does not work. What would need a vote is to include Alternative reconstructions and Reconstruction notes; but not necessarily under the part of speech header, or is this vote indeed about making reconstruction pages different from mainspace pages? All can well be alternatively at the etymologies, the order probably alternative forms–alternative reconstructions–reconstruction notes (while there is a fluent transition from “content normally found under === Etymology ===”), and one can have the vote thus extend as well to the mainspace pages to include === Etymology === and === Alternative forms === there under part of speech too. One needs good faith of course that certain revertards would not abuse the presence of multiple formatting possibilities, having Alternative forms, Alternative reconstructions and Reconstruction notes before etymology and after part of speech. Fay Freak (talk) 13:09, 4 April 2020 (UTC)
IMO "Reconstruction notes" should replace "Usage notes", since reconstructions have no usage, and "Usage notes" is analogous in its varied use in entries. Still, Reconstruction entries are strange because they're really an extension of the etymology sections, so they're heavy on stuff that shouldn't be in entries. As I see it, there are at least a couple of types of notes we should allow for:
One is for discussing the usual type of things found outside of etymology sections: the definitions/semantics, morphosyntax/inflection, descendants, etc. In other words, what we know or can deduce about the term as part of a language that we're guessing existed at some time. This kind of note would mention such things as influence of name-avoidance taboos on names for dangerous animals among descendant languages, dialectal variation, inflection classes and accent patterns, etc.
The other is for discussing how the construction was arrived at, what evidence there is for it, and what the scholarly consensus is about it- more etymological and technical in nature.
@Chuck Entz: Reconstructions had usage (if they aren’t bad), so we still deploy “usage notes” (but less often since the details that else go thither are not well known). “Reconstruction notes” sections include remarks on disputes about the form, as far as I have seen it. But as you have seen there are fluent transitions and we can well have neither ===Reconstruction notes=== nor ===Alternative reconstructions===. I see now that we might peruse a header like ===Alternative forms and reconstructions=== because one is sometimes confused enough not to know what one wants to state as alternative form and what as alternative reconstructions. Perhaps the distinction is an emanation of our hybris of trying to distinguish what is real and what is of disputable reality, while in each case somebody imagined a form to be real. In another point of view, ===Alternative forms=== will always contain but imagined forms. Fay Freak (talk) 10:47, 5 April 2020 (UTC)
@Victar: Well I think it without embracing any side; just pointed at the possible outcomes, it’s Möglichkeitsdenken and not Wirklichkeitsdenken. What would be more a concern is an inflation of headings and their orders to memorize by users, which can be considered a burden and retrogressive. But maybe there is little interest in having such a mental map. I don’t think in headings anyway. First comes what I want to transmit to the reader as what one would expect from a lexicographer then it gets pressed into entry layout, in one of multiple possible ways. So I can only ridiculize many zealous sticklings for “logical” orders. As long as things still fit and are not too odd, the merriest attitude is that of an anarchist. Fay Freak (talk) 02:58, 8 April 2020 (UTC)
Hello all. I want to bring up a major difficulty that I see on our website for your consideration. I believe our set up on Wiktionary is in error concerning Chinese.
Please keep in mind that I only want to talk about the academic value of the problem I pose here, not the practical consequence of the discussion. Once the academic discussion has been concluded, then the practical implications and requirements of our discussion can be brought up. Let me start with a (real) quote from Bertrand Russell to ground the discussion:
When you are studying any matter, or considering any philosophy, ask yourself only what are the facts and what is the truth that the facts bear out. Never let yourself be diverted either by what you wish to believe, or by what you think would have beneficent social effects if it were believed. But look only, and solely, at what are the facts.
In that spirit, I would like to point out a few things which may make some angry or uncomfortable. Please don't hate me; I am attempting to directly address what I see as a major factual problem. I could be wrong of course, but I would like to be convinced of an alternative viewpoint if it were possible.
First: 'Chinese' is not included in the List of languages by number of native speakers page on Wikipedia or in Ethnologue. Chinese is "broken up" among the various dialect-language-fangyan-topolect things (in fact there is nothing to be "broken up", unless you want to say, by analogy, that we are "breaking up" the Indo-European language by showing regional variations in Spain, France, Italy, etc.).
Second: the premise of our dictionary is that all languages have their own language header. Because of this premise, we are able to derive the number of languages we document on Wiktionary on the homepage as '4000'. However, after I pointed out Chinese and Translingual were not languages, this statistic has been more fully recognized by the community as uncertain or erroneous. Now we write at "The total count of language headers includes living languages such as English and Spanish, dead languages such as Old English and Etruscan, reconstructed languages such as Proto-Indo-European and Proto-Turkic, constructed languages (including appendix-only constructed languages) such as Esperanto and Lapine and macrolanguages, such as Norwegian (alongside Norwegian Bokmål and Norwegian Nynorsk). In addition there is a separate header for Chinese and a special header—translingual—for terms used in more than one language that cannot be said to belong to any one language, such as CJKV characters and scientific names of organisms."
Third: There are eight separate Wikipedias for the different languages covered under the language header 'Chinese' such that our page 中國/中国 (Zhōngguó) has links to eight different Wikipedias. In fact there are more "Chinese-es", that have yet to create a Wikipedia.
Fourth: There is a header for Old English and Middle English, but there is no header for Old Chinese or Classical Chinese. Classical Chinese has its own Wikipedia (lzh). The page for The Chinese Language: Fact and Fantasy says: "There is no unique "Chinese language". There is a group of related ways of speaking, which some may call dialects, others call "topolects" (a calque of Chinese 方言, fāngyán; DeFrancis uses the term "regionalects"), and still others would regard as separate languages, many of which are not mutually intelligible. One such variant, based on the speech of the Beijing area, has been chosen as the standard language in the People's Republic of China, and is now known as Pǔtōnghuà, or "common language"."
I say let's follow Ethnologue, friends. Ethnologue says that there's a Mandarin Chinese and a Wu Chinese etc., and they list them alongside English, Hindi, Spanish, French, Standard Arabic, Bengali, Russian, Portuguese, Indonesian, Urdu, Standard German etc. A final quotation for you to consider, from the Chinese language page: "Chinese is a language family that forms the Sinitic branch of the Sino-Tibetan languages. Chinese languages are spoken by the ethnic Chinese majority and many minority ethnic groups in China."
Thanks for your time reading my post. These are just my thoughts on the matter and I don't mean to demean anyone's opinion or viewpoint. I think the officials at the Ministry of Education of the PRC would probably side with me in my position. "According to the Ministry of Education, China-as a country with more than 130 ethnic minority languages and 10 major Chinese dialects, has been taking active measures for the protection of languages resources." Given the way we break up English and other languages, why not give Wu Chinese etc their own headers? If you have the littlest doubt that these languages- Mandarin, Cantonese, Hakka, Wuu, Min Nan, Min Dong, Xiang, Gan, etc. are independent languages, then I say give them the benefit of your doubt. What's the harm? An argument from practicality is not what we need: we need to realize that having "Chinese" as a header is accidentally misleading the readers of the website into the belief that Chinese is a language (because we say that we have 4000 languages on Wiktionary on the basis of how many headers we have). Thanks for your time and consideration. --Geographyinitiative (talk) 01:46, 27 November 2019 (UTC)
The "Old English and Middle English" example reminds me of the English dictionary market. Among the historical dictionaries, we have both the diachronic Oxford English Dictionary, which traces the history of every "English" word in use since 1150 in a seamless flow under the modern spelling, and the synchronic Dictionary of Old English and Middle English Dictionary, which deal exclusively with one period of English. If we're to draw an analogy, our current treatment of Chinese is like the former type (except that it is cross-topological as well as diachronic) and in the end proved to have had better coverage of the Chinese languages.
Calling Chinese a language family is one thing; breaking Chinese entries into several sub-entries is another. I totally support renaming the ==Chinese== header to something like ==Chinese languages== or ==Sinitic==, and I wouldn't object to giving separate treatment to the individual languages alongside. This is like having both types of historical English dictionaries mentioned above. For example, a separate Min Nan L2 section of an entry can use Pe̍h-ōe-jī instead of pinyin as the default romanization scheme and define relation in terms to the vernacular language instead of Modern Standard Written Chinese, which may be more useful to Min Nan learners. --Dine2016 (talk) 04:24, 27 November 2019 (UTC)
SUPPORT I support the great user Dine2016's proposal. Unless there is some reason why pretending Chinese is a language is the only way to understand the situation, there needs to be some kind of note on every page with "Chinese" saying that Old Chinese, Middle Chinese etc. are not necessarily all one language. We need something like that now. Have mercy on the readers and let our dictionary be an honest dictionary. --Geographyinitiative (talk) 05:53, 27 November 2019 (UTC)
How long am I going to have to fight to get you all to accept reality? Chinese is not a language (not even on Wikipedia) and we are misleading the readers. --Geographyinitiative (talk) 06:01, 27 November 2019 (UTC)
I didn't mean the "Chinese" header should be abandoned. Some common information like Template:Han etym can go there. Instead it is the details that should be moved to the corresponding dialectal headers. -- Huhu9001 (talk) 14:40, 27 November 2019 (UTC)
Which details? I find it convenient to search for a Chinese sign and find all the related pronunciations under it. For now it’s only sometimes, for etymologies of wanderworts, since I do not deal with Chinese, but if one wants to learn all the Chinese dialects this is efficient. Like I open a Proto-Slavic page to memorize the descendants. Language learning in the 21st century. Fay Freak (talk) 15:38, 27 November 2019 (UTC)
"if one wants to learn all the Chinese dialects" Not really. By mixing a lot of things here it is often more confusing than efficient. The learners may think they have mastered "all the Chinese dialects", but that is very likely an illusion. -- Huhu9001 (talk) 16:42, 27 November 2019 (UTC)
@Geographyinitiative, you claim the unification of the Chinese lects under a single L2 heading is a major problem. You present no solutions, only a perceived problem, and not a clearly described problem at that. Apparently you think Wiktionary is being fascistic, that Wiktionary is not "morally correct", that Wiktionary is "dangerously inappropriate", that Wiktionary is "evil". These are not rational arguments.
As has been pointed out to you before, there are many organizational benefits that come from the current unified approach. You have not articulated any clear benefits from splitting everything out into separate L2 headers, nor have you addressed the enormous amount of work that would necessitate, nor the enormous amount of data duplication and consequent maintenance nightmares.
As @Dine2016 and @Huhu9001 have noted, there are some specific usability issues that they've encountered in working with the current system. However, Dine2016 proposes reorganizing under a single L2, perhaps renamed from "Chinese" to something else, but still under a unified structure. Huhu9001's suggestion I find a bit more confusing, but even their proposal maintains a unified structure to some extent.
What is the problem you see? Can you explain it in non-moralistic terms?
What solution do you propose? Can you explain it in terms of non-moralistic benefits that the proposal would bring? Can you address how those changes would be implemented, and how the proposed benefits would outweigh the advantages of the current structure?
I also suggest that you set aside the "fight" vocabulary. Fighting with someone does not help bring them into positive agreement with you. If you approach community engagement from a perspective of violent opposition, you've already lost the initiative for achieving a peaceful, mutually beneficial resolution.
This has been discussed ad nauseam. The current system is fantastic (speaking as a longtime Mandarin learner) and is the product of a lot of generously given free labour by editors who deal in the Chinese lects. And Eirikr hits the nail on the head as to why this rant has no leg to stand on. Also disappointed that none of the major Chinese contributors were pinged. —AryamanA (मुझसे बात करें • योगदान) 22:37, 27 November 2019 (UTC)
Re: Geographyinitiative's argument, I'm open to being convinced -- provided that a clear proposal can be made for a different approach where the benefits outweigh the impacts. My last two bullet points above were intended as coaching for how to build a better argument. What we have so far ... doesn't paint a compelling picture. ‑‑ Eiríkr Útlendi │ Tala við mig 23:26, 27 November 2019 (UTC)
Thank you for doing that. Obviously, I am not a major Chinese contributor and so I will wait for them to discuss it. But I do feel like Geographyinitiative's ideas are a step back since Chinese unification was such a massive project. —AryamanA (मुझसे बात करें • योगदान) 23:37, 27 November 2019 (UTC)
The only major argument I see against the unified Chinese approach is that Chinese may be misunderstood as being one language. It is highly unfeasible to undo the tremendous amount of work that has gone into making the Chinese/Sinitic infrastructure possible and building a huge inventory of entries, so I think disunifying is out of the question. I would be fine with Dine2016's proposal of changing the header name, preferably "Sinitic" if we do change, but I'd also like to point out that Chinese is in fact considered a "macrolanguage" in Ethnologue, and the name for it is "Chinese". I think macrolanguages should be allowed as headers alongside languages. In fact there are other macrolanguages that we treat the same way as we do Chinese, such as Zhuang and Albanian (and maybe Arabic in the future).
Also, I do want to point out that Ethnologue, while a great resource, doesn't do justice to the Chinese topolects. There are areas of controversy concerning classification, such as lumping Pinghua into Cantonese, lumping Hainanese into Min Nan, lumping Shehua into Hakka, not clearly dealing with Tuhua (spoken in Hunan, probably related to Pinghua), etc. Ethnologue/ISO 639-3 should only be used as a guideline, not as the authority on how we organize our entries. — justin(r)leung{ (t...) | c=› }01:15, 28 November 2019 (UTC)
@Aryaman, Eirikr, Justinrleung As Old English and English are not the same thing, so too Old Chinese and modern-day Mandarin Chinese are just not the same thing guys. The issue has to come up again and again because it's part of the big lie we're being sold. How long are we going to be misleading the readers? How long are we going to be complicit? With some study, I can parse Middle English, even Old English. One language header, one language- that's the principle of this website. If you still can't handle that truth that this website is pushing Mandarin chauvinism, then at least add a note like Dine2016 has brought up. You can see an example of that kind of note at 暑假, where a question mark is written next to "Written Standard Chinese" (the 1984-style dissimulation term for written Mandarin Chinese). --Geographyinitiative (talk) 21:56, 30 November 2019 (UTC)
@Geographyinitiative: We are not saying that Old Chinese and modern-day Mandarin are the same thing, but the truth is that people still read Old Chinese in the modern lects - so what do we do about that? Do we have Mandarin/Cantonese/etc. pronunciations under a header labelled Old Chinese, or should we stick with what we do now and label a sense as "literary" or "classical" to mean that it is not common in modern lects? Sure, we could follow other languages in splitting diachronically, but they are also arbitrary lines because the history of a language is not categorical.
Also, I agree with Eirikr that we need to get past the (subjective) moralistic judgments on the current framework. We are not at all pushing Mandarin chauvinism - which is why it's not labelled as dialects of "Mandarin". It just so happens that the standard variety of Chinese across the Sinosphere (except for maybe Hong Kong and Macau, but even then, the written form used in official documents is still "Written Standard Chinese" - more on that later) is indeed a variety of Mandarin. I also want to say that the line separating languages is a sticky situation. Taishanese is hardly mutually intelligible to Guangzhou/Hong Kong Cantonese speakers - yet Ethnologue treats them as the same language. Defining what is and what isn't a language is complicated - the "one language header, one language" principle is an idealization that cannot be perfectly realized in Chinese. I would opt for staying in the current framework, which has definitely increased coverage of Chinese varieties much more than before the unification. Unless someone has a brilliant and cohesive plan that would work better than the current framework, I would not agree to any major change to how we present the Chinese languages on this platform.
Side note: Not to be picky, but "Written Standard Chinese" is not the same as "written Mandarin Chinese". "Written Standard Chinese" is more specifically a written form of standardized Chinese (which is taught in schools across the Sinosphere and is indeed based on Mandarin, at least in terms of grammar) - so it would generally not contain colloquialisms/regionalisms. Note that it's the "Chinese" Wikipedia zh that it's referring to, not Mandarin cmn. — justin(r)leung{ (t...) | c=› }00:37, 1 December 2019 (UTC)
I am with Justin on this. Deunifying the contents shared by Chinese/Sinitic topolects would be a major step back, which won't help this dictionary project.
Geographyinitiative has been acting emotionally and politically. He somehow thinks that this is a social platform and that unifying the Sinitic languages/topolects, which share Chinese characters, is somehow a blow to the independence of individual Chinese varieties, specifically Min Nan, or a form of oppression by Beijing.
I strongly oppose splitting Chinese back into its L2 headers. We already had this and it didn't work. Moreover, talking about this and getting some sympathisers is one thing, but acting unilaterally is another. Geographyinitiative was temporarily blocked for his unilateral actions. I also have serious doubts that he has a good understanding of how dictionaries should work; he ignores previous agreements and conventions. He is OK with sacrificing consistency (e.g. lumping up all possible transliterations for various topolects together or displaying all attested transliterations with various spacing and capitalisations). Overall, the user doesn't suggest any good framework that would help Chinese topolects and dialects to thrive. --Anatoli T. (обсудить/вклад) 02:41, 1 December 2019 (UTC)
From a technical point of view, all of these lects are treated as independent languages with individual categories available, e.g.:
As discussed here, I would prefer to have labels such as {{lb|zh|mainly|Cantonese}} used for senses that are specific to a certain lect for proper categorization. I would also like to point out that Chinese is also a literary language. If someone writes 臭豆腐(chòudòufu) we are not able to determine which lect is being referred to unless the word is spoken. KevinUp (talk) 07:07, 6 December 2019 (UTC)
===Pronunciation===
* Phonetic transcriptions
* Audio files in any relevant dialects
* Rhymes
* Homophones
* Hyphenation
However, when the rhyme is added to an entry with the accelerated method (via the rhyme page), the code places it at the bottom of the Pronunciation section. Which one is correct? Panda10 (talk) 20:33, 27 November 2019 (UTC)
ELE was actually voted on, so it's supposed to govern. Placing the new item at the bottom was probably just convenient, especially because we had relatively few homophones and fewer hyphenations. DCDuring (talk) 02:24, 28 November 2019 (UTC)
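The placement question above can be illustrated with a minimal Python sketch that inserts a new pronunciation line at the position implied by the ELE order quoted in the question, instead of appending it at the bottom. This is only an illustration and not the code used by the accelerated rhyme tool: the mapping from template names to item kinds and the helper names `kind_rank` and `insert_in_ele_order` are assumptions made for this example.

```python
import re

# Order of items in a Pronunciation section, following the ELE list quoted above.
ELE_ORDER = ["transcription", "audio", "rhymes", "homophones", "hyphenation"]

# Assumed mapping from template name to item kind (illustrative, not exhaustive).
TEMPLATE_KIND = {
    "IPA": "transcription",
    "audio": "audio",
    "rhymes": "rhymes",
    "homophones": "homophones",
    "hyphenation": "hyphenation",
}

# Captures the name of the first template on a line, e.g. "IPA" in "{{IPA|en|...}}".
_TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*[|}]")


def kind_rank(line: str) -> int:
    """Return the ELE position of a pronunciation line; unknown lines sort last."""
    match = _TEMPLATE_RE.search(line)
    kind = TEMPLATE_KIND.get(match.group(1)) if match else None
    return ELE_ORDER.index(kind) if kind in ELE_ORDER else len(ELE_ORDER)


def insert_in_ele_order(lines: list[str], new_line: str) -> list[str]:
    """Insert new_line after the last existing line whose rank is <= its own."""
    rank = kind_rank(new_line)
    position = 0
    for index, line in enumerate(lines):
        if kind_rank(line) <= rank:
            position = index + 1
    return lines[:position] + [new_line] + lines[position:]
```

With this approach a rhyme line added to a section that already has a transcription, an audio file and a homophone line would land between the audio and the homophones, which matches the voted ELE order rather than the bottom-of-section placement.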
Currently an entry might specify a synonym without the latter mentioning the former, as happens for example with like mad and like crazy. Is there a solution to automatically add synonyms in both directions? --Backinstadiums (talk) 10:59, 29 November 2019 (UTC)
This can hardly be fully automatic, because terms may have different senses, and then we want to label its synonyms with the sense for which they are a synonym, using {{sense|...}}. For example, the British term quits has a synonym even, but even has many senses, so we cannot automatically add quits to the entry as a synonym. I can imagine a tool that facilitates adding synonyms but still requires user interaction, but I am not sure the added value justifies the considerable effort of designing and implementing it. --Lambiam 00:36, 30 November 2019 (UTC)
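A semi-automatic helper of the kind imagined above could be sketched as follows, in Python. It only reports missing back-links and flags those that need a human decision because the target entry has more than one sense. The data model (a dictionary of entries, each mapping a sense gloss to its listed synonyms), the `Suggestion` type and the function name are all hypothetical, and the glosses in the usage example are made up for illustration; nothing here reflects an existing Wiktionary tool.

```python
from dataclasses import dataclass

# Hypothetical in-memory model: entry title -> sense gloss -> listed synonyms.
Entries = dict[str, dict[str, list[str]]]


@dataclass(frozen=True)
class Suggestion:
    target: str         # entry that lacks the back-link
    synonym: str        # term that should be listed there
    needs_review: bool  # True when the target has several senses


def missing_backlinks(entries: Entries) -> list[Suggestion]:
    """List synonym links that are not reciprocated by the target entry."""
    suggestions = []
    for term, senses in entries.items():
        for synonyms in senses.values():
            for target in synonyms:
                target_senses = entries.get(target)
                if target_senses is None:
                    continue  # no entry exists for the target yet
                already_listed = any(term in syns for syns in target_senses.values())
                if not already_listed:
                    # With several senses, an editor must choose the right one.
                    suggestions.append(
                        Suggestion(target, term, len(target_senses) > 1)
                    )
    return suggestions
```

For the two cases mentioned in this thread, such a helper would report that like crazy lacks like mad (one sense, so the addition is unambiguous) and that even lacks quits (several senses, so the editor still has to pick the sense and add the {{sense|...}} label by hand).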
Proto-Balto-Slavic accentology, and its changes on Wiktionary.
1. I will begin with a call to respect chronology in phonetic changes, and with a request to change the etymology of this nonsense: https://en.wiktionary.org/wiki/Reconstruction:Proto-Slavic/kor%C4%BE%D1%8C
So many changes cannot happen in one century: two accentological laws (Dybo's law and the Ivšić-Stang law), the quantitative differentiation of long and short vowels, iotation, the shift *u ⇒ *i, and quantitative alignment!
The correct derivation is: Old High German Karl ⇒ late Proto-Slavic *kõrľь, with accent paradigm b. The Ivšić-Stang law is not involved here!
2. I also call for the use of the Proto-Balto-Slavic accentological system as now reconstructed by the Moscow Accentological School (hereafter MAS). This "revolution" is 40 years old; it is not "new in linguistics".
The History:
Christian Stang reconstructed Proto-Slavic accentology without seeing a connection between neoacute-type (new acute) stress and anything Baltic.
Later, V. A. Dybo proved (his hypothesis was used by Illich-Svitych in his 1965 book) that in Lithuanian the accent stayed in its PIE position, while in Proto-Slavic the accent was shifted to the syllable to the right (Dybo's law).
However, the MAS then began to study the exceptions to Dybo's law, and it was found that for early Proto-Slavic the accent has to be reconstructed in the same position as in Lithuanian.
To explain everything that Stang had left unexplained, complex schemes were built during the "devoid of clarity" period of the 1960s and 1970s; these assumed multiple shifts of the stress to the right and retractions back onto the syllable that had originally carried the stress in Proto-Balto-Slavic.
Thus the "MAS concept" was announced and promulgated in the 1980s and developed in detail in the 1990s and 2000s.
The "MAS concept" explains an order of magnitude more facts than Stang's concept or the even more hyperconservative Leiden concept, so the theory of the Moscow Accentological School is, in our time, closer to the scientific truth. Of course, the Moscow Accentological School is not the last word in science: perhaps the PBS and especially the PIE systems were far from both the Leiden and the Moscow models, although the Moscow model comes markedly closer.
3. Please add the following notation + accent paradigm d:
1. õ — Proto-Balto-Slavic dominant circumflex ⇒ õ — early Proto-Slavic neoacute (new acute), preserved in the old place or shifted to the right by Dybo's law;
2. ó — Proto-Balto-Slavic dominant acute ⇒ ő — early Proto-Slavic «old acute»;
3. ô — Proto-Balto-Slavic recessive acute together with ȏ — Proto-Balto-Slavic recessive circumflex ⇒ ȏ — early Proto-Slavic circumflex;
4. ò — late Proto-Slavic gravis, denoting the transferred stress;
5. ô — the Proto-Slavic notation for the intonation of the nom.-acc. sg. in the mixed accent paradigm D (with the circumflex intonation characteristic of enclinomena forms in the mobile accent paradigm C), which is a variant of the Proto-Slavic oxytonic accent paradigm B.
4. Add valences + and -
5. Please add the sources of V. A. Dybo, S. L. Nikolaev, A. A. Zaliznyak and V. M. Illich-Svitych on accentology:
a) V. M. Illich-Svitych (1963). Именная акцентуация в балтийском и славянском.
b) V. M. Illich-Svitych (1964). Следы исчезнувших балтийских акцентуационных систем.
c) V. A. Dybo (1981). Славянская акцентология: Опыт реконструкции системы акцентных парадигм в праславянском.
d) A. A. Zaliznyak (1985). От праславянской акцентуации к русской.
e) S. L. Nikolaev (1986). Греческая протеза и балто-славянский акут.
f) S. L. Nikolaev (1989). Балто-славянская акцентуационная система и её индоевропейские истоки.
g) V. A. Dybo, S. L. Nikolaev, G. I. Zamyatina (1990). Основы славянской акцентологии.
Then the concept changed — "drawn accents" began to be treated as early Proto-Slavic stress.
a) V. A. Dybo, S. L. Nikolaev, G. I. Zamyatina (1993). Основы славянской акцентологии. Акцентологический словарь.
b) V. A. Dybo (1997). Балто-славянская акцентологическая реконструкция и индоевропейская акцентология.
c) S. L. Nikolaev, V. A. Dybo (1998). Новые данные и материалы по балто-славянской акцентологии.
d) S. L. Nikolaev (1998—1999). Рефлексы праславянских тонов в восточнославянских языках.
e) V. A. Dybo (2000). Морфонологизованные парадигматические акцентные системы: Типология и генезис.
f) S. L. Nikolaev (2009—2011). Восточнославянские рефлексы акцентной парадигмы d и индоевропейские соответствия славянским акцентным типам существительных мужского рода с о- и u-основами.
g) V. A. Dybo (2011). Балто-славянская акцентная система как рефлекс «западноевропейского» варианта праиндоевропейской акцентной системы.
h) V. A. Dybo (2013). Балто-славянская акцентная система и итоги индоевропейской акцентологической реконструкции.
i) V. A. Dybo (2019). Ещё раз о праиндоевропейском характере двух праславянских акцентных парадигм глагола.
I thought that it was our unofficial policy that, in the translation section, when giving Norwegian translations, if the Nynorsk and the Bokmål terms are identical, then one can show the translation under the banner of Norwegian alone, instead of redundantly showing nn and nb separately. However, it seems that some users do not approve of this (as in this case: diff). What are the folk's thoughts? Thanks. —Lbdñk (talk) 19:02, 30 November 2019 (UTC)
It makes sense to show specific lects because having just the more general could also mean “I don’t know / have not checked / don’t care whether there is a difference, I just know that there is this”. Fay Freak (talk) 21:01, 30 November 2019 (UTC)
I don’t know about that unofficial policy, but I see a drawback of the practice. Suppose that in the translations section of English Advent calendar we use “Norwegian: adventskalender”. Then the link does not go to any Norwegian section on that page but to the whole page. --Lambiam 22:02, 30 November 2019 (UTC)