I don’t mind keeping these, but it does feel a bit eyeroll-inducing when so much time has gone into adding this stuff, when we’re missing a lot of far more notable places elsewhere. Theknightwho (talk) 10:16, 3 July 2022 (UTC)
@Ghirlandajo You were told to take the issue to RFV, which is the appropriate place, not to “take your grievances elsewhere”. You also claimed in your edit summary that they aren’t English terms, which is a different argument to the one you’re making here. Theknightwho (talk) 02:18, 3 July 2022 (UTC)
Crowdsourcing means relying on volunteers for contributions. If we impose restrictions on what the volunteers want to enter, they tend to stop contributing. Some start by contributing low-value entries that interest them for some reason (their local toponyms, occupational terms, local slang, etc) but eventually stay to contribute material that is of greater value. If a contributor makes contributions that are more trouble (poor formatting, wording, etc) than they are worth, the contributor can be reasoned with and then blocked, if reason doesn't work. DCDuring (talk) 17:14, 3 July 2022 (UTC)
The statement of principle may be good, but the particular contribution was made by a veteran contributor, who could do better than contribute probably unattestable toponyms. DCDuring (talk) 14:40, 4 July 2022 (UTC)
@Inqilābī What "LOL", huh? You're one of the most annoying editors here (inb4 says fckn Shumkichi, I knnow, I know :3). I still remember your unfunny "joke" about Russia and Ukraine, the distaste remains. Two quick questions: 1) why are Indians so pro-Russian, is it some kind of Stockholm syndrome or what? 2) why do we even allow pro-Russian users to edit in the first place? I think we should introduce some form of thought police here because everyone must unanimously support Ukraine (i.e. be a normal, decent person - why is that so hard for some of you?). I will only forgive you if you publicly admit your guilt and repent for your words. Shumkichi (talk) 11:05, 4 August 2022 (UTC)
@Inqilābī How am I a liar? You were joking about Ukraine to piss me off a few weeks after the war broke out so who's the liar? Btw. do you only have Internet Explorer in India or what? Because this joke about space is like 1000000000000 years old. We call ppl like you "bezbek" in Poland, look it up. Shumkichi (talk) 08:54, 5 August 2022 (UTC)
@Shumkichi: You’re a liar because I didn’t even mention the word Ukraine in any conversations with you. And no, this joke about space is a current meme (hint: Polandball). PS. Don’t assume in which country I live. It doesn’t matter. ·~dictátor·mundꟾ 00:20, 6 August 2022 (UTC)
@Inqilābī Jeez, which part do you not understand? You didn't have to explicitly mention the name of the country to make a stupid and disgusting, given the bad timing, joke about it. Btw. are you seriously trying to play the race card? Then you're more Americanised than you realise :D They're the ones who incorrectly use this word for any kind of discrimination or even for simple impoliteness. Wow, the American cultural hegemony seems to work, good job you guys. Ppl like you genuinely make me think that Richard Lynn was right. Shumkichi (talk) 01:26, 6 August 2022 (UTC)
Why do you insist on personal attacks (calling someone a liar) instead of attacking what they say (i.e., what you believe to be false)? The person could be mistaken. The person might be joking. The person might be trolling you. YOU may have misread/misinterpreted what was said. Also, you might try taking in w:Fundamental attribution error, w:Attribution bias, etc. DCDuring (talk) 13:21, 6 August 2022 (UTC)
I don't know the context of this at all, and I agree that people should not make personal attacks but I'm not sure if "misguided" counts as a personal attack. Saying someone is misguided sounds to me similar to saying you believe they are wrong in a given situation; it may be a bit rude but it doesn't sound especially personal, unless something else was also said. Also, in general you should not edit other people's comments; it may be acceptable in the case of egregious attacks (e.g. if someone uses racist language or profanity), but usually it is better to leave them alone and let the words speak for themselves. Benwing2 (talk) 00:09, 3 July 2022 (UTC)
For the record, the phrase that was used by me was ‘misguided opinion’, which was totally apropos, considering the critical and somewhat sharp comments of FF I was responding to (while trying to justify what a wordbook is supposed to be). The accuser attempted to make the false impression that I called other people ‘misguided’ so as to make the non-issue look as huge as possible, but haply, has failed. Whoop whoop pull up should consider taking a wiki-break to relax: sometimes wiki-editing can have an effect on our sanity. Peace be upon all. ·~dictátor·mundꟾ 11:42, 3 July 2022 (UTC)
Inqilābī has merely called an opinion misguided so you're mischaracterizing what has actually happened. But even if they had called somebody else misguided, that would of course still totally be within the bounds of the acceptable and wouldn't warrant altering somebody else's comments. — Fytcha〈 T | L | C 〉 12:45, 3 July 2022 (UTC)
Yes, it’s simple: You only may edit someone’s comment if he would want it, or his lawyer would advise him it is good, but not if you or somebody else wants it, as this is a falsification worse than any unfounded personal attack. Really, aren’t all the shams on the internet worse than its abundance of distaste and injury? “Personal attack” is a strong word and we may just keep to the standards according to which something is allowed or disallowed as an insult, which is, roughly said, that an utterance is permissible if it has sufficient proportion of reference to the publicly relevant subject matter (Sachbezug). Fay Freak (talk) 14:55, 3 July 2022 (UTC)
Disagree with the first, agree with the second. Even if they'd want to make the change, let them make it themselves. I'd possibly make an exception for broken links/formatting, but that's it. Theknightwho (talk) 16:01, 3 July 2022 (UTC)
Yes, you would have to conclude that he specifically wants somebody else to correct, an assumption exceedingly rarely permissible, so you disagree not, defining the putative agreement sufficiently narrow. It is but the thought of rightful negotiorum gestio. Fay Freak (talk) 21:03, 3 July 2022 (UTC)
@Whoop whoop pull up: if that was a personal attack, then your response to it was, too, and double for this thread. By claiming that Inqilābī's behavior was so serious that you had to take emergency action to deal with it, you're making assertions about their intent and their ability to moderate their behavior. While Inqilābī has never exactly been known for excessive tact, the fact that no one else objected or apparently even noticed there was a problem should have given you pause. This was an open discussion with participation by a number of people who could have taken action, including admins, so there was no reason for you to intervene. Chuck Entz (talk) 15:55, 3 July 2022 (UTC)
It is always better to direct one's negative comments to observable actions rather than direct them against the actor. By better I mean less likely to cause unproductive interpersonal conflict. DCDuring (talk) 17:27, 3 July 2022 (UTC)
Keep it on your watchlist, do I really have to teach you etiquettes? I only ping people for serious stuffs, I do not ping the bored, annoying, racist people out there… Anyways, that Wikipedia article is irrelevant to your argument. ·~dictátor·mundꟾ 00:20, 6 August 2022 (UTC)
Without having seen the joke in question, I demand, in the name of human rights, human dignity, and a reasonable respect for intellectual freedom, a qualified right of editors to make unfunny jokes within the scope of building the dictionary project. This is not a joke project, but unfunny jokes must not be a plus or minus to any administrative concerns unless they are just totally outside the scope of building the dictionary. There is no need for a right to tell funny jokes- everyone loves those. What's precious and important is the ability to tell an unfunny joke. --Geographyinitiative (talk) 13:07, 4 August 2022 (UTC)
I don't have any problem adding this, and the Glottolog code gurg1241 is a good sign that this is recognised by linguists. Despite the name of the Wikipedia article (Gorgani), I note that Gurgani seems to be in much more common use, so I agree we should be using that as the name.
There isn't a code for Malto; instead it has the codes for its 2 main dialects, kmj – Kumarbhag Paharia and mjt – Sauria Paharia, even though they are mutually intelligible and commonly considered the same language. AleksiB 1945 (talk) 13:56, 3 July 2022 (UTC)
By default we've used the ISO 639 language codes, which split them, but that doesn't always make the most sense. I note from the Wikipedia article that the lexical similarity is 80%, which is quite a lot lower than the similarity between many European languages such as Spanish and Portuguese (89%), French and Italian (89%) and various others. Are you sure this makes sense? Theknightwho (talk) 14:32, 4 July 2022 (UTC)
Wait, 89% lexical similarity? How are they considered different languages? Apart from that, almost all articles take them as dialects of Malto like and AleksiB 1945 (talk) 15:04, 4 July 2022 (UTC)
As Wiktionary:Requests for deletion/Non-English § extrem seems to soon conclude with deletion, I wanted to find out whether the majority view is that we exclude these forms in general. If consensus exists, I will amend WT:ADE accordingly giving everybody the license to delete these fake adverbs whenever encountered. I will of course take proper care and formulate it carefully so as to not exclude anything remotely interesting (if the adverb use has gained additional senses for instance).
My arguments are laid out in that thread in more detail, but just to recap:
The majority of German grammarians don't consider adverbially used adjectives to be adverbs, to belong to the lexical category of adverbs.
No serious monolingual dictionary that I'm aware of (not even online dictionaries) has them.
They clutter the adverb category and make it hard to find true adverbs.
A related issue: adjectives can be used in 3 ways (attributively, predicatively, adverbially). Whenever a given adjective can only be used in 1 of these 3 ways, we have a way to denote that fact: attributive only (Hamburger, siebte): {{de-adj|pred:-}}; predicative only (barfuß): {{de-adj|predonly}}; adverbial only: adverb entry. However, whenever an adjective can be used in exactly 2 of these 3 ways (but not the third), we currently have no way of documenting that. Some work would have to be done (perhaps also in connection with the adjective templates, @Benwing2) if the concern is that we hereby lose the information of which adjectives cannot be used adverbially (this information is not currently present because almost all adjectives have no adverb section, but it could theoretically be there if we had all possible adjective->adverb conversions as PoS headers). See the RFD thread for a more in-depth discussion on this (especially the proposed categories and the examples to which this applies).
So: should "boring" (i.e. ones that have nothing special about them; see above) adverbially used adjectives be excluded from having separate adverb entries? — Fytcha〈 T | L | C 〉 01:25, 4 July 2022 (UTC)
Support. This feels similar to the issues brought up in the essersi discussion. Things that are entirely predictable and are easy for language learners to grok tend to clutter up entries and categories if they are specified explicitly. (IMO this argument does not apply so much to inflected forms of most inflected European languages because the forms typically are complicated and not easy for language learners. That's why for example I created a script to generate Russian non-lemma forms and am in favor of doing similar things for Romance languages, Arabic, etc. Turkish is a different story; same for Arabic nouns and verbs with object suffixes (but Hebrew object-suffixed nouns and verbs are far less predictable and hence it makes sense to include them).) BTW I can add the necessary support to {{de-adj}}. I'm guessing it's impossible to have an adjective that's both attributive-only and predicative-only, so we just need to add an indicator of whether an adjective can be used adverbially, and since most can, it should be something like noadv to indicate that it cannot be used adverbially (right?). Benwing2 (talk) 01:41, 4 July 2022 (UTC)
@Benwing: That would be great and very welcome if you could change the templates accordingly. To summarize:
klein can be used attributively and predicatively, but not adverbially
monatlich can be used attributively and adverbially, but not predicatively
Adverbially only doesn't make sense and I'm not sure whether predicative+adverbial exists. Something like noattr, nopred, noadv, as you propose, would be good in addition to (or as a replacement of?) predonly and pred:- (though the latter should IMO be renamed to attronly). — Fytcha〈 T | L | C 〉 09:39, 4 July 2022 (UTC)
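As an illustration of the flag scheme proposed above, here is a minimal Python sketch of how the three usage types could be modelled. The class `AdjectiveUsage` and its `flags` method are hypothetical names invented for this example; only the flag names noattr, nopred and noadv and the example adjectives come from the discussion. It is not how {{de-adj}} is implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AdjectiveUsage:
    """Hypothetical model: which of the three uses an adjective allows."""
    attributive: bool = True
    predicative: bool = True
    adverbial: bool = True

    def flags(self) -> List[str]:
        """Return the negative flags (noattr, nopred, noadv) that apply."""
        pairs = [
            ("noattr", self.attributive),
            ("nopred", self.predicative),
            ("noadv", self.adverbial),
        ]
        return [flag for flag, allowed in pairs if not allowed]


# Examples taken from the thread.
klein = AdjectiveUsage(adverbial=False)        # attributive + predicative
monatlich = AdjectiveUsage(predicative=False)  # attributive + adverbial
hamburger = AdjectiveUsage(predicative=False, adverbial=False)  # attributive only

print(klein.flags())      # ['noadv']
print(monatlich.flags())  # ['nopred']
print(hamburger.flags())  # ['nopred', 'noadv']
```

Negative flags keep the common case (an adjective usable in all three ways) free of any annotation, which matches the reasoning that most adjectives can be used adverbially.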
It appears that siebte can be used predicatively, but in German it is then somehow often considered a nominalization and capitalized: Brömmel wurde Siebte in 4:26,92 Minuten; Bei den Mädchen holte sich Mathilda Bertaggia souverän den Titel, Paula Kristan wurde Siebte; Teresa Stadlober belegte Platz neun im Final Climb und wurde Siebte in der Gesamtwertung. (If they were men, they’d become Siebter.) Not all uses are capitalized: Luise Werner wurde siebte in Ihrer Wertung; Nadine Kuhl wurde siebte in der Gewsichtsklasse bis 57 Kg in der gleichen Altersklasse; Die Schweiz ist nicht siebte geworden, sondern fünfte, mit gleich viel Punkten wie Dänemark und Schweden. --Lambiam 19:54, 5 July 2022 (UTC)
Support So basically the header already tells us whether an adjective can be used attributively, adverbially, or predicatively. The merely traditional circumstance that we otherwise have separate part-of-speech section headers “Adjective” and “Adverb” is too weak a motive to impose duplication upon German adjectives. Fay Freak (talk) 22:55, 24 October 2022 (UTC)
Oppose
Oppose. As I said at the deletion discussion for extrem, CFI says "all words in all languages"; German zero-derived adverbs are words in a language. On the other hand, CFI says nothing about excluding completely predictable derived forms without morphological change. We're not paper, so we don't have to worry about saving space like major German dictionaries do. The further discussion at that thread also shows significant doubt thrown on the assertion that all German adjectives predictably form zero-derived adverbs. —Mahāgaja · talk 08:08, 4 July 2022 (UTC)
@Mahagaja: Yes, all words in all languages. The German adverb word extrem doesn't exist though; this supposed adverb is not part of the German lexis as has been demonstrated sufficiently. — Fytcha〈 T | L | C 〉 09:41, 4 July 2022 (UTC)
@Fytcha: Well, the distinction between adverbs and adjectives used adverbially has been asserted, but without any evidence that there's actually a difference. —Mahāgaja · talk 10:11, 4 July 2022 (UTC)
@Mahagaja: Sorry, I was being unnecessarily unfriendly to you in my previous reply. To expand on it a little: in that sentence, extrem is an Adverbial, a member of the syntactic, relational category of adverbially-acting components. Adverbiale is a category that exists only in relation to a phrase, just like Objekte. However, it is not a member of the lexical category Adverbien. This is the view of the majority of German grammarians according to the de.wiki articles that I've cited in the RFD thread. To provide some more literature on this (including actual arguments, not just expert opinion):
2012 October 24, Petra M. Vogel, Wortarten und Wortartenwechsel: Zu Konversion und verwandten Erscheinungen im Deutschen und in anderen Sprachen (Studia Linguistica Germanica), Walter de Gruyter, →ISBN, →OCLC, page 212:
Das spiegelt sich auch in der unterschiedlichen Terminologie für solche Erscheinungen als 'adjektivische Adverbien' (z.B. ADMONI 41982: 204) oder 'adverbiale Adjektive' (z. B. EISENBERG 31994: 220). Ich plädiere hier mit EISENBERG für letztere Lösung, und zwar aufgrund der lexikalischen und formalen Übereinstimmung mit flektierten Adjektiven und setze hier parallel dazu ein Nullallomorph an.
2016 December 17, Peter Eisenberg, Grundriss der deutschen Grammatik, Springer, →ISBN, 6.1 Abgrenzung und Begriffliches, page 204:
Kein terminologischer Glücksfall ist das Nebeneinander der Begriffe Adverb und Adverbial. Meistens - aber längst nicht immer - wird Adverb als kategorialer, Adverbial als relationaler Begriff verwendet. Wir folgen diesem Usus und gebrauchen ›Adverbial‹ synonym mit ›adverbiale Bestimmung‹ als Bezeichnung für eine syntaktische Relation (s.u.).
2017 December 18, Peter Eisenberg, Grundriss der deutschen Grammatik, Springer, →ISBN, page 209:
Wir schließen uns dieser Position nicht an, sondern plädieren für eine Zuweisung zu den Adjektiven und wollen nur noch von adverbialen Adjektiven sprechen. Zu den drei genannten Argumenten: (1) adverbiale Adjektive haben viele Stellungsmöglichkeiten mit Adverbien gemeinsam, viele andere aber nicht, beispielsweise die in 2. Auf weitere syntaktische Besonderheiten wird weiter unten eingegangen. (2) Adverbiale Adjektive sind auf das Verb bezogen, das ist unstrittig. Nennt man sie deshalb Adverbien, so müssen die Adverbien anders benannt werden, denn sie sind gerade nicht auf das Verb bezogen (dazu 6.1). (3) Die Zuweisung zu den Adverbien aufgrund von Unflektiertheit beruht auf einem systematischen Irrtum. Adverbien sind einelementige (›uneigentliche‹) Paradigmen, die wir deshalb als nichtflektierbar bezeichnet haben (2.1). Nichtflektierbar und unflektierbar is nicht dasselbe. Die Kurzform des Adjektivs, wie sie in prädikativer und adverbialer Position erscheint, ist nicht markiert in Hinsicht auf Genus, Numerus und Kasus und deshalb unflektiert. Das Paradigma, dem sie angehört, ist aber keineswegs nichtflektierbar. Wer dieses Kriterium zum ausschlaggebenden für eine Zuweisung zu den Adverbien macht, müßte jedenfalls auch das prädikative Adjektiv zu den Adverbien zählen (so im Prinzip in Droescher 1974). Das beseitigt zwar nicht die Verwechslung von nichtflektierbar und unflektiert, ist bezüglich des einmal gemachten Fehlers aber konsequent. Auch aus der Sicht einer traditionellen Kategorienlehre hätte die Klassifikation als Adverbien unerwünschte Konsequenzen. Fast alle Adjektive können adverbial verwendet werden. Würde man sie in dieser Verwendung als Adverbien klassifizieren, wären die Adjektive Homonyme einer Teilklasse der Adverbien. Die Kategorie Adjektiv würde nur Elemente enthalten, zu denen es auch ein homonymes Adverb gäbe (Grundzüge: 621). Eine der vier lexikalischen Hauptkategorien hätte ihre Eigenständigkeit verloren.
2018 June 22, “Adverb”, in grammis - Grammatisches Informationssystem:
Adjektive, die von Haus aus ja flektierbar sind, gelten also auch dann nicht als Adverbien, wenn sie in adverbialer Funktion vorkommen: So kommen die Adjektive hastig (ein hastiger Schritt) und laut (eine laute Stimme) in den Sätzen Peter isst hastig und Fritz hat laut gelacht als Adjektive in der Funktion eines Adverbiales vor.
On balance, I'm weakly inclined to support this, but only weakly. Grammarians do seem to feel that these (adjectives used in adverbial roles) don't belong to the category of the adverb part of speech, and because (nearly) any adjective can be used in this way, it does fill the Adverbs category with noise; compare if we put languages' stative verbs in both the verb and adjective categories to swamp out true adjectives. On the other hand, is this different from crowding the noun category with nominalizations of verbs? Meh. I also wonder if turning the presence or absence of an "adverb" section into a question of attestation is more like turning the presence or absence of e.g. a plural into a question of attestation (good), or like turning the presence or absence of a dative feminine singular weak declension slot in an adjective table into a question of attestation (bad, misleading, in that if readers see the section/slot in many entries and then find it missing from one, it would be reasonable for them to gather there's some rule preventing the adjective from being used adverbially or in the dative feminine singular in the weak declension, when in fact the absence is just because we only found two but not three cites of that because the adjective itself was already rare). - -sche (discuss) 22:21, 16 July 2022 (UTC)
I have to admit ignorance, because I must have been absent when Adverbs were introduced in school grammar. I could as well oppose in support of Mahagaja, in favor of continuing the discussion, though it's difficult to argue against the communis opinio, which is echoed by w:de:Adverb as well. There remains a caveat:
"Die Zugehörigkeit bestimmter Wörter zur Kategorie Adverb ist häufig umstritten, da nicht immer klar ist, ob die vorgeschlagenen Kriterien nur auf Wörter der Kategorie Adverb zutreffen würden"
Maybe that's why they commit to nothing more than two very broad criteria that would include Auto. Vogel begins with notable references for further discussion to continue with Latin adjectives, which exhibit agreement similar to einen schnellen Wagen. We do include some participles as adjectives (verschwindend).
Eisenberg's counterexample "ein hastiger Schritt" is a rash decision. Ein hastig' Schritt would sound archaizing but not wrong, because there are thousands of compounded collocations without binding vowel. The expression is idiomatic and hastig is debatable. It can be nominalized from pret. er schritt hastig, and the diphthong of the more felicitous Schreiten may be the same as for strittig / streitig, which points to High German only for the latter.
extrem is a useful example because the suffix of extra is possibly adverbial.
scheinbar vs. anscheinend is another neat example of adverbs. The suffixes may point to different grammatical functions that should reflect in their distribution. Compounds with the first sense rely on the bare root schein- (“pseudo-”) instead. The -bar suffix is a curious case because it is able to calque -abel (indiskutable, doch diskutierbar) with an apical approximant, or -e with an uvular approximant /baːɐ̯/ on the other hand (IMHO); Low German feste points to an adverb suffix, s. v. gerne. Latin extremum, -issimus are deceptively similar to German -sam. E.g. arbeitsam pertains literally to a werkwoord, the superlative -ste may be used pronominally.
By the way, the "als" in Vogel's use in a context with "solche" breaks my parsing. By the same token I have to reconsider my "wie" in use to introduce enums. In combination, and in spirit of the topic, one might try to read an adverbial pronoun phrase all solche, or aliquis in the post field. The use of *-līk is at any rate remarkable. 2A00:20:6007:A3E4:A4A1:3DFA:B280:FB48 18:18, 5 August 2022 (UTC)
I think I'd rather focus on the practical / usability side of things and not so much on the "academic" side here. So extrem is probably more used adverbially than as a "true" adjective in contemporary (colloquial) German. I don't see any harm having this as a separate POS, as long as we don't mindlessly create adverb sections for all adjectives. – Jberkel 19:40, 25 October 2022 (UTC)
@Jberkel: Thanks for the input. I have a counterpoint though: if we don't mindlessly create adverb sections for all adjectives, people might take the absence of such a section as an indication that this specific adjective cannot be used adverbially. And because we agree that mindlessly creating such sections is bad, I would conclude that not having them at all is the least evil out of the three options (the three being: having adverb sections for no, some, (virtually) all adjectives).
We could also add an "adverbial" row to the adjective declension template which could also help guide learners in the right direction in the absence of an adverb POS section. I felt like that was missing anyway because what else would the form "extremst" be if not the adverbial of the superlative? — Fytcha〈 T | L | C 〉 21:52, 25 October 2022 (UTC)
Decision
@Mahagaja: Seeing that the discussion has fizzled out and that you're the only explicitly opposing party, do you think it would be fair if I went ahead and added this exclusion to WT:ADE? — Fytcha〈 T | L | C 〉 04:09, 8 November 2022 (UTC)
I suppose 4–1 counts as "consensus" for the Support side, though obviously I wish more people had taken part in the discussion. —Mahāgaja · talk 09:01, 8 November 2022 (UTC)
Let me post a late oppose. I checked multiple German-English dictionaries and they usually have separate coverage as adverbs. Having adverbs and their -ly translations is more user-friendly. Whether to consider the adverbial behavior as a separate SOP or as part of an adjective is a matter of linguistic convention; the behaviors are the same in both manners of analysis. --Dan Polansky (talk) 09:04, 8 November 2022 (UTC)
Should alternative forms be in topical categories?
For example, should both mesail and mezail be in CAT:en:Armor, or only the lemma? Should both azure and azur (which have out-of-sync pronunciations) be in heraldic categories, or only the lemma? On one hand, having alt forms in the category makes it easier to find the entry you want if you're searching the category for one spelling unaware a different one is lemmatized; OTOH, having half a dozen spellings of chamfron, multiple spellings of mesail, mamelière, affronté/affrontée/affronty, etc. cluttering up space makes it harder to look over how many actually-different words are in a category. - -sche (discuss) 10:12, 4 July 2022 (UTC)
I personally never add alternative forms, alternative spellings or even entries equipped with {{synonym of}} to topical categories, and I would like to keep it that way. Thadh (talk) 10:54, 4 July 2022 (UTC)
The terms double bogey and buzzard both refer to the same thing in golf, but the names for it are very different. I don't see any reason to only categorize one of them in "en:Golf". A dictionary is a directory of words, not of unique concepts. If some words happen to have the same denotation that doesn't make them less valid as entries; whereas mere spelling variations are clearly a different matter. 98.170.164.88 14:44, 4 July 2022 (UTC)
In principle, terms defined as synonyms deserve fuller treatment than alternative forms, eg, etymology, derived terms. If the synonym is used in fewer or more senses than the term it is defined as a synonym of, it may need a fuller set of definitions or a usage note. DCDuring (talk) 14:46, 4 July 2022 (UTC)
Yes, agreed (also with 98., Chuck and Theknightwho above). Synonyms should be full articles, alternative forms should be stubs. — Fytcha〈 T | L | C 〉 14:49, 4 July 2022 (UTC)
@Fytcha: I think references should be allowed (perhaps even obligatory if there are any!) for altforms, since these are the main source of attestation in LDL languages. Thadh (talk) 13:41, 4 July 2022 (UTC)
@Thadh: In the current text they're allowed if they differ from the reference in the main form. Referring to a different entry in a dictionary also falls under "differing". Is this too restrictive in your opinion? — Fytcha〈 T | L | C 〉 13:44, 4 July 2022 (UTC)
@Fytcha: Yes, because entries can give both the main (canonic) form and the alternative form together, while counting as verification for both. Thadh (talk) 13:49, 4 July 2022 (UTC)
As I interpret WT:RfV, alternative forms need three cites, but cites using alternative forms count to support separate senses at the lemma. Is that our policy or practice? DCDuring (talk) 14:13, 4 July 2022 (UTC)
I personally think you might've jumped the gun with creating the vote after only 3 hours of discussion here, that's not enough time for enough folks to even read what's going on, let alone timezone shenanigans. AG202 (talk) 21:37, 4 July 2022 (UTC)
@AG202: I've only just seen this reply now. You can ping me whenever, it doesn't annoy me. As to your comment: no, I haven't jumped the gun because I didn't create the vote only because of this discussion; the discussion was merely the last straw. This particular issue has been a long-term grievance of mine that I run into almost every single day. Just moments ago, marvelous and marvellous were posted to WT:RFM which is yet another perfect illustration of the problems with the status quo. — Fytcha〈 T | L | C 〉 20:00, 6 July 2022 (UTC)
Synonyms: yes. Alt forms: it's another of those synchronisation problems as with color/colour. I suppose we need some bot or template solution in the long term, or things will inevitably fall out of sync. Equinox ◑ 14:46, 4 July 2022 (UTC)
Thanks. (I agree synonyms should be treated as their own words with categories etc, although I'm not sure how I feel about using labels that suggest x is a {{lb|en|medicine}} {{synonym of}} y, as if y might be the nonmedical word, when in fact y is also a {{lb|en|medicine}} word...) I intend to go through heraldry terms later, as we have many duplicate entries that should be alt forms, and wanted to know whether to add categories like someone did to mezail or not. I hope the vote doesn't get derailed by people's feelings about national standard spellings; if so, perhaps these could be excepted. - -sche (discuss) 19:09, 4 July 2022 (UTC)
I agree with Equinox and sche here, synonyms should be excluded. Also, I agree that national spellings should be excluded as well. There's no current written policy that states that terms should be lemmatized at any one alternative form, with WT:FORMS saying "In particular, while some editors try to make the “main” entry correspond to the most common form —and some sysops actively encourage this— the official policy is that all the forms are equally valid. It is not mandatory to make an alternative form entry's content consist exclusively of an alternative form link; especially in cases when all forms are obscure, a gloss is permitted for each form, although it is usually best to indicate that other forms exist." Though that emphasized part is not ~official~ policy, it does make me lean towards allowing categorization for alternative forms. Also, I don't like that we'd be removing grammatical information like "countable" & "uncountable" from the alternative forms. AG202 (talk) 21:34, 4 July 2022 (UTC)
@Fytcha Also does this only apply to words with {{alternative form of}} on the definition line? What about dialectal forms? Should those not be included in categories as well? I feel that they should. I don't see much reason for ẹkà to be stripped of imagery, categorization, and information, just because the standard ọkà exists, as it limits the information and can hide the term from the categories that it should be in. Same with any difference in regional spellings and standards. I really feel that this is really majority-language-centric as well, like sure the different spellings of naïveté might not need full categorization, but that's far from the case with all alternative forms across all languages, so this proposal does too much for me now. AG202 (talk) 21:44, 4 July 2022 (UTC)
Also what about languages that have different scripts? Would アィヌモシㇼ lose info because aynumosir exists? What about Serbo-Croatian? What about translations at 1st vs first? The phrasing "or should correctly be defined as such" is very vague. AG202 (talk) 22:43, 4 July 2022 (UTC)
@AG202: Writing the vote, I did think about Serbo-Croatian which is exactly what made me add that "(or should correctly be defined as such)" disclaimer because, as it stands, we treat the Cyrillic and Latin entries of any given Serbo-Croatian word as equal, thus neither should correctly be the alt-form of the other (I do acknowledge though that it is worded poorly and I'm open for suggestions). In other words, Serbo-Croatian is excluded from this (and from the looks of it Ainu too, though I don't know the policies surrounding it). Abbreviations, initialisms, clippings etc. are likewise exempt from this vote (as they are not listed in WT:ALTER).
Dialectal forms on the other hand will be stubified which is actually in part what motivated me to create this vote. Not only in English (see the insane amount of duplication and also, unsurprisingly, discrepancy between fiber and fibre), but also in my native language: see the endless duplication of etymologies and alt-forms in the alt-forms of Chatz. When I added one, I had to go to all entries separately and add it, which is definitely not what we want to do. This is not maintainable design. I'm also not a fan of ẹka/ọka: what if a synonym needs to be added, qualified or removed? What if the etymology needs to be expanded? Having multiple non-stubs always leaves the door open for discrepancies to creep in. — Fytcha〈 T | L | C 〉 23:27, 4 July 2022 (UTC)
@Fytcha "What if the etymology needs to be expanded? Having multiple non-stubs always leaves the door open for discrepancies to creep in." Then editors should update the entries? I already have to do it when I add {{see also}} to entries or whenever I have to update template usage or whatever. I don't see the worth in removing important content from dialectal entries because it's not "maintainable". That brings in the same issue that @-sche mentioned with national standards. And then what if someone is looking up that specific common dialectal term? It's easier to see all the information on that singular page and have it listed, rather than having to click to another page. If you want to stubify coverage for the languages you work in that's fine and honestly that seems like something that could be done on an About page, but as for the languages that I've worked in, I wouldn't approve it, especially when we've been told to try and condense Yorùbá into dialects rather than separate languages with their own headers. Also, dialectal variation of this kind isn't listed on WT:ALTER either to be fair. AG202 (talk) 00:27, 5 July 2022 (UTC)
fibre & fiber should just be cleaned up on their own. It also brings up the question of which one is the lemmatized form, which has been avoided for reasons like this. AG202 (talk) 00:29, 5 July 2022 (UTC)
I am (slowly) working on a solution to situations where we don't want to prioritise one spelling over another. I don't think it's appropriate to say one is the lemma, because it depends where you're from. Theknightwho (talk) 16:49, 5 July 2022 (UTC)
@AG202: sure, it would be nice if all our editors did the right thing and updated both entries, but that's not what really happens. The average editor sees something is missing from the one entry, so they add it. Then they go off and do something else. The number of editors who actually go to the other entry is minuscule. Out of those, the number who take the time to check whether sense #18 that they just added is the same as sense #23 in the other entry, let alone figure out where their new sense fits in the completely different order of senses in the other entry, is not much greater than the number of participants in this thread. Come to think of it, there's a good chance that the only people who do that are the participants in this thread... Chuck Entz (talk) 07:38, 5 July 2022 (UTC)
@AG202: Besides what Chuck said (which I 100% agree with), I'm really not convinced that my proposal makes the average reader's experience any worse. I can see a point for retaining the image in ẹka because there isn't any major source for discrepancies introduced by it but the etymology and synonyms need to go. Yes, the user needs to click on ọka to get to the etymology and synonyms now, but the experienced user will do that anyway. You know why? Because they have learned that in 4 out of 5 cases the information in such alt-form entries is incomplete.
Also, WT:ALTER does list this kind of dialectal variation, at least that's how I understand the "regional variations" bullet point (yes, the examples given only differ by orthography but I don't understand that bullet point to be restricted to that). — Fytcha〈 T | L | C 〉 11:03, 5 July 2022 (UTC)
@Fytcha That bullet point should be clarified then because it’s not clear. And sure we could move the etymology & synonyms to the other entry, but I still don’t think that it’s necessary to remove categorization or things like that. @Chuck Entz there are a LOT of things that Wiktionary is lacking on or that editors don’t do or update or clean up, whatever, but that doesn’t mean that we should put that laziness into policy and significantly alter the experience for the editors that do do that and find it useful. AG202 (talk) 13:46, 5 July 2022 (UTC)
@AG202: Addendum: a gloss will still be permitted under this vote because it is part of the {{alternative form of}} template. Countable/uncountable and other grammatical categories are allowed per the vote but I lean towards disallowing them as labels. They're again a source of potential copy-paste errors, discrepancies, incompleteness etc. Secondly, they, for a long time, used to look weird to me because they read like "(uncountable) Alternative form of .."; so is it the uncountable counterpart of a normally countable noun? That's how it naively reads at least. -sche seems to share this perception, given their above comment. I'm also not sure whether such grammatical labels in alt-forms even do any good; when you stop and ponder about a word, don't you move on to the non-alt-form anyway? It doesn't matter that the alt-form lacks such-and-such label because it lacks everything else too; normally you'd move to the main form anyway.
@-sche: The issue with your second suggestion in diff is that obsolete (/dated / rare / uncommon) alt-forms should not have labels either; they should be defined using {{obsolete form of}} etc. so I'm not really sure which example to put there to avoid the rightfully raised issue about pondian bias while not sacrificing illustrative value. — Fytcha〈 T | L | C 〉 23:58, 4 July 2022 (UTC)
"I'm also not sure whether such grammatical labels in alt-forms even do any good; when you stop and ponder about a word, don't you move on to the non-alt-form anyway? It doesn't matter that the alt-form lacks such-and-such label because it lacks everything else too; normally you'd move to the main form anyway." I don't as often as you'd think, especially if the entries are actually well-done, and it's an issue of national/dialectal/regional variation. If the important info is already there, then I don't need to go to another location for it. AG202 (talk) 00:31, 5 July 2022 (UTC)
Several problems with this.
At the end of the day, alternative forms are still real words. You're proposing that we essentially turn them into redirects, which would only make topical categories useless for readers. What happens if, for instance, someone tries to find shasqua in Category:en:Swords, and isn't familiar with the shashka spelling we use for the main entry?
What we consider to be the "primary" spelling generally isn't based on any objective criteria other than "someone made this page first". In the case of US vs UK spellings, our policy is to pick the main entry completely at random to avoid giving a preference to either dialect.
Alternative spellings already share most categories (usually automatic ones) with their main entries. I don't see why you're focusing on topic categories in the first place.
If somebody tries to look up a certain word that they already know, why would they look into that category then? Your proposed use case doesn't exist. What does exist however is that somebody looks into a topical category and finds that the same semantic item is represented by 5 different elements in that category because of slight spelling differences. How is that useful?
Yes, this is a problem (one that should be changed IMO) but luckily it only affects US vs. UK differences. For all other spelling variations we're usually wise enough to just follow the usage, i.e. put the main entry into the most used form.
Nobody browses through the 276k entries in Category:English countable nouns (such categories exist mainly for query purposes) but people absolutely manually browse through our small topical categories which justifies the difference in treatment. — Fytcha〈 T | L | C 〉 02:00, 7 July 2022 (UTC)
Fair enough, that was bad phrasing on my part. People might expect to find a term in a topical category, though, and not having it there would only confuse readers. There's also the fact that, at the end of the day, these are still equally valid words, and there's no lexical justification for treating them like redirects. The reason we even use templates like Template:alternative form of has nothing to do with these spellings being less "real", it's because it'd be a pain in the ass to keep definitions in sync across multiple pages. On the other hand, decategorizing these pages feels more akin to placing them in Category:English non-lemma forms or labelling them as misspellings. Going back to my previous example, there's nothing that makes shasqua less relevant to the topic of swords than shashka.
Admitting it's a problem doesn't fix it. What you're proposing would immediately turn topic categories into a mishmash of random spellings.
So? How does it actually harm readers to learn that a word can be spelled in several ways? This is a solution to a nonexistent problem.
I'm just a humble editor, and I don't understand the concept of a 'topical category' versus another type of category. But I think that, at minimum, Taipei and Taibei should appear in the relevant categories in which only one of either could appear: in this case, one (Taipei) will appear in the Wade-Giles category and one (Taibei) in the Hanyu Pinyin category. Now as to Chongqing and Chong Qing, you may say "it's clutter to include both in the Hanyu Pinyin category". I'm okay with that logic, but I think it would be fun and informative to see all the variants in one category, because only a select few words will be able to achieve (as in, 3 good cites) variants. --Geographyinitiative (talk) 22:00, 4 July 2022 (UTC)
When an alt form, abbreviation, phrase, etc is pronounced like its lemma, components, etc.
Currently, absence of a pronunciation on an alt form can mean either (a) it's pronounced like the lemma, or (b) no-one added it yet (e.g., mezail lacks a pronunciation but is probably /z/ unlike mesail's /s/). IMO we'd benefit from a template saying ~"likeotherentry" for when an alt form is pronounced like the lemma, to distinguish that from oversight. This would also be useful when some/all pronunciations of an abbreviation, phrase, etc are the same as the full word. E.g. etc. is pronounced like et cetera but they present pronunciations in differing orders; etc. could just point to et cetera. Or look what I did in Latin@, which has some of its own pronunciations which must be listed, but which can be pronounced like "Latino and/or Latina": it'd be dumb IMO to repeat on Latin@ all the ways Latino (/ləˈtinoʊ/, /læˈtinoʊ/) and and (/ænd/, /ɛnd/, /ənd/, /ən/, /æn/, /ɛn/, ...) can be pronounced. - -sche(discuss)19:35, 4 July 2022 (UTC)
If something is pronounced like the lemma, it's an alt spelling, not an alt form. For abbreviations that's more difficult, but I think we should assume it's also pronounced like the written-out form, and otherwise not call it an abbreviation but rather state that it's an abbreviation in the etymology section.
Fair point, in theory "same pronunciation" vs "different pronunciation which no-one has added yet" could be indicated by always using {{alternative spelling of}} for the first one and {{alternative form of}} for the second. But the number of people who use the templates this way seems ... lower than the number who just use one or the other template for either kind of thing. (It also gets tricky if two forms/spellings have different etymologies, e.g. -er vs -or, which would suggest they are different forms and not different spellings, but they are not actually pronounced distinctly.) (It's something DCDuring has talked about: the danger of fine distinctions that are only maintained by adepts is that in practice they are not maintained. There are situations where it frustrates me, because I am one of the adepts who would like to maintain some distinction, but it is what it is.) - -sche(discuss)19:56, 4 July 2022 (UTC)
Well, the fact people don't follow a practice doesn't mean we don't have a practice or that the introduction of another template would solve that.
The point about -er, -or is a good one, but I think the best solution for that would be creating a second section for -or and call it an alternative spelling of -er (since that is probably the most historically honest representation of what happened). Thadh (talk) 20:05, 4 July 2022 (UTC)
It's further complicated by the fact that "form" refers to different things in different languages. Chinese has a strict definition of what it means to be a form, and it also doesn't make sense to refer to spellings. There is the "synonym" template, but it doesn't have the wide variety such as "obsolete form of", "rare form of" etc. Theknightwho (talk) 20:35, 4 July 2022 (UTC)
Minor point: please don't use the word "like" (which might suggest mere partial similarity). Perhaps "same as", or "see". Equinox ◑ 11:37, 5 July 2022 (UTC)
Another place this might be useful is with names, which at least some users have been disinclined to call "alternative spellings" of other names (fair; does Ashlee think her name is a mere variant of Ashley as opposed to her own name? or as DCDuring said, does someone whose birth certificate says Jim think his legal name is a mere hypocoristic of James, or does he think it's a separate name?), some of which are pronounced the same. Then again, sometimes distinctions are asserted. (Widsith and I and some others discussed Sara vs Sarah a while ago, where either one can be found, in videos from the US, UK, Australia, etc, pronounced in any of the ways the others are, yet a distinction is sometimes asserted to be supposed to exist.) - -sche(discuss)22:16, 17 July 2022 (UTC)
Do we want to have these? I've seen them being added to more and more entries lately. They're just links to a free corpus search engine, and not a particularly good one at that. It contains tons of unchecked, machine-translated stuff. The Turkish and Romanian results are particularly poor, but the German ones (which I think are among their best corpora) are still pretty bad at times.
If you search for instance fresh as a daisy in the English-Turkish corpus, you get a lot of results along the lines of papatya gibi taze / papatya kadar taze. This is a non-idiomatic literal translation (I've asked a native speaker) and Google Books also doesn't contain any uses.
If you search called it a day, you get kıyamet gününde onu çağırırız which is utter nonsense. If you search the same phrase in the English-German corpus, you get some correct and idiomatic translations but the fourth translation there is Es wurde gerade gestrichen über der Schimmel und nannte es ein Tag. which is not only a non-idiomatic literal translation of the idiom but also grammatically bogus. If you look it up in the English-Romanian corpus, the first hit is Se pare că calmar murdărie numit o zi prea. which is just incomprehensible gibberish.
If you search all that jazz in their English-Romanian corpus, you get many literal translations actually containing the word jazz (there is no such idiom in Romanian).
It's worth noting that these are the first three English idioms that came to mind. I didn't even have to search to find all these errors. And these are just the problems with translations that at least match, because many pairs don't. Probably has to do with the software that matches the sentences of the two running texts.
If we add search engine buttons to our articles, we may as well link Google "term" site:.tr because that usually at least yields texts written by native speakers (or by humans at all). In my opinion, we should only link to newspaper-grade corpora (the ones that e.g. {{R:DWDS}} provides are good), not to machine-translated garbage.
Thank you for your feedback. Reverso has a lot of good info especially for smaller languages, and some errors as well. The Swedish and Ukrainian parts are marked as Beta. The other that is frequently used is https://tr-ex.me/ which might also contain errors due to corpus misalignment. Most of these use ML technology to align text corpora. Do you have a site suggestion that you would prefer to use here? LinguisticMystic (talk) 08:21, 7 July 2022 (UTC)
They clearly generally aren’t just machine-translated. They are mostly movie subtitles, made by professional translators and fans, perhaps sometimes by way of PEMT – idioms are always stressing the capabilities of translators, as well as very technical language, but those corpora are worth checking for anything colloquially needed by bare languages. Fay Freak (talk) 10:06, 7 July 2022 (UTC)
There are many machine translations if you check various collocations and similar things. It's very hit or miss. Vininn126 (talk) 10:08, 7 July 2022 (UTC)
The dictionary part of Reverso is quite good by the way. Do you have any suggestions for freely available general parallel corpora for smaller languages? LinguisticMystic (talk) 12:10, 7 July 2022 (UTC)
Yes, not exclusively ML. It also depends on the language, the German ones tend to overall be of higher quality. But even something like 20% unsupervised machine translations (which is a lowball figure for Turkish and Romanian) makes this unacceptable to link to. And for German there isn't really a need because we do have high quality, newspaper-grade corpora search engines to link to, for instance {{R:UniLeipzig}} and {{R:DWDS}}. — Fytcha〈 T | L | C 〉 13:05, 7 July 2022 (UTC)
I think we should remove all these, from my experience this is just a slightly better version of Google Translate, and God knows we don't want thát site to be templatised. Let's stick to paper sources as much as possible. Thadh (talk) 12:51, 7 July 2022 (UTC)
To be clear, we may all be talking about different things. I was referring to Reverso Context, which shows phrases from bilingual corpora. The slightly-better-than-GT parts on other subdomains are a different piece of cake, which I do not even use, apparently approximately knowing the value. Fay Freak (talk) 11:38, 8 July 2022 (UTC)
We're not talking about different things: what Reverso does is indiscriminately extract translations by bot from a wide array of websites. Let's take a look at also, for instance. Per sentence, the translation is extracted from:
1. A subtitled video from "Learn to Dance Tango" (a website, unsurprisingly, about learning to dance Tango).
2. Unknown
3. Unknown
4. An article on the website Kömmerling, some kind of Real Estate website. Mind you, the translation's not even there, so we don't know where that comes from.
5. Unknown
6. Some kind of very suspicious website that I don't even want to know what it does, again with no translation there.
7. An article on IVS Dosing Technology, some kind of steam company; the translation was extracted from the English version of the website.
8. A brochure about renovating museums; the translation is once again from an unknown source.
9. Unknown
I would go on, but unfortunately further sentences are behind a paywall request to make an account. Note that each sentence had its source indicated as "Various sources". Honestly, even our entries, if unsourced, are a more trustworthy source, because at least you certainly know who the contributor is. Moreover, using whole versions of websites in other languages as a direct source for translations is not a good idea. Thadh (talk) 12:11, 8 July 2022 (UTC)
I personally use context.reverso.net to look for modern usage of Italian terms, because Italian dictionaries (including extremely good ones like Treccani) tend to be quite prescriptive, and as a result often omit modern usages that they don't like or include archaic usages that they do like. I use the left side of Italian-English examples; the English on the right side is sometimes garbled but you can usually get a sense of what's going on. The Italian usually looks pretty good but maybe this is partly because I'm not a native speaker. However, I would not advocate quoting from this source for precisely the reasons articulated by various people above; the quality is uneven and the English, even when idiomatic, isn't always correctly aligned with the foreign language. Benwing2 (talk) 13:48, 8 July 2022 (UTC)
@Thadh, Benwing2: “a wide array of websites” is often exactly what you want, and good as long as the crawler detects whether its sources are human-translated. For technical products you don’t get any other bilingual corpora – I mean, tango? Realtor jargon? Museums? Or fitness and fashion? Exactly the kinds of things that you wanna know but then do not find in from-scratch dictionaries, and I slowly had to add by checking the corresponding content of websites on the same topic but in various languages – the bot is for finding those that have been translated.
I caveat that I have mostly used Reverso Context for Arabic. It has many dialogues by reason of their dumping subtitles, and indeed the idiomaticity problems are what humans from a completely differently structured language and culture have struggled with; principally humans work like bots if they aren’t (paid to be) too creative. So I have long followed their request to make an account (so they can understand what topics people search, right? I also let Firefox and KDE etc. track usage so my interest counts) and look for the best examples of umpteen, since indeed the first examples cannot be relied upon to be the most relevant; likewise dropping a phrase into Reverso Context has often been the easiest solution if man has been too dull to parse it into European terms. But I have never felt a need to link it, in spite of my contributions to the Arabic entries by themselves amounting to a whole dictionary. It could be a discussion page template though, “just look {{R:Reverso|source language|target language}}”. Fay Freak (talk) 19:56, 8 July 2022 (UTC)
@Thadh: Good analysis, thank you. This helps explain why there are so many machine-translated or not even matching pairs. I would vote to delete if there was an RFDO for these templates. — Fytcha〈 T | L | C 〉 13:59, 10 July 2022 (UTC)
I think this needs to be done desperately. The uses of lemma form-of templates and non-lemma form-of templates are diametrically opposed. When I'm looking for a specialized alternative-form-esque template, all the non-lemma form-of templates are just noise in this category and table. A sortable column in the table would be a start but seeing that the table is not maintained, we should perhaps just split the category into two sub-categories (perhaps Category:Variant-of templates and Category:Inflection-of templates or just simply Category:Lemma form-of templates and Category:Non-lemma form-of templates). Pinging @Benwing2 as you did massive work around all of these. — Fytcha〈 T | L | C 〉 15:46, 7 July 2022 (UTC)
@Fytcha The table hasn't been updated in a while because although it's autogenerated, the input file for the autogeneration is a manually curated file. I am in the process now of writing a script to autogenerate the input file as well by parsing the template code for each of the form-of templates. As for separating them out, lemma vs. non-lemma may be a bit tricky because some templates may be used for both purposes. However, there's a natural three-way separation into those that are defined using inflection_of_t (which means they take inflection tags, like {{inflection of}}, {{archaic inflection of}}, {{participle of}} and a few others); those defined using form_of_t (which have their text specified using an arbitrary text string like alternative {{glossary|letter case|letter-case}} form of, and which take approximately the same params as {{m}} and {{l}}); and those defined using tagged_form_of_t (which have their text specified using inflection tags, and which also take the same params as {{m}} and {{l}}, such as {{definite singular of}}, {{active participle of}}, etc.). The latter are the closest to what you probably think of as non-lemma form-of templates, although there are some weird outliers like {{alternative reconstruction of}} (which is defined using the tags alternative and reconstruction; this is a holdover from an earlier format and should be corrected). So maybe we can do a first-pass split based on which function is used to define the template, and make any necessary manual corrections. Benwing2 (talk) 01:26, 8 July 2022 (UTC)
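The first-pass split described above could be sketched roughly as follows. This is only an illustration in Python, not the script mentioned in the comment: the three function names come from the comment itself, while the `{{#invoke:...}}` source strings in `SAMPLE`, the helper names `classify` and `split_templates`, and the "unclassified" bucket are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Matches the defining function when it appears as a pipe-separated argument.
# The exact shape of real template source is an assumption here.
_DEFINING_FUNCTION = re.compile(
    r"\|\s*(inflection_of_t|tagged_form_of_t|form_of_t)\b"
)


def classify(source):
    """Return the defining function named in a template's source text,
    or "unclassified" if none of the three is found."""
    match = _DEFINING_FUNCTION.search(source)
    return match.group(1) if match else "unclassified"


def split_templates(templates):
    """Group template names by defining function (a first pass only;
    outliers would still need manual correction)."""
    groups = defaultdict(list)
    for name, source in templates.items():
        groups[classify(source)].append(name)
    return {key: sorted(names) for key, names in groups.items()}


# Hypothetical template sources, invented for illustration.
SAMPLE = {
    "inflection of": "{{#invoke:form of/templates|inflection_of_t}}",
    "alternative form of": "{{#invoke:form of/templates|form_of_t|alternative form of}}",
    "definite singular of": "{{#invoke:form of/templates|tagged_form_of_t|def|s}}",
    "some other template": "plain text with no invoke",
}

if __name__ == "__main__":
    for group, names in sorted(split_templates(SAMPLE).items()):
        print(group, names)
```

Anything landing in the "unclassified" bucket, or an outlier such as {{alternative reconstruction of}}, would be a candidate for the manual corrections mentioned above.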
Weird Indonesian rhymes
User:Xbypass has created a ton of pages with rhymes categories looking like this (on bubar):
This creates rhyme categories for rhymes like /bar/ and /r/ which makes no sense to me. Is there something special about Indonesian rhymes or is this user simply misusing the template? Benwing2 (talk) 01:10, 8 July 2022 (UTC)
Rhymes in Indonesian are used in the oral poetic form of the pantun. In its most basic form, the pantun consists of a quatrain which employs an abab rhyme scheme.
First example
Tanam selasih di tengah padang,
Sudah bertangkai diurung semut,
Kita kasih orang tak sayang,
Halai-balai tempurung hanyut.
Second example
Singapura negeri baharu,
Tuan Raffles menjadi raja,
Bunga melur, cempaka biru,
Kembang sekuntum di mulut naga.
I think it makes sense for me to write rhymes for bubar as {{rhymes|id|bar|ar|r|s=2}}. So what exactly makes no sense? Xbypass (talk) 03:20, 8 July 2022 (UTC)
@Xbypass Normally "rhyme" specifically means the end of a word as far left as the stressed vowel, but no more. It does not normally include the consonant preceding that vowel. In the examples you give, there is nothing to suggest that Indonesian rhyme is any different, and the Wikipedia article you link says nothing about rhyme being defined differently in Indonesian than elsewhere. Where is the stress in Indonesian words? I take it that raja and naga have stress on the last syllable, hence the rhyme in /a/ is totally normal. As for baharu and biru, the rhyme is clearly /u/ not /ru/, since your other examples show that the consonant preceding the rhyme vowel is not part of the rhyme. In order to justify a rhyme like /r/, you need to show several examples that rhyme only in the last consonant, whereas you've shown none so far. Benwing2 (talk) 13:40, 8 July 2022 (UTC)
I don’t know if the rules for what constitutes a rhyme are different for Indonesian, but the standard rule for a perfect rhyme requires the onsets of the stressed syllables to be different, so while brutality is a perfect partner in rhyme for reality, it is not so perfect for mortality. The pair baharu – biru seems less than perfect. --Lambiam 15:25, 8 July 2022 (UTC)
Yes, the understanding of rhyme in Indonesian (and Malay in general) is different in the first place because stress only plays a minor role in Indonesian prosody. It may depend on the background of the speaker (people from North Sumatra and South Sulawesi, for instance, have very prominent stress that comes with higher pitch and vowel lengthening), but generally, stress has a "floating" character. And in the Riau-Johore area, the cradle of the pantun, it actually falls on the final syllable. So it's not surprising that for rhymes, the final syllable is crucial. So baharu – biru is a perfect rhyme because of the shared onset.
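The two notions of rhyme contrasted in this thread can be compared with a small Python sketch. It is illustrative only: it assumes transcriptions that are already split into syllables with "." and that mark stress with "ˈ", the vowel set is a simplification, and the stress placement in the sample strings is hypothetical rather than a claim about Indonesian. The function names are invented for the example.

```python
VOWELS = set("aeiouə")  # simplified vowel inventory, an assumption


def syllables(transcription):
    """Split a dot-separated transcription into syllables."""
    return [s for s in transcription.split(".") if s]


def rhyme_from_stress(transcription):
    """Rhyme as described for the general case: from the stressed vowel
    to the end of the word, excluding the preceding onset.
    If no syllable is marked with "ˈ", the last syllable is used."""
    syls = syllables(transcription)
    idx = next(
        (i for i, s in enumerate(syls) if s.startswith("ˈ")),
        len(syls) - 1,
    )
    stressed = syls[idx].lstrip("ˈ")
    first_vowel = next(
        (i for i, ch in enumerate(stressed) if ch in VOWELS), 0
    )
    return stressed[first_vowel:] + "".join(syls[idx + 1:])


def final_syllable(transcription):
    """Rhyme taken as the whole final syllable, onset included."""
    return syllables(transcription)[-1].lstrip("ˈ")


if __name__ == "__main__":
    for word in ("ba.ha.ˈru", "bi.ˈru"):
        print(word, rhyme_from_stress(word), final_syllable(word))
```

Under the first definition both sample words share only the vowel, while under the final-syllable definition they share the onset as well, which is the point of disagreement above.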
Yes, within reason. There is a reasonable degree of freedom as to what people can put in their user-space; as long as it is not problematic, it will probably be left alone. If you use it as a personal blog, or advertising platform, etc. someone will probably delete it. - TheDaveRoss 12:39, 8 July 2022 (UTC)
User Theknightwho has repeatedly deleted stress marks, arguing that English monosyllables aren't stressed (?), that stress is "only relative" (as are vowels and consonants), which is grossly ignorant. Lexical stress is phonemic in English, even in monosyllables. If we're going to have a phonemic transcription, shouldn't it be the phonemic description of the word? kwami (talk) 05:08, 8 July 2022 (UTC)
We do not include stress marks on monosyllabic English words when used in isolation, which I have already explained to you (though other users can comment on other languages). Please also tag someone if you’re going to start talking about them in a public forum, too. Theknightwho (talk) 09:50, 8 July 2022 (UTC)
It depends on a language-by-language case, but I got the impression that English entries didn't mark stress on monosyllabic words, just like many other Germanic languages (Dutch, Afrikaans and Saterland Frisian at the very least).
That doesn't hold true for every language, and is very much a choice up to the language community. For instance, Russian terms need to have stress, even if they are monosyllabic, because vowel reduction depends on the word's location in reference to the stress. Thadh (talk) 10:13, 8 July 2022 (UTC)
I’m reasonably sure English includes it if stress is mandatory, but that’s pretty rare, and tends to relate to specific senses of words like the. Theknightwho (talk) 10:17, 8 July 2022 (UTC)
Ah, fair enough. I think in the more "lexical" words, as opposed to the grammatical ones, it wouldn't be necessary. Vininn126 (talk) 12:24, 8 July 2022 (UTC)
True, but then again, they aren't at the moment. Perhaps they should though, since what is lexical and what isn't isn't always clear-cut. Thadh (talk) 12:31, 8 July 2022 (UTC)
To be honest, it seems to be the same as with English. We only include stress marks for these where it has semantic relevance. Theknightwho (talk) 13:26, 8 July 2022 (UTC)
'Grossly ignorant' is harsh, no need for that.
As for the matter at hand, I do not see why stress should not be marked in English monosyllables. Clearly there are ones that do carry a lexical stress (girl, door, play) and ones that do not (a, and, or). Nicodene (talk) 15:12, 8 July 2022 (UTC)
Yes, I was harsh, but they were edit-warring over their ignorance, and have a history of making POINTy edits and other troll-like behaviour, so I wasn't feeling sympathetic. I am happy to see that they're arguing the point here. kwami (talk) 17:34, 8 July 2022 (UTC)
No, I reverted your attempt to unilaterally impose your view on a project page without any prior discussion. You started this discussion by misrepresenting what I said, and now you’re misrepresenting why you started it. Just stop. Theknightwho (talk) 18:10, 8 July 2022 (UTC)
Perhaps you didn't look at what you were doing. Paul G deleted, without discussion, stress marks that had been stable for a decade. I reverted him. You then reverted me, again without discussion. If you want to make a change to a long-standing consensus, fine, but when your changes are contested, you need to argue your case rather than just edit-warring. kwami (talk) 06:59, 10 July 2022 (UTC)
You’ve been told what the long-standing consensus is. A handful of inconsistent mistakes are irrelevant. Nobody cares about your Wikipedia-style rules lawyering here. Theknightwho (talk) 10:52, 10 July 2022 (UTC)
What decade of consensus have I overturned here? You're the one arguing that we should include stress marks on monosyllabic English words, which is something that we don't do, other than a handful of exceptions which have already been explained. Or are you dishonestly trying to claim that because a handful of words on WT:Pronunciation included them that the consensus is for a mish-mash and on that page only? Because that would be hilarious. Theknightwho (talk) 17:38, 10 July 2022 (UTC)
[Phonetic] and /phonemic/ are two different things. For over a decade, all of the phonemic transcriptions on that page indicated phonemic stress. kwami (talk) 18:25, 10 July 2022 (UTC)
Okay, but current practice is that we include them on neither. Make your arguments otherwise, but don't claim that we do something that we don't, particularly when the page itself is silent on the topic (and therefore not authoritative). Theknightwho (talk) 18:28, 10 July 2022 (UTC)
Stress is phonemic in English at the lexical level. A phonemic transcription therefore needs to indicate whether or not a word, even a monosyllabic word, has lexical stress, or else it's not phonemic. Being in isolation doesn't matter, because the phonemic structure of a word is (more or less) independent of that. True, in the major lexical classes monosyllabic words will nearly always have lexical stress, but omitting the stress from a phonemic transcription means that the word does not have lexical stress. I.e., the transcription will be in error. It's also not entirely predictable, since some words ostensibly in major lexical classes have been grammaticalized. Omitting stress from monosyllabic words in English (or Russian) because it's "relative" would be like omitting tone from monosyllables in a register tone language like Yoruba because it's "relative" (which it is), despite the language having a contrast between stressed and unstressed or high and low tone. That's true regardless of whether there is associated vowel reduction in words without lexical stress that can be independently transcribed. Unstressed syllables in English may have reduced vowels, but that's not a general rule.
Several major English dictionaries use the IPA stress marks for non-IPA values: a monosyllable without a stress mark is stressed, while a monosyllable with a stress mark is a disyllable. Wikt could imitate that usage, but then we would not have a phonemic transcription. The vowel letters ⟨ᵻ⟩ and ⟨ᵿ⟩ have been removed from Wikt, despite being convenient and being used by the OED, because they're not proper IPA. By that argument, we shouldn't copy the OED's non-IPA conventions for stress marks either.
(I don't mind marking the non-phonemic distinction between primary and secondary stress, which is prosodic rather than lexical, as long as we don't use "secondary" stress to indicate that an unstressed vowel is not reduced, as MW does. The OED is generally a good model in such cases.) kwami (talk) 17:33, 8 July 2022 (UTC)
I thought I'd copy an answer from my talk page, where it appears that Theknightwho may not understand the difference between phonetic and phonemic:
"Stress marks were present in all the phonemic transcriptions , and none of the phonetic ones (monosyllabic). I don't know if that's good practice, but it's common enough to exclude irrelevant lexical details or to add extra-lexical elements to a broad phonetic transcription. A phonemic transcription, on the other hand, needs to include all phonemic distinctions. You're effectively arguing that English monosyllables don't have phonemic stress, which is contradicted by RS's on the subject." kwami (talk) 18:19, 10 July 2022 (UTC)
I do understand the difference, as you very well know from our previous conversations - that's a pretty obvious attempt at poisoning the well. That response you've just posted wasn't actually pertinent to anything I'd said, which was about what the current consensus is. Theknightwho (talk) 18:25, 10 July 2022 (UTC)
Then why do you make ridiculous arguments like it being a "mish mash" to have different phonetic and phonemic transcriptions? Either you don't understand, or you're making arguments in bad faith in an attempt to score points. kwami (talk) 18:27, 10 July 2022 (UTC)
I haven't seen any convincing justification for treating phonetic and phonemic transcriptions differently when it comes to stress. Theknightwho (talk) 18:31, 10 July 2022 (UTC)
I know a lot of Unicode code points have gotten pages. Should we do the same for UCSUR code points? If so, should we also make some pages for words using UCSUR scripts, such as in toki pona? GTbot2007 (talk) 21:00, 8 July 2022 (UTC)
We don’t do conlanguages, so why should we do conscripts? (Those few that we allow don’t have conscripts.) We would also have completeness problems due to copyright. Fay Freak (talk) 21:19, 8 July 2022 (UTC)
Some of the appendix-only conlangs do have conscripts. Also Cistercian Numerals are in UCSUR and are NOT conscripts. GTbot2007 (talk) 21:37, 8 July 2022 (UTC)
The main obstacle is that they aren't part of Unicode, so there's no guarantee that any given site visitor will see the same thing as anyone else. We do have a display hack that allows us to control what site visitors see in certain pages created to work around MediaWiki syntax constraints, but I'm not sure if that would work for Private Use Area codepoints. Someone would have to set it up for you, anyway. Chuck Entz (talk) 00:30, 9 July 2022 (UTC)
It's just like CJK Unified Ideographs Extension G: not many people can see it, but that's why there is an image on the pages --GTbot2007 (talk) 01:23, 9 July 2022 (UTC)
Technically there is nothing official about Unicode either. Also, as far as I can tell, UCSUR is kinda different than those because different fonts agree on UCSUR. --GTbot2007 (talk) 09:22, 10 July 2022 (UTC)
Then we can document both, because a word can have more than one meaning. The only PUA standards I see a lot of people using are UCSUR, MUFI and SMuFL, so we could do just those --GTbot2007 (talk) 02:31, 21 July 2022 (UTC)
Unicode is an international standard supported by virtually all modern computers. I'm not sure if it's more or less of a standard than, say, the kilogram, which is more omnipresent but has more outright opposition with the pound and stone. In any case, there's a huge difference between a standard supported by Apple, Microsoft, IBM, Adobe, W3C and just about everyone else, and a standard not officially supported by anyone.--Prosfilaes (talk) 03:44, 12 July 2022 (UTC)
We could work with conflicting PUA interpretations for different scripts if the entries typically have many characters, so that few words have more than one reading. The page title may be gibberish (and it is sometimes not much better for Tai Tham, where there is no control over the font), but the head lines can usually be assigned a sensible font depending on the language. (Of course, as with Old Tamil, there may need to be separate entries for different encodings, in this case Unicode < 14.0 and Unicode >= 14.0.) So the requirement would have to be the existence of 'widely' available fonts for the various encodings. We could likewise accommodate MUFI characters, but in general for MUFI I think our support for normal lexical entries should be restricted to a transliteration service.
For Tengwar, there is confusion over whether the original encoding was superseded by what looks like a Unicode proposal. In general, I think we should require a reasonably well-standardised encoding, but we can cope with two encodings. In principle, it's not much worse than multiple orthographies. --RichardW57m (talk) 15:49, 11 July 2022 (UTC)
Ten years ago I had a discussion on Meta of Mongolian in the PUA, part of which I screenshot above; I now see just boxes. How does “ ” display to you?
I don't see the need, either. The reason why Tolkien's scripts aren't encoded in Unicode is because they have possible IP constraints and the Tolkien estate isn't going to waive them. Same thing with pIqaD and Paramount. So why should we push the line? Yes, the Cistercian numerals are currently only in UCSUR, but they're generally out of scope anyway.--Prosfilaes (talk) 03:44, 12 July 2022 (UTC)
In general, we shouldn't have entries at or using PUA codepoints, for the reasons outlined above (they mean and display different things to different people), for which reason the usual Unsupported pages title-changing method also wouldn't work, unless perhaps we could change the title to an image. If we want to include images of e.g. Tengwar script at Appendix:Sindarin, then we can discuss the legality of that, but we shouldn't be using PUA codepoints. Prosfilaes' text shows up as boxes for me, whereas Richard's shows up as Lao script (except the first character, which is a box). - -sche(discuss)00:41, 16 July 2022 (UTC)
The text I gave is entirely composed of Lao script characters; my point is that even assigned characters are gibberish if you lack the means to display them. Now, text of a known language will display if the script is defined and a font is successfully nominated for it. The same goes for text in the PUA. What matters is that there is an agreed encoding* for it. Indeed it has been the case that if a suitable font is not specified for it, typically by a user's CSS file, Pali entries in the Sinhala script will display as some other word. That does not delegitimise the entry. --RichardW57 (talk) 05:32, 16 July 2022 (UTC)
In the case of words in the Tai Tham script, it seems that there doesn't have to be a firmly agreed encoding for it. --RichardW57 (talk) 05:32, 16 July 2022 (UTC)
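As an illustration of the concern in this thread, here is a minimal Python sketch of how one might check whether a page title contains Private Use Area code points. It relies only on the standard `unicodedata` module, where private-use characters have general category `Co`. The function names are hypothetical and are not part of any Wiktionary or MediaWiki tooling.

```python
import unicodedata


def pua_code_points(text: str) -> list[str]:
    """Return the private-use code points found in `text`, as U+XXXX strings."""
    # General category "Co" (Other, private use) covers the BMP Private Use
    # Area (U+E000..U+F8FF) and the two supplementary private-use planes.
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Co"]


def title_uses_pua(title: str) -> bool:
    """True if the (hypothetical) page title contains any private-use code point."""
    return bool(pua_code_points(title))
```

A check like this only says that a code point is unassigned by Unicode; it cannot tell which private agreement (UCSUR, MUFI, SMuFL or another) the author had in mind, which is the ambiguity discussed above.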
An Election Compass is a tool to help voters select the candidates that best align with their beliefs and views. The community members will propose statements for the candidates to answer using a Likert scale (agree/neutral/disagree). The candidates’ answers to the statements will be loaded into the Election Compass tool. Voters will use the tool by entering their answers to the statements (agree/disagree/neutral). The results will show the candidates that best align with the voter’s beliefs and views.
Here is the timeline for the Election Compass:
July 8 - 20: Community members propose statements for the Election Compass
July 21 - 22: Elections Committee reviews statements for clarity and removes off-topic statements
July 23 - August 1: Volunteers vote on the statements
August 2 - 4: Elections Committee selects the top 15 statements
August 5 - 12: candidates align themselves with the statements
August 15: The Election Compass opens for voters to use to help guide their voting decision
The Elections Committee will select the top 15 statements at the beginning of August. The Elections Committee will oversee the process, supported by the Movement Strategy and Governance team. MSG will check that the questions are clear, there are no duplicates, no typos, and so on.
We have a zillion ways of linking to Wikipedia, Wikisource, etc. I am going to clean them up. This does *NOT* involve eliminating qualitatively different ways of linking, but redundant templates that do the same thing. We have a lot of qualitatively different link templates (IMO too many), from heaviest to lightest (these are just the ones I've discovered so far, there may be others):
The situation is similar for other projects, e.g. {{projectlink|source}} (and its aliases {{projectlink|wikisource}}, {{sourcelite}}, {{PL:source}}, {{PL:wikisource}}) do exactly the same thing (i.e. they produce the same-looking results) as {{source}} but use different code.
I am planning on eliminating redundant aliases, esp. the longer ones, and different access points for the same thing. For example, I don't think we need {{projectlink}} at all, and it's not used that much (about 1300 uses); use {{pedia}} or a similar dedicated template. I also think {{pedlink}} (177 uses) and {{WPLook}} (no uses) are unnecessary, and long aliases like {{wikipedia-slim}} for {{slim-wikipedia}} are pointless. {{projectlink/Wikipedia}} should not be the main entry; it should sit at {{pedia}}, which is by far the most-used alias, and {{projectlink/Wikipedia}} eliminated. {{wpl}} is a convenient interface for non-English uses, but it should be given a different name, as its existing name is terrible; I suggest {{R:WP}} (or repurpose {{R:W}}, which is currently unused) in line with other reference templates used in References and Further Reading sections. Benwing2 (talk) 03:56, 11 July 2022 (UTC)
Specifically:
We only need one shortcut {{wp}} for {{wikipedia}} ({{wiki}} is a bad name as there are lots of wiki- projects).
We only need {{pedia}} and an alternative interface {{R:WP}} ({{pedialite}} is a bad name, both in that it's a longer alias and in that the -lite series of templates normally refers to non-Lua variants of Lua-backed templates, whereas {{pedialite}} is in fact backed by Lua; but obsoleting it is a bit problematic because it's used quite a lot).
We only need {{w}}; it's already as short as possible.
We don't need {{pedlink}} or {{WPLook}} at all, as I mentioned above.
Oh hell, I found some more: {{in wikipedia}}, {{PL:dab}} (a bad name), {{wtorw}} (a horrible name), {{w2}} (another bad name), {{vern}} ("Hey Vern!"). Each has their own code. Seems like each user feels the need to create their own version, all similar but slightly different. "You are in a maze of twisty little passages, all alike". Benwing2 (talk) 04:29, 11 July 2022 (UTC)
@Benwing2 {{vern}} is different. Its main purpose is to track redlinks for vernacular names of taxa. The WP link is only intended as a temporary substitute for the link to the entry: once the WT entry exists, the template links to the WT entry and adds a different tracking category so it can be replaced with an ordinary wikilink. Chuck Entz (talk) 04:45, 11 July 2022 (UTC)
"Temporary" will probably mean a very long time where {{vern}} is used for uncommon vernacular names, including mistaken spellings, eg, using non-ASCII letters, with wrong gender in specific epithet, with "ii" instead of "i" for specific epithet endings (or vice versa), etc. DCDuring (talk) 12:13, 11 July 2022 (UTC)
I am for cleaning this up. Vern is important. While we are at it, can we decide where to put these links? Either under the L2 or under further reading? Vininn126 (talk) 12:54, 11 July 2022 (UTC)
I am planning on eliminating {{projectlink}} and {{projectlinks}}. {{projectlinks}} will be replaced by multiple invocations of {{projectlink}}, and then {{projectlink}} will be replaced by using project-specific templates, as follows: use {{pedia}} for Wikipedia; use {{specieslite}} for Wikispecies (because there are 15,256 uses of {{specieslite}} and <= 21 uses of {{PL:species}} or {{projectlink|species}}); use {{PL:PROJECT}} for other Wikimedia projects (e.g. {{PL:books}} for Wikibooks, {{PL:source}} for Wikisource, etc.). @Surjection there is a gadget MediaWiki:Gadget-AggregateInterprojectLinks.js that you have worked on recently that may be affected by these changes; not sure, because the Javascript appears to operate on the resulting HTML rather than the raw Wikisource code. Benwing2 (talk) 05:33, 13 July 2022 (UTC)
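The replacement rule described in this post can be sketched in Python as follows. This is only an illustration under stated assumptions: the regular expression, the function names, and the use of `pedia` as the first parameter of {{projectlink}} for Wikipedia are hypothetical; the thread itself only confirms `source`, `wikisource` and `species` as parameter values.

```python
import re

# Alias mentioned in the thread: {{projectlink|wikisource}} == {{projectlink|source}}.
ALIASES = {"wikisource": "source"}

# Projects that get a dedicated template instead of {{PL:PROJECT}}.
# "pedia" as a {{projectlink}} parameter value is an assumption.
DEDICATED = {"pedia": "pedia", "species": "specieslite"}

# Matches {{projectlink|PROJECT}} or {{projectlink|PROJECT|more|params}},
# without nested templates inside the parameters.
PROJECTLINK = re.compile(r"\{\{projectlink\|([^|{}]+)((?:\|[^{}]*)?)\}\}")


def target_template(project: str) -> str:
    """Name of the project-specific template that replaces {{projectlink|project}}."""
    project = project.strip()
    project = ALIASES.get(project, project)
    return DEDICATED.get(project, f"PL:{project}")


def migrate(wikitext: str) -> str:
    """Rewrite every {{projectlink|...}} call, keeping any further parameters."""
    def repl(match: re.Match) -> str:
        return "{{" + target_template(match.group(1)) + match.group(2) + "}}"

    return PROJECTLINK.sub(repl, wikitext)
```

For example, `migrate("{{projectlink|species|Homo sapiens}}")` yields `{{specieslite|Homo sapiens}}`, while text without {{projectlink}} is returned unchanged.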
I suggest {{R:wsource}} in place of {{PL:source}}, {{R:wquote}} in place of {{PL:quote}}, etc. The total number of references to all projects other than Wikipedia and Wikispecies is ~ 500, so this sort of change won't be a big deal. I also propose making the first parameter a language code, but omittable for English, and the second parameter the page to link to, but omittable if the linked page is the same as the page title. That way, we can have {{R:wp}} to replace the badly named {{wpl}}, for linking to Wikipedia in Further Reading sections. Benwing2 (talk) 07:42, 13 July 2022 (UTC)
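The parameter scheme proposed here (language code first, defaulting to English; page second, defaulting to the current page title) could be resolved roughly as in the Python sketch below. The function name is hypothetical, and treating an empty first parameter as "omitted" is an assumption about how one would write a call that gives a page but no language code.

```python
def resolve_link_params(args: list[str], page_title: str) -> tuple[str, str]:
    """Return (language code, page to link to) for a call like {{R:wp|lang|page}}.

    `args` holds the positional parameters in order; missing or empty
    parameters fall back to the defaults described in the proposal.
    """
    # First parameter: language code, omittable for English.
    lang = args[0].strip() if len(args) > 0 and args[0].strip() else "en"
    # Second parameter: linked page, omittable if it equals the page title.
    page = args[1].strip() if len(args) > 1 and args[1].strip() else page_title
    return lang, page
```

Under these assumptions, a bare call on the entry beer resolves to `("en", "beer")`, and a call with only `fr` on the entry bière resolves to `("fr", "bière")`.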
Also, I propose renaming User:Fish bowl's badly named {{w2}} template to {{lw}}, since it's effectively a cross between {{l}} and {{w}}; it works similarly to {{w}} but takes approximately the parameters of {{l}} (and even uses {{l}} under the hood). The badly named {{ruby/ja-w2}} + variants {{jarw}} and {{wj}} will become {{ja-lw}}. Benwing2 (talk) 07:47, 13 July 2022 (UTC)
Just a thought on the unification of templates: ("what do we need"?) For all, we need i)a link to PAGENAME, ii)an alternative showing of the link, iii) a lang= (en is default). The formats we need: 1) a box. {{wikipedia}} +text= could be a useful addition? 2) a written out expression with visible logo {{pedia}} 3) an inline link, like {{w}} as in definitions +tooltip+need some kind of visible feature to warn the reader that we link to a different wikiproject. Like
for Wikipedias, a light blue+tooltip e.g. Hominidae (Is used at greek el:Template:w)
@Thadh When you say merge {{w}} and {{w2}}, which interface should be used? I suspect the large majority of uses of {{w}} are for English Wikipedia and rely on the default language code of en, whereas the large majority of uses of {{w2}} are for non-English Wikipedias and rely on the convenience of specifying the language code in |1=. From above, there are > 250,000 uses of {{w}} so it would take some doing to convert them all (not impossible; I converted all uses of {{inflection of}} to put the language code in |1=, and there were about 1.4 million of them). Benwing2 (talk) 00:28, 14 July 2022 (UTC)
@Sarri.greek Thanks for your comments. I agree with your approach of having a three-way distinction between boxes, "Further Reading"-style lines and inline links, and I also like the idea of color-coding the cross-project links. My proposal for naming these templates for projects other than Wikipedia is probably as follows: (1) for a box, {{wikisource}}, {{wikiquote}}, {{wikibooks}}, etc. (these already exist); (2) for a line, {{R:wsource}}, {{R:wquote}}, {{R:wbooks}}, etc.; (3) for an inline link, {{wsource}}, {{wquote}}, {{wbooks}}, etc. Some of the latter already exist. Benwing2 (talk) 03:30, 14 July 2022 (UTC)
@Benwing2: I meant, create an RFM and see which one gathers more support. I personally have never used {{w2}}, yet used {{w|lang=X}} quite regularly. Thadh (talk) 07:00, 14 July 2022 (UTC)
How to request examining whether senses are different
Example: these two sense definitions for the verb sight:
I do not understand the difference, which may be due to these senses not being different, but also to the definitions being insufficiently precise, compounded with a lack of aptly illustrative examples. I could propose a merger of the senses at WT:RFM, but I’m not at all sure they should be merged, and there is no applicable template for flagging this. I could request verification of these senses, using {{rfv-sense}}, but the issue is not really verification; it is, rather, enlightenment. What is the proper, or best, way to flag this as an issue for discussion? --Lambiam07:45, 11 July 2022 (UTC)
WT:Usernames_and_user_pages recommends that user names should be easy for English-speakers to pronounce and type. User names like 沈澄心 fail this massively, but are saved by an exemption for the Wikimedia common login. The example user appears to be active on Mandarin Wikipedia. Is there any plan for getting round the problem with names like this? Ideally it would be possible to set up synonyms (as opposed to alternative, softly linked accounts, as with my accounts RichardW57 and RichardW57m, where the latter is relatively insecure). This solution would probably have to be implemented at a Wikimedia/Phabricator level. --RichardW57m (talk) 12:05, 11 July 2022 (UTC)
I don't think this requirement can be enforced, due to unified login. If you have difficulty typing my username, you can ping @Dringsim instead. I will receive notifications via E-mail (response may be relatively slow). 沈澄心✉13:04, 11 July 2022 (UTC)
@沈澄心: 'Dringsim' is an example of a 'softly linked account'. Thank you for taking the time to set it up. As you noted, the solution is inferior to it being a synonym for your main account. --RichardW57m (talk) 15:46, 11 July 2022 (UTC)
@RichardW57m This is something I was thinking of just today as I tried to ping an editor with an Arabic name. Maybe we should mandate all users to have a pingable alternative username like Dringsim above. However, this is not enough in itself, as 1) it does not make them recognisable and 2) the alt account is not obvious to anyone seeing the user for the first time. I think those editors should make some allusion to the alt in their nickname. For example, the person above would be called 沈 (@Dringsim) or something to that effect. brittletheories (talk) 11:20, 12 July 2022 (UTC)
Improve on the simplistic intransitive/transitive verb distinction
We have first class support (labels and categories) for the distinction of transitive and intransitive verbs but next to no support for verbs taking objects in other grammatical cases. This leads to tons of inconsistencies:
Apart from the inconsistencies and missing categorization, it is also not clear whether "transitive|dative" means that the verb takes an accusative and a dative object (as in holen) or only a dative object (as in yardım etmek). I don't think {{+obj}} (experimental since 2014) is a satisfactory solution; if we document the existence of an accusative object in the label, it is only reasonable to expect the same to be true for objects in other cases (which is reflected by the fact of how widespread these non-standard case annotations within labels are).
Ideally, we would show exactly which objects are present, how they semantically relate to the verb and whether they're optional, but I can't think of a good way to present all this information without bloating everything immensely. The concerned sense (6) in schreiben could be labeled with something like {{lb|de|(acc)<what is being written>|dat<to whom it is being written>}} or, without the semantic information, {{lb|de|(acc)|dat}}, which would place it in Category:German verbs that optionally take an accusative object and Category:German verbs that take a dative object, while perhaps something like {{lb|de|noobj}} could be used for verbs that take no objects in any case (both senses). I personally find this potentially parenthesized list of cases to be vastly superior to the simple transitive/intransitive distinction. While it appears to break the long-standing tradition of distinguishing between transitive and intransitive verbs, it is always possible to make {{lb|en|transitive}} be an alias of {{lb|en|acc}}; the presentation of this object information can also always be tailored to the specific language (i.e. {{lb|en|acc}}/{{lb|en|(acc)}}/{{lb|en|noobj}} can be made to display (transitive)/(transitive, intransitive)/(intransitive) as usual for English and other languages with simple case systems).
This could also be extended to prepositional objects but IIRC I've proposed that a while back and it hasn't garnered the best feedback.
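To make the label proposal above more concrete, here is a rough Python sketch of how arguments such as `(acc)<what is being written>` and `dat<to whom it is being written>` might be parsed and turned into the category names given in the post. Everything beyond the `acc`, `dat` and `noobj` arguments and the two quoted category names is an assumption: the lookup tables, the function names and the `gen` entry are hypothetical, and the thread proposes no category name for `noobj`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables; only "acc", "dat", "de" and "en" occur in the proposal.
CASES = {"acc": "accusative", "dat": "dative", "gen": "genitive"}
LANGUAGES = {"de": "German", "en": "English"}

# "(acc)<gloss>": parentheses mark an optional object, <...> holds the gloss.
ARG = re.compile(r"^(\()?([a-z]+)(\))?(?:<(.*)>)?$")


@dataclass
class ObjectSpec:
    case: str
    optional: bool
    gloss: Optional[str]


def parse_object(arg: str) -> ObjectSpec:
    """Parse one label argument such as "dat<to whom>" or "(acc)"."""
    m = ARG.match(arg.strip())
    if not m or m.group(2) not in CASES or bool(m.group(1)) != bool(m.group(3)):
        raise ValueError(f"unrecognised object label: {arg!r}")
    return ObjectSpec(CASES[m.group(2)], bool(m.group(1)), m.group(4))


def categories(lang_code: str, args: list[str]) -> list[str]:
    """Category names implied by the object labels of one sense."""
    if args == ["noobj"]:
        return []  # no category name for this case is proposed in the thread
    lang = LANGUAGES[lang_code]
    result = []
    for spec in map(parse_object, args):
        article = "an" if spec.case[0] in "aeiou" else "a"
        optionally = "optionally " if spec.optional else ""
        result.append(
            f"Category:{lang} verbs that {optionally}take {article} {spec.case} object"
        )
    return result


def english_display(args: list[str]) -> str:
    """Traditional label text for English, as suggested in the post."""
    table = {
        ("acc",): "(transitive)",
        ("(acc)",): "(transitive, intransitive)",
        ("noobj",): "(intransitive)",
    }
    return table[tuple(args)]
```

With the schreiben example, `categories("de", ["(acc)<what is being written>", "dat<to whom it is being written>"])` gives the two categories named in the post, and the per-language display function shows how the traditional transitive/intransitive wording could be kept for English.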
Isn't transitive more strictly "that which has an accusative argument and also a passive participle"? Granted, having some sort of argument is something we should be documenting, but terminology here matters. Vininn126 (talk) 22:53, 12 July 2022 (UTC)
@Vininn126: Are you asserting that languages that don't have a passive voice don't have transitive verbs? And what counts as a passive participle? What I interpret as Thai participles are unmarked for voice, and don't even need to be analysed as zero-derivation participles. --RichardW57m (talk) 09:24, 13 July 2022 (UTC)
That has been a definition I have always heard - at least within certain languages, i.e. in Polish you have a verb like obejść that takes an accusative participle but is considered (by most sources) to be intransitive as it lacks a passive participle. It's even a bit of a pseudo... not shibboleth but pop-linguistics question to ask people, to form the passive voice with that verb. Vininn126 (talk) 09:27, 13 July 2022 (UTC)
@Vininn126: This is the first time I've heard this distinction. It appears to be an idiosyncrasy of Polish grammar specifically (see the section w:Transitive_verb#In_Polish). Also, the rule seems to be "takes a direct object or has a passive participle" which would also make more sense to me. This website names verbs that take the genitive (in the positive) or instrumental but have a passive participle and are thus classified as transitive. This can still easily be accommodated for under the {{lb|en|(acc)}} etc. proposal as this logic for which case annotations lead to which things being displayed can be specified on a per-language basis. — Fytcha〈 T | L | C 〉 10:31, 13 July 2022 (UTC)
Sure, that was the assumption I originally had before hearing certain Polish grammarians. I think it's more intuitive anyway. It doesn't really change my ultimate feelings on such a proposal - I do think it's worth it to discuss how we represent transitivity. Vininn126 (talk) 10:37, 13 July 2022 (UTC)
There's also {{indtr}} for verbs taking a prepositional complement. It doesn't really work for languages with a case system, though. PUC – 22:56, 12 July 2022 (UTC)
@Fytcha What you are proposing with labels is very similar to what I have implemented for {{+obj}}. I prefer doing it using a separate template to the right of the definition rather than mixing case/prepositional usage with semantic labels. I also pinged you elsewhere about this; see User:Benwing2/test-obj for a bunch of examples of what I implemented. What I implemented doesn't currently categorize, but that is not hard to add if we come up with a good scheme for doing it. My main unhappiness with what I implemented is with the appearance; I tried various possibilities, including different colors, to distinguish (a) literal words (prepositions/postpositions/conjunctions/etc.); (b) cases and related grammatical labels (subjunctive, reflexive, etc.); (c) glosses; (d) the word "or". If you or someone else can come up with a good scheme for how this should look, I can implement this and push it to production. BTW there is also {{+preo}}/{{+posto}} (used for similar purpose as {{indtr}}), which my new {{+obj}} subsumes. Benwing2 (talk) 03:13, 13 July 2022 (UTC)
@Benwing2: Thanks a lot, the new template seems great. I do however agree that we need to come up with a solution to make it more easily humanly readable. The biggest problem to me is that it is not immediately visually apparent where the definition actually is (i.e. where the labels end and the objects begin). The band-aid solution would be having distinguishing background colors for the three components (labels, definitions, objects; even just giving the objects a different bg color does a lot) but I can imagine why people would oppose that. If we moved all grammatical information after the sense[1] that would also make it a lot easier to read. Further, if we are linking to the cases anyway, we could also think about using the standard 3 letter abbreviations for them to save some space. And on a related note, I personally don't like the semicolon given that the template already comes with such an easily identifiable or anyway.
I'm not quite sure how this meshes with the transitive, intransitive, reflexive etc. labels. To me it seems like we have split up the information of which objects a verb takes into two visually separated areas (most editors seem to share this perception based on the fact that they usually document indirect objects in the labels; there's a definite tendency to document both together, they're the same kind of information after all). The confusion of whether transitive+dative means that it takes one or two objects (one in the accusative, one in the dative) as I've highlighted above also persists. I don't know how to solve this outside the radical approach of also moving the accusative object(s) behind the definition line (which will be opposed by the community if it came to a vote). The examples for beginnen (no label, +obj transitive), anziehen (transitive, +dative), erinnern (ditransitive, +accusative +genitive) in User:Benwing2/test-obj perfectly demonstrate this issue. But even if we leave these points unsolved and only concern ourselves with non-accusative and prepositional arguments, it is a big improvement.
1: Apart from us treating accusative and objects in other cases differently (documenting the former in labels and the latter either with nonstandard labels or after the sense with {{+obj}}, {{+preo}} etc.), the same is also done for nouns where countability is a label but other grammatical information comes after the sense (see e.g. Ort). I honestly think we are too far gone and that there will never be community consensus to rectify this, so we have to make the best out of the inconsistent situation which your template improvements do. — Fytcha〈 T | L | C 〉 11:53, 13 July 2022 (UTC)
I think it is good to add such term-specific grammatical info in a standardized, machine-interpretable way. It may need some effort to get this right. The example above for labelling schreiben indicates an optional direct object, but the indirect object is equally optional. Isn't in fact the general situation that these roles are optional? We may want to mark the less usual situation that they are semi-mandatory, as for the Turkish verb yemek (“to eat”) – you cannot ask *yediniz mi?; it has to be yemek yediniz mi? or bir şeyler yediniz mi?. A minor point is that in Germanic languages the indirect object (marked by a case, not in preposition form) tends to precede the direct object, so the preferable order is {{lb|de|dat<to whom something is being written>|acc<what is being written>}}. --Lambiam10:05, 13 July 2022 (UTC)
I think we can further this by talking about syntax in general. Imo {{+preo}}/{{+posto}} aren't bad, but how should I handle a noun like bitwa? I currently have the prepositions next to the headword to signify those prepositions apply to all meanings, but I feel like that is a sub-optimal solution. Also, how can we handle things like certain conjunctions like чтобы and żeby? Finally, what about Slavic language verbs with accusative arguments, unless it's negated, in which case you use genitive? Would we just mark them as using accusative and the reader has to know to use genitive otherwise? Vininn126 (talk) 10:58, 13 July 2022 (UTC)
I think there are a number of repeated language-specific issues with such awkward behaviours. We have French reflexives that take objects, copulas that are occasionally passivised (English 'be'!), and Slavic (+ at least some Uralic) accusative/genitive switching. Hebrew has some similar behaviour to the last. It's not clear to me that Latin deponent verbs can't be transitive, though I don't think they can be used in the passive. --RichardW57m (talk) 11:50, 13 July 2022 (UTC)
I'd like to chime in with a perspective from the Japanese language.
From what I've read, and recall from earlier in my own education, English grammar distinguishes between syntactically intransitive verbs, verbs that have no direct object in a given sentence, and syntactically transitive verbs, verbs that do have a direct object in a given sentence. Thus, I eat uses the verb eat intransitively, and I eat spaghetti uses the verb eat transitively with the object spaghetti.
Japanese grammar distinguishes between semantically intransitive verbs, called 自動詞(jidōshi, literally “self-acting + word”), and semantically transitive verbs, called 他動詞(tadōshi, literally “other-acting + word”). Verbs are classed as one or the other, regardless of the syntax of a given sentence. Thus, 私は食べる(watashi wa taberu, literally “I eat”) uses the semantically transitive verb 食べる(taberu, “to eat”) without an explicitly stated object, and 私はラーメンを食べる(watashi wa rāmen o taberu, literally “I ramen eat”) uses the semantically transitive verb 食べる(taberu, “to eat”) with the explicitly stated object ラーメン(rāmen), and in both cases the verb taberu is classed as a 他動詞(tadōshi, “semantically transitive verb”).
Occasionally, there are exceptions that can be puzzling to the language learner. One specific example is that semantically intransitive verbs can sometimes take a direct object marked with the object / accusative particle を(o), particularly verbs of motion or temporal persistence as used in expressions describing a time or place through which the action of the verb happens. This might be things like 私は道を歩く(watashi wa michi o aruku, literally “I road walk”). The verb 歩く(aruku, “to walk”) is classed as a 自動詞(jidōshi, “semantically intransitive verb”), as the action of the verb does not inherently involve the agent acting upon something else, but rather the agent does something that affects the agent themselves, in a specifically non-reflexive fashion. The grammatical object in this sentence references the spatial context through which the action occurs, and does not indicate an object that is semantically required by the action of the verb.
Some non-mainstream grammatical analyses posit the existence of things like "dative subjects" and "nominative objects" in Japanese. What I've read of these so far boils down to apparent confusion introduced by analyzing Japanese grammar in specific constructions from the perspective of the idiomatic English translations of those constructions. As Japanese, and ignoring the colloquial English equivalents, these putative oddities evaporate. (For those interested, read through this thread at Japanese Stack Exchange.)
With regard to some of the discussion above, pretty much all Japanese verbs have a "passive" form, even intransitives, due in part to the additional use of the passive as a method of indirection, for purposes of increased politeness, and/or to express capability. Thus, we can say things like どこへ行かれますか (doko e ikaremasu ka, “where ” → “where are you going?”), or いよいよ来られました (iyoiyo koraremashita, “finally ” → “you finally came”).
In light of these aspects of how Japanese verbs operate, the presence of direct objects or passive forms cannot be used as any clear indicator of transitivity / intransitivity in Japanese. ‑‑ Eiríkr Útlendi │ Tala við mig 23:11, 13 July 2022 (UTC)
@Fytcha Can you create some mocks of how you think things ought to look? I am not so good at user interfaces, having always done back-end software development and rarely front-end work. BTW I'm not opposed to stuffing all this info into the label before the definition if people think this is the best way; my preference is to put it after mostly because I think the definition itself is more important, and having a bunch of semantic labels + syntactic info all before the definition can make the definition get lost. And yes I don't think we need to solve every issue to create something far better than what we have today. For example, in a language where the concept of transitive vs. intransitive doesn't make sense, you can simply avoid those labels in favor of specifying the case explicitly. Benwing2 (talk) 03:25, 14 July 2022 (UTC)
@Benwing2: Unfortunately, I currently don't have the time to initiate some big change, but seeing that the transitive/intransitive distinction is more complicated than I had initially been under the impression of, I don't think I can stand behind my initial proposal anymore anyway. I think your proposed template that subsumes {{+preo}}, {{+posto}} and {{indtr}} is a big improvement and I don't have a whole lot to suggest in terms of presentation apart from what I've already said (getting rid of the semicolon before the or). The presentation can always be changed retrospectively; what really counts is that we have the verbs' argument data present in Wikicode in a consistent format. — Fytcha〈 T | L | C 〉 22:22, 18 July 2022 (UTC)
A further thought, however
What if we included something like a syntax header, or put that under Usage notes? Also, should these be collapsible? Vininn126 (talk) 22:27, 18 July 2022 (UTC)
Lowercase at definitions
Could you please make the templates {{alternative form of}} and {{diminutive of}} start with a lowercase letter by default (as in definitions)? Also, could lowercase be standardised as the default for similar templates in Category:Form-of templates? As it is, we need to keep adding nocap=1 everywhere in definitions. I take it that in this dictionary, capital initial letters are used when we write a word which is always written with a capital first letter (depending on the language). E.g. now, the impression is that Alternatives or Diminutives are proper nouns in English. (Please keep in mind that en.wiktionary is English, but has an international audience: it is crucial to know how words are normally written.) This is unlike etymologies, where we need cap=1: there, we have a full text with sentences (starting with a capital, ending with a full stop). Otherwise, we are left only with template {{form of}}, which is normal, with initial lowercase by default, plus the need to add categories with {{cln}}. Is there a link to the house rules for commas, capitals, and the like? Is there any reassurance that the use of full stops and capitals is and will remain unchanged in the future? (Sometimes I use nodot=1 and nocap=1 provisionally, for fear that templates might be changed in the future.) Thank you. ‑‑Sarri.greek ♫ I 17:18, 13 July 2022 (UTC)
An alternative approach that wouldn't burden those entering English alternative forms etc. would be simple templates that called the above templates and specified "nocap=1". This seems like a Grease Pit matter or you could make such simple templates yourself, which might trigger some veteran template editor to make more definitive templates.
As to "house rules" (ie, a style guide), you could propose one. Our practice is to have full definitions for English terms with initial capitals and a full stop. The usual practice for entries in other languages favors simple glosses with no initial caps and no full stop. DCDuring (talk) 18:08, 13 July 2022 (UTC)
O!! @DCDuring, Equinox, thank you. Now i see at Wiktionary:Style_guide#Patterns that en.wikt starts definitions with capitals, even these short phrases! But why? Why in such short explanations? e.g. To walk like... I might think it is To instead of to. I might understand the initial capital if there are 2 sentences in a much larger text... I have asked the same at fr.wikt, where e.g. for fr:Sunday they write *Dimanche instead of dimanche etc... So confusing. I'm so sorry! That makes us (us=non host-language speakers) doubt any wiktionary def especially in wiktionaries of languages we do not understand, we never know: is it or is it not?... ‑‑Sarri.greek ♫ I 18:58, 13 July 2022 (UTC)
So BTW, @DCDuring, Equinox, Sarri.greek Awhile ago I proposed making all form-of templates generate an initial capital and final period when the language is English, and no initial capital or final period when the language isn't English. IMO this would solve the perennial issue of whether to make these templates capitalize and add a period or not, and this is consistent with what User:DCDuring said above about glosses vs. full definitions. Some people objected to this approach in the past but I'd like to ask it to be reconsidered; the current situation with form-of templates is an utter mess. Benwing2 (talk) 03:19, 14 July 2022 (UTC)
Small note: @Benwing2, i do not understand in English? Everything here, in en.wikt, is in English. About initial capitals to non-initial-capital-words: i call it "the curse of wikipedia", where everything is lemmatized with capital, regardless of how a word is written. :) ‑‑Sarri.greek ♫ I 04:17, 14 July 2022 (UTC)
I don't think he said "in English", he said "is English", i.e. when the language of the term being defined is English (not the language of the definition). The reason there is an inconsistency between English and non-English languages is that definitions of English words are descriptive, whereas definitions of non-English words are typically just one-word translations. It is exceedingly rare for the capitalization practices here to create ambiguity. Just as is the case for capitalization at the beginning of sentences, you should generally assume that the capitalization at the beginning of definitions is grammatical/aesthetic and not actually part of the word. Andrew Sheedy (talk) 04:24, 14 July 2022 (UTC)
(edit conflict) @Sarri.greek I should have been clearer, what I meant by "the language is/isn't English" is that the language of the term is/isn't English. User:Andrew Sheedy explained it well. English-language terms like physics have full-sentence definitions:
# The branch of [[science]] concerned with the study of the properties and [[interaction]]s of [[space]], [[time]], [[matter]] and [[energy]].
So the corresponding non-lemma forms, alternative forms and the like would have their definitions formatted like full sentences as well. Greek φυσική (fysikí), Russian фи́зика (fízika) and similar terms in other languages have their definitions as simple glosses (e.g. just # [[physics]]), so the non-lemma forms, alternative forms, etc. would be formatted likewise. That would be much better than the current mess at Category:Form-of templates. All form-of templates would take params |nocap= and |nodot= that could be used in the definitions of English-language terms (and for non-English-language terms would either have no effect or throw an error), as well as a param |cap= that could be used in the definition of a non-English-language term if you really wanted it (and for an English-language term would either have no effect or throw an error). Benwing2 (talk) 04:30, 14 July 2022 (UTC)
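The behaviour proposed above can be sketched roughly as follows. This is only an illustration in Python, not the actual template or module code: the function name format_form_of and its signature are hypothetical, and of the two options mentioned for a parameter used with the "wrong" kind of language (no effect, or an error) it assumes "no effect".

```python
def format_form_of(text, term_lang, nocap=False, nodot=False, cap=False):
    """Format a form-of definition line (hypothetical helper, illustration only).

    term_lang is the language code of the term being defined,
    not the language the definition is written in.
    """
    if not text:
        return text
    if term_lang == "en":
        # English terms: sentence style by default; nocap/nodot opt out.
        if not nocap:
            text = text[0].upper() + text[1:]
        if not nodot and not text.endswith("."):
            text += "."
    else:
        # Non-English terms: plain gloss style; cap opts in to a capital.
        # nocap and nodot are silently ignored here (assumed "no effect").
        if cap:
            text = text[0].upper() + text[1:]
    return text
```

With this sketch, an English entry would get "Diminutive of blah." and a Greek or Russian entry would get "diminutive of blah" from the same call, without anyone having to add nocap=1 or nodot=1 by hand.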
Thank you @Andrew Sheedy, Benwing2 for explaining. Still. The example above (# The branch of [[science]] ... .) is a sentence. OK, i understand it as such. But Diminutive of blah, I do not understand it as a sentence. It is just 3 words, no verb! Nevermind, though, if it is a custom for English dictionaries, Thanks. ‑‑Sarri.greek ♫ I 04:44, 14 July 2022 (UTC)
It is indeed just a custom, which some dictionaries follow and others don't. We're not saying that the definitions are sentences, just that they're written in sentence style (which means, with the first word capitalized and a period/full stop at the end). Andrew Sheedy (talk) 04:51, 14 July 2022 (UTC)
Just to be clear, the definition of physics above is NOT a sentence, concerned being a past participle used to modify branch. But, it is formatted with initial caps and a full stop, in the same way that sentences are, as is customary here and in most print English dictionaries and many online ones. The only English dictionaries that I am familiar with that have full sentence definitions are the Collins COBUILD series.
Usually a definition of an English word is written to have the same grammatical function as the PoS header indicates, eg, an adjective definition would consist of one or more adjectives or adjectival phrases. An advantage of that is that the definition can, however awkward-sounding, be substituted (for noun definitions, after trimming any initial determiner) for the defined word in any usage to test the correctness of the definition in any usage. Non-gloss definitions characterize the usage, but don't fill the syntactic role of the term they define. DCDuring (talk) 14:39, 14 July 2022 (UTC)
Thank you for the link @J3133. I now see, it is customary for English dictionaries. Not in Greek dictionaries though, where the opposite is the usual style, which is why i was so perplexed. ‑‑Sarri.greek ♫ I 09:23, 15 July 2022 (UTC)
Since there is now specifically an {{lbor}} template in addition to just plain {{bor}}, how are we planning on using these as per official policy, particularly in relation to things like Latin and Ancient Greek borrowings by the modern descendants of these languages, or in general just borrowings from them into any language (since they are essentially dead languages that have had a lot of impact on technical, academic, and scientific domains and international vocabulary)? Wouldn't the vast majority of borrowings from Latin (whether Romance languages or not) technically be "learned"? Do we need to go back and convert several thousand entries to this (using a bot)? Word dewd544 (talk) 19:58, 13 July 2022 (UTC)
Are borrowings from Latin into, say, New High German always learned or are there some edge cases (e.g. religious vocabulary) where natural borrowing may have occurred? — Fytcha〈 T | L | C 〉 20:14, 13 July 2022 (UTC)
@Brittletheories: AFAICT, {{internationalism}} should only be used where it's not clear which language directly the term was borrowed from. Finnish hypoteettinen is marked as an internationalism (because it is not known through which language exactly the term entered Finnish; at least that's how it should be) whereas it is known for Romanian ipotetic, even though they have the same ultimate source. — Fytcha〈 T | L | C 〉 20:59, 13 July 2022 (UTC)
That is our current approach. I do wonder if we should tag internationalisms in the future alongside an ultimate source, but that's a conversation for a different day. Vininn126 (talk) 21:08, 13 July 2022 (UTC)
Bot task: checking for uncreated German terms
Hello, I am looking to put my bot (KovachevBot) to work to do a task involving the dict.cc database, a German dictionary consisting of ~1.2 million terms. I'm looking to use Pywikibot to simply check whether each lemma in the dictionary exists on Wiktionary, and from that, to create a vocabulary list which I will publish here on Wiktionary so that others can see what words have yet to be added in German. I've managed to trim down the dictionary to only around 400,000 non-redundant words, so hopefully this would reduce the burden on the servers when connecting to each page to check for its existence. I just wanted to gather approval/dismissal as to whether this is a constructive idea and how I should go about doing it so as to not put overmuch load on Wiktionary.
I've published the code on GitHub so anyone who can read Python can have a look for bugs, etc., though it seems to be working fine on the small-scale test I did for ~30 terms. I intend to expand the script to group words by their part of speech for added convenience: they would appear under different headings for each POS on the generated dump page.
@Kiril kovachev: You don't need a bot to do that. It's much faster to download the database dumps and to check whether the entries exist in there. Note however that we won't be able to add all entries from dict.cc here because they have laxer inclusion criteria than we do (this only concerns a minority of entries). You can post these finished lists to your user space. Note however that under no circumstances should entries be automatically generated based on the dict.cc data. — Fytcha〈 T | L | C 〉 20:45, 13 July 2022 (UTC)
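The dump-based check suggested here could look something like the sketch below. It assumes a plain text list of page titles extracted from a database dump, one title per line, with underscores standing in for spaces; that file layout, the function names and the sample data are all assumptions made for illustration, not taken from the script on GitHub.

```python
def load_titles(lines):
    """Build a set of page titles from an iterable of lines (one title per line).

    Underscores are converted to spaces, on the assumption that the
    titles list writes "guten_Tag" for the page "guten Tag".
    """
    return {line.strip().replace("_", " ") for line in lines if line.strip()}


def missing_lemmas(lemmas, titles):
    """Return the lemmas that have no page at all, in input order, without repeats."""
    seen = set()
    missing = []
    for lemma in lemmas:
        if lemma not in titles and lemma not in seen:
            seen.add(lemma)
            missing.append(lemma)
    return missing


# Illustrative sample only, not real dump contents.
sample_titles = load_titles(["Haus\n", "guten_Tag\n", "\n"])
sample_lemmas = ["Haus", "guten Tag", "Beispielwort", "Beispielwort"]
print(missing_lemmas(sample_lemmas, sample_titles))
```

In practice load_titles would be fed an open file handle instead of a list. Note that a page existing under a given title does not mean it has a German section, so a title-only check like this under-reports missing German entries; checking language sections would need the page text from the full dump.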
Thanks, that sounds way easier, haha. I forgot those existed. And, of course, no auto-generation can happen, not least because dict.cc doesn't allow the data to be re-published, apparently not even excerpts. I don't know what this means for whether I'll be able to upload the finished list, but anyway, thanks a lot for the input! Kiril kovachev (talk) 20:47, 13 July 2022 (UTC)
It depends on whether the compiler exercised discretion in compiling the list. If they were to purport to list all words in the German language, then their list would just be a raw fact that anyone could permissibly compile. If they claim to exercise some judgment in determining which words to include, they have a stronger claim to copyrightability. When we previously created User:Brian0918/Hotlist (a list of missing words from other large English dictionaries), as I recall we ultimately generated a list of words common to two such dictionaries, so that it was not actually a list from any one of them. bd2412 T 04:27, 14 July 2022 (UTC)
@Kiril kovachev: You could also try @Erutuon's entry index, hosted on toolforge. I had a look at the dict.cc data; its licensing scheme is a bit weird. But as long as the data is not re-published, it seems fine. I think the most interesting use of this list could be to identify missing multiword entries. – Jberkel 18:11, 20 July 2022 (UTC)
Oh, that looks splendid, looks like that'll forego a lot of trouble scanning through the colossal dump files :)
But, you're right, I do think the dict.cc license is a touch bizarre. I also don't get what the author means by "republishing the data", because that seems to include the words in the dictionary themselves, not just the translations... in that case, it wouldn't be possible to re-publish the generated list, or...? In any case I suppose I'll get in touch :')
Philippine languages subfamilies, and descendant proto-languages of Proto-Philippine
Perhaps time to create subfamilies for the Philippine languages and some descendant proto-languages of Proto-Philippine. The existing category does look too crowded in its present state.
Proposed family tree for Philippine languages (and existing languages to be grouped therein):
Batanic
Yami (tao)
Ivatan (ivv)
Ibatan (ivb)
Northern Luzon
Ilocano (ilo)
Arta (atz)
Dicamay Agta (duy)
Isnag (isg)
Pamplona Atta (att)
Villa Viciosa Agta (dyg)
Ibanag (ibg)
Itawit (itv)
Yogad (yog)
Central Cagayan Agta (agt)
Gaddang (gad)
Gad'ang (gdg)
Dupaningan Agta (duo)
Dinapigue Agta (phi-din)
Casiguran Dumagat Agta (dgc)
Nagtipunan Agta (phi-nag)
Pahanan (apf)
Meso-Cordilleran
Northern Alta (aqn)
Southern Alta (agy)
Central-Southern Cordilleran
Isinai (inn)
Binongan Itneg (itb)
Limos Kalinga (kmk)
Tanudan Kalinga (kml)
Lubuagan Kalinga (knb)
Southern Kalinga (ksc)
Batad Ifugao (ifb)
Amganad Ifugao (ifa)
Mayoyao Ifugao (ifu)
Tuwali Ifugao (ifk)
Balangao (blw)
Central Bontoc (lbk)
Eastern Bontoc (ebk)
Northern Bontoc (rbk)
Southern Bontoc (obk)
Southwestern Bontoc (vbk)
Kankanaey (kne)
Southern Kankanay (xnn)
Ilongot (ilk)
Pangasinan (pag)
Ibaloi (ibl)
Karao (kyj)
Central Luzon
Kapampangan (pam)
Abenlen Ayta (abp)
Ambala Ayta (abc)
Bolinao (smk)
Botolan Sambal (sbl)
Mag-Anchi Ayta (sgb)
Mag-Indi Ayta (blx)
Sambali (xsb)
Remontado Agta (agv)
Northern Mindoro
Alangan (alj)
Iraya (iry)
Tadyawan (tdy)
Greater Central Philippine
Central Philippine
Tagalog (tl)
Bikol
Bikol Central (bcl)
Southern Catanduanes Bicolano (bln)
Buhi'non Bikol (ubl)
Libon Bikol (lbl)
Miraya Bikol (rbl)
Iriga Bicolano (bto)
Northern Catanduanes Bicolano (cts)
Bisayan
Tausug (tsg)
Butuanon (btw)
Surigaonon (sgd)
Tandaganon (tgn)
Cebuano (ceb)
Waray-Waray (war)
Waray Sorsogon (srv)
Hiligaynon (hil)
Capiznon (cps)
Bantayanon (bfx)
Porohanon (prh)
Masbatenyo (msb)
Masbate Sorsogon (bks)
Romblomanon (rol)
Asi (bno)
Aklanon (akl)
Kinaray-a (krj)
Inonhan (loc)
Ratagnon (btn)
Cuyonon (cyo)
Caluyanun (clu)
Mansakan
Davawenyo (daw)
Mansaka (msk)
Kalagan (kqe)
Tagakaulu Kalagan (klg)
Mamanwa (mmn)
Southern Mangyan
Buhid (bku)
Western Tawbuid (twb)
Eastern Tawbuid (bnj)
Hanunoo (hnn)
Palawanic
Brooke's Point Palawano (plw)
Central Palawano (plc)
Southwest Palawano (plv)
Central Tagbanwa (tgt)
Palawan Batak (bya)
Subanen
Western Subanen (suc)
Central Subanen (syb)
Northern Subanen (stb)
Danao
Iranun (ill)
Maguindanao (mdh)
Maranao (mrw)
Manobo (mno)
several languages already correctly assigned to mno
Gorontalo-Mongondow
Bolango
Buol
Gorontalo
Suwawa
Mongondow
Ponosakan
Ati
Kalamian (phi-kal)
several languages already correctly assigned to phi-kal
Southern Mindanao
Koronadal Blaan
Sarangani Blaan
Tboli
Tiruray
Sangiric
Talaud
Sangir
Ratahan
Minahasan
Tontemboan (tnt)
Tombulu (tom)
Tonsea (txs)
Tonsawang (tnw)
Tondano (tdn)
Umiray Dumaget Agta
Also requesting addition of some descendant proto-languages of Proto-Philippine.
Support, although I think we don't have to go into detail with the internal branching of the Bisayan languages (and maybe also of the Northern Luzon and Central Luzon languages).
@TagaSanPedroAko Is there a reason why you have left the languages of Mindoro unassigned? The longstanding working hypothesis is that they can be assigned to the Northern Mindoro and Southern Mindoro (nested in Greater Central Philippine) subgroups. Also, it will be of great help for anyone who is willing to execute this request if you add the language codes. –Austronesier (talk) 16:42, 14 July 2022 (UTC)
This seems like a great idea. Plus, Bantik (bnq) should also be assigned to this group, specifically to the Sangiric subgroup. Wiktionarian89 (talk) 13:40, 20 July 2022 (UTC)
Standardise the choice of the lemma and alternative forms of Cantonese, and to allow Jyutping titles as lemma if there are no other alternatives?
Background (you may skip this if you're familiar enough with Cantonese): There are some Cantonese words without corresponding character(s) in standard written Chinese. For these words, speakers may use a character with the same or a similar sound (e.g. 咪), use a character with the same or a similar etymology (e.g. 攰), create a new character that carries both the sound and the meaning of the word (e.g. 𨋢/䢂(lì)), or, in the case of Hong Kong Cantonese speakers, use a non-standard romanisation that is often based on English phonology (e.g. hea, gur). These methods work fine for monosyllabic words, for which we would lemmatize the most common form, though many alternative forms are possible for the reasons mentioned (cf. hea). For multisyllabic words this becomes problematic: the various combinations of alternative forms appear sporadically, which makes it difficult to determine the most common form. Some dictionaries try to assign obscure characters that share a similar etymology (see yue:虢礫緙嘞 and yue:犖确 for example), but this method is rather arbitrary and creates the same problems with alternative forms. Some dictionaries use dual titles, e.g. words.hk: 啹/gur.
Proposal:
If there is an indisputable form, use that as the lemma form.
If the forms have distinct pronunciations but share the same/similar meaning(s), the lemma form should be considered separately. (e.g. 尋日/寻日 and 琴日)
If there is enough written usage to justify a particular form as the most common written form, that form should be used as lemma (e.g. hea, 攰), even in cases it may not fit the word's etymology.
If there is enough written usage but there are multiple written forms sharing similar usage, the lemma form should prefer etymology (among the forms found in usage) if possible, otherwise on a first-come-first-served basis.
If a word does not have enough written usage to establish the most common written form, the lemma form should be determined via etymology, provided that this form has actual uses, not only mentions, and that its etymology is logical and actually makes sense. (e.g. 欷欷歔歔)
If the lemma form of a word could not be determined via the rules above, then Jyutping titles will be used as a last resort, where the Jyutping will be determined based on the most common pronunciation. (e.g. the word with the pseudo-etymological form 虢礫緙嘞 would be at gwik1 lik1 kaak1 laak1) Note: This only involves around 20-30 entries by my estimation, most of which are onomatopoeia and Kra-Dai loans.
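Read as a decision procedure, the rules above might be sketched like this. It is only a rough illustration: the Form record, the choose_lemma function, and especially the numeric thresholds for "enough written usage" and "similar usage" are assumptions that the proposal does not specify. The second rule (forms with distinct pronunciations get separate lemmas) is left out, since it concerns splitting entries rather than choosing among forms.

```python
from dataclasses import dataclass


@dataclass
class Form:
    text: str
    uses: int                  # attested uses, not mere mentions
    etymological: bool = False


def choose_lemma(forms, jyutping, indisputable=None, min_uses=3, similar_ratio=0.8):
    """Pick a lemma form; forms are listed in order of creation (first come first)."""
    # An indisputable form wins outright.
    if indisputable is not None:
        return indisputable

    attested = [f for f in forms if f.uses > 0]
    if sum(f.uses for f in attested) >= min_uses:
        top = max(f.uses for f in attested)
        rivals = [f for f in attested if f.uses >= similar_ratio * top]
        # One clearly most common written form: use it, whatever its etymology.
        if len(rivals) == 1:
            return rivals[0].text
        # Several forms with similar usage: prefer an etymological one,
        # otherwise fall back to first come, first served.
        for f in rivals:
            if f.etymological:
                return f.text
        return rivals[0].text

    # Too little written usage: an etymological form, but only if actually used.
    for f in attested:
        if f.etymological:
            return f.text

    # Last resort: the Jyutping of the most common pronunciation.
    return jyutping
```

For example, with made-up counts, choose_lemma([Form("虢礫緙嘞", 0)], "gwik1 lik1 kaak1 laak1") falls through to the Jyutping title, because a form found only in mentions counts as having no uses.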
Rationale:
By lemmatizing words under a particular form, we would be implying (or misleading users) that it is the correct form of the word, especially for words in the final category. Note that almost all 虢礫緙嘞 results on Google are mentions rather than uses. (e.g. although 轆/辘(lù) is etymologically more correct, the non-verb senses of 轆/辘(lù) are almost always written as 碌(lù) in Hong Kong Cantonese, and therefore should move to 碌(lù) to reflect usage)
This will involve changing quite a number of pages, but I believe it is worth the effort to do so, considering the benefit it brings.
This set of principles can also be applied similarly to other Chinese languages in the future if necessary.
See also: this vote, where monosyllabic Jyutping entries are allowed as non-lemmas.
@Wpi31: Thanks for drafting the proposal. I think I generally agree with this, and it seems to capture what most editors have been subconsciously following. I do have some reservations about certain details, such as lemmatizing at the most common form when the etymological character is apparent. Sometimes this might cause issues where we're using different forms for cognates outside of Yue (or even within Yue), e.g. 錫 instead of 惜. I know this would give an effect of implying a correct form, but it is probably inevitable unless we want duplication of information. It is probably better to write usage notes that give details on which form is more common in which variety. Jyutping should really be a last resort where there are no attested characters (Chinese or Latin). Kwik1 lik1 kwaak1 laak1 (and its variants) do have written forms in dictionaries, and so I would tend towards following what other dictionaries have instead of resorting to Jyutping, even if the characters are rather obscure. — justin(r)leung{ (t...) | c=› } 06:00, 15 July 2022 (UTC)
@justinrleung: Regarding the first issue you mentioned, personally I feel like it's better to treat such cognates separately, as in 唔 vs 毋(wú). I'm also fine with putting everything in the same entry, but more care should be taken. Readers would not go to the usage notes section in many cases, e.g. if a non-native speaker visits 錫/锡(xī) to see what it means in Cantonese, they will see the {{zh-alt form|惜}} stating that it means "to kiss", which fits their context, and simply assume that 惜 is the more correct one without even going to the page, let alone reading the usage notes. Perhaps we could modify {{zh-see}} and {{zh-alt form}} to allow mentioning something along the lines that 𡃶 is an alternative form of 惜, but 𡃶 is a more common form, and 錫 is the most common form, and maybe add a {{zh-common form}} which mentions the common forms in a format similar to {{zh-syn}} (so that it doesn't involve scrolling down to usage notes)?
On the second part, I think Jyutping should also be used when the etymology does not make sense, as in the case of gwik1 lik1 kaak1 laak1. Usually when creating/coining characters for related sounds used in the same word, they would share the same radicals/components, as in 徘徊, 蝴蝶, 葡萄, which is why I don't feel like using 虢(guó)礫/砾緙/缂(kè)嘞 or its variants since it tries to approximate the pronunciation but ignores other things almost completely. The characters are also too obscure and arbitrary for approximating sounds, since if these characters are only chosen to represent their sounds, they should be common enough for the general public to know its pronunciation: I might have chosen the less accurate 棘力卡啦 instead. (also compare with the straightforward 直不甩) Wpi31 (talk) 07:38, 15 July 2022 (UTC)
@Wpi31: For the first issue, I think the same issues would kind of come up with other kinds of variants, like simplified Chinese or variant traditional forms (common ones include 裏 and 説). Alternative forms are usually put in {{zh-forms}} if they apply to all definitions, but I would usually use {{zh-alt-inline}} for variants that are particular to certain senses.
For the second issue, I think the obscurity/arbitrariness of characters is kind of subjective. Unattested forms would be ruled out by WT:ATTEST. (Since Cantonese is a LDL, we would only usually require one attestation from a reliable source.) — justin(r)leung{ (t...) | c=› } 08:09, 15 July 2022 (UTC)
Romance Etymology headers
The etymology of Romagnol bo says the word is from Latin bōs (“cow”), but actually there are different stages from Latin onward, and phonetic changes as well. bo < ʙŏᴠᴇ < bōs (the intermediate form doesn't undergo metaphony because of the final -e). Are all these changes and phonetic rules allowed on Wiktionary, and are the former required to be in ꜱᴍᴀʟʟ ᴄᴀᴘꜱ characters? BandiniRaffaele2 (talk) 19:10, 15 July 2022 (UTC)
It probably went through Vulgar Latin *boem < Latin bōvem; compare Friulian bo. I have not seen small caps used in etymologies on the English Wiktionary. I don’t think the phonetic rules involved in the development need to be spelled out, unless something non-obvious is going on. We even give the etymology of Middle Persian gwl as “From Old Persian *vr̥da-” without any further explanation of the sound laws involved – in this case a quite dramatic but entirely regular change. --Lambiam 20:23, 15 July 2022 (UTC)
Of course, Romagnol bo descends from Latin bovem and not its nominative form bōs, it's not like bōs > bove, magically. So should the etymology be 'From Latin bovem'? That is surely more accurate, but it would require users to click twice to get to the Latin lemma entry, presumably they are already familiar with navigating themselves in non-lemma entries.
Should the etymology then be 'From Latin bovem, accusative singular form of bōs'. This makes it so users can immediately get to the lemma entry without misleading users of the terms descending from the nominative form, but it seems unnecessarily long.
Should it be 'From Latin bovem', but then the link actually takes you to the bōs page? The page bovem exists, and I'd feel betrayed and deceived clicking on a link that takes me to a page other than the one it promised.
We did have another thread about it here. Perhaps it is time to formally discuss the 'link form A, but display form B' strategy on the Beer Parlour. A bit odd, granted, but it is my favourite of the options that remain after eliminating the apparently too radical idea of changing the Latin lemmas.
I have, incidentally, used 'hybrids' in cases where a Proto-Italic entry is involved. For instance, the etymology for Spanish hender:
Welcome to the 7th issue of Movement Strategy and Governance News! The newsletter distributes relevant news and events about the implementation of Wikimedia's Movement Strategy recommendations, other relevant topics regarding Movement governance, as well as different projects and activities supported by the Movement Strategy and Governance (MSG) team of the Wikimedia Foundation.
The MSG Newsletter is delivered quarterly, while the more frequent Movement Strategy Weekly will be delivered weekly. Please remember to subscribe here if you would like to receive future issues of this newsletter.
Movement sustainability: Wikimedia Foundation's annual sustainability report has been published. (continue reading)
Improving user experience: recent improvements on the desktop interface for Wikimedia projects. (continue reading)
Safety and inclusion: updates on the revision process of the Universal Code of Conduct Enforcement Guidelines. (continue reading)
Equity in decisionmaking: reports from Hubs pilots conversations, recent progress from the Movement Charter Drafting Committee, and a new white paper for futures of participation in the Wikimedia movement. (continue reading)
Stakeholders coordination: launch of a helpdesk for Affiliates and volunteer communities working on content partnership. (continue reading)
Leadership development: updates on leadership projects by Wikimedia movement organizers in Brazil and Cape Verde. (continue reading)
Internal knowledge management: launch of a new portal for technical documentation and community resources. (continue reading)
Innovate in free knowledge: high-quality audiovisual resources for scientific experiments and a new toolkit to record oral transcripts. (continue reading)
Evaluate, iterate, and adapt: results from the Equity Landscape project pilot (continue reading)
Other news and updates: a new forum to discuss Movement Strategy implementation, upcoming Wikimedia Foundation Board of Trustees election, a new podcast to discuss Movement Strategy, and change of personnel for the Foundation's Movement Strategy and Governance team. (continue reading)
What do people think about creating a "deleter" user group that has the ability to delete pages? Currently you have to be an admin to be able to delete pages, but I'd like to be able to delete pages from my non-admin account as I try to do all my editing from there. I find that the main reason I have to switch to my admin account is to be able to delete pages. Other admin-only actions (e.g. changing page protection) occur more rarely. I also imagine it may be useful to be able to grant the deletion right to certain non-admins, similarly to how we have a "template editor" user group that gives the ability to edit many otherwise-protected templates and modules. Benwing2 (talk) 00:04, 18 July 2022 (UTC)
I think this has come up before. My main question is whether this comes with the ability to undelete and possibly also see deleted revisions. In your case it doesn't matter, since you're basically an admin using a non-admin account. For others, though, that might be more problematic. @Svartava got into trouble because they were using their extended mover privileges to delete things (I'm not sure exactly how), but that was a unique combination of factors that may not apply to anyone else who might be granted the role. Chuck Entz (talk) 00:27, 18 July 2022 (UTC)
@Chuck Entz Deleting, undeleting and seeing deleted revisions are all different user rights. There are actually a whole host of user rights related to deletion, at least the following:
Delete tags from the database (deletechangetags)
Delete and undelete specific log entries (deletelogentry)
Delete and undelete specific revisions of pages (deleterevision)
Delete pages (delete)
Mass delete pages (nuke)
Search deleted pages (browsearchive)
Undelete a page (undelete)
View deleted history entries, without their associated text (deletedhistory)
View deleted text and changes between deleted revisions (deletedtext)
So potentially we could create a "deleter" user group without the ability to undelete or view deleted entries. (I would be fine with this as I don't have occasion to undelete pages very often, and I'm not sure if I've ever found the need to view deleted revisions.) Benwing2 (talk) 01:39, 18 July 2022 (UTC)
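To make the idea concrete, here is a minimal sketch in Python (not actual MediaWiki configuration) that models groups as sets of the right names listed above. The `GROUPS` table and the `can` helper are hypothetical illustrations, and the `sysop` entry only models the deletion-related rights from the list, not the real full set.

```python
# Deletion-related rights, using the identifiers quoted in the list above.
DELETION_RIGHTS = {
    "deletechangetags",
    "deletelogentry",
    "deleterevision",
    "delete",
    "nuke",
    "browsearchive",
    "undelete",
    "deletedhistory",
    "deletedtext",
}

# Hypothetical groups; only the deletion-related part of each is modelled.
GROUPS = {
    "sysop": set(DELETION_RIGHTS),
    # Minimal "deleter": may delete pages, but may not undelete
    # or view deleted content.
    "deleter": {"delete"},
}


def can(user_groups, right):
    """Return True if any of the user's groups grants the given right."""
    return any(right in GROUPS.get(group, set()) for group in user_groups)
```

Under these assumptions, `can(["deleter"], "delete")` is true while `can(["deleter"], "undelete")` is false, which is the split being proposed. Widening the group later would just mean adding more right names to its set.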
Thank you @Benwing2 for pointing out these rights. (Imetsia might want to comment.) I strongly support adding a deleter/eliminator/closer group with deleting and undeleting rights, with or without revision (un)hiding rights (no strong opinion on that). I think the most sensible set is delete, undelete, deletedtext, deletedhistory, mergehistory and suppressredirect from Special:ListGroupRights. See also d:Q10862160 for the group's existence or proposal elsewhere: it exists on fawiki, jawiki, hiwiki (no longer), ptwiki, ruwiki, urwiki, viwiki and viwikibooks, and on the remaining wikis it has been proposed. —Svārtava (talk) • 12:52, 18 July 2022 (UTC)
@Chuck Entz, 98.170.164.88: It was about a year and a half ago that I first got extended mover rights, when I was still something of a newbie with only 6 months of editing. I didn't really know the rules or how things work (nor should I have been granted the right out of process in the first place), and I impulsively moved a bunch of pages tagged for speedy deletion to uncreated Hindi/Sanskrit titles and changed their content accordingly, which is not what the move feature is for, and which also messed up the page history. —Svārtava (talk) • 12:52, 18 July 2022 (UTC)
The bar is, and should be, quite low to be an admin. It is also not difficult to request a page be deleted if one is not an admin. This seems like adding complications without solving any existing problem. If someone wants to be able to delete stuff, they can request to be an admin. - TheDaveRoss 14:43, 18 July 2022 (UTC)
I very much favor this idea — I created the last vote about it after all! There are many benefits that come with it, as I've expounded on previously. The main policy obstacles that others put forward the last time are two:
What should the nomination process be?
It should ideally be robust enough to prevent an unqualified person from becoming a deleter. But it shouldn't be so restrictive that it becomes barely any different from an admin-scale vote. I favor a process by which one admin nominates, and then two other admins have to approve the nomination. Similar to what we do at WT:Whitelist, but a bit more rigorous.
What permissions does the role include, paying particular attention to undeletion, viewing hidden revisions, and unhiding revisions?
I would favor all the permissions. According to my vision, the role would be basically an admin role but without the blocking power. There are several users for whom such a role would be appropriate. Imetsia (talk) 18:26, 18 July 2022 (UTC)
@TheDaveRoss I wonder why you think the bar should be low to be an admin; admins can really fuck up the site if they want, so I think adminship shouldn't be given out willy-nilly. Also I definitely believe in separating admin and non-admin accounts similarly to how you wouldn't normally do all your work in Unix as root; it's too easy to mess something up accidentally. Benwing2 (talk) 02:00, 19 July 2022 (UTC)
@Benwing2: How can admins mess the site up? None of the actions an admin can take is irrevocable, and most are easily reverted. The most damaging stuff which can be done (aside from by people with database access) can be done by people not even logged in: they can still run a bot, and if they hop around IPs it could be very challenging to find all of the edits they make, especially if it was something like just deleting the Russian section from every page where it exists because they don't like Putin. The fact is the more admins we have the less damage admins can do, because there are more people around to see and stop the problem, and then fix the problem. I agree that we shouldn't give the rights out to anyone who asks, but anyone who sticks around for a while and makes good-faith, quality edits can block/delete in my book. It should also be easier to take the tools away if someone abuses them, or acts like a jerk. It also makes it less "special" to be an admin if the majority of regular contributors are admins, and it shouldn't be special to be an admin. Everyone who contributes should have an equal voice in the project; if it feels like there is a cabal who have all of the "power", that doesn't foster a democratic community. - TheDaveRoss 12:38, 19 July 2022 (UTC)
@TheDaveRoss: I think you are too sanguine about giving out powerful tools. (Let's give everyone a gun so that there are always guns around to stop any potential mass shooter. Right?!) You can protect pages against non-admins, you can block accounts if necessary, etc. but none of these things work against admins. It is true that someone who was truly malicious could theoretically e.g. use a bot and attack the site using lots of accounts or IPs, but people of that nature are thankfully rare; most of the worst damage comes from people who are misguided and stubborn, and think they are doing the right thing when in fact they aren't. I'd call this sort of breakage semi-vandalism, and it's much harder in practice to correct than true vandalism, because true vandalism is obvious and gets identified and reverted immediately, while semi-vandalism may hang around for a while before being noticed, and in the meantime others may make legitimate changes on top of it. Benwing2 (talk) 05:57, 20 July 2022 (UTC)
One personal tip on vandalism is to sneak some in amidst hundreds of good entries (another one is to keep on pestering the decent users to create a toxic environment so they don't want to return). When I went admin rogue (five times), I didn't last long on my vandalism spree, and it couldn't have taken more than a few minutes to undo anything. To be fair though, if I had used a bot and been more meticulous it could have been much more devastating. But a tiny drop in the ocean is the worst we can expect, nobody's going to be able to destroy the website! Dunderdool (talk) 10:11, 20 July 2022 (UTC)
@Dunderdool, TheDaveRoss Just FYI, Wonderfool makes it out like the only issues have been with occasional times he "went rogue" and started vandalizing, but in fact I've spent a great deal of time and effort cleaning up Wonderfool's mistakes. User:Equinox knows what I'm talking about. (Same, I should add, with User:SemperBlotto and his and his bot's mistakes, along with certain other editors who continue to make sloppy edits even after repeatedly being warned to be more careful. I may need to start blocking these editors to get their attention ...) This is a good illustration of "semi-vandalism"; the mistakes are mixed in with good commits so it's very hard to find and fix them. Wonderfool makes huge batches of "drive-by" edits and has never shown much inclination to fix his own mistakes. Benwing2 (talk) 22:43, 24 July 2022 (UTC)
Ben getting the idea that there might be a difference between occasional mistakes and deliberate sabotage. Look when you are eight fucking colons indented, just give it up. He rejoices in wasting your time. Equinox ◑ 01:26, 25 July 2022 (UTC)
What does that have to do with a deleter role? The damage he does as an admin is obvious and quickly reverted, the damage he does as a drunk editor is the perfidious stuff that sticks around for years until someone spots it. Neither situation would change if a deleter role was created. - TheDaveRoss 12:29, 25 July 2022 (UTC)
@Benwing2: Your gun analogy is flawed, since the consequences of misuse of a gun are extremely high, whereas if someone misuses admin buttons we can take them away and undo everything they did with them without much effort. The tools are just not that powerful, and they are not abused particularly frequently. It is probably too hard to take the tools away when someone isn't playing nice, but I would advocate for making it easier to take them away rather than keeping them out of the hands of people who might use them productively. - TheDaveRoss 12:16, 20 July 2022 (UTC)
@Surjection, because you were one of the lead opponents the last time, how would you respond to the two questions I posed? Is there any form of the deleter role that you would vote in favor of? Imetsia (talk) 17:55, 24 July 2022 (UTC)
No. The points I raised earlier still stand as well today. If we can trust someone to (un)delete pages, we should be able to trust them with the other sysop tools too. — SURJECTION / T / C / L / 18:15, 24 July 2022 (UTC)
My thought is: it is not necessary. People who need this can already do it. Unlike some people who may be more diplomatic (or afraid), I don't mind saying that we don't need to argue back and forth with Wonderfool (as above), nor to provide tools to people like "Svartava". Equinox ◑ 01:15, 25 July 2022 (UTC)
Why don't we use the head template in Italian entries to display the fully accented form? An older discussion from 2017 settled with hyphenation or IPA, since they already give out the stress, but that's hardly a reason. Non-mandatory accents (that is, not word-finally) are sometimes spelled to avoid homography (eg. sùbito, princìpi, pèsca), it's not just something to put in the pronunciation section. Moreover, we already do it for verbs (only on the lemma entry though), let's be consistent about this. Catonif (talk) 20:41, 18 July 2022 (UTC)
@Catonif, Imetsia, Sartma I was the one who added the accent marks to Italian verbs. It's true that we routinely add accent marks to some languages to indicate stress (various Slavic languages, for example), and macrons to some languages to indicate length (Old English, Latin, Ancient Greek, ...). The trickiness in Italian is (a) that accents are in fact standardly written when word-final, but not otherwise, and adding them everywhere might be confusing; (b) some monosyllabic words are standardly written with an accent, some without it. In languages with added diacritics, we normally provide for automatically stripping them out when generating links to terms, which would be logical here except for (b), which makes it hard to implement. Interested in hearing what native Italian speakers think. Benwing2 (talk) 01:57, 19 July 2022 (UTC)
The written accent can be either:
mandatory: word-finally and certain monosyllables, it's already reflected on the standard spelling and cannot be removed (eg. è, così, dovrà)
non-mandatory: rarely written, though often used to avoid homography or mispronunciation (eg. còrso, nartèce). This is also the kind of accent you've added in verbs. The general rule is that you could even write uòvo, in case you don't want anyone to for some reason pronounce it ùovo or uóvo (eg. Treccani uses uòvo).
useless: in plain disyllabic words where the accented vowel is à, ì or ù (eg. càpo, tìpo, pùpo). Treccani doesn't use it, but it's what happens with verbs (see stìmo at stimare), and I'd keep it for consistency.
forbidden: in monosyllables where the accent isn't mandatory, it is forbidden (eg. *é (and), *à (to), *hà (it has), *sò (I know), *ài (to the)). The head template should display these terms without accents.
Side note: the archaic conjugated forms of avere (ò, ài, à, ànno) are alternative forms with a mandatory accent.
It should be noted that, while marking stress in words such as càpo/tìpo/pùpo may be useless to anyone at all familiar with Italian, it is not so to anyone unfamiliar with it. Nicodene (talk) 08:50, 19 July 2022 (UTC)
@Benwing2, @Catonif, @Nicodene: As long as we have the IPA transcription(s), I would only write the accent when it's mandatory. Wiktionary shouldn't be prescriptive, so we should never indicate accents on E's and O's (there are native speakers that only have 5 vowels, and even native speakers who have 7 don't all agree on whether accented E's and O's are open or closed in a given word). E.g. I say "pésca" for both (prescribed) "pésca" (fishing) and "pèsca" (peach), "dòccia" for prescribed "dóccia", etc. All Italian dictionaries are prescriptive when it comes to pronunciation, which is in clear contradiction with WT:NPOV, so I would avoid copying what they are doing. Sartma (talk) 00:15, 20 July 2022 (UTC)
That's not how it works. The dictionaries are in fact descriptive when it comes to the standard/traditional pronunciation. By your reasoning, we should remove practically all of our pronunciations, including for languages other than Italian, so long as there exist regional varieties to which they do not apply. Not gonna happen. I suggest simply adding additional regional pronunciations if you want representation. Nicodene (talk) 05:00, 20 July 2022 (UTC)
@Nicodene: No, not for Italian. There's no Italian native speaker who consistently pronounces words as shown in Italian dictionaries (unless they took pronunciation classes). Italian dictionaries are not descriptive when it comes to pronunciation, they're prescriptive. That's also the main reason why we don't write accents on every word that's not accented on the penultimate, like they do in Spanish. There are no native speakers of "standard Italian" in Italy, that's a constructed language we learn in school, and while most might manage to get to a good level of written standard Italian, on the pronunciation front no Italian native speaker speaks like the dictionary. Sartma (talk) 06:31, 20 July 2022 (UTC)
I do not have it in me to take this seriously. Want to tear down the standard Italian pronunciations on Wiktionary? Have at it. Sounds like a permaban speedrun.
And no, mid-vowel disagreements are not in the slightest the reason why you 'don't write accents on every word that's not accented on the penultimate'. The assumption that the lack of a consistent written distinction must need some special explanation like that is baseless, considering that it is the norm, not an exception, for a writing system not to indicate all of a language's phonemic information. Humans do not need comprehensive training wheels to read their own language, and they rarely care enough to consistently accommodate foreign learners. For Italian in particular, the lack of a Latin precedent for directly marking vowel quality is also relevant. Nicodene (talk) 07:25, 20 July 2022 (UTC)
@Nicodene: "Want to tear down the standard Italian pronunciations on Wiktionary? Have at it."
That's really not what I said. Talk about straw man! Please read my comments again.
I believe pronunciation issues belong in the pronunciation section. I didn't say we should "tear down standard Italian pronunciation on Wiktionary". I'm just saying that I don't think we should add non-standard accents on Italian headwords, and explained that pronunciations in Italian dictionaries are indeed prescriptive, which is a quite undisputed fact. That's all. It's such an obvious thing to anyone who has studied Italian linguistics (it's one of the first things you learn at University) that to be honest I'm not sure why you're reacting so badly to it...
About not writing accents in all words that aren't accented on the penultimate: you talk about the lack of a Latin precedent as being relevant, but the same is true for Spanish, and they indicate all accents anyway, so not sure about that line of reasoning. I don't think anybody was ever afraid of accents and other diacritic signs in Romance Europe, you just need to have a look at older documents, they have any kind of them, and pretty much all Romance languages use quite a wide variety of them. It's Germanic Europe that has a diacritic phobia. XD Sartma (talk) 07:43, 20 July 2022 (UTC)
You were making an argument about pronunciation, and now you're surprised that someone found it relevant to pronunciation sections.
We're not talking about some ludicrous 'diacritic phobia'. We are talking about the fact that most of the world's writing systems do not convey all phonemic information. Want Romance examples? Catalan, Portuguese, and Campidanese Sardinian spelling does not consistently distinguish close-mid vowels from open-mids either. Nicodene (talk) 08:05, 20 July 2022 (UTC)
@Nicodene: Please, re-read my comments. The whole thread is about adding diacritics to Italian headwords, not about the pronunciation section, and my comments are on that topic. You can't discuss Italian accent diacritic without talking about pronunciation, that's the very reason why one would use them.
I'm fluent in English and Japanese, probably two of the languages with the weirdest writing/spelling systems in the world, so I'm very well aware that most writing systems do not convey all phonemic information. The point is that if you add diacritics to Italian entries you'll have to decide for one or the other pronunciation when it comes to E's and O's, and if you follow an Italian dictionary, you'd be giving a prescriptive pronunciation. Sartma (talk) 08:23, 20 July 2022 (UTC)
@Sartma We already have 'decided for one or the other pronunciation when it comes to E's and O's', your attempt to pretend that the pronunciation section is somehow irrelevant notwithstanding. I do not understand your personal crusade against Italian dictionaries anyway. As has already been explained, nobody is stopping you from adding regional pronunciations as well, which has already been done on some entries.
'Prescriptive' this, 'prescriptive' that. We get it: Standard Italian pronunciation is *evil*, and you have arrived to save the human race from it. We all appreciate your heroic efforts. Just try not to let it bother you that it is actually descriptive to describe a standard pronunciation without banning others. Or that practically any pronunciation given on Wiktionary is 'prescriptive' by your standards, since there exist dialects or accents that differ from it. Nicodene (talk) 09:11, 20 July 2022 (UTC)
@Nicodene: With all your straw men you make discussing things with you really unpleasant. I would appreciate it if you would stop putting words in my mouth that I never said, thank you. If you need to quote me, just copy paste what I wrote between quotation marks and spare me your personal and absurd interpretations.
Prescriptive is prescriptive. It's neither good nor bad. It was my understanding that we don't do that on Wiktionary, that's why I'm pointing it out. Especially since that's what I was told when talking about analogous issues for Japanese (see below).
Again: I never said I want to change the pronunciation section. Please stop using that straw man. We're not talking about the pronunciation section in this thread, why do you keep bringing that up?
Oh please, spare me the tactical dishonesty. You know full well the logical conclusion to what you're arguing.
Let's suppose that I accepted your argument that indicating a standard Italian pronunciation of mid-vowels in the part-of-speech section on Wiktionary is simply unacceptable because 'muh prescriptivism'. Your mission now is to convince me that indicating the same mid-vowel in the above pronunciation section, both in clear IPA characters and in the exact same è/é system found in the part-of-speech section, is somehow acceptable. Good luck.
Nothing that you have said about Standard Italian is not incredibly obvious. That is how formal/educated pronunciation works in any number of other languages, including English. Shall we start stripping our standard English pronunciations, etc. from Wiktionary? They're prescriptivist after all, surely. Nicodene (talk) 10:37, 20 July 2022 (UTC)
@Nicodene: I never said it's unacceptable. Here is what I said:
"As long as we have the IPA transcription(s), I would only write the accent when it's mandatory."
At this point I assume you're not misreading/misinterpreting/misrepresenting what I write on purpose, but you just can't help yourself, so I'm gonna stop here. I said what I had to say anyway.
You do have terrible reading comprehension skills, though, so I strongly recommend you do some training in that area. I would start with refraining from imposing your thought on a text and read what's written, not what you want to read in it. Sartma (talk) 13:26, 20 July 2022 (UTC)
While you may not have literally said that it was 'unacceptable', you said in the very next sentence that 'we should never indicate accents on E's and O's'. Is there such an important difference between saying that we should never do something on Wiktionary and saying that something is unacceptable on Wiktionary, or are you (like last time) using a quibble over semantics as a deflection tactic?
You have not made the slightest attempt to explain how, per your own reasoning (see the above quote, plus for instance 'Italian dictionaries are prescriptive when it comes to pronunciation, which is in clear contradiction with WT:NPOV'), we would not have to remove the stressed mid-vowel distinctions indicated in Italian dictionaries from our pronunciation sections as well. (Keep in mind that the same pronunciation sections use both IPA and é/ó/ò/è, i.e. 'accents on E's and O's', which you specifically said 'we should never indicate'.) Either accept that your reasoning is wrong or take a consistent stance against showing Standard Italian mid-vowels. Nicodene (talk) 14:59, 20 July 2022 (UTC)
@Nicodene: Well done for actually going back and reading what I wrote. Next step is quoting me in full, not just using clippings to make them say something they're not saying. My full sentence was: "Wiktionary shouldn't be prescriptive, so we should never indicate accents on E's and O's". If Wiktionary is non-prescriptive, then we shouldn't indicate accents on headwords. I didn't say something absolute like adding accents on headwords "is simply unacceptable", as you put it. Yes, there is a difference there. Take all the time you need to see it.
If you tell me that Wiktionary is fine with a degree of prescriptivism, then ok, let's add the accents.
I also clearly told you where I'm coming from, and that "standard accent is prescriptive so not good on Wiktionary" was the reasoning I was given when my proposal to add accents to Japanese romanisations was rejected. So I'm basically just repeating what I was told by other Wiktionary editors. That's why I also suggested it would be worth making this question clear, so that everybody is on the same page. Sartma (talk) 15:56, 20 July 2022 (UTC)
Your having said 'Wiktionary shouldn't be prescriptive' before that quote does absolutely nothing to change the following 'so we should never indicate accents on E's and O's'. Had you said 'if Wiktionary shouldn't be prescriptive', then this new interpretation that you are trying to claim ('if Wiktionary is non prescriptive...') would apply, and you would in fact be giving a conditional rather than the absolute that you actually gave (and as you have ludicrously tried to deny just now with 'I didn't say something absolute'). As it is now, you are simply lying. Nicodene (talk) 16:16, 20 July 2022 (UTC)
@Nicodene: Anyway, if it really isn't a problem to give the dictionary "standard" pronunciation of a language here on Wiktionary, we should add this info somewhere, so everybody here is on the same page. When I proposed to revise the pronunciation section on Japanese entries and also pitched the idea to add the accents on JP romanizations, I was told by Japanese editors that "the pitch accent marked in most dictionaries is specific to the "standard" variety of Japanese, which is inappropriately prescriptive for our mission here at Wiktionary" and that "doing so would imply that Tokyo accent is the only pitch accent for all of Japanese, which is incorrect"... Sartma (talk) 07:26, 20 July 2022 (UTC)
(I'll answer here instead of the other branch of the discussion because of space reasons and practicality) @Sartma is actually making a reasonable point. What you are describing is sadly the case in the North. In central Italy's not-too-rural areas though by now, in spite of the Treccani quote, Dante's Conlang is spoken as a mother tongue by nearly everyone. There is a notable number of words that have different mid-vowel open-ness in different areas, but not to the extent that should prohibit us from displaying the standard variants.
All of this, though, seems to me completely beside the point. Accents are occasionally written, and when they are, it's based on the standard, not on the pronunciation of the writer. For example, if in a text there's written pèsca ("peach") with the spelled diacritic, whether or not you pronounce it the same way as pésca ("fishing"), the accented spelling pèsca still means "peach", and peach only. It becomes a sort of definition-reasoned-difference in spelling, kinda like a and ha. While pronouncing /ˈpeska/ for peach is a legitimate regional pronunciation, writing pésca for peach is just wrong. When Standard Italian decides the Standard Spelling for the Standard Words, it also decides which Standard Accents should be on there (even though they are Standardly Invisible).
This is the difference between this and the Japanese question. Italian written accents are standardized, Japanese written pitch accent is not (well, since it's not writable).
Let the nonstandard pronunciations be in the pronunciation section, and the standardly accented word be in the head template.
<sidenote> To @Nicodene I recommend not bringing Latin in the discussion, since it's not new to the standard pronunciation the concept of changing vowel quality to its liking, making the rest of the peninsula endure the pain of having to treat things like spòso, sógno or édera as standard and having their pronunciations, actually reflecting Latin, be labeled as "regional" (just wait till I get to join the Crusca and change this). </sidenote>
@Benwing2 I forgot to address your second point (the first one being addressed in the big older message in which I forgot to ping you). As for links I would only keep the mandatory accents in them. Even if we wanted all accents though, it would still be pretty logical to strip them away (just remove every non-word-final accent). Still, this could probably cause some problems, I think for example in loanwords from French or Spanish. In general putting written accents on links would be pretty overwhelming and might deceive non-Italian users into thinking that Italian is actually spelled like that. Catonif (talk) 15:32, 20 July 2022 (UTC)
@Catonif: Regional Italian is definitely spoken as a mother tongue by most Italians these days, but there are no native speakers of Standard Italian (with the exact pronunciation given in dictionaries). One might reach a good level of spoken Standard Italian in school, but since no-one really cares about vowel openness or any other phonetic issues, one would retain their Regional Italian vowels (and consonants, and phonotactic doubling, etc.). English (both General American and RP) pronunciation is based on actual spoken varieties, and so is Japanese. There are/were actual communities of native speakers who speak/spoke like that. That's not the case for Italian. Standard Italian was never natively spoken by anybody (as Treccani clearly says), and still isn't.
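For what it's worth, the "remove every non-word-final accent" rule can be sketched in a few lines of Python. This is only an illustration, not how the link-generating modules actually work: the function name `strip_nonfinal_accents` is made up, it assumes a single word as input, and it only touches grave and acute accents.

```python
import unicodedata

GRAVE = "\u0300"  # combining grave accent
ACUTE = "\u0301"  # combining acute accent


def strip_nonfinal_accents(word):
    """Drop grave/acute accents from every letter except the last one.

    Assumes `word` is a single word; multiword terms are not handled.
    """
    # Compose first so an accented final vowel is a single character.
    word = unicodedata.normalize("NFC", word)
    if not word:
        return word
    head, last = word[:-1], word[-1]
    # Decompose the non-final part and remove the two accent marks.
    decomposed = unicodedata.normalize("NFD", head)
    stripped = "".join(ch for ch in decomposed if ch not in (GRAVE, ACUTE))
    return unicodedata.normalize("NFC", stripped) + last
```

With this sketch sùbito and pèsca come out as subito and pesca, while così and è are left alone because their accent sits on the last letter. It also shows the loanword problem mentioned above: a spelling like élite would be flattened to elite, so any real implementation would need an exception mechanism.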
RP is based on 'actual spoken varieties' and Standard Italian is not? That's a creative claim. Please explain what fundamental difference you are imagining here. Nicodene (talk) 16:23, 20 July 2022 (UTC)
@Nicodene: Standard Italian is a constructed language, there were no native speakers of it back then when they constructed it and there are none now. Sartma (talk) 16:58, 20 July 2022 (UTC)
@Thadh: I'm not sure all written languages have the same history as Italian... I know that German has a similar one, even worse possibly, since it's a sort of Frankenstein language that came out of a mixture of two different chancery languages (but I don't remember the details). Was Russian the same? Were there no native speakers of Russian before they made it the standard language? Sartma (talk) 21:06, 20 July 2022 (UTC)
@Sartma: you're mixing up "language" and "standard language". No, nobody speaks exactly as dictated by the standard language, that is impossible unless the criteria are incredibly lax. I doubt there was even one person in the history of English who consistently pronounced all vowels and consonants in all positions and contexts exactly as dictated by RP or GA, and used prescribed grammar at all times. Thadh (talk) 21:28, 20 July 2022 (UTC)
@Thadh: I'm talking about pronunciation on a phoneMic level, not a phoneTic one. é/è and ó/ò in Standard Italian are phoneMic. RP pronunciation is based on the spoken language of the South of England, and native speakers there will pronounce all the phonemes of a word like they are given in a British English dictionary. They'll say /bʌs/ and not /bʊs/ (northern pronunciation) for bus. Sure, there will be phoneTic differences depending on the area, but not phoneMic ones. Sartma (talk) 21:52, 20 July 2022 (UTC)
First of all, the claim that no Italian speaker's phonological system conforms to Standard Italian's is one that desperately needs a source.
Secondly, phonological analysis isn't something a standard language can dictate: It can prescribe a phonetic pronunciation of words, which can then be analysed phonologically by linguists. And if you do that, the ò-o merger can simply be analysed as a phonetic merger of the two within Standard Italian's phonological system. I'm sure other speakers merge vowels differently. Thadh (talk) 05:26, 21 July 2022 (UTC)
“in Italia, chi più chi meno, tutti parlano con qualche venatura regionale. Non c’è nessuno in Italia che possieda l’italiano standard come lingua materna.” (in Italy, more or less everyone speaks with some regional influence. There’s no one in Italy who has Standard Italian as a native language)
I'm not sure what you mean by "phonological system" not conforming to Standard Italian. What I'm saying is that no Italian native pronounces the language consistently as it appears in dictionaries. The pronunciation given in a dictionary is no-one's native pronunciation in Italy. People in Florence might get most of the vowels right, but they pronounce consonants differently, so not even they speak like the dictionary prescribes.
Italian was created and fixed as a language way before any native speaker of Italian even existed. At the time people were all speaking different vernaculars. Italy itself didn't even exist. I don't know how many other countries had a national language (spoken by no-one) before even having a nation. When Italy became a country in 1861 "Italians" didn't exist. One of the most famous sentences we learn in school is that once they unified the Italian territory the main task was to "create Italians": "Abbiamo fatto l'Italia, ora dobbiamo fare gli Italiani" (We created Italy, now we need to create Italians). All this despite having Italian as a codified language (on paper, not spoken by anyone) since around 1600. Italians started "learning" Italian only with the diffusion of radio and TV (they did study it in school, but the major forces for the actual spreading were radio and TV. Most people before my parents' generation (talking about those coming out of the Second World War and before) would barely complete the first 5 years of school). We literally had "Italian classes" on radio and TV. My grandmother died 4 years ago and she never spoke one word of Italian (just to give you an idea of how recent this whole thing is).
It's also strange to think of people pronouncing O's and E's differently as a question of "merger". It's not like there was a spoken Italian to begin with and people in different regions later started to merge sounds. The differences in pronunciation reflect that of the vernacular varieties spoken in a place way "before" Italian made its entrance in the "spoken" scene. Since there were no native speakers of Italian to mimic (as there aren't to this day), everybody just pronounced it the way that came naturally to them, reflecting their own regional language.
You guys should pay me. All this information is available everywhere and is nothing special, it's no secret to anybody. I'm literally giving you lessons on Italian historical linguistics for free! XD Sartma (talk) 07:34, 21 July 2022 (UTC)
Simply put, the problem is that you are not familiar enough with other languages to realize that Italian is in no way a special case here. What you have written can be said—adjusting for particularities of time, place, and so on—about RP/'the Queen's English', about Standard French, about Standard German, and any number of other cases.
Incidentally, nobody owes you anything, let alone money, for spamming trivial information that is available just about anywhere (even the not-particularly-good Wikipedia page on Italian) mixed in with your own questionable notions, such as the claim that 'people in Florence might get most of the vowels right, but they pronounce consonants differently' in a discussion where you yourself insisted 'I'm talking about pronunciation on a phoneMic level, not a phoneTic one'. For the record, neither la gorgia nor the intervocalic deaffrication are 'phoneMic'. Nicodene (talk) 09:45, 21 July 2022 (UTC)
Rather than wasting fifteen (more) minutes of my life on this, why don't you tell me directly- does the random Youtuber that you have cited actually corroborate the notion of phonemic consonant differences between Florentine and the standard? With a timestamp preferably. Nicodene (talk) 12:14, 21 July 2022 (UTC)
@Nicodene: You crazy? I'm happy to waste your time. You're wasting mine, so it's just fair, don't you think? Watch the whole thing and learn something. Sartma (talk) 13:08, 21 July 2022 (UTC)
@Nicodene, Sartma (1) No, the video displays the phonetic gorgia as the only difference from Standard Italian. (2) What does it matter if the gorgia is phonemic or phonetic? Sartma himself said that Florentine people "might get the vowels right". Weren't we talking about accents? Are accents on consonants? (3) Do you guys even remember why this was relevant in the first place? (4) The way you ended up speaking to each other is the most infantile I've seen on the wikt and makes me value the efficiency of the whole project way less. Snap back to your reasonable selves, please! Excuse my straightforwardness, but this has gotten kind of ridiculous, I'm sure you agree. Catonif (talk) 19:27, 21 July 2022 (UTC)
@Catonif: The gorgia is not phonemic, and I never said it was. As you said, it's irrelevant. I was talking about é/è and ó/ò.
The only thing I've been saying is that the pronunciation you find in an Italian dictionary is prescriptive. There is no native speaker in Italy that natively pronounces words the way prescribed by the dictionary (unless they study diction). Standard Italian pronunciation is not a native Italian pronunciation, so can't be "described", can only be "prescribed". That's all. Not sure why this fact is so shocking.
Considering that Wiktionary shouldn't be prescriptive, I wouldn't add non-mandatory accents to Italian headwords. Dictionary pronunciation is given in the Phonetic section both in the IPA transcription and in the hyphenation anyway, so it would just be redundant. Sartma (talk) 22:26, 21 July 2022 (UTC)
> The gorgia is not phonemic, and I never said it was. As you said, it's irrelevant. I was talking about é/è and ó/ò.
Oh really? Then what, exactly, did you mean when you said 'people in Florence pronounce consonants differently' in a discussion about 'pronunciation on a phoneMic level' where you yourself just admitted that the gorgia is irrelevant?
> Standard Italian pronunciation is not a native Italian pronunciation, so can't be "described", can only be "prescribed". That's all. Not sure why this fact is so shocking.
What's shocking is that you still have not grasped that exactly the same can be said about RP (etc.)
> Considering that Wiktionary shouldn't be prescriptive, I wouldn't add non-mandatory accents to Italian headwords.
Meanwhile you are fine with having the same mid-vowels indicated two different ways directly above the part-of-speech section? How is that not 'prescriptive', if we accept your argument? I seem to recall you saying 'Wiktionary shouldn't be prescriptive, so we should never indicate accents on E's and O's'. Nicodene (talk) 06:24, 22 July 2022 (UTC)
1) Thank you. There is no small irony in posting that video, along with the remark 'learn something', in response to a comment that already mentioned the gorgia toscana and that it is not phonemic. 2–3) It is relevant because he himself chose to make it relevant by declaring 'I'm talking about pronunciation on a phoneMic level, not a phoneTic one' as part of an attempt to deny the parallels between RP and Standard Italian. Nicodene (talk) 20:43, 21 July 2022 (UTC)
@Thadh: And from what I read on Wikipedia, both General American and RP English pronunciations are based on actually spoken varieties. Standard Italian pronunciation is not, and that's why no native Italian speaker ever consistently pronounces words the way they appear in an Italian dictionary. That's also why Italian dictionaries' pronunciation is prescriptive, and not descriptive. Sartma (talk) 21:12, 20 July 2022 (UTC)
You know full well that Standard Italian is and was based on traditional Tuscan, very much 'actually spoken'. Nicodene (talk) 20:16, 20 July 2022 (UTC)
@Nicodene: You have a very mystified knowledge of the history of the Italian language. Standard Italian was never based on the spoken language of Florence, but on a literary, written version of it that was a mixture of Florentine, Latin and other vernaculars. Here, copy-paste this into Google Translate and read the part in bold out loud 10 times:
"Il modello di lingua che viene codificato è «il toscano urbano della classe colta di Firenze» (Galli de’ Paratesi 1984: 60), cioè una varietà scritta, un registro letterario con influenze latineggianti e di altri volgari, e non il fiorentino parlato. Non tutte le caratteristiche del fiorentino sono quindi accolte dallo standard. L’italiano standard in effetti non ha mai, fin dalla codificazione cinquecentesca, coinciso esattamente con il fiorentino, e sin dal Seicento ha accolto, data anche la mancanza fra il tardo Cinquecento e l’avanzato Ottocento di un centro preminente che imponesse una norma, innovazioni di varia provenienza. La distanza dal fiorentino si è ancora accresciuta dopo l’Unità d’Italia, nonostante i tentativi puristici di imporre il fiorentino moderno come modello, in particolare per la pronuncia." (from Treccani: Italiano Standard) Sartma (talk) 20:59, 20 July 2022 (UTC)
That is adorable. You know what it sounds like? A middle-schooler smugly proclaiming 'No, you're wrong, humans didn't evolve from monkeys, they evolved from apes' and triumphantly pointing at a Wikipedia page, never pausing to think about whether apes themselves evolved from monkeys and so humans through them as well. Likewise, you have apparently failed to realize that 'il toscano urbano' was itself based on 'il fiorentino parlato' (senza parlare delle altre varietà toscane), whatever additional 'influenze latineggianti e di altri volgari' it had. Did it never occur to you that literary English also has Latinizing influences and no shortage of influence from, say, French and Norse?
I have to wonder how you imagined this happening, if you ever stopped to think about it all. Did you picture Dante Alighieri sitting down one afternoon and saying 'You know what? Imma just invent an entire phonology, an entire set of noun and verb inflexions, and an entire grammar and syntax out of thin air just for the lolz' and presto, we have Standard Italian in its embryonic form? If so, it is nothing short of a miracle how well the result aligned in all of these aspects to Old Florentine. Nicodene (talk) 06:29, 21 July 2022 (UTC)
@Nicodene: Sorry, I'm tired of keeping up with your straw men. You're more interested in belittling me than in having a discussion about the actual topic of this thread, and while I can excuse that behaviour the first couple of times (we can all slip, we're all human), I'm not willing to engage with you further if that's just your default. So, ciao ciao! Sartma (talk) 08:55, 21 July 2022 (UTC)
Feel free to demonstrate otherwise by actually arguing the point instead of devolving into vaguely flippant one-liners or other irrelevancies. Nicodene (talk) 14:01, 21 July 2022 (UTC)
Oh no, that was the least relevant part of the entire message lol. I should have ordered my points better. Catonif (talk) 16:56, 20 July 2022 (UTC)
@Catonif: "This is the difference between this and the Japanese question. Italian written accents are standardized, Japanese written pitch accent is not (well, since it's not writable)."
It's not really "written accents" that are standardised, but the very pronunciation of Standard Italian. Accents just follow, as a way to show that.
Japanese pitch accent is also standardised and it can easily be indicated on the romanisation, but apparently even just giving the standard accent on headwords is "prescriptive, and not in accord with our mission on Wiktionary"...
As a matter of fact, though, it's pretty much the same case. The fact that accents can be more easily added to an actual Italian word than to a Japanese one doesn't change the fact that what's standardised in either case is the pronunciation, not the way to indicate it. Sartma (talk) 23:09, 20 July 2022 (UTC)
@Catonif: I agree with you that, should there be a need to disambiguate two words that Standard Italian prescribes as having two different vowels, one would use the accent given in a dictionary, but only because that's what dictionaries prescribe. Left on their own without a dictionary to check, most Italians would be lost. Nothing in this process is "descriptive". Mind that the vast majority of Italian homophones are written exactly the same and are impossible to disambiguate; we're talking about a very tiny minority of cases here. There's no way to disambiguate "la vecchia porta la sbarra" ("the old lady carries the bar" or "the old door is blocking her"?), for example, and no-one really cares either.
Either way, we do indicate the prescribed accent in the Pronunciation section under hyphenation so I don't see the need to add it on the headword too. It would just be redundant. Sartma (talk) 21:31, 20 July 2022 (UTC)
Let's recap.
Reasons to agree:
Because we already do this
For consistency, by analogy with Latin, Slavic languages, Tagalog, etc.
For consistency with the verb lemmas, to which @Benwing2 already added this feature that I very much enjoy
Because the pronunciation section is not the right place for this
Not all pages have (or should have) a pronunciation section. While I don't dislike the {{it-pr}} module per se, it does take a notable amount of space, and seems kind of out of place in small entries like nottolone. This is especially true for non-lemma forms: how could I know that sagginassero is stressed on the antepenult? That is, without having to look for it in the gigantic conjugation table on the lemma entry.
IPA might be misleading in regional-only words, which are actually never pronounced as the Standard key suggests.
Because literally every Italian dictionary does this, let Wiktionary not be the weird kid
Problems arising:
Monosyllables' accent is sometimes forbidden
Solution: never write an accent on a monosyllable unless it is mandatory (see a longer explanation in my second post in this thread).
Non-Florentine accents
Solution 1: write both the standard and regional accent form (see porgere), but it doesn't look very good and I don't like the idea of having two links one right after the other that take me to the same exact page.
Solution 2: write the standard only. While the regional pronunciations are used and thus should be described, they are never used in the written language and are only pronounced. If something is never written and only pronounced, its place is the pronunciation section (this is the one thing that flipped the thread over).
Note: I don't know if here "headword" means "also in links". What I mean by headword is what is displayed by the head template. I'd leave the links alone.
Being the newbie here, I ask whether people here usually vote before going into the practical details of the implementation. It would seem sensible, given the disagreement, but I don't think there are enough people interested in this for the votes to have any actual statistical meaning. I hope this doesn't turn into "Does even Italian exist?" again. Catonif (talk) 17:19, 22 July 2022 (UTC)
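For anyone who wants to see the display rule from the recap spelled out, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names, the idea of passing in the dictionary-accented form, and the syllable count as an argument are all hypothetical, and none of this is how the head templates or {{it-pr}} actually work. It encodes just two points from the recap: an accent that is already part of the spelling is mandatory and stays, and a monosyllable never gets an accent that the spelling does not require.

```python
import unicodedata

# Combining grave and acute accents, the two marks used by Italian dictionaries.
ACCENT_MARKS = {"\u0300", "\u0301"}


def has_accent(word: str) -> bool:
    """Return True if any letter of the word carries a grave or acute accent."""
    return any(ch in ACCENT_MARKS for ch in unicodedata.normalize("NFD", word))


def strip_accents(word: str) -> str:
    """Remove grave and acute accents, leaving the bare letters."""
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(ch for ch in decomposed if ch not in ACCENT_MARKS)
    return unicodedata.normalize("NFC", bare)


def headword_display(spelling: str, dictionary_form: str, syllables: int) -> str:
    """Hypothetical rule for the form shown on the headword line.

    spelling        -- the page title, i.e. the normal written form
    dictionary_form -- the same word with the dictionary's stress accent
    syllables       -- number of syllables (assumed to be supplied by the editor)
    """
    if strip_accents(spelling) != strip_accents(dictionary_form):
        raise ValueError("dictionary form does not match the spelling")
    if has_accent(spelling):
        # The accent is mandatory in the orthography: show the spelling as is.
        return spelling
    if syllables == 1:
        # Monosyllable without a mandatory accent: never add one.
        return spelling
    # Otherwise show the dictionary's (non-mandatory) stress accent.
    return dictionary_form


if __name__ == "__main__":
    print(headword_display("nottolone", "nottolóne", 4))
    print(headword_display("città", "città", 2))
    print(headword_display("re", "ré", 1))
```

Regional (non-Florentine) accents are deliberately left out of the sketch, since the recap proposes handling them in the pronunciation section rather than on the headword.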
@Catonif: Thanks for the recap! I think that all the reasons you give are quite weak, though:
Consistency with some other languages (Latin, some Slavic languages, Tagalog): As far as I know, inter-linguistic consistency on Wiktionary is not a thing. Each language has its own practices and standards, so the only thing that matters is intra-linguistic consistency. Moreover, Italian has the issue of compulsory accents vs. non-compulsory accents (very rare), plus words that never take an accent (the monosyllables you mentioned), which complicates things a lot. The other languages we mentioned don't have a similar issue, and the diacritics used to indicate accents and vowel length are clearly different from any diacritic used in the orthography of those languages (Latin, Ancient Greek, Slavic languages), so no confusion ever arises.
Consistency with the verb forms: these forms are a recent addition by @Benwing2 and I don't know whether the topic was discussed and voted upon before implementation. I take it there was no vote, otherwise we wouldn't be here talking about this? If that's the case, I don't see why we should seek consistency in that direction. What we should do is remove all accents from Italian verbal headwords (they can stay in all other form and inflection tables). That would solve the issue and no further problems would arise.
Pronunciation section not the right place: You don't give any reason why the Pronunciation section shouldn't be the right place, but your considerations about space are not an issue on Wiktionary, since Wiktionary is not paper.
IPA: What we could do about {{it-pr}} is mark the standard IPA transcription and hyphenation as "Standard Italian", and add any other regional pronunciations after that under "Regional Italian" (with {{q}} or something to indicate the region/area/city). Sartma (talk) 11:01, 23 July 2022 (UTC)
@Sartma: (1) 'Inter-language consistency isn't a thing'. Generally speaking, wouldn't it be better if it was? (1b) Compulsory and non-compulsory accents don't seem to cause a mess: if it's word-final, it's compulsory. The solution for monosyllables is also simple and already mentioned: do not write accents on monosyllables unless it's mandatory by spelling. (3) 'Wiktionary is not paper' means 'Write as much information as you want, because hardware is cheap'; this doesn't have much to do with my point, since writing {{it-pr}} requires only 9 bytes. What I'm talking about is the use of space on the actual displayed page, which should be concise. (4) Take for example the Romanesco word "fusaglie" : is the result of a yeist merger of /ʎʎ/ and /j/ (and a merger of /s/ and /z/, though this is of marginal importance). In this case we can't know which one of the two to use in the standard, since the word doesn't exist in the standard. This is a very simple example, and for this, we could probably also use a phonetic transcription, but for words used in a bigger region, there isn't a single phonetic realization. The thing is, I might not actually care/know about the phonetics and phonemics of every dialect, and only want to write the stress, but first I would have to solve this problem. In the end I settled on an ad hoc transcription only because I wanted to display the accent, which I'd deem not ideal. (note) {{a}} is used instead of {{q}} in pronunciations.
Tbh, I'm really about to just give up on this proposal, I thought it would have been easily accepted but it's kinda draining me and it's not like I'm particularly keen of this anyway. Catonif (talk) 12:12, 24 July 2022 (UTC)
I agree with you that inter-language consistency would be better. It would definitely be my preference, too. But here on Wiktionary it's not a thing. I was told that quite clearly on more than one occasion by more than one editor/admin, so it's just a matter of accepting that that's the way it is... (1b) The final/non-final rule is not immediately obvious, and I can see how a reader could be confused by this difference. That question apart, the fact remains that before @Benwing2 added the accent on verb headwords, there was no question of consistency in the direction of "adding accents". As I said above, all problems and possible confusions are solved if we delete the accents from the verbs, especially considering that (as far as I understand) there was no vote on the subject.
"Wiktionary is not paper" has repercussions on style too. As far as I know there is no rule mandating "concision", so I'm afraid that is just your preference...
Zanichelli has fusaglia, it notes that it's Roman/Romanesco and that it's used mainly in the plural, and has no issues giving "/fuˈsaʎʎa/ (or /fuˈzaʎʎa/)" as its standard pronunciation. That word does exist as Regional Italian, so of course it has a Standard Italian pronunciation (all Regional Italian words have a Standard Italian pronunciation, since we're still talking about Italian, not of a "dialect").
I understand your frustration, it's part of working on a shared project like Wiktionary and I think we all went through it. You have to compromise on everything, often with people who know less than you but still have a voice, and often more power than you. Results are often sub-optimal, but there's nothing you can do about that. It's something you learn to deal with/accept at some point.
That said, I could agree with indicating the accented vowel or syllable in a different way. What about underlining them? Like this: fusa̱glia, saggina̱ssero, nottolo̱ne, etc.? Sartma (talk) 18:31, 24 July 2022 (UTC)
I have an even better idea: using the grave/acute accents employed by Italian dictionaries, against which you have not provided a single logically consistent argument and which you are so far alone in opposing. Nicodene (talk) 19:28, 24 July 2022 (UTC)
"Una delle conseguenze più vistose del fatto che l’italiano è una lingua scarsamente parlata si ha sul piano fonetico. Manca uno standard parlato, perché la pronuncia dell’italiano che si è formata a partire dall’unificazione ha subito una forte interferenza delle fonologie locali: più che essere una vera e propria fonologia, dunque, è stata per molto tempo soltanto una mera pronuncia, ovvero una resa orale dello scritto (Mioni 1993; Schmid 1999; Bertinetto & Loporcaro 2005). Non esiste un corrispondente italiano della received pronunciation inglese: la pronuncia delle persone colte «in ogni regione è più simile alla pronuncia delle persone incolte della stessa regione che alla pronuncia delle persone colte di altre regioni» (Lepschy & Lepschy 1981: 13)." Sartma (talk) 23:14, 24 July 2022 (UTC)
@Sartma I see we've circled back to 'Standard Italian pronunciation don't real, so we shouldn't have it here'. It seems that the point about logical consistency has been lost on you for the second or third time now, since the consequence of what you're trying to argue would be removing all Standard Italian pronunciations from Wiktionary- if, that is, you are actually right.
Of course even a cursory search for 'Standard Italian phonology' brings up reliable sources speaking of it as a real thing, shockingly enough. Luciano Canepari has published an entire book describing it in fine detail (Italian Pronunciation & Accents). In the same vein, see also Martin Krämer's The Phonology of Italian. For incidental comments in other sources, see for instance the Oxford Guide to the Romance Languages (e.g. 'in standard Italian, the phonemic contrast /s/ ~ /z/ is neutralized in preconsonantal contexts' or 'educated standard Italian speakers nowadays may have either no paragogic vowel or just a weak vocalic post-consonantal release akin to an "excrescent" vowel'), Lori Repetti's Phonological Theory and the Dialects of Italy (e.g. 'vowel length in Standard Italian'), Clivio & Danesi's The Sounds, Forms, and Uses of Italian (e.g. 'phonetic feature missing from the pronunciation of standard Italian'), or Michele Loporcaro's Facts, theory and dogmas in historical linguistics (e.g. 'for Standard Italian, the experimental-phonetic literature shows that there is a gradual decrease in stressed vowel length as the number of syllables to the right of the stressed one increases'). We can go on all day with more examples. Nicodene (talk) 08:24, 25 July 2022 (UTC)
@Sartma, Nicodene, Catonif Apologies for not reading all the discussion here, there was way too much arguing past each other. IMO Nicodene you are more guilty than others in this discussion of taking a negative tone, although Sartma I don't really understand your assertions that no one speaks Italian as a native language. (This may well have been true in 1861 when Italy was unified, but generations of schooling have resulted in large numbers of people who do speak essentially according to the standard, especially those whose native languages are/were quite dissimilar to the standard. Something very similar has happened with Standard German.) Also, User:Thadh is very correct in noting that the situation is not so different from other languages. The solution used in Wiktionary across all languages that have a standard form is to describe that standard; Sartma, I don't understand at all your desire to remove information on standard Italian from Wiktionary just because some people don't speak that way. I would be strongly opposed to using an ad-hoc method of indicating stress, such as underlining the stressed syllable, given that there is a nearly universally used standard way of doing so using acute and grave accents, which also conveniently indicates the quality of the stressed vowel. In other words I largely agree with Catonif in this regard (and would not be opposed to marking the stress and vowel quality on the headword, including of non-lemma forms, as we already do with verbs). Benwing2 (talk) 06:26, 25 July 2022 (UTC)
@Benwing2: When I say that no-one "speaks" Standard Italian as a native language, I don't mean that there are no Italian native speakers and I'm not talking about grammar or vocabulary. I never said that. There are native speakers of Italian, of course, but the pronunciation of the Italian they speak is not the standard as indicated in the dictionary. Nowhere in Italy is there a place where the standard pronunciation is native. I mean that Standard Italian doesn't have a native phonology of its own, that the pronunciation given in dictionaries is no-one's native pronunciation. It never has been at any moment in history and it's definitely not nowadays either. And that should definitely shock no Italian (if they were born there and lived there, this should just come as an obvious piece of information to them, unless they're in denial). I understand that we all tend to think things are elsewhere just like they are at home, so Italian must be just like any other standardised national language in the world ("you're just too ignorant to know that", Nicodene would add), but it really isn't the case, and it would be nice to be able to have a bit more of a nuanced discussion here without going crazy and becoming offensive. I've literally just posted an article about Italian pronunciation by Treccani written in 2011 where scholars are quoted saying l’italiano è una lingua scarsamente parlata ("Italian is a hardly spoken language"), which is why manca uno standard parlato ("there is no spoken standard"). Italian pronunciation più che essere una vera e propria fonologia (...) è stata per molto tempo soltanto una mera pronuncia, ovvero una resa orale dello scritto ("more than being a true and real phonology, for a long time it's just been a sheer pronunciation (here they mean "reading"), i.e. an oral rendition of the written form").
They also clearly state that non esiste un corrispondente italiano della Received Pronunciation inglese ("there is no Italian equivalent to English Received Pronunciation").
In the same article, they say that Un modello di standard parlato sarebbe il cosiddetto fiorentino emendato (...). Il modello è stato poco praticato nell’insegnamento scolastico, poiché per certi versi artificioso (...), e in pratica non è appreso da nessun parlante come lingua materna: vale piuttosto come punto di riferimento normativo. ("A model for a spoken standard would be the so-called "emended Florentine". This model has been rarely used in scholastic education, since in a sense it is artificial, and as a matter of fact no speaker acquires it as a native language: it works rather as a normative point of reference.").
Of course if you ask Nicodene, they must all be ignorant scholars that know nothing about languages and Treccani is just a shitty Encyclopedia for ignorant people, but my question to you would be: how many linguistic sources of those other languages we're talking about would be fine with writing in 2011 that their respective Standard languages "have no true and real phonology" but they're just a "reading" of the written standard form? And that their spoken standard language has no native speakers?
I understand that this discussion is probably hitting on a lot of people's cognitive bias, but that doesn't make it less true. Italian dictionaries are prescriptive in the stricter sense of the term: they are normative, they tell you what you should say, despite no Italian speaker natively possesses that phonology system. Sartma (talk) 08:59, 25 July 2022 (UTC)
'There is one accent which is not connected with a specific locality, though it is rather more southern than northern in its overall character. This is RP, which is short for Received Pronunciation... Despite the advantages of RP as a regionally neutral accent, it has not displaced the local accents of England... RP is clearly a minority accent... Until World War II RP was also the exclusive accent of the BBC and it is still especially prominent there' (Gramley & Patzold 1992, A Survey of Modern English, p. 309).
For comparison, some 3% or so of Italians speak the standard (from here, p. 11). For the other parallels, such as relative prominence in the media, I assume you can connect the dots yourself. With all due respect to Treccani, the situation is comparable to that of RP in the UK. This is not the first time, incidentally, that I've disagreed with that source on matters outside of Italian lexicography, considering that it incorrectly derives Spanish botilla (< bota + -illa < Latin buttis + -ella) from Latin butticula (whence Spanish botija, with /x/ not /ʎ/).
As for this:
> despite no Italian speaker natively possesses that phonology system
Tuscans do, considering that we're 'talking about pronunciation on a phoneMic level, not a phoneTic one' (and, in any case, we only provide phonemic transcriptions for Italian). Granted, it is not always a perfect match on a lexical basis (e.g. Tuscans often have /ɔ/ rather than /wɔ/ in certain words), but neither is any regional variety of English to RP, that I'm aware of. Nicodene (talk) 11:15, 25 July 2022 (UTC)
@Nicodene: I hope you'll excuse me for ignoring the random paper you found. It was written in 1983 for a random conference in Finland, with a bibliography where the oldest reference work is dated 1982. Hardly something worth quoting in any serious discussion. Bash Treccani as much as you wish, but at least make an effort to find something better? (besides, even in that paper they say that Italian dictionaries are prescriptive, lol).
Also, "Tuscans" don't all speak with the vowels indicated in dictionaries. Tuscany is a big region, there's a lot of Tuscans having different vowels, so that's just not true. Sartma (talk) 11:45, 25 July 2022 (UTC)
@Sartma The figures come from Luciano Canepari's Italiano standard e pronunce regionali, which is not available online, hence I linked a paper that extensively quotes from it. Perhaps if you'd cared to look carefully, you might have realized that yourself.
'besides, even in that paper they say that Italian dictionaries are prescriptive, lol'
Lol omg lmao xD indeed. Why don't you look up RP + 'prescriptive' and see what you'll come up with? There's a world of possibilities out there, waiting only for your click. You'll also have to explain to me, as I already asked you to, how it is 'prescriptive' for Wiktionary to describe a standard pronunciation if regional ones are permitted as well.
I like how you've provided "Tuscans" with scare quotes. Anyway, if you personally believe that Tuscans in general differ, on the phonemic level and in some actually significant number of words (say- 10%?), from Standard Italian with regards to /ˈe/ versus /ˈɛ/, or the corresponding back vowels, I'd like to actually see a source for that totally legit, definitely-not-made-up-on-the-spot notion. Bear in mind that this has nothing to do with the overall 'phonology system', as you put it. Nicodene (talk) 11:58, 25 July 2022 (UTC)
@Nicodene: the difference between RP and Italian dictionaries' standard pronunciation is that RP came to be as a linguistic reality first, and only afterwards was it described (and sometimes prescribed). Italian standard pronunciation never existed and still doesn't exist outside of a dictionary, so it can only be prescribed, it can't be described. Florentine and Tuscan people speak with a Florentine or Tuscan accent(s), they don't speak Standard Italian. Sartma (talk) 13:36, 25 July 2022 (UTC)
Standard Italian pronunciation very much does exist; cf. the several reliable sources that I quoted earlier and the most recent one as well. I'm sorry, but this is simply not something you can deny without descending to the level of flat-eartherism. At this point I have to wonder if you are in fact trolling, considering that you directly admitted 'I'm happy to waste your time'.
'Florentine and Tuscan people speak with a Florentine or Tuscan accent(s), they don't speak Standard Italian' is such a complete non-sequitur that I am flabbergasted. We were clearly talking about differences between Tuscan and the standard. If you do not actually have an answer, just say so.
Edit: just for good measure, another set of sources:
Bertinetto & Loporcaro, The Sound Pattern of Standard Italian as Compared with the Varieties Spoken in Florence, Milan and Rome. No need to cite a specific page here, as the entirety of it is à propos. Per the abstract, 'this paper is a condensed presentation of the phonetics and phonology of Standard Italian...'
Claudia Vigario et al., Phonetics and Phonology: Interactions and Interrelations, p. 141: 'The elision of voiced vowels also serves to increase the complexity of consonant clusters, creating heavy or superheavy syllables not licensed in standard Italian phonology...'
Gabriel et al., Manual of Romance Phonetics and Phonology, p. 276: '...in southern varieties of Italian, a word like roba 'things, stuff' can be produced with a geminate : , and the duration of geminates is typically phonetically longer than in Standard Italian'.
Gibson & Gil, Romance Phonetics and Phonology, p. 92: 'Some phonetic information for contemporary standard Italian is provided in Ladefoged and Maddieson (1996: 218–21). According to these authors, Italian intervocalic rhotics are realized as trills... The apical alveolar trill is also reported by Bertinetto and Loporcaro (2005: 133) to be the "unmarked allophone" of the rhotic phoneme in standard Italian'.
Also on the previous page: 'Contemporary standard Italian inherited its phonemic inventory from the Tuscan (Florentine) dialect and is phonetically very close to the variety spoken in Florence and other Tuscan areas...'
Ledgeway & Maiden, The Cambridge Handbook of Romance Linguistics, p. 240: 'As for standard Italian, stressed vowels in open syllables undergo lengthening, while all unstressed vowels are short...'
Maiden et al., The Dialects of Italy, p. 67: '...the dialectal data provided evidence of the CG constituent, while such evidence was lacking from the phonology of Standard Italian'. Nicodene (talk) 15:41, 25 July 2022 (UTC)
@Benwing2: About using accents as Italian dictionaries do, I'm not against that on principle. But by doing so we are being prescriptive. If that's OK here on Wiktionary, then let's go for it. In the past I was told that this is not accepted here on Wiktionary. @Eirikr, for instance, speaking about indicating pitch accent in Japanese and about Japanese standard pronunciation, told me:
"I would like to draw your attention to the second point at Wiktionary:What_Wiktionary_is_not -- our aim with Wiktionary is to describe how words are used, not to prescribe how words should be used. Specifying a pitch accent pattern in all of our romanizations is overly specific and incorrectly prescribes pronunciation for that term."
"the pitch accent marked in most dictionaries is specific to the "standard" variety of Japanese, which is inappropriately prescriptive for our mission here at Wiktionary."
If this is not true, I'm just asking for it to be written somewhere in clear terms. There is obviously some confusion on the topic of what's to be considered "prescriptive" on Wiktionary when it comes to pronunciations. Sartma (talk) 11:18, 25 July 2022 (UTC)
@Benwing2, Catonif: If we decide to add accents to all headwords, what will we do in cases like lettera? Are we going with the standard dictionary/traditional "lèttera" or with the current Modern Italian "léttera"? Sartma (talk) 14:38, 25 July 2022 (UTC)
Since you're all cornering Sartma with 'Standard Italian is spoken though', I must say, he does have a point. There's no speaker outside of Florence and its environs who pronounces mid vowels in all words like the standard. On the other hand, here in Rome, the discrepancies in mid vowels are, though present, quite few, and looking at a dictionary which contains 'prescriptive' accents still feels descriptive of how I actually speak natively. Now, we of course don't want to make a Florence-Rome-centric dictionary, which is why we include regional variants in the pronunciation section, but I would rather the 'no one speaks this way' claim be softened. Speakers without the mid-vowel merger (meaning outside of the North-West or extreme South) still pronounce most words like the dictionary.
@Sartma: I believe there is a subtle difference between the Japanese thing and this one. I tried to explain earlier, when I said that 'written accents are standardized', but I must admit that it wasn't very clear. I'm going to give an example to compensate for my lack of clarity in explanations.
How to pronounce doccia:
(most common) /ˈdot.tʃa/
(Veneto?) /ˈdɔt.tʃa/
How to pronounce 箸:
(Tokyo) /háɕì/
(Kansai) /hàɕí/
And this is all nice and well since it's all descriptive, the problem arises when you spell these words.
How to spell doccia:
doccia
dóccia (used literally 0% of the time, but could still be used)
dòccia (no one would ever write this)
I don't feel like I'm prescribing the pronunciation, more like, describing how the word could be spelled. Now how is this different from the Japanese thing?
How to spell 箸:
箸
Moreover, I must admit I'm not very knowledgeable of Japanese, but isn't it kinda rare for Japanese dictionaries to include pitch accent? On the other hand, I've never seen an Italian dictionary without accents.
@Catonif: I would add that there are variations even within Tuscany itself. For doccia, DiPI says you can find dóccia/dòccia in Tuscany, Marche, Latium and Rome, and gives néve/nève in Tuscany. I'm from Venice, we say dòccia, but in Treviso they say dóccia. Same region, just neighbouring cities.
You say that you've never seen an Italian dictionary without accents, but the digital versions of Devoto Oli, Hoepli and De Mauro do not in fact indicate the accent on the headword. They only show it in the following hyphenation.
I also wouldn't be so sure that "no one would ever write dòccia". I mean, no one would probably ever put the accent on that actual word, but what I mean is that I'm not sure that if someone wanted to write the accent on a word without checking what the dictionary prescribes, they would use the right accent.
I did understand the difference you see between Italian and Japanese, I think your explanation was quite clear from the beginning. I just don't agree that it's a matter of writing system. To me it's a matter of pronunciation. The fact that the Italian writing system includes accents (since some are compulsory) and gives you an easier way to show whether a tonic vowel is open or closed is accidental, and doesn't mean that the grave/acute accent comes as a predefined "set" to that word, even when it's not "showing". So the word "lettera" doesn't inherently have a "hidden" /é/ that appears if you want to write it. "lettera" is the Standard Italian spelling for that word, and can take a grave or an acute accent depending on how you pronounce it. If you pronounce it with an è, you can write lèttera, otherwise you'll write léttera. I don't know if I'm being clear, but it's not a question of orthography, it's a question of pronunciation, so we should treat it as such, i.e. in the pronunciation section.
Regarding Japanese dictionaries: good ones would tell you on what kana the accent falls. They generally use numbers, so you would find things like:
The key issue we had at that time was not about including pitch accent information in our entries -- we already do that in the Pronunciation sections. The problem point was Sartma's proposal that romanized Japanese in {{m|ja|...}} or {{ja-r|...}} should show accents. This is problematic in a few different ways.
Japanese prosody will cause words to evidence different pitch accent patterns depending on the prosody of the overall utterance. A word that is usually pronounced with three morae and a downstep after the second mora might wind up having no downstep at all in certain phonological contexts. Prosody is complex, and beyond the ken of most of our editors (myself included), at least when it comes to definitively editing entries here.
Pitch accent for the "main" or "standard" variety (as used in nationwide broadcasting; mostly synonymous with Tokyo Japanese) is indicated in our entries using {{ja-pron}}. Pitch accent for other varieties is also given in some entries, where users knowledgeable enough have added that information: the Kansai pitch accent is an important one, by measures of number of speakers and cultural importance. Examples include 朝(asa, “morning”) and 朝(ashita, “morrow; next morning”). Adding Tokyo pitch accent to all romanizations is thus superfluous to information already available, and a maintenance challenge.
The number of words even in "standard" Japanese that have multiple accepted pitch accent patterns is not small. Examples include 桜(sakura, “cherry, cherry tree”), 夜更かし(yofukashi, “staying up late”), 一日千秋(ichinichi senshū, ichijitsu senshū, “time has slowed to a crawl”, literally “one day, one thousand autumns”), 三方(sanbō, “a specific kind of offertory table used in Shintō”), and as an extreme instance, 耳当て(mimiate, “earmuff, ear flap”) with four accepted pitch patterns. There is no sane way of using accented romanization to spell out such words.
Historically, romanized Japanese has not used any diacritic except the macron to indicate long (two-mora) vowel values. Adding accents would represent a radical departure from romanization practices.
As an aside, the accent grave notation like à is usually used in IPA to indicate a low tone, whereas the accent acute like á is used to indicate a high tone. The word 橋(hashi, “bridge”, pitch accent pattern 2) and the word 箸(hashi, “chopstick”, pitch accent pattern 0) are identical in this romanization scheme, both appearing as hàshí. The key difference is the downstep, which for 橋(hashi, “bridge”) occurs after the word and before any following particle so the particle has a lower pitch, and which is absent for 箸(hashi, “chopstick”), so any following particle continues the higher pitch from the preceding mora.
Using the accent grave as Sartma suggests above, to mark the mora after which the downstep occurs, is one potential way to overcome this. However, this also deviates from IPA and presents a likely source of confusion.
All that said, I know very little about Italian phonology, orthography, dialects, etc. I post this in the hopes that it might be useful in some way. ‑‑ Eiríkr Útlendi │Tala við mig 19:11, 25 July 2022 (UTC)
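As a rough illustration of the pattern numbers discussed above, here is a minimal Python sketch of the simplified textbook model of Tokyo pitch accent, in which pattern 0 means no downstep and pattern n means a downstep after the n-th mora. The function name `tokyo_pitch` is hypothetical, and the model is an assumption that deliberately ignores the utterance-level prosody effects mentioned above.

```python
def tokyo_pitch(mora_count: int, pattern: int) -> tuple[str, str]:
    """Return (pitch contour of the word, pitch of a following particle).

    Simplified textbook model (an assumption, not Wiktionary's own code):
    'H' is a high mora, 'L' is a low mora. Pattern 0 has no downstep;
    pattern n has a downstep right after the n-th mora.
    """
    if mora_count < 1 or not 0 <= pattern <= mora_count:
        raise ValueError("pattern must lie between 0 and the number of morae")
    if pattern == 0:
        # No downstep: low start, then high, and the particle stays high.
        return "L" + "H" * (mora_count - 1), "H"
    if pattern == 1:
        # Downstep after the first mora: high start, everything after is low.
        return "H" + "L" * (mora_count - 1), "L"
    # Downstep after mora n: low start, high up to mora n, low afterwards.
    word = "L" + "H" * (pattern - 1) + "L" * (mora_count - pattern)
    return word, "L"


if __name__ == "__main__":
    for pattern in (0, 1, 2):
        print(pattern, tokyo_pitch(2, pattern))
```

Under this model, `tokyo_pitch(2, 2)` and `tokyo_pitch(2, 0)` give the same low-high contour for the word itself and differ only in the pitch of a following particle, which is why a notation that marks only the pitch of each mora cannot tell the two apart.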
@Eirikr: Thank you for the recap and your personal opinions on showing the accent on romanizations. I used the wrong accent on my examples, they should have been 橋hashí and 箸háshi, with acute accent (as in my original proposal and as used for example in Samuel E. Martin's Japanese reference grammar and dictionary). You presented your opinions as if they were matter-of-fact, although none of your points is really a problem, but that's a discussion for another time and place.
@Benwing2, @Catonif: The important thing here is that one of the reasons to discard my suggestion of adding the accent on Japanese headwords' romanizations is that it is "superfluous to information already available" (i.e. in the pronunciation section); in other words: it's redundant. (Now, ignore the fact that @Eirikr never really gave me this reason back then; what I was told is that it was "inappropriately prescriptive" for Wiktionary; redundancy doesn't seem to be bothering any Japanese editor in the slightest, otherwise they couldn't stand any of their pronunciation sections, where the same information is repeated 4 times, or entries with ルビ(rubi) + romanization, etc., or any of the numerous redundant things present in pretty much every Japanese entry...).
Mutatis mutandis, this is exactly the same case as the one we're talking about for Italian. If "it is redundant/prescriptive to indicate the standard accent on headwords" is a valid reason for Japanese, that should be true for all languages. Conversely, if those are not valid reasons, then they shouldn't be for Japanese either. As I said before, it would be nice to have a clear answer to this question and to write it down somewhere, so we're all on the same page from now on. Sartma (talk) 08:27, 26 July 2022 (UTC)
Derogatory terms amendment to WT:ATTEST has passed
@Fytcha: my understanding as the drafter is that an editor has two options:
Wait for two weeks from the date the entry is created, and if there are insufficient qualifying quotations after that the entry can be speedily deleted by an administrator. (Preferably, the entry should be tagged with {{derogatory}} so that editors are alerted to the matter.)
Alternatively, the entry can be sent to RFD or RFV. If this is done, the qualifying quotations must be added within two weeks of the entry being nominated, otherwise it may again be speedily deleted.
I don't believe we have any rule that new amendments to the rules are non-retrospective. Thus, as it stands, any derogatory term created more than two weeks ago from today, or which has been listed at RFD or RFV more than two weeks ago from today which hasn't been satisfactorily resolved, can be deleted right away. This is without prejudice to their undeletion if the required quotations can be found. — Sgconlaw (talk) 18:30, 20 July 2022 (UTC)
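The timing rule as read above can be sketched in a few lines of Python. This is only an illustration of that reading, not policy text or any existing Wiktionary tool: the function name `may_speedy_delete` and its parameters are hypothetical, and "two weeks" is assumed to mean 14 calendar days.

```python
from datetime import date, timedelta
from typing import Optional

GRACE = timedelta(weeks=2)  # assumed to mean 14 calendar days


def may_speedy_delete(created: date, today: date, has_enough_quotations: bool,
                      nominated: Optional[date] = None) -> bool:
    """Return True if, on the reading above, an admin MAY speedily delete.

    The result is permissive only: the policy says "may", not "must".
    """
    if has_enough_quotations:
        return False
    # Option 1: two weeks counted from the creation of the entry.
    deadlines = [created + GRACE]
    # Option 2: two weeks counted from the RFD/RFV nomination, if any.
    if nominated is not None:
        deadlines.append(nominated + GRACE)
    # A nomination cannot precede creation, so the creation deadline is
    # always the earliest one; old uncited entries qualify immediately.
    return today > min(deadlines)


if __name__ == "__main__":
    print(may_speedy_delete(date(2015, 1, 1), date(2022, 7, 20), False))
```

Because the earliest deadline always wins, an uncited entry created years ago already qualifies on the day the amendment takes effect, whether or not it was ever nominated.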
@Fytcha: I believe so, given the wording of WT:DEROGATORY. However, the policy says such entries may, and not must, be speedily deleted. Thus, it would be for an administrator to use their judgment to decide whether speedy deletion is warranted. I would take the view that speedy deletion would probably not be warranted for a term that appears widely in other dictionaries, and which no one seriously disputes (such as Bible thumper; I’m not so sure about faggotry). On the other hand, for a term which is a dubious neologism or completely made up to vandalize Wiktionary, judgment may properly be exercised to speedily delete. — Sgconlaw (talk) 19:09, 20 July 2022 (UTC)
I've RfV'd all the n-word derivatives I could find that didn't seem sufficiently supported with citations or references. (Probably missed some.) I hope that's not too disruptive. At least after this we'll know that the ones we have can be attested. 98.170.164.88 22:14, 20 July 2022 (UTC)
To me it seems very questionable to (be able to) delete them without at least pulling them through the appropriate forum for 2 weeks and it will, from what I can see, inevitably lead to citable terms being deleted because they flew under the radar... — Fytcha〈 T | L | C 〉 01:20, 21 July 2022 (UTC)
In hindsight, it might have made more sense to start the two week time limit from the time of being tagged instead of the time of entry creation, so admins can't just delete any uncited derogatory entry from years past without giving anyone time to cite it. Oh well, it's been voted on. 98.170.164.88 01:31, 21 July 2022 (UTC)
I agree. It seems absurd that I could now technically delete Bible thumper if I wanted. Merely reducing the RFV time window on the other hand would have been more reasonable. I am not sure if everybody who voted in favor of the change was aware of the full implications; I at least was confused by the vote's text. — Fytcha〈 T | L | C 〉 01:37, 21 July 2022 (UTC)
@Fytcha: I’d say let’s try out the current policy for a month or two, and if after that it’s felt that it should be tweaked another round of consultation followed by a vote can be held. — Sgconlaw (talk) 06:36, 21 July 2022 (UTC)
Question about the scope of this rule: is this only intended to cover words for specific groups, like ethnic slurs, or does it apply to generic obscene insults as well, like shitbiscuit or jizztrumpet? (Both of those are cited, but they illustrate the kind of term I am referring to.) The wording "or with the use of a demeaning or obscene term" leads me to believe it covers such words, but I'd just like to check. 98.170.164.88 23:01, 20 July 2022 (UTC)
The policy defines a term to be derogatory if, among other things, it is apparently intended to denigrate “an unnamed person, group of persons, or geographical location on the basis of ancestry, ethnicity, gender or sex, religion, or sexual orientation, or with the use of a demeaning or obscene term”. I think this suggests there should be some targeting of a person, group of persons, or geographical location inherent in the term, which is absent in a mere term of abuse. But I can see how on a broad reading of the definition such terms might be covered. — Sgconlaw (talk) 06:42, 21 July 2022 (UTC)
That would seem to include all terms validly labeled offensive or derogatory. Any term of insult (eg, idiot, four-eyes, fatso, pale male) can be 'intended to denigrate "an unnamed person with the use of a demeaning or obscene term"'. DCDuring (talk) 12:24, 21 July 2022 (UTC)
@DCDuring: yes, that would be the broad reading I referred to. But I reiterate what I said above that it is for administrators to exercise discretion in determining when speedy deletion should be applied. A word like fatso appearing in many other dictionaries would probably not warrant speedy deletion, whereas a word like jizztrumpet which looks made up to vandalize Wiktionary might well be speedily deleted first, and possibly undeleted later if sufficient quotations are gathered on its citations page. — Sgconlaw (talk) 12:50, 21 July 2022 (UTC)
Obviously, there are many well-attested (dated) insult terms.
Possibly related: all "slurs" were removed from the official Scrabble word list about a year ago (totalling several hundred words!). Thank God SNIGGER is word, sez I. Equinox ◑ 14:10, 21 July 2022 (UTC)
The Affiliate voting process has concluded. Representatives from each Affiliate organization learned about the candidates by reading candidates’ statements, reviewing candidates’ answers to questions, and considering the candidates’ ratings provided by the Analysis Committee. The selected 2022 Board of Trustees candidates are:
You may see more information about the Results and Statistics of this Board election.
Please take a moment to appreciate the Affiliate Representatives and Analysis Committee members for taking part in this process and helping to grow the Board of Trustees in capacity and diversity. These hours of volunteer work connect us across understanding and perspective. Thank you for your participation.
Thank you to the community members who put themselves forward as candidates for the Board of Trustees. Considering joining the Board of Trustees is no small decision. The time and dedication candidates have shown to this point speaks to their commitment to this movement. Congratulations to those candidates who have been selected. A great amount of appreciation and gratitude for those candidates not selected. Please continue to share your leadership with Wikimedia.
Thank you to those who followed the Affiliate process for this Board election. You may review the results of the Affiliate selection process.
The next part of the Board election process is the community voting period. You may view the Board election timeline here. To prepare for the community voting period, there are several things community members can engage with in the following ways:
Read candidates’ statements and read the candidates’ answers to the questions posed by the Affiliate Representatives.
I recently returned to check out the topic of "Westrobothnian" on English Wikipedia. The topic has been questioned for about a decade now. Over a year ago, I posted a request on the talkpage for someone to provide a reference showing that "Westrobothnian" (Swedish: västerbottniska) is an established, recognized term. No one attempted to address this or even commented on it, so I recently redirected the article to Norrland dialects. Work is ongoing to expand this article to reflect published linguistic research.
So far, no sources have actually been presented that support the idea that it is a recognized language, either by authorities or linguists. There's actually not even recognition of västerbottniska as a separate dialect grouping. The only example I've been able to find is Dahl (2016) (pp. 15-16), where the term is used as a convenient grouping for the sub-categories of north and south västerbottniska and a transitional variant in Ångermanland. But Dahl makes no attempt at specifying general traits of such a grouping. Dahl's classification of dialects is otherwise identical to the classification that was established in Våra folkmål by Wessén (1967) and is the current standard for Swedish dialects.
As far as I've been able to ascertain, there are no sources showing that västerbottniska is much in use among speakers themselves. Its use seems to be limited mostly to a minority of hobbyists and activists who are attempting to establish a regional language identity. This seems similar to the campaign to do the same for Scanian (which often includes dialects spoken not just in Scania/Skåne but also neighboring Halland and Blekinge). There's no standardized orthography and entries here on Wiktionary seem like they are simply the preferences of individual contributors. There are no recognized language codes either. The proposed synonym of bondska (literally "peasantish") has been propagated, but this is downright misleading since it's a term that has historically been used to describe all forms of rural dialects in Sweden, not just the ones spoken in Västerbotten.
The previous discussions have in my view gotten stuck in dissecting minutiae of language samples, or the contentious issue of language recognition of Scandinavian dialects in Sweden in general. The matter of "Westrobothnian" has also been muddled by whataboutisms relating to Scanian, Elfdalian, Gutnish, etc. rather than looking at the merits of "Westrobothnian" on its own. Unless we are presented with sources that actually support widespread use of "Westrobothnian" outside of the Wikimedia projects and self-published websites, is it really relevant for Wiktionary to keep Category:Westrobothnian language and its associated entries?
I do completely understand why you are removing both the wikipedia article and the wiktionary entries, since the grouping has not been used to refer to both dialects in Västerbotten and Norrbotten in the literature, and since there is no published material indicating that they form a cohesive group, let alone a separate language. However, I do want to clear up a few things.
Firstly, bondska doesn’t mean ”peasantish”, it means ”farmerish”. The translation to ”peasantish” and the argumentation about how it’s derogatory is one of the things that frustrated me the most about the Wikipedia article. Secondly, even if the term bondska has historically been used to refer to all dialects in Sweden, it is by far most commonly used in Västerbotten and Norrbotten today. For example, the Swedish radio program ”Språket” used it a few years back: https://sverigesradio.se/avsnitt/1389742.
With regards to classification, I do believe that ”bondska” forms a separate linguistic group, but I don’t believe that saying that it’s a separate language or ”just” a dialect adds much to the conversation. Bondska dialects are quite difficult to understand for Swedish speakers, but since everything is a continuum I don’t think we need to define it as a language. There are however several phonological as well as grammatical developments shared by Västerbotten and Norrbotten, some of which were mentioned in the Wikipedia article, which do not exist further south in Ångermanland or further east in Österbotten. Hopefully I will one day write something about those developments and maybe then ”Westrobothnian” or ”Bondska” (I have no idea how to anglicize this) will actually be a real linguistic term. For now though, I understand your decision to remove it from Wiktionary. rubbedibubb (talk) 16:20, 22 July 2022 (UTC)
If this was actually a recognized dialect group, it should be easily supported by sources. But it's been over a decade since the concept was introduced to Wikipedia and so far no evidence has been presented. Everything is limited to isolated facts about various local dialects in Västerbotten and a single off-hand reference by Dahl (which doesn't actually describe any dialect group). It's all based on conclusions drawn by editors, not by published linguists. The only generally recognized dialect group among Swedish linguists is norrländska mål ("Norrland dialects") which is divided into a lot of local dialects, but no västerbottniska or bondska. I can certainly agree that an update and nuancing of the current dialect grouping would be in order, but that's for linguists to establish, not the Wikiprojects.
The public service radio program Språket focuses on Swedish in general and is aimed at a general audience. At best, it can summarize existing research, but no more than that. In this case, no publications are actually cited (even if they have a general linguistics expert on the show). At the beginning of this particular episode (1:15), bondska is described as a general term that covers all of Norrbotten, Västerbotten and even Ångermanland ("according to some listeners"). The actual samples used in the episode are explicitly from the Piteå dialect. Even in that case, they describe a situation where speakers are trying to establish a written standard that hasn't actually existed before (3:00).
I believe it's a great thing to try to keep local dialects alive, and I would encourage anyone to use their local dialect wherever possible, including social media, work, online forums, etc. But if you want to describe them on the Wikiprojects, it has to follow some sort of established, published standard. You can't simply make up your own dialect groups without violating basic principles of how the projects are supposed to generate content.
For anyone taking an interest in this, I want to underline that the issue regarding the 'Westrobothnian' entries does not primarily concern terminology or language status, but rather the entire basis for the classification.
Currently on Wiktionary, 'Westrobothnian' is de facto defined as being a distinct linguistic grouping and a separate language, comprising the traditional Scandinavian dialects spoken in the Swedish historical provinces of Västerbotten and Norrbotten (cf. the current entry for Westrobothnian). This is also how it was presented on English Wikipedia until the article was recently cleaned up. The problem is that this definition has no basis whatsoever. It has been invented and spread online by a handful of hobbyists, with the Wiki projects being targeted in particular.
There are no scientific publications that classify these linguistic varieties as a single grouping distinct from the varieties spoken further south. On the contrary, dialectological handbooks like Wessén (1970) and Pamp (1978) both say that there is a sharp linguistic boundary between Västerbotten and Norrbotten, and Edlund (1996) points out that there are no significant linguistic traits that are unique only to Västerbotten.
There are also no advocates for this purported grouping among speakers of the concerned linguistic varieties (with the possible exception of those pushing for the classification online). No locally published work on any traditional Scandinavian dialect of Västerbotten or Norrbotten makes any mention of their dialect being part of such a larger grouping, not to mention a language. It is also conspicuously difficult—perhaps impossible—to find any mention of a distinct 'Westrobothnian' language even on Google, that cannot be linked to the few hobbyists mentioned above. Armeix (talk) 21:31, 21 July 2022 (UTC)
What action do you propose to take? If simply renaming the language won't work, would you rather have us reclassify them as Swedish with a dialect label, or split the language into Västerbotten and Norrbotten, or just delete them entirely? The last one doesn't sound very appealing. Splitting into two dialects could work but it would require someone who is deeply familiar with regional Swedish lects to go through and reclassify them all.
By the way, it seems like a lot of the entries aren't easily verifiable. While some like aftavǽł have references that might provide some assistance in telling whether the term is from Västerbotten or Norrbotten, others like ennęrvęnnę and kwyʃʃ have no references and no independent hits on Google, so it's unclear where they even came from. 98.170.164.88 20:47, 25 July 2022 (UTC)
If the entries aren't attested outside Wiktionary, then we're back to the issue raised at Category talk:Westrobothnian lemmas (that, based on available evidence, including the argumentation above that the lect itself doesn't exist as such, the entries shouldn't exist / don't meet CFI). If some are attested, then as the IP says, we need to decide whether we can assign them to more linguistically-recognized lects or to Swedish, although the latter might require many to be deleted as the criteria for inclusion would go up from "one mention" to "three uses". - -sche (discuss) 00:22, 26 July 2022 (UTC)
The classification recognized by linguists specifies Norrbotten dialects and individual local dialects. There is often discussion of dialects within specific historical provinces (landskap) but this is a strictly geographical distinction and is not based on linguistic analysis as such. Including a map here to give an idea of what it looks like. In aftavǽł, there are attestations from sources that specify at least two local dialects (Lövånger and Burträsk) and entire provinces (Västerbotten, Jämtland, Ångermanland, etc). "Westrobothnian" is an attempt to merge multiple dialects under one banner in a way that makes entries very difficult to disentangle. The only thing they have in common is that they are all considered Norrland dialects of Swedish.
I believe we have to focus on the underlying problem here: "Westrobothnian" is not recognized and doesn't belong in Wiktionary. A quick fix could be to replace "Westrobothnian" with "Norrbotten dialects" as a sub-category of Swedish. This would be painting with a very broad brush, but at least it would replace a made-up term with a recognized one. It would leave room for anyone who wants to split entries into local dialects. How would the setup for this work? Are there any existing setups that can serve as inspiration?
But this has no linguistic basis! The Norrland dialects are as distant from Swedish as Elfdalian, and just like it they are not descended from Old Swedish, since they lack such basic things as eastern monophthongization. Some dialects even have different outcomes of nasal vs non-nasal vowels in the pre-language. So why should a single parish's dialect spoken in Sweden (Elfdalian) be classified as a separate language, and all the others be classified as Swedish, even though they don't even descend from Old Swedish (or even 1000s Old East Norse)? As for the talk about hobbyists and original research, since when is this prohibited on Wiktionary? Look at the Etymology scriptorium, there's a ton of OR going on there and nobody complains. We are not Wikipedia. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 15:48, 26 July 2022 (UTC)
Swedish is not a language lacking in linguistic research. There's a very large body of scholarship out there that has probably described hundreds of local dialects since the 19th century. None of it recognizes "Westrobothnian" as a separate entity (except for a description of dialects within specific landskap). None of the sources used here on Wiktionary recognize them as "Westrobothnian" either. The term has been created by activists with little or no basis in published research. As far as I know, it's not even something that would be widely recognized by those who are supposed to be speakers. Most of those seem to identify with local dialects of Piteå, Kalix, Lövånger, etc.
I'm not going to get involved in Elfdalian whataboutism or debates over phonological minutiae. It's not because I'm against describing dialects on Wiktionary, but because I'm against using Wiktionary as a place to publish linguistic research. Wiktionary is indeed not Wikipedia, but I'm not buying the argument that there's a carte blanche to engage in OR here. After all, in Wiktionary:Criteria for inclusion, there are guidelines requiring attestation. It seems self-evident to me that a requirement for attestation also requires us to include entries under established labels, not made-up ones.
Something I think we should do more often is add dates to etymologies or definitions, but that's only partially related to the discussion I'd like to start.
I've been adding first usage dates to Polish entries, but one thing that I've wondered about is inherited terms. Should we base it on the cutoff dates set for Old/Middle languages and find the first one after that, or what?
Relatedly, how new should a term be for us to add the neologism label? Should we have a system where once a word is a certain age, we remove the label? Vininn126 (talk) 09:56, 21 July 2022 (UTC)
Your point about neologisms is well taken. How a term is viewed will vary quite a bit. Thinking about a political example, how "neo-" a given neopronoun is depends on the context and there will be places where English is spoken and a given pronoun will be considered "new and radical" and others where it is conventional. I realize this doesn't add much clarity, but I hope it helps us think about what constitutes a "neologism" at all. —Justin (koavf)❤T☮C☺M☯ 10:09, 21 July 2022 (UTC)
Our current glossary definition is "any term that has been newly coined", which is very broad. However, there is a social aspect to this, as many people have various associations with the term. We do tend to have a history of going with the more technical definition with such terms, which would mean we should go with the broader definition, in which case we should determine a specific "age" and also be dating terms more to determine what is and isn't a neologism. Vininn126 (talk) 10:55, 21 July 2022 (UTC)
@Vininn126: you make a good point about whether we should have a guideline for when a neologism should be regarded as no longer “new”. I wonder if linguists have a rule of thumb for this? If not, I wonder if it is advisable to simply adopt an arbitrary period for when a “neologism” label should be removed – two years? Five? Ten?
defdate is probably ideal for words with many definitions.
Another point to bring up with this is how hot-words and neologisms interact. Shouldn't we be tagging hot-words as neologisms? Vininn126 (talk) 12:01, 21 July 2022 (UTC)
@Vininn126: I see hot words as terms that have an even more tenuous status than neologisms, as they may be deleted if they don’t gain currency for more than a year. Perhaps if a term survives beyond that period the “hot word” label can be replaced by “neologism”. — Sgconlaw (talk) 12:54, 21 July 2022 (UTC)
Actually now that I think about it, defdate only partially works, what if the same meaning has been passed down? Would we still just use the cut-off year, or would we find the earliest use in a descendant? Vininn126 (talk) 19:11, 21 July 2022 (UTC)
My first question has been about inherited words/definitions - Let's say we have a definition of a word that has been used since Middle English and has been carried over to Modern English, how are we supposed to date that? Vininn126 (talk) 08:41, 22 July 2022 (UTC)
@Vininn126: I'm not sure – I think this is definitely worth discussing. I think the tendency has been to indicate the earliest evidence of the term in Old English or Middle English, but this doesn't sit very well with our policy of treating these as separate languages. — Sgconlaw (talk) 07:07, 23 July 2022 (UTC)
WSJP, a monolingual Polish dictionary, lists earliest usages found in Old Polish, which confirms the idea that the tendency is to include even the parent languages. It doesn't seem right to me, but neither does the alternative. Vininn126 (talk) 08:12, 23 July 2022 (UTC)
@Vininn126: yes, it would be odd in English entries to say “from 1500”, since the use of the year 1500 as the dividing line between Middle English and modern English is fairly arbitrary. Maybe we should say something like “inherited from Middle English (early 14th c.)”. That way it is clear that the term is derived from a different language (at least as we regard it here). — Sgconlaw (talk) 08:29, 23 July 2022 (UTC)
I wonder what the best way to avoid duplication would be - i.e. Should we repeat that information on the Middle English entry as well? Vininn126 (talk) 08:35, 23 July 2022 (UTC)
@Vininn126: I don’t really edit Old and Middle English entries, so I’m not sure. I’d say that, at most, it would be enough to list a modern English term as a descendant in a Middle English entry. — Sgconlaw (talk) 08:47, 23 July 2022 (UTC)
It's not just Old and Middle English, but really any language with descendants, so... basically all of them. I have the same problem with Polish and Old Polish. Vininn126 (talk) 12:43, 23 July 2022 (UTC)
My two cents. When a term is coined for a concept for which no term existed – often because the concept did not yet exist (software, metrosexual, cryptocurrency) – it seems unnecessary to label it explicitly as a neologism, regardless of the date of coinage. For the “pure Turkish” terms invented by committee to replace Arabic and Persian loanwords, such as çaba for say, it is (IMO) reasonable to call them “neologisms” also today, even though three generations of Turkish school kids have grown up not realizing these words did not exist one century ago. So I think it depends a bit on the context and purpose of the coinage. --Lambiam 14:41, 21 July 2022 (UTC)
This connects back with the "cultural" connection mentioned above. IF that is the direction we decide to go, we should set up very clear definitions in our glossary so we can adhere to them and appropriately label. Vininn126 (talk) 14:43, 21 July 2022 (UTC)
I think putting dates on inherited terms is misleading at best. For one thing, most languages don't have the level of attestation necessary for this to make sense. Even in English, it can be problematic: let's say you have a rare English term attested since the 19th century with sound changes that suggest it had to have come by inheritance from an Old English term that may or may not have been attested. It must have been spoken by someone, somewhere in the intervening years, but we don't know by whom or where- did it reenter the standard language from some obscure dialectal term spoken in one place, or was it widespread, but just didn't happen to occur in writing that survived?
And then there are terms borrowed from other languages that are adapted to a native shape by analogy with inherited terms, based on known correspondences. Some of these may be the expected outcome of inheritance from terms in ancestral languages that actually died out.
Another point: temporal boundaries between parent and child languages are rarely clear-cut. A particular form may have been adopted from the start as a fashionable way to show how modern one was, or it may have hung on as an archaism due to some association with traditions or literature, songs, etc. Or both may have coexisted in different registers, sociolects, etc.
For any or all of those reasons, giving a date based on the general history of the language implies we know more than we do. We should stick to dates of attestation, and leave the implications for our readers to draw on their own. Or perhaps mention the gap, with the before (if known) and after attestation dates cited. Chuck Entz (talk) 17:48, 23 July 2022 (UTC)
So your short answer for inherited terms would be to ignore it altogether? That is what I have been doing thus far. (We still should discuss neologisms.) Vininn126 (talk) 17:54, 23 July 2022 (UTC)
I think in previous discussions about neologisms it was felt that a generation growing up with the word (i.e. 20 years) was a decent (max) cutoff at least for English, but I only take that to mean that a term that's been in use for 20 years isn't a neologism, not that every newer term has to be labelled as a neologism, if other considerations apply, like those Lambiam mentions. In most cases I think a combination of defdate, etymology and/or possibly usage notes (e.g. for Turkish) will be clearer. (I thought we were moving away from the neologism label in favor of those things for that reason, and for the reasons Lambiam mentions — that a lot of terms are actually newer than people might think because the concepts themselves are new — as well as because labelling neologisms is a rolling maintenance problem, as you inherently always have to go back and remove the label at some point.) For inherited terms I agree it'd be silly to give only the earliest post-1500 cite if it's been in use continuously from before 1500; I see entries date senses to the earliest attestation even if it's in an ancestor language, e.g. English "since the 8th century", and in general this seems reasonable to me, although there are occasional situations that should be handled differently (one time someone labelled a definition which made reference to chromosomes as having been in use since the 14th century...no). - -sche(discuss)05:53, 24 July 2022 (UTC)
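The 20-year rule of thumb mentioned above could be sketched as a simple eligibility check. This is only an illustration: the function name, the constant, and the idea of storing a first-attestation year are assumptions made for the example, not an existing Wiktionary module or policy. Note that it only says when the label is no longer appropriate; it does not say that every younger term must carry it.

```python
# Hypothetical sketch: the names and the 20-year default are illustrative only.
NEOLOGISM_MAX_AGE_YEARS = 20  # "a generation growing up with the word"


def may_be_labelled_neologism(first_attested_year: int,
                              current_year: int,
                              max_age: int = NEOLOGISM_MAX_AGE_YEARS) -> bool:
    """Return True if the term is still young enough for the label to be considered.

    A True result does not mean the label is required; other considerations
    (e.g. the concept itself being new) may still argue against it.
    """
    return current_year - first_attested_year < max_age
```

A check like this would also make the rolling maintenance problem visible: a periodic run could list entries whose label has aged out.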
2p from a Japanese perspective.
Japanese is a bit of an oddball for our purposes: there is only one major dividing line, between Old Japanese and Japanese, and while the history is depicted as relatively clear (as at w:Old Japanese), that line is actually quite blurry in lexicographical contexts due to the continued use of archaic terms and constructions in higher-register writing. In part because of this, monolingual modern Japanese dictionaries will, to some extent, include at least some entries for terms that are technically Old Japanese.
As such, the Japanese community here has only relatively recently set up Old Japanese as a separate "language". Even in doing so, we keep a lot of the information in the "main" language of Japanese, and often only trouble with making an Old Japanese entry if there are details specific to Old Japanese that would be inappropriate in a Japanese entry.
Depending on how other languages do things, the entry at Japanese 如是 (nyoze) might be an example to crib from. This term first appears in Japan in a text from 611, well within the Old Japanese stage (usually treated as ending around 794), so technically in Old Japanese. It remains in use in specific contexts even today, as evidenced by the 32K+ works listed at Google Books that include this term and were published this century.
@Vininn126, an example to maybe look at is Korean. The typical practice with words inherited from (Late) Old Korean, Middle Korean, or Early Modern Korean is to list the first written attestation ever; see 식다 (sikda), 서울 (seoul), 구물구물 (gumulgumul), 꿰다 (kkweda), 꿈 (kkum), and 가엾다 (gayeopda) for some examples. That's clearer to me than trying to put the first "modern" attestation at whenever the language transitions, unless the script or spelling has changed significantly. AG202 (talk) 19:19, 25 July 2022 (UTC)
Hi guys! I would like to invite you all to play Multilingual Scrabble with me for the third time! It is back after a four-year absence. The game will kick off on Saturday July 23. To play all you need to do is have a valid email address and not be blocked. Dunderdool (talk) 19:14, 21 July 2022 (UTC)
keener currently has three senses with a single etymology keen + -er. I would argue there are actually three different homonymous etymologies, as follows:
@Jnestorius: if the words originate from different senses of keen, I would separate them into different etymology sections. In fact, if two related words with different parts of speech have different etymologies (for example, a noun and a verb derived from different Middle English words), I also put them in separate etymology sections; I only combine them if one is directly derived from the other. — Sgconlaw (talk) 03:47, 22 July 2022 (UTC)
What IPA transcription to use for Hong Kong English?
I'm currently adding HKE pronunciations to some entries (so far Cambridge, Manchester, extraordinary, September), but then I went on to latte, which already had an HKE pronunciation. The problem is that the IPA on latte uses stress marks, but HKE is often considered to be a tonal dialect of English (at least for the majority of HKE speakers; the non-tonal speakers are very few and almost always follow RP pronunciation), so I find the usage of stress marks on latte weird. On the other hand, I'm not sure whether there is already a consensus on using stress marks or tone marks for HKE, so I'm still unsure whether to change it. Pinging @Justinrleung, who added the HKE pronunciation for latte. -- Wpi31 (talk) 08:59, 22 July 2022 (UTC)
User:Oniwe created this topical category under "religion". I am skeptical; not about the existence of the religion, but about whether we need/want to have categories of this nature. There are thousands of traditional religions, and I'm not sure we want a topical category for every one. On top of this, according to Wikipedia's entry on Igala Kingdom, the traditional religion of the Igala is called Ifá and is a Yoruba religion, and we already have a category Category:Yoruba religion. In general, I'm not sure how we should categorize traditional religions but I suggest groupings of them, similar to how we group most Protestant sects under just Category:Protestantism. There are only three subcategories under Protestantism, Category:Anglicanism, Category:Quakerism and Category:Mormonism (and it's extremely doubtful whether the latter belongs there; I doubt you'll find very many authorities on Mormonism who consider it Protestant, and I seriously doubt most Mormons consider themselves Protestant). Benwing2 (talk) 04:46, 23 July 2022 (UTC)
I think we should have subcategories as long as there are languages that have a set (say, over ten) of terms related specifically to that (sub)type of religion. For instance, we could potentially classify Norse, Greek and Roman religions under "Indo-European religion", but numerous languages that have terms for one of these religions also have terms for the others (so, Greek has words for "Thor", "Zeus" and "Jupiter").
So, what we need to ask ourselves is, do Igala speakers distinguish between the Igala and Yoruba religions, or do they not? Thadh (talk) 06:18, 23 July 2022 (UTC)
I think that while Yorubas and Igalas may have similar deities, along with other Niger-Congo ethnic groups, the Igala religion is distinct from the Yoruba religion, in the way the Igbo traditional religion, Odinala, is similar to but distinct from the Yoruba and Igala religions. Igala religion centers on a divinity known as Ọ́jọ́, whom they consider the Supreme divinity. Ifá is a divination system that is found in almost all West African traditional religions, but these religions are distinct spiritualities. While the Igala religion has not been the subject of much research and the number of adherents is small, Igala speakers most certainly distinguish between the two religions as they do the languages. I shall add more terms to the Igala religion category, which definitely exceeds 10. Oniwe (talk) 04:01, 24 July 2022 (UTC)
Category:Terms derived from placenames (and another category for wrong ones)
and a category for "Terms incorrectly based on placenames", like Hawaiian pizza (from Canada), or Rua's example of Dutch filet americain (not from America).
The main (connected) issues are what exactly to call the categories, and how (or, I suppose, whether) to exclude just any placename-adjective (Bostonian, Russian, etc) which would otherwise absolutely swamp the more interesting terms. If anyone has a better idea than "Terms based on placenames" ("Terms derived from placenames"?), please pipe up. - -sche(discuss)00:23, 24 July 2022 (UTC)
...upon closer inspection (because I recalled and was trying to relocate where someone had complained about "toponym" meaning something else; maybe I was thinking of this and this), I notice we've had Category:English terms derived from toponyms since 2012, so I guess the only one we're missing is Category:English terms derived incorrectly from toponyms? (Is there a better name? "Category:(Language) misnomers derived from toponyms", qualified thus to prevent the issue with bare "misnomers" that was raised in the last thread? But would that include things where the toponymic part is right and it's only the other part that's a misnomer? I can't think of an example offhand, since in e.g. Dutch baby both parts are arguably misnomers, it not being Dutch or a baby, but examples must exist.) - -sche (discuss) 21:35, 24 July 2022 (UTC)
Thesaurus categories lump together different languages
Category:Thesaurus:People contains mostly English pages, but also some Indonesian, Finnish, Polish, Telugu, and Chinese, all mixed together into the one category. IMO it would be clearer (or at least, more consistent with how we do topical categories for mainspace pages) to subcategorize by language, e.g. "Category:Thesaurus:en:People". Yes? At the risk of distracting from my main point, I also notice we have several nearly-empty thesaurus pages like Thesaurus:linguist which contain very little content which could more easily be left directly in the mainspace entries... - -sche(discuss)07:29, 25 July 2022 (UTC)
I agree with you on both counts. I will also bring up this old thing which we should probably reconsider. It is something that is easy to decide and enforce while the number of entries on WS is still quite low. brittletheories (talk) 05:08, 26 July 2022 (UTC)
Good point. A user there makes the accurate point that a few words like god could end up being useful Thesaurus pages in several languages. OTOH, lengthening the titles of every Thesaurus: page for the tiny minority of such pages that would have conflicts seems less than ideal (e.g., "beautiful woman" is only a phrase in English); maybe only the few that have conflicts could be split onto subpages (or handled like mainspace entries, with multiple language headers). Anyway, yes, we should try and decide this. - -sche(discuss)00:19, 27 July 2022 (UTC)
Sure, but as far as I understand the current setup of the thesaurus, it'd be at that Japanese title, like Thesaurus:死ぬ is, and wouldn't conflict with Thesaurus:beautiful woman. (Whereas, Danish and Norwegian might both want to put synonyms of their respective words for good at Thesaurus:god... so there I guess we might need to resort to "Thesaurus:god/da" etc.) - -sche(discuss)00:48, 27 July 2022 (UTC)
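The naming ideas floated in this thread (per-language categories such as "Category:Thesaurus:en:People", and subpages such as "Thesaurus:god/da" only where several languages share a headword) can be sketched as two small helpers. The function names and the `shared_headwords` set are hypothetical, introduced only for illustration; nothing here reflects an existing template or module.

```python
# Hypothetical helpers illustrating the naming schemes proposed above.

def thesaurus_category(topic: str, lang_code: str) -> str:
    """Build a per-language thesaurus category name, e.g. for English 'People'."""
    return f"Category:Thesaurus:{lang_code}:{topic}"


def thesaurus_title(headword: str, lang_code: str, shared_headwords: set) -> str:
    """Build a thesaurus page title.

    Only headwords known to be claimed by more than one language get a
    language subpage; all other titles stay short.
    """
    base = f"Thesaurus:{headword}"
    if headword in shared_headwords:
        return f"{base}/{lang_code}"
    return base
```

Under this sketch, only the small minority of conflicting pages would get longer titles, which is the trade-off discussed above.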
An Election Compass is a tool to help voters select the candidates that best align with their beliefs and views. The community members will propose statements for the candidates to answer using a Likert scale (agree/neutral/disagree). The candidates’ answers to the statements will be loaded into the Election Compass tool. Voters will use the tool by entering their answers to the statements (agree/disagree/neutral). The results will show the candidates that best align with the voter’s beliefs and views.
Here is the timeline for the Election Compass:
July 8 - 20: Volunteers propose statements for the Election Compass
July 21 - 22: Elections Committee reviews statements for clarity and removes off-topic statements
July 23 - August 1: Volunteers vote on the statements
August 2 - 4: Elections Committee selects the top 15 statements
August 5 - 12: candidates align themselves with the statements
August 15: The Election Compass opens for voters to use to help guide their voting decision
The Elections Committee will select the top 15 statements at the beginning of August
Best,
Movement Strategy and Governance
This message was sent on behalf of the Board Selection Task Force and the Elections Committee
I would like to open this to a somewhat broad community discussion. As it stands, I cannot find any consensus as to the best way to handle how many/what cognates to list. On some pages you get giant walls of cognates, and other times you get people removing them.
Personally, I don't use them that often, but on occasion they have proven to be useful. I imagine the average reader might also find them useful - it can be a pain clicking through page after page to try and hunt down a cognate.
Personally, I would prefer a more delicate touch - not the giant walls of cognates listing every possible sister language and the like. I think it's best to try and get a cognate from major branches. A sort of example of this would be bób - I listed a cognate from each Slavic branch, and then one from more distant branches. There can obviously be a lot of variation in this - people might not agree which EXACT languages to list using this method, which is a downside.
I think I've said it before, I'll say it again and probably will keep saying it in the future: In my opinion, the most effective way is to give one or two cognates per branch, unless you lack an ancestor term, taking into account the (cultural/political) significance of a language. To illustrate this point:
A Komi-Zyrian term inherited from Proto-Permic, inherited from Proto-Uralic, derived from Proto-Indo-Iranian, being a term in the major Permic language (one could argue for Udmurt being that, but let's not go into that just yet) should get all Permic cognates, two Uralic cognates (preferably Finnish and Hungarian), two IIr cognates (preferably Sanskrit and Persian).
A Komi-Permyak cognate should display only Udmurt and Komi-Zyrian. If there are no Permic cognates, but there are Uralic cognates, two of those should be displayed.
A Saterland Frisian term deriving from Proto-Germanic should display a West Frisian term and a major West Germanic cognate (preferably German, English, Dutch), since North Frisian is very minor and a mess.
Which cognates exactly should be given depends on the language, but preferably ones that the reader interested in this language knows: So, geographic neighbours and close languages. Thadh (talk) 15:54, 27 July 2022 (UTC)
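The "one or two cognates per branch, preferring the better-known languages" idea can be sketched as a small selection routine. This is a rough illustration only: the data layout, the function name `pick_cognates`, and the per-branch preference lists are assumptions made for the example, and the terms in the usage note are placeholders rather than real words.

```python
# Hypothetical sketch of per-branch cognate selection; not an existing module.

def pick_cognates(cognates, preferred, per_branch=2):
    """Select at most `per_branch` cognates from each branch.

    cognates:  list of (branch, language, term) tuples
    preferred: dict mapping a branch to its languages in priority order
    Languages not listed in `preferred` keep their input order and come last.
    """
    by_branch = {}
    for branch, language, term in cognates:
        by_branch.setdefault(branch, []).append((language, term))

    selected = []
    for branch, items in by_branch.items():
        order = preferred.get(branch, [])

        def rank(item, order=order):
            language = item[0]
            return order.index(language) if language in order else len(order)

        # sorted() is stable, so unlisted languages stay in input order
        for language, term in sorted(items, key=rank)[:per_branch]:
            selected.append((branch, language, term))
    return selected
```

For example, given Uralic candidates in Estonian, Hungarian and Finnish with a preference list of Finnish then Hungarian, the sketch would keep only the Finnish and Hungarian placeholders. Which languages go on each preference list is exactly the editorial question the thread is debating, so the code settles nothing there.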
I do think having a proto page with descendents should change it - but I wonder if we should rather go in the opposite direction - by having the nearby languages there, they are more at hand, but the more distant languages take more clicks. Vininn126 (talk) 16:37, 27 July 2022 (UTC)
I remember the fruitless haggling we had about this issue. People are never going to agree which cognates are appropriate. The solution is not listing any cognates when a parent entry with descendants exists. That's neutral and avoids duplication. Also, let users click and see our beautiful proto-pages. Maybe they will learn something new. Vahag (talk) 16:20, 27 July 2022 (UTC)
This has been mentioned before. The downside is that someone who wants to check a more distant cognate may have to dig through up to 5 proto pages, and the given person might not be up for that. Vininn126 (talk) 16:30, 27 July 2022 (UTC)
If you really must have a lot of cognates in an etymology section, at least have the decency to hide them from those who don't care about them (ie, almost any normal definition-seeking user). DCDuring (talk) 16:23, 27 July 2022 (UTC)
I do prefer having them hid, as well. Also, it's true it would be better to have data. This is from talking to a few people who don't it. You are claiming they only search for definitions, but a lot of our reader base are amateur linguists who are interested in etymologies. Both claims require data, which neither you nor I have. Vininn126 (talk) 16:28, 27 July 2022 (UTC)
Yes, I don't think our average reader is necessarily the average reader of dictionaries. Wiktionary, in my experience, is pretty niche, so I wouldn't be so quick to dismiss the value of cognates for our users. Andrew Sheedy (talk) 04:40, 29 July 2022 (UTC)
I appreciate you probably weren’t referring to them, but just to confirm that this discussion doesn’t apply to doublets? I think those are always worth listing. When a term has tons of descendants, they can be quite tricky to spot on the tree (especially when it gets split across more than one entry for practical purposes, as has happened with Egyptian bꜣjr and its descendant Latin barca). It wouldn’t be trivial for a user to see that barge, bark, barque and baris are doublets without that being mentioned on each entry, as they all entered English independently. Theknightwho (talk) 21:45, 27 July 2022 (UTC)
I don't think it does: as I see it, cognates are related words in different languages, while doublets are related words in the same language (with a special type of relationship at that). I wouldn't support removing the latter either. PUC – 17:52, 29 July 2022 (UTC)
Very simply, there's no such thing as a few, selected cognates. There are lots and lots of editors whose only contribution consists of adding words in their own language to cognate lists. If you have Danish, soon there will be Norwegian, Swedish, Faroese and Icelandic- these people feel offended that their language is omitted. There are even editors who add Albanian "cognates" to terms that only go back to Proto-West Germanic, among hundreds of others. Perhaps we should have some kind of template or tooltip or something that tells readers to click on the parent terms to see cognates. Chuck Entz (talk) 03:05, 28 July 2022 (UTC)
@Chuck Entz, @Vininn126: Cognates are not, strictly speaking, part of the etymology of a word. They are nice to have, though, from a comparativistic point of view. I wouldn't mind having them all grouped together in a collapsible table after the etymological explanation. That way there would be no need to limit the number of cognates; we can just add them all if we want. Sartma (talk) 11:35, 30 July 2022 (UTC)
I support eliminating cognates, so long as the ancestor language has an entry for the term in question, with or without leaving some template encouraging readers to click on the proto-whatever entry (it seems a bit redundant to me). Nicodene (talk) 08:10, 29 July 2022 (UTC)
I think I could get behind something like that. I'm not the biggest fan of having tons of cognates, unless it's for farther off connections, but even then. Vininn126 (talk) 18:01, 29 July 2022 (UTC)
Oppose listing cognates as a rule. Adding them soon makes for an unreadable mess, I much prefer streamlined etymologies. PUC – 17:52, 29 July 2022 (UTC)
@Sarilho1: Some might see those as back-formations, some might not. The fact remains that those are deverbals. If we end up agreeing on deverbals being back-formations, then we will treat deverbals as a subcategory of back-formations and still use the {{deverbal}} template. Personally though, I consider a verb > noun shift no different than a noun > verb shift. Words are made from words; there's no correct direction. Catonif (talk) 08:38, 31 July 2022 (UTC)
I've understood back-formations as words that could be expected to be derived from another word, and if so the case is weak for these. Many of these noun-verb pairs in Romance languages are both inherited from Latin, some are derived from the noun, and some are derived from the verb; so there's no clear etymological expectation to follow or break. I've been referring to these as deverbals, mainly because there's not a neat suffix category to put them in. (Category:Spanish words suffixed with -o should not be used in this case, IMO.) Ultimateria (talk) 22:26, 5 August 2022 (UTC)
I agree with Ultimateria. Back-formations, or formation by analogy or regressive derivatives, are unusual derived forms that usually follow the opposite direction. For example, reivindicar from reivindicação (ultimately from Latin rei vindicatio) is a back-formation removing suffix -ção. A deverbal noun is a common process in Romance languages, not an unusual back-formation. Vriullop (talk) 09:38, 10 August 2022 (UTC)
2+2=?
Heyy.
Why are proverbs not treated as a subcategory of phrases? We can't access them from the lemmas page (even though they are counted as lemmas). Shumkichi (talk) 14:11, 30 July 2022 (UTC)
I'm a bit wary of doing this, as I think they're our only active native-language Mon contributor. It's pretty out there, but ultimately harmless. Theknightwho (talk) 18:39, 9 July 2022 (UTC)
I'm probably the only active native-language Guilin Mandarin contributor. Does it mean I can abuse my user page? 沈澄心✉06:20, 11 July 2022 (UTC)
What do you mean by "abusing", lol? Also, it's THEIR userpage, not yours, they can do whatever they want with it. Also, what does their userpage have to do with their other activity here, huh? If they're a productive user, why the hell do you care about this other stuff? Some ppl here are just ugghhhh Shumkichi (talk) 10:56, 11 July 2022 (UTC)
@沈澄心Btw., you decided to put this on your userpage: "This user is non-binary (Q48270) and is currently using */* pronouns." How is this in any way related to Wiktionary? And how much content unrelated to Wiktionary is "too much" and who decides this? You? Isn't it a little arbitraryyy? Shumkichi (talk) 11:00, 11 July 2022 (UTC)
@Fytcha are they allowed on my global user page? If not, I will remove them or consider creating a local user page. 沈澄心✉11:41, 11 July 2022 (UTC)
@沈澄心: Don't worry about it, I'm sure nobody takes issue with it. It was more to demonstrate that the rules in WT:USER are rather strict but that most editors on here don't care as long as it doesn't come from a place of bad faith and belongs to somebody who actually works on the project. — Fytcha〈 T | L | C 〉 21:07, 11 July 2022 (UTC)
0. It's my global user page on Meta, not on Wiktionary.
1. Userboxes on meta:User:沈澄心 are just a little amount of my personal information, which is usually allowed. They even cannot occupy a page on a PC.
2. Your user page belongs to you but does not completely belong to you. Not everything is allowed on your user page. Usage of user pages should follow some rules, and I just don't think Wiktionary is a place for promotion or webhosting. (edited)
To be fair, I think that pronouns are important to interacting with and referring to other users on the site, so I very much welcome them. For the RFD at hand, though, it does seem to violate WT:USER, but I don't feel that it rises to the level of deletion, though maybe a warning or something could be given. AG202 (talk) 20:15, 11 July 2022 (UTC)
I reverted the tagging for speedy deletion and was preparing to write it up here. While it's true that it grossly violates some aspects of WT:USER and this user has always had trouble with Wiktionary norms and practices, as the user page of a native speaker of and contributor in a little-known but not unimportant language I felt it should be at least discussed before deletion. Chuck Entz (talk) 18:42, 9 July 2022 (UTC)
Is this to be read as a request to remove the invocation of {{User:咽頭べさ/Top icon}} and the "100% true history" and then clean up the formatting? I would favour keeping the information box. --RichardW57 (talk) 19:43, 10 July 2022 (UTC)
If the decision comes down to whether to get rid of the user page together with the user (who will probably resist etc.) or neither, I'd prefer the latter. It may be worth pointing out WT:USER is explicitly marked as a mere draft proposal but that also goes for a lot of other WT namespace pages that are de facto pretty binding. Maybe also worth pointing out that User:Shumkichi was recently blocked for perceived user page violations. — Fytcha〈 T | L | C 〉 01:50, 11 July 2022 (UTC)
Keep — Just a full disclosure, Burma and the Mon language are not my areas of interest. But, it's a user page of an active good-faith contributor. It's not even totally useless, since it also has some linguistic information in the box, and the essay (whatever you think of it) does show an interest in the Burmese culture which seems to be the user's main interest in editing on this site. The person doesn't have an alarming amount of subpages unrelated to site content. I say, if they want to share their opinion or perspective on the history of Burma, on just one user page, that's fine. To delete it would be giving them a reason to stop editing—hell, even having this discussion could be enough to do that already. Furthermore, I see nothing grossly offensive enough to even warrant a discussion. I really think a damn good reason is needed to ever warrant deleting a user page, so we should not be pedantic about it. It's not harming anybody, so I say leave it alone. If it ain't broke, don't fix it. PseudoSkull (talk) 04:18, 11 July 2022 (UTC)
Scrub. Burma is now in the WTO and acceded to TRIPS, and those give at least 50 years protection, so the question now becomes one of licensing of the content. --RichardW57m (talk) 10:44, 13 July 2022 (UTC)
Okay, maybe saying it's "crap" is inappropriate, but that doesn't change the fact it was copied from that Mon news site, so that should be grounds for removing it, no? Acolyte of Ice (talk) 12:48, 13 July 2022 (UTC)
srsly, does anyone think that that Mon news site will sue this person or what? xd hello police, please come to the internet Shumkichi (talk) 12:54, 13 July 2022 (UTC)
@Shumkichi please read WT:COPY and m:ToS. Copyright violation is not allowed, even if the copyright holder has not taken any action yet. BTW, if you participate in Commons, you can see deletion requests involving copyvio every day. 沈澄心 ✉ 13:35, 13 July 2022 (UTC)
@Shumkichi Please stop assuming bad faith and making personal attacks. I nominated this page for deletion just because it is out of scope. I don't care if it is about the history of Mon, Myanmar or other things. 沈澄心 ✉ 13:46, 13 July 2022 (UTC)
Change vote to delete as copyright infringement. There is virtually no way this could be out of copyright in the US, and keeping copyrighted content (even if it's unlikely to be enforced) is strictly against universal WMF policy. This'd be deleted pretty readily over at Wikisource; I don't see why Wiktionary has to be an exception. PseudoSkull (talk) 13:13, 13 July 2022 (UTC)
Even before any news of copyright infringement, "the racist BSPP dictatorship abolished the privilege for Mon monks" was enough for me to vote delete: I think WT:NPOV is more important than any individual editor's rights over their own user page. If we the editors cannot be neutral on our userpages, how can anyone trust us to be descriptive in our entries? Thadh (talk) 13:25, 13 July 2022 (UTC)
@Thadh NPOV is only about edits to the site content. Opinions are like assholes and we've all got them, so it's pretty human. Opinions of individuals on their user pages are fine IMO. (And the stating of such opinions on the user page could also theoretically help us catch instances of non-NPOV edits.) PseudoSkull (talk) 13:37, 13 July 2022 (UTC)
I know that it is technically not against WT:NPOV and thus not against policy, but I do think it's in bad taste and wouldn't oppose it made policy not to include such things on userpages. We all have opinions, but no reason to take them to work. Thadh (talk) 13:43, 13 July 2022 (UTC)