Wiktionary:Beer parlour/2023/May

Footnotes in Further Reading section (after References)

In the Irish entry for fara there is a footnote for the etymology – the source cited there also talks about inaccuracies in one dictionary. Then in the Further Reading section that dictionary is listed – with a cautionary note based on that reference too, with the same footnote as the etymology. So initially the Further Reading section was above the References section and all was well – but that’s not the canonical entry layout on Wiktionary and it was automatically rearranged by a bot – and now the footnote is broken (the <ref> tag is after the reference list, so it doesn’t refer to it, if you click it you get Cite error: Invalid <ref> tag).

I see @Mahagaja reordered the sections back some time ago – presumably to fix that – but it was again rearranged by a bot later and it’s broken again.

So… is there a good way to handle this? Should it just be changed to a plain-text “See O’Rahilly above” note or something like that instead of an actual footnote? // Silmeth @talk 10:38, 1 May 2023 (UTC)

I've moved References to the bottom as an L3 header while keeping Further reading within Etymology 1 as an L4 header. I think (hope) the bot won't undo that. —Mahāgaja · talk 12:08, 1 May 2023 (UTC)
This is a good solution, the bot won't undo it JeffDoozan (talk) 14:28, 2 May 2023 (UTC)
Also ping the bot operator @JeffDoozan This, that and the other (talk) 05:50, 2 May 2023 (UTC)
In general, ban bots from making changes unaided that require human intervention - reordering 'References' and 'Usage notes' automatically is generally dodgy.
In this case, it might help to use the group attribute in <ref> and <references/> so that one can add a references section to the further reading. I'd like to make its heading 'Notes'. --RichardW57m (talk) 09:06, 2 May 2023 (UTC)
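
The breakage described in this thread (a <ref> that ends up after the reference list once sections are reordered) can be checked for mechanically. The following is only a rough sketch, assuming plain <ref> and <references/> tags with an optional group attribute; the function names are hypothetical, and the real Cite extension handles many more cases than this.

```python
import re

# Matches opening or self-closing <ref ...> tags, but not <references ...>.
REF_OPEN = re.compile(r'<ref\b(?P<attrs>[^>/]*)/?>', re.IGNORECASE)
# Matches <references/> or <references group="..."/>.
REFLIST = re.compile(r'<references\b(?P<attrs>[^>]*)>', re.IGNORECASE)
GROUP = re.compile(r'group\s*=\s*"?([^"\s>/]+)"?', re.IGNORECASE)


def _group(attrs):
    """Return the group name from a tag's attribute text, or '' if none."""
    match = GROUP.search(attrs)
    return match.group(1) if match else ""


def orphaned_refs(wikitext):
    """List (offset, group) for each <ref> with no later reference list of its group."""
    last_list = {}
    for m in REFLIST.finditer(wikitext):
        last_list[_group(m.group("attrs"))] = m.start()
    orphans = []
    for m in REF_OPEN.finditer(wikitext):
        group = _group(m.group("attrs"))
        if last_list.get(group, -1) < m.start():
            orphans.append((m.start(), group))
    return orphans
```

Run against an entry where Further reading follows References, this would flag the reused footnote; giving that footnote its own group with a matching <references group="..."/> after it, as suggested above, would make the check pass.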

Multiword Terms

Do we have a discussion of when lemmas that resemble two words joined by a hyphen are multiword terms?

For example, I will contend that words like re-entry and co-operate are single words with a digraph-breaking hyphen, just as in the Thai surname Chan-ocha (RTGS) or Chan-o-cha (now usual form) it represents a syllable boundary. On the other hand, there are relatively clear cases such as man-eater. (At the Grease Pit, I have already separately requested the capability to use the |nomultiwordcat= parameter of {{head}} with the headword templates for the English language.) However, there are, to me, debatable cases such as Anglo-French, Franco-German and re-member. (Note that the addition of a prefix or suffix is supposed not to create a multiword term.) --RichardW57m (talk) 11:52, 4 May 2023 (UTC)

Has there been a decision pertaining to the existence of multiword Thai terms? There are many idiomatic compounds of the form noun-noun and noun-adjective with no major phonetic adjustment for which it may be difficult to decide whether they are one word or two, and it has been suggested that therefore the concept is meaningless for Thai. Has this conclusion been endorsed? If so, has it been deliberately extended to Thai script languages in which word boundaries are marked? --RichardW57m (talk) 11:52, 4 May 2023 (UTC)

Slightly off-topic, we have a means of selectively suppressing categorisations as multiword in newly added languages. Should manual categorisations as multiword be implemented by manually written categorisation or by a new template such as {{multiword}}? --RichardW57m (talk) 11:52, 4 May 2023 (UTC)

@RichardW57m Apologies I've been a bit under the weather but I will implement your multiword cat param requests soon. Benwing2 (talk) 18:52, 10 May 2023 (UTC)
@Benwing2: Thanks! The |multiword= or similar should be the only way to determine whether the term or expression is multiword for Thai, Lao, Khmer, et al. IMO, it shouldn't be based on the presence of spaces or square brackets [[ ]].
@Benwing2, Atitarev: While spaces don't help with identifying the constituents for Thai, a space within a Thai term (other than the sequence " ๆ") is very good evidence that the term is a multiword term. It's better than a hyphen within an English word! RichardW57 (talk) 01:59, 12 May 2023 (UTC)
@RichardW57: It's fine, as long as there are no false positives as with term containing . Anatoli T. (обсудить/вклад) 02:04, 12 May 2023 (UTC)
@Atitarev: Your reply is ambiguous. What's fine? Using spaces as evidence of multiwordiness in scriptio continua, or hyphens in English? Using hyphens in English falls foul of words like co-ordinate, where the hyphen serves the same function as a diaeresis. --RichardW57 (talk) 02:18, 12 May 2023 (UTC)
@RichardW57: "Fine" meaning that I agree with you that space could be used to identify multiwords with an exception or two.
I think you don't need to offer the solution by introducing the non-existent {{multiword}} template. Category:Thai headword-line templates has Thai-specific PoS templates, which also include {{th-phrase}} (usually but not always multiword). A parameter may be sufficient, either language-specific or generic.
After a cleanup of the current [[ ]] where they don't belong (created for etymological reasons, not to separate words), [[ ]] can be used only for multiwords. Just a thought; a parameter is probably better. Anatoli T. (обсудить/вклад) 02:34, 12 May 2023 (UTC)
I think we don't want to make the editor-preferred solution be just to add "[[ ]]". Some headword-line templates and modules are protected against editing. --RichardW57 (talk) 02:57, 12 May 2023 (UTC)
@Benwing2 Using a parameter to {{head}} has the problem that it doesn't automatically become a parameter to the multitudinous language-specific headword templates. Having {{multiword}} would bypass that problem. --RichardW57 (talk) 02:03, 12 May 2023 (UTC)
@RichardW57 Using a template like {{multiword}} works for languages where the multiword category is suppressed by default, but not so easily the other way around; for this to work you'd have to have Module:headword read the page contents to see whether {{multiword}} is present, which seems like asking for trouble. Benwing2 (talk) 02:13, 12 May 2023 (UTC)
@Benwing2: I only suggested {{multiword}} for forcing categorisation as a multiword term; it would override instances of |nomultiwordcat=1. Arguably the implementation of |nomultiwordcat=1 should have been extended to all pertinent headword templates. (I presume one-word proverbs are possible.) We do have languages where using hyphens as a diagnostic of multiwordiness is considered far too unreliable, but spaces are used for identifying constituent words. --RichardW57 (talk) 02:46, 12 May 2023 (UTC)
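
The heuristics debated in this thread (spaces as evidence, hyphens as weaker evidence, and a manual override in the spirit of |nomultiwordcat= or a |multiword= parameter) can be sketched as follows. This is an illustration only, not how Module:headword works; the function and its parameter names are hypothetical.

```python
def is_multiword(term, *, space_is_separator=True, hyphen_is_separator=False,
                 ignore_sequences=(), override=None):
    """Guess whether a headword is a multiword term.

    override mimics a manual parameter: True or False wins over every heuristic.
    ignore_sequences lists substrings that must not count as word breaks,
    e.g. the Thai repetition mark written after a space.
    """
    if override is not None:
        return override
    text = term
    for sequence in ignore_sequences:
        text = text.replace(sequence, "")
    if space_is_separator and " " in text.strip():
        return True
    # Strip leading/trailing hyphens so affix entries are not flagged.
    if hyphen_is_separator and "-" in text.strip("-"):
        return True
    return False
```

With hyphen_is_separator=True the sketch flags both man-eater and co-operate, which is exactly the false positive raised above; only an explicit override=False separates the two.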

Spanish - obsolete verb-form template use?

Hello to all Spanish editors. I was looking at Category:Spanish forms of verbs ending in -ir, which apparently consists of all entries that use {{es-verb form of}} with a lot of manually-specified parameters, which results in a much different format to the more modern, auto-generated one that is formed by simply passing the verb lemma. E.g., arrepentamos currently uses:

# {{es-verb form of|ending=ir|mood=imperative|pers=1|number=plural|arrepentir|nocat=1}}
# {{es-verb form of|ending=ir|mood=subjunctive|tense=present|pers=1|number=plural|arrepentir|nocat=1}}

...which generates...

  1. First-person plural (nosotros, nosotras) imperative form of arrepentir.
  2. First-person plural (nosotros, nosotras) present subjunctive form of arrepentir.

...whereas it would be possible to just write:

# {{es-verb form of|arrepentir}}

for

  1. inflection of arrepentir:
    1. first-person plural present subjunctive
    2. first-person plural imperative.

This latter convention is the only one I've seen up until now, and I have been browsing Spanish here for a few weeks now. Should we update this old format to match the new and automatic style? I figure this task is small enough to do manually, and I would do it, but I don't know if everyone agrees. Thank you for any opinions, Kiril kovachev (talk) 21:58, 4 May 2023 (UTC)

@Kiril kovachev: All or nearly all of the items listed in Category:Spanish forms of verbs ending in -ir and the larger Category:es-verb_form_of_with_old_params are junk forms that were created by mistake, usually the "regular" conjugation of irregular verbs. The "valid" verb forms have all been converted to {{es-verb form of}}. There are two "valid" verb forms, érase and dícese, plus a few "obsolete" verb forms that aren't handled by {{es-verb form of}}. See ] for more details. JeffDoozan (talk) 14:47, 5 May 2023 (UTC)
Okay, thank you for this. In conclusion, if they're "junk" as you say, should we be removing them now? If they don't represent valid Spanish then this category and all of its entries are just misleading. How can I tell also whether a particular entry in those categories is valid, or indeed junk? Kiril kovachev (talk) 18:46, 5 May 2023 (UTC)
@JeffDoozan Apologies if my previous response did not get through to you. It's not overly urgent, but if there's anything I can do then I'd simply like to know if this is the right thing to do. — This unsigned comment was added by Kiril kovachev (talkcontribs).
I've deleted the bogus forms of arrepentir without the stem change. Ultimateria (talk) 22:21, 8 May 2023 (UTC)
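
For anyone wanting to review the remaining entries by hand, calls that still use the manually specified parameters quoted at the top of this thread can be picked out of an entry's wikitext. A minimal sketch, assuming the old-style parameter names are the ones shown in the arrepentamos example (ending, mood, tense, pers, number); the helper names are hypothetical and nested templates are not handled.

```python
import re

# A single, non-nested {{es-verb form of|...}} call.
TEMPLATE = re.compile(r"\{\{es-verb form of\|([^{}]*)\}\}")
# Named parameters seen in the old-style calls quoted in this thread.
OLD_PARAMS = {"ending", "mood", "tense", "pers", "number"}


def uses_old_params(call_args):
    """True if the pipe-separated argument text names any old-style parameter."""
    names = {part.split("=", 1)[0].strip()
             for part in call_args.split("|") if "=" in part}
    return bool(names & OLD_PARAMS)


def old_style_calls(wikitext):
    """Return every {{es-verb form of}} call in wikitext that uses old parameters."""
    return [m.group(0) for m in TEMPLATE.finditer(wikitext)
            if uses_old_params(m.group(1))]
```

This only finds candidates; whether a flagged entry is a junk form or a valid one still needs a human check, as discussed above.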

FYI: May update from Unicode

https://mailchi.mp/39a4f117a8f4/unicode-in-6236062 Justin (koavf)TCM 17:58, 5 May 2023 (UTC)

Unacceptable behaviour from the admin Thadh to the user Anazarenko

I think that the message from the admin Thadh violates paragraph 2 of the Universal Code of Conduct. Gnosandes ❀ (talk) 18:49, 6 May 2023 (UTC)

This is the pettiest stupidity I've seen today. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ Proto-Norsing Ask me anything 19:10, 6 May 2023 (UTC)
Did en.wiktionary ever formally adopt the UCC? Wonderfool69 (talk) 19:14, 6 May 2023 (UTC)
@Wonderfool69 I don't know, but admin Vininn126 used UCoC against me. Gnosandes ❀ (talk) 20:29, 6 May 2023 (UTC)
I was saying you were acting against its spirit. Do not twist words. Vininn126 (talk) 07:14, 7 May 2023 (UTC)
According to this page, "Further information on the implementation of the Guidelines will be published in April 2023." and apparently that hasn't happened yet. My take on all that is that Wiktionary is a jungle! 71 Brick Walls (talk) 20:41, 6 May 2023 (UTC)
No, Wiktionary is not "a jungle". --Geographyinitiative (talk) 21:04, 6 May 2023 (UTC)
The most interesting part of this discussion is that this is the first time User:Wonderfool69 has been used as an account name. Can't wait for User:Wonderfool420! -- Sokkjō 07:39, 7 May 2023 (UTC)
The edit summary was unnecessarily rude. An admin should hide it. Vox Sciurorum (talk) 15:24, 8 May 2023 (UTC)
If you had to deal with Anazarenko, you'd realize there is a justification for being somewhat harsh. — SURJECTION / T / C / L / 16:09, 8 May 2023 (UTC)
I have to agree with Surjection here. There's a history of this user ignoring requests and being generally obstinate. Vininn126 (talk) 16:23, 8 May 2023 (UTC)
1. I don't see that the edit summary clearly violates paragraph 2, which is largely precatory, attempting to encourage niceness.
2. I certainly see no violations of paragraph 3.
I think all that can and should be done is to encourage Thadh to be nicer, because some may find negative evaluations harsh and then complain. DCDuring (talk) 17:05, 8 May 2023 (UTC)

OED1 transcription?

(I looked and wasn't able to find an earlier discussion of this on English Wiktionary: sorry if I missed one!)

The first edition of the Oxford English Dictionary was first published in 1928, so next year will be its 96th year of publication. I believe (I am not a lawyer!) that this means that the whole OED1 will be in the public domain in most places by the beginning of next year. (Unfortunately that doesn't include the rather important 1933 First Supplement.) Obviously there would be great benefit in linking OED1 entries for words from the corresponding English Wiktionary entries and/or incorporating OED1 material straight into the English Wiktionary entries. Is anyone aware of any plans being made anywhere to transcribe the OED1? It would definitely be a really big undertaking, but not an unthinkable one: it shouldn't be dramatically bigger than Wikisource's 1911 Encyclopædia Britannica transcription, for instance. And even just a list mapping headwords to page numbers and page locations would be enough to be extremely useful. Similarly, are there any thoughts about how the material could or should be linked to or incorporated into English Wiktionary?

If a transcription were to be done as a Wikimedia effort then I assume that English Wikisource would be the most obvious place for it. I've started an equivalent discussion there. RW Dutton (talk) 21:13, 6 May 2023 (UTC)

The entire first edition is already online as scans. OCR is unreliable on the small print. See {{R:Oxford English Dictionary}}. Vox Sciurorum (talk) 21:33, 6 May 2023 (UTC)
Yes, the Internet Archive thankfully has scans already (though it also seems that the scans could be better). Again I'm not sure of the copyright situation of those volumes which were issued after 1927, at least in the UK and EU, but I'm not a lawyer. In any case, the more of the OED1 and First Supplement is out of copyright sooner in more places, the stronger is the argument to start work on transcribing earlier. Getting a good transcription is going to involve some combination of manual effort and more carefully tuned OCR: hopefully mostly the latter, but definitely lots of the former, too.
However you also don't need a gleaming transcription to start incorporating OED1 into the English Wiktionary. Even if you just have a list mapping all the headwords to their volumes, page numbers and page locations you have enough to turn each OED1 entry into one or more little image files which can be linked straight from the corresponding English Wiktionary entry. That would obviously be far from an ideal end state (especially, though not only, for visually impaired users) but it would still be obviously much better than having no OED integration, and it could serve as a waypoint to a much more complete conversion. RW Dutton (talk) 22:12, 6 May 2023 (UTC)
In the UK/EU, those volumes worked on by William Craigie (N, Q–R, Si–Sq, U–V, and Wo–Wy) won't be out of copyright until 2028, and those by Charles Talbut Onions (Su–Sz, Wh–Wo, and X–Z) until 2035, and I'm not sure if the contributors might extend that more.--Prosfilaes (talk) 23:09, 6 May 2023 (UTC)
Yes: I don't want to speculate too much about this here, partly because I'm still not a lawyer, but I think it's possible that the (pre-Supplement) OED1 will still be in copyright in the UK in 2024. But I would guess that that might not be the case in the US, and IIUC the Wikimedia Foundation is US-based. This is really something that the Foundation should get copyright lawyers to review: OED1 is important enough that this clearly deserves the WMF's money and attention. (OFC it's possible that WMF has already looked into this quietly, in which case I wouldn't know about it.) RW Dutton (talk) 14:29, 24 October 2023 (UTC)
? As I said above, most of N-Z will be in copyright in the UK and EU until 2028, and parts until 2035, with legal arguments possibly extending it more. The Foundation doesn't get involved with stuff like that; the DMCA means they're more protected if they don't touch it until someone sends a DMCA letter. But anything published before 1928 is out of copyright in the US. We could transcribe this on Wikisource, but it would be a lot of work. We can't upload it to Commons until it's out of copyright in the UK.--Prosfilaes (talk) 01:01, 31 October 2023 (UTC)

are language names proper nouns in Czech?

@Atitarev, Vininn126, TomášPolonec, Solvyn, Zhnka Normally I would say that language names are proper nouns, but in Czech they are lowercase, cf. etruština "Etruscan (language)", španělština "Spanish (language)". It's true that we don't always go by capitalization (e.g. we consider capitalized Czech nouns like Američan "American (person)" to be common nouns), but it seems strange to me that a lowercase term is considered a proper noun. Currently there is no consistency in whether the language names are assigned to common or proper nouns. Thoughts? Benwing2 (talk) 21:40, 7 May 2023 (UTC)

@Benwing2: Common noun (podstatné jméno) as is the case in the majority of languages. English is one of the exceptions here. Anatoli T. (обсудить/вклад) 22:16, 7 May 2023 (UTC)
Agreed. --TomášPolonec (talk) 06:32, 8 May 2023 (UTC)
@Benwing2, @TomášPolonec: Thanks. I noticed long ago that editors here at Wiktionary (not necessarily out there with other dictionaries) use(-d) proper nouns for language names, even when there is no capitalisation distinction in a given language (e.g. Chinese, Japanese, Korean, etc.). We had a few discussions before about demonyms being proper names more often than language names are; compare French Français vs français, but everything is lower case in Spanish, Portuguese, Romanian. In Slavic languages, West Slavs, Serbo-Croatian, Macedonian, Slovene capitalise demonyms but not language names; everything is lower case with East Slavs and Bulgarians. Everything is capitalised in Dutch, English and obviously German.
If I'm not mistaken, demonyms in West Slavic languages are considered (or should be) common nouns, even if they are capitalised. The Serbo-Croatian demonyms might need a clean-up or clarifications. Macedonian and Slovene demonyms are common nouns, despite the capitalisation. Anatoli T. (обсудить/вклад) 07:39, 8 May 2023 (UTC)
I don't think capitalization is everything - language names are proper nouns and people, while often capitalized, are common nouns. Vininn126 (talk) 07:40, 8 May 2023 (UTC)
@Vininn126 I can't speak for all languages, but in Slovak only proper nouns are capitalized (excluding capitalization of pronouns in letters and other things) and there are more languages that do this. In Slovak, language names are common nouns and names for members of nations or nationalities are proper nouns (according to the official guidelines). --TomášPolonec (talk) 09:50, 8 May 2023 (UTC)
Sure but that doesn't hold true for all languages, i.e. in Polish. Relying on it to determine the type of noun isn't... reliable. Vininn126 (talk) 09:53, 8 May 2023 (UTC)
Logically speaking, IMO language names should be proper nouns because they refer to unique entities (there's generally only one language of that name) while demonyms should be common nouns (because they refer to a class of entities of a given type). I think that is what User:Vininn126 is getting at. However, it does seem like different languages view these choices differently. Benwing2 (talk) 10:19, 8 May 2023 (UTC)
@Vininn126 Exactly, that's what I was trying to say, that different languages handle these things differently, so generalization isn't possible :) I think that words should be categorized according to the function in their respective languages, so e.g. the Slovak word Maďar as a proper noun, while maďarčina as a common noun, while the opposite way in Polish. --TomášPolonec (talk) 11:53, 8 May 2023 (UTC)
  • What would be the implication in terms of syntax (or anything else) of declaring something to be a proper noun rather than a common one? Do normal users care? I suspect that it is a matter of custom, which may differ among linguistic communities.
    In Latin, the various gentes (tribal/family names, which seem like demonyms to me) are called proper nouns, taxa are considered proper nouns (though frequently used metonymically to refer to constituent taxa and even specimens). DCDuring (talk) 16:44, 8 May 2023 (UTC)
    One implication for syntax is in the marking of definiteness, e.g. in the use of articles. In English and French, the rules for definiteness don't correspond with capitalisation, and one can get some complications, with horrible effects on finding things in an electronic dictionary. For example, at Cambridge, England, Fitz means 'Fitzwilliam College', whereas the Fitz means 'the Fitzwilliam Museum', at the opposite end of the city. In English, the spelling rules for pluralisation also depend on 'proper noun' v. 'noun' - Cy+s > -Cies does not apply for proper nouns! --RichardW57m (talk) 12:13, 9 May 2023 (UTC)
    This is not applicable to non-southern Slavic languages, since they do not mark definiteness. Vininn126 (talk) 12:15, 9 May 2023 (UTC)
    @Vininn126: A Polish equivalent of 'a' is reported to be on the rise - https://www.ejournals.eu/pliki/art/11902/&usg=AOvVaw0WNuBoIMpdrcoF71IrLV72 --RichardW57m (talk) 13:06, 9 May 2023 (UTC)
    Eh, that's very, very nonstandard and also beside the point. Vininn126 (talk) 13:08, 9 May 2023 (UTC)
    @RichardW57m: And how much does the application of the label 'proper noun' simplify the explanation for English, for other languages? DCDuring (talk) 14:04, 9 May 2023 (UTC)
    @DCDuring: Sorry, please rephrase your question. I can interpret it too many ways. --RichardW57m (talk) 15:52, 9 May 2023 (UTC)
    How much does a good explanation of the behavior of names of specific things depend on those names being labeled differently from common nouns? DCDuring (talk) 16:00, 9 May 2023 (UTC)

Languages present as both etymology-only languages and full languages

Module:data consistency check finds that several languages are listed as both etymology-only and full languages:

  • Bashkardi language (bsg-bas) has a canonical name that is not unique; it is also used by the code bsg.
  • Khasa Prakrit language (inc-khs) has a canonical name that is not unique; it is also used by the code inc-kha.
  • Ardhamagadhi Prakrit language (inc-pka) has a canonical name that is not unique; it is also used by the code pka.
  • Magadhi Prakrit language (inc-pmg) has a canonical name that is not unique; it is also used by the code inc-mgd.
  • Maharastri Prakrit language (inc-pmh) has a canonical name that is not unique; it is also used by the code pmh.
  • Sauraseni Prakrit language (inc-pse) has a canonical name that is not unique; it is also used by the code psu.
  • Paisaci Prakrit language (inc-psi) has a canonical name that is not unique; it is also used by the code inc-psc.
  • Rudbari language (rdb-rud) has a canonical name that is not unique; it is also used by the code rdb.
  • Chali language (tks-cal) has a canonical name that is not unique; it is also used by the code tgf.

Let's decide for each whether it should be a full language or etymology-only; it can't be both. - -sche (discuss) 01:07, 8 May 2023 (UTC)

@-sche The intention of the South Asian editors, established by consensus maybe a year or so ago, was to switch the Prakrit languages to be etymology variants of a single Prakrit language. However it seems this is a long process and hasn't been finished. Benwing2 (talk) 08:39, 8 May 2023 (UTC)
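
The kind of check reported above (one canonical name claimed by two codes) can be sketched in a few lines. This is not the code of Module:data consistency check; it is a hedged illustration in which the function name is hypothetical and the split of codes between the two tables is an assumption made only for the example.

```python
from collections import defaultdict


def duplicate_canonical_names(*tables):
    """Map each canonical name used by more than one code to its sorted codes.

    Each table is a dict of language code -> canonical name.
    """
    codes_by_name = defaultdict(list)
    for table in tables:
        for code, name in table.items():
            codes_by_name[name].append(code)
    return {name: sorted(codes)
            for name, codes in codes_by_name.items() if len(codes) > 1}


# Assumed split for illustration: which code sits in which table is a guess.
full_languages = {"bsg": "Bashkardi", "rdb": "Rudbari", "en": "English"}
etymology_only = {"bsg-bas": "Bashkardi", "rdb-rud": "Rudbari"}

for name, codes in duplicate_canonical_names(full_languages, etymology_only).items():
    print(f"{name}: canonical name shared by {', '.join(codes)}")
```

Each name the sketch reports corresponds to one decision requested above: keep the full-language code or the etymology-only code, not both.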

Toxic, aggressive, unfriendly policy from the admin Vininn126

Everything is in my opinion: This admin has been hounding me around the project for a long time. In the discord on the Wiktionary server, he constantly insulted me, including on the basis of nationality and nation, mockery and aggression. He constantly threatens to block me for nothing, thereby provoking mental discomfort. His messages contain trolling. This admin is engaged in substitution and psychological manipulation. (UCoC 3.1 & 3.2) I ask this admin to stop these actions and unacceptable behaviour. Gnosandes ❀ (talk) 17:28, 10 May 2023 (UTC)

Vininn has never insulted you on the basis of nationality and nation, that's an outright lie. He has also not done anything unacceptable or not in line with UCoC, nor has he trolled you (in fact, the other way around did happen multiple times). This is a bullshit tactic to distract everyone from the fact that you have been going around using Wiktionary as a website to host your conlang-like reconstructions that ignore scholarly consensus.
If you feel "mental discomfort" by an admin patrolling you after several blocks, I suggest you find another pastime than editing Wiktionary. Thadh (talk) 17:59, 10 May 2023 (UTC)
I have explained many times why you were blocked; it's not for nothing, that's simply not true. I told you that if you did x, i.e. kept failing to leave sources, you would be blocked, or blocked for edit warring, which is standard practice. "For nothing" is manipulation and simply not true. I have no idea what trolling you are referring to. I have explained everything to you very clearly and stuck to direct communication and action, which is the opposite of manipulation. Vininn126 (talk) 18:15, 10 May 2023 (UTC)
Everything is in my opinion: Besides, they do it all the time as a group (Thadh + Vininn126), it will be noticeable from history. They are conducting an aggressive, toxic policy against me. Completely wrong. This is not a tactic and I do not use Wiktionary as a website to host reconstructions like conlang. It is not written in UCoC that I should find another pastime, obviously I reject the admin's offer because UCoC is above him. Gnosandes ❀ (talk) 18:38, 10 May 2023 (UTC)
It is completely unclear why Thadh and Vininn126 got into my cute and cozy dialogue with Synotia. In my opinion, this is a special kind of provocation. The admin Thadh was talking about some kind of my model, although I don't have any model. Because there are developments by Starostin and Nikolaev, as well as Schrijver and Nichols, I only follow the suggestion that Nichols considers the Nakh like Finnic. Gnosandes ❀ (talk) 18:47, 10 May 2023 (UTC)
You could've asked for a clarification but instead you're doing this. By "model" I mean a defined phonology and morphology of the protolanguage and its relation to the descendants. Thadh (talk) 18:59, 10 May 2023 (UTC)
I'm complaining about more than that. I don't bother you with strange questions about the Ingrian language. I only saw the typological form in this language and made an edit, but you canceled it. Gnosandes ❀ (talk) 19:11, 10 May 2023 (UTC)
Yes, but I'm not the problematic user who has been blocked for not giving sources before, you are. You are also not the admin whose job it is to ensure that our dictionary is up and running with verifiable, quality information; I am. Thadh (talk) 19:13, 10 May 2023 (UTC)
The explanation is that these account blockages were made for nothing. I think that's not why you became an administrator, if you ensured something in the dictionary, you would monitor the entire project, but this is not the case. You'd probably be better off pointing at the noses of various users and telling them where their noses don't need to be poked (from your comment). Or something similar. Gnosandes ❀ (talk) 19:32, 10 May 2023 (UTC)
If you think edit warring and the rest are nothing, then I think you have a core disagreement with the project, not individual editors. Vininn126 (talk) 19:33, 10 May 2023 (UTC)
Stop making up stuff for me. I didn't say that, but I notice that this isn't the first time you've done it. Gnosandes ❀ (talk) 19:36, 10 May 2023 (UTC)
? I stated my reasons above. Are you ignoring them saying they are nothing? Vininn126 (talk) 19:37, 10 May 2023 (UTC)
You've come up with something again. Perhaps I'll wait for another administrator who will sort everything out in detail and write. I have complaints about you... and admin Thadh above.
By the way, the administrator Thadh can reconstruct accents in Old East Slavic without sources, but he forbade me to do it and also pointed out after that that he didn't care about it, but then he came back to it. Gnosandes ❀ (talk) 19:47, 10 May 2023 (UTC)
I never forbade you to reconstruct the OES accent, and I removed the accent that turned out to be incorrectly reconstructed by me. You can't just make stuff up, then say that every single admin that disagrees with you doesn't suit you, and say that the justified continuation of monitoring your edits is "harassment" and "hounding".
And I do also monitor other editors. You're not special. Thadh (talk) 20:55, 10 May 2023 (UTC)
You should also know that many sources provide libraries, only through universities or the like, and many people will not be able to verify the information because you use a paid, closed source. Gnosandes ❀ (talk) 19:34, 10 May 2023 (UTC)
That doesn't matter. You should still add these sources. Thadh (talk) 20:56, 10 May 2023 (UTC)

Surname clogging

I stumbled upon this category and it's overclogged by just surnames. Can't we make separate categories for surnames? Synotia (talk) 14:49, 11 May 2023 (UTC)

Personal and place names, demonyms and similar stuff. Similarly with “request for etymology” entries. Even when you go to Category:Requests for etymologies in Latin entries it is full of this stuff, but the investigations of these terms are unlike. Fay Freak (talk) 15:19, 11 May 2023 (UTC)
I don't understand what you mean? Synotia (talk) 15:51, 11 May 2023 (UTC)
I have not answered your question either, but pointed out that the category separation would have to be broader. Fay Freak (talk) 16:43, 11 May 2023 (UTC)
Create a standard category, "X proper nouns derived from Y" below "X terms derived from Y". Add an argument to {{bor}} and {{der}} (but not {{inh}}) to indicate that the derived term is a proper noun. If set, the derived term moves to the lower category. Have a bot set this argument when all definitions under an etymology are for proper nouns. Vox Sciurorum (talk) 14:39, 13 May 2023 (UTC)
The nature of Wiktionary's stateless parsing makes it impossible to solve this in any "nice" way. However, you can use Petscan to solve this type of query: . This, that and the other (talk) 06:24, 12 May 2023 (UTC)
Not everyone is going to use this tool... --Synotia (talk) 07:58, 12 May 2023 (UTC)
Potentially {{bor}} and the like could check the page contents to see if it's a proper name and put it in a different category if so. I have generally avoided doing this sort of stuff for fear of hitting server limits but User:Theknightwho seems to think it's a viable approach in many circumstances; can you comment? Benwing2 (talk) 22:55, 16 May 2023 (UTC)
@Benwing2: Transliteration category is better than borrowed terms category, IMO.
Please consider an example like Котрика́дзе (Kotrikádzɛ) - the first definition line {{name translit|ru|ka|კოტრიკაძე|type=surname}} mentions the origin, the second adds to the category {{surname|ru|from=Georgian|xlit=Kotrikadze}} Category:Russian terms derived from Georgian. It doesn't add to Category:Russian terms borrowed from Georgian, which is good but ideally, it should add to Category:Russian transliterations of Georgian terms
I think these two methods can be merged into one but I don't know how. It's a bit of a challenge. Anatoli T. (обсудить/вклад) 23:08, 16 May 2023 (UTC)
@Atitarev The main issue I have found in this case is the criterion for distinguishing name translits from borrowed names. I am not really sure where one stops and the other starts. Potentially any name from language or culture X can also be a name in language or culture Y if a person with that name from X moves to Y. E.g. could Shevardnadze be an English-language surname if a Georgian with that name moves to the US and has children? What about a Slavic given name like Igor? Benwing2 (talk) 23:19, 16 May 2023 (UTC)
@Benwing2: I think both Shevardnadze and Igor can fall into transliteration categories without clogging the borrowed term categories. You see, the entry Котрика́дзе (Kotrikádzɛ) says "from Georgian". I forgot to mention it was added to Category:ru:Georgian surnames as well, they are subcategories of Category:ru:Foreign personal names and deeper under Category:ru:Names. Category:Russian surnames from Georgian is also under Category:Russian surnames. So I don't see anything missing or redundant but I had to use two definition lines.
I normally don't add personal names into borrowed term categories. For countries or other place names only when the case is less than obvious, e.g. Khmer ប៉ូឡូញ (poulouñ, Poland) is from French. Anatoli T. (обсудить/вклад) 23:30, 16 May 2023 (UTC)
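
The categorisation options raised in this thread can be summarised as a small decision function. This is only an illustrative Python sketch, not how {{bor}}, {{der}} or {{name translit}} actually work; the function name, the `kind` labels and the `is_proper_noun` flag are hypothetical, and "X proper nouns derived from Y" is the category proposed above rather than an existing one.

```python
def etymology_category(lang, source, kind, is_proper_noun=False):
    """Return a category name for a term in `lang` that comes from `source`.

    kind: "translit" for name transliterations, "bor" for borrowings,
          "der" for other derivations (hypothetical labels).
    """
    if kind == "translit":
        # Transliterated names stay out of the borrowed-terms category.
        return f"{lang} transliterations of {source} terms"
    if kind not in ("bor", "der"):
        raise ValueError(f"unknown kind: {kind!r}")
    if is_proper_noun:
        # Category proposed in this thread (assumption, may not exist).
        return f"{lang} proper nouns derived from {source}"
    if kind == "bor":
        return f"{lang} terms borrowed from {source}"
    return f"{lang} terms derived from {source}"


print(etymology_category("Russian", "Georgian", "translit"))
```

The open question from the discussion, where a transliteration stops and a borrowed name starts, is left to whoever sets `kind`; the sketch only shows how the categories would diverge once that decision is made.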

Automatic citations based on ISBN are broken

Apologies if this message does not reach you in your favorite language. You can help translate it centrally at Meta. Thanks for your help.

We have recently become unable to access the WorldCat API which provided the ability to generate citations using ISBN numbers. The Wikimedia Foundation's Editing team is investigating several options to restore the functionality, but will need to disable ISBN citation generation for now.

This affects citations made with the VisualEditor Automatic tab, and the use of the citoid API in gadgets and user scripts, such as the autofill button on refToolbar. Please note that all the other automatic ways of generating citations, including via URL or DOI, are still available.

You can keep updated on the situation via Phabricator, or by reading the next issues of m:Tech News. If you know of any users or groups who rely heavily on this feature (for instance, someone who has an upcoming editathon), I'd appreciate it if you shared this update with them.

Elitre (WMF), on behalf of the Editing team.

MediaWiki message delivery (talk) 19:45, 11 May 2023 (UTC)

Kumzari language: Ancestry and script

Please specify Arabic script as the script for Category:Kumzari language, and Middle Persian as the immediate ancestor. Thanks!--Saranamd (talk) 13:06, 14 May 2023 (UTC)

@Saranamd: Done.
Shouldn't the diacritics be removed from the entry name? — Fenakhay (حيطي · مساهماتي) 13:24, 14 May 2023 (UTC)
@Fenakhay Kumzari is not traditionally a written language. The speaker community (with the help of field linguist Erik Anonby) devised a variant of the Arabic script in 2009 and is trying to implement this in primary education. Most young children are monolingual in Kumzari, so there is apparently real demand for this.
Unfortunately, the 2015 dissertation written by Anonby's wife, the only comprehensive source on the language I could find, does not discuss the script other than giving a table of one-to-one correspondences between IPA and the alphabet. According to this table, the short vowels (and for some reason the long vowel /oː/) are supposed to be marked with diacritics. But there's no information on whether these are obligatory or not, so I decided to play it safe. There's precedent for languages with Arabic script having obligatory vowel diacritics, like Kashmiri.
The Anonbys are apparently currently working on a dictionary of Kumzari that includes Arabic script, where hopefully these issues (and some others) will be clarified. I'm currently slightly confused by the alphabet table as given, since it would suggest that e.g. the pronoun /toː/ is supposed to be written as تُ which seems a bit unnatural.--Saranamd (talk) 13:40, 14 May 2023 (UTC)
@Saranamd: That seems odd indeed. Better than nothing I guess. There's one Kumzari entry hau in Latin script, it should be moved to آو (āw), I guess? — Fenakhay (حيطي · مساهماتي) 14:27, 14 May 2023 (UTC)

Slavic Headwords

@Benwing2, @Sławobóg, @Stujul and maybe... @Erutuon? Slavic headwords are way too crowded. I think we should remove a few things, namely genitive singulars and nominative plurals. We have declension tables for that - it's useful for low-morphology languages like English but not for languages like Slavic. I think we should stick to augmentatives, diminutives, as well as masculine, feminine, and neuter equivalents for nouns, comparatives and adverbs for adjectives, and aspectual variations for verbs.

On aspects, we might want some templates like {{iterative of}} with things like {{distributive of}} and other similar ones, Polish has the prefixes po- to mean both "one after the other" as well as "for a while", and na- to mean "a lot of". But the headwords are more important. Vininn126 (talk) 12:55, 15 May 2023 (UTC)

Would note that this is not solely a Slavic headword problem. Take a look at Hund#German. By no means are all of these baseless, but... "weiblicher Hund"? "männlicher Hund"? Really? Hythonia (talk) 13:11, 15 May 2023 (UTC)
@Vininn126 I would also disagree that this is just a Slavic thing. Either genitive singular or nominative plural is also provided for Latin, Ancient Greek or Arabic, which are by no means low-morphology languages like English. These are important basic forms commonly found in dictionaries, which can be used to decline the word even without the declension table. While I agree that we should keep it simple, having two or three forms for most nouns is by no means "crowded" in my opinion. But if any form were to go, I would definitely go with nominative plural. --TomášPolonec (talk) 13:39, 15 May 2023 (UTC)
@TomášPolonec I limited this to Slavic languages because it's easier to make smaller changes than one giant one. Vininn126 (talk) 13:44, 15 May 2023 (UTC)
@Vininn126 I agree. But I would still let the genitive be. For example when I am looking up a Latin verb or a noun, I usually find all the information I need in the headword without having to look through the table of forms. I can imagine the same could apply for someone looking up a Slavic word. It's like having an infobox on Wikipedia - all the information is in the article anyway, but if you only need a specific basic piece of information or a quick overview, infoboxes are really helpful. I have nothing against removing the nominative plural, though. --TomášPolonec (talk) 14:50, 15 May 2023 (UTC)
@TomášPolonec Keeping the genitive only makes some sense for some languages - for many the genitive is completely irregular while the dative is more unpredictable etc, etc, which is why it makes more sense to just have it in the declension table. If there's no need for a table, fine. Vininn126 (talk) 14:58, 15 May 2023 (UTC)
Regarding "weiblicher Hund", mentioned above: adding that kind of thing is someone's (bad) pet project; I've removed it from an entry or two, but haven't felt like removing it systematically, though it'd be good to do so — the shoehorning-in of lots of rare nonstandard alternatives to the standard diminutives/inflections, and lots of etymologically-unrelated hyponyms, is undesirable. I don't edit Slavic as much, so the editors above are better positioned to know exactly which forms to keep vs drop, but I support the general principle that if a headword for a short word (i.e. Hund as opposed to the full name of titin) is more than 2 lines long on desktop, it probably has too much stuff in it, and we should migrate some elsewhere. Hund was four lines for me on desktop and ten lines on mobile until I cut some but not yet all of the stuff that shouldn't be on the headword line out of it just now. If there are so many 'form-slots' that the headword "line" actually takes up multiple lines, and some of them are predictable and/or low-importance, move those to inflection tables. If there's a standard form (e.g. standard past of laugh is laughed), generally don't give a long list of rare obsolete dialectal alternatives like low prominence on the headword line, put them elsewhere. If something has low connection to the entry, like etymologically-unrelated words for hyponyms, that doesn't go on the headword line. - -sche (discuss) 03:34, 16 May 2023 (UTC)
@Vininn126, TomášPolonec, -sche I admit to having added the genitive plural to Russian nouns, on the grounds that for some nouns the genitive plural is unpredictable (e.g. with some feminines and neuters it's not clear whether they are reducible or not, which shows up only in the genitive plural, and for nouns of accent pattern e, this doesn't show up in the nom sg, gen sg or nom pl). Since all the inflected forms (gen sg, nom pl and gen pl) are auto-generated for Russian (as well as e.g. for Ukrainian and Belarusian), we could make the code smarter so it only lists the nom pl and gen pl if they are in some way unpredictable; this should reduce some of the chaff. I also agree about not adding low-frequency or etymologically unrelated derivational terms to the headword as with Hund (btw I think it was User:Catonif who asked me to remove some low-frequency forms from Italian headwords; apologies for not getting to this yet, I got hit by Covid right when I returned from vacation). I was actually thinking of implementing the Russian/Ukrainian/Belarusian-style headword for Czech, where the headword just copies the inflection line and auto-generates the relevant forms rather than expressing the same information in a different format; this might be a good place to start implementing "smart" auto-generation of inflections. Benwing2 (talk) 22:51, 16 May 2023 (UTC)
@Benwing2 If it's only listed when unpredictable, I'd be much happier with that. I'm glad we all seem to agree that we should remove chaff such as low-frequency or etymologically unrelated terms. Vininn126 (talk) 22:54, 16 May 2023 (UTC)
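
The "only list a form when it is unpredictable" idea could look roughly like the following. This is a minimal Python sketch, not the actual headword module code; the function name, the slot names (`gen_sg`, `nom_pl`, `gen_pl`), the placeholder forms and the choice to always show the genitive singular are all assumptions made for illustration.

```python
def headword_forms(actual, predicted, always_show=("gen_sg",)):
    """Pick which inflected forms to show on the headword line.

    actual:    slot -> form generated from the full declension
    predicted: slot -> form a reader would expect by default
    A slot is shown if it is in `always_show` or if the actual form
    differs from the predicted one (i.e. it is unpredictable).
    """
    shown = {}
    for slot, form in actual.items():
        if slot in always_show or predicted.get(slot) != form:
            shown[slot] = form
    return shown


# Placeholder forms, not real words: only the genitive plural is irregular.
actual = {"gen_sg": "X-a", "nom_pl": "X-y", "gen_pl": "X-ok"}
predicted = {"gen_sg": "X-a", "nom_pl": "X-y", "gen_pl": "X-k"}
print(headword_forms(actual, predicted))
```

With these inputs the nominative plural is dropped from the headword line because it matches the prediction, while the genitive plural is kept. The hard part, deciding what counts as the predicted form for a given language, is outside the sketch.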

Blocked by Surjection

I was never warned and I am blocked over making only 1 revert and I already promised not to revert, but still this punitive block is being kept by the blocking admin (who himself said "Blocks aren't punitive, they're preventative") even after long discussion about it. Editorkamran (talk) 21:24, 15 May 2023 (UTC)

I don't see anything out of the usual in this block. You were reverted for removing a sense out of process, instead of discussing this edit you went on to revert it again. Your block will end in two days and it's limited only to this one page which is apparently very controversial. Honestly, we should look into protecting the page, as it seems a frequent target of vandalism/content removals. Thadh (talk) 21:54, 15 May 2023 (UTC)
Thadh A single revert is obviously not a blockable offense. I never even had a warning. See Wiktionary:Blocking_policy#Explanation. This block violates the policy that I just read. Editorkamran (talk) 22:00, 15 May 2023 (UTC)
It's the act of removing it out of process and then undoing the edit that forces you to try and remove it out of process. Vininn126 (talk) 22:02, 15 May 2023 (UTC)
You're acting as if anything on that page proves your point, whereas it's a guideline that hedges everything it says and makes no absolute statements. In this case, Surjection decided that it was more important to protect the page's contents than to potentially slightly inconvenience you for three days - I can't say I disagree. Thadh (talk) 22:28, 15 May 2023 (UTC)
Let's say you are correct, but how would I know that? I was never even told that a single "out of process" revert can lead to a block. A warning was essential, but I had none on my talk page. There is nothing which justifies this block. And when I said I won't revert again, the block is supposed to be removed because "Blocks aren't punitive, they're preventative". So why am I still blocked? Editorkamran (talk) 22:31, 15 May 2023 (UTC)
Lack of knowledge of a law does not excuse one from breaking the law. Vininn126 (talk) 22:34, 15 May 2023 (UTC)
You are on Wiktionary, not in the real world. See the difference. Wiktionary blocks are explained by Wiktionary:Blocking_policy, which is contrary to every single message you have made here so far. If policies cannot be applied, then why are they even kept here? Editorkamran (talk) 22:41, 15 May 2023 (UTC)
@Editorkamran We have no reason to trust you that you won’t do it again. In fact, saying that it was “supposed to be removed” after you said you won’t do it again strongly suggests you only said it because you wanted to be unblocked. Theknightwho (talk) 22:38, 15 May 2023 (UTC)
You will have "no reason to trust" only if you have evidence for it. You are supposed to assume good faith, otherwise why do we even have a provision for seeking unblocks here? Editorkamran (talk) 22:41, 15 May 2023 (UTC)
@Editorkamran You gave me reason to doubt it when you acted like saying you wouldn’t do it again meant you were entitled to be unblocked, because it strongly suggests you had an ulterior motive for saying it. That is compounded by the fact that the inconvenience to you is pretty small, which makes your reaction seem completely disproportionate. Theknightwho (talk) 23:07, 15 May 2023 (UTC)
Thanks for the explanation, but I don't have any ulterior motive. Editorkamran (talk) 23:37, 15 May 2023 (UTC)
When an admin reverts an edit the edit summary normally includes "If you think this rollback is in error, please leave a message on my talk page." Vox Sciurorum (talk) 22:48, 15 May 2023 (UTC)
Wasn't it supposed to be something else more accurate? Like: "If you want to revert this rollback, then first gain consensus or be blocked." Editorkamran (talk) 22:59, 15 May 2023 (UTC)
Since the block is only for one page and only for one day, I see no issue with it; you're saying you won't revert on that page again, so ... why do you need to be able to edit it immediately? Frankly, the more you make a mountain out of this, the better an idea the block seems to me.
With that said, I think we should file a Phabricator request to rename page "blocks" to something else, e.g. "restrictions", because both editors who receive them and other people who see "Editor So-and-so was blocked" do often react as if they're, well, blocks — as if the user has been banned from editing the project as a whole, for the sort of pervasive harm only that step can stop — and that seems to increase the frequency with which people have this "why was I BLOCKED?!?!" reaction (also on Wikipedia), as if they don't realize they've merely been restricted from warring over a single page and can just edit literally any other entry. In this case, the "block" from one page for one day is a far less restrictive move than semi-protecting the page for a month while the RFV proceeds, which is what would have occurred to me to do (and what I've done now), but people react more strongly to "blocks". - -sche (discuss) 03:49, 16 May 2023 (UTC)
I agree completely with - -sche. (Would you believe almost completely: apparently the "block" was for 3 days?). —DIV (1.145.63.208 14:05, 25 May 2023 (UTC))
The fact that this was posted here made me look closer, and this user was blocked on the English Wikipedia for edit warring, so there is literally no excuse. — SURJECTION / T / C / L / 06:30, 16 May 2023 (UTC)

To the removal of the Proto-Northeast Caucasian and Proto-North Caucasian

  1. Proto-North Caucasian. I think we should have removed everything related to Proto-North Caucasian, everything that it contains in itself and Proto-North Caucasian itself. Simply on the grounds that this is not a family, but a superfamily, which, moreover, is not yet a proven superfamily.
  2. Proto-Northeast Caucasian. There are still no good reconstructions about the Proto-Northeast Caucasian. Perhaps the only revision of the reconstruction of Starostin and Nikolaev (1994) was the work of Nichols (2003). However, she uses the # sign, which stands for pseudo-reconstructions, which was introduced by Williams (1989). Here everything (Appendix:Proto-Nakh-Daghestanian reconstructions) is incorrectly indicated by an asterisk, this can be misleading. This family has not been proven.
  3. Proto-Northeast Caucasian splits into two branches, Proto-Nakh and Proto-Daghestanian. Proto-Nakh is quite well reconstructed, which cannot be said, of course, of Proto-Daghestanian. For some reasons, Proto-Daghestanian can be left and not deleted, although it has also not been proven. Recently, though, for example in the work of Schrijver (2021), Proto-Nakh is compared with Proto-Tsezian and Proto-Avaro-Andian. However, Proto-Avaro-Andian is not worked out properly. And besides, the Proto-Tsezian forms are adjusted to the data of Proto-Avaro-Andian.
  4. In accordance with paragraph 3, the modern comparison is not with the Proto-Nakh and Proto-Daghestanian, but directly with the Daghestanian groups (Proto-Tsezian and Proto-Avaro-Andian). Which makes one doubt the existence of Proto-Daghestanian.
  5. Proposed trees:
  • Daghestanian dgn 🌳 dgn-pro
    • Avaro-Andian dgn-ava 🌳 dgn-ava-pro
    • Dargic dgn-drg 🌳 dgn-drg-pro
    • Khinalug kjj
    • Lak lbe
    • Lezgic dgn-lzg 🌳 dgn-lzg-pro
    • Tsezic dgn-tsz 🌳 dgn-tsz-pro
  • Nakh nkh 🌳 nkh-pro
    Support Proto-North Caucasian.
    Oppose Proto-Northeast Caucasian's deletion as a code - you can just not link and use {{inh|LANG|cau-nec-pro|-}} when words are undoubtedly related. I haven't ever heard anyone doubt the validity of Northeast Caucasian as a language family before, and its exact cladification is under serious debate as far as I know, so simply splitting into "Nakh" and "not-Nakh" is not a good idea here. Thadh (talk) 13:50, 16 May 2023 (UTC)
    @Thadh: Okay, but we don't have high-quality reconstructions, even of Dagestanian groups. Perhaps only Proto-Tsezic. And it turns out that this is mostly a philosophical(?) classification, not a linguistic one. ɶLerman (talk) 14:59, 16 May 2023 (UTC)
    @ɶLerman Gnosandes, I seriously suggest you stop sockpuppeting. You posted this argument on my talkpage a few days ago. Pinging @Thadh @Surjection @Chuck Entz Theknightwho (talk) 21:30, 16 May 2023 (UTC)
    I don't see any issue with a person using an alt account unless he is blocked. Thadh (talk) 21:31, 16 May 2023 (UTC)
    @Thadh This appears to be an attempt to build false consensus. Theknightwho (talk) 23:08, 16 May 2023 (UTC)
    @Theknightwho: @Thadh: No, I created this account a month ago when I was blocked by Vininn126, but I didn't edit through it. That account, in some way, is strongly connected with accentology, besides, V. Dybo died 10 days ago, and for some reason, I don't want to go back to that account anymore.
    Further, since you decided to ignore the suggestions, I published it not on your discussion page, but here. ɶLerman (talk) 13:31, 17 May 2023 (UTC)

Retiring the Rhymes namespace

@Fenakhay @Mahagaja @Theknightwho @Thadh I believe it's time to retire the rhymes namespace. There is nothing they can do that categories can't do better - categories can automatically update as well as keep the same information you put into the rhymes namespace. I propose we find any special information in the rhymes namespace and move them to the appropriate category and permanently retire the namespace (I say this as someone who essentially singlehandedly built the Polish Rhyme namespace - hundreds of my pages would be deleted but I see no point in keeping something I have to update manually). Vininn126 (talk) 21:25, 16 May 2023 (UTC)

I was under the impression that this was already the idea when we updated the rhyme template a while ago. Thadh (talk) 21:27, 16 May 2023 (UTC)
Also @Surjection, Equinox might be interested in this. Thadh (talk) 21:28, 16 May 2023 (UTC)
I see absolutely no need for the namespace if {{rhymes}} automatically links to the category. Vininn126 (talk) 21:30, 16 May 2023 (UTC)
I have just a couple of questions:
Would it be possible to still put notes like
  1. In non-rhotic accents, words ending in /-æmə(r)/ are rhymes for words on this page.
if the pages are moved to categories? We would need to manually type that text onto each page.
Also, would it be possible for categories to allow navigation, as we currently have at pages like Rhymes:English/əʊp...? One benefit of the current setup is that you can start with the accented syllable and see what words lie ahead instead of having to look up whole words at a time. In theory, the current setup could also allow navigation in the opposite direction, but it isn't well developed. Soap 21:41, 16 May 2023 (UTC)
@Soap The template {{rhymes}} automatically adds pages to the category and also a sub category if a parameter {{{s}}} is given, i.e. domawiać generates Category:Rhymes:Polish/avjat͡ɕ/3_syllables and Category:Rhymes:Polish/avjat͡ɕ Vininn126 (talk) 21:44, 16 May 2023 (UTC)
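
The category naming described in this comment can be illustrated with a short Python sketch. It is not the code behind {{rhymes}}; the function name is hypothetical, and the singular "1 syllable" is an assumption. MediaWiki treats spaces and underscores in page titles as equivalent, so "3 syllables" below is the same title as the "3_syllables" form quoted above.

```python
def rhyme_categories(language, rhyme, syllables=None):
    """Return the category names a rhyme would be filed under.

    The main rhyme category is always returned; a syllable-count
    subcategory is added only when `syllables` is given.
    """
    cats = [f"Category:Rhymes:{language}/{rhyme}"]
    if syllables is not None:
        # Singular form for a count of 1 is an assumption.
        unit = "syllable" if syllables == 1 else "syllables"
        cats.append(f"Category:Rhymes:{language}/{rhyme}/{syllables} {unit}")
    return cats


for name in rhyme_categories("Polish", "avjat͡ɕ", 3):
    print(name)
```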
Okay thanks. It would be nice if we could have the convenience of the old system, where the reader could start with the accented syllable and see the possible rhymes that lie ahead, but I suppose that's not possible to do with categories that just show the whole rest of the word at once. But I suppose this is a work-saving measure and that the additional task of adding a rhyme every time someone adds a pronunciation to a word is just not worth the benefit of a more convenient rhyme finder. Thanks for answering my other question, though. Typing out the special notices doesn't seem like it would change from the way it is now. Soap 21:58, 16 May 2023 (UTC)
Category:Rhymes:Polish has that first feature. Vininn126 (talk) 22:04, 16 May 2023 (UTC)
But there's no way to start with the first syllable of the rhyme and then see what lies ahead. There's no parent category for avjat͡ɕ such as avj or even a, as there is now in English. This means a reader has to scroll through thousands of entries just to get to the next letter in the alphabet, and for the vast majority of us, typing in the URL bar isn't going to help either because the rhymes are written in IPA. The current setup is more convenient by far.
I'm not even against this change, because I know the current Rhymes pages are woefully empty, with probably more than half of the English entries not even listed, and that filling them all by hand would be a tremendous amount of work for relatively little gain. I don't expect anyone to do it, because I won't do it either. I just don't like the way this is being sold as a great big improvement when from what I can see we're sacrificing the convenience of having a navigation UI for the sake of having a more complete list of entries. Thanks, Soap 22:22, 16 May 2023 (UTC)
Would this be a possible compromise solution? Maintaining a sort of portal page so that readers aren't immediately dropped into the thousand-plus all-in-one rhymes category. We wouldn't need to manually update the portal page, since the phonemes of a language aren't going to change. This came up several months ago and I made a similar comment then. Soap 22:26, 16 May 2023 (UTC)
Yes, anything like that you can do in a category. That is why the namespace is essentially obsolete. Vininn126 (talk) 22:29, 16 May 2023 (UTC)
I am absolutely opposed to retiring the Rhymes: namespace, which is much more useful than the categories. For one thing, the categories cannot include red or orange links, which the Rhymes pages can. For another, a whole lot of people who add the {{rhymes}} template don't actually understand how rhymes work, so they add nonsense like {{rhymes|xx|gat}}, because they don't know (1) that the rhymes are supposed to be given in IPA, so they need to use the IPA ɡ rather than the normal g, and (2) that rhymes always start with a vowel, so a word whose final stressed syllable is /ɡat/ needs to be put in {{rhymes|xx|at}}. The other kind of mistake I see a lot in English entries is people creating separate rhymes for different accents of English, e.g. tagging a word like lot with both {{rhymes|en|ɒt}} for RP and {{rhymes|en|ɑt}} for GenAm. However, the rhymes are supposed to be pandialectal so that all the words that rhyme with each other are gathered together in one place with no duplication, so lot should only have {{rhymes|en|ɒt}}, and there shouldn't be either a page or a category for "ɑt". Now, these mistakes potentially exist both for Rhymes: pages and for rhymes categories, but the difference is that the badly named Rhymes: page usually doesn't get created in the first place and so just chills as a harmless red link. But the badly named rhymes category gets created by bot within a few hours. So while I don't have to worry about pages like Rhymes:English:ɑt or Rhymes:Whatever:gat, I do have to manually depopulate and then delete CAT:English rhymes/ɑt and CAT:Whatever rhymes/gat, which is a pain in the ass and a waste of my time. All this in addition to the points Soap brought up above about messages like "In non-rhotic accents, words ending in /-æmə(r)/ are rhymes for words on this page", which categories don't do.
I suppose I feel strongly about this because I've put a lot of work over the past two years or so into setting up and maintaining Rhymes: pages for Welsh and would feel like it was all in vain if all my hard work got deleted for the sake of some brainless categories. If we feel like the Rhymes: namespace and the rhymes categories are redundant to each other and we only need one, then it's the categories we should get rid of, not the Rhymes: namespace. —Mahāgaja · talk 07:30, 17 May 2023 (UTC)
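
The two mistakes described in this comment (ASCII g instead of IPA ɡ, and a rhyme that does not start with a vowel) are the kind of thing a simple check could flag before a badly named category gets created. The following Python sketch is only an illustration under stated assumptions: the function name is hypothetical, the vowel set is a small, incomplete sample, and nothing here reflects what {{rhymes}} currently does.

```python
ASCII_G = "g"        # U+0067, the ordinary keyboard letter
IPA_G = "\u0261"     # ɡ, the IPA symbol that should be used instead

# Deliberately small, incomplete sample of IPA vowel symbols (assumption).
VOWELS = set("aeiouyæɑɒɔəɛɪʊʌøœɨʉɯɤɐɜ")


def rhyme_problems(rhyme, require_vowel_onset=True):
    """Return a list of human-readable problems with a rhyme string."""
    problems = []
    if ASCII_G in rhyme:
        problems.append(f"uses ASCII g instead of IPA {IPA_G}")
    if require_vowel_onset and (not rhyme or rhyme[0] not in VOWELS):
        problems.append("does not start with a vowel")
    return problems


print(rhyme_problems("gat"))   # both problems are reported
print(rhyme_problems("at"))    # no problems
```

The `require_vowel_onset` switch is there because a vowel-initial rule may not suit every language, so the check would need to be configurable per language rather than hard-coded.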
Just putting it in there so that this isn't overlooked: Some languages do rhyme from a consonant up (I seem to remember something like that with some Austronesian language, Indonesian or Tagalog I think?). This is to say that we shouldn't rush to making the "starting with a vowel" rule obligatory for the template. Thadh (talk) 07:55, 17 May 2023 (UTC)
@Mahagaja
1) How are red/orange links better?
2) Not knowing how rhymes should be added is unrelated to them being in a namespace or category. If the red link sits there for a while or is created by a bot, in either case, 99% of the time it's unmoderated because people do not check that.
3) Adding multiple rhymes/navigation is very easily addable to the template. Vininn126 (talk) 10:25, 17 May 2023 (UTC)
@Vininn126: (1) Red and orange links are good because if someone is looking for a word that rhymes with another word, there's no reason their search should be limited to words that already have Wiktionary entries. With pages, rhyming words can be added even when no one has yet gotten around to creating the entry. (2) The difference is that badly named pages that stay red links don't have to be deleted, while badly named categories get created by a bot that doesn't realize they're badly named, so they have to be depopulated and deleted afterward. (3) I'm not sure what you mean, can you give me an example? —Mahāgaja · talk 10:46, 17 May 2023 (UTC)
2) Depopulated categories are still deleted anyway, no?
3) You could have the categories set up to also include navigation by first syllable, etc, much like how it is in most rhyme pages. Vininn126 (talk) 10:56, 17 May 2023 (UTC)
@Mahagaja It would be straightforward to make rhyme categorisation more intelligent - I’m not sure why you think it has to be “brainless”. Plus, we can add lots of things to category pages - just look at language categories like Category:English language. I also don’t really see why redlinks are important for rhymes in a way that doesn’t apply to anything else we use categorisation for: what you propose requires manually adding everything, which has led to many rhyme pages being neglected. What you suggest massively increases workload - which means in many cases it just won’t happen. Theknightwho (talk) 12:48, 17 May 2023 (UTC)
All the content at CAT:English language is added by {{also}} and {{auto cat}} as far as I can tell. I'd like to see a prototype of a rhymes category where {{auto cat}} or something like it can generate an output as detailed and flexible as the current Rhymes: pages, including things like links to categories that rhyme in some accents and not others. But even then, all that content would have to be generated by module rather than added manually, which means most editors won't be able to add anything, since most editors don't know how to code modules. I certainly never would have created the whole rhymes infrastructure for Welsh if I had had to do it in a module for {{auto cat}} or the like rather than doing it by hand in Rhymes: namespace. I wouldn't have had the remotest idea how to go about it.
Mahāgaja · talk 13:43, 17 May 2023 (UTC)
@Mahagaja As with most things in modules, there are layers: you don't need to know how to code Module:labels to add a new label to Module:labels/data. You're thinking in absolutes here. Theknightwho (talk) 13:46, 17 May 2023 (UTC)
I strongly support this approach. The issues Mahagaja raises are not insurmountable, and a lack of automation makes the task far harder. It would be like opposing declension modules because there are some dialectal differences. Theknightwho (talk) 07:32, 17 May 2023 (UTC)
I oppose retiring the namespace. In my opinion, Rhymes pages are superior to categories in terms of navigation and the ability to filter out incorrectly added entries. Also, as Mahagaja said, categories cannot include links to terms that don't exist yet. (Indeed, I see rhyme categories as somewhat unnecessary.) For example, I find the subpages of Rhymes:Hungarian (mostly created by User:Adam78 a few years ago) much more useful compared to the corresponding categories, and I don't think their functionality can be grafted into the Category namespace. Einstein2 (talk) 09:59, 17 May 2023 (UTC)
What evidence do you have that leads you to believe it cannot be transferred? If that were not the case, would your opinion change? Vininn126 (talk) 10:27, 17 May 2023 (UTC)
Yeah - this is what doesn’t make sense to me: users declaring that rhyme functionality can’t be automated, when correspondences between dialects are usually pretty regular. The entire point of IPA is that it allows us to keep track of these things. Theknightwho (talk) 12:54, 17 May 2023 (UTC)
I oppose it too, mostly per the reasons by Mahagaja and Einstein2 (thank you for the credit). Adam78 (talk) 10:16, 17 May 2023 (UTC)
I'm not entirely sure how I feel about this but am leaning towards opposing, due to points from Mahagaja for one thing. Also @Adam78, I took a quick glance through the Hungarian rhymes pages and I have to say I am quite impressed with the sheer amount of work you have put into this. :) Acolyte of Ice (talk) 12:22, 17 May 2023 (UTC)
If nothing else, this needs to be made language-specific. Nobody is going to maintain Rhymes:Finnish or its subpages. Life is too short for that. — SURJECTION / T / C / L / 14:09, 17 May 2023 (UTC)
This will likely require a formal vote. That being said, I have to agree with Einstein that the rhyme pages, when done well, are much easier to navigate than the categories (which is just by nature of how they work). Rhymes:French is so much better imho for someone who's new to Wiktionary or just a general reader (especially with examples!), than going through Category:Rhymes:French which to me is kind of a maze. And so, I'd likely oppose this change, unless the categories are first made more user-friendly and able to show charts & tables as the Rhymes namespace can. AG202 (talk) 14:59, 17 May 2023 (UTC)
Replying (very late) because I was pinged. I agree that theoretically the rhyme is a function of the pronunciation, and therefore should be automated as much as possible. (Plus, people often add wrong entries to rhymes pages, due to getting confused about the stress! That's annoying.) We can't just shut those pages off though, because we at least need to deal with the red links. But sure, it's like closing certain old Appendices. Equinox 07:34, 20 May 2023 (UTC)

Do we want to keep sub-entries that are effectively empty of content?

I've previously nominated a lot of articles for deletion, because all they contained were the Unicode definition of a character, and so were effectively empty of content. Having them as blue links in our lists made it look like someone had already covered them; restoring them as red links made it obvious that the work still had to be done (which in a few cases I did).

But what of equivalently trivial sections of an article? For example, for the character á U+00E1 LATIN SMALL LETTER A WITH ACUTE, we have a "Translingual" section with the definition "the letter a with an acute accent".

If that were the only content in the article, it would be deleted for having no content other than the Unicode definition, so should we also delete the "Translingual" section for being similarly trivial?

If we keep those sections, should we add similar "translingual" sections under e.g. aai that "aai" is a sequence of the letters a-a-i, since that trigraph is found in both Dutch and romanized Cantonese? Seems rather silly. kwami (talk) 22:00, 16 May 2023 (UTC)

It seems a bit silly to delete a definition just because it comes close to 'SoP', i.e. 'what it says on the tin'. (There are several Unicode misnomers.) In this particular case, the only real expansion that would be useful for the translingual meaning would be to list the languages where it is used but has semantics that are no more than the sum of the parts, e.g. Spanish, Dutch and Vietnamese. Now, the combination may be idiomatic in some languages, e.g. with slight idiomaticity in older orthographies where 'vowel plus length' doesn't completely nail down the significance, e.g. Icelandic. --RichardW57m (talk) 12:36, 23 May 2023 (UTC)
@Kwamikagami, please stop deleting definitions & marking things for deletion. You haven't even gone through the typical process of putting them for RFD. I've made it clear on my end that I don't think these should be deleted, and no one here has given any notice yet that they're fine with them being deleted either. CC: @-sche, @Theknightwho. AG202 (talk) 14:45, 11 June 2023 (UTC)
Deleting the "Unicode description only" entries (with no meanings, just a description like "letter A underlined") has been practice for quite a long time and I have been deleting such cases marked by kwami. Did not realise it was controversial and think it should be fine to delete. Equinox 14:49, 11 June 2023 (UTC)
If it's been practice for quite a long time, then the entries shouldn't have been created in the first place and there should be an explicit policy about them. Right now, it just looks like they're being deleted out of process, and it's even led to (presumably accidental) deletion of IPA information as well, as with gb. (Also just realized that I was reverted there as well, yet again.) It's also led to malformed entries as in Tlapanec (the usage notes section makes no sense for a single language). There should've been ample discussion reaching a consensus beforehand or at least an RFD discussion. AG202 (talk) 15:03, 11 June 2023 (UTC)
They also haven't been putting at the very least {{see also}} so that we can still get to some of the other related letter entries, because with these changes, we're losing sections that point us to related letters, like ọ being able to point us to other letters with an underdot. AG202 (talk) 15:09, 11 June 2023 (UTC)
If we include those who are visually impaired among the users we serve, then a verbal description of something essentially graphic seems essential. DCDuring (talk) 15:31, 11 June 2023 (UTC)
What do you mean? We're not a screen-reader program. Equinox 21:32, 11 June 2023 (UTC)
DCDuring, the "definitions" are just the Unicode names, and so don't add any info, since the Unicode name is already found in the character info box. In any case, such wording, if needed, should go under 'description' -- it's not a definition. kwami (talk) 22:32, 11 June 2023 (UTC)
I thought that there were or will be programs that allow the visually impaired to read content from broad classes of websites. Is that not true?
@Kwamikagami What is this thing called "description"? How would a visually impaired user new to Wiktionary know where to look to find info about a character once the page was found? If the user came from WP they would perhaps look for an infobox. We don't have them AFAICT. Wiktionary is usable, at least potentially, for the visually impaired with good consistency, especially within languages. Definitions are in relatively standard locations. Would it be a fun challenge for a visually impaired person to find something useful in the entry for á? How would a visually impaired person find the entry, you ask? I don't know, but perhaps by copying something that confused the user's reader program and then pasting into the search box, maybe. Perhaps not for á, but for some more confusing or unusual Unicode character. DCDuring (talk) 01:23, 12 June 2023 (UTC)
DCDuring, we provide definitions. Software for the blind can already read text characters, and will (presumably these days) see a snowman emoji and read it as "picture of a snowman" or something. That's orthogonal to writing definitions. Equinox 01:29, 12 June 2023 (UTC)
AG202, the 'latin script' template that these prolific spurious-article creators have automatically placed in a 'see also' section will show other letters with an underdot, so we're not losing anything. If the article is recreated, the template will have the same behaviour. Also, the 'also' template at the top of the article remains visible as the first line of what was deleted, so anything worthwhile can be recovered from that when the article is recreated as an actual article.
The usage note makes perfect sense under Tlapanec, because that's the only entry there with a casing distinction. (It was the only language I could find with a casing distinction. If there are others, that would be a welcome addition.)
BTW, the main reason I'm doing this is to improve coverage of these articles. In a sea of blue links, it's not apparent that many are not actually articles. In a sea of red links, we see exactly what we need to work on. If I created an entry for the English word "apple", stating only that it means "apple", I think it would be right for someone to delete it as empty of content. Similarly, if I create an article for k̩̂, defining it as "K with a circumflex and a stroke", that should be deleted. Some of the articles I tagged for deletion are exactly that, without even a Unicode info box. I've also been going slowly through and adding actual information to many of these articles, it's just a lot to work through. If you want to do the same, by all means add a language or two that use the letter and delete my deletion tag. kwami (talk) 21:45, 11 June 2023 (UTC)
Why wouldn't you add the Unicode infobox, instead of deleting? DCDuring (talk) 01:25, 12 June 2023 (UTC)
They are not "a word in a language". The ones being deleted are those that have no meaningful definition, but only a description (e.g. + meaning "addition" is okay, but é defined as "e with an acute accent" is not). We are not a Unicode database either, but a dictionary. Equinox 01:28, 12 June 2023 (UTC)
Why do we have so many character entries? They aren't words, are they? These kinds of pseudo-definitions are useful for the visually impaired and possibly for the typographically challenged as well. I, for one, often can't tell what diacritical marks a character has, nor what those marks are called, etc. I don't intend to make a study of them, but there are times when I'm using Wiktionary when I'd like to know what they were, what they did to a character, which languages use them etc. DCDuring (talk) 01:50, 12 June 2023 (UTC)
If we don't have an entry, and you click on a red link or enter the character in the search box, you'll see the character info box for that character, with the full Unicode info. That's a very handy feature that I use a lot. So deleting these "articles" doesn't remove any information. All the Unicode stuff is still there, but we're not falsely claiming to have an article on the letter. kwami (talk) 02:01, 12 June 2023 (UTC)
@DCDuring: because the infobox only works with single characters. It will generate an error with an entry like ⟨k̩̂⟩. Anyway, adding an infobox to a non-definition does not turn it into a definition -- I'm only requesting deletions where there is no content apart from the Unicode definition, or similar descriptions that don't provide any actual information (and in the case of emojis are often specific to a particular font, and so do not accurately describe the character).
All I'm asking for is the most minimal info, such as a language the letter occurs in. We have some entries for letters that do not occur in any language, that AFAICT are not used at all, but are only included in Unicode for technical or historical-compatibility reasons. So, just as a word requires evidence of use to be on Wikt, so an article for a letter should provide evidence of use. If no-one can find any evidence that the letter is used (except for mentions, same as words), then IMO it should be deleted. kwami (talk) 01:33, 12 June 2023 (UTC)
That seems like a defect in the design of the infobox. I think it is a definition for the visually impaired, whom I don't think we should ignore. DCDuring (talk) 01:42, 12 June 2023 (UTC)
Who's ignoring them? If you think infoboxes should cover compound characters or multigraphs, you can take it up on the template talk page, but that has nothing to do with deleting empty articles.
No, that's not a definition, it's a description. Anyway, the Unicode info is still available when the article is deleted (it's visible when you click on a red link), so the visually impaired aren't missing anything. An article on "Unicode character Q with an acute accent. Definition 1: Q with an acute accent" is a joke, and just makes us look lazy and stupid. kwami (talk) 01:46, 12 June 2023 (UTC)
A description is one kind of definition. DCDuring (talk) 16:57, 9 July 2023 (UTC)
No, it's not. I can't create an entry for prunvino, label it "Translingual" and define it as "a word spelled p-r-u-n-v-i-n-o." That's not a definition, and there's no evidence it's translingual. kwami (talk) 20:22, 9 July 2023 (UTC)
The usage note doesn't work under Tlapanec, because it's a usage note that involves other languages. It doesn't make sense to put a usage note about other languages under a single language; it should be under translingual. To your other points, I can kind of understand it, but I still think that there should have been a formal RFD. Also, according to @Theknightwho, some of the Unicode names and descriptions aren't accurate anyways, so we may have a problem there as well. The "see also" portion still doesn't work though for seeing other letters with an underdot, for example, unless we make a specific see also appendix for underdotted letters. AG202 (talk) 16:05, 13 June 2023 (UTC)
It's exactly because Unicode names and descriptions are often inaccurate that we shouldn't create articles based only on that. We need independent attestation, even if it's only the sources that were used in the Unicode submission (assuming they can be confirmed).
The usage note works just fine under Tlapanec because it is about Tlapanec, just as we could say under 'Irish' that (hypothetically) Gaelic is the only language still to use as an ampersand. kwami (talk) 04:07, 10 July 2023 (UTC)
@kwamikagami, Equinox: Will a plausible quotation of the character in use save the letter entry, e.g. as for 𑅇? The letter is used for Pali in the Chakma script, and therefore probably sees some discussion in the Chakma language. In this case, I've also added a usage note, though I've not transcribed and annotated the evidence to support it. --RichardW57 (talk) 12:51, 9 July 2023 (UTC)
The Pali entry is good, but I don't understand the "translingual" one. Why not put it under the header of the language it's being used for? What's "translingual" about it? It's not even used for Chakma, just for Pali.
As far as I'm concerned, if all you did is list the letters of the Chakma script under Chakma and say "a letter of the Chakma alphabet" along with its phonetic value, that would be enough. But the real phonetic value, per some RS, not blindly copying the Unicode transcription, which is often inaccurate. We don't usually bother citing the RS for the pronunciation as long as it's something relatively straightforward to confirm, but for obscure letters or usages it might be worth a citation or two. kwami (talk) 20:16, 9 July 2023 (UTC)
I personally rather doubt that it gets much usage outside fontmakers, creators of language manipulating software and the band of enthusiasts who decided that their sacred texts should be written in the Chakma script. The letter, as the name of itself, probably gets most usage in Bengali, English and Chakma. It's only used for words in Pali. I think 'multilingual' sums that up best.
How would you determine its real phonetic value? I suppose it's probably however Chakma L1-speakers pronounce it when reading Pali. I think /b/ and /w/ are probably the best guesses, but there may be all sorts of approximations to . --RichardW57 (talk) 20:57, 9 July 2023 (UTC)
I meant phonetic values for Chakma usage (for the letters of the Chakma alphabet), not for Pali. For extinct languages, a transliteration is fine.
How is it used in Bengali? And it's not used in English. You can find mentions or quotations in English-language text, but that doesn't count as English. kwami (talk) 21:03, 9 July 2023 (UTC)
@Kwamikagami: Where's your evidence that '𑅇', which was created in 2013, is in the Chakma alphabet? I think it is no more in the Chakma alphabet than 'ɛ' is in the English alphabet.
I may be wrong about Bengali. I was getting confused with the fact that the best sources about Chakma are in Bengali. --RichardW57 (talk) 03:57, 10 July 2023 (UTC)
AFAICT it's not in the Chakma alphabet, only in Pali. I suppose it might be used for unassimilated Pali loans in Chakma, but that's just a guess. kwami (talk) 04:02, 10 July 2023 (UTC)
@Kwamikagami:: I see you've removed the multilingual entry out of process and made it a Pali letter. That has knock on implications, raised at Wiktionary:Beer_parlour/2023/July#Pali_Letters_and_Translinguality. --RichardW57 (talk) 20:07, 10 July 2023 (UTC)
This discussion doesn't seem to go anywhere so @Kwamikagami send the entries you want to RFDO instead. I'll be reverting your edits in a few days otherwise, because edits like diff ensure that the entry has a fake header (there is no 'Palaung language'), has no headword, and has an error in the template because no code has been provided. That just won't do. Thadh (talk) 08:29, 12 July 2023 (UTC)
You could correct it, then, rather than making it worse. The evidence for the language is the Unicode name. You have no evidence that I can see for your change.
Also, if you don't want edits "out of process", then you should follow the very simple directions on the tag. kwami (talk) 08:47, 12 July 2023 (UTC)
@Kwamikagami: The fact that you blindly follow some Unicode classification instead of actually knowing what languages it represents and how we handle those on Wiktionary, and don't try to figure that out, is just one of the many reasons why I'm not going to delete this entry out of process, and I don't see why I have to put in the work of compiling a list of entries to potentially be deleted and changing all the {{d}}s with {{rfd}}s and adding headwords just to fix your disregard for the practices on Wiktionary. If you want these entries deleted, go through the RFD, if you don't, then you won't mind me deleting that tag. Thadh (talk) 08:59, 12 July 2023 (UTC)
The Unicode classification is the only evidence we have. It's either that or for 'undetermined' -- your choice. I can (a) correct the information per the only evidence or (b) blank the article. You could try helping rather than corrupting Wiktionary with bullshit. But really, empty articles should be deleted. kwami (talk) 09:03, 12 July 2023 (UTC)
@Kwamikagami, Thadh: I think I've fixed the letter Rumai Palaung , though I should transcribe and possibly transliterate an intelligible portion of the copied page. Translating it is beyond me. Evidence such as I've added can often be found using Wikipedia articles such as https://en.wikipedia.org/wiki/Myanmar_(Unicode_block). (Confusingly, the useful history is frequently collapsed.) The big problems usually lie in characters added early or characters accepted from tables; the latter tend to be mentions not from dictionaries. --RichardW57m (talk) 10:07, 12 July 2023 (UTC)
Thanks, Richard. Personally, I'm not so worried about quotations so long as we can verify the language and sound value, so that we can provide at least basic information to the reader. kwami (talk) 10:19, 12 July 2023 (UTC)
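
To illustrate the distinction drawn in this thread between a single character such as á and a multi-character entry such as ⟨k̩̂⟩, here is a minimal Python sketch using the standard `unicodedata` module. It is not how the Wiktionary character info box is implemented. The helper names `describe` and `is_single_character` are hypothetical, and the code points chosen for ⟨k̩̂⟩ (k + U+0329 + U+0302) are an assumption about how that entry is encoded.

```python
import unicodedata


def describe(text):
    """Return a (code point, Unicode name) pair for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def is_single_character(text):
    """True if text is one code point after NFC composition (hypothetical check)."""
    return len(unicodedata.normalize("NFC", text)) == 1


single = "\u00e1"           # á as one precomposed code point
sequence = "k\u0329\u0302"  # assumed encoding of k with a line below and a circumflex

for label, text in (("single", single), ("sequence", sequence)):
    print(label, is_single_character(text), describe(text))
```

The first string has one Unicode name to report, while the second only has names for its three component code points, which is consistent with the point above that a one-character lookup has nothing to show for such a sequence.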

Books written by AI

If we identify that a book (or news site, etc) was written (entirely) by AI rather than a human, it can't be used as a source, right? (This isn't a new issue; we've encountered and disused the occasional book of randomly-generated words before, but as AI-generated books become less gibberishy and more superficially intelligible, I want to check if we're still excluding them.) - -sche (discuss) 01:43, 17 May 2023 (UTC)

Yes, the language lacks a reality check, since AI does not have real-life experience, and always fails the “independence” criterion of the Wiktionary:Attestation requirements. For the same reason psychotic speech may have to fall through the cracks. Fittingly, AI has poor illness insight and seems staggeringly wrong on the topics of mental disorders as well as physical fitness, which would get our definitions crooked in basic areas of human experience. Ultimately it forces misinformation when some ill-informed interest group has dominated the internet; has someone tested them already about Westrobothnian and Scots? We would also quote redactions of Wikimedia, which is forbidden in Wiktionary:Attestation. Fay Freak (talk) 03:05, 17 May 2023 (UTC)
I've seen some absolutely grotesque Google Books results lately, and I don't mean bad writing, I mean something that looks like it was copypasted from a chat log, complete with unclickable URLs in what is supposed to at least look like a paperback book (though these tend to be e-books, the links are still just plain text). To be clear, these are not AI at all, but it seems that the barrier for getting published and hosted on Google Books has fallen a lot, if not disappeared entirely. Soap 05:58, 17 May 2023 (UTC)
See Wiktionary:Beer parlour/2021/December#Nicolae Sfetcu, and other book authors who just copy Wikipedia for an older discussion. With newer AI it will be more difficult to detect… In general, best to avoid self-published sources (we still need that blacklist!) Jberkel 07:26, 17 May 2023 (UTC)
What about avantgarde texts? Have there been discussions about them?
In my opinion, avantgarde texts, such as Velimir Khlebnikov or Throbbing Gristle, should not be used, because they often do not show the proper use of words. I'm also not sure about more concrete poetry/songs, though (old folk songs, spells, or stuff like early Quorthon or late Yegor Letov). For example, the lexical use in spells and folk songs may be very different from a layman's language. Tollef Salemann (talk) 05:52, 22 May 2023 (UTC)

Arbitrarily assigning pronunciations to (written) quotes from "Hokkien" media

When an entry (like 阿莎力) quotes something in written form from "Hokkien"-language media (in practice, almost always Taiwanese-language media), if the quote uses Sinographs ( = Kanji = Hanja) that represent different pronunciations in various Taiwanese dialects, is it acceptable to convert the quote to romanized Hokkien-Taiwanese on the basis of an arbitrarily selected pronunciation?

Logically, this is misleading, although it is convenient, and I understand why people do it.

The underlying issue is that (even aside from the social reality of Hokkien & Taiwanese being distinct languages) Wiktionary has in recent years re-imagined Hokkien to be an unwritten dialect of "Chinese" (essentially Mandarin); Sinographic quotes in either Hokkien or Taiwanese (or Cantonese) are treated as "dialect Chinese" (essentially "dialect Mandarin"), and romanized Taiwanese is conceptualised as a pronunciation guide for "dialect" Sinographs.

My proposal is that quotes from (written) Taiwanese-language media & Hokkien-language media should be treated as writing; "conversions" should be avoided at this time b/c there are no conventions in place for converting between scripts — even for Hokkien, in a paradigm where Hokkien is not recognized as a language distinct from Taiwanese (or Mandarin, for that matter).

Meanwhile, pronunciations should be provided — or not — in a manner consistent with how other languages are dealt with at Wiktionary. (talk) 08:41, 17 May 2023 (UTC)

@Justinrleung (talk) 08:42, 17 May 2023 (UTC)
I would have to agree that the current approach has some flaws. Indeed this is also a problem faced with Cantonese (albeit with less variation when compared to Hokkien): idiolectal differences exist such as (this) pronounced as ni1 or nei1, pronounced as lai4 or lei4, and we currently choose between them quite arbitrarily. I guess a practical way to solve this is to use {{zh-x}} only for audio/video sources, and use |text= in {{quote}} templates otherwise, i.e. for printed-only materials.
As a side note, there's nothing here about the paradigms or language treatment or what not, since this issue also occurs in Mandarin as well, as both Mainland and Taiwan Mandarin pronunciations are acceptable, though it's much less of an issue given Mandarin is (mostly) standardized. – Wpi (talk) 09:33, 17 May 2023 (UTC)
A solution that avoids assigning arbitrary pronunciations to written (including online-only) sources would be great.
(There is a paradigm issue. Mandarin doesn't have a romanized script. Wiktionary treats "Hokkien" as not having a romanized script either, but rather just a system for indicating pronunciations via romanization.) (talk) 10:34, 17 May 2023 (UTC)
@Wpi, : In principle, we could maybe add the possibility of suppressing romanization in {{zh-x}}, but I think in general, Wiktionary seems to favour having romanization where possible. IMO, using {{zh-x}} only for audio/video sources is not quite right because of the intended purpose of the template. In fact, I do want to see a better integration of {{quote}} template parameters into {{zh-x}} (or have a separate {{zh-quote}} template), so that we wouldn't just be doing our own thing with |ref=.
And I agree with what Wpi said about treatment of different lects. We run into the same issues with Mandarin (e.g. is the 這 zhè or zhèi in such-and-such particular case?) and Cantonese. — justin(r)leung (t...) | c=› } 14:53, 17 May 2023 (UTC)
@justinrleung: there is no need for a new {{zh-quote}} template. It works fine with the following layout, with |text= in {{quote}} left out:
#* {{quote|zh|…}}
#*: {{zh-x|…}}
Wpi (talk) 15:41, 17 May 2023 (UTC)
@Wpi: The issue with that is that we are not categorizing right. {{zh-x}} without |ref= would categorize to instead of . Also the formatting of |ref= is different from the rest of Wiktionary, which doesn't seem right. — justin(r)leung (t...) | c=› } 15:52, 17 May 2023 (UTC)
Does this mean that lifting copyrighted usage examples when editing in the UK is illegal? Under UK law, the equivalent of fair usage requires an adequate attribution. Perhaps we need a special convention such as |ref=above to suppress duplicating the reference from the {{quote}} invocation. --RichardW57m (talk) 09:03, 18 May 2023 (UTC)
I don’t think copyright is the issue here. We would be showing attribution in one way or another. It’s just about formatting and technicalities with categorization. — justin(r)leung (t...) | c=› } 20:13, 18 May 2023 (UTC)
@Justinrleung: So how do we show the source for a usage example without it being miscategorised as a quotation? For {{mnw-quote}}, a Mon-specific extension of {{quote}} and {{usex}} supporting automatic mark-up-assisted transliteration, I added a parameter |isex= to distinguish, for categorisation, between usage examples and quotations. --RichardW57m (talk) 08:47, 19 May 2023 (UTC)
@RichardW57m: We'd probably need a similar parameter in {{zh-x}} if we use {{quote}} with {{zh-x}}. Otherwise, it's just using |ref= in {{zh-x}}, which doesn't do any auto-formatting unless you use the predefined sources, like Shijing (MOD:zh-usex/data). — justin(r)leung (t...) | c=› } 13:57, 19 May 2023 (UTC)
@justinrleung: The issue is not whether it's all right to arbitrarily choose one of several pronunciations when creating our own examples. The issue is whether it's all right to do that with pre-existing creations created by others. (And whether that's being done with Mandarin doesn't logically settle the matter for Hokkien, etc.) (talk) 03:08, 19 May 2023 (UTC)
@: Yes, I understand that. I guess I was pointing out the fact that the status quo is to also arbitrarily select readings for quotations, which I also see is possibly an issue if we are not certain how things would be read by the author. (Notifying Atitarev, Tooironic, Fish bowl, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi31, ND381): Just wanted to get more visibility with the ping. — justin(r)leung (t...) | c=› } 04:52, 19 May 2023 (UTC)
@Justinrleung: What's being proposed, suppressing transliterations in quotes? It only makes sense in cases where pronunciations are unknown or there is doubt that they are accurate. To make readings non-arbitrary, they need to be selected with care, of course. {{zh-x}} allows substitutions when the reading differs from default. Anatoli T. (обсудить/вклад) 05:03, 19 May 2023 (UTC)
@Atitarev: What's being proposed, if I understand this right, is basically that we should not show romanization in quotations, at least in cases where dialectal/idiolectal variation exists, since we often do not know the specific dialect/idiolect of the author (or the author intends there to be the possibility of variation). — justin(r)leung (t...) | c=› } 05:18, 19 May 2023 (UTC)
@Justinrleung I would say that when there's a recording, it makes sense to follow it (regardless of whether or not the singer, say, was also the lyricist).
Where there's no recording — and no rhyme — I'd say that 多 contains the possibility of either CHĒ or CHŌE; it's CHĒ & CHŌE at the same time, regardless of which one the writer themself uses. (talk) 14:17, 25 June 2023 (UTC)
@Justinrleung Sorry again for the sluggish response. (talk) 14:18, 25 June 2023 (UTC)
The problem here is that some people are confounding transliteration and transcription. If we treat the Romanisation as a transliteration, then the phonetic deviations do not matter. Perhaps we need to add a note 'key' to the transliteration, linking to an explanation of the Romanisation scheme, as a reminder that we are not quoting the pronunciation. In a similar vein, I've found myself having to annotate clearly corrupt Pali texts, as in quotations at ᨾᩉᨶ᩠ᨲ᩺ (mahant) (one example) and ᩈᨻᩛ (sabba) (2 examples), with the split into text, transliteration and translation as in the modules Module:RQ:pi:Phaya Luang Maha Sena, Module:RQ:pi:Sai Kam Mong/passages and Module:RQ:pi:Watcharasat respectively. In these I've added 'footnotes' to appropriate elements. One might need to add a footnote when a word is ambiguous, but in general, would not this ambiguity compromise the translation? --RichardW57m (talk) 12:40, 19 May 2023 (UTC)
@RichardW57m: In the cases we're talking about here, there's no ambiguity in meaning, just ambiguity in pronunciation due to uncertainty of the dialect of the author, for example. — justin(r)leung (t...) | c=› } 13:47, 19 May 2023 (UTC)
Good point. However, for instance, CHĒ & CHŌE aren't just phonetic variants of each other, although they are two pronunciations of the same word; they're two alternate Latin transliterations of 多. (This has to do with Hokkien having never been an official language and not having ever been tightly standardised.) (talk) 14:21, 25 June 2023 (UTC)
@RichardW57m (talk) 14:22, 25 June 2023 (UTC)
@Justinrleung In the case of quotations with Korean and Japanese literary Chinese (quotes from the former have already been added in some entries), I'd like to have the option to suppress romanization because the actual pronunciation involves linguistically non-Chinese elements. For what Korean does, see the quote in 左海 (zuǒhǎi).
In the case of Japanese Kanbun, the oral/pronounced form is linguistically 100% Japanese by Kanbun kundoku and even more unsuitable under the Chinese L2 header.--Saranamd (talk) 02:37, 19 May 2023 (UTC)

Macrons over Latin "long" nasal vowels in headwords

(Notifying Fay Freak, Brutal Russian, JohnC5, Benwing2, Lambiam, Mnemosientje, Nicodene, Sartma, Al-Muqanna): Currently, in Latin headwords and links, we display <Vns> as <V̄ns>, e.g. īnsidiae, ānser, mēnsa. The length of the vowel is not only predictable from the orthography, but also unphonemic (per our {{la-IPA}} as well), and as a general rule it is probably preferable to have as few diacritics as possible there. I believe it is a matter for our phonetic IPA transcriptions to deal with rather than for headwords and links, and propose their removal. Catonif (talk) 17:33, 17 May 2023 (UTC)

@Catonif Isn't this standard in Latin dictionaries? Theknightwho (talk) 18:48, 17 May 2023 (UTC)
I agree with User:Theknightwho; since this is standard in Latin dictionaries we should do the same. Benwing2 (talk) 20:05, 17 May 2023 (UTC)
I would contest that. Of the sources cited under mensa, for instance, only the second provides a macron. It is my experience that Latin dictionaries, or etymological dictionaries that reference Latin, rarely use a macron in this context. Nicodene (talk) 20:45, 17 May 2023 (UTC)
@Catonif: As a user for Latin (I seldom edit contents in Latin), I prefer the macrons to be used and seek to overcome any technical problems. Unless the use of macrons over nasal vowels is proven to be wrong, of course. Do you argue that mēnsa is wrong? If some sources use macrons, it seems proof enough to me. It seems like a case of rare diacritics, e.g. Arabic ـٰ (called أَلِف خَنْجَرِيَّة (ʔalif ḵanjariyya, dagger alif)): it is rare, absent from keyboards and seldom used, but it helps to produce the accurate transliteration and shows that it's possible to use it. So we do use it. Anatoli T. (обсудить/вклад) 00:28, 18 May 2023 (UTC)
I favor including the macrons. The dictionaries that omit them generally follow a policy of omitting length marks in some or all closed syllables, and explicitly marking short vowels with breves; I find this convention preferable in several respects to what Wiktionary currently does, but per our current convention, unmarked vowels are typically used to represent short vowels. I admit the presence of the phonetic transcription greatly reduces any chance of confusion. As for the phonemic situation, I'm not actually a fan of showing the vowel as short there either. It is possible at some level of abstraction to analyze the reconstructed nasal vowel phones as being phonemically composed of a short vowel (or vowel of unspecified length) + a nasal consonant, but I don't think this is necessarily appropriate or even synchronically accurate for Classical Latin. W. Sidney Allen's description in Vox Latina ("The development in a word such as consul, therefore, is: prehistoric cŏnsol, early Latin cō̃sol; classical colloquial cōsul; classical literary cōnsul") is not really consistent with analyzing the Classical Latin form as phonemically containing a short vowel phoneme + nasal consonant (it isn't consistent with our phonetic transcriptions either, but that's another digression). Anyway, we know from descriptions such as Cicero's that Latin speakers perceived a vowel in this position as long, not short. Sure, the length isn't distinctively contrastive, but you could also say that the place of articulation of the nasal consonant in the hypothesized underlying representation /ˈmen.sa/ isn't distinctive either, which would imply that we actually need to transcribe /ˈmeN.sa/. Rather than going down that road, I'd say we should just not bother with being strict about marking theoretical neutralizations, and use mēnsa and /ˈmeːn.sa/.--Urszag (talk) 00:58, 18 May 2023 (UTC)
I also favour including macrons, per @Urszag's reasons. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 15:59, 19 May 2023 (UTC)
That vowel length is unphonemic would be news to me. I don't think {{la-IPA}} actually says that. It even provides for the breve-macron in case the vowel length is unknown. (Even though that isn't actually the case for crusta but whatever.) —Caoimhin ceallach (talk) 10:58, 7 June 2023 (UTC)
Catonif was only referring to vowel length in the specific context of vowels before ns or nf. A vowel in this context is always phonetically long in Latin; therefore, there is no contrast between short and long vowel phonemes in this context. Currently, the template {{la-IPA}} transcribes these phonemically as a sequence of short vowel + /ns/ or /nf/, and phonetically as long nasalized vowels + [s] or [f].--Urszag (talk) 11:10, 7 June 2023 (UTC)
I see, that makes more sense. I thought <Vns> was only meant to be an example. I still think they should be included for consistency and because not everyone looking up a Latin word knows all the rules determining length. —Caoimhin ceallach (talk) 11:19, 7 June 2023 (UTC)
Agreed. For user friendliness, I think macrons should be included for all long vowels. I've studied Latin on and off for several years (enough to read simple texts in Latin), and I didn't know this rule. I don't think having the information hurts anybody. Anyone who knows that the vowel length is always long can just ignore the macron, whereas anyone who doesn't know that will lose that information if we remove it. Andrew Sheedy (talk) 16:13, 7 June 2023 (UTC)
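The convention Urszag describes for vowels before ns and nf can be illustrated with a toy sketch. This is not the actual {{la-IPA}} module: the function name `transcribe_before_ns_nf` is hypothetical, consonants are passed through unchanged, syllable and stress marks are omitted, and it assumes the nasal is absorbed into the vowel while the fricative is kept in the phonetic output.

```python
import unicodedata

# Macron-marked long vowels mapped to their plain counterparts.
LONG = {"ā": "a", "ē": "e", "ī": "i", "ō": "o", "ū": "u"}
VOWELS = set("aeiou")
NASAL_TILDE = "\u0303"  # combining tilde
LENGTH = "ː"


def transcribe_before_ns_nf(word: str) -> tuple[str, str]:
    """Return (phonemic, phonetic) strings for a macron-marked Latin spelling.

    Hypothetical helper: only the V + ns / V + nf rule is modelled.
    """
    word = unicodedata.normalize("NFC", word.lower())
    phonemic, phonetic = [], []
    i = 0
    while i < len(word):
        ch = word[i]
        base = LONG.get(ch, ch)
        following = word[i + 1 : i + 3]
        if base in VOWELS and following in ("ns", "nf"):
            # Phonemic: short vowel + n. Phonetic: long nasalized vowel, no n.
            phonemic.append(base + "n")
            phonetic.append(base + NASAL_TILDE + LENGTH)
            i += 2  # consume the vowel and the n; the fricative comes next
        else:
            mark = LENGTH if ch in LONG else ""
            phonemic.append(base + mark)
            phonetic.append(base + mark)
            i += 1
    return "/" + "".join(phonemic) + "/", "[" + "".join(phonetic) + "]"


if __name__ == "__main__":
    for w in ("mēnsa", "īnfāns"):
        print(w, *transcribe_before_ns_nf(w))
```

Under these assumptions the macron in the headword mēnsa carries information that the phonemic string alone does not, which is the point made in favour of keeping it.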

Romanized Hokkien is a script, not a system for phonemic transcription

It's a script that doubles — imperfectly, perhaps — as a phonemic transcription system. Not a phonemic transcription system that sometimes doubles as a script.

Sorry to be broaching another topic so soon. This came up tangentially in discussing "Arbitrarily assigning pronunciations ...", and I figure it's too large of a topic to get at in there.

Hokkien has a native Sinographic script that goes back to the 16th or (probably) 15th century, and a Latin script that goes back to the mid 1800s. The modern Standard Chinese (not exactly Mandarin) script has also been officially extended for Hokkien as of the last two decades. Wiktionary currently deals in the Latin script & the Mandarish script extension.

Latin scripts can be perfectly phonemic when they're created, but this is difficult to maintain over time. The core issue is that when speech evolves, speech may diverge from the writing. This adds to the overhead of learning the writing. Different communities have radically different approaches to dealing with the divergence. But even with Spanish, the writing is not a perfect reflection of speech, even aside from register, diction, etc. The same is true of romanized Hokkien writing.

The for-some seductive idea — which Wiktionary seems to have adopted — is that romanized Hokkien is just a phonological transcription system, like Pinyin or Hepburn.

(That idea is the tip of an iceberg in which Hokkien is seen as essentially part of a greater mono-language, with a mono-script that pertains to the entire mono-language rather than to any of its purported parts. In fact, the idea that Hokkien is part of that mono-language seems to be circularly based on the notion that Hokkien doesn't have its own systems of writing. So some near-existential significance seems to have attached to the denial of the existence of Hokkien writing as such. The romanized script has thus been reconceptualized as a phonological transcription system, while the native Sinographic script has been reconceptualized as unsavory, unsuccessful pre-attempts to create something like the 21st century Hokkien extension of the Mando-Chinese script(s).)

This is an upstream issue, and upstream issues tend to spawn a wide range of symptoms downstream. JUST one example of a concrete downstream problem that has surfaced is the issue of whether a written form in romanized Taiwanese (which Wiktionary takes to be part of Hokkien) proves the existence of the pronunciation that that spelling would indicate IF romanized Taiwanese were a mere phonological transcription system à la Pinyin. The problem is heightened if — as seems to be the case — Hokkien-related entries on English Wiktionary are being gate-kept by non-speakers.

Non-existent pronunciations are thus being listed on Wiktionary as alternate pronunciations. The only proof of them is alleged instances in writing, in a script that is in fact not a phonemic transcription system. The inability of anybody to link to audio or video real-world instances of these pronunciations is attributed to their rarity, and the general difficulty of finding that kind of proof.

Given the social & political realities, I'm aware that this tree of mine may fall silently in the forest. I want to highlight — however ephemerally — how the distorted treatment of Hokkien, Hakka, Cantonese, etc., on Wiktionary, in recent years especially, has been illogical, and inconsistent with the general principles of Wiktionary & linguistic scholarship at large. I guess we're supposed to believe that these languages are being subordinated to greater causes. I sympathise with some of those causes, but no good can come of such wide-ranging intellectual distortion so far upstream. Even if one greater goal (for some) is to disable these languages from compromising the integrity of the Chinese nation-state (perhaps via engineering the gradual, face-neutral die-off of these languages), I think Wiktionary should deal with these languages using the same best practices applied to the other languages of the world, w/o bias or distortion. Alternatively, if this segment of Wiktionary has achieved a deeper understanding of human language, that understanding should be shared with other segments of Wiktionary and with the world.

For simplicity, I've avoided bringing up the relationship between Hokkien & Taiwanese. In practice, Hokkien-related issues that arise on Wiktionary often have more to do with Taiwanese. But the foundational upstream issue we're looking at here is just as relevant to Hokkien itself. (talk) 03:48, 19 May 2023 (UTC)

The above user is unlikely to contribute to Wiktionary's Category:Hakka lemmas, which now boasts 29,138 total entries, mostly stirring s*t in Wiktionary name space. Again, like someone we know, they will blame the Chinese content unification in this project and some unnamed editors who mistreat Chinese varieties.
Factually, nobody treats two Hakka romanisations (Pha̍k-fa-sṳ (PFS) and Guangdong Romanization (GD) for the Meixian dialect) as words, creating entries in PFS or GD is specifically disallowed per Wiktionary:About_Chinese#About_specific_lects.
Also, if anything (content, audio, examples, etc.) is still missing, it's nobody's fault. Anatoli T. (обсудить/вклад) 04:05, 19 May 2023 (UTC)
I disagree with Anatoli's statement above, which appears to misunderstand the above statement by 釆. 釆 is a Hokkien/Taiwanese editor, discussing the issues with Hokkien and/or Taiwanese, and only brought up Hakka as a related example of (mis)treatment. I don't understand why you would claim that someone is "unlikely to contribute to Wiktionary's Category:Hakka lemmas" and "mostly stirring s*t in Wiktionary name space" when it is evident that they are a Hokkien/Taiwanese speaker and not a Hakka speaker. Obviously, if one is not a speaker of some language, one is unlikely to contribute to that language. While it may be true that no one treats the Hakka romanisations as words, we can bring this up for discussion (which is exactly why we're here). Saying that PFS/GD entries are disallowed per the about page, so we shouldn't discuss it, is absurd – the purpose of the discussion includes deciding whether they should be allowed or not.
In fact, the issue also exists for Cantonese with its romanised form, except that the spelling is nonstandard and based on English orthography, and there is no separate header for Cantonese (i.e. the L2 is Chinese). Examples of entries that have a Sinitic origin include fing, kick, lor, lur, chur, hea, gur, wor, etc., and there are even more entries with an English etymology. I am willing to propose separating the Cantonese L2 for these entries, though I don't want to hijack the discussion here. Instead I'm mentioning this only as an additional example of the problems caused by the issues mentioned by 釆. – Wpi (talk) 10:04, 19 May 2023 (UTC)
Thank you, Wpi. I have some passive knowledge of Hakka, but I certainly wouldn’t feel comfortable editing Hakka entries on Wikt in general.
Anatoli implies that romanized Hakka is not genuine writing. This is not true in the case of PHA̍K-FA-SṲ. PHA̍K-FA-SṲ is bona fide writing. It’s not equivalent to Guangdong Romanization, which is purely a transcription system. But I acknowledge & understand that some people wish they WERE equivalent.
Literacy in Romanized Hakka (PFS) is certainly not widespread, and seems to be dwindling. That doesn’t make it “just a transcription system” by default. Even in death, a (dead) script is a script.
It’s mind-boggling that the powers-that-be disallow entries in Romanized Hakka. Obviously, this bars no small number of Hakka words from being added to Wikt. Even where a word has a written form in two or more scripts, the typical treatment on Wikt seems to be to “allow” both / all:
https://en.wiktionary.org/wiki/われわれ
https://en.wiktionary.org/wiki/我々#Japanese
The resistance to acknowledging various vernacular-centered scripts for non-Mandarin “Chinese” vernaculars is alarming & worthy of inspection. In the case of the Latin scripts, there could be some kind of anti-hegemonic (?) animus at work, which is perhaps admirable. But at some point we need to take a level-headed look at what’s there. With Hakka, Hokkien, etc., there seems to be ferocious resistance not just to doing that, but to even “allowing” that to be done.
Part of the tug-of-war in such discussions seems to be over whether it’s valid to discuss a Hokkien or a Hakka as a language in itself w/o simultaneously worrying about how — if Hokkien were to end up well represented on Wikt, perhaps as a schematic equal of Mandarin & Turkish & Ilokano — how appearances would be kept up regarding the dozens of poorly documented & poorly differentiated Sino-languages of the hinterland.
I guess one solution is to pretend that Hokkien is constructively a dialect of Mandarin; only Mandarin would be schematically equal to Turkish & Ilokano. But what does that solve?
Clearly, & maybe unfortunately, there are any number of human languages that are poorly represented on Wikt. So I question the anticipatory angst that we might have over the problem of there being nobody to do the work of fleshing out Wikt for Hakka. It’s not like the current solution of treating Hakka as (basically) a dialect of Mandarin has expedited the useful cataloguing of Hakka speech & writing. If anything, it has resulted in muddled documentation that neither a language learner nor a future language historian would be able to rely on. In short, why the idea that wrong or useless information (such as non-existent pronunciations, to tie this back to my first post) is better than no information? What are we trying to cover up? What are we ashamed of? (talk) 12:15, 19 May 2023 (UTC)
Clicking on Anatoli's link, I found 182 prepositions among the supposed 29K total entries.
The first of these is 中, with two pronunciations but no guidance as to usage, register, or even whether this preposition is common. Typically (generally), barebones dictionaries provide clues as to frequency by simply not including elements that are less frequently used (or maybe restricted in terms of register or subculture). Wikt in its current state is not providing even that level of service.
The next three entries in the list of "Hakka prepositions" were 𠁦, 𠁧, 𠁩 — typographic variants of 中, supposedly. If, as I suspect, 𠁦 or 𠁧 or 𠁩 have never been used in Hakka texts — and I'll stand corrected if a few sheets of scanned evidence comes along — what's the user supposed to make of these three entries?
Meanwhile, I wondered where I'd find the super-common preposition TI / TU. I found it under 在, not b/c I'm good at Hakka, but just as kind of a lucky guess. We're informed that there's an alternate form 𡉄 (again, when was this ever used in Hakka?), but, again, there's no guidance as to usage, etc. And rather than blank spaces reassuring the user s/he's seen all there is to see, I had to wade through a sea of irrelevant information about other languages just to find that there is no guidance on how TI / TU is used.
How is any of this good? (talk) 12:55, 19 May 2023 (UTC)
Arguably the problem lies in the fact that Wiktionary is documenting Chinese from the perspective of Standard Written Chinese, which is based on and heavily skewed towards Mandarin. This is also complicated by the fact that many of our Mandarin editors fail to distinguish the difference between the two, e.g. treating words used in formal written Chinese in Hong Kong as Mandarin when it's not.
All of the coverage of words that are used in SWC are not labelled, even when some dialect rarely or doesn't use a particular word in SWC. This paints a false picture that some words are used in one of the Chinese lects, for example is simply "to eat" when is "(literary or Cantonese, Hakka) to eat", or the Cantonese senses of , which has an equivalent in Mandarin but 才 is not used in Cantonese in such manner. (Please bear with me using Cantonese as examples since I only speak Cantonese)
There're also differences in formality between different lects, usually a word may be informal in the north but rather formal in the south. One example would be 先生 "(honorific in Mandarin, Jin; common in Cantonese, Gan, Hakka, Min) teacher".
It also appears to me that single character entries are worse in this regard, perhaps because of the sheer amount of information they contain.
I'm not sure if I've mentioned this before, but I think a solution would be requiring all definitions to be labelled properly, whether in SWC or not. – Wpi (talk) 13:36, 19 May 2023 (UTC)
@Wpi: I agree, and the solution you give is a good step in making it clearer. — justin(r)leung (t...) | c=› } 13:41, 19 May 2023 (UTC)
@Wpi Thank you for the examples. (I speak a passable Cantonese myself and) I think Cantonese could be the key to these upstream issues. (talk) 10:15, 20 May 2023 (UTC)
@: Thank you for taking the time to write this out. I think you bring up important issues to think about. It is certainly true that the current way POJ is dealt with is perhaps non-ideal. There are currently two approaches to POJ entries. The older approach is to have a "full" entry, and the newer approach is to use {{zh-see}}; we can see the two different layouts here and here, respectively. There doesn't seem to be an agreement as to how to do this, but when the new approach was brought up, I had some resistance to it. (However, I can't remember if I have voiced it, and indeed, I have sometimes followed suit in using the newer layout, which I do think is probably not great on my part.) Generally, I still think the current official guidelines for POJ are that we can have POJ entries. It is only the de facto state of Wiktionary that makes it such that POJ is treated as inferior, which would probably take the efforts of editors to change, since we only have so many editors and we do not really have active editors who are native Taiwanese users AFAIK.
> "romanized Hokkien is just a phonological transcription system, like Pinyin or Hepburn."
I think this is not quite a fair assessment of how we treat POJ. Yes, we are using it in the pronunciation sections because it is a well-established romanization system that can work as a phonemic transcription system. This doesn't mean it's automatically at the same level as Pinyin. Full Pinyin entries are not allowed at all, but full POJ entries are (as per Wiktionary:About Chinese#About specific lects). If the issues discussed at Talk:阿莎力 have caused you to think what you have expressed above, it might have been a mistake on my part on that particular word, and I hope my actions would not be taken as the be-all and end-all of practice on Wiktionary.
> "JUST one example of a concrete downstream problem that has surfaced is the issue of whether a written form in romanized Taiwanese (which Wiktionary takes to be part of Hokkien) proves the existence of the pronunciation that that spelling would indicate IF romanized Taiwanese were a mere phonological transcription system a là Pinyin."
While it is possible that POJ is "inaccurate" in its representation of pronunciation due to its nature as a writing system rather than being a purely phonological system, I would say in at least 90% of the cases, it is trying to represent how people speak. To me, it generally would be enough evidence for a particular pronunciation, but I understand that it is generally harder to be sure when it comes to loanwords like Talk:阿莎力. Would you say we should completely abandon allowing POJ writing as the only evidence for a particular pronunciation when it has been a great source for this, or should we just learn to be more cautious when weighing the evidence?
> "Even where a word has a written form in two or more scripts, the typical treatment on Wikt seems to be to “allow” both / all"
This varies from language to language. For example, Zhuang does "allow" both Latin script and Sawndip, but the main entries are always only at the Latin script, with soft redirects from the Sawndip to the Latin script. For Japanese, there is a nuanced treatment as to where things are lemmatized (Wiktionary:About Japanese#Lemma entries). — justin(r)leung (t...) | c=› } 13:36, 19 May 2023 (UTC)
@Justinrleung Thank you for the thoughtful reply.
I wonder what the reasoning would even be for not allowing entries in the Latin script.
> I think this is not quite a fair assessment of how we treat POJ. Yes, we are using it in the pronunciation sections because it is a well-established romanization system that can work as a phonemic transcription system.
Agreed that it makes sense to use POJ in its secondary function as a phonemic transcription system (in addition to its primary function as a script).
> Full Pinyin entries are not allowed at all, but full POJ entries are (as per Wiktionary:About Chinese#About specific lects).
I see. (Is the distinction stable? I notice that romanised entries are not “allowed” for Hakka.) Wikt is far from treating romanised Hokkien (POJ) as full-fledged Hokkien writing, though.
In this stub, for example:
https://en.wiktionary.org/wiki/chhù
… the definitions & examples can’t be found. The user has to click over to another page (https://en.wiktionary.org/wiki/厝#Chinese), where s/he has to first wade through many lines of info about (as far as we know) unrelated etyma that just happen to share a Sinographic form, translingually. (A tighter link could spare us the wading, but the question remains of why the word is stashed away at this exact spot.)
If non-duplication of info is a huge priority (how do Serbian & Japanese deal?), from a user perspective, the definitions & examples would probably be easier to access if they were placed under the Romanised form (“chhù”), and linked to from the Sinographic form. I suspect this would be true more than 3/4 of the time.
Also, the stub treats “chhù” as a “character”. This indicates some kind of unresolved mismatch upstream, although an easy fix seems possible (edit “character” to “form”).
Having found our word, we find an example sentence in “Hokkien” form — 伊無佇厝。/伊无伫厝。— followed by the same sentence in “Pe̍h-ōe-jī” form.
Leaving aside the question of Sinographic orthography, all three forms of the sentence are obviously meant to be “Hokkien”. So it seems like the template didn’t contemplate the orthographic existence of romanised Hokkien — or it was not made with Hokkien in mind at all.
Or take this entry:
https://en.wiktionary.org/wiki/啥物
“Siáⁿ-mi̍h” is listed in pronunciations, which is objectively wrong. It is then absent without explanation from “alternative forms”.
“Chhù” is listed as a pronunciation, which is correct. But it’s absent from “alternative forms”, where five distracting fantasy forms — 戍, 茨, 㢀, 𡪣, & 次 — are given instead. Of course, 厝 is the overwhelmingly dominant written form for this word, and “chhù” has been the ONLY other usage (as in, not mere proposals — although “tshù” is arguably entering orthographic usage).
These are problems that would have to be fixed upstream; yet the fix would be susceptible to being undone in the future, regardless of efforts made downstream. I imagine editors would be hesitant to contribute to Wikt for Catalan if there was a possibility that Wikt would be reconfigured in the future to treat Catalan as a dialect of French. (Not so far-fetched, in terms of “abstand” distance....)
I guess the idea is that Hokkien has been collapsed under “Chinese” as an administrative shortcut, but the shortcut is clearly having trouble swallowing the reality of Hokkien. And any gain (?) that may have come of force-administering Hokkien this way has clearly been at the expense of user ease, at least for the conventional user.
(English) Wiktionary as regards Hokkien seems to be confounded by two or three layers of unresolved dissonance, at this point.
First, there’s the “Modern Chinese” habit of trying to merge info on words & info on Sinographs into one seamless dimension. (Another genuine sub-question: whether the word is an objective unit of Hokkien speech at all.)
Second, there’s an attempt to combine the utility of a conventional, usage-oriented dictionary with a comprehensive compendium of Sinographic dialect theology, with the former lending its secular raison d’être & real estate to the latter. Currently, for Hokkien — but not for Mandarin & maybe not for Cantonese — dialect theology outweighs conventional utility on Wikt. Hence 𡪣 as an “alternative form”, but not “chhù”.
Third, little or no academic, social, & orthographic infrastructure exists for Wiktionarising any number of Chinese languages — such as the language of Changsha — as full-on languages, despite their impressive volumes of speakers. As a shortcut, these languages are now treated as appendages of “Standard Chinese”; notable words & expressions are expressly added to the database, lending color & flavor to “Chinese”. But this shortcut would appear incongruous if certain coastal languages stood off as schematic equals of Mandarin, Turkish, or Ilokano. Hence the (further) need for Hokkien to be treated as an appendage of “Standard Chinese”, to justify the workaround considered necessary for clearing bahasa Changsha out of the freezer.
So … Wikt isn’t treating Romanised Hokkien as straight writing (although — as you point out — it doesn’t put it on the level of Pinyin either). This looms even larger if we consider that the particular Sinographic “script” that Wikt recognises as “standard” Hokkien writing is largely experimental (not to mention ceremonial), and at odds with the native Hokkien Sinography.
> While it is possible that POJ is "inaccurate" in its representation of pronunciation…. To me, it generally would be enough evidence for a particular pronunciation…. Would you say we should completely abandon allowing POJ writing as the only evidence for a particular pronunciation when it has been a great source for this, or should we just learn to be more cautious when weighing the evidence?
Written forms in English provide clues to pronunciation, and so do Sinographic forms in Hokkien. POJ is certainly less “lossy” than either of the above, but dynamic proficiency in POJ (& one of the underlying languages, & in many cases local knowledge) is still indispensable for this task.
On top of this, any written form found in the wild could contain user error. An isolated usage can be hard to interpret.
This is not a loanword-specific problem. Take, for example, “siáⁿ-mi̍h.” This is the to-date dominant Romanised form of one of the most common words in spoken (& written) Hokkien. But it’s not a pronunciation for the word in modern times, nor is there evidence that it was in the past. (All things considered, Wikt is incorrect in having “siáⁿ-mi̍h” under pronunciations.)
As another example, take “hông”, the contracted form of “hō͘ lâng”. In running *position*, this word takes either a low contour or a low-rising contour. (Alternatively, we could say it’s really two syllables….) A lot of people in recent years are writing “hŏng” instead. Does this indicate a third high-rising pronunciation? If you ask, people who write “hŏng” will tend to say that the inverted bowl is a symbol indicating a contraction. (They’re mistaken, but if the mistake sticks & spreads, at some point it might just have to be acknowledged as a usage.) But I’ve also “seen” people claim that their pronunciation is high-rising.
So does “hŏng” (written form) suggest the existence of a high-rising pronunciation? Possibly. Does “hŏng” (written form) prove the existence of a high-rising pronunciation? No.
(Notice how the official Republic of China Hokkien-Mandarin dictionary — and most Republic of China scholars in the relevant fields — do not bother with such secular matters.)
> "Even where a word has a written form in two or more scripts, the typical treatment on Wikt seems to be to “allow” both / all"
> This varies from language to language. For example, Zhuang does "allow" both Latin script and Sawndip, but the main entries are always only at the Latin script, with soft redirects from the Sawndip to the Latin script.
Objectively, this would be the best approach for Hokkien as well — perhaps with “nuancing” where there’s an established Sinographic form. Not b/c the Latin script is primary (I might argue the reverse), but b/c it provides a plenary & OBJECTIVE means for writing (Amoy) Hokkien. (talk) 09:34, 20 May 2023 (UTC)
For the problem about unrelated etyma (i.e. the one you mentioned about chhù/厝), there is a way to do so, i.e. {{etymid}}/{{senseid}}, but (a) etymid/senseid aren't really used in Chinese entries (I've been advocating about using it more) (b) {{zh-see}} (the template on the POJ entry) has very limited functionality and doesn't support linking to subsections or etymid, but I think it should do so and extract the relevant parts for the displayed gloss. – Wpi (talk) 10:35, 20 May 2023 (UTC)

Shortening Romance verb etymologies

We have, by my estimate, thousands of Romance verb etymologies of the type represented by Asturian venir, which reads as follows:

'From Latin venīre, present active infinitive of veniō.'

As discussed earlier, this is unnecessarily cluttered.

There was some support for shortening these etymologies in the following way:

'From Latin venīre.'

Shall we make this the standard way of doing Romance verb etymologies? (Aromanian, naturally, will have to be an exception, as it lacks a grammatical infinitive.) If so, perhaps a bot could clean up the bulk of the present-active-infinitive-spam.

Nicodene (talk) 23:51, 19 May 2023 (UTC)

Let's not do that Skisckis (talk) 23:54, 19 May 2023 (UTC)
Hi Wonderfool. Perhaps you could elaborate, on the off-chance that you're being constructive. Nicodene (talk) 23:55, 19 May 2023 (UTC)
I've got nothing... Skisckis (talk) 00:35, 20 May 2023 (UTC)
It makes sense to me, and is consistent with the practice of e.g. the DLE (Spanish), Treccani (Italian), and TLFi (French).--Urszag (talk) 23:59, 19 May 2023 (UTC)
I'm pretty indifferent, but I don't really see the incentive to remove work that's been put into the dictionary if it's accurate. Since we aren't constrained by paper, what is the value of removing the material? —Justin (koavf)TCM 00:36, 20 May 2023 (UTC)
The question is not whether the information is accurate, but whether it is helpful. Depending on how you count, Latin verbs can have hundreds of forms; I think it's clear that we don't need to list all of them. But there is no universal convention about which is treated as the lemma or citation form: in the context of discussing the etymology of words in the Romance languages, the present active infinitive is often cited as the lemma, but in dictionaries of Latin, the first-person present active singular tends to be used as the citation form. Listing both is a kind of compromise, but I find it hard to see how it adds value beyond simply listing the infinitive with a link to the Latin entry where all the forms can be found.--Urszag (talk) 01:58, 20 May 2023 (UTC)
So there was a different discussion than the one that User:Nicodene cited, one that I started, where it was agreed to do what Nicodene is proposing above. As a result I've already made this change to some languages, e.g. Italian. The principle I follow is: for verbs, use a piped link of the form ]; for nouns, just link to the lemma (the nominative singular) if it has the same number of syllables, same stress pattern and same stem as the accusative singular (which is basically everything other than third-declension consonant-stem nouns and second-declension nouns in -er), otherwise use a piped link of the form ] (in a small number of cases, such as Italian crimine, fiele and folgore, the Romance term isn't derived from either the Latin nominative or accusative singular; in those cases I think I use a piped link to the ablative singular, noting that this is the ablative singular). Cleaning this up completely by bot isn't possible due to the variety of ways people have written both forms out, but it can be done in a semi-automated fashion using regular expressions as applied in a text editor to a file containing all the lemmas of the daughter language. Benwing2 (talk) 02:14, 20 May 2023 (UTC)
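The semi-automated regular-expression cleanup mentioned above could look roughly like the following sketch. The wikitext shape it matches is an assumption (one `{{inh}}`, `{{der}}` or `{{bor}}` call followed by "present active infinitive of" and an `{{m}}` or `{{l}}` link); real entries are written in many other ways, which is why anything the pattern does not match is left untouched for manual review. The function name `shorten_etymology` is hypothetical.

```python
import re

# Assumed input shape (hypothetical; real entries vary):
#   {{inh|ast|la|venīre}}, present active infinitive of {{m|la|veniō}}
PATTERN = re.compile(
    r"\{\{(?P<tpl>inh|der|bor)\|(?P<lang>[^|{}]+)\|la\|(?P<inf>[^|{}]+)\}\}"
    r",\s*present active infinitive of\s*"
    r"\{\{(?:m|l)\|la\|(?P<lemma>[^|{}]+)\}\}"
)


def shorten_etymology(wikitext: str) -> str:
    """Collapse the two-template wording into one call that links to the
    lemma page while displaying the infinitive."""
    return PATTERN.sub(r"{{\g<tpl>|\g<lang>|la|\g<lemma>|\g<inf>}}", wikitext)


if __name__ == "__main__":
    before = (
        "From {{inh|ast|la|venīre}}, "
        "present active infinitive of {{m|la|veniō}}."
    )
    print(shorten_etymology(before))
```

Running the pattern over a dump of a daughter language's lemmas and reviewing the lines it leaves unchanged is one way to find the variants that need a different expression or a manual edit.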
That makes sense. A side point, but I'm not sure fiele really is from the ablative singular. Compare cuore, and note also that there's no geminate in fiele. I think it could be from the nom/acc form with a paragogic -e added.--Urszag (talk) 02:30, 20 May 2023 (UTC)
@Urszag I think you are right here. You also wouldn't expect the diphthongization from ablative felle. Benwing2 (talk) 03:31, 20 May 2023 (UTC)
Excellent- I agree with all of these principles.
As Urszag has noted, fiele derives from a Vulgar Latin *fele, as cuore from *core. An originally open stressed syllable is required to explain the diphthongization in Italian as well as French (fiel, cœur). I'll fix up those entries soon. Nicodene (talk) 03:32, 20 May 2023 (UTC)
Wouldn't it be desirable in those exceptional cases to explicitly state that they were derived from the ablative and if possible explain why? Precisely because Wiktionary isn't limited in space the way a book is, and it strikes me as a plausible reason why one would look up an etymology in the first place. —Caoimhin ceallach (talk) 21:21, 11 June 2023 (UTC)
To be clear, I'm not calling into question if the data are accurate, I'm just saying that assuming they are... If the data are inaccurate or misleading that is an obviously good reason to remove it. —Justin (koavf)TCM 03:19, 20 May 2023 (UTC)
It is accurate to say that Latin venīre is the present active infinitive of veniō. But I find it a little misleading to say that Spanish venir is "from Latin venīre, present active infinitive of veniō" because the Spanish verb as a whole is not just from the Latin infinitive; e.g. the form viene of the Spanish verb is from the third-person singular present active form venit. The infinitive of the Spanish verb is from the Latin infinitive form, but I think what we really are trying to say when we give this kind of etymology is that the Spanish verb as a whole is from the Latin verb as a whole (not excluding changes in the overall conjugational system), so it seems better to cite the Latin verb as a whole, using a lemma as its name, rather than explicitly specifying a particular form of the Latin verb.--Urszag (talk) 03:56, 20 May 2023 (UTC)
thumbs up emoji, makes sense. Gracias/grazie/obrigado/merci. —Justin (koavf)TCM 04:35, 20 May 2023 (UTC)
Full support. Verb etymologies should be for the verb as a whole, not the form we lemmatize under. Our etymology for venir shouldn't be for venir, but for the verb that venir is the dictionary form of. — SURJECTION / T / C / L / 09:48, 20 May 2023 (UTC)
I agree with the proposed changes. I'm happy to help with manual edits if someone makes a list of pages that need them. Ultimateria (talk) 18:09, 21 May 2023 (UTC)
I'd much rather link as {{inh|ast|la|venio|veniō, venīre}}, because it is more honest about what page it is linking to, and since we are a multilingual dictionary, which DLE, Treccani and TLFi aren't, I feel like that is important, although I would of course prefer the proposed way to the current "present active infinitive of" situation. Catonif (talk) 20:34, 21 May 2023 (UTC)
I agree with Catonif here, or maybe {{inh|ast|la|venio|venīre, veniō}}. AG202 (talk) 05:02, 22 May 2023 (UTC)
I prefer the order veniō, venīre, as I think that feels much more natural from the perspective of a student of Latin than the reverse. But I'm not sure whether I agree that this is preferable to just venīre. While I can imagine unsignposted redirects being annoying in some circumstances, I can't quite picture the issue with clicking on venīre and being redirected without foreknowledge to veniō.--Urszag (talk) 05:57, 22 May 2023 (UTC)
User shock. It will be particularly confusing if the browser is acting up. --RichardW57m (talk) 11:42, 22 May 2023 (UTC)
I don’t see this as a serious objection, because it would apply to any link using alt text. Users aren’t idiots. Theknightwho (talk) 15:55, 22 May 2023 (UTC)
Debatable. --RichardW57m (talk) 13:27, 23 May 2023 (UTC)
We already do this kind of thing. Should we change it everywhere? Vininn126 (talk) 15:57, 22 May 2023 (UTC)
I definitely think something like this should be given instead. Vininn126 (talk) 11:54, 22 May 2023 (UTC)
This would all be much easier if we just lemmatised Latin at the infinitive… Theknightwho (talk) 15:54, 22 May 2023 (UTC)
"This would all be much easier if we just lemmatised Latin at the infinitive…"
Why don't we?
Serious question.
It seems like the main reason we lemmatize Latin entries at the first-person active present is because that's what other references do. Is that a good enough reason?
I have no strong opinion either way -- it just seems like maybe Wiktionary has a different use case, which might not have been fully considered in the face of the inertia of past academia. It seems like we lemmatize at the infinitive for all the Latin daughter languages. It has always puzzled me that we don't do the same for Latin, and instead have these confusingly-worded etymologies. ‑‑ Eiríkr Útlendi │Tala við mig 20:22, 22 May 2023 (UTC)
See: Wiktionary:Beer parlour/2022/February § Changing the default citation form for Latin verbs. AG202 (talk) 20:30, 22 May 2023 (UTC)
I'd be happy to support this as an improvement over the longer wording. Ultimately my preference would just be to use the lemma (From Latin veniō) but I'm really willing to support any sensible change here. This, that and the other (talk) 23:52, 22 May 2023 (UTC)
Note, I wrote a script to clean up Latin etymologies to use the {{bor|es|la|venio|venīre}} (and {{bor|es|la|natio|nātiōnem}}) syntax and have run it on Spanish and French, with some help from User:Nicodene. I can't tell from the above discussion if there's any consensus to use this syntax or the {{bor|es|la|venio|veniō, venīre}} / {{bor|es|la|natio|nātiō, nātiōnem}} syntax. I have a certain preference for the former over the latter for various reasons, e.g. it is routine in headwords to create piped links that link non-lemma forms to their corresponding lemma, and I think a user unfamiliar with Latin conventions would have no idea what veniō, venīre means and would just get more confused than if presented with the infinitive alone. But if the consensus ends up favoring the veniō, venīre / nātiō, nātiōnem syntax, links of the shorter kind can be bot-converted to the longer kind. Benwing2 (talk) 05:28, 25 May 2023 (UTC)
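As an illustration of the bot conversion mentioned above, here is a minimal Python sketch that rewrites the shorter display form into the longer one. It is only a sketch under stated assumptions: `LEMMA_DISPLAY` and `to_long_form` are hypothetical names, the macron-marked lemma would in practice have to come from the Latin entry itself, and the regex handles only the plain four-parameter shape quoted in this thread, not nested templates or named parameters.

```python
import re

# Hypothetical lookup from Latin page name to macron-marked lemma.
# A real bot would have to derive this from the Latin entries.
LEMMA_DISPLAY = {"venio": "veniō", "natio": "nātiō"}

# Matches only the simple shape {{bor|es|la|venio|venīre}}; templates with
# extra or named parameters are deliberately left untouched.
SHORT_FORM = re.compile(
    r"\{\{(?P<name>bor|inh|der)\|(?P<lang>[^|{}=]+)\|la\|"
    r"(?P<page>[^|{}=]+)\|(?P<alt>[^|{}=]+)\}\}"
)


def to_long_form(wikitext):
    """Turn '...|venio|venīre}}' into '...|venio|veniō, venīre}}'."""

    def repl(match):
        lemma = LEMMA_DISPLAY.get(match["page"])
        alt = match["alt"]
        # Skip unknown lemmas and links that already show the lemma.
        if lemma is None or alt == lemma or alt.startswith(lemma + ", "):
            return match.group(0)
        return "{{%s|%s|la|%s|%s, %s}}" % (
            match["name"], match["lang"], match["page"], lemma, alt
        )

    return SHORT_FORM.sub(repl, wikitext)
```

Running the function a second time leaves already-converted links unchanged, so the conversion is safe to repeat.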

Quoting Wikipedia Diffs in Wiktionary: An Exception

Some words originate in or have a highly significant use in a Wikipedia diff. I propose changing WT:CFI's: "We do not quote other Wikimedia sites (such as Wikipedia), but we may use quotations found on them (such as quotations from books available on Wikisource)." to something like "We do not quote other Wikimedia sites (such as Wikipedia) except in cases of citogenesis, but we may use quotations found on them (such as quotations from books available on Wikisource)." Here are two examples of what I imagine this would look like in these two diffs: diff & diff. See also my comment here Wiktionary_talk:Criteria_for_inclusion#Potential_Exception_to_"We_do_not_quote_other_Wikimedia_sites_(such_as_Wikipedia)":_Brazilian_aardvark --Geographyinitiative (talk) 00:00, 24 May 2023 (UTC)

I thought you could quote non-DA material, just that it doesn't count for CFI. CitationsFreak: Accessed 2023/01/01 (talk) 01:19, 24 May 2023 (UTC)
Right, this rule can be teleologically reduced. The “literal meaning” of a formulated rule is not its only meaning. You have found cases of exceptions, so be it; reformulation to catch all cases is not always necessary, and formulated rules never catch everything directly. It doesn’t even literally say “we never quote” other Wikimedia sites, but “we do not quote”. It just states a principle, which may be outweighed by other reasons, derived from inclusion principles, like accordance with etymological accuracy. Fay Freak (talk) 11:03, 24 May 2023 (UTC)
Thanks for your replies. There's no "do not quote" wording for normal non-DA (durably archived) materials, but: there is "do not quote" wording for Wikipedia Diffs. Hence, I'm making a suggestion about (1) a potential reform or perhaps (2) mere rewording of some kind. I doubt that someone new who is quoting a bunch of Wikipedia Diffs will last long, and probably on the basis of this very "do not quote" language. If this is a bona fide exception, then what's to fear with making that explicit? Maybe: "We do not quote other Wikimedia sites (such as Wikipedia) except in extremely rare cases of citogenesis, but we may use quotations found on them (such as quotations from books available on Wikisource)." I can't imagine that I would make cites like these two ever again, but it might be valuable to someone else. --Geographyinitiative (talk) 12:24, 24 May 2023 (UTC) (Modified)
If the etymology claims the term seems to derive from Wikipedia, then whoso removes the Wikipedia diff cite must be called out for being stupid, in his literalist application of the rules, that simple. On another note, the etymology can literally link the diff anyway, so one can’t make a difference by believing one has to remove the quotation. Such quotations were never to be removed by bot either, but by humans. There have always been almost no cases where Wikipedia has been quoted anyway; I think somebody botted a short list a year ago after which editors could manually replace such bad quotes, like a dozen cases in twenty years. Fay Freak (talk) 14:08, 24 May 2023 (UTC)
Okay, well, thanks. I'm just putting this discussion out there such that whomsoever wants to do more work on these rare Wikipedia-originating words in future years has some background discussion to look at. --Geographyinitiative (talk) 15:01, 24 May 2023 (UTC)

Global ban for Leonardo José Raimundo

In accordance with the global ban policy, I am notifying you that I have started this global ban request. I welcome your participation there if you're interested. Best regards, Elton (talk) 12:49, 25 May 2023 (UTC)

He's never edited here, I think. CitationsFreak: Accessed 2023/01/01 (talk) 19:30, 25 May 2023 (UTC)
@CitationsFreak: Actually, he has, going back to 2015. He's limited himself to a very narrow semantic field- mostly books of the Bible- so you may not have run into his edits. He used to make weird edit summaries like he was announcing to the world his every move: "I am creating an entry for " or "I am translating" whatever he was working on. He also has always been extremely polite to the point of being stilted, and has left messages on admins' user pages wishing them a merry Christmas, etc. in a very formal and quaint manner. He's always been very careful to avoid any kind of conflict, and no one has bothered to systematically check the accuracy of his edits. Chuck Entz (talk) 03:26, 26 May 2023 (UTC)
Thank you for correcting me, Chuck! I'm gonna see if we already reverted the false stuff he's added. CitationsFreak: Accessed 2023/01/01 (talk) 03:46, 26 May 2023 (UTC)

Purposes of Wiktionary

Do we have a reasonable, ideally live, list of the legitimate purposes (and perhaps use cases) of Wiktionary? A list of non-purposes may also be useful for delimiting the boundaries. Recent discussions have come up against a number of questions:

  1. Should it be usable by idiots?
  2. Should it be usable for trying to understand a text by assembling translations or explanations word by word?
    1. For non-speakers to understand English texts
    2. For understanding other languages
  3. How much can we rely on the users having a good grammar to hand?
    • So is it enough to say that a term is the instrumental singular of a specified lemma?
  4. Should it be usable for generating at least a literal translation to English?
    • This includes the problem of uniformly adopting at least US or UK English (generating Australian or Canadian English might be too hard).

As a work in progress, Wiktionary will obviously have many shortfalls, but I'm not sure how much we want to deliberately build them into Wiktionary. (Obviously, we want to make it difficult for someone who can't type Devanagari to look up Sanskrit they may encounter, and Prakrit exhibits other obscurantist practices.) --RichardW57m (talk) 10:38, 26 May 2023 (UTC)

  • 1 and 2: yes. 3: not at all, but users develop an intuitive understanding of practical grammar. If a bookish user has serious grammatical questions, they can ask us or they can buy, beg, borrow, or steal a learner's grammar or a more academic one. 4: I suppose so. DCDuring (talk) 15:50, 26 May 2023 (UTC)
    On 1: We should be usable by idiots, but not by lazy idiots.
    On 4: We can't cater for people who don't speak English, so we assume that the user has at least a basic understanding of English or can use it to mediate between two languages. Thadh (talk) 16:25, 26 May 2023 (UTC)
1: Currently, we cater to people who are idiotic enough to be unable to figure out that poliorcética would mean poliorcetics, but it seems we do require them to know what poliorcetics means. Maybe our motto should be: Wiktionary, the obnoxious dictionary.
2 is impossible. Forget about it.
MuDavid 栘𩿠 (talk) 03:15, 29 May 2023 (UTC)
If only there were some way for a user to get to a full definition of poliorcetics from an FL term, without the FL entry contributor having to duplicate content already in Wiktionary, if only. DCDuring (talk) 17:38, 29 May 2023 (UTC)

Question 1 is relevant to the way pronunciations are presented on Wiktionary. Appendix:English pronunciation is called "less intelligible" diff but grudgingly allowed. What about more ad hoc stuff like was removed here diff (/tree-ZOOB/)? Would Saul B. Cohen's pronunciations for instance here: "CHOO-SHYONG" (Saul B. Cohen, editor (2008), “Chuxiong”, in The Columbia Gazetteer of the World, 2nd edition, volume 1, New York: Columbia University Press, →ISBN, →LCCN, →OCLC, page 802, column 2) be deleted on sight on Wiktionary? But no American high schooler can read IPA, that is to say tryzub#Pronunciation is worthless to a major potential audience, except for the audio clip. I read this as: to Wiktionary, prestige pronunciation schemes- those with words like "international" in them, which are "scientific"- are more important than actually aiding "the idiots" (read: the Americans) where they are at. So the answer is, truly, No. --Geographyinitiative (talk) 20:01, 29 May 2023 (UTC)

But the question was "SHOULD it be usable by idiots?" Maybe the questions and answers should be broken down to specific headers etc. In Pronunciation sections we accommodate 'idiots' by homophones, rhymes, audio clips, sometimes hyphenation. At my level of idiocy the stress indicators in the IPA are useful. DCDuring (talk) 22:57, 29 May 2023 (UTC)

I work with languages that can have dozens of dialects. To help me print lists of dialect names, I created the template {{lect}} some years ago, which allows for {{lect|language_code|dialect_code}}, with the added benefit of not having to type in the complete language code for each dialect with a parent language prefix, ex. {{lect|xme|abz|far|qoh|yar|cim|nar|jow|qal}} instead of {{lect|xme|xme-abz|xme-far|xme-qoh|xme-yar|xme-cim|xme-nar|xme-jow|xme-qal}}.
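The prefix shorthand described above can be sketched in a few lines of Python. This is not the code of the actual template or module, only an illustration of the described behaviour; `expand_lect_codes` is a hypothetical name, and the rule that already-prefixed codes pass through unchanged is an assumption.

```python
def expand_lect_codes(parent, codes):
    """Expand short dialect codes into full codes under a parent language.

    Codes that already carry the parent prefix are kept as they are
    (an assumption made for this sketch).
    """
    prefix = parent + "-"
    return [code if code.startswith(prefix) else prefix + code for code in codes]


short = ["abz", "far", "qoh", "yar", "cim", "nar", "jow", "qal"]
full = expand_lect_codes("xme", short)
```

Here `full` holds the same eight codes as the longer `{{lect|xme|xme-abz|...}}` call quoted above.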

The module was broken through some edits to a dependent module, outputting incorrect languages. I fixed it, but @Theknightwho reverted that edit in bad faith due to personal conflict, and has locked the module in a state where it is now not just outputting the wrong languages, but producing errors. -- {{victar|talk}} 17:09, 26 May 2023 (UTC)

@Victar I have no personal conflict with you. I told you that there was no consensus for this format, because it results in subdivided language codes that are ambiguous. You then decided to edit war over it, so I locked the page as I warned you I would do. I'm now going through entries correcting your ambiguous formatting. Theknightwho (talk) 17:11, 26 May 2023 (UTC)
Consensus from whom? This is how and why the template was created, with help and support from Erutuon. You simply came, disagreed, and took it upon yourself to remove that functionality. --{{victar|talk}} 17:37, 26 May 2023 (UTC)
Erutuon's contribution was very minor, and amounted to fixing a bug. You need consensus to subdivide language codes, because the entire issue is that it creates ambiguity; I have said that already. This is unexpected behaviour that goes counter to every other template on Wiktionary (as opposed to merely being a feature), and you haven't even documented it. After all, the whole issue you were complaining about was the fact that it was showing the wrong languages, which was precisely because of the ambiguity I am highlighting. Theknightwho (talk) 18:01, 26 May 2023 (UTC)
Haha - just to highlight how disingenuous it was to say Erutuon helped: here's a thread showing that Victar essentially made the decision to create Module:lect unilaterally, in the face of objections from Erutuon. Come on. Theknightwho (talk) 18:32, 26 May 2023 (UTC)
The collaboration with @Erutuon for this module was done through a sandbox and on discord. What you linked to is a discussion from two years prior to the creation of this module for a different concept, which had to do with {{alt}} and how it displays labels through Category:Dialectal data modules.
The module had been working as expected for years, sans problems calling up the correct language code. This "ambiguity" you're decrying is a non-issue. --{{victar|talk}} 18:58, 26 May 2023 (UTC)
The ambiguity exists, whether you acknowledge it or not. As conversations with you are inevitably fruitless, I don't see much point in continuing this discussion. Theknightwho (talk) 20:37, 26 May 2023 (UTC)
  • For context: Module_talk:lect ‑‑ Eiríkr Útlendi │Tala við mig 17:26, 26 May 2023 (UTC)
  • Hm, I never knew exactly what the template {{lect}} was for, which I have used but a few times after Victar informed me about its existence in one edit history; I thought it was something about making wikisourcecode and module behaviour cooler. So the main reason was as ridiculous as splitting language codes? Sorry, this unexpected behaviour gives me heebie-jeebies, the aesthetic gain seems not worth the existence of the template and module. Langcodes are sometimes dumb, but it is a non sequitur that one should be able to use jla instead of arc-jla after arc. And pree, I am already bare autistic about looksmaxxing, you don’t want to see me drip right now (or do you?)—but one shall not see a difference where there is none. Fay Freak (talk) 21:27, 26 May 2023 (UTC)
  • I might have more sympathy if there was any scrap of documentation anywhere saying what the template, the module, or any of their parameters are, what they're used for, or how they're supposed to work. This is a wiki. Anyone with the right permissions can modify anything. How is anyone supposed to know that your little private gizmo is going to be affected by seemingly unrelated changes elsewhere? The behavior of your template seems logical enough- but it's a different logic from all the other templates. I know you would rather that no one would ever touch anything you've done- but, again, this is a wiki.
Not that I think @Theknightwho is perfect. This actually is a great object lesson in what's wrong in their own approach, exaggerated to the point of absurdity. They should take a long, hard look in the mirror and think about how to avoid ending up like this. Chuck Entz (talk) 22:31, 26 May 2023 (UTC)
This is a concern that I've seen a ton as a third-party observer to a lot of these conflicts. I wish that folks would discuss more and reach a consensus rather than sectioning off parts of the wiki that they claim to own. AG202 (talk) 22:58, 28 May 2023 (UTC)

Selection of the U4C Building Committee

The next stage in the Universal Code of Conduct process is establishing a Building Committee to create the charter for the Universal Code of Conduct Coordinating Committee (U4C). The Building Committee has been selected. Read about the members and the work ahead on Meta-wiki.

-- UCoC Project Team, 04:21, 27 May 2023 (UTC)

Tons of synonyms in translation-style definitions

For example, the first sense of mindenképpen has nine more-or-less synonymous English glosses. My instinct is that this is unhelpful, because distracting, and superfluous, because you can find synonyms under the English entries, but is there a policy on this? The style guide doesn't really mention it. —Caoimhin ceallach (talk) 23:33, 27 May 2023 (UTC)

It's probably worse than misleading, because some of the glosses have more than one sense. Are all the senses covered by the Hungarian term? I don't know what the right way is to handle polysemy in FL L2 section gloss words, but I'm pretty sure that we're not doing it right. Our style guide is at best a suggestion. DCDuring (talk) 01:50, 28 May 2023 (UTC)
I don't think it hurts to have a lot of synonyms, provided they're accurate, because (a) that's what's expected for any good translation dictionary, (b) it helps to illustrate the range of meaning, and (c) when you're translating something and you're trying to think of the right word, it's helpful to see all the possibilities. That doesn't mean you have to list every possible translation, but you should list all the good ones. Many translation dictionaries I've used regularly list as many as a dozen translations for a single sense. Andrew Sheedy (talk) 06:06, 28 May 2023 (UTC)
"Many translation dictionaries I've used regularly list as many as a dozen translations for a single sense."
Especially helpful for language pairs that don't correlate very cleanly. ‑‑ Eiríkr Útlendi │Tala við mig 06:17, 28 May 2023 (UTC)
I also find it implausible that most users would read a list of English glosses and conclude that the term means all of them in every sense, which is what the objection by DCDuring seems to suggest. Rather, it represents the intersection of those terms. Theknightwho (talk) 19:49, 28 May 2023 (UTC)
I agree. Otherwise we would have to do without polysemic terms entirely and use only monosemic ones. That would be a rather tall order. PUC 20:23, 28 May 2023 (UTC)
It is only helpful if the definition gives you a way of choosing between the offered translations. I also feel that if a sense encompasses a wide range of meaning, it should be split up into sub-senses. —Caoimhin ceallach (talk) 22:17, 28 May 2023 (UTC)
Synonym clouds should generally be avoided - I think what should be done instead is that, if a word is hard to translate, you give a few English glosses and then a full explanation in {{gloss}}. Vininn126 (talk) 08:06, 28 May 2023 (UTC)
I have observed editors going beyond their understanding and making long translations less accurate than single word translations. Vox Sciurorum (talk) 17:30, 28 May 2023 (UTC)
I think giving a couple (2-3) translations is actually useful, because the overlap of the English terms usually eliminates the necessity for a gloss definition.
I.e. if I say "for", you won't understand what I mean, but if I say "for, since", you'll get a much better understanding. Thadh (talk) 18:26, 28 May 2023 (UTC)
I do think there are situations where one translation is the perfect one, but you can clarify it with {{gloss}}. If a word overlaps, use many, but I think that sometimes {{gloss}} is underutilized. Vininn126 (talk) 06:31, 29 May 2023 (UTC)
The correct answers have been given. While there is a “too much”, entries more naturally suffer from too few if there is a gloss of but one word, one which only corresponds in one sense or some senses and even those not wholly. How difficult is it to understand that a foreign term is not required to correspond to any particular English lexeme at all? The smallest and most common elements, like Russian -то which we don’t gloss directly, don’t: While organism names may have a specific referent so that one translation is best, other terms relate to the whole approach in which a speaker of a language structures his world. It is a most usual situation that a good translation is context-dependent, since word choice is conditioned by the environment, which engenders idiomaticity, more than by rationalized “senses”. If we aren’t idiomatic we aren’t convincing. But if we are idiomatic then we are untrue. So dictionary editors limber it up a bit, so there is no impression of foreign language being English. On a very basic level all languages fulfil the same purposes of expressing things, but then again humans have all kinds of discourses that due to their intertextual nature only make sense in one society, with terms meeting the ends of such disputes. Fay Freak (talk) 20:16, 28 May 2023 (UTC)
In general, there's nothing wrong with creating definitions out of a few overlapping synonyms. The problems arise when the meaning doesn't lend itself well to that approach. One could write whole essays about terms like Hebrew שָׁלוֹם and Spanish gracia (the first two that popped into my head) that overlap in part with some English terms, but not completely with any of them. There's also the danger of false friends and misleading etymological connections.
The example given (Hungarian mindenképpen) definitely has problems- after looking at it a dozen different ways for an hour or so, I'm still not sure what it means- but I'm inclined to think that the long list of synonyms is more a symptom of the problem than the actual problem itself.
To start with, "by all means" is something you would say if someone asked about whether doing something was a good idea. The idea is that the speaker is trying to encourage the other to do it by letting them know they like the idea and don't see any good reason not to. On the other hand, "at all costs" is saying that the speaker is emphatically insisting that the other person do it- not doing it would be a serious failure. One is positive with no coercion, while the other is negative and demanding. "In any event" is used after discussing various factors or possibilities to switch focus to what is said next. "No matter what" is saying that nothing could possibly be a reason not to do something, or that nothing will prevent something from happening/being true. Without going into the other terms, it's apparent that there are mutually exclusive connotations: something should/must/will/has to be done/happen/be true, let's change the subject/there's no need to discuss factors/they're irrelevant/they must be ignored.
This analysis isn't complete by any means, and could use further development/clarification, but I hope you get the idea. The synonyms seem to be chosen because they have some connection to whatever the central idea is, not to show the boundaries of it. I'm sure at least a couple are connected by having analogous parts, regardless of whether the actual meanings are related.
At any rate, I would contend that there are some cases where simply giving a list of English terms is not enough. In such cases, some explanation is required- either in the definition or in a usage note. Not that it happens often, but when it does, we shouldn't let general stylistic preferences get in the way of getting the idea across. Without knowing what mindenképpen actually means, I can't comment on whether that's the case here- but we seem to be discussing general principles here, anyway. Chuck Entz (talk) 01:37, 29 May 2023 (UTC)
"By all means" isn't necessarily positive. For instance, consider "to do something by all means necessary." Andrew Sheedy (talk) 22:44, 1 June 2023 (UTC)

Thank you for the replies. They've been helpful. I've expanded mindenképpen instead of slimming it down. —Caoimhin ceallach (talk) 20:28, 1 June 2023 (UTC)

Translations in Campidanese and Logudorese Sardinian

Some entries have translations in Campidanese and Logudorese Sardinian. However, these two appear to be etymology-only languages. What name and code should we use for translations in these languages/varieties?

Entries with translations in Campidanese: five, three, two

Entries with translations in Logudorese: five, gift, minute, three, two

Nakakaano (talk) 04:20, 29 May 2023 (UTC)

For varieties not considered to be independent languages, we do what is already done in those translation tables: the language code and the main header is for the language we do recognize, but with subheaders or glosses to show which is which. The language header in the entries is also for the main language, with labeling as to dialect on the definition line. I haven't tried it in this case, but I would imagine that using any etymology-only code in the {{t}}/{{tt}}/{{t+}}/{{tt+}} family of templates would cause a module error. I would also imagine that the translation-adder gadget would automatically format them the way I described, though I'm less sure about that.
As for whether to treat those dialects as separate languages, that would have to be discussed and consensus reached before we could do that. I can certainly see arguments for either approach. See WT:LANGTREAT for previous discussions. @Nicodene, -sche. Chuck Entz (talk) 04:53, 29 May 2023 (UTC)

Greek

Why are there only Greek and Ancient Greek entries?

French for example has Middle French and Old French.

Wouldn't it make more sense for Greek to have modern Greek, medieval Greek, Koine Greek and Ancient Greek? Because what we currently have seems too simplistic to me. Synotia (talk) 08:07, 29 May 2023 (UTC)

The periodization of Greek on Wiktionary was actually discussed recently (in March). It seems that some were concerned that, because of the conservative spelling, vocabulary and morphology of written Greek, separate medieval Greek entries might often be overly duplicative of the content in Ancient Greek entries. It also requires introducing another arbitrary boundary which might have associated inconveniences. Nevertheless, several Greek language editors did favor defining Medieval Greek as a separate language. Note that the current policy is that Koine Greek and Medieval Greek are covered on Wiktionary, just not as separate languages from "Ancient Greek".--Urszag (talk) 08:25, 29 May 2023 (UTC)

Japanese – bot script to remove yomi from Template:ja-pron

Hello, as we deprecated the yomi parameter from {{ja-pron}} about two months ago, and we have been managing a cleanup category to track all of the pages that still use this parameter, I believed it might be a good idea to write a bot script to automatically handle removing these arguments from existing entries. Currently, using it doesn't affect the rendering of the template anyway, but at least to me (and maybe new editors too) it was confusing to see it used in existing entries, but apparently to no effect; this would definitely make for confusing learning material for new editors.

The script can be seen here, if anyone would like to critique the code or find potential errors in it. So far, I have also run it on my main account to test that everything is in order; you can see some edits here:

Please let me know if you support the running of this script across the ~17000 entries we currently have to process; if so, then I can deploy the script whenever it's needed. @Erutuon @Nardog @Rdoegcd Thank you! Kiril kovachev (talk) 15:48, 29 May 2023 (UTC)

+1 in support. ‑‑ Eiríkr Útlendi │Tala við mig 21:37, 30 May 2023 (UTC)
@Kiril kovachev I am in support of this but I definitely think you should use mwparserfromhell rather than rolling your own regexes. It handles all sorts of edge cases properly (e.g. what if there are spaces before or after the yomi param name or its value, what if for some crazy reason the {{ja-pron}} template has a nested call to some other template with a yomi param in it, etc.). Also in terms of handling backup of pages in case your script goes awry, the way I normally handle this is to output a file containing, for each changed page, a unified diff of the change made. This uses up a lot less disk space and is more robust to people making unrelated changes to the same page after your script operates; when you just save the entire page contents, you would overwrite any subsequent changes upon restoring that version. Benwing2 (talk) 19:29, 2 June 2023 (UTC)
@Benwing2 Thank you for this advice; I wondered if there was a better way than the regex idea, but I didn't think it would be easy to use a parser, so I didn't originally make it that way; but mwparserfromhell looks like it'll make it rather easy in fact, so I'll re-write the script soon to use that as well. With regard to the backups, I was thinking of using the saved snapshot to manually restore just the Japanese section postedit by just using some string manipulation, extracting the pre-edit Japanese part and only replacing the Japanese section of the current entry; how would you go about using this "unified diff" system, i.e. how would you store, and then use, the diffs? Is there somewhere to read on this? At any rate a wise idea, I shall have to look into it. Thanks very much for your help, Kiril kovachev (talk) 20:04, 2 June 2023 (UTC)
I now updated the script, so it uses the parser and generates diffs using the Unix diff command. I was having various troubles with using Python's difflib, and although my code is using the filesystem to interact with the patch and diff commands, it is able to amply revert its changes when I run the revert script. If you have any ideas how this can be improved, since admittedly there is some amount of bodge associated with this here implementation, please let me know; but in any case, it's a good bit better now, I think. Here are some edits tested under the new code:
... and this one is a revert that undid my previous edit to the page, done using the undo script.
Kiril kovachev (talk) 22:18, 2 June 2023 (UTC)
@Kiril kovachev This is great, thanks for doing this. What issues did you run into with Python's difflib? This is what I use to generate diffs:
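      import difflib
      import sys

      if do_diff:
        pagemsg("Diff:")
        # Keep line endings so the diff reproduces the text exactly.
        oldlines = existing_text.splitlines(True)
        newlines = new.splitlines(True)
        diff = difflib.unified_diff(oldlines, newlines)
        dangling_newline = False
        for line in diff:
          # A diff line without a trailing newline means the text lacked one.
          dangling_newline = not line.endswith('\n')
          # Under Python 3, write the str directly (no .encode('utf-8')).
          sys.stdout.write(line)
          if dangling_newline:
            sys.stdout.write("\n")
        if dangling_newline:
          sys.stdout.write("\\ No newline at end of file\n")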

Benwing2 (talk) 05:51, 3 June 2023 (UTC)

@Benwing2 No problem, thanks for helping :) It wasn't any trouble to save the diffs, that much was fine. I was doing this myself:
             def save_diff(old_text: str, new_page: pywikibot.Page):
                 # The diff runs new -> old, so applying it to the new page text yields the old text.
                 with open(os.path.join(YOMI_BACKUP, new_page.title()), mode="w", encoding="utf-8") as f:
                     f.writelines(difflib.unified_diff(new_page.text.splitlines(True), old_text.splitlines(True)))
...maybe with some other code that I've since forgotten, but overall that same thing; the problem was that I didn't then know how to revert the changes anymore. Once the diff file is saved to disk, I no longer know how to load it back, nor what function to use to restore the content if I have to. There's `difflib.restore`, but I don't really get how this would work with the unified diff from before. Also, in what order should the diff be stored? My current setup uses the order new->old, meaning if I apply `patch` using the diff on the new page text, it generates the old page. In yours I notice it's the other way around... so I'm sure there is something more sensible I can be doing, but I'm not quite sure what. What do you do yourself with the diffs to process them? Kiril kovachev (talk) 13:41, 3 June 2023 (UTC)
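On the `difflib.restore` question: as far as the standard library goes, `difflib.restore` only understands the delta format produced by `difflib.ndiff` (or `Differ.compare`), not unified diffs. A minimal sketch of that round trip follows, using made-up sample text. Note the trade-off: an ndiff delta contains every line of both versions, so it saves no disk space compared with a full snapshot.

```python
import difflib

# Made-up sample texts standing in for the page before and after an edit.
old_text = "line one\nline two\nline three\n"
new_text = "line one\nline 2\nline three\nline four\n"

# ndiff produces a full delta that includes every line of both versions.
delta = list(difflib.ndiff(old_text.splitlines(True), new_text.splitlines(True)))

# restore(delta, 1) recovers the first sequence, restore(delta, 2) the second.
recovered_old = "".join(difflib.restore(delta, 1))
recovered_new = "".join(difflib.restore(delta, 2))
```

The order of the arguments to `ndiff` decides which side counts as 1 and which as 2, so the stored direction only matters for knowing which number to pass to `restore`.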
@Kiril kovachev There are maybe three ways of applying diffs: the patch command you're using, git, or a special library such as . I haven't actually implemented this functionality yet as I haven't needed it; the couple of times I needed to undo changes were long ago and I used the revert functionality in pywikibot. I was thinking of trying the Google library I just mentioned, but I haven't had a chance yet. I made the diffs go forward so I can read them more easily; you can feed such diffs to patch using the -R flag. It sounds like you have a good setup, so no need to change it. Benwing2 (talk) 21:17, 3 June 2023 (UTC)
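For a pure-Python route, here is a rough sketch of applying a forward unified diff either forwards or backwards without the external patch command. The helper name `apply_unified_diff` and the sample lines are hypothetical. It assumes the diff came from `difflib.unified_diff` over lines split with `splitlines(True)`, is held in memory as a list of lines, and matches the text exactly; there is no fuzzy matching or offset search as patch does, and no handling of "\ No newline at end of file" markers.

```python
import difflib
import re

# Matches a hunk header such as "@@ -3,7 +3,8 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def apply_unified_diff(lines, diff_lines, reverse=False):
    """Apply a difflib-style unified diff to a list of lines (endings kept).

    Hypothetical helper. With reverse=True the diff is applied backwards,
    turning the "new" text back into the "old" text.
    """
    # Lines with the `keep` tag are emitted; lines with `drop` are consumed.
    keep, drop = ("-", "+") if reverse else ("+", "-")
    diff_lines = list(diff_lines)
    result, pos, i = [], 0, 0
    # Skip the "---" / "+++" file headers before the first hunk.
    while i < len(diff_lines) and not diff_lines[i].startswith("@@"):
        i += 1
    while i < len(diff_lines):
        match = HUNK_HEADER.match(diff_lines[i])
        if match is None:
            raise ValueError(f"malformed hunk header: {diff_lines[i]!r}")
        old_start, old_len, new_start, new_len = match.groups()
        start = int(new_start if reverse else old_start)
        length = new_len if reverse else old_len
        length = 1 if length is None else int(length)
        # For an empty range, the start number is the line before the hunk.
        hunk_pos = start - 1 if length else start
        result.extend(lines[pos:hunk_pos])  # copy untouched lines
        pos = hunk_pos
        i += 1
        while i < len(diff_lines) and not diff_lines[i].startswith("@@"):
            tag, body = diff_lines[i][:1], diff_lines[i][1:]
            if tag == keep:
                result.append(body)
            elif tag in (" ", drop):
                # Context and dropped lines must match the text exactly.
                if pos >= len(lines) or lines[pos] != body:
                    raise ValueError(f"diff does not match text at line {pos + 1}")
                if tag == " ":
                    result.append(body)
                pos += 1
            else:
                raise ValueError(f"unexpected diff line: {diff_lines[i]!r}")
            i += 1
    result.extend(lines[pos:])  # copy whatever follows the last hunk
    return result


# Made-up sample data: a forward (old -> new) diff used in both directions.
old = ["a\n", "b\n", "c\n"]
new = ["a\n", "B\n", "c\n", "d\n"]
forward_diff = list(difflib.unified_diff(old, new))
patched = apply_unified_diff(old, forward_diff)
restored = apply_unified_diff(new, forward_diff, reverse=True)
```

Because the sketch insists on exact matches, it raises an error instead of guessing when someone has edited the same region of the page in the meantime; the patch command is more forgiving there.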
@Benwing2 Alright, thanks, that's great. Maybe I'll try out the library and see if it works for me. Is there anything else to do script-wise? I suppose we should wait for more opinions before launching this as well anyway. Kiril kovachev (talk) 22:18, 3 June 2023 (UTC)
@Kiril kovachev All sounds good to me. In my scripts I have a general library that handles a lot of tasks that are repeated across scripts, but you can develop this over time as you write more scripts. Since this change seems fairly non-controversial I'm not sure you need to wait for more opinions. Benwing2 (talk) 20:23, 4 June 2023 (UTC)
@Benwing2 Right, if I find more tasks to get done via bot then I imagine I'll do the same—thanks for that idea. Anyway, if you believe everything's in order, could you please unblock my bot account (User:KovachevBot)? If that's fine then I'll do a few more manual tests and then get things going. Kiril kovachev (talk) 17:48, 5 June 2023 (UTC)
@Benwing2 Kiril kovachev (talk) 09:51, 12 June 2023 (UTC)
@Kiril kovachev Apologies, I missed your ping from a week ago. Normally to enable a new bot account you need a one-week vote; go to WT:Votes and click on "Start a new bot vote". When I did that, it gave a default two-week voting period, but the text at the top says one week, so I'd change the end date to be one week out. I don't think you'll have much trouble getting your bot approved, since we've already discussed the changes you're going to make and I at least am confident in your technical abilities and carefulness. Benwing2 (talk) 17:06, 12 June 2023 (UTC)
No worries, I think I didn't send the ping properly the first time, so my mistake. I missed the part where I needed to create a vote, sorry about that—it's now been done, though. Could you please check if everything's okay with that? Is there anyone I should notify now, or will the vote get seen on its own? And thanks very much for your confidence. Kiril kovachev (talk) 19:06, 12 June 2023 (UTC)
In fact, I forgot to send it again @Benwing2 :) Kiril kovachev (talk) 19:07, 12 June 2023 (UTC)
@Kiril kovachev I don't think it quite worked; it looks like you edited the bot creation template rather than actually creating a vote. The vote also needs to be added to the list of active votes. Benwing2 (talk) 19:54, 12 June 2023 (UTC)
@Benwing2 Oh dear. Please forgive my lack of reading. I didn't see the part where you need to fill in the rest of the title before hitting the button, so it just started that page with no name filled in... In retrospect this is really a silly mistake... sorry about that. Now, I've created Wiktionary:Votes/bt-2023-06/User:KovachevBot for bot status and put it in the vote list – it seems quite out of the way, though, as it's right at the bottom of the page. Do these things get seen on their own, or should I make an effort to attract any interested parties to weigh in on it? I'm unfamiliar overall with the voting process, since I've not experienced one before nor gone to vote myself.
I also apologise if my edit has caused any trouble for you, although fortunately it doesn't look to have done anything to the bot-vote creation form, so please feel free to delete that extraneous page if you'd like. Kiril kovachev (talk) 20:30, 12 June 2023 (UTC)
No trouble whatsoever. The list of active votes shows up at the top of the watchlist whenever you visit Special:Watchlist so I think most people will see it. (I happen to be someone who doesn't use watchlists so sometimes I miss votes but I think I'm in the minority.) Benwing2 (talk) 20:36, 12 June 2023 (UTC)
@Kiril kovachev Benwing2 (talk) 20:36, 12 June 2023 (UTC)
Alright, that's good to hear – thanks very much for your help! Kiril kovachev (talk) 20:48, 12 June 2023 (UTC) @Benwing2

Hebrew verb conjugation tables

I have created a new topic on the About Hebrew talk page in order to discuss giving more parts of a verb in Hebrew conjugation tables. This has been raised in the past, but no resolution was ever reached. I'd appreciate thoughts if this is of interest to anyone. SaryaniPaschtorr (talk) 00:27, 30 May 2023 (UTC)

Looser CFI for dialects

Recently, a vote to treat dialects as LDLs failed for fear that the LDL standards are too permissive, but there was a consensus that the current criteria are too strict. I think we should adopt a more conservative approach of considering individual dictionaries: for example, the entries in the English Dialect Dictionary and the Dictionary of American Regional English fall within the spirit of "all words in all languages" in my view even if we can't find three uses, since these are scholarly works compiled through extensive fieldwork, representing dialects spoken by thousands or millions of people, rather than just being collections of nonce words. Would anyone oppose this (or like to suggest other dictionaries)? Ioaxxere (talk) 01:31, 31 May 2023 (UTC)

These words spoken by thousands or millions of people are already inclusible, since by this token they have sufficient use, as witnessed by such a dictionary. We don’t need to directly see or be able to see any uses; it may suffice that we can assume a secret military record containing the term. Fay Freak (talk) 03:08, 31 May 2023 (UTC)
The point of the vote was not strictly to treat all dialects/lects as LDLs, but to allow groups of editors to decide which ones should be, just for the record. Vininn126 (talk) 09:18, 31 May 2023 (UTC)
Some terms in dialect dictionaries are common, some are very local. I do not support bulk imports from dialect dictionaries. If we can't find it in use and we can't reference it in an etymology, let it remain buried. Vox Sciurorum (talk) 18:56, 31 May 2023 (UTC)
Are terms less deserving of mention because they happen to be part of an obscure regional dialect of a "well-documented" language, instead of being part of a limited documentation language? That doesn't seem reasonable. Theknightwho (talk) 19:39, 31 May 2023 (UTC)
The quantity of documentable terms for limited documentation languages is far smaller. I am willing to discriminate against Shropshire, Pittsburgh, and the rest of the provinces. Vox Sciurorum (talk) 19:45, 31 May 2023 (UTC)
The original idea of the vote was to allow some dialects to be LDLs while others had to meet the standard 3 citations - is that more or less what you're saying? Vininn126 (talk) 19:46, 31 May 2023 (UTC)
You can’t really make a blanket statement like that, given the overwhelming majority of languages on Wiktionary are LDLs. Plus, no, let’s not discriminate against the “provinces”. Theknightwho (talk) 19:54, 31 May 2023 (UTC)
@Vox Sciurorum What about in cases in which we have 1 or 2 uses? Ioaxxere (talk) 00:10, 1 June 2023 (UTC)