Wiktionary:Beer parlour/2023/December

Zero width joiners in Sinhala script?

@RichardW57m has brought to my attention that almost 8 years ago I had created a faulty අද්ධානමග්ග instead of අද‍්ධානමග‍්ග, which I promptly deleted; after using a unicode tool to break down the codepoints I have noticed U+200D (zero-width joiner; ZWJ for short) in the correct spelling, and indeed the Wikipedia page confirms that the ZWJ should be there. So I've decided to bring it up here and ask whether the ZWJ are all mandatory and if it is possible to check for other faulty (misformatted) entry titles. --kc_kennylau (talk) 11:45, 1 December 2023 (UTC)
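A minimal sketch of the check asked about above, assuming the titles are available as plain Python strings. It breaks a title into code points and reports every AL-LAKUNA (U+0DCA) that has no ZWJ (U+200D) next to it. The helper names are hypothetical, and a hit is only a candidate for human review: consonant-stem citation forms that end in AL-LAKUNA, and entries in languages other than Pali, could be flagged without being wrong.

```python
import unicodedata

ZWJ = "\u200d"        # ZERO WIDTH JOINER
AL_LAKUNA = "\u0dca"  # SINHALA SIGN AL-LAKUNA


def codepoints(text):
    """Return a (U+XXXX, character name) pair for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def bare_al_lakuna_positions(title):
    """Return the indices of each AL-LAKUNA with no ZWJ directly before or after it."""
    hits = []
    for i, ch in enumerate(title):
        if ch != AL_LAKUNA:
            continue
        before = title[i - 1] if i > 0 else ""
        after = title[i + 1] if i + 1 < len(title) else ""
        if ZWJ not in (before, after):
            hits.append(i)
    return hits
```

Running `bare_al_lakuna_positions` over a list of Sinhala-script Pali titles and keeping those with a non-empty result would give a review list, not a list of confirmed errors.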

So far as I am aware, AL-LAKUNA should never be visible in Pali, but always be hidden. However, most fonts don't support the code for touching consonants (ZWJ, AL-LAKUNA), and support for conjuncts (AL-LAKUNA, ZWJ), even those used in Pali, is fairly limited. (I had to extend the LKLUG font to proofread what I'd typed.) I'll endeavour to check the lemmas and non-lemmas tonight - there are embarrassingly few of them. (I don't read the script accurately - there are far too many confusables for me.)

It does look as though parts of the Sinhala Tipitaka edition I use were printed using a font that was missing a few Pali conjuncts. To add to the complications, some Pali conjuncts only occur in sandhi compounds, so a simple search through a lexicon may miss some. The Pali conjuncts I've found so far are listed in sinh_cjct in Module:pi-Latn-translit or have 'r' or 'y' as second element.

Adding conjuncts for Sinhala and Sanskrit is a work in progress - the Tipitaka, or at least, the Sinhala version, contains a few snippets of what looks like Sanskrit, so the conjuncts listed in the introduction may include some that aren't used in Pali. --RichardW57m (talk) 13:28, 1 December 2023 (UTC)

@Kc kennylau: Note that the citation forms of Pali lemmas aren't necessarily Pali, but may be Sanskritists' abstractions. We write consonant stems ending in AL-LAKUNA, although that is not the native tradition. --RichardW57m (talk) 13:40, 1 December 2023 (UTC)

@Kc kennylau: I found three apparently mistyped lemmas අහිඡත්තක (ahichattaka), ඉස්ස (issa), ද්වාර (dvāra), apparent errors for අහිඡත‍්තක (ahichattaka), ඉස‍්ස (issa) and ද‍්වාර (dvāra), all entered in 2016 by Lo Ximiendo. @Apisite, can you defend these entries of yours? --RichardW57m (talk) 15:16, 1 December 2023 (UTC)

So what's exactly the difference between the two sets of entries? --Apisite (talk) 19:36, 1 December 2023 (UTC)

@Apisite: With a Pali-capable font, the first set shows the p-like form of AL-LAKUNA, as the elements of the consonant cluster are joined by <U+0DCA>, while in the second the consonants touch, as they are joined by <U+200D U+0DCA>. Most fonts, which aren't Pali-capable, just display the form with AL-LAKUNA visible.
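The three encodings mentioned in this thread can be told apart mechanically. The sketch below is an illustration only, with a hypothetical function name. It labels each AL-LAKUNA in a string as "touching" (ZWJ, AL-LAKUNA), "conjunct" (AL-LAKUNA, ZWJ) or "visible" (AL-LAKUNA alone), following the sequences described above. It says nothing about what a given font will actually render.

```python
ZWJ = "\u200d"        # ZERO WIDTH JOINER
AL_LAKUNA = "\u0dca"  # SINHALA SIGN AL-LAKUNA


def classify_joins(text):
    """Label each AL-LAKUNA in text by the join sequence it takes part in."""
    kinds = []
    for i, ch in enumerate(text):
        if ch != AL_LAKUNA:
            continue
        if i > 0 and text[i - 1] == ZWJ:
            kinds.append("touching")   # <U+200D U+0DCA>
        elif i + 1 < len(text) and text[i + 1] == ZWJ:
            kinds.append("conjunct")   # <U+0DCA U+200D>
        else:
            kinds.append("visible")    # bare <U+0DCA>
    return kinds
```

A cluster that mixes both kinds of join, such as a touching pair followed by a conjunct, yields one label per AL-LAKUNA in order.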

We can also add චිත්ත (citta, “mind”) for චිත‍්ත (citta) to the list. You can see an inflected form of the word (චිත‍්තං (cittaṃ)) at the far right of the second paragraph (end of verse 1) on p82 (the 97th page of the PDF) of http://www.aathaapi.net/tipitaka/10.OTSPDN1_Deegha_Nikaya_1.pdf. --RichardW57 (talk) 17:41, 2 December 2023 (UTC)

@Apisite, Kc kennylau: I've now RfVed the four words. --RichardW57m (talk) 17:59, 4 December 2023 (UTC)

@RichardW57m: Where can any Pali-capable fonts be found? Such fonts ought to be mentioned in Wiktionary:About Pali. --Apisite (talk) 22:22, 4 December 2023 (UTC)

@Apisite: Agreed. I thought I had mentioned the one I made, but I remember being very cautious not to be considered to be advertising. I've now added a link there to my font LKLUG_T, but I should re-investigate. Iskoola Pota seems to be optional for Windows, and it may work for Pali beyond Windows 7 - I'm not sure that I have any means of testing the relevant version at my disposal. HarfBuzz has had problems with Iskoola Pota for clusters starting with 'r', but (a) they may have been using too old a font, (b) the problems may have been solved, and (c) such clusters are rare to non-existent in Pali. (Trouble might come with registers of Pali that freely use Sanskrit words.) --RichardW57m (talk) 11:18, 5 December 2023 (UTC)

Safari on iPhone renders 3 of the 4 words here correctly - it fails on ද‍්වාර (dvāra), possibly because the font authors assumed that the cluster dv doesn't exist in Pali. RichardW57m (talk) 11:18, 5 December 2023 (UTC)

Noto Sans Sinhala is Pali-capable except for the rare cluster -ntv-, which occurs in ගන‍්ත්‍වා (gantvā), an absolutive of gacchati which we haven't documented yet. It also supports all the conjuncts listed in the introductory matter of the volumes of the BJT edition of the Tipitaka, except:

(a) The conjunct ඤ්‍ඡ (ñcha)

(b) The conjuncts looking like the Sinhalese letters ඟ (ⁿga), ඦ (ⁿja) ඬ (ⁿḍa) and ඹ (ᵐba); perhaps there is some ordinance that these conjuncts be encoded as the letters.

It's more readable than LKLUG(_T). --RichardW57 (talk) 13:23, 7 December 2023 (UTC)

@Apisite: An example of the word ගන‍්ත්‍වා (gantvā) occurs as the 11th word of the first paragraph on p88 of http://www.aathaapi.net/tipitaka/10.OTSPDN1_Deegha_Nikaya_1.pdf. In that volume, the sequence 'ntv' only occurs in that word and two of its compounds. --RichardW57 (talk) 14:35, 7 December 2023 (UTC)

The problem with the missing conjunct that I vaguely remembered was handled by using the touching form සන‍්ථව (santhava) instead of the conjunct form සන්‍ථව (santhava, “acquaintance”). There weren't any al-lakunas. --RichardW57m (talk) 15:16, 1 December 2023 (UTC)

Future of Northern Wu in Wiktionary

I would first like to thank @Wpi, @Manishearth, @Musetta6729, et cetera for the successful implementation of Wugniu for Shanghainese. I believe that, now that sufficient time has passed since its implementation, we should discuss prospects for further implementations.

Firstly, the full deprecation of the legacy Shanghainese romanisation ought to commence sooner rather than later. Assuming Wugniu to Wiktionary conversions are fully working (so far, no issues have been found since September), an automated conversion of pronunciation modules from |w= to |w=sh: could happen in the near future. I am aware that bot functions may be compromised (@Fish bowl, etc), and so it would be greatly appreciated for any affected code to be updated as soon as possible.

Secondly, a relatively more minor issue, raised by Musetta: certain users may have wrongly assumed that the legacy "ny" initial (Wugniu "gn") in Shanghainese did not need an -i- glide. This is, in fact, incorrect. Would it also be possible to have this be fixed by bots/module?

Thirdly, relevant information for a potential |w=sz: Suzhounese pronunciation module has been written out (User:ND381/Wu Expansion#Suzhounese). Aside from a disyllabic checked tone sandhi and a drastically simpler right-prominent sandhi, most of the code could potentially just be taken from the Shanghainese module. It would be greatly appreciated if the code for this could be written when anyone has the time.

PS: Since most other Suzhounese analyses do not pretend Suzhounese tone sandhi is this simple, perhaps the tone should be written down for every syllable (with the tone number preceding the initial and final, ie ¹ciau-¹kue 交關). This is relatively minor compared to the rest of this, but worth noting.

Once again thank you for the relatively placid transition into Wugniu for Shanghainese and comments would be greatly appreciated.

(@Justinrleung, RcAlex36; @Atitarev, Thedarkknightli, ChromeGames, Mteechan for the Wu stuff) — ND381 (talk) 20:53, 1 December 2023 (UTC)
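As a rough illustration of the first step of the |w= to |w=sh: migration proposed above, the sketch below only finds |w= values that do not yet carry the sh: prefix. It does not attempt the romanisation conversion itself. The regular expression, the function name and the sample wikitext are assumptions made for the example, and a real bot would need to restrict the search to the pronunciation template.

```python
import re

# Matches a |w= parameter up to the next pipe, closing brace, or newline.
W_PARAM = re.compile(r"\|w=([^|}\n]*)")


def unprefixed_w_values(wikitext, prefix="sh:"):
    """Return the non-empty |w= values that do not start with the prefix."""
    values = (value.strip() for value in W_PARAM.findall(wikitext))
    return [value for value in values if value and not value.startswith(prefix)]
```

Pages where this returns an empty list would already be migrated or have no Wu reading; the rest would still need the actual legacy-to-Wugniu conversion.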

With regards to the Suzhounese tone notation:

In my opinion it might be better to either notate the tone sandhi units and boundaries simply (ie 交關 ¹ciau-kue), or, if we are to notate citation tones, add them as subscript characters (maybe something like 交關 ¹ciau₁-kue₁).

Unlike the usual analysis for urban Shanghainese, Suzhounese phrase-sandhi can not be attributed solely to the citation tone of the beginning syllable (eg the word 從前 zon₂ zie₂ has tone 6 sandhi instead of the tone 2 of the first syllable), so a model that generates something like 從前 ⁶zon-²zie would likely introduce problems in that regard. If we were to mark citation tones in the notation I think then this should be applied to all syllables (ie 從前 ⁶zon₂-zie₂).

— Musetta6729 (talk) 14:24, 6 December 2023 (UTC)
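The two notations proposed above can be generated from the same data. The sketch below assumes a sandhi unit is given as one sandhi tone number plus a list of (romanisation, citation tone) pairs. The function name and data layout are hypothetical and not taken from any existing module.

```python
SUPERSCRIPT = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def format_sandhi_unit(sandhi_tone, syllables, mark_citation=False):
    """Format one sandhi unit.

    sandhi_tone: tone number governing the whole unit (written as a superscript prefix).
    syllables: list of (romanisation, citation_tone) pairs.
    mark_citation: if True, append each citation tone as a subscript.
    """
    parts = []
    for roman, citation in syllables:
        if mark_citation:
            parts.append(roman + str(citation).translate(SUBSCRIPT))
        else:
            parts.append(roman)
    return str(sandhi_tone).translate(SUPERSCRIPT) + "-".join(parts)


print(format_sandhi_unit(1, [("ciau", 1), ("kue", 1)]))                    # ¹ciau-kue
print(format_sandhi_unit(6, [("zon", 2), ("zie", 2)], mark_citation=True)) # ⁶zon₂-zie₂
```

Keeping the sandhi tone separate from the citation tones is what lets the 從前 case be written at all, since its sandhi tone (6) differs from the first syllable's citation tone (2).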

@ND381: What do you think of thanking me and others for exporting pronunciations of Wu Chinese and other kinds of Chinese to the Chinese Edition of Wiktionary? --Apisite (talk) 23:32, 1 December 2023 (UTC)

Mass deletion by Equinox

Hi! Equinox recently went on a massive deletion spree. I can't blame the poor guy for that. Naturally, many decent entries also got deleted. Would someone be able to recreate all the Spanish stuff that got lost? The English stuff is good too, but I personally don't care about plurals. Denazz (talk) 22:32, 3 December 2023 (UTC)

A lot of the Spanish stuff shouldn’t be entries… “lavar la cara” is not a worthy entry. Two of your accounts have recently been blocked for this type of stuff, I’d avoid creating similar entries. This is precisely what I said would happen with that unfortunate vote too. CC: @Benwing2 since you also work in Spanish. AG202 (talk) 02:56, 4 December 2023 (UTC)

@AG202 Yup, I saw this. I scanned some of the deleted entries but unfortunately it's not so easy to scan them quickly; but the ones I saw were mostly stub-type articles. Wonderfool: If you want these entries to stick I'd suggest making a page in userspace containing the entries you want to add, all concatenated one after another with definitions, inflections, etc., so someone like me can quickly review them. Benwing2 (talk) 03:12, 4 December 2023 (UTC)

Yeah, I'm not gonna do that userpage concatenation, Ben. It would take forever to enter the mainspace. Remember that all excellent entries start off as stubs Denazz (talk) 09:33, 4 December 2023 (UTC)

@Denazz That last sentence is definitely not true. An entry can start off great. Vininn126 (talk) 09:34, 4 December 2023 (UTC)

Fine, most excellent articles start off as stubs Denazz (talk) 09:36, 4 December 2023 (UTC)

:-/ Please put in more effort than that. AG202 (talk) 12:39, 4 December 2023 (UTC)

In my experience, almost never the case. Stub entries stay stub entries, good entries become better. Thadh (talk) 14:58, 5 December 2023 (UTC)

The redlinks on User:JeffDoozan/lists/es/drae_link_missing_autofix are the deleted Spanish entries that are at least corroborated by DRAE, which does include "lavar la cara" in the sense of giving something a once-over. I can't see the deleted articles so can't comment on their quality or whether it's worth some admin's time to undelete them. JeffDoozan (talk) 13:19, 5 December 2023 (UTC)

@JeffDoozan This one is defined as "# (idiomatic) to patch up; to wipe off". I took a look at a few others; papel de cigarro is defined as "# (dated) paper cigar (cigarette)" which might be SOP; disco de freno is defined as "# brake disc" which is probably SOP. Some of the others may not be. Benwing2 (talk) 20:19, 5 December 2023 (UTC)

"Papel de cigarro" looks like it should mean 'cigarette paper', as in, the actual paper used to roll up the stuff in a cigarette. I'd have RFV'd it if the page were still there. MedK1 (talk) 15:21, 7 December 2023 (UTC)

@MedK1 Agreed, and Googling seems to confirm that "papel de cigarro" means rolling paper. Maybe Wonderfool was thinking of cigarro de papel and got confused? Benwing2 (talk) 21:50, 7 December 2023 (UTC)

Some of the deleted articles seem like good articles indeed, like botón de fuego, de poco pelo and huelga de cielo (among others). With my rudimentary Spanish, I have no idea what any of these could mean. MedK1 (talk) 15:14, 7 December 2023 (UTC)

This wonderfool fellow seems to enjoy an awful lot of slack. I feel as though if I were to make hundreds of deletable articles and literally six hundred sockpuppet accounts I'd probably be blocked for life without further consideration. Where's the justice? AP295 (talk) 09:30, 10 December 2023 (UTC)

Actually, Wonderfool was blocked for life, too, on multiple occasions. But the blocks were never range blocks, which would have been highly effective at stopping their participation. Another excellent tool to combat Wonderfool's participation would be to revert everything they've ever done, but that's only been used a couple of times as a last resort, and by Equinox, who was probably drunkenly editing at 4:00 in the morning. However, the general opinion is that the good edits outweigh the bad ones, hence the slack. Some argue that Wonderfool's good humour and lack of personal attacks against fellow users are points in their favour, too. Denazz (talk) 10:58, 10 December 2023 (UTC)

Seems a bit capricious to pay out so much slack just on 'good humor'. I don't know this user and have no complaint against them, it just struck me that they're remarkably well-tolerated. AP295 (talk) 11:07, 10 December 2023 (UTC)

Mostly well-tolerated. WF gets their share of angry messages of disapproval, too. Generally of the "I had to clean up your shit again" tone. To which WF could easily reply "I've cleaned up more shit on Wiktionary than anyone else", but they're too humble and polite to do so. Denazz (talk) 11:12, 10 December 2023 (UTC)

Six hundred sockpuppets, the perfect picture of humbleness and humility. I guess I'll have to take your word for it. AP295 (talk) 11:28, 10 December 2023 (UTC)

@AP295 you are conversing with Wonderfool himself 🎊 Word0151 (talk) 15:29, 10 December 2023 (UTC)

What an honor. AP295 (talk) 15:36, 10 December 2023 (UTC)

We're approaching 8 million entries and we have fewer contributors than Wikipedia. Slack is all about net value of contributions to the project. You don't really understand our Criteria for inclusion, and you've created 7 rather mediocre entries, but you've wasted huge amounts of contributors' time commenting on stuff you don't understand very well. At the current rate, you might reach a tiny fraction of WF's usefulness to the project sometime in the next century. Chuck Entz (talk) 16:05, 10 December 2023 (UTC)

A glowing testimonial. Feel free to ignore me if you'd like. That's your prerogative, but don't act like I'm being obstructive or holding up the process. AP295 (talk) 16:08, 10 December 2023 (UTC)

From what I have seen you generally have been. Very little mainspace contribution while absolutely adoring controversy in fora. Vininn126 (talk) 16:09, 10 December 2023 (UTC)

Again, nobody's forcing you to reply. You don't have to say a word to me unless you feel like having a chat, I can't make you and wouldn't want to if I could. AP295 (talk) 16:13, 10 December 2023 (UTC)

Public fora aren't a place to place a bunch of messages that no one should reply to. If everyone took that approach you could post endlessly and we could just say "eh, no need to reply!". They are for people to discuss, as hard as that is to believe. Please listen to the large number of people telling you you are being disruptive instead of doubling down. If everyone is telling you you are being disruptive, it's probably not from nowhere. Vininn126 (talk) 16:15, 10 December 2023 (UTC)

Why would I reply if nobody wanted to have a conversation? I posted one topic in the beer parlour and nearly everyone pitched a fit about it, for no discernible reason. I've dropped the issue, even though I still feel it was a good point and a good-faith suggestion to improve the place. You don't need to be so sour over it. God. AP295 (talk) 16:19, 10 December 2023 (UTC)

Okay bud. Vininn126 (talk) 16:20, 10 December 2023 (UTC)

@AP295 Please stop projecting your own insecurities onto other users - it's extremely rude. Theknightwho (talk) 01:15, 11 December 2023 (UTC)

Alright... it was clear already in the November BP thread (to quite a diverse range of users!) that AP was an entryist troll not here to build a dictionary, but I held off on giving him the NOTHERE block he was angling for because a minority of people seemed to think he might actually be OK. When the trolling in that thread resumed, I again found myself considering giving him the NOTHERE block he was after, but held off in the hope of seeing a miraculous personality / editing-behaviour change, especially once it seemed like he might've pivoted to making some possibly-useful contributions, though I thought (and still think) that not blocking him would be consigning ourselves to have to monitor the user's edits for the trolling/POV he's clearly inclined to introduce (and making/letting him learn to start diluting that POV into useful edits instead of virtually all of his contributions being to a few trolling discussions, only makes it harder to monitor and get rid of; IMO it'd be better to just be done with it). Seeing yet more trolling, I am upgrading the block Fenakhay implemented for disruptive editing from 2 weeks, to indefinite: AP is clearly not here to build a dictionary and it's better to just block and be done with it. - -sche (discuss) 16:01, 12 December 2023 (UTC)

Unlikely that the motive was to troll. Word0151 (talk) 18:00, 12 December 2023 (UTC)

I can't help but agree here. Sure, the dude's abrasive, rude sometimes and doesn't seem to have made the best edits, but considering his actions "trolling" seems a bit much? MedK1 (talk) 01:24, 18 December 2023 (UTC)

WP:DE addresses this issue really well: "Disruptive editing is not always intentional. Editors may be accidentally disruptive because they don't understand how to correctly edit, or because they lack the social skills or competence necessary to work collaboratively. The fact that the disruption occurs in good faith does not change the fact that it is harmful". Theknightwho (talk) 01:35, 18 December 2023 (UTC)

Just noting for the record in case AP tries to come back that it was brought to my attention that the user was indef blocked on Wikipedia in 2021 for the same reasons. - -sche (discuss) 16:38, 11 February 2024 (UTC)

I didn't "go on a massive deletion spree". I punished you the only way that is possible, which is deleting your work. I hate it too mate, but every single time you either make low-quality mistakes or deliberately troll. Merry Christmas. Equinox ◑ 02:33, 19 December 2023 (UTC)

Character names

I think letter/character names deserve a special treatment with the Etymology section simply directing to their respective origins. Otherwise be#Etymology 3 should be multiplied to all characters whose name in English/Latin is "be" (see this). The same goes for all other character names. Specifically Chinese characters can have a bunch of readings and one reading could be applied to a bunch of characters, even in one language/script. I don't think it's a good idea to have separate etymology sections for each such character name. Also, most of those names are met in other languages with the Latin script.

Character names are used to reference characters, e.g. zed for z. They are definitely words and can be popular among language enthusiasts of all origins. --GareginRA (talk) 22:23, 4 December 2023 (UTC)

Specifically Chinese characters can have a bunch of readings and one reading could be applied to a bunch of characters, even in one language/script. I don't think it's a good idea to have separate etymology sections for each such character name.

I am not entirely sure I understand what you mean here. From what I think I can glean, I must categorically disagree that "one reading should be applied to a bunch of characters" for etymologies.

Let's look at an example. The Latin-character string bèi is used as the pinyin and rough pronunciation of many Chinese characters. We have a disambiguation page at bèi as a result. Click through from that disambig page to the full entries themselves, and for those entries that have etymology sections, you will see that each word that is presently pronounced as bèi in modern Mandarin has a distinct origin. Compare, for example, 備 and 北. Collapsing all of these etymologies into a single section at bèi would either be unacceptably complicated (if explaining the actual derivations of all characters with this reading), or unacceptably lossy (if ignoring the different derivations).

If I have misunderstood your intent, please clarify. ‑‑ Eiríkr Útlendi │ Tala við mig 22:42, 4 December 2023 (UTC)

Ok, I think I misled everyone with that sentence. In fact, I'm not too familiar with how Chinese is dealt with on this website. And I think I shouldn't have mentioned character readings at all. What I really want to focus on is specific names for characters in different languages that got their translation into our English entries. Please, follow the links I provided to familiarize yourself with an example. --GareginRA (talk) 03:28, 5 December 2023 (UTC)

So, for example. If we have a separate etymology in be for Russian бэ (we currently do), should we translate all other Cyrillic б's and put each in a new Etymology section? Check б#Kyrgyz and б#Ukrainian for reference. --GareginRA (talk) 03:36, 5 December 2023 (UTC)

@GareginRA: No, because they are the same word be in English. What is more of an issue is the duplication in б, though I believe Cyrillic letters are less of an issue than Roman letters. --RichardW57m (talk) 13:13, 5 December 2023 (UTC)

Same word, different etymologies. I think we should have a definition like "a reference name for the following characters:" and then a list. And in etymology I don't even know, some direction to follow the links. This would work unless the English name has its own story, in which case it would deserve a separate Etymology section.

What you called "more of an issue" is actually less of an issue, because those entries are in different languages and they vary in alphabets and pronunciations, etc. --GareginRA (talk) 16:27, 5 December 2023 (UTC)

(e/c) If I understand you correctly, I am weakly inclined to agree with grouping related names for different scripts' versions of 'a' letter, the way that the Hebrew-script beth and Syriac-script beth are grouped under one etymology at beth. However, I would say that in the case of be, it seems like we're just talking about one script's letter ("the Cyrillic script letter Б"), and it seems plausible that English (as our etymology says) borrowed the term for this letter from Russian and now uses it to name that "Cyrillic script letter Б" regardless of whether that Cyrillic script letter is appearing in a Russian-language word, a Kyrgyz-language word, a Ukrainian-language word, etc, and that the current etymology is thus fine without us needing to mention Ukrainian, Kyrgyz, etc in the etymology. (Likewise, I would not start adding {{bor|en|de|Ef}}, {{bor|en|fr|F}} /ɛf/, etc to ef or F just because the German and French languages also use F. But I might add a separate sense-line and/or etymology section for ef as the name of the Cyrillic letter ф, since it seems to be a different letter with a different origin.) - -sche (discuss) 23:26, 5 December 2023 (UTC)

@-sche, GareginRA: Well, with a different path. I'd be very surprised if it didn't also come from Latin, with a suspected Etruscan origin. --RichardW57m (talk) 09:45, 6 December 2023 (UTC)

@-sche, ok, so what you imply is that there is an actual English name for a Cyrillic letter and it doesn't discern a specific alphabet, it just so happened that etymologically this one came from Russian, correct me if I'm wrong. But then again, we might as well call Japanese べ a be, and maybe there are some other be's from other scripts. --GareginRA (talk) 19:16, 6 December 2023 (UTC)

Vote now active: Ordering of descendants in mainspace entries

FYI: Wiktionary:Votes/2023-11/Ordering of descendants in mainspace entries. Chernorizets (talk) 00:05, 5 December 2023 (UTC)

Braided Trees of Descendants

How are we supposed to handle complications such as a word being both derived and inherited, and also with further descent perhaps treating words from those forms differently? I've found such a complication with Pali dhātu being inherited from 'Sanskrit' in the feminine and borrowed in the masculine, and apparently only the feminine sense being borrowed by most borrowers but both sets of senses being borrowed by Thai. Thai also has the complication that a word could be borrowed from Pali or Sanskrit (or a blend of the two) - and we should note that the compilers of the authoritative Thai dictionary (the Royal Institute Dictionary) have given up on taking a view as to which.

One possibility would be to record borrowing and inheritance separately. One could also record all plausible lines of descent. But without judicious stubbing out, this could lead to overlarge trees.

Am I right that {{desctree}} cannot handle multiple 'Descendants' sections? --RichardW57m (talk) 12:51, 6 December 2023 (UTC)

@RichardW57 I think you should record the two etymologies as two separate descendants. If there are further descendants of each, they can be recorded, but if they have merged, there's no point in duplicating everything from then on; instead use some sort of cross-reference. As for {{desctree}} not handling multiple Descendants sections, can you give an example? Benwing2 (talk) 22:24, 6 December 2023 (UTC)

@Benwing2: I don't have any examples of {{desctree}} going wrong, which is why I asked. It does work to the extent that it can distinguish words by etymid if the etymids match in calls of {{desctree}} and the descendant. --RichardW57 (talk) 10:50, 7 December 2023 (UTC)

@RichardW57 I asked because I'm not sure what you mean by handling multiple Descendants sections. Can you clarify with a hypothetical example? Benwing2 (talk) 21:47, 7 December 2023 (UTC)

@Benwing2: For example, Pali dhātu has multiple 'Descendants' sections, one for each etymology. An example of the issue would be provided if Sanskrit धातु (dhātu) listed the Pali word in its 'Descendants' sections without using |id=. It didn't provide an illuminating test case because the descendant in the second 'Descendants' section, namely Thai ธาตุ (tâat), also appeared in the first section, so it could have been using just the first section, or it could have been using both the sections and discarding duplicates. To thoroughly test the behaviour, one would have to consider the various permutations of the multiple 'Descendants' sections having or not having an associated {{etymid}}, and possibly even varying placements of the invocation if there's no boy checking that they're in the right place. --RichardW57 (talk) 08:39, 8 December 2023 (UTC)

@RichardW57 I see, I need to look at the {{desctree}} code but I'm 90% sure it will just take the first Descendants section it finds. Benwing2 (talk) 08:55, 8 December 2023 (UTC)

Read "boy" as "bot"! --RichardW57 (talk) 10:20, 8 December 2023 (UTC)
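The two behaviours being distinguished in this thread can be made concrete with a toy model. The sketch below is not the {{desctree}} implementation and makes no claim about what that template does. It only shows, on plain wikitext with hypothetical helper names, how "use the first Descendants section" and "merge all Descendants sections, discarding duplicates" differ, and why a page whose second section repeats an item from the first cannot tell them apart.

```python
import re

# A wikitext heading such as ====Descendants====
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def descendants_sections(wikitext):
    """Return one list of bullet lines per 'Descendants' section, in page order."""
    sections, current = [], None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            # Start collecting only under a Descendants heading.
            current = [] if match.group(2) == "Descendants" else None
            if current is not None:
                sections.append(current)
        elif current is not None and line.startswith("*"):
            current.append(line.strip())
    return sections


def first_only(sections):
    """Behaviour A: take just the first Descendants section."""
    return list(sections[0]) if sections else []


def merged_without_duplicates(sections):
    """Behaviour B: take every section, keeping the first copy of each item."""
    seen, merged = set(), []
    for section in sections:
        for item in section:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

If the second section contains only items already present in the first, both functions return the same list, which matches the observation above that the existing page was not an illuminating test case. A test page would need a descendant that appears only in the second section.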

Portuguese adjectives/verbs as nouns

Paging @Sarilho1, Stríðsdrengur, Rodrigo5260, Benwing2, AG202, Munmula, Ultimateria as people in Category:User pt with recent contributions...

While creating augmentatives for adjectives in Portuguese (we're seriously lacking coverage for those), I noticed lindo has a noun sense, referring to a beautiful person. While that's accurate, the problem here is that one can do that with pretty much literally any adjective, it's a feature of the language, just like turning verbs into nouns. I think we really should consider doing something about those kinds of senses for words like this, seeing as it'd cause serious bloat... if it were applied consistently, which it actually isn't. This actually goes for the verbs-as-nouns too; you have gordo and andar as nouns, but not belo or caminhar or falar or anything.
It can even happen with derived forms of adjectives like lindeza; from Google: "Aquela lindeza passando no seu feed" ("lindeza" referring to a woman), "aquela lindeza que saiu por aqui hoje" ("lindeza" referring to a handmade product). I can't find citations for it, but something like "Aquela lindeza olhou pra mim hoje. Ela tá tão na minha!" doesn't strike me as odd either. All of these phenomena are really consistent in the language, but really annoying to implement in Wiktionary...

I think there's a case to be made for removing these seeing as you can interpret "o gordo" as "o gordo" with 'homem' being implicit. If I say "o gordo tropeçou na calçada", any native speaker would presume I'm talking about a man (homem) even though I could be talking about, say, a boy (menino) or a kitten (gatinho).

But then again, English has a similar thing where nouns become verbs and yet they're all properly defined as verbs as in typo or brain. So maybe what should be done is to actually go and properly implement this for loads upon loads of adjectives and loads upon loads of verbs? Preferably automated somehow: this doesn't look like a doable job for a human... MedK1 (talk) 15:12, 7 December 2023 (UTC)

@MedK1: My general guidelines would be based around if it's lexicalized or not, typically if it's already a lemma in Portuguese-language dictionaries (which fits WT:LEMMING). I've seen us do something similar for English and other languages as well. For example, with gordo, there's a separate noun sense in: Priberam, Infopédia, Dicionário da Língua Portuguesa, pt-wikt, and Dicio and Vocabulário Ortográfico da Língua Portuguesa seem to imply that it's a noun as well; thus, I'd expect us to have a noun sense on our own to match that, though I do think our definition is... lacking. AG202 (talk) 16:05, 7 December 2023 (UTC)

@MedK1 @AG202 I agree with AG202 here, esp. about the lexicalization. For example, the verbal meaning of "brain" (to bash someone's brains out) is not especially predictable given its noun meaning. If I take a random noun like "pill", it wouldn't be obvious to me what "to pill" means, other than its meaning related to fabric (which has nothing to do with the noun meaning). Similarly for "to computer", "to chair" (other than its meaning derived from "chairperson"), "to floor" (other than its idiomatic meaning "to stun"), etc. The question for Portuguese is, can *any* adjective or verb be nominalized in a consistent, predictable fashion? I would guess, for example, that the meaning of "lindeza" as "pretty girl" or "beautiful product" etc. is idiomatic, comparable to English "beauty". The lemming test is not bad either; you have to be careful and not just follow it blindly, but to the extent that other dictionaries have similar conceptions of lexicalization, it can save some arguing back and forth. Benwing2 (talk) 20:59, 7 December 2023 (UTC)

I agree that that sense of "lindeza" is idiomatic; it belongs at the entry. Incidentally, I'd love to see na minha defined. The line of idiomaticity is sometimes blurry but I think it often comes down to common sense. Would someone really call a person "um inteligente" for example? (Compare "um cego".) If not, there probably shouldn't be a noun sense. I have sometimes come across definitions like "(in the plural) intelligent people" but I really don't think we want those; they're grammatical features, not lexical units. P.S. "atualmente" is best translated "currently". Ultimateria (talk) 04:27, 8 December 2023 (UTC)

Yeah hm, maybe "lindeza" really is idiomatic. You can do the same for "beleza" too, but definitely not for, say, "limpeza" or "rapidez". I can definitely concede that.

As for "um inteligente", yeah actually. It's not the most common thing ever, but it definitely happens; plenty of hits for that on google.

About "na minha", I sort of agree, but I think na de alguém would be a better article. You can say "Ela tá na sua" (literally "She's in yours") or "Ele ficou na dele" (literally "He stayed in his"). It's worth noting these two sentences have different meanings though. When someone is "in their own", it means they're minding their own business, that they're staying in their lane. When someone is "in someone else's", it means they like someone, that they're into them. I wonder how that'd translate to an actual definition in a Wiktionary page... MedK1 (talk) 02:43, 9 December 2023 (UTC)

@MedK1 I think you'd maybe define na de alguém with two definitions, with qualifiers reading 'referring to the subject' for the meaning "minding one's own business" and 'referring to someone else' for the meaning "into someone, liking someone". Benwing2 (talk) 02:49, 9 December 2023 (UTC)

Ah, that looks perfect! I was unsure about how to handle the 'referring to someone else' part. I'm on it!

This reminds me, Benwing... I'm not really happy with how {{pt-adj}} handles absolute superlatives. Lots of verbs have -íssimo/-érrimo forms, but requesting those always makes the really redundant comparative forms with "mais" show up. This makes the headword so long. I mentioned this on Discord, but it didn't get a lot of traction; a word like arrumado gets really wordy and that's without adding the augmentative (which along with the diminutive seems strangely lacking here in Wiktionary and in Portuguese Wiktionary too...). Here's what I actually said over there:

it'd be cool if we removed the comparative form ngl

this feels like a carryover from English, where most adjectives have an -er/-ier form as their comparative

Portuguese doesn't have that, so we just have SOP forms like "more pretty" for every verb (which we don't list at "pretty" even though 'more pretty' is theoretically possible). It's not exactly usual to list that in any Portuguese dictionaries and I think it's kinda silly to do it here since there are no regularities? Like, it's not conveying any useful info. I don't think we do that for Spanish forms...

I think diminutive and augmentatives r more relevant esp since Wiktionary only has a handful of either of them and there are irregularities when it comes to those.

I wanna make a similar argument for the "o mais <adjective>" superlatives, too.

MedK1 (talk) 02:57, 9 December 2023 (UTC)

@MedK1 I see your point. Possibly this is a carryover from the old headword; I rewrote the code at one point. The absolute superlative has a different meaning from the 'o mais' comparative, right? So if we removed the 'o mais' form, we'd need to change the inflection to read 'absolute superlative', maybe linked to an appendix. As for SOP comparatives and superlatives, we do actually list them in English, check out any adjective with three or more syllables like intelligent (which lists a proscribed 'intelligenter', which BTW sounds completely wrong to me, along with 'more intelligent') and irascible (which just lists 'more irascible', 'most irascible'). Benwing2 (talk) 03:11, 9 December 2023 (UTC)

Changing the inflection to "absolute superlative" and then appendix sounds good to me, yeah.

About the SOP comparatives, I knew they were listed in English, but I was thinking moreso about how they're supposed to highlight the irregularities of those forms in English. Sometimes you use -ier/-er, sometimes you have to use more... and that kind of information is very useful to learners (was it "more pretty" or "prettier"? "shier" or "more shy"?). Still, my point is that nothing along those lines is the case for Portuguese though. There's no -er/-ier equivalent, you use 'mais' for 99% of the comparable adjectives since there's no other way to do it. Since the only exceptions are bem/mau/grande (and their respective antonym), there's no real reason to teach it (either to L2 learners or to kids) for loads of adjectives; to show it anywhere where it's not actually specified as an irregular form is to clutter the headword, if that makes sense. MedK1 (talk) 03:33, 9 December 2023 (UTC)

Ah, I forgot to ping @Benwing2, oops. But yeah, effecting the changes would be grand! MedK1 (talk) 00:20, 18 December 2023 (UTC)

@MedK1 I did see this, actually, but I haven't done it yet because it turns out to be a bit non-trivial to implement; but I'll get to it soon. Benwing2 (talk) 00:41, 18 December 2023 (UTC)

Canvassing Policy

A few editors have expressed concern about us not having a policy against canvassing discussions and I agree. I propose the following rule:

Advertising an active discussion is allowed for the purpose of achieving a broader and more effective consensus. Editors must not attempt to influence the outcome of a discussion by soliciting participation.
Allowed: Pinging or notifying relevant editors and clearly stating a valid reason for doing so, for example:

The main editors of a language being discussed.

The creator of an entry being discussed.

Editors who have participated in a similar discussion in the past.

Allowed: Linking to an active discussion via a public channel (WT:Beer Parlour, WT:Discord, etc.) and clearly stating a valid reason for doing so.
Not allowed: Linking to an active discussion and encouraging editors to respond or vote in a particular way, whether implied or explicit.
Not allowed: Pinging or notifying editors in an arbitrary manner or on the basis of their opinions.
Not allowed: Soliciting participation on a non-public channel.

Pinging @AG202, Theknightwho, who were discussing this on Discord. Ioaxxere (talk) 18:11, 7 December 2023 (UTC)

@Ioaxxere Was there a specific situation that prompted this? Also, what does "soliciting participation on a non-public channel" mean? Can you give examples? Benwing2 (talk) 21:46, 7 December 2023 (UTC)

@Benwing2: Yes, see wpi's vote at Wiktionary:Votes/sy-2023-11/User:Ioaxxere for admin AG202 (talk) 04:34, 8 December 2023 (UTC)

@AG202 I see, thanks. Benwing2 (talk) 04:35, 8 December 2023 (UTC)

@Benwing2 A non-public channel is any form of communication that isn't accessible by every editor. An example would be creating a private "Association of Inclusionist Wiktionarians" group chat and mobilizing them to participate in CFI-related discussions. It's essentially a corollary of the "pinging or notifying editors in an arbitrary manner or on the basis of their opinions" rule. Ioaxxere (talk) 21:41, 9 December 2023 (UTC)

@Ioaxxere I see, thanks. Benwing2 (talk) 21:43, 9 December 2023 (UTC)

I would also allow inviting contributions from those known to have given the matter, or a related matter, serious thought. Often, the inviter will have an idea as to how the invitee would vote, so this would fall foul of "Linking to an active discussion and encouraging editors to respond or vote in a particular way, whether implied or explicit." Perhaps a general exception should be made for canvassing from the discussion itself. --RichardW57 (talk) 10:43, 8 December 2023 (UTC)

Discord??? Then why not any social media means or Usenet group? DCDuring (talk) 16:00, 8 December 2023 (UTC)

I feel like if you only invite people to see a vote if they are on one side of a debate (such as people who you know are pro-CitationsFreak), that would be canvassing. If you were to ask people on both sides of the debate, it wouldn't. Same if you want a second opinion from an experienced user. CitationsFreak (talk) 05:35, 9 December 2023 (UTC)

I agree - this certainly isn’t the first time concerns have been raised about canvassing, and it would be good to have something formal going forward. Theknightwho (talk) 07:21, 9 December 2023 (UTC)

These rules just seem like common sense to me. When people behave in a way that falls afoul of the spirit of the suggested policy above, people already call it out. This community is not so large that this type of behaviour isn't obvious. I'm not convinced a formal policy is needed - especially not one that is so detailed as to stimulate the nitpicking behaviour of policy literalists. This, that and the other (talk) 08:43, 9 December 2023 (UTC)

@This, that and the other I kind of feel the same way, honestly. Benwing2 (talk) 09:12, 9 December 2023 (UTC)

Agreed. I think applying common sense is already enough. – wpi (talk) 07:22, 10 December 2023 (UTC)

I see no reason to disallow it here when there's no way of stopping any given clique of users from doing the same thing via a private chat. I doubt regular users would be held accountable to this rule even if they were quite obviously breaking it. AP295 (talk) 03:01, 10 December 2023 (UTC)

@AP295 This thread exists precisely because a regular user was held accountable for it. Conspiracy theories are not helpful. Theknightwho (talk) 07:01, 10 December 2023 (UTC)

tangent from a now-blocked troll

The point still stands. It's all well and good to say that soliciting participation on a non-public channel is not allowed but I have a hard time believing this would be regularly enforced or that it's even enforceable to begin with. Isn't the whole point to solicit participation anyway? And by stating one's argument, isn't one encouraging editors to respond or vote in a particular way already? I agree with the spirit of this rule but it seems unenforceable and not particularly likely to stop the powers that be. AP295 (talk) 07:09, 10 December 2023 (UTC)

@AP295 Most rules are impossible to enforce perfectly, but that does not make them pointless, so no, your point does not stand at all as it is wholly irrelevant. Theknightwho (talk) 07:13, 10 December 2023 (UTC)

Such a rule strikes me as rather an overstatement of the sort of behavior any project has the authority to materially enforce, except in instances where the offending user is especially obvious about it. In other words, Wiktionary can't cash the check, so it shouldn't write it. AP295 (talk) 07:26, 10 December 2023 (UTC)

@AP295 So you’re saying it’s fine to let someone get away with it if they’re sneaky about it? Or are you implying this policy would mean the Wiktionary police are going to stop people from communicating with each other in their free time? Because quite clearly all this policy would actually mean is that if you canvass you’ll be sanctioned on-wiki, which is self-evidently within the remit of the Wiktionary community. Theknightwho (talk) 07:33, 10 December 2023 (UTC)

It seems like any other 'feel-good' rule that's implemented not because there's much of a need or because it can be practically enforced, but simply for appearances. Do what you like, this is just my opinion. As another example, consider AGF. It can only really be enforced to the extent that we say what we assume, and so for all practical purposes it could be equivalently stated "do not question the motives of others", which sounds far less appealing, no? AP295 (talk) 07:47, 10 December 2023 (UTC)

@AP295 You could make the precise same argument about every policy, because your criticisms are generalised, imprecise, and based on your own assumptions about the motivations of others (and clearly motivated by your own perceived grievances). It’s difficult to see it as a reasoned position. Theknightwho (talk) 08:09, 10 December 2023 (UTC)

Then ignore it if you like. My point is that wiktionary can't really enforce what people do outside wiktionary and rules are often ignored by frequent contributors without any particular consequence anyway. I reiterate my earlier comments too: Isn't the whole point to solicit participation anyway? And by stating one's argument, isn't one encouraging editors to respond or vote in a particular way already? Everyone gets a vote, no? In my experience admins will do what they feel like regardless of the rules, particularly on Wikipedia. AP295 (talk) 08:20, 10 December 2023 (UTC)

🍿 Word0151 (talk) 08:42, 10 December 2023 (UTC)

If that's beer I'll take it. Looks like neon-orange movie popcorn to me though. AP295 (talk) 08:49, 10 December 2023 (UTC)

Non-English Contractions

I'm curious what the policy is for creating entries of contractions, particularly for non-English entries. Specifically I'm wondering if we should add Welsh entries for words like i'r, which is a very common contraction of i + yr. I don't see anything glancing through WT:CFI, although I do see that English words like isn't and wasn't do have entries, which would seem to support creation of these entries.

This does get a little more complicated for the Welsh case, since contracted forms are possible for any word ending in a vowel: (word) + yr yields (word)'r. It would seem silly to add these contractions for every word, so that seems to be a point against those types of entries...

Thoughts? – Guitarmankev1 ^(talk) 21:35, 7 December 2023 (UTC)
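The rule described above can be illustrated with a minimal Python sketch. It assumes only the simplified rule as stated in the post (a word ending in a vowel letter + yr yields (word)'r); the vowel inventory and the function name are illustrative assumptions, and the sketch ignores every other aspect of how the Welsh article behaves.

```python
# Hypothetical illustration of the rule "(word ending in a vowel) + yr -> (word)'r".
# The vowel set below is an assumption covering the Welsh vowel letters,
# with and without a circumflex.
VOWEL_LETTERS = set("aeiouwyâêîôûŵŷ")


def contract_with_yr(word: str) -> str:
    """Attach the clitic 'r if the word ends in a vowel letter; otherwise keep 'yr' separate."""
    if word and word[-1].lower() in VOWEL_LETTERS:
        return word + "'r"
    return word + " yr"


print(contract_with_yr("i"))   # the contraction asked about in this thread
print(contract_with_yr("ar"))  # ends in a consonant, so no contraction
```

Because the rule applies to any host word ending in a vowel, the set of possible outputs is open-ended, which is the reason given above for doubting that every such form should get its own entry.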

@Guitarmankev1 This seems exactly parallel to the use of 's in cases like "that man's chasing me!" and "the dog's been chasing the mail carrier again". We handle this by defining 's as a clitic, and the same would apply to 'r (which in fact already exists). We would not define every such contraction; isn't and wasn't are defined because n't can only cliticize onto a small, fixed set of verbs. Benwing2 (talk) 21:43, 7 December 2023 (UTC)

Ahh okay that does make sense. The distinction between a universal clitic and a few fixed cases is very clear. Thanks Benwing, helpful as always! – Guitarmankev1 ^(talk) 22:01, 7 December 2023 (UTC)

@Guitarmankev1: I face a similar issue with Pali, where the use of apostrophe is rare but not unknown, so there's often no hint that one is dealing with a contraction. I've taken to recording short contractions which I have had trouble figuring out, and they've not been challenged yet. I think it's better to err on the side of inclusion. --RichardW57 (talk) 11:06, 8 December 2023 (UTC)

However, for Welsh, "i'r" is not actually opaque, but its commonness ought to defend its inclusion. It would be usefully educative to describe it as an 'automatic contraction' so that someone who's looked it up will know how to handle other words. --RichardW57 (talk) 11:06, 8 December 2023 (UTC)

I have raised the issue before that we don't seem to have guidelines (do we?) about which contractions to include or exclude. We seem to get by on a shared/collective intuition: we have that's "that is/has/was/does" but we don't have man's "man is" ("the man's going to the store"), we have this's "this is/has" but not lady's "lady has" ("that lady's got some nerve!"), we have all's "all as", and's, what've, you'dn't've (and in other languages so'n, gibt's, hab's) etc, but not women've been working, pretty'll have to do. We seem to collectively intuit something along the lines of : "pronoun/demonstrative/conjunction-type words + other thing" contractions are reasonable to include, "basic noun/adjective-type words + other thing" generally aren't, probably because the set of pronouns/demonstratives/conjunctions is small and the set of nouns/adjectives is extremely large. I don't personally offhand see a problem with including i'r...? - -sche (discuss) 16:34, 12 December 2023 (UTC)

@-sche Agreed on general principles, although I'm not sure how to concretize this other than by POS, as you propose. Benwing2 (talk) 00:05, 15 December 2023 (UTC)

Santali declension

I am linking this discussion for more information. Can Santali be provided with a synthetic declension on Wiktionary? HeliosX (talk) 23:12, 7 December 2023 (UTC)

@HeliosX Per that discussion, if the case markers are normally written as separate words, then there may be no need to create declension tables for Santali nouns. In any case you probably won't find someone with the time to implement this. Benwing2 (talk) 01:02, 8 December 2023 (UTC)

@HeliosX, Benwing2 Isn't the template that HeliosX made, {{sat-noun}}, supposed to be named {{sat-decl-noun}}? --Apisite (talk) 08:11, 8 December 2023 (UTC)

@Apisite @HeliosX Yes, please rename to {{sat-decl-noun}} or {{sat-ndecl}}. A template named {{sat-noun}} should function as a headword template, not a declension template. Benwing2 (talk) 08:52, 8 December 2023 (UTC)

@Apisite, Benwing2 Pinging you and Benwing2, I renamed the template. Can you find counterexamples of Santali speakers using the declension suffixes apart from personal pronouns and demonstrative pronouns? At least in grammars they are used. HeliosX (talk) 15:47, 8 December 2023 (UTC)

@HeliosX Thanks for renaming the template. I don't know anything about Santali so I can't be of help here. Benwing2 (talk) 00:05, 15 December 2023 (UTC)

Transcription question

Pardon my 101-only knowledge of phonetics and phonemics. I have seen the same stressed vowel transcribed as /ˈiː/ for BrE but /ˈi/ for AmE. I've seen it more than once, so I've gathered that it was intentional. Hoping that someone can explain why. Thanks. Quercus solaris (talk) 15:59, 11 December 2023 (UTC)

There is a general consensus to not transcribe American English with phonemic vowel length. The development of open vowels is probably a factor in this: for accents with the father-bother merger, which includes the majority of American English speakers, the traditionally "short" LOT vowel is phonemically merged with the traditionally "long" father/palm vowel (or sometimes with the THOUGHT vowel): it's tricky to justify either transcribing "father" with a short vowel or "velocity" with a long vowel. For British English, some phonologists have also argued against using length markers, but because of the loss of rhotic vowels, there are more surface contrasts where length appears to play an important role in distinguishing minimal pairs such as burn vs. bun (NURSE vs. STRUT), beard vs. bid (NEAR vs. KIT, for some speakers), even if some qualitative differences may still exist in these cases.--Urszag (talk) 17:53, 11 December 2023 (UTC)

Basically, I follow “Appendix:English pronunciation”. — Sgconlaw (talk) 18:45, 11 December 2023 (UTC)

Thanks both! This was helpful. This was a good reminder for me to mindfully reconsult Appendix:English pronunciation before writing "ː" at WT. I think my mind was glossing over differences between WT's standards and WP's. Not wildly different anyway, but a reminder for me to mindfully reconsult WT's as needed. Quercus solaris (talk) 19:40, 11 December 2023 (UTC)

The context label "offensive"...

I asked Benwing2 whether or not he'd mind if I remade this topic and he didn't forbid it. So, with great trepidation, I will make a narrower restatement of my point: This context label should be replaced by "derogatory" or "pejorative" in cases where a word is offensive because it is derogatory or pejorative. Frankly I don't think offensive is an appropriate context label at all, but the only policy change I'm suggesting this time around is adding "Prefer objective context labels over subjective context labels. For example, if something is offensive because it is derogatory, then use derogatory instead of offensive." to Wiktionary:Context labels. AP295 (talk) 10:24, 12 December 2023 (UTC)

I'm inclined to agree with this, but I also didn't wade into the long, drawn-out argument on this before. Andrew Sheedy (talk) 01:40, 13 December 2023 (UTC)

It's tempting to argue OP has subjective and objective backward. "Offensive" is an objective label, measurable (are people offended by the word? if yes, offensive; if no, not), whereas "derogatory" requires surmising the inner heart and intent of the speaker (was she intending to derogate, or to inflame, or describe, or something else?).
As discussed in the other thread, some words are derogatory but not offensive (e.g. washing sherry, rustbucket, and according to our current entry American handegg which is even humorous), other words are offensive but not derogatory (e.g. squaw dress, blackfellow's bread, Jewish piano), so each label must be used where it applies, or the information the label conveys is obscured.
Of course, OP's framing — treating "are people offended?" as a more subjective question than "what was this speaker's inner intention in using this word?" — and OP's userpage and scant few other activities on this site mostly point towards the idea that obscuring the offensiveness of slurs is the user's intent... but determining intent is, as just discussed, more subjective than determining what the user's proposal would objectively do, which is obscure information... - -sche (discuss) 02:27, 13 December 2023 (UTC)

If "derogatory" can be read as "offensive" and depending on the speaker's intent, would "pejorative" be a better term for display where "derogatory" now appears? Derogatory and pejorative are synonyms, but pejorative seems to me to be less likely to be read as "offensive". Do others here agree?

In English we have:

37 entries with definitions labeled "pejorative".

978 entries labeled "derogatory".

143 entries labeled "offensive".

I find myself at a loss to justify in many cases why the specific labels are applied in specific cases.

An alternative is to eliminate "derogatory/pejorative" altogether, replacing "derogatory" with "offensive" where appropriate. Presumably we wouldn't deem rustbucket offensive. It doesn't need any label to convey that it disparages what it refers to.

I don't know how this would apply to affixes that change a neutral term to a pejorative one, apparently a phenomenon in some languages (though not common in English?). DCDuring (talk) 14:37, 13 December 2023 (UTC)

I strongly agree here. There's a clear difference between offensiveness and being derogatory. I'd rather us not try and combine the two labels. AG202 (talk) 14:52, 13 December 2023 (UTC)

Could you explain the difference? Most dictionaries, including enwikt, call disparaging, pejorative, and derogatory synonyms. Is the difference one of intent, as -sche has said: disparaging, pejorative, and derogatory terms try to be offensive/insulting, but don't necessarily succeed? Or is it one of degree: offensive terms are more usually taken as offensive than those that are derogatory. Or is it the nature of the target: offensive terms are derogatory of a person or identity group, derogatory terms have other targets: eg, non-human or humans who are beyond the pale ("pervert")? If we can't come up with explicit criteria, we will continue to have inconsistent application of the labels and have to accept recurring tea room and talk page discussions on the applicability in specific cases, not to mention more BP discussions. "Offensive" has problems of application, but at least there is a need for such a label. I'm not sure what the points of derogatory and pejorative are, let alone the distinction between them. DCDuring (talk) 19:16, 13 December 2023 (UTC)

@DCDuring There was a discussion a while ago where I proposed generally replacing derogatory with pejorative. I'm not sure where the discussion went but I think User:-sche did a dictionary analysis and concluded that derogatory was more common (?), but (I think) we came to a similar conclusion as you concerning the possibility of derogatory being interpreted as "offensive", whereas pejorative doesn't seem to have that connotation. The result was that suffixes are identified as pejorative rather than derogatory (cf. Category:Spanish pejorative suffixes), since it's the more standard linguistic term, and likewise the form-of template {{pejorative of}} reads "pejorative of" rather than "derogatory form of" (or whatever); but the labels derogatory and pejorative are merged into derogatory and categorize as e.g. Category:English derogatory terms. Cf. a term like Russian золотишко (zolotiško), which is a pejorative form of золото (zoloto, “gold”) and has a meaning something like "ill-gotten gold"; this feels qualitatively very different from terms such as artfuck and assbutt (to pick a couple of random terms out of Category:English derogatory terms). Benwing2 (talk) 00:03, 15 December 2023 (UTC)

Thanks. I still have trouble with the boundaries on the use of "derogatory". Is it really just for failed or mild forms of offensiveness, or for terms not directed at persons or groups, or for terms not directed at oppressed groups? Should most (negative-valence) definitions of, say, bad be properly labeled derogatory? If not, why not? DCDuring (talk) 13:33, 15 December 2023 (UTC)

@DCDuring By "negative valence" definitions of "bad" you just mean "not good"? Intuitively, no these should not be labeled either "derogatory" or "pejorative"; the reason may be that "not good" is the denotation of "bad" whereas "derogatory" or "pejorative" is intended for connotations. Benwing2 (talk) 23:47, 15 December 2023 (UTC)

(@DCDuring) Would you label "good" a {{lb|en|euphemism}}? I hope not. So, why not? Thinking about that may help clarify why "bad" is not derogatory/pejorative. :) (At least, not in the way dictionaries use those labels. Perhaps a philosophy of language class might interrogate whether "good" was a euphemism and "bad" was derogatory.) One indicator, perhaps the definitional indicator, is whether there exists a neutral way of saying which someone chose to use the derogatory word instead of so as to be more derogatory/insulting, or chose to use the euphemism instead of so as to be more euphemistic/positive. - -sche (discuss) 00:35, 16 December 2023 (UTC)

"Derogatory" is much older and more common than "pejorative" (even in the last 40 years, it's stayed almost exactly 1.5x more common), so I'd think it'd be more likely to be understood, but if other people think "pejorative" is clearer, I don't want to stand in the way of changing which one is displayed. Maybe we can take a straw poll or something? (I recall from the initial discussion that led to merging them that although some people including me felt that one or the other was "stronger" or had some other subtle difference, neither Wiktionarians nor other dictionaries had the same ideas about which one that was, which basically confirmed the other dictionaries which define them as broadly synonymous / indistinct.) - -sche (discuss) 00:09, 16 December 2023 (UTC)

Helpful to know that one should look up dysphemism to find out the limits to the proper application of {{lb|en|derogatory}}. Should that appear in the text for Category:English derogatory terms and Appendix:Glossary (which has a nearly adequate entry at pejorative)? I didn't look in either location, but some might. DCDuring (talk) 16:54, 16 December 2023 (UTC)

To summarize this part of the discussion: Although dysphemistic is not a synonym of derogatory, we will label as "derogatory" only those derogatory terms that are also dysphemistic, possibly because few of our users and some of our contributors know the meaning of the word (dysphemism appears about 1/60th as often as euphemism at Google Ngrams and dysphemistic less than 1/100th as often as derogatory) or because contributors unconsciously, but correctly limit the use of derogatory to those terms that are dysphemisms. Some questions remain: Do other dictionaries avoid this little quandary by not trying to label terms as derogatory? Are we missing a definition of derogatory? Is the derogatory label correctly applied at English Wiktionary? DCDuring (talk) 19:56, 17 December 2023 (UTC)

We have a separate category for dysphemisms: Category:English_dysphemisms, and it has its label, {{lb|en|dysphemism}}. I'd be okay with merging it to one of the other categories, as the categories overlap in meaning quite a bit. It seems my contributions there are fewer than I'd thought they were though, so I guess it's not really up to me. —Soap— 20:52, 17 December 2023 (UTC)

Per talk:rag-box I wonder if dysphemism may be the most appropriate label for inanimate and inalienably possessed objects. It seems awkward to call them offensive, and I see problems with each of the other labels as well. —Soap— 05:10, 11 January 2024 (UTC)

Inconsistency in the IPA system

I'm a newbie to this, but I can't help but notice that there are some inconsistencies in the IPA.

For example, English two /tuː/ and English test /test/. The "t" is pronounced differently between the two words (English "two" sounds like "ch".)
So, within the English language, there are 2 different ways of pronouncing the "t"? Duchuyfootball (talk) 13:26, 12 December 2023 (UTC)

There are multiple ways of pronouncing a sound like /t/. In IPA, slashes denote what is called a "phonemic" transcription, meaning that distinct letters are only used for contrasts that can make a difference in a word's meaning. The word "two" is not normally pronounced with the same consonant phoneme as "chew". The /t/ is aspirated and can be affricated (pronounced with some frication after the plosive part), which may be what you're hearing.--Urszag (talk) 13:41, 12 December 2023 (UTC)

Or it's perhaps that Duchuy has w:yod coalescence, but yes the main point is what counts as a phonemic transcription vs a phonetic one. Vininn126 (talk) 13:43, 12 December 2023 (UTC)

"Two" doesn't have an original yod. I don't know whether some speakers may have developed one irregularly here (as in coupon). I think though that I have heard of speakers having an especially heavily aspirated or affricated pronunciation of /t/ before /u/ (as in "cartoon") or /w/ (as in "twenty").--Urszag (talk) 13:45, 12 December 2023 (UTC)

Thank you for the explanation. Still, there is no device to mark "aspiration", or it is not that important whatsoever? — This unsigned comment was added by Duchuyfootball (talk • contribs).

Aspiration is marked as follows: [tʰ]. Andrew Sheedy (talk) 17:20, 14 December 2023 (UTC)

Thank you. Duchuyfootball (talk) 12:33, 15 December 2023 (UTC)

(New) Feature on Kartographer: Adding geopoints via QID

Since September 2022, it is possible to create geopoints using a QID. Many wiki contributors have asked for this feature, but it is not being used much. Therefore, we would like to remind you about it. More information can be found on the project page. If you have any comments, please let us know on the talk page. – Best regards, the team of Technical Wishes at Wikimedia Deutschland

Thereza Mengs (WMDE) 12:31, 13 December 2023 (UTC)

This is great. I've tried it out on Штутгарт with the <mapframe> tag (help). Do we have a template that can handle this? Voltaigne (talk) 13:35, 13 December 2023 (UTC)

I really like the idea of using this more widely, but if we do I'd be keen to templatise it. Theknightwho (talk) 19:42, 13 December 2023 (UTC)

alternative reconstructions/reconstruction notes

Is their placement after the definitions obligatory or optional? The formulation in WT:EL suggests the former but current practice suggests the latter. —Caoimhin ceallach (talk) 14:05, 13 December 2023 (UTC)

I have not noticed current practice, as the cases where both are claimed are rare, nonetheless it is intuitive to me to put both these sections before definitions, where alternative forms mostly are situated—and there ordered after alternative forms, before etymology, so reconstruction notes are between alternative forms and reconstructions according to closeness to etymological issues. Fay Freak (talk) 14:32, 13 December 2023 (UTC)

It was formerly the practice to put alternative forms exclusively before the etymology. There was a vote some years ago to allow alternative forms to be placed before synonyms (when synonyms were exclusively between definitions and derived forms, etc.). So the preferred practice by many (though not the obligatory one) is to place alternative forms after definitions. Whether this should apply to alternative reconstructions as well depends, IMO, on how closely related these alternative reconstructions are to etymology. Andrew Sheedy (talk) 19:48, 13 December 2023 (UTC)

Here's the vote: Wiktionary:Votes/pl-2016-09/Placement_of_"Alternative_forms"_2_(weaker_proposal). Andrew Sheedy (talk) 19:52, 13 December 2023 (UTC)

Re: etymological tags for Scythian language

In September I had put up a request for new etymological codes for Scythian languages that did not result in anything concrete due to the resulting discussion going nowhere. This, I believe, happened because I had not properly explained my reasons for making such a request.

The problem is that Wiktionary's current understanding of the "western" Scythian languages spoken by the populations historically known as the Scythians proper and the Sarmatians is derived from the work of Vasily Abaev, who posited the existence of a single "Scytho-Sarmatian" language, which Wiktionary calls Proto-Ossetic.

However, this position has been challenged by a number of studies on the Scythian languages by several Scythologists and linguists over the course of the past few decades, including:

Скифский язык: опыт описания ("The Scythian Language: Attempt at Description"), by K. T. Witczak (1999)
Эминак в ряду владык Скифии ("Eminakes, King of Scythia"), by Sergey Kullanda and Dmitry Raevsky (2004)
Проблема скифского языка в современной науке ("The Problem of the Scythian Language in Contemporary Studies"), by Sergey Tokhtasyev (2005)
Sauromatae - Syrmatae - Sarmatae, by Sergey Tokhtasyev (2005)
Уроки скифского ("Lessons in Scythian"), by Sergey Kullanda (2011)
Скифы: язык и этнос ("Scythians: Language and Ethnicity"), by Sergey Kullanda (2011)
Скифские этимологии ("Scythian etymologies"), by Sergey Kullanda (2013)
К дискуссии о языке скифов: переход др.ир. *xš- > *s- и его отражение в древнегреческом ("Towards the Discussion on the Language of the Scythians: The Transition of OIr *xš- > *s- and its Reflection in the Ancient Greek"), by Mikhail Bukharin (2013)
Колаксай и его братья (античная традиция о происхождении царской власти у скифов) ("Kolaxais and his Brothers (Classical Tradition on the Origin of the Royal Power of the Scythians)"), by Mikhail Bukharin (2013)
External Relations of Scythian, by Sergey Kullanda (2014)
Скифы: язык и этногенез ("Scythians: Language and Ethnogenesis"), by Sergey Kullanda (2016)
Неархеологическая скифология ("Non-archaeological Scythology"), by Sergey Kullanda (2017)

As Sergey Tokhtasyev observed in 2005, Abaev came to the conclusion that a united Scytho-Sarmatian language existed by analysing mostly Sarmatian onomastics while neglecting Scythian ones. The common conclusion of these studies has meanwhile been that the peoples known as the Pontic Scythians and the Sarmatians in fact respectively spoke different but related languages with different features, such as:

Pontic Scythian:
- Proto-Iranian *d > Proto-Scythian *δ > Scythian *l
- and Proto-Indo-Iranian *ś (or *ć) > Scythian *ϑ;
Sarmatian:
- Proto-Iranian *d > Sarmatian *d,
- Proto-Iranian *ry- > Sarmatian *li- or *l-,
- and Proto-Indo-Iranian *ś (or *ć) > Sarmatian *s.
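
The correspondences listed above can be held in a small lookup table, which makes the contrast between the two languages easy to check mechanically. This is only an illustrative Python sketch: `REFLEXES`, `reflex` and `differing_sounds` are hypothetical names, and the table contains nothing beyond the sound changes listed above.

```python
# Illustrative only: the correspondences listed in this thread, keyed by
# (language, proto-sound). All names here are hypothetical.
REFLEXES = {
    ("Pontic Scythian", "*d"): ["*l"],       # Proto-Iranian *d > *δ > *l
    ("Pontic Scythian", "*ś"): ["*ϑ"],       # Proto-Indo-Iranian *ś (or *ć)
    ("Sarmatian", "*d"): ["*d"],
    ("Sarmatian", "*ry-"): ["*li-", "*l-"],
    ("Sarmatian", "*ś"): ["*s"],
}

def reflex(language, proto_sound):
    """Return the reflexes listed for a proto-sound, or [] if none is listed."""
    return REFLEXES.get((language, proto_sound), [])

def differing_sounds(lang_a, lang_b):
    """Proto-sounds listed for both languages whose reflexes differ."""
    sounds_a = {sound for (lang, sound) in REFLEXES if lang == lang_a}
    sounds_b = {sound for (lang, sound) in REFLEXES if lang == lang_b}
    return sorted(s for s in sounds_a & sounds_b
                  if reflex(lang_a, s) != reflex(lang_b, s))
```

Calling `differing_sounds("Pontic Scythian", "Sarmatian")` picks out the proto-sounds on which the two lists above disagree; the `*ry-` change is skipped because the list gives it for Sarmatian only.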

It is therefore inaccurate to speak of a single Scytho-Sarmatian, or, as Wiktionary puts it, Proto-Ossetic language, common to both the Pontic Scythians and Sarmatians. Proto-Ossetic was instead the same as the Sarmatian language, which was the ancestor language of Alanian and hence of Ossetian, but was a sibling language of Pontic Scythian.

To be able to create entries for reconstructions of certain terms, I would consequently need a separate etymological code for the Scythian language (sometimes also called Scythian proper or Pontic Scythian), that is the language spoken by the population of the Pontic Steppe to whom the name "Scythians" was initially narrowly applied. Antiquistik (talk) 19:31, 14 December 2023 (UTC)

@Antiquistik Are you asking for new full-language codes (aka L2 languages) or new etymology-only codes? The former requires more discussion and consensus than the latter. BTW this is getting rather technical; pinging User:Victar, who has contributed Proto-Iranian terms, and User:-sche as our general obscure-language maven. Benwing2 (talk) 00:09, 15 December 2023 (UTC)

@Benwing2: this discussion already exists here: WT:Beer_parlour/2023/September#Etymological_code_for_Scythian_language, they simply keep bringing it up in different places. --{{victar|talk}} 00:37, 15 December 2023 (UTC)

@Benwing2 I would need full-language codes. And I did talk with @Victar about it in my previous request in September, but Victar kept on insisting that the Scythians and Sarmatians both spoke Proto-Ossetic, and they ignored it and stopped replying when I cited the recent linguistic work that has established that Scythian proper and Sarmatian (Proto-Ossetic) were different languages, resulting in the discussion going nowhere.

However, the lack of language codes due to said discussion going nowhere is limiting my ability to edit because I need those language codes for certain entries, hence why I am having to put another request for the language codes. Antiquistik (talk) 14:30, 15 December 2023 (UTC)

@Antiquistik I think the problem here is that these are unattested languages so Victar is (IMO rightly) skeptical of the need for new L2 codes. Benwing2 (talk) 23:44, 15 December 2023 (UTC)

@Antiquistik You seem to be confusing Sarmatian and Sauromatian. Sarmatian is covered under Old Ossetic (oos), beside Alanic, whilst Sauromatian is under Proto-Ossetic (os-pro). When dealing with Scythian given names borrowed into Ancient Greek, the date matters, which is part of why I asked you to create Ancient Greek entries first, and then we can work through the Scythian etymology of each. The second reason is that we're dealing with onomastics here, the most dubious of all linguistic studies, so a Scythian entry under any header is likely inadvisable. --{{victar|talk}} 20:17, 16 December 2023 (UTC)

@Victar Our previous discussion on the issue was very different from this present one, although I can agree with the solution you are presently proposing. Nevertheless, I will still need an etymological-only tag, at the very least, to create these entries.

@Benwing2 In this case, would it still be possible to create etymology-only codes at least? Or would this be also too untenable?

Antiquistik (talk) 11:08, 22 December 2023 (UTC)

Nothing different; it's always been at User:Victar/Timelines/Scythian loanwords. I'm glad at least we're closer to being on the same page. Etymology-only code for what? There should not be a code for "Scythian", as you've requested, as Scythian (xsc) is the name of the language family. --{{victar|talk}} 12:43, 22 December 2023 (UTC)

@Victar As shown in the studies I have referred to, there was clearly a Scythian language which was part of the larger Scythian languages family, and which was distinct from the Sauromatian language. Antiquistik (talk) 23:15, 30 December 2023 (UTC)

Category:en:Landforms

This is not only about the category Category:en:Landforms. It's much broader. But I take it as an example. It has the note "This is a name category. It should contain names of specific landforms, not merely terms related to landforms, and should also not contain general terms for types of landforms." But (leaving aside the subcategories) it contains no names and a lot of general terms.

It seems to me that either 1) the description of the category should be changed, 2) all non-names should be moved to different categories, or 3) this and other categories should somehow be split into a name and a general-term part.

More generally, the whole topic/category system has the potential to be a fun/useful alternative way of navigating languages and vocabularies, but the way it's implemented now is half-hearted. The section WT:TOPICCAT is very bare bones. I'm missing some kind of formulation of a philosophy for what we want this feature to actually do.

(I've written about this before: Wiktionary:Beer_parlour/2021/August#Categorizing_words_by_topic) —Caoimhin ceallach (talk) 00:32, 15 December 2023 (UTC)

LOL, I hadn't even noticed that Category:Landforms and Category:Bodies of water were "name" categories until now. I've been merrily and obliviously adding general terms for types of landforms and bodies of water to these categories the whole time (in my defence, following the precedent of what was already in there). Are these categories meant only to contain proper nouns such as Grand Canyon, Isthmus of Suez, etc.? I suggest we redesignate these categories as "set" categories under Category:Geography and create new categories for "Named landforms", etc. Voltaigne (talk) 01:13, 15 December 2023 (UTC)

@Voltaigne Yeah that's a mistake, they should be "type" or "set" categories. Benwing2 (talk) 01:21, 15 December 2023 (UTC)

Category:Landforms is also a subcategory of Category:Earth. Could that be the reason for the confusion? —Caoimhin ceallach (talk) 01:35, 15 December 2023 (UTC)

@Caoimhin ceallach Probably not; I just got confused when I (recently) added the category types to all the topic categories. Benwing2 (talk) 01:42, 15 December 2023 (UTC)

@Benwing2 I don't think anything has changed in the meantime. I'm still faced with the same issue: there is no good topic category to put generic terms for places. Does this have to do with the fact that this discussion is unresolved: Wiktionary:Requests_for_moves,_mergers_and_splits#Recategorize_Category:Demonyms_and_Category:Ethnonyms? At any rate, I would like there to be subcategories of Category:Nature to place things related to the physical world that do not fall under Category:Names. —Caoimhin ceallach (talk) 17:37, 22 April 2024 (UTC)

@Benwing I'm sorry I keep pinging you on this, but I keep running into this issue. I don't know who else to ping. I would do it myself, but I've never done anything with modules and I'm afraid to break everything. Should all subcategories of "Places" be "set" instead of "name"? —Caoimhin ceallach (talk) 16:36, 2 September 2024 (UTC)

@Caoimhin ceallach This needs a larger discussion, honestly. There was a prior discussion in Wiktionary:Requests_for_moves,_mergers_and_splits#Recategorize_Category:Demonyms_and_Category:Ethnonyms, as you note, which is unresolved and concerns this issue. The participants were @Ioaxxere and @-sche if I recall. I don't think it would be good to change subcategories of Places to set instead of name, because that would allow mixing types and names. Instead we need to split the categories somehow; maybe one of the people I just pinged can come up with some good ideas. Benwing2 (talk) 18:58, 2 September 2024 (UTC)

I ran into a similar(?) issue recently with CAT:en:Waterfalls, which said it was for names of specific waterfalls, but mostly contained terms for types. In Wiktionary:Beer_parlour/2024/August#List_and_topic_categories_again_(how_many_types,_and_how_to_name_them), bringing up also Seasons and Cities, I raised the old suggestion that we probably ultimately have to a) give each type of category a different name if we want to keep three types of category separate, or b) stop trying to keep all three types of category separate (e.g. allow "names of seasons : summer, fall, winter, spring" and "terms related to seasons : seasonal, wintery, summery, ..." to be in the same category). Possibly the real solution is to have four types of category with four distinct naming schemes:

A category into which all types of terms go, which is used when the more specific categories don't exist because there are not enough terms to justify separate categories for each type. For example, maybe there are not enough "names of seasons" or "terms related to seasons" to justify separate categories. Maybe this type of category gets the base name: "CAT:Seasons", "CAT:Waterfalls", etc.
In other cases, there are probably enough "types of river" (creek, bourne, burn, run, wadi, ...) and "names of particular rivers" (Mississippi, Nile, Thames, Volga, ...) to populate separate categories; it's plausible there are even enough "terms related to the topic of rivers" to justify the corresponding category for that, too ("alluvial", "amnicolist", "fluvial", "interamnian", "riverine", "tributary", ...). In that case, the "base" category could perhaps just be a parent category holding the other categories? Which, IMO, all need to have distinct names, like "CAT:Types of river", "CAT:Names of particular rivers", "CAT:topic:Rivers", if we expect people to be able to tell what goes where.

Personal names seems like a somewhat different issue to this. - -sche (discuss) 02:06, 3 September 2024 (UTC)

@Benwing2: I think we should have a better naming convention for categories, so there might be Category:en:Mountain topics (e.g. oronymy), Category:en:Types of mountains (e.g. five-thousander), and Category:en:Individual mountains (e.g. K2). Ioaxxere (talk) 03:46, 3 September 2024 (UTC)

Just noting that I noticed that "CAT:en:Wetlands" had the same issue, of claiming to be for names of specific wetlands, but in fact only containing types and some other related terms. I have reclassified the category, but anyone who wants to categorize names of specific wetlands will now be stymied. This is a widespread and enduring problem. I think we do ultimately need (1) to give each type of category a very distinct name by which it is immediately obvious to a new user which of the scopes each category they might try to add has, rather than "CAT:en:Wetlands" being the name that's used regardless of whether it's for "specific wetlands" or "types of wetlands" or "terms related to wetlands" and the fact of it being one type or another just being set in modules, and (2) perhaps to have a catchall category, with the different types of category as subcategories, so that in cases where it doesn't really make sense to split by type, we can just put entries directly in the catchall category. (E.g., suppose there is a word for "place where earth's mantle is actively exposed": AFAIK there are only two such places, not enough for a category IMO, but if there were also various terms related to *mantle-expose-ogeny like *mantle-expose-ogenous and *...-ogenic or whatever, it might make sense to chuck the names of the two places and those related terms into one category.) - -sche (discuss) 21:58, 15 October 2024 (UTC)

@-sche I agree with you here. I just don't know what the best names are, and this is a big change so it will require input from several people. Benwing2 (talk) 22:02, 15 October 2024 (UTC)

I think we should go with “Waterfalls” and “Named waterfalls” (or “Names of waterfalls”). — Sgconlaw (talk) 22:22, 15 October 2024 (UTC)

Yeah, "Waterfalls" would work well as an overarching catch-all 'super-category' that all the 'subtypes' could be subcategories of. The fact that we currently use that name, and names of that sort, for all three types — only the backend modules set up the category boilerplate to say "CAT:en:Waterfalls" is for "related to" or "types" or "specific individual ones", the name "CAT:en:Waterfalls" currently stays the same if the module reassigns it from one type to another — seems like proof of this to me (and proof that it's too ambiguous and unspecific a name to use for any one type, heh).
It seems like Ioaxxere and I agree that "Types of (mountain|waterfall|etc)" would be a logical clear name for the "types" category. I don't have a strong preference between "Names of X", "Named Xs", or "Individual Xs" (suggested above), but "Individual Xs" might cause problems in the case where there are names for pairs or sets of things (e.g. Twin Cities as the name for two cities, Oxbridge as two unis, and surely there are Oxbridge-type names for sets of cities), so perhaps "Names of Xs" is better (and has the same format as "Types of X")? Something like "CAT:en:topic:Mountains" appeals to me for the topic categories but I'm certainly not opposed to something like "Mountain topics", or perhaps clearer "Mountain-related terms" or something, if people prefer. - -sche (discuss) 23:01, 15 October 2024 (UTC)

Maybe "Terms related to mountains" if we have "Names of mountains" and "Types of mountains"? We would need some new short templates to make it easier to categorize into these categories in any case. Benwing2 (talk) 23:04, 15 October 2024 (UTC)
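
To make the three-way naming idea concrete, here is a minimal Python sketch of how the names floated so far would expand for a given topic. It is purely hypothetical: `PROPOSED_PATTERNS` and `proposed_categories` are made-up names, none of these categories exist, and the patterns are just the suggestions from this thread ("Names of X", "Types of X", "Terms related to X"), not anything the category modules currently do.

```python
# Hypothetical sketch of the naming scheme floated in this thread.
# None of these category names exist yet; the patterns are proposals only.
PROPOSED_PATTERNS = {
    "names": "Names of {topic}",
    "types": "Types of {topic}",
    "related": "Terms related to {topic}",
}

def proposed_categories(lang_code, topic):
    """Map each category kind to its proposed full category name."""
    return {
        kind: f"Category:{lang_code}:{pattern.format(topic=topic)}"
        for kind, pattern in PROPOSED_PATTERNS.items()
    }
```

The point of the sketch is that each kind gets a distinct name, so an editor can tell from the name alone which scope a category has, instead of that being set only in the modules.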

@Benwing2, -sche: I have no objection to “Terms related to mountains” and “Names of mountains” as subcategories of “Mountains”, but it might be hard to get editors to remember to use “Terms related to mountains”—I suspect a lot of them will just dump entries into “Mountains”, which is why it might be easier to just use the parent category for the related terms. Is it really necessary to have a third subcategory “Types of mountains”? Can that not be subsumed under the parent category or “Terms related to mountains”? — Sgconlaw (talk) 04:53, 16 October 2024 (UTC)

Fair point, it does seem likely that people will dump every kind of entry, from Related terms to Types to Names of individual mountains, into "CAT:en:Mountains" if it exists. We could eschew having 'top-level' categories like "CAT:en:Mountains". (Perhaps the category boilerplate could even automatically check for and crosslink the "names of Xs" and "terms related to X" categories for each X.) Hopefully others will weigh in, but I think you are also right that it'd be OK to systematically put all "types of..." entries in the "related to..." categories and forgo having "types of" categories. In some cases, a given topic does have a lot of subtypes and a lot of related terms, e.g. wetlands (bogs, marshes, fens, swamps, moors, mires, peatlands, carrs, cataract bogs, ... marish, wiery, fenny, boggy, swampy, marshy, ...), but I suppose having a big category with all such entries in it is fine. - -sche (discuss) 20:05, 17 October 2024 (UTC)

mod:Hrkt-translit needs attention

I have given detailed evidence of why Wiktionary:Beer_parlour/2023/August#Japanese いぃ, うぅ, イィ, ウゥ should be transliterated as ī, ū instead of yi, wu. User:Theknightwho, while unable to produce any evidence for his stance and not even a Japanese speaker, keeps edit warring and blocking me. (the edit in question) -- Huhu9001 (talk) 02:31, 15 December 2023 (UTC)

@Huhu9001 And I gave you a response based on the evidence you supplied, but you (predictably) ignored it then lied that I was vandalising the module. You cannot be reasoned with. Theknightwho (talk) 02:37, 15 December 2023 (UTC)

It seems that small kana vowels are also used for emphasis. Personally I think wu belongs to をぅ and yi belongs to ゆぃ (yes i believe i've seen okinawan transliteration (at least ideas) where this is used).

I hope this edit war stops. Chuterix (talk) 00:34, 18 December 2023 (UTC)

Adding redlinks to Japanese conjugations in the conj table

Why aren't forms that are not yet in Wiktionary linked in the conjugation tables? For words like 知る, they're linked outside the tables whenever they're mentioned, but not inside them. I feel like this masks the actual pages that are needed? Like, think of all the forms that aren't mentioned in Special:WantedPages, in the Wiktionary dumps (that will god willing be fixed and started back up someday) nor anywhere else because the links simply aren't there. I believe that putting redlinks there would make creating them way easier, too, especially if the WT:ACCEL code for those is written up. Should there be any mistakes with the conjugation table (kinda doubtful since afaik Japanese "conjugations" are usually super regular?), having links pointing to them would make catching them way easier, too. MedK1 (talk) 00:04, 18 December 2023 (UTC)

(Paging @Mahogany115, Mlgc1998, Eyesnore, Chuterix as people in Category:User ja who have made changes to Japanese terms very very recently) MedK1 (talk) 00:09, 18 December 2023 (UTC)

I don't know, because then we'd have too many entries about a single verb conjugation (despite in English we have runned, ran, etc.). Best thing is the ren'yokei (stem conjugation), as many verbs (but not every single one of them) have noun counterparts clearly from this form. I'll leave the comments to other people. Chuterix (talk) 00:29, 18 December 2023 (UTC)

Second, I believe there are no nouns derived from suru verbs (that are also nouns without -suru). Chuterix (talk) 00:35, 18 December 2023 (UTC)

@Chuterix: I'm not sure having "too many entries about a single verb conjugation" is an actual issue. Romance languages have way more conjugations than English does and yet Wiktionary aims to give all of them pages. Just look at Category:Spanish verb forms and compare it to Category:Spanish verbs. The same goes for languages like Finnish, although we've been far less effective at covering those so far. Finnish has roughly 7 million redlinks currently, and I presume the plan is to actually create them, sooner or later. I'd guess Japanese would be somewhere between the two examples here? MedK1 (talk) 00:51, 18 December 2023 (UTC)

@MedK1: I would advise caution in proposing changes to languages that are completely different from your own in ways that neither you nor I know anything about. Japanese isn't really an inflected language: it uses lots of particles and syntactic categories that don't correspond to anything in western European languages. My impression is that those tables are a very sketchy summary of representative patterns- the forms they give may not even be things we would want entries for. Also pinging @Eirikr, who would be better at explaining the underlying issues. Chuck Entz (talk) 00:37, 18 December 2023 (UTC)

@Chuck Entz: Yeah, I'm somewhat aware that Japanese's super different and doesn't have conjugations the same way western European languages do, but I still thought it'd be relevant to bring it up here since some forms do get links to them outside the table (and the tables seem to be treating them as conjugations sorta?). I'm not all that knowledgeable about Japanese as you obviously know and my own userpage very clearly shows, but it just looks inconsistent — to me at least — to link to 知らない in the usage notes section but not in the actual conjugation table. I then presume there might be many other cases like this out there... MedK1 (talk) 00:47, 18 December 2023 (UTC)

@MedK1 I might support adding links to the forms of the verb that are actually inherent to it, but I don't know where I stand with the other 'conjugations' you can form with them. E.g., 知らない, as you gave that example, is just 知ら (mizenkei of 知る) + ない (negation), which is basically sum-of-parts. It's kind of hard to judge whether it's worth including, because for sure it's a common construction, but that's why in the table it's listed as a "key construction" rather than a "conjugation". On the other hand, the verb forms such as the mizenkei don't mean anything on their own, so it's kind of dubious for them to have an entry either...(?) Even though those are more inherent to the verb itself. Otherwise, it would be possible for every single verb to be combined with every possible auxiliary verb, to make a lot of conjugations.

What I would support for sure, if we want to do that, is linking to the auxiliary verbs that make up the conjugations, e.g. for 知らない, rendering it as 知らない, 知ります as 知ります, etc.; which would actually go well with linking to the "stem forms" of the template as well, e.g. ]]. But, to be honest, the information in these tables is something you understand for all verbs once you learn it for a few. It might not even need to be generalized. Grammar study of Japanese will go a really long way, to where once you are looking things up in Japanese, I doubt you'd specifically enter 知らない without realizing it's just a form of 知る. Though it would be more convenient to be able to type either and end up where you want. Kiril kovachev (talk・contribs) 20:17, 21 December 2023 (UTC)

@Kiril kovachev @MedK1 I don't know that much about Japanese but I gather it's an agglutinative language a bit like Turkish. I have heard that there are an indefinite number of possible verb forms in Turkish that correspond to long phrases in other languages. If Japanese behaves similarly, we certainly wouldn't want to create entries for every possible verb form (which would be impossible, as the number is unbounded), but instead limit it to certain forms. Romance languages are different in that although there are several forms for every verb, the number is limited (around 75 or so), and they're often irregular enough that it's worth creating entries for them. Maybe something vaguely along these lines is that although I've done bot runs to create Russian noun and verb forms, I haven't done this for adjectives or participles except for the so-called "short" forms of these, because adjective formation in Russian is quite regular outside of the short forms, and there are a relatively large number of such forms (esp. considering that every verb has 3-4 possible adjectival participles), and the forms are rarely syncretic with noun or verb forms. Benwing2 (talk) 20:40, 21 December 2023 (UTC)

@Benwing2: Your 2nd sentence is totally to the point. We don't want endless inflected forms and their variants, shortenings for Japanese or Korean. (Korean conjugations could also have even more styles, which are not present in the conjugation tables.) Anatoli T. (обсудить/вклад) 20:45, 21 December 2023 (UTC)

@Benwing2 Indeed, I agree with that 100%, and I assume if we are to make links then it would only be to the specific forms listed in the table. Or else every verb could have infinite inflections by repeating the same auxiliary verb suffix 100,000 times. But keeping it to just those 15/16 or so would be manageable, if we wanted to do it. Kiril kovachev (talk・contribs) 20:47, 21 December 2023 (UTC)

Yeah, this. MedK1 (talk) 21:15, 21 December 2023 (UTC)

@Kiril kovachev: Oh, I was actually going to suggest ]] in a response following up the one I wrote to Chuterix (talk • contribs) but then I refrained for some silly reason. I like that suggestion a lot too and for me, personally, either way works! I'm guessing we'd still be making pages like 知った and 知って though, right?

Though, I wonder how it'd work for someone actually looking the page up? Sometimes you might encounter a verb in hiragana (or rarely even rômaji depending on the context). So if they were to look up, say, "shiranai" or "しらない", it wouldn't be as obvious that it's a form of "知る" (especially if it's some other verb). I was originally raising this as a potential problem, but I actually have some solutions in mind... Maybe shiranai/しらない could display as "hiragana spelling/romanization of ]]" and then, if somebody attempts to look up "知らない", perhaps hard redirect to "知ら" and have some sort of template at the bottom of the page listing the kinds of things it can be combined with along with the rough English meaning? MedK1 (talk) 21:13, 21 December 2023 (UTC)

@MedK1 Yeah, in that case we would still have to create the ones that aren't so clear, since technically 知った is 知りた but the り isn't present in the part that we can link to, so I guess that would be necessary. About looking it up, you can in general assume anything that ends in ない is either noun+ない, verb mizenkei+ない, or an adjective that ends in the ない suffix, but if neither of the other two happen to exist (e.g. you are looking up しらない and there's no result), you would assume it's the verb case and just apply the rule to know to look up しる next. Like it's been pointed out, though, it might be good to have redirects anyway to make this a bit easier. Finally, I would support the idea you suggest, and that could actually be an even more complete way of listing a verb's possible constructions, albeit split over several pages. Then the main page can be reserved for the most common and important conjugations, like we have now. Kiril kovachev (talk・contribs) 21:51, 22 December 2023 (UTC)
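
The lookup rule described above (strip ない, then guess the dictionary form) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a proposal for actual module code: the function name is hypothetical, it only handles kana input, it ignores the noun+ない and adjective cases as well as irregular verbs, and the extra guess of "stem + る" for ichidan verbs such as たべる goes beyond what was said above. It only proposes candidates; whether any of them is a real entry still has to be checked.

```python
# Rough sketch of the lookup rule described above for kana forms ending in ない.
# A_TO_U maps the final a-row kana of a godan mizenkei to the u-row kana of
# the dictionary form. The function name is hypothetical.
A_TO_U = {"か": "く", "が": "ぐ", "さ": "す", "た": "つ", "な": "ぬ",
          "ば": "ぶ", "ま": "む", "ら": "る", "わ": "う"}

def dictionary_form_candidates(form):
    """Propose dictionary forms for a negative form; does not verify them."""
    if not form.endswith("ない") or len(form) <= 2:
        return []
    stem = form[:-2]                      # しらない -> しら
    candidates = []
    if stem[-1] in A_TO_U:                # godan guess: しら -> しる
        candidates.append(stem[:-1] + A_TO_U[stem[-1]])
    candidates.append(stem + "る")        # ichidan guess: たべ -> たべる
    return candidates
```

For しらない this proposes しる first, which matches the manual procedure described above; a redirect or soft-redirect entry would make the same step unnecessary for the reader.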

Exactly, yeah! I agree 100%. Now to hope other people do too so we can get a consensus and actually see changes happening... MedK1 (talk) 03:49, 30 December 2023 (UTC)

Late to the party, what with events IRL and all... 😄

I concur in general with what others have said so far:

While on the one hand, we (Wiktionary at large) do say "all words in all languages", and we do have entries for inflected forms in other languages...
on the other hand, Japanese is indeed a bit like Turkish, with multiple stems and suffixes that can go on and on and on, and things can get kinda silly.

Consider example word 食べさせられたくなかったら (tabesaseraretakunakattara). This parses out to, "if didn't/doesn't want to be made to eat ". Depending on one's analysis, this could be viewed as:

a single word (possibly the most common analysis)
as two words, 食べさせられたく (tabesaseraretaku, adverbial of tabesaseraretai, "to want to be made to eat") + なかったら (nakattara, conditional "if" form of nai, "not") -- this analysis is proposed in part because it is possible to add certain particles between these two pieces, such as contrastive は (wa, topic particle) or inclusive も (mo, “also, even”)
as a series of stems and suffixes, 食べ (tabe, verb stem, "eating") + させ (-sase, causative stem) + られ (-rare, passive stem) + たく (-taku, adverbial of -tai, desiderative "want to") + なかった (-nakatta, past-tense of negative suffix -nai, itself a contraction of adverbial -naku + past-tense of copular verb arita) + ら (-ra, conditional "if" suffix).
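
The third analysis can be written down as data, which also shows how the pieces concatenate back into the surface form. A minimal Python sketch; `MORPHEMES` and `assemble` are hypothetical names, and the glosses are abbreviated from the breakdown above.

```python
# The stem-and-suffix analysis given above, as (kana, romanization, gloss).
MORPHEMES = [
    ("食べ", "tabe", "verb stem, 'eating'"),
    ("させ", "sase", "causative stem"),
    ("られ", "rare", "passive stem"),
    ("たく", "taku", "adverbial of -tai, desiderative"),
    ("なかった", "nakatta", "past tense of negative suffix -nai"),
    ("ら", "ra", "conditional 'if' suffix"),
]

def assemble(morphemes):
    """Concatenate the pieces into the surface form and its romanization."""
    surface = "".join(kana for kana, _, _ in morphemes)
    romaji = "".join(rom for _, rom, _ in morphemes)
    return surface, romaji
```

Six pieces for one "word" gives a sense of how quickly the number of possible combinations grows once suffixes can be stacked freely.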

Broadly, I think it might be more useful to have our Japanese conjugation tables link through to relevant materials on the EN WP that explain Japanese verb conjugation, such as w:Japanese conjugation, rather than to try to document every single conjugable permutation of every verb and -i adjective. Consider also したくなくない (shitakunakunai, “it's not that don't want to do it”), etc.

That said, Wiktionary is not paper, so if we can create some kind of bot infrastructure for creating and maintaining the large number of derivable forms, I am not necessarily opposed to the idea. ‑‑ Eiríkr Útlendi │ Tala við mig 20:06, 29 January 2024 (UTC)

Brackets in kinkshame's 2nd sense.

Those brackets have looked at me funny and I don't like them. Can we remove them? I tried but I got reverted twice. Here's my actual reasoning as mentioned in the edit summary:

The brackets are unnecessary, this is like the only page with those and even then it's inconsistently applied to only one of its senses.

@Equinox says it's standard and advised me to take this here so here I am, but considering how I see bracketless entries far more than not, I actually think the tea room would be a better place for it? Regardless, I feel like either both senses should have the brackets or neither one should. MedK1 (talk) 00:40, 18 December 2023 (UTC)

it's because the 2nd sense is transitive. if there were a verb in English that meant to give a dog a bone, we might define it as

(transitive) to give (a dog) a bone

to show that the dog must still be mentioned in the sentence for it to be grammatical.

Personally I'd like to see these transitive-intransitive paired senses just subsumed into one, but it's been like this for a very long time, and I haven't looked for old discussions to see what the reasons might be. —Soap— 00:54, 18 December 2023 (UTC)

A correction, ... both senses can be transitive, but the object is different. Basically, this verb is like provide (you can provide education, and you can provide someone with education). —Soap— 00:56, 18 December 2023 (UTC)

I see the line of reasoning behind it now. Thank you! Still, consider this: Only one of the senses listed at "provide" has the brackets (you don't see any senses with '(for)' or anything), and even then, you can easily render them unnecessary by putting something like {{lb|en|used alongside "with"}} before it as I've seen in many other places. One can even argue that the brackets are already unnecessary in that page since there's the usage notes section right underneath explaining everything just fine; removing the brackets wouldn't hurt either page at all, I feel. MedK1 (talk) 01:05, 18 December 2023 (UTC)

We have parens on both lines now. I guess that makes sense, since the intransitive use hey, don't kinkshame can mean either <hey, leave me alone> or <hey, my fetish is fine>. It'd be nice to have a way to indicate that they can both be intransitive too, but I suppose we could say that nearly every transitive verb in English can drop the object (with blame and provide being counterexamples), so it'd have to be labeled on every single one. As for why this is here on BP instead of the Tea Room, I suppose it's because it applies to such a wide category of words rather than just one entry. —Soap— 10:19, 18 December 2023 (UTC)

@Soap: @MedK1: Now sense 1 is marked ambitransitive. Everyone happy yet? Equinox ◑ 22:49, 18 December 2023 (UTC)

Removal of `{{ko-IPA}}`

User:Dubukimchi, a native Korean speaker, has started removing {{ko-IPA}} from Korean entries. I noticed he does it for terms that are North Korean and for some loanwords.

While I agree it may be problematic to add pronunciations for compounds and North Korean terms display "(SK Standard/Seoul)', which is incorrect. Rather than addressing any issues, the issue decided to remove the template with the useful information altogether.

I also disagree with "... there are no standard pronunciations of Korean roots and loanwords". There may be more than one way to pronounce certain loanwords and there are some known patterns but it's very bad for our users to remove IPA. Naver and Daum often provide respellings and sound recordings for loanwords as well. It's not something like loanword pronunciations can't be referenced for a highly documented language such as Korean.

I have reverted a couple of edits but he reverted back (removed the pronunciation template). That's why the new thread here. Anatoli T. ^{(обсудить}/^вклад) 04:47, 18 December 2023 (UTC)

Since the template asserts that its output is a Seoul South Korean pronunciation, but (in the case of the terms which are only used in North Korea) no such pronunciation exists in Seoul South Korean, I understand why Dubukimchi is removing the template, and AFAICT he is correct. I recall the earlier thread about this; perhaps this time around someone can update the {{ko-IPA}} template to allow overriding the accent in these cases (and plausibly also the pronunciation). Even if no-one has the time to do that, perhaps just accepting the use of {{IPA}} rather than {{ko-IPA}} would solve this: for example, could we put {{a|North Korean}} {{IPA|ko|}} (or whatever the pronunciation is, if it's not that), rather than the spurious South Korean {{ko-IPA}}, on 로씨야족? - -sche (discuss) 07:27, 18 December 2023 (UTC)

@-sche @Atitarev I am with Atitarev here. The prescriptive South Korean pronunciation is conservative and essentially identical to the prescriptive Northern pronunciation, which is also based on the same variety of Korean (mid-1900s Seoul pronunciation). Valid information should not be removed.--Saranamd (talk) 09:37, 18 December 2023 (UTC)

@-sche: Yeah, I remember the discussion. It's a technical thing. Yes, there should be a way to remove/override the message, even to get rid of it altogether.

It's only a little bit questionable when the user removes the template from "North Korean" terms, even if the IPA is accurate, but otherwise there is no good reason for removing valid information, which he has done on regular terms. If you read about the phonological differences between North and South, they are basically all already reflected in the different but phonetic spelling. Anatoli T. (обсудить/вклад) 10:14, 18 December 2023 (UTC)

Note that most contributors have already discussed the pronunciation module at length. Any discrepancies have been addressed and it's generally accepted by all contributors, including for loanwords and North Korean. There are minor adjustments people make with loanwords, like pronouncing 선글라스 (seon'geullaseu, “sunglasses”) the same way as 썬글라스 (sseon'geullaseu), or adding some very occasional vowel length (which we already cater for), as in 크림 (keurim, “cream”). --Anatoli T. (обсудить/вклад) 10:21, 18 December 2023 (UTC)

Naver dictionary provides Korean entries from pyojun gugeo daesajeon or urimal saem, but unlike the originals, it shows incorrect results in many cases. Anyway, according to the "Standard Language Regulation" (표준어 규정) of the National Institute of Korean Language, "loanwords are assessed separately". However, they didn't make a regulation for loanwords; they only made the "Korean Orthography of Loanwords" (외래어 표기법). So, there are no standard pronunciations of loanwords, but {{ko-IPA}} is a template for "SK standard" only. Then, how can we prove that it is a standard pronunciation of Korean? Also, North Koreans have different pronunciations (e.g. 적 is , not ), and, in general, they don't write down their pronunciation in the dictionary joseonmal daesajeon. Dubukimchi (talk) 11:32, 18 December 2023 (UTC)

@Dubukimchi We know the pronunciation of loanwords in South Korea because we have native speakers of the language here. There is no need to depend on a dictionary for this.

The North Korean pronunciations of words are different in practice, e.g. ㅓ as which you note, but this is a recent shift. The 1989 edition of ≪조선문화어문법≫, page 29 states that the vowel of ㅓ is 중간낮은모음.뒤모음.보통입술 which is clearly describing . The prescriptive pronunciation is basically identical.--Saranamd (talk) 11:43, 18 December 2023 (UTC)

Those are not standard pronunciations. And according to "중국어어음습득에서 모국어어음의 영향을 극복하기 위한 방도" (Chi Kyongnam, 2021), "조선어의 혀끝앞소리 《ㅈ , ㅊ , ㅅ 》는 중국어의 혀끝앞소리인 자음 《z , c , s 》와 같기때문에 ", " 조선어에도 소리빛갈이 이와 비슷한 혀끝앞소리 《ㅈ , ㅊ , ㅅ 》가 있지만 " and " 조선어의 《ᄒ 》는 목구멍스침소리로서 ". North and South Korean have different pronunciations. Dubukimchi (talk) 12:15, 18 December 2023 (UTC)

@Dubukimchi: The problem with your edits was that you removed IPA templates, hoping no-one would notice? It's good practice to check whether anyone objects before you remove useful content. If you disagree with or dislike something, why don't you raise it as a concern first here at WT:BP or WT:TR?! What @Saranamd mentions here was already discussed in the past and agreed on. And it's not as if loanwords in Korean are pronounced drastically differently from native words. The differences are quite minor and known.

Also, with loanwords, you know that you can supply "respellings" (how words are actually pronounced) or use some other pronunciation hints described at Template:ko-IPA/documentation? It wasn't a good start for you. Please review your removals, so that others don't have to revert your edits. If you're having trouble with your own additions or have little confidence in the pronunciation of words or in our templates, you can always use {{rfp|ko}} (requests for pronunciation) under the ===Pronunciation=== header. Anatoli T. (обсудить/вклад) 12:09, 18 December 2023 (UTC)

Let's correct the information, let's make the information valid and useful! It's currently wrong, invalid information which should be removed; the template asserts that the pronunciation is used in the Seoul dialect, in South Korean, which in the case of terms which are not used in Seoul or anywhere in South Korea, is incorrect. If we want the information to be present, let's make the information correct; let's not just insist on keeping incorrect information because "it's information". - -sche (discuss) 15:49, 18 December 2023 (UTC)

Regardless of the North Korean pronunciation issue, the loanword pronunciations should not have been removed. AG202 (talk) 16:46, 18 December 2023 (UTC)

@Atitarev: I am sorry if my edits made you feel bad. But I didn't edit entries secretly. And I still don't know the places in Wiktionary well. Nevertheless, as I said, there is no standard pronunciation of loanwords. @AG202, so how do we find standard pronunciations? Dubukimchi (talk) 22:11, 18 December 2023 (UTC)

As @Saranamd said, we already know how natives pronounce them. Just because 국립국어원 doesn’t list a pronunciation for those entries doesn’t mean that they don’t have one. Korean is one of the languages where it’s very easy to predict how words are said. We do not solely depend on 우리말샘 or another dictionary and can make our own judgment calls. AG202 (talk) 22:28, 18 December 2023 (UTC)

@AG202: There is no standard for native speakers' pronunciation. Where can we find the reference, if there is a standard pronunciation for native speakers? Because I don't prefer original research. Dubukimchi (talk) 22:52, 18 December 2023 (UTC)

@Dubukimchi: I see that you either don't like, don't understand or don't trust our modules and templates. You even used some old-style IPA for your user name on your user page. For 두부김치 (dubugimchi) we use .

You have removed IPA from 러시아인 (reosiain) TWICE, even when I re-added. What is wrong with it?! Who told you it's incorrect?

We will address the template's message for North Korean but don't remove someone else's IPA without agreeing with us. As you can see, it's very controversial. Anatoli T. (обсудить/вклад) 22:35, 18 December 2023 (UTC)

@-sche: Can you clarify what needs to be done to {{ko-IPA}}? Maybe I will look into fixing it. Benwing2 (talk) 22:42, 18 December 2023 (UTC)

@Benwing2, @-sche: Thanks for your input guys.

I think we need to remove the message (SK Standard/Seoul). Tell me if you disagree.
Otherwise, for entries in the North Korean category, allow a new label.

The prescribed pronunciation is the same in the North and South. All based on the spelling.

Note that rare North Korean dictionaries lack IPA.

In my opinion, we can still add IPA for terms that are 100% North Korean, but with a changed label or a warning. The community should decide if the status quo is unsatisfactory.

As for loanwords, removing IPA is totally unjustified. Anatoli T. (обсудить/вклад) 23:23, 18 December 2023 (UTC)

I’d personally much prefer just allowing for a new label. AG202 (talk) 23:40, 18 December 2023 (UTC)

@AG202: I am cool with that. Anatoli T. (обсудить/вклад) 00:14, 19 December 2023 (UTC)

Yeah (@Benwing) I think the main thing is updating the template/module to allow individual entries to optionally suppress or replace the "(SK Standard/Seoul)" label. Previous discussion was Wiktionary:Beer_parlour/2023/August#North_Korean_pronunciations. 녀자 is an example of a term where references about Korean pronunciation specifically talk about the fact that South Koreans don't pronounce it ɲʌ̹d͡ʑa̠ (but North Koreans do), yet we're labelling ɲʌ̹d͡ʑa̠ as the South Korean pronunciation and not providing anything that we label as a North Korean pronunciation, which ... is a bit like saying /ˌal.jʊˈmɪn.ɪ.əm əˈkjuːm.jʊ.leɪ.tə/ is the rhotic (General American) pronunciation of aluminium accumulator, and declining to provide a (Received Pronunciation) or (UK) pronunciation. - -sche (discuss) 21:43, 19 December 2023 (UTC)

@-sche, @Benwing2, @AG202, @Saranamd, @Dubukimchi: Sounds good to me. We can change the label for terms that are specifically labelled as North Korean (only). Please note that Category:North Korean doesn't always include only North Korean senses.

I haven't got around to checking all the template removals yet. Some are totally unjustified. Anatoli T. (обсудить/вклад) 22:53, 19 December 2023 (UTC)

To get things moving, how about adding a new |nk=1 (North Korea=yes) to {{ko-IPA}}, so that Module:ko-pron#L-49 changes the label as follows if |nk=1 is used:

Currently: (SK Standard/Seoul) IPA(key):
New, with |nk=1: (SK Standard/Seoul and NK Standard/Pyongyang) IPA(key):

That way, it doesn't have to be exclusive but will make sure that North Korean standard is also mentioned when a term is also or only North Korean?

@-sche, @Benwing2, @AG202, @Saranamd, @Dubukimchi Anatoli T. (обсудить/вклад) 03:12, 20 December 2023 (UTC)

@Atitarev I guess I am confused; are these North-Korean-only pronunciations or pronunciations that are both North and South Korean? In the former case we shouldn't label it "SK Standard and NK Standard". Benwing2 (talk) 03:24, 20 December 2023 (UTC)

@Benwing2:

Oops, sorry! I meant something else.

1. North Korean ONLY: e.g. 녀자(女子) (nyeoja) - just use the NK Standard
2. Also North Korean: e.g. 문화어(文化語) (munhwa'eo) - use the SK Standard and NK Standard

If each sense (or the only sense) is marked as North Korean then #1, otherwise #2. Anatoli T. (обсудить/вклад) 03:37, 20 December 2023 (UTC)
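The rule just stated can be sketched roughly as follows. This is only an illustration in Python, not the code of Module:ko-pron: the function name `choose_lects`, the lect codes `sk`/`nk`, and the idea of passing one set of labels per sense are all assumptions made for the example. The fallback for entries with no North Korean sense at all (plain SK) is also an assumption, since the rule above only covers entries that are at least partly North Korean.

```python
NK_LABEL = "North Korean"  # label text assumed for illustration


def choose_lects(senses):
    """Pick which standards to show, given one set of labels per sense.

    Hypothetical helper: 'senses' is a list of sets of label strings.
    """
    nk_flags = [NK_LABEL in labels for labels in senses]
    if nk_flags and all(nk_flags):
        # Case 1: every sense is marked North Korean -> NK Standard only.
        return ["nk"]
    if any(nk_flags):
        # Case 2: some senses are North Korean -> SK and NK Standard.
        return ["sk", "nk"]
    # Assumption: no North Korean sense at all -> keep the current SK label.
    return ["sk"]
```

With this sketch, an entry whose only sense is labelled North Korean gets the NK Standard alone, while an entry with one North Korean sense and one unlabelled sense gets both standards.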

@Benwing2: That means there should be more parameters for the NK standard or separate ones for "only NK" and "also NK". Perhaps model on English where you can have more than one standard (if there is this option)? Anatoli T. (обсудить/вклад) 03:41, 20 December 2023 (UTC)

@Atitarev I can imagine a param |lect= that takes values nk and sk or a comma-separated list. Would that work? There isn't an {{en-IPA}}, but {{pt-IPA}} handles multiple dialects; in this case it contains separate params for each dialect because they may need separate respellings. For example, |rio= for Rio de Janeiro, |sp= for São Paulo, |br= for Brazil as a whole, |pt= for Portugal as a whole, etc. If you don't give a respelling or give a respelling in |1=, you get pronunciations for all dialects, otherwise you only get pronunciations for the specified dialect(s). You can use the value + to request a respelling that's identical to the actual spelling. Maybe this is overkill for Korean; I don't know if you ever need separate respellings for North Korean vs. South Korean. Benwing2 (talk) 03:52, 20 December 2023 (UTC)
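As a rough illustration of the |lect= idea, here is a minimal Python sketch of how a comma-separated value could be turned into the label text. It is not the actual module code (which is Lua): the function name `build_label` and the table `LECT_LABELS` are hypothetical, the label strings are the ones quoted earlier in this thread, and defaulting to `sk` when no value is given is an assumption meant to mirror the current behaviour.

```python
# Label strings as quoted in this thread; the mapping itself is hypothetical.
LECT_LABELS = {
    "sk": "SK Standard/Seoul",
    "nk": "NK Standard/Pyongyang",
}


def build_label(lect=None):
    """Turn a value such as 'nk' or 'sk,nk' into the parenthesised label."""
    # Assumption: with no |lect= given, fall back to the current SK label.
    codes = [c.strip() for c in lect.split(",")] if lect else ["sk"]
    unknown = [c for c in codes if c not in LECT_LABELS]
    if unknown:
        raise ValueError("unknown lect code(s): " + ", ".join(unknown))
    return "(" + " and ".join(LECT_LABELS[c] for c in codes) + ")"
```

Under these assumptions, `build_label("nk")` would give the NK-only label wanted for terms like 녀자, and `build_label("sk,nk")` would give the combined label proposed above for terms used in both standards.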

@Benwing2: Yes, comma-separated values would work.

Regarding respelling North vs South - yes, potentially, since there are at least some cases. The famous surname Lee (Ree, Li, Ree, Yee, Yi, etc.) (in the linguistic sense) 이(李) (I, “Yi (SK)”) vs 리(李) (Ri, “Ri (NK)”) can be read as the NK form in the South, or 육(六) (yuk, “six (SK)”) vs 륙(六) (ryuk, “six (NK)”) can be pronounced in the South as the NK form in some positions or in the North as the SK form (dialectal or non-standard), but we may find better cases. I don't want to make things more complex than they are for you, though. Anatoli T. (обсудить/вклад) 04:05, 20 December 2023 (UTC)

@Atitarev If it is just a few rare cases, then IMO they're better handled by having two invocations of {{ko-IPA}} with appropriate dialectal restrictions, so I'll just implement that. Benwing2 (talk) 04:10, 20 December 2023 (UTC)

While we're at it, could we maybe get the different Korean modules to where they're readable? AG202 (talk) 04:49, 20 December 2023 (UTC)

@AG202 What do you mean by "readable"? More documentation? Benwing2 (talk) 05:06, 20 December 2023 (UTC)

@Benwing2: Hi,

If North Korean handling of the same words is ever introduced, here is an example difference.

The word is spelled as "doglib" but:

South: 독립(獨立) (dongnip, “independence”) respelled as 동닙 (dongnip) pronounced (SK) (can also be seen in the entry)
North: 독립(獨立) (dongrip, “independence”) respelled as 동립 (dongrip) pronounced (NK)

Currently the module handles the SK version. Anatoli T. (обсудить/вклад) 07:04, 20 December 2023 (UTC)

And "dongnip" (IPA ) is still OK in the North.

The whole actual pronunciation difference (especially when the spelling is the same part) can be described in one small section as in w:North–South_differences_in_the_Korean_language#Pronunciation. Anatoli T. (обсудить/вклад) 07:16, 20 December 2023 (UTC)

@Benwing2: An example: 미스터 이 ― miseuteo I ― Mr Li can be pronounced as "miseuteo Ri" in South Korea, as if it were spelled "리", not "이" (it can also be spelled 미스터 리 ― miseuteo Ri ― Mr Li). This surname's spelling and pronunciation are really confusing for everyone, including South Koreans. Anatoli T. (обсудить/вклад) 04:11, 20 December 2023 (UTC)

@-sche, @Benwing2, @AG202: If you decide to change the display and what to change it to, the message is set here: Module:ko-pron#L-49 Anatoli T. (обсудить/вклад) 02:11, 19 December 2023 (UTC)

As for the IPA of "두부김치", you need to know the difference between phonetics and phonology. Please read the w:International Phonetic Alphabet#Brackets and transcription delimiters. Dubukimchi (talk) 22:47, 18 December 2023 (UTC)

And I didn't raise the issue, but you edited my user page last month without my consent. @Atitarev Dubukimchi (talk) 22:57, 18 December 2023 (UTC)

@Dubukimchi: We all agreed on the method of representing Korean pronunciation. You can agree or disagree, but don't do your own thing; it creates a mess.

When I added a Babel to your page, I asked you to check. It's a courtesy to other users who might want to know. I have rolled it back. Anatoli T. (обсудить/вклад) 23:17, 18 December 2023 (UTC)

@Dubukimchi Where do you live, are you below the age of consent? Word0151 (talk) 03:05, 21 December 2023 (UTC)

Deprecate Judeo-Persian?

We have Category:Judeo-Persian language as an L2. There are various Iranian languages traditionally spoken by Jews in Iran and nearby countries, but Judeo-Persian proper refers to just Classical Persian written in Hebrew script. It might have to be treated like Category:Judeo-Arabic and deprecated as a language.--Saranamd (talk) 09:11, 18 December 2023 (UTC)

כה is an example of a Judeo-Persian entry I just made.--Saranamd (talk) 09:21, 18 December 2023 (UTC)

A few entries do pose problem like תורה (torâ, “the Torah”), which was probably never used in Arabic script. Presumably we can just make the Hebrew script be the main lemma for these few cases of Hebrew loans in otherwise standard Classical Persian texts.--Saranamd (talk) 10:32, 18 December 2023 (UTC)

Fine, I have seen papers and editions about Judeo-Persian and viewed it exactly like Judeo-Arabic, just not spoken about it, due to its exotic appeal: one would just create label data and an etymology-language code on some occasion, I supposed, but there we have an unjustified full language code even. As seen in the recent discussion about Gorgani, technically there is in the same fashion Judeo-Mazanderani, but I am just the first person in the world to employ this term. I guess @Fenakhay has to delete it before it forms metastases. Seems we just waited till someone who actually does Classical Persian confirms that it does not make much sense. Fay Freak (talk) 12:32, 18 December 2023 (UTC)

@Fay Freak It is actually a little more complicated because Early New Judeo-Persian (ENJP) is grammatically quite different from what solidified as Classical Persian, due to having different dialectal backgrounds. But the most celebrated works of Judeo-Persian literature, like the masnavis by Shahin of Shiraz or Elisha ben Samuel (I am quoting the latter), really are 100% Classical Persian in every way only with a few Hebrew religious terms mixed in.

Maybe ENJP can have its own language code when someone interested in it comes around (I am not).--Saranamd (talk) 12:42, 18 December 2023 (UTC)

From Wiktionary:Language treatment, I see the split of Judeo-Persian happened in 2013, when everyone was less informed, by -sche and Metaknowledge (who later vouched for merging Judeo-Arabic). Judeo-Tat language is left, I don’t know what that is, and the Jews who added entries in it might not know either, even less than Vahagn Petrosyan who added entries in it, but copied content of some reference works. Fay Freak (talk) 12:51, 18 December 2023 (UTC)

Personally I would like to distinguish between Judeo-Persian as a literary language (i.e. the Classical Persian language written in Hebrew script) and the Iranian dialects actually traditionally spoken by Jews. The actual spoken dialects tend to be more different from Persian proper, e.g. spoken Judeo-Shirazi is a direct descendant of Old Shirazi and the premodern Jewish poets did not write in it. So Judeo-Tat can stay.--Saranamd (talk) 13:03, 18 December 2023 (UTC)

Template editor permission request

Hello, all, may I please request to be made a template editor? I asked for this permission in the past, when we were working on specific pronunciation templates for Bulgarian, but these days I still occasionally involve myself with edits and improvements that require this permission, and it's become clearer to me that this may continue to be necessary — so in order to avoid bothering others who are already busy with things, I believe I would benefit from being able to make such small changes myself. I promise to consult with others before making any major changes. Thanks for your consideration, Kiril kovachev (talk・contribs) 20:56, 18 December 2023 (UTC)

I support this; if no one objects I will give him the permission. Benwing2 (talk) 22:43, 18 December 2023 (UTC)

Made it so. P U C – 13:38, 21 December 2023 (UTC)

Thanks! Kiril kovachev (talk・contribs) 20:01, 21 December 2023 (UTC)

How-To: forms with the same etymology, but different pronunciations

Hi,

I recently came across this when working on Bulgarian гаснете (gasnete). It's a non-lemma form of the verb гасна (gasna), but depending on stress, it corresponds to one of two inflections:

га́снете (gásnete) - 2nd person plural, present tense
гасне́те (gasnéte) - plural imperative

The BG acceleration module predictably created 2 separate POS sections (Verb) since the headword is different, and they're not just variants of each other - each stressed form corresponds to a different inflection. However, since AFAIK "Pronunciation" can't be a per-POS-section header, I ended up copying some of the information when giving the IPA for the two forms. It was the best I could come up with.

I suppose I could've created two separate "Etymology" sections, each with its own "Pronunciation", but that IMO would've gone against the common understanding that a separate Etymology section is only needed for etymologies that are, in fact, distinct. This is not the case here.

It gets even more interesting with verbs like летя (letja, “to fly”), which orthographically is both the verb's lemma form (1st person singular, present), as well as 2nd and 3rd person singular, aorist. The lemma form is pronounced IPA(key): , while the aorist form is pronounced IPA(key): , and in both cases the stress is on the same syllable. The entry indicates that with a {{q}} after each pronunciation.

Is supplying qualifiers to different IPA bullet points in the Pronunciation section the preferred way of handling this type of thing? Or is there another, better way?

Thanks,

Chernorizets (talk) 01:51, 19 December 2023 (UTC)

@Chernorizets What you are doing is essentially what I believe is the best way. Some people use ==Pronunciation 1== and ==Pronunciation 2== headers, but there's no standardization of these headers (they're not even mentioned in WT:EL) and it gets very messy that way. When I encounter such headers I tend to clean them up using qualifiers on the pronunciation entries. I suppose a third way is to nest the pronunciation header under the respective POS header, but I've never seen that done. Benwing2 (talk) 02:53, 19 December 2023 (UTC)

Examples used for Portuguese words

Paging @Stríðsdrengur, @Sarilho1 and @Benwing2, as I've seen contributions by them in articles related to the Portuguese language. Checking some Portuguese words here and there, I noticed that the Brazilian dialect is the one commonly used for the examples (for example: "Ele veio me cumprimentar", on the page "vir"). Now, in my opinion, while I think it's alright, I also think that we should instead opt for more "universal" examples that work for both the European and Brazilian dialects. However, some Wikipedias (such as the Portuguese one) have a policy against correcting a "mistake" for the reason of it being written in a different dialect. I wasn't able to find such a policy here, so I'm wondering whether it would be possible? Jão das Couves (talk) 22:18, 20 December 2023 (UTC)

Bem-vindo. I agree that more dialect-neutral Portuguese (and Spanish, etc.) is preferable except in cases where we are trying to explicitly show Brazilian cases or peninsular cases. —Justin (koavf)❤T☮C☺M☯ 22:49, 20 December 2023 (UTC)

I forgot to say above, but I also agree that for explicitly Brazilian or European cases we should use that dialect (such as bebê, of course I'm going to keep the Brazilian dialect). Jão das Couves (talk) 08:28, 21 December 2023 (UTC)

And similarly, usage examples where the word has different shades of meaning per dialect, like abacaxi. —Justin (koavf)❤T☮C☺M☯ 14:21, 21 December 2023 (UTC)

I think we could put up two examples of using the same sentence but with the spelling/grammar of each "main dialect" of the language (European and Brazilian), just as we do with some words that are spelled differently depending on the country. Stríðsdrengur (talk) 22:53, 20 December 2023 (UTC)

That's because some of our most prolific early contributors were from Brazil. As to the issue at hand, I would say: add and label, don't replace. If the example is only valid for Brazil, add a "Brazil" qualifier and also a properly qualified "Portugal" example, so our readers can learn the difference. It might not hurt to have a "universal" example, too. Chuck Entz (talk) 22:57, 20 December 2023 (UTC)

That's exactly what I wanted to say Stríðsdrengur (talk) 22:59, 20 December 2023 (UTC)

Agreed about labeling and augmenting rather than changing. Benwing2 (talk) 23:06, 20 December 2023 (UTC)

Exactly what I was thinking. CitationsFreak (talk) 01:57, 21 December 2023 (UTC)

I will use that option then as more people agreed. Thank you everyone that commented! Jão das Couves (talk) 19:00, 21 December 2023 (UTC)

I'm not really convinced we should label sentences like "Ele veio me cumprimentar" in "vir" as pertaining to Brazil tbh. I agree we should strive to be 'neutral', but in that sentence, the difference between a Brazilian-looking entry and a Portuguese one is simply the placement of "me". That's a really minor thing, plus the page's about the verb "vir", which's used the same way in both countries (at least for that sense), so I feel like such a label would be adding unnecessary information. I don't think we do that in English when a sentence uses a word like "parlor" or "gotten". Universal examples sure; changing that sentence to something like "Ele veio cumprimentar teu amigo" sounds good to me, but duplicating examples? Putting labels on them when the sentence isn't even about those terms? Nah, that's too much. MedK1 (talk) 02:36, 21 December 2023 (UTC)

Firstly, I know the example only has a minor difference, I used that one because I had just checked it. Now, about your opinion, I agree that adding an entry to both dialects in a page that isn't about the word/expression that marks the difference between them is unnecessary, a 'neutral' example would be much better. But if I come through an example like that one, that isn't about the word/expression that marks the difference, but can't be 'neutral' without changing its meaning, I'll either duplicate the examples or change the meaning, if possible. Jão das Couves (talk) 19:17, 21 December 2023 (UTC)

My take is that indeed, a neutral example is better whenever possible, but duplicating the examples when the difference lies in a term irrelevant to the actual page is always unnecessary. You don't have to be explicitly neutral 100% of the time.

Also... I was gonna write about this in your talk page, but I mean, I'm talking to you here already so why not lol. About "ter que" vs "ter de" in your user page, I don't think it's actually a mistake to write "ter que", see Ciberdúvidas and Dicio. "Ter de" is (apparently?) more advisable, sure, but I'd dare argue it's a case of prescription rather than proscription at best and one similar to fewer and less at worst; never once have I lost points in essays for writing it like that.

You claim that "ter que fazer" is actually "ter o que fazer", but as you said right afterwards, that's a completely different sentence with a completely different meaning. To claim something like that is to go out of your way to misinterpret the construction just because of one's distaste for it. "ter que" in the outlined sentences is obviously and clearly a synonym of "ter de" regardless of how one might want to spin it. The point here is that you don't have to replace one form with the other, that's also unnecessary. It might be interesting to add some usage notes to "ter que" though. MedK1 (talk) 20:41, 21 December 2023 (UTC)

I didn't find those links when I was searching, so thank you! The reason I was replacing 'ter que' with 'ter de' is, again, to turn examples more 'neutral'. The other links I had checked (2 European and 2 Brazilian) all agreed (the Brazilian ones said there wasn't a consensus among linguists) that 'ter de' was the correct one for most forms (I think I have the exception in my user page, I can also provide the link if you want), and, to me, 'ter de' just seems more natural and correct. And, as both dialects agreed, I thought it would make sense to change that. But, again, thanks for providing the links! I'll keep the 'ter que's as they are then. Jão das Couves (talk) 21:48, 21 December 2023 (UTC)

Deletion of entries

A particular user deleted many of my entries. They need to be restored. Word0151 (talk) 03:04, 21 December 2023 (UTC)

@Word0151: many of your entries deserved to be deleted- for instance, recreating an entry that was previously deleted as a protologism with no citations and not even a definition is just wasting our time and yours. Chuck Entz (talk) 03:19, 21 December 2023 (UTC)

@Chuck Entz The English entries were created in a short period of time and were around 8-15 in number; but most of the deleted entries were valid entries. Word0151 (talk) 03:39, 21 December 2023 (UTC)

@PUC Word0151 (talk) 12:57, 21 December 2023 (UTC)

Are you unaware of our WT:CFI? Previously deleted terms should be re-entered already with quotes. Vininn126 (talk) 13:08, 21 December 2023 (UTC)

Oh! Thanks, I was unaware of such a rule. Next time I will be more careful. Word0151 (talk) 13:13, 21 December 2023 (UTC)

Finally settling the matter with ES "verbse" forms having articles and PT "verb-se" lacking them

I'm pretty sure this isn't the first topic about this. Here's hoping it'll be the last? I wanna do something about that. In previous discussions, I believe the problem was that since the Spanish forms are directly attached to the verbs, they have a stronger case toward not being SOP, therefore fitting WT:CFI. Meanwhile, forms that are only ever written with hyphens and never together are a bit of a nebulous area even in English (going-out still doesn't exist for instance).

I still don't really know what to do about it nor what would fit WT:CFI the best, but this inconsistency really grinds my gears. Let me just run a wild idea by y'all though; it's something that popped up in my mind ever since the idea to make ]] was raised in my topic about Japanese verb forms above: Maybe we could do that with forms like abajarlos, considering them as SOP and then link to those as abajarlos (hover over that!)? And then we only keep/make pages that actually modify the base verb, as in acercándonos or pegá-los.

It kind of makes sense since, although they're written as one word, they're still clearly analyzable as different terms and arguably still don't form a single unit, right? This was me spitballing a compromise between "keep all Spanish pages and make the ones for Portuguese and others as well" and "delete all the Spanish pages". Maybe these other alternatives are still worth considering? I don't know, but let's please do something. MedK1 (talk) 23:53, 21 December 2023 (UTC)

I appreciate your initiative and agree that there should be a definitive solution, but this seems like the worst of both worlds. :/ I'm team "make all the Portuguese ones". —Justin (koavf)❤T☮C☺M☯ 00:03, 22 December 2023 (UTC)

@Koavf: I see your point and I can't exactly disagree, but I also can't help but think that the meaning of something like "pegou-me" is just obvious if you know what pegou and me mean. The """compound""" is really just the two terms mashed together and the same goes for the Spanish ones. Do we really need pages on those? This is why I actually lean toward the "delete the Spanish ones" team. I don't feel strongly either way though hm. MedK1 (talk) 00:39, 22 December 2023 (UTC)

Is there any reason not to keep the closed Spanish forms but delete/not create the hyphenated Portuguese ones? I don't see why consistency between separate languages (which obviously have separate spelling systems) should be a consideration.--Urszag (talk) 01:17, 22 December 2023 (UTC)

The difference between the two is really just the hyphenation and some different accentuation as a result of adding the hyphens. I don't see how they're any different when it comes to SOP-ness or fitting WT:CFI. The reasoning I gave at the original post is retroactive — an attempt (and a feeble one, might I add!) to explain the current state of affairs. There's no reason to keep the inconsistency. MedK1 (talk) 13:59, 22 December 2023 (UTC)

@Urszag Some of the hyphenated terms esp. those involving mesoclisis can't be analyzed as SOP words. E.g. fá-lo-ia "I/he/she would do it" is rather different from the equivalent Brazilian o faria. None of fá-, -lo nor -ia are standalone words in Portuguese. OTOH pegou-me "he/she grabbed me" is indeed transparently pegou + me, as shown by the equivalent Brazilian me pegou. The trickiness with Spanish that doesn't exist to such an extent in Portuguese is that the mashing together of words often changes the spelling as a result of the appearance or disappearance of a written accent, according to quite complex rules. So I don't really know the solution; but IMO we should try to come up with a general principle that can apply to polysynthetic languages as well as languages that are in between agglutinative and fusional, such as Egyptian Arabic, where a sentence like "they won't give it to him" is expressed in a single word that has several phonetic modifications compared with the individual parts. Benwing2 (talk) 15:58, 22 December 2023 (UTC)

@Benwing: True. I hadn't thought about the mesoclisis forms. I've changed my mind from my original stance; I now think the option to 'delete the Spanish one' is a bit senseless due to the accentuated and/or slightly altered forms that can't quite be analyzed as SOP. Between "delete Spanish" and "add all the Portuguese ones", I'm leaning toward the latter, though maybe the original proposal at the opening comment might be worth considering... Hopefully we can reach an actual consensus here. MedK1 (talk) 03:46, 30 December 2023 (UTC)

What you have said, MedK1, makes sense. Thalyson2019 (talk) 02:19, 14 January 2024 (UTC)

Language codes for the proto forms of upper branches of Dravidian (south, south-central, central, north)

This has been discussed many times, and adding them was also agreed upon, as here and here.

The proposal was because there are tons of terms which are restricted to certain branches, sometimes being only attested in like 4 languages of 1 branch (*cinki for example) in a family of 80+ languages with 4 branches but are reconstructed to the proto stage. {{R:dra:DL}} has many reconstructions for the inner branches. {{R:dra:Southworth}} mentions many cases where BK reconstructs terms to PD when they are restricted to certain branches like PSD or PCD, further saying that such terms should be reconstructed only to the proto-branch. AleksiB 1945 (talk) 10:35, 22 December 2023 (UTC)

Placement of References and Further reading in entries

Why does a certain bot (AutoDooz) place References above Further reading? Was there any consensus for this? I would prefer the order to be Further reading, References, then Anagrams. Hyde is an example. DonnanZ (talk) 11:54, 22 December 2023 (UTC)

Er, no, "References" goes before "Further reading". See "Wiktionary:Entry layout#Headings after the definitions". — Sgconlaw (talk) 12:03, 22 December 2023 (UTC)

Um, does that mean that nothing can be changed? DonnanZ (talk) 23:54, 22 December 2023 (UTC)

@Donnanz You can always start a vote to get it changed. I don't have a preference either way. Theknightwho (talk) 00:07, 23 December 2023 (UTC)

FYI: December updates from Unicode

https://mailchi.mp/7ae7b6c96c6b/testing-rickys-template-6264818 —Justin (koavf)❤T☮C☺M☯ 23:12, 22 December 2023 (UTC)

An idea for a Wiktionary competition...

I was reading some old children's dictionaries (don't ask), and I realized something. You see, there are certain types of these dictionaries that define words like this: they use a sentence that uses the word they want to define, and then they rephrase the sentence with a definition. It would go something like "He had a manxome look. He had a fearsome look.". Of course, this type of highly repetitive writing is weird, and not something you would typically find in the speech of anyone. Which is why I want to have a writing competition where the goal is to write a part of a play that uses as many sentences in this vein as possible.

How this would work is that I would give you a small alphabetic range and a certain time frame to write some dialogue, probably somewhere around a week or so. Scoring would be based on how many definitions you put into it, with more being better. I think you should do this by email, so you can't see how many words your opponents have defined in their works, although I am open to other suggestions.

Sound like fun?

I chose a play as what you'd write because of the fact that plays are generally oral in their dialogue, and so hearing (hypothetical) people say the no-doubt weird dialogue is a little funny to me. CitationsFreak (talk) 10:36, 23 December 2023 (UTC)

Sure, I would participate if we did that :) Kiril kovachev (talk・contribs) 15:27, 23 December 2023 (UTC)

I think this sounds like a great idea, but I think it would be more fun if it was in the style of those games where one person gives a line/few words, and then the next person continues the story where they left off, and so on. Those often become quite entertaining and it would be more of a community game. You could still assign people a certain alphabetic range that they have to use in their answers, but the play overall would have a greater range. What do you think? Andrew Sheedy (talk) 21:52, 23 December 2023 (UTC)

Sure. Less work for everybody, y'know? CitationsFreak (talk) 22:02, 23 December 2023 (UTC)

Along the lines of "exquisite corpse", where you get a fragment to continue from, and nobody sees the whole thing until the end. Equinox ◑ 22:04, 23 December 2023 (UTC)

What if it was written like how a dictionary would be written, where every word defined must come a little after the previous word defined? CitationsFreak (talk) 00:00, 24 December 2023 (UTC)

#invoke in entries

Are we allowed to use #invoke in main space? When I started serious editing here, I was told we weren't, but I'm no longer sure it wasn't one editor venting his dislike of the construct. The particular issues are whether I should replace {{#invoke:string|replace|...}}, as in Sanskrit බුද්‍ධ (buddha), by {{replace|...}} and instead of using {{#invoke:sa-convert|tr|...}}, as in අහම් (aham), use {{sa-convert|...}} , where the templates would need to be written. If I need to write the templates, what category should I put {{replace}} in? --RichardW57 (talk) 16:19, 24 December 2023 (UTC)

@RichardW57 It's forbidden by WT:NORM#Modules. Theknightwho (talk) 13:41, 29 December 2023 (UTC)

I doubt that this was ever voted on. It therefore might not be policy, rather than practice or guideline. DCDuring (talk) 14:28, 29 December 2023 (UTC)

@Theknightwho, DCDuring: Thanks. Indeed, the second paragraph says they're only mandatory for bots. I'm not a bot. However, this restriction was voted on, and passed 6-0-1. I could be antisocial and take the view, "If you don't like it, fix it yourself". --RichardW57 (talk) 15:12, 29 December 2023 (UTC)

But I've now created and applied {{replace}} and {{sa-convert}}. --RichardW57 (talk) 23:06, 29 December 2023 (UTC)

Notes section vote

I created a vote at Wiktionary:Votes/pl-2023-12/Notes section. Catonif (talk) 17:00, 27 December 2023 (UTC)

Problem with Category:Old Galician-Portuguese Lemmas

Paging @Stríðsdrengur and @Sarilho1, for their contributions in entries related to OGP.

I think the Category:Old Galician-Portuguese Lemmas isn't working as it should. At the time of writing this topic, the oldest pages by last edit (Lisbõa, menỹo and camỹo) are all non-lemmas. I'm still new to Wiktionary, so I don't know how to help with that. Amanyn (talk) 18:47, 27 December 2023 (UTC)

Btw, also forgot to say, but if anyone's got extra time could they help with ãade? It's an OGP term for duck, and maybe a category:roa-opt:Ducks could be added to put it in (that category then being put in category:roa-opt:Poultry, or however it is done, I don't have experience with categories). Amanyn (talk) 18:54, 27 December 2023 (UTC)

@Amanyn: they should be lemmas: an alternative form/spelling of a lemma is itself a lemma. If you think about it, the alternative spelling has all of its own inflected forms, just like the main spelling has its own. Besides which, in languages with more than one standard, the decision as to which is the main spelling and which is the alternative spelling is often at least partly a judgment call. With inflected forms, on the other hand, a language which has the infinitive as the lemma for verbs can have all of the present, past, future, and other verb forms labeled automatically by the headword template as nonlemmas without the template knowing anything about the verb itself. Chuck Entz (talk) 05:02, 28 December 2023 (UTC)

Alright @Chuck Entz, thank you for explaining it. Amanyn (talk) 11:14, 28 December 2023 (UTC)

planning a vote to decide on the NURSE and STRUT vowels

Whether to notate STRUT as /ʌ/ or /ə/ and NURSE as /ɜ(ː)/~/ɝ/ or /ɚ/~/əɹ/ has come up several times and there's been a fair bit of support for using schwa, but the discussions died without resolution. So, I created Wiktionary:Votes/2023-12/Represent the GenAm NURSE and STRUT vowels as schwa. Please make or discuss any changes to the vote you think necessary. In particular, I titled it as being about GenAm, but we should consider whether to rename it and also vote on how to notate these vowels in (UK) / SSBE: the OED represents NURSE as schwa in both UK and US English, and though they still notate STRUT words as /ʌ/ for British (/ə/ only for American), their UK and US audio is identical.
I'd love if one day, when we get a T:en-IPA, we could include the historical evolution of sounds, e.g. (if we write modern GA NURSE as /ɚ/) mention older GA /ɝ/, mention pre-pane-pain merger pronunciations, etc — I think it's silly we provide more info about the evolution of Egyptian or Greek pronunciation than English! - -sche (discuss) 22:15, 28 December 2023 (UTC)

I'm not sure how accurate it is to present "ɝ" vs "ɚ" in stressed syllables as a matter of historical evolution of the phonetic vowel sounds (as opposed to evolution of notational conventions). Daniel Jones's transcription in the Phonetic Dictionary of the English Language (1913) marked the vowel in "earl" as əː. While Jones's general philosophy was to allow some wiggle room in quality between long and short paired vowels (thus the use of u, i, ɔ for what Gimson would transcribe as ʊ, ɪ, ɔ), his usage suggests to me that the sounds may have been as similar a century ago as they are today.--Urszag (talk) 02:07, 30 December 2023 (UTC)

Defragmentation (merging) of Korean X & X하다 pages

In Japanese, the highly productive light verb する (suru) is used to derive verbs from nouns, which are often (but not explicitly) loanwords, such as (more commonly) Chinese loanwords, or English loanwords. Because this is such a productive and common thing to do, on Wiktionary, we list the suru-verb of a noun under the same page as the noun, such that the page 勉強 has both the entries for the noun 勉強(べんきょう) (benkyō) and the verb 勉強(べんきょう)する (benkyō suru), even if the verb form has separate, differing, or extended meanings from the base noun form (as does 勉強(べんきょう)する (benkyō suru)). If someone tries to go to the page 勉強する, they get redirected to 勉強.

Korean, as a language, operates very similarly in this regard, apart from minor differences such as object particle usage. The light verb is instead 하다 (hada), and can be used almost syntactically identically as a "verbifier" to the Japanese する (suru), such as deriving verbs from native vocabulary or out of Chinese or English loanwords, or natively coined vocabulary out of foreign parts (such as Konglish or the broader Sino-Korean vocabulary, which also comprises 和製漢語(わせいかんご) (wasei-kango)). However, we treat Korean differently from the way we treat Japanese and have separate pages for a given "하다-able" noun and the noun + 하다, e.g. 공부 (gongbu) and 공부하다 (gongbuhada). (However, there is one example of a 하다-verb not having its separate page and redirecting to the base term page: -게 하다 (-ge hada), which redirects to -게 (-ge).) Chae (2023) expresses discontent at this type of practice in general:

Among the confusing practices, what stands out is the treatment of the ‘light verb’ ha- ‘to do’ and similar words like toy- ‘to become’ when they combine with a preceding ‘verbal noun’ (Chae 2020:157–59). Despite their status as (regular/independent) words, they are commonly treated as derivational affixes.
(1)
cyoni kongpwu(lul) cal/(acwu) manhi/… ha∅nta.
cyon-i kongpwu(-lul) cal/(acwu) manhi/…
John-NOM study(-ACC) well/(very) much do-NPST-DECL
‘John studies well/(very) much/…’

This example demonstrates that a variety of external elements can be inserted between kongpwu and ha-, indicating that ha- is a regular word and forms a phrase. Unfortunately, however, the following expressions of verbal noun plus ha- or similar verbs are analyzed as words rather than phrases in the handbook.

(2)
a. palphyo-ha- ‘to present’ (108), sayngkak-ha- ‘to think’ (344), mal-ha- ‘to say’ (380, 613), kitay-ha- ‘to expect’ (622), kongpwu-ha- ‘to study’ (666)

b. sengkong-sikhi- ‘to make succeed’ (387), kiso-toy- ‘to be prosecuted’ (399), kyeysok-toy- ‘to continue’ (438)

Even expressions that contain the particle -lul, like tongco-lul-ha- ‘to agree/align’, are analyzed as single words (804).
This undesirable tradition may have arisen from the function of ha-, which is to create verbal expressions out of loanwords. For instance, Korean borrowed numerous (verbal) words from Chinese, which needed to be adapted as verbals within the Korean grammatical system. To facilitate this adaptation, ha- was attached to these words, resulting in formations like ‘to construct’ and ‘to play soccer’. In recent times, English loanwords have also been assimilated as verbals in a similar manner, as seen in formations like ‘to do drive’ and ‘to do study’. This verbalizing function of ha- and its lack of (significant) meaning may have led to its analysis as a derivational affix. However, it is essential to understand that the morphosyntactic status of a linguistic unit as an affix or a word is not determined by its functions or meanings. As is well documented, the same function/meaning can be realized as an affix, a word, or even a larger expression in different languages.

I agree with Chae here. Personally, I just find the fragmentation of meaning across multiple pages particularly for loanwords unnecessary. It is definitely true that there are some cases where idiomatic usages, special grammatical constructions, orthographic considerations, or other large differences in meaning, usage, or semantic perception make this fragmentation necessary, such as in 못하다 (mothada) or -어야 하다 (-eoya hada). Such considerations are valid for Japanese too, which is why we split off terms like 関(かん)する (kansuru), 私(わたくし)する (watakushisuru), 息(いき)をする (iki o suru), or 後(あと)にする (ato ni suru) into separate pages. But for an extremely large body of predominantly Chinese-derived vocabulary, where there is virtually zero semantic difference between a noun form and a verb form other than part of speech, whether the existence of both e.g. 학습 (hakseup) and 학습하다 (hakseuphada) as separate pages is necessary or not is questionable. LittleWhole (talk) 05:57, 29 December 2023 (UTC)
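The redirect behaviour proposed above (a light-verb title resolving to the base noun page unless the verb needs its own page) can be sketched roughly as follows. This is only an illustration in Python, not how Wiktionary redirects are implemented: `resolve_title` and the `SEPARATE_PAGES` set are hypothetical names, and the exception list simply reuses the two Korean examples mentioned in the post.

```python
# Hypothetical list of titles that keep their own page because of idiomatic
# or grammatical differences (the two examples come from the post above).
SEPARATE_PAGES = {"못하다", "-어야 하다"}


def resolve_title(title: str, light_verb: str = "하다",
                  keep: frozenset = frozenset(SEPARATE_PAGES)) -> str:
    """Return the page a lookup would land on under the proposed merge.

    A title ending in the light verb resolves to the bare noun, unless it is
    the light verb itself or is listed as needing a separate page.
    """
    if title in keep or title == light_verb or not title.endswith(light_verb):
        return title
    return title[: -len(light_verb)]


for lookup in ["공부하다", "학습하다", "못하다", "하다", "공부"]:
    print(lookup, "->", resolve_title(lookup))

# The same function models the existing Japanese treatment.
print(resolve_title("勉強する", light_verb="する", keep=frozenset()))
```

The exception set is the part that would need editorial judgment; everything else is a plain suffix check.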

I'd oppose any merging that involves a lemma that can be found elsewhere. I'd like to stay in line with monolingual Korean dictionaries. Also coming from a non-native's perspective, folks are more likely to look up the form with -하다 when they're looking up the verb, and I'd rather avoid creating a ton of redirects. I'd also like to avoid possibly overloading the noun pages themselves. AG202 (talk) 17:29, 29 December 2023 (UTC)

Strong support.--Saranamd (talk) 04:23, 30 December 2023 (UTC)

^ Chae, Hee-Rahk. Review of The Cambridge handbook of Korean linguistics ed. by Sungdai Cho and John Whitman. Language, vol. 99 no. 4, 2023, p. 844-850. Project MUSE, https://doi.org/10.1353/lan.2023.a914195.

Categories need documentation of how populated, why, where, etc.

Currently it seems very easy to create categories in modules which make it quite hard for normal contributors to determine much about the category: what leads to such categorization, why the category is necessary, whether it is to be permanent, how to depopulate it if it seems to refer to a problem, what the program is to depopulate it if not manually, who created the category, etc.

When such categories were created either manually or directly by a template, the lack of documentation was less difficult than it is now. The operation of modules is not inherently transparent and is not well documented from the point of view of normal contributors. I don't think that the unstated imperative "Trust us, we are doing the right thing" is good enough.

I would be happy to document all categorizing taxonomic templates for which I can get understanding of the workings of any modules that have usurped their operation. I've been thinking lately that it would probably be a good idea to eliminate many taxonomic categories in favor of relying on searching for parameters in the templates {{taxon}}, {{taxoninfl}} and {{taxlink}}. DCDuring (talk) 16:30, 29 December 2023 (UTC)
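As a rough illustration of what "searching for parameters in the templates" could look like outside the site's own search, here is a minimal Python sketch that pulls {{taxon}}, {{taxoninfl}} and {{taxlink}} calls out of a page's wikitext. The function name, the regular expression and the sample text (including the parameter layout shown in it) are assumptions made for the example; the pattern deliberately ignores nested templates and piped links inside parameters.

```python
import re

# Matches simple, non-nested calls to the three taxonomic templates.
TEMPLATE_RE = re.compile(r"\{\{\s*(taxon|taxoninfl|taxlink)\s*\|([^{}]*)\}\}")


def find_taxonomic_calls(wikitext: str):
    """Return a list of (template name, positional params, named params)."""
    calls = []
    for match in TEMPLATE_RE.finditer(wikitext):
        positional, named = [], {}
        for part in match.group(2).split("|"):
            if "=" in part:
                key, value = part.split("=", 1)
                named[key.strip()] = value.strip()
            else:
                positional.append(part.strip())
        calls.append((match.group(1), positional, named))
    return calls


# Illustrative sample only; not copied from a real entry.
sample = (
    "# {{taxon|genus|family|Felidae|the big cats}}\n"
    "* {{taxlink|Panthera leo|species}}\n"
)

for name, positional, named in find_taxonomic_calls(sample):
    print(name, positional, named)
```

A search of this kind would make the data findable without a category, but it does nothing for the documentation problem raised above.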

Entries for Sanskrit Noun-Verb Bahuvrihis

(Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Inqilābī, Getsnoopy, Rishabhbhat, Dragonoid76): As I understand it, most editors are following a policy of self-denial for bahuvrihi compounds for Sanskrit, Swedish, German and similar languages, and treat bahuvrihis as potential SoPs. Under this policy, how should I treat a compound like Sanskrit पर्वतस्थ (parvatastha, “standing on a mountain”)? Should I treat it as a sequence of words पर्वत (parvata, “mountain”) and Sanskrit स्था (sthā, “to stand”, root), some way similar, or do I have to regard it as a single word? If not a single word, how do I record the inflection of the root as a final element? As a final element it declines from a stem -स्थ (-stha), with a by-form -ष्ठ (-ṣṭha) under RUKI, e.g. भूमिष्ठ (bhūmiṣṭha). (Both compound words show up in the final line of the quotation for Sanskrit යදා (yadā).) --RichardW57 (talk) 07:50, 30 December 2023 (UTC)

@RichardW57 I recently created a template kind of addressing this for what Monier-Williams calls "ifc" (in fine compositi, i.e. "at the end of a compound") called {{sa-ifc}}, which is currently only used on the page शी (śī). Monier-Williams defines स्थ (stha) as such an "ifc" word meaning "standing, staying, abiding, being situated in, existing or being in or on or among".

I would say that we should definitely have an entry for the ifc words like स्थ (stha), while the legitimacy of entries like पर्वतस्थ (parvatastha) can be determined on a case-by-case basis (if not, they are just an example use-case of the ifc word). Dragonoid76 (talk) 23:02, 30 December 2023 (UTC)

@Dragonoid76: That makes sense, but I think we may have a technical issue for roots in -ā in which the masculine stem ends in -a. If such feminines were declined as monosyllables, Whitney should not be as short of examples as he complains he is. Is it possible that they actually take the polysyllabic (aka 'derived') declension as feminines formed from the masculine? {{sa-decl-adj-mfn}} is delivering the monosyllabic -ā declension for the feminine of स्थ (stha), which I fear may be incorrect. --RichardW57 (talk) 19:45, 31 December 2023 (UTC)

(Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Inqilābī, Getsnoopy, Rishabhbhat, Dragonoid76): Indeed, by Paragraph 333 of Whitney (https://en.wikisource.org/wiki/Sanskrit_Grammar_(Whitney)/Chapter_V#111) - "There are no verbal roots ending in a. But a is sometimes substituted for the final ā of a root (and, rarely, for a final an), and it is then inflected like an ordinary adjective in a (see below, 354)." Ah well, in the short (I hope) term, just expand the declension to {{replace|{{replace|{{sa-decl-adj-mfn|kastha|kasthā|kastha}}|कस|स}}|kas|s}}. RichardW57 (talk) 04:41, 3 January 2024 (UTC)
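The nested {{replace}} workaround amounts to declining a polysyllabic dummy stem (kastha) and then stripping the extra "ka" from both the Devanagari and the transliteration. A minimal Python sketch of that idea follows; `decline_dummy` is a hypothetical stand-in that produces three illustrative masculine singular forms, and is not the actual output or logic of {{sa-decl-adj-mfn}}.

```python
def decline_dummy(stem_deva: str, stem_translit: str):
    """Hypothetical stand-in for a declension template.

    Takes an a-stem in Devanagari and its transliteration without the final
    -a, and returns a few (Devanagari, transliteration) forms.
    """
    # Illustrative endings: nominative, accusative, instrumental singular.
    endings = [("ः", "aḥ"), ("म्", "am"), ("ेन", "ena")]
    return [(stem_deva + d, stem_translit + t) for d, t in endings]


def render(forms) -> str:
    """Flatten the forms into one string, as a template expansion would be."""
    return "\n".join(f"{deva} ({translit})" for deva, translit in forms)


# Decline the dummy stem, then apply the two replacements in sequence,
# mirroring the inner and outer {{replace}} calls.
raw = render(decline_dummy("कस्थ", "kasth"))
stripped = raw.replace("कस", "स").replace("kas", "s")

print(raw)
print(stripped)
```

Because the replacements run over the whole expanded text, the approach only works while the dummy prefix cannot occur anywhere else in the output, which is one reason it reads as a short-term fix.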

Marking the replacement of a borrowing source's pronunciation with a spelling pronunciation

Many borrowings attempt to phonologically adapt the pronunciation of the source word into the phonology of the borrowing language. But sometimes the word's spelling is borrowed, but the pronunciation is completely discarded and replaced with a pronunciation inferred by the spelling. This occurred in bauxite, where the French etymon's pronunciation /bo/ was not borrowed, and instead a morpheme pronunciation /bɔːks/ was created out of whole cloth based off the spelling (there is no purely phonological way the /ks/ could have arisen from the French etymon).

How do we note the occurrence of a spelling pronunciation override in an etymology? I previously used {{obor}} for this purpose whenever it occurred in a borrowing, but are there more appropriate ways to do this? -sche complained when I used this template whose vast majority of uses were on Japanese-to-Chinese borrowings where kun'yomi can get replaced by Chinese readings of borrowed characters. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 03:23, 31 December 2023 (UTC)

My understanding is that a language using a spelling pronunciation is handled via T:spelling pronunciation as on e.g. Vanille#German or just spelling out what's going on as on e.g. Arctic (and a language using an arbitrary non-spelling pronunciation is handled the latter way, like on lingerie). My understanding is that "orthographic borrowing" is only for cases — chiefly in non-alphabetic Asian languages, but with a few edge cases among alphabetic languages like CCCP or Xmas — where the grapheme itself is the only thing that was borrowed , not for the normal case of changes to pronunciation happening. - -sche (discuss) 04:36, 1 January 2024 (UTC)

Alternative forms

Should alternative forms be included in categories? ... or do we only include the 'primary' entry? John Cross (talk) 18:12, 31 December 2023 (UTC)

@John Cross: I only include alternative forms when they're so different that it's not obvious that they're alternative forms. Having two forms that are the same except for hyphenation or other minor spelling differences just adds clutter. After all, categories are navigational aids, and the links in the main entry are more useful for finding alternative forms than those in the category. I should mention that I'm referring to topic, set and similar categories. The language, grammatical and morphological categories populated by templates are another matter. Chuck Entz (talk) 20:10, 31 December 2023 (UTC)

thank you. John Cross (talk) 20:17, 31 December 2023 (UTC)
