Wiktionary:Grease pit/2024/January


Protecting Empty Categories

How does one protect empty categories against unthinking deletion by @Benwing2? I gather from a post by @Theknightwho that there is some magic to do this, but from the recent mishap to Category:Pali terms belonging to the root sam, it appears that the magic word __EXPECTUNUSEDCATEGORY__ doesn't do this.

The problem is that, although I split the root by etymology, there is nothing to warn users against deriving words from the unsplit root. I therefore added warning text to the category for the unsplit root to increase the chances of inadequate categorisation being noticed and thereby, one would hope, corrected. To do that, I added the magic word, thinking it would protect the category from speedy deletion.

I have now made the categories for the split roots subcategories of the category for the unsplit root, and in this particular case it may protect them. However, there are some maintenance categories that could benefit from some text on the category page, but such documentation work, in keeping with the request at Wiktionary:Beer_parlour/2023/December#Categories_need_documentation_of_how_populated,_why,_where,_etc., would seemingly be lost while the category was empty.

With regard to categorisation, am I perhaps working in an unexpected way? Should root categorisation add to both split and unsplit categories when there is a split by senseid or etymid? If so, this would require a revision to Module:root. It would make it even harder to detect that a senseid or etymid was missing. --RichardW57 (talk) 13:54, 1 January 2024 (UTC)

That seems like an inadvisable way to warn users. We shouldn’t have empty categories just to say “don’t put things here”. Theknightwho (talk) 15:41, 1 January 2024 (UTC)
@Theknightwho: What do you suggest? Maintaining a blacklist for Module:root? --RichardW57 (talk) 20:06, 1 January 2024 (UTC)
@RichardW57 In the case of roots, your (new, I assume) solution of using subcategories is preferable, since it’s easy for users to find and they won’t get deleted since they aren’t empty, because subcategories count as members too. It would be better to set up the category tree to do this automatically through autocat, instead of manually as you’ve done it. Theknightwho (talk) 20:13, 1 January 2024 (UTC)
@RichardW57 I only delete categories in CAT:Empty categories. Categories are placed there by the category tree system if they're defined using {{auto cat}}, are empty and don't have the |can_be_empty= setting in their definition. The fact that they're defined using {{auto cat}} means they can be re-created the same way, and in fact will be the next time I run my create_wanted_categories.py script (which I run every 3 days, since this is the frequency that Special:WantedCategories is refreshed at). I can fix my deletion script to pay attention to manually-added __EXPECTUNUSEDCATEGORY__ keywords added to the category text, but in general this isn't the right way to do things. It's better to use the category-tree machinery as it was designed to be used, so that just {{auto cat}} is enough on the category page. Benwing2 (talk) 22:12, 1 January 2024 (UTC)
@Benwing2: Whereas __EXPECTUNUSEDCATEGORY__ works with Special:UnusedCategories! Thanks for answering the question. Who's doing the cleanup for unautocatted categories listed by Special:UnusedCategories?
I contemplated adding 'Category: Pali pi-alt templates' to the autocat systems a few days ago, and decided it probably didn't generalise well across languages. (The Sanskrit system is based on the Pali system.) I then discovered I'd already created cat: Pali alternative form templates, which I think could be incorporated into the autocat system, and had the new category deleted.
For homographic roots, would it make sense to put the black-listing in the {{auto cat}} definitions rather than Module:root? --RichardW57 (talk) 22:59, 1 January 2024 (UTC)
@Theknightwho: However, the text that a category should be devoid of entries still has to be added - or is there some sane way for the category page creation for "LANG terms belonging to the root XXX" to detect the existence of a category "LANG terms belonging to the root XXX (YYY)" for some unspecified YYY? --RichardW57 (talk) 22:19, 1 January 2024 (UTC)
@RichardW57 Sometimes I go through Special:UnusedCategories to see if any of them can be deleted or otherwise fixed up, although not on a regular basis. As for Category:Pali terms belonging to the root sam and subcategories like Category:Pali terms belonging to the root sam (toil), I'll fix things up so the latter automatically gets the former as a parent category, so you don't have to manually add them. This already happens with affix categories like Category:English terms suffixed with -en and subcategories Category:English terms suffixed with -en (inchoative), Category:English terms suffixed with -en (made of), etc. As for blacklisting, I'm not sure what you are referring to exactly, can you clarify? Finally, keep in mind that you can add language-specific labels and handlers to the category system, so there's no real need to worry about whether it generalizes across languages. Benwing2 (talk) 02:12, 2 January 2024 (UTC)
Fixed. BTW I think by "blacklist" you mean you want to display a message in categories like Category:Pali terms belonging to the root sam indicating that terms should go in a subcategory. It may be possible to automate this, as I think it's possible to query a category in Lua to see whether it has subcategories. Benwing2 (talk) 02:27, 2 January 2024 (UTC)
Blacklisting functionality would definitely be appreciated for cat:Latin terms suffixed with -atus and similar categories. This, that and the other (talk) 05:58, 2 January 2024 (UTC)
@Benwing2: There are various things one could do with a blacklist, depending on how it was implemented. For example, if the implementation were done in Module:root, options include raising a module error (possibly too extreme) and writing a message during preview via mw.addWarning(), which is reasonably done by the template {{sa-alt}}. --RichardW57m (talk) 11:34, 2 January 2024 (UTC)

How do I correctly sort Oͤ as Ö in Swedish Categories?

In category lists, how do I alphabetically sort Swedish words starting with Oͤ under Ö? Currently, all words starting with Oͤ are wrongly sorted under O.

An example of this can be found in Category:Swedish terms spelled with Oͤ, where oͤfwer (archaic spelling of över (over)) is sorted under O, despite Oͤ/Ö and O being two completely different letters.

I understand that there is a more involved way to fix this centrally, but I'm in no way technically savvy enough to do that. Therefore I just wonder if there's a parameter, in e.g. autocat, that I can use? --Christoffre (talk) 11:53, 2 January 2024 (UTC)

@Christoffre I've just set "aͤ" to sort as "ä" and "oͤ" to sort as "ö" in Swedish and Finnish, so no need for you to do anything manually. Pinging @Surjection, Hekaheka as the two Finnish editors I can think of off the top of my head. Theknightwho (talk) 13:19, 2 January 2024 (UTC)
Thank you so much for the help, especially by fixing it centrally. --Christoffre (talk) 14:22, 2 January 2024 (UTC)

Can't create a page for upcoming unicode character U+A7CB LATIN CAPITAL LETTER RAMS HORN

Whenever I try to publish the page, a warning appears which prevents me from doing so, saying that "there are disallowed unicode characters in the title". This really confuses me, because there are already pages for reserved characters, such as the one for U+A7CF LATIN SMALL LETTER PHARYNGEAL VOICED FRICATIVE. Friendly Fire150 (talk) 13:40, 2 January 2024 (UTC)

@Friendly Fire150 Two things here:
  1. Yes, you can't create a page for an upcoming character because it's not a character yet. The filter will be changed when Unicode is updated.
  2. The reason we have pages for (not yet encoded) characters like and (and several others) is because @Kwamikagami decided - apparently because 8 disallow warnings weren't enough - to explicitly bypass the filter by moving pages which already existed to those titles. This was a loophole, which has now been closed.
Please just be patient. You can make the pages when Unicode 16.0 is live.
On a related note, I'm tagging administrators @Chuck Entz @Surjection, @Benwing2 @This, that and the other @Erutuon since this post reminded me about this, because it's another case of Kwamikagami thinking the rules don't apply to them when it comes to single-character entries. Theknightwho (talk) 13:54, 2 January 2024 (UTC)
"8 disallow warnings weren't enough." What disallow warnings?
I had created several pages for unsupported characters, with work-around names "Unsupported titles/...", then recently when Unicode characters were assigned, I moved the pages to those characters. Recently I created a couple more with a move this way (yes, that was a work-around for page creation that wasn't working), but also e.g. Unsupported titles/Linearized tilde when there is no Unicode character. I don't recall any warnings about them. If you think they shouldn't have been moved, you're free to move them back, or to ask me to move them back. I see you did that with one, then changed your mind. kwami (talk) 14:10, 2 January 2024 (UTC)
@Kwamikagami "then recently when Unicode characters were assigned..." but those characters don't yet have assigned Unicode codepoints. They have draft codepoint assignments which may well change in the future; to quote the Unicode character-pipeline page (archived in its current form here for the benefit of anyone reading this discussion in the future), "Until the start of the official beta review, the code points and character names for draft candidates are not immutable—the UTC will consider feedback regarding proposed character names or other issues. Draft candidates may also be reconsidered by the UTC. They may be removed from draft candidate status and be postponed for consideration for a future version, instead." and "Current Status Note: The following repertoire is currently in preparation for alpha review. Code points and character names are not yet frozen, and feedback can be provided via the contact form." There's still a chance (admittedly slim) that some or all of these proposed characters will be dropped from consideration entirely or moved to different codepoints, and we won't have a guarantee that these characters will certainly be assigned (at their proposed codepoints or at all) until Unicode 16.0 enters beta review. And, per the Unicode website (archived here), it won't enter beta review until this May. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 22:07, 2 February 2024 (UTC)
That's true. It is, however, unlikely the codes will be changed. Probably fewer than 1% ever are, and if they are it's easy enough for us to move our page. Though, as said, I haven't created any more pages like this. kwami (talk) 05:53, 3 February 2024 (UTC)
I suspect that's because I amended the filter to prevent your page-move workaround. Theknightwho (talk) 06:31, 3 February 2024 (UTC)
No, it's because you objected and I said I wouldn't do it any more. I was never prevented from creating such a page.
You might try assuming good faith. kwami (talk) 06:35, 3 February 2024 (UTC)
@Kwamikagami You triggered the filter which disallows the creation of titles which contain unassigned codepoints 8 times between 29th December and 1st January (according to the abuse filter logs), and each time it will have shown you this warning about why creation was disallowed. It tells you why, and it tells you where to raise the issue if you think something should be done about it.
I moved one then moved it back because it started throwing errors as soon as I moved it, and I realised it would require me to put in a load of time to work out what needs to go where if I wasn't just going to mass revert your changes. Theknightwho (talk) 14:19, 2 January 2024 (UTC)
Several of those were triggered for pages that I didn't create because I got an error. And it didn't happen for either of the pages you linked to, which I created months ago and which did not generate an error then or when I moved them. I did see it for . Not trusting my memory, it looks like that's the only article I created by working around the warning.
Sometimes I'll try to save an edit and it will fail, with a warning that it was blocked for being a harmful edit. I'll then delete something (like an extra line that's been in the page for months to years) and the edit will go through. The apparent lesson is that when an edit fails, you jiggle things to work around the problem, and also that the reasons for such failures are often trivial (like a blank line).
In other words, I created one article by working around that warning, a warning that for all I knew was as trivial as an edit being blocked because of an old extra line on the page. If that's not something I should do, then I won't do it any more. kwami (talk) 14:36, 2 January 2024 (UTC)
I think we should have a preparatory project page where people could link their character entries for upcoming Unicode versions, which can then be moved when the time is ripe. Otherwise, when the time is ripe, someone else might make a new page without seeing the previous work and thus duplicate it; on the other hand, the current situation discourages people who have the time or motivation to make the entries earlier but not exactly when the time is ripe. Fay Freak (talk) 05:19, 3 January 2024 (UTC)
I don't think that's necessary at all. We have the Preview feature for "is this really what I mean" in the short term; and for the long term, people are using computers, so they have tools like Notepad, Microsoft Word, or vim or emacs. Equinox 05:24, 3 January 2024 (UTC)
Necessary it is not. But this is not the impulse wherewith people work, as editors. The process of navigating the creation of an entry can be streamlined to increase our close rates. It also benefits loyalty to a product provider if customers can visualize a sense of belonging rather than being redirected to a solitary experience. Fay Freak (talk) 07:50, 3 January 2024 (UTC)
Agree with Fay Freak here. We're a collaborative project. Let's let people collaborate. They can do it in the Appendix, or one chosen user's userspace. This, that and the other (talk) 06:20, 4 January 2024 (UTC)

I'd like to correct the entries listed in this category but in many cases I can't figure out what the problem is. For example, ablaküveg, Abaújszántó. Currently, I'm going through Category:Hungarian links with redundant target parameters and while the corrections there seem successful, the Category:Hungarian links with redundant wikilinks keeps growing, as if my corrections in the other category would create a problem in this one. Panda10 (talk) 21:34, 3 January 2024 (UTC)

The category inclusion is caused by the internal coding of {{hu-pos-etek}}, but I am not convinced it is an error that needs fixing. Questions (not for Panda10 but for others): Why do these "redundant wikilinks" categories exist? What problem is created by writing {{l|en|]}}? This, that and the other (talk) 06:23, 4 January 2024 (UTC)
@This, that and the other It’s less efficient, since the link module has to do a full parse of the input, which starts to become a real issue if applied to many links across the page (e.g. if it is hard-coded into templates with many links). It also serves absolutely no purpose, and creates annoying clutter in the Wikitext. Theknightwho (talk) 08:25, 4 January 2024 (UTC)

{{ja-adj}} error

{{ja-adj}} with |infl=na is incorrectly adding na to the romaji link. See e.g. しなやか which is producing "adnominal しなやかな (shinayaka na na)". Ultimateria (talk) 17:48, 5 January 2024 (UTC)

@Ultimateria Thanks - that has to be something to do with linking. I'll have a look. Theknightwho (talk) 18:26, 5 January 2024 (UTC)

Label ‘Middle Cherokee’ generating category Category:Middle Egyptian

At the page for Category:Middle Egyptian, the auto-generated description reports as follows: ‘The following labels generate this category: Middle Cherokee (aliases Eastern Cherokee, Kituhwa, Kituwah); ’. Anyone know why this is happening? — Vorziblix (talk · contribs) 11:32, 7 January 2024 (UTC)

@Vorziblix I fixed this by moving the Cherokee labels into the Cherokee-specific label module instead of having them in the general module. Benwing2 (talk) 20:36, 7 January 2024 (UTC)
@Benwing2: Excellent, thanks! — Vorziblix (talk · contribs) 00:02, 8 January 2024 (UTC)

Towards a better solution for sorting

Hi everyone - so I've been investigating ways to improve how we do sorting, because at the moment it's a massive fudge that causes headaches for various different reasons. The standard MediaWiki behaviour is to sort characters in codepoint order, which is pretty arbitrary outside of basic ASCII. Our ways of getting around this are generally:

  1. Splitting digraphs (Æ to AE, Œ to OE etc).
  2. Stripping common diacritics (e.g. acute, grave, umlaut etc).
  3. Reordering certain letters in the alphabet by sorting them as the letter before plus a really high codepoint character, which means they always get sorted after that letter. For example, Russian Ё is supposed to be sorted between Е and Ж, but by default it isn't. To get around this, we sort it as though it were Е󏀀 (Е + U+F000), which causes it to always be placed after it.
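
The three workarounds above can be sketched roughly as follows. This is only an illustration in Python, not the actual Lua used by the language data modules: the function name `make_sortkey`, the tables, and the tiny set of handled characters are all assumptions made for the example, and the high codepoint is the U+F000 mentioned in the post.

```python
import unicodedata

# Hypothetical tables for illustration only; the real data covers far more.
DIGRAPHS = {"Æ": "AE", "æ": "ae", "Œ": "OE", "œ": "oe"}
STRIPPED_MARKS = {"\u0301", "\u0300", "\u0308"}  # acute, grave, diaeresis/umlaut
HIGH = "\uF000"  # the "really high codepoint" placeholder from the post
RUSSIAN_REORDER = {"Ё": "Е" + HIGH, "ё": "е" + HIGH}


def make_sortkey(term, reorder=None):
    """Build a codepoint-ordered sortkey using the three workarounds."""
    reorder = reorder or {}
    # Compose first so that letters such as Ё are seen as single characters.
    term = unicodedata.normalize("NFC", term)
    out = []
    for ch in term:
        if ch in reorder:
            # 3. Reorder: base letter plus a high codepoint sorts just after the base.
            out.append(reorder[ch])
        elif ch in DIGRAPHS:
            # 1. Split digraphs.
            out.append(DIGRAPHS[ch])
        else:
            # 2. Strip common diacritics after decomposing the character.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if c not in STRIPPED_MARKS))
    return "".join(out)


words = ["Жук", "Ёж", "Ель"]
print(sorted(words, key=lambda t: make_sortkey(t, RUSSIAN_REORDER)))
```

Note that the reorder table has to be consulted before diacritics are stripped; otherwise Ё would simply collapse into Е, which is the tie-breaking problem described in the drawbacks.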

There are several drawbacks:

  1. We run into problems as soon as we start encountering rarer letters/diacritics. There are thousands of English entries using nonstandard orthography, and tons of different diacritics. Most of these aren't accounted for, which means those entries essentially get sorted arbitrarily, unless we go out of our way to do it manually. Most of the time we don't.
  2. Simply removing diacritics altogether isn't great when there are several terms which only differ in that respect, because they get sorted randomly. Quite often, languages will use those diacritics/apostrophes/hyphens/whatever as tie-breakers when all else is equal: e.g. Mandarin and Yoruba have specific orders for tonal accents, and even in English we would usually sort (e.g.) make do before make-do.
  3. The hacky way of changing the order of letters causes problems with categories, because it means that terms starting with those letters aren't placed under their own header. That's not such a problem with Russian Ё being placed under Е, but it's more of a problem when Swedish Å gets sorted under Z, or Azerbaijani Q under K.

These issues are possible to solve, but this may require a bit of work to implement:

  1. I'd be really keen to have some way of implementing the Unicode Collation Algorithm, which is essentially the standardised way to sort characters. In brief, it sorts equivalent letters as the same thing (e.g. A, a, á, à, â), but they all have secondary and tertiary sorting weights which get used as tie-breakers. Not only would this provide us with a consistent baseline for sorting terms across all languages, but it would make a lot of the language-specific sortkeys unnecessary. Obviously, we still need to have a way to account for the fact that some languages sort the same characters in different ways, but the UCA is specifically designed so that you can modify it as necessary for each language.
  2. Implementing this would need to be done quite carefully: ultimately, any implementation will need to generate a sortkey which is based on codepoints, because that's the format the MediaWiki software itself sorts by. However, since UCA sortkeys have nothing to do with codepoints, it would be safer to use private-use characters for the sorting output to avoid serious confusion. The UCA uses sortkey weights from 0000 to FFFF, so we can simply use the characters U+100000-U+10FFFF. This is exactly the kind of thing that private use characters are there for, after all.
  3. I've also come up with a solution to the category header problem, which is a script which converts these private use characters into the correct header in the category. It would be straightforward for us to set this up as a gadget. I've also written a module which reads the raw UCA data published by Unicode and outputs a JS object which can be copied and pasted into the script, so it would be trivial to update with each Unicode release.
  4. This can be extended to account for language-specific adjustments, and (unlike the current system) can account for any and all changes we'd want to make. This would be particularly helpful for languages which use a completely different sorting order to Unicode, such as our Latin entries in Egyptian (ꜣ, j, y, ꜥ, w, b, p, f, m, n, r, h, etc).
  5. We would also need a way of handling manual sortkeys, which take two forms:
    1. Raw categories with sortkeys (e.g. ]). These would need to be removed, because they would be totally incompatible with the new system. However, there aren't that many of them (see Category:Pages with raw sortkeys), and they can easily be converted to category templates by bot. Note that this would not affect raw categories without a sortkey specified (e.g. ]), since they use the page's default sortkey, which would be compatible with the new system.
    2. Templates using the sort= parameter. These might need more careful handling, but outside of Japanese, Korean and Vietnamese I don't believe they've been used systematically to any large degree. It might be necessary to implement special handling for these three languages in the short-term, however, as a manual review of 40,000+ entries is impractical.
  6. There is also the possibility of changing the MediaWiki software to use the UCA. However, this wouldn't actually be all that helpful, because it lacks any way for us to tailor things on a language-specific basis. It would also make language-specific adjustments trickier, because you'd have to account for the fact that any adjustments will also be put through the UCA, which could get extremely complex; I suspect it would ultimately increase the amount of work we would have to do.
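
To make the idea in points 1 and 2 concrete, here is a minimal Python sketch of a UCA-style sortkey encoded in private-use characters. It is a toy under stated assumptions, not the proposed module: the weights in `TOY_WEIGHTS` are invented for the example and are not real DUCET values, the names `weight_to_char` and `uca_style_sortkey` are hypothetical, and real UCA features such as contractions, expansions and variable weighting are left out. Only the mapping of a weight w in 0000 to FFFF onto U+100000 + w comes from the post.

```python
PUA_BASE = 0x100000  # weights 0x0000..0xFFFF map to U+100000..U+10FFFF

# Invented (primary, secondary, tertiary) weights; NOT real DUCET data.
TOY_WEIGHTS = {
    "a": (0x0100, 0x0020, 0x0002),
    "A": (0x0100, 0x0020, 0x0008),
    "á": (0x0100, 0x0024, 0x0002),
    "b": (0x0101, 0x0020, 0x0002),
}


def weight_to_char(weight):
    """Encode one 16-bit collation weight as a private-use character."""
    if not 0 <= weight <= 0xFFFF:
        raise ValueError("weight out of range: %r" % weight)
    return chr(PUA_BASE + weight)


def uca_style_sortkey(term, table=TOY_WEIGHTS):
    """Concatenate all primary weights, then secondary, then tertiary.

    Levels are separated by the encoding of weight 0, which sorts below
    every real weight, so a lower level only matters as a tie-breaker.
    """
    elements = [table[ch] for ch in term]  # raises KeyError for unknown characters
    levels = []
    for level in range(3):
        levels.append("".join(
            weight_to_char(e[level]) for e in elements if e[level] != 0
        ))
    return weight_to_char(0).join(levels)


print(sorted(["b", "á", "A", "a"], key=uca_style_sortkey))
```

Because the resulting key is just a string of codepoints, plain codepoint comparison (which is what the category sorting does) gives the multi-level order. A per-language tailoring would amount to swapping in a modified weight table.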

Pinging @Benwing2, @Erutuon, @Surjection, @This, that and the other, @Lingo Bingo Dingo, @Thadh, @Vininn126, @Qwertygiy, @Wpi as people who are experienced with Lua and/or have expressed interest in this issue in the past. Theknightwho (talk) 14:53, 7 January 2024 (UTC)

I hope these questions aren't too naive, but they seem to need some best-guess answers to make decisions:
What is the current state of play for sorting? Ie, which languages/scripts have more-or-less satisfactory sorting?
How long might the comprehensive solutions (UCA, at Mediawiki or local) take to implement?
In five or ten years who will be maintaining Wiktionary's own sorting if we don't rely on Mediawiki?
Have any other language Wiktionaries addressed this for multiple languages?
DCDuring (talk) 17:01, 7 January 2024 (UTC)
@DCDuring
  1. We have sorting for most languages with 2-letter codes, and quite a few with 3-letter codes.
  2. It's generally okay, but you'll always run into problems with the edge cases, which usually crop up when there are orthographic borrowings (e.g. diacritics in English, Latin text in Russian etc etc). Some are quite sophisticated in handling tie-breaks etc; most are not. This becomes more obvious in column templates (as opposed to categories), because the diacritics that are stripped from page names are often still shown in column links, and it would be useful to sort those properly (since for some languages it genuinely matters).
    A secondary point to add to this is that the UCA is designed to be modified on a per-language basis, and there are quite a lot of "tailorings" (their term) out there which we could use with little modification. That saves us time and effort.
  3. Writing the UCA module won't take too long - I already wrote an implementation a few months back, but I'd prefer to start from scratch since my coding has improved a lot since then. What might take a little longer is laying the groundwork, as we need to clean up a number of entries that have raw sortkeys, etc.
  4. I'm making sure to design it in a way that's as easy to maintain as possible: the UCA DUCET (the data file with all the weightings) is published and updated by Unicode, as new versions are released. I've copied this to Module:User:Theknightwho/UCA/DUCET (minus the bulky comments), and have created a script which automatically generates the Javascript table which enables us to automatically show the correct headings in categories. I will make sure to do the same for any other data modules we might need. The upshot is that it should only take a few minutes to update.
  5. No idea.
Theknightwho (talk) 17:58, 7 January 2024 (UTC)
I was a bit worried about the groundwork, whether we go homemade or MW. Are we talking months, years, or "no idea"? I don't really know which wikts have good technical capability, but I occasionally hear about things at de.wikt that seem good. DCDuring (talk) 18:07, 7 January 2024 (UTC)
@DCDuring The raw categories could probably be done in an afternoon by a bot. Japanese, Korean and (to a lesser extent) Vietnamese are a bit trickier, since between them they make up about 95% of uses of sort=. If needs be, I think we could hold those three back (and in any event, I suspect the main changeover would probably take a month or so, as all the languages with sortkeys will need to have them converted to the new system).
The problem with manual uses of sort= is that if you enter "ABC", it currently means "sort using the literal codepoints ABC", and not "sort as though the term was ABC": this means if someone puts a term with diacritics/hyphens/apostrophes into sort=, none of them are removed. The flipside is that people tend to enter the term without diacritics/hyphens/apostrophes as a way to change the sorting order, which means that if we simply applied the UCA to what they'd entered then it would be wrong, since the UCA expects the input to be the original term. Worst-case scenario is they'd all need to be manually reviewed, but I suspect it's possible to partially automate since the Japanese editors seem to have been pretty systematic about it.
On a related note, I'd prefer if sort= were changed to mean "sort as though the term was ABC", because the current system just leads to a bunch of ad hoc attempts to change the sorting order, which are mostly incompatible with each other (with the notable exception of the Japanese editors). I'll raise that separately another time, though. Theknightwho (talk) 18:29, 7 January 2024 (UTC)
Thank you for pinging me.
"a script which converts these private use characters into the correct header in the category" - Will it be possible to have this enabled by default for our readers (so, people that are not logged in)?
Also, what are the downsides of this implementation? Would we need to manually define sorting for all languages? (this isn't a downside per se, but I'm just curious) Thadh (talk) 17:04, 7 January 2024 (UTC)
@Thadh That's my main concern: if we used it, it would effectively mean Javascript would be mandatory if you wanted to have usable category headings; otherwise all you'll see is 􀀀 where the heading should be. I guess it wouldn't be the end of the world, but I'd like to be sure it's not going to impact the vast majority of people, since it's ultimately a bit of a hack. Theknightwho (talk) 18:02, 7 January 2024 (UTC)
To answer your question about manually defining sorting: we'd need to manually define any adjustments to the UCA for a particular language, but that's something we already have to do anyway (e.g. Q will need to be given special behaviour for Azerbaijani, since it comes after K). In many cases, it will render the adjustments we're currently making unnecessary, so it should cut down on the number/size of sortkeys in the language data modules. Theknightwho (talk) 18:04, 7 January 2024 (UTC)
@Theknightwho: I hope this comment isn't too late. It sounds as though you should be starting with the CLDR collation algorithm, which is similar to the UCA, but uses a compressed sort code compared to DUCET, and has tailoring built into ICU, which is open source and liberally licensed. I'm a bit worried about the load time, including tailoring. Also, though this is a difficult case, I defined a tailoring that gets a native CLDR-friendly(!) Lao sort order, where it can be deduced from the spelling, which unfortunately needed around a million weightings, and was way beyond ICU's capacities. Lao sorting should really work syllable by syllable, and, unlike official Thai, the discontiguous multigraphs have to be sorted as units. The UCA can't handle all this. The CLDR algorithm can, just, with the look-back key definitions. There are a lot of optimisations in ICU that wouldn't help us, but just waste CPU cycles and quite possibly memory when the task is to generate a single sort key. (It would have been easier to load the collation element table directly.)
The friendliness of that Lao ordering is that in a CVC syllable, the vowel outweighs the final consonant. In an unfriendly order, the final consonant outweighs the vowel. Look-back is needed because the keys for CVCV cannot be built by processing CVC and then V - one has to recognise that we are dealing with two CV syllables. There may be similar fun with Chinese characters and Hangul together in Korean - I've not delved into that.
Unicode used to publish what looked like a tailoring to explain the ordering of DUCET. This got caveated and maybe later withdrawn after I pointed out that it was wrong - I think the tailoring definitions do a bad job of matching hand-crafted weightings if they don't have recourse to non-characters. --RichardW57m (talk) 13:25, 26 January 2024 (UTC)

I don't know how widely this problem stretches, but this template is broken. See, for example, 困#Hanja, which reads, ]. TE(æ)A,ea. (talk) 02:13, 8 January 2024 (UTC)

@TE(æ)A,ea. Fixed. It's because {{lang}} was putting out a category, and you can't put categories in links. Theknightwho (talk) 02:22, 8 January 2024 (UTC)

{{l-self}} and {{m-self}} are now redundant

So these two templates are intended for situations where a link template might end up linking to itself, usually because they're used in inflection tables. However, I've worked out a way to automate this by creating a function which works out what language section the template is being called from, which renders them unnecessary. If a term links to its own language section (and there's no id= parameter), then it becomes a self-link automatically. If it links to a different language section (or has an id= parameter), then it renders as a clickable link. The exception to this is when the langcode is und, which always renders as a self-link since we don't have a language section for Undetermined. Theknightwho (talk) 02:30, 8 January 2024 (UTC)
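
The rule described above can be written out as a small decision function. This is a hedged Python sketch, not the actual link module: the function name `renders_as_self_link` and its parameters are hypothetical, it assumes the rule only applies when the link targets the page it is on, and it assumes that "und" takes precedence over an id= parameter, which the post does not state explicitly.

```python
def renders_as_self_link(target_page, current_page, link_lang, section_lang,
                         id_param=None):
    """Return True if the link should display as a (non-clickable) self-link."""
    if target_page != current_page:
        # Links to other pages are always ordinary links (assumed precondition).
        return False
    if link_lang == "und":
        # Undetermined always renders as a self-link
        # (assumed here to win over id=).
        return True
    if id_param:
        # An id= parameter targets a specific sense, so keep it clickable.
        return False
    # Self-link only when the link points at the language section it sits in.
    return link_lang == section_lang


# A Pali form linking to itself from within the Pali section of the same page:
print(renders_as_self_link("nāka", "nāka", "pi", "pi"))
# The same link called from a Sanskrit section of that page:
print(renders_as_self_link("nāka", "nāka", "pi", "sa"))
```

In this sketch `section_lang` stands for the language of the L2 section the template is called from, which is the piece of information the new function works out automatically.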

Re your last sentence, we do have a few Undetermined language sections, like 𐇑 and ΖΩΑΠΑΝ and ΖΟΑΠΑΝ. (Or do you mean something else?) But they don't have inflection tables or anything I can think of where it might be hard to anticipate whether a link was going to go to the entry it was on unless we decide to make a w:Template:Runes-style template to crosslink all the Phaistos symbols, so as long as {{l}}s still work it should be fine. - -sche (discuss) 03:31, 8 January 2024 (UTC)
@Theknightwho Does this add overhead? Benwing2 (talk) 07:28, 8 January 2024 (UTC)
@Benwing2 Nothing I’ve been able to measure. Theknightwho (talk) 07:32, 8 January 2024 (UTC)
@-sche Thanks - I’ll have a think about the best way to solve this, because I’d like to be able to do away with the self-templates if at all possible. It’s a really marginal issue, so hopefully shouldn’t be too difficult to solve. Theknightwho (talk) 07:34, 8 January 2024 (UTC)
@Theknightwho: There's a bit of a downside. If one's editing a section at the L3 level or below, the self-links will display as ordinary links when previewing. That's confusing but, I think, tolerable. --RichardW57 (talk) 23:03, 8 January 2024 (UTC)
@RichardW57 Hmm - I did check that as I wondered if that would happen, and it didn't in my tests. Could you give an example? There might be a way to solve this. Theknightwho (talk) 23:15, 8 January 2024 (UTC)
@Theknightwho: Edit and preview the section nāka#Noun. The vocative singular at the bottom left of the declension table will show up as blue, whereas it will have been black before and after editing. --RichardW57 (talk) 23:27, 8 January 2024 (UTC)
@RichardW57 Thanks - that is solvable. It's because when you edit a section below L2 it gets treated like a page that doesn't have an L2 in the preview, which currently isn't accounted for. Note how the issue doesn't occur if you edit the L2 Pali section.
I'll set the logic so that if there's no L2 then it gets treated as though it's in its own language section (i.e. so it shows a self-link). This behaviour should only ever happen in previews since L2 headers are mandatory, so I can't see that this would cause any problems. I suppose it could happen outside of mainspace if a page has no L2 headers on it, but I don't really see how that could ever be a problem (and I expect the vast majority of non-mainspace pages that aren't user pages do have L2 headers, in any event). Theknightwho (talk) 23:43, 8 January 2024 (UTC)
Derp - I'm forgetting this would cause problems for other languages linking to the same page. I guess it's just something we'll have to tolerate, since it only affects previews. Theknightwho (talk) 23:46, 8 January 2024 (UTC)
Am I understanding correctly that there is no way now to link to the same section from within that section without specifying a senseid? That seems rather undesirable, as that is sometimes useful in etymology sections (saying "Etymology 1" doesn't always work when dealing with highly inflected languages and various non-lemma stems). Thadh (talk) 10:32, 19 January 2024 (UTC)

@Benwing2, Theknightwho: Is there any easy way to convert a link embedded in a larger string to a 'self-link'? I suspect not. The issue comes up in Sanskrit alternative spellings when Devanagari doesn't readily show the difference. e.g. Sanskrit බුද්‍ධ (buddha) v. බුද‍්ධ (buddha)), and there may be a more invasive but better solution than global substitutions on the generated wikicode. --RichardW57m (talk) 10:13, 8 January 2024 (UTC)

@RichardW57m I'm not fully sure what you mean - it will already detect self-links when they're part of an embedded link (e.g. {{l|en|] ]}} would show "term1" as a self-link if it's in the English section of term1's page). Theknightwho (talk) 03:42, 9 January 2024 (UTC)
@Theknightwho: The problem comes when the string in question is {{replace|{{l|en|term3}}|term3|term1}} in term1's page. This is a simplification to the problem of showing the declension of බුද්‍ධ (buddha) (with a ligatured conjunct). When I request the declension using {{sa-decl-noun-m}}, I instead get the declension of බුද‍්ධ (buddha) (with touching letters), in which there is no self-link. (The correct rendering of the consonant cluster is quite different for the two spellings - Noto Sans Sinhala shows the desired forms.) I solve the problem by making a global edit to the declension table, but I then get a blue self-link for the vocative singular. I tried overriding the form of the vocative singular, but when I specified a ligatured conjunct, this gets overridden. Expressed differently, I want to defer the detection of self-linking.
Having now tried delving into the mechanism being used, it seems that this technique for eliminating {{l-self}} doesn't help with this problem, and a more fundamental solution addressing the tricks of Sanskrit inflection is required. One possibility is to move global substitution into the inflection modules with something like the |subst= of {{quote}}.--RichardW57m (talk) 10:46, 9 January 2024 (UTC)
@RichardW57 I don't think that's possible, because templates can't know what other templates they're being nested inside. Theknightwho (talk) 10:49, 9 January 2024 (UTC)
@Theknightwho: What I want is sufficiently restricted that it is possible in the circumstances in which I want it; however, the obvious mechanisms look extremely messy and fragile. I think your method doesn't actually help with this problem, and I have to go for the intrusive methods of fixing an inflexible design. --RichardW57m (talk) 17:51, 9 January 2024 (UTC)
Does this have any way of identifying the section if an identical link template is used in its own language's section and another language's section? Though I can't think of a situation where that happens, so it's probably rare. I was thinking etymology sections, but those would more often use etymology linking templates, or usage note sections, but those would rarely link to other languages' words. I haven't searched pages to see if I could find any examples yet. — Eru·tuon 08:19, 19 January 2024 (UTC)

In the headword of "don't fuck in the factory" (a phrase that uses English words but is apparently only attested in German), the words "don't", "fuck", and "factory" are orange because I have Orange Links on and there's no German entry for those words. "in" shows up blue, because it is a German word... but "the" also shows up blue even though there's no German entry the#German. This is the case whether the headword line is {{de-proverb}} or {{head|de|proverb}} . Why? - -sche (discuss) 03:16, 8 January 2024 (UTC)

@-sche the OrangeLinks code (line 149) checks whether the page has a category whose name begins with the relevant language name, and if so, keeps the link blue. The entry "the" contains the (newly-renamed by @Benwing2, Theknightwho) category "German links with redundant target parameters", which is confusing the OrangeLinks script. This, that and the other (talk) 07:15, 8 January 2024 (UTC)
Hopefully fixed in this edit. Some entries which are in neither "LANG lemmas", "LANG non-lemma forms" or "LANG logograms", such as daldırmayan and amangachii-, will now show up as orange when they should be blue, but there are not many of these entries and they seem to be like this as a result of broken templates. (In fact, by total coincidence, I was running SQL queries to investigate this precise set of entries at the time I read -sche's post! What a world.) Let me know if you see any interesting brokenness. This, that and the other (talk) 07:25, 8 January 2024 (UTC)
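The difference between the old and new checks can be illustrated with a small model. This is a Python sketch of the logic as described in the two posts above, not the gadget's actual JavaScript; the function names and the sample category list are assumptions for illustration.

```python
def has_entry_old(categories, lang):
    """Old behaviour as described: any category starting with the language name counts."""
    return any(cat.startswith(lang) for cat in categories)


def has_entry_new(categories, lang):
    """New behaviour as described: only the three entry categories count."""
    wanted = {f"{lang} lemmas", f"{lang} non-lemma forms", f"{lang} logograms"}
    return any(cat in wanted for cat in categories)


# Hypothetical category list for the page "the" (illustrative, not exhaustive).
the_categories = [
    "English lemmas",
    "English articles",
    "German links with redundant target parameters",
]
```

Under this model the old check treats "the" as having a German entry because of the maintenance category, while the new check does not. Exact matching also keeps "German Low German lemmas" from counting as a German entry.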
@This, that and the other Thanks! I was going to suggest having the code read the list of POS's but I realize that isn't necessary since all such terms also go into either 'lemmas' or 'non-lemma forms' (or 'logograms?'). Benwing2 (talk) 07:26, 8 January 2024 (UTC)
@Benwing2 @This, that and the other @-sche On a semi-related note, this issue might still be happening with German and German Low German (as well as any other languages whose names are parts of others, like Turkish and Ottoman Turkish), due to the way the category parse currently works. I’ll rewrite the logic at some point to eliminate this, but just FYI in the meantime. Theknightwho (talk) 07:31, 8 January 2024 (UTC)
@Benwing2 The logograms category is a red herring after all. I found it in the script linked from Wiktionary:Todo/Derivation category does not match entry language, but there is precisely 1 entry in "LANG logograms" that is not also in "LANG lemmas" or "LANG non-lemma forms". The offending entry is 𒉺𒅁, which doesn't even have a headword line somehow.
@Theknightwho the issue shouldn't happen with German Low German anymore, at least as far as it concerns OrangeLinks. Compare German Dunnersdag vs German Low German Dunnersdag. Or are you referring to something else besides OrangeLinks? This, that and the other (talk) 07:34, 8 January 2024 (UTC)
@Theknightwho Not sure what category-parsing code you're referring to but it's unlikely to be invoked from JavaScript. Benwing2 (talk) 07:39, 8 January 2024 (UTC)
@Benwing2 What I meant was that the category parsing causes extra maintenance categories to get added, and occasionally these are for the wrong language due to this issue. Theknightwho (talk) 07:43, 8 January 2024 (UTC)
@This, that and the other That’s good. The thing I’m referring to is that sometimes this causes a false-positive in the language-specific maintenance category for the lang with the shorter name, because when it scans over the categories on the page it stores them in a big table under the names (e.g.) “German”, “German Low”, “German Low German”, “German Low German nouns”, because it has no idea which is the real language, since looking that up on-the-fly is slow and memory intensive. When the head template for the given language then looks up which maintenance categories it needs to add, it simply checks under its own name, which is normally fine, except for the situation like the one I described. Theknightwho (talk) 07:42, 8 January 2024 (UTC)
@Theknightwho Which code are you referring to that does this? It should be smart enough to look for capital vs. lowercase initial words at least. Benwing2 (talk) 07:57, 8 January 2024 (UTC)
@Benwing2 It’s in Module:headword/data towards the bottom. We can’t rely on capitalisation as a break point (e.g. Antigua and Barbuda Creole English). Theknightwho (talk) 08:01, 8 January 2024 (UTC)
@Theknightwho I think you can be smart about this; how many languages have the last word lowercase-initial? Benwing2 (talk) 08:08, 8 January 2024 (UTC)
Or if necessary, whitelist the few languages that do. Benwing2 (talk) 08:08, 8 January 2024 (UTC)
@Benwing2 I’ll have a look, but I’d like to avoid an explicit whitelist as it’d be a kludge. Theknightwho (talk) 08:15, 8 January 2024 (UTC)
@Theknightwho What happens if you don't? Do terms get miscategorized? Benwing2 (talk) 08:17, 8 January 2024 (UTC)
@Benwing2 The examples I've seen haven't been affected by that - it's more when you get things like "German Low German" and "German". I'll have a look at it now, as this should be possible to solve properly. Theknightwho (talk) 22:18, 8 January 2024 (UTC)
@Benwing2 I've fixed this: it now checks from longest to shortest, then breaks when it finds a language name, e.g. een is (correctly) in Category:Dutch Low Saxon entries with language name categories using raw markup, but is no longer showing up as a false positive in Category:Dutch entries with language name categories using raw markup. This was only possible by checking against Module:languages/canonical names, but loading it via require means that it doesn't have any significant impact on memory usage (versus mw.loadData, which adds a lot). I assume it gets cleared out of memory at some point, due to it being called by a static module which is only loaded once per page. Theknightwho (talk) 00:20, 9 January 2024 (UTC)
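The longest-to-shortest check can be sketched like this. It is an illustrative Python model only; the real code is Lua in Module:headword/data, and the small name set here is a stand-in for Module:languages/canonical names.

```python
# Stand-in for the canonical names data (assumption: a tiny subset for illustration).
CANONICAL_NAMES = {
    "German",
    "German Low German",
    "Dutch",
    "Dutch Low Saxon",
    "Antigua and Barbuda Creole English",
}


def language_of_category(category, canonical_names=CANONICAL_NAMES):
    """Return the longest leading run of words that is a known language name."""
    words = category.split(" ")
    # Try the longest prefix first and stop at the first match,
    # so "Dutch Low Saxon ..." is never attributed to "Dutch".
    for end in range(len(words), 0, -1):
        candidate = " ".join(words[:end])
        if candidate in canonical_names:
            return candidate
    return None
```

Because the lookup is by exact name rather than by capitalisation, names containing lowercase words such as "Antigua and Barbuda Creole English" need no special handling.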
@Theknightwho Great, thanks! Benwing2 (talk) 01:45, 9 January 2024 (UTC)
Actually, I take back my comment about "logograms". We do need it after all. It seems that we have a bunch of entries with an Akkadian header without an Akkadian lemmas/non-lemma forms category, but they do have a Sumerian section that has a Sumerian lemmas/non-lemma forms category, which is what my SQL query was picking up. Here's one: 𒅆𒂍. Clearly I still need to work on my queries! This, that and the other (talk) 07:40, 8 January 2024 (UTC)
@This, that and the other Are you sure? 'logogram' is a lemma POS so any term using it in {{head}} should get added to 'lemmas' as well. Do you have an example of a term that's in 'logograms' but not 'lemmas'? The term you linked to isn't in CAT:Akkadian logograms. Benwing2 (talk) 07:56, 8 January 2024 (UTC)
Yeah, pinging User:Sartma: it seems like those Akkadian logograms like 𒅆𒂍 should all use headword templates that categorize them as logograms, yes? - -sche (discuss) 08:02, 8 January 2024 (UTC)
@Benwing2 gosh, this is very confusing. It seems like 𒉺𒅁, the entry I initially identified, is the only entry in Cat:Akkadian logograms but NOT in Cat:Akkadian lemmas after all. However, we definitely need to do something about the Akkadian entries (L2 sections) without any categories at all. This, that and the other (talk) 08:58, 8 January 2024 (UTC)
Not sure what you guys are talking about, since I'm quite shit at all the technical part of Wiktionary. Unfortunately, being the only editor of Akkadian and Sumerian, I don't have time to complete all entries as they should be, so sometimes I just add the table that gives the Akkadian Sign values of a cuneiform sign/logogram (since this is one of the main pieces of information that people want to know), hoping that at some point I'll be able to add the rest (i.e. the Akkadian entry(s) corresponding to that logogram/sign. Ideally, a "finished" Akkadian logogram entry should look like this: 𒅆. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 14:29, 9 January 2024 (UTC)

The documentation of Module:links does not mention the field no_check_redundant_translit of the argument data of exported function full_link. Nor is the list of users complete - it is also used in Module:pi-decl/noun. (It is needed there to avoid flooding the list of Pali pages with redundant transliterations - default Pali transliteration cannot be totally relied upon, as it doesn't always know which writing system is used well enough.) --RichardW57m (talk) 18:10, 8 January 2024 (UTC)

Parsed Wiktionary database

English Wiktionary parsed enwikt20231001 (at Academic Torrents) - this is the Wiktionary dump parsed by Wikokit, in MySQL format. The source dump enwikt20231001 (Oct 1, 2023) was parsed. -- Andrew Krizhanovsky (talk) 08:46, 9 January 2024 (UTC)

Add reset gadget buttons

Some gadget is messing up my view. But I can't tell which ones were turned on by me! There should be a "reset to default" button for each.

Oh, I see, all that are checked are turned on by me.

OK, there should be a "turn off/reset all" button.

Jidanni (talk) 11:34, 9 January 2024 (UTC)

@Jidanni you can reset your entire preferences, but this is a drastic measure (it resets a bunch of other stuff too!).
I've been tempted for a while now to move all the on-by-default gadgets to their own section. This would help in situations like this. Alternatively, critical gadgets where there is no reason to ever turn them off (like script font support) can be marked as hidden so there is less to interact with in the preferences page. This, that and the other (talk) 11:41, 9 January 2024 (UTC)
But I don't want to reset all, just that one section.
OK, in an incognito window I can see their original values.
I also saw
<Gadget-section-Hidden gadgets and utils>
Yup, a real &lt; which was supposed to be an "<".
It's right there on https://en.wiktionary.org/wiki/Wiktionary:Preferences/V2 in incognito mode! Jidanni (talk) 12:02, 9 January 2024 (UTC)
OK, I found and executed the reset-all button, and now the nightmare is over. Jidanni (talk) 12:08, 9 January 2024 (UTC)
I agree with TTO it would be useful to mark gadgets which are on by default. Maybe not moving them all to a separate section (it can make sense to have them sorted by category like other gadgets), but at least adding "on by default" to their descriptions or something, like "legacy global scripts" has "(disable at your own risk)". - -sche (discuss) 02:49, 10 January 2024 (UTC)
@-sche how does the gadget list look now? I didn't move gadgets from one section to another (except for the search engines one) but I did move the on-by-default gadgets to the top of the respective sections and added a clear label. This, that and the other (talk) 09:20, 12 January 2024 (UTC)
Well, it's easy to track what's on by default now! Thank you. We could probably drop "disable at your own risk" from anything that's just aesthetic and won't break functionality if turned off, e.g. "Add default styles..." and "Format fake headings on JavaScript and CSS pages". It might also be nice to link to what exactly our "legacy global scripts" are. - -sche (discuss) 10:10, 12 January 2024 (UTC)

Display jump toward bottom of page using linking templates

I observed that in linking to galla from gall that the display briefly (< 1 sec.) went to the Latin L2, then jumped to the bottom of the page.

I can duplicate this behavior from my sandbox page and can achieve similar results linking to many of the language sections on galla. The extent of the downward jump seems to increase from minimal to galla#Finnish to the end of the entry for galla#Latin. Similar behavior is apparent when I link to other long entries with multiple L2 sections. It seems to defeat the purpose of L2 linking. DCDuring (talk) 17:13, 9 January 2024 (UTC)

I use Vector Legacy, FF 121.0 (will update shortly), Windows 10. DCDuring (talk) 17:16, 9 January 2024 (UTC)

@Theknightwho, Benwing2 Problem continues with FF 121.0.1. DCDuring (talk) 23:10, 9 January 2024 (UTC)
Does anyone else experience similar problems? DCDuring (talk) 23:11, 9 January 2024 (UTC)
@DCDuring Can you give me a recipe for reproducing it? Benwing2 (talk) 23:18, 9 January 2024 (UTC)
@User:Benwing2 I thought I had: Do what I did. Go to gall. In one of the etymologies is a link to galla. Click it. OR click this link to: galla, which uses {{l}}. DCDuring (talk) 23:49, 9 January 2024 (UTC)
@DCDuring This link works normally for me, so I think it may be something JavaScript-related at your end. Theknightwho (talk) 23:54, 9 January 2024 (UTC)
@DCDuring This doesn't occur for me on Chrome 120.0.6099.129 with Mac OS Ventura 13.6.3, but it does occur for me on Safari 17.2, same OS, with my admin account (User:Benwing). So it's browser-specific but definitely real. I am guessing it's something to do with the recent self-link change by User:Theknightwho but not sure. Benwing2 (talk) 23:56, 9 January 2024 (UTC)
It's not related to the two different accounts; still occurs on Safari using User:Benwing2. Benwing2 (talk) 23:58, 9 January 2024 (UTC)
@Benwing2 Let me install Safari and see if I can replicate it (though I'm on Windows, so who knows). Theknightwho (talk) 00:00, 10 January 2024 (UTC)
Urgh, seems they've stopped supporting it. I'll see if there's a website which emulates it or something. Theknightwho (talk) 00:02, 10 January 2024 (UTC)
@Theknightwho Can you check with Firefox? Benwing2 (talk) 00:04, 10 January 2024 (UTC)
@Benwing2 Yep, I'm seeing it with Firefox, though only the first time the problem page is loaded (or if you do a hard-refresh). I don't see how this could be related to the self-linking change, since the fragments in the links are correct. Theknightwho (talk) 00:11, 10 January 2024 (UTC)
@Theknightwho Hmm, you are probably right; my only other hypothesis is something that changed in MediaWiki recently, but who knows what? Benwing2 (talk) 00:16, 10 January 2024 (UTC)
@Benwing2 I think @Erutuon may be thinking along the right lines, as the Finnish templates do flash up in an expanded form when the page first loads. It's odd that it only affects certain browsers, though. @Surjection, as the creator/maintainer of the Finnish inflection templates, do you have any ideas? Theknightwho (talk) 01:05, 10 January 2024 (UTC)
@Theknightwho I suspect this is not specific to Finnish but to any page that has collapsed tables in it. Benwing2 (talk) 01:08, 10 January 2024 (UTC)
I was able to reproduce this in Firefox 120 using the Vector Legacy and Vector 2022 skins. Maybe the Finnish declension tables are loading expanded and then the page is scrolled to the Latin section and then the Finnish declension tables are collapsed, causing the page to be scrolled to the bottom. That fits with the distances from the top of the page that I measured with the JavaScript console in the two situations. The window.pageYOffset is greater at the Latin section with the tables expanded than at the bottom of the page when the tables are collapsed.
The table is collapsed by JavaScript (MediaWiki:Gadget-defaultVisibilityToggles.js), and I guess the JavaScript is running too late. It should run before the page scrolls to the correct section, if possible. Unfortunately, I don't know how to make sure that happens. Or the page could be scrolled by JavaScript to the correct section after the tables are collapsed, though I don't know if that's a good idea. — Eru·tuon 00:08, 10 January 2024 (UTC)
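The hypothesis above can be checked with a toy layout model. This is a Python sketch with made-up section heights, assuming the browser keeps its scroll offset after the tables collapse and merely clamps it to the new page length. It is not a description of what any particular browser actually does.

```python
def section_offsets(sections):
    """Map each section name to its top offset; also return the total page height."""
    offsets, top = {}, 0
    for name, height in sections:
        offsets[name] = top
        top += height
    return offsets, top


def final_scroll(target, expanded, collapsed, viewport):
    """Return (final scroll offset, where the target section really is after collapsing)."""
    # Step 1: the browser scrolls to the anchor while tables are still expanded.
    scroll = section_offsets(expanded)[0][target]
    # Step 2: the collapsing script runs; the page shrinks and the offset is clamped.
    collapsed_offsets, total = section_offsets(collapsed)
    max_scroll = max(0, total - viewport)
    return min(scroll, max_scroll), collapsed_offsets[target]


# Hypothetical heights in pixels; only the Finnish section shrinks when collapsed.
expanded = [("English", 800), ("Finnish", 6000), ("Latin", 900), ("Swedish", 700)]
collapsed = [("English", 800), ("Finnish", 1200), ("Latin", 900), ("Swedish", 700)]
```

With these numbers, a link to the Latin section first scrolls to offset 6800, but once the page has shrunk to 3600 pixels the view ends up clamped at the bottom (2700 with a 900-pixel viewport) while Latin itself sits at 2000. A link to the Finnish section is unaffected, which matches the observation that the jump grows for sections further down the page.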
The Finnish inflection table on galla has two layers of collapsed tables. Does that cause additional delay? DCDuring (talk) 02:32, 10 January 2024 (UTC)
I don't think this is a new issue; I think (as y'all seem to have worked out already) this is the longstanding (years-, possibly decade-) old issue that when our anchors point to specific sections of a page, you get taken to that section (correctly, for an instant) but then the javascript that collapses various tables runs and you're left in the wrong place except in specific browsers that either prevent or correct for this. At least, that old issue is what I experience when I click the link on gall to galla#Swedish (in Firefox). The main thing which I recall changing with regard to this, in the last decade+, is that apparently some browsers like Chrome now prevent/correct this issue. - -sche (discuss) 02:37, 10 January 2024 (UTC)
(Mention of this issue in 2012, 2016.) If anyone can think of a way of fixing this longstanding issue, besides switching to Chrome... either by doing something on Wiktionary that would 'reset' the page once the tables collapsed, or by pestering Firefox to do whatever it is Chrome does to solve this... go for it... - -sche (discuss) 02:45, 10 January 2024 (UTC)
I have tried to report the problem to Mozilla as an anon user, but my attempt seemed to hang at the very end. It might be better for someone with a GitHub account to report it to Mozilla. DCDuring (talk) 03:07, 10 January 2024 (UTC)
In long pages like this one, I find that using the contents page to jump to a topic usually doesn’t work. I end up seeing a different topic from the one which I clicked on. — Sgconlaw (talk) 04:50, 10 January 2024 (UTC)
@Sgconlaw Hmm, it usually works for me. Maybe this is due to different browsers (I usually use Chrome). Benwing2 (talk) 04:54, 10 January 2024 (UTC)
@Benwing2: could be. I use FireFox on a laptop, and Safari on mobile devices. It’s slightly annoying but I’m not unduly worried by the issue. — Sgconlaw (talk) 04:57, 10 January 2024 (UTC)
I wonder if we could add extra code at the end of the table-collapsing code that looks at the page anchor and re-scrolls the page to that position. I would need to read through MediaWiki:Gadget-defaultVisibilityToggles.js to see exactly where it needs to go, though. This, that and the other (talk) 09:26, 10 January 2024 (UTC)

Maintenance warnings caused by Module number_list

Module:number list appears to be the sole populator of cat:Pali links with redundant alt parameters and is probably so for other languages, at least ones that continue to use {{cardinalbox}} or {{ordinalbox}}. (Until today there was another infelicity populating that Pali maintenance category.) Is there any reason I shouldn't dive in myself to try to eliminate the apparent problem? To be honest, I'm not quite sure what the merit is in eliminating cases where the 'term' and 'alt' are the same, but a single module unnecessarily dominating a maintenance category isn't good. --RichardW57m (talk) 14:36, 10 January 2024 (UTC)

@RichardW57m: I've removed unconditional alt inclusion in the manual number box templates in Module:number list. I don't know what the purpose was, but this change removed the unnecessary maintenance categories and didn't cause problems in the entries that I previewed with the modified version of the module before saving. — Eru·tuon 17:21, 10 January 2024 (UTC)
@Erutuon: Thank you. There'll now be a lot of empty maintenance categories to probably delete at the next purge. --RichardW57 (talk) 20:16, 10 January 2024 (UTC)
@RichardW57 We don't generally delete maintenance categories like this, or otherwise there'd be a lot of pointless deletion/recreation. Theknightwho (talk) 21:06, 10 January 2024 (UTC)

JIS2004 kanji on iOS 17.1

JIS X 0213:2004 (commonly known as JIS2004) changed the rendering of a number of kanji, such as 辻、葛、噌、巷. However, I have recently noticed that Wiktionary on iOS 17.1 displays the kanji with the pre-JIS2004 forms. I have also tested in other environments:

  • English Wiktionary on an older, unspecified version of iOS that I could not retrieve because the device broke down a while ago,
  • English Wiktionary again, Windows 10
  • Wiktionary in a number of other languages (including Japanese), also iOS 17.1
  • Wikipedia in a number of languages (including English and Japanese), iOS 17.1

All of the above cases display the JIS2004 forms. Is it a problem with iOS 17.1, or did Wiktionary decide recently to switch to the pre-2004 forms? OosakaNoOusama (talk) 00:15, 11 January 2024 (UTC)

@OosakaNoOusama: Generally this type of thing is determined by what fonts are installed in the operating system. If all the fonts in your operating system don't render the kanji correctly, Wiktionary can do nothing about it because it doesn't determine which fonts users have installed. (The only way it could provide fonts to readers would be with webfonts, but it doesn't serve webfonts to readers.) If one font in your operating system renders the kanji correctly (and does not have older versions that render them incorrectly, which might be installed in other readers' operating systems) and another default font renders them incorrectly, Wiktionary can tell the browser to use the non-default font that renders the kanji correctly using MediaWiki:Gadget-LanguagesAndScripts.css. I would recommend, if you want to solve this issue, that you test a bunch of fonts and figure out which font is displaying the kanji incorrectly and whether there is another font in the operating system or online that displays them correctly. Then you can suggest fonts to add to the .Jpan list in MediaWiki:Gadget-LanguagesAndScripts.css. (The CSS will only fix the rendering for readers who have one of those fonts installed.) I generally do font testing in the browser by adding font-family rules in the developer tools, and checking which font each character is using and whether the character is rendering correctly, but there may be other ways to do it. — Eru·tuon 23:26, 11 January 2024 (UTC)

Etymology Albanian braktis

Hi, I’m trying to add etymology for the word braktis but an error message keeps popping up. Can you tell me what am I doing wrong? Thx Etimo (talk) 11:04, 11 January 2024 (UTC)

@Etimo the wikitext you were trying to add looked like this:
===Etymology===

From a Slavic dialect....
But there should be no blank line after a level 3 header:
===Etymology===
From a Slavic dialect....
I hope this helps. This, that and the other (talk) 11:40, 11 January 2024 (UTC)
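A quick way to spot and repair this kind of formatting before saving might look like the following. This is a hedged Python sketch, not the abuse filter's actual rule; the regular expression and function names are assumptions, and it treats every header of level 3 or deeper the same way.

```python
import re

# A header of level 3 or deeper, followed by one or more blank lines.
# (={3,}) captures the opening equals signs; \1 requires the same run to close the header.
HEADER_THEN_BLANK = re.compile(
    r"^(={3,})[^=\n].*?\1[ \t]*\n(?:[ \t]*\n)+",
    re.MULTILINE,
)


def has_blank_line_after_header(wikitext):
    """True if any level 3+ header is directly followed by a blank line."""
    return HEADER_THEN_BLANK.search(wikitext) is not None


def remove_blank_lines_after_headers(wikitext):
    """Drop the blank lines so the content starts on the line after the header."""
    return HEADER_THEN_BLANK.sub(lambda m: m.group(0).rstrip() + "\n", wikitext)
```

Level 2 language headers are deliberately left alone here, since the post above only concerns level 3 headers.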
I wonder (seeing that you set this filter back to just "warn") if we should also set 166 back to "warn"; it stopped e.g. and , and although both of those entries did later get created, I wonder if others are getting blocked, since apparently what to do to resolve the issues these filters catch is not as intuitive as we might like to think, if someone like Etimo who's been around for a long time couldn't work it out. - -sche (discuss) 09:23, 12 January 2024 (UTC)
I think there is a general problem with overstrict filters. I nearly had to abandon an edit because I couldn't find a stray tab character in the edit window. I got lucky, guessing that it was a fat-finger problem, but I have never had call to try to locate a not-visually-detectable character before. I suppose that I could have copied the contents of the entire entry window, abandoned the edit, tried to find and delete the offending character off-line, and then return to the entry to paste the cleaned material. If that wouldn't work, I'd quit editing for the day. DCDuring (talk) 13:43, 12 January 2024 (UTC)
@-sche, This, that and the other, DCDuring Yes, we should set 166 to warn. IMO all of these formatting filters should be set to warn, for multiple reasons, e.g. if set to "block" they block legitimate users and also make bot changes more difficult (esp. the ones that act outside of the changed region). Benwing2 (talk) 19:14, 12 January 2024 (UTC)
I set 164 and 166 to warn. I considered doing this for 156 ("No defaultsort") but this seems something it's easier to avoid and is likely only to hit established users, since new users are unlikely to know about DEFAULTSORT. Benwing2 (talk) 19:19, 12 January 2024 (UTC)
Thanks. I am also keeping an eye on 158, but for a different reason: in nearly six months it appears to have caught just a single edit (?) which means it may be safe to deactivate. - -sche (discuss) 19:31, 12 January 2024 (UTC)
Hmm. I'm surprised this got hit so few times, but maybe it's a similar case where only established users are likely to hit it. In any case, there are now 3 hits because I tested it to make sure it works in cases where it should :) Benwing2 (talk) 19:49, 12 January 2024 (UTC)
@-sche @Benwing2 My two cents on 156 and 158 are that they should both remain as disallowed, as they both stop something that causes problems beyond the page the user is editing (since it impacts the category as well). DEFAULTSORT in particular is a real pain, because it can mess up the page's sorting order in lots of categories, sometimes in ways that are completely incompatible with the scheme used by the rest of the category; this was particularly bad when it was widespread in entries with CJK characters in the title, because pages were being sorted with Japanese kana readings in categories completely unrelated to Japanese. If we set this to warn, I'd have to re-enable the check for it in the headword module again, which simply adds unnecessary overhead. Plus I'd rather not have to remember to clear them up periodically (because - as we can see - some users simply plough on regardless even if given a warning).
The reason for 158 is that I fixed about 20-30 pages where people had used sort= in a raw category. It doesn't happen very often, but it's clear that they tend to go unnoticed when it does happen. Theknightwho (talk) 23:20, 12 January 2024 (UTC)

New Entry Creator

This appears to have been deactivated by somebody in the last 24 hours. It was working yesterday. DonnanZ (talk) 23:07, 12 January 2024 (UTC)

@Donnanz Hmm. I don't use this but I haven't touched anything in the last 24 hours that would have affected it. Can you give a recipe for reproducing the problem along with what happens (or rather, doesn't happen)? Benwing2 (talk) 23:30, 12 January 2024 (UTC)
When I search for e.g. "barfoos" and then scroll down to the "These entry templates may help when adding words:" and hit the blue "Plural" or "Noun" buttons, I get a preloaded entry, i.e. it seems to still be working for me. - -sche (discuss) 23:58, 12 January 2024 (UTC)
I don't see it using Windows 10, FF 121.9.1, Vector Legacy. DCDuring (talk) 00:08, 13 January 2024 (UTC)
Didn't work in any usable skin for me. DCDuring (talk) 00:17, 13 January 2024 (UTC)
@Benwing2: (edit conflict) If, say, I wanted to create an entry for Bloggsville I get the message "You may create the page "Bloggsville" on a blank page, request its creation or create it using the New Entry Creator!" A template would pop up when I clicked on "create it". But not today (it's now tomorrow!). DonnanZ (talk) 00:11, 13 January 2024 (UTC)
@-sche: I see what you mean, they are not all of the options you should get clicking on "create it", which include "Proper noun". DonnanZ (talk) 00:21, 13 January 2024 (UTC)
@Donnanz this may have been me, sorry. I was prompted by another discussion here to inspect the content of the "legacy scripts" gadget, and I removed some apparent remnants of old experiments. Part of the code I removed referred to a CSS class "necblah". I searched for this class and noticed that it is mentioned in the New Entry Creator code, but I determined that the code in the legacy scripts gadget must have been redundant nonetheless. Perhaps I was wrong. If so, I apologise. This, that and the other (talk) 00:36, 13 January 2024 (UTC)
(I reverted my change, so please check if your issue is resolved.) This, that and the other (talk) 00:41, 13 January 2024 (UTC)
I was searching around trying to figure out how it works (because I'd forgotten if I ever knew), so here goes. On the search page, when MediaWiki infers that you're looking for a page that doesn't exist, there's a link to an edit page with the parameters title=search query here&action=edit&editintro=User:Yair_rand/usenec (transcluded from MediaWiki:Searchmenu-new). The editintro parameter in the URL causes MediaWiki to put the text in User:Yair_rand/usenec above the edit box. The text contains a HTML tag with id="necblah". MediaWiki:Gadget-legacy.js searches for that element and loads User:Yair rand/newentrywiz.js if it is found. Then the script adds a form that fills in the wikitext of a basic entry. A bit of a convoluted process. — Eru·tuon 00:48, 13 January 2024 (UTC)
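A rough Python sketch of the flow described above, for anyone trying to follow it. It is not the gadget's code (that is JavaScript): the base URL and the helper names `nec_edit_url` and `should_load_nec` are assumptions made for illustration, and only the `title`, `action`, `editintro` parameters and the `necblah` ID come from the explanation.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint, used here only to make the URL complete.
BASE = "https://en.wiktionary.org/w/index.php"


def nec_edit_url(title, editintro="User:Yair_rand/usenec"):
    """Build an edit link carrying the editintro parameter (hypothetical helper)."""
    query = urlencode({"title": title, "action": "edit", "editintro": editintro})
    return f"{BASE}?{query}"


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.found = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.wanted:
            self.found = True


def should_load_nec(editintro_html, marker_id="necblah"):
    """Mimic the check: load the entry wizard only if the marker element is present."""
    finder = _IdFinder(marker_id)
    finder.feed(editintro_html)
    return finder.found


url = nec_edit_url("Bloggsville")
print(parse_qs(urlsplit(url).query))
print(should_load_nec('<div id="necblah"></div>'))
```

The point of the sketch is that the marker ID is the only link between the editintro text and the script, which is why removing code that mentions it silently disables the feature.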
@Erutuon thanks for the explanation. Let's maybe rename that ID to something a bit more real-looking than "necblah"... And maybe move that page out of userspace while we're at it... This, that and the other (talk) 02:32, 13 January 2024 (UTC)
@Erutuon @This, that and the other There used to be a bunch of stuff in User:Yair rand's userspace that was used generally. Not sure how much is left but there may be other things. Benwing2 (talk) 02:43, 13 January 2024 (UTC)
I have renamed the ID necblah to necplaceholder to try to prevent this code from being mistaken for test/experimental code again. NEC is temporarily down as a result of pages being out of sync, but it should reappear in a couple of minutes once caches are cleared. If not, please restart your browser.
I'm also going to move User:Yair rand/usenec to MediaWiki:nec-editintro so it is clear that it is part of the UI. This, that and the other (talk) 09:32, 13 January 2024 (UTC)
Yeah, sorry about all that. There's a lot of messiness relating to the scripts I wrote back ~2010 (when I, ah, may have been a minor with little idea of what I was doing). Thanks for the fixes. --Yair rand (talk) 02:54, 16 January 2024 (UTC)
Thanks everyone. I see the template pops up now when clicking on "create it", so it should work now. The software may be old and crusty like me, but I have got used to it, and it's more sophisticated than using a blank page. But that doesn't mean it shouldn't be improved. DonnanZ (talk) 10:16, 13 January 2024 (UTC)

Religism definition

The Wiktionary platform threw the following warning when I tried to enrich the religism term: "This action has been automatically identified as harmful, and therefore disallowed. If you believe your action was constructive, please start a new Grease pit discussion and describe what you were trying to do. A brief description of the abuse rule which your action matched is: probably vandalism."

What I tried to add: Religism - 2. Form of Fascism, similar to Nazism (all non-Aryan races discrimination), but when one religion group strives to forcefully dominate to usurp power and to eliminate, injure, enslave, displace, imprison, suppress or otherwise threaten lives and freedoms of all the other people (other religions and atheists) on a given territory.

What do you think? And how can I overcome the automatic prohibition on updating the religism term? UA Republic (talk) 20:33, 13 January 2024 (UTC)

Your edit also added a stray character to a header, which is probably what triggered the abuse filter. As for the content: if you had succeeded in adding it, the first thing I would have done would have been to submit it to WT:RFVE, because it looks like something you made up. We're a descriptive dictionary, so we only cover terms that are/were in actual use. Also, your definition reads more like a discussion of the concept than a definition. You don't need to mention Nazism, because that's only one type of fascism. You also don't need to go into such detail about the kinds of things such a group would do. You seem to be simply referring to a form of fascism that's based on religious ideology, and ideologues who seize absolute power tend to do all manner of bad things to those who aren't favored by their ideology. Chuck Entz (talk) 21:09, 13 January 2024 (UTC)
IMO, current events in Nigeria clearly illustrate the term Religism. If this term has not yet been used in this context but clearly fits it, what should one do to establish this clear and evident association and definition?
https://www.vaticannews.va/en/church/news/2023-04/over-50000-christians-killed-in-nigeria-by-islamist-extremists.html
I propose the following more concise definition, then:
Religism - far left extremist political and religion group of people that strives to violently discriminate all the other people on a given territory, which are not of that same religion. UA Republic (talk) 21:48, 13 January 2024 (UTC)
Do people use the word religism in this manner to describe current events in Nigeria? We define words, and not concepts here, so the fact that the word could describe these events is not relevant. CitationsFreak (talk) 04:34, 15 January 2024 (UTC)

old uses of T:cite that don't work right

In spot-checking CAT:English citations of undefined terms, I have come across multiple ancient pages with now-nonfunctional uses of {{cite}} — e.g. look at the source code vs the displayed content of Citations:addictionary — if anyone fancies a bot or AWB run to try and fix those to use {{quote-book}} and {{quote-web}} templates or something else functional. I don't know how many there are. There may be similar things on Talk: pages. - -sche (discuss) 02:02, 14 January 2024 (UTC)

I have to say, I'm really struggling to understand the purpose of this template in comparison to the established family of cite templates like {{cite-book}} etc. It's only used on 500 pages, of which 150 are via {{R:zom:Singh:2013}}. Perhaps it would be better to RFDO it. This, that and the other (talk) 23:50, 14 January 2024 (UTC)
Replace and delete: yeah, I’d say it’s redundant to the {{cite-}} family of reference templates. — Sgconlaw (talk) 09:14, 15 January 2024 (UTC)
@-sche, Sgconlaw, This, that and the other: In order to preserve our history, we could always repair @Sgconlaw's edit, which dropped the |text= parameter rather than redirecting it to the more editor-hostile |passage= parameter, e.g. by passage={{{passage|{{{text|}}}}}}. Longer term, it would make sense to deprecate the template in favour of {{cite-book}} etc. In the example given, I think we actually want to have {{quote-book}}! --RichardW57m (talk) 16:56, 15 January 2024 (UTC)
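For readers less used to nested parameter defaults, here is a small Python model of what passage={{{passage|{{{text|}}}}}} resolves to. The function name `resolve_passage` is hypothetical; the behaviour modelled is the standard wikitext rule that a default applies only when the parameter was not supplied at all.

```python
def resolve_passage(params):
    """Model of {{{passage|{{{text|}}}}}} for a dict of supplied template parameters.

    |passage= wins whenever it was supplied (even if empty);
    otherwise |text= is used; otherwise the result is the empty string.
    """
    return params.get("passage", params.get("text", ""))


print(resolve_passage({"text": "old-style quotation"}))
print(resolve_passage({"passage": "new-style", "text": "old-style"}))
```

Old calls that only pass |text= keep displaying their quotation, while calls that already use |passage= are unaffected.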
If I remember (this was eight years ago!), at the time all the {{cite-}} reference templates used only |passage=, so I was probably trying to align all the parameters. — Sgconlaw (talk) 17:07, 15 January 2024 (UTC)
OK, I've RFDO'd it. - -sche (discuss) 01:48, 16 January 2024 (UTC)
@-sche, Sgconlaw, This, that and the other: I've repaired the functionality of {{cite}}. Remember, Wiktionary is not paper! --RichardW57m (talk) 10:07, 16 January 2024 (UTC)

The CJK Unified Ideographs Extension I list page in Appendix:Unicode currently has no glyphs listed on it. It only has 622 code points/glyphs so someone with script/module knowledge should have no problem updating it. Bumm13 (talk) 07:57, 15 January 2024 (UTC)

Inflections in etymologies

At wast#Old Gutnish we see something that is not terribly uncommon: an inappropriate use of {{inflection of}} in the etymology section of the entry. However, it's not clear exactly what to replace it with. As far as I am aware, the current best practice is to type out the inflection text by hand, which is tedious and error-prone.

Should we have a template for this? I think we should, but I struggle with the naming. The obvious name is {{inflection}}, along the lines of existing pairs such as {{clipping of}}/{{clipping}} and {{abbrev of}}/{{abbrev}}, but it might be too confusing, since the word "inflection" doesn't appear in the output. Ideas? This, that and the other (talk) 10:36, 15 January 2024 (UTC)

If we just want to refer the reader to the lemma, I guess no etymology section is required since the definition line already contains a link to the lemma. I can see, though, that it is sometimes desirable to have an etymology showing that the inflected form is the lemma plus a suffix (for example), or to explain that it is borrowed from another language. I encounter that sometimes in English entries. I don’t think we need a separate template for those cases. — Sgconlaw (talk) 12:28, 15 January 2024 (UTC)
There once was talk of using something like {{lexicalization}}. I was discussing this with @Surjection. Vininn126 (talk) 12:31, 15 January 2024 (UTC)
@Sgconlaw: But this is an irregular form, so stating that it is an inflection of the unentered lemma doesn't explain the /s/. In this case, I think, "See {{m|gmq-ogt|wara}}." might be better, though confidence requires a crystal ball to know where the irregular inflected forms are explained. Perhaps the etymology should refer to an Old Norse entry. --RichardW57m (talk) 15:20, 15 January 2024 (UTC)
@RichardW57m: *ahem* “I can see, though, that it is sometimes desirable to have an etymology showing that the inflected form is the lemma plus a suffix (for example), or to explain that it is borrowed from another language.” So this would be one of those situations. But what’s currently in the etymology section of the non-lemma entry now seems fine. Is another template really required? — Sgconlaw (talk) 15:46, 15 January 2024 (UTC)
@Sgconlaw: There is the problem that {{inflection of}} sometimes categorises. Now this could be solved by using |nocat=1, except that the documentation mentions no such parameter! If it does categorise, it will categorise for the wrong language.
Actually, there was a problem in the etymology section - {{root}} was categorising the lemma as being an Old Norse derivative! That looks like a copy and paste error, and I've just corrected it. --RichardW57m (talk) 16:29, 15 January 2024 (UTC)
Perhaps I made this more confusing than it needed to be by giving a nonlemma form as an example. This erroneous template usage is seen in lemma entries too.
I'm curious to know if @Benwing2 has any thoughts on this. This, that and the other (talk) 20:00, 15 January 2024 (UTC)

Orphaned ৰ

Where, if anywhere, should we note that it is not considered worthwhile erecting an Eastern Nagari script for Pali to include both র (ra) (U+09B0 BENGALI LETTER RA, currently in Wiktionary script Beng) and ৰ (U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL, currently in Wiktionary script as-Beng)? The only current issue is that we have to include "sc=Beng" to format or transliterate the two words Pali (va), and the letter if we are thinking of it as Pali. Adding script pi-Beng seems excessive to me. --RichardW57m (talk) 15:09, 15 January 2024 (UTC)

SI prefixes for Chinese

I guess only part of this discussion belongs in the Grease pit, but I'm putting it here anyway.

I was working on Chinese entries for SI prefixes. The current predicament for SI prefix navigation in Chinese is all over the place. In all the prefixes for the negative powers of 10, there's a simple table that allows you to navigate through SI prefix entries like this.

SI prefix
Last Next
/ / / / / / /

The problem is these tables practically differ from page to page. Some of them have pinyin romanization on both the left and the right boxes. Some of them have pinyin only on the left. Some of them have pinyin only on the right. has a 3-column table with n/a on the right, whilst (yōu) and had (which I copied when extending the SI prefixes down to  / and  / (kuī)) only a 2-column table with no "Next".

On top of that, there is no navigation table for any of the prefix pages for positive powers of 10. That's a whole other fiasco to deal with, especially with certain characters corresponding to two different prefixes like (zhào) which corresponds to both mega- and tera-. And we have to deal with the whole 1981 prefix standard, so that's up to 3 characters (+ the simplified variants) per prefix that we have to deal with.

I propose we make a template, maybe {{zh-SI}}, that can simply be used on all character pages that correspond to any SI prefix, so that it establishes consistency and addresses all the demands of Chinese SI prefixes. The question is what that would look like, i.e.: pinyin or no pinyin; what to do for (zhào); what to do about 1981 prefixes; marking PRC vs. ROC standard; marking simplified and traditional for both sets of prefixes; etc. Basically, just, "how can we improve upon the current table". LittleWhole (talk) 09:20, 16 January 2024 (UTC)

Also, on English SI prefix entries using {{enum}}, "Next" corresponds to increasing magnitude whilst "Previous" corresponds to decreasing magnitudes, which is the opposite of what we have right now. I think we should bring it into line with the English standard. LittleWhole (talk) 09:24, 16 January 2024 (UTC)
Oh yes, also: it gets worse. Some of the prefixes have single-character shorthands as well as full phonetically borrowed/matched variants, e.g. (róu) and 柔托 for ronto-. So, that's something that has to be dealt with as well. LittleWhole (talk | contribs) 09:30, 16 January 2024 (UTC)

desctree not recursive?

I had a very simple thing in mind: under Latin tubus, English tube is listed as a descendant; I went to add the "bor=1" arrow. Wait. French tube is also listed as a descendant, which of course it really is. So I should put the English under the French. Oh. The French uses {{desctree}}. Why isn't English under that? Oh. The English descends from Middle French tube, not French. Okay, it's pedantic, but if I change the "fr" to "frm" to get the descendant tree of Middle French . . . now it correctly shows French and (bor=1) English, but no longer shows the descendants of modern French, namely Romanian and Turkish.

I would have thought {{desctree}} of all things should be recursive. Shouldn't the Middle French show French, which then shows Romanian and Turkish? Am I missing something? --Hiztegilari (talk) 20:01, 16 January 2024 (UTC)

@Hiztegilari: It is recursive, the Middle French entry just didn’t have desctree, and with this insight I fixed it for you. Fay Freak (talk) 20:18, 16 January 2024 (UTC)

Currently internal WT links are either blue if the target entry exists, and otherwise red (e.g., redlink versus redlinked). But, as per an example at Wiktionary:Information_desk/2024/January#Definition_by_synonym, would it be good to mark internal WT links with an existing entry but where the target anchor doesn't exist in a distinct colour (e.g., orange or red) too? —DIV (1.152.106.191 00:21, 17 January 2024 (UTC))

You can already do this by going to WT:Preferences/V2 and turning on the OrangeLinks gadget. Even better, if you create an account, you can turn on the gadget (and many others) permanently, without having to frequently revisit that page and turn it back on again! This, that and the other (talk) 01:00, 17 January 2024 (UTC)
Cool! While it's a shame not to be the first one with the idea, on the plus side I figure this means it wasn't a silly idea....
Not sure it's quite "orange"
Renders for me as around hex #826f34,
but that's a minor quibble.
The widget will "colour links orange if the target language is missing on an existing page".
TEST: when the widget is activated hun#Norwegian_Bokmål & hun#Etymology_2_4 should be blue, hun#Norwegian_Foo (& hun#Etymology_2_94?) should be orange, and hunNorwegian_Bokmål & hunEtymology_2_4 should be red. RESULT: all as expected, although hun#Etymology_2_94 remains blue (I realise from the above-linked Information Desk discussion that such links are not encouraged, and I suppose the behaviour is arguably(?) matching the design specification of the widget). Also note that, for me, the "orange" colouring is applied with a lag of circa 1 to 3 seconds after the page has loaded & otherwise rendered.
—DIV (1.152.106.191 11:32, 17 January 2024 (UTC))
@This, that and the other: Orange links are not quite the same. When orange links are enabled, a word has to be in an appropriate category to be blue, which is confusing if the head word or equivalent is missing or too defective, and seems to be subject to delays in adding pages to categories. I've been getting a bit confused by links that should be blue still being orange, but I should be OK now that I understand the mechanism. --RichardW57m (talk) 11:40, 17 January 2024 (UTC)
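To make the three-way distinction in the original request concrete, here is a minimal Python sketch of it: red for a missing page, orange for an existing page whose target anchor is missing, blue otherwise. This models the behaviour that was asked for, not the OrangeLinks gadget itself, which (per the comments above) looks at language sections and category membership rather than arbitrary anchors. The `pages` mapping and the function name are hypothetical.

```python
def link_colour(target, pages):
    """Classify an internal link as 'red', 'orange' or 'blue'.

    target: 'Title' or 'Title#Anchor'
    pages:  dict mapping each existing page title to the set of anchors on it
            (hypothetical data; a real tool would have to fetch this).
    """
    title, _, anchor = target.partition("#")
    if title not in pages:
        return "red"      # page does not exist at all
    if anchor and anchor not in pages[title]:
        return "orange"   # page exists, but the anchor does not
    return "blue"         # page exists (and the anchor too, if one was given)


pages = {"hun": {"Norwegian_Bokmål", "Etymology_2_4"}}
for link in ["hun#Norwegian_Bokmål", "hun#Norwegian_Foo", "hunNorwegian_Bokmål"]:
    print(link, link_colour(link, pages))
```

Under this model hun#Etymology_2_94 would come out orange, which is exactly where it differs from what the test above observed with the gadget.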

Japanese usage examples and quotations with Template:ja-usex

Japanese usage examples and quotes with {{ja-usex}} are hardly searchable outside Wiktionary. It looks like the wikicode and ruby are in the way.

Example at アイスランド (Aisurando):

北(ほっ)極(きょく)海(かい)に浮(う)かぶ火(か)山(ざん)島(とう)アイスランド

Hokkyokukai ni ukabu kazantō Aisurando
Iceland, a volcanic island in the Arctic Ocean.

Wikicode:

{{ja-usex|北%極%海に浮かぶ火%山%島アイスランド|^ほっ%きょく%かい に うかぶ か%ざん%とう ^アイスランド|Iceland, a volcanic island in the Arctic Ocean.}}

It is desired that each part is searchable in Google. Only the last two are.

  1. Plain Japanese text (without furigana on top and formatting code): 北極海に浮かぶ火山島アイスランド
  2. Kana (without formatting code and spaces, as it should appear normally): ほっきょくかいにうかぶかざんとうアイスランド
  3. Rōmaji: Hokkyokukai ni ukabu kazantō Aisurando
  4. English translation: Iceland, a volcanic island in the Arctic Ocean

Anatoli T. (обсудить/вклад) 01:10, 17 January 2024 (UTC)

Granted, this is sub-optimal. That said, I'm not sure how to resolve this -- if Google only sees the wikitext, the only apparent way to make the plain-text version Google-able would be to engage in pretty massive data duplication — which is both no fun, and a maintenance nightmare.
That said, I bow to others with more expertise. ‑‑ Eiríkr Útlendi │Tala við mig 19:53, 24 January 2024 (UTC)
Google would be looking at the resulting HTML rather than the wikitext, as mentioned in the task below. In the HTML, the Japanese and the annotations are interspersed: with the HTML tags removed, 北(ほっ)極(きょく)海(かい)に浮(う)かぶ火(か)山(ざん)島(とう)アイスランド. I guess if Google can't search parts 1 and 2, its search indexer sees the text like this and doesn't go to the trouble of splitting the HTML into two parts. Maybe we could put parts 1 and 2 in separate HTML elements that are hidden by default (CSS: display: none;), as suggested in another case in Wiktionary:Beer parlour/2022/December § Arabic transliterations: let's use ʔ and ʕ instead of ʾ and ʿ.. — Eru·tuon 01:19, 30 January 2024 (UTC)
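As a rough illustration of the two points above, the following Python sketch derives the plain Japanese and plain kana strings (parts 1 and 2) from the {{ja-usex}} arguments, and shows how stripping tags from ruby markup leaves the interleaved text. The function names are hypothetical, and the ruby HTML in the example is an assumed shape rather than a copy of what the template actually emits.

```python
import re


def plain_forms(japanese, kana):
    """Return (plain Japanese, plain kana) from the first two ja-usex arguments.

    '%' marks kanji/kana boundaries, '^' marks capitalisation in the romaji,
    and spaces separate words in the kana argument; none belong in plain text.
    """
    plain_japanese = japanese.replace("%", "")
    plain_kana = re.sub(r"[%^\s]", "", kana)
    return plain_japanese, plain_kana


def strip_tags(html):
    """Naive tag stripper, enough to show what an indexer might see."""
    return re.sub(r"<[^>]+>", "", html)


print(plain_forms(
    "北%極%海に浮かぶ火%山%島アイスランド",
    "^ほっ%きょく%かい に うかぶ か%ざん%とう ^アイスランド",
))
# Assumed ruby markup for a single character:
print(strip_tags("<ruby>北<rp>(</rp><rt>ほっ</rt><rp>)</rp></ruby>"))
```

Strings like the two returned by `plain_forms` are what would go into the separate hidden elements suggested above.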
phabricator:T150111 Fish bowl (talk) 00:02, 30 January 2024 (UTC)

Order of diacritics with mw.ustring.toNFD

I’m working on this module and I’m encountering a problem with the ordering of diacritics. Apparently NFD puts dot-below in front of the circumflex, and if I then convert dot-below to grave, NFC doesn’t put the diacritics back together. Specifically, I expect

{{#invoke:User:MuDavid/vi-adj|rdp_one|độ}}

to give

đồ độ

but instead it gives

đồ độ

with the circumflex on top of the grave in the first syllable, instead of the other way around. How do I solve this? MuDavid 栘𩿠 (talk) 03:18, 17 January 2024 (UTC)

@MuDavid You're right that the dot below should go before the circumflex, given the respective combining classes for the characters. Unfortunately, it's an issue with the ustring library, which is something you'll have to open a Phabricator ticket about. Theknightwho (talk) 03:57, 17 January 2024 (UTC)
There may be a misunderstanding. My problem is that dot-below does go before circumflex, and if the grave goes before the circumflex, toNFC doesn’t put them together correctly. I managed to solve it with another gsub. (The first time I did this, it failed, apparently because those pesky zero-width characters were in the wrong order.) MuDavid 栘𩿠 (talk) 08:34, 17 January 2024 (UTC)
@MuDavid Derp, sorry - I must have misread. Glad it's sorted. Theknightwho (talk) 08:48, 17 January 2024 (UTC)
@MuDavid: You're wanting a language sensitive ordering of vowel and then tone, but that isn't available unless we add it. (Yoruba and Vietnamese conflict on what's a vowel diacritic and what's a tone mark.) Additionally, the combinations of grave and circumflex are formally sensitive to the order, and I'm not sure that the order grave then circumflex doesn't occur for some language or transliteration standard. Use of gsub does seem to be the only solution. --RichardW57m (talk) 12:05, 17 January 2024 (UTC)
Not necessarily. What I want is, if I start from ộ, apply NFD, substitute dot-below with grave, and apply NFC, to get ồ, rather than the Frankenstein of ò + circumflex. Is there any language that puts circumflexes on top of ò? If so, why didn’t they get their own codepoints for that? And if not, why doesn’t o + grave + circumflex recombine to ồ under NFC? MuDavid 栘𩿠 (talk) 01:38, 18 January 2024 (UTC)
@MuDavid The issue is that Unicode assumes the order only matters if the diacritics are on the same side of the letter: e.g. o + circumflex + grave will never get normalised to o + grave + circumflex or vice versa; however, it will change the order if the diacritics are on different sides, because the assumption is that it doesn't matter. e.g. o + grave + dot below is exactly the same as o + dot below + grave. The problem you were having came from the fact that you were converting a diacritic on one side into a diacritic on the other. Unicode (probably arbitrarily) decided that dot below should go "before" the grave or circumflex, but obviously that doesn't matter under most circumstances.
What you suggest of o + grave + circumflex combining to ồ would be a bad idea, because it would make it impossible to type ò̂ on MediaWiki, for example, since the software automatically converts it to NFC on save. Theknightwho (talk) 02:49, 18 January 2024 (UTC)
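The behaviour discussed in this thread can be reproduced outside Scribunto. The sketch below uses Python's `unicodedata` rather than `mw.ustring`, on the assumption that both follow the same Unicode normalisation rules; the function name and the `reorder` flag are hypothetical, and the extra replace plays the role of the additional gsub mentioned above.

```python
import unicodedata

DOT_BELOW = "\u0323"   # combining class 220 (below)
GRAVE = "\u0300"       # combining class 230 (above)
CIRCUMFLEX = "\u0302"  # combining class 230 (above)


def dot_below_to_grave(text, reorder=True):
    """Replace the dot-below tone mark with a grave, then recompose.

    NFD puts dot below before circumflex, so a plain substitution leaves
    grave + circumflex, which NFC cannot compose with o into a single
    character. With reorder=True the pair is swapped to circumflex + grave.
    """
    decomposed = unicodedata.normalize("NFD", text)
    swapped = decomposed.replace(DOT_BELOW, GRAVE)
    if reorder:
        swapped = swapped.replace(GRAVE + CIRCUMFLEX, CIRCUMFLEX + GRAVE)
    return unicodedata.normalize("NFC", swapped)


naive = dot_below_to_grave("\u1ed9", reorder=False)  # input is ộ
fixed = dot_below_to_grave("\u1ed9")
print([hex(ord(c)) for c in naive])
print([hex(ord(c)) for c in fixed])
```

Without the reordering step the result is ò followed by a combining circumflex; with it, the result is the single precomposed character U+1ED3.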

{{km-usex}} (also {{km-xi}}) can't handle Khmer numerals and punctuation. It produces errors.

  1. Symbol: {{km-xi|ជម្រាប សួរ ។|Hello.}} should give ជម្រាបសួរ / cumriəp suə. / Hello.
  2. Numerals: {{km-xi|០១២៣៤៥៦៧៨៩}} should give ០១២៣៤៥៦៧៨៩ / 0123456789 / 0123456789

Anatoli T. (обсудить/вклад) 06:01, 17 January 2024 (UTC)

@Theknightwho: Hi. Are you able to fix it? Anatoli T. (обсудить/вклад) 06:14, 18 January 2024 (UTC)

WT:Todo/Lists - looking for collaborators

As some of you have noticed already, I have taken the initiative to start a project for generating cleanup lists: WT:Todo/Lists.

Quite often someone will say "keeping track of should be done by analysing dumps, not with a live maintenance category that incurs a cost every time the page is rendered". This project is intended to bridge the gap between intent and reality!

The Todo Lists project is designed to be an enduring, collaborative operation, not dependent on any individual user as so many of our current cleanup lists are. (For example, Erutuon's list of typos in headers hasn't been updated for two years, and DTLHS' lists are also frozen in time.) Moreover, it will provide a central location for todo lists so that they can be easily found, similar to Wikipedia's database reports page.

From a technical standpoint, the project is hosted on Toolforge, one of WMF's cloud platforms. For maximum accessibility, the code is written in Python (there is some SQL as well) and I have fully documented everything at WT:Todo/Lists/technical documentation.

I plan to eventually make a post at BP and WT:NFE properly announcing the project and asking for suggestions beyond the set of 11 todo lists currently provided. But for now, I am looking for people to join the project as collaborators. There is no obligation on any collaborator to do anything – I just want to make sure other people have access to the infrastructure in case I vanish from the project, and perhaps start contributing todo lists of your own.

Let me know if you are interested - pinging some potential people who might be: @Erutuon, JeffDoozan, Jberkel, Benwing2, -sche (did I miss anyone?). There's a tutorial for connecting to Toolforge at wikitech:Help:Toolforge/Quickstart if you've never done it before. This, that and the other (talk) 06:21, 17 January 2024 (UTC)

@This, that and the other: Who's the 'we' for requesting exceptions to the list? There are a few very short module subpages for which it is simpler to have a stub: Module:pi-decl/noun/Cakm (my doing), and possibly Module:languages/data/3 and Module:languages/data/3/extra. --RichardW57m (talk) 13:13, 17 January 2024 (UTC)
@RichardW57m the blank and short pages list needs more thought: it clearly has a problem with false positives. "We" (just me for now) will think about it. This, that and the other (talk) 21:48, 17 January 2024 (UTC)
The white list looks like a solution to me; the problem was just that the 'we' needed expanding to a list one could work on. Of course, the white list should be available for review. In retrospect, Module:languages/data/3 looks as though it never has and never will be usefully used! --RichardW57m (talk) 09:41, 18 January 2024 (UTC)
Good point. I deleted it. Actually its existence was a net negative, as it shoved its way into the auto-generated breadcrumb trail for all its subpages despite having no content. This, that and the other (talk) 09:58, 18 January 2024 (UTC)

Expand subcats and count entries

Is there here or in wmflabs a tool like {{#invoke:family tree|show|ine-pro|yes}} or method to expand subcategories and give the number of entries of each? I wanna get it for etymologies and do statistics. Actually like when you go to see a category, but without the need to expand thousands of subcats. ※Sobreira ◣◥ 〒 @「parlez 01:53, 18 January 2024 (UTC)
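In case it helps to pin down what is being asked for, here is a minimal Python sketch of the traversal: walk a category tree once and report the entry count for each category reached. It works on hypothetical in-memory data (`subcats`, `entries`); obtaining the real data from a dump or the API is a separate problem that this does not address.

```python
def count_tree(category, subcats, entries, seen=None):
    """Return {category: entry count} for a category and all its descendants.

    subcats: dict mapping a category to a list of its direct subcategories
    entries: dict mapping a category to the number of entries directly in it
    seen:    guards against visiting a category twice (shared or cyclic subcats)
    """
    seen = set() if seen is None else seen
    if category in seen:
        return {}
    seen.add(category)
    counts = {category: entries.get(category, 0)}
    for child in subcats.get(category, []):
        counts.update(count_tree(child, subcats, entries, seen))
    return counts


# Hypothetical example data
subcats = {"A": ["B", "C"], "B": ["C"]}
entries = {"A": 1, "B": 2, "C": 3}
print(count_tree("A", subcats, entries))
```

The `seen` set matters because a category can sit under several parents, so a naive recursion would count it more than once.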

Wiktionary:Todo/Lists/RFVs and RFDs tagged but not listed

@This, that and the other: This list is picking up the very few entries invoking {{rfv-quote}}, even though all their handling is via the maintenance category system. Example: Pali sithilatta. @Benwing2, as he was involved in a recent revision of the system for challenging quotations. --RichardW57m (talk) 10:33, 18 January 2024 (UTC)

@RichardW57m Good spot. There are two possible ways to solve this:
  • The list's SQL query could be changed so it excludes any pages that transclude {{rfv-quote}}. This could lead to some pages being missed (perhaps they have another RFV template somewhere else on the page), but as you say, rfv-quote is rare so it shouldn't be a big deal.
  • Or we could remove "Category:Requests for verification in LANG entries" from {{rfv-quote}}. To me, that category doesn't belong on pages that are not formally listed at RFV. Compare {{rfv-term}}. But Benwing2 might indeed have opinions about that.
This, that and the other (talk) 11:45, 18 January 2024 (UTC)
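To illustrate the first option only, here is a self-contained Python sketch using an in-memory SQLite database. The two tables are a deliberately simplified, hypothetical stand-in for the real schema, the second page title is invented, and the project's actual query is not reproduced here; the sketch shows just the "exclude pages that transclude a given template" step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category_members (page TEXT, category TEXT);  -- hypothetical
CREATE TABLE transclusions (page TEXT, template TEXT);      -- hypothetical
INSERT INTO category_members VALUES
  ('sithilatta',  'Requests for verification in Pali entries'),
  ('exampleword', 'Requests for verification in English entries');
INSERT INTO transclusions VALUES
  ('sithilatta',  'rfv-quote'),
  ('exampleword', 'rfv');
""")


def tagged_pages(conn, excluded_template):
    """Pages in an RFV category that do not transclude the excluded template."""
    rows = conn.execute(
        """
        SELECT DISTINCT cm.page
        FROM category_members AS cm
        WHERE cm.category LIKE 'Requests for verification in % entries'
          AND NOT EXISTS (
              SELECT 1 FROM transclusions AS t
              WHERE t.page = cm.page AND t.template = ?
          )
        ORDER BY cm.page
        """,
        (excluded_template,),
    ).fetchall()
    return [row[0] for row in rows]


print(tagged_pages(conn, "rfv-quote"))
```

As noted in the first bullet, a page that uses both {{rfv-quote}} and another RFV template would be dropped by this filter, which is the trade-off being weighed.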
@This, that and the other: While I prefer the second approach - but @Benwing2 disagreed because {{rfv-quote}} is so rare - wouldn't a third approach for the rfv-related list be to only include pages that invoke {{rfv}} or {{rfv-sense}} directly or indirectly? --RichardW57m (talk) 12:11, 18 January 2024 (UTC)
@RichardW57m There's a tradeoff here between perfectly correct output and future-proofing. I tend to lean towards the latter: If {{rfv}} and {{rfv-sense}} are hard-coded as the only two RFV templates, a new {{rfv-***}} template that gets created one day wouldn't be included in the list. In fact, we already have {{rfv-t}}, another RFV template where you are meant to list the entries at the central RFV page. And there may be others we're both unaware of! This, that and the other (talk) 23:00, 18 January 2024 (UTC)
I think anything that categorizes into "Category:Requests for verification in LANG entries" should be listed at RFV, and therefore if something is intended to be listed in some other forum, it should not be in that category. For example, {{rfv-etym}} does not categorize into the "Requests for verification" category, as such requests are handled in the WT:ES. So IMO we should either (1) handle {{rfv-quote}}s by listing them at RFV (as was done recently for vagitate), or (2) have that template add a different category, if we're intending to handle them via some other forum. Either of those seems fine to me. - -sche (discuss) 18:43, 18 January 2024 (UTC)
@Benwing2 I'd like to hear from you before taking any action here. This, that and the other (talk) 23:18, 22 January 2024 (UTC)
@This, that and the other: It's not completely clear to me what the issue is (it might help if you posted the SQL query) and what you're asking, but I agree with User:-sche's option (1) here. There are only five mainspace pages that use {{rfv-quote}} so I don't think it would be helpful to have a dedicated category for it. In general, rare RFV variants such as {{rfv-quote}}, {{rfv-t}} and {{rfv-term}} don't seem especially useful to me. Benwing2 (talk) 06:13, 23 January 2024 (UTC)
@User:Benwing2 {{rfv-t}} is useful when a single sense, especially a minor one, of a polysemic term is under challenge. {{rfv-term}} needs to be corrected or deleted: It sends items to RfD, not RfV. It is intended to lead to removal of spurious redlinks from derived and related terms, but is more often discussed than used. DCDuring (talk) 15:55, 23 January 2024 (UTC)
@Benwing2 the issue is tangential to the SQL query; Richard and -sche convinced me that the problem should be fixed on-wiki. Let's follow -sche's (1) then. Thanks! This, that and the other (talk) 08:35, 23 January 2024 (UTC)
@-sche @RichardW57m the possibility now exists to list instances of {{rfv-quote}} at the relevant RFV request page. This, that and the other (talk) 08:42, 23 January 2024 (UTC)
@Benwing2: {{rfv-term}} is another reason for the query Red_and_Black-Link_Disverifications, as failures to verify don't get recorded unless the entry is spelt the same as another. If we can create dodgy entries RfV-ed on creation, then we don't need {{rfv-term}}. --RichardW57m (talk) 10:37, 23 January 2024 (UTC)

Trying to edit something but I am told my action was identified as harmful.

If you go to the lemma for "Fräulein" the Usage Notes read: "Fräulein as a formal address for an unmarried woman is now uncommon and considered disrespectful and sexist by woke people." Removing the "woke people" part results in what I am talking about in the title. 2A02:587:5F88:9300:6094:638:313D:994F 13:46, 19 January 2024 (UTC)

I’m not sure why that happened, but I’ve reverted the edit which introduced those words, which was made by an anonymous editor. — Sgconlaw (talk) 15:07, 19 January 2024 (UTC)
Pinging @Chuck Entz who set up the filter in question. The filter seems to disallow more than it permits for the users affected; I wonder if it needs tweaking. This, that and the other (talk) 06:28, 20 January 2024 (UTC)
@This, that and the other: this is an awkward situation involving a very persistent and rather dense Greek IP editor who has used a wide range of IP addresses. They have studied philosophy and the part of physics that's related to cosmology, so they're convinced that they know everything in those areas better than the poor, misguided speakers of the English language. As such, they've invented new words and have tried to redefine a number of existing ones to conform to their theories. Because they use their own made-up definitions for the words in their definitions and talk-page messages, they tend to border on gibberish (see diff for an example). Some of the entries they've abused include aracial (and derived terms), atheism, metaphysical / antimetaphysical and variations, personocracy, precosmic, etc. (there's much more than that, but this is as much as I can find quickly). Normally the solution would be a range-block until they give up and go away, but they've used IP addresses in most of the ranges available to Greece, and they show no signs of going away (They've been around since 2015 and there's at least one edit in the filter log from November, 2023 that looks like them). The filter was an attempt at allowing as much access as possible to Greek content for Greek editors while still keeping this person out of the areas where they were adding their own stuff. I'm not that great with regexes and I didn't want to add too much overhead to the abuse filters by using separate conditions, so I'm sure it could be tightened up quite a bit. Unfortunately, there doesn't seem to be an easy way to allow edits like the ones in this case. Chuck Entz (talk) 05:34, 21 January 2024 (UTC)
@Chuck Entz thanks for the explanation. It looks like there isn't anything we can do in that case. I'll link this post from the "notes" page of the filter in question for future reference. This, that and the other (talk) 06:36, 21 January 2024 (UTC)

QQ stops working at a certain time of day

Citing stuff the last few days, I notice that at a certain time of day (now-ish; I could work out exactly when it is each day from my contribs), Quiet Quentin stops displaying results for me, returning "no more results" no matter what I search for—even phrases that get plenty of results when I go to books.google.com.
My guess is (less likely) there's a time-dependent bug whereby QQ stops working for a while (since after several hours it starts working again), or (more likely) Google stops letting QQ make searches at a certain time or after a certain number of searches.
If anyone can search for quotes of common phrases with QQ now, we could rule out "QQ stops working at a certain UTC time", and I guess someone could pull a list of the N-thousand most common words from somewhere and write a script to make QQ search for each one to test the theory that Google stops letting QQ make searches after a certain number.
If anyone can find a way to let QQ 'forward' any "we think you're a robot, please identify cars to prove you're not" tests to the user, that'd be cool. BTW it'd also be great if we could make QQ forward any of Google's "although you searched for X, we changed your search to Y; if you meant X, click here" messages so I could still use QQ to find and format cites of words that Google won't allow QQ to search for. - -sche (discuss) 03:53, 20 January 2024 (UTC)

I have experienced that kind of problem using Google Books, getting sometimes no results for searches that yield plenty at other times. Does QQ use Google Books? I could pay closer attention to the timing, but I haven't been using bgc regularly lately. DCDuring (talk) 15:08, 20 January 2024 (UTC)
Nevermind. My problem seems to occur at all times, probably Google-caused, though. DCDuring (talk) 15:54, 21 January 2024 (UTC)
FWIW, I'm experiencing this again (starting ~2 hours ago); anything I search for (in QQ) turns up 'no more results'. I'm not sure if it's time-of-day related or if Google stops letting QQ make searches after a certain number of searches (which, of course, probably wouldn't be something Wiktionary could solve). - -sche (discuss) 02:08, 2 August 2024 (UTC)
Probably it has something to do with QQ not knowing how to recognize pictures of fire hydrants or traffic signals... Chuck Entz (talk) 05:13, 2 August 2024 (UTC)
It probably means that, from Google's PoV, this is a feature, not a bug. They really want to know what individual humans search for so that they can charge for that information one way or another. Anything that even superficially looks like a machine would be throttled down and even blocked. I suppose we should be grateful that they still let us search old books and newspapers using search terms that don't have too much to do with merchantable items and candidates. DCDuring (talk) 12:52, 2 August 2024 (UTC)
can confirm it's down again as of the last couple hours, i don't believe it's timeout/rate limited related though as it was down for me yesterday right at the start of my editing session Akaibu (talk) 22:10, 3 August 2024 (UTC)

Changing Old Galician-Portuguese's code + adding it to Module:languages/data/2

(Paging @Stríðsdrengur, Amanyn, Nicodene for having contributed to the language recently.)

Some time ago, we decided to change Old Portuguese's name to "Old Galician-Portuguese". One of the reasons was so it'd be a 'neutral' term and wouldn't exclude Galicians. I 100% agree with that. However, we didn't change the code. Its code, roa-opt, still means "Romance-Old Portuguese", which still excludes speakers from outside Portugal. Furthermore, the "roa-" prefix makes the language inconsistent with similar languages like Old Spanish (osp) and Old French (fro). With the recent discussions and reformulation projects on the language, I think it's time for it to have a new, more suitable code (yes, again — I saw in WT:AROA-OPT that it was changed before once, oof). I have two suggestions:

  • Portuguese Wiktionary uses the gpm code. I believe it stands for "Galego-Português Medieval". I'm not sure why different wikis are allowed to use different ISO codes for things, but I like it and it'd work just great for solving the aforementioned issues. I think it'd be similar to Old French's code where it seems to be based on the original language?
  • Alternatively, me and other editors of the language have been referring to the language as "OGP" for a while now... for self-evident reasons. ogp could be a great code for it.

That's what I had to say for the first half of the section title. Now for the second one.

@Thalyson2019 and @Froaringus have been putting diacritical marks on different verb conjugations in the verb tables only in order to facilitate reading for modern-day speakers of Portuguese and Galician, as there have been a few changes to vowels ever since the 1200s. These marks are modern inventions added completely for didactic reasons and, since most forms don't yet have their own articles, this is (for now) the only way to convey the proper pronunciation of the terms as indicated by scientific papers and comparative evidence.

Since the accentuated forms didn't actually exist, I'd like them to receive the Latin macron treatment in Module:languages/data/2 where any links to OGP with accents in them get automatically converted to links into accentless pages. This is for diaereses, grave, acute and circumflex accents only (especially the latter two), as tildes did exist back then and were actually way more abundant than they are in the modern day. MedK1 (talk) 04:54, 20 January 2024 (UTC)
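The link handling requested above could be sketched roughly as follows. This is only a Python illustration of the idea, not the actual Lua configuration in Module:languages/data/2; the function name `ogp_entry_name` and the sample strings are hypothetical. It strips grave, acute, circumflex and diaeresis marks while leaving the tilde (and anything else) untouched.

```python
import unicodedata

# Combining marks to strip: grave, acute, circumflex, diaeresis.
# The combining tilde (U+0303) is deliberately absent, since tildes
# did occur in the original texts.
STRIPPED_MARKS = {"\u0300", "\u0301", "\u0302", "\u0308"}


def ogp_entry_name(display_form: str) -> str:
    """Hypothetical helper: map an accented display form to a page name."""
    # Decompose so that each accent becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", display_form)
    kept = "".join(ch for ch in decomposed if ch not in STRIPPED_MARKS)
    # Recompose so that retained marks (e.g. the tilde) form single characters again.
    return unicodedata.normalize("NFC", kept)
```

With this sketch, a link written with an acute or circumflex would point at the accentless page, while a form containing only a tilde would be left as it is.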

@MedK1 The problem with changing the language code to a three-letter code is that we generally prohibit inventing our own three-letter codes. We generally only allow such uses when there's an existing ISO 639 code for the language, which there isn't in the case of Old Galician-Portuguese. That's why this language has a code prefixed by roa-. Also there are many codes of various sorts that reflect former names for languages, and we generally don't change them for this reason. As for changing the entry-name handling to automatically strip certain diacritics, I'm not opposed to that but we need to make sure that these diacritics never occurred in the original texts. Benwing2 (talk) 05:54, 20 January 2024 (UTC)
@Benwing2, @MedK1 Since we can't invent our own three-letter codes, then maybe changing it to "roa-ogp"? I thought this was the one going to be proposed here, as it makes more sense, since we have to keep the "roa" part. Or, as Benwing said, keeping the current one. Have a good day everyone. Amanyn (talk) 12:21, 20 January 2024 (UTC)
@Benwing2: I understand that making up prefixes isn't exactly what's standard, but isn't Old Spanish right there with a made-up three-letter prefix? Although it's listed here, the page links to loc.gov in case of any discrepancies, calling that the "definitive source". And indeed, Old Spanish is nowhere to be found there. How come O. Spanish gets its own made-up prefix and O. Galician-Portuguese can't? Shouldn't it be "roa-osp" or something?
It'd be great if we could do away with the roa- prefix; with how the extra four characters more than double the length of the original code, it ends up being quite the clunky little thing, especially since it needs to be typed multiple times per page...
"we need to make sure that these diacritics never occurred in the original texts" None of the OGP editors have seen any diacritics like that afaik. It is known that ^ is a somewhat modern invention too, so at least that should be out of question. Technically nothing is impossible, but I really doubt any such diacritics could've been a thing... Even the tilde only existed because it was shorthand for an "n", it most definitely wasn't an actual accent mark per se; this is reflected by how it's considered wrong to call it an "acento" in Portuguese. MedK1 (talk) 21:56, 20 January 2024 (UTC)
osp is an ISO code. It's not something we "made up"; you can tell because you linked to where it's in the ISO code list. gpm and ogp are not valid ISO codes at present, but the ISO could assign them to languages at any time. We can't (or at least, shouldn't and won't) assign our own three-letter code to Old Galician-Portuguese ourselves, because (among other reasons) if the ISO does assign that code to some language (they do tend to add several codes each year, almost exclusively for living languages) it creates a problem for us if we're using it for a different language. So yes, whatever code we use needs to be of the form nearest ISO family code + hyphen + three letters for the language... or you could request that they assign ogp (or gpm, or whatever) as an actual ISO code for Old Galician-Portuguese. Although they've made some statements about only adding codes for living languages, I recall an extinct language nonetheless getting approved within the last few years. - -sche (discuss) 23:25, 20 January 2024 (UTC)
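The shape described above (family code + hyphen + three letters) can be checked mechanically. The following is a minimal Python sketch under that assumption only; the function name `is_exceptional_code` is hypothetical, and it does not check that the prefix is a real ISO family code.

```python
import re

# Assumed shape: three-letter family code, hyphen, three letters (e.g. "roa-opt").
EXCEPTIONAL_CODE = re.compile(r"[a-z]{3}-[a-z]{3}")


def is_exceptional_code(code: str) -> bool:
    """Return True if the code has the family-prefixed shape, not a bare ISO-like one."""
    return EXCEPTIONAL_CODE.fullmatch(code) is not None
```

Under this check, roa-opt and the proposed roa-ogp pass, while bare three-letter codes such as ogp or gpm do not.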
@-sche, Benwing2 Messages to the ISO are being sent so if everything goes well, the code should be makable soon. In the meantime, can we get the accent link thing done? I'm positive they didn't exist back then. 2804:1B0:1900:42DB:9082:B744:78D9:E9CA 01:02, 30 January 2024 (UTC)
@MedK1 I am guessing this is you. The issue with accents in links should be fixed. Benwing2 (talk) 01:38, 30 January 2024 (UTC)

Help with a quotation template

Hi, everybody
I've recently made some edits to the quotation template {{RQ:it:Eneide}}, and now, for reasons I'm still not getting, putting other quotation templates below it creates a new list item. To illustrate, since I'm not sure I put it into words correctly, this is what it looks like in the entry stella:

  1. star
    • mid 1560s, “Libro quinto”, in Annibale Caro, transl., Eneide, translation of Aeneis by Publius Vergilius Maro (in Classical Latin), lines 746–748; republished as L’Eneide di Virgilio, Florence: G. Barbera, 1892:
      Tal sovente dal ciel divelta cade
      Notturna stella, e trascorrendo lascia
      Dopo sè lungo e luminoso il crine.
      Thus a night star, ripped from the sky, falls, and passes leaving after itself a long, shiny tail.
    • 1810, “Libro XIX”, in Vincenzo Monti, transl., Iliade, translation of Ῑ̓λιάς (Īliás, Iliad) by Homer (in Epic Greek), lines 380–382; republished as Iliade di Omero, 4th edition, Milan: Società tipografica dei classici italiani, 1825:
      Stella parea
      Su la fronte il grand’elmo irto d’equine
      Chiome,
      ἡ δ’ ἀστὴρ ὣς ἀπέλαμπεν
      ἵππουρις τρυφάλεια
      hē d’ astḕr hṑs apélampen
      híppouris trupháleia
      The great helmet, fitted with horsehair, looked like a star on the forehead,

Can anyone offer any suggestion as to why this number pops up?
Also, if I put one template directly after the other, like this:
#* {{RQ:it:Eneide | {{...}} }}#* {{RQ:it:Iliade | {{...}} }}
the problem does not arise.
Thank you in advance for your time. —— GianWiki (talk) 09:11, 20 January 2024 (UTC)

@GianWiki: I have removed the newline at the end. J3133 (talk) 15:25, 20 January 2024 (UTC)
Thank you very much. I don't know why I couldn't figure it out. —— GianWiki (talk) 16:23, 20 January 2024 (UTC)

Not having links to the component terms of those English compounds spelled with hyphens (or dashes?) wastes time for curious users interested in relevant definitions of individual words in compound terms. Sometimes it causes the creation of vacuous etymology sections, whose only legitimate justification is to provide those missing links. I don't see why it should be necessary for contributors to have to add a 'head' parameter. Contributors often don't add them, possibly expecting the same linking behavior as for English terms with spaces.

I don't know what other languages this might apply to, but English Wiktionary needs to at least handle English terms well. DCDuring (talk) 17:32, 20 January 2024 (UTC)

@DCDuring: I'm not sure what you're talking about. There is only one inflection template for English, and Special:WhatLinksHere/Template:en-conj doesn't include any hyphenated terms. Do you mean the inflections shown by the headword templates? Even there, all they display is the inflected forms, which are links to whole inflection-of entries.
The only thing I can think of that you might be referring to is the headword display, which links the individual space-separated words by default. If that's what you're referring to, I can see how it might be tricky in cases where affixes and freestanding words overlap in spelling- after all, there's a reason why {{affix}} requires affixes to be marked with hyphens. This would be a problem mostly in alternative-form entries, since we generally have the full entry at the hyphenless form. Chuck Entz (talk) 00:27, 21 January 2024 (UTC)
@Chuck Entz I *think* User:DCDuring is indeed referring to linking the components of hyphenated English terms in the headword display. I have implemented this for several Romance languages, where if there's a hyphen but no space, the hyphenated components get individually linked. If there are both spaces and hyphens, the default behavior is to only link the space-separated components, but this can be changed using |splithyph=1. Conversely, |nolinkhead=1 disables any linking of components, even in space-separated terms (e.g. for cases like English je ne sais quoi). Benwing2 (talk) 02:47, 21 January 2024 (UTC)
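The default behaviour described above can be sketched roughly as follows. This is a Python illustration only, not the actual Lua headword code; the function name `link_head` is hypothetical, while `splithyph` and `nolinkhead` mirror the parameters mentioned in the post. It ignores affix handling entirely.

```python
def link_head(term: str, splithyph: bool = False, nolinkhead: bool = False) -> str:
    """Hypothetical sketch: link the components of a multiword headword."""
    # nolinkhead disables all linking; a single unhyphenated word needs none.
    if nolinkhead or (" " not in term and "-" not in term):
        return term

    # Hyphens are split by default only when the term contains no spaces.
    split_hyphens = splithyph or " " not in term

    def link_word(word: str) -> str:
        if not word:
            return word
        if split_hyphens and "-" in word:
            # Link each hyphen-separated part, keeping the hyphens as plain text.
            return "-".join(f"[[{part}]]" if part else "" for part in word.split("-"))
        return f"[[{word}]]"

    return " ".join(link_word(word) for word in term.split(" "))
```

For example, `link_head("great-tailed grackle")` links only the two space-separated components, whereas passing `splithyph=True` also splits `great-tailed` at the hyphen.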
Sorry for the confusion my error caused. Yes, I meant the headword templates.
Am I missing something in my rationale for this? I am glad there are ways to manage the exceptions. DCDuring (talk) 03:22, 21 January 2024 (UTC)
@DCDuring: Just FYI, I am working on this. Benwing2 (talk) 01:57, 25 January 2024 (UTC)
@Benwing2: attitude-y, grand-daughter, and great-grandson should link, respectively, to -y, grand-, and great- instead of y, grand, and great. J3133 (talk) 08:18, 9 February 2024 (UTC)
@J3133 I have purposely avoided treating things as prefixes if they also exist as regular English words, like great and grand do. For example, we have plenty of great-aunt and great-uncle terms, but we also have great-go, great-hearted, great-heartedness, great-pox, great-tailed grackle and great-tit that are correctly linked with great not great-. Similarly we have grand-duc, grand-ducal, grand-duke, grand-guard and grand-quarterly that are correctly linked to grand not grand-. For similar reasons I don't include counter- among the prefixes. I think it should be possible to use |head=~great-:~ and |head=~grand-:~ to link the prefix together with the following hyphen, using the link modification syntax; if not I will make it work. The case with -y is different and it's reasonable to link it as a suffix by default; I don't have any support for suffixes like this but I can add it. Benwing2 (talk) 10:37, 9 February 2024 (UTC)
@Benwing2: Others with this problem are e- (e-book and e-mail link to e), -esque (1984-esque and Dubai-esque link to esque), and -like (dungeon-like and roguelike-like link to like). J3133 (talk) 07:27, 11 February 2024 (UTC)
Why do we even have an English L2 for great- when we have the appropriate definition at great? DCDuring (talk) 14:00, 9 February 2024 (UTC)
@DCDuring: Collins and the Cambridge Dictionary also have great- (prefix). J3133 (talk) 14:10, 9 February 2024 (UTC)
But not MWOnline, AHD, WNW, Oxford, Cambridge American. A redirect would serve our users well. DCDuring (talk) 15:31, 9 February 2024 (UTC)
@DCDuring That would also be inconsistent with how we handle most terms, which wouldn't be helpful to readers. Theknightwho (talk) 15:39, 9 February 2024 (UTC)
I'd be happy to see the facts about that. DCDuring (talk) 16:21, 9 February 2024 (UTC)
I would like to say that (1) I have no opinion on this issue and I am fine with any of the end results that I'm seeing above and (2) I am currently cleaning up the hyphenated geographical terms that I have worked on where I had initially not included any "head" information, but now, apparently due to recent site-wide changes, am required to do so in order to avoid linking the individually meaningless syllables of the hyphenated words. Here is an example edit of what I'm doing: diff. This is no problem for me, and I will integrate this "head" information into future entries. --Geographyinitiative (talk) 16:33, 9 February 2024 (UTC)

Ryukyuan transliteration and lemmatization

I am uncomfortable with using Kanji + kana to transliterate Ryukyuan. As far as I know, this system is only used by JLect, which is run by someone who doesn't equate cleanly with the field of Japonic linguistics, foreign Yamato-n-chu (ヤマトゥンチュ) and Wiktionary (highly contagious!).

I would recommend Hirayama Teruo (1992)'s 現代日本語方言大辞典 (if my mother can buy the 8 vols that cost over a hundred and something + shipping; I'm in the US; the ILL is taking forever). Sadly, Aramaki Morozov's account (not the guy himself!) got killed by the WTIUM (Wiktionary Inactive User Maker), and everything is almost literally a dog's breakfast because of this.

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Mcph2): what can we do? Chuterix (talk) 23:28, 21 January 2024 (UTC)

This seems more like a WT:BP subject?
Setting that aside, inasmuch as Ryukyuan languages are poorly documented, and that there does not appear to be any standardized orthography, my general impression is that we should use the Latin alphabet, with possible additions from IPA where necessary. In such Latin-alphabet lemma entries, we should catalog any known in-the-wild spellings. ‑‑ Eiríkr Útlendi │Tala við mig 19:51, 24 January 2024 (UTC)
Yeah, I didn't know where I should post that too.
Ainu is also a katakana language, in addition to Latin. Perhaps a mix of such for all Ryukyuan languages? Chuterix (talk) 20:07, 24 January 2024 (UTC)
We should use the formulation that natives use even if it's not fully standardized. I see Katakana as a whole more common than the use of the Latin script (outside of texts made for English speakers), so if we change from the Kanji system we have right now, imho it shouldn't be the Latin script. It feels a bit weird to do so. AG202 (talk) 16:17, 31 January 2024 (UTC)
My impression is that these languages are largely undocumented by native speakers, with the bulk of written works being composed by others writing about the languages. If I'm wrong and we do have literature in these languages, then great! Yes, we should use whatever scripts and spellings we see there.
But if there is no literature written in these languages, or even just super sparse literature, and if academics writing about the languages cannot agree on a katakana orthography (which is what I've seen so far, with each author seemingly coming up with their own transcription scheme), should we not use the Latin alphabet? We are, after all, the English Wiktionary, targeting an English-language readership. And, many of these languages have coda consonants, which (aside from nasal /ɴ/) katakana was never intended to express. The katakana character set had to be extended (somewhat clumsily, IMO) to express the set of coda consonants in Ainu, and support for that is very spotty as it is. ‑‑ Eiríkr Útlendi │Tala við mig 18:18, 2 February 2024 (UTC)
"We are, after all, the English Wiktionary, targeting an English-language readership". This should absolutely never be the rationale for using romanization for lemmatization. At that rate we might as well be putting everything in romanization for an English audience. If academics can't agree on a katakana orthography, then we could easily find the one that's most common and lemmatize based on that. Other languages (even written in the Latin script) do the same. Also, orthographies are never perfect nor do they always express everything, but that is not a reason to shift to another script that's not as common nor used by natives. I'm very concerned that this option keeps being brought up more and more.
Taking the example of Okinawan, according to the Wikipedia page for it, there have been published works from centuries ago till now that are written in Okinawan. If I can take the time to find tons of written works for Jeju, an even less-documented language that wasn't written until the past few decades, then I'm sure that folks can actually take the time to find works written in Ryukyuan languages. AG202 (talk) 18:53, 2 February 2024 (UTC)
My point there was more that, if there is no standardized katakana orthography, we have nothing at all to go on. We should not be in the business of inventing a katakana orthography. However, we do have the Latin alphabet, and the (mostly-Latin-alphabet-derived) IPA, so if we are to create entries for these languages, the alphabet + IPA makes more sense to me (as a provisional measure, at any rate, until such time as there is a standard used by actual native speakers) than trying to invent a standardized katakana approach. FWIW, I have encountered a couple papers written in Japanese that use the alphabet to transcribe Ryukyuan words.
Okinawan is less what I was thinking about, as yes, we have literature in Okinawan. Meanwhile, I'm not aware of any corpus for other Ryukyuan languages like Miyako. ‑‑ Eiríkr Útlendi │Tala við mig 02:19, 3 February 2024 (UTC)
We could easily try and find the most common spelling and link the others as alternative forms. We just have to put in the effort to do so. The Ryukyuan languages are far from the only languages that don't have standard prescribed orthographies, but I'm not going to go argue that they should all be in IPA or something for example. If it does end up that Ryukyuan speakers use the Latin script the most, then I won't object to its usage, but until we find out what is used the most, we should maintain the status quo. AG202 (talk) 02:30, 3 February 2024 (UTC)
@Eirikr Yeah, I completely agree with @AG202 that using English orthography by default is the wrong approach. There is a lot of value in using an orthography that's congruent with other languages in the same family where possible. Theknightwho (talk) 00:00, 3 February 2024 (UTC)
See above -- my impression is that there just isn't any standard orthography. Which actually raises the deeper question, should we be creating any such entries in the first place?
Separately, I cannot agree with the idea that, because Japanese and, say, Miyako, are in the same family, that we should therefore use katakana to record Miyako. Katakana simply cannot express the range of phonemes that actually appear in languages outside of Japanese itself. Japanese authors using katakana to transcribe Miyako do so by means of inventing all kinds of difficult-to-reproduce diacritics and character variations, sometimes handwritten and not reproducible as computerized text. This is partly why there seems to be no standardized katakana orthography -- the script itself doesn't express the range of sounds. ‑‑ Eiríkr Útlendi │Tala við mig 02:25, 3 February 2024 (UTC)
In this case I agree with User:Eirikr that "use an orthography that's congruent with other languages in the same family where possible" is not a good rationale. If there is no standard native orthography, using the most common scholarly transcription (which is likely to use the Latin alphabet) is far better than doing something ad-hoc in Katakana. Benwing2 (talk) 03:08, 3 February 2024 (UTC)

Adding article for 

I don't really know what else to say really. Funmoring46 (talk) 00:06, 22 January 2024 (UTC)

@Funmoring46: First of all, we don't have entries for most brand names or logos- see WT:BRAND. More importantly, this is a Private Use Area character. That means it can be completely different things on different systems- there's no standard. For me, it displays an Apple logo because I'm viewing this on an Apple computer. For someone using an Android phone or a Windows or Linux PC, it may show something else entirely, or just tofu. Chuck Entz (talk) 00:49, 22 January 2024 (UTC)
On my Linux computer, it is a grey box. —Justin (koavf)TCM 01:44, 22 January 2024 (UTC)
@Chuck Entz @Koavf On Apple devices U+F8FF displays as the Apple logo, which is presumably what motivated the request. This is a great example of precisely why the filter against non-characters exists, because otherwise users will inadvertently create entries that are nonsensical to others. Theknightwho (talk) 02:23, 22 January 2024 (UTC)
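Whether a title contains a Private Use Area character can be tested from its Unicode general category. A minimal Python sketch follows; the function names are hypothetical, and this is not the wiki's actual filter, only an illustration of the check.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True for Private Use code points such as U+F8FF (general category "Co")."""
    return unicodedata.category(ch) == "Co"


def has_private_use(title: str) -> bool:
    """True if any character of the title is a Private Use code point."""
    return any(is_private_use(ch) for ch in title)
```

A title consisting of U+F8FF alone would be flagged by `has_private_use`, whichever glyph a given system happens to draw for it.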

Tamil consonants with superscript Arabic numerals

Over at Module talk:sa-convert#Tamil we are trying to sort out how to support such Tamil script 'nuktaed' letters as ப², ப³, and ப⁴, which are primarily, but not exclusively, used for Sanskrit and Saurashtra. I propose the following protocols that could conceivably impinge on the use of English Wiktionary for terms in other languages.

  1. Pages whose names differ only by the permutation of vowels, non-spacing marks, superscript digits and subscript digits immediately following a Tamil letter may be made hard redirects to one another.
  2. Such permutations may be made to the headwords of such page names to obtain a least objectionable rendering.

For example ப³ீ may be the ideal encoding for 'bī' (Unicode document L2/10-440 says it is), but பீ³ generally renders better, and is identical to the ideal encoding in appearance.

We might decide to further limit ourselves on what we do.

If a super- or sub-script number in such a permutable sequence is to be interpreted as a normal superscript or subscript such as could be found in the Latin script, it should be separated from the Tamil letter by Zero Width Non-Joiner (ZWNJ, U+200C). (Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Inqilābī, Getsnoopy, Rishabhbhat, Dragonoid76, RichardW57): also @Exarchus, Sbb1413. I think we may ask @Theknightwho to implement the choice of least objectionable rendering in links, as a type of normalisation.
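One way to decide whether two page names differ only by such a permutation is to compare a canonical key. The sketch below is Python for illustration only; the function names are hypothetical, and treating every Mn/Mc character as permutable is an assumption that may be broader than intended. The run of marks and script digits after each Tamil letter is sorted, and a ZWNJ ends the run, so a digit separated by ZWNJ is left alone.

```python
import unicodedata

# Superscript digits (U+00B2, U+00B3, U+00B9, U+2070, U+2074..U+2079)
# and subscript digits (U+2080..U+2089).
SCRIPT_DIGITS = (
    set("\u00b2\u00b3\u00b9\u2070")
    | {chr(cp) for cp in range(0x2074, 0x207A)}
    | {chr(cp) for cp in range(0x2080, 0x208A)}
)


def is_tamil_letter(ch: str) -> bool:
    return "\u0b80" <= ch <= "\u0bff" and unicodedata.category(ch) == "Lo"


def is_permutable(ch: str) -> bool:
    # Assumption: vowel signs and other marks are exactly the Mn/Mc characters.
    return ch in SCRIPT_DIGITS or unicodedata.category(ch) in ("Mn", "Mc")


def comparison_key(title: str) -> str:
    """Hypothetical key: equal for titles differing only by a permitted permutation."""
    out = []
    i = 0
    while i < len(title):
        ch = title[i]
        out.append(ch)
        i += 1
        if is_tamil_letter(ch):
            j = i
            while j < len(title) and is_permutable(title[j]):
                j += 1
            # Digits first, then marks, each group in code point order.
            out.extend(sorted(title[i:j], key=lambda c: (c not in SCRIPT_DIGITS, c)))
            i = j
    return "".join(out)
```

With this key, ப³ீ and பீ³ compare equal, while a form with ZWNJ before the digit keeps its own key. The key is only for comparison; choosing the least objectionable rendering order is a separate decision.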

Does anyone object to these protocols? --RichardW57m (talk) 15:31, 23 January 2024 (UTC)

Interesting failure mode in transliteration

I don't know if we need to do anything about this anomalous behaviour. The script determination for transliteration is not looking for the language's script by determining the language's script that best matches the text, but is only looking for the best match amongst scripts for which the language has a transliteration module.

This is occurring when I test that transliteration of Sanskrit from Devanagari to Soyombo and then from Soyombo to Roman script yields the same result as from Devanagari to Roman, as a fallback for not having the required Soyombo form in the test data. (The test results are at Module:sa-convert/testcases/Soyombo. There can in general be reasons why the transliteration doesn't commute.) However, the test text contains ḷa, which doesn't occur in classical Sanskrit, and so that letter isn't transliterated to Soyombo. Now, we don't have a transliteration module to convert Soyombo to Roman, so the module to transliterate Devanagari to Latin is invoked instead, and therefore transliterates that letter, leaving the rest in Soyombo! (In the other 20 odd test cases, all characters are transliterated, so transliteration from Soyombo script simply fails, and the test case is discarded.)--RichardW57m (talk) 17:45, 23 January 2024 (UTC) Misdiagnosed. The problem is that Soyombo is not registered as one of Sanskrit's scripts. --RichardW57m (talk) 11:16, 24 January 2024 (UTC)

Lua-fication of simple reference templates -- rationale?

I am curious about changes like this to Template:RQ:Kojiki. The older version was plain-old wikicode, pretty uncomplicated at that.

The new version invokes a module, and seems much less straightforward, doing some (to me, anyway) rather arcane and unknowable things with parameters.

→ Considering the problems we have with Lua memory caps, isn't this exactly the kind of change we shouldn't be doing?

→ What is the rationale for converting simple wikicode templates to use Lua instead? This seems to only increase the potential for technical problems. ‑‑ Eiríkr Útlendi │Tala við mig 19:56, 24 January 2024 (UTC)

@Eirikr So I know nothing about this particular template, but:
  1. Lua memory is no longer the issue it was, since the memory cap was doubled recently (meaning no pages hit the limit anymore). That doesn't mean we should go crazy, but Module:quote is generally used on large pages many times anyway, and repeated uses of the same module have diminishing costs.
  2. It looks like the most obvious advantage to using Lua with this particular template is that you can actually give the quotation using the template. The old version only gave a citation of where the quote could be found.
Theknightwho (talk) 20:08, 24 January 2024 (UTC)
Hmm, interesting. If memory is not an issue, that happily resolves one of my bigger concerns.
I notice that the template now outputs a colon, regardless of whether there's a quote actually supplied, which seems like incorrect behavior?
I see a similar conversion over at Template:RQ:Man'yōshū, but with no documentation to explain how the quoting works. @JeffDoozan, it looks like you're doing the Lua-fication. Could you add the appropriate documentation? And could you tweak it so the template doesn't output a colon if there's nothing to come after it? ‑‑ Eiríkr Útlendi │Tala við mig 20:18, 24 January 2024 (UTC)
@Eirikr: There is |nocolon= to suppress it in general. It looks as though that needs to be added to the list of propagated parameters. In the general case, one might use the citation line from {{quote-*}} with another presentation template for the quotation itself. --RichardW57m (talk) 15:50, 25 January 2024 (UTC)
One important rationale for Lua-fication is that it has the potential to trap use of misspelled or unused parameter names, like pgae or pg in place of page. However, this functionality doesn't seem to be available on {{RQ:Kojiki}} (yet?). This, that and the other (talk) 01:25, 25 January 2024 (UTC)
@This, that and the other: That is also a general problem with templates which themselves invoke other templates. All that can be checked for is missing parameters. Access to the parameters of two levels up is denied for 'security reasons'. --RichardW57m (talk) 09:18, 25 January 2024 (UTC)
@RichardW57m I actually do understand the rationale behind this one: if a module is nested, say, 7 layers deep, allowing access to the arguments passed at each layer in the process makes things very complicated very quickly; especially when a module could be called by multiple different templates, which could all nest it at different levels.
By only allowing access to 2 levels (i.e. the arguments passed to the template, and the arguments the template directly passes to the module), it forces editors to structure things in a manageable way that avoids all of that complexity. It's still possible to do all the same things, though - it just means we need to be sensible in how our templates are structured. Theknightwho (talk) 10:32, 25 January 2024 (UTC)
@Theknightwho: Looking at {{#invoke:quote|call_quote_template}}, I'm not sure it isn't just moving complexity to a more intimidating place. I was contemplating converting calls of {{quote-book}} within templates to {{#invoke:quote|quote_t|type=book}}, but it would cause problems in converting a page number from Western Arabic digits to a presentation of, for example, both the Arabic number and the system in the book itself. (And the latter has some tricky cases, such as front matter with Arabic page numbers and the main body's page numbers in native script.) There is a reason for banning #invoke from main-space pages.
On the other hand, ensuring standardisation of quotations by using {{quote-book}} would have made perfect sense, at the price of losing the checking of parameter names. --RichardW57m (talk) 15:43, 25 January 2024 (UTC)
@RichardW57 Yeah there isn't always a perfect solution. I also understand User:Theknightwho's point about not allowing arbitrary access to parameters farther up the call stack; in fact, in a typical programming language, you don't even have two levels of access, you only have one, and you have to thread all the parameters through a series of function calls if necessary (this is why we often prefer to put the params in an object and pass just the object). I think the gain that comes from having parameter checking is significant; for example, when I started cleaning up Module:quote, there were maybe 2500-3000 invocations of {{quote-*}} with invalid parameters, causing all sorts of broken display, and it took weeks and weeks to clean them all up. I also think if the MediaWiki developers had in mind from the start creating a two-level system with templates and Lua behind it, they could have done a much better job designing the template syntax (using existing things like the C preprocessor or the M4 macro language as inspirations); the current syntax is really awful and painful to work with, but is typical of what happens when complex things evolve organically without an initial design and without restructuring along the way (compare the Perl language, which is now dying for this very reason). Benwing2 (talk) 00:04, 26 January 2024 (UTC)
@Benwing2, Theknightwho:: We could have a general purpose exported Lua function to check that the only parameters passed in are on a comma-joined list, e.g {{#invoke:checker|params|template_name|n_positional|param_names}}. Would that be simple enough, @DCDuring? We might be able to eliminate template_name by examining the frame structure; I would want it for inclusion in the error message. We'd need an escape value for an unlimited number of positional parameters, e.g. -1. Does such a beastie already exist? --RichardW57m (talk) 12:43, 26 January 2024 (UTC)
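For illustration only, the check proposed above can be sketched in Python (not Lua, and not an existing module). The function name `check_params`, its argument order and the error wording are all hypothetical; the only things taken from the proposal are the comma-joined list of names, the positional limit, and -1 as the escape value for an unlimited number of positional parameters.

```python
def check_params(template_name, n_positional, param_names, args):
    """Return error messages for parameters the template does not accept.

    Hypothetical sketch: `param_names` is a comma-joined list of allowed
    named parameters, `n_positional` is the highest positional parameter
    allowed (-1 means unlimited), and `args` maps parameter names to values,
    with positional parameters keyed as "1", "2", ...
    """
    allowed = {name.strip() for name in param_names.split(",") if name.strip()}
    errors = []
    for key in args:
        key = str(key)
        # Keys made only of ASCII digits and >= 1 are treated as positional.
        if key.isascii() and key.isdigit() and int(key) >= 1:
            if n_positional != -1 and int(key) > n_positional:
                errors.append(
                    f"{template_name}: unexpected positional parameter {key}"
                )
        elif key not in allowed:
            errors.append(f"{template_name}: unrecognised parameter '{key}'")
    return errors


# A misspelled name such as pgae is reported; valid names pass silently.
print(check_params("RQ:Kojiki", 1, "page,passage", {"1": "x", "pgae": "12"}))
```

Including the template name in each message matches the wish expressed above to have it in the error text; whether it could instead be read from the frame is left open here.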
It might have been if well-documented, but it seems to be in the process of being vetoed. I think I'm better off if none of the taxonomic reference tables get messed with. DCDuring (talk) 14:45, 26 January 2024 (UTC)
@RichardW57m @Benwing2 I’m not keen on partial Lua-isation, because it becomes really inefficient when a template calls into Lua multiple times. Having something like this would incentivise that design. Theknightwho (talk) 13:21, 26 January 2024 (UTC)
@Theknightwho:: What we now have with {{RQ:Kojiki}} is the template calling a somewhat dedicated function in Module:quote to manipulate and check the parameters, and then invoking the template {{quote-book}}, which then calls another function in the same module. That strikes me as even less efficient. --RichardW57m (talk) 14:05, 26 January 2024 (UTC)
Thanks for the summary, RichardW57m. Fay Freak (talk) 16:22, 26 January 2024 (UTC)
Luafication does have the effect of disempowering simple-minded users. DCDuring (talk) 13:15, 25 January 2024 (UTC)
I suggest a different wording :) -- I don't consider myself "simple-minded", and I'm actually technically oriented in specific areas (creator and maintainer of some rather complex scripts used in localization workflows). The shape of my life simply means I don't have the bandwidth to dive into the details of the Lua language, or more specifically, our Lua infrastructure here at EN Wikt.
While the old template setup had its quirks, at least everything was usually in one place. For our modules, things can often be multiple levels deep (as described above), and add to that the apparent allergy to commenting our code modules -- all of which substantially increases the effort required to figure out quite what is going on. I don't have the time or patience for that these days.
→ To rephrase: "Luafication does have the effect of disempowering those users who are not already familiar with our Lua infrastructure." Which, I suspect, may be the majority of us. ‑‑ Eiríkr Útlendi │Tala við mig 19:56, 26 January 2024 (UTC)
@Eirikr: I removed that statement about not using comments. I completely disagree with it, and I'm sure I'm not the only one. Benwing2 (talk) 00:40, 2 February 2024 (UTC)
There are about 2,500 quotes in the mainspace that use a combination of {{RQ:}} plus {{quote}} and I converted a handful of the {{RQ:}} templates used by these quotes in order to support |passage= (and |translation=, |transliteration=, where applicable). This seemed like a fairly uncontroversial change, since many other {{RQ:}} templates invoke the same method and 200,000+ other quotes throughout mainspace are formatted using Module:quote. I don't want to convert all existing {{RQ:}} templates to Lua, but there are about 10-15 more that, if adapted, would remove most of the split usage of {{RQ:}} + {{quote}}. I'm happy to adopt whatever 'best practice' comes out of this discussion. JeffDoozan (talk) 16:11, 26 January 2024 (UTC)
I have not seen such kinds of reference or quotation templates, of the form {{#invoke:quote|call_quote_template}}, but it’s thrilling and I see nothing unreasonable. My quotation and reference template editing art culminates in {{R:sem-eth:Littmann}}. Fay Freak (talk) 16:22, 26 January 2024 (UTC)

Linking to a gloss

Is it possible when creating a wikilink from a definition description to a definition to link or specify the relevant gloss? For instance, at chemical proteomics, I've wikilinked discipline which would best be linked to or indicated to be for the 2nd gloss of the term. Thoughts on this? Regards --Ceyockey (talk) 01:36, 26 January 2024 (UTC)

Similarly, I've added a wikilink to native where gloss 9 is the best target. --Ceyockey (talk) 01:41, 26 January 2024 (UTC)
@Ceyockey Yes, you can do it using the {{senseid}} system. See the "Examples" section at Template:senseid in particular. Does this help? This, that and the other (talk) 02:13, 26 January 2024 (UTC)

Label 'epicene'

In Tamil entries like உம் (um), 'epicene' is used as label, but doesn't link to the glossary. Should 'epicene' be added to 'Module:labels/data', or should the Tamil labels be changed to 'gender-neutral'? Exarchus (talk) 10:29, 26 January 2024 (UTC)

@Exarchus gender-neutral tends to mean something other than epicene. The problem with epicene is that it itself has two meanings. We have a gender mfbysense which I think displays as masculine or feminine by sense that captures the specific meaning of a word that is either masculine or feminine depending on the gender of the referent. That appears to be what epicene is intended to mean in this context. Benwing2 (talk) 19:45, 26 January 2024 (UTC)
@Benwing2 In Hindi, 'g=mfbysense' is used for example at पठान (paṭhān), so with still having two different declensions. Exarchus (talk) 20:03, 26 January 2024 (UTC)
@Exarchus That's not a counterexample. Most uses of mfbysense in Romance languages are declined only one way regardless of the gender, but some have different plurals (e.g. Italian nouns in -ista). Benwing2 (talk) 20:06, 26 January 2024 (UTC)
@Benwing2 I think 'mfbysense' is used for languages where all nouns are classified by gender, whereas Tamil is more like English in that it just uses natural gender. Apparently 'epicene' is an established category in Tamil grammar (at least on wiktionary, see verb conjugations), so I wouldn't change the term, what I basically wanted was just a link to the glossary. I think linking 'epicene' to the current explanation ("Having a single form for both male and female referents") wouldn't be inaccurate. Exarchus (talk) 21:17, 26 January 2024 (UTC)
@Exarchus It is true that mfbysense is normally used for nouns with inherent gender, but as I said above, epicene has multiple meanings (see Epicenity), so I would prefer to avoid it. I don't think it's necessary in this case to specify epicene at all; we don't do that for English you or Spanish , for example. Benwing2 (talk) 21:27, 26 January 2024 (UTC)
@Benwing2 In the case of உம் (um) you have a point, the label would be more appropriate for அவர் (avar), although the meaning can still be expressed by the translations. Exarchus (talk) 21:46, 26 January 2024 (UTC)
To whom are we addressing the Tamil entries that would bear this label? Normal folk who are would-be learners of Tamil as a 2nd language? Teachers of TSL? Ourselves? Translators? Are they familiar with this "established category in Tamil grammar"? Least common denominator would be would-be learners of Tamil. Links are OK, but it would be better if folks didn't have to risk getting lost in our glossary. I hope that we at least use sense-id on any link. DCDuring (talk) 15:54, 27 January 2024 (UTC)

Tool for adding derived terms

It would be great to have one, like for the Rhymes, to semi-automatically add Derived terms. The idea is to go to the list at Wiktionary:Todo/compounds not linked to from components/2024-01/page 1, and click a link that adds the terms directly to the ====Derived terms==== section, which it can add if necessary. Please make it happen P. Sovjunk (talk) 18:46, 27 January 2024 (UTC)

That would be useful, especially if it could also properly alphabetize the terms (for example, like {{rhyme list begin}}). — Sgconlaw (talk) 18:49, 27 January 2024 (UTC)
Absolutely. Most of the Derived terms I've edited are an unalphabetized mess in edit mode, but to the casual reader they appear in order (which is, of course, what matters) P. Sovjunk (talk) 19:05, 27 January 2024 (UTC)
Yes, the automatic alphabetizing feature of {{col}} is great, but it doesn’t make it obvious that a particular term has been inadvertently repeated. — Sgconlaw (talk) 19:27, 27 January 2024 (UTC)
Meh, it should be a simple task to find any repeated terms. P. Sovjunk (talk) 19:30, 27 January 2024 (UTC)
It should be possible to simply ignore repeated terms. Theknightwho (talk) 09:15, 29 January 2024 (UTC)
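As a rough sketch of the two behaviours discussed above (ignoring repeated terms and alphabetizing), the following Python function is one possible approach. The name `tidy_terms` is hypothetical, and the case-insensitive sort is an assumption; it is not a description of how {{col}} actually orders terms.

```python
def tidy_terms(terms):
    """Drop blank and repeated terms, then sort case-insensitively.

    Assumption: two terms count as repeats only if they are identical
    after trimming surrounding whitespace.
    """
    seen = set()
    unique = []
    for term in terms:
        cleaned = term.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return sorted(unique, key=str.casefold)


print(tidy_terms(["greenhouse", "Green Party", "greenhouse", " evergreen "]))
```

A tool that adds a term from the to-do list could run its updated list through a step like this before saving, so that an inadvertent repeat is simply dropped.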

Transliterating foreign language usage examples with numerals

I thought about it myself and @Tooironic asked me about transliterating Chinese numbers in usage examples. For reference, there was an old deleted discussion where I suggested to use

我已經隔離過14(十四)天了。
我已经隔离过14(十四)天了。
Wǒ yǐjīng gélí guò 14 (shísì) tiān le.
I have already been in quarantine for 14 days.

or, as suggested by @Suzukaze-c and @Justinrleung, the below, where numbers are respelled in words

嚴禁在林區500米範圍內野炊。
严禁在林区500米范围内野炊。
Yánjìn zài línqū wǔbǎi mǐ fànwéi nèi yěchuī.
Open-air cooking is strictly prohibited within 500 meters of the forest area.

Well, numbers are numbers, even if they are used in a foreign language, even using different numerals, e.g. "10" in Arabic ١٠ (10) or Thai ๑๐ (10). It gets more complicated if a conversion from a different system is required, e.g. 六千  ―  liù qiān  ―  6,000 (six + thousand).

However, I think users want to know how to read numbers, especially when there are inflections, more than one possible reading, complex rules, etc.

For languages where you can use regular templates and the transliteration is more or less straightforward, I used |subst=, e.g.

  1. 6시로 알람을 맞추었어요.  ―  6 (yeoseot)si-ro allam-eul matchueosseoyo.  ―  I set the alarm for 6 o'clock. This tells the user that the native Korean reading 여섯 (yeoseot, six) should apply.
  2. It can be done similarly with Arabic: ١٠٠٠ اِمْرَأَةٍ  ―  ʔalfu imraʔatin  ―  thousand women (native numbers) or 1000 اِمْرَأَةٍ  ―  ʔalfu imraʔatin  ―  thousand women (regular numbers).
  3. Here's a Russian example: Из 220.000 оста́лось то́лько 45.000.  ―  Iz 220.000 (dvuxsót dvadcatí týsjač) ostálosʹ tólʹko 45.000 (sórok pjatʹ týsjač).  ―  Out of 220,000 only 45,000 remained.
  4. Thai with European numbers: 500(ห้าร้อย)เป็นจำนวนธรรมชาติที่อยู่ถัดจาก499(สี่ร้อยเก้าสิบเก้า)และอยู่ก่อนหน้า501(ห้าร้อยเอ็ด)
    500 (hâa rɔ́ɔi) bpen jam-nuuan tam-má-châat tîi yùu tàt jàak 499 (sìi rɔ́ɔi gâao sìp gâao) lɛ́ yùu gɔ̀ɔn nâa 501 (hâa rɔ́ɔi-èt)
    500 (five hundred) is the natural number following 499 and preceding 501.
    ๕๐๐ (500) would be the native Thai way to write out numbers.

Some language-specific templates don't allow |subst=: the Chinese, Japanese, Thai and Khmer templates {{zh-usex}}, {{ja-usex}}, {{th-usex}} and {{km-usex}} don't support it as of today. The Khmer template also currently fails to transliterate numbers. It would be great to add a |subst= parameter to language-specific templates. What are your thoughts on transliterating numbers?

What is the preferred method to transliterate numbers in a foreign language other than simply displaying "10" as "10" or reading the words as in Chinese? Are there any nicer technical solutions? @Octahedron80, @Fenakhay, @Benwing2, @Theknightwho Anatoli T. (обсудить/вклад) 09:01, 29 January 2024 (UTC)

@Atitarev: Writing out numbers in quotations runs a significant risk of getting it wrong. For example, in Thai, would 1500 be "phan hâa rɔ́ɔi" or "phan hâa"? I think ๕๐๐ would be better described as 'archaic' than 'native'; it's not the natural way of writing numbers. (At least, when I've seen Thais working with numbers, they've always used Arabic numerals, not Thai.) There seems to be some undertone to the use of Thai digits. Would you claim that "md" was the native English way of writing out this number, rather than the heathen '1500'? And how would you spell English '1500' out - 'one thousand five hundred' or 'fifteen hundred'?
{{th-usex}} doesn't need |subst=; it's got its own mechanism for overriding 'transliteration'. --RichardW57 (talk) 23:47, 29 January 2024 (UTC)
@RichardW57: I take your point on using numbers in Thai. I just wanted to give an example. The Thai overrides are imperfect when you try to respell a complex or long portion, as in {{th-xi|500{ห้า ร้อย}}}, which doesn't work with numbers and depends on the position of the overrides (can't be at the end of an example) Anatoli T. (обсудить/вклад) 01:53, 30 January 2024 (UTC)
@RichardW57: Since you picked on something I think was irrelevant to my question, I chose an example where Thai numbers were actually used (the example is from a book):
หนองคายอยู่ห่างจากกรุงเทพฯ ๖๑๔กิโลเมตร  ―  nɔ̌ɔng-kaai yùu hàang jàak grung-têep 614 gì-loo-méet  ―  Nong Khai is 614 kilometers from Bangkok.
Even if the numeral ๖๑๔ (614) were written as "614", the question is whether we can or should re-transliterate the numerals to help with the pronunciation. I can't demonstrate using Thai methods but I'd like the numbers ๖๑๔ (614) or "614" to be transliterated as หกร้อยสิบสี่ (hòk rɔ́ɔi sìp sìi), in other words, show the original text with the numbers only but display transliteration as "614 (hòk rɔ́ɔi sìp sìi)". @Octahedron80. Anatoli T. (обсудить/вклад) 06:48, 31 January 2024 (UTC)
@Atitarev: And how do you know the intended pronunciation of the sentence in the book in the first place? My experience is that the final vowel is actually short! And in this case, how do you know how the author pronounces ร้อย (rɔ́ɔi)? A lot of Thais pronounce it with /l/, even teachers. If the 'transliteration' is supposed to show pronunciation, then we may be falsifying a lot of quotations. And the 'r' in the name of the capital has problems - there is even a trisyllabic pronunciation with a vowel before it! I just went to check the utterance quoted for โควิด-19, but we've lost the sound, and I can't remember whether the number in that instance was pronounced as English or as Thai - I've heard both in Thai. Basically, extending quotations beyond the record we have of them is a dubious practice, and our transcriptions should not mislead us into thinking of them as records of pronunciation.
Now, fabricating examples is a different matter, but I'd rather give the pronunciation explicitly in IPA. --RichardW57 (talk) 22:08, 31 January 2024 (UTC)
@RichardW57: The book has an audio, it's from a textbook. The reader pronounced it "hòk rɔ́ɔi sìp sìi" with articulated length. Not sure we need to worry about shortening in a transliteration. We transliterate ร้อย (rɔ́ɔi) with "r", even if it were pronounced /l/. I can hear an unclear /r/, rather than /l/ in the particular recording.
The only thing I changed in the example is the spelling of กรุงเทพฯ (grung-têep), which was spelled with a space กรุงเทพ ฯ (grung-têep) in a book, the translit is failing using {{th-x}}. The entry with a space I have just created.
In any case, the reading of numerals is optional in the usexes, but it's up to the editor to provide the most common, intended reading. When I saw ๖๑๔กิโลเมตร(614 gì-loo-méet), I didn't know how to pronounce it (at least one acceptable reading), so I had to use the audio. Why don't you find it useful? I haven't fabricated anything here, I rendered the way it was intended.
I actually find the entry โควิด-19 somewhat useful, since it educates users how to read the term and it's not predictable. Anatoli T. (обсудить/вклад) 22:30, 31 January 2024 (UTC)
@Atitarev: The short vowel I referred to was that of กิโลเมตร (gì-loo-méet). Putting a space before ฯ is wrong. In this case, you have the pronunciation, so that makes a big difference. But in general, although our transliteration looks like a transcription, it's formally just a Romanisation. And there are some gems, such as the formal pronunciation of น้ำ (náam) having a short vowel, just as in the spelling and as the first element of a compound. --RichardW57 (talk) 22:48, 31 January 2024 (UTC)
@RichardW57: We cater for irregular readings, including น้ำ (náam) but even some with irregular shortening of , which is normally unmarked. กิโลเมตร (gì-loo-méet) is pronounced long in the recording. @Octahedron80 edited it at some stage too. If a short pronunciation is also valid, it can be added. Anatoli T. (обсудить/вклад) 23:11, 31 January 2024 (UTC)
It's beyond this discussion but กรุงเทพ ฯ (grung-têep) with a space is counterintuitive (to me) and seems wrong but it's not wrong and definitely very attestable and common. Anatoli T. (обсудить/вклад) 23:27, 31 January 2024 (UTC)

DocTabs error

After following the steps to reproduce described in phab:T356145, I got "TypeError: portlet is null" at MediaWiki:Gadget-DocTabs.js#L-111. I'm not entirely sure if this is the culprit in the issue reported in the Phab task, but either way it needs to be fixed. I assume the script shouldn't run at all when the page doesn't exist (i.e. when wgArticleId is 0). Nardog (talk) 08:22, 30 January 2024 (UTC)

I've gone and made the function linked above (and a similar one) return early when the portlet (tab container) isn't found so that there isn't an error potentially causing other JavaScript to not run. I'm not familiar with the code, but this should be better. — Eru·tuon 09:04, 30 January 2024 (UTC)

Bot request: Hungarian verbs with multiple conjugation templates

I'd like to find all the Hungarian verbs that have more than one conjugation template under their Conjugation header. E.g. sodor (3), függ (2), haldoklik (2 with no text separating the templates), rezg (2). The templates start with hu-conj. Thank you very much. Panda10 (talk) 19:22, 30 January 2024 (UTC)

@Panda10 Here:
This, that and the other (talk) 00:28, 31 January 2024 (UTC)
@This, that and the other Thank you! These four should be on the list, but they are not, though: haldokol, kormányoz, könyököl, rabol. Would you mind double checking? Panda10 (talk) 17:17, 31 January 2024 (UTC)
@Panda10 oh, I wrongly assumed the multiple occurrences of hu-conj templates would be on separate lines of wikitext. Here, they are on the same line. I'll regenerate the list in a few hours. This, that and the other (talk) 21:10, 31 January 2024 (UTC)
@Panda10 sorry for the delay. The only other one I found, besides the ones you listed, was furcsáll. This, that and the other (talk) 11:06, 4 February 2024 (UTC)
@This, that and the other Thank you very much for your help! Panda10 (talk) 14:17, 4 February 2024 (UTC)
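The pitfall noted above (two hu-conj templates sitting on the same line of wikitext) is avoided by counting template openings rather than matching lines. The Python sketch below is an illustration only, not the script that produced the list; the function names are hypothetical, and it assumes the caller passes in just the wikitext of the Conjugation section.

```python
import re

# Matches the opening of any template whose name starts with "hu-conj",
# wherever it occurs on a line.
HU_CONJ = re.compile(r"\{\{\s*hu-conj")


def count_hu_conj(section_text):
    """Count hu-conj* template openings in the given wikitext."""
    return len(HU_CONJ.findall(section_text))


def has_multiple_conjugations(section_text):
    """True if the text contains more than one hu-conj* template."""
    return count_hu_conj(section_text) >= 2


# The parameter values here are placeholders, not real template arguments.
print(has_multiple_conjugations("{{hu-conj-ok|a}}{{hu-conj-ok|b}}"))
```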

Sanskrit Tamil script transliteration

Please modify Module:languages/data/2 to add Module:sa-Taml-translit to transliterate Tamil script Sanskrit. --RichardW57 (talk) 01:22, 31 January 2024 (UTC)

@RichardW57: Done Done see diff Kutchkutch (talk) 01:38, 31 January 2024 (UTC)

Apostrophes in language names

Currently we have some language names with apostrophes or apostrophe-like characters, where WT:LOL contains a version with a straight apostrophe ', but many of our entries use an L2 heading with a fancier Unicode character, for example, Ḵ̓w with Kwakʼwala instead of Kwak'wala.

Two questions:

  1. Does anyone care? If not, I can just make my scripts ignore all apostrophe-like characters, treating them all as equivalent to one another.
  2. If there are people who do care... What is correct? I am sure some languages, in their own orthographies, use a character that resembles an apostrophe when writing the name of their language, but does that mean we, ostensibly writing language names in English, should follow suit? Should we use a bot to correct all L2 headers to use straight apostrophes?

For reference, I note the existing character set used in canonical language names is -'()aAáÁàÀâåäãbBcCçdDeEéèêëfFgGhHiIíìîïɨjJkKlLmMnNñoOóòôốöÖõồpPqQrRsStTuUúùüvVwWxXyYzZǀǁǂǃ. Some slightly spicy characters in there, particularly the clicks and , but nothing apostrophe-like. This, that and the other (talk) 04:57, 31 January 2024 (UTC)

@This, that and the other This is a perennial point of debate. Personally I think we should *NOT* follow suit in insisting on the "correct" apostrophes in the English version of language names. It just makes it harder to type the language names in the headers and invites silliness like what you're observing. Similarly we generally omit macrons, tone marks and the like in names of languages as rendered in English. This means that absolutely we should bot-fix the headers to match the spelling of the language in the data modules. (This goes independently of whether the Wiktionary data module language name has straight or funky apostrophes in it, does or does not have macrons, acute accents, etc. etc.) Benwing2 (talk) 06:09, 31 January 2024 (UTC)
I would strongly oppose having a blanket rule on this, and when it comes to very small languages I don't think it's safe to assume that the diacritic-less version is even going to be more common in the first place, since the only literature about the language in English will have been written by linguists.
That being said, any L2 headers should match the data, or otherwise we'll start running into problems with modules that check for specific language headings. Theknightwho (talk) 06:25, 31 January 2024 (UTC)
I have a bot task that automatically renames WT:LOL L2 AltName to L2 OfficialName, but for some reason the alt-format apostrophes aren't listed in the data source it was using ({{#invoke:list of languages, csv format|show}}). I switched it to use {{#invoke:User:DTLHS/languages|export_languages|en}} and it's now renaming Kwakʼwala -> Kwak'wala, Kʼicheʼ -> K'iche' and iʼ Chʼortiʼ -> Ch'orti'. If there's a better source for machine-parsable language name data, please let me know. JeffDoozan (talk) 19:08, 31 January 2024 (UTC)
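The kind of rename described above, or the option of having scripts treat all apostrophe-like characters as equivalent, can be sketched in Python as follows. This is a minimal illustration, not the bot's actual code; the function name is hypothetical and the set of characters treated as apostrophe-like is an assumption that would need checking against the headers actually in use.

```python
# Assumed set of apostrophe-like characters: modifier letter apostrophe,
# right and left single quotation marks, modifier letter turned comma.
APOSTROPHE_LIKE = "\u02bc\u2019\u2018\u02bb"
_TABLE = str.maketrans({ch: "'" for ch in APOSTROPHE_LIKE})


def normalize_apostrophes(name):
    """Replace every apostrophe-like character with a straight apostrophe."""
    return name.translate(_TABLE)


# "Kwak\u02bcwala" uses U+02BC; the result uses the straight ASCII apostrophe.
print(normalize_apostrophes("Kwak\u02bcwala"))
```

Comparing normalized forms of the L2 header and the canonical name would flag headers that differ only in the apostrophe character used.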
@JeffDoozan from WT:Todo/Lists/Language headers not in WT:LOL it looks like there are a couple more languages affected by wrong apostrophes. Would you consider doing a one-off bot run to fix these apostrophes? This, that and the other (talk) 22:07, 31 January 2024 (UTC)
It just cleaned up everything on User:JeffDoozan/lists/section_headers/fixes#Language_title_is_in_'Other_names'_on_WT:LOL, which looks like it includes the languages on your list that I didn't notice when compiling the above list (Feʼfeʼ and Oʼodham). If you notice anything else that it missed, please let me know. JeffDoozan (talk) 22:16, 31 January 2024 (UTC)
@JeffDoozan: I just noticed your header-related pages and I'm impressed. I used to do it (User:Erutuon/mainspace headers/possibly incorrect and such) but it was messy and not automated enough, so I gave up two years ago. Looks like you've picked up the task and done a better job. I think the problem with {{#invoke:list of languages, csv format|show}} is that it hasn't caught up with language data changes. There used to be only otherNames, but now the otherNames are being split into aliases and varieties, which haven't been added to the CSV. — Eru·tuon 01:28, 1 February 2024 (UTC)
@Erutuon, JeffDoozan It's just occurred to me that we should be able to automate a check for this via Lua, now that it's possible to determine the section a template has been called in: the head template just needs to cross-check the L2 header against the canonical language name. The same goes for the part of speech header as well, though there would be exceptions (Chinese Hanzi etc.). Theknightwho (talk) 12:34, 1 February 2024 (UTC)

Romansch altforms

Would it be possible to have a bot set the main lemma to Romansch entries tagged as standard ('Rumantsch Grischun') and altform-ify the others? This would help with the cluttering/repetition problem; e.g. at the moment every variant of nulla shows up in both Category:Romansch cardinal numbers and Category:Romansch terms derived from Latin.

I have manually sorted out chatschar as a demonstration (also replaced {{lb}} with {{tlb}} for further decluttering). Nicodene (talk) 16:05, 31 January 2024 (UTC)

@Nicodene: Personally I am in favor of this but I think we might want to hear from Romansch editors (if there are any active ones), as I have heard that Rumantsch Grischun is a bit controversial (as seems to be the case with all similar standards created to bridge dialectal differences, cf. Basque, Tamazight, Quechua, etc.). Benwing2 (talk) 02:16, 1 February 2024 (UTC)
Thoughts, @Embryomystic? From a survey of the Romansch content I see you've contributed an impressive proportion. Other contributors (not an exhaustive list) include: @Word dewd544, Linguoboy, Vedac13, Waelsch, Pne, Nimic86, Jedi Friend. Nicodene (talk) 18:18, 1 February 2024 (UTC)
I'm not an active speaker of Romansch so I don't have an educated opinion on the issue of RG vs the vernacular forms. I am, though, more interested in hearing about the "clutter" problem. Correct me if I'm wrong, but Category pages are all dynamically generated, so there's no data storage issue here--the underlying text remains {{auto cat}} regardless how many variants are displayed. Is the problem that the display makes matters more difficult for Wiktionary users and, if so, how? Linguoboy (talk) 18:35, 1 February 2024 (UTC)
For anyone who heavily uses categories for research purposes, e.g. to survey all the words in X language thought to derive from Y, or all the ones containing such-and-such sequence of letters and sounds, having to wade through nearly as many duplicates as actual unique words can get tiresome. Likewise having to make identical edits to, say, seven entries for seven Romansch variants any time one wants to update an etymology. Ultimately there is a reason lexicographers operate with a lemma system rather than a free-for-all. Nicodene (talk) 18:57, 1 February 2024 (UTC)
@Nicodene: An issue that comes up if we are to do this is what about cases like baditschun (Sursilvan) ~ baditschùn (Sutsilvan) where there is no Rumantsch Grischun form? In this case, the Rumantsch Grischun form is given as mintun. Benwing2 (talk) 01:09, 2 February 2024 (UTC)
In that case I’d just leave them as-is. Nicodene (talk) 14:02, 2 February 2024 (UTC)
I think that in the absence of input from regular speakers of Romansch, I would lean towards making the Rumantsch Grischun form standard, if one exists, and when there is none, leaving both/all alone but linking to one another. embryomystic (talk) 03:40, 6 March 2024 (UTC)

LiquidThreads deprecation

Hello everyone

As you might already know, the Wikimedia Foundation is working on changes to how IP editing is handled: IP Editing: Privacy Enhancement and Abuse Mitigation. Temporary accounts for unregistered editors will be a new type of user account. This requires changes to how all the features we use to contribute to the wikis work. This impacts LiquidThreads (LQT), used at your wiki.

LiquidThreads is a talk page feature that has not been developed since 2014. Only 5 wikis use this extension. As a consequence, we are taking the opportunity of the work on temporary accounts to remove LQT from the wikis.

Discussion tools are the replacement for LQT. They are the default discussion system at all wikis. They allow anyone to start, reply or subscribe to a conversation. They provide a visual experience on wikitext-based conversations, and they offer more features than what LiquidThreads has.

The goal with this conversation is to respond to your questions regarding the archival of LiquidThreads.

The idea is to proceed in two stages:

  1. discussion pages using LQT are archived as subpages. The pages left blank are replaced by a classic discussion page. In this way, the most active pages will already be ready when we proceed to step 2:
  2. LQT are removed from the wiki. Existing pages (including archived ones) will be converted to a format yet to be defined.

We have a few questions for your community:

  1. Are the reasons given for removing LiquidThreads clear?
  2. Are the two steps outlined above for archiving and uninstalling LiquidThreads clear?
  3. If so, what is a reasonable timeframe for archiving pages for deinstallation? At present, deinstallation is not planned on our side (even if the second quarter of 2024 is mentioned).
  4. In your opinion, what format should pages currently using LQT be converted to when we proceed with the deinstallation of structured discussions?

If you need clarification, please ask! I've subscribed to this section, and I'll try to answer as soon as possible.

Best, Trizek (WMF) (talk) 16:52, 31 January 2024 (UTC)

For those young folks who may not know what this is, LiquidThreads was a threaded discussion/forum system that individual talk pages could (and technically still can) opt into, overriding the standard wikitext-based discussion system. You can see it in action at WT:LiquidThreads testing.
LQT is currently enabled on: (a) the talk pages (in User talk: and MediaWiki talk: space) of various gadgets and scripts developed by Yair rand, and (b) the talk pages of a handful of users, many of whom are long since inactive. The active users whose talk pages continue to use LQT are: Catsidhe, Commander Keane, HastaLaVi2, Internoob, Jagwar, LA2, Pengo, Rua. They may be best placed to comment.
@Trizek (WMF) given the way LQT has been used on this wiki, I would suggest to archive the LQT discussions onto the talk page itself as if the discussions had been carried out using traditional wikitext (by adding a Level 2 header for each thread and an appropriate indent and signature to each post, and reordering the discussions into chronological order by first post). This, that and the other (talk) 22:39, 31 January 2024 (UTC)
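The conversion suggested above (a Level 2 header per thread, an indent and signature on each post, threads ordered by first post) can be sketched in Python. This is only an illustration under a made-up data model: the `Post` class, its fields and the signature format are hypothetical and say nothing about how LiquidThreads actually stores its data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical representation of one LQT post.
    author: str
    when: datetime
    text: str
    depth: int = 0  # 0 for the first post, 1 for a direct reply, ...


def post_to_wikitext(post):
    """Render one post as an indented, signed line of wikitext."""
    stamp = f"{post.when:%H:%M}, {post.when.day} {post.when:%B %Y} (UTC)"
    signature = (
        f"[[User:{post.author}|{post.author}]] "
        f"([[User talk:{post.author}|talk]]) {stamp}"
    )
    return f"{':' * post.depth}{post.text} {signature}"


def archive(threads):
    """Render (title, posts) pairs, ordered by the time of each first post."""
    ordered = sorted(threads, key=lambda thread: thread[1][0].when)
    sections = []
    for title, posts in ordered:
        lines = [f"== {title} =="] + [post_to_wikitext(p) for p in posts]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```

The month name comes from `strftime`'s `%B`, which is English only under the default locale; a real conversion would need to pin that down.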
Thank you, noted! Trizek (WMF) (talk) 09:27, 1 February 2024 (UTC)
@Trizek (WMF) I'm not sure if this is a known issue but at least half the time when I try to use the "Reply" functionality, I get an error The "reply" link cannot be used to reply to this comment. To reply, please use the full page editor by clicking "Edit".. I know about Phabricator but it's hard for me to figure out if this is a known issue esp. since the error message provides no reason why I can't use the Reply link, and I'm not sure why it sometimes works and sometimes doesn't. Benwing2 (talk) 01:16, 2 February 2024 (UTC)
@Benwing2, is it happening on specific pages? Do you use any gadgets? Trizek (WMF) (talk) 08:56, 2 February 2024 (UTC)
@Trizek (WMF) I've seen it across many pages; probably esp. in the Beer Parlour and Grease Pit since that's where I do most of my responding, but there's no clear pattern. I have the following non-default gadgets:
  • Focus the cursor in the search bar on loading the Main Page.
  • FastRollback
  • OrangeLinks
  • Add a sidebar menu of user-defined regex tools, with a dynamic form for an instant one-use regex or multiple one-session regexes.
  • Add accelerated creation links for common inflections of some words.
  • HotCat
  • FastRevert
  • Enable definition editing options.
  • Remember the dot's syntax highlighter for wiki markup.
  • Editor enhancements for script developers.
  • AjaxEdit.
I'm not sure which of these I actually need since I turned them on a while ago. Benwing2 (talk) 09:45, 2 February 2024 (UTC)
@Benwing2, I don't see one specific gadget in your list known to cause issues, but it could be multifactorial. I encourage you to make a few tests based on mw:Help:Locating broken scripts whenever you have time. Trizek (WMF) (talk) 15:41, 2 February 2024 (UTC)
I am fairly certain this happens whenever one tries to use the reply button directly from WT:BP or similar pages rather than from the monthly subpage. — SURJECTION / T / C / L / 16:20, 2 February 2024 (UTC)
That could be it, good spot @Surjection. I didn't think of it, as the main cause of malfunctions is usually gadgets, but there is this one case that is caused by how pages have been designed by the community.
Transcluded talk pages work, but not when HTML markup is present in the transclusion. Templates or page structures using <div> or <hX> usually don't work. I checked, and this is the case on Template:discussion recent months, where <h1> markup is used. You should consider replacing it with =.
Have a nice weekend, Trizek (WMF) (talk) 17:38, 2 February 2024 (UTC)
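For illustration, the suggested change could look roughly like the following wikitext. This is only a sketch: the actual contents of Template:discussion recent months are not quoted in this thread, so the heading text and the transcluded page shown here are assumptions.

```
<!-- Assumed current form: an HTML heading, which reportedly breaks the reply tool -->
<h1>January 2024</h1>
{{Wiktionary:Grease pit/2024/January}}

<!-- Suggested form: a wikitext level-1 heading instead -->
= January 2024 =
{{Wiktionary:Grease pit/2024/January}}
```

Note that a wikitext heading also brings an edit-section link with it, which the HTML heading does not.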
The issue with =heading= is that it shows the edit link, which is not desirable (It would not point to the right place). Perhaps there is another way to hide it? — SURJECTION / T / C / L / 20:00, 2 February 2024 (UTC)
@Trizek (WMF) I ran into this issue again when trying to add a reply to WT:Beer parlour/2024/January#user Rua is using ə which actually is not a letter of the Slovene alphabet. Going to the monthly subpage makes no difference; I still get the same error. Benwing2 (talk) 20:52, 2 February 2024 (UTC)
Strangely, going through the link I just posted did allow me to respond using Reply. It looks like the Reply-to functionality is very fragile and (a) needs work to make it less fragile, (b) needs to have the generic error message replaced with something that indicates why it couldn't post a reply. Benwing2 (talk) 21:00, 2 February 2024 (UTC)
@Surjection, you can add a __NOEDITSECTION__ tag to avoid the edit link. Trizek (WMF) (talk) 13:34, 5 February 2024 (UTC)
This would also disable the edit links for the individual discussion headings, which is not desirable - only the month headings should have the links hidden. — SURJECTION / T / C / L / 13:41, 5 February 2024 (UTC)
Yeah, correct. But as the Wiktionary: namespace is not a talk page, the solution is probably around where __NEWSECTIONLINK__ is located.
Ideally, Wiktionary:Grease pit should not use __NEWSECTIONLINK__, as that turns it into a talk page. But Wiktionary:Grease pit/2024/January should have __NEWSECTIONLINK__ in it (this should be tested when transcluded in Template:discussion month). I checked fr.wp's VP, and that is how it is done there. The main page can't have new sections added to it (you have to add a "new topic" button to it), but the reply tool works well. Trizek (WMF) (talk) 17:17, 7 February 2024 (UTC)
@Benwing2 @Trizek (WMF) The Phabricator task is phab:T259824. This, that and the other (talk) 23:41, 2 February 2024 (UTC)
The problem seems to be that the reply tool can't handle two layers of templates. WT:BP etc. transclude {{discussion recent months}}, which in turn transcludes the monthly discussion pages. I have experimentally subst'ed {{discussion recent months}} on WT:GP and now the reply tool works for me on WT:GP itself! This, that and the other (talk) 23:48, 2 February 2024 (UTC)
@This, that and the other The fact that this task has seen no action in almost 3 years is not encouraging. Benwing2 (talk) 03:11, 3 February 2024 (UTC)
Reading the ticket, I decided to loop it back for a team discussion, as their decision doesn't make sense to me (I'm new in this team). Trizek (WMF) (talk) 11:17, 5 February 2024 (UTC)
@Trizek (WMF) Thank you! Benwing2 (talk) 21:30, 5 February 2024 (UTC)
@Trizek (WMF) is there a relevant Phabricator task that I can subscribe to if I want to stay updated on this? This, that and the other (talk) 00:02, 15 March 2024 (UTC)
@This, that and the other, we have this huge task for community engagement, and one specific to LiquidThreads. We were waiting for the collection of feedback before moving on in detail. I'll keep you posted! Trizek (WMF) (talk) 15:15, 19 March 2024 (UTC)
@Trizek (WMF) I don't have a huge talk page to archive anyway, so any solution is fine. I only tried LiquidThreads in the hope it would be better than the existing mess of an ad hoc system which was never designed for discussion threads. If DiscussionTools has the same issues as the conventional talk pages (no perma-url to a thread, no way to link to a specific message, and no url which can survive archiving), then I will be deeply saddened. Pengo (talk) 05:16, 22 August 2024 (UTC)
@Pengo, actually DiscussionTools enhances conventional discussion pages by adding permalinks to threads, links to specific messages, and a redirect to a message that has been moved elsewhere or archived. :) Trizek (WMF) (talk) 14:33, 26 August 2024 (UTC)
@Trizek (WMF) That is absolutely terrific news, sorry for the skepticism! Pengo (talk) 02:25, 27 August 2024 (UTC)
I won't blame you, @Pengo: it is sometimes too easy to miss things when they are announced. If you want to know more about these features, here is the summary and I can answer your questions. :) Trizek (WMF) (talk) 12:56, 27 August 2024 (UTC)

Could a "nocap" parameter be added to this template? The documentation says to put the general meaning before the "undetailed=1" parameter, but in FL entries, it looks weird, because the first part of the definition would be uncapitalized with a weird capitalization (of the template text that is displayed) in the middle. Also, I think it would be better if the "undetailed" parameter didn't include the "Possible meanings include:" text. Andrew Sheedy (talk) 22:53, 31 January 2024 (UTC)