User talk:Theknightwho/2022

Oxford stuff

Hi. Good to see someone touching these :) I would like to suggest to you that it isn't always appropriate to add an "Oxford English" gloss. I mean, what is Oxford English? The glosses are supposed to describe the language usage, not the location of the referent. I'm a dirty Reading rat but I still call it "the Bodleian". And indeed an American who studied at Oxford and moved on to Facebook would probably do so as well. The Oxford gloss should be used for any strange local language that isn't used outside of that region: I can't think of any example, lol. Maybe for Norwich I could think of one or two. Equinox ◑ 17:40, 27 January 2022 (UTC)

Thanks - and that's a good point, and one which is quite difficult to get a sense of, given they're not really recorded very well. I created that category for Carfax, which is definitely used in the city as a whole beyond the context of the University, but would draw blank stares in much of Oxfordshire without additional clarification. Hence why I felt the need to create an intermediary category. You're right about the various university institutions, though. I suspect this would primarily apply to cuttings. Theknightwho (talk) 17:48, 27 January 2022 (UTC)

Amusingly the reason you've been popping up in my watchlist is that I imported (with some sane curation, I hope) a bunch of old slang, 18th century etc., and of course Oxford and Cambridge were there. It does mean we probably ended up with a lot of "Ox uni slang" that hasn't been used since 1799. I put "archaic" if I'm sure nobody uses it any more, but let's be fair, there are places that trade on their age. Hahaha! Anyway. All good, I just wanted to raise the point. Equinox ◑

It's been fascinating to see how little of it has changed, actually! Sconce is a good example. Really, it's only the "-ers" that have fallen out of use, but a few seem to have developed a bit since those definitions were written. This has been mostly working from my own knowledge (which is about half a decade out of date by this point anyway), so there are probably a few things I've added that are going to be difficult to actually find sources for, even though I'm certain they're correct. For example, the pronunciation of OUDS. Thanks for the message, anyway - glad to know someone appreciates me adding these :) Theknightwho (talk) 17:57, 27 January 2022 (UTC)

We don't usually do "place names" but (especially when it's a single word) it's not the hill I will die on. I do love Oxford, it's probably the only city in the world that is beautiful. (Or I'm showing my parochialism. Whatever.) I was talking to someone about the botanic gardens the other day and saying "YES, you can grow oranges in Britain, but it's really difficult". Do you remember when the... um, I've forgotten the name of the place. ~~something green?~~ GLOUCESTER GREEN... was a goth/rock bar? I was really sad to go back and find that it was a pizza restaurant. But they kindly allowed me to sit in the front with a drink, and I got talking to some economist who is smarter than I will ever be. OH but I miss that rock bar. Okay, enough out of me. Equinox ◑ 18:08, 27 January 2022 (UTC)

I think it's one of those places that's much nicer to visit than to actually live in haha. I moved away a couple of years ago after 8 years in the city, and I think I'd definitely had enough Oxford for quite a long time when I left - but it's one of those places you can never quite get away from! I actually don't know which bar you mean - I always used to go to the goth nights at Cellar till that shut down about 3 years ago, but when that went there was basically nothing left! Glad they let you have that drink, though. It's a real shame how much of the Oxford nightlife has basically just disappeared in the last few years. Theknightwho (talk) 18:18, 27 January 2022 (UTC)

Carfax Tower is there so that the cops can tell who is punting into the botanic gardens without paying. Equinox ◑ 18:12, 27 January 2022 (UTC)

You're thinking of Magdalen! Theknightwho (talk) 18:19, 27 January 2022 (UTC)

Special:WhatLinksHere/Module:labels/data/regional

Before you even think about experimenting again with coding in a module like this, I have an assignment for you: follow the link in the header of this comment, and page through to the end. Spoiler alert: it may take you a while. There are 925,526 pages in that list, according to Link Count. Even a very minor error can easily break all of them.

Worse, when you make an edit to a module, it isn't like editing an entry. The change has to be propagated to every single page that transcludes it. If there are lots of transclusions, the changes go into the job queue for an automated process that does this for every single Wikimedia project. I've seen it take as long as a week for all the module errors to clear from CAT:E. What's more, the order in which the pages are processed is impossible to predict. You may see the numbers at CAT:E start to go down after you fix an error, then go up again because the bad edits are still making their way through the queue. For any one page, the bad edit has to happen first before the fix can reach it. Even perfectly harmless or beneficial edits should be done with caution, because all changes have to go through the same job queue and you don't want to delay more urgent or important changes.

I know you mean well, but when you're editing a widely transcluded module, you're also editing every single page that transcludes it. I hope you will be more careful and responsible in the future. If you aren't, I can easily lock you out of the Module namespace altogether, but I'd rather not have to. Chuck Entz (talk) 07:50, 10 February 2022 (UTC)

@Chuck Entz Yes, understood, sorry! I had thought it was an extremely simple addition, and stupidly made the function local, so it wasn't getting called by the submodule. Lesson learned, and I've set up the sandbox for that module for any future experimenting. It's been fixed and works correctly now, so should make it a bit more straightforward to add numerous aliases as and where necessary. Theknightwho (talk) 07:56, 10 February 2022 (UTC)

Latin template issue

Hello. Could you take a look at WT:GP#Latin template issue? You've been the one most actively editing the la-verb module, so I figure this is within your wheelhouse. I have proposed a change that would remove undesired forms like *ingreditōte from ingredior. Since string:find(...) returns nil on failure, I think this is the right check to perform. (Note that, unlike the equivalents in a lot of other languages, nil == false evaluates to false in Lua.) 70.172.194.25 05:46, 21 February 2022 (UTC)

Hi - this was absolutely right, yes. Thanks! Theknightwho (talk) 19:03, 21 February 2022 (UTC)

Unrelated, but praecognosco is throwing an error. I think it could be fixed by supplying the third parameter to the headword and conjugation templates (based on similar verbs, maybe it should be |praecognōv, but I'm not confident enough to make the edit). 70.172.194.25 05:38, 23 February 2022 (UTC)

That was correct. I had dealt with all of the other “nosco” verbs, but missed that one for some reason. Theknightwho (talk) 14:46, 23 February 2022 (UTC)

praecognosco

You need to fix Module:la-verb so it knows how to deal with someone leaving out the third parameter in a case like this. I'm not saying it has to come up with the right output (though that would be preferable), but any real error message is better than something cryptic like Lua error: bad argument #1 to 'sub' (string expected, got nil). Either way, something needs to be fixed, either in the module or in the entry. There's no excuse for a module error going unattended this long; you should always monitor CAT:E for a day or two whenever you work on modules. Chuck Entz (talk) 05:48, 23 February 2022 (UTC)
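To illustrate the "any real error message" point: a module can check a required argument up front and fail with a message that names the missing parameter, rather than letting a later string operation crash on nil. The sketch below is in Python, not Lua, and is not Module:la-verb's actual code; TemplateArgumentError, require_arg and perfect_active_1sg are hypothetical names, and it assumes (as the comment above suggests) that the perfect stem is the third positional parameter.

```python
class TemplateArgumentError(ValueError):
    """Raised when a required template parameter is absent or blank."""


def require_arg(args: dict, index: int, name: str) -> str:
    """Return args[index] stripped, or fail with a message naming the parameter."""
    value = args.get(index)
    if value is None or not value.strip():
        # Fail early and readably instead of passing None to later string code.
        raise TemplateArgumentError(
            f"Parameter {index} ({name}) is required but was not supplied"
        )
    return value.strip()


def perfect_active_1sg(args: dict) -> str:
    """Hypothetical helper: 1st person singular perfect active built from parameter 3."""
    perfect_stem = require_arg(args, 3, "perfect stem")
    return perfect_stem + "ī"
```

With {1: "3", 2: "praecognōsc", 3: "praecognōv"} this returns "praecognōvī"; leaving out parameter 3 raises an error reading "Parameter 3 (perfect stem) is required but was not supplied", which tells an editor exactly what to add to the entry.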

I had missed one “nosco” verb when dealing with its special cases. No change to the module needed. Theknightwho (talk) 14:45, 23 February 2022 (UTC)

The naming and placement of Early Scots

I agree with you that treating Early Scots as a variety of Middle English is confusing, but it is more reflective of the linguistic reality than treating it as a variety of Scots. To avoid confusion, you could call Early Scots "Scottish Middle English", but I believe that would create confusion of a graver kind, because "Early Scots" is by far the most common term for this variety. Given this, I believe the best option is to put Early Scots (sco-osc) under Middle English (enm); if it helps, there are at least two clear-cut parallel situations in the system of linguistic nomenclature we employ:

Anglo-Norman (xno) is treated as a variety of Old French (fro), not Norman (nrf).
Palatine German (pfl) is treated as a variety of Rhine Franconian (gmw-rfr), not German (de).

If you believe you have an idea that can nicely resolve this knotty situation, I'll be happy to hear it. Hazarasp (parlement · werkis) 16:27, 4 April 2022 (UTC)

@Hazarasp So I agree with you that we need to sort this out, and I think much of this confusion is coming about because of the fact that Wiktionary uses "Early" as one of its standard qualifiers to denote a period within a particular L2, which doesn't apply to Anglo-Norman or Palatine German. There's also the fact that Early Scots is the direct forerunner to Middle Scots (which we definitely do categorise as part of Scots), which is another difference that doesn't apply to your two examples.

What you propose leaves us in the bizarre situation of calling it "Early Scots" and agreeing that it is the direct ancestor to Scots, while saying that it isn't Scots. I'm not sure that there are any other examples like that on Wiktionary.

I think a lot of this comes down to the fact that I'm not sure that I agree that Early Scots is simply ME as spoken in Scotland, as the cut-off we're using is 1450, but there was divergence well before this point. Theknightwho (talk) 17:08, 4 April 2022 (UTC)

Early Scots is a direct descendant of Northumbrian Old English. Why not treat such chronolects as part of an Anglic family instead of as parts of random full-fledged languages? ·~ dictátor·mundꟾ 14:58, 6 April 2022 (UTC)

@Inqilābī It is, just as Old French and Old Spanish are direct descendants of Vulgar Latin. It would be just as perverse to treat a term from the 9th century in either of those languages as Latin, though. The question is whether Early Scots should be treated as a chronolect of Middle English or of Scots; I am not proposing that it gets its own L2.

This fundamentally boils down to the issue that the distinction between languages is not obvious because there is rarely a clear breaking point, but it is ridiculous to say that Early Scots only became Scots once it developed into Middle Scots. We don't treat any other language that way, or indeed anything else. Neither of the two examples given (Anglo-Norman and Palatine German) follows that model, as they are offshoots rather than direct ancestors of the modern language.

(Maybe this issue is irritating more than it normally would because it's become frustrating to see people treat Scots as just a dialect of English, and while I appreciate that this is somewhat different, it's definitely adjacent to it.) Theknightwho (talk) 16:19, 6 April 2022 (UTC)

The treatment of Scots as a full-fledged language is itself a little controversial. Scots (whether Modern or its earlier forms) is basically the group of Anglic lects spoken in the territory of Scotland, a political division. It would make more sense either to treat Scottish and Northern English lects together as one language (following the isogloss of the Great Vowel Shift), or to treat all Anglic dialects as English. The way we define a language on Wiktionary is influenced by sociolinguistic factors: other examples include the treatment of ‘German’ Low Saxon and ‘Dutch’ Low Saxon as different languages based on political divisions, and the treatment of some Eastern Indo-Aryan lects (Sylheti, Chittagonian, Chakma, etc.) as full-fledged languages rather than as part of (East) Bengali. So, my concern is that until we abandon our sociolinguistic treatment of language in favour of a diachronic one, this issue will remain.

I think we should start with the creation of Category:Anglic languages, and various chronolects (like Early Scots, Middle Scots) should be made subcategories thereof. Unifying modern Anglic dialects also solves the problem of the existence of the same word in the original lect and the acrolect, that is to say, the existence of the following two categories: CAT:Scots lemmas & CAT:Scottish English. ·~ dictátor·mundꟾ 20:20, 6 April 2022 (UTC)

@Inqilābī Certainly when it comes to the modern day, Scots and Scottish English are simply not the same thing. There is a very long history of Scots being erased with that kind of logic, and I would oppose any attempt to continue that here. Scots has only partial mutual intelligibility with English, and is about as distant from it as Ukrainian is from Russian, though I can't imagine you would support a similar merger of those.

It is important to remember that Wiktionary is descriptive, not prescriptive, and that the recognition of a language as a language is dependent on social, political and historical factors as well as linguistic distance.

I also have no issue with the categorisation of language families, but that is something that should be done on a site-wide basis, and not as a way to euphemistically brush Scots under the rug. Let's just stick with the model used with literally every other language on the site: the early and middle forms of the language are either given their own L2, or are categorised as chronolects of the titular language. Theknightwho (talk) 20:48, 6 April 2022 (UTC)

Not ‘every other language on the site’: consider our treatment of Chinese, for example, where all Sinitic dialects and even chronolects are treated under a single language header. Mutual intelligibility is not a necessary factor. If Scots is treated as a separate language, then Northern English should also be treated as a full-fledged language, but it's treated as English. ·~ dictátor·mundꟾ 21:04, 6 April 2022 (UTC)

@Inqilābī There is a lengthy ongoing project to separate out Chinese "dialects" (languages).

Plus you still seem to be under the impression that Scottish English and Scots are the same thing, but they aren't. Nobody is proposing that Scottish English should have its own L2. However, as before, I invite you to make a similar merger proposal of Ukrainian and Russian, because it makes about as much sense if you think that linguistic distance is all we should be considering. Theknightwho (talk) 21:08, 6 April 2022 (UTC)

I think you'll have better luck finding parallels on the Iberian peninsula, where you have languages that split, with one of the daughters becoming dominant. Really, though, the basic problem is that the terminology became set in stone before anyone took Scots seriously as a language. This would be solved by making Middle English "Anglo-Scots" and Old English "Old Anglo-Scots", but that has zero chance of happening. English is a massive and dominant world language, while Scots is a regional lect that's not taken seriously even by most of its native speakers. It may be unfortunate, but there's not much that can be done about it.

As for Scottish and Northern English: my take on it is that Scots had already split off, with its own standard orthography, but then it was forcibly dragged back into being a dialect of English everywhere except in some of the more remote or inaccessible areas that were out of reach. Sure, you can find isoglosses that unite Scottish and Northern English with Scots, but the bulk of the first two is more like the rest of English than it is like Scots. The only reason Scots-speakers understand it as well as they do is because they're all bilingual. Chuck Entz (talk) 03:38, 7 April 2022 (UTC)

@Chuck Entz I don't disagree with any of that, really! Theknightwho (talk) 14:13, 8 April 2022 (UTC)

Looks like someone noticed you

. 70.172.194.25 03:05, 27 April 2022 (UTC)

I'll take that as a compliment, quite honestly! I didn't think anyone would care haha. Theknightwho (talk) 03:11, 27 April 2022 (UTC)

Tangut data

Hi, if it would be useful for adding Tangut entries to Wiktionary, you are very welcome to make use of my (currently private) Tangut database, which includes data on reconstructed readings, IDS descriptions, character constructions, etc. You may download the latest version from TangutDatabase.xlsx. BabelStone (talk) 21:30, 27 April 2022 (UTC)

@BabelStone Amazing - thank you so much. I've been compiling my own spreadsheet, but this is a massive step up so I can't thank you enough. It's so difficult to find quite a few of the source texts. Theknightwho (talk) 21:44, 27 April 2022 (UTC)

If you need any help with source texts just ask. I have quite a large digital collection of Tangut texts, and I have been slowly adding transcribed and punctuated versions of Tangut texts to Wikisource 𗼇𗟲 (see my user page for a list of texts that I have added). BabelStone (talk) 23:20, 27 April 2022 (UTC)

@BabelStone That would be absolutely wonderful - thank you.

Do you know if there's an English translation of Kepping's Тангутский язык: Морфология anywhere? I've been trying to get to better grips with the grammar of Tangut, but a lot of English-language papers assume knowledge that's locked behind a Russian language barrier for me. Theknightwho (talk) 13:27, 28 April 2022 (UTC)

Yes, Alan Downes translated it into English recently, as a follow-on from his PhD dissertation, but it is not yet ready for publication. If you ask him, he may let you have a draft version of the translation. In my opinion, Tangut grammar is by far the hardest aspect of the Tangut language to master (or even get a basic grounding in), and none of the available overviews of Tangut grammar are terribly helpful in this respect. Other English-language resources you could try include Nishida Tatsuo's "Outline of the Grammar of the Hsi-hsia Language" (1964) and Shi Jinbo's Tangut Language and Manuscripts: An Introduction (2020). BabelStone (talk) 14:20, 28 April 2022 (UTC)

Thanks - I had found Alan's website, so that's very promising. I will drop him a message. Thanks also for the recommendations (and I will see if I can find a copy of Shi Jinbo's book).

Frankly, it feels like every aspect of Tangut is extremely difficult to get to even basic grips with! I still haven't really got my head around the verb system at all, particularly with the directional prefixes. Theknightwho (talk) 14:50, 28 April 2022 (UTC)

The price of the Shi Jinbo book is so exorbitant that it is out of the reach of most of its target readership, but if you email me I might be able to help out. BabelStone (talk) 16:45, 28 April 2022 (UTC)

Adding locative forms of Latin adjectives

Hi, I was curious about the recent addition of locative forms to the declension tables for some Latin adjectives, such as abrelictus. From what I can tell, it's quite rare in Latin when a noun is modified by an adjective for them to be put in the locative case: several grammars state that the usual construction is instead to put the noun and adjective into the ablative:

Latin Grammar, Albert Harkness (1881): "Instead of the Locative in names of towns the Ablative is used, with or without a preposition— 1) When the proper noun is qualified by an adjective or adjective pronoun: In ipsā Alexandrīā, in Alexandria itself. Cic. Longā Albā, at Alba Longa. Verg."
Concise Latin Grammar, Benjamin Leonard D'Ooge (1921): "The locative domi may be modified only by a possessive adjective or by a noun in the genitive; when it would be otherwise modified, the ablative with in is used instead."

Based on these statements, I'm confused about the reasons to include locative forms in the declension table of normal (non-possessive) adjectives. Are words like abrelictus attested in the locative? If so, where?--Urszag (talk) 15:26, 5 May 2022 (UTC)

@Urszag - Wikipedia has quite a good explanation here (though admittedly it doesn't deal with adjectives directly). I have noticed a tendency (particularly in 19th century academia) to generalise over Old Latin. You see the same sort of issue with the sigmatic forms of verbs, too, where they get (incorrectly) described as future perfects. Theknightwho (talk) 15:40, 5 May 2022 (UTC)

Thanks for the link! As you say, that section doesn't talk about adjectives ... it says "The Latin locative case was only used for the names of cities, 'small' islands and a few other isolated words", indicating a limited use, as reflected on Wiktionary by the locative case row only appearing for certain words, not all. To clarify my question, I think it's clear that we can apply the regular rules of locative formation to derive the locative forms abrelictī/abrelictae/abrelictīs; but it's not clear to me whether these are purely hypothetical, or actually attested in use as locatives. If they are attested, in whatever stage of the language, I'm hoping we could add some cited examples of how they are used. If they are hypothetical, I don't understand the point of including them (and why for this adjective but not others): is the envisioned use case something like "abrelictae Rōmae" to mean "at abandoned Rome"? From what I can tell, that is either outright ungrammatical, or at best, dispreferred in comparison to the alternative construction "(in) abrelictā Rōmā". Therefore, I think a user of this dictionary is unlikely to either encounter a locative form of "abrelictus" or be in a situation where it will be advisable to use one; so I can't figure out the benefit of having the form listed. (Anyone who for some reason really needs to know what the locative of abrelictus is would be able to follow the same rules the inflection module does to derive the correct form.)--Urszag (talk) 16:19, 5 May 2022 (UTC)
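The "regular rules of locative formation" mentioned above are mechanical for first/second-declension adjectives: the locative singular has the same shape as the genitive singular (masculine/neuter -ī, feminine -ae), and the locative plural has the same shape as the ablative plural (-īs). A minimal Python sketch, assuming a lemma in -us/-a/-um; locative_forms is a hypothetical helper, not the inflection module's code, and it deliberately rejects -er and third-declension adjectives rather than guessing.

```python
def locative_forms(lemma: str) -> dict:
    """Derive locative forms of a first/second-declension adjective in -us, -a, -um.

    Singular follows the genitive singular (-ī masc./neut., -ae fem.);
    plural follows the ablative plural (-īs for all genders).
    """
    if not lemma.endswith("us"):
        raise ValueError(f"expected a lemma ending in -us, got {lemma!r}")
    stem = lemma[:-2]
    return {
        ("m", "sg"): stem + "ī",
        ("f", "sg"): stem + "ae",
        ("n", "sg"): stem + "ī",
        ("m", "pl"): stem + "īs",
        ("f", "pl"): stem + "īs",
        ("n", "pl"): stem + "īs",
    }
```

For abrelictus this yields abrelictī, abrelictae and abrelictīs, matching the forms above. That the derivation needs nothing but the lemma is exactly why a module can generate such forms whether or not they are ever attested.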

@Urszag I do agree with you that it would be ungrammatical in the Augustan period, and our coverage of Old Latin is pretty patchy in general (arbitrarily covering some endings but not others etc.). I'm fine to remove them except for the handful of possessives and alienus. Use in Old Latin needs a bit more careful thinking, as I've come at this in a far more piecemeal way than I did with the sigmatics, which I made sure to only add where I could see a clear attestation, and the verb conjugation table has an explanation. I'd need to do something similar here. Theknightwho (talk) 16:24, 5 May 2022 (UTC)

An approach like that would make sense to me. I'm not familiar with how the locative was used in Old Latin, or how it may differ from other periods. (I know that the locative was more widely used in some other Italic languages, such as Oscan and Umbrian.) If there are Old Latin uses of adjectives in the locative, it seems valuable to mention them in the entry for that adjective; but given that Old Latin spelling and grammar may be different from that of later periods, giving a citation of the specific form and context where it is attested seems useful.--Urszag (talk) 16:39, 5 May 2022 (UTC)

@Urszag - What do you think of the approach at facio? Theknightwho (talk) 16:43, 5 May 2022 (UTC)

It looks good! I appreciate the work you've done to add more documentation of Latin verb forms to Wiktionary. I believe forms spelled with "cs" such as "facsint" are also found, but I'm not sure what the best way to handle spelling variants is here or in general (from what I understand, pretty much any Latin word containing "x" could potentially have a spelling variant with "xs" in some texts).--Urszag (talk) 18:12, 5 May 2022 (UTC)

@Urszag You may be right, though facio is hugely overrepresented in mentions, and quite a few of the others are seen only a very small number of times (though the form no doubt existed in more widespread use, just before most Latin was written down). We only have 4 verbs with known sigmatic future passive forms at all (plus the deponent mercor which uses it in the active), so we can only hypothesise that there must have been endings such as -ēssor (2nd conj.) and -īssor (4th conj.) to complement -āssor (1st conj.) and so on.

In any event, I'll do something more rigorous with the locative at some point. Theknightwho (talk) 18:33, 5 May 2022 (UTC)

Formatting for Latin words with multiple pronunciations

Hello again! It looks like there are currently different methods being used to indicate multiple pronunciations for Latin forms with distinct vowel lengths, such as the nominative and ablative singular of first-declension nouns ending in -a, so since you have been editing this type of entry, I wanted to bring that to your attention and discuss which method you think is preferable.

After I made a request for cleaning up definition lines of the form "vocative of X" (where the nominative and vocative were completely the same in both spelling and pronunciation), This, that and the other kindly created and worked on a list of entries where this applies and formatted them as seen in the current revision of summa. As I said in that discussion, I think that the declension table probably provides sufficient indication of the form of each case-number combination that shares its spelling with the lemma, without noting that anywhere else on the page. However, the declension table only provides spellings, not pronunciations. The differing pronunciations corresponding to the spelling have been indicated in this entry by placing them both under a single "Pronunciation" header that has multiple bullets (I don't love the way that this formatting uses up four lines of vertical space when the two lines for Ecclesiastical pronunciations will always be the same for nouns of this class, but I'm not sure I can think of an improvement. It would be more technically difficult, but maybe it would be better to display both pronunciations on the same line? E.g. something like

(Classical) IPA(key): summa /ˈsum.ma/ ; summā /ˈsum.maː/
(Ecclesiastical) IPA(key): summa, summā /ˈsum.ma/

?) Do you think that the current format or something like the suggested modification above seems sufficient to convey all necessary information, or do you think it's better to instead have a separate line lower down on the page that uses the inflection-of template to generate "ablative singular of X", as in the current entry for acceia?

If it turns out that there is a consensus for the second option, there could possibly be an advantage to adjusting the headers used. Apparently, multiple pronunciation headers can be difficult for formatting bots to handle. I think it may be worth asking the Beer Parlour for advice, but I'd guess it might be more convenient to use an empty Etymology 2 header with a second "Pronunciation" header beneath it.

Sorry to bother you about this kind of technical nonsense, but I hope discussing it now will help to save work later if one or the other set of pages need to be redone.--Urszag (talk) 00:54, 11 May 2022 (UTC)

@Urszag Hiya - thank you for raising this, as I think we probably do need to have a discussion about how these are handled, as it's a quirk of Wiktionary's approach to Latin that they aren't hived off to separate pages and treated in their own way there (which I agree with, for the record).

In an extremely formal sense they do technically have separate etymologies, because it is merely coincidental that the nominative and ablative forms happen to be spelled with the same letter. However, if I were to apply that logic to its extreme, the same argument could also be made for the nominative and vocative endings, which are completely identical. The appropriate place to deal with that issue is clearly -a.

I like your suggestion of keeping the pronunciation section on two lines but placing the two pronunciations side by side (though I'd suggest the tiny change that the Ecclesiastical line should display "summa and summā"). The template already does something similar when a macron and breve are both used on the same letter (when pronunciation is uncertain, such as with acadēmī̆a):

(Classical Latin) IPA(key): ,

I do, however, think we need to put two different noun headings under that etymology, because otherwise we aren't actually giving the -ā form a definition at all. Of course it's there in the declension table, but it's very easy to miss. In any event, we should retain consistency with the approach on non-lemma forms, where we already do this.

Theknightwho (talk) 14:28, 13 May 2022 (UTC)

impluit

Since I noticed you were making non-lemma entries for Latin verbs recently, I thought I'd mention that this one has few or no form entries created yet, so you can go ahead and tackle that if you want. Acolyte of Ice (talk) 08:31, 20 May 2022 (UTC)

A belated thanks

I just now noticed you were responsible for adding the parameter to Template:zh-forms for second-round simplifications. Thank you so much! It makes the template so much cleaner than with the ugly little labels that were otherwise required! 104.246.222.113 05:20, 30 May 2022 (UTC)

No problem at all! I've been meaning to work on that a little further at some point, as there are some other forms that could probably do with their own treatment as well. Theknightwho (talk) 09:51, 30 May 2022 (UTC)

ISO 3166-2

I noticed that you've recently done a lot of good work regarding ISO codes, including ISO 3166. What do you think about my proposal to include ISO 3166-2 (subdivision) clippings? See User:Fytcha/FR for an (outdated) sample. — Fytcha〈 T | L | C 〉 16:11, 21 June 2022 (UTC)

@Fytcha Hiya - I'm supportive of this. The template {{ISO 3166}} has room for expansion to deal with this, as the first parameter specifies which part you want to refer to, so it could take the format e.g. {{ISO 3166|2|CH|Fribourg}}. I need to draw up proper documentation, because there are various parameters to deal with certain oddities, but I agree that it would be good to tap into WP's ready-built redirect system. In fact, I was dealing with that exact issue with ISO 639 (language codes) when you wrote your message.

My medium-term aspiration is to create {{ISO}} with a module behind the scenes, because that would allow us to be much more sophisticated in how we deal with these. For example, having look-up tables for country names, which would circumvent having to update every entry if it changes and so on. Theknightwho (talk) 16:21, 21 June 2022 (UTC)
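A rough illustration of the look-up-table idea: entries would pass only codes, and display names plus subdivision suffixes would live in one table, so a renamed country needs a single edit rather than one per entry. This is a Python sketch, not a Scribunto module; iso3166_code, country_name, COUNTRY_NAMES and SUBDIVISION_CODES are hypothetical names, the tables are tiny illustrative samples, and the (part, country, subdivision) argument order simply mirrors the proposed {{ISO 3166|2|CH|Fribourg}} format.

```python
from typing import Optional

# Illustrative sample data only; a real module would keep full tables in one data page.
COUNTRY_NAMES = {
    "CH": "Switzerland",
    "FR": "France",
}

# (country code, subdivision name) -> ISO 3166-2 subdivision suffix
SUBDIVISION_CODES = {
    ("CH", "Fribourg"): "FR",
    ("CH", "Geneva"): "GE",
}


def iso3166_code(part: int, country: str, subdivision: Optional[str] = None) -> str:
    """Return an ISO 3166-1 alpha-2 code (part 1) or an ISO 3166-2 code (part 2)."""
    if country not in COUNTRY_NAMES:
        raise ValueError(f"unknown country code {country!r}")
    if part == 1:
        return country
    if part == 2:
        if subdivision is None:
            raise ValueError("ISO 3166-2 look-ups need a subdivision name")
        try:
            suffix = SUBDIVISION_CODES[(country, subdivision)]
        except KeyError:
            raise ValueError(f"unknown subdivision {subdivision!r} of {country}") from None
        return f"{country}-{suffix}"
    raise ValueError(f"unsupported ISO 3166 part: {part}")


def country_name(code: str) -> str:
    """Resolve a display name from the single shared table."""
    return COUNTRY_NAMES[code]
```

Here iso3166_code(2, "CH", "Fribourg") gives "CH-FR", and changing the "CH" row of COUNTRY_NAMES would update every page that displays the name.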

Have you documented all your fancy templates yet?

love, -- your mum. Equinox ◑ 20:29, 21 June 2022 (UTC)

Ugh. Sorry mum. Theknightwho (talk) 01:16, 22 June 2022 (UTC)

That nice American young man, Charles, came over the other day and he was telling us all about his botanical interests, it was very nice. Did you know there are plants that smell like high schools, to attract incels, which they eat? Oh I hope you'll be home for Christmas. (P.S. I spent the entire afternoon documenting my database schema for my client, and I've only done about a third of it. You finally start to value yourself...) Yep do your documentation, I won't forget. Equinox ◑ 02:29, 22 June 2022 (UTC)

Theoretical Issue

Hey! I want to sincerely thank you for all your recent work on transliterations. I have been trying to work up to getting something like this set up for years and years. Thanks so much for your work. I would praise you more, but I don't want to seem silly. Also, you don't want to be associated with a crazy person like me.
Now, I was looking at diff and I just wanted to comment that IF (if) Shanghai is derived from Wade-Giles (it may be, idk-- I haven't looked into the origins of this word in particular) THEN I would say that Hanyu Pinyin is not the origin point of the word. Neither is Tongyong Pinyin. To me, the origin, the etymology, is telling us "where it came from". So if- if- the word 'Shanghai' is from Wade-Giles, then Hanyu Pinyin and Tongyong Pinyin and etc. are irrelevant to the origin of the term. My guess is that the term 'Shanghai' will predate the earliest form of Wade-Giles (the 'Wade'-only era of Wade-Giles) which reaches back to 1868ish and hence neither Wade-Giles nor Hanyu Pinyin nor Tongyong Pinyin are part of the origin of the word- they are all "johnny come lately" systems piggy-backing off of the ancient romanization systems. idk bro --Geographyinitiative (talk) 19:15, 25 June 2022 (UTC)

@Geographyinitiative You make a good point, and I did think about this. I suppose it's one of those strange situations where we can assume it is very likely that, if Hanyu Pinyin had differed from Wade-Giles, the common English spelling would have followed suit, just like with Beijing. Then again, you're right that causality may run the other way, with spellings such as Shanghai affecting what decisions were then made about the later romanisations. Theknightwho (talk) 19:19, 25 June 2022 (UTC)

That's exactly what happened- Wade-Giles is just as much a "new" system as Hanyu Pinyin and Tongyong Pinyin. Early Wade was playing into the pre-existing mélange of the early 19th century, which was born out of French, Portuguese, etc. transcriptions. They are all stacked on the original romanizations done by some ancient Jesuit priests. Nanchang and Yunnan are some other examples. Idk. --Geographyinitiative (talk) 19:23, 25 June 2022 (UTC)

I think with Shanghai it's probably best to simply mention Hanyu Pinyin without actually saying that the term is derived from it (though I'm unsure when the term was actually adopted and whether WG is the origin). There's probably scope for doing something more extensive, but it would involve digging into the origins of the romanizations themselves, and I don't have the expertise to do that. Theknightwho (talk) 19:28, 25 June 2022 (UTC)

Thoughts

I personally like the 'nonstandard' alt form stuff you are adding to the pinyin pages. Those pages are in dire need of a clean-up- many of them have characters that are not pronounced with the pinyin of the entry. This is some ancient problem that was never fixed. There ought to be a mechanized/programming way to fix it. cf. lū and 謢 --Geographyinitiative (talk) 16:49, 28 June 2022 (UTC)

@Geographyinitiative Thanks - I think what happened is that there was a mass import of the main CJK Unified Ideographs block back in the early 00s, which used Unicode's pronunciation data. Much of that Unicode data has since been deprecated, and a lot of it still contains errors. It's a complete mess. Theknightwho (talk) 16:54, 28 June 2022 (UTC)
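A mechanised clean-up could start from a cross-check like this: for each character listed on a pinyin page, look up its readings in a trusted source and flag any character whose readings don't include the page's syllable. A minimal Python sketch; `READINGS` is a hypothetical stand-in for real reading data, and the two characters are just familiar examples.

```python
import unicodedata

# Hypothetical reading data: character -> tone-marked pinyin readings.
READINGS = {
    "媽": {"mā"},
    "馬": {"mǎ"},
}


def _nfc(text):
    return unicodedata.normalize("NFC", text)


def mismatched_characters(page_syllable, characters, readings=READINGS):
    """Return the characters on a pinyin page that lack the page's reading.

    Characters absent from the reading data are flagged too, since they
    cannot be confirmed either way.
    """
    target = _nfc(page_syllable)
    return [
        ch for ch in characters
        if target not in {_nfc(r) for r in readings.get(ch, ())}
    ]
```

Flagged characters would still need a human check, since the reading data itself may be what's wrong.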

Corr pinyin

Re this edit, shōng is not correct pinyin (see pinyin table). That's what "corr pinyin" refers to in the edit summary. AjaxSmack (talk) 23:10, 30 June 2022 (UTC)

@AjaxSmack See Talk:Pinyin_table#shong. I'm the one who removed "shong" from that table in this diff. Theknightwho (talk) 23:15, 30 June 2022 (UTC)

Thanks. I think you made my case for me. AjaxSmack (talk) 23:34, 30 June 2022 (UTC)
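Checking a spelling against the pinyin table boils down to stripping the tone mark and looking up the bare syllable. A minimal Python sketch; `VALID_SYLLABLES` is a small illustrative excerpt, not the full table, and the function names are hypothetical.

```python
import unicodedata

# Small illustrative excerpt of toneless pinyin syllables (the real table is much larger).
VALID_SYLLABLES = {"shang", "sheng", "shi", "shou", "song", "zhong"}

# Combining tone marks used in pinyin: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030C", "\u0300"}


def strip_tone(syllable):
    """Remove pinyin tone marks while keeping the diaeresis on ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    bare = "".join(c for c in decomposed if c not in TONE_MARKS)
    return unicodedata.normalize("NFC", bare)


def is_valid_pinyin(syllable, table=VALID_SYLLABLES):
    """True if the syllable, ignoring its tone mark, appears in the table."""
    return strip_tone(syllable.lower()) in table
```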

嫇奵 and 方寧

Re: RFVCJK. Are you just verifying the Mandarin pronunciations, or are you also doubting the existence of the word? Pronunciation verification should be done with {{rfv-pron}}, with discussion at WT:TR. — justin(r)leung {{ (t...) | c=› }} 16:35, 4 July 2022 (UTC)

@Justinrleung Just the pronunciations. I'll move them over. Theknightwho (talk) 16:37, 4 July 2022 (UTC)

Great, thanks. — justin(r)leung {{ (t...) | c=› }} 16:38, 4 July 2022 (UTC)

Babel

Since you are working a lot with Chinese languages, it would be much appreciated if you indicated your level in the Chinese varieties on your Babel box. — justin(r)leung {{ (t...) | c=› }} 16:40, 4 July 2022 (UTC)

Nonstandard pinyin spellings

Hi, I see you reverted my entries for nonstandard pinyin spellings. Can you explain why they were removed? I looked at the wiktionary info page and it says "monosyllables with no tone mark" are allowed. I noticed that nonstandard spellings are included for many pinyin words and have not been removed. For example see these search results. OjdvQ9fNJWl (talk) 23:11, 9 July 2022 (UTC)

@OjdvQ9fNJWl You added mama and baba, which are both two syllables (i.e. not monosyllabic). The results you link are all monosyllabic, and a lot of them are standard entries that have links to the toneless form. There are also a handful of other entries that are nonstandard in other ways, but I don't see any polysyllabic pinyin without tones. Theknightwho (talk) 23:19, 9 July 2022 (UTC)

I just noticed it says monosyllabic. Thanks for the clarification. OjdvQ9fNJWl (talk)
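The rule reduces to two checks on the page title: it carries no tone marks, and it is exactly one syllable from the pinyin table. A minimal Python sketch with a hypothetical syllable excerpt and function names; mama fails because the whole string is not itself a syllable, while ma passes.

```python
import unicodedata

# Hypothetical excerpt of the toneless pinyin syllable table.
SYLLABLES = {"ba", "ma", "lu", "shang"}

# Combining tone marks used in pinyin: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030C", "\u0300"}


def has_tone_mark(text):
    return any(c in TONE_MARKS for c in unicodedata.normalize("NFD", text))


def is_permitted_toneless_entry(title, syllables=SYLLABLES):
    """True only for a toneless form that is exactly one pinyin syllable."""
    return not has_tone_mark(title) and title.lower() in syllables
```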

Kugel

You could simply revert because it's not correctly formatted and because we don't do reconstructions based on a single reflex - so that they don't continue to look embarrassing.

This is unfair, because it does not look like OR and may very well be sourceable (Texas' PIE lexicon, maybe), but that's usually not enough to make a reconstruction entry, which is needed to format it correctly and without permanent red links. Even if it (the PIE root) needs to be there (in the main entry), it would be better served at the lower nodes. This resolves the stalemate. The rest of the entry is a mess as well.

If they, like me, don't have the material to deal with MHG, they have hardly any business adding etymologies from books that they can only judge by the cover. Needless to say I can't agree with the reconstruction. ApisAzuli (talk) 03:31, 10 July 2022 (UTC)

Cantonese readings

青陽 and 相山 are unlikely to be correct and do not inspire great confidence. Please stop. —Fish bowl (talk) 04:34, 12 July 2022 (UTC)

In addition, can you confirm that 凤台, 台江, 石台 etc. use 台 and not 臺 in trad. Chinese? —Fish bowl (talk) 04:38, 12 July 2022 (UTC)

I haven't done a thorough check, but those do seem to be the forms in use. I am seeing evidence for 臺 being in use with all three in some publications (though not so much 臺江區), but unsure if that should take the primary entry with any of them. Theknightwho (talk) 05:07, 12 July 2022 (UTC)

New Area

Hey, you're doing a good job out there. I am thinking about expanding the footprint of Wiktionary's English to more provinces or in more detail. Let me know if you are interested in any particular type of expansion or any region or have any ideas. Geographyinitiative (talk) 15:46, 13 July 2022 (UTC)

@Geographyinitiative Thank you! My current aim is to get every county-level division entered, as they're clearly of genuine importance (to some people, anyway). Where there are multiple places, I've been trying to list in order of importance.

One thing that would be good to have is a consistent way of dealing with ancient names that have described the same area for a long time, but may have been attached to several different administrative entities over the years. Grouping them all together isn't always appropriate, but equally it feels silly to list every incarnation, as that implies they're unconnected. Theknightwho (talk) 15:52, 13 July 2022 (UTC)

ar:Suf/Ningxia

Hey, great work out there. My question for you is this: I noticed you made something special for Tibet where "ar:Suf/Tibet" produces "Tibet Autonomous Region". I'm thinking about working on expanding Ningxia coverage on Wiktionary, and I would feel better about it if ar:Suf/Ningxia produces "Ningxia Hui Autonomous Region" or similar. (1) I don't know how to do it but (2) even if I did know how, not sure if it will be a good idea. Let me know if you have any thoughts on this. --Geographyinitiative (talk) 13:43, 21 July 2022 (UTC)

@Geographyinitiative Hiya - so adding ":suf" after the entity type (i.e. "ar" in this case) adds it after the name, and putting ":Suf" capitalises it. ":pref" and ":Pref" work in the same way. You could put "p:Suf/Shandong" and it'd say "Shandong Province", for example. The tricky thing for Ningxia is that "Hui autonomous region" isn't a special entity type, and I'm not sure it's a good idea to add a bunch of special entities for all of the various ethnic minorities, as I wouldn't say those are the most common names. Tibet is a bit of a special case, as there is obvious and widespread international disagreement on the legitimacy of the Chinese regime there.

Is there a particular reason for wanting to do the same for Ningxia? Theknightwho (talk) 13:50, 21 July 2022 (UTC)

Thanks for your reply & helpful explanation. I personally feel that merely writing "Ningxia" gives one the impression that "Ningxia" is just another province of China, like Shandong or Jiangxi, whereas writing the full official name (with "Hui") gives you the idea that this is a specially-designated place within the administrative structure of the PRC. If Wiktionary writes "Ningxia" only, it's just another cog, another geographical entity to fit into the cookie-cutter notion of "just another administrative division". Giving the full name feels more authentic, more interesting, more professional, and more respectful generally.
But maybe not. Anyway, I will proceed with what I was going to do regardless of this issue. --Geographyinitiative (talk) 13:56, 21 July 2022 (UTC) (modified)
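The `:suf`/`:Suf` behaviour described above can be modelled as a small parser. This Python sketch is illustrative only: the `PLACE_TYPES` mapping and the function name are assumptions drawn from the examples in this thread (p:Suf/Shandong, ar:Suf/Tibet), not the actual module data, and only the suffix modifiers are modelled because the exact output of :pref isn't shown here.

```python
# Hypothetical abbreviation -> place-type words, based on this thread's examples.
PLACE_TYPES = {
    "p": "province",
    "ar": "autonomous region",
}


def render_place(spec):
    """Render 'type:suf/Name' or 'type:Suf/Name' as 'Name <place type>'."""
    head, _, name = spec.partition("/")
    if not name:
        raise ValueError(f"expected 'type:modifier/Name', got {spec!r}")
    place_type, _, modifier = head.partition(":")
    words = PLACE_TYPES[place_type]
    if modifier == "suf":
        return f"{name} {words}"
    if modifier == "Suf":
        # The capitalised modifier capitalises each word of the place type.
        return f"{name} {words.title()}"
    raise ValueError(f"unsupported modifier: {modifier!r}")
```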

H-Hold on...

I edited the page prae- by analogy with sub- and I didn't think it was a good idea to have a duplicate identical etymology on two different pages. The pages mână and mwin were edited because there was no consensus between linguists beyond Latin (see discussion). Catonif (talk) 23:08, 21 July 2022 (UTC)

@Catonif To be fair, prae- the prefix stands on its own merits, though we should probably mention how it came to be. This kind of etymology duplication is pretty common, though there should be something in the works to automate a lot of it (as obviously the current situation is less than ideal when they can get out of sync).

I hadn't realised there'd been discussion re the reconstructed forms. I would suggest that we mention the different theories, rather than removing them altogether. Apologies for being a bit gung-ho with it. Theknightwho (talk) 23:16, 21 July 2022 (UTC)

Ah yes, we should mention the different theories. I thought that it would be better for a multiple possibility etymology to be written only once in the Latin and/or Proto-Italic page, instead of copy-pasting it to all Romance terms. I wanted to write it but got lost in reconstructed languages dictionaries, and in the end decided to remove the inconsistencies from the Romance entries and left the Latin entry alone for future adventurers to decipher. Catonif (talk) 23:26, 21 July 2022 (UTC)

@Catonif Hopefully this will eventually iron itself out, once we can start pulling through etymologies into descendants, in any event. It's one of those things we just haven't got around to yet. Theknightwho (talk) 23:33, 21 July 2022 (UTC)

So, uhm, am I allowed to re-revert your reverts on the manus pages? As it stands the entries disagree on the etymologies, and some of them are just wrong. Some mention meh2- (which means "good" and is unrelated), some (s)meh2- (which means "to beckon" and is unrelated), some mon- (which de Vaan actually mentions, but it is supposed to mean "neck" and here it links to "man"), and some meh2r (which is actually one of the theories, even though here it's a weirdly formatted redlink). Why keep this mess on the Romance entries?

Oh about prae-, can't we do something like "Akin to prae. See there for more."? Sorry to bother you with these word-specific questions, I'm asking this to have an example on what I should do elsewhere. These repetitions get messy and gain incongruences, so I don't see why we'd prefer them. Catonif (talk) 14:13, 22 July 2022 (UTC)

Retropinging

Adding a ping after the fact is totally useless. See Wiktionary:Beer_parlour/2021/June#A Primer on Proper Pinging, where I explained it more fully. Chuck Entz (talk) 04:55, 6 August 2022 (UTC)

Doesn’t it work if you re-do your signature? Theknightwho (talk) 10:08, 6 August 2022 (UTC)

It does not. You can enable notifications for mentions (pings) if you want to know when it works. J3133 (talk) 10:14, 6 August 2022 (UTC)

Check this out- Shanghai

Hey, I'm trying to drum up interest in determining the origin of the English language word 'Shanghai' here: Scriptorium Thread, (see Citations:Shanghai for some early cites I found, as well as Citations:Shang-hai and Citations:Shanghae). Why? Because (1) it's objectively not Wade or Pinyin, (2) it's an important, interesting word known to most native English speakers, (3) we just do not even know when/where/how it came into English. If you have any thoughts or additional variants or similar, go for it my man. --Geographyinitiative (talk) 22:36, 10 August 2022 (UTC)

@Geographyinitiative Sorry for the slow response! I will have a think about this. Theknightwho (talk) 20:51, 3 September 2022 (UTC)

CAT:E

Hey! Your changes to Template:mn-proper noun have caused some errors on some Cyrillic Mongolian pages. — Fytcha〈 T | L | C 〉 13:09, 19 August 2022 (UTC)

@Fytcha Hiya - thanks for the heads up. I had checked a few lemmas that would be affected, but a handful seem to have followed a different format for some reason. All sorted, anyway. Theknightwho (talk) 13:16, 19 August 2022 (UTC)

Thanks for taking care of it! — Fytcha〈 T | L | C 〉 13:24, 19 August 2022 (UTC)

temporizedst, stumpedst

Errrr I never thought I'd have to ask you this, but are these words according to anybody's standards, or are you just randomly attaching st to things? Equinox ◑ 14:13, 2 September 2022 (UTC)

I've been working on {{en-conj}} (see below). I've been taking the approach Latin does of not being super worried about attesting inflections so long as the word actually existed at the time - we can certainly attest temporizeth, for example. It's just weird to include some inflections but not all without any evidence to suggest it was defective, as that's just misleading. Yes, it's English not an LDL, but historical stuff is always a bit different.

Conjugation of *temporize*
	present tense	past tense
infinitive	(to) temporize
1st-person singular	temporize	temporized
2nd-person singular	temporize, temporizest†	temporized, temporizedst†
3rd-person singular	temporizes, temporizeth†	temporized
plural	temporize	temporized
subjunctive	temporize	temporized
imperative	temporize	—
participles	temporizing	temporized

† Archaic or obsolete.

Theknightwho (talk) 14:19, 2 September 2022 (UTC)

Template is nice. It's also quite possible to say "if anyone had ever used the past participle of the verb, it would have been this, but we cannot find it". I'm not gonna be so petty as to RFV these (probably) but I did wonder what you were up to. Equinox ◑ 16:00, 2 September 2022 (UTC)

@Equinox That's a good idea. Nightmarish for something like Latin inflections, but given the small numbers involved with English it'd work. Theknightwho (talk) 16:34, 2 September 2022 (UTC)

My only serious concern with automating this stuff at all is that we might end up creating archaic forms for verbs that didn't exist in that time (thou defragmentedst the hard drive). Equinox ◑ 23:50, 2 September 2022 (UTC)

Agreed - the old forms are opt-in with old=1, which should hopefully prevent that. Theknightwho (talk) 23:54, 2 September 2022 (UTC)
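The opt-in idea can be sketched in a few lines of Python: archaic -st/-th forms are only generated when `old=True` (standing in for old=1). The endings follow the temporize table above; the naive regular past in -d/-ed, the plain -s third person, and the function names are simplifying assumptions, and this is not how {{en-conj}} is actually implemented.

```python
def regular_past(verb):
    """Naive regular past: 'temporize' -> 'temporized', 'stump' -> 'stumped'."""
    return verb + "d" if verb.endswith("e") else verb + "ed"


def conjugate(verb, past=None, old=False):
    """Return a dict of forms; archaic forms are added only if old=True."""
    past = past or regular_past(verb)
    stem = verb[:-1] if verb.endswith("e") else verb
    forms = {
        "infinitive": [verb],
        "pres_2sg": [verb],
        "pres_3sg": [verb + "s"],  # no -es after sibilants in this sketch
        "past_2sg": [past],
        "pres_participle": [stem + "ing"],
        "past_participle": [past],
    }
    if old:
        # -est/-eth after a consonant, -st/-th after a final e.
        link = "" if verb.endswith("e") else "e"
        forms["pres_2sg"].append(verb + link + "st")
        forms["pres_3sg"].append(verb + link + "th")
        forms["past_2sg"].append(past + "st")
    return forms
```

Called without old=True, a modern verb like defragment never acquires forms such as defragmentedst.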

Admin (don't panic)

As a follow-up to this discussion, really: do you want the vote? You spend a lot of time around here, and you seem generally fair-minded and logical, and to know what you are doing. (I know you had an angry "Equinox moment" today where you yelled at Dan, but that's atypical. And nobody is trying to deop me yet. Ha.) Equinox ◑ 23:34, 16 September 2022 (UTC)

@Equinox I will think about it! I may have ruffled too many feathers at the moment, but it would certainly be very useful to be able to nab vandals, and there are a fair few templates I've needed permission for and so on. Theknightwho (talk) 00:52, 17 September 2022 (UTC)

Ooh, Eq, I wanted to nominate him! Almostonurmind (talk) 20:30, 17 September 2022 (UTC)

Uhoh. Theknightwho (talk) 20:43, 17 September 2022 (UTC)

Moved page ´ to ◌́

Hello @Theknightwho. I saw that you moved the pages ` to ◌̀ and ´ to ◌́. It broke the old links in some of the Macedonian sections. I tried to fix them and succeeded for ◌̀, but not for ◌́. Can you please edit the Macedonian "See also" section and try to fix the link from ´ to ◌́? Thanks. Gorec (talk) 15:38, 17 October 2022 (UTC)

@Горец Hiya - I've fixed the link. Putting a colon at the start of the parameter in the link template overrides the removal of diacritics. You also need to prevent the colon from displaying by using the second field for alternate display. Theknightwho (talk) 15:46, 17 October 2022 (UTC)

Thank you! 👍 I see, I tried different options but none of them worked. Thanks for the clarification. Gorec (talk) 16:08, 17 October 2022 (UTC)
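A simplified model of why the leading colon helps: the link template normally derives the page name by stripping diacritics, which reduces ◌́ to a bare ◌, whereas a leading colon passes the title through unchanged. In this Python sketch `link_target` is a hypothetical name, and stripping every combining mark is a simplification of Wiktionary's language-specific rules.

```python
import unicodedata


def strip_diacritics(text):
    """Drop all combining marks (a simplification of per-language rules)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", kept)


def link_target(term):
    """A leading ':' bypasses diacritic stripping and is itself removed."""
    if term.startswith(":"):
        return term[1:]
    return strip_diacritics(term)
```

Since the colon is removed only from the target, the display text still has to go in the second field.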

Bolding of clippings etc.

Did you go ahead and make this change, after discussion on Discord? I just noticed it at the "clipping" sense at mongo. I don't like the new formatting. The entire line should be in italics (as it used to be), to indicate this is a "non-gloss" or whatever we call it, and it's not a noun that means "a clipping of something". Should have been a vote... Equinox ◑ 15:19, 9 November 2022 (UTC)

I haven’t touched it! Agree that anything like that should be put to a vote. Theknightwho (talk) 16:04, 9 November 2022 (UTC)

@Equinox I've just spotted that it's because it was using {{clipping}} (an etymology template), not {{clipping of}}. I've corrected it. Theknightwho (talk) 16:51, 9 November 2022 (UTC)

Speedy deletion

Hi, please don't blank pages that you nominate for deletion. If they're misspellings, include a link to the right spelling in the rationale and they may be useful as (nonexistent) redirects. Ultimateria (talk) 03:37, 10 November 2022 (UTC)

I don't usually, but these were bad orthographies that we don't want (even as redirects), because they contain mistakes that we don't want to propagate. Noted re including the right spelling in the note, though. Theknightwho (talk) 03:41, 10 November 2022 (UTC)

Serbo-Croatian entry name normalization

Hello. One of your module edits has resulted in the unlinkability (via Module:links) of Serbo-Croatian pages whose title contain the character ć.

Examples:

{{m|sh|vruć}} currently generates a red link vruć (pointing to vruc) instead of a working link to vruć.
{{m|sh|Janković}} generates the deceptive blue link Janković (to the English entry Jankovic) instead of Janković.

The diacritic on ć is a standard part of Serbo-Croatian orthography that shouldn't be stripped away from page names, unlike e.g. the diacritic on ȉ. The only reason I even noticed this was the module error on tisuća. Hopefully the problem is confined to this one character in one language, but I haven't tested extensively. Cheers, 98.170.164.88 09:21, 24 November 2022 (UTC)

Thanks - I thought I'd caught all of these! I've implemented a remove_exceptions parameter for entry_name, which excludes specific characters from having their diacritics removed. Currently, it will only work for precomposed characters (as I wanted to get it working ASAP), but I'll fix up a general solution shortly. Theknightwho (talk) 09:59, 24 November 2022 (UTC)

(As a side point, I was very confused as to why your links didn't fix themselves once I made the change, and thought I'd made a mistake! Just realised that you've hard-coded the links haha.) Theknightwho (talk) 10:01, 24 November 2022 (UTC)
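A Python sketch of the `remove_exceptions` approach for Serbo-Croatian: accent marks are stripped from every character except listed precomposed ones, so ȉ becomes i while ć survives. The set of stripped marks is an assumption for illustration, not the module's actual data, and, like the fix described here, it only protects precomposed characters.

```python
import unicodedata

# Assumed accent marks to strip: grave, acute, macron, double grave, inverted breve.
STRIPPED_MARKS = frozenset({"\u0300", "\u0301", "\u0304", "\u030F", "\u0311"})
# Hypothetical exception list: precomposed letters whose diacritic is orthographic.
REMOVE_EXCEPTIONS = frozenset({"\u0107", "\u0106"})  # ć, Ć


def entry_name(term, marks=STRIPPED_MARKS, exceptions=REMOVE_EXCEPTIONS):
    """Strip accent marks, leaving excepted precomposed characters intact."""
    out = []
    for ch in unicodedata.normalize("NFC", term):
        if ch in exceptions:
            out.append(ch)  # e.g. keep ć instead of reducing it to c
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if c not in marks))
    return unicodedata.normalize("NFC", "".join(out))
```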

ийс

After doing hundreds of null edits to clear the results of an error you made yesterday in a massively transcluded module, I found an error at ийс that superficially defies explanation: "The current page name 'ийс' does not match any of the numbers listed in Module:number list/data/inh for 9. Check the data module or the spelling of the page." When I look at Module:number list/data/inh, the relevant part has:

numbers = { cardinal = "ийс", }

Without examining the character codes, I can't see any difference. Indeed, wikilinking the entry name, ийс and the string in the module, ийс, gives links to the same page stating that there's no match.

Going through the transclusion list at ийс, I see that the only recent edits were all by you, and they seem to have revolved around dealing with diacritics. Not so coincidentally, this is the only item in Module:number list/data/inh that has a diacritic.

I don't have the time nor the expertise to figure out what the exact problem is, so you're going to have to fix this. Thank you. Chuck Entz (talk) 19:31, 25 November 2022 (UTC)

@Chuck Entz Caught the issue. It was to do with the fact that I was decomposing all precomposed characters (which includes й) in order to strip the appropriate diacritics, which avoids the need for massive lists of precomposed characters for the diacritics you want to strip (those lists were contributing to memory usage, and are also a general PITA to maintain). Certain languages use dedicated modules for entry names (in this case MOD:inh-entryname). In those particular cases, what I'd omitted to do was recompose the characters afterwards. That usually doesn't matter, as the wiki software accounts for it automatically. However, as Lua doesn't natively support UTF-8, it was comparing й with и + ◌̆ and declaring them to be different. Hence, the links worked, but they weren't being recognised as the same by the module. Theknightwho (talk) 20:27, 25 November 2022 (UTC)
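The bug can be reproduced outside MediaWiki. In this Python sketch (standing in for the Lua module; `entry_name` and the stripped marks are hypothetical), decomposing to strip diacritics and then skipping recomposition leaves й as и plus a combining breve, which does not compare equal to the precomposed й in the data module. Python compares code points where Lua compares bytes, but neither normalises before comparing, so the outcome is the same.

```python
import unicodedata

# Hypothetical marks this language strips; the breve is deliberately kept.
STRIPPED_MARKS = {"\u0300", "\u0301"}


def entry_name(term, recompose=True):
    decomposed = unicodedata.normalize("NFD", term)  # й -> и + U+0306
    stripped = "".join(c for c in decomposed if c not in STRIPPED_MARKS)
    # Without this final step the result stays decomposed.
    return unicodedata.normalize("NFC", stripped) if recompose else stripped


page_title = "\u0438\u0439\u0441"  # ийс, with precomposed й (U+0439)
```

Recomposing with NFC at the end keeps the simpler decompose-everything approach while restoring characters like й that were never meant to change.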

sortkey changes

Hi, can you explain all your sortkey changes to Module:languages/data2? They have led to a bunch of errors in CAT:E related to Module:collation. Benwing2 (talk) 21:04, 27 November 2022 (UTC)

@Benwing2 It was the latest change to Module:languages that caused the error - I'll have to work out what the issue is, as it seems to be cropping up on a small percentage of pages using the column templates. I've reverted it for now. Theknightwho (talk) 21:06, 27 November 2022 (UTC)

What I meant is, what is the overarching purpose of these changes? Is it to save memory? If so are you sure it actually saves memory? Adding new modules tends to increase memory. Benwing2 (talk) 21:08, 27 November 2022 (UTC)

@Benwing2 Yes. It ensures that the sortkeys are only loaded if that language is actually used on the page. Theknightwho (talk) 21:09, 27 November 2022 (UTC)

OK. Please monitor CAT:E for memory-related issues once you finish making your changes, as their occurrence often doesn't follow obvious logic. Benwing2 (talk) 21:11, 27 November 2022 (UTC)

Yep, absolutely. Theknightwho (talk) 21:14, 27 November 2022 (UTC)
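The idea behind the split is lazy loading: a language's sortkey code is loaded only the first time that language is sorted on a page. A Python sketch of the pattern, not the real Module:languages code; the class, the loaders and the load counter are hypothetical.

```python
class SortkeyRegistry:
    """Load each language's sortkey function on first use only."""

    def __init__(self, loaders):
        self._loaders = loaders  # language code -> zero-argument loader
        self._loaded = {}  # cache of already-loaded sortkey functions
        self.load_count = 0  # number of separate loads performed

    def sortkey(self, lang, text):
        if lang not in self._loaded:
            self._loaded[lang] = self._loaders[lang]()
            self.load_count += 1
        return self._loaded[lang](text)


# Hypothetical loaders; each stands in for a separate sortkey module.
LOADERS = {
    "en": lambda: str.lower,
    "de": lambda: (lambda s: s.lower().replace("ß", "ss")),
}
```

Languages absent from the page are never loaded, but a page with dozens of languages triggers dozens of separate loads, which is where the per-load overhead comes in.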

I think the problem is that you made it so that some conditional branches resulted in the function returning a nil value, which cannot be compared. 98.170.164.88 21:09, 27 November 2022 (UTC)

Yep - I see the issue. Silly mistake. Theknightwho (talk) 21:09, 27 November 2022 (UTC)
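The nil-return bug is easy to reproduce: if one branch of a sortkey function falls through without returning, sorting fails as soon as that missing value is compared with a string. In Python (standing in for Lua, where the value would be nil) the missing value is None and `sorted` raises TypeError; the fix is to make every branch return a string. The function names and the fallback are illustrative.

```python
def broken_sortkey(text):
    if text.isascii():
        return text.lower()
    # Falls through here, so non-ASCII input yields None.


def fixed_sortkey(text):
    if text.isascii():
        return text.lower()
    return text.casefold()  # every branch now returns a string


def sort_terms(terms, key):
    return sorted(terms, key=key)
```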

I see 16 pages in CAT:E with memory errors; these could be related to your changes if you pushed them live. Benwing2 (talk) 00:51, 29 November 2022 (UTC)

I'm looking into it. I'm seeing a reduction in memory usage on most pages, but these are odd outliers. Theknightwho (talk) 00:52, 29 November 2022 (UTC)

Probably related to how many languages occur on a given page; with your changes, lots of little modules are loaded, with one being loaded every time a sort key for those languages needs to be created (since they contain functions, meaning loadData can't be used), and module loads appear to have significant memory overhead. Benwing2 (talk) 01:12, 29 November 2022 (UTC)

I suspect you're right. I also suspect there are some horrors lurking in some of the language-specific modules, but it's such a massive task to start hunting for them.

I'm trying to find what the function loadData actually looks like, to see how they manage to share memory between different #invoke calls. Might give some insight into what approach we could take. Theknightwho (talk) 01:26, 29 November 2022 (UTC)

@Benwing2 I've done a nasty hack, by taking advantage of the package.loaded logic. It hasn't got rid of all the errors, but it has made them go away on an. The way that package.loaded works is that if a module has already been loaded via a previous invoke, then a key/val pair will exist in the package.loaded table - presumably allowing the module to be run again with a smaller footprint. What I've done is pre-load the module, and then set the val to the output (i.e. the sortkey). Any times the module is subsequently "run", it will just output the string (bypassing the module logic).

This only works due to the fact that a page will only ever have one sortkey for a given language. As a result, this fudge won't work for entry name functions. Theknightwho (talk) 02:02, 29 November 2022 (UTC)
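The package.loaded trick relies on Lua's `require` returning whatever is already stored in `package.loaded[name]` instead of re-running the module. This Python sketch models that lookup with a plain dictionary; `require`, `LOADERS`, `page_sortkey` and the module name are hypothetical. After the first call, the cached entry is overwritten with the computed sortkey string, so later calls skip the module logic entirely.

```python
# Stand-in for Lua's package.loaded table.
package_loaded = {}
run_count = {"n": 0}


def sortkey_module():
    """Pretend module body: costly to run, returns a sortkey function."""
    run_count["n"] += 1
    return lambda title: title.lower()


LOADERS = {"sortkey/xx": sortkey_module}  # hypothetical module name


def require(name):
    # Mirrors Lua: return the cached value if present, else run the loader.
    if name not in package_loaded:
        package_loaded[name] = LOADERS[name]()
    return package_loaded[name]


def page_sortkey(title, name="sortkey/xx"):
    cached = require(name)
    if isinstance(cached, str):
        return cached  # the hack: the entry already holds the final sortkey
    key = cached(title)
    package_loaded[name] = key  # overwrite the module with its output
    return key
```

A second call with a different title still returns "kugel", which is exactly why this only works when a page has a single sortkey per language.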

Did you ever get your "hack" working properly? If not I think you should at least consider undoing all the sortkey changes as a failed experiment -- from what I've seen, they increase rather than decrease memory on the most memory-intensive pages, which are the only pages that matter for these purposes, because they result in lots of small modules getting repeatedly loaded on pages that use lots of languages. The reason those pages no longer appear in CAT:E is because IP 98.* has been diligently converting all the pages to use the *-lite templates, which is a big hack that we should avoid if possible. Benwing2 (talk) 03:20, 5 December 2022 (UTC)

I'm working on it! I'd rather not just roll these back, as quite a few were amended in the process for various reasons, so would need to be changed back manually. I'll have a look in more detail again tomorrow. Theknightwho (talk) 03:22, 5 December 2022 (UTC)

There are still scads of Latin entries popping up in CAT:E that have to be due to one of your module edits (but which one- who knows?). They go away after a null edit, but they make it hard to see real module errors. There's also the matter of added work for the servers while they propagate all this tinkering to the entries.

Rolling out something like this on such a massive scale without waiting to see what kinks need to be worked out is a very bad idea- there are literally millions of entries that could be affected by some of the edits you've done. I've blocked people for less. Chuck Entz (talk) 04:11, 5 December 2022 (UTC)

I've noticed the Latin issue too. The first time I saw it I went through and manually null-edited everything, which took a while due to hitting a ratelimit after every 10 or so entries. But that was probably about five days ago, and this is still going on. Is there a bot we can get to automatically null-edit these entries? 98.170.164.88 16:38, 5 December 2022 (UTC)

@Benwing2 I've done quite a bit of experimentation with this, and have implemented some memory savings. As I mentioned to 98 in the thread below, the issues with Lua 5.1's garbage collection make memory savings unpredictable, which means that we can't say a change is a failure just because it causes some pages to start throwing errors. What we aren't seeing is the number of other memory-critical pages which haven't started throwing errors, as they're now using 48MB instead of 49.5MB. It seems like there will always be casualties with any changes that we make, unfortunately. For example, towards the bottom of this Phabricator thread, Surjection mentions a particularly ridiculous example, where the creation of the extra data modules for languages actually increased memory usage on some pages. I've also noticed that completely trivial changes to modules (e.g. swapping the order of two minor functions, with no change in result) will cause massive increases/decreases in memory usage on certain pages for no obvious reason. Theknightwho (talk) 00:15, 9 December 2022 (UTC)

Sorry to be a pain but do you have evidence that there are a lot of pages that have decreased from 49.5MB to 48MB? In this case quite a lot of pages increased their memory as a result of this change, and the number of pages using the 'lite' templates has significantly increased. I think just saying "it's counteracted by several pages that decreased their memory" is not a good response. I have in general refrained from sweeping changes that try to optimize memory for precisely this reason. Benwing2 (talk) 04:10, 9 December 2022 (UTC)

It's going to be quite difficult to prove, but I can try to put something together. Theknightwho (talk) 04:11, 9 December 2022 (UTC)

Maybe it would be good to have a daily (or more frequently updated) log of how much memory all the critical pages use, so we can track how that changes. 98.170.164.88 05:46, 9 December 2022 (UTC)
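A minimal Python sketch of the comparison step such a log would need, assuming two snapshots that map page titles to peak Lua memory in MB. The 50 MB limit is an assumption consistent with the figures mentioned above, `compare_snapshots` is a hypothetical name, and any page names or numbers fed to it here are placeholders, not measurements.

```python
LIMIT_MB = 50.0  # assumed Lua memory limit per page


def compare_snapshots(before, after, limit=LIMIT_MB, margin=2.0):
    """Summarise memory changes for pages present in both snapshots.

    Only pages within `margin` MB of the limit in either snapshot are
    treated as memory-critical; the rest are ignored.
    """
    report = {"improved": [], "regressed": [], "now_failing": []}
    for page in sorted(before.keys() & after.keys()):
        old, new = before[page], after[page]
        if max(old, new) < limit - margin:
            continue  # not memory-critical
        if new >= limit > old:
            report["now_failing"].append(page)
        elif new > old:
            report["regressed"].append(page)
        elif new < old:
            report["improved"].append(page)
    return report
```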

Please excuse the delay - I’ve been dealing with some real life stuff most of today, which has meant no time to look at this. Theknightwho (talk) 22:48, 5 December 2022 (UTC)

Module errors

Please add a return statement to the end of this function. Thanks. 98.170.164.88 21:05, 27 November 2022 (UTC)

`{{lb}}` broken

As of writing, {{lb}} doesn't generate any links or categorize entries as it should. You recently made some edits to Module:labels and its data submodules, which seem likely to be the cause. As soon as this change begins propagating, topical and regional categories will begin to depopulate, so I suggest reverting or otherwise fixing the issue ASAP.

Might I suggest that in the future you test your changes to widely-used modules in a sandbox first? 98.170.164.88 18:28, 7 December 2022 (UTC)

It was certainly working for several pages which I checked, but evidently not others (and I can already see why). Frustrating. I'll do more extensive testing in future. Theknightwho (talk) 18:37, 7 December 2022 (UTC)

FYI, prior to the revert I also noticed a bunch of pages in CAT:E with memory issues, including mi. Now the category is practically empty. I'm not sure whether your change is the culprit, as it seems it should have had the opposite effect, but I don't know how else to explain this. 98.170.164.88 18:49, 7 December 2022 (UTC)

The problem that keeps recurring is that a change might improve 20 memory-critical pages, but might make 10 others worse at the same time. I tried to find some sort of pattern with Erutuon, but we couldn't find one. Theknightwho (talk) 18:53, 7 December 2022 (UTC)

As of writing, CAT:E again has 3 mainspace entries in it (including mi), up from zero mainspace entries as of my previous comment, but down from the number I saw earlier today. I guess I'll change mi and na to use lite templates, and angel probably needs a translation subpage. 98.170.164.88 19:18, 7 December 2022 (UTC)

You're right. Frankly, what we need is an increase to the memory limit, but in the absence of that I'm going to keep looking for ways to reduce it. Theknightwho (talk) 19:39, 7 December 2022 (UTC)

more memory errors

We now have 20 pages once again in CAT:E with memory errors, probably resulting from your change to Module:scripts. I really don't see why you keep making changes like this; I would strongly recommend holding off on any more changes to core modules for at least several weeks. Benwing2 (talk) 05:13, 9 December 2022 (UTC)

@Benwing2 It's the beginning of the deprecation of {{zh-l}} et al, which is sorely needed. I have spoken to some of the Chinese editors about this. Theknightwho (talk) 05:16, 9 December 2022 (UTC)

Have you even reversed the sortkey changes yet? I have zero interest in fixing the existing memory errors, because it seems there are always more every single day due to some change. — SURJECTION / T / C / L / 07:19, 9 December 2022 (UTC)

Not at this stage, and doing so en masse would introduce yet more unpredictability. Theknightwho (talk) 07:21, 9 December 2022 (UTC)

The only unpredictable thing is all the new changes. Before the sortkey changes, there were zero module errors. After them, there were dozens. It's like you're not taking this memory issue seriously at all. — SURJECTION / T / C / L / 07:26, 9 December 2022 (UTC)

I completely agree. I feel now we should back out all the sortkey changes, forcibly if needed. And once again, please defer all further changes to core modules, including Chinese ones. Benwing2 (talk) 07:29, 9 December 2022 (UTC)

Obviously I am taking it seriously, which is why I am working to solve the issue. If we roll back all of the changes, that will also undo a large amount of work which did more than re-implement what was already there in a different format. Theknightwho (talk) 07:40, 9 December 2022 (UTC)

The practice should be that memory issues take priority above anything else. If there's so much as a single page in CAT:E that fails due to a memory error, there should be no changes whatsoever to core modules to add functionality. — SURJECTION / T / C / L / 07:57, 9 December 2022 (UTC)

The memory issues are solvable. We should not be left in a position where it is practically impossible to add functionality. Theknightwho (talk) 08:02, 9 December 2022 (UTC)

Having memory issues on entries is unacceptable. Forcing some editors to use workarounds is not. — SURJECTION / T / C / L / 08:03, 9 December 2022 (UTC)

Which is why I am working to solve the issue. Ultimately, it is not great that we are forced to use lite modules, but it is the situation that we are in. Theknightwho (talk) 08:11, 9 December 2022 (UTC)

You say that, but you made a massive change that turned out to be a failure and only increased memory usage (the whole sortkey thing), and then, instead of working to reverse it, started working to integrate even more functionality into the core modules, which only exacerbates the problem. — SURJECTION / T / C / L / 08:19, 9 December 2022 (UTC)

We have no way of knowing that it increased memory across the board without doing a more systematic check. As you and I both know, changes that should by all rights reduce memory usage do not always (ever?) do that (e.g. fan), and the tests that I did do were showing reduced usage. By its very nature, CAT:E only shows the problems, not the successes. Theknightwho (talk) 08:22, 9 December 2022 (UTC)

Even if it decreased memory usage on pages that were close to but under the memory limit*, that doesn't really matter much IMO. The status quo ante was that CAT:E was usually empty or close to it, and occasionally a page or two would go over the limit and then someone would apply lite templates to that one entry. That was manageable. The result of your changes was to cause dozens of pages to overflow the limit in a matter of weeks. Just look at the history of Template:m-lite. Over half of the edits were in the past month alone, even though the template was created a year ago.

As for whether to revert these module changes or not, I don't know. I can certainly see a case for it, but if your edits weren't just restructuring, but also made substantive improvements that would affect output, I obviously wouldn't want to get rid of that useful work. I also fear that switching back will end up causing issues. As you pointed out, it seems any change will reduce memory usage on some pages and increase it on others, implying there's a cost inherent to transitioning. But I would at least advise against making any more sweeping Module edits without public discussion of them, and testing, first.

* I wish I had extracted memory usage data for the critical pages right before you made the edits so we could check this claim more rigorously. I'm at least skeptical, based on what I've personally observed. 98.170.164.88 11:34, 9 December 2022 (UTC)

That is fair. I am not going to make any further changes for the time being, because there seems to be no obvious way to proceed. Theknightwho (talk) 11:40, 9 December 2022 (UTC)

There are currently no (relevant) errors in CAT:E. I don't anticipate any more should appear, though I will deal with any if they do. Theknightwho (talk) 17:27, 9 December 2022 (UTC)

I don't think it should be the case that even a single extra memory error should block changes to core functionality, because then nothing can ever get done, but it should be the case that changes to core modules are done very carefully, and if several new memory errors arise, the changes should generally be undone. I've been able to make lots of changes to core modules without causing CAT:E to fill up with memory errors, and sometimes I've had to restructure my changes when they did cause such errors. I've also tried to reiterate several times that splitting modules into small pieces is *NOT* the way to decrease memory, but that's exactly what you did in the sortkey changes. I'm not sure of all the changes you made recently, but I would still advise backing out the sortkey changes. It shouldn't be too hard to do so; can't you just restore the sort keys to the language data modules? As for further changes, I would advise thinking of this the way companies that do software engineering do: before you make a big change like obsoleting {{zh-l}}, create a design document outlining exactly what you plan to do and have it reviewed by people who are familiar with the core modules (e.g. me, User:Surjection, User:Erutuon, and IP 98.*). That way people can suggest better ways of doing things that won't increase memory, and design errors are more likely to be caught. Benwing2 (talk) 05:34, 10 December 2022 (UTC)

On top of everything else, (at least) the Russian sortkey is messed up because it fails to remove diacritics like it should. I suspect several others are similarly messed up. Benwing2 (talk) 06:28, 10 December 2022 (UTC)

@Benwing2 Which diacritics is the sort key failing to remove compared to the version that was there previously? Before I made any changes, the sort key made a single change (ё to е + a private use character ). The sort key does not and should not deal with diacritics, except that. Theknightwho (talk) 18:29, 10 December 2022 (UTC)

And to deal with the rest - no, it isn’t straightforward to roll back all of the changes, because they were integrated with a large number of nontrivial changes. It will be a large job to attempt, and not necessarily straightforward. I also don’t see the benefit, unless we’re planning to roll back all of the high memory-use pages, which would be of little benefit anyway. Theknightwho (talk) 18:34, 10 December 2022 (UTC)

Apologies for the terseness above - I was on mobile before. I appreciate everything you've said about clearing changes like these with others before making them, and I completely agree and will do so. I will look into getting some kind of Lua dev environment set up, or at least a set of Wikt core modules set up in my userspace, because with large-scale changes (such as integrating {{zh-l}}), it's very often difficult to work out exactly how those changes should be done without making changes incrementally. Certainly with {{zh-l}} in particular, it's going to be quite a difficult job, given that the Chinese modules essentially have their own module ecosystem. Theknightwho (talk) 19:39, 10 December 2022 (UTC)

Fuck it to hell, I'm getting frustrated. Numerous people have indicated that the sortkey changes should be rolled back, yet you seem very resistant. Do we need a poll to convince you of this? Can you enumerate what sort of nontrivial changes you have made in the process of all the sortkey changes? Benwing2 (talk) 05:25, 11 December 2022 (UTC)

@Benwing2 I'm resistant because you're not presenting a case for rolling back 50+ hours of work now that things are stable, especially when it would require re-implementing all of the changes that I made along the way. This affects about 150 languages - it's seriously not worth it.

And no, I didn't make these changes blindly - I also cross-checked that they were actually correct as I went, which in many cases they weren't. Or at the very least, they were lacking. Rolling everything back would reintroduce a hell of a lot of junk. Theknightwho (talk) 05:34, 11 December 2022 (UTC)

My point is that you almost certainly increased the memory of a lot of pages, which are now hovering right at the edge of the memory limit due to all the *-lite templates added to bring them down to that limit. That is what everyone else is saying as well. This is going to hurt us down the road every time a change is made to any of those pages. Things may be stable now but I honestly see no gain to all the sortkey changes and a lot of downside. I think it's pretty clearly a failed experiment, and the right thing to do is to back it out. Again you haven't enumerated what other changes you made along the way to 150+ languages; was it simply creating all the sortkey modules (which can be deleted or left alone once the changes to the data modules have been backed out) or a bunch of other random fixes? If so, what are those? Once again, can you please make a list of the "nontrivial changes" you keep referring to? Benwing2 (talk) 06:18, 11 December 2022 (UTC)

@Benwing2 It was a very large number of fixes as well, which quite honestly took up the majority of the time because they involved manual checking of everything. That is one reason that it took so long. In addition to all that, some required bespoke logic (e.g. Module:za-sortkey, which is particularly complex, and to a lesser extent those such as Module:kaa-sortkey); some involve unwieldy numbers of substitutions (e.g. Module:aqc-sortkey and those for many other Caucasian languages); and others are consolidated sortkeys for multiple languages (e.g. Module:Grek-sortkey and Module:fr-sortkey, but there are others). See CAT:Sortkey-generating modules. That's without mentioning the many fixes that I did along the way, too, which involved everything from pre-1918 letters in Russian ~~to stuff as simple as supporting diacritics on capital letters with languages where someone had only entered lowercase letters into the sortkey.~~ Actually, that was for entry names, but you get the idea - lots of small fixes.

Remember that many of the sortkeys we had were either very old, or had been added by people not very familiar with coding (who often copied the older stuff, too). Very few (maybe none, I can't remember) were even using the remove_diacritics option. As a result, they were littered with issues. What I implemented was intended to clean that up, and to allow us to have a robust, standardised format which could easily be copied (and in fact, it already has been). This was not done mindlessly, and I would appreciate if you'd have a look at what I've actually done. Theknightwho (talk) 06:40, 11 December 2022 (UTC)

I still maintain that having separate modules is the wrong thing to do in the majority of cases because it increases memory usage. This should not be controversial. I have observed it repeatedly and for this reason I avoid splitting modules in two unless it's possible to bypass one of the modules entirely (Surjection's changes are generally of this sort). If you made a bunch of fixes to the sortkeys, the correct thing to do now is to port those back into the language data modules themselves whenever possible. For example the Russian sortkey module is quite simple and could easily be put back into Module:languages/data2. Yes this may be a significant amount of work but IMO you brought it on yourself by deciding to do a big overhaul of the sortkey system without understanding the added memory pressure it would bring. Benwing2 (talk) 06:50, 11 December 2022 (UTC)

I should add, "whenever possible" means that a really complex sortkey module should stay as-is, but otherwise it should be ported back. Things like shared sortkey modules can be ported back by having a single variable in the data module that is used in several places. Benwing2 (talk) 06:53, 11 December 2022 (UTC)

@Benwing2 The cases where it makes sense to port back are where:

There is no bespoke logic. With normal sortkeys, is there a predictable order to the substitutions? If so, it should be possible to port any with double substitutions.
They're not shared by languages in different data modules. I know that doesn't apply to Module:Grek-sortkey, for example, so it'll need to be kept.

Theknightwho (talk) 07:02, 11 December 2022 (UTC)

Yes the substitutions are applied left to right. Why do we need to keep modules just because they are shared across languages? I'm not sure I see the need for this. Benwing2 (talk) 07:05, 11 December 2022 (UTC)
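To illustrate why the order matters, here is a minimal Python sketch of a substitution-based sortkey in which each (from, to) pair is applied to the output of the previous one. It uses literal string replacement rather than the Lua patterns the real modules use, and U+E000 is a hypothetical stand-in for the private-use character mentioned earlier, whose actual code point isn't given here.

```python
import unicodedata
from typing import List, Tuple

Rules = List[Tuple[str, str]]

# Hypothetical placeholder for the private-use character; not the real one.
MARKER = "\ue000"

# Ordered (from, to) pairs: later rules see the output of earlier ones.
RU_SORTKEY: Rules = [
    ("\u0451", "\u0435" + MARKER),  # ё -> е + marker
    ("\u0401", "\u0415" + MARKER),  # Ё -> Е + marker
]

def make_sortkey(term: str, rules: Rules) -> str:
    """Apply literal substitutions left to right (a simplification of gsub)."""
    # Compose first, so a decomposed е + U+0308 still matches the ё rule.
    key = unicodedata.normalize("NFC", term)
    for old, new in rules:
        key = key.replace(old, new)
    return key
```

Swapping two rules that feed into each other changes the result (`"ab"` becomes `"cc"` with `[("a", "b"), ("b", "c")]` but `"bc"` with the reverse order). This is why chained ("double") substitutions are only safe to port when their order is preserved.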

@Benwing2 Because you can't have a single variable in the data module for a sortkey used by el, grc and so on, as they're in different data modules. Theknightwho (talk) 07:08, 11 December 2022 (UTC)

Sure but if they are simple enough it's worth the duplication in the most-used languages to avoid extra module loads. In particular for Greek I would put in-data-module versions of Module:Grek-sortkey for at least 'el' and 'grc'; similarly for 'fr' and 'wa' (in the same module), 'frm' and 'fro' (in the same module) and potentially also in 'nrf'. Put comments indicating where the same code is duplicated so it can be kept in sync. Benwing2 (talk) 07:52, 11 December 2022 (UTC)
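Both options raised here can be sketched briefly in Python. Within one data module, several languages can reference a single shared variable. Copies duplicated across data modules can be compared mechanically instead of relying on a comment to keep them in sync. The rules below are placeholders, not the real French or Greek sortkeys, and `out_of_sync` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Rules = List[Tuple[str, str]]

# Placeholder rules for illustration only (é -> e, è -> e).
FR_RULES: Rules = [("\u00e9", "e"), ("\u00e8", "e")]

# One data module: 'fr' and 'wa' reference the same variable, so they cannot drift.
DATA_MODULE = {
    "fr": {"sort_key": FR_RULES},
    "wa": {"sort_key": FR_RULES},
}

def out_of_sync(reference: Rules, copies: Dict[str, Rules]) -> List[str]:
    """Return the language codes whose duplicated rules differ from the reference."""
    return sorted(code for code, rules in copies.items() if rules != reference)
```

Run as part of a test suite, a check like this would flag the kind of drift later reported for the duplicated Greek sortkeys.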

@Benwing2 That was precisely what I was trying to avoid, because with Greek in particular they weren't in-sync (despite having that note). Given this affects a relatively small number of languages, I don't think performance concerns are justified. Theknightwho (talk) 07:58, 11 December 2022 (UTC)

French and Greek (Ancient and Modern) are highly used languages. Keep in mind with your set up, the little modules are loaded repeatedly on every page. Please try it both ways and see what the memory difference is; this will indicate whether it's justified or not. Benwing2 (talk) 08:06, 11 December 2022 (UTC)

@Benwing2 I can, but given the randomness of the changes, it won't be enormously helpful. After this, I suggest we try to put together what 98 suggested and have a basket of (say) 100-200 pages that we can test changes on and have the stats reported back in some way. That would allow us to see what's going on a bit better, because at the moment we're all working off hunches. Theknightwho (talk) 08:13, 11 December 2022 (UTC)
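The basket-of-pages idea, like the daily log suggested earlier, comes down to comparing two memory snapshots for the same set of pages. Here is a minimal Python sketch. It assumes the snapshots have already been collected as a mapping from page title to bytes of Lua memory (how to collect them is left open) and uses the 50 MB Lua memory limit shown in MediaWiki's parser limit report. `compare_runs` and `Summary` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

MB = 1024 * 1024
# Assumed limit: the 50 MB Lua memory cap from the parser limit report.
LUA_MEMORY_LIMIT = 50 * MB

@dataclass
class Summary:
    increased: List[str]   # pages that use more memory after the change
    decreased: List[str]   # pages that use less memory after the change
    near_limit: List[str]  # pages within `margin` bytes of the limit afterwards
    net_change: int        # bytes, summed over pages present in both snapshots

def compare_runs(before: Dict[str, int], after: Dict[str, int],
                 margin: int = 1 * MB) -> Summary:
    """Compare two snapshots mapping page title -> bytes of Lua memory."""
    common = sorted(before.keys() & after.keys())
    increased = [p for p in common if after[p] > before[p]]
    decreased = [p for p in common if after[p] < before[p]]
    near_limit = [p for p in common if after[p] >= LUA_MEMORY_LIMIT - margin]
    net = sum(after[p] - before[p] for p in common)
    return Summary(increased, decreased, near_limit, net)
```

Reporting how many pages went up and how many went down across the whole basket, rather than relying on a few spot checks, is what would let a claim like "more pages decreased than increased" actually be checked.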

As a practical matter, though, there are only a few languages that use the Greek alphabet, so memory errors are extremely rare. There are only two such terms in the {{redlink category}} exclusion list, and they date to before Surjection's work on the modules. The real problem is the Latin-script sortkeys. Chuck Entz (talk) 08:20, 11 December 2022 (UTC)

@Chuck Entz Remember that sortkeys are also used on anything in column templates (and probably in various other modules as well), so irrespective of the language they can and will affect large pages unexpectedly. Theknightwho (talk) 08:22, 11 December 2022 (UTC)

My point is that there aren't any Greek-alphabet entries that are big enough to have problems, with only two language sections and no translation tables. I suppose there might be some Latin-script entries with lots of language sections having Ancient Greek in their etymologies, but I can't think of any that have ended up in CAT:E. Chuck Entz (talk) 08:31, 11 December 2022 (UTC)

@Chuck Entz True, though it will affect any with lists of Greek (though I can't think of any likely to have that, off the top of my head). Conversely, putting things in separate modules as I've done should (in theory) be saving memory on pages with large numbers of translations, because none of them will be pointlessly loading the sortkey table into the language object. Instead, they're just loading the string with the name of the sortkey module (which doesn't get invoked). It is an impossible balancing act. Theknightwho (talk) 08:39, 11 December 2022 (UTC)
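The trade-off described here can be sketched in Python: language objects carry only the name of a sortkey module, and the module is loaded only when a sortkey is actually computed. Everything in the sketch is hypothetical. `Loader`, `Language` and the registry are stand-ins, not Wiktionary's Lua API. The cache simply assumes a module is loaded once per run; how often Scribunto actually reloads modules on a page is not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Sortkey = Callable[[str], str]

# Hypothetical registry standing in for separately stored sortkey modules;
# U+E000 is a placeholder marker, not the real private-use character.
SORTKEY_MODULES: Dict[str, Sortkey] = {
    "ru-sortkey": lambda term: term.replace("\u0451", "\u0435\ue000"),
}

@dataclass
class Loader:
    """Loads sortkey 'modules' on demand and caches them by name."""
    loads: int = 0
    _cache: Dict[str, Sortkey] = field(default_factory=dict)

    def load(self, name: str) -> Sortkey:
        if name not in self._cache:
            self.loads += 1  # count actual loads, not cache hits
            self._cache[name] = SORTKEY_MODULES[name]
        return self._cache[name]

@dataclass
class Language:
    code: str
    sortkey_module: Optional[str] = None  # only a name; nothing loaded yet

    def make_sortkey(self, term: str, loader: Loader) -> str:
        if self.sortkey_module is None:
            return term  # no sortkey defined: use the term unchanged
        return loader.load(self.sortkey_module)(term)
```

A page that builds many language objects (e.g. for translations) but never asks for a sortkey pays only for the module-name strings. The cost moves to the first call that needs the sortkey, which is why the net effect differs from page to page.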

C

This page is protected for some reason. Please make the following substitutions:

{{l-lite|mul|L|gloss=50}} => {{l-lite|mul|L|t=50}} (It would be possible to change {{l-lite}} to accept this parameter name, but it's deprecated anyway.)
{{l-lite|mul|D|gloss=500}} => {{l-lite|mul|D|t=500}}
{{l-lite|en|C#}} => {{l-lite|en|Unsupported titles/C sharp|C#}}
{{der-lite|nb|ett|𐌂}} => {{der-lite|nb|ett|𐌂|sc=Ital|tr=c}}
{{der-lite|nb|grc|Γ|t=gamma}} => {{der-lite|nb|grc|Γ|sc=polytonic|t=gamma|tr=G}}
{{der-lite|nb|phn|𐤂|t=gimel}} => {{der-lite|nb|phn|𐤂|sc=Phnx|t=gimel|tr=g}}

98.170.164.88 18:55, 9 December 2022 (UTC)

Done. Theknightwho (talk) 03:40, 10 December 2022 (UTC)

Merry Christmas

I wish to you and all users of Wiktionary Merry Christmas and Happy New Year. Leonard Joseph Raymond (talk) 22:14, 25 December 2022 (UTC)

@Leonardo José Raimundo Thank you - and to you! Theknightwho (talk) 22:27, 25 December 2022 (UTC)
