Wiktionary:Beer parlour/2025/May

Retiring dual phonemic-phonetic transcriptions for Latin

As I said at Wiktionary:Beer_parlour/2025/April#The_i/j_distinction_in_Latin, I think we should consider replacing the dual IPA transcriptions in Latin entries, because I think few of our users actually understand the distinction between phonemic and phonetic transcriptions. As a concrete example, someone on Reddit wrote: “I see these pronunciations listed for classical Latin: /kon.stan.tiːˈno.po.lis/, A little googling showed me that the lowercase j in superscript position means palatalization, so I guess that was an alternative pronunciation sometime, somewhere.” I think a non-negligible number of our readers will not notice or appreciate the difference between // and [] and will make the same mistaken assumption that these are two distinct, alternative pronunciations rather than two different types of transcription representing the same pronunciation.

I think removing the phonemic pronunciation and leaving the phonetic pronunciation is better than the reverse. Our average reader will probably have only a hazy concept of what a phoneme is. To the extent that they know IPA, it's plausible that they've been introduced to it as a "phonetic" alphabet, as per its name (despite the fact that it isn't reserved for phonetic use by linguists), and they may expect the same IPA symbols to represent more or less the same sounds across languages. This kind of expectation is one reason why we use the transcription /ɹ/ in English even though /r/ would be just as adequate as a phonemic transcription. Therefore, if they see the transcription /siɡˈnaː.tim/, they're liable to get the mistaken impression that the first syllable rhymes with league or cig and the last syllable sounds like team or Tim. The transcription may be overly narrow, but at least it suggests that the first syllable rhymes with sing, and the last syllable doesn't rhyme well with any English word.

A phonemic transcription actually requires making more theoretical presuppositions than a phonetic transcription. There are a number of areas where, despite broad agreement about phonetics, there are different approaches to the phonemic analysis of Latin.

Final -m: We're reasonably sure that when the letter -m came in word-final position in Classical Latin, it was pronounced phonetically as nasalization on the preceding vowel. But there is no consensus among linguists about whether this means Classical Latin had phonemic nasalized vowels. We currently transcribe it phonemically as /m/, but that's also dubious. Cser 2016 explicitly distinguishes it from the phoneme /m/ and treats final -m in Latin as a "placeless nasal consonant" (pages 15, 28-29): we could transcribe that with a non-IPA symbol like "N", but that's unlikely to be understandable to our readers.
qu, su, gu: Cser 2016 argues in favor of analyzing these as biphonemic clusters /kw ɡw sw/ (pages 16-28) but acknowledges there are arguments for analyzing them as single complex phonemes /kʷ ɡʷ sʷ/.
ae, oe, ui, ei, au, eu, etc.: Cser 2016 argues for analyzing these as phonemic vowel-consonant sequences /aj, oj, uj, ej, aw, ew/ (pages 31-37), whereas others analyze them as phonemic diphthongs /a͜e, o͜e, u͜i, e͜i, a͜u, e͜u/.
Some consonants are always double between vowels within a word, e.g., in Ecclesiastical Pronunciation. Technically, there's no possible minimal pair between /ˈfaʃʃia/ and /ˈfaʃia/, and so I'm not sure how theoretically sound it is to indicate the doubling on the phonemic as opposed to the phonetic level.
Latin stress is usually predictable based on the phonemic structure of the word. There are a few exceptions, but it's not clear that this means stress is assigned in all cases at the phonemic level. In contrast, it's clear there's no theoretical issue with including stress in a phonetic transcription of Latin.

We already have languages where we show phonetic transcriptions without a phonemic transcription; e.g. this seems to be the norm for Catalan entries, and Nicodene made this suggestion for Sicilian.

There is only limited benefit to including both transcriptions. They would be useful only to readers whose understanding of linguistics is sophisticated enough to appreciate the difference between phonetic and phonemic transcription, but whose understanding of Latin spelling and pronunciation is superficial enough that they can't easily derive the phonemic transcriptions themselves (the way that Module:la-IPA does). I think that benefit is small enough that it's outweighed by the risk of confused readers mistakenly interpreting the phonemic transcriptions as phonetic transcriptions. Urszag (talk) 05:30, 1 May 2025 (UTC)

@Urszag: Could you give some exemplar Latin words wherein ⟨ei⟩ is pronounced as /ej/ or /e͜i/, and not as /e.i/ vel sim., with hiatus, please? 0DF (talk) 15:35, 1 May 2025 (UTC)

@0DF dein(de) (“afterwards”), deinceps (“successively”), one pronunciation of ei (“him”), and interjections like hei and oiei / ojei. In all cases, the “diphthong” is an artefact of consonantal ⟨i⟩ occurring in a position where ⟨j⟩ would be disallowed by convention, as it’s before a consonant (e.g. *dejnde) or at the end of the word (e.g. *hej).

The same issue applies to ⟨ui⟩, ⟨au⟩ and ⟨eu⟩: e.g. E͡urōpa is indistinguishable from *Evrōpa etc. etc. Theknightwho (talk) 19:13, 1 May 2025 (UTC)

@Urszag I support this change and note that it's also the norm for Russian to include only a phonetic transcription (and this is likewise done in my mostly-complete German pronunciation module I wrote a couple of years ago but never completed). Including a phonemic transcription, as you note, can get you into hairy discussions of what counts as a phoneme, which is problematic for German (e.g. with glottal stops and the sound) and especially problematic for Russian due to vowel reduction. My one reservation here is the level of detail shown in the phonetic transcription, which I think may be a bit too much, particularly as regards the diaeresis over the and the dentalization diacritics on , and . It's true that technically is a front vowel, but in practice an without diacritics is usually interpreted as central; and it's also true that the dental obstruents are a difference from English, which mostly has alveolar n, d, t, but this sort of difference is (I would argue) not immediately apparent to an English speaker not trained in phonetics, and having all these diacritics feels maybe a bit overwhelming. I would argue, on the other hand, that the bar under the to indicate retraction is useful to have, since (assuming this is meant to indicate the same sound as in Spanish apicoalveolar retracted ) the difference is immediately noticeable to an English speaker. In general my preference when including phonetic transcriptions is a "lightly phonetic" version, one that tries to capture the most salient aspects of the sound without overwhelming the reader with detail. Benwing2 (talk) 06:01, 1 May 2025 (UTC)

I think another major problem is that our module features a random mixture of ideas from (several) different scholars as well as purely user-invented guesses. A cleaner way to handle all this would be to choose a single principal source to use—so, either Allen or Cser—and only use other sources, if at all, to fill in a detail here or there that isn’t covered by that principal source.

There are a number of areas where, despite broad agreement about phonetics, there are different approaches to the phonemic analysis of Latin.

I’m not sure that this is necessarily less common than the reverse, namely cases where scholars broadly agree on the Classical Latin phonemes but disagree on the phones. (Not so much a problem for living languages, like Catalan, where speakers are available for reference.)

Some examples of there being no clear consensus on phonetic details:

/p t k/: somewhat aspirated or not?
The coronals: dental, alveolar, or perhaps a mix?
/l/ (when not velarized): somewhat palatalized or not?

If we ‛de-narrow’ our Latin phonetic transcriptions in general, however, then I suppose the above could comfortably fit within , , .

In case some of it happens to be useful, here is an (unfinished) overview of what various scholars have to say on Classical Latin pronunciation. Most of the information is in the footnotes.

qu, su, gu: Cser 2016 argues in favor of analyzing these as biphonemic clusters /kw ɡw sw/ (pages 16-28) but acknowledges there are arguments for analyzing them as single complex phonemes /kʷ ɡʷ sʷ/.

If I’m not missing something, these cases are also difficult to decide on the phonetic level.

Edited to add: in keeping with your last two bullet-points one could also mention my old gripe against the notion of phonemic syllabification (no such objections against ). Nicodene (talk) 10:28, 1 May 2025 (UTC)

@Nicodene I do think that etc. would be acceptable broad transcriptions regardless of whether slight aspiration, secondary articulations, etc. are recognized or not for these sounds. Likewise, while there may be some different opinions about the phonetic realization of qu gu su, I think prevocalic and tautosyllabic are similar enough to make either acceptable in the context of broad phonetic transcription (the phonetician Mark Liberman notes that "the phonological distinction between a doubly-articulated consonant and a cluster is not always phonetically plain"). Whatever difficulties there may be with coming to consensus on a phonetic transcription, I think omitting phonetic transcription is not an option we should take because transcriptions like /siɡˈnaː.tim/ and /konˈfrin.ɡoː/ don't show notable features of Latin pronunciation such as , final -m loss, and vowel lengthening before nf, and there's a real risk of readers misunderstanding these as phonetic transcriptions.--Urszag (talk) 21:18, 1 May 2025 (UTC)

I’m actually a lot more amenable to that than the above comment probably made me seem. That is, given a generally broad (aligned, presumably, with Cser 2016) rather than what we have now. Nicodene (talk) 21:31, 1 May 2025 (UTC)

@Urszag @Benwing2 I appreciate that the line is somewhat blurry, but it sometimes feels as though our definition of "phonemic" veers too far towards morphophonology, and that some of the issues Urszag raises can be safely dealt with under a phonemic transcription.

There is no need to treat ⟨m-⟩ and ⟨-m⟩ as the same phoneme, simply because they are written with the same letter. The fact that they cannot be contrastive due to position is purely morphophonemic.
The question of /kʷ/ or /kw/ has essentially already been answered by the way the module has been written: analysing it as /kw/ would require a bunch of special exceptions that aren't necessary when treating it as /kʷ/, and (though I'd need to check in detail to be sure), I can't think of any instances in the other direction. By comparison, ⟨x⟩ is treated as though it were ⟨cs⟩, and you run into the same problem in reverse if you try to analyse it as a single phoneme (not that anyone does, but it's illustrative of my point). The same goes for /ɡʷ/ and /sʷ/; e.g. compare cuiusvīs /kujˈjus.wiːs/ (penultimate stress) and */ˈkuj.ju.sʷiːs/ (initial stress).
All Latin diphthongs ending in ⟨-i⟩ and ⟨-u⟩ seem to be artificial constructs to get around the fact that ⟨j⟩ and ⟨v⟩ cannot conventionally occur before consonants or at the end of a word, and I don't really see any evidence that they represent anything phonologically distinct. Regardless, I'm not sure how and are supposed to contrast phonetically anyway, so I don't see why we would include these at all.
The gemination of /ʃ.ʃ/ is still phonemic, because /ʃ/ and gemination are both phonemic features of the language. The fact that it only occurs under certain conditions is a morphophonemic feature, though - I don't think it occurs at word boundaries.
The same goes for Latin stress: it is regularly predictable, but there are exceptions (e.g. illic and istic), so we have to treat it as phonemic. Theknightwho (talk) 10:56, 1 May 2025 (UTC)

For point 4, Italian /ʃ/ does geminate across word-boundaries and so, accordingly, does the Italo-Ecclesiastical counterpart (as in ‛qui sciret’). Nicodene (talk) 12:11, 1 May 2025 (UTC)

@Nicodene That's a good example - thanks. I'm a bit wary of extrapolating Italian to Italianate Latin, given there are notable differences (e.g. Italian fascia /ˈfaʃ.ʃa/ and Latin fascia /ˈfaʃ.ʃi.a/). That being said, I'm still inclined to call the gemination phonemic, because it's still treated like a cluster: e.g. mariscī is /maˈriʃ.ʃi/, not */ˈma.ri.ʃi/. That only works if we treat it as geminate, which is precisely how we handle ⟨z⟩, too. Theknightwho (talk) 13:07, 1 May 2025 (UTC)

It’s in part accidental. The placement of stress in general depends not on the surrounding sounds but rather on the speaker memorizing ancient (no longer pronounced) differences like ū/ŭ and ae/e, word by word, and then applying a series of ‛weights’. Synchronic rules are not allowed to involve the speaker doing historical linguistics, I think. Nicodene (talk) 14:00, 1 May 2025 (UTC)

@Nicodene In this case, isn’t the ancient rule simply that it was a consonant cluster, reinforced by the digraph? That doesn’t really feel accidental. Theknightwho (talk) 16:44, 1 May 2025 (UTC)

The problem is that, as far as Italianate Latin today is concerned, it’s a pattern rather than a determinative rule. Cf. discédo, nescíret (stress follows ) or baptizáre, ratiónem (stress follows , ). Nicodene (talk) 20:44, 1 May 2025 (UTC)

@Nicodene The rule I'm referring to is that a short vowel in the penultimate syllable is treated as light, unless it is followed by a consonant cluster or geminated consonant. This is useful for a couple of reasons:

⟨qu⟩, and consonantal ⟨gu⟩ and ⟨su⟩, only fit the rule if analysed as /kʷ ɡʷ sʷ/.
Classical ⟨z⟩ and Italianate ⟨sc⟩ must be analysed as phonemically geminate, because pulverizō and mariscī both have penultimate stress in Classical and Italianate. By comparison, *pulverisō and *marisī would both have antepenultimate stress. The lack of vowel-length distinction in Italianate isn't relevant, either, because a light syllable can never occur before ⟨sc⟩, whereas we would expect unpredictable variation (corresponding to vowel length in Classical) if ⟨sc⟩ weren't geminate. The upshot is that (1) gemination affects stress, (2) we already established that stress is phonemic earlier, so (3) the gemination of /ʃ.ʃ/ must be noted in a phonemic transcription. Theknightwho (talk) 21:45, 1 May 2025 (UTC)
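The weight rule described in this comment can be sketched in Python. This is only an illustration under stated assumptions, not the logic of Module:la-IPA: the names `is_heavy` and `stress_index` are hypothetical, the word is assumed to arrive already syllabified as (onset, nucleus, coda) tuples, long vowels are assumed to be marked with ː, and the documented exceptions (e.g. illic, istic) are ignored.

```python
def is_heavy(syllable):
    """A syllable counts as heavy if its nucleus is long (marked with ː)
    or if it has a non-empty coda (i.e. it is closed by a consonant)."""
    onset, nucleus, coda = syllable
    return nucleus.endswith("ː") or len(coda) > 0


def stress_index(syllables):
    """Return the index of the stressed syllable under the penultimate rule.

    Words of one or two syllables are stressed on the first syllable.
    Longer words are stressed on the penult if it is heavy,
    otherwise on the antepenult.
    """
    if len(syllables) <= 2:
        return 0
    penult = len(syllables) - 2
    return penult if is_heavy(syllables[penult]) else penult - 1


# Hypothetical syllabifications of Italianate "mariscī" (assumed input format).
geminate = [("m", "a", ""), ("r", "i", "ʃ"), ("ʃ", "i", "")]   # ma.riʃ.ʃi
singleton = [("m", "a", ""), ("r", "i", ""), ("ʃ", "i", "")]   # ma.ri.ʃi

print(stress_index(geminate))   # 1: penultimate stress
print(stress_index(singleton))  # 0: antepenultimate stress
```

With the geminate analysis the penult is closed and attracts the stress; with a singleton /ʃ/ the same rule would predict initial stress, which is the contrast the comment relies on. The same function gives penultimate stress for a /s.w/ syllabification of cuiusvīs and initial stress if ⟨sv⟩ is treated as a single onset /sʷ/.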

The lack of unstressed penultimate syllables before in Ecclesiastical Latin could be regarded as a diachronic accident, or at least analyzed in terms other than being two phonemes long in this position: Spanish has a similar gap with the consonants /ʎ ɲ ʝ tʃ r/, but it isn't that common to transcribe these as phonemic geminates in contemporary Spanish. The rule you mention is undeniably valid for Classical Latin, although clusters such as "pr tr cr" can be exempt, so analyzing /kʷ ɡʷ sʷ/ doesn't eliminate all of the exceptions. As Cser says, the phonemic analysis of "qu gu su" has been discussed by multiple linguists with no clear consensus emerging, so I don't think it is actually obvious. While /kw gw sw/ would be somewhat atypical clusters in Latin, /kʷ ɡʷ sʷ/ would be somewhat atypical consonants: they can't precede other consonants or come at the end of a syllable (seemingly not even in the context of a geminate consonant if the spelling "cqu" is taken at face value as /k.kʷ/), and /sʷ/ is sometimes replaced in poetry by /su/.--Urszag (talk) 22:05, 1 May 2025 (UTC)

@Urszag Well, the rule isn't strange at all: it's that they must be followed by a vowel. That's it. I'm honestly not sure why Cser writes as though there is a whole laundry list of things that make /kʷ ɡʷ sʷ/ atypical, when they're all just corollaries of that. Plus, the same rule uncontroversially applies to /f/ and /h/, so it's not without precedent, either. Theknightwho (talk) 00:25, 2 May 2025 (UTC)

It is certainly imaginable that /kʷ ɡʷ sʷ/ are single consonants that are required to be followed by a vowel. But this requirement would put them in the minority of Latin consonants (/f/ can be followed by a consonant in /fr/, /fl/ and /ff/). Latin /h/ also has to be followed by a vowel, but /h/ is clearly the most aberrant Latin consonant, to the point where it is not clear it is a consonant phoneme at all: there is a long history going back to ancient authors of not counting it among the consonant sounds of Latin (since it allows elision and doesn't create heavy consonant clusters). Also, if we consider phonemes to be psychologically real, it seems a bit strange that as far as I know no ancient Latin author on pronunciation describes "qu gu su" as consonants rather than as sequences composed of a "c g s" sound followed within the same syllable by a "u" sound.--Urszag (talk) 01:01, 2 May 2025 (UTC)

@Urszag You know, I originally only wrote /h/, then went back and added /f/ absent-mindedly due to Cser mentioning it on page 26 as an example of a phoneme which must go in syllable-initial position. My bad. In any event, I don't really think it changes my point, especially with the possibility of treating /m/ in the same way, mentioned below.

One thing I find contradictory in Cser's argument is these two passages:

p. 19: While all stops occur as geminates in simplex forms, ⟨qu⟩ does not. Furthermore, it does not even occur in a / sequence (which could, in theory, be analysed as the phonetic representation of geminate but also as a + + sequence). This squares neatly with the fact that geminates do not occur next to another consonant (in this case before ). It also squares neatly with the fact that can emerge (though rarely does) at prefix stem boundaries, as in acquirere ‘get’ and acquiescere ‘acquiesce’ from ad+⟨qu⟩. It is only at such boundaries that geminates can be adjacent to consonants.
p. 26: The history of English shows a parallel development of PIE * > (Old) English and * > English , as in which and queen, respectively, where stops developed into what are analysed as clusters on phonological grounds independently of their provenance. Furthermore, the later history of Classical Latin ⟨qu⟩ is far from uniform: in Italian, for instance, it developed intervocalically into , as in acqua ‘water’, which can be seen as a diachronic reflection of its cluster nature (though, admittedly, in Vulgar rather than Classical Latin).

Well, which is it? Is it a cluster because gemination does not occur outside of morpheme boundaries, or is the development of gemination at a non-morpheme boundary evidence that it must be a cluster? It can't be both. Theknightwho (talk) 01:58, 2 May 2025 (UTC)

I am referring to the phonology of Italianate Latin—that is, how it works synchronically and in its own right—not to the elaborate rules our module uses to derive Italianate outputs from Classical (or Classical-by-orthographic-proxy) inputs. The latter would only apply here if one regards Italianate as phonemically identical to Classical (with the full spectrum of contrasts like vowel length, etc). Geminate does not determine stress, per examples like discédo, and there is nothing about the sequence of phonemes /d/+/i/+/ʃ/+/e/+/d/+/o/ that would tell us where the stress should be. From the synchronic perspective it is unpredictable, as e.g. in English. Nicodene (talk) 22:41, 1 May 2025 (UTC)

Theknightwho is referring to the absence of forms such as léntisci from Ecclesiastical Latin. The rule is not that must be preceded by a stressed vowel, but that a vowel before must be stressed if it is in the penultimate syllable of the word: the i in discedo is not in the penultimate syllable, so it isn't a counterexample to that rule. In any case, I kind of just tossed ʃ in there as one additional example and I don't think it deserves much argument; I think the other problems are more important, such as final -m and the predictable but non-contrastive vowel lengthening before nf and ns.--Urszag (talk) 23:35, 1 May 2025 (UTC)

I see. My response in that case would be essentially the same as yours. Nicodene (talk) 01:10, 2 May 2025 (UTC)

Reading this thread, I don't quite understand why, for a /phonemic/ transcription, we can't have <-um> as /ũ/ and <-un> as /un/ but neutralize them both to /un/ before a consonant.

I guess arguing about their /phonemic/ values isn't necessary if we are going with a transcription instead. — BABR・talk 07:00, 6 May 2025 (UTC)

There’s a fine line between deciding phonemes and sliding into madness. Nicodene (talk) 08:22, 6 May 2025 (UTC)

Regarding final -m, look at the entry for etiamnum that I recently edited, as well as quamobrem and other artefacts of univerbiated spelling, where final m contrasts with regular m in unexpected ways (when spelling forces it inside a word). True m retains its power before n, as in columna, omnis, unlike etiamnum or etiamnunc -- this one is attested as etiannunc (in papyri) and is also said by Velius Longus to be unpronounceable with an m despite the spelling. So, as a practical consideration, final m in Wiktionary entries isn't even always final. For this reason I'd strongly support adding a final M phoneme, which I would spell as /M/ instead of /N/ for legacy reasons (compliance with the Roman grammarian tradition, where it is never by anyone called a type of N). If anything, /n/ contrasts with final M, and by all accounts a final was an acceptable pronunciation of final M before pause, but a final was not, being a realization of /n/.

In regard to diphthongs, I believe the ancient tradition recognizes only several, of them au, ae, oe, eu. "Cui" is not considered to have a diphthong. Many more diphthongs were added by modern analyses, to the point that now we want to abolish them because the category is overblown and meaningless. Personally I would pay lip service to the ancient tradition and keep the diphthongs at au, ae, oe, and eu, and convert the rest to vowel and consonant sequences (huic, cui, ei). That way we are neither doing something very revolutionary (abolishing customary diphthongs like ⟨au⟩) nor stretching the definition of diphthong to its utmost limits away from the custom (ei of eidem as a diphthong, possibly ai of aio as a diphthong, etc). In the end, it likely doesn't matter too much.

Albeit weakly I suggest keeping in a distinguishing diacritic on t and d, because I find the difference noticeable between a Romance t/d and an English one, and because the English one poses problems when one tries to say it together with a coronal r. On the IPA module talk page I already suggested simplifying some of the phonetic aspects.

On a related note, it could be wise to have the phonemic transcription at least feature vowel qualities presumed to have been at all used in Latin. I don't think Latin was ever by default spoken with narrow and , which the phonemic transcription of /e/ /o/ would imply. This seems to be an artefact of these letters being more natural, but not their sound. Draco argenteus (talk) 23:17, 1 May 2025 (UTC)

/M/ would have some mnemonic value, but it's not IPA and if we use it we can't expect anybody to know what it means without an explanation. I don't know if we can do something like the Italian "°" for syntactic gemination, where the symbol automatically gets a tooltip explanation if you hover over it, but even that wouldn't be a great solution, since it doesn't help mobile users. Furthermore, having the concept of a phoneme /M/ gets into the morphophonological issues that Theknightwho alluded to. When do we use /M/ versus /n/ versus /m/ before a consonant when phonemically transcribing prefixed, suffixed or compound words such as etiamnum, illunc, conpello, compello, cōnficiō, īnfāns, inputō, imputō, circumtrahō, circumpōnō? @Theknightwho what would be your preferred phonemic transcription for final -m and for nasal vowels before -nf- and -ns-?--Urszag (talk) 23:56, 1 May 2025 (UTC)

@Urszag I've not yet considered the ⟨-nf -ns⟩ question, but don't @Draco argenteus's examples suggest that the nasal vowels were phonemic? I don't think there's any problem with treating /m/ as a positionally-restricted phoneme which can only exist in syllable-initial position. Theknightwho (talk) 00:05, 2 May 2025 (UTC)

There is a distinction between "/M/" and /n/ in utterance-final position, as in cum compared to in, but this distinction is neutralized before a consonant. Before a stop, either becomes a homorganic nasal stop; before a fricative, either becomes deleted with nasalization and prolongation of the preceding vowel. If we treat the nasal stops as /m/ and /n/ (according to the identity of the following consonant), and the nasalized vowel as /M/, then we would transcribe these as /etiˈannuM/, /ilˈlunk/, /komˈpelloː/, /koMˈfikioː/, /ˈiMfaMs/, /ˈimputoː/, /kirˈkuntrahoː/, /kirkumˈpoːnoː/ respectively. I don't think the use of M internally before /f/ and /s/ here is very intuitive. But I also don't think the morphophonological approach (where the prefix in- is underlyingly /in/, the prefixes con- and circum are underlyingly /koM/ and /kirkuM/, etc.) will be helpful to our readers. If you're suggesting we use transcriptions like /etiˈannũː/, /kõːˈfikioː/, /ˈĩːfãːs/, I find that prettier and more legible, but it implies that /ũː/, /õː/, /ĩː/, /ãː/ are vowel phonemes rather than vowel + consonant sequences on the phonemic level, which contradicts the bisegmental analysis of nasal vowels (as a vowel phoneme + a placeless nasal consonant phoneme) that is preferred by some scholars such as Cser.--Urszag (talk) 00:26, 2 May 2025 (UTC)
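The neutralization described in this comment (homorganic nasal stop before a stop; deletion with nasalization and lengthening of the preceding vowel before a fricative) can be sketched in Python. This is a rough illustration of the phonetic outcome only, not a phonemic analysis and not how Module:la-IPA works: the function name `realize_nasal` and the consonant sets are assumptions made for the example, the input vowel is assumed to be short and written without diacritics, and the velar nasal before velar stops follows the allophony mentioned elsewhere in this thread.

```python
import unicodedata

# Assumed consonant classes for the sketch (not an exhaustive inventory).
LABIAL_STOPS = {"p", "b"}
CORONAL_STOPS = {"t", "d"}
VELAR_STOPS = {"k", "ɡ", "g"}
FRICATIVES = {"f", "s"}


def realize_nasal(vowel, following):
    """Return the surface form of a short vowel plus a preconsonantal nasal
    (whether spelled m or n), given the consonant that follows."""
    if following in LABIAL_STOPS:
        return vowel + "m"
    if following in CORONAL_STOPS:
        return vowel + "n"
    if following in VELAR_STOPS:
        return vowel + "ŋ"
    if following in FRICATIVES:
        # The nasal is deleted; the vowel is nasalized (combining tilde)
        # and lengthened.
        nasalized = unicodedata.normalize("NFC", vowel + "\u0303")
        return nasalized + "ː"
    raise ValueError(f"no rule in this sketch for a following {following!r}")


for vowel, consonant in [("o", "p"), ("u", "t"), ("u", "k"), ("o", "f"), ("i", "f")]:
    print(vowel, consonant, realize_nasal(vowel, consonant))
```

The sketch deliberately stops at stops and fricatives; a nasal before another nasal (as in etiamnum) and utterance-final position, where the distinction is not neutralized, are left out.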

I don't suggest going all in on the placeless nasal except for noting it at word-end only, or in the middle of etiamnum, wherever in the spelling it is shown through m. Word internally most often in utrumque and the rest. Here the Romans recognized that the M was special and sometimes even wrote it differently (non tota littera, sed pars illius). Phonemic /m/ for such cases sort of works, but I find it just slightly unsatisfactory, as indeed it was already proposed. Importantly, a true /m/ keeps its character before /n/, /t/ and /s/ (hiems, demsi, though it often acquires an intervening allophonic p before t and s), even if its special behavior before vowels is to be neglected. Most of the rest regarding adding more of the placeless nasal in phonemic transcription I don't suggest, and I disagree with it. Draco argenteus (talk) 03:43, 2 May 2025 (UTC)

The phrase "non tota littera, sed pars illius" occurs in Velius Longus, in the context of a proposal to write word-final -m differently when followed by a word-initial vowel: "Non nulli circa synaliphas quoque observandam talem scriptionem existimaverunt, sicut Verrius Flaccus, ut, ubicumque prima vox m littera finiretur, sequens a vocali inciperet, m non tota, sed pars illius prior tantum scriberetur, ut appareret exprimi non debere." I don't think this passage gives any support to the hypothesis that the Romans heard the pre-consonantal ⟨m⟩ in utrumque as the same sound as the ⟨m⟩ in utrum + a vowel-initial word. Cser 2016 notes that ancient testimony indicates that words like numquam/nunquam were pronounced with the assimilated velar nasal consonant , which could be spelled etymologically as ⟨m⟩ or phonetically as ⟨n⟩ (page 19); in this position, is typically analyzed as an allophone of /n/. I'm not a fan of using spelling as the criterion for phonemes in contexts like this: Latin spelling is sometimes morphological rather than phonetic, as in the case of urbs which is pronounced /urps/.--Urszag (talk) 04:21, 2 May 2025 (UTC)

I don't suggest that either. My minimal suggestion was for using /M/ for final M, then analogically extending it to some occurrences of it within words, mainly before consonants, sometimes before vowels as in quamobrem where the behavior differs from that of a true m, to be noted as /m/. The criterion being ⟨m⟩ appearing in spelling but being contradicted by what it really is: for example ⟨quamtus⟩ is an attested spelling for quantus (and stated to be pronounced quantus), while ⟨emtus⟩ is an attested spelling for emptus. Here we can see that denoting both as /kwamtus/ (I'm not typing the superscript w for now, but it is to be understood here instead of w) and /emtus/ will either mean that both are quamptus and emptus, or that both are quantus and entus. This can be avoided with /kwaMtus/ and /emtus/, where the use of /M/ is informed purely by spelling. However, I can see this as being unnecessary, as indeed /kwantus/ /emptus/ provide solutions to the problem, and etiamnum is successfully represented by inputting /etiannum/. Perhaps only prevocalic final M is the most useful, as in quamobrem, but even there the problem is taken care of by word separation. Ultimately I am in favour of scrapping the displaying of phonemic transcription, because of its various problems and apparent uselessness. Draco argenteus (talk) 04:41, 2 May 2025 (UTC)

@Theknightwho You run into endless problems if you try to shoehorn phonetic/allophonic information into the phonemic representation. As an example, occurs as an allophone of <n> before <g> and <k>, and also as an allophone of <g> before <n>, but it's unquestionably non-phonemic. Benwing2 (talk) 00:27, 2 May 2025 (UTC)

At this rate we’re never going to move things along. I propose an informal vote:

Should we set the module to stop outputting both // and []?
- If yes, do you prefer having only // or only []?
Should we set the module to follow the Classical pronunciation reconstructed by a single scholar, as a baseline before further discussions/tweaks?
- If yes, do you prefer we use Allen 1965 (Vox Latina) or Cser 2016 (Aspects of the Phonology and Morphology of Classical Latin)?

My answers:

strong support
weak preference for
strong support
weak preference for Cser 2016

Nicodene (talk) 01:53, 2 May 2025 (UTC)

Support
Prefer only [] (I have suggested improvements to //, but I'll see the entire idea of outputting // scrapped with little regret)
Strong support
Support for Cser 2016 with possibility for later tweaks

Draco argenteus (talk) 03:49, 2 May 2025 (UTC)

My answers:

Support
Prefer []
I don't favor using one of these as a baseline. If we do, I'd prefer Allen 1965. Cser does provide phonetic transcriptions, but as per the title of the thesis, he deals a lot with questions of phonology and morphology. For example, I think Cser's use of as a broad transcription is informed by his favored phonemic analysis: given the change in spelling from "ai" to "ae", I think it's unlikely is a very accurate transcription of Classical Latin "ae".

--Urszag (talk) 04:21, 2 May 2025 (UTC)

Support
Prefer only []
I don't have enough knowledge of the two sources to say which one is better. In general I would rather that we start with a single baseline and go from there, so I guess I support this question. One possibility if the reconstructions differ significantly is to list both of them, as we do for Old Chinese reconstructions (there are at least two major ones, Baxter-Sagart and Zhengzhang Shangfang; we put both along with others in a dropdown that when closed shows the Zhengzhang reconstruction, which presumably has a bit more consensus on it these days, maybe just among Wiktionary editors, than Baxter-Sagart). My instinct is to prefer the more recent one; 50 years is a long time in historical linguistics. But I dunno if Cser's reconstruction is more of a sensible, "reflect modern consensus" type of reconstruction like e.g. Ringe for Germanic, or more of an "out there" type of reconstruction like Leiden tends to produce.

Benwing2 (talk) 04:30, 2 May 2025 (UTC)

Overall there aren’t all that many notable points on which Cser contradicts Allen. Examples include his preferring over and (very tentatively) over , both of which he provides fairly detailed argumentation for. (So, it’s not on a whim.)

Instead of resetting the entire module to follow one scholar or the other, though, we could just focus on cleaning up issues like:

the purely user-invented
the unnecessary diacritics in and
the odd/borderline-non-IPA for the of iam, peius, evangelium, vesper
the strange/fake-precise for what both Allen and Cser describe as

Nicodene (talk) 04:08, 4 May 2025 (UTC)

Support removing the really narrow transcription.
No strong opinions on / / vs : I think we're all arguing for the same kind of transcription, but we seem to have different opinions on what counts as phonemic or not, so it's probably safer to go for .
I don't think we need to use a particular author as a baseline, but I agree with the changes Nicodene proposes above. That being said, I disagree with Cser's view on , and would like to keep them.

Theknightwho (talk) 00:32, 5 May 2025 (UTC)

I would like to keep . In my opinion is more difficult to prove, isn't associated with a dedicated letter (while is associated with ⟨q⟩), and can be simplified to , just as is. So, unusually, I propose using for simplicity. Draco argenteus (talk) 01:11, 5 May 2025 (UTC)

Hard oppose using different transcriptions for them. Sorry. They're both digraphs, the use of ⟨q⟩ can be explained diachronically and has nothing to do with this question, and there is no reason at all to assume that we should default to . It isn't a simplification, either, and just makes our transcriptions incoherent by implying some kind of qualitative difference between ⟨qu⟩ and ⟨gu⟩. Theknightwho (talk) 03:10, 5 May 2025 (UTC)

Certainly for reading purposes you are right. I got caught up in other considerations. I support both expressed through superscripts. Draco argenteus (talk) 06:06, 5 May 2025 (UTC)

Something modest like this may be more conducive to consensus, so I’m changing my votes to:

Yes; only
Neither; just these more lightweight changes for now, like >

Nicodene (talk) 00:42, 5 May 2025 (UTC)

I would like to see the narrow transcription simplified to the basics, but while retaining the dark l and the dental diacritic under z. I can let go of the dental diacritics on t and d. I think even the lax i and u are too narrow, since no Roman before Consentius mentions them even when describing the different vowel qualities of e/ē and o/ō, which makes me think the detail was surprisingly irrelevant, as African speakers, who had the Sardinian vocalism, would not receive corrections on their i and u, despite receiving them at least on ē (as with Pompeius). But simplifying the lax i and u may be a spicy opinion. Draco argenteus (talk) 01:01, 5 May 2025 (UTC)

The above would change to , not to .

I’ve checked all the Latin scholars that come to mind and not been able to find one that argues for z being specifically dental. But, note that as a broad transcription accommodates both possibilities (dental and alveolar) while is inherently narrow and can only cover one. Nicodene (talk) 01:25, 5 May 2025 (UTC)

@Nicodene Maybe he means retracted alveolar s? I think there is a fair amount of evidence from Romance languages for this pronunciation in Latin. I dunno about z, which I thought was pronounced more like in any case. Benwing2 (talk) 01:28, 5 May 2025 (UTC)

For the manner of articulation of Classical z, the scholarly ‛votes’ are: fricative (Sturtevant, Allen, McCullagh), affricate (none) — notes, citations.

For the place of articulation of Classical s: dental (Sturtevant), alveolar (Allen/Weiss/McCullagh), apico-alveolar or undecided (Lloyd) — notes, citations. To the best of my knowledge, the only symbol that covers this range of possibilities is a broad . Nicodene (talk) 01:44, 5 May 2025 (UTC)

I spoke for z. I have an idiosyncratic belief: I think it is hard to decide whether s was specifically retracted or not, though I weakly support it being apico-alveolar (which may be unretracted), and I assume that z specialized toward being dental; but this is all based mainly on early Romance languages. Essentially I keep believing that /s/ and /z/ did not actually contrast mainly in voicing, which leads to a desire to add some distinctions to them in the transcription, rather than making the reader think the main distinction is voicing and some doubling. It's probably not very citable for Latin though. Draco argenteus (talk) 06:15, 5 May 2025 (UTC)

Given what can be cited I propose for s, as several (indeed most) scholars support it, while z with a diacritic is uncitable and can be left as is. Draco argenteus (talk) 08:52, 5 May 2025 (UTC)

I’m not aware of information from Romance (or elsewhere) that would suggest Classical /z/ and /s/ had different places of articulation. I suppose you are thinking of the situation later in Iberia, but there /(d)z/ was not the surviving continuation of a Classical /z/ but rather the result of original /k/ palatalizing before front vowels and voicing intervocalically (facere > fazer) and the result of adapting ‛ecclesiastical’ pronunciation in new learned borrowings (baptizare > bautizar). Nicodene (talk) 11:36, 5 May 2025 (UTC)

So that was the evidence. Okay, I can live with it. Draco argenteus (talk) 03:36, 6 May 2025 (UTC)

@Urszag, @Benwing2: any chance we could agree on cleaning up the above? For reference:

user-invented
unnecessary diaeresis in and dental diacritic in etc
non-standard IPA for iam, vesper, etc (not diphthongs)
user-invented for what both Allen (p 30) and Cser (passim) describe as

Nicodene (talk) 03:54, 6 May 2025 (UTC)

All fine with me. Benwing2 (talk) 03:56, 6 May 2025 (UTC)

For what it's worth, I completely agree with this. The current transcription for Classical is annoyingly narrow, and the usage of , instead of , is honestly nonsensical (at least in the onset). Whether should remain in the coda, I think, depends on the phonological rules of the language. — BABR・talk 07:23, 6 May 2025 (UTC)

I don't think they should, unless we want to disallow or (and I don't see why we would). should certainly never appear before a vowel, in any event. Are we going to keep as well? I appreciate that there's speculation on why the shift from ⟨ai⟩ and ⟨oi⟩ to ⟨ae⟩ and ⟨oe⟩ happened, and that it may have been for phonetic reasons, but I'm not convinced:

The distinction between and is very small, and certainly did not get used contrastively. For instance, are maior (phonetically maiior) and praeiacēbō really distinct? This seems unlikely. The difference just seems to be an orthographic artefact of the morpheme boundary after prae-.
There was orthographic pressure to distinguish and from and , as all four are common sequences before a consonant or boundary. There are no instances of ⟨ai⟩ and only one instance of ⟨oi⟩ (proin(de), ) being used as diphthongs in that position during the Classical period, suggesting the distinction served a practical purpose.
On the other hand, and before a consonant/boundary are very rare, with dein(de) , deinceps and huic being the only preconsonantal examples.
Their use as evidence for ⟨Vi⟩ being qualitatively different from ⟨Ve⟩ is also undermined by proin(de) , which is directly analogous to dein(de) etymologically, but never underwent respelling to *proen(de). This, again, suggests ⟨oe⟩ was simply an orthographic convention, because the alternative is to suggest that proin(de) had a unique diphthong distinct from the initial vowel of proelium . While marginal diphthongs evidently did occur under similar conditions, contrastive and is not plausible.

Theknightwho (talk) 13:40, 6 May 2025 (UTC)

@Nicodene I'm fine with replacing with , with , with , with , and replacing in the onset or as geminate consonants with .

@Theknightwho I prefer using for the second portion of diphthongs.

In the case of "ae", the testimony of Terentius Scaurus ("apud antiquos i littera pro ea scribebatur, ut testantur μεταπλασμοί, in quibus est eius modi syllabarum diductio, ut 'pictai vestis' et 'aulai medio' pro pictae et aulae. sed magis in illis e novissima sonat, et propterea antiqui quoque Graecorum hanc syllabam per ae scripsisse traduntur") is often taken as direct evidence for the phonetic value ; thus Allen, Lindsay 1894:43, and others. This has been debated. While could be argued to be a broader transcription in some respects, is closer to the spelling, which I consider a point in its favor since the spelling is not disputed.

If we use the transcription for "ae", we would be practically compelled to use the transcription for the ae + vowel sequences found in Greek loans such as iūdaeus: a pronunciation like is ruled out since the second-to-last syllable scans heavy, and a pronunciation like has an unparalleled syllable boundary between a consonant and a directly following vowel. So that leads us to , but that transcription implies no phonetic distinction from the sequence found in words spelled with ai + vowel such as maior. I'm doubtful of that conclusion. There are a few words such as Aiāx where Greek -αι- before a vowel was transliterated in Latin as -ai-, but I think the use of -ai- vs. -ae- spellings is generally stable rather than fluctuating for each individual word, which would be in line with this being a phonetic distinction rather than a purely orthographic one.

I acknowledge Cser’s point that the analysis makes it simpler to interpret and transcribe the pronunciation of words prefixed with prae- like praeacūtus, which usually scan with a light first syllable. If we assume prae- = and this is resyllabified like other consonant-final prefixes, we get . But while this neatly accounts for the metrical facts, I think it’s not actually clear that such words were pronounced with onset rather than with a diphthong that was affected by shortening in hiatus, a phenomenon that can be seen affecting long monophthongs (as in dĕūrās, derived from dē- + ūrō). I would handle this situation with a notation like : this may be more awkward, but this is an edge case anyway.

The case of "oe" is basically analogous. I don't think we can rule out the possibility that proinde was pronounced , even if it was never spelled *proende. The "oi" may simply be a morphological spelling influenced by the form of inde. So I don't think the existence of proinde requires us to transcribe "oe" as . Assuming we accept the transcription of Latin pre-consonantal short i as , it isn't obvious that a diphthong derived from the fusion of the vowels + would have the phonetic outcome , with changed to the more constricted consonant sound .--Urszag (talk) 21:26, 6 May 2025 (UTC)

I agree with @Urszag here; since we're doing a broad phonetic representation, and given Urszag's arguments along with the stability of the spellings <ae> and <oe> and the general tendency for the second element of diphthongs to "relax", or even sounds more plausible than . Benwing2 (talk) 22:04, 6 May 2025 (UTC)

@Urszag Why are you doubtful that iūdaeus could have when the Greek was Ἰουδαῖος (Ioudaîos), with a long vowel? In fact, doesn't that suggest that's precisely how it was pronounced? Theknightwho (talk) 22:23, 6 May 2025 (UTC)

As you say, the αῖ in ancient Greek Ἰουδαῖος functioned like a long vowel: for example, it shows circumflex accentuation, which could not occur on a short vowel followed by a double consonant. The transcription indicates a short vowel followed by a double consonant. After thinking a bit more, I'm a bit less confident in my appeal above to spelling evidence, since most examples showing the distinction in writing between "ae" and "ai" before vowels in Greek loans come from manuscripts written by postclassical scribes. However, we do see comments by ancient grammarians on the pronunciation of words such as Troia, Maius and Aiāx that confirm that these were pronounced with short vowels + double . Also, it's hard for me to tell whether any of the Romance descendants of iūdaeus are fully inherited, but if they are, they show different outcomes from Maius (e.g. Portuguese judeu vs. maio).--Urszag (talk) 22:43, 6 May 2025 (UTC)

Yes, the outcomes of iudaeus are broadly in line with those of Deus, e(g)o, meus and not gaius, maior, Maius. Nicodene (talk) 22:54, 6 May 2025 (UTC)

@Urszag I think you've misunderstood my point: the sequence ⟨αῖο⟩ would have been pronounced in Greek, so it's fully expected for the Latin borrowing to do the same. Theknightwho (talk) 23:04, 6 May 2025 (UTC)

I'm not sure I accept that ⟨αῖο⟩ was pronounced in Ancient Greek, given that αῖ isn't accented like a short vowel + double consonant sequence. I know Allen in Vox Graeca presents the analysis of Greek pre-vocalic diphthongs as short vowels + doubled semivowels, but it seems rather speculative, and in any case the foreword of Vox Graeca says its primary aim is to describe Attic Greek of the 5th century BC, centuries before Classical Latin.--Urszag (talk) 23:27, 6 May 2025 (UTC)

@Urszag My point was not to say it's definitive; it was that resting your argument on the doubtfulness of is not as self-evident as it first seemed. I am happy to accept that preconsonantal diphthongs may have been more relaxed than prevocalic ones, but that still leaves us with some questions:

The ongoing discussion on how to handle ⟨ae⟩ and ⟨oe⟩ in prevocalic or word-final positions.
How to treat the other diphthongs:
1. Were ⟨ei⟩ and ⟨ui⟩ also relaxed in deinde and huic?
2. How about ⟨au⟩ and ⟨eu⟩?

Theknightwho (talk) 00:03, 7 May 2025 (UTC)

@Theknightwho I think there's a clear basis for saying that the diphthong ⟨ae⟩, ending in a front glide, did not evolve symmetrically to the diphthong ⟨au⟩, ending in a back glide. Aside from the lack of a spelling change from ⟨au⟩ to ⟨ao⟩, we see that in Romance languages ae always evolves like monophthongal ĕ (or occasionally ē), whereas ⟨au⟩ is often maintained as a diphthong ending in or , e.g. aurum > Galician ouro , taurum > Romanian taur. That supports the asymmetry between the pronunciations and . It's not as easy to give examples of ⟨eu⟩ in vocabulary inherited from Latin to Romance, but I think we can safely conclude by analogy that it was also generally pronounced as eu̯. The pronunciation of the more marginal diphthongs ending in a front glide is more troublesome, but I think the transcriptions are adequate, even if the actual values could have involved slightly different vowel qualities such as ɛɪ̯, ɛi̯ or ʊɪ̯, ʊi̯. In native words, their development by fusion of originally separate vowels is generally more recent than the development of ⟨ae⟩ and ⟨oe⟩.--Urszag (talk) 00:47, 7 May 2025 (UTC)

@Urszag I agree with your argument that the transcriptions are adequate, even if the actual values could have involved slightly different vowel qualities. What I can't agree with is that we take a pedantic approach to ⟨ae⟩ and ⟨oe⟩, but ignore these. Theknightwho (talk) 00:51, 7 May 2025 (UTC)

I would consider all of to be broad, somewhat uncertain transcriptions, so might have really been pronounced more like , but since we can't be certain I prefer using the same letter the Romans did. I think there's a good chance that really ended in an opener phonetic quality than , so I don't find it problematic to use different letters for their offglides. But if everyone else prefers , that's OK with me.--Urszag (talk) 05:48, 7 May 2025 (UTC)

I agree here with @Urszag. Benwing2 (talk) 05:52, 7 May 2025 (UTC)

My stance on ui and ei is that they're not to my knowledge anciently recognized diphthongs, and since diphthongs are progressively arbitrary beyond and , I like to draw an arbitrary line as well and stop the diphthongs at eu, not including ui, ei and other possible ones, which I would put as the corresponding short vowel with . Draco argenteus (talk) 07:19, 7 May 2025 (UTC)

@Theknightwho, I thought it may be worth exploring some of the background for this question.

Per Adams' The regional diversification of Latin (pp 78‒88), evidence points to the monophthongization of Latin ae, presumably to , in several regional accents of the second and first centuries BC, but not in Rome itself - at least, not among the educated. Per Adams' Social variation and the Latin language (p 75), ‛various corpora show provincials in the first three centuries of the Empire writing e for ae with such regularity that monophthongisation must have been widespread across the Empire’, but there is no clear evidence for this in Rome itself. Further (p 80), ‛what little evidence there is in grammarians suggests that in the early centuries of the Empire there was an attempt to maintain the diphthong, but that by about the fourth century the monophthong was so established that it was acceptable even to grammarians’.

I do not know of a similarly detailed survey of evidence for the Greek diphthong, but per Allen's Vox Graeca (pp 75‒6), spellings indicative of a monophthongal value, presumably also , are found ‛from about 100 AD’ and ‛confirmed for this period by a specific statement of Sextus Empiricus’ (fl. 2nd or 3rd century AD).

Returning to our word iudaeus, the earliest quotations in Lewis & Short are from authors of the first century AD, such as Pliny. They likely encountered the Greek word with a diphthongal αι, and they likely had a diphthongal Latin ae in their own (presumably educated) speech.

To keep the Romance side of things brief, the inherited descendants of iudaeus unambiguously indicate a Proto-Romance * and rule out **. This does not necessarily rule out a Classical pronunciation with * since such a pronunciation could have been superseded by a monophthongized one imported from later Greek. Still, some form of positive evidence for * remains a desideratum. Nicodene (talk) 22:12, 7 May 2025 (UTC)

I believe there is an interesting point of distinction between a Greek-originating ae and a Latin ae, where Latin ae scans as short in verses when a vowel follows it. The example I remember was praeacutus. This seems to mark some conceptual differences between Greek ae's and Latin ae's. However this is just something I've been told is a fact, so I have no further references. Draco argenteus (talk) 08:59, 8 May 2025 (UTC)

@Nicodene @Urszag Given that ⟨ae⟩ and ⟨ai⟩ do seem to scan differently, would it be reasonable to interpret the difference as for ⟨ae⟩ and for ⟨ai⟩? The presence of gemination being indicated by ⟨ai⟩, as compared to ⟨ae⟩, would explain why ⟨aii⟩ was never used, since it's an odd exception otherwise. Theknightwho (talk) 23:42, 14 May 2025 (UTC)

@Theknightwho Perhaps?

I suppose my main question would be what motivated the change in spelling for the native diphthong from older ⟨ai⟩ to Classical ⟨ae⟩, if not a sound-change like > (en route to later ). Nicodene (talk) 00:02, 15 May 2025 (UTC)

@Theknightwho, Draco argenteus I mentioned praeacūtus earlier. Cser analyzes prae- as /praj/, resyllabified as /pra.j/ before a vowel, but I'm not confident that Cser's analysis of prae- as consonant-final is phonetically accurate, even though it gets the right results. I don't think it is likely that Romans felt ⟨ae⟩ and ⟨ai⟩ functioned to distinguish from , since they used ⟨ae⟩ in the spelling of words like iudaeus: this was certainly not pronounced as , and it seems improbable it was pronounced as . Word-final ⟨ae⟩ is often completely elided, which is not typical behavior for a VC sequence.--Urszag (talk) 00:17, 15 May 2025 (UTC)

@Urszag @Nicodene For the purpose of my point, it makes no difference whether we analyse it as or : the issue is the gemination. Nicodene's point on why the orthography used ⟨ae⟩ is a good one, but if ⟨ae⟩ is and ⟨ai⟩ is , then the apparent gap does still need to be explained.

One reason I can think of is orthographic: the use of ⟨AII⟩ at a non-morpheme boundary creates too much ambiguity, due to the number of ways it could be read. We see the sporadic use of long-I to get around this, but making a distinction between ⟨ae⟩ and ⟨ai⟩ works just as well. Theknightwho (talk) 00:34, 15 May 2025 (UTC)

If the phenomenon only concerns prae, it could just be that prae was subject to some form of unstressed reduction. (À la prehendo?) Nicodene (talk) 01:26, 15 May 2025 (UTC)

I want to push back on the notion that having both forms will confuse readers. If this is really the case, then the link pointed to by the "key" text should be edited to say what these brackets mean. As for getting rid of the dual transcriptions... is it wrong for me to say that I find both forms useful? -BRAINULATOR9 (TALK) 00:47, 4 May 2025 (UTC)

I linked to one example that I think shows this confusion occurs, and I can't point to it but I think I've seen other examples. However, I don't have any way to be certain how common this problem is. Adding more explanation behind a link may do some good, but only if readers follow it, which can't be guaranteed. Could you explain more about what you find helpful about having the phonemic transcription alongside the phonetic one?--Urszag (talk) 01:11, 4 May 2025 (UTC)

I don't see any reason why we would have to have either "//" or ""- if we're not representing our notation as completely standard IPA, we're not bound to using the standard IPA conventions. It would simply be a matter of whether they would help us get our information across to our readers. Chuck Entz (talk) 03:51, 5 May 2025 (UTC)

I favor using only IPA characters in the transcription. Using brackets seems like it helps to differentiate the phonetic transcription from other parts of the entry (including respellings, such as "nichil", which Theknightwho has proposed adding to pronunciation sections for some words).--Urszag (talk) 20:41, 5 May 2025 (UTC)

I think we can safely use without any problems, and I'm really not keen on reinventing the wheel by using pseudo-IPA. Phonetic respellings are useful in that they highlight irregular pronunciations in a way that clarifies the differences. The very fact that "phonetic respelling" is there at all is a flag to users that the term is odd in some way. Theknightwho (talk) 21:34, 5 May 2025 (UTC)

I agree with @Urszag and @Theknightwho. Benwing2 (talk) 21:38, 5 May 2025 (UTC)

Phonemic transcriptions are generally simpler and for a language as widely spoken as Latin was, I can't help but think that not everyone would have realized certain sounds the same way. Admittedly, I know very little about the specifics of how Latin speakers spoke Latin, and my thoughts are purely speculative, but it's nice to see what people thought they were saying versus what they were actually(?) saying. -BRAINULATOR9 (TALK) 01:57, 8 May 2025 (UTC)

If people are confused about what // vs means, they're probably not only confused about it with regard to Latin, but any language, right? I suppose we could have all our various pronunciation modules output explicit explanatory notes when they sense or are triggered to output // vs , like instead of "/fu/, " they could produce "(broad phonemic:) /fu/, (narrow phonetic:) " or something, for logged-out users. (The text could have some class so that logged-in users who know the difference could 'turn it off' / make it invisible.) - -sche (discuss) 03:55, 4 May 2025 (UTC)
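The labelling idea in the comment above can be sketched roughly as follows. This is only an illustration in Python, not the code of any existing pronunciation module: the function names `label_transcription` and `label_all` are hypothetical, the label wording is taken from the comment, and the class-based hiding for logged-in users is not modelled.

```python
def label_transcription(transcription: str) -> str:
    """Prefix a transcription with an explanatory label chosen by its delimiters.

    Hypothetical helper: slashes are treated as phonemic, square brackets
    as phonetic, and anything else is returned unchanged.
    """
    t = transcription.strip()
    if len(t) >= 2 and t.startswith("/") and t.endswith("/"):
        return f"(broad phonemic:) {t}"
    if len(t) >= 2 and t.startswith("[") and t.endswith("]"):
        return f"(narrow phonetic:) {t}"
    return t


def label_all(transcriptions: list[str]) -> str:
    """Join several labelled transcriptions into one display string."""
    return ", ".join(label_transcription(t) for t in transcriptions)


# "[fu]" is only a placeholder here, not a real narrow transcription.
print(label_all(["/fu/", "[fu]"]))
# prints: (broad phonemic:) /fu/, (narrow phonetic:) [fu]
```

Whether such labels would actually reduce confusion is the open question in this thread; the sketch only shows that the bracket type alone is enough to choose the label.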

I'm all in favor of eliminating the overly narrow transcriptions we currently have, including the unnecessary contrast between velarized and palatalized allophones of /l/ and all but the most essential diacritics. As for qu gu su, w:Latin prosody § Quantity tells us "qu counts as one consonant", i.e. it was /kʷ/, but it doesn't say anything about gu and su before a vowel. —Mahāgaja · talk 14:22, 6 May 2025 (UTC)

Saying that "qu counts as one consonant" is one way of conceptualizing the fact that it doesn't make a preceding syllable heavy. Another way is to analyze it as a tautosyllabic complex onset cluster, and say only heterosyllabic clusters make the preceding syllable heavy (because the thing that makes a syllable heavy is really the presence of a consonant in its coda). Compare patrēs: the first syllable can be scanned short (pătrēs ) but hardly anyone considers in this context to be a single consonant phoneme. We can equally say that ăquă is pronounced , with a light first syllable because the cluster is syllabified here as an onset rather than being split across syllables as . I'm fine with using the transcriptions , but prosody doesn't actually prove that they must have been unitary segments.
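As a rough illustration of the point above (weight follows from where the syllable boundary falls, not from whether the cluster is one segment), here is a minimal Python sketch. It works on orthographic syllabifications typed with dots rather than on IPA, the function names are hypothetical, and it deliberately ignores diphthongs, so it is only valid for inputs containing short and long monophthongs.

```python
import unicodedata

# Long vowels are written with a macron; NFC keeps them as single characters.
LONG_VOWELS = set(unicodedata.normalize("NFC", "āēīōūȳ"))
VOWELS = set("aeiouy") | LONG_VOWELS


def _clean(syllable: str) -> str:
    """Lowercase, drop breves (ă -> a) and recompose macrons into single characters."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    return unicodedata.normalize("NFC", decomposed.replace("\u0306", ""))


def syllable_weight(syllable: str) -> str:
    """A syllable counts as heavy if it has a long vowel or ends in a consonant (a coda)."""
    s = _clean(syllable)
    if not s:
        raise ValueError("empty syllable")
    has_long_vowel = any(ch in LONG_VOWELS for ch in s)
    has_coda = s[-1] not in VOWELS
    return "heavy" if has_long_vowel or has_coda else "light"


def scan(syllabified: str) -> list[str]:
    """Weights for a word whose syllables are separated by dots."""
    return [syllable_weight(s) for s in syllabified.split(".")]


print(scan("pa.trēs"))  # ['light', 'heavy']  cluster kept together as an onset
print(scan("pat.rēs"))  # ['heavy', 'heavy']  cluster split across syllables
print(scan("a.qua"))    # ['light', 'light']  "qu" syllabified as an onset
```

The same function gives a light first syllable for "a.qua" whether "qu" is treated as one segment or as an onset cluster, which is why scansion alone does not settle the choice of transcription.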

Typically, "gu" only occurs after (in the context -ngu-) in Latin. It can occur in other contexts in Medieval or New Latin, but I don't think that's relevant to Classical Latin transcription. So there is no direct evidence of how affects prosody in Classical Latin when it comes directly after a vowel. However, I think analogy is sufficient reason to use the same type of transcription for both "qu" and "gu".

"su" only occurs at the start of a morpheme. The scansion of compound words such as mălĕsuādus shows that it does not make a preceding syllable heavy (compare the scansion of words like respondet , where even though the prefix re- has a short vowel, it gets turned into a heavy syllable by resyllabification of the from the initial cluster in the base word, spondeō).--Urszag (talk) 19:30, 6 May 2025 (UTC)

The only other position it ever occurs is initial, in Medieval Latin (and possibly late Late Latin, too). Theknightwho (talk) 23:14, 6 May 2025 (UTC)

We can look to spelling artefacts such as distinguuntur/distinguntur, which are similar to loquuntur/locuntur/loquntur, and which suggest similar behavior of qu and ngu, in being neutralized before another u but variously retained in spelling. Cassiodorus (I think?) specifies that in nguu the first u is not pronounced, making it identical to ngu. Draco argenteus (talk) 07:25, 7 May 2025 (UTC)

Coming to a conclusion

I think it's time we concluded this discussion. It seems there is consensus among everyone (User:Urszag, User:Nicodene, User:Benwing2, User:Babr, User:Draco argenteus, maybe User:Mahagaja), except maybe User:Theknightwho, to discard the phonemic pronunciation and only display a broadly phonetic one, with the following properties:

not
not
no dental diacritics on , i.e. not # etc.
syllable onsets use not
final nasals use not #

This leaves the following that need resolution:

Syllable final diphthongs; consensus is leaning towards .
Alveolar diacritic on ; drop it or keep it?
Dark (or ? I dunno which is more correct) vs. light or ; do we show this distinction, and if so, how? I'm personally in favor of making a distinction, and probably dark vs. light .
/ɡn/ sequences: scholarly consensus seems to lean towards and I think we should do likewise, even if Proto-Romance evidence equivocally suggests ; keep in mind that (a) there's no reason the prestige variant of Classical Latin that we're documenting has to be the same as Proto-Romance, (b) the evidence for Proto-Romance is (AFAIK) somewhat equivocal.
How to represent <qu>, <gu>, <su> when the u was semivocalic. I have no strong opinions here.

Benwing2 (talk) 23:38, 6 May 2025 (UTC)

My responses:

I prefer for reasons discussed above.
I prefer without a diacritic. But if there’s consensus to include a diacritic, I don’t object to it.
The correct IPA symbol for a velarized lateral liquid is . I wouldn't object to using either or for coda l (e.g. falx , albus , facul ): there's pretty consistent evidence that this was velarized throughout the history of Latin. I don't think we should use before vowels, that is, for /l/ in the syllable onset: the evidence for velarization here is contradictory and suggests either variation over time or between speakers. I'm not a fan of using for the lightest allophone of /l/ (although this does follow Sen's transcriptions): I would instead favor transcribing nūllus as , relinquō as
I prefer , and would object to .
I don’t care whether we use or . I would object to an inconsistent system.--Urszag (talk) 00:16, 7 May 2025 (UTC)

I agree on all of these points. Nicodene (talk) 00:21, 7 May 2025 (UTC)

I agree with all of Urszag's responses. But for 5, I prefer whereas they had no preference. — BABR・talk 08:17, 7 May 2025 (UTC)

OK, based on the discussion below with Draco argenteus, I'm revising my answer for 1 slightly to (and for "ui").--Urszag (talk) 22:01, 8 May 2025 (UTC)

/ae̯ oe̯ au̯ eu̯ ei̯/ are fine with me.
I prefer no diacritic on /s/ (and I'm not even sure what the "alveolar diacritic" is anyway, or did you mean the dental diacritic?)
I remain unconvinced that it's necessary to distinguish two varieties of /l/, especially if, as Urszag says, there isn't 100% certainty of their distribution in onset position.
I prefer /ŋn/.
I slightly prefer /kw ɡw/ but could live with /kʷ ɡʷ/. /sʷ/, on the other hand, really rubs me the wrong way. —Mahāgaja · talk 05:39, 7 May 2025 (UTC)

1. Agree until eu and ei. This primary source evidence makes me support (vowel quality of regular short e) instead of . For ei likewise by extension (although I can't confirm right now, I think Sidney Allen suggests the vowel quality of a short e for both of these). However, I oppose all semivowel i-final diphthongs (except yi), as they are not recognized in the Roman grammarian tradition to my knowledge, and too many of them can be created to no clear benefit or purpose (ui ai ei oi? how would they be distributed? very contentious and pointless), and I prefer recognizing them as simply u/a/e/o + as a consonant.

2. Prefer . Overall s being "special" is a popular topic, and it looks to be citable as with a majority of scholars behind it.

3. Prefer in syllable coda, other than as part of /ll/. Prefer for clear l without any further specifications. Identity of /l/ in syllable onset is fraught with some difficulties, and the easiest way out would be to follow Allen and make the distribution of light and dark l similar or identical to that in British RP -- clear before all vowels. However I'm not opposed to having a full distribution of dark l, where it is assumed before most vowels.

4. Prefer .

5. Prefer for all three.

Draco argenteus (talk) 07:10, 7 May 2025 (UTC)

@Draco argenteus I strongly dislike any scheme that mixes the symbols and to denote offglides. I feel like the transcriptions tend to imply the sounds have more phonetic constriction than and tend to suggest the sounds function as consonants on the phonological level. I don't think there is sufficient evidence to conclude that the "i" in words like deinde, deinceps, cui, huic either functioned as a consonant or had more phonetic constriction than the vowel . But if we do adopt such an analysis, I think it makes most sense to apply it equally to seu and laus, ancient doctrines about diphthongs notwithstanding, so I would prefer over .

Thanks for pointing out that passage in Terentianus Maurus about the pronunciation of "eu". I'm not certain about how the first element of diphthongs was pronounced and I don't object to using instead of .--Urszag (talk) 22:57, 7 May 2025 (UTC)

It's possible to read the i-final diphthongs into cases for some reason not recognized by anyone as diphthongs, e.g. eius, maius, Troia, cuius (eius and cuius are both incidentally perhaps phonetically just ei and cui with one more syllable). Here for some reason the idea of there being a diphthong is not adopted, and the ancient view is used that the consonantal i is just said geminate. For this reason I'm for general to eliminate the arbitrariness (diphthong when it's cui, nuh-uh when it's cuius), while I support a diphthongal identification for au and for eu for traditional reasons. Which I understand sometimes just slavishly follow Greek, though it is on that token applicable for Latin given that eu predominantly occurs in words coming from or through Greek and is then employed in Greek-derived poetic meters. But yes, I just suggest it all because it's as good a place to draw a line in the sand where diphthongs end and vowel+consonant sequences begin as any other. Re: constriction and so on, I do not see or imply any phonetic difference whatsoever between and (these are both a non-syllabic i), except that one conceptually here marks a part of a diphthong for me and the other doesn't. Possibly significantly, eventually the so-called "ui" and "ei" diphthongs lose their diphthongal quality in versification and become sequences of two syllables, although this could be a consequence of doctrine regarding diphthongs. Whereas an analogical uncoupling of au and eu did not happen, to my knowledge, nor of any other diphthongs; in fact this makes i-final diphthongs (other than yi imported from Greek) unique in Latin for being freely dissolved, and in fact, I believe, not considered diphthongs anciently. Draco argenteus (talk) 06:24, 8 May 2025 (UTC)

@Draco argenteus It's true that if we use , there will be alternations between and . But as Cser points out, there are also alternations between and in contexts like the conjugation of verbs like faveō, fautum. I don't see how the one is more arbitrary than the other, and to me, it seems more arbitrary to distinguish from but not from than to make both distinctions in parallel, based on the position of the sound in the syllable. Given your position of "I do not see or imply any phonetic difference whatsoever between and ", do you find the phonetic transcriptions acceptable, even if they are not your first choice? I feel like the variable scansion of "ui" and "ei" actually is an argument for transcribing them with two vowel symbols, rather than a vowel symbol + a consonant symbol, since the disyllabic variants unambiguously contain two vowel sounds.--Urszag (talk) 20:53, 8 May 2025 (UTC)

Sure, it's not my first choice but I find it acceptable, at least for now. Maybe it can be discussed later. The variable scansion does bring one to that topic. But, at this point, monosyllable will be easier for the readers who want to scan Classical verses. Draco argenteus (talk) 21:29, 8 May 2025 (UTC)

alongside would suggest a difference for which we, to the best of my knowledge, do not have evidence. Nicodene (talk) 19:13, 9 May 2025 (UTC)

@Benwing2 If I'm not mistaken, the changes favoured by a majority are:

not
not
remove dental diacritics
syllable onsets use not
final nasals use not #
without diacritic
not ; okay in syllable coda (not in the geminate )
keep

Questions on which we seem to remain undecided are:

Representation of diphthongs
Representation of e.g. aqua, lingua, suavis

Perhaps we can go ahead with just making the first group of changes for now? Nicodene (talk) 03:47, 14 May 2025 (UTC)

Sounds good to me. I'll make the changes in a day or so. Benwing2 (talk) 03:54, 14 May 2025 (UTC)

To give my view:

Fine with all the changes proposed.
My view's hardened on preferring , for all of the reasons I've given above, as well as the fact that they never underwent the post-Classical shift from > (which isn't direct evidence for how they should be treated in Classical, but it's supportive of the other points mentioned).

Theknightwho (talk) 23:35, 14 May 2025 (UTC)

I went ahead and implemented the "changes favored by a majority" and also disabled the phonemic notation unless |include_phonemic=1 is given. I didn't touch the representation of diphthongs or of qu/gu/su, so we are for now remaining with what was there before, which writes and . I didn't change the use of l-pinguis before non-high-front vowels; this might be up for discussion. Benwing2 (talk) 06:15, 15 May 2025 (UTC)

Thank you! One immediate request that I have that follows from the disabling of the phonemic transcriptions is to put the syllabification marks in the phonetic transcriptions. They were removed on the basis that a syllable division is not a phonetic entity. This is strictly correct; however, even though "." is not a sound, syllabification has audible effects and the syllabification of words like abluō is important to the meter of Latin poetry. Of course, this request is conditional on other editors agreeing with me.--Urszag (talk) 06:29, 15 May 2025 (UTC)

I completely agree with this FWIW. Benwing2 (talk) 06:31, 15 May 2025 (UTC)

OK I went ahead and restored the dots. It's a one-line change so we can always undo it if there are objections. Note that praeiūdicō is not being handled correctly either at the Classical or Ecclesiastical level; we'll need an additional rule for this, maybe. Benwing2 (talk) 06:42, 15 May 2025 (UTC)

NVM, it's fixable with a manual syllable break. Benwing2 (talk) 06:44, 15 May 2025 (UTC)

Looks much better, thank you. Agreed about the desirability of marking syllables. For /l/ before vowels, I'm happy with either this set-up (clear before /i:/ or /i/) or the ‛Allenesque’ type (clear before all vowels). Edit: I do however have some doubts about after , as in .

For Ecclesiastical, some possible points to discuss:

→ and removing dental diacritics, as with Classical
→ , as in Italian
→ either or (for intervocalic s, as in rosa)
- is more traditional/‛proper’. seems to be gaining ground, as in Italian. seems to be an attempt to transcribe both at the same time.

Nicodene (talk) 07:04, 15 May 2025 (UTC)

I just noticed all the diacritics disappeared.

If we're going to simplify things, can we document all the scholarly discussion at Appendix:Latin pronunciation? The is where I learned about , and stuff and now it's gone and new learners won't discover this.
Our new IPA is broad so shouldn't it use //?
@Nicodene, Draco argenteus: (dental) is the opposite of the (laminal flat postalveolar) that was there before. Speaking of which why didn't anyone mention "postalveolar," why was there, and are there any scholars arguing for it?

174.138.213.2 01:56, 16 May 2025 (UTC)

Yes, we should include information about the articulation of Latin sounds at that appendix.
The term "broad" doesn't have a clear definition. Sometimes it is used as a synonym of "phonemic transcription": our transcription is not "broad" in that sense since it marks non-phonemic allophones such as /l/ as or , /i/ as (before vowels) or (before consonants). A phonemic transcription represents phonemes. (For reasons discussed above, it is difficult or controversial in some cases to determine what Latin phonemes were present in a word, as in magnus, lingua, ēnsem, or they might not be transcribable with IPA because that alphabet doesn't have letters for abstract phonemes like a "placeless nasal" segment.) The 1999 handbook of the IPA introduces "broad" as a synonym for phonemic transcription, and then differentiates between various kinds of narrow transcriptions. Our current scheme for Latin would be categorized by its criteria as "allophonic"/"systematic narrow", of the subtype "slightly narrow" as opposed to "very narrow". Some quotes: "it is possible (and customary) to be selective about the information which is explicitly incorporated into the allophonic transcription", "Narrowness is regarded as a continuum" (page 29).--Urszag (talk) 02:34, 16 May 2025 (UTC)

If we're going to simplify things, can we document all the scholarly discussion at Appendix:Latin pronunciation?

One suitable place to host this information would be the article Latin phonology and orthography.

The is where I learned about , and stuff and now it's gone and new learners won't discover this.

The picture of would-be precision that our transcriptions previously gave was in large part fantastical.

Our new IPA is broad so shouldn't it use //?

// is for phonemic transcription. is for phonetic transcription, which can vary in broadness or narrowness. Some specialists like to use ⟦⟧ to distinguish (very) narrow transcriptions.

@Nicodene, Draco argenteus: (dental) is the opposite of the (laminal flat postalveolar) that was there before.

is apical (apical alveolar in this case).

Speaking of which why didn't anyone mention "postalveolar," why was there, and are there any scholars arguing for it?

For some citations pointing to scholarly discussions about Latin /s/, see here (with a brief overview here).

Nicodene (talk) 02:54, 16 May 2025 (UTC)

On a new centralized citation system for bibliographic references

Happy 1st of May, my dear fellow editors. I would like to propose adopting a new system for handling citations and bibliographies, centered on a template I've developed called {{bibref}}, which works much like Wikipedia's {{sfn}} or our {{zh-ref}}. This system addresses several longstanding issues with how we cite sources, each requiring its own reference template:

Creating or editing these templates is not easy for beginners and tedious even for experienced editors.
They have grown into the hundreds, making them hard to manage and standardise.
Full citations are quite lengthy considering they have to be repeated on each entry, and when an entry has a good number of them it hinders readability. Some have started to hide reference sections in boxes whenever they get too unwieldy.
The most common way to have |pageurl= link to Google Books or the Internet Archive is to invoke Module:ugly hacks. It is unfortunate that we are still relying on a workaround for such a basic feature.

The new system would be using {{bibref}} in reference sections (e.g. as in 𐁁𐀴𐀍𐀦 (a3-ti-jo-qo)), which makes abbreviated citations linking to a full bibliography (e.g. the Mycenaean one), itself generated starting from a JSON-like database (e.g. Module:bibliography/data/gmy). I believe this system has a good number of benefits.

The citation syntax and the process of adding sources is simpler and more human-readable, with no need to learn convoluted wikitext syntax and template conventions.
Entry pages remain focused on their content, while full bibliographic details are offloaded to the dedicated bibliography pages.
Each language (or, where appropriate, each family) would have its own bibliography page, making it easy to see what works have been cited, as opposed to the current system of checking the template categories (e.g. the Armenian one).
Centralisation also improves maintainability and consistency. It becomes easier to find errors, dispreferred formatting, or missing metadata.
Although the system is definitely far from perfect at the moment (a proof of concept made with Mycenaean in mind, possibly lacking features essential for other languages), I believe it is more adaptable to future technical changes. Bots (or tireless editors) will not have to update hundreds of individual templates to enforce them.
All this may encourage better referencing habits. By making it easier to cite, editors are more likely to actually include proper references.

I started this with Mycenaean Greek, as the examples I made earlier show, and similar templates have had successful precedents in Chinese and other languages of the Sinosphere (Japanese, Korean, Vietnamese), as well as on Wikipedia. The template should ideally be moved to {{R}}, by analogy with {{Q}}, to save keystrokes (currently under {{bibref}} because it would not have been a good idea to create a template with that name without prior community consensus). If adopted, the transition would be slow and gradual, allowing both systems to coexist.

Catonif (talk) 19:16, 1 May 2025 (UTC)

I believe this system has a good number of benefits.

I also believe so.

I am not sure whether editors will find it easier to cite or to decipher.

Some have started to hide reference sections in boxes whenever they get too unwieldy. Now you hide them on separate pages, as admittedly Chinese pages already do. In any case designs on avoiding to repeat a reference in full if you link another page in the same work are legitimate, which however often is not a need in inline references. See in قالة#References I wanted to reference Høst, Georg Hjersing … page 272 and 277 and not repeat Høst, Georg Hjersing … after 272 again. Another solution would be to separately templatize a linking mechanism and/or only access that via a template that only writes the output “page xxx (linked)”.

I cannot comment on the ugliness of the ugly hack and the fairness of the new page fetching mechanism: it would have to be smart enough to distinguish within volumes of the same work or even works from collected works as in {{R:sem-eth:Littmann}}, no? In general the abbreviated style appears more legitimate in philologies of ancient languages, especially Trümmersprachen, where but academically inclined people read and “of course” know what Nakassis 2013 is, because they have these references all the time.

Either way it sounds like fun. You present new logics many people will not wrap their heads around, or otherwise will succeed in it and then forget about it, if interested so much in Wiktionary as the steady commenters here. It is a bit like trying to convince smokers of substituting their habit with vaping. The implication that there is no need to learn, and practice, something convoluted, and that editors are more likely to actually include proper references by the presented solution, has little verisimilitude from my angle: various citation styles being employed across multiple pages, as the varying qualities and workflows disgorged in them, sound equally plausible. (Lots of anarchists here.) Fay Freak (talk) 20:31, 1 May 2025 (UTC)

Fair points. We can scratch the "easier" aspect of point #1 as it is subjective, though the current system is certainly not easy either: vaping has its inevitable flaws but it is less harmful than smoking. And let's scratch point #6 as well, optimism is not evidence and I do not claim to see the future.

And you are right, this works at its maximum potential on languages with a relatively stable bibliography, such as ancient languages or obscure LDLs, but in the end, even languages with greater literature often have those few go-to sources everyone ends up of-course-knowing. Not sure if you mentioned the Høst as a point in favour or a point against, but with the new system it would be {{bibref|ar|Høst:1781|p=272|p2=277}}, handling the page urls as well. And yes it can handle Littmann, and anything it cannot handle it can still be made to handle, the infrastructure is versatile. Catonif (talk) 21:25, 1 May 2025 (UTC)

If I understand correctly, there will be no easy way of finding which pages use a specific reference. Like we currently do by "What links here"? Vahag (talk) 20:43, 1 May 2025 (UTC)

Right, I can set up a tracking mechanism for that. Catonif (talk) 21:26, 1 May 2025 (UTC)

@Catonif Hi. Are you essentially proposing a replacement for {{Q}} that works similarly but is better written and designed? BTW as for the proliferation of reference templates, before @Vininn126 created all the 6,000 or so Old Polish ones that currently exist, I suggested incorporating them into {{Q}}, but I wasn't able to help out because I didn't have the time and didn't (and still don't) understand how that monstrosity of a module works. I would generally be in favor of that but I'd like to get some more info on the specifics, and in place of things like p=272|p2=277 I'd encourage using a single comma-separated param with inline modifiers if necessary, as it's usually a lot easier to type. Benwing2 (talk) 22:18, 1 May 2025 (UTC)

@Benwing2 Hi! The template is meant for references, so it actually aims to replace the {{R:}} templates, while {{Q}} will keep being used for quotations. About the specifics, I will eventually write a more exhaustive documentation, though for now you can get a rough idea of how it works by seeing the Mycenaean data module and all its current instances, alongside the Mycenaean bibliography database and its outcome. There is still a lot that needs to be fixed and polished, of course, but I thought I would go into that after getting consensus, not to waste time in case people would have disagreed. About inline modifiers, you may add that syntax to the module if you want, although it may get messy. Take for example {{bibref|gmy|DMic.|v=1|mi-ta|p=454f.|da-ra--mi-ta-qe|p2=157ab}}, resulting in DMic., vol. 1, pages 454f.: “mi-ta”, page 157ab: “da-ra--mi-ta-qe”. With your syntax you could do |mi-ta<p:454f.>|da-ra--mi-ta-qe<p:157ab>, and for |p=272|p2=277 intuitively |p=272, 277, but for |mi-ta|p=454f.|p2=157ab? I will leave it up to you if you want to meddle with the idea, although I do not recommend it. Catonif (talk) 23:21, 1 May 2025 (UTC)
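
For illustration, the inline-modifier form mentioned above (a term followed by <key:value> tags) could be parsed roughly as below. This is only a Python sketch of the idea, not the code of any existing Wiktionary module (those are written in Lua); the function name `parse_term` and the exact tag grammar are assumptions.

```python
import re

# Hypothetical helper, for illustration only: split "term<key:value>..." into
# the bare term and a dict of its inline modifiers.
_MODIFIER = re.compile(r"<([^:<>]+):([^<>]*)>")

def parse_term(text):
    # Collect every <key:value> tag attached to the term.
    modifiers = dict(_MODIFIER.findall(text))
    # Whatever remains once the tags are removed is the term itself.
    term = _MODIFIER.sub("", text).strip()
    return term, modifiers

print(parse_term("mi-ta<p:454f.>"))
print(parse_term("da-ra--mi-ta-qe<p:157ab>"))
```

With this shape each page specification travels with its own term, so no numbering of parameters (|p=, |p2=) is needed to pair them up.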

@Catonif Thanks. In terms of inline modifiers and commas, I see you are making |p= go with the first term and |p2= go with the second. I definitely think in that case that inline modifiers are better because it gets hairy if you have several terms, although I have a module Module:parameter utilities that specifically supports both inline modifiers and separate numbered parameters for list parameters like this, which I have used for things like {{syn}} that support both syntaxes. For | I was suggesting this under the assumption that |p= and |p2= were two pages for the same term rather than page parameters for separate terms. If you do need a way of specifying two pages for the same term, definitely use comma separators (and without a following space; the principle I've used is that comma + space is used for embedded commas and the separator isn't recognized in such a case). I see no issue with |p=454f.,157ab in case we need to refer to two pages for the same term and the pages have more complicated specs like just given. I will take a look at your implementation but in general it would be nice if {{R}} and {{Q}} were synchronized rather than being two entirely different implementations and double the cognitive burden for editors. Benwing2 (talk) 23:45, 1 May 2025 (UTC)
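
The separator principle described above (a bare comma separates items, while comma + space counts as an embedded comma) can be sketched in a few lines. This is a Python illustration under that stated assumption, not the behaviour of Module:parameter utilities itself; `split_pages` is a hypothetical name.

```python
import re

def split_pages(value):
    # Split on commas that are NOT followed by a space; "comma + space" is
    # treated as an embedded comma and stays inside the item.
    return re.split(r",(?! )", value)

print(split_pages("454f.,157ab"))
print(split_pages("272, 277"))
```

Under this rule |p=454f.,157ab yields two page specifications, whereas a value written with comma + space is kept as a single item.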

Actually the templates were for Polish, not Old Polish. Vininn126 (talk) 07:11, 2 May 2025 (UTC)

On this note, I've been thinking about a template to more easily organize multiple reference templates. Something akin to {{reflist}}, but for non-inline templates, and the ability to control their style and even group them by whatever categories are needed for the entry. Vininn126 (talk) 07:16, 2 May 2025 (UTC)

Looks very nice. Nicodene (talk) 09:28, 2 May 2025 (UTC)

I am inclined to migrate to this new system. Thanks for having worked on it, Catonif. What appeals to me most is the standardization of syntax. Currently some people give the page number with |page=, others with |1=. Some give volume number with |volume=, others with |vol=, yet others with |2=. It's a pain to remember which template uses which. I also like the new technical capabilities, such as generating separate external URLs for non-sequential pages; the usual templates link only the first page. The automatically generated bibliography list for a given language is also very valuable for researchers in and of itself.

My concerns are:

Looking for reference templates by typing R + language code + first letters of the author name in the search bar will not work anymore. Searching for the ID in the data module is tedious. It would be nice if the bibliography list generates an easily copypastable ID. We could then look at the bibliography to find what we need.
The references at the end of each article will now be cryptic, barely comprehensible collections of numbers, letters and surnames for regular users without following the link to full bibliography. That means each article alone will be incomplete. I don't mind this as I want to capture readers inside Wiktionary biosphere, force them to read several articles, follow crumbs and maybe solve a riddle before I give them the answer. But others prefer to give full etymology chains, full cognate sets, full definitions, full references on each article, creating self-contained units that can be screenshotted and shared on Twitter.
Filling in data modules like Module:Quotations/xcl/data is a pain. Memorizing the rules of filling in the new proposed bibliography databases is worth only if my next point is solved.
The most prolific reference creators will have to be brought on board voluntarily or forcibly. At least me, the Fairy Freak but also User:AshFox who favours a peculiar syntax in reference templates. If not, we will have to memorize even more ways of formatting references. More pain instead of less pain.

Vahag (talk) 12:41, 2 May 2025 (UTC)

Is the transition to this system mandatory in the future? For example, I am currently actively editing Old Novgorodian and references for it... Appendix:Old Novgorodian bibliography. I'm ready to try to move all this into one module... But why, if I need to edit one specific R, should I scroll down each time, look for the necessary line and in one huge list in the conditional Module:bibliography/data/zle-ono. Is this really more convenient? AshFox (talk) 13:28, 2 May 2025 (UTC)

We don't even know how one reference template belongs to one language only, so we would scroll multiple lists or use a search after trying one or more lists. Here lies the advantage of {{refcat}}, and {{quotation template cat}} and the categories these templates (formerly nude category syntax) place references in.

How is the resource hunger of the new module accessing these lists? They should only access but one line, like now language data is accessed, otherwise the current citation templates are faster also in this respect.

Still the former can't be mandatory obviously because the outlook of moving thousands of templates to then hide the complete references from the main space and gain a bibliography is not motivating at all, and even leaves the impression – though it be irrational, if somebody agrees to it, which however nobody should suffer – that those who industriously created citation templates to source well and keep the Wikicode clean are now punished. In fact Category:Reference templates by language is a bibliography. The only thing we need is |pageN= or |p=454f.,157ab, as @Benwing2 proposed, within wonted {{cite-book}} templates, and another syntax (like ! determines whether page or pages is written) for only outputting the page without even the reference, useful when the reference is used within multiple footnotes, and on talk-pages discussing pages of a work, and in tables like Appendix:English dictionary-only terms. Fay Freak (talk) 17:31, 2 May 2025 (UTC)

Thank you very much for the input! I will try to tackle the points you have made, and I appreciate the opportunity to improve the system with your help.

@Vahagn Petrosyan: It would be nice if the bibliography list generates an easily copypastable ID. It now does! Try going on a bibliography page (e.g. gmy, or now that AshFox made it, zle-ono) and you will notice on the side bar a link that says "Show editor utilities" (the precise wording can be changed). This shows all the IDs of the sources for easier copy-pasting and searching, alongside a link that takes you to its usage tracker.
each article alone will be incomplete. That is true, and I agree this is a shift in our philosophy. But I'd argue that (1) information is not too cryptic if there is a bright blue link that shows you what it means. We could even set up the page previews gadget so that a quick hover over the link could show you the full citation. And (2) as I think we both agree, it is not each article that needs to be complete but the project in its entirety. Readers who come here just to see one entry and then clear off or see our entries via screenshots on Twitter probably do not really care about bibliographic details anyways, while for editors and researchers centralisation does pay off.
@AshFox: Is the transition to this system mandatory in the future? I am not the kind of person to go out and impose what I think is the best option on such a great userbase, I was only planning this on the technical side and did not consider amending editing rules to demand this. That said, as Vahag said and as FF illustrated with the comic strip, dual systems can eventually create friction. My hope is that the transition can be gradual and community-driven, and that the system proves useful enough that it gains traction naturally.
Your work on Old Novgorodian bibliography is remarkable and really does you credit, you set a high standard to compete with. What the system aims to do is to make this kind of excellence easier to replicate. Note about should I scroll down each time, look for the necessary line and in one huge list that the search feature of your browser (Ctrl+F) should be about as fast as looking up the template name in the search bar.
@Fay Freak: we would scroll multiple lists or use a search after trying one or more lists. Good point, for that reason I added the option to import sources from one bibliography to another, { import_from = "LANG" }, so sources can be shared across multiple bibliographies just as now they appear in multiple categories.
leaves the impression that those who industriously created citation templates to source well and keep the Wikicode clean are now punished. I hope not! And I hope that they rather feel relieved they do not have to do that anymore. The work that went into those templates is the foundation we are improving upon, not something we are throwing away.
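
To make the import option mentioned above more concrete, here is one way sharing sources between bibliographies could work. It is a Python sketch under loud assumptions: only the key name `import_from` comes from the discussion, while the language codes, source IDs, data layout and the function `resolve_sources` are invented placeholders, and the real data modules are Lua tables that may be organised differently.

```python
# Hypothetical data: each bibliography maps source IDs to records and may name
# one other bibliography to import from. All IDs and titles are placeholders.
BIBLIOGRAPHIES = {
    "lang-a": {"sources": {"Smith:2001": {"title": "Example Work A"}}},
    "lang-b": {
        "import_from": "lang-a",
        "sources": {"Jones:2010": {"title": "Example Work B"}},
    },
}

def resolve_sources(lang, data=BIBLIOGRAPHIES, _seen=None):
    """Return the sources available to `lang`, including imported ones."""
    seen = set() if _seen is None else _seen
    if lang in seen:  # guard against circular imports
        return {}
    seen.add(lang)
    entry = data[lang]
    merged = {}
    parent = entry.get("import_from")
    if parent is not None:
        merged.update(resolve_sources(parent, data, seen))
    # Local entries are applied last, so they win if an ID appears twice.
    merged.update(entry.get("sources", {}))
    return merged

print(sorted(resolve_sources("lang-b")))
```

In this sketch a shared work is stored once and becomes visible in every bibliography that imports it, which mirrors how a single template can currently sit in several categories.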

Catonif (talk) 19:27, 3 May 2025 (UTC)

We have two competing approaches, with no clear guideline.

A look at the entry for ⟨a⟩ illustrates how ridiculous articles can become if we attempt to list every language that uses a basic alphabetic letter. And often a letter will have closely related uses in a number of languages that have influenced each other orthographically. In such cases summarizing that usage in a translingual section would make sense.

On the other hand, we've set up categories for letters and diacritics of individual languages, and have navigation templates for individual alphabets that link to those language sections.

In some cases, a letter or diacritic is used for a single language, and it would be odd to call such situations 'translingual'. Examples are some of the Arabic letters of Serer and Rohingya orthography. At the extreme is ⟨Ⱦ⟩, which doesn't even have a lowercase form because the only orthography that uses it is monocase capital. Another example is ⟨[b⟩, originally a hack for barred b that AFAICT is only used for Kiowa. Sometimes a letter + digraph combo is created independently for two languages, which again would be odd to call 'translingual', esp. if the glyph origins were unrelated.

So, when I come across an article on a Unicode character whose only content is a 'definition needed' tag, and I find it's unique to a single orthography, should I create a dedicated language section for it? What if it's only recorded from two? Or if there are more, and we already have an alphabet nav template for one of the languages, or a category for the letters used by that language?

2.5 thoughts: (1) I don't know if we can sensibly avoid having lots of language sections on a, W etc, (1.5) unless perhaps we move things like pronunciation (/ˈdʌbəlju/, /veː/) to appendices? (2) Maybe we could solve/avoid the question of 'how many languages counts as translingual?' by replacing ==Translingual==, in character entries, with a header like ==Character==?

1: At the end of the day, I don't know how sensibly we can avoid having lots of language sections on pages like a or A. To my understanding, most languages which use a particular letter can also use that letter to name itself, in things like "spell this word with one W", and each pronounces that a certain way, like /ˈdʌbəlju/ vs /veː/, with a certain way of spelling it out, like double-u, and when Wiktionary is one day more complete, we'll have each language's pronunciation and way of spelling it out, etc, and we'll need to put them somewhere. We could just have a ==Translingual== section with a ===Pronunciation=== subsection that has a line for every language, and house all the double-us and wus etc in another section, but ... is that better than giving each language its own section? The latter has the advantage of matching what is done in other cases, with e.g. words; we don't have a ==Translingual== section at dog with separate lines in the pronunciation section to explain that it's pronounced one way in English and another in Afrikaans, and with separate definition-lines to explain that it means one thing in English and another in Afrikaans, etc, we just have separate language sections, so it seems reasonable to me to think that the more intuitive presentation of W might also be to house the different pronunciations etc in different language sections. This does mean that W will have many language sections, but again, is having one language section and then a massive pronunciation section, and long list of 'derived terms' or whatever we classify double-u as, better? Is it more intuitive for readers than putting language-specific content in language-specific sections so they can see all the content for the language they're interested in in one place? I don't know.
(Thought 1.5: An alternative which I recall being mentioned before would be to have language-specific appendices with all the pronunciation information etc, and then have one translingual section at W that contains a litany of links to every one of those appendices, which would work.)
2: If people really want to reduce how many L2s such pages have, and want to avoid having to set an arbitrary or unclear cutoff for how many languages count as ==Translingual==, one spitballing idea would be to create a new L2 header like ==Character== (or ==Glyph==; precise name TBD). Any character that currently has a ==Translingual== L2 header (like T and Т) would see the ==Translingual== header changed to ==Character==, but "language-specific" glyphs like Ⱦ (and Τ?) would also have the possibility of being changed to ==Character==. The current definitions of T#Translingual and Ⱦ (like also ⠃) arguably already adequately convey what languages they're used in, ~"anywhere that uses the Latin script" and "Saanich", or if anyone feels they don't, we could add some more explicit "used in X, Y, and Z languages" clauses. This would solve(?) the issue of having to decide whether a particular character is used in enough languages to be ==Translingual==, since any character we choose to have an entry on could go under the ==Character== L2, while it would not necessarily have any effect one way or the other on the question of whether to have language-specific sections, since to whatever extent people want or don't want to have T#Afar alongside T#Translingual, they can want or not want to have T#Afar alongside T#Character.

- -sche (discuss) 18:17, 3 May 2025 (UTC)

Category: "native english words"

This isn't entirely a serious suggestion, but it nonetheless seems interesting to talk about. I saw that "native Korean words" is a category for Korean, and while Korean etymology on this site seems to be handled inherently differently from English (IE for Korean it's usually "first attested in..." rather than "from proto...")

But a category for native English words wouldn't be a horrible thing to do, even though it's weird and unnecessary.

To make it wholly clear a "native English word" would mean this:

proto indo european --> proto germanic --> proto west germanic --> old english --> middle english --> english

Which means no Old Norse words, no Latin words that came to Old English, no words that are known to be of Celtic origin brought into Proto Germanic (such as "iron"), none of that kind of stuff. The word must be confidently accepted as having come straight through the etymology chain I've listed.

This is quite a silly suggestion, I know. Troopersho (talk) 17:24, 2 May 2025 (UTC)

The other way around, "native Korean words" is a poor excuse of a category and should be nuked. — SURJECTION ^{/ T / C / L /} 18:02, 2 May 2025 (UTC)

@Surjection It does actually have some benefit due to Korean being a language isolate- in effect, its own family. Chuck Entz (talk) 18:50, 2 May 2025 (UTC)

Hardly. For one, there's Jeju, which is more likely a closely related language than a dialect. Secondly, we have Old Korean and Middle Korean as separate languages, so any 'native' Korean term should be marked as inherited from either. Thirdly, the category has been misused numerous times, because it is added by {{ko-etym-native}} - which people have on occasion added to obviously recent compounds, some of which were even formed from obvious recent borrowings. — SURJECTION ^{/ T / C / L /} 18:57, 2 May 2025 (UTC)

It is a useless category because you can browse Category:English inherited terms and so on. Fay Freak (talk) 18:42, 2 May 2025 (UTC)

@Troopersho The category for this is Category:English terms inherited from Proto-Indo-European, which excludes all borrowings from other Indo-European languages. Theknightwho (talk) 10:21, 9 May 2025 (UTC)

Let us please get rid of the Korean category. Polomo47 (talk) 15:30, 9 May 2025 (UTC)

Requested unprotection of sweet summer child

In 2021, the above page was indefinitely protected, allowing only autoconfirmed users to edit it. The user who did this, and who left the project in 2024, gave this reason, based on what they found on the talk page at the time: Excessive vandalism: people keep falsely adding the claim that this pre-dates modern Game Of Thrones books. However, as shown in (Talk:sweet summer child#Etymology) since, those claims had been largely true. We need to rewrite the etymology section, which I removed for now. We could use any help we can get, including from unregistered users. Regardless, the original reason for the protection wasn't valid, so there's no reason to keep it in place. Renerpho (talk) 04:43, 4 May 2025 (UTC)

@Renerpho: It is the first citation of this sense at Citations:sweet summer child. Despite claims that it was used earlier, no citation was added (note: we already have the ones mentioned on the talk page under “poetic allusion of various meanings”). J3133 (talk) 05:01, 4 May 2025 (UTC)

I have restored the etymology as your rationale, “those claims had been largely true”, is without evidence; we already mention that “isolated occurrences go back to the 1800s” (i.e., the mentioned claims which we already had). J3133 (talk) 05:06, 4 May 2025 (UTC)

I think we may be stuck until (if ever) any other etymology-dictionary or scholarly/reference work looks into this. (I'm surprised by how strident the people who think GRRM either definitely did or definitely didn't coin it are.) I think any wording has to hedge, and acknowledge the prior attestations. The current wording leans towards saying he coined it, but does hedge enough, I think. (I will note that many words and names which people have held him up as coining, like Margaery, have turned out to long predate him, so I would view any unhedged statement that he definitely coined this with scepticism.) - -sche (discuss) 01:13, 7 May 2025 (UTC)

Nakba as WOTD

I noticed that Nakba was set as a WOTD for next week. I have no problem with the entry itself but I'm worried that featuring it on the main page might cause controversy given the current political situation. Brexiteer was cancelled a while back for similar reasons (link to that discussion). What do we think? (@Sgconlaw) Ioaxxere (talk) 06:54, 5 May 2025 (UTC)

Happy to go with whatever the consensus is. It was proposed on the WOTD nomination page. — Sgconlaw (talk) 11:24, 5 May 2025 (UTC)

I don't think I have a problem with the entry or the upcoming feature, and as entries for words go, the entry looks more or less fine. In general, I think I'd enjoy WOTDs more when they are just interesting, qualifying words picked out of a hat and aren't politically topical or remembrance based; or, if there's a good entry that also happens to be political, it can be featured on any arbitrary day instead of holding it on a themed day, and that way avoids some controversy too. But I can see that the featurer takes much pride in constraining nominations into day-related themes. (Otherwise it probably gets boring.) I wonder if forcing themes for every WOTD also biases against featuring the many nominations that are just "plain" words (adverbs, interjections, case in point, blud, ouster). Anyway, there are definitely more than enough interesting and not politically controversial words to feature, for next time! (The pace of nominations is currently slower than one a day, I wonder how come there was such a large backlog previously?) Hftf (talk) 11:55, 5 May 2025 (UTC)

@Hftf: on your last point, it could be because some editors like to nominate a whole raft of terms at one go. If you look at the list of nominations you’ll see some instances where there are multiple nominations all with the same timestamp. — Sgconlaw (talk) 12:43, 5 May 2025 (UTC)

I'd vote against WOTD for any politically charged words like this. There might be some sufficiently aged as to have lost their ability to inflame. DCDuring (talk) 13:36, 5 May 2025 (UTC)

Strategically not the best feature in view of recent US government investigations against Wikimedia questioning its nonprofit status due to alleged foreign-influenced political propaganda. Israel couldn't care less, but we have to keep the main-page innocent enough for MAGA-hats. Fay Freak (talk) 13:42, 5 May 2025 (UTC)

@Fay Freak I don't think we need to yield to such poorly-motivated political threats, as long as we aren't actually doing anything wrong. Wiktionary should remain free of any government interference and as far as possible should shirk any attempts to censor it, if we care to actually be a free dictionary that embodies the values we say we do. At any rate, I just despise the idea of having to conform to some arbitrary threat like this. The good thing is that Wiktionary wasn't mentioned in that document, as far as I could see, and generally Wiktionary seems to have nearly 0 optics effect compared to Wikipedia, so I'm sure we can get away with something as small as this.

But ultimately we should decide separately whether we actually want Nakba to be featured; I think it's a relevant and topical word, and I don't think the fact that it's controversial should disqualify it from being featured, as long as there is no opinion being promoted by its inclusion. Kiril kovachev (talk・contribs) 16:16, 5 May 2025 (UTC)

Given the current administration's obsession with language ("banned words" etc.) it's probably just a matter of time before Wiktionary too will get in the crosshairs. Better not poke the bear. Jberkel 12:22, 6 May 2025 (UTC)

We should under no circumstances kowtow to proto-fascist chuds and bullies. —Justin (koavf)❤T☮C☺M☯ 22:23, 7 May 2025 (UTC)

I don't really see a problem with this entry being featured. The only strong opinion I have is that I agree with @Kiril kovachev that we shouldn't yield to political threats from the current US administration. We should only prioritize the reader's feelings, not the president's. — BABR・talk 19:04, 7 May 2025 (UTC)

I think it's best to stay away from prominently featuring politically charged terms as WOTD, and the I/P area is about as politically charged as it gets. From what I can tell, Israelis and Palestinians have diametrically opposed views of the 1948 war, and featuring the term "Nakba" on Nakba Day will almost certainly be interpreted as a political statement and attract a lot of unwanted attention. The same arguments were (IMO cogently) made for not featuring "Brexiteer" on Brexit Day, and I think the same issue would come up if, for example, we were to feature the word aliyah on Aliyah Day. This has little or nothing to do with the current US administration and any hypothetical threats they may make, and much more to do with the fact that we are a dictionary, and need to avoid any appearance of bias. Benwing2 (talk) 22:20, 7 May 2025 (UTC)

I think keep it: it's a word that someone may plausibly see written or hear spoken somewhere and may want to know what it means. —Justin (koavf)❤T☮C☺M☯ 22:23, 7 May 2025 (UTC)

If we are to keep it I would strongly argue moving it to a non-"themed" day. Benwing2 (talk) 22:43, 7 May 2025 (UTC)

Yes, agreed with this. Fine to keep it - it's a valid word - but let's put it on some other day. This, that and the other (talk) 23:36, 7 May 2025 (UTC)

┌────────────────────────────────────────────────────────────────────────────────────────────────────┘ To sum up at this point (correct me if I'm wrong):

Editors who suggest that the entry not be featured at all: @DCDuring, Fay Freak, Jberkel.
Editors who suggest that the entry be featured on a date other than 15 May: @Babr, Benwing2, Hftf, Kiril kovachev, Koavf, This, that and the other.

If we are to feature this word on another date, does anyone object to a date in May 2025? — Sgconlaw (talk) 13:54, 9 May 2025 (UTC)

I think it should be on May 15 and capitulating to gross chuds is bad policy. I respect anyone who thinks that we should generally avoid contentious entries on the front page, tho and have no objection to it being on another day for that reason. —Justin (koavf)❤T☮C☺M☯ 13:56, 9 May 2025 (UTC)

@Sgconlaw To clarify my opinion, I would also not object to it indeed being on May 15, and in my opinion that would be the most relevant day to put it – but having it on another day may be less provocative, if that is what we are going for. Kiril kovachev (talk・contribs) 14:01, 9 May 2025 (UTC)

As the emotionality of the discussion above shows: best to avoid contentious political words of all stripe, on all days. 2A00:23C5:FE1C:3701:5CD6:5C00:85E2:3C8A 14:01, 9 May 2025 (UTC)

Thanks. (Just wanted to point out that 16 May is the International Day of Living Together in Peace, but I guess featuring the word one day after Nakba Day would also attract the same concerns expressed above …) — Sgconlaw (talk) 14:03, 9 May 2025 (UTC)

I think it's fine to feature on the scheduled day. I think it'd be weirder to feature it on some random day. (The specter of Trump, invoked above, can and should be ignored. It's been amply demonstrated that people who comply in advance with whatever they think his likes and dislikes are, especially when, as here, he has no power to make them do anything, simply attract him to make more demands, whereas people who keep doing what they're doing succeed.) - -sche (discuss) 21:02, 9 May 2025 (UTC)

@Sgconlaw My vote is that it not be featured at all, but if it is to be featured I'd prefer a date other than May 15, e.g. early June. @-sche Just curious, if we were to feature the word aliyah on Aliyah Day (see w:Yom HaAliyah) and call out (as we tend to do with "themed" words) the fact that this is a celebration of immigration to Israel, would you object? Benwing2 (talk) 21:24, 9 May 2025 (UTC)

@Benwing2: I hazard that the concepts of punching up and punching down are relevant here. There is less objection to an entry that punches up—targets a group that is of greater power or status—than one that punches down. Thus, an entry that highlights an oppressed group is arguably less objectionable than one that highlights an oppressor or aggressor. — Sgconlaw (talk) 21:52, 9 May 2025 (UTC)

@-sche: This applies for Trump personally. If we can get him or one of his cabinet to leave a snarky remark about Wiktionary disseminated in the media, I would of course support putting up something offensive. If insane or random enough, then you win against the madmen. But the concerns of ourself censoring ourselves, I want to highlight the speciosity of this line of argument, are far from material. The cover is not the book. Compliance only concerns self-presentation, not delivered value—mere marketing stunts, which are cap-a-pie interchangeable like a WordPress theme. There is no legal risk or political risk if you get away with it. But I believe that we are too clumsy and amorphous a mass of secluded scholars to manage the message control ruthlessly to our satisfaction, so there is innegligibly a point in unwanted attention. Fay Freak (talk) 16:13, 10 May 2025 (UTC)

┌────────────────────────────────────────────────────────────────────────────────────────────────────┘ Current tally:

Editors who suggest (or do not mind) that the entry be featured on 15 May: @Geographyinitiative, Kiril kovachev, Koavf, -sche, Sgconlaw.
Editors who suggest that the entry be featured on a date other than 15 May: @Babr, Benwing2, Chuck Entz, Hftf, Kiril kovachev, Koavf, Ioaxxere, This, that and the other, Urszag.
Editors who suggest that the entry not be featured at all: @DCDuring, Fay Freak, Jberkel, and the IP formerly known as Equinox.
Editors who suggest featuring both this entry and aliyah on 16 May: @Chuck Entz, Sgconlaw.

~~Seems we are pretty tied at the moment.~~ I guess it's OK if I cast a vote. @Ioaxxere, do you have a view? — Sgconlaw (talk) 21:17, 9 May 2025 (UTC)

@Sgconlaw: I don't want to have this as WOTD, because it looks too much like Wiktionary is trying to promote one side of a VERY vehement dispute. How about having a dual WOTD on May 16, with both Nakba and aliyah, to show two sides of a dispute that constitutes one of the biggest challenges to realizing the goals of that day? Chuck Entz (talk) 21:49, 9 May 2025 (UTC)

@Chuck Entz: I rather like that idea. Is aliyah a suitable word—is it a coordinate term to Nakba, or at least sufficiently related to be featured alongside it? — Sgconlaw (talk) 21:55, 9 May 2025 (UTC)

No Hftf (talk) 22:22, 9 May 2025 (UTC)

@Hftf: that's too terse. What are you responding to? — Sgconlaw (talk) 22:33, 9 May 2025 (UTC)

It's not really a coordinate term (what's the hypernym?) and just kind of weird to make special exceptions because a word is in a controversial topic. Will it set a precedent that featured words related to controversial topics need to balance "camps"? This is a dictionary, it should once a day simply feature a word that is interesting and not particularly controversial, and having themes and attempts at counterpoints only increases controversialness of a feature, such that I'd rather just do neither/none, but I can't say if anyone else feels the way I do. Edit: To clarify (and repeating what I wrote above), I don't have a particular issue with the scheduled feature in the queue as-is; my vote would be to leave as-is along with a resolution to keep in mind/avoid future controversialness in word selection and theme, such as by assigning future controversial words to any day. Hftf (talk) 22:52, 9 May 2025 (UTC)

My vote would also be to leave it as-is; I don't believe I said anything about the day it's featured. The only strong opinion I had was that we shouldn't self-censor because of the POTUS. — BABR・talk 05:19, 10 May 2025 (UTC)

It's hard to say: the aliyah was made possible by the Nakba, but the latter wasn't its goal. They're different kinds of things, but they're inextricably linked. It just shows that history is more complicated than the narratives of either side can explain. Chuck Entz (talk) 22:32, 9 May 2025 (UTC)

@Chuck Entz: right, I get it. I'm OK with going with those terms. Does anyone still have objections? — Sgconlaw (talk) 22:34, 9 May 2025 (UTC)

There was no single aliyah: there were multiple waves of Jewish immigration to the Holy Land. —Justin (koavf)❤T☮C☺M☯ 22:56, 9 May 2025 (UTC)

I like this option less than any of the others. Having 'topical' WOTDs is OK I guess, but I think we should avoid making it seem like we're taking a stance on the referent of the word itself. If there's too much risk of that, it seems better to use another word or day. Trying to achieve balance by including words for "both sides" of an issue on one day seems to just further establish a politicized tone to the WOTD (bothsidesism is not apolitical: it is also a political position). I agree with what Benwing's said.--Urszag (talk) 22:53, 9 May 2025 (UTC)

There should only be a single word on a single day. I think it is more evil to shy away from a word due to politics than to just go ahead with it. I would support all words, I would support pushing boundaries. Wiktionary is a more beautiful and long lasting thing than this temporary wicked era. Look forward to seeing my comment in evidence in court when we fight for WMF's freedom rofl sup Judge, please support freedom of expression. Geographyinitiative (talk) 23:06, 9 May 2025 (UTC)

@Geographyinitiative: we have previously featured more than one term as WOTDs on a single day, for example, when we had an anagram theme. However, this is very much an exception for the simple fact that it takes twice as long to set the WOTDs—it's a lot of work. — Sgconlaw (talk) 15:09, 10 May 2025 (UTC)

@Sgconlaw: On balance, I would support option #2, since really the political edge seems to come from the description of Today is Nakba Day, which commemorates and protests the Nakba. Trying to cut out any word referring to a historical event shies too close to self-censorship in my opinion. Ioaxxere (talk) 01:05, 10 May 2025 (UTC)

Updating the tallies, I think that on balance most editors do not mind the term being featured (and feel that to deliberately exclude it would be self-censorship and cowing to bullies), but agree that it should not be on Nakba Day itself. There isn't much support for @Chuck Entz's suggestion of featuring both Nakba and aliyah on 16 May 2025. Thus, I'm going to shift Nakba to a less contentious date. What about (1) 15 November (anniversary of Palestine's declaration of independence in 1988); (2) 29 November (anniversary of the date when Palestine was given observer status by the UN General Assembly in 2012); or just (3) 31 May (no particular commemorative date)? — Sgconlaw (talk) 15:09, 10 May 2025 (UTC)

I agree with Benwing's "I would strongly argue moving it to a non-"themed" day." (1) and (2) are still themed days, so I vote for (3).--Urszag (talk) 16:53, 10 May 2025 (UTC)

@Sgconlaw I agree with Urszag, of course, and would vote for (3). Benwing2 (talk) 10:53, 11 May 2025 (UTC)

┌────────────────────────────────────────────────────────────────────────────────────────────────────┘ @Babr, Benwing2, Chuck Entz, DCDuring, Fay Freak, Geographyinitiative, Hftf, Jberkel, Kiril kovachev, Koavf, Ioaxxere, This, that and the other, Urszag: OK, thanks to everyone for their input. I will set Nakba as WOTD on 31 May 2025. — Sgconlaw (talk) 21:11, 22 May 2025 (UTC)

Famous people bringing attention to Wiktionary

There is a joke from the previous-to-last Reich about a fabricated German word repunsieren. A small number of blokes recline in the boozing ken when one of them designed that he shall be part of history, word history precisely, by not waiting till next day to introduce the verb, in this exact spelling, to general parlance, so he asked the buffetière whether he could do it, whereupon she was grossly offended, however in turn the alemaster—anxious as every neurotypical about the adherence to social conventions, though their origin be buried forever—had to assure his guests that of course they may, by which the first follower was gained. Soon afterhand one could read on signs in public whether said action towards the personnel was admitted, &c.

So who will get the Pope to cite Wiktionary? We are already repeat guests in DOI-literature, but hitherto only worldly and utmostly peripheral one. Fay Freak (talk) 16:49, 10 May 2025 (UTC)

Enabling Dark mode for logged-out users

Hello Wikimedians,

Apologies, as this message is not written in your native language. Please help translate to your language.

The Wikimedia Foundation Web team will be enabling dark mode in this Wiki by 15th May 2025 now that pages have passed our checks for accessibility and other quality checks. Congratulations!

The plan to enable is made possible by the diligent work of editors and other technical contributors in your community who ensured that templates, gadgets, and other parts of pages can be accessible in dark mode. Thank you all for making dark mode available for everybody!

For context, the Web team has concluded work on dark mode. If, on some wikis, the option is not yet available for logged-out users, this is likely because many pages do not yet display well in dark mode. As communities make progress on this work, we enable this feature on additional wikis once per month.

If you notice any issues after enabling dark mode, please create a page: Reading/Web/Accessibility for reading/Reporting/xx.wikipedia.org in MediaWiki (like these pages), and report the issue in the created page.

Thank you!

On behalf of the Wikimedia Foundation Web team.

UOzurumba (WMF) 00:08, 7 May 2025 (UTC)

Street names

(See RFD.) It feels weird to have things like A2 paper size and credit rating but not (as I understand our current CFI) A4, M4 or I-5 (roads). For that matter, I see some lexical value in entries for other (more name-ly named) roads: pronunciation; some have interesting etymologies; a few have translations.
It gives me pause that there are a lot of roads—then again, there are also a lot of personal names: it too is an open-ended class—but it feels weird to have the tiniest villages and rarest personal names, but not even the most prominent roads ... particularly since in some cases (e.g. "she lived near Foobar") it's unclear to "someone run across it and want to know what it means" whether a given thing is a hamlet, neighbourhood, name or other entry that they can expect to look up here, or a road we exclude. So I'm wondering if there's any appetite for changing CFI to allow at least some (or any attested—the same criterion as given names) roads.
(One idea: only allow road names that aren't just applications of personal- or place- names to roads, so no entry for Washington Street because we already have Washington the name.) - -sche (discuss) 01:03, 7 May 2025 (UTC)

Roads with non-literal translations can be translation hubs, and the most prominent roads generally have figurative senses or connotations beyond just their literal meaning. Can you give some specific examples of roads you'd like included but which you think are currently excluded by CFI? If I think of Austin, for example, the "interesting roads" are the two main freeways, I-35 (which is formulaically named, so maybe not so interesting) and MoPac Expressway (technically "Texas Loop 1"; the name "MoPac" refers to the Missouri Pacific railroad and could be its own entry as a sort of abbreviation), as well as some streets with weird pronunciations: Manor /ˈmeɪn.ər/, Manchaca /ˈmæn.ʃæk/ (recently renamed to Menchaca, supposedly closer to the original form, but the pronunciation hasn't changed), Burnet /ˈbɚn.ɪt/, Guadalupe /ˈgwɑd.ə.lup/, Brazos /ˈbræz.əs/; but all are the names of nearby towns or rivers, with the same pronunciation. I'm also concerned that unless we formulate a very narrow exception for roads, we'll be inundated with streets and roads from Joe Schmoe editor's home town. Benwing2 (talk) 06:12, 7 May 2025 (UTC)

Drill commands Category:

Should a drill commands category be created under military category? So far 4 languages have this type of category.

https://en.wiktionary.org/w/index.php?search=Category%3ADrill+commands&title=Special%3ASearch&ns0=1 𝄽 ysrael214 (talk) 13:34, 7 May 2025 (UTC)

And if not, please delete tl:Drill commands category and its links to the pages. Thanks. 𝄽 ysrael214 (talk) 21:05, 7 May 2025 (UTC)

Change "Negerhollands" to "Virgin Islands Dutch Creole"

Would it be possible to change the language name for ISO 639-3: dcr from Negerhollands to Virgin Islands Dutch Creole? The former term is rather contentious, as it contains the Dutch N-word. The latter is also just a better description of the language and is the term used by Glottolog. 92.254.93.189 19:07, 7 May 2025 (UTC)

Moved from Wiktionary:Feedback for discussion. —Justin (koavf)❤T☮C☺M☯ 19:10, 7 May 2025 (UTC)

Wiktionary and Wikipedia's entry on neger say it's not the "Dutch N-word" (for which a different word is used), but more like the English word Negro; yet it is still increasingly charged. The main argument against the change, as far as I can see, is potential confusion with Virgin Islands Creole (ISO 639-3 code vic), which is not the same thing and is an English-based rather than Dutch-based creole. Possibly for this reason, Wikipedia still uses the term Negerhollands. But on balance I think this rename makes sense. Benwing2 (talk) 22:37, 7 May 2025 (UTC)

Inconsistencies between Persian and Tajik transliterations

(Notifying Atitarev, Benwing2, Rodrigo5260, Saranamd, SinaSabet28, Samiollah1357): also @Light hearted sam, and @स्वर्गसुख (who's recently been editing Tajik)

There are two major inconsistencies between Persian and Tajik transliterations (that don't represent pronunciation differences) that I'd like to iron out. I'd like to propose two changes for more cohesion:

1. Change Persian ğ > ġ to match Tajik, and other Arabic script languages like Urdu.

2. Change Tajik ʾ > ' to match Persian. Using ʾ in the first place is a bit weird, as it implies a distinction from ʿ, which Tajik does not even have a letter for. — BABR・talk 21:10, 7 May 2025 (UTC)

I would prefer to change Tajik ġ to ğ, but I agree with your other proposal. Rodrigo5260 (talk) 21:27, 7 May 2025 (UTC)

well ġ/ḡ are much more common ways of transliterating غ and ğ isn't really common outside of Turkic languages. — BABR・talk 22:06, 7 May 2025 (UTC)

I also agree with the proposal generally. One could also consider ɣ instead of ġ, which is used in some sources. Samiollah1357 (talk) 22:11, 7 May 2025 (UTC)

I think it's best to stick to modified versions of letters used in English, plus ɣ is already in IPA. One may consider all Arabic script languages adopting ḵ for خ and ḡ for غ, which I would like because it would match the common romanizations of 'kh' and 'gh', and would have more cohesion (for letters that are pronounced the same), though I'm not sure there would be support for such a thing. Plus, as many Indic languages use multiple scripts and try to match romanizations, we would have to bring in Indo-Aryan editors into the discussion so they can have cohesion and... it's probably not worth it. — BABR・talk 18:42, 8 May 2025 (UTC)

I was wanting to propose the ḵ over the current x we have for خ since x could be confusing for someone not familiar with the transliterations here. Light hearted sam (talk) 08:02, 9 May 2025 (UTC)

@Light hearted sam well, if you'd like to propose that, I would support using both ḵ and ḡ in our romanization (as a pair), but not ḵ alone. (ḵ and ġ feels weird) — BABR・talk 12:56, 9 May 2025 (UTC)

Support. Also ğ is confusable with ǧ, which is used in some Arabic transcription systems to represent /dʒ/. Benwing2 (talk) 22:39, 7 May 2025 (UTC)

Support. Please also restore the transliteration of ع word-initially as ' as it has been for ever, even if Tajik doesn't use a letter for it. Anatoli T. ^{(обсудить}/^вклад) 22:59, 7 May 2025 (UTC)

Support. Seems good. Light hearted sam (talk) 08:59, 8 May 2025 (UTC)

Support. स्वर्गसुख (talk) 17:58, 8 May 2025 (UTC)

Transliteration of Initial Ayn

(Top comment repeated for clarity)

Support. Please also restore the transliteration of ع word-initially as ' as it has been for ever, even if Tajik doesn't use a letter for it. Anatoli T. ^{(обсудить}/^вклад) 22:59, 7 May 2025 (UTC)

@Atitarev the change with ع that you're referring to was proposed by Saranamd on Discord, not me. I supported the proposal, but perhaps it was unfair to have the discussion on Discord and not on-wiki. — BABR・talk 23:40, 7 May 2025 (UTC)

@Babr: Yes, thanks. That's what I meant. We should have a discussion and agreement here first. Pinging @Saranamd as well. Anatoli T. ^{(обсудить}/^вклад) 23:50, 7 May 2025 (UTC)

For the context, the initial ع is not transliterated in:

Classical Persian: (ā'ila) (also Dari)
Iranian Persian: (â'ele)

Arabic: عَائِلَة (ʕāʔila)

Urdu: عائِلَہ ('āila) Anatoli T. ^{(обсудить}/^вклад) 23:57, 7 May 2025 (UTC)

Yes, I would also like to add that I support Saranamd's proposal to remove it, because transliterating initial ع implies a pronunciation difference between ع and consonantal alif/alef (ا), when in reality, they are both representing glottal stops. — BABR・talk 00:02, 8 May 2025 (UTC)

@Babr: A plain ا doesn’t carry any written consonant, unless it has a hamza, above or below, as in أ or إ. So, the glottal stop is not spelled, unlike the case with ع. Anatoli T. ^{(обсудить}/^вклад) 03:56, 8 May 2025 (UTC)

@Atitarev What you're saying applies to Arabic, not Persian. In Persian, hamza cannot appear at the beginning of a word (and thus is never seated on an initial alif), because an initial alif is always a glottal stop. — BABR・talk 04:17, 8 May 2025 (UTC)

@Babr: There is no difference. What you're saying is that the glottal stop consonant is never written in Persian when ا is used. If it's not written, we don't transliterate it. The letter alef/alif is not a consonant, it has a special purpose.

There is no problem of NOT writing ' in the transliteration of Classical Persian (alif) or Iranian Persian (alef).

The first letter in Arabic words اِسْم (ism) and إِسْرَاع (ʔisrāʕ) are pronounced the same way /ʔi-/ but are transliterated differently dependent on how they are spelled. The symbol "ʔ" is used for the hamza, not for the alif and that's the same way for Persian or Urdu, even if hamzated alifs are not used often, especially or never word-initially. Anatoli T. ^{(обсудить}/^вклад) 05:51, 8 May 2025 (UTC)

@Atitarev No dialect of Persian (except Judeo-Tat if we consider that a dialect) has ever made a distinction between initial ا, أ, and ع, and we know this to have been the case from medieval sources. All three have always been pronounced as non-phonemic glottal stop. How it works in Arabic (which in any case is not simply a spelling artifact, because اِسْم (ism) and إِسْرَاع (ʔisrāʕ) are phonemically different beyond just the spelling in the way they interact with other morphemes in a way that is not true in Persian) is not relevant for Persian.--Saranamd (talk) 06:23, 8 May 2025 (UTC)

@Saranamd thank you for explaining it better than I could. — BABR・talk 06:32, 8 May 2025 (UTC)

In what world, nation or dialect is ا a consonant by grammarians? There is no equality between letters ع and ا.

@Saranamd, @Babr. I can sense stubbornness, unwillingness to undo. You haven't brought any good arguments. ع may only represent a glottal stop or nothing but should not be ignored in transliterations. It's just not right. Tajiks dropped it word-initially because of the pronunciation but that change is reflected in the spelling.

Is it because you're just trying to match Tajik оила (oyila) with Classical Persian (ā'ila)? Well, you "achieved" it by dropping the essential letter for users not familiar with the Perso-Arabic script!

Each standard per Romanization of Persian uses a symbol for letter ع and so did we, until the change. Anatoli T. ^{(обсудить}/^вклад) 06:50, 8 May 2025 (UTC)

@Atitarev

"In what world, nation or dialect is ا a consonant by grammarians?"

In the Persian alphabet, only consonants can carry a vowel diacritic, a vowel cannot carry a vowel. By all definitions alif may act as a zero consonant or a vowel, depending on whether it is acting as the syllable onset or the nucleus.

"There is no equality between letters ع and ا."

They are both consonants representing a non-phonemic glottal stop. If initial alif is a zero-consonant, and initial ayn is pronounced the same, then they are both zero consonants in the initial position.

"Well, you "achieved" it by dropping the essential letter for users not familiar with the Perso-Arabic script!"

A wise man once said "literal transliteration is not very useful" + many 'essential letters' are not distinguished, including ذ ز ض ظ, which are all transliterated as 'z'. If we were aiming for that, then every letter would be distinguished, which isn't the case for most Arabic script languages on Wiktionary. — BABR・talk 07:19, 8 May 2025 (UTC)

@Atitarev I agree with Babr and do not see why spelling-only features not part of the spoken language should be reflected in the transliteration. This is not done for Urdu or Ottoman Turkish, or indeed for Persian with other homophonous letters that are an artifact of Arabic spelling.--Saranamd (talk) 07:27, 8 May 2025 (UTC)

Historically a Persian/Ottoman/Urdu transliteration scheme that had dedicated letters for each Arabic-script letter was indeed common (and remains common in academia), but my understanding is that this was traditionally (1) due to historical issues with printing Arabic script in an otherwise European-language text, (2) to the benefit of scholars unable to read Arabic script, and (3) in some academic contexts, in order to ensure consistency with Arabic. Since we now provide the Arabic script prominently in all entries and our purposes are to document the current language, none of these issues seem particularly relevant. Saranamd (talk) 07:29, 8 May 2025 (UTC)

Animals as Foods in Maltese

In Maltese most animal meats have the same name as the animal, such as baqra, tiġieġ, fenek, ħut and others, just like in English, though for clarity one can say laħam tal-fenek etc.

What should be done to add these to the mt:Foods category since the animals themselves aren't food? Should we make new entries such as laħam tal-fenek or add a new definition like in English entries stating The meat from this animal? Melithius (talk) 00:27, 9 May 2025 (UTC)

How about adding them to Category:mt:Meats? Whether or not you have separate definitions for the animal vs. the food, chicken as a food is a type of meat.

By the way, to partly answer a question you asked elsewhere: English is unusual in having words for meats that are completely different from the words for the animals they come from. That's because at one time the peasants who raised the animals spoke English, but the paying customers (and landlords who were paid with what the peasants produced) spoke French. Basically, if it was too big to hand to someone as a recognizable animal, they had to tell them what kind of meat it was in the recipient's language: the meat of a cow (from Old English cū) was beef (from Old French buef); sheep (from Old English sċēp), mutton (from Old French mouton), etc. (it's more complicated than that, but you get the idea). Chuck Entz (talk) 03:53, 9 May 2025 (UTC)

Thanks for answering that question, very interesting!

Right, but if I add them to mt:Meats, can I not as well add them to mt:Foods? That's what I did initially but I was told ‘the animal itself is not food but its meat is’ so that's why I suggested a sense for meat. English chicken and rabbit have this as sense 2: ‘meat from this animal’, so I suggested we do the same and add another sense.

I still think they should just be added to mt:Foods as most of the time the meat and the animal share exactly the same name.

So do you think they should just be added to the categories as is? Melithius (talk) 07:39, 9 May 2025 (UTC)

We categorize the cultural understanding, right? German Meerschweinchen (“guinea-pig”) is raised by some—not outlawed like Hund (“dog”) and Katze (“cat”), don't ask me for the constitutionality—for meat, but most 18-year-olds (soon Schulabgänger) are not informed about this, so only German Schwein (“swine, hog, pig, also pork”) gets added. Neither is Känguru (“kangaroo”) even though most speakers have only immediately seen it in the restaurant, where it was my favourite. In Russian you can use свинья́ (svinʹjá, “swine, pig, hog”) in a meat meaning, because the homo sovieticus is a sore coarse person, but the proper way is to say свини́на (svinína, “pork”), in the same fashion derived terms are used for the other common meats. On a quick glance people do it intuitively right, though I admit that I aligned my lexicographic reasoning with expected intuition. Fay Freak (talk) 11:24, 9 May 2025 (UTC)

So what do you think should be done for Maltese if everyone here understands that, for example, 'tiġieġ' is chicken meat (assuming context is given)? Melithius (talk) 11:30, 9 May 2025 (UTC)

Add it as one of the Category:mt:Meats, as I think that few would oppose adding Schwein to Category:de:Meats, where it currently does not reside. I refer to @Fenakhay for confirmation of my belief on the Maltese side, though I wonder about the criteria for doing otherwise. I don't see a rule like “the term should specifically mean a meat and not the animal”.

There is another argument, of supporting vocabulary learning, one of our primary purposes as a bilingual dictionary; when I started Arabic, as well as any other language, I specifically made note of common grains as you should have staple-foods in your vocabulary and so meats are a separate list, where you don't need to have the term for dog or dog meat unless you learn rural Korean; the ones of animals and vegetables are longer lists, but you talk about fauna and flora in different contexts. Fay Freak (talk) 12:27, 9 May 2025 (UTC)

In English, sentences like he is eating chicken can contrast with sentences like he is eating a chicken or ...is eating chickens, and since we consider the lemma in all three cases to be chicken (not e.g. *a chicken), this helps to show that the animal and meat senses are distinct. On the face of it, it seems reasonable to also have distinct definitions for the corresponding senses in Maltese, as you suggest. (If anyone thinks the senses should not be distinguished, perhaps they can articulate why.) - -sche (discuss) 20:41, 9 May 2025 (UTC)

Ah okay thanks. There are similar grammatical distinctions in Maltese using the article il- Melithius (talk) 20:50, 9 May 2025 (UTC)

Template:bg-conj

What a monster of a template! More than two screenfuls are dedicated to instructing the reader how to form compound tenses. This isn't even the same scenario as German or French, where verbs vary in terms of which auxiliary they use (avoir/haben vs être/sein) - it seems like a lot of the Bulgarian table's content is just hard-coded directly into the Lua module; only the past participle varies.

What's worse, some of the instructions are pretty intricate:

Use the present indicative tense of съм (leave it out in third person) and гово́рил/говори́л¹ m, гово́рила/говори́ла¹ f, гово́рило/говори́ло¹ n, or гово́рили/говори́ли¹ pl

I do not think there is any value in repeating these lines of grammatical textbook content in every single Bulgarian verb table. If you know Bulgarian grammar, it is a waste of space for you. If you don't know Bulgarian grammar, it seems to me that it is presented in too abbreviated a manner to actually be useful - you have to consult other entries anyway to construct the forms.

Our readers of Bulgarian entries would be much better served by presenting only the "atomic" forms in the verb table, and moving the instructive material to a dedicated Appendix:Bulgarian verbs.

It's worth also comparing the verb table for closely related Macedonian. The verb morphology of these two lects seems to be very similar, yet {{mk-conj-table}} (e.g. at стрела (strela)) presents the compound tenses in a much more reasonable (imo) format.

Pinging @Benwing2, @Atitarev for input. This, that and the other (talk) 12:59, 9 May 2025 (UTC)

@This, that and the other I haven't had any issue with this template over the years, but now that you mention it, there is quite a lot in there :) In my opinion, it is nice to have virtually all of the possible constructions mentioned in the table, because then it at least shows to readers that they all exist – but, you might be right that a change may be in order.

Maybe we can ask any known Bulgarian readers, to see whether they would find the table better without the grammatical instructions, or whether they've found them useful. Pinging @SimonWikt, what do you think about this?

Macedonian is definitely a very good format though. Very compact and to-the-point.

By analogy to Macedonian, I think I might make the following hypothetical suggestions to the Bulgarian format:

The imperatives can be moved after the base verb forms like the indicative, etc.;
The participles might be better to go after the imperative;
We can potentially give (actual examples of) the constructions of the compound tenses etc., like Macedonian seems to do as well, in the places which currently consist of instructions.

Also, I seem to remember a template with a name similar to {{bg-conj-full}}, though I don't quite recall what it's called; as far as I remember, it tried to fully expand the instructions into their actual forms and was generally not used on entries, probably because it was so large. I'm not sure whether that would be helpful to reference here as well. Kiril kovachev (talk・contribs) 14:20, 9 May 2025 (UTC)

As a learner of the Bulgarian language and not yet fully conversant with the grammar, I have found the conjugation tables with the instructions very useful; it is helpful to not have to go to other pages or appendices.

I would however agree that a change of layout would be helpful:

Imperative and possibly conditional after the indicatives
Participles at the end.

SimonWikt (talk) 16:31, 9 May 2025 (UTC)

I don't necessarily have a problem with having tons and tons of forms but having them be collapsible and collapsed by default could be a good solution. Vininn126 (talk) 17:03, 9 May 2025 (UTC)

@Kiril kovachev See съм (sǎm) for an example with the "full" table. It uses e.g. {{bg-conj|съм<irreg.impf.intr>|full=1}}. @This, that and the other Bulgarian verbs are very unlike any other Slavic language except maybe Macedonian (although I think Macedonian simplifies the verb system somewhat compared with Bulgarian). Languages like Russian drastically reduced the Proto-Slavic verb system; OTOH Bulgarian not only kept all the original complexity but (if I recall aright) added a distinction between aorist and imperfect l-participle, and much more significantly, innovated a 4-way evidential distinction that cross-cuts all other categories, essentially quadrupling the number of possible forms. The way to form all the distinct evidential categories is through various periphrastic constructions, but the mapping between category and construction is not very simple or easy to remember, hence the table that spells out the way to form each possible tense/aspect/person/number/evidential combination. As you can see by comparing съм and говоря, there are two styles, the "compressed" one used on most verbs and a "full" one used on only a few verbs. The structure of the verb tables was already in place when I rewrote them in Lua; I didn't change that. We could definitely make the tables look more like the Macedonian ones but I still think they would be bigger, as (AFAIK) Bulgarian has more distinctions than Macedonian (also Bulgarian has free stress while Macedonian has fixed stress on the antepenultimate, and some verbs have more than one possible stress, as in the example with говоря that you quote). Benwing2 (talk) 19:19, 9 May 2025 (UTC)

Thanks for the input; this is incredibly insightful and I'm grateful for the constructive ideas!

It seems like there is some agreement on what can be done to improve the template. As much as the purist in me would like to get rid of the compound tenses altogether (especially the perfect tenses, as their construction seems totally predictable), it seems there is value in keeping them.

Keeping these tenses means the table will remain long. It would be possible to make the compound tenses hidden (collapsed) by default within the template, as Vininn suggested.

I'd like to propose a couple of ideas for {{bg-conj}}:

Keep the template largely as is (noting that it is wider than it needs to be and this cannot be easily fixed in its current form). Use Module:roa-verb/style.css to get dark mode colors. Move imperative up and participles down, as suggested by Kiril and Simon. Maybe un-bold the instructions to reduce loudness.
or
Rewrite the template along the lines of the Macedonian template. Rather than shouting instructions at the reader, we give them an example of how to form each compound tense, alongside a concise explanation of the rule. (Having mocked this up, I feel this is much more useful than the instructions alone.) I've made a mockup of this option at User:This, that and the other/bg-conj - very much open to input on the layout and formatting.

Thoughts on this? @Benwing2, Vininn126, SimonWikt, Kiril kovachev This, that and the other (talk) 11:54, 10 May 2025 (UTC)

@This, that and the other Your mockup looks good to me. Can you complete it with the remaining evidential/tense/aspect/mood combinations and also sketch out a "full" one (like what currently is in use for съм and ща, and maybe should also be used for имам)? If you do this, I should be able to modify Module:bg-verb to follow the new table format. Benwing2 (talk) 10:52, 11 May 2025 (UTC)

@This, that and the other Wow, this is really cool. Sorry I didn't respond for 5 days – looking at this now though, I like your new idea a ton. If you'd like I can try to fill this in tomorrow with the remaining forms so we can see what it would look like fully filled-in? (If you are short on time of course. Otherwise I wouldn't want to interfere with it if you wanted to do it yourself.) Kiril kovachev (talk・contribs) 23:35, 15 May 2025 (UTC)

@Kiril kovachev Just speaking for myself (and not for TTO), if you can do that, it would be great. Definitely, the new structure is better than the old. Benwing2 (talk) 23:43, 15 May 2025 (UTC)

@Benwing2 Sorry for my delay in this — I've been dealing with some tough operating system problems, but they appear to be resolved now :) I have finished the indicative forms' part, but I underestimated how much pasting around there would be. Hopefully I'll be able to get to it by tomorrow, however. Kiril kovachev (talk・contribs) 22:13, 18 May 2025 (UTC)

@Benwing2 I added the forms of the participles and the renarrative now – right now I'm struggling to summon the table markup to represent the dubitative, conclusive, and conditional, though – did you intend for us to have those be part of this preview as well? I can try to figure that out if so.

Additionally, I am finding the alternative-stress aorists to perhaps be a little noisy to include in every single box, though that's just my opinion – I find seeing those in all of the boxes where the aorist participle is required might perhaps be able to be abbreviated, since we're instructing users to use that participle in those constructions anyway, and it's already documented in the participles section as well? It might just help to alleviate some of the extra crowdedness and duplication. Although really it's not all that big of a deal, I feel like it might be an improvement anyhow. Kiril kovachev (talk・contribs) 22:42, 19 May 2025 (UTC)

@Kiril kovachev Thanks for doing this. If you can add abbreviated versions of the dubitative, conclusive, and conditional, that should be enough; you don't have to fill in the actual forms and if it's too much hassle to lay out all the rows, maybe just lay out the header and one or two rows so I have an idea of how the color scheme etc. should look. As for abbreviating the alternative-stress aorists, I'm not opposed to that; I would suggest maybe filling out the alternative stresses in the first two rows of each section and putting , ... after all the rest to indicate that the alternative stresses apply to them too. Benwing2 (talk) 23:17, 19 May 2025 (UTC)

@Benwing2 How is this? I hope I got them all right, I must admit after a while I partly lost track of which participle and which mood of which tense formed which compound tense...😅 Also, I was not sure about the positioning of the conditional, so perhaps it could go at the bottom or before the renarrative. And, the renarrative color is changed from the current template's, primarily because I couldn't guess what inflection-table style should be given to it to give the original :) Once again, I apologise for making you wait; it seems I ought not to have suggested I had so much time available.

As final observations: should there not perhaps be negatives of the future perfect/future-perfect-in-the-past forms? I included one for the dubitative, just as a test of that, but I don't know whether you think it has a grammatical purpose to be there, or whether I've perhaps conjugated it incorrectly. Finally, for the conditional, I was finding the markup hard to wrangle, so I've not been able to remove the "future" label to the left, which I feel could perhaps be removed. Would you know how to do this?

Thanks again, Kiril kovachev (talk・contribs) 22:40, 26 May 2025 (UTC)

@Kiril kovachev Thanks! I fixed a few issues and did some restructuring. In particular I separated out the 1st/2nd person from the 3rd person in the places where it's constructed differently, and made sure to include the alternative-stressed aorist participle everywhere (since it's the only thing that distinguishes the aorist participle from the imperfect participle in this verb, which in turn is the only thing distinguishing e.g. the present/imperfect from the aorist renarrative mood), but I included it in a shortened fashion by just putting the alternative stressed form in parens after the regular stressed form. I also added fem/neut endings everywhere where appropriate, which I feel are important to note, both because it's not obvious for a learner which forms vary by gender and which ones don't, and the word бил is ending-stressed while the word щял is stem-stressed. (It would be nice to show the fact that щял in the plural is ще́ли, but that probably would end up taking too much space.) I also added the negative of all the future (perfect) (in the past) forms since it's not simply не + positive, and removed the negative conditional, which *is* just formed this way. I wasn't sure how to fill out the negative future forms, maybe you can do that. Also the directions in some places say e.g. present/imperfect of съм + declined past active imperfect participle when the actual order appears reversed; can you correct the directions to agree with the examples? I think once we make these tweaks we should be ready to implement it. Benwing2 (talk) 08:21, 27 May 2025 (UTC)

I made the above changes and also added a footnote for all forms that are conjugated for person and number. Let me know how it looks now. Benwing2 (talk) 21:40, 27 May 2025 (UTC)

@Benwing2 Awesome, thank you! Ever-so-sorry that I was not able to look at this in anything like a timely fashion, and thanks for going to the effort of making the fixes yourself. I think it looks very good. Should we perhaps write something in the conditional's left box, e.g. simply "conditional", just to fill the blank?

I also like the addition of footnote #2 for marking conjugations. I think that seems very efficient.

I apologise for asking now, but did you already take care of the part with the negative future forms? Additionally, sorry for the original mistakes, and thanks for weeding them out! Kiril kovachev (talk・contribs) 21:40, 7 June 2025 (UTC)

I agree and added "conditional" in the left box of the conditional forms. I did fix up the negative future forms previously. If you like the way it looks now, I'll (at some point soon, hopefully) get to fixing the module to reflect this. Benwing2 (talk) 21:48, 7 June 2025 (UTC)

@Benwing2 Oh, thank you – it looks good to me! @SimonWikt, @This, that and the other, what did you guys think about the changes? Kiril kovachev (talk・contribs) 23:28, 8 June 2025 (UTC)

New Persian transliteration proposal

(Notifying Atitarev, Babr, Benwing2, Rodrigo5260, Saranamd, SinaSabet28, Samiollah1357): and @स्वर्गसुख

Concluding the above thread where these suggestions were made:

Transliterating initial ع: (Mostly) rejected
Changing Tajik ʾ to ': Agreed
Changing Persian ğ to ġ, matching Tajik: Agreed

On the last point, while I originally agreed, I thought about changing x (خ) to ḵ since it would match Arabic and avoid any confusion for those not familiar with the transliteration system. And to make it consistent (+ matching Arabic again), change ğ to ḡ instead of ġ. So the new system would be:

Persian غ and Tajik Ғ (+ fa-ira ق): From ğ/ġ/ğ to ḡ
Persian خ and Tajik Х: From x to ḵ

Light hearted sam (talk) 19:31, 9 May 2025 (UTC)
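For illustration only, the proposed change above amounts to a symbol-for-symbol substitution in existing transliterations. Here is a minimal Python sketch of that substitution, assuming the transliterations are plain Unicode strings; the function name, the table name, and the sample strings are hypothetical and not taken from any Wiktionary module.

```python
import unicodedata

# Hypothetical table for the proposal above: old symbol -> proposed symbol.
# Upper-case forms are included on the assumption that capitalised
# transliterations should be migrated the same way.
PROPOSED_SYMBOLS = str.maketrans({
    "ğ": "ḡ",  # Persian غ (and fa-ira ق)
    "ġ": "ḡ",  # Tajik Ғ
    "x": "ḵ",  # Persian خ and Tajik Х
    "Ğ": "Ḡ",
    "Ġ": "Ḡ",
    "X": "Ḵ",
})


def migrate_translit(text: str) -> str:
    """Rewrite an old-style transliteration using the proposed symbols."""
    # Compose first, so that "g" + combining breve is seen as the single letter ğ.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(PROPOSED_SYMBOLS)


print(migrate_translit("xāna"))
print(migrate_translit("bāğ"))
```

Because every affected symbol is a single character, a one-pass character table is enough; nothing here depends on the surrounding letters. Under the alternative where x is kept and only ğ becomes ġ, the same approach would apply with a smaller table.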

Support, but only on the condition that they both match, so I'd like ḡ and ḵ, but I'd be opposed to, say, ġ and ḵ. Also, the fact that ḡ and ḵ match the common romanizations of gh and kh is a plus for me. Plus it would kinda match transcribing ENP ڤ and ذ as ḇ and ḏ? But honestly, I think x and ġ are already an improvement, so even though I slightly prefer ḡ and ḵ, I'd honestly be content either way. — BABR・talk 19:57, 9 May 2025 (UTC)

As for the transliteration of ع, it makes sense only when transcribing the dialectal pronunciations of places like Nishapur or Kulab, where /ʕ/ exists as a sound. E.g. (ʿasb) for the dialectal form of (asb). Samiollah1357 (talk) 20:41, 9 May 2025 (UTC)

It is only x in Iranistik, never ḵ. Don't forget that Persian is an Iranian language, not an Arabic dialect. Vahag (talk) 20:51, 9 May 2025 (UTC)

@Vahagn Petrosyan What? Many Persian dictionaries use ḵ/k͟h and ḡ/g͟h (in fact, I see those more frequently than x and ğ), and they match the much more common lax transliterations of kh and gh. And I'm not sure why usage by Iranists matters, but the largest Iranistik encyclopedia (Iranica) exclusively uses ḵ and kh, never x. — BABR・talk 21:00, 9 May 2025 (UTC)

I meant historical linguistics works, not synchronic dictionaries. Vahag (talk) 21:17, 9 May 2025 (UTC)

Linguists and lexicographers of modern Persian overwhelmingly use a variation of kh in their transcriptions (as do the literal governments of Iran, Afghanistan, and Tajikistan). I don't agree with the notion that the practices of Iranic historical linguists trump the practices of literally everyone else. — BABR・talk 21:55, 9 May 2025 (UTC)

Oh but to clarify: I'm only saying that I disagree with the idea that the usage of ḵ is outlandish (it's very common), or that we must use x because certain linguists do. I'm not trying to imply that we must use ḵ or anything, lol. — BABR・talk 23:27, 9 May 2025 (UTC)

The transcription or transliteration of Persian should match Old Persian, the other modern and historical Iranian languages, and Proto-Iranian as *ráwxšnaH. Hence introducing ḵ is unacceptably odd. In addition we get a lot of Iran-stans who only know fictitious Iranistics including Goths in Northwestern Iran and are unbrokenly on the rag for making Persian appear to depend on Arabic.

In Arabic then, ḵ is an anomaly specific to English and we already discussed possibly dropping it, perhaps for x, otherwise for ḫ.

None of the representations of this phoneme is reliably understood, including the digraph by reason of its ambiguity, if you discount any educated understanding, so all arguments for popularity are specious. Even though we do not merely target full-time academics, we have to assume some familiarity with professional standards for international presentation of groups of languages for efficient use of the dictionary.

I agree with changing Persian ğ to ġ. Fay Freak (talk) 01:42, 10 May 2025 (UTC)

@Fay Freak I do think that not wanting a random cutoff from x -> ḵ into modern Persian is fair. Though I don't think 'kh' as a digraph is odd, it's by analogy to English 'th': t = /t/ and th = /θ/, the fricative equivalent. By the same logic, kh being /x/ makes sense. The underline is similar in concept, just less confusing because it's clear it's not a cluster of /k/ + /h/. — BABR・talk 02:24, 10 May 2025 (UTC)

@Fay Freak To clarify, I wasn't trying to make Persian "depend" on Arabic (how would a change of translit do that?), and you do have a point. (But still, having all sounds with no letter in English be a variation of another letter would be a plus). I'll roll with whatever consensus is reached. Light hearted sam (talk) 09:03, 10 May 2025 (UTC)

I agree with @Fay Freak and @Vahagn Petrosyan. The lax transliteration using the digraph <kh> is of course the most common, but if we're avoiding that, <ḵ> is indeed an artifact of Semitic transliteration schemes and <x> is the standard in Iranian linguistics. Encyclopaedia Iranica has a transliteration scheme overly dependent on Arabic, e.g. they write ث as <ṯ>.--Saranamd (talk) 10:39, 10 May 2025 (UTC)

Encyclopaedia Iranica too uses <x> in diachronic articles. For example here, "NPers. āxor". Vahag (talk) 10:51, 10 May 2025 (UTC)

Future of the Eggcorn Database - at Wiktionary?

The following was posted on the American Dialect Society e-mail list:

"Chris Waigl

"Sat, May 10, 10:34 PM

"to ADS-L

"The Eggcorn Database has in recent months and years been rather unstable (broken security certificates, or even complete outages). The cause of this sorry state of affairs is a lack of investment of time and efforts by me, in part due to my procrastination before the rather daunting task of fixing the underlying issues of outdated, broken software.

"The site's status and future has been recently raised in a forum thread here: https://eggcorns.lascribe.net/forum/viewtopic.php?id=7470 . In it I provide some more background information as well as ruminate on options.

"Since then my thinking has come around forging a path forward in the following manner: a) back up all existing content and b) convert both the forum and the ECDB into a static site, preserving all URLs. I would also at this time move the site to a better hosting provider. However, this would be the end of new posts to the forum, and also the inability to resume ECDB entries. It could be revived in the future or the content reused with no more effort than it would take now.

"This is by way of an announcement. Any thoughts are welcome, too.

"Chris"

We've had a passing interest in this for a while. If we had more content, it might become something sustainable. We might win some new contributors as well.

Would it be worthwhile to offer to investigate hosting this? DCDuring (talk) 13:14, 11 May 2025 (UTC)

What kind of eggcorns does it include? Ones that are found in many books, or ones that one person heard once? I.e., would many meet CFI or would it be akin to the List of Protologisms that was deleted? If it would not just be another List of Protologisms, but would contain at least a decent proportion of CFI-meeting things, then (to me, at least) it seems like the kind of thing we could easily spare an appendix for, or at least tolerate a userspace page being used for if the user were also making helpful edits to the rest of Wiktionary (e.g. adding the most commonly attested, CFI-meeting eggcorns to mainspace). But I don't know if the database is structured in a way that would make it easy to construct an appendix out of, or not. - -sche (discuss) 17:58, 11 May 2025 (UTC)

I give a certain amount of credit to the man's active membership in the American Dialect Society. I personally have only the mildest interest in eggcorns, but thought some here might be interested. DCDuring (talk) 19:15, 11 May 2025 (UTC)

Here is the database. It has 648 items. Of the few that I have looked at, all seem to have 3 or more cites, though not always from durably archived sources. Analysts or reporters on individual items include Mark Liberman, Ben Zimmer, and Arnold Zwicky. DCDuring (talk) 19:32, 11 May 2025 (UTC)

Well, see it that way, Chris, there is no one to hinder you creating well-formatted and -supported entries. Most people aren't enough into the meme to be fastidious about what an eggcorn even is. Our definition of eggcorn does not really say by which logics or plausibility test a word is assumed and then declared an eggcorn, though I assume it is etymological connection, in which cases more serious people grown up in the Old World speak of misconstruction, so I guess other people than me would have entered insider baseball as an eggcorn instead of misconstruction, and for the philologies of historical centuries we are at a loss, where there are lots of etymological reconnections actualized by speakers, or, if the language is educated and artificiality or lectio facilior is admitted, writers, but the fashionable linguists inventing such novel terms who are early adopters of the blogosphere don't appear to have lifetime over to think through these matters, and they always ignored me anyhow when I wrote them, very slick in not failing to appear professional so much that they would feel to be downgraded contributing here anyway, before noticing that you gain some practicality in mirroring the wilds of language by admitting for some inconsistency. Fay Freak (talk) 12:41, 12 May 2025 (UTC)

I emailed Chris about this and she responded:

Thanks so much for your message and kind words about the Eggcorn Database. I have a lot of respect for Wiktionary and other Wikimedia projects, so it's great to have a personal connection.
It's an intriguing thought to integrate eggcorns into Wiktionary. I'm not sure how good a fit it would be. And I would have to give thought about the licensing issue. We clearly neglected to resolve that back then, and now of course a lot of the contributors have dispersed. On the other hand, online lexicography is certainly in flux and will hopefully be, so we should at least remain aware of each other's projects and future opportunities for coordination or collaboration.
My message to ADS-L got several private responses including from the "old" online linguistics blogging community I used to be part of, 20 years ago. Now we can add podcasting to it. One thing Wiktionary wouldn't solve is the future of the forum and residual community, and that's what several of us have been thinking about. I'm not sure where this is leading, but it's what I want to explore first. This said, a category "English Eggcorns" in Wiktionary could exist in addition to any other eggcorn site, if we can resolve the licence suitably.
I'll let you know where the conversation is going. I'll certainly not delete anything or make it impossible/harder to retrieve the content. But I may prioritise stabilizing the hosting situation and finding a home for the community in the forums.
Best,
Chris

Ioaxxere (talk) 17:51, 12 May 2025 (UTC)

Thanks. Judging from the modest enthusiasm both here and at the Eggcorn Database, I'm skeptical about the prospects. I'm not even sure that there is any interest here in "eggcorns" as a category as opposed to "misconstructions". DCDuring (talk) 19:08, 13 May 2025 (UTC)

I mean, there's enough enthusiasm for eggcorns here that we do have Category:English eggcorns; I see no problem with looking at the eggcorns the database has found, checking which meet our own CFI, and spinning up our own entries for those. But that would indeed not solve the issue of where to host the forum. - -sche (discuss) 00:19, 14 May 2025 (UTC)

The forum could be at Wiktionary talk:English eggcorns (with a relevant guideline/policy page) or Category talk:English eggcorns. —Justin (koavf)❤T☮C☺M☯ 00:30, 14 May 2025 (UTC)

That would be practical, but I'm not sure that our environment would measure up to a collegial academic one. Also, I don't think our license is negotiable, which may not suit at least 2 of the distinguished contributors. DCDuring (talk) 01:21, 14 May 2025 (UTC)

keep yourself safe; filter-avoidance terms (vs spellings)

I don't think keep yourself safe is a filter-avoidance spelling: it's a whole different set of words, which I expect are not just spelled but pronounced differently in spoken filter-avoidance, like I hear people say sewer slide and corn aloud. My instinct is to redefine it as a {{synonym of}} like sewer slide, and mention filter avoidance in the etymology... but there seem to be a fair few entries in the same boat (we are also currently presenting grape as a "spelling", but in my experience it's also spoken differently, just like corn), so I want to check: do you agree with redefining keep yourself safe and grape as not mere spellings? and do we want a category for them, like "CAT:Filter-avoidance terms"? (We do categorize e.g. "archaic terms" as well as "archaic spellings".) - -sche (discuss) 17:42, 11 May 2025 (UTC)

Filters are just one method of censorship. Before that people were writing things like "read that fine manual" to avoid sanctions from newsgroup moderators, and before that it was things like "jeepers, creepers" that were used to avoid punishment by parents, teachers, etc. And then there's the matter of how US English ended up with terms like chickadee and donkey... Chuck Entz (talk) 21:09, 11 May 2025 (UTC)

So create {{filter-avoidance form of}}, abbreviated {{fa f}}. Everyone, conceding our diachronic perspective on languages, will own that this occurs at least when a spelling was only used to avoid textual filters, but later pronounced, possibly only because many people tried to be funny. The precedence of the former template will grow out of date when audio recognition in social media matures, so filter-avoidance forms will have distinct pronunciations in the first place. Fay Freak (talk) 12:15, 12 May 2025 (UTC)

I agree that these are strictly not filter-avoidance spellings, and suggest it would even be bad to say that these terms are "filter-avoidance" anythings (i.e., that the users of the term were specifically intending to avoid filters) in Wiktionary voice without good evidence. They're in a class of terms I don't exactly know how to call, or even how they overlap in a Venn diagram with other types of euphemism and algospeak, but would label with something like "euphemistic" for now. Hftf (talk) 22:33, 12 May 2025 (UTC)

Advice/Help with Appendix

Arabic has an appendix for its verb forms (Appendix:Arabic verbs). I would like Maltese to have such an appendix too, as I think it's equally necessary and helpful.

Can someone tell me how I can create this appendix please? Melithius (talk) 21:20, 11 May 2025 (UTC)

@Melithius you can enter the page title "Appendix:Maltese verbs" in the search bar, do the search, and click the red link that comes up. This, that and the other (talk) 10:39, 12 May 2025 (UTC)

Ah okay, thought it was more difficult than that lol. Thanks a lot :) Melithius (talk) 13:16, 12 May 2025 (UTC)

Call for Candidates for the Universal Code of Conduct Coordinating Committee (U4C)

The results of voting on the Universal Code of Conduct Enforcement Guidelines and Universal Code of Conduct Coordinating Committee (U4C) Charter are available on Meta-wiki.

You may now submit your candidacy to serve on the U4C through 29 May 2025 at 12:00 UTC. Information about eligibility, process, and the timeline is on Meta-wiki. Voting on candidates will open on 1 June 2025 and run for two weeks, closing on 15 June 2025 at 12:00 UTC.

If you have any questions, you can ask on the discussion page for the election. -- in cooperation with the U4C,

Keegan (WMF) (talk) 22:08, 15 May 2025 (UTC)

You people are wasting everybody’s time. Whenever I reported misbehavior on Steam, Imgur, or other social networks, there was no requirement for me to show up in a virtual court to belabor my case in public. (People trust moderators to exercise their own judgment.) Even worse is that you don’t normally allow anonymous reports, which only invites the accused to flip the subject. Reading 1989’s case against A.Savin, that is exactly what happened: A.Savin responded by promptly flipping the tables, making the subject about 1989 instead.

If you truly cared about detoxifying Wikimedia, you would never burden anybody with this bureaucratic nonsense. I can promise everyone that the U4C is only going to detoxify Wikimedia with all the celerity of a tortoise. (((Romanophile))) ♞ (contributions) 14:40, 18 May 2025 (UTC)

Dhivehi written in Devanagari?

After a bit of work, I've managed to get Category:Dhivehi terms in nonstandard scripts down to 62 entries. The rest are single-character/ligature entries in the Devanagari script (the one exception is a Thaana-script entry with a Devanagari-script alternative form).

For background: Devanagari is one of the main scripts of India, mainly due to the Hindi language. We use it for our Sanskrit entries, and there are other languages such as Nepali that use it as well. To the north into Pakistan, the Arabic script tends to predominate, and to the south the Dravidian languages there tend to have their own set of scripts.

Dhivehi is a bit of an oddball: it's out in the ocean away from the other Indo-Aryan languages, and it has its own script (Thaana) created semi-randomly from those of other languages, including Arabic. It has also used a couple of other scripts in its history, but to my knowledge, not Devanagari.

Which brings up the matter at hand: Is there any evidence of Devanagari ever being a standard script for Dhivehi? I mean, not just used to write it here and there, but having a standard alphabetical order, etc.? The entries in question have definitions like:

The twelfth consonant in Dhivehi, written in Devanagari

I would also note that there is exactly one actual Dhivehi word written in Devanagari in all of Wiktionary's mainspace entries (redlinked, of course)- the entries are only about the characters, not about what might be written using them. There are no references in any of these entries, though three of them have misspelled links to Omniglot's page on the Thaana script.

If Devanagari has been used for Dhivehi, we will need to add it as a standard script in the module, and we will need to provide references, not to mention some examples of actual usage. If it hasn't, we'll need to see about deleting all of these incorrect entries. A couple of them are already tagged for RFV, but I don't think they've been listed. Chuck Entz (talk) 06:33, 17 May 2025 (UTC)

The only thing I found was this Wikipedia article, which seems to suggest that the Devanagari script is not used by speakers in the Maldives or India. There is an (unsourced) claim near the bottom of the page that a Devanagari script was created in 1950, but it seems like it never caught on. I might be wrong, but I strongly suspect the Maldivian Devanagari script is just a constructed script that some guy made and promoted but that didn't catch on. — BABR・talk 06:16, 29 May 2025 (UTC)

@Chuck Entz, Babr

Minicoy is the island in India that speaks a dialect of Dhivehi.
The Devanagari script is certainly used in Minicoy.
However, it seems that Devanagari is primarily used for writing Hindi in Minicoy rather than Dhivehi, and that the Minicoy dialect of Dhivehi is written in the Thaana script just like Dhivehi is written in the Maldives.
Therefore, it is probably best to delete the Devanagari entries in CAT:Dhivehi terms in nonstandard scripts and the redlink to महल् at މަހަލް until there is any evidence that the Minicoy dialect has been written in Devanagari.

Kutchkutch (talk) 15:27, 1 June 2025 (UTC)

Age-old question of what counts as a surname in a language

@0DF recently added Quicherat#Polish on the grounds that declined (singular) forms are attested. Unadapted borrowings of surnames happen quite often, but the thing I'm wondering about is that within Polish texts it seems to always be in reference to one of the two notable French brothers. Given Wiktionary:Criteria_for_inclusion#Names_of_specific_entities I'm inclined to say that the entry shouldn't exist and should be nominated for RFD, but I wanted other people's input before taking that step. Vininn126 (talk) 11:19, 18 May 2025 (UTC)

@Vininn126: Thank you for bringing this up here. I am also eager to read others' input on this matter. My position on Polish Quicherat is as I wrote in this edit summary, namely “I know what you mean, and if this surname were indeclinable in Polish, I would see no reason for a Polish entry. However, given that the declined forms Quicherata and Quicheratowi are attested, what are we to do? Have Polish entries for the declined forms but not the lemma?” And as an addendum, I don't believe we should do without entries for those declined forms, since, per the general rule for inclusion, they are terms that it is “likely that someone would run across…and want to know what mean”. That was certainly the case for me when I ran across Latin Quicheratus, which is almost on all fours with Polish Quicherat. And re the point about names of specific entities, what are we therefore to make of Category:Individuals? 0DF (talk) 11:33, 18 May 2025 (UTC)

One particular solution in this case would be to change the definition - if we decide the term isn't a Polish surname but rather an individual, it would be more accurate. The main question of the thread still stands. Vininn126 (talk) 11:48, 18 May 2025 (UTC)

@Vininn126: That's really far from the most interesting and significant aspect of this topic, however. Chances are, given enough searching, that we'd find Polish uses of that name in reference to another Quicherat; almost certainly that other notable brother, at the very least. But even if not, what should we do about the Russian rendering of this surname, Кишра (Kišra)? 0DF (talk) 12:49, 18 May 2025 (UTC)

That's an assumption that should be borne out by actual research, and the entry should reflect that. We are here to present facts, not assumptions. Vininn126 (talk) 12:52, 18 May 2025 (UTC)

@Vininn126: OK. Well, for Russian Кишра́ (Kišrá), see s:ru:ЭСБЕ/Кишра, Жюль Этьен Жозеф, s:ru:ЭСБЕ/Кишра, Луи-Мари, and google books:"Кишра". 0DF (talk) 13:21, 18 May 2025 (UTC)

@Vininn126: Polish Quicherat now has three citations: the 1854 and 1872 ones refer to Juliusz and the 1877 one refers to Ludwik, so the assumption has now been borne out by actual research. Even if it hadn't, there are plenty of surnames in the world and there is plenty written in Polish in the world; having to address the underlying issue was just a matter of time. 0DF (talk) 16:31, 18 May 2025 (UTC)

See Wiktionary:Tea_room/2025/March#Черняк. Vahag (talk) 11:54, 18 May 2025 (UTC)

@Vininn126 That section doesn't explicitly say you can't include names of individuals, rather that there's no precise agreement on how to handle such names; and in practice a whole lot of them are included, e.g. translations in various languages of Socrates, Archimedes, Rachmaninoff and many others. The issue here seems to be names of well-known people, which is different from the more general issue of whether to include a given surname in a given language. Intuitively for the former, it seems we'd want to include "obviously well-known" people like the ones I just mentioned, and exclude non-notable people, but for in-between cases (e.g. the Quicherat brothers are "notable" but not "obviously well-known") it's hard to say. And for the latter, more general issue, "common" (or more generally, "interesting") names should be included and uncommon/uninteresting ones excluded, but the criteria are naturally hard to pin down exactly (cf. the famous "I know it when I see it" test concerning what qualifies as pornography). Some people might say the general three-citations requirement of WT:CFI should be enough, but given the voluminous amount of text published in many languages (esp. if we allow citations in statistical works), there are many uninteresting names that would pass this threshold that we might want to exclude.

(I suppose what I just wrote is largely unhelpful ...) Benwing2 (talk) 21:24, 18 May 2025 (UTC)

@Benwing2: The “former” issue, which you addressed, concerns whether and when to have subsenses for individuals like this, but not which surnames, given names, etc. should be defined in which language sections. I was being sincere above when I thanked Vininn126 for bringing this up here, because I really do agree with him that Quicherat isn't really a Polish surname, but rather a French surname declined with Polish case endings. I would like to find a way to reflect this usage that doesn't imply that Quicherat is just as Polish a surname as Przybyszewski (a red link!). Unfortunately, so far all we've really discussed is the “entries for individuals” issue, which is a bit of a red herring, IMO. I like {{name translit}}, which I've used for the Russian Кишра́ (Kišrá) and for other names that have seen Romanisation or Cyrillisation, but that wouldn't work for interlingual-but-intrascript renderings of names like the Polish Quicherat, since there's no transliteration, transcription, or even change of spelling going on there. Anyway, Quicherat may not really be Polish, but Quicherata and Quicheratowi sure as hell aren't French. I'm open to suggestions of how to handle this case and the many, many others like it. 0DF (talk) 23:31, 19 May 2025 (UTC)

@0DF Do you know about {{foreign name}}? It's designed for exactly this situation, i.e. cross-language but within the same script. Benwing2 (talk) 23:41, 19 May 2025 (UTC)

We also have {{name respelling}} (for those ubiquitous Latvian name respellings) and {{name obor}} (for East Asian orthographic borrowings). Benwing2 (talk) 23:43, 19 May 2025 (UTC)

@Benwing2: No, I did not know about {{foreign name}}. That looks a lot better to me. Thank you for bringing {{foreign name}} and {{name respelling}} to my attention (I was already aware of {{name obor}} from some pages with Chinese and Japanese entries).

@Vininn126: What do you make of Quicherat’s redefinition? Are you happy with this solution?

0DF (talk) 00:35, 20 May 2025 (UTC)

Yes, glad there were these. Vininn126 (talk) 06:29, 20 May 2025 (UTC)

Excellent. 0DF (talk) 11:46, 20 May 2025 (UTC)

For these foreign surnames (i.e. used in transliterations only) I would like to keep pronunciation information (especially if their spelling is irregular), other than that idrc what people decide to do with them. — BABR・talk 18:05, 20 May 2025 (UTC)

Adding alternate IPA transcriptions of the sequence /dɹ/.

The majority of English speakers, particularly young speakers, pronounce words such as dragon or drink as starting with /d͡ʒɹ/ instead of the older /dɹ/. This was considered an allophone, but I found two words which would be pronounced the same if you have this feature: agere and agedre. I think this pronunciation should be added as the main one for all words with the phonemes /dr/, along with a note that younger speakers are much more likely to use it. I haven't looked into the /dɹ/ change's related changes, like train to /t͡ʃɹeɪn/ (which is also the majority nowadays) and strong to /ʃtɹɒŋ/ (not the majority but rising), but I wouldn't be surprised if I could find some. BirchTainer (talk) 07:06, 19 May 2025 (UTC)

@BirchTainer Are you sure this is a "young speaker" thing? Do you have a reference for this? AFAIK the pronunciation of /tɹ/ as similar to [t͡ʃɹ] has been noted for a long time and is nothing new. And I don't understand your examples of agere and agedre; these are not English words. Benwing2 (talk) 21:28, 19 May 2025 (UTC)

FWIW I have an old used book of songs in musical notation that has a handwritten note that the word literature should be pronounced British-style as "LITCH-ra-cha". I suspect the note is at least 40 years old, and so this clearly is not new and applies to British as well as American English. Benwing2 (talk) 21:32, 19 May 2025 (UTC)

Those are English words at least according to this dictionary. It is a young speaker thing according to this video (Also my parents and grandparents don't do it but most people my age do) but that's not really a big dataset. You are correct it is a shift that happened for every variety of English. BirchTainer (talk) 23:45, 19 May 2025 (UTC)

@BirchTainer This may be specific to the /str/ sequence. The /t/ in initial tr- is aspirated, which naturally leads to a [t͡ʃ]-like pronunciation, but the /t/ in initial st- is not aspirated. Possibly young speakers are either starting to aspirate the /t/ in st- sequences or aspirate it only in str- sequences. Benwing2 (talk) 23:51, 19 May 2025 (UTC)

In the video results are shown for all 3 phonetic changes, and they all have younger speakers as higher. BirchTainer (talk) 23:58, 19 May 2025 (UTC)

@BirchTainer i'm not sure how these are pronounced, but it wouldn't change that /d͡ʒ/ and /d/ aren't distinguished before /ɹ/, and thus are conditional allophones. — BABR・talk 22:24, 19 May 2025 (UTC)

It does change because agere is pronounced as /eɪ.d͡ʒɹi/ and traditionally agedre is pronounced as /eɪd͡ʒ.dɹi/, but if you have the phonetic change, one less phoneme is pronounced because the /d/ become /d͡ʒ/ and it merges with the other /d͡ʒ/, this changes where the syllable break is in agedre, and it becomes /eɪ.d͡ʒɹi/. BirchTainer (talk) 23:55, 19 May 2025 (UTC)
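The two steps claimed above (affrication of /d/ before /ɹ/, then merger of two adjacent /d͡ʒ/) can be written out as a small sketch. This only illustrates the claim as stated; the function name and the phoneme lists are assumptions made for the example, syllable breaks are left out, and whether the merger in step 2 actually happens in speech is exactly what is questioned in the replies below.

```python
AFFRICATE = "d͡ʒ"

def affricate_dr(phonemes):
    """Apply the two claimed steps to a list of phonemes (hypothetical helper)."""
    # Step 1: /d/ becomes /d͡ʒ/ immediately before /ɹ/.
    affricated = []
    for i, p in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        affricated.append(AFFRICATE if p == "d" and nxt == "ɹ" else p)
    # Step 2 (the disputed part): two adjacent /d͡ʒ/ merge into one.
    merged = []
    for p in affricated:
        if p == AFFRICATE and merged and merged[-1] == AFFRICATE:
            continue
        merged.append(p)
    return merged

# Phoneme sequences taken from the transcriptions in the comment above.
agere = ["eɪ", "d͡ʒ", "ɹ", "i"]
agedre = ["eɪ", "d͡ʒ", "d", "ɹ", "i"]

print(affricate_dr(agere))
print(affricate_dr(agedre))
```

If step 2 is dropped, agedre keeps two affricates in a row and the two words stay distinct, so the argument depends on that second step rather than on the affrication alone.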

Since neither of these are even remotely common English words, I can't respond definitively but I doubt this; whether /dr/ sounds like [d͡ʒɹ] has little to do with whether you elide the second of two affricates in a row across a compound boundary. Benwing2 (talk) 00:03, 20 May 2025 (UTC)

I haven't found an exact match for General American but drafts /d͡ʒræfs/ and giraffe /d͡ʒʊ.ræfs/ are extremely similar in my dialect, and I've managed to find some examples of people being confused between the 2 words (or the non-plural forms) Here are some people online saying this: Here Here Here and Here BirchTainer (talk) 09:06, 20 May 2025 (UTC)

sorry giraffes not giraffe

BirchTainer (talk) 09:08, 20 May 2025 (UTC)

The affrication of "dr" and "tr" is a well-known phenomenon of English pronunciation. There are practical problems with having separate pronunciations and a note on every page for a word that contains /dr/. Visual space is not free. It would be worth covering at Appendix:English pronunciation. The examples you chose, agere and agedre, are very niche terms and so don't do a good job at showing the importance of marking the distinction; I'm not convinced it is necessary as a general practice. Fusion of /d͡ʒ/ with a following /d͡ʒ/ is not automatic in English: phrases such as "orange juice", "hinge joint" and "page-jack" are not obligatorily pronounced as oran-juice, hin-joint and pay-jack. So while I don't doubt that may be one possible pronunciation of agedre, I would ask, what evidence do you have that this is anything more than a fast speech simplification of /eɪd͡ʒ.dɹi/, ?--Urszag (talk) 00:13, 20 May 2025 (UTC)

I think it could be fairly convincingly argued that /dɹ/ and /tɹ/ are not phonemic onsets in (certain varieties of) English, and that agedre can simply be explained as /-d.ɹ-/. Theknightwho (talk) 10:00, 24 May 2025 (UTC)

Latin female proper names

In ancient Rome, it was standard to refer to a woman by the feminine form of her father’s family (gens) name (his nomen gentile). For example, the daughter of a man with the gens name Tullius was regularly called Tullia; the daughter of a man with the gens name Aemilius was regularly called Aemilia, etc. I noticed that on Wiktionary, a number of these feminine forms of nomina gentilia either don't have entries yet, or are currently only (mis?)defined as praenomina, and now I’m wondering what the right format would be for the family names.

Looking at e.g. the Russian entry for Иванов (Ivanov), it is defined as a proper noun but given a declension table that includes feminine and plural forms. The nominative feminine singular Иванова (Ivanova) is defined as a proper noun form, not as its own proper noun. Should Latin be handled likewise?

I think I favor treating Latin a little differently, having masculine and feminine versions as separate proper nouns with separate declension tables, but each linked to the other in the headword line (as with equus and equa), and with the feminine having some template on the definition line that defines it as just the feminine version of the masculine gens name. One reason I think this is better is that some names were used in Roman times as feminine nomina gentilia, but came to be used in later eras as given names: e.g. Iulia. (To be clear, even if the feminine versions are listed as proper nouns rather than as proper noun forms, I don't think they should be added to Category:Latin nomina gentilia, since that would be completely redundant.)

Right now, the norm for Latin entries for male gens names seems to be not to mention feminine forms either in the headword line or in the declension table. I think some bot work might be called for to reformat all the existing Latin gens name entries. It also seems like it might be helpful to have more template support for consistently formatting these entries: I don't see a definition-line template for them, they're just manually put in the category using Template:cln. Should Template:surname be used or is a Latin-specific template a better idea here? Based on the existing entries, it seems like a helpful feature for a "la-nomen gentile" definition-line template would be parameters to optionally link to Wikipedia articles of famous bearers of the name.

The existing templates and categories for Category:Latin praenomina seem fine in general (although it seems weird that there's Category:Latin feminine praenomina but not Category:Latin male praenomina).

Latin cognomina were additional names after the nomen gentile that were sometimes passed down hereditarily within branches of a gens, and such inherited cognomina sometimes have feminine forms like a nomen gentile: e.g. the daughter of Claudius Marcellus was called Claudia Marcella, the daughter of Caecilius Metellus was called Caecilia Metella. Unlike with gens names, I'm not sure that every masculine cognomen has a regularly used feminine version, or vice versa. I don't know if inherited cognomina were passed down to daughters as regularly as gens names. Also, I think some women were given a cognomen that did not originate directly as the feminine-inflected form of her father’s cognomen: e.g. w:Julia Drusilla (mentioned in w:Naming conventions for women in ancient Rome) does not seem to be named after a male *Drusillus. I think that, as with nomina gentilia, the best system will be to enter feminine cognomina in all cases as their own proper nouns, and to use the definition line to make it clear that Marcella is the feminine form of Marcellus. (And keep separate categories for masculine and feminine cognomina, even though there is some overlap.) Urszag (talk) 22:11, 20 May 2025 (UTC)

We could either add a parameter to {{surname}} or make a Latin-specific {{la-gens name}} template or whatever. In general I favor having fewer templates, but here it doesn't matter that much because the same underlying code (with appropriate conditionals) would be used to handle regular surnames and Latin gens names, regardless of whether we have one or two templates. Note that we do have a separate {{patronymic}} template, which just calls the surname function, passing in a type parameter. (Support is also there for matronymics but there's as of yet no {{matronymic}} template defined.) Also note that other languages like Greek and Czech that have gendered surnames tend AFAIK to define both the male and female variants as lemmas, and even in Russian this happens sometimes, as with Ахмадуллина (Axmadullina). I think the right thing to do is for you to propose a specific template interface with the appropriate parameters and such, and we can discuss whether it needs any modifications. Benwing2 (talk) 23:56, 21 May 2025 (UTC)

@Benwing2 I'd prefer the parameter solution, or otherwise we'll end up with template duplication with various other Italic languages (not to mention Old Latin, if that ever gets split out). We don't want a repeat of the current Japonic situation, with loads of forked templates. Theknightwho (talk) 09:48, 24 May 2025 (UTC)

RfC ongoing regarding Abstract Wikipedia (and your project)

(Apologies for posting in English, if this is not your first language)

Hello all! We opened a discussion on Meta about a very delicate issue for the development of Abstract Wikipedia: where to store the abstract content that will be developed through functions from Wikifunctions and data from Wikidata. Since some of the hypotheses involve your project, we wanted to hear your thoughts too.

We want to make the decision process clear: we do not yet know which option we want to use, which is why we are consulting here. We will take the arguments from the Wikimedia communities into account, and we want to consult with the different communities and hear arguments that will help us with the decision. The decision will be made and communicated after the consultation period by the Foundation.

You can read the various hypotheses and have your say at Abstract Wikipedia/Location of Abstract Content. Thank you in advance! -- Sannita (WMF) (talk) 15:27, 22 May 2025 (UTC)

Glossary template needs a language association

UPDATE: I'm realizing that this is a more appropriate discussion for the grease pit, I will move this note there.

Currently, words linked to the glossary are not associated with any particular language. Because of this, the list of terms not associated with an entry in the glossary is dumped into one giant list (https://en.m.wiktionary.org/wiki/Category:Pages_linking_to_anchors_not_found_in_Appendix:Glossary), which means anyone wanting to improve the glossary doesn't have a good way to look at terms only in their target language.

This could be fixed either by requiring a language tag (and putting a nag notice on those lacking it so they can be fixed over time), or by changing the template so that there's a language association based on the page it comes from. I think this second solution makes the most sense, unless there is some template dynamic between languages that I don't know about. Proudlyuseless (talk) 19:59, 22 May 2025 (UTC)
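To make the request concrete, here is a rough sketch of the per-language grouping being asked for. It is not how the glossary template or its tracking category is implemented; the function name, the tuple layout, and the sample data are all assumptions for illustration only.

```python
from collections import defaultdict

def group_missing_anchors(links, glossary_anchors):
    """Group glossary links with no matching anchor by language.

    links: iterable of (page, language, anchor) tuples (hypothetical layout).
    glossary_anchors: set of anchors that exist in the glossary.
    """
    missing = defaultdict(set)
    for page, language, anchor in links:
        if anchor not in glossary_anchors:
            missing[language].add(anchor)
    # Sorted lists give a stable, readable work list per language.
    return {lang: sorted(anchors) for lang, anchors in missing.items()}

# Made-up sample data, purely for demonstration.
sample_links = [
    ("page one", "Maltese", "broken plural"),
    ("page two", "Maltese", "verbal noun"),
    ("page three", "Polish", "aspect pair"),
    ("page four", "Polish", "verbal noun"),
]
sample_anchors = {"verbal noun"}

print(group_missing_anchors(sample_links, sample_anchors))
```

The language here would come either from an explicit tag or from the language section the link sits in, which are the two options described above.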

Suggestions for FWOTD fallbacks for June dates

I am helping to set some FWOTD fallbacks for dates in June. These are pages in the format "Wiktionary:Foreign Word of the Day/ " which are displayed when no specific FWOTD is set for a particular day in a year. The eventual goal is to create a complete set of 366. (I have been doing this sporadically for WOTD as well.)

If you speak languages other than English, please have a look at "Wiktionary:Foreign word of the day/Nominations" and suggest nominations—where possible satisfying the criteria on that page—suitable for the dates set out below.

1 (Global Day of Parents).
2 (Festa della Repubblica, Italy's national day).
3 (World Bicycle Day).
5 (Constitution Day (Denmark); World Environment Day).
6 (UN Russian Language Day).
7 (World Food Safety Day).
8 (World Oceans Day).

Volunteers interested in setting FWOTDs generally are also sought. (@Polomo47, Svartava, for your information.) — Sgconlaw (talk) 21:48, 22 May 2025 (UTC)

Official Launch of The Million Wiki Project

We are thrilled to announce the official launch of The Million Wiki Project!

Our mission is to enrich Wikimedia projects with high-quality and diverse content related to the Middle East and North Africa (MENA) region. This initiative focuses on creating new articles, multimedia, structured data, and more, covering topics from MENA countries, communities, and diaspora worldwide.

Who Can Participate?
All registered Wikimedians are welcome to join! Whether you're an individual contributor or part of an organization, your support is valuable. We encourage content creation in any of the six official UN languages (Arabic, English, French, Russian, Spanish, and soon Chinese).

What Kind of Content Are We Looking For?

New Wikipedia articles focused on MENA topics
Multimedia contributions on Wikimedia Commons (photos, videos)
Structured data for Wikidata
Language entries on Wiktionary
Public domain texts on Wikisource

Note: Make sure your content follows local Wikimedia guidelines and licensing policies, including Freedom of Panorama for media files.

Join us in bridging content gaps and showcasing the richness of the MENA region on Wikimedia platforms!
Stay tuned for more updates and participation guidelines. Reda Kerbouche (talk) 09:08, 23 May 2025 (UTC)

Template:etymon for Chinese

There is no consensus as of now to use {{etymon}} on Chinese entries, thus I removed all usages of such. It is especially problematic due to existing issues in Chinese etymologies.

Externally, many loanwords are borrowed via some dialect and then borrowed orthographically by other lects. The current practice of simply using zh, but not individual lects, obscures this fact and reverses the ordering of borrowing (hence I have previously asked others to stop doing so, but to no avail). Also, certain words are borrowed multiple times via multiple sources, e.g. 吉他 is borrowed from Japanese in several Taiwanese dialects, so the template usage itself is already incorrect, not just the presence of the tree. (The ideal solution is to split the entry into multiple etymologies, but many of the Chinese editors do not seem to bother, plus discerning the etymologies is sometimes complicated and time-consuming, e.g. 咖啡)

Internally, the long history of Chinese meant that morphological changes, phonological changes, layers of literary readings, etc., are bound to happen. Combined with the fact that Chinese is written in a logographical script, our current etymologies effectively provide zero information, and writing {{etymon}} simply based on this information is not helpful to the reader.
Also, the automatic transliteration for zh generates pinyin, which is anachronistic or wrong for any non-modern borrowings, see for example Japanese 英語 where pinyin is "helpfully" provided for 英格蘭 / 英格兰 and 英語 / 英语, and yet there is no way to turn these off. (the average editor will not )
The PST entries are also known to be problematic (User:Mellohi! is in the process of reworking them) with a metric ton of uncertainty, and the phylogeny is unclear, so adding them (which are often the wrong forms) is pointless.

Many of these problems regarding Chinese etymologies obviously need to be fixed, but at the same time I don't think we should allow them to propagate elsewhere (i.e. via the transclusion feature of {{etymon}}, which is still the case even when |tree=0).

Given my observation that most editors simply blindly add the template without understanding such issues, I suggest that any {{etymon}} usages on Chinese should be disallowed.

cc @Babr @Fenakhay who reverted my removals

also (Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, Saph, The dog2, Theknightwho, Tooironic, 沈澄心, 恨国党非蠢即坏, LittleWhole): – wpi (talk) 12:14, 23 May 2025 (UTC)

Support dring sim 12:34, 23 May 2025 (UTC)

I am completely sympathetic to the concerns raised by @wpi, but I'm not sure if an across-the-board ban is warranted. I can see it still being useful when it is not misleading. I guess this ban would be a safer option to ensure that there is no misinformation, but if the template can be customized to make sure certain romanizations can be suppressed/replaced, I believe there is still some value to this template. — justin(r)leung _{{ (t...) | c=› }} 13:53, 23 May 2025 (UTC)

Support Etymon clearly is not suitable with the currently unified Chinese entries at the moment. If Chinese is split up (or etymon is patched to work around these issues) we can reconsider this. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 15:51, 23 May 2025 (UTC)

@Wpi about the revert, the discussion you cited was only about {{etymon}}'s usage for etymology trees, not its usage for e.g. categorization. So, I put it back with the tree deactivated for the generated categories.
Regarding a full on ban of {{etymon}} in Chinese, I have no strong opinion. But, as Justin said, I'm not sure if that is warranted. — BABR・talk 19:33, 23 May 2025 (UTC)

@Babr: I am aware that the vote only concerned etymology trees but not the template itself. My aversion towards {{etymon}} is partly due to aesthetics, but also due to how it (and more specifically some of the edits adding it to our Chinese entries) affects how the etymological information is presented and introduces more inaccuracies (or will do so if using it is allowed), which I have outlined above.

My understanding from reading the vote is that the usage of {{etymon}} should not affect the existing etymologies, but sadly this seems to be the case with Chinese (due to the conflicts between Unified Chinese vs the assumptions made when {{etymon}} was designed). – wpi (talk) 15:27, 24 May 2025 (UTC)

Relevant earlier discussion: Wiktionary:Beer parlour/2025/March#Etymons. How many times more will we need to go over this? — SURJECTION ^{/ T / C / L /} 14:31, 24 May 2025 (UTC)

I had a discussion with @Wpi on Discord who recommended that I make these changes to etymon:

Chinese (zh) entries no longer get categorized.
Links to Chinese (zh) etymons no longer have a transliteration.
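For readers unfamiliar with the template, the two changes listed above amount to something like the following sketch. It is written in Python purely as an illustration and is not the template's actual module code; the function names, the constants, and the sample category string are hypothetical.

```python
# Languages exempted from each behaviour (assumed here to be just "zh").
NO_TRANSLIT_LANGS = {"zh"}
NO_CATEGORY_LANGS = {"zh"}

def format_etymon_link(lang_code, term, translit=None):
    """Return display text for a linked etymon, omitting the
    transliteration for exempted languages (hypothetical helper)."""
    if translit and lang_code not in NO_TRANSLIT_LANGS:
        return f"{term} ({translit})"
    return term

def etymon_categories(entry_lang, categories):
    """Return the categories to add to an entry, or none at all
    when the entry's language is exempted (hypothetical helper)."""
    if entry_lang in NO_CATEGORY_LANGS:
        return []
    return list(categories)

print(format_etymon_link("zh", "英語", "Yīngyǔ"))
print(format_etymon_link("ru", "Кишра", "Kišra"))
print(etymon_categories("zh", ["sample category"]))
```

Note that the first rule keys on the language of the linked etymon and the second on the language of the entry, so a Japanese entry linking to a Chinese etymon would lose the pinyin but keep its own categories.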

Now that I've implemented these, I think the current state is preferable to the "nuclear option" of removing all uses of the template, which I don't think will have any benefit to Chinese etymologies.

Notifying the previous voters @沈澄心, Justinrleung, Mellohi!

Ioaxxere (talk) 17:59, 28 May 2025 (UTC)

@Ioaxxere: Thank you, I think this would be a more acceptable solution for the time being. I'm also wondering whether the template should throw an error if |tree= or |text= is set to true for Chinese. – wpi (talk) 18:17, 28 May 2025 (UTC)

I also agree that this solution is preferable for the time being (until Chinese is split). I agree with wpi's concerns regarding Unified Chinese etymologies in etymon, but felt that the previous option was a bit too 'nuclear' (hence why I didn't vote). — BABR・talk 22:14, 28 May 2025 (UTC)

Changing Translinguality of New Tai Lue word characters

I apologise if I should have put these questions in an earlier topic from this month. They are related, but they seem too specific.

The issue is that the script seems to be used only for Tai Lü, so I want to change the characters found in words, except possibly for the tone marks, from being translingual to being in the Tai Lü language, and a lot of characters are currently missing. I have been recording a lot of New Tai Lue characters as descendants of Tai Tham characters, but I have been recording them in the invocations of {{desc}} as being Tai Lü, which generates an implication that they are missing. Many of the pages are currently missing.

Now, if the New Tai Lue script is used for other languages, please advise me, because I should then be recording these descendants as translingual, and new character entries created should mostly be for translingual characters. The rest of my questions assume that the script is restricted to the language.

Is it in order for me to create the new pages for Tai Lü characters? I believe they are thus restricted.

Is there now a streamlined process for getting agreement to convert the script's characters from translingual to Tai Lü? When I last looked, many months ago, there was a creaky process for changing individual entries' languages, but I fear some thought I was abusing the process. I haven't tracked it down yet, as it may have been superseded.

Is there a guide on how to record the descent of characters and symbols? Some of the New Tai Lue characters are simply Tai Tham digraphs or words adopted as characters (or symbols). The creation process in at least one case proceeded:

Change a Tai Tham non-spacing mark to a New Tai Lue (NTL) spacing mark.
Ligate the NTL spacing mark with a Tai Tham subscript letter form.

Where and how should these innovations be recorded?

The following people are likely to have useful input on this matter: @Benwing2, -sche, Noktonissian. --RichardW57 (talk) 22:57, 23 May 2025 (UTC)

User:Mysteryroom still making bad edits after ban

See here: . Before his ban I had told him not to edit a page just to change British to American spellings. 2A00:23C5:FE1C:3701:8C0B:6FE9:4746:3E03 19:04, 24 May 2025 (UTC)

Is that against Wiktionary rules to change words to American spellings? mysteryroom (talk) 19:08, 24 May 2025 (UTC)

Yes it is; you should in general leave the spellings as-is, as neither British nor American spellings are "better". Benwing2 (talk) 19:23, 24 May 2025 (UTC)

Okay, I will refrain from changing the spellings. mysteryroom (talk) 19:24, 24 May 2025 (UTC)

Sorry for my error. mysteryroom (talk) 19:27, 24 May 2025 (UTC)

order of T:alti, T:syn, etc

When ordering the inline templates like {{alti}} and {{syn}} that go right after particular definitions, where should {{alti}} be in the sequence? My instinct is to put {{alti}} first (like this), followed by {{syn}} etc, to match the order of the corresponding full sections. Currently, Jeff's helpful cleanup bot is putting {{alti}} last instead (due to the historical accident that when in 2023 it was decided to order {{syn}}, {{ant}} et al "in the order listed at Module:nyms/documentation", {{alti}} was not and still is not mentioned there), but he indicated this would be easy to change and doesn't affect that many pages; see User talk:Quercus solaris#alti for the discussion that led to this. @Quercus solaris, JeffDoozan. - -sche (discuss) 21:34, 24 May 2025 (UTC)

I would agree with placing it first and have personally been doing so. — SURJECTION ^{/ T / C / L /} 21:51, 24 May 2025 (UTC)

Before the nym templates, for the reason you mentioned: the sections are sorted in a way that intuitively follows how 'close' things are. I would add that a reader looking a word up in the dictionary may recognize it in an alternative form; that is how close alternative forms are to the definition. They are, in a sense, perfect synonyms, or synonyms + related terms, so are expected immediately before the synonyms. Fay Freak (talk) 22:00, 24 May 2025 (UTC)

I too support putting {{alti}} first, for the reasons mentioned by others above. Thanks, Quercus solaris (talk) 16:11, 25 May 2025 (UTC)

Done - moved {{alti}} before {{syn}} JeffDoozan (talk) 15:43, 3 June 2025 (UTC)
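
For anyone scripting a similar cleanup, the ordering agreed above can be sketched roughly as follows. This is only an illustrative sketch, not JeffDoozan's bot code: the function name and the `ASSUMED_ORDER` list are hypothetical, and the only ordering taken from this discussion is that {{alti}} comes before {{syn}}. The relative order of the other template names is an assumption and should be checked against Module:nyms/documentation.

```python
# Hypothetical priority list. Only "alti before syn" comes from the discussion;
# the order of the remaining names is an assumption made for illustration.
ASSUMED_ORDER = ["alti", "syn", "ant", "hyper", "hypo", "cot"]


def sort_inline_templates(names):
    """Return template names sorted by ASSUMED_ORDER.

    Names not in ASSUMED_ORDER are placed at the end; because sorted() is
    stable, they keep their original relative order.
    """
    rank = {name: i for i, name in enumerate(ASSUMED_ORDER)}
    return sorted(names, key=lambda n: rank.get(n, len(ASSUMED_ORDER)))


if __name__ == "__main__":
    # Templates as they might appear under one definition line.
    print(sort_inline_templates(["syn", "ant", "alti"]))
```

A stable sort is used so that any template the list does not know about is left in place relative to its neighbours rather than being reshuffled.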

"Old Romanesco"

An IP geolocating to Milan has just added several entries using this language header, but with the language code for Italian. This is obviously wrong, but I can't really clean this up without knowing what to change these to.

First of all, is there really such a lect? From the name, it should have something to do with Rome, and given that modern Italian is from Tuscan, the speech of Rome before that standard was adopted might be distinctive enough.

Second, if it does exist, how does it fit into Wiktionary's system? Is it a separate language that deserves its own L2 header, or a dialect or chronolect that should be given an etymology-only code and an alias, but no more?

I don't want to just nuke everything, but I don't know enough about Italian to clean it up myself. Thanks! Chuck Entz (talk) 22:07, 24 May 2025 (UTC)

We’d classify these as Italian, with {{tlb|it|Romanesco|archaic}}. Nicodene (talk) 22:36, 24 May 2025 (UTC)

@Nicodene: Thank you! For reference, the entries currently using the header are: daiesse, lopo, responnere and vraccio. Chuck Entz (talk) 22:46, 24 May 2025 (UTC)

User:Geographyinitiative removing citations according to a non-Wiktionary policy

Please see page history of Liaoyuan and 法律案. 2A00:23C5:FE1C:3701:A818:1F7A:7CB0:A6F9 09:33, 25 May 2025 (UTC)

I think citations should only be removed from the entry and sent to the Citations page if (a) the citation is low-quality or superfluous etc (not the issue here), or (b) the quote itself is objectionable in some way - e.g. is egregiously biased or offensive in a context where that is not lexicographically relevant, and better citations are available. Just because a source may be objectionable isn't a good reason to do this, in my view.

Moreover I can't see how what Wikipedia thinks of a source is of any use to us. This, that and the other (talk) 09:22, 28 May 2025 (UTC)

This is merely a curation decision, i.e. a matter of presentation, not outright deletion. I put it on the Citations page so it's not a removal (removal is a hype buzz word in this scenario). Obviously I have no problem with the citations, but not on the entry proper. If you want to restore content to the mainspace referencing dangerous sources that are only used in a controlled way on Wikipedia, be my guest. But I think prudence dictates that they are most appropriate for the Citations pages rather than mainspace. Putting the Wikipedia-deprecated sources on the mainspace discourages mainstream editors and readers (the normal audience) from trusting Wiktionary. However, they are completely appropriate for the Citations page; we can put anything there for the better understanding of a word's usage. The Citations page does not imply endorsement by Wiktionary whereas the entry proper really does have that connotation, even if you don't want it to. Don't allow my unpopularity on Wiktionary to blind you and cause you to endorse referencing Wikipedia-deprecated sources on the entry proper.
And I will say this; I'm just playing it by ear here. I'm trying to be respectable. I can definitely change my opinion about this. I want to reference the deprecated sources in some way. I like to learn about them and the patterns of alternative forms in the deprecated sources, the different habits that they will have in word usage. If you want more of that on the mainspace, I could definitely do that and I am interested. But I'm trying to protect the website's reputation to the degree I can. --Geographyinitiative (talk) 10:11, 28 May 2025 (UTC)

You're confused about how this works. Wiktionary is a separate entity from Wikipedia. Sources do not work the same way. This website's reputation or trust is not protected by what you're doing. The distinction between Citations and an entry proper is not what you think and is not about endorsement. "Removal" is not a "hype buzz word". "Dangerous sources" may be though?

As a curation decision, there are many valid ways to defend different curation of information but it shouldn't be tolerated to do so in this way! Hftf (talk) 10:57, 28 May 2025 (UTC)

God, you've fallen right into the trap. I mean, do you know what you're asking for here? Hey, I'm just trying my best to improve the website, but if you really want it, God, okay then, sure. --Geographyinitiative (talk) 11:03, 28 May 2025 (UTC)

No one is doubting your good faith. One ought to remember that we are a secondary source while Wikipedia is tertiary. Our quotations are used merely to prove the existence of a word. WT:Wiktionary is not Wikipedia, WT:CFI. Vininn126 (talk) 11:12, 28 May 2025 (UTC)

I have been down that road of thought, and it is fun and feels open-minded. If you want to follow it, feel free. But don't come crying to me when the whole mainspace of the website is full of facially unoffensive citations referencing Wikipedia deprecated sources. --Geographyinitiative (talk) 11:43, 28 May 2025 (UTC)

Glad to know you are more enlightened than our current policies. Feel free to start a discussion about changing them before enforcing your own rules, however.

Sarcasm aside, considering the fact you haven't addressed the fact we are secondary and Wikipedia is tertiary, I'm going to assume you still haven't gotten the point. Vininn126 (talk) 11:47, 28 May 2025 (UTC)

Okay, let's say you win. Now what? --Geographyinitiative (talk) 11:52, 28 May 2025 (UTC)

The quotes go back on the page and you stop removing them based on Wikipedia's criteria and instead focus on our internal policies and criteria. Isn't that obvious? That was the entire point of the discussion from the beginning. Vininn126 (talk) 11:54, 28 May 2025 (UTC)

Okay, go ahead then. Put the deprecated sources back on the mainspace. --Geographyinitiative (talk) 11:55, 28 May 2025 (UTC)

Glad we could come to an agreement! Vininn126 (talk) 11:58, 28 May 2025 (UTC)

I really am okay with it, but I guess the weight of responsibility gets a little heavy for me alone to carry. It's a good argument to lose. Geographyinitiative (talk) 12:01, 28 May 2025 (UTC)

I think the Liaoyuan edit was appropriate for more or less the reason given. If a term is derogatory or controversial then the quotations can be too. To document the name of a large city we don't need inline quotations at all. They could all be on the citations page. Vox Sciurorum (talk) 18:07, 28 May 2025 (UTC)

Inline quotations are almost always better. Visible for users, as they are unlikely to go to the citations page - and they are there to prove a term exists. Vininn126 (talk) 18:11, 28 May 2025 (UTC)

Citations to prove the existence of a name of a large city are clutter. The Citations namespace exists in part as a place to move clutter. I have added citations for city names in Ottoman Turkish where there may be some doubt about the spelling. Anybody who thinks these get in the way of a short and to the point definition can move them to Citations. Vox Sciurorum (talk) 12:08, 29 May 2025 (UTC)

They are collapsed by default. Hardly seems cluttery to me. Vininn126 (talk) 12:11, 29 May 2025 (UTC)

I agree. Weird removal. It should never be controversial to have some rather than no quotations listed on the entry, IMO. — Mnemosientje (t · c) 13:30, 30 May 2025 (UTC)

I really want to defend Geogy here, but the discussion is too lame for me. Vilipender (talk) 12:38, 29 May 2025 (UTC)
"Citations to prove the existence of a name of a large city are clutter." Well, citations actually are important for every English language location name and surname and etc derived from Chinese characters because these loan words into English have numerous different origins and sources, that is to say, there are some current official names of some provinces that have existed in English since the 1500's, and there are locations with names in English that have existed only since the 2000's. And the whole web of synonyms and alternative forms in English has become an unbearable weight and intolerable maze on readers, and the different usages signify underlying beliefs and habits. Since when has Shanxi existed as a word? 1958 like Beijing? Or some murkier time between 1958 and 1978? Did Beijing maybe exist in 1957? Since when has Hunan existed, and who gets the credit for the romanization of that word into English? When did the word Shanghai emerge? I know the linguists have focused on the verb 'shanghai', but have you ever stopped to ask when 'Shanghai' emerged? Since when have Cijin and its variants been used, and who uses it instead of Qijin or Chichin? Local/nonstandard orthography versus official: Wuhan and Wu Han. Spelling variations like Harbin, Ha'erbin and Haerbin: in what contexts do they emerge? Various patterns of etymology and usage emerge over a broad comparison, and the citations, references and similar are invaluable in informing the readers on these issues. They can also signify underlying trends. Also there are patterns of systematic errors like Hubei/Hebei confusion or Ji'nan/Jinan. There are some terms that are supposedly the official name of a location, but they almost never occur, and a substitute occurs. Lvliang versus Luliang, etc. You can use this information to guess the period when a text was written or the mindset and bias of the author. I can tell when google books and archive.org get a publication date wrong on a book at a glance.
Abbreviated forms, etc. Spellings used for a few years and then never again like Shi-jia-zhuang. Urumchi has like 15 variations so far, and those are just the easy ones! All kinds of things, and there's no real compendium of all the terms to help people understand what's happening. And how do you pronounce these, including the incorrect but widely used pronunciations? In the broad historical context, it is an emerging part of the English language, and the context of these citations, whatever source, is always going to be helpful, especially for rarer variants. Not a few fully durably cited entries for rare geographical word variants have their first Google result being Wiktionary, an invaluable aid to the wayward searcher. And should more common or official spellings be denuded of their citations and only the rare words be shown in context? No I say: Let ten thousand entries, of whatever commonness or rarity, flourish and grow in the light of the highest quality quotations! --Geographyinitiative (talk) 14:19, 31 May 2025 (UTC)

Wikimedia Foundation Board of Trustees 2025 Selection & Call for Questions

Dear all,

This year, the term of 2 (two) Community- and Affiliate-selected Trustees on the Wikimedia Foundation Board of Trustees will come to an end. The Board invites the whole movement to participate in this year’s selection process and vote to fill those seats.

The Elections Committee will oversee this process with support from Foundation staff. The Governance Committee, composed of trustees who are not candidates in the 2025 community-and-affiliate-selected trustee selection process (Raju Narisetti, Shani Evenstein Sigalov, Lorenzo Losa, Kathy Collins, Victoria Doronina and Esra’a Al Shafei), is tasked with providing Board oversight for the 2025 trustee selection process and for keeping the Board informed. More details on the roles of the Elections Committee, Board, and staff are here.

Here are the key planned dates:

May 22 – June 5: Announcement (this communication) and call for questions period
June 17 – July 1, 2025: Call for candidates
July 2025: If needed, affiliates vote to shortlist candidates if more than 10 apply
August 2025: Campaign period
August – September 2025: Two-week community voting period
October – November 2025: Background check of selected candidates
Board’s Meeting in December 2025: New trustees seated

Learn more about the 2025 selection process - including the detailed timeline, the candidacy process, the campaign rules, and the voter eligibility criteria - on this Meta-wiki page.

Call for Questions

In each selection process, the community has the opportunity to submit questions for the Board of Trustees candidates to answer. The Election Committee selects questions from the list developed by the community for the candidates to answer. Candidates must answer all the required questions in the application in order to be eligible; otherwise their application will be disqualified. This year, the Election Committee will select 5 questions for the candidates to answer. The selected questions may be a combination of what’s been submitted from the community, if they’re alike or related.

Election Volunteers

Another way to be involved with the 2025 selection process is to be an Election Volunteer. Election Volunteers are a bridge between the Elections Committee and their respective community. They help ensure their community is represented and mobilize them to vote. Learn more about the program and how to join on this Meta-wiki page.

Thank you!

https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections/2022/Results

https://foundation.wikimedia.org/wiki/Committee:Elections_Committee_Charter

https://foundation.wikimedia.org/wiki/Resolution:Committee_Membership,_December_2024

https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_committee/Roles

https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections/2025/FAQ

https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections/2025/Questions_for_candidates

Best regards,

Victoria Doronina

Board Liaison to the Elections Committee

Governance Committee

MediaWiki message delivery (talk) 03:08, 28 May 2025 (UTC)

Revisiting the Proto-Prakrit idea

@AryamanA @Pulimaiyi @Kutchkutch @Svartava @Kwékwlos In 2024, we had the beer parlour discussion "Our treatment of MIA reconstructions". In the past few months, edits like this, based on that discussion, have moved reconstructed "Ashokan Prakrit" terms to reconstructed "Prakrit" terms. Not to dwell on this point, but I am interested to revisit the discussion that we had there with a couple follow-ups:

In the 2024 discussion, we considered creating a "Proto-Prakrit" language code; Kutchkutch said that doing so "would mean that we would have to decide whether it is an ancestor, descendant or contemporaneous with the merged Prakrit language" and this point wasn't discussed further. For the purposes of NIA term categorization, I suggest now that:
1. Proto-New Indo-Aryan (or Proto-Prakrit) be set up as a variety of Prakrit.
2. We add the label (Proto-New Indo-Aryan) on reconstructed "Prakrit" pages, e.g. *𑀥𑀼𑀁𑀠𑀇 (*dhuṃḍhaï) (as is done on *bassiāre for (Proto-Romance))
3. On the etymology section of NIA pages, we can reference Proto-NIA, e.g. on Hindi ढूँढना (ḍhūṇḍhnā) we write "Inherited from Proto-New Indo-Aryan *𑀥𑀼𑀁𑀠𑀇 (*dhuṃḍhaï)" and the word gets classified as Category:Hindi terms inherited from Proto-New Indo-Aryan.
This is a pretty clean addition to the Reconstructed Prakrit idea since the NIA words get classified a bit more accurately. It seems wrong to me to classify ढूँढना (ḍhūṇḍhnā) as Category:Hindi terms inherited from Prakrit (or Category:Hindi terms inherited from Ashokan Prakrit) because there is no exact attested Prakrit etymon to speak of. But now, we would split off such terms into a Category:Hindi terms inherited from Proto-New Indo-Aryan, which is a more accurate label. This also parallels what is done for Romance languages, where Vulgar Latin is classed as a variety of Latin, and it is possible for Romance languages to inherit from Vulgar Latin. For example, Spanish bajar is derived from Proto-Romance *bassiāre and is classified as Category:Spanish terms inherited from Vulgar Latin by using {{inh+|es|VL.|...}}.
Candidly, I am not a fan of Wiktionary's peculiar usage of reconstructed "Ashokan Prakrit" as a catch-all for Turner's reconstructed vocabulary that (1) has Dardic descendants and/or (2) has a particular form that is disallowed by the phonotactics of the later dramatic Prakrits. In the 2024 discussion, Aryaman said "For the past couple years our strategy has been to call these reconstructions Proto-Ashokan Prakrit, which is a language we made up and not a label that is really used in any literature (0 hits on Google)" and also Kwékwlos later raised a very good point, "Ashokan Prakrit here is misconstrued as a generic proto-Prakrit (in fact the ancestors of Sinhalese and Dhivehi left the subcontinent before Ashokan Prakrit was even attested; one can argue also for the early separation of Dardic, which is best interpreted as an areal grouping of northwestern Indo-Aryan languages)". Additionally, a consequence of this is the rather-confusing present situation where, for example, Category:Hindi terms inherited from Ashokan Prakrit now contains virtually zero Hindi terms that are actually derived from attested Ashokan terms but is instead a weird catch-all for Deśaj words and other words that don't have a "clean" dramatic Prakrit reconstruction. And even if a word has Dardic descendants, must we assume that the term belongs to "Ashokan Prakrit" rather than "Prakrit" (i.e. what if it originated in the later "Prakrit" time, and was borrowed into the Dardic languages)?

In an effort to be accurate and not overstate a term's age, may I suggest that we nuke "Reconstructed Ashokan Prakrit" by splitting its terms into "Reconstructed Sanskrit" and "Reconstructed Prakrit":
1. Most "Reconstructed Ashokan Prakrit" terms would become "Reconstructed Prakrit" and on every such page we add the label Proto-New Indo-Aryan, even if Dardic descendants exist. For example, we move Ashokan Prakrit *𑀙𑁄𑀝𑁆𑀝 (*choṭṭa) to Prakrit *𑀙𑁄𑀝𑁆𑀝 (*chŏṭṭa) and remove Ashokan Prakrit *𑀨𑀺𑀭𑀢𑀺 (*phirati) in favor of 𑀨𑀺𑀭𑀇 (phiraï). In the few cases where (1) Dardic descendants exist and (2) it is not very obvious that the reconstruction predates the dramatic Prakrit period, then they can be mentioned in the etymology of the Reconstructed Prakrit page, e.g. "Related to Kashmiri/Phalura/Kalasha ...". By doing this, we are being vague about the exact age of the term, since it's not clear whether the Dardic word derives from Prakrit or shares a common ancestor with Prakrit. Hindi छोटा (choṭā) is classified as Category:Hindi terms inherited from Proto-New Indo-Aryan rather than Category:Hindi terms inherited from Ashokan Prakrit as it stands right now, which is much more accurate.
2. Where it makes sense to do so, we can create a Reconstructed Sanskrit entry which is of course allowed to have Sinhala, Dardic, Romani, etc descendants. Such cases would be (1) entries that either consist of or are obviously related to attested Sanskrit vocabulary, like *अर्धपूरक (*ardhapūraka), and (2) entries that are a descendant hub for terms beyond the scope of "Reconstructed Prakrit" (e.g. Dardic), like *कियत्त (*kiyatta). Just like what was suggested above, we could create Reconstructed Sanskrit as a variety of Sanskrit. We label all reconstruction pages with this code and use it in the etymology section of NIA pages. So on the Hindi page चौखट (caukhaṭ), the etymology section would say "Inherited from Prakrit चउक्कट्ठी (caükkaṭṭhī), from Reconstructed Sanskrit *चतुष्काष्ठ (*catuṣkāṣṭha)" and the term gets classified as Category:Hindi terms inherited from Reconstructed Sanskrit which is kept separate from Category:Hindi terms inherited from Sanskrit.
In my opinion, this is a clear and clean solution to cutting down the terms that were classified ad-hoc under Category:Hindi terms inherited from Ashokan Prakrit, etc. We reserve "Ashokan Prakrit" strictly for the terms that are found in Ashokan inscriptions or the writing of that time. It also allows us to be generally vague about the age of reconstructed terms (while still being helpful to readers by mentioning all relevant ancestors/descendants and relevant Dardic vocabulary), which seems to be the safe approach that Turner takes in CDIAL.

I don't know if this would be a particularly difficult change to implement, though someone who knows more than me about the back-end of languages on Wiktionary could explain if otherwise. Curious to see what others think, thanks! Dragonoid76 (talk) 05:00, 29 May 2025 (UTC)

@Dragonoid76 may I suggest that we nuke "Reconstructed Ashokan Prakrit" by splitting its terms into "Reconstructed Sanskrit" and "Reconstructed Prakrit"

Whether reconstructions should be considered as Prakrit or Sanskrit may be controversial.
For example, the editors who participated in Reconstruction talk:Sanskrit/तिथिवार agreed to delete it after finding Apabhramsa तिहिवार. However, User:स्वर्गसुख replaced the Apabhramsa term with Sanskrit *तिथि-वार in the etymology for તહેવાર.
Sanskritists, those who have an inclination towards or expertise in Sanskrit, seem to favour Reconstructed Sanskrit due to the higher familiarity of Sanskrit phonotactics compared to Prakrit phonotactics. However, I am usually against Reconstructed Sanskrit if there is no sound justification for using Old Indo-Aryan instead of Middle Indo-Aryan. Although it may not seem like so, there is a considerable difference between the two stages.

in fact the ancestors of Sinhalese and Dhivehi left the subcontinent before Ashokan Prakrit was even attested

There are several theories regarding the ancestors of Sinhalese and Dhivehi. However, since there is not a coherent consensus about what exactly happened, it seems appropriate to make them descendants of Ashokan Prakrit.

we can create a Reconstructed Sanskrit entry which is of course allowed to have Sinhala, Dardic, Romani, etc descendants

Since Sinhala, Dardic, Romani are already descendants of (Reconstructed) Ashokan Prakrit, changing the reconstructed stage from Middle Indo-Aryan to Old Indo-Aryan is not needed for this purpose.

we would have to decide whether it is an ancestor, descendant or contemporaneous with the merged Prakrit language Proto-New Indo-Aryan

The issue with Proto-New Indo-Aryan is that discrepancies such as gemination that are better explained at Middle Indo-Aryan or earlier would have to be reflected at the New Indo-Aryan stage. Kutchkutch (talk) 09:12, 29 May 2025 (UTC)

@Kutchkutch,

Proto-New Indo-Aryan is that discrepancies such as gemination that are better explained at Middle Indo-Aryan or earlier would have to be reflected at the New Indo-Aryan stage

I think I understand what you mean here. It might be better to use the label Reconstructed Prakrit (as opposed to Proto-New Indo-Aryan) on reconstruction pages, since the reconstruction fits into MIA rather than NIA. So Hindi छोटा (choṭā) will have "Inherited from Reconstructed Prakrit *𑀙𑁄𑀝𑁆𑀝 (*chŏṭṭa) ..." in its etymology section and be categorized as Category:Hindi terms inherited from Reconstructed Prakrit.

For example, the editors who participated in Reconstruction talk:Sanskrit/तिथिवार... I am usually against Reconstructed Sanskrit if there is no sound justification for using Old Indo-Aryan instead of Middle Indo-Aryan

I agree with the decision on तिथिवार (tithivāra). I understand your perspective and I agree that there are clear differences between Early MIA and OIA. To raise a few points:

Just to be clear, how many "reconstructed" terms actually have Dardic descendants? Are there many examples in Ashokan Prakrit right now? If more-or-less every term in Category:Ashokan Prakrit reconstructed nouns was moved into Category:Prakrit reconstructed nouns I don't think there would be any problems. By rough searching, I found only *𑀨𑀺𑀭𑀢𑀺 (*phirati) with a Kashmiri descendant which can just be mentioned in the etymology section of Prakrit 𑀨𑀺𑀭𑀇 (phiraï) or be a one-off case of reconstructed "Ashokan Prakrit".
Gemination was not reflected in spelling in the edicts of Ashoka, but we have pages right now like Ashokan Prakrit *𑀝𑁄𑀓𑁆𑀓 (*ṭokka) which should clearly be "Reconstructed Prakrit" rather than "Reconstructed Ashokan Prakrit".
In the handful of cases where Dardic terms are clearly inherited from a common ancestor with Prakrit terms that must be traced back to an older language OR where the reconstruction does not fit the phonotactics of dramatic Prakrit, I don't see much of a problem with reconstructing Sanskrit rather than Ashokan Prakrit. Just to be clear, there is overlap between Early MIA and OIA phonotactics and many Early MIA terms were borrowed into Sanskrit and live under Sanskrit alongside other Sanskrit words. So, for example, converting Ashokan Prakrit *𑀢𑁆𑀭𑀺𑀟𑁆𑀟 (*triḍḍa) to Sanskrit *त्रिड्ड (*triḍḍa) is fine because words like Sanskrit हड्ड (haḍḍa) exist (with Dardic descendants).
Regardless, the example I gave with Sanskrit *चतुष्काष्ठ (*catuṣkāṣṭha) is still valid. Are you for or against adding the label Reconstructed Sanskrit to these terms and changing the categorization for the NIA descendants in the manner I suggested?

I should clarify that my main concern is with words in NIA being classified as inherited from "Ashokan Prakrit" or "Prakrit" when in fact they are not inherited from attested MIA vocabulary but are instead from some reconstruction based on several NIA words. This is one way I thought of resolving that issue. Dragonoid76 (talk) 10:32, 29 May 2025 (UTC)

@Dragonoid76: there is overlap between Early MIA and OIA phonotactics … converting Ashokan Prakrit *𑀢𑁆𑀭𑀺𑀟𑁆𑀟 (*triḍḍa) to Sanskrit *त्रिड्ड (triḍḍa) is fine because words like Sanskrit हड्ड (haḍḍa) exist (with Dardic descendants)

Even if *triḍḍa has the initial cluster tr and a geminated ḍ, Ashokan Prakrit *𑀢𑁆𑀭𑀺𑀟𑁆𑀟 as an Early MIA reconstruction is still preferable to Sanskrit *त्रिड्ड.
The initial cluster tr is permissible in Early MIA.
Furthermore, since the ḍḍ in Sanskrit हड्ड (haḍḍa) is adopted from Prakrit, it would seem odd to consider *triḍḍa as Sanskrit.

Sanskrit *चतुष्काष्ठ (catuṣkāṣṭha) is still valid. Are you for or against adding the label Reconstructed Sanskrit to these terms

RC:Sanskrit/चतुष्काष्ठ and RC:Sanskrit/अर्धपूरक seem to be examples of favouring Reconstructed Sanskrit due to the higher familiarity of Sanskrit phonotactics.
Based on the descendants at Wiktionary and those mentioned at CDIAL, there seems to be no justification as to why these two reconstructions necessarily have to be Sanskrit.
Therefore, considering these two reconstructions as Sanskrit rather than MIA may overstate their age.

how many "reconstructed" terms actually have Dardic descendants

See RC:Ashokan Prakrit/𑀕𑀸𑀟𑁆𑀟 for another example.
The Kashmiri descendant ग्वफ् could possibly be unified with the descendants at RC:Ashokan Prakrit/𑀕𑀼𑀧𑁆𑀨𑀸

Gemination was not reflected in spelling in the edicts of Ashoka

This observation on its own is not a reason to eliminate Reconstructed Ashokan Prakrit for Early MIA reconstructions.
When Reconstructed Ashokan Prakrit is needed for Dardic descendants, they can be spelled using gemination.

Kutchkutch (talk) 12:25, 29 May 2025 (UTC)

@Kutchkutch Ok let's leave this point be then. Seems like Ashokan Prakrit *𑀕𑀸𑀟𑁆𑀟 (*gāḍḍa), *𑀕𑀼𑀧𑁆𑀨𑀸 (*gupphā), *𑀢𑁆𑀭𑀺𑀟𑁆𑀟 (*triḍḍa), and *𑀨𑀺𑀭𑀢𑀺 (*phirati) and a few others constitute a handful of words with Dardic descendants. I think reconstructing Sanskrit is just easier to look at and understand than reconstructing "Ashokan Prakrit", but I can understand why they are constructed at the Early MIA stage rather than Sanskrit and in any case this doesn't affect a huge number of words.

Can you please let me know what you think of the first point on the creation of "Reconstructed Prakrit" as a variety of Prakrit and using this label in the etymology of NIA terms? Dragonoid76 (talk) 19:37, 29 May 2025 (UTC)

@Dragonoid76 let me know what you think of the first point on the creation of "Reconstructed Prakrit" as a variety of Prakrit

If there are no Dardic descendants (or another compelling reason) for Reconstructed Ashokan Prakrit, then Reconstructed Prakrit would be used.
Reconstructed Prakrit is certainly different from attested Prakrit.
However, having a separate etymological code and label for the reconstructed variety is not done for other languages that are both attested and reconstructed. See
- Category:Reconstructed nouns by language
- Category:Reconstructed verbs by language

parallels what is done for Romance languages

The labels for Reconstructed Latin are for subfamilies similar to the subfamily labels used for the entirely reconstructed Proto-Germanic as Category:Regional Proto-Germanic.
By comparison, the Magadhi lect label is used for Eastern Indo-Aryan at RC:Prakrit/𑀧𑀜𑁆𑀘𑀻𑀮 and RC:Prakrit/𑀰𑀸𑀮𑀺𑀓𑁆𑀓.

Vulgar Latin is classed as a variety of Latin

For Indo-Aryan, the MIA stage rather than Sanskrit is the more immediate vulgar form.
Unlike Vulgar Latin, MIA is attested as a classical language.

using this label in the etymology of NIA terms

Not using it in the etymology of NIA terms was proposed by User:Svartava because not all Prakrit terms are associated with a lect.
Therefore, having some NIA etymologies with a lect specified and others without a lect specified causes inconsistencies even for the descendants of the Maharastri lect.
Therefore, using the Reconstruction namespace and the asterisk may be the only appropriate way to distinguish Reconstructed Prakrit from attested Prakrit.

Kutchkutch (talk) 02:00, 30 May 2025 (UTC)

@Dragonoid76: I am on board with everything you wrote until "Reconstructed Sanskrit". I frequently find New Indo-Aryan terms that come from Sanskrit compounds that are not otherwise attested. For them, I just give the Sanskrit term with a hyphen in between and then put an extra vertical bar in there so it does not create a red link. For example, at कठौता (kaṭhautā), I have written {{inh|hi|sa||काष्ठ-पात्र}}. I'm certainly not okay with this and several such other pages now having "Reconstructed Sanskrit" in the etymology. Here is the reason: in Sanskrit, compounds were formed very, very freely. One frequently finds compounded words in Sanskrit literature that do not find a mention in the dictionaries. The famous Bhagavad Geeta verse, Chapter 4 verse 8 for instance, has the word धर्मसंस्थापनार्थ (dharmasaṃsthāpanārtha, “purpose of establishing Righteous Law”). This and several other examples attested throughout Sanskrit literature are so specific to their contexts that dictionaries simply cannot list all of them. So terms like पक्ववट (pakvavaṭa), काष्ठपात्र (kāṣṭhapātra) etc are fully valid Sanskrit, and not hypothesized terms speculated to have existed by linguists. -- 𝘗𝘶𝘭𝘪 𝘮𝘢𝘪𝘺𝘪^{(𝘵𝘢𝘭𝘬)} 01:30, 30 May 2025 (UTC)

terms like पक्ववट (pakvavaṭa), काष्ठपात्र (kāṣṭhapātra) etc are … not hypothesized terms speculated to have existed by linguists.

It may be true that compounds were formed very, very freely, dictionaries simply cannot list all of them and unattested compounds may be synchronically fully valid.
However, from a diachronic standpoint, if such compounds are not attested in the past, then their ability to be the ancestors of other words at an earlier time is nonetheless speculative.
Using a hyphen instead of the conventional asterisk prevents a term from being immediately identified as a reconstruction.

Kutchkutch (talk) 02:32, 30 May 2025 (UTC)

@Kutchkutch @Pulimaiyi Ok I'm fine with conceding the "reconstructed" Sanskrit idea after reading both of your valid points. To Pulimaiyi's comment on कठौता (kaṭhautā), I think what is currently on the page or the similar "Derived from Sanskrit काष्ठ (kāṣṭha) + पात्र (pātra) ..." in the etymology section would be uncontroversial and helpful to the reader. I agree that such compounds were formed "very, very freely", but I think if the exact compound in question is unattested then it's controversial to use "inherited". I think the usage of "derived" here is uncontroversial.

Can we set up "Reconstructed Prakrit" within Category:Varieties of Prakrit (the same way that Vulgar Latin is in Category:Varieties of Latin)? It is not strictly speaking a "variety" of Prakrit, but merely exists to house the fairly large number of MIA reconstructions from NIA vocabulary in order to separate these reconstructions a little from Prakrit (but ultimately all the reconstructions will still just fall under the "Prakrit" language). If I understand correctly, Pulimaiyi is okay with this. Dragonoid76 (talk) 04:44, 30 May 2025 (UTC)

@Dragonoid76 I think what is currently on the page or the similar "Derived from Sanskrit काष्ठ (kāṣṭha) + पात्र (pātra) ..." in the etymology section would be uncontroversial and helpful to the reader.

For the etymology of कठौता, *𑀓𑀝𑁆𑀞𑀯𑀢𑁆𑀢 should remain as the reconstructed direct ancestor.
The constituents 𑀓𑀝𑁆𑀞 (kaṭṭha) + 𑀧𑀢𑁆𑀢 (patta) can be shown on the entry for *𑀓𑀝𑁆𑀞𑀯𑀢𑁆𑀢.
For a Sanskrit perspective, a note such as "compare Sanskrit काष्ठ and पात्र" could be shown as a comment in the etymology sections of *𑀓𑀝𑁆𑀞𑀯𑀢𑁆𑀢 and कठौता.

Can we set up "Reconstructed Prakrit" within Category:Varieties of Prakrit

There already exists Category:Prakrit reconstructed terms for that purpose.

Kutchkutch (talk) 05:35, 30 May 2025 (UTC)

@Kutchkutch The point would be for Reconstructed Prakrit to be an etymology-only language added to this module so that it can be used in NIA etymology sections. Dragonoid76 (talk) 06:35, 30 May 2025 (UTC)

@Dragonoid76

Since there would be no difference between an etymology-only Reconstructed Prakrit language and Prakrit terms derived through reconstruction, an etymology-only Reconstructed Prakrit language would be redundant.
Reconstructed terms are already marked using the asterisk and supposed to be placed in reconstruction categories, and reconstructed entries are in the Reconstructed namespace.
Reconstruction merely refers to the suspicion that a term existed without being attested. The process usually ignores dialectal variation and assumes that sound changes are regular.
Thus, an etymology-only Reconstructed Prakrit language would not be a variety in the same sense as a Prakrit lect (Maharastri, Sauraseni, Magadhi, Ardhamagadhi, etc.) or Epigraphic Prakrit.

Kutchkutch (talk) 09:06, 30 May 2025 (UTC)

@Kutchkutch Is there some way to create a category in Category:Hindi terms inherited from Reconstructed Prakrit which is separate from Category:Hindi terms inherited from Prakrit without introducing a new lect? Dragonoid76 (talk) 16:40, 30 May 2025 (UTC)

@Kutchkutch Also then, why is Category:Vulgar Latin and especially Category:Proto-Romance allowed but "Reconstructed Prakrit", "Proto-Prakrit", or "Proto-New Indo-Aryan" not allowed? We'd be using it in more or less the same way and it would help with organization. Dragonoid76 (talk) 17:27, 30 May 2025 (UTC)

Actually, I have another idea that might be more palatable. On a page like कचहरी (kacahrī), we can write in the etymology section "Derived from Middle Indo-Aryan *𑀓𑀘𑁆𑀘𑀳𑀭𑀺𑀆 (*kaccahariā) ..." with {{der+|hi|inc-mid}} {{m|pra|*𑀓𑀘𑁆𑀘𑀳𑀭𑀺𑀆}}. I like the usage of the etymology-only language "Middle Indo-Aryan" here as a broad label for reconstructed ancestors of NIA terms and with {{m|pra|...}} or {{m|inc-ash|...}} can select for the appropriate level of MIA reconstruction. This would sufficiently separate the NIA terms derived from attested Prakrit from NIA terms which do not have an attested Prakrit ancestor. The latter terms which have no attested Prakrit ancestor will go to Category:Hindi terms derived from Middle Indo-Aryan languages which is more reasonable than Category:Hindi terms inherited from Prakrit. Dragonoid76 (talk) 22:27, 30 May 2025 (UTC)

@Dragonoid76 I like the usage of the etymology-only language "Middle Indo-Aryan" here as a broad label for reconstructed ancestors of NIA terms and with ... or ... can select for the appropriate level of MIA reconstruction

An etymology-only language is a subset of a single Wiktionary language rather than encompassing two Wiktionary languages such as Ashokan Prakrit and Prakrit.
Moreover, Middle Indo-Aryan is a chronological stage of Indo-Aryan rather than a single (Wiktionary) language.
Reconstructed entries currently exist in two MIA languages (Ashokan Prakrit and Prakrit), which represent the Early MIA and (Middle) MIA substages.
If Late MIA reconstructions are made in the Apabhramsa language, then the number of reconstructed MIA substages could possibly increase to three.

terms which have no attested Prakrit ancestor will go to Category:Hindi terms derived from Middle Indo-Aryan languages which is more reasonable than Category:Hindi terms inherited from Prakrit

Doing so would remove the distinction between the two (or potentially three) MIA substages when categorising descendants from the reconstructed MIA substages.
Making such a distinction between the MIA substages when categorising descendants is necessary.

Kutchkutch (talk) 03:01, 31 May 2025 (UTC)

@Kutchkutch @Pulimaiyi Ok this argument is dragging out too long, so beyond this reply I don't care enough to keep defending this idea.

I'm basically requesting that NIA terms which have no attested Prakrit ancestor (which is a lot of terms) be categorized somewhere that is separated from C:Hindi terms inherited from Prakrit or C:Hindi terms inherited from Ashokan Prakrit. I think such categorization would be helpful and nice given the breadth of NIA terms with no secure MIA (or OIA) etymology. If this is possible, great, if not then continue with status quo.

Making such a distinction between the MIA substages when categorising descendants is necessary.

Fine. This affects maybe 15 words at most that, for whatever reason, must be reconstructed at the Early MIA level.

If you're going to be pedantic about the distinction between Early MIA and Middle MIA reconstructions, I would think that you also have to be pedantic about Wiktionary's usage of "Ashokan Prakrit" as catch-all for Early MIA reconstructions. This is a bizarre Wiktionary-invented idea that is not used anywhere else in the literature (as Aryaman said in the original 2024 discussion). Seeing "Inherited from Ashokan Prakrit *𑀘𑀽𑀳 (*cūha)" on the page चूहा (cūhā) is such an eyesore to me.

Here's my final idea and then I'm done with this. Why don't we split the family MIA languages into Early MIA (further divided into Pali, Ashokan, Niya, and Gandhari), Middle MIA (the parent of Prakrit and all its sub-lects), and Late MIA (the parent of Apabhramsa and maybe Kamarupi). There is precedent for making these finer-grained categorizations (see this)

Then on a page like कचहरी (kacahrī) we write "Derived from Middle MIA *कच्चहरिआ (kaccahariā)" which redirects the user to the Prakrit page *𑀓𑀘𑁆𑀘𑀳𑀭𑀺𑀆 (*kaccahariā). On some other term that must go back to Early MIA-level, we would write "Derived from Early MIA ..." which redirects the user to a similar Ashokan Prakrit reconstruction page. Such NIA terms will automatically be categorized into C:*NIA language* terms derived from Middle Middle Indo-Aryan and C:*NIA language* terms derived from Early Middle Indo-Aryan, which will be filled with the many NIA terms with no attested MIA etymon. Such words will no longer clog up C:*NIA language* terms inherited from Prakrit or C:*NIA language* terms inherited from Ashokan Prakrit. AND, we are still correctly making a distinction between the MIA substages as you request. This is so much cleaner of a categorization and etymology-section, at least in my view. Dragonoid76 (talk) 00:23, 1 June 2025 (UTC)
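The categorisation proposed above can be sketched in a few lines of Python. This is only an illustration of the proposal, not existing module code: the function name `category_for` and the two lookup tables are hypothetical, and the language codes `pra` and `inc-ash` are the ones used in the template examples earlier in this thread.

```python
# Hypothetical sketch of the proposed categorisation; not taken from any Wiktionary module.
# Stage names follow the proposal: Prakrit = Middle MIA, Ashokan Prakrit = Early MIA.
STAGE_BY_CODE = {
    "pra": "Middle Middle Indo-Aryan",
    "inc-ash": "Early Middle Indo-Aryan",
}
LANGUAGE_BY_CODE = {
    "pra": "Prakrit",
    "inc-ash": "Ashokan Prakrit",
}

def category_for(nia_language: str, ancestor_code: str, reconstructed: bool) -> str:
    """Return the category an NIA term would land in under the proposal."""
    if reconstructed:
        # No attested MIA etymon: categorise by MIA substage instead of by language.
        return f"{nia_language} terms derived from {STAGE_BY_CODE[ancestor_code]}"
    # Attested etymon: keep the existing per-language category.
    return f"{nia_language} terms inherited from {LANGUAGE_BY_CODE[ancestor_code]}"

print(category_for("Hindi", "pra", reconstructed=True))
print(category_for("Hindi", "inc-ash", reconstructed=True))
print(category_for("Hindi", "pra", reconstructed=False))
```

Under these assumptions, reconstructed ancestors no longer feed the "inherited from Prakrit" categories, while the Early/Middle distinction is still visible in the category name.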

@Dragonoid76 Why don't we split the family MIA languages into Early MIA (further divided into Pali, Ashokan, Niya, and Gandhari), Middle MIA (the parent of Prakrit and all its sub-lects), and Late MIA (the parent of Apabhramsa and maybe Kamarupi)

A Middle MIA family is not possible because the sub-lects of Prakrit are merged as a single Wiktionary language.
The status of Kamarupi (whether it should be merged with Apabhramsa or remain separate) is unclear, because there has only been one major Kamarupi editor.
Thus, a Late MIA family is untenable for the time being.

on a page like कचहरी (kacahrī) we write "Derived from Middle MIA *kaccahariā" which redirects the user to the Prakrit page *𑀓𑀘𑁆𑀘𑀳𑀭𑀺𑀆 (*kaccahariā). On some other term that must go back to Early MIA-level, we would write "Derived from Early MIA ..." which redirects the user to a similar Ashokan Prakrit reconstruction page.

Regarding “a NIA entry would only link to the Middle MIA reconstructed entry or the Early MIA reconstructed entry but not both”,
- As per the discussions at 1 and 2, textual etymologies are only to be truncated for the entirely reconstructed languages preceding Sanskrit.
- Accordingly, terms later than Sanskrit are not to be truncated in etymology sections.
- So, if there are reconstructed entries for both the Early MIA and Middle MIA stages of the same term, then both reconstructed terms would be mentioned in the textual etymology of a NIA entry.
Regarding “having reconstructed entries for both the Early MIA and Middle MIA stages of the same term”,
- Having separate entries for both the Early MIA and Middle MIA stages of the same term would involve maintaining page links, categorisation, descendants, etc. for both entries instead of just one entry.

Kutchkutch (talk) 03:56, 1 June 2025 (UTC)

The thing is, *kaṭṭhavatta cannot be synchronically formed by kaṭṭha + patta unless we assume that the Prakrit speakers knew that an intervocalic /p/ had to be realized as /v/. Going back to a singular form kāṣṭhapātra is required. -- 𝘗𝘶𝘭𝘪 𝘮𝘢𝘪𝘺𝘪^{(𝘵𝘢𝘭𝘬)} 07:09, 30 May 2025 (UTC)

Going back to a singular form kāṣṭhapātra is required

If Prakrit reconstructions cannot have intervocalic /p/ realized as /v/, then such a rule should be written explicitly in the Reconstructed notes sections of any corresponding reconstructed entries to explain the reasoning behind choosing that particular stage.
It remains to be decided whether the stage of the reconstructed etymon of कठौता with intervocalic /p/ is Early MIA (Ashokan Prakrit) or OIA (Sanskrit).

*kaṭṭhavatta cannot be synchronically formed by kaṭṭha + patta unless we assume that the Prakrit speakers knew that an intervocalic /p/ had to be realized as /v/

If Pali can retain the intervocalic /p/ in cases such as sapattī, then an intervocalic /p/ seems reasonable for an Early MIA reconstruction.
In any case (Early MIA or OIA), there is a reconstruction involved rather than a Sanskrit compound that uses a hyphen instead of the conventional asterisk.

Kutchkutch (talk) 08:57, 30 May 2025 (UTC)

Alas, it's more complicated than that. I'm strongly tempted to the view of Pali as demagadhised Magadhi. One of these changes is that the reflexes of older intervocalic /p/ usually surface in written Pali as /p/, but sometimes survive as /v/. The usual view is that these exceptions are words that escaped normalisation, rather than invasions from faster evolving speech. --RichardW57 (talk) 17:11, 3 June 2025 (UTC)

Categorising Ukrainian toponyms by oblast

I'd like to add categorisation for each of the Ukrainian oblasti to Module:place/locations, in a similar manner to the way in which categorisation already exists for each of the Russian federal subjects. Is there any objection to my doing this? 0DF (talk) 22:17, 29 May 2025 (UTC)

@0DF No objections from me, but IMO such categorization only makes sense if there are enough toponyms to fill up many of the categories for at least some language (e.g. English and/or Ukrainian), or if you're planning on adding such toponyms. Also you might want to make your changes in a sandbox module so I can take a look at them before you push them to production, to make sure they're correct. Benwing2 (talk) 21:18, 30 May 2025 (UTC)

@Benwing2: Adding Ukrainian toponyms seems to be one of the main things I do, so yes, I am planning on adding such toponyms. Is there a particular sandbox module in which you'd like me to add the code for your inspection? 0DF (talk) 21:54, 30 May 2025 (UTC)

@0DF You can edit Module:User:Benwing2/place/locations (which I just updated with the latest production version) or create your own. Benwing2 (talk) 21:59, 30 May 2025 (UTC)

@0DF Also note that Russian federal subjects are particularly complex because of the multitude of different types, so they're not the best country to model the Ukrainian oblasts on; you might use Romania or some other country with a single type of division, or Moldova, which has a couple of "autonomous territorial units" which is more or less parallel to the Ukrainian situation. Benwing2 (talk) 22:03, 30 May 2025 (UTC)

I went ahead and added the oblasts. There are some questions though:

The code uses "municipalities" rather than "hromadas". Generally we try to avoid using special-purpose borrowings like "hromada" in favor of more generic equivalents, but I don't know if hromadas are close enough to municipalities to make this equivalence. Either way, the definition of the toponyms using {{place}} has to agree with whatever term the module uses in order for them to get categorized correctly into e.g. Category:en:Municipalities in Kharkiv Oblast, Ukraine. (Actually it's possible to accept both "hromada" and "municipality" and have them both categorize the same way. I'm not sure how you've been entering the toponyms so far.)
Wikipedia says that in addition to the 24 oblasts, there is one autonomous republic (Crimea) and two cities with special status (Kyiv and Sevastopol). I didn't enter them yet pending how to handle them. (Unless the cities have surrounding municipalities that are different both from the cities themselves and the larger containing unit, i.e. oblast or autonomous republic, nothing really needs to be done. As for Crimea, one thing that could be done is to use a combined category like 'oblasts and autonomous republics'; this is done already for several countries, e.g. 'provinces and territories' in Canada, 'states and territories' in Australia, 'states and union territories' in India, etc.)

Benwing2 (talk) 06:24, 31 May 2025 (UTC)

@Benwing2: You beat me to it! Sorry, I was in bed. Thank you for doing that. In answer to your questions:

I've been using the terms hromada, raion, and oblast, per the English Wikipedia. I infer from the titles of the other language-editions of Wikipedia that the Ukrainian територіа́льна грома́да (terytoriálʹna hromáda) is rendered by them thus:
Arabic مجتمع إقليمي (mujtamaʕ ʔiqlīmiyy), Belarusian грамада́ (hramadá), Catalan hromada, Chinese 市鎮 / 市镇 (shìzhèn), Danish hromada, Dutch gemeente, English hromada, Estonian vald, French communauté territoriale, German Hromada, Indonesian hromada, Italian comunità territoriale, Japanese フロマーダ (furomāda), Latvian teritoriālā kopiena, Occitan hromada, Polish hromada, Russian общи́на (obščína), Spanish municipio, Swedish hromada, Turkish belediye, Vietnamese hromada
July–August last year, I gathered a number of quotations illustrating the term in an attempt to work out its range of meanings, but I ran out of steam and haven't returned to continue to work at that particular wordface since, so my analysis of the term is incomplete. Nevertheless, if you look at the entry for Ukrainian грома́да (hromáda), you'll see the word means “community”, but also an assembly of that community, as well as the адміністрати́вно-територіа́льна одини́ця (administratývno-terytoriálʹna odynýcja, “administrative–territorial unit”) under its jurisdiction. Its polysemy resembles that of сільра́да (silʹráda), which, being a calque of the Russian сѐльсове́т (sèlʹsovét), means “village council”, the building in which that council meets, and the administrative–territorial unit under its jurisdiction. Before the introduction of hromady, by far the most common of the bottom-level administrative divisions of the Ukrainian Republic and predecessor SSR was the silrada. AFAICT, silrady ceased to exist as administrative–territorial units in 2020, when they were superseded by hromady (except on the Crimea — whether that's because of the Russian occupation, its political autonomy, or some other reason, I don't know). However, the councils themselves that administer the hromady are still called ра́ди (rády), so we shouldn't expect грома́да (hromáda) to develop the same range of meanings as сільра́да (silʹráda). I don't think municipality is a good translation. Territorial community (per French, Italian, and Latvian) is OKish, but misses the nuance of the term. I would prefer we use hromada, just like we use raion instead of district and we use oblast instead of region.
I'll address the cities and autonomous republic separately:
1. Kyiv and Sevastopol are the Ukraine's two cities with special status, a.k.a. cities of republican significance, cities of national significance, cities under republican jurisdiction, cities of republican subordination, etc. This means they are part of the Ukraine, but they are not part of any oblast, raion, or hromada. Cities with special status (in a non-technical sense) used to be pretty ubiquitous in the Ukrainian Republic and SSR, as they were in the USSR and still are in Russia. In the Ukrainian polities, there used to be other cities of oblast significance and there were cities of raion significance, too. These cities' administrative–territorial units were called міськра́ди (misʹkrády). If you take a look at w:Subdivisions of Kyiv and w:Administrative and municipal divisions of Sevastopol, you'll see that both cities are subdivided into міські́ райо́ни (misʹkí rajóny, “urban districts”); other Ukrainian cities are similarly subdivided into urban districts. Urban districts are further subdivided into things like neighbourhoods and мікрорайо́ни (mikrorajóny, “microraiony”), but AFAIK these don't have local autonomy. You may notice that these urban districts are called raiony in Ukrainian; however, I think it makes sense for us to distinguish the relatively territorially extensive raiony simpliciter (which comprise hromady) from the urban raiony/districts (which contain neighbourhoods, microraiony, etc., but not hromady). Finally, for clarification, note that the City of Kyiv is not part of Kyiv Oblast.
2. The Autonomous Republic of the Crimea is subdivided into raiony, just like an oblast. However, those raiony are not subdivided into hromady. The Crimea still uses the old subdivisions, namely the сільра́да (silʹráda), сельра́да (selʹráda), and міськра́да (misʹkráda), which elsewhere in the Ukraine have been superseded by the сільська́ грома́да (silʹsʹká hromáda), се́лищна грома́да (sélyščna hromáda), and міська́ грома́да (misʹká hromáda), respectively.

If you want a catch-all category name for these, you could have oblasti, autonomous republic, and cities with special status or, if that's too wordy, first-level subdivisions or top-level subdivisions.

I hope I correctly understood and adequately answered your questions. 0DF (talk) 16:19, 31 May 2025 (UTC)

Cities do not need to be handled in the same fashion as other divisions even if they're top-level; lots of countries have top-level cities and we do not group them in any way but put them directly under the country. So there's no need to mention "cities with special status" in the grouping category. Also using foreign plurals like oblasti is (IMO) extremely pedantic and simply unnecessary; oblasts is totally fine. You have convinced me though that we should use hromadas in place of municipalities, and it parallels the use of e.g. comunes in Italy, communes in France and several other countries, etc. Benwing2 (talk) 01:27, 1 June 2025 (UTC)

@Benwing2: I have no objection to treating Kyiv and Sevastopol thus. Re preserved plurals, these Google ngrams are interesting:

oblasti,oblasts — negligible use of either until ~1950, oblasti more common than oblasts until 1986, oblasts more common thereafter, oblasts around two or three times as common as oblasti in recent usage
raiony,raions — negligible use of either until ~1950, raions more common than raiony at all times except 1974–1975, usually considerably so until ~2010
hromady,hromadas — negligible use of either until ~1950, hromady more common than hromadas 1952–1972 and 1974–1992, frequency identical 1992–2008, hromady more common than hromadas 2008–2016, hromadas more common than hromady 2016–2022

Unfortunately, the hromada qua current administrative–territorial unit didn't exist until 2015 and Google Ngrams' English corpus only extends to 2022, so this sample may be too small for statistical validity. 0DF (talk) 07:19, 1 June 2025 (UTC)

@0DF You asked about forbidding or tracking the use of hromada by itself without a preceding 'urban', 'rural' or 'settlement'. I *think* if you set link = false in the entry for hromada in Module:place/placetypes, it will cause such an error to be thrown, without affecting the more specific variants that have 'hromada' as a fallback. You can check who uses the entry placetype 'hromada' using Special:WhatLinksHere/Wiktionary:Tracking/place/entry-placetype/hromada; unfortunately this also tracks uses of 'urban hromada' and 'rural hromada' (but not 'settlement hromada') because 'urban' and 'rural' are recognized qualifiers. But there are only 29 entries in the link I just gave, so you could check each one to see which ones use bare 'hromada'; and then once you've cleaned them up, set link = false as described above. Benwing2 (talk) 21:58, 1 June 2025 (UTC)
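The tracking behaviour described above can be modelled roughly as follows. This is a Python sketch and not the actual Lua in Module:place: it assumes that recognized qualifiers are stripped from the front of an entry placetype before the tracking key is computed. The qualifier set and the function name `tracking_key` are hypothetical, chosen only to reproduce the cases mentioned in the comment.

```python
# Hypothetical model; the real logic lives in Lua in Module:place and may differ.
# 'urban' and 'rural' are recognized qualifiers per the comment above; 'settlement' is not.
RECOGNIZED_QUALIFIERS = {"urban", "rural"}

def tracking_key(entry_placetype: str) -> str:
    """Strip leading recognized qualifiers and return the placetype that gets tracked."""
    words = entry_placetype.split()
    while len(words) > 1 and words[0] in RECOGNIZED_QUALIFIERS:
        words = words[1:]
    return " ".join(words)

for placetype in ("hromada", "urban hromada", "rural hromada", "settlement hromada"):
    print(placetype, "->", tracking_key(placetype))
```

In this model 'urban hromada' and 'rural hromada' collapse onto the same key as bare 'hromada', which is why the what-links-here list cannot separate them, whereas 'settlement hromada' keeps its own key.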

@Benwing2: Thanks. I've just done as you suggested. Fortunately, that what-links-here does not track uses of the aliases rhrom, shrom, and uhrom, so that's a feasible way of keeping an eye on this. I've also changed all the uses of municipality for Ukrainian toponyms to uhrom (they all referred to Lviv), so could you now remove municipality as an alias of hromada, please?

Your suggestion of specifying link = false for hromada simpliciter worked perfectly when it was |2=hromada in {{place}}. Unfortunately, it didn't do anything when it was |3=hromada/... in {{place}}. Is there a way of "prohibiting" the use of hromada/ simpliciter, too? 0DF (talk) 00:37, 2 June 2025 (UTC)

Yeah the link = false only works for entry placetypes, not for holonym placetypes. There's no current way of prohibiting the use of certain placetypes for holonyms but I can add it. Benwing2 (talk) 00:42, 2 June 2025 (UTC)

@Benwing2: It would be wonderful if you could do that, yes, please. Special:WhatLinksHere/Wiktionary:Tracking/place/entry-placetype/hromada already has fifteen new entries accrued in the last eleven hours, all of them referring to Zaporizhzhia urban hromada using the holonym placetype urban hromada/. Removing those entries from the list would only take changing those instances to uhrom/, but there's nothing exactly wrong with them as they are, so it is to be hoped that wouldn't be necessary. 0DF (talk) 11:22, 2 June 2025 (UTC)

@0DF What are you requesting exactly? Special:WhatLinksHere/Wiktionary:Tracking/place/entry-placetype/hromada is for entry placetypes, not for holonym placetypes. (BTW they got added because I fixed Zaporizhzhia and added 'urban hromada' in the definition, and used {{tcl}} to define all the translations of Zaporizhzhia to include the four main definitions as (1) city/urban hromada, (2) raion, (3) oblast, (4) historical geographic region. There are also a zillion small villages called Zaporizhzhia, which I added, but definitions for them exist only in English and Ukrainian. Note also that when a city and municipality have substantially the same boundaries, we tend to define them together using a single definition.)

Potentially we could

(1) Have special entry placetype tracking for just the full placetype, so that use of 'urban hromada' won't get entered into the 'hromada' tracking.

(2) Have tracking for holonym placetypes, i.e. use of 'hromada/Foo' in a holonym.

(3) Forbid use of bare 'hromada' in holonyms.

Let me know what you were looking for. Benwing2 (talk) 20:07, 2 June 2025 (UTC)

@Benwing2: I'm sorry; I expressed myself poorly. I advocate plan-of-action 3, forbidding the use of hromada simpliciter in holonyms. Hromady have apparently been a thing for centuries, going back at least as far as the fourteenth century. However, it appears that they had always been the organs of self-government of individual (rural) settlements up until the introduction of amalgamated hromady in 2015. If that is indeed the case, then hromada can legitimately serve as a holonym placetype only under the holonym c/Ukraine. (It is unclear whether the West Ukrainian People's Republic and/or the Ukrainian People's Republic also had hromady of the modern sort, but if either did, I've never seen any particular hromada of theirs referred to.) Since it should always be specified for any given Ukrainian hromada what kind (rural, settlement, or urban) of hromada it is, that means that hromada simpliciter cannot legitimately serve as a holonym placetype and that only rural hromada, settlement hromada, and urban hromada can legitimately serve as holonym placetypes. Does that make sense?

I am not confident that the city of Zaporizhzhia and Zaporizhzhia urban hromada do have substantially the same boundaries. It may not be common for a single settlement to constitute a hromada, but it was pretty common for a single settlement to constitute a sil-, sel-, or miskrada. I've come across many that started off comprising multiple settlements, but over time all but one of the constituent settlements became depopulated and disappeared. Examples include: Vyshnivka silrada, established in 1992 when it comprised Vyshnivka, Mala Vyshneva, and Zhuvakivka, the latter two of which ceased to exist at some point; Bohdanivske silrada, established in 1943 when it comprised Bohdanivske and Hrianykivka, the latter subsumed into the former in 1998, which did nothing to change the fact that the (now-former) village is surrounded by forest; and Vysoke Czech national selsoviet, established in 1924 when it comprised the then-named Vysoke-Cheske, Churanda-Vynohrady, and Horbashi, the latter of which disappeared ante 1941 and the middle of which disappeared ante 1946. It is surely not the case that the boundaries of the eponymous settlements suddenly expanded to match those of their silrady at the point that they became the sole survivors thereof. Rather, I would say that the boundaries of a settlement end where its buildings end and that the land outside that is part of its hromada, but not part of the settlement itself. All that being said, it is also pragmatic to split sole settlement from hromada by sense because the two will have different translations. 0DF (talk) 23:54, 2 June 2025 (UTC)

I guess I also expressed myself poorly, what I meant was that if a given urban hromada (or generally, municipality, com(m)une, etc.) contains only one settlement, its capital, then it's generally easier not to split out the hromada/etc. If the settlement itself and municipality have different translations, that would be one reason to split them out, but it seems that will rarely happen. One clue as to whether to split out the municipality is how Wikipedia handles it; in the case of Zaporizhzhia, the same article serves for both the city and urban hromada, and it's not clear they even have a separate government. But as for your request, I can implement #3 "forbid use of bare 'hromada' in holonyms". Benwing2 (talk) 01:04, 3 June 2025 (UTC)
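Option #3 ("forbid use of bare 'hromada' in holonyms") could behave along these lines. This is a minimal Python sketch under stated assumptions, not the implementation in Module:place: the function name `check_holonym` and the set of forbidden placetypes are hypothetical, and holonyms are assumed to have the `placetype/Name` shape used in {{place}}.

```python
# Hypothetical sketch; not the code in Module:place.
# Only the bare placetype is forbidden; qualified forms and aliases pass through.
FORBIDDEN_BARE_HOLONYM_PLACETYPES = {"hromada"}

def check_holonym(holonym: str) -> str:
    """Validate a 'placetype/Name' holonym and return it unchanged if allowed."""
    placetype, sep, name = holonym.partition("/")
    if not sep or not name:
        raise ValueError(f"expected 'placetype/Name', got {holonym!r}")
    if placetype.strip() in FORBIDDEN_BARE_HOLONYM_PLACETYPES:
        raise ValueError(
            f"bare placetype {placetype!r} is not allowed in holonyms; "
            "use 'rural hromada', 'settlement hromada' or 'urban hromada'"
        )
    return holonym

print(check_holonym("urban hromada/Zaporizhzhia"))  # accepted
print(check_holonym("uhrom/Zaporizhzhia"))          # alias, accepted
try:
    check_holonym("hromada/Zaporizhzhia")
except ValueError as err:
    print("rejected:", err)
```

Raising an error rather than silently tracking the usage matches what link = false is said to do for entry placetypes, but whether holonyms should error or merely be tracked is a design choice left open here.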


@Benwing2: Actually, an administrative division and its eponymous settlement will almost always have different translations in the Slavic languages I've looked at (Belarusian, Czech, Polish, Russian, and Ukrainian). The English Chasiv Yar corresponds to the Belarusian Ча́саў Яр / Ча́сіў Яр (Čásaŭ Jar / Čásiŭ Jar), Czech Časiv Jar, Polish Czasiw Jar, Russian Ча́сов Яр (Čásov Jar), and Ukrainian Ча́сів Яр (Čásiv Jar); the English Chasiv Yar urban hromada corresponds to the Belarusian Часавая́рская гарадска́я/ме́ская грамада́ (Časavajárskaja haradskája/mjéskaja hramadá), Czech Časovojarské městské společenství, Polish Czasowojarska miejska hromada, Russian Часовоя́рская городска́я общи́на (Časovojárskaja gorodskája obščína), and Ukrainian Часовоя́рська міська́ грома́да (Časovojársʹka misʹká hromáda). The grammatical difference here is that, whereas English constructs the names of these administrative divisions as [settlement name] + [type of division], those Slavic languages construct the names as [relational adjective] + [type of division]. Most of those relational adjectives are pretty straightforwardly formed by suffixation with Belarusian -скі (-ski), Czech -ský, Polish -ski, Russian -ский (-skij), or Ukrainian -ський (-sʹkyj), but irregularities are frequent and the relational adjectives of phrasal nouns are often hard to guess (the Belarusian часавая́рскі (časavajárski) above might well be wrong; Wikipedia was the only place I could find the relational adjective), which makes giving them in translations valuable.

The English Wikipedia may redirect w:en:Zaporizhzhia urban hromada to w:en:Zaporizhzhia (the city), but the Ukrainian Wikipedia keeps them separate (hromada, city). Likewise for:

Bilhorod-Dnistrovskyi (hromada: Ukrainian, Chinese, Crimean Tatar; city: Ukrainian, Chinese, Crimean Tatar);
Chernihiv (hromada, city);
Dubno (hromada, city);
Enerhodar (hromada: Ukrainian, Chinese; city: Ukrainian, Chinese);
Izmail (hromada: Ukrainian, Crimean Tatar; city: Ukrainian, Crimean Tatar);
Kharkiv (hromada: Ukrainian, Chinese, Hungarian, Russian; city: Ukrainian, Chinese, Hungarian, Russian);
Melitopol (hromada: Ukrainian, Chinese; city: Ukrainian, Chinese);
Mykolaiv (hromada, city);
Nikopol (hromada, city);
Novodnistrovsk (hromada, city);
Odesa (hromada: Ukrainian, Crimean Tatar; city: Ukrainian, Crimean Tatar);
Pavlohrad (hromada: Ukrainian, Chinese; city: Ukrainian, Chinese);
Pryluky (hromada, city);
Samar, formerly Novomoskovsk (hromada, city);
Shakhtarske, formerly Pershotravensk (hromada, city);
Slavutych (hromada: Ukrainian, Italian, Chinese; city: Ukrainian, Italian, Chinese);
Synelnykove (hromada, city);
Teplodar (hromada: Ukrainian, Crimean Tatar; city: Ukrainian, Crimean Tatar); and,
Uzhhorod (hromada, city).

Together with Zaporizhzhia, those are all twenty cities with single-settlement hromady that I found by going through the 1,293 results of the search w:uk:Спеціальна:Пошук/intitle:"міська громада". According to w:en:Hromada#List of hromadas, there were 409 urban hromady as of October 2023. If the list above is complete, that means that fewer than 5% of urban hromady (20 of 409, about 4.9%) comprise only one settlement. I haven't checked, but I expect an even smaller percentage of the 435 settlement hromady and 625 rural hromady to comprise only one settlement. Consequently, combining the senses of these settlements with their eponymous hromady would save little space, would come at the expense of translation provision, and would remove the entries for those settlements from the relevant subcategories of Category:Hromada capitals. (BTW, try as I might, I couldn't get {{auto cat}} to work in that category or in Category:en:Hromada capitals, so I just reproduced their preambles manually. I have no idea why {{auto cat}} works for Category:Oblast capitals, Category:Raion capitals, and their subcategories but not for the hromada capitals' categories.)

Finally, re "forbid use of bare 'hromada' in holonyms", thank you. If it could be made to show a warning message like “Please specify the type of hromada: rural hromada (rhrom), settlement hromada (shrom), or urban hromada (uhrom).”, that would be perfect. 0DF (talk) 19:25, 3 June 2025 (UTC)

@0DF All right. It turns out that setting link = false makes categories like Category:en:Hromadas not work, so I need to fix this in a different way. This is probably also the reason why Category:Hromada capitals doesn't work. BTW usually (or at least often) when {{auto cat}} encounters an issue, it logs a warning, which you can see by previewing the page, scrolling to the bottom where it says "Parser profiling data", and opening up the Lua logs section. Indeed, when you do that, you see:

Display form for pl_placetype "hromadas" is false, can't categorize

This is because of the link = false setting, meaning it doesn't know how to display (i.e. correctly link) the word "hromadas". I'll fix this shortly, and implement the appropriate warning message for usage of entry and holonym placetype "hromada". As for distinguishing cities from urban hromadas, that is fine. Benwing2 (talk) 19:42, 3 June 2025 (UTC)
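
As a rough illustration of the failure described above, here is a minimal Python sketch. It is not the actual Lua code in Module:place, whose internals may well differ; the names PLACETYPES, display_form, and categorize are hypothetical, and only the warning text is taken from the log line quoted above.

```python
# Hypothetical sketch only: the real logic is written in Lua in Module:place.
PLACETYPES = {
    # link=False stands in for the "link = false" setting discussed above.
    "hromada": {"plural": "hromadas", "link": False},
    "urban hromada": {"plural": "urban hromadas", "link": True},
}


def display_form(placetype):
    """Return a wikilinked plural, or False when linking is disabled."""
    data = PLACETYPES[placetype]
    if not data["link"]:
        return False
    return "[[" + data["plural"] + "]]"


def categorize(placetype, log):
    """Return the display form for a category, or None after logging a warning."""
    form = display_form(placetype)
    if form is False:
        plural = PLACETYPES[placetype]["plural"]
        log.append(
            'Display form for pl_placetype "' + plural
            + "\" is false, can't categorize"
        )
        return None
    return form


log = []
print(categorize("hromada", log))        # no display form, so nothing to categorize
print(categorize("urban hromada", log))  # linkable, so a display form is returned
print(log)
```

In this sketch, a placetype whose linking is switched off has no display form at all, so the categorizer has nothing to show and can only log a warning, which is why the category page silently fails.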

@Benwing2: Ah-ha! I see. Thanks for working that out, and I'm sorry I missed that clue; I am still but a cautious novice when it comes to Lua.

I previewed disallow_in_entries whilst editing Slavutych — it works great! I look forward to seeing disallow_in_holonyms in action. Thanks for working on this. 0DF (talk) 00:36, 4 June 2025 (UTC)

@Benwing2: Now the listing of placetypes in the {{place}} documentation page is throwing an error. Chuck Entz (talk) 14:24, 4 June 2025 (UTC)

@Chuck Entz: Could you perform null edits on Module:place/placetypes and Template:place to see whether Special:Diff/85032654 fixed this problem, please? 0DF (talk) 15:05, 4 June 2025 (UTC)

@Chuck Entz Fixed. Benwing2 (talk) 17:45, 4 June 2025 (UTC)