I've just noticed many plural nouns such as cattle or arms ('warfare') are never labeled as uncountable (in the sense of numbers/numerals) as different from police or staff (compare: three (members of the) crew are... vs an item of clothing/*clothes).
I just left a comment at Appendix_talk:Glossary to this effect, but you can find more than a handful of examples of cattle used directly after a numeral, which makes it inaccurate to call it "uncountable" without giving further qualifications to that label.--Urszag (talk) 11:24, 1 April 2025 (UTC)
Could you clarify what you mean by "plural uncountable nouns"? Arms might well be in that category, taking a plural verb and usable with much, but clothing takes a singular verb.
To try to answer your question: I think the reason is that different contributors focus on different aspects of the grammar of English nouns. Some focus on countability/uncountability, some on number agreement with a verb, some on the form(s) of the plural, and others on the semantics of number. It is not obvious to me how one can consistently present the relevant information in an English noun entry. Further complication of the already complicated {{en-noun}} may be necessary.
To start, almost any usually countable noun can be used uncountably with a predictable meaning relative to the countable use. Similarly, uncountable nouns can be used countably. A test for uncountability is use of the noun with a determiner like much ("much arms" can be found) (vs. many with a plural form for countability). Whether an English noun is (un)countable does not transparently answer the question of number agreement with a verb. DCDuring (talk) 14:49, 1 April 2025 (UTC)
Uninflected pl-only: i) cattle livestock police poultry1 vermin ii) folk people1 {These cattle belong - *This cattle belongs} to my uncle.
cannot be used with low numerals, but are found with high round numerals (‘quasi-count’). Their denotation is thought of en masse, with none of the individuation into atomic entities that a low numeral implies. Genuine count nouns (usually of more specific meaning) must be substituted in order for this individuation to take place:
i a thousand cattle vs seven cows / *cattle
ii two hundred police vs four policemen / *police / police officers
They are non sequitur: their denotation is thought of en masse means it is a mass noun thought in one number only and thus not being counted. The agreement is no evidence for it being plural, for the said empirically observed English agreement behaviour. Plural or singular is a morphological category, which is not marked in cattle, grammatical number is a feature that expresses count distinctions, while agreement of number within the subject and the predicate is not required, but a syntactic rule hinging on various environment factors including (most commonly) semantics and word order, e.g. in Arabic PSO sentences always have the predicate in the singular when the same sentence in SPO requires agreement, while in turn English has a more loose requirement of subject and predicate numbers agreeing (because agreement in sense trumps the ideal of sentence constituents). Fay Freak (talk) 19:43, 1 April 2025 (UTC)
@Fay Freak: Take into account etymologies (dia- vs synchronic):
< Medieval Latin capitāle, 'holdings, funds' < from neuter of Latin capitālis
@JMGN: Yeah, the Latin is singular. Translations of course do not even need to be the same POS, let alone number.
See also people your CGEL mentions. Wiktionary labels it as uncountable, e contrario from not labelling any sense but one as countable – which gained abstraction to substitute nation, in the same fashion as United States has gone used with singular predicates due to no certain plurality of individual persons (as states have been personhoods, by analogy to clubs consisting of natural persons) being felt or conceptualized, ousting notional agreement, but United States always stayed a plural. It would not work and be confusing not to. The sense of people called by Wiktionary “plural of person” is lexical suppletion to express plural senses of person by means of a singular. Fay Freak (talk) 20:08, 1 April 2025 (UTC)
CGEL is a good resource, of course, but when it stars *seven cattle and *four police, it is only describing the usage (or judgements) of one group of speakers rather than universal facts about English. Low numerals can in fact be found used with both of these nouns ("four police", "seven cattle").--Urszag (talk) 20:53, 2 April 2025 (UTC)
Both of those links go to texts published in the United States, although I haven't looked up the details of the authors. Of course, time can also be a component.--Urszag (talk) 21:09, 2 April 2025 (UTC)
@Urszag Anyway, it was never about whether that particular word can be used with numerals, but a systematic lack of description in our current labels. JMGN (talk) 21:11, 2 April 2025 (UTC)
Am I right in perceiving that there is currently a gap in en.wikt because there ought to be some Translations section where jabonera (adj) and sabonera (adj) can go? There is not yet any provision made for them for the noun adjunct use of 'soap' at soap#Noun → Translations, and they obviously don't belong at soapy#Adjective → Translations. It seems to me that there needs to be a noun adjunct sense/use listed at soap#Noun → Translations, and it simply hasn't been added yet. But I have never had occasion to ponder this particular pattern before, and there must be thousands of instances, so I may be missing something. Thus asking here. Quercus solaris (talk) 23:32, 1 April 2025 (UTC)
There've been a few small discussions of this over the years: DCDuring and I discussed it back in 2011, after which I mocked up two ways of handling this, in brass/spruce (give other languages' adjectival translations of English attributive use of regular noun senses their own translation table) and cork (put the translations into the "main" table, with qualifiers highlighting the POS mismatch). It came up again in 2016 (mentioned again in 2017), but I've always been hesitant that there was never a big discussion in which a large number of people formalized either approach. Perhaps we can decide here and now which approach (either of those? or something else?) is best. Then again, maybe entries having taken such approaches since 2011 without objection — indeed, many editors and many entries (e.g. racial) have been taking the cork approach (reasonably listing Rassen- as a German translation despite the POS mismatch) for as long as Wiktionary has existed — means that is the accepted way to handle this. - -sche(discuss)15:32, 3 April 2025 (UTC)
(I've excluded two of them: cs-u-sd-cz642 because it's unused and I have a feeling it will break something if added; and hu-formal because it's not really necessary IMO; I'm probably going to remove both of these from Babel as well soon.)
Aside, I think all of these codes should already be etymology-only codes anyway, and adding them will mean that the proficiency categories generated for them are defined in the category tree. Saph (talk) 17:59, 2 April 2025 (UTC)
None of that looks sensible, even in the names little thoughts have been expended — Kirmanjki, Dobrujan Tatar? Database monsters without looking into the linguistic material will not be nursed. For personal identity in babel boxes okay, otherwise begone. Fay Freak (talk) 19:23, 2 April 2025 (UTC)
OK I'm not too set on this, so nevermind. acf probably should be added as a fully-fledged language though - I'll open a new discussion about that later. Saph (talk) 19:28, 2 April 2025 (UTC)
I agree with what's been said. For example, Kurdish is a family, not a language, and there is strong consensus not to add etym variants for individual countries where Serbo-Croatian is spoken. Any country-specific variants of "Chinese" or Hokkien need consultation with the Chinese editors before adding. In general these are all tricky cases as shown for example by the previous discussion on Dobrujan Tatar, which appears not to exist. Benwing2 (talk) 02:24, 3 April 2025 (UTC)
Pronunciation of heteronyms
this discussion may have happened many times in the past, but it seems unresolved due to there not being a proper agreed method to handle these. how should multiple pronunciations for heteronyms be handled if the terms share an etymology? see entries like coop, compound, reject, toot, and, somewhat related, ghoti, among many others. Juwan (talk) 22:33, 2 April 2025 (UTC)
Err, I think you answered your own question. Any of the ways we display them in the entries you listed handles it decently. I have a feeling, though, that you're looking for a Wiktionary: namespace page that states pls do heteronym pron this way. Father of minus 2 (talk) 22:42, 2 April 2025 (UTC)
it seems so! or at the very least an unofficial official way that is used across entries, as often Wiktionary guidelines are left unstated. Juwan (talk) 22:51, 2 April 2025 (UTC)
I am referring to how it should be laid out, yes, but these examples and the policy don't give a good practice for these specific cases. Juwan (talk) 10:56, 3 April 2025 (UTC)
Disabling Babel categorisation for inactive users
@benwing2, -sche (since Benwing mentioned |nocat= was something you wanted):
There was a vote which ended in a consensus for this, but it was in 2017, so I wanted to reaffirm that this was still the consensus. I've already put together a script for this, and I've done 15 test edits under a bot account and all of them except the first worked fine. I can post the script on Github if anyone would like to see it. Saph (talk) 00:57, 3 April 2025 (UTC)
I just noticed, the vote specifies that the users would be moved to categories appended with (inactive); is there any preference between this and, as |nocat= is currently set up, outright disabling categorisation? Saph (talk) 13:04, 3 April 2025 (UTC)
I personally think I prefer the inactive cat as opposed to a total lack of categorization, not just per consensus. Vininn126 (talk) 13:07, 3 April 2025 (UTC)
@Saph: Ceteris paribus, (inactive) is probably better than no categorisation, but it's a minor issue, so if you've already written the script, I would say go with the standard |nocat= function. 0DF (talk) 13:08, 3 April 2025 (UTC) so I second what Vininn126 wrote. 0DF (talk) 13:09, 3 April 2025 (UTC)
Adding (inactive) would be pretty trivial to implement, it would just be a simple edit to MOD:Babel and the category tree, so I'm fine with either way. Saph (talk) 13:55, 3 April 2025 (UTC)
I've implemented this on the Babel end, but adding it to the category tree wasn't as easy as I thought it would be and I've asked on the Discord if someone else more knowledgeable than me can do it. For now I'm going to do an AWB run and convert the |nocat=1 usages to |inactive=1. Saph (talk) 14:51, 3 April 2025 (UTC)
I would prefer something that fetches user contributions and disables the categorization if the last edit is > 1 year ago. It is less intrusive than every page being edited for categorization (as most people deem their user page personal, and by extension deem editing by others suspect), though of course this can also be a regular bot job, unnecessarily stressing rate limits and flooding the project's recent changes. On the other hand I may be wrong and that operation would be more expensive. Fay Freak (talk) 13:11, 3 April 2025 (UTC)
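To make the suggested rule concrete, here is a minimal sketch in Python of the "last edit is > 1 year ago" check. It is an illustration only, not any script used on the wiki: the function name `is_inactive` and the 365-day reading of "one year" are assumptions, and the timestamp of the last edit is assumed to have been obtained elsewhere.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "one year" is read as 365 days.
INACTIVITY_THRESHOLD = timedelta(days=365)


def is_inactive(last_edit, now, threshold=INACTIVITY_THRESHOLD):
    """Return True if the last edit is strictly older than the threshold.

    Both arguments are timezone-aware datetimes; how the last-edit
    timestamp is fetched is outside the scope of this sketch.
    """
    return now - last_edit > threshold


if __name__ == "__main__":
    now = datetime(2025, 4, 3, tzinfo=timezone.utc)
    print(is_inactive(datetime(2024, 1, 1, tzinfo=timezone.utc), now))
    print(is_inactive(datetime(2025, 1, 1, tzinfo=timezone.utc), now))
```

The comparison is strict, so a user whose last edit was exactly 365 days ago still counts as active under this sketch.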
Unfortunately checking user contributions is not possible in Scribunto, as far as I know at least. Saph (talk) 13:53, 3 April 2025 (UTC)
Yes, Wikimedia categories tend to get updated when something changes, but inactive users, by definition, aren't changing anything. Anything system-based is going to be seriously useless. My main concern is with what happens when someone starts editing again after being away. How long does it take before the categorization catches up? At least something that edits a user page is going to be easily detectable by the user. Then there's the matter of alternate accounts... Chuck Entz (talk) 14:08, 3 April 2025 (UTC)
Support (e/c). If it's easy to implement recategorization into "inactive" categories rather than un-categorization, that seems like a great idea. There was also a brief discussion of this in 2023, associated with which I (for lack, at the time, of a better method) commented out a few inactive users' Babels; if we start tackling this in a proper way, I can probably locate the relevant batch of edits and either revert them (putting the inactive users back into Babel categories for the bot to find) or update them in the same manner as the bot. - -sche(discuss)14:55, 3 April 2025 (UTC)
Yeah, I saw some of those edits, the bot ran into a few - if you could go through them that would be great. The script handles pages already having |inactive=1 fine, so I would just update it in the same way as the bot. Saph (talk) 14:58, 3 April 2025 (UTC)
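For readers wondering what such a conversion might look like, here is a rough Python sketch of turning |nocat=1 into |inactive=1 while leaving pages that already have |inactive=1 untouched. It is not Saph's script: the regular expression, the function name `mark_inactive` and the example wikitext are all assumptions made for illustration.

```python
import re

# Matches a "nocat=1" parameter, tolerating whitespace around "|" and "=".
# The lookahead requires the parameter to end at the next "|" or at "}}",
# so a value such as "nocat=10" is left alone.
NOCAT_PARAM = re.compile(r"\|\s*nocat\s*=\s*1\s*(?=\||\}\})")


def mark_inactive(wikitext):
    """Replace every |nocat=1 with |inactive=1.

    Text that already uses |inactive=1 contains no match and is returned
    unchanged, so running the function twice gives the same result.
    """
    return NOCAT_PARAM.sub("|inactive=1", wikitext)


if __name__ == "__main__":
    print(mark_inactive("{{Babel|en|fr-2|nocat=1}}"))
```

One limitation of this sketch: it does not check which template the parameter belongs to, so a real run would need to restrict the replacement to the Babel box.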
There seem to be no objections (except from FF, maybe? I can't really tell) so I've gone ahead and created the vote. Saph (talk) 15:08, 3 April 2025 (UTC)
"folk medicine" vs. "alternative medicine"?
Should we categorize them differently or is there too much overlap? I think of "folk medicine" as the traditional medical practices of (especially non-Western) cultures, e.g. Ayurveda, Traditional Chinese Medicine, etc. whereas "alternative medicine" is Western-invented non-evidence-based practices such as chiropractic, homeopathy and aromatherapy. I know it gets fuzzy in that practices like acupuncture and moxibustion are TCM in origin but adopted widely by Westerners. I ask because I just moved 6 Cebuano terms from Category:ceb:Folk medicine (not defined in the category tree) to Category:ceb:Alternative medicine and moved 蒙醫 / 蒙医(méngyī) (defined as "traditional Mongolian medicine") from Category:zh:Medicine to Category:zh:Alternative medicine, but I don't know if these are the best categories. (Is "traditional Mongolian medicine" in essence the same as Traditional Chinese Medicine? If so do we need to rename the latter?) Benwing2 (talk) 05:31, 3 April 2025 (UTC)
I agree with your thought process: there is overlap that does not lend itself to pure dichotomization. Thus, there is no perfect answer about how to categorize them in a dichotomous way. Fuzzy logic could apply, with differing percentages depending on the term, such as X%-weighting-for alternative medicine and Y%-weighting-for folk medicine. The same theme is also true regarding alternative medicine versus quackery, although the quackiest kind of quackery is when the doctor consciously knows that the treatment is useless and sells it anyway, as opposed to instances when the doctor mistakenly believes in the treatment. But there is also the layer on which it is true that if the placebo effect brings peace of mind or other customer satisfaction to the patient (whether blinded or even nonblinded), then the placebo cannot be said to be useless, because even though it has zero efficacy, it has more than zero effectiveness. Quercus solaris (talk) 06:19, 3 April 2025 (UTC)
Maybe folk medicine is a part of alternative medicine? (at least if used in modern society which sees the difference between alternative and scientific methods). Also, I will not consider stuff like bustein as alternative medicine, but it is still a part of old folk medicine, as long as you don’t use it today (then it’s gonna be alternative medicine). Tollef Salemann (talk) 06:30, 3 April 2025 (UTC)
This seems a good idea; maybe folk medicine is a subcategory of alternative medicine that includes all manner of traditional medical practices, so for example, Ayurveda and TCM have folk medicine as the parent category, which in turn has alternative medicine as a parent category, while chiropractic and homeopathy directly have alternative medicine as the parent category. Benwing2 (talk) 06:34, 3 April 2025 (UTC)
You have it backwards. Alternative medicine is a single modern culture of "traditional" medicine, from an anthropological or historical lens. The fact that we call it traditional medicine rather than something not time-specific is what's likely misleading you. — Ganjabarah (talk) 02:06, 23 April 2025 (UTC)
Agreed (with what Tollef Salemann said). Etic versus emic worldviews; variable ontologic construal. The folk medicine of pre-Columbian Amerindian peoples was just plain medicine within their worldview; and there was no alternative medicine. Among people today who are willing to use alternative medicine, many would not bother with a magic stone, because they are seeking things that "actually work" (as far as they know or believe), so to them, the stone is ancient folk medicine but nothing else; but some of them will embrace the stone and love it, and if they do, then it is alternative medicine in their hands. Quercus solaris (talk) 06:39, 3 April 2025 (UTC)
OK this all sounds good but I'm not understanding how you're proposing to structure the categories, or even if you're making any proposal at all. Benwing2 (talk) 06:42, 3 April 2025 (UTC)
Unfortunately I mean that any strictly hierarchical categorization (versus a fuzzy logic one) is fine (good enough) but is also incapable of modeling the reality 100% accurately. Which doesn't mean that a method can't be chosen for it! Nor that it is futile. Just that it is capable at most of being an approximation, a stylized representation. With that being true, I am agnostic as to which option to choose for its design. Whether impressionism or cubism, both could be nice, even though neither is photorealism. Quercus solaris (talk) 06:49, 3 April 2025 (UTC)
Trying to sound less insane about it (lol), it's like email folders (like Outlook traditionally uses) versus email labels (like Gmail uses): each treatment could have any label applied, or more than one (i.e., alternative med, folk med, or both, or quackery), but which labels apply to each treatment is not the same yes/no value for all people (it varies by who is judging it). Thus, trying to put one label totally inside another one (as if subfolder into bigger folder) doesn't apply. Quercus solaris (talk) 07:10, 3 April 2025 (UTC)
It’s not Western invention versus Eastern tradition, instead the West and the East have vaguely evidence-based tradition, as we had pharmacopoeiae before one knew how to do clinical studies or how cell physiology worked, based on anecdotal evidence like even today medical experience often has to make conclusions from observational studies, and later due to the abuse of media there came to be belief systems vaguely built on tradition and quackery in the fashion of conspiracy theories, woo
You of course pick existing vocabulary to sound superficially reasonable and attract supporters that aren’t reliably trained in critical thinking, since to some degree it always presupposed academicism still nowhere attained in the masses and often not even after a college diploma: not everyone is equally attentive. Interestingly we have an extensive article about pseudolegal practice and advice espoused by Reichsbürger and freemen on the land; perhaps we will add Category:Pseudolaw.
You might say that "alternative medicine is folk medicine with an army and a navy". The word "folk", like "dialect", tends to imply something rural and backward, while "alternative" tends to imply that sophisticated/normal people have decided to try something different. Chuck Entz (talk) 13:56, 3 April 2025 (UTC)
@Fay Freak — Not sure whether I read it right at "You of course pick existing vocabulary to sound superficially reasonable and attract supporters that aren’t reliably trained" — but I'm not personally espousing any woo-woo or hoo-haw at all. I'm mentioning the epistemologic view of physicians who reject reductionism as constituting all of science rather than just a part of it. Scientists with a maximally reductive view believe that even having any agency like NCCIH exist, at all, equals promoting pseudoscience. But many scientists disagree with that assertion, because of factors such as (1) they view studying the placebo effect as scientifically valid, as to whatever extent a placebo helps the patient (even when nonblinded) it is not useless to the purpose of health care, and (2) there are some things that science so far hasn't fully understood but may come to understand better, and then a particular treatment that many people were labeling as "alternative" meaning "against science" will be viewed as "not against science but rather valid per the concept of effectiveness as differentiable from efficacy (distal causality rather than proximal causality)." An appeal to authority saying basically that "the plebs should shut up and not even try to understand what science is except to accept whatever a high priest of science tells them it is" is counterproductive to those high priests' own self-interests in the end, anyway. A thumbnail example of why is that in 1960 or thereabouts your doctor would prescribe you some thalidomide for your morning sickness and recommend that you smoke Chesterfields instead of Marlboros because they're better for your lungs. Oops, guess today's current state of science isn't a last word forevermore, huh? The plebs aren't willing to take the high priests' word without the high priests being willing to discuss and defend and debunk. 
The fact that JAQing off and sealioning are endlessly tiresome and are often done disingenuously by shysters doesn't negate the fact that the poor little plebs demand to be included epistemologically and will throw a pitchfork revolt if they think they're being disrespected and discounted by the high priests. It is therefore in the high priests' own self-interest to somehow live with and deal with the burden of explaining and discussing and defending and debunking. I well realize how hard of a problem it is. But the alternatives are even worse, though. Quercus solaris (talk) 15:53, 3 April 2025 (UTC)
@Quercus solaris: This was great to read 😂. I did not imagine anyone to espouse woo-woo or hoo-haw here, but this illustrates how any field of study, scientific or not, is practiced and enforced on the basis of reproduction of previous treatments compared with empirical reality and statistical assumptions and interwoven with personal stakes in entering a science and maintaining positions therein, dragging down the recognition of health causality in a stream of tradition representing individual custom and habit, in Hume’s terms.
The question is then at which point it becomes belligerent enough to earn the label “alternative”, “fringe” or “minority” view, with its army and navy, and when a former majority-accepted view has gathered enough dust and blows to contrast with the state of art due to its subterfugial evidence base, never going with time in marketing itself. The difference is that of two genres of art, one very old, largely oblivial as much as it is maladaptive, and recent ventures one can still inflate as a consequence of little risk-aversion and attractiveness of new products, and how can so many buyers be wrong? Like cryptocurrencies popular enough to take off, by the volatility of which some people lost their fortunes, whereas for gold coins you don’t have to argue, everybody knows what he gets from them, though they be of limited direct practical use. These are all things targetting communities or communal identities. Fay Freak (talk) 19:36, 3 April 2025 (UTC)
Final proposed modifications to the Universal Code of Conduct Enforcement Guidelines and U4C Charter now posted
The proposed modifications to the Universal Code of Conduct Enforcement Guidelines and the U4C Charter are now on Meta-wiki for community notice in advance of the voting period. This final draft was developed from the previous two rounds of community review. Community members will be able to vote on these modifications starting on 17 April 2025. The vote will close on 1 May 2025, and results will be announced no later than 12 May 2025. The U4C election period, starting with a call for candidates, will open immediately following the announcement of the review results. More information will be posted on the wiki page for the election soon.
Please be advised that this process will require more messages to be sent here over the next two months.
The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. This annual review was planned and implemented by the U4C. For more information and the responsibilities of the U4C, you may review the U4C Charter.
Please share this message with members of your community so they can participate as well.
Many trivalent verbs in the fields of communication and giving/transfer show alternation between a construction with object + to PP and one with two objects: She showed the new draft to her tutor ∼ She showed her tutor the new draft
IO or 'to': with such verbs as tell, read, show, teach, IO and DO are aligned with less central cases of recipient and theme: the IO-referent comes to hear, see, or learn what is expressed by DO, rather than to have it.
Transitive/intransitive contrasts : They fined us ($100).
The single object of the monotr corresponds to the IO of the ditr (i.e., us), but other verbs such as charge allow both types of omission:
Compare:
They fined/charged us ($100)
They charged/*fined $100.
The verbs in follow the pattern of charge:
bet, cost, envy, excuse, forgive, refuse, SHOW, teach, tell. Yet, there's a distinction between the understood elements, which are either (in)definite:
I asked him the price but he wouldn’t tell me (sc. “the price”: definite)
He tells lies / dirty jokes (addressee indefinite).
If he were surprised, he didn’t show it.
Cost only in informal style (That’ll cost you, i.e., with “a lot” understood) or in the idiom to cost sb dear (where the syntactic analysis of dear is unclear).
Factive and entailing governors (i.e., the content clause complement is normally presupposed):
(ii) entailing and non-factive a. happen, prove, SHOW, turn out.
@JMGN Okay, seems like this affects more verbs than I assumed. We should probably just use one sense definition which lists example sentences for all three cases. Tc14Hd (aka Marc) (talk) 01:47, 6 April 2025 (UTC)
names of cities in India
I am expanding the list of cities in Module:place/shared-data to include at least all the cities over 1,000,000 people as of the most recent (2011) census (the 2021 census has been repeatedly delayed and is not yet conducted). Per w:List of cities in India by population there are 46 cities over 1,000,000 people (city proper, not metro area) as of the 2011 census; by now there are certainly more. I would like to solicit opinions on some issues:
Bangalore vs. Bengaluru. Wikipedia just renamed their article earlier this year, but (a) Wikipedia tends to lean towards official names and endonyms rather than common names, despite the "common name" policy; (b) per the move request closer, this was a close call. @-sche has proposed using Google Scholar as a first-line source; a search using only >= 2024 sources shows 17,900 occurrences of Bangalore vs. 17,300 of Bengaluru. I am inclined to keep using "Bangalore" for now with Bengaluru as an alias, but am open to suggestions.
Cities renamed for political reasons. We have at least the cases of Aurangabad vs. Chhatrapati Sambhajinagar (a red link!) and Allahabad vs. Prayagraj. Wikipedia has the former at w:Aurangabad but the latter at w:Prayagraj. The Google Scholar test for the latter using only >= 2024 sources shows 10,700 Allahabad vs. 7,100 Prayagraj. I think the case for Aurangabad is obvious and I am inclined to use Allahabad over Prayagraj.
Metro areas anchored by multiple smaller cities rather than a single large city. There are three of them in the list: Pimpri-Chinchwad, Kalyan-Dombivali/Kalyan-Dombivli and Vasai-Virar. Should we use the hyphenated names for categories or should we use the component cities, and if we go with the latter option should we include all 6 cities or only some? One possibility is to use the hyphenated names but add the components as aliases that are recognized but categorize under the combined name.
The second hyphenated city is further confused by the variants Kalyan-Dombivali vs. Kalyan-Dombivli. Wikipedia puts the city at Kalyan-Dombivli with no mention whatsoever of the variant Kalyan-Dombivali, but the above list article uses Kalyan-Dombivali, and so do we. What is the deal here and which variant should we use? The lack of context in the Wikipedia article makes it hard for me to judge what to do.
Metro areas over 1,000,000 inhabitants. w:List of million-plus urban agglomerations in India lists 52 such metro areas per the 2011 census and 65 according to the 2023 report of Demographia (which uses 2023 estimates). Maybe I should go off one of these two lists instead of the city-proper estimates; this also eliminates the issue with Pimpri-Chinchwad, Kalyan-Dombiv(a)li and Vasai-Virar, which are all satellite cities that get included in a larger metro area. (But on the other hand it introduces new issues with Durg-Bhilainagar and Hubli-Dharwad -- both red links for us.)
OK I've decided to go with the Demographia list of 65 metro areas. The Wikipedia article on Durg-Bhilainagar redirects to Bhilai so I've used that as the category name, with Durg, Durg-Bhilai, Durg-Bhilainagar and Bhilainagar all existing as aliases. Hubli-Dharwad per the Wikipedia article uses that name with Hubli and Dharwad as aliases. Comments welcome. Benwing2 (talk) 03:36, 6 April 2025 (UTC)
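The alias arrangement described above can be pictured as a simple lookup from every recognized name to the name used for the category. The Python sketch below is only an illustration: the real Module:place/shared-data is a Lua module whose structure is not shown in this discussion, and the names `ALIASES` and `canonical_city` are hypothetical.

```python
# Hypothetical alias table: each recognized variant maps to the name
# under which terms are categorized, as described in the comment above.
ALIASES = {
    "Durg": "Bhilai",
    "Durg-Bhilai": "Bhilai",
    "Durg-Bhilainagar": "Bhilai",
    "Bhilainagar": "Bhilai",
    "Hubli": "Hubli-Dharwad",
    "Dharwad": "Hubli-Dharwad",
}


def canonical_city(name):
    """Return the category name for a city, resolving known aliases.

    Names that are not aliases are returned unchanged.
    """
    return ALIASES.get(name, name)


if __name__ == "__main__":
    for name in ("Durg-Bhilainagar", "Hubli", "Bhilai"):
        print(name, "->", canonical_city(name))
```

With this shape, adding a component city as an alias means it is recognized in input but still categorized under the combined name.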
I support going with official names like Bengaluru and Prayagraj which are well-established and dominant by now. Aurangabad to Chhatrapati Sambhajinagar is a recent rename and the new name is longer, so it isn't as popular as others. Maybe in some years Sambhajinagar (without the honorific) would become popular and usable. Svārtava (tɕ) 09:59, 6 April 2025 (UTC)
The renaming of Indian cities is certainly politically motivated to replace Mughal- or European-sounding names.
Therefore, there may be prescriptivist pressure towards using the new official names, because they are argued to be a ‘purer name’ or the ‘original name’.
Except in well-known cases such as the renaming of ‘Bombay’ to ‘Mumbai’, determining which name is more common may require some investigation.
I am inclined to keep using "Bangalore" for now with Bengaluru as an alias
Although Indian English may have accepted ‘Bengaluru’, non-Indian English still prefers the anglicised form ‘Bangalore’.
So, I can agree to keep using ‘Bangalore’.
I think the case for Aurangabad is obvious
I agree. As @Svartava implied, not all Indian English may be accustomed to using the new name of Aurangabad/Sambhajinagar, since it only occurred in June 2022, and it is quite lengthy.
If the new name of Aurangabad/Sambhajinagar catches on, ‘Chhatrapati’ would likely be dropped in ordinary usage but still maintained in purist usage. For comparison, Mumbai’s airport is officially called ‘Chhatrapati Shivaji Maharaj International Airport’, but is almost never referred to as such in ordinary speech.
Perhaps something to investigate would be how strong the prescriptivism is for ‘Chhatrapati Sambhajinagar’ outside the state of Maharashtra and the Marathi language.
Regarding Dombiv(a)li,
Whether or not to retain schwas as ‘a’ in English renderings of Indian names from schwa-dropping languages is a widespread issue.
It should be noted that the first ‘a’ representing an orthographic schwa in the Indian state of ‘Gujarat’ is often deleted in ordinary speech but may be retained in careful speech. On the other hand, the underlyingly identical term (with a different referent) is spelled in English as ‘Gujrat’ for the city in Pakistan.
The ‘a’ in ‘Dombiv(a)li’ could be dropped as per Wiktionary’s transliteration policy of dropping orthographic schwas in both Marathi and Hindi terms, since the case for retention is not as strong as it is for the Indian state of ‘Gujarat’.
the issue with Pimpri-Chinchwad, Kalyan-Dombiv(a)li and Vasai-Virar, which are all satellite cities that get included in a larger metro area
These satellite cities in Maharashtra are large suburbs of their respective main city that are differentiated for administrative and legal reasons.
Kalyan, Dombiv(a)li, Vasai, Virar, Navi Mumbai and Thane are large suburbs of Mumbai. Of these satellite cities of Mumbai, Navi Mumbai and Thane would be the strongest cases for being separate cities altogether.
Pimpri and Chinchwad are essentially suburbs of Pune.
In any case, using hyphenated compound forms, including Hubli-Dharwad in Karnataka, is very awkward.
@Benwing2: I concur with Svārtava in favouring official names in the vast majority of cases, but, my word, Chhatrapati Sambhajinagar is unwieldy. I took it upon myself to etymologise Aurangabad and to create Chhatrapati Sambhajinagar. The latter literally means “Parasol-lord Sambhaji’s city”, and has the same number of syllables if you don't drop your schwas. I can't imagine anyone would say the whole thing every time he referred to the city. By contrast, a name analogous to *Sambhajiton sounds a lot more manageable, so maybe we can lemmatise Sambhajinagar if and when that catches on. 0DF (talk) 01:37, 16 May 2025 (UTC)
@Benwing2: Thank you. As a general point, is redirecting derivations categories ever a good idea? I notice that pages added to a redirecting category do not get added to the redirected-to category. 0DF (talk) 12:19, 7 April 2025 (UTC)
I maintain that we need to name every (type of) category in a way that explicitly spells out the intended scope, leaving no category at all named just "CAT:Deserts" or "CAT:Waterfalls" etc, because it is demonstrated all around Wiktionary that if any category is named just "CAT:Waterfalls" (etc), each different person is liable to use it for something different (as discussed here and elsewhere). Perhaps "CAT:Individual buildings" (or "CAT:Names of individual buildings"?), "CAT:Types of buildings", and (hypothetically) "CAT:Terms related to buildings"? (In the case of buildings, that last one might not exist, but replace "buildings" with e.g. "rivers".) It is conceivable that people might want to make exceptions for specific kinds of category, e.g. to let CAT:Cities continue to be named that (as opposed to renaming it CAT:Individual cities), but then again... the ambiguity of "CAT:Cities" means people have put pomerium (term related to the topic of cities) and capital city (type of city) into CAT:en:Cities... - -sche(discuss)16:29, 7 April 2025 (UTC)
Old Sundanese spelling
Should lemmas in Old Sundanese be spelt with the modern Sundanese Latin orthography, or a modified version of the Sundanese spelling as seen in Old Javanese lemmas? (e.g. with ṅ, ñ, ĕ). Yes, there are already 83 lemmas in total (all of them using the modern Sundanese spelling), but I think a new spelling for Old Sundanese would be interesting. Udaradingin (talk) 09:06, 7 April 2025 (UTC)
For Old Sundanese? I mean they use Old Sundanese script, Kawi, or Pallava. What I meant was romanization of OS (sorry for not clarifying earlier), like how lemmas in Old Javanese use romanization as the main entry (for example abhiṣeka, aḍaṅ, etc.). And these romanizations use one letter to represent one sound (e.g. ṅ for /ŋ/ instead of ng, ñ for /ɲ/ instead of ny, differentiation between e and ĕ, etc.). I'm asking for your opinions on whether the Old Sundanese entries should be created following the Old Javanese romanizations (or at least based on them). What do you think? Udaradingin (talk) 13:57, 7 April 2025 (UTC)
Here's an example of the proposed spelling:
"Ini silokana: mas, pirak, komala, hintĕn, ya ta saṅhyaṅ catur yogya ṅara(n)na. Ini kaliṅana. Mas ma ṅaranna sabda tuhu tĕpĕt byakta pañcāksara. Pirak ta ma ṅaranya ambĕk rahayu. Komala ma ṅaranya gĕi(ṅ)na padaṅ caaṅ lĕga loganda. Hintĕn ma ṅaranya caṅciṅ sĕri sĕmu imut rame ambĕk. Ya ta sinaṅguh catur yogya ṅarana."
@Udaradingin: Well, I believe we should lemmatise the forms in the Old-Sundanese/Pallava and/or Buda/Kawi scripts. The important thing in Romanisation, as far as I'm concerned, is that the original-script form be reconstructable from the Romanisation. That is usually facilitated by a bijective glyph-to-glyph correspondence. If the original script embodies the “one dedicated letter for each sound” principle, then so should the Romanisation. That is a convoluted way of saying that I agree that the Romanisation of Old Sundanese terms should be like those of Old Javanese rather than like those of modern Sundanese, but that the Romanisations should not be the main entries for Old Sundanese. 0DF (talk) 15:35, 7 April 2025 (UTC)
@0DF I totally understand your point and I agree, especially with the last part. But, there are several reasons why I think making Old Sundanese/Kawi/Buda/Pallava script as the main entry of OS would be difficult:
The number of sources providing the original Old Sundanese texts in said scripts is very limited, with most sources only giving the romanization or at least a modern-Sundanified version (read: heavily edited so that modern Sundanese people are able to comprehend the text). A counterpoint is the existence of photographs and rubbings of inscriptions and/or manuscripts, especially this image of the Astana Gede inscriptions and this facsimile of Carita Waruga Guru. But then again, these are very limited.
Old Sundanese is (in my opinion) generally less researched and less documented than Old Javanese (this might be related to the 1st reason).
Much like Old English, the Old Sundanese orthography wasn't standardized as it is in the modern one, let alone other scripts like Kawi, Buda, and Pallava.
Because we are trying to follow the examples of Old Javanese lemmas, I think it would be easier and more consistent if we only change spelling to be like those of OJ (still being in Latin alphabet) rather than changing the writing system into OS/Kawi/Buda/etc. This would bring a balance between a sense of familiarity (comprehension) and novelty (spelling system).
@0DF However, I'm thinking of a middle path here. We could put the OS/Kawi script as a soft redirect to the main entry in Latin, similar to how Sundanese or Pali entries are structured (examples for Sundanese and Pali entry). Do you have anything in mind about this? Udaradingin (talk) 01:02, 8 April 2025 (UTC)
@Udaradingin: You say that most sources for Old Sundanese only give the Romanisation. What do you mean by "sources" here? Do you mean dictionaries, grammars, and other sources that discuss Old Sundanese, or do you mean texts from the Old Sundanese corpus are usually published in Romanised form? Pali was not originally written in the Latin script, but the editions of the Pāli Text Society mean that a lot of the Pali corpus exists in Romanised form; is that the case for Old Sundanese? This has a bearing on the issue of attestability of forms.
You wrote that “[m]uch like Old English, the Old Sundanese orthography wasn't standardized”. My understanding of Old English writing is that there is variation that reflects dialectal differences in pronunciation and inflection. Is this the case with Old Sundanese? I don't see why this would be a problem, since any Romanised system would also need to reflect such differences. We would, of course, need to choose lemmata from the range of attested forms, but that's a problem no matter the writing system, surely. 0DF (talk) 00:58, 10 April 2025 (UTC)
@0DF Yes, what I meant by "sources" is that dictionaries, books, etc. generally publish Old Sundanese using the Latin script. This trend dates back to at least the 53rd edition of Tijdschrift voor Indische Taal- Land- en Volkenkunde (1913) where they used (at the time) the Dutch-influenced spelling. Even today, most scholarly materials follow this convention. This suggests that what we often work with are editorial spellings or diplomatic transliterations rather than reproductions of the original manuscripts.
I think I agree with your opinion on the 'unstandardized orthography' of the Sundanese system being unproblematic. But I think we should take into consideration that not only is OS less standardized than Old English, but the corpuses also had varying practices in different regions and periods, so it would be difficult to establish a uniform system if we were not to use a single system of writing.
Kawi, while recently added to Unicode, still cannot be rendered properly on some devices, and the Buda script isn't in Unicode at all (as of yet). This poses a technical problem. Given that Romanized texts are more readily available, they give a more accessible point of reference, especially for public audiences in Wiktionary. This is also how we can select the lemmata from the attested forms. Udaradingin (talk) 12:03, 29 April 2025 (UTC)
@Udaradingin: It sounds like we should be using the Dutch-influenced orthography, in that case, if most texts in the corpus are published in it. 0DF (talk) 01:18, 10 May 2025 (UTC)
@0DF I wasn’t saying that most Old Sundanese texts are written in Dutch-influenced (Van Ophuijsen) spelling. I cited the 1913 edition of the TITLV just to show that Latin orthography has been in use for Old Sundanese since the early 20th century. The Dutch-based spellings were shaped by the orthographic conventions of the time and were intended for a colonial-era readership. For example, the term laṅṅit was spelled as langngit. The same goes for purasani (poerasani), ja (dja), cikal (tjikal), and so on. Today, modern Indonesian spelling has shifted toward, or at least been influenced by, the EYD system. However, like the Dutch-influenced spelling, it often merges or obscures phonemic distinctions important for understanding Old Sundanese (like ng vs. ṅ, ny vs. ñ, or e vs. ĕ). One could not tell whether the word dunya is spelled/pronounced as "du-nya" (duña, /duɲa/) or "dun-ya" (dunya, /dunja/) until they check the original text.
So rather than adopting a 20th-century Dutch-influenced system entirely, it makes more sense to follow a modified Latin orthography that reflects current understanding of the language, just like in Latinized Old Javanese entries. Adapt rather than adopt, I'd say. Udaradingin (talk) 19:09, 15 May 2025 (UTC)
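The spelling correspondences mentioned in this exchange can be sketched in a few lines of Python. This is only an illustration: the function names and the replacement table are hypothetical, and the only data used are the pairs quoted above (poerasani/purasani, dja/ja, tjikal/cikal, dunya/duña). It is not a complete description of either orthography.

```python
# Illustrative only: the table covers just the digraphs quoted in the thread.
DUTCH_TO_MODERN = [("oe", "u"), ("dj", "j"), ("tj", "c")]


def modernize(dutch_spelling: str) -> str:
    """Respell a Dutch-influenced form using the modern letter values."""
    result = dutch_spelling
    for old, new in DUTCH_TO_MODERN:
        result = result.replace(old, new)
    return result


def ny_readings(word: str) -> list[str]:
    """List the one-letter-per-sound readings of a digraph spelling with 'ny'.

    Without the original text, 'ny' may stand for the palatal nasal (ñ)
    or for the sequence n + y, so both candidates are returned.
    """
    if "ny" not in word:
        return [word]
    return [word.replace("ny", "ñ"), word]


print(modernize("poerasani"), modernize("dja"), modernize("tjikal"))
print(ny_readings("dunya"))
```

The second function shows why a digraph spelling alone is not enough: it can only enumerate candidates, and choosing between them still requires the original-script text.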
Romanization?
As a part of speech, Romanization seems like a weird label. The definition is : "The act or process of putting text into the Latin (Roman) alphabet, by means such as transliteration and transcription."
But the underlying words or phrases being Romanized have their own parts-of-speech in their native language, don't they? There are about 117K entries in English Wiktionary with "romanization" for part-of-speech, so I'm not suggesting any changes. I just want to understand why this category exists as a pos and not say a "form" or "form of" or "alt" section, etc, in an entry that has the actual pos for the Romanized word.
Also, is an entry like ꜣꜥy even Romanized, according to the definition? 2/3 of those characters don't look like Roman letters to me. (Btw, I think it's amazingly cool you can look up ancient hieroglyphs like ꜥꜣꜥ. What a great tool this is!)
Killeroonie (talk) 04:08, 9 April 2025 (UTC)
@Killeroonie: I think the operative sense when it comes to the “Romanization” POS header is rather the countable one: “An instance (a string) of text transliterated or transcribed from another alphabet into the Latin alphabet.” The characters ꜣ and ꜥ are letters of the Latin script, according to Unicode. 0DF (talk) 01:03, 10 April 2025 (UTC)
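The point about Unicode can be checked with Python's standard `unicodedata` module. This is a minimal sketch: the standard library does not expose the Unicode Script property directly, so the helper below (a hypothetical name) uses the character name as a proxy, on the assumption that Latin-script letters have names beginning with "LATIN".

```python
import unicodedata


def is_latin_letter(ch: str) -> bool:
    """Return True if the Unicode character name begins with 'LATIN'.

    This is a proxy for the Script property, which the standard
    library does not provide.
    """
    return unicodedata.name(ch, "").startswith("LATIN")


# Print the code point and Unicode name of each character in the entry title.
for ch in "ꜣꜥy":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), is_latin_letter(ch))
```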
I guess I need a code because it is a language of its own with seven dialects of its own. Probably other codes are needed too. Why is it a problem to add new codes? Make Dargwa great again (talk) 11:19, 9 April 2025 (UTC)
To split a language we must know there will be people who will maintain the split, otherwise there will be asynchronization and bardak, with uncared-for split varieties coexisting with unsplit Dargwa.
If you want to separate Kaitag from Dargwa, then we should also split the other 16 Dargwa varieties and you should promise us that:
1) You will go through all of Category:Dargwa lemmas and assign each lemma to a newly split variety.
2) You will go through all of Dargwa translations given in the translation tables of English terms and assign each translation to a newly split variety.
3) You will add Category:Dargwa lemmas to your watchlist and will assign new lemmas added by others to newly split varieties. You will also stick around to tell newbies that Dargwa is not a single language anymore and will teach them how to assign Dargwa words to a proper variety. They will not know it because Dargwa is treated as a single language by all dictionaries published in Russia.
4) You will split the entry Dargwa аба(aba) into 16 new sections to demonstrate proof of concept.
I checked the first 15 translations or so, all of them are into standard northern Dargwa. I don't get why it is needed to split the other 16 Dargwa varieties at once? Make Dargwa great again (talk) 18:46, 9 April 2025 (UTC)
If we split Kaitag, that would create a precedent. The activists of the other Dargwa varieties would demand a code like you do.
Understand that splitting a language is like a vasectomy: it is possible and even reversible, but it is painful. I suspect if we give you a Kaitag code, you will create like 10 entries then disappear; none of our Daghestani editors have stayed around. Then an etymologist like me will be stuck with the need to check each Dargwa word I may occasionally add to the Etymology or Descendants section to see if it is a Kaitag Dargwa or non-Kaitag Dargwa to assign the proper code (most etymological sources would not mention the variety, they would simply say "Dargwa"). This burden is worth bearing only if the new code attracts a dedicated editor. I suggest you contribute under the header ==Dargwa== using the code dar and the label {{lb|dar|Kaitag}} for a couple of months to see if you are that editor. Vahag (talk) 09:27, 10 April 2025 (UTC)
The Soviet orthography is more or less the same, but the IPA values are a bit different. There is also a new orthography developed by Alkaitagi (it is accepted). You can see the comparison here. Make Dargwa great again (talk) 14:27, 4 May 2025 (UTC)
@Make Dargwa great again You have the code xdq now. Please add references to the entries you create. If you need reference templates, let me know. Also, the spelling of words from {{R:xdq:Magomedov:2025}} should be normalized to what you called Soviet orthography, used for example in {{R:dar:Temirbulatova:2022}}. You can put {{normalized}} at the top of the normalized page if the spelling is found only in the non-standard source. Vahag (talk) 11:57, 1 June 2025 (UTC)
I think it would be nice to have one common standard of giving pronunciation. For example, take a look at these English entries: the, of, and, to, in and compare the "stressed" caption. You can see that all these five entries use a slightly different layout.
In my opinion it's more likely that a reader wants to learn all pronunciations for one variety of English rather than learn all pronunciations of a stressed form. So my suggested layout would look like this:
Our current layout for pronunciations is here: Wiktionary:Entry_layout#Pronunciation. If you're proposing some change to that page, it will probably have to follow a vote, since changes to that policy have site-wide impacts. I'm personally in favor of some kind of standardization, but have no strong feelings on what that would be. —Justin (koavf)❤T☮C☺M☯22:23, 9 April 2025 (UTC)
I'm proposing the style mentioned above then, and I think our current layout leads to a lot of different styles. our has yet another arrangement. 185.18.68.210 22:30, 9 April 2025 (UTC)
It seems that User:Jöttur has prolifically generated bad Icelandic content (especially pronunciations and usage examples) using ChatGPT or a similar system. I have been trying to delete all the bad content but there's a ton of it, and a lot still left. I would like to propose a formal prohibition on using AI chatbots and LLMs to generate any sort of content for Wiktionary. Doing this would make the user subject to escalating blocks. Thoughts? Benwing2 (talk) 07:28, 10 April 2025 (UTC)
Online translators like Google Translate are effectively specialised forms of LLM, and I think those are already prohibited as the basis of entries (although I can't find the policy). If there is a policy, then we can hopefully modify it fairly simply - if not, it should cover both cases (more generally, "Do not use any automated tool to generate content that you would not be capable of writing and verifying independently." - effectively rule 1 of WT:BOT). The only reasonable exception I can think of might be for grammar checkers - I mean, if I edit Wiktionary on mobile, technically I'm already using an LLM (since the autocorrect, autocomplete and spell check/grammar check functions on phones use a low-level AI), and I can imagine someone reasonably using a manually-controlled semi-automatic grammar bot on definitions and etymologies. Smurrayinchester (talk) 08:11, 10 April 2025 (UTC)
Support. AI may be a useful tool in some cases, but since it is very prone to hallucinating, anything it creates must be manually reviewed by a human before any of it is contributed. In particular, asking an AI something and believing it without question must never be treated as an alternative to actual proficiency, as in adding entries or examples in a language that you do not actually speak. — SURJECTION/ T / C / L /14:52, 10 April 2025 (UTC)
Support. We have always had serious problems with people who contribute prolifically in languages they don't know. AI provides a handy tool to make such people even more prolific, and all kinds of apps are intrusively and aggressively promoting their AI services without mentioning their limitations. While it's hard to conclusively prove that a pattern of bad edits is due to use of AI, we definitely need to make it clear upfront that we do not allow AI-generated content. We should also develop a help or policy page explaining what AI is, what can go wrong, and why editors should never use it here. Basically, an AI app is hard to distinguish from a human pathological liar, and neither should be trusted. Chuck Entz (talk) 15:18, 10 April 2025 (UTC)
On this note I think we should also start coming down harder on people editing languages they are not competent in. You don't need to have perfect fluency, but you do need to know enough linguistically speaking to not leave stubs, etc. Yes, there is WT:BOLD but if you're leaving a mess or a bunch of request templates then you might as well not make the page. Vininn126 (talk) 16:09, 10 April 2025 (UTC)
Strong support, I have tried ChatGPT's etymological and page-building skills "for fun": it is disappointing, to say the least, if not a win for myself/ourselves in its failure. Saumache (talk) 15:24, 10 April 2025 (UTC)
How exactly will we confirm our suspicions that content is AI-generated? Or do we act on our suspicions and require the contributor to show that contributions aren't AI-generated? A ban makes me think of King Canute. DCDuring (talk) 18:17, 10 April 2025 (UTC)
Policy for editors: AI is forbidden. Policy for admins: Poor quality bulk submissions can be bulk deleted without going through formalities. There is no need to change the blocking policy. Vox Sciurorum (talk) 18:29, 10 April 2025 (UTC)
What makes some submissions "bulk submissions"?
In effect, you propose that we use an anti-AI policy to justify eliminating "formalities" to allow "bulk submissions" deemed (by whom? ie, what judge and jury?) of "poor quality" to be summarily deleted (by whom? ie, what executioner?). We won't even have to have a show trial. DCDuring (talk) 18:54, 10 April 2025 (UTC)
FWIW I asked several contributors on Discord to evaluate the quality of User:Jöttur's work and posted some details about this on their talk page. I don't think it's helpful to throw around accusations that contributors are getting railroaded into "show trials" with "summary executioners" or anything like that. In reality it's quite the opposite; we have a massive problem with people who have no idea what they're doing but think they do and contribute junk across several languages, significantly degrading the quality of the dictionary. It typically takes months, sometimes years, before these bad contributors get blocked, and often their bad contributions never get cleaned up because it's a big task especially once time has elapsed. You're not the one who is cleaning up the messes so please, do your due diligence before casting aspersions. Benwing2 (talk) 19:18, 10 April 2025 (UTC)
@User:Benwing2 I reacted to VS's proposal: arbitrary deletion at the discretion of individual editors.
I think that the use of the label "AI-generated content" is a canard if we don't have the ability to actually detect it. It seems to me the problem is large quantities of poor-quality content from languages for which we apparently cannot muster sufficient trusted contributors to promptly review entries or translations. Maybe AI is the source, but without any specific ability to detect whether AI is in fact the source, we may just be ignoring the root problem. Do we need to filter additions of L2 sections in certain languages for which we have no active editors, so that only qualified editors can work on them or on critical parts of them (creation? etymology? pronunciation? definition?)? Are there some technical means (joined with necessary contributor workflow) to detect and limit (ban?, quarantine?) the flow of entries and translations in certain languages for which we have no qualified contributors or reviewers? Maybe our cottage-industry approach to entry review needs technical, not rhetorical reinforcement. DCDuring (talk) 23:15, 10 April 2025 (UTC)
I think that bad entries made with good intentions can sometimes be improved and become good, but entries made by lazy copypaste from Google Translate with AI made code are not possible to improve, neither are they made with good intentions. I can’t say I make perfect edits all the time, but I try to at least make sure that they don’t contain false information, and use lots of time to verify stuff. Using AI is the opposite, when the form is set over the content and honest work. Tollef Salemann (talk) 20:11, 10 April 2025 (UTC)
Strong oppose. Has it not been established that trying to "detect AI generated content" is a fool's endeavor? Even if this were somehow possible, which it isn't, there is no inherent issue in using LLMs for editing help. An obvious example that comes to mind is editors whose first language isn't English using ChatGPT or similar for proofreading. If a user contributes AI slop, it should be deleted as slop and the user dealt with accordingly. Focusing on the AI part is completely unhelpful. 🌙🐇⠀talk⠀⠀contribs⠀22:18, 10 April 2025 (UTC)
I think you may be missing the gist here. The idea is not to prevent people from using LLM's for help in proofreading or verifying the correctness of generated content. I used to do that several years ago using Google Translate when the Russian quality was so-so at best, essentially as a "second opinion" to make sure what I was doing wasn't crazy. The issue is people who mass-generate content using ChatGPT or similar and don't correct its mistakes, like the above-mentioned user. Ultimately yes there is an arms race between detecting AI-generated content and generating content to fool the detectors (that was the explicit idea of GAN's developed by Goodfellow et al several years ago ... I work in the field in fact so I'm familiar with the issues). But much current content is obvious AI slop and having a policy to explicitly prohibit such slop would make it a lot easier for admins to block editors who enter such slop. As it is, AFAIK it's not explicitly prohibited so it can be difficult to justify a lengthy block as a first offense, which just makes life that much more difficult for admins who have to repeatedly deal with problematic users who wait out their block and then continue the same behavior. Benwing2 (talk) 22:32, 10 April 2025 (UTC)
"obvious AI slop" is not obvious though, as I am sure you know. And here it's not like we have something akin to WT:CheckUser that would enable admin to insist someone used AI, against their word. Wouldn't it be a much more constructive policy direction if we prohibited mass-contributed, incorrect, low-effort content, AI-generated or not? AI could be mentioned, sure, but as formal policy I fear it just doesn't provide any meaningful coverage. 🌙🐇⠀talk⠀⠀contribs⠀22:40, 10 April 2025 (UTC)
Sure, but I am afraid such a policy against something as nebulous as "mass-contributed, incorrect, low-effort content" will prove impossible to enforce or even define. Benwing2 (talk) 23:04, 10 April 2025 (UTC)
@User:Benwing2 What about capping the flow of certain contributions from editors without established track record in the language of the contribution? Such capped contributions could be quarantined pending review by a competent reviewer, should we ever get one for the language involved. DCDuring (talk) 23:20, 10 April 2025 (UTC)
@Lunabunn: There's more to a ban than enforceability. Sure, there are people who will use it-and get away with using it- no matter what we say or do, but there are some people who wouldn't use it if they saw that it wasn't allowed. Chuck Entz (talk) 03:51, 11 April 2025 (UTC)
I think the primary purpose of this policy is to educate editors and clearly explain to them that the unverified AI-generated content is not a good contribution. Some of the contributors acting in good faith may genuinely believe that they are doing something useful. A month ago, somebody behind an IP tried to add a bunch of Belarusian entries, which looked like a bot automatically submitting AI-generated content. I spooked them via leaving messages at their talk page and they stopped doing that (unfortunately without responding to me and without providing any explanations). But if this was somebody really malicious, then they could have continued doing damage. --Ssvb (talk) 05:00, 11 April 2025 (UTC)
Now that I dissect the matter after @Lunabunn’s distinction, running unapproved bots is disallowed. The bot policy page, which appears more relevant to this matter than the offence of copying from online translators, is, surprisingly, largely unedited and stable since 2006. Help in proofreading or verifying the correctness of generated content by whichever technical means does not constitute the act of content generation per se and hence has not been disapproved of even if it be subject to automation. I figure it is more intuitive to me to assume AI in terms of complicity, disallowed as a principal but not accessory. Ssvb is right that people are innocent enough to need explication. Fay Freak (talk) 18:48, 11 April 2025 (UTC)
Strong support. Just like we forbid other kinds of automatic content generation. As long as someone checks it and ensures the content quality is there, I approve, but then it just becomes a way of typing out the content rather than mass automated content, so that's okay IMO. Kiril kovachev (talk・contribs) 00:32, 11 April 2025 (UTC)
Strong support with a ban for repeat or blatant offenders. I'm the first one to look for shortcuts, and anything to make my life easier – AI is not one of them. Bots can be employed, but AI isn't something that we can rely on yet, and when that time does come, Wiktionary or Wikimedia would probably roll out its own AI, or an AI / LLM that is tailored towards Wiktionary, rather than individual users trying to use 3rd party AI services. نعم البدل (talk) 22:41, 27 May 2025 (UTC)
I asked AI for its views...
That's a great and complex question—there are strong arguments on both sides. Here's a breakdown of the pros and cons to help frame the discussion:
✅ Arguments for allowing AI-generated content on Wiktionary:
Speed & Scale
AI can generate large volumes of content quickly, especially for underdeveloped entries or less commonly spoken languages.
Consistency
AI can follow structured templates and formatting rules, which may improve consistency across entries.
Augmenting, not replacing
AI-generated entries can be used as a base for human editors to refine, speeding up the editing process without replacing human judgment.
Accessibility & Inclusion
AI could help document languages or dialects with fewer contributors, making Wiktionary more globally representative.
❌ Arguments against allowing AI-generated content:
Accuracy & Reliability
AI might hallucinate or pull from inaccurate sources, especially with obscure or nuanced definitions.
Loss of Expertise
Human contributors bring contextual knowledge, cultural nuance, and etymological accuracy that AI might miss.
Vandalism & Abuse Risk
AI-generated content could be used to flood the platform with low-quality or even malicious entries if not properly controlled.
Community Erosion
Overreliance on AI might discourage human contributors, weakening the collaborative spirit of Wiktionary.
🤔 Possible Middle Ground
AI-assisted editing only: AI suggestions require human review before publishing.
Flagged content: Mark AI-generated entries for transparency.
Pilot programs: Test AI contributions in specific languages or entry types.
Support. The only exceptions I would give are Smurray's mentioned above, and using an LLM to reformat existing information by manipulating the markup. Converting a long list to a MediaWiki table, for example, can be very tedious to do by hand. — excarnateSojourner (ta·co)15:37, 1 May 2025 (UTC)
Support. To address concerns about it being an impossible task, there are some tells you can look for. I just found cách mạng sắc màu by searching for "?utm_source=chatgpt.com". Even if a full ban is unworkable, we need some sort of policy on LLM use, at the very least requiring disclosure. Apocheir (talk) 02:21, 27 May 2025 (UTC)
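The "tell" mentioned here can be searched for mechanically. Below is a minimal sketch, assuming the page wikitext is already available as a string. The function name and the URL pattern are hypothetical, and only the `utm_source=chatgpt.com` marker comes from the comment above. A match shows only that a link was copied out of ChatGPT, and the absence of a match shows nothing.

```python
import re

# Tracking parameter quoted in the discussion; it may follow '?' or '&'.
CHATGPT_MARKER = re.compile(r"[?&]utm_source=chatgpt\.com", re.IGNORECASE)

# Rough URL pattern for wikitext: stop at whitespace and common wiki delimiters.
URL_PATTERN = re.compile(r"https?://[^\s\]|<>}]+")


def find_chatgpt_links(wikitext: str) -> list[str]:
    """Return the URLs in wikitext that carry the ChatGPT tracking parameter."""
    return [url for url in URL_PATTERN.findall(wikitext)
            if CHATGPT_MARKER.search(url)]


sample = (
    "[https://example.org/page?utm_source=chatgpt.com Example] "
    "and https://example.org/other"
)
print(find_chatgpt_links(sample))
```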
Hello, I would like to request the interface admin rights to be able to edit MediaWiki:Gadget-LanguagesAndScripts.css, as until now I had to bother other interface admins with requests. I would only add fonts or maybe do minor adjustment for scripts and languages of minor importance. Catonif (talk) 19:39, 10 April 2025 (UTC)
Wikidata and Sister Projects: An online community event
(Apologies for posting in English)
Hello everyone, I am excited to share news of an upcoming online event called Wikidata and Sister Projects celebrating the different ways Wikidata can be used to support or enhance another Wikimedia project. The event takes place over 4 days between May 29 - June 1st, 2025.
We would like to invite speakers to present at this community event, to hear success stories, challenges, showcase tools or projects you may be working on, where Wikidata has been involved in Wikipedia, Commons, WikiSource and all other WM projects.
If you are interested in attending, please register here.
If you would like to speak at the event, please fill out this Session Proposal template on the event talk page, where you can also ask any questions you may have.
I apologize if this is not the correct place to voice this issue. I am a casual user of Wiktionary, normally using it to check the etymology of words. Normally, I just go to the hub page of Wiktionary and use the search engine there to look for a particular word. Before, the search engine simply directed me to the Wiktionary entry of the searched word; however, since a few days ago, it has started redirecting me to Wikipedia instead. The glitch happens both on my PC and on my smartphone. Does this happen to everyone, or have I somehow botched up the settings of the engine?
Formally allowing removal of Babel boxes by other users if proficiency is contradicted
Another User:Jöttur-related issue. Benwing deleted Jöttur's Babel box after I suggested the idea on Discord after Jöttur was blocked due to a consensus that he was persistently adding incorrect Icelandic information despite claiming native Icelandic fluency in his Babel box. In consequence, I would also like to add the following to {{Babel}}'s documentation and Wiktionary:Babel in case another situation like that happens in the future:
Babel boxes may be removed by other users if it is clear that the user's claimed language proficiency levels are unsubstantiated.
Claiming native Icelandic by using AI is a good reason to doubt such stuff, but Babel is pretty much subjective otherwise. Doubtful use of Babel is not really common, I remember just two cases in the last year, and they were very obvious and were soon stopped, as the contributions made by the users were so bad, so they were banned. Tollef Salemann (talk) 21:41, 12 April 2025 (UTC)
We have had another contributor to Icelandic, Numberguy6, greatly overstating his knowledge of the language (the correct Babel assessment would be "is-1" instead of his claimed "is-4") adding some significant inaccuracies and mass copyright violations that will never be fixed due to his high volume of edits. But, like with Jöttur, removing the Babel box would not have changed anything, as these over-eager editors rarely listen to pleas for them to stop. The only thing that might help would be for it to be clear that these users should be reported somewhere for immediate admin intervention. 130.208.182.103 08:16, 13 April 2025 (UTC)
Please don't take this personally, but my take on this is that the Icelandic language just needs competent Wiktionary editors who are willing to contribute on a regular basis. You are hiding behind an IP and have contributed very little during all these years since 2021. Of course, it isn't like you have any obligation to contribute, but I'm not surprised that imposters are filling the void.
I also don't know what to feel about the speedy lynching of Jöttur, which was based on your report and the testimony of "another Discord user", who was hopefully really a different person rather than your account there. I wonder, wouldn't it have been a good idea to ask for expert opinion of some active Icelandic Wikipedia contributors when resolving this dispute, such as perhaps @TKSnaevarr or the others? --Ssvb (talk) 10:31, 13 April 2025 (UTC)
I am the above IP address. I gave up on contributing as there was no end in sight of bad Icelandic contributions to review. I tried on multiple occasions to get Numberguy6 to clean up after himself but to no avail. I do not have Discord. TKSnaevarr is welcome to review Jöttur's contributions; even though most have been deleted Jöttur's userpage is representative of his competence in Icelandic. Hvergi (talk) 11:40, 13 April 2025 (UTC)
@Ssvb Nearly everything contributed by Jöttur was completely wrong, and it was repeatedly called out by others trying to clean it up. As just one example, he added the Afrikaans section on ander, and an IP later redid it with the comment
Correcting + expanding Afr. adj. inflections. They were added by someone unfamiliar with Afr. grammar, who thought all attributive forms take -e. This is quite wrong. An oversimplified rule of thumb is: (a) Polysyllables take -e unless ending in -el, -er. (b) Monosyllables take -e only when ending in -f, -d, -s, g.
This is typical of his contributions. As for the "another Discord user" possibly being Hvergi's Discord account, please assume good faith on my and Hvergi's part. In fact the user was @Anarhistička Maca, who gave me her permission (on Discord) to quote her response, and is an active Icelandic Wiktionary contributor (you specifically said "Icelandic Wikipedia contributors"; if this is intentional I don't know why it matters whether it's Wikipedia or Wiktionary). Also, depending on how bad User:Numberguy6's contributions are, I am willing to nuke them as well as I'm really out of patience with poor-quality editors who lie about their competence in a language and contribute slop. Benwing2 (talk) 09:53, 15 April 2025 (UTC)
@Benwing2 Please assume good faith on my part and try to put yourself in my shoes. I posted my comment, after analyzing the information that was publicly available. And from where I stand, nothing in User_talk:Jöttur indicated that he was "repeatedly called out by others" for the issues related to the Icelandic language skills. I understand that some other communication channels could have been used for that, but yet nobody bothered to bring this topic to the user's talk page until just a few days ago. And this is strange, considering that the Jöttur's account is not exactly new. Is "nearly everything contributed by Jöttur was completely wrong" a hyperbole or somebody's objective assessment? Regarding the Icelandic language dispute that unfolded, and without having any other information, I see that you are relying on two expert opinions. One of these experts was labelled by you as "an actual Icelandic speaker" without disclosing their identity, but now upon my request, you have clarified that it was @Anarhistička Maca with "is-2" self-assessed Icelandic language skill in her Babel box. Another expert opinion came from an IP user, who later turned out to be @Hvergi, and whose Icelandic language proficiency is currently still ambiguous due to a missing Babel box. May I kindly ask Hvergi to make a statement about their self-assessed Icelandic language proficiency? You may assume that I'm not assuming good faith, but I'm merely asking for more transparency in handling this matter. And I'm surprised that the others haven't pointed out the same.
I mentioned Wikipedia in my previous comment because it doesn't seem to be perfectly clear whether Wiktionary even has sufficient in-house Icelandic language expertise right at this moment to resolve the Icelandic language issues on its own. So active Wikipedia contributors could possibly be consulted as independent experts, of course if they don't mind. --Ssvb (talk) 10:19, 17 April 2025 (UTC)
Among other things there were several pings to Jöttur made in the edit messages of commits trying to clean up his bad contributions, which he ignored, just as he ignored my and others' messages to him. The assessment also comes from me; although I am not an Icelandic speaker, I have enough linguistic background to have written the Icelandic noun and adjective declension modules (and consider that Icelandic declension is extremely complex), and I have been around on Wiktionary long enough that I can clearly identify when contributions are full of mistakes of various sorts. Anarhistička Maca also has deep linguistic knowledge of Icelandic, which I can attest based on personal conversations with her; her self assessment in Babel is probably based on her speaking ability, not based on her linguistic knowledge of Icelandic. I can spell out in gory detail all the errors but I don't see the point; ultimately either you trust my judgment or you don't. I welcome Wikipedia contributors with Icelandic knowledge to check Jöttur's contributions, but keep in mind they may not know Wiktionary's standards and rules, which are very different in many ways from Wikipedia. Benwing2 (talk) 21:32, 17 April 2025 (UTC)
Support. Numberguy6 (talk) 17:21, 15 April 2025 (UTC) Feel free to delete/downgrade Icelandic (and all the other languages) from my box. I'm not lying, but rather misunderstanding: I've always assumed that Babel is equivalent to how well one speaks a language (and if it's not, then someone should put that on the page), and I can speak Icelandic fluently (ref). Of course, I wasn't this good when I started contributing, but I've improved a lot over time, which is why I keep thinking "I made mistakes before, but I won't make them anymore". I've also contributed in many other languages (which I haven't tried to become fluent in), and been blocked for a month over that. The problem is that it's just too hard to know how good I am at contributing (which is a problem I've been on the other side of countless times on Wikipedia: "I know you learned how to write papers in school, but this is different."). Allowing others to edit one's Babel would be a great first step towards fixing this problem. As for next steps, I'm thinking of a process similar to AfC on Wikipedia or the Test Wikidata, where new users (and existing users learning new languages) can write entries and then have them reviewed by experienced users; if they pass the review, then they can contribute.
@Numberguy6 FYI, there's the Wiktionary:Babel page, which explains what each level means and "fluent" is supposed to be level 3. Being able to speak fluently without feeling that the language skill restricts your ability to express yourself doesn't mean that what you say is always grammatically correct. And there's a foreign accent too. My English is very likely not worse than your Icelandic. But the "near native" level 4 likely requires being truly indistinguishable from a native speaker. Which might be possible if, for example, somebody relocated to a new country at a very young age. And there are bilingual countries too, where everything is much more complicated. --Ssvb (talk) 18:47, 17 April 2025 (UTC)
I just realized that Wiktionary's Babel system only goes up to 4, while Wikipedia's goes up to 5. Since I've always interpreted 5 (not 4) as "indistinguishable from native", I've been setting my own Icelandic level at 4. Numberguy6 (talk) 19:26, 17 April 2025 (UTC)
This doesn't seem to be documented yet, but Wiktionary's Babel system actually goes up to level 5. You can see this, for example, by using edit preview. Level 5 is defined as "professional", and I assume this level is reserved for individuals with exceptional language skills, such as professional linguists, professional translators, and authors of notable literary works – people whose proficiency far exceeds that of the average native speaker. --Ssvb (talk) 02:45, 18 April 2025 (UTC)
This is correct; 5 is "professional" level which means you work with the language professionally. @Numberguy6 please set your Icelandic competency to 2 or 3 as it's clear you don't have near-native proficiency. Benwing2 (talk) 05:28, 18 April 2025 (UTC)
To be frank, I think we need a way to measure one's ability to add information, i.e. whether they are aware of the various linguistic things that matter when making an entry, not just fluency and ability to speak. I know many fluent L2 speakers of, say, English, who don't have much philological or linguistic knowledge. Vininn126 (talk) 21:04, 17 April 2025 (UTC)
I therefore weigh in analysing written texts and knowing the whole grammar and typical manners by heart more than listening comprehension, speaking ability, and writing skills, which would otherwise have to be put into the basket together with reading comprehension to formulate decorations in general society. Then again even natives cannot plead their own language like most court interpreter, so what does near-native (this is a WT:COALMINE, like non-native) even mean? For a scientist it counts; for a kind of clerk supporting a business—there was a profession of foreign language secretary popular once—the writing may be where the money is, and then our perspective is slanted to business writing to the disadvantage of academic writing and conversational writing, while strikingly different “skill-sets” are sought in a call center—but we don’t count the scammers!—, and for some general reason, not merely technicalities, e-mail and phone support is done in separate corporate departments. There is lots of material to argue, either way. Fay Freak (talk) 21:51, 17 April 2025 (UTC)
In many pages in the thesaurus namespace, editors decide to mark groups of words with a different sense (say, all vulgar vs. all neutral) with a heading. There is currently no standard way to implement these, though most use a pseudoheading created using italics or bold formatting. I wonder, then, how they should be formatted: a pseudoheading, a level 5 heading, or something else? Juwan (talk) 14:52, 12 April 2025 (UTC)
I would like to request extended mover rights for moving several entries from IPA to something more proper. There's only 10 + 1 entries but I would rather not bother others to clean up the mess I had originally created. – wpi (talk) 16:31, 12 April 2025 (UTC)
Moving archaic, obsolete, rare and uncommon meanings to the end
I guess that when someone uses Wiktionary they are probably more likely to want to see modern popular meaning first rather than archaic or rare ones. In that case, is it possible to automatically move all the obsolete and rare meanings (in all entries) to the end of the list?
We don’t do automatically. Against the likelihood of what someone wants to see there are issues like we don’t always know this, or frequencies, and it is unclear how diachronic perspective should be weighted against synchronic views—what to do with a once common term now only used in a marginal specialist sense?—, then in the end we give senses some logical order to have better presentation notwithstanding frequencies. Sorting by likelihood is specious, but it sometimes happens in place of complete arbitrariness. Fay Freak (talk) 00:27, 13 April 2025 (UTC)
I prefer chronological order when the evolution of the senses is clear. In complicated definitions I have tried to group related senses. For ghost the senses "disembodied soul" and "human soul" are closely related and I would group them together, likely as subsenses, if I cared enough to edit the page. Vox Sciurorum (talk) 13:09, 13 April 2025 (UTC)
I support putting archaic senses last. It is comparatively less useful to tell the reader "this is what the word meant 500 years ago" vs "this is what the word means now". Especially when there are long lists of definitions and it's not easy to immediately single out which one is still relevant. — BABR・talk18:34, 13 April 2025 (UTC)
While we generally put archaic and obsolete senses after the current ones, I don't think this should be a strict rule because sometimes putting the archaic and obsolete senses first indicates how the meaning has evolved over time, especially when earlier etymons of a word have a certain meaning, and the current meaning of the word seems different and unconnected. — Sgconlaw (talk) 18:45, 13 April 2025 (UTC)
I reverted the change a user made to ghost#Noun. I believe the sense that is once more the first has somewhat broader use than the previous labels ("dated, obsolete") wrongly (IMHO) indicated. I certainly agree that it is not the most frequent use, but I also doubt that many other than the most hasty users would be confused because it appeared before the more frequent uses. Frankly, English Wiktionary is not really a suitable online dictionary for such a user. I believe we have already discouraged such users (and probably some "normal" users, too) by the complexity of our entries, the bulk of etymologies and pronunciations appearing before definitions, etc. DCDuring (talk) 18:57, 13 April 2025 (UTC)
I don't agree with that revert, especially since there was an active, ongoing discussion about it when you did so. Regardless, you have to remember that "power users", like yourself, are not representative of the average reader. Most "power users" are on desktop, but most of our readers are on mobile, for example. I don't see how it is more helpful to the reader to see dated or historical usages of a term before the most common modern usage (that they are more likely to need).
However, I agree with Sgconlaw that we probably shouldn't make a hard rule about how senses should be listed. — BABR・talk19:57, 13 April 2025 (UTC)
I don't know that it is true that a typical reader is "more likely to want to see modern popular meaning first". They might well be looking up a word with an archaic meaning because they are reading something old, and the use of the word in the old thing does not correspond with modern usage. bd2412 T 18:59, 13 April 2025 (UTC)
This is a primordial debate that has been waged over the decades from the beginnings of Wiktionary. Each has its own adherents and each has good arguments behind it, so neither has prevailed. The difference is basically between having the arrangement tell a story or show the logic behind the development of the sense, on the one hand, or having the arrangement help the reader find the things that they're most likely to want to find.
The problem is that the entries are often far too complex to reduce to an algorithm. For one thing, we have things separated by etymologies. Since Wiktionary is organized by spelling, we have to deal with wound, the past tense of wind, and wound, an injury (with a verb that comes from it). Likewise, wind has the present tense of wound and the movement of air (again with a verb that comes from it). Having the most common sense of each right next to each other would be confusing, so we would have to settle for arranging the senses within the etymologies. Even there, the senses within an etymology have subsenses. The extra verbiage needed to provide the information the reader would get from the sense/subsense arrangement would add to the clutter in our already quite cluttered entries for common terms.
In the end, we can't rearrange things to completely fit either philosophy, and we're likely to make a mess of things if we try. Chuck Entz (talk) 20:56, 13 April 2025 (UTC)
Not to mention less-common subsenses in highly polysemic words. In principle we could try to selectively hide definitions based on subsense status and label, but that would be very difficult, possibly impossible (eg, subsenses that don't have an explicit substitutable supersense definition.). DCDuring (talk) 18:25, 15 April 2025 (UTC)
The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. This annual review of the EG and Charter was planned and implemented by the U4C. Further information will be provided in the coming months about the review of the UCoC itself. For more information and the responsibilities of the U4C, you may review the U4C Charter.
Please share this message with members of your community so they can participate as well.
I recently wasted a minute or so because I could not distinguish bumbag (bumbag) from burnbag (burnbag). My display is large and Windows scaling is at 150%. Further scaling helps, but surprisingly little. I use Vector legacy, but didn't get better results on other skins that were otherwise tolerable. I don't recall other character pairs that cause a problem.
Is there a way to select a better ("more visually accessible") font, preferably not a monospace one, or better kerning in personal CSS or JS? What is the appropriate place to whine about such an "accessibility" issue? DCDuring (talk) 12:10, 18 April 2025 (UTC)
Why not just choose any font you want using either personal CSS, as you say, or browser settings? I cannot be bothered to check other themes but at least as of Vector 2022 English text just uses the default browser sans-serif font. (Which is good, as I would be very much against Wiktionary shipping its own English font.)
@DCDuring yes, Lunabunn is right, Wiktionary (and Wikipedia, etc) has always used your default browser font for body text. Or you can do the override in your personal CSS:
Firefox provides for adjustment of character spacing, but I'll look for a font that addresses the problem of the specific pair r,n vs. m. DCDuring (talk) 14:51, 19 April 2025 (UTC)
Lucida Sans Unicode is an improvement. Good enough for now. Thanks for getting me pointed in the right direction. DCDuring (talk) 16:18, 22 April 2025 (UTC)
Windows has good font-choice features, including a good number of fonts. The Braille Institute has "Atkinson Hyperlegible Next" which doesn't fully handle my problem, but may be better overall. DCDuring (talk) 17:14, 22 April 2025 (UTC)
Pinyin-derived English Language Terms: Approximate Origin Dates
Many Pinyin-derived English Language Terms originated in the "Mid/Late 20th c.", which is what I plan to put in the etymology section of any Pinyin derived word if: the word appears in "Shabad, Theodore (1972) “Index”, in China's Changing Map, New York: Frederick A. Praeger, page 345". My rationale is that the initial uses or mentions of many of the words in the Index are going to be somewhat obscure, and I might say "Late 20th c.", but if the words were known of in that 1972 Index, they could easily have been used at some point in the 1960s or even late 1950s, see especially Citations:Beijing, which clearly existed in 1958 (though I have found no evidence of it in 1957). However, as for words that do not appear in that index, I do not assume they exist at all before 1979; they are an open question to me whether they would have happened in the 1960s or early 1970s. I will leave those without any dates for the time being (unless they have a clear later date, like Xiong'an.) I will try this dating scheme, and build on the developments that grow from it; please let me know if you have any insights or comments. EDIT: No, I think I'll do something like "circa late 20th c." c.Late 20th c., for all these so I don't create an artificial distinction between words in and not in that Index, but I write circa because technically some words have a real chance to exist between 1958 and 1967, albeit hyper rare. Unless I have some other more specific info, and if I know the word existed in the 20th c. and could theoretically have been created in 1958 though probably wasn't well known until 1979 and there's no evidence of it between 1958 and 1967 (counterexample, cf. Guangzhou), I'll use this. I've done about 15 test cases just now--, see if you have any objections: Zhongsha, Xisha, Ritu, Xian, Sanmenxia, Atushi, Ningxia, Quanzhou, Shanxi, Wuxi, Jixi, Zhangzhou, Shashi, Wulumuqi, Haerbin. 
I'm acknowledging the possibility of very early period usage or mentions, but I'm not forcing it when it actually may not have happened. --Geographyinitiative (talk) 10:54, 20 April 2025 (UTC)
We have a problem
You see, it doesn't make sense to call Old English a different language. Languages are like people, they don't become different languages. Even though it's drastically different, calling Old English a different language is like calling you when you're 15 a different person from when you're 2. Same thing with Middle English. — This unsigned comment was added by 2603:9000:e102:e587:4443:782f:4e79:f9f5 (talk).
There is no single determiner between one "language" and another "language" or between various lects within a language (see chronolect, dialect, topolect, etc.). One common and handy rule of thumb is if the two are mutually intelligible. If you personally time traveled 900 years back to the days of the Angles and Saxons in England shortly after the Norman Invasion and you said "Hello, I am from the future" in your current tongue, they would have no idea what you're talking about and you couldn't understand anything they said as well. Take a look at a copy of Beowulf (written 1,000 years ago in Old English) or Chaucer's Canterbury Tales (written 600 years ago in Middle English) and tell me if it makes sense to you as a modern English speaker. The latter will have several reasonably similar passages and you can stumble thru a lot of it. The former will be virtually like Greek. It is common and reasonable to call the "Old" versions of a language something that is either so different from the current one as to be a separate language or a clearly demarcated chronolect that a contemporary speaker could not understand or could only understand a little with great difficulty. See also, e.g., Vulgar Latin leading to Old Spanish to our current Spanish. —Justin (koavf)❤T☮C☺M☯23:41, 20 April 2025 (UTC)
By your standard, French, Spanish, Italian and Romanian are all just Latin. For that matter, all the language families would consist of a single language each- are we communicating in "Indo-European" or "English"? England has been a single nation located in the same place under basically the same name since the latter part of the Old English period, and even invasions of Old Norse and Old French speakers didn't change that. That makes it easy to call the language(s) spoken there by the same name, which leads people like you to assume that they're inherently the same thing- in some ways they are, and in others they aren't. Of course, there's room for debate as to whether it's better to treat Old, Middle and Modern English (not to mention Scots) as one or multiple languages- but not because it's impossible for them to be anything but one language. Chuck Entz (talk) 01:17, 21 April 2025 (UTC)
Precisely. The fact that they all have the word "English" in the name does not imply we have to treat them as the same language. Theknightwho (talk) 01:28, 21 April 2025 (UTC)
Think about it like this: would you be requesting this merge if the name we used for Old English was "Anglo-Saxon" instead? Theknightwho (talk) 13:13, 21 April 2025 (UTC)
I couldn't have said it better myself. Actually, what the original poster said should be extended as far back as possible, for the proposition to reach its logical consistency, that is, to Proto-Indo-European. It doesn't really make sense to treat all its dialects as separate "languages". We should put all different deviating and innovative senses a pIE word has developed under the pIE page, and, if needed, add appropriate labels for more recent pronunciations, appended with specifications for when and where such and such word was pronounced this or that way. That would be an endeavor worthy of a dictionary aspiring to be called "etymological". Make Dargwa great again (talk) 20:14, 24 April 2025 (UTC)
As a Georgian, I’ve been saying this for years. Look what I have to do with my Babel box when it’s really just ka-3, ine-2/3. Nicodene (talk) 13:33, 31 May 2025 (UTC)
Request to become interface administrator - User:Theknightwho
Hi - as per Wiktionary:Interface administrators, could I please be added as an interface administrator? This would mainly be to deal with languages and scripts, as well as scripts like MediaWiki:UpdateLanguageNameAndCode.js. What prompted this request is the fact that I am putting together a series of modules for keeping our Unicode data up to date, which would work by extracting the data from Unicode text files saved in raw modules (e.g. Module:Unicode data/raw/DerivedCombiningClass.txt), and constructing it into a suitable format that can be accessed by other modules. This isn't possible without JavaScript. Theknightwho (talk) 13:11, 21 April 2025 (UTC)
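For readers unfamiliar with the kind of extraction described above, here is a rough sketch of turning a Unicode data file into a lookup table. It is written in Python purely for illustration (the request itself concerns Lua modules and JavaScript), and it assumes the usual semicolon-separated layout of files such as DerivedCombiningClass.txt. The sample lines and the name `parse_ranges` are hypothetical and not taken from the actual modules.

```python
from typing import Dict

# Illustrative sample in the "range ; value # comment" layout used by
# Unicode data files; included here only for demonstration.
SAMPLE = """\
# A comment line that should be ignored
0300..0314    ; 230 # Mn  [21] COMBINING GRAVE ACCENT..COMBINING COMMA ABOVE RIGHT
0315          ; 232 # Mn       COMBINING COMMA ABOVE RIGHT
0316..0319    ; 220 # Mn   [4] COMBINING GRAVE ACCENT BELOW..COMBINING RIGHT TACK BELOW
"""


def parse_ranges(text: str) -> Dict[int, int]:
    """Map each code point to the integer value given on its line."""
    table: Dict[int, int] = {}
    for raw in text.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        span, value = (part.strip() for part in line.split(";", 1))
        # A span is either a single code point or "first..last" in hex.
        first, _, last = span.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for code_point in range(start, end + 1):
            table[code_point] = int(value)
    return table


table = parse_ranges(SAMPLE)
```

Expanding every range into individual code points keeps the sketch short; a real module would more likely store the ranges themselves to save memory.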
(pinging @Vininn126, Lunabunn from a Discord discussion a few weeks ago):
The current setup of usexes has three templates: {{ux}}, for multiline; {{uxi}}, for inline; and {{uxa}}, which automatically switches between the two. IMO this is too liable to be poorly used. Generally uxa is best for most situations and the default behaviour of ux should reflect that; we should not have 150 character inline usexes nor should we have 10 character multiline usexes.
I think it's best, then, that we switch ux to have the behaviour of uxa by default and retire uxa and uxi. If, for some reason, the formatting needs to be manually overridden, then we can add parameters to ux to do that.
On a technical level, as well, this is relatively trivial to do: change ux to use the same code as uxa, and run a bot job to switch calls of uxa and uxi to use ux. Saph (talk) 13:53, 21 April 2025 (UTC)
Support — @Saph: I was not previously aware of {{uxa}}, but I note that {{ux}} already has an |inline= parameter which, if the template's documentation is accurate, can be used to switch from the template's default behaviour (always-multiline) to {{uxa}}-style automatic switching or to the always-inline presentation of {{uxi}}, depending on the argument supplied. IMO, we should make {{ux}} act like {{uxa}} by default, make |inline= Boolean for switching to always-inline, and have |multiline= (or something similarly intuitive) as a Boolean parameter for switching to always-multiline. That is what I take your proposal implicitly to entail anyway. 0DF (talk) 14:48, 21 April 2025 (UTC)
@Lunabunn, Saph: Please consider accessibility for non-coders’ sake, choosing intuitive parameter names and ideally idiot-proof inputs. Shortening |inline= to |i= is a particularly bad idea, given the number of templates that use |i= to activate italicisation. 0DF (talk) 21:07, 22 April 2025 (UTC)
@0DF I also don't particularly see the need for an alias FWIW because I thought the entire point is to eliminate the need to specify in most cases. In the occasional edge case where it is still needed, a few extra characters shouldn't be an issue. That being said, Saph did say aliases; I don't see her proposal hurting anyone, either. 🌙🐇⠀talk⠀⠀contribs⠀21:28, 22 April 2025 (UTC)
Strong support. Hard-coding one or the other is an active hazard for accessibility across different environments. I am working on a better inlining heuristic for {{uxi}} that takes into account viewport width, which should hopefully further improve UX (pun intended). 🌙🐇⠀talk⠀⠀contribs⠀18:30, 21 April 2025 (UTC)
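To make the proposed behaviour concrete, here is a minimal sketch of the decision logic discussed in this thread: automatic switching by default, with Boolean overrides for always-inline and always-multiline. It is in Python for illustration only and is not the template's actual code. The 60-character threshold and the names `choose_layout` and `INLINE_MAX_CHARS` are assumptions; the thread does not say what cutoff {{uxa}} uses.

```python
# Hypothetical cutoff: the thread does not state the real threshold.
INLINE_MAX_CHARS = 60


def choose_layout(text: str, inline: bool = False, multiline: bool = False) -> str:
    """Return "inline" or "multiline" for a usage example.

    With no override, short examples go inline and long ones multiline.
    The two flags mirror the Boolean parameters suggested in the thread.
    """
    if inline and multiline:
        raise ValueError("inline and multiline cannot both be set")
    if inline:
        return "inline"
    if multiline:
        return "multiline"
    return "inline" if len(text) <= INLINE_MAX_CHARS else "multiline"
```

For example, `choose_layout("a dog")` picks the inline form, while a 150-character example falls through to multiline unless `inline=True` is passed.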
The only bot that has consistently added audio files to Wiktionary pages has been blocked for months now with no indication that the issue(s) will be resolved. Meanwhile, Wiktionary:Approved Lingua Libre users is growing slowly but steadily, and Commons users continue to upload files under the xx-foo.ogg nomenclature. Audio pronunciations are too valuable a resource to be added by hand a few at a time. To give an idea of what we're missing out on, there are more pages in commons:Category:Dutch pronunciation than results here for the search "terms with audio pronunciation". Is anyone willing to create or adapt a bot for the purpose of importing audio files? Ultimateria (talk) 19:11, 21 April 2025 (UTC)
It's too bad that User:Derbeth has not been willing to do this work. On the surface this sounds simple but there are some potentially tricky issues to work out:
Ensuring that we don't auto-add audios when there are multiple pronunciations specified for a given term, as we won't know which audio goes with which pronunciation.
Ensuring (or trying to ensure) that we don't re-add audios that have been previously deleted.
Additional language-specific tweaks; e.g. we may want to entirely exclude languages written in the Arabic script at first due to the underspecified vowels.
I agree. I recall one annoying issue was that pronunciations that were identified as incorrect (for example, stressed on the wrong syllable) kept getting readded by the bot. Any new bot will need to have a way to avoid this. — Sgconlaw (talk) 05:55, 22 April 2025 (UTC)
I think this can be implemented by looking in the page history to see if the audio was already added. Probably the best way to do it is to include the name of the file and language in the changelog message, in a particular format such that the bot can identify just based on the changelog message that it previously added the same audio. AFAIK, it's pretty fast to retrieve a list of the last 500 commits (including commit messages) to a given page, but slower to check the contents of the commits, because each such commit has to be retrieved individually. Benwing2 (talk) 06:04, 22 April 2025 (UTC)
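The edit-summary idea above could be sketched roughly as follows. This is only an illustration in Python: the summary format, the `[audio-bot]` tag, and the function names are all made up, and the list of summaries is passed in directly rather than fetched from the page history, so no particular API is assumed.

```python
import re
from typing import Iterable

# Hypothetical machine-readable summary format; the filename goes last so
# that it may contain spaces.
MARKER = re.compile(r"\[audio-bot\] lang=(?P<lang>[\w-]+) file=(?P<file>.+)$")


def make_summary(filename: str, lang: str) -> str:
    """Build the edit summary the bot would leave when adding an audio file."""
    return f"[audio-bot] lang={lang} file={filename}"


def previously_added(summaries: Iterable[str], filename: str, lang: str) -> bool:
    """Check past edit summaries for a marker matching this file and language."""
    for summary in summaries:
        match = MARKER.search(summary)
        if match and match.group("file") == filename and match.group("lang") == lang:
            return True
    return False


# Example history: the bot added the file once and a human later removed it.
history = [
    "fix typo",
    make_summary("Nl-hond.ogg", "nl"),
    "rm audio: stressed on the wrong syllable",
]
```

If `previously_added(history, "Nl-hond.ogg", "nl")` is true and the file is no longer on the page, the bot can assume a human removed it and skip re-adding it. This only looks at summaries, so it matches the point above about avoiding a fetch of each revision's contents.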
Provisions on Sicilian Entries
Catonif, Nicodene, Scorpios90, Medellia, Afc0703, Benwing2: if you know anyone else who edits Sicilian (bonus points if they're native speakers), please ping them as well. Benwing2 is pinged because this topic, aside from probably being of interest to them, also touches on templates.
I think it's necessary to make a few express decisions on Sicilian entries, and to ultimately create a Wiktionary:About Sicilian page, as Sicilian has already posed some challenges before. I'd like to reach some official decisions around 3 thoughts I had:
Could vulgar/ad hoc spellings possibly be added as pages? (directing visitors to the CS spellings) (CS: Cademia Siciliana)
Pointing out the official templates (especially for verb conjugations and pronunciations)
Labelling narrow pronunciation transcriptions and alternative written forms by dialect?
I'd like to spend a few extra words on each of these, to get the conversation going and to give everyone an idea of what my initial proposals are so you can put your ideas on the table. Afterwards you can also raise other concerns you've had so we can make as many useful decisions as possible.
Including "vulgar" or ad hoc spellings with redirects and labels
You might be asking yourselves what I mean by "vulgar/ad hoc spellings". I don't have any other name for them, but as you know Sicilian is (wrongly so!) not taught or officially used in school, which in effect leaves its speakers illiterate in Sicilian. Now, a lot of young people use a modern Sicilian to communicate, and verbally there is no problem. In writing, however, I see my peers having to eye-dialect their way through the words they use, leading to what I call vulgar or ad hoc spellings. These can be found on social media, in chats, but also in places such as restaurants that may have their name in Sicilian (although often those try to use more CS-like spellings in stuff like their menus, but that's beside the point). Here are some examples of this spelling that I commonly see used in my school class group chat (Gelese Sicilian): macna for màchina, po for pû, itve for jìtivi, femmna for fèmmina, vene for veni, foco for focu. I'm fairly confident these spellings are influenced by Neapolitan song names and lyrics being written in this sort of way. Examples in the wild: po culo, pizz, femmna (Gela) (P.S. the restaurant might actually be traditionally Neapolitan, from skimming their page), como vene si cunta.
Now, the examples I found are by no means many and while I'm sure I could find some more (I know one place I could look), it does seem like there's less of this sort of writing on the indexed web than I thought. I'm not sure what other people who grew up in Sicily will say, whether they've had a similar experience or not, but at least in Gela, and at least in these recent years, every teenager writes in Sicilian like this. Do you think these spellings should be included in our overall project (with appropriate labelling as ad hoc spellings)? I would personally discard any non-popular spellings of words from being covered as basically nobody will use them (I'm talking about like, idk, spelling cunigghiu as kunigghju and stuff like that). CS should remain the "official" orthography for Sicilian entries on Wiktionary.
Pointing out the official templates (especially for verb conjugations and pronunciations)
There appear to be 9 templates for Sicilian verb conjugations (Category:Sicilian verb inflection-table templates). The number of Italian ones? 1. (Category:Italian verb inflection-table templates). It would definitely be better if we could do the same with Sicilian verbs, and use only that one template. I can help with this point, if morphological help is needed!
I can also help for a uniform, phonemic, broad transcription pronunciation template, if phonemic help is necessary.
Labeling pronunciations and forms by dialect or region
Pronunciations on Sicilian entries are currently most often narrow transcriptions (P.S. on closer inspection, I'm no longer sure this is the case) of different and unspecified dialects, e.g. the pronunciations in arrè and babbaluci look to me like they're central or conservative, while I'm very sure the pronunciations in aḍḍumari and astutari are another accent. I don't think anyone would be against labelling all narrow transcriptions (and possibly also audio pronunciations) by the accent represented, either by the name of the town, e.g. "(Gela)", or by the name of the dialect/accent associated with the location, e.g. "(Gelese)", or by the type of Sicilian (Eastern, Western, South Eastern, Central...) if the exact location is unknown. Whichever way you people like. While we're at it, we could also label alternative forms the same way, by accent (e.g. jattu could be labelled "predominantly Eastern Sicilian, Catania").
As a final word... this language is a mess on Wiktionary, between obsolete spellings in page contents or even titles, stub pages, missing information and haphazard coverage. But hey, if we make some decisions, we can include them in an About Sicilian page, because imo it could be a good way to promote a standard of quality and good information across Sicilian entries. What do you all think? Crunchy Cloaky Crackdown (talk) 00:53, 22 April 2025 (UTC)
@Crunchy Cloaky Crackdown Yes, Sicilian is unfortunately a mess. @Nicodene and I did some cleanup of pronunciations, esp. the narrow ones, which were often simultaneously incorrect and over-detailed. It would be great if you're willing to clean some of the mess up. As for point #1, these sorts of spellings can be included, yes, but they should be properly sourced (which may be a bit tricky, as Facebook, Twitter and the like don't count as valid sources). As for #2, I'm pretty busy now with all sorts of requests so I can't commit time at this point to writing a Sicilian verb module, although it should definitely be possible to create one by modifying the Italian module (or, possibly better, by starting with the Spanish module; the Italian module is somewhat complex in order to handle all the irregular and obsolete forms sometimes found in standard Italian, which might not be needed for Sicilian if we only want to cover a single modern standard). As for #3, I'm pretty sure Nicodene prefers broader or even phonemic pronunciations. My general preference is for "lightly phonetic" pronunciations, which means that some salient allophonic features may be represented if they're non-obvious to a language learner, but mostly you should follow the phonemic form. Definitely if we include narrow details they need to be tagged with the appropriate accent identifier. Benwing2 (talk) 05:52, 22 April 2025 (UTC)
@Benwing2 for #1: you say sourcing for the ad hoc spellings is necessary? Does that mean having a link to a source that includes this spelling in the "References" section of the entry? (Also, why are social media (like Facebook or Twitter) not valid for this? Where else should one look?)
For #2, given what you say I might start fiddling by myself later, at least with the conjugation template.
For #3, I forgot to make this clear but yes we should definitely prioritize broader or phonemic transcriptions over specific realizations like you and @Nicodene prefer. Either way though if we make a pronunciation template like I proposed this problem will basically vanish forever anyway. Crunchy Cloaky Crackdown (talk) 12:45, 22 April 2025 (UTC)
Hi. If you’re curious, here is the previous thread on Sicilian pronunciations. A major problem that @Catonif and I had was that it’s unclear which pronunciation we should take as ‘standard’. The issue could in principle be side-stepped by simply tagging all given pronunciations by location, I suppose, at the cost of leaving a backlog of some hundreds or thousands of already-added pronunciations with no such tag.
Have the Sicilian Academy published any sort of ‘orthoepic’ recommendations? That could form the basis for a pronunciation module I suppose. Alternatively, if there exists a detailed phonetic description of, say, Palermitan (or whatever speakers tend to regard as a prestige variety) we could use that as our in-house standard.
As for adding commonly occurring non-standard spellings as alternative forms: yes, I think they deserve to be documented. Nicodene (talk) 18:46, 22 April 2025 (UTC)
Yes, Cadèmia Siciliana has a whole book on its orthographic proposal, Proposta di normalizzazione ortografica comune della lingua siciliana (first edition, 2017), available on the website for free—as it so happens I just started reading it last week! I'm not sure if they have worked on a second edition, but they have a periodical, some of the articles in which are orthography-related (though I haven't looked much into these). As for your question, I'm not yet sure if they actually prescribe standard pronunciations or if they're even concerned with that, but of course any preference of orthography (even if the system is for something "objective" like phonemic IPA) will inevitably favor some pronunciations over others. — Ganjabarah (talk) 22:34, 22 April 2025 (UTC)
@Catonif Ya I read that thread to get informed before posting this whole thread! Personally, I don't think we should consider any pronunciation at all as 'standard'; I don't think there's any point in doing that, unless I'm missing something. Meanwhile, as you also hinted in your own 2022 post, Sicilian words share the same phonemic base across all dialects: I'm in favor of always including that phonemic transcription. In addition, any other regional pronunciations, transcribed broadly but not phonemically (so possibly containing phones absent in Sicilian phonology, like /ʔ/), might be added, and alongside those, in case a native speaker (or a researcher) wishes to add them, (accent-labelled) narrow, allophonic transcriptions local to specific areas might also be added. For example gattu could have, in its Pronunciations (pseudo printed output):
Or something like that (with hyperlinks too of course, not sure how that could be gotten to work). The only complaint then could be that it's a tad long, but to balance that out it's decently informative and does contain all the information one should need.
However yes, as you point out there's been a lot of narrow transcriptions that have been added in the past and we, erring on the side of caution, could never know and add the exact location for any and all of them, resulting in a painful backlog as you say. Maybe a half-solution could be to mark these pronunciations with some request for verification as to what accent it is? But I'm not too sure how that truly works and if I've got my head in the clouds saying this.
By the way I searched, and the CS has not released anything on pronunciation as far as I can see. Even then though, I should definitely be able to help a decent bit for a pronunciation module myself anyway. Sicilian phonology and phonemics aren't very complex, even with written word to phoneme conversion in mind. That being said I do have a few doubts about a few sounds and if they constitute different phonemes (off the top of my head, I have difficulty coming to terms with the "soft c", amongst , , , and even ), but I'm probably thinking that with some research and paying attention when people speak I should be able to make sense of things. Crunchy Cloaky Crackdown (talk) 22:45, 22 April 2025 (UTC)
Yes it's very possible to have appropriate hyperlinks added to accent qualifiers, and I don't have any objections to long pronunciation sections; if necessary we can simply hide some of the pronunciations by default (we follow the same approach for Spanish; see cebolla for an example). As for marking pronunciations as needing verification, that could be done too, either with an existing template like {{rfv-pron}} or a new Sicilian-specific template. In either case the term would be added to a cleanup category. My main concern with this approach, however, is that the terms may never be cleaned up; historically, cleanup categories have tended to languish unless there's someone particularly diligent about going through them. An alternative is I could generate a page listing all the existing pronunciations, and someone like you or @Nicodene or @Catonif could mark all the questionable ones, and I can have a bot go through and delete them. I'd rather have no pronunciations than pronunciations that are sketchy, questionable or clearly wrong. We followed a similar approach for Manx for several thousand bad lemmas added by Embromystic. Benwing2 (talk) 23:23, 22 April 2025 (UTC)
@Benwing2 Oh that's useful info, so we can definitely make Sicilian prons look pretty and well implemented then. As for the cleanup operation, are you referring to narrow transcriptions, broad, or both? I've looked around some of the words in Sicilian that have IPA pronunciations and most seem fine to me*, apart from some that have transcriptions with sounds that I haven't heard before (might be attributable to inexperience in the language). Could you show me some examples of pronunciations you think look sketchy or questionable as you say? And if it's that many then we might consider mass deleting pronunciations like you suggest. I understand though that people adding misinformation on small languages is very common on Wiktionary so I wouldn't be surprised.
*I will have to say though, the stress on some entries actually is wrong, and I also often think some audio pronunciations sound like the person just read the Sicilian word in an Italian accent and that was it (again, maybe I'm inexperienced and people actually pronounce it like that somewhere, but I find that very, very unlikely from my overall experience). Crunchy Cloaky Crackdown (talk) 21:35, 23 April 2025 (UTC)
@Crunchy Cloaky Crackdown I'm referring to narrow transcriptions, many of which on first glance looked wrong to me (in the conversation linked by @Nicodene). But I'm not actually sure whether they're wrong, I'm just guessing based on this along with what you said (that many of them are unidentified as to accent) and the fact that, as you note, it's very common for people who don't know what they're doing but think they do to add incorrect info to less-known languages. Likewise for the audio pronunciations; if there are a lot of incorrect ones and we can identify a pattern either in the contributor or the format of the audio filename, we can mass-delete them. Benwing2 (talk) 22:05, 23 April 2025 (UTC)
To me those pronunciations specifically, on the old 2022 thread, look passable (although of course they're of unidentified accent) and realistic enough. As for the audio files... it turns out that most of the ones that have been added (which are still very few, consider only A, B, C, and S lists exist) were added by me... that leaves a few remaining which are the ones that rubbed me the wrong way: biḍḍizza, beḍḍu, Sicilia, which are all by the same recorder, @User:Àncilu. On their own user page their Babel says scn-4, but their recordings (I went out of my way to see some of their own Sicilian recordings on Wikimedia (scroll down)* + another page (Ctrl+F and see the last few ones, like "a fini" and "a cunnizioni i"); none of them sound convincing or even slightly believably Sicilian, and they don't even really get the sounds of Sicilian accurately either, especially ⟨ḍḍ⟩ which they treat as if it were ⟨dd⟩) make me doubt they have that level of competence at least in pronunciation. Next there is an editor that leaves me a little perplexed with the narrow pronunciations they add, also because they're a red name and as such they don't have a Babel: @User:Inqvisitor. I'm concerned about them because they generally add pronunciations that I find a little weird (personally, it's the ones having in them like ballari, or also tulimaicu with how they didn't split into , but I might just be finding reasons to be over-critical), and they've also said a very weird thing, if you look at beddu's history: the edit summary on the 13th July 2023 (they've done a similar thing in biddizza, same date). The reason I find this statement contestable is mainly the "plus /ḍḍ/ is not even standardized scn orthography for " statement, which if we go by CS is absolutely not true? Also, ⟨dd⟩ and ⟨ḍḍ⟩ are two totally different sounds and it seems weird to not want to differentiate them.
Again I might just be trying to find reasons to be skeptical, and that can be done with anything in life, but there really is a difference between a blue name editing and openly not including a Sicilian level in their Babel (therefore admitting they might be making mistakes editing, e.g. you guys who are very noble in this) and a red name you don't know any background about.
@User:Hyblaeorum seems to make good edits, and the narrow pronunciations they add are very un-complicated and generic.
*Their Italian recordings don't sound like they have a Southern accent either (the most southern it sounds to me is Naples, but if I had to say one accent I'd say Tuscan. Possibly purposefully recording a Standard Italian accent?), and there are also some Russian recordings too which don't sound very convincing (seemingly missing palatalization sometimes, vowels not being realized as specific allophones based on context); both of these are in the same page as linked, by the way. Crunchy Cloaky Crackdown (talk) 16:09, 24 April 2025 (UTC)
@Crunchy Cloaky Crackdown Iu parru lu sìculu (pirchì lu mè nannu veni dâ. Nun haju pirò n'accentu/prununza sempri bona pirchì nascivi ntê Pugghî e abbitava pi 17 anni n Vaḍḍi d'Aosta, supratuttu pâ ḍḍ. Siḍḍu vuliti scancellu di ccà tutti li mè file di prununza n sicilianu. Pû russu, è na lingua ca iu studiai pirciò la mè prununza nun è pirfetta. Àncilu (talk) 17:08, 24 April 2025 (UTC)
Ah, capisciu. Cumunca se, lu putìssitu fari di scancillalli, ma prima vulissi sèntiri si chistu va bonu pi l'autri :)
Ah, I understand. By the way yes, you could indeed delete them, but first I'd like to hear if this is fine for the others :)
@Benwing2, @Àncilu is talking about deleting all his audios in Sicilian (either me or them could provide a translation of his reply if you need (Google Translate + slight guesswork should be enough), but I'd still prefer to keep this conversation in English) Crunchy Cloaky Crackdown (talk) 22:35, 24 April 2025 (UTC)
@Crunchy Cloaky Crackdown Thanks. Google Translate does an OK job but leaves out entirely the sentence where @Àncilu says it's OK to delete his audios (I assume that's what he's saying). I'm fine with removing them; that's probably the best option as (per my earlier statement) it's better to have nothing than something wrong. As for the other users yeah there are a lot of users who think they know what they're doing but don't; it's a chronic problem and one where I'm increasingly convinced that we just need to nuke all the contributions of some users rather than trying to correct them. Some users will respond to warnings telling them not to contribute to languages they don't know, but others won't, and there are very few people who will clean up past bad contributions they've made. Benwing2 (talk) 22:43, 24 April 2025 (UTC)
I did not mean delete from Lingua Libre but in the sense of not having it appear in the English Wiktionary. The goal of Lingua Libre is to document as many pronunciations from different speakers as possible. But if you prefer a 100% Sicilian accent, no problem, you can record it yourself if you can get a more faithful accent. This means it will remain in the French Wiktionary and that's it. But in my opinion, it is interesting that a French-speaker in Morocco would record the pronunciation of rural locations of the word “voiture”: instead of Àncilu (talk) 22:50, 24 April 2025 (UTC)
@Àncilu Yes, what I meant by "delete" is to remove the audio templates from the English Wiktionary. I won't delete anything on Lingua Libre (and don't even know how). Benwing2 (talk) 22:53, 24 April 2025 (UTC)
By the way, my point in the previous thread was not that phonemic transcriptions are preferable to phonetic ones, but rather that if one does want to make a phonemic transcription, one should make sure that what one puts in it really is phonemic. I would actually suggest using phonetic transcriptions for representing regional variation in Sicily, since you can simply focus on the actual sounds without having to make assumptions about the deeper sound-structure (phonology) of each dialect. Nicodene (talk) 02:57, 24 April 2025 (UTC)
Oh, right, I totally see now seeing the broad transcriptions at the start of the post. Those ones aren't phonemic. And I've seen that since then, a lot of Sicilian entries have, in their histories, edits that report their Pronunciations being normalized as you call it, so that's good. Also yes being able to compare regional, phonetic pronunciations would be the ideal (as long as a base phonemic pronunciation is still present of course, but I don't think you wanted to make that optional). Crunchy Cloaky Crackdown (talk) 22:58, 24 April 2025 (UTC)
It is possible to do without the phonemic level entirely. This is now the case for our Catalan and Russian pronunciation modules, for instance.
For Sicilian, a ‛pan-insular’ phonemic transcription may be possible for something like gattu but not for words reflecting a number of historical developments. For instance /ˈnɔvu/, /ˈnɔvi/, /ˈforti/ would fail to account for Mistrettese having , with a diphthong yet without a diphthong (AIS 1579, 186). /ˈmɛrlu/, /parˈlassi/, /ˈtɛrra/ would fail to account for the same dialect having with yet , with (AIS 493, 1627, 420). And so on.
Oh wow, I actually didn't know there existed situations like these where you can't always predict the pronunciation for one dialect with just the phonemic transcription... and I also didn't know Catalan and Russian only used narrow transcriptions. I totally understand now. So, do you feel Sicilian phonemic transcriptions should even remain at this point, if we can easily do well without? I saw Catalan has a pronunciation template, Template:ca-IPA, which generates three "macro-regional" square-bracketed pronunciations; maybe the Sicilian pronunciation template could be something similar, with narrow transcriptions for Palermitan, Catanese, and some other major dialects? I would not be able to help directly with transcribing any of those (I could only do Gelese), but I'm sure there exist studies of either dialect's strict phonetics. This site that you consulted though... I haven't fully made sense of it yet, but it seems like it allows one to fetch realizations by location? Do you feel like a hypothetical Sicilian pronunciation template could make use of that? Crunchy Cloaky Crackdown (talk) 18:10, 25 April 2025 (UTC)
I do think that using phonetic transcriptions, whether relatively broad or relatively narrow, is the most practical solution here. It takes quite a bit of work to establish phonemic correspondences for a range of dialects. (See e.g. this discussion with @Jamala regarding Neapolitan.)
The website that I linked is a digitization of this linguistic atlas. It’s useful for reference but would be difficult to base a pronunciation module on.
Ideally we’d base the module on one or more varieties of Sicilian whose phonetics are described at length in multiple sources. If such exist, and you find the sources, I can help by condensing the relevant information into a rough draft for the module. Nicodene (talk) 22:56, 25 April 2025 (UTC)
The real issue is the lack of recordings. Most readers unfortunately don't know or care the first thing about IPA, let alone phonemicity. What they do tend to learn is how orthography corresponds to a pronunciation, based on hearing many examples, and any IPA transcription to supplement a corresponding recording is mostly helpful to linguists or the 1% of Sicilian learners who are interested in the linguistics. In my opinion reaching out to speakers from various dialects to provide multiple audio recordings per word should be the top priority. An audio is worth a thousand IPA characters… or whatever they say. — Ganjabarah (talk) 00:23, 26 April 2025 (UTC)
@Nicodene I tried to search for those in English on both normal Google and Google Scholar and nothing came up; when I tried doing the same in Italian, I was able to find three papers on Sicilian pronunciation. I'm not sure how useful they could be but I'm assuming you know where to look.
Very old one from 1890, doubt it could be useful as it only seems a little superficial, and it also doesn't read well
I'm surprised there was seemingly nothing for specifically Catanese or Palermitan. Either way I also sent an email to the Cademia Siciliana, maybe they know some more sources. Either way I hope I managed to provide :) Crunchy Cloaky Crackdown (talk) 22:21, 26 April 2025 (UTC)
Thank you. I can’t seem to access the second source through that link. The third source seems fairly high-quality.
Good to know I was of use, and I fixed the link for the second source and even casually found another on the same dialect! Lmk when you have something Crunchy Cloaky Crackdown (talk) 10:38, 2 May 2025 (UTC)
In Cimbrian, we apparently call the perfect tense (be/have + past participle) the "preterite". See the definitions and the inflection template at "haban". This is highly unusual and misleading. (1.) Throughout Continental West Germanic the perfect tense stands in for the preterite, which latter is often in limited use or -- in Cimbrian, but equally in all other forms of modern Upper German -- has been lost entirely. Nevertheless the remaining compound past tense is called the "perfect" in all of these languages. More or less the same is true of various Romance languages including French and Italian. (2.) The term "perfect" (= completed) is entirely adequate for such a tense and in line with the scope of the original Latin perfect. (If anything, the perfect tense in English is a misnomer.) The word "preterite", on the other hand, is used in Germanic specifically for the synthetic past tense. Therefore I see no justification to deviate from the general rule in Cimbrian and call the perfect the "preterite". I ask for permission to change the Cimbrian conjugation templates and remove the term "preterite". Nothing speaks against replacing it with "perfect", but "past tense" could be used as well. 84.57.154.5 22:09, 22 April 2025 (UTC)
No objections from me, but give it a couple of days to see if anyone else comments. You are right that "preterite" is usually used to indicate a synthetic past tense and not a tense formed with auxiliary + past participle. Benwing2 (talk) 23:25, 22 April 2025 (UTC)
Glyph origin
This is about the (graphical) etymology section called “glyph origin” of mostly Chinese, but also Japanese, Korean, etc. glyphs.
I own quite a number of books on the topic of the development of Chinese and Japanese characters (glyphs, graphs), the best of them published in Chinese and Japanese. A recent example is the book: 漢字字形史字典【教育漢字対応版】 (Dictionary of the historical evolution of kanji forms: Edition covering all Elementary school characters) 落合淳思 Ochiai Atsushi. 東方書店 Tōhō Shoten. Tōkyō Metropolis, 2022.
From this book and others it can be learned that for a great many glyphs there is no consensus about the origin or development of a certain character. The author of this particular book deals with that by having selected ten important researchers and comparing his own analyses with these other researchers, for each glyph. Additionally, research in this field is very much ongoing, and earlier opinions are often discarded or changed.
However, in the section “glyph origin” sources are only very rarely cited.
I don’t mind that contributors only give the most commonly held view on the origin of a specific character. I would certainly not want contributors to use the elaborate method used by Ochiai, the researcher I referenced above.
However, it would be very helpful if contributors would cite their source, to show whose opinion they are giving, and from which time period.
As I indicated, giving the source is quite rare, which puzzles me. I can only assume that contributors are using a source and not making it up, so why do they normally not include their source as well?
There are also contributions that contain sentences like “An alternative theory suggests...” without naming the source of that alternative theory either.
In conclusion: There is no way for the reader to judge the reliability of a given explanation by noting whose opinion it is and from which period, or to seek more information by looking up the source.
I’m not an active contributor myself, so I’m wondering what is going wrong here. Perhaps contributors need a reminder to include their source? Perhaps a list of sources should be provided to the contributor, so that it is easy and not time-consuming to add the correct source? Perhaps there should be some other way to make it easier to add a source?
There have been a lot of active discussions recently about strongly encouraging or even requiring sources. Wiktionary in the past has not required such sources, which IMO was a mistake. Cc. @Thadh @Vininn126 @AG202 as some who have participated in the discussion about sources, and @Justinrleung and @Wpi who may be able to comment specifically on the glyph origins and where the info is coming from. As for making it easier to add source info, the way to do that is to create the appropriate templates: reference templates of the form {{R:zh:...}} listing the actual sources, and parameters in the glyph origin templates to make it easy to cite specific sources (I did that, for example, for Italian pronunciations, where a lot of them are sourced to DiPI). Benwing2 (talk) 22:09, 23 April 2025 (UTC)
Currently SC headline template for proper nouns does not support female equivalent/f= to handle nationalities. Should we add this function? Chihunglu83 (talk) 11:55, 24 April 2025 (UTC)
How is it a proper noun, though, if it has a female equivalent? For family names it can be, and in that case we should of course have this function. We have Swede and German as nouns, and only the language German as a proper noun (which is not a proper noun either in my opinion, as I have argued elsewhere: there are various Englishes and Germans). So Švéđanin should also be declared a noun and not a proper noun. That these terms were entered as proper nouns is presumably just a fallacious inference from their capitalization. Nobody makes this mistake for Arabic script, where no capital letters exist, e.g. أَلْمَانِيّ (ʔalmāniyy), also the language أَلْمَانِيَّة (ʔalmāniyya). Fay Freak (talk) 14:14, 24 April 2025 (UTC)
Yeah I agree that demonyms should be common nouns not proper nouns even if capitalized, but IMO language names are fine as proper nouns even if they can sometimes be pluralized, because they usually have a single referent. Benwing2 (talk) 22:45, 24 April 2025 (UTC)
@Benwing2 @Fay Freak AFAIK, SC linguistics classifies ethnonyms and demonyms as proper nouns (which I also find weird); there is an example here discussing the upper- and lower-case letters of proper nouns in the plural. In general, I just want to ask: how should we handle Šveđanin/Šveđanka? Previous editors put them in the derived terms section, whereas I think the headword line would be more appropriate. Ideas? Chihunglu83 (talk) 11:45, 25 April 2025 (UTC)
@Chihunglu83, Benwing2: Previous editors weren’t that well versed in the coding of the templates or the modules behind them. It would be more straightforward to have the female forms of demonyms and ethnonyms in the headword, since they are too important to be entered merely as derived terms. They would have to be presented as nouns however, since even by comparison with other Slavic languages we can hardly wrap our heads around them being proper nouns.
For family names the situation is peculiar in Serbo-Croatian and Slovene, as opposed to Macedonian and any other Slavic-speaking nation; they don't print female surnames regularly. You can still form them with -ka for the wife of so-and-so and -ova / -eva for the daughter of so-and-so (in ⅔ of cases someone ending with -ić, following Serbo-Croatian naming customs), which is theoretically negligible historicizing use but apparently necessary already in journalistic reports: they are required if no female forename or the word gȍspođica or similar or anything indicating social gender precedes the surname, lest agreement with perfect verb forms not be maintained. Telegraf.rs writes that it would be definitely wrong to mean a woman and write "stigla je Jovanović", because this violates the rules of agreement, since the predicate must agree with the subject in gender (if the verb distinguishes gender), as stated in the orthography manual of the Serbian language. Also, in the oblique cases only the preceding word is inflected but not the surname if it refers to a woman: pozovite gospodina Jovanovića, ali - pozovite gospođu Jovanović ("call Mr Jovanović, but call Ms Jovanović"). This would have to be relegated to inflection tables as the particular surname inflection type if Ben segues to the modularization of Serbo-Croatian noun inflections. (Again I intuitively say noun inflections since the idea of their being proper nouns is repugnant.)
In sum this means we have no case left where a Serbo-Croatian proper noun head needs a |f=. (Natively, since a Bulgarian mixed-sex immigrant couple is granted two gendered forms of their surname, which has little to do with the entry language as it would even appear in English.) Fay Freak (talk) 22:10, 25 April 2025 (UTC)
Should social media posts be able/enough to attest terms and "Scots problem" spellings?
(Notifying @User:Benwing2):
I have been told that as of current, social media are still not allowed for attestation due to not being durably archived. In that case, do you think Wayback Machine archival could possibly negate that concern? As a counterpoint still, however, we surely know the Internet Archive project is in a shaky state in general.
And how about underdocumented languages with no official orthographies (colloquially, presenting the "Scots problem"), like Sicilian and more? Do you think an exception could be made in the policy for these languages, where there might be no other way to encounter an "ad hoc" 'unorthodox' spelling if not on social media posts? Crunchy Cloaky Crackdown (talk) 23:34, 26 April 2025 (UTC)
I wonder who told you that? We have accepted entries based on social media attestation alone, although in practice users prefer to see more attestation than the bare minimum of three uses in twelve months. The policy (WT:ATTEST) requests that an internet archiving service is used when doing this - the Internet Archive is not the only one. This, that and the other (talk) 04:52, 27 April 2025 (UTC)
@This, that and the other It was me who mentioned this; I'm aware we have social media entries but I thought they were frowned on, since WT:CFI doesn't explicitly allow them but says they need community approval per source. The issue that @Crunchy Cloaky Crackdown is running into is that Sicilian isn't a well-documented language so the sources for it other than social media, esp. for "in-the-wild" spellings, are often lacking. I didn't realize there are other archiving services, but what happens if the only good social media posts aren't archived? Benwing2 (talk) 19:39, 27 April 2025 (UTC)
Technically yes, we need community approval per source, but there's been little to no enforcement of that policy in CFI in the past few years, which has led to a very laissez-faire attitude, where as long as no one notices, social media-based entries have been allowed. There just aren't enough editors, time, or energy to monitor something like that. @Benwing2 AG202 (talk) 19:52, 27 April 2025 (UTC)
It's because by linguistic standards, we can be confident it is not anyhow irrational an attitude, this is about best practice for “languages like that”, contrasted with prestige or imperialist languages. Realistically for most creoles and pidgins this is the most likely place anything at all is written. Then again you might have heard something in a piece of music, where these languages are more often present, and just cross-check that your spelling is not utterly off the wall, by this new means of support, for you would not get any frequency data, and formally published sources also maintain their quirks and would constitute biased selection. Fay Freak (talk) 03:04, 28 April 2025 (UTC)
Requested Entries
As noted at WT:RFVE#xanadu, sometimes someone adds a term to the Requested Entries list, someone else evaluates that it doesn't meet CFI and removes it, and then the same person or someone else re-adds it and someone creates it unaware of the earlier evaluation. It could be useful to have a way to track that a Requested Entries request was denied, and why. One idea would be to give each word its own headered section so (after tweaking aWa) rejected requests could be archived to talk pages; that'd make it more likely that if the entry was created someone could notice it was previously discussed and RFV it if needed, but it'd make the prior discussion invisible to anyone (re)adding a term to the main RE page. Alternatively we could keep all requests, and people's comments of why they couldn't be created, on the RE page, rather than removing 'dead' requests, but then the page will be huge. Soliciting other ideas! - -sche (discuss) 23:41, 26 April 2025 (UTC)
Here's an idea: We create a gadget, similar to the translation adder, that lets people add to REE using a simple form (I'm envisaging two fields: the term itself, and a freetext field for a comment and links to sources).
Or if we prefer to use entry talk pages, the gadget could notify the user if the entry's talk page contains an archived REE "discussion" (since REE uses bulleted lists we could adapt aWa to follow that structure, or create a new archiving tool specially for the page). This, that and the other (talk) 04:57, 27 April 2025 (UTC)
It's becoming increasingly impossible to avoid separating topic categories by type. The more I work with {{place}} and geographic topics, the more I run into this problem. For example, Category:en:Mountains is supposed to be a name category, but not surprisingly in fact it contains a mixture of individual (named) mountains, types of mountains and terms related to mountains. I propose we do this incrementally, something like this:
Sorting of categories in their parent categories will ignore the ramification prefix "Individual", "Types of", or "Terms related to"; this will happen automatically in the category tree code.
The categorizing template {{C}} will accept abbreviated prefixes to indicate the ramification type: ind:, typ: or rel:, probably with shorter abbreviations i:, t: or r: (per User:This, that and the other I'm avoiding special characters for this purpose, which will be hard to remember).
Now, what happens if you don't use a ramification prefix? I propose that corresponding to each generic topic category, or maybe to a subset of them, is a default ramification type, whereby if you just write {{C|en|Astrology}}, it automatically goes into Category:en:Terms related to astrology, as if you had written {{C|en|rel:Astrology}}. I say "maybe a subset" because in cases like "Mountains", the ramification type may not be obvious, but in the case of "Astrology", "Individual astrology" makes no sense and "Types of astrology", while possible, is less likely to be applicable to a given term than "Terms related to astrology". Similarly, "Musical genres" seems to be an obvious "types of ..." category, and "States of the United States" an obvious "individual ..." category. We already in essence have a default type for each topic, specified right in the category tree data modules. In the case of a topic category where we don't assign a default type, omitting the type dumps the page into the generic category.
The breadcrumb tree will have some way of indicating the ramification type that doesn't take a lot of space. In particular, given the parallel hierarchies described above, typically the top few categories in the breadcrumb trail will be either non-topic categories or special grouping categories, and the remainder will all be topic categories of a specific ramification type. For example, Category:en:Mountains has the following trail: Fundamental » All languages » English » All topics » Names » Places » Natural features » Mountains. Everything starting with Category:en:Places is an "individual ..." category so maybe the breadcrumb for this category will show in a smaller font with the assumption that everything below this is of the same type.
The only thing problematic about this scheme I can think of is that it may make autocompletion harder, since it works off of the beginning of a category. Someone searching for a specific ramified category related to a given topic will have to type Category:Individual ... or Category:Terms related to ... which is a fair amount of typing. The generic categories will still exist and facilitate autocompletion, but with this in mind, possibly the ramified categories should be named more like Category:en:Mountains (individual), Category:en:Mountains (types) and Category:en:Mountains (related to); and in this case maybe Category:en:Mountains (named) is better than Category:en:Mountains (individual). (Do all individual foo have names? Probably so ...)
One last thing is we might want to make a naming exception for certain classes of categories. For example, any geographic category of the form PLACETYPE in/of LOCATION such as Category:Counties of Texas, USA is almost certainly an 'individual' category (for example, I *suppose* there could be types of Texas counties and terms related to Texas counties, but there are unlikely enough to reasonably warrant a category for them). So maybe these classes of categories can be automatically ramified without having the ramification type noted in the category name. But maybe this is more trouble than it's worth.
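The prefix scheme proposed above can be sketched roughly as follows. This is only an illustration in Python, not the Lua code behind {{C}}: the function names, the default-type table, and the rule of lowercasing the first letter of the topic are all assumptions drawn from the examples in the post (e.g. {{C|en|Astrology}} landing in Category:en:Terms related to astrology).

```python
# Hypothetical sketch of the proposed ramification prefixes for {{C}}.
# None of these names exist in the real modules; they are illustrative assumptions.

PREFIXES = {
    "ind": "Individual", "i": "Individual",
    "typ": "Types of", "t": "Types of",
    "rel": "Terms related to", "r": "Terms related to",
}

# Assumed per-topic defaults (the default-type idea in the proposal).
# Topics without an entry here fall back to the generic category.
DEFAULT_TYPE = {
    "Astrology": "rel",
}


def resolve_category(lang, arg):
    """Return the category name that {{C|lang|arg}} would produce under the proposal."""
    prefix, sep, topic = arg.partition(":")
    if sep and prefix in PREFIXES:
        kind = prefix
    else:
        # No recognised prefix: treat the whole argument as the topic.
        topic = arg
        kind = DEFAULT_TYPE.get(topic)
    if kind is None:
        return f"Category:{lang}:{topic}"
    # Naive lowercasing of the first letter; topics that begin with a
    # proper noun would need special handling that this sketch omits.
    return f"Category:{lang}:{PREFIXES[kind]} {topic[:1].lower()}{topic[1:]}"


def sort_key(label):
    """Sort key that ignores the ramification prefix, as proposed for parent categories."""
    for ramification in set(PREFIXES.values()):
        if label.startswith(ramification + " "):
            return label[len(ramification) + 1:]
    return label
```

Under these assumptions, `resolve_category("en", "rel:Astrology")` and `resolve_category("en", "Astrology")` give the same category, while `resolve_category("en", "Mountains")` stays in the generic Category:en:Mountains because no default type is assigned to it.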
I like much of this; in particular (I realized I should clarify my comment in the last discussion) I like the idea of still having the top-level categories like "Category:Mountains" both to group the subcategories and potentially even to categorize entries directly into in cases where there are too few entries of any one subtype to split them. Or maybe we shouldn't categorize anything into top-level categories, maybe for consistency we should enforce always subcategorizing? It's tricky because for some things (waterfalls?) there aren't that many types vs terms-related-to (so splitting would make for lots of small categories or, if we disallow categories with few entries, result in non-categorization), whereas in other cases there probably are so many terms-related-to and so many types that splitting them makes sense. (The curse of a dictionary is to encounter every edge case and I trust we will run into edge cases and spanners here, alas...) If the top-level categories still exist (and even if they don't), I am not sure I'm a fan of point 4; typing {{C|en|Foobar}} and having it result in "Category:Terms related to Foobar" for some values of Foobar, but "Category:Individual Foobars" for other values of Foobar (and "Category:Foobar" for some?) seems unintuitive, and like a recipe for people putting things into the wrong categories because they saw that {{C|en|Barfoo}} generated the category-type they wanted and so they assume {{C|en|Barbaz}} will too and are unaware it aliases differently. I admit that this means, as you say, people have to type long(er) category names when trying to add or find them, which is not great. - -sche (discuss) 23:39, 27 April 2025 (UTC)
Yeah we can dispense with #4 if necessary, and simply make it so that typing {{C|en|Foobar}} always dumps into Category:en:Foobar. I also think we should ideally have every term go into one of the ramified categories rather than a generic category, and treat any terms in generic categories as cleanup opportunities. Unfortunately we can't force people to subcategorize into a ramified version of a given category until we've gone and cleaned the generic category, which will be a long process; but we can definitely mark individual categories as needing ramification (potentially even on a language-by-language basis), so that people aren't allowed to add to the generic version of that category (unless they use a manually marked up category, which is difficult to prevent unless we start using edit filters to disallow this).
We also need to consider how labels interact with the new category structure; I haven't thought this through. Benwing2 (talk) 23:54, 27 April 2025 (UTC)
Not a stupid question. I redid the category parent system a few weeks ago so that in general 'PLACETYPES in/of FOO' goes in first parent 'FOO' and second parent 'PLACETYPES in/of BAR' where BAR is the container of FOO. So you'll see for example, Category:Cities in Arizona, USA going under Category:Cities in the United States. However, if FOO is a country or country-like entity, the second parent is instead 'PLACETYPES' so that e.g. the second parent of Category:Cities in the United States is just Category:Cities rather than Category:Cities in North America (that last category shouldn't exist but it does because of the single entry Mexko in Central Huasteca Nahuatl, which defines this explicitly as a city in North America rather than a city in Mexico; I should fix both the entry and the categorization system so it won't categorize in such a situation). Due to a decision going back into the depths of time, England, Scotland and the like are made to perform like countries rather than administrative divisions, which is why they aren't in Category:Cities in the United Kingdom; it may have been @Donnanz who made a request of this sort since he works a lot on toponyms in the UK. But this is definitely open to change. As for all the dependent territories, they're more or less grouped into their own group so I can change group properties; maybe for example, Category:Cities in the Isle of Man, Category:Cities in the Falkland Islands and the like should have second parent Category:Cities instead of Category:Cities in the United Kingdom. If I do this at the group level it will likewise affect Puerto Rico, Guam, etc. but it can be done at the individual territory level. You're welcome to take a broader look at the current category structure and make some suggestions; I am not well versed in the subtleties of dependent territories (which I imagine differ from territory to territory). Benwing2 (talk) 19:42, 29 April 2025 (UTC)
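The parent rule described in the comment above can be sketched like this. It is a toy Python illustration, not the actual Module:place logic: the container table, the set of country-like entities, and the fallback for places with no known container are assumptions made for the example.

```python
# Hypothetical sketch of the 'PLACETYPES in/of FOO' parent rule; not real Module:place code.
# Both tables below are illustrative assumptions, not the module's actual data.

CONTAINER = {
    "Arizona, USA": "the United States",
    "the United States": "North America",
    "England": "the United Kingdom",
}

# Entities treated like countries, so their second parent is the bare placetype.
COUNTRY_LIKE = {"the United States", "England", "Scotland", "Wales", "Northern Ireland"}


def parents(placetype, prep, place):
    """Return (first_parent, second_parent) labels for 'PLACETYPE prep PLACE'."""
    first = place  # article handling for the real category name is ignored here
    container = CONTAINER.get(place)
    if place in COUNTRY_LIKE or container is None:
        # Country-like entity (or, as an assumption, unknown container).
        second = placetype
    else:
        second = f"{placetype} {prep} {container}"
    return first, second
```

With these assumed tables, `parents("Cities", "in", "Arizona, USA")` yields `("Arizona, USA", "Cities in the United States")`, while `parents("Cities", "in", "England")` has second parent `"Cities"` rather than `"Cities in the United Kingdom"`, which is the behaviour the comment describes for the constituent countries. Moving a territory between the two behaviours would, in this sketch, just mean adding it to or removing it from the country-like set.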
That is not a stupid question at all. And yes, you bring up a good point. The way to do this is by editing modules. As for your point about the Channel Islands, granted, but I don't know that anyone is going to change this system based on the technicality. I'll leave that up to others, but try to fix the England/Scotland thing (as well as Northern Ireland and Wales). —Justin (koavf)❤T☮C☺M☯ 02:00, 29 April 2025 (UTC)
It looks like in Module:place, FIXME #25 addresses this and is related to FIXME #15 (which itself is resolved). So there may be some deeper reason why this hasn't been fixed yet. I'll see what other kinds of feedback we get here before editing unilaterally. —Justin (koavf)❤T☮C☺M☯ 02:05, 29 April 2025 (UTC)
@Benwing2: The only cities in the UK are in the constituent countries: England, Wales, Scotland and Northern Ireland. The Isle of Man, although it has the Ordnance Survey grid system, unlike Ireland which has a different grid system, is a Crown dependency. I think treating Douglas as a capital city is erroneous, it's the capital, sure, but no more than the largest town on the IoM (I visited it years ago). Whether it has city status, similar to official cities in the UK, which have royal approval, I don't know. The Falkland Islands are a British overseas territory, the capital of Stanley is a town, not a city (2,000 or so people).
The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. This annual review was planned and implemented by the U4C. For more information and the responsibilities of the U4C, you may review the U4C Charter.
Please share this message with members of your community in your language, as appropriate, so they can participate as well.
Initialism defs are allowed to point straight to the Wikipedia entry when no Wiktionary entry exists (whether 'not yet' or 'not ever' — either one). Thus, either {{init of|en|foo bar bar}} or {{init of|en|w:foo bar bar}} or {{init of|en|[[foo]] [[bar]] [[bar]]}}, ideally in that order of preferability. I don't know where this fact is documented, if at all, but it is currently de facto true in thousands of entries. Quercus solaris (talk) 18:55, 29 April 2025 (UTC)
Looks good, thanks. I touched it up with an edit (diff), which shows how {{also}} can be used at top of page. The "senseid" element is optional, so don't sweat it if you don't care to. Quercus solaris (talk) 19:38, 29 April 2025 (UTC)
Is that "lightweight information that describes objects" or "objects of the lightweight information-describing variety"? The quaint, dated custom of using hyphens to make life simpler for readers would help. DCDuring (talk) 22:40, 29 April 2025 (UTC)
I see no policies for capitalization of Latin entries set out at Wiktionary:Latin entry guidelines or Wiktionary:Entry layout. Obviously, Classical Latin had no case distinction. It seems the earliest bicameral Latin texts may have arisen in Carolingian or even Merovingian handwritten texts, but I think it would be really tough to verify their usage. So I think there are two practical methods that we can follow: a) describe the usage that can be observed in printed Latin texts b) ignore that usage and just follow our own rules of capitalization based on logic, e.g. "Capitalize all proper nouns, lowercase all adjectives and common nouns". I'm inclined to go with following the usage of printed texts (whether editions of ancient authors, or original New Latin works), with entries for alternative case-forms when there are multiple in use. That would however mean that we would have capitalized entries for various words other than proper nouns, such as certain adjectives or common nouns referring to nationalities, ethnicities, locations, or in some cases types of mythological beings. As far as I'm aware, most English-Latin dictionaries (and at least some non-English ones) do use capitalization for words other than just proper nouns; for example, looking at Logeion, we see capitalization of Harpyia "a Harpy/harpy", Acherusius "Pertaining to Acheron", Ianuarius "January/Pertaining to January", Latinus "of Latium", for the indexed dictionaries other than the digitized Latino-Sinicum (it looks like in the original print version of this dictionary, all the entries were capitalized, and so a case distinction was not available when it was digitized). Urszag (talk) 20:29, 29 April 2025 (UTC)
@Urszag I oppose having capitalised forms for adjectives, as they're trivial, represent an artificial post-Classical distinction that serves no practical benefit, and they present a maintenance burden. Theknightwho (talk) 20:37, 29 April 2025 (UTC)
Having a single clear-cut rule has some advantages, but I think it tends to mislead readers about the actual conventions typographers tend to follow for Latin text. Also, it would require always placing the main entry for nationality adjectives on a separate page from the main entry for nationality singular nouns, e.g. Hispānus; this is certainly doable, as in Romance languages, but I think in that case it adds maintenance burden. (A related issue is the lemmatization of these nouns at singular vs. plural forms. Traditionally, Latin dictionaries use the masculine plural rather than the singular as the lemma of nationality/tribe-name nouns: the nominative singular forms are much less common, and I think in some cases not even attested as nouns in Classical texts, which could be related to the general avoidance in Latin of using nominalized adjectives in the masculine nominative singular form; e.g. bonus tends not to be used by itself with the sense "a good man", even though "boni" is often used with the sense "good men").--Urszag (talk) 21:00, 29 April 2025 (UTC)
Right, I keep forgetting that ethnonym nouns are not considered to be proper nouns (I think I get confused because they're capitalized in French). In that case, they would be lowercase according to the rules that you favor, which does make things simpler. I don't think "holdover from English capitalization rules" is an accurate diachronic description of how the Latin convention came to exist.--Urszag (talk) 21:37, 29 April 2025 (UTC)
Following that convention, we'd write the first part of Caesar's Commentarii de Bello Gallico as follows: "Gallia est omnis divisa in partes tres, quarum unam incolunt belgae, aliam aquitani, tertiam qui ipsorum lingua celtae, nostra galli appellantur." It's not unreadable, but my opinion is that lowercase ethnonyms in Latin look strange and don't read as smoothly as using the normal capitalization. There are already cases where we make concessions to common conventions for the sake of readability (such as using punctuation, distinguishing "v" from "u", and not distinguishing "i" and "j").--Urszag (talk) 06:36, 30 April 2025 (UTC)
This Latin sentence caught my eye as I stumbled across this thread, and reading through it (unaware that you were discussing ethnonyms) I didn’t notice anything strange about it at all.
Lowercase demonyms may be at odds with English orthography, but they are in line with the orthography of the Romance languages (although there is some variation in French). Nicodene (talk) 15:38, 30 April 2025 (UTC)
I am of two minds here. I like the logic behind @Theknightwho's suggestion of only capitalizing proper nouns (i.e. basically names), but at the same time as a general principle we should try to follow what other dictionaries do, esp. if there is a consensus. Logeion seems to show that all cited dictionaries on the site capitalize demonyms/ethnonyms and related terms (e.g. Hispanē (“in a Hispanic manner”)) except for Latino-Sinicum. (Du Cange shows up for terms like hispanus but that's because this dictionary writes headwords in all caps. The actual citations given for hispanus do capitalize the term.) I assume that most Latin dictionaries capitalize demonyms and derived terms because that's what Medieval and modern writers tend to do. (And I would guess that English capitalization rules for demonyms came from Latin rather than vice versa.) So on balance I think we should go with capitalizing demonyms and derived terms even if it's somewhat illogical; if we do it the other way, we'd need soft redirects all over the place in any case from the capitalized to the lowercase versions, which would be somewhat of a pain. Benwing2 (talk) 23:58, 29 April 2025 (UTC)
As far as I know, we've been more inclined to follow editors' standards. I have moved some pages in the past (Camēnālis) and began making a list of should-be capitalized nouns and adjectives, I lacked systematicity in noting them but know for a fact it is all over the place. Saumache (talk) 06:21, 30 April 2025 (UTC)
I definitely think we should have a policy that goes beyond whatever an editor feels like. It seems pretty straightforward to use the practice of other dictionaries and of the published sources used for quotations as a criterion; e.g. the linked Lewis and Short entry writes Cămēnālis, so we can say that justifies having a capitalized entry for this word.--Urszag (talk) 06:36, 30 April 2025 (UTC)
I meant publishing standards; I am all for laying out and enforcing policies. Theoretically, any word derived from a capitalized proper noun should be upper case as well; that is at least the rule I have been accustomed to from reading Classical and Medieval Latin works in modern editions, which Wiktionary users are more likely to be reading than manuscripts, all with differing spelling standards. It's the same old issue that keeps getting brought back in various fora for word-by-word discussion (Wiktionary:Tea room/2025/March § quirinalis). Saumache (talk) 09:01, 30 April 2025 (UTC)
In my opinion, using prior dictionaries as a reference and practical criterion isn't a matter of "blindly following" anything. The fundamental principle behind this strategy would be documenting the usage of modern edited publications—just as we do for capitalization in languages such as English—rather than imposing some novel scheme, invented by us, that may not be attested in any Latin text that our readers are likely to see. So if it happens that dictionaries somehow make a mistake and contain a capitalized entry for a word that isn't actually capitalized in attested Latin publications, then we should correct that mistake, but I highly doubt that errors in this regard will be any more common than errors in e.g. noun genders or definitions. There are certainly generalizations that could be made on the basis of that data, which it might be helpful to record somewhere, but the philosophy here would not be to start with rules and decide how to capitalize entries based on that, but to start with usage, which I think is easy to observe in most cases.--Urszag (talk) 09:48, 30 April 2025 (UTC)
In other words, I'm proposing the primary criterion "if a word appears capitalized in Latin text, have a capitalized entry for it on Wiktionary. If it appears uncapitalized, have an uncapitalized entry." I think we can generally rely on dictionaries to accurately indicate which words are usually capitalized. As with other aspects of spelling, entries would be subject to RFV if someone suspects the indicated usage doesn't actually exist.--Urszag (talk) 09:58, 30 April 2025 (UTC)
@Urszag The issue is that we are always going to be using an artificial scheme of capitalisation, because Classical Latin - in which we will find a large number of attestations - did not have a capitalisation distinction. We are also a secondary source, not a tertiary source, so we are not bound by the same limitations as Wikipedia in blindly (and yes, it is blindly) following what other publications do simply because that is what they do.
I am not opposed to us including alternative entries with capitalisations if other users feel there’s a need for that (though I’m not sure I do), but we are at liberty to make the same editorial choices as the authors of all those other dictionaries who have chosen which entries they capitalise or not. This feels like an area in which we are prioritising a somewhat arbitrary distinction at the expense of usability for the reader, who likely does not care one bit for whether we capitalise the headword or not. Theknightwho (talk) 10:13, 30 April 2025 (UTC)
Also, to add to this: there is an important difference between us and print dictionaries (and their electronic versions), which is that print dictionaries are laid out with the entries one after the other, so capitalisation does not present any kind of impairment to a reader finding the entry they are looking for. By contrast, our choice of capitalisation affects findability, because a reader is far more likely to find entries by typing them into the search bar, and we lack the ribbon which lays out entries in alphabetical order to one side. Capitalisation can be expected under certain circumstances (e.g. proper nouns), but I'm not convinced that that translates over to adjectives and adverbs. An entry at Aegyptiacus is not helpful for a user looking up the second word of Spinosaurus aegyptiacus, for instance. Theknightwho (talk) 11:04, 30 April 2025 (UTC)
Binomial nomenclature has its own fixed rules for capitalization. I agree that capitalization affects findability. That's why I think it is best for us to use capitalizations that match the conventions used by the majority of Latin documents, even if this is less simple than using our own bespoke rule system. I'm not certain about the best way to implement this criterion in practice, and I'm fine with having guidelines to ensure consistency with which entry we set as the main and which as the soft redirect when both case-forms are used.--Urszag (talk) 21:27, 30 April 2025 (UTC)
I think even the idea to only capitalise proper nouns runs into the problem that proper nouns are defined differently in different languages. Are names of languages proper? Are names of people? Is there a difference between a name of the people and the name of a country for the speakers? Note how Ingrian does make a difference between soomi and Soomi, but that it also struggles in written text to make a distinction between soomen (“Finnish”) and Soomen (“Finnish”). Or how Polish Niemcy is plural.
Is a name really a name or is it a nickname? Do nicknames get capitalised? What about metaphors? Where's the line?
So I think this is not as easy as "proper nouns", we need to define it further. This is also partly why I personally would favour lemmatising at the all-caps no-ujg Roman script, as fewer distinctions technically give us less freedom to impose our own biases on the system. Thadh (talk) 11:32, 30 April 2025 (UTC)
Given the absence of a universal modern convention, I also find myself inclined to drop the upper/lower-case distinction and revert to following Roman practice, a solution which could also finally decide the matter of j/v versus i/u. Nicodene (talk) 12:44, 30 April 2025 (UTC)
We are not only a dictionary of Classical Latin, though. New Latin is covered as well, and New Latin is usually written bicamerally. Communicating capitalization norms is relevant to anyone who wants to read or write in New Latin or who wants to read modern editions of Latin texts. Using spelling that diverges from New Latin authors and editors could pose an obstacle to our readers, albeit not an insurmountable one. I think it’s safe to assume that almost none of our readers will encounter Latin primarily in the form of ancient unicameral inscriptions and manuscripts.--Urszag (talk) 21:27, 30 April 2025 (UTC)
I also agree with Urszag here; in particular, I would find it unintuitive and unprofessional/wrong-seeming to find (say) Romulus and Jupiter rendered unicamerally as romulus and iuppiter. (But if a significant body of texts render them that way, I would not personally have any opposition to creating soft redirects from those and other unicameral titles.) - -sche (discuss) 22:48, 30 April 2025 (UTC)
Which capitalization norms? The norm of capitalizing demonyms or the norm of not capitalizing them? What about adjectives that coincide with demonyms, titles of people or divinities, sobriquets or noms de guerre, non-proper nouns of religious significance in Christianity, days of the week or months?
I don’t see what’s so bad about writing ⟨iuppiter⟩ or ⟨ivppiter⟩, as native speakers actually did, but if modern-style capitalization is a must then I suppose we’re left to choose between:
1) Making up a set of rules ourselves.
2) Following whatever rules happen to be used in some modern source that publishes extensively and carries some kind of authority (the Vatican?)
Regarding the "where's the line?" question, assuming we do go with the rule of only capitalizing proper nouns, it would be as simple as using the same criteria that we use for the part-of-speech header "Proper noun".--Urszag (talk) 22:06, 30 April 2025 (UTC)
@Urszag: have you not read the rest of the comment I posted? That line is different for different languages. We will need to invent one for Latin. Thadh (talk) 05:33, 1 May 2025 (UTC)
My point is that these criteria are needed in any case, unless you're proposing that we forbid the use of the POS header "Proper noun" from Latin and convert all of its existing uses to "Noun". That would be another change to the established style for Latin entries. I'm working now to add a summary of what kinds of terms in Latin are proper nouns to Wiktionary:Latin entry guidelines.--Urszag (talk) 05:39, 1 May 2025 (UTC)
@Urszag: if we went with a lack of capitalisation we could, yes. But if you're willing to make an exhaustive list of what makes a Latin proper noun, then I guess it's fine as well. Thadh (talk) 05:42, 1 May 2025 (UTC)
Why would we change the part of speech? We’d just be decapitalizing the first letter (or capitalizing/small-capping the other letters) in all Latin lemmas where they, for whatever reason, are capitalized currently. Including things other than proper nouns, like Februarius, Hispanus (the adjectives). Nicodene (talk) 21:09, 1 May 2025 (UTC)
@Nicodene: Thadh asked "Where's the line?" between nouns and proper nouns in Latin and suggested that the distinction is unclear. I was simply responding that per current practice, we need to draw that line to determine the part-of-speech header, so decapitalizing by itself would not eliminate the need to answer that question. If we can't provide consistent guidelines for answering that question, that would constitute a reason for not only decapitalizing, but also getting rid of the part of speech "Proper noun" in Latin entries. But I am optimistic that we can identify reasonable rules. I have edited Wiktionary:Latin_entry_guidelines#Proper_nouns to add some guidelines that I believe will not be controversial. However, some cases may be more difficult, such as names of holidays, religions, doctrines, or political movements.--Urszag (talk) 21:32, 1 May 2025 (UTC)
I see.
One might try solution #2 above and choose a specific source—like publications from the Vatican, or one of the aforementioned Latin dictionaries—as a point of reference for capitalization, or for orthography overall. Then it’s just a matter of describing what kinds of words that source happens to capitalize, which may not fit into a neat grammatical rule. Nicodene (talk) 22:21, 1 May 2025 (UTC)
@Nicodene I suspect we can probably fold the capitalisation issue into the multiple-spelling issue, in the sense that transclusion is probably the way to go. Theknightwho (talk) 22:31, 1 May 2025 (UTC)
I haven't read the entire above discussion, but I've never seen a modern edition of De Bello Gallico that didn't capitalize the demonyms, regardless of the nationality of the editor. Looking at the Wikipedia articles about that work in all the major modern Romance languages, I see that Spanish, Catalan and Romanian do not capitalize the demonyms, while French, Portuguese and Italian do capitalize them, so the modern languages are split 50/50 on the issue. I would definitely find it jarring to see nouns like Hispanus, Gallus, Celta and Germanus lower case. I'd prefer the corresponding adjectives to be capitalized as well, but seeing them lower case is much less jarring than seeing the nouns that way. —Mahāgaja · talk 13:57, 6 May 2025 (UTC)
The i/j distinction in Latin
The current practice in Latin entries is to make no distinction between ⟨i⟩ and ⟨j⟩ in entries; instead, we use ⟨i⟩ in all circumstances. I think we should make the distinction, for a few reasons:
It leads to ambiguity. For instance, the term adjuvō currently has its main entry at adiuvō, but the lack of i/j distinction means that adiuvō is ambiguous as to whether it is 3 syllables (ad-iu-vō) or 4 syllables (ad-i-u-vō). This same ambiguity affects Iēsūs, which notes that it has both 2-syllable (Iē-sūs) and 3-syllable (I-ē-sūs) readings. This ambiguity does not arise in our entries due to the u/v distinction, where we distinguish servit (2 syllables: ser-vit) and seruit (3 syllables: se-ru-it).
It is inconsistent with how we handle ⟨u⟩ and ⟨v⟩. Classical Latin made neither distinction, meaning that u/v were represented by ⟨V⟩, and i/j were represented by ⟨I⟩ (e.g. Venus was ⟨VENVS⟩, and Jēsūs was ⟨IESVS⟩). Our page WT:Latin entry guidelines states that this is because "the distinction between I and J only appears post-Classical Latin", but the same also applies to U/V, so I'm uncertain why the current editorial practice was chosen. My suspicion is that it's because this is a popular practice in modern scholarly editions.
However, what makes sense for scholarly editions is not necessarily what makes sense for us, because our primary aim is not faithfulness to the original source material, but to make phonemic distinctions as clear as possible to readers. As a point of comparison, scholarly publications rarely include macrons, but that does not mean we should exclude them, because they represent an important phonemic distinction that existed in Classical Latin. Likewise, the distinction between ⟨i⟩ and ⟨j⟩ represents an important distinction that we should not be glossing over.
Pronunciation sections are inadequate for making the distinction, just as they're inadequate for giving the length distinction shown by macrons. Most readers can't read IPA, and we shouldn't assume that any can.
There's nothing stopping us giving soft redirects at alternative spellings anyway.
I agree that our current practice of reserving v for the consonant and u for the vowel(s), while making no corresponding distinction between j and i, is illogical. I would strongly prefer a consistent alternative, whether that means making both distinctions or neither of them. I am a bit more inclined to distinguish neither, for the reasons mentioned in the thread above this one, but your point about indicating important phonemic distinctions—in a way that readers unfamiliar with IPA can understand—is valid. I think it bears mentioning though that applying the latter principle consistently would also mean distinguishing vowel length in lemmas (not just in headwords) as well as distinguishing diphthongs from sequences of monophthongs.
ETA: given that we handle vowel length with diacritics in the headword, couldn’t we also do that with glides? For instance ⟨iam⟩ for the lemma and ⟨i̯am⟩ for the headword. Nicodene (talk) 13:55, 30 April 2025 (UTC)
Are there many cases where whether something is a glide or a vowel is not clear from the phonological structure of the word, though? It may be more worthwhile to include diacritics for hiatus (aï, aü?) than the other way around. Thadh (talk) 14:02, 30 April 2025 (UTC)
There are indeed far fewer unpredictable cases, such as iambus with /i/ or belua with /u/, than predictable ones. Incidentally it seems iambus is already written with a diaeresis in the headword, as you suggest, but not (yet) Iason, io, or iulus.
As for the letters C and G, they may not have been distinguished in Old Latin but they were in Classical Latin (and ever after). Nicodene (talk) 14:46, 30 April 2025 (UTC)
In that case as TKW says - if we do split off Old Latin, we should have no distinction there, if we don't, we don't. Thadh (talk) 15:36, 30 April 2025 (UTC)
@Nicodene What would be the benefit of ⟨i̯am⟩? I think this could be covered by my past suggestion of having a box that shows the various different orthographies. I'm not sure that making no distinction would be of benefit for the average user, however, who is likely to find the lack of distinctions to be confusing and/or actively unhelpful in a way that the macronless spelling likely wouldn't be.
My main issue with the alternative suggestions (namely, ⟨i̯⟩ and ⟨ï⟩) is that they represent other ways of making the same distinctions that I've suggested we make here but in a way that is less familiar to the average user (especially in the case of ⟨i̯⟩). I'm not entirely sure what removing the distinction in favour of these would achieve. Theknightwho (talk) 14:49, 30 April 2025 (UTC)
Consistency. We otherwise follow typical Roman orthography in our lemmas. It’s not clear to me why the distinction between /j w/ and /i u/ would be more deserving of a special exception than the one between long and short vowels, or the one between diphthongs and adjacent monophthongs. Or why some of these should be indicated in the lemma while others are left to the headword. I’d be inclined to treat them all the same way, whatever that may be.
I suppose not everyone knows what a diaeresis is for, but by the same token not everyone knows what a macron is for either, yet we do use them. Nicodene (talk) 15:21, 30 April 2025 (UTC)
As I said in the above post, I would personally prefer going towards not distinguishing the two, and probably not distinguishing c and g, either. Thadh (talk) 14:00, 30 April 2025 (UTC)
Support: it's easier to remove information than to add it. Lemmatizing at spellings with the i/j and u/v distinctions, and generating the spellings without the distinctions, would also be better for search results. — BABR・talk 16:20, 30 April 2025 (UTC)
Hello! I am happy to see interest in general infrastructure for Latin, which has not been very lucky in this regard. Nonetheless, I support the current normalisation. To address the remarks you made:
1–2. "It leads to ambiguity" and "it is inconsistent". Yes, though that is true for many, if not most, languages. The i-u-v scheme we are using is, in my experience at least, the most used among both modern and not-so-modern publications. The consistency and/or unambiguousness of alternative schemes does not, in my opinion, make up for their being less common. As a side note, the difference in opinions here may depend on educational background. In Italy what you are suggesting would not have been taken seriously: the scheme used is unshakably i-u-v and has been for quite some time. I suspect that is the case for most of Europe, while it seems that the i-j-u-v scheme gained wider use in English-speaking contexts than it did here.
3. "As a point of comparison, scholarly publications rarely include macrons." While running text may lack macrons, I do not recall seeing a modern dictionary without them. Lexicographical diacritics, a cross-linguistic concept, are not comparable to orthographical normalisation choices. A better analogy would be Nicodene's proposed i̯am, which, although I have never seen it elsewhere, does look compelling. "Our primary aim is not faithfulness to the original source material, but to make phonemic distinctions as clear as possible to readers." I disagree with this. It is a legitimate viewpoint, especially among languages with a less crystallised orthography, but not a universal principle, and in this context I do not think it is the best course of action for a general-purpose resource like Wiktionary.
4. "Pronunciation sections are inadequate for making the distinction." Pronunciation sections are by definition the best place to hold pronunciation information. If IPA is too technical we could find another way to show it, as has been done for English, without changing the spelling.
I think this proposal puts too much weight on historical accuracy and not enough on the way Latin has been taught and used for the last two thousand years: Latin never died. Catonif (talk) 16:42, 30 April 2025 (UTC)
I suspect that the i-u-v scheme came from Italy to begin with, judging by the identical one used for writing the Italian language itself.
Perhaps this would be a good time to compare the lemmatization practices of the ‛big boys’ in Latin lexicography:
This favours a four-way distinction. In principle, I'm not opposed to using ⟨ĭ⟩ to denote syllabic ⟨i⟩, but it seems unwise to rely on the presence or absence of a breve, given that editors frequently omit length information when it isn't readily available. Theknightwho (talk) 23:39, 30 April 2025 (UTC)
Thank you for the overview, exactly what the discussion should have had to begin with. It seems that the i-j-u-v scheme has a greater popularity in English-language lexicography (and even Gaffiot 2016 mentioned by Benwing!) than I imagined. And yes, the i-u-v scheme likely developed in the context of Italian, so take my opinion with a grain of salt. Nonetheless, TLL's approach would be my favourite, keeping the i-u-v orthography while indicating the phonemic distinction (Iēsus vs. Ĭēsus). As a side note, perhaps we could use the breve for other instances of hiatus as well, e.g. coĕmō instead of coëmō, although this may be less understandable. Catonif (talk) 19:37, 1 May 2025 (UTC)
Oppose: I don't have a strong personal preference, but I think it's best to continue using the i-u-v scheme because I agree with Catonif that it is the most widely used. I strongly expect therefore that most of our users will prefer it and find it the most familiar system. Alatius ran a poll around 2010 surveying 251 users of certain online Latin discussion forums, and apparently found the i-u-v scheme to be "by far the most popular", although unfortunately I think the images showing the precise poll results have been lost: "Survey of Latin orthography preferences", Alatius, 2010, Textkit Greek and Latin. I think that it is very unusual nowadays to write Ecclesiastical Latin without the u-v distinction, so the i-u scheme feels somewhat biased against this form of Latin. I know that some academics have adopted i-u in recent publications, but I think it is still rare in textbooks and introductory learning materials. Also, I think most people who use the i-u scheme in lowercase use the I-V scheme in uppercase (i.e. the uppercase counterpart of "u" is "V"), which is an added complication that would be tricky for us to handle. Therefore, my second-place preference would be i-j-u-v.--Urszag (talk) 22:32, 30 April 2025 (UTC)
I would also oppose any change that eliminated the u/v distinction. However, whatever we decide, we absolutely need to find a consistent way to handle the i/j ambiguity, because the current approach is inadequate. Theknightwho (talk) 23:32, 30 April 2025 (UTC)
I don't think we can always represent both conventional Latin spelling and Classical Latin pronunciation with a single headword spelling in a way that looks natural and isn't a mess of diacritics. We don't do that for other languages; I realize Latin is a little different since we can't illustrate Classical Latin pronunciation using audio files in the way that we can for e.g. English, but I still think that we should take advantage of having a separate dedicated pronunciation section, as Benwing mentioned. Even if we use i-j-u-v, there's still cases where aspects of pronunciation will not be apparent just from spelling. There are words like obiciō, where the standard spelling uses a single letter "i" to represent the consonant "j" followed by the vowel "i". There are words like biiugus, where the consonant "j" is single between two vowels because of the prefix-base boundary, in contrast to cases like eius where the consonant "j" is pronounced double. There are words like abripiō, where the "br" is always split across syllable boundaries in Classical Latin pronunciation because of the prefix-base boundary, in contrast to the "br" in a word like celebrō, where both consonants are normally pronounced together at the start of a syllable as a complex onset cluster. There are words like illūc, with stress on the final syllable. If we go with "Most readers can't read IPA, and we shouldn't assume that any can", I think the solution to that would be to present a non-IPA respelling in the pronunciation section, like Template:enPR for Latin. So we would have adiuvō on the headword line, but something like "AD-ju-vō" or "ád-ju-vō" in the pronunciation section before the IPA transcriptions (currently /ˈad.i̯u.u̯oː/, , although I personally would prefer that we revise Latin IPA to use "j" and "w" instead of "i̯" and "u̯"; you can see that the implementation is currently buggy and incorrectly fuses sequences of vowel + semivowel in phonetic transcriptions). 
I think having Module:la-IPA generate such a non-IPA respelling would be pretty straightforward.--Urszag (talk) 00:08, 1 May 2025 (UTC)
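A minimal sketch of what such a non-IPA respelling might look like, written in Python rather than as part of Module:la-IPA. Everything here is an assumption for illustration: the function names `is_heavy` and `respell` are hypothetical, the input is taken to be already divided into syllables (the hard part, which this sketch does not attempt), and stress follows only the ordinary penultimate rule, with a manual override for words like illūc.

```python
import unicodedata

# Assumption: long vowels are marked with macrons, as in Wiktionary headwords.
LONG_VOWELS = set(unicodedata.normalize("NFC", "āēīōūȳ"))
SHORT_VOWELS = set("aeiouy")
# Assumption: a deliberately short diphthong list, enough for illustration.
DIPHTHONGS = ("ae", "au", "oe", "eu")


def is_heavy(syllable: str) -> bool:
    """A syllable counts as heavy if it has a long vowel, a diphthong, or a coda."""
    s = unicodedata.normalize("NFC", syllable).lower()
    if any(ch in LONG_VOWELS for ch in s):
        return True
    if any(d in s for d in DIPHTHONGS):
        return True
    return s[-1] not in SHORT_VOWELS  # ends in a consonant


def respell(syllables, stress=None):
    """Join pre-divided syllables with hyphens and uppercase the stressed one.

    `stress` is an optional index that overrides the penultimate rule,
    needed for exceptions such as final-stressed words.
    """
    n = len(syllables)
    if stress is None:
        if n <= 2:
            stress = 0          # monosyllables and disyllables: first syllable
        elif is_heavy(syllables[-2]):
            stress = n - 2      # heavy penult takes the stress
        else:
            stress = n - 3      # otherwise the antepenult
    return "-".join(
        syl.upper() if i == stress else syl for i, syl in enumerate(syllables)
    )


print(respell(["ad", "ju", "vō"]))        # light penult, so stress falls on "ad"
print(respell(["il", "lūc"], stress=1))   # manual override for final stress
```

The sketch leaves out exactly the cases listed above (obiciō, biiugus, abripiō), since those depend on how the syllables are divided in the first place.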
Actually, here's a somewhat more radical proposal: I think it might be good to eliminate IPA phonemic transcriptions from Module:la-IPA. I don’t think most readers know what the difference is between // and [] anyway: I’ve seen people elsewhere online misunderstand our entries as displaying two separate pronunciations. (For example, the author of this recent Reddit comment thought it was an alternative pronunciation: “I see these pronunciations listed for classical Latin: /kon.stan.tiːˈno.po.lis/, A little googling showed me that the lowercase j in superscript position means palatalization, so I guess that was an alternative pronunciation sometime, somewhere.”) It may be clearer to use an obvious non-IPA respelling showing phonemes and syllable divisions, followed by (reasonably broad) phonetic transcriptions for Classical Latin and Ecclesiastical Latin respectively. That way, we also can avoid some tricky questions such as what the phonemic identity of word-final -m was in Classical Latin, and whether assimilations such as bs > operate on the phonemic or phonetic level. Going back to adiuvo, my proposal would be for its pronunciation to be displayed as follows: "AD-ju-vō, Classical Latin IPA(key): , modern Italianate Ecclesiastical IPA(key): "--Urszag (talk) 00:45, 1 May 2025 (UTC)
@Urszag I'm all in favour of overhauling Module:la-IPA (though not everything you suggest), but there are a couple of things here:
It needs its own thread, as it's a separate question to which orthography we use.
I don't understand what the attraction of using these bespoke, ad hoc standards is, when we are perfectly able to handle multiple spelling conventions. Nobody is suggesting that we remove the entry at adiuvo if adjuvo is made the primary lemma. This is not an either/or situation. In fact, we can make use of transclusion to ensure that we don't even lose information on alternative entries, so all that is achieved by this is to make things more difficult from a technical perspective, requiring a higher degree of maintenance from editors to ensure that the templates are fed the correct info. This is not helpful. It's one thing to make things more accessible to users; it's entirely another to do so at the expense of users who benefit from clear phonemic information.
So far, in a proposal to be more precise about the information given in headwords, we have proposals (a) to eliminate u/v as well, and (b) to remove phonemic information from pronunciations altogether. Collectively, we seem to have forgotten what the point of a dictionary is. Theknightwho (talk) 00:57, 1 May 2025 (UTC)
My point I guess is that any attempt to indicate all pronunciation information in the headword spelling essentially turns into a bespoke non-IPA transcription system. If you aren't in favor of indicating all pronunciation information using respellings like "AD-ju-vō", I don't understand why you think it's unacceptable to omit the fairly predictable i-j distinction in this context. "Inadequate" is a strong word to use for the popular i-u-v scheme.
I'm not married to the idea of removing the IPA phonemic transcriptions, but I'm not sure how to reconcile you first arguing that we should expect them to be useless for most of our readers, and then arguing that we need to retain them. It isn't a difficult task to infer them from the spelling along with the phonetic transcriptions.--Urszag (talk) 01:12, 1 May 2025 (UTC)
@Urszag But the pronunciation section is not the headword, and it is a well-established practice to include certain phonemic distinctions in Latin headwords. I'm not sure why it is necessary for me to demonstrate the purpose of headwords ab initio when proposing a relatively minor extension by analogy to a system we have used for 20 years. There are an awful lot of barriers being thrown up here, and we are losing sight of the original point of the proposal, which takes for granted the fact that headwords already include certain phonemic information that is not necessarily distinguished in running text.
And yes, the i-u-v system is inadequate for dictionary purposes. That does not mean it is pointless, that I dislike it, that we should ignore it etc. etc. It simply means that it is not adequate for our needs, and "fairly predictable" doesn't cut it for a dictionary, especially when you're proposing an ad hoc system to get around it instead of using a well-established standard that is both intuitive and widely understood. None of that precludes us having entries that use i-u-v, though, and we can make use of transclusion to get around that problem.
My point about users not necessarily understanding IPA was to drive home the point that the headword should contain as much phonemic information as possible; it was not to suggest that we should add to the confusion by only giving phonetic information, thereby making the phonemic difference between ⟨i⟩ and ⟨j⟩ even less clear. That would be a huge backwards step. Theknightwho (talk) 01:20, 1 May 2025 (UTC)
@Urszag: I agree with using ⟨j w⟩ for the IPA. I also agree that displaying both // and [] is excessive, but why not remove the []? It seems rather strange for a dictionary to auto-generate purported phonetic pronunciations dated to two millennia ago. (Not to mention that this encourages all sorts of invented pseudo-precision like for /d t/ and for /-um/.)
As for indicating pronunciation with non-IPA respellings, is it going to be any easier for readers to understand than, say, the symbols in /ˈadjuvo:/? We already expect readers to know basic IPA to access pronunciation in general on Wiktionary, or at least to follow the auto-generated link to an IPA key.
Ah, I'll start a third topic on this since Theknightwho pointed out that it's really another conversation. I think /ˈadjuwo:/ is relatively accessible (which is why I'm not sure I agree with Theknightwho's argument that "Most readers can't read IPA, and we shouldn't assume that any can"), but I think IPA stress, length, and syllable division marks are all probably less immediately intuitive to most readers than a non-IPA convention of marking stress with acutes or uppercase, length with macrons, and syllable divisions with hyphens.--Urszag (talk) 05:35, 1 May 2025 (UTC)
@Theknightwho: I do like your proposed system more than the current one, but help me out here: what makes it preferable to i-u? If we prioritize reader comfort/familiarity, aren’t we better off with the status quo of i-u-v? If we prioritize making phonemic distinctions in the lemma, then why omit other important ones (/V/≠/V:/, /VV/≠/VV̯/)? Or for that matter why not leave these distinctions to the pronunciation section, since that’s what it’s for anyway? Nicodene (talk) 04:03, 1 May 2025 (UTC)
@Nicodene Given the complete lack of consensus for any system (every possible system seems to have been proposed), an alternative approach might be as follows:
We use transclusion to display entries at multiple different lemmatisations. This requires a nontrivial amount of work to put together a module which is capable of doing this, but given the differences are orthographic and regular, the final product should be relatively painless for the average editor to actually use.
In the back-end, modules should use a maximalist distinction, on the principle (stated by @Babr) that distinctions are easier to remove than to add. As such, the actual working modules would make all relevant distinctions. Given that the system would be a technical one, it's not especially important what we use, so long as it's capable of making all of the relevant distinctions.
In terms of display, this could then be converted to the relevant system, be it Classically-faithful, the standard i-u-v system, or whatever.
Importantly, this would clear up the problems that we currently face with headword and inflection templates, which will currently generate nonsense like præacutae or uāgīvī.
This should be a way to ensure that no particular scheme gets prioritised, from a user perspective.
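To illustrate why a maximalist back-end spelling is the convenient direction, here is a minimal Python sketch (not an existing Wiktionary module) that derives the less-distinguishing spellings from an i-j-u-v string. The function name `derive_spellings` and the scheme labels are hypothetical, and the sketch ignores the complication that some i-u texts use V as the uppercase of u.

```python
# Translation tables that each remove one distinction.
J_TO_I = str.maketrans({"j": "i", "J": "I"})
V_TO_U = str.maketrans({"v": "u", "V": "U"})


def derive_spellings(maximal: str) -> dict[str, str]:
    """Derive display spellings from a maximal (i-j-u-v) spelling.

    Each step only discards information, so no lookup table or
    human input is needed in this direction.
    """
    iuv = maximal.translate(J_TO_I)   # merge j into i
    iu = iuv.translate(V_TO_U)        # additionally merge v into u
    return {"i-j-u-v": maximal, "i-u-v": iuv, "i-u": iu}


for word in ("adjuvō", "servit", "seruit"):
    print(word, derive_spellings(word))
```

In this sketch servit and seruit both come out as seruit under the i-u scheme, which is the loss of information the principle is meant to avoid: the merge is mechanical, while undoing it needs outside knowledge.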
I like the idea in general of using transclusion to avoid duplication and circumvent problems of where to lemmatize. I proposed a very similar idea before with Punjabi, where there are two writing systems (Gurmukhi, a South-Asian-style abugida used in India, and Shahmukhi, a Perso-Arabic script used in Pakistan). For the most part, neither script is losslessly convertible from one to the other (although maybe Shahmukhi with the right vowel diacritic system can be converted to Gurmukhi, but this may not always be the case and it's difficult to enforce the correct use of diacritics in most Perso-Arabic scripts, which outside of Arabic typically have no native tradition of doing so); and lemmatizing at either script is a potential political statement that we'd like to avoid. My proposal was to lemmatize using a maximalist romanization that captures all the distinctions in both scripts, and use transclusion to generate the appropriate lemma entries in the two scripts. Something similar can and should be done for Serbo-Croatian, which currently has duplicated entries everywhere. There are numerous technical issues to work out, esp. with the case of Punjabi, e.g. where to put the underlying romanized lemmas (in an appendix?), but it's definitely feasible. If you go down this route, it will be important to design such a system with other languages than Latin in mind, so that when the time comes to use it for Punjabi or Serbo-Croatian, we don't have to start over from scratch.
That said, I'm a bit leery of adopting such an approach here, because the number of lemmas where it would be used is only a subset of the whole, and it will impose a non-trivial technical burden on anyone wanting to add lemmas that might not be worth it considering that we're talking about a letter here and there vs. a fundamentally different script. Benwing2 (talk) 06:27, 1 May 2025 (UTC)
@Benwing2 I think there are a couple of things that should ease the burden for anyone adding lemmas:
Assuming we use the maximalist spelling as the baseline (i.e. all the distinctions), all other spellings should be trivially-derivable in 99% of cases. Morpheme boundaries may be awkward, but these should be pretty rare. This should require very little input from the user, but I think it makes sense to integrate any alternative spelling display with the pronunciation, given that anything which affects one will affect the other (e.g. if ⟨ae⟩ straddles a morpheme boundary, {{la-IPA}} should be using a.e anyway).
Generating alternative entries can be done via acceleration. The actual wikitext should be minimal, as everything should be transcluded. This also goes for any inflection templates etc etc.
As a sanity check, we will want some kind of inverse-inflection capability, similar to that in {{es-verb form of}}; this goes for alternative spellings and inflections (and combinations thereof).
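For the alternative-spelling half of that inverse capability, a rough Python sketch might expand a spelling that lacks the i/j and u/v distinctions into every possible maximal spelling and keep only those that match a known lemma. The names `maximal_candidates` and `resolve` and the tiny lemma set are hypothetical; a real implementation would query the actual entries and would also have to handle inflected forms.

```python
from itertools import product

# Each ambiguous letter expands to the letters it may stand for.
AMBIGUOUS = {"i": "ij", "I": "IJ", "u": "uv", "U": "UV"}


def maximal_candidates(spelling: str) -> set[str]:
    """All i-j-u-v spellings that could collapse to `spelling`.

    The number of candidates doubles with each i or u, which is
    acceptable for single words but not for long strings.
    """
    options = [AMBIGUOUS.get(ch, ch) for ch in spelling]
    return {"".join(combo) for combo in product(*options)}


def resolve(spelling: str, known_maximal: set[str]) -> list[str]:
    """Return the known maximal lemmas that `spelling` could represent."""
    return sorted(maximal_candidates(spelling) & known_maximal)


# Hypothetical stand-in for the set of maximal-spelling entries.
lemmas = {"adjuvō", "gladius", "servit", "seruit"}
print(resolve("adiuvō", lemmas))
print(resolve("seruit", lemmas))
```

Under these assumptions adiuvō resolves to a single entry, while seruit matches both servit and seruit, so a display page for an ambiguous spelling would need to point at more than one main entry.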
OK, I'm not quite understanding where you would put the maximally-spelled lemmas. Would they be mixed in with the regular lemmas, or segregated into an appendix or something? In the former case, how do we ensure that they don't show up in categories? (Are you planning on introducing some sort of special flag in Module:headword to mark "underlying source" lemmas like this, like we do for alternative forms?) Benwing2 (talk) 06:41, 1 May 2025 (UTC)
@Benwing2 The maximal spelling would be the "real" lemma, in the sense that it would contain substantive info that users might want to edit. The other entries would simply be pointed at it.
There does need to be a plan to avoid flooding the lemma category, yes. I'll have a think about how best to do it. Theknightwho (talk) 07:04, 1 May 2025 (UTC)
Do you mean something like the ‛mirror’ that I described here?
{{head}} has a parameter |altform=1 which excludes altforms from the usual part-of-speech categories and instead dumps them into ‛Category: alternative forms’. Nicodene (talk) 07:32, 1 May 2025 (UTC)
Yes I understand that, but my question was rather: (1) where will the "real lemma", as you call it, live? In the mainspace or in an appendix? (2) What counts as maximal? Is it the i/j/u/v form or does it include macrons for long vowels and maybe breves for short vowels? If yes to macrons and breves, what about unclear cases? (3) If it lives in the mainspace, what happens if the maximal spelling happens to agree with one of the externally visible spellings? Benwing2 (talk) 07:45, 1 May 2025 (UTC)
Mainspace. It should be possible to transclude the content of the entry, cutting/amending parts as necessary.
Maximal would mean i/j and u/v; any macrons or anything else can be taken from the headword. In theory, this could be taken from any entry, so long as the pronunciation section is complete, but using the maximal spelling builds in redundancy (e.g. the pronunciation and headword at the main entry should be compatible).
I’m not sure I understand this point: the maximal spelling would be a real entry, and I’m not suggesting we create pages with macrons or anything like that. Theknightwho (talk) 08:45, 1 May 2025 (UTC)
Oppose as a frequent user of Wiktionary for looking up Latin. When I look up a word, I look it up using the spelling I see. Most Latin texts I've encountered, including all modern Ecclesiastical texts, maintain the u-v distinction but do not use J. The reason for this is simple: in some pronunciation schemes of Latin (like Ecclesiastical Latin) there is a more significant (and less intuitive) pronunciation difference between u and v than between i and j. In our current system, users can still look up J forms, but since they're less likely to encounter those forms in the wild, I don't see why they should become the lemmas. Andrew Sheedy (talk) 02:12, 1 May 2025 (UTC)
@Andrew Sheedy Yes; currently we have soft redirects from Latin words with j in them to the corresponding lemmas with i in them. If we switched to lemmatizing at the j, we'd essentially reverse the direction of soft redirects, and e.g. ianua would be a soft redirect to janua instead of vice-versa. Benwing2 (talk) 06:04, 1 May 2025 (UTC)
@Benwing2 @Andrew Sheedy On this point, I've just proposed a system which would avoid prioritising any one system above, which should be a way to circumvent the issue of different people preferring different systems, or otherwise we'll always leave someone unhappy. Theknightwho (talk) 06:09, 1 May 2025 (UTC)
I would be fine with such a system. I agree with Benwing that it would be good to look beyond Latin and try to find a solution that would be compatible with other languages with multiple orthographies. Andrew Sheedy (talk) 20:07, 1 May 2025 (UTC)
I'd be happy with a technical solution that eliminates the need for soft redirects in either direction between i/j and u/v variants, assuming it has good performance and doesn't add much difficulty to creating new entries.--Urszag (talk) 20:47, 5 May 2025 (UTC)
@Mahagaja In brief: there is no consensus between editors for which spellings they prefer; some prefer i/j, some prefer the status quo, and some prefer no u/v distinction either. In addition, users may encounter a wide variety of spelling schemes, our entries are generally poor at accommodating these outside of the most common words, and Latin is far more likely than most languages to have many one-off/occasional users with low understanding of the language, due to the historical status of Latin as a lingua franca.
To get around all these issues, and to avoid massive duplication, I have proposed that we retain one spelling as the main entry (as now), but use a special template to transclude the content of entries to entries at the other spellings (which would have minimal wikitext). This is a system that already works well in other languages, though there is no need to follow the exact same system as those. Theknightwho (talk) 14:21, 6 May 2025 (UTC)
Oppose. Using j in Latin just looks hopelessly old-fashioned, and (unlike u and v) they never contrast. Wiktionary would look like it was being written in 1825 instead of 2025 and there would be no benefit. —Mahāgaja · talk 13:59, 6 May 2025 (UTC)
No, it isn't. From w:Minimal pair: "In phonology, minimal pairs are pairs of words or phrases in a particular language, spoken or signed, that differ in only one phonological element, such as a phoneme, toneme or chroneme, and have distinct meanings" (emphasis added). Iēsūs and Jēsūs don't have distinct meanings, so they're not a minimal pair. And what's the evidence for a trisyllabic pronunciation anyway? —Mahāgaja · talk 14:31, 6 May 2025 (UTC)
@Mahagaja We do not need a perfect minimal pair in order to see that there is a distinction between adjuvō and gladius, which is made evident by the fact that coadiūtor has the wrong pronunciation because it has been wrongly assumed to have syllabic ⟨i⟩. Theknightwho (talk) 14:47, 6 May 2025 (UTC)
The fact that {{la-IPA}} hasn't been written carefully enough to get the pronunciation of coadiūtor right doesn't prove that the distribution of /j/ and /i/ is unpredictable. Neither */a.diˈuwoː/ nor */ˈɡlad.jus/ is a possible word of Latin. —Mahāgaja · talk 14:54, 6 May 2025 (UTC)
@Mahagaja In what way is the pronunciation of coadiūtor deducible as coadjūtor from the spelling? That's aside from examples like Gāius, where the syllabification can be traced over time reducing from 3 syllables to 2. You are repeating a common dogma that doesn't actually reflect reality.
"Neither */a.diˈuwoː/ nor */ˈɡlad.jus/ is a possible word of Latin." Explain, because you seem to be making etymological inferences. Theknightwho (talk) 14:57, 6 May 2025 (UTC)
I guess it's the morphology rather than spelling that clinches it. Anyway, when is Iēsūs ever trisyllabic? I've certainly never heard it pronounced that way in Church Latin in my 40 years of singing in church choirs. —Mahāgaja · talk 15:10, 6 May 2025 (UTC)
@Mahagaja No, the morphology simply isn't relevant. You run into more difficulties when you compare injūrus with paliūrus, where the distinction is clear due to the etymology, but completely opaque phonemically. I can have a look for evidence of the syllabification of Iēsūs, but you can't ignore the evidence for the two different syllabifications of Gāius. Theknightwho (talk) 15:17, 6 May 2025 (UTC)
Though for what it’s worth, I’ve been singing choral music for over 20 years and I’ve never encountered it either, so it may be limited to Late Latin. Theknightwho (talk) 15:39, 6 May 2025 (UTC)
The two different syllabifications of Gāius are irrelevant because (1) once again, it's the same word, so not a minimal pair, and (2) they belong to different time periods. It isn't a contrast. And iniūrus has a morpheme boundary that paliūrus doesn't. —Mahāgaja · talk 16:08, 6 May 2025 (UTC)
@Mahagaja The morpheme boundary is irrelevant to deriving the pronunciation from the spelling, and using that would prevent u/v from being contrastive as well: servit (serui-t) and seruit (ser-u-it). It's a specious argument, as is the argument about time periods (we aren't representing one time period) and phonemic contrast (this would prevent us representing poetic metrical differences at all). Theknightwho (talk) 16:14, 6 May 2025 (UTC)
The morpheme boundary is absolutely crucial to the question of whether or not the distribution of /i/ and /j/ is predictable. As for Iēsūs, if a trisyllabic pronunciation ever existed at all, I would expect it to be early, used when the name was a relatively unfamiliar borrowing from Greek and the Greek trisyllabic pronunciation was being copied. As the name became more familiar through Christianization, the much more nativelike disyllabic pronunciation probably ousted any foreign-sounding trisyllabic one. —Mahāgaja · talk 17:06, 6 May 2025 (UTC)
Julius has three syllables with /j/ and his supposed progenitor Iulus also has three syllables with /i/. There's no way to predict this automatically. jam has /j/ but its derivative etiam has /i/ even though there's a morpheme boundary after the et- (and if you argue there isn't, your reasoning is circular). In general the only difference between the i-j distinction and the u-v distinction is that the latter has more functional load, but both are phonemic and IMO there's no good reason for making one distinction but not the other. Benwing2 (talk) 20:06, 6 May 2025 (UTC)
There are too many cases where the difference is unpredictable. Cf. am versus ambus and the cases listed by Cser (2016: 14) such as bela versus sila. Nicodene (talk) 22:30, 6 May 2025 (UTC)