In the modern Dutch alphabet, the digraph ij is used instead of y (although it's often written like a y with an umlaut), but in Early Modern Dutch y was used (up until 1804 apparently). But if you search Wiktionary for any of the y versions, you won't find them. Should we be including the y versions in Wiktionary? And if so, should they be listed under Early Modern Dutch or just Dutch? (Early Modern Dutch is not listed at Wiktionary:List of languages.) Nosferattus (talk) 01:35, 1 December 2024 (UTC)
Yes, these forms should be included. Wiktionary treats any term written after 1500 as modern Dutch. There are already a few of these y-forms added: zyn, cyfer. As you can see, they use the {{obsolete spelling of}} template. I think their inclusion is limited, not because they shouldn't be included, but because editors mainly focus on adding terms in current use.
It would perhaps be a good idea to create a template similar to {{pt-pre-reform}}, to better organize the obsolete and superseded spellings.
We still find this in Max Havelaar, written in 1860. For example, on just one page we find myn, my, blyken, hy, tyd, belangryke, twyfel, pryzen, stryden, misdryf and zyn. The author used his own, somewhat idiosyncratic spelling, though. --Lambiam 10:30, 1 December 2024 (UTC)
Regarding Dutch spelling, pannekoek is given as a superseded spelling of pannenkoek, and the latter is the official spelling, but not everyone agrees with that; see Witte Boekje. Shouldn't 'pannekoek' rather be a 'non-official spelling'?
Reveal potentially shocking/NSFW images only upon clicking?
I visited loxoscelism to add a translation and was greeted by a slightly revolting image.
I would be in favor of hiding such images behind a "click to reveal" message so that they aren't shown by default. This should be quite easy to do using JS.
FWIW, this was discussed in 2015 and last year. If we start censoring images, it's a slippery slope: people made headlines the very week we last discussed this for censoring Michelangelo's David. People—you see them on Talk:gay as recently as this week—complain that gay people are pornography / NSFW, and pass laws to that effect. There are people who think, and seek laws saying, trans people are pornography / NSFL. There are people (conservative Jews, Muslims, Christians) who think images of any women are NSFL. People complain about the image at swastika, or issue legal challenges (at least to WP) over maps of countries they'd prefer had different borders. Some people object to the image at penis, or the image at areola, but I think they're worth a thousand words, and don't see why a workplace would be OK with someone looking up penis, and only freak if the dictionary were illustrated—as others said in prior discussions, if one works at such a place, one may need to avoid Wiktionary at work, since images are liable to show up. With that said, I acknowledge that it's reasonable that we unofficially have some practices, e.g. the entry for mangle doesn't contain an image of a mangled body even though it theoretically could. I don't mind the image at loxoscelism, but I'm not entirely opposed to hiding some images behind a click... I'm just very wary of the slippery slope. One idea, if this doesn't exist already, is an opt-in gadget which would hide all images and require a click to see them; that avoids the slippery slope by being image-agnostic and opt-in. - -sche (discuss) 16:43, 1 December 2024 (UTC)
@-sche: Thanks for the reply as well as the links. One of my take-aways is that selectively hiding pictures (behind a "click to show" message) is not at all "politically unviable" on Wiktionary.
As has been pointed out repeatedly, finding a sane demarcation will prove difficult. Reading these discussions, the impression I got was that the wisest strategy would be to start with a very liberal policy (that is however enacted for everyone by default in an opt-out fashion) and then have people incrementally work out amendments in subsequent (BP) discussions. These kinds of demarcations are not found conclusively in a single sweep. My mentioning of "NSFW" above was probably ill-advised, so what I'd suggest now as a starting point for which images to hide by default is (medical) gore, i.e. photos of wounds, deformations, the effects of disease, photos taken during surgeries, etc.
one may need to avoid Wiktionary at work, since images are liable to show up.: That's true; currently, people cannot access Wiktionary at work (or in similar situations) free of risk. What I would point out is that this is an unusual and thus surprising fact about Wiktionary as a dictionary. Of all the dictionaries I've used, I don't think there has ever been another one where I had to be cautious using it in front of other people. — Fytcha〈 T | L | C 〉 19:07, 1 December 2024 (UTC)
Whether it is a slippery slope depends on the art of formulating policy, otherwise of course it can be watered down if we are unsure about it. We can distinguish motivations by which people might avoid images. For cases of medical irregularities the hardiness which we expect differs — one may well prefer a certain time of decision and mental preparation to see the image because there is only so much repulsive content anyone can consume without his affective wellbeing being called into question – from the responsiveness to the regularly behaving exposed human body. If someone does not suffer locally appropriate coverage of it on كَتَبَ (kataba, “to write”), it is his problem, and it is not even easy to have a depiction of an action while on the other hand the majority of the internet is porn anyway, and grounds for much greater dissonances and contradictions to scripture offended readers would have to care about, calling the survival of Islam in the 21st century into doubt, a question of available and appropriate attention we have to put into the balance. We do not have to equate illness, violence, nudity, and making love. There is also a historical depth to the matter: I guess Nazi stuff falls under “violence” but we can expect a distance towards things because of how long ago a thing prevailed, possibly again leaving only a limited number of images.
However yes, I’d rather not burden our editors with dealing with thinking about the general guidelines even, and keep a policy of deliberate ambiguity beyond what we have written. You could try some technical execution anyway of course, just that, unless we exert ourselves much to bloat our policy pages, the eventual uncontentiousness of which is doubtful, we won’t deploy it with discernible regularity beyond reverting new users futzing around with images by reason that “I have 10,000 edits per year/I am admin and I know well enough which pictures are appropriate in the given context, you however have an ideological agenda, from what I can see.” It would result in templates and/or gadgets which, in effect, new users would be discouraged from using, not to say disallowed from using. Fay Freak (talk) 17:48, 1 December 2024 (UTC)
However, we take the cutoff between Middle English and Modern English to be around 1500, so it has always struck me as anachronistic to say that the Modern English word arose in the 13th century. That information should be at the Middle English entry.
I'd like to propose moving the origin date of these senses to Middle English, then replacing the Modern English {{defdate}} invocation (some of which can be found using this crude search) with
Commenting and subbing as I have been wondering the same for Lechitic lects. I similarly do not use {{etydate}} in Polish if the term was inherited from Old Polish, etc. Vininn126 (talk) 08:31, 2 December 2024 (UTC)
I don’t see a contradiction perforce. You propose to water down, information that could later be used, to edit history. If these datings are credible information at all and not random attestation ages that can happen with the Middle Ages; we still have not solved the problem of regularly labelling “reconstructed” lects, which would allow us to cleanly state things like “probably in the 4th century already, but attested from the 9th”; okay I sometimes use the etymology for this, as on بَال (bāl), if a reconstruction entry is not feasible. How is the move of Byzantine Greek going? 🙄 Fay Freak (talk) 18:05, 3 December 2024 (UTC)
Would sister languages also be marked as being attested since that time? Would we use Latin to mention since when we see attestation dates of Spanish? Vininn126 (talk) 18:47, 3 December 2024 (UTC)
Our coverage of Latin is larger. The decision depends somewhat on how secure an individual language’s editors are expected to be with the corpora, and what they can expect to be created any time soon. If we had lots of Greek entries having such phrasing as proposed, the planned reorganization would be considerably more challenging, demanding to revisit the attestation situation in affected cases. Just let the editors—including you—leave as much as they know, in so far as it is not overwhelming to the eye?
To ever halt before your problem, one has to construe one’s task as an editor gigantic enough that one boasts to never leave any gap, inconsistency, or inconsequence, which also does not align with reality, in as much as the presence of a gap, inconsistency, and inconsequence appears to align not with the actual reality of a language. Instead we acknowledge our finite manpower. Not some imaginary limit stemming from language cutoffs, the purpose of which apparently one has to remind editors about once in a while again: inasmuch as they are justified by mutual intelligibility of languages, they do not constitute impermeable walls, though we may remember them as such and speakers constitute their identities by such ideas to some degree; instead the language headers, subheaders and labels are there to communicate something which you otherwise wouldn’t immediately relate to them. Seen in such a way, the defdates to senses are, beyond their situation in time and place—as identified by dialect and chronolect headers—, exactly what the dictionary glosses to senses of a word are supposed to do. What you bring up as a question of logic turns out to be a question of balance. Fay Freak (talk) 22:27, 3 December 2024 (UTC)
Adding the information to the Middle English entries definitely seems like a good idea. While I can see the theoretical justification for replacing dates before 1500 with "Middle English", I'm not sure that change is really an improvement: it obviously removes some information, and the periodization convention of distinguishing between Middle English and Modern English is not particularly significant in and of itself. --Urszag (talk) 22:35, 3 December 2024 (UTC)
I support using dates with definitions, so 13th to 16th century and not from Middle English to the 16th century. The date range is more informative and easier to read. The sometimes arbitrary boundaries between stages of a language can live in the etymology section and the categories generated from it. Vox Sciurorum (talk) 00:30, 4 December 2024 (UTC)
I would delete that from the list of Modern English senses and move it to the etymology section (‘from Middle English foo “bar”…’) and to the Middle English entry. Nicodene (talk) 21:09, 4 December 2024 (UTC)
By all means add information to Middle English entries, but I don't see any reason to remove it from English entries. The proposal just makes things vaguer and more imprecise. The distinction between ‘Middle English’ and ‘Modern English’ is just a historical convention anyway; for a linguist, enforcing this distinction in practice is next to impossible if you're working with texts from the 16th century (as I have done here in the past). At least with Old English there is a clear break in the written record which makes the change in grammar and vocabulary pretty sharp. Ƿidsiþ 07:23, 7 December 2024 (UTC)
I agree. Plus, anyone who cares about the distinction between Middle English and modern English can extrapolate from the dates given. But anyone who has never heard of Middle English (which is probably most people) won't find the information meaningful. Andrew Sheedy (talk) 05:04, 20 December 2024 (UTC)
I agree. English is a bit of a special case relative to other major languages, because so much of its vocabulary entered the language late. I wouldn't go past Old English, and maybe not even that far (I would be fine with the defdate template reading , but not . Andrew Sheedy (talk) 17:18, 20 December 2024 (UTC)
Support, provided the 'removed' information is transferred over to Middle English. I disagree with changing "c." to "century" though. Regardless of whether you're doing things online or on paper, it's generally a good idea to optimize the space used and keep things concise; it just looks prettier that way. MedK1 (talk) 04:05, 26 December 2024 (UTC)
FYI: December 2024 Unicode update
IMO it is confusing that we use 'forms' to mean 'spellings' in categories like Category:American English forms and Category:European Portuguese verb forms and Category:Brazilian Portuguese forms superseded by AO1990; also for that matter, more generally in CAT:Obsolete forms by language, CAT:Archaic forms by language, etc. Most of the descriptions of these categories make clear that the "forms" referred to are superseded/archaic/obsolete/etc. spellings, not some other kind of form. Even opening up Category:Ukrainian archaic forms produces 5 subcategories whose names all contain "spellings" or "terms spelled with" in them. Unfortunately the term "form" is badly overloaded at Wiktionary; any action to reduce the overloading is welcome in my book. So I propose at first to rename ad-hoc language-specific categories containing 'forms' to 'spellings'; and if there are no objections, rename the more general 'LANG superseded/archaic/obsolete/dated/rare/uncommon/informal/nonstandard forms' -> 'LANG superseded/etc. spellings'. Any terms that are in a 'LANG foo forms' category but aren't mere spelling variations should be moved to the corresponding 'LANG foo terms' category (which exist for all 'foo' except for 'superseded', but 'superseded' seems specifically for spellings, so this is unlikely to be an issue). The only 'foo forms' category I've excluded is 'LANG short forms', which is using "forms" differently, and should be eliminated in favor of either 'LANG ellipses', 'LANG clippings' or 'LANG abbreviations' (depending on what sort of short form is involved), but that's a different can of worms. Benwing2 (talk) 09:08, 7 December 2024 (UTC)
Thanks for the ping. The entries in these categories don't seem to be all of one type: it seems they will need pruning (especially, but not only, if renamed) iff people still want to distinguish spellings from forms. (Or are we abandoning that distinction? I know some later commenters in that discussion argued for that instead, and I'm not sure whether a decision was reached or, if not, which approach would be best.) For example, I see we have anemia as an American form of anaemia (it should indeed rather be spelling if we're distinguishing those two words), but we also have airfoil in the same "American forms" category but using an "American spelling" label although it differs from aerofoil in more than just spelling. Likewise Abissinia, currently listed as an archaic "form", would be better as a "spelling", but the difference between adipsy and adipsia is not just spelling; abyssus, currently presented as an "archaic form of abyss", also does not seem like a mere archaic "spelling", but perhaps it is also best not as a "form" but as an archaic (or obsolete) synonym of abyss, or as (obsolete) Abyss. So, especially (but not only) if renaming the categories, it seems like we need to decide what we want the scope to be, and whether we want to distinguish "only the spelling is different" and "the pronunciation is also different" or combine those two things...? - -sche (discuss) 17:23, 9 December 2024 (UTC)
Hmm. In practice I suspect people won't be able to distinguish mere archaic/obsolete/American/British spelling variants from those that also differ in pronunciation (aluminum vs. aluminium). At the same time I think "form" is far too overloaded. Maybe we could say "American English variants" etc.? Also technically the "European Portuguese verb forms" vs. "Brazilian Portuguese verb forms" reflect slight differences in pronunciation; they are mostly in past tense -amos (Brazilian) vs. -ámos (European), which is meant to indicate a difference in vowel quality. Likewise although the majority of "Portuguese forms superseded by AO1990" are just spelling differences, there are a few that are not, e.g. pre-reform abeto Douglas vs. modern abeto-de-douglas (although in that case the definition specifically says "pre-reform spelling of ..." and it seems there was also a pre-reform abeto de Douglas). So maybe we should use the term "variant". As for alt forms vs. alt spellings, I do think we should try to make that distinction since some of the things tagged as "alt forms" differ quite a bit from the form they are said to alternate with. Benwing2 (talk) 20:50, 9 December 2024 (UTC)
The Portuguese editors I know and I use {{alt spelling of}} when the difference is in spelling but not in pronunciation, at least “phonemically” — i.e., different spellings between European and Brazilian Portuguese are alternate spellings because the difference comes from each dialect’s pronunciation of phonemes, not just of that particular word. Meanwhile, I use {{alt form of}} when it’s a different pronunciation that doesn’t stem from a systematic difference between dialects.
However, this distinction in template usage is almost entirely moot if the category that gets assigned is the same. I think the most useful decision is to create new categories, 'LANG archaic spellings' etc., as daughters of 'LANG archaic forms' etc. This would, though, require us to pay real attention to replacing the category tree definitions as well as the categorizations called by templates. Polomo47 (talk) 01:48, 10 December 2024 (UTC)
Or just replace it entirely with {{clipping}} (of), since syncope is a type of clipping anyway, and it's not clear why one would want a specialized category for it. Nicodene (talk) 10:23, 9 December 2024 (UTC)
I actually only had instances of clipping in Polish entries. Syncopy might be seen as more phonological and clipping is often a process in more colloquial things. Not sure. Vininn126 (talk) 10:25, 9 December 2024 (UTC)
There isn't a difference, as far as I am aware, other than the fact that syncope refers to clipping in medial position. And that it sounds fancier. Nicodene (talk) 10:37, 9 December 2024 (UTC)
I too was under the impression that syncopy is a purely "mechanical" phonetic process whereas clipping is a deliberate removal of syllables used to coin new words. Not that I have any source to support that interpretation but... PUC – 11:18, 9 December 2024 (UTC)
I can't find any sign of such a difference outside the realm of (accidental?) Wiktionary convention. Google Books, for instance, brings up a laundry list of sources confirming that these are indeed synonyms. Nicodene (talk) 12:00, 9 December 2024 (UTC)
I have to add my voice to the chorus of people saying that I find the present distinction to be valuable. I certainly wouldn't insist on the current terms used - and I am increasingly convinced we shouldn't keep using them as we are. But distinguishing between clipping that occurs as part of a gradual phonological process (e.g. Romance syncope, or English fancy) vs deliberate, conscious truncation (e.g. math) seems very valuable. This, that and the other (talk) 00:22, 10 December 2024 (UTC)
Very well. In that case the issue is finding a pair of terms that can reasonably be specialized in the way that you have described.
I personally am fine with elision and clipping respectively since we're already using clipping in the sense of "conscious truncation". Benwing2 (talk) 07:27, 10 December 2024 (UTC)
It could be that these are all part of the same process. I'm not sure I like the usage of ellipsis - differentiating skipping a word versus a syllable (and from there skipping a syllable in other ways) could be useful. Perhaps that could be a separate parameter. Vininn126 (talk) 10:32, 9 December 2024 (UTC)
We have cite-thesis. I was wanting to add a quote from a thesis to an entry, and the quote, quote-book, and quote-journal templates are not fit for purpose. Is it worth putting it to a vote? I don't know if I can do that as I've only been active on Wiktionary quite recently. Cameron.coombe (talk) 23:04, 9 December 2024 (UTC)
I've been doing mass-correction of Portuguese pre-reform or otherwise archaic spellings — for reference, see how many pages are listed in WT:RFVI, and how I've cleared out ]. My current project is clearing out the categories dated forms, archaic forms, and, above all, obsolete forms.
I'm just fixing a quote here and noticed the editor put a title (Dr. med.) preceding the author name. Is this established practice here? I don't personally like it because it's clutter and likely not applied consistently. But I couldn't find a policy. Cameron.coombe (talk) 04:45, 10 December 2024 (UTC)
@Cameron.coombe: I don’t think we have a policy on this yet, but I always remove titles and forms of address unless they are strictly required to identify the author (for example, in some early works, female authors were named after their husbands, as “Mrs. John Smith”). — Sgconlaw (talk) 05:07, 10 December 2024 (UTC)
Is there a reason behind this? Whether an adjective is general use (so no note), attr only, postpositive only, or pred only is important information, especially for non-native speakers, and it's provided in other dictionaries. There is an attributive label in the lb template, but it links to a gloss of the meaning for nouns, not adjectives, and, at least based on the common examples above, doesn't seem to be in use? Cameron.coombe (talk) 10:53, 11 December 2024 (UTC)
It is a counsel of perfection that we should properly label every adjective sense that needs such a label. Add the label to the appropriate senses when you find them. I can't think of a practical way to detect all the cases where such labels are missing. A list from some source would be helpful, probably just for the more common cases.
We have labeled as "attributive" (mostly not "attributive only") some 200+ English terms. To label a sense of a polysemic adjective "attributive only" may risk user confusion. DCDuring (talk) 14:38, 11 December 2024 (UTC)
@DCDuring thank you for the thoughts. I'm quite happy with simple "attributive," which is what I'm familiar with from other dictionaries. "Mostly attributive" can also be helpful if the pred. sense is rare or nonstandard but attested. I'm not sure about the label auto-linking here, though, when I'd use it for adjectives. Cameron.coombe (talk) 22:11, 11 December 2024 (UTC)
If you are saying that we should link the attributive (and postpositive and predicate) labels to something explanatory, I agree, though I would usually settle for our entries for the terms or {{senseid}}ed definitions at the entries. It also might make sense to have categories for the terms that have such labels. Making the changes required is not in my wheelhouse. DCDuring (talk) 22:30, 11 December 2024 (UTC)
@DCDuring cool, thanks. I wasn't familiar with senseid. I can have a play next time I need to. My only concern now though would be adding a whole lot of attributive labels and then having someone go through and revert them. I've got your support, but I don't know how universal that translates to! Cameron.coombe (talk) 23:15, 11 December 2024 (UTC)
I'm already disagreeing with myself about my rejection of attributive only as a label rather than attributive. The normal ("unmarked") state of an English adjective is that it is prepositive and usable both attributively and as a predicate. The function of our labels is to mark exceptions to the unmarked state. Bare attributive does not do this, IMO. I don't know that we can be certain that only should follow attributive, because exceptions are likely, if not now, then perhaps in the past, and if not in the UK and North America, then in Australia, the Caribbean, or India. Maybe the default for all of these should contain usually, with stronger only reserved for cases where the supporting evidence is strong. DCDuring (talk) 00:32, 12 December 2024 (UTC)
True, attributive only or usually attributive is more precise. Other dictionaries use simply attributive, but probably because of space restrictions. (I know space isn't a concern for Wiktionary, but is clutter?) For exceptions, I would handle these as subdefs:
I note that User:Gapazoid is (in addition to being locally blocked) globally locked as a "Vandalism-only account", though User:Xaosflux stated that global unlocking might be considered if en.Wiktionary unblocks. Unless Gapazoid has deleted contributions on other wikis that I can't see, I actually find the global lock rationale harder to understand than the local block rationale; the user appears to have edited only en.Wiktionary and en.Wikipedia, and the few edits to en.Wikipedia appear to be mundane copyediting. AFAICT Gapazoid has made only a single edit to Wiktionary content (?), to MAP; the only other (eight) edits the user has made were to his/her talk page; is this correct? (I see no deleted contributions.) If Special:Contributions/2600:387:0:803:0:0:0:95 (also locally blocked and globally locked) and/or Special:Contributions/2603:6011:C8F0:E4E0::/64 are the same person, their own sole contributions were to threaten, on Gapazoid's talk page, to commit suicide. If the user has made other edits I have missed, either on Wiktionary or elsewhere, I hope someone will bring them to light. The user's edit to MAP was to change the usage note from "commonly interpreted as a sign that the speaker supports (or is sympathetic to) such people" to "...a sign that the speaker supports sexual contact between adults and children". That change seems mistaken / incorrect to me, and had I seen it, I would have undone it with an edit summary explaining that the "supports such people" language seems more accurate, but — if the edit had been made with no edit summary, or with a mundane edit summary — I would have taken it to be a mistaken but good-faith edit, not vandalism, and would not personally have issued a block. However, the edit was made with an edit summary which, like the user's posts on his/her talk page, states that he/she is a pedophile but is opposed to child sexual abuse.
I can understand the user's objection to the original block summary saying he/she engaged in "pedophilia advocacy", and I appreciate the improved block summary. I also understand the position that a user openly announcing himself/herself as a pedophile is disruptive, somewhat similar to w:WP:HID; threats of w:WP:SUICIDE also seem disruptive. I'd also note that my spider sense is that people who are blocked for things like this and then spend this much time trying to get this many wikis / functionaries / organs of the WMF / etc involved in unblocking them . . . in the situations in the past where it's happened, such users have either been felt also by other wikis' admins to be NOTHERE (and so remained blocked not only here but also on other wikis that considered their appeals), or have been unblocked but then proven themselves to indeed be disruptive (NOTHERE, here only to bog people down in debates, etc) and gotten reblocked in time. Considering all of that, I, for my own part, as just one admin here, decline to unblock. If other admins (or other editors) want to weigh in, I encourage them to do so! I pinged Xaosflux above to make him aware of this discussion, and now ping User:Ghilt and will also leave a note on Gapazoid's talk page pointing to this discussion. - -sche (discuss) 00:44, 12 December 2024 (UTC)
Thank you for your statement, -sche. And also thanks to Surjection for changing the log entry. This concludes the matter for us. On behalf of the U4C, --Ghilt (talk) 09:27, 12 December 2024 (UTC)
After a private lock appeal I have unlocked the account. To respond to comments here, the lock was implemented after an SRG request due to pedophilia advocacy - similar to why we, for example, lock accounts for uploading CSAM on Commons, even if they only edited that project. With that being said, they have made a reasonable further explanation to me in private and I see it as a sign that this can currently be locally handled. EPIC (talk) 16:20, 19 December 2024 (UTC)
Protecting pages as "model pages"
Saltmarsh (talk • contribs) has semi-protected a couple dozen Greek entries as "model pages". I don't think this is a good practice, since it deters editors who could materially improve these pages (no dictionary entry is ever complete), and there are much better approaches, e.g. having example entries in a separate namespace. — SURJECTION/ T / C / L / 18:46, 11 December 2024 (UTC)
@Surjection, PUC, these pages were not locked; I have edited often (they used to be protected from anonymous Greek editors who mostly write vandalisms about soccer teams, and silly school jokes. That is because we, the editors of Modern Greek, are not around every single day). The models are in Category:Greek model pages so that we can copypaste from them. All languages should have copypaste models for us, because wikitext is getting harder and harder. Also see a trial at User:Erutuon/Ancient Greek model pages which is even more complicated. I always try to find copypaste patterns from recent edits by administrators; I would have liked to have them in some Cat with their endorsement, rather than going around Histories and their Contributions, hoping to find something similar to my task. If not protected, fine: but someone has to patrol them. Thank you. ‑‑Sarri.greek♫I 10:47, 12 December 2024 (UTC)
These pages are still semi-protected and many of them were admin-protected. I don't see any "anonymous greek editors who mostly write vanadlisms about soccer teams, and silly schooljokes" in the history of any of these pages, so they cannot simply have been protected to guard them from vandalism.
The idea of model pages on its own appears sound, but it's not a good idea to make the actual mainspace pages the 'model pages' and then protect them because they're 'model pages'. These should be in a separate namespace. — SURJECTION/ T / C / L /10:51, 12 December 2024 (UTC)Reply
No problem: unlock them, M @Surjection. We can make a list and write specific examples (because they cannot be changed without discussion; they are heavily copy-pasted) on the About Greek or Help Greek page. My administrator @Saltmarsh has done SO much for Modern Greek! I would like to help him a bit. It's just... I need a little help from programmers, for example a small template for the 1982 orthographic reform to the monotonic system (cf. Άγγλος.2024, cf. Notes). Little things like that. I could make it myself, but in my experience only interface programmers check templates and make them correctly. ‑‑Sarri.greek♫I 15:52, 12 December 2024 (UTC)
@Surjection, Sarri.greek: As far as I can see, these "model pages" do no harm (kindly point out any harm, if you see some). New editors need help with layout, which is not always easily extracted from "Help". Protecting them (which, again, does no harm) ensures that any changes in suggested layout can be discussed. — Saltmarsh☮ 14:29, 12 December 2024 (UTC)
Yes, thank you @Saltmarsh. We need to be able to trust some pages: the ones checked by an admin. By the way, I am checking some of the pages. When the bots finish their work, we can check again. (I only know named parameters; I cannot remember the order of positional parameters, and I hate it.) I have to throw away all my cheat sheets. Thank you, dear Salt!! ‑‑Sarri.greek♫I 14:38, 12 December 2024 (UTC)
I would oppose protecting any page in principal namespace on the grounds that it is a model page. Such model pages might be useful in Wiktionary space. I wonder how that could work in any page with multiple L2 sections.
It might be useful to have templates, possibly located on entry talk pages, that indicate that a given L2 section has achieved some stage of "completion", so that contributors could find such "models". DCDuring (talk) 14:56, 12 December 2024 (UTC)
Nice idea, thank you M @DCDuring. Something analogous to Wikisource's coloured bars: not reviewed / reviewed, see List so-and-so. A list of 'SOS' pages could be created, especially ones with three or two Greek L2s, for every part of speech, inflectional group, etc. I usually edit Ancient and Modern Greek in unison (many pages coincide, and Modern Greek constantly refers back to earlier etymologies and inflections, especially Hellenistic Koine, which has many problems and is usually ignored). I hope bots will normalise the standard templates, because it is very difficult to have two or three ways of writing the same thing on the same page. I am also waiting for the pending Medieval Greek (gkm). Thank you all for your attention. ‑‑Sarri.greek♫I 15:11, 12 December 2024 (UTC)
I realized that the situation in Ancient/Modern/Medieval? Greek made the model-page-in-principal-namespace idea practical for those languages, as other languages do not use the same characters. But it wouldn't work so well for CJKV entries where the different L2s often have different levels of development. I would prefer an approach that worked across all kinds of entries with multiple L2s. Maybe it would be useful to see what works for Greek-character entries along the lines that you suggest, without protecting model pages. That might be a 'model' for entries with multiple L2s in other character sets. DCDuring (talk) 15:30, 12 December 2024 (UTC)
I agree with DCDuring that there may be a case to be made for putting such entries in the Wiktionary namespace, but I also strongly agree with Surjection that these protections should be reverted. This is a bad use of the page protection mechanism. — Mnemosientje (t · c) 19:53, 17 December 2024 (UTC)
Well, @Surjection, "against the entire idea of having a wiki." Wikipedia has numerous such protected pages. These pages do no harm at all; I suspect that basically you "just don't like them". Well, I do! And these interminable discussions, which some people seem to relish, really piss me off!! — Saltmarsh☮ 19:58, 12 December 2024 (UTC)
Pages are protected because of vandalism, because of a high use rate (templates, modules, etc.), or because they are non-content pages that should not be edited by anonymous users. None of these applies here. Again, if we want to have "model pages" that are protected, then they need to be copies outside of mainspace. — SURJECTION/ T / C / L / 20:45, 12 December 2024 (UTC)
I agree that entries should not be fully protected (admin-only) unless they have been or are likely to become the target of enough vandalism to warrant that (and even then, unless the vandalism has been enduring, protection should generally be temporary, like the protection applied to words that appear on the main page). Protecting pages (even at a lower protection level) simply because they are "good" is not the way to go; as recent edits to some of the pages mentioned here have shown, they were far from complete, so preventing some people from improving them is inadvisable. I agree with Surjection that if the goal is to show ideal formatting (or such), it is better to have examples (or even one single example, e.g. a made-up word illustrating all possible things, such as how to format an adjective, a verb, and a noun all at once) somewhere in Wiktionary: space, like the language's "About" / "Entry guidelines" page.
Inspired by the discussion above, I looked at what other pages are indefinitely edit-protected at high levels. 吃飽 has been indefinitely protected, allowing only template editors(!) and admins to edit, since an edit war in 2019; is this still needed? The user who was edit-warring back then seems to have matured. (Even if there is still a problem, we now have the ability to block specific users from editing specific pages while still allowing them to edit the rest of the site, which seems better than protecting the whole page and thus blocking everyone from editing it.) - -sche (discuss) 00:08, 13 December 2024 (UTC)
Chiming in: protecting pages so that they serve as models is obviously unsound. This is Platonic idealism, which does not hold water empirically, given that there is always room for improvement; you just have not exerted yourself on it long enough. And protecting pages always expresses distrust of users, which needs some basis other than the quality of the page. Fay Freak (talk) 00:57, 13 December 2024 (UTC)
Cantonese, Hainanese, and Hakka lemmata treated as Chinese
@0DF: Chinese is a special case, because terms are simultaneously Chinese and any of a huge variety of sublects. The writing system has a lot to do with this, since it allows writing things that are basically the same words in writing but completely different when spoken. It's very complicated, with variations in grammar, in pronunciation, and in writing that only partly overlap.
There's a whole universe of Chinese-specific templates and modules that do things in a completely different way from anything else on Wiktionary. When I'm going through the Todo lists, I treat most of the Chinese-related stuff as false positives and leave it alone. In all likelihood, "fixing" things will just cause other problems. The other CJKV languages share some of the same issues and are best left alone, for the most part.
I do fix things like Chinese etymologies that use language codes for Tibetan, and any {{lb|en}} on CJKV definition lines, but I know my limits (I took a year of Beginning Mandarin at UCLA, but that was 38 years ago). Chuck Entz (talk) 04:16, 16 December 2024 (UTC)
@0DF: This has happened before (see here), and the correct fix would be something like Special:Diff/72937108/75521861, not changing it to a Cantonese L2.
@Chuck Entz: Yes, I was somewhat aware that Chinese is a unique case: unified in writing, but divided in speaking. Thank you for chiming in.
@Wpi: OK, I'll add {{cln|yue|lemmas}} vel sim. henceforth. That should fix things. Thanks for pointing me to the correct solution, and I hope you're successful in fixing the issue with Module:zh-pron. I'd already clocked that “If you do not speak the language(s) involved,…” caveat, but if I observed that literally, I wouldn't be being nearly as productive or helpful as I would be by being bold in editing. I'd already noticed that my changes to 0T were inadequate, hence my raising the issue in this BP section and then pinging you and the other relevant editors, which has led to the proper resolution, so I think I have my boldness–caution level fairly well calibrated.
Thanks. Regarding Hainanese, I believe it's because {{zh-pron}} does not support Hainanese (yet), so there hadn't been any category infrastructure for it. – wpi (talk) 14:40, 16 December 2024 (UTC)
The Wikimedia Foundation is in the process of rolling out temporary accounts for unregistered (logged-out) editors on multiple wikis. The pilot communities have the chance to test and share comments to improve the feature before it is deployed on all wikis in mid-2025.
Temporary accounts will be used to attribute new edits made by logged-out users instead of the IP addresses. It will not be an exact replacement, though. First, temporary users will have access to some functionalities currently inaccessible for logged-out editors (like notifications). Secondly, the Wikimedia projects will continue to use IP addresses of logged-out editors behind the scenes, and experienced community members will be able to access them when necessary. This change is especially relevant to the logged-out editors and anyone who uses IP addresses when blocking users and keeping the wikis safe. Older IP addresses that were recorded before the introduction of temporary accounts on a wiki will not be modified.
We would like to invite you to read the first of a series of posts dedicated to temporary accounts. It gives an overview of the basics of the project, impact on different groups of users, and the plan for introducing the change on all wikis.
We will do our best to inform everyone impacted ahead of time. Information about temporary accounts will be available on Tech News, Diff, other blogs, different wikipages, banners, and other forms. At conferences, we or our colleagues on our behalf are inviting attendees to talk about this project. In addition, we are contacting affiliates running community support programs.
Banning Proto-North Caucasian and Proto-Northeast Caucasian reconstructions
1. Proto-North Caucasian. In my opinion, there are currently no reconstructions of Proto-North Caucasian, simply because there are no reconstructions of Proto-Northeast Caucasian. I would prefer to end any discussion of this superfamily here and delete the category itself to avoid reconstructions.
2. Proto-Northeast Caucasian. As written above, I believe that there are no reconstructions of Proto-Northeast Caucasian. The so-called reconstructions of Starostin and Nikolaev are actually tentative pseudo-reconstructions. Moreover, they do not give Proto-Northeast Caucasian forms anywhere: all their reconstructions in the database are Proto-North Caucasian, apparently treated as identical to Proto-Northeast Caucasian. Recognizing this, Johanna Nichols uses the pound sign (#) for pseudo-reconstructions in her works.
This convention follows Williams 1989, who uses the asterisk for reconstructions based on regular sound correspondences and the # for "pseudo-reconstructions based on a quick inspection of a cognate set without working out sound correspondences".
Note the recent case of User:Qmbhiseykwos, who began to add (in addition to pseudo-reconstructions by Nichols and "reconstructions" by Starostin and Nikolaev) "reconstructions" by the Dutch linguist P. Schrijver (2018, 2021, 2024), which should also be considered pseudo-reconstructions; for example, Reconstruction:Proto-Northeast Caucasian/rɔḳʷ(ə).
2.1. Appendix. Since Wiktionary does not work with the concept of tentative pseudo-reconstructions, all such "reconstructions" of Proto-Northeast Caucasian should be listed only in an appendix, for example Appendix:Proto-Nakh-Daghestanian reconstructions.
2.2. Renaming. I believe it is necessary to rename the family to (Proto-)Nakh-Daghestanian. This must be done because the current name hints at a division of the Caucasian languages into North Caucasian and South Caucasian (Kartvelian), which is unacceptable.
2.2.1. Accordingly, it is necessary to rename the (Proto-)Northwest Caucasian to the (Proto-)Abkhazo-Circassian or (Proto-)Adyghe-Abkhaz, etc.
3. Proto-Daghestanian. It may be necessary to create a category for this family. For this family, there are interesting reconstructions by B. Giginejšvili (1977) and E. A. Bokarev (1981), but they do not seem to give any reconstructed forms. It is difficult for me to say anything here, since I have not studied these languages. I'll quote a comment by the American Caucasologist Alice C. Harris (2003: 180):
“It should be noted first that the phonetic reconstructions proposed by Nikolayev and Starostin (1994) and adopted by Alekseev (1985) are not widely accepted. For example, Nichols (1997) and Schulze (1997) show serious problems with the proposals in Nikolayev and Starostin (1994), and Giginejšvili (1977), Schulze (1988), Talibov (1980), provide reconstructions that are in various ways more rational”.
Not even Nakh-Daghestani sound correspondences are fully understood; to even consider enrolling Abkhaz-Adyghe here is insanity. Proto-North Caucasian was never _not_ controversial, so I have no clue why it was even added to Wiktionary in the first place. Nuke Proto-North Caucasian. On the other hand, banning Proto-Nakh-Daghestani reconstructions is perhaps too extreme. Imho, there's no great harm in having them exist even if it turns out they're wrong/imprecise. კვარია (talk) 14:39, 15 December 2024 (UTC)
Nuke North Caucasian, both as a group and as a reconstructed language. For Nakh-Daghestani/NE Caucasian, I'm fine with agreeing not to create reconstructions, but I think having a code may be a good idea nonetheless. We just need to patrol them from time to time. Thadh (talk) 21:57, 15 December 2024 (UTC)
Nuke North Caucasian yes, already since it's unclear if even the family exists at all. Tentative cognates in NWC could be noted in NEC entries if we end up having/keeping them (same as we do with longer-standing Indo-Uralic, Altaic, etc. etymologies).
Keep Proto-Northeast Caucasian. NCED's (and Schrijver's) reconstructions may have many problems, but they are generally not "pseudo-reconstructions", and there's enough reason to think many of them are at least valid etymological groups. Any etymologies where Nikolayev & Starostin propose NWC reflexes are given in PNC form, but this is mainly because they set up very few changes from there to PNEC. The one I find on a lookover of their preface is *gg(w) > *ddɮ(w). In effect they admit the reconstruction is of PNEC in the first place, but it comes out so complex they end up able to derive (their reconstruction of) PNWC almost directly from it.
I do not follow the argument against "Northwest Caucasian" and "Northeast Caucasian", which are perfectly illustrative and mainstream names as far as I can tell. What "division" is "unacceptable"? Treating Kartvelian / South Caucasian as an unrelated family? That, if anything, seems much closer to consensus than the question of North Caucasian, and I also do not see how this would be "hinted at" here.
"Daghestanian" as a distinct node is also not consensus, does not have distinct reconstructions for it either, and should not be added (seems IMO like an outdated typological unit against Nakh being more innovative). Probably we should not commit to any NEC grouping scheme beyond the unambiguous base units like Lezgic or Avar-Andic.
I agree, let's ban Proto-North Caucasian. I have no opinion on the rest of the issues. I will only note that there are many weak scholars and outright charlatans dealing with the three Caucasian branches. All of their etymological works should be reviewed by our more intelligent editors. @Qmbhiseykwos, no mindless copying, please. Vahag (talk) 18:03, 16 December 2024 (UTC)
Perhaps we should have a section in the Appendix namespace for StarLingish. It would fit right in with Klingon, Na'vi and other constructed languages from fictional universes... Chuck Entz (talk) 06:06, 17 December 2024 (UTC)
I, too, am surprised at the presence of Proto-North Caucasian on Wiktionary. It must have slipped past our eyes, like the proverbial monkey walking down the street that few would be willing to see, and was only included because Wikipedia or another reference was not unequivocal about its lack of acceptance. It has to be removed. Fay Freak (talk) 21:38, 16 December 2024 (UTC)
I know how much everyone loves a part-of-speech question, so here is another one.
"I was indoors."
"He is upstairs."
"They were outside."
"Look, your keys are there!"
Sometimes I feel that some dictionaries, including Wiktionary, are coy about giving examples of this nature, as if they are unsure of the part of speech of the complements. While these are not "traditional" adverbs, in that they do not modify anything adverbially in a traditional sense, and cannot be removed leaving a relevantly valid sentence, nevertheless they do answer adverbial wh-questions, and do not seem like adjectives. Some people call these "adverbial complements", I think. Are we happy to place these uses under "adverb"? Another possibility for some cases -- e.g. "outside" in these examples -- is "intransitive" preposition, in that "They were outside" implies "They were outside (somewhere/something)", but I'm not sure that this concept is fully mainstream. What do you think? Mihia (talk) 21:52, 17 December 2024 (UTC)
Gosh, I forgot entirely about that vote. Thanks for reminding me. Can we make any clear distinction between "I was indoors" being an adverb, and "Is Mr. Smith in?" being an adjective, as is currently listed at in — and, indeed, generally between my examples above and various other supposedly adjectival instances of other "short function words", where in some cases the philosophy, quite possibly perpetuated in part by myself, seems to be "if it's the complement of the be-verb then it's an adjective"? Mihia (talk) 22:54, 17 December 2024 (UTC)
Rethinking Middle Korean verb lemmatization
But I really cannot help but think (as I and others have already stressed) that this is misguided and adds needless confusion.
Etymology sections already use the faithful phonemic form by convention. This creates at best alt hyperlinks/double hyperlinks and at worst redlinks even when we have an entry for the MK verb in question. This is especially problematic because, let's be real, 99% of people ever going to MK entries on here do so through a MoK ety section.
In the discussion linked above, it was said that "by convention" Korean lemmatizes actual inflected forms for verbs.
Since when? Even in Modern Korean, 다(-da) is defined as a "dictionary citation form ending," sufficiently demonstrating that even within our morphological orthographic framework we are specifically citing "dictionary forms," not any real form in use.
This should carry over, and has carried over, to Middle Korean dictionaries. Consider four popular Middle Korean dictionaries: 15세기 국어 활용형 사전, 우리말큰사전(옛말과 이두), 고어사전, and 한불자전. Consider now that the first two use the "morphophonemic spelling," and only the last two use "faithful" spelling as we do now. Consider further that 고어사전 is a) from 1960 and b) also lemmatizes other forms such as the infinitive in some cases, with the express goal of being accessible to learners. We don't do this, we shouldn't do this, etc. 한불자전 is from 1880(!) and was written by a French missionary. Is this really the precedent for us to be following?
I would love to start adding more MK entries, but there are a lot of gaps right now in infrastructure(?) that make this difficult. This is IMO the largest blocker; I've brought this up countless times on the Discord, but I'd love to reach an actual BP consensus. Any input appreciated. Lunabunn (talk) 03:48, 18 December 2024 (UTC)
Agreed. I've already expressed my opinion on this several times before, but I'd rather the forms show the original stem. This will be beneficial for the learners and those curious in the long run, and it will help majorly advance the cause to create an automated conjugation template, most importantly for the header.
Additionally, I also want us to reach a consensus quickly, as there seems to be confusion about whether the Modern Korean etymology header contains the actual, attested form of the verb (=용언) or the root. The only real caveat is that syllable-final ㅸ looks pretty ugly: 어드ᇦ다, 셔ᇕ다, ᄠᅥᇕ다, etc. would be some of the roots we would have to add. Other than that, I think this is for the best.
Strong support with a suggestion. This exact issue has been on my mind for the past few years ever since editors, including myself, have begun adding significant numbers of MK verb/adjective entries. I thought I should speak on this matter as someone who has added numerous MK entries throughout the years. Thanks @Lunabunn for finally bringing this up.
I can now see that consistency for consistency's sake is really the only thing going for the current "historically faithful (allophonic)" framework we have. While that framework is unapologetically uniform in its lemmatization rules, I agree that it leads to needless confusion and comes at the expense of navigability. This is especially true for readers who likely access MK entries through MoK etymology sections (who I assume are the overwhelming--I can't stress this enough--majority, as you have mentioned). As for the "convention" from the previous discussion (I was there), I believe it referred not to dictionaries but to the MK spelling convention, i.e., 표면형(表面形) (phonetic), as using the 기저형(基底形) (morphophonemic) would be anachronistic.
Speaking from personal experience, this has also been quite confusing and time-consuming for even editors who are familiar with MK orthography. For example:
Having to actively think about the proper "historically faithful" lemma when creating wikilinks (see how, in ᄉᆡᆷ, I had to link 기픈 to the "proper" 깁다, ATM a totally unrelated MoK entry, instead of the phonemic 깊다, which is at least the descendant MoK entry), which isn't intuitive at all, even for me, a native Korean speaker accustomed to MoK orthography. Although correctly linked according to current conventions, I would imagine this would be utterly baffling to a beginner.
Having to add "phonemically faithful" stubs to make up for this (e.g., see the MK "entry" for 및다, which would become the main MK entry under this proposal), unnecessarily adding workload to the already thin MK editor base.
All in all, it is clear that, in addition to the points Lunabunn has brought up, the positives--if any, really, other than doing it ostensibly for faithfulness' sake--of creating lemmas consistent with historical MK orthography do not outweigh its numerous negatives. Avoiding anachronism is not a good enough reason to continue this. I am now convinced that this is not the goal we should aim for, especially since Wiktionary is a word dictionary and not a spelling guide. This is also what modern monolingual dictionaries do, and this is what we should follow, which is more in line with general Wiktionary policy. Moreover, we already don't do this for nouns, so the historicity argument is indeed moot.
However, I do not think an entirely phonemically faithful lemmatization scheme is desirable. As @Solarkoid specified, this would mean we would need to create entries such as 셔ᇕ다, which never appeared in actual MK or MoK texts (it's like an imaginary number) and is, well, yes, "ugly." Aside from looks, which shouldn't be something we consider in a dictionary, the general 표준국어대사전 and the academic 15세기 국어 활용형 사전 both actually list the historically faithful 셟다 as their headword, while mentioning the phonemic 셔ᇕ- as a "form" appearing before vowels (which is not wrong). Conversely, both list the phonemically faithful 맞다 rather than the historically correct 맛다 as the headword. Monolingual dictionaries (which conventionally cite verbs/adjectives with the ending -다) seem to treat the lenes ㅸ and ㅿ (distinct phonemes in MK) as exceptions, to align, I am pretty sure, with how MoK treats them. In fact, 15세기 국어 활용형 사전 explicitly states this in its preface. Therefore, you simply are not going to find 셔ᇕ다, particularly as the headword with the ending -다, in any mainstream dictionary or work (except, I suppose, research papers in Korean, which have the liberty of, well, not being a dictionary for learners; they could use forms such as 셔ᇕ다 all they want).
I believe we should not implement an entirely phonemically faithful lemmatization scheme merely for, again, the sake of uniformity, since popular modern monolingual dictionaries do not do this either; we would be the first dictionary to do so, and, like our current use of historically accurate forms, it is demonstrably not a "convention." As such, I think creating entries such as 셔ᇕ다, spelled in Hangul, would also be a source of confusion for those expecting the same lemma coming from popular MK dictionaries (as well as deviating from MoK morphophonemic standards and z as allophones calling it "irregular conjugations"] on which they by principle base lemmatization yet with which most people would be familiar). It's not that using 셔ᇕ다, besides aesthetics (lol), is inherently wrong (it's actually correct); however, 셟다, technically "wrong" (read: an inconsistent treatment), is how contemporary dictionaries have chosen to lemmatize in order to make it easier for modern readers unfamiliar with MK phonology. So it's really nobody's fault; we would just be following precedents--conventions, if you will.
Yet, this is not perfect either, as some words would be lemmatized according to a different principle from the rest (and for something as superficial as their spelling at that). Nevertheless, for the reason explained above, I propose that we likewise make exceptions for cases like ㅸ and ㅿ (the only exceptions I could find with a cursory review of dictionaries): in Hangul we would use the historically faithful spelling, but in romanization we would apply an entirely phonemically faithful scheme (containing the root). We can do this because Wiktionary is unique in always providing both Hangul and romanization for MK.
So, for example, in -ᆸ다 where ㅂ represents an underlying /β/, such as in 셟다, we would use the Yale W. Consequently, we would get 셟다 (Yale: syelW-ta), with the historically faithful Hangul spelling as the headword and the phonemically faithful spelling as the romanization. We would not be the first to do this, as some English-language works on MK, which only use Yale romanization, do exactly this (see Martin 1992, p. 57, who uses the phonemic stem syelW- to refer to this exact word). In -ᆸ다 where ㅂ represents an overt /p/, such as in 저줍다, we would obviously still use the Yale p, and such verbs/adjectives are not affected by this proposal. Hence, instead of our using both -ᇦ다 and -ᆸ다, -ᆸ다 would, as an exception, have two possible romanizations, -Wta and -pta, depending on the word, though the Hangul spelling would not reflect this. The same goes for -ᆺ다 with -zta and -sta, instead of -ᇫ다 and -ᆺ다 (e.g., ᄃᆞᆺ다 and 벗다). In all other cases, both Hangul and romanization would represent the phonemic spelling as opposed to the historically faithful spelling that we use now, as per the proposal. This compromise would follow the convention found in monolingual dictionaries while still being consistent in providing readers with at least one phonemically faithful representation throughout all MK verbs/adjectives; there is no ambiguity, and the two different phonemes are distinguished.
This seems like a simple enough solution for a problem of an otherwise commonsense change IMO. The only downside I could think of is the need for manual input for transliteration, but MK already has these cases.
For those who might not fully understand or tl;dr, here is essentially what would happen:
Current entries with "historically faithful" spelling must be moved and could be converted to non-lemma entries as an inflected form. 맞다 becomes the lemma while 맛다 is reserved for an entry for "inflected" forms (if they are ever created, though; Middle Korean 다(-ta) had a more complicated usage compared to its modern descendant, so it wouldn't be a mostly empty, redundant entry. Nonetheless, I think the entry at Middle Korean 다(-ta) will suffice). 다(-ta) would serve two functions: form part of the dictionary citation form as per the modern convention (with phonemic spelling) and as part of inflected forms (e.g., declarative mood suffix) (with historical spelling). The second case would only ever be seen in conjugation templates, quotations, or, as mentioned above, non-lemma stubs. This entails that, for example, 맞다(mac-ta), as the lemma, is the only form you would see in most parts of Wiktionary, while 맛다(mas-ta), despite being historically accurate, would only be seen in the above-mentioned places. For the ㅸ/ㅿ cases, if accepted, the romanization 셟다(syelW-ta), the lemma version, would be the one seen in most parts of Wiktionary, whereas 셟다(syelp-ta), with the same Hangul spelling and "accurate/literal/surface" transcription, would, again, only be seen in the above-mentioned places (telling the reader that it represents a real (attested or possible) form/inflection with 다(-ta), for disambiguation purposes). And, of course, for anything else, normal romanization rules apply (e.g., 셟고(syelp-kwo)); only ㅸ/ㅿ headword forms get this special treatment.
Examples of current entries whose main entry would be affected (if we adopt an entirely phonemically faithful lemmatization scheme) are:
더럽다(telepta) and ᄃᆞᆺ다(tosta) would be moved to 더러ᇦ다(teleWta) and ᄃᆞᇫ다(tozta), respectively. However, if the above-mentioned exception is applied, these would stay at their original locations.
벗다(pesta) would stay where it is, as its Hangul phonemic and historic spellings are the same; ᄌᆞᆽ다(cocta) would stay where it is, but ᄌᆞᆺ다(costa) is correct under the current convention.
여다(yeta) and 우다(wuta) would be moved to 열다(yelta) and 울다(wulta), respectively.
깃다(kista, “to rejoice”) would be moved to 기ᇧ다(kiskta), whereas 깃다(kista, “to cough”) would be moved to 깇다(kichta).
됴타(tyotha) and 나타(natha) would be moved to 둏다(tyohta) and 낳다(nahta), respectively.
Thank you for your thorough contribution. I am relieved to hear that my opinions on the matter are shared by other editors (no doubt more experienced than myself). Just one thing I would like to comment on:
In fact, 15세기 국어 활용형 사전 explicitly states this in its preface.
This seems misleading. Yes, that dictionary does indeed state in its preface that W and z stems would be listed with p and s respectively, but it also explicitly states that it is for convenience only, not indicative of an analysis of these stems as p and s stems under any circumstance ("... 이런 어간들의 기본형을 'ㅅ, ㅂ'으로 하겠다는 인식을 반영한 것은 아니고 편의상의 조치임을 밝혀 둔다."). Indeed, modern scholarly practice does not treat W and z stems as irregulars (although 표준국어대사전 does, that's just because it's ass), so we shouldn't either.
Now, if we choose to lemmatize these forms with p and s instead of W and z anyway for convenience, I do not necessarily object. I do, however, find myself wondering what convenience we gain by lemmatizing p and s if that means we have to manually specify the headword for romanization.
If we decide to lemmatize with p and s, I would also like to suggest that we use the W/z form in the Hangul headword as well, not just in its romanization. This would be aligned with how we don't include diacritics in entry titles but still show them in the headword line.
I agree. Manual transliteration for the same "spelling" displayed would not be ideal. I would support Lunabunn's idea if we do decide to lemmatize with /p/ & /s/. AG202 (talk) 03:40, 27 December 2024 (UTC)
Beekes
Bluntly, Beekes is neo-Vennemann, except for Greek, and without even an actual attested language (/family) from which to derive the substrate.
That may even be too polite. I personally have thought Beekes dubious since my first encounter with him (his grammar of Avestan, in which he identifies numerous Avestan roots without Sanskrit analogues, almost none of which are actually without obvious Sanskrit analogues). That said, I am joined by Meissner, de Decker, Vine, Verhasselt, Beckwith, Nikolaev, Woodhouse, Olson, Miller, Simkin, Colvin, Meester, Garnier, Nardelli, and countless others in my reservations about Beekes as a source in the specific matter of Greek etymology/'Pre-Greek'. Even *within* Leiden, Beekes was considered peculiarly dogmatic, even by very close colleagues (e.g. Lubotsky, Kloekhorst, etc.) - indeed, even van Beek, his prize student, has published numerous papers over the last several years, especially after Beekes' death, rejecting Beekes' particular approach to Pre-Greek. Kroonen's public critique is also worth noting.
I have numerous criticisms of Beekes' approach to Pre-Greek, and am happy to systematically go through them if anyone should wish, but I hardly need to, since the critical scholarly literature is, at this point, voluminous. That said, if anyone is curious, do ask.
I am not going to go so far as to say that Beekes should not be cited at all on matters of etymology, but his views should *always* be tagged as his, as opposed to in the voice of Wiktionary, and preferably with a modifier that makes it clear that his views do not reflect the communis opinio ('Beekes, typically, assigns...' or similar), where applicable, which is frequently the case.
Beekes's etymologies are all over Wiktionary not because we find him particularly reliable, but because his accessible dictionary is the only one in English, so it was easily copy-pastable into Wiktionary. Frisk is in German, Chantraine is in French. Others' English etymologies are sprinkled across inaccessible articles.
Regular Wiktionary editors have all made pertinent observations and more or less openly concluded with remarks encouraging liberal dismissal of Beekes’ etymologies.
It would be more frank to mark etymologies as unknown, uncertain, or otherwise speculative, while pushing Beekes’ claims off to a mere reference to him, not worthy of taking up space in serious etymology, since collectively they have to be regarded as a nuisance.
Of course, the silver bullet for anyone in the know about the particular philology is to cite one or more authors who positively provide a differing opinion. You don’t “need to”, but it is a gain for all of humanity and for your personal scholarly achievement. There is a mismatch between those who have an intimate familiarity with certain comprehensive university libraries and other historically interested people who attempt to form conceptions of the past, if only because one works on another language touching upon Greek. Our open Hellenic lexicography is seriously underdeveloped, and part of the reason is uncritical thinking, burdened by Beekes’ dogmatism and lifeless superficiality in place of inviting examples of how language science is actually done. Fay Freak (talk) 16:19, 19 December 2024 (UTC)
French Wiktionary Word of the Year
It started on November 15th with a call to suggest words, without any specific methodology in mind, such as an analysis of reading statistics or anything similar. About a dozen people suggested about 50 words. Then in December, we held a vote with 30 participants, and the result was a simple list. It wasn't perfect, but it was not that complicated to do.
To my knowledge, it hasn't been tried yet on the English Wiktionary, has it? If you want to try next year, I suggest you create an ongoing draft to keep track of new words during the year; it would make the selection easier. Also, having an in-person meeting with seven Wiktionarians in December helped a lot. Finally, I am not hoping for any echo in the press this year, but we may work to build something for 2025 and 2026, and I think we could be stronger together if several editions of the Wiktionary project organize a similar initiative in parallel. So I invite you to try it too! Cheers Noé 12:33, 19 December 2024 (UTC)
@Noé: there is currently an ongoing vote on whether to have a Word of the Year, and what that word should be. Currently, it looks like the vote will fail, as it failed last year. There doesn't seem to be enough support for the proposal at the English Wiktionary. — Sgconlaw (talk) 12:56, 19 December 2024 (UTC)
Thanks for pointing out this discussion; I missed it in the November Beer parlour, and it was not brought up again in December. It is interesting to read the various opinions on this process and its goals. I did not ask for collective validation at first; I just started it, and I realize now that it would have been nice to open a discussion first. Well, sometimes it is hard to weigh pros and cons on something completely new without being able to evaluate which words may end up in the final list. Having two weeks to collect entries suggested by anyone, followed by a simple vote for a top 5, was, I think, easier to manage than your way of doing it. I am not sure. Well, if someone wants to discuss this idea next year, in October maybe, I would be glad to help with more feedback on our experiment and the media response. Noé 13:20, 19 December 2024 (UTC)
I don't understand what kind of situation this sentence refers to: "If there are multiple paraphrases in the target language for an English term but no direct translations, one such paraphrase may be provided after {{no equivalent translation}}." Template:no equivalent translation/documentation isn't helpful either. What is "potentially unidiomatic / sum-of-parts descriptive" supposed to mean? I want to know when this template should be used and when it shouldn't.
More generally, I am often in doubt about what to do when the most direct translation (with the same part of speech) isn't actually the best translation. I know I'm not the only one with this issue, because such tricky translations are currently most often left blank. Can we clarify the relevant sections? —Caoimhin ceallach (talk) 10:08, 20 December 2024 (UTC)
Hi there, the Dobrujan Tatar language doesn't have a separate language code. Therefore is used in Wikis. But in Wiktionary there is only Crimean Tatar, and when I add a word in Dobruja Tatar it appears in Crimean Tatar categories. This is a problem, because the languages use different orthographies and are actually not as closely connected as it seems. Also, there is the Category:Dobrujan Crimean Tatar, but this naming is wrong; it's Dobrujan Tatar. Would it be possible to use the code Dobrujan Tatar, with Dobrujan Tatar categories? Zolgoyo (talk) 10:34, 20 December 2024 (UTC)
Hello! I personally do not think we need a separate name space for Dobruja Tatar specifically, for the following reasons:
1. It is a dialect of Crimean Tatar (as Ethnologue and Glottolog assume).
2. From what I have seen, Wiktionary does not show dialects in their own name space. To give examples from Turkic languages, we have:
Yenisei Kyrgyz (Old Turkic) uses different letters altogether and would be illegible to someone familiar with the Orkhon script. Orthographical differences are not a big deal for inclusion.
Viryal and Anatri Chuvash represented as just 'Chuvash' (except in etymologies)
Various dialects of Turkish and Azerbaijani, all shown with a lb tag.
Kumandy, Kuu-Kizhi and Kyzyl dialects (which can be quite divergent at times) of Northern Altai are under the same name space.
and so on. Dobrujan Tatar would best be shown with just an lb tag, like the rest.
3. There seems to be only one published dictionary for this dialect ('Dobruca Kırım Tatar Ağzı Sözlüğü'), and the vocabulary is clearly reminiscent of the main Crimean Tatar one.
4. If Wiktionary added Dobrujan Tatar, then why shouldn't it also add Nogai Tatar, spoken 10 km north of the Dobrujan Tatar speakers, with a far more divergent lexicon?
This is, of course, just my opinion, but we really don't need this name space.
We have Nogai as a distinct code, CAT:Nogai lemmas. And we do often have a separate code for varieties traditionally considered 'dialects', if this is found necessary to effectively document the variety. I can't speak for this specific case though. Thadh (talk) 01:03, 22 December 2024 (UTC)
These Dobrujan Tatar words are from a book which is probably not so good for etymology. Not a bad book, but be careful. Check out Tomriga in the references to Taner Murat. He writes "Tomri - queen of Mesagetes, also known under Persian form Tahm-Rayis, Greek Tomiris.... from her name came the name for Dobruja province, Tomriga". The guy is obviously a Turkic nationalist from a parallel world where the Massagetes speak Tatar and founded Dobruja. Tollef Salemann (talk) 01:39, 22 December 2024 (UTC)
It seems so. The dictionary there is not quite academic, I figured.
Moreover, it seems like Dobrujan Tatar is just a descendant of a (relatively) larger 'Romanian Tatar' family. This article also notes how similar the Dobrujan Tatar dialect is to Crimean Tatar, mentioning that children use Crimean Tatar primers/reading books in schools.
There's also a poem in Dobrujan Tatar; that's the extent of what I could find about this language.
Note that the Nogai Tatars you speak about are probably quite different from the Nogai lemmas listed in the category Thadh mentions. The Nogais of Dobruja are related to the Nogais of the Caucasus, but they split up in the 1850s and 1860s because of the war with Russia. I mean, they had split even before that, but kept some contact until the 1850s. So their language is probably closer to Crimean Tatar. Tollef Salemann (talk) 02:00, 22 December 2024 (UTC)
It does, to be fair, say at the top of the page that "Tests can be used as guides during RFD, but they are not hard/fast rules", but, even so, one would expect the guidelines to at least apply to the examples given. Mihia (talk) 09:47, 23 December 2024 (UTC)
The closing statement from the 2016 RFD is quite interesting. I wonder if there's more to the history of the ‘tennis player test,’ because this alone makes it pretty questionable. Seems THUB was the keep reason all along?
RFD kept as no consensus for deletion: ≥ 12 keep votes. Note that translation target was used often as the keeping rationale, while the "tennis player test" was rejected by multiple participants. Polomo47 (talk) 00:08, 23 December 2024 (UTC)
Whoops, I wasn't aware of the policy when I did that; given the policy's existence, it would need to be removed. But IMO the policy itself sounds pretty dubious: by its wording, it would also allow professions such as turtle feeder or cookie taster. I would personally ditch the policy and keep tennis player for THUB. The test's paragraph itself acknowledges its partial redundancy with THUB anyway. Catonif (talk) 08:51, 23 December 2024 (UTC)
Noting that "COALMINE" is mentioned individually in the CFI. So is the "fried egg" test, which is also part of "WT:IDIOM", implying that that one is policy too, I suppose? I haven't checked all the others. Mihia (talk) 15:36, 23 December 2024 (UTC)
I’m not that favorable toward the test either. If almost all terms that qualify for it also qualify for THUB, then all it does is prevent us from adding (This sense is a translation hub). Is that desirable? I don’t think so, since the main reason for keeping them appears to be translation. Polomo47 (talk) 15:08, 23 December 2024 (UTC)
I could be wrong, but I think the Tennis player test predates a consensus on keeping translation hubs. So it may have been a good workaround when it was first proposed, but it seems redundant now. Andrew Sheedy (talk) 16:06, 23 December 2024 (UTC)
Romance languages: reflexive verb forms and enclisis
This discussion is an offshoot from this RFM, which discusses reflexive verbs in Portuguese specifically. Said RFM in turn derives from this RFD discussion.
Currently, some Romance languages have a specific way of making entries for reflexive verbs; others do not have a pattern at all. Per @Benwing2, Spanish and Portuguese currently follow this scheme:
The page without -se lists, for Spanish, that the word is only used with a proclitic pronoun; see Spanish automedicar. For Portuguese, the page without -se usually does not exist.
If a verb has reflexive senses in addition to non-reflexive ones
Some Portuguese editors complained about this arrangement a while ago. We proposed a new scheme in an RFM (linked above), but some editors felt the need for consistency with other Romance languages. Thus, this is a proposal on changing/standardizing how it works for most other Romance languages — the use of unhyphenated enclisis (despedirse vs. despedir-se) changes things slightly. For languages that do use a hyphen in their enclises, such as Catalan, a proposal closer to the one for Portuguese is more adequate.
The proposal, for languages with unhyphenated encliticals:
If entries exist for both the forms with -se and without it, they will get merged under the page without -se. The entry at the page with -se will list infinitive of verb combined with se.
If an entry exists only at the page with -se, it will be moved to the page without -se. In its place, the page will list infinitive of verb combined with se.
A brief list of applicable reasons. For more detail, please read the Portuguese RFM and RFD discussions (which also include some inapplicable arguments).
It is inconsistent and confusing to list reflexive-only verbs at the page with -se, but list verbs with reflexive senses only at the page without -se.
Listing reflexive-only verbs at their enclitical forms implicitly prescribes the use of enclisis, but proclisis is just as valid and may even be used more often.
By having the entry under the page with no -se, we could format its headword to include both forms, like automedicar-se or se automedicar.
Among dictionaries, there is no consensus on what URL reflexive verbs get put under. The only consensus is that the headword includes the enclitical pronoun, which we can do regardless per the above.
CC: @Benwing2: For Spanish, honestly, I'd match what the RAE does: if the verb is only used pronominally/reflexively, then they put the lemma at the version with -se. Ex: RAE entry for automedicarse. I really don't like the idea of putting "se" in the headword at the lemma without "se", especially when the page with "se" already exists. That seems to add a much higher level of inconsistency.
I also don't like the idea of moving entries like automedicarse to automedicar, as a learner familiar with Spanish is going to search for the latter one only to be redirected to the former, as the verb is only used pronominally. What we have now isn't my favorite way to go about things (I'd have the reflexive usages at the entries with -se, regardless of if the non-reflexive version exists), but it's better than having everything at the bare infinitive. There's also precedent, at least with Spanish. AG202 (talk) 03:00, 23 December 2024 (UTC)
I also don't like the idea of moving entries like automedicarse to automedicar, as a learner familiar with Spanish is going to search for the latter one only to be redirected to the former. How so? My proposal is that we move the definitions over precisely to solve this type of issue.
Also, while the RAE categorizes URLs in that way, the RGL does not, and many Portuguese dictionaries don’t either. I don’t know about Italian, though. Polomo47 (talk) 03:56, 23 December 2024 (UTC)
@Polomo47: Oops, I meant search for the former and be directed to the latter, sorry. Learners are more likely to search for the forms with "se" is what I wanted to say. AG202 (talk) 06:13, 23 December 2024 (UTC)
Hm, I’m not confident that’s how people usually search for words. I would expect native speakers (even if we don’t particularly appeal to them) as well as more advanced learners to search without the enclitical. That’s what I do, at least — do others google differently? Polomo47 (talk) 15:04, 23 December 2024 (UTC)
At least for Spanish, having been studying it since 2013, I've almost always seen learners search with the "se" form once they're aware of it, as that'll give them more direct hits, especially from learning websites. In pretty much every learner's text as well, they'll be listed as the "se" form in any vocabulary section. I personally still search that way as well. For (notably Brazilian) Portuguese, I'd expect the trends to be different, since the se forms aren't used as much. AG202 (talk) 17:38, 23 December 2024 (UTC)
I actually find this very persuasive in Spanish's case. Having had some mild interactions with Spanish over the years, I can say it's very true that Spanish speakers just love their "se" forms; comparatively, "lo" forms go essentially unused by Portuguese speakers around me.
While I'm really starting to think that 'it tracks' that reflexive clitics in Spanish are seen as more integral to the verb — and not necessarily because of the spelling — I can't help but wonder about other forms.
I hope I'm not bringing this up too early when we haven't even truly talked at length about the initial proposal, but do we really need pages for all the forms? This likely enters CFI territory, but I'd like to draw some attention to the non-reflexive forms. In Spanish medicar and mostrar, there's an entire table dedicated to combined forms, and yet I see several that might be missing?
Admittedly, I don't know a lot about Spanish, but one such form would be "medícote" (corresponding to "te medico" in proclisis) or something like "mostrárlela". Perhaps Spanish's rules forbid these pairings (though I did get a hit for the latter), but Standard Galician's don't: you'll find many hits for, say, quérote and mostrarlla online. There's even a TV program named Dígocho Eu.
I guess we could include all of these combinations (every single tense of many, many verbs with nearly every single clitic tacked on afterward: me, te, che, vos, os, o, ma, mo, ta, to, cho, cha, lle, lles, nos, lla, llo, possibly a couple more), but I can't help but think it'd be a more productive use of our time to draw a line somewhere instead. I'm getting some serious COALMINE conversation flashbacks right now. MedK1 (talk) 19:13, 23 December 2024 (UTC)
It occurs to me that there are various possibilities for the way reflexives are handled, and this may have some bearing on the ultimate outcome (please expand with other languages):
Reflexives are always enclitic, and written as part of the verb. Examples: East Slavic (Russian, Ukrainian, Belarusian, ...) and North Germanic (Icelandic, Swedish, Danish, Norwegian, Faroese, ...).
Reflexives are normally proclitic, including in particular on the infinitive, and written as a separate word. Examples: German, French, apparently also Romanian. (Clarifications: German reflexives sometimes come after the finite verb, particularly when the verb is in V2 constructions and in imperatives. French reflexives come after imperatives and are joined by a hyphen, and when coming before the verb are joined with an apostrophe if the verb is vowel-initial.)
Reflexives are sometimes proclitic, sometimes enclitic. AFAIK, all such languages have the reflexive pronoun enclitic on the infinitive.
When enclitic on the infinitive, the verb + reflexive is written as a single word. Examples: Spanish, Italian, Galician in standard spelling.
When enclitic on the infinitive, the reflexive is attached to the verb with a hyphen. Examples: Portuguese, Galician in reintegrationist spelling.
When enclitic on the infinitive, the reflexive is written as a separate word. Examples: West Slavic languages (Czech, Polish, ...), South Slavic languages (Bulgarian, Macedonian, ...).
I mention this because there is a lot of inconsistency in how reflexive verbs are lemmatized, and it may partially correlate with the way the reflexive infinitive is written.
AFAIK, all such languages have the reflexive pronoun enclitic on the infinitive. Is that really how it works? In the case of Portuguese, from what I gather automedicar-se is no more valid an infinitive than se automedicar — the former is just the preferred form used by dictionaries because (1) it’s a single word (2) it’s less predictable than proclisis (3) it’s something people generally like to prescribe, lol. I’ve yet to find another explanation for the preference for enclisis, but I have no reason to believe it’s because automedicar-se is the only possibility. Polomo47 (talk) 04:06, 23 December 2024 (UTC)
Sorry, I meant to clarify that "all such languages have the reflexive pronoun enclitic on the infinitive" refers to how dictionaries express the forms. I know that Brazilian Portuguese, for example, leans towards proclisis in all cases and thus says vou me deitar, not #vou deitar-me. West Slavic languages similarly are very flexible in word order and sometimes have the reflexive pronoun before the infinitive and sometimes after, but all dictionaries I've seen lemmatize the reflexive pronoun after. In contrast, French dictionaries always list reflexive infinitives with the reflexive pronoun before, because it never comes after in actual usage. Benwing2 (talk) 05:15, 23 December 2024 (UTC)
I mentioned above that Galician has a hundred forms (bare minimum) that Spanish completely lacks coverage for at the moment (potentially because they don't exist over there? I wouldn't know); you can mix and match any tense with any clitic for the most part.
It might be worth noting that for Portuguese, these countless forms exist as well, and often with more patterns — Galician roughly shares the European Portuguese rules prioritizing enclises, while for Portuguese, we have Brazilian Portuguese's proclises preferences to consider as well.
Since for Portuguese, they're framed as 'regional preferences' rather than the rules actively changing, you get far more possibilities than you would normally, all of them being SOP — you have either a separate word before, a separated suffix or a separated infix according to tense.
With Brazil liking proclises, the lemmatized enclitical can end up being quite rare in comparison to the proclitical ones. "Precisamos parar de automedicar-nos" even sounds weird in comparison to nos automedicar to me. You can have similar sentences for -te and others too. Do note that these are all considered impersonal infinitives (i.e. the ones that get lemmatized in Wiktionary).
For these and many, many other reasons, it stands to reason that one shouldn't include any of those clitic forms as separate pages, for Portuguese at least. This doesn't necessarily mean anything for Spanish; more and more, I'm thinking their systems are different beasts altogether and as such should be treated differently.
Please lift the ban on including Yiddish terms attested in Latin characters. I know that writing it with other scripts is uncommon (except to assist beginners), but there are a few lengthy Yiddish works written mostly or entirely in the Latin script. Examples:
Probably even more. And in Cyrillic as well, I guess? Also, I remember owning "Di Avantures fun Alis in Vunderland" myself, which has both Hebrew and Latin script in the same book. Tollef Salemann (talk) 02:20, 23 December 2024 (UTC)
The Brill article confirms that Cyrillic is another script, yes, but it is the rarest of the three (and the only other script in which Yiddish is attested, as far as I'm aware). I welcome lifting the prohibition on that as well. Anyway, cheers for suggesting another source. (((Romanophile))) ♞ (contributions) 04:06, 23 December 2024 (UTC)
Does Wiktionary's transliteration of Yiddish terms match the spelling used in these books? I tried searching for some random words from the "Di Avantures fun Alis in Vunderland" preview sample and successfully found the relevant Yiddish entries on Wiktionary. Is this not good enough for the end users? --Ssvb (talk) 05:16, 23 December 2024 (UTC)
Usually they do match, though historically Romanizations of Yiddish have varied in form and consistency. In any case, utility is not the motive here. We already have Romanization entries for Chinese, Japanese, and Serbo-Croatian. I doubt that a proposal to delete them would succeed on grounds that there are already transliterations in the main entries, thereby making the Romanization entries 'redundant'. (((Romanophile))) ♞ (contributions) 06:31, 23 December 2024 (UTC)
I wouldn't advocate deleting them. Just creating additional Latin script entries and keeping them in sync with the Hebrew script entries is an extra maintenance effort. If contributors are ready to spend their time and efforts on that, then it's fine. If the attestable Latin spelling of some terms encountered in real books differs from the transliteration of their corresponding Hebrew script entries, then these can probably be prioritized. --Ssvb (talk) 09:11, 23 December 2024 (UTC)
Yes. This has been requested at least twice in the past year, once by me at Wiktionary:Beer_parlour/2024/April#Latin-script_Yiddish and once after that by someone else somewhere else... but although there seems to be support for at least allowing Latin-script entries to point to the Hebrew-script entries, as is done for Arabic-script Afrikaans (pointing to Latin-script Afrikaans) (or, in a different vein, for Latin-script Gothic), neither I nor anyone else has gotten around to it yet. Well: unless there are objections, I will finally add "Latn" as another script to yi in, say, a week (ping me if I forget), with the understanding that Hebrew script will continue to be lemmatized at least in most cases. - -sche (discuss) 06:28, 23 December 2024 (UTC)
I personally favor a treatment like Serbo-Croatian where both scripts are lemmatic, but I won't feel devastated if we treat the Latin script as secondary to the Hebrew one either. You may want to include the Cyrillic script as another option, too, though I don't have examples on hand. (Yiddish's cousin Ladino is more my field of expertise. Or should I say Spanish Yiddish?) (((Romanophile))) ♞ (contributions) 06:42, 23 December 2024 (UTC)
My understanding is that this is the intention, yes; in the April discussion, Benwing proposed using {{spelling of}}, which would look like this. @Romanophile, if at some point in the future we have the ability to lemmatize two different scripts/spellings without them falling out of sync (e.g. via them both "transcluding", with "smart" changes, some underlying central backend page), I would support "double-lemmatizing" a great many things, but for now it would just lead to duplication. - -sche (discuss) 16:52, 26 December 2024 (UTC)
If it's just a stripped-down soft-redirect entry, then the required maintenance effort is low. BTW, does it need a declension table? And what would be the right place for book quotations in Latin script? I'm interested in this topic because many of the same guidelines would probably also apply to Belarusian Łacinka, like the horny entry. --Ssvb (talk) 17:33, 26 December 2024 (UTC)
As Yiddish was contemporary with Early New High German, there was bound to be Yiddish text in blackletter, and certain Germanists on the continent regularly deal with these Early Modern equivalents, but from the perspective of Anglos it is a suppressed blind spot: fractura est, non legitur. We have to cover Yiddish in Latin script just as we include Hebrew spellings of the Arabic language as Judeo-Arabic. The current Hebrew-script standard is just a later Ausbausprache like Luxembourgish, but unlike Luxembourgish, which is within the ballpark of another broader dialect (Category:Central Franconian language), Yiddish, due to ethnic and cultural separation, always was a distinct dialect, though its Middle High German beginnings are difficult to survey, of course. So I don’t see that it was ever banned; it is only a skewed perspective. More parsimoniously, one may observe an oversight in the language data, which until now lists only Hebrew script for Yiddish, which is factually wrong. A few times I have also added Serbo-Croatian terms in Arabic script, only to be annoying, without any preference for it and without believing it to be prohibited, only that rendering is faster if we only check Latin and Cyrillic script. Fay Freak (talk) 17:16, 26 December 2024 (UTC)
Dutch defective verbs
(Notifying Mnemosientje, Lingo Bingo Dingo, Azertus, Alexis Jazz, DrJos): I am working on an update of the Dutch verb conjugation module, and in that I came across the issue of how to handle defective verbs. These are verbs that act like they have a separable part, but are (generally) not actually separable.
I usually use woordenlijst.org for checking Dutch conjugation, and it seems to distinguish two types of defective verbs. The first is verbs like herinvoeren, for which the subordinate clause forms are given but the main clause forms are omitted. The second is verbs like zakkenrollen, for which only the infinitive and present participle are given. However, searching online, it seems that in actual usage the second type is used exactly like the first type (i.e., forms like zakkenrolt and zakkenrolde are attestable). I added the option to specify these types of verbs through a parameter |subonly= (see the bottom of the page at User:Stujul/test-nl-conj).
My main question is about how to categorise these verbs. Currently there are two categories for these verbs: Cat:Dutch defective verbs and Cat:Dutch uninflected verbs. The first is added manually and the second is added by a parameter in the headword template {{nl-verb}}. These should definitely be merged. But should the two types of defective verb I mentioned be categorised separately as different subcategories, because the forms of the second one are nonstandard?
I hope to hear your opinions on this.
PS - sorry if this is not the appropriate place for this discussion.
If forms of zakkenrollen are missing, might it be the woordenlijst that is defective? In the conjugation table on the Dutch Wiktionary all seem to be present, although the subjunctive currently seems unattestable. Here, for example, is a use of gezakkenrold, and here of finite zakkenrollen in a main clause. Is it not just like stofzuigen (not only semantically, but also grammatically)? --Lambiam 21:07, 23 December 2024 (UTC)
Maybe zakkenrollen was a bad example. It does indeed seem to be used more like stofzuigen. This may have to do with the fact that rollen is a weak verb. For example, geboogschiet and gelipleest return far fewer results than booggeschoten and lipgelezen, respectively. About the Dutch Wiktionary's approach: I found a list of such verbs, and most are listed as fully defective there. liplezen gives the main clause forms in parentheses, and its main entry notes that these forms appear sporadically. I also note that some verbs that you might expect to fall into this category are actually given as complete verbs on woordenlijst.org, e.g. hartenjagen.
It may just come down to a case to case analysis, but it would be nice to have a standard approach when dealing with such verbs, as we are currently very inconsistent with it.
Gelipleest is orthographically wrong anyway; /ɣəˈlɪp.leːst/ should be written as gelipleesd. But liplezen is one of the entries on this list of defective verbs.
We are not prescriptive; shouldn’t three properly attestable uses of forms like gelipleesd or lipgelezen trump any lists and suffice for including these forms (with a note warning that they are not generally accepted)? Here are two uses “in the wild” of lipleesde. --Lambiam 11:26, 24 December 2024 (UTC)
Sure, we are not prescriptive, and three attestable uses do merit an entry for these forms, I don't disagree with that. But I'm not sure whether we should include these forms in the conjugation table on the lemma entry. You can find many "in the wild" uses of "ik leesde", but we don't include that form in the table at lezen. Of course, in that case, there is a clear "correct" and "incorrect" form, while for liplezen, there isn't a "correct"/"standard" form we can point to (should it be lipleesde, liplas, las lip,...).
The Dutch Wiktionary is again inconsistent in this regard: indeed heruitbrengen is conjugated as a normal separable verb, herinvoeren gives an alternative construction "ik voer opnieuw in", and heruitzenden just leaves the main clause forms empty.
Both these forms that you gave also feel wrong to me.
I'm amazed that I was completely unaware that these kinds of verbs existed. Thinking about it, I would indeed categorise them as defective, as the woordenlijst does. If you put a gun to my head I might indeed say "ik zakkenrol" or "ik herindeel", like other speakers, but they still don't feel quite right. My intuition is that these forms, which can be sporadically attested, are ad hoc formations. Some standard strategy for dealing with these may crystallize in the language at some point, but the fact that everyone feels unsure about them shows that it hasn't yet. —Caoimhin ceallach (talk) 18:02, 26 December 2024 (UTC)
I'm probably not the first person to ask this, and I likely won't be the last: but what is the reason for Wiktionary to use conventional Israeli romanization (i.e. based on colloquial Israeli Jewish pronunciation) over something narrower and more scholarly like ISO 259? Narrower transliterations have a lot of bells and whistles, sure, but I think they still do a good job of being a compromise between various historical, regional and cultural variants of Hebrew. Why should non-geminated ⟨צ⟩ be written as "ts" when that's not how Yemenite or Sephardic Jews pronounce it? Why should ⟨ח⟩ and non-geminated ⟨כ⟩ both be rendered as ⟨kh⟩ when this merger pretty much only happens in Israeli Hebrew, while every other dialect still distinguishes the two? Why should ⟨א⟩ and ⟨ע⟩ not be rendered at all when, even inside Israel, some Jews do pronounce them? Even if Israeli Hebrew is the de facto standard dialect these days, the common transliteration isn't even the de jure standard; that would be the Hebrew Academy's, which is slightly different. I understand Hebrew is a living language, but if you're like me, a non-Jewish non-Israeli who has a mostly academic, historical-linguistic interest in Hebrew, the modern Israeli transliteration is just not very useful. Sure, it's more "phonetically accurate" (as discussed, for a single dialect anyway), but isn't that what the IPA section is for?
Obviously we'd have to agree on the details of the transliteration, and I have my opinions on the specifics, but overall, I think a narrower transliteration would make much more sense. It would also likely allow us to begin some sort of automatic transliteration template like the ones languages such as Russian, Arabic and Greek have got going on. Pescavelho (talk) 15:55, 27 December 2024 (UTC)
No good reason, sure, only catering to cognitive biases of majorities. The thought of continuing to use your English keyboard without any acquired extra characters is just too appealing.
In recent months, I have increasingly succeeded in seeing through the grievances of the world as the consequences of neurotypicals splitting up the world, as they always imagine it, into social relations: what is relevant in the present context (see it again!) is, for this reason, that they fail to imagine capable keyboard layouts or input methods, and instead configure six different keyboard layouts if they know French, Spanish, Romanian, Turkish and German, for instance, in addition to English, rather than using the international version of any of these layouts, or a Unicode search made accessible on their machine, for the very occasional but recurring goal of transcribing certain foreign phonemes faithfully.
Engaging the habit-learning circuitry of the brain to switch to a more convenient, even if less intuitive (according to neurotypical cognitive biases), input setup would be easy, though: it is merely excusable, not defensible, not to switch to us(intl) or de(deadtilde) from us(basic) (in /usr/share/X11/xkb/symbols/), and many neurotypicals editing this dictionary or similar academic works have already succumbed to this, which is reasonable. I also use the actual Russian layout, with extensions, ru(prxn), for all Cyrillic languages, although my neurotypical bro is ticked off by it because its assignments do not phonetically correspond to the ones on the standard German layout. All of these were invented by someone around 1900 and carried forward since, with few ever questioning them; the social pressure to type the same layout with “ten fingers” is too high.
One just has to look up which combination can be utilized to get bonus characters, and repeat until one does not need to expend notable brainpower for it. Juggling multiple languages to maintain polyglotism is a context where one needs bonus characters, like it or not (everyone shall like it, following the adapt neuroscientific recipe). Fay Freak (talk) 16:33, 27 December 2024 (UTC)
Is the point here that it's "too cumbersome to type"? That feels subjective; some people would find setting up all the templates an average Wiktionary page uses rather cumbersome (I've certainly felt so at times). In any case, I'm hoping the adoption of a narrower transliteration would go hand in hand with automated transliteration, so this concern would be null and void. Pescavelho (talk) 21:38, 27 December 2024 (UTC)
Adjective definitions
E.g.:
Whose first and last vertices are different.
That ends in a vowel.
My feeling is that adjectival definitions in this style seem old-fashioned or cryptic, and are potentially difficult for modern readers to understand. I would change them where I see them to, e.g., "Ending in a vowel", but does anyone else have an opinion? Mihia (talk) 20:53, 27 December 2024 (UTC)