Wiktionary:Beer parlour/2015/March

Templatizing topical categories in the mainspace

FYI: Wiktionary:Votes/2015-03/Templatizing topical categories in the mainspace.

Let us postpone the vote as much as discussion needs.

This thread seems related: Wiktionary:Beer_parlour/2015/February#Simplification of topic categories adding. --Dan Polansky (talk) 21:32, 1 March 2015 (UTC)

How is this even close to being ready for a vote?

m.Wiktionary.org: (all) Edit pages

Hi, this message is to let you know that, on domains like en.m.wikipedia.org, unregistered users cannot edit. At the Wikimedia Forum, where global configuration changes are normally discussed, a few dozen users propose to restore normal editing permissions on all mobile sites. Please read and comment!

Thanks and sorry for writing in English, Nemo 22:32, 1 March 2015 (UTC)

Thanks for the news. We forgive you for speaking in English. --Type56op9 (talk) 14:44, 5 March 2015 (UTC)

Sports logos in images

Happened to notice both woman and American have sponsorship logos clearly visible in the image thumbnails. If we need to illustrate these concepts, can we find images which aren't as corporatish? Pengo (talk) 07:16, 2 March 2015 (UTC)

We should also extirpate all national flags, political slogans, references to NGOs, religions, etc. not essential to the ostensive definitions the images provide. DCDuring TALK 12:32, 2 March 2015 (UTC)

I'll grant that getting rid of a corporate logo for a generic concept like "woman" is probably a good idea, but an American flag behind an American on the entry for "American" doesn't seem like a problem to me. In this case, the image contains the word "Toyota", which is the problem, not American symbols. —Justin (koavf)❤T☮C☺M☯ 14:12, 2 March 2015 (UTC)

I agree that the American flag in ] is OK. I've switched the entry's image to one which is similar in every way except that it lacks the Toyota logo. - -sche (discuss) 17:34, 2 March 2015 (UTC)

`{{l-self}}`

The documentation for {{l-self}} claims it does not support tr=, but a simple test reveals this is not the case. The question is then: should it? ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 14:29, 3 March 2015 (UTC)

In principle there's no reason why it couldn't. —CodeCat 19:55, 4 March 2015 (UTC)

But are there any languages that use transliteration within inflection tables? ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 20:06, 4 March 2015 (UTC)

Yes. And this template isn't used only in inflection tables. It's used for any template that includes links to the same language. And the underlying logic which omits links to the current page is also used by {{head}} for the inflections: {{head|en|noun|plural|fish}} on fish will not generate a link for the form. —CodeCat 20:19, 4 March 2015 (UTC)
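The self-link behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the actual code behind {{l-self}} or {{head}}; the function name `render_link`, the bold rendering of the unlinked form, and the way the transliteration is appended are all assumptions made for the example.

```python
from typing import Optional


def render_link(target: str, current_page: str, tr: Optional[str] = None) -> str:
    """Return wikitext for a term, omitting the link when it would point
    at the page it appears on (hypothetical helper, not the real module)."""
    if target == current_page:
        # Assumption: an unlinked self-reference is shown in bold.
        text = f"'''{target}'''"
    else:
        text = f"[[{target}]]"
    if tr:
        # Assumption: a transliteration, if given, follows in parentheses.
        text += f" ({tr})"
    return text
```

With this sketch, `render_link("fish", "fish")` produces no link, while `render_link("fishes", "fish")` does.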

Parameter for Template:head to indicate that a form is missing

Several templates across a variety of languages have custom-written code to show a message like "missing" or "please provide" if one of the forms in the headword line is lacking. For missing genders, we already have a standard approach that {{head}} understands, which is to use "?" as the gender. I'd like to do the same for headword-line forms, so that the following will automatically generate a message and categorise the entry appropriately: {{head|en|noun|plural|?}}. Of course, templates written to use Module:headword or {{head}} can then use this themselves.

Of course, the downside is that you can't link to the entry ? in the headword line anymore, which is probably not normally going to be a problem, but there may be a few edge cases where it turns up. So an alternative way would be to include an extra parameter to indicate that a request should be included in case of a missing form. Something like this: {{head|en|noun|plural|f1request=1}} or perhaps the shorter {{head|en|noun|plural|f1req=1}}. This would then fit into the same fN... format that many of {{head}}'s parameters already use.
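The difference between the two proposals can be sketched as below. This is a minimal Python illustration, not the logic of Module:headword; the function `resolve_forms`, its return shape, and the request wording are hypothetical, and only the "?" convention and the `f1request`/`f1req` parameter names come from the proposal itself.

```python
from typing import Dict, List, Optional, Tuple

REQUEST_TEXT = "please provide"  # wording borrowed from the discussion; an assumption


def resolve_forms(
    forms: List[Tuple[str, Optional[str]]],
    params: Optional[Dict[str, str]] = None,
    question_mark_means_missing: bool = True,
) -> Tuple[List[str], bool]:
    """Return the displayed form parts and whether any form was requested.

    Approach 1: a form given as "?" counts as missing.
    Approach 2: fNrequest=1 (or fNreq=1) marks form N as missing.
    """
    params = params or {}
    parts: List[str] = []
    any_missing = False
    for index, (label, form) in enumerate(forms, start=1):
        missing_by_mark = question_mark_means_missing and form == "?"
        missing_by_param = (
            params.get(f"f{index}request") == "1" or params.get(f"f{index}req") == "1"
        )
        if missing_by_mark or (missing_by_param and not form):
            parts.append(f"{label} {REQUEST_TEXT}")
            any_missing = True  # the caller could add a request category here
        elif form:
            parts.append(f"{label} [[{form}]]")
    return parts, any_missing
```

Under the first approach the entry "?" can no longer be linked as a form; switching `question_mark_means_missing` off shows how the second approach keeps that possible.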

I don't expect there will be much opposition to this, but I'd like to ask anyway just in case. If you have a preference for one of the two proposed approaches, please indicate this. —CodeCat 19:54, 4 March 2015 (UTC)

The first one looks much better. Is there (or will there be) any edge case to begin with? I don't think there would be any. --Z 08:13, 18 March 2015 (UTC)

Min Nan loanwords

How should Min Nan loanwords from Japanese be written when they don't have any kanji/Chinese characters? Min Nan is usually written in Chinese characters or in POJ. Should they be written in Pe̍h-ōe-jī, or should they be written in hiragana/katakana? For example, the Taiwanese Min Nan word for ice cream is "ai55 sirh3 khu33 lin51 mu11" according to 臺灣閩南語常用詞辭典. Currently, I have written it as アイスクリーム (ai55 sirh3 khu33 lin51 mu11) in the translation box under ice cream. The problem with loanwords is that they don't follow tone sandhi and may not even have one of the 7 tones of Min Nan, which is problematic for POJ. Any ideas for this situation? Justinrleung (talk) 03:56, 5 March 2015 (UTC)

Min Nan terms should be written as they would be by Min Nan speakers. Unless Min Nan speakers use katakana to write the terms, we shouldn't. If we know the Japanese terms that are borrowed, those should be linked to in the etymologies for the Min Nan entries, but not be used in the names of those entries. Beyond that, I would refrain from meddling with a language I don't know. Chuck Entz (talk) 04:16, 5 March 2015 (UTC)

We should probably use the attestability for translations, just like for entries. I doubt "アイスクリーム" (Japanese for ice cream) can be attested to be Min Nan or any Chinese topolect, besides, it's a borrowing (ultimately) from English, so "ai55 sirh3 khu33 lin51 mu11" is a Min Nan pronunciation of "ice cream". Min Nan (Hokkien) is mostly a spoken dialect. If a written form is missing, then it shouldn't be added. As an example, Armenians use a lot of Russian words in speech but those terms lack a written form (ask User:Vahagn_Petrosyan). There are many other cases with diglossia or when a language/dialect lacks a well-developed written tradition.

The other issue is non-standard transliteration, as in tempura, see Min Nan translations 天麩羅 (thian35 pu55 lah3). As Justinrleung explained, it's not a standard tone sandhi but the source is only one online dictionary. --Anatoli T. (обсудить/вклад) 04:24, 5 March 2015 (UTC)

@Anatoli, the source is clearly English, but I'm curious if the Min Nan term might not have arrived via Japanese? ‑‑ Eiríkr Útlendi │ Tala við mig 16:49, 9 March 2015 (UTC)

Are there any Min Nan speakers who can give any suggestions to this problem? Justinrleung (talk) 20:14, 7 March 2015 (UTC)

We currently have no native Min Nan speakers. The term may be derived from Japanese but katakana is not used to write Min Nan. It would be hard to attest both the Japanese spelling "アイスクリーム" and the "ai55 sirh3 khu33 lin51 mu11" since Min Nan, as I said, is mostly a spoken dialect. If it's written down, it's written in Chinese characters or Pe̍h-ōe-jī. The source above doesn't suggest the term is written in katakana in Min Nan. Here's what the dictionary says with English translations in brackets:

詞目 ai55 sirh3 khu33 lin51 mu11 (dictionary item)
日語假名 アイスクリ－ム (Japanese kana)
日語羅馬拼音 aisukuriimu (Japanese rōmaji)
釋義 冰淇淋 (附錄－外來詞表) (meaning "ice cream" (appendix - table of loanwords))

From "ai55 sirh3 khu33 lin51 mu11" one can't really say that it's definitely from Japanese, not from English. I have recently added all translations of "ice cream" into Min Nan I could find in dictionaries and made アイスクリーム to be verified. Eventually, it should be deleted, since it's not verifiable as a Min Nan term. --Anatoli T. (обсудить/вклад) 21:57, 9 March 2015 (UTC)

Sorry for any confusion -- I wasn't making a case for アイスクリーム#Min_Nan. I agree with you that katakana, AFAIK, are only used to write Japanese. Instead, I just intended to ask if the etymology of the Min Nan term was EN > NAN, or EN > JA > NAN. ‑‑ Eiríkr Útlendi │ Tala við mig 23:47, 9 March 2015 (UTC)

I understood your question. It may be of Japanese origin, if it's a word in Min Nan. According to the dictionary it is. What I meant is that non-standard romanisation "ai55 sirh3 khu33 lin51 mu11" doesn't really indicate that it may be Japanese (except for "khu33"), it's very similar to how Mandarin words are transliterated using Chinese characters and phonology, note that 斯 (sī) and 姆 (mǔ) are some of the Chinese characters used in romanising loanwords with non-syllabic "s" and "m". Yes, Japanese words are or were well known in Taiwan and there are loanwords in colloquial Taiwanese Mandarin and Min Nan but this particular word may only have been used colloquially and may never have had a written form. Most words have Chinese character spellings or at least POJ. --Anatoli T. (обсудить/вклад) 00:04, 10 March 2015 (UTC)

Inspire Campaign: Improving diversity, improving content

This March, we’re organizing an Inspire Campaign to encourage and support new ideas for improving gender diversity on Wikimedia projects. Less than 20% of Wikimedia contributors are women, and many important topics are still missing in our content. We invite all Wikimedians to participate. If you have an idea that could help address this problem, please get involved today! The campaign runs until March 31.

All proposals are welcome - research projects, technical solutions, community organizing and outreach initiatives, or something completely new! Funding is available from the Wikimedia Foundation for projects that need financial support. Constructive, positive feedback on ideas is appreciated, and collaboration is encouraged - your skills and experience may help bring someone else’s project to life. Join us at the Inspire Campaign and help this project better represent the world’s knowledge! MediaWiki message delivery (talk) 19:22, 5 March 2015 (UTC)

What 20%? We don't have women on Wiktionary. Hos are not good at lexicography. --Vahag (talk) 20:31, 5 March 2015 (UTC)

Not many but we do have them. What about active ones like Hekaheka, CodeCat, Panda10, Fumiko Take (not 100% sure about the gender of others)? --Anatoli T. (обсудить/вклад) 22:18, 5 March 2015 (UTC)

@Vahag, despite your generalization I'll assume good faith* and direct you to read ho. Modern and Old Armenian, Russian, German, and English aren't enough to familiarize— well, even some native speakers of American English— with just how
b£00d¥ ɟ∪ɔkᵻɳɢ INSULTING that word is. It has no place whatsoever in any Wikimedia project except to be discussed, never used. --Thnidu (talk) 00:43, 6 March 2015 (UTC)

* Whoops! I just fixed this link. --Thnidu (talk) 04:41, 7 March 2015 (UTC)

I think good faith can only be assumed in combination with an assumption of mind-boggling ignorance and/or stupidity. Either way: not acceptable. --Catsidhe (verba, facta) 00:51, 6 March 2015 (UTC)

We are of one mind, Catsidhe. Is my anger not plain to see? --Thnidu (talk) 05:35, 6 March 2015 (UTC)

Not good at all. I think Vahag was just being silly. I didn't get what "hos" meant at first. --Anatoli T. (обсудить/вклад) 05:53, 6 March 2015 (UTC)

@Anatoli T. (I've switched our four-colon replies to maintain chrono order.) "Being silly" does not stretch that far. (... Боже мой, I envy your polyglottism!) Perhaps one has to live in the US or be in very close touch with its cultures to appreciate that word. Calling that "being silly" is like excusing groping a stranger's crotch as "just like a tap on the shoulder". Uh-uh. And look at the sexist remark the word is embedded in. --Thnidu (talk) 06:15, 6 March 2015 (UTC)

I've known Vahag for a long time, not personally though. He trolls from time to time and gets into trouble for that but he is not really a racist, sexist, homophobe and anti-Semite as he sometimes pretends to be with his silly jokes and comments. I think he just wants attention or to create a stir. Not sure. Re: polyglottism - thanks for the praise but I am not as good with languages as you may think but I spend a lot of time on them. --Anatoli T. (обсудить/вклад) 06:35, 6 March 2015 (UTC)

North American English vs Canadian and American English

Some entries are labelled {{lb|en|North America}} and some are labelled {{lb|en|US|Canada}}, and these are categorized differently. This seems unhelpful — users have to check two categories to find all Canadian (or American) entries. Should we (a) make {{lb|en|North America}} an alias of {{lb|en|US|Canada}}, or (b) try to periodically change instances of {{lb|en|US|Canada}} to {{lb|en|North America}}?
The first option is obviously more practical, as the second would require the sort of vigilance and recurring effort that we don't always manage to muster. One might say that it's useful to have a category for words common to both the US and Canada, but the same could be said of "ambitransitive" verbs, yet we've made that label an alias of "transitive, intransitive".
- -sche (discuss) 00:00, 6 March 2015 (UTC)
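Option (a) amounts to expanding one label into several before categorisation. The following Python sketch illustrates the idea only; the table names, the function names, and the category names are assumptions for the example and are not taken from the actual label data.

```python
from typing import Dict, List

# Hypothetical alias table: one label expands to several underlying labels.
LABEL_ALIASES: Dict[str, List[str]] = {"North America": ["Canada", "US"]}

# Hypothetical mapping from label to category name.
LABEL_CATEGORIES: Dict[str, str] = {
    "US": "American English",
    "Canada": "Canadian English",
}


def expand_labels(labels: List[str]) -> List[str]:
    """Replace aliased labels by their targets, dropping duplicates."""
    expanded: List[str] = []
    for label in labels:
        for resolved in LABEL_ALIASES.get(label, [label]):
            if resolved not in expanded:
                expanded.append(resolved)
    return expanded


def categories_for(labels: List[str]) -> List[str]:
    """Return the categories an entry would be placed in for these labels."""
    return [LABEL_CATEGORIES[l] for l in expand_labels(labels) if l in LABEL_CATEGORIES]
```

With the alias in place, an entry labelled "North America" and one labelled "US" plus "Canada" end up in the same two categories, so users need to check only one category per country.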

I like having "North America" be an alias for the separate categories. It would be useful to periodically review definitions that were in {{lb|en|US}} and not {{lb|en|Canada}} and vice versa, but, as we have no practice of marking items as having been passed such a review, it seems to mean a lot of repeated coverage of the same issue. DCDuring TALK 03:19, 6 March 2015 (UTC)

OK, I've made the "North American" label an alias for "Canada, US". Wiktionary:Todo/North American is a list of entries which are labelled as either Canadian or American but not both. We could go through the list, removing entries as we checked them. Once all the entries were removed, we could restore the list to its original state, periodically compile new versions of the list, and compare them to that version to find out which entries were new and thus needed checking. That would hopefully avoid too much re-examination of the same entries. - -sche (discuss) 05:13, 7 March 2015 (UTC)
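The list-keeping procedure described here is essentially two set operations, which can be sketched as follows. The function names and the sample entry titles in the usage note are made up for illustration; how the lists are actually compiled is not specified in the discussion.

```python
from typing import Iterable, List


def labelled_one_side_only(us_entries: Iterable[str], canada_entries: Iterable[str]) -> List[str]:
    """Entries labelled as American or Canadian, but not both."""
    return sorted(set(us_entries) ^ set(canada_entries))


def new_since_baseline(current_list: Iterable[str], baseline_list: Iterable[str]) -> List[str]:
    """Entries in a freshly compiled list that were absent from the
    original (already reviewed) version, and so still need checking."""
    return sorted(set(current_list) - set(baseline_list))
```

For example, with invented titles, `labelled_one_side_only(["a", "b"], ["b", "c"])` gives the entries to review, and comparing a later compilation against that baseline with `new_since_baseline` picks out only the additions.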

anchors for links from other Wikimedia projects

On occult, I've added a null-length HTML span with ID to the medical sense of the adjective, as a target for a link from Wikipedia:Occult (disambiguation)#medicine, there being no single appropriate WP page; see the Talk page there.
I've done similarly on several other definitions here before, generally noting the reason for the anchor. But this time it occurs to me to ask if there's any problem with my doing this.

Please message me to reply. --Thnidu (talk) 00:09, 6 March 2015 (UTC)

Ungoliant MMDCCLXIV has helpfully answered me on my talk page:

Nothing wrong with it, just use the template {{senseid}} instead of adding the html code manually.

--Thnidu (talk) 02:28, 6 March 2015 (UTC)

We generally discourage HTML, especially in the principal namespace. In this case {{senseid}} is available and could be useful as a target for in-Wiktionary linking too. DCDuring TALK

Thanks, DCDuring. I'll try to go back over my contribs and templatize any HTML anchors. --Thnidu (talk) 05:41, 6 March 2015 (UTC)

Etymology: root or stem?

How should the words root and stem be used in an etymology? Are they interchangeable? E.g. "From a Proto-Ugric root *xyz-" or "from an imitative root with -asb suffix"? Google search returns more hits for "imitative root" than for "imitative stem" and 9 hits for "Proto-Ugric stem" (mostly from our Wiktionary), 7 hits for "Proto-Ugric root". It would be helpful to have a list of recommended usage. --Panda10 (talk) 18:23, 7 March 2015 (UTC)

Looking at the Lexicon of Linguistics and other references at “root” and “stem” in OneLook Dictionary Search, they probably should not be used interchangeably in a dictionary with our pretensions to technical precision. As I understand it, a stem is the invariant, common part of a set of inflected forms of a word. I think it should only be used within a given language. I think root can be used to refer to something more basic than a stem within a language as well as in comparisons across languages (I'm hand-waving here). DCDuring TALK 18:58, 7 March 2015 (UTC)

I don't know about Proto-Ugric, but there is a clear distinction between root and stem in Proto-Indo-European. The root is the most basic lexical part, which has a canonical shape (one or two consonants followed by a vowel followed optionally by a sonorant consonant followed optionally by an obstruent consonant). A stem is in many cases a root (appearing in one of its "grades", full grade, o-grade, or zero-grade) followed by a suffix; the stem is what the endings are added to. A single root may form multiple stems, especially in verbs, which may have a present stem, perfect stem, aorist stem, etc., all formed from the same root but using different "grades" and different suffixes (or no suffix at all—some stems are identical to the roots they're formed from) and maybe other modifications like reduplication. See for example *gʷem-, a root, which forms the present stem *gʷm̥sḱé-, the aorist stem *gʷém- (which happens to be identical to the root in this case), and the perfect stem *gʷegʷóm-. —Aɴɢʀ (talk) 19:31, 7 March 2015 (UTC)

The Uralic languages (to which Ugric belongs) also have a distinction between roots and stems. There are two basic root types: (C)VCV and (C)VCCV, where the second vowel must be a, ä or e (i is also equivalent to e in non-initial syllables). So anything that does not ultimately have this structure is not a root in Uralic. The difference with PIE is that roots can be (and often are) words on their own, so we don't put a hyphen after them. If the root is a verb, we do add a hyphen. As for Ugric, I would be very cautious making reconstructions for it as there isn't actually agreement on whether Ugric even exists as a linguistic group with a definite ancestor (other than Proto-Uralic). User:Tropylium can tell you more. —CodeCat 00:32, 8 March 2015 (UTC)
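The shape constraint described above, (C)VCV or (C)VCCV with a restricted second vowel, can be checked mechanically. The Python sketch below is only a rough illustration: the vowel inventory is a simplified assumption, every other letter is treated as a single consonant, and the function names are hypothetical.

```python
# Simplified assumption: these letters count as vowels, everything else
# alphabetic counts as one consonant.
VOWELS = "aäeioöuüï"
# Second-syllable vowels allowed by the description (i treated like e).
SECOND_VOWELS = "aäei"
ROOT_SHAPES = {"VCV", "CVCV", "VCCV", "CVCCV"}


def cv_shape(word: str) -> str:
    """Map each letter to C or V."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)


def has_root_shape(form: str) -> bool:
    """True if a reconstructed form fits (C)VCV or (C)VCCV with an
    allowed second vowel. A leading * and a trailing hyphen are ignored."""
    word = form.lstrip("*").rstrip("-")
    if not word or not word.isalpha():
        return False
    return cv_shape(word) in ROOT_SHAPES and word[-1] in SECOND_VOWELS
```

A form such as "*kala" passes this check, whereas anything with an extra final consonant or a different second vowel does not, which would mark it as something other than a bare root under the description above.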

The technical definition is indeed as Angr says: a root is an inanalyzable content morpheme, a stem is a root plus any possible (productive or fossilized) derivational suffixes. Some definitions may include epenthetic vowels or other morphophonological alternations as a part of a stem, but not as a part of a root; e.g. it would be possible to say that Hungarian hal (“fish”) has the root √hal, but in some inflected forms the stem hala-.

(The a/ä/e thing is probably not a useful criterion for Ugric, since original unstressed vowels are not distinguished in Hungarian.)

Within etymology, I'd suggest not calling proto-language items "stems", unless one is talking about proto-language morphology specifically. --Tropylium (talk) 01:11, 8 March 2015 (UTC)

Thank you all for the helpful information. I have already started removing the words root and stem, using simply "From Proto-Ugric *xyz-" or "From Proto-Finno-Ugric *xyz". For the proto-language items, I am using two reliable references: one is Uralonet, an online Uralic etymological database of the Research Institute for Linguistics, Hungarian Academy of Sciences (take a look at kerül and its Uralonet entry); the other is a printed etymological dictionary. The challenge is to provide an accurate translation of the Hungarian text. --Panda10 (talk) 14:15, 8 March 2015 (UTC)

Women honoured in scientific names / Inspire Campaign

Estimates of the percentage of Wikipedia editors who are female range from 9% to 23%. (source) I imagine the stats on Wiktionary are similar. WMF are searching for ways to address the gender gap with their Inspire Campaign. I have little idea how to address that issue in any really useful way.

But if anyone's interested in making entries for women naturalists/biologists, etc who have been honoured in scientific names, like, for example, ], I can put together a candidate list of potentially eponymous specific epithets (e.g. the most common epithets ending in -ae which have no other declensions). Then it will be a matter of picking out the names of humans from the list (which will also include places and parasite hosts) and making entries for them. Perhaps some notable scientists who are missing Wikipedia entries could be uncovered, and so feed into efforts of Wikipedians looking for such entries to create. I might try making a test list, and if anyone's interested in adding their name to a proposal, I might write up something for IdeaLab. —Pengo (talk) 16:56, 8 March 2015 (UTC)
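The candidate list described above (common epithets ending in -ae that do not occur with other endings) could be produced along these lines. This Python sketch is an assumption about one possible approach, not the method actually used; the list of "other" endings is a guess at what counts as another declension form, and the function name is hypothetical.

```python
from typing import Dict, List

# Assumption: if the same stem also occurs with one of these endings, the
# -ae form is probably an ordinary adjective or noun, not an eponym.
OTHER_ENDINGS = ("a", "us", "um", "i", "arum", "is")


def candidate_eponyms(epithet_counts: Dict[str, int]) -> List[str]:
    """Return -ae epithets with no sibling forms, most frequent first."""
    known = set(epithet_counts)
    candidates = []
    for epithet, count in epithet_counts.items():
        if not epithet.endswith("ae"):
            continue
        stem = epithet[:-2]
        if any(stem + ending in known for ending in OTHER_ENDINGS):
            continue
        candidates.append((epithet, count))
    # Sort by descending frequency, then alphabetically for stable output.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [epithet for epithet, _ in candidates]
```

The output would still need the manual pass mentioned above, since places and parasite hosts also survive this filter.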

I take it that entries like idae#Translingual are not what you have in mind. DCDuring TALK 18:29, 8 March 2015 (UTC)

Looking through the "A"s (through "An") in my Dictionary of Scientific Bird Names, there are a fair number of women's names. Unfortunately, the yield of those who were not wives, daughters, innamoratae, patrons, mythological or historical figures, or unknown is not high, to wit, two: angelae and annae. I looked at every eponymous epithet in the range. I'm not really willing to go through the whole book with such a modest yield. DCDuring TALK 19:17, 8 March 2015 (UTC)

My impression from looking at hundreds of insect names is that people named tend to be: 1) The people who found and/or provided the type specimens, 2) colleagues (especially authors of invalid names superseded by the names published) 3) benefactors 4) friends and/or family 5) celebrities and/or historical figures 6) targets of disguised insults or other hidden messages. The earlier custom was to draw as much as possible from classical antiquity, which deteriorated into picking random names out of dictionaries as the number of new taxa outstripped the supply of meaningful figures to allude to. The sheer volume of taxa and the restriction on identical generic names or binomials has led to more and more frivolity such as puns, names from pop culture, etc.

Of the categories above, there are some really interesting people in the first category, including a surprising number of women. There are also a few surprises in the second category with some notable female scientists from a century or more ago. Chuck Entz (talk) 22:31, 8 March 2015 (UTC)

@DCDuring, Chuck Entz — kingsleyae was actually my first find of a missing -ae named for a human, which gave me some hope. Most of the fish named for her seem to have been first discovered by her too. "idae" is kind of borderline, I guess at a minimum, finding who an entry is eponymous for is important (I'm guessing idae usually refers to an Ida of Greek mythology, though didn't find anything definite in my cursory search). Scientists was my initial focus, but there's nothing wrong with increasing the number of female historical figures, patrons, celebrities, and mythological figures too, and it's also quite possible family and innamoratae were also involved in research. —Pengo (talk) 00:15, 9 March 2015 (UTC)

@DCDuring I don't suppose it would be any less tedious if you had "The Eponym Dictionary of Birds"? —Pengo (talk) 03:41, 9 March 2015 (UTC)

A favorite example is the whitefly genus Bemisia, described in 1914 in honor of Florence Eugenie Bemis, who was herself an expert on whiteflies. In 1904 she published a monograph on whiteflies of California in which she described 15 species new to science. I wish I could create a Wikipedia article on her, but I haven't been able to find biographical information, let alone citable references. Chuck Entz (talk) 04:08, 9 March 2015 (UTC)

What about focussing on species which were discovered by women? - -sche (discuss) 21:58, 8 March 2015 (UTC)

@-sche Species discovered by women would be great, but I have no idea how to find or make such a list. Though it might be easier for plants. The International Plant Names Index (ipni.org) has a "forename" field for their "authors" database, so it could be possible to pick out the feminine names, e.g. Miriam Cristina Alvarez (who described Ditassa oberdanii Fontella & M.C.Alvarez, a dogbane from Espírito Santo, Brazil). Ok, so maybe I do have an idea for how to make such a list. Some of the authors in the database seem to be authors of research papers but don't appear to have any species associated with them, e.g. I.Blok (Ida Blok), which tripped me up a bit. I'm not sure where to find an International list of male/female names. I could try extracting them from Wiktionary and/or try to guess based on suffix. Maybe I should write up a grant proposal. —Pengo (talk) 00:15, 9 March 2015 (UTC)

Let's say we do this. Are we doing it so that we can show that we care? If so, how will anyone know what we've done? Do we need a set of women's categories to advertise what we've done? DCDuring TALK 23:07, 8 March 2015 (UTC)

@DCDuring "Are we doing it so that we can show that we care?" Yep. (Also there's a tiny chance it might even encourage new editors, as these entries are fairly straightforward to create.) "If so, how will anyone know what we've done?" Write up some sort of summary on an IdeaLab item I guess. I'll have a go at creating the start of one soon. A category could help. We really ought to have one for eponymous specific epithets named for non-mythological humans or the like already. No idea if a category should be split by gender, but it's easy enough to pick the -ae's from the -i's anyway. —Pengo (talk) 01:13, 9 March 2015 (UTC)

First attempt: Here's a bunch of epithets ending in -ae, sorted by usage in books. Not sure how useful it is. —Pengo (talk) 00:15, 9 March 2015 (UTC)

We have nearly 200 items in Category:Translingual taxonomic eponyms and I don't always remember to categorize the items there, so there could easily be fifty or a hundred more. DCDuring TALK 03:37, 9 March 2015 (UTC)

I got the total up to 660 without creating any new pages. Though only found 14 -ae pages to add (which includes a ship: sibogae). —Pengo (talk) 10:53, 9 March 2015 (UTC)

Here's the IdeaLab page, which I have created in my quixotic quest to gather more participants and interest. Please add your name to support it if you're even vaguely interested. Pengo (talk) 23:03, 11 March 2015 (UTC)

Show/hide broken

Some days the show/hide (inflections, conjugations, translations) functionality is gone and I can't view translations except by clicking "edit". What's going on? This started to happen one or two weeks ago, perhaps at the same time that "§" characters started to appear next to headings. I'm running Firefox on Linux. --LA2 (talk) 14:02, 9 March 2015 (UTC)

Even when it is broken, the content should always be viewable, so it's a double bug. It should not have anything to do with § though, since § is a new MediaWiki feature, and the "NavBars" (hide/show boxes) are created with MediaWiki:Gadget-legacy.js. If it happens again, could you check the log (Tools > Web development > Web console) to see if there is a JavaScript error? — Dakdada 15:02, 12 March 2015 (UTC)

Now I removed all cookies from my Firefox browser pertaining to en.wiktionary, and that solved the problem! Can you imagine that a cookie could cause this?! LA2 (talk) 19:05, 15 March 2015 (UTC)

Happened to me too, hours ago. I also deleted the cache, which did not solve the problem. Then I checked the "Delete cookies and other site data" checkbox, which solved the problem.

@Dakdada, I did look into the console log and I remember there was an error caused by Gadget-legacy.js

It seemed to me that the problem started when I clicked some buttons under the "Visibility" toolbox. --Dixtosa (talk) 19:16, 15 March 2015 (UTC)

Bad italics in comparative/superlative entries

Could someone please modify Template:en-comparative of and Template:en-superlative of so that they don't put the literal word in italics at the end? e.g. at civilest, it should say "most civil" with "civil" in upright type, not in italics. Equinox ◑ 19:16, 9 March 2015 (UTC)

Done. —CodeCat 19:21, 9 March 2015 (UTC)

Codifying sarcastic/ironic and some other rhetorical use as ineligible under CFI

Vote created at Wiktionary:Votes/pl-2015-03/Excluding most sarcastic usage from CFI

Every so often, a definition like "big: (sarcastic) small" finds its way to RFD. Sarcasm and irony are productive in the English language (and all other spoken languages, as far as I know) and there are effectively no restrictions on what can be twisted sarcastically. Standard practice has been to delete obvious sarcastic and rhetorical use (see e.g. talk:touché, talk:James Bond, talk:thanks a lot), but this isn't actually mentioned anywhere. Therefore, I would suggest adding something like the text quoted below to CFI.

As far as I can tell, this would only result in merging/deleting senses on two pages: great and pray tell, possibly also no kidding, thanks a bunch (which did survive RFD) and eon. Thoughts or improvements welcome. Smurrayinchester (talk) 16:57, 11 March 2015 (UTC)

What exactly is this referring to in "this can be explained in a usage note"? DCDuring TALK 21:12, 11 March 2015 (UTC)

I've tried to make that sentence a bit shorter and clearer. Smurrayinchester (talk) 21:22, 11 March 2015 (UTC)

I'd have guessed that, but it wasn't clear. Thanks.

I agree that it would be useful to be able to point to a policy something like what you've offered. Your draft would be good enough for me, but perhaps it can be further improved. DCDuring TALK 21:45, 11 March 2015 (UTC)

Sounds good to me, though I wonder whether there are cases where a word is now almost exclusively used in a sarcastic way, and rarely or never with its original meaning. If so, those might need special treatment. Equinox ◑ 15:51, 12 March 2015 (UTC)

I don't want to see this sort of long wording in CFI. I think the problem of sarcastic meanings is marginal anyway. Furthermore, each sarcastic meaning has to be scrutinized for how characteristic it is, and therefore, to what extent it has become lexicalized and thereby inclusion-worthy. The regulatory part (as opposed to explanatory) of the above seems to be largely captured in this: "The straightforward use of sarcasm, irony, understatement and hyperbole does not usually qualify for inclusion." The use of "usually" makes room for reasonable exceptions. If metaphor is intended to be on the list, it needs to be explicitly there; it is now conspicuously absent. Of course, inclusion of metaphor in the list would make this rather open to abuse. --Dan Polansky (talk) 19:09, 12 March 2015 (UTC)

Metaphor is a tricky case, as you say. Since it's a much more irregular process than the rhetorical devices listed above (or perhaps more accurately, sarcasm, irony, understatement and hyperbole are subtypes of metaphor), and since it's one of the main drivers of linguistic evolution, it would be daft to have a blanket exclusion. While it's a bit wordy, I think some explanatory verbiage is needed. CFI changes that just add a rule without giving any context to its application just seem to cause endless squabbling (look at the arguments WT:COALMINE caused). I've put a more pruned version below, which still (I hope) provides enough of the background to the rule to allow it to guide RFD debates effectively. Smurrayinchester (talk) 09:56, 13 March 2015 (UTC)

Oppose: Too blanket. There are some sarcastic/ironic definitions that we should have. Furthermore, some words/phrases are used sarcastically frequently, while most are used hardly at all. Purple backpack89 20:21, 12 March 2015 (UTC)

Can you give an example of a term which would fail CFI under these rules, that should nevertheless be included? The cases that you mention are already covered by the sentence "Common rhetorical use can be explained in a usage note, a context tag (such as (Usually sarcastic)) or as part of the literal definition." Indeed, usage notes specifically exist to explain the nuances of usage that a definition cannot provide. Smurrayinchester (talk) 09:56, 13 March 2015 (UTC)

Rhetorical devices

The meaning of a statement always depends on context, and there are various rhetorical devices that speakers and writers use in order to convey a particular message without meaning what they literally say. These include sarcasm, irony, understatement and hyperbole. In speech, the use of these devices is often highlighted by a particular intonation, and in writing, this may be mimicked by the use of italics, quotation marks or exclamation points. Because the set of words and phrases which can be used rhetorically is almost limitless, and because separating ironic use from literal use is often difficult, the straightforward use of common rhetorical devices does not usually qualify for inclusion.

This means, for example, that big should not be defined as "(sarcastic) small", "(understatement) gigantic" or "(hyperbole) moderately large"; the fact that an English speaker might use the word this way is obvious and not especially noteworthy. Common rhetorical use can be explained in a usage note, a context tag (such as (Usually sarcastic)) or as part of the literal definition.

Figures of speech that are not obvious from their parts – for example, a euphemism which successfully disguises its true meaning, or a sarcastic turn of phrase which is more than a simple inversion of meaning – or which are never used literally are not covered by this rule, and can be included on their own merits.

Alternative wording

The straightforward use of sarcasm, irony, understatement and hyperbole does not usually qualify for inclusion: these are standard rhetorical devices which affect the meaning of a statement as a whole, but do not change the meaning of the words themselves.

This means, for example, that big should not be defined as "(sarcastic) small", "(understatement) gigantic" or "(hyperbole) moderately large"; the fact that an English speaker might use the word in these ways is obvious and not especially noteworthy. Common rhetorical use can be explained in a usage note, a context tag (such as (Usually sarcastic)) or as part of the literal definition. Figures of speech that are not obvious from their parts or which are never used literally are not covered by this rule, and can be included on their own merits.

Phonetic transcriptions (narrowness, number)

I have been informed that phonetic transcriptions on this site are only to be done on a certain level of depth. As I am personally interested in the variant pronunciations of languages, non-phonemic ones included, I would like to ask whether there are really any great arguments against giving a medium number of regional narrower pronunciations under a broad heading, like in the examples here and here. Korn (talk) 10:39, 12 March 2015 (UTC)

I feel like such fine phonetic detail doesn't belong in a dictionary because it's not a lexical property of the word in question. The fact that /ʁ/ is realized as in Bavarian is a fact about the phonology of Bavarian, not a fact about robben. I also wonder how verifiable a lot of these pronunciations are. Who says that it's , with a highly unusual and almost unpronounceable sequence of vowel plus syllabic consonant in northern and central German? I live in Berlin, and while I've certainly heard (which isn't even listed), I don't think I've ever heard . I don't think I can even produce in a way that is reliably distinct from . And who says that the standard German pronunciation of Madrid is with an aspirated at the end of a syllable? I've never read a phonological description of standard German that permits aspirated consonants at the end of a syllable. I'm also curious about what inflected and derived forms of Madrid are attested to verify the claim that the final consonant is underlyingly /t/, i.e. that the word works in German as if it were spelled Madrit. —Aɴɢʀ (talk) 10:31, 14 March 2015 (UTC)

I lived in Berlin (north east) for five years and my impression is that is by far the dominant Berlin and German pronunciation. -ben is certainly not pronounced with a fully released plosive like Bad and preventing from becoming requires some carefulness in speech. When speaking carefully, though, I think people normally end up with some form ending in again. Concerning Madrid: The adjective, 'madrider'. Hearing it pronounced with would make me assume the speaker was from an area with intervocalic consonant voicing, i.e. Schwaben, Sachsen, the north et cetera. Its pronunciation with /t/ is based in the devoicing in the noun.
As for the lexical property, it could just as well be stated that the fact that /r/ is realised as in Western, Central and parts of Northern Germany is a fact about the phonology of Central, Western and parts of Northern Germany and not about the word in question. But at the end of the day, both pronunciations are permissible and widespread variants of the standard language and not features of a non-standard dialect. Hence, if either deserves a place in the list, so does the other. And a note about where they are used seems a reasonable service of convenience. Actively excluding them would mean blotting out a considerable portion of German speakers and creating a North-Central-centric bias in this dictionary. Especially in comparison to the English entries, which always differentiate between at least two or more variants (English, American, Australian, Canadian and American dialects), or Indonesian entries which list both /o/ and /ʊ/ (sarung#Malay) and /e/ - /ɪ/, there certainly is some precedent for, at the very least, more level of detail than just a phonemic description of one single accent; even when that accent is the one considered to be the educated regiolect in the cities where most of Germany's TV, radio and cinema is produced.
Lastly, as for the aspirated /t/, English Wikipedia cites the Duden Aussprachewörterbuch (which I don't have around to check) as a source for consonants having the same level of aspiration in all positions. It also mentions that initial-only aspiration is a distinctive feature of northern northern Germany, which is reasonable as the same has been said by Low German grammarians over a century before. Korn (talk) 13:59, 14 March 2015 (UTC)

Coupla new votes

Thanks to their recent vandal-fighting, I've started a couple of votes for adminhood to be bestowed upon Mr Granger and ISMETA --Type56op9 (talk) 12:43, 12 March 2015 (UTC)

SUL finalization update

Hi all, please read this page for important information and an update involving SUL finalization, scheduled to take place in one month. Thanks. Keegan (WMF) (talk) 19:45, 13 March 2015 (UTC)

Striking a Blow Against a Spammer

I just deleted an entry for the name of a business/its website domain name where the definition was a verbatim quote of a slogan from their website (I'm not going to mention the details to avoid giving them the search-engine-ranking boost they were aiming for- I've given enough information here so you can easily find them).

After deleting the entry and blocking the IP for 6 months as a spammer, I took it a step further: I noticed a yelp.com entry for their business, so I signed up there with an account under my own name and zip code and posted a negative review- citing only facts verifiable in the deletion log and noting the lack of direct evidence. Now, whenever anyone searches for the website, this review will come up. Unless I'm missing something, this tactic has the potential to remove some of the incentive/reward for search-engine spam in cases where a negative review would make a difference (this is an advertising/marketing business in Texas).

What does everyone else think about this? Chuck Entz (talk) 00:17, 14 March 2015 (UTC)

This could be an effective approach. There's always the possibility Person A would create an entry for rival Person B's business, knowing we'd delete it and smack Person B, but our historical experience suggests most spammers aren't that smart or else they would have realized by now we delete spam pages and they don't gain any SEO. - -sche (discuss) 02:53, 14 March 2015 (UTC)

I doubt it will make much difference, since spammers are such single-minded meatheads, but it can't actually hurt. If you feel you've got time to mess about filling various online forms then go for it. Equinox ◑ 02:59, 14 March 2015 (UTC)

Template:lang

Do we still need {{lang}}? Is there anything that {{lang|it|Nel mezzo del cammin di nostra vita}} does that {{l|it||Nel mezzo del cammin di nostra vita}} (note the two vertical bars after it) doesn't? If I want to put a link inside {{lang}}, e.g. {{lang|it|Nel [[mezzo]] del cammin di nostra vita}}, it doesn't even tell the link to go to the Italian section, while {{l|it|Nel [[mezzo]] del cammin di nostra vita}} does tell the link what language it is. —Aɴɢʀ (talk) 10:48, 14 March 2015 (UTC)

In which situations is {{lang}} used anyway? I’ve only seen it used in quotations, but I think we would benefit from a template specifically for that (one that works like {{usex}}). — Ungoliant ^(falai) 17:57, 14 March 2015 (UTC)

Besides quotations, I've sometimes used it in inflection-table templates for forms that don't need linking. —Aɴɢʀ (talk) 19:51, 15 March 2015 (UTC)

Looks like replacing it with {{l}} is the way to go. — Ungoliant ^(falai) 14:28, 16 March 2015 (UTC)

My gut is to keep both. As I've said time and again, merging and moving templates does little other than confuse a lot of editors. Purple backpack89 14:33, 16 March 2015 (UTC)

The way this process should and used to work is that, if folks agree, the template is deprecated, then its use converted to some other, then deleted.

Deprecation can be preceded by discouraging use. Should we discourage use of this in any of its applications? In all of its applications? The discouragement can be in the form of changing the documentation, gradually converting some or all uses to some other template, as well as any adverse conclusion of discussions such as this. It also might be a good time to determine whether the replacement templates are as good as they could be and to review their documentation. It is a bit more work, but a gradual process should reduce the adverse effects on contributor habits, and extend the utility of edit histories that use older templates. DCDuring TALK 17:05, 16 March 2015 (UTC)

I don't really give a flying fox if we delete it or not; I just want to know if there's any particular reason I should keep using it. —Aɴɢʀ (talk) 21:23, 16 March 2015 (UTC)

Based on {{lang/documentation}} it's basically a shortcut to , which I think is still needed because of browsers that don't work out the script for themselves. How useful it is for languages that use the Latin script, well, I think it only changes the HTML, to a human user, it's no different. Renard Migrant (talk) 20:29, 17 March 2015 (UTC)

Usability perspective:

My current understanding is that various accessibility and other tools can make use of linguistic metadata provided by {{lang}} to decide how to handle text. I've been using it for some time to specify that non-link text I am entering is not English.

From what I've been able to test, both {{lang|LANGCODE|$Text}} and {{l|LANGCODE||$Text}} produce identical output in the browser:

$Text

This proposed change would thus only 1) affect what templates editors use, and 2) require that someone go through and change all instances of {{lang}} over to use {{l}} instead.

I'm fine with that. I can't think of any other real downsides. ‑‑ Eiríkr Útlendi │ Tala við mig 23:15, 17 March 2015 (UTC)

There are some differences. Compare {{lang|ru|[[тест]]}} and {{l|ru||[[тест]]}}. —CodeCat 00:08, 18 March 2015 (UTC)

With the [[ ]] link brackets, {{lang}} produces:

<span class="Cyrl" lang="ru" xml:lang="ru"><a href="https://dictious.com/en/%D1%82%D0%B5%D1%81%D1%82" title="">тест</a></span>

Meanwhile, {{l}} produces:

<span class="Cyrl" lang="ru" xml:lang="ru"><a href="https://dictious.com/en/%D1%82%D0%B5%D1%81%D1%82" title="тест">тест</a></span> (<span lang="" class="tr" xml:lang="">test</span>)

Without the [[ ]] link brackets, {{lang}} produces:

тест

{{l}} produces:

тест (test)

It looks like the key difference is addition of transliteration for those languages for which our infrastructure supports transliteration.

Query: Are there any use cases where users would want to 1) mark text as a specific language, but 2) not have any automatic transliteration? ‑‑ Eiríkr Útlendi │ Tala við mig 18:21, 18 March 2015 (UTC)

Our templates already support tr=- to suppress transliteration. So you only have to search for entries which have that. It's probably used mostly in inflection tables. —CodeCat 19:46, 18 March 2015 (UTC)

I think the idea is something like this, on aduire, where the intention is not to link. Renard Migrant (talk) 17:33, 19 March 2015 (UTC)

But that's where you'd use {{ux}}. —CodeCat 18:36, 19 March 2015 (UTC)

It's a citation, not a usage example. Renard Migrant (talk) 12:45, 21 March 2015 (UTC)
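The difference between the two linked outputs quoted earlier in this thread can also be checked mechanically. The following is only a rough sketch: it takes the two HTML snippets exactly as pasted above and uses Python's standard `html.parser` to list the `<span>` classes and link titles in each. The names `SpanCollector` and `summarize` are made up for this illustration and are not part of any Wiktionary tooling.

```python
from html.parser import HTMLParser

# The two snippets as quoted in the discussion above.
LANG_HTML = ('<span class="Cyrl" lang="ru" xml:lang="ru">'
             '<a href="https://dictious.com/en/%D1%82%D0%B5%D1%81%D1%82" title="">тест</a></span>')
L_HTML = ('<span class="Cyrl" lang="ru" xml:lang="ru">'
          '<a href="https://dictious.com/en/%D1%82%D0%B5%D1%81%D1%82" title="тест">тест</a></span>'
          ' (<span lang="" class="tr" xml:lang="">test</span>)')


class SpanCollector(HTMLParser):
    """Hypothetical helper: records (class, text) per <span> and the title of each <a>."""

    def __init__(self):
        super().__init__()
        self.open_spans = []   # [class, accumulated text] for spans not yet closed
        self.spans = []        # finished (class, text) pairs
        self.link_titles = []  # title attribute of every <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            self.open_spans.append([attrs.get("class", ""), ""])
        elif tag == "a":
            self.link_titles.append(attrs.get("title", ""))

    def handle_data(self, data):
        # Text outside any span (such as the parentheses) is ignored.
        for entry in self.open_spans:
            entry[1] += data

    def handle_endtag(self, tag):
        if tag == "span" and self.open_spans:
            cls, text = self.open_spans.pop()
            self.spans.append((cls, text))


def summarize(html):
    parser = SpanCollector()
    parser.feed(html)
    parser.close()
    return {"spans": parser.spans, "link_titles": parser.link_titles}


if __name__ == "__main__":
    print(summarize(LANG_HTML))
    print(summarize(L_HTML))
```

On these two inputs the only differences that show up are the extra `tr` span carrying the transliteration and the non-empty link title, which agrees with the observation made above.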

Template:confusion

Whyyyy do we have both {{ux}} and {{usex}}??? ‑‑ Eiríkr Útlendi │ Tala við mig 18:38, 19 March 2015 (UTC)

See Wiktionary:Grease pit/2014/February#Template for eg over usex like label over context. —CodeCat 18:59, 19 March 2015 (UTC)

They work the same, one being a redirect to the other.

{{usex}} came first and its name is a bit more intuitive, so some users are accustomed to it and it might be a little bit easier for someone new to Wiktionary to figure out what was intended. As evidence of usex being more intuitive, it gets some use on our discussion pages as an abbreviation of usage example, whereas I don't recollect a single instance of such use of "ux". OTOH, {{ux}} is shorter. If there were a big shortage of two-letter codes or a clearly better use for either of the template names we could revisit the matter. DCDuring TALK 20:54, 19 March 2015 (UTC)

Pronunciation formatting

Should phonetic or phonemic transcription be preferred, by default? WT:PRON appears to be silent on this. Yet this can be a relatively large difference for languages where a word's surface realization involves several phonological processes.

Also, {{IPA}} seems to link every pronunciation to the corresponding ] article, even if one does not exist. This seems like a bad idea, given the policy that "Ideally, every entry should have a pronunciation section". I would suggest instead directing it by default to ] (though it seems possible to contemplate defining a set of languages for which it instead links to the separate phonology article). --Tropylium (talk) 14:23, 16 March 2015 (UTC)

I prefer phonemic transcription because that's what most dictionaries use and because that's what is lexical. That said, the phonemic transcription need not be highly abstract; for example, if the distinction between two phonemes is lost in a certain environment, then the sound that surfaces can be transcribed even if an abstract analysis would regard the other sound as the underlying one. (For example, German Rad can be transcribed /ʁaːt/ rather than /ʁaːd/ since /t/ and /d/ are distinct phonemes in German, even though an abstract analysis would posit /ʁaːd/ as the underlying form.) But that's just my preference; we have plenty of examples of narrow transcription being used, and there's no reason we can't use both. —Aɴɢʀ (talk) 21:06, 16 March 2015 (UTC)

If allophonic differences cause the distinction between two phonemes to collapse, then that collapsed phoneme should really be treated as a new phoneme in itself, rather than either of the original phonemes. For example, in Eastern Catalan, unstressed /a/ and /e/ fall together as /ə/, and you can't really say which of the two it originally belongs to. It's a new phoneme altogether, albeit one that occurs in complementary distribution to both /a/ and /e/. For final devoicing, the same applies in principle, albeit that the phonetic realisation of the new phoneme coincides with the realisation of one of the two phonemes that it results from. But the distinction is definitely phonemic, and it's only when you go into morphophonemics, comparing related forms of a lemma, that the original /d/ arises. Another way to look at it is to ask: if Rad were the only possible form and had no other forms or related terms to compare it with, how would you know it was /d/ underlyingly? You couldn't, and therefore the phoneme is /t/. —CodeCat 21:33, 16 March 2015 (UTC)

I agree. —Aɴɢʀ (talk) 22:26, 16 March 2015 (UTC)

Phonemic, please! I don't think anyone wants to see a whole raft of vowel variants for Yorkshire, London, Manchester, Essex, Scotland, etc. — and that's just the UK! Equinox ◑ 21:17, 16 March 2015 (UTC)

I'll agree with everyone then. Renard Migrant (talk) 20:29, 17 March 2015 (UTC)

I'm not asking due to dialects as much as languages with several surface filters between phonemics and phonetics. For an example: Tundra Nenets леды (lyedi, “skeleton”) is phonologically analyzable as IPA^(key): /lediă/, phonetically realized as IPA^(key): . Would you mandate transcribing the former? Or would you be OK with using "subphonemic" transcription where e.g. the vowel backing process, universal in all varieties of the language, is transcribed? How about the lenition of /d/, which is almost universal — would you consider the fact that there exist a few dialects that have in this position sufficient grounds to not mark at all?

(For that matter, suppose I were to indicate an underlying phonemicization IPA^(key): /lixt/ or even just IPA^(key): /līt/ for light, citing w:The Sound Pattern of English…?)

"Do not put in tons of dialectal pronunciations" is not at all the same as "put everything in purely phonemic transcription". --Tropylium (talk) 00:04, 18 March 2015 (UTC)

That's a difficult case. On the one hand, you don't want to give such a highly abstract representation (like the SPE ones you mentioned) that the word would be unrecognizable to native speakers if pronounced the way it's transcribed. On the other hand, you don't want to overwhelm the user with a bunch of fine phonetic detail whose absence would probably not be noticed by native speakers. One rule of thumb I sometimes try to follow in cases like this is "How narrow a transcription can I get without using any IPA diacritics, superscripts, etc., but only the basic characters?" Obviously that rule can't be applied exceptionlessly in all cases, but if is unambiguous as it stands, then don't go overboard and transcribe it or whatever. —Aɴɢʀ (talk) 20:02, 18 March 2015 (UTC)

Would it not be possible to automatically generate phonetic transcriptions from the phonemic one? After all, it's predictable by definition. —CodeCat 21:11, 17 March 2015 (UTC)

In the past, a few users suggested using super-broad/"diaphonemic" transcriptions. Perhaps one day English entries will have expandable templatized pronunciation sections like Chinese entries, where phonemic and semi-narrow phonetic transcriptions into major dialects are shown by default, while smaller dialects' pronunciations, and super-broad/"diaphonemic" and super-narrow transcriptions, are shown when the template is expanded. (Check out the obscure dialect+chronolect in dirty.) PS I definitely agree that Rad should be transcribed as ending with /t/, not /d/. - -sche (discuss) 01:24, 19 March 2015 (UTC)

For what it's worth, I heavily support pursuing this idea. Not that my word seems to be worth much, as before I both asked about how to create such a collapsible template and just a bit further up this page asked more or less the same question about narrowness and diaphonemic/dialect IPA policy and was widely ignored both times. Korn (talk) 11:45, 19 March 2015 (UTC)

Automatic phonemic → phonetic transcription is in principle viable, but fully unified diaphonemic transcription is generally not. Consider examples like lava or pasta. --Tropylium (talk) 00:58, 1 April 2015 (UTC)
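To make the idea of automatic phonemic → phonetic conversion a little more concrete, here is a minimal sketch of an ordered rewrite-rule approach. The two rules are toy assumptions loosely modelled on familiar English allophony (aspiration of a word-initial voiceless stop, velarised word-final /l/); they are not a proposal for any actual Wiktionary module, and the function name `to_phonetic` is invented for the example.

```python
import re

# Toy, illustrative rules only (an assumption for this sketch, not a real
# phonology of any variety). Each rule is (pattern, replacement), applied in order.
RULES = [
    (r"^([ptk])", r"\1ʰ"),  # aspirate a word-initial voiceless stop
    (r"l$", "ɫ"),           # velarise a word-final /l/
]


def to_phonetic(phonemic):
    """Turn a /phonemic/ string into a [phonetic] one using RULES."""
    form = phonemic.strip("/")
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return f"[{form}]"


if __name__ == "__main__":
    for word in ("/pɪl/", "/spɪn/"):
        print(word, "→", to_phonetic(word))
```

A sketch like this only works within one variety at a time: each dialect would need its own rule list, which fits the point above that a fully unified diaphonemic input is generally not available.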

Request for citations (!= RFV) for entries not in other dictionaries

We have a good number of entries (definitions in entries) not in other dictionaries that have no citations. They really should have some citations to confirm our definition and to make us look a little more systematic than Urban Dictionary. The RfV process gives urgency to the process of attestation, but that urgency may be excessive for many of these. Would it make sense to have {{rfcites}} (or something) for entries that were not in {{R:OneLook}}, {{R:Century 1911}}, or any glossary or dictionary in Google Books (template to be written)? I suppose it would be most productive for this to be applied first to entries. Attempting to determine whether a definition is or is not in another dictionary is much harder than determining whether a term is. DCDuring TALK 17:08, 17 March 2015 (UTC)

I like the idea of some sort of collaborative wiki project where you can grab any word/sense without the requisite three citations and go away and cite it, and it is then removed from the list. This would take a lot of organisation, and a bot. Still, it could be done, and I would rather that we generate a separate list, based on our current entries, than change those entries by adding yet more template markup to them. Equinox ◑ 19:43, 17 March 2015 (UTC)

There are {{rfquote}} and {{Template:rfquote-sense}}, created on 24 October 2007 and 22 October 2007. They categorize into Category:English entries needing quotation, which now has 10,750 items. --Dan Polansky (talk) 19:54, 17 March 2015 (UTC)

Thanks. I had looked at Category:Request templates, really. There are only about 60 uses of {{rfquote|lang=en}} AFAICT, somewhat fewer of {{rfquote-sense|lang=en}}. Would it be unreasonable to categorize the English ones into a specific category? DCDuring TALK 01:02, 18 March 2015 (UTC)

I sometimes put {{rfex}} on senses that strike me as dubious but don't seem worth an RFV. I don't think those categorise, but at least it's something editors will see while editing. Equinox ◑ 19:55, 17 March 2015 (UTC)

Would it be helpful to have {{rfex}} categorize into the same category as {{rfquote}} or into a different category or none at all? If it contained "en" or "lang=en", the new search could find it, even without a category, but someone would have to know search and know what to look for. DCDuring TALK 01:01, 18 March 2015 (UTC)

Tentative support but not on the main page, perhaps. Also, we need to consider normalisations of spellings and rare languages, for example, quoting Chechen word чӏогӏа (čʼoğa, “strong”) would be difficult for two reasons - non-standard spelling "чlогӏа" is more common (problems with palochka, especially lower case "ӏ") and Chechen doesn't have a lot of digitised books published. --Anatoli T. (обсудить/вклад) 01:24, 18 March 2015 (UTC)

{{newrfquote}} could be made less conspicuous, like {{rfelite}}, and placed at the bottom of the L2 section. {{rfquote-sense}} is relatively inconspicuous and could be made a bit less conspicuous.

As to the other problems, of course, I'm thinking mainly of English. Judgment needs to be applied for each language, indeed for each individual use. DCDuring TALK 01:47, 18 March 2015 (UTC)

Walser German and Swabian

It has been brought to my attention that we have Category:Walser German language and Category:Swabian language.

In my opinion, we shouldn't treat these two as separate languages. They are part of the Swiss German dialect continuum, which is covered via Category:Alemannic German language. There is no reason at all to keep Swabian. As for Walser German, it is the least intelligible of the dialect continuum, virtually incomprehensible even to other Swiss German speakers, but linguistic tradition has always treated it as just a variety of Swiss German, and there is no reason why we shouldn't follow suit. We can always use dialect labels to distinguish the different languages, and there are many, many more varieties of Swiss German (like Alsatian) that are not covered. -- Liliana • 22:18, 17 March 2015 (UTC)

Yes, merge ~~them~~ Walser into gsw (Category:Alemannic German language). There are lexical distinctions and phonological and hence orthographic distinctions that can be drawn between the lects, but none of them are so great that it would be sensible to treat the lects as separate languages. (And there are many equally distinct varieties of the Alemannic dialect continuum which have not been granted codes, as you've noted.) A cynic might wonder if the reason Ethnologue et al are so much quicker to grant codes to the dialects of other languages than to the dialects of English is that they all speak English well enough to recognize how silly it would be to consider da yooge boid ate da olykoek /də judʒ bɜjd eɪt də ˈ(oʊ~oə).lɪ.kʊk/ and the huge bird ate the doughnut /ðə hjudʒ bɝd eɪt ðə ˈdoʊ.nʌt/ different languages. - -sche (discuss) 20:01, 20 March 2015 (UTC)

Incidentally, I stumbled onto this today: "I notice that 'Walser' is counted as a separate language, but all the other Swiss German dialects are grouped under Swiss German. Does anyone happen to know why this is? My grandparents speak Walser and we have absolutely no trouble communicating (I speak a different, High Alemannic, dialect)."

I have merged Walser into gsw.

- -sche (discuss) 21:28, 17 April 2015 (UTC)

Discussion of Swabian is now taking place at WT:T:ADE. - -sche (discuss) 19:55, 30 July 2015 (UTC)

Wordset

Any thoughts on Wordset? They open sourced their code and data recently and emphasize a structured data approach (in contrast to Wiktionary). Their claim that Wiktionary is "unstructured" is not really correct; there are a number of tools which can successfully parse the content (I contribute code to one of them). At best I would call Wiktionary "semi-structured". What I agree with however is that it is time to try out new ways to build a collaborative platform at scale. For instance there is a voting system built into wordset which is used to reach consensus on proposed changes. The big problem is that Wiktionary (and Mediawiki) can be quite intimidating to potential new contributors, the templating system is powerful but also complex. And it was obviously never designed to create a dictionary. On the other hand Wordset's data model is quite limited at the moment (for a project that aims to be more structured), and they only focus on English headwords, at least initially. Jberkel (talk) 18:08, 18 March 2015 (UTC)

I'm not impressed. SemperBlotto (talk) 08:03, 19 March 2015 (UTC)
It is quite easy to have a data structure when only focusing on one language and only looking for a definition. But try to do that with all languages (described in several languages), with much more diverse information to store and organize (pronunciations, etymologies, inflections, synonyms...) and it becomes very difficult. The semi-structured Wiktionaries allow us to have all of these, but at the cost of a real structure (also, parsers only work to some extent, and usually only for one Wiktionary language), which indeed makes it difficult to reuse the data. Wikidata may be able to improve this, but it is going to be very difficult. Nonetheless, this Wordset site is open-source, including the definitions, and with a philosophy close to the Wikimedia projects, so we should not try to see it as an adversary. — Dakdada 09:16, 19 March 2015 (UTC)
@BD2412: If they don't import content from dictionaries like us, it will take them a long time to achieve coverage. Some of their content is apparently from WordNet and is available on what looks to me like a non-standard license. Their content is "Creative Commons Attribution-ShareAlike 4.0 International License". Can they simply import our content given that license? Can we use their content provided we include them as a reference? DCDuring TALK 15:22, 19 March 2015 (UTC)
Yes, and yes. They claim to use "the same CC license for the content as Wikipedia uses, CC-BY-SA", and we can hold them to that. Like everyone else in the world, they are free to copy and reuse our content so long as they credit us for it, and we are free to do the same as to theirs. I would not hold my breath on their providing anything that we can actually use, however. bd2412 T 19:07, 19 March 2015 (UTC)
Thanks. They might have some particularly well-worded definitions and usexes from time to time. BTW, can we copy WordNet with acknowledgement or is their license a little different?

I've been thinking that it would be handy to have a definition-writers custom edit interface that automatically generated links to various copyright-free and appropriately licensed dictionaries' entries for the headword being edited. Other links might be to various corpora and gateways. Standard boilerplate to credit the sources that needed crediting could be part of it too. At a very basic level templates like {{taxlook}} and {{REEHelp}} do a little of this, but a complete editing interface would be much better. DCDuring TALK 20:42, 19 March 2015 (UTC)

We are completely free to copy from WordNet, so long as wherever we copy, we include on the page: "WordNet 3.0 Copyright 2006 by Princeton University. All rights reserved. THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANT- ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS". However, I'd rather not include that anywhere in Wiktionary. bd2412 T 20:54, 19 March 2015 (UTC)

@BD2412: Indeed. In contrast Wordset needs a simple acknowledgement and link to their site. Or is our way of tracking changes not sufficient? DCDuring TALK 21:04, 19 March 2015 (UTC)

From a first glance, it doesn't look like their licence is compatible with ours, in particular the CC ShareAlike clause. ShareAlike means copyleft; they can't "add" restrictions on content that are not present on its original form on Wiktionary. If licencees have to include their copyright notice, it violates that, because such a requirement does not exist here. So Wordset cannot use Wiktionary content. —CodeCat 21:10, 19 March 2015 (UTC)

But our license is also CC ShareAlike. —Aɴɢʀ (talk) 21:39, 19 March 2015 (UTC)

I was referring to ours. Theirs is not, as far as I can tell. So it's probably incompatible. —CodeCat 21:47, 19 March 2015 (UTC)

It is the same, per their own words: "Specifically, we’re going to be choosing the same CC license for the content as Wikipedia uses, CC-BY-SA." — Dakdada 16:31, 20 March 2015 (UTC)

Yeah, it's confusing because we're simultaneously talking about Wordnet and Wordset in this thread. Wordset has the same license we do, but Wordnet doesn't. —Aɴɢʀ (talk) 16:41, 20 March 2015 (UTC)

Anyone else notice how they function on "yae" votes? Heh. Equinox ◑ 14:25, 19 March 2015 (UTC)

Interlanguage (interwiki) links

Does anybody know what the plan is for interlanguage links? Wikidata (as used in Wikipedia) is not yet used in Wiktionary. New articles lack interlanguage links (one example is lägel, created by me on February 21, which also exists on sv.wiktionary, but isn't linked) and existing articles here lack interlanguage links to newly created articles in other languages of Wiktionary, apparently because the interwiki bots have stopped. Should the bots be restarted? Or will Wikidata support come soon? --LA2 (talk) 21:58, 23 March 2015 (UTC)

@LA2: See d:Wikidata:Wiktionary. The fact that interwiki links aren't handled by Wikidata is pretty ridiculous, really. In (e.g.) Wikipedia, there won't be a direct one-to-one equivalent of every idea in every language edition and figuring out where all of them should point can be really tricky. In Wiktionary, it's irrelevant: the entry at wikt:en:foot and wikt:es:foot should link together no matter what (as long as neither of them is a redlink). This could all be accomplished painlessly in an afternoon. —Justin (koavf)❤T☮C☺M☯ 01:52, 24 March 2015 (UTC)

"the entry at wikt:en:foot and wikt:es:foot should link together no matter what this could all be accomplished painlessly in an afternoon": Indeed. And no shortage of Wiktionarians have pointed that out to the folks at Wikidata. They, in turn, have made it clear they are not going to do it. - -sche (discuss) 03:57, 24 March 2015 (UTC)

@-sche: Do you have links or diffs? I can't imagine that the Wikidata community would refuse to make interwiki links on Wiktionary. —Justin (koavf)❤T☮C☺M☯ 04:35, 24 March 2015 (UTC)

I think they would like to do all of Wiktionary at once, not just interlanguage links. And since defining a structure for Wiktionary linguistic data is really hard (and much discussed), they will probably not attack the problem until the other projects are converted to Wikidata.

Also, there are some small exceptions that we need to take care of for interlanguage links: see the table I made in d:Wikidata_talk:Wiktionary#First_and_second_phases, in particular the "apostrophe", "capital" and "other" interwikis. Those are due to different communities' typographic rules (and some errors). — Dakdada 10:24, 24 March 2015 (UTC)
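The matching rule discussed above (identical page titles link together as long as the page exists on both sides) can be sketched in a few lines. This is only an illustration, not how any interwiki bot or Wikidata actually works: the function names, the sample title sets, and the choice to fold only the typographic apostrophe are all assumptions made for the example, and the "capital" and "other" exceptions are not handled.

```python
from __future__ import annotations


def normalize_title(title: str) -> str:
    """Fold the typographic apostrophe to ASCII (assumed rule for the 'apostrophe' exception)."""
    return title.replace("\u2019", "'")


def interlanguage_links(titles_by_wiki: dict[str, set[str]]) -> dict[str, dict[str, str]]:
    """Group existing pages by normalized title.

    Returns {normalized title: {language code: actual page title}} for every
    title that exists on at least two language editions. A page missing on a
    wiki (a redlink) simply never appears in that wiki's title set.
    """
    found: dict[str, dict[str, str]] = {}
    for lang, titles in titles_by_wiki.items():
        for title in titles:
            found.setdefault(normalize_title(title), {})[lang] = title
    return {key: pages for key, pages in found.items() if len(pages) > 1}


# Hypothetical title lists, made up for illustration only.
sample = {
    "en": {"foot", "lägel", "aujourd'hui", "pimola"},
    "es": {"foot"},
    "sv": {"lägel"},
    "fr": {"aujourd\u2019hui"},
}
links = interlanguage_links(sample)
for key in sorted(links):
    print(key, links[key])
```

A title that exists on only one edition ("pimola" in the sample) gets no links, which is the "neither of them is a redlink" condition.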

This sounds like a deadlock situation. How sad! In the meantime, it couldn't hurt to restart interwiki bots, could it? I still have bot status (LA2-bot) on some languages of Wiktionary, so should I just go for it? LA2 (talk) 15:29, 24 March 2015 (UTC)

Out

Hi. I'm not gonna be using this username anymore. Time for a change. See you soon with a new name. --Type56op9 (talk) 12:49, 24 March 2015 (UTC)

OK thanks for letting us know ♥ Soap (talk) 15:07, 26 March 2015 (UTC)

It wasn't one of your better names, really. Equinox ◑ 02:06, 27 March 2015 (UTC)

Entries from the GCIDE labeled "Webster 1913 Suppl."

So I've found entries in the GCIDE which are missing from Wiktionary, such as "Pimola":

 <p><ent>Pimola</ent><br/
 <hw>Pim*o"la</hw> <pr>(?)</pr>, <pos>n.</pos> <def>An olive stuffed with a kind of sweet red pepper, or pimiento.  </def><br/
 </p>

Apparently these are from the "Webster 1913 Suppl."

My question is this: Should these be copied into Wiktionary? Is it OK for me to copy this definition into Wiktionary, or are there license restrictions for the "Webster 1913 Suppl." ?

Oh, interesting. There are also other words missing which are labeled, simply, "1913 Webster", such as Pinxit:

 <p><ent>Pinxit</ent><br/
 \'d8<hw>Pinx"it</hw> <pr>(?)</pr>. <ety></ety> 
 <def>A word appended to the artist's name or initials on a painting, or engraved copy of a painting; <as>as, <ex>Rubens pinxit</ex>, Rubens painted (this)</as>.</def><br/
 </p>

Should these be copied in? "Pinxit" seems like a pretty useful word. Are there issues with using the GCIDE definitions?

It's out of copyright due to its age, so you can do what you like with it. Please add them! Equinox ◑ 13:33, 28 March 2015 (UTC)

The worst that could happen is that some of them won't meet our attestation standards. Add 'em and we'll sort that out eventually. DCDuring TALK 13:41, 28 March 2015 (UTC)

Oh, yea. You should register. It makes it easier for us to communicate with you in a friendly way. DCDuring TALK 13:44, 28 March 2015 (UTC)

Thanks, I'm registered now: User:Pnelsonmusic. But obviously a newbie. I've been reviewing the GCIDE for a search engine linguistic processing project and have noticed these differences between it and Wiktionary. Maybe I'll write a program to identify all missing items. TALK 13:52, 28 March 2015 (UTC)
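A program of that kind could start out roughly like this. It is a minimal sketch under stated assumptions: headwords are taken from the `<ent>` tags as they appear in the excerpts above, the set of existing Wiktionary titles is assumed to be available already (for example from a list of page titles), and because GCIDE capitalizes its headwords while Wiktionary titles are case-sensitive, the lowercase form is checked as well. The function names and the third sample entry are made up for illustration.

```python
import re

# Headwords appear as <ent>...</ent> in the GCIDE excerpts quoted above.
ENT_PATTERN = re.compile(r"<ent>(.*?)</ent>")


def gcide_headwords(gcide_text):
    """Return every headword found in <ent> tags, in order of appearance."""
    return ENT_PATTERN.findall(gcide_text)


def missing_headwords(gcide_text, existing_titles):
    """Return headwords with no matching title, checking the lowercase form too."""
    missing = []
    for headword in gcide_headwords(gcide_text):
        candidates = {headword, headword.lower()}
        if candidates.isdisjoint(existing_titles) and headword not in missing:
            missing.append(headword)
    return missing


# First two lines are taken from the excerpts above; the third is a made-up line.
sample = """<p><ent>Pimola</ent><br/
<p><ent>Pinxit</ent><br/
<p><ent>Olive</ent><br/
"""
existing = {"olive"}  # hypothetical set of existing page titles
result = missing_headwords(sample, existing)
print(result)
```

The output is only a list of candidates; each one would still need to be checked by hand before an entry is created.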

We look forward to your contributions. Equinox has done a lot of work on getting entries from Webster 1913. DCDuring TALK 15:43, 28 March 2015 (UTC)

Not all of us really approve of copying definitions from other dictionaries, even when they are out of copyright. Definitions are supposed to be our own work. But I have no objection to obtaining lists of words from ANY dictionary or similar source - I do that myself. SemperBlotto (talk) 09:15, 29 March 2015 (UTC)

But we see farther if we stand on the shoulders of the giants who preceded us. DCDuring TALK 09:43, 29 March 2015 (UTC)

How can I add a "thank" note?

How can I add it to edits in entry histories, next to "undo"? Or is it visible to other users and not to myself? I'm more used to the "undo" function being used, or just tacit approval of edits. Donnanz (talk) 17:18, 28 March 2015 (UTC)

The person receiving thanks gets a notification. Others have to read the log. — Ungoliant ^(falai) 17:37, 28 March 2015 (UTC)

I understand that, I've done that myself and have also received thanks. But I'm afraid that doesn't really answer the question. It doesn't show in the entry history for edits I do. Donnanz (talk) 18:55, 28 March 2015 (UTC)

Maybe your page histories look different from mine, but when I look at a page history, the "thank" button is there for all diffs except my own and those made by anons. —Aɴɢʀ (talk) 10:16, 29 March 2015 (UTC)

Ah, I was beginning to suspect / think that. Thanks, I guess that solves that. Donnanz (talk) 10:20, 29 March 2015 (UTC)

Death and taxes

No entry for death and taxes? Really? ;) ~ hey zeuss 05:07, 31 March 2015 (UTC)

Damn! UD has it and we don't! We are doomed. DCDuring TALK 12:49, 31 March 2015 (UTC)

What is it supposed to mean; how would you define it? It's just part of a popular phrase about things that are inevitable, but it still only means, in that phrase, DEATH + AND + TAXES. Equinox ◑ 15:30, 31 March 2015 (UTC)

Bad Romanian translations

I know I keep sounding like a broken record - this is not the first time I've brought this up - but I've been monitoring Romanian translations and entries, and BaicanXXX is adding incorrect entries again. A great number of contributions are direct translations and don't reflect Romanian equivalents. For instance, answerphone is "robot telefonic", not telefon cu răspunzător de apel or telefon cu răspuns automat. The term traveling in basketball is translated pași and not pași greșiți. Four times out of five, Baican's translations are explanations rather than Romanian equivalents. I've also recently found out that he has several operating sock puppets, most of which have been blocked by Romanian Wikipedia administrators because Baican kept using words that don't exist, and users who opposed this user's way of contributing were harassed. I just want to know what to do; I usually correct mistakes when I see them, but it's hard to keep up. If his contributions are deemed to be ok, then I'll back off. I just want to know which policies apply. Thank you in advance, --Robbie SWE (talk) 19:59, 31 March 2015 (UTC)

This is definitely a worthy issue. I've given him a warning on his talkpage about it. The way you can help is to a) revert his incorrect/unidiomatic translations and fix them, b) tag his entries for WT:RFV if they're not actually used in Romanian or WT:RFD if they're just a sum of parts, and c) mention on his talkpage if he continues to make these errors so that any admin who may wish to block him in the future has a record of any offences that may occur after he was warned. —Μετάknowledge^{discuss/deeds} 06:39, 2 April 2015 (UTC)

Oh, and I forgot to mention: if he is operating sockpuppets here, please report them! Abusing multiple accounts is not okay (e.g. editing with one account if another has been blocked). —Μετάknowledge^{discuss/deeds} 06:53, 2 April 2015 (UTC)

Ok, I'll make changes where needed from now on. I blocked Baican's sock puppets (the ones I could find were Bon.line, LaPietre, Trepier and WernescU) back home in the Romanian Wiktionary after making sure that the administrators of the Romanian Wikipedia confirmed that these accounts truly belonged to the same user. Just a heads-up about Baican's modus operandi: he never responds in the language of the Wiktionary project he is active in, so don't expect him to answer in English. --Robbie SWE (talk) 18:26, 2 April 2015 (UTC)

Unless he's currently active, there's not a lot to discuss; they all need checking individually. It's not a policy issue. Many of them are so bad even I can confidently remove them. Many of them are sentences: for marathon, say, he'd add the Romanian for "race of 26.2 miles". That's a fictional example, but many of them are as bad as that if not worse. Renard Migrant (talk) 23:11, 2 April 2015 (UTC)