Wiktionary:Beer parlour/2014/October

What makes a single word idiomatic?

I think it would be nice if we took WT:CFI a bit more seriously. I mean, de facto there's no problem because nobody's forcing us to apply our own rules; there's no 'court of appeal' if there's a deletion decision that goes against WT:CFI. Anyway.

Under General rule:

"A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic."

Under Idiomaticity:

"An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components."

So, all terms have to be idiomatic (well, it's a 'somewhat more formal guideline'; viewed from that perspective, it sounds as if attested and idiomatic aren't in the rules, just in the guidelines!), but WT:CFI only gives guidance on what idiomatic means for an expression. Given that all terms have to be idiomatic, what's the test for, say, hat, or reenter?

I know it's hard work, but I just think it would be nice if we could take ourselves a bit more seriously. Renard Migrant (talk) 11:14, 4 October 2014 (UTC)

For hat it's obvious because its meaning cannot be easily derived from its phonemes /h/, /æ/, /t/, since phonemes do not have any meaning to convey. For reenter it's less obvious because its meaning can be derived from the meaning of re- and the meaning of enter, but we seem to have an (unwritten?) agreement here that everything written together without a space is eligible for an entry. That convention breaks down, however, for languages that are not usually written with spaces; and it has been controversial for polysynthetic languages that may write whole phrases like "he had had in his possession a bunchberry plant" as one word without spaces. For English, the only real ambiguity is in expressions that are written with spaces, because there is no unambiguous criterion to distinguish idiomatic ones from unidiomatic ones. Probably everyone agrees that hot dog is idiomatic and hot lightbulb isn't, but between those two extremes there's a continuum, not a clearly defined split. —Aɴɢʀ (talk) 12:53, 4 October 2014 (UTC)

Make Categories Show Where They're Defined

I would like to propose that the category templates be modified to show the name of the data (sub) module where the information for the category resides. This would make it easier to make changes, and also make it easier to figure out where a new category analogous to existing ones could be added.

Adding documentation pages to modules is helpful, but it still takes a bit of wandering the maze of modules and sub-modules and data sub-sub-modules to figure out where category information resides. This shouldn't be too hard, since the modules have to have this information at some point- it's just a matter of developing protocols for passing it back to the templates.

It might also be nice to give instructions on where to go to get changes made, but that may not even be settled yet. This is all part of a larger problem with our newer Lua-based architecture, which is that things are centralized in data modules and impossible for non-admins to access, but I'll leave that for a separate topic. Chuck Entz (talk) 16:43, 4 October 2014 (UTC)

All the categories have an "edit" button already, and it's been there for a few years maybe. You never noticed? —CodeCat 16:47, 4 October 2014 (UTC)

Why are you so surprised? It's not what one would expect from how many other things work. Human attention works that way. Given that the question of category documentation and editing has been asked before without answer, Chuck probably assumed that it must be a policy matter. DCDuring TALK 18:47, 4 October 2014 (UTC)

And, once the edit button is clicked on, then what? DCDuring TALK 18:51, 4 October 2014 (UTC)

~~I wrote three paragraphs. You never read them?~~

If you click on Edit for Category:English colloquialisms, you get:

{{poscatboiler|en|colloquialisms}}. poscatboiler contains:
{{#invoke:category tree|show|template=poscatboiler|code={{{1|}}}|label={{{2|}}}|sc={{{sc|}}}}}, so we go to that module.
Module:category tree refers us to:
Special:PrefixIndex/Module:category tree. The logical next step is:
Module:category tree/poscatboiler. This refers us to:
Module:category tree/poscatboiler/data, which refers us to:
Special:PrefixIndex/Module:category tree/poscatboiler/data, which contains dozens of submodules. Fortunately, I've been working with categories long enough to spot:
Module:category tree/poscatboiler/data/terms by usage as the most likely choice.

And there it indeed is. What I'm proposing is a line at Category:English colloquialisms that refers you to Module:category tree/poscatboiler/data/terms by usage without your having to go through all the steps above. I've worked a lot with categories, and I know something about templates and modules, and there are times when I have to look at several data sub-sub-modules before I can find where the configuration is for a given category. Sure, it's simple! Chuck Entz (talk) 18:58, 4 October 2014 (UTC)
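The proposal above can be sketched as a simple reverse lookup from a category label to the data submodule that defines it. This is only an illustration in Python, not the actual Lua code of Module:category tree: the index `DATA_SUBMODULES` and the function names are hypothetical, and the only mapping taken from this discussion is "colloquialisms" being defined in "terms by usage".

```python
# Hypothetical index of data submodules and the labels each one defines.
# Only the "colloquialisms" -> "terms by usage" pairing comes from the
# discussion; a real index would be built from the data modules themselves.
DATA_SUBMODULES = {
    "terms by usage": {"colloquialisms"},
}

BASE = "Module:category tree/poscatboiler/data"


def defining_module(label):
    """Return the full name of the data submodule defining `label`, or None."""
    for submodule, labels in DATA_SUBMODULES.items():
        if label in labels:
            return f"{BASE}/{submodule}"
    return None


def definition_note(label):
    """Build the line a category page could display; empty if unknown."""
    module = defining_module(label)
    if module is None:
        return ""
    return f"This category is defined at {module}"


print(definition_note("colloquialisms"))
```

With such a lookup available to the category template, the reader would see the defining module directly instead of walking the chain of templates and prefix indexes by hand.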

CodeCat was referring to the small edit button next to the text. You were referring to the edit tab at the top, which is the first place one would look to edit something other than a section. Someone introduced a non-standard positioning of the edit option and expected it to "of course" be noticed by anyone with half a brain. But that is simply not true: habits that are reinforced by thousands of successful repetitions are not easily overcome and cause attentional blindness to such things as small edit buttons in unexpected places. DCDuring TALK 19:11, 4 October 2014 (UTC)

Ah, that explains it! No, I never noticed it. I was wondering how she could have so completely missed my point. That feature does, indeed, make my proposal rather redundant- but it might still be useful for those who are trying to figure out how the categories work, but aren't going to be editing data modules. Perhaps a combination would be a good idea, such as "This category is defined at Module:category tree/poscatboiler/data/terms by usage" with the edit link at the end. Chuck Entz (talk)

That would be a bit too long to fit where the edit button currently is. Do you know where else it could be placed? —CodeCat 19:52, 4 October 2014 (UTC)

How is what happens after one clicks the edit link self-explanatory? Some kind of help (colored green?) to click on next to the edit button would both make the edit button more visible and afford an opportunity to explain further. DCDuring TALK 20:42, 4 October 2014 (UTC)

It's new to me too. Here are some ideas for making it more visible:

Add a hidden category to the category pages, e.g. Category:Categories defined by Module:category tree/terms by usage. (And that category can then explain more in its description, and link more obviously to the module). Editors are more likely to have hidden categories showing, so may notice.
Change the text to something more descriptive, such as "", and/or perhaps an even more wordy hover text, e.g. "Edit the module which defines this category's description, category parent, and category text."
Add an item to the left nav under "Tools". (Though that would probably be even less noticed)

Also, pages like Module:category_tree/poscatboiler/data/terms_by_usage could really use some docs to say what is and isn't safe to edit, how to propose or add new categories, and how to test that your edits aren't going to break everything (especially as the edit buttons encourage users to edit it). Even if you know Lua and something about Wiktionary, you still don't know what can be edited safely on that page.

Perhaps a whole other conversation, but the docs on each category page really should say (or link to) how a regular editor can add a page to that category, e.g. which template or group of templates is used in the article space to add the category tag and whether it needs additional parameters to cause it to be added, etc. It is perhaps a thankless task to document properly, though. Pengo (talk) 11:52, 5 October 2014 (UTC)

How about in an editnotice? --Yair rand (talk) 14:13, 5 October 2014 (UTC)

Time-waster

Considering that so much time has been wasted on rfv's/rfd's due to misspellings (especially in hyphenation) resulting from scannos, should we expand our criteria for inclusion page with notifications/warnings or something? Just a suggestion. Zeggazo (talk) 20:15, 4 October 2014 (UTC)

Category:Arabic definitive nouns???

First of all, shouldn't these be "definite nouns", not "definitive nouns"? Second of all, four of the five entries in this category are simply the definite equivalents of Arabic lemma nouns (which are always in the indefinite). The definition itself specifies this. The definite equivalents are formed simply by prefixing "al-" (or rather, the Arabic equivalent) to the noun. I thought there was a policy not to include such forms unless they have an idiomatic definition? I'm going to add {{delete}} tags soon but I want to make sure others don't disagree.

BTW the fifth of five entries is the word العَرَبِيَّة (al-ʕarabiyya), which has a special meaning ("the Arabic language"), separate from the word عَرَبِيَّة ("carriage" or "female Arab"), so it should be kept. Benwing (talk) 08:38, 5 October 2014 (UTC)

Perhaps nouns and proper nouns where ال (al-) is always used should still be categorised as "definite nouns"? It's useful for readers to know that a term is formed by al- + the stem. Not sure if ALL such terms should be redirected to terms without the definite article. --Anatoli T. ^{(обсудить}/^вклад) 23:36, 5 October 2014 (UTC)

OK, so no one answered my question. For the ones that are simply the definite equivalents of existing lemmas, with no special meaning, should I delete them, or keep them and use something like {{definite of}}? I think we should delete, since otherwise we're setting a precedent for creating definite equivalents of every single noun out there, which is crazy, since they're all formed trivially in exactly the same fashion by just adding "al-" (actually ال (al-), in the Arabic script) onto the beginning of the noun. It would be comparable to creating entries for the car and the boat and the kumquat, etc. Any objections to me deleting them? Benwing (talk) 10:31, 8 October 2014 (UTC)

Normally terms should be RFD'ed, but since they are definitely just "definite article + noun" entries, yes, delete all, except العربية and اللغة العربية. If you don't have the rights to delete, I'll delete them for you. العربية and اللغة العربية should probably be RFD-ed or RFV-ed, not sure. --Anatoli T. (обсудить/вклад) 22:29, 8 October 2014 (UTC)

I would also keep الأمين as one of the names of Muhammad and also given name after that. --Wiki Tiki 89 22:49, 8 October 2014 (UTC)

Also, {{ar-proper noun}} should automatically add to Category:Arabic definite nouns. --Wiki Tiki 89 22:52, 8 October 2014 (UTC)

Yes, keep الأمين. Agree about proper nouns as well. --Anatoli T. ^{(обсудить}/^вклад) 22:59, 8 October 2014 (UTC)

Definite forms in Arabic are not written with a separating space, as far as I know, so they closely parallel the definite forms of the Scandinavian languages. Since we have separate entries for those (dag, dagen, dagar, dagarna), we should probably also have separate entries for the definite forms of Arabic nouns. —CodeCat 23:20, 8 October 2014 (UTC)

Arabic grammar doesn't consider definite articles part of the word. Exceptions are proper nouns. Also, monosyllabic prepositions consisting of one consonant and a short (unwritten) vowel are spelled together with the following word, but they are separate words, unless they are adverbs (debatable), e.g. بِسُرْعَة (bisurʕa) "quickly" (lit.: "with speed") = preposition بِ (bi-) + سُرْعَة (surʕa) "speed". The same goes for enclitic pronouns: بَيْتِي (baytī) "my house" = بَيْت (bayt) + ي (-ī) "my". Scandinavian, Bulgarian/Macedonian, Albanian definite forms are also debatable but they should be considered separately. Korean particles and copulas are also written without a space but they are considered separate words: 도서관에 (doseogwane) "to the library" = 도서관 + 에. --Anatoli T. (обсудить/вклад) 23:43, 8 October 2014 (UTC)

Arabic, Hebrew, and Aramaic have a lot of clitics and we have a consensus generally not to include words with clitics. The definite article is arguably one of these clitics, although in Aramaic the definite form is actually the lemma form. However, we do seem to have a status quo of generally not including the definite forms for Arabic and Hebrew. --Wiki Tiki 89 02:52, 9 October 2014 (UTC)

The Latin word com has no entry.

The Latin word com, a component of commodus, does not have an entry. GHibbs (talk) 08:06, 6 October 2014 (UTC)

Is it ever a free-standing word? As a prefix we have com- (and con-, col-, cor-, and co-). DCDuring TALK 10:39, 6 October 2014 (UTC)

The free-standing word corresponding to com- is cum. —Aɴɢʀ (talk) 16:37, 6 October 2014 (UTC)

Transliterations for headword-line inflections

Previous discussion: Wiktionary:Beer parlour/2013/October#Transliterations for inflected forms in headwords?

This was discussed before a while ago, but didn't reach much of a conclusion. The question is how to deal with transliterations of inflected forms that are displayed in headwords. Module:headword, and by extension many of our current headword-line templates, do not support this at all. But for Arabic we've always displayed transliterations for inflected forms, and the templates therefore had to be custom-made to handle this.

I imagine it's best to have a single common behaviour for all languages. So the question is, should we include them for all languages, for none, or for some subset? And if only for some subset, then based on what criteria? —CodeCat 16:08, 7 October 2014 (UTC)

My 2p is on all. As the EN WT, our user base can be assumed to read English. If an entry is in a non-Latin script, we cannot assume that our users can read the headword, and as such, for the sake of usability (among other factors), we should provide transcriptions. ‑‑ Eiríkr Útlendi │ Tala við mig 17:26, 7 October 2014 (UTC)

I thought that our "ground rules" said that all non-Roman texts should (eventually) be transliterated - and that this could be by means of "pop-up" text if necessary or wanted. — Saltmarsh^{απάντηση} 17:44, 7 October 2014 (UTC)

Transliterate all. --Vahag (talk) 18:42, 7 October 2014 (UTC)

Don't transliterate Russian inflected forms or those of some other languages having irregular pronunciations. It may also look quite messy if there are a lot of forms in the header. Arabic editors want to transliterate all, so be it. I don't object to Arabic transliterations. --Anatoli T. (обсудить/вклад) 22:36, 7 October 2014 (UTC)

I'm not sure I understand your reasoning. If I understand correctly that by "irregular pronunciation" you mean "pronunciation not fully predictable from spelling", then it seems to me that those cases are exactly the ones where a transliteration would be useful. Then again, we've already established that many editors here disagree with the practice of using pronunciation as a guide to transliteration in phonemic scripts such as Cyrillic. —CodeCat 22:49, 7 October 2014 (UTC)

I agree with Atitarev that we should transliterate inflected forms only for languages for which the transliteration is essential to understand the structure of the inflected form. For languages such as Arabic, for which transliterations could be considered superfluous when the words are fully vowelated, there is another consideration: It may be difficult for some readers to see the vowel diacritics, making the transliterations essential to these readers. For languages like Persian, for which we do not indicate vowels at all in the native script, transliterations are absolutely essential. --Wiki Tiki 89 22:47, 7 October 2014 (UTC)

What about users who want to know what is written, but are not learned in reading it? Arabic looks like nonsensical squiggles to me, and without transliterations the forms might as well not be there at all. For Cyrillic or Greek the consideration is no different, except that I just happen to be able to read those scripts. But there will of course be many users that can't. —CodeCat 22:51, 7 October 2014 (UTC)

Someone who cannot read a language is unlikely to need to know how a word inflects. --Wiki Tiki 89 00:04, 8 October 2014 (UTC)

@Wikitiki89 I guess I'm unlikely then? —CodeCat 00:33, 8 October 2014 (UTC)

Yes, you are one of the few. Keep in mind that our inflection tables usually do have transliterations. But if you are interested enough in Arabic, I suggest you learn the alphabet. Otherwise you would be comparable to someone wanting to learn chemistry without learning the chemical element symbols or someone wanting to learn calculus without learning mathematical notation. --Wiki Tiki 89 11:30, 8 October 2014 (UTC)

Does adding a romanization to inflected forms harm the project in any way? It seems to me instead that it would add value. Perhaps I happened across the term რეჰანმა (rehanma) and simply wanted to know roughly how to read it, without any knowledge of the Mkhedruli script. Thankfully, this entry for an inflected form already includes a romanized spelling. Would you advocate for removing romanizations from inflected forms? If so, why? ‑‑ Eiríkr Útlendi │ Tala við mig 05:29, 8 October 2014 (UTC)

@CodeCat, "many editors" doesn't mean there's a consensus. In case you haven't noticed, there are a lot of languages with irregular pronunciations and transliterations (exceptions). There's no practice in published dictionaries of transliterating Russian or Greek, hence an in-house (Wiktionary) transliteration method is used. "narodnovo" and "narodnogo" are equally attestable transliterations of наро́дного (naródnovo), the genitive form of наро́дный (naródnyj). Japanese and Korean exceptions are partially handled by smart modules (some Korean exceptions still need to be transliterated manually, such as 십육), but Russian ones are not. こんにちは is "konnichi wa", not "konnichi ha". Do I need to bring up that argument again? Hindi, Thai, Lao, Greek also have irregularities, which are reflected in standard or Wiktionary transliterations. Automatic transliteration would cause, e.g., ру́сского (gen. of русский) to appear as "rússkogo", when it should be "rússkovo". --Anatoli T. (обсудить/вклад) 23:03, 7 October 2014 (UTC)
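The mismatch described above can be reproduced with a purely letter-by-letter transliterator. The following Python sketch is only an illustration: the `LETTERS` table is a deliberately partial, hypothetical mapping covering just the letters needed for the two examples, and it is not the transliteration module actually used on Wiktionary.

```python
# Partial, illustrative Cyrillic-to-Latin table (an assumption for this
# sketch; a real table would cover the whole alphabet).
LETTERS = {
    "а": "a", "в": "v", "г": "g", "д": "d", "е": "e", "и": "i",
    "й": "j", "к": "k", "н": "n", "о": "o", "р": "r", "с": "s",
    "у": "u", "ы": "y",
}


def translit_graphemic(word):
    """Map each letter independently; unknown characters pass through."""
    return "".join(LETTERS.get(ch, ch) for ch in word.lower())


# Stress marks are omitted here to keep the comparison simple.
print(translit_graphemic("русского"))
print(translit_graphemic("народного"))
```

Because every г becomes "g" regardless of context, the genitive endings come out as "-ogo" rather than the "-ovo" spelling argued for above. Any automatic scheme of this kind needs either extra rules or manual corrections for such words.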

Cyrillic, Greek, Armenian, Georgian vs Hangeul, Arabic, Hebrew, Thai, Devanagari, etc. The former are considered "easy" by dictionary publishers, although Devanagari is very phonetic. Since dictionaries usually don't use transliterations for the former, we have this argument that those should reflect the spelling, letter-by-letter whereas the difficult ones use phonetic transliterations or transcriptions, mixture of literal and phonetic. You can learn about transliterations for complex scripts and see that they are full of exceptions, most are documented ("standard" or "scientific"). --Anatoli T. ^{(обсудить}/^вклад) 23:13, 7 October 2014 (UTC)

Reading the above, I think it would be useful for us to be clear about transcription -- changing one script for another, such as “ру́сского” → “rússkogo” -- versus transliteration -- which would include phonetic considerations, such as “ру́сского” → “rússkovo”.

Anatoli, do you (or any others) have any objection to transliteration? ‑‑ Eiríkr Útlendi │ Tala við mig 23:29, 7 October 2014 (UTC)

@Eirikr You seem to have gotten transcription and transliteration backwards. Transcriptions are phonetic while transliterations are (supposed to be) graphemic. --Wiki Tiki 89 00:04, 8 October 2014 (UTC)

Fair enough, I may have gotten it backwards. But the point stands -- are we worried about orthographic fidelity, or phonetic? Or do we even want both? ‑‑ Eiríkr Útlendi │ Tala við mig 05:29, 8 October 2014 (UTC)

@Eirikr, have you read all of my posts above? Would you agree to transliterate こんにちは as "konnichi ha" and 십육 as "sibyuk"? Modern standard transliterations go far beyond representing words letter by letter. They use a lot of phonetic considerations; call them transcriptions if you wish, but they are not. "rússkovo" is not 100% phonetic; it only shows the irregular pronunciation of "г" (the fully phonetic respelling would be "ру́скава"). --Anatoli T. (обсудить/вклад) 23:37, 7 October 2014 (UTC)

BTW, fully automated Arabic transliteration will affect irregular Arabic words, such as إنْجِلِيزِيٌّ (ʔinjilīziyyun), which is pronounced the "Egyptian" way - "ʾingilīziyyun" and other loanwords and dialectal pronunciations. It's probably fine, just need to be aware of this. --Anatoli T. ^{(обсудить}/^вклад) 23:46, 7 October 2014 (UTC)

Just to make sure, you realise that if we do have transliterations for inflections on headword lines, there will also be parameters on {{head}} to override any default ones? —CodeCat 23:49, 7 October 2014 (UTC)
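The override behaviour mentioned here can be sketched as "use the manual value if one is given, otherwise fall back to the automatic one". This Python illustration makes no claim about how {{head}} or Module:headword implement it: `romanize_forms`, the `overrides` argument, and the stand-in automatic function are all hypothetical.

```python
def stand_in_automatic(form):
    """Stand-in for an automatic transliterator (hypothetical lookup table)."""
    table = {"русского": "russkogo", "русские": "russkije"}
    return table.get(form, form)


def romanize_forms(forms, automatic, overrides=None):
    """Pair each inflected form with its romanization.

    A manual override for a form wins; otherwise `automatic` is called.
    """
    overrides = overrides or {}
    return [(form, overrides.get(form) or automatic(form)) for form in forms]


pairs = romanize_forms(
    ["русского", "русские"],
    stand_in_automatic,
    overrides={"русского": "russkovo"},
)
for form, romanization in pairs:
    print(form, romanization)
```

The cost raised in the replies is visible in the sketch: every irregular form needs its own entry in `overrides`, which has to be supplied by hand.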

I suspected there would and should be but the task is too big. All adjective-like nouns will be affected first (-ого, -его/-ёго genitive endings), all words where (Cyrillic) "е" is pronounced as "э" (the largest group of exceptions). --Anatoli T. ^{(обсудить}/^вклад) 23:55, 7 October 2014 (UTC)

@Atitarev, Wikitiki89 I'm left unsure -- do you two oppose the addition of romanizations on inflected forms, or do you instead oppose an automated approach that might introduce errors? ‑‑ Eiríkr Útlendi │ Tala við mig 05:29, 8 October 2014 (UTC)

I oppose the addition of romanizations on inflected forms for two reasons (for Russian) - 1. The irregular words will need to be transliterated manually or might introduce errors. 2. The headwords get cluttered. (genitive sg., nom. plural, feminine form - are the possible inflected forms for Russian). It doesn't have to be for all languages like that. --Anatoli T. ^{(обсудить}/^вклад) 05:34, 8 October 2014 (UTC)

Your mention of "clutter" led me to look into Russian entry format. Here's a sample headword line from the entry for русский:

ру́сский • (rússkij) m anim, m inan (genitive русского, nominative plural ру́сские, feminine ру́сская)

This looks like a bit of a mess to me; all of the additional headword information for inflected forms is already given, as expected, in an Inflected forms table contained within the entry.

Redundancy aside, I think русский (russkij) is already fine -- there's a romanization of the headword, and the Inflected forms table provides romanizations of all other forms.

My current understanding of general policy, and this proposal, is that we want to make sure that all entries in non-Latin scripts include romanizations. So I'm really not worried so much about the lack of romanization for the link to русская (russkaja) in the headword line for the русский (russkij) entry. (For that matter, I think the headword line should be simplified to remove the redundant and visually cluttered inflected forms, but that might just be me.) I'm more concerned about whether there is any romanization given in the actual entries for inflected forms. Gladly, русская (russkaja) does provide a romanization.

Would you be amenable to ensuring that all entries have romanizations? ‑‑ Eiríkr Útlendi │ Tala við mig 07:11, 8 October 2014 (UTC)

I'm going to add my 2 cents to transliterating all inflections in all languages, but I think it's most important for languages like Persian and Arabic where vowels may not be written, and is important for Arabic even when vowels are written because of the difficulty that the average user will have in reading the script. So far it looks like Anatoli is opposed to transliterating inflections for Russian but not Arabic, Wikitiki might be similar, and everyone else is OK with transliterating inflections in all languages. Is this right?

I do think it's possible to make an argument that there's something qualitatively different and more "foreign" about Arabic or Devanagari or Thai vs. Greek or Cyrillic. Certainly this is the case for me. However, keep in mind, Anatoli, that you're a native Russian speaker whereas the majority of users of the English Wiktionary will not be, and might well be trying to learn a foreign language and so care about the inflections, but not be very comfortable with the script.

BTW as for the clutter issue, the same "issue" should theoretically appear in Arabic, but IMO the previous way of doing things (before CodeCat changed it), which did display transliteration of all Arabic inflections, didn't look especially cluttered. The trick here I think is to put the inflections outside of the parens, so that you don't end up with nested parens when you display the transliterations. Benwing (talk) 08:20, 8 October 2014 (UTC)

I agree that we put too much information on the inflection lines of Russian nouns. There is absolutely no need for the genitive or plural in the headword line, unless the form is irregular. The feminine form is useful, however. If the argument is about showing the stress pattern, then the genitive is needed only for nouns ending in a consonant (or ь). But I still don't see why the declension table isn't enough for this. --Wiki Tiki 89 11:30, 8 October 2014 (UTC)

Just to clarify my position on Russian headwords. I don't oppose the information (it's helpful, can help quickly identify stress patterns and declension types and plural forms) but I don't think it's a good idea to transliterate inflected forms. --Anatoli T. ^{(обсудить}/^вклад) 00:33, 9 October 2014 (UTC)

The genitive only helps identify the stress pattern for nouns that end in a consonant, and only the singular stress pattern at that. It is completely useless for nouns that end in consonants, as the singular stress pattern is apparent from the nominative, except for nouns ending in -а, which may need the accusative (but certainly not the genitive). The nominative plural is insufficient to identify the plural stress pattern. You additionally need one other plural form other than the plural genitive and also the plural genitive in some cases. At that point, there is too much information in the headword line and we already have declension tables with all of this information. --Wiki Tiki 89 03:02, 9 October 2014 (UTC)

I disagree (please review your post, you have two contradicting statements - the first two sentences, so I don't know what you mean there). There are 6 stress patterns: Appendix:Russian stress patterns - nouns + some nouns that are irregular.

Consonantal endings:

до́ктор - до́ктора - доктора́
ди́ктор - ди́ктора - ди́кторы

Ь or "hissing" sounds:

ле́карь - ле́каря - ле́кари (stress pattern 3 is also acceptable)
сле́сарь - сле́саря - сле́сари/слесаря́ (то́карь is the same)
глуха́рь - глухаря́ - глухари́
врач - врача́ - врачи́
това́рищ - това́рища - това́рищи

Do I need examples for vowel endings? For people mastering the basics of Russian, including native speakers, this info is usually sufficient without looking at the full declension table. --Anatoli T. ^{(обсудить}/^вклад) 03:30, 9 October 2014 (UTC)

Maybe you misunderstood my post. For nouns that end in consonants (including ь), I agree that the genitive singular helps determine the stress pattern for the singular. For nouns that end in vowels, the genitive singular is of no help at all, since the stress is always in the same place as in the nominative singular. Furthermore, for nouns that end in -а, the accusative might have a different stress from the nominative, yet for some reason we do not include it. For the plural, the nominative plural is insufficient to determine the full plural stress pattern. More information is needed as I explained above, and that would completely overwhelm the headword line and defeat the purpose of having inflection tables. --Wiki Tiki 89 03:43, 9 October 2014 (UTC)

-а nouns are only one portion of nouns, large but not huge. You still need to know that plural and gen. sg for ка́ша is ка́ши, not ка́шы (beginner level) and томоды́ is a form of томода́. Animacy helps determine the accusative. Well, yes, it's not comprehensive but sufficient in MOST cases. Apart from stress patterns, there are other things - колесо́ -колеса́ - колёса, огонёк - огонька́ - огоньки́, и́мя - и́мени - имена́. Knowing that "-а" nouns (NOT ALL VOWELS, just "а"!) are predictable is a blessing but there are too many other declension and stress patterns. I want to reiterate that gen. sg. and pl. nom. forms are sufficient to determine THE FULL STRESS PATTERN (usually). --Anatoli T. ^{(обсудить}/^вклад) 04:38, 9 October 2014 (UTC)

Someone who does not know the rules for ы vs и will probably need the full declension table anyway to figure anything out. Can you give me an example of a noun that ends in a vowel (not including ь or й) whose stress pattern for the singular cannot be determined from the nominative? (I don't believe there are such nouns, but if you can prove me wrong, go ahead.) Note that I am all for including the genitive for nouns ending in consonants. As for the plural, the "usually" part is exactly my point. If there are exceptions, then you can't say that the full stress pattern can be "determined", but only "guessed". I noticed other Russian dictionaries tend to include the genitive and/or the dative for the plural in cases where there could be confusion. But the more we include, the more we get back to the question of why isn't the declension table enough? --Wiki Tiki 89 05:26, 9 October 2014 (UTC)

Haven't I already, with колесо, имя, голова, борода (unlike simple ones like женщина with stress pattern 1)? What about о́блако - о́блака - облака́? --Anatoli T. (обсудить/вклад) 05:37, 9 October 2014 (UTC)

Perhaps you should re-read which forms I am referring to: nominative singular (колесо́, голова́, борода́) and genitive singular (колеса́, головы́, бороды́). Although you did remind me that the n-stems such as имя are possible exceptions; we should definitely include the genitives for them. --Wiki Tiki 89 05:49, 9 October 2014 (UTC)

And here's a good one for you: with "-а": голова́ - головы́ - го́ловы, борода́ - бороды́ - бо́роды. So it's not absolutely useless, even for this type of nouns. :) --Anatoli T. ^{(обсудить}/^вклад) 05:01, 9 October 2014 (UTC)

Umm... Yes it is useless. Unless you're blind, you can see that the genitive singulars you just gave have the same stress as their corresponding nominative singulars. --Wiki Tiki 89 05:26, 9 October 2014 (UTC)

Hmm, what?! Have you read it carefully? голова is not like most nouns ending in "-а" and stress patterns can be determined not just by genitive sg but gen. sg + nom. pl in combination! See the table again. It's pattern 6, not 1, example given: полоса́ (same pattern as голова and борода). --Anatoli T. ^{(обсудить}/^вклад) 05:37, 9 October 2014 (UTC)

Perhaps you should re-read which forms I am referring to. My point is that in these cases, if you have the nominative singular and the nominative plural, then the genitive singular adds no new information (since the singular pattern is determined from the nominative singular and the plural pattern has nothing to do with the genitive singular). --Wiki Tiki 89 05:49, 9 October 2014 (UTC)

Displaying genitive sg just shows that it's "as expected", treating vowel and consonant endings differently doesn't make much sense. --Anatoli T. ^{(обсудить}/^вклад) 05:58, 9 October 2014 (UTC)

Then instead of treating the vowel and consonants differently, let's use this simple rule: if the stress in the genitive is in a different place from the nominative (or if the stem itself is different, such as for день/дня or имя/имени) then we include the genitive, otherwise it is "as expected" and we exclude it to avoid clutter. If the user is still unsure, then they can check the declension table. --Wiki Tiki 89 06:07, 9 October 2014 (UTC)
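The rule proposed above can be written down as a small check: show the genitive only when its stressed syllable differs from that of the nominative, or when the stem itself changes. The Python sketch below is an assumption-laden illustration, not existing module code. Stress is read from a combining acute accent (U+0301), ё is treated as stressed, an unmarked one-vowel word is stressed on that vowel, and the stem change is not detected automatically but passed in as a hypothetical `stem_differs` flag.

```python
ACUTE = "\u0301"  # combining acute accent used as the stress mark
VOWELS = set("аеёиоуыэюя")


def stressed_syllable(word):
    """Return the 0-based index of the stressed vowel, or None if unknown."""
    word = word.lower()
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    # A vowel immediately followed by the acute mark carries the stress.
    for n, i in enumerate(vowel_positions):
        if word[i + 1 : i + 2] == ACUTE:
            return n
    # Otherwise ё is taken as stressed.
    for n, i in enumerate(vowel_positions):
        if word[i] == "ё":
            return n
    # An unmarked word with a single vowel is stressed on that vowel.
    if len(vowel_positions) == 1:
        return 0
    return None


def show_genitive(nominative, genitive, stem_differs=False):
    """Apply the proposed rule; `stem_differs` must be supplied by the editor."""
    if stem_differs:
        return True
    return stressed_syllable(nominative) != stressed_syllable(genitive)


print(show_genitive("до\u0301ктор", "до\u0301ктора"))
print(show_genitive("врач", "врача\u0301"))
```

Under this rule the genitive would be hidden for до́ктор and голова́ and shown for врач and глуха́рь, while имя and день would need the flag set. Words with no stress mark on either form compare as equal, so in practice they would need marked input.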

The modules are complicated as is. I don't see the need to change the Russian noun headword. The Russian headword style was discussed and agreed on a while ago. Even if genitive is hardly the crucial case, it's an example of a case and shows how nouns may change. --Anatoli T. (обсудить/вклад) 01:32, 10 October 2014 (UTC)

Who exactly "agreed" on this, just you and CodeCat? I don't think there is anything wrong with using the genitive as opposed to another case, I just don't think we need to include it for every word. --Wiki Tiki 89 11:21, 10 October 2014 (UTC)

Right, I too favour not including inflected forms in Russian headword lines, but practices for Russian are usually determined by a minority here. Refer to the transliteration debate. --Vahag (talk) 12:46, 10 October 2014 (UTC)

If transliteration for the headwords is chosen, I'd favour removing inflected forms from the Russian headword altogether. That way, there won't be any additional reasons for arguments or newly introduced discrepancies with the existing transliteration practice. @Wiki, having the genitive in some terms and not in others will be confusing. Also, if you don't like something, don't do it. You're under no obligation to edit in Russian, and genitive sg. and plural forms are optional. I've manually added genitives and plurals to many entries; CodeCat did it with a bot and made the headword changes, no conspiracy here. @Vahagn, you can direct your anger at all other languages where transliteration is not 100% graphemic. Transliterating English into Armenian or Russian graphically wouldn't be very useful, would it? --Anatoli T. (обсудить/вклад) 13:22, 10 October 2014 (UTC)

The question isn't about whether the transliteration is graphic, but whether it represents the written expression of the word rather than the spoken one. For example, a reasonable Cyrillization of English that aims to represent the written language would transliterate colonel as колонел rather than as кёрнел, but bite would still be байт rather than the silly бите. --Wiki Tiki 89 14:10, 10 October 2014 (UTC)

Consensus on transliteration of headword inflections?

Irrespective of the question of how much info to include in Russian headwords, can I propose a consensus around the following?

For Cyrillic (and maybe also Greek), don't include transliterations of inflections in headword lines.
For other non-Latin scripts, do so. This info comes either from an explicitly given transliteration or, failing that, from auto-transliteration when it is available and is able to succeed.

My preference would be to transliterate all inflections, but I can accept this compromise for the purpose of consensus. The logic here might be something like this: Cyrillic and Greek are similar enough to Latin script, and easy enough to learn, that there's a reasonable likelihood that someone interested in the inflections of a foreign word has a decent command of these scripts, whereas other scripts are generally much harder to learn and especially to master fluently to the point where a transliteration isn't helpful. This is certainly my experience: I've learned Arabic script and tried to learn Thai script and Devanagari, and my experience with all of these is that it takes a lot more work to become comfortable reading these fluently than it does with Cyrillic or Greek, both of which I learned easily. Even after a lot of work with Arabic I still sometimes stumble over the letters, and find the transliteration very helpful. An additional consideration for Arabic script is that some of the vowels are typically omitted, making transliteration essential. Even when vowels are present, they're often hard to read properly because of font considerations (the vowels are displayed above or below the letters and frequently get drawn over letter descenders or other diacritics, or sometimes a vowel below the line can be confused with a vowel above the next line below). Benwing (talk) 04:19, 9 October 2014 (UTC)

I have already expressed my opinion. Yes, splitting "easy" and "complex" scripts sounds reasonable to me. I have to ask about Korean inflected forms (verbs and adjectives). @Wyang, what do you think, do we need to transliterate Korean inflected forms in the headword? Vahagn wants Armenian (and probably Georgian) to be fully transliterated. --Anatoli T. (обсудить/вклад) 04:38, 9 October 2014 (UTC)

I think the idea of compulsorily applying headwords to all languages is silly, and a lot of languages would be much better off without it, including the non-inflecting languages and some agglutinative languages. I think the headword is being overused in two aspects: 1) pronunciation; 2) inflection. For Korean, the inflection information in the headword more properly belongs in the conjugation section, and it can be moved to the top of the conjugation table as another table (identifying the key forms) alongside the stems table. The romanisation in the headword is redundant and should be removed. There is then no need for information or parameter duplication as in the cases of 십육 (rv=) and 아름답다 (irreg=y). In the division of "easy" and "complex" scripts, Korean would definitely be classified as an "easy" script, especially according to the Hangul supremacists. It's also called "morning script", as "a wise man can acquaint himself with them before the morning is over; a stupid man can learn them in the space of ten days". Wyang (talk) 22:35, 9 October 2014 (UTC)

This isn't a question of whether to have info in headwords but whether to transliterate them. I personally see Korean as a bunch of random squiggles, so for me it's not that easy. I have also heard that romanization of Korean involves various considerations beyond mere transliteration, i.e. the transcription shows various sorts of assimilations. I think one problem here is that people are thinking in terms of their own expert knowledge rather than the likely audience, which is someone who is a native English speaker and foreign language learner who may not have much experience with a foreign script. Benwing (talk) 00:19, 10 October 2014 (UTC)

I also used to look at Korean and Arabic as a bunch of squiggles, until I started learning these languages. Changes in the Korean transliteration make perfect sense when its phonology is understood. And learning a foreign script without learning a bit of a language using it doesn't make much sense. So, learning a script in a day or in a few days is applicable to people speaking that language. Arabic was somewhat easier for me (with good fonts only) and I still think Arabic script is easier and would be quite easy if vowel points were always written (I'm not suggesting it should). I think some info in the Korean headword is useful but for me the important bits are not those currently appearing there. --Anatoli T. (обсудить/вклад) 01:20, 10 October 2014 (UTC)

OK, consensus appears to be:

No translit for Cyrillic, Greek or Korean scripts.
Yes for others.
@CodeCat, can you implement that? We can always add additional exceptions later if needed. Benwing (talk) 04:11, 10 October 2014 (UTC)
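For illustration only, the rule summarised above could be sketched roughly as follows. This is a minimal Python sketch, not the actual logic of Module:headword (which is written in Lua); the function name, its parameters and the use of ISO 15924 script codes are all assumptions made for the example.

```python
# Hypothetical sketch of the proposed rule. It is NOT the real Module:headword
# code; names and parameters are invented for illustration.
# Script codes are ISO 15924 codes (Cyrl, Grek, Kore, Hang, Latn, Arab, ...).

# Scripts for which inflections in the headword line would get no transliteration.
NO_INFLECTION_TRANSLIT = {"Cyrl", "Grek", "Kore", "Hang"}


def inflection_translit(script_code, manual_translit=None, auto_translit=None):
    """Return the transliteration to show after an inflected form, or None.

    manual_translit: a transliteration given explicitly by the editor, if any.
    auto_translit:   the result of automatic transliteration, or None if it
                     is unavailable or failed.
    """
    # Latin-script forms never need a transliteration.
    if script_code == "Latn":
        return None
    # Excluded scripts: show nothing, even if a transliteration exists.
    if script_code in NO_INFLECTION_TRANSLIT:
        return None
    # Otherwise prefer the explicit value and fall back to the automatic one.
    if manual_translit:
        return manual_translit
    return auto_translit
```

Under these assumptions, `inflection_translit("Arab", auto_translit="kutub")` returns `"kutub"`, while the same call with `"Cyrl"` returns `None`. Further exceptions would only require changing the set of excluded scripts.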

Sorry guys, wrench-thrower here --

What constitutes a "simple script"? Who decides what is "simple"?

Again, I must note that, as the English Wiktionary, the only safe assumption we can make when it comes to scripts is that our user base can read the Latin script. I reiterate my position that we should provide romanizations for all headwords not written in the Latin script.

One argument against including romanizations for certain non-Latin scripts seems to be that the scripts are "simple". Sure, any script (or anything at all, really) can be viewed as simple, once you've already learned it. Many other scripts are also pretty straightforward, with charts providing straightforward phonetic conversions. Are we to no longer provide romanizations for Mkhedruli? Gothic? Amharic?

An undercurrent appears to be that we shouldn't include romanizations because doing so would be difficult. That said, this whole project of creating a multilingual dictionary is itself an enormous amount of work. Is such a relatively small amount of additional work really so much of a hurdle? Romanizations are a very simple way to greatly increase the usability of Wiktionary as a whole.

As with everything here, those who don't want to do the work don't have to. But as far as policy or goals are concerned, I feel very strongly that deciding to not include romanizations for non-Latin-script headwords does us, as a project, a grave disservice. ‑‑ Eiríkr Útlendi │ Tala við mig 04:55, 10 October 2014 (UTC)

@Eirikr, a few points.

This issue concerns only the inflected forms in headwords. The headword itself is always transliterated, as are links.
I agree with you. I would rather see transliterations (transcriptions or romanizations, more correctly) of inflections for all non-Latin scripts.
I don't think it's an issue of how difficult it is but rather that some people seem to think it's "cluttering" the display.
My main concern for the moment is to find some workable compromise so that CodeCat is willing to put back auto-transliteration of Arabic inflections in headwords; I'd do that myself but I don't have permission to edit Module:headword. (Can I request such permission on a page-by-page basis or do I have to become an admin?)

Here's another possible compromise:

For scripts where there's no objection to transliterating inflections in headwords, we go ahead and put the transliteration there after the native-script inflected form, whether it's explicitly given or auto-transliterated. Let's say this will currently apply to all scripts except for Cyrillic and Korean, maybe Greek as well.
For scripts where people think doing this will "clutter" the headword line, include the transliteration in a mouse-over -- I think this is feasible. (It could be said that we should use mouse-over for all scripts, but I'd rather have the transliteration directly visible whenever possible -- it is faster to read that way, and users might not realize that the transliteration is present on mouse-over.) Benwing (talk) 12:15, 10 October 2014 (UTC)

I've added a temporary exception to Module:headword so that Arabic inflections are always transliterated. This will hopefully alleviate your immediate concerns, but I do hope that you'll continue to participate in the wider discussion. —CodeCat 13:00, 10 October 2014 (UTC)

Thanks, and I will stay in the discussion. I wish more people would contribute; it's hard to form a consensus when only a small number of people speak up. Benwing (talk) 13:16, 10 October 2014 (UTC)

I realised I haven't stated my own opinion. I mostly follow Eirikr's reasoning, and think that transliterations should accompany all non-Latin-script terms in some form, wherever they are. Exceptions can be made in cases where terms generally appear paired with Latin-script alternatives, such as in Serbo-Croatian. —CodeCat 13:18, 10 October 2014 (UTC)

I support transliteration of all forms listed in the headword line in all scripts other than Latin, preferably automatically generated, even if this means certain Russian forms will appear to end in -ogo instead of -ovo. Some people might say that's easy for me to say, since the only non-Latin-script language I spend much time on is Burmese, and Burmese doesn't have inflections. Nevertheless, I think it's preferable to transliterate them all rather than to try to decide which scripts are "simple" enough that they don't need it. —Aɴɢʀ (talk) 13:53, 10 October 2014 (UTC)
- It reminds me a bit of a debate we had some time ago, considering whether languages were "well known" and "major" enough to not be linked in translation tables and in {{etyl}}. Eventually we gave up on the debate and just made translations never link, and {{etyl}} always link. —CodeCa t 14:05, 10 October 2014 (UTC)

One thing we seem to be forgetting here: why are the inflections included in the headword line in the first place? They're included for those who know the rules of the language to figure out the inflection without looking through the tables. In other words, they're a shorthand for people who mostly don't need transliterations. For someone who sees the letters as scribbles, an inflected form is most likely just decoration, anyway- whether it's transliterated or not. That means that this isn't a matter of substance (with a few exceptions such as Arabic), but of style. Chuck Entz (talk) 16:58, 10 October 2014 (UTC)

But many languages don't have tables, so we include the forms on the headword line. And even in cases where there are tables, the forms we include on the headword line are sometimes not in those tables. —CodeCat 17:00, 10 October 2014 (UTC)

Certainly for Arabic, this is exactly correct. The inflections list basic and very important things, like feminines and plurals for nouns and adjectives. For nouns and adjectives we don't currently have any inflection tables. There are other languages that are similar. I took a look at other non-Latin-script languages with inflections, and I can find only Russian and Georgian for nouns, and they also list basic things like the plural (and in the case of Russian, the genitive singular). I can easily imagine a situation where a learner has some concept of grammar -- doesn't take much to want to know how to form the plural -- but a shaky grasp on the native script. Benwing (talk) 23:29, 10 October 2014 (UTC)

For Russian, the genitive and plural forms are also in the tables. But for adjectives, there are the comparative forms, which are not in any table. For verbs, the imperfective and perfective counterparts are not in the table either. —CodeCat 23:47, 10 October 2014 (UTC)

The world of language learners is not neatly divided into those who can read the script and those who can't read the script. If push comes to shove, I can read Sanskrit in Devanagari, but I'd rather read it in transliteration because it's easier. I don't know if our Sanskrit headword lines currently include principal parts or not (our coverage of Sanskrit is not great), but if they did, I would want to have translits on each form listed. Devanagari is not just scribbles for me, but it does take me about 10 times longer to read than transliteration. —Aɴɢʀ (talk) 08:28, 11 October 2014 (UTC)


OK, a majority seems to want to see translit of inflections in all languages. This consists of (at least) me, CodeCat, Angr, Eirikr, Vahag, perhaps also Saltmarsh. A minority seems to either want translit of inflections in only some languages, or wants fewer headword inflections in certain languages, or both. This consists of Anatoli (doesn't want translit in Russian, is OK with the rest, is OK with headword inflections in general), Wyang (doesn't want translit in Korean, wants fewer headword inflections in Korean), and Wikitiki (seems to want fewer headword inflections in general, has expressed particular opinions about Russian, might also want less transliteration although I'm less sure about that).

So, we can do two things, it seems:

Take a vote.
Find some compromise that will satisfy both camps. I've proposed above the idea that we can transliterate the headword inflections of most non-Latin-script languages the traditional way (in parens or something similar, after the native-script word), and for the ones where people object (Korean, Russian), transliterate using a mouse-over popup.

I'd like each person who has expressed an opinion, and any others who want, to comment indicating whether they find #2 reasonable and whether they'd accept it, and if not, do they think #1 is the way to go, and if not, what do they think is the way to go? Benwing (talk) 09:02, 12 October 2014 (UTC)

I don't feel super strongly about this, so I'm open to finding a compromise. —Aɴɢʀ (talk) 15:22, 12 October 2014 (UTC)

I really like the idea of the mouse-over popup (or tooltip) transliteration; however, MediaWiki is imposing their own "preview" popup, which does not even work properly in any useful way on Wiktionary and I really wish we could get rid of it and make room for our own popups. --Wiki Tiki 89 14:30, 14 October 2014 (UTC)

What I'd like to see for transliterations is 1) the most common scheme used by default, for all languages; 2) the ability to switch between all of the popular transliteration systems available by clicking on a link placed near the headword, opening a popup menu with options; 3) the selected choice remembered when browsing other entries in the same language; 4) the ability to hide/show all transliterations for languages that use them. No "one true transliteration scheme" and no "one true transliteration display option". I believe that all of the necessary data can be generated in Lua, and selectively displayed/hidden using JavaScript. We should give users options, not cripple them. --Ivan Štambuk (talk) 00:27, 15 October 2014 (UTC)

Phrasal verbs whose lemma is not the infinitive

I noticed that there are some phrasal verb entries in English that are conjugated, but the infinitive is not used as the lemma. An example I noticed just now is all hell breaks loose. This verb certainly does have an infinitive, all hell break loose. This is clear when you add auxiliary verbs: I want all hell to break loose or may all hell break loose. So I think we should move these entries to the infinitive. —CodeCat 22:24, 11 October 2014 (UTC)

~~We usually don't bother with inflecting phrasal verbs, as it just clutters the entry for no real gain. This kind of a case probably warrants it, however. DCDuring TALK 22:38, 11 October 2014 (UTC)~~

The problem is it just sounds funny when the subject of the verb is included. I know we moved there is to there be a while back, but it has the same problem: with the subject (even just a dummy subject there) present, the bare infinitive just sounds really odd. —Aɴɢʀ (talk) 22:42, 11 October 2014 (UTC)

It does, but you can't deny that the infinitive exists. So either we should make a specific rule for these cases, or we should continue to use the infinitive, right? —CodeCat 23:13, 11 October 2014 (UTC)

Among OneLook dictionaries only Cambridge Idioms actually covers this and they do it at all hell breaks loose. DCDuring TALK 23:54, 11 October 2014 (UTC)

Whichever form we make the lemma, there should be redirects from the other forms. - -sche (discuss) 03:38, 12 October 2014 (UTC)

This isn't what I would call a phrasal verb, nor is it so categorized. It is a full sentence. As is the case with virtually all other full English sentences (See Category:English sentences.), the verb and sometimes the noun within can be inflected. (It is trivial to show it to be a full sentence and to show it or any other sentence to occur with an infinitive.) Sentences are usually shown in their canonical form (present indicative tense). DCDuring TALK 05:49, 12 October 2014 (UTC)

use–mention distinction in reference templates

As happened seven months ago, Dan Polansky and I are currently in disagreement about reference-template formatting; this time, we disagree about whether {{R:L&S}} should enclose the cited entry title in quotation marks. I believe that such quotation marks are necessary in order to mark the use–mention distinction, and that quotation marks create a more legible presentation than italicising the entry title would. I don't know why Dan Polansky disagrees, and nor do I know why he reverted the addition of {{documentation}} to the template in the same edit. — I.S.M.E.T.A. 01:37, 13 October 2014 (UTC)

To explain, I come here in the hope that I shall find or obtain consensus to use quotation marks in {{R:L&S}}. — I.S.M.E.T.A. 01:57, 13 October 2014 (UTC)

Just ignore him. — Keφr 11:14, 13 October 2014 (UTC)

@Kephir: Forgive me; does "him" refer to Dan Polansky or to me? — I.S.M.E.T.A. 17:02, 13 October 2014 (UTC)

Polansky. He is going to be obstructionist just because he can. But for the sake of having anything said on-topic, I agree with you about the quotation marks. On the other hand, some consistency in formatting mentions would be nice, which would favour italics instead. But either way, bare external link formatting seems rather unfitting to me. — Keφr 15:34, 14 October 2014 (UTC)

Thanks, Keφr; I thought you meant him, but I wanted to make sure. I've made the change again; hopefully it'll stick this time round. — I.S.M.E.T.A. 18:28, 14 October 2014 (UTC)

FWIW, I agree with quotation marks, since we are referring to a piece of a larger work: "qua" (for example) is more-or-less a section title. (This is not exactly the same as the use–mention distinction. We are neither using nor mentioning the word qua, we're just citing a source that mentions the word qua. Perhaps a subtle distinction, but IMHO a useful one to keep in mind in cases where the reference work uses a different citation form than we do, or when it assigns a few lemmata to a single entry for whatever reason.) —Ruakh_TALK 04:56, 15 October 2014 (UTC)

Empowering WingerBot

I filled out a vote request to empower my new bot WingerBot, here:

Wiktionary:Votes/2014-10/Request for bot status: WingerBot

This is my first bot.

It gives a 30-day vote period, which seems excessive. For example, JackBot had a 7-day window, which seems reasonable. If that can be applied here, can someone fix up the start and end times appropriately?

Thanks. Benwing (talk) 07:20, 13 October 2014 (UTC)

FYI, the voting is going on now (and has been for a few days).

My bot's source code is available on github:

Benwing (talk) 11:29, 23 October 2014 (UTC)

It's been several days since this vote has finished ... could someone close it? Thanks. Benwing (talk) 01:24, 4 November 2014 (UTC)

Compound lists for Japanese entries (and possibly CJK in general) -- are these really needed?

With the advent of User:Haplology's various categories for Japanese entries, which compile lists of terms using each kanji (such as Category:Japanese terms spelled with 赤 read as あか, or Category:Japanese terms spelled with 幸 read as こう), it occurs to me that the potentially *huge* lists of compounds that could be compiled and included within each kanji entry are actually redundant and obsolete. Rather than laboriously compile these lists by hand, I think it makes a lot more sense to leverage the categories to do the hard work for us.

Comparing the categories and the manually created lists, the only additional information that the manual lists provide is a possible reading, and a gloss. This leads me to two things:

As a proposal: I posit that this information, while potentially helping to improve usability slightly, also represents a sizable negative potential for mistakes and inconsistencies. I therefore propose that we no longer include such lists in Japanese entries, referring users instead to the categories. I also submit for consideration that Chinese and Korean editors might do the same for hanzi and hanja compound lists.
As a request: Does anyone familiar with the inner workings of categories know if there might be some technically feasible way to get readings to display automatically in category listings? For instance, 幸運#Japanese is added to category Category:Japanese terms spelled with 幸 read as こう, with the sort argument こううん (kōun). Looking at the list on the category page, we see that 幸運#Japanese is there, but its sort argument is lost -- other than the sorting itself, the sort argument doesn't appear on the page as any kind of useful information. Is there any way of capturing sort arguments and getting them to display somehow in category lists?

I look forward to hearing what others think. ‑‑ Eiríkr Útlendi │ Tala við mig 18:19, 13 October 2014 (UTC)

I find them useful. They are not hard to create. Ideally, a bot should make those categories. --Anatoli T. (обсудить/вклад) 10:09, 14 October 2014 (UTC)

Sorry, which them did you mean in I find them useful? Did you mean the categories that list compounds (which are already auto-generated once the appropriate templates are added to an entry), or the in-entry lists of compounds (which so far have to be created by hand)? ‑‑ Eiríkr Útlendi │ Tala við mig 19:11, 14 October 2014 (UTC)

I find categories useful, such as Category:Japanese terms spelled with 飢 read as う. Yes, the template auto-generates cats but they have to be created manually if they are missing. --Anatoli T. (обсудить/вклад) 21:42, 14 October 2014 (UTC)

Rethinking Babel boxes

I did some minor editing at WT:Babel recently, which made me wonder whether it would make sense to rewrite {{Babel}} in Lua. My initial motivation was to integrate it with our central list of languages (maybe even into the category boilerplate system which User:CodeCat developed) and get rid of inline styles on the way. While planning this out, some other ideas emerged in my head:

To have the blurbs ("This user speaks Elbonian at an advanced level") in English, and English only. On one hand, this is contrary to how Babel boxes look in other Wikimedia projects. On the other, not only will it massively simplify the code, it also makes the most sense: English is the ~~one~~ language in which English Wiktionary's (duh) definitions, boilerplate and meta-content are written and in which discussions are (usually) conducted, and the only language which can be assumed to be understood by all users. If I am looking at a Babel box of an advanced speaker of Cantonese, I can recognise it only because I remember yue to be the code for Cantonese, and that the number 3 means advanced level. The blurb tells me nothing; I do not know nearly enough Hanzi to recognise a single character.
To rename the user categories. "User si-3" is rather terse and again forces me to remember language codes. "Wiktionary:Advanced speakers of Sinhalese" would be more elegant and descriptive.
To deprecate {{#babel:}}, as was suggested in Wiktionary:Beer parlour/2014/September#Can we disable the #babel parser function? (I see the English blurb issue was brought up there too). I think some page in the MediaWiki namespace can be edited to point users to the template instead.
To suggest that users add themselves to interest groups (in Module:workgroup ping/data) when they speak a certain language at a sufficiently high level.

Some considerations:

Integration with our central languages list would mean that, for example Template:User en-us-N would have to be folded into Template:User en (I see Template:User sr-4 already redirects to Template:User sh-4)
I think some users may expect Wikimedia language codes to work in our Babel boxes (they may simply copy the Babel template across projects). I think we should generally not break that expectation; however, I worry about some Wikimedia codes not mapping perfectly to local ones.

Thoughts?

— Keφr 17:08, 14 October 2014 (UTC)

I support translating the Babel boxes into English. Their very purpose is defeated when they are incomprehensible. — Ungoliant ^(falai) 17:18, 14 October 2014 (UTC)

I agree with this too, and I definitely agree with converting to Lua to eliminate the unmaintainable mess of templates we currently have. —CodeCat 17:21, 14 October 2014 (UTC)

I support translating to English. I oppose converting to Lua because once we translate to English it will be very easy to turn it into a small maintainable template without Lua. I also oppose, as before, deprecating {{#babel:}}. --Wiki Tiki 89 17:30, 14 October 2014 (UTC)

The Module:workgroup ping integration and (maybe) validation would be much harder to do from a bare template. And I think so would be Eirikr's suggestion to avoid nested tables (while maintaining all current functionality, at least). — Keφr 08:35, 17 October 2014 (UTC)

I enjoy seeing the other languages and would be sad to see them go, but I understand and generally agree with the rationale for changing the Babel boxes to be all-English. If we're going to have them redone, my 2p request would be to not use nested tables, and to make sure that the columns actually line up properly. I'm one of those visually oriented people for whom the jagged inconsistencies of the current Babel infrastructure are so jarring that I deconstructed the tables and rebuilt them to line up properly on my own user page. ‑‑ Eiríkr Útlendi │ Tala við mig 19:08, 14 October 2014 (UTC)

Does the text really matter, other than the English, as well as native, language name? Wouldn't luacizing the templates mean that, as a practical matter, the text could only be in English? A new person with a new language could not be assumed capable of adding the text required in their language in a standard-conforming way, unless there were a particularly obvious way to add the text. DCDuring TALK 08:20, 15 October 2014 (UTC)

Well, maybe; translating into every language would be a bit of work (just create a huge data table… the only problem is that it would probably grow even larger than Module:languages, so we would have to split it, and it might become hard to navigate…), but could be done in principle. Though I think we could abuse the Scribunto i18n library to reuse messages provided by mw:Extension:Babel, and have every single Babel box in any language the reader desires (just add ?uselang= to the URL). Though that would put mw:Extension:Babel in a weird limbo of "deprecated but depended upon by its replacement"; and I have no idea how this interface could be exposed. Or we could just use that facility to maintain the status quo (pardon the Polanskyism) of having them in the target language. — Keφr 08:35, 17 October 2014 (UTC)

Proof of concept: {{#invoke:User:Kephir/test1|babble|ast|5}} gives

Native:: {{GENDER:USER|Esti usuariu|Esta usuaria}} tien un conocimientu ] d'].
Interface:: This user has ] knowledge of ].
English:: This user has professional knowledge of Asturian.

Try also viewing this page in Chinese. — Keφr 14:38, 20 October 2014 (UTC)

I always thought that the purpose of having the blurbs in the target language was to help non-English speakers or English language learners to find users with whom they might be able to communicate if they needed help. I think it is beneficial to see the name of the language in English so that English speakers can easily recognize which language the box indicates. - TheDaveRoss 20:35, 16 October 2014 (UTC)

I did not consider this. This is a good argument. — Keφr 08:35, 17 October 2014 (UTC)

On that note, I wouldn't be opposed to the option of adding English text to existing Babel boxes-- but WITHOUT taking away the foreign language text. This would allow them to serve the purpose of helping foreign language users find people with whom they might communicate and talk to, and still make it easier for English speakers to make sense of it. (Note also, I'm one of the people who got so fed up with the babel templates and their alignment and such, that I made my own table rather than deal with them, as well as because there were a number of babel templates missing at the time when I set mine up. This isn't as uncommon as one might think, and thus while changing the templates is well-intentioned, it won't necessarily reach every instance.) --Neskaya _sprecan? 17:32, 29 October 2014 (UTC)

It would be hard to fit all of that text neatly in a small box. We could work around that, though, if we shorten the text. Something like "near-native level English speaker" is short enough. —CodeCat 17:43, 29 October 2014 (UTC)

Or even more brief: "Basic Russian", "Intermediate Russian", "Advanced Russian" and "Native Russian". - TheDaveRoss 18:06, 29 October 2014 (UTC)
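As a rough illustration of the short English-only blurbs and the renamed categories discussed above, here is a minimal Python sketch. It is not the code of Module:Babel (which is Lua); the function names and the tiny language table are hypothetical, and the labels for levels 4 and 5 are assumed from the usual Babel scale.

```python
# Hypothetical sketch only; not the actual Module:Babel implementation.
# Level labels: 3 = advanced and N = native come from the discussion,
# the remaining labels are assumed from the usual Babel scale.
LEVEL_LABELS = {
    "1": "Basic",
    "2": "Intermediate",
    "3": "Advanced",
    "4": "Near-native",
    "5": "Professional",
    "N": "Native",
}

# Tiny stand-in for a central list of languages (illustrative entries only).
LANGUAGE_NAMES = {
    "ast": "Asturian",
    "ru": "Russian",
    "si": "Sinhalese",
    "yue": "Cantonese",
}


def _lookup(code, level):
    """Resolve a language code and a level (1-5 or N) to English labels."""
    level_key = str(level).upper()
    if code not in LANGUAGE_NAMES:
        raise ValueError(f"unknown language code: {code!r}")
    if level_key not in LEVEL_LABELS:
        raise ValueError(f"unknown proficiency level: {level!r}")
    return LEVEL_LABELS[level_key], LANGUAGE_NAMES[code]


def short_blurb(code, level):
    """Build a brief English blurb such as 'Advanced Russian'."""
    label, name = _lookup(code, level)
    return f"{label} {name}"


def category_name(code, level):
    """Build a descriptive category name in place of e.g. 'User si-3'."""
    label, name = _lookup(code, level)
    return f"Wiktionary:{label} speakers of {name}"
```

With these assumptions, `short_blurb("ru", 3)` gives `"Advanced Russian"` and `category_name("si", 3)` gives `"Wiktionary:Advanced speakers of Sinhalese"`. Keeping the language names in one table is what would let such boxes reuse a central list of languages.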

Rough initial version: Module:Babel. Does not categorise, and I did not aim for pixel-for-pixel replication. Try it by expanding {{Babel/x}} at Special:ExpandTemplates. Supports script, language and "coder" boxes; {{User time zone}}, {{User Wikipedia}}, {{User SUL}} and {{User bot owner}} are not supported. Using an unsupported box specification makes it fall back to the old Babel template. — Keφr 19:25, 1 November 2014 (UTC)

Spaces in alphabetization of language names

How do we treat spaces when we alphabetize language names? Specifically, does "Lower Sorbian" precede or follow "Low German"? If we ignore spaces, then "LowerSorbian" precedes "LowGerman", but if we treat spaces as preceding A in alphabetical order, then "Low_German" precedes "Lower_Sorbian". —Aɴɢʀ (talk) 19:01, 14 October 2014 (UTC)

There are pros and cons to both options. What do dictionaries that list multi-word phrases as separate entries do? --WikiTiki89 01:35, 15 October 2014 (UTC)

I just checked six print dictionaries (two British, four American) and they all ignore spaces (hotchpot before hot dog before hotel). —Aɴɢʀ (talk) 06:12, 15 October 2014 (UTC)

w:Alphabetical order#Treatment of multiword strings is relevant.—msh210℠ (talk) 12:30, 15 October 2014 (UTC)

That page basically outlines the question, but does not provide an answer. --WikiTiki89 12:39, 15 October 2014 (UTC)

Both treatments are valid; the question is, which do we want to use? Dictionary headwords apparently usually follow the "ignore the space" rule, but other lists may follow the "treat the words separately" rule. —Aɴɢʀ (talk) 13:53, 15 October 2014 (UTC)

Internet-based sorting, including our own categories, generally treats a space as being ordered before any other character. So that would place Low German before Lower Sorbian. —CodeCat 18:27, 17 October 2014 (UTC)
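To make the two conventions discussed above concrete, here is a minimal sketch in Python. The function names `space_first_key` and `ignore_spaces_key` are made up for illustration, and this is not how MediaWiki itself is implemented; it only assumes plain code-point comparison, under which a space (U+0020) sorts before every letter.

```python
# Two ways to alphabetize names that contain spaces.
# Both key functions are hypothetical helpers, not part of any wiki software.

def space_first_key(name: str) -> str:
    # Plain code-point comparison on the lowercased name:
    # a space (U+0020) sorts before every letter.
    return name.lower()

def ignore_spaces_key(name: str) -> str:
    # Print-dictionary convention: compare as if the spaces were not there.
    return name.replace(" ", "").lower()

languages = ["Lower Sorbian", "Low German", "Lower Silesian", "Low Saxon"]
headwords = ["hotel", "hot dog", "hotchpot"]

languages_space_first = sorted(languages, key=space_first_key)
languages_ignore_spaces = sorted(languages, key=ignore_spaces_key)
headwords_space_first = sorted(headwords, key=space_first_key)
headwords_ignore_spaces = sorted(headwords, key=ignore_spaces_key)

print(languages_space_first)    # Low German, Low Saxon, Lower Silesian, Lower Sorbian
print(languages_ignore_spaces)  # Lower Silesian, Lower Sorbian, Low German, Low Saxon
print(headwords_space_first)    # hot dog, hotchpot, hotel
print(headwords_ignore_spaces)  # hotchpot, hot dog, hotel
```

Under the space-first key, all names beginning with the word "Low" stay together; under the ignore-spaces key, the result matches the print-dictionary order hotchpot, hot dog, hotel reported earlier in the thread.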

Some paper dictionaries, too, use this ordering, e.g. the Routledge dictionary of historical slang: have a look at http://books.google.fr/books?id=JRuNMHNcu5cC&pg=PP12&lpg=PP12&dq=%22something+before+nothing%22+dictionaries&source=bl&ots=6iDNPNRHjr&sig=S8mC2Wqar5xb4FCC2zWaw4itGG8&hl=fr&sa=X&ei=yXJBVNebNMnDPPWkgIgG&ved=0CCMQ6AEwAA#v=onepage&q=%22something%20before%20nothing%22%20dictionaries&f=false This is the better ordering for our kind of dictionary. Lmaltier (talk) 19:52, 17 October 2014 (UTC)

This dictionary calls it something before nothing. Do you understand why? Lmaltier (talk) 20:12, 17 October 2014 (UTC)

On what basis do you say "This is the better ordering for our kind of dictionary."? I happen to be leaning the other way. --WikiTiki89 21:30, 17 October 2014 (UTC)

The reason is the number of multi-word phrases, etc. here. When entries in a dictionary are almost always single words (without spaces, etc.) and phrases are defined in these basic entries, the strict alphabetical order is the logical choice. When each phrase has its own entry, it's much better to get all phrases beginning with the same word together when using a category. An example: you expect boulanger-pâtissier (probably addressed in boulanger in most paper dictionaries) after boulanger but before boulangerie; the order boulanger, boulangerie, boulanger-pâtissier is not what you would expect. Lmaltier (talk) 15:48, 19 October 2014 (UTC)

For the most part, we don't have to worry about alphabetization here; our entries are on separate pages that aren't ordered with respect to each other. Our categories alphabetize automatically, and I see that Category:en:Languages has Low German >> Low Prussian >> Low Saxon >> Lower Lusatian >> Lower Silesian >> Lower Sorbian >> Lower Wendish, meaning that our automatic alphabetization does treat spaces as ordered before any other character. The only alphabetization we have to do manually is the ordering of the languages in entries like se, which is where I first encountered the problem of where to put Lower Sorbian with respect to Low German. My immediate instinct was Low German >> Lower Sorbian, but then I second-guessed myself and asked here. After discovering that dictionary lemmas treat spaces as nonexistent, I went back to se and switched the order to Lower Sorbian >> Low German. But now that I've looked at how our categories alphabetize, I'm gonna go back again and switch it back to my first instinct, Low German >> Lower Sorbian. —Aɴɢʀ (talk) 20:07, 19 October 2014 (UTC)

We have to worry about categories only, but this is very important. They alphabetize automatically, but we must ensure that they alphabetize the best way for readers. For languages: the result is disputable for Lak'ota. Lmaltier (talk) 05:41, 20 October 2014 (UTC)

We have some control over sorting in categories, though I'm not sure if that includes treatment of spaces. As for "Lak'ota", that's not a good example- we call the language Lakota. Chuck Entz (talk) 12:24, 20 October 2014 (UTC)

It was a real example: see Category:en:Languages and look at L. Lak'ota is before Lake Miwok. Lmaltier (talk) 20:17, 20 October 2014 (UTC)
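The Lak'ota case can be reproduced with a small Python sketch. It assumes plain code-point ordering, where the apostrophe (U+0027) sorts before every letter; `category_sort_key` is a hypothetical helper showing how an explicit sort key could change the result, not a description of what our categories actually do.

```python
# Why "Lak'ota" lands before "Lake Miwok" under code-point ordering,
# and how a normalizing sort key (hypothetical) would change that.

def category_sort_key(name: str) -> str:
    # Assumed normalization: drop apostrophes and compare case-insensitively.
    return name.replace("'", "").lower()

names = ["Lakota", "Lake Miwok", "Lak'ota"]

default_order = sorted(names)                       # raw code-point comparison
keyed_order = sorted(names, key=category_sort_key)  # apostrophes ignored

print(default_order)
print(keyed_order)
```

With the raw comparison the apostrophe form comes first; with the key, "Lake Miwok" precedes both spellings of Lakota, which then keep their input order because Python's sort is stable.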

Extended etymologies

I came upon this website illustrating an idea that I have had in mind for a while (click on the blue links in the leftmost column). We could extend the < "derives from" operator used in etymologies to generate a drop-down table illustrating intermediate steps between pairs in the derivational chain, i.e. all of the sound changes involved. Short descriptions could link to appendices where more details are available. This would be applicable to both reconstructions and attested etymons, including borrowings (which often undergo some special rules that can nevertheless be described and cataloged). A chronologically inverted list would be used in the descendants sections of the corresponding source word/reconstruction. Support could be added for multiple sequences of derivation, and even multiple sources or different reconstructions reflecting different protolanguages. It would however require some non-trivial investment in the groundwork to make it work, so it should best be approved (or better: not disapproved) first before people waste time. I've seen some recent works that use this method but they use numbers instead of descriptions to explain what's going on, so one has to manually look up what each of the numbers used means, and the layout is horizontal not vertical. --Ivan Štambuk (talk) 00:13, 15 October 2014 (UTC)

Support, although I recognize that there should be a lot of discussion about the specifics of the layout. --WikiTiki89 01:36, 15 October 2014 (UTC)

How would it work, on a technical level? How would you share data between entries? DTLHS (talk) 04:46, 15 October 2014 (UTC)

Support. I had a vague idea about having such lists in appendices somewhere, but never developed it. Filling out the details would seem to go beyond the limits of published sources without resorting to the kind of extrapolation that you've been berating CodeCat for- are you ok with that? Chuck Entz (talk) 13:30, 15 October 2014 (UTC)

Support. Categorization based on sound change could also be added, such as Category:Old Armenian terms derived by Meillet's law. Or such terms could appear on the appendix dedicated to Meillet's law. --Vahag (talk) 10:29, 16 October 2014 (UTC)

I think this might overwhelm normal entries, especially if people do it for every morpheme in a polymorphemic word, but it would be nice to do this somehow on reconstructed-form appendix pages. —Aɴɢʀ (talk) 13:49, 15 October 2014 (UTC)
- It wouldn't be too bad if we restricted it to the rules between a term and its nearest parent (i.e., an English etymology would only have the steps between it and Middle English or maybe Old English), and hid the list so that only those who choose to look at it would see it. Chuck Entz (talk) 13:57, 15 October 2014 (UTC)

Categories for words that have pronunciations marked in the form of IPA

Should we create such categories? I believe that it is convenient to go to Special:WhatLinksHere/Appendix:Italian pronunciation for the above information. --kc_kennylau (talk) 09:53, 15 October 2014 (UTC)

What's the general consensus view on handling abusive editors?

I stumbled across the activities of a new editor and have been quite impressed at how abusive they can be -- foul language, name-calling, lawyering, basically the kind of trollish behavior that drove me from Wikipedia years ago. I analyzed their total contributions, only a short list so far, and found that more than a quarter have been on talk pages, where this editor has mostly argued about editing decisions, illustrated their profound ignorance of the consensus here, and berated other users. Another more-than-quarter has been in this user's own userspace. 40% has been actual constructive mainspace edits, mostly in January-March this year. Out of the total, more than a quarter has been confrontational and even outright abusive.

For what it's worth, this editor has not yet had any direct dealings with me.

How would other admins approach this? ‑‑ Eiríkr Útlendi │ Tala við mig 18:05, 17 October 2014 (UTC)

I would post a warning on his/her user page along the lines of "Start being nice to people, or I will block you." (but in a more polite way). --WikiTiki89 18:10, 17 October 2014 (UTC)

I have to agree here. If you don't want to post the warning to the user, feel free to post a note on my talk page or use EmailUser to contact me, and I'll be happy to deal with it and whatever incivility comes of it (as well as happy to watch them for a few weeks to see if they improve or need some time off to think). That sort of attitude isn't what we need from editors here. --Neskaya _sprecan? 17:26, 29 October 2014 (UTC)

Proposal: use quotation marks to mark headwords cited in reference templates for Latin-script languages

Further to §: use–mention distinction in reference templates above, may I suggest that we use quotation marks in our R:-prefixed reference templates to mark the headwords cited by those templates? So, for example, the standard format (at least where the headword is concerned) would be:

“foo, n.” in Some Big Dictionary

(Because of potential problems with using quotation marks with other scripts, I make this proposal for Latin-script languages only.) Does that seem sensible to everyone? Is there consensus? Shall I prepare a vote? — I.S.M.E.T.A. 18:35, 17 October 2014 (UTC)

Of course this is not supported by everyone. It is opposed by Robert Ullman per edit history of Template:R:Webster 1913 in April 2009. I now also support removal of the quotes. In diff, Spangineer removed quotes from Template:R:Century 1911. In diff, DCDuring removed quotes from Template:R:OneLook. More research would disclose more editor stances. --Dan Polansky (talk) 19:08, 17 October 2014 (UTC)

It's also worth noting that all three of these changes to remove the quotes were in 2009, now half a decade ago. Attitudes and ideas change over time. I suggest we check the opinions of the relevant people here. That said, Ullman is no longer with us, and Spangineer's last edit was in 2010. @DCDuring do you have any input on this quote issue? ‑‑ Eiríkr Útlendi │ Tala við mig 19:32, 17 October 2014 (UTC)

I support adding quotes. It's the only way we can make the cited part stand out without changing text style like the italic "n.". —CodeCat 19:16, 17 October 2014 (UTC)
The only way to stand out? That is obviously untrue. The text of the word stands out by the use of a different color for the hyperlink, as in “cat”, in Webster’s Revised Unabridged Dictionary, Springfield, Mass.: G. & C. Merriam, 1913, →OCLC. --Dan Polansky (talk) 19:27, 17 October 2014 (UTC)
Not all people can see such colours. —CodeCat 19:35, 17 October 2014 (UTC)
You mean color blind (are there such that cannot distinguish blue vs. black)? Or people with a simple browser that does not distinguish a piece of text with a link from a piece of text without a link? Even assuming some people do not see such colors, will they miss the link because of the missing quotation marks? If so, will they miss links in general, since in general links are not surrounded by quotation marks? --Dan Polansky (talk) 19:38, 17 October 2014 (UTC)

Surprisingly, I agree with Dan. Color or other link distinction seems sufficient. Quotation marks, especially double, add visual clutter IMO.

We use quotes for glosses, so any need for glosses in such templates — quite possible IMO — would require multiple quotes.

If we resort to further distinction, I would strongly oppose ever using italics as it makes it impossible to maintain the appropriate typographic contrast for the taxonomic names that are supposed to have it. DCDuring TALK 19:51, 17 October 2014 (UTC)

Re: links, are there any cases where a term might not be linked in such a template call? ‑‑ Eiríkr Útlendi │ Tala við mig 20:27, 17 October 2014 (UTC)
It certainly might not always be the pagename. In some cases having a named link might be misleading, as it implies that it is possible to go to a page that is directly related to the term, rather than, say, a general search-form page. The more I deal with these, the more I appreciate such refinements. Also: optional italics for the taxonomic names that need them ("i=1") and an optional gloss ("gloss="). Not every template needs such options, but they are handy. DCDuring TALK 22:08, 17 October 2014 (UTC)

Redesign-Redefine of Russian Entries

I'm planning a large redo of many Russian pages, translating swathes from Russian Wiktionary with a focus on layout consistency, definition intuitiveness/coverage, and relevant design/coding.

Info on en-Wiktionary is generally inadequate for translating literature, and often confusing for basic words (e.g. 'весь', see below). We already have all the necessary info, but only on ru-Wiktionary, where it is inaccessible to casual users (many definition examples cited there derive from literature). I started translating Dostoevsky (https://github.com/icarot/bk), which was when such inadequacies became more obvious.

Roadmap:

1) Collaborate with Grease Pit to try to normalize the data layout as consistently as possible, for parsing by robots. A parser/morphological analyzer needs quality, open data. Hacky consistency = hacky parse.

2) Improve word-count and definition count immensely. On the order of a few thousand for one of them. Even ru-Wiktionary is occasionally lacking in this department.

3) Clean messy pages, e.g. 'весь' (which confounds the novice with the unintuitive concept that Russian uses declensions to represent irregular meaning on an unusually multi-purpose word, and does not represent all of the critical meanings).

4) Pronunciations from ru-Wiktionary as well. Ours are sufficient but different (we use phonemic vs. narrow transcriptions). In my opinion, the narrow transcriptions are better since they reveal useful subtleties of pronunciation without adding obscure IPA symbols. The main changes would be notating non-phonemes when ru-Wiktionary decided to do so and we did not, such as replacing our alveolar approximants with velarized allophones, and notating unusual instances of vowel allophony, or secondary stress. In short, copying the more precise and still friendly transcriptions from ru-Wiktionary. Consensus?

What are desired improvements I've missed for Russian translations which can be directly bettered from conventions and the scope of information on Russian Wiktionary? Looking for criticisms, guidance, etc. I wouldn't just run rampant without letting the community know what was going on, or asking for help.

Main Points Noted

Ivan I can help generate stubs — that would be brilliant! I'd do the same, using lemmas from Dostoevsky. I'll use the corpora from ru-Wiktionary (i.e., National Russian Corpus) because if it's there, logically I assume it's license-compatible. I agree with you about Google Translate — they can't possibly have the copyright on that data. But we should verify to make sure.

Ivan German article in Spiegel and there were like 2-3 missing words in every single sentence. I can imagine that the statistics for Dostoevsky are even worse. It has become embarrassing. We should have some kind of stubs for statistically top 20k words in every language IMHO — I think this is a fantastic idea. And you're absolutely right about your inference about Dostoevsky. It's the English equivalent of reading the word 'snicker' and having no entry whatsoever. This is middle- or high-school vocabulary, and is a large problem as a whole for practical use as a dictionary. Can we reach a consensus for doing this specifically for ru-articles?

Wikitiki89 do not change the layout without discussing it first. — Main change wanted is inflection tables. These on en-Wiktionary waste huge amounts of space. We should copy ru-Wiktionary's approach: a clean, uncluttered overview of an inflection pattern. While we're on the topic of morphology, I want Andrey Zaliznyak's inflection descriptions from ru-Wiktionary as well. He uses one number and one letter for each word to comprehensively cover the morphology and stress pattern of Russian. I'll work on translating the description from ru-Wiktionary when I get a moment.

Icarot (talk) 00:18, 18 October 2014 (UTC)

Hi.

We have seen you talking but we haven't seen you working :). You're welcome to demonstrate your ideas. Yes, we need more Russian entries and some entries may need fixes or improvements but you can't make major changes without a prior agreement. --Anatoli T. (обсудить/вклад) 05:20, 18 October 2014 (UTC)

Just a heads-up: Any automatic transmission of data from Russian Wiktionary into English Wiktionary has to clearly indicate the source of the data in the edit summary to prevent copyright violation. --Dan Polansky (talk) 05:34, 18 October 2014 (UTC)

Feel free to make any changes you want to content, but do not change the layout without discussing it first. --WikiTiki89 14:25, 18 October 2014 (UTC)

@Icarot: I can help you generate stubs for Russian nouns, adjectives, verbs and adverbs (the rest are a closed category and mostly covered). Stubs would be entries like in this category - the only thing they are missing are definitions. I could help extract a list of missing lemmas from a particular work. We could also pregenerate a list of examples for every entry and format them using the {{usex}} template, by taking them from ru Wiktionary, glosbe, parallel corpora databases, subtitles, google translate and so on, that editors could easily copy/paste into entries that are missing them. Don't worry about associations (derived terms, *nyms, morphological etymologies etc.) - those can be largely automated once entries with definitions are created. The primary focus should be on coverage. --Ivan Štambuk (talk) 07:32, 19 October 2014 (UTC)
- Not sure why you are not continuing with this crap in Serbo-Croatian Wiktionary. It already has more than 100 000 Serbo-Croatian definitionless entries. If Wiktionary users are so hungry after such content as you posit, Serbo-Croatian Wiktionary could become one of the most visited Wiktionaries soon. Unless it gets shut down due to copyright violation, that is, such as because of automated lifting of data from Google translate as you seem to suggest above. --Dan Polansky (talk) 07:58, 19 October 2014 (UTC)
  Inflections cannot be copyrighted, the databanks such as HJP are completely free. Besides, I fixed many errors in them, and used two others as well. Definitions on the other hand can be copyrighted, and are nevertheless abundantly stolen by many FL Wiktionaries without anyone so much raising an eyebrow. Don't worry Polansky, soon I'll add many such stubs for Czech as well. --Ivan Štambuk (talk) 08:07, 19 October 2014 (UTC)
  - As you know from a previous discussion on the subject with copious participation, there is no consensus supporting your mass creation of definitionless entries. There is no consensus for blocking that behavior either, though. You may get blocked in the process nonetheless; if I were a crat, I would have blocked you by now for entering definitionless rubbish. You may also get blocked for the above cynical utterances of disrespect toward copyright; if I were the operator of this website, I would block you for that. In the meantime, I will take this opportunity to register my annoyance. --Dan Polansky (talk) 08:16, 19 October 2014 (UTC)
    A wide consensus is not necessary for language-specific work (The original discussion was for all languages). A few editors agreeing and working together is enough. The rest can complain about it all day for all I care. (It seems to be the only thing that you do anyway.) Just looking at the content of Category:Czech nouns: We have 13k Czech nouns and 95% of them don't have inflection and pronunciation. I can guess the meaning of 90% of them and I've never studied Czech in my life. I know it's hard to accept that most of your work has been futile, but such is life. Google Translate is based on statistical correlation in parallel corpora not owned by Google and its translation pairs are uncopyrightable, and can completely substitute all of the work you've done. Working smart not stupid is the way to go, using bots and free databases for heavy lifting and not wasting time on typing wiki syntax. --Ivan Štambuk (talk) 08:31, 19 October 2014 (UTC)
    Re: 'The rest can complain about it all day for all I care. (It seems to be the only thing that you do anyway.)': That is obviously untrue; it suffices to inspect my mainspace contribution to see otherwise. I propose you use your blocking tools to block yourself for that remark. --Dan Polansky (talk) 08:36, 19 October 2014 (UTC)
    
    Re: "I can guess the meaning of 90% of them": Very unlikely. --Dan Polansky (talk) 08:38, 19 October 2014 (UTC)
    Well I took a look at the last 50 contribs of yours, and the only novel mainspace edit is some English misspelling. Anyway, my point was that you've invested too much time into easily replicable manual labor so that you oppose stubbing not by reason but principle. See: neo-Luddite. We have too few editors to do everything manually, and after 10 years we're still missing thousands of top words in many major languages. The other day I was reading a German article in Spiegel and there were like 2-3 missing words in every single sentence. I can imagine that the statistics for Dostoevsky are even worse. It has become embarrassing. We should have some kind of stubs for statistically top 20k words in every language IMHO (including translations). Regarding blocking - using words such as crap or rubbish when referring to other people's work is considered impolite and could be a cause for a block. --Ivan Štambuk (talk) 08:56, 19 October 2014 (UTC)
    Are you semantically challenged? Which part of "the only thing that you do" you fail to understand? Some recent contributions are and . Your ridiculous insults and inaccuracy are just tiresome. --Dan Polansky (talk) 09:19, 19 October 2014 (UTC)
    You've made ~500 mainspace edits in 4 months, most of which are translation pairs. I could in a few hours write a script that would generate both those and inflections and pronunciations. 4 months of work reduced to a few hundred lines of code. I can even extract context labels from dicts. I understand your anger but there is no need to project it towards others. Behave yourself. --Ivan Štambuk (talk) 09:36, 19 October 2014 (UTC)
    My point is that what you said was clearly false. I still see no "I stand corrected". Actually, when one rereads your posts above, they are full of obvious inaccuracies. I am not sure why I care to respond to that sort of communication style that is inaccurate by design, and whose author never says "I stand corrected, I was wrong". --Dan Polansky (talk) 09:44, 19 October 2014 (UTC)
    Natural languages are too primitive to convey the nuances of meaning representative of the real world. Nature is stochastic and statistical, and there really exists no such thing as true or false, right or wrong. In practice "never" means "almost never/in 0.something % of cases", and "all" means "100% for all practical purposes". It's real life 101. But I digress. If you don't have anything to say regarding my points I suggest that we terminate this interlocution.--Ivan Štambuk (talk) 10:05, 19 October 2014 (UTC)
    Re: "Natural languages are too primitive to convey the nuances of meaning representative of the real world." No one should be allowed to get away with this sort of continental nonsense. The relevant distinctions are very easy to express in natural language: there is a clear, easy to understand difference between "The only thing you do is X", "You do almost nothing but X", and "Most of what you do consists of X". No rocket science, nothing to do with stochastic nature of the real world. As I said, remind me of the occasion on which you admit you made an error rather than blaming natural language for lack of expressive power. Your sort of response to clear refuting examples is the sort of behavior which Popper's philosophy of falsificationism was intended to combat. --Dan Polansky (talk) 17:51, 24 October 2014 (UTC)
    
    Re: 'using words such as crap or rubbish when referring to other people's work is considered impolite and could be a cause for a block': That's utter rubbish. You can hear "rubbish" all the time, used by well educated and generally polite people. These words are not the most polite forms available, but fit well to describe the sort of content that dominates the Russian Wiktionary. --Dan Polansky (talk) 09:23, 19 October 2014 (UTC)
    I'm not sure what kind of polite people you socialize with, but referring to other people's work as crap and rubbish and them as challenged (a jocular pejorative) is generally reserved for intimate contexts where they would not perceive it as an insult (e.g. family or close friends). Russian Wiktionary is doing fine, thanks for asking. And so will the Serbo-Croatian Wiktionary. Not so long ago the SC Wikipedia was ridiculed on similar grounds, and now it is bigger than any of the hr/bs/sr pedias, with the highest growth rate. --Ivan Štambuk (talk) 09:36, 19 October 2014 (UTC)

@Icarot Feel free to add definitions to Category:Russian entries needing definition, generated by User:Ivan Štambuk, which I have been working on. Plenty of work to do! I'll repeat what was said before: please don't change the design without a prior agreement. As I said before, we haven't seen you working yet. --Anatoli T. (обсудить/вклад) 02:37, 24 October 2014 (UTC)

IPA, language code and error message

Whatever changes were made to IPA modules to make older pages (2013) have conspicuous red error message in the IPA section should be undone. Example: this revision. Old revisions should look as legible and sane as possible; this is not. In general, IPA templates should not require the language parameter; filling-all-the-fields concerns should be delegated to editors with a shovel who have no real interest in building the dictionary. --Dan Polansky (talk) 05:31, 18 October 2014 (UTC)

I agree that the lack of a lang parameter shouldn't result in an error message, but we don't have any editors who have no real interest in building the dictionary. People with no interest in building the dictionary don't become editors. —Aɴɢʀ (talk) 07:00, 18 October 2014 (UTC)

I completely agree that there shouldn't be an error message. A cleanup category would be sufficient. --WikiTiki89 14:27, 18 October 2014 (UTC)

I was gonna say exactly what Wikitiki89 said. Renard Migrant (talk) 11:49, 24 October 2014 (UTC)

thanatomicrobiome

There's a lot about this entry that makes me nervous: the word was apparently coined in a journal article published in mid August, with some or all of the authors working at Alabama State University in Montgomery, Alabama. The Wiktionary article was created at the beginning of September by an anonymous contributor whose IP is assigned to ASU. A variety of IPs from the same southern Alabama/northern Florida area as ASU, as well as an account that seems to bear the name of one of the authors, have been adding references, which are all articles/blurbs about either the research program at ASU or about the original article itself. It's tagged as a hot word, but it looks to me to be lukewarm at best: a Google search does show the word in a blog or news article here or there, but this isn't the kind of strong, widespread adoption we saw with olinguito.

I can't escape the impression that we're being used for promotional purposes, and I feel we need to do something- but I'm not sure whether to tag this for cleanup to prune out all the PR from the references, or to rfv it, or something else. It certainly doesn't meet the letter of the CFI, since it's only 2 months old, but how do we decide whether this is "hot" enough to keep it provisionally as a hot word? Chuck Entz (talk) 05:04, 19 October 2014 (UTC)

Some use outside of the group promoting it would be nice. I'd RfV it for starts. DCDuring TALK 12:44, 19 October 2014 (UTC)

It's hard to say which of the "references" have print counterparts or can otherwise be considered to be durably archived. At least one is a self-proclaimed blog. Nothing in CFI says we have to include something as a hot word, especially when it is not at all clear that use would ever get beyond the field of forensic science and practice. I think that means that it would in the end come to a vote, which usually takes place at RfD. And then there's the increasingly important question of how we address the decline of print media.

This particular case seems to me to be part of a campaign by a university PR office. RfC seems inappropriate as the entire issue is with the attestation. I'd RfV it to get a slow clock started. We need to have properly formatted attestation to facilitate wide participation in review. Why should each participant have to click through to each website? DCDuring TALK 13:17, 19 October 2014 (UTC)

Headwords for reconstructed languages

So I'm putting in the first steps towards an appendix for Proto-Samic, a fairly well-reconstructed proto-language. I'm however wondering what would be a good choice of headword for verbs?

1) Use just the bare verb stem. This is what the main published sources, including the 1989 dictionary by Lehtiranta, seem to do: e.g. *ëstë (“to be in time”). However this is not an actual wordform by itself.
2) Use the verb stem, marked by a hyphen to be just a stem and not an actual wordform: e.g. *ëstë-.
3) Follow the standard for the modern-day Samic languages (and, for that matter, our PF and PGmc appendices) and use the infinitive: e.g. *ëstëtēk. These are not directly listed in the source literature, but they are simple enough to assemble, and the ending itself is uncontroversial.

Worth noting is that some otherwise homophonic roots would be distinguishable under options #2 and #3 (e.g. *ćēkćë 'osprey' ~ *ćëkćë- 'to kick'). OTOH there also exist roots for which it is not clear if the original meaning was nominal or verbal (*teampō 'to become wet / seaweed'), and their placement would end up arbitrary if we strictly separated verbs and nominals by citation form.

(Discussion on further matters perhaps ought to go at Wiktionary talk:About Proto-Samic. 15:34, 24 October 2014 (UTC): Page now up.)

--Tropylium (talk) 20:46, 19 October 2014 (UTC)

Lehtiranta, Juhani. 1989–2001. Yhteissaamelainen sanasto ('Common Sami Vocabulary'). Suomalais-Ugrilaisen Seuran Toimituksia 200. Helsinki: Suomalais-Ugrilainen Seura. →ISBN.

I would choose option 3 mainly because it lines up better with modern terms and makes comparisons easier. It also matches our treatment of Proto-Finnic, which also uses the infinitive as the lemma. —CodeCat 21:21, 19 October 2014 (UTC)

On proper nouns

Previous discussions: Wiktionary:Information desk/2014/July#Are names always proper nouns (or proper names)?, Wiktionary:Beer parlour/2014/July#Proper nouns

Why do we treat proper nouns as a separate POS from nouns? Proper nouns are just a specific type of noun; having separate headings and categories for "Proper nouns" as opposed to "Nouns" is a bit like having separate headings and categories for "Transitive verbs" as opposed to "Verbs". Merging proper nouns in with nouns would solve a lot of ambiguity problems, such as words like Friday and Christmas that can be used both as a proper noun and as a common noun, not to mention the problem that there is no real clear cross-linguistic definition of what constitutes a proper noun. (Most attempts at defining the difference I've seen apply only to English and don't necessarily work for other languages.) —Aɴɢʀ (talk) 16:26, 20 October 2014 (UTC)

I definitely support this. Furthermore, even if this does not pass, I would like to propose categorising all proper nouns as nouns as well, and merging Category:Proper noun forms by language into Category:Noun forms by language. —CodeCat 16:59, 20 October 2014 (UTC)

As a general rule if something (eg, a classification, attribute) is reasonably well researched and documented in a given language and has lexical implications, then we should have it in that language. If other languages don't have the distinction or don't have it documented then we shouldn't have it for those languages. I don't see why we should dumb down presentation of any language, let alone the host language, for the sake of uniformity or the convenience of translators or Lua practitioners.

For English and for taxonomic names, the notion of proper nouns is well-documented and useful. We could make the presentation simpler by acknowledging that large classes of English proper nouns have perfectly predictable (ie, effectively syntactical) patterns of common-noun use. I always wonder whether we can prevent contributors from adding "missing" information such as an Adjective PoS section to cover attributive use of an English noun, but that problem seems to be declining. DCDuring TALK 17:21, 20 October 2014 (UTC)

But we don't have to indicate the "propriety" of nouns by having "Proper noun" considered a separate POS. We could tag nouns {{lb|en|proper}} or {{lb|en|common}}, for example, the way we already label verbs {{lb|en|transitive}} or {{lb|en|intransitive}}. It isn't "dumbing down" the presentation of the language to aim for accuracy as well as precision. —Aɴɢʀ (talk) 18:34, 20 October 2014 (UTC)

You have now taken a position that is better defined than your initial posting, which expressed opposition to proper noun headings and categories. And your initial posting included "the problem that there is no real clear cross-linguistic definition of what constitutes a proper noun", which seems like the kind of cross-linguistic uniformitarianism that is often proposed here and which is probably what has won you CodeCat's support.

Your statement above that 'having separate headings and categories for "Proper nouns" as opposed to "Nouns" is a bit like having separate headings and categories for "Transitive verbs" as opposed to "Verbs"' implies that you are opposed to such headers and categorization in the case of entries that are now proper nouns. But we have categorization of "Intransitive verbs". Are you really opposed to that as well? The ratio of English proper noun entries to total English noun entries is even smaller than the ratio of intransitive English verbs to total English verbs, so the category is arguably more useful. Given our current "efficient" method of implementing labels, we cannot use "what links here" and a template to construct a list of items so labeled, leaving us with only categories, programs run on dump runs, and text searches as ways of constructing such lists from labeled definitions. Speaking from extensive and recent experience, I can say that text searches are not fully satisfactory and that programs run on the XML dumps are inconvenient for many ad-hoc purposes.

Are you opposed to the proper noun category as well as to the proper noun heading? Are you in favor of proper labeling of individual definitions before the proper noun heading is eliminated? Are we sure that proper labeling does not require manual review? Who do you propose do the checking and conversion? DCDuring TALK 19:15, 20 October 2014 (UTC)

I'm not proposing anything yet; at this point all I want is discussion. I do want to consider getting rid of the L3 header, but you're right that parallelism with transitive and intransitive verbs does suggest retaining Category:English proper nouns as well as creating Category:English common nouns. As for a cross-linguistic definition, I'm not even talking about languages that aren't considered to have the proper/common distinction (though I'm not aware of any languages that don't), I'm talking about a definition that would apply to all languages that are considered to have both kinds of nouns. Even for such syntactically similar languages as English, French, and German I don't know how to define "proper noun" in a way that will apply to all three languages. And if each language has to have its own language-specific definition, that's a good indication to me that the concept of "proper noun" has no linguistic basis at all and is useful only for pedagogy. And if it turns out there is no adequate definition of "proper noun", then we shouldn't use the label template or the category at all. What do other dictionaries do? Do other dictionaries label proper nouns separately? What criteria do they use? For that matter, what criteria do we use? Why are AB-yogurt and air chief marshal proper nouns? —Aɴɢʀ (talk) 19:36, 20 October 2014 (UTC)

I'd be willing to stake my reputation as a linguist on there being massive overlap among the sets of things considered as proper nouns in all languages. Many folks don't act as if taxonomic names are proper nouns, but most theoretical taxonomists seem to. And then there is the proper name/proper noun distinction. DCDuring TALK 21:48, 20 October 2014 (UTC)

But "being considered a proper noun" isn't a definition. And I'm not sure there's even always overlap within the same language. For example, we call language names like Latin and Sanskrit proper nouns, just like names like Noah and London. But the American Heritage Dictionary, which gives no part of speech info for Noah and London, labels Latin and Sanskrit "n.", which they otherwise do only for common nouns. So are language names proper or common? What usage of taxonomic names indicates that theoretical taxonomists treat them as proper nouns? (That's an actual question, not a rhetorical one.) Considering our first definition of proper name is "proper noun", I wonder what the distinction between the two is supposed to be. —Aɴɢʀ (talk) 22:12, 20 October 2014 (UTC)

Support tentatively (but I will see how the discussion goes), even if it causes Japanese, Chinese (only Mandarin, Min Nan/Min Dong and Hakka) and Korean transliterations to become lower case (various dictionaries use different standards for capitalisation in these languages; place and personal names are usually capitalised, but not by all dictionaries). There's definitely no need to treat language names, demonyms, month and weekday names as proper nouns. Various languages here just follow English when using proper nouns. Transliterations, which are never capitalised, don't need and don't benefit from this distinction at all. E.g. Arabic nouns are just nouns. --Anatoli T. (обсудить/вклад) 22:59, 20 October 2014 (UTC)

@Atitarev Actually, in Arabic there is a very important distinction between proper and common nouns. Proper nouns are automatically definite and never take the definite article الـ (al-) or possessive suffixes, and usually do not take nunation, in which case they also have a slightly different declension pattern. For example: مِصْرُ الْقَدِيمَةُ (miṣru al-qadīmatu, “Ancient Egypt”) and فِي مِصْرَ الْقَدِيمَةِ (fī miṣra al-qadīmati, “in Ancient Egypt”). The same applies to Hebrew and Aramaic. --WikiTiki89 21:05, 21 October 2014 (UTC)

Proper nouns never take the definite article in Arabic? So العراق, السعودية and الإسكندرية are common nouns? People sometimes make the same claim about English, that proper nouns never take the definite article, but then Netherlands, Gambia, and Philippines (not to mention Ukraine and Crimea in more old-fashioned varieties) would have to be called common nouns. —Aɴɢʀ (talk) 22:29, 21 October 2014 (UTC)

Well, in those cases, they don't take another definite article because the definite article is part of the proper noun. For your English examples, I would say that "the Netherlands" is the proper noun, while just "Netherlands" is an incomplete proper noun (or the plural of "Netherland"). --WikiTiki89 22:45, 21 October 2014 (UTC)

Yes, some proper nouns may become diptotes but this probably has to do with their definiteness, rather than the fact that they are proper nouns. The thing is also, not ALL proper nouns are diptotes, e.g. (with full vowelisation) مُحَمَّدٌ (muḥammadun) and, as Angr mentioned, they can also take a definite article, as in العِرَاق (al-ʕirāq) "Iraq" and الأُرْدُنّ (al-ʔurdunn) "Jordan", although the nisba doesn't have it: عِرَاقِي (ʕirāqī) "Iraqi" and أُرْدُنِي (ʔurdunī) "Jordanian". There are some rules about which proper nouns can be diptotes: the length, whether they are loanwords or native Arabic, the endings, certain patterns (e.g. "fuʿal"). --Anatoli T. (обсудить/вклад) 12:35, 22 October 2014 (UTC)

My whole point was that their definiteness (more so, the fact that they cannot be made indefinite and cannot take possessive suffixes) is what makes them proper nouns. Nisbas are not proper nouns, so I don't see how they are relevant. You cannot, for example, say مِصْرُكَ (miṣruka, “your Egypt”) or عِرَاقُكَ (ʕirāquka, “your Iraq”); or if you do say that, then you are turning it into a common noun. As for مُحَمَّدٌ (muḥammadun), I did use the word "usually" for a reason. --WikiTiki89 12:56, 22 October 2014 (UTC)

Well, it's obvious that proper nouns, like unique place names, are definite, but I personally don't really see this as a grammatical difference that separates them as proper nouns: they can sometimes take a definite article, they can also take possessive suffixes (converting to common nouns, if you wish), and they can sometimes be triptotes (and common nouns can be diptotes). These features are not reliable (and also hard to verify, since ʾiʿrāb is seldom written and not so often pronounced in full). I found some rules for diptotes for proper nouns, but my source doesn't mention how many are triptotes, so I'm not sure if the list is big. My nisba examples were just to show that الـ (al-) is not part of the word. Since Arabic grammarians do mention Arabic proper nouns, I'll drop this point specific to Arabic. I only think that language names and nationalities should be common nouns in Arabic, reserving proper nouns for place, people's and company names. --Anatoli T. (обсудить/вклад) 14:24, 22 October 2014 (UTC)

A much simpler solution would be to rename the current Noun to Common noun. I would strongly oppose the introduction of an Intransitive verb POS, but I think it's very helpful to readers to keep both POSs when they are meaningful in the language, these two kinds of nouns being used very differently. The precise limit between proper nouns and common nouns only depends on tradition in each language (e.g. we consider italien (the language), septembre or Parisien, a capitalized word, as common nouns in French). Note that, generally speaking, all proper nouns can be used as common nouns (but this does not make them common nouns), and common nouns can be used as proper nouns, this cannot be considered as ambiguity. Lmaltier (talk) 20:26, 21 October 2014 (UTC)

We need to indicate whether a noun is common or proper in some way. Whether this is in the POS heading or somewhere else makes little difference, but it seems that the POS heading is the most obvious and best place for it. Verbs do not need transitive/intransitive distinctions as much because it is usually obvious from the definition. --WikiTiki89 21:05, 21 October 2014 (UTC)
- What's the evidence that the two kinds of nouns are "used very differently"? They seem to be used exactly the same way to me: as the subject or direct object of a sentence, as the object of a preposition, etc. Why do we need to indicate this apparently undefinable and artificial distinction? And if we do, why is the POS heading the most obvious and best place for it? To the extent the distinction actually exists, it's usually obvious from the definition too. —Aɴɢʀ (talk) 22:29, 21 October 2014 (UTC)
  - In some languages, it's clear that they are used very differently, and that they are very different from the reader's point of view. In French, the article is usually used with common nouns, not with proper nouns (it's much less simple, e.g. the definite article is normal with most country names, but this is the general idea). Lmaltier (talk) 05:49, 22 October 2014 (UTC)
    - The only thing that's clear to me so far in this discussion is that many languages have nouns that are definite without the markers of definiteness that are usual in that language, such as being governed by a definite article, a possessive determiner or the like. But in none of the languages discussed so far is that set of nouns exactly coterminous with a set of nouns that can be defined by a semantic property such as being the name of a person, geographical location, language, etc. In English, Arabic, and German, most geographical names don't use the definite article, but some do, and a statement like "the definite article in the Netherlands is part of the name" is simply begging the question. In Irish, most language names do use the definite article except in certain constructions, but at least one (Béarla (“English”)) never uses it. So if we want to label nouns by this property at all, we should label them as being definite even without a definiteness marker, rather than implying that there is some sort of semantic property that causes nouns to be "proper nouns" and that their syntactic behavior results from that. —Aɴɢʀ (talk) 15:13, 22 October 2014 (UTC)
      - No, no, not at all, the only possible criterion is the tradition in the language. It was only an example to show that being a proper noun often has a major impact on the use, including grammatical rules to be used. Lmaltier (talk) 17:29, 22 October 2014 (UTC)

Another argument is that paper dictionaries including both common nouns and proper nouns sometimes have a fully separate part for proper nouns (it's the case of a best-seller dictionary for French: Petit Larousse Illustré). Readers may be used to this clear separation. Lmaltier (talk) 17:35, 22 October 2014 (UTC)

I don't think "tradition in the language" is a reason at all, especially since the vast majority of the world's languages don't have a tradition about it one way or the other. If the distinction between common nouns and proper nouns is linguistically real, it must be possible to come up with a definition that applies to all languages regardless of traditional grammars. —Aɴɢʀ (talk) 18:42, 22 October 2014 (UTC)
- I don't think so. In any case, stating that the French nouns poker, septembre or arménien (the language) are proper nouns would clearly be wrong. They are not proper nouns in French. Lmaltier (talk) 18:58, 22 October 2014 (UTC)
  - But why not? What definition of "proper noun" are you using to determine that? Capitalization alone? Because if that's the only criterion that can be used to distinguish proper nouns from common nouns, then the distinction is definitely nonlinguistic. —Aɴɢʀ (talk) 19:40, 22 October 2014 (UTC)
    - No, we consider Parisien as a common noun in French, too, despite capitalization. When I refer to tradition of the language, I mean that the general meaning is always the same (see proper noun), but how it's interpreted precisely may depend on languages in some cases (in most cases, it's the same in all languages recognizing proper noun as a word category). Lmaltier (talk) 19:52, 22 October 2014 (UTC)
      - So the distinction is made on the basis of native speakers' intuitions? A noun is a proper noun because it feels like a proper noun? —Aɴɢʀ (talk) 19:59, 22 October 2014 (UTC)
        This intuition is based on the tradition of the language, on how specialists of the language usually consider the word. In French, traditionally, proper nouns are names of places, people (and peoples), companies, brands, historical events, works of art or books, not much more. Sometimes, we hear about proper adjectives in English (seemingly according to capitalization), this word is meaningless in French. Lmaltier (talk) 05:59, 23 October 2014 (UTC)
        So still no definition, just an appeal to authority. I'm becoming more and more convinced there's no such thing as a proper noun. —Aɴɢʀ (talk) 12:05, 23 October 2014 (UTC)
        In French the definition is simple: a proper noun is used to describe a unique being or thing. Every modern French dictionary unambiguously distinguishes proper nouns from common nouns: Larousse, Robert, TLFi, Dictionnaire de l'Académie française... Just because you can't find a universal definition for a proper noun doesn't mean that you can ignore this distinction when it is part of a language like French. Dakdada (talk) 13:10, 23 October 2014 (UTC)
        
        If my family owns one dog and my mother says "Have you fed the dog?", then "the dog" refers to a unique being; does that make it a proper noun? What about language names like arménien mentioned above? Is that not a unique thing? Then why is it not a proper noun in French? Just because dictionaries invent distinctions to make life easier for language teachers, that doesn't mean those artificial distinctions are actually part of the language. —Aɴɢʀ (talk) 13:49, 23 October 2014 (UTC)
        That's just the + dog. It doesn't change the fact that dog is a common noun. Language names are debatable, but obviously I can't convince you if you really don't see any difference between e.g. city and London. Dakdada (talk) 16:30, 24 October 2014 (UTC)

The kind of thing that a proper name names can include a lineage (real, hypothetical, or conventional), as a Roman gens or a taxon. It can include a people, race, tribe, breed, family?, etc, even when they are not lineages. All of these can be plural in form, but they are considered to be referring to a single entity. Such a word, whether singular or plural, when referring to an individual member or subset of any such grouping, seems to me to be a common noun.

More generally it is a question of convention, as almost all actual language is, as opposed to part of some ephemeral rational scheme, purported to be universal and timeless, but actually just a hypothesis.

If a given definition has exceptions, that does not invalidate the definition, which is usually of the typical member of the class. Wittgenstein's discussion of game (or was it Spiel?) should be informative. DCDuring TALK 14:29, 23 October 2014 (UTC)

A distinction can be made for analytical purposes (not, IMO, for Wiktionary presentation purposes) between proper names and proper nouns. Mary is a proper noun, sometimes serving as a proper name (where the context makes it sufficient to uniquely identify the individual) and sometimes as part of a noun phrase (Mary Ellen Smith) that serves as a proper name in other contexts (but not necessarily all possible contexts). That White House is a proper name, which we present as a proper noun, does not make House a proper noun or proper name. House is a proper noun by virtue of its use as a surname.

It is hard for me to believe that the request for a definition is anything but a rhetorical ploy, as such definitions are abundant and adequate for most purposes. If we need something more for purposes of knowing what goes under a given language's Proper noun heading or into the category, we can impose the host language's conventions, either universally or by default, allowing exceptions for the conventions of other languages. We already allow orthographic departure from English usage and certainly don't impose English grammar (eg, use of determiners) on other languages, not even PoS headers, useful though they may be. If someone would like to document the proper noun/proper name practices of a language in an appendix, they would be doing the project a service. DCDuring TALK 14:29, 23 October 2014 (UTC)

No, the request for a definition is not a rhetorical ploy. I'd genuinely like a definition because I am often uncertain whether to label a particular noun as a ===Proper noun=== or not, especially in languages other than English. Usually I simply have to rely on how the English equivalent is labeled. Most conventional definitions seem to be circular and therefore useless, as in: "When is a noun capitalized in English? When it's a proper noun. OK, so when is a noun a proper noun in English? When it's capitalized." Either that or hopelessly vague, as in "a proper noun is the name of a specific, unique being", which doesn't explain why The Hague is a proper noun that just happens to include the word the, but the dog is a common noun made definite by the presence of the definite article. —Aɴɢʀ (talk) 17:23, 24 October 2014 (UTC)

The Hague is the name of a particular city. The dog is not the name of a particular dog (just the + dog). It has nothing to do with the definite article or the capitalization, which are secondary and language related. If you want definitions, what about w:Proper noun? Dakdada (talk) 17:53, 24 October 2014 (UTC)

If a distinction can be made between definite and indefinite reference, then it's a common noun. Otherwise it's a proper noun. If both, then it's both. --Ivan Štambuk (talk) 18:38, 24 October 2014 (UTC)

Wiktionary:Votes/pl-2011-12/Merging_proper_nouns_into_nouns. --Dan Polansky (talk) 17:55, 24 October 2014 (UTC)
Recommended reading: User:EncycloPetey/English proper nouns. --Dan Polansky (talk) 18:08, 24 October 2014 (UTC)

I can't venture anything about languages other than English.

Not all capitalized words or expressions in English are proper nouns. You should discard with prejudice any reference that says otherwise.

The Hague (sometimes the Hague) is a proper noun because of its definition. I expect that it has the attached because it is a calque of Den Haag.

The is attached to Netherlands in running text (but not in mailing addresses, etc.), probably because of the historical Nether Lands, whether factual or imagined.

In English it is usually not too hard to distinguish in current and recent usage between a definite expression (usually with the) that describes or characterizes something and a proper name that includes the. But it was not too long ago that an expression like "John, sawyer" served to uniquely identify someone on parish rolls.

In English the incompatibility of a proper name with a or any or every seems more indicative than the presence of the.

In English the hand of history and fashion is very visible. Usage dictates. How each usage gets started or terminates can be a very particular story. As a result I don't think there is a short list of rules and exceptions that covers all the cases. That is why WP needs a style sheet that documents its decisions about capitalization and why the taxonomic naming authorities have explicit rules. And why users need dictionaries and style guides. Wiktionary can do a better job of providing such lexical information than other references if we continue to be willing to do so. We can check corpora and style guides so users who trust us don't have to. DCDuring TALK 18:44, 24 October 2014 (UTC)

Small, doable modification to WT:CFI#Idiomaticity

WT:CFI#Idiomaticity sentence #1:

An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components.

Change this to

A multi-word term is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components.

The changed part here is A multi-word term.

Rationale: WT:CFI does not define what an expression is, and the Wiktionary entry expression isn't any help either. Some multi-word terms like come in may not be considered expressions. Multi-word term is vastly better than term, because term could include single words with transparent meanings, like improvable, points (plural of point), reenter (“enter again”) and so on. I'm canvassing to see if there's enough support to make a vote of it. Renard Migrant (talk) 12:01, 23 October 2014 (UTC)

The sentence is better, but is it really useful anyway? Idiomaticity of multi-word terms should not be a condition for inclusion. ice hockey cannot be considered as idiomatic. Nonetheless, it's a term of the English language, and including it is therefore normal. Lmaltier (talk) 16:54, 23 October 2014 (UTC)

Support. I think it is useful. — Ungoliant (falai) 17:16, 23 October 2014 (UTC)

Lmaltier, I appreciate your input, but we also know from past experience it's just you that thinks this. Also I do consider ice hockey idiomatic. It has very different rules to hockey. Like, is table tennis merely tennis played on a table? I certainly don't think so! Renard Migrant (talk) 11:52, 24 October 2014 (UTC)

Of course. Nonetheless, the meaning can be easily derived from the meaning of its separate components (provided you know the sport, even without knowing its name). I copy the definition of idiom: An expression peculiar to or characteristic of a particular language, especially when the meaning is illogical or separate from the meanings of its component words. table tennis is not something peculiar to English or characteristic of English, and its meaning is not illogical nor separate from the meanings of its component words. You understand why I don't like this sentence as a criterion. Lmaltier (talk) 18:18, 24 October 2014 (UTC)

I have to oppose. I think the term "expression" was intended to cover both single words and multi-word terms. The new wording would not do that. Therefore, the new wording would no longer define what CFI:idiomatic means for single words like "redefine". Right now, "redefine" is idiomatic because its components are not separate enough. --Dan Polansky (talk) 17:41, 24 October 2014 (UTC)
- But that interpretation is not the status quo. The status quo, although it's an unwritten rule, is to accept all single words (for varying interpretations of "word") as idiomatic regardless of morphological transparency. Or to say it another way, idiomaticity is not a factor in the inclusion of single "words". —CodeCat 18:23, 24 October 2014 (UTC)
  - What I have written is consistent with current common practice. For instance, we include "blueness", since while "blueness" is clear from "blue" and "-ness", the two are not separate, which matters for "An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components." --Dan Polansky (talk) 22:39, 24 October 2014 (UTC)
    - I can see this interpretation, just it wouldn't be my interpretation; blue and -ness are separate. "Separateness" doesn't mean "separated by a typographical space". Renard Migrant (talk) 13:15, 25 October 2014 (UTC)
      - I like your interpretation very much, and a good spot, well done! However if you think of a cake made of eggs, flour, sugar and butter, when you've made the cake are eggs, flour, sugar and butter separate ingredients or not? Of course they are! Just because you've used them to make one cake doesn't mean they aren't separate concepts. Renard Migrant (talk) 19:23, 26 October 2014 (UTC)
        Then please extract eggs, flour, sugar and butter from your cake. If they are separate as you claim, this should be easy. Also: talking to yourself is a bad habit for a dungeoneer. — Keφr 19:33, 26 October 2014 (UTC)
        
        Another thing to note: a criterion like WT:COALMINE makes much more sense if single-word terms are automatically presumed idiomatic. — Keφr 19:47, 26 October 2014 (UTC)

Dialect context labels - adjective, dialect name or place name?

There's something vaguely weird on croggan: The first sense is described as "Cornish" while the second is "Scotland", and the mixing of parts of speech stands out a bit. This isn't an isolated thing - among other British Isles dialects, we have "Wales", "Ireland", "Teesside" and "Yorkshire", but "Geordie" (rather than "Newcastle-upon-Tyne" or "Tyneside"), "Bristolian" (rather than "Bristol"), "Manx" (rather than "Isle of Man"), "Northumbrian" (instead of "Northumbria") and "Liverpudlian" (rather than "Liverpool", "Merseyside" or "Scouse").

I understand why we can't use (for example) Welsh or Irish as context labels in English-language entries (and by that logic, "Manx" is probably inappropriate too since there's a Gaelic Manx language), but the mishmash is a bit strange. Would people object to changing the labels to follow this pattern?

We use the proper name of the city/region that spawned it, except for in the handful of cases where the dialect has a widely-understood name that is not etymologically related to its origins (Geordie, Pitmatic, Cockney, Scouse - possibly Cajun, although I don't know whether everything currently tagged "Louisiana" is actually Cajun English.)

It just seems a bit cleaner that way. "croggan" would then be (Cornwall, Scotland), "mam" would be (Scouse, Northumbria). Smurrayinchester (talk) 16:40, 23 October 2014 (UTC)

I think that using the adjective could be more practical. It would allow us to distinguish terms used in a place from terms used in the context of discussing a place. —CodeCat 16:45, 23 October 2014 (UTC)

I prefer using placenames. Using placenames in context labels for senses discussing the place is usually confusing and can always be improved by removing the label and amending the definition (i.e., at ABC “(Brazil) cities that form the most important industrial area in the country.” → “(geopolitics) cities that form the most important industrial area in Brazil.”). — Ungoliant (falai) 17:11, 23 October 2014 (UTC)

When we had context labels rather than a module, we used to redirect things like {{Scottish}} to {{Scotland}} so that both displayed (Scotland). I see no reason to discontinue this. Having said that, an adjective is better if it's more accurate or easier to understand, so Geordie rather than Tyneside; I'm fine with that. Renard Migrant (talk) 11:55, 24 October 2014 (UTC)

We still do that, only everything is within the module. --WikiTiki89 14:52, 24 October 2014 (UTC)

Good, then let's keep doing that, unless people don't want to. Renard Migrant (talk) 13:17, 25 October 2014 (UTC)

I've noticed this inconsistency myself. The other 'restricted register' labels I can think of are adjectives ("dated", "archaic", "obsolete", "uncommon", "rare", etc), whereas the labels I can think of that are nouns indicate restricted topical contexts ("mathematics", "aviation", etc). Context labels should indicate when a word is restricted to a certain place's dialect, while definitions should indicate when it's topically connected to a certain place, IMO. Ungoliant has a good example of how to clear up a misuse (or at a minimum a confusing use) of "(Brazil)". So, my inclination would be to make all the 'dialect' labels adjectives, noting that "UK" and "US" are adjectives and so can stay as they are. - -sche (discuss) 04:26, 27 October 2014 (UTC)

Black's Law 2d going up at Wikisource

Just a heads up - I am currently creating OCR pages of Black's Law Dictionary, 2d Edition (1910) at Wikisource, and would eventually like to bring as much of it as is useful over here. Cheers! bd2412 T 21:02, 23 October 2014 (UTC)

Cool! Maybe you should make a Template:Black's 1910 or something, similar to {{Webster 1913}}, for entries taken from it. —Aɴɢʀ (talk) 21:26, 23 October 2014 (UTC)

Yes, that is a very good idea. bd2412 T 21:27, 23 October 2014 (UTC)

@BD2412: That is excellent news. Thank you for your efforts. — I.S.M.E.T.A. 23:27, 23 October 2014 (UTC)

Cool. Even if you don't bring it here. DCDuring TALK 00:27, 24 October 2014 (UTC)

Maybe you could link all the terms here, like (a lot of work!) DTLHS (talk) 00:53, 24 October 2014 (UTC)

Where is it? --Ivan Štambuk (talk) 18:31, 24 October 2014 (UTC)
- s:Index:Black's Law Dictionary (Second Edition).djvu. —Aɴɢʀ (talk) 19:05, 24 October 2014 (UTC)
  - Here is a treasure: "HALYWERCFOLK. Sax. In Old English law. Tenants who held land by the service of repairing or defending a church or monument, whereby they were exempted from feudal and military services". bd2412 T 15:57, 25 October 2014 (UTC)
    - Sadly, having done a bit of digging in the hope of creating an entry, it looks like the concept of halywercfolk/hailworkfolk/Holyworkfolk/holy-work-folk was only ever invoked once, when the Bishop of Durham tried to get the men who maintained the shrine to St. Cuthbert to fight the Scots. I've created an entry here, but all the citations seem to be about the same group of people. Smurrayinchester (talk) 08:50, 26 October 2014 (UTC)

Meta RfCs on two new global groups

Hello all,

There are currently requests for comment open on meta to create two new global groups. The first is a group for members of the OTRS permissions queue, which would grant them autopatrolled rights on all wikis except those that opt out. That proposal can be found at m:Requests for comment/Creation of a global OTRS-permissions user group. The second is a group for Wikimedia Commons admins and OTRS agents to view deleted file pages through the 'viewdeletedfile' right on all wikis except those that opt out. The second proposal can be found at m:Requests for comment/Global file deletion review.

We would like to hear what you think of both proposals. Both are in English; if you want to translate them into your native language, that would also be appreciated.

It is possible for individual projects to opt out, so that users in those groups do not have any additional rights on those projects. To do this, please start a local discussion, and if there is consensus you can request to opt out of either or both at m:Stewards' noticeboard.

Thanks and regards, Ajraddatz (talk) 18:04, 26 October 2014 (UTC)

I think you mean 'requests for comment'; here 'RfC' usually means 'request(s) for cleanup'. Renard Migrant (talk) 19:19, 26 October 2014 (UTC)

Mari terminology

Is there any particular reason why the two literary standards of Mari (the Uralic one) have been titled "Eastern Mari" and "Western Mari"? Following Ethnologue? I would suggest that "Meadow Mari" and "Hill Mari" are preferable, for at least two reasons:

The traditional subethnic self-designations are specifically "Meadow Mari" and "Hill Mari"
There exists an "Eastern dialect" (spoken in Bashkortostan) distinct from standard Meadow Mari. Hence the term "Eastern Mari" is ambiguous.

--Tropylium (talk) 14:46, 27 October 2014 (UTC)

In the case of Western Mari, yes, the name was just imported from the ISO / Ethnologue along with the code. Eastern Mari was previously called just "Mari", until "Mari (Sepik)" and "Mari (Austronesian)" were added to Module:languages and disambiguation became necessary. At that time (see the archived discussion; skip the first half, which is about Buryat) I went with "Eastern Mari" over "Meadow Mari" so as to conform to "Western Mari", and because "Eastern Mari" seemed to be more commonly used than "Meadow Mari". Oddly enough, Andrej Malchukov and Anna Siewierska's Impersonal Constructions: A cross-linguistic perspective →ISBN, page 397, suggests that "Eastern" and "Western Mari" are the linguistic self-designations: "Mari has two literary variants Hill and Meadow Mari (or Western and Eastern Mari according to their own terminology)". OTOH, the difference in commonness is not large if you cut out the exceptional year 2003 (compare to and ), and there is the ambiguity you note: a few references refer to three or four Mari dialects and distinguish "Eastern" from "Meadow". And "Hill Mari" seems to be more common than "Western Mari". So I wouldn't object to renaming them both. - -sche (discuss) 20:21, 27 October 2014 (UTC)

I support such renaming. — I.S.M.E.T.A. 01:46, 1 November 2014 (UTC)

WMF grant request for a "Kids Visual Dictionary"

Hello all, I co-designed a Wikimedia outreach project to get a group of Indian kids to learn computer graphics while creating a real Wikipedia picture dictionary for basic English which they could be proud of! The whole team will be under the management of a professional graphic designer who previously worked at Yahoo Inc India. The IEG proposal is detailed there on meta. We are obviously thinking of illustrating the Wiktionaries for the most frequent words, and since the data will be structured, it could also help to build up further resources for various languages. As we are competing with other great projects as well, please take a look; your support for this Kids Visual Dictionary is also very welcome (here). Yug (talk) 18:45, 29 October 2014 (UTC)

Hello. You might like to take a look at the existing Wiktionary:Picture_dictionary. I don't think anybody has been actively working on that for a while, but a certain amount of work was done. Equinox ◑ 21:36, 29 October 2014 (UTC)

Rhymes

Is there a Wiktionary policy on which dialects to include in pronunciations and in rhymes? For example, your has pronunciations that would be non-standard on this side of the pond (including a recent addition that is common in my home dialect, but which I would expect to see only in a dialect dictionary). If we include every dialect variation, then the pronunciation section will take up the whole initially displayed page for many entries. Do we include all regional vowel mergers in rhymes? Personally, I would prefer to see only the "standard" pronunciations and rhymes, as given in major dictionaries, but I realise that we will probably not agree on "standard". Dbfirs 11:07, 30 October 2014 (UTC)

The pronunciation /jɝ/ is actually very common in American English and has nothing to do with vowel mergers, but rather with the re-stressing of a previously unstressed and reduced vowel. It seems strange to me, however, to include rhymes at all for your. As far as I know, this word can never occur in a rhyming position, because it must always be followed by a noun; otherwise it becomes yours (or its homophone you're is separated back into its parts you are). The only sort of rhyme I can imagine for it is something like "your chin" and "urchin". But I would definitely include /-ɜː(ɹ)z/ as a rhyme for yours. --Wiki Tiki 89 12:09, 30 October 2014 (UTC)

So does your rhyme with year and were in many parts of America? (Strangely, though that rhyme exists in my local dialect, I can think of no part of England where yours is pronounced /jɜː(ɹ)z/. Maybe in Ireland?) Do words like insure, secure and mature also rhyme with refer and deter in those parts of America, and is demure a homophone of demur? Dbfirs 13:03, 30 October 2014 (UTC)

With were, yes (although, keep in mind that the pronunciation /jɔɹ/ is still used interchangeably with /jɝ/), but certainly not with year, which is pronounced /jiːɹ/. Words like insure, secure, and mature do rhyme with refer (unless mature is pronounced /matuːɹ/, which is rare even in dialects with otherwise regular yod-dropping), but in some dialects they rhyme with core instead; demure and demur still differ by the /j/ sound. --Wiki Tiki 89 16:04, 30 October 2014 (UTC)

Does the /j/ mean that they do rhyme or not? If we include every dialect world-wide, we will end up with lots of rhymes that are nonsense to the majority of speakers of English. Dbfirs 16:34, 30 October 2014 (UTC)

Tough one. Show has {{rhymes|əʊ}} but not {{rhymes|oʊ}}. I can see why it's preferable to have only one rhyme, but how do you pick? How is this different from colour and color (that is, in terms of an 'alternative form' template, which neither has)? Renard Migrant (talk) 15:25, 30 October 2014 (UTC)

That's the way we standardized it, since RP /əʊ/ always corresponds to GA /oʊ/. --Wiki Tiki 89 16:04, 30 October 2014 (UTC)

Yes, {{rhymes|oʊ}} doesn't exist because it would be identical to {{rhymes|əʊ}}. Personally, I'd prefer the former, but 1950s RP has the latter. Should one redirect to the other or should we just add a note to {{rhymes|əʊ}}? ... and it would be {{rhymes|oː}} in my dialect! Dbfirs 16:31, 30 October 2014 (UTC)

I would prefer "oʊ". It's more neutral because it's in the middle between the extremes (oː on one side and əʊ/əʉ on the other). —CodeCat 16:47, 30 October 2014 (UTC)

That makes sense to me, though, as a courtesy, I'd like to get the agreement of the creator of the rhymes section who put a lot of work into it. It would be the same rhymes page, just a different heading. Perhaps the heading could include both /əʊ/ and /oʊ/, then we wouldn't need to change all the entries. I wasn't seriously suggesting {{rhymes|oː}} because I don't think we need to include hundreds of regional variations. Dbfirs 18:36, 30 October 2014 (UTC)