Wiktionary:Beer parlour/2013/July
The point I raised on that page is that these templates violate WT:ELE. So we should either modify that page to allow {{ja-romaji}} to take the place of a definition line (which requires a vote like all significant changes to WT:ELE do), or we should not use this template in its current form. —CodeCat20:25, 1 July 2013 (UTC)
It looks like that vote doesn't even mention {{ja-romaji}} at all, even though it's the key in this whole discussion. So it doesn't seem very relevant. Besides, the vote did not pass so it's moot. —CodeCat20:42, 1 July 2013 (UTC)
Eirikr, should editors working with Japanese stop maintaining/adding romaji entries entirely and focus on kana and kanji? Convert romaji entries, which have only one value, to redirects, and let opponents maintain romaji if they oppose the new structure so badly and refuse to listen to editors who actually contribute in the Japanese space. It's quite discouraging and frustrating. Just a thought. Otherwise, I strongly oppose yet another proposal to undermine our efforts. --Anatoli (обсудить/вклад) 22:49, 1 July 2013 (UTC)
Actually, Anatoli, that might be exactly the answer. :)
@CodeCat, you've put together your module for deriving romaji from a given kana string, as used by {{ja-suru}}, for instance. Could you and/or Liliana leverage that to undertake the bot-driven creation and maintenance of JA romaji entries, in accordance with whatever the outcome is of the current formatting vote? Such entries are purely simple links with no gloss; theoretically, no human need intervene, once a bot is up and running. I've toyed with the idea of getting a bot going for this purpose before, but with the state of JA romaji entries in such flux, and with everything else going on in life, I haven't gotten around to learning everything required to make a bot.
FWIW, my opposition is entirely because this change impacts JA editors almost exclusively, and because this requires changes in what human editors do, as romaji entries are currently entirely human-created and human-maintained. If romaji entries are instead entirely bot-maintained, my concerns evaporate, as does my opposition to the vote. -- Eiríkr Útlendi │ Tala við mig23:03, 1 July 2013 (UTC)
I fundamentally disagree with the notion that this "impacts JA editors almost exclusively". I think Liliana's comment on the vote talk page is apt: you want all bot operators to code their bots to account for Japanese entries using a different basic structure than other entries. - -sche(discuss)23:17, 1 July 2013 (UTC)
How many bots look for the specific autoformatting issues presented by {{ja-romaji}}? The *only* concrete concern I've heard about is with regard to KassadBot. If other bots are choking on {{ja-romaji}} as well, by all means, clue me in. Otherwise, Liliana's comment is purely theoretical outside of KassadBot, and wildly exaggerated to boot -- I'm not asking 999 out of 1000 computer users to switch from Windows to Linux, I'm asking those specific computer users out of the 999 Windows users (which is some much smaller number) to use a specific program when working with specific files. More in the vein of this thread, I'm not asking all bot operators to completely rewrite their bots on a completely different platform with a completely different coding paradigm, and instead I'm asking those bot operators whose bots choke on {{ja-romaji}} to add a few lines of logic to handle this specific case. -- Eiríkr Útlendi │ Tala við mig 23:26, 1 July 2013 (UTC)
A couple of things:
If you make a pause in contributing Romaji for a while, you may create more content-having Japanese entries.
Now that they are content-free, Romaji entries can be created by a bot, so if humans stop creating them manually, that will actually save human effort. If the editors previously contributing Romaji are no longer enthusiastic about Romaji, new editors will appear. After all, the only thing it takes to create Romaji with a definition line in the wiki text is to expand the current template with "subst:".
Having unified formatting helps all sorts of ad-hoc reporting over a dump using such tools as grep, sed, awk, perl oneliners and the like, regardless of whether current bots choke on disunified formatting. Reporting over a dump is often done without a connection to Mediawiki server and without having templates expanded. I have done such reporting and have no idea how to expand templates in a dump. Arbitrarily breaking assumptions that such ad-hoc reporting relies on is no good thing. I am talking from actual experience. --Dan Polansky (talk) 16:24, 3 July 2013 (UTC)
Re: leaving romaji entries be for the time being, I do hope these can be bot-maintained. I will be ignoring them for the foreseeable future, other than possibly inquiring about bots.
Re: dumps, this is very useful concrete information that either wasn't presented before, or that I missed. Without this information, I am left with the impression that this issue boils down to bot maintainers versus Japanese editors, which isn't a very useful mental model. Thank you for explaining. -- Eiríkr Útlendi │ Tala við mig18:05, 3 July 2013 (UTC)
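To make the dump-reporting concern above concrete, here is a minimal sketch of the kind of ad-hoc check that assumes every language section has at least one definition line starting with "#". It works on raw wikitext with no template expansion, so a section whose only content is a template call gets flagged. The function name and the sample page are made up for illustration; this is not any existing bot's code.

```python
import re

# Matches a level-2 language heading such as "==Japanese==",
# but not deeper headings such as "===Noun===".
L2_HEADING = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)


def sections_without_definitions(wikitext):
    """Return the names of language sections that have no line starting with '#'."""
    missing = []
    headings = list(L2_HEADING.finditer(wikitext))
    for i, match in enumerate(headings):
        # A section runs from its heading to the next level-2 heading (or end of page).
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        body = wikitext[match.end():end]
        if not any(line.startswith("#") for line in body.splitlines()):
            missing.append(match.group(1).strip())
    return missing


# Hypothetical page text: the Japanese section has a template where
# the definition line would normally be.
sample = (
    "==English==\n===Noun===\n# a drink\n"
    "==Japanese==\n===Romanization===\n{{ja-romaji}}\n"
)
print(sections_without_definitions(sample))
```

A report like this cannot tell a deliberately definition-free romaji entry from a malformed one, which is the assumption-breaking that the comment above describes.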
Here is a discussion regarding these two templates and their corresponding categories; it was suggested that we bring it up at BP. This type of categorization is useful because, in many cases, the only thing we know about the source language is which of these categories of the language family it belongs to: Old, Middle, or New. Note that this is not a purely chronological division, but also a linguistic one. Similar classification exists for other language families, such as Indo-Aryan (Old Indic, Middle Indic, Modern Indic). --Z 11:00, 2 July 2013 (UTC)
If it helps others to understand the issue, think of these categories as being analogous to "modern Germanic languages" or "Celtic from the 10th to 15th century". They are genetic groupings (like a real family) but with the addition of a time frame. —CodeCat11:54, 2 July 2013 (UTC)
Is it ok to replace gender templates with Template:g?
A week ago I made a post at Wiktionary:Beer parlour/2013/June#Propose to use a single template for genders, but that didn't receive any attention so I was worried that if I started replacing things, others would complain. So I am now explicitly asking if it's ok to do this. Basically, all remaining transclusions of gender templates would have {{g}} prefixed to them, so that instead of the variety of templates we currently still have ({{m}}, {{f}} and so on), there would be only this one. This template would be used only in entries themselves; in templates you'd just invoke Module:gender and number directly. This change will have some consequences for scripts and bots that still use the old templates, but they will probably not be orphaned and deleted for some time, so there is no sudden rush to fix everything yet. —CodeCat14:11, 3 July 2013 (UTC)
What are the consequences to users, contributors, and template writers in terms of appearance, performance (e.g., download time), keystrokes required, etc.? If there are costs, what are the offsetting benefits and to whom?
Also, do you have a system for converting wikicoded genders, eg ''f'' to whatever your desired gender formatting approach is right now? DCDuringTALK14:23, 3 July 2013 (UTC)
There are no consequences at all to users or template writers. This change applies only to cases where these templates are placed directly in entries for some reason. All that changes is that you write {{g|f}} instead of {{f}} if you want to put a gender in an entry. But even that is pretty rare, because most of the time you will use a headword-line template like {{head}} or {{fr-noun}}, which have their own support for genders and therefore these templates are not placed in the entry itself but are handled internally by the template. There is nothing to be done for things like ''f'', although of course it is probably desirable to convert those to another format (like the one I propose) at some point. But finding that will be hard, it would probably need someone to go through a dump (which I'm not experienced with) to find and list all the instances. —CodeCat14:36, 3 July 2013 (UTC)
I've made a list of all pages that use ''f'', ''m'', ''n'', ''c'', ''p'' or ''pl'': Wiktionary:Todo/Non-templatised genders. There may be uses of those strings that shouldn't be templatised, so some human inspection of them is necessary before any mass/automated changes are undertaken. - -sche(discuss)21:26, 3 July 2013 (UTC)
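For anyone wanting to reproduce a list like this from a dump, a rough sketch of the search might look as follows. It only looks for the six italicised strings named above and deliberately skips bold markup such as '''f'''. The function name and sample text are invented for illustration, and the hits still need the human inspection mentioned above, since ''f'' can also be an ordinary italic letter.

```python
import re

# Two apostrophes, one of the gender strings, two apostrophes.
# The lookarounds reject bold markup such as '''f'''.
BARE_GENDER = re.compile(r"(?<!')''(f|m|n|c|pl|p)''(?!')")


def find_bare_genders(wikitext):
    """Return (line_number, gender) pairs for each non-templatised gender marker."""
    hits = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        for match in BARE_GENDER.finditer(line):
            hits.append((number, match.group(1)))
    return hits


# Hypothetical translation-table lines.
text = (
    "* French: [[chaise]] ''f''\n"
    "* German: [[Stuhl]] ''m''\n"
    "'''f''' is bold, not a gender\n"
    "* Latin: [[arma]] ''n'' ''pl''"
)
for line_number, gender in find_bare_genders(text):
    print(line_number, gender)
```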
No consequence = 2 extra keystrokes per insertion for the manual assertions, which are there "for some reason" that is evidently beyond understanding. I am getting weary of all the extra keystrokes. Where are the changes that save effort instead of costing effort? DCDuringTALK00:03, 4 July 2013 (UTC)
If typing is so hard for you, why don't you save yourself some keystrokes and stop nitpicking? :/ —CodeCat00:10, 4 July 2013 (UTC)
Because God knows where this will stop. You've caused a significant waste of keystrokes with the changes to "context". If you don't get some pushback from someone, we will end up with one baroque system after another and no actual content contributors. I feel that our previous tech-side contributors seemed to take more care to not constantly change the user interface in one petty way after another. They seemed to enjoy actually making the site easier to use. DCDuringTALK01:43, 4 July 2013 (UTC)
I understand and partially share your frustration. I supported the change to {{context}}, because the old {{context}} was and is dilapidated, and changing how it is called from entries is a first step to rewriting it. I agree that all the added keystrokes are a hurdle, though. Is there any reason I shouldn't redirect {{cx}} so that it can be used as a shortcut? (Recall, if you favour a different abbreviation/shortcut, that we can create more than one.) And can the switch from the current {{context|archaic|lang=foo}} format to the shorter {{cx|foo|archaic}} format be prioritised?
As for these gender templates... I tend to agree that {{m}} and {{f}} should be left as redirects to the module, so that they can continue to be invoked in entries as they have been. I don't see any benefit to replacing them with {{g|m}}, and I see the same drawback that you (DCDuring) see. And as much as I'd like to see {{c}} commandeered for use as a redirect to {{context}}, it would be counter-intuitive for some but not all of the gender templates to be able to be used directly vs through {{g}}, so it's probably best to leave it (and {{n}} and {{p}}) as gender templates. Indeed, if Polish is ever moved out of the way, {{pl}} could be added as a redirect to {{p}} ({{g|p}}). - -sche(discuss)02:03, 4 July 2013 (UTC)
All of these templates really contain the same code, though. Just look at what is inside {{m}} and {{f}}. Why do we need many that do pretty much the same when this can be unified more clearly into a single template? More to the point, {{g}} can do more than {{m}} can. If you want to display "masculine animate" for example, you cannot use {{m}}, and since there is no {{m-an}} template, you need {{g|m-an}}. Thus, we can never get rid of {{g}} without sacrificing functionality, and that means these templates are already shortcuts to common cases, but they can never be exhaustive because Module:gender and number and its companion template {{g}} are just that much more flexible than the other templates can ever hope to be. —CodeCat02:09, 4 July 2013 (UTC)
So what? We often use some templates for common cases and other templates for more complex and/or obscure cases (e.g., adjective declension templates). There's no reason not to keep {{m}} and {{f}} and other very common genders' templates as shortcuts, while using {{g}} in more complex (and obscure) cases. How many entries need to specify something outside of {{head}} or another headword or linking template as "masculine animate"? Probably not many, compared to how very many specify things as "masculine". Unlike contexts, which there are a very large (arguably infinite) number of, there are only very few genders which are common across the thousand languages we cover: m, f, n, c, p (and perhaps, once we greatly expand our hitherto tiny coverage of Native American languages, animate and inanimate... but masculine animate seems comparatively rare; how many languages use it, a few dozen?). - -sche(discuss)02:21, 4 July 2013 (UTC)
Note that the change may eventually save characters: we can use the old gender templates for more important templates, e.g. {{m}} for {{term}} (mentioned term), or {{c}} for {{context}}. Moreover, usually there is no need to use gender templates directly. --Z 21:19, 5 July 2013 (UTC)
I suppose that's right. The gender templates do next to nothing. If the gender annotation isn't handled by the inflection-line template or a user doesn't figure out those templates, there's always good old wikitext formatting. And AF or bot runs can pick those up. I can't imagine that we will soon have a complete user-friendly, error-trapping interface for our entries by default. We just have to make sure that we don't delete contributions just because they don't use templates or have deficient formatting. DCDuringTALK21:56, 5 July 2013 (UTC)
I've created this template now. It works the same as {{context}} with one difference: the language code is given as the first parameter rather than as lang=. Both templates will continue to work for now, so there is no immediate need to start using this, but it's now available. I am not sure whether to call it "label" or "labels" though. The module is called Module:labels already, and this template technically shows many labels rather than just one label. —CodeCat16:38, 4 July 2013 (UTC)
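As an illustration of the one difference described above (language code as first parameter instead of lang=), here is a small sketch of how existing calls could be rewritten mechanically. Since the template name is still undecided, `new_name` is a placeholder argument, and the function itself is hypothetical rather than part of any existing bot. It only handles simple calls: nested templates and spacing such as "lang = foo" are left alone.

```python
import re

# A {{context|...}} call whose parameters contain no nested braces.
CONTEXT_CALL = re.compile(r"\{\{context\|([^{}]*)\}\}")


def convert_context(wikitext, new_name="label"):
    """Rewrite {{context|a|b|lang=xx}} as {{new_name|xx|a|b}}."""

    def rewrite(match):
        lang = None
        labels = []
        for param in match.group(1).split("|"):
            if param.startswith("lang="):
                lang = param[len("lang="):]
            else:
                labels.append(param)
        if lang is None:
            # Without a language code there is nothing to move; keep the call as it is.
            return match.group(0)
        return "{{" + "|".join([new_name, lang] + labels) + "}}"

    return CONTEXT_CALL.sub(rewrite, wikitext)


print(convert_context("# {{context|archaic|lang=foo}} a thing"))
```

Leaving calls without lang= untouched is a deliberate choice in this sketch, so that a human can decide which language code belongs there.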
Spelling variations by introduction of repetitive vowels for emphasis.
It is trivially easy to find CFI-worthy citations for certain interjections containing the addition of repetitive vowels for emphasis. For example, a writer might write pleeeease with extra e's added to indicate the plaintive nature of the plea. See, e.g.:
1965, Kenneth Theodore Mackenzie, The Deserter, page 52:
"Please, Japie, pleeeese," said Gert, beginning to cry.
1996, Ginny Russell, Step by Step, page 24:
As soon as you get home, pleeeese put clean sheets on your bed, loan your snakes to Susie, scrub bath tub, hide the liquor, and call me at work.
2008, Ray Green, The Seventh Sense, page 139:
Oh, God, please, please, pleeeese help me. I'll be good. I won't do it again. I promise. Pleeeese let this fucking market fall.
2013, Sherry Schumann, The Christmas Bracelet, page 93:
“More juice pleeeese.” John grabbed another carton of orange juice from the door of the refrigerator.
I have found multiple examples with up to eleven e's (but, oddly, no CFI-worthy hits with more than that). Other very common examples include the addition of o's to "love", "no", and "stop", of e's to "help", and of u's to "you". Should we include these? If so, should we redirect them to the correct spellings, list them as alternative spellings, perhaps as eye dialect spellings indicating that their use suggests an exceptionally forceful use of the word by the speaker? Or should we do something else entirely with them? bd2412T19:16, 4 July 2013 (UTC)
It seems help would just about pass CFI with 12 es. I don't think we would be doing anyone any favours (or using our time productively) by including all such forms. It is to be understood that English virtually never has 3+ of the same letter together in a word, and such instances are probably indicating a vocal prolongation, rather than trying to coin a new "word". Equinox◑19:26, 4 July 2013 (UTC)
It's one of those cultural things, like pig Latin and double Dutch, that turn up in speech, but aren't really lexical in nature (although some instances of those have become lexicalized and are included). A similar phenomenon is words that are truncated to indicate interruption: "I keep trying to tell y-" "I don't want to hear it!".
This can potentially occur with any word, with varying numbers of letters. It could get pretty involved documenting all the attestable ones. Perhaps we could create redirects for some of the more common ones, but some languages have orthographies that don't write phonemic glottal stops and/or show length by repetition, so there's at least the potential for conflicts on the shorter sequences. Chuck Entz (talk) 20:13, 4 July 2013 (UTC)
I was referring to the possibility of a redirect having the same spelling as a regular word in another language. I'm thinking of creating an appendix to document things like this in one place: word games, truncation, abbreviation, sandhi, pronunciation spellings, eye dialect, etc. There are quite a few processes that create spellings that don't show up in dictionaries, but can be hard to find information on. I'm not sure what to call it, though. "Spelling distortions" is the first that came to mind, but that's a bit clunky. Chuck Entz (talk) 20:50, 4 July 2013 (UTC)
Those, we have entries for. Supposing that I want to use some of my time unproductively, perhaps we should have entries for a few very, very common examples and redirects from less common examples to those very common ones (i.e., an entry for "pleeeease" to which "pleeeeeeeeeeease" redirects). bd2412T20:59, 4 July 2013 (UTC)
Redirects can be accompanied by a usage note on the target page indicating that some writers may add an arbitrary number of extra instances of the primary vowel for emphasis (I note that pleeeease with 4 e's gets about 40 times as many hits as pleaaaase with 4 a's). Also, it is worth noting that this seems to be a fairly modern phenomenon, largely restricted to about the last century. bd2412T23:00, 4 July 2013 (UTC)
Sounds good, and soft redirects can be used instead of hard redirects in any instances where another language has a valid term spelt the same way (e.g., if "aaaah" is a Waray-Waray word for "the foobar bird"). And the text of the usage note can be transcluded from a master template, so that the language doesn't go out of sync. (Compare the templatised usage notes that a few Latin entries use.) - -sche (discuss) 23:08, 4 July 2013 (UTC)
I like the idea of including the most common as real entries, with all the others displayed as alternative forms of that one, but existing only as redirects. There are families of these, however. For example, puhleeze, puh-leeese, puh-leaze. It would be nice if we had only a single "canonical form" for each of the families.
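If redirects for the elongated spellings discussed in this thread were ever created by script, the target could be guessed roughly as sketched below: every run of three or more identical letters is cut down to one or two letters, and the result is checked against a list of known words. The function names and the tiny word list are made up for illustration. Note that it does not handle forms like "pleeeese", where a different vowel has been dropped, nor the "puhleeze" family.

```python
import itertools
import re

# Three or more of the same character in a row.
RUN = re.compile(r"(.)\1{2,}")


def candidate_spellings(word):
    """Yield spellings with each long run shortened to one or two letters."""
    runs = list(RUN.finditer(word))
    for choice in itertools.product((1, 2), repeat=len(runs)):
        parts = []
        last = 0
        for match, keep in zip(runs, choice):
            parts.append(word[last:match.start()])
            parts.append(match.group(1) * keep)
            last = match.end()
        parts.append(word[last:])
        yield "".join(parts)


def base_form(word, known_words):
    """Return the first shortened spelling found in known_words, or None."""
    for candidate in candidate_spellings(word.lower()):
        if candidate in known_words:
            return candidate
    return None


# Stand-in word list; a real run would use the list of existing entries.
known = {"please", "no", "stop", "love", "help", "you"}
for spelling in ["pleeeease", "Nooooo", "heeeelp", "pleeeese"]:
    print(spelling, base_form(spelling, known))
```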
I think we need a new langcode for the Griko language of Southern Italy. It's a Hellenic lect written in the Latin script, and sometimes considered a dialect of {{el}}, although its grammar is relatively different. Generally, there is mutual intelligibility with Greek, but I think the writing system should seal the deal. I would propose a code like grk-gri.
Incidentally, there is also the matter of considering whether Calabrian Greek is a language; some authors say it is, but it is still written in the Greek script and seems a bit less changed to me, so I'm less sure on this one. —Μετάknowledgediscuss/deeds06:52, 5 July 2013 (UTC)
The Wikipedia articles you linked to say that Calabrian Greek is written in the Latin alphabet and provide a Greek-alphabet sample of Griko, so unless you're getting your information from elsewhere, you may have gotten that backward. Can "Italy's toe" Greek and "Italy's heel" Greek be considered dialects of the same language, even if they're considered a separate language from Greece Greek? —Angr 16:07, 5 July 2013 (UTC)
Yeah, I think I just had them switched in my mind. Sorry about that inaccuracy. Anyway, some sources do seem to group them together as Italian Greek, but it doesn't seem (to me, at least) to be any better than grouping them with {{el}}, from a linguistic standpoint. From an organizational standpoint, perhaps, although as we add inflectional templates and whatnot, it becomes even more of an organizational problem. —Μετάknowledge discuss/deeds 18:15, 5 July 2013 (UTC)
I can't find much German- or English-language literature on the admittedly specialist subject of the differences between Griko, Bova and standard Greek. There are plausible claims of Doric elements in Griko/Bova, but Wikipedia's note that the two are mostly based on the same Koine as other modern Greek varieties seems correct, and the lexical differences in the small sample provided seem no more pronounced than dreamed of a new parka vs dreamt of a new anorak. Besides, we combine (historical) Doric itself with Attic under the label of Ancient Greek, so Doric-ness would not in itself seem a reason to separate Griko or Bova from Greek. The difference in alphabet seems more significant: because "Greek" refers overwhelmingly to the Hellenic language now spoken in Greece and written using the Greek alphabet, it would be very confusing to have Latin-script "Greek" entries for Bova words, and hence it could be better to give that lect a separate code and L2, if it is typically written in the Latin alphabet.
The question of declension and conjugation tables is interesting, but IMO much broader than just "how should we list both the Griko and the Greek plurals of φοοβαρ?": not just with Greek but with most languages, we tend to show only standard inflections; that's not very descriptivist... but that's a subject for another BP thread, one I'll start soon. - -sche(discuss)21:07, 9 July 2013 (UTC)
Is Bova preferred over Calabrian Greek? I think at the least we ought to have a code/L2 for "Greek written in the Latin alphabet and spoken in Calabria", and you seem to agree, but the nomenclature still remains to be decided. —Μετάknowledgediscuss/deeds03:10, 12 July 2013 (UTC)
Should we be acting as a database of written works?
Consider Category:Latin quotation templates and imagine this data being moved into a single module, able to be called from a single quotation template in any entry- something like {{quote|title|passage=...}}. And then extending this approach to all other languages, with all of the bibliographic data in a single place. This would undoubtedly simplify quoting from well known works, but is this within our scope, or is it something better suited for wikisource? Do they collect bibliographic data from copyrighted works? DTLHS (talk) 20:37, 5 July 2013 (UTC)
Wikisource has no truck with copyrighted works at all, but Wikiquote does have quotes from copyrighted works. —Angr21:01, 5 July 2013 (UTC)
We use quotes from copyrighted texts under fair use exemptions, which allow use of limited amounts of the text in question under the right circumstances. By aggregating these small pieces in one place, we risk assembling enough of the original there to go beyond the limits allowed. If we had a quote for each line of a short poem, for instance, we might conceivably end up with the whole thing in our database. I'm not saying this arrangement would inherently be a copyvio, but we need to know where the line is so we can take precautions to stay on the right side of it, if necessary. Chuck Entz (talk) 02:22, 10 July 2013 (UTC)
This is actually not an academic or abstract debate; we literally have this exact problem. Specific translations of the Bible are copyrighted, right? We essentially have a few chapters of Genesis in a couple languages (I am to blame for some of this myself). That could conceivably be quite a problem, if your fears are warranted. —Μετάknowledgediscuss/deeds05:53, 11 July 2013 (UTC)
Inconsistent mention of inflected forms
For a descriptivist dictionary, we're rather inconsistent when it comes to mentioning, in the entries for lemmata, their attested inflected forms.
sit, on the other hand, does list its obsolete/dialectal past tense form sitten on its headword line... but doesn't list its obsolete third-person form sitteth.
forbid lists its past tense forms forbid, forbade and forbad, but not forbode.
laugh, among the many obsolete and modern forms on its headword line, including laugh'd, quite spectacularly lists "(obsolete)" set apart by commas as if it were itself a past tense form...twice. And given that some of our entries put obsolete tags in the headword line before the obsolete term, while others put the tag after the term, it's not even immediately obvious which term the tag is qualifying. It doesn't list laughest.
streak lists only its standard forms; it doesn't list e.g. streak'd.
Similarly, in other languages, we usually only list the standard, modern inflected forms and not the dialectal ones... which can raise questions if, as discussed in the section above this one on Griko and Bova, the dialectal forms are different from the standard ones.
I'd like us to come up with some standard way of both presenting all common inflected forms of each word in its entry, and not misleading anyone as to the standardness vs dialectality and archaicness vs modernity of each form. What I suggest is this: on the headword line and in the first inflection table, list only the standard, modern forms; then, explain all of the other (obsolete and dialectal) forms in their own inflection table or templatised usage note...possibly omitting the forms like laugh'd which are just spelling/typographical variants.
What do you think? Does that idea sound good enough to start fine-tuning it and making mock-ups? Or would you propose something else? - -sche (discuss) 22:06, 9 July 2013 (UTC)
Well, drownd and drowndd seem to me to be just different ways of spelling the same word: drowned, while sitteth and sitten are forms in their own right. I support mentioning obsolete and dialectal forms, but oppose mentioning obsolete spellings. These should be listed in the Alternative forms heading of the standard spelling. — Ungoliant (Falai) 22:24, 9 July 2013 (UTC)
I would be in favor of consistently downplaying non-standard, literary, dialectal, obsolete, and archaic inflected forms by putting all of them under something that:
was only optionally displayed and
didn't add to the vertical space taken up by the inflection line except to display something like "± other inflected forms".
I don't think that we need to dedicate precious screen space to usage notes on this matter, which is of interest mostly to specialists. I'd be happy if it was all on a subpage or an appendix, though these options would require changing all of the links to those forms.
The entries for the forms are already available to help someone decode them and provide a home for usage notes of arbitrary length. If we would like to enable comparison among the forms by having them on a single screen the optional display approaches would seem to offer a way of accommodating specialist needs without cluttering the screen and driving away what normal users we may still retain. DCDuringTALK22:27, 9 July 2013 (UTC)
Mentioning the archaic and dialectal forms in the Inflection/Conjugation heading sounds nice. The lack of mention of -eth and -est forms is a major shortcoming of our English entries! — Ungoliant(Falai)22:35, 9 July 2013 (UTC)
Yet, those forms aren't in current use, except possibly in dialectal pockets. Adding such content right into the inflection heading substantially increases the likelihood of user confusion.
Personally, I'd push for such archaic and alternate information to be 1) optionally displayed, if kept on each individual entry; and 2) more ideally, kept to an appendix. Endings like -eth and -est are largely irrelevant to the modern English language, and as such, I'd really like to avoid cluttering up modern English entries with these historical details.
-eth and -est are used in the KJV and in Shakespeare’s works, which are still very widely read. Naturally, they should have qualifiers specifying that they are archaic. — Ungoliant(Falai)22:45, 9 July 2013 (UTC)
I think that, as a rule of thumb, we should only include forms that might be at least in somewhat common use in the last 150 years maybe. So the -eth and -est forms would fall out of that, but Dutch plural imperatives (which were still used in written Dutch in WW2) would be shown. —CodeCat22:47, 9 July 2013 (UTC)
I don't think it makes any sense to have an appendix of "-est" and "-eth" forms: such an appendix would basically be an enormous list of all verbs, with first "-est" and then "-eth" suffixed to them. What's the point of that? I think it makes more sense to list each form with the lemma it's a form of — and the regularity with which those forms are formed should also make it easy to add them to inflection tables. I do think they should be limited to those (collapsed) tables in an ====Inflection==== or ====Conjugation==== section beneath the definitions, though. They shouldn't be in headword lines, even marked as "archaic" or "obsolete", and neither should sitten and laught, because I agree with Eirikr that that is mostly just confusing. I have encountered people who used "-est" forms, because they couldn't stand that English didn't have as many person and tense suffixes as they were used to, and didn't understand/accept that saying "sittest" made them sound ridiculous! - -sche(discuss)23:18, 9 July 2013 (UTC)
Appendices organized around the inflectional suffixes are not what I had in mind, rather a single appendix, section of an Appendix, or a subpage that contained full information on all inflected forms of a given verb (and its morphological relatives ?).
I don't know that we can deal with pathological cases such as those individuals you've encountered. Were they contributors we might have to, but otherwise we can ignore them!
I have the feeling that different languages are in different situations. English and Chinese (?) seem to have the worst problems of entry clutter. English does not have inflection tables in which many such complications could be concealed, etc. DCDuringTALK01:27, 10 July 2013 (UTC)
Inflection tables for English verbs might be nice, but would they really give much additional value? —CodeCat01:30, 10 July 2013 (UTC)
I think they would, especially since early Modern English has more complex inflection than current English. It would be very helpful to someone reading an old text if we helped them learn the whole paradigm rather than forcing them to come back and look up each form as they encountered it. I like the idea of a collapsible box labeled with something like "obsolete conjugation of <insert headword here>", so it doesn't get in the way for those who aren't interested. Chuck Entz (talk) 02:46, 10 July 2013 (UTC)
I think we should mention archaic/obsolete/whatever inflected forms, but I don't think we should list them in the headword line. In other languages, the headword line is for the principal parts, while the ===Conjugation=== section is for other inflected forms. I'd say laugh should just have "laugh, laughed, laughed" in its headword line and sit should just have "sit, sat, sat"; the other things like laughest and sitteth and whatnot—even the modern regular forms like laughs and sitting, which are not traditionally considered principal parts—should be in the Conjugation section, where their {{obsolete}} labels can be less ambiguous and where they don't clutter up the area where your average reader is looking to find out the most obvious information. —Angr16:20, 10 July 2013 (UTC)
I agree. The headword line is meant for readers who want to learn the verb's basic inflections at a glance. It is suited primarily for users who already know how to conjugate in English and just want to know how this particular verb is conjugated. Apart from a few verbs, the 3rd person singular and the present participle are always perfectly regular, so adding them doesn't really provide this type of user with any more information than they already knew. An inflection table on the other hand can easily list all forms, old and modern, and show their proper relationship to one another, so it is more suited to the few users who want to know more details. —CodeCat18:29, 10 July 2013 (UTC)
I see no reason why a modestly inflected language like English should be strait-jacketed into a mold that suits inflected languages. The obsolete and archaic inflected forms are not of great use for encoding, and if one has found one's way to the lemma from the inflected form, one has done most of the decoding. All that the presence of the form on the page does is confirm that one is on the right page. In addition, the dialectal and non-standard forms are not a particularly good fit with the format of a conjugation. Finally, I believe that this will push Wiktionary further in the direction of putting off would-be monolingual English users - and contributors. DCDuringTALK17:12, 10 July 2013 (UTC)
@-sche, my thought about an appendix was to include not every single verb with such inflections, but rather to describe the process of inflection and provide a sample table or two showing the different forms. This would be much like what we already have at Appendix:Japanese_verbs or Appendix:Spanish_verbs. We could just expand Appendix:English_verbs to include a section on Elizabethan forms, for instance. Teaching the user how to inflect would be much more useful than requiring them to look up each verb to find the inflected forms.
In addition, I share DCDuring's concerns that we might alienate a portion of our user base if we clutter up our entries with too much ancillary information. I had thought that the Appendices were created precisely to offload such information from the main entries, hence my suggestion for including obsolete and/or dialectical inflection information there. ‑‑ Eiríkr Útlendi │ Tala við mig18:01, 10 July 2013 (UTC)
I fail to see why monolingual English speakers would be put off by a headword line that is relatively tidy and a Conjugation section beneath the definition that they can easily ignore if they're not interested in it. —Angr18:55, 10 July 2013 (UTC)
Provided the conjugation section is collapsible, and that all obsolete and dialectical forms are clearly indicated (possibly in separate “Historical forms” and “Dialectical forms” tables?), I might be persuadable. However, why duplicate obsolete and non-standard information on lots of pages, instead of just consolidating it in one place in an appendix? That seems like a lot of busywork for human editors. -- ‑‑ Eiríkr Útlendi │ Tala við mig19:12, 10 July 2013 (UTC)
I guess for the same reason we duplicate modern and standard information on lots of pages, because we have room and there's no reason not to. Also, including the obsolete and dialectal (not dialectical, which means something quite different) forms that are attested prevents the implication that all verbs have these forms. In other words, by including writeth at write but excluding *faxeth at fax, readers know that fax is not attested with archaic endings. If all we have is an appendix saying that verbs have an archaic 3rd person singular present-tense form ending in -eth, readers don't know which verbs are actually attested with it and which aren't. —Angr19:34, 10 July 2013 (UTC)
Because we want people to be able to find the forms on the lemma page via the search box, in case someone doesn't want to bother creating a whole alt-form entry for every word that has attested archaic forms. You want to have the form somewhere that the search will see it: I doubt most people would go to the trouble of setting the search to include the appendices as well as mainspace (if they even knew how). Chuck Entz (talk) 03:05, 11 July 2013 (UTC)
If "thou"/"-est" and "you"/∅ forms get separate lines, "-eth" and "-s" (third-person) forms deserve separate lines. Alternatively, the "-est" and "-eth" forms could be listed after a <br> in the same cells as the ∅-second-person and "-s"-third-person forms, with superscript tagging them as archaic... but some users might take that to mean the whole cell was archaic, so perhaps separate lines are best.
We should almost certainly have a different table for be than for other verbs, in recognition of be’s complexity and other verbs’ simplicity. Consider that "be" has, in effect, two conjugations (one of them mostly obsolete); it has forms like "beest" beside "art" beside "are", "beeth" beside "is"; it has "wast" and "wert"; etc... and that's not to mention dialectal forms.
The past tense subjunctive of be is not archaic. If it were archaic, I would have said "if it was" rather than "if it were", right?
I like the idea of a separate (collapsed) table for the entire archaic paradigm, so one can see when the archaic forms are the same as the modern ones: most people who try to imitate archaic speech use the -eth or the -est forms for everything. Perhaps we might think about having inflection tables directly under the headword, instead of under a separate header, with the collapsed table labeled as "Inflection of" or "Conjugation of" or "Declension of". I wonder if it's possible to have multiple collapsible tables nested inside a master collapsible table, so that an entry with alternative inflections would take up the same amount of space as one with only one. Chuck Entz (talk) 03:05, 11 July 2013 (UTC)
If any change to accommodate this takes on average a single additional line of vertical screen space, it is exactly the wrong direction for such changes. The screen device to display this material should be on the inflection line, not under a separate header. DCDuringTALK22:42, 10 July 2013 (UTC)
To -sche: I guess you are right. "be" is probably the only verb with a past subjunctive that is still in use, probably because it has a distinct form in the singular. All other verbs have a past subjunctive that is identical to the indicative. The situation in Dutch is not so different, except that the identity is restricted to regular weak verbs in Dutch; irregular weak verbs still have a distinct past subjunctive singular. But that is mainly because Dutch still distinguishes singular from plural everywhere in the verb paradigm whereas in English they have fallen together. In fact, apart from that difference, Dutch and English conjugation really isn't that different, so Dutch can serve as a fairly decent model for a possible English conjugation table. And I think the Old English/Middle English tables were based on Dutch as well. —CodeCat20:05, 11 July 2013 (UTC)
I would suggest a table like this: User:-sche/en-conj-table/use. I think it's inaccurate to have a separate cell for the subjunctive plural archaic and then call it archaic. Given that statements like "if he were to walk in — if he walked in — right now, I would tell him" are plentiful, I think the most plausible analyses are (1) if it exists as such, the subjunctive plural form is still used, or (2) no English verb other than be has a subjunctive plural form (because that form, and the indicative third-person plural past tense form, merged with the indicative first- and second-person plural past tense forms into a generic past tense form). I think the second option is more sensible, and it's reflected in the first of these mocked-up tables. - -sche(discuss)22:26, 11 July 2013 (UTC)
Take a look at this table and let me know what you think of what's displayed. We probably want to rewrite the template's guts; I copied the parameter names from CodeCat's table, but we probably want to use different names or make the parameters positional, we may want to code the -eth form as {{{pres}}}eth rather than a separate parameter, and we definitely need to add graceful support for multiple past tense and past participle forms, whether they are common (dreamt vs dreamed) or archaic/dialectal (laughen), since listing all forms is the point of the table. But I'm not asking about the template's guts at the moment, I want to know if the table as displayed is OK or if it's missing any fields. Note that it is intended only for use on most English verbs; complex cases like be and may need to be handled by different template(s). PS, for comparison, look at the small and large tables de.Wikt has of English verb forms. - -sche(discuss)01:20, 18 July 2013 (UTC)
I made the parameters named because the template was not intended to be used directly in entries. Rather, the template would be filled with forms by another template. Keeping presentation (the table) separate from code/logic (the creation of the forms) is important. —CodeCat01:39, 18 July 2013 (UTC)
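The separation described here, with one layer producing the forms and another only displaying them, can be illustrated outside of wiki templates. The following Python sketch is only an illustration under stated assumptions: the names `generate_forms` and `render_table` are hypothetical, the naive suffixing (the present stem plus "eth", as floated above) ignores spelling changes, and nothing here reflects how any actual Wiktionary template is implemented.

```python
def generate_forms(pres, past, past_participle=None):
    """Logic layer (hypothetical): build a dict of named forms.

    Assumption: regular and archaic endings are added by naive
    suffixing, which is wrong for verbs that need spelling changes.
    """
    if past_participle is None:
        past_participle = past
    return {
        "pres": pres,
        "pres_3sg": pres + "s",
        "pres_2sg_archaic": pres + "est",
        "pres_3sg_archaic": pres + "eth",
        "past": past,
        "past_participle": past_participle,
    }


def render_table(forms):
    """Presentation layer (hypothetical): lays out rows, creates no forms."""
    rows = [
        ("present", forms["pres"]),
        ("3rd singular", forms["pres_3sg"]),
        ("2nd singular (archaic)", forms["pres_2sg_archaic"]),
        ("3rd singular (archaic)", forms["pres_3sg_archaic"]),
        ("past", forms["past"]),
        ("past participle", forms["past_participle"]),
    ]
    return "\n".join(f"{label}: {form}" for label, form in rows)


print(render_table(generate_forms("laugh", "laughed")))
```

Because the display function only reads named keys, the form-generating side could be swapped for a hand-filled dict (for irregular verbs, say) without touching the layout.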
Clutter in our entries
@Angr: "because we have room and there's no reason not to"
Yes, our hosts don't seem to complain a bit about storage. We can find locations for any kind of semi-lexicographic context we care to contribute. But users are staying away in droves. Remarkably, some of those who come actually take the trouble to complain about our layout. Most complaints seem to be about how hard it is to find the content that they want, i.e., definitions. How do we respond? We provide more and more content other than definitions and fail to significantly improve the definitions we have, even when the faults are obvious. We cater to those who could learn to navigate and customize their way around any complications and give the back of our hand to newbies, ignoring their expressed complaints. Most websites seek out user response and attempt to adjust the overt appearance of the site (landing pages, i.e., our content pages and failed-search pages) to the expressed complaints. Not us; after all, we're volunteers, just doing what amuses us. Don't we have some obligation to serve the broad population of users? DCDuringTALK20:17, 10 July 2013 (UTC)
We could solve all of these problems by putting everything (and I mean everything, including translations, pronunciations, etymologies, derived terms) under the individual definitions, collapsing sections as necessary and duplication be damned. Dividing entries by parts of speech is ridiculous. DTLHS (talk) 20:27, 10 July 2013 (UTC)
A while ago, someone (Ruakh?) proposed collapsing translations under each definition, just like quotations. It would make editing the entries harder (unless we used HTML comments to separate each definition from the others by a few lines of whitespace, it could be hard to find each definition amid the ensuing clutter), but it would stop people from changing and removing definitions without updating the translation table glosses, it would stop people having to scroll back and forth between the defs and the tables of long entries like ], and it would reduce the amount of clutter that comes between definitions in multi-POS/multi-etym entries like ]. While recognising that it would take massive effort and would be a great upheaval that it would take a while to get used to, I would support it. I oppose merging POS sections, and I couldn't support collapsing pronunciation info under individual senses, either (how would that even work, in an entry like ])? - -sche(discuss)20:46, 10 July 2013 (UTC)
@DTLHS: I mostly agree, but it's hard to find an English dictionary that doesn't respect parts of speech as an organizing principle and hard to find an "unabridged" one that doesn't differentiate by etymologies. But pronunciations, etymologies, usage notes, derived terms, synonyms, and translations, let alone antonyms, hyponyms, hypernyms, anagrams, descendants, et al., though they may set us apart from some other dictionaries, can't overwhelm the core content. DCDuringTALK20:53, 10 July 2013 (UTC)
You're right about pronunciations; that would get ridiculous very fast, and most words share pronunciations between all of their senses. I don't mean to suggest abandoning part of speech as an organizing principle, just that it shouldn't get an entire heading; more like a small label at the beginning of each definition line. DTLHS (talk) 20:56, 10 July 2013 (UTC)
@DCDuring: One thing we could do about semantic relations is move all of them to Wikisaurus, and then have only one ====Semantic relations==== header with a link to any relevant Wikisaurus pages. The way we currently duplicate lists of synonyms and antonyms in entries and in Wikisaurus results in both clutter and in the lists falling out of sync. - -sche(discuss)21:02, 10 July 2013 (UTC)
For interest and comparison, here is how dog is presented in Chambers Dictionary (taken from the CD-ROM edition from 5-10 years ago, whose layout is more or less identical to the print edition — hence the abbreviations and general "tightness"). Related terms (dogged, dogger, doggess, etc.) are listed on separate lines underneath, but still within the dog entry and under that headword. Equinox◑21:16, 10 July 2013 (UTC)
dog1 n a wild or domestic animal of the genus Canis that includes the wolf and fox; the domestic species, diversified into a large number of breeds; a male of this and other species; a mean scoundrel; adj and as combining form of dogs; male, opp to bitch; spurious, base, inferior (as in dog Latin). adv esp as combining form utterly. vt (dogging; dogged) to follow like a dog; to track and watch constantly; to worry, plague, infest; to hunt with dogs; to fasten with a dog.
I oppose putting obsolete English spellings such as hopeth on the inflection line. Thus, I support removing obsolete spellings from the headword lines of such entries as laugh. I imagine a heading somewhere down at the bottom of the entry like "Obsolete forms", where both obsolete inflected forms and obsolete alternative forms could be listed. Thus, in knowledge, "Obsolete forms" would list knolege, knowlage, knowleche, etc., while, in laugh, it would list laugh'd, low, and the like. If listing inflected forms, the list could start with "Inflected forms:" or the like, to make it clear, but still under "Obsolete forms" headings. --Dan Polansky (talk) 21:08, 10 July 2013 (UTC)
@-sche: If a large portion of the most polysemous English entries were in good shape, then we could use that reliable structure as an organizer for the rest of our material. I was recently lamenting to myself how hard it is to attempt to improve the structure of the definitions in English entries because they themselves are mostly unstructured lists, without even historical sequence or order of frequency as an organizing principle, let alone some kind of semantic principle.
I guess I misunderstood what "under the definitions" means to others. I was only thinking of pushing the content of non-definition sections below the associated Etymology-PoS groups of definitions in collapsible bars. As to the more radical approach of pushing all content under definitions: if such a thing could be made available optionally to users via different SQL and php from the server or by JS on the client side, that would be great.
Obviously, different editing tasks are facilitated by different UIs. Our current UI seems to facilitate correcting formatting "errors" and mistakes of omission. As soon as one is working on material that does not fit on one screen, the UI is deficient. It seems to me that, to a great extent, we don't want structural modification of polysemous English entries, no matter how poorly organized - and, therefore, hard to edit - they are. DCDuringTALK21:15, 10 July 2013 (UTC)
Dan, what about something like what the Dutch entries use? See lopen. Singular and plural can be collapsed for English (except for the present singular indicative). —CodeCat21:23, 10 July 2013 (UTC)
The plural imperative and the subjunctive are archaic and no longer in use, at least not commonly. The subjunctive is found occasionally like in English, but nobody uses the plural imperative anymore except if they want to sound deliberately old-fashioned (like "thou" or "ye" would do in English). —CodeCat18:42, 11 July 2013 (UTC)
So you are pointing to archaic forms being listed in a collapsible inflection table in "Conjugation" section of lopen#Dutch. I would not like to see a Czech conjugation table overflooded by obsolete spellings. Are you proposing to make an English collapsible conjugation table with all sorts of weird forms, as others have proposed? I am not very enthusiastic about that. In any case, I am enthusiastic about removing obsolete inflected forms from the headword line. --Dan Polansky (talk) 19:28, 11 July 2013 (UTC)
Italicized Cyrillic is IMHO much less readable. I'd support showing usexes in different font colors, italicized or not. --Ivan Štambuk (talk) 17:28, 10 July 2013 (UTC)
Similarly, italicized Japanese is often illegible. The format Japanese editors have been using for usexes, when entered without the template, is:
First line using kanji (if any) -- not italicized
Second line giving the kana-only rendering (if different from first line) -- not italicized
Third line giving the romanized rendering -- italicized
Fourth line giving the English translation -- not italicized
{{usex|lang=ja|First line with kanji|tr=Second line in kana<br/>''Third line in romaji''|t=Fourth line in English}}
Adding automatic italicization to this template would be a substantial legibility problem for Japanese, unless the lang=ja argument turns off such italicization. ‑‑ Eiríkr Útlendi │ Tala við mig17:52, 10 July 2013 (UTC)
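The behaviour asked for here, automatic italics that are switched off for languages such as Japanese, can be sketched as follows. This is a minimal Python illustration, not the actual template logic: the function `format_usex`, the `SCRIPT_OF` mapping, and the choice of which scripts are italicized are all assumptions made for the example. The `''…''` wrapper is the wiki italic markup already used in the template call above.

```python
# Hypothetical mapping from language code to script code; illustrative only.
SCRIPT_OF = {"en": "Latn", "ru": "Cyrl", "ja": "Jpan"}

# Assumption for this sketch: only Latin-script examples are italicized.
ITALIC_SCRIPTS = {"Latn"}


def format_usex(lang, text, tr=None, t=None):
    """Return the display lines of a usage example (hypothetical helper).

    The example text is wrapped in wiki italics only when its script is in
    ITALIC_SCRIPTS. The transliteration and translation are passed through
    unchanged, so the caller can italicize the romaji line explicitly.
    Unknown language codes are assumed to be Latin-script.
    """
    script = SCRIPT_OF.get(lang, "Latn")
    lines = [f"''{text}''" if script in ITALIC_SCRIPTS else text]
    if tr is not None:
        lines.append(tr)
    if t is not None:
        lines.append(t)
    return lines


for line in format_usex("ja", "First line with kanji",
                        tr="Second line in kana<br/>''Third line in romaji''",
                        t="Fourth line in English"):
    print(line)
```

Keeping the script decision in one lookup means the italics policy for Cyrillic, also debated in this thread, would be a one-line change to `ITALIC_SCRIPTS`.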
I would prefer it if the parameters for this template were changed to match those of {{l}} more closely. Language code first, then the phrase, then the translation (optional if {{{1}}} is "en" and maybe "mul"). Transliteration would still use tr=. —CodeCat18:33, 10 July 2013 (UTC)
Typographically, italics are suitable only for Latin and Cyrillic; no other writing system should ever be put into italics. I disagree that italicized Cyrillic is much less readable; surely anyone who can read Cyrillic well enough to be reading whole sentences in it in the first place can read it in italics with no difficulty. —Angr19:00, 10 July 2013 (UTC)
I would only leave Roman script italicised. Less than advanced learners are known to misread a few letters. Cursive Cyrillic looks markedly different from normal, e.g. cursive "т" - т (t) looks like Roman m. From Wikipedia: АВДЕИКНОРСУХавдезиопрстухч, cursive: АВДЕИКНОРСУХавдезиопрстухч looks almost like Roman ABDEUKHOPCYXabgezuonpcmyxr (it's about various hand-written styles but partially applies to computer italics as well). Of course, it doesn't apply to native Cyrillic users and advanced learners. --Anatoli(обсудить/вклад)12:47, 11 July 2013 (UTC)
Well I didn't know Cyrillic at all prior to coming to Wiktionary, and it took me some ~15 minutes to learn it (it is easy because letters are so similar to equivalent Latin and Greek script letters). However, several cursive Cyrillic letters are identical to completely different Latin and Cyrillic script letters, which kind of confuses your brain while reading it in running text, causing it to "stop" periodically when you subconsciously interpret e.g. т as /m/ instead of /t/. You need a lot of practice to achieve complete reading fluency (reading entire blocks of words at once without any misinterpretations of letters) in italicized Cyrillic script, which many learners of Russian, etc. with an English-language background don't have. It's not about ability to comprehend it, but it's a needless PITA. --Ivan Štambuk (talk) 22:47, 11 July 2013 (UTC)
usex assumes that you have used "#:" before it (the template is supposed to be used in usage examples). --Z13:09, 11 July 2013 (UTC)
I see. It does seem like a shame not to use it for quotations as well, they use the same format don't they? —CodeCat13:14, 11 July 2013 (UTC)
Yes. Users should be able to change that default behaviour; even in usage examples we sometimes need to use "#::" etc. --Z13:18, 11 July 2013 (UTC)
No. Actually usexes and quotations do not have the same format, and they should not. Quotations are preceded by citation information on the previous line, and so always have "#*:" before them, because the citation of the source has "#*". The distinction is deliberate so that (a) readers can distinguish the made-up examples from the published data, and so (b) quotations can be automatically collapsed while leaving usexes visible. If the two had the same format, then this wouldn't be possible. --EncycloPetey (talk) 18:53, 12 July 2013 (UTC)
But if the extra first line for quotations is the only difference, then everything aside from that can use the same format. A quotation is then nothing more than a usage example with extra info. —CodeCat18:57, 12 July 2013 (UTC)
But that isn't the only difference: "#:" and "#*" are not the same characters; italicized and non-italicized text is not the same. And a quotation is not simply "nothing more than a usage example with extra info". A usex is a made-up example; a quotation is firm data that demonstrates sense and usage. They are fundamentally different things. --EncycloPetey (talk) 19:04, 12 July 2013 (UTC)
Using things like #:: doesn't allow for easy nesting of elements. But the wiki also supports HTML, so we can use <dd> instead of : . —CodeCat13:29, 11 July 2013 (UTC)
Many readers of Cyrillic and RTL scripts don't like italics at all. If you are going to italicize them, I suggest doing that using a CSS class. --Z13:24, 11 July 2013 (UTC)
I've noticed them, but for both proposals I don't understand what they are. Hence, you're the only person to vote. Mglovesfun (talk) 16:43, 11 July 2013 (UTC)
I'm an old hat Wikipedian checking in here to deal with a curiosity of mine, namely phrasebook(s), and Wikimedia's status in terms of their development. I know that if I want a developed phrasebook I currently need to go to Wikibooks, which has quite a few, although they are not without their issues (distanced from Wiktionary, disorganized, uncorrelated etc.). And I'm happy to see that Wiktionary has some phrasebooks listed here, although these are not without their issues, and apparently these survived at least one attempt to abolish them altogether.
I'm suggesting that we form a policy with regard to phrases included in Wiktionary which is apparently more liberal than what is currently allowed, but still fitting with the constraints of what we traditionally have called a "diction-ary." For example, in a recent proposed phrasebook CFI (criterion for inclusion), the proposed criterion was for each phrase to be listed in at least three print dictionaries or phrasebooks.^* My line of thinking, and my line of inquiry here deals more with the interlingual aspect of phrasebooks, such that I might propose as an alternative, that any phrase which has direct equivalents in two other languages satisfy the criteria for inclusion. I am tending less toward thinking in terms of the current organization of
English language
Phrases
..which is fine.. but then adding to that an extra categorical dimension of something like:
Phrases
English phrases
English phrasebook
Chinese (Mandarin) phrases
Chinese (Mandarin) phrasebook
Spanish phrases
Spanish phrasebook
etc.
At some point it would make sense to unify existing phrasebooks, which is in fact what brings me here to Wiktionary, and to express my thoughts on this matter. I think the fact that phrasebooks are confined to Wikibooks does not help their development, or their unification. Hence I think their phrasebooks could be ported here. But then I would agree with some who would argue that a phrasebook naturally can be more developed than what is required for a dictionary, and therefore there must be strict constraints on what is admitted and what isn't.^**
The idea I'm proposing is that Wiktionary undertake creating a Unified Phrasebook, for which all included phrases must belong in all (or most) languages, and where certain liberties are taken with translation, then these are explained.
Unified Phrasebook
Common phrases
in English
in Chinese (Mandarin)
etc.
Situational phrases
in English
in Mandarin
Idiomatic phrases (with translated equivalents)
in English
in Mandarin
etc.
This would be in the spirit of a dictionary and not just a Wikibook, although there should always be some overlap between Wikimedia projects, and taking this on would no doubt help to unify and organize Wikibook's divergent phrasebooks. I think where words and translations are concerned Wiktionary can play a vital role in creating a Unified Phrasebook, as well as various well-developed language phrasebooks, which would well-serve both the world and all of humanity. Respects, -Sativen Kuni (talk) 23:14, 11 July 2013 (UTC)
^* (I don't think this is unreasonable given all of the print material available, but the mechanics of such a hard rule tend to chill wiki development, and therefore hard and fast rules should be replaced with something better. In reality, I would hope that should such a hard and fast rule be implemented, it would be only after such pages have been given time to develop and the matter deliberated on talk).
^**To this the straightforward argument goes that Wiktionary is not a traditional paper dictionary and therefore is not obliged to follow any of paper's limitations, except those which serve to constrain its purpose to a singular mission at hand. And the mission at hand in Wiktionary's case seems for the most part accomplished. Hence it might serve this community to do something with itself which is related to its core task, and yet is something reasonably new. -SK
All we need is for someone to actually do the work required for a serious effort. The last major spurt of activity was embarrassing, as are many of the entries that remain. DCDuringTALK00:15, 12 July 2013 (UTC)
A dictionary is indeed a "book!" Or rather it's a 'web'-'site!' (Or rather it may be either a book or a web-accessible database. Or rather it is what is contained in such mediums that is the diction-ary). The term "book" itself has some historical usage - the Bible itself was once called a "book" ;), even when it was all written in scrolls! Of course the term "phrasebook" has its own peculiarities, such that its essence is the term "phrase" and the term "book" is an affix that indicates that the object is 'a compendium of information' about phrases. Can a diction-ary be also a compendium that deals with phrases? No! A diction-ary must be about dic-tion, and not anything else. A phrasebook, or a compendium of individual language phrasebooks would by necessity belong at a phrase-ary. And in any case if Wiktionary were to do anything different - anything outside the strict confines of a diction-ary, we would have to call Wiktionary by a different name, perhaps even something better. Regards, :) -Sativen Kuni (talk) 00:45, 12 July 2013 (UTC) Oh, Jeez that Blackadder clip was good. -SK
Sorry but your posts are a bit too wordy, they sound like slogans. It's not clear to me what you need. Importing whole phrasebooks completely may not sit well with opponents of the phrasebook and may get into RFD (deletion) process straight away, especially repetitive, vulgar, rarely used, unnatural or otherwise bad phrases. You can try adding appendices (carefully). We already have a phrasebook structure (not a perfect one) split by languages and existing appendices. We don't know you yet, we've seen you talking but we haven't seen you working. --Anatoli(обсудить/вклад)01:10, 12 July 2013 (UTC)
Sorry, that last bit was mostly having fun with the idea of what a "book" is and what a "dictionary" is. As for me, I have 45k edits on en.wiki since 2002. I have edited here over the years usually without logging in, but not too many edits overall - no more than 100 edits or so. Now with regard to the idea of a phrasebook, in short I am proposing that Wiktionary organize a core phrasebook which stays constrained in accordance with a formal approach, and Wikibooks will assist. Wikibooks at the same time builds upon that core phrasebook to better develop its own phrasebooks, and feeding back to Wiktionary core phrasebook(s). A synergetic idea, one where the overlapping purposes of the dictionary and the open book project get together. -Sativen Kuni (talk) 01:45, 12 July 2013 (UTC)
DC, I can step in and do what I can. Organization is a big part, and in order to do it right the whole idea from conception to construction has to be sensible and agreeable. Once the structure is there the rest is just filling things in as we go on. I think things are halfway there - Wikibooks (whom I will invite) has a lot going for it, but note that for each separate language book there are a different set of goals and constraints. At least they all have the guiding principle of being a useful compendium. The only thing we have to do is figure out how to put it together in a more organized and useful way. K'plah -Sativen Kuni (talk) 02:28, 12 July 2013 (UTC)
Job vacancy: FWOTD-setter
Foreign Word of the Day is now “hiring” an official co-FWOTD-setter. While anyone who knows what he is doing can set a FWOTD, we need someone who can commit to making sure every day has a FWOTD (and take the blame when things go wrong).
Requirements:
Must be able to cope with the fact that next to no one in the world cares about FWOTD.
Since not enough people nominate words and even those who do often nominate entries without the requirements (pronunciation and a citation), you must be able to actively seek out, pronounce and cite words.
I'll do it. Since I'm chiming in here I may as well do something useful, and curious and novel foreign words are quite interesting to me. Plus I can deal with IPA, adapted IPA (adapted to English phonology), some translation breakdowns, some etymology, etc. Hire me. -Sativen Kuni (talk) 02:10, 13 July 2013 (UTC)
As far as I know, there are three or four recognised varieties of Kurdish, called Kurmanji (Northern), Sorani (Central), Kermanshahi (Southern) and sometimes Laki, which is also often grouped under Southern Kurdish. Currently, our Kurdish entries are all placed under a common Category:Kurdish language. However, ZxxZxxZ argues on User talk:ZxxZxxZ that the dialects are different enough to be considered separate languages. The Wikipedia entry corroborates that as well. Furthermore, there is a difference in script: Kurmanji, the most spoken variety, uses Latin script, while the others are written in an Arabic variety. This means in practice that different varieties necessarily require a different entry because they are written in another script. So should we retire the code "ku" in favour of the codes for these varieties? —CodeCat18:45, 12 July 2013 (UTC)
All this (the "different enough to be separate", different scripts) is analogous to Serbo-Croatian. I'd recommend that we treat Kurdish the same way we treat Bosnian/Croatian/Macedonian/Serbian. --EncycloPetey (talk) 18:48, 12 July 2013 (UTC)
The same is true of Hindi/Urdu, as well as Romanian/Moldovan, yet we treat them as "different" languages. In Serbo-Croatian it becomes problematic because Bosnian/Serbian/Montenegrin accept both Cyrillic and Latin script, and Croatian is Latin-only, so there would be massive duplication of content if we decided to split them. Hindustani and Romanian-Moldovan link to one another in the headword line, so they are de facto treated as a single language, even though they are under separate L2s. If the only major difference between these Kurdish varieties is script, then perhaps the best option (so as not to hurt anyone's feelings and all that) would be to treat them as different languages, but link to each other in the headword line or in a box similar to {{fa-regional}}. --Ivan Štambuk (talk) 19:44, 12 July 2013 (UTC)
Whoops. But the point remains: in practice it makes no difference whether we treat them as "different languages" or not if the only major difference is script. --Ivan Štambuk (talk) 20:38, 12 July 2013 (UTC)
WP lists the following differences:
The passive conjugation: the Sorani passive morpheme -r-/-ra- corresponds to -y-/-ya- in Gorani and Zazaki, while Kurmanji employs the auxiliary verb, come;
a definite suffix -eke, also occurring in Zazaki;
an intensifying postverb -ewe, corresponding to Kurmanji preverbal ve-;
an 'open compound' construction with a suffix -e, for definite noun phrases with an epithet;
the preservation of enclitic personal pronouns, which have disappeared in Kurmanji and in Zazaki;
a simplified izāfa system.
These don’t seem significant enough to consider them different languages. Analogous ones can be found between European and Brazilian Portuguese.
However, Kreyenbroek mentions: “For example, Sorani has neither gender nor case-endings, whereas Kurmanji has both”. This seems more serious, but it might be just a characteristic of informal Sorani (compare how informal Brazilian Portuguese doesn’t have plural nouns.) I note that دیوار and شیر have genders specified.
That list doesn't contain all of the differences, and Sorani and Kermanshahi don't have gender at all. BTW, the differences between Kermanshahi and Kurmanji are even greater. --Z19:22, 13 July 2013 (UTC)
All of the comments pointing out how similar the varieties of Kurdish are make me wonder how we justify considering Nynorsk and Bokmål separate languages... - -sche(discuss)20:38, 12 July 2013 (UTC)
ZxxZxxZ has now started creating entries in Central Kurdish (Sorani), because the language code for it was never deleted or disputed. —CodeCat14:55, 19 July 2013 (UTC)
Ha! They were separate all along! Shouldn’t we rename all the Sorani entries from Kurdish to Central Kurdish then? What about renaming the language to Sorani? That’s the most common name by far, it appears. — Ungoliant(Falai)07:19, 22 July 2013 (UTC)
I support the term "Central Kurdish" (also "Northern Kurdish" and "Southern Kurdish"), since "Sorani" is, strictly speaking, an inaccurate term. --Z00:05, 4 August 2013 (UTC)
There's been talk of merging Hindi and Urdu into Hindustani, just nobody's done a formal vote on the matter. I suspect such a vote would be successful. Mglovesfun (talk) 10:18, 4 August 2013 (UTC)
Particularly those extracted on glosbe. These contain many common situational dialogues that would be very useful to have in entries. Translations themselves are community effort and thus free, but what about the original English subtitles? Are such sentences (presumably part of the movie script) themselves under some kind of copyright? Their database is huge (for Croatian alone there are 10 million pairs of usexes en-hr) and it could save a lot of time for editors. --Ivan Štambuk (talk) 23:17, 12 July 2013 (UTC)
It would be the same situation as quoting from a book. As long as there is proper attribution and we don't host the entire thing on wiktionary it should be fine (subtitles definitely have some copyrighted status, enough that it would be impossible to host them on any wikimedia project). DTLHS (talk) 23:59, 12 July 2013 (UTC)
Subtitles suffer from the need to translate words and phrases of another language without taking longer than the original, and without any explanatory text / footnotes. It takes someone fluent in both languages and especially skilled in the language of the subtitles to do it right. Instead, the economics of the movie industry dictates a result that butchers both the dialog and the language of the subtitles- the "all your base are belong to us" phenomenon (that's from a video game, but the same principles apply to both). I'm skeptical of using them to illustrate normal usage. Chuck Entz (talk) 00:46, 13 July 2013 (UTC)
Even English subtitles from English language movies don't always match the spoken dialogue. Lines may be omitted, changed, or otherwise butchered. I've seen some truly hilarious examples. One egregious case that comes to mind (and pulled from the DVD for this comment): Spoken: "There must be about a dozen wrecked spaceships out there." Subtitled: "There must be about a dozen red spaceships out there." In another instance I've seen recently, "Moon water" became "Need water" because the person doing the subtitles didn't get the imagery and themes running through the program, but I have the advantage of additional resources that allowed me to verify that "Moon water" was correct. So, I agree with the general skepticism noted above, but also agree that a fluent speaker in both languages can better make the determination in a particular instance. --EncycloPetey (talk) 01:38, 13 July 2013 (UTC)
Yes, there's nothing wrong with using subtitles to illustrate usage, as long as they've been reviewed by a native speaker. Let's not go crazy and start importing the entire database automatically though. DTLHS (talk) 01:54, 13 July 2013 (UTC)
glosbe seems like an amazing resource; one just needs to know how to use it properly. The translations are not literal, just conveying the moods, and there are also many mistranslations like in Google Translate, so one needs to know the target foreign language to be able to use it correctly. --Anatoli(обсудить/вклад)04:05, 13 July 2013 (UTC)
Well, you can download the entire OpenSubtitles database used by glosbe in any number of language pairs; however, you get two giant text files, with each line in one file corresponding in translation to the line at the same position in the second file, without proper movie attribution. However, there is a separate search interface that enables tracing any particular word/phrase. Yes, there are many errors in the translations, and everything needs to be checked first. I'm not saying that usexes should be imported en masse mechanically and without reviewing. What I'm asking is whether it's appropriate (in terms of copyright and all) for me to cherry-pick several (perhaps a dozen at most) usages from glosbe/OpenSubtitles databases, rectify and modify them appropriately, and use them without attribution as usexes at Wiktionary entries. It would save me a lot of time. --Ivan Štambuk (talk) 12:39, 13 July 2013 (UTC)
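As a rough illustration of the line-aligned layout described above (line n of one file translates line n of the other), candidate usexes could be pulled out for manual review along these lines. This is only a sketch in Python: the function names and the sample sentences are invented for illustration and are not taken from glosbe or OpenSubtitles, and every hit would still need checking by someone who knows both languages.

```python
from typing import Iterable, Iterator, List, Tuple


def aligned_pairs(source_lines: Iterable[str],
                  target_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Pair line n of the source text with line n of the target text."""
    for src, tgt in zip(source_lines, target_lines):
        yield src.strip(), tgt.strip()


def find_usexes(pairs: Iterable[Tuple[str, str]], phrase: str,
                limit: int = 12) -> List[Tuple[str, str]]:
    """Return at most `limit` pairs whose source side contains `phrase`."""
    wanted = phrase.lower()
    hits = []
    for src, tgt in pairs:
        if wanted in src.lower():
            hits.append((src, tgt))
            if len(hits) >= limit:
                break
    return hits


# Invented sample data standing in for the two downloaded files.
english = ["Where is the station?", "I need water.", "The station is closed."]
croatian = ["Gdje je kolodvor?", "Trebam vode.", "Kolodvor je zatvoren."]

candidates = find_usexes(aligned_pairs(english, croatian), "station")
for src, tgt in candidates:
    print(f"{src}  |  {tgt}")
```

With real downloads, the two lists would be replaced by open file objects, which can be iterated line by line in the same way.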
Thesaurus
Why not? At the bottom of each entry, list ideas, words and terms linked to this one or to its antonym. This would assist anyone who is trying to think of a word. The thesaurus section could be hidden, and shown only upon request (like the translation section).
Its use could be "voted" on with an easy "+/-" button process, the way categories are added and removed on some websites (and similar to the interlinks system), so that highly used terms are moved up. Terms could also be voted against if they have nothing to do with a word, or if the linked word is controversial in this context.
Yes, thanks!! I started a discussion on that talk page. Why do you say: 'its July now'? Is there a problem with my signature as you see it? The previous entry says it was signed at 15:34 (that's 19:34 my local time) on 11 July 2013 (UTC). Pashute (talk) 17:48, 11 July 2013 (UTC)
OK, I'm proposing an inline thesaurus, where each definition has a new collapsible thesaurus section. Each term or phrase in the thesaurus will be preceded by 'TH:' and end with |n where n is a number for storing "usage" as follows: The users can "vote" for the word or phrase they were looking for, pushing up the usage of that thesaurus-link, and allowing for the creation of a web cloud view.
The 'Thesaurus' section will include words and phrases that are not necessarily defined in the Wiktionary, and possibly linked to other wikis. For example the term Word in its definition as software, can link to 'word processor', to w:Microsoft Office or w:Google Docs. For its definition as 'part of a sentence' it can link to non synonymous terms such as sentence, discussion, write, saying etc. which in turn can lead to the next level of links such as write->pencil, discussion->newspaper etc.
The Wikisaurus space could then use this information as its database, and show the thesaurus entries for all the definitions of a term or phrase. Most of the benefits of the Wikisaurus described in Dan Polansky's page will be preserved, and all the information about words will be stored in one place. It could also show several levels in a "star view" or in a tag cloud (with the user selecting how many levels they want included) - in the future. It's easy to implement, and will immediately, even before the views, add to the popularity of Wiktionary in all web searches. Pashute (talk) 10:41, 14 July 2013 (UTC)
Some background: Old Church Slavonic (OCS) is a language attested in a set of manuscripts that are usually called the OCS canon. These manuscripts are written in two scripts - Glagolitic and Early Cyrillic. The Glagolitic part of the canon is older and larger, and for most of the Cyrillic monuments it can be shown that they stem from Glagolitic originals. Back in 2007/8 I created many OCS entries in Cyrillic and a few of them in Glagolitic that redirected to one another as mutual alternative forms. (NB. Many of those Cyrillic-script entries are "wrong" because the newer version of Unicode 5.1 added proper support for Old Cyrillic letters). Recently CodeCat has started doing some cleanup and expansion of OCS entries, with Glagolitic spellings being redirected to Cyrillic (like this) as alternative forms via the {{cu-Glag spelling of}} template in order to reduce duplication of meanings and etymologies. I object to that kind of redirection because:
1. Neither script is more "proper".
2. OCS spellings in the MSS have many variations. Sometimes, in order to preserve the original spelling, dictionaries transcribe Glagolitic texts into Cyrillic by substituting some special symbols as conventions, because the scripts do not map directly 1-1. Each particular spelling is important, as it is attested, on linguistic and paleographic grounds. We should have both normalized entries in a specific script, and all of the typographical variations in each. What is an original attestation and what is an unattested Glagolitic/Cyrillic transcription should be clearly marked.
3. We're dealing with a well-defined and limited vocabulary of a few thousand words, not an open-ended set with infinite combinations. Any kind of duplication would be finite in scope.
4. Giving priority to e.g. Cyrillic could be treated as blasphemous by Slavicists not coming from Orthodox countries where Cyrillic script is native (and where transcription to Cyrillic as opposed to Latin is more common). Wiktionary should not be making value judgements on what is the "true" spelling. We must be neutral on such matters.
5. Mirroring the disputed content in entries (that would mostly be sections for etymologies, meanings with possible citations, as well as references) could be easily done automatically. I volunteer to do all that myself.
The question is - whether to soft-redirect Glagolitic spellings to Cyrillic via the aforementioned template or not, with the issues I've listed in mind. Since CodeCat and I cannot agree on the matter (practicality and efficiency vs. cultural issues and preciseness), we ask the community for more input. --Ivan Štambuk (talk) 20:52, 14 July 2013 (UTC)
I'm not sure I agree with point 2. Normalisation of spellings is quite common, and we don't make note of this anywhere for any language. Old English entries are placed on a normalised lemma with others linking as alternative spellings. In Old Norse, it's even the norm to normalise among scholars, and barely anyone even uses the originally attested spellings. I'm not saying we should be excluding attestable spellings of course. But for the sake of efficiency and consistency, we should place the definitions on the normalised/most common/etymologically most original variety. By that last point I mean that if a variety is attested spelled with both ь and є then we should normalise to ь. —CodeCat21:05, 14 July 2013 (UTC)
Regrettably (in terms of practical application) we should probably have full entries for both as we do for Serbo-Croatian. Unless there's a convincing reason to prioritize Cyrillic. I also don't think that the fact that Cyrillic is now more widely understood is a valid reason. Mglovesfun (talk) 21:10, 14 July 2013 (UTC)
I agree with Ivan Štambuk, because Slavicists from different Slavic countries have different preferences in this matter. It’s like other languages such as Serbian, Ojibwe, and Yupik, where there are two different alphabets in use, the alphabets do not map directly 1-to-1, many users insist on one or the other (and may be literate in only one of the alphabets), and especially since in this case there is a limited lexicon and Ivan Štambuk has volunteered to do it himself. —Stephen(Talk)21:16, 14 July 2013 (UTC)
CodeCat, original spellings are very important to paleographers and linguists. That's why we have facsimile editions of OCS canon manuscripts that are read by every student taking OCS classes, and not merely scholarly transcriptions into Latin. Every little dot, dropped sound, preference for a particular typographical variant or even the shape of letters (e.g. transitions from angular to round Glagolitic) tells us something about the document's history. There are many Slavic literary traditions, schools and cultural milieus and each deserves an equitable treatment. Dictionaries such as Старославянский словарь по рукописям X-XI веков come with very large introductions on the conventions used to lemmatize, and every headword carefully lists all of the variant forms, and where they are attested. These manuscripts haven't survived centuries so that we merely redirect an attested Glagolitic word into an artificially constructed Cyrillic equivalent. This efficiency and consistency argument is being (ab)used way too much. --Ivan Štambuk (talk) 00:11, 15 July 2013 (UTC)
What would a lua-cized translation template look like?
Assume we merged {{trans-top}}, {{trans-mid}}, {{trans-bottom}} into a single lua module. Some obvious features this would make possible would be custom sorting of languages, automatic verification of language names / translation format, and automatic nesting of dialects / scripts. Technically we could keep the same format (with {{t}} et al) and write the module around that, but if we're rethinking things, what format would be easily parseable (for lua and humans), fast (-er than the current implementation?), and allow whatever additional features are desired? DTLHS (talk) 23:41, 14 July 2013 (UTC)
With Lua we can do things like this :
{{translations|sense=Unit of language
|it = ] ''f''
|fr = ] ''m''
}}
or
{{translations|sense=Unit of language
|Italian = ] ''f''
|French = ] ''m''
}}
More efficient at what it's doing, yes, but the above suggestion eliminates many of the features we currently use in the Translations sections, such as external links to other Wiktionaries. The above example is also not alphabetized, an issue that (in order to keep it correct) would require rewriting a lot of the tools we currently have for checking this and for automated editing. It would also need to be able to handle things like Ancient Greek and Modern Greek (which are grouped and indented when both are present). It would have to handle the automated transliterations of certain languages. It should use the standard gender calls, not simple italicization. It would need to link to the correct language section of the linked entry. Etc., etc., etc. It's an interesting idea, but the proposal would need a LOT more work to show that it's even feasible. --EncycloPetey (talk) 17:02, 20 July 2013 (UTC)
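To make two of the requirements raised here concrete (alphabetical ordering regardless of input order, and grouping Greek varieties when both are present), here is a minimal sketch. It is written in Python rather than Lua purely for illustration, and the tables LANGUAGE_NAMES and GROUPS, the function name and the output markup are all assumptions, not the behaviour of any existing Wiktionary module. It does not attempt transliteration, gender handling or interwiki links.

```python
# Hypothetical lookup tables; a real module would read these from shared data.
LANGUAGE_NAMES = {"fr": "French", "it": "Italian",
                  "el": "Greek", "grc": "Ancient Greek"}
# code -> (group heading, label used inside the group)
GROUPS = {"el": ("Greek", "Modern"), "grc": ("Greek", "Ancient")}


def render_translations(translations):
    """Return wikitext-like lines, sorted by language name, nesting groups."""
    unknown = [code for code in translations if code not in LANGUAGE_NAMES]
    if unknown:
        raise ValueError(f"unknown language code(s): {', '.join(unknown)}")

    # Count how many members of each group are present.
    group_counts = {}
    for code in translations:
        if code in GROUPS:
            heading = GROUPS[code][0]
            group_counts[heading] = group_counts.get(heading, 0) + 1

    # Build heading -> term (plain entry) or list of (label, term) (nested).
    entries = {}
    for code, term in translations.items():
        group = GROUPS.get(code)
        if group and group_counts[group[0]] > 1:
            entries.setdefault(group[0], []).append((group[1], term))
        else:
            entries[LANGUAGE_NAMES[code]] = term

    lines = []
    for heading in sorted(entries):
        value = entries[heading]
        if isinstance(value, list):
            lines.append(f"* {heading}:")
            for label, term in sorted(value):
                lines.append(f"*: {label}: {term}")
        else:
            lines.append(f"* {heading}: {value}")
    return lines


for line in render_translations(
        {"it": "parola", "fr": "mot", "grc": "λόγος", "el": "λέξη"}):
    print(line)
```

Sorting happens at render time, so editors could enter languages in any order; whether that is acceptable for the existing checking tools is exactly the open question raised above.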
The German Wiktionary long ago enabled a feature called "Stabilversionen". I think the official English name is "flagged revs", but I'm going to call it "patrolled revisions" because I think that's clearer. The point is this: as usual, anyone can edit any article, but when people go to an article, they are initially shown the last diff that a trusted user has marked as vandalism-free, rather than the most recent diff. Readers can optionally click to see the most recent diff, and if they go to edit the page, they are of course shown its current contents. Patrollers are shown the status of pages in their watchlists, in user contributions lists, and on pages themselves. All unpatrolled pages are also stored in a central log. The feature basically removes the urgency of patrolling and allows it to be done at leisure. As it is now on en.Wikt, if someone misses a bad edit while patrolling, it may be months or years before it is noticed. (I recall someone finding a months-old advert for a band in one of our entries.) So I ask: do we want to enable this feature here? - -sche(discuss)02:08, 15 July 2013 (UTC)
It certainly does seem useful, but I hope there is also a way to keep track of the backlog. We don't want entries to have useful changes waiting for months before we approve them. —CodeCat02:15, 15 July 2013 (UTC)
The log of all unpatrolled pages is sorted, with the oldest unpatrolled revisions listed first. On de.Wikt, 79 diffs have gone unapproved for more than a year, often because they are changes to Swahili noun classes or to Arabic vocalizations, etc — things there aren't many people on de.Wikt to check. 94.09% of de.Wikt's entries have had their most recent diff patrolled. And en.Wikt has more active users than de.Wikt who could patrol, and has users from more linguistic backgrounds — you happen to know about Swahili noun classes and several other people know about Arabic vocalization. Thus, we should be able to process things more quickly than de.Wikt. After all, we (read: SemperBlotto) already do(es) process most things using our current patrolling setup... this would just prevent things from slipping through cracks. - -sche(discuss)02:58, 15 July 2013 (UTC)
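For readers unfamiliar with the feature, the behaviour described in this thread (readers see the last patrolled revision; the backlog lists pages by their oldest pending change) can be modelled roughly as follows. This is a toy model in Python, not the actual extension: the Revision class, the function names and the sample data are invented, and falling back to the newest revision when nothing has been patrolled yet is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Revision:
    rev_id: int
    saved_on: date
    patrolled: bool


def revision_shown_to_readers(history: List[Revision]) -> Optional[Revision]:
    """Newest patrolled revision; history is ordered oldest to newest."""
    for rev in reversed(history):
        if rev.patrolled:
            return rev
    # Assumption: with no patrolled revision, show the newest one.
    return history[-1] if history else None


def oldest_pending(history: List[Revision]) -> Optional[Revision]:
    """Oldest unpatrolled revision made after the last patrolled one."""
    pending = None
    for rev in reversed(history):
        if rev.patrolled:
            break
        pending = rev
    return pending


def patrol_backlog(pages: Dict[str, List[Revision]]) -> List[str]:
    """Titles with pending changes, oldest pending change first."""
    entries = []
    for title, history in pages.items():
        rev = oldest_pending(history)
        if rev is not None:
            entries.append((rev.saved_on, title))
    return [title for _, title in sorted(entries)]


# Invented sample data.
pages = {
    "chair": [Revision(1, date(2013, 1, 5), True),
              Revision(2, date(2013, 7, 1), False)],
    "breit": [Revision(3, date(2012, 8, 10), True),
              Revision(4, date(2012, 8, 20), False),
              Revision(5, date(2013, 6, 1), False)],
    "word": [Revision(6, date(2013, 7, 10), True)],
}
print(patrol_backlog(pages))
```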
The noun class that needed to be checked was here: the entry itself is actually Xhosa; the change (presumably) went unchecked because the template used to add classes was and is named as if only used for Swahili, though classes are (as Jcwf's summary notes) used in many languages. The oldest unchecked Arabic edit is this one; does anyone here know if it's OK? This Dutch pron also needs to be checked. - -sche(discuss)04:10, 15 July 2013 (UTC)
The Dutch pronunciation looks ok, although I would put a length mark after the o, because Dutch has no phonemic distinction between those two sounds (unlike between and which are marginally distinct). As for noun classes, Module:gender and number supports them, so you can use them as if they were genders. —CodeCat11:55, 15 July 2013 (UTC)
Well, certainly I have seen it on other Wikis - but I haven't investigated it. Are our sysops any more likely to use this system than the current one? I was also worried about backlogs:- for instance, the current German word-of-the-day has 9 changes pending review (unchecked since August of last year). Would it be easily turnoffable? SemperBlotto (talk) 18:58, 15 July 2013 (UTC)
Suggestion for additional information on the landing page for a "deleted" entry
This thread on the Feedback page got me to thinking. I've seen similar complaints a number of times in the past, that an anon added an entry, someone deleted it, and the anon re-creates the entry in quick order, only to have that deleted again and then the anon gets blocked. This leads to confusion, alienation, and often the loss of a potential contributor.
Would it be possible to add some kind of additional information on the landing page for a deleted entry, to clue in anons as to what to do? I.e., hints for why the entry might have been deleted, links to relevant pages about format etc., and links to the fora and/or editors who can help newbies? ‑‑ Eiríkr Útlendi │ Tala við mig17:42, 15 July 2013 (UTC)
I was thinking. In definitions we generally only link to English words. So it makes sense to use {{l|en}} to link to the English, as words don't exist independently of language. I struggle to think of any time where it's best not to link to the language section but rather to link to the page as a whole. So the question is (getting to the section heading) where should square brackets such as ] be used? Under what circumstances are they better than a link template? Mglovesfun (talk) 10:29, 16 July 2013 (UTC)
A while ago I proposed using a special shortcut {{d|definition}} for linking to English terms in definitions rather than {{l|en}} which is longer. But that never went anywhere. —CodeCat11:54, 16 July 2013 (UTC)
In English and some Translingual sections in Latin characters the benefit from using templated links instead of simple wikilinks seems non-existent, but their use does require template expansion. Any revision in any of the templates underlying their deployment requires many cycles to work through.
Is anything more at work in the global deployment of the descendants of {{l}} than an impulse to standardize? Are we preparing for a time when English may appear in a different font? Why? AFAICT links to English and Latin-character terms from any section should not be templated links either.
As I said, terms don't exist independently of language. When you use the word chair in a definition, it's not the string of five characters you wish to link to but the English word chair. Mglovesfun (talk) 09:44, 18 July 2013 (UTC)
I’m one of the few who always use {{l}} so I guess I should defend its use:
Using the {{l}} templates links to the correct section. You might say: “oh but there is only one language section, so it’s unnecessary.” But how often are you willing to check the page to make sure it still has only one section?
You might also say: “but it’s linking to the English section, which is the first anyway.” Even then, section linking will skip the upper content (user page link, page editing button, etc.) and the ToC, and the multilingual entry if any (again, even if there isn’t one, how often are you willing to check?)
If you use tabbed languages, click a regular link and the link’s page has a section in the current language, you will remain in the same language instead of English. Even if the page linked to looks like there’s no way in hell it can have a non-English section, remember that recent loanwords tend to be unadapted (art dealer.) — Ungoliant(Falai)10:58, 18 July 2013 (UTC)
It tags the term with a span containing lang="langcode". This doesn’t have much use now, but in the future it will be very useful. For example, if a script is made that colours a link green if the page exists but doesn’t have a section in the given language, it will be much easier to do so if the link has a lang= parameter.
{{l/en}} and the other {{l}}/foos are so small their resource consumption is negligible compared to the advantages I list above.
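The two effects listed above (linking to the language section rather than the top of the page, and wrapping the term in a span that carries the language code) amount to something like the following. This is a Python sketch of the idea only; the function name, the LANGUAGE_NAMES table and the exact HTML are assumptions and do not reproduce the real output of {{l}}.

```python
from html import escape
from urllib.parse import quote

# Hypothetical code-to-name table; only a few languages for illustration.
LANGUAGE_NAMES = {"en": "English", "nl": "Dutch", "pt": "Portuguese"}


def language_link(lang_code: str, term: str) -> str:
    """Build a link to the term's section for the given language."""
    section = LANGUAGE_NAMES[lang_code].replace(" ", "_")
    page = quote(term.replace(" ", "_"))
    href = f"/wiki/{page}#{quote(section)}"
    return (f'<span lang="{escape(lang_code)}">'
            f'<a href="{escape(href)}">{escape(term)}</a></span>')


print(language_link("en", "chair"))
print(language_link("pt", "art dealer"))
```

A script that recolours links, as suggested above, could then select on the lang attribute instead of guessing the language from context.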
Some time ago someone had fun removing my user page. I just recreated it and will continue doing so. It's our right as registered users to create one, isn't it? Whoever thinks they stand above us is no better than a dictator. Thank you very much indeed for your attention, bureaucrats and other people with extra buttons they love to abuse to scare users acting in good faith. Read the famous five "laws" of Wiki. Relax, I won't ask for a ban :-) |Klaas "Z4␟" V| 13:08, 16 July 2013 (UTC)
I would hardly say it's a 'right', you make it sound like the right to a fair trial or the right to be free from inhumane punishment. Mglovesfun (talk) 13:23, 16 July 2013 (UTC)
This isn't Wikipedia. The consensus of the community here is that user pages are for dictionary business only. For instance, they have decided, by vote, to ban most user boxes (Babel boxes excepted). Users who have made substantial contributions to Wiktionary are given more leeway. Chuck Entz (talk) 13:40, 16 July 2013 (UTC)
{{head}} has used this new module for a little while, and it has been extended and updated some. It has a few more features as a consequence. In particular, headwords and inflected forms that contain wikilinks in them are now automatically linked to the correct language section, so you don't need to do this yourself. I have updated and reorganised the documentation of {{head}} to reflect these changes.
Maybe more important is that a variety of functions used by {{head}} are now exported from Module:headword. These can (and probably should) be used in headword-line modules for individual languages, to reduce duplication of code and to make it easier to make them work consistently the way they should. I have updated Module:nl-headword as an example, which you can base your own modules on. Notice in particular the four calls to m_headword.something at the bottom of the show function, and also the way in which inflected forms are specified (more or less like {{head}}'s parameters, but using Lua tables). The module's functions are now documented, so that it's clear what can be used and what it does. There is also a list of future changes, which would bring the module, and any other templates and modules that use it, in line with some of the features recently added to other templates like {{t}}. —CodeCat18:46, 16 July 2013 (UTC)
This page tells me otherwise. I might have added a couple of entries under an IP, but not under my user name if the page is to be believed. Note that I am not opposing anything; I am merely providing input with some degree of relevance. If my attempt to edit my user page was blocked back then, I would have probably gone on to create entries regardless. --Dan Polansky (talk) 15:28, 17 July 2013 (UTC)
You are right. I was reading the page from top to bottom instead of bottom to top. In that case you were incorrect to say that you were a "contributer" before you had contributed anything! SemperBlotto (talk) 15:36, 17 July 2013 (UTC)
The page I had created did not contain the misspelling "contributer"; it was this revision. You are kind of right that I was not really a contributor at the point at which I had created the page. Well, shame on me, I guess. --Dan Polansky (talk) 15:44, 17 July 2013 (UTC)
Anyway, the following are some of the users who have created their user page as their 1st edit:
Well, DTLHS is a false positive, of course. As for me, that's because I was already a Wikipedia editor; I suspect it's the same for the others. That's why that global edit count business could help solve a lot of this. —Μετάknowledgediscuss/deeds16:06, 17 July 2013 (UTC)
This seems rather newbie-biting to me (and when have we ever been accused of that before?), but if it is implemented I certainly hope it won't apply to SUL accounts that have been around for a while on some other Wikimedia project. —Angr17:46, 17 July 2013 (UTC)
Actually, this has already been implemented. We're having this discussion about un-implementing it precisely because we have been getting flak from veteran users of other wikis who come to this site and are blocked from making user pages. - -sche(discuss)17:51, 17 July 2013 (UTC)
Just add a condition that checks for external links. This is what the spambots want to drop as their payload, and no experienced Wikipedian will generally have an external link on their user page. -- Liliana•19:30, 17 July 2013 (UTC)
Sounds good to me, too. Ban new (to here) users from creating pages with external links, and any keywords we notice are common in spam (gucci, purse(s), shoe(s), sunglasses). - -sche(discuss)21:01, 17 July 2013 (UTC)
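The condition being proposed in the last two comments could look roughly like this. It is a sketch in Python rather than in the abuse filter's own rule syntax, and the function name, the MIN_GLOBAL_EDITS threshold and the idea of exempting accounts by global edit count (mentioned earlier in the thread) are assumptions made for illustration.

```python
import re

# Keywords named in the discussion; a real list would need maintenance.
SPAM_KEYWORDS = ("gucci", "purse", "shoe", "sunglasses")
EXTERNAL_LINK = re.compile(r"https?://", re.IGNORECASE)
MIN_GLOBAL_EDITS = 10  # hypothetical threshold for users established elsewhere


def should_block_user_page(text: str, local_edit_count: int,
                           global_edit_count: int = 0) -> bool:
    """Decide whether a user-page creation should be stopped."""
    # Contributors here, or established users of other wikis, are exempt.
    if local_edit_count > 0 or global_edit_count >= MIN_GLOBAL_EDITS:
        return False
    if EXTERNAL_LINK.search(text):
        return True
    lowered = text.lower()
    # Match whole words, allowing a plural "s" (purse/purses, shoe/shoes).
    return any(re.search(rf"\b{re.escape(word)}s?\b", lowered)
               for word in SPAM_KEYWORDS)


print(should_block_user_page("Buy cheap Gucci purses at http://example.com", 0))
print(should_block_user_page("I am a linguist interested in Dutch.", 0))
```

Matching whole words keeps ordinary text such as "shoehorn" from tripping the keyword check, though false positives remain possible, so the user should be told why the edit was stopped.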
But nobody (including me back then) needs a userpage at all if they are not a contributor. It wouldn't have stopped me from contributing all those years ago - why would it stop anyone now? SemperBlotto (talk) 06:47, 18 July 2013 (UTC)
Because some people like to be a person, instead of an anonymous pseudonym? Putting up a user page is a nice easy way to sign into a project, saying who you are, what skills you bring and maybe what you plan to do.--Prosfilaes (talk) 07:53, 18 July 2013 (UTC)
At the original point of discussion, there was some commentary that the filter I moved over let garbage through. The purpose of a single filter is not to catch everything; it is to be as effective as possible at its own job, with zero false positives. For that reason I would suggest a series of smaller effective filters rather than one mega filter that stops all garbage. The filter that I built was to stop the NTSAMR-type spam, nothing more, nothing less. If you want to stop users adding off-wiki links as new users, then that is not unreasonable as long as you directly tell users why, or you have good monitoring in place to welcome them. Just don't use the auto-filter; the spambot-hackers clearly know to create an account, and to leave it, often for over 2 months. — billinghurstsDrewth14:32, 18 July 2013 (UTC)
Judeo-Persian and Bukharic
Until yesterday, WT:LANGTREAT specified that Judeo-Persian was to be considered a dialect of Farsi and banned from having entries. Because I could find no discussion supporting that, and because the two lects differ in vocabulary and script, I concluded, following this convo with Metaknowledge, that that was a simple error similar to the erroneous ban on Tajik which you can read some history of here. Hence, I updated LANGTREAT to allow Judeo-Persian its own entries. But Bukhari also exists. Are Bukhari and Judeo-Persian distinct enough from each other that both should be allowed, or would it make sense to combine them, and if so, under which name? (We currently call Bukhari "Bukharic", but "Bukhari" is more common than any of its other names and than "Judeo-Persian", judging by Google and Google Books.) - -sche(discuss)19:05, 17 July 2013 (UTC)
Did you try searching for "Bokhari" as well? In any case, on the merits of script alone, Persian is fa-Arab only, Judeo-Persian is Hebr only, but Bukhari is Hebr/fa-Arab/Cyrl. So the two could be a bit messy to merge, were we to have entries, but not too bad. —Μετάknowledgediscuss/deeds19:11, 17 July 2013 (UTC)
Reviewing the literature, I find that does call Bukhari a dialect of Judeo-Persian. On the other hand, a number of scholars, including Solomon Birnbaum, separate Bukhari (considering it more Tajik-ish) from Judeo-Persian / Jidi / Parsic (more Persian-ish). I think we can treat them as separate languages for now, and merge them later if we discover that's warranted. - -sche(discuss)03:03, 20 July 2013 (UTC)
Users adding ky interwikis
Obviously we want all valid interwikis, but I suspect these accounts, a mixture of IPs and named accounts, are in fact bots. Furthermore, the interwikis are not even being added in the right place. I think Rukhabot can now sort interwikis as well as add them? Other than this, what to do? Mglovesfun (talk) 09:39, 18 July 2013 (UTC)
A little while ago there was a discussion about the format of adjective forms in German, I don't remember where it was exactly. The problem is that a single form might have many different functions, which leads to a very long list of definitions. Here is an example of an adjective, for reference: breit. The form breiten for example appears many times in the tables, as does breiter. The alternative that SemperBlotto's bot seems to use is {{inflected form of}}, which is in fact used only for German entries (so it should really be moved to {{de-inflected form of}}). But that template is really a bad substitute because it just says "inflected form of" and doesn't give any other information. So both breiten and breiter would be called "inflected forms" by this template, which isn't terribly useful or informative. (Note that we had a similar debate about Dutch adjective forms some years ago. But in Dutch, "inflected form" is actually the established term for one specific form of the adjective, so it's as concise and accurate as it can be, at least for Dutch. Not so for German.) So what should we do about this? I think at the very least we should get rid of {{inflected form of}}, it's very vague. But how do we display the information in a format that's not too long, but still informative enough that the user knows just what form it actually is? —CodeCat20:41, 18 July 2013 (UTC)
Perhaps a small collapsed box could be made for the definition line that simply displays Inflected form of Xxx, and when you click it it expands to possibly dozens of lines containing more detailed information. I suspect that most users are in fact not interested in the specific details of inflected forms that we usually provide (for nouns that would be the case, gender, animacy, definiteness, possessive forms, etc.), but simply want to jump to the main lemma where they can deduce it themselves from the context (once they know the meaning), or look it up in the inflection table if need be. --Ivan Štambuk (talk) 21:02, 18 July 2013 (UTC)
We can also use an intermediate approach where we provide some details on a few definition lines. Or maybe find ways to combine definitions to make them more concise. For example, breiten appears in both the weak and mixed declensions in the same places, so we can just say "(definition) weak and mixed". We can also group definitions under subdefinitions, like this:
How would that sort of list accept the addition of supporting quotations for each form and definition? Have we decided not to do that? --EncycloPetey (talk) 17:09, 20 July 2013 (UTC)
As far as I know, we put those on the lemma entry. It would be silly to require all quotations on the lemma entry to be only of the lemma form itself. —CodeCat17:13, 20 July 2013 (UTC)
Scripts and italics
A lot of scripts have italics disabled for {{term}}, and a few more had them disabled completely for any kind of italics whatsoever. Which is more desirable? Should these scripts never be displayed in italics anywhere on Wiktionary, or should this be limited only to mentions? Or should some scripts never be italic while for others it's only disabled for mentions? —CodeCat23:28, 19 July 2013 (UTC)
No script except Latin and Cyrillic should ever be set in italics. There's some disagreement as to whether Cyrillic should ever be italicized at Wiktionary. (I'm in favor of italicizing it with {{term}}, but I don't feel particularly strongly about it.) —Angr16:59, 20 July 2013 (UTC)
I'm among those who think Cyrillic should not be set in italics. The italicized versions of some Cyrillic characters are quite different from the standard font. The Cyrillic character that looks like a "T" in regular script looks like an "m" in italics, when they're handled correctly. Latin script should be italicized in the circumstances we're discussing, but in my opinion it's the only one that should. --EncycloPetey (talk) 17:12, 20 July 2013 (UTC)
To be exact, it's the lower case "т", which looks like Roman "m" when italicised - "т". Capital "Т" appears normal. Other letters may also look significantly different, note г, д, и (looks like Roman u) and т:
Russian lowercase alphabet letters, normal and cursive: а а, б б, в в, г г, д д, е е, ё ё, ж ж, з з, и и, й й, к к, л л, м м, н н, о о, п п, р р, с с, т т, у у, ф ф, х х, ц ц, ч ч, ш ш, щ щ, ъ ъ, ы ы, ь ь, э э, ю ю, я я.
Japanese in italics ranges from mildly funky-looking (like このサンプル this sample) to not-quite-legible (like 比喩的、龍、複雑怪奇、糞 these terms). YMMV, and it depends on what font your system uses. Typographically speaking, italics aren't used much in Japanese text anyway, at least in my experience. Changes in italicization can also make certain characters potentially more ambiguous, such as ソ (so, italicized) and ン (n, not italicized).
FWIW, I don't think italicized Cyrillic is all that problematic. But then again I'm not editing any entries using this script, so my opinion probably shouldn't carry much weight. ‑‑ Eiríkr Útlendi │ Tala við mig22:28, 20 July 2013 (UTC)
I think scripts that are not italicised by native convention should not be italicised on Wiktionary either. I think that would include Japanese. But Cyrillic certainly does appear in italics natively, so that is a different matter, and we have to look at what we prefer. What we could do is use "font-style: oblique" instead. This tells browsers to just display the normal font slanted, without using any of the special letter forms used in italics. That might help with any confusion with Cyrillic characters, and maybe for other scripts as well. —CodeCat22:39, 20 July 2013 (UTC)
Both samples here appear identically on my home machine -- Ubuntu 10.4, Chromium 25. Inspecting the elements shows that the browser thinks the former is indeed in italics and the latter in oblique, but the implementation shows up the same for me. ‑‑ Eiríkr Útlendi │ Tala við mig22:55, 20 July 2013 (UTC)
It may be a matter of the fonts. Some fonts have special italic forms while others don't. For me, the default font for Wiktionary, as well as the one used for Cyrillic, both display italic and oblique the same. But when I add "font-family: serif" like I did above, they appear different to me. —CodeCat22:58, 20 July 2013 (UTC)
It could be, but on my system they both look like the Latin letter string Memum with a soft sign after it. I use a Mac, and tried it with both Firefox and Safari. Chuck Entz (talk) 02:30, 21 July 2013 (UTC)
Also using Firefox on Windows, and they look identical for me too, like memumb. Same when I switch to Chrome. When I switch to IE, they still look identical but are in oblique, so they look like mеtиtb —Angr09:21, 21 July 2013 (UTC)
I'm using Firefox on Linux Mint 15, and for me they differ. So either Mint must be doing correctly what all of those other systems are doing wrong, or the font being selected on my system has distinct italic forms for Cyrillic whereas yours doesn't? —CodeCat11:00, 21 July 2013 (UTC)
The spec says “italic selects a font that is labeled as an italic face, or an oblique face if one is not,” and “oblique selects a font that is labeled as an oblique face, or an italic face if one is not.” Since a font with both italic and oblique styles is very rare, these two values are functionally identical.
I don’t know why CodeCat’s browser is showing two different effects above, but I would guess that it is being hyper-correct in artificially synthesizing oblique forms from the roman, which is not according to the CSS spec, and is bad font rendering (Windows might still do this, but Mac OS has avoided it for a decade or so). Mechanically-derived obliques usually distort letterform qualities like the contrast between horizontal and vertical strokes, and the stress axis (the angle at which the thicks turn into thins). —MichaelZ. 2013-09-08 05:39 z
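The fallback rule quoted from the spec above can be sketched in a few lines of Python. This is only an illustration of the quoted wording, not of any browser's actual font matching; the function name `select_face` and the set of face labels are hypothetical, and the final fallback to the upright face is an assumption that the quoted text does not cover.

```python
def select_face(requested, available):
    """Pick a face label for a font-style value, following the quoted fallback.

    `requested` is "italic" or "oblique"; `available` is the set of face
    labels a (hypothetical) font family provides.
    """
    # "italic" falls back to "oblique" and vice versa, per the quoted spec text.
    fallback = {"italic": "oblique", "oblique": "italic"}
    if requested in available:
        return requested
    alternative = fallback.get(requested)
    if alternative in available:
        return alternative
    # Assumption: with no slanted face at all, use the upright face.
    return "normal"
```

With a family that ships only an italic face, both `select_face("italic", {"normal", "italic"})` and `select_face("oblique", {"normal", "italic"})` pick the same face, which is consistent with the two samples looking identical on most of the systems reported above.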
I'm going to add my voice to those who think Cyrillic shouldn't be italicised, and thus that no script except Latin should be italicised. - -sche(discuss)02:01, 22 July 2013 (UTC)
We decided not to italicize Cyrillic because the italic forms vary, and may confuse those unfamiliar with Cyrillics. I have come to believe that this justification is bogus. If we really wanted to baby the non-readers, then we would do everything in transliteration, and stick to 26 Latin letters for every language.
Every language and writing system should be rendered according to its native conventions. That means italics for Latin and Cyrillic, but not for Greek and Chinese. —MichaelZ. 2013-09-08 05:39 z
The script request categories
I have often wondered just what the use is of having categories like Category:Entries which need Cyrillic script. We did somewhat mitigate that by adding the language to the name, but I feel that this was done without really looking at the nature of the problem. Generally, these categories are added to a page (by {{term}}, {{rfscript}} and others) when a term is given with only a transliteration, but not the term in the native script. Which script that is doesn't actually matter, because scripts are generally used differently enough from language to language that someone who can convert a Russian transliteration to Russian Cyrillic won't do all that well with converting transliterated Serbian into Serbian Cyrillic. And someone who knows how to write and transliterate Sanskrit won't generally understand how to do the same with a Hindi term. So what matters really is the language; the script is only secondary. So I propose to remove the name of the script from these categories altogether, and use a name like "(language) terms needing native script". {{rfscript}} would need to be converted so that it only takes languages as its parameter, rather than the script code. What do you think? —CodeCat19:19, 21 July 2013 (UTC)
The way things are in this case is good enough for me. The proposal to me seems not as good as the status quo, ergo oppose. Mglovesfun (talk) 22:34, 21 July 2013 (UTC)
Since you asked, CodeCat, you seem to have an agenda that everything must be made to work differently to how it works now, even things which work perfectly well. It worries me that a lot of good infrastructure will be thrown away for your personal reasons (whatever they are) and not for the good of the wiki. Mglovesfun (talk) 22:46, 21 July 2013 (UTC)
Maybe I should try to explain my reasons then. I am primarily concerned with consistency and making things work in a way that is the most intuitive and sensible. Some people like DCDuring complain about template-itis, and I do agree that it is rather confusing with how many templates we have. However, I argue that the confusion stems from how they all work differently from one another. If they all worked similarly, then it would reduce the mental burden on newcomers because they would not need to learn every little slight difference about all the templates, category names and so on. Instead they would be able to actually get things done because it would be easier to actually remember how all of their tools work. In this particular case, try typing {{term|tr=something|lang=ru}} in an entry. It will add the page to Category:Russian entries which need Cyrillic script. Can you see what is wrong with that? That category name will be the same even if you put it in a German entry. So the "Russian entries" part is incorrect, it should be "Russian terms". And you can increase consistency further by changing "which need" to the more usual "needing" (which is used in a lot more cleanup categories). Working "minority" conventions out of the system in favour of the majority so that people no longer have to think "was it this category that was called 'which needs' or was it that other one?". It will always be "needing", no more question needed, which leaves more mental room for questions that actually matter.
The other half of the reason is that I don't feel that the current structure of these categories makes sense. They may have made sense at one point, but there is always a time when you need to re-evaluate things that you once took for granted, and judge whether they really make as much sense as you thought they did. Let's say that the code above was placed on a German entry, and I am a Russian speaker and I want to fix any links to Russian terms that need the Russian spelling. Where do I go? Well, the place to start is Category:Requests (Russian) but that category is already a horrible mess. Let's disregard that for a moment and assume that I make it to Category:Russian terms needing attention (although there is nothing particularly intuitive about that category, since plenty of request categories are placed elsewhere in the tree). Then I see Category:Russian entries which need Cyrillic script. Aha, I think, that is what I am looking for, so I work through that category. But then later on, I come across its second parent category, Category:Entries which need Cyrillic script, and I find more Russian entries there. Why were those not listed in the first category? Why do I need to look in two categories to fulfill essentially the same task? Request categories should be task oriented, so that is bad organisation and it's what I am trying to eliminate with this proposal. Under this proposal, Category:Terms needing native script by language would contain subcategories organised by language, including Category:Russian terms needing native script, and both Category:Entries which need Cyrillic script and Category:Entries needing various scripts would disappear. There would also be no Category:Russian terms needing Cyrillic script because Russian terms are always in Cyrillic, the script is redundant. Maybe an exception could be made for those few languages that use multiple scripts, but for the majority of languages, the script is completely irrelevant to the task. 
People who can write in Russian don't go adding "Cyrillic" spellings to entries, they add Russian spellings and don't care about Ukrainian or Kazakh spellings because they don't know those. —CodeCat23:38, 21 July 2013 (UTC)
You make a compelling case! I agree that it's undesirable to have entries in two places, the {{term}}-added language-specific categories and the {{rfscript}}-added generic categories. And I agree that having categories use a standard naming scheme on "needing" vs "which need" is desirable. I am not, however, sold on the need to change "Russian Cyrillic script" to "...native script". For the small number of languages that can be written in multiple scripts (especially if all are non-Latin), "native script" is an unclear designation. For the rest, I don't see that "native script" is any better than "Cyrillic/Arabic/etc script". - -sche(discuss)06:24, 22 July 2013 (UTC)
If we keep the script in the name, then we'd still have to keep the name Category:Terms needing native script by language. And the convention is for categories named "(name) by language" to contain categories named "(language) (name)" so that is the most desirable thing here as well (see where consistency comes in?). Furthermore, if we decide to have, say, Category:Old Church Slavonic terms needing Cyrillic script and Category:Old Church Slavonic terms needing Glagolitic script, then would we place them both under Category:Terms needing native script by language? That'd be a bit unusual, listing the same language twice. Instead, though, if we really want to separate them by script, we can put both in Category:Old Church Slavonic terms needing native script, so that this parent category can be used for any terms whose script is for some reason unknown. So we can keep scripts in the names if we really need to, but they are now secondary to languages, and would only serve to further specify. In the old situation it was the other way around: script was primary and language was specified "when someone felt like it" which is far less useful. It's far more likely for someone who knows language X to know how to write X in several or all of its scripts, than it is for someone who knows script Y to know how to write several or all languages written in Y. So, it's far less of a problem to leave the script unspecified for terms in a given language, than it is to leave the language unspecified for terms in a given script. We can specify the script if we still want to, but it doesn't seem nearly as necessary. —CodeCat11:12, 22 July 2013 (UTC)
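The naming scheme proposed in this thread can be illustrated with a short Python sketch. The category names are the ones suggested above; the function names `request_category` and `parent_category` are hypothetical, and nothing here reflects how {{term}} or {{rfscript}} are actually implemented.

```python
def request_category(language, script=None):
    """Name of the request category for a language, optionally narrowed by script."""
    if script is None:
        # Default case: the language alone identifies the task.
        return f"Category:{language} terms needing native script"
    # Optional refinement for languages written in several scripts.
    return f"Category:{language} terms needing {script} script"


def parent_category(language, script=None):
    """Parent of the category returned by request_category for the same arguments."""
    if script is None:
        return "Category:Terms needing native script by language"
    # Script-specific categories sit under the per-language category,
    # so each language is listed only once in the "by language" category.
    return request_category(language)
```

Under this sketch, language is always the primary key and script only refines it: `request_category("Old Church Slavonic", "Glagolitic")` sits under `Category:Old Church Slavonic terms needing native script`, which in turn sits under `Category:Terms needing native script by language`.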
This edit popped up in my watchlist today where -sche redirected defense as an alternative spelling of defence. The policy/guideline page WT:AEN however still has this:
Words that are commonly spelled differently in different countries are all considered valid entries that should not be shortcuts to other versions. Full entries exist for both color and colour.
-sche claims that merging is something that has been approved by the community. However, most such redirects are being done by a few people, who invoke their own previous mergers as precedents. I think that this is agenda-pushing and that we should have both spellings as equally valid entries, kept in sync for any changes. If any spelling should be the default one, that would be American English which is the most widely spoken and the most influential variety of English. --Ivan Štambuk (talk) 21:27, 21 July 2013 (UTC)
I agree with removing this duplication, but I disagree that American English should be the default. —CodeCat21:29, 21 July 2013 (UTC)
Then what should be the default spelling? Some random criteria like "which spelling was created first as an article" or "which was more often updated" ? --Ivan Štambuk (talk) 22:15, 21 July 2013 (UTC)
Except that "first created" is not random, and it's the only way to do redirects without favouring either side. —CodeCat22:20, 21 July 2013 (UTC)
It's random in the meaning "not predictable". The time when the article was created was randomly chosen by mental processes of the editor, which are non-deterministic. It's inconsistent. Furthermore, it's not uniformly distributed due to the inherent bias of humans to prefer their own native (or taught) spelling. Just because in the early days of Wiktionary there were many American/British editors, it doesn't mean we should favor American/British spellings. Criteria of preference should be objective, universal and not subject to cultural prejudices of early editors. --Ivan Štambuk (talk) 22:40, 21 July 2013 (UTC)
It might as well be random; it's no different from tossing a coin in terms of the ultimate outcome. And I consider tossing a coin fairer so I'd rather we do it that way. BTW I'm being serious. Mglovesfun (talk) 10:05, 1 September 2013 (UTC)
Comments:
The line cited above, from the think-tank WT:AEN, hasn't been updated since 2009 (or earlier, since it seems to have been copied into AEN from somewhere else).
Does anyone dispute that syncing does not work? If so, can they point to any entry that has been correctly synced for any major portion of its history? Even colour and color are only synced because I synced them. Fewer than 1 in 20 of the entries I've seen where content was duplicated were actually synced; the rest contained differences of varying degrees of severity, with one entry or the other routinely missing common senses, and with all entries containing definitions different enough to wrongly imply that the terms had distinct meanings and could be contrasted with one another.
The class of entries which would require syncing is huge—indeed, it is open-ended: all verbs ending in -ise/-ize, nouns ending in -our/-or, adjectives ending in -ised/-ized or -oured/-ored... common entries with many senses, like realize, less common entries like paganize... and a limitless set of entries we don't have yet but would have to anticipate somehow so as to sync them once they appear (actuarialize, dictionarize, etc)...
Until now, pairs of entries have been merged by simply picking one spelling or the other. Other people have had different methods, but my method has been to edit two pairs of entries at a time, making the American spelling of one pair the lemma and the British spelling of the other pair the lemma. We could, however, adopt a policy like Wikipedia's, whereby the lemma is whichever spelling (i.e. whichever entry) was created first.
The only reason why syncing doesn't "work" is because nobody has bothered to actually enforce it. Instead, aggressive POV-pushers such as yourself who favor British spellings use it as an excuse to subtly push their agenda. The only parts that need to be synced are definition lines, the rest can be transcluded from the same place. It's not like these particular entries are updated billions of times per day. At most we're dealing with a few dozen edits per day for words that need to be mirrored in the alternatively spelled entry. These could be easily logged by a bot. --Ivan Štambuk (talk) 22:15, 21 July 2013 (UTC)
The answer to this is really "been there, done that". Yes, we know that we can do these things. The problem is that nobody actually does them. And in any case I am somewhat wary of relying on bots for the normal operation of the wiki. Just look at what has happened since a formatting dispute shut down Autoformat. —CodeCat22:20, 21 July 2013 (UTC)
What has happened? What is "normal operation of the wiki" ? Every entry is a continuous work in progress. Just because edits at defense and defence are not instantly mirrored to each another, it doesn't mean that we should abandon edit mirroring as a concept. Editing mistakes in templates or wiki code are minor technical issues. This has much bigger significance IMHO. Alternatives should at least be openly discussed. --Ivan Štambuk (talk) 22:49, 21 July 2013 (UTC)
LOL. Considering that I've merged entries in pairs (one to British, one to American), and that when I've created new entries I've almost always made American spellings the lemma, the fact that you think I'm pushing British spellings shows how little you pay attention. Of course, so does everything else you've said so far. As CodeCat notes, we've "been there, done that" and it hasn't worked. - -sche(discuss)22:23, 21 July 2013 (UTC)
I do want to note that -ize is also British, so we should put the main entry there at all times. —CodeCat22:35, 21 July 2013 (UTC)
No you haven't really done that. What you have done is the exact opposite - favoring a particular side on the basis of the argument that syncing doesn't work. The problem is that you are in principle against syncing, not because it won't scale (it would, with bot assistance), but because you want to reduce the amount of duplication that you perceive as redundant. The problem with that line of argument is evident in entries such as defense, the most common English spelling in the world, being redirected to a regional spelling defence. --Ivan Štambuk (talk) 22:42, 21 July 2013 (UTC)
Evidence of what exactly? I haven't really seen any proposal for systematically keeping entries in sync. I've seen people discussing what entries should be the "main" ones, and the benefits of merging (less maintenance). --Ivan Štambuk (talk) 23:25, 21 July 2013 (UTC)
I think you need to look at the problem more closely. defense is used by the most native speakers, but defence is used more widely because the influence of British culture is more widespread. That may be changing gradually, yes, but the US is still a relative newcomer as far as cultural influence goes. A hundred years ago, the chances were that if you went anywhere outside the American continent and they spoke English, it would be some form of British-influenced English. To call defence merely a regional spelling would be like me trying to elevate Ijekavian to standard Serbo-Croatian and calling Ekavian merely regional (it's only used in Serbia!). All of this is beside the point though and not really relevant, because this debate isn't about deciding which variety of English to base Wiktionary on. You complain that the criteria that have now been applied are arbitrary, but I argue that they have to be. Any argument about the relative use of one variety or the other will become a dead end, so only a completely arbitrary decision that has no linguistic merit is going to break this impasse. That's exactly what -sche has done and I think it's very good. —CodeCat23:08, 21 July 2013 (UTC)
The days of the British Empire are long gone. The US is by far the most dominant culture on the planet. From pole to pole on every TV, radio, newspapers etc. you'll see American movies, artists, celebrities etc. American language is the global English. From the perspective of how things are, today, the spelling defence is unfortunately a regionally-confined variant. It doesn't mean that it's any less "worth" though. I was merely making an observation that redirecting a more common spelling, to a less common one (from the perspective of majority of native speakers and FL learners), seems a bit problematic to me. The argument of relative use of some variety (from the perspective of widespreadedness, number of speakers, cultural relevance etc.) is indeed as arbitrary as "who created this entry first" - but that is the whole point. The only way to resolve this is to keep both spellings as full-blown entries. --Ivan Štambuk (talk) 23:25, 21 July 2013 (UTC)
Regarding the use of a bot to sync entries: among the many technical problems with using a bot to sync entries are: (1) Bots rely on humans to run them from time to time. If, between one bot run and the next, people edit foobarize and foobarise in different ways, what does the bot do? Throw out one of the changes? (2) If the definitions of a term (are changed to) use UK/US-specific spellings, words or grammatical constructions (e.g. if someone adds color to colorize or parka to winterize or "in hospital" to hospitalise), does the bot have an enormous table of all the tens of thousands of cross-Pondian differences in spelling and vocabulary and grammar, so that it knows to substitute colour into colorise and anorak into winterise and "in a hospital" into hospitalize, or does it awkwardly copy color and parka into the British entries and "in hospital" into a US entry? (3) Bots exist only if humans are able to write them. Has anyone spoken up to say "I am capable of writing a bot that can do a good job of syncing entries without perpetrating errors that will go unnoticed for just as many years as unsynced entries do"? - -sche(discuss)03:59, 22 July 2013 (UTC)
I didn't say that a bot would sync entries, but that a bot could provide assistance to humans who would manually mirror all changes. The bot would be a simple program that would fetch recent changes a few times a day (perhaps once a day would suffice), look through the ones that have been done within English L2s that have alternative spellings to UK/US entries, and build a backlog to process. One link to a diff, and another link to "edit English section" to copy/paste to. That wouldn't be very hard to write (I could do it).
Syncing all entries would be the crucial first step. Once syncing of all entries is done, we could choose some representative set of words, e.g. top 5000 English words, within that set copy/paste complete entries to what are now redirects, and make sure that entries with alt. spellings within them are kept in sync. If the volume of edits proves too low (as I suspect it would, these accusations of inordinate amount of traffic seem greatly exaggerated to me), it could be gradually expanded to encompass even more words to monitor, perhaps all daily edits.
This is just a suggestion of how it could be done. I'm not saying that this is the way it should be done. If the technical and capacity (volume of edits) difficulties actually prove that this is doable, I think that we should at least give it a shot. --Ivan Štambuk (talk) 11:39, 22 July 2013 (UTC)
There are few things quite as irritating as being forced to an entry whose citations and usage examples are all culturally irrelevant to you. I'd expect that users who favor the 'losing' spelling will be somewhat alienated by the experience as I am. I'd rather that we had separate entries as long as there is some usage context, regional or otherwise, in which a given spelling is dominant. Also, if there were even a single definition that was much more common in one context rather than another. To avoid the truly pointless, we can use {{trans-see}} to make sure that the translations are consolidated. If work is required, so be it. DCDuringTALK23:11, 21 July 2013 (UTC)
Re first sentence: To prove a particular spelling exists, we require citations of that spelling, but beyond that, citations are placed in lemma entries regardless of inflection, spelling and even language stage: many citations of the Middle English poet Chaucer are found in our English entries, most still using u for v and yogh for g, etc. And most works are attested in multiple versions, some UK and some US; Shakespeare, for example, used the spelling favor, but modern British editions of his works often use the spelling favour, so citations of him can be found in both US and UK entries; J K Rowling used honour, but US editions use honor. Thus, if you find Chaucer, Shakespeare and/or Rowling culturally relevant, you'll find them no matter where the lemma is, and if you find them irrelevant, you'll still find them no matter where the lemma is, even if we keep duplicate copies of all our entries. :/
Re last sentence: Why is it OK to consolidate translations? Why should I be allowed to view some (in theory "all", but we know from experience that both members of a pair of synced entries contain all definitions with the same wording barely 1 time out of every 20) of the definitions of foobarize in the entry foobarize, but then be required to go to foobarise to see the translations? That's irritating: I've just finished looking over the definitions, and then I have to go to another page to find the translations? When I find that the definitions and translation glosses on that other page are, most of the time, slightly or significantly different from the ones on the page I was just on, or even that some of the senses from foobarize are missing from foobarise, that's even more irritating and confusing! - -sche(discuss)03:32, 22 July 2013 (UTC)
Because {{trans-see}} is done on a definition by definition basis and we now have {{senseid}}. I had thought that we have three main populations of users: Monolingual users preferring US spellings, monolingual users preferring UK spellings, multilingual users interested in translations. I was hoping that we do not have to allow for multilingual users who prefer one or the other regional spelling too. As I think about it, though, we seem to have many contributors who seem to enjoy copying translations from one spelling to another. Maybe we should just take advantage of that and have translations at all main spellings.
In any event, my focus is on the kind of irritation that I know at least one monolingual contributor experiences. I suspect that irritation would apply to many, possibly even the polylingual. DCDuringTALK04:59, 22 July 2013 (UTC)
I don't care which variety wins - British or American. Unlike Serbo-Croatian or Mandarin entries, English entries are usually heavily edited. Having duplicate info is not sustainable and in the end they get out of sync. I don't think we need redirects for one variety but alternative spellings with most info in one entry only. --Anatoli(обсудить/вклад)05:21, 22 July 2013 (UTC)
I also oppose having multiple entries, and strongly oppose picking either variant to be always the lemma. Who is going to check the thousands of entries every so often to make sure they are being synched correctly? Who is going to develop the bot? Who is going to maintain the bot and make sure it’s working like it should? Not you. — Ungoliant(Falai)07:04, 22 July 2013 (UTC)
"Thousands of edits" is perhaps a yearly amount of edits that all such English entries get. Perhaps some statistics should be gathered first before discussing this altogether. A monitoring bot wouldn't be terribly difficult to write (fetch a list of pages, extract within them the English L2s from the current page and the one from X hours before, and if they differ add the page to the backlog). In any case, I think that technical difficulties are secondary, and that the most important thing is to acknowledge that our current approach of pseudo-random redirects introduces some important issues that I'm afraid are being silently swept under the rug, and justified under the orthodoxy of "efficiency and less maintenance = good, duplication = bad". --Ivan Štambuk (talk) 11:39, 22 July 2013 (UTC)
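The monitoring idea described above (extract the English L2 from two revisions of a page and queue the page if they differ) might look roughly like the following Python sketch. It is only an illustration: `get_old` and `get_new` are hypothetical callables that return wikitext for a title, no real bot framework or API is used, and the heading regex assumes level-2 headings written as `==Language==` on their own line.

```python
import re


def english_section(wikitext):
    """Return the body of the ==English== level-2 section, or '' if absent."""
    # Split on level-2 headings only; '===Noun===' does not match because the
    # captured heading name may not contain '='.
    parts = re.split(r"^==([^=\n]+)==[ \t]*$", wikitext, flags=re.MULTILINE)
    # parts is [preamble, name1, body1, name2, body2, ...]
    for name, body in zip(parts[1::2], parts[2::2]):
        if name.strip() == "English":
            return body.strip()
    return ""


def build_backlog(titles, get_old, get_new):
    """List the titles whose English section changed between two revisions."""
    backlog = []
    for title in titles:
        if english_section(get_old(title)) != english_section(get_new(title)):
            backlog.append(title)
    return backlog
```

A human would still have to mirror each queued change by hand; the sketch only narrows down which pages to look at, and it ignores edits to other language sections on the same page.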
Translations will eventually (asymptotically with respect to time) be the most frequently updated part of any English entry. At some point many of the basic English words will have so many translations that it will place a significant burden on page load times and template transclusion constraints. I suspect that this will be solved by only displaying some limited subset of languages (e.g. top 200), and translations for the rest will be stored on another page and fetched asynchronously on user demand (perhaps a subpage, or a separate namespace). I also suspect that eventually all translations will be treated that way so that the section for translations could be "turned off". That could render the whole issue of "what spelling gets translations" moot, but only in the long term. For now, {{trans-see}} on whatever seems just fine. --Ivan Štambuk (talk) 22:19, 22 July 2013 (UTC)
There was some talk a while back about the possibility of putting the lemma information under a combined, special "term", such as ], and using redirects and/or transclusion so that users would still find the entry content under ] and ]. This way, there is only ever one page with the content -- and thus no question of synchronization or duplication. After all, the only real difference between ] and ], etc. is the spelling -- all else is pretty much the same, for our purposes.
I don't know if that would improve things a whole lot. Essentially, under the currently proposed system, users going to either color or to colour will face a redirect link. Under your proposal, they will both have redirects. What we want, ideally, is for this to depend on the preferences of the user. color should be the full lemma page for users who prefer US English, while it should be a redirect for users whose English uses colour. The content itself would be placed on color/colour (if anyone asks: the two spellings are ordered alphabetically), and color and colour would each contain code to decide whether to transclude that page, or output a form-of definition stub. —CodeCat18:12, 22 July 2013 (UTC)
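The decision logic proposed above could look roughly like this. It is a sketch only: the function names, the `"US"`/`"UK"` preference keys and the stub wording are all assumptions made for illustration, and no such preference mechanism is described as existing.

```python
def shared_title(spellings: dict) -> str:
    """Title of the single content page: the spellings in alphabetical order."""
    return "/".join(sorted(spellings.values()))

def render(title: str, user_variety: str, spellings: dict, content: str) -> str:
    """Decide what a reader sees on one of the spelling pages.

    spellings maps a variety label (hypothetical: "US", "UK") to its spelling.
    content stands in for the text stored on the shared page.
    """
    preferred = spellings[user_variety]
    if title == preferred:
        # The reader's own spelling: show the full shared content.
        return content
    # Any other spelling: output a form-of stub pointing at the preferred one.
    return f"Alternative form of {preferred}"
```

For example, with `spellings = {"US": "color", "UK": "colour"}`, a reader who prefers US English gets the full entry on color and a stub on colour, and the reverse holds for a reader who prefers UK English.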
@CodeCat, I don't know if this will ever come to pass, but my ideal would be that users visiting either ] or ] see exactly the same content, and that editors working on the entry would only have to worry about one single set of content, perhaps at ] or at ], etc. Stub redirects are a bit of a kludge, and not very good for usability, but it's the best we've come up with so far to deal with pages that might have multiple languages on them. Keeping the content in one place and transcluding it into multiple other places as required would seem to fix that need to use stub redirects. ‑‑ Eiríkr Útlendi │ Tala við mig23:23, 22 July 2013 (UTC)
I refer to the mold/mould example again. I don't see why we should stray from that. As for which form is preferred (American or British), we can use the same strategy as enwiki: the first one who converts an entry to an alternative-form-of wins. -- Liliana•22:11, 22 July 2013 (UTC)
Wikipedia provides invisible redirects, we give semi-stubbish entries. If a spelling such as color has five times as many Google hits as an alternative spelling colour, it makes no sense to force 90% of users who land on ] to read a demeaning entry that says that their preferred variety of English is merely an "alternative spelling", and make an extra click to see the wanted content. On many of the merged entries this is exactly what has been done. --Ivan Štambuk (talk) 22:42, 22 July 2013 (UTC)
@Ivan, the use of stub redirects on WT is because of our data structure -- a single headword might contain data for umpteen different languages, so we cannot just redirect a headword in toto without possibly messing things up for all the other languages that might also have a term with that spelling. ‑‑ Eiríkr Útlendi │ Tala við mig23:23, 22 July 2013 (UTC)
I'm a bit puzzled by the way you combine "alternative spelling" with negative words like "demeaning" and "merely". I don't see the connection. I would almost say you are showing a... Balkan attitude? —CodeCat23:10, 22 July 2013 (UTC)
Because it's not really merely an alternative spelling once you have regional labels such as Commonwealth or US attached. When you soft-redirect that way it could be interpreted as having an overbearing undertone of cultural superiority, of one spelling being "improper" or "substandard", and the other one not. Perhaps I'm too sensitive on this matter, but it just doesn't feel right to me. --Ivan Štambuk (talk) 02:39, 23 July 2013 (UTC)
What do you think of the wording used on pyjamas, "(Commonwealth English) Standard form of pajamas" ? I used that wording on a number of entries, following Wiktionary:BP#Standard_spelling_of. I went back to the much more common "alternative form of" wording when it was pointed out in other discussions that it might make more sense to decide what/how pre-existing templates should display/be used (e.g., should {{British}} be changed to display "British"? and should {{British spelling}} be changed?) before adding a new template to the mix, but it would be simple to revive it — simple to create either a {{standard form of}} template to use with context tags, or even {{British standard form of}}/{{American standard form of}} templates to obviate the need for context tags. That wording/format would also be useful to other languages (Swiss+Liechtenstein German / German+Austrian German, Quebec French / Metropolitan French, etc). - -sche(discuss)17:39, 23 July 2013 (UTC)
That certainly seems "less wrong", and as for other pluricentric languages that you mention I support the same treatment as with English, i.e. content duplication, for exactly the same reasons. At this point however, we might as well use {{misspelling of}}, I doubt anyone would care. I have high hopes for some riled up new editors in the future though! --Ivan Štambuk (talk) 12:05, 25 July 2013 (UTC)
I doubt there'll be as many editors riled up about it as there are editors riled up on your talk page even right now about how "demeaning" it is to Croatians and/or Serbians and/or Bosnians and/or Montenegrins that we don't duplicate Serbo-Croatian content in four or five different language sections. :-p I happen to agree with your talk-page comment that "having sections that would be more or less identical has already been tried, and it becomes next to impossible to maintain"...it's one reason I'm so puzzled that you support duplication of US/UK and Quebec/France spellings! - -sche(discuss)21:19, 25 July 2013 (UTC)
I see it differently: by treating Serbo-Croatian as one language we "hurt" everyone the same and so no harm is done, or better said - the inflicted damage is balanced out. But by giving preference to a single form (like we do for regional variants of all English, and I suspect other pluricentric languages as well, some of which you mentioned), we necessarily give preference to a certain group of speakers which strikes me as unfair. If there should be redirects, the only fair way would be to use e.g. random.org feed and calculate modulo #NumberOfVarieties to decide which spelling to standardize on. Over the long term, the duplication will become much less of a problem once all of the disputed English entries get updated as rarely as fiberglass/fibreglass - once or twice a year. --Ivan Štambuk (talk) 21:55, 25 July 2013 (UTC)
Why try to sync such entries? It's quite normal that they should be different in some cases (e.g. if there is a sense of color only used in American English, this sense should not be added to colour). Carelessly syncing entries might lead to errors. And it's very important never to remove sound and correct content from a page, because this rule is the only way to ensure that each page will approach perfection in the long term. We are not in a hurry. Lmaltier (talk) 18:08, 30 July 2013 (UTC)
Having entries for spellings instead of terms is a bad idea for readers, and a bad idea for the integrity of the dictionary. Two spellings are not two terms. This is the whole point of the concept of lemma in dictionary-making.
For starters, you are ignoring that colour has been used in the US, and is the more common spelling in Canada, which speaks a variety of (North) American English. Unless I know what spellings and senses of a term are used in every single place in the world, I shouldn’t create an entry for a term that leaves out any senses. If a reader looks up either variant spelling of colo(u)r, she should be able to determine from the entry what this word means everywhere in the world. If she has to parse two different web pages with a diff program to deduce which meanings are associated with which spellings in what regions, then the dictionary is failing. —MichaelZ. 2013-08-31 21:06 z
Or four pages, in cases like Labor. —MichaelZ. 2013-08-31 21:24 z
Re: 'Except that "first created" is not random, and it's the only way to do redirects without favouring either side': This is blatantly false. A variety of hash functions applied to each considered spelling is capable of splitting the spelling set into two approximately equally big halves without being impacted by the accident of whether an early contributor to Wiktionary was a British or an American speaker. An example of such a function: take the number of letters of the shorter of the two forms and make the American entry the main one if the number is even, or the British entry otherwise. I am not pushing this particular method; I am merely showing that there is a variety of methods rather than there being "the only way", and the mentioned "only way" is not guaranteed to give equal treatment to the two varieties. --Dan Polansky (talk) 09:50, 1 September 2013 (UTC)
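The letter-count rule given above as an example can be written out as a small function. The name `main_variety` is hypothetical, and the rule itself is only the illustration from the comment, not a method anyone has adopted.

```python
def main_variety(american: str, british: str) -> str:
    """Pick which spelling hosts the main entry.

    Rule from the example: take the number of letters of the shorter of
    the two forms; an even count makes the American entry the main one,
    an odd count the British entry.
    """
    shorter_length = min(len(american), len(british))
    return "American" if shorter_length % 2 == 0 else "British"
```

Under this rule color/colour goes to the British spelling (the shorter form has five letters), while mold/mould and fiberglass/fibreglass go to the American one (four and ten letters). The outcome depends only on the spellings themselves, not on which page happened to be created first.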
Among the claims WT:AEN makes that might have been true in 2009 but which seem dubious now is this: "It is community consensus not to provide entries for Modern English possessive forms which are formed by adding the enclitics ’s or ’, and which are otherwise not idiomatic (with the single exception of the pronoun one’s). However, they are welcome as emboldened but unlinked words in inflection lines." In practice, this is not true: we don't list possessive forms in the inflection lines of name entries like Kevin or Smith, noun entries like doctor, blue, or vanity, etc... even Jesus sensibly uses a usage note. And the only pronoun that has an apostrophe-s form is the explicitly specified exception, "one". Does anyone object to removing the underlined sentence? - -sche(discuss)23:18, 22 July 2013 (UTC)
If you can call the discussion and vote back then consensus. Procedurally, WT:AEN is not a policy page. If we ever make much reference to it, it may become one.
Substantively, I don't object to deleting the sentence. Looking at Garner's Modern American Usage's three pages on various aspects of possessives, very little of it seems lexical. It does not even have a discussion of possessive form of "Jesus" or "one" (that all-important exception for) at those entries. Accordingly, I expect that anything we consider lexical could be placed in Usage notes, where, unfortunately, users will generally not notice it. I don't think a better fate awaits such information in WT:AEN or in some Appendix.
There are many other sites on the web that seem to offer help to users on questions of English grammar and style. I doubt that we can effectively compete. We get relatively few questions about matters of grammar and style. And many of those are from regular contributors.
Perhaps we could collect some WP links somewhere and contribute to the WP articles. If in the future we decide that WP articles were too academic or that the demand for our services as grammarians and stylists could not be ignored, we could revisit our lack of coverage of grammar and style. DCDuringTALK21:35, 23 July 2013 (UTC)
Hello, Sorry for English but It's very important for bot operators so I hope someone translates this.
Pywikipedia is migrating to Git so after July 26, SVN checkouts won't be updated. If you're using Pywikipedia you have to switch to git, otherwise you will use out-dated framework and your bot might not work properly. There is a manual for doing that and a blog post explaining about this change in non-technical language. If you have question feel free to ask in mw:Manual talk:Pywikipediabot/Gerrit, mailing list, or in the IRC channel. Best Amir (via Global message delivery). 13:07, 23 July 2013 (UTC)
You would think that eventually someone would pick up on the fact that English Wiktionarians tend to be able to speak English... --Yair rand (talk) 14:25, 23 July 2013 (UTC)
What did you expect- it's a bot that's delivering the same message to every Wikimedia site. Of course, they'd be better off saying "sorry if this is the wrong language". Chuck Entz (talk) 14:48, 23 July 2013 (UTC)
But he said "sorry for English" and "I hope someone translates this." Either he made similar misleading word choices in both those phrases, or he meant what he said, that he was sorry for posting in English.--Prosfilaes (talk) 00:18, 24 July 2013 (UTC)
That's probably the influence of my day job leaking through -- we deal with so much grotty English that we have to "translate" into more proper grammatical structures that I'm biased towards that interpretation. But re-reading the post, and the rest of this thread, I'm sure you're right. ‑‑ Eiríkr Útlendi │ Tala við mig00:42, 24 July 2013 (UTC)
Can we phase out the script code templates?
For a long time now, the idea has kind of been "in the air" among some editors that it would be desirable to move all of our script styling differences into the MediaWiki:Common.css file. With some help from Z and Mzajac we figured out that it would actually be very feasible because almost all of the script code templates contain either the same code, or only small differences that can easily be handled with CSS. So over the last two weeks I've been working on this and it's now more or less complete. Specifically, what has happened now is that all of the script templates are the same, through calling {{script helper}}, and the only difference is a CSS class that is applied to the wrapper element. This in itself is nothing new, but in the old setup, different script templates might select different HTML elements as wrappers depending on the desired formatting/style; with the new setup the selected element is always the same regardless of script, and the CSS formats the elements depending on the script-code CSS class that is given to the element. So rather than saying "don't use the <i> element for mentioned terms in Cyrillic", the new method is "don't show the <i> element in italics for Cyrillic". This means that if you write <i lang="ru" class="Cyrl">(some text)</i>, the CSS will recognise that Cyrillic text is not to be made italic, and will just display it as regular upright text even though the HTML says it's in italics. That is separation of content and presentation at its best, I think? :)
As far as I can tell, this has been successful and that brings up the next question. Do we still need the script code templates? Since they all just invoke {{script helper}}, we could instead use that template, but there isn't a reason to do even that. After all, that template really just generates one of four HTML elements, depending on the face= parameter, which in turn depends on the specific context in which you want to show the text. And a CSS class. So it would be more economical to take this middle step out, and just write the HTML code directly in our templates and modules. This would eliminate more template calls which would make frequently-called templates like {{t}} and {{l}} quite a bit faster.
So just to wrap this up and clear out any confusion. This proposal is about eliminating only the templates, not the codes themselves. The codes would still exist, but would not need any template to support them; all script support would be through CSS. —CodeCat01:47, 24 July 2013 (UTC)
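To make the proposal concrete, here is a rough sketch of what "one of four HTML elements plus a CSS class" might look like when written out directly. The face names, the element chosen for each face, and the `mention` class are assumptions made for illustration; only `<strong class="headword">` and the `<i lang="ru" class="Cyrl">` pattern come from this discussion.

```python
from html import escape

# Hypothetical mapping from face to (HTML element, extra CSS class).
FACE_ELEMENTS = {
    "head": ("strong", "headword"),  # headword line
    "term": ("i", "mention"),        # mentioned term; "mention" is assumed
    "bold": ("b", None),             # plain bold
    None: ("span", None),            # plain tagging, no special formatting
}

def tag(text: str, lang: str, script: str, face=None) -> str:
    """Wrap text in the element for the given face.

    The script code is attached only as a CSS class, so whether Cyrillic
    inside <i> is actually shown in italics is left to the stylesheet.
    """
    element, extra = FACE_ELEMENTS[face]
    classes = script if extra is None else f"{script} {extra}"
    return (
        f'<{element} lang="{escape(lang)}" class="{escape(classes)}">'
        f"{escape(text)}</{element}>"
    )
```

The element is the same for every script; only the class differs, which is the separation of content and presentation that the post describes.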
If I understood correctly, everywhere where script templates used to be invoked inside template code will now be HTML/CSS ? --Ivan Štambuk (talk) 02:18, 24 July 2013 (UTC)
To DTLHS: That will take some time. I have been working to ensure that all of them have a lang= parameter at least, but completely eliminating them will take a while. Of course it will never happen if we don't decide to start.
To Ivan: There is an alternative template {{lang}} that can be used to add script-based formatting to text in entries, so it can replace the script templates in this role. It takes two parameters like {{l}} does, and it selects the script based on the language code, so it should be easy to use. For templates, writing out the HTML is probably preferred, but you can also use {{lang}} if you want to. You can't get term or headword formatting that way though, because they need CSS classes. But if you want to display a headword you can just invoke {{head}} itself (which has other benefits as well), and to show a term, just use {{term}}. That leaves {{lang}} for situations where you just want plain tagging of non-English text without any formatting. —CodeCat02:20, 24 July 2013 (UTC)
What about languages that are written in multiple scripts? Will there be automatic script-detection, or sc= will still be needed in such cases? --Ivan Štambuk (talk) 02:27, 24 July 2013 (UTC)
{{lang}} currently uses the "old" method of using the default script if none was given, so it should be fine for most languages. It doesn't have script detection yet, but if it's converted to Lua that could be added as well. —CodeCat10:03, 24 July 2013 (UTC)
I support using direct HTML code in our major modules that are used heavily in pages, i.e. Module:links and Module:translations and maybe Module:headword as well, it's a good idea to optimize them like this. Regarding other modules and templates (languages-specific headword templates/modules, etc.), it may not be a good idea, however. As you mentioned before, there is little point in optimizing the headword templates by using direct HTML code, as they are not heavily used in pages, unlike {{l}} and {{t}}. What if we decide to make a change in tags or in a class? This would be really hard to do if we use direct HTML code in templates. --Z16:13, 24 July 2013 (UTC)
But that can also be applied to any of the other CSS classes that we already use. And there has never been a problem with that. —CodeCat16:18, 24 July 2013 (UTC)
Other CSS classes are used in certain templates, so they are easy to edit. After this proposed change, a script-related class will be used in many templates and modules. And what if we want to make a change for a face, say, not to use our currently deprecated tags, or add/remove/change a class for it? Yes, these are not very likely to happen, but my point is there is also little need to optimize here. --Z17:09, 24 July 2013 (UTC)
I suppose we can keep {{script helper}} around for other templates to use if we really want to. But many templates have already been using <strong class="headword"> for some time, and I don't think templates really use mentions or bold text for anything else (and if they do, they seem to use bare italics and bold instead). —CodeCat17:17, 24 July 2013 (UTC)
So what about removing the "bold" face? After Lua-izing {{term}}, the "term" face in script helper will be useless and should be removed too. So only "face" will remain, which can be added to {{lang}} with an if, or can be used through another template, {{headword}}. --Z17:47, 24 July 2013 (UTC)
The "term" face is also used in other cases, not just in {{term}}. It's used in form-of templates for example, where it actually appears bold instead of italic. —CodeCat18:04, 24 July 2013 (UTC)
What about Lua-izing {{lang}} in such a way that takes face parameter, like the old script templates? We can add script detection and automated transliteration features to it, too. --Z18:20, 24 July 2013 (UTC)
But when would you add face=head to {{lang}}, when there is no need to use that template in the first place because you can use {{head}}? —CodeCat19:16, 24 July 2013 (UTC)
Not sure if I understand your question, do you mean we do not need {{lang|face=head|...}} because we can always use {{head}}? Then there's no need to use <strong class="headword"> either. --Z19:35, 24 July 2013 (UTC)
This will also significantly complicate writing simple templates by requiring users to be familiar with intricacies of HTML/CSS, which were before abstracted away in simple script template invocations. So now we have presentation generated in three different languages (HTML/CSS, wiki markup and whatever gets emitted by Lua), control-logic processing and content generation in two languages (template language and Lua). What a horrible mess. --Ivan Štambuk (talk) 11:55, 25 July 2013 (UTC)
Like I said, if you find writing HTML too complicated, use {{head}}? And even if we delete the script templates, we can keep {{script helper}} around for the kind of thing you mention. —CodeCat17:17, 25 July 2013 (UTC)
For me it isn't a problem, I'm just saying that it unnecessarily raises the bar of complexity for less savvy editors, who will get intimidated. Especially if they suddenly see their simple headword-line templates composed of {{head}} invocation rewritten by you. --Ivan Štambuk (talk) 01:39, 26 July 2013 (UTC)
Hello. I would like some outside input on the block of the above user by User:SemperBlotto. TCN7JM created his userpage, and SemperBlotto deleted it with the summary "No usable content given: contributions first, then a user page", which is surely not reflected in the consensus at #Blocking new users from creating userpages a few weeks ago. When TCN7JM recreated his userpage, which he had every right to do under local and global policy, SemperBlotto deleted the page and blocked him for "Disruptive edits".
Crosswiki users frequently create userpages on all Wikimedia Foundation wikis, including this one: see m:User:Pathoschild/Scripts/Synchbot and m:Global sysops (specifically "Global sysops must have user pages on every wiki they use their global sysop access on, which provides contact information or links to their primary user page.") As such, this disturbing practice of blocking global users when they create userpages on all wikis means that almost all crosswiki users, including stewards and global sysops, will need to be blocked immediately. Surely this is not what you want, and I encourage you to rethink this practice. --Rschen775408:43, 25 July 2013 (UTC)
Unblocked. I’ll tell you the same thing I told the guy in his talk page: we have some people who come here and expect it to work like a social network. This is why we usually delete userpages without any indication that their owner will contribute.
By the way, are you two the same person? We don’t allow abusing sockpuppets, so if you are I recommend you declare so. — Ungoliant(Falai)08:56, 25 July 2013 (UTC)
Whoah! I’m not accusing anyone, I’m just asking. I noticed the nearly identical signature styles, and you came to ask for a new user’s account to be unblocked. More likely you know each other from another project, I’m just trying to make sure. — Ungoliant(Falai)09:02, 25 July 2013 (UTC)
We're both members of WikiProject U.S. Roads on the English Wikipedia and fellow sysops on Wikidata. We are not sockpuppets. TCN7JM09:04, 25 July 2013 (UTC)
Considering that I am currently an admin on the English Wikipedia, Meta, Wikidata, the English Wikivoyage, and MediaWiki.org, this would be quite damaging to my Wikimedia career, so I hope you're okay with this. And also, I could attempt to deny this, but it seems that you've made up your minds anyway. --Rschen775409:23, 25 July 2013 (UTC)
All I'm saying is before dropping that accusation, or blocking someone over their userpage, be very careful, as it can completely destroy their record on other projects. --Rschen775409:43, 25 July 2013 (UTC)
I agree with Rschen and TCN on this. We seem to give "valuable" editors like Semper too much leeway in dealing with other users, and it's not the first time this has reflected badly on Wiktionary. And the defensive attitude that other people are showing here is really not OK. —CodeCat19:19, 25 July 2013 (UTC)
This thread looks like it's not going to be at all productive, and is probably going to degrade into personal attacks pretty quickly. Caution is advised. --Yair rand (talk) 19:13, 25 July 2013 (UTC)
The block was wrong. It's one thing to set an abuse filter, in response to a bout of spam, to automatically and pre-emptively stop new users from creating userpages (and that doesn't even mean it's a good idea to do that). It's entirely another thing to manually delete the userpage of and block a user who has satisfied the filter's one-edit requirement and who has participated in a discussion about the abuse filter in which, as Rschen7754 notes, the consensus was if anything that blocking new users was a bad idea. It's especially hostile to block a longtime contributor to another wiki, who is obviously not a spammer but a potential new member of our community. I agree with the unblock, and I have disabled Filter 21. I hope the flood of spam has passed, but even if it hasn't, we should counter it by writing a filter that blocks new users from adding links, not one that blocks colleagues from other wikis from contributing here. There are reasons en.Wikt has relatively few contributors, and our rigid structure and hostility to newcomers are among them. As Dan pointed out in a previous discussion, many of us created userpages before contributing to the project. - -sche(discuss)19:27, 25 July 2013 (UTC)
Yes, the block was wrong. Yes, SemperBlotto tends to go overboard in deleting user pages for newcomers. Yes, this discussion is deteriorating into adversarial ugliness. That said, please reconsider the disabling of Filter 21. I've already deleted four spam user pages posted within the past hour, and I have every reason to expect lots more any time now. If we can change the wording to mitigate the effect of the filter on human users, we should, and any way to narrow the scope would be good, too. I should point out, though, that one very prolific spambot type specializes in planting keywords for search engines to see, and never inserts any kind of hyperlink. Chuck Entz (talk) 03:19, 26 July 2013 (UTC)
So that's why the spam is back. Please re-enable the filter that stopped it. A good chunk of my life has been spent on deleting that spam, partly because I'm in UTC+9, so my period of activity each day differs from that of most other admins, and I have to watch out for spam, vandalism, etc. when others are asleep. I'm sure that any good-faith editor will understand a kindly-worded explanation as to why they can't make a user page yet. Since the Babel box is apparently the only thing allowed to newcomers, would it be possible to remove the user page entirely for newbies and replace that with a Babel tab in their preferences? --Haplology (talk) 03:31, 26 July 2013 (UTC)
I've reviewed the edits that Filter 21 had been catching, and created a new filter, Filter 25, that blocks users with ≤10 edits from adding links or certain spammy words to pages. That should stop the spambots; indeed, it has already stopped a few. I'd appreciate it if someone more familiar with the language abuse filters are written in could look over it, though, because it seems to have also blocked a couple of seemingly unrelated edits. - -sche(discuss)03:52, 26 July 2013 (UTC)
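The condition described for Filter 25 can be sketched in ordinary code to show the logic. This is not the filter's actual rule and not AbuseFilter syntax: the link pattern and the word list are placeholders, and `should_block` is a hypothetical name.

```python
import re

MAX_NEW_USER_EDITS = 10                         # "users with <=10 edits"
LINK_PATTERN = re.compile(r"https?://|www\.")   # assumed notion of "a link"
SPAMMY_WORDS = {"casino", "viagra"}             # placeholder list, not the real one

def should_block(edit_count: int, added_text: str) -> bool:
    """True if a new user's edit adds a link or a listed spammy word."""
    if edit_count > MAX_NEW_USER_EDITS:
        return False
    lowered = added_text.lower()
    if LINK_PATTERN.search(lowered):
        return True
    # Compare whole words only, so a listed word embedded in a longer,
    # unrelated word does not trigger the filter.
    words = set(re.findall(r"[a-z]+", lowered))
    return bool(words & SPAMMY_WORDS)
```

Matching on whole words rather than substrings is one possible way to cut down on unrelated edits being caught, though whether that explains the false positives mentioned above is not known from this thread.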
It may be helpful to CU the spambots and see if there are any ranges that can be shut down (for example, server farms/abused webhosts, open proxies, etc.) From what I've heard, there are some ranges that are not globally blocked due to the collateral such a block would have on the English Wikipedia... but here, on a smaller wiki, there may not be any editors on those ranges, so a rangeblock may be feasible. --Rschen775404:57, 26 July 2013 (UTC)
I'm very hurt that people would go and accuse very well-known and respected Wikimedians, especially those who are sysops on other projects of the things that I've seen on this page. You're supposed to be welcoming to new people, not hurl accusations at them and assume bad faith. What the hell ever happened to assume good faith? I know it is hard to do so with certain people, me included, but you've got to understand that there are some editors that are just trying to help, especially a user like Rschen7754. This user is a very nice and active user across multiple projects, and you guys really just give him the cold shoulder. That isn't acceptable on this project, in my opinion. Something should be done about this. Razorflame03:58, 26 July 2013 (UTC)
I too have been subjected to SemperBlotto's abuse of admin tools
Just as a heads up, I was given the same shafting treatment JUST LAST WEEK by SemperBlotto that all the others, even admins from other Wikimedia projects are now complaining to. I was blocked from Wiktionary via my IP address by SemperBlotto, as shown here . As you can see by my IP edits and user account history, my edits have only been constructive on this website. Here's my most recent edit before SemperBlotto's immediate block for "Disruptive Edits": . SemperBlotto had reverted my edit on basis of the fact that "there's a noun in my definition when the term being defined is an adjective." According to his comment on my talk page he made directly prior to blocking me, I'm not allowed to place a noun within the definition if the word being defined is an adjective. I have never heard of this policy before. But it's not like I could even discuss the matter because I was already blocked by the admin before I could even respond.
I actually just randomly looked up the term in question, "unseemly," in Merriam Webster's dictionary as well as several other dictionaries. They have not only included a noun within the definition of this same adjective, but coincidentally enough, the exact same noun as I included in my definition of "unseemly" on Wiktionary, which is "behavior." Again, SemperBlotto criticized me over using a noun within an adjective on my talk page, then instated an immediate block without so much as allowing me a chance to explain myself. By the way, here is Merriam Webster's definition of "unseemly" in which they too include the term "behavior" in their definition: .
But this is really not about the edit in question. I'm utterly confused as to why SemperBlotto blocked me and essentially got away with this even when I made an unblock request. As shown on my IP talk page, my unblock request was never even reviewed or addressed. Just because the admin disagrees with an edit does not give him the right to instate an immediate block.
Given how I was given the shaft by this admin, I was sure that the admin likely had a history of doing this to others. Sure enough, within one click to SemperBlotto's user talk page, I found my answer and was led to this very discussion. I have made a formal complaint about the matter here on Wiktionary's Feedback page. I plan on avoiding this website until I'm confident the website isn't being abused by administrators like SemperBlotto. And apparently this is a "join the club" sort of thing with SemperBlotto's wrongful blocks because he's got multiple individuals complaining to wrongful Wiktionary blocks that he's instated at his Wikipedia talk page. And he himself acknowledges that at least one of the complainants had a legitimate complaint against him, all as shown here . AmericanDad86 (talk) 19:11, 29 July 2013 (UTC)
MW only lists adjectival and adverbial meanings. AFAICS you repeatedly inserted wrong definitions and the short IP block was deserved. You should've taken it to the talk page first. --Ivan Štambuk (talk) 19:48, 29 July 2013 (UTC)
Repeatedly inserted?????!!!!!!!!! I initially made the edit. SemperBlotto then removed it without explaining himself and I reinstated it ONCE. He then reverted again and only then explained himself on my usertalk page before IMMEDIATELY instating a block without letting me respond; this, despite my many other positive contributions to this website. Also, this admin is going to keep abusing his admin tools as long as he has others here at this website condoning it like Ivan and the people in the above discussion who made accusations of sockpuppetry to defend SemperBlotto's misconduct. There are now administrators from other projects complaining about this admin's handling of his tools for God's sakes. What more do you people at this website need before you take action against the abusive treatment of an administrator? Is not the complaint of other administrators enough?! What's worse is the website doesn't even review Unblock requests after behaviors like Semperblotto's, as evidenced in all the complaints SemperBlotto has on his Wikipedia talk page. AmericanDad86 (talk) 19:55, 29 July 2013 (UTC)
If you had taken it to the talk page before reverting you wouldn't get blocked and an explanation would be provided. IP reverting an admin and reinstating wrong definition simply calls for a block. Those admins from other projects are in their own words self-styled "Wikimedia careerists", hat collectors that are more interested in creating fancy user pages than improving their single-digit edit count. --Ivan Štambuk (talk) 20:09, 29 July 2013 (UTC)
Ivan, you have just basically implied that all IPs are to miraculously know who is and who is not an admin. Ok, so let me get this straight: the administrators complaining about this above are shady; I'm an IP who should have miraculously known this was an administrator and deserved an immediate block for my 1 revert; now how is it the fault of all the others who went so far as following SemperBlotto to his Wikipedia page (because Wiktionary didn't review their unblock requests) to formally complain of unjust blocks to him there?! How is it their own fault?! 173.0.254.226 17:23, 30 July 2013 (UTC)
This is standard behavior. You made unacceptable edits to a definition, which were reverted (standard) you then used another account to undo this correct edit and got blocked for it (standard). You're either deliberately misquoting SemperBlotto because it makes your case seem stronger than it really is, or you just haven't read/don't understand his comment. Unseemly cannot mean behavior because then you could say "that's bad unseemly" and you can't. Mglovesfun (talk) 21:01, 29 July 2013 (UTC)
Sigh! Please review the edit in question, Mglovesfun. I didn't write unseemly meant behavior. I said I had the term "behavior" within the definition, as shown here. And this admin blocked me for inclusion of the word "behavior" in the definition because it is a noun. Is there any reason why some of the regulars here are afraid of this admin, SemperBlotto?! Can he block you too or something?! 173.0.254.226 17:27, 30 July 2013 (UTC)
→ The key problem here is that the definition then incorrectly reads as if unseemly == behavior.
SemperBlotto (talk • contribs) reverted those changes here, apparently using the rollback feature, which doesn't leave informative edit summaries.
Anonymous IP user 173.0.254.226 (talk) then reverted Semper's revert here, but also without leaving any informative edit summary.
Semper then reverted the reversion here, blocked the IP for one day only (block log), and left a comment at User_talk:173.0.254.226 when initially creating that page here.
The facts here that concern me are actually not the actions of SemperBlotto, but rather those of AmericanDad86, who is apparently the same person as IP user 173.0.254.226:
Changing user accounts to revert another editor's reversion generally seems underhanded.
Assuming that the change from registered user to anonymous IP was an accident (perhaps due to login timeout), leaving an edit summary is always best practice -- informative edit summaries in this case might have obviated this whole kerfuffle.
If AmericanDad86 was concerned about being blocked, why didn't he log into his registered account again? The block log shows that the IP address was blocked from forming new accounts, but notably not blocked from logging into an existing account, as the Prevent logged-in users from editing from this IP address option was not enabled. (Unless I've misunderstood how blocking works -- in which case, someone please clue me in.)
Looking at the record, I somewhat disagree with Semper's block, but I sympathize with his actions in that situation -- I cannot speak his mind, but in his shoes, I would have been nonplussed at an anon reverting my own good-faith reversion. Given the well-described limits to our maintenance manpower, a one-day block on an IP anon that has gotten up my nose doesn't strike me as too terribly far outside the bounds of reasonable.
Meanwhile, AmericanDad86 seems to be on a tirade of sorts. Why not just log into the registered account, and continue merrily editing away?
Huh?! Seriously! Why do you only disagree with SemperBlotto's block "somewhat" and not "entirely"? This is a blatant abuse of his administrative tools. This creative excuse you've formed to try to make sense of SemperBlotto's instantaneous block is not even why he blocked me. He stated for himself, and I quote, "disruptive edits," nothing about this "abuse of sockpuppets" you're coming up with. Thus far, everyone who's come forth with complaints about this admin's abuse of tools is being accused of sockpuppetry and underhandedness. There was only 1 revert on my part, no different from what SemperBlotto had done, and I'm "underhanded" because I didn't make sure to sign in first?! Believe it or not, I was just passing through and making a quick edit. I didn't say to myself "Hmmm! Let me not sign in. That way, I can pull this, this, this, and this." Honestly, no wonder this administrator SemperBlotto is on an abuse of tools streak; the regulars here fully condone it and will justify it to anyone who dares even question it. I won't be editing here at all, and I will be writing bad reviews of this website all over the internet. 173.0.254.226 16:58, 30 July 2013 (UTC)
Oh by the way, I just realized I wasn't signed in when I made out this last post. I guess that's me being "underhanded" and trying to pull a fast one on the system. Rolls eyes! You regulars need to stop being afraid of this abusing administrator SemperBlotto and actually stand up to him when he's out of line. I'm not quite certain why some of you regulars are scared of this particular admin, but understand that you're pushing people away when you allow SemperBlotto to do whatever the hell he pleases on this website with no consequences. 173.0.254.226 17:02, 30 July 2013 (UTC)
173, re-read your post(s) with special attention to your tone -- you come across as strident and vituperative. I understand that you're frustrated, but if you intend to convince others, it generally works better if you don't start off by being caustic.
FWIW, I'm not afraid of Semper, and I doubt that any of the other admins here are. I merely tried to explain 1) what I can verify as the objective facts, and 2) my own subjective interpretation of events. If you choose to be upset by that, that's certainly your prerogative.
By way of more background, we have lots of vandalism, and lots of well-intentioned-but-incorrect editing, and very few maintainers. Blocking an anon for problematic editing is a common approach to handling maintenance issues, for good or ill. If you'd been logged in when you made your second edit to unseemly, I don't think that Semper would have blocked you. And again, your IP was blocked from making anonymous edits, but your registered account has never been blocked (assuming that you are AmericanDad86; block log). Why don't you simply log in to make your edits? That's the whole point of having a registered account in the first place, no? ‑‑ Eiríkr Útlendi │ Tala við mig 17:35, 30 July 2013 (UTC)
I just wrote most of this page, which is almost wholly formatting guidelines, but feel free to take a look anyway.
The main reason I'm posting this in the BP is that I raised a few questions at Wiktionary talk:About Swahili that I would like community discussion on. Please don't shy away from commenting if you don't know Swahili, although anybody with experience in Bantu languages is especially welcome. —Μετάknowledge discuss/deeds 18:47, 25 July 2013 (UTC)
It's generated automatically when you use the "+" button in an RFV tag to add the entry to WT:RFV, and it (when it's working properly) causes the header to link right to the RFVed section of the page, a requested feature that is especially useful on long pages. What's not to like? How it looks in the edit window of WT:RFV, which is the only place it's visible? In that case, I think removing it would add too much complexity (=the requirement that people find which definition in even a very polysemous, polysectional entry has been RFVed) for too little benefit (=shortening some invisible text that no-one has to type manually anyway). - -sche (discuss) 09:17, 27 July 2013 (UTC)
"a requested feature": any evidence? RFV worked without this for ages without trouble. Your notion of "complexity" is backwards, IMHO. --Dan Polansky (talk) 09:32, 27 July 2013 (UTC)
As the RfV and RfD processes are applied to more and more non-English terms and as even our English terms become more polysemous, it is becoming harder and harder to find even which language section a challenge is against. We could ask RfVers and RfDers to manually insert finding information when it is necessary and not at other times, but I rather doubt that would be complied with, except by the most diligent. Sometimes I and others have inserted the challenged definition line on the RfD and RfV pages, but the practice has not caught on.
Placing the challenged definition on the RFD page is an established practice. Your claim that "the practice has not caught on" is untrue, from what I have seen in RFD and RFV. --Dan Polansky (talk) 11:45, 27 July 2013 (UTC)
I don't see how this practice is "too much complexity for too little benefit" either; neither how it's complex nor how its benefit is little. The only thing that I find slightly annoying is it makes links orange instead of blue, but that I can live with. —Angr 16:42, 27 July 2013 (UTC)
It's a bit like saying 'aa' has twice as many letters as 'a'. True, but in the context of a page the size of WT:RFV, what possible relevance? Mglovesfun (talk) 19:00, 27 July 2013 (UTC)
I think Mg is saying that the amount by which "arbor vitae#rfv-notice--|arbor vitae" is longer than "arbor vitae" is negligible compared to the enormous size of WT:RFV. - -sche (discuss) 00:17, 28 July 2013 (UTC)
Can we at least have the last two minuses removed, resulting in this?
== [[arbor vitae#rfv-notice|arbor vitae]] ==
Or even this?
== [[arbor vitae#rfv|arbor vitae]] ==
I find the last, shortest form least visually disturbing.
As for the responses by MG, I was not speaking of size, but of complexity. What I was not speaking of but should have been is the amount of nonsubstantive markup material the human eye has to parse when looking at the wiki text. --Dan Polansky (talk) 08:53, 28 July 2013 (UTC)
You say the human eye, but so far, in terms of people who've actually complained about it, there's just one: you. Of course you have a right to complain, but it doesn't make sense to rework the system in the face of just one complaint. Mglovesfun (talk) 09:35, 30 July 2013 (UTC)
"Reworking the system" you are referring to is a simple edit to a template that changes "rfv-notice--" to "rfv". Easy as a cinch. I see no "rework" and no "system" involved in the discussion; I see a single template and a simple change to it involved.
I surmise that parsing == [[arbor vitae#rfv-notice--|arbor vitae]] == is a bit harder for any human eye than parsing == [[arbor vitae#rfv|arbor vitae]] ==. If I am wrong about this, editors are likely to disagree with me and the change will not get implemented. The point of your response escapes my understanding. --Dan Polansky (talk) 22:18, 31 July 2013 (UTC)
How to make a language-independent sort key?
{{head}} now generates a custom sort key by removing any hyphens from the beginning of the word, so that suffixes are sorted according to their base letter. This seems like a fairly "safe" change to apply to any word in any language (as {{head}} is multilingual). Are there any other changes that can safely be applied to a page name to make a sort key in any language? Removing any initial apostrophes for example, or other kinds of punctuation? —CodeCat 00:01, 28 July 2013 (UTC)
This is about removing or replacing characters from a page title in a language-independent way. Are there any characters in Chinese or Japanese entry names that should always be changed or removed, and should be in any entry regardless of language? —CodeCat 10:37, 28 July 2013 (UTC)
I've added those to the module as well, and tried to orphan any references to them, but it will take some time to update. Strangely, the Scottish Gaelic module was not used for any Scottish Gaelic templates, but it was used for Irish. —CodeCat 01:49, 29 July 2013 (UTC)
Hmm, that probably means that I forgot to do Scottish Gaelic. A little while back I was doing sort keys... I think roa-jer and roa-grn use fr-utilities too, right? So their sorting ought to be in the module the same way. —Μετάknowledge discuss/deeds 01:59, 29 July 2013 (UTC)
What about hyphens in the middle of a word? I noticed fo-ordaithe removes the hyphen in its sort key; is this something we should apply to all terms in all languages?
As a side note, I created Category:Sort key tracking to gather some statistics about sort key usage. Entries are placed there by format_categories in Module:utilities, if they specify a custom sort key. The entries are then split among the two subcategories depending on whether the automatically-generated sort key equals the provided one. If they are the same, the provided sort key is redundant, whereas if not, it's needed. This can help us with devising sort keys (in Module:languages and the language-independent one in Module:utilities) that cover as many cases as possible. The goal is to empty out the "redundant" category altogether, and to keep the "needed" category as small as possible. —CodeCat 21:38, 5 August 2013 (UTC)
Those categories are really great! I think we should only strip internal hyphens on a case-by-case basis, to be safe about it. For example, Manx needs it, but I'm not sure about some other languages... One thing I don't understand is why Âbréhan is not being catted as redundant, for example. —Μετάknowledge discuss/deeds 03:10, 6 August 2013 (UTC)
It's because the generated sort key is always lowercase, but the provided one has uppercase letters, so they don't match. I should probably change that so that the provided sort key is lowercased as well. I am planning to run a bot over all the "redundant" entries to remove the sort key. —CodeCat 10:46, 6 August 2013 (UTC)
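For readers following the sort-key discussion, here is a minimal Python sketch of the two ideas described above: stripping leading hyphens to build an automatic key, and sorting entries into "redundant" or "needed" by comparing the provided key with the generated one. The function names are hypothetical, and this is not the code in Module:utilities (which is a Lua module); it also assumes the lowercasing fix mentioned in the last comment has been applied to the provided key.

```python
def auto_sort_key(page_name: str) -> str:
    """Hypothetical language-independent key: drop leading hyphens, then lowercase."""
    return page_name.lstrip("-").lower()


def classify_sort_key(page_name: str, provided_key: str) -> str:
    """Return 'redundant' if the provided key adds nothing, otherwise 'needed'."""
    # Lowercase the provided key as well, so that a difference in case alone
    # never makes a key look necessary.
    if provided_key.lower() == auto_sort_key(page_name):
        return "redundant"
    return "needed"


if __name__ == "__main__":
    print(auto_sort_key("-ness"))
    print(classify_sort_key("Berlin", "Berlin"))
    print(classify_sort_key("fo-ordaithe", "foordaithe"))
```

In this sketch internal hyphens are left alone, in line with the suggestion to strip them only case by case, so a key such as "foordaithe" for fo-ordaithe still counts as needed.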
Apostrophe conflict between Yucatec Maya and Mopan Maya
On en.wiktionary the glottal stop is consistently rendered as ⟨’⟩ (curly apostrophe) in Yucatec Maya entries, and consistently rendered as ⟨'⟩ (straight apostrophe) in Mopan Maya entries. Normally this would be OK since it seems to be up to the conventions of each language as to how to render a particular phoneme. In this particular case, however, it creates an awkward situation since the two languages are closely related and often share words. So we have a Mopan Maya entry for ka'an ("sky") and a separate Yucatec Maya entry for ka’an (also "sky"). It's the same word with the same spelling, but on two different pages due to the different apostrophe rendering. I'm not really sure how to handle this. Normally we would create a redirect from one to the other, but in this case both are valid as separate entries for different languages. Using "Alternate form" doesn't seem right as that is typically used for alternate spellings. (And do we really want to list an alternate form for every word in Wiktionary that has an apostrophe?) It seems that the only real solution would be to harmonize apostrophe usage across all Mayan languages.
Does that sound like the correct approach to resolving this?
Generally we use the straight apostrophe for page names, and use alternative displays with the curly apostrophe when linking to them (though few people actually bother doing that.)
The entries being in different pages is not really a problem. Just make sure you add {{also}} to the top of each page linking to the page with the other apostrophe style. — Ungoliant (Falai) 10:01, 28 July 2013 (UTC)
I think the exception to that rule is languages which use ’ as part of their orthography, rather than it being a typographical variant of '. In some cases we use ===See also=== for related terms in other languages. Mglovesfun (talk) 10:05, 28 July 2013 (UTC)
How is this a problem? We have this for other languages as well, like Arabic Ka vs. Persian Ka (and I think Ya too). -- Liliana • 10:22, 28 July 2013 (UTC)
If the apostrophe is really a letter of the language and not just a punctuation mark, we shouldn't be using either ' or ’ but rather ʼ (U+02BC, MODIFIER LETTER APOSTROPHE). Both pages should be moved to kaʼan. —Angr 10:59, 28 July 2013 (UTC)
Yes, move all to use ʼ (as in kaʼan), per Angr. We had the same problem with the Athabaskan languages, where most entries have been made with the straight apostrophe ('), but in Navajo we use ʼ. All of these are the glottal stop and should be represented by the glottal-stop character ʼ (which is an alphabetic letter), not by the punctuation ' or ’. It is a similar problem with the Eskimo-Aleut languages, the Polynesian languages, and other languages that use a similar character for the glottal stop. The Polynesian languages want it turned 180°, so they use ʻ (as in ʻoe).
Using the glottal-stop letter ʼ versus the punctuation ' or ’ (nonletters) affects the way a word may be clicked on, or highlighted: if you click on ka'an (with straight apostrophe), it will not highlight the whole word, since ' is punctuation, not a letter. If you click on kaʼan, the whole word is highlighted, since ʼ is considered a letter and not punctuation. It also affects searches ... if you google kaʼan (with the glottal-stop letter), Google searches for the whole word. But if you google ka'an (with straight or curly apostrophe), the apostrophe is treated like a space and Google searches for ka and an. —Stephen (Talk) 11:18, 5 September 2013 (UTC)
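The letter-versus-punctuation distinction made above can be checked against the Unicode character database. The following Python sketch prints the general category of each apostrophe-like character mentioned in this thread and shows how a simple word-matching regular expression splits ka'an but keeps kaʼan whole. It only illustrates Python's own behaviour; how browsers or search engines tokenize text is not something this snippet can confirm.

```python
import re
import unicodedata

# The four characters discussed in the thread.
APOSTROPHES = {
    "straight": "\u0027",   # '
    "curly": "\u2019",      # ’
    "modifier": "\u02bc",   # ʼ
    "turned": "\u02bb",     # ʻ
}


def describe(ch: str) -> str:
    """Code point, Unicode name and general category (P* = punctuation, L* = letter)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


def words(text: str) -> list[str]:
    """Split text into runs of word characters, as Python's \\w defines them."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    for label, ch in APOSTROPHES.items():
        print(label, describe(ch))
    print(words("ka'an"), words("ka\u2019an"), words("ka\u02bcan"))
```

The straight and curly apostrophes fall in punctuation categories (Po and Pf), while U+02BC and U+02BB are modifier letters (Lm), which is why only the latter stay inside a single word match.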
Italian form-of entries proposal
While looking through a bunch of Italian form-of entries, I've noticed that a lot of them are using antiquated formatting such as bolding the headword underneath the POS header, and including the actual form categories, like Category:Italian verb forms for Italian verb forms, etc. What I propose is that someone makes a program for a bot that would go through these form-of entries and update the formatting so that they are up to date with the current way things are, such as {{head|it|Italian verb forms}} etc., etc.
While I don't think this is a major groundbreaking issue, I believe that having the same entry formatting across all entries is key to a successful Wiktionary. Now, I cannot hope to do this myself since my bot is blocked, but I'm sure that there are some smart people out there that could work something out for these pages :) Comments, suggestions? Razorflame 20:31, 31 July 2013 (UTC)
I can't do it using only regex. I can do it for any time the Italian header is directly followed by the verb header, but I can't code for every possible combination of headers. Mglovesfun (talk) 21:21, 31 July 2013 (UTC)
I guess that's as good a place to start as any :) I don't know how it would work for the adjectives/nouns, but at least the verb forms will be done :) Razorflame 21:24, 31 July 2013 (UTC)
MewBot is capable of splitting the page by language section, so it will only do the Italian sections if told. But I'm afraid that fixing these entries will not be easy because they are still being created - by SemperBlotto. diff was done barely a week ago. —CodeCat 21:29, 31 July 2013 (UTC)
Bot now fixed. The main reason for the various different formats is that I was the first to botify the creation of such forms, and the required format has changed several times over the years. Perhaps we will keep the current format for some time? SemperBlotto (talk) 07:28, 1 August 2013 (UTC)
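As an illustration of the approach discussed in this thread (restricting changes to the Italian section, then replacing a bolded headword that directly follows the verb header), here is a rough Python sketch. It is not MewBot's code or anyone's actual bot; the function name, the regular expressions and the replacement line `{{head|it|verb form}}` are assumptions made for the example, and a real run would also need to handle other headers and the old form categories.

```python
import re

# Level-2 headers such as "==Italian==" (this does not match "===Verb===").
LANG_HEADER = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)

# A bolded headword directly under the verb header, e.g. '''parlo'''.
BOLD_AFTER_VERB = re.compile(r"(===Verb===\n+)'''[^'\n]+'''")


def update_italian_verb_forms(wikitext: str, new_headword_line: str) -> str:
    """Replace bolded verb headwords inside the Italian section only."""
    headers = list(LANG_HEADER.finditer(wikitext))
    for i, header in enumerate(headers):
        if header.group(1).strip() != "Italian":
            continue
        # The section runs until the next language header or the end of the page.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(wikitext)
        section = wikitext[header.end():end]
        section = BOLD_AFTER_VERB.sub(
            lambda m: m.group(1) + new_headword_line, section
        )
        return wikitext[:header.end()] + section + wikitext[end:]
    return wikitext  # no Italian section: leave the page untouched


if __name__ == "__main__":
    page = (
        "==Italian==\n\n===Verb===\n'''parlo'''\n\n# form of parlare\n\n"
        "==Spanish==\n\n===Verb===\n'''parlo'''\n"
    )
    print(update_italian_verb_forms(page, "{{head|it|verb form}}"))
```

Splitting by language first avoids the problem raised above with a single regex: the pattern only has to cope with the headers inside one section, and identically formatted entries in other languages are left alone.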