Hello, you have come here looking for the meaning of the word Wiktionary:Beer parlour/2013/June. In DICTIOUS you will not only get to know all the dictionary meanings for the word Wiktionary:Beer parlour/2013/June, but we will also tell you about its etymology, its characteristics and you will know how to say Wiktionary:Beer parlour/2013/June in singular and plural. Everything you need to know about the word Wiktionary:Beer parlour/2013/June you have here. The definition of the word Wiktionary:Beer parlour/2013/June will help you to be more precise and correct when speaking or writing your texts. Knowing the definition ofWiktionary:Beer parlour/2013/June, as well as those of other words, enriches your vocabulary and provides you with more and better linguistic resources.
As a nine-year Wikipedian and very occasional editor here at Wiktionary, I would like to offer insight into what this community's atmosphere feels like to an outsider. Here are a few thoughts from 42 short hours of interacting with six established contributors (on various pages) with respect to the subject of untranslatable terms:
Content upon which articles, books, research, and most members of the public express interest is useless
Content upon which articles, books, research, and most members of the public express interest is irrelevant
Content pertaining to knowable terms should be omitted because it could be contradicted by unknown information
Content should be omitted if inspired by the popular press
Content should be omitted if it requires substantial effort to produce
The most widely acclaimed translation of a nation's most seminal work is unacceptable as a reference
Scientific meta-analyses consist of subjective opinion and cannot be used as a reference
Two books and an expert's paper aren't to be considered durably published references
Verbal challenges to the veracity of references may not be overcome by verbal verification with one or more native speakers of an obscure language
Verbal challenges to the veracity of references may not be overcome by directly contacting its author to obtain explicit provenance
These observations are not intended to ridicule the associated contributors in any way as individuals – in fact, I hope to work productively with each and every one of them even on this very issue. However, most new contributors to Wiki-projects cannot endure more than two or three blows like these before getting a pretty sour taste in their mouth or just outright leaving forever. — C M B J09:08, 3 June 2013 (UTC)
I think you are from Wikipedia. Having to follow these rules doesn't seem much different from having to follow Wikipedia's rule about "no original research" (which talking to a native speaker would also be, unless it were published and peer-reviewed). Perhaps it's just a matter of understanding why these rules exist, and adjusting to them? Equinox◑09:19, 3 June 2013 (UTC)
No original research is a fine policy, but that was not a point of concern for three very sound reasons: (a) the content was published by multiple sources and veritably by at least one, (b) its criteria for inclusion is lower due to special considerations for rare languages, and (c) past thinking on the matter was that attestation by a native speaker or knowledgeable individual would be considered appropriate. I am more than able to navigate policies foreign to me and I do not believe unfamiliarity or confusion on my part to have been a factor. — C M B J09:40, 3 June 2013 (UTC)
These are very quickly formed opinions and based on comments by individual users rather than the community as a whole. We cannot stop individual users having opinions, nor would we want to. Mglovesfun (talk) 09:21, 3 June 2013 (UTC)
As per above, I am here speaking on good faith and, frankly, I really don't appreciate being accused of trying to break the project's rules for personal reasons, of being ignorant and uninformed after providing this explanation, or being disparaged for identifying with Wikipedia, or being asked to stop talking (while simultaneously being told "we cannot stop individual users having opinions, nor would we want to"), all because I dared to disagree with unsubstantiated claims. These unprovoked and irrational ad hominem attacks are ironically representative of why I felt that it was necessary to go out on a limb and share my experience in the first place. This environment is toxic. — C M B J10:18, 3 June 2013 (UTC)
I think there are two issues going on here, and some of the disagreements may stem from different understandings of what's under discussion. The sources you've cited for these words are fine (at least as far as I'm concerned) for having an entry on words like tingo#Rapa Nui. The sources are perfectly adequate in terms of WT:CFI for less well attested languages. What's more problematic is putting them in the Category:Terms without an English counterpart (or even having that category) because deciding what words do and do not "really" have an English counterpart is highly subjective. At some level of philosophizing, no word in another language has an exact English counterpart because each word in another language will have slight shades of meaning and connotation that the English word doesn't have. I think all of the words listed over at WT:RFD/O#Category:Terms without an English counterpart is worthy of having a Wiktionary entry, provided it's in the correct script with the correct capitalization and as long as confirmation from some published work by a recognized expert in the language (as opposed to the popular press) is provided to confirm the term's existence in the case of less-attested languages, and as long as three cites from durably archived sources are provided in the case of well-attested languages like German. But deciding what goes into the category of "English doesn't have a word for this" is problematic because there are no objective criteria for it. I myself have worked on Sättigungsbeilage, a word with no obvious English translation, and brought it to the attention of word mavens by nominating it for WT:FWOTD, but even I'd be wary of putting it into that category. —Angr10:37, 3 June 2013 (UTC)
I don't wish to intrude upon the rest of the discussion, but I'm reminded of a statement made by my German teacher, that if you really, really think about it ‘bread’ has no exact German translation, because what we think when we hear the word ‘bread’ is very different from what Germans think when they hear the word ‘Brot’. So even though the word ‘Brot’ is a general term, in theory even including even including things like ‘matzo’, when they say they eat bread, they mean they eat German bread without meaning they eat German bread. And that makes ‘Brot’ in some sense untranslatable. (And of course, vice-versa for ‘bread’.)
CMBJ, please don't see disagreeing with you as a personal attack, as we can't agree with you purely so you don't feel attacked. Mglovesfun (talk) 11:29, 3 June 2013 (UTC)
To quote:
…here's a tip. If you don't know what you're talking about, stop talking."
"…become better informed, please!"
"You come across as a Wikipedian trying to get round the rules for his own personal reasons."
"I'd suggest if you have nothing relevant to say, say nothing."
These are not spirited disagreements over the issue at hand. They're wanton personal attacks, and even worse, they're non sequiturs. I am willing to forgive and forget and move beyond them, but I will not for one second tolerate further denigration—especially in a safe place and from an administrator. It's doubly unacceptable. — C M B J12:05, 3 June 2013 (UTC)
I stand by all of that. What's wrong with being better informed or not making ill-informed comments? What exactly do you object to? Mglovesfun (talk) 17:02, 3 June 2013 (UTC)
The insulting tone, the implication that he doesn't know what he's talking about, has nothing relevant to say, and should shut up. I find these statements insulting too and I'm not even the one they're directed at. Any reasonable user would take these as personal attacks. —Angr18:05, 3 June 2013 (UTC)
I wish to associate myself with Angr's comments immediately above, as well. Even though I sometimes have delivered abusive comments, I don't think it is good practice, especially directed at a new contributor who is making good faith efforts to contribute. DCDuringTALK18:25, 3 June 2013 (UTC)
I mostly agree that WT editors and admins, myself included, sometimes come across too tersely or even insultingly, perhaps as we let our frustrations get the better of us.
However, the comment above that "You come across as a Wikipedian trying to get round the rules for his own personal reasons" does not itself strike me as all that accusatory -- it is simply a description of what CMBJ's push-back might be viewed as. Within the greater context of CMBJ's interactions, I can see how CMBJ might interpret it as inflammatory, however.
Online discourse can be difficult. Without all the visual social cues that humans have evolved to give and receive, intent is often hard to discern. -- Eiríkr Útlendi │ Tala við mig18:39, 3 June 2013 (UTC)
Just to be clear here, these comments were not a clumsy exchange of benign text that just came out sounding wrong. Mglovesfun was not even a participant in the associated discussion prior to stating, with candor, that I was insidiously "trying to get round the rules" (what applicable rules?) for my "own personal reasons" (what possible reasons?) and that I should "stop talking" until I "become better informed" (about what subject?). There is also no reconciling "we cannot stop individual users having opinions, nor would we want to" with "I'd suggest if you have nothing relevant to say, say nothing" because they're contradictory advice in principle. Moreover, if dissecting and attempting to calmly refute unsubstantiated claims is perceived as frustrating push-back, then something's very wrong here, because that's how consensus is supposed to be developed.
For what it's worth, I'm thick skinned. I'm not here calling for his de-sysopping. I respect his right to say anything to me, even if accidentally misconstrued or necessarily offensive, and even if in violation of the letter of policy if he genuinely felt it was justifiable for some reason. But that doesn't mean that the other 99 contributors who don't have the nerve to speak up will stick around after such shoddy treatment, which is the focus of this thread. — C M B J04:26, 4 June 2013 (UTC)
Thanks for the suggestion. I seem to have already picked up just about everything it describes, but it would've undoubtedly been helpful to me not that long ago. Maybe an eventual goal should be to automatically display it in a dismissible sitenotice for unified accounts that have >500 Wikipedia edits and <25 Wiktionary edits. — C M B J03:35, 4 June 2013 (UTC)
CMBJ, you came in with your personal project. Many people who join Wiki-projects with broad ideas of how it should be leave frustrated. Your sources do not match up to what we expect for the project, and many of us don't think your new category is useful. Personally, the fact that your signature links back to Wikipedia doesn't inspire me to treat you as other then a Wikipedian tourist.--Prosfilaes (talk) 19:37, 3 June 2013 (UTC)
First of all, this is not my personal project and I did not come here with an ulterior agenda based on broad assumptions of how everything should be. I did, however, come here and find that an expected level of detail was missing in an area that is of particular interest to many readers, and whether that information is most appropriately presented in the form of a category or not is aside the point. The problem here is that, for a new user, participating on this project is painful, and not for reasons that can be explained away as normal responses to personal fault. This is a chronic problem and that is made abundantly clear by the reactions that articulating it has provoked.
Even in your case—and I stress that you're not even involved—the response has been to just further make this about me. The fact that the thought would even cross your mind to view cross-project editors as "tourists" is very telling of the climate here. The fact that you would for some reason consciously treat them differently, and feel comfortable and confident about stating that intention openly amongst peers and moderators—and to justify the behavior of others, no less—is even more telling because these are the very people who should sense the utmost of hospitality and collegiality and support while making their first contributions.
Further to that point, this "we're not Wikipedia" mantra does not resonate with me at all; both projects are funded by the WMF and both projects labor for the same central goal. The fact that their content guidelines differ is not an excuse for abrasive and callous attitudes toward those who are stellar enough to contribute in multiple areas of concentration. — C M B J03:30, 4 June 2013 (UTC)
You can't say I stress that you're not even involved to any editor at this point, because you are making negative accusations about the entire project, i.e. all of us. Strictly regarding the project, if the issue is that the environment is toxic and painful, some users may experience that in isolated cases, but overall I think the editors do their best to be civil and helpful. When they fail, it's because of the limits human nature and of communicating via online forum. That's my opinion. The empirical support would be that 1000 active editors made it through the supposed toxicity somehow. --Haplology (talk) 04:48, 4 June 2013 (UTC)
Actually, yes, I can, and previously did, and will continue to do so, because my perception of this problem is that it is systemic in nature. This is not an unreasonable assertion because attitudes and norms are contagious social factors. In this case, I am already familiar with the complications that you speak of from Wikipedia and other communities, but it is my view that, with respect to this particular community, they are above and beyond what would be considered normal. It is also more than possible for thousands to unknowingly endure such tendencies and then inadvertently and unintentionally perpetuate them forward without ever taking notice. — C M B J09:34, 4 June 2013 (UTC)
I don't view editors who also edit other projects as tourists. I view editors who set their signature to link to another project as tourists. It's an obnoxious habit, and it makes my eyes roll on any project I see it on. And anybody who waves a flag saying "I'm not interested in working with this project" is not really the person to devote extra care on.--Prosfilaes (talk) 04:29, 4 June 2013 (UTC)
"I don't view cross-project editors as tourists, I just view cross-project editors like you as tourists" isn't exactly making the premise any less malicious. For your information, I personally provide a link to my home wiki to centralize my identity and so that others can receive a timely response to messages. I consider this configuration to be of mutual benefit and so I utilize a dirty workaround to make possible what will likely be a standard MediaWiki feature at some point in the future. Regardless, this has yet once again devolved into ignoring the issue while making this about me. — C M B J09:34, 4 June 2013 (UTC)
Granted, Mglovesfun was pretty rude, and your experience overall hasn't exactly been pink horsies and rainbows. Still, you aren't completely blameless, either. Before you came along, the discussion consisted of 5 edits totalling 778 characters. SemperBlotto called it a "useless category", but otherwise the comments centered on practical issues. Pretty mild stuff.
A week later, you decided to weigh in. Ignoring the entire discussion, you set out to educate us about how your category was the only thing preventing us from descending into a morass of error and mediocrity. You started out with "This concept is itself independently notable and such categorization is necessary for the eventual completeness of our project".
From your very first sentence, you set yourself to the task of telling us in absolute terms what Wiktionary has to have in order to be any good at all. Your comment later on is telling: "The fact that their content guidelines differ is not an excuse for abrasive and callous attitudes toward those who are stellar enough to contribute in multiple areas of concentration." No false modesty there. The fact that most of us also contribute to Wikipedia seems to have escaped you.
Except you don’t seem to understand what you’re proposing to change: notability is strictly a Wikipedia concept- our CFI center on usage. What's more, categories aren't content- they're tools for organizing and navigating through the dictionary entries, which are the real content. The completeness of the project has nothing to do with categories.
We do things differently than Wikipedia not because we don't know any better, but because Wiktionary is a dictionary, and Wikipedia is an encyclopedia. Dictionaries are highly structured and concise- we don't go into much detail, because people come to us for very specific types of information, and everything else is clutter. Your category has all the markings of a typical Wikipedia list article, starting with the interesting concept. As I mentioned above, our categories are mostly for organization and navigation- not for telling a story.
You then added almost 50 lines of unnecessary examples regurgitated from popular websites, complete with footnotes/bibliographic references, for a total of 6 edits and 5876 characters- 7 1/2 times the size of the entire discussion- before even starting to address so much as a word of what anyone else had said. I'm pretty verbose, myself, but that’s a lot!
To sum it up: you tried to graft encyclopedic concepts onto a dictionary, jumped into the discussion about it without addressing anything already said, dumped huge amounts of verbiage on us while still missing the point, talked to us like you were introducing civilization to the heathens, and then wondered why everyone got annoyed at you.
What it boils down to, is this: the category that you thought of as the ideal way to dress up this nondescript little backwater of ours was nominated for deletion as useless by the locals. You seem to have taken this as a criticism of your judgment, and have a very strong emotional vested interest in fighting off the challenge. You don't want to hear that, so you've been repeatedly ignoring the issue while making this about Wiktionary. I would say more, but this has grown to almost half the size of your original post... Chuck Entz (talk) 09:52, 4 June 2013 (UTC)
We now have one account across Wikimedia Wikis, and come August, the chance that anyone might believe there's two CMBJs editing on Wikimedia will be removed with the renaming of unified accounts. It's not mutually beneficial; it left me on another wiki when I was trying to check your contributions, I would have had to deal with completely irrelevant material if I wanted to leave a message there, and certain users may not be able to leave a message at all. (I know of at least two major editors on Commons that are blocked on en.WP.)
You keep saying it's not about you, but you were one party in all these discussions. How could we have best informed you that we were deleting the category in all forms? If you can't think of a way, you're saying the members of this Wiki don't have the right to choose what content they find acceptable.--Prosfilaes (talk) 17:46, 4 June 2013 (UTC)
Agreed. I spent so much time trying to come up with a coherent explanation of the problems I saw in all this, that I just ended up tired and grumpy. I take back the negative tone of my comments, but I don't have time or energy to rework everything right now. It will have to stand in its current ugliness until I can rework it and address the real issues I was trying to get across. Chuck Entz (talk) 12:15, 4 June 2013 (UTC)
Individually,
“Granted, Mglovesfun was pretty rude, and your experience overall hasn't exactly been pink horsies and rainbows. Still, you aren't completely blameless, either. Before you came along, the discussion consisted of 5 edits totalling 778 characters. SemperBlotto called it a "useless category", but otherwise the comments centered on practical issues. Pretty mild stuff. A week later, you decided to weigh in.”
The reason that I weighed in a week later on this matter is because no one had the courtesy to notify me of the deletion discussion. I found it accidentally while navigating for unrelated reasons.
”Ignoring the entire discussion, you set out to educate us about how your category was the only thing preventing us from descending into a morass of error and mediocrity. You started out with "This concept is itself independently notable and such categorization is necessary for the eventual completeness of our project". From your very first sentence, you set yourself to the task of telling us in absolute terms what Wiktionary has to have in order to be any good at all.”
I strongly disagree that I ignored this discussion and in fact my original response was intended to address prior concerns (“useless”, “difficult to manage”, “necessarily subjective”) by presenting a cogent case otherwise (“independently notable”, “necessary for completeness”, and as a clarification, “scholarly examples exist”). Moreover, I do believe that this information is necessary for the eventual completeness of this project. I base that view on the observation that many publications have expressed interest in this particular area.
“Your comment later on is telling: "The fact that their content guidelines differ is not an excuse for abrasive and callous attitudes toward those who are stellar enough to contribute in multiple areas of concentration." No false modesty there. The fact that most of us also contribute to Wikipedia seems to have escaped you.”
This was not false modesty and this assertion may very well be the most offensive remark made since this ordeal began. The comment does not refer to my self-image but my view of each and every individual who meets this description, many of whom are truly stellar in every sense of the word.
”Except you don’t seem to understand what you’re proposing to change: notability is strictly a Wikipedia concept- our CFI center on usage. What's more, categories aren't content- they're tools for organizing and navigating through the dictionary entries, which are the real content. The completeness of the project has nothing to do with categories. We do things differently than Wikipedia not because we don't know any better, but because Wiktionary is a dictionary, and Wikipedia is an encyclopedia. Dictionaries are highly structured and concise- we don't go into much detail, because people come to us for very specific types of information, and everything else is clutter. Your category has all the markings of a typical Wikipedia list article, starting with the interesting concept. As I mentioned above, our categories are mostly for organization and navigation- not for telling a story.
The only thing I was/am proposing is that this information—which, again, I believe to be necessary for the project's completion—not be needlessly eradicated. The way it is presented in makes little difference in my mind, so long as it's easily accessible to readers.
”You then added almost 50 lines of unnecessary examples regurgitated from popular websites, complete with footnotes/bibliographic references, for a total of 6 edits and 5876 characters- 7 1/2 times the size of the entire discussion- before even starting to address so much as a word of what anyone else had said. I'm pretty verbose, myself, but that’s a lot!”
These examples were preceded by the question of “what would go in these categories?” and I consider them to have been a decent response. The sources were presented in such a way that would convey their journalistic nature, attributions were provided to avoid plagiarism, and they were formatted in the usual way.
”To sum it up: you tried to graft encyclopedic concepts onto a dictionary, jumped into the discussion about it without addressing anything already said, dumped huge amounts of verbiage on us while still missing the point, talked to us like you were introducing civilization to the heathens, and then wondered why everyone got annoyed at you. What it boils down to, is this: the category that you thought of as the ideal way to dress up this nondescript little backwater of ours was nominated for deletion as useless by the locals. You seem to have taken this as a criticism of your judgment, and have a very strong emotional vested interest in fighting off the challenge. You don't want to hear that, so you've been repeatedly ignoring the issue while making this about Wiktionary. I would say more, but this has grown to almost half the size of your original post.”
No, I simply tried to incorporate popular lexicographical information into Wiktionary. I found that information silently nominated for deletion and sprung into action to help save it. I attempted to address the other participants' concerns individually and have continued to do so as best possible. I did and still do take issue with the unwillingness of multiple participants to address cogent counterarguments.
Again, and as a final note, I want to reiterate and make unequivocal that this thread was not intended to be focused on the RfD. It is not about and was never about me or my opinions here. It is, however, about how toxic this environment feels from the perspective of a new user, which unfortunately has been further echoed by this discussion. — C M B J11:19, 4 June 2013 (UTC)
It seems most people here (especially judging from Mglovesfun's comments) just tried to bash the new contributor instead of taking their time and help him to contribute what he want in the "right" way (which differs in every project); of course because it's the easiest way to deal with new users. The worst part was the comment by the idiot who accused him of being a troll. Chuck Entz is right about the purpose of the category namespace, and CMBJ is also right that this information is necessary and quite useful. The solution here is that these informations should be put in the appendix namespace, as a list, and I think it would become a quite useful one. --Z12:07, 4 June 2013 (UTC)
I hereby admit to being rude and promise to try to do better.
I wholeheartedly accept instances of this gesture as making amends. Additionally, if my own actions led to ill feelings for anyone involved at any point, then I ask forgiveness and offer my commitment to continued cooperation and respect in all efforts that contribute to the advancement of our common mission. — C M B J11:15, 5 June 2013 (UTC)
Silent deletions are really infuriating. Every {rfv, rfd}-ed page should have all of its respective contributors notified on their talk page (perhaps by a bot, it's easily automatable). Or even better - through a notification gadget like the one on Wikipedia. --Ivan Štambuk (talk) 14:07, 4 June 2013 (UTC)
That is apparently part of the basic software and is available in user "Preferences". I find it very useful to track the limited number of pages I watch in WP, Species, Commons, and MediaWiki. DCDuringTALK16:05, 4 June 2013 (UTC)
I imagine we want to convert {{context}} to Lua at some point, so I am wondering what the best way would be. Ruakh made a start with creating a replacement some time ago, {{label}}. It's used on a few pages but it's template-based and uses subtemplates instead of "raw" templates. One of the advantages of using subtemplates is that it eliminates any conflicts between context labels and other templates. {{context}} would use any template that had the same name as the label, which often causes problems (the recent issue with {{abbreviation}} is one example). On the other hand, because {{label}} doesn't use the "bare" template as the label, it's not possible to write something like {{intransitive}} by itself, you'd need {{label|intransitive}} instead.
The most straightforward way to convert these to Lua is probably something like Module:languages, with a single data module containing all the information for the context labels, and a separate module to handle the processing and display. I think that the approach used by {{label}}, in which labels always need {{context}} or {{label}} prefixed, is preferred for a Lua implementation. It would drastically reduce the number of context templates we need to maintain, it would remove any conflicts between labels and other templates with the same name ({{plural}} for example!), and it would also prevent any desynchronisation between the templates and the module. For example, if someone creates a new context label, they'd need to remember to also create a matching template, which would not really add much value to the system and just be there for convenience. It makes more sense to not create those templates in the first place and to always require the same template to "initiate" the process. Another advantage is that bots, if they want to parse entries, no longer need a long list of which templates can possibly be used as context labels, because there'd only be one.
Another change I would like to make while we're at it, is to use the first parameter to specify the language code. It's common for editors to forget to specify the lang= attribute because they're not aware that some context labels add categories. The problem is compounded by the fact that only some labels categorise while others do not so editors need to remember this for every label. {{intransitive}} does not categorise for example, so it's easy to miss this and, when you want to add a second label like {{rare}} (which does categorise), to forget the language. I believe that requiring the language as the first parameter will help with these problems because then it can never be forgotten or skipped, so it makes editors more aware that they need to put something there and that they need to change it when copying content to another language.
I support everything. My only suggestion is that if we are going to make {{context}} obligatory, we should create a shorthand, like {{x}} or something, redirecting to it. — Ungoliant(Falai)19:23, 3 June 2013 (UTC)
That’s a language code, and {{c}} is a grammatical label. What about {{ct}} or {{ctx}}? (But let’s not forget that any 2 or 3 letter template is a timebomb waiting for ISO to release it as a language code.) — Ungoliant(Falai)19:31, 3 June 2013 (UTC)
Well, we're phasing out the language templates, so we don't have to worry about that. And {{c}} may also be phased out if we decide to do so, since we now have a module to replace it. We could also decide to use Ruakh's {{label}} instead, which is a bit shorter. —CodeCat19:34, 3 June 2013 (UTC)
In that case, I state my preference for {{c}}, since that’s the smallest increase possible in the amount of characters one will need to type. — Ungoliant(Falai)19:40, 3 June 2013 (UTC)
Sounds good, but let’s abandon the misleading name “context.” Labels like {{pejorative}}, {{plurale tantum}} and {{abbreviation}} are nothing to do with context.
Context means two different things in lexicography. One is the context a word appears in in its citation, esp. in corpus lexicography. The other is in something called discourse analysis, and seems to be only vaguely related to usage as we consider it here.
We are using this template for both usage and grammatical labels, or tags. —MichaelZ. 2013-06-03 20:34 z
We shouldn't be wedded to the somewhat misleading name "context" as we use the beginning-of-the-definition-line position for many things, including topical labels, sense-specific complement information, semantic-grammatical classification (eg, intensifer, modal adverb), as well as register and regional and other context.
One thing that might be very helpful in the long run would be to build in support for various default types of display for various types of tags. One useful thing would be to differentiate topic from usage context typographically. Another would be to allow semantic-grammatical tags to be non-displaying by default. This might also be useful for maintenance-related tags. I suppose such things could be done using CSS to make it easier to users to use common.css to customize display of such tags. DCDuringTALK20:40, 3 June 2013 (UTC)
With all of the labels codified in a Lua table, it should be easier to categorize the labelled entries, as well as to inject CSS classes. Perhaps something like class="label-subject-history" or class="label-grammar-intensifier", so CSS can be used to style or hide individual labels, general classes, or all of them. —MichaelZ. 2013-06-03 22:22 z
We have 935 labels that might want to have their own CSS class or ID. For myself, I would rather be selecting groups, if at all possible. For some types I would think that we not need individual CSS classes. I take it that CSS does not allow one to select members of a class equal to specific text. DCDuringTALK01:32, 4 June 2013 (UTC)
Are you asking whether CSS can select and style based on the text of the content? No. But if we are putting that text into the page, then there’s practically no overhead in also putting it into the class attribute.
Actually, using simple class selectors would require separate classes for the levels of categorization, as class="label label-subject label-subject-history", allowing one to style all labels, or labels in a category, or a specific label. Leaving out the individual label class would save a tiny bit of overhead in loading time and page weight, but I guess it would be insignificant.
If we used onlyclass="label-subject-history", then we could use a substring selector in modern browsers (MSIE 7+), as in * {. . .}, as long as we made sure that *label* didn’t appear in any unrelated classes. —MichaelZ. 2013-06-04 16:28 z
Template:label's subpages all have a common piece of code, which I don't like. It's harder to maintain, if one decides to perform any change then a lot of pages need to be changed. I support the proposed change. --Z21:09, 3 June 2013 (UTC)
Both {{label}} and {{context}} have the problem that they don't properly separate code and data. {{context}} is impossible to modify thoroughly for that reason, but I don't know how much better {{label}} is. —CodeCat21:11, 3 June 2013 (UTC)
Although I do think there's a risk of going too far with Lua, {{context}} is one template that absolutely should be Luacized, for performance, for readability, and for correctness. (The demo that I created, and that Liliana-60 copied illegally to {{label}}, is an improvement over {{context}} in all three respects, but a Lua module would be a much greater improvement. I would never have created that demo if I had known that we'd get Lua so soon.) —RuakhTALK06:44, 4 June 2013 (UTC)
I believe that anything we save in our user space is released under the open licences. It can be republished freely, but does requires attribution, e.g., linking to the source in an edit summary. Caveat: I might be wrong. —MichaelZ. 2013-06-04 16:34 z
Exactly. Liliana-60 has a habit of ignoring the attribution requirement, and refuses to acknowledge that it's a problem. Frankly, I don't see how we can keep an administrator who insists on violating copyright, but wev. —RuakhTALK17:46, 4 June 2013 (UTC)
Then if nobody minds, I will convert the few uses of {{label}} that are still present back to {{context}}, so that we can work on it and eventually convert {{context}} to the new Lua-powered {{label}} altogether. —CodeCat11:52, 4 June 2013 (UTC)
I've been working on adding an explicit call to {{context}} to the labels, but there are a lot of them (160 thousand...) so it will take some time even with a bot. The progress is at Category:Context label called directly. I noticed that quite a few pages misuse the templates by using them as something other than a context (like where {{qualifier}} whould be better). But I have also realised that there is a more fundamental problem with some of the labels we need to address. Labels can have different "scopes" so to say: it can be used to specify a topic, it can indicate restricted usage (by field, place), and so on. Currently, the labels are just names and do not distinguish between these types, but there can be some ambiguity in quite a lot of cases. For example, it could be desirable to use a label to restrict a term to the topic of a particular country, but all of our country labels are currently used for restricted usage (that is, dialectisms), so this is not possible. If you write {{context|Britain}} then the term is assumed to be a Britishism, even when you really want it to mean that the term pertains to Britain. So you may get something like Category:British Dutch when you really wanted Category:nl:Britain. I'm not really sure how to solve this currently, but I do think it's important. —CodeCat15:51, 6 June 2013 (UTC)
Can you give an example? I’m not sure how the topic of a particular country is a usage, but it might be used only the academic field of British studies, or have a special meaning when speaking about Britain, or only when referring to a sense of a thing that is in Britain (although the last is properly part of a definition and not a usage). This kind of usage categorization is problematic, because many editors start to categorize things with them rather than terms (like animal was being applied to names of animals). We once had label {{London}} and category:London, but got rid of them because they were just labelling the names of things in London. —MichaelZ. 2013-06-10 15:42 z
I think bush would be an example. It has a meaning that originated in Australia, but is now used worldwide in that sense. Nevertheless, the word doesn't refer to that same thing outside the context of Australia, so only when speaking of Australia it has that specific meaning. —CodeCat15:49, 10 June 2013 (UTC)
The definition already says “area of Australia,” so it is clear what thing is the referent. This is not usage.
You could refine the usage label as {{chiefly|_|Australian}} or {{originally|_|Australian}} to indicate that it is not only used in Australian English.
(The Canadian and Australian usages look identical to me, except the Canadian is not widely popularized in phrases like bush tucker.) —MichaelZ. 2013-06-10 16:13 z
Of course, the lexicographer could also account for nuance: an Australian in Asia may refer to the local countryside as the bush. This could be analyzed as the Australian/Canadian sense of bush meaning “countryside,” and the global sense meaning “Australian countryside.” —MichaelZ. 2013-06-10 16:18 z
I would say yes, with care. {{enzyme}} is used to label maltase with biochemistry, but the term is not restricted to biochem. {{organic compound}} labels the widely-known substances like amyl nitrate, ethanol and lactic acid, etc as technical terms restricted to the field of organic chemistry and put them into the non-lexicographical category:en:Organic compounds (why do we need to distract readers from the much better w:en:Category:Organic compounds?). Shortcut templates like these just encourage editors to categorize referents instead of labelling usage. A label should be explicitly what it is, so editors can understand what it is and does. Mzajac
Maybe what we really want is a template that works parallel to context but is explicitly intended for gloss tags instead, and placed after the word. So that something like {{gloss|organic compound}} will also categorize. —CodeCat17:54, 10 June 2013 (UTC)
Yes, templates like those should be deleted. Some have already been deleted, or are orphans. (But some people like them, as I recall from one contentious RFDO...) - -sche(discuss)17:56, 10 June 2013 (UTC)
For the record, I have a lot of doubts about bot edits in which "vulgar" is replaced with "context|vulgar". Making these edits from a thread entitled "Lua-cising Template:context" seems rather inadvisable, to say the least. I am disappointed. --Dan Polansky (talk) 18:10, 7 June 2013 (UTC)
Trademark discussion
Hi, apologies for posting this in English, but I wanted to alert your community to a discussion on Meta about potential changes to the Wikimedia Trademark Policy. Please translate this statement if you can. We hope that you will all participate in the discussion; we also welcome translations of the legal team’s statement into as many languages as possible and encourage you to voice your thoughts there. Please see the Trademark practices discussion (on Meta-Wiki) for more information. Thank you! --Mdennis (WMF) (talk)
Universal Language Selector to replace Narayam and WebFonts extensions
On June 11, 2013, the Universal Language Selector (ULS) will replace the features of Mediawiki extensions Narayam and WebFonts. The ULS provides a flexible way of configuring and delivering language settings like interface language, fonts, and input methods (keyboard mappings).
This seems to be breaking font specification. See Talk:Fraktur, where the specified font in the Fraktur sample only appears up if JavaScript is disabled, in Safari/Mac 6.0.5, Firefox/Mac 21.0, and Chrome 27.0. To sum up:
Keinen Unparteiiſchen wird der Einwand ungläubiger Theologen: wenn es Typen geben ſolle, ſo müsse ihre Abſicht von den Zeitgenoſſen ſchon erkannt worden ſeyn, ſonderlich beunruhigen können.
Keinen Unparteiiſchen wird der Einwand ungläubiger Theologen: wenn es Typen geben ſolle, ſo müsse ihre Abſicht von den Zeitgenoſſen ſchon erkannt worden ſeyn, ſonderlich beunruhigen können.
Wasn't the point of WebFonts that the user didn't have to have the font installed in order to have it render correctly for him? Because what you have written above after "This works:" is perfectly legible for me, but doesn't appear in Fraktur. Maybe if I tracked down and installed UnifrakturMaguntia on my computer it would, but doesn't that defeat the purpose? —Angr19:16, 14 June 2013 (UTC)
I don’t know the precise point of the ULS, but in this case it steals control. If its point is readability, then it should add fallback fonts, not override the editor’s choices. It isn’t a case of the user didn’t have to have the font installed, it’s the user may as well not have it.
It is also not as smart as it should be, because it is stupid about script tags. If I correctly tag the language-script as German Fraktur with de-Latf, it still prevents correct rendering. —MichaelZ. 2013-06-14 20:59 zAlso drives me fucking nuts by making a stupid keyboard icon pop on and off constantly while I type in this edit field. And the pop-up menu is garbled in Safari, but at least it can be turned off by a preference. —MichaelZ. 2013-06-21 19:23 z
Keinen Unparteiiſchen wird der Einwand ungläubiger Theologen: wenn es Typen geben ſolle, ſo müsste ihre Abſicht von den Zeitgenoſſen ſchon erkannt worden ſeyn, ſonderlich beunruhigen können.
UnifrakturMaguntia or UnifrakturCook, or some other version? Are you able to check your Unifraktur font’s font name? (On the Mac, open your Font Book.app, select the font, get info, and see what it says under Family, I think.) —MichaelZ. 2013-06-14 20:59 z
I just installed UnifrakturMaguntia and the first text above does now display in Fraktur for me. Don't like the capital U though; it looks wrong. (And müßte/müſſte is spelled wrong.) —Angr21:26, 14 June 2013 (UTC)
I corrected müste → müßte.
Now this is interesting: the first example above now renders in a Fraktur font on an iPad/iOS 5! I am certain that it did not when I started this thread. I assume that the ULS extension matches font names from Google Fonts and loads those? Too bad it barfs on language codes. —MichaelZ. 2013-06-21 16:39 z Interesting because the iPad doesn’t have installable fonts —MichaelZ. 2013-06-21 19:23 z
I found the original quote on b.g.c.: it's müsse, and seyn rather than sehn. The sentence still makes no sense to me, though. Just figured it out. —Angr18:51, 21 June 2013 (UTC)
Curiously, the first example also displays in Fraktur for me now (in Windows), possibly because I re-installed UnifrakturMaguntia. - -sche(discuss)18:49, 21 June 2013 (UTC)
Okay, I have disabled the Unifraktur fonts on my Mac, and my first example above appears to be displayed in UnifrakturMaguntia. Anyone else seeing this?
Just to diagnose this, here is a sample that prefers UnifrakturCook to UnifrakturMaguntia.It should still use one of these fonts if either is available. but it is failing for me. Perhaps the ULS only looks at the first-choice font. It didn’t work on preview, but now works on both Mac and iOS, showing the second-choice UnifrakturMaguntia face.
Where the heck are the docs for ULS? —MichaelZ. 2013-06-21 19:32 z
Keinen Unparteiiſchen wird der Einwand ungläubiger Theologen: wenn es Typen geben ſolle, ſo müsse ihre Abſicht von den Zeitgenoſſen ſchon erkannt worden ſeyn, ſonderlich beunruhigen können.
Format of articles, why not put the definition in a lede at the top?
Take a look at toches as just one example, wouldn't that article be a lot more useful (and nicer) if right at the top of the page there was a lede that gave the definition? As it is, the definition is dead last to many various and sundry less important things. I'm sure this question has come up, but I did some looking and couldn't begin to find it. 108.54.62.15517:30, 5 June 2013 (UTC)
It comes up on WT:FEED quite a lot. There's an argument for it, one argument against it is when say 'the definition', many words have more than one definition, some words have dozens of definitions. Since we're a multilingual dictionary we do need to put what the language is. The definition of lit changes a lot depending on what language you're speaking. Inflection is quite important but could I suppose go after the definitions, as could just about everything else. I've seen at least one suggestion to use the ===Definitions=== headers which I quite like. I suppose, one factor that shouldn't be underestimated is how users will get used to the format if they use Wiktionary enough. It's pretty simple; I'd've thought most people can learn it in a matter of minutes. Mglovesfun (talk) 17:43, 5 June 2013 (UTC)
It would be a good idea. If we were able to separate data from presentation (the two concepts are currently hideously intertwined here), we could easily generate custom layouts with javascript. Until then any change would be counterproductive. DTLHS (talk) 18:03, 5 June 2013 (UTC)
But remember that in most documents, certainly in web pages, the text is serialized – it has an inherent order. This order will manifest itself in many contexts – excerpts, search results, mobile view, in non-visual browsers including screen readers and braille readers, when our data is reused elsewhere, etc.
We could expand this to a discussion about whether an entry is a page or a database. —MichaelZ. 2013-06-06 17:11 z
As things currently stand, we've got data that's supposed to have a specific structure, but that structure is enforced manually by humans (and by bots), rather than by the system itself. This is rather horrible, in a number of different ways -- it's terribly inefficient, it's error prone, it unavoidably mixes data and presentation in ways that have been recognized to be huge no-nos, and it's very labor intensive. Building a database by hand is not the best way to go about it. ;)
In a couple of my jobs now, I've had occasion to poke around looking at various terminology management apps. One that was quite interesting was TermWiki, which (as best I understand it) is mostly MediaWiki with the semantic extensions added in (http://semantic-mediawiki.org/, itself relying much upon mw:Extension:Semantic_Forms), one or two other openly-available extensions, and some custom whizbang. I have no idea how much pull or push we have with regard to how the Wiktionary back-end is set up, but Semantic MediaWiki is high on my wish list for this site.
If any MediaWiki extensions are entirely off the table, I think we should explore building tools ourselves to emulate that kind of automated structure building and integrity management. Why should I be expected to remember all the wrinkles of WT:ELE? That kind of structure is exactly what a database provides, automatically. Users shouldn't have to even be aware of this; it should just happen. We would avoid a huge class of entry maintenance problems if we could do this. -- Eiríkr Útlendi │ Tala við mig17:31, 6 June 2013 (UTC)
Basically I agree. On the other hand you could look at it though as being similar to Wikipedia pages about people: they start out with the person's childhood, even though that is often not really what most people care about. Yet it's natural to begin at the beginning. The etymology is sort of a description of how the word grew up, if you will. --Haplology (talk) 15:32, 6 June 2013 (UTC)
Wikipedia's articles about people start with a summary of one's life and important achievements. It is strange to talk about the etymology of a word for which no definition has been given yet. I've also never understood why pronunciations are placed before the definitions. On fr.wikt we've moved the pronunciation some time ago, but the etymology remains at the top, too (but in our case we still have a general section for all homographs, which is both good and bad). Dakdada (talk) 16:16, 6 June 2013 (UTC)
I think the original reason for putting the etymology at the top was to distinguish words with different etymologies. I'm not really sure that is the best solution, though. -- Liliana•16:23, 6 June 2013 (UTC)
Also print dictionaries traditionally put etymology and pronunciation before the definition, so our order matches what you might see in one of them:
horse/hɔrs/n. 1. A hoofed mammal, Equus ferus caballus, often used throughout history for riding and draft work. 2. A piece of gymnastics equipment with a body on two or four legs, approximately four feet high with two handles on top.
However, all the print dictionaries I currently have to hand do in fact put their etymologies at the end of the entry, not at the beginning. They do all put the pronunciation first, though. Instead of ===Etymology 1===, ===Etymology 2===, and the like, maybe we could find some other heading like ===Word 1=== or ===Form 1=== or ===Lexeme 1=== to use instead. One thing that's bothered me about separating by etymology is that people are often tempted to separate parts of speech this way, for example listing the noun house as ===Etymology 1=== and saying it comes from Old English hūs, and the verb house as ===Etymology 2=== and saying it comes from Old English hūsian. Which is strictly speaking true, but not the way I think we should be using those headings. —Angr17:08, 6 June 2013 (UTC)
In Japanese, that (breaking an entry down by each etym) does actually seem to be the best organization -- sometimes you have a single "term" as written, but it's actually umpteen different "terms" as spoken, and each has its own etymology. Lumping all of those different readings and definitions together and then trying to explain the etymologies of each after the fact would be horribly confusing. C.f. 愛#Japanese, 大人#Japanese (multiple readings shown, but the entry still needs etyms & pronunciations, etc.), 目#Japanese, and so forth.
One suggestion would be to amend the CSS or JS to make etymology sections auto-collapse, so users only see the text if they want to (by either clicking or by customizing their CSS/JS). Just the etym text itself, not including the subsections thereunder. Inspecting the rendered page shows that the etym headers are <h3> elements, generally followed by a series of <p> elements until the next header. I know that some of our subsections use collapsible divs, like in {{der-top}}; I have no real idea how difficult it would be to auto-collapse a series of <p> elements based on their relative position in the page. -- Eiríkr Útlendi │ Tala við mig17:18, 6 June 2013 (UTC)
I have always thought that XML would be more suited to making a dictionary, because it more strictly separates formatting and structure. It also has the advantage of being able to validate pages and reject them if they are invalid. That would allow the software itself to check the formatting, and add categories for missing inflections and so on. —CodeCat17:40, 6 June 2013 (UTC)
That or JSON. Anyway, some things that would have to happen: 1. write a parser / validator (in Lua?) 2. create a new editing interface (javascript) 3. figure out how to make it work with our existing template, Lua and javascript infrastructure. 1 and 2 are easy enough but I'm not sure how 3 would work (can you call templates from a string in Lua?). DTLHS (talk) 19:51, 6 June 2013 (UTC)
Writing our own parser seems a bit pointless because we would want to make use of something like XML schema for validation, and XSL as well to transform the data into HTML. —CodeCat19:56, 6 June 2013 (UTC)
Yes you're right. How do you envision something like this working with our existing infrastructure? Or would we have to scrap everything. DTLHS (talk) 20:10, 6 June 2013 (UTC)
Just write a new mw:ContentHandler (though this would probably require some coöperation from the WMF…). As for scrapping everything, I would implement migration this way: first split pages into per-language elements containing blocks of wiki markup, and then successively refine the markup schema to capture more of the entry structure, and rewrite pages into the new schema. With current quite consistent usage of templates, I think Wiktionary markup is already quite machine-readable, so actually bots could be doing the conversion. And when they fail, they could just leave wiki markup in place as a fallback. Keφr20:28, 6 June 2013 (UTC)
Something like that would work, yes. One effect of having a strict separation of content and presentation is that we would need to split our templates and modules in the same way. A template or module could, under this scheme, either generate content or display it, but not both. This would have quite a few consequences that we would need to work out. Inflection tables are probably the easiest to do, but other things will need more thought. —CodeCat20:41, 6 June 2013 (UTC)
@CodeCat, we already have the database, no? Why recreate it using XML? And if we're about to embark on anything that "would probably require some coöperation from the WMF", shouldn't we first look at things like Semantic MediaWiki, given that they've already done a lot of the work? I know some folks like reinventing the wheel for the thrill of learning how it was done, but I'm more interested in having a back-end that works well, and sooner rather than later. ;) -- Eiríkr Útlendi │ Tala við mig21:39, 6 June 2013 (UTC)
XML and a relational database are very different things. Databases are meant for representing raw data without structure, whereas we definitely do want structure. XML is also much easier for people to edit because it's human-readable text, whereas for a database we'd need to make a whole interface as well. —CodeCat21:50, 6 June 2013 (UTC)
Really, I think we need to look at what's already out there before embarking on anything this major. What you're proposing sounds to me an awful lot like something that's already been done. To wit:
We already have the MW database. Readily available extensions already provide much of the functionality required to ensure structural consistency and integrity. See above about the semantic and form extensions.
Wikitext is also human-readable, and it's much less verbose than XML, and it's already supported.
Those same readily available extensions also already supply an interface.
Jumping right into creating a whole huge infrastructure for reworking everything about Wiktionary to use XML and then reworking everything about the UI to deal with that XML, all pretty much from scratch, strikes me as potentially foolhardy. Again, I'm sitting here looking at something that looks an awful lot like a wheel, and that already exists. And then I hear you talking about plans to invent one. :-/
Note, I'm honestly not trying to be obstructionist, I'm really just trying to make sure that we proceed with our feet on the ground, and in possession of all the relevant facts. -- Eiríkr Útlendi │ Tala við mig22:17, 6 June 2013 (UTC)
Why would we need a whole new UI? My intention is that when you click edit, you are served with XML-ified page content instead of wikitext, and you edit it in that form. Then you save, and saving will validate the page and if it's valid, it's done. The editing process itself would not change at all. The only thing that would change is that there is an XML validation step followed by an XSL-driven pre-parser which converts the XML into wikitext (or directly to HTML cache). So from the wiki's point of view we would still store things in pages like we do before, only the source code of those pages would be the more data-oriented XML instead of the current presentation-oriented wikitext. —CodeCat23:29, 6 June 2013 (UTC)
I'm confused -- it sounds like you're suggesting that editors would still have to remember everything about WT:ELE, and now a lot of stuff about XML, and would still be editing the raw code -- the only addition you describe that seems to make sense is the validation.
Again, I think Semantic Wiki does everything good that you describe (enforcing structure, validating data, etc.), only without the ugliness -- data is entered using forms, not raw XML.
Besides which, doesn't MediaWiki filter out or disallow certain kinds of markup in the raw wikitext? Or does that only apply to a small subset of HTML, like <a> tags? -- Eiríkr Útlendi │ Tala við mig23:36, 6 June 2013 (UTC)
No, the whole point is that we can ignore ELE because of the XSL processing step. The XSL style sheet will determine which parts of the XML tree go where, so it will rearrange the source to match whatever we like. The order of elements in the source would be dictated by the XML schema but it would not affect the end result. So you could put translations first or last in the source, or even reorder all the languages, and they'd still show up the same on the page thanks to XSL. —CodeCat23:42, 6 June 2013 (UTC)
Still, 1) editing raw XML == yuck. No thank you. 2) And we couldn't ignore WT:ELE entirely, as we'd still have to be aware of what kinds of content were allowed. 3) Why reinvent the wheel? -- Eiríkr Útlendi │ Tala við mig23:52, 6 June 2013 (UTC)
I'm not trying to reinvent the wheel. It's just how I would envision a purely source-based dictionary wiki like we do now. Of course if we drop source editing altogether we can change a lot of things, but I feel that that would not actually be very practical in many situations. Being able to copy parts of a page in one go has its advantages. —CodeCat23:55, 6 June 2013 (UTC)
Yes, copying chunks is quite useful. I believe that's still possible using Semantic MediaWiki; I just popped over there, and you can still get at the raw source just fine. Their page on wiki/Semantic_Forms describes some more of what I was thinking -- directing user input in ways that hide away the minutiae of proper wikitext. -- Eiríkr Útlendi │ Tala við mig00:14, 7 June 2013 (UTC)
We don't really have very structured data at this point. We really should be using things like {{senseid}} and (hopefully improved versions of) {{etymtree}}/{{findetym}}. Switching straight over to XML or anything else to far from our current format is probably not going to happen, at least not any time soon, for a number of fairly obvious reasons. That shouldn't stop us from making more usable data, or dealing with the main topic at hand, that the definition is too hard to find. We could put cognates in a collapsible box, preferably one much smaller than the normal full-sized 100%-width large-padding boxes that translation tables use. We could make the pronunciation section more compact. We could shrink down the ginormous headers and inflection lines and the space between them. All sorts of things would help. --Yair rand (talk) 00:40, 7 June 2013 (UTC)
Thanks for mentioning {{senseid}}, I didn't know about that feature. That's exactly the type of thing I've been thinking WT ought to have. I tried it just now with ハウス. --Haplology (talk) 06:13, 7 June 2013 (UTC)
Thank you Yair, yes, back to the main topic :) -- (rabbit hole well and duly fallen into for the day) -- in addition to the visual reworking of an entry page's style, which should be easy enough to do, another of the suggestions above that we could implement fairly easily would be to use a ===Definitions=== header, clearly pointing the user to the defs. -- Eiríkr Útlendi │ Tala við mig05:03, 7 June 2013 (UTC)
Definitons are put in PoS sections which use level >3 headings, so it should be a level >4 heading. I'm not agree with this idea though, we are already over-sectionizing informations, which works for bigger entries but just causes problems in most other cases. For example, having a separate section for pronunciation is a good idea for some languages, but doesn't work for most other ones (compare ܟܣܐ with this version). --Z09:32, 7 June 2013 (UTC)
@Z, part of the issue with Syriac (and presumably other Semitic languages that don't mark vowels) is that one written form can have multiple spoken forms. Japanese does this too, only in different ways. I've been treating each reading (i.e. spoken form) as a separate etymology, since, frankly, each spoken form has its own derivation, which users may want to know about (I do myself, which is partly why I'm happy to dig up the whys and wherefores and write it all down here). Have a look at 愛#Japanese for one example -- this single written form has three spoken forms (that I know about) that are regular words, each with its own derivation, and then tons of exceptional uses in names. 目#Japanese has four spoken forms, 盆#Japanese has three, and so forth. Hopefully food for thought, at any rate. :) -- Eiríkr Útlendi │ Tala við mig15:36, 7 June 2013 (UTC)
Separating readings by etymologies sometimes results in repeating definitions word-for-word such as 旋風 where we have exactly the same definition all of four times. Meanwhile maybe only one of them is common and the others are freaks that only live in dictionaries, and the only note to this effect is hidden in a Usage Note. Sometimes I use {{ja-altread}} to avoid this, but it's cheating. --Haplology (talk) 16:49, 8 June 2013 (UTC)
That particular entry could certainly use some cleanup, expansion, and clarification (I'll add it to my list).
In general, though, Japanese presents a bit of an odd case. For folks not familiar with the mechanical issues of Japanese, imagine that eat, consume, and ingest all had their individual pronunciations, etymological derivations, and meanings -- but all shared the same single single spelling. Consequently, all would go under the same headword here at Wiktionary. So the question then becomes how to organize that entry.
Duped defs in and of themselves I don't think are necessarily a problem; if the overlap is complete, a simple "see " could suffice. If the readings are really distinct, I think the etyms should be broken out, with proper etymological descriptions given. If the readings are only slightly different, and this produces no difference in meaning, I think {{ja-altread}} is probably the way to go. See 伊邪那岐 for one such example -- this can be read as Izanagi, or Izanaki, with no real change in meaning or derivation. The difference is explained by a simple sound shift, mentioned in the etym. Meanwhile, 紫苑 is a different example, where each of the three different readings has the same def for the proper noun, but other aspects are different -- the Shien reading is never used for the common noun, and shioni is never used in the derived term.
The combination of the MW back end, the need to accommodate multiple languages per headword, and the oddities of Japanese orthography all lead to a bit of an inelegant matching, but there you have it. :) -- Eiríkr Útlendi │ Tala við mig18:51, 8 June 2013 (UTC)
This is a tangent, but one example of semi-synonymous English homographs that comes to my mind (though it isn't a very good example) is board+board. Amusingly, en.Wikt currently conflates the two separate but overlapping etymologies and groups of senses which that string of letters has, but if you can read German, you can take a look at how I handled them on de.Wikt. - -sche(discuss)19:25, 8 June 2013 (UTC)
That's interesting (though I struggle to follow German). There are indeed two separate origins, but I'm not convinced that such a clear case can be made for separation of modern meanings since the two words had already become conflated in Old English. Dbfirs12:02, 10 June 2013 (UTC)
There should at least be a note somewhere explaining how "a length of a piece of wood" came to mean "food". As the board entry currently stands, that development is completely unclear, probably leaving any reader quite puzzled. -- Eiríkr Útlendi │ Tala við mig17:54, 10 June 2013 (UTC)
Make Template:alternative form of explicitly say that a US/UK spelling's definitions are found in the other entry
Recently, I noticed several edits like this, where a second link to sense of humour is put immediately after the templatised one, explicitly stating that the definitions of sense of humor are found in ]. Is this desirable? Dbfirs thinks it is, because "users will expect to see definitions for a valid word in their region, so the repetition makes is clearer that the definitions are only a click away". I think it isn't, because users are no more likely to expect content on favour/favor than on their preferred spelling of kinnikinnik/kinnikinnick/kinnickinnick/etc, and no less likely to understand how a soft redirect works on one of those pages than on the other. What do you think? If it is desirable, can the extra text be added by {{alternative form of}} itself (or by whichever other template we use to redirect US/UK spellings; e.g. {{form of|Standard form}} or *{{standard form of}}), rather than being added by hand? - -sche(discuss)22:12, 6 June 2013 (UTC)
Redundant links with different link text are confusing for users.
The link text is also wrong, because English spellings cannot be divided into binary US and British sets. Humour is a British and Canadian spelling, while curb is a Canadian and US spelling. Unless you disavow the classic linguistic meaning of British English, in which case humour is a UK, Irish, Canadian, Indian, South African, Australian, and New Zealand spelling.... —MichaelZ. 2013-06-06 22:48 z
Yes, you're right, it's more complicated than I anticipated. Perhaps the only way to deal with the variations is by adding individual usage notes? Dbfirs05:37, 7 June 2013 (UTC)
I hope you don’t mean usage notes about humor placed on the entry for humour? —MichaelZ. 2013-06-07 13:59 z
No, those entries are perfectly clear as they stand. It's when we provide a soft redirect from a correct spelling (for one region) to an incorrect spelling (for that region) that we need clarification. Dbfirs21:49, 7 June 2013 (UTC)
So how come we have a “soft redirect” from sense of humor to the simple definition at sense of humour? But full entries at humor and humour, where two separate, moderately complex pages for the spelling variations of a single actual English term will be forever impossible to keep in sync?
Not only is this inconsistent for no reason, but here we are discussing the idea of complicating the simple pages instead of merging the relatively complicated ones. —MichaelZ. 2013-06-08 01:43 z
Humour and humor haven't been merged yet only because no one has gotten to them yet. As I wrote in the Tea Room, I've been reducing such content duplication for about a year now, but it's hard to find entries: when I started, there were 13 pairs of supposedly synced entries (findable via Category:English synchronized entries and via HTML comments), and not a single one actually was or had even recently been synced. Unfortunately, there are many pairs like humo(u)r that are such terrible messes that they're not even categorised.
Postscript: I just consolidated humo(u)r and colo(u)rful, leaving only colour/color (which are only synced because Isyncedthem) as a monument to failure, a reminder of naïveté, a proof for anyone who, in the future, ever finds it hard to believe that people would have attempted something as foolish as the total duplication and perfect synchronisation of content across dozens of pages. (Let all who doubt view the entries' edit histories see that Wiktionary did once try that, and that the entries did indeed spend much of their time out-of-sync.) - -sche(discuss)02:56, 8 June 2013 (UTC)
Blimey! Looks like you’ve already been through that one. —MichaelZ. 2013-06-08 03:32 z
I still prefer separate entries, but I have to admit that the problem of synchronisation is almost insurmountable until we find a way to include a single set of definitions in both entries. Meanwhile, sche's amendments have produced unambiguous entries. Is everyone happy that we standardise on this format, with the optional addition of individual usage notes where the spelling situation is complicated? Dbfirs12:14, 8 June 2013 (UTC)
I hope that your thoughts allow for the occasional possibility that there are regional differences in meaning and usage that happen to correspond to the regional prevalence of the spelling. I hope we don't end up discouraging contributions because of seeking 'coordination'. DCDuringTALK15:18, 8 June 2013 (UTC)
Indeed, there are subtleties that are best explained by separate entries, but the consensus of editors seems to be that the problem of trying to keep these synchronised is insurmountable. (I agree that synchronisation is a problem.) I like your suggestion below. Do you think the experts here would be happy with the links? Should I try a few to see what people think? Dbfirs08:24, 10 June 2013 (UTC)
I'd try the experiment in a few pages mentioned in discussion pages (to speed the feedback). Widespread implementation would not be wise without some positive response, at least, and, at least surly, grudging silence from others. DCDuringTALK12:59, 10 June 2013 (UTC)
There is American and British English spelling differences on Wikipedia.Wikipedia . We could provide, under Usage notes, a section-specific link to the section covering specific classes of "UK-US"-type (pace MZ) spelling differences. We could even have templates like {{en-spelling -or-our}} that provided the section link and could be updated when we get our own vastly superior coverage of this and similar matters next month. DCDuringTALK14:30, 7 June 2013 (UTC)
I can’t imagine how “subtleties are best explained by separate entries.” Even a series of blatant contrasts in usage are difficult to assimilate while flipping between two, three, or four variant entries, or having them open in separate browser windows. And how is the reader under this heavy cognitive load supposed to discern “subtleties” from synchronization errors?
Labor, labor, Labour, and labour are not four different words. No other dictionary, print or electronic, considers moving them to separate pages, because that would be a disservice to their readers and reduce the value and utility of their resource. —MichaelZ. 2013-06-10 14:58 z
Explaining meaning is the most important task here. Usually differences are of interest to linguists. A meaning that is present in one spelling, but not in the other, belongs to the entry in that spelling in which it exists. Usage examples for a sense are often regional as well, though that would argue for revising them rather than duplicating content. DCDuringTALK16:37, 10 June 2013 (UTC)
When a non-linguist hears an unfamiliar usage of /ˈleɪbər/ on the radio, she has to learn of the existence, locate, and click through four separate articles, then compare all their senses in her head before she can be confident that she has determined this one word’s meaning. That’s a failure of the dictionary. —MichaelZ. 2013-06-10 17:03 z
I don't disagree with the general advantage of combining senses across alternative spelling entries. I'm just suggesting that we need to not prevent or even discourage user input on whatever page they prefer. Their choice of one page rather than another might be useful data about the actual distribution of the sense or they might be knowledgeable about the term's usage. I fear the consequences of the accretion of layers of rigidity and complexity leading to a gradual and premature ossification of Wiktionary, especially in English.
It's an empirical matter as which layout might give the best results for various classes of users with various look-up needs. I am extremely skeptical that we will manage a major advance with a part-time crew of amateurs and no empirical data on how our current and potential non-linguist users use and could use Wiktionary. Wikipedia hews close to the style of an encyclopedia in most regards. Just as the QWERTY and telephone-style keyboards largely define many, many user-input interfaces, the limited variation in styles of dictionaries provides the total range of base interfaces we can use. Feasible paths to desirable innovations seem to me to be limited to incremental changes.
For incremental changes, we don't even know whether our existing users would prefer that the main entries typically (subject to variation by type of alternative spellings and even by individual term) be the "US" or "UK" (or "North American") variants. Suggested proxies for the missing empirical data are the relative sizes of English-speaking populations in countries assumed to prefer one or the other, the preferences of our contributors generally, the preferences of our contributors who make the first entry in one of the main alternative forms, usage in the large controlled corpora like BNC and COCA, and usage in the various Googles, especially News, which enables regional differentiation. DCDuringTALK17:32, 10 June 2013 (UTC)
(tl;dr summary: this is a non-issue that can be sidestepped anyway in most cases)
I doubt many senses are attested only in certain US or UK spellings. Some UK/US authors use US/UK spellings, and UK/US works are often copyedited when published in the US/UK, so even British senses of -our or -ise words are usually attested in American spelling, too.
Even if a sense is attested only in a word's US spelling, I doubt the same word has a second sense attested only in its UK spelling. Thus, in most cases, we can sidestep the issue by making whichever spelling has the peculiar sense the lemma: then the sense is on the "right" page, and all the senses are on one page.
If a word does exist which has some senses only when spelt -ise and other senses only when spelt -ize (can anyone think of such a word?), then I still think the content should be centralised: I agree with Michael that scattering not-entirely-overlapping sets of definitions over different pages obfuscates, rather than explains, a term's meaning. (It requires casual readers to, first of all, realise that the content is spread out across multiple pages, and then to compare the pages to see which spellings have which senses.)
Capitalisation is an at least slightly different issue. What I've most often seen done, and what I do, when a word has different (and overlapping) senses when capitalised bzw. uncapitalised, is what is done on gypsy/Gypsy (each term has a definition line linking to the other). I try to employ similar linking from singular entries to plural-only senses (see message, messages). I suppose such linking some also be employed between -ise and -ize words, etc, but that seems unideal—less ideal that simply putting all the definitions on one page.
DCDuring's comment of 17:32, 10 June 2013 makes me think we may be talking past each other / about two different things, however. I tend to accept that new users should be allowed to add content to whichever page they like; if the content they add to an alt form is already found in the lemma entry, their edit can be rolled back; if the content isn't found in the lemma entry, it can be moved thither. (In this as in many other things, Wiktionary might benefit from adopting "Stabilversionen", so that we could clean up noobs' misformatted entries at leisure, and be less rigid about formatting in the meantime.) - -sche(discuss)17:43, 10 June 2013 (UTC)
I see the basic issue as what is a term and its lemma (WT:Lemmas is sorely lacking). We have separate entries for spellings and capitalizations (while keeping together completely unrelated other-language terms that share an orthography). But a dictionary entry is properly for a term in a particular language, and offering a survey of its variable properties like spelling and especially capitalization.
From the readers’ point of view, they may need to find a lemma entry based on any alternative, regional or historical spelling, and especially capitalization, which can be absent in their source – as in spoken form or most subtitles and captions – and can vary freely – as at the beginning of a sentence, in title caps, or in all-caps texts.
The one lemma entry should guide the reader as to the normal or preferred spelling and capitalization, or its range of usage. So in addition to usage and grammatical labels, either entries/headwords or particular senses may require an indication of the usual or preferred spelling and capitalization, plus the variations of these in different regions, over time, or in different contexts. —MichaelZ. 2013-06-10 19:32 z
So, can these notes – as found in entries favour and tumour – be removed? (Or at least replaced with a template so they can be easily found and altered?)
In “UK and Canada spelling of favor”, it is clear what the link takes the reader to.
In “UK and Canada spelling of favor. (For definitions, see the American spelling.)” it is unclear what “the American spelling” is referring to. There is no indication to the reader why there are two choices, and no evidence of what two things they lead to. —MichaelZ. 2013-06-12 14:35 z
Also confusing: “the American spelling,” when the target headword is labelled US, alternative in Canada. Labels on links should use the same terminology as labels in entries. Links should not be redundant, and their destination should be clear. A better alternative might be one of:
But we have enough difficulty labelling headwords. It’s a mistake to also start labelling links in headword lines too.
If you want to explicitly mark the relationship, then add an “Alternative forms” header. —MichaelZ. 2013-06-12 15:17 z
Yes, I'd already conceded to the majority view, so I've removed my addition. I've also replaced "Canada only" with "Commonwealth", even though I don't really like the label. What alternative would include NZ, Oz, SA etc? Dbfirs16:13, 14 June 2013 (UTC)
I’m arguing for British, referring to the British English branch of the language. We were using that when the label text was changed to “UK” with no allowance for the difference in meaning. —MichaelZ. 2013-06-15 15:28 z
In that case, I fully agree with you. Can we change the template to say "British English" rather than "UK" when the parameter "from=British" is used? Dbfirs08:02, 16 June 2013 (UTC)
Adding explicit "context" before context templates on definition lines
MewBot (talk • contribs) has started to replace {{vulgar}} with {{context|vulgar}} and the like on definition lines. An example edit: diff. Apparently, no one has protested so far. Let this be a record in Beer parlour that this is ongoing. I have doubts about advisability of these replacements, but do not see them as obviously wrong. I am disappointed that this was not expressly discussed.
Before the example edit:
# {{pejorative}} A man who uses services rendered by ].
After the example edit:
# {{context|pejorative}} A man who uses services rendered by ].
I do not know what is "concern trolling". The new text is longer and looks more messy in the wiki text, for one doubt. WT:AGF. --Dan Polansky (talk) 18:28, 7 June 2013 (UTC)
Okay, as per concern troll: "Someone who posts to an internet forum or newsgroup, claiming to share its goals while deliberately working against those goals, typically, by claiming "concern" about group plans to engage in productive activity, urging members instead to attempt some activity that would damage the group's credibility, or alternatively to give up on group projects entirely." --Dan Polansky (talk) 18:29, 7 June 2013 (UTC)
Fair enough. For what it's worth, I also wonder about how productive it is to add everything under {{context}} if that template is eventually going to be orphaned or changed to something more broad like "label". DTLHS (talk) 18:37, 7 June 2013 (UTC)
Subjectively speaking, I just don't like the new wikitext. I suppose there must be some advantage, or else the bot would not be doing it. But I am not very clear about what the advantage is. --Dan Polansky (talk) 18:42, 7 June 2013 (UTC)
I am mainly doing it as a first step so that the initial problems have been worked out, and we can work out what (concretely) to do next. It may be somewhat redundant but it's not harmful either, and it eases the conversion later. One of the main problem points for having a template for every label (which I listed above) is that there is an automatic conflict between any label and a template with the same name. This created problems like {{acronym}} which was originally not a context template and which was causing script errors because some pages used {{context|acronym}} as a label (correctly). Another example is our inability to use "law" as a label because of {{law}}, having to resort to "legal" instead. Removing this point of conflict, even if we don't do anything else, is therefore a definite benefit.
The advantage of adding {{context}} now is is that once all calls are through {{context}}, we can start editing the label templates themselves without worrying about breaking this backwards compatibility, because we can assume that they are always called by {{context}}. This makes it possible to do the transition in small steps gradually, without any fear of breaking anything with one huge step. Also, the next step I was intending to do was to add lang= to the template where it's missing, as the proposal above (the name of the final template hasn't been settled yet) would have a mandatory language code as the first parameter. It will also will catch many errors. This is a lot easier to do this replacement if there is only one template to consider than if there are hundreds. —CodeCat18:52, 7 June 2013 (UTC)
I see. The ultimate verbosity fest that you are planning is this:
# {{context|pejorative|lang=en}} A man who uses services rendered by ].
No, not the ultimate. It's just an intermediate step in the process. Once all labels are called via {{context}} and they all have a language, converting all of them to another call (using another template) becomes trivial. I am just doing groundwork right now. Also see my post below. —CodeCat19:07, 7 June 2013 (UTC)
CodeCat’s proposal included: “I think that the approach used by {{label}}, in which labels always need {{context}} or {{label}} prefixed, is preferred for a Lua implementation.”
"Sounds good, but let’s abandon the misleading name “context.” ... —Michael Z. 20
On the day on which the post was made, the bot started replacing. But whatever. The point right now is, if someone can explain the benefits, they should do so. --Dan Polansky (talk) 18:52, 7 June 2013 (UTC)
I don't really see the point of the effort under challenge except to remove an implementation barrier to deprecating all direct use of context tags. Presumably that is being done to bring Luacization within the capabilities of the technical resources that we have. If Luacization requires millions of extra keystrokes, thousands of them mine, then it doesn't seem like such a good idea to me. A philosophy of adding keystrokes where not necessarily essential seems exactly the opposite of what we need. I thought that Luacisation would not be used as a reason for such an obvious regression. I thought a basic principal of user interface design is: NO REGRESSIONS. DCDuringTALK18:55, 7 June 2013 (UTC)
I reject any notion that simply adding more text is automatically a regression. Besides, "context" can be renamed to something shorter eventually. DTLHS (talk) 19:01, 7 June 2013 (UTC)
Indeed, and that was already discussed as well. Proposals were {{label}}, {{x}}, {{ctx}}, {{ct}} or {{c}}, although DCDuring you yourself said that we should abandon the name "context". Once the language code templates are out of the way, we will have a lot of freedom in naming because we can use almost any two-letter name. How would {{lb|nl|rare|archaic}} be in contrast with the current {{context|rare|archaic|lang=nl}}? It's quite a bit shorter. —CodeCat19:05, 7 June 2013 (UTC)
You are misreading by lack on enthusiasm for the name "context" for some kind of implicit support for adding to the the typing burden. One of the great virtues of the {{context}} system is that it allows folks to add whatever they thought was appropriate for the entry. The implementer of the system would periodically review the labels added and make a new direct context label where appropriate, sometimes making it a redirect. On the NO REGRESSION principle, I assumed that this behavior would be continued as a matter of course. That requires that there be a template to support labels not yet implemented as direct labels. That is the sole essential use at present for {{context}}. It seems that the current conception is to discard direct labeling for programming convenience. I would be inclined to add Category:Entries with redundant context templates and eliminate all those instances where context preceded a label which has its own template. DCDuringTALK19:35, 7 June 2013 (UTC)
That part would always work. Any new proposal would certainly allow you to use "new" labels, as well as to make labels aliases of one another (something we currently use redirects for). The difference would be that the list of "recognised" labels (which get a category or a link or something else) is stored in a Lua module rather than in templates. So you could still type {{lb|nl|I'm a label}} and it would work. And you could still then, later on, define a new label called "I'm a label" and the entry would then use that. —CodeCat19:41, 7 June 2013 (UTC)
I support this. CodeCat is eliminating all of the conflicts and open-ended problems with {{context}}. Huzzah! 🐱
DCD, the concept of invoking a template without typing its name is completely flawed. An open-ended set of keywords is not templates, it is parameters. Abandoning the idea is not a regression. Being able to move forward is progress. —MichaelZ. 2013-06-07 19:48 z
Another way to see it is in terms of "namespaces". Not namespaces as they are in the wiki but a bit more abstract, they are the set of all possible names. Currently, the namespace that contains all possible context labels is the set that is the union of: 1. all templates that are context labels, plus 2. all other names in the Template: space that do not exist. This has created many problems because as Michael said it's an open-ended set of names for a namespace that is not open-ended, because it already contains other templates. Every template you create restricts the space of possible context labels further and there have already been conflicts. {{plural}} is not a context label template, but most of its current transclusions are via {{context|plural}}, and they just happen to work, kind of, by chance (but {{context|plural|rare}} will break!). And then there is the problem I mentioned with {{law}} and {{acronym}}. Ruakh's original {{label}} proposal as well as my own both avoid this by making the namespace for context labels distinct from that of template names. This does mean that you have to call the template explicitly each time, but I think that is a small price to pay compared to the issues we otherwise have (and have had). —CodeCat19:54, 7 June 2013 (UTC)
I understand the point about restrictions on available context names, though it seems rather rare in practice. I'm not in a position to evaluate what is or is not possible technically. What I see is simple regression from a entry-content-contributor PoV. For templates that are used commonly, like context or its successor a one letter name, or alias, eg {{c}}, would seem best. At more than 300K uses before all the conversion of all the directly template labels, it would seem to have earned the right to usurp that name from the "common gender" template.
The principle of liberating text strings from their least efficient uses has even broader application. Why not reserve all one- and two-letter strings for things that are directly input by humans. The principle has already been established by such templates as {{m}}, {{l}}, {{g}}, {{a}}, etc. Perhaps it is time to revisit the use of two- rather than three-letter language codes to liberate more one- and two-letter codes for those doing manual data entry. {{n-g}} shows another approach for two-part names. DCDuringTALK20:26, 7 June 2013 (UTC)
The gender and number templates may disappear in the future as well, as we now have a module for them. There are still many cases where they're present directly in entries, but we can find solutions for that. So that will free up {{c}}, {{f}}, {{n}}, {{p}} and so on. Currently, {{head}} already uses the module, but none of the others do, and it will take a while to migrate all of our headword-line templates over, so we can't use {{c}} just yet. I do prefer {{lb}} over {{c}} though, and we can't usurp {{l}} because that template is probably used even more widely than {{context}}. —CodeCat20:48, 7 June 2013 (UTC)
It's slightly longer, but I'd like to make a plug for using {{lbl}} instead of {{lb}} -- {{lbl}} is an obvious mnemonic for "label" for English speakers, while {{lb}} is obtuse enough that I had to reread a few paras above to remind myself of what it was supposed to stand for. -- Eiríkr Útlendi │ Tala við mig21:15, 7 June 2013 (UTC)
It should be possible to put Mewbot to work on renaming the 28K instances of {{c}} and get that done quickly. {{temp}} has fewer than 19K instances. All of these are far fewer than such templates {{head}} and the reigning champion of both raw count and redundancy(?): {{Latn}} @ 2,958K. {{Latn}} is an example of something that could easily be much longer as it is not very commonly typed in by humans. DCDuringTALK21:18, 7 June 2013 (UTC)
It's not that easy. Only a small proportion of all uses of {{c}} is actually from direct transclusions in entries. Most of them are called through other templates like {{t}}. So we would need to track down all the templates that use genders in such a way and change them, but that is quite a big task because it has to be done by hand, and there are many languages whose nouns might use gender templates. It would take a few weeks to track them all down and fix them, assuming of course that there's a consensus for such an operation. We can't rename {{Latn}} because its name follows an ISO standard for script codes. —CodeCat21:55, 7 June 2013 (UTC)
Just to make it explicit: I, too, support what Mewbot is doing. It's good for the reasons Ruakh outlined earlier and the reasons CodeCat outlined recently. It's an important first step in updating our context templates, which currently have to use highly complex, recursive code to account for the possibility that they might be called directly, or called from {{context}}, or called from another template, or followed by |another template}}, or followed by |something that isn't a template}}, or... - -sche(discuss)02:16, 8 June 2013 (UTC)
I don't see why it is that the replacement for context can't simply compare a given label with a list of pre-existing valid labels and operate as if context had been called explicitly. Lua should make that much easier. Furthermore and more importantly, I think more attention needs to be paid to the process by which new labels are added. Our systems are increasingly effectively closed to many types of user input due to creeping top-down templatism - exactly contrary to wikiness. DCDuringTALK17:41, 10 June 2013 (UTC)
I'm not sure what you mean by your first point. A replacement for {{context}} can do just that, and it will have to because that's how it already works currently. If you call {{context|label}} then currently it will look for Template:label and transclude it if it exists, otherwise it will just show the text "label". The Lua form will do the same, except that instead of looking for Template:label it will look in a Lua table of labels. —CodeCat17:49, 10 June 2013 (UTC)
Just as we use context labels to distinguish US English, UK English, Canadian English, etc., under the ==English== header, this entry should use one under the ==North Frisian== header. I'll go do that now. —Angr10:44, 8 June 2013 (UTC)
Editing this entry, and leaving a comment on the creator's talk page, has raised a question in my mind. We are currently moving away from directly called context tags towards using parameters of {{context}} for everything. Does this mean that new directly called context tags are not to be created? If the creator of (deprecated template usage)greewe wants to start a category for the Mooring dialect of North Frisian, can he start a new template {{Mooring}} to allow dialect terms to populate a Category:Mooring North Frisian? Or are such templates now deprecated, and everything is done by parameters of {{context}}? If the latter, what edits would have to be made to {{context}} to get it to know that anything labeled {{context|Mooring|lang=frr}} goes into the appropriate category? —Angr11:27, 8 June 2013 (UTC)
Currently, {{context}} still relies on the presence of directly-named templates because that's how it works. The label templates always had this dual way of using them, and can still be used that way, except that it's now discouraged to use them directly. So you still need to create {{Mooring}} for now, but you should use it as {{context|Mooring|lang=frr}}. It's likely that sometime soon the templates will be changed so that only the latter works. —CodeCat12:15, 8 June 2013 (UTC)
He asked about it here, but none of you commented. I wasn't sure what to say, myself, so I kept waiting for someone to weigh in. Regardless of the merits, he did ask. Also, I wouldn't call it spamming, since we have been using information from his site. Chuck Entz (talk) 14:43, 9 June 2013 (UTC)
Yes, he has asked; I am not crying foul. I merely doubt that speculative information sourced from a single source is suitable for Wiktionary. --Dan Polansky (talk) 18:22, 9 June 2013 (UTC)
Hello all. Thank you for the comments above, which are of the sort I expected might be generated in response to my original inquiry. Allow me to address them here.
The major sticking point appears to be the one expressed by Dan Polansky, who believes that single sourcing for this data is inadequate.
As it happens, this research overlaps with that of an unimpeachable source for Old Chinese studies, Axel Schuessler, who discusses the relation between particular sounds and meanings in his ABC Etymological Dictionary of Old Chinese. I've uploaded a brief article detailing these overlaps, which anyone interested can find by searching my name and the name of the book.
If dual citations are necessary, there are, among the characters in common use today in China and Japan that I cover, approximately 2,500 corresponding to the half-dozen sound/meaning relations specifically described by Schuessler. This corpus would seem to be an unobjectionable starting point for the Wiktionary entries.
Turning to other concerns mentioned, it appears something needs to said about the links. The editor who originally used my material linked with a (sitename).com format, providing unsolicited free advertising for my site. I was not comfortable with retaining that format for entries I was redoing, and chose to use only the family names of the individuals standing behind the claims, hiding the site name in the link coding. My intent was to be less, not more obtrusive, but the decision seems to have brought on the law of unintended consequences.
Finally, DCDuring mentioned the possibility of inserting data in a Usage Notes section, a suggestion that - -sche too had made earlier. I look forward to hearing from experienced editors about how best to go about doing this, especially in respect to coordinating the cites from my data and from Schuessler's. This would have the added benefit, I suppose, of allaying concerns about link spam.
Thank you for your consideration. Lawrence J. Howell (talk) 07:16, 11 June 2013 (UTC)
This is utter phonesthemic nonsense, largely based on outdated reconstructions. I suggest simply reverting these additions. (reverting the reverting) Wyang (talk) 09:33, 11 June 2013 (UTC)
I agree with Wyang; these "phonosemantic interpretations" are linguistically nonsense. (I also disagree with reverting non-trolling/non-vandalism comments just because they're made by an anon.) —Angr13:54, 11 June 2013 (UTC)
@Wyang, you say this is nonsense, but I don't really know how to weigh your comment -- based on what? Lawrence has pointed us to a scholarly work, providing a way for us to verify that someone out there has this theory, and ostensibly to follow up on that author's sources as well. (Google Books, for the interested.) On the opposing side, we have you and Angr, and while I respect both of you as WT editors, I have no real idea what your academic background might be, nor any real idea what underlies your calling this "nonsense". I don't have any solid basis for evaluating your comment.
I don't have access to sources, but the statement "Old Chinese Initial /*k-/ lends semantic value Frame. Final consonant /*-t/ lends semantic value Cut/Divide/Reduce." at was the first thing that set off my bullshit detector. It is axiomatic in phonological theory (the field my Ph.D. is in, since you asked about academic background) that phonemes have no inherent meaning (e.g. ), and although sound symbolism has some reality, it generally holds across languages rather than being language-specific, and it tends to correlate sounds with vague physical properties like "big/small", "pointy/rounded" etc. rather than very specific concepts like "frame" and "cut/divide/reduce". —Angr19:39, 11 June 2013 (UTC)
Schuessler (2007) is a pioneering work in Chinese etymology and is in general reliable, but the "phonosemantic interpretations" posted here are far from what was written in that publication. Schuessler tentatively proposed some Old Chinese prefixes and suffixes (not initials, vowels and final consonants), some of which are reconstructable at the Proto-Sino-Tibetan level, and identified some phonesthemic patterns, such as *m– for "darkness", *–m/p for "closure", which are patterns generally found translingually, especially in the E/SE Asia region. However, the theory advanced by User:Lawrence J. Howell is that every Old Chinese monosyllabic word can be reanalysed in terms of initial, vowel and final consonant, in that all or part of its phonological shape resulted from its meaning, which is of course untrue for any language without an established pattern of such word formation. For example, for the word for "two" (二, *nij-s in Baxter-Sagart (2011) or *njis in Zhengzhang (2003)), based on the outdated reconstruction *ȵi̯ær by Karlgren (1957), he proposed that it can somehow be reanalysed as the diliteral root *n–r, which encoded meanings in its individual consonants (suppleness + continuum). This is obviously false since the word came from Proto-Sino-Tibetan *g/s-ni-s ("two"), and it's not even a marginally accepted theory in Chinese or Proto-Sino-Tibetan linguistics that the word for "two" was constructed from "phonemes expressing suppleness and continuum". Linguistics is not my profession; my area is in phylogenetics or hominid evolution rate. Wyang (talk) 23:48, 11 June 2013 (UTC)
Thank you both, that's exactly the kind of detail I felt I was missing previously. @Angr, your mention of specificity articulates an unvoiced worry in the back of my head about how over-specific a couple of these phonosemantic interpretations have been. @Wyang, your comments are particularly damning in pointing to where Lawrence is diverging from the actual sourced material.
@Eirikr: Thank you for steering the discussion in a productive direction. On the other hand: In my defense? Bogosity? Wow.
@Wyang: Your presentation of the contents of the ABC Etymological Dictionary of Old Chinese merits close inspection.
Schuessler tentatively proposed some Old Chinese prefixes and suffixes (not initials, vowels and final consonants)... (Tentatively? But let me not get sidetracked.) Please refer to p. 27 of the dictionary, Section 2.9 Meaning and sound. You'll find that your assertion does not square with the terms the author employs in his discussion of the nexus of particular sounds and meanings (quotation marks omitted): OC words; final -*p; final *-m; stem initial *-m; roots; stems; initial *w-; variants with other vowels; initial *l-; initial consonant; start with *n-.
... every Old Chinese monosyllabic word can be reanalysed in terms of initial, vowel and final consonant, in that all or part of its phonological shape resulted from its meaning. Regrettably, you misstate my theory, which properly accounts for the presence in OC of loan words and terms originating in onomatopoeia.
You take issue with my interpretation of initial *n- term 二. Perhaps my interpretation of a particular character is dissonant. But try coming at things from the opposite direction. In other words, begin with the normative and sort out the exceptional. You do not care for my data, so let's follow Schuessler who, on the page noted above, writes Words for 'soft, subtle, flexible', including 'flesh; female breast' start with *n-...
Turn to p. 395 of the dictionary. Over the following dozen pages, scattered among the loan words you will find enough examples of terms beginning with *n- and associated with soft/subtle/flexible that Schuessler felt justified in making the statement quoted immediately above. If you have a bone to pick with that conclusion, the author would be the person with whom to remonstrate.
As for me, among the thousands of characters I interpret, I may unwittingly be offering scores of problematic examples. I will readily acknowledge those when presented with compelling arguments. For one, I will revisit 二, and thank you for that.
The ultimate point here, however, is that Schuessler identifies what he calls phonesthemic or phonoaesthethic phenomena in OC, where certain meanings are associated with certain sounds. Nothing you have written counters that fact or undermines his finding.
I realised I missed the bit on phonesthemic patterns but you beat me by about ten minutes in submitting the reply. I've slightly revised my post. It's true that such patterns are found in Old Chinese, but these patterns are of limited derivational consequences in OC as the majority of OC lexicon does not conform to the semantic expectations from generalised phonesthemic patterns (as proposed in KN). In addition, these patterns are generally true for other E/SE languages as well (or even wider, see Phonosymbolism and the Verb cop). Postulating that words are consistently derived using this principle is a bit like positing a proto-phoneme initial *f– for English, denoting "movement", which is responsible for deriving Modern English words like fare, fast, fight, flee, flow, fly. Even if a large proportion of Old Chinese appears to be of non-Sino-Tibetan origin, a large part of which does not even seem to be related to anything else found in languages in the vicinity (!), there is no need to resort to such extensions of the sound symbolism principles, especially when the model is 1) unrecognised elsewhere; 2) nonspecific and ambiguous in the semantic descriptions; 3) relying on outdated reconstructions; 4) gives numerous misfits apart from the true sound symbolisms (Sorry). For OC words with established ST comparanda, listing the PST etymology is sufficient. Wyang (talk) 06:40, 12 June 2013 (UTC)
I have no knowledge of Chinese and can't make any judgement on this issue, but I'd like to point out that only a few days ago there was a rather heated discussion about bashing newbies - and utter phonesthemic nonsense, wild theories with nothing to back them up and spammer seem rather harsh judgements to make without actually asking the editor what he has to say. Isn't everyone supposed to AGF? Lawrence has been polite and has cited his source(s); so it would be good if we could be polite back. Hyarmendacil (talk) 05:52, 13 June 2013 (UTC)
@Hyarmendacil: Thank you for the call to civility. You missed my favorite, though: The implication that this is a tribunal, and that my options are to clear my name or face sanctions. I know it was written tongue-in-cheek, but still ... And bogosity? Perhaps after this thread winds down someone will be kind enough to assign it a bogosity level rating, so we all know where we stand.
As for the editors you quote: Wyang has managed not only to regain his equilibrium but to perform an immensely constructive service for the Wiktionary community: Confirming that a reputable authority maintains the existence in Old Chinese of phonoesthemic (phonosemantic) patterns. (That information, I might add, has been in circulation since 2007, contained in a readily available book that has received both scholarly and popular acclaim. For that reason, the knee-jerk rejections caught me by surprise.) His example holds out hope that the other editors too may eventually come around and offer positive contributions here.
@Wyang: OK, so we're agreed on the existence of phonesthemic patterns in OC. Great start.
Can you tell me the basis for your statement ... the majority of OC lexicon does not conform to the semantic expectations from generalised phonesthemic patterns ...? AFAICT, the majority of OC lexicon conforms surprisingly well.
... these patterns are generally true for other E/SE languages as well ... I understand you to be offering this as a rationale for not applying the patterns to Wiktionary's entries for Han Chinese characters. I take it to be, on the contrary, an excellent reason to apply the patterns across the Wiktionary board.
Skipping to your last point before circling back, I would second your idea about adding a PST etymology (source?) for each character. But that doesn't have to come at the expense of listing the true sound symbolisms.
Now to your four points, in order.
1) Your indication that the model is unrecognized elsewhere. (Shrugs shoulders.) Speaking here to the Wiktionary community: Bearing in mind that this discussion is, in the end, about whether or not to add certain data to Wiktionary, and presuming consensus can be obtained for adding consensus true sound symbolisms (minimal definition: Contained in both Schuessler and KN): Is lack of precedent a make-or-break issue?
2) ... nonspecific and ambiguous in the semantic descriptions ... Can you elaborate? Earlier in this thread certain aspects of my data came under suspicious for being overly precise.
3) Outdated reconstructions. I don't dispute your use of the descriptor outdated reconstructions; scholarly reconstructions of OC have progressed considerably since Karlgren. Nonetheless, the considerable handicap of ORs did not prevent my research collaborator and I from identifying the phonosemantic/phonesthemic patterns in OC noted at KN, which overlap with those of Schuessler. Also, it is entirely possible that the ORs should be credited for enabling us to see just a bit more deeply than Schuessler in certain cases. For example on p. 21 of the dictionary he presents a chart with a small number of examples of labial initials connected with the meanings swell, protrude, prominent, bloom, bud etc. These are all, the KN data indicates, manifestations of the single concept Spread, encompassing a number of related terms many times larger than the number of characters in the chart. Also in this context, I could discuss at some length the Cut/divide/reduce aspect of final *-t (ABC: sometimes transcribed as *-t, occasionally as *-ts) that was called into question earlier in this thread, but maybe some other time.
An additional point with regard to reconstructions is their mutable nature. It's 1992, and William Baxter has the stage pretty much to himself with A Handbook of Old Chinese Phonology. 1999, however, brings competition in the form of Laurent Sagart's The Roots of Old Chinese, a work glowingly reviewed by Wolfgang Behr. Sergei Starostin's reconstructions are coming out, too. Fast-forward to 2007 and here's Schuessler with his ABC Etymological Dictionary of Old Chinese, giving us four differing sets of OC reconstructions by contemporary scholars (not counting works published in China). What are conscientious editors of an online reference source such as Wiktionary to do? The solution seems to have been to ignore them all, a shrewd move as it turned out because just four years later Baxter and Sagart (neither scholar having been satisfied with his earlier work) bestow upon the world their collaborative Old Chinese reconstruction files. My point is, of course, that when it comes to OC reconstructions, the goalposts rarely remain in place for long. And so one may be excused for regarding them with a jaundiced eye.
4) Misfits. These are, as I see them, the undesirable flip side of the ORs which, as I describe above, have their strong point too. I look forward to identifying misfits and amending their interpretations as necessary. The process, if it is to be carried out on Wiktionary, depends on the issues presented below. (/@Wyang)
Now, as a practical matter, and returning to the topic that has brought us here, I'd like to ask the Wiktionary community whether there is a consensus to add OC reconstructions to the Han character entries. If so, whose? Baxter/Sagart's? Starostin's? Schuessler's? Someone else's? Some combination of two or more?
Scenario One: No consensus for adding scholarly OC reconstructions. In this case, I propose adding KN data as it stands, accompanied by footnotes or usage notes with verbiage such as: Reconstructions based on B. Karlgren; Interpretations by H & M; No scholarly approbation implied.
Scenario Two: Consensus to add scholarly OC reconstructions from one or more source. In this case, what objection might there be to adding interpretations to the entries for those characters in which KN data overlaps with Schuessler's, providing two cites for each entry, which should satisfy the concerns of Dan Polansky and others who share them? Lawrence J. Howell (talk) 08:17, 13 June 2013 (UTC)
My point was that the majority of your additions could not be found anywhere other than your site, not in any publications, including Schuessler (2007). Looking at your most recent edits, almost none of these character etymologies can be backed by Schuessler, and so is your methodology of phoneme decomposition and semantic value association. Even if a few are also identified by Schuessler as true sound symbolisms, the fundamental reliance of KN data on an outdated reconstruction has made such analyses unreliable. The multitude of reconstructions is not an excuse for choosing an obsolete one; in fact recent reconstructions have been surprisingly convergent (as evident in the case for "two" above), for example the reconstruction of the uvular series, lateral initials and voiceless sonorants, none of which is reflected in KN.
Let me use an example to illustrate this - 馬 ("horse", OC /*mˁraʔ/). Below is the passage from Schuessler (2007) (no copyright infringement intended):
mǎ1 馬 (maB) - LH maB, OCM *mrâʔ
'Horse'
Sin Sukchu SR ma (上); ONW mä
ST: PTB *mraŋ (STC no. 145): > OTib. rmaŋ, Kan. *s-raŋ, WB mraŋB, JP gum31-ra31 ~ raŋ; JR (m)bro < mraŋ). For the OC - TB difference in finals, see §3.2.4. STC (p. 43 n. 139) relates PTB *mraŋ to a PTB root *raŋ 'high' ( → líng6 陵).
Horse and chariot were introduced into Shang period China around 1200 BC from the west (Shaughnessy HJAS 48, 1988: 189-237). Therefore this word is prob. a loan from a Central Asian language, note Mongolian morin 'horse'. Either the animal has been known to the ST people long before its domesticated version was introduced; or OC and TB languages borrowed the word from the same Central Asian source.
Middle Korean mol also goes back to the Central Asian word, as does Japanese uma, unless it is a loan from CH (Miyake 1997: 195). Tai maaC2 and similar SE Asian forms are CH loans.
and this is your added content:
Old Chinese Initial /*m-/ lends semantic value Concealment.
Pictogram (象形) of a horse. It is unclear whether this term is onomatopoeic. If so, there is no semantic role behind initial /*m-/. If not, and the term was devised in connection with "concealment," the precise nature of the link is uncertain.
Source: Howell & Morimoto.
How is your theory that the OC word for "horse" was perhaps derived phonesthemically from the phoneme initial /*m-/, signifying "concealment", or your theory that the word was perhaps onomatopoeic in origin, backed by Schuessler (2007) or other publications? If not, isn't the above paragraph entirely your envisagement against established consensus? Wyang (talk) 02:12, 14 June 2013 (UTC)
ABC and KN both maintain the existence of phonesthemic patterns in OC. How Schuessler arrived at that conclusion and how he chooses to shape the material is his prerogative. Likewise for KN. Remember, the only reason Schuessler has been brought into the discussion is to address concerns about single sourcing (NB: sourcing of hermeneutic principles, not one-by-one interpretations).
@Wiktionary editors: Unless I am greatly mistaken about how things work around here, the community will at some point be shifting into decision-making mode to determine: Do KN interpretations belong in the dictionary? As a quick reference for those to be involved in that process, allow me to contribute a recapitulation.
・I have come to the dictionary with the intention of helping to improve the presentation of existing material.
・I stated my relation to that material, offered for inspection a sample of entries in a format I believe is compatible with Wiktionary style, and requested feedback.
・I waited until responses petered out, implemented formatting improvements that had been suggested for the sample entries, and uploaded similarly formatted new entries.
・Apparently unaware of the thread I had initiated here in the Beer Parlour, an editor called attention to my uploading activity and asked what should be done about it.
・A half-dozen other editors swiftly converged, casting aspersions on my motives and vilifying the idea of phonosemantic principles being operative in Old Chinese.
・One editor asked why none of these issues had been raised in the initial post. (No response.)
・Everyone disappeared save a single editor whose rhetoric thus far has included:
Vituperation: utter phonesthemic nonsense
Volte-face: It's true that (phonesthemic) patterns are found in Old Chinese ...
Untenable claim: Schuessler tentatively proposed some Old Chinese prefixes and suffixes (not initials, vowels and final consonants).
False attribution: ... the theory advanced by User:Lawrence J. Howell is that every Old Chinese monosyllabic word can be reanalysed in terms of initial, vowel and final consonant, in that all or part of its phonological shape resulted from its meaning ...
False analogy: Postulating that words are consistently derived using this principle is a bit like positing a proto-phoneme initial *f– for English, denoting "movement", which is responsible for deriving Modern English words like fare, fast, fight, flee, flow, fly. (No, actually, it is not a bit like it at all. The subject is OC, not Old English, and nobody is claiming that universals across languages maintain applicability along all categories.)
Non sequitur: ... almost none of these character etymologies can be backed by Schuessler ...
This is kettle logic, AKA chucking out arguments in hopes that one of them will stick (or create the desired impression, which can be almost as good).
This is not to assert that Wyang is completely off target. For example, with reference to OC reconstructions, and for what it's worth, it's safe to say that the opinion ... recent reconstructions have been surprisingly convergent ... is mainstream in Sinologic circles. S/he also notes the advances resulting in the reconstruction of the uvular series, lateral initials and voiceless sonorants; what if any influence they may bear on KN interpretations is a matter for study.
As for the thrust of Wyang's rhetoric, however, s/he appears to be arguing that the current state of OC reconstruction (partially/largely) invalidates the KN interpretations. Also (and Wyang will correct me if I'm wrong), it appears the intent is to persuade the community to exclude the interpretations from Wiktionary; expending such energy on the thread makes little sense otherwise.
I'll continue the debate with Wyang as long as necessary, but I wonder if it would be too much to request some form of indication that the community is working toward a resolution of this issue. To that purpose, interested editors may wish to look at the explanations found in the Etymology sections of Han Chinese characters (ones not taken from KN). A few dozen entries will be enough to create a valid if minimal sample. For each, do you find that the citing make the origin of the explanation evident? If not, is that not a problem? If so, do the sources conform with the inclusion standard to which KN data is being held? I refer especially though not exclusively to that prickly issue of multiple sourcing. I believe you'll agree that the answers to these questions are of great relevance.
There's more to say, but I think it's time for the community to return to the stage.
Real life considerations dictate that I'll be able to offer nothing beyond cursory remarks in the next few days, and none at all in the week following. I will however be back at the end of the month, so please do carry on without me. Thank you all for your consideration. Lawrence J. Howell (talk) 04:43, 16 June 2013 (UTC)
If I read it correctly, you didn't really answer my questions did you? Your addition at the etymology of 馬 was unsubstantiated by publications. Do you agree? Wyang (talk) 04:36, 20 June 2013 (UTC)
Here you have unfortunately discovered the major flaw in wikionary's bureaucratic process; that discussions like this in the Beer Parlour never actually 'end' - they just. I'm afraid I don't think the community is going to return to the stage (cf. Mglovesfun), so if you're still interested in the whole affair, I suggest we just try to resolve the issue. To me it appears that the main points are as such:
- Both sides agree that phonosemantic interpretations can be valid under at least some cases.
- Neither side agrees on the extend to which phonosemantic interpretations apply.
In a case like this, the only way to resolve the argument is to cite eveything with a reliable source. Schuessler says Words for 'soft, subtle, flexible', including 'flesh; female breast' start with *n-...; this implies that the Chinese words for flesh and female breast (if you can find the words being referred to) are valid tender for a phonosemantic note; but not that all words starting with *n- are so valid. In other words; each phonosemantic note you make should be able to be cited to a statement in a reliable source that confirms that this specific entry has the phonosemantic etymology you attribute to it.
You note, correctly, that many other Chinese etymologies do not conform to the standards you are being asked for. Most etymologies (e.g. see basically every English etymology) are not cited becuase they are widely accepted by the community, and have not been challenged. Chinese Phonosemantic etymologies (however well established to the Sinologists) are unfamiliar to the general public; hence the disparaging remarks at the beginning of this chapter; hence the need for adequate citation. The point is that every entry should be able to be cited reliably when challenged.
So (if you're still interested) I would suggest you do the following:
Continue to add phonosemantic etymologies.
Classify them under the etymology header.
Make sure that they are cited thoroughly, as per above, using the references tool so that they appear as footnotes (this is the standard way of doing it, e.g. gonzo)
Get Wyang to check over a few ones you have done, so that (s)he is happy.
^ Perhaps you haven't been made aware of a key point in this debate: most wiktionarians will not consider KanjiNetworks a 'reliable source'; certainly not for non-mainstream etymology content. Most websites are generally veiwed with suspicion, unless they're academically affilated (Perseus) or have been found to reliable and uncontroversial (Online Etymology Dictionary.
^ You should note that, if you've cited Schuessler or someone, it is not actually necessary to cite KanjiNetworks for any phonosemantic etymologies. However, KanjiNetworks may be useful as a source for any general, mainstream etymologies (e.g. for 馬 you note that the shell and bone character is a pictogram - I don't think anyone has challenged that etymology.)
Thank you, Hyarmendacil, for stating the issues in dispute and for offering specific suggestions according to which Wiktionary entries for Chinese characters may be improved. (I had intended to be available for follow-up last week, but events did not cooperate: Apologies for my tardiness.)
I concur with the necessity of citing everything with a reliable source. Also, I acknowledge that, as you point out, most editors consider that my dictionary does not qualify as a reliable source. And, in the interest of bettering Wiktionary's entries on Chinese characters, I'm more than a little tempted to implement your concrete suggestions.
However, my experience here indicates that allocating additional time and energy to the entries on Chinese characters is likely to prove unfruitful.
Consider the following.
An IP reverts my description of 身 (Pictogram of a pregnant woman, the fetus adhering within her womb) with the cryptic comment, Very little evidence to support those claims. The claim the IP prefers (as the previous and current version of the entry explains): Ideogram 指事: from a pictograph of a pregnant woman.
Two issues here. First, where is the evidence for the (ahem) bold claim that 身 is an ideogram? For that matter, where are the demands to furnish evidence for this claim?
Second, prior to this reversion, Dan Polansky had asked me to hold off on adding entries until things had been sorted out (as he put it). Motivated by natural guilelessness and a desire to be collegial, I readily agreed. So, following the IP's reversion, I asked Dan: As I'm abiding by the community's request to refrain from doing anything until the matter under debate has been settled, I believe it's only fair that the hands-off policy cut both ways. What's your take? Dan responded by referencing a fast track policy the existence of which might come as a surprise (and useful tool) to more than a few Wiktionary editors: As per fast track, etymological content that is sourced from a single source, having no obvious other sources, and for which no sources are in the process of being added can be removed.
In other words, at the very time a secondary source (Schuessler) could not be added because the process was still being sorted out, editors were welcome to revert entries because no sources were in the process of being added. Welcome to the bottom of the rabbit hole, everyone. I think you'll like it here.
With respect to your suggestion, Hyarmendacil, that ... each phonosemantic note you make should be able to be cited to a statement in a reliable source that confirms that this specific entry has the phonosemantic etymology you attribute to it:
Regrettably, this won't work because Schuessler doesn't repeat his phonesthemic conclusions in specific entries. For example, if you look at 乳 (breast; nipples), nowhere will you find Schuessler discussing breasts/nipples in terms of being soft, supple or flexible objects. The reader must supply the phonesthemic imagery herself. The upshot is that any Wiktionary editor who happens along can say, Hey, Schuessler's entry for 乳 says nothing about breasts or nipples being soft/supple/flexible objects. This is single sourcing!! Revert! And that's the last we'll see of the phonosemantic etymology.
For those reasons, much as I appreciate your intervention and proposals, Hyarmendacil, I just don't see them being practicable under the current circumstances. I'm optimistic that more can be accomplished if and when the dynamics of this particular dictionary project evolve.
In any case, although as you say the process was long and drawn-out, I'm encouraged that a number of Wiktionarians have been exposed to the fact that reliable research indicates the existence of phonesthemic patterns in Old Chinese. It's a start, and that's something worth celebrating. Lawrence J. Howell (talk) 06:17, 4 July 2013 (UTC)
Perhaps the appendix namespace would be more appropriate given the speculative nature of Old Chinese reconstructions that these interpretations are based on? These appendices could then be linked to with some special templates (or plain wikilinks) from ===Etymology=== sections. We could treat them as semantic counterparts to proto-forms. --Ivan Štambuk (talk) 15:12, 4 July 2013 (UTC)
Thank you very much for your constructive suggestion, Ivan. I'm combing the sub-categories and pages within the Appendices to familiarize myself with the contents as well as with the formatting and presentation styles currently in use there. Meanwhile, I hope your proposal will generate additional fruitful discussion. Thank you again. Lawrence J. Howell (talk) 10:11, 5 July 2013 (UTC)
The IP user who reverted your edit at 身 was suspicious of the veracity of the phonosemantic assertions, not of the glyph origin; otherwise, he/she would have deleted the "Etymology" section below altogether. He/she certainly has solid grounds to do so - your invented metholodogy, your semantic values, and your reconstructions, all appear doubtful. Below are the reconstructions and modern pronunciations for this character:
Old Chinese
Karlgren (pre-1970s): *ɕi̯ĕn
Li Fang-kuei (pre-1990s): *r̥jin
Wang Li (pre-1990s): *ɕien
Baxter (1993): *hjin
Zhengzhang Shangfang (2003): *qʰjin
Pan Wuyun: *qʰjin
Baxter-Sagart (2011): *n̥iŋ
Middle Chinese
Karlgren: ɕi̯ĕn
Wang Li: ɕĭĕn
Li Rong: ɕiĕn
Shao Rongfen: ɕjen
Zhengzhang Shangfang: ɕiɪn
Pan Wuyun: ɕin
Pulleyblank: ɕin
Modern Chinese
Beijing (Mandarin): ʂən⁵⁵
Guangzhou (Cantonese): sɐn⁵⁵
I don't think anyone credible has ever reconstructed this character as having a *t- initial in Old Chinese, and this constitutes the basis of half of your assertions. Besides, why would the word be derived from "straight" rather than "bent" when "a pregnant body" rather than "body" was the original meaning?
Even with phonesthemic cases recognised by Schuessler, listing the phonesthemics may not be necessary if the phonesthemic derivation (if true) is demonstrably a pre-OC event. eg. Even if 乳 ("breast", the example you used above) indeed had a phonesthemic origin, such relationship could obviously be traced back to the Proto-Sino-Tibetan level, since established cognates exist in other languages. See Reconstruction:Proto-Sino-Tibetan/s-nəw. The phonesthemics is therefore quite distant and should not be mentioned when the PST etymology is absent (otherwise it would be akin to saying the word "wheel" is onomatopoeic with the PIE etymology in absentia). Wyang (talk) 08:24, 11 July 2013 (UTC)
I promised to debate Wyang as long as necessary. Objective readers will, I believe, agree that Wyang's latest post has now obviated the necessity, as we find it is:
Repetitive: The list of reconstructed readings serves no purpose but to back assertions made nearly a month earlier about (1) consistency among contemporary reconstructions (a point I readily acknowledged) and 2) my reconstructions being idiosyncratic. (By the way, for those keeping track, *t- for 身 comes from Akiyasu Tōdō. Expect Wyang to reply, opining that Tōdō is not a credible source.)
Inconsistent: Quoting Wyang on 12 June: For OC words with established ST comparanda, listing the PST etymology is sufficient. Quoting Wyang on 11 July: ... listing the phonesthemics may not be necessary if the phonesthemic derivation (if true) is demonstrably a pre-OC event.
Eccentric: 1) Mind-reading another editor (curious enough, that) then presenting the interpretation as fact. 2) Conjuring out of thin air an association between a pregnant body and bent.
To this point, we've seen three types of argument employed against phonesthemic interpretations and/or their inclusion in Wiktionary's Chinese character entries. 1) The argument from ignorance, with the syllogism running: I've never heard about phonesthemic patterns in Old Chinese. The idea sounds outlandish. It therefore merits immediate rejection. 2) The argument against single sourcing, here applied selectively in ignoring all those long-accepted, single- and non-sourced etymologies in the Chinese character entries. 3) The argument that, despite the existence of a recognized authority for the notion of phonesthemic patterns in Old Chinese, Wiktionary need not incorporate such material (or, incorporate it only partially).
Exponents of the first two arguments have disappeared, leaving 3). Wyang has made her/his points, and I have acknowledged the meritorious ones among them. With nothing new under the sun (and considering the odd tangents) contained in Wyang's last post, it appears we've reached the end of the productive life of this thread. Here is how things stand.
Including - -sche in this earlier thread, four editors have presented specific proposals for placing and/or formatting the phonesthemic material (two, it should be noted in all fairness, with provisos).
Re: "Exponents of the first two arguments have disappeared, leaving 3)". Okay, so if the proponents of the first two arguments do not take part on this discussion any further, does that mean their arguments have really been addressed by you and refuted or invalidated? It does not. I still think sourcing etymology in Wiktionary mainspace from a single source is a poor idea. I do not know the state of other etymological information in Chinese entries; I find it likely that there is other stuff to be removed. Now you may complain of me being repetitive, but I do not really have much else to say for the matter. My stance rests. --Dan Polansky (talk) 19:38, 13 July 2013 (UTC)
I wouldn't regard Kanji gogen jiten (1964) as a reliable source of information on Chinese etymology. Apart from outdatedness, the "associative/derivative" etymological approach used in that book was not properly comparative.
My intentions are non-opaque since my first post - unsubstantiated original research like this should be killed on sight. It's a shame that verbosity has impeded its removal. I never envisaged some way in which such content could be transformed into something inclusion-worthy. I have used multiple examples above to illustrate why, but the replies have been concessive or digressive unfortunately. Wyang (talk) 00:57, 14 July 2013 (UTC)
@Wyang: ... the replies have been concessive or digressive ... Precisely. Concessive when I've acknowledged your meritorious points. Digressive when ..., well: Here's a mirror. My intentions are non-opaque ... The dueling positions noted in the Contradiction bullet point suggest otherwise; they sound mighty square peg/round holish to my ears. Finally, you would have us believe that verbosity is impeding the removal of the material I added?? What's stopping you? Dan Polansky gave all editors the green light, remember? And recall who applauded the IP for taking the initiative and doing what you yourself evidently ache to do. Go on, Wyang, have the courage of your convictions!
@Dan Polansky: ... if the proponents of the first two arguments do not take part on this discussion any further, does that mean their arguments have really been addressed by you and refuted or invalidated? It does not. (Aside to Wyang: If you consider Dan's self-answered question a digression, you'll probably want to stop reading now. I'm going to respond because it's important to him.)
The two arguments are better kept distinct. The first, the argument from ignorance, is an argument only in the loosest, knowing-wink sense of the word: Here, appeal would be a preferable rendering of the first word of the Latin argumentum ad ignorantiam. Note that its exponents didn't ask, "What if any evidence is there for this phonosemantic business?" They simply dismissed the possibility out of hand, appealing to the notion's assumed ludicrosity when viewed through the prism of whatever linguistics training they have happened to obtain. And then, yes Dan, I really addressed their argument by introducing them and other Wiktionary editors to the research of Axel Schuessler. You'll recall that as being the point when Wyang decided to abandon it. And did introducing Schuessler's work serve to refute/invalidate this argument? To your way of thinking, it doesn't mean I/it did. I'd counter that refute/invalidate is asking too much in the first place, that for present Wiktionary purposes it has been more than adequate to suggest the surprises Old Chinese contains for the uninitiated, and to demonstrate that this surprising material rests on a solid academic foundation. But, whatever.
In contrast, the second argument (yours, about the non-desirability of single sourcing) is perfectly sound, making it only natural that you wish to rest on that stance. The problem, as I noted, was its selective application. You inform us you ... do not know the state of other etymological information in Chinese entries. OK: That explains why, at the outset of this thread, you were unaware you were making a selective argument. And now you are aware. Good enough.
In fact, it's very good indeed (and a happy by-product of this thread) that the sorry state of many of the Chinese character entries is receiving some of the long-overdue attention it deserves. Below, Eiríkr Útlendi makes a useful suggestion about the proper location for certain material, and we can hope that is just a start toward helping Chinese character entries approach the quality of those covering English terms.
For all this, I don't see that the posts of the last few days are bringing us closer to a resolution of how to handle the phonesthemic material. Anyone else care to chime in? Lawrence J. Howell (talk) 12:05, 16 July 2013 (UTC)
OK, it appears this thread has run its course without reaching a consensus about the main issue that took shape along the way: Whether and how to employ phonesthemic material in Chinese character entries. Nor have any proposals for achieving resolution been forthcoming.
In order to facilitate such a resolution, I considered proposing that we backtrack and rectify a fundamental problem: The dissimilitude of data contained in the Etymology section of various of the entries. (In an earlier thread, I noted that we find Etymology serving as a catch-all category for unannotated graphic evolution, plain descriptions of the phonetic and semantic elements of compounds, unsourced speculation, interpretations contraindicated by historical evidence, and ancient phonology). My thinking was: If the community is able to settle upon a working definition of etymology with respect to Chinese characters, the content of the entries will become more congruous. In turn, a suitable treatment for the phonesthemic material vis-à-vis this harmonized data should come into focus.
With the intent of gathering data to set the stage for this kind of discussion, and to familiarize myself with the process according to which the entries have assumed their present shape, I referred to past Beer Parlour threads concerned with Chinese characters and etymologies. There I learned that others have already attempted to train the spotlight on this question of definition, without ever gaining traction.
At the same time I (re)confirmed that it is as Hyarmendacil indicated above: Threads tend to end abruptly and without resolution. One example bearing directly on the present discussion is the following contribution from this October 2011 thread, in which an IP editor wrote, Proposal: use "Etymology" for the origin/coinage of the phonetic version of the character (if a single-character entry) or the origin/coinage of the word (if a multi-character term), and use "Graphical significance" (which could be a subheader of "Etymology") for the way the character looks and has developed over time (for single-character entries). The thread stopped here sans comment on the proposal, and sans practical effect.
So, we find older threads evidencing lack of interest to resolve an issue of central concern to the resolution of the issue debated here in the present thread. Here, in contrast, we find conscientious editors contributing suggestions about placement and formatting of phonesthemic material. However, as the specifics are, by and large, mutually exclusive, we remain far from consensus on how precisely the material should be presented.
Then again, and ominously for the prospects of cooperative effort, the present thread also features an example of an editor making things up as he goes along, justifying his freshly minted policy with logic straight from the pages of Lewis Carroll (or Franz Kafka) and creating a new type of double standard for editing the dictionary.
In conclusion, I believe that consensus on what Etymology sections of Chinese character entries should contain is a sine qua non for placing the phonesthemic material on footing stable enough to ward off reversions accompanied by objections as attenuated as Very little evidence to support those claims, but Wiktionary history indicates that hopes for achieving such consensus in the absence of considerable changes in editorial dynamics are unrealistic. For that reason, and because nobody wants to spend hundreds of hours uploading data only to see it all sucked into the void, I will refrain from adding phonesthemic material to Wiktionary until the editorial climate changes.
If someone wishes to have the last word here, s/he's welcome to it.
Thanks again to all those who made positive contributions to this thread, including of course Wyang.
Setting aside the entire question of whether such evidently controversial single-source information should be included in Wiktionary, it occurs to me that what Lawrence is addressing here is the possible etymology of the Chinese terms and their descendants. As such, I don't think the ==Translingual== section is the place for this. Given my current understanding of the organization for hanzi / kanji / hanja / hantu pages, etymologies in the ==Translingual== section should be solely about the derivation of the character, and not about any particular pronunciation of said character. For instance, the phonesthemic origins of Chinese 身 have nothing to do with the character itself -- this only has any bearing on the phonology of the Chinese term and its descendants. It is irrelevant to any language that uses the (deprecated template usage)身 character to spell a completely etymologically-unrelated term, such as Japanese 身(mi), which arises from the same Old Japanese root as 実(mi) and has to do with ideas of budding or fruition, with no connection to the Japanese terms for straight (直ぐ(sugu), 真っ直ぐ(massugu)) or bent (曲がった(magatta), 曲がっている(magatte iru)).
The glyph origin is inherently intertwined with Old Chinese phonology. It is impossible to explain the former without reference to the latter, considering the majority of Chinese characters are phono-semantic characters. Conceivably the character page should not serve as lemma page for non-Sinoxenic content, such as Japanese wago or Vietnamese từ thuần Việt, and the character page should start with the character glyph + word etymology, ancient / modern Chinese pronunciations, semantic development, compounds, then followed by Sinoxenic pronunciations and links to non-Sinoxenic words written with this character. The character etymology does not belong to any modern Chinese variety. The current practice of giving Chinese a special treatment in contradistinction to other ISO macrolanguages is... absurd. Wyang (talk) 00:10, 19 July 2013 (UTC)
Re: phonesthemics and glyphs, for compound characters like 冷 or 縞 or 燃, yes, phonology and character composition are demonstrably related. However, what of purely pictographic characters like 馬 or 木 or 水? How does phonology have any bearing on how these characters were originally composed? (Phonology is perhaps a different question for 馬 (horse), where the etymologies I've read so far point towards an origin in some central-Asian steppe language, with later borrowing into Chinese, and suggestions that root sound ma is also the root of English mare.) ‑‑ Eiríkr Útlendi │ Tala við mig00:51, 19 July 2013 (UTC)
Re: lemmata for Japanese, the consensus among JA editors has been generally to use the lemma form as found in JA dictionaries, unless there is some compelling reason to do otherwise for a given specific entry. In addition, there is entirely too much room for confusion if we were to use the phonetic hiragana spelling for all 和語(wago, a word originating from Old Japanese and not borrowed from another language). This is, to a great degree, why Japanese still uses kanji and has not simply adopted kana across the boards -- the need to distinguish homophones. ‑‑ Eiríkr Útlendi │ Tala við mig00:51, 19 July 2013 (UTC)
Non-phono-semantic characters account for less than 20% of all characters. Presumably the relative positioning and formatting of all character etymologies should be kept consistent - hence OC phonology and modern pronunciations should be presented as close to the character etymology as possible in the page, which should be at the top of the page, not in the current form of glyph etymology much antepositioned to Chinese pronunciations (eg. 代, 使). For the second part - it is implementable. ja.wikt uses it and the homophones are presented clearly (ja:かく). Wyang (talk) 01:18, 19 July 2013 (UTC)
Re: single-hanzi pages, hmm, yes, much to chew on. OC phonology being kept close to the glyph makes considerable sense; I'm less convinced about changing the location of modern pronunciations.
Re: Japanese entries, implementable, yes, but also quite ugly. :-/ The MediaWiki platform just isn't suited to the kind of multiple redirection required for an optimal Japanese e-dictionary structure.
I note too that the example you give is for kanji + kana entries -- I have yet to find any terms with single-kanji spellings and no okurigana that are listed on the JA WT under anything other than the single kanji. And it's this kind of single-kanji term that is more relevant to this current thread about phonesthemics and where to locate such information, no? ‑‑ Eiríkr Útlendi │ Tala við mig18:14, 19 July 2013 (UTC)
People, Mr. Howell has (semi-)graciously conceded on this issue at the end of the massively-long previous section.
This material is totally, impossibly inappropriate for Wiktionary. It has to be removed. It's time for it to be removed.
It's a radically controversial new theory. It would have to be well received by linguistics journals and conferences, make its way into textbooks. Then it would become the "standard facts" that belong in Wiktionary.
It's a flagrant case of someone engaging in "promotion" of their own obscure work -- even if well-intentioned on Mr. Howell's part. I understand how it got to this point, but Mr. Howell should not be lobbying on his own behalf in these discussions.
And as to the substance, the reason you can't decide where this material belongs is because it belongs everywhere and nowhere.
Two totally different subjects -- possible phonesthemic patterns in the spoken language, and the etymology and form of written characters, are being hopeless mish-mashed together. The graphical shapes of Chinese characters, the strokes themselves, have never represented any analog to spoken sound. And Mr. Howell doesn't propose so. He just transfers the "suppleness of initial /n/" over to a written glyph, without any mechanism for that sound/meaning to become graphically represented by the strokes of the character. He then separately invents similarly-worded "interpretations" of the strokes like "suggesting alignment", but unlike the phonesthemics, with no methodology for this at all. He then throws in actual more-or-less authentic graphical etymologies. He strings together these phonesthemics, made-up graphic shape meanings, actual character etymologies, to produce absolutely unscientific, meaningless, cryptic "Chinese mystic" utterances.
I don't mean to be unkind with Mr. Howell around. But it's gibberish.
And then he confusingly misappropriates the term "phonosemantic" from its normal use in Chinese linguistics -- leading unsuspecting readers to think they're getting standard information about decomposing the character's radical and phonetic.
His phonesthemic theory on the spoken language is unconventional, and problematic with changing Old Chinese reconstructions, but within the realm of linguistic science. Bringing in the glyphs is astrology.
But that's just my opinion. My opinion is meaningless. Your opinion is meaningless. Mr. Howell's opinion is meaningless. All that matters is the opinion of mainstream Chinese linguistics. Is this stuff widely-accepted facts -- or a fringe theory?
If no one can produce citations from academic journals discussing Mr. Howell's theories as credible, accepted, and standard, this should be the last post of this "Location of such information" discussion.
And then we should start a new discussion, about who can hopefully do something automated to remove all the existing "phonosemantic interpretations" from Wiktionary. With apologies to Mr. Howell for the harsh realities of life, and thanks to him for wanting to do something helpful for Wiktionary.
So, what shall be done about this in the end? If it has to be removed from the main namespace entries, I'd prefer that it be relocated to an appendix instead of being hidden in page history. If we can have appendices for protologisms, Harry Potter universe etc. then I can't imagine how this wouldn't qualify. --Ivan Štambuk (talk) 23:39, 30 July 2013 (UTC)
This shows just how inefficient and biased the consensus-based system sourcing from the unfamiliar majority can potentially be. It's been nearly two months since the addition of such nonsensical content. Wyang (talk) 00:12, 31 July 2013 (UTC)
Ivan Štambuk: it's entirely normal to just delete content of various sorts that shouldn't be there. Mr. Howell has a complete dictionary of his interpretations up on his own web site -- where it belongs. Partially duplicating that in a WIktionary appendix doesn't make sense even if it were accepted reference material. But it's not accepted reference material. Any kind of linking to these interpretations from a Wiktionary entry (whether to an appendix, or to Mr. Howell's site) is presenting them as if they are authoritative within the field of accepted Chinese linguistics, and that is a dangerous misrepresentation to unwary Wiktionary readers, and has a "promotional" effect of taking the material from (rightfully) obscure to a major audience. There are thousands of works out there on Chinese linguistics of varying quality, and his is just one them. Wiktionary has no relationship to it. While well-intentioned on Mr. Howell's part, its not in practice different than spam.
Note that I'm currently trying to find someone over in the Grease Pit who can help with automatic deletion of the Howell material.
But removing content en masse through IPs and making snide remarks such as "nonsense" is not acceptable. If you think that some content shouldn't be there (e.g. because it constitutes original research), then you discuss it on an appropriate talk page or a community discussion board. Also note that wrong etymologies (e.g. obsolete ones or folk etymologies) also deserve mention if they have enough usage (though they should be clearly marked as such). What matters the most is not the truth, but the interest of readers. We include every word in every language, so why not include every etymology of every word? Our etymologies are growing larger and larger, perhaps one day we'd have separate namespace for them alone. --Ivan Štambuk (talk) 07:25, 11 August 2013 (UTC)
???? Not certain of what points you're trying to make regarding this particular material. It's blatant original research/theories being published in Wiktionary, which has no place in Wiktionary, which must simply be removed from Wiktionary. This should be obvious, but instead some WIktionary editors have been trying to militantly defend it as having a place in Wiktionary, because they personally think it's valid work. So I'm pointing out that it's not even valid original research/theories, that we wish we could find some way to use in Wiktionary. It's completely outside accepted linguistics and logic. You can pick another work for that if you don't like "nonsense".
This is all a totally distinct from well-referenced folk etymologies. Referenced, established, etymologies for every word, absolutely. But filling Wiktionary with randomly made-up etymologies by random Wiktionary editors?? I must be misunderstanding your goal. Incidentally I was hoping to do a GREP search for this specific Howell content itself, and would never think of eliminating by IP (given rotating IPs), but the material has already been removed manually.
HanEditor (talk) 08:12, 15 August 2013 (UTC)
Lawrence, your contributions in their current form do not fit for reasons of form, setting aside questions of substance; I elaborate below, and then address content.
Please revert them at your earliest convenience, replacing them with the prior Etymology section if that has been changed or removed. Per Search: “Phonosemantic interpretation” there are currently 165 pages with this section. You can revert them by going to the page, choosing “History”, finding your edits, and clicking “Revert”. You may need to make manual changes if there have been intervening edits. I’ll do so this weekend if you don’t get around to it by then.
Once all “Phonosemantic interpretation” sections have been removed, please feel free to add references to your book and website as relevant to character composition; this is fine and uncontroversial, and indeed much appreciated! Please format them as footnotes, using <ref></ref> for each footnote and a <references/> in the ===References=== section to list them. Please include the page name and work title in the reference, as in:
Before doing many, it’s best if you do one and check with an experienced editor (or just ask here) to see that the format is correct, otherwise you’ll need to change them all. If you expect to do a lot, it’s cleaner to use a template; I (or other technically inclined editors) can write one if that would help.
Please do not add any content on Old Chinese pronunciation without discussion and agreement from the community, together with references (I elaborate below); this warrants a separate, narrow discussion.
The problems of form in these edits are:
They replace or augment an existing, standard section, “Etymology”, with a new, non-standard section, “Phonosemantic interpretation”. WT:ELE is a fundamental policy.
They put phonetic information in the etymology of a symbol. This information belongs in Old Chinese; if well-established, it is also acceptable in Middle Chinese and various modern Chinese languages, notably Mandarin. See Wiktionary:Etymology for recommended practices for language sections.
The citation style is wrong (yes, this matters). Please use footnotes and a references section (as discussed above), not inline citations.
The specific phonetic information is not relevant to the character form: in a pictogram like 馬 or an ideogram like 二, the words could be pronounced “cheval” and “deux” (as in French), and it would make no difference to the character form. Contrast with &, which is derived from the word et, and the verbal origin is relevant to the form. Even for phonosemantic compounds like 的 (白勺), all we need to know for the form is that the Old Chinese word written as 的 had roughly the same pronunciation as the Old Chinese word written as 勺. Information on the specific pronunciation belongs in the Old Chinese section.
Turning to the content, the citations to Howell & Morimoto, a kanji dictionary (characters used in Japan) contain no mention of Old Chinese pronunciation. Thus these claims are uncited. Uncited, controversial claims are not appropriate for inclusion in Wiktionary. If adding such material in future, please include citations to reliable sources, and convince the community that this is a well-accepted theory.
For example, in your submission to 的, it contains the text:
Old ChineseInitial/*t-/ lends semantic value Straight. Vowel/*-o-/ lends semantic value Curvature or Curve and surround/envelop.
...and references 的. This page contains the modern Japanese pronunciations and discussion of the character derivation, but contains no information about the Old Chinese pronunciation.
Old Chinese pronunciation is probably suitable for inclusion in Wiktionary, despite being under active research – it is of general interest, and there are references available, but we need to be careful to not overstate the case. I will open a new discussion below (well, in August) to discuss the narrow question of how to incorporate Old Chinese pronunciation.
I do not believe that the above judgment is in any way controversial, and is independent of the merits of the content itself: the edits do not fit existing Wiktionary form, and add controversial, uncited content, and thus should be immediately reverted.
The question of whether Mr. Howell deleted anything when adding his "Phonosemantic Interpretations" had not occurred to me. I have seen at least one entry where there is an Etymology section that follows his "Phonosemantic Interpretations" section -- and contradicts his Old Chinese reconstruction used. I just checked, and that Etymology section predates the Howell edit, and he didn't touch it, so that's a good sign. But we should do a survey of a random selection of other entries with the "Phonosemantic Intepretations" to make sure.
If he didn't delete anything, it's actually fastest just to edit the Translingual section and delete his block. If he did delete anything, and there are then later edits, we need to do manual cut-and-paste from the pre-Howell history version to the current page. :(
Easiest (assuming no deletions by Howell) would be an automated search-and-replace on all the source Wiktionary pages. I'm asking over at the Grease Pit currently whether anyone can do this.
Nils von Barth: actually don't tell Mr. Howell it's OK to put any of his work anywhere in Wiktionary. I don't know about his Kanji dictionary, but in these "phonosemantic interpretations", separate from the phonesthemic part, he is not giving academically-credible decompositions of the graphical forms. Rather he makes up "interpretations" of the strokes, and blurs this insensibly together with scientific decompositions/glyph etymologies. As this does not paint him to be a very reliable source for anything, references to his online Kanji dictionary -- if only a web publication -- are nothing other than himself citing himself. We would need good outside references of it being a credible, academically-accepted reference source. And if only an online publication, he can just edit it at any time to add anything he arbitrarily wants to add to Wiktionary with a "citation".
HanEditor (talk) 08:27, 2 August 2013 (UTC) hanEditor
I checked that no content was deleted (rather, in a case or two an Etymology section was deleted or edited as redundant, which I restored), and generally actually reverted so the history looks right; in some cases there were intervening edits and I had to manually edit (as Liliana did).
HanEditor – you’re completely correct regarding content: Lawrence’s contributions and book are flatly and obviously wrong, and unsuitable for Wiktionary.
There were enough reasons to revert without getting into that, but on further examination and reflection, none of the content that Lawrence added to Wiktionary looks credible, and his website/ebook (Howell & Morimoto) does not look reliable at all. The phonetic content was very speculative sound symbolism and unreferenced, and the etymologies look completely unfounded, based on surface analysis of current Japanese forms.
For example, in the sample pages from the ebook, he gives 愛 (“love”) as having a sense of “dragging, hindering” (due to 夂 “foot”), and then 曖 (“dim, obscure”) as due to 日愛 = “hindered sun”.
This is completely wrong. 愛 has no sense of “dragging, hindering” (in Japanese at least), and 曖 is obviously a phono-semantic compound.
I don’t see any other evidence that this is a reliable source; agreed, the content is unsuitable.
Thank you so much Nils von Barth and Liliana. No one in the Grease Pit seems to know how to do batch automated edits, and I thought as usual I would be the responsible one stuck doing all the work, after everyone's run out claiming an unbreakable dinner appointment :)
There were two cases when it was referenced via a footnote, in which case I fixed the etymology to something sensible; in other cases it was just in the references and I removed it (improving etymologies of Chinese characters is needed but beyond scope of this cleanup).
Hello all. Having left disposition of this matter to the wisdom of the community, I went on to other more productive things and only just recently came upon this post-mortem. I see that the discussion raises two issues affecting the editing environment, which in turn impacts the content appearing on Wiktionary. So, as always with the intent of improving the dictionary, I hereby (albeit belatedly) touch upon these issues.
1) Even-handedness: With only a handful of exceptions, even those editors most vehemently opposed to inclusion of my material editors assumed good faith on my part. However, there was one instance of making up policy to suit the occasion. Also, I would note that although Nils reposted here the message he left on my talk page, he did not repost the reply I left on his page, which makes it appear as though I were being uncooperative. Beyond that, I note that he has deleted my reply from his 2013 archive. I'm aware that retaining or deleting material from one's own talk page is acceptable practice, but selective reposting and deletion undermines not only even-handedness but also the trust that Wiktionary users are able to place in the dictionary.
2A) Accuracy: "(Howell) just transfers the 'suppleness of initial /n/' over to a written glyph, without any mechanism for that sound/meaning to become 'graphically represented' by the strokes of the character." There's no need to bring in the strokes of the character: The glyph as an entirety conveys the sound/meaning. The "mechanism" is nothing more than simple convention: A decision made in ancient times to assign a particular glyph a particular sound which conveyed a particular concept/meaning.
"He then separately invents similarly-worded 'interpretations' of the strokes like 'suggesting alignment', but unlike the phonesthemics, with no methodology for this at all." Sure there's a methodology: Reasoning from the general concept of the term to the specific meaning borne by the glyph. The reasoning process includes but is not limited to a) comparisons with OC and ST cognates b) applying what can be construed from tendencies evident in the creation of derivative and replacement characters c) identifying the chain of logic at work in cases where glyphs have acquired associated and extended meanings and d) correlating the qualities of objects and the conceptual values conveyed by the terms chosen to express the objects e) factoring in what we know about ancient tools, social practices, cosmology and so on (while of course filtering out non-Han terms, onomatopoeia and borrowed meanings from the corpus of concept-based interpretations). I daresay this is the blueprint for how sinographs will be interpreted in the future.
"He then throws in actual more-or-less authentic graphical etymologies." Gracias, and thank you as well for correcting the characterization of these etymologies that Nils presents on the Talk page for 民 as "... obviously incorrect etymologies based on current forms."
2B) After quoting me on 愛, Nils states "愛 has no sense of 'dragging, hindering' (in Japanese at least), and 曖 is obviously a phono-semantic compound." Yes, 曖 is a phono-semantic compound; where have I ever stated otherwise? Meanwhile, the "dragging, hindering" refers to the ancient Han conception of the term attached to 愛; what sense it may or may not have in Japanese is irrelevant.
If the lapses in even-handedness and accuracy evidenced in the present case have subsequently been effaced from Wiktionary, fantastic, as I continue as always to cheer for the project. Best regards. Lawrence J. Howell (talk) 01:20, 23 January 2016 (UTC)
The following sentence has been out of date since before I started editing here: "Each definition may be treated as a sentence: beginning with a capital letter and ending with a full stop."
In fact I can trace it all the way back to User talk:Mglovesfun/Archives/1#formatting (2009). Definitions of non-English terms are formatted without a full stop or an initial capital letter (with the obvious exception of words that always require a capital letter like Spain) and English definitions have full stops and initial capitals. Can we finally update WT:ELE to cover this?
Furthermore, a separate but much more minor point:
It shouldn't have ended twice as {{en-verb}} doesn't show that anyone. Since en-verb only categorizes in the main namespace, we can just use it directly. The entire section Headword line could use the templates directly, but I don't think I could do it and pass it off as an uncontroversial edit, so here it is. Mglovesfun (talk) 12:48, 9 June 2013 (UTC)
I noticed that quite a few pages have some kind of substituted version of templates. I don't think that's a good idea because then things like this happen. It's better to put the actual template in there, so that they always match whatever the template really looks like. —CodeCat12:53, 9 June 2013 (UTC)
I vaguely remember a big controversy over this. I thought it was cap-and-period if the definition was an explanation, and neither if it was a simple gloss, as in most non-English terms. And there was argument over WTF “treated as” was supposed to mean. —MichaelZ. 2013-06-10 03:51 z
The South Picene language (spx) has Old Italic (Ital) as its alphabet. However, not every character used to write it is encoded by Unicode; it lacks one of the characters transliterated as ‘í’ and the word separator (looks like a vertical ellipsis), and the characters for ‘ú’, ‘t’, ‘f’, ‘o’ and the other ‘í’ are rather different. I propose we change its script to Latn until the Unicode coverage of the South Picene alphabet is adequate (compare how we treat Iberian and Egyptian.) — Ungoliant(Falai)05:03, 10 June 2013 (UTC)
Support. The Noric language also uses a variety of Ital, but I didn't even try to use it at Artebudz as Ital is LTR by default, and the Noric inscription is RTL; also, the letter shapes are different. —Angr08:24, 11 June 2013 (UTC)
No idea, but even if it is, it doesn't change the fact that (1) letter shapes are reversed (like mirror-writing) in RTL, and (2) the letter shapes in the Noric inscription are different from those provided by Old Italic fonts anyway. —Angr19:49, 11 June 2013 (UTC)
If we had some kind of "Wiktionary font" (I think that was discussed previously) we wouldn't have to deal with this kind of problem, as we could just devise our own encodings in the PUA. But oh well. -- Liliana•19:31, 15 June 2013 (UTC)
bad-iw filter
Why not just block all bad interwiki edits (there aren't all that many) but with a note explaining why the edit has been blocked. It will stop good faith bad edits, vandalism and experienced editors who make a typo (I've done it) will be able to correct their work with minimal fuss. Mglovesfun (talk) 16:20, 11 June 2013 (UTC)
The "bad-iw" filter already excludes those, since it only looks at mainspace entries. As far as I know, it pretty much exactly matches the rules that have been used by bots like Interwicket and Rukhabot to correct iw entries. There are a few cases such as straight-versus-curly apostrophes and variations in rules for representing Hebrew lemma entries that make for a few mismatches between WTs that might cause problems. I'm not sure how Rukhabot and the bad-iw filter deal with those. Chuck Entz (talk) 13:36, 12 June 2013 (UTC)
If Unsupported titles were already excluded, I never would have found the example above, since I found it just by looking in Recent changes for edits tagged by the bad-iw filter. —Angr14:14, 12 June 2013 (UTC)
The filter blocks all edits that add (or, in some cases where diff messes up, retain) bad interwikis. I would agree with blocking an edit that only adds a bad interwiki (and I doubt it'd be too hard to write one), but I can't agree with blocking an edit like diff, which adds a lot of info besides the bad interwiki.—msh210℠ (talk) 07:23, 16 June 2013 (UTC)
Problems w/recent changes?
Maybe it's just my internet, but I haven't been able to get to recent changes all day yesterday or today. At all, and it's the only page that's not loading (I'm on Chrome on Mac OSX 10.6.8). Anyone else having problems with it? Thanks.
--Neskaya … gawonisgv?17:14, 12 June 2013 (UTC)
There is a WMF project called . It may be a pipedream, but it is at least one step closer to reality than the complaints about our wishlist that sometimes surface here. I would like to collect any thoughts that folks here have about how to make watchlists more useful for us. I have already mentioned there the great utility of limiting watching to sections (in particular language L2s). We also already have the problem with editing the watchlist and even using large watchlists on record.
I would like to be able to automatically watch all entries in a given language, and perhaps sort them under separate tabs within the list so I can quickly browse it. —CodeCat18:42, 13 June 2013 (UTC)
Opening the Watchlist on mobile devices (excluding iPad's) leaves much to be desired. Crashes in various browsers on simple operations, such as resizing. The Watchlist specific to mobile phones is useless, even if it doesn't crash.
I hadn't thought about mobile and I don't think anyone else had mentioned that yet.
Someone had suggested category-specific watchlists, which could include the language categories. That seems second-best to something "section"-specific, ie, only changes in L2 sections for one's selected languages. Our page architecture of having multiple languages on the same page does make things harder and out of sync with what WP typically needs.
@CodeCat: I don't understand "sort them under separate tabs within the list so I can quickly browse it". If you are watching an entire language, what sorting would you want? Where are the tabs coming from? DCDuringTALK04:27, 14 June 2013 (UTC)
I think she means she wants to have one tab open with her Catalan watchlist, another with her Dutch watchlist, another with her Swedish watchlist, and so forth. —Angr09:41, 14 June 2013 (UTC)
I see, I think. That would mean one would be "allowed" to have multiple watchlists, say, one for each language, category, or . The tabs are provided by one's browser? DCDuringTALK12:22, 14 June 2013 (UTC)
I think I see this more as a customizable filter on one's Watchlist, with the additional option to bulk add items that match a category criterion (rather than having to visit each page and click the little star to make it blue). --EncycloPetey (talk) 01:35, 25 June 2013 (UTC)
There are two requests for watchlist grouping, which would enable grouping by a category, and consequently by a language. Though it remains unclear whether these encompass subcategories as well (since Wiktionary groups by PoS). Grouping by changes in a particular language seems to be impossible without support for L2-based (as opposed to page-based) categorization, which will probably never gonna happen.
As a workaround we could ensure that every L2 lemma has language-specific page linked to and then monitor all the pages that link to it. It could be a dummy page (which would be preferably linked invisibly - though I'm not sure that's technically feasible) or e.g. a WT:A<language code> policy page that should be IMHO regardless always displayed in the headword line, in superscript or something. --Ivan Štambuk (talk) 04:53, 25 June 2013 (UTC)
Manual transliteration and transliteration from modules
After I've added manual transliteration to the translations (to a few selected languages where it's possible), some editors started removing previously added manual transliteration. I'm against this practice. The transliteration may be out of date but it can easily be updated from preview (User:Conrad.Irwin/editor.js). Note that auto-transliteration is only added when it's missing. If people wish to add the new transliteration, then perhaps a bot could do this - as a once-off job - overwrite existing transliteration and add where it's missing. Perhaps one of User:Conrad.Irwin/editor.js or User:Kephir/gadgets/xte could do that?
What's the general opinion about this? I also think there should be the transliteration written in entries and translations. How it gets there - manually or via a bot is another thing. Can someone create a bot to update/insert transliterations or modify the scripts, so that auto-translit is written to translations if it's not supplied manually (this condition is important)? --Anatoli(обсудить/вклад)06:57, 14 June 2013 (UTC)
I think that auto-transliteration should always override manual transliteration. Manual transliteration will not coincide with auto-transliteration only if an editor made an error in transliterating. By forcing auto-transliteration we can neutralize such errors. Consider this: historically different editors have used different transliteration schemes for Armenian on Wiktionary. By adding auto-transliteration to {{hy-noun}} and the rest I made sure Armenian is transliterated consistently. We should do the same to {{t}}, {{l}}, {{term}} and others.
Anticipating your objection, that in Russian we show stress in the transliteration and so it does not coincide with auto-translit, I say we should show the stress on the Russian word (like this, {{ru-noun|head=соба́ка}}) and let auto-translit pick up the stress from there.
Using a bot to upload transliterations is not a good idea, IMO. The bot would need to rerun every time we decide to modify a transliteration scheme. On the other hand, with auto-transliteration you need only change Module:Armn-translit once. --Vahag (talk) 09:42, 14 June 2013 (UTC)
Transliteration is supposed to convey orthography. Maybe we should consider dropping foreign-style stress marks from transliterations in the few languages where we use them, and only indicate stress in the pronunciation, where it properly belongs. —MichaelZ. 2013-06-14 14:59 z
Perhaps, but it might come in handy in cases where there are homographs with differing stress, so it's obvious which one is meant. Chuck Entz (talk) 15:12, 14 June 2013 (UTC)
Right, as in горілки(horilky). Are there other ways to handle such entries? The important difference would be emphasized if stress were only indicated in such entries. —MichaelZ. 2013-06-14 15:39 z
Of course pronunciation can be indicated in language-specific form. —MichaelZ. 2013-06-14 15:08 z
Right. So pronunciation and stress could be indicated in the “Pronunciation” section, as in его(jevo), for example. (Russian is a special case because pronunciation is also entered where transliterations normally appear, and no transliteration appears in Russian entries.) —MichaelZ. 2013-06-14 15:39 z
Our transliteration system for Burmese is pronunciation-based, not orthography-based. In probably 75% of the cases the pronunciation-based transliteration can be correctly mechanically predicted from the orthography, but in the remaining 25% of the cases it can't and will need to be done manually. The alternative would be to switch over to an orthography-based romanization of Burmese, which I would actually be in favor of but which met some opposition a few years back. —Angr16:03, 14 June 2013 (UTC)
I hope this is not intended to apply to uses of {{term}} in etymology sections. In etymology sections for terms mostly transmitted over time in writing I think a pronunciation-based transliteration system can be quite misleadinging. For example, the writers who took Greek terms into Latin followed a practices that must have fit their modified pronunciations and created precedents that are followed to this day, ie υ (upsilon) -> "y", not "u". I don't know for how many situations this objection is relevant beyond what I've drawn from. DCDuringTALK17:00, 14 June 2013 (UTC)
Actually, I think we're supposed follow the standards for transliteration of entries for transliterations in etymologies. This isn't really followed all that much, because a lot of editors just use the transliteration in the source they got the etymology from, and have no clue what the Wiktionary practice is. Chuck Entz (talk) 17:13, 14 June 2013 (UTC)
What I wish we did in etymology sections is the same thing most English-language dictionaries do, namely present all foreign words in transliteration. We could still link to the original-script page, of course: if we say that raj comes "from Sanskrit {{term|राज्य|rājyá|lang=sa|sc=Latn}}" rather than "from Sanskrit {{term|राज्य|tr=rājyá|lang=sa}}", for example, it displays as "from Sanskrit rājyá", saving space and not confronting readers with possibly unfamiliar Devanagari, while still linking to the Sanskrit entry. I've tried that on one or two pages, but it always gets reverted. —Angr17:53, 14 June 2013 (UTC)
I wonder what percent of our actual and likely future users prefer and find more useful English Etymology sections the way we do them to an alternative presentation having no non-English script, just transliterations (with no cognates visible by default). Are we just doing this all just for a small population of scholars and for machines that will render it all more useful for humans? DCDuringTALK18:03, 14 June 2013 (UTC)
I think it's a good idea. A better form is mentioning the transliteration first and putting the term in its native script(s) after it, in parentheses, e.g. "rājyá (राज्य(rājya))", as we do in Wikipedia. --Z18:15, 14 June 2013 (UTC)
What group of users do you think prefer it that way, rather than the current way or a presentation with no non-Latin script? DCDuringTALK18:26, 14 June 2013 (UTC)
English Wiktionary is for English-speaking readers, most of whom can't read non-Latin scripts. A small group who are familiar with that non-Latin script prefer to see the term in that way and others prefer Angr's suggestion I think. But I think saving space is not big advantage to completely remove it. I think it would be even preferred by readers whose native script is not Latin; it's kinda hard for the reader to switch to a non-Latin script while reading an English text. --Z18:47, 14 June 2013 (UTC)
It’s bad enough already when a reader follows a link राज्य (rājyá) to a heading “Sanskrit,” and has to read down past Etymology and Adjective to find the precise text “राज्य (rājyá).” In many entries they may have to scroll just to see the headword. (Maybe our headwords should be a head of a language entry instead of just of the page.)
Presenting transliterations only would force the reader to additionally extrapolate that what they clicked on is a derivative representation of राज्य. And I don’t see the point of linking rājyá (राज्य) to represent राज्य (rājyá) – these should be consistent, and I think the current use of brackets to indicate that the transliteration is a representation derived from the original, in both entry and link, is clearest. —MichaelZ. 2013-06-15 16:55 z
Synonym internationalization
How do you translate a page, if each synonym should lead to a different word in the other language.
Say Foo has two meanings: (1) food, the other (2) disgusting.
But in OtherLang the word food is ol:Phould, while the disgusting is Fouyah.
Are you talking about automated translation of entire pages? If so, you don't, it doesn't work yet. To get anything but a very poor translation, you need a human being in there somewhere. Tell me if I've missed the point. Mglovesfun (talk) 08:42, 14 June 2013 (UTC)
To see how we handle the translation of polysemous words (words with multiple meanings), look at get#Translations as an example. Each meaning has its own separate translation box. —Angr09:59, 14 June 2013 (UTC)
OK thanks! That makes the next issue that I brought up, much worse. It means that the whole Hebrew wiktionary is now connected to a "false" location in the English wiktionary, and that a large bulk of work will need to be moved. See my remark on the next topic. Thanks again! Pashute (talk) 11:13, 11 July 2013 (UTC)
Hebrew and Aramaic terms here inside English wiktionary
It seems there has been much work done on Hebrew and Aramaic here in the English Wiktionary. May I ask why? What is the rational? (Perhaps exactly the above issue, but if so it is a very bad solution, and differs from all the other many languages referred to from here) ... Pashute (talk) 07:10, 14 June 2013 (UTC)
Second thought - perhaps it was meant for Talmudic and Kabalic or biblical transaltions - and if so: Why not open a separate wiktionary exactly for that, and move the terms to there? There are many benefits: There could be words specific to the time that are not used anymore, there could be a special entry for words that have changed meaning from the modern language, or from the other versions of the language (say between Biblical and Talmudic hebrew - there are many examples of that...)
So actually this is another topic: Ancient languages...
Back to my question here: Why are there Hebrew and Aramaic terms here in the English Wiktionary, what is the rational of those who worked on it extensively, and is it possible change this without loosing their obviously hard work. Pashute (talk) 07:17, 14 June 2013 (UTC)
The rationale is that these languages exist and contributors have decided to give their time and effort to make these entries, which are perfectly valid and so haven't been deleted. Mglovesfun (talk) 08:41, 14 June 2013 (UTC)
Pashute, the point of English Wiktionary is not only to list English words, but to list all words in all languages. Words in languages other than English are provided with English translations. We don't just have Hebrew and Aramaic, we have Swedish, French, German, Arabic, Hausa, Swahili, Zulu, Persian, Sanskrit, Burmese, Indonesian, Chinese, Japanese, Russian, Navajo, Quechua, and thousands of other languages. There is already a Hebrew Wiktionary, and its point is also to list all words in all languages—but with Hebrew glosses for words in languages other than Hebrew. —Angr09:48, 14 June 2013 (UTC)
Good heavens, you're right. Is that their policy, or is it just that no one's gotten around to adding words in other languages yet? I thought "all words in all languages" was the goal of each Wiktionary and would be dismayed to learn that certain Wiktionaries had decided not to accept that. —Angr14:19, 16 June 2013 (UTC)
Yes, en.Wiktionary has been visited by a few exiles from that Wiktionary who lamented that policy and the (in their opinion) unresponsive and unreasonable admins who used their power to ban anyone who opposed it. (e.g. in 2007) - -sche(discuss)16:47, 16 June 2013 (UTC)
I wonder if we could submit this to the Wikimedia foundation? We could do it on the grounds that they are discriminating against Hebrew speakers, since they do not have access to foreign translations in the way that speakers of other languages do. I doubt that the foundation wants to sponsor that. —CodeCat17:03, 16 June 2013 (UTC)
Yes, it would be preferable for active users of he.Wikt to open any RFC; they would also know better if there has been any recent admin/crat action (the discussion I linked to above being from 2007). Of course, if "Hebrew only" has been he.Wikt's policy long enough that opponents of it no longer edit there, that could complicate matters. - -sche(discuss)17:57, 16 June 2013 (UTC)
I don't think it's necessary to find editors from he.wiktionary. After all, this concerns a policy that goes (IMO) against the neutral and open spirit of Wikimedia projects, and I think we, being also editors of Wikimedia projects, are entitled to have a say about it even if we do not edit there. Consider for comparison if Wikipedia adopted a policy stating that articles about things in the English-speaking sphere were inherently more notable than other things. I'm sure we'd have something to say about that even if we didn't edit there. —CodeCat18:05, 16 June 2013 (UTC)
You wish to argue that hewikt — whose editors are Hebrew speakers — is discriminating in its editor-written policies against Hebrew speakers. Seriously?—msh210℠ (talk) 04:33, 17 June 2013 (UTC)
Having a monolingual dictionary is abominable? It's a different goal than having a pan-lingual dictionary, is all.—msh210℠ (talk) 02:31, 17 June 2013 (UTC)
Not a goal that a Wiktionary should force onto its contributors, IMO. You can still have a Hebrew dictionary and allow foreign language entries. Look at how good our coverage of English is. — Ungoliant(Falai)02:42, 17 June 2013 (UTC)
And you can have a pan-lingual dictionary and allow a numerical (Roget-style) thesaurus. But we've decided not to, and that's something we force on our contributors. They've decided to limit the focus of their project, much as we have. I don't see the problem with it, at all.—msh210℠ (talk) 04:14, 17 June 2013 (UTC)
Wiktionary projects are supposed to contain all words in all languages, and I don't think decision about changing this goal can be made by user community. --Z05:56, 17 June 2013 (UTC)
That text was added by Dominic, an enwikt (and enWP) denizen. AFAICT he acted alone (and from an enwikt perspective) in making that edit — though of course we can ask him. I have no reason to believe that it reflects the Foundation's official view.—msh210℠ (talk) 06:48, 17 June 2013 (UTC)
It is still obviously against the nature of WMF projects. We may not force the contributors to focus their contributions on what we prefer; everyone must be free in contributing as far as possible. --Z07:18, 17 June 2013 (UTC)
The purpose of WT:CFI and whatever WT:... else is to indicate what is an improvement and what is not. Adding a non-Hebrew entry to he.wikt is nothing but improvement of this project of WMF, and people should be free to improve WMF wikis, that's all we tried to tell you. Anyway, lets stop this discussion, it doesn't belong here in en.wikt and is none of our business, he.wikt editors should decide about it. --Z16:57, 17 June 2013 (UTC)
Arguably, including a numerical thesaurus is an improvement of enwikt, and including information about every startup music band is an improvement of enWP. Anyway, that's all I meant also: that hewikt editors should decide on this. That is, I didn't mean that I agree that hewikt should not have foreign entries: only that the outrage against it and comments denouncing it, above, are uncalled for. Glad we quasi-agree. :-) —msh210℠ (talk) 17:04, 17 June 2013 (UTC)
I’m not familiar with Roget’s Thesaurus, but it does sound like it’s similar to our Wikisaurus project. In any case, I’m not sure submitting this to the WMF is a very good idea. It sets a bad precedent, and soon enough other Wiktionaries will start demanding that we change our practices too (and you know how some people feel about our logo and SOP-deletion.) — Ungoliant(Falai)09:32, 17 June 2013 (UTC)
Sorry for joining in late. I'm user כחלון (sounds like "Kakhlon"/"Kahlon") from the Hebrew wiktionary.
I want to point out the he.wikt doesn't have a policy against non-Hebrew entries. As User:Angr suggested above, we just haven't gotten around to adding words in other languages. The Hebrew wiktionary is very small, and has about 5 ~constant editors, of which 1 is an administrator (not me, but i'm speaking on his behalf). We still haven't added many basic verbs and nouns, so the foreign languanges words are quite ahead in our plan...
Of course, if any of you wants to add any entry to he.wikt - be it in Swahili, Zulu, Persian... - he's welcomed. However, since they are only a few of us, and a lot of the work is monitoring contributions/abuses, please do not add too much before we get to know you (-:
If there is anything I didn't clarify - feel free to ask. I'll be checking this page in the next days. Be well, 132.76.61.2312:19, 22 June 2013 (UTC)
I assume he's referring to the sentence: "ויקימילון העברי מציג ערכים עבריים בלבד..." etc.
It says: "the Hebrew wiktionary displays only entries in hebrews. i.e. in the "translation" section of every entry, one should link the translation to the foreign language wiktionary. e.g.: in the entry זאב (="wolf") the translation "wolf" should be linked to the entry wolf in the English wiktionary".
Now, all that is correct. The translations at he.wikt do link to the other wiktionaries (otherwise, all these links will be red). However, you can still add words to he.wikt in any other language. In particular, it seems the sentence "the Hebrew wiktionary displays only entries in hebrew" was misread. I notified our administrator about that, and asked him for his opinion. 132.76.61.2212:50, 22 June 2013 (UTC)
You probably lack the necessary templates for other languages, don't you? If that's the case, are there rules what foreign language entries should look like? We had some ridiculous, badly formatted entries like "ghar is a Hindi word for house", which was obviously deleted. Maybe that's why the rumour about the Hebrew Wiktionary? People didn't know how to create correct entries? --Anatoli(обсудить/вклад)13:12, 22 June 2013 (UTC)
I'm not sure what do "the necessary templates for other languages" mean. I'm pretty weak in all the technical aspects of wiki...
Your guess may be correct. Every "regular" entry (=Hebrew word) should meet cetrain criteria: "grammatical analysis" template, good definition, etc. We have similar demands for non-Hebrew entries, but we never really defined them, as far as I know. We simply never got to that. We delete bad new entries every day, but I rarely see any contribution in foreign languages. 132.76.61.2313:54, 22 June 2013 (UTC)
There's also this thread from 2007 where a user from Hebrew Wiktionary (who doesn't edit there or here anymore) said "In the past, users who tried to contribute entries for German and English words were ordered to stop, and their contributions were deleted" and "Yesterday I intiated a discussion, trying to convince my fellow users to change the policy. Unsuprisingly, the idea was rejected." That user's he-wikt contributions for that day can be found at he:מיוחד:תרומות/שי. So was it true in 2007 that German and English words were deleted simply for not being Hebrew? And was there a proposal back then to accept non-Hebrew entries that was rejected? As recently as last August, the entry "dog" was deleted at Hebrew Wiktionary, though of course I can't tell if it was deleted for being badly formatted or simply for being English. (I did find the pages he:English and he:Hebrew, though, as well as 18 German words in he:קטגוריה:גרמנית, showing that at least a few non-Hebrew entries exist and haven't been deleted.) —Angr14:02, 22 June 2013 (UTC)
I'm not sure what was the policy in 2007, if there was any. I joined he.wikt only 4 years ago, of which I'm active only about 1 year total. Nevertheless, I'm considered one of the experienced users (-:
I think "dog" was deleted because it was badly written. However, that only allowed us to stall. If someone was to start adding well-written English entries to he.wiktionary - we'd have been forced to phrase criteria for foreign languages entries. That might have taken some time, though.... There are several dozens (I think) of German entries in the he.wiktionary. They are badly formatted, and give partial information. The current Status Quo is leaving them as is.
Finally, for those of you who read Hebrew, here's a link to discussion we had on this subject (from 2009). It started at our parlour, and was moved to its own archive. Most of the editors supported "turning he.wiktionary to a multi-lingual dictionary", but thought the time hadn't yet come. I guess that situation didn't change since. 132.76.61.2215:07, 22 June 2013 (UTC)
Hebrew speakers in the English Wiktionary
I've noticed that there are a lot of Hebrew speakers here in English wiktionary (relative to the small percentage of the world that speaks Hebrew). Is it just a coincidence, do you think? JulieKahan (talk) 18:08, 25 June 2013 (UTC)
I think using the percentage of Hebrew speakers in the world might give a wrong idea. Israel has a high score in "interenet penetration", and it's even higher among its Hebrew speakers. Same goes for other big Hebrew speaking populations (e.g. in the US). 79.182.160.19107:23, 28 June 2013 (UTC) (same guy as 132.76.61.23)
English is the second language in Israel. It is studied at school, mandatory for finishing high-school and used in all scientific research and technological development. Understanding Arabic, sadly, is much less pervasive. You can expect many of the hi-tech and scientific community in Israel to be well versed in English, while needing translation, bringing them to the Wiktionary. (That is how I came there, and the following it, came here - I'm US born, living in Israel for the last 43 years, came at 6.)
There is also a large Jewish community of recent migrants to Israel, especially in the technical field, many of them with jobs as technical writers (working for Israeli companies lacking English linguistic expertise). These people studied and speak Hebrew at home with the next generation. Thus, in time, it was natural and as expected to see many Hebrew speakers, on the English Wiktionary. A similar phenomenon happened with the French Wiktionary and the recent influx in the migration of French Jews, except that in this case, a larger percentage has actually used Hebrew in France as a second language, following their traditions in North African countries, where most of their ancestors lived in the past 400-1000 years. It would be interesting to understand where the difference originated and why. Perhaps because the Arabic language is closer to Hebrew.
Also, Hebrew is taught in all Jewish schools in the US and is the ritual language used in all national or religious ceremonies. So you should expect many native English speakers who know Hebrew.
And finally, there are many Jews in the US who do NOT speak Hebrew but do speak Yiddish or Aramaic, both of which are very close to Hebrew and have much in common.
I hope your not hinting that this is part of "the Jewish scheme" to rule the world. At least to the best of my knowledge I'm not part of it. I do believe though and am proud of the fact, that the Jewish legacy is to try and make the world a better place for everyone, and working with the Wiktionary community is part of that. Pashute (talk) 15:14, 11 July 2013 (UTC)
Local Hebrew translations and interlinks
Back to the main topic: (Thanks angr for you clarification in previous question of how its done for multiple meanings)
OK, I now understand how this works and what the idea is. I'm discussing this on the Hebrew Wiki, and believe we'll resolve to open all languages there, with proper assistance from that local community.
In the meantime, two other problems are now apparent:
a. The whole interlink system is skewed - and taking from the he-wiktionary Hebrew words to the en-wiktionary HEBREW WORD. As long as the he-wik is missing the English terms, and until it does get them, could we have the Hebrew link to the English term in the en-wik? I'm putting this to vote.
b. The Hebrew subsection in the en-wik is all mixed up with Aramaic. Just because a language is written with the same letters, doesn't mean it has the same meaning. And the example happens to be: en:ספק which apparently and surprisingly (and wrongly defined even in that language) holds the Aramaic term. Pashute (talk) 15:14, 11 July 2013 (UTC)
Vote for translation Wiktionary links correction
The way things work now, terms in the original language should point to the term in the local translations. (t|aa|AaTerm -> aa:ZzTerm) Example: fr|Get in th English Wiktionary, points to the English Wiktionary French (or Catalan) translation en:obtenir
The translations themselves should point to the term in the translated languages' Wiktionary (t|aa|ZzTerm -> zz:ZzTerm). For example en:obtenir correctly points, as its French translation, to fr:obtenir
In any case, it is obviously a mistake to link a term in the original language and on the original Wiktionary to the translated term in the remote language (wrong: t|aa|ZzTerm -> zz:AaTerm). For example it is obviously wrong to point from the English Get to the French as its translation.
But what happens in wiktionaries with a small community that currently have not had enough editors to totally evolve the project, or for any other reason the local translation is missing?
As a temporary solution I propose that as long as there is no local translation, a term will be temporarily linked to the remote translation in the remote language. (Temporarily: aa:AaTerm -> zz:ZzTerm, until aa:AaTerm is entered). Example: English Wiktionary Get not having a term for Danish få fat i would point to the Danish Wiktionary (not the best example because there's no 'få fat i' in the Danish Wiktionary... but I hope it clarifies what I'm talking about)
I don't understand, please clarify: interwiki links are the links written in the left column of every page, with a code like ], to link with the page with the same spelling in other Wiktionary chapters. You seem to be speaking about translation links between Wiktionaries, which is different (please write ] to create a link and not an interwiki link). Most Wiktionary chapters (en.wikt, fr.wikt...) make translation links like this : obtenir (two links, one in the local Wiktionary, the other to the corresponding Wiktionary chapter when it exists). Also, clearly define what wiki you are talking about: en.wiktionary or he.wiktionary? Dakdada (talk) 16:05, 11 July 2013 (UTC)
OK I've corrected the terminology above, took off the vote and leaving it as a discussion. Please re-read. This is correct for ANY wiktionary where a local translation is missing in some language. Even the English wiktionary doesn't have every word translated to Danish (was: Swahili, but not sure of that, after having a look).
A side remark, I've had people remarking about the date in my signature. I have no idea what is wrong. In my browser, the four tildas open to my (Pashute) signature with 11 July, 2013. Pashute (talk) 17:40, 11 July 2013 (UTC)
In the main namespace, interwiki links go between one page titled foo and another page titled foo, except where there's a technical reason the pagetitle can't match the headword. Likewise, Template:t links {{t|fr|foo}} to fr.Wikt's entry on foo, with the link shaded red or blue depending on whether fr.Wikt has an entry for foo. I don't think it makes any sense to change either of those things, which is what I think you're unclearly proposing. - -sche(discuss)18:10, 11 July 2013 (UTC)
Vote for correcting and separating Hebrew portal from other same alphabet languages
Yiddish already has a Yi header. Aramaic does not. In any case, all languages using the same alphabet should be entered separately. So does ancient (biblical) Hebrew.
Yes. That is what I am asking about. Removed the vote. Leaving it as a discussion. Please note my remark about the date in the previous section, if there is any problem with my signature. Pashute (talk) 17:42, 11 July 2013 (UTC)
Is it possible to open a wiktionary for an ancient language, that is now being studied extensively? Such as Medieval Latin, Talmudic Aramaic, Zohar Aramaic etc. ? Pashute (talk) 07:17, 14 June 2013 (UTC)
The short answer is yes, but this isn't really the place to ask. There's a Wikimedia incubator for wikis which are not ready to go 'live' yet. Mglovesfun (talk) 08:38, 14 June 2013 (UTC)
The short answer is no; although there are a few Wikimedia projects in ancient languages, the only projects that can still be created for ancient languages are Wikisource and Wikiquotes (since they don't provide original content). New projects that provide original content, such as Wikipedia and Wiktionary, can be created only in living languages with native speakers. But words in those languages can certainly be added to English Wiktionary, and indeed we already have many words in Category:Latin language and Category:Aramaic language (as you noticed in your post above this one). —Angr09:57, 14 June 2013 (UTC)
OK, thanks for clearing this up. I now understand the multilingual basis in the Wiktionary, which seemed totally wrong until then. So having a term in a foreign language on a local Wiktionary in its original alphabet and spelling, makes sense as a translating dictionary, for the speaking community of the local language. Americans want to see the French word explained in English, not in French. Makes perfect sense. I'm still not clear on the subsection vs. category and portal way of doing it.
Why not have the French term under en:wiki/French rather than just having the term (usually not capitalized - does that solve the problem?) and then if the same spelling is used in several languages, having the same entry with branches for each language. Is there a policy here? If so, what are its benefits, and is it mandatory for all Wiktionaries.
I'm obviously asking so that the Hebrew Wiktionary could follow suite. BTW the subsection would solve a serious maintenance problem for smaller teams, and would make it easy on web searches for translations to a particular language. This would be much better than Google Translate, and IMHO would push the Wiktionary in bounds and leaps, ahead. Pashute (talk) 15:26, 11 July 2013 (UTC)
Statistics related to only one part of an entry
At per there are two etymologies, a preposition derived from Latin and a pronoun/adjective coined in 1979. The entry also includes a statistics section noting that it was the 760th most common word prior to 1923.
Obviously those statistics cannot be for the senses coined fifty-five plus years after 1923, so should the statistics section be moved to a L4 heading at the end of Etymology 1 rather than a L3 heading at the end of the English section? Thryduulf (talk) 07:37, 15 June 2013 (UTC)
If it's just purely based on frequency, then no. Because it's probably just searching the equivalent of the regex
per(|\.|,|;|:)
. In human terms, a space, the three letters per followed by a space, or a period, or a comma, or a semicolon, or a colon. Mglovesfun (talk) 10:58, 15 June 2013 (UTC)
I have yet to see any comprehensive statistics about the frequency of usage of meanings of spellings. That seems beyond the capability of corpus analysis at this time. It wouldn't even seem possible in principle without some kind of standardization of meaning. PoS-level and Etymology-level statistics might be possible. But, for example, COCA's PoS reporting seems not ready for prime time.
Perhaps we can find some studies of small sets of words that report such frequency information so that we have good reason to show additional statistics and decide how to show them. DCDuringTALK12:14, 15 June 2013 (UTC)
I'd guess \b\b. In any event, I agree with Mg: the frequency stats are independent of sense so should be listed ===thus===.—msh210℠ (talk) 07:39, 16 June 2013 (UTC)
Missing punctuation after "per"! Why not any punctuation after per? Quotes, brackets, exclamation points and, especially, question marks, DCDuringTALK12:56, 11 July 2013 (UTC)
All uses of context labels have been converted so that they explicitly call {{context}} now. That has allowed me to significantly rework the template and, maybe the most important, to get rid of the recursion. So all the numbered context templates should now be orphaned (it will take the software a few days to catch up, I expect). The new version of the template now uses a few new helper templates, {{context/show}}, {{context helper}} and {{context test}}. {{context/show}} is called for every numbered parameter that is passed to {{context}}, and is responsible for showing the label and transcluding the label template when it exists. {{context helper}} is called by the labels themselves. Because the recursion is gone, the label templates no longer need to be passed all the remaining labels. However, they still need to know the next label, because that determines whether or not to show a comma after the label. Some labels explicitly omit the comma as well. So, labels are now passed only the next label that follows them, but they do not display it; they only use it to determine what separator to show.
The labels originally called {{context {{{sub|}}}| where the parameter "sub" was supplied by {{context}} or its numbered varieties. I thought it would be useful to co-opt that mechanism for another purpose. That is what {{context test}} is for. When {{context/show}} needs to see whether a template is indeed a context label (because of the naming conflict that still exists), it calls the label template and passes it sub=test. This causes the label to call {{context test}} rather than {{context helper}}, and it will return the text "valid context label". {{context/show}} then checks for that text and considers the label valid if so. —CodeCat14:51, 16 June 2013 (UTC)
Thanks for all your work, and, perhaps more importantly, for your initiative. One question: It looks as though {{context labelcat}} still works fine. Is that your intent for the future? If not, what are you thinking of doing with its current uses?—msh210℠ (talk) 04:22, 17 June 2013 (UTC)
It displays the context template's label and categorizes the entry in the context template's category, but does so without parentheses or italics. It's used in usage notes ("Considered {{informal|sub=labelcat}} when construed with for" or whatever) and in some templates.—msh210℠ (talk) 16:46, 17 June 2013 (UTC)
On pages like ], it's used by {{alternative spelling of}}. On pages like extract the urine and ], it's used by context labels via syntax like {{obsolete|sub=labelcat}}, which allows people to put the labels in usage notes (and suppress their parentheses and italics) rather than on the definition lines and/or in the POS sections where the labels would in many cases be more at home. On pages like ], it's used (by context labels) in etymologies, where a dedicated, categorising etymology template vaguely like {{borrowing}} might be better. - -sche(discuss)16:42, 17 June 2013 (UTC)
IMO, we don't need it. It would make more sense for {{alternative spelling of}} to apply categories and display text on its own, without invoking {{British}} etc via {{context labelcat}}. Many uses of the template in usage notes should be deleted in favour of regular uses of context templates on sense lines. Even if a few uses are left over once that's done, I expect they'll be very few (because the template only has <150 mainspace uses even now), and to replace them, people could just write out 'jocular' by hand without encasing it in brackets and appending |sub=labelcat, and add any necessary categories manually. (The few etymologies which use it to describe parts of words as onomatopoeia could use a dedicated onomatopoeia template and/or write out any necessary categories manually.) - -sche(discuss)20:09, 17 June 2013 (UTC)
I agree with -sche (just above) that the uses in etymology sections can be converted to 'manual', though template use is certainly more editor-friendly and it'd be a shame to see the template go. But {{alternative spelling of}} and {{eye dialect of}}'s use of it is one that allows those templates to include any regional context tag, including such as are created in the future, and IMO that's an important feature of those templates which we should definitely not lose. So either {{context labelcat}} or an equivalent (i.e., some other template that does what it does, reading any (at least regional) context tag, displaying its label, and categorizing) is necessary.—msh210℠ (talk) 05:49, 18 June 2013 (UTC)
Question: is there any reason one might want{{context labelcat}} to not work? I mean: It sounds from the above that {{context}} is now stable. In that case, {{context labelcat}} should be fine. Is that correct? If not, what further changes might be desired to {{context}}?—msh210℠ (talk) 05:49, 18 June 2013 (UTC)
I did some checking and it turns out that {{rare|sub=labelcat}} produces the same as {{rare}} itself. So we don't really need the extra template. —CodeCat12:26, 18 June 2013 (UTC)
Perfect. Thanks for checking; it looks like you're right. If that's to be true for the foreseeable future — is it? — then we can simply redirect {{context labelcat}} to {{context helper}} and no further work is necessary for this.—msh210℠ (talk) 17:10, 18 June 2013 (UTC)
In the past, one user proposed to delete the small number of "fairly used" copyrighted files en.Wiktionary hosts locally, citing the fact that en.Wikt did not have an EDP (exemption doctrine policy, allowing copyrighted images to be hosted locally and fairly used) of the nature required by the WMF. In response, I drafted Wiktionary:Non-free content criteria, based on Wikipedia's EDP, but heavily adapted to Wiktionary. The deletion discussions were closed with the files kept... and discussion of our EDP petered out. We are still listed on meta:Non-free content as having a "draft proposal only consensus has not been reached". So: do you support the non-free content policy I drafted? And/or would you propose a slightly or significantly different policy? Or do you think en.Wikt should not host non-free files locally under any circumstances? Let's see if we can get consensus for a EDP (whether it's my draft or not), or if consensus is that we shouldn't host non-free files. - -sche21:17, 16 June 2013 (UTC)
Support. Points 2 and 3 are important (no free equivalent; minimal usage). We ought to be able to do our job as a dictionary with little or no use of non-free content, and that will remove a source of possible trouble. Equinox◑00:23, 17 June 2013 (UTC)
Support. The fact that we may only very rarely have a reason to use such material is no reason not to have a policy for its use where such use is legal. bd2412T14:45, 18 June 2013 (UTC)
Support. It is very rare that Wikionary needs to use non-free content but "very rare" != "never" so we need a policy to cover those situations. Thryduulf (talk) 12:27, 20 June 2013 (UTC)
Support. The, "fair use" doctrine is English (as in UK) law. The US & other English speaking countries have similar laws, whereby for educational purposes pictures can be used. Since Wikipedia is the internet's encyclopedia, and since, I believe, over 90% of contributors are from the UK and US, this is an important matter. A code should be created, and allowed to be used liberally by wikipedia users, to enhance the product. Wikipedia's standing policy is too conservative crummy. You address every language and you invite all the world's religions in, yet you fail to address this simple cure all to all ills. If the code exists, please post it here. For reference, please check out item #7, 2nd bullet point, below:
I believe wikipedia's policy is like this because copyright permission laws among the English speaking countries are not verbatim identical. However, their policy is the same. To be Accounting correct, you should address specific laws in each country, have specific tags for Canada, the US, UK, AU, NZ, IRE, etc., and maintain a file and keep it current for changing law. This is not hard. In fact, it is the cure all to make a Wiki patent expert's life easy.
Oppose In my opinion we don't need fair use images at all. From what I see, we only have one right now, so... yeah. -- Liliana•12:27, 17 June 2013 (UTC)
Abstain no strong feelings. Minimal usage is a good idea in case there's a MediaWiki ban on all such images (or files, not necessarily images) so we can remove them quickly if we need to. Mglovesfun (talk) 11:29, 18 June 2013 (UTC)
11 users support this policy (the 10 who so voted and the one, me, who proposed it). 4 oppose it. That's a comfortable degree of support (73%, or 65% if abstentions are included). I have therefore updated meta:Non-free content to indicate that we have a consensus-backed set of criteria for using non-free content. - -sche(discuss)18:52, 11 July 2013 (UTC)
An aside: This alleged "informal poll" is not substantively different from "a formal vote". To think otherwise seems ridiculous to me. What matters is evidence of consensus. This alleged poll does not provide any less evidence of consensus than a formal vote. It differs from it mainly by not being listed on WT:VOTE page, so being a little less visible. --Dan Polansky (talk) 19:19, 11 July 2013 (UTC)
Arabic dictionary (Sakhr) down but its data can be useful
It used to be the best online Arabic dictionary. It's the only comprehensive dictionary that consistently provided pronunciation (with vowel points) for most of the words. Others have so far failed to do it. I've been in contact with them in the past. When it went down, I've contacted them, they replied a year ago that they were still fixing it. So far, no progress. I hope we can hold of the data and import it into Wiktionary. I made another contact today in hope they can release the data:
Who wrote this dictionary, and is it an original work? Or did they compile it from various sources? DTLHS (talk) 04:29, 18 June 2013 (UTC)
The approach to create this dictionary was similar to Wiktionary, EDICT (ja), CEDIC (cmn). Various users added their contributions but I'm not quite sure, as the volume was quite big, so they may have some initial data from somewhere. I could find a lot of various words there in their lemma form with Arabic short vowels written, so that a person knowing the letters could read. It wasn't too smart, as it didn't separate various senses but every word's translation was split into parts of speech (Arabic word). A user with very basic knowledge of Arabic could find what they were looking for.
I'm waiting for their response but wanted to advise that some major importing work may be forthcoming and in case there are any licensing issues. Also, in case anyone found any other decent comparable resource (I doubt there is). العربية (Arabic) - WordReference Forums is as close you can get to it, it has sample sentence but only some words have marked pronunciation - the main hurdle in learning to read Arabic well is not the alphabet but missing letters, one has to know those words, grammar and patterns. --Anatoli(обсудить/вклад)04:55, 18 June 2013 (UTC)
How you mean "misused"? Do you mean used without explicit {{context}}? Without lang= tag? Wrong section? Used other than is a definition line? DCDuringTALK14:02, 18 June 2013 (UTC)
Is that a misuse? I thought that usage and grammatical labels could be applied to entire entries or individual senses. —MichaelZ. 2013-06-18 15:26 z
Yes, that was my understanding as well. I put it on the headword line if there are multiple definitions/translations and they all have the same transitivity. SemperBlotto (talk) 15:30, 18 June 2013 (UTC)
Should headword templates incorporate this for basic info, like (in)transitivity of verbs? Or should they be able to accept any usage or grammar label as a parameter? I suspect the most important consideration here is a consistent UI for editors. —MichaelZ. 2013-06-18 15:50 z
But verbs are not intransitive or transitive. Senses of verbs are. So this information belongs on the definition lines. —CodeCat16:14, 18 June 2013 (UTC)
I don’t understand the nuance of the meta-semantics, but if a label applies to all senses of a term, then isn’t it clearer for the reader if the label is applied at the headword? I believe print dictionaries do it thus. Are our usage and grammatical labels clearly in a different class from mainly-headword labels like m, uncountable, plural only, plural---, or superlativemost ---? —MichaelZ. 2013-06-18 18:03 z
Because no one has done so explicitly, I'm objecting to this "fix", whether by bot or not. (I agree with Mzajac and SB.)—msh210℠ (talk) 17:13, 18 June 2013 (UTC)
Semper's right on this one, in French and I believe some other Romance languages, there are verbs that can only be used transitively or only used intransitively. I picked the wrong fix. Mglovesfun (talk) 18:06, 18 June 2013 (UTC)
Transitive French verbs also have intransitive uses, and intransitive French verbs often have transitive uses in some cases. Lmaltier (talk) 22:10, 25 June 2013 (UTC)
But other context templates are also used (and also correctly) on the headword line. And {{context|transitive}} is perfectly correct there: there's no need at all to change it to {{qualifier|transitive}}: I don't see why you're calling this a "fix".—msh210℠ (talk) 18:29, 18 June 2013 (UTC)
No one's saying they're interchangeable. {{qualifier}} — q.v.! — qualifies a synonym or relterm or the like with a register or region or the like. {{context}} does the same for definitions. That's per the template documentation and accepted practice. Obviously (from this conversation), I'm not the only editor who's extended {{context}} to apply not only to single definitions but to blocks of definitions — by putting context labels atop definition lists, on headword lines. That certainly seems more reasonable than applying a template to headword lines that's meant for relterms.—msh210℠ (talk) 22:04, 19 June 2013 (UTC)
Okay, this is really confusing, and some of the conversation doesn’t even make sense to me. The problem starts with “context labels are being used as grammatical labels.” I don’t know what that means. As far as I know, we have these kinds of labels:
Usage labels, including subject-area labels, using {{context}}. These indicate that a term, a sense of a term, or a spelling is restricted in usage to a particular period, genre, technical subject, region, social situation, or other. Some usage labels qualify others, like chiefly.
Grammatical labels, which indicate a grammatical quality of a term or sense. They are applied by headword templates (e.g., m, f, n, pl, sing) or using {{context}} in either the headword line or a sense line. They have nothing to do with “context,” and the term “grammatical context label” appears to be nonsense. When a grammatical label is added to a headword line, it looks awkward because that usually creates two adjacent sets of round brackets.
Indicator labels, merely indicating which particular sense of a term is being referred to. They appear in the header of a translation section, or next to a linked term in a list, and consist of a concise gloss of the definition, or a copy of a sense’s usage or grammatical label. They are sometimes enclosed by {{qualifier}} or {{sense}}, but I’m not clear on the difference (apparently they both “qualify”).
Do we all see it the same way? Does anyone have a substantially different picture of all this? Did I miss anything?
If we don’t get straight what we are talking about, then we can’t understand what it is, or what we are saying to each other. I recommend we stop saying “context” at all. —MichaelZ. 2013-06-19 22:12 z
There are numerous instances in the entries for English polysemic terms of sense-specific information, such as concerning complements, which has, for some time, possibly ab ovo, used {{context}} to avoid multiple sets of parentheses and diverging styles of formatting for such information. We also have semantic scope, eg "(of animals)", indicated in the same way, for the same reasons. I am a little concerned that, at this late date, the legacy uses of {{context}} come as a surprise to anyone implementing what I'd supposed was an updating and performance improvement of the templates, not a simplifying reversion of capabilities. DCDuringTALK23:16, 19 June 2013 (UTC)
Don’t be surprised. CC is necessarily doing a lot of cleanup, and that requires dealing with every possible edge case, and ambiguous or rare situation. Admirable, considering that we’ve never been able to agree on what “context” means. So let’s keep trying. —MichaelZ. 2013-06-19 23:34 z
Well, Conrad had a system of monitoring the counts of words used within {{context}} that did not have a specific template associated with them, which was used as a basis for creating new templates. Has that fallen into disuse, like so much of our infrastructure? It would seem that such a system would provide useful data about the needs that an improved "context" system would work. DCDuringTALK23:48, 19 June 2013 (UTC)
@Michael: the difference between {{sense}} and {{qualifier}} is that sense goes before words in ===Synonyms=== and ===Antonyms=== sections and indicates which sense (of the entry one is on) the following word is a synonym/antonym of. {{qualifier}} often goes after words and tells how their usage is restricted, although in some lists (e.g. of alternative forms), it's placed at the front of the list rather than the end, to better indicate that it applies to the whole list. For example, in the synonyms section of ], there's this: * {{sense|tool for pressing clothing}} ] {{qualifier|old-fashioned}}, ] {{qualifier|old-fashioned}}. That {{sense}} is intended to be followed by things rather than to follow them is evident from the colon that comes after the closing parentheses of the text it produces. - -sche(discuss)00:30, 20 June 2013 (UTC)
Thank you. We should try to use the same text in {{sense}} and in translation section headers. E.g., in iron#Synonyms: “strong of will, inflexible,” and “made of the metal iron,” but in iron#Translations: “strong, inflexible,” “made of iron.” —MichaelZ. 2013-06-20 02:07 z
There's a brand new draft proposal for support for the Wiktionaries from Wikidata: . This one is different from previous proposals and it is quite concise so I urge everyone to take a look. --Haplology (talk) 17:49, 19 June 2013 (UTC)
A few questions:
Can this be enforced upon English Wiktionary regardless of local community's consent?
Are editors from all of Wiktionaries supposed to get involved to iron out the flaws before this feature gets activated?
Will importation of Wiktionary data to WikiData necessarily involve elimination of what is perceived as a "duplicate" from the main project, similar to what has been done with explicit interwikis on Wikipedia? Does that also imply that any kind of future editing/restructuring of content imported thusly will be taking place not on Wiktionary, but on a related WikiData page? --Ivan Štambuk (talk) 22:44, 19 June 2013 (UTC)
It seems to be a "it's-there-if-you-want-it" (sorry for all the hyphens, I know) approach: "The Wiktionaries would be able to access the data about words and meanings (and also items, actually, for what it’s worth) through Lua. It would be completely up to the communities of how they want to use Wikidata data in their Wiktionaries." --Haplology (talk) 02:45, 20 June 2013 (UTC)
As Haplology said you'll be able to use it. If you do use it is up to you.
If you want it to be great and useful for you then yes please help iron out all the issues with it.
I, for one, think it would be nifty if we used Wikidata as the handy repository for all our transwiki linking needs. It would be much easier than manually adding every new link from a new language to every article in which it belonged. bd2412T20:32, 20 June 2013 (UTC)
Transwikis are the one item I've seen proposed that might actually be useful, but they're a long way from making it practical for Wiktionary. --EncycloPetey (talk) 21:45, 24 June 2013 (UTC)
Apart from interwikis, the only think IMHO worth relocating to WikiData is inflections, pronunciations, transliterations and other seemingly static content not subject to ad-hoc butchering (as opposed to definitions and their semantic-driven dependencies - translations, *nyms etc.), and really carefully tagged. That would facilitate cross-wiki sharing, reusage by third parties (e.g. if you wanted to build a lemmatizer you'd not need to download a Wiktionary dump, extract all entries for a specific language, and re-implement template/module processing logic to build a list of inflections - you'd just need to pull these directly from WikiData by certain criterion, e.g. LanguageName="English" groupBy <headword>), reduce both "reinventing the wheel" and "not invented here" problems (by enforcing sharing and collaboration with the usage of a common storage whence the content will be extracted to the presentation layer) and (last but not the least) perhaps got rid of Wiktionary of all the useless bot-generated content for inflected forms which serve no intelligent purpose other than being soft-redirects to lemmas, by integrating per-language search, as well as search by inflections, variant forms, transliterations etc. directly into the search box.
Trying to build a semantic data model that would fit all languages on all wiktionaries is a futile effort IMHO that would introduce irreconcilable friction in vaguely defined relationships between meanings (which are themselves arbitrarily defined). --Ivan Štambuk (talk) 03:49, 25 June 2013 (UTC)
Even inflections, pronunciations, transliterations, gender, etc. are subject to discussions in many cases, they cannot be considered as really objective data, and therefore are not candidate to Wikidata, in my opinion (Wikidata is for data, and information about words is not data). And bot-generated content for inflected forms is useful to some users (I can tell you, because it already happened to me to look for a word in a dictionary without finding it, because I was using an inflected form, and it's very frustrating). Lmaltier (talk) 22:02, 25 June 2013 (UTC)
Everything up to the hash tag # is very much data. Yes, it is subject to discussion, and so is everything on the Wikipedia side of Wikidata. Take the word 食う. 1. It is Japanese. That's data. 2. It's a verb. That's data. 3. It has godan conjugation. That's data. 4. It's also written くう. That's data. Those are all uncontroversial statements that are shared on every WT that has 食う. There is plenty of uncontroversial data to add. Sure, some of it is controversial, but nobody is forcing editors to use data on WD if they don't want to. --Haplology (talk) 02:21, 26 June 2013 (UTC)
The population of a country, or the height of the Eiffel Tower, it's data. But not the list of languages (is something a dialect or a language of its own?), nor the part of speech (wiktionaries sometimes choose diffetent parts of speech), nor the conjugation (sometimes difficult to determine, or variable), nor the spelling. All of this is related to usage, and usage is not consistent. Of course, you can find many uncontroversial examples, but this is not the general case. Lmaltier (talk) 20:29, 27 June 2013 (UTC)
All that stuff the OED started collecting on index cards in 1857, that’s data. They way it gets stored, printed, displayed on a screen, marked up with ISO language codes, is data. Some of it may need agreement on standards, or fields representing a range or multiple options, and some of it may have to be written in prose on individual language wiktionaries. But every single bit on this website is data, and most of the boring uncontroversial bits can only be improved by reducing redundancy and increasing participation by the global wiktionaries. —MichaelZ. 2013-06-27 20:48 z
RE: "most of the boring uncontroversial bits can only be improved by reducing redundancy." From what I've seen of the discussion over there, I'd disagree. Right now, we can code a Spanish verb conjugation table very efficiently using a local template, then run a bot to populate the entries. Wikidata would expand each item in the conjugation table to be a data value, and all the grammatical number, tense, person, etc. would become individual separate data statements about each verb form pointing back to the data item for the lemma. How would having that sort of complex data structured on Wikidata make it any easier or simpler here? In other words, what we do locally with templates eliminates redundancy, but Wikidata will not use such a template, and all that redundancy will come spilling out all over Wikidata. I'm also not encouraged by the fact that nearly everyone involved in the Wikidata discussion about Wiktionary is a monolingual non-Wiktionarian who only speaks English. --EncycloPetey (talk) 21:11, 27 June 2013 (UTC)
Nearly everyone, I'm not sure. But everyone involved in the discussion is able to write in English well enough to accept to be involved, and to be understood, this is obvious. Lmaltier (talk) 21:32, 27 June 2013 (UTC)
Yes, wiktionaries are databases, but I don't refer to this sense of data. Yes, you can find data about words, but I understand that Wikidata is dedicated to objective data (e.g. creation date of the word, this is objective data, when known). And most information about words is not objective data. Lmaltier (talk) 21:32, 27 June 2013 (UTC)
Everything you just wrote is completely wrong. There is no difference between objective and subjective data, and if there were, databases do not care about such a distinction. I invite you to research how commercial dictionaries are built (hint: they use databases). DTLHS (talk) 21:36, 27 June 2013 (UTC)
Inflections and standardized IPA-transcribed pronunciations are relative and subjective only in some perverse postmodernist parallel universe deprived of laws of causality (and common sense, for that matter). I see the relocation to WikiData of all that dumb data that is being reinvented and cloned among various wiktionaries as an inevitability. You see, words are not just lemma, they're the whole package - phonology, inflection, grammatical constructs (currently expressed in free style, but formalizable in some BNF-reminiscent notation). For metadata (specifically - definitions, i.e. meanings, and all the accompanying inter- and cross-language associations that I've already mentioned), I agree that it's pointless to establish a cross-wiktionary standard due to frequent changes and arbitrariness of word meanings themselves. Unfortunately, the greatest strength of Wiktionary (inflections, etymologies, detailed pronunciations, obscure terms etc.) keep being overlooked and instead the focus is on meanings/translations as is with the traditional dictionaries. --Ivan Štambuk (talk) 00:48, 27 June 2013 (UTC)
Standardized IPA-transcribed pronunciations is already going further then I'm sure we can promise, and even then, what exactly pronunciations are we documenting? English words have a lot more pronunciations then we document, and I can't see every Wiktionary deciding on the same collection.
I don't regard etymologies or detailed pronunciations something I terribly trust Wiktionary on, much less its greatest strength. A good pronunciation transcription takes specialized skills, and a good correct etymology takes specialized skills and all the wisdom and intelligence the definition does.--Prosfilaes (talk) 23:04, 27 June 2013 (UTC)
Phonemic transcriptions are universal and dialect-invariant. Local community issues such as: what symbols to use for certain phonemes, or: which subphonemic features to disregard, or: which transliteration scheme te use, could be handled at the presentation layer (i.e. post-processing the data fetched from WikiData through string manipulation functions). The point is that there exist a single and central repository of all the shared data (or better said: the data that will inevitably get duplicated in more or less identical form everywhere), and that local communities only decide upon non-shared data (i.e. meanings of words and their semantic relationships, or what Lmaltier calls "metadata" - which it kind of is, depending on your frame of reference). Regional pronunciations could be tagged by standards (if they exist, e.g. General American, and many languages have "governing bodies" established by governments which proscribe orthoepic norm) or authors (e.g. notable grammars or dictionaries).
Specifically concerning etymologies - lots of Wiktionary's etymologies (particularly reconstructed forms) are based on cutting edge research, which is something that 99% of paper dictionaries don't come close to. Etymons for most words are well-known, uncontroversial and established decades or centuries ago, and adding them is a matter of looking them up. They do not require any special skill whatsoever. The only exceptions are prehistoric borrowings which are often a matter of speculation. Sequence of borrowing/inheriting from ancestor language(s) could easily be formalized in some format, enabling all wiktionaries to simply translate the metalanguage (strings such as "from", "borrowed from", "inherited from", and language names) enabling them to easily create etymologies for entries without duplicating other people's effort. --Ivan Štambuk (talk) 04:06, 29 June 2013 (UTC)
And what if somebody disagrees with this other people's effort, disagree with an etymology? This may happen because etymology, very often, is not something objective, it's often only hypothetic. In such a case, he might be unable to discuss it because the contributor having added it on Wikidata speaks only English... Such linguistic data sharing between all languages is acceptable for Commons files, because it's easy to remove the included file, and file are independent media. But I understand that the idea is to build a whole database. This is another matter. Lmaltier (talk) 21:14, 29 June 2013 (UTC)
If a particular etymology is controversial on English wiktionary, it would also be controversial on French, German, ... wiktionary. The assumption is that all those etymologies would be equal (same words - same etymologies!), and placing them in a single location (WikiData), perhaps in a special machine-readable format that would facilitate/eliminate manual translation, would be advantageous with respect to the reduction of effort wasted by different people on different wiktionaries. If the etymology needs to be changed, it is changed on a single place, and such updated etymology could be displayed everywhere. --Ivan Štambuk (talk) 22:57, 1 July 2013 (UTC)
@Ivan, I like the theory, but in actuality, the content would not be changed in just a single place -- any content that requires translation / localization will need translating / localizing in all the places that need that content. The etymologies, for instance, could be located in each respective entry on each respective Wiktionary, as now, or they could be located in each respective entry in Wikidata. But for a given etymology in XX given languages, any change made in one language must be made in all the others of those XX locations if that data is to be kept in sync. And if that change is regarded as controversial, then that controversy must be either 1) hammered out until a rough consensus can be achieved, or 2) selectively ignored by each linguistic community that doesn't agree. In which case, housing the data on each Wiktionary or in Wikidata doesn't make much difference, no? In fact, housing the data in each Wiktionary is the easier way to go about things, since it's already there. -- Eiríkr Útlendi │ Tala við mig23:33, 1 July 2013 (UTC)
Wikidata is versed into dealing with controversial content. First, there is no necessity to assume that there is one single truth -- anything can be stated several times, and contain sources for where it comes from. I think Lmaltier does not realize that there is a lot of structured data that can indeed very well be contained in the more structured environment of Wikidata. And also the plan is not to move all of the content of Wiktionary to Wikidata -- I don't think that's possible -- but merely some small parts that make sense. And every Wiktionary can autonomously decide what does make sense, and what does not.
I am a bit surprised that we are discussing whether structured data for a dictionary makes sense at all or not. I was expecting more to discuss what kind of structure need to be supported.
So, may I ask the question if the English Wiktionary community thinks that having some structured data available at all can be useful? --Denny (talk) 15:14, 11 July 2013 (UTC)
Map
On a hopefully less controversial and more pretty note, Wikidata user Denny has made a map of all the geocoordinated items currently in Wikidata, which uploads daily. The small version is here, the large version is here. The Ukraine is very well-represented, while Kansas is noticeably dark. (We're not in Kansas anymore...and apparently neither is anything else.) - -sche(discuss)21:34, 29 June 2013 (UTC)
Bad section nesting in template documentation
I just changed Help:Documenting templates and modules and Template:documentation/preloadTemplate to use level-two (==) section headings for "Usage" and (in the second case) "See also", since these sections should immediately follow the page title (a level-one header) on both the /documentation subpage and the template pages themselves. The third level would have been correct if the documentation was preceded by a level-two "Documentation" section heading, but (unless I have missed something important) that's not how it's being done. This has left a great number of template documentation subpages (and template pages into which they are transcluded) with badly nested section headings. (For example, Template:sense/documentation contains a level-three "Usage" section followed by 2 level-two sections. Madness! ) How should we fix this problem? A bot? Can someone at least generate a list of pages containing badly nested sections (only those of the form "Template:*/doc" or "Template:*/documentation", for what I'm talking about)? - dcljr (talk) 04:08, 21 June 2013 (UTC)
How should the "new" context labels work? Please discuss!
Almost all uses of {{context}} now have a language code specified. So the first stage is done and we can now start looking at further questions:
Name of the template
What do we want to call the new template? Do we want to keep the name {{context}}, or use {{label}}, or something else? We can also use one name as the main name, but another name as a shortcut for convenience. Because the language code templates are now mostly orphaned, we can use any of their names as well, so we have a lot more choices like {{lb}} or {{lbl}}. We also have a module to replace the gender templates, so we could also re-use {{c}} if we want to.
In the original proposal I made, I wanted to make the language code the first parameter of the template rather than named like it is now. So {{context|...|lang=en}} would become {{context|en|...}} (or whatever name we use for the template). To ease the transition, it may be beneficial to choose a different name for the new template, so that {{context}} still takes the same language parameter it always did, while the new template takes the new numbered parameter. On the other hand, we could also just convert {{context}} to work both ways for a while until everyone becomes accustomed to the new method. We can say "use lang= if it's present, otherwise use the first parameter". {{prefixcat}} also works that way currently. —CodeCat19:54, 21 June 2013 (UTC)
Please, not “context.” It doesn’t accurately describe usage labels. It is confusing. It has nothing to do with grammatical labels. No one has ever explained what “context” means in this context.
“Label” is fine. We are using this template for two rather different types of labels for terms and senses.
What if we had two different templates, {{label-usage}} or {{use}}, and {{label-grammar}} or {{gram}}? This might keep everything clearly ordered in the entry for readers, help separate the functions for editors, and facilitate categorization, and provide a sensible way to split up the database of labels. —MichaelZ. 2013-06-23 23:54 z
That last proposal would mean showing two separate pairs of brackets whenever an entry has both a usage and a grammar label. Is that what you want? —CodeCat00:39, 24 June 2013 (UTC)
That could be improved. We already show two sets of brackets when a label is placed after a headword template, which doesn’t look right. Grammatical labels should probably be rolled into headword templates, so they appear within those brackets. That’s on the “later” list. —MichaelZ. 2013-06-24 03:22 z
Internal structure of the Lua data module
This will need some careful consideration because it will affect how flexible things become. We will want to keep the flexibility of our current system at the very least. That means the following things:
Unrecognised/undefined labels should be shown as given. But what if someone creates a new recognised label that just happens to be already used in another entry? How do we find out which entries use a certain label?
There should be support for "modifier" labels which cause the following comma to be omitted, such as _, and, or. It may be possible to extend this system in Lua so that it allows any label to "modify" the label that follows it in any way we choose. For example, there are many entries that specify "with dative" and "with accusative" and so on. We could make a label "preposition with" that automatically categorises the entry based on the label that follows it, so that {{context|preposition with|dative|lang=de}} adds the category Category:German prepositions that take the dative or something similar.
Multiple labels should be able to be treated as aliases, so that we can use multiple names for the same underlying label. "law" and "legal" should be the same, for example. We currently use redirects for this purpose, but we will need to find a substitute in Lua because it doesn't have redirects as such. I think it would be good to have one table to contain the actual labels, and another to contain aliases. That way we can keep them clearly separate while making it easy (programming-wise) for a module to find the "canonical" name of any alias.
Labels currently allow one topical category (en:Foos), one grammatical category (English foos), one regional category (Fooish English) and one "bare" category (Foo). We could keep this more or less the same, but it might also be desirable to allow more than one category for a single label. We could also decide to remove the distinction between the types of category, and specify the categories (in the Lua data module) as something like {{{lang}}}:Foos or {{{langname}}} foos, which gives us more freedom to format the category names the way we like.
We may want to make it possible for a single label to "expand" to multiple sub-labels. For example, {{ambitransitive}} is really two labels in one. This will add some complexity, so we can also decide that it's not worth it and simply encode the few labels that need this as if they were really one label that contains a comma (like now).
Regional labels are, in principle, very open-ended. It can be rather cumbersome to create a label for every dialect of every language we come across. There might eventually be thousands of them, and it can become hard to manage. I have thought of a way to mitigate this, by allowing a special prefix to specify that a label is a regional label. Something like {{context|r:British|lang=en}}. The module will recognise this prefix and treat it specially; it will not need a label to have a category, but it will automatically be treated as a dialectal term. An alternative to this is to use the extended "modifier" labels above for this purpose, such as a "used in region" label that is then followed by another label to specify the region name. We would need to think of a nice way to say "used in region" when there are multiple regions, though.
Even if we don't do the above, we probably also want to get rid of {{British English}} and similar labels which contain the name of the language within them. That is really redundant because there is already a language code. So instead of {{context|Northern England|lang=en}} and {{context|Northern Dutch|lang=nl}}, why not just use {{context|Northern|lang=en}} and {{context|Northern|lang=nl}}? —CodeCat19:54, 21 June 2013 (UTC)
I'm not convinced it's necessary to prefix regional labels, or that 'regional labels' is a more open-ended class than 'non-regional context labels'... one equally open-ended class of labels that comes to mind is 'temporal labels' which modify {{historical}} or other templates such as {{military}}, for terms which are used e.g. "especially the Vietnam era" (which is not the same as {{defdate|1955–1975}}, the label for words which fell out of use after the Vietnam War). OTOH, I'm not necessarily opposed to it. I can see how it would be beneficial to store such large categories of labels in a different module or section of the module, just to make things more übersichtlich. - -sche(discuss)21:17, 21 June 2013 (UTC)
PS, {{context|Northern|lang=en}} ≠ {{context|Northern England|lang=en}}. {{context|Northern|lang=en}} could be "Northern United States", "Northern Canada", "Northern Australia"... and recent discussions of how to categorise 'Commonwealth' English have suggested that 'British English' may not be the same as (and/or may be worth distinguishing from) 'British'. - -sche(discuss)21:17, 21 June 2013 (UTC)
Ok, but the current way the template works, regional labels are formed by adding the name of the language after the name of the label (possibly modified). So, {{British}} creates its category as British + English. {{context|British|lang=fr}} creates Category:British French. In that respect, having "English" at the end of the label is redundant. —CodeCat21:48, 21 June 2013 (UTC)
Indeed, it's distinctly unhelpful, and I thought someone was going to update it to display as "British" instead. (We are, however, straying from the topic at hand.) - -sche(discuss)04:32, 23 June 2013 (UTC)
I thought so too, but nothing happened. I'm tempted to remove the "|UK" from label=British English|UK, but I don't know what effects it might have elsewhere, so I haven't tried it. Where are our template experts? (Apologies for straying off topic). Dbfirs07:12, 23 June 2013 (UTC)
Removing the |UK would just change the label text. There’s no extra template magic, the template displays the link ], even if you invoke it with lang=fr. Go ahead and remove it, or change it to the shorter British.
But perhaps then {{UK}} should be turned into a country-specific label, rather than redirecting to {{British}}. —MichaelZ. 2013-06-25 21:39 z
Regional labels can’t be completely separated from other usage labels. They overlap with socio-cultural, socio-ethnic, temporal, media type, and other usage. Overt examples include African American Vernacular English, British spelling (specific to written language), Helsinki slang, Multicultural London English. Many regionalisms are directly a result of other factors, like politics and administration, cultural and religious history, etc.
Most regional usage labels are naturally organized as a hierarchy, and the categorizing can be limited to a lower level. For example, even though jambuster is used in Manitoba and northwestern Ontario, the very-specific label is designed to classify it as Canadian English.
Ideally, each label or combination would carry a specific definition. For example, southern + US isn’t just the southern half of the USA – it is the South, whose particular boundaries and definition result from its history. —MichaelZ. 2013-06-24 14:46 z
Automatic topical categorisation
Something we may consider for the more distant future is automatic recognition of topical labels. Our repertoire of topics is determined by {{topic cat}}, which we probably want to Lua-cise as well at some point. There is no reason that its data module could not be shared, though. We could use that module for labels, so that any label that matches a topical category name will automatically be categorised in that category. {{context|clothing|lang=en}} would then automatically add the entry to Category:en:Clothing if "clothing" is recognised as a valid topic by {{topic cat}}. This won't happen anytime soon but it is something we can consider. —CodeCat20:00, 21 June 2013 (UTC)
That would be tricky to implement, since sometimes we don't want the matching category to match the context label, but rather have the word in a subcategory. Listing all (taxonomy) terms in a single category would accomplish nothing useful, as it would put all the generic, specific, familial, ordinal, etc. terms into a single category, and that would not be desirable. I can think of several other situations like that, such as the way that names of stars in in Cat:en:Stars rather than in Cat:en:Astronomy. There are also situations where a term will be listed as (slang, XXX), where there is a category for XXX slang, so combinations would have to be treated as well. I'm not saying that it couldn't be done, just that it wouldn't be altogether straightforward to accomplish. --EncycloPetey (talk) 01:22, 25 June 2013 (UTC)
It could work as a default, so that it applies to any unrecognised label. Recognised labels would override the default so we could still specify. That would be necessary in any case where the displayed label does not match the typed label, like for "stars". But the advantage would be that we could whittle down the number of recognised labels somewhat and let them use the default handling, which makes the collection of labels easier to manage (there are currently almost a thousand). —CodeCat01:26, 25 June 2013 (UTC)
New l
There's an upgraded version of {{l}}here, it has more features and is faster. The template is backward compatible, so nothing will break if we replace l's code with it, and I hereby suggest to do this. Any thoughts are welcomed. --Z 09:34, 22 June 2013 (UTC) (edited --Z10:40, 22 June 2013 (UTC))
I trust you mean "nothing will break if we replace the current code of {{l}} with its code" rather than the other way around. —Angr10:36, 22 June 2013 (UTC)
I hope we replace {{l}} and {{term}} with these excellent new templates soon. Will make editing for me much easier. --Vahag (talk) 11:02, 22 June 2013 (UTC)
I took a look and there was nothing to fix, we only have 6-7 entries that start with "*", none of which are linked (and probably will be linked) by Template:l, they are English. --Z16:58, 22 June 2013 (UTC)
Can this template be made to have the functionality of {{l-self}} in inflection tables, viz. a link that appears on its own page appears linkless and in bold instead of linked and in blue? —Angr21:28, 22 June 2013 (UTC)
I added that to a test module, (if you are in WP:BP, go to Wiktionary:Beer parlour/2013/June to see this test correctly) {{l-self/sandbox|la|link to current title ]}} > but the feature makes language_link's code a bit ugly (I'm not sure where is the best way to handle it, in language_link(), outside of it, or in a new function) so I didn't add it to Module:links. --Z11:22, 23 June 2013 (UTC)
To update {{term}} with {{term/t}} and in a way that it takes the language code as the first parameter we first need to check for all usages of {{term}} without the "lang" parameter, and add "|lang=" to them, by bot, so that all lang parameters would have a value. --Z13:57, 23 June 2013 (UTC)
I meant exactly "lang=", the value is an empty string (we can also add "und"). --Z14:56, 23 June 2013 (UTC)
Actually it is also possible to guess the lang to some extent (for example when term is used just after etyl), but the change still needs to be checked by humans so it can be a JS tool or something. --Z15:01, 23 June 2013 (UTC)
I'm not sure what the benefit of that would be, honestly. For the template, it doesn't matter whether the parameter is empty or just not there. And we did already try to guess the language in many cases. That category contains what is "left" after all of that. Still quite a lot to do. —CodeCat15:15, 23 June 2013 (UTC)
The benefit would be that we can make term/t (and therefore term) backward compatible -- if the parameter "lang" is not provided, then the first parameter will be treated as the language code, otherwise it would be considered as the target page. --Z15:20, 23 June 2013 (UTC)
There's no need for that, the problem is not making term/t's parameters identical to term, we can do that right now. But I want to get the rid of this horrible "lang" beside Lua-izing term. --Z19:18, 23 June 2013 (UTC)
Does it handle artificial languages ok when they belong in an appendix? Also, what happens when you specify a reconstructed language but the term is not preceded by *? This should trigger an error, ideally. —CodeCat22:32, 23 June 2013 (UTC)
For what people are expecting from {{l}} in particular, yes it works ok. I was not sure what to do in that case (consider that a valid input and link to its appendix, or showing an error) but I think it should be considered an error, otherwise inputs would be inconsistent. I added that. --Z07:53, 24 June 2013 (UTC)
Well, it would be nice if this template could replace {{lr}} and {{lx}} as well. So it should handle terms similarly. The following rules should be applied:
Regular language: link to mainspace if no *, link to appendix if there is *.
Reconstructed language: error if no *, link to appendix if there is *.
Appendix-only language: always link to appendix.
Also, it's probably better to leave out the section link when linking to an appendix page, because we normally have one language per page anyway. {{lr}} and {{lx}} already do that. —CodeCat14:56, 25 June 2013 (UTC)
Currently the code is exactly working as above (I forgot appendix-only languages, added support for that).
Regarding linking to sections, I would do that if that could make the job easier for the code, but in this code we actually have to add extra "if"s to remove section link, so... --Z15:16, 25 June 2013 (UTC)
Please do not "transliterate" Serbo-Croatian Cyrillic since Latin and Cyrillic forms always link to each other in headword lines of Serbo-Croatian entries, and are always listed in pairs in all the other cases. --Ivan Štambuk (talk) 04:13, 25 June 2013 (UTC)
I've removed that. Ideally we should have linking templates for Serbo-Croatian (and similar ones for languages that use both non-Latin and Latin scripts) that takes say "хрватски" and gives "хрватски / hrvatski" (and give "hrvatski / хрватски" when take "hrvatski"). --Z14:41, 25 June 2013 (UTC)
Why? Premature templatization is the root of all evil. The lower the number of templates, the easier it becomes to parse and reuse Wiktionary data. Language-specific templates should IMHO be outlawed whenever a more general template exists that provides the same functionality. The only exceptions would be templates that do not generate content (e.g. {xx-PoS} used in the inflection line, even though this is unfortunately beginning to change with "intelligent" Lua ports) and serve as time-savers. --Ivan Štambuk (talk) 00:16, 27 June 2013 (UTC)
Yeah I don't like the l/xx templates either, but Serbo-Croatian is an exception. It's easier to type {{l|sh|hrvatski}} rather than {{l|sh|hrvatski}} / {{l|sh|хрватски}}. --Z06:36, 27 June 2013 (UTC)
Sure, but then again the same argument could be made for {{term}} where Serbo-Croatian pairs are used (e.g. in etymologies). Such pairs of terms differing in scripts alone are at any case rare (only in English-section lists and running text, which are usually etymology-related). What would save significant space and typing is {{l}} removing SC accent marks if present, e.g. {{l|sh|hr̀vātskī}} behaving as if {{l|sh|hrvatski|hr̀vātskī}} (and same for Russian, Slovene etc.). --Ivan Štambuk (talk) 07:00, 27 June 2013 (UTC)
Would it be ok to make {{lr}} just forward the call (not redirect) to {{l/beta}}, as a trial run to see if there are no problems that we missed? I would rather not make any changes to {{l}} just yet until we have worked out any children's diseases (a Dutch idiom... not sure if English has an equivalent) that the module may have. —CodeCat16:34, 25 June 2013 (UTC)
:) OK, lets do this, although as I said before, there's really no appendix-related bug in the module for l particularly. --Z16:45, 25 June 2013 (UTC)
There are some script errors. All of them so far are because VL. is not a valid language code. The old version of {{lr}} would fall back to etyl: templates in that case, but the module does not do that. I don't think anything needs to be changed though because there is already a discussion going on whether we should treat Vulgar Latin as a language or not (and give it a proper code if we do). So these uses will all be changed sometime soon to either use "la" or some new code for Vulgar Latin. —CodeCat21:29, 25 June 2013 (UTC)
It's mostly used in templates, though, and it's used for reconstructed terms as well, so it will not be so easy to fix. —CodeCat12:50, 4 July 2013 (UTC)
But it's also harder to convert (the face parameter in particular, there's no clean way to convert it). --Z13:08, 4 July 2013 (UTC)
The face parameter shouldn't really be there, I think. I'll add a cleanup category to see if is actually used anywhere. I think the best strategy to convert it is to do what you did to {{lr}}, more or less. When the term is in a reconstructed language, an * has to be added, but otherwise it can be left as it is. —CodeCat13:10, 4 July 2013 (UTC)
I imagine that this template works pretty much the same as the new {{l}}. What differences are there, other than the class="mention"? —CodeCat18:28, 5 July 2013 (UTC)
term? or recons? There are some minor differences, we don't have genders and instead we have lit and pos parameters, also it has several more useless classes that I had to add... --Z18:44, 5 July 2013 (UTC)
I meant between {{term}} and {{l}}. Maybe the next step, after this is over, is to see how we can remove some of the differences between these two templates. One obvious difference is the way the language is specified; I'd really like to eliminate that soon (now, if possible), but it would be hard to do with so many instances of {{term}} still missing a language. I do want to ask though, does Module:links use expandTemplates to call the script templates, like Module:headword does? —CodeCat18:52, 5 July 2013 (UTC)
No, it uses a function in Module:utilities instead. Regarding eliminating "|lang=", that's possible to do, we already discussed that above... maybe you didn't get what I meant in that discussion? If we add "lang=und" (or "lang=") to those instances of term which haven't used "lang", then the language can be specified in the both ways, i.e. using "lang" and the first parameter. So, "lang" will be deprecated. Later we can move the value of "lang" to the first parameter in the entries, and then remove "lang" from the template. --Z19:03, 5 July 2013 (UTC)
I did something else with Module:labels. Instead of relying on the presence or absence of lang=, I added an extra parameter to the module invocation (not to the template) called "compat". The module uses this to determine whether to look for lang= or for the first parameter. That way you can have two different templates, with different names for parameters, that both do the same thing. We could do this for {{term}} too, but we'd need a second template then. —CodeCat19:11, 5 July 2013 (UTC)
Is there any good alternative for the title "term"? What about "ment" from mentioned? --Z19:21, 5 July 2013 (UTC)
Eureka, {{m}}! You're going to replace gender templates with {{g}} aren't you? l for links, t for translations, m for mentioned terms... looks great IMO, it's also shorter, an important advantage. --Z19:26, 5 July 2013 (UTC)
Yes, that would work, but not everyone wants to get rid of {{m}} just yet (see the discussion about it), apparently because people think {{g|m}} is too much work to type. We'd need to make sure there's agreement about that first. In the meantime, would it be possible to add such a compat= parameter, so that we can get started once we know what we're going to name it? In any case, {{term/t}} should use the new parameter names without lang= (it should not use compat), and I'll convert {{recons}} to call it once that is done. —CodeCat19:30, 5 July 2013 (UTC)
We have been creating more entries for reconstructed Vulgar Latin lately, but we still use the code "VL." for it. That is rather inconvenient and somewhat inconsistent as we are really treating it as a distinct language with its own name and language header. So I propose that we create a separate language code for this language. An obvious candidate would be "roa-pro", but other ideas are also welcome. —CodeCat13:42, 22 June 2013 (UTC)
But it was contemporaneous with Classical Latin and spoken, in some dialect, in all the places where Classical Latin was and shared most grammar and a great deal of the vocabulary. And they shared the same army. How is it a separate language? DCDuringTALK16:43, 22 June 2013 (UTC)
It's a separate language because we treat it as one. This request is a practical consideration, not a theoretical one. —CodeCat16:50, 22 June 2013 (UTC)
I was thinking maybe we should go the other way; turn all the headers into ==Latin== and use {{context|Vulgar Latin|lang=la}} in the entries. Mglovesfun (talk) 17:04, 22 June 2013 (UTC)
That's also possible, but we probably don't want to use the same inflection tables for those entries, because use of cases and verb forms was somewhat different in Vulgar Latin, as were the endings themselves. The ablative case was no longer distinct for example, except maybe as a relic formation, and genitive and dative were merging or had merged. Some Romance pronouns also descend from case forms that were created analogically within VL and never made it into writing, such as French lui. There is more at Wiktionary:AVL. —CodeCat17:29, 22 June 2013 (UTC)
I'd prefer to see it all under a single language header of ==Latin==, but do acknowledge the need for some different templates, almost as if it were a separate language from Classical Latin. The problem is that one can't completely separate the two, so that for example, some words in Classical Latin have a different inflection under Vulgar Latin. Do we then duplicate the contents of the Classical Latin under Vulgar Latin, only with different pronunciation and inflection? Wouldn't it be simpler to have a Vulgar Lation inflection table added? Further, it opens a can of worms concerning Ecclesiastical, Medieval, Renaissance, and Modern Latin, which also have differences from the Classical. Unless a proposal deals with the gamut of the language, I don't think I could see treating ==Vulgar Latin== entries as particularly feasible. --EncycloPetey (talk) 21:52, 24 June 2013 (UTC)
Vulgar Latin terms that are not attested and are reconstructed on the basis of comparative evidence shouldn't really belong in the main namespace, but as appendices as all the other reconstructed terms. I support the common treatment of attested forms as a variant of ==Latin==. --Ivan Štambuk (talk) 03:57, 25 June 2013 (UTC)
Ok so... All instances of {{lr}} and {{recons}} that use "VL." should be replaced with "la"? And then Appendix:Vulgar Latin and its subpages should be moved to Appendix:Latin? —CodeCat14:58, 25 June 2013 (UTC)
Hold on now. . . you've taken comments about language section headers and reinterpreted them as comments about etymology and categorization. If you want to extend the discussion, that's fine. . . but ask the questions directly instead of inferring broad opinion. I would keep VL. in etymologies. It's no different than saying that a word derives from Mexican Spanish. There are temporal and regional subdivisions of a language that are critically important to understanding etymology. I still consider Vulgar Latin to be Latin, just as we consider Mexican Spanish to be Spanish. The only difference is that the differences we treat in Spanish are usually synchronic, while the Latin differences are more often diachronic. --EncycloPetey (talk) 04:27, 26 June 2013 (UTC)
Giving Vulgar Latin an etyl-only code while treating attested words as ==Latin== for L2 purposes (which is kinda what we do now) does seem like the best idea. - -sche(discuss)05:37, 26 June 2013 (UTC)
Yes, that is what I meant, EncycloPetey misinterpreted it a bit. I specifically said all instances of {{lr}} and {{recons}}, not categorization or etymologies. Of course, {{recons}} often appears in etymologies and it would be changed there too, but the original code would still remain in {{etyl}}. —CodeCat18:54, 26 June 2013 (UTC)
But it's not just an etyl-only situation. Vulgar Latin also appears as a context label for senses that exist in Vulgar Latin, but did not exist in Classical Latin, just as we mark senses in English as (UK), etc. --EncycloPetey (talk) 21:05, 26 June 2013 (UTC)
Nothing, but the thread is titled "a code for Vulgar Latin", and Vulgar Latin needs a context code for senses and an accent code for pronunciation. If you meant for this thread to exclusively discuss the language code, then you didn't make that clear. --EncycloPetey (talk) 19:36, 27 June 2013 (UTC)
You know
I'm very afraid to think that the only moderator of the Korean Wiktionary is missing for quite a long time. Besides there are only two main contributors of the Korean Wiktionary and this includes me. --KoreanQuoter (talk) 18:32, 22 June 2013 (UTC)
Not at the moment but the templates need extensive updates. I don't know but the Korean Wiktionary is stuck the same since 2011. Let's not forget that the whole Korean Wikipedia community is disintergrating due to some members most likey suing each other under the Korean laws. --KoreanQuoter (talk) 18:59, 22 June 2013 (UTC)
Haha wow. Is it really that bad over there? I am really interested in the background and how it came to that. It just so happens that I do have sysop powers at Korean Wiktionary, though I'm not too fond of using them. But if there's anything urgent then I guess it's okay. -- Liliana•19:59, 22 June 2013 (UTC)
Do we want "imperfective" and "perfective" to be treated as genders?
I noticed that some translations of verbs into Russian use imperfective and perfective labels as the second parameter of {{t}}, indicating that they are treated like genders. This is not our usual practice, but is it something that we want to adopt? There is no technical restriction against it; we can add "impf" and "pf" to the list of valid gender codes in Module:gender and number and then these codes will work fine. So this is more a question of, do we want it that way? I am a bit unsure about it myself. While on one hand these labels are very useful for the Slavic languages (and maybe others as well), there is the danger that we might end up extending this into other types of verbs like frequentative, durative, stative, inchoative, causative and so on. And I'm not sure if we want to indicate just any arbitrary verb type on a translation or headword line. It could become quite messy if we did that. —CodeCat20:51, 22 June 2013 (UTC)
I noticed that this method was used and I think that's a good idea to add impf and pf directly into {{t}}. (for some reason the second one is always with a dot, as in pf). The problem is only that the template doesn't allow both impf and pf, so such verbs are marked impf / pf, which User:Kephir/gadgets/xte currently doesn't like. I'd like this to be adopted. It would definitely benefit all Slavic languages and seemingly Georgian. I don't know if any other language groups have imperfective/perfective pairs of verbs. All Slavic verbs are either imperfective, perfective or both (a smaller number), which affects their usage (sometimes complicated) and grammar (e.g. perfective verbs don't have a present tense).
I don't see any other verb labels to be overused. Transitive and intransitive are usually marked with {{qualifier}}. Japanese causative verb link to their normal lemma forms (often a noun). Slavic abstract and concrete verbs are only a small group of verbs, so they are also marked with {{qualifier}}. --Anatoli(обсудить/вклад)22:45, 24 June 2013 (UTC)
Ok, I have added the gender codes "impf" and "pf" (without the dot), so they can now be used anywhere a gender can. You can also combine them with the other codes in silly ways like "m-p-impf" or "pr-pf-d" but of course you're not supposed to do that... —CodeCat22:54, 24 June 2013 (UTC)
Thank you. Will it work for verbs, which are both impf and pf? Not sure about these combinations, as verbs don't have genders or numbers. --Anatoli(обсудить/вклад)23:21, 24 June 2013 (UTC)
You can specify multiple genders for a word. This works the same way because they are now treated as genders. —CodeCat01:04, 26 June 2013 (UTC)
Seems hackish but justifiable, though I'd prefer that an additional parameter aspect= be added to both {{t}} and {{infl}} accepting well-defined sets of arguments (as opposed to stand-alone templates that are being used now), as well as cheat codes for biaspectuality and similar. If need be, combinations via hyphens could be enabled in a manner similar to genders. --Ivan Štambuk (talk) 04:06, 25 June 2013 (UTC)
I don't really see any benefit in adding yet another parameter. We already add number and animacy to the same parameter, which are also not genders. If anything I would prefer just renaming "g=" to something else, but we'd want to keep the name short. For templates like {{t}} which use unnamed parameters, there isn't really a point. —CodeCat15:02, 25 June 2013 (UTC)
These are all grammatical labels, right? They may as well all be in the same place in the headword, whether it’s inside or outside the brackets (I have seen both, and we should pick one or the other in our headword templates). —MichaelZ. 2013-06-25 15:10 z
Well, I don't think we should put all grammatical labels there. Only the labels that apply to the word as a lexical unit, so transitive/intransitive go on the individual sense lines. And the following brackets and/or the inflection section are used for inflection information, so that shouldn't go in the same place as the gender either. So what does that leave? —CodeCat21:36, 25 June 2013 (UTC)
I hope this doesn't get bogged down in long discussions. I'm continuing to use impf/pf params in {{t}}. Yell out if it's going to be a big problem. I'm keen to have the ability to add impf/pf via User:Conrad.Irwin/editor.js, which will be a time-saver. --Anatoli(обсудить/вклад)02:30, 26 June 2013 (UTC)
I think we need to look at the underlying rationale: Why do we mark gender for noun translations in translations? Should we mark this information in translations? If no in general, then are the exceptions? I've always been of the opinion that including gender of nouns in translations is a good thing, but offhand I can't see a good rationale for it other than "That's what we've always done." So, is there any kind of information about a translation that should be included in a Translations section, and why? --EncycloPetey (talk) 04:37, 26 June 2013 (UTC)
One obvious answer is because it affects the way the word is used in context (which article it gets, etc). Of course there's a lot of information that isn't simply gender (conjugation / declension type?) that also is necessary to know but which we don't include. I don't know where the line should be drawn. DTLHS (talk) 04:49, 26 June 2013 (UTC)
So, do we want all information that affects the way a word is used in the translation? Why gender but not the part of speech? Why not the name of the inflection pattern? And how does having any of this information in the Translations section (as opposed to the entry) benefit the user? --EncycloPetey (talk) 21:12, 26 June 2013 (UTC)
But there are plenty of homographs for which there is no gender difference, particularly as some languages have no gender, or have homographs of the same gender. Do you have a proposal that would help mark those as well? If distinguishing homographs is the only reason for marking gender, then I don't think we should do it. --EncycloPetey (talk) 21:12, 26 June 2013 (UTC)
We can draw the line where it's not common. Bulgarian, Macedonian and Albanian nouns have definite and indefinite forms but only indefinite is given in the translations, no need to mark. Conjugation and declension types belong to entries but occasional info can be given using {{qualifier}}. impf/pf info benefits a dozen of Slavic languages (+ Georgian) for which we have a lot of contents. A learner won't be able to use correctly without this knowledge, e.g. it doesn't make sense to make a phrase "I'm listening to music" with a perfective verb (they simply don't have the present tense). I don't think anyone would seriously think of removing gender info from noun translations, it's definitely going to make the dictionary worse. We should give more, not less. --Anatoli(обсудить/вклад)05:51, 26 June 2013 (UTC)
I'm thinking about it. I'm beginning to think it doesn't belong in the Translations section. It belongs in the entry, where the inflection pattern, usage notes, and definition are. None of that is translation. --EncycloPetey (talk) 21:12, 26 June 2013 (UTC)
Imperfective/perfective is a part of the definition, technically, but it is so pervasive in these languages that you can't get around selecting one or the other. So adding this information helps users find the translation they want faster, because they can see in advance which of the two verbs they need for their particular use case. Think of it as a language-specific translation gloss. —CodeCat00:45, 27 June 2013 (UTC)
I must say that I agree with EP, adding genders, plurality, aspectuality and other arbitrary grammatical info to translations, and which are already available in much greater details within the linked entries themselves, seems pointless after all. Knowing gender of a noun or perfectiveness of a verb is usually not enough to predict its inflection. Translations are simple linked redirects to the main entries, and not full-blown "mini entries" that can be used at the instant of looking them up. Slavic verbs which come in perfective/imperfective pairs (not all do, and some are biaspectual) should simply be provided in form of "<perfective lemma form>/<imperfective lemma form>". --Ivan Štambuk (talk) 00:27, 27 June 2013 (UTC)
Labels {{impf}} and {{pf.}} are short and informative. You don't have to provide these labels if you don't want, they are optional, like gender. The number and quality of Serbo-Croatian translations lag behind Serbo-Croatian entries. It's the other way around with Russian. Many verbs are red-linked in translations, so labels in translation tables is often the only place where this info exists (unless one uses interwiki and can understand Russian). Let alone Ukrainian, Belarusian, Macedonian, etc. verbs, for which entries may never get created at this rate.
Slavic editors haven't communicated with each other enough to agree on formats. E. g. Dan Polansky prefers Czech perfective forms to be provided in translations rather than imperfective or both. It seems different from Polish. I have been providing both imperfective and perfective forms, especially for Russian. Each time I had to add {{impf}} and {{pf.}} manually. Since they are not synonyms users need to understand, IMO, why there are two forms provided for most verbs, e.g. see#Translations - видетьimpf, увидетьpf. Labels {{impf}} and {{pf.}} have been in use for many years, nothing changes, this topic is about simplification of adding these templates to {{t}}. People who are not interested in Slavic languages, can simply ignore this. This distinction becomes absolutely important for people who deal with these languages. As for entries, Russian pairs are done this way: "делать" and "сделать" (делать • (délatʹ) impf. — сделать (sdélatʹ) pf.).
Yes, not all verbs have pairs, some are biaspectual and some have more than one corresponding perfective or imperfective verb, perfective forms often change the original meaning, depending on the prefix used (if used), can be semelfactive. --Anatoli(обсудить/вклад)01:37, 27 June 2013 (UTC)
Well, for Serbo-Croatian translations I have been providing them for a long time . What I'm dwelling on is the purpose of providing such (useful but nevertheless superficial) grammatical information to the translation-section terms at all. When you add two scripts, as well as Ijekavian/Ekavian variants (when present) to the list, for what is essentially a single translation of a verb you can get as much as eight different SC words listed. "impf." label plus two spaces times eight equals 56 extra characters. So if you want to add e.g. three SC verbs as translations for an English verb, translations would always occupy 4-6 lines which simply looks silly.
Regarding redlinked/missing translations - it's the MW software that is defective and thus causing this imbalance (imperfectly fulfilled by TBot/"translations that need be checked"). Creating an entry with a proper meaning gloss, and adding a translation to the glossed translation section is essentially one and the same operation - creation of an association between meanings in different languages. Every time someone adds a new translation, a corresponding FL entry should be created or updated (a missing meaning added), and vice versa. Specifically for translation purposes, in languages with "nontrivial" grammars the kind of info that we mandate as translation terms is not enough to guess inflection - we only add lemma forms (and not e.g. oblique stem forms, inflection class etc.) Translations IMHO can only serve as dumb redirects to proper FL entries. --Ivan Štambuk (talk) 07:26, 27 June 2013 (UTC)
I have just added rather ridiculous translations (two synonyms) for heal as you described for your consideration, Ijekavian/Ekavian, Cyrillic/Roman + perfective/imperfective (hopefully with no errors). It doesn't have to be so big due to the nature of Serbo-Croatian. One verb infinitive may be sufficient but it's better if it has impf/pf tag. --Anatoli(обсудить/вклад)08:44, 27 June 2013 (UTC)
Some forms were missing and I've added them...it really looks ridiculously verbose. Perhaps in the future this could be visually optimized in some way. --Ivan Štambuk (talk) 03:41, 29 June 2013 (UTC)
Can we indicate on the main page the number of English words defined?
Currently, the main page says:
Wiktionary, the free dictionary 3,444,617 entries with English definitions from over 500 languages
I understand this to mean that the 3,444,617 entries are entries for words in all different languages. I think it would be nice to indicate how many entries we have that define English words, i.e. adding text saying, "including 1,234,567 definitions of English words". Cheers! bd2412T18:46, 24 June 2013 (UTC)
Actually it’s 3,444,617 main namespace pages, each of which can have many entries. If we add the amount of English words, it would have to be updated manually. — Ungoliant(Falai)18:57, 24 June 2013 (UTC)
We could have the dumps processed to count the lines in each Language's PoS sections starting with "#" (and not "#:" or "#*") and attempt to eliminate "form of" type definitions. That would yield "definitions", not lemmas, by language. If we use the dumps and keep growing, then we could honestly say something like "more than X,XXX,000 English definitions of English words" and update it periodically (monthly, quarterly?). Changes in the count between periods would be another way to monitoring activity. We could express gratitude to contributors in obscure languages, note unsanctioned reductions, etc. Whether its worth the cycles and effort I don't know. DCDuringTALK20:05, 24 June 2013 (UTC)
Actually, I wasn't really thinking of senses, just words. "Set" has a large number of senses, but it is still a single "word". I was thinking that it would be nice to show how many English words we have definitions for, irrespective of the number of definition per word. Of course, given the number of words with many senses (and words with senses in many languages), I'm sure that if we were to count all of the actual definitions here, that would put us in the tens of millions. bd2412T20:29, 24 June 2013 (UTC)
I don't see how we are not misleading folks when we say a form-of entry has a "definition". What we say has the sound of marketingspeak. Furthermore, the proportion of English L2 sections that have multiple definitions is less than 10% of the total, even excluding English form-of entries, so we cannot assume that we have "tens of millions" of definitions, if we exclude form-of entries. We can't really use "lemmas" and expect normal folks to understand. We could possibly count each L2 section as an "entry" without being misleading, except for the form-of problem. It seems to me that any honest count requires some work. If we only counted carefully once a year, but also counted some percentage increase of some meaningful measures of overall size since that last count, we would not be misleading folks and convey some idea of continued growth.
None of this gets to the real problem of quality, which is probably more important to keep users coming back, especially in the competitive areas, such as against English online monolingual dictionaries, where we do not excel. DCDuringTALK22:59, 24 June 2013 (UTC)
Templates and categories for passive infinitives?
I've recently created templates and categories for Category:Russian passive participles (past and present, active are still to be done) following the Bulgarian way, using template boiler. What would be the correct way to categorise Russian passive infinitive (they are some of Russian reflexive verbs with the suffix "-ся/-сь", e.g. делаться (passive infinitive of делать) or нестись (passive infinitive of нести)? Can someone help with create a correct category boiler? --Anatoli(обсудить/вклад)22:55, 24 June 2013 (UTC)
I know, those verbs are automatically added to the category as per Module:ru-verb you started but it's not the same thing. All verbs with "-ся/-сь" are reflexive even if they don't have a reflexive meaning (which is OK), "делаться" and "нестись" also have non-passive meanings but "смеяться" (to laugh), another reflexive verb, can't be a passive verb. --Anatoli(обсудить/вклад)23:26, 24 June 2013 (UTC)
Yes, it's like "be made", "be constructed", "be carried", etc. formed from transitive verbs, only in East Slavic verbs, they are single words (detachable particles in other Slavic languages not necessarily make them two words, it's debatable). --Anatoli(обсудить/вклад)23:43, 24 June 2013 (UTC)
Varieties of English
We have never agreed on a definition for the major regional labels and regional categories for English. I have been struggling for years now trying to label terms and senses with what I see as absolute contradictions (e.g., the label {{British}} rendering the text UK and adding to Category:British English), and undefinable terminology ({{Commonwealth English}}, no English-language variety or feature whatsoever corresponds to this political alliance).
“British English” is the traditional linguistics term applied to terms, senses, spellings, originating in or characteristic of Britain or England. So even though, e.g., the spelling aluminium (North America: aluminum) is used in Ireland and Australia, and dozens of other countries, let’s label it British as being inherited from Britain and characteristic of the largest branch of the language. Then category:Australian English can be a list of true Australianisms.
“American English” is the traditional linguistics term for the language in North America, including the United States, and Canada when it’s considered relevant. Let’s call it North American for clarity. Let’s use this single label wherever both US and Canada appear together.
“United Kingdom,” consisting of England, Wales, Scotland, and Northern Ireland, would likely only have a few legal and political terms that belong to this country.
Instead of using UK as a label, I would prefer "Britain, Northern Ireland". Northern Ireland is not unified with Britain linguistically in any meaningful way, except through Scots, so it makes sense to treat them separately. —CodeCat23:35, 24 June 2013 (UTC)
Maybe Great Britain, to be more specific and less confusable with British. —MichaelZ. 2013-06-25 00:53 z
The scheme makes sense to me. The hard part is knowing the facts. For example, it is hard for me to know that a term I know, say, from US TV (presumably nationwide in usage), that does not appear in UK dictionaries, is actually "North American" rather than "US". It is tedious and not necessarily conclusive to check Google News for use in Canadian newspapers. I have never objected to relabeling something from US to North American. I'm sure that some US readers will be confused, but screw 'em: I can't imagine a better alternative. DCDuringTALK23:59, 24 June 2013 (UTC)
If you are only sure of US, then enter US. Sooner or later a Canadian might change it to North America. —MichaelZ. 2013-06-25 00:53 z
I had thought there was some unity among those not subjected to Noah Webster's spelling reforms, with the US being differentiated in many spellings from all the rest of English speakers, whose spelling can be relatively briefly called "Commonwealth English". Are we going to have labels like except US and except North America? DCDuringTALK00:04, 25 June 2013 (UTC)
By the way, British English also has reforms that took place after North American English was differentiated.
No one in Canada thinks of their English as “Commonwealth.” Canada is in the Commonwealth, but has English closer to the US, while the Republic of Ireland is not, but shares much with many Commonwealth countries. So Commonwealth is likely to become Commonwealth except Canada, Ireland, etc., in the majority of cases. Are there other exceptions? —MichaelZ. 2013-06-25 00:53 z
One problem I immediately see with "Commonwealth except Canada, Ireland" is that it's too easily parsed as "Commonwealth (except Canada, Ireland)" rather than the intended "(Commonwealth except Canada), Ireland". That implies both that the spelling so labelled isn't used in Ireland, and that Ireland is in the Commonwealth, both incorrect implications. I suppose just {{context|except North America}} would be clearest... or, suboptimally, we could make up our own label (e.g. "UKIANZI", based on the first letters of many colour-using countries) and link to a WT:Glossary definition of it. - -sche(discuss)07:18, 25 June 2013 (UTC)
Since we're grasping at straws, anyway, we could do worse than using something like "colour countries" vs. "color countries"- along the same lines as satem languages vs. centum languages. I think it's obvious enough that most would only have to have it pointed out once before they caught on. Of course, one isogloss doesn't give the whole picture- but it would be decent enough as shorthand.
If we want to do it that way, I'd suggest putting the end of the word in bold, like colour countries. That would make the point even clearer. —CodeCat11:41, 25 June 2013 (UTC)
The problem with except North America is that it becomes except North America, except the Philippines, except Liberia, and except someplace in the Caribbean, I think. Hey dictionary people: the reason we have simple names for complicated things.
The idea of identifying a language variety by a language feature is interesting in the abstract, although I suspect there’s a reason English doesn’t have one for this – why make up a name when there is a well-attested one? Colour, of course, is the spelling chiefly used in Canada, so it plainly fails to represent British English. If a feature is to be identified, perhaps it should be something more ingrained than an alternate spelling. —MichaelZ. 2013-06-25 14:09 z
Canada is tricky because it's the intersection of US and British spheres of influence, so it can go either way, depending on the isogloss. This is true of some other countries as well, but to a much lesser extent. I would classify Canada as closer to British than to US overall, but that doesn't help at the single-isogloss level.
The problem we're having is that nothing is truly independent: innovations tend to propagate more within one grouping or other, but can, and do, cross over unpredictably. They also may propagate unevenly, so that an innovation limited to one grouping may not reach all its members. This creates a situation where peripheral members of one grouping may have an isogloss in common with the other grouping because of a lack of change rather than a common change. At best we can identify groupings that tend to share things, not cut-and-dried mutually-exclusive lects. Chuck Entz (talk) 15:13, 25 June 2013 (UTC)
Canadian English is a variety of (North) American English. No references consider it a variety of British English. Estimating the number of spellings Canadians have in common with modern US and UK writing is a superficial and transient way to classify the dialect (and I don’t believe such an estimate would provide a clear answer). —MichaelZ. 2013-06-25 16:44 z
"Except North America, the Philippines and Liberia" is no worse than "North America, the Philippines and Liberia", which is logically what would have to be on entries like color. I like the suggestion of using an isogloss (either "colour countries" or something better). Not all "colour countries" spell the word with a "u", but then, hardly any "centum languages" languages use the precise word "centum". English, for example, uses "hundred", which is indeed etymologically more related to "centum" than to "satem", but which doesn't bear any visible similarity to it and which is therefore just as opaque and confusing as any opponent of a phrase like "colour countries" might argue that term to be. - -sche(discuss)15:25, 25 June 2013 (UTC)
Well, except that we know that Philippine English and Liberian English are mainly influenced by (North) American English. So the label North American, supplemented by our documentation which would agree with prevailing linguistic and dictionary terminology, would suffice. It would only have to be noted in the rare case of a clearly contrary usage in Philippine English by adding a Philippines label to the other usage, or not Philippines to this label.
The point is that the main language branches are British English and (North) American English. —MichaelZ. 2013-06-25 16:44 z
For some of these cases, why not put whatever detail we collect prevalent spelling in a nation or region in labels that, by default, do not display. Anonymous users could be given the option of seeing the countries of interest to them, eg, "Jamaica, US, Canada, UK". The visible-by-default items could be "US, UK, North America". This might allow a simple, but expandable data structure for the label, consisting of countries and some frequency indicator, eg, 1-10 for decile of frequency relative to universe of acceptable spellings. DCDuringTALK16:05, 25 June 2013 (UTC)
Relevant: a survey of regional English labels in some print dictionaries: User:Mzajac/Dialect labels. I had mostly forgotten about this. —MichaelZ. 2013-06-27 19:37 z
Why is the entry reduplicated below the part of speech headers?
I refer to the following. Suppose you're on page ‘frobber’. It will look something like this (I've omitted etymology &c. for brevity):
English
Noun
frob
A person who is frob
Someone resembling a frob
A device to frob
Verb
frob
To act like a frob
To make something frob
Adjective
frob
Having been frobbed
Resembling a frob
This reduplication adds clutter and seems senseless since these are almost always identical to the page's title.
Do you mean the redundancy of the headword line? But these would actually look like the following, respectively. So what should be removed? —MichaelZ. 2013-06-25 17:45 z
Okay, that helps explain it. In most articles I've visited recently this information was either absent, or contained in a separate conjugation table.
I still think Wiktionary looks cluttered though, compared to my pocket dictionary.
Also, most words tend to be regular so the extra forms should maybe not be displayed by default. (My pocket dictionary doesn't, although it does in most cases duplicate the main form, although due to the more compact layout this is less of an issue.)
So what should be removed? Well, maybe everything except for:
comp. alsofrobbier super. alsofrobbiest
A small and unobtrusive link to an article describing regular productions should be kindly offered as an aside.
I think the massive amounts of scrolling necessary for even the most basic of lookups is a major usability issue, and although I think people should think long and hard about getting a better layout (perhaps borrow some good ideas from paper dictionaries), everything we can do to minimise it would be good (if it doesn't remove non-redundant information). — This comment was unsigned.
Well, this dictionary is a comprehensive database, which plans to collect all of the information necessary for translation dictionaries, foreign-learner dictionaries, an etymological and historical dictionary, etc. It is cluttered and getting more so.
It would be great to have alternative, purpose-specific views, but we don’t have that yet. Other projects are also welcome to use our data (according to its open licence) to create unilingual, special-purpose, or simplified dictionaries.
Your contributions to improving the situation are welcome. —MichaelZ. 2013-06-26 03:25 z
DCDuring, I mostly use Wiktionary from my PC. I also sometimes use it on my laptop, albeit with even more frustration.
The problem doesn't seem to be that there is too much information, as such, but that it isn't structured nicely.
For a typical entry, three quarters of the screen is white and half of the rest is sidebar. So there is plenty room for improvement even without getting special-purpose-y.
Just not putting things below each other that can just as well be put next to each other would help a great deal, removing redundancy would also help, as well as taking better care to put summary information at the top where it's immediately visible and detailed information lower, linked from the top.
That way it would be easier to determine at a quick glance which sense of a word you need and then click through to the conjugation table and example sentences, instead of having to scroll through tables and examples for senses that turn out to be irrelevant.
Yeah, our layout sucks. The MediaWiki software is a jack of all trades in terms of page layout. It will keep improving in tiny increments. Tiny. I find putting the table of contents on the right is a big improvement.
Go to WT:PREFS, check “Put the table of contents onto the right of entries,” and click “Save Settings.” You have to set this for each browser. Unfortunately, the layout repaints annoyingly after a horrendous delay. Better to do it right:
That's all very well for people who are logged in, but our display ought to be attractive even to people with no username and no intention of registering one. —Angr17:49, 28 June 2013 (UTC)
Absolutely. Let’s put that CSS into MediaWiki:Vector.css, so everyone can enjoy the benefit. —MichaelZ. 2013-06-28 18:07 z
FWIW, I'd support that change site-wide. Not sure about others, though. Nor am I sure how to capture feedback from any anons who might be horrified by the change. -- Eiríkr Útlendi │ Tala við mig18:13, 28 June 2013 (UTC)
Michael, your change would only affect people using the Vector skin, so it's not really "everyone". —CodeCat18:36, 28 June 2013 (UTC)
Which skins suffer from this problem? Corresponding CSS can be formulated for them, too. But that doesn’t affect anonymous readers, right, only logged-in users who have chosen those skins?
Why would they be horrified? If so, they can use WT:FEED as they do for all other horrors.
I am having difficulty deciding what to do with Latinate terms for Chinese herbal medicines that appear in Chinese entries. The ones I have noticed appear under a "Scientific names" header, but they appear elsewhere in the entries and are likely more widespread than the 10 or so that I have found so far. They are sometimes used in the definiens in these entries. Should we call these terms English as we do with medical and legal Latin, where they would be subject to RfV in running English text? Presumably if they fail CFI, they should be removed from any entry where they are used in the definiens. I am bothered by the fact that these terms, as medical and legal Latin terms, are apparently often used in the same sense in running text in multiple languages, but we insist on not calling them Latin or Translingual.
The "Scientific names" header seems to me to be a misnomer for them as they usually have no scientific backing, just clinical experience. But I will wikilink them as I move them in the entries. To what language should they be wikilinked? Why? DCDuringTALK18:14, 26 June 2013 (UTC)
It would help me to make a meaningful comment if I had a short list of examples. I suspect they're "English", but if they appear in medieval or Renaissance texts written in Latin, they they could be considered Latin. As far as I'm concerned, it all depends on the contexts in which they appear. If they appear only in otherwise English contexts, then they're English. If they appear in the midst of Latin text, they're Latin. And if they appear in the same set construction across a range of lingual contexts, we could consider them Translingual. --EncycloPetey (talk) 21:17, 26 June 2013 (UTC)
@SB: Sorry, but these are not names of herbs, which I have handled. These are sometimes related to animals, sometimes references to the seeds or roots of plants.
@EP: To me they appear uncited in the context of Chinese entries in the definiens. They are obscure as definiens, except possibly to herbalists. I'm not looking for a definitive resolution of their ultimate position because that has proven extremely unlikely. I'd be perfectly happy to simply remove them from the definitions and replace them with an English calque. Maybe I'll just put in a requests for glosses, so it's on someone's else's plate and might eventually get resolved. Do you think that I should put in WT:REE and WT:RE:la too? DCDuringTALK21:37, 26 June 2013 (UTC)
These are translingual names of pharmaceuticals. With the displacement of natural substances in the materia medica by synthesized or extracted/refined chemicals, these have likewise been displaced by chemical terms or commercial drug names. They've pretty much disappeared in most English contexts over the past century, except among those who get their information from older sources. Traditional Chinese medicine still relies on natural substances, so these terms have survived in Chinese publications, and in English texts based on them. I you look at older editions of formularies and pharmacopeias, they should be quite common. It wouldn't surprise me if they still showed up in some modern references, too, as a sort of "backwards compatibility" with older usage. Chuck Entz (talk) 02:45, 27 June 2013 (UTC)
@DCD: OK, I see what you mean now. Chuck is right that these are materia medica, but these are simply descriptions of the material in Latin, and are not particular terms in themselves. The phrase flos carthami means "flower of carthamus", and radix achyranthis bidentatae is "root of Achyranthes bidentata". There are all strictly sum-of-parts, with Latin used as a kind of international Western language to translate the Chinese. The entries probably shouldn't exist at all. --EncycloPetey (talk) 04:12, 27 June 2013 (UTC)
EP's is a persuasive analysis. That means that, as DCDuring thought, the parts should indeed be added to WT:RE:la because they (unlike the SOP combinations) should have entries, and the Latin should be dropped from the Chinese entries in favour of English translations/calques (e.g. Chuanxiong rhizome or Ligusticum rhizome instead of rhizoma chuanxiong). - -sche(discuss)05:07, 27 June 2013 (UTC)
Some last points which seem to conflict with the conclusions reached so far: The names used, though possibly following taxonomic names often lack capitalization of the genus name and tend to use older synonyms, often ones that are not valid in contemporary biology. The difference reminds me of the difference between an alchemist's use of quicksilver and a modern chemist's use of mercury. As for dismissing the terms as SoP, we do have terms, such as calcium carbonate that are as SoP. DCDuringTALK06:55, 27 June 2013 (UTC)
n.b. The word "calcium carbonate" is not SoP. Calcium is a metal; carbonate is an acidic powder, but calcium carbonate is a crystalline salt. Chemistry makes the combination have a different set of properties from its components. The name is a recipe, not a physical description. --EncycloPetey (talk) 17:14, 27 June 2013 (UTC)
nn.bb. The name Brontosaurus is also no longer valid in biology, but that doesn't stop people from using it outside of biology. And those flowers everyone calls "Geraniums" aren't Geraniums anymore, they're Pelargoniums. When scientific names change, the common names often don't. Although some disciplines are jargonistic, or hold on to archaic terminology, we're already equipped to handle archaic, jargonistic, and obsolete terms on Wiktionary. --EncycloPetey (talk) 17:19, 27 June 2013 (UTC)
A couple of quick points (I'm almost late for work): I've checked old additions of the US Pharmacopeia, and they do, indeed, have each item under a Latin header. It's possible these are references to some kind of official standard. I remember seeing such a term on a bottle of Witch Hazel I bought at a drug store a few years ago (I don't still have it, but it said something like "Aqua hamamelidis U.S.P."). Also, these are all over the place in works on Chinese herbal medicine, and it would be very helpful to the people who read those if we had an appendix of such terms, if they're not includable. Most people into herbal medicine don't know any Latin, and might have trouble with deciphering the inflectional information. I definitely think the Latin should be replaced in the Chinese entries with their translations- we shouldn't be translating one non-English language into another in English Wiktionary. Is it possible to create a cleanup category or a list of such cases? As an herb hobbyist/amateur ethnobotanist and linguist I'm probably as qualified as anybody to go through those and convert them. I'll have to dig out my books on Chinese herbs from storage and see what I can do. Chuck Entz (talk) 15:01, 27 June 2013 (UTC)
An Appendix:Materia medica would not be a bad idea. It's existence would also help future editors trying to deal with this issue with entries added after this is sorted out the first time. IT could also be set up with the reverse translation; that is, it could give the Chinese equivalent for each Latinate phrase. --EncycloPetey (talk) 17:14, 27 June 2013 (UTC)
A search for entries containing "Chinese medicine" gives 814 pages, which might be good to sample to see if there are more focused patterns. Some of these entries have had English versions of the Latin terms since their creation. DCDuringTALK18:03, 27 June 2013 (UTC)
Propose to use a single template for genders
We currently have several templates to display genders. It's not going to be easy to get rid of all the transclusions, and many of them are legitimate at least in the short term. But because we now have a module to display the genders, all of these templates now do the same thing (compare {{m}} and {{f}}). So it makes more sense, to me at least, to use just one template for this purpose. A short and concise name would be ideal, and I already considered we could use {{g}}. That template has become orphaned (mostly because of changes to {{head}} which have bypassed it) so the name is "free" (and there is a proposal in WT:RFM to rename it, anyway). So, I propose that we abandon the gender templates we still have by adding a call to {{g}} to them: {{m}} becomes {{g|m}}, {{m|f}} becomes {{g|m|f}} and so on. This is not really a big difference, and it is only really for legacy reasons because all "new" content should use {{head}} and such anyway (which does not need these templates because it uses the module directly). —CodeCat17:04, 27 June 2013 (UTC)
What's most important here, it's readers. And that does not make any difference for readers. What is needed for getting a better Wiktionary, for readers, it's as many contributors as possible. I don't understand how making their work a bit more complex would be a good thing for the project. Is the objective of head simplification? Lmaltier (talk) 17:49, 17 July 2013 (UTC)
Having to learn a syntax, the names and possible values of several parameters, etc., this might be a good thing for consistency, but it's more complex for new potential contributors, it's not simpler. On fr.wikt, we already got good will people discouraged by the heavy use of templates, which makes pages unreadable when you are not used to them. Lmaltier (talk) 21:38, 17 July 2013 (UTC)
So you're saying that "there are many different gender codes, but only some of them have templates like {{m}}, {{f}}, while you need to use {{g|...}} for the remainder" is somehow simpler to understand than "all gender codes use {{g|...}}"? —CodeCat22:09, 17 July 2013 (UTC)
The simplest way would be to have to know only the gender templates you need (their number is limited, and there should be a template for each one), rather than to have to know them + an additional code (g). Lmaltier (talk) 17:01, 18 July 2013 (UTC)
This is all just conjecture really. And I really doubt that the gender templates are going to be the deciding factor in how hard it is to edit Wiktionary. Especially as you rarely need them anyway, because in almost all the places you would need to put a gender, there is already a template that has gender support, like {{head}} or {{t}}. So making these really really really easy to use isn't exactly a top priority. —CodeCat22:21, 18 July 2013 (UTC)
Colors
Many of the entries on colors contain {{color panel}}, to display the color that the word represents. The sources of these colors are unclear, and are probably usually taken from editors' personal experiences on what color people mean when they use the term. Instead of using the best guess of an individual editor for what color a word represents, I think it might be beneficial to replace the English entries' colors with the results of the 2010 xkcd color survey. The compiled responses to five million "What color is this?" questions to a few hundred thousand people sounds like a lot more accurate linguistic data than any other source we could use. Thoughts? --Yair rand (not logged in) 22:10, 27 June 2013 (UTC)
Did the survey show colours and ask for names? Results might be different if people were given names and asked to formulate a colour. Note “judging from the RGB values, though, my readership skews white and nerdy.” Also, this is limited to display on computer screens (of course we’re thinking of using it to display on computer screens), but reflective colours under different illumination work very differently.
That said, I don’t mind using this, as long as its characteristics are kept in mind. It is a language corpus in a way. —MichaelZ. 2013-06-27 23:04 z
I'd feel better with such a source as a base standard where available. There really aren't very many instances of citations that would establish what color was meant by an author or speaker. Most color words beyond the most basic are ambiguous metonymies, eg almond (what part? of the nut or the tree?). I wish we could show a range of hues, with various degrees of saturation, etc. and a probability distribution that someone of a given culture would agree that the color matched the word. Until that's available (for free!!!), the color survey seems good for however many terms it covers. DCDuringTALK23:41, 27 June 2013 (UTC)
I disagree with using the arbitrary xkcd stuff; this could be gamed by bots etc., and is not durable or an expert authority — plus it only gives one colour for each name, whereas DCDuring points out that one colour can range widely (cf. how some cultures don't use the same word for light and dark "blue"!). The existing colour swatches aren't great, either: also too arbitrary. Equinox◑00:56, 28 June 2013 (UTC)
What we are doing now seems fairly arbitrary. Where do the colors come from? Where do the words come from? We have about 360 transclusions of the color panel template. xkcd had 954 colors, apparently, but not almond. BTW, unbeknownst to me, Talk:almond illustrates some problems with our current approach. (And the lighter of the two almonds looks too pink to me on my laptop screen, even compared to an actual almond.) Does WP have anything of value to us on the subject? DCDuringTALK01:57, 28 June 2013 (UTC)
Is the almond lit by daylight, incandescent or fluorescent? Is your laptop set to a white balance of 5000 K, 6500 K, 9000 K, or something else? There are dozens of factors that affect the way colour is perceived. Using a survey significantly reduces some of their effects.
Of course, unless there is at least a vague description of how the XKCD survey was conducted, it doesn’t mean very much.
The survey appears to be finished, so I don’t see how it can be gamed. If it is not, then just freeze the current results as a source. —MichaelZ. 2013-06-28 02:59 z
Wouldn't all color be daylight-lit? Maybe I should at least try some different skins to see how much it changes. DCDuringTALK03:08, 28 June 2013 (UTC)
Could we expand {{color panel}} to allow multiple panels? That would enable us to list both 'standard' values when these exist (e.g. HTML or ISO definitions) and the XKCD values, which I'm down with using. It would also allow us to display a range of shades on entries like grey, red, etc. - -sche(discuss)03:16, 28 June 2013 (UTC)
I'm in favor of such a thing in principle, but the last implementation (here? WP?) I saw had nine shades of red and didn't make be feel that I knew any more than if I'd never seen it and it looked ugly to boot. Yair's suggestion has the advantage that we know how to implement it and what it would look like. It would be nice if we could find some kind of color "wheel" visual that had multiple color names (red, dark red, maroon) appear when one clicked on a spot and allowed one to play with other visual dimensions as well. That would help with "translation" the other way. Even if all we could find was an external, non-WMF site. DCDuringTALK03:43, 28 June 2013 (UTC)
TOC on the right
Let’s float the table of contents on the right for everyone.
In the entry a, readers have to scroll down 16 times before they see any dictionary content (in Firefox on a 1024×768px display). This is very poor.
This can be done with WT:PREF, but then there is a disruptive delay before the javascript loads and the page layout changes. A better way is to use CSS. I have the following in my style sheet, and it could be put into MediaWiki:Vector.css for all readers. Test drive it by putting this code in your Special:MyPage/vector.css:
/* TOC float right without waiting for javascript */
.ns-0 #toc,
.ns-4 #toc {
float: right;
clear: right;
margin-left: 0.5em;
margin-bottom: 0.5em;
display: inline;
}
#toc {
max-width: 18em;
}
I've grown accustomed to the TOC being on the left and above all content, but I must admit, having it on the right and letting the content rise up is an improvement. - -sche(discuss)19:21, 28 June 2013 (UTC)
Having said that, this script does make pages like WT:RFV hard to navigate, because the TOC ends up beneath the list of "oldest RFVs"...and switching the placement of them (so that the "oldest RFVs" were below the TOC) wouldn't be much better, as it would just hide them. Can we exempt specific pages like RFV and RFD/RFDO from right-floating? - -sche(discuss)19:53, 28 June 2013 (UTC)
In terms of the rendered HTML on the WT:RFV page, the list of oldest RFVs starts on line 130, while the TOC starts on line 335, so combined with the CSS rules above, the TOC perforce comes after the list of oldest, even when floating.
The TOC and the "oldest RFVs" are side-by-side when the TOC is on the left. That's the format I'd like to preserve (at least for myself) on that page and on other big discussion pages with pre-existing right-floating boxes, where chances are people are more interested in the boxes and the TOC than in seeing the uppermost discussion right away. Floating the TOC on the right on top of the "oldest RFVs" would not, IMO, be much better than floating the TOC below the "oldest RFVs", because either way, one of the two is hidden from view. - -sche(discuss)20:30, 28 June 2013 (UTC)
The only places where this is particularly useful are in those with entries. Is it possible to limit it to mainspace and Appendix space? Chuck Entz (talk) 20:44, 28 June 2013 (UTC)
If you want this to affect main namespace only, then remove the comma and .ns-4 #toc from the code above.
Some other options in dealing with other top matter on pages could be to float other boxes on the left, or to add or remove a top-level heading (the TOC is always inserted just before the first heading on the page). There are also some mw:help:magic words: __NOTOC__ to hide and __TOC__ to locate the ToC.
You can select particular pages with the classes and IDs that MW provides in the page. For example, WT:RFV has body.page-Wiktionary_Requests_for_verification. So to disable the right float on this page only, try the selector:
Great! With that modification (so that it affects only the main namespace), I'm happy with that css, and would be OK with implementing it for anons. (If lots of people complain on WT:FEED, we can always turn it back off.) It would seem to address concerns like the one above. For logged-in users, I suppose it should be opt-in or at least opt-out-able. - -sche(discuss)21:53, 28 June 2013 (UTC)
I did the same. I use Monobook and put it in Custom.css and so far it seems to work just as it did from gadgets. I didn't experience the same JS speed problem though, so I can't vouch for performance. I buy that it ought to be faster. CSS seems to me a bit more accessible than JS, so I prefer the approach, even if it proves no faster. DCDuringTALK22:43, 28 June 2013 (UTC)
I had assumed the TOC appearance was part of the skin, but I see it is not. Yeah, common.css would be the right place for this. —MichaelZ. 2013-06-29 04:05 z
I want the TOC on the left. The right is where we float images, WP link boxes, WOTD notices, and just about everything else. Putting the TOC on the right as well (1) would increase visual clutter, and (2) would be poor visual design as the TOC information is left-justified. I thought we went through this before and decided against it; what's changed in this incarnation of the proposal? --EncycloPetey (talk) 22:39, 28 June 2013 (UTC)
EncycloPetey's comment here prompted me to look again at a. The TOC is beastly long, so from a usability standpoint, *not* requiring users to scroll down 16 times just to get any content seems like a win.
However, the image for the letter A is indeed pushed down to after the TOC, which isn't really the correct place for it.
Is there any way to have the TOC float right, but further right than images? I.e., so that right-floating images show up just to the left of the TOC, and vertically where they would have appeared if the TOC weren't there? -- Eiríkr Útlendi │ Tala við mig22:49, 28 June 2013 (UTC)
@EP: It's not so bad:
It's always possible for any individual registered user to choose the appearance that they want. If we don't have a preference set up in user or browser "preferences", then they can have customized css. It seems particularly easy for something the ToC, which does not require a clever use of selectors, AFAICT.
The vote was 29 months ago, which is a a generation in Internet dog years.
It is hard to defend the visual design aspects of a table of contents that can push content sixteen screens below where one lands on a content page. Images, as useful as they are for some entries, such as for taxons and many others, cannot be of primary importance on most big pages compared to definitions in the home language of a Wiktionary.
@Eirikr: I think cases that have both huge ToCs and images of vital importance are relatively few and some palliatives are available:
One could force an image to the left, I think.
Yes, you can. See whether you think it is a good idea at birch bracket, which should be reverted once this discussion has run its course, as the entry is short enough not to require it all. It doesn't look too good on my laptop with biggish fonts. Maybe it looks OK on other's setups. DCDuringTALK00:32, 29 June 2013 (UTC)
One could have images under a gallery header, which would follow the PoS section that was relevant.
I'm less sure of the desirability of putting images to the immediate left of a RHS TOC. For folks who have narrow screens or large type forcing images to take up horizontal screen space means squeezing down the text content, which I assume has primacy, to not much. DCDuringTALK23:12, 28 June 2013 (UTC)
I disagree with reagrd to your assertion that "cases that have both huge ToCs and images of vital importance are relatively few", and also would point out that pages that do have those conditions are likely to be high-traffic pages, and so what happens to them carries a lot more weight than a random entry. I also do not think that forcing images to the left is a good idea. Is the age of increasing use of mobile devices and smaller screens, that would have the same effect as a ToC on the left. A gallery header is also undesirable in my view, as this would put useful images well past large sections of definitions and quotations and everything else. When I'm visiting a page for a language I'm not familiar with, an image provides far more immediate definitional context than anything else on the page. I've seen various attempts over the past five years to create a viable RHS ToC, and have not been impressed by any of them. They just don't work visually or structurally. --EncycloPetey (talk) 23:57, 28 June 2013 (UTC)
Well, we have no facts to back our assertions, so we are just flapping our gums when we disagree on what should be a simple matter of fact, like which entries have high traffic. But there are certainly fewer pages with a top language section that needs an image, than those that don't. And fewer with small or no ToC than those that have one. Venn diagrams are a reasonable support for my limited inference.
But just think how much traffic ], which has been very high-traffic indeed in the not so distant past, would get if it had pictures, however they were arranged. DCDuringTALK00:32, 29 June 2013 (UTC)
Stupid question: Can the ToC be made collapsible? E.g. by default only showing the first two-three languages, with "Show more" expanding to all languages and that preference being remembered in cookies. I think that would be a good compromise with respect to users who only come to look up a single entry, and regular Wiktionary users who navigate among L2 sections via proper wikilinks and do not concern with the ToC at all. --Ivan Štambuk (talk) 03:31, 29 June 2013 (UTC)
Good question. It might work for some English-only users or those only interested in a first-listed language on a non Latin script page. I suppose users might hit the collapse icon by mistake and become thoroughly confused and put off, but we've never let that kind of thing stop us before. DCDuringTALK03:45, 29 June 2013 (UTC)
Thank you for that, Dan -- I just tried it out, and I quite like it! Entries with otherwise enormously unwieldy TOCs, like ], are now usable, without worrying about sidebars. Folks, have a look at what this CSS code does. I think we might want to consider this before doing any right-hand-side TOC changes. -- Eiríkr Útlendi │ Tala við mig19:21, 29 June 2013 (UTC)
I pefer to be able to see the subheadings, so I prefer either the right-hand TOC or fr.Wikt's TOC. The js that fr.Wikt uses to collapse their tables of contents is here, though if you copy it to your common.js, you'll notice that it takes a while to load, just like all the other js on our site. (Why? It doesn't take any time to load on fr.Wikt. Is en.Wikt too js-heavy?) You'll also notice that it currently applies itself to all namespaces, which is undesirable...- -sche(discuss)20:55, 29 June 2013 (UTC)
For the code that I posted above, notice how it looks a bit like a previous version of tabbed languages, with languages layed out horizontally rather than vertically. With tabbed languages, you don't get a TOC of any of the headings that Wiktionary has. --Dan Polansky (talk) 08:28, 30 June 2013 (UTC)
The proposed CSS could be made into a gadget, which would allow users to enable it without any of the delay caused by WT:PREFS. --Yair rand (talk) 19:50, 30 June 2013 (UTC)
Proto-Balto-Slavic vs. Proto-Baltic?
CodeCat has just told me that a decision has been made to relabel Proto-Baltic (PB) etymologies as Proto-Balto-Slavic (PBS). Since there are few if any sources that do have PBS etymologies, and the ones out there are still not extensive (no PBS dictionary yet), I think this is at least a bit hasty. The current sources (including my own copy of the Latvian Etymological Dictionary, published in 2001) refer only to PB, not PBS. To simply relabel as PBS forms that were reconstructed as PB, i.e. without taking the Slavic evidence into account, is to me wrong -- like calling Istro-Romanian simply Romanian. I haven't seen the original discussion, so I don't know which arguments were given; could we perhaps talk about that again? --Pereru (talk) 19:06, 28 June 2013 (UTC)
As you may be aware, politics often gets in the way of linguistic discussions like this. A few people have mentioned that it is often Baltic linguists themselves who disagree with grouping Baltic with Slavic, and it's quite possible that your dictionary is based on that viewpoint. Given that Balto-Slavic is generally accepted among most linguists and has been for some time (regardless of whether Proto-Baltic exists), I would consider it rather suspicious if an etymological dictionary of Latvian made no mention of it at all. —CodeCat20:24, 28 June 2013 (UTC)
But the point is not the subgrouping, but the reconstructed protoforms, which are in principle independent of the subgrouping. The question is not the acceptability of Balto-Slavic, but of specific reconstructed PB vs PBS forms -- adding Slavic may change them, no matter what the BS tree turns out to be. --Pereru (talk) 01:41, 29 June 2013 (UTC)
The easiest way to think of it is that there is no difference between Proto-Baltic and Proto-Balto-Slavic; they're two names for the same thing. So anything that your sources call Proto-Baltic can just as accurately be called Proto-Balto-Slavic. —Angr00:19, 29 June 2013 (UTC)
And that's what I would dispute. Adding one more language can influence the reconstructions, and in several cases probably does. (The LEV tells me that Ivanov reconstructs PBS *gʰel(e)gʰ-, but PB *gel(e)ž-, for "iron", for instance.) To simply relabel everything PB as PBS without carefully reconsidering the reconstructions flies in the face of everything I've learned about the historical-comparative method. It may well be true that reconstructed PB forms are mostly the same as reconstructed PBS forms would be, but that's not, a priori, the way to bet; it must be demonstrated. Has anyone done that? Is there a PBS etymological dictionary that could be cited? --Pereru (talk) 01:44, 29 June 2013 (UTC)
Proto-Balto-Slavic didn't even have a *gʰ phoneme, and the Baltic *ž is actually *ź (this difference is significant because *š (from RUKI) and *ś (from *ḱ) appeared side by side). —CodeCat02:10, 29 June 2013 (UTC)
Take it up with Ivanov (or his pal Gamkrelidze). It's their reconstruction, not mine. (Another problem with going PBS: which PBS are we talking about? Does anything with PBS on it qualify? Maybe there should be some conversation on what sources are acceptable and what sources aren't.) --Pereru (talk) 03:39, 29 June 2013 (UTC)
(Edit conflict) Yes, there's a difference, but it's the same sort of difference as the discovery of Tokharian or the decipherments of Hittite and Mycenaean Greek had on Proto-Indo-European reconstructions. Better understanding and more data points make for better reconstructions, but the entity being reconstructed remains the same (unless you believe in Indo-Hittite, of course). Should we call PIE reconstructions from before those discoveries "Aryan", because that was the name used at the time? Chuck Entz (talk) 02:13, 29 June 2013 (UTC)
Sure, Chuck. But the problem for me are the reconstructed forms. Are they right, or wrong? Do they become different, or don't they? --Pereru (talk) 03:39, 29 June 2013 (UTC)
For languages to form a clade two criterias must be satisfied 1) exclusive common innovation (e.g. sound changes that were not shared with other languages outside the clade) 2) relative chronology of changes (e.g. the second shared sound change being dependent on the output of the first shared sound change). When you take aside all such changes that are not shared with Slavic, there is not really much left for Baltic alone. You can reconstruct proto-whatever forms by comparison of various forms in various languages, but if these languages do not form a clade by the mentioned criteria, these reconstructions are just..meaningless.
However, Pereru's criticism is justifiable: Many of the Wiktionary's Proto-Balto-Slavic reconstructions seem to be original research, as are numerous other proto appendices (PIE and Proto-Germanic in particular). E.g. the Balto-Slavic word for "smoke" is reconstructed on Wiktionary as *dūmas, but Derksen in an actual dictionary of Proto-Slavic and Proto-Balto-Slavic forms (Etymological Dictionary of Slavic Inherited Lexicon, 2008) reconstructs it as as *dúʔmos with glottal stop (which some simply write as H and do not interpret as a glottal stop at all but "a merger of laryngeals introducing a particular tonal feature") reflecting PIE *h₁,*h₂,*h₃ and causing the acute and later disappears with compensatory lengthening. However, Matasović (Poredbenopovijesna gramatika hrvatskoga jezika - Comparative grammar of Croatian, 2008) reconstructs it as *dū́mas, which is more in line with Wiktionary's form. The problem is that endorsing Derksen's form would implicitly endorse glottalic theory of PIE which is kind of non-mainstream... The
The real problem is that Balto-Slavic (particularly accentology) is still an active field of research that is unlikely to "stabilize" anytime soon, so inevitably you get variant reconstructions depending on the author. However, I'd rather that we're up-to-date with modern scholarship, and update appropriately as consensus in the field is reached, rather than being stuck on older theories that have more support in literature.
The only thing that bugs me in CC's BSl reconstructions are the ones based on only one branch (or sometimes on a single language within that branch) - IMHO cases such as these should simply derive from PIE directly (the last common genetic ancestor with other attested forms). --Ivan Štambuk (talk) 03:22, 29 June 2013 (UTC)
I have nothing against anyone's reconstructions in particular, much less CodeCat's. I don't think I can criticize someone's original research in its specifics; I'm not even a Balto-Slavic specialist; my field is Cariban languages. But: if a reconstruction placed in the Etymology section of a word can be sourced -- e.g., if it is CodeCat's, Derksen's, Karulis', Mažiulis', Endzelins', or anyone else's -- then it should be. If the source is here at Wiktionary, then the link in the footnote should be to this source -- say, CodeCat's talk page, or a special subpage in which she explains her theories about PBS and presents her arguments. (And she does have some good ones, as I've noticed.) After all, the usual pratice in the better etymological dictionaries is for the author to add his/her initials if s/he adds personal opinions or unpublished research to an etymology. If we don't do that, we come dangerously close to simply having speculation without recoverable arguments presented as etymology. And I think we can do better than that.
For PBS specifically: many, in fact most, protoforms haven't even been officially reconstructed yet. To act as if they had is, at best, wishful thinking. Now, if someone wants to make a claim -- give his/her own reconstruction for PBS, or any other protolanguage -- that's OK, but then, well, own it. Add a link to your User Page, and give arguments there. Or else, why couldn't I simply replace your (unargued) opinion with my (unargued) opinion on a whim? We need to follow criteria, don't we? And specifically, I do have a problem with simply relabeling PB as PBS. This would suggest that we did what Ivan Štambuk suggested -- keep up with current scholarship -- without this actually being true, since PB = PBS hasn't been, to my knowledge, claimed yet. (Or has it?) We need to follow criteria, don't we?
For the more general question: if as Ivan Štambuk points out many PBS, PIE and PG reconstructions are original research, then they should be labeled as such, preferably with the names of their authors. I have nothing against original research; I just want it to be marked as such. After all, original research is probably not going to be the default assumption of the casual reader looking for PIE or PG information (it wasn't mine, for instance; up until now, I thought all PIE and PG reconstructions here came from some published source or other, which I could in principle check for arguments if I wanted to.)
As said, there is a single dictionary in existence that reconstructs PBSl forms and it is based on a fringe theory of Proto-Indo-European. Others are research papers and monographs focused on specific languages or phenomena (e.g. accents). Your dictionary is based on obsolete theory of Proto-Baltic. I don't really see the alternative to community deciding itself how to treat the up-to-date scholarship, even if it means creating protoforms not explicitly mentioned anywhere, but based on decided rules. If we decided to wait for what you call "official reconstructions" to be published in some comprehensive work representing the linguistic community's consensus, we could be waiting for a very long time. As the scholarship evolves so will these reconstructions be updated (cf. many PIE protoforms in the appendix namespace that underwent an evolution from Pokorny's pre-laryngeal reconstructions to what they are know, as editors have gained knowledge). Protoforms themselves are not what should or could be controversial, but the rules upon which they are reconstructed (i.e. the proto-language: phonemic inventory, agreed set of sound changes etc.) - these must reflect up-to-date scholarship. Whether some particular protoform could be cited is not that relevant, unless we're dealing with some problematic reconstruction (and in case of these there usually are plenty of references). --Ivan Štambuk (talk) 04:42, 29 June 2013 (UTC)
There are at least three options I can think of: (a) keep the old scholarship, since it exists in the sources, and refer to the new one as it comes out (so have a PB form next to a PBS one if you can source the PBS one, have only the PB form if you can't), and link somewhere to a discussion of it, like, say, w:Proto-Baltic or something better; (b) delete all PB forms, since it may very well be that they "don't exist", and add the PBS forms as the scholarly papers with them get published, or (c) do reconstructions yourself, but own them: add your name to the reconstructions, put forth arguments somewhere (a subpage of your user page, for example) and link to them (and in this case remove reference to published sources). If you don't do this, you'll be suggesting that there is consensus on the protoforms, which as you point out there isn't.
In principle, I don't have a problem with waiting a very long time--what's the hurry? But I don't think we have to. If you saw a PBS form in a scholarly source, by all means cite it here and source it. If another scholarly source has a different reconstruction, this can also be added to the etymology section of the word in question -- etymological dictionaries often cite competing proposals, which is OK.
What I don't think can be done is to PRETEND that there ALREADY is a solution or consensus, as the idea of simply relabeling PB forms as PBS would imply. Again, as you yourself said, there is no such consensus. To pretend here that it exists is simply wrong.
Maybe the LEV PB reconstructions are obsolete, but, if so, again the solution is not to relabel them as PBS (which would beg the question); rather, we should ask ourselves whether or not they should simply be deleted. For every etymon, is there already a better reconstruction, somewhere, in some theory, in some scholarly paper or book? Cite it and source it. Or don't, if you think it's "fringe". Or add your own, with your name and/or your arguments, or a link to a page with your arguments if they're long (if it's your idea, why not?). But DON'T relabel the LEV reconstructions as if they were not LEV. That is, well, misquoting a source, in addition to the whole aforementioned problem (PB = PBS is not an obvious equation, assuming it begs the question, etc.).
So: it's not a question of "waiting for the PBS etymological dictionary" (though I'd have nothing against waiting for it -- again, what's the hurry?) -- it's a question of saying what it is you're writing in as etymologies: YOUR etymologies, SOMEONE ELSE'S etymologies (if so, whose?), or something speculative (nothing against it, just label it as such). Am I really saying something so controversial here? And what exactly is the problem with implementing (a), (b), (c) or any variation thereof? --Pereru (talk) 15:03, 29 June 2013 (UTC)
I agree that simply renaming PB under PBSl label is wrong. However, superseding PB with newer PBSl reconstructions is not problematic. PB would OTOH be convenient as a cover term for reconstructions representing the exact same "stage", but which have not been retained in Slavic, but that would introduce an unnecessary confusion... I don't really see any problem in adding PB reconstructions. Even today there are editors creating PIE protoforms with schwa indogermanicum - these get cleaned up/updated eventually.
Regarding the citation of protofoms - as I've explained above, it is not the the exact written form of protoforms which is important and should be cited, it is the agreed set of sound changes, phonemes (and their symbols), and inflectional endings that matter. So what if a particular form cannot be cited in some paper or book? We might as well agree to treat every reconstruction from Derksen's dictionary in a modified form (e.g. *V: instead of *Vʔ) for our purpose. Protoforms are not "facts" as attested words are. They are points of convergence of agreed set of characteristics that define a proto-language. You make a single change in that set and many protoforms could change form. Choosing only cited protoforms, by various authors, that are written on different set of such conventions would make no sense. We must agree on a single one (based on the most mainstream and up-to-date scholarship) and use it as a standard. There is really no problem in citing alternative reconstructions in the appropriate appendix page.
Your idea of sourcing reconstructions to authors is intriguing. However, we should really depersonalize such activity. We could, for example, compile in an appendix page a tabulated, enumerated list of sound changes (from language ancestor such as PIE, to some destination (proto-)language) and provide extended reconstruction for interested parties with various intermediate stages referencing by number applicable subset of sound changes. E.g.
PIE *PIE-form>2 *intermediate-form15 > *intermediate-form2 >6 ... *intermediate-formN>9 *proto-XXX
Where subscript numbers 2, 5, 6, 9 would wikilink to the appendix page table rows explaining what exactly is going on. Individual changes would be chronological (if that is possible to determine, sometimes it's not), commented and cited with up-to-date literature. Thoughts? :) --Ivan Štambuk (talk) 16:07, 29 June 2013 (UTC)
No - Pereru's criticism was on the lack of citations regarding specific reconstructions. I was trying to explain that the the creation of uncitable protoforms was inherent in the standardization procedure, and that such forms are uncontroversial if they follow some agreed set of conventions (which themselves can be cited). Instead of creating protoforms out of the blue, which can strike as an original research on part of creator to someone unfamiliar with the matter, I proposed that instead we expand the entire derivational chain linking to specific (cited, and uncontroversial) sound changes affecting the protoform. I'll try to make a more illustrative example later.
Regarding the mentioned appendices - I don't really see of what purpose is the internal proto-language history (i.e. things like First and Second Palatalizations for Proto-Slavic). This is stuff for Wikipedia. Appendix pages for proto-languages should deal with details of notation of reconstructions, which rules to follow and how to handle various corner cases.
And yeah, about-page for PBSl really should have been created first. I know that almost all of these PBSl forms are "obvious" if you have sufficient background, but regardless.. --Ivan Štambuk (talk) 19:10, 29 June 2013 (UTC)
I tried to add information to the about page that is relevant for the language synchronically. Just like for Proto-Germanic you can't really discuss the grammar without making reference to Verner's law, the spirant law or i-mutation because they have an effect on the grammar, not just the reconstructed lemmas. In PS the palatalisations and the vowel fronting pervasively affect the morphology of the language so they are very important in giving a good description and allowing people to understand the reconstruction. For example, without such knowledge it can't be explained why *lice ends in -e while *nebo ends in -o, and *uxo has consonant alternations, even though they all belong to the same declension class. —CodeCat19:31, 29 June 2013 (UTC)
Sure, but understanding the intricacies of etymological reconstruction is IMHO outside the Wiktionary's project scope. Everyone dabbling in protoforms is presumed to have that knowledge already, and about-pages should simply be a an overview of how to translate that knowledge into a standardized set of rules that Wiktionary follows. Internal history of proto- and real languages really belongs to Wikipedia or Wikibooks. From PIE to modern Baltic or Slavic word there are at least dozen of sound changes, and some half a dozen accentual, some of which are really obscure and cannot be covered by any such "general overview". Our real language about-pages (WT:AEN etc.) don't deal with with historical changes in English etc. - some of which are arguably even more extensive then in their nearest genetic proto-ancestor's history, but on which templates to use, how to format entries, what general conventions to follow etc. --Ivan Štambuk (talk) 22:15, 1 July 2013 (UTC)
Indeed, my beef is with etymology sections of specific words, and what is put there. I have nothing against having pages on Proto-Balto-Slavic and its sound laws, or pages with reconstructions by Wiktionarians. My entire point is simply that it makes no sense to give a certain reference to a reconstructed protoform -- say, the LEV -- and then have this protoform be changed, without at the same changing the reference. (And also, that fully relabeling PB as PBS in the absence of PBS claims would be, well, just plain wrong -- but we seem to already agree on that.) If you have reasons to change a given PB reconstruction to PBS, and the PB reconstruction in question is sourced to the LEV, then I think you need to (a) remove the LEV reference, or (b) add another reference (to the source that has it as PBS -- even if this source is yourself) and something -- a couple of words in the etymology, or a footnote -- making clear which is which. Or else, you're making a wrong claim about what someone else said.
Now, on protoforms -- I agree that any protoform that is basically a respelling doesn't could as independent and doesn't need to be quoted as different. Ditto for protoforms based on different sets of assumptions (as long as the assumptions are mentioned somewhere and linked to said protoform). So I have no problem with a change like the one you mention for Dirksen's protoforms -- and I even have no problem with still citing them as Dirksen's despite the changes. They're respellings that basically change nothing. But if you do change something essential -- even if you just change theory, by, say, going glottalic in your interpretation -- then you have to mention that.
So I technically disagree that the work of standardization leads to "unsourceable forms". If you're simply retranscribing, it's still sourceable to the original source. If you're changing theories and interpretations, you cite both the original source and one that defines the new interpretation you're using (say, glottalic). Specific pages describing the details for the unfamiliar reader can be added. If you are choosing between sets of parameters defended by different authors -- again, the author whose parameters you're using is source. And if you're changing more than that -- say, if you're extending to new words some idea that has thus far been applied only to a few reconstructions in a few papers; or if you're coming up with your own set of parameters, slightly different from all others in the literature -- then you cite yourself as taking that further step. In each case, there is a possible source. Or am I missing something?
Finally, even though I agree that sound changes and specific reconstructino parameters are the most important thing, I disagree to some extent that that is the only thing. Words do have, to some extent, their own history, which is what makes etymological dictionaries a good idea -- or else, we'd simply need the rules + parameters, the cognate sets, and the authomatic results of applying said rules to the data, leading to a simple list of reconstructed protoforms, without any further explanations. That this is often not what one finds in etymological dictionaries, however, is what makes them interesting to read, and the pursuit of specific etymologies for specific words a fascinating puzzle. --Pereru (talk) 20:26, 29 June 2013 (UTC)
The forms that I created entries for are mostly forms that are either attested in both Baltic and Slavic (and can't be borrowed), or attested in one of them as well as reconstructed for PIE. In both cases, it's pretty certain that a PBS form also existed at some point, particularly in the latter. So I tried to focus on that first so that the easiest and most secure reconstructions were out of the way. —CodeCat20:37, 29 June 2013 (UTC)
And I've created references to these sources in two words, Latvian sniegs and Lithuanian sniegas "snow", as an example of how the referencing can be handled in etymology sections. (It is of course repetitive to have the reference in both the etymology section and in the protoform page, but at least in the Latvian case this is unavoidable, since there is information about the evolution of the word into Latvian not mentioned in the protoform page but found in the LEV. Or should such information be perhaps also placed in the protoform page? Should every "descendant" form in the protoform page also have paragraphs describing details or problems with their derivation, or is this better placed in etymology sections?) --Pereru (talk) 18:53, 30 June 2013 (UTC)
Mentioning that in the etymology is probably better. The descendants section would become very messy otherwise. —CodeCat18:57, 30 June 2013 (UTC)
Yes, I see what you mean. And I agree. (A different question, though: I had a quick look through Kim's article and couldn't find *snaigas in any of his examples. Is it really mentioned there? If not, then he is a source only for the sound changes, but not to *snaigas itself, in which case he is not the appropriate source for that specific PBS reconstruction.) --Pereru (talk) 19:01, 30 June 2013 (UTC)
According to modern research in the field, Baltic languages don not constitute a genetic node. Proto-Baltic can be reconstructed by comparative method, but such forms are meaningless since Baltic languages alone do not form a separate "branch", having exhibited a period of exclusive common innovation.
Adding Proto-Baltic reconstructions from older etymological dictionaries (e.g. LEV from 1992, based on the 1980s scholarship) is OK. These forms will be eventually be updated to Proto-Balto-Slavic forms, unless it is e.g. decided to have the term Proto-Baltic capture the notion of the exact same stage of proto-language, but only having reflexes in Baltic branch (just a proposal of mine), or to deprecate the term altogether. However, reverting the updated BSl. reconstructions as has been done here and here is really unwelcome. If user is not familiar enough with language's history to assess which proto-form is more "up-to-date" with modern theories, he shouldn't be reverting the work of others who are.
Asking for citations for precise spellings of proto-forms is by itself meaningless. Asking for citations for proto-forms which are pretty much obviously reconstructed is meaningless as well. The reverted PB. galwā is identical to Early Proto Slavic, whence Late Proto Slavic galva (the form which we call "Proto-Slavic") is derived through trivial sound changes that every Slavicist has in his little finger. The change from PB to PBSl in cases such as this is uncontroversial. For controversial reconstructions (e.g. invoking some obscure semantic shifts and analogies) proper citations should be mandatory. For those formulaic ones - not. --Ivan Štambuk (talk) 22:41, 1 July 2013 (UTC)