I can do this, but I want to make sure there's consensus for this before doing something major like this. Any objections? Benwing2 (talk) 03:39, 2 May 2017 (UTC)
The only objection is that we shouldn't rely on the boxes to provide these categories. For languages for which no one has added these boxes, the numbers should still be properly categorized with the headword templates and such. --WikiTiki89 21:25, 2 May 2017 (UTC)
Sauraseni Prakrit transliteration
When I link to a word in Sauraseni Prakrit (psu) written in Devanagari (according to Module:languages/data3/p the only script for this language) with a template like {{l}} or {{m}}, the transliteration comes out in Devanagari as well, which seems counterproductive (e.g. वेदस(vedasa)). Either we should automatically transliterate the Devanagari into the Latin alphabet, or we shouldn't transliterate it at all. But "transliterating" it into the exact same script can't be right. —Aɴɢʀ (talk) 14:06, 2 May 2017 (UTC)
I think the problem is that it invokes Module:Brah-translit as if it were using Brahmi script, when it should (if it is written in Deva) invoke Module:Deva-Latn-translit. Any character that is not recognized by a transliteration module is "transliterated" as itself. ... But having made that change, I see that the instance above now produces a module error, so I undid my edit. - -sche (discuss) 17:21, 2 May 2017 (UTC)
Or at the very least, edit some module so that no transliteration appears at all, as already happens for other Devanagari-script languages like Awadhi and Bhojpuri: {{l|awa|वेदस}} and {{l|bho|वेदस}} simply surface without transliteration, and that would be preferable for Sauraseni rather than this ridiculous "transliteration" into itself. —Aɴɢʀ (talk) 21:19, 2 May 2017 (UTC)
Yeah, I think interlingual script translit modules should check the script before transliterating (and return nil if it's the wrong script). --WikiTiki89 21:24, 2 May 2017 (UTC)
It's quite bizarre that Module:Brah-translit was added as the transliteration module for psu, since the only script listed is Deva. So, I removed the transliteration module from the language data file. — Eru·tuon 21:30, 2 May 2017 (UTC)
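The script check suggested above can be sketched roughly as follows. This is an illustration in Python, not the Lua used by the actual modules; the function names and the toy mapping are hypothetical, and only the Unicode block ranges are real. The idea is that a transliterator first confirms the text is in the script it handles and returns nothing (Lua `nil`, here `None`) otherwise, instead of passing unrecognized characters through unchanged.

```python
# Unicode block ranges (these are real); everything else is illustrative.
DEVANAGARI = range(0x0900, 0x0980)
BRAHMI = range(0x11000, 0x11080)


def in_script(text, block):
    """Return True if every non-space character of `text` lies in `block`."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all(ord(c) in block for c in chars)


def translit_if_brahmi(text, mapping):
    """Transliterate Brahmi text with `mapping`; return None for other scripts.

    `mapping` is a hypothetical character-to-Latin table standing in for the
    real transliteration rules.
    """
    if not in_script(text, BRAHMI):
        return None  # wrong script: show no transliteration at all
    return "".join(mapping.get(c, c) for c in text)
```

With a check like this, Devanagari input such as वेदस would yield no transliteration rather than being echoed back as a "transliteration" of itself.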
So we have automatic interlanguage links now. I don't know where this is coordinated, but it apparently isn't working for entries containing certain non-ASCII characters. When I tried removing the interlanguage links from mało, łopata, and Sigolène, they vanished. The same problem is occurring in some other languages as well, but not all: cy:łopata is showing the links (although they've been removed from the page by bot), but pl:łopata has no links showing. —Aɴɢʀ (talk) 20:08, 3 May 2017 (UTC)
The Cognate extension has been discussed at length and announced at N4E, but the problem that it works on some pages and not on others has AFAICT not been discussed yet anywhere. —Aɴɢʀ (talk) 20:30, 3 May 2017 (UTC)
I know, which is why I pinged someone who can address it. My links were in response to your statement "I don't know where this is coordinated". —Μετάknowledge discuss/deeds 23:07, 3 May 2017 (UTC)
Is it only some articles for which it doesn't work, or did it all work yesterday but is all broken now? LA2 (talk) 23:10, 3 May 2017 (UTC)
Thanks for your messages. Indeed, if you find some bugs or other things that Cognate doesn't do correctly, you can ping me and I'll transmit it to the developers. I created a ticket with the examples you mentioned.
Right now, like Vriullop mentioned, a problem occurs with the extension. We try to solve it as soon as possible, thanks for your understanding. Lea Lacroix (WMDE) (talk) 07:47, 4 May 2017 (UTC)
Broken script and links in new request categories
None of the request categories have links that point to the appropriate language section anymore. They are also no longer formatted with the appropriate script. This was still done when they were implemented by {{poscatboiler}}. —CodeCat 20:51, 3 May 2017 (UTC)
Okay, I've made that category use an English catfix. If there are other categories like that, it can be done the same way. — Eru·tuon 02:19, 4 May 2017 (UTC)
I think it can be attributed to the strain of introducing a major new extension at the same time they've been doing major work connected to the new backup-server location. With all that going on at the same time, it's a wonder that WMF's already-overextended technical staff hasn't completely lost it... Chuck Entz (talk) 02:15, 5 May 2017 (UTC)
Listing scripts for etymology languages
Now that we are using etymology language codes not only in {{etyl}} but also in templates like {{der}} and {{desc}} which take a term, we should list scripts for these languages in Module:etymology languages/data as well. In particular, I wanted to add Zzzz, Mani, Syrc to sog-bud, sog-man, and sog-chr, but I guess there won't be a difference since our modules are developed in a way to fetch data from Module:languages/data2 (etc.) only. --Z 07:55, 5 May 2017 (UTC)
@ZxxZxxZ: It probably wouldn't hurt to add script codes to Module:etymology languages/data, since Module:etymology languages will probably just ignore them. But I can only imagine them being useful if the etymology language uses a subset of the scripts used by its parent language. In that case, they could be used to provide an error message if the script supplied to the template is not used for that etymology language, even if it is used for other varieties of the parent language. I can't think of examples where that would be needed, but perhaps there are some. — Eru·tuon 08:56, 5 May 2017 (UTC)
@Daniel Carrero, Ungoliant_MMDCCLXIV: Middle Dutch hollant is a county that doesn't belong to a larger entity. Countries as such didn't exist yet, and the status of various polities as county, duchy, bishopric and such is largely arbitrary and probably not worth subcategorising. So I made brabant, a duchy, categorise in Category:dum:Polities. Can hollant also be made to categorise in that? I was able to add a new place type (duchy) but since county is already used by existing entries, I don't know how to do it. A thorough documentation of Module:place/data would certainly be appreciated for future reference! —CodeCat 20:57, 5 May 2017 (UTC)
I see it too, even when logged in: [image: Disambig.svg]. A quick guess is that perhaps the person who translated the banner at the top of the page that links to into Icelandic may have messed something up in the code for it. I suggest leaving a post at meta:Meta:Babel asking them to check that everything is alright on their end. Or perhaps a recent change to some is.wikt site-wide .js or .css has done it? - -sche (discuss) 17:10, 6 May 2017 (UTC)
Display of vertically written languages (Mongolian, Manchu)
Is it possible to convert a space in terms of these languages to a new line character (while keeping the transliteration unchanged), when they are generated by the link templates?
One problem: it will only transform spacing characters into <br> tags when the space is inside a link. Any spaces outside a link will not be converted to newlines. For instance, {{m|mnc|[[ᠰᡠᠩᡤᠠᡵᡳ]] [[ᠪᡳᡵᠠ]]||Milky Way}} displays as ᠰᡠᠩᡤᠠᡵᡳ ᠪᡳᡵᠠ(sunggari bira, “Milky Way”). There may be a way to fix this. — Eru·tuon 23:27, 7 May 2017 (UTC)
@Erutuon Thanks. One correction is needed though: The character ‹› (U+180E, MONGOLIAN VOWEL SEPARATOR) should not be converted to a line break. An example is at тарвага(tarvaga); ᠲᠠᠷᠪᠠᠭᠠ is currently displayed as ᠲᠠᠷᠪᠠᠭᠠ(tarbag-a). Wyang (talk) 12:40, 8 May 2017 (UTC)
In ᠭᠠᠩᠰᠠ, the first instance of "ᠭᠠᠩᠰᠠ", above the language header displays horizontally, while the headword line displays vertically. I know we can italicize particular pages' titles (see w:Template:Italic title), can we cause these pages' titles to display vertically?
Also at ᠭᠠᠩᠰᠠ, it might be a more economical use of space to display the usexes side-by-side (perhaps in box elements?) instead of one after the other; can/should this be done? Ideally, the boxes would "break" onto new lines/rows based on screen width, or at least the current display would be preserved on the mobile version of the site.
Alternatively, perhaps {{usex}} (or a script-specific template) could take usexes of Mongolian-script and either link each individual word with a "black link" (a link that looks like regular text, as used in e.g. some inflection tables), or else somehow subject them to Erutuon's excellent line-breaking feature without linking anything, to reduce the amount of vertical space the usexes take up.
As the class Mong isn't applied to the heading at the top of the entry ᠭᠠᠩᠰᠠ(ɣangsa), the heading displays horizontally instead of vertically. {{DISPLAYTITLE:}} or a JavaScript function can be used to apply the class to the article title (see this edit, for instance). JavaScript would require less editing than {{DISPLAYTITLE:}}, but {{DISPLAYTITLE:}} would work even for readers who don't have JavaScript enabled. — Eru·tuon 08:33, 8 May 2017 (UTC)
It seems to be possible to obtain verticalization even by transcluding a template that includes {{DISPLAYTITLE:{{lang|mn|{{PAGENAME}}|sc=Mong}}}}. Would it be feasible and desirable to edit the headword-line templates of languages that use Mongolian script to use such an approach (perhaps different templates could be used on Mongolian- vs Cyrillic- pages, or one template could be smart enough to only apply DISPLAYTITLE on Mongolian-script pages), so that rather than separately spelling out {{DISPLAYTITLE:}} on every page, merely adding a standard headword-line template would verticalize the page title? - -sche (discuss) 09:01, 8 May 2017 (UTC)
Hmm, that's an interesting idea. I wonder if a Lua module can use the {{DISPLAYTITLE:}} magic word (or if it can correctly expand a template that contains that magic word). If it can, the headword module could certainly do what you suggest. That would also be nice, though perhaps less important, in entries using other scripts. For instance, it would be great if Ancient Greek entries could apply the class polytonic to the title, though in some cases the Modern Greek entry would be found on the same page, and Modern Greek uses the class Grek instead. — Eru·tuon 09:58, 8 May 2017 (UTC)
@Vriullop: I copied your Lua code to Module:User:Erutuon/Mongolian and am testing it at User:Erutuon/ᠭᠠᠩᠰᠠ, but it doesn't seem to be working. In preview mode, it gives the message Warning: Display title "<span class="Mong">Erutuon/ᠭᠠᠩᠰᠠ</span>" was ignored since it is not equivalent to the page's actual title. Rather odd. — Eru·tuon 12:33, 8 May 2017 (UTC)
@Vriullop: I find display:table-caption; causes the quotations in ᠭᠠᠩᠰᠠ(ɣangsa) to overlap, and the headword to overlap the bullet that is after it. See the screenshot to the right. — Eru·tuon 13:04, 8 May 2017 (UTC)
Thanks to @Vriullop, I tried again and figured out how to make the display title thing work. So the top header in ᠭᠠᠩᠰᠠ(ɣangsa) and other Mongolian-script entries will now have the class Mong and will display vertically. The same can perhaps be done with other scripts. — Eru·tuon 13:26, 8 May 2017 (UTC)
I notice that the entries in Category:Manchu lemmas are vertical (and remain so even when I add testentry to the category, which itself becomes vertical), whereas Mongolian-script entries in Category:Mongolian adjectives are horizontal (probably so that the more numerous Cyrillic entries display correctly?). I suppose there's no way around that.
What does this newline conversion code do when it encounters spaces in code, like say in a HTML tag or Wikimarkup? —CodeCat 18:05, 8 May 2017 (UTC)
Well, {{usex|cmg|ᠮᠥ<a href{{=}}"https://en.wiktionary.org/wiki/ice">ᠰ</a>ᠦᠨ|ice}} results in the "<a href="https://en.wiktionary.org/wiki/ice">" being broken at the space and displayed as text (see here), but under what circumstances would such a thing occur validly? I had to use {{=}} just to make that "work" (i.e. not just resolve to a Lua "parameter not used" error). (If there are valid usecases, perhaps we could add a switch by which users could suppress space-to-newline conversion in individual instances?) If the concern is that it might sometimes be necessary to call a template that includes a space in its name inside a {{usex}} or the like (as in this contrived example), that could be solved by having a redirect to the template from an unspaced name. - -sche (discuss) 19:54, 8 May 2017 (UTC)
The code I've written assumes that the text being script- and language-tagged only contains wikilinks, nothing else. I'm not satisfied with it, because it's overly verbose. And it should be able to find and ignore anything that should not be transformed (HTML tags, the target part of wikilinks) and then just replace spaces in the rest. That would be much simpler. I'm trying to figure out how to do that. — Eru·tuon 22:17, 8 May 2017 (UTC)
Done. Anything that should not have its spaces transformed (currently, wikilink targets and HTML tags) is escaped before the replacement, then unescaped. — Eru·tuon 22:50, 8 May 2017 (UTC)
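The escape, replace, unescape approach described above might look something like the following. This is a simplified Python sketch, not the actual Lua module code: the function name and placeholder scheme are hypothetical, and only HTML tags and the target part of piped wikilinks are protected. Because only the ordinary space (U+0020) is replaced, U+180E (MONGOLIAN VOWEL SEPARATOR) is left untouched, as requested earlier in the thread.

```python
import re

# Spans whose spaces must not be converted: HTML tags, and the target part
# of a piped wikilink ("[[target|"). A simplification of the real rules.
PROTECTED = re.compile(r"<[^>]*>|\[\[[^\]|]*\|")


def spaces_to_breaks(text):
    """Replace ordinary spaces with <br>, skipping protected spans."""
    stash = []

    def hide(match):
        # Swap the protected span for a numbered placeholder.
        stash.append(match.group(0))
        return "\x00%d\x00" % (len(stash) - 1)

    escaped = PROTECTED.sub(hide, text)
    converted = escaped.replace(" ", "<br>")  # only U+0020 is touched
    # Put the protected spans back in place of their placeholders.
    return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], converted)
```

For example, `spaces_to_breaks('[[a b|c d]] e')` keeps the link target `a b` intact and converts the spaces in the displayed text and between links.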
It's good that our articles nowadays automatically get interwiki (inter-language) links. But our semantic categories still need manual linking (right?), the way it used to be in Wikipedia. So, once I found out that Category:Hormones should link to ru:Категория:Гормоны and sv:Kategori:Hormoner, is there some tool (on the WMF toolserver) that helps me in interwiki linking all the subcategories such as Category:en:Hormones to their counterparts in the other languages? --LA2 (talk) 21:25, 7 May 2017 (UTC)
I'm working on some changes to {{desctree}} in {{desctree2}}, which I outlined here. I'm trying to logic out how to deal with the alternative forms. I could employ some method using |altN=, but I would prefer it if I could use * {{desctree|goh|apful}}{{l|goh|aphul}}, ... and have the imported list of descendants start on the next line. Is there some way to do this in Lua, like table.insert(arr, currentline+1)? --Victar (talk) 04:37, 8 May 2017 (UTC)
@Victar: I don't think that's possible. A Lua-based template can only insert content at the position where you put it in the text. — Eru·tuon 04:39, 8 May 2017 (UTC)
@Victar: The module might be able to grab the content from the {{l}} after the {{desctree}}, but I'm not aware of any way in which it could delete the content of that {{l}}. So, you would still have the alternative form aphul repeated after the descendant tree. — Eru·tuon 05:54, 8 May 2017 (UTC)
Apply a font to cuneiform page titles
Erutuon, with some help from Vriullop and me, excellently found a way to apply a class to the page titles of Mongolian-script pages so that they display correctly. Can we similarly apply a class to the pagetitles of cuneiform pages, so that they are rendered in the same font as they are when they are linked (𒋼), rather than the default font they are currently rendered in, which displays them as boxes (for me)? More generally, we could perhaps apply classes and hence relevant fonts to many non-Latin scripts which might otherwise display as boxes. - -sche (discuss) 19:39, 8 May 2017 (UTC)
You can add script-tagging to the title for any scripts you want, by adding the script code to the to_be_tagged table near the top of Module:headword. — Eru·tuon 01:27, 10 May 2017 (UTC)
You mean they do now, or they always did? I think that whether or not they display without script-tagging depends to some extent on whether a user already has a font installed that can display them, and on how recent their operating system and browser are. Script-tagging them helps them display for more people. - -sche (discuss) 00:46, 11 May 2017 (UTC)
How "expensive" is this script-tagging of page titles? Do we know? If it's "inexpensive", perhaps tagging most non-Latin scripts would be appropriate. - -sche (discuss) 00:46, 11 May 2017 (UTC)
I doubt it's very expensive. The script code is already present in Module:headword, and it is quite quick to look it up in the table of to-be-tagged scripts. — Eru·tuon 01:22, 11 May 2017 (UTC)
Something to consider: the software disallows overriding the display title more than once. I believe the second attempt results in an error. That means that any entry with more than one headword line will break. —CodeCat 00:59, 11 May 2017 (UTC)
I don't think it does cause an error. I have added tagging for polytonic, and that would probably make the display title function be called twice in a page such as ὅς(hós), yet there's no error there. — Eru·tuon 01:25, 11 May 2017 (UTC)
However, if I tried to add tagging for Grek along with polytonic, on a page like λόγος(lógos) that has an entry for Modern Greek as well as Ancient, I suspect the <span class="Grek"></span> added by the Greek headword template would override the <span class="polytonic"></span> added by the Ancient Greek one, because the alphabetization makes it be the last on the page. It seems the last display title on the page is what is actually used. I tested this by adding {{DISPLAYTITLE:ὅς}} on ὅς(hós): if I place this DISPLAYTITLE parser function at the top, it's overridden by the polytonic-class display title added by the headword template; if I place it at the bottom, it overrides the headword template. So, the last DISPLAYTITLE on the page is the one that wins out, but there is no error. — Eru·tuon 01:33, 11 May 2017 (UTC)
Oops! There is an error, at the bottom of the page, in preview mode at least: Warning: Display title "ὅς" overrides earlier display title "<span class="polytonic">ὅς</span>". When the display titles added by subsequent headwords are identical, there seems to be no such message. — Eru·tuon 01:43, 11 May 2017 (UTC)
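The behaviour observed in this thread (the last display title on the page wins, and a warning appears only when a later title differs from an earlier one) can be modelled as below. This is a toy Python model of what the posts report, not MediaWiki's actual implementation; the function name is hypothetical and the warning text simply follows the message quoted above.

```python
def resolve_display_titles(titles):
    """Return (final_title, warnings) for display titles in page order.

    Models the observations in the thread: the last title is the one used,
    and a warning is produced only when a title differs from the one it
    replaces.
    """
    current = None
    warnings = []
    for title in titles:
        if current is not None and title != current:
            warnings.append(
                'Display title "%s" overrides earlier display title "%s".'
                % (title, current)
            )
        current = title
    return current, warnings
```

Under this model, two headword lines that emit the same tagged title produce no warning, while a polytonic title followed by a Grek title on the same page would produce one.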
@Romanophile: I think what "missing plurals" means is that there is currently no entry for the plural. Module:fr-headword, which is used by {{fr-noun}}, checks to see if the entry title for each plural form exists. Feminine nouns without entries for their plurals are also put in this category: see accessoirisation. — Eru·tuon 02:58, 10 May 2017 (UTC)
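The check described above amounts to testing, for each plural form given in the headword, whether a page with that title exists. A minimal Python sketch, assuming a hypothetical `entry_exists` lookup in place of the module's real page-existence test:

```python
def lacks_plural_entry(plural_forms, entry_exists):
    """Return True if at least one listed plural has no entry of its own.

    `entry_exists` is a hypothetical callable taking a page title and
    returning True if that page exists.
    """
    return any(not entry_exists(form) for form in plural_forms)
```

A noun for which this returns True would be the kind placed in the "missing plurals" category, whatever its gender.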
The current spelling templates produce confusing and misleading entries, at least the one for American spelling. Now almost all users interpret demonize, for example, as saying that its meanings are American, not its spelling (which isn't even true). --Espoo (talk) 07:03, 10 May 2017 (UTC)
I'd thought the spelling e instead of ae might be not in use in UK English, but it seems we have editors who don't even know the basics of UK spelling and think that -ize is not also UK. Is there a bot that can find and fix this kind of nonsense? --Espoo (talk) 07:13, 10 May 2017 (UTC)
@Erutuon I'm trying to use callback instead, but I can't seem to get it to work. Could you take a peek and see what I'm doing wrong? --Victar (talk) 02:35, 11 May 2017 (UTC)
@Victar: I'm sort of a newbie with JavaScript, but perhaps it's not working because you only escaped the first curly bracket in each group? Each one should be individually escaped. — Eru·tuon 03:02, 11 May 2017 (UTC)
I copied your code to my common.js and correctly escaped the brackets, but it still doesn't add anything to the toolbar while I'm in editing mode. — Eru·tuon 03:33, 11 May 2017 (UTC)
The instructions for adding your own toolbar button that uses a JavaScript function leave me at sea as to just how the callable function is supposed to work. — Eru·tuon 03:54, 11 May 2017 (UTC)
@Erutuon: Yeah, the documentation really sucks. Thanks for looking into it. I just reverted it back to my original version so I can at least use it for now. --Victar (talk) 04:17, 11 May 2017 (UTC)
Show-and-hide templates not working properly
The templates in the "derived terms" and "translations" sections of our entries ({{trans-}} and {{rel-}}) are not working anymore. They normally have a button to show and hide the content inside them, but this button apparently disappeared, making the templates nonfunctional. I've tried different browsers and computers and still get the same problem. What is the matter? - Alumnum (talk) 04:06, 11 May 2017 (UTC)
Additionally with Chrome (but not Netscape or Edge) I have a long blank space between the bottom line and the category section - but only when there is a pull down table (like inflections) present. — Saltmarsh. 04:46, 11 May 2017 (UTC)
Using Chrome, logging out makes the long blank space (see above) disappear, but the show/hide problem is still there — Saltmarsh. 05:22, 11 May 2017 (UTC)
Yup, all quotations are no longer visible, and the "Show translations" and "Show quotations" links on the left side of the screen are gone. Quiet Quentin also no longer appears. — SMUconlaw (talk) 08:48, 11 May 2017 (UTC)
Not loading LegacyScripts was provoking a cascade of errors in unexpected places. Once fixed, the problem with TabbedLanguages is beyond my understanding, but obviously they are related and were provoked by the latest version of MediaWiki, installed last night (UTC). --Vriullop (talk) 15:57, 11 May 2017 (UTC)
Help, declension tables no longer are openable
I think this applies to all collapsible tables. I see it e.g. on the page любимица, where the declension table is missing the button to open it. User:JohnC5 noted something similar in the April Grease Pit. I don't have any Javascript, and I don't even know what skin I have. (How does one switch skins?) I tried this on Chrome and Safari (on Mac OS 10.9.5), same thing. It even happens when I log out. Any ideas? Benwing2 (talk) 05:58, 11 May 2017 (UTC)
Is the JavaScript function that handles this collapsible content a local Wiktionary thing or a global MediaWiki feature? — Eru·tuon 06:06, 11 May 2017 (UTC)
Is it my phablet or is there a bigger problem? I restarted the phablet, but nothing happens when I click on a translation box in Android Chrome and Firefox. It worked yesterday. --Espoo (talk) 07:40, 11 May 2017 (UTC)
Just discovered that all other drop-down boxes are broken too, f.ex. conjugations. Also discovered it's only broken in desktop view, not mobile -- both on phablet. Will now test on laptop. --Espoo (talk) 07:58, 11 May 2017 (UTC)
Thank you, this is informative. Vriullop describes the same message on ca.Wikt, and a fix which could be tried. I wonder what caused the gadgets to break / to (apparently) need |type=general] to be declared... - -sche (discuss) 09:51, 11 May 2017 (UTC)
Apparently there is a general issue where "Gadgets that use both scripts and styles, but do not specify type=general, are never loaded (JS file not loaded but CSS file is)" (per a report on Phabricator). But type=general appears to have been set for TabbedLanguages in this diff; does the problem persist? - -sche (discuss) 19:02, 11 May 2017 (UTC)
This was one of the collateral effects by LegacyScript not loading. Once fixed, this error message does not appear any more and purgetab is working fine. But the problem with TabbedLanguages persists. --Vriullop (talk) 20:21, 11 May 2017 (UTC)
CSS classes for transliterations
It would be beneficial to have CSS classes for transliterations. Many transliteration systems use characters that will not display well in all fonts. With predictable classes, fonts could be specified in MediaWiki:Common.css to ensure they display well. Or users can set whatever fonts they like in their common.css. I've wanted to do the latter. Or users can write JavaScript functions that convert one transliteration system to another, allowing people to see whatever transliteration system they prefer.
Currently, all transliterations are tagged with class="tr", if generated by {{l}}, or with class="tr mention-tr", if generated by {{m}}. So, there is no way to distinguish between transliterations of different languages when using JavaScript, and absolutely no way when using CSS. They are all identical.
I was thinking the class names could be tr- plus the language code. Then, the transliteration class for Ancient Greek (language code grc) would be tr-grc. That would be following the tradition of using |tr= for transliteration, and of using the class tr to mark transliterations in general. Or perhaps translit- plus the language code could be used instead (translit-grc). Or the order could be reversed (grc-tr or grc-translit). This reversed order would resemble a language code combined with script code (fa-Arab), though translit isn't a script. Not sure which of these ideas is best.
These class names would be added on to the default ones: thus, the class for an Ancient Greek transliteration in the {{m}} template would be class="tr mention-tr tr-grc". So, no existing functionality would be broken.
This idea is perhaps not absolutely necessary, but it would allow for easier customization of transliterations with CSS and JavaScript. — Eru·tuon 05:49, 11 May 2017 (UTC)
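The class-naming proposal above can be illustrated with a short sketch. It is written in Python for illustration only (the real change would live in the Lua modules), the function name is hypothetical, and it implements just the first naming option, `tr-` plus the language code, appended to the existing classes.

```python
def translit_classes(lang_code, mention=False):
    """Build the class attribute value for a transliteration span.

    Keeps the existing classes ("tr", plus "mention-tr" for {{m}}) and
    appends the proposed per-language class "tr-<language code>".
    """
    classes = ["tr"]
    if mention:
        classes.append("mention-tr")
    classes.append("tr-" + lang_code)
    return " ".join(classes)
```

Since the existing classes come first and are unchanged, any CSS or JavaScript that already selects `.tr` or `.mention-tr` would keep working, while a rule such as `.tr-grc` could target Ancient Greek transliterations alone.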
I don't think this is necessary, as it is possible with CSS selectors to style by language (e.g. .tr:lang(grc)). This was added in CSS2 and has wide support. – Krun (talk) 11:28, 11 May 2017 (UTC)
@Krun: But at the moment there's no language attribute (lang="grc") added to a language's transliteration, so that selector wouldn't work. For instance, in the headword of ἀρχιμανδρῑ́της(arkhimandrī́tēs), the transliteration is coded as follows: <span class="tr" lang="" xml:lang=""><span class="tr" lang="" dir="ltr" xml:lang="">arkhimandrī́tēs</span></span>. This can only be selected with .tr or :dir(ltr). Perhaps Module:links (which currently handles transliterations) should add the language attribute to the transliteration. This would be fine since MediaWiki:Common.css currently doesn't use language codes alone to set fonts (if it did, adding language attributes to transliteration might cause the wrong fonts to be applied to transliteration). — Eru·tuon 16:43, 11 May 2017 (UTC)
On the other hand, it might cause problems for screen readers, which would try to read the transliteration because it was tagged with a language code, and would likely fail because they would only be able to read the language's native script. Hence, separate classes for each language's transliteration might be a better idea. — Eru·tuon 16:45, 11 May 2017 (UTC)
@Erutuon: Ah, I didn't realize the language code was missing. The transliteration should definitely be tagged with the relevant language code (in this case grc), as it is text in that language, whatever the script. It's definitely a problem if it isn't, because in that case it inherits the language designation of a parent/ancestor tag or the entire page, namely en, and would e.g. be read by a screen reader as text in English! When the script is different, as in this case, that can and should be indicated in the HTML (lang="grc-Latn"). – Krun (talk) 10:17, 12 May 2017 (UTC)
@Krun: Hmm. lang="grc-Latn" sounds reasonable, but what about class="(tr mention-tr )Latn" lang="grc"? Putting in Latn as a class would be more consistent with the way we usually handle scripts. — Eru·tuon 01:20, 14 May 2017 (UTC)
I've gone and done the script class and language attribute solution. Now a language's transliteration can be selected with, for instance, :lang(grc).Latn. — Eru·tuon 02:03, 14 May 2017 (UTC)
What annoys me about adding language attributes to transliterations is that my browser (Chrome) adds undesirable styling to them: for instance, full-width font to transliterations of Japanese. That is what I hope to avoid by naming the classes tr-language code. — Eru·tuon 01:52, 16 May 2017 (UTC)
This is also happening to me in Safari. The transliteration for Japanese, e.g. is being displayed in Hiragino rather than the basic site-wide font. I tried changing the language code on the page (using the browser inspector) to ja-Latn and that worked (i.e. that makes it use the site-wide font). I really think it's more appropriate to append the script tag in this way in every case where we are using something other than the regular script for the language, and, in any case, it seems to work styling-wise. – Krun (talk) 16:03, 17 May 2017 (UTC)
@Justinrleung mentioned on Module talk:links § Script tags for transliterations that for him, lang="language code-Latn" still applies incorrect fonts to the transliteration. It seems his browser (Firefox) ignores the -Latn part and applies fonts based on the language code. This could be solved by replacing the language attribute with the class I suggested above: class="tr-language code". However, I tried using Firefox and didn't see the same problem. Not sure why that would be. — Eru·tuon 05:57, 19 May 2017 (UTC)
It is possible to make a list auto-sorted by Lua modules. However, every language is alphabetically ordered in its own way, so it must be per-language concept. The extension just globally sorts them out in one way which might not be preferable. --Octahedron80 (talk) 11:52, 11 May 2017 (UTC)
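The point about per-language ordering can be shown with a small sketch. This is Python for illustration, not a Lua module; the helper name is hypothetical, and the Swedish alphabet serves only as an example of an order that differs from plain code-point sorting (å comes before ä in Swedish, but after it by code point).

```python
def make_sort_key(alphabet):
    """Return a sort key that orders words by the given alphabet.

    Characters not in `alphabet` sort after all listed ones.
    """
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        return [rank.get(ch, len(rank)) for ch in word.lower()]

    return key


# Swedish ordering: å, ä, ö follow z, in that order.
swedish_key = make_sort_key("abcdefghijklmnopqrstuvwxyzåäö")
```

Sorting a list with `sorted(words, key=swedish_key)` then follows the language's own alphabet, which a single global sort order cannot do for every language at once.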
Possibly I'm doing something wrong. I'm typing a new word into the search box, eg ghzz. Then I click on the "create it" link in the search results. This takes me to this page, which the url claims to be the User:Yari_rand app for new creations, but the edit box is completely blank and there is no template for selecting entry parameters. Monobook, Firefox 53.0.2, Windows 7 SP1 64bit. SpinningSpark 18:10, 11 May 2017 (UTC)
Looking at this further, it seems that it is something in my own monobook.js that is breaking it. Since I haven't changed anything in my personal js recently, I guess this must be an old script-pocalypse issue. SpinningSpark 21:05, 11 May 2017 (UTC)
Translation pop-ups not opening?
When I checked a word today (11 May), the translation pop-ups did not display an 'open' button.
The html text of the translations is still there, if I click on 'edit', so I am assuming a technical problem of some sort.
I checked a number of other words, and they are all doing the same thing. — This unsigned comment was added by 2602:302:D10C:83B0:226:8FF:FEEB:4F85 (talk) at 13:44, 11 May 2017 (UTC).
Is there a way to place site-wide JS pages in a category? I think this would be useful as half the time I can't find them. —CodeCat 19:20, 11 May 2017 (UTC)
I created Module:translit-redirect to do the work of choosing between transliteration modules, for languages written in several different scripts. It replaces modules such as Module:pa-translit and Module:khb-translit (Punjabi and Lü), which simply redirect to another module, depending on which script is being used. The other module does the actual work of taking the native script and generating a transliteration. The redirecting code in these two modules is remarkably similar. It seems simpler to centralize all this transliteration-redirecting in one module. So far, I've made Punjabi use the redirecting module, and it appears to work. — Eru·tuon 04:04, 12 May 2017 (UTC)
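The redirecting idea described above is, in essence, a dispatch on the detected script. Below is a minimal Python sketch, not the code of Module:translit-redirect itself: the function names are hypothetical, script detection is reduced to a Unicode block check, and the per-script transliterators are placeholders that only tag their input.

```python
# Unicode blocks for the two scripts used for Punjabi (ranges are real;
# treating "any character in the block" as script detection is a shortcut).
SCRIPT_BLOCKS = {
    "Guru": range(0x0A00, 0x0A80),  # Gurmukhi
    "Arab": range(0x0600, 0x0700),  # Arabic (Shahmukhi)
}


def detect_script(text, blocks=SCRIPT_BLOCKS):
    """Return the first script code whose block contains a character of text."""
    for code, block in blocks.items():
        if any(ord(c) in block for c in text):
            return code
    return None


def make_redirect(transliterators):
    """Build one transliteration function that delegates by script."""

    def translit(text):
        func = transliterators.get(detect_script(text))
        return func(text) if func else None  # no module for this script

    return translit


# Placeholder transliterators standing in for the real per-script modules.
pa_translit = make_redirect({
    "Guru": lambda text: ("Guru", text),
    "Arab": lambda text: ("Arab", text),
})
```

Centralizing the dispatch this way means each language only needs a table mapping script codes to transliterators, rather than its own near-identical redirecting module.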
Bot request: External links → Further reading
Can a bot do this, please?
Change all instances of the heading "External links" into "Further reading".
While someone is running a bot doing this, you could also check for any instances of "Future reading". I may have made that typo from time to time when renaming the sections manually. —Aɴɢʀ (talk) 09:47, 12 May 2017 (UTC)
I just grabbed the latest dump and can work on this. TheDaveBot has started running, if anyone wants to look at the changes. And I am looking for "Future reading" as well. - TheDaveRoss15:04, 12 May 2017 (UTC)
I finished everything in NS:0 from the latest dump, though there may be a few remaining. I wasn't sure whether non-NS:0 pages should be updated, but I updated a few of those before I thought better of it. - DaveRoss 12:41, 16 May 2017 (UTC)
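For reference, the heading replacement requested in this thread can be sketched as a single regular expression. This is a Python illustration written for this archive, not TheDaveBot's actual code; the function name rename_headings is an assumption.

```python
import re

# Matches a wikitext heading line such as "===External links===" at any level,
# including the "Future reading" typo mentioned above.
HEADING = re.compile(
    r"^(=+)[ \t]*(?:External links|Future reading)[ \t]*\1[ \t]*$", re.MULTILINE
)


def rename_headings(wikitext):
    """Rewrite the matched headings to "Further reading", keeping the level."""
    return HEADING.sub(r"\1Further reading\1", wikitext)
```

The backreference requires the same number of equals signs on both sides, so malformed headings and ordinary prose that mentions "External links" are left alone.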
Also, it's being transcribed as a two-syllable word, but it's a monosyllable with a long diphthong. The 5th-century BC pronunciation should be /nɛ̂ːu̯s/, not /nɛː.ŷːs/. —Aɴɢʀ (talk) 14:56, 12 May 2017 (UTC)
Hi, is there a tool that checks for loops in Wiktionary definitions,
that is, two terms each defined by the other?
I was asked to write such a bot for the Hebrew Wiktionary, so I had better first check whether such a bot already exists. — This unsigned comment was added by Shavtay (talk • contribs) at 21:07, 12 May 2017 (UTC).
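A minimal sketch of what such a check could look like, assuming the definitions have already been extracted into a mapping from term to definition wikitext. It only finds loops of length two (the case described above), and it treats any wikilink inside a definition as "defined by". The function names are made up for this illustration.

```python
import re

# Captures the target of a wikilink, stopping at "|", "#" or "]".
LINK = re.compile(r"\[\[([^\]|#]+)")


def linked_terms(definition):
    """Return the set of terms wikilinked from one definition."""
    return {m.strip() for m in LINK.findall(definition)}


def mutual_pairs(definitions):
    """Find pairs of terms whose definitions link to each other.

    `definitions` maps a term to its definition text (wikitext).
    """
    links = {term: linked_terms(text) for term, text in definitions.items()}
    pairs = set()
    for term, targets in links.items():
        for target in targets:
            if target != term and term in links.get(target, set()):
                pairs.add(tuple(sorted((term, target))))
    return sorted(pairs)
```

Longer loops (A defined by B, B by C, C by A) would need a proper cycle search over the same link graph.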
Do we still need this? It claims "This script changes timestamps such as those in comments to be relative to the local time", but all it does is import MediaWiki:Gadget-TranslationAdder.js. (Is this a reminder that the "editor" functions need to be split out of the "translation adder", and if so, can someone do that?) Also on the subject of gadgets which seem to only load other gadgetry: MediaWiki:Gadget-WiktBlockedNotice.js. - -sche(discuss)16:51, 13 May 2017 (UTC)
WiktBlockedNotice seems to not have any use so I am going to delete it.
And I have speedily deleted MediaWiki:Gadget-WiktAssistedEditing.js because, for the time being, TranslationAdder depends on one part of LegacyScripts, and loading LegacyScripts with its many scripts for the translation adder alone does not make sense. Also, the name is wrong. --Dixtosa (talk) 18:02, 7 August 2017 (UTC)
A discussion from 2010, archived on the talk page, suggests this category should be renamed, but a bigger issue IMO is that it should be generated automatically in entries that use {{en-noun}}, rather than being added manually to such entries. (Then we have to decide what is "irregular": any plural other than "s" or "es"?) - -sche(discuss)20:03, 13 May 2017 (UTC)
Yes, the only cases that come to mind where a plural could be in the "irregular plural" category while the singular was not in the "nouns with irregular plurals" category are cases where the plural is too nonstandard and uncommon to give in the headword of the singular, like dogz, but I wouldn't expect casual users to notice such a distinction and I'm not convinced it'd be useful. (Plus it could be argued dogz isn't even irregular, but a regular eye-dialectal change.) So I'd be OK with getting rid of Category:English irregular plurals. - -sche(discuss)04:08, 14 May 2017 (UTC)
Like water did before we split its translations onto a subpage (see GP, BP), man is now running out of memory. I've null-edited the page several times and the error is consistently in the Swedish etymology section (whereas the error in water would move several sections up or down after null edits), but it does display a small amount of randomness: sometimes the memory runs out right at Proto-Germanic, sometimes it runs out at PIE. I have a Phabricator report ready to file about water and man, but before I file it, are we still suspecting that individual tasks like instances of {{t}} not releasing memory after they finish is a cause? And what assistance would we want from the folks at Phabricator if they say they can't change their implementation of Lua? - -sche(discuss)06:44, 14 May 2017 (UTC)
I don't have answers to your questions, but I changed {{t-simple}} so that it frees up a little memory, pushing the error to the Declension section for Volapük. Strangely, when I switched more translations to {{t-simple}}, it made the problem worse. — Eru·tuon 08:23, 14 May 2017 (UTC)
I think there may be some wasted memory in our language and script objects, though I'm not sure how to confirm that. Each language object contains one or more script objects. This probably results in duplication. For instance, when we have many languages in the translations list that use the Latn script, the Latin script object will be repeated inside each of those language objects. That duplication could account for part of the memory problem.
(I don't know about the whole memory-releasing thing, but I'm imagining there's a counter that tracks how much memory is used, whether or not it is released when the functions have done their work.) — Eru·tuon08:34, 14 May 2017 (UTC)
After recent edits, the page consistently makes it as far as the Volapük Coordinate terms section before running out of memory. Interesting that the error on this page is so much more consistent than the error on water, and that adding t-simple to this page apparently makes it worse, which does not seem to be the case on water. - -sche (discuss) 03:28, 15 May 2017 (UTC)
Quite odd. The edit with the memory increase is linked above. You added {{t-simple}} to non-Latin-script terms; do we want that? — Eru·tuon03:39, 15 May 2017 (UTC)
If that's the only way to fix it. I imagine non-Latin script languages use a lot more memory, since they require script classes and transcription. DTLHS (talk) 03:45, 15 May 2017 (UTC)
Re "able to decrease the memory": but not by adding t-simple to the same terms. In theory, adding it to Latin-script terms should either simplify things slightly or have no effect, but instead it had a bad effect (when tried). Adding it to non-Latin-script terms is not a good fix, since it breaks script support and transliteration. We could try simplifying t-simple further to allow script to be set manually the same way it allows langnames to be set. And, of course, we could try to simplify our modules. I am revising the bug report I intend to file on Phabricator to see if folks there have suggestions. 04:49, 15 May 2017 (UTC)
I kind of wish there were a function to create a "cache" of language and script objects that can be used by multiple functions. It would be similar to mw.loadData, but for tables that include functions. So, then, before calling getByCode, you could check if the language or script object has already been loaded to the cache, and use that instead. Or getByCode could check for the existence of that function. Not sure if this is possible programming-wise. mw.loadData requires the tables that it loads to be static, not containing any functions. The caching function I'm envisioning would have to allow functions to be within the tables. Perhaps it would be possible for language and script objects, because all of their variables stay the same, except for the variables supplied as arguments. Or maybe not. — Eru·tuon16:48, 19 May 2017 (UTC)
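The caching idea can be illustrated outside Scribunto. The Python sketch below builds each script object once and hands the same instance to every language object that needs it, which is the deduplication being wished for. The class and function names are stand-ins, and whether Scribunto can share objects that contain functions in this way is exactly the open question raised above.

```python
from functools import lru_cache


class Script:
    """Stand-in for a script object; the real ones carry methods too."""

    def __init__(self, code):
        self.code = code


class Language:
    """Stand-in for a language object holding references to script objects."""

    def __init__(self, code, script_codes):
        self.code = code
        # Shared instances come from the cache instead of being rebuilt.
        self.scripts = [get_script(sc) for sc in script_codes]


@lru_cache(maxsize=None)
def get_script(code):
    """Build each script object once; later calls return the same instance."""
    return Script(code)
```

Two language objects that both use Latn then hold references to one Script instance rather than two copies.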
Also the idea above, of a new Scribunto function, similar to mw.loadData, to cache language and script objects so they're not duplicated in memory on a given page. Again, not sure if it's possible. — Eru·tuon19:25, 19 May 2017 (UTC)
Probably asking for an increase of Lua memory would be the simplest option, and it would probably suffice for quite a while. However, it would be good to pursue another option even if we get an increase in memory, to prepare for the ideal future, in which every translation table for basic words like water would contain every attested translation in every language possible (and be as full as or fuller than the water-translations are now). It's conceivable that we increase memory now, and then our pages get longer and longer, and we end up needing a further increase of memory because we haven't done anything to optimize the memory-hogging modules. — Eru·tuon21:01, 19 May 2017 (UTC)
The ticket has been closed as not something they will resolve, because it quite plausibly could be our modules which are the issue. They will also not increase the amount of memory pages can use, because "at some point in the future you'll probably wind up with an even-huger page that would exceed the new limit." They note that transliteration does seem to eat up a lot of memory. Perhaps deploying t-simple even on languages that would otherwise be transliterated — possibly simplifying t-simple to not even call modules, but have the script class (needed so the correct fonts are picked) input manually — is the way to go. Perhaps translations do not absolutely need to have transliterations? Or they could be supplied manually? - -sche(discuss)22:01, 21 May 2017 (UTC)
I find the response somewhat frustrating. @Anomie suggests that somehow our modules are being cached, but it would help to have more explanation on how that can happen and how that could cause the problem. And he does not respond to the request for help with determining which modules or functions are using so much memory. Perhaps there is another place to submit a request for help so that they will actually respond to it, but I don't really understand the Phabricator framework enough to determine where that would be.
I think inputting manual transliterations with {{subst:xlit}} would be a good idea. Of course, the transliterations would not be updated when the module is changed, but the same will happen with the manually inputted language names. We can modify {{t-simple}} to add transliteration and gender annotations if they have been supplied as parameters.
Wait, why would a module need to be involved at all if we input the script code manually? The template would just tag the translation as being whatever script was input, and common.css would apply the right font. (Or would that not work?) Of course, this would mean we would need to tightly control how many pages used t-simple, and periodically manually- or by-bot- check that only valid codes were being supplied. - -sche(discuss)22:32, 21 May 2017 (UTC)
That would ensure that a valid script code was being supplied, and maybe we could do it "inexpensively" enough for it to be (to extend the metaphor) affordable. OTOH, we could save memory by not invoking a module at all: the template could just apply the supplied script code to the translation, like templates did before we had Lua. - -sche(discuss)06:22, 22 May 2017 (UTC)
Okay, I guess I am being unclear. I am not proposing that the module be invoked, except when saving the page. I would just put a parameter such as |sc={{subst:#invoke:scripts/templates|findBestScript|eau|fr}} into {{t-simple}}. This would transform to |sc=Latn when the page is saved, and no template would be invoked once substing has happened. Then the script code in the wikitext will be valid unless someone changes the term or the language code on the page, or gives the script a new code in the script data module. findBestScript is the function by which {{t}} and other linking and tagging templates get script codes, so the output will be identical to the equivalent instance of {{t}}. — Eru·tuon07:17, 22 May 2017 (UTC)
Adding ids to enable linking to headwords
I think we need ids in headwords, for form-of templates to link to.
Form-of templates currently allow an |id= parameter to link to a sense id. The need for some kind of id is clear, when there are multiple POS sections each with their own alternative forms. But using a sense id implies that an alternative form or inflected form only belongs to one sense. That's usually not true. Alternative or inflected forms generally apply to all senses inside a particular POS section. In order to use a sense id in a form-of template, you have to arbitrarily choose a sense to link to.
Ideally, we would link to the POS header itself. That's currently impossible, because sections for the same part of speech often occur multiple times in the same page. For instance, the page set has no less than 16 Noun sections, two of which are inside the English section. Adding the anchor #Noun will link to the first Noun section on the page, which is not necessarily correct. There is no way to add unique anchors to POS headers. It could be done by putting {{anchor}} within the equals signs of the header (for instance, ===Noun{{anchor|English noun 1}}===), but that's not currently allowed, and for good reason: it makes the anchor template display in the edit summary when you edit that section, and it breaks the link from the edit summary to the section.
The current solution is to link to a sense id in one of the definition lines in the desired POS section. This does not make sense for a form-of template. Forms generally apply to all or most definitions within the POS section, not to only one.
A better solution would be to link to an anchor within their headwords. We could add a |id= parameter to the headword template, and have Module:headword add id="headword id" to the first form in the headword. The form-of templates could then link to this id, instead of linking to the POS header directly above the headword.
(An alternative would be to place {{senseid}} in the same line as the headword template. That's sloppy, because it's called a sense id, not a headword id. As a sense id, it should only be used in senses.)
To implement this, we would have to replace the existing |id= parameter in existing form-of templates with |senseid=. Then we would decide on a format for the headword ids, and allow Module:headword to create these ids, and Module:links to link to them. This may be complicated, but I am interested in making it happen. — Eru·tuon22:07, 14 May 2017 (UTC)
Well, that means the headword ids have to be formatted identically to the sense ids. I didn't want to do that because it could result in a conflict, if a sense id had the same string as a headword id. — Eru·tuon23:02, 14 May 2017 (UTC)
Ids of individual senses could also conflict with each other, but that doesn't seem to happen. I don't think it's an issue, unless I am missing something. —CodeCat23:07, 14 May 2017 (UTC)
How are we going to name these headword ids though? We can't name them by sense or by part of speech (which isn't guaranteed to be unique). Adding a number can work, but of course the sections may be reordered later and then the number no longer matches its order on the page. This isn't necessarily an issue and highlights a strong point of ids, but if there's a better option that would be nice too. —CodeCat23:19, 14 May 2017 (UTC)
That's a thorny question. Using a keyword from one of the most common senses would be easiest. As you say it might result in conflicts, but it might not be all that hard to avoid a conflict: an editor might be able to see both the id in the headword and any sense ids in the definitions while editing, unless the conflicting sense id was in another POS section. — Eru·tuon00:52, 15 May 2017 (UTC)
This is why creating entirely separate ids, despite the complexity, would be better for editors. No need to worry about conflicts between headword and sense ids. — Eru·tuon01:01, 15 May 2017 (UTC)
Like CodeCat, I think we should use the existing id= framework. Let's not make things unnecessarily complicated! We should probably add "have a bot check for duplicate ids" to the list of semi-regular tasks at WT:TODO, even to check for duplicate senseids under our already-existing system, like if someone put the senseid "pair" at a sense of the noun couple, and someone else put the same senseid at a sense of the verb couple. - -sche(discuss)01:49, 15 May 2017 (UTC)
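The duplicate-id check suggested above could be as small as the following Python sketch, assuming the ids are written as {{senseid|language code|id}} and that the page's wikitext is already in hand. The function name duplicate_ids is hypothetical, and a real bot would also need to look at the id= parameters of other templates.

```python
import re
from collections import Counter

# Matches {{senseid|lang|id}} and captures the language code and the id.
SENSEID = re.compile(r"\{\{senseid\|([^|}]+)\|([^|}]+)\}\}")


def duplicate_ids(wikitext):
    """Return (language, id) pairs used more than once on the page."""
    counts = Counter(SENSEID.findall(wikitext))
    return sorted(pair for pair, n in counts.items() if n > 1)
```

Run over the noun and verb sections of one page together, this would flag the "pair" example given above.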
in t-check categories, "Unspecified is an invalid script"
Category:Requests for review of Lü translations and Category:Requests for review of Punjabi translations are displaying errors, apparently because the default text they create gives users an example of how to use {{t-check}} to put an entry into the category, but that example uses the Latin-script word "example" as the Lü/Punjabi translation. If there is no easier way of resolving the error, I suggest just dropping "It results in the message below:" and the displayed example after it, because it does not seem to be helpful. The wikitext to copy and paste seems to be all that a casual reader would need. - -sche(discuss)01:43, 15 May 2017 (UTC)
That's an error message generated by the transliteration module, which in this case is Module:translit-redirect. I can disable the error message, either in non-entry namespaces or everywhere. — Eru·tuon01:51, 15 May 2017 (UTC)
Eh, for the transliteration module to check that it is getting valid input, and to display an error if it is not, seems fine. So far, it seems to be only this edge case where there is a problem, and it would seem easier (or maybe not easier! but maybe better) to solve it by removing the actual display of "this is what it would look like if you put Latin script into a Punjabi t-check for some reason". - -sche(discuss)02:49, 15 May 2017 (UTC)
Added English catfix to "requests for review of translations" categories, as I did before for "requests for translations". Now those categories will link to the English section of the entry. Thanks for noticing that error. — Eru·tuon02:59, 15 May 2017 (UTC)
@SemperBlotto I checked with AWB and found that there were only ~30 entries in both the "attention" category and Category:German terms with red links in their inflection tables, i.e. entries where the bot would be creating inflected forms for an entry with an attention tag. Of those, the inflection tables were correct in most of the entries: the attention tags were only asking for the definitions to be expanded, or in the case of a few adjectives, for "verb form" sections to be added. I will keep working to address and remove the attention tags, but I think that if your bot runs, the only entry it would create a wrong inflected form for is Wikiwörterbuch, where the inflection table lists a form Wikiwörterbuche which it should not list. - -sche(discuss)23:01, 20 May 2017 (UTC)
Modified Russian translit
I've created a JavaScript function that modifies Russian transliteration to show vowel reduction and other things: User:Erutuon/modifyRussianTranslit.js. So, for instance, the transliteration of Воро́неж (Vorónež) shows up as Varóniž: a little more representative of the actual pronunciation. Kind of silly, but I like the more transcription-like transliteration variant.
I've wanted to do this for a while, but it wasn't possible until I added tagging for transliterations, as discussed above. — Eru·tuon22:16, 15 May 2017 (UTC)
There is a longstanding disagreement between those who want transliterations of Russian to be transliterations, letter for letter, and those who want them to be like a secondary IPA. It has occasionally been suggested that both could be displayed, first the actual transliteration and then the "pronounced as". - -sche(discuss)23:55, 15 May 2017 (UTC)
I just want to warn you that your tool probably makes a lot of mistakes. The reduction rules are not as regular as they may seem at first. --WikiTiki8915:04, 16 May 2017 (UTC)
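To make the Vorónež to Varóniž example concrete, here is a deliberately naive Python sketch. It is not the code in User:Erutuon/modifyRussianTranslit.js; it applies only two toy rules (unstressed o becomes a, unstressed e becomes i) and assumes that every stressed vowel carries an acute accent. As the warning above says, the real reduction rules are less regular than this.

```python
import unicodedata

# Toy rules only: unstressed o -> a, unstressed e -> i.
REDUCED = {"o": "a", "e": "i", "O": "A", "E": "I"}


def reduce_translit(translit):
    """Apply the toy reduction to vowels that carry no stress mark."""
    # NFC keeps stressed vowels as single characters (e.g. "ó"),
    # so they never match the plain "o" and "e" keys.
    composed = unicodedata.normalize("NFC", translit)
    return "".join(REDUCED.get(ch, ch) for ch in composed)
```

Monosyllables whose stress is left unmarked would be reduced wrongly by this sketch, which is one of the simpler ways such a tool can go astray.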
Accel template for German diminutives needs updating
The accelerated-creation script for German diminutives uses outdated syntax in the declension table it generates. I would fix it myself, but I don't recall where the relevant text that needs to be changed is located. - -sche (discuss) 00:43, 16 May 2017 (UTC)
I suggest Template:de-decl-adj should accept optional parameters comp1_qual=, comp2_qual=, sup1_qual=, and sup2_qual= (or whatever names would be better) which, if present, would add a qualifier after the phrases "comparative"/ "superlative forms of x" that form the titles of the tables of forms. Then it could be explained why there are two tables of comparative forms on rot (one is for the inflection with umlaut, one without) and why there are two tables of superlative forms on blau (one with epenthetic -e-, one without). This could also be accomplished by repeating the tables with semicoloned headers, like this, but comp1_qual= would be neater. - -sche(discuss)02:06, 16 May 2017 (UTC)
There are several formatting issues with Japanese readings (yomi). The readings are displayed (collectively) by {{ja-readings}}, which provides fields for classes of readings (such as Goon, Kan’on, Kun, etc.). Each field may contain multiple readings (see e.g. 食), but there is inconsistency in how the individual readings are formatted. The readings currently use plain wikilinks, so neither the linked hiragana nor the transliterations are currently tagged with lang=ja or classes Jpan or tr. Furthermore, there is inconsistency in how readings with okurigana are presented (sometimes the kanji spelling with furigana is included, sometimes not; sometimes there is a dot in the hiragana, etc.). When historical readings are thrown into the mix, things get more complicated, and other information is also sometimes supplied, such as rarity, and if the reading has been excluded from the Jōyō kanji table, i.e. it is 表外(hyōgai). See 十, 況 for some of the variety of display approaches. I think it is time we figured out how to present all this information and format it properly so that fonts are consistent on the site and language/script tags are properly applied. We could include this functionality in {{ja-readings}} (i.e. code it in the readings section of Module:ja), or we could create a new template for a single reading with its historical form (and automatic transliteration). I rather lean toward the latter route, as it would continue to allow extra information to be supplied alongside the readings as needed, while standardizing the display of modern/historical pairs and their transliteration. I also think it is particularly important to divorce the categorization of the historical readings from that of the modern readings, because they overlap, and it is useful to be able to search for kanji with a particular historical reading. – Krun (talk) 15:53, 17 May 2017 (UTC)
I think it would be fairly easy to write a function to search for the wikilinked Japanese words and create a {{l}}-style link for each, without transliteration since the transliteration seems to be written out manually. It would also be possible to search for the italicized transliterations and add the correct formatting to them. If there are tables with transliteration missing, it would probably be possible to search for that and add automatic transliteration generated by the transliteration module. Is that the kind of thing you want? — Eru·tuon00:32, 18 May 2017 (UTC)
Just to clarify: these are my ideas after looking at the sample input on the template page for {{ja-readings}}. I'm somewhat bewildered by your post because I know very little about Japanese and its entries (aside from the fact that there's one ideographic script and two syllabic ones). — Eru·tuon00:36, 18 May 2017 (UTC)
Well, I was looking for technical input as well as linguistic, and you seem to manage well with the Lua module coding. I haven’t really gotten into it myself and don’t understand it very well. Anyway, about the transliterations: those should probably be removed and autogenerated instead. – Krun (talk) 01:49, 18 May 2017 (UTC)
I guess what we need might be a template, say ja-y that took a modern reading, its corresponding historical reading (optionally for now due to lack of data, but preferably always included), an optional more ancient reading (such as くゐやう for 況) and potentially more, e.g. a parameter to indicate presence or absence in the Jōyō kanji lists. I’m not sure how it should be presented, but the connection of historical readings to the modern counterpart should be clearly indicated, all readings must be transliterated (there needs to be a separate automatic transliteration for historical/ancient readings from the modern ones), and the readings should be labeled succinctly, preferably with links to Wiktionary appendix pages explaining things like historical kana usage, Jōyō kanji, etc. – Krun (talk) 02:04, 18 May 2017 (UTC)
I've done a restructuring of the readings function, so that each reading can be handled in the same way (links redone using Module:links, for instance). If you notice any problems (such as readings in the template code vanishing from the displayed list), please let me know. — Eru·tuon02:08, 18 May 2017 (UTC)
Hmm, I don’t really notice anything different now from before, but anyway, here is a stab at what I mean for each individual reading: Something like
@Krun: Right, it shouldn't look different. I was restructuring to make consistent script and language tagging or linking possible. — Eru·tuon23:56, 19 May 2017 (UTC)
結#Readings is indeed hard to read. Do you have any ideas for how it could be entered using the more brief input format that you are suggesting (or how it could be made more readable)? — Eru·tuon 00:21, 20 May 2017 (UTC)
My suggestion:
{{ja-readings
|kanon=けつ
|goon=けち
|kun=結(むす)ぶ, 結(むす)び, 結(むす)ばる, 結(むす)ばわる, 結(むす)ぼる, 結(むす)ぼうる, 結(むす)ぼれる, 結(ゆ)う, 結(ゆ)い, 結(ゆ)わう, 結(ゆ)わえる, 結(いわ)える, 結(いわ)く, 結(す)く-“to knit a net”, 結(かた)なす-“to gather or tie together into one bunch”, 結(かた)める-“to bind together; to open and read out the content of official documents”, 結(かた)ぬ
|nanori=
}}
Jouyou readings should be handled by a backend data module. Of course, the automatic link formatting algorithm should be disabled if any of the parameters is formatted as a link (e.g. by {{ja-r}}). Wyang (talk) 00:34, 20 May 2017 (UTC)
I've disabled the link-formatting thingy whenever "<span" is found in the parameter, which indicates that {{ja-r}} (or another linking template that adds language and script tagging) has been used. — Eru·tuon01:16, 20 May 2017 (UTC)
@Kc_kennylau As 況(いわ)んや<況(いは)んや - so the algorithm for generating the link(s) would be (1) split by dash; (2) split by the less-than symbol; and (3) analyse parentheses. Wyang (talk) 11:39, 20 May 2017 (UTC)
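The three steps just listed can be sketched as follows. This is a Python illustration, not the Lua in Module:ja; the function name parse_reading and the shape of the returned dictionary are assumptions, and it expects ASCII parentheses, an ASCII hyphen before the gloss, and one kana group per kanji.

```python
import re

# One kanji followed by its reading in ASCII parentheses, e.g. 結(むす)
RUBY = re.compile(r"(.)\(([^)]*)\)")


def parse_reading(entry):
    """Split one reading entry into forms and an optional gloss."""
    # Step 1: everything after the first dash is the gloss.
    forms_part, _, gloss = entry.partition("-")
    forms = []
    # Step 2: "<" separates the modern form from the historical one.
    for form in forms_part.split("<"):
        form = form.strip()
        # Step 3: parentheses hold the kana for the preceding kanji.
        forms.append({
            "kanji": RUBY.sub(r"\1", form),  # drop the parenthesised kana
            "kana": RUBY.sub(r"\2", form),   # drop the kanji, keep the kana
        })
    return {"forms": forms, "gloss": gloss.strip() or None}
```

For the example above, the first form is the modern reading and the second is the historical one; both share the kanji spelling 況んや.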
(unindent)
Nice. After seeing your suggestions, I really like them, @suzukaze-c and @Wyang, particularly since a way to add extra info has been envisioned. My original suggestion mostly has merit for making it easier for editors to add extra info, but if everything necessary is provided for, I actually prefer the single template solution. The code is then also shorter and easier to read without all the pipes and stuff in there. I still wonder how specific things like qualifiers about context of readings, rarity, classical Japanese only, etc. would be put in, and of course about the implementation of Jōyō readings. I do very much like the idea of having the Jōyō status of readings be automatically determined by lookup in a data module. I guess that would be very similar to how {{ja-kanji}} automatically determines the grade (Kyōiku (1–6) / Jōyō / Jinmeiyō / Hyōgai) of the kanji itself, which I also like. One change from Wyang’s example I would like to make is to remove the kanji (it is, after all, always the same kanji). That would also make the input simpler and easier to read:
{{ja-readings
|kanon=けつ
|goon=けち
|kun=むす.ぶ, むす.び, むす.ばる, むす.ばわる, むす.ぼる, むす.ぼうる, むす.ぼれる, ゆ.う, ゆ.い, ゆ.わう, ゆ.わえる, いわ.える, いわ.く, す.く-“to knit a net”, かた.なす-“to gather or tie together into one bunch”, かた.める-“to bind together; to open and read out the content of official documents”, かた.ぬ
|nanori=
}}
I still think the spelling with kanji and the plain hiragana spelling should both be displayed in full, though, when the page is rendered. – Krun (talk) 00:28, 22 May 2017 (UTC)
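Rebuilding both spellings from the dotted input is mechanical, as this Python sketch suggests. The function name expand_reading is invented for the illustration, and it assumes the kanji is always in initial position and that at most one dot appears in a reading.

```python
def expand_reading(kanji, reading):
    """Turn "むす.ぶ" for 結 into ("結(むす)ぶ", "むすぶ").

    The part before the dot is the kanji's reading; the part after it
    is okurigana. A reading without a dot has no okurigana.
    """
    stem, _, okurigana = reading.partition(".")
    with_kanji = f"{kanji}({stem}){okurigana}"
    plain_kana = stem + okurigana
    return with_kanji, plain_kana
```

The first element of the result is in the same parenthesised form as Wyang's example, so the shorter input loses no information.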
Looks great. Question: are there any circumstances in which {{ja-r}} may be invoked with the kanji in a non-initial position? Wyang (talk) 07:13, 22 May 2017 (UTC)
@Wyang I am not aware of such. The only thing I could think of is things like お母さん, but then the かあ part seems to be considered the reading. – Krun (talk) 15:29, 22 May 2017 (UTC)
@Suzukaze-c Pretty good. I particularly like the special formatting for the historical readings. As for the additional layer, it is certainly true that dictionaries will normally not include it (like the one you linked to, which I believe is the 2nd edition of the Daijirin). Wiktionary is special, however, as we are not bound by the same restraints. I also think it’s a very good thing that we can offer easy access to otherwise obscure information. In any case, these readings have already been added to Wiktionary pages, and I do believe they are authentic, if rare. There are several sources for them; this page has a nice overview, with reference to several printed sources at the bottom; see also this one.
I am a little bit concerned with the furigana/ruby though; I think I prefer the full on plain kana spelling accompanied by the kanji-with-kana spelling for several reasons:
1. To have a link to both (the kanji spelling links to the main entry for a word, where the senses, pronunciation, etymology, etc. are to be found, but the plain kana spelling links to the page that gives the different kanji spellings for that reading).
2. With ruby, unfortunately, the kanji-with-kana spelling is not searchable (e.g. 遅れる (okureru) is 遅 followed by おく followed by れる in the HTML text). In my browser (Safari), a page search will find “おくれる”, but not “遅れる”. This is of course a wider issue with our use of ruby, and in cases where multiple kanji are used, often neither the spelling with kanji nor the plain kana can be searched for. It will presumably become less of an issue as our Japanese coverage gets better and we get individual entries for everything (then the headword line has both spellings in full), but it would be nice to help users to make the most of the information we currently have.
3. The kana are more readable when written in full. After all, in this particular page section the kana are all-important, as people are looking in it specifically for the reading; it’s not just a reading aid or extra information in this case.
I would also like to have consistency in the appearance of readings that have okurigana and those that do not, e.g. 行う (おこなう(okonau)) and 行 (くだり(kudari)), rather than having the latter show only the plain kana spelling. We already have it e.g. at 食. I want to make the presence of okurigana explicit like that in every case, as it isn’t otherwise obvious whether there are no okurigana or the information is merely missing. There probably are cases where we cannot be certain for a while what the okurigana are, and not specifying (until someone figures it out) would be desired in those cases. There is always a question of whether the kanji or the kana spelling should come first; I don’t really have a strong opinion on that.
Another thing we need to be prepared for is variations in the okurigana. How should we format e.g. おこなう for 行, which has the variants 行う and 行なう?
Then there are readings that are words normally spelled with two kanji, like the kun-reading さとう for 糖. This is, of course, a word which is normally spelled 砂糖 (including the kanji 糖, but with another kanji as well). Has this word really been spelled with 糖 alone, or do kun-readings of this sort perhaps only serve as glosses or as standard word transformations for reading kanbun and the like? Perhaps you know something about that, @Eirikr, TAKASUGI Shinji, Nibiko?
re: historical readings: Alright, how many levels are there? Is "modern" < "historical" < "ancient" all there is?
re: furigana: Being unable to Ctrl+F ruby is a problem that lies in the very HTML structure of ruby. However, I've changed the way kanji is displayed to be more like 食. I think there is no problem with having okonau listed twice.
Thank you guys for your effort to improve kanji entries. I’m wondering if we could show official readings more clearly, probably with bold letters. Wiktionary sometimes has too much information including archaism, in which case it can be misleading. The current version of 食 erroneously makes you believe 食す(osu) is acceptable in modern Japanese. — TAKASUGI Shinji (talk) 22:36, 22 May 2017 (UTC)
@TAKASUGI Shinji: I wouldn’t say it’s implied that everything on the page is in contemporary use, but I say we should definitely indicate all the Jōyō readings. Perhaps bold would be an appropriate choice, but then just for the kana-only spelling (don’t know about the transliteration, but definitely not the kanji). It would probably also be a good idea to specially indicate readings that only apply to classical Japanese.
@Suzukaze-c: Nice work! I very much like the look of it now. Also, nice touch to add the underline to the transliteration to indicate furigana. Regarding the “ancient” readings: Yes, I think there is only that one extra layer, and it is only when there was originally /kw/, /gw/ before /e/, /i/, /y/ (later simplified to drop the /w/). These are borrowed clusters from Chinese that only occur in on’yomi. The name isn’t anything standard, but I suppose “ancient” will suffice as long as we have an appendix to explain what we mean.
I have two things to add:
First, when the historical reading is the same as the modern reading, I would still like it to be indicated in some way. This helps users and editors to know whether the historical reading is in fact the same or just hasn’t been added, and enables categorization as well.
Second, we need to have a separate transliteration scheme for historical/ancient readings. It has to differentiate between づ(du) and ず(zu) and ぢ(di) and じ(zi), ぢゃ and じゃ, etc. Perhaps Nihon-shiki would be a good basis for that, although perhaps the は-line had better be fa, fi, fu, fe, fo? I guess I’d be okay with it being ha, hi, fu, he, ho, as in Hepburn. Yōon should also be indicated without an extra vowel before the /y/ or /w/, even though all the kana are written full size, e.g. きやう(kyau), くゑ(kwe), きよ(kyo), くゐよく(kwyoku); /ou/ should not yield ō, e.g. きよう(kyou), not kyō. – Krun (talk) 00:12, 23 May 2017 (UTC)
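The scheme proposed above can be sketched as follows. This is only an illustration in Python, not the code in Module:ja: the function name `historical_romaji` and the kana table are assumptions, the は-line follows the "ha, hi, fu, he, ho" option mentioned above, and を is rendered as wo. Yōon written with full-size kana is folded into the preceding syllable, and no long-vowel macrons are produced.

```python
# Sketch of a Nihon-shiki-based romanization for historical kana spellings.
# The table layout and function name are hypothetical, for illustration only.
ROWS = {
    "": "あいうえお", "k": "かきくけこ", "g": "がぎぐげご",
    "s": "さしすせそ", "z": "ざじずぜぞ", "t": "たちつてと",
    "d": "だぢづでど", "n": "なにぬねの", "h": "はひふへほ",
    "b": "ばびぶべぼ", "p": "ぱぴぷぺぽ", "m": "まみむめも",
    "r": "らりるれろ",
}

KANA = {}
for consonant, kana_row in ROWS.items():
    for vowel, kana in zip("aiueo", kana_row):
        KANA[kana] = consonant + vowel
KANA["ふ"] = "fu"  # assumption: は-line as in Hepburn (ha, hi, fu, he, ho)
KANA.update({"や": "ya", "ゆ": "yu", "よ": "yo",
             "わ": "wa", "ゐ": "wi", "ゑ": "we", "を": "wo", "ん": "n"})


def historical_romaji(kana_text):
    """Romanize a historical kana reading without macrons."""
    syllables = []
    for ch in kana_text:
        roman = KANA[ch]  # raises KeyError for kana outside this small table
        prev = syllables[-1] if syllables else ""
        if ch in "やゆよ" and len(prev) > 1 and prev.endswith("i"):
            # Full-size yōon: き + や -> kya, not kiya
            syllables[-1] = prev[:-1] + roman
        elif ch in "わゐゑ" and prev in ("ku", "gu"):
            # Labialized clusters: く + ゑ -> kwe
            syllables[-1] = prev[:-1] + roman
        else:
            syllables.append(roman)
    return "".join(syllables)
```

With this table, づ and ず stay distinct (du vs. zu), and きよう comes out as kyou rather than kyō because vowels are never merged.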
In fact we have so many “Japanese” entries that are actually Old Japanese. Why don’t we separate them using the codes ja and ojp? — TAKASUGI Shinji (talk) 03:14, 23 May 2017 (UTC)
@Suzukaze-c: Re: bolding: We were talking about putting boldface on readings that are found in the Jōyō list, not all readings that are used in the modern language. Even if we went for bolding them, their status should also be indicated more explicitly (it could be an abbreviation or symbol, but it must have a link to an appendix explaining the Jōyō list). Actually, one thing I noticed from your sandbox example is that you’ve marked things as non-Jōyō (which is already done in entries), but I envisioned that we would rather specially indicate readings that are in the Jōyō list; the + in the code was meant to indicate that (a + is weird as a marker for exclusion anyway). For the kanji that have many readings, I’m sure most of the readings are non-Jōyō, and it’s not good to imply that some random readings (maybe ones added later to the page) are in Jōyō because they’re not marked as such. Anyway, @Wyang already suggested above that a data module be used to keep track of the Jōyō readings. We can extract all of them from w:List of jōyō kanji.
Re: modern < historical < ancient: Looks good!
Re: if historical == modern: Hmm, I guess we don’t need to add any new feature for that. Some shorthand display would be possible; many dictionaries don’t indicate the historical spelling if it’s the same or if it only differs in use of small kana vs. full size (e.g. しょう / しよう) – but I guess it’s probably best to just add the historical reading in full even if it’s the same as the modern one, especially if it might have different romanization. Just so you are aware, in case it has any effect on the implementation, there are cases where the modern and historical readings are the same, but there exists a more ancient version that is different, e.g. 鬼 (き(ki) < き(ki) < くゐ(kwi)). This won’t affect anything of course, if we just go for the full display of modern and historical readings regardless of the values.
Re: categorization: Consider 館 (かん(kan) < くわん(kwan)) and 奸 (かん(kan) < かん(kan)). These should both go into Category:Japanese kanji read as かん, but there would additionally be something similar for the different historical readings (e.g. and ). That way one can search specifically for kanji with the historical reading kan without them being conflated with the ones that have the historical reading kwan, as it is currently.
@Suzukaze-c I see you’ve been working on the historical romanization :). Just one more thing: を needs to be wo (not o) in the historical mode. – Krun (talk) 12:59, 24 May 2017 (UTC)
Jōyō readings are now indicated by both an inline note and a hideous yellow background, with the help of data at Module:Sandbox/1. I also added historical romaji according to what you've told me and "historical reading" categories. —suzukaze (t・c) 04:57, 25 May 2017 (UTC)
@Suzukaze-c Cool. We’ll need to organize a bot run through all the kanji entries to switch to the new format. WT:AJA will need to be updated as well. Then we’ll need to add the historical readings and the okurigana dot marking manually, so perhaps we should add maintenance categories, one for readings where the historical form is missing and one for readings that don’t have a dot separator (if the kanji covers the whole reading, i.e. there are no okurigana, the dot should be at the end). Also, is everybody okay with this change (@Eirikr, Hippietrail, Nibiko, Haplology, エリック・キィ, TAKASUGI Shinji, Nbarth, Stephen G. Brown, Wyang, Atitarev, kc_kennylau)? – Krun (talk) 00:24, 28 May 2017 (UTC)
I support this and I think that this is a step towards making Japanese entries easier to edit. @Krun Those weird kun'yomi come from the Unihan database, which seems to treat kun'yomi as a field for giving Japanese glosses for kanji. Theoretically, they could be used in kanbun, but as they are not a part of any standard, I don't think that we should include them. Nibiko (talk) 07:38, 30 May 2017 (UTC)
Deploying the new code
(unindent) @Suzukaze-c I am wondering whether we can move the new code to Module:ja immediately, but keep the old code and just add a switch that would handle the readings as they are handled now if e.g. they start with
@Krun The code already is able to cope with "old" formatting, due to User:Erutuon's work. Adding a category would be very easy. The code could theoretically be deployed now but there is still a small portion of it that may not be good coding (the part marked with TODO: this is probably bad, mod:ja-link should be callable from modules). @Erutuon, could you help? (also maybe review the rest of the code if you want, since you're so clearly more experienced at this stuff than I am) —suzukaze (t・c) 18:44, 1 June 2017 (UTC)
@Suzukaze-c: I'm taking a look at it now, initially trying to figure out the code and making a few small changes. I started a module-callable function in Module:ja-link. Not sure if it will work yet. — Eru·tuon22:24, 1 June 2017 (UTC)
@Krun It seems there is no objection, so I am ready to make the change live, but there is one thing I forgot to ask. Currently the "old" {{ja-readings}} only adds categories for on'yomi, and the way these categories are named is different from what you proposed above (Category:Japanese kanji read as きょう vs. Category:Japanese kanji with reading きょう). How should these differences be reconciled?
Also, are there any kanji readings that exist as both on'yomi and kun'yomi readings? (do we need to add "on" and "kun" into the category name?) —suzukaze (t・c) 21:22, 3 June 2017 (UTC)
@Suzukaze-c Yes, there are definitely readings that exist both as on’yomi and kun’yomi. Those will generally be the shorter ones, such as ひ, いく, まつ, and so on. I think it would be a good idea to categorize the on- and kun-readings separately, e.g. Category:Japanese kanji with on-reading ひ and Category:Japanese kanji with kun-reading ひ. We could also separate them into the different classes of on-readings (goon, kan’on, tōon, sōon, kan’yōon), but in that case we would need duplicate generic on’yomi categories on everything (when one is looking up readings without knowing which type of on’yomi they might be), so maybe that would just make the categories overly complicated? I don’t know, maybe it would be an interesting new way to discover kanji and their different types of readings. I do think separate categories for Nanori will be needed. Also, I noticed that the dot separator is currently in the category names. We’ve never had that before, but it could be interesting, as okurigana are not always obvious and, again, the same kana can be applied differently (e.g. ひる: 昼 and 簸る, both kun-readings). – Krun (talk) 14:53, 4 June 2017 (UTC)
Perhaps we should also keep the current categories (], etc.) to allow more generic lookup as well, for when one doesn’t know whether a reading might be on or kun, or when one is simply looking for homophonous characters. – Krun (talk) 16:16, 4 June 2017 (UTC)
Could you make a list of the types of category names that should be recognized, and an idea of what the category tree should look like? I should be able to create a module with that information. — Eru·tuon02:26, 5 June 2017 (UTC)
The new format looks really nice. Just a small suggestion: perhaps the background colour could be made half of current + white so that it looks more soothing. Wyang (talk) 02:49, 5 June 2017 (UTC)
@Suzukaze-c: I've changed some things in the Sandbox: Still needs some work; I am extremely tired and can barely wrap my head around it right now. – Krun (talk) 01:33, 6 June 2017 (UTC)
@Suzukaze-c I was just trying the new functionality out on a new kanji, 鮑, and I encountered a problem: the reading しおうお is incorrectly romanized as shiōo instead of shiouo, because we forgot to provide for morpheme boundaries. In other places, a dot (full stop) is used for this (e.g. {{ja-r|鮑|しお.うお}}, yielding 鮑(shiouo)), whereas we are using the dot here for the okurigana boundary. Perhaps we should change the separator for okurigana to - (hyphen) so that we can uniformly use . for its existing purpose in transliteration generation? – Krun (talk) 14:41, 7 June 2017 (UTC)
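The effect of the morpheme boundary can be illustrated with a small Python sketch. It is not the actual transliteration code: it assumes the reading has already been romanized letter by letter, and the names `merge_long_vowels` and `romanize_with_boundaries` are hypothetical. Long vowels are merged only inside each dot-separated part, so a boundary between しお and うお blocks the merge.

```python
# Sketch: merge long vowels within morphemes only, never across a "." boundary.
# Input is plain letter-by-letter romaji; this is an illustrative assumption.
LONG_VOWELS = (("ou", "ō"), ("oo", "ō"), ("uu", "ū"))


def merge_long_vowels(morpheme):
    """Apply the long-vowel merges inside a single morpheme."""
    for plain, long_form in LONG_VOWELS:
        morpheme = morpheme.replace(plain, long_form)
    return morpheme


def romanize_with_boundaries(romaji, boundary="."):
    """Merge long vowels per morpheme, then drop the boundary markers."""
    parts = romaji.split(boundary)
    return "".join(merge_long_vowels(part) for part in parts)


print(romanize_with_boundaries("shiouo"))   # no boundary marked
print(romanize_with_boundaries("shio.uo"))  # boundary between しお and うお
```

If the same dot also marks the okurigana boundary, the function cannot tell the two uses apart, which is the conflict described above.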
re: "if the kanji covers the whole reading, i.e. there are no okurigana, the dot should be at the end": Hmm, IMO it is kind of illogical. It seems weird to me. —suzukaze (t・c) 03:09, 8 June 2017 (UTC)
@Suzukaze-c, Krun: Should there be "modern" categories, or is there another way for someone to find current readings as opposed to historical ones? — Eru·tuon22:18, 8 June 2017 (UTC)
Ohh. Or is it that if the reading category is not qualified as "ancient" or "historic", it is modern? — Eru·tuon22:27, 8 June 2017 (UTC)
@Erutuon: Yes, that’s it. You raise a valid point; perhaps it’s not obvious that they are modern readings.
@Suzukaze-c: re: dot/thingy at the end for readings without okurigana: Well, it’s perfectly logical if you consider that the part before the delimiter is the part covered by the kanji, and so if there is nothing after the kanji, everything comes before the delimiter. I was particularly concerned with uniform appearance, and putting the delimiter at the end accomplished that, namely in the form of identical formatting (underlining) of the part covered by the kanji. However, I am not completely satisfied. Like you, I do find it weird to see the hyphen (or dot) at the end; it looks like there is something missing. I don’t have anything against dropping the delimiter per se, when there are no okurigana, as long as it is unambiguous and displayed consistently, but I can’t see how it could be unambiguous with our editing model. We already have an abundance of existing readings without delimiters, many of which are readings that do have okurigana. Even if we go through all the kanji entries and standardize, there are always new editors who put in readings without necessarily knowing such details, and it’s useful to be able to do so (just like adding an on-reading without knowing whether it’s kan’on, goon or kan’yōon, or even tōon, etc.). Therefore, I think we do need to have some sort of explicit marker for this kind of reading. I haven’t added it for on-readings however, as I don’t feel that would make sense; on-readings can never have okurigana anyway.
@Suzukaze-c: Also, there are some small issues with the module: 1. the display order of the reading categories is getting messed up; it should always be: goon, kan’on, tōon, kan’yōon, on (unclassified), kun, nanori; 2. jōyō matching has stopped working for readings with okurigana after the delimiter was changed. – Krun (talk) 00:59, 9 June 2017 (UTC)
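One way to guarantee the requested display order is to sort the reading types against an explicit list. The following Python sketch only illustrates that idea; the key names and the function `sort_reading_types` are assumptions, and types missing from the list (such as sōon, whose position has not been decided) are simply placed at the end.

```python
# Sketch: fixed display order for reading types (key names are hypothetical).
DISPLAY_ORDER = ["goon", "kan'on", "tōon", "kan'yōon", "on", "kun", "nanori"]


def sort_reading_types(types):
    """Return the reading types in the fixed display order.

    Types that are not in DISPLAY_ORDER keep their relative order
    and are placed after all known types.
    """
    rank = {name: index for index, name in enumerate(DISPLAY_ORDER)}
    return sorted(types, key=lambda t: rank.get(t, len(DISPLAY_ORDER)))
```

For example, an entry whose types arrive as goon, kun, kan'on would be displayed as goon, kan'on, kun.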
@Krun: The readings should display in that order now (or else my method of maintaining the order isn't working). Though, where should sōon go? — Eru·tuon03:11, 9 June 2017 (UTC)
@Erutuon: It’s not working here, at least: 齡 (the order shown is Goon, Kun, Kan’on). I’ve even tried purging the page’s cache. Also, 初 shows Nanori before Kun. – Krun (talk) 00:11, 10 June 2017 (UTC)
@Suzukaze-c, Erutuon Something’s up with the reading は (e.g. in 葉 and 母): it’s being romanized as wa. Also, the Jōyō reading matching doesn’t work when the trailing delimiter is used. – Krun (talk) 01:18, 11 June 2017 (UTC)
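A possible way to make the Jōyō matching independent of the delimiter is to normalize each reading before looking it up. The Python sketch below is an assumption about how that could work, not the code in the module: the data layout, the tiny sample table and the function names are all hypothetical, and the table lists only a few kun-readings for illustration.

```python
# Sketch: delimiter-insensitive Jōyō lookup.
# JOYO_SAMPLE is a tiny illustrative sample, not the real data module.
JOYO_SAMPLE = {
    "食": {"た.べる", "く.う"},
    "葉": {"は"},
}


def normalize_reading(reading):
    """Use "." as the okurigana delimiter and drop a trailing delimiter."""
    return reading.replace("-", ".").rstrip(".")


def is_joyo(kanji, reading):
    """Check a reading against the sample table after normalization."""
    return normalize_reading(reading) in JOYO_SAMPLE.get(kanji, set())
```

With this normalization, た-べる and た.べる match the same table entry, and は- (with a trailing delimiter) matches は.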
The kun+trailing delimiter problem is fixed. The romaji, on the other hand, is a more complex issue...... I'm looking into it. —suzukaze (t・c) 02:13, 11 June 2017 (UTC)
Module:inflection was invented as a universal platform for inflection modules. It was first implemented and used for Uzbek nouns, and then implementation was started for Russian nouns. But other users asked me to stop that implementation, so feel free to do anything you want with them. Vitalik (talk) 23:27, 19 May 2017 (UTC)
Module:table tools does not only handle footnotes. It contains functions that linkify and format comma-separated lists, and footnotes are only a part of that. Overall, it's meant to be a set of tools for making inflection table templates without dedicated modules. --WikiTiki8914:35, 18 May 2017 (UTC)
A bot can do it, but it has to be logged in as an administrator. Not that it matters, because logged events cannot be flagged as bot edits in most cases (so will show in RC). Also, not all of these categories are empty, wouldn't it be easier to clean them up before deleting? - DaveRoss14:28, 18 May 2017 (UTC)
@TheDaveRoss: Do you think you can make the bot delete ONLY the categories that don't have any pages? This is just to be sure, because actually I believe I successfully emptied all the categories now. I did null edits where needed, and also removed a few entries that were categorized manually (i.e., without using templates).
Some categories have a reported number of entries like "Translations to be checked (Tupinambá) (0 c, 1 e)" (emphasis mine) but they are actually empty. --Daniel Carrero (talk) 14:56, 18 May 2017 (UTC)
Yes, I can check if a category has members before deleting it. I'll delete all of the empty ones first and then check back. - DaveRoss15:02, 18 May 2017 (UTC)
@TheDaveRoss, Daniel Carrero: Is there a way to re-add the boxes that showed the newest and oldest additions to the categories? I was checking French translations from oldest to newest, but that information is no longer available to me. Has it been lost altogether with the category name changes? If so, I would have voted against renaming them had I known this would be a consequence.... Andrew Sheedy (talk) 03:03, 8 June 2017 (UTC)
@Andrew Sheedy See Category:Requests for review of French translations. I re-added the boxes that showed the newest and oldest additions to the categories. I apologize since I'm doing this now, this could have been done before. However, because of the category move, technically all entries qualify as "Recent additions to the category". This will be true until 10 new entries get added in the category. --Daniel Carrero (talk) 06:58, 8 June 2017 (UTC)
Thanks. I figured the information about the oldest members of the category would be lost, which is a shame. Oh well, hopefully I'll get them all done anyway. Andrew Sheedy (talk) 02:57, 9 June 2017 (UTC)
This template is failing in the Compounds section of 一 by running out of Lua memory. It is powered by Module:zh and Module:columns. I tried turning off sorting (by previewing the page while editing Module:zh), but that doesn't fix it. It does have a huge number of links: 1275. — Eru·tuon00:27, 19 May 2017 (UTC)
I've created a JavaScript function, User:Erutuon/simpleTranslations.js, to quickly convert {{t}} to {{t-simple}} for Latin-script translations with no parameters besides lang and term. (The function creates a link just above the edit box to allow you to trigger the function, if it finds the Translations header on the page.) I had been using regex in gedit (which I had to retype each time); might as well automate it. This will allow quicker fixing of Lua memory errors related to huge translations sections. I just had to do this in I, which recently developed a Lua memory error. So I decided to make a function. — Eru·tuon03:44, 20 May 2017 (UTC)
Edit tag for missing headword template
Would it be possible to create an edit tag for missing headword templates? I've seen many edits like this, which have a definition and no headword template. They would be easier to find if they were tagged.
Perhaps the tag could be applied if there is a header containing one of the allowed parts of speech, but something other than a newline and a template is found after it, or if there is a POS header, newline, and #, as is true in the edit I linked above.
They use regular expressions. It's technically possible, but the regular expression would have to match on a location where a headword-line template should be placed, but isn't. Both of these things run into problems: where does a headword-line template go, and what is a headword-line template to begin with? The answer to the first is a rather long list of valid POS headers, which makes for a very long regular expression. The second is even harder to answer, as it's essentially an open set; people create new headword-line templates all the time. At best, it could match for a template immediately following one of a long list of headers. —CodeCat01:34, 21 May 2017 (UTC)
I think the location of the headword template is defined: one line below the POS header (though rarely someone might place it two lines below, which is incorrect). Yes, there are lots of headword templates. Checking for any template in that position would be easiest. (Could compile a full list from all the templates listed in Category:Headword-line templates and its subcategories, but I imagine that'd be a very long list, and quite frequently people don't add a category when creating a new template.) — Eru·tuon01:47, 21 May 2017 (UTC)
So, it would be impossible to generate a list of all the POS headers in use, and it is probably impossible to generate a list of all headword templates. But at the very least the filter could search for more commonly used POS headers, and then search for any template at the beginning of the first or second line below the header, and add a filter if there isn't a template there. That would be incomplete, but might not result in any false positives. — Eru·tuon20:24, 21 May 2017 (UTC)
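The check described above can be sketched outside the abuse filter as a small Python function. This is only an illustration under stated assumptions: the POS list is a short sample rather than the full set of allowed headers, and the function name `missing_headword_sections` is hypothetical. It reports every POS header whose first non-blank following line does not begin with a template.

```python
import re

# Sketch: find POS headers that are not followed by a template line.
# SAMPLE_POS is a short illustrative list, not the full set of allowed headers.
SAMPLE_POS = ("Noun", "Verb", "Adjective", "Adverb", "Proper noun")
HEADER = re.compile(r"^(={3,5})\s*(%s)\s*\1$" % "|".join(SAMPLE_POS))


def missing_headword_sections(wikitext):
    """Return the POS headers whose next non-blank line is not a template."""
    lines = wikitext.splitlines()
    problems = []
    for index, line in enumerate(lines):
        match = HEADER.match(line.strip())
        if not match:
            continue
        # First non-blank line after the header (empty string if none).
        following = next(
            (later.strip() for later in lines[index + 1:] if later.strip()), ""
        )
        if not following.startswith("{{"):
            problems.append(match.group(2))
    return problems
```

Because any template counts as a headword line here, the check is incomplete, but it should rarely flag an entry that is actually correct.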
It must be possible to generate a list of POS headers (or a list of L3 headers from which non-POS headers could be sifted out manually), since lists of headers have been generated before and used to clean up errors like "Etmology". The list of POS headers would be large, but probably not larger than 300, especially once errors (like "Nouns" for "Noun") are removed. Then we could update WT:POS, with the understanding that new headers are not forbidden, but should be discussed. An edit filter checking so many things might be too expensive, though. - -sche(discuss)22:24, 21 May 2017 (UTC)
Another idea: listing all the non-POS headers and assuming something is a POS header if it isn't a level 2 header (language) and isn't one of the non-POS headers? That would only be tenable if the list of non-POS headers is smaller than the list of POS headers. — Eru·tuon01:17, 22 May 2017 (UTC)
I edited the shortcut WT:POS to point to WT:EL#Part of speech, which is the actual voted and approved policy. We do have a comprehensive list of parts of speech. (New parts of speech may be added depending on the needs of each language, of course.) --Daniel Carrero (talk) 01:33, 22 May 2017 (UTC)
For anyone curious about why this is a non trivial thing to analyze I suggest you download a dump and try to do it yourself. DTLHS (talk) 01:40, 22 May 2017 (UTC)
A possible approach: all headword templates should add a category in one of the following formats: lemmas or non-lemma forms. It should be possible to use the regular search to find entries that have no headword templates at all. As for entries with headword templates for some, but not all language sections: do the dumps include categories generated by templates, or are they all pre-transclusion? If they do have such categories, it would be a simple matter of comparing category names with L2 headers. If not, it might be possible to go through the subcategories of Category:Lemmas by language and Category:Non-lemma forms by language and create a list of language headers to compare with the list of L2s in the dumps. That still won't find cases where there's at least one headword template in the language section, but one or more headword templates are missing. It's a start, though. Chuck Entz (talk) 06:04, 22 May 2017 (UTC)
There is a category dump, but it's separate from the main dump, is in SQL instead of XML and is somewhat hard to work with. DTLHS (talk) 06:24, 22 May 2017 (UTC)
This filter might do the trick, it looks for the "known" POS headers and checks to see if there is any sort of template used before the definition line.
Not sure how many false positives this would result in, but we can try it out and if it proves unhelpful amend or disable it. Added it as AF #68, tagging with "no head temp". - DaveRoss15:54, 22 May 2017 (UTC)
Here is the log. It caught some edits to Wiktionary: pages; it should only look in the main-namespace and maybe also Reconstructions:. It also caught e.g. diff not because of any error in that edit, but because the page elsewhere has an untemplatized headword line, in the Scots section. That's probably OK, if we treat the log as a source of entries to clean up rather than edits to disallow. - -sche(discuss)05:59, 23 May 2017 (UTC)
Ahh, so the filter checks the entire entry, not just the part that has been changed. I think it should be restricted to namespaces that contain entries: main and Reconstruction. The pages in the Wiktionary namespace are just noise to be ignored. Some Appendix pages contain entries (for instance, Appendix:Quenya/Elda), but most probably don't, so it would be best to exclude them. — Eru·tuon06:45, 23 May 2017 (UTC)
DTLHS fixed the namespace (currently it is NS:0 only). I actually thought that looking at the whole entry (instead of newly added lines) was a feature, since it gives an opportunity to fix old problems as well as new. If that is not ideal it can check only new lines instead. - DaveRoss11:49, 23 May 2017 (UTC)
I recently added "Category:English words prefixed with de-" to the page for "decant", but decant doesn't now show up on the list of words. Where is the documentation for this particular aspect of working with categories? Thanks! — This unsigned comment was added by Raspberrybeloved (talk • contribs).
These categories are automatically added by etymology templates like {{affix}}, {{prefix}} and such, so you never need to add them manually. —CodeCat14:17, 22 May 2017 (UTC)
The prefix de- was added to decant while the word was in Latin. We only add "prefixed with" categories when the prefix was added in the current language (English). So decant should not be in the category English words prefixed with de-. — Eru·tuon01:33, 23 May 2017 (UTC)
I'm working on a module to grab the etymology of a term, based on {{desctree}}, called {{termetym}}. The idea is to nest etymologies to reduce duplication and make entries more accurate and consistent. It's pretty awful how so many parent and child entries have conflicting etymologies. I'm running into a looping issue right now, though. Could someone have a look and see what's wrong? You can find an example here: duvet. --Victar (talk) 01:11, 23 May 2017 (UTC)
I have an issue that some children language may have more than 1 word/form (which may or may not have same meaning). How do you handle this? --Octahedron80 (talk) 03:43, 23 May 2017 (UTC)
That might be somewhat complicated. It might require compiling a full chain of derivation, listing the proximate relationships between each word in the chain, and then a function to determine the relationship between the word of the current entry and each word in the chain of derivation. Two examples: ❶ If an English word is inherited from a Middle English word that's inherited from an Old English word, the English word is inherited from the Old English word. ❷ But if an English word is borrowed from a French word that's inherited from a Latin word, the English word is derived and not inherited from the Latin word. The module has to somehow distinguish those two cases. Not sure how to do that. But it can't be done by checking if Latin is an ancestor of English. That wouldn't work in the case of ❸ a French word borrowed from a Spanish word that is inherited from a Latin word. There, the French word is derived and not inherited from the Latin word, even though Latin is an ancestor of French. — Eru·tuon05:24, 23 May 2017 (UTC)
I did a simple mw.ustring.gsub on the results, which should remedy that, although now you have to enter {{termetym|en|fr|foo}}. @Erutuon you want to have a look and see if it checks out for you? --Victar (talk) 05:50, 23 May 2017 (UTC)
I believe there are two schools of thought on the use of {{inh}}. Some only use it for the first derivation in the etymology. Others use it for every inherited step. Which is correct, I don't know. If we did the latter, all you would need to do is only start replacing {{inh}} with {{der}} after the first instance of {{bor}} or {{der}}. We'd also have to add a |noinh= parameter which is a bit annoying. --Victar (talk) 06:16, 23 May 2017 (UTC)
I hadn't heard of this other school of thought, in which only the relationship to the nearest ancestor word counts as inheritance. I'm doubtful. Where did you find this idea expressed? — Eru·tuon06:49, 23 May 2017 (UTC)
I've never heard anyone espouse the "first-only" school of thought, and I certainly can say that it is not the intended usage of the template (no offense). Inheritance should be shown all the way down where accurate (I'm sure @CodeCat can back me up on this). —JohnC506:55, 23 May 2017 (UTC)
LOL, oh man, the pitchforks came out. No need to freak out. I can program the latter, no problem. --Victar (talk) 07:26, 23 May 2017 (UTC)
@Victar: There is none. You have to write out each alternative separately and put it in a separate instance of the function. — Eru·tuon 17:42, 23 May 2017 (UTC)
Lua does not have regexes; it has patterns, and patterns lack any equivalent to (x|y|z). —JohnC5 17:42, 23 May 2017 (UTC)
That's pretty limiting of Lua. Fixed one of them, but still need to figure out how to replace this one:
local pattern = ".*?(\{{(bor|der).*)"
local match = mw.ustring.match(etymology, pattern, match)
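The workaround suggested above (handle each alternative separately, then combine the results) can be illustrated with a language-neutral sketch. It is written in Python rather than Lua and uses plain substring searches, so it is not a drop-in replacement for the pattern above; the function name `tail_from_first` is hypothetical. It returns the text from the earliest {{bor or {{der onward.

```python
# Sketch: emulate the alternation (bor|der) by searching for each
# alternative separately and keeping the earliest hit.
def tail_from_first(text, prefixes=("{{bor", "{{der")):
    """Return text starting at the earliest occurrence of any prefix.

    Returns None when none of the prefixes occurs in the text.
    """
    positions = [
        position
        for position in (text.find(prefix) for prefix in prefixes)
        if position != -1
    ]
    if not positions:
        return None
    return text[min(positions):]
```

The same approach carries over to Lua patterns in principle: run one search per alternative and compare the start positions of the results.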
But yeah, if this goes well and people agree to this, then I'd love to see this integrated into {{bor}}, {{der}} and {{inh}}. That would also simplify the logic as well ({{inh}} maintains inheritance, {{bor}} and {{der}} do not. All of them replace {{bor}} with {{der}}). —JohnC503:56, 24 May 2017 (UTC)
@Victar: I've fixed it. You were calling preprocess on each recursive call, so it was rendering the html for the internal calls. This way, they weren't editable like we wanted. Now, it only preprocesses once before returning. I also made it more efficient: you only need to run all the template language manipulation on the top call, not all the recursive sub-calls. Please check that everything is working! —JohnC505:54, 24 May 2017 (UTC)
Okay, the categories look good. But what about an English word inherited from Middle English, borrowed from French, inherited from Latin? I think an example of that is peace. Is there a way the module can be made to handle that? — Eru·tuon07:27, 24 May 2017 (UTC)
Well from the English pages perspective, that would be English inherited from Middle English and then derivation the rest of the way. The fact that Middle English borrowed from Old French and Old French inherited from Latin is irrelevant to the English entry. I've reworked peace to show this. The template would handle this correctly. —JohnC507:41, 24 May 2017 (UTC)
Okay, I don't understand how it can handle that example, since it either keeps all {{inh}} templates or converts them to {{der}}. Am I misunderstanding how it works? — Eru·tuon07:57, 24 May 2017 (UTC)
@Erutuon: So I should clarify the possible distributions of the templates. If I were to make a pseudo-regex for the possible orders of templates ({{inh}} = i, {{der}} = d, {{bor}} = b, {{cal}} = c) within an etymology section these would be all of them:
An etymology containing inheritance: i+d* (at least one {{inh}} followed by any number of {{der}})
An etymology containing borrowing: bd* (only one {{bor}} followed by any number of {{der}})
An etymology containing calquing: cd* (only one {{cal}} followed by any number of {{der}})
An etymology containing derivation: d+ (at least one {{der}})
You cannot have a section like iidid or iibdd. These would be considered ill-formatted according to the intended usages of the templates. I also noticed we need to add coverage for {{cal}} and all the aliases of these templates.
After writing all of this, I realized that you might have been asking how, when transcluded, peace would have a distribution of ibi before processing, which would get processed to idi. I'll look into fixing this, but it shouldn't be too bad. I may not get to this immediately though. —JohnC514:54, 24 May 2017 (UTC)
This is all very complicated, and I realized peace may be a bad example. The first "inherit" would be indicated using {{inh}}, while the rest of the derivation (bi) would be handled by {{termetym}}. And hypothetically each entry in the chain of derivation would have {{termetym}}, and the Middle English entry would turn the {{inh|fr|la}} into {{der|enm|la}}, so there might not end up being a problem after all. I was thinking of hypothetical scenarios: iibi (English word inherited from Old English, borrowed from Latin, inherited from Proto-Indo-European). Not sure if that actually occurs. (I mean, it wouldn't occur in one etymology section; it might occur if you were transcluding the etymology from each item in the chain.)
@Erutuon: I think, like me, and as JohnC5 and my edits can testify, you might be overthinking it. You're never going to have an entry that is iibi because it should always become iidd from the perspective of the source. As soon as the chain hits a {{bor}}, that step in the chain and the rest going forward become {{der}}. --Victar (talk) 18:41, 24 May 2017 (UTC)
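The rule described in this thread can be written down as a short Python sketch. It is an illustration of the logic only, not the code of the module behind {{termetym}}; the function name `normalize_chain` and the representation of a chain as a list of template names are assumptions. The first step keeps its own type, inheritance continues only while every earlier step was also {{inh}}, and everything else becomes {{der}}.

```python
# Sketch: rewrite a chain of proximate relationships from the point of view
# of the entry at the start of the chain. Step names are template names.
def normalize_chain(steps):
    """Turn a chain such as inh, bor, inh into inh, der, der."""
    result = []
    inheriting = True  # True while every step so far has been "inh"
    for index, step in enumerate(steps):
        if index == 0:
            result.append(step)        # first step keeps its own type
        elif inheriting and step == "inh":
            result.append("inh")       # unbroken inheritance continues
        else:
            result.append("der")       # anything later is plain derivation
        if step != "inh":
            inheriting = False
    return result
```

Under this rule the outputs always fit one of the four shapes listed above (i+d*, bd*, cd*, d+): the chain ibi becomes idd, and iibi becomes iidd.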
Still, using a parameter |noinh=1 seems messy. The module should be able to tell on its own when inheritance should be changed to derivation, because it is completely predictable. Editors, on the other hand, are prone to error. I like the idea of using a code to represent the chain of derivation. There are difficulties, though. I'm considering the idea of a template showing the proximate relationships between words in the chain, but am not sure how the parameters would be structured. — Eru·tuon18:04, 24 May 2017 (UTC)
@Erutuon: Yeah, |noinh=1 is a bit annoying, but the only time you would ever use it is after a {{bor}} and {{der}}, so it's pretty straightforward. Ideally, it would be great if we could simply use |ety=1 within {{bor}}. --Victar (talk) 18:41, 24 May 2017 (UTC)
Hmm, so essentially to make this automatic, {{termetym}} would have to communicate with the etymology template that comes before it, and that is impossible. Humph. — Eru·tuon19:01, 24 May 2017 (UTC)
@Chuck Entz: Just needed to remove the extra "a" from the Latin etymology. More to the point though, is it functioning how you would hope/expect? --Victar (talk) 05:39, 26 May 2017 (UTC)
@JohnC5, Chuck Entz, CodeCat Nothing truly to do with {{termetym}}, but I just cleaned up peacock and it got me thinking again about how to deal with {{compound}}. Should it just actually end the etymology, requiring people to click on either element? In this case, I only followed the tree up through the first element, since the second is really just a meaning intensifier. Or is this just an example of how it should be left to the editor's discretion and there is no one standard? --Victar (talk) 16:17, 24 May 2017 (UTC)
There are always going to be things like rebracketing and backformation to mess us up (not to mention calques), and complex compounds with lots of potential derivation chains. Perhaps we need parameters to control whether to follow the derivation of specific morphemes. Chuck Entz (talk) 04:56, 26 May 2017 (UTC)
I have a suggestion. To me, the name of the template and the modules are somewhat inscrutable. It would make more sense to me if the template were called {{getetym}}. That is a short description of what the template does: grab an etymology from another entry's etymology section. termetym sounds like it means etymology of a term, which could describe an entire etymology section, and it's unclear, from the name, what the template does to or with the etymology section. — Eru·tuon21:00, 24 May 2017 (UTC)
Hmm, I'm not a fan. "get" is associated with functions and not in keeping with other template naming schemes. {{termetym}} was fashioned after {{desctree}}. We could use {{etym}}, instead of it being a redirect to {{etyl}}, or we could also make a shorter redirect, like {{tetym}}, {{tety}} or the reversed, {{etyt}}. --Victar (talk) 21:17, 24 May 2017 (UTC)
This sounds very promising, but also very expensive in terms of memory. Would it work on a page like mole, which has a lot of etymology sections in several language sections? Can it handle the fact that one English etymology section refers to Spanish mole, which itself has multiple etymology sections? (Perhaps it could use anchors similar to senseids?) Will it use so much memory that it breaks the page? - -sche(discuss)05:54, 25 May 2017 (UTC)
Answers to two of your questions: It can handle there being several language sections, each with its own Etymology section. (That can be seen in the demonstration page Appendix:duvet.) But it can't handle multiple etymologies in the same language section yet. — Eru·tuon06:04, 25 May 2017 (UTC)
Welsh singulative parameter
Could someone who's better at editing templates than I am please add (1) options for 1=m-p, 1=f-p, 1=f-m-p, 1=m-f-p (also for g= and g2=) to {{cy-noun}}, and (2) the function that 2= displays "singulative" instead of "plural" whenever 1= is set to one of the plural options? See abwyd for what I'd like the end result of
Very oddly designed template. It uses an instance of {{head}} for the first form and then manually tags the rest of the forms. — Eru·tuon03:40, 24 May 2017 (UTC)
I would try to do what you ask, but I don't know how to make sure the acceleration-related HTML stuff continues to work. — Eru·tuon03:54, 24 May 2017 (UTC)
Hmm, maybe it would be better for someone to make a module for Welsh headword lines. Unfortunately, I'm not the one to do that. Would anyone like to try? —Aɴɢʀ (talk) 09:23, 24 May 2017 (UTC)
I've started a new template {{cy-noun/new}} that shows "singulative" when the gender is set to m-p or f-p. It doesn't have accelerated entry creation (green links), though. —Aɴɢʀ (talk) 14:30, 2 June 2017 (UTC)
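The label switch requested above is a small piece of logic. Here is a minimal sketch of it, not the code of {{cy-noun}} or {{cy-noun/new}}; the function name is hypothetical, and the set of plural gender codes is taken from the request (m-p, f-p, f-m-p, m-f-p).

```python
# Gender codes that mark the headword itself as plural (from the request above).
PLURAL_GENDERS = {"m-p", "f-p", "f-m-p", "m-f-p"}


def second_form_label(gender):
    """Return the label to show before the form given in 2=."""
    # A plural headword lists its singulative; any other headword lists its plural.
    return "singulative" if gender in PLURAL_GENDERS else "plural"
```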
bot problem
SemperBlottoBot hasn't worked since they changed http to https.
I've had a go at updating the bot software and now get this error message when the bot runs:-
File "C:\<whatever>\pywikibot\data\api.py", line 2560, in getCookie
prefix = login_result
@Barytonesis: Done. {{unk}} is very helpful for categorization. It should be used on entry pages in conjunction with a source that cites the etymology as unknown. Otherwise {{rfe}} should be used instead. --Victar (talk) 00:22, 24 May 2017 (UTC)
@Barytonesis: Oh, you're talking about the template name. No, {{unk}} is a good name, and in line with {{der}}, {{bor}}, etc. I do wonder if we should be using {{etyl|und}} or {{der|lang|und|-}} instead, or have {{unk|lang}} be a redirect to the former. --Victar (talk) 22:33, 25 May 2017 (UTC)
People are free to use either the full name or the shortcut. If we don't want people to use the full name, we shouldn't have it. —CodeCat22:54, 25 May 2017 (UTC)
I don't see it mentioned anywhere that {{unknown}} is not to be used. The fact that it's the name of the template moreover invites people to use it. —CodeCat23:41, 25 May 2017 (UTC)
@CodeCat: It seems you missed the fact that I just moved {{unk}} to {{unknown}}. No {{unknown}} previously existed. I was referring to {{der}} which cites the usage der, inh and bor. You can write out {{mention}} and {{link}} as well but there it is also not the recommended usage. --Victar (talk) 01:43, 26 May 2017 (UTC)
@Victar: I don't see anything on the template documentation page for {{der}} that says you shouldn't use the full spelling {{derived}}. Only thing is that you should use {{inh}} or {{bor}} whenever possible, to make the derivation more specific. — Eru·tuon02:25, 26 May 2017 (UTC)
Exactly that. Also, if you ever do use {{borrowing|en|fr|duvet}} or {{mention|en|apple}}, rest assured, someone like @Angr is going to come along and change it to {{bor}} or {{m}} and probably write something on your talk page. --Victar (talk) 02:35, 26 May 2017 (UTC)
(The template can perhaps do most (if not all) of its work by extracting all the headword templates (tashkil forms, manual transliterations, etc.) on the page, and hence be parameter-less.) Wyang (talk) 23:12, 24 May 2017 (UTC)
Sounds interesting. I've been struggling to think about how the consonants ظ and ع are pronounced. (Also, I've been thinking of an Arabic script for German and related Germanic languages and dialects.) --Lo Ximiendo (talk) 23:55, 24 May 2017 (UTC)
I started a module that generates the Arabic pronunciation from the transliteration: Module:ar-pronunciation. It's not quite ready; it doesn't show stress, or transcribe ـَة(-a) as /ah/, and it might not be able to handle phrases. — Eru·tuon01:57, 25 May 2017 (UTC)
@Wyang: Having the module extract voweled forms from the headword templates sounds great, if we can make it work when there are multiple Pronunciation sections. (It would also be a great feature to have {{grc-IPA}} extract macroned forms from the headword template.) — Eru·tuon18:50, 25 May 2017 (UTC)
As far as presentation is concerned, I would format the examples as shown on the right. This is the way I typically format dialectal pronunciations in English entries. I admit it's repetitive to have two of the prefix IPA (key). — Eru·tuon18:58, 25 May 2017 (UTC)
The pronunciations are for MSA, aren't they? And if you wanted to do Classical, then wouldn't you have used ? It would also be ideal if it could do syllabification. --WikiTiki8921:15, 25 May 2017 (UTC)
Yeah, but MSA has a bunch of different regional pronunciations. Again, you can change it. — Eru·tuon21:18, 25 May 2017 (UTC)
Yeah, but there's a sort of "Standard MSA". And the /g/ pronunciation is heavily marked as Egyptian. And I did change it. --WikiTiki8921:45, 25 May 2017 (UTC)
At least the pronunciations /ʒ/ and /dʒ/ are equally standard. You'll never hear a Syrian or Lebanese person use /dʒ/ instead of his native /ʒ/. It is also rather rare for northern Egyptians to use /ʒ/ or /dʒ/ instead of their native /g/, but that does happen. Kolmiel (talk) 11:39, 26 May 2017 (UTC)
@Erutuon: Thanks! I wasn't aware Module:ar-pronunciation exists, but it seems like a pretty good start. The format is amenable to change, and I think putting the IPA tag in front of the individual pronunciations looks more aesthetic. Re headword templates: An example of such parsing is Module:th-headword and Module:th-pron―which do it in the reverse order, i.e. the headword template interprets input in the pronunciation template. There is probably a way around multiple pronunciation sections, but it will require some investigation. Wyang (talk) 21:47, 25 May 2017 (UTC)
I will join the efforts to make Arabic transcriptions work. It may never be perfect because of the lack of references and because of dialectal differences, even within MSA pronunciations. There are also variants and different styles, but we can agree on what and how we transcribe. --Anatoli T.(обсудить/вклад)22:24, 25 May 2017 (UTC)
Yes, I can't select any of the tabs. The only language I can see or edit (without opening the entire page) is the top language, usually English. I'm guessing it only affects the Firefox browser. —Stephen(Talk)05:11, 25 May 2017 (UTC)
Rather oddly on here, the comparative and superlative degrees of Latin adjectives (e.g. laetus) are generally neither listed in the headword line (even though the various Latin adjective headword line templates, such as {{la-adj-1&2}}, do support them) nor the declension table itself but are simply given below the table. The consensus here is to move to the headword line comparatives and superlatives that are already there below the table. Would someone mind having a bot do that for us? Esszet (talk) 23:43, 24 May 2017 (UTC)
PS. What kind of permissions/features do I need to activate in sv.wikt to get the same possibilities?
@Moberg: Nothing happened. Judging by how the Wikidata business has gone so far, the process will be entirely passive on our part, and they will not consult with us about what will happen and when. —Μετάknowledgediscuss/deeds17:40, 26 May 2017 (UTC)
It may or may not, I am not sure at all. @Lea Lacroix (WMDE) At which point will we have Wikibase Client and arbitrary access enabled? Is it worth requesting it for testing purposes? --Vriullop (talk) 06:12, 2 June 2017 (UTC)
I am less concerned about the unilateral project that is being pushed on Wiktionary communities and more concerned about enabling local communities to decide how they want to use the Wikidata structure and data. - DaveRoss11:47, 2 June 2017 (UTC)
Hello, thanks for your questions. Enabling arbitrary access on Wiktionary is the next step after enabling sitelinks for non-main namespaces. We didn't plan anything yet, but since several users from English Wiktionary asked us to enable it, we may adapt our schedule to allow you to try it soon :) Of course, we will take your community vote into account; if the result is negative, we won't deploy anything. If you're interested in trying arbitrary access on English Wiktionary, then we will make the necessary changes so you can include Wikidata data in your pages.
That doesn't mean that we will force you to use the data. That means that you will be able to "embed" some information stored in Wikidata, using a simple code such as {{#statements:part of|from=Q9264}}. The community will remain totally free to decide where, when, and for which uses you want to use data. We will allow the possibility to do it, nothing more.
Of course, we can also deploy it on demand for other Wiktionaries.
About "installing Wikidata", we should talk about what you mean exactly. For enabling sitelinks, arbitrary access, etc. there is no need to install something. It's not a new database, it's about improving the code of the Wiktionaries to allow the access I describe above.
By "improving the code" do you mean installing the Wikidata, Wikibase and DataValues extensions? I think that is what is meant when we speak of installing Wikidata. - DaveRoss14:18, 2 June 2017 (UTC)
@JohnC5: I guess so. I didn't want to do that because I thought maybe there was a reason for putting in "comparable" rather than "comparative". — Eru·tuon23:40, 25 May 2017 (UTC)
I see it only has a few entries but I'm not clear on which category or module needs to be edited per entry in order to actually empty this maintenance category. —Justin (koavf)❤T☮C☺M☯01:24, 26 May 2017 (UTC)
The thing to do is update Module:scripts/data so that each character is included in the range of one of the scripts on that page. :) A couple of the characters are Thai; it seems our current 'range' for Thai is too narrow. - -sche(discuss)05:41, 26 May 2017 (UTC)
I think the problem is from "findBestScript" in my Module:mul-letter because "mul" has not been assigned EVERY available script; it may be solved by putting a script code into the mul-letter template. But I wish there were a better solution for "findBestScript" with "mul" that needs no manual input, if someone can help modify Module:scripts. --Octahedron80 (talk) 06:24, 26 May 2017 (UTC)
Yeah, I wish there were a function to return the script code for a codepoint. There are scripts that share characters: Cyrs and Cyrl, Grek and polytonic, the various Arabic script classes, and the various Latin script classes. But those conflicts can probably be resolved in some way or another. — Eru·tuon07:01, 26 May 2017 (UTC)
I think it should work this way: The function should only choose Cyrs if the character is not in the character list for Cyrl; same with Grek and polytonic. So letters that are only in polytonic, like ἁ(ha), would be considered polytonic, while letters like α(a) that are in both Grek and polytonic would be considered Grek. The modern script wins over the older one. — Eru·tuon06:45, 27 May 2017 (UTC)
The function I'm proposing could also be used to add script classes to the links in the {{also}} template, to ensure they display well. — Eru·tuon07:03, 26 May 2017 (UTC)
The module already accepts manual sc; thanks to Erutuon. But it should ideally detect script by its own first. To run a bot adding sc everywhere is suchlike opposite thinking. --Octahedron80 (talk) 06:40, 27 May 2017 (UTC)
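The precedence rule Erutuon proposes above (the modern script wins, and the older script is chosen only for characters the modern one lacks) could be sketched roughly as follows. This is an assumption-laden illustration, not code from Module:scripts: the function name is hypothetical, and the codepoint ranges are simplified stand-ins for whatever Module:scripts/data actually lists.

```python
# Illustrative ranges only; real data would come from Module:scripts/data.
# Order matters: a modern script is listed before its older counterpart.
SCRIPT_RANGES = [
    ("Grek", [(0x0370, 0x03FF)]),
    ("polytonic", [(0x0370, 0x03FF), (0x1F00, 0x1FFF)]),
    ("Cyrl", [(0x0400, 0x04FF)]),
    ("Cyrs", [(0x0400, 0x04FF), (0xA640, 0xA69F)]),
]


def script_for_char(char):
    """Return the first script whose ranges contain the character, or None."""
    codepoint = ord(char)
    for code, ranges in SCRIPT_RANGES:
        if any(low <= codepoint <= high for low, high in ranges):
            return code
    return None
```

Because the list is searched in order, a shared letter such as α resolves to Grek, while a letter found only in the extended block, such as ἁ, falls through to polytonic.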
Bot task: indicate what script “unspecified script languages'” entries are in
It is unfortunately just beyond the reach of my limited coding skills, but I wonder if one of you might run a script (if it would not be too difficult) to look through the "lemmas" categories of all the languages in Category:Unspecified script languages, and add : scripts = {"Latn"}, to the entry (in Module:languages) of each language which has entries in the Latin script ("A-Za-zÀ-ÖØ-öø-ɏḀ-ỿ"). This would presumably knock out most of them. More ambitiously, the script might also add script data for languages with entries in other scripts, possibly using Module:scripts/data. (Great minds think alike, since I notice Koavf proposed something similar on Module talk:scripts/data, having only now read that discussion which I assumed was just restating the section above this,..) - -sche(discuss)06:19, 26 May 2017 (UTC)
Do we follow the Unicode standard for these script codes? If so it would be fairly easy for a bot to determine the script code of each character without having to read that module. - DaveRoss12:27, 26 May 2017 (UTC)
Closely enough that it would work for this task, I think. (But be careful that if an entry uses a Latin script letter and then an "IPA"/"modifier letter", the language should be said to use Latin script only, not also "Zsym".) Of course, one could always just knock out the Latin script characters first and then see what was left. - -sche(discuss)17:36, 26 May 2017 (UTC)
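The "knock out the Latin script characters first and then see what was left" approach might look like the sketch below. It reuses the character class quoted in the request; the function names are hypothetical, and fetching the titles from the "lemmas" categories (for example with a bot framework) is left out entirely.

```python
import re

# Character class quoted in the bot request above.
LATIN = re.compile("[A-Za-zÀ-ÖØ-öø-ɏḀ-ỿ]")


def has_latin_entries(titles):
    """True if any entry title contains at least one Latin-script letter."""
    return any(LATIN.search(title) for title in titles)


def residue_after_latin(title):
    """Remove Latin letters and return what is left for manual review."""
    return LATIN.sub("", title)
```

The residue is what needs a human eye: a leftover modifier letter or apostrophe should not, by itself, cause a second script to be recorded for the language.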
IPA pharyngealization character
It has come to my attention that there are two similar Unicode codepoints that can be used to indicate pharyngealization (ironically, the "small" one appears larger):
U+02C1 MODIFIER LETTER REVERSED GLOTTAL STOP (e.g. /sˁ/)
U+02E4 MODIFIER LETTER SMALL REVERSED GLOTTAL STOP (e.g. /sˤ/)
I have no idea what U+02E4 is for or why it was added to Unicode, but I'd say U+02C1 is correct for normal IPA purposes, since it immediately follows U+02C0 MODIFIER LETTER GLOTTAL STOP and there is no MODIFIER LETTER SMALL GLOTTAL STOP. And of the full-size letters, U+0294 LATIN LETTER GLOTTAL STOP is the one intended for IPA, not U+0242 LATIN SMALL LETTER GLOTTAL STOP. —Aɴɢʀ (talk) 16:26, 26 May 2017 (UTC)
It was on the basis of the Wiktionary entries: the entry for the first sign, ˁ, says it's an IPA symbol, while the entry for the second, ˤ, says it's an Egyptological symbol. I'm beginning to be doubtful: perhaps the entries are actually wrong. We should have some external verification for this. Phonetic symbols in Unicode on Wikipedia says that the second symbol, U+02E4, is an IPA symbol. — Eru·tuon17:02, 26 May 2017 (UTC)
In the official Unicode chart, U+02E4 is in the section "Additions based on 1989 IPA". I'm not sure exactly what that means. The Wikipedia article Pharyngealization says both characters can be used and uses them inconsistently and interchangeably. --WikiTiki8917:20, 26 May 2017 (UTC)
I support standardizing on U+02C1, at least in IPA (perhaps Egyptian editors prefer to use the other character in romanizations?), for the reason Angr gives — it is the counterpart to ˀ and matches it in size. - -sche(discuss)21:38, 26 May 2017 (UTC)
In what way is it the counterpart to ˀ? They represent completely unrelated things in IPA. It's really the counterpart to ʼ. --WikiTiki8921:54, 26 May 2017 (UTC)
They immediately follow each other in their codepoints and are mirror images with parallel names, with ˁ derived from its counterpart ˀ by reversal. - -sche(discuss)05:24, 27 May 2017 (UTC)
The makers of the Gentium font seem to think the neighboring characters are IPA, because they give them identical letterforms, while the more distant one is taller and has a serif: ˁˤˀ. Here, the distant one is in the middle. However, Doulos SIL and Charis SIL make no distinction: ˁˤˀ, ˁˤˀ. (If you don't have these fonts installed, just ignore this post.) — Eru·tuon06:29, 27 May 2017 (UTC)
In fact U+02E4 ˤ seems to be the IPA character. It is not a standard Egyptological symbol; we just use it as a kludge instead of the more correct ꜥ because the latter used to not have wide font support. According to Unicode, U+02C1 is a ‘typographical alternate for U+02BF’, i.e. ʿ, whereas U+02E4 canonically decomposes to a superscript U+0295, i.e. ʕ. The official IPA website also has a link labeled ‘IPA and Unicode’ that leads here, where U+02E4 is given as the hex code for ‘pharyngealized’. So I would guess that U+02E4 is the official IPA character. —Vorziblix (talk) 09:07, 1 June 2017 (UTC)
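If the discussion does settle on U+02E4, as the last comment suggests, cleaning up existing transcriptions is a one-character substitution. The sketch below assumes that outcome; the function name is hypothetical, and the standard-library `unicodedata` module is used only to make the two characters easy to tell apart.

```python
import unicodedata

REVERSED_GLOTTAL = "\u02c1"        # MODIFIER LETTER REVERSED GLOTTAL STOP
SMALL_REVERSED_GLOTTAL = "\u02e4"  # MODIFIER LETTER SMALL REVERSED GLOTTAL STOP


def normalize_pharyngealization(ipa):
    """Replace U+02C1 with U+02E4 (assumed here to be the IPA character)."""
    return ipa.replace(REVERSED_GLOTTAL, SMALL_REVERSED_GLOTTAL)


def describe(char):
    """Return 'U+XXXX NAME' for a character, to tell lookalikes apart."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"
```

Swapping the two constants gives the opposite normalization, should the consensus go the other way.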
Oops the new version fails to be even parsed. It seems MediaWiki does not support ECMAScript 6. @-sche I have updated the code at Giorgi's. Update the gadget too please. --Dixtosa (talk) 22:18, 26 May 2017 (UTC)
I see that entry has been fixed by inputting the plural form as "pl=" rather than as an unnamed parameter. Whether the template should be changed to allow the plural to be put in as the first unnamed parameter, or whether something else more often needs to be put in that slot, I don't know. - -sche(discuss)20:08, 27 May 2017 (UTC)
I'm confused; what was the incorrect plural that the template was generating? I only saw Ausflüchte, which is correct according to German Wiktionary. Oh, it was *Ausfluchte. — Eru·tuon00:06, 28 May 2017 (UTC)
Bot task: CAT:Taos lemmas entries should use modifier apostrophes
Could someone move all the Taos entries (at the moment there are no non-lemma entries, so all entries are in CAT:Taos lemmas) which use ' or ’, to instead use the modifier-letter apostrophe ʼ, pursuant to this RFM? Overwriting redirects is fine. Ideally, the bot would also fix links to the pages it moved, e.g. from translations tables. Links inside Taos entries which (links) use ’ should also be updated to use ʼ. (Any links inside Taos entries that use ' will probably need to be handled in AWB by me or someone else on the lookout for false positives / links that are OK as-is, like a link to brewer's yeast). - -sche(discuss)20:06, 27 May 2017 (UTC)
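The title part of this request could be sketched as below. It covers only the computation of the new page title; the function names are hypothetical, and the actual page moves, redirect handling, and link fixes would be done by whatever bot framework is used.

```python
MODIFIER_APOSTROPHE = "\u02bc"   # ʼ MODIFIER LETTER APOSTROPHE
TYPEWRITER_APOSTROPHE = "'"      # U+0027
CURLY_APOSTROPHE = "\u2019"      # ’ RIGHT SINGLE QUOTATION MARK


def taos_title(title):
    """Return the title with both apostrophe variants replaced by U+02BC."""
    return (title.replace(TYPEWRITER_APOSTROPHE, MODIFIER_APOSTROPHE)
                 .replace(CURLY_APOSTROPHE, MODIFIER_APOSTROPHE))


def needs_move(title):
    """True if the entry's title would change under the new convention."""
    return taos_title(title) != title
```

This is safe only for page titles known to be Taos. As the request notes, links using ' inside entries need manual review, since something like brewer's yeast must keep its apostrophe.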
See межевать for an example. The left column is a half line lower than the right one, at least on my screen and browser (Mac OS X 10.9.5, Chrome). They used to line up fine, so something has gotten broken in the meantime. Benwing2 (talk) 21:07, 27 May 2017 (UTC)
I'm having problems with the automatic categorization linking to categories that words don't belong in. For example, I edited the formatting of the etymology of begynne to make it link to the words that it mentions, and now for some reason it's in the category Old English words prefixed with be-, which is definitely wrong because it's not even Old English. Also, yerd, which I'm currently adding an English definition for, is automatically categorized in English twice-borrowed terms even though it's not borrowed at all. The current etymology is
The problem related to the Norwegian entries in begynne is that you used {{prefix|ang|be|}}. The template {{prefix}} is for etymologies, and it automatically categorizes words in a category such as Old English words prefixed with be-, using the language code (in this case ang, for Old English). You should be using the linking template {{m}} instead. And probably you should use the etymology given in the Old English entry beginnan, which derives the word from Proto-Germanic. — Eru·tuon06:04, 28 May 2017 (UTC)
As to your second question, when I enter that code, I don't get an "English twice-borrowed terms" category at all, so I don't know what's going on there. — Eru·tuon06:22, 28 May 2017 (UTC)
Maybe remove a few "redlinks" categories to avoid module errors
Maybe it's a good idea to disable the redlink categories of a few languages, if they are not being used right now, because they use expensive functions and a lot of those languages enabled at the same time can cause module errors.
I could be wrong, but I don't think expensive functions contribute much more than other functions to the Lua memory load. — Eru·tuon02:05, 29 May 2017 (UTC)
Why not include a suitably hard-to-find switch to facilitate turning these redlink categories on and off in response to actual interest from a user or lack of interest from any users? Perhaps a switch for each language could be in a subpage of its "About" page, which page could be protected from editing by anyone but an admin.
How long would it take for such a category to be repopulated to 90% of complete? How long does depopulation take? DCDuring (talk) 02:10, 29 May 2017 (UTC)
I removed English from the template- there were already a couple dozen entries with "too many expensive function calls" due to it. Links to English entries aren't in the translation tables, but they're everywhere else, with Derived terms alone more than enough to take many entries over the 500-per-page limit. Chuck Entz (talk) 03:09, 29 May 2017 (UTC)
Are categories really the best way to generate the lists? That is, do they have to be in (near-)real time? Dumps are available now every two weeks? Dump processing can give additional information as well. Obviously there is less labor involved, but downloading the entire dump and running a script with regex doesn't have to take very long. The possibility exists as well of generating counts of the redlinks for each missing term. DCDuring (talk) 03:22, 29 May 2017 (UTC)
I agree. We're clearly using too many different Lua functions, because errors have started cropping up more and more frequently and in more and more entries: water, man, I, iron, etc. And this is something we don't need near-real-time categories for. Something that parsed a database dump, and for more languages than just this, would be ideal. Another thing to consider is disabling automatic transliteration in translations tables (or alphabetic scripts, or of all scripts), because that is known (by us and also noticed by e.g. the folks at Phabricator) to eat up a looooot of memory. - -sche(discuss)03:56, 29 May 2017 (UTC)
The expensive parser function calls error is separate from Lua memory usage. I agree there's no good reason to have these categories though, especially since they don't really seem to be used. DTLHS (talk) 04:03, 29 May 2017 (UTC)
The inclusion of languages is mostly due to people who edit those languages asking for them to be included. The categories' usefulness isn't the issue, it's the inefficient way they're generated: this template gets executed every time a page is loaded that uses the links module- once for every module-generated link. The only thing that keeps it from overloading things on every entry is that it does the expensive stuff for just the selected language codes. Chuck Entz (talk) 04:41, 29 May 2017 (UTC)
As I think I've said in a previous discussion, it would be great if someone created redlinks lists based on dumps. They don't have to be categories. --Daniel Carrero (talk) 18:06, 29 May 2017 (UTC)
I'm using the Spanish redlinks cat. It's my favourite page this month. -WF
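The dump-based alternative suggested above, including the per-term counts DCDuring mentions, could be sketched along these lines. It is a simplification under several assumptions: the dump has already been parsed into a mapping from page title to wikitext, only {{l}} and {{m}} links are examined, and a term counts as existing if a page with that title exists at all, regardless of whether it has a section for the language. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches {{l|code|term ...}} and {{m|code|term ...}}; other link templates are ignored.
LINK = re.compile(r"\{\{(?:l|m)\|([^|{}]+)\|([^|{}]+)")


def count_redlinks(pages, lang):
    """Count links in the given language whose target page does not exist.

    pages maps page titles to their wikitext.
    """
    existing = set(pages)
    counts = Counter()
    for text in pages.values():
        for code, term in LINK.findall(text):
            if code == lang and term not in existing:
                counts[term] += 1
    return counts
```

Sorting the result with `counts.most_common()` would give a most-wanted list without any expensive parser function calls at page-render time.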
See Reconstruction:Proto-Slavic/vьrgnǫti. The three columns should be headed "East Slavic:", "South Slavic:", and "West Slavic:", respectively, but instead I see East Slavic continue onto the second column, and South Slavic placed at the bottom. This happens both in Safari and Chrome on Mac OS X 10.9. Benwing2 (talk) 21:41, 29 May 2017 (UTC)
@CodeCat If this isn't able to be resolved we should revert the changes that make {{top3}} auto balancing. We could make another template {{top3-a}} that has the auto balancing CSS. DTLHS (talk) 21:45, 29 May 2017 (UTC)
In the vast majority of cases, columns should be balanced. Cases like the above were really misuses of these column templates. They should be replaced with specialized templates and/or tables. --WikiTiki8916:33, 30 May 2017 (UTC)
First of all, there are two brokennesses, and only one of them concerns unbalanced columns. The other one mentioned somewhere above concerns extra blank space in an entirely balanced two-column layout. Secondly, I don't agree that it is reasonable to make a change like this that will cause significant brokenness for existing pages and then simply tell those pages that they're "misuses" and need to be fixed. It is the responsibility of the template changer to handle any breakage that ensues; if they're unwilling to do this, the change should be reverted. Benwing2 (talk) 07:36, 31 May 2017 (UTC)
Template:zh-pron: In Firefox, Mandarin audio player covers up Mandarin IPA text
I kept wondering why so few Chinese characters had IPA for their Mandarin, but then I found out they do! It is just that in Firefox, the audio player control is graphically glitched so it covers up the IPA text.
(edit conflict) I have the same version of Firefox, and I see the same thing. Another difference to note is that all of the indented lines below the language names have bullet points on Firefox, but not on Safari, and when the audio player is followed by one of those indented lines instead of a language name, there are two bullet points. Also, the bullet point for the line the audio player should be on is correctly placed, but the audio player and everything after it is shifted up a line. Chuck Entz (talk) 02:11, 30 May 2017 (UTC)
Not sure what I was looking at before, but when I looked at it again, both browsers showed the bullet points. Chuck Entz (talk) 02:52, 30 May 2017 (UTC)
As thwikt has Linter extension installed; it notifies about obsolete HTML tag (tt) inside zh-pron which is adapted from enwikt. It populates about 40000 soft errors. Please replace tt with another tag. --Octahedron80 (talk) 02:00, 30 May 2017 (UTC)
As thwikt has Linter extension installed; it notifies about missing end tag inside character info which is adapted from enwikt. It populates about 80000 soft errors. Please solve this. --Octahedron80 (talk) 02:03, 30 May 2017 (UTC)
Two broken gadgets which break expanding some items
At the moment, there's two labels, Flemish and Hollandic. The former has no link and doesn't even add the entry to a category, while the latter does. The "Hollandic" label links to the Wikipedia article about Holland, but this is the modern area of Holland. Since this is a medieval language, a link to w:County of Holland would be more appropriate, and also w:County of Flanders. Can this be done? —CodeCat20:14, 30 May 2017 (UTC)
@CodeCat: Are the labels Flemish and Hollandic used for any languages besides Middle Dutch? If they are, it is currently impossible to make them have different content depending on the language, so all languages would have to link to these Wikipedia articles, even if they are spoken at a time when these Counties didn't exist. — Eru·tuon21:05, 30 May 2017 (UTC)
No, all we have is the ability to restrict labels to a specific language. So, if you use {{lb|en|Doric}} rather than {{lb|grc|Doric}}, it won't use the label data file, and it'll add a tracking template. But at the moment there is no way to have a label for Doric Scots and a label for Doric Greek that both have the same key ("Doric"). — Eru·tuon21:10, 30 May 2017 (UTC)
Think of these subtables as fully-fledged labels in their own right. So there's an entirely separate label definition for each language. The module would first see if a table index exists with the code of the current language, and if so, it uses the data there. Otherwise, it uses the data directly under the label as before. —CodeCat19:08, 2 June 2017 (UTC)
I understand the idea, but so far it's not working in the sandbox module. And I have other concerns. — Eru·tuon19:20, 2 June 2017 (UTC)
@CodeCat: I would rather not replace the existing language-specific labeling system with the new system, as you suggested in this edit summary. The new system offers no way to track labels that are being used in the wrong language, which may be useful in some cases. (For instance, perhaps someone would want to go through and correct labels that are used with the Cantonese language code when they are supposed to be used with generic Chinese, or clarify whether a label is actually being used the way its label data assumes it would be.) And it doesn't offer a way to use the same label data for more than one language. I'm not pleased with there being two ways of handling language-specific labels, though. — Eru·tuon19:44, 2 June 2017 (UTC)
A possible solution would be to introduce something like invalid = true on the top-level label. If that value is true, then that means the label isn't valid. Since the language-specific data doesn't have that tag, it works as normal. This lets you do the inverse too: make the label valid for all except a selected language. As for using the same label data for more than one language, that's true, but how big of an issue is that? —CodeCat19:47, 2 June 2017 (UTC)
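The lookup CodeCat describes (use a language-specific subtable if one exists, otherwise the data directly under the label, with invalid = true on the top level to restrict a label) can be modelled as follows. This is a sketch in Python rather than the Lua of the label modules, and the data layout, the function name, and the example link targets are all hypothetical.

```python
def resolve_label(labels, label, lang):
    """Return the data to use for a label in a language, or None if invalid."""
    entry = labels.get(label)
    if entry is None:
        return None
    # Prefer a subtable keyed by the language code, if there is one.
    override = entry.get(lang)
    data = override if isinstance(override, dict) else entry
    # A top-level invalid flag applies only when no subtable took over.
    if data.get("invalid"):
        return None
    return data


# Hypothetical example data: Hollandic is valid only for Middle Dutch (dum).
LABELS = {
    "Hollandic": {
        "invalid": True,
        "dum": {"wikipedia": "County of Holland"},
    },
    "Flemish": {"wikipedia": "Flemish"},
}
```

Putting the invalid flag inside a language subtable instead would give the inverse case mentioned above, a label valid everywhere except one language.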
As more and more entries use templates to add categories (and especially if Wiktionary:Votes/2017-05/Templatizing topical categories in the mainspace were to pass), those entries can no longer be quickly edited with HotCat. Anyone fancy a go at updating the gadget to handle templatized categories? But, caution: there is some preparatory discussion underway leading to a proposal that might replace categories named like "en:Foo" with "English foo", which might change how/whether templates are used to add categories. - -sche(discuss)23:53, 30 May 2017 (UTC)
While we're at it, can a new version of HotCat be written to put the category in the correct language section? As of now, it puts the category at the bottom of the page, regardless of the language. So for example, if I wanted to use HotCat to add a category to cha#English, it would actually be added under cha#Zulu, which isn't good, especially in tabbed languages view. —Aɴɢʀ (talk) 19:39, 4 June 2017 (UTC)
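The placement Angr asks for amounts to finding the end of the right level-2 language section instead of the end of the page. The sketch below shows that step only, on plain wikitext; it is not HotCat code (HotCat is a JavaScript gadget), and the function name is hypothetical.

```python
import re

# Level-2 headers such as ==English==; deeper headers (===Noun===) do not match.
L2_HEADER = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)


def add_category(wikitext, language, category):
    """Insert a category link at the end of the given language section."""
    headers = list(L2_HEADER.finditer(wikitext))
    for i, match in enumerate(headers):
        if match.group(1).strip() != language:
            continue
        # The section runs up to the next level-2 header, or to the end of the page.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(wikitext)
        section = wikitext[match.start():end].rstrip("\n")
        tail = wikitext[end:]
        separator = "\n\n" if tail else "\n"
        return (wikitext[:match.start()] + section
                + "\n[[Category:" + category + "]]" + separator + tail)
    raise ValueError(f"no section for {language!r}")
```

For the cha example, adding a category to the English section this way places it before ==Zulu== rather than at the bottom of the page.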
{{blend}} should add categories for prefixes and suffixes, like {{affix}} does
I always wonder if "blend" is the appropriate terminology for words where morphemes are transparent; for me it's not exactly the same situation as in blurse, for example. Also, I don't understand the point about "blend versus portmanteau" on the documentation page. "blend is the correct linguistic term for a word made by merging two words" looks like the definition of a compound. --Oxytonesis (talk) 17:56, 31 May 2017 (UTC)
The problem with adding that capability is that the inputs to the blends don't necessarily end up as discrete morphemes with one attached to the other. A blend strikes me as more like a compound, because it can combine two or more independent terms without making one subordinate to the other. For instance, cockapoo is a blend of cocker spaniel and poodle. Is cocker spaniel a prefix? Is poodle a suffix? When {{confix}} was first introduced, we had a great deal of trouble with people using it improperly for compounds, with a flood of bogus affix categories as the result. I think adding the capabilities you're asking for would encourage similar abuses, though not on the same scale. Chuck Entz (talk) 02:31, 23 June 2017 (UTC)
I agree with Chuck's assessment: if a word is formed by combining an affix with another word, {{blend}} is not the appropriate template. If the word is formed by blending two words, then those words should not be categorized as affixes. - DaveRoss 12:56, 23 June 2017 (UTC)
Then what is the appropriate classification for prequel? It looks like a blend consisting of a prefix and a full word. Either that or a portmanteau. I don't know what the difference is. The first syllable of prequel is replaced with the prefix pre-. Does the prefix pre- have to be considered a full word because it is combined with another word in a blend? — Eru·tuon 16:39, 23 June 2017 (UTC)
prequel is not prefixed with pre-, so it doesn't belong in the category. This is about categorization. {{blend}} is for terms like smog, which are not roots with affixes but rather words formed by the blending of two other words. I don't think prequel falls into that category. However, there may be blends which have a constituent word which is an affix; in those cases I think the category should be added separately as a special case. The worst-case scenario is one in which smoke or fog are added to an affix category because they are the components blended in smog. - DaveRoss 16:51, 23 June 2017 (UTC)
Well, how can prequel not be a blend or a portmanteau? The first syllable (or the first consonant) of its second component is replaced with that of its first component. I thought the definition of a blend or portmanteau was that one component replaces part of the other component. If it were presequel, it would not be a blend. But I'm not sure what I think about the "words formed with prefix" categorizing issue. — Eru·tuon 16:59, 23 June 2017 (UTC)
If you want to assert that prequel is a blend, that is fine; it doesn't change my point. I consider it to be one of those words which is patterned after another word, but that is subjective. - DaveRoss 17:19, 23 June 2017 (UTC)
(edit conflict) Okay, so prequel may be a blend formed with the prefix pre-, but it should not be considered as prefixed with pre-. Interesting. It seems a distinction that is likely to confuse readers, but okay. I mean, I would rather simply stick all words that are formed with the prefix pre- in some way or another in the category English words prefixed with pre-, and avoid making finer distinctions about what "prefixed" means. For what it's worth, the OED gives the derivation as pre- + -quel. Probably the suffix -quel originated with prequel. — Eru·tuon 17:30, 23 June 2017 (UTC)
Huh? I'm not proposing that non-affixes be converted to affixes in {{blend}}. {{blend|smoke|fog}} would not result in Blend of smoke- + fog. You'd have to enter it as {{blend|smoke-|fog}}, with a hyphen, to get that result. — Eru·tuon 18:10, 23 June 2017 (UTC)
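The hyphen-based rule described in this comment can be sketched roughly as follows. This is a Python illustration only, not the template's actual Lua implementation. The function name `blend_affix_categories` is hypothetical, and the "suffixed with" category name is assumed by analogy with the "English words prefixed with pre-" category mentioned above.

```python
from __future__ import annotations


def blend_affix_categories(lang_name: str, parts: list[str]) -> list[str]:
    """Return affix categories only for parts explicitly marked with a hyphen.

    Hypothetical sketch: a trailing hyphen marks a prefix, a leading hyphen
    marks a suffix, and plain words (smoke, fog) never produce a category.
    """
    categories = []
    for part in parts:
        if part.endswith("-") and not part.startswith("-"):
            categories.append(f"{lang_name} words prefixed with {part}")
        elif part.startswith("-") and not part.endswith("-"):
            # Category name assumed by analogy with the prefix category.
            categories.append(f"{lang_name} words suffixed with {part}")
    return categories
```

Under this rule, the parts of {{blend|smoke|fog}} yield no affix categories at all, while a part entered as pre- yields the prefix category, so the editor has to opt in by typing the hyphen.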
My first thought is that the combination categories ("Italian explorers", etc) might be too "granular", to use the going word. And since I gather they could fairly easily be added in later and would automatically become populated, one could wait and only add them if the "Italian people", "American people" etc categories become too large. But my second thought is of how many Italian people, American people, etc things are named after; those categories will surely become very large, so we could go ahead and create at least some "granular" categories from the start.
However, we probably want to avoid an over-abundance of possible occupations, and especially synonyms that would result in entries being split between two categories and thus made harder to find (one entry might say "alderman" where another says "city councillor", when "politician" is probably sufficient for both), so maybe the template should rely on a list (the way {{label}} does) and silently put the entry into an "attention" category if an unlisted occupation (location, etc.) is given. - -sche (discuss) 08:26, 2 June 2017 (UTC)
Using the lists defined in {{label}} is a great idea! I suppose one could do this, if they really want to: {{nam|en|it|Mario Rossi|occ=]}}. --Victar (talk) 15:26, 2 June 2017 (UTC)
If we're relying on {{lb}} lists, perhaps we can format it like this: {{nam|en|American|Amelia Bloomer||feminist|activist|politician}} --Victar (talk) 15:40, 2 June 2017 (UTC)
To be clear, I'm not suggesting that Module:labels/data itself be used for this (it doesn't contain labels for "explorers" or "Italian people", does it? and it shouldn't), but just that a data module like that could be used. - -sche (discuss) 04:21, 3 June 2017 (UTC)
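A rough sketch of the list-based approach discussed above, in Python for illustration (a real implementation would presumably be a Lua data module). The occupation list, the function name `eponym_categories`, the shortened category names and the attention category name are all hypothetical placeholders, not existing Wiktionary data.

```python
# Hypothetical data table: maps each accepted occupation (and its synonyms)
# to the single plural form used in category names.
KNOWN_OCCUPATIONS = {
    "explorer": "explorers",
    "politician": "politicians",
    "alderman": "politicians",
    "city councillor": "politicians",
}

# Hypothetical name for the silent "attention" category.
ATTENTION_CATEGORY = "Eponym entries with an unlisted occupation"


def eponym_categories(nationality, occupations):
    """Build category names, folding synonyms and flagging unlisted values."""
    categories = [f"{nationality} people"]
    for occupation in occupations:
        plural = KNOWN_OCCUPATIONS.get(occupation.lower())
        # Unlisted occupations add no granular category, only the flag.
        name = ATTENTION_CATEGORY if plural is None else f"{nationality} {plural}"
        if name not in categories:
            categories.append(name)
    return categories
```

Because "alderman" and "city councillor" both map to "politicians", entries using either word land in the same category, which is the point of relying on a list.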
It might also be useful to have a notext= parameter that would result in no visible text (at all!) being displayed, but categories still being added. Then the template could be used even in entries where its particular wording might not be desirable, e.g. where {{blend}} or {{compound}} is already used. - -sche (discuss) 08:30, 2 June 2017 (UTC)
I support some granularisation of eponym categories (without removing pages from the main category; we could do something similar to what has been done with Category:English terms derived from Italian-type categories in the last few years). Category:English eponyms is already too large for comfort. This is what I had in mind when I devised {{named-after}} with parameters for occupation and nationality several years ago, even if the way I implemented it wasn’t well thought out.
Concerning your proposed categories above, I think that “derived from proper nouns” is better than “named after proper nouns”; and “terms named after place names” feels redundant to “terms named after places”. — Ungoliant (falai) 21:37, 4 June 2017 (UTC)