They literally perform the same function though. No reason to have two templates when we can simply parameterize one to accommodate the other. TeleComNasSprVen (talk) 02:52, 2 February 2014 (UTC)
The first category is terribly named; it should be deleted and its contents moved into the second category. I thought it was the misbegotten creation of a newbie until I saw, with considerable surprise, who had created it. - -sche(discuss)09:49, 2 February 2014 (UTC)
I only created the category because it wasn't empty, it was being filled by {{spelling of}} and I didn't really know what else to do with it. I figured that if the category existed, its oddness would be noticed more easily by people who knew what to do. That worked, apparently. :) —CodeCat14:17, 2 February 2014 (UTC)
It would seem that the relatively few uses of {{spelling of}} need to be reviewed so that dumb category names are not forced on us by template writers. Say, wouldn't that be a good idea for several of the categorizing templates? DCDuringTALK14:35, 2 February 2014 (UTC)
I can't even print them out. I tried encoding with utf-8, utf-16, iso8859-1, iso8859-7 and iso8859. I also tried decoding with them, encoding then decoding, and even decoding then encoding. None of it worked. --kc_kennylau (talk) 16:50, 2 February 2014 (UTC)
It's not a policy question, it's a technical question- so it should be asked at Grease Pit. In cases where you're not sure where to ask, you can start at the Information desk. Chuck Entz (talk) 17:11, 2 February 2014 (UTC)
I would start with reading the error message. And giving details about the OS, Python version, and other such things. Trial-and-error programming is never the solution. Keφr18:20, 2 February 2014 (UTC)
.encode('iso8859-7'): UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256), Python 2.7.6, Windows Vista --kc_kennylau (talk) 18:33, 2 February 2014 (UTC)
Are you running this from inside cmd.exe? Try 0) running chcp 65001 at the command prompt, 1) changing console font. Keφr18:38, 2 February 2014 (UTC)
chcp 65001:
Traceback (most recent call last):
File "orphan-el-altern.py", line 2, in <module>
import catlib
File "C:\Users\Lau's family\Desktop\compat\compat\catlib.py", line 21, in <module>
import wikipedia as pywikibot
File "C:\Users\Lau's family\Desktop\compat\compat\wikipedia.py", line 9723, in <module>
exec "import %s_interface as uiModule" % config.userinterface
File "<string>", line 1, in <module>
File "C:\Users\Lau's family\Desktop\compat\compat\userinterfaces\terminal_interface.py", line 12, in <module>
from terminal_interface_win32 import Win32UI as UI
File "C:\Users\Lau's family\Desktop\compat\compat\userinterfaces\terminal_interface_win32.py", line 10, in <module>
import terminal_interface_base
File "C:\Users\Lau's family\Desktop\compat\compat\userinterfaces\terminal_interface_base.py", line 13, in <module>
transliterator = transliteration.transliterator(config.console_encoding)
File "C:\Users\Lau's family\Desktop\compat\compat\userinterfaces\transliteration.py", line 2019, in __init__
while value.encode(encoding, 'replace').decode(encoding) == "?" and value in self.trans:
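The last line of the traceback (in transliteration.py) tests whether a character can be shown in the console encoding: it encodes with the 'replace' error handler and checks whether a "?" comes back. A minimal Python 3 sketch of the same test follows; the function names are illustrative and not part of pywikibot. It also shows that Greek letters have no latin-1 representation, which matches the wording of the UnicodeEncodeError quoted above, while iso8859-7 and utf-8 can encode them.

```python
def representable(text, encoding):
    """Return True if `text` can be encoded in `encoding` without errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def unrepresentable_chars(text, encoding):
    """Return the characters of `text` that turn into '?' after an
    encode(..., 'replace') / decode round trip, i.e. the ones a console
    using `encoding` could not display. Literal '?' characters are skipped."""
    return [
        ch for ch in text
        if ch != "?" and ch.encode(encoding, "replace").decode(encoding) == "?"
    ]


if __name__ == "__main__":
    greek = "αβγ"
    # Expected output, one line per encoding:
    # utf-8 True 0, iso8859-7 True 0, latin-1 False 3
    for enc in ("utf-8", "iso8859-7", "latin-1"):
        print(enc, representable(greek, enc), len(unrepresentable_chars(greek, enc)))
```

The demo prints only counts, so it runs even on a console that cannot display Greek.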
So we use templates with a lang= parameter to generate links to language sections, JavaScript to make section red-links orange, special templates to neuter self-links, and a framework of CSS and JavaScript (tabbed languages) to make language sections look like pages.
We have this discussion every so often, and I still oppose it. Putting langcodes in titles complicates searching beyond MW capabilities and thus requires disambiguation pages. Plus, it makes it harder for editors who work in different languages on the same page. It's not worth the switch. —Μετάknowledgediscuss/deeds19:35, 2 February 2014 (UTC)
Then maybe that's where our customisation efforts should be. We should adjust our software to fit our needs, instead of trying to patch our way around it to fit a model that fundamentally doesn't work. The "one page per word in all languages" model just doesn't work. It's cumbersome and no other dictionary would do it that way. The only reason we do it that way is that Wiktionary started off long ago as a mostly English-only dictionary, and it only really had Wikipedia to look to for ideas on how to structure a wiki. But now that we have so much more experience with how a wiki-based dictionary works, we are also in a much better position to judge what works and what doesn't. I think it's pretty clear that there is a lot of room for improvement. I think the argument that it's more convenient for people who want to work on different languages on the same page is so minor that it shouldn't even count. We shouldn't be letting convenience for a minority of editors (and editors are themselves a minority compared to users) get in the way of improving the usability at the more fundamental level. Of course the inertia for such a big change is going to be high, but I think if you ignore the need for changing this, then you're really sticking your head in the sand. —CodeCat19:46, 2 February 2014 (UTC)
Is there any technical reason why, if we did this, each page couldn't transclude all of its subpages automatically so it would look the same as it does now? DTLHS (talk) 19:52, 2 February 2014 (UTC)
It should be possible, but I don't know how easily. We could also make automatic disambiguation pages, which would be nice for people with slower connections. Maybe it could be an option to either transclude or just list links. —CodeCat19:55, 2 February 2014 (UTC)
I like the idea of an automatic index to common spellings, to serve as an index for readers or search aid. But I think it’s mainly not that useful to aggregate whole entries because they are spelled the same in different languages. More useful to index common meanings (thesaurus), common etymology or direct descendants of a term, cognates, synonyms, or things like that.
Translingual entries could be at term, at mul/term (= multiple languages), or at und/term (= undetermined language). —MichaelZ. 2014-02-02 20:14 z
This change actually simplifies searching when a user wants to perform a language-based search, and also simplifies language-based monitoring of changes to entries. I would support it if we were at the beginning of things. The site badly needs changes at the MediaWiki level. --Z20:29, 2 February 2014 (UTC)
Pros:
What links here more useful: terms linking to language X won’t appear in the WLH page of terms linking to language Y. Consequently, it may become a useful tool for searching for related terms, descendants, derived terms and whatnot;
Wiktionary will run faster: if a person only wants to read information about the Portuguese word a he can load the Portuguese content alone, instead of the Portuguese content with content in 77 other languages he doesn’t care about. (This will require the addition of a box below the search box where the person can add the language name);
More flexible use of redirects: the existence of a word in language X won’t prevent us from making the same word in language Y a redirect;
No need for language code parameters: templates will be able to fetch it from the page title (but see con number 1). Consequently:
Less typing;
Creating identical entries will be easier: if multiple entries are identical, as is common for terms in closely related languages, you only need to write the wikicode once and paste it into the other language pages without the need to manually change the lang codes;
Way less incorrect categorisation: is there anyone here who has never seen a weird word in some Terms derived from X category only to find out {{etyl}}’s second parameter was wrong?
Link blueness more useful: if a link is blue, you won’t have to click it to check if the entry you want does indeed exist. Consequently:
Black links less useless: black links are useless because the same word existing in another language completely defeats their purpose. With this change, their purpose will remain undefeated (though the purpose itself will still be pretty stupid IMO);
Easier automatic creation of entries: an entry existing in another language won’t prevent those green creation links from working;
Language name redirects: people will no longer have to memorise which name of a language we use. If someone types Anglo-Saxon in the language box it will take him to ang/foo. If we were nice, we could even have some JavaScript magic to change the lvl 2 heading to match the person’s preferred name;
More flexible use of section linking: we will be able to link to a POS section without fear of the link being rendered useless by someone who adds another language section containing the same POS;
Patrolling easier: with the language in the pagename, it will be easier to skip unpatrolled edits in languages one doesn’t know the first thing about;
IWs more useful: fr/foo would only display IWs whose pages contain a French entry. The hub pages (foo) can continue following our current practice of only allowing IWs with the same page name.
Cons:
Conversion will be a nightmare: millions of pages to create and move, thousands of templates to rewrite, new software to create, loads of new practices to decide, a decade of tradition thrown in the dust. How do we undertake the transition without making Wiktionary unavailable during its certainly long duration?
Entries containing / will be invalid: there are very few of these, I think, but we still need to agree on what should become of them;
Adding multiple entries at once will be more difficult: multiple pages will need to be opened. Some of this disadvantage may be offset by pro #5.2;
Translingual issues: by Translingual, I understand terms used across multiple languages (thus its language code mul). So, an entry like cm is relevant to someone interested in French as much as it is to someone who is interested in English. However, if this change is undertaken and a person searches for something like a with French as the language, he might never see the Translingual entry which may contain the sense he is looking for.
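The pros about taking the language code from the page title and about language-name redirects both assume titles of the form code/term (ang/foo above). A minimal Python sketch of how such a title could be parsed and how a language name typed in a second search box could be resolved; the five-name table stands in for the real language data, and split_title and page_for are hypothetical names.

```python
# Hypothetical excerpt of the language data; several names
# (e.g. "Anglo-Saxon") may redirect to the same code.
NAME_TO_CODE = {
    "English": "en",
    "Portuguese": "pt",
    "Old English": "ang",
    "Anglo-Saxon": "ang",
    "Translingual": "mul",
}
KNOWN_CODES = set(NAME_TO_CODE.values())


def split_title(title):
    """Split a per-language title such as 'pt/a' into (code, term).

    Titles whose first segment is not a known code (hub pages, or terms
    that merely contain '/') are returned as (None, title).
    """
    code, sep, term = title.partition("/")
    if sep and code in KNOWN_CODES:
        return code, term
    return None, title


def page_for(language_name, term):
    """Resolve a language name plus a term to a page title, falling back
    to the all-languages hub page when the name is unknown."""
    code = NAME_TO_CODE.get(language_name)
    return f"{code}/{term}" if code else term
```

Titles are split at the first / only, so en/and/or still parses as English and/or. The weak spot is the one the cons point out: a term such as km/h would be misread as a Khmer entry as soon as km (the ISO 639-1 code for Khmer) was added to the table.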
I will simply repeat what I said one of the last times this was proposed. At that time, the concern was that a very small number of pages were too large, so some of my comments were specific to that concern:
Only a small percentage of pages will ever have more than two language sections. I estimate that a supermajority will only ever have one language section: the inflected forms of Georgian verbs, for example, are unlikely to have homographs; in fact, most inflected forms are unlikely to have homographs: sure, bats has some, but fugiebamus? arrodillasen? Likewise words containing clicks (ǃʻûĩ ǂʻàn ǀàũ), hieroglyph transliterations (m3-ḥs3), etc. I vehemently oppose splitting pages by language, but if pages are to be split, I suggest they should be split only after a certain threshold is passed, e.g. only once they contain 2+ language sections or surpass a certain byte size... otherwise, a tiny tail will be wagging an enormous dog. Are there any prohibitively large pages not in Latin script?
If pages are split, how will users know to type rottweiler/en and rottweiler/fi to find the definition of "rottweiler" in English and Finnish, respectively? What will plain rottweiler look like? Will rottweiler transclude all the subpages, so that its display is unchanged? (I could live with that.) Or if you want main pages to be stripped-down disambiguation pages saying "for the Finnish definition, click here", what happens to users who don't know what language a word they want to look up is in, or who want to know what water means in all the languages which use it? They have to click through to each subpage, then go back to the main page and click to the next subpage, to slowly get a picture of the definitions in each language?
I disagree with the assertion that the current format is only convenient for editors and that subpages would be easier for readers; I think the current format is easier for readers.
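The threshold suggested in the first point (split a page only once it has two or more language sections or passes some byte size) is easy for a bot to test. A minimal Python sketch, assuming that language sections are exactly the level-2 ==Heading== lines of the wikitext; the 100,000-byte default is a placeholder, not a figure anyone proposed.

```python
import re

# A level-2 heading such as "==English==" opens a language section;
# "===Noun===" and deeper headings do not match.
L2_HEADING = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def language_sections(wikitext):
    """Return the language names of all level-2 sections, in page order."""
    return L2_HEADING.findall(wikitext)


def should_split(wikitext, min_sections=2, max_bytes=100_000):
    """Split only pages with at least `min_sections` language sections or
    whose UTF-8 size exceeds `max_bytes` (a placeholder threshold)."""
    return (len(language_sections(wikitext)) >= min_sections
            or len(wikitext.encode("utf-8")) > max_bytes)
```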
@Amgine, Ungoliant MMDCCLXIV My issues centre around Ungoliant's pros #3 and #7, with his con #2 as a smaller but also problematic point. My concern is that Ungoliant is assuming that there will be a second search bar for language, which presumably could be autofilled by user preference (yes, even by anons). If someone were to have made the JS for that, or to have pledged to make it, I think I could support. But while we are simply relying on MW search like we are now, without any promise of a search bar that would be truly functional for a multilingual dictionary, I cannot support this. However, I believe this is the only deal-breaker for me. Actually, there might be more. The transclusion idea is very important. —Μετάknowledgediscuss/deeds22:34, 2 February 2014 (UTC)
I agree. The search bar for language is absolutely necessary. Come to think of it, it would be a useful feature whether we change the page structure or not. — Ungoliant(falai)22:49, 2 February 2014 (UTC)
I agree a couple of these are concerns, and I do not know if they are surmountable.
The idea that Wiktionary will 'run faster' is not true: most pages are served from cache.
The concept of using a language name in the search box is cool. I suspect it will affect less than 3%, more likely less than 1%, of all searches once it is widely accepted and used, but this is a large improvement even at that rate of use. As long as you understand it's an insider's trick, aimed at the tiny portion of our readership who visit here regularly, then it's cool.
/ is used within MediaWiki software to indicate a subpage, which has substantial effects within the software (see mw:Help:Subpages). I believe this is turned off by default in the main namespace in WMF projects, but I do not know the status of Wiktionary. I seem to recall there's a workaround for using solidi in titles to avoid subpaging.
Additional comment: there is a greater concern, imo: how will external apps using the API to serve our content to users (there are about 50 last I checked) find a term in a given language? - Amgine/t·e18:02, 3 February 2014 (UTC)
Long pages really do load slower, but it's even more noticeable with editing. Long pages are very hard to edit, in a few cases even impossible because of time-outs. —CodeCat18:15, 3 February 2014 (UTC)
That is, generally, a browser issue, not a Mediawiki issue. Furthermore due to the cache-poisoning model used in MW a long page (for example water) has a greater likelihood of needing on-load reparse/recache. IOW: this proposal could make the long-page load problem worse, not better, due to the number of page transclusions. - Amgine/t·e18:20, 3 February 2014 (UTC)
Yes, it takes longer to download, and the browser may take longer to display it, but it takes less server time than using a search engine to find and send a page. Using the example of water/en, the savings would be negligible. The savings for water/de might be noticeable to the average broadband user and would be very obvious to someone with limited internet access. In my opinion it is unlikely that someone who would benefit from the file-size savings will also know about a probably obscure method of searching for it. - Amgine/t·e18:39, 3 February 2014 (UTC)
What will be typed in the second search box? The term being sought. And the search will have to guess which language is being sought, because the user has not declared it. Some correlative studies of en.WT readers suggest that the primary audience not using an English user interface is students learning English: your search will find the term in their interface language, not in English, which is the language they are actually searching for. Among those who do use an English interface, the evidence suggests they are often (but not mostly) English-speaking students of a non-English language, whose search terms are not English terms.
That portion of our readership, which I believe to be substantial, would need to search for both language and term (e.g. "Deutsch water") to find what they want. Likewise, most English-speaking readers would be completely unaware that a term is also used in other languages if they used the second search box. And finally, when there are two optional boxes next to each other on a page, the top one will be used in practice and the unused option relegated to obscurity. - Amgine/t·e18:57, 3 February 2014 (UTC)
Why shouldn’t they display entries? The page titles should help decode search results. If someone searches for “water”, the result headings should look something like water (English), water (Afrikaans), water (Dutch), &c. I wonder if there’s a simple way to add such boilerplate to the titles displayed in search results, as well as to the title and h1 elements on the page.
My hope is that if a person searches for water with no language specified, it would take him to a page that looks exactly the same as the current page. I’m trying to figure out a way of doing this without negating pro number 5. — Ungoliant(falai)19:34, 3 February 2014 (UTC)
If we can get the developers to do this, we could have a second type of transclusion which transcludes the page in its own context rather than in the context of the current page. --WikiTiki8919:40, 3 February 2014 (UTC)
Yes, if (a big if) we could get transclusion to work we could just use the existing tabbed languages infrastructure. Otherwise I couldn't support this without major changes to the mediawiki software. DTLHS (talk) 22:39, 2 February 2014 (UTC)
The fundamental idea of hypertext is that you link to pages without duplicating their full text. Have we twisted the wiki so hard that we are thinking backwards about how a website works?
I don’t really understand the urgent need to associate dictionary entries based on spellings in a writing system. Maybe this belongs at the bottom or sidebar of the page, and not up above the heading. If I’m looking at ale#Polish (“but”), I’d prefer a list of links that includes cognates ale#Czech (“but”) and але#Ukrainian (“but”), and perhaps αλλά#Greek (“but”), but omits ale#English (“malt liquor”). —MichaelZ. 2014-02-03 17:21 z
You would be able to do both. You can either look at the language-less page and see all the words in all languages with that spelling, which is useful if you don't know what language you are searching for or if you are just browsing. Or you can look at the language-specific page and see only that language section. --WikiTiki8919:18, 3 February 2014 (UTC)
True enough. And cognates are linked from a complete etymology section. I am thinking that a simple index to subpages on the root page would be more useful and simpler to understand than a full transclusion of them. —MichaelZ. 2014-02-03 19:32 z
One of the last times this issue arose, I think I was the one who brought it up. Various database-y things are made more difficult by the way each "entry" is currently a bucket for all homographs, as described in that older thread here.
I knocked up a mock-up of one way that this might work: the main page looks effectively the same to the reader as it does now, but each language entry resides at its own subpage, so the Kedah Malay, Zulu and Japanese entries each have their own subpage. The main (everything-on-one-page) entry still shows the blue Edit links, and clicking the Edit link for any language section opens just that section in the editor (i.e. the edit is made to the transcluded language-specific subpage), in a way that is transparent to the user. The only real difference for editing is that there is no way to edit everything on one page at once, but there are workarounds for this (opening the other language sections for editing in other browser tabs, etc.).
Searching would be the same as now: typing a specific spelling in a specific script into the search bar would direct the user to the main page for that entry, provided it exists. We could also find a way of implementing language-specific searches, which would direct the user straight to the language-specific subpage instead.
This would apply to *all terms*, regardless of the number of languages for any given spelling (entry). Entries with only one language would not be "split", they'd be moved, so every Georgian entry would move to its Georgian-language subpage, for instance. This is much more straightforward and easier to implement than any approach where only some entries have language-code subpages and some don't. Combined with the previously described way in which the main page for an entry (without a language code) would be used as the search-result landing page, we avoid any need for casual users to even be aware of the language codes. Moreover, this work of splitting and moving is eminently bot-able.
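The core of that bot job is cutting a page at its level-2 language headings. A minimal Python sketch, assuming the term/langcode subpage naming discussed in this thread (e.g. rottweiler/en) and a hypothetical name-to-code table: it returns each new subpage title with its content and drops the ---- separator lines between languages. Anything above the first language heading (such as an {{also}} line) is ignored here, and a real bot would still have to create the pages and rewrite the main page.

```python
import re

# Hypothetical excerpt of the language-name table.
NAME_TO_CODE = {"English": "en", "Dutch": "nl", "Japanese": "ja"}

# Level-2 headings ("==English==") start language sections.
L2_HEADING = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_by_language(title, wikitext):
    """Return {subpage_title: section_text} for each level-2 language section.

    Raises KeyError for a language name missing from NAME_TO_CODE, so a bot
    run stops instead of guessing a code.
    """
    matches = list(L2_HEADING.finditer(wikitext))
    pages = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        body = wikitext[match.start():end].strip()
        # Drop the horizontal rule that separates language sections.
        if body.endswith("----"):
            body = body[:-4].rstrip()
        code = NAME_TO_CODE[match.group(1)]
        pages[f"{title}/{code}"] = body + "\n"
    return pages
```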
Nicely done. Is there an easy way to make the all-entries page just a list of links that the reader could peruse without scrolling? —MichaelZ. 2014-02-06 19:55 z
There is a way to get all subpages without having to manually maintain the list: mw.text.unstrip(frame:preprocess('{{Special:PrefixIndex/'..pagename..'/}}')) I am not sure how expensive this is, or how stable. DTLHS (talk) 20:36, 6 February 2014 (UTC)
I imagine (perhaps naively) that it might be relatively simple to create a Lua template that would transclude all subpages (or perhaps all subpages that are lang codes, to open the door to having other kinds of subpages, perhaps for maintenance or other reasons), optionally sorted in certain ways. ‑‑ Eiríkr Útlendi │ Tala við mig20:55, 6 February 2014 (UTC)
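Whatever produces the list of subpages (the PrefixIndex call quoted above, or an API query), the filtering and display step is simple. A minimal Python sketch, assuming the subpage titles have already been fetched and using a hypothetical code-to-name table: it keeps only subpages whose suffix is a known language code, so other kinds of subpages are ignored, sorts them by language name, and renders the kind of plain list of links asked for earlier, with labels such as water (English).

```python
# Hypothetical excerpt of the code-to-name table.
CODE_TO_NAME = {"en": "English", "af": "Afrikaans", "nl": "Dutch", "ja": "Japanese"}


def language_subpages(base, titles):
    """Return [(language_name, title)] for subpages of `base` whose suffix
    is a known language code, sorted by language name."""
    prefix = base + "/"
    found = []
    for title in titles:
        if title.startswith(prefix):
            code = title[len(prefix):]
            if code in CODE_TO_NAME:
                found.append((CODE_TO_NAME[code], title))
    return sorted(found)


def hub_index(base, titles):
    """Render a hub page as a wikitext bullet list of links."""
    return "\n".join(
        f"* [[{title}|{base} ({name})]]"
        for name, title in language_subpages(base, titles)
    )
```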
Listing or transcluding subpages is easy (if we want to transclude them; I don't) but it's hard to find out what subpages there are. I'm also not sure if it's worth the trouble to have to put the template on every single main entry page. It would be much preferred if we could automate it so that these pages didn't need to be created at all. —CodeCat21:11, 6 February 2014 (UTC)
User:DTLHS: Transcluding special pages disables page caching. No other performance issues are known. This was already discovered once, there are a few interesting discussions over at w:WT:Lua. Keφr21:18, 6 February 2014 (UTC)
Right, no caching is definitely unacceptable I think, and we'll have to ask the developers for a solution if this is the path we want to take. DTLHS (talk) 21:30, 6 February 2014 (UTC)
Won't the lack of a cache reduce performance significantly? And the main page doesn't ever change, but any subpages would need to be loaded each time. DTLHS (talk) 21:41, 6 February 2014 (UTC)
My sense from the above, and from past threads, is that there is a substantial demand to keep something that resembles the everything-on-one-page model, with the option to display on a per-language basis. That was the basic understanding in my head when I created the mock-up page at ].
Just listing all of the subpages sounds to me like a usability regression -- we're forcing users to click another link before they can get to the information. If everything is at least displayed on one page (as currently, or as at the mock-up), then users don't need to navigate another set of links.
In past discussions, I have noted that I vehemently oppose splitting content onto subpages, but feel that transcluding all the subpages onto the main page would be a must if the pages were to be split onto subpages; I still feel that way. As Eirikr says, to simply list subpages is an egregious usability regression, particularly because (as I have noted) a supermajority of pages contain (and will only ever contain) one language section. - -sche(discuss)22:58, 6 February 2014 (UTC)
A lot of the points that are raised here are already raised at Wiktionary:Per-language pages proposal, so they should probably be discussed there, not here. I don't see how it's a usability regression, because you first have to argue that people widely use Wiktionary to look up the same word in many languages at a time. I think you'll find that it's only a small minority of users, and the majority is only interested in one language, so that is what this improvement is targeted for. I think that putting all content on one page like we do now would be the regression, or rather a missed opportunity at progression. —CodeCat23:03, 6 February 2014 (UTC)
I think the usability regression that -sche is referring to is the fact that we would need to create a hub pages and a subpage even for every entry that only has one language. --WikiTiki8923:17, 6 February 2014 (UTC)
That's why I suggested that we should look at ways to do this automatically, without having to create pages at all. Ideally, every possible non-subpage would display an automatically-generated list of all languages that have that word, and we'd only need to create subpages. The list entries would update themselves and wouldn't even need to be created as real pages. —CodeCat23:20, 6 February 2014 (UTC)
That's the kind of details that we would work out on that dedicated page. It's meant for us to gather all information, pros and cons of each approach, other consequences, and so on. That way we can make an informed decision once we do. Right now we're trying to decide things before we even have all the facts addressed yet. —CodeCat23:29, 6 February 2014 (UTC)
Template for eg over usex like label over context
Having discovered and used "label|en", I've abandoned "context|lang=en" in its favor. Liking its brevity and clarity, I then figured that "eg|en" would be a similar if not better improvement over "usex|lang=en". Although a very inexperienced editor, I thought to put such a template together. Pages like "Help:Template" and "Help:A quick guide to templates" informed me about how they worked but were mysterious about just how to put one up. If Grease pit people agree with me about this potential template, I would be grateful to have them put it up so that I (and others) can use it. ReidAA (talk) 05:14, 3 February 2014 (UTC)
One reason is that people are more used to {{usex}}, so it would be easier for them to switch to {{ux}}. Also, "e.g." implies an example of the word's meaning rather than an example of its usage. For example, "apple, e.g. Granny Smith" makes more sense than "apple, e.g. I ate an apple.". --WikiTiki8921:38, 3 February 2014 (UTC)
I think {{usex}} is short enough as it is. It isn't used widely enough for its length to be a problem; compare {{etyl}}, which is used more. —CodeCat21:40, 3 February 2014 (UTC)
It's not so much the length as the convenience of having the language code as the first positional parameter rather than as a named parameter. In order to do that, we need to create a template with a different name. --WikiTiki8921:51, 3 February 2014 (UTC)
We don't need to necessarily. In the past, we've "migrated" templates like this by first making the language mandatory, then using the presence of lang= to determine whether the first parameter is the language code or not. —CodeCat21:55, 3 February 2014 (UTC)
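The transitional rule described here (if lang= is present, the call uses the old layout; otherwise the first positional parameter is the language code) can be modelled in a few lines. This Python sketch only models the parameter logic with a hypothetical usex_params function; it is not the template's actual wikitext or Lua. It also makes -sche's objection further down concrete.

```python
def usex_params(args):
    """Resolve usex-style arguments during a migration (hypothetical model).

    `args` mimics a template call: positional parameters under keys "1",
    "2", named ones such as "lang". Old style: example in "1", lang=.
    New style: language code in "1", example in "2".
    """
    if "lang" in args:
        # Old layout: {{usex|example|lang=xx}}
        return {"lang": args["lang"], "example": args.get("1", "")}
    # New layout: {{usex|xx|example}}
    return {"lang": args.get("1", ""), "example": args.get("2", "")}
```

The weak case is an old-style call that omitted lang=: its example sentence is silently read as a language code, which is why a temporary template name is suggested below instead.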
But now we have to support both templates, and will probably want to migrate them all eventually. So now, a bot has to go through all uses of {{usex}}, orphan it, then go back and change all uses of {{ux}} to {{usex}} again later. —CodeCat22:10, 3 February 2014 (UTC)
That could work, though. It's how I changed the behaviour of {{only in}}: I created {{only-in}}, and (with considerable assistance from Mg) switched entries to use both it and the new format. - -sche(discuss)22:17, 3 February 2014 (UTC)
I agree with CodeCat that {{usex}} is the best title, and we should just change its behaviour. However, I expect that using the presence or absence of lang= to tell whether or not the first parameter is the language code is not viable, because I expect that lang= is often omitted (or would be, if we made it so that omitting it didn't cause an error, because the template interpreted it as meaning the language code had been supplied in a different way). I suppose temporarily creating a new template à la {{only-in}} may be the best idea. - -sche(discuss)22:15, 3 February 2014 (UTC)
Why not name the new temporary template {{usex/t}} then, just to make it clear that it's not a name editors should get accustomed to? —CodeCat22:21, 3 February 2014 (UTC)
I guess if you have your mind set on it replacing {{usex}} entirely then go ahead. Personally, I don't see why we can't have both. --WikiTiki8922:43, 3 February 2014 (UTC)
Then, let a thousand flowers bloom. Experimentation doesn't need much discussion. But the uniformitarian impulse may kill it prematurely, by going directly to step 4 after a suitable interval. DCDuringTALK02:05, 13 February 2014 (UTC)
Uncategorized pages
Special:UncategorizedPages has gotten relatively long again. One reason seems to be that the deletion of the useless "alternative forms" categories has uncovered the fact that some entries weren't in any other categories. In many cases, this is because their headword lines aren't templatized, they're just bolded pagenames. I know we have bots that add {{head|foo|POS}} to newly-created entries that lack it; perhaps those bots could be sicced on the special page. - -sche(discuss)05:15, 3 February 2014 (UTC)
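The bot fix suggested here, turning a bare bolded headword line into a {{head|foo|POS}} call, comes down to one regular-expression substitution. A minimal Python sketch with a hypothetical templatize_headword function: it changes only the first line that starts with the bolded page name and keeps anything after the bold text on that line for a human to review.

```python
import re


def templatize_headword(wikitext, pagename, lang_code, pos):
    """Replace the first line starting with '''pagename''' by a {{head}} call.

    Hypothetical bot helper: text following the bold headword on the same
    line is kept unchanged.
    """
    bold = re.compile(r"^'''" + re.escape(pagename) + r"'''", re.MULTILINE)
    head = "{{head|" + lang_code + "|" + pos + "}}"
    # A function replacement avoids any backslash processing in `head`.
    return bold.sub(lambda _match: head, wikitext, count=1)
```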
Wait, are second-person singular simple present tense forms of things nouns (as the POS header claimed even before the bot edited the entry)? Or are they verbs? Turkish has some strange grammar... - -sche(discuss)19:42, 3 February 2014 (UTC)
I created Angriffen by clicking on a green link in Angriff. It produced a ===Related terms=== heading instead of a ===Noun=== heading, and a headword containing "related terms form" instead of "noun form". What's going on? SemperBlotto (talk) 15:05, 3 February 2014 (UTC) p.s. I haven't corrected it yet.
The acceleration script tries to guess the part of speech based on the header in the entry. Here it apparently thinks that "Related terms" must be the header. It might be because Related terms should come after Declension, but in any case green links have never really been applied to inflection tables, so the script hasn't been properly adapted to them. Currently it just "scrolls back" from where the link was, and uses the first header it finds (regardless of the level) as the PoS. There should be a way to solve it, but I'm not sure how. —CodeCat15:19, 3 February 2014 (UTC)
It's a temporary fix at best... if you add a Usage notes section (which does go before Declension according to WT:ELE) the same thing will happen again. I could tell it to ignore that section... —CodeCat15:26, 3 February 2014 (UTC)
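The behaviour described above (scroll back from the green link and take the first header of any level as the part of speech) explains the wrong heading, and ignoring certain sections is the suggested remedy. A minimal Python sketch of the difference, using a hypothetical and incomplete set of part-of-speech headers; the real acceleration script is JavaScript working on the rendered page, not on a list of strings.

```python
# Hypothetical, incomplete set of part-of-speech headers.
POS_HEADERS = {"Noun", "Verb", "Adjective", "Adverb", "Pronoun", "Proper noun"}


def naive_pos(headers_before_link):
    """Behaviour as described: the nearest header of any level wins."""
    return headers_before_link[-1] if headers_before_link else None


def guess_pos(headers_before_link):
    """Scan backwards and return the nearest header that is a part of
    speech, skipping sections such as Declension or Related terms."""
    for header in reversed(headers_before_link):
        if header in POS_HEADERS:
            return header
    return None
```

Using an allow-list of part-of-speech headers, rather than a list of sections to ignore, means a newly added Usage notes section cannot trigger the same problem again.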
Right now that page is protected so that only admins can edit it, which is the way it should be because mistakes there can cause all of Wiktionary to fail. --WikiTiki8900:28, 4 February 2014 (UTC)
That's understandable. In such cases a user can request administrator rights or rather someone can nominate him. He has been with Wiktionary and all his edits are good, working with a number of Indic languages, so I think I can nominate the user. --Anatoli(обсудить/вклад)00:38, 4 February 2014 (UTC)
Can someone update WT:EDIT for the "new" translation check format?
For the new translation format with {{t-check}} and {{t-needed}}, some changes need to be made to WT:EDIT. It has to understand both the old and the new format before we can make any other changes, otherwise things will start breaking when it no longer understands our translation tables. I'm not familiar enough with WT:EDIT to trust myself to make the changes properly. Can someone do it? —CodeCat15:07, 4 February 2014 (UTC)
"Add translation" script might break when translation table contains {{trreq}}
The entry where I found this is roil. Some experimentation indicates that the error ("Could not find translation entry for 'sv:foo'. Please reformat") appears if the entry is supposed to be inserted right before a trreq entry. For example, attempting to add an se entry ('Northern Sami') to the first set of translations on that page works, but attempts to add jv ('Javanese'), sv ('Swedish') or sw ('Swahili') fail with the error above.
Adding an entry after a trreq seems to work, however - cf se.
Also, in the second table, where a Swedish translation is already present, it is possible to add a Swahili translation, so I doubt it's a problem with the language templates; more likely it's the context into which the translation is supposed to be added. \Mike (talk) 17:55, 4 February 2014 (UTC)
I asked Mike to report this here, after he asked about the issue in IRC. "Is broken" has a slight difference from "will break". - Amgine/t·e21:08, 4 February 2014 (UTC)
The reason it's being changed is because it's already broken, though. The "main" link explains more. To fix the small breakage, we need to make a change that will cause a large breakage unless it's accounted for in advance by adjusting the script. —CodeCat21:12, 4 February 2014 (UTC)
Oddly enough, I don't care. I suspect some change was initiated which has caused the current condition of brokenness. Any change which does this, when it was predictable before the change was implemented, is by definition a bad change or implementation. Just like the use of obscure/non-intuitive abbreviations, specialty jargon in naming, and undocumented code; there are reasons why sane coding conventions are developed and used. - Amgine/t·e21:19, 4 February 2014 (UTC)
I just want to be sure that we're on the same page: you are aware that the bug Mike describes has existed for at least three years and wasn't caused by any recent change, right? (It's possible the bug was present even in the earliest versions of WT:EDIT; I don't know.) The threads above are about how to make changes (they have not been made yet) to fix the script. - -sche(discuss)21:29, 4 February 2014 (UTC)
Nope, not aware of any of that and I want to keep it that way. Bug reports should be welcomed with open arms because they are the best feedback. - Amgine/t·e21:37, 4 February 2014 (UTC)
I'm glad Mike reported the bug. Even though we've known about it for years and have been trying to fix it as recently as the thread directly above this one, it's good to know that other people are aware of the bug and want it fixed. I'm less thrilled with the way you revelled in your ignorance of the situation when CodeCat and I tried to explain it to you. - -sche(discuss)22:01, 4 February 2014 (UTC)
What little I have seen of 'coding' on this project I mostly am either unqualified to help with, or extremely unwilling. My ignorance had nothing to do with my answers, and I would say much the same now having been 'enlightened'.
So it was a known, old bug after all, and not a transitional glitch while the new "translation check format" is being rolled out? (Or whatever that section above is talking about.) Then, do we, somewhere, have a source where casual editors can see if the strange behavior observed indeed is known, or if it should be reported? I am of course not talking about a full-blown Bugzilla behemoth, but a simple list of known issues, which would have saved me from some head-scratching and trouble-shooting yesterday. \Mike (talk) 11:50, 5 February 2014 (UTC)
Just a cursory look, but it looks like there are 3 trreq-related issues reported on the talk page User talk:Conrad.Irwin/editor.js. The coding section seems to have an overview of known issues, but I suspect you would need to work through the rest of the page to look for similar issue reports. - Amgine/t·e17:03, 5 February 2014 (UTC)
Are the redundant second copies of the simplified or traditional form of Chinese "inflection lines" a bug or a feature?
Recently I started noticing lots of Chinese "inflection lines" like this:
密码 (simplified, Pinyin mìmǎ, traditional 密碼, simplified 密码)
They start with one form, in this case simplified, indicate the pinyin, give the equivalent traditional form ...
But then give a second, redundant copy of the simplified form.
The equivalent contrary situation occurs for traditional form entries.
I'm assuming this is a low priority bug that's come about while Lua-ifying lots of templates combined with automatically constructed "inflection lines" which pass both the "sim" and "tra" parameter.
Is this done consciously as a feature? I can't imagine why. So perhaps we can look at modifying the templates / Lua modules involved to not display the "tra" parameter even if it's specified, when the form is already traditional, which I believe is specified with the "t" parameter.
And conversely to not display the "sim" parameter on simplified entries, those which have the "s" parameter.
Simplified only needs to tell us the matching traditional and traditional only needs to tell us the matching simplified. I don't believe entries which have identical traditional and simplified forms have any similar kind of redundancy. — hippietrail (talk) 11:19, 6 February 2014 (UTC)
I agree that it looks strange at first but the simplified entry (s) didn't display simplified, the traditional entry (t) didn't display traditional and shared (ts) didn't display anything. It's just easier to maintain entries when both trad. and simp. have the same part, e.g. |tra=愛好|sim=爱好| --Anatoli(обсудить/вклад)12:24, 6 February 2014 (UTC)
Then the module should compare both the tra= and sim= parameters to the pagename to figure out which it needs to display. --WikiTiki8914:14, 6 February 2014 (UTC)
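A Python sketch of this suggestion, also allowing for the multiple-forms case raised further down (e.g. 併存). The real headword templates are wikitext or Lua; the function name and the list-valued stand-ins for tra= and sim= are hypothetical. A label is hidden only when its sole form is the page name, so shared-form entries show neither label, while an entry with several traditional forms keeps the full list.

```python
def forms_to_display(pagename, tra, sim):
    """Return the traditional/simplified forms worth showing on the headword line.

    tra and sim are lists of forms (hypothetical stand-ins for tra= and sim=).
    A label is dropped only when it would just repeat the page name.
    """
    shown = {}
    for label, forms in (("traditional", tra), ("simplified", sim)):
        if forms and forms != [pagename]:
            shown[label] = forms
    return shown


print(forms_to_display("密码", ["密碼"], ["密码"]))  # only the traditional form
print(forms_to_display("明白", ["明白"], ["明白"]))  # nothing to show
```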
Exactly. Now that we have Lua this kind of thing should be really easy to do. Though I admit I haven't got around to learning WikiLua myself yet. — hippietrail (talk) 14:32, 6 February 2014 (UTC)
Yes I had a quick look and saw that this/these templates do not yet use Lua. I'll take a look to see the new version. Thanks to all who listened and/or contributed! — hippietrail (talk) 17:46, 6 February 2014 (UTC)
It is not redundant within the template. There may be multiple simplified or traditional forms (eg. 併存), in which case PAGENAME =/= 'simplified' or 'traditional' in the template.
Anyway, those templates are redundant externally, considering essentially all information contained within the template is duplicated elsewhere. There is no point in indicating simp/trad and the other forms if Template:zh-hanzi-box is compulsorily present on all pages. And Pinyin should go in the Pronunciation section (where it is also duplicated by the IPA pronunciation template) rather than in the definition section, and be made visible by Template:Pinyin-IPA, since it is more a pronunciation of the character in one variety than an inherent transliteration of the character into a different script. Thus those headword templates, which are useful for inflecting European languages, basically contain no critical information at all for Chinese or Vietnamese.
What is worse is the compulsory PoS split for unsuitable languages, which dictates that this redundant information be repeated. The result is an entry in which information is unnecessarily duplicated several times, e.g. 明白. Wyang (talk) 22:41, 6 February 2014 (UTC)
re "there may be multiple simplified or traditional forms (eg. 併存)": good point. AFAICT, the templates as newly modified still allow for that. {{cmn-verb}}, for example, no longer displays "traditional" forms redundantly on the pages of the traditional spellings of verbs that have only one traditional form, but it still displays the multiple forms of 併存.
re "those templates are redundant externally, considering essentially all information contained within the template is duplicated elsewhere": I agree. Someone should make a cmn-head template or module and have all the POS templates use it rather than repeating as much code as they do now.
re "Pinyin should go to Pronunciation": I disagree; it is a decent romanization / way of identifying characters in Latin script (and romanizations go on the headword line); it is an unintuitive guide to pronunciation (e.g. qǐlái being /tɕʰi˨˩lai˧˥/). - -sche(discuss)23:10, 6 February 2014 (UTC)
Silly question (pinging @user:CodeCat, @user:Wyang): how do I add a call to {{Zhuyin}} passing the pin parameter, just after pin? (The template delinks pinyin.) It should be automatic, so that entries won't need to be modified.
Thanks to Lua, we've come a long way in automatically sorting entries in categories. {{head}} now knows, for example, that the German umlauts ä, ö, ü are to be sorted as a, o, u, and that ß is to be sorted as ss. {{de-noun}}, on the other hand, does not know that. (Maybe the other German headword-line templates don't know it either; I haven't checked them.) Could someone knowledgeable please fix that? Thanks. —Aɴɢʀ (talk) 19:30, 6 February 2014 (UTC)
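The German rule amounts to a sort-key transformation applied before sorting. A Python sketch, assuming only the mappings listed above (the name `german_sort_key` is hypothetical; the real logic lives in the Lua code behind {{head}}):

```python
# Map umlauts to their base vowels and ß to "ss", for sorting purposes only.
GERMAN_SORT_MAP = str.maketrans({
    "ä": "a", "ö": "o", "ü": "u",
    "Ä": "A", "Ö": "O", "Ü": "U",
    "ß": "ss",
})


def german_sort_key(word):
    """Return the string a category would sort this word under."""
    return word.translate(GERMAN_SORT_MAP)


print(sorted(["Zug", "Straße", "Strasse", "Äpfel"], key=german_sort_key))
# ['Äpfel', 'Straße', 'Strasse', 'Zug']
```

{{de-noun}} would need to apply the same key (or hand sorting over to the shared code) when it adds its categories.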
Apparently recent edits to {{ro-adj-form of}} have made it so that the first parameter is interpreted both as the lemma to be linked to and the gender/number parameter, the second as both the head parameter and the grammatical case, and I'm not sure what it thinks the third parameter is. In short, it looks like all the Romanian adjective form entries are broken. Can someone fix this? Chuck Entz (talk) 07:58, 7 February 2014 (UTC)
Strange text-centering behaviour in collapsible boxes
I can't figure out why this edit diff causes the text in the tables to be centered. It seems totally counterintuitive... I remove things that say "center", and yet..? Is there a way around this? If I apply text-align: left to the main table, then all of the table header cells are also left-aligned, and I don't want that. Only the regular cells should be left-aligned. —CodeCat02:09, 9 February 2014 (UTC)
The text is now centred because MediaWiki:Common.css has div.NavFrame {text-align: center}, which is inherited by the table cells.
I am guessing that it wasn’t centred before, because the centring property of the <center> element has some inheritance weirdness. All I could find is MDN saying “This is used to implement the legacy align attributes on some table-related element. Do not use these on production Web sites.”—MichaelZ. 2014-02-09 06:10 z
The sentence you quote from MDN is referring to e.g. text-align: -moz-center, used for implementing e.g. align="center". (Not sure if you already realize that; the context you give it makes it sound like it's talking about <center>.) —RuakhTALK08:06, 9 February 2014 (UTC)
Both <center> and <div align=center> are given the property text-align: -webkit-center; in Safari (these were both deprecated in HTML 4). So I suppose these obsolete elements may have similar, unpredictable effects on layout.
But life is short and I don’t want to spend it analyzing how obsolete HTML works. Let’s get rid of it and troubleshoot any real problems. —MichaelZ. 2014-02-09 19:05 z
I haven't botted for a long time, and after a quick successful run it won't let me run it, instead spitting out the error message WARNING: Token not found on wiktionary:en. You will not be able to edit any page. A search around the blagotubes reveals but little. I'm on a Mac using SVN, and I just updated Pywikipediabot. —Μετάknowledgediscuss/deeds22:12, 9 February 2014 (UTC)
Try updating manually or using something other than SVN? I have heard that causes problems. And make sure your config files are correct... DTLHS (talk) 01:13, 10 February 2014 (UTC)
I've noticed lately that it takes a little longer for black links to turn blue when I enter form-of pages for Latvian words from the declension table in the lemma. For instance, a word like rudmatains "redhaired" has a full declension. I usually add form-of pages by clicking on the black links to start a new page, write in the form-of information, save it, and then go back to the lemma page, where the respective form in the declension table would now be a blue link. Now, when I add a form-of page and return to the main lemma, the link in the declension table is still black -- if I click on it, it does take me to the form-of page, so I know everything is OK, but it just doesn't immediately turn blue. It eventually does turn blue, in about two or three minutes, so no biggie there; but I just wondered if something had changed at Wiktionary while I was away, something that would slow down the previously near-instantaneous conversion of black (or red) links to blue links after you save the corresponding new page. --Pereru (talk) 01:11, 10 February 2014 (UTC)
It's been happening to me too lately; purging the page fixes it, but it's a PITA to have to keep purging pages all the time. —Aɴɢʀ (talk) 08:56, 11 February 2014 (UTC)
Template:wikipedia currently has, after its box, a separate link, hidden by CSS (search within that page for interProject), and copied by JS (ditto) to a link in the left margin. This is (a great idea but) poor design: it uses CSS to hide stuff that's really there, so browsers that don't bother with that CSS will show the extra link, as in the picture to the right.
I propose:
that the JS be modified to read even links that have interProject among their classes (not only those that have it as their only class);
that the CSS hiding such links be removed;
that the extra link be removed from after the box generated by template:wikipedia (and likewise for any similar link generated by another template); and
that the remaining link to Wikipedia, which appears in the box generated by template:wikipedia (and any other template that currently uses the class interProject on a separate link), have interProject added as a class.
var spans = document.getElementsByTagName('span');
// filter for projectlinks
for (var i = 0, j = 0; i < spans.length; i++) {
    if (spans[i].className == 'interProject') {
        elements[j] = spans[i].getElementsByTagName('a');
        j++;
    }
}
and replace it with
var spans = document.getElementsByClassName('interProject');
// filter for projectlinks
for (var i = 0, j = 0; i < spans.length; i++) {
    elements[j] = spans[i].getElementsByTagName('a');
    j++;
}
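The first point of the proposal hinges on the difference between comparing `className` as a whole string and testing class membership, which is what `getElementsByClassName` does (it also matches elements other than spans). A language-neutral illustration in Python; the function names are hypothetical:

```python
def matches_exact(class_attr, name):
    # Like the old check spans[i].className == 'interProject':
    # fails as soon as the element carries any second class.
    return class_attr == name


def matches_token(class_attr, name):
    # Like getElementsByClassName: the class attribute is a
    # whitespace-separated list of tokens, and any one of them may match.
    return name in class_attr.split()


for attr in ["interProject", "interProject plainlinks", "plainlinks"]:
    print(repr(attr), matches_exact(attr, "interProject"), matches_token(attr, "interProject"))
```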
I am requesting a bot run through all Hungarian plural nouns to make the following change:
From:
# {{plural of|xx|lang=hu}}
To:
# {{hu-inflection of|xx|nom|p}}
The plural entries that are in Category:Hungarian noun forms - nominative already have the new structure since they were created after User:CodeCat added the accelerated noun form creation to the Hungarian declension table.
I would also like to get rid of two templates, {{hu-noun-form}} and {{hu-noun form}}, but they are still being used in many of the plural forms. These two templates should be replaced with {{head|hu|noun form}}. Can this be done in the same bot run or should it be a different request? --Panda10 (talk) 14:55, 12 February 2014 (UTC)
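Both requested substitutions can be expressed as regular-expression replacements and applied in a single pass over each page. A minimal Python sketch, assuming the old templates always appear exactly as shown above (in particular, that {{hu-noun-form}} and {{hu-noun form}} are used without parameters); a real bot run would need to check for variants first. The function name `convert` is hypothetical.

```python
import re

# {{plural of|xx|lang=hu}}  ->  {{hu-inflection of|xx|nom|p}}
PLURAL_OF = re.compile(r"\{\{plural of\|([^|{}]+)\|lang=hu\}\}")
# {{hu-noun-form}} or {{hu-noun form}}  ->  {{head|hu|noun form}}
OLD_HEAD = re.compile(r"\{\{hu-noun[- ]form\}\}")


def convert(wikitext):
    wikitext = PLURAL_OF.sub(r"{{hu-inflection of|\1|nom|p}}", wikitext)
    return OLD_HEAD.sub("{{head|hu|noun form}}", wikitext)


page = "{{hu-noun form}}\n\n# {{plural of|ház|lang=hu}}"
print(convert(page))
```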
I am using AWB on a database dump on my PC. I'd like to create a list of entries that contain both 'Word1' and 'Word2' in the same line, not immediately after each other. What is the correct regular expression to do this? Thanks. --Panda10 (talk) 19:07, 11 February 2014 (UTC)
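One common answer is a pair of lookaheads anchored at the start of the line: the pattern matches any line containing both words, in either order and at any distance. AWB is a .NET program and its find-and-replace should accept the same lookahead syntax, but that is worth testing on a small sample first; the Python wrapper below is only for demonstration.

```python
import re

# (?m): ^ and $ match at every line break; "." never crosses a newline,
# so both lookaheads must succeed within the same line.
BOTH_ANY_ORDER = re.compile(r"(?m)^(?=.*Word1)(?=.*Word2).*$")

# If Word1 must come first with at least one character between the two words:
WORD1_THEN_WORD2 = re.compile(r"Word1.+Word2")

text = "Word2 then Word1\nWord1 alone\nWord2 alone\nWord1 and Word2"
print(BOTH_ANY_ORDER.findall(text))    # ['Word2 then Word1', 'Word1 and Word2']
print(WORD1_THEN_WORD2.findall(text))  # ['Word1 and Word2']
```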
Bot task: add missing entries to 'Category:Terms spelled with...'
I started using AWB to (a) find all entries with 0 in their titles and ==English== in their contents which did not have the corresponding category link in their contents, and (b) add that link to them. However, I realized that it made more sense to let a bot do that, and likewise catch entries missing from all the other subcategories of Category:English terms by their individual characters. - -sche(discuss)06:00, 12 February 2014 (UTC)
That would be ideal, assuming it wouldn't slow pages down. The way things are currently set up, {{head}} would have to check the language, then check all the characters in the pagename against a language-specific list (either of approved categories, or of unapproved categories; more on that in a moment), since different languages have categories for different characters. For example, English doesn't have categories for its typical letters, A through Z, but does have categories for Ä and Γ, since those letters are exceptional in English. In German, there are no categories for A or Ä, since both are typical letters, and in Greek there would not be a category for Γ (since that letter is typical) if someone ever got around to creating categories for Greek. If that set-up is retained, it would probably make the most sense to associate a list of "basic letters" with every language (in Module:languages?), and then have {{head}} add categories for the open-ended set of "all characters not in the list of typical letters". (Alternatively, we could revise our decision to delete Category:English terms spelled with ' and to not have categories for Category:English terms spelled with A, etc, and allow categories for every character for every language, but the categories for 'typical' letters would be very large.) As I processed the first batch of English-0 entries, I found several that contained no headword template, but a one-off bot or AWB search could catch those. - -sche(discuss)18:24, 12 February 2014 (UTC)
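The "basic letters" set-up could look something like this Python sketch: each language lists the characters that do not categorize, and every other character in the page name yields a "terms spelled with" category. The table `BASIC_CHARS` and the function name are hypothetical; in practice the data would presumably live in a module such as Module:languages, as suggested above.

```python
# Hypothetical per-language sets of characters that do NOT categorize.
# The apostrophe is included for English because its category was deleted.
BASIC_CHARS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              " -'"),
}


def spelled_with_categories(lang_code, lang_name, pagename):
    """Return one 'terms spelled with X' category per non-basic character."""
    if lang_code not in BASIC_CHARS:
        return []  # languages without a list are left alone
    basic = BASIC_CHARS[lang_code]
    unusual = sorted({ch for ch in pagename if ch not in basic})
    return [f"{lang_name} terms spelled with {ch}" for ch in unusual]


print(spelled_with_categories("en", "English", "naïve"))  # ['English terms spelled with ï']
print(spelled_with_categories("en", "English", "Y2K"))    # ['English terms spelled with 2']
```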
I would be ok with doing this, but I'd suggest doing it first with a more regularly spelled language that has fewer entries, as a trial. —CodeCat18:36, 12 February 2014 (UTC)
I've been wanting to do this for the Russian obsolete letters ѣ, ѳ, і, and ѵ, which already have categories, but most words that have these letters do not have the category listed on the page. --WikiTiki8919:11, 12 February 2014 (UTC)
It will be important to be thorough with the list, though. We have to consider every single character, even punctuation and spacing. —CodeCat19:45, 12 February 2014 (UTC)
Actually, I think that (in the case of Russian at least) each categorizing character should be specified explicitly, rather than specifying each non-categorizing character. --WikiTiki8919:54, 12 February 2014 (UTC)
Hm... why? The class "characters other than Cyrillic letters which could potentially be used in Russian words" is open-ended, and "wanted categories" could help us find uses of such characters. Having all characters except the small, closed set of basic letters/spaces/punctuation categorize is more maintainable, I think. The one exception I see to that is not Russian but Chinese-character-using lects. For them, flipping things (and specifying which characters do categorize) does seem obligatory. - -sche(discuss)09:15, 16 February 2014 (UTC)
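The flipped arrangement (each language lists only the characters that should categorize) is what WikiTiki89 suggests for Russian and what -sche considers necessary for Chinese-character languages. A sketch using the Russian obsolete letters mentioned above; the table, the function name and the category naming pattern are assumptions.

```python
# Hypothetical per-language sets of characters that DO categorize.
CATEGORIZING_CHARS = {
    "ru": set("ѣѳіѵ"),  # the obsolete letters mentioned above
}


def explicit_categories(lang_code, lang_name, pagename):
    """Return categories only for characters explicitly listed for the language."""
    wanted = CATEGORIZING_CHARS.get(lang_code, set())
    return [f"{lang_name} terms spelled with {ch}"
            for ch in sorted(set(pagename) & wanted)]


print(explicit_categories("ru", "Russian", "хлѣбъ"))  # ['Russian terms spelled with ѣ']
```

The trade-off -sche describes shows up here: a character missing from the list silently produces no category, whereas the exclusion approach surfaces it as a wanted category.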
Am I right in guessing that the {{vandal}} template has the unintended side-effect of pinging the account being reported? If so, is there any way to make it not do that? Chuck Entz (talk) 07:45, 14 February 2014 (UTC)
According to the documentation, it pings the vandal only if whoever used it signed his addition with tildes and the link to the vandal's userpage is constructed in the form that pings rather than the one that does not. If we want to make sure the template doesn't ping the vandal, we can change the link to the latter. If we don't want that, but some user doesn't want to ping the vandal, he can avoid using tildes to sign his name.—msh210℠ (talk) 22:14, 14 February 2014 (UTC)
I'm glad he did. This diff shows why it was needed. I just happened to see the vandalism report about the same time as the vandal did, so it ended up being the vandal's last edit. Chuck Entz (talk) 22:44, 14 February 2014 (UTC)
Where to go for data dumps?
I'm curious where I should go to get an XML dump of Wiktionary, and more specifically, the ZH Wiktionary. The ZH one has Middle Chinese readings for many more entries than the EN WT, an area I'm currently interested in. My Chinese is just good enough for me to puzzle out the reading info on a given entry, but not good enough to be able to read their GP or other fora, or to ask there for dump data. Could anyone here point me in the right direction? ‑‑ Eiríkr Útlendi │ Tala við mig08:31, 14 February 2014 (UTC)
It says: "If you already have some experience with editing our sister project Wikipedia, then you may find our guide to Wikipedia users useful." But it's a guide for Wikipedia users, not a guide to them; i.e. we aren't listing specific users and their characteristics (although that would be amusingly controversial). Equinox◑23:08, 18 February 2014 (UTC)
The same way you edit any other kind of page. Especially since the welcome template does not actually have much logic in it, it's mostly just text. --WikiTiki8920:23, 19 February 2014 (UTC)
If you have tabbed languages turned on, then when you look at one language's entry, all and only that language's categories are supposed to be shown at the bottom. However, in one entry, it appears that the "Ireland" inside the context template of sense 6 is somehow triggering the Irish language section, because Category:Irish English (and all other English-language categories that are defined from that point until the end of the English entry) is appearing under the "Irish" tab instead. Any ideas how to fix that? —Aɴɢʀ (talk) 14:31, 19 February 2014 (UTC)
I was sort of hoping for a fix that could be implemented with a simple edit to a template or something, rather than completely overturning the current organization of the wiki. —Aɴɢʀ (talk) 20:39, 19 February 2014 (UTC)
But why doesn't the tabbed languages software just display the categories that are under the language heading whether or not they seem to relate to the language? --WikiTiki8922:42, 20 February 2014 (UTC)
That's not actually possible. The location of a category on a page/in the wikitext isn't something the script can access. Only the order is accessible. When the categories are sorted, if the category name begins with the name of the next language section ("Irish " in this case), it's assumed that the next language section has started (unless the category name ends in "letter names", "script characters", or "mythology"). This tends to work quite well in most cases. --Yair rand (talk) 23:03, 20 February 2014 (UTC)
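The heuristic Yair rand describes can be sketched in Python to show why "Irish English" lands under the Irish tab. This is a reconstruction from the description only, not the actual tabbed-languages code; the function name and data shapes are assumptions.

```python
# Suffixes that never start a new language section (per the description above).
EXCEPTION_SUFFIXES = ("letter names", "script characters", "mythology")


def assign_categories(sections, categories):
    """Assign categories (in page order) to language sections (in page order).

    A category is taken to start the next section when its name begins with
    that section's language name followed by a space.
    """
    assigned = {name: [] for name in sections}
    current = 0
    for cat in categories:
        nxt = current + 1
        if (nxt < len(sections)
                and cat.startswith(sections[nxt] + " ")
                and not cat.endswith(EXCEPTION_SUFFIXES)):
            current = nxt
        assigned[sections[current]].append(cat)
    return assigned


result = assign_categories(
    ["English", "Irish"],
    ["English nouns", "Irish English", "English slang", "Irish nouns"],
)
print(result)
# {'English': ['English nouns'], 'Irish': ['Irish English', 'English slang', 'Irish nouns']}
```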
This specific problem could be solved by renaming the category in question Category:Hiberno-English (though that may refer to something slightly different to Irish English), but the more general problem remains. And renaming our categories to various awkward phrases that no one actually uses just to prevent them from breaking seems very much like getting the wrong end of the stick. —Aɴɢʀ (talk) 23:45, 20 February 2014 (UTC)
Per-browser preferences
Every single day, I have to go here, untick "Highlight the inflection line of some entries" and tick "Show the translation sections expanded, instead of having them collapsed". Why? SemperBlotto (talk) 09:20, 20 February 2014 (UTC)
Love 'em. I'm kept logged in, so I assume all my cookies are functioning normally. Other sites that require cookies all work normally. SemperBlotto (talk) 16:12, 22 February 2014 (UTC)
Duplicated the problem with a restart of my PC. Perhaps the cure is like the one in the doctor joke: Patient: "Doc, it hurts whenever I do X" / Doctor: "Stop doing X". Maybe you could have your PC "hibernate" or something overnight, instead of flipping it off. DCDuringTALK17:39, 22 February 2014 (UTC)
It's a machine-readable version of Wiktionary:List of languages. I've tried to include all the data from the existing lists.
It would be good to nail down the format so script/bot writers could depend on it. For the program I'm writing, I'm currently only using columns 2–4 (language code, canonical name, category name). If anyone has thoughts on what data should and shouldn't be included please chime in or edit the thing. I don't actually know how much data is stored per language. Pengo (talk) 02:20, 22 February 2014 (UTC)
User:Ruakh was working on a module for exporting data using JSON. That seems much more sensible than csv, which is kind of outdated and inflexible. —CodeCat02:23, 22 February 2014 (UTC)
Cool. The CSV is slightly more lightweight, but JSON is certainly more robust. I've now added a link to the JSON module from List of languages (and from my csv list) so others might find it.
Right. The current format of the etymology-only language data is rather unsuitable for exporting, as it does not distinguish between "canonical" codes and aliases. show_etym in Module:list of languages uses a somewhat grotesque hack to present WT:LOL/E the way it currently does. Keφr09:53, 23 February 2014 (UTC)
As for documentation, I remember thinking something about deliberately not documenting it with too much detail (e.g. not giving a URL to the API entry point which exports the data), so that it would not be so easy to abuse. (While the knowledgeable can just read mw:API:Expandtemplates to construct an API call to get the data they want.) Also, User:Pengo: can you switch to the JSON exporter now, so that we can get rid of the CSV version? I think maintaining two separate "official" data exporting methods (with CSV being, I imagine, somewhat less stable and straightforward) is not a very good idea. Keφr10:31, 23 February 2014 (UTC)
Thanks for your feedback. I'll stick to using and developing the CSV, if you don't mind. The columns have clear meanings, it contains less redundant data, the page has less potential for "abuse" (it uses less CPU, if that's what you're worried about? Hiding the method for viewing it means it never gets cached, btw), and the format of the JSON is undocumented and seems to be geared towards showing the undocumented internal representation of the data rather than a user-centric data export. Thanks but no thanks. Pengo (talk) 08:31, 24 February 2014 (UTC)
Our data format is well-documented and it contains no redundancies, see Template:language data documentation. The JSON exporter simply remaps that format to JSON types (and can also filter out unneeded language data). Your CSV representation redundantly lists language family both as a code and name; the "line counter" field is also superfluous. I can also imagine problems with delimiter collision, if we ever decide to put a comma in an alternative language name (not likely, but possible). MediaWiki caching may be actually harmful here — pages sometimes fail to be purged after their dependencies are changed, which may make the bots/whatever use non-fresh data (I think it happened with WT:LOL a while ago).
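The delimiter-collision concern is easy to reproduce with naive comma splitting. A CSV writer that quotes fields (Python's csv module does so by default for any field containing the delimiter) avoids it, and JSON has no field delimiter to collide with. The row below is made up for illustration.

```python
import csv
import io
import json

row = ["xx", "Examplish", "Examplish, Old"]  # hypothetical code, name, alternative name

naive_line = ",".join(row)
print(naive_line.split(","))  # four fields instead of three

buffer = io.StringIO()
csv.writer(buffer).writerow(row)  # quotes the field containing a comma
csv_line = buffer.getvalue()
print(next(csv.reader(io.StringIO(csv_line))))  # back to three fields

print(json.loads(json.dumps(row)))  # JSON round-trips the same row
```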
I'm using Firefox and there's no resize handle on the text boxes here at Wiktionary, though there is at Wikisource. —Aɴɢʀ (talk) 20:45, 22 February 2014 (UTC)
I do get a resize handle, but it's not remembered. I wondered if there was a way to make it stay that way. It works now, thank you for the advice. —CodeCat21:03, 22 February 2014 (UTC)
I turned enhanced editing off, and the resize handle appeared for a moment, then disappeared again when Dot's Syntax Highlighter kicked in. I don't have that at Wikisource, which must be why I always have the handle there. —Aɴɢʀ (talk) 22:36, 22 February 2014 (UTC)
I highly recommend using external editors for editing large amounts of code and the default unenhanced plaintext editor for small amounts. The enhanced editor is horrible. --WikiTiki8922:43, 22 February 2014 (UTC)
Lebanese Arabic is currently treated as part of North Levantine Arabic ("apc"). We are actually in the middle of a discussion about merging North and South Levantine into just plain Levantine (see here). --WikiTiki8900:16, 25 February 2014 (UTC)
Automatic redirects for characters/character combinations
This is not related to Wiktionary directly, but is an issue that has been discussed, so I'm asking here. With a wiki, how do you set two characters or sets of characters to be equivalent? For example, when the user searches for a word with dz in it, I want the wiki to automatically consider dz to be dᶻ (such as in a word like d̲ᶻ̲idᶻəlal̓ič). Even better, is it possible to have the search engine look for dz first, and then search for dᶻ if no word has dz? --BB12 (talk) 01:32, 25 February 2014 (UTC)
Kind of. Rather than searching for "dz" and then "dᶻ", the search index might simply store dᶻ as "dz" in an index of normalized word forms. Usually a search engine creates a "normalized", "canonical", or "stemmed" form of the word for some or all of its indexes. Not sure why you're asking here. Pengo (talk) 01:55, 25 February 2014 (UTC)
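As a rough illustration of that kind of normalization: Unicode compatibility decomposition (NFKD) already maps the modifier letter ᶻ (U+1DBB) to a plain z, and stripping combining marks also removes the underlines. Applying the same function to indexed titles and to queries would make "dz" find "dᶻ". This is a Python sketch only and says nothing about how MediaWiki's own search is configured. Stripping every combining mark may also be too aggressive for Lushootseed, since it drops the comma above in l̓ and the caron in č as well.

```python
import unicodedata


def search_key(text, strip_marks=False):
    """Fold a word into the form used for index lookups.

    NFKD turns compatibility characters such as modifier letters into their
    base letters (U+1DBB MODIFIER LETTER SMALL Z becomes "z"). With
    strip_marks=True, combining marks (underlines, carons, etc.) are dropped too.
    """
    folded = unicodedata.normalize("NFKD", text)
    if strip_marks:
        folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    return folded


print(search_key("d\u1dbb"))  # dz
print(search_key("d\u0332\u1dbb\u0332id\u1dbb\u0259lal\u0313i\u010d", strip_marks=True))  # dzidzəlalic
```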
I have a wiki and would like to do that to aid the user in searching for words. I've spent quite a bit of time looking for a way to do that, but I've found mention of something like that only on Wiktionary (and possibly one other place), so I'm asking here in the hopes somebody knows how to do it. Googling on mediawiki text normalization brought up some interesting results, but they were not exactly what I'm looking for (or else they were over my head).
FWIW, this is something that also might be useful for Wiktionary. Requiring the user to type a superscript "z" or underlines in the word d̲ᶻ̲idᶻəlal̓ič dᶻidᶻəlal̓ič (Seattle in Lushootseed) is more than many people can handle. --BB12 (talk) 03:43, 25 February 2014 (UTC)
I just learned my text is out of date. No need for underlining, but the issue with dz and other superscripts still remains. --BB12 (talk) 04:59, 25 February 2014 (UTC)
Evidently the method used is the Universal Language Selector. I've put in a request to bugzilla, but I don't know what the process will be. Hopefully I can help out in some way..... --BB12 (talk) 20:42, 28 February 2014 (UTC)
Tlingit noun templates
I don't know the first thing about templating, and I'd appreciate some help with creating some templates for Tlingit nouns. I need a template that can show plurals, diminutives, possessed forms, and all combinations of these three, but doesn't require any of them.
You're talking about a headword template, similar to {{en-noun}}, correct? How many plurals / diminutives / possessed forms of one noun can there be? If there are a lot a full inflection table template might make more sense. DTLHS (talk) 04:38, 26 February 2014 (UTC)
I guess a table would make more sense since there are so many potential forms of a word. I'm still hesitant though because many nouns don't have plural forms or diminutive forms, so the tables would have a lot of dead links, or just forms identical to the base form. In any case, I'm not sure how to make an inflectional table either.
I wanted a list of words that have not yet been translated into my language, sorted according to how common/popular these words are. First, because a lot of very common words have not yet been translated into my language (Greek). Second, because it is quite difficult to actually check if a word has been translated or not. When an entry does not exist at all, you see it in red. But if you want to check whether a translation in a particular language exists, you have to open existing entries, expand the translations, and then check for translations in that language.
So, I downloaded the frequency lists from Project Gutenberg and the wiktionary dump and wrote some basic code to do this:
#!/bin/sh
# Create the frequency list from the downloaded HTML files (put them all in one folder).
# These two commands are commented out; uncomment them to rebuild freqlist:
# grep -h "</a></td>" * > freqlist
# perl -i -pe 's/<.*>(.*?)<.a><.td>/$1/' freqlist
# One line per article (redirecting after "done" keeps every dump file, not just the last)
for i in *.xml; do perl -lpe 'BEGIN { $/ = "</page>" } s/\s/ /g' "$i"; done > untranslated
# Remove articles without translations and articles with Greek translations
perl -i -ne 'print if /\{\{trans-top/' untranslated
perl -i -ne 'print unless /\{\{t.?\|el\|/' untranslated
# Keep the title only
perl -i -pe 's/^.*<title>(.*)<.title>.*$/$1/' untranslated
# Create the final list: words in freqlist that are also untranslated, in frequency order
awk 'FNR==NR{a[$0];next}($0 in a)' untranslated freqlist > wordlist.html
perl -i -pe 's@^(.*)$@<a target="_blank" href="https://en.wiktionary.org/wiki/$1#Translations">$1</a><br>@' wordlist.html
mkdir -p wordlist
split -l 100 -d --additional-suffix=".html" wordlist.html wordlist/wordlist
The results are not perfect since they contain a lot of plurals/past participles etc., but they are definitely usable. I'd like to filter out some more words by also comparing the list to the wordlist of a "standard" dictionary, such as wordnet or the free version of webster's. Is anyone aware of another list I could use?
On a more ambitious note, would you consider adding a similar feature to the site? I'm thinking of a page where people would be able to press a button and be served with the next most common untranslated word in their language. The list would be updated every 2-4 weeks and the counter would be reset to zero. So, words that were clicked on but not translated would return to the start of the list and be served to new users, while words that were translated in the meantime would be removed. Jenniepet (talk) 20:18, 26 February 2014 (UTC)
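The proposed button boils down to a frequency-ordered queue whose position is reset whenever the list is regenerated from a new dump. A hypothetical sketch (class and method names are made up; keeping the counter on a server so that all users share it is not shown):

```javascript
// Hypothetical sketch of the "next untranslated word" button.
// `words` is assumed to be sorted by frequency, most common first.
class NextWordQueue {
  constructor(words) {
    this.load(words);
  }

  // Called when a new dump has been processed: words translated in the
  // meantime are simply absent from `words`, and the counter restarts at 0.
  load(words) {
    this.words = words.slice();
    this.position = 0;
  }

  // Serve the next most common word, or null once the list is exhausted.
  next() {
    if (this.position >= this.words.length) return null;
    return this.words[this.position++];
  }
}

const queue = new NextWordQueue(["look", "low", "get"]);
console.log(queue.next(), queue.next()); // look low
queue.load(["low", "get"]); // "look" was translated, "low" was skipped
console.log(queue.next()); // low
```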
You mean a bot that would regularly add, say, a hundred new words to the Category:Translation requests page? That sounds good. However, I don't think making the missing translations visible inside the respective articles adds anything. For example, among the first hundred untranslated words for Greek I found "look" and "low". If I had opened these entries by chance and seen that they had no Greek translation, I would probably have added one anyway. Jenniepet (talk) 20:18, 26 February 2014 (UTC)
I have a program that sorts words lacking translations based on how many translations the table has. The results have been great. — Ungoliant(falai)21:05, 26 February 2014 (UTC)
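The idea can be sketched as follows. This is not Ungoliant's program, whose code isn't shown in the thread; it only illustrates the ranking under an assumed data shape, where each entry lists the language codes present in its translation tables.

```javascript
// Hypothetical sketch: among entries whose translation tables lack `lang`,
// list first those whose tables already contain the most languages.
// Each entry is assumed to look like { word, langs: ["de", "fr", ...] }.
function rankMissing(entries, lang) {
  return entries
    .filter((e) => !e.langs.includes(lang))
    .sort((a, b) => b.langs.length - a.langs.length)
    .map((e) => e.word);
}

const entries = [
  { word: "low", langs: ["de", "fr"] },
  { word: "look", langs: ["de", "fr", "ru", "ja"] },
  { word: "water", langs: ["de", "el", "fr"] },
];
console.log(rankMissing(entries, "el")); // [ 'look', 'low' ]
```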
That's a great idea. Could you send me your list of "English words with the most translations" so that I can combine it with mine and see whether the result is even more useful? I'd appreciate a longer list (somewhere between 20,000 and 40,000 words). I did try to combine my list with the WordNet and Webster 1913 wordlists after all, but the results were disappointing. Essentially, I'd like a list that doesn't include words like said and came on the first page of results. However, there is one thing I would change in your approach. I think that your condition "words that are missing one translation in a given language" is too strict. For languages with very few existing translations, I would suggest listing only "words without any translations". For very popular languages, the ideal solution would be something along the lines of "words missing at least 50% of the translation glosses/senses". The reason is that many well-known words have more than 10 translation glosses, and some of them can be really obscure (e.g. baseball or American football terminology) or almost identical. So, people might leave them blank on purpose.
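The proposed selection rule could be sketched like this. The data shape (one object per translation table/gloss, listing the language codes it contains) and the function names are assumptions for illustration; extracting that information from the dump's wikitext is left out. A threshold of 1 gives "words without any translations", 0.5 gives "words missing at least 50% of the glosses".

```javascript
// Hypothetical data shape: { word, glosses: [{ gloss, langs: [...] }] },
// one object per translation table, listing the language codes it contains.
function missingShare(entry, lang) {
  const missing = entry.glosses.filter((g) => !g.langs.includes(lang)).length;
  return missing / entry.glosses.length;
}

// threshold = 1   -> words without any translation (small languages)
// threshold = 0.5 -> words missing at least half of the glosses (big languages)
function selectWords(entries, lang, threshold) {
  return entries
    .filter((e) => e.glosses.length > 0 && missingShare(e, lang) >= threshold)
    .map((e) => e.word);
}

const entries = [
  {
    word: "ball",
    glosses: [
      { gloss: "round object", langs: ["de", "el"] },
      { gloss: "baseball: pitch outside the strike zone", langs: ["de"] },
      { gloss: "formal dance", langs: ["de"] },
    ],
  },
  { word: "low", glosses: [{ gloss: "in a low position", langs: ["de"] }] },
];
console.log(selectWords(entries, "el", 0.5)); // [ 'ball', 'low' ]
console.log(selectWords(entries, "el", 1)); // [ 'low' ]
```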
@Ungoliant MMDCCLXIV. The lists are great! Are you able to generate such lists for Russian, Japanese and Mandarin, please? I could use other languages, such as German, Arabic, Korean, etc. but I won't push my luck now :) --Anatoli(обсудить/вклад)00:10, 28 February 2014 (UTC)
@Ungoliant MMDCCLXIV I uploaded a list of words that haven't been translated into Greek, created with my original script: list1. I also uploaded two differently sorted lists of the words common to my list and yours: list2 and list3. I think that the combined lists 2 and 3 are much better. My original list had too many inflected forms and your list has too many proper names. Their combination looks perfect! (I might have a slight preference for list2.)
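Combining the two lists amounts to taking their intersection and then choosing whose order to keep. A small sketch with hypothetical names and sample data; which of list2 and list3 uses which order isn't stated, so the two calls below just show both options.

```javascript
// Hypothetical helper: words present in both lists, in the order of
// listA (orderBy = "A") or of listB (orderBy = "B").
function commonWords(listA, listB, orderBy = "A") {
  const [primary, other] = orderBy === "A" ? [listA, listB] : [listB, listA];
  const inOther = new Set(other);
  return primary.filter((w) => inOther.has(w));
}

// Inflected forms (only in the frequency list) and proper names
// (only in the translation-count list) both drop out.
const frequencyList = ["said", "look", "came", "low", "water"];
const translationList = ["water", "look", "Paris", "low"];
console.log(commonWords(frequencyList, translationList, "A")); // [ 'look', 'low', 'water' ]
console.log(commonWords(frequencyList, translationList, "B")); // [ 'water', 'look', 'low' ]
```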
And now we come to the fun part: for these lists to be really useful, they should not lead you to words that have been translated in the meantime. So, they should be updated every 1-3 months using the latest dump. But what I'd also like to propose is that once someone clicks on a link on the list, that link should be removed (or hidden) from the list. That way, if two or more contributors are working their way down the list, none of them will come up against a multitude of "dead" links. Sadly, I don't know how to implement this. I don't even know if it can be done using only JavaScript. Does anyone have any suggestions? Jenniepet (talk) 03:48, 28 February 2014 (UTC)
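Plain JavaScript in the reader's browser can only remember clicks per browser (for example with localStorage), so it would hide links for the person who clicked them but not for other contributors; sharing that state would need something on the server side, such as a bot that regenerates the list page. A sketch of the per-browser part, assuming a list page whose link texts are the words themselves; the storage key and function names are hypothetical.

```javascript
// Hypothetical sketch: remember clicked words and hide them on later visits.
// `storage` is anything with the Web Storage getItem/setItem interface
// (window.localStorage in a browser). This is per browser only.
const STORAGE_KEY = "untranslated-clicked"; // hypothetical key name

function loadClicked(storage) {
  const raw = storage.getItem(STORAGE_KEY); // null if nothing stored yet
  return new Set(raw ? JSON.parse(raw) : []);
}

function recordClick(storage, word) {
  const clicked = loadClicked(storage);
  clicked.add(word);
  storage.setItem(STORAGE_KEY, JSON.stringify([...clicked]));
}

function visibleWords(storage, words) {
  const clicked = loadClicked(storage);
  return words.filter((w) => !clicked.has(w));
}

// In a browser, one would attach for each list link:
//   link.addEventListener("click", () => recordClick(localStorage, link.textContent));
// and hide the links whose text is not in visibleWords(localStorage, ...).

// In-memory stand-in for localStorage, for trying this outside a browser.
const memoryStorage = {
  data: new Map(),
  getItem(key) { return this.data.has(key) ? this.data.get(key) : null; },
  setItem(key, value) { this.data.set(key, String(value)); },
};
recordClick(memoryStorage, "look");
console.log(visibleWords(memoryStorage, ["look", "low"])); // [ 'low' ]
```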
// Welsh
creation_rules['cy'] = function (params, entry) {
  // Look up the form-of template for the requested form; undefined if there is no rule.
  var template = {
    'plural': 'plural of',
    'equative': 'equative of',
    'comparative': 'comparative of',
    'superlative': 'superlative of'
  }[params.form];
  if (!template)
    throw new PreloadTextError('No rule for "' + params.form + '" in language "' + params.lang + '".');
  entry.def = '{{' + template + '|' + params.origin + '|lang=' + params.lang + '}}';
};
Um, what? Normally it works without having to add the language to the JS. In fact, that's how it worked every single other time I've done this. But anyway, for Welsh the automatic plurals really ought to have a mutation table automatically appended to them, so I wouldn't mind if the logic for that went in creationrules.js... but maybe that should be done after we work out the templates involved. —Μετάknowledgediscuss/deeds02:05, 28 February 2014 (UTC)
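For reference, the regular part of such a mutation table is mechanical. A rough, hypothetical sketch (names made up; not how creationrules.js or any existing template does it) covering only the regular initial-consonant changes; h-prothesis before vowels and lexical exceptions are ignored, and a capitalised initial comes out in lower case when mutated. Example: cathod, the plural of cath.

```javascript
// Hypothetical sketch: regular Welsh initial-consonant mutations.
const SOFT = { p: "b", t: "d", c: "g", b: "f", d: "dd", g: "", m: "f", ll: "l", rh: "r" };
const NASAL = { p: "mh", t: "nh", c: "ngh", b: "m", d: "n", g: "ng" };
const ASPIRATE = { p: "ph", t: "th", c: "ch" };

// Digraphs are checked first so that e.g. "chwaer" (ch-) is not treated as c-.
const DIGRAPHS = ["ch", "dd", "ff", "ng", "ll", "ph", "rh", "th"];

function initialOf(word) {
  const w = word.toLowerCase();
  return DIGRAPHS.find((d) => w.startsWith(d)) ?? w.charAt(0);
}

function mutate(word, table) {
  const init = initialOf(word);
  // Initials without an entry (vowels, f, ch, th, ...) stay unchanged.
  return Object.hasOwn(table, init) ? table[init] + word.slice(init.length) : word;
}

function mutationTable(word) {
  return {
    radical: word,
    soft: mutate(word, SOFT),
    nasal: mutate(word, NASAL),
    aspirate: mutate(word, ASPIRATE),
  };
}

console.log(Object.values(mutationTable("cathod")).join(" "));
// cathod gathod nghathod chathod
```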
A bunch of spam talk pages have been created in the past couple of days, each using a different IP from any of several parts of the world, with mostly the same text and with the same words in the edit comment (I'm adding extra characters in even-numbered positions to avoid giving them a free search-engine hit): "Fqrqiqeqnqdq qFqiqnqdqeqrq". Admins can find plenty of examples through the deletion log.
Could someone who knows how add a filter to block such edits?
Also, at least one of the IPs was used a year ago to post a test edit with an edit comment of "Test, just a test" and text of "Hello. And Bye." If anyone sees such an obviously automated edit, don't just delete it; block the contributor as well. This might reduce the chance of them coming back later to post spam.
In the past we've let these go because the edit itself doesn't violate any rules, overlooking the fact that using a bot to add content of any kind without going through the approval process is a blockable offense. We may not always block suspected bots if they're doing something innocuous like adding interwikis, but we have every right to do so. Chuck Entz (talk) 00:32, 1 March 2014 (UTC)