May 2010

Century Dictionary Bot

I've been looking at the Wordnik dictionary recently and love the extensiveness of their etymologies -- many of which come from the public-domain Century Dictionary. Is there a reason why nobody has tried to import this dictionary to Wiktionary before, and would I be allowed to create a bot to fill in non-existent etymologies with those in the Century Dictionary? Divinenephron 16:21, 1 May 2010 (UTC)

Dan Polansky has been adding a bunch by hand. I don't see a problem with bot addition to entries lacking etymologies myself, as long as you can teach the bot to add the stuff in our format. -Atelaes λάλει ἐμοί 18:05, 1 May 2010 (UTC)

It would be a really neat thing to do. We've had some problems in the past when importing definitions (from Webster 1913), but I think etymology is more stable. If you need a technical hand, I can maybe help, though I know next-to-nothing of etymology. Can you give a rough estimate of how many etymologies are importable? Conrad.Irwin 19:16, 1 May 2010 (UTC)

The Wikipedia entry on The Century Dictionary states that it has over 500,000 entries. A quick scan of one page leads me to suggest that half of them have etymologies. Acquiring the etymologies from an OCRed document shouldn't be too hard – they're contained within square brackets. I was planning to parse them with NLTK to convert them to a wikified format. I'm experienced with Python but not natural language processing -- part of the fun of doing this project. I can't start on it until after the 21st of June however (exams). Divinenephron 11:31, 2 May 2010 (UTC)
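As a rough illustration of the bracket-extraction step described above, here is a minimal Python sketch. It assumes the OCR text for one entry is already available as a string; the function name and the sample text are hypothetical (the sample is invented, not quoted from the Century Dictionary), and nested brackets or OCR errors that drop a bracket are not handled.

```python
import re

# Innermost [...] spans only; nested brackets are deliberately not handled.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def extract_bracketed(entry_text):
    """Return the text of every [...] span, with runs of whitespace collapsed."""
    return [" ".join(match.split()) for match in BRACKETED.findall(entry_text)]


# Invented sample entry, used only to show the call.
sample = "word, n. [< source form,\n  with a line break.] Definition text."
print(extract_bracketed(sample))
```

Converting the extracted text into wiki templates would be a separate and much harder step; this only isolates the candidate etymology strings.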

It's a dated and in many respects obsolete dictionary based on obsolete scholarship, and every single etymology should be manually verified against some latest compilation of English etymologies. We also employ templates such as {{etyl}} {{term}} and {{proto}} that wouldn't be easy to automate properly (Latin macrons and Greek in Greek script with transliterations spring to mind). The best that you can do is to dump that etymological info on some subpage for editors to weed out garbage and incorporate manually into entries with missing etymologies. Also a link to an online image of the corresponding page scan (to doublecheck for the scannos). --Ivan Štambuk 17:44, 4 May 2010 (UTC)

na:k

The most recent dump of main-namespace pagetitles includes na:k, which is a page impossible to access AFAICT. (I wonder, indeed, how it was created. Perhaps by import?) It also includes two other similar-looking titles, which do not match (current) Wiktionaries and so (now) present no problem. Any ideas on how to deal with such?—msh210℠ 18:26, 3 May 2010 (UTC)

It has a sneaky (invisible) U+FEFF in front of it na:k to get round the interwiki problem. Not the best idea we've ever had, but it does work (confused me muchly when it first appeared too). Conrad.Irwin 19:45, 3 May 2010 (UTC)

It is indeed evil, since you have no chance ever of finding that page (even when you type it in the search bar it points you to Nauruan Wiktionary). A better idea is to use the IPA colon, which at least appears in the edittools. -- Prince Kassad 20:21, 3 May 2010 (UTC)

"not the best idea you've ever had"? Really? I should say so. I would hope it is the stinkiest, as I would hope you haven't done worse. (or have you ...) The entry is entirely unusable ... use the IPA colon, since that is precisely what it is; Alvarez-Hale is the IPA transcription with some of the characters simplified to be easier to type, so people write : instead of ː. Robert Ullmann 21:55, 3 May 2010 (UTC)

Um, sorry, you implied that it wasn't your idea. I guessed who might do such a thing though, and looked, and was exactly right. Sigh. Robert Ullmann 22:03, 3 May 2010 (UTC)

The various Manda languages use a zero-width joiner at the end of their names so AutoFormat can differentiate between them. That *is* worse than the above. -- Prince Kassad 21:57, 3 May 2010 (UTC)

To work around what? AF only needs the entry titles, what would the problem be? Can you give me an example? Robert Ullmann 22:03, 3 May 2010 (UTC)

AutoFormat does not like it when entries with identical language name appear in translation sections, and will tag such cases with a maintenance tag. That's why this workaround is needed. -- Prince Kassad 22:06, 3 May 2010 (UTC)

You couldn't have asked me? Before creating a mess? Are you going to go clean up that shit? It isn't a "workaround", it is a . WHAT were you thinking? How could any editor figure out what was going on? That could have cost me dozens of hours to try to figure out your shit. NOW GO FIX IT. AND FIND EVERY CORRUPTED ENTRY AND FIX IT! (and now I have to teach AF to tag entries with invisible crap in language names. sigh. probably a good idea anyway.)

Just effing use (Tanzania) or whatever after the language name. It doesn't work right with all the tools, but at least it is a known problem. Like Template:zma before you broke it. Robert Ullmann 22:16, 3 May 2010 (UTC)

rofl. I'm going to do it tomorrow. I didn't do that before because such names wouldn't make for particularly good entry names (while names with invisible symbols would make good entry names). -- Prince Kassad 22:22, 3 May 2010 (UTC)

not one little bit funny. fortunately, it looks like water might be the only entry you have to fix. Robert Ullmann 22:32, 3 May 2010 (UTC)

"would make good entry names"? What about the poor person trying to figure out why there are (would be) THREE entries titled "Manda", two of which are nearly impossible to get to? Robert Ullmann 22:39, 3 May 2010 (UTC)

Oh, and template:bzw, esp. as {bas} is Basaa. Robert Ullmann 23:01, 3 May 2010 (UTC)

I apologize for the tone used above.

The explanation, while not an excuse, is that I have little tolerance for people who intentionally introduce extremely hard to find bugs, to try to "fix" something. Think about it this way: if someone inserted byte-order marks or zero-width joiners with malicious intent, it would be clear, and extreme, vandalism. Doing it without malice and with good intent doesn't make it any easier to find and fix.

In both of these cases, asking here how to do something properly (workably) would be the proper action. Robert Ullmann 22:49, 3 May 2010 (UTC)
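For anyone who needs to hunt for the invisible characters discussed in this thread (U+FEFF and the zero-width joiner), a minimal Python sketch follows. It assumes that flagging every character in Unicode general category Cf ("format") is an acceptable approximation of "invisible"; the function name is hypothetical, and this is not how AutoFormat does or would do it.

```python
import unicodedata


def invisible_chars(title):
    """List (index, U+XXXX, name) for each format-category (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(title)
        if unicodedata.category(ch) == "Cf"
    ]


# A title with a leading U+FEFF, and a language name with a trailing ZWJ.
print(invisible_chars("\ufeffna:k"))
print(invisible_chars("Manda\u200d"))
```

Run over a dump of page titles or language names, any non-empty result marks a string worth inspecting by hand.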

posappboiler

I created a new template, whose code is very similar to {{poscatboiler}}. It categorizes POS appendices and provides a little introductory description for each one. See {{posappboiler}}; and Appendix:Portuguese pronouns for an example. --Daniel. 11:57, 5 May 2010 (UTC)

I would prefer that we left the display of appendices to editorial discretion. The introductory sentence in the template is unnecessary considering the page's title, and some appendices (Appendix:English adverbs) have better introductions already. "Main category" seems inappropriate when the page is about pronouns, but the category isn't about pronouns. ({{also}} seems to be what you meant). I also disagree with categorizing the appendices into the categories: they are not entries, and are linked to by {{poscatboiler}} (by some obscure method, I'd really appreciate it if you could finish documenting {{categorytree}}). Conrad.Irwin 13:44, 5 May 2010 (UTC)

en-letter, etc.

A few weeks ago, I created {{en-letter}}, {{pt-letter}}, {{es-letter}} and a few dozen other, related templates that categorize and generate headword lines for letters. I think they are very simple and understandable. I'm also implementing them into the respective entries in an effort to clean up that incredible overall mess, but I should take some time to finish this. --Daniel. 12:24, 5 May 2010 (UTC)

I would remove the first label (outside brackets) from the display, as it is inconsistent with how other things are done. I have no clue what you intend with {{headtempboiler}}, the code looks stupidly complicated, and it is undocumented. Instead of including all the logic necessary to render every single page in one place, put the logic in the individual templates and use {{infl}} as a base for display (if you really want a meta-template), this has the advantage that the code can be understood easily, and we can include thousands of languages in hundreds of parts of speech without the entire thing exploding. Conrad.Irwin 13:58, 5 May 2010 (UTC)

I suppose the first label, outside brackets, is the text "lower case" or "upper case". How is that inconsistent with any other thing? Do you think that it should perhaps be inside the brackets, or possibly not exist at all? Since I read your message, I tried to simplify {{langtempboiler}} and created a documentation for it. I didn't follow your suggestion of placing the logic in individual templates (such as using {{infl}} or mere MediaWiki syntax at {{es-letter}}, {{ca-letter}}, etc.) because I don't see exactly how this action could make the letter templates more simple, understandable or editable. --Daniel. 08:20, 6 May 2010 (UTC)

Thanks for splitting headtempboiler up, that is a great improvement already. I think the label should only exist when the opposite doesn't exist (otherwise it's obvious which it should be), and it should be in the brackets when that is the case. The documentation generated by the template seems to advocate wrong usage - delta isn't an English letter last time I checked; and there's no need to pass sc=Latn for English - not a huge problem, but it is an issue that using proper templates avoids. This particular template also irritates me because it doesn't bother trying {{uc:{{PAGENAME}}}} {{lc:{{PAGENAME}}}} which would seem to work correctly on all pages that it's currently used on (I don't deny there are some letters it won't work on - Dutch dotless I, some crazy digraphs). Conrad.Irwin 12:58, 6 May 2010 (UTC)

add a parameter to template:quote-book

Please can someone add a parameter to {{quote-book}} to do exactly the same as the existing volume= parameter but without the leading "Volume " text it produces. I've provisionally called the parameter "volume_plain" where I've needed it (pile#Etymology 3), so if you add it using a different or better name (I thought about "book" but decided that would be too confusing) please could you update that entry and note the new name here. It (along with several other parameters, including "volume" and "original") will also need adding to the template documentation. Cheers, Thryduulf 13:39, 5 May 2010 (UTC)

I think that's now done (as volume_plain). Conrad.Irwin 14:37, 5 May 2010 (UTC)

Is it possible that this change broke the "section=" and "chapter=" functionality? See paedantry. DCDuring TALK 16:45, 6 May 2010 (UTC)

Hm, I just made some tweaks. Is it still broken (sorry, I'm confused)? Conrad.Irwin 17:33, 6 May 2010 (UTC)

It works now. Thanks. DCDuring TALK 19:02, 6 May 2010 (UTC)

Category:Bulgarian form-of templates

I thought I should speak out about these three templates. bg-verb-form of, for example, adds a lot of categories that are at the moment red linked, and possibly not the sort of templates we want. es-verb-form of has similar problems, with one or two of its categories having failed RFD at the end of last year.

OK, in a nutshell, do we want super-specific verb form categories, or just Category:Bulgarian verb forms? Mglovesfun (talk) 11:27, 7 May 2010 (UTC)

Binisayâ language vs. Bundeli language

The table Index to templates/languages contains the line: "Binisayâ bns". But the bns code defines the Bundeli language, see . Do you have any ideas? -- Andrew Krizhanovsky 19:06, 7 May 2010 (UTC)

Fixed. That was a very old mistake. We don't really have coverage of Bundeli nor of Visayan languages, which is probably why this has gone unfixed for so long. Thanks for pointing it out. -Atelaes λάλει ἐμοί 20:32, 7 May 2010 (UTC)

Thank you. -- Andrew Krizhanovsky 07:50, 11 May 2010 (UTC)

Government glossaries

The EPA, EIA, and NASA have several glossaries which I believe are in public domain:

EPA

EIA

NASA

NASA's Earth Observatory library

IPCC

UNFCC

UNFCC

Copyrighted ones...for comparison

California Climate Change Portal Glossary (This one is (C) California)
Weathervane
Lenntech

(For some terms, there may be enough content to create a Wikipedia stub). I'm not on Wiktionary much, but is there any place to list links to online glossaries? Smallman12q 21:13, 7 May 2010 (UTC)

Generally these are not imported directly into the entry namespace for fear of inaccuracies, prescriptivism, and general untidiness. It may make sense to import them into the Appendix namespace or Transwiki namespace so that they can be processed more easily. Thank you for compiling the list! Conrad.Irwin 00:55, 8 May 2010 (UTC)

template:term and literal translations

Sometimes it is useful/desirable to have a literal as well as/instead of a more idiomatic gloss of a foreign language term in etymologies, etc. Where we do include such, we almost always mark them with "literally", however as this is not part of the meaning it should not be enclosed in the quote marks that {{term}} produces, but this is not currently possible to do using the template, so various constructs are used to get around the limitation. See for example eta#Etymolgy 2, tsunami#Etymology.

A better solution in my opinion would be to add an optional named parameter to {{term}}, perhaps "lit=", so

{{term|つなみ|sc=Jpan|tr=tsunami||seismic sea wave|lit=harbour wave|lang=ja}}

would produce: つなみ (tsunami, “seismic sea wave”, literally “harbour wave”), or maybe add a colon after "literally" to give: つなみ (tsunami, “seismic sea wave”, literally: “harbour wave”).

I don't know whether I prefer the colon or not, and either way I don't speak template anywhere near well enough to actually implement this myself. Thryduulf (talk) 12:34, 8 May 2010 (UTC)

Looks very good to me. Mglovesfun (talk) 12:53, 8 May 2010 (UTC)

I have been needing such function. --Vahagn Petrosyan 14:21, 8 May 2010 (UTC)

Me, too. I'd love to have it.—msh210℠ 17:25, 12 May 2010 (UTC)

One comment on template call style: You've inserted several named parameters before the calls to the numbered parameters. That makes it much harder for later editors to determine what you've done. Good style always calls all the unnamed parameters first, and then any named ones after. --EncycloPetey 15:43, 8 May 2010 (UTC)

I take the point, but I just copied the markup from the tsunami entry rather than giving it any thought. Is this something that would be useful for AF to tidy (i.e. move named parameters to the end of a template)? I've seen a lot of IPA templates where the named lang= comes before the first unnamed transcription, for example. Thryduulf (talk) 16:41, 8 May 2010 (UTC)
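To make the proposed tidy-up concrete, here is a minimal Python sketch of moving named parameters to the end of a template call. It is not AF's code; the function name is hypothetical, and it assumes a flat call with no nested templates or piped links and no "=" inside unnamed values. Unnamed parameters keep their relative order, so their positions are unchanged.

```python
def named_params_last(template):
    """Reorder a flat {{name|...}} call so unnamed parameters come first."""
    if not (template.startswith("{{") and template.endswith("}}")):
        raise ValueError("expected a {{...}} template call")
    name, *params = template[2:-2].split("|")
    # An empty parameter (as in "||") counts as unnamed and keeps its slot.
    unnamed = [p for p in params if "=" not in p]
    named = [p for p in params if "=" in p]
    return "{{" + "|".join([name] + unnamed + named) + "}}"


call = "{{term|つなみ|sc=Jpan|tr=tsunami||seismic sea wave|lit=harbour wave|lang=ja}}"
print(named_params_last(call))
```

A real implementation would need a proper wikitext parser, since splitting on "|" breaks as soon as a parameter contains a link or another template.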

There are other qualifiers and things that can go into the gloss, like part of speech (“take, n.”), inflection (“girl (dative)”), context or explanation (“to shell (nuts)”), non-gloss definitions (“verbal suffix”), etc. I'd rather just let an editor work any of these out within the quotation marks or after the {term}, than end up adding a half-dozen other parameters to the template. What's wrong with “seismic sea wave, lit. ‘harbour wave’”?

Or if we do go the latter route, let's start a workshop and make sure we've covered all the possible examples rather than adding parameters piecemeal. —Michael Z. 2010-05-08 19:07 z

"What's wrong with “seismic sea wave, lit. ‘harbour wave’”?" lots. You've got a qualifier in the quotes implying that is part of the gloss, further you've got a literal meaning set-off further within those quotes implying that it too is part of the gloss definition (remember that people can choose to have single or double quotes around the gloss definition, so single within single wont be set off the same way as single within double). Adding a a template to handle literal meanings ensures uniform and possibly customisable presentation of a common thing. If there are other parameters that need adding, then we can add them as well, but the present setup of ad hoc handling of literal meanings is not professional. Thryduulf (talk) 20:52, 8 May 2010 (UTC)

I believe I have seen professional publications place non-gloss definitions into the quotation marks. How else would you signal that that is serving the function of a gloss? We do routinely separate multiple glosses with commas, so how about setting off labels in italics: “seismic sea wave, lit. harbour wave”?

(And if the type of quotation marks is meaningful, then why do we let software change them indiscriminately? If a program can do this, then it should also handle nested quotation marks, or it's failing.) Anyway, the quotation marks represent a gloss – it's like a translation or definition, not a direct quotation. So non-gloss definitions could go in there, and also, “literally” plus nested quotation marks could be a good convention to signal literalness.

Adding a raft of obscure parameters to these templates just means that either the 3% of editors who learn them are going to be perpetual clean-up crew, or that half the dictionary will always be broken. Let's just use simple common-sense conventions for both editors and readers. K.I.S.S. —Michael Z. 2010-05-08 21:40 z

Except it won't be an obscure parameter, will be documented, and will ensure consistent formatting far better than requiring lit. (not literally, lit:, lit.:, etc). I don't know how the changing of single to double quotation marks works, but AIUI it doesn't change wikitext (where nested quotation marks not in a separate parameter would have to be). Thryduulf (talk) 21:56, 8 May 2010 (UTC)

So can this be added, or are there any other objections? Thryduulf (talk) 07:59, 11 May 2010 (UTC)

Based on the comments above, I think it is safe to say this is very desirable. I would add it myself, but I don't know how to do it, so please can someone with the appropriate skills please make the change. Thryduulf (talk) 19:37, 12 May 2010 (UTC)

Done.—msh210℠ (talk) 18:57, 26 July 2010 (UTC)

Cheers. Would it be possible to generate a list of all the places where the word "literally" is used in etymology sections? Thanks, Thryduulf (talk) 19:10, 26 July 2010 (UTC)

Template:ca-noun

I was thinking of merging {{ca-noun}} and {{ca-noun-mf}} into one, see {{ca-noun/New}} (when it exists). On a more general theme, it's a shame that these templates xx(x)-noun aren't more alike. For example, what purpose do {{es-noun-m}} and {{es-noun-f}} serve? Masculine and feminine nouns both have the same declension (add -s or -es for the plural) unlike Old French and Anglo-Norman, where the single biggest factor influencing declension is gender. Mglovesfun (talk) 13:11, 8 May 2010 (UTC)

A lot of the reason there are separate templates is historical. Originally, the English templates were that way (the horror), and the Spanish ones followed suit. Although the English noun template now is unified, the Romance languages have followed the Spanish set-up, resulting in the current plethora of mostly-duplicate templates. I'm working on consolidating a new {{es-noun}}, just as I did for {{es-verb}}. --EncycloPetey 15:41, 8 May 2010 (UTC)

I'd agree with that. {{fr-noun}} shows that you can do it all with one template, and I don't see anyone complaining about that. Ideally {{fr-noun-unc}} and {{fr-noun-inv}} should be merged in too; the problem is that {{fr-noun|m|-}} sets the singular and the plural to be the same, but not uncountable. See something like souris for an example. That's why nobody's set - to give uncountable, because it's already taken. Mglovesfun (talk) 10:11, 9 May 2010 (UTC)

ca-noun/New seems to be done now, but I won't have enough time today to move it and correct the pages using the old templates. So that'll be tomorrow. Feel free to comment here, or on the template's talk page (where it will be preserved longer). Mglovesfun (talk) 11:30, 9 May 2010 (UTC)

Done, I've checked some random entries and all is well, as it's designed to use the parameters of the former ca-noun template, just allowing feminine singular, feminine plural and uncountable options. But please do report anything that seems wrong/broken. Next step is to remove {{ca-noun-mf}} by hand. Mglovesfun (talk) 11:15, 10 May 2010 (UTC)

Much easier than I expected! Mglovesfun (talk) 12:23, 10 May 2010 (UTC)

Great to hear! --EncycloPetey 19:57, 10 May 2010 (UTC)

Old Church Slavonic font

I can't understand why Old Church Slavonic words (chiefly in etymologies) have to be significantly larger than all the other words. I find my eye drawn to them as soon as I see them, rather than reading the etymology from the first word to the last. PS I think the script template is {{Cyrs}}. Mglovesfun (talk) 10:09, 10 May 2010 (UTC)

It appears that the script template just applies the css class "Cyrs", which is set in MediaWiki:Common.css to use a font size of 137%, which seems to make the x-height of Cyrs script letters approximately equal to the cap-height of Latin letters. I don't know how or why the figure of 137% has been arrived at, but it is far from the only font to have a size of >100% (Lao and Mongolian seem to use 140%). I've not changed it, as I don't know what would be a good alternative value to give both legibility and approximately equal x-heights. That the font is quite bold might also have an effect. Thryduulf (talk) 10:49, 10 May 2010 (UTC)

{{Cyrl}} can be a bit awkward when used in bolded words, the words literally look a bit blurred, like I've taken my glasses off. Mglovesfun (talk) 10:51, 10 May 2010 (UTC)

Cyrl has a very bad set of font choices. Your best bet is to turn it off in your monobook.css. The font size increase for Cyrs is needed because the font for OCS is rather small by default. -- Prince Kassad 11:04, 12 May 2010 (UTC)

But maybe not 137%, right? Mglovesfun (talk) 11:47, 12 May 2010 (UTC)

Indeed, I wasn't disputing that >100% was needed, just that 137% seems a bit much in the specific case of Cyrs. If nobody objects, I think we should try reducing it to 125% and see if that fits better and is still readable, etc. If there is a way to test and experiment without changing the global .css file I don't know it, so if anyone does perhaps they should do it in case I muck things up! Thryduulf (talk) 12:04, 12 May 2010 (UTC)

OK changed to 125%. Feel free to revert or experiment further if this doesn't work as we hope. Thryduulf (talk) 07:40, 15 May 2010 (UTC)

I probably did that, and I don't remember the details, but 137% will turn medium-sized 16px text into 22px. Unfortunately there isn't a single font for old Slavonic, so there will be some size variation, but I did test all the ones I had. Go ahead and try 125%.

In print publications, the Slavonic fonts normally look very bold compared to surrounding modern Cyrillic or Latin text, so I don't have a problem with the same effect here. But I don't believe that {Cyrs} should appear in bold anywhere in Wiktionary. —Michael Z. 2010-05-15 06:16 z

It depends on what font you have downloaded for OCS. Back when we first started making use of OCS fonts, the only nice one that was freely available was BukyVede, which is quite small and must be enlarged 137%. Since then, another couple of OCS fonts have become available, and they may be larger. I still have only my original BukyVede. I think that if you have not downloaded an OCS font, then it probably defaults to your regular Cyrillic, which would look too big. You should install BukyVede and Kliment Std. See Appendix talk:Old Cyrillic script#Fonts. —Stephen 07:56, 15 May 2010 (UTC)

New context label and category

Please could someone sufficiently competent please create a regional context template for the Isle of Man (template:Isle of Man) and an associated category. Thanks Thryduulf (talk) 22:02, 11 May 2010 (UTC)

Good?—msh210℠ 22:08, 11 May 2010 (UTC)

Yes, thank you. Thryduulf (talk) 22:14, 11 May 2010 (UTC)

Oops. We already had template:Isle of Mann, so I've now redirected. Same content, I believe, so should still be okay.—msh210℠ 22:23, 11 May 2010 (UTC)

For some reason MHK is shown on the entry page as being a member of Category:Manx English, but it is not appearing in the category? Thryduulf (talk) 23:25, 11 May 2010 (UTC)

It will (has?).—msh210℠ 17:24, 12 May 2010 (UTC)

Yes, it's there now. Thryduulf (talk) 19:39, 12 May 2010 (UTC)

Detecting scripts

Is it possible (in theory) to determine which script template to use by looking at the characters in the word? If not, would a combination of the language name and the first letter be enough? I was just thinking that if we had an extension that did this, a lot of our script template stuff could go away and die, which (I think) would make lots of people slightly happier - though I wasn't volunteering :p. Conrad.Irwin 22:52, 11 May 2010 (UTC)

My initial thought is that it won't be possible in all cases, for example IPA transcriptions contain characters from several character sets, including Latin and Greek. Thryduulf (talk) 23:23, 11 May 2010 (UTC)

I hadn't thought about non-language cases. I suppose the "solution" would be to either allow special values for the language name, or just to ignore cases like IPA where (by virtue of being used with an IPA template) we already know what the script is. Conrad.Irwin 23:29, 11 May 2010 (UTC)

Even the combination of the language ID and all the characters wouldn't always be enough, because there are many cases where Han unification unified a Traditional and a Simplified character that are both attested within what we (correctly) treat as a single language. But even setting aside Han and such as weird edge cases that would still require manual handling: I think we'd still want to use the language ID, because it would be weird if Old English words showed up in a regular font when they contained no special characters and in a specialized font when they did. (Though maybe there's just no way to really handle that case smoothly, anyway.) —Ruakh_TALK 01:38, 12 May 2010 (UTC)

It would work for some and not for others. With Han unification, even specific Sino-Japanese, Sino-Vietnamese and Sino-Korean characters would show as one language. On the other hand, Latin, Greek and Cyrillic were never unified, and letters that look the same (in both upper and lower case) in fact belong to different scripts: Russian Аа, Ее, Оо, Рр, Сс, Хх, the additional Ukrainian and Belarusian Іі, and Serbian and Macedonian Јј are all Cyrillic letters that look exactly like Roman ones.

Arabic based languages (Arabic, Persian, Urdu, etc.) may have different codes for some letters that look the same and some letters have shared codes. Arabic ى is not the same as Persian ی but Arabic و (w) is shared. --Anatoli 04:19, 12 May 2010 (UTC)

Thanks for your replies. It sounds like using the language name is the best first approximation, and a small improvement can be made (for a few languages - e.g. Serbian, Old Church Slavonic) that are written in two very distinct scripts by looking at some of the letters. Maybe less of an improvement over Xyzy et al. than I thought. Conrad.Irwin 10:51, 12 May 2010 (UTC)
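A minimal Python sketch of the "look at the letters" idea follows. It is only a heuristic: Python's standard library does not expose the Unicode Script property, so this takes the first word of each letter's Unicode character name as a stand-in for its script. The function name is hypothetical. As noted above, this cannot distinguish languages that share a script, nor Traditional from Simplified Han.

```python
import unicodedata


def scripts_in(word):
    """Guess scripts from the first word of each letter's Unicode name."""
    found = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found


print(scripts_in("cat"))                 # Latin letters only
print(scripts_in("\u0441\u0430\u0442"))  # Cyrillic letters that resemble Latin ones
print(scripts_in("c\u0430t"))            # mixed: Latin c and t, Cyrillic а
```

The mixed-script case shows one practical use: a word whose letters come from two scripts is often a typo or a look-alike substitution.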

JavaScript does not support surrogates. I don't know if PHP does, but I doubt it. -- Prince Kassad 10:53, 12 May 2010 (UTC)

What's that got to do with anything :p. It's not hard to add support for surrogates to JavaScript if we were to need it, and MediaWiki does everything in UTF-8 (and UTF-8 encoded surrogate pairs are invalid titles: ], invalid HTML entities: &#xd950;&#xdf21; and I couldn't type them in).

I tried something like this in de.wikipedia once, until I found out that JavaScript cannot check for Gothic, Old Persian etc. glyphs because of lack of surrogate support. -- Prince Kassad 11:30, 12 May 2010 (UTC)

That seems surprising. You can extract the unicode code-point from a pair using (p.charCodeAt(0) & 0x3FF) * 0x400 + (p.charCodeAt(1) & 0x3FF) + 0x10000, but it's probably easier to just leave things as pairs and use them like that. Conrad.Irwin 11:55, 12 May 2010 (UTC)
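The arithmetic in the formula above can be checked with a short Python sketch (Python is used here only for illustration; the function names are hypothetical). It combines a UTF-16 high and low surrogate into a code point and splits a code point back into a pair.

```python
def from_surrogates(high, low):
    """Combine UTF-16 high/low surrogate code units into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Same arithmetic as the formula above: 10 bits from each unit, plus 0x10000.
    return (high & 0x3FF) * 0x400 + (low & 0x3FF) + 0x10000


def to_surrogates(code_point):
    """Split a code point above U+FFFF into its (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# U+10330 is the first letter of the Gothic block.
high, low = to_surrogates(0x10330)
print(hex(high), hex(low), hex(from_surrogates(high, low)))
```

Applied to the pair quoted earlier in the thread, 0xD950 and 0xDF21, the same formula gives code point U+64321.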

hieroglyph

Hi, I found a font for hieroglyphs here. I would like to use this font, which seems to be a Unicode font, to replace the weird hieroglyph codes. For example, I find that 𓆈󰧛 is similar to ʽšȝ. However, I am not sure that these are exactly the same hieroglyphs. So, I would like to know where these characters (ʽšȝ) come from. Pamputt 11:36, 12 May 2010 (UTC)

How familiar are you with Ancient Egyptian graphemics, phonology, usual transliteration and phonetic reconstruction? All those transliteration-entries have to be relocated to their unicode variants (cuneiform entries have the similar problem), but unless you're really knowledgeable in what you're doing, you're likely to introduce many errors. --Ivan Štambuk 12:46, 12 May 2010 (UTC)

I am not at all familiar with Ancient Egyptian, so I will not modify any article without knowing what I am doing. For now, I would only like to know what these characters (ʽšȝ) are and where they come from. If I find a correspondence, then someone can modify the article. Pamputt 13:16, 12 May 2010 (UTC)

It's imperative that all the entries be added as they are actually attested, and not as reverse-engineered from transliterations which are often wrong, lossy, obsolete, based on an unknown or ambiguous scheme, and oftentimes no transliterations at all but some kind of phonetic reconstructions (Egyptian language encompasses a few millennia of literature, and during that period many sound changes occurred, and the same sign could denote several different sounds depending on the period). As regards what these symbols mean, I suggest that you start with: Transliteration of Ancient Egyptian. --Ivan Štambuk 15:52, 12 May 2010 (UTC)

I know nothing about Egyptian. Can someone explain to me why we use ASCII pagetitles and <hiero> headwords instead of converting them to Unicode pagetitles and headwords? (There's a one-to-one correspondence, right?) Is that just a remnant of pre-Unicode-5.2 nonexistence of Unicode hieroglyphics, or is there some extant reason?—msh210℠ 17:34, 12 May 2010 (UTC)

My understanding is that most of the Egyptian titles were created before Unicode 5.2. While we could, and should, convert them all to Unicode characters now, the trick is finding someone competent enough to do it. I'm most certainly not. However, we can now start yelling at editors who enter Egyptian in romanized form (in the entry title) and deleting their work for not entering a language in an attested form. For whatever that's worth. -Atelaes λάλει ἐμοί 22:14, 12 May 2010 (UTC)

Hey, guise . I'd like to point out that most of the Egyptian entries are my work and were indeed all created before U5.2. I am excited that Egyptian has finally been Unicoded (in part)-I say "in part" because I have yet to hear how one is to input the Egyptian language apart from using a char palette. Also uncertain is how (if even) words will be properly composed as they were on murals, stelae, cartouches, etc. I am thinking something along the lines of Hangul syllabic composition, albeit without each jamo block (or Egyptian lemma, in this case) having its own codepoint as these would run into the thousands given the superabundance of combinations of graphemes in Egyptian hieroglyphics. This aspect of stacking is of supreme importance in Egyptian as one is hoping to recreate faithfully the exact order of units as they appear in the passage in question. Their very existence (Egyptian hieroglyphics) in Unicode is a great step for students/teachers of ancient languages who work chiefly or entirely online. I hope a facile scheme is soon introduced which allows one to enter Egyptian via a keyboard. At this point it would behoove us to "wait and see". I'm not too crazy about reformatting entries more than once...--Strabismus 00:56, 8 June 2010 (UTC)

Esperanto X-system

Esperanto has a popular alternate orthography system which uses the letter X instead of circumflexes and breves over letters (i.e. cx instead of ĉ). I think it would be somewhat helpful to use this system so that users without Esperanto keyboards can access Esperanto entries more easily. I can think of a number of possible ways of doing this:

Bot-create entries for each alternative spelling with {{alternative spelling of}} in the definition line.
Bot-create redirects to the main entries for each X-system spelling. (fr:wikt has done this.)
Try and get some Javascript thing to redirect users from X-spelling entries to the correct entries.

Any thoughts about the feasibility of doing one of these? (If doing any of these would be a huge effort it probably wouldn't make sense to try, as it's not really that important, but if it would be simple to do...) --Yair rand 01:59, 13 May 2010 (UTC)

4. Add Esperanto letters to the edittools. -- Prince Kassad 08:27, 13 May 2010 (UTC)

On fr.wikt it's just a person who adds the redirects, not a bot. PS we already have an Esperanto section in edittools. Mglovesfun (talk) 08:31, 13 May 2010 (UTC)

Don't add redirects, a specific alternative spelling template would seem to make sense. "Esperanto X-system spelling of", or w/e is the correct terminology. It should be very easy to do by bot. Conrad.Irwin 13:01, 13 May 2010 (UTC)

Keep in mind these are subject to WT:CFI as much as any other entry in this dictionary. -- Prince Kassad 16:44, 13 May 2010 (UTC)

I agree with CI and PK.—msh210℠ 17:05, 13 May 2010 (UTC)
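
A minimal sketch of the X-system mapping discussed in this thread, which a bot could use to derive the X-system title for each Esperanto entry before creating an alternative-spelling page. The function names are hypothetical and nothing here touches the wiki; it only converts strings, using the six standard digraphs (cx, gx, hx, jx, sx, ux).

```python
# Hypothetical helpers: convert between Esperanto diacritics and the X-system.
X_TO_DIACRITIC = {
    "cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ",
}


def to_x_system(title):
    """Return the X-system spelling of a title written with diacritics."""
    out = title
    for digraph, letter in X_TO_DIACRITIC.items():
        out = out.replace(letter, digraph)
        # Capital letters become a capitalised digraph, e.g. Ŝ -> Sx.
        out = out.replace(letter.upper(), digraph.capitalize())
    return out


def from_x_system(word):
    """Return the spelling with diacritics for an X-system word."""
    out = word
    for digraph, letter in X_TO_DIACRITIC.items():
        out = out.replace(digraph, letter)
        out = out.replace(digraph.capitalize(), letter.upper())  # Cx -> Ĉ
        out = out.replace(digraph.upper(), letter.upper())       # CX -> Ĉ
    return out


def x_system_pairs(titles):
    """Yield (x_spelling, main_title) only for titles that actually change."""
    for title in titles:
        x_spelling = to_x_system(title)
        if x_spelling != title:
            yield x_spelling, title
```

Since the Esperanto alphabet has no letter x, the reverse conversion should be unambiguous for native words, though titles containing foreign names with an x would need checking by hand.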

Change in appearance of frequency-rank box

Is there a reason why {{rank}} should have changed its appearance? DCDuring TALK 02:02, 13 May 2010 (UTC)

Not really (seems to have been done by hippietrail six months ago). Unless it's updated soon, we should probably just blank that template (or restore its old, ugly, box and move it to the bottom of the page). Conrad.Irwin 13:00, 13 May 2010 (UTC)

Wiktionary:Votes/2010-04/Renaming requested entry pages 2

Has passed. Looks like a job for the page move script, Conrad.bot? Mglovesfun (talk) 12:34, 13 May 2010 (UTC)

Could do. What is to be done about Wiktionary:Requested entries:Unknown language (Arabic script), presumably Wiktionary:Requested entries (Arabic script) is enough? Conrad.Irwin 12:58, 13 May 2010 (UTC)

Or "...(unknown language, Arabic script)" so people don't put requests there for words they know are, e.g., Arabic?—msh210℠ 17:07, 13 May 2010 (UTC)

String functions

Will the string functions be implemented in Wiktionary any time soon? I believe I've lost track of the related bureaucracy. --Daniel. 01:36, 15 May 2010 (UTC)

No. Never. Someone tried to get it by adding the functionality to Extension:ParserFunctions, but WMF disabled them. They just don't want the level of complexity that would ensue (no doubt it wouldn't be too long until some project had re-implemented wikisyntax "better" using them). Hopefully, we'll eventually get the transliterator extension, but it seems that there is no interested person who is allowed to install it. Conrad.Irwin 10:23, 15 May 2010 (UTC)

Machine- vs user-dependent preferences

Am I correct that "Wiktionary preferences" are machine-dependent, whereas "my preferences" and the monobook.css and .js are user-dependent? Do Wiktionary preferences depend on cookies or some analog thereof? DCDuring TALK 20:51, 15 May 2010 (UTC)

User js/css and gadgets are account-specific, everything else (WT:PREFS, translation adder etc.) works off cookies and is browser specific (machine specific to the majority who always use the same browser). Conrad.Irwin 21:05, 15 May 2010 (UTC)

Thanks. I had complained about something based on my misunderstanding the difference. DCDuring TALK 21:24, 15 May 2010 (UTC)

special:newmessages

What is this?—msh210℠ 17:47, 17 May 2010 (UTC)

Your personal Liquid Threads watchlist. It, confusingly, calls a reply to any thread you've edited a "new message". (It's not a "new message" in the MediaWiki talk-page message sense). Conrad.Irwin 17:52, 17 May 2010 (UTC)

Thanks.—msh210℠ 18:00, 17 May 2010 (UTC)

Empty bullet point

Why is there an empty bullet point when going to a page that does not exist?

See picture: http://i39.tinypic.com/167vz11.jpg Logan _Talk ^{Contributions} 20:32, 17 May 2010 (UTC)

I think it is likely because of the following line in MediaWiki:Noarticletext:

*{{#ifeq:{{NAMESPACE}}|Category|{{didyoumean|{{PAGENAME}}|{{PAGENAME}} language|span=<span id='did-you-mean'>}}}}

To my far from expert eye it looks like the bullet is not empty when there is a "did you mean" but empty when there isn't. To solve this I think the bullet would need moving to the "if true" section of the IF statement, but my knowledge does not extend far enough to actually do this. Thryduulf (talk) 21:36, 17 May 2010 (UTC)

I made a quick test with united Arab Emirates, which naturally showed the text "Did you mean United Arab Emirates?". That text was followed by an empty bullet. --Daniel. 07:07, 18 May 2010 (UTC)

Good now?—msh210℠ 16:09, 18 May 2010 (UTC)

Good indeed. Thanks to you who fixed that problem. --Daniel. 23:55, 18 May 2010 (UTC)

Thanks so much! Logan _Talk ^{Contributions} 15:40, 2 June 2010 (UTC)

"Preview translation" bug

I apparently found a tiny and harmless bug which is a result of using the function "Preview translation" (then "Save Changes") to add a translation to the translation box of an entry.

When there are two or more translations for one language, these translations are naturally separated by commas; each comma is followed by two spaces. Shouldn't it be followed by only one space? --Daniel. 08:41, 19 May 2010 (UTC)

Fixed, thanks. Conrad.Irwin 10:07, 19 May 2010 (UTC)

Watchlist and utilities

The current watchlist shows a section that contains utilities and wanted pages:

Utilities: Logs - New - Cleanup - Verification - Deletion - Requests - Shortcuts - Vandalism - Newbies - Patrol Anons
Wanted: strikee - defendernos - second half of the chessboard - 'roo - propraetorship - medievaldom (+/-)

I would like to hide utilities and wanted pages for myself. This I could do in CSS if the table that shows the utilities had an ID, such as "watchlist-utilities". Can someone please add that ID? The table to which the ID would be added starts as follows:

<table border="0" class="plainlinks">
<tr valign="top">
<td align="right">
<small><b><a href="https://dictious.com/en/Wiktionary:Utilities" title="Wiktionary:Utilities">Utilities</a>:</b></small>
</td>
...
</td></tr></table>

The table would become:

<table border="0" class="plainlinks" id="watchlist-utilities">
...

--Dan Polansky 10:23, 19 May 2010 (UTC)

Done, along with recentchanges-utilites. Conrad.Irwin 10:49, 19 May 2010 (UTC)

Thanks! --Dan Polansky 10:55, 19 May 2010 (UTC)

"by language" Category

Trying to link the Walloon Categoreye:Pronos di tchaeke lingaedje, I noticed that there were two different sets of interwikis on the same subject: in French and in English (see history of the page). If you mix both sets, you only get problems with 2 languages: Russian (Категория:Местоимения по языкам + Категория:Местоимения) and Hungarian (Kategorija:Zamjenice po jezicima + Kategorija:Zamjenice). Now the first category of each pair is the right interwiki in Hungarian and Russian.

This way, you can have a wider interwiki list in Category:Pronouns by language. Please check that they all really link to the right all-languages pronouns category.

Lucyin 11:48, 20 May 2010 (UTC)

On here, we always use by language for lexical categories, but not for topic ones like "trees" and "card games". You'd have to ask the Russian and Hungarian Wiktionaries whether those categories are bad duplicates or have different functions. In short, what exactly should we be checking here? Mglovesfun (talk) 23:00, 20 May 2010 (UTC)

I've added English and Walloon interwikis to pt:Categoria:Pronome as the user who runs the global category interwiki bot is Portuguese. Mglovesfun (talk) 23:04, 20 May 2010 (UTC)

User:Keffy/IPAc

This shouldn't be linked to from main space entries. We either need to rename it, or orphan it. Current trend is to use {{IPA}} which automatically links to Wikipedia. There is another Wiktionary pronunciation key, ergo they should be merged or (better IMO) this should be orphaned, but definitely kept as a valid user subpage. Mglovesfun (talk) 22:57, 20 May 2010 (UTC)

This looks like a very old experiment left to rot. All instances of {{ipac}} should be replaced with {{IPA}}. -Atelaes λάλει ἐμοί 23:00, 20 May 2010 (UTC)

Ought not to be difficult. Mglovesfun (talk) 23:05, 20 May 2010 (UTC)

Whenever I've come across an entry linking to this in my cleanup of pronunciation sections I've changed it to {{IPA}} with an {{a|CA}} label. If my drunken count is accurate then there are currently only 28 entries that link to it, which should be pretty easy to orphan. I might even have time tomorrow. Thryduulf (talk) 23:49, 20 May 2010 (UTC)

I've changed {{ipacregion}} to {{a|Canada}}, I'm assuming the template recognizes these as the same? And yes, remove it. It totally clashes with the new-style {{audio}}, making it not only useless, but harmful. Mglovesfun (talk) 15:02, 21 May 2010 (UTC)

A bit later than I planned, but I've now orphaned {{ipac}} and am about to nominate it for deletion. Thryduulf (talk) 13:03, 16 June 2010 (UTC)

Targeted Translations

So, who's up for some new javascript? It's....*sniff*......some really good stuff. See User:Atelaes/Customization/Translations. Feedback is very welcome. -Atelaes λάλει ἐμοί 05:14, 23 May 2010 (UTC)

This is BRILLIANT! Wiktionary's usability issues with being used as a translating dictionary have just vanished. Any chance of this being made usable as a gadget? --Yair rand (talk) 05:20, 23 May 2010 (UTC)

Well, these things take time. There are almost certainly some bugs to work out, and some features to be added, such as the option for more than one language, and we'll need to take this to the BP, to get wider feedbacks, critiques, etc. But, longterm, yes, that is a possibility. Thanks for the feedback. -Atelaes λάλει ἐμοί 05:23, 23 May 2010 (UTC)

Cool. A couple things: It doesn't work in water, tall fonts make a scrollbar in the trans-top head, the "There is no X translation!" could probably lose the exclamation point, this probably shouldn't add stuff to the TTBC, and it jams up the translation gloss editor. --Yair rand (talk) 05:32, 23 May 2010 (UTC)

(Actually, it looks like it only messes up the gloss editor when there isn't a translation from the language available. Jams up the translations adder in those situations too.) --Yair rand (talk) 05:38, 23 May 2010 (UTC)

Ok. Could I possibly get a link on an example for the scrollbar issue? I'll turn on Conrad's bit tomorrow, confer with him, and see what I can do. -Atelaes λάλει ἐμοί 05:59, 23 May 2010 (UTC)

I was looking at parrot with Hebrew selected. Doesn't look like the problem occurs on most pages though... --Yair rand (talk) 06:30, 23 May 2010 (UTC)

I took the liberty of fixing that (though you could get away with just not displaying anything when you have nothing to display). When you edit the gloss, the preview will show without your language's translation - I think that is desirable behaviour, (and it's hard to fix :p). Conrad.Irwin 10:50, 23 May 2010 (UTC)

That page claims to have an input field to type in, but does not (in my browser (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.19) Gecko/2010040116 Ubuntu/9.04 (jaunty) Firefox/3.0.19)).—msh210℠ 17:51, 24 May 2010 (UTC)

You have to get the JS for the field to appear, either by manually importing or by checking the box in WT:PREFS. You'll subsequently have to clear your cache in order to activate the JS. -Atelaes λάλει ἐμοί 22:15, 24 May 2010 (UTC)

Keep. --Rising Sun talk? contributions 11:18, 23 May 2010 (UTC)

Nice. It doesn't seem to work when there's a {{trreq}} (see house). --Bequw → τ 05:57, 26 May 2010 (UTC)

Fixed. Thanks for noting that issue. -Atelaes λάλει ἐμοί 13:54, 28 May 2010 (UTC)

If this becomes standard-issue, then can the choice be made on any page instead of on a dedicated page? Perhaps as follows: If the person has no cookie set (I assume this works on cookies?), then each translations table should have in its trans-top bar an input box with grey text "choose a default language and press Enter" or something, which when submitted will set the cookie and change the trans-top bars on that page to display the appropriate content. I have no idea whether this is feasible, but if it is then I think it would afford many many more users the possibility of using this feature than would be the case if the choice must be made on a separate page.—msh210℠ 16:05, 28 May 2010 (UTC)

Hmmm....that'd be trickier, but I suspect Conrad's little API could make it work, though I worry if it would make the navbars a bit crowded. However, I think you're right that we absolutely do need to find ways to get the average user in on stuff like this, which is almost certainly not the case now. -Atelaes λάλει ἐμοί 21:53, 28 May 2010 (UTC)

Request templates

While continuing to update request templates, I found this. Some of our request templates are in the form of boxes, and some are single line templates that fit neatly. I think the boxes should go right under headers or else they just sit between two passages of text, like we're seeing here. Mglovesfun (talk) 22:34, 25 May 2010 (UTC)

See Category:Request templates. Furthermore I thought that {{rfex}} redirected to {{rfexample}}. They don't as it happens, one displays a line of text, and one doesn't. Couldn't we redirect in this case? I suppose not easily, no. Mglovesfun (talk) 10:39, 26 May 2010 (UTC)

Missing Babel templates

Could someone with the knowledge create templates for si-0, bn-0, he-0, fa-0, ur-0, id-0, and/or others, please? They can be in English, IMHO, until they get translated (I think I've seen a translation request on a template, not sure how this is done). --Anatoli 02:48, 28 May 2010 (UTC)

I thought the only -0 template we used was en-0, this being the English Wiktionary. Otherwise you could have thousands of -0s (e.g. I speak no Malay, Ukrainian, Navajo, Esperanto, Afrikaans, Hebrew, Fang, South Bolivian Quechua, Ngangikurrunggurr, ...) making it difficult to see what languages you do know. Thryduulf (talk) 10:15, 28 May 2010 (UTC)

My reason is - if I contribute in a language (translations), which I almost don't know but I'm certain about some words, I want other users to know that they can't expect more from me in these languages. --Anatoli 12:50, 28 May 2010 (UTC)

I'd just not mention them in your babel templates. I know a smattering of vocabulary in more languages than I can speak/write to at least level 1, and I expect that is very common. Thryduulf (talk) 16:07, 28 May 2010 (UTC)

Template:frp

A user changed the language name to Arpitan, breaking a number of pages and categories on the way. He/she is right, it's a valid name, but so is Franco-Provençal. I'd favor the rename as people confuse Provençal and Franco-Provençal, which aren't the same language. But in doing so, Category:Franco-Provençal derivations will need to be deleted, and I imagine some translation tables, maybe a hundred or more, will need to be updated. Mglovesfun (talk) 10:28, 28 May 2010 (UTC)

楷书

Can someone skillful please add a template for "Chinese calligraphy", which would go under the main category of "Calligraphy" for Mandarin? Cheers. ---> Tooironic 02:04, 29 May 2010 (UTC)

Redirects for macrons

Such as scēap. Do we want these? Or is it simply because a lot of etymologies link to the forms with macrons in Old English or Old High German? The search got upgraded so that you can find forms with or without diacritics - if you type sceap into the search box, it finds sceap and scēap. And at least two languages - Maori and Nahuatl - use macrons in the page titles, so those can't redirect. If someone wants to create a list I don't mind going through it. IMO you'd need to include all of the following characters: ĀāĒēĪīŌōŪūȲȳǢǣ. I don't mind orphaning them and deleting them, although it won't be immediate. Mglovesfun (talk) 17:29, 29 May 2010 (UTC)

Maybe not deleted as other Wiktionaries would allow cwēn as a page title, and this would allow for extra interwikis. Mglovesfun (talk) 17:55, 29 May 2010 (UTC)

If you're right that other languages use macrons in pagetitles and the search will find the right page anyway, then I think that we should not have these as redirects.—msh210℠ 15:34, 1 June 2010 (UTC)

Some languages use macrons and some do not. Macrons in OE are a relatively recent editorial convention, and should not be part of page titles. It would thus be preferable to correct any links in etymology for OE and OHG, rather than to have the redirects. As you've already noted Māori uses macrons, so we can't simply redirect in some cases anyway. I rather think we should therefore not redirect, since it would be very inconsistent if we did. --EncycloPetey 01:31, 4 June 2010 (UTC)

Agreed. All links to Old English, Latin, and other languages which, by convention, do not use macrons in page titles, should be fixed, and then the redirects deleted. -Atelaes λάλει ἐμοί 01:41, 4 June 2010 (UTC)
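
A rough sketch of how the list of macron titles requested earlier in this thread might be produced from a dump of page titles. It assumes the titles are already available as plain strings; the function names are hypothetical. It relies on Unicode canonical decomposition, under which every character in the list ĀāĒēĪīŌōŪūȲȳǢǣ splits into a base letter plus U+0304 COMBINING MACRON.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def strip_macrons(title):
    """Remove macrons only; other diacritics (acute, grave, ...) are kept."""
    decomposed = unicodedata.normalize("NFD", title)
    without_macrons = decomposed.replace(MACRON, "")
    # Recompose so that remaining accented letters are precomposed again.
    return unicodedata.normalize("NFC", without_macrons)


def macron_titles(titles):
    """Return (title, title_without_macrons) for every title with a macron."""
    pairs = []
    for title in titles:
        plain = strip_macrons(title)
        if plain != title:
            pairs.append((title, plain))
    return pairs
```

The output would still need filtering by language by hand, since titles in languages such as Māori legitimately keep their macrons.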

Redirects for Arabic words with diacritics

(inspired by Redirects for macrons)

I wish Arabic script based entries (and Hebrew) worked the same. Words with or without diacritics are not mutually searchable. توتر is not searchable with تَوَتُّر although many words could be spelled with different diacritics, it would be good to have this functionality. --Anatoli 01:38, 3 June 2010 (UTC)

A Hebrew example: עלייה (without diacritics) and עֲלִייָה (with diacritics). --Anatoli 01:40, 3 June 2010 (UTC)

There have been people who have entered such. When that occurs (and I'm the one to catch it), I always keep the redirect, and I do think that there is some value in creating such redirects. (It can even be automated: look for a vowelized form in a Hebrew inflection line, and create the redirect if the spelling sans vowels is the same.) There is conceivably a problem with some Hebrew words, as the appropriate pagename in another language may be with diacritics or whatever you call them, but in practice I don't know of any language for which that will be a problem for Hebrew characters. (Specifically, the only language that I know uses such marks with Hebrew characters is Yiddish, and I doubt there are any Hebrew words that look the same with vowels as Yiddish words.) That said, I can't speak for Ladino or other languages, certainly not Arabic-script ones. (On another note, I once suggested that Hebrew entries use vowels in their pagetitles, with the form without vowels soft-redirecting. (Or at least for non-excessive-spelling forms.) But this was rejected by other editors.)—msh210℠ 15:05, 3 June 2010 (UTC)
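
The automation suggested above (take the vowelized form from a Hebrew inflection line and create the redirect only if the spelling without vowels matches the page name) could be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, extracting the vowelized form from the entry is left out, and "vowels" is approximated as every Unicode nonspacing mark (category Mn), which covers the Hebrew points.

```python
import unicodedata


def strip_marks(text):
    """Drop every nonspacing mark (category Mn), e.g. Hebrew vowel points."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def redirect_for(pagename, vowelized):
    """Return (redirect_title, target) if a redirect is warranted, else None.

    A redirect is proposed only when the vowelized form differs from the
    page name and reduces to exactly the page name once marks are removed.
    """
    if vowelized != pagename and strip_marks(vowelized) == pagename:
        return (vowelized, pagename)
    return None
```

For the pair given earlier in this thread, the vowelized form reduces to the unvowelized page name, so the sketch would propose that redirect; a vowelized form whose consonantal spelling differs from the page name would be skipped.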

You can always create redirects but if there is only one entry with the same basic letters like in the above example, it's a shame the redirect and the quick search don't work automatically. I think it may not be hard, since when you're adding a translation, then the actual link added is without diacritics but the alt= portion of {{t}} keeps the vowelisation. It works very similarly in both Hebrew and Arabic. In addition, it would be good if alif was searchable with or without hamza and other optional tashkil e.g. ا, أ, إ, آ and ٱ - at the moment, they are treated as different symbols (they are, of course) and there may be entries and translations with different levels of strictness or a word may be in Persian (except for some letters, the symbols are the same and the word in Persian, Urdu, etc. may also be a word in Arabic) where a strict Arabic spelling would require an alif. Sorry, if I am confusing. In short, اصل is an entry and أصل is a manual redirect. If the functionality is added, there would be no need for أصل or if the main entry was أصل, then using اصل would still find it. --Anatoli 01:58, 4 June 2010 (UTC)
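
To make the request above concrete, here is a small sketch of the kind of folding a search could apply to Arabic-script input: remove optional marks and treat the alif variants as plain alif. It is only an illustration under stated assumptions, not how the site search works; the name search_key is hypothetical, and "optional tashkil" is approximated as all nonspacing marks (category Mn).

```python
import unicodedata

PLAIN_ALIF = "\u0627"
ALIF_VARIANTS = {
    "\u0623": PLAIN_ALIF,  # ALEF WITH HAMZA ABOVE
    "\u0625": PLAIN_ALIF,  # ALEF WITH HAMZA BELOW
    "\u0622": PLAIN_ALIF,  # ALEF WITH MADDA ABOVE
    "\u0671": PLAIN_ALIF,  # ALEF WASLA
}


def search_key(word):
    """Fold a word to a key that ignores tashkil and alif variants."""
    # NFC first, so a decomposed alif + hamza becomes one precomposed letter.
    composed = unicodedata.normalize("NFC", word)
    folded = "".join(ALIF_VARIANTS.get(ch, ch) for ch in composed)
    # Then drop the remaining nonspacing marks (fatha, damma, shadda, ...).
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Mn")
```

Two spellings would then be treated as the same search target whenever their keys are equal, which is the behaviour asked for with the two alif spellings mentioned above.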

Previous-next links

Not sure what happened, but the non-English vowels (e.g. á, é, í, ó, ú etc.) are suddenly replaced by a blank box (IE8) or a white question mark in a blue diamond (FireFox). When I click on these words, it can't find them, I get an error. Would anyone know what may cause this? --Panda10 23:29, 31 May 2010 (UTC)

The question mark in a diamond makes it sound like a character encoding issue to me. Are you viewing in UTF-8? ISO-8859-9? other? Thryduulf (talk) 16:28, 1 June 2010 (UTC)

It's UTF-8 in FireFox. --Panda10 21:00, 1 June 2010 (UTC)

I have the same problem (try to display débarbouiller). When you try to click on them, the message is that the title is not supported. I also use Firefox. Note that, with my Internet explorer, these words don't show at all. Lmaltier 06:27, 5 June 2010 (UTC)

I don't completely understand the code, but I suspect that something's gone wrong with the information coming from toolserver. I've posted a note to hippietrail, who wrote the script. Hopefully he'll have a solution. -Atelaes λάλει ἐμοί 10:54, 5 June 2010 (UTC)
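
For illustration only, one way the "question mark in a diamond" (U+FFFD, the Unicode replacement character) can arise: text encoded in a single-byte charset such as ISO-8859-1 is decoded as UTF-8. Whether this is what actually happened with the previous-next links is an assumption, and the function name is hypothetical.

```python
REPLACEMENT = "\ufffd"  # shown by browsers as a question mark in a diamond


def decode_latin1_as_utf8(text):
    """Encode text as ISO-8859-1, then (wrongly) decode the bytes as UTF-8."""
    raw = text.encode("iso-8859-1")
    # Bytes that are not valid UTF-8 are replaced instead of raising an error.
    return raw.decode("utf-8", errors="replace")


def is_damaged(title):
    """True if a title contains the replacement character."""
    return REPLACEMENT in title
```

A title damaged this way no longer matches any real page name, which would be consistent with the "title is not supported" error reported above when the links are clicked.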

Wiktionary:Grease pit/2010/May