Thank you for the Wiktionary welcome, and for your inquiry about the Babel feature. My response is that I'm not comfortable adding the template yet: the codes and levels don't provide the right classification for my account (e.g., level 5+ for en). My second item is a request that you please notify your Wiktionary co-administrators not to randomly delete entries (etiquette rule #4), since they are wasting volunteer time and energy, and causing the loss of relevant information over extremely petty, unexplained issues on their part. Blurbzone 12:14, 3 January 2012 (UTC)
{{Babel|en}}
: that will do, provided you are a native English speaker.
Hi, would you accept a nomination at WT:RFFF? Mglovesfun (talk) 11:53, 21 January 2012 (UTC)
Some stats about the replacement batch made using WT:AWB:
--Dan Polansky 16:32, 21 January 2012 (UTC)
I have made a follow up, by considering the forgotten search for "< {{proto" in AWB.
( < (from False False False False False False True
(< (from False False False False False False True
< , from False False False False False False True
^< From False True True False False False True
--Dan Polansky 18:55, 22 January 2012 (UTC)
As a hopefully last follow-up, I have searched all the pages containing " < ", without any checking for its context.
--Dan Polansky 22:53, 22 January 2012 (UTC)
I have fixed double "from" occurrences, as "from from"; #edits: 46.
Regexp table used for this:
from from from True False False False False False
From from From True False False False False False
From From From True False False False False False
from From from True False False False False False
--Dan Polansky 11:54, 24 January 2012 (UTC)
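For readers without AWB, the same cleanup can be sketched in Python. This is a minimal sketch and not the batch that was actually run: the helper name `collapse_double_from` is hypothetical, and unlike the four plain-text rules above it uses a single regexp with word boundaries, keeping the capitalization of the first "from".

```python
import re

# One pattern for the four case combinations listed in the table above.
# The first word is captured so that its capitalization is preserved.
DOUBLE_FROM = re.compile(r"\b([Ff]rom) [Ff]rom\b")


def collapse_double_from(text: str) -> str:
    """Replace a doubled "from" with a single one (hypothetical helper)."""
    return DOUBLE_FROM.sub(r"\1", text)


print(collapse_double_from("From from Latin, from From Greek"))
```

As with the AWB batch, the result should still be reviewed by hand, since a doubled "from" can occasionally be intentional.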
No, no chance I would voluntarily add Babel to my userpage. The underlying theory of the Babel presentation is wrong. The templates have little actual usage beyond vanity. And I subscribe to the original purpose for user namespace on Wikimedia projects (circa 2004, so mostly irrelevant at the current time): User pages are for the over-all improvement of the wiki. - Amgine/ t·e 15:34, 22 January 2012 (UTC)
In 2010 and 2011 I made poorly thought-out, immature and harmful edits to Wikisaurus, including the history of WS:person, the use and documentation of certain templates and a flood of WS pages for proper nouns. You had to undo them and/or discuss them with me, while I defended my edits stubbornly and disregarded your experience in contributing to the project and your knowledge of semantic relations.
I am sorry. --Daniel 17:53, 24 January 2012 (UTC)
Hi Dan. Could you create Czech entries for kroužek (the English already exists) and the diminutive suffix -ek (that page has Breton, Hungarian, Kurdish, and Serbo-Croatian entries) please? Also, could you tell me what diminutive suffix was involved in the derivation of čárka from čára please? I ask you because you are the member of Category:User cs-N with whom I am best acquainted. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 23:19, 5 February 2012 (UTC)
I have run a batch of edits that standardized linking to Wikisaurus from the mainspace, using AWB. This required a lot of manual editing and supervision.
Worklist identification: search for "[[WikiSaurus" without regexp in a case-sensitive manner.
The number of edits: 175.
Replacements used in AWB:
\* *\[\[WikiSaurus * See also [[WikiSaurus True True False False False False
^''See'' \[\[WikiSaurus * See also [[Wikisaurus True True True False False False
''ee'' \[\[WikiSaurus See also [[Wikisaurus True True False False False False
[[WikiSaurus [[Wikisaurus True False False False False False
--Dan Polansky 10:36, 6 February 2012 (UTC)
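The effect of these replacements can be sketched outside AWB as well. The following Python sketch covers only the first, second and fourth rules above (the third appears incomplete as archived), and the function name `standardize_wikisaurus_links` is made up for illustration; it is not a substitute for the manual supervision mentioned above.

```python
import re


def standardize_wikisaurus_links(wikitext: str) -> str:
    """Apply three of the replacements listed above, in the same order."""
    # "* [[WikiSaurus" becomes "* See also [[WikiSaurus".
    text = re.sub(r"\* *\[\[WikiSaurus", "* See also [[WikiSaurus", wikitext)
    # A line starting with "''See'' [[WikiSaurus" becomes a standard bullet.
    text = re.sub(
        r"^''See'' \[\[WikiSaurus",
        "* See also [[Wikisaurus",
        text,
        flags=re.MULTILINE,
    )
    # Finally, normalize the capitalization of the namespace itself.
    return text.replace("[[WikiSaurus", "[[Wikisaurus")


print(standardize_wikisaurus_links("* [[WikiSaurus:fool]]"))
```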
Then please make clear which of the sources cited claims that the etymology is unknown. "Etymology unknown" is a claim as much as any other, and it must be clear whose claim this is. Otherwise you are just replacing one unsourced claim by another. --Dbachmann 10:33, 7 February 2012 (UTC)
A personal log: I have done a partial etymology cleanup using AWB, above all by putting commas before "from" in etymology chains. Beyond that, I added the use of the term template to some easily matched parts that were not yet formatted using the term template; I did it in an incomplete manner, mainly as a preparation for adding commas before "from". The task of adding the "term" template is not perfectly suited to AWB; ideally, one would loop over a comprehensive list of languages and, for each language, perform a couple of regexp replacements.
''\*)#Middle English\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=enm}} False True False False False False
''\*)#Old English\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=ang}} False True False False False False
''\*)#Anglo-Norman\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=xno}} False True False False False False
''\*)#Italian\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=it}} False True False False False False
''\*)#Old French\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=fro}} False True False False False False
''\*)#Old Norse\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=non}} False True False False False False
''\*)#Middle French\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=frm}} False True False False False False
''\*)#French\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=fr}} False True False False False False
''\*)#Latin\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=la}} False True False False False False
''\*)#Old High German\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=goh}} False True False False False False
''\*)#Middle High German\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=gmh}} False True False False False False
''\*)#Middle Low German\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=gml}} False True False False False False
''\*)#German\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=de}} False True False False False False
''\*)#Old Frisian\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=ofs}} False True False False False False
''\*)#East Frisian\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=frs}} False True False False False False
''\*)#Middle Dutch\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=dum}} False True False False False False
''\*)#Dutch\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=nl}} False True False False False False
''\*)#Old Saxon\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=osx}} False True False False False False
''\*)#Spanish\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=es}} False True False False False False
''\*)#Gothic\|(*)\]\]''\s*(*) {{term|$1|$2|$3|lang=got}} False True False False False False
''\*)#Middle English\|(*)\]\]'' {{term|$1|$2|lang=enm}} False True False False False False
''\*)#Old English\|(*)\]\]'' {{term|$1|$2|lang=ang}} False True False False False False
''\*)#Anglo-Norman\|(*)\]\]'' {{term|$1|$2|lang=xno}} False True False False False False
''\*)#Italian\|(*)\]\]'' {{term|$1|$2|lang=it}} False True False False False False
''\*)#Old French\|(*)\]\]'' {{term|$1|$2|lang=fro}} False True False False False False
''\*)#Old Norse\|(*)\]\]'' {{term|$1|$2|lang=non}} False True False False False False
''\*)#Middle French\|(*)\]\]'' {{term|$1|$2|lang=frm}} False True False False False False
''\*)#French\|(*)\]\]'' {{term|$1|$2|lang=fr}} False True False False False False
''\*)#Latin\|(*)\]\]'' {{term|$1|$2|lang=la}} False True False False False False
''\*)#Old High German\|(*)\]\]'' {{term|$1|$2|lang=goh}} False True False False False False
''\*)#Middle High German\|(*)\]\]'' {{term|$1|$2|lang=gmh}} False True False False False False
''\*)#German\|(*)\]\]'' {{term|$1|$2|lang=de}} False True False False False False
''\*)#Old Frisian\|(*)\]\]'' {{term|$1|$2|lang=ofs}} False True False False False False
''\*)#Old Saxon\|(*)\]\]'' {{term|$1|$2|lang=osx}} False True False False False False
''\*)#Spanish\|(*)\]\]'' {{term|$1|$2|lang=es}} False True False False False False
''\*)#Gothic\|(*)\]\]'' {{term|$1|$2|lang=got}} False True False False False False
{{etyl\|la}}\s*''\*)\]\]''\s*"(*)" {{etyl|la}} {{term|$1||$2|lang=la}} False True False False False False
{{etyl\|la}}\s*''\*)\]\]'' {{etyl|la}} {{term|$1|lang=la}} False True False False False False
{{etyl\|enm}}\s*''\*)\]\]''\s*"(*)" {{etyl|enm}} {{term|$1||$2|lang=enm}} False True False False False False
{{etyl\|enm}}\s*''\*)\]\]'' {{etyl|enm}} {{term|$1|lang=enm}} False True False False False False
{{etyl\|fro}}\s*''\*)\]\]''\s*"(*)" {{etyl|fro}} {{term|$1||$2|lang=fro}} False True False False False False
{{etyl\|fro}}\s*''\*)\]\]'' {{etyl|fro}} {{term|$1|lang=fro}} False True False False False False
{{term\|(*)\|\1\| {{term|$1|| True True False False False False
}} from {{(term|etyl|proto|prefix) }}, from {{$1 True True False False False False
^Via {{(term|etyl|proto|prefix) From {{$1 True True True False False False
--Dan Polansky 14:46, 12 February 2012 (UTC)
Reliability: The regexp replacements above are imperfect. Their use requires close monitoring in AWB. I have caught several bad edits, but I may have overlooked other ones. An example of a bad edit, caught by Ruakh: http://en.wiktionary.orghttps://dictious.com/en/בן־דוד?diff=16253121. --Dan Polansky 15:00, 12 February 2012 (UTC)
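The comma-before-"from" rule from the table above is simple enough to sketch in Python. This is only an illustration under stated assumptions: the function name `add_commas_before_from` is hypothetical, the sample words x, y and z are placeholders rather than real etymologies, and returning the replacement count is my addition to support the kind of monitoring described above.

```python
import re

# Mirrors the rule "}} from {{(term|etyl|proto|prefix)" -> "}}, from {{$1".
CHAIN_LINK = re.compile(r"\}\} from \{\{(term|etyl|proto|prefix)")


def add_commas_before_from(etymology: str):
    """Return the edited text and the number of replacements made."""
    return CHAIN_LINK.subn(r"}}, from {{\1", etymology)


sample = (
    "From {{etyl|enm}} {{term|x|lang=enm}} from {{etyl|fro}} "
    "{{term|y|lang=fro}} from {{etyl|la}} {{term|z|lang=la}}"
)
edited, count = add_commas_before_from(sample)
print(count)
print(edited)
```

Running the function a second time on its own output changes nothing, because the pattern requires "}}" to be followed directly by " from".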
How do you interpret this vote? Is it implementable in any meaningful way? Mglovesfun (talk) 13:55, 23 February 2012 (UTC)
I have added Czech translations donated by Zdeněk Brož via ZBroz (talk • contribs). See the user page and the user talk page for details. --Dan Polansky (talk) 23:17, 26 February 2012 (UTC)
Thank you for checking on the numbers Wikisaurus entry. I checked each and every number that I put up. All of them are from Wikipedia, and I checked all of them, except, I think the twins/triplets series. I cannot understand why you deleted that work. Please restore. BenjaminBarrett12 (talk) 16:05, 29 February 2012 (UTC)
I have restricted adjectival translations to their lemma forms, removing feminine, neuter, and sometimes plural forms listed alongside the masculine form in the translation table. Done using AWB. Minor supervision was necessary, as some Icelandic terms were found outside of translation tables. These could have probably been reduced to masculine form too, but I have skipped them to be on the safe side.
\] {{m}}, \] {{f}}, \] {{n}}, \] {{p}} ] False True True False False False True
\] {{m}}, \] {{f}}, \] {{n}} ] False True True False False False True
--Dan Polansky (talk) 20:07, 29 February 2012 (UTC)
I have performed another batch, this time for translations using the t template rather than the plain [[ ]] markup. Supervision is needed; there were some false positives in noun entries, such as in "gangue" or "pupil". In fact, it is surprising that the lame heuristic used in the regexp table below works so well for a range of languages. The batch sometimes left forms for m|p, f|p and n|p behind, in those few entries where they were used.
{{(t-?\+?)\|(..)\|(...)(*?)\|m(\|?.*?)}}, {{t-?\+?\|\2\|\3(*?)\|f\|?.*?}}, {{t-?\+?\|\2\|\3(*?)\|n\|?.*?}}, {{t-?\+?\|\2\|\3(*?)\|p\|?.*?}} {{$1|$2|$3$4$5}}
{{(t-?\+?)\|(..)\|(...)(*?)\|m(\|?.*?)}}, {{t-?\+?\|\2\|\3(*?)\|f\|?.*?}}, {{t-?\+?\|\2\|\3(*?)\|n\|?.*?}} {{$1|$2|$3$4$5}}
Above, I have been working with languages that have the three genders of m, f, and n. What remains to be done is the same for languages with the two genders of m and f, such as--probably--Italian, French and Spanish. Using the same technique is likely to generate a significant number of false positives. The matching on all three genders almost always selects adjectival translations; a similar matching for only masculine and feminine would probably select many nouns, such as analogues of English "actor" and "actress" in these languages. To fix this, one would have to make sure that the translation being matched is within an adjective section, which is not obviously possible using AWB regexp replacements. --Dan Polansky (talk) 10:44, 1 March 2012 (UTC)
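Outside AWB, the adjective-section check can be approximated with a short script. The following Python sketch is an assumption-laden illustration, not something that was run on Wiktionary: the pattern `MF_PAIR` is a simplified stand-in for the AWB regexps, the set `POS_HEADINGS` is a made-up short list, and the function names are hypothetical. It reduces a masculine/feminine pair of t templates only when the enclosing part-of-speech heading is "Adjective".

```python
import re

# Simplified stand-in for the AWB regexps: a masculine {{t}} translation
# immediately followed by a feminine one in the same language.
MF_PAIR = re.compile(
    r"\{\{(t[-+]?)\|([a-z]{2,3})\|([^|{}]+)\|m\}\}, "
    r"\{\{t[-+]?\|\2\|[^|{}]+\|f\}\}"
)
# Any wiki heading of level 2 or deeper, e.g. ==English== or ===Adjective===.
HEADING = re.compile(r"^(={2,})[ \t]*([^=\n]+?)[ \t]*\1[ \t]*$", re.MULTILINE)
# Assumed, incomplete list of part-of-speech headings.
POS_HEADINGS = {"Adjective", "Adverb", "Noun", "Proper noun", "Verb"}


def _reduce(chunk: str, in_adjective: bool) -> str:
    """Keep only the masculine form, without gender, inside adjective sections."""
    return MF_PAIR.sub(r"{{\1|\2|\3}}", chunk) if in_adjective else chunk


def reduce_in_adjective_sections(wikitext: str) -> str:
    pieces, pos, current_pos = [], 0, None
    for heading in HEADING.finditer(wikitext):
        pieces.append(_reduce(wikitext[pos:heading.start()], current_pos == "Adjective"))
        pieces.append(heading.group(0))
        pos = heading.end()
        level, title = len(heading.group(1)), heading.group(2)
        if title in POS_HEADINGS:
            current_pos = title      # subsections such as Translations inherit this
        elif level == 2:
            current_pos = None       # a new language section resets the state
    pieces.append(_reduce(wikitext[pos:], current_pos == "Adjective"))
    return "".join(pieces)


entry = (
    "==English==\n===Noun===\n# x\n====Translations====\n"
    "* Spanish: {{t|es|actor|m}}, {{t|es|actriz|f}}\n"
    "===Adjective===\n# y\n====Translations====\n"
    "* Spanish: {{t|es|bueno|m}}, {{t|es|buena|f}}\n"
)
print(reduce_in_adjective_sections(entry))
```

In the sample entry, the noun pair "actor"/"actriz" is left alone while the adjective pair is reduced to the masculine lemma. Real entries have more heading variety (for example numbered etymology sections), so supervision would still be needed.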
A further batch, heavily supervised with several skips and manual corrections, for the likes of {{t|...|m|f}}, {{t|...|n}}:
{{(t-?\+?)\|(..)\|(...)(*?)\|m\|f(\|?.*?)}}, {{t-?\+?\|\2\|\3(*?)\|n\|?.*?}}, {{t-?\+?\|\2\|\3(*?)\|p\|?.*?}} {{$1|$2|$3$4$5}} False True True False False False True
{{(t-?\+?)\|(..)\|(...)(*?)\|m\|f(\|?.*?)}}, {{t-?\+?\|\2\|\3(*?)\|n\|?.*?}} {{$1|$2|$3$4$5}} False True True False False False True
--Dan Polansky (talk) 11:51, 1 March 2012 (UTC)
Responding to a query by Vahag several sections above: The search in AWB for "> {{(term|etyl|proto|prefix)" finds 387 pages. Unfortunately, in these entries, ">" means either "from" or "whence", depending on whether it goes in the right direction. For ">" meaning "from", an example page is "drachma"; for ">" meaning "whence", an example page is "saksalaistaa".
I have performed a batch of edits in AWB nonetheless.
> {{(term|etyl|proto|prefix) , from {{$1 False True
--Dan Polansky (talk) 13:59, 1 March 2012 (UTC)
Thank you for all your help so far. I see you are involved in etymology, and I have a related question I hope you can help me with. The word "pasta" used to be "paste" in English when referring to the Italian food. In the pasta entry, I put up the earliest quotation I could find of "pasta" and now I would like to put up some earlier instances of "paste" (where the word clearly means "pasta"). Is it acceptable to also add an obsolete entry to the paste entry? And can I put both "paste" and "pasta" in the etymology section on the pasta entry? I hope my questions make sense. BenjaminBarrett12 (talk) 22:17, 1 March 2012 (UTC)
Some Wikisaurus statistics for January 2012 follow, based on stats.grok.se, such as "http://stats.grok.se/en.d/201201/Wikisaurus:fatty_acid".
The number of Wikisaurus pages: 1209 (inaccurate, actually a bit more; the list was extracted from Wiktionary:All_Wikisaurus_pages)
The total number of page hits in Wikisaurus in January 2012: 85,983
The total number of page hits in Wikisaurus in January 2012, without top 100 pages: 23,513
Median page hits per Wikisaurus page in January 2012: 17
Average page hits per Wikisaurus page in January 2012: 71
Top 100 Wikisaurus pages in January 2012, with page hits in the month:
WS:sexual intercourse | 5448 |
WS:masturbate | 5383 |
WS:vagina | 5004 |
WS:vulva | 4793 |
WS:breasts | 4191 |
WS:erection | 3234 |
WS:semen | 2959 |
WS:penis | 2523 |
WS:promiscuous woman | 2440 |
WS:nude | 2402 |
WS:prostitute | 2027 |
WS:promiscuous man | 1769 |
WS:mistress | 1564 |
WS:sexual partner | 1289 |
WS:homosexual | 1218 |
WS:bisexual | 1068 |
WS:pregnant | 891 |
WS:labia | 869 |
WS:testicles | 745 |
WS:heterosexual | 739 |
WS:money | 638 |
WS:beautiful woman | 582 |
WS:nonsense | 441 |
WS:penis/translations | 365 |
WS:buttocks | 337 |
WS:fool | 299 |
WS:clitoris | 296 |
WS:marijuana | 265 |
WS:arrogant | 264 |
WS:bathroom | 258 |
WS:drunk | 258 |
WS:oral sex | 226 |
WS:sexual activity | 209 |
WS:libertine | 203 |
WS:beer | 193 |
WS:ear | 192 |
WS:anus | 191 |
WS:obstinate | 187 |
WS:female genitalia | 180 |
WS:fastidious | 177 |
WS:pubic hair | 169 |
WS:cheeky | 167 |
WS:destroy | 158 |
WS:male homosexual | 148 |
WS:sexy | 147 |
WS:excellent | 143 |
WS:marijuana cigarette | 140 |
WS:copulate | 134 |
WS:ejaculate | 132 |
WS:naive | 131 |
WS:insane | 129 |
WS:woman | 127 |
WS:girl | 126 |
WS:masturbation | 126 |
WS:wow | 122 |
WS:idiot | 121 |
WS:thingy | 118 |
WS:calm | 112 |
WS:die | 109 |
WS:witty | 109 |
WS:characteristic | 108 |
WS:villain | 108 |
WS:anal sex | 102 |
WS:abandon | 101 |
WS:strange | 101 |
WS:sycophant | 101 |
WS:beautiful | 99 |
WS:joke | 99 |
WS:ghost | 98 |
WS:chav | 97 |
WS:covert | 97 |
WS:water | 94 |
WS:hinder | 93 |
WS:steal | 91 |
WS:supposition | 91 |
WS:gigantic | 90 |
WS:give head | 90 |
WS:ejaculation | 89 |
WS:intelligent | 89 |
WS:reprehend | 89 |
WS:index finger | 88 |
WS:saying | 88 |
WS:dork | 87 |
WS:humble | 86 |
WS:ephemeral | 84 |
WS:fake | 83 |
WS:beginner | 82 |
WS:combative | 82 |
WS:happy | 82 |
WS:dammit | 81 |
WS:advise | 80 |
WS:apex | 80 |
WS:evil | 80 |
WS:skillful | 80 |
WS:stingy | 80 |
WS:disorder | 79 |
WS:model | 79 |
WS:scrawny | 79 |
WS:mad person | 78 |
--Dan Polansky (talk) 20:17, 12 March 2012 (UTC)
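The summary figures above can be cross-checked with a few lines of arithmetic. This Python sketch uses only the numbers stated in the post; the variable names are mine, and the share of the top 100 pages is a derived figure that the post itself does not state.

```python
total_hits = 85_983            # all Wikisaurus page hits, January 2012
pages = 1209                   # number of Wikisaurus pages (a slight undercount)
hits_without_top_100 = 23_513  # hits excluding the top 100 pages

average = total_hits / pages                       # about 71.1, reported as 71
top_100_hits = total_hits - hits_without_top_100   # 62470
top_100_share = top_100_hits / total_hits          # roughly 0.73

print(round(average), top_100_hits, round(top_100_share, 2))
```

The gap between the average (71) and the median (17) reflects the same skew: a small number of pages receive most of the traffic.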
You're good at regex and AWB, from what I can tell. (Thank you for replacing those less-than signs!) Could you also replace c., C, c, ], C, C., c and c. (and possibly other variants of "c") (both with and without spaces between the c, dot or bracket and the following word) in etymology sections — replace them with "attested circa" or "circa"? And could you replace cf. / Cf. with "compare", since Wiktionary is not paper?
Entries with etymologies that begin with "c."-variants are: discord gangster donate authoritarian disability numinous adaptitude heliostat virescent sycomore sycamore disbelieve iridescent acetification heteronym inalienable anonymous indigent guzzle confection burger legion availability hieroglyph centenarian immanent humanitarian anonym equalitarian putrescent centenary beautification rubescent astronomical lickety-split reification prose offing notability dollar scissors realize anthroponym opalescent squeeze split dwarf arborescent (there may be more entries that have "c" somewhere else).
Entries with etymologies that begin with "cf." or "Cf." are: el nana celo civet baste mope squander brou actionable winkel Wallachia boira gibe glans gebied (other entries may have "cf" elsewhere in them).
- -sche (discuss) 20:44, 14 March 2012 (UTC)
Are you also harassing everyone else that simply votes "delete" "tosh" and "nah" on a regular basis with no "bearing" or just me? Lucifer (talk) 22:12, 6 April 2012 (UTC)
Hi,
Is it hard to produce regular statistics on translations into a language (Russian, Chinese, etc.)? Where can I find this info or does it require some coding? --Anatoli (обсудить) 07:35, 13 April 2012 (UTC)
I'd disagree that this is all that includable. It's easily derived from the sum of its parts, if you can find the sense at mood, granted. Furthermore it's much, much more commonly called the subjunctive, so this and indicative mood, nominative case (etc. etc.) should really just point to the more common form, so subjunctive, indicative and nominative. Mglovesfun (talk) 10:21, 9 May 2012 (UTC)
A blog post, as it were, relating to Wiktionary:
I do not know whether names of languages are proper nouns or not, but I will try to develop a position according to which names of languages are not proper nouns.
In English, there are two characteristics of names of languages that point to their being proper nouns: capitalization and their having a single referent.
Capitalization of names of languages can be declared an accident. Terms referring to people by nationality ("Englishman", "Spaniard") are also capitalized and yet are not proper nouns. In names of nationalities and languages, capitalization seems to confer an honor to the referent ("Spanish") or manner of reference ("Spaniard").
Having a single referent does not guarantee being a proper noun either. An analogy can be drawn between terms referring to materials and masses on the one hand and terms referring to languages on the other. Terms referring to materials such as "gold" and "wood" also have a single referent, even if a spatially distributed one. A chair can be made of wood or metal. By a bit of a stretch, a sentence can be made of English (as if it were a sort of material from which sentences are made) or of Spanish. The hypothesis would be that terms that have a single referent that is an abstract object are not considered proper nouns; abstract objects include chemical elements, chemical compounds, colors, numbers, etc. If this hypothesis is accepted, it remains to be seen whether a language is a concrete object or an abstract object.
A language has no spatial extension, no location and no mass, so it does not belong to the ranks of such concrete objects as rocks, rivers, cities, plants, animals, people, stars, comets, etc. The analogy to materials suggested above points to the possibility that a language is an abstract object; similar abstract objects seem to be musical styles, such as "rock" and "jazz", and names of dances (Wikisaurus:dance).
A key property of abstract objects is that they can be instantiated or quasi-instantiated, even though they already are instances. Thus, gold can be instantiated in a particular brick, rock can be instantiated in a particular song (which again is an abstract object, instantiated in a particular performance of that song), the number five is instantiated in my right hand in the number of fingers, the color black is instantiated in the color of my laptop, and the English language is instantiated in this particular sentence. Only a fraction of the whole of gold is instantiated in a particular brick; only a fraction of the whole of English is instantiated in any particular sentence. By contrast, concrete objects do not seem to show anything like this ability of being instantiated.
This consideration is complicated by the fact that information artifacts (such as this sentence) seem to be abstract objects yet their names are considered proper names, and so capitalized. To deal with it, we could introduce a degree of being abstract. Thus, a particular sentence is still an abstract object, instantiated in utterances of the sentence. But it is less abstract than a language, instantiated in its sentences. Furthermore, if this consideration creates a problem, the problem so created does not concern only names of languages. Why are names of styles of music and dances considered common nouns, while names of particular artistic works are considered proper nouns? Are languages more like styles of music and dances or more like particular artistic works?
To close the discussion, if a term that refers to a single referent that is an abstract object is not considered a proper noun regardless of the singularity of reference, and if a language is an abstract object, then a name of language is not a proper noun, regardless of capitalization.
--Dan Polansky (talk) 22:03, 14 July 2012 (UTC)
When you create a new term in a language with limited documentation, especially one you don't know, be sure to add a citation (either a quotation or a ===References=== section) and add {{LDL}} with it. Thanks --Μετάknowledgediscuss/deeds 08:35, 21 July 2012 (UTC)
Thanks :) --Μετάknowledgediscuss/deeds 09:07, 21 July 2012 (UTC)
{{LDL}} to all members of Category:Malagasy nouns, after you have added it to teny? Are you a bureaucrat who hates to do real work? --Dan Polansky (talk) 09:08, 21 July 2012 (UTC)
{{LDL}} good for? Why have you placed it to teny? By what criteria do you decide which entries of poorly documented languages should have the template? --Dan Polansky (talk) 09:17, 21 July 2012 (UTC)
{{LDL}} to teny but not to Category:Malagasy nouns? Does the policy force you to add {{LDL}} to teny? If so, what sentence of the policy? Why does not the same policy force you to add {{LDL}} to all members of Category:Malagasy nouns? Why are readers not alerted on almost all pages of Wiktionary that three quotations are missing? --Dan Polansky (talk) 09:27, 21 July 2012 (UTC)
{{LDL}} template)" (this is described as a "requirement" a few lines above), I have already answered that, because most entries on Wiktionary are easily citable with three quotations and can be presumed to be reliable. --Μετάknowledgediscuss/deeds 09:32, 21 July 2012 (UTC)
{{LDL}} to all members of Category:Malagasy nouns? --Dan Polansky (talk) 09:35, 21 July 2012 (UTC)
(A Wiktionary blog-like post) Wiktionary has the practice of including inflected forms rather than restricting itself to lemmas. One advantage of doing so is that the reader of a written material can take any inflected form found in a sentence, and find it in Wiktionary, even when he does not know the regularities ("smile" --> "smiled") and irregularities ("buy" --> "bought") of inflection of the language. Thus, to the extent to which inflection is regular, form-of entries can be thought of as a tabulated or buffered result of an inflectional analyzer, something like an addition table replacing a compact algorithm for addition.
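The contrast between a compact rule and its tabulated result can be made concrete with a toy example. The Python sketch below is only an illustration: `regular_past` is a deliberately naive rule (it ignores consonant doubling, "y" to "ied", and so on), and all names in it are hypothetical.

```python
def regular_past(verb: str) -> str:
    """Toy rule for regular English past tense: add "d" or "ed"."""
    return verb + "d" if verb.endswith("e") else verb + "ed"


# Irregular forms cannot be computed by the rule and must be listed.
IRREGULAR_PAST = {"buy": "bought"}


def past(verb: str) -> str:
    return IRREGULAR_PAST.get(verb, regular_past(verb))


# The "form-of entries": a table from inflected form back to lemma,
# precomputed so that a reader needs no knowledge of the rule.
FORM_OF = {past(lemma): lemma for lemma in ["smile", "buy", "walk"]}

print(FORM_OF["smiled"], FORM_OF["bought"], FORM_OF["walked"])
```

The table grows with every lemma and every form, which is exactly the cost discussed in the next paragraph, but a lookup in it requires nothing beyond the string the reader saw.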
A consequence that may be disliked is that, in highly inflected languages, entries for inflected forms massively outnumber lemma entries. Based on WT:STATS made using the dump of 2012-07-24, here are some statistics for some of the languages with highest numbers of inflected forms. Let me highlight that the column E has the number of form-of definitions rather than form-of entries, and the column D has the number of gloss definitions rather than the number of gloss-having entries. Furthermore, note that C = D + E. Column B stands in no direct relation to the other columns other than that B < C; it involves both entries with form-of definitions and entries with gloss definitions. B-D comes close to being a lower bound on the number of pure form-of entries.
Language (A) | Number of entries (B) | Number of definitions (C) | Gloss definitions (D) | Form-of definitions (E) | E/D | B-D |
---|---|---|---|---|---|---|
Latin | 613023 | 992531 | 44653 | 947878 | 21 | 568370 |
Italian | 487007 | 613087 | 129759 | 483328 | 4 | 357248 |
Spanish | 242918 | 357840 | 38284 | 319556 | 8 | 204634 |
French | 254629 | 333948 | 53346 | 280602 | 5 | 201283 |
Esperanto | 100720 | 101803 | 12254 | 89549 | 7 | 88466 |
German | 69501 | 113797 | 31781 | 82016 | 3 | 37720 |
Swedish | 89768 | 100972 | 20954 | 80018 | 4 | 68814 |
Finnish | 107180 | 133946 | 63074 | 70872 | 1 | 44106 |
Catalan | 56049 | 72196 | 9761 | 62435 | 6 | 46288 |
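The relations between the columns can be checked mechanically. The following Python sketch copies columns B to E from the table above and recomputes C = D + E, the rounded ratio E/D, and B-D; the dictionary layout and variable names are mine.

```python
# language: (B entries, C definitions, D gloss definitions, E form-of definitions)
rows = {
    "Latin": (613023, 992531, 44653, 947878),
    "Italian": (487007, 613087, 129759, 483328),
    "Spanish": (242918, 357840, 38284, 319556),
    "French": (254629, 333948, 53346, 280602),
    "Esperanto": (100720, 101803, 12254, 89549),
    "German": (69501, 113797, 31781, 82016),
    "Swedish": (89768, 100972, 20954, 80018),
    "Finnish": (107180, 133946, 63074, 70872),
    "Catalan": (56049, 72196, 9761, 62435),
}

derived = {}
for language, (b, c, d, e) in rows.items():
    assert c == d + e, language          # C = D + E holds for every row
    derived[language] = (round(e / d), b - d)
    print(language, *derived[language])
```

For Latin, for instance, this gives a ratio of 21 form-of definitions per gloss definition and a B-D value of 568370, matching the table.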
The claim of Wiktionary:Main_Page that Wiktionary has "3,065,335 entries with English definitions" has to be read with the inflected forms in mind. By summing the column D from WT:STATS, we get 1,490,000 gloss definitions; the number of gloss entries is even lower than that.
There is a discussion in Wiktionary:Requests_for_verification#vuvuzela about whether attestation requirements should apply to inflected forms. Some people seem to think that whenever Wiktionary has an attested lemma entry, it should also have all regularly formed inflected forms of the lemma regardless of their attestation. By contrast, I think that Wiktionary should avoid hosting unattested inflected forms regardless of the attestation of the lemma. Especially, when an inflected form is challenged in RFV, it should be deleted unless attested. The use of bots to create a complete set of inflected forms where there is a suspicion that some of them are unattested seems tolerable, provided the inflected forms are deleted once they are challenged and left unattested. --Dan Polansky (talk) 09:44, 4 August 2012 (UTC)
For some notes on Luciferwildcat (talk • contribs), see #RFD above.
Users of the person:
Unattested entries that he has hastily added:
Incidents:
Edits showing lack of lexicographical skill:
Vote:
Editing pattern:
--Dan Polansky (talk) 10:03, 12 August 2012 (UTC); updated --Dan Polansky (talk) 16:48, 12 August 2012 (UTC); updated --Dan Polansky (talk) 11:32, 1 October 2012 (UTC)
Today, I have discovered that an export of English Wiktionary is being published that only contains English terms with their definitions in a relational format, as four tab-separated columns. This is so nice! The file is much smaller than a full dump (50 MB after unpacking), can be copied and pasted to Excel and then filtered using Excel filtering tools, can be grepped while you see both the term and the definition in the result line, can be copied to Excel after grepping, etc.
The definition files location:
The result of 'grep "irrationality" enwikt-defs-20120821-en.tsv' (on Windows, you may use "findstr" instead of "grep"):
English Dada Noun # A cultural movement that began in ] during ] and peaked from 1916 to 1920. The movement primarily involved visual arts, literature (mainly ]), ], and graphic design, and was characterized by ], ] ], ], ], chance, ], and the ] of the ]ing standards in ].
English irrationalities Noun # {{plural of|irrationality}}
English irrationality Noun # The quality or state of being ]; want of the ] or the quality of reason; ].
English irrationality Noun # Something which is irrational or brought forth by irrational action, judgement, idea or thought.
English unreason Noun # Lack of ] or ]; ]; ].
English woolly-headedness Noun # The quality of being ]: ], ].
From what I can see, this has been around since March of 2010. It appears to have been created by Conrad.Irwin (talk • contribs).
The relational format is also great for all languages. It is hard to filter on language and definition at the same time using AWB or Wiktionary online search function.
--Dan Polansky (talk) 20:46, 26 August 2012 (UTC)
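Filtering on language and definition at the same time, as mentioned above, takes only a few lines once the file is read as four tab-separated columns. The Python sketch below is a minimal illustration: the function name `search_definitions` is hypothetical, the column order (language, term, part of speech, definition) is inferred from the grep output above, and the sample rows are shortened or made up for the demonstration.

```python
import csv


def search_definitions(lines, language, needle):
    """Yield (term, part of speech, definition) for matching rows."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip rows that do not have the expected four columns
        lang, term, pos, definition = row
        if lang == language and needle in definition:
            yield term, pos, definition


# Sample rows; the second one is invented for the demonstration.
sample = [
    "English\tirrationalities\tNoun\t# {{plural of|irrationality}}",
    "Czech\tiracionalita\tNoun\t# irrationality",
    "English\tunreason\tNoun\t# Lack of reason",
]
for hit in search_definitions(sample, "English", "irrationality"):
    print(hit)

# With the real file, one would pass an open file object instead, e.g.:
# with open("enwikt-defs-20120821-en.tsv", encoding="utf-8") as f:
#     hits = list(search_definitions(f, "English", "irrationality"))
```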
Hi! Is this expression idiomatic/proverbial (entry-worth) in Czech? If so, would you like to move the transwiki into the main namespace? Or if it's not idiomatic, let me know and I'll delete it. - -sche (discuss) 21:05, 26 August 2012 (UTC)
Please help me to understand, as I am obviously unaware of the proper protocol for making edits here on Wiktionary. Do I need to create an RFV or RF(whatever) for any changes that I deem necessary or just the ones that you don't agree with? You actually don't need to answer that as it is a rhetorical question, to demonstrate the absurdity of your perceived thought process. I just want to clarify that the guidelines, as I understand them, are to make constructive edits as I see fit. If I am verifying a definition and I see an incorrect sense, I am supposed to change that and fix it. If we put out an RFV for all changes, nothing would get done. I honestly hope I do not upset you with my responses, as I merely wish to edit peacefully and be corrected amicably when my editing is incorrect. Speednat (talk) 00:00, 28 August 2012 (UTC)
{{rfv-sense}} to see whether other editors can attest it. Just recently, I could not attest English "angulus" so I have sent it to RFV, and, soon enough, an editor linked to attesting quotations. --Dan Polansky (talk) 19:20, 28 August 2012 (UTC)
Another batch of AWB editing, this time a tiny one. See also #Restricting translations to lemma.
{{(t-?\+?)\|(..)\|(...)(*?)\|m(\|?.*?)}}, {{t-?\+?\|\2\|\3(*?)\|f\|?.*?}}, {{t-?\+?\|\2\|\3(*?)\|p\|?.*?}} {{$1|$2|$3$4$5}}
--Dan Polansky (talk) 19:14, 6 September 2012 (UTC)
Look what you can do with #English definitions from Wiktionary in a relational form
grep "\(arrogant\|conceited\|proud\).*person" enwikt-defs-20120821-en.tsv | cut -f2-
arriviste Noun # ], ], late arrival, ], ], generally characterised as an ambitious, brash or arrogant person who has yet to integrate with his or her new social group.
bashaw Noun # {{archaic}} A ]; a self-important or arrogant person. {{defdate|from 16th c.}}
bastard Noun # {{vulgar|referring to a man}} A ], ], overly or ] ] or ] person. See ], ].
bighead Noun # {{colloquial}} A person having an inflated opinion of himself; a ]ed or ] person.
cock of the walk Noun # {{idiomatic}} A ] or ] person.
cock of the roost Noun # {{idiomatic}} A ] or ] person.
coxcomb Noun # A ] or ] person; a ].
inflated Adjective # {{context|figuratively}} ]; ] (''of a person or ego'')
pajock Noun # A ] or ] person.
proudling Noun # {{obsolete}} A ] or ] person.
swellhead Noun # {{informal}} An ] or ] person.
wisenheimer Noun # {{informal}} A ]-] and ] person; a ] or ].
--Dan Polansky (talk) 19:32, 6 September 2012 (UTC)
Hi,
I have two questions.
{{cs-noun}}. (I didn't notice that you were the creator; otherwise, I wouldn't have asked the second question). --Anatoli (обсудить/вклад) 22:21, 10 October 2012 (UTC)
Hullo, it is ‘Pilcrow’ (Seth) again. I am not a Slavicist myself, but I must say, your labours on Wikisaurus are generally quite good. In particular, I like Wikisaurus:burger because it is very thorough, and it is interesting to be aware of all of the many types of hamburgers. A little nitpick is that ‘gardenburger’ and ‘whaleburger’ are missing, but that isn’t a big deal. I appreciate how you want to enrich the project, even if I do not always show it.
So…would you like to be friends? --Æ&Œ (talk) 15:18, 15 October 2012 (UTC)
Hi Dan. Can you revisit your vote at Wiktionary:Requests for deletion#Crouchy in light of my revisions to Crouchy? Cheers! bd2412 T 18:26, 15 October 2012 (UTC)
: wtf: where did you get this from? Czechs use the infinitive. Cheers. ;-) --Kusurija (talk) 06:08, 24 October 2012 (UTC)
A word (aka "lexeme") is a collection of inflected forms. This is not a definition, merely a statement to clarify that a word can appear in printed text as different strings of characters. As an example, "does" and "did" are word forms belonging to one word. However, "doer" (noun) is not an inflected form of "do" and is a different word.
When do two uses of a particular string of characters belong to one word, and when do they belong to two words? My tentative answer is that two uses with different etymology belong to two words and that two uses with different part of speech belong to two words. Any two uses that share etymology and part of speech belong to one word. Thus, same-spelled words are distinguished by either of etymology and part of speech. By means of example: "paper" in "made of paper" and "paper" in "papered" belong to two words per different part of speech; "sound" in "it made a loud sound" and "sound" in "The Sound of Denmark, where ships pay toll" belong to two words per different etymology. --Dan Polansky (talk) 11:42, 3 November 2012 (UTC)
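The tentative criterion above amounts to using the pair of etymology and part of speech as the identity key for same-spelled words. The Python sketch below illustrates this with the examples from the post plus one extra use of "paper"; the class `Use`, the function `group_into_words` and the etymology labels "E1" and "E2" are hypothetical placeholders, not actual Wiktionary data.

```python
from collections import defaultdict
from typing import NamedTuple


class Use(NamedTuple):
    spelling: str   # spelling of the lemma
    etymology: str  # placeholder label; equal labels mean shared etymology
    pos: str        # part of speech
    context: str    # where the use was found


def group_into_words(uses):
    """Group uses into words keyed by (spelling, etymology, part of speech)."""
    words = defaultdict(list)
    for use in uses:
        words[(use.spelling, use.etymology, use.pos)].append(use.context)
    return dict(words)


uses = [
    Use("paper", "E1", "noun", "made of paper"),
    Use("paper", "E1", "noun", "a sheet of paper"),
    Use("paper", "E1", "verb", "papered"),
    Use("sound", "E1", "noun", "it made a loud sound"),
    Use("sound", "E2", "noun", "The Sound of Denmark, where ships pay toll"),
]
words = group_into_words(uses)
print(len(words))
```

The five uses fall into four words: the two noun uses of "paper" share a key, while the verb "paper" and the two nouns "sound" each get their own.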
... about and : I accidentally clicked a rollback link instead of the link above it. -- Gauss (talk) 21:25, 12 November 2012 (UTC)
Do you support removing the phrasebook? —RuakhTALK 21:42, 5 December 2012 (UTC)
Check here and you'll have right answers to your questions. Ĉiuĵaŭde (talk) 17:59, 15 December 2012 (UTC)
Hey Dan, I recently came across this Czech book with a long note in it that I'm very curious about. If you're willing to translate it for me, it's the third and fourth pages in this album. Ultimateria (talk) 16:53, 17 December 2012 (UTC)