Wiktionary talk:Frequency lists/Spanish1000


New version

I have generated a new, improved version of the list. It was generated from 6,527 subtitle files of TV series and movies with a total of 27,417,111 words.

A bug in the counting script that caused words at the end of a line to be lost has been fixed. The fix increases the total word count and, in particular, the counts of adjectives and nouns, which are likely to occur at the end of a sentence. Additionally, a few files in languages other than Spanish or with encoding problems have been excluded. Matthias Buchmeier 13:24, 10 October 2008 (UTC)
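For readers who want to reproduce counts like these, here is a minimal sketch of a per-line word counter that does not drop the last word of a line. It is not the script used for this list; the names `WORD_RE`, `count_words` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches runs of letters, including accented Spanish characters
# (any Unicode word character that is not a digit or underscore).
WORD_RE = re.compile(r"[^\W\d_]+")

def count_words(lines):
    """Count lowercase word tokens over an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # findall scans the whole line, so the final word is kept even
        # when no space or punctuation follows it.
        counts.update(WORD_RE.findall(line.lower()))
    return counts

# Hypothetical subtitle lines, with and without a trailing newline.
sample = ["No sé qué decir", "Es una casa grande\n", "la casa es grande"]
counts = count_words(sample)
```

Matching words with a pattern, rather than splitting on a delimiter that must follow each word, is one way to avoid losing the token at the end of a line.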

Words excluded from the list

  • The following names, proper nouns, words from other languages and typos have been excluded from the list:
io john ei jack sam michael ia clark peter frank george harry james tom ben jimmy joe charlie bob

Import to other Wiktionaries

What's the licensing for these lists? Can I import them to the French Wiktionary (which has shockingly few Spanish words, fewer than 7000, I think)? Mglovesfun 11:42, 21 May 2009 (UTC)

The lists are released under both the GFDL and the LGPL licenses. Of course you can import them to the French Wiktionary. Matthias Buchmeier 08:36, 22 May 2009 (UTC)

What about a frequency list of two-word combinations? I have looked up two words on their own, and the combination of the two can have a meaning that is hard for me to explain, for example "tal vez" or "creo que". Anyway, my point is that there may be some interesting trends.
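A two-word list of this kind could be sketched roughly as follows. This is only an illustration, not something that exists for this list; `count_bigrams` and the sample lines are hypothetical, and the sketch pairs adjacent words within a line even when punctuation separates them.

```python
import re
from collections import Counter

# Runs of letters, including accented characters.
WORD_RE = re.compile(r"[^\W\d_]+")

def count_bigrams(lines):
    """Count pairs of adjacent lowercase words, line by line."""
    counts = Counter()
    for line in lines:
        words = WORD_RE.findall(line.lower())
        # Pair each word with its successor; pairs never span two lines.
        counts.update(zip(words, words[1:]))
    return counts

# Hypothetical input lines.
bigram_sample = ["Tal vez sí", "Creo que tal vez no", "creo que sí"]
bigrams = count_bigrams(bigram_sample)
```

Sorting the result with `bigrams.most_common()` would give the pairs in descending order of frequency, in the same way as the single-word list.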

Possible repetitions and spacing errors?

I am wondering about the word counts. The corpus consists of 27.4 million words. The unformatted list contains words ordered by number of tokens, in groups of 5,000, up to 225,000 words. However, there are a few repetitions where only the accents differ. Is this simply an orthographic error, or are they different words? For example, in the 75,001-80,000 group there are 6 tokens of decídselo 'you tell it to him'. However, in the 105,000-110,000 group decidselo appears again, without an accent on the 'i', for a total of 3 tokens. I am guessing it is the same expression with the accent missing? Also, there seem to be a number of typographical errors where the space between two words was dropped, so that the two were counted as one word, such as miracometimos and miraditasparece in the 175,001-180,000 group. Zaimot 18:23, 9 June 2010 (UTC)
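One way to find such candidates would be to group entries that differ only by an acute accent. The sketch below is an assumption about how that could be done, not part of the original counting script; `strip_acute` and `accent_variants` are hypothetical names, the decídselo counts are the ones quoted above, and the año/ano counts are invented for illustration.

```python
import unicodedata
from collections import defaultdict

ACUTE = "\u0301"  # combining acute accent

def strip_acute(word):
    """Remove acute accents only, so that ñ and ü are left intact."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))

def accent_variants(counts):
    """Return groups of entries that are identical apart from acute accents."""
    groups = defaultdict(dict)
    for word, n in counts.items():
        groups[strip_acute(word)][word] = n
    # Keep only keys that have more than one spelling.
    return {key: forms for key, forms in groups.items() if len(forms) > 1}

example_counts = {"decídselo": 6, "decidselo": 3, "año": 10, "ano": 1}
variants = accent_variants(example_counts)
```

A group found this way is only a candidate for a missing accent. Pairs such as "sí"/"si" or "qué"/"que" are genuinely different words, so each group would still need to be checked by hand.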


Spanish words incorrectly link to English Wiktionary

For example, "mayor" should link to http://es.wiktionary.org/wiki/mayor and not to http://en.wiktionary.org/wiki/mayor. Could someone fix this?