WTF? What word is NUMBER 1?
As I've just found some words at 90000+ with capitals, does that mean these lists aren't even counting Age and age as the same word?
A couple of questions. 1) Is it considered valuable to redirect things like i've to I've? Most of the red links in the first 10,000 are now just capitalization issues like this. If you type i've into the search box it sends you to the capitalized version, but the linked text shows red. So are we going with the "redirects are cheap and do no harm" idea, or do we just not care whether the Gutenberg list looks complete? 2) Would it be hard to simply filter out the Gutenberg introduction by, say, automatically skipping the first x lines? It seems that would lead to a much more valuable list. Thanks - Taxman 17:35, 4 January 2006 (UTC)
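For question 2, here is a minimal sketch of how the Gutenberg boilerplate could be stripped before counting. It is not the script used to build these lists. It assumes the book body is framed by lines beginning "*** START OF" and "*** END OF" (the exact wording varies between files), and it falls back to skipping a fixed number of lines, as suggested above, when no start marker is found. The function name and the `fallback_skip` parameter are hypothetical.

```python
def strip_gutenberg_boilerplate(text, fallback_skip=0):
    """Return the text between the assumed Gutenberg START/END marker lines.

    Assumption: the body is framed by lines starting with "*** START OF" and
    "*** END OF". If no start marker is found, skip the first `fallback_skip`
    lines instead (the "skip the first x lines" approach).
    """
    lines = text.splitlines()
    start = None
    end = len(lines)
    for i, line in enumerate(lines):
        marker = line.strip().upper()
        if start is None and marker.startswith("*** START OF"):
            start = i + 1  # body begins on the line after the start marker
        elif start is not None and marker.startswith("*** END OF"):
            end = i  # body ends just before the end marker
            break
    if start is None:
        # No marker found: fall back to dropping a fixed number of lines.
        return "\n".join(lines[fallback_skip:])
    return "\n".join(lines[start:end])
```

Marker matching is likely more robust than a fixed line count, because the length of the license header differs from book to book.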
Why has someone included nbsp?
I'm very tired but I've just removed the following sentence:
Fewer 17th century religious tracts and histories of Prussia. More vampires, starships, amnesia, homicide investigations, and cocaine.
If this has anything to do with the entry then perhaps it could go back in...
It might be worth mentioning that Project Gutenberg consists mostly of books published before 1923, so it isn't exactly representative of words used today. See http://www.gutenberg.org/faq/C-10 — This comment was unsigned.
The Gutenberg section says:
Would this be the Project Gutenberg License (Gutenberg:The_Project_Gutenberg_License on http://www.gutenberg.org)? Any idea how this has affected the results? Denishowe 10:58, 15 August 2007 (UTC)
Surely http://www.wordcount.org counts for something? If I recall correctly, it studies word frequencies in online communications, e.g. email, forums, etc.
It would be nice to see lists that exclude pronouns and prepositions.
In most cases the present form of a verb comes before the past form in the TV list, but the past form comes first in the PG list.
It struck me as odd that 'a' was last in order of all the words with initial a. Then I noticed that 'down' precedes 'do', and so on. Please confirm that a# precedes ab, and so forth. Kjaer 08:18, 25 August 2008 (UTC)
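For comparison, in plain lexicographic order a word always sorts before any longer word it is a prefix of, so 'a' should come before 'ab' and 'do' before 'down'. A quick check, using a hypothetical handful of words rather than the actual list:

```python
# Hypothetical sample words, not taken from the frequency lists.
words = ["down", "do", "ab", "a", "about"]

# Python's default string ordering compares character by character,
# and a prefix sorts before any longer string that starts with it.
print(sorted(words))
```

This prints `['a', 'ab', 'about', 'do', 'down']`. Note that this default ordering compares code points, so capitalized words would sort before all lowercase ones; passing `key=str.casefold` to `sorted` would ignore case instead.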
The word "brian" occurs in place 1670 (freq. 1146) and in place 8254 (freq. 106). There are no other duplicates among the 41284 words.
I might also note that some proper names like "Theresa" and "David" are capitalized, and others like "brian" (both occurrences) and "alice" are not. However, after converting the list to all lower case, there are still no further duplicates other than the "brian" mentioned. 207.172.220.155 17:49, 13 June 2009 (UTC)
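The lower-case duplicate check described above can be sketched as follows. The function name and the sample list are hypothetical illustrations, not the real 41284-word list:

```python
from collections import Counter


def case_insensitive_duplicates(words):
    """Return the (lower-cased) words that occur more than once when case is ignored."""
    counts = Counter(w.lower() for w in words)
    return sorted(w for w, n in counts.items() if n > 1)


# Hypothetical excerpt for illustration only.
sample = ["the", "brian", "Theresa", "David", "alice", "brian", "david"]
print(case_insensitive_duplicates(sample))
```

On this made-up sample it prints `['brian', 'david']`: "brian" is an exact duplicate, while "David"/"david" only collides after lower-casing.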
I'm interested in seeing a German word frequency list where words are counted by lemma. Even better would be other German word lists grouped by word family, as the British National Corpus has done.
Is there a search engine for finding a specific word? Surely I don't have to check every list individually to find out how often a word is used. 207.6.241.10 21:23, 8 May 2010 (UTC)
I have a further request: Is there a way to scan film scripts for specific phrases? I have looked around the subtitle sites, but these frequency lists are the closest I've got so far. — This unsigned comment was added by 88.110.58.85 (talk) at 19:33, 21 August 2012 (UTC).
Does anyone have a problem if I completely split this page by language? --Bequw → τ 20:58, 21 July 2010 (UTC)
I've assembled a list ranking Old Norse wordforms by frequency based on existing lists for a small number of texts. The list distinguishes alternate spellings as separate wordforms and, due to the small sample size, gives disproportionate ranking to certain names of characters from the texts. This list needs additional processing (see Top missing words), and a large portion of these words are not found on Wiktionary. LokiClock 12:14, 6 January 2011 (UTC)
The word "remuneration" appears twice in the Project Gutenberg lists of 2005-08-16:
What gives? --Lambiam 13:37, 16 November 2013 (UTC)
As a native Dutch speaker (and as a human with sufficient intellect) I have to question the sources mentioned.
The novel 'Max Havelaar' hardly qualifies as a benchmark for modern-day Dutch. What practical purpose would using any single novel serve, let alone one from 1860? Is any one writing style supposed to be the median for an entire language?
The University of Leipzig, while undoubtedly more knowledgeable than I, doesn't clearly state its source at all. I find it very hard to believe that a word like "frank" appears in the top 100 most frequently used words in the Dutch language. It's relatively antiquated; I myself have never heard it used except as a name. "1" hardly qualifies as a word to me, and "a" isn't even a word in Dutch, not to my knowledge anyway. Neither are "fr", "s" and "t". I have to seriously question both the validity and the usefulness of these lists. OmikronWeapon (talk) 07:41, 9 July 2015 (UTC)
I believe the "s" comes from genitives like "Anna's boek" and the "t" from the abbreviated, unstressed "het" as in " 't huisje aan de Schelde". For more recent information on many languages, including Dutch, see the Opensubtitles Word Frequency lists, August 2016: https://invokeit.wordpress.com/frequency-word-lists/ Jansegers (talk) 15:38, 24 October 2016 (UTC)
Is there a way to download the raw lists or the source-code to make them? All I can see is the lists as wiki pages, which is not ideal for mechanical consumption. 207.179.110.10 02:24, 5 June 2016 (UTC)
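I don't know what code was actually used to build these lists, but a minimal sketch of producing a frequency list from raw text looks like this. The tokenization rule is an assumption: a "word" is a run of Unicode letters, optionally joined by apostrophes (so "i've" and "Anna's" stay whole). Case folding is optional, which is exactly the choice that decides whether "Age" and "age" are counted as one word. The function name and the regex are hypothetical.

```python
import re
from collections import Counter

# Assumed tokenization: runs of Unicode letters ([^\W\d_] excludes digits
# and underscore), optionally joined by apostrophes, e.g. "i've", "Anna's".
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def frequency_list(text, fold_case=True):
    """Return (word, count) pairs, most frequent first.

    With fold_case=True, "Age" and "age" are merged into one entry.
    """
    tokens = WORD_RE.findall(text)
    if fold_case:
        tokens = [t.lower() for t in tokens]
    return Counter(tokens).most_common()


print(frequency_list("Age and age. The age of Reason and the AGE"))
```

With case folding, "age" comes out on top with a count of 4; with `fold_case=False` the lowercase "age" alone would count only 2. Ties are listed in the order the words first appear.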