{{{1}}} at Google Ngram Viewer
Use this template to link to Google Ngram Viewer, which shows a time-dependent graph of the frequencies of word forms or spellings.
The following parameters are used by this template:
|1=
|2=
|corpus=
|startyear=, |start=
|endyear=, |end=
|caseinsensitive=
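As a rough illustration of what these parameters control, the following Python sketch builds a Google Ngram Viewer URL from comparable values. It is not the template's actual implementation: the helper name `gnv_url`, the endpoint, and the query keys (`content`, `corpus`, `year_start`, `year_end`, `case_insensitive`) are assumptions about how the viewer's URL is formed, and the display text given in |2= is ignored.

```python
from urllib.parse import urlencode

# Assumed endpoint; not taken from the template source.
NGRAM_BASE = "https://books.google.com/ngrams/graph"


def gnv_url(query, corpus=None, start=None, end=None, case_insensitive=False):
    """Build a viewer URL from template-like values (hypothetical helper)."""
    params = {"content": query}  # corresponds to |1=, the query string
    if corpus is not None:
        params["corpus"] = corpus  # numeric corpus index, as in |corpus=
    if start is not None:
        params["year_start"] = start  # |startyear= or |start=
    if end is not None:
        params["year_end"] = end  # |endyear= or |end=
    if case_insensitive:
        params["case_insensitive"] = "true"  # assumed key and value
    # urlencode percent-encodes commas and turns spaces into "+"
    return NGRAM_BASE + "?" + urlencode(params)


print(gnv_url("croissanterie", corpus=30, start=1900))
```

Parameters that are left out are simply omitted from the URL, so the viewer's own defaults would apply.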
Here are some examples:
* {{R:GNV|indecipherable, undecipherable}}
* {{R:GNV|ad lib, extemporal, extemporary, extemporaneous, extempore, extemporized, impromptu, improvised, improviso, off-the-cuff, offhand|some of the synonyms}}
* {{R:GNV|телепрогра́мма, телепереда́ча, телешо́у|corpus=36}}
* {{R:GNV|malen, streichen|corpus=31}}
* {{R:GNV|colour:eng_gb_2019,colour:eng_us_2019}}
* {{R:GNV|croissanterie|corpus=30|start=1900}}
* {{R:GNV|color/colour}}
* {{R:GNV|states of *}}
* {{R:GNV|states of *_NOUN}}
* {{R:GNV|*_ADJ argument}}
* {{R:GNV|cook_NOUN,cook_VERB}}
* {{R:GNV|cook_INF a meal}}
* {{R:GNV|cook_INF *_NOUN}} -- does not work
A list of corpora (with descriptions) is also available at https://books.google.com/ngrams/info.
Corpus | 2019 index | 2012 index | 2009 index | Shorthand (followed by _ and year) |
---|---|---|---|---|
American English | 28 | 17 | 5 | eng_us |
British English | 29 | 18 | 6 | eng_gb |
Chinese (simplified) | 34 | 23 | 11 | chi_sim |
English | 26 | 15 | 0 | eng |
English Fiction | 27 | 16 | 4 | eng_fiction |
English One Million | N/A | N/A | 1 | eng_1m |
French | 30 | 19 | 7 | fre |
German | 31 | 20 | 8 | ger |
Hebrew | 35 | 24 | 9 | heb |
Italian | 33 | 22 | N/A | ita |
Russian | 36 | 25 | 12 | rus |
Spanish | 32 | 21 | 10 | spa |
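The table can also be read as a lookup from a shorthand with its year suffix (for example `eng_gb_2019`) to the numeric index used in |corpus=. A minimal Python sketch, with the values copied from the table above; `CORPORA` and `corpus_index` are hypothetical names and not part of the template:

```python
# Index values copied from the table; None stands for "N/A".
CORPORA = {
    "eng_us":      {2019: 28, 2012: 17, 2009: 5},
    "eng_gb":      {2019: 29, 2012: 18, 2009: 6},
    "chi_sim":     {2019: 34, 2012: 23, 2009: 11},
    "eng":         {2019: 26, 2012: 15, 2009: 0},
    "eng_fiction": {2019: 27, 2012: 16, 2009: 4},
    "eng_1m":      {2019: None, 2012: None, 2009: 1},
    "fre":         {2019: 30, 2012: 19, 2009: 7},
    "ger":         {2019: 31, 2012: 20, 2009: 8},
    "heb":         {2019: 35, 2012: 24, 2009: 9},
    "ita":         {2019: 33, 2012: 22, 2009: None},
    "rus":         {2019: 36, 2012: 25, 2009: 12},
    "spa":         {2019: 32, 2012: 21, 2009: 10},
}


def corpus_index(shorthand):
    """Map e.g. 'eng_gb_2019' to its numeric index, or None if unavailable."""
    # Split at the last underscore: 'eng_gb_2019' -> ('eng_gb', '2019')
    name, _, year = shorthand.rpartition("_")
    if not year.isdigit():
        raise ValueError("expected a shorthand followed by _ and a year")
    return CORPORA.get(name, {}).get(int(year))


print(corpus_index("eng_gb_2019"))
print(corpus_index("ita_2009"))
```

Splitting at the last underscore matters because several shorthands, such as `eng_gb` and `chi_sim`, contain an underscore themselves.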
Google Ngram Viewer suffers from some limitations: 1) scanning errors (scannos); 2) a corpus that becomes increasingly biased toward academic publications with the passage of time; 3) equal weight for each book regardless of its popularity; 4) wrong assignment of the year of publication. Some of the problems are covered below. The scanno problem does not seem to completely invalidate the results, especially for English and for longer words. The severity of the problems depends on what we want to measure: cultural change over time or relative frequencies of word forms.
figure, Figure at Google Ngram Viewer reveals the problem: capitalized Figure rises to the top during the 20th century, which suggests use in figure captions in academic literature. When we restrict the corpus to English Fiction, the problem disappears: figure, Figure at Google Ngram Viewer.
fuck at Google Ngram Viewer shows the problem: there is no way there were so many instances of "fuck" before 1800; rather, these are likely scannos of "suck" caused by long s (ſ). On the other hand, this problem does not occur after 1820.
anti-American, (antiAmerican*10) at Google Ngram Viewer and google books:"antiAmerican" show the problem: scanning sometimes drops the hyphen. There is no way there are so many occurrences of "antiAmerican" and the Google Books search confirms that. Other examples: (exteacher*10),ex-teacher at Google Ngram Viewer, (nonEnglish*10),non-English at Google Ngram Viewer.
Some hyphens are dropped within an unbroken line; others are dropped at a line break, where it is ambiguous whether the word has a hyphen at all.
thebook, nonchocolate at Google Ngram Viewer and google books:"thebook" show the problem: the space was dropped and the result is as common as the legitimate nonchocolate. On the other hand, the book,(thebook*5000) at Google Ngram Viewer shows this happens relatively rarely.
google books:"misargument" shows the scanning problem: there are very few occurrences of "misargument" and some of the found items result from joining parts from different columns in multi-column publications. This particular example does not make it into GNV statistics, though, and it is unclear whether the problem could significantly impact the frequencies of common words.
There is no reason to think there are spurious changes in capitalization. anti-American,(antiamerican*1000) at Google Ngram Viewer looks plausible, unlike anti-American, antiAmerican at Google Ngram Viewer.
As of Oct 2022:
{{R:GNV|nonstandard/}}
: nonstandard/ at Google Ngram Viewer.