What follows is a catalog of inclusion arguments and exclusion arguments, strong and weak ones. Some may be used to supplement WT:NSE, others to inform policy discussion. It further treats classes of entries and other considerations.
Avoid a hole in the network and improve navigability and discoverability. The #Derived-term principle is related but weaker, since a single derived term is enough for it to apply, while the avoid-hole principle for navigability requires at least two items to interconnect. Entry hub is more general than WT:THUB, which is for translation only.
Application:
This is related to #Entry hub, which is a stronger argument and should preferably be invoked if applicable. Principle: if there is an includable term X derived from term Y, include term Y. If X is derived from a specific sense of Y, include that sense.
This can be applied quantitatively: the more numerous the derived terms, the stronger the case.
The rationale is that if a sense in an entry has produced a derived term, the sense is probably notable enough to be included in the entry. An extreme case is that of Hitler, whose name much sooner activates the referent in the mind of the listener or reader than a generic "someone named Hitler" sense. The rationale is not that the entry for the derived term itself needs the base term, since that is not so: Zeldaesque can mention the game in the definition and in the etymology so the reader will never need to navigate to Zelda for further information.
Application:
See also User talk:Dan Polansky § Derived-adjective principle.
Quantitative impact: Category:English terms suffixed with -ian has over 2,624 items. Category:English terms suffixed with -esque has nearly 450 items.
Keep the entry if it has lexical information not covered by Wikipedia, Wikispecies or Wikidata.
Classes of information:
The principle was enunciated in the passed vote Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2. The vote was later rescinded since the requirement that the entry has to have such information from the start was deemed too stringent. After that, rather lenient inclusion criteria for place names were adopted, but they were no longer based on that sound principle.
Application:
See also User talk:Dan Polansky § Include attested proper names that are lexicographically interesting.
This principle does not protect multi-word names of biological taxa. Although they are being included in Wiktionary, they duplicate Wikispecies; about 1,000,000 entries for biological taxa can eventually be included. Nor does it protect "X County" entries.
This is a more inclusive version of #Dictionary-only lexical information, which makes it a weaker argument. The principle: include an entry if it has non-compositional linguistic information even if it is covered by Wikipedia, Wikidata or Wikispecies. It is inspired by Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2.
The non-compositionality treatment is essential: any name has a compositional pronunciation and compositional etymology, and in inflected languages, compositional inflection.
Classes of information:
The referent is arguably also a linguistic class of information, having to do with extensional semantics of names. However, if it counted, we would have to include all names of scientific articles from Wikidata, which would very badly swamp Wiktionary with linguistically low-value content.
Translation may be rather controversial, but it is a linguistic concern. An encyclopedia is not a name translation dictionary. Translation is also subject to non-compositionality treatment so not any translation of a capitalized descriptive phrase serving as a name counts.
Taxonomic names are covered by the principle, even if they duplicate Wikispecies and threaten to eventually swamp the Wiktionary database.
See also #Name translation.
See #Single word.
This is a broad inclusion argument based on the Wiktionary slogan "All words in all languages". Its curtailment in WT:FICTION, WT:COMPANY and WT:BRAND was never properly and credibly justified. Wiktionary has enough database space to cover all Wikipedia's single-word names, and even all multi-word names if there is lexicographical merit such as translation. See also User:Dan Polansky/All words in all languages.
Applications:
See also all the single-word examples in User talk:Dan Polansky § Include attested proper names that are lexicographically interesting.
The "All words in all languages" slogan is taken seriously in some ways, but not in other ways. It serves to include all 3-attested very rare words formed from easily parseable highly productive prefixes such as non- and anti-. But other items that are clearly words, with interesting morphology, etymology or formation strategy, are excluded, as per above.
The principle: if multiple dictionaries include multiple items of a class, consider not outright banning the class but rather figuring out inclusion criteria for the class. That is, extrapolate from what other dictionaries are doing, erring on the side of inclusion. A related argument is #Wikipedia-style generosity. An advantage of this principle over the per-term lemming principle is that it allows rounded criteria to be developed, reducing the arbitrariness from following lemmings on a per-term basis.
Application:
Extrapolate lemmings for organizations: A multitude of organization names is being included by multiple other dictionaries, so do not regulate to exclude all or nearly all of them but rather figure out which to include. See also WT:OED and WT:MWO.
See also User talk:Dan Polansky § Criteria for inclusion of multi-word names of organizations.
The argument is that if a policy leads us to keep some items, we should extrapolate to similar items not covered by the policy. This is a relatively weak argument: if the policy keeping an item is not particularly good or keeps the item for simplicity of administration of rules, there is no reason to extend the boundary even further. The item on the boundary is not really 1.0-kept but rather, say, 0.6-kept and an item near to it can be 0.4-kept, meaning not kept.
Application:
WT:COALMINE is an actual policy that continues to be controversial. The idea behind it is that if a word exists in multiple spellings, we should include the most common spelling even if it is a sum of parts. The last vote on the subject, in 2019, did not rescind the policy: Wiktionary:Votes/2019-08/Rescinding the "Coalmine" policy.
This is an analog of #Coalmine and states: if a rare or non-neutral term for a concept is included, also include the most common and neutral term for the concept even if the term is a sum of parts. The force of this argument is unclear; it is not required for decoding. An application: since Anglistics is includable, we should also include the more usual term English studies. However, English studies is protected by WT:THUB anyway.
This is a reminder that dictionaries are used not only for decoding, for finding what a word means, but also as a spelling guide, hence various word lists without definitions being published. This use is recognized by the appreciation of "nonstandard" and "proscribed" labels. A vote to remove the proscribed label failed: Wiktionary:Votes/2016-10/Removing label proscribed from entries.
An application: non-French is useless for decoding, but it serves to show existence and a recommendation by a usage guide. For humans, nonadrenal is only marginally more useful for decoding.
This inclusion argument is usually covered by #Coalmine, which is a policy: non-compositional is protected. What is not protected by coalmine is non-French unless one creates nonFrench based on rare nonstandard quotations. Arguably, the preferable treatment is to delete rare nonstandard forms and keep non-French via #Affixed word.
One may object that the reader is better served by consulting a general hyphenation guide. That is true, but having an entry that points to a hyphenation guide is more convenient. One may further object that this would lead to inclusion of all attested hyphenated compound adjectives, as a convenience. That is a point to ponder.
A fairly weak inclusion criterion: a term that is arguably a sum of parts sees restricted use that is hard to predict from the parts.
Applications:
These kinds of entries help users clarify the scope of reference of these expressions, but are out of remit of traditional lexicography.
The principle: Include an attested name, single-word or multi-word, if it has translations different from the name. This invokes lexicographical merit but is wildly inclusionist, possibly leading to inclusion of 1,000,000 names from Wikipedia or the like. On the other hand, this would be an addition to those 1,000,000 names from Wikispecies, on the same order of magnitude. The inclusion knob can be finetuned: do so only for (very) important entities; do so only if there are at least 10 translations; do so only if there are 10 independent attesting quotations, etc. It does not have to be all or nothing.
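The fine-tuning options named above can be pictured as independent thresholds on one predicate. The following is only a sketch under stated assumptions: the class `NameCandidate`, the function `passes_knob` and all field names are hypothetical, and whether an entity counts as important remains an editorial judgment supplied from outside.

```python
from dataclasses import dataclass


@dataclass
class NameCandidate:
    """Hypothetical record for an attested name under consideration."""
    name: str
    distinct_translations: int  # translations that differ from the name itself
    attesting_quotations: int   # independent attesting quotations
    important_entity: bool      # editorial judgment, not computed here


def passes_knob(candidate: NameCandidate,
                min_translations: int = 1,
                min_quotations: int = 0,
                require_important: bool = False) -> bool:
    """Return True if the candidate passes the chosen knob settings.

    The defaults correspond to the unrestricted principle: at least one
    translation different from the name. Each stricter setting can be
    switched on independently of the others.
    """
    if require_important and not candidate.important_entity:
        return False
    if candidate.distinct_translations < min_translations:
        return False
    return candidate.attesting_quotations >= min_quotations


# Placeholder candidate; the numbers are invented for illustration only.
example = NameCandidate("Example Name", distinct_translations=3,
                        attesting_quotations=2, important_entity=False)
print(passes_knob(example))                       # unrestricted principle
print(passes_knob(example, min_translations=10))  # stricter knob setting
```

Keeping each restriction as a separate parameter mirrors the point that the choice does not have to be all or nothing.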
This kind of multi-lingual content is provided by Wikipedia via interwiki links, but these are an accident outside of the articles proper, not primary encyclopedic content. The interwikis themselves have no tracing to sources. One has to hope that Wikipedia editors for the sought language chose the most fitting title for the article. Translation data is also present in Wikidata, but without tracing to sources or attesting quotations.
Put differently, Wikipedia is no name translation dictionary. That is not its remit. It happens to serve the purpose relatively well.
Some sources indicate translation of names is a hard problem, especially between English and CJK languages. There is a potential for unique service here.
Wikipedia and Wikidata are currently probably the translator's best sources for the purpose. However, the translator has to treat them as sources of hypotheses to be verified, with no hope for proper tracing for names as names. Wikidata is extremely generous and inclusive with its 99,894,920 items. Most Wikipedia articles are covered by Wikidata and much more. Wikidata includes not only entities but also sum-of-parts topics such as "history of England" (Q11755949), providing translations for such descriptive phrases into various languages.
See also the bold User talk:Dan Polansky § Translation dictionary of proper names, Wikipedia and Wikidata.
This inclusion criterion is not purely linguistic, which makes it weaker. It helps limit flooding Wiktionary with names while providing lexicographical benefits for at least some names, such as #Name translation. Something like this criterion plays a role in the current place name policy, which allows the likes of United Kingdom of Great Britain and Northern Ireland, and in the current astronomical name policy.
Applications:
See also #Capitalized descriptive phrase.
This argument is related to #Important-entity name. It says: if you fear flood of items of a class, include a sample of these anyway so that anyone interested in the class can get an impression of it, of how its members are formed and how they are translated. If the sample reveals the translations are sum of parts, that is also an observation to be made available to the reader or translator.
Application:
This principle is useful not only as an inclusion argument but also as a practical prioritization tool. Given a class of items, it pays off to create a sample of it for the reader but not bother to create every single attested member. This may apply to nonX terms, semiX terms, inhabitant names, Czech possessive adjectives (králův), etc. Sometimes, it is the rarer term that provides a unique value for the reader, to remove all doubt of existence and "correctness". Whether this has any force for nonX terms is questionable: it is a regular and "correct" formation to add prefix "non-" to any adjective.
This is a very inclusionist extension of the current WT:THUB policy. It would drop the requirement that the supporting translations are not closed compounds. At least two editors in the THUB vote supported this (Wiktionary:Votes/pl-2018-03/Including translation hubs).
Application:
The rationale is the same as for THUB: improve the navigability between languages, one that many non-English Wiktionaries get for free. For instance, de:Autoschlüssel is automatically a translation hub without a special policy treatment. An alternative would be to allow non-English entries to host THUB translations. In the English Wiktionary, translation tables are disallowed from non-English entries, while German Wiktionary allows them.
A consequence would be a flood of English translation hubs. This would bring the English coverage closer to, say, German and Danish coverage. Whether it would be a bad thing is unclear. It would likely be controversial.
To reduce the flood, one could require, say, 7 translations instead of the 2 required by the current THUB. However, this would probably not reduce the flood all that much. car key is supported by 7 translations.
The criterion is that free variable is the most natural place to define the term, and most convenient for lookup. The question that the viewer is asking is not "what does free mean" or even "what does free mean in mathematics" but rather "what does it mean for a variable to be free". It is the combination that needs a definition. The combination is not syntactically frozen: a variable can be free, and a set can be open.
Relevant tests:
Both 1) and 2) apply to free variable and open set.
This contrasts with "green leaf", where "green" applies to all objects that can have color and not specifically to leaves.
A minor variation in the modified noun does not detract from the rationale: there can be "free variable" but also "free occurrence" of the variable.
When 2) is not met, the case becomes weaker. Such is the case with retroactive law: the question still is "what does it mean for a law to be retroactive" but looking the definition up in retroactive is convenient enough, unlike looking up a definition of "free variable" in free.
One test is whether the combination leads the collocation set in frequency. Such is the case for bulleted list per bulleted *_NOUN at the Google Books Ngram Viewer, where "bulleted list" and "bulleted lists" lead the pack by a wide margin. By contrast, retroactive law does not lead the pack per retroactive *_NOUN at the Google Books Ngram Viewer; "retroactive legislation" would be more of a candidate but even that is not the leader.
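The frequency test can be written down as a small check over collocation counts. This is a minimal sketch: the function name `leads_collocation_set` is hypothetical, and the counts below are invented placeholders, not figures from the Google Books Ngram Viewer; real counts would have to be gathered separately.

```python
def leads_collocation_set(counts: dict[str, int],
                          candidate_forms: set[str]) -> bool:
    """Return True if the most frequent collocation is one of the candidate forms.

    `counts` maps each "ADJ NOUN" collocation to its frequency;
    `candidate_forms` holds the forms of the term under test,
    e.g. singular and plural.
    """
    if not counts:
        return False
    leader = max(counts, key=counts.get)
    return leader in candidate_forms


# Invented counts, for illustration only.
bulleted = {"bulleted list": 50, "bulleted lists": 30, "bulleted <other noun>": 5}
retroactive = {"retroactive <other noun>": 40,
               "retroactive legislation": 20,
               "retroactive law": 10}

print(leads_collocation_set(bulleted, {"bulleted list", "bulleted lists"}))
print(leads_collocation_set(retroactive, {"retroactive law", "retroactive laws"}))
```

The check only asks which collocation is the leader; a stricter variant could additionally require a margin over the runner-up, matching the "by a wide margin" observation.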
Related RFDs resulting in keeping or undeleting include Talk:prime number, Talk:free variable, Talk:acute angle and Talk:nominative case. RFDs resulting in deletion include Talk:local variable (should have been kept) and Talk:Acadian epoch. See Special:Search/incategory:"RFD result (passed)" "free variable" and Special:Search/incategory:"RFD result (failed)" "free variable".
Some concerned entries: algebraic number, algebraic integer, bound variable, cardinal number, complex number, free variable, imaginary number, rational number, real number, transcendental number, free software, open set, closed set, complete graph, normal distribution, classical logic, and intuitionistic logic. Some are listed at Talk:free variable.
Syntactic unfrozenness or unboundedness is typical:
See also User talk:Dan Polansky/2013 § free variable.
WT:CFI says: "In rare cases, a phrase that is arguably unidiomatic may be included by the consensus of the community, based on the determination of editors that inclusion of the term is likely to be useful to readers."
To consider utility for the readers is praiseworthy and allowed by CFI. However, it is not an easily administrable criterion. We should be seeking specific administrable tests that reveal utility; invoking utility without hinting at an administrable test is less than ideal.
The cases are said to be "rare". This is not a problem: the individual cases kept as "useful" are a drop in the bucket compared to 10,000 nonX terms (already included) and a million taxa (yet to be created).
See also #Page views.
When examining utility for our readers, we may consider page views as objective evidence of what readers find worthwhile. The entry still needs to be lexicographical: it would not do to create an excellent, well-sourced encyclopedic article in the mainspace and then defend it by page views. For defending nominally sum of parts entries, it is relevant. It is inferior to other specific tests but is not without force as an argument.
Example applications:
One may object that this is not a fair comparison since noncholestatic is naturally a low performer. But that is the point: we do include trivially parseable low performers in great numbers and we would do well not to delete much more viewed entries as long as they only contain lexicographical information.
This is an auxiliary inclusion rationale. It is more restrictive than #Dictionary-only lexical information. It is the opposite of #No real dictionary. Wiktionary content provides unique value if it is not found elsewhere, including other dictionaries. Thus, content better covered by other dictionaries does not have unique value. Thus, paradoxically, it is content on the margins that provides a unique edge.
Application:
Phrasebook is a policy in WT:CFI and is covered in Wiktionary:Phrasebook. Phrasebook entries may be useful for translation: I am hungry is mám hlad in Czech, as if I have hunger. Those not interested in these entries do not need to use them. I proposed a lemming test for phrasebook but it failed a vote. The current tests are "utility, simplicity and commonality" of the phrase; it can be sum of parts. Utility is subjective, and leads to arbitrary deletions, especially since phrasebook itself is controversial. Another test may be #Page views: if people visit the entry, let's keep it. Despite the phrasebook's being controversial, Wiktionary:Votes/2022-01/New phrasebook regulations passed unanimously. Category:English phrasebook has over 400 entries so there is no flood.
Kept and deleted phrasebook entries can be found via Special:Search/incategory:"RFD_result_(passed)" phrasebook and Special:Search/incategory:"RFD_result_(failed)" phrasebook.
I love you performs really well in terms of page views; Merry Christmas and a Happy New Year performs well only around Christmas; perfectly uncontroversial sum-of-parts-even-if-solid nonchocolate performs very poorly. I'm twenty years old, nominated for deletion, performs many times better than nonchocolate. See also #Overflood perspective.
One exclusion argument is that no real dictionary contains entries similar to the one under discussion. It is a weak argument: its line of reasoning is rejected overall by policy choices made.
First, other dictionaries do not declare #All words in all languages as their motto.
Second, Wiktionary is set up to have the following entries that no "real" dictionary such as OED or M-W has:
{{place}} currently sees over 87,000 transclusions, and we are far from done. {{en-proper noun}} sees over 100,000 transclusions.
Furthermore, having content other dictionaries do not have provides a unique differentiator, in the business sense, a reason for people to visit Wiktionary sooner than any other dictionary for some needs. A person interested in usual English vocabulary can be better served by other professionally edited dictionaries, usually getting better quality.
See also #Overflood perspective.
WT:CFI#Attestation vs. the slippery slope says that for attestation, entries should be considered on their merit and not with the fear that their whole group will be included. Nonetheless, a form of slippery slope argument was repeatedly invoked in RFD, e.g. in Talk:ex-Christian to delete ex-X entries as being allegedly too numerous. It was pointed out that -ness entries are as numerous or more, but to no avail; indeed, Category:English terms suffixed with -ness has 9,830 entries. There are over 10,000 nonX/non-X entries. This form of slippery slope seems fundamentally fallacious, not attempting to do any serious quantitative analysis.
In general, the slippery slope argument has some force: when considering putative inclusion criteria, we should analyze their overall impact and not focus on a single entry only. Having many entries with little lexicographical value has costs: database storage use, size of dump downloads, computing resources needed for fulltext search, monitoring changes to entries for defects or vandalism, dispute resolution via RFD or RFV, etc. It helps to analyze each slippery slope argument in perspective: if there is a risk of including, say, 1,000 ex-X entries and we have 10,000 nonX entries and support 1,000,000 taxa, these 1,000 entries are a drop in the bucket.
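The "drop in the bucket" comparison can be made explicit with the round figures used above. This is only an illustrative calculation; the variable names are hypothetical, and the taxa figure is the number of entries the policy supports, not a count of entries already created.

```python
# Round figures taken from the paragraph above.
ex_x_entries = 1_000        # feared ex-X entries
non_x_entries = 10_000      # nonX entries already included
taxa_supported = 1_000_000  # taxa the project is set up to include

# Share of the feared class relative to the classes already accepted.
share = ex_x_entries / (non_x_entries + taxa_supported)
print(f"{share:.2%}")  # prints 0.10%
```

The same comparison can be repeated for any other class invoked in a slippery slope argument by swapping in its estimated size.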
See also #Overflood perspective.
WT:CFI contains a section on encyclopedic content, but that section provides nothing that should lead to exclusion of certain entries. "Delete as encyclopedic" is therefore not a CFI based argument. In detail, numbering mine:
Item (1) says the dictionary entry itself should be kept; it only regulates the content within entries. Item (2) is confusing and should ideally be removed. (a) Wiktionary entries are not only about words but also about multi-word terms, e.g. New York; (b) one reading would be that specific people and places should have no sense lines in Wiktionary entries, but for place names this would directly contradict their long-term treatment: London is not restricted to a single sense stating "place name", "any of various municipalities" or the like. The only practically acceptable interpretation of (2) is that it reinforces (1) exhortation to delegate encyclopedic information about referents to encyclopedia, while keeping short dictionary definitions covering referents, specific entities.
Thus, "delete as encyclopedic" to delete a complete entry is ad hoc lawmaking, allowed by WT:NSE. However, it is poor lawmaking: the principle is no simple test and points to no simple tests. Something being covered by Wikipedia is in itself no exclusion principle, certainly for non-name words. If this were an exclusion principle for names, there should be almost no geographic names covered since they are covered by Wikipedia.
See also #Dictionary-only lexical information and User talk:Dan Polansky § What is encyclopedic content and dictionary material.
This is a decent exclusion criterion, although perhaps unnecessarily strict.
Application:
Some RFD nominations are in the spirit of subjective dislike. W:WP:IDONTLIKEIT is relevant. Some of the examples given there are reminiscent of what we sometimes see in RFD: "Delete: No need.", "Delete as cruft.", "Delete as trivia." The WP page W:Wikipedia:Arguments to avoid in deletion discussions is an essay, not a policy. Consider its advice, numbering mine:
(1) does not apply: the discussions are mostly decided through head count, and nothing else can be called "consensus", properly speaking. (4) is too stringent: "delete as SOP" is not necessarily bad; sometimes it's all that needs to be said. "Keep per WT:COALMINE" is fine as well: the link does all the arguing. "Keep as a single word", while not highlighting the policy part, correctly invokes it. (2) and (3) seem relevant and interesting, and are all too often violated in our RFD discussions. Arguments should ideally be based on policy, and when it is not possible, they should invoke a candidate deletion principle that one could wish could be adopted as a policy.
They are excluded by policy. Deleting them as useless does no harm; there is no use case for them. Common misspellings can be useful for non-native speakers, who can type e.g. concieve and get to the required entry.
An alternative position would be that there are no misspellings, only very rare attested alternative forms. The resulting markup would serve the purpose: most readers understand not to use vanishingly rare alternative forms. This position is not taken by CFI. The practice would have a poor economy, requiring us to store many vanishingly rare 3-Usenet-attested spelling variations in the database, including japanese and london in lowercase. The economy would be improved by entering them as hard redirects.
Tests for misspelling:
Tests for "common" misspellings:
Anomalous spellings, those failing a pattern, are not misspellings, e.g. unchristian vs. un-Christian. See Wiktionary:Misspellings for more examples.
Typos are deleted as typos regardless of frequency, e.g. amgydala.
Precedent:
See also Wiktionary:Misspellings.
We need the sum of parts criterion to exclude nearly all attested phrases that are transparent syntactic constructions. We cannot include "green leaf" and "history of England". But sum of parts should not be taken to be strictly sufficient for deletion with no alleviating concerns possible. WT:THUB is one such concern discovered and codified over time. There are other concerns to be discovered and articulated, and this can happen on the fly before a policy is codified. See also #Utility.
Include attested affixed words even if hyphenated. Thus, include ex-pilot, ex-Christian, self-govern and non-French. This is in keeping with #All words in all languages. See Wiktionary:Beer parlour/2022/September § Including hyphenated prefixed words as single words.
Attested pronunciation spellings are being widely included as per Category:English pronunciation spellings. Wiktionary:Votes/pl-2011-01/Final sections of the CFI removed the Typographic variants section of CFI, which dealt with "G-d, pr0n, i18n or veg*n". The voters did not indicate intent to remove those spellings; some voters indicated intent not to remove them. A change requires a vote.
Past discussions: Wiktionary:Beer_parlour/2008/March#-in'_forms, Talk:bein' and Talk:frontin'.
Spellings with an asterisk are being included, e.g. veg*n. Wiktionary:Votes/pl-2011-01/Final sections of the CFI removed the Typographic variants section of CFI, which dealt with "G-d, pr0n, i18n or veg*n". The voters did not indicate intent to remove those spellings; some voters indicated intent not to remove them. A change requires a vote.
Included items: veg*n, f**k, f*ck, f*der, b******s, d*ck, d—n, etc.
Category: Category:English censored spellings.
Discussions: Talk:f**k yielded a near-unanimous keep while Talk:f*ck yielded 6:4 for deletion.
Leet is being included per Category:English leet. Wiktionary:Votes/pl-2011-01/Final sections of the CFI removed the Typographic variants section of CFI, which dealt with "G-d, pr0n, i18n or veg*n". The voters did not indicate intent to remove those spellings; some voters indicated intent not to remove them. A change requires a vote.
Neither "nickname of individual" nor "multi-word nickname of individual" is a sufficient ground for exclusion. Governator is a word, and if we are to document it, then as a nickname. Other nicknames are in Category:en:Nicknames of individuals, which has only a little over 80 entries.
Some nicknames for presidents:
Multi-word nicknames are not protected by "all words in all languages".
This does not duplicate Wikipedia: W:Donald Trump does not list the above nicknames.
We are not swamped by nicknames nor are we about to become so any time soon. Rather, we have over 10,000 nonX solid-written words, trivially decipherable for humans, very uninteresting. And we are set up to duplicate on the order of 1,000,000 taxa from Wikispecies.
An early surviving nickname is Talk:Governator, 2007. Talk:Pharma Bro survived a 2019 RFD. However, Talk:Baghdad Bob was deleted in 2022, with the nomination rationale "Nickname of an individual".
See also User talk:Dan Polansky/2016 § Nicknames of specific people.
Arguments supporting particular nicknames include #Single word (Governator) and #Dictionary-only lexical information (Donald Trumpet).
However, the value is not so unique in so far as Wikipedia does cover this sort of lexicography:
Baghdad Bob is currently covered in W:Muhammad Saeed al-Sahhaf.
The inclusion of books and other literary works including plays is governed by WT:NSE. We have Bible, King James Bible, Genesis, Pentateuch, Book of Mormon, Old Testament, New Testament, Tanakh, Torah, Neviim, Ketuvim, Talmud, Octapla, Qur'an, Tao Te Ching, I Ching, Veda, Bhagavad Gita, Kama Sutra, Decameron, Little Red Book, Shahnameh, Edda, Iliad, Odyssey, Aeneid, Lysistrata, Hansel and Gretel, Jabberwocky; and further dictionaries: AHD, OED, CCE, COD, DARE, DCHP, LDE, NOAD, and RHD. There is Category:en:Books.
Discussions resulting in keeping include Talk:Odyssey, Talk:Kama Sutra, Talk:Hansel and Gretel, Talk:Jabberwocky.
Discussions resulting in deletion include Talk:Ali Baba and the Forty Thieves, Talk:Pearl of Great Price, Talk:Merseburger Zaubersprüche, Talk:Urban Dictionary, and Talk:基度山恩仇記. Talk:Oxford English Dictionary and Talk:Shorter Oxford English Dictionary were deleted as failing RFV, which makes no sense to me.
The #Extrapolate lemmings argument leads us to include some names and figure out criteria for them: there is Aeneid and Kama Sutra, but not Lysistrata.
#All words in all languages suggests the following: include attested single-word names not originating as a capitalization of a non-name word. Thus, include Lysistrata and Decameron but not the Clouds. In case of doubt, to limit the numbers, include only Wikipedia-notable ones. Some may be protected by #Entry hub. Many will have #Dictionary-only lexical information.
There are rather lenient criteria for place names ("geographic names") in place. However, they include many uninteresting names such as "X County" entries, while excluding many single-word names such as German street names, e.g. Hauptstraße. #All words in all languages and #Dictionary-only lexical information would lead to a different approach.
Place names are linguistically distinct from organization names:
As pointed out in #Place names, organization names have in general fewer saving graces than place names. This class of names is supported by #Extrapolate lemmings.
Some names are protected by #All words in all languages (e.g. Greenpeace) or #Linguistic information (e.g. Ku Klux Klan).
United Nations Organization is supported by #Important-entity name.
Some names can be supported by #Name translation, but that is likely to be more controversial.
Some political parties can be supported by #Derived-term principle: Democratic Party produced Democrat; Republican Party produced Republican; Green Party produced Green.
A 2022 vote failed: Wiktionary:Votes/2022-06/Updating CFI for names of organizations. Noteworthy comments in support for names of organizations from the vote and its talk page:
"As a translator, I'm often curious about how different languages construct the names for things. Some of these names are arguably idiomatic, as the choice of this word or that as a translation for part of the original name can be arbitrary." --Eiríkr Útlendi
Above, we have the testimony of a translator. There is also the mention of WT:BRAND, which, as exclusionist as it gets, is much more inclusionist than excluding nearly all names of organizations.
Talk:Republican Party and Talk:Democratic Party failed RFD with the incorrect rationale that they fail WT:COMPANY. Political parties are not companies: not by dictionary definition and not by hyponymy in WordNet.
Surnames are protected by policy. Their inclusion illustrates the principles applied. They are supported primarily by #All words in all languages and #Dictionary-only lexical information. One can argue they are covered by Wikidata, but they have no gender and inflection information there. They would seem excluded via #No real dictionary since OED and M-W do not have surnames, but there are specialized dictionaries of surnames.
Surnames have no definition proper; they are defined merely as "surname". They can have etymology and pronunciation. They do not meet CFI's introductory rationale, "A term should be included if it's likely that someone would run across it and want to know what it means."
forebears.io reports 5,095,698 surnames in the United States. The magnitude matches the claim made on Quora that in the last census, there were 6.3 million surnames in the United States. By contrast, The Oxford Dictionary of Family Names in Britain and Ireland reports having over 45,000 entries. According to Wikipedia, OED includes 616,500 word forms in total; 6 million surnames is roughly 10 times as many.
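The order-of-magnitude comparison above can be checked with a short sketch. The figures are simply those quoted in the paragraph (not independently verified), and the variable names are made up for illustration.

```python
# Figures as quoted in the text above; none are independently verified here.
US_SURNAMES_FOREBEARS = 5_095_698      # forebears.io, United States
US_SURNAMES_QUORA = 6_300_000          # claim made on Quora about the last census
FAMILY_NAMES_DICT_ENTRIES = 45_000     # Oxford Dictionary of Family Names ("over 45,000")
OED_WORD_FORMS = 616_500               # OED word forms, according to Wikipedia

# How many times larger each surname count is than the whole OED.
ratios_to_oed = {
    "forebears.io": US_SURNAMES_FOREBEARS / OED_WORD_FORMS,
    "Quora": US_SURNAMES_QUORA / OED_WORD_FORMS,
}

# How many times larger the forebears.io count is than a specialized surname dictionary.
ratio_to_surname_dictionary = US_SURNAMES_FOREBEARS / FAMILY_NAMES_DICT_ENTRIES

for source, ratio in ratios_to_oed.items():
    print(f"{source}: about {ratio:.1f} times the OED word-form count")
print(f"forebears.io: about {ratio_to_surname_dictionary:.0f} times the surname dictionary")
```

Depending on which surname count is used, the ratio to OED comes out between about 8 and about 10, so "10 times" is best read as an order-of-magnitude statement.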
It would make sense to delegate surnames to a specialized dictionary project. But Wiktionary claims to aspire to include all words in all languages.
In so far as taxonomic names are proper nouns and names of specific entities, they are not protected by CFI, and are subject to arbitrary deletion. They have not been subjected to RFD so far. There are on the order of 1,000,000 such names in existence. They are not protected by #All words in all languages. They would seem excluded per #No real dictionary; however, specialized dictionaries of taxonomic names do exist. They could be excluded as duplicating Wikispecies.
This is a rebuttal of the notion that if something is in an encyclopedia, it thereby should not be in the dictionary. A dictionary treats the term as a dictionary entry, providing a definition and other classes of lexical information. A definition is not an encyclopedic article. This notion is in keeping with WT:CFI#Wiktionary is not an encyclopedia, which says: "Care should be taken so that entries do not become encyclopedic in nature; if this happens, such content should be moved to Wikipedia, but the dictionary entry itself should be kept." See also User talk:Dan Polansky § What is encyclopedic content and dictionary material.
While Wikipedia is very inclusive and generous with coverage of popular culture, businesses, commercial brands, relatively minor organizations, etc., Wiktionary has adopted policies that unnecessarily curtail dictionary coverage. The curtailing policies are WT:FICTION, WT:BRAND and WT:COMPANY. That curtailment was never properly justified. It seemed to follow the exclusionist version of the lemming principle: if "real" dictionaries do not include that kind of content, nor should Wiktionary. A Wikipedia analog would be: if "real" encyclopedias do not include this kind of content, nor should Wikipedia. And yet, Wikipedia has a generous article about W:Gondor.
This is a response to various claims that we are going to be overflooded by various kinds of entries if we allow them. Often, the arguer speaks of "infinity" of entries without doing any quantitative analysis. Thus, we must allegedly exclude ex-X entries (the prefix is too productive) or most names of literary works regardless of lexicographical merit. To that, we may note we have over 10,000 entirely uninteresting nonX entries, that we are multiplying the number of entries by including inflected forms as separate entries, and that we are in the process of duplicating on the order of 1,000,000 biological taxa from Wikispecies. Wikipedia has about 6,500,000 articles so even if we created an entry for each of them, we would not run out of database space. Only some of these articles are names; "History of England" is not a name. When a class of items is considered for inclusion using purely lexicographical criteria, the fear of overflood should be put in relation to these numbers. See also the extremely bold User talk:Dan Polansky § Translation dictionary of proper names, Wikipedia and Wikidata.
Is it acceptable to vote in RFD discussions against policy? Or is it something one should be ashamed of?
First, WT:CFI says: "In rare cases, a phrase that is arguably unidiomatic may be included by the consensus of the community, based on the determination of editors that inclusion of the term is likely to be useful to readers." Thus, invoking "utility" is not per se a policy override, strictly speaking, since it is covered by policy. See also #Utility.
Second, WT:CFI has phrasebook criteria, and one should keep that in mind when a sum-of-parts phrase is being requested for deletion.
Third, as a matter of fact, Wiktionary has a history of policy overrides:
WT:EL has a flexibility section explicitly making it a guideline, not a set of rigid rules.
Policy overrides are not an unconditional good. They should be well reasoned, and not based on a whim. A hesitation to make them is advisable. On the other hand, they have done the project a lot of good, and are in the spirit of Wikipedia's W:WP:IAR: "If a rule prevents you from improving or maintaining Wikipedia, ignore it."
A vote designed to ban policy overrides failed: Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep.
OED uses quotations from Twitter as per WT:OED. Wiktionary editors decided to allow Internet quotations (Wiktionary:Votes/pl-2022-01/Handling of citations that do not meet our current definition of permanently archived), but a Beer parlour discussion (Wiktionary:Beer parlour/2022/September § Whether Reddit and Twitter are to be regarded as durably archived sources) yielded 11:8 for Twitter, which is not a 2/3 supermajority for allowing Twitter under the normal standard of 3 attesting quotations spanning a year. It would make sense to require the Twitter quotations to show evidence of "sustained, widespread or accumulated use", using the language of OED and WT:MWO. The quantitative requirements would be left open. See also User talk:Dan Polansky § Using Twitter for attesting quotations.
Traditionally, 3 quotations from Usenet were considered enough. There is some opposition to it: Usenet is not edited and contains much more fringe word and proto-word material than printed publications.
Discussions:
As for attestation, Wiktionary inclusion criteria (WT:ATTEST) cannot be much more lenient than they are. The standard of 3 quotations spanning a year approaches the bare minimum needed to achieve independence and span, especially since it covers Usenet. Professional dictionaries are much stricter: see WT:Merriam-Webster and WT:Oxford English Dictionary criteria, using the language of "sustained", "widespread" or "accumulated" use. The current standard is easy to administer and reduces the workload of providers of quotations, compared e.g. to requiring 10 quotations. Reducing the standard to one use would eliminate all independence and allow nonce words invented by creative authors such as James Joyce. Admittedly, one limitation of such nonce words would still be that they need to convey meaning. See also Wiktionary:Votes/pl-2014-03/CFI: Removing usage in a well-known work 3, which mandated moving deleted nonce words to the likes of Appendix:English nonces.
This rule used to be in WT:CFI, and said this:
It was removed via Wiktionary:Votes/pl-2010-05/Names of specific entities. It was in the spirit of OED's inclusion criteria and would lead us to remove almost all names, including place names. One only has to look at what names OED includes (almost none, far fewer than Merriam-Webster, which has a whole section for geographic names) to see the likely impact. Compared to that, the current place name criteria are very generous. Almost no opposition showed up in the vote. The exclusionist spirit of the rule survives in WT:FICTION.
A lot of Wiktionary content necessarily duplicates other wikis including Wikipedia and Wikispecies, incompletely so. This is especially true of various classes of proper names. Adding more of them cannot possibly be a lexicographical priority. To wit:
See also #Dictionary-only lexical information.