This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.
Discrepancies detected:
The name listed for lzh-lit is wrong; it should be Literary Chinese.
The name listed for lzh-lit is wrong; it should be Literary Chinese.
lzh-lit has a canonical name that is not unique; it is also used by the code lzh.
inc-old has no child families or languages.
ira-mid has no child families or languages.
ira-old has no child families or languages.
qfa-cre has no child families or languages.
qfa-pid has no child families or languages.
nb has Middle Norwegian (gmq-mno) set as an ancestor, but is not in the West Scandinavian languages (gmq-wes).
nb has Danish (da) set as an ancestor, but is not in the East Scandinavian languages (gmq-eas).
tt has an override_translit value that is not nil, true or a string: false
hns has Bhojpuri (bho) set as an ancestor, but is not in the Bihari languages (inc-bih).
hns has Awadhi (awa) set as an ancestor, but is not in the Eastern Hindi languages (inc-hie).
alv-gtm-pro does not have the expected name "Proto-Ghana-Togo Mountain", even though it is the proto-language of the Ghana-Togo Mountain languages (alv-gtm).
auf-pro does not have the expected name "Proto-Arauan", even though it is the proto-language of the Arauan languages (auf).
awd-amc-pro has a proto-language code associated with the invalid code "awd-amc".
awd-kmp-pro has a proto-language code associated with the invalid code "awd-kmp".
awd-pro does not have the expected name "Proto-Arawakan", even though it is the proto-language of the Arawakan languages (awd).
awd-prw-pro has a proto-language code associated with the invalid code "awd-prw".
awd-taa-pro does not have the expected name "Proto-Ta-Arawakan", even though it is the proto-language of the Ta-Arawakan languages (awd-taa).
dru-pro has a proto-language code associated with Rukai (dru), which is not a family.
euq-pro does not have the expected name "Proto-Vasconic", even though it is the proto-language of the Vasconic languages (euq).
gmq-pro does not have the expected name "Proto-North Germanic", even though it is the proto-language of the North Germanic languages (gmq).
inc-krd-pro does not have the expected name "Proto-KRDS lects", even though it is the proto-language of the KRDS lects (inc-krd).
mis-hkl has its canonical name ("Kelantan Peranakan Hokkien") repeated in the table of aliases.
nai-chu-pro does not have the expected name "Proto-Chumashan", even though it is the proto-language of the Chumashan languages (nai-chu).
nai-mdu-pro does not have the expected name "Proto-Maiduan", even though it is the proto-language of the Maiduan languages (nai-mdu).
nai-miz-pro does not have the expected name "Proto-Mixe-Zoquean", even though it is the proto-language of the Mixe-Zoquean languages (nai-miz).
nai-pom-pro does not have the expected name "Proto-Pomoan", even though it is the proto-language of the Pomoan languages (nai-pom).
omq-maz-pro does not have the expected name "Proto-Mazatecan", even though it is the proto-language of the Mazatecan languages (omq-maz).
poz-swa-pro does not have the expected name "Proto-North Sarawakan", even though it is the proto-language of the North Sarawakan languages (poz-swa).
sal-pro does not have the expected name "Proto-Salishan", even though it is the proto-language of the Salishan languages (sal).
sit-khp-pro has a proto-language code associated with the invalid code "sit-khp".
sit-kon-pro does not have the expected name "Proto-Konyak", even though it is the proto-language of the Konyak languages (sit-kon).
smi-pro does not have the expected name "Proto-Sami", even though it is the proto-language of the Sami languages (smi).
tbq-kuk-pro does not have the expected name "Proto-Kukish", even though it is the proto-language of the Kukish languages (tbq-kuk).
xsc-sak-pro does not have the expected name "Proto-Sakan", even though it is the proto-language of the Sakan languages (xsc-sak).
apc is set as an ISO 639-3 code on multiple items: Q56593 and Q22809485.
kjv is set as an ISO 639-3 code on multiple items: Q838165 and Q31199873.
msn is set as an ISO 639-3 code on multiple items: Q3331111 and Q3563857.
ttt is set as an ISO 639-3 code on multiple items: Q56489 and Q123964178.
Blis is not used by any language and has no characters listed for auto-detection.
Cpmn is not used by any language.
Hira is not used by any language.
Hrkt is not used by any language.
Image is not used by any language and has no characters listed for auto-detection.
Ipach is not used by any language and has no characters listed for auto-detection.
Moon is not used by any language and has no characters listed for auto-detection.
Morse is not used by any language and has no characters listed for auto-detection.
Music is not used by any language.
None is not used by any language and has no characters listed for auto-detection.
Pcun is not used by any language and has no characters listed for auto-detection.
Pelm is not used by any language and has no characters listed for auto-detection.
Psin is not used by any language and has no characters listed for auto-detection.
Roro is not used by any language and has no characters listed for auto-detection.
Rumin is not used by any language.
Semap is not used by any language and has no characters listed for auto-detection.
Visp is not used by any language and has no characters listed for auto-detection.
Zmth is not used by any language.
Zsym is not used by any language.
Zyyy is not used by any language and has no characters listed for auto-detection.
Zzzz is not used by any language and has no characters listed for auto-detection.
fa-Arab, ug-Arab, ks-Arab, ps-Arab, ur-Arab, ku-Arab, tt-Arab, ota-Arab, mzn-Arab and sd-Arab are currently alias codes. Only one code should be used in the data.
ms-Arab and kk-Arab are currently alias codes. Only one code should be used in the data.
For multiple data modules:
otherNames, if present, must be an array.
A Wikidata item ID must be a string beginning with Q and ending with decimal digits.
The following must be true of the data used by Module:languages:
The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
If field 2 is not nil, it must be a valid Wikidata item ID.
If field 3 or family is given and not nil, it must be a valid family code.
If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
If family is given, it must be a valid family code.
If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
If sort_key is given, it may be either a string or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
If entry_name or sort_key is given, the from array must be at least as long as the to array.
If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
If link_tr is present, it must be true.
Only the following fields are recognised: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
Checks not performed:
If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).
These are not checked here, because module errors will quickly crop up in entries if these conditions are not met, assuming that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question, or full_link attempts to use the transliteration module.
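A few of the rules above (the Wikidata ID format, the from/to length rule, and pattern validity inside a negated set) can be sketched in plain Lua. This is only an illustration under stated assumptions: the function names are hypothetical and not taken from the checking module, and the standard string library stands in for whatever Unicode-aware functions the real code may use.

```lua
-- Hypothetical helpers for illustration; these names are assumptions,
-- not functions from the actual data consistency check.

-- True if `chars` forms a valid Lua pattern when wrapped as "[^" .. chars .. "]".
local function is_valid_negated_set(chars)
    -- string.find raises an error on a malformed pattern; pcall catches it.
    local ok = pcall(string.find, "", "[^" .. chars .. "]")
    return ok
end

-- True if `id` is a string made of "Q" followed only by decimal digits.
local function is_wikidata_id(id)
    return type(id) == "string" and id:find("^Q%d+$") ~= nil
end

-- True if a table with from/to arrays satisfies the length rule:
-- `from` must be at least as long as `to`.
local function has_valid_replacements(t)
    if t.from == nil and t.to == nil then
        return true -- nothing to compare (e.g. only remove_diacritics is set)
    end
    return type(t.from) == "table" and type(t.to) == "table" and #t.from >= #t.to
end
```

A trailing `%` is one way to produce a malformed set, because it escapes the closing bracket, so `is_valid_negated_set("a%")` would be expected to fail while `is_valid_negated_set("a-z")` passes.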
Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
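The "all the codes, and no more" requirement amounts to a comparison in both directions. A minimal sketch follows, assuming `data` maps each code to an entry whose field 1 is the canonical name and `lookup` maps each code to a name; the function name and the message wording are hypothetical.

```lua
-- Hypothetical sketch: compare a code -> canonical name lookup table
-- against the language data it is supposed to mirror.
local function compare_lookup(data, lookup)
    local problems = {}
    -- Every code in the data must appear in the lookup with the same name.
    for code, entry in pairs(data) do
        if lookup[code] == nil then
            table.insert(problems, code .. ": missing from lookup")
        elseif lookup[code] ~= entry[1] then
            table.insert(problems, code .. ": lookup has wrong name")
        end
    end
    -- "And no more": every code in the lookup must exist in the data.
    for code in pairs(lookup) do
        if data[code] == nil then
            table.insert(problems, code .. ": not found in data")
        end
    end
    table.sort(problems) -- pairs() order is unspecified, so sort for stable output
    return problems
end
```

The same idea would apply in reverse to a canonical name -> code table, with names as keys.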
The following must be true of the data used by Module:etymology languages:
canonicalName must be given.
parent must be given and must be a valid language, family or etymology-only language code.
If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
Only the following fields are recognised: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
Codes in Module:families data must:
Have a canonicalName, which must not be the same as the canonical name of another family.
If family is given, it must be a valid family code.
Only the following fields are recognised: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
Codes in Module:scripts data must:
Have a canonicalName.
Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
Only the following fields are recognised: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
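Each quoted field list above can be read as an allow-list for the entries in that data module. A rough sketch of such a check, assuming an entry is an ordinary Lua table; `unknown_fields` and `script_fields` are hypothetical names used only for this example.

```lua
-- Hypothetical sketch: report keys of a data entry that are not in an allow-list.
local function unknown_fields(entry, allowed)
    -- Turn the array of allowed names into a set for constant-time lookup.
    local allowed_set = {}
    for _, name in ipairs(allowed) do
        allowed_set[name] = true
    end
    local unknown = {}
    for key in pairs(entry) do
        if not allowed_set[key] then
            table.insert(unknown, tostring(key))
        end
    end
    table.sort(unknown) -- stable order for reporting
    return unknown
end

-- The field list given above for script data.
local script_fields = {
    "canonicalName", "otherNames", "parent", "systems",
    "wikipedia_article", "characters", "direction",
}
```

For language data, the numeric keys 1, 2 and 3 would go into the allow-list as numbers rather than strings, since Lua treats `1` and `"1"` as different keys.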