This module contains definitions and metadata for three-letter language codes starting with u. See Wiktionary:Languages for more information.
This module must not be used directly in other modules or templates. The data should be accessed through Module:languages. For the corresponding extra data, see Module:languages/data/3/u/extra.
The following errors were detected by Module:data consistency check:
nb has Middle Norwegian (gmq-mno) set as an ancestor, but is not in the West Scandinavian languages (gmq-wes).
nb has Danish (da) set as an ancestor, but is not in the East Scandinavian languages (gmq-eas).
hns has Bhojpuri (bho) set as an ancestor, but is not in the Bihari languages (inc-bih).
hns has Awadhi (awa) set as an ancestor, but is not in the Eastern Hindi languages (inc-hie).
alv-gtm-pro does not have the expected name "Proto-Ghana-Togo Mountain", even though it is the proto-language of the Ghana-Togo Mountain languages (alv-gtm).
auf-pro does not have the expected name "Proto-Arauan", even though it is the proto-language of the Arauan languages (auf).
awd-amc-pro has a proto-language code associated with the invalid code "awd-amc".
awd-kmp-pro has a proto-language code associated with the invalid code "awd-kmp".
awd-pro does not have the expected name "Proto-Arawakan", even though it is the proto-language of the Arawakan languages (awd).
awd-prw-pro has a proto-language code associated with the invalid code "awd-prw".
awd-taa-pro does not have the expected name "Proto-Ta-Arawakan", even though it is the proto-language of the Ta-Arawakan languages (awd-taa).
dru-pro has a proto-language code associated with Rukai (dru), which is not a family.
euq-pro does not have the expected name "Proto-Vasconic", even though it is the proto-language of the Vasconic languages (euq).
gmq-pro does not have the expected name "Proto-North Germanic", even though it is the proto-language of the North Germanic languages (gmq).
inc-krn-pro does not have the expected name "Proto-KRNB lects", even though it is the proto-language of the KRNB lects (inc-krn).
mis-hkl has its canonical name ("Kelantan Peranakan Hokkien") repeated in the table of aliases.
nai-chu-pro does not have the expected name "Proto-Chumashan", even though it is the proto-language of the Chumashan languages (nai-chu).
nai-mdu-pro does not have the expected name "Proto-Maiduan", even though it is the proto-language of the Maiduan languages (nai-mdu).
nai-miz-pro does not have the expected name "Proto-Mixe-Zoquean", even though it is the proto-language of the Mixe-Zoquean languages (nai-miz).
nai-pom-pro does not have the expected name "Proto-Pomoan", even though it is the proto-language of the Pomoan languages (nai-pom).
omq-maz-pro does not have the expected name "Proto-Mazatecan", even though it is the proto-language of the Mazatecan languages (omq-maz).
poz-swa-pro does not have the expected name "Proto-North Sarawakan", even though it is the proto-language of the North Sarawakan languages (poz-swa).
sal-pro does not have the expected name "Proto-Salishan", even though it is the proto-language of the Salishan languages (sal).
sit-khp-pro has a proto-language code associated with the invalid code "sit-khp".
smi-pro does not have the expected name "Proto-Sami", even though it is the proto-language of the Sami languages (smi).
tbq-kuk-pro does not have the expected name "Proto-Kukish", even though it is the proto-language of the Kukish languages (tbq-kuk).
xsc-sak-pro does not have the expected name "Proto-Sakan", even though it is the proto-language of the Sakan languages (xsc-sak).
lzh-lit has a canonical name that is not unique; it is also used by the code lzh.
preprocess_links for Hacked Thai (th-new) is invalid.
inc-old has no child families or languages.
[...] lzh-lit, is wrong; it should be Literary Chinese.
ira-mid and the canonical name Middle Iranian should be removed; they are not found in Module:families/data.
ira-old and the canonical name Old Iranian should be removed; they are not found in Module:families/data.

Every entry in the table must contain the following indexed fields:
1: The canonical name of the language.
2: The language's Wikidata item ID, or nil if not known/present. This replaces the older wikipedia_article property, which can still be used to link to specific sections or language editions.
3: The code of the family the language belongs to.
4: The scripts the language uses, as a comma-separated list of script codes, for example "Latn, Brai, Shaw, Dsrt". The list is used by the Language:findBestScript method in Module:languages. This function goes down the list of scripts and counts how many characters in the text belong to each script. If all the characters belong to one script, that script will be returned; otherwise, the script with the most characters will be returned. Thus, script detection will be faster if the most frequently used scripts are first in the list. If none of the characters match any of the listed scripts, then the None script is returned (even if the characters would match a script not listed). Translingual (mul) and Undetermined (und) have the special value "All", which means they are treated as having every script. This value should not be set for any other language codes.
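The script detection described for field 4 can be illustrated with a minimal sketch. It is not the code of Language:findBestScript: the function name find_best_script, the ranges table and its two simplified code point ranges are assumptions made for this example, and every character (including spaces and punctuation) is counted.

```lua
-- Hypothetical, heavily simplified code point ranges for two scripts.
-- The real script data lives in Module:scripts/data.
local ranges = {
	Latn = {{0x41, 0x5A}, {0x61, 0x7A}},
	Cyrl = {{0x400, 0x4FF}},
}

-- Decode a UTF-8 string into a list of code points (well-formed input assumed).
local function codepoints(text)
	local cps = {}
	for ch in text:gmatch("[\1-\127\194-\244][\128-\191]*") do
		local b = ch:byte(1)
		local cp
		if b < 0x80 then
			cp = b
		elseif b < 0xE0 then
			cp = b - 0xC0
		elseif b < 0xF0 then
			cp = b - 0xE0
		else
			cp = b - 0xF0
		end
		for i = 2, #ch do
			cp = cp * 64 + (ch:byte(i) - 0x80)
		end
		cps[#cps + 1] = cp
	end
	return cps
end

-- Walk the comma-separated script list in order and count matching characters.
local function find_best_script(text, script_list)
	local cps = codepoints(text)
	local best, best_count = "None", 0
	for code in script_list:gmatch("[^,%s]+") do
		local count = 0
		for _, cp in ipairs(cps) do
			for _, r in ipairs(ranges[code] or {}) do
				if cp >= r[1] and cp <= r[2] then
					count = count + 1
					break
				end
			end
		end
		-- Every character matched: return at once, which is why
		-- listing the most frequent script first is faster.
		if count > 0 and count == #cps then
			return code
		end
		if count > best_count then
			best, best_count = code, count
		end
	end
	return best
end

print(find_best_script("привет", "Latn, Cyrl")) --> Cyrl
```

With the list "Latn" alone, the same Cyrillic input yields "None" in this sketch, mirroring the statement that scripts not listed are never returned.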
type: one of the following values.
regular - This value is the default, so it doesn't need to be specified. It indicates that the language is attested according to WT:CFI and therefore permitted in the main namespace. There may also be reconstructed terms for the language, which are placed in the Reconstruction namespace and must be prefixed with * to indicate a reconstruction.
reconstructed - This language is not attested according to CFI, and therefore is allowed only in the Reconstruction namespace. All terms in this language are reconstructed, and must be prefixed with *.
appendix-constructed - This language is attested but does not meet the additional requirements set out for constructed languages (WT:CFI#Constructed languages). Its entries must therefore be in the Appendix namespace, but they are not reconstructed and therefore should not have * prefixed in links.
ancestors: a comma-separated list of language codes, for example "cr, fr". Only direct ancestors are listed: for example, a language descended from Middle English lists only enm (Middle English); ang (Old English, the ancestor of Middle English), gem-pro (Proto-Germanic, the ancestor of Old English), and ine-pro (Proto-Indo-European, the ancestor of Proto-Germanic) are not listed.
Proto-Germanic (gem-pro) belongs to the Indo-European (ine) family, and its direct ancestor is Proto-Indo-European (ine-pro). Because Proto-Indo-European is the proto-language of the Indo-European languages, Proto-Germanic does not need an ancestors table; Proto-Indo-European will be automatically returned as its ancestor by the getAncestors function.
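The fallback to a family's proto-language described above can be sketched as follows. This is only an illustration of the idea, not the getAncestors implementation in Module:languages: the function get_ancestors, the miniature languages and families tables, the proto and parent field names, and the simplified family assignments are all assumptions.

```lua
-- Hypothetical miniature data; field names "proto" and "parent" are assumptions.
local families = {
	ine = {proto = "ine-pro"},
	gem = {proto = "gem-pro", parent = "ine"},
}

-- Entries follow the indexed layout: name, Wikidata ID (nil here), family.
local languages = {
	["ine-pro"] = {"Proto-Indo-European", nil, "ine"},
	["gem-pro"] = {"Proto-Germanic", nil, "ine"},
	["ang"] = {"Old English", nil, "gem"}, -- family simplified for the example
	["enm"] = {"Middle English", nil, "gem", ancestors = "ang"},
	["en"] = {"English", nil, "gem", ancestors = "enm"},
}

local function get_ancestors(code)
	local data = languages[code]
	-- An explicit ancestors list always wins.
	if data.ancestors then
		local list = {}
		for anc in data.ancestors:gmatch("[^,%s]+") do
			list[#list + 1] = anc
		end
		return list
	end
	-- Otherwise walk up the family tree until a proto-language
	-- other than the language itself is found.
	local fam = families[data[3]]
	while fam do
		if fam.proto and fam.proto ~= code then
			return {fam.proto}
		end
		fam = families[fam.parent]
	end
	return {}
end

print(get_ancestors("gem-pro")[1]) --> ine-pro
```

In this sketch Proto-Germanic needs no ancestors field, while Proto-Indo-European, which has no proto-language above it, gets an empty list.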
wikimedia_codes: a comma-separated list of Wikimedia language codes, for example "en, simple". [...] interwiki_langs in Module:translations/data; and the wiktprefix field of the `metadata` variable in MediaWiki:Gadget-TranslationAdder-Data.js. FIXME: Unify this data.
wikipedia_article: the older property replaced by indexed field 2; it can still be used to link to specific sections or language editions.
translit: [...] isTransliterated value set to false in Module:scripts/data. This is used by transliterate in Module:languages.
link_tr: Set to true to link the language's transliteration. For instance, Gothic has entries in Gothic script and entries for transliterations: 𐌷𐌻𐌰𐌹𐌱𐍃 (hlaibs). Otherwise, this can be a comma-separated list of script codes, which means that links are only applied to terms using those scripts.
override_translit: Set to true to make the automatic transliteration override any given manual transliteration. Otherwise, this can be a comma-separated list of script codes, which means that the override is only applied to terms using those scripts.
display_text: [...] ӏ, used in Cyrillic in many Caucasian languages, is frequently entered as I, or even Latin l or I. As this is an ongoing issue (even among native speakers), the easiest way to solve the problem is to automatically correct the display form for those languages. This is used by makeDisplayText in Module:languages.
entry_name: [...] (ру́сский → русский), or macrons from Latin or Old English words (ōs → os), as these are not used in the normal written form of these languages. This is used by makeEntryName in Module:languages.
sort_key: [...] "у" .. p. Another character could be inserted straight after by using "у" .. p (and so on). This is used by makeSortKey in Module:languages.
dotted_dotless_i: Set to true for languages that distinguish between the dotted and dotless I (such as some Turkic languages).

translit, display_text, entry_name and sort_key all use the same syntax, which is designed to be as flexible as possible:
[...] "sa-translit" refers to Module:sa-translit.
[...] from, to, remove_diacritics and remove_exceptions relate to text substitution (see below).
[...] 1 can be used as a fallback, which will be used if no specific behaviour is defined for that script.
[...] 1 if you want to avoid this. It is not possible to process the output of a script-specific module with another module, however: this should be done (for example) with a tail call in the first module.
[...] text, lang, sc, where text is the input text (usually the page name or input by the user), lang is the language code (not the language object), and sc is the script code (not the script object). For performance reasons, they should only be used when it is not possible to achieve the desired result via text substitution.
[...] from and to keys.
[...] remove_diacritics (and optionally remove_exceptions).
from is paired with to, and both of them must be tables that are organised pairwise: each element in from is a pattern to identify which characters in the term to replace, while the corresponding element in to defines what to replace them with (as arguments to mw.ustring.gsub).
[...] false or nil), then any matching characters are removed altogether. This means that the from list can be longer than the to list, and an empty replacement will be assumed for any elements in from that have no counterpart in to.
[...] mw.ustring.gsub function. See the Scribunto reference manual for more information. Note that patterns make double substitutions a viable way to achieve more complex results. See the Latin sortkey for Mandarin (cmn) as an example of this.
remove_diacritics is a string which contains characters that will be removed after the text is decomposed. For instance, if remove_diacritics is a combining acute accent, all acute accents will be stripped, even if they are part of precomposed characters (such as á or ά). Despite the name, the characters to be stripped need not be diacritics: for instance, including an apostrophe would remove all apostrophes (though be careful with hyphens, which must be escaped as %-).
If remove_diacritics is given, then it is possible to specify a remove_exceptions table, which prevents specific characters from having their diacritics stripped. For instance, if remove_diacritics is a combining diaeresis, but remove_exceptions contains "ё", then any instances of ё will remain unchanged. On the other hand, an instance of ӱ would still become у (unless "ӱ" is also added to remove_exceptions).
aliases, varieties, otherNames [...]
family [...] 3.
scripts [...] 4.
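The from/to and remove_diacritics rules described above can be sketched in plain Lua. This is an illustration under stated assumptions, not the code in Module:languages: the helper names substitute and strip_chars are made up, string.gsub stands in for mw.ustring.gsub (so patterns here operate on bytes), the input to strip_chars is assumed to be already decomposed, and remove_exceptions is not handled. The marker "~" is a stand-in for the p values used in the real data.

```lua
-- Apply pairwise substitutions: from[i] is a pattern, to[i] its replacement.
-- A missing, nil or false replacement removes the match altogether.
local function substitute(text, rules)
	for i, pattern in ipairs(rules.from) do
		local repl = rules.to and rules.to[i] or ""
		text = text:gsub(pattern, repl)
	end
	return text
end

-- Remove every character listed in `remove` from already-decomposed text.
-- Assumption: unlike the documented behaviour, each character is matched
-- literally here, so ASCII punctuation is escaped automatically.
local function strip_chars(text, remove)
	for ch in remove:gmatch("[\1-\127\194-\244][\128-\191]*") do
		local literal = ch:gsub("%p", "%%%0")
		text = text:gsub(literal, "")
	end
	return text
end

-- "from" is longer than "to", so the apostrophe is simply removed.
local sort_rules = {
	from = {"ch", "gb", "'"},
	to = {"c~", "g~"},
}
print(substitute("chagba'", sort_rules)) --> c~ag~a

-- U+0301 COMBINING ACUTE ACCENT is the byte sequence 204 129 in UTF-8.
local acute = "\204\129"
print(strip_chars("ру" .. acute .. "сский", acute)) --> русский
```

On a wiki, the decomposition step that this sketch skips would be handled by Scribunto's Unicode-aware string library before the characters are stripped.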
local m_langdata = require("Module:languages/data")
-- Loaded on demand, as it may not be needed (depending on the data).
local function u(...)
u = require("Module:string utilities").char
return u(...)
end
local c = m_langdata.chars
local p = m_langdata.puaChars
local s = m_langdata.shared
local m = {}
m = {
"Uamué",
3441418,
}
m = {
"Kuan",
6441085,
}
m = {
"Tairuma",
7676386,
"ngf",
}
m = {
"Ubang",
3914467,
"nic-ben",
"Latn",
}
m = {
"Ubi",
56264,
}
m = {
"Buhi'non Bikol",
18664494,
"phi",
"Latn",
}
m = {
"Ubir",
3547642,
"poz-ocw",
"Latn",
}
m = {
"Umbu-Ungu",
12953245,
"ngf",
}
m = {
"Ubykh",
36931,
"cau-nwc",
"Cyrl, Latn",
translit = "uby-translit",
override_translit = true,
display_text = {Cyrl = s},
entry_name = {
Cyrl = s,
Latn = s,
},
sort_key = "uby-sortkey",
}
m = {
"Uda",
11011951,
"nic-lcr",
}
m = {
"Udihe",
13235,
"tuw-udg",
"Cyrl",
}
m = {
"Muduga",
16886762,
"dra-imd",
"Mlym",
translit = "ml-translit",
}
m = {
"Udi",
36867,
"cau-esm",
"Cyrl, Latn, Armn, Geor",
ancestors = "xag",
translit = {
Cyrl = "udi-translit",
Armn = "Armn-translit",
Geor = "Geor-translit",
},
override_translit = true,
display_text = {Cyrl = s},
entry_name = {
Cyrl = s,
Latn = s,
},
}
m = {
"Ujir",
14916906,
"poz-cet",
}
m = {
"Uldeme",
3515078,
"cdc-cbm",
}
m = {
"Udmurt",
13238,
"urj-prm",
"Cyrl",
translit = "udm-translit",
override_translit = true,
sort_key = "udm-sortkey",
}
m = {
"Uduk",
3182573,
"ssa-kom",
}
m = {
"Kioko",
18343036,
}
m = {
"Ufim",
7877531,
"ngf-fin",
"Latn",
}
m = {
"Ugaritic",
36928,
"sem-nwe",
"Ugar",
translit = "uga-translit",
}
m = {
"Kuku-Ugbanh",
10549854,
}
m = {
"Ughele",
966303,
"poz-ocw",
}
m = {
"Ugandan Sign Language",
7877677,
"sgn",
}
m = {
"Gong",
3448919,
"tbq-lob",
"Thai",
sort_key = "Thai-sortkey",
}
m = {
"Uruguayan Sign Language",
7901470,
"sgn",
}
m = {
"Uhami",
3913328,
"alv-nwd",
"Latn",
}
m = {
"Damal",
4748974,
}
m = {
"Uisai",
7878123,
"paa-sbo",
}
m = {
"Iyive",
11128658,
"nic-tvc",
"Latn",
}
m = {
"Tanjijili",
3914939,
"nic-pls",
}
m = {
"Kaburi",
6344482,
}
m = {
"Ukuriguma",
7878623,
"ngf-mad",
}
m = {
"Ukhwejo",
36623,
"bnt-bek",
}
m = {
"Muak Sa-aak",
23807993,
"mkh-pal",
}
m = {
"Ukrainian Sign Language",
10322106,
"sgn",
}
m = {
"Ukpe-Bayobiri",
3914470,
"nic-ben",
"Latn",
}
m = {
"Ukwa",
7878635,
"nic-ief",
}
m = {
"Kaapor Sign Language",
3322101,
"sgn",
}
m = {
"Ukue",
3913387,
"alv-nwd",
"Latn",
}
m = {
"Ukwuani-Aboh-Ndoni",
36636,
"alv",
"Latn",
}
m = {
"Kuuk Yak",
6448719,
"aus-psw",
"Latn",
}
m = {
"Fungwa",
5509187,
"nic-shi",
}
m = {
"Olukumi",
36722,
"alv-yor",
"Latn",
entry_name = {Latn = {remove_diacritics = c.grave .. c.acute .. c.macron}},
sort_key = {
from = {"ch", "ẹ", "gb", "gh", "gw", "kp", "kw", "ọ", "ṣ"},
to = {"c" .. p, "e" .. p, "g" .. p, "g" .. p, "g" .. p, "k" .. p, "k" .. p, "o" .. p, "s" .. p}
},
}
m = {
"Ulch",
13239,
"tuw-nan",
"Cyrl, Latn",
entry_name = {
from = {""},
to = {"ʼ"}
},
sort_key = "ulc-sortkey",
}
m = {
"Lule",
12635889,
nil,
"Latn",
}
m = {
"Afra",
4477735,
"paa-pau",
}
m = {
"Ulithian",
36842,
"poz-mic",
}
m = {
"Meriam",
788174,
"ngf",
"Latn",
}
m = {
"Ullatan",
8761579,
"dra-mal",
}
m = {
"Ulumanda'",
3501892,
}
m = {
"Unserdeutsch",
13244,
"crp",
"Latn",
ancestors = "de",
}
m = {
"Uma' Lung",
3548186,
"poz-swa",
}
m = {
"Ulwa",
2405552,
}
m = {
"Umatilla",
12953952,
"nai-shp",
"Latn",
ancestors = "nai-spt",
}
m = {
"Umbundu",
36983,
"bnt",
"Latn",
}
m = {
"Marrucinian",
36110,
"itc-sbl",
"Ital, Latn",
translit = {
Ital = "Ital-translit",
},
display_text = {
Latn = s
},
entry_name = {
Latn = s
},
sort_key = {
Latn = s
},
}
m = {
"Umbindhamu",
7881346,
"aus-pmn",
}
m = {
"Umbuygamu",
3915677,
"aus-pmn",
}
m = {
"Ukit",
7878321,
}
m = {
"Umon",
3915448,
"nic-ucn",
"Latn",
}
m = {
"Makyan Naga",
6740516,
"sit-kch",
}
m = {
"Umotína",
7881740,
"sai-mje",
}
m = {
"Umpila",
12953954,
"aus-pmn",
"Latn",
}
m = {
"Umbugarla",
2980392,
}
m = {
"Pendau",
7162371,
"poz-tot",
}
m = {
"Munsee",
56547,
"del",
"Latn",
entry_name = {remove_diacritics = c.acute .. c.breve},
}
m = {
"North Watut",
15887898,
"poz-ocw",
"Latn",
}
m = {
"Undetermined",
22282914,
"qfa-not",
"All",
}
m = {
"Uneme",
3913357,
"alv-yek",
"Latn",
}
m = {
"Ngarinyin",
1284885,
"aus-wor",
"Latn",
}
m = {
"Enawené-Nawé",
3307184,
"awd",
"Latn",
}
m = {
"Unami",
3549180,
"del",
"Latn",
--[===[
entry_name = {remove_diacritics = c.grave .. c.diaer},]===]
}
m = {
"Kurnai",
61676882,
"aus-pam",
"Latn",
}
m = {
"Mundari",
3327828,
"mun",
"Nagm, Deva, Onao", -- Onao is used by Bhumij, which may be a separate language; remove if it gets split out
translit = "hi-translit", -- for now
}
m = {
"Unubahe",
7897776,
}
m = {
"Munda",
36264959,
"mun",
"Latn",
}
m = {
"Unde Kaili",
12953596,
"poz-kal",
"Latn",
}
m = {
"Uokha",
3441216,
"alv-edo",
"Latn",
}
m = {
"Kulon",
11182000,
"map",
"Latn",
}
m = {
"Umeda",
7881465,
"paa-brd",
}
m = {
"Northeast Malakula",
13249,
"poz-vnc",
"Latn",
}
m = {
"Urarina",
1579560,
}
m = {
"Urubú-Kaapor",
13893353,
"tup-gua",
"Latn",
}
m = {
"Urningangg",
10710522,
}
m = {
"Uru",
2992892,
}
m = {
"Uradhi",
3915680,
"aus-pam",
"Latn",
}
m = {
"Urigina",
7900603,
"ngf",
"Latn",
}
m = {
"Urhobo",
36663,
"alv-swd",
"Latn",
}
m = {
"Urim",
7900609,
"qfa-tor",
"Latn",
}
m = {
"Urak Lawoi'",
7899573,
"poz-mly",
"Thai",
sort_key = "Thai-sortkey",
}
m = {
"Urali",
7899602,
"dra-kod",
"Knda",
}
m = {
"Urapmin",
7899769,
"ngf-okk",
}
m = {
"Uruangnirin",
7901389,
"poz-cet",
"Latn",
}
m = {
"Ura (Papua New Guinea)",
3121049,
"paa-bng",
"Latn",
}
m = {
"Uru-Pa-In",
7901376,
"tup-gua",
"Latn",
}
m = {
"Löyöp",
3272124,
"poz-vnn",
"Latn",
}
m = {
"Urat",
3502084,
"qfa-tor",
"Latn",
}
m = {
"Urumi",
7901530,
"tup",
"Latn",
}
m = {
"Uruava",
36875,
"poz-ocw",
"Latn",
}
m = {
"Sop",
7562808,
"ngf-mad",
"Latn",
}
m = {
"Urimo",
7900611,
"qfa-tor",
"Latn",
}
m = {
"Orya",
7105295,
"paa-tkw",
"Latn",
}
m = {
"Uru-Eu-Wau-Wau",
10266012,
"tup-gua",
"Latn",
}
m = {
"Usarufa",
7901714,
"paa-kag",
"Latn",
}
m = {
"Ushojo",
3540446,
"inc-shn",
"ur-Arab",
}
m = {
"Usui",
12644231,
}
m = {
"Usaghade",
3914048,
"nic-lcr",
"Latn",
}
m = {
"Uspanteco",
36728,
"myn",
"Latn",
}
m = {
"Saare",
63313662,
"nic-knn",
"Latn",
}
m = {
"Uya",
7904082,
}
m = {
"Otank",
3913990,
"nic-tvc",
"Latn",
}
m = {
"Ute",
13260,
"azc-num",
"Latn",
}
m = {
"Hun",
63313668,
"nic-knn",
"Latn",
}
m = {
"Aba",
2841465,
"poz-tem",
"Latn",
}
m = {
"Etulo",
35262,
"alv-ido",
"Latn",
}
m = {
"Utu",
7903469,
"ngf-mad",
}
m = {
"Urum",
13257,
"trk-kcu",
"Cyrl",
}
m = {
"Kulon-Pazeh",
36435,
"map",
"Latn",
}
m = {
"Ura (Vanuatu)",
7899531,
"poz-vns",
"Latn",
}
m = {
"U",
953082,
"mkh-pal",
}
m = {
"West Uvean",
36837,
"poz-pnp",
"Latn",
}
m = {
"Uri",
7900540,
"ngf-fin",
"Latn",
}
m = {
"Lote",
3259972,
"poz-ocw",
"Latn",
}
m = {
"Kuku-Uwanh",
3915687,
"aus-pmn",
}
m = {
"Doko-Uyanga",
7904095,
"nic-ucr",
"Latn",
}
return require("Module:languages").finalizeData(m, "language")