This is a Wiktionary policy, guideline or common practices page. Specifically, it is a policy think tank, working to develop a formal policy.
Scripts, or writing systems, are groups of characters.
In Wiktionary, each script is recognized by a code and a name. The script codes are usually, but not always, ISO 15924 codes (appendix, unicode.org). For example:
Arab
: Arabic script
Cyrl
: Cyrillic script
Latn
: Latin script

Some Wiktionary script codes (used for particular languages' varieties of scripts) are named by combining an ISO script code and a Wiktionary language code.
fa-Arab
: Arabic script (of Persian language)
ks-Arab
: Arabic script (of Kashmiri language)
ota-Arab
: Arabic script (of Ottoman Turkish language)
pa-Arab
: Arabic script (of Punjabi language)
pjt-Latn
: Latin script (of Pitjantjatjara language)
ur-Arab
: Arabic script (of Urdu language)

There are also some exceptionally-named codes:
None
: Meant for no formatting at all. This is used as a kind of "blank" or "unknown" script, by languages that don't have a script specified yet in Module:languages.
Polyt
: Meant for Ancient (as opposed to modern) Greek text, which uses polytonic diacritics.
Music
: Meant for musical notation symbols.
Morse
: Meant for Morse code symbols.
Semap
: Meant for flag semaphore symbols.
Ipach
: Meant for characters from the International Phonetic Alphabet (IPA).
Rumin
: Meant for Rumi numerals (see Rumi Numeral Symbols).

These scripts serve a number of functions.
According to our CFI, Wiktionary—as a dictionary of all words in all languages—includes definitions for individual characters. We therefore need to know which scripts the characters are part of; this knowledge also helps us organize them by means of categorization and further explanation in appendices.
Scripts are defined in Module:scripts/data.
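The naming patterns described above (plain ISO-shaped codes, language-prefixed codes, and exceptionally-named codes) can be illustrated with a small Python sketch. This is not how Module:scripts/data works internally; the function name `describe_script_code` and the returned dictionary layout are hypothetical, and the check only looks at the shape of a code, not at whether ISO 15924 or Wiktionary actually defines it.

```python
import re

# Shape of an ISO 15924 code: one uppercase letter plus three lowercase letters.
ISO_SHAPE = re.compile(r"[A-Z][a-z]{3}")

# Exceptionally-named codes listed on this page.
EXCEPTIONAL_CODES = {"None", "Polyt", "Music", "Morse", "Semap", "Ipach", "Rumin"}


def describe_script_code(code: str) -> dict:
    """Split a script code into its parts (hypothetical helper, shape check only)."""
    if code in EXCEPTIONAL_CODES:
        return {"kind": "exceptional", "language": None, "script": code}

    # "fa-Arab" -> ("fa", "-", "Arab"); "Arab" -> ("", "", "Arab")
    language, sep, script = code.rpartition("-")
    if not ISO_SHAPE.fullmatch(script) or (sep and not language):
        raise ValueError(f"unrecognised script code shape: {code!r}")

    if sep:
        return {"kind": "language-specific", "language": language, "script": script}
    return {"kind": "iso-shaped", "language": None, "script": script}


if __name__ == "__main__":
    for example in ("Cyrl", "fa-Arab", "pjt-Latn", "Polyt"):
        print(example, describe_script_code(example))
```

The exceptional codes are checked first because `None` has the same four-letter shape as an ISO code and would otherwise be misclassified.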
Additionally, pieces of text may be formatted according to their script by wrapping them in HTML spans. In theory, browsers should handle this formatting automatically, but in practice they do not do it well. Formatting (text direction, font family, or font size) is therefore controlled either by an HTML class attribute or by an inline style attribute. The class attribute can be styled through Wiktionary's central Cascading Style Sheet (CSS), a registered editor's user style sheet, or a web browser's user style sheet.
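As a rough illustration of the span-wrapping idea, the following Python sketch builds such a span. The function name `tag_text`, the choice to use the script code as the class name, the `lang` attribute, and the small right-to-left table are all assumptions made for this example; the markup actually produced by Wiktionary's templates may differ.

```python
from html import escape

# Hypothetical table of right-to-left script codes (illustrative, not exhaustive).
RTL_SCRIPTS = {"Arab", "fa-Arab", "ur-Arab"}


def tag_text(text: str, lang: str, script: str) -> str:
    """Wrap text in a span whose class is the script code (assumed convention)."""
    attrs = [
        f'class="{escape(script, quote=True)}"',
        f'lang="{escape(lang, quote=True)}"',
    ]
    if script in RTL_SCRIPTS:
        # Text direction is one of the properties the page says needs control.
        attrs.append('dir="rtl"')
    # Escape the text itself so that characters such as "<" stay literal.
    return f'<span {" ".join(attrs)}>{escape(text)}</span>'


if __name__ == "__main__":
    print(tag_text("вријеме", "sh", "Cyrl"))
```

With a class in place, a style sheet rule keyed on that class (for example a font-family rule for `.Cyrl`) can restyle every tagged span at once, which is the advantage of a class attribute over an inline style.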
For the purpose of formatting text, there is the {{lang}} template, which serves as a wrapper for predetermined formatting conventions. It is a "base" template that applies only the language- and script-specific formatting and does nothing else. It can be used wherever non-Latin text needs to be written in general. The template takes a required language code parameter and the text to be wrapped. The optional |sc= parameter overrides the autodetected script, if necessary. The |face= parameter applies a specific style to the text and can be term, head, bold, or empty for normal text. This allows italic and bold effects to be implemented in a script-specific way, with formatting that is appropriate for the conventions of that script, as well as to enhance readability.
Most other templates that show non-English text also have a language parameter, and a |sc= parameter to override the autodetected script. This includes widely-used templates like {{l}}, {{m}}, {{t}}, {{head}} and {{form of}}:
{{lang|sh|sc=Cyrl|вријеме}}
displays вријеме (the Wikipedia language code for Serbo-Croatian is ISO 639-1 code sh)
{{m|sh|sc=Cyrl|вријеме}}
displays вријеме
{{head|sh|noun|sc=Cyrl|g=n}}
displays Scripts n
{{t|sh|sc=Cyrl|вријеме|n}}
displays вријеме n

Most templates, including all of those listed above, perform automatic detection of scripts. They will look at the text that was provided, and try to judge which of the language's scripts (specified in the Module:languages data submodules) is being used. Therefore, the |sc= parameter is almost never needed. However, it is necessary to provide it if the text is written in a script that is not one of the recognised scripts for that language.
{{l|sh|vrijeme}}
displays vrijeme (the default script of Serbo-Croatian is "Latn"...)
{{l|sh|вријеме}}
displays вријеме (...but the template will also recognise text written in any other script listed for Serbo-Croatian, which is Cyrillic in this case)
{{l|sh|β|sc=Grek}}
displays β (however, if you want to write Greek letters in Serbo-Croatian, you need to provide the script code, since Greek is not one of the normal scripts for Serbo-Croatian)

In the vast majority of cases, |sc= is not needed. Some examples of where it is:
{{m|el|;|sc=Grek}}
: in Greek, when the term contains no Greek characters but is used in Greek-script text
{{m|zh|man|sc=Hani|t=manly}}
: in Chinese neologisms (borrowed chiefly from English), when a word in Latin script is used in Han-script text and is not a romanization, so it should not be tagged as Latn (Latin script)
{{m|cmn|man}}
: neutral-tone pinyin syllable, automatically detected as Latin script
Cyrl (Cyrillic) and Cyrs (Old Cyrillic)
: these have the same characters, but Cyrl represents modern Moldovan and Cyrs represents an older Cyrillic orthography

As of April 2023, Wiktionary has incorporated all ISO 15924 codes (see Appendix:ISO 15924) except these: Aran, Cirt, Hanb, Jamo, Pcun, Pelm, Piqd, Psin, Sara, Syre, Syrj, Syrn, Zinh, Zsye, Zxxx. (The ISO also reserved several codes; these have not been incorporated.)
Wiktionary also uses several codes which are not listed in ISO 15924. These are included in the central list of scripts, and include codes for varieties of Arabic (fa-Arab, etc.), varieties of Latin (pjt-Latn), polytonic Greek (Polyt) and some other things (such as mnc-Mong), as well as codes for musical notation (Music) and the IPA (Ipach). The code None functions like a script code in some ways.