User:Mzajac/Language attributes

This is a brainstorming page for adding standard HTML language metadata (e.g., lang="uk" xml:lang="uk") to the output of script templates.

This would require that {{term}}, {{t}}, {{infl}}, {{form of}}, and their variants provide a lang=xx parameter to the script template. The result would be HTML language metadata in all of their output.

Rationale

Lang and xml:lang are standard HTML metadata attributes for identifying the language of an element's content. They can be used to style text using CSS, possibly to supplement or replace the classes used in script templates. According to the HTML 4.01 specification, situations where language information may be helpful include assisting search engines, assisting speech synthesizers, helping a user agent select glyph variants for high quality typography, helping a user agent choose a set of quotation marks, helping a user agent make decisions about hyphenation, ligatures, and spacing, and assisting spell checkers and grammar checkers. HTML 5 says that the web browser may use the element's language, e.g., in the selection of appropriate fonts or pronunciations, or for dictionary selection.

An example application would be the use of standardized CSS selectors to style any language, rather than depending on the few classes defined in Wiktionary. For example, to use spaced small caps instead of italics for Ukrainian:

i:lang(uk) {
    font-style: normal; /* turn off the default italics of the i element */
    font-size: .75em;
    text-transform: uppercase;
    letter-spacing: .1em;
}

Accessibility guidelines stress the importance of indicating language. “Clearly identify changes in the natural language of a document's text and any text equivalents (e.g., captions).” (WCAG 1.0, 1999). “The human language of each passage or phrase in the content can be programmatically determined except for proper names, technical terms, words of indeterminate language, and words or phrases that have become part of the vernacular of the immediately surrounding text. (Level AA)” (WCAG 2.0, 2008).

Wiktionary should strive to provide language metadata.

Code

{{Cyrl}} is used as the example. The essential working code for most script templates looks like the snippet below. ({{Cyrl}} actually has some #switch code which can turn the span into an i or b, and a deprecated .RU class; both are ignored here for clarity.)

 <span class="Cyrl">{{{1}}}</span>

Show language: the template accepts a lang parameter from its parent (one of {{term}}, {{t}}, {{infl}}, or {{form of}}). There is no need to send en, since English is the primary language of the English-language Wiktionary and is already set on an entry's top-level HTML element:

 <span class="Cyrl" lang="{{{lang}}}" xml:lang="{{{lang}}}">{{{1}}}</span>

But don't include empty lang attributes:

 <span class="Cyrl" {{ #if: {{{lang|}}} | lang="{{{lang}}}" xml:lang="{{{lang}}}"}}>{{{1}}}</span>

Indicate the script using a script subtag, as in lang="ru-Cyrl":

 <span class="Cyrl" {{ #if: {{{lang|}}} | lang="{{{lang}}}-Cyrl" xml:lang="{{{lang}}}-Cyrl"}}>{{{1}}}</span>

Default script: the script really shouldn't be indicated if it is the usual one for the particular language. So we should see lang="ru", but lang="sr-Cyrl" or lang="sr-Latn" (Russian is normally written in Cyrillic, but Serbian is written in both Cyrillic and Latin, so it should have the script specified). This can be handled by a #switch statement that singles out the languages for which Cyrillic is the default:

 <span class="Cyrl" {{ #if: {{{lang|}}} | {{#switch: {{{lang}}}
 | ab | be | bg | kk | mk | ru | uk = lang="{{{lang}}}" xml:lang="{{{lang}}}"
 | #default = lang="{{{lang}}}-Cyrl" xml:lang="{{{lang}}}-Cyrl"
 }}}}>{{{1}}}</span>

Alternate languages: for the sake of documentation, explicitly list the languages that commonly need the script code, ahead of the #default fallback:

 <span class="Cyrl" {{ #if: {{{lang|}}} | {{#switch: {{{lang}}}
 | ab | be | bg | kk | mk | ru | uk = lang="{{{lang}}}" xml:lang="{{{lang}}}"
 | az | bs | mn | sr | tg | uz | #default = lang="{{{lang}}}-Cyrl" xml:lang="{{{lang}}}-Cyrl"
 }}}}>{{{1}}}</span>
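
The behaviour of the #switch above can also be modelled outside the wiki, which may make it easier to try out changes before editing the template. The following is a minimal Python sketch, not the template itself: the function name cyrl_span and the constant CYRL_DEFAULT_LANGS are made up for illustration, and the language list is simply copied from the draft above.

```python
from html import escape

# Languages assumed to be written in Cyrillic by default (copied from the
# #switch above). Every other language gets an explicit "-Cyrl" script subtag.
CYRL_DEFAULT_LANGS = {"ab", "be", "bg", "kk", "mk", "ru", "uk"}


def cyrl_span(text, lang=""):
    """Hypothetical model of the {{Cyrl}} draft: build the span for `text`."""
    lang = lang.strip()  # the named parameter arrives without outer whitespace
    if not lang:
        # Mirrors the #if: no lang attributes for missing or empty input.
        return f'<span class="Cyrl">{escape(text)}</span>'
    # Mirrors the #switch: bare code for default languages, "-Cyrl" otherwise.
    tag = lang if lang in CYRL_DEFAULT_LANGS else f"{lang}-Cyrl"
    tag = escape(tag, quote=True)
    return f'<span class="Cyrl" lang="{tag}" xml:lang="{tag}">{escape(text)}</span>'


print(cyrl_span("слово", "uk"))
# <span class="Cyrl" lang="uk" xml:lang="uk">слово</span>
print(cyrl_span("слово", "sr"))
# <span class="Cyrl" lang="sr-Cyrl" xml:lang="sr-Cyrl">слово</span>
```

Like the draft, this model does no validation, so an input that already carries a script subtag (sr-Cyrl) still gets a second one appended.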

Test

The test template is currently at User:Mzajac/Language attributes/Cyrl. The HTML code is made visible by entering < as &lt;.

No lang:

 {{Cyrl |слово }}
 <span class="Cyrl" >слово</span>

Empty lang:

 {{Cyrl |слово |lang= }}
 <span class="Cyrl" >слово</span>

A default language for Cyrl:

 {{Cyrl |слово |lang=uk }}
 <span class="Cyrl" lang="uk" xml:lang="uk">слово</span>

An ambiguous language for Cyrl:

 {{Cyrl |слово |lang=sr }}
 <span class="Cyrl" lang="sr-Cyrl" xml:lang="sr-Cyrl">слово</span>

An undefined language:

 {{Cyrl |слово |lang=und }}
 <span class="Cyrl" lang="und-Cyrl" xml:lang="und-Cyrl">слово</span>

Oops:

 {{Cyrl |слово |lang=sr-Cyrl }}
 <span class="Cyrl" lang="sr-Cyrl-Cyrl" xml:lang="sr-Cyrl-Cyrl">слово</span>

Junk:

 {{Cyrl |слово |lang=NONSENSE! }}
 <span class="Cyrl" lang="NONSENSE!-Cyrl" xml:lang="NONSENSE!-Cyrl">слово</span>

What's the best way to deal with incorrect input? Should we add comprehensive error-checking? Should erroneous input be corrected, dropped silently, or flagged with an error message?
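
Whichever policy is chosen, the first step is probably to classify the input. The sketch below is one possible way to do that in Python. It is only an illustration: the pattern is a deliberately simplified shape (language, optional script, optional region), not the full grammar of the language tag RFCs, and the names TAG_RE and check_lang are invented here.

```python
import re

# Simplified shape of a language tag: a 2-3 letter language subtag, then an
# optional 4-letter script subtag, then an optional region subtag.
# This is an assumption for illustration, not the complete RFC grammar.
TAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def check_lang(lang, template_script="Cyrl"):
    """Return 'empty', 'ok', 'has-script', 'wrong-script', or 'malformed'."""
    lang = lang.strip()
    if not lang:
        return "empty"
    match = TAG_RE.fullmatch(lang)
    if match is None:
        return "malformed"      # e.g. NONSENSE!
    script = match.group("script")
    if script is None:
        return "ok"             # e.g. uk, sr, und
    if script.lower() == template_script.lower():
        return "has-script"     # e.g. sr-Cyrl: do not append -Cyrl again
    return "wrong-script"       # e.g. sr-Latn passed to a Cyrillic template
```

With a classification like this, "has-script" input could be passed through unchanged (avoiding sr-Cyrl-Cyrl), while "malformed" and "wrong-script" input could be corrected, dropped, or reported, depending on the answer to the questions above.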

Should the input case be adjusted (EN > en)? Language tags are supposed to be case-insensitive, so changing the case is safe, and it may help buggy implementations deal with our content.
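
Within a template, the built-in {{lc:}} function would lowercase the whole tag. If the conventional mixed casing is wanted instead (lowercase language, title-case script, uppercase region), the adjustment could look like the following Python sketch. The function name normalize_case is made up, and the rule for telling subtags apart (four letters means script, two letters means region) is a simplification that ignores extension and private-use subtags.

```python
def normalize_case(tag):
    """Apply conventional casing to a hyphen-separated language tag (sketch)."""
    subtags = tag.strip().split("-")
    result = [subtags[0].lower()]           # primary language subtag: lowercase
    for subtag in subtags[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            result.append(subtag.title())   # script subtag, e.g. Cyrl
        elif len(subtag) == 2 and subtag.isalpha():
            result.append(subtag.upper())   # region subtag, e.g. UA
        else:
            result.append(subtag.lower())   # anything else: lowercase
    return "-".join(result)


print(normalize_case("EN"))       # en
print(normalize_case("SR-CYRL"))  # sr-Cyrl
print(normalize_case("uk-ua"))    # uk-UA
```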

If the code grows complex, should it be generalized and maintained in a single template, to be transcluded into any script template?

Complex example

This example rolls {{Arab}} and its partners ({{fa-Arab}}, {{ks-Arab}}, {{ku-Arab}}, {{ota-Arab}}, {{pa-Arab}}, {{ps-Arab}}, {{sd-Arab}}, {{ug-Arab}}, and {{ur-Arab}}) into a single template. It also supports a varying class attribute:

 <span dir="rtl" {{ #if: {{{lang|}}} | {{#switch: {{{lang}}}
 | ar = class="Arab" lang="ar" xml:lang="ar"
 | fa | ps | ur = class="{{{lang}}}-Arab" lang="{{{lang}}}" xml:lang="{{{lang}}}"
 | ks | ku | ota | pa | sd | ug = class="{{{lang}}}-Arab" lang="{{{lang}}}-Arab" xml:lang="{{{lang}}}-Arab"
 | az | tg | #default = class="Arab" lang="{{{lang}}}-Arab" xml:lang="{{{lang}}}-Arab"
 }}
 | class="Arab"}}>{{{1}}}</span>
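
If the per-script rules were kept as data rather than as hand-written #switch branches, one routine could serve every script template. The Python sketch below illustrates that idea under stated assumptions: the table SCRIPT_RULES and the function script_attrs are hypothetical names, and the language lists are copied from the {{Cyrl}} and {{Arab}} drafts on this page rather than from any authoritative source.

```python
# Per-script rules copied from the {{Cyrl}} and {{Arab}} drafts on this page.
SCRIPT_RULES = {
    "Cyrl": {
        # languages for which the script is the default (no script subtag)
        "default_langs": {"ab", "be", "bg", "kk", "mk", "ru", "uk"},
        # languages that get their own CSS class, e.g. class="fa-Arab"
        "own_class_langs": set(),
        "dir": None,
    },
    "Arab": {
        "default_langs": {"ar", "fa", "ps", "ur"},
        "own_class_langs": {"fa", "ps", "ur", "ks", "ku", "ota", "pa", "sd", "ug"},
        "dir": "rtl",
    },
}


def script_attrs(script, lang=""):
    """Return the span attributes for `script` and `lang` as an ordered dict."""
    rules = SCRIPT_RULES[script]
    lang = lang.strip()
    attrs = {}
    if rules["dir"]:
        attrs["dir"] = rules["dir"]
    # Language-specific class where one exists, otherwise the plain script class.
    attrs["class"] = f"{lang}-{script}" if lang in rules["own_class_langs"] else script
    if lang:
        # Bare language code for default languages, script subtag otherwise.
        tag = lang if lang in rules["default_langs"] else f"{lang}-{script}"
        attrs["lang"] = tag
        attrs["xml:lang"] = tag
    return attrs


print(script_attrs("Arab", "ku"))
# {'dir': 'rtl', 'class': 'ku-Arab', 'lang': 'ku-Arab', 'xml:lang': 'ku-Arab'}
```

Adding a script would then mean adding one table entry, which is roughly what a shared template transcluded into each script template would have to encode.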

Notes

List of language subtags

Data from the IANA subtags registry (2008-11-25) has been extracted to User:Mzajac/Language attributes/IANA subtags. It needs amendments.

Language tags

w:Language tags: HTML 4.01 specifies that language tags follow rfc:1766 (1995). HTML 5 specifies its replacement, rfc:3066 (2001). The latest specification is rfc:4646 (2006), and another revision is in progress.

No language

Language codes that indicate no language:

  • zxx – non-linguistic matter (e.g., type samples, part numbers, binary data streams)
  • und – for text of undetermined language

XHTML doesn't allow the empty string (xml:lang=""). HTML 5 says “Setting the attribute to the empty string indicates that the primary language is unknown”.

We should avoid setting empty language tags.

To do

  • Support for region or variant subtags
  • Compile list of languages by script, or at least languages which conventionally use multiple scripts
  • Flag unusual language–script combinations
  • Allow explicitly setting an empty language tag (meaning “unknown language”)
  • Filter out lang="en", which needn't be included because this is the English-language Wiktionary's primary language, set in an entry's top-level HTML element
  • Generalize the code for any script template by using {{PAGENAME}} instead of “Cyrl” for both the class and lang attributes
  • Use the #language: parser function to generalize the code for any Wiktionary