Owner/runner: User:Robert Ullmann as of 4 September 2007
The bot's primary task is to update the instances of template {{t}} in translations sections, working with {{t+}}, {{t-}}, and {{t-sect}}. It adds an xs= parameter to provide the section link; the language code template is then not used to parse the entry.
The bot presently tries to add templates when possible. Existing translation lines vary a lot in format; most are "standard", but that still allows for considerable variation.
At present the bot only converts to t templates in entries that it is looking at; i.e. entries that already have at least one that may need to be updated.
Tbot creates new entries from the translations tables it is updating.
Even technical terms that one would think are 1-to-1 are often not. (English high voltage is French haute tension, but English also has high tension.)
What we call "Translations" is, and can only be, "words that correspond in some way in the other language."
That said, it is usually useful to create an entry for the FL word, "defining" it as the English word, with the translations gloss.
The languages that Tbot will create entries for are controlled by the existence of the language wiktionary. The word must exist in the FL wiktionary, with sufficient information for Tbot to verify it. (A null template, like a large number of entries in the ru.wikt, will not allow creation of the entry.)
Each entry is put in the corresponding language category, for example Category:Tbot entries (French). If the category doesn't exist, it will appear directly in Category:Tbot entries.
Each entry is also in a monthly category, e.g. Category:Tbot entries December 2007. The categories are added by {{tbot entry}}.
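The category naming described above can be sketched as follows. This is only an illustration of the naming scheme: on the wiki the categories come from {{tbot entry}}, and the function name, its parameters, and the set of existing categories are hypothetical.

```python
from datetime import date

BASE = "Category:Tbot entries"

# Spelled out explicitly so the result does not depend on the system locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

def tbot_categories(language_name, created, existing_categories):
    """Return the two categories an entry would appear in.

    language_name       -- e.g. "French"
    created             -- datetime.date on which the entry was created
    existing_categories -- set of category titles known to exist (assumed input)
    """
    per_language = f"{BASE} ({language_name})"
    if per_language not in existing_categories:
        # No per-language category yet: the entry shows up in the base category.
        per_language = BASE
    monthly = f"{BASE} {MONTHS[created.month - 1]} {created.year}"
    return [per_language, monthly]
```

For example, a French entry created in December 2007 would get Category:Tbot entries (French) and Category:Tbot entries December 2007, provided the French category exists.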
For each translation where the local entry does not exist, and the FL wikt exists, and the word is in the FL wikt, Tbot reads the FL entry. If the entry refers to the English language word, it concludes that the translation is valid, if not "exact" (which, as observed above, is generally unattainable anyway).
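The conditions in the paragraph above amount to a short chain of checks. The sketch below is not Tbot's code: the Candidate record and should_create are hypothetical names, and the test for "refers to the English word" (a plain wikilink search) is an assumption, since the page does not say how Tbot verifies the reference.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    english_word: str
    local_entry_exists: bool      # entry already present on the local wikt
    fl_wikt_exists: bool          # a wiktionary exists for the language
    fl_entry_text: Optional[str]  # wikitext of the FL entry, None if absent

def refers_to_english(fl_entry_text, english_word):
    # Assumption: a link to the English word counts as a reference.
    return f"[[{english_word}]]" in fl_entry_text

def should_create(c):
    """True when every condition for creating the local entry holds."""
    if c.local_entry_exists:
        return False
    if not c.fl_wikt_exists:
        return False
    if c.fl_entry_text is None:
        return False
    return refers_to_english(c.fl_entry_text, c.english_word)
```

An FL entry that exists but never mentions the English word is rejected, which matches the requirement that the FL wikt carry enough information to verify the translation.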
It then sporks (copies over) the picture and audio if found, checking each on Commons (making sure they weren't uploaded locally on the FL wikt, or are at least duplicated on Commons), and isolates the IPA if possible. It also recognizes the local equivalent of {{wikipedia}}, as far as its table goes or if the template has the same name. (It does not check the existence of the FL 'pedia article.)
Tbot creates the local entry, adding {{tbot entry}}, and adds (or updates to) the {{t+}} template, since the FL entry is known to exist.
Tbot recognizes words in various scripts, and adds them to {t} and {infl} when creating an entry. The scripts it knows now are Greek, Cyrillic, Armenian, Hebrew, Syriac, Arabic, Devanagari, Bengali, Georgian, and all the CJKV scripts and variants (including Han Extension B on plane 2). For Arabic, it uses fa-Arab, ur-Arab, and pa-Arab as appropriate, also Hayeren for Armenian, and polytonic for Ancient Greek.
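One way to recognize these scripts is to test code points against Unicode block ranges. The sketch below is an assumption about the approach, not Tbot's implementation; the function names are hypothetical, the language codes fa, ur, pa, and grc are the ISO codes for the languages named above, and only the script template names stated on this page are returned.

```python
# (first code point, last code point, script name), from the Unicode block table
SCRIPT_RANGES = [
    (0x0370, 0x03FF, "Greek"),
    (0x1F00, 0x1FFF, "Greek"),      # Greek Extended (polytonic letters)
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0530, 0x058F, "Armenian"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0700, 0x074F, "Syriac"),
    (0x0900, 0x097F, "Devanagari"),
    (0x0980, 0x09FF, "Bengali"),
    (0x10A0, 0x10FF, "Georgian"),
    (0x3400, 0x4DBF, "CJKV"),       # Han Extension A
    (0x4E00, 0x9FFF, "CJKV"),       # CJK Unified Ideographs
    (0x20000, 0x2A6DF, "CJKV"),     # Han Extension B, on plane 2
]

def detect_script(word):
    """Return the script of the first character that falls in a known range."""
    for ch in word:
        cp = ord(ch)
        for first, last, name in SCRIPT_RANGES:
            if first <= cp <= last:
                return name
    return None

def script_template(word, lang_code):
    """Return a script template name where this page states one, else None."""
    script = detect_script(word)
    if script == "Arabic":
        return {"fa": "fa-Arab", "ur": "ur-Arab", "pa": "pa-Arab"}.get(lang_code)
    if script == "Armenian":
        return "Hayeren"
    if script == "Greek" and lang_code == "grc":
        return "polytonic"
    return None  # template names for the other scripts are not given here
```

Because Python strings are sequences of code points, a character from Extension B is handled the same way as any other, with no surrogate-pair handling needed.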
Tbot uses link-alternation to produce alt= in {t} and head= in {infl}, and similarly from the parameter of {he-translation}. It recognizes and uses the transliteration as tr= in {infl} and the various genders and numbers in both {t} and {infl}.
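As an illustration of link-alternation, a piped wikilink such as [[aqua|aquā]] carries both the page name and the displayed form, and the displayed form can become alt=. The sketch below is a minimal guess at that conversion; link_to_t is a hypothetical helper, and placing the gender before alt= is an assumption about parameter order.

```python
import re

# [[target]] or [[target|display]]
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

def link_to_t(lang_code, wikilink, gender=None):
    """Turn a single wikilink into a {{t}} call, or return None if it is not one."""
    m = LINK_RE.fullmatch(wikilink.strip())
    if m is None:
        return None
    target, display = m.group(1), m.group(2)
    parts = ["t", lang_code, target]
    if gender:
        parts.append(gender)
    if display and display != target:
        # The displayed text differs from the page name: keep it as alt=.
        parts.append(f"alt={display}")
    return "{{" + "|".join(parts) + "}}"
```

A link without a pipe produces no alt= at all, so plain entries are left as short as possible.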
(A number of the restrictions are temporary, as a starting point.)
Also see User:Tbot/tbot entry.
Template {{t-sect}}, used by t/t-/t+, optimizes the performance by supplying the language name for a number of common languages. It uses two sub-templates, {{t-lang}} and {{t-lan2}}. The first set is the languages with the most entries and translations; the second set is the other languages with more than 2000 translations as of September 2007.
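The two-tier lookup can be pictured roughly as below. This is a sketch under assumptions, not the template logic itself: the table contents are sample entries only (this page does not list which languages are in which set), section_name is a hypothetical name, and the order in which the tables and xs= are consulted is a guess.

```python
# Sample contents only; the real sets are defined by {{t-lang}} and {{t-lan2}}.
TIER_1 = {"fr": "French", "de": "German", "es": "Spanish"}
TIER_2 = {"cy": "Welsh", "is": "Icelandic"}

def section_name(lang_code, xs=None):
    """Return the language section name for a translation link.

    Tries the first table, then the second, then an explicit xs= value.
    None means no shortcut applies and the language code template
    would have to be expanded instead.
    """
    for table in (TIER_1, TIER_2):
        if lang_code in table:
            return table[lang_code]
    return xs
```

Splitting the table in two keeps the most frequent lookups in the smaller first set, which is the point of the optimization described above.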
Note that the Chinese languages have not yet been addressed. Also, the constructed languages Esperanto and Interlingua have more than 2000 translations but have not been included in the optimizations. (Ido has just over 1000.)