Owner/runner: User:Robert Ullmann
Source: User:AutoFormat/code
To queue an entry for AutoFormat, add the tag {{rfc-auto}}, exactly that way with no parameters. It can be placed anywhere reasonable. The page will appear in Category:Requests for autoformat.
The bot reads Recent changes and checks each main namespace entry that has been patrolled. It waits 15-20 minutes before the check, and sometimes longer when busy, to limit traffic.
When otherwise idle, the bot picks up an entry at longish intervals from a prescreen of a recent XML dump, based on some simple tests, and then checks the current entry.
The essential idea is based on an observation: while the English Wiktionary has—as it must—a fairly rigid format, it is pointless and unproductive to go about complaining to users (both new and old) about fiddly details.
If one has to snap at some newbie for writing "Related Terms" instead of "Related terms", or for forgetting the odd horizontal rule between language sections, there is less time for more productive work, and the newbie may (will) find the requirements unfriendly. In the case of the contributor who has been around for a very long time and writes ===noun=== it is perhaps even less productive to engage in a talk page conversation.
There are also things people persist in doing even when they know better: using PAGENAME and forgetting to subst it; using {{Wikipedia}} because they are used to it. (And then there is "Pronounciation" ...) These are better fixed than complained about.
The more controversial class of problems is when someone writes an entry using a straight line of level 3 headers, or doesn't know how to really nest the etymology sections, but the structure is well-determined. This is very common, as is (oddly enough) starting off with Etymology or Pronunciation at level 4 (which appears to work, because the WM software displays the TOC correctly in spite of that!)
One observation from the initial testing is that involved, serious users notice the changes made by AutoFormat, probably from their watch lists, and learn from them.
It fixes things that are errors or common misunderstandings of the standard format; it is not intended to enforce policy. Anything controversial or being debated is out of scope. Any bot action to implement a new policy should be done by a purpose-built bot running from the XML dump, since it should be fixing the entire wikt. An example would be changing "Scots Gaelic" to "Scottish Gaelic". (Although much later it might routinely canonicalize the language name.)
Blank lines, fiddly spacing in headers, spelling errors in headers, etc. But not (for example) trying to convert a non-standard "Transitive verb" header to "Verb" with the definition line(s) tagged with {{transitive}}; this would be asking for semantic errors. AutoFormat tags such cases for attention.
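The distinction above (mechanical fixes are applied, semantic ones are only flagged) might be sketched as follows. This is a minimal illustration, not the bot's code: the function name `fix_header` and the contents of both tables are hypothetical, and the real control data lives in User:AutoFormat/Headers.

```python
import re

# Hypothetical correction table; the bot's real table is User:AutoFormat/Headers.
HEADER_FIXES = {
    "related terms": "Related terms",
    "noun": "Noun",
    "pronounciation": "Pronunciation",
}
# Headers that are flagged for attention rather than rewritten (assumed set).
TAG_ONLY = {"transitive verb", "intransitive verb", "reflexive verb"}

HEADER_RE = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")

def fix_header(line):
    """Return (new_line, needs_attention) for one line of wikitext."""
    m = HEADER_RE.match(line)
    if not m or len(m.group(1)) != len(m.group(3)):
        return line, False           # not a balanced header: leave it alone
    marks, name = m.group(1), m.group(2)
    key = name.lower()
    if key in TAG_ONLY:
        return line, True            # a semantic change: flag it, do not edit
    name = HEADER_FIXES.get(key, name)
    return marks + name + marks, False   # also strips the fiddly spacing
```

Under these assumptions, `fix_header("=== Related Terms ===")` yields the corrected header with no attention flag, while a "Transitive verb" header comes back unchanged with the flag set.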
Part of the inspiration is found at User_talk:Connel_MacKenzie/Normalization_of_articles.
Note that when the bot is operating autonomously, that is, not on pages flagged explicitly for rfc-auto, it does not make an edit/save page just for minor spacing. If it makes any other changes, all of the minor spacing etc. is done. If the entry is new, it will apply the minor spacing and related changes, such as changing "category" to "Category".
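The save rule in this paragraph can be written as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and it assumes that a page flagged with rfc-auto is saved even when only minor spacing changes are pending, which the text implies but does not state outright.

```python
def should_save(flagged_rfc_auto, is_new_entry, major_changes, minor_changes):
    """Decide whether an edit is worth saving.

    major_changes and minor_changes are counts of pending fixes
    (hypothetical inputs, not names taken from the bot).
    """
    if major_changes > 0:
        return True                  # minor fixes ride along with real ones
    if minor_changes == 0:
        return False                 # nothing to do at all
    # Only minor spacing remains: save only if requested or the entry is new.
    return flagged_rfc_auto or is_new_entry
```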
Sorts language sections into canonical order, prolog code at the top ({{see}} template, etc), all iwikis at end.
See User:AutoFormat/Languages for the control table.
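A rough sketch of the sorting step, for illustration only. The ordering used here (Translingual, then English, then alphabetical) is an assumption standing in for the control table, the interwiki pattern is deliberately simplified, and all names are hypothetical. Horizontal rules are dropped while parsing and re-inserted between sections.

```python
import re

L2_RE = re.compile(r"^==\s*([^=].*?)\s*==\s*$")
# Simplified interwiki pattern: a 2-3 letter code with optional suffixes.
IWIKI_RE = re.compile(r"^\[\[[a-z]{2,3}(-[a-z-]+)?:[^\]]+\]\]\s*$")

def sort_key(language):
    # Assumed canonical order; the real order comes from the control table.
    priority = {"Translingual": 0, "English": 1}
    return (priority.get(language, 2), language)

def sort_sections(text):
    prolog, sections, iwikis = [], [], []
    current = prolog
    for line in text.splitlines():
        m = L2_RE.match(line)
        if m:
            current = [line]                    # start a new language section
            sections.append((m.group(1), current))
        elif IWIKI_RE.match(line):
            iwikis.append(line)                 # collected, emitted last
        elif line.strip() == "----":
            continue                            # separators are rebuilt below
        else:
            current.append(line)
    sections.sort(key=lambda s: sort_key(s[0]))
    out = list(prolog)
    for i, (_, lines) in enumerate(sections):
        if i > 0:
            out.append("----")
        out.extend(lines)
    out.extend(iwikis)
    return "\n".join(out)
```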
{{rfc}}
Headers other than L2 language headers. Bot action is controlled by User:AutoFormat/Headers.
{{rfc-level}}
{{rfc-header}}
Categories are listed at the end of the language section, after a blank line.
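As an illustration of this rule, the following sketch moves category links to the end of a single language section and capitalizes a lower-case "category" prefix along the way. The function name and the regular expression are hypothetical, and the input is assumed to be the lines of one language section.

```python
import re

CAT_RE = re.compile(r"^\[\[[Cc]ategory:([^\]]+)\]\]\s*$")

def move_categories(section_lines):
    """Return the section with its category links after a single blank line."""
    body, cats = [], []
    for line in section_lines:
        m = CAT_RE.match(line)
        if m:
            cats.append("[[Category:" + m.group(1) + "]]")
        else:
            body.append(line)
    if not cats:
        return body
    while body and not body[-1].strip():
        body.pop()                   # drop trailing blank lines
    return body + [""] + cats
```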
In sections marked as POS, whether also NS or not.
{{defn|(language)}} if no definition line in section (before subsections, which is the only place definitions should occur)
{{top}} to {{trans-top}} when gloss is present
{{trans-top}} and re-balances columns
Replace context labels in parentheses and italics with context templates. Control table is User:AutoFormat/Contexts.
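The context-label replacement could look roughly like the sketch below. Everything in it is an assumption made for illustration: the label set stands in for User:AutoFormat/Contexts, the output form `{{context|label}}` is a guess at what the bot emits, and only a label at the start of a definition line is handled.

```python
import re

# Hypothetical subset of the control table (real one: User:AutoFormat/Contexts).
KNOWN_CONTEXTS = {"informal", "slang", "nautical"}

# Matches "# ''(label)''" or "# (''label'')" at the start of a definition line.
LABEL_RE = re.compile(r"^(#+\s*)(?:''\(([^)']+)\)''|\(''([^)']+)''\))\s*")

def templatize_context(line):
    m = LABEL_RE.match(line)
    if not m:
        return line
    label = (m.group(2) or m.group(3)).strip().lower()
    if label not in KNOWN_CONTEXTS:
        return line                  # unknown label: leave it for a human
    return m.group(1) + "{{context|" + label + "}} " + line[m.end():]
```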
Small things. Note that any of these replacements could be done by bot on the entire wikt, but the point here is that AF is mostly about fixing new contributions and edits; these are things that are commonly introduced. A number of editors routinely use "wikipediapar", and "Wikipedia" is common among people from the 'pedia who expect a template name to be capitalized (e.g. w:Template:Wiktionary). We have the redirect so they can do what they expect easily, and then we fix it. Likewise, people persist in using PAGENAME without subst'ing it.
{{Wikipedia}} with {{wikipedia}}
{{cattag}} with {{context}}
{{Acronym}} with {{acronym}}, similarly with Initialism and Abbreviation; these are properly lower case, but people tend to use the upper case form because headers should be upper case
{{Unicode}} with {{unicode}}
{{trad}} with {{t}}, {{trad-}} with {{t-}}
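The replacement pairs listed above lend themselves to a simple table-driven rewrite. The sketch below is one possible way to do it, not the bot's actual code; the function name and regular expression are hypothetical, and only the template name is touched, never its parameters.

```python
import re

# Replacement pairs from the list above (Initialism and Abbreviation are
# treated like Acronym).
RENAMES = {
    "Wikipedia": "wikipedia",
    "cattag": "context",
    "Acronym": "acronym",
    "Initialism": "initialism",
    "Abbreviation": "abbreviation",
    "Unicode": "unicode",
    "trad": "t",
    "trad-": "t-",
}

# "{{", the template name, then a lookahead for "|" or "}}".
TEMPLATE_NAME_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?=\||\}\})")

def rename_templates(text):
    def repl(m):
        name = m.group(1)
        if name in RENAMES:
            return "{{" + RENAMES[name]
        return m.group(0)            # unknown template: leave untouched
    return TEMPLATE_NAME_RE.sub(repl, text)
```

For example, `rename_templates("{{trad|fr|chat}}")` gives `{{t|fr|chat}}`, with the language code and term passed through unchanged.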
This is a more detailed description of some of the above.
AutoFormat adds tags to entries when they require further attention.
{{rfc-level}} for a serious problem: a known header (such as "See also") occurring at level 2 or 1; AutoFormat makes no other changes
{{rfc-level}} for an Etymology header not at level 3, except when it is at L4 and is the first header in the language section, in which case it is corrected (a fairly common case!)
{{rfc-header}} for a header that cannot be corrected
{{rfc-level}} for structure problems that are not fixed
{{rfc-trverb}} for headers Transitive verb, Intransitive verb, and Reflexive verb
{{rfc-xphrase}} for headers of the form X phrase
{{rfc-tsort}} for translations tables containing lines that cannot be parsed well enough to permit sorting
It only adds these if there is no existing cleanup tag; that is, no template starting with "rfc". In almost all cases, it will only add one, even if there are multiple problems. The rfc-header, rfc-level, rfc-trverb, and rfc-xphrase tags are quiet: they have no visible effect on the page except for the added categories.
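The "only one tag, and only if no rfc template is already present" rule can be sketched like this. The names are hypothetical, and appending the tag at the end of the text is an assumption made to keep the example short; the text does not say where the bot places it.

```python
import re

# Any template whose name starts with "rfc" counts as an existing cleanup tag.
RFC_RE = re.compile(r"\{\{\s*rfc")

def add_cleanup_tag(text, problems):
    """problems: tag names in priority order (hypothetical input)."""
    if not problems or RFC_RE.search(text):
        return text                  # nothing to report, or already tagged
    # Add only the first tag, even when several problems were found.
    return text.rstrip("\n") + "\n{{" + problems[0] + "}}\n"
```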
{{defn|(language)}} is added when there appears to be no definition line in a POS section. Often this is a format problem, such as using * instead of # for a definition.
When AutoFormat generates an inflection line, it writes it as {{infl|language code|part of speech}}; this will usually add the correct categorization.
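A minimal sketch of both behaviours, with hypothetical function names. It assumes a POS section is passed as a list of lines beginning with its header, that a definition line is any line starting with `#` before the first subsection, and that the placeholder is written as a `#` line; none of these details is confirmed by the text.

```python
def ensure_definition(pos_lines, language):
    """Add a {{defn}} placeholder if the POS section has no definition line."""
    insert_at = len(pos_lines)
    for i, line in enumerate(pos_lines[1:], start=1):
        if line.startswith("=="):
            insert_at = i            # a subsection begins: stop looking
            break
        if line.startswith("#"):
            return list(pos_lines)   # a definition line already exists
    fixed = list(pos_lines)
    fixed.insert(insert_at, "# {{defn|" + language + "}}")
    return fixed

def infl_line(lang_code, part_of_speech):
    """Build an inflection line in the form described above."""
    return "{{infl|" + lang_code + "|" + part_of_speech + "}}"
```

Note how a section that uses `*` instead of `#` is treated as having no definition, which matches the format problem mentioned above.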
It also adds a few more specific things:
{{attention}} as needed
These occur with remarkable frequency, as the use of headword repeaters for each and every POS is not intuitive.
Some details:
{{trans-top}} (including those converted on the same edit)
{{trreq}} is read correctly
{{t}} uses
{{trans-top}} will remain there