Wiktionary:Persian entry guidelines

This is a Wiktionary policy, guideline or common practices page. This is a draft proposal. It is unofficial, and it is unknown whether it is widely accepted by Wiktionary editors.


Shortcut:
WT:AFA

Entry layout

Below is a very basic example of a Persian entry:

==Persian==

===Etymology===
...

===Pronunciation===
{{fa-IPA|Classical romanization}}

===Part of Speech===
headword line

# 

===Further reading===
...

Linking to dialects and regional terms

While {{fa-regional}} is commonly used to link to the Tajik spelling, editors have decided to move Tajik spellings to the headword line and reserve {{fa-regional}} for dialectal differences that are not merely a difference in script.

Lemmatization

Spellings and pronunciations are lemmatized at formal Persian; however, regional and colloquial dialects of Persian are also included. Dialectal variants should typically be added as alternative forms, unless they do not derive from formal Persian, in which case they may be lemmas on their own. Where possible, dialectal pronunciations should also be listed under their corresponding formal pronunciation. See § Dialects and varieties for more.

When multiple etymologies are on a single page, they should be listed by commonality, but common terms existing in all major dialects (e.g. inherited terms or classical borrowings) should be listed above dialectal terms and alternative/inflected forms.

Hamza/Hamze

ء and its seated variants (ء, أ, إ, ؤ, ئ) should be included in the page title if they are included in standard Persian dictionaries. However, the variants أ and إ should never appear at the beginning of a word, and the ء above هٔ is excluded.

For spelling variants without ء, use {{unhamzated|fa|}} to redirect to the lemma form.
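The two title restrictions above can be checked mechanically. The following is a minimal sketch, not an existing Wiktionary tool: the function name `hamza_title_problems` and its messages are hypothetical, and it assumes that the "ء above هٔ" is encoded as the combining hamza above (U+0654) following ه (U+0647).

```python
ALEF_HAMZA_ABOVE = "\u0623"  # alef with hamza above
ALEF_HAMZA_BELOW = "\u0625"  # alef with hamza below
HEH = "\u0647"               # Persian/Arabic heh
HAMZA_ABOVE = "\u0654"       # combining hamza above (assumed encoding of the mark on heh)


def hamza_title_problems(title: str) -> list[str]:
    """Return a list of hamza-related problems found in a page title."""
    problems = []
    # Neither alef-hamza variant may begin a word; check every word in the title.
    if any(word[:1] in (ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW) for word in title.split()):
        problems.append("word begins with alef-hamza")
    # The hamza written above a final heh is excluded from page titles.
    if HEH + HAMZA_ABOVE in title:
        problems.append("hamza above heh")
    return problems
```

An empty list means the title passes both checks. Whether a medial or final hamza belongs in the title still has to be confirmed against a standard dictionary.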

Zero-width non-joiner

The zero-width non-joiner (abbr. ZWNJ) is a type of invisible space used in Persian typesetting. It prevents two letters from connecting and, by extension, forces the preceding letter to take its final form. The ZWNJ should only be used where it has a visual effect (i.e. never after the non-combining characters ا, د, ذ, ر, ز, ژ, و; and only ever between two letters), and should be removed in all other cases. The ZWNJ is still transliterated in cases where it would be written but cannot be because it follows a non-combining character.

If an entry was incorrectly created without a ZWNJ, do not make a hard redirect, as hard redirects affect all languages. Instead, create an entry at the correct title, and leave a soft redirect on the previous page. If a page name has a ZWNJ in an inappropriate position, the page should be deleted.
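The placement rule for the ZWNJ can be sketched as a small helper. This is only an illustration: `strip_ineffective_zwnj` is a hypothetical name, the set of non-combining letters is taken verbatim from the list above, and "letter" is approximated by Unicode general categories starting with `L`.

```python
import unicodedata

ZWNJ = "\u200c"
# Non-combining letters as listed in this guideline: alef, dal, zal, re, ze, zhe, vav.
NON_COMBINING = set("\u0627\u062f\u0630\u0631\u0632\u0698\u0648")


def _is_letter(ch: str) -> bool:
    return ch != "" and unicodedata.category(ch).startswith("L")


def strip_ineffective_zwnj(title: str) -> str:
    """Drop every ZWNJ that is not between two letters or follows a non-combining letter."""
    out = []
    for i, ch in enumerate(title):
        if ch == ZWNJ:
            prev = out[-1] if out else ""
            nxt = title[i + 1] if i + 1 < len(title) else ""
            has_effect = _is_letter(prev) and _is_letter(nxt) and prev not in NON_COMBINING
            if not has_effect:
                continue  # skip a ZWNJ that has no visual effect
        out.append(ch)
    return "".join(out)
```

If the result differs from the input, the title has a ZWNJ in an inappropriate position. The helper only removes misplaced ZWNJs and cannot tell where a missing one should be added.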

Unicode character encodings

Entries are lemmatized in the Persian alphabet (Arabic script); entries in romanized Persian are not added.

Where Persian differs from other Arabic-script languages, ensure that Persian entries use the Unicode code points assigned to Persian and never the code points for other languages. As Arabic-script characters have differing initial, medial, final, and isolated forms, it may be difficult to distinguish Unicode letters in certain positions. A proper Persian keyboard on a computer should always use the correct encoding, so this section is only relevant if you do not have a proper Persian keyboard installed.

Language	Codepoint	Isolated	Final	Medial	Initial
Arabic	U+0643	ك	ـك	ـكـ	كـ
Persian	U+06A9	ک	ـک	ـکـ	کـ

Language	Codepoint	Isolated	Final	Medial	Initial
Arabic	U+064A	ي	ـي	ـيـ	يـ
Arabic	U+0649	ى	ـى	ـىـ	ىـ
Persian	U+06CC	ی	ـی	ـیـ	یـ

Language	Codepoint	Isolated	Final	Medial	Initial
Arabic & Persian	U+0647	ه	ـه	ـهـ	هـ
Urdu	U+06C1	ہ	ـہ	ـہـ	ہـ
Urdu & Uyghur	U+06BE	ھ	ـھ	ـھـ	ھـ
Uyghur	U+06D5	ە	ـە	ـەـ	ەـ

If an entry was created at an incorrect encoding, do not make a hard redirect, as hard redirects affect all languages. Instead, create an entry at the correct encoding, and nominate the previous page for deletion (assuming no other entries are present).
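The tables above can be turned into a simple check for mis-encoded titles. The sketch below is an assumption-laden illustration rather than an established tool: the names `TO_PERSIAN`, `has_non_persian_codepoints` and `to_persian_codepoints` are hypothetical, and mapping the Urdu and Uyghur he variants to U+0647 simply follows the third table.

```python
# Code points from the tables above, mapped to the code point listed for Persian.
TO_PERSIAN = {
    0x0643: 0x06A9,  # Arabic kaf -> Persian kaf
    0x064A: 0x06CC,  # Arabic yeh -> Persian ye
    0x0649: 0x06CC,  # Arabic alef maksura -> Persian ye
    0x06C1: 0x0647,  # Urdu heh goal -> heh (assumed to be a mis-encoding in Persian text)
    0x06BE: 0x0647,  # heh doachashmee -> heh (same assumption)
    0x06D5: 0x0647,  # Uyghur ae -> heh (same assumption)
}


def has_non_persian_codepoints(text: str) -> bool:
    """True if the text contains any code point the tables assign to another language."""
    return any(ord(ch) in TO_PERSIAN for ch in text)


def to_persian_codepoints(text: str) -> str:
    """Return the text with those code points replaced by their Persian equivalents."""
    return text.translate(TO_PERSIAN)
```

For example, `to_persian_codepoints` could suggest the correct title for a page that was created with an Arabic kaf. The old page would still need to be nominated for deletion by hand.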

Etymology

There are broadly three kinds of etymologies you may see in Persian entries. Note that these are not etymological categories defined by linguists (Iranian linguists do not distinguish modern loanwords from earlier ones); rather, they are defined by Wiktionary for language treatment purposes:

Inherited terms

Inherited terms tend to have regular phonemic correspondences between dialects (as do Classical loans). The general path for inherited terms (assuming they can be traced to Proto-Indo-European) is Middle Persian (pal), from Old Persian (peo), from Proto-Iranian (ira-pro), from Proto-Indo-Iranian (iir-pro), from Proto-Indo-European (ine-pro). To prevent duplication (and etymologies becoming outdated), it is better not to include the entire etymology unless using {{etymon}} (which can update automatically), though full etymologies can be listed if they are unlikely to become outdated later on.

Classical loans

Terms borrowed during Classical Persian, like inherited terms, tend to have regular phonemic correspondences between dialects. The large majority of loanwords during Classical Persian were borrowed from Arabic. In accordance with English Wiktionary practice (where {{bor}} is used for direct borrowings and {{der}} is used for borrowings via an intermediary language), Persian loanwords from Arabic may use {{bor}}, as Arabic loanwords were borrowed via direct contact.

When applicable, Arabic loanwords should also include {{root|fa|ar|}} such that terms can be categorized by their Arabic root. To show a list of Persian terms belonging to an Arabic root, use {{rootsee}}. For example, {{rootsee|fa|ar|ك ت ب}} shows:

Persian terms derived from the Arabic root ك ت ب (0 c, 6 e)

Besides Arabic, Classical loanwords may be from other languages in the region, including Ottoman Turkish (ota), Old Anatolian Turkish (trk-oat), Chaghatai (chg), Parthian (xpr), and Sogdian (sog), among others. If a term was borrowed from a Turkic language, but it is not clear which one, entries may use {{bor|fa|trk}} (this also applies to other language families).

Modern loans

Modern loanwords are any loanwords borrowed after Classical Persian. Characteristically, these loanwords are not typically borrowed via contact, tend to arise in multiple dialects separately, and may have irregular vowel correspondences between dialects. See entries such as Iranian Persian (septâmbr), Dari (siptimbar), and Tajik сентябр (sentyabr), where Persian speakers in Iran, Afghanistan, and Tajikistan borrowed the same word via separate languages (in this case French (fr), English (en), and Russian (ru), respectively).

Entries for modern loans should not include pronunciations for Classical Persian and, generally, should not include a Tajik spelling in the header (as similar Tajik terms may not share the same etymology). In these cases, similar words may be shown via {{fa-regional}}. In other cases, the similarities may be coincidental. Compare Dari سمنت (simint / sement) to Tajik семент (sement): despite the similarities, the words are of different origins and should therefore be listed in {{fa-regional}}, not in the headword line.

As these loanwords were borrowed after globalization (i.e. after global language contact), it is difficult to distinguish direct borrowings from indirect borrowings, but the current practice is to use {{bor}}.

Dialects and varieties

See also: Wiktionary:Persian transliteration

On the English Wiktionary, all varieties of Persian written in the Persian alphabet are treated as a single language (fa), so they should always be under the same header ==Persian==. As Tajik was written in the Persian alphabet until ~1920, this Perso-Arabic form of Tajik is considered a variety of Persian. When written in other scripts (including other variants of the Arabic script not used by Persian generally), it is treated as a separate language descended from Classical Persian.

Outside of pronunciation sections (and Tajik entries), Classical, Dari, and Tajiki varieties all utilize and share a single romanization scheme. Modern Iranian Persian uses its own romanization scheme, but all Iranian dialects spoken during or before the ~16th century are considered Classical Persian. Etymologies and quotations should use the appropriate romanization to broadly reflect the pronunciation of the time. It is acceptable to use the Iranian Persian romanization for other dialects (if you are unable to provide the Classical Persian romanization), but ق and غ should still be distinguished.

For Early New Persian (fa-ear), ذ and ڤ may be romanized as <ḏ> and <ḇ>. Otherwise, ذ should be <z> and ڤ should not be used at all.

Colloquial and regional dialects of Persian are considered LDLs, unlike formal Persian. Senses and terms added for colloquial/regional varieties of Persian should always have labels indicating they are colloquial or dialectal, as well as the dialect they are used in.

Dialect codes and labels

The following is a list of codes and labels used to specify varieties of Persian (fa). Varieties without codes may be included in label templates such as {{a}} or {{lb}} by typing the city or region that dialect is spoken in. Varieties are in alphabetical order; nested codes are categorized as sub-varieties of the variety above them. As Persian dialects exist on a dialect continuum, and it is difficult to definitively categorize them, Wiktionary defines them in relation to their corresponding standard dialects:

Classical Persian (fa-cls)
- Early New Persian (fa-ear)
- Indo-Persian {{lb|fa|Indo-Persian}}
Dari (prs)
- Aimaq (aiq) (not categorized)
- Hazaragi (haz)
- Southeastern Dari {{lb|fa|Kabul}} + various other cities.
- Western Dari {{lb|fa|Herat}} + various other cities.
Iranian Persian (fa-ira or pes)
- Dashtestani {{lb|fa|Dashtestan}}
- Malayeri {{lb|fa|Malayer}}
- Sistani {{lb|fa|Sistan}}
- Tehrani {{lb|fa|Tehran}} (not categorized)
Tajik (tg) (not categorized)
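For scripts or bots that need to tag entries, the list above can be held in a small lookup table. This is a sketch only: the names `VARIETY_CODES`, `VARIETY_LABELS` and `label_markup` are hypothetical, the data is copied from the list, and only the region-based {{lb}} form shown in the list is generated.

```python
# Varieties that have a language code in the list above.
VARIETY_CODES = {
    "Classical Persian": "fa-cls",
    "Early New Persian": "fa-ear",
    "Dari": "prs",
    "Aimaq": "aiq",
    "Hazaragi": "haz",
    "Iranian Persian": "fa-ira",
    "Tajik": "tg",
}

# Varieties without a code, specified by a label (usually a city or region).
VARIETY_LABELS = {
    "Indo-Persian": "Indo-Persian",
    "Southeastern Dari": "Kabul",
    "Western Dari": "Herat",
    "Dashtestani": "Dashtestan",
    "Malayeri": "Malayer",
    "Sistani": "Sistan",
    "Tehrani": "Tehran",
}


def label_markup(variety: str) -> str:
    """Build the {{lb|fa|...}} markup for a variety that is specified by label."""
    return "{{lb|fa|" + VARIETY_LABELS[variety] + "}}"
```

For instance, `label_markup("Western Dari")` yields `{{lb|fa|Herat}}`. Southeastern and Western Dari also cover various other cities, which this table does not attempt to enumerate.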

Notes

^ On English Wiktionary, "Classical Persian" is defined broadly to include the premodern (post-ENP) New Persian reconstructed by D. N. MacKenzie and what linguist Daniel A. Rees refers to as "Proto-Persian" (the ancestor of modern Persian dialects), in addition to the medieval literary language.
^ Broadly including all varieties of New Persian immediately following Middle Persian, but prior to the emergence of Classical Persian as a literary language.
^ The ISO code assigned to Iranian Persian (pes) is not frequently used on the English Wiktionary. In Persian-specific templates, the codes ira or ir are typically used instead. Elsewhere, fa-ira is generally preferred, with pes being an alias.
^ Only within Persian entries; elsewhere, tg is a separate language.

Dialect specific considerations

Templates relating to Persian

{{fa-IPA}} — one of the most important Persian templates; ideally, it should appear in every Persian entry.
{{fa-l}} — allows you to show both the Classical and Iranian romanization in a link.

Notable reference templates:

{{R:prs:Bulkin}} — Focuses on formal Dari, the pronunciation is more accurate than most Dari dictionaries.
{{R:fa:Dehkhoda:1931}} — The dictionary is written in Persian but is generally good for formal Iranian Persian. The dictionary also shows which words have a final ـَه for dialects that maintain a ـَه~ـِه distinction.
{{R:fa:Hayyim}} — Focuses on formal Iranian Persian, especially as spoken in Tehran, and the dictionary does not include a ـَه~ـِه distinction for Iranian dialects that have it.
{{R:fa:Steingass}} — Focuses on Indo-Persian. Though useful for Classical Persian generally, the pronunciations are far less accurate than the pronunciations reconstructed by MacKenzie.
{{R:fa:QKA}} — Dictionary written in Persian focusing on Dari. The transcription is sometimes inconsistent and not as reliable as Bulkin's.

Etymology references

{{R:ira:Cheung|page=223}}
{{R:pal:Mackenzie:1971}} — While the dictionary focuses on Middle Persian, MacKenzie also reconstructs the Classical Persian pronunciation from multiple Persian dialects. MacKenzie's Classical pronunciation is generally the most accurate.

Dictionaries focusing on regional or colloquial dialects:

{{R:prs:SIL}} — Focuses on colloquial Dari generally, but includes a broad range of (mostly southern) dialects.
{{R:haz:SIL}} — Focuses on Hazaragi dialect specifically, but uses its own orthography for Hazaragi.