This is a Wiktionary policy, guideline or common practices page. This is a draft proposal. It is unofficial, and it is unknown whether it is widely accepted by Wiktionary editors.
This page proposes how Marshallese word entries may be maintained on Wiktionary. To learn more about the language itself, see the article for the Marshallese language on Wikipedia.
There are currently display issues with five Marshallese letters:
Description | Letter | Unicode | Issue |
---|---|---|---|
L-cedilla | Ļ, ļ | U+013B, U+013C | Most fonts display this letter with a comma-below diacritic instead of a cedilla, to accommodate the expectations of the Latvian alphabet. |
M-cedilla | M̧, m̧ | U+004D U+0327, U+006D U+0327 | Not encoded as a single character, and as such requires a combining diacritic that does not display or align properly in most fonts. When displayed properly, the cedilla is placed either beneath the middle of the letter or underneath the rightmost column of the letter (but not too far to the right). |
N-cedilla | Ņ, ņ | U+0145, U+0146 | Most fonts display this letter with a comma-below diacritic instead of a cedilla, to accommodate the expectations of the Latvian alphabet. |
N-macron | N̄, n̄ | U+004E U+0304, U+006E U+0304 | Not encoded as a single character, and as such requires a combining diacritic that does not display or align properly in most fonts. |
O-cedilla | O̧, o̧ | U+004F U+0327, U+006F U+0327 | Not encoded as a single character, and as such requires a combining diacritic that does not display or align properly in most fonts. |
The characters given here are only approximations of the actual characters that are used in careful typesetting, but they are used provisionally until a better solution is found. Note especially that the sequences involving the combining macron character (U+0304) or the combining cedilla character (U+0327) will not display correctly in the majority of fonts.
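The difference between the precomposed letters and the combining sequences in the table above can be checked with Unicode normalization. The following is a minimal Python sketch, not part of any Wiktionary module; the names `LETTERS` and `code_points` are illustrative. It composes each base letter and diacritic with NFC and lists the resulting code points, so a letter that stays at two code points has no precomposed form.

```python
import unicodedata

# Uppercase forms of the five letters, written as base letter + combining mark.
LETTERS = {
    "L-cedilla": "L\u0327",
    "M-cedilla": "M\u0327",
    "N-cedilla": "N\u0327",
    "N-macron": "N\u0304",
    "O-cedilla": "O\u0327",
}

def code_points(sequence):
    """Return the code points of the NFC-composed form of a sequence."""
    composed = unicodedata.normalize("NFC", sequence)
    return [f"U+{ord(ch):04X}" for ch in composed]

for name, sequence in LETTERS.items():
    # One code point: a precomposed letter exists. Two: combining sequence only.
    print(name, code_points(sequence))
```

Only L-cedilla and N-cedilla compose to a single code point, which matches the table: the other three letters always depend on the font positioning a combining mark.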
Only the standard diacritics are used in Wiktionary entries for Marshallese. Alternative schemes (particularly the Ḷ ḷ Ṃ ṃ Ṇ ṇ Ñ ñ Ọ ọ alternatives promoted by the online version of the Marshallese-English Dictionary) are not used. Three other letters with diacritics, Ā ā Ō ō Ū ū, are already well-displayed in most modern default browser fonts; alternative forms à ã Ä ä Õ õ Ö ö Ũ ũ Ü ü are not used.
Compare:
Separate word entries may be provided in both the old orthography and the new orthography. References are easier to provide for words spelt in the new orthography, because that is what the Marshallese-English Dictionary uses, so Marshallese entries on Wiktionary are expected to come overwhelmingly from the new orthography. Where a word has more than one spelling, including differences between the old and new orthographies, the entries can cross-reference each other by listing the other spellings in the "Alternative forms" section, each with a qualifier naming the orthography, if any, that the linked spelling uses.
Marshallese pronunciations are embedded using a special template, {{mh-ipa-rows}}, using phonological conversion algorithms described in Module:mh-pronunc. For example, {{mh-ipa-rows|mhahjelh}} embeds this:
The template describes both phonemic and phonetic pronunciations of Marshallese words:
The pronunciation template cannot rely directly on the spelling of Marshallese word entries to guess their pronunciation, as none of the common orthographies in use has a one-to-one phonemic correspondence, though the newer orthography is significantly more phonologically consistent than the older one. Instead, the template uses a code format that is essentially a simplified ASCII-only modification of Bender's pronunciation guide for the MED. For example, the {{mh-ipa-rows|mhahjelh}} example used earlier in this section uses the code mhahjelh, which is similar to m̧ahjeļ in the MED. The code uses a fairly strict syntax, but is case-insensitive.
The supported vowel phoneme symbols are:
Code | Height | MED Phoneme | IPA Phonemic | IPA Phonetic | Spellings |
---|---|---|---|---|---|
a | open | {a} | /æ/ | | ā, a, o̧ |
e | open-mid | {e} | /ɛ/ | | e, ō, o |
& | close-mid | {ȩ} | /e/ | | |
i | close | {i} | /i/ | | i, ū, u |
The supported consonant symbols are:
Code | Primary articulation | Secondary articulation | Manner | MED Phoneme | IPA Phonemic | IPA Phonetic | Spellings |
---|---|---|---|---|---|---|---|
b | labial | velarized | obstruent | {b} | /pˠ/ | | b, bw |
d | coronal | palatalized | trill | {d} | /rʲ/ | | d |
h | (dorsal) | velarized | approximant | {h} | /ɰ/ | - | - |
j | coronal | palatalized | obstruent | {j} | /tʲ/ | | j |
k | dorsal | velarized | obstruent | {k} | /k/ | | k |
kw | dorsal | labiovelarized | obstruent | {kʷ} | /kʷ/ | | kw, k |
l | coronal | palatalized | lateral | {l} | /lʲ/ | | l |
lh | coronal | velarized | lateral | {ļ} | /lˠ/ | | ļ |
lw | coronal | labiovelarized | lateral | {ļʷ} | /lʷ/ | | ļw, ļ |
m | labial | palatalized | nasal | {m} | /mʲ/ | | m |
mh | labial | velarized | nasal | {m̧} | /mˠ/ | | m̧, m̧w |
n | coronal | palatalized | nasal | {n} | /nʲ/ | | n |
ng | dorsal | velarized | nasal | {g} | /ŋ/ | | n̄ |
ngw | dorsal | labiovelarized | nasal | {gʷ} | /ŋʷ/ | | n̄w, n̄ |
nh | coronal | velarized | nasal | {ņ} | /nˠ/ | | ņ |
nw | coronal | labiovelarized | nasal | {ņʷ} | /nʷ/ | | ņw, ņ |
p | labial | palatalized | obstruent | {p} | /pʲ/ | | p |
r | coronal | velarized | trill | {r} | /rˠ/ | | r |
rw | coronal | labiovelarized | trill | {rʷ} | /rʷ/ | | rw, r |
t | coronal | velarized | obstruent | {t} | /tˠ/ | | t |
w | (dorsal) | labiovelarized | approximant | {w} | /w/ | - | w, - |
y | (dorsal) | palatalized | approximant | {y} | /j/ | - | e, i, - |
The template's code syntax also supports the use of any number of plain apostrophes (') to disambiguate symbol spellings. For example, {{mh-ipa-rows|jal'w&j}} embeds this:
...whereas {{mh-ipa-rows|jalw&j}} embeds this instead:
The syntax also permits any number of ASCII whitespace characters and ASCII plain hyphens (-), as well as commas (,) to separate multiple different examples. In the case of commas, the module script processes each comma-separated piece of code separately, and shows the converted result of each fragment with duplicate results removed. For example, {{mh-ipa-rows|kewkew, k&wk&w}} embeds this:
Apart from these separators, each comma-separated fragment may contain only code representations of certain sequences of consonants and vowels, and the template displays an error if the code is malformed.
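The separator rules in this section can be sketched in Python as follows. This is an illustration only, not the logic of Module:mh-pronunc: the function names `tokenize_chunk` and `parse_code` are hypothetical, and three behaviours are assumptions rather than documented facts, namely that multi-letter symbols are matched longest-first, that whitespace and hyphens are dropped before matching, and that only apostrophes act as symbol boundaries. The sketch checks symbols but does not validate which consonant and vowel sequences are permitted, and it removes duplicate symbol sequences as a stand-in for the template's removal of duplicate converted results.

```python
import re

VOWELS = ["a", "e", "&", "i"]
CONSONANTS = [
    "b", "d", "h", "j", "k", "kw", "l", "lh", "lw", "m", "mh",
    "n", "ng", "ngw", "nh", "nw", "p", "r", "rw", "t", "w", "y",
]
# Assumption: longest symbols are tried first, so "ngw" wins over "ng" and "n".
SYMBOLS = sorted(VOWELS + CONSONANTS, key=len, reverse=True)

def tokenize_chunk(chunk):
    """Split a separator-free chunk of code into symbols."""
    tokens = []
    pos = 0
    while pos < len(chunk):
        for symbol in SYMBOLS:
            if chunk.startswith(symbol, pos):
                tokens.append(symbol)
                pos += len(symbol)
                break
        else:
            # No symbol matches here: the code is malformed.
            raise ValueError(f"unrecognized symbol at {chunk[pos:]!r}")
    return tokens

def parse_code(code):
    """Return one symbol list per distinct comma-separated fragment."""
    fragments = []
    for fragment in code.lower().split(","):   # the code is case-insensitive
        # Assumption: whitespace and hyphens are simply discarded.
        cleaned = re.sub(r"[ \t\n\r\f\v-]+", "", fragment)
        tokens = []
        for chunk in cleaned.split("'"):       # apostrophes force a boundary
            tokens.extend(tokenize_chunk(chunk))
        if tokens and tokens not in fragments:
            fragments.append(tokens)
    return fragments

print(parse_code("jal'w&j"))
print(parse_code("jalw&j"))
print(parse_code("kewkew, k&wk&w"))
```

Under these assumptions, `jal'w&j` is read with separate `l` and `w` symbols, while `jalw&j` is read with the single symbol `lw`, which is the ambiguity the apostrophe resolves.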
When sorting entries for categorization, a simple ASCII-based transcription can properly collate Marshallese entries in Marshallese word categories. Use all lowercase for sorting, and include spaces and hyphens as they normally appear in the entry. Letters with diacritics are transcribed as follows:
Letter | ā | ļ | m̧ | ņ | n̄ | o̧ | ō | ū |
---|---|---|---|---|---|---|---|---|
Sort key | a~ | l~ | m~ | n~ | n~~ | o~ | o~~ | u~ |
So, a word like M̧ajōļ would be collated m~ajo~~l~.
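The transcription can be automated. The following Python sketch is one possible implementation, not an existing Wiktionary tool; the name `mh_sort_key` is illustrative. It lowercases the word, decomposes it with NFD so that precomposed letters and combining sequences are handled the same way, and then replaces each base letter and diacritic pair according to the table.

```python
import unicodedata

CEDILLA = "\u0327"
MACRON = "\u0304"

# (base letter, combining mark) -> sort key fragment, following the table.
SORT_MAP = {
    ("a", MACRON): "a~",
    ("l", CEDILLA): "l~",
    ("m", CEDILLA): "m~",
    ("n", CEDILLA): "n~",
    ("n", MACRON): "n~~",
    ("o", CEDILLA): "o~",
    ("o", MACRON): "o~~",
    ("u", MACRON): "u~",
}

def mh_sort_key(word):
    """Transcribe a Marshallese word into its ASCII sort key."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    out = []
    i = 0
    while i < len(decomposed):
        pair = (decomposed[i], decomposed[i + 1:i + 2])
        if pair in SORT_MAP:
            out.append(SORT_MAP[pair])
            i += 2          # consume the letter and its diacritic
        else:
            out.append(decomposed[i])   # plain letters, spaces, hyphens
            i += 1
    return "".join(out)

print(mh_sort_key("M\u0327aj\u014d\u013c"))  # M̧ajōļ -> m~ajo~~l~
```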
When using template code like {{head|mh|noun}}, Wiktionary properly and automatically collates entries in the associated Marshallese word categories, but a manual sort key may still be necessary for topical categories like Category:mh:Islands or Category:mh:Animals, which are subcategories of Category:mh:List of sets. For example, in the page for the Marshallese country name M̧ajōļ, the entry is not categorized with the bare markup [[Category:mh:…]]; instead, a sort key is added to produce the markup [[Category:mh:…|m~ajo~~l~]].
The Marshallese–English Dictionary is the only complete Marshallese dictionary in existence, and has one significant online location. The template {{meod-ref}} links to that location, and the template can be updated in case that location changes. The template can accept up to five arguments, each a separate reference. For each reference, only the URL substring immediately following MOD/ need be provided.