This module contains four functions, three of which are called by other modules.
standardDiacritics takes spacing or nonstandard diacritics and converts them to standard combining diacritics. This function is used by pronunciationOrder.
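The conversion can be pictured as a character-for-character substitution. The following is a minimal Python sketch, not the module's Lua code. The name standard_diacritics and the mapping table are assumptions for illustration; the actual substitutions are defined in Module:grc-utilities/data and may differ.

```python
# Hypothetical mapping for illustration only; the real substitution table
# lives in Module:grc-utilities/data and may contain different entries.
SPACING_TO_COMBINING = {
    "\u00AF": "\u0304",  # MACRON -> COMBINING MACRON
    "\u02D8": "\u0306",  # BREVE -> COMBINING BREVE
    "\u00A8": "\u0308",  # DIAERESIS -> COMBINING DIAERESIS
    "\u1FBF": "\u0313",  # GREEK PSILI -> COMBINING COMMA ABOVE
    "\u1FFE": "\u0314",  # GREEK DASIA -> COMBINING REVERSED COMMA ABOVE
}


def standard_diacritics(text: str) -> str:
    """Replace each spacing diacritic with its combining counterpart."""
    return text.translate(str.maketrans(SPACING_TO_COMBINING))
```

Text that already uses combining diacritics passes through this sketch unchanged.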
reorderDiacritics decomposes each letter so that its diacritics become separate combining characters (mw.ustring.toNFD), and reorders them so that macrons or breves are first; diaeresis or breathing mark is second; acute, grave, or circumflex is third; and iota subscript is last. Aside from the iota subscript part, this is the only order in which the diacritics can display correctly, as explained elsewhere. This function is used by Module:typing-aids and {{chars}}.
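The ordering described above can be sketched in Python with the standard unicodedata module standing in for mw.ustring.toNFD. This is an illustration under assumptions, not the module's Lua implementation: the name reorder_diacritics and the RANK table are hypothetical, and the code points are the usual combining characters for each diacritic.

```python
import unicodedata

# Assumed rank of each combining diacritic in the display order.
RANK = {
    "\u0304": 1, "\u0306": 1,               # macron, breve
    "\u0308": 2, "\u0313": 2, "\u0314": 2,  # diaeresis, smooth and rough breathing
    "\u0301": 3, "\u0300": 3, "\u0342": 3,  # acute, grave, circumflex
    "\u0345": 4,                            # iota subscript
}


def reorder_diacritics(text: str) -> str:
    """Decompose the text, then sort each run of diacritics by rank."""
    out, marks = [], []
    for ch in unicodedata.normalize("NFD", text):
        if ch in RANK:
            marks.append(ch)
        else:
            # A non-diacritic ends the current run: flush it in sorted order.
            out.extend(sorted(marks, key=RANK.get))
            marks = []
            out.append(ch)
    out.extend(sorted(marks, key=RANK.get))
    return "".join(out)
```

For example, alpha followed by acute, smooth breathing, and macron (in that order) comes out as alpha, macron, smooth breathing, acute.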
pronunciationOrder does the same thing as reorderDiacritics, except that it puts the macron or breve and the iota subscript last and recombines the diacritics (mw.ustring.toNFC) after reordering them. The diaeresis or breathing mark and the accent mark will recombine with the letter, while the macron and breve remain uncombined as combining characters. This function is used by Module:grc-pronunciation and {{grc-IPA}}.
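A Python sketch of this variant follows, again using unicodedata in place of mw.ustring.toNFD and mw.ustring.toNFC. The name pronunciation_order and the PRON_RANK table are assumptions for illustration; the relative order of the macron or breve and the iota subscript is a guess, since the description only says that both come last.

```python
import unicodedata

# Assumed rank of each combining diacritic in the pronunciation order.
PRON_RANK = {
    "\u0308": 1, "\u0313": 1, "\u0314": 1,  # diaeresis, smooth and rough breathing
    "\u0301": 2, "\u0300": 2, "\u0342": 2,  # acute, grave, circumflex
    "\u0304": 3, "\u0306": 3,               # macron, breve
    "\u0345": 4,                            # iota subscript
}


def pronunciation_order(text: str) -> str:
    """Decompose, sort each run of diacritics by rank, then recompose."""
    out, marks = [], []
    for ch in unicodedata.normalize("NFD", text):
        if ch in PRON_RANK:
            marks.append(ch)
        else:
            out.extend(sorted(marks, key=PRON_RANK.get))
            marks = []
            out.append(ch)
    out.extend(sorted(marks, key=PRON_RANK.get))
    return unicodedata.normalize("NFC", "".join(out))
```

With alpha, macron, smooth breathing, and acute as input, the breathing and accent recombine with the letter into a single precomposed character, and the macron follows as a separate combining character.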
Module:grc-utilities/data holds the diacritic definitions and substitutions that are used by this module.
The function tokenize breaks the text into meaningful units: a single consonant, a monophthong letter, or a diphthong, each with any diacritics, as shown below. This function is used by Module:grc-translit and Module:grc-accent, and by the sandbox module Module:grc-pronunciation/sandbox.
The first argument is the word to be tokenized. The second is a boolean: if true, the function will group εω together as a diphthong, for instance in πόλεως (póleōs), genitive of πόλῐς (pólis, “city state”).
word | tokens |
---|---|
ἡμεῖς | ἡ, μ, εῖ, ς |
οἷαι | οἷ, αι |
ἀναῡ̈τέω | ἀ, ν, α, ῡ̈, τ, έ, ω |
δαΐφρων | δ, α, ΐ, φ, ρ, ω, ν |
τούτῳ | τ, ού, τ, ῳ |
ὑϊκός | ὑ, ϊ, κ, ό, ς |
ἡ Ἑλήνη | ἡ, , Ἑ, λ, ή, ν, η |
νηῦς | ν, ηῦ, ς |
υἱός | υἱ, ό, ς |
ὄργυιᾰ | ὄ, ρ, γ, υι, ᾰ |
οὐ δοκεῖν ἀλλ’ εἶναι ἀγαθὸν | οὐ, , δ, ο, κ, εῖ, ν, , ἀ, λ, λ, ’, , εἶ, ν, αι, , ἀ, γ, α, θ, ὸ, ν |
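
The grouping in the table can be approximated with a short Python sketch. It is not the module's Lua code: the diphthong list, the parameter name group_eo, and the two rules used here (the first vowel must carry no diacritics, and the second must carry no diaeresis) are assumptions inferred from the rows above, so the real function may handle more cases.

```python
import unicodedata

# Assumed set of vowel pairs treated as diphthongs.
DIPHTHONGS = {"αι", "ει", "οι", "υι", "αυ", "ευ", "ηυ", "ου", "ωυ"}
DIAERESIS = "\u0308"


def tokenize(word: str, group_eo: bool = False) -> list[str]:
    """Split a word into consonants, single vowels, and diphthongs."""
    # Step 1: build clusters of one base character plus its combining marks.
    clusters = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)

    # Step 2: merge a cluster into the previous token when the previous token
    # is a bare vowel, the two vowels form a known pair, and the second vowel
    # has no diaeresis.
    pairs = DIPHTHONGS | ({"εω"} if group_eo else set())
    tokens = []
    for cluster in clusters:
        prev = tokens[-1] if tokens else ""
        if (len(prev) == 1
                and (prev + cluster[0]).lower() in pairs
                and DIAERESIS not in cluster):
            tokens[-1] = prev + cluster
        else:
            tokens.append(cluster)

    # Step 3: recompose each token so it uses precomposed characters.
    return [unicodedata.normalize("NFC", t) for t in tokens]
```

Calling tokenize("πόλεως", True) in this sketch groups εω into one token, while the default call leaves ε and ω as separate tokens.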