The module should preferably not be called directly from templates or other modules.
To use it from a template, use {{xlit}}.
Within a module, use Module:languages#Language:transliterate.
For testcases, see Module:he-translit/old/testcases.
tr(text, lang, sc)
Transliterates text written in the script specified by the code sc, and language specified by the code lang. Returns nil if the transliteration fails.
11 of 150 tests failed.
| | Text | Expected | Actual | Differs at | Comments |
---|---|---|---|---|---|
![]() | בַּיִת | bayiṯ | bayiṯ | ||
![]() | בֵּית | bēṯ | bēṯ | ||
![]() | עַכּוֹ | ʿakkō | 1 | ||
![]() | בָּתִּים | bāttīm | bāttīm | ||
![]() | מַחֲנֶה | maḥăne | maḥăne | ||
![]() | בָּרָא | bārā | bārā | ||
![]() | רֶגֶל | reḡel | reḡel | ||
![]() | כֹּהֵן | kōhēn | kōhēn | ||
![]() | מֶלֶךְ | meleḵ | meleḵ | ||
![]() | מַמְלָכָה | mamlāḵā | mamlāḵā | ||
![]() | הַמַּמְלָכָה | hammamlāḵā | hammamlāḵā | ||
![]() | הַלְּלוּיָהּ | halləlūyāh | halləlūyāh | ||
![]() | הַלְלוּיָהּ | haləlūyāh | haləlūyāh | ||
![]() | יָדַע | yāḏaʿ | yāḏaʿ | ||
![]() | שָׁבוּעַ | šāḇūaʿ | šāḇūaʿ | ||
![]() | רוּחַ | rūaḥ | rūaḥ | ||
![]() | גָּבֹהַּ | gāḇōah | gāḇōah | ||
![]() | מָשִׁיחַ | māšīaḥ | māšīaḥ | ||
![]() | רֵיחַ | rēaḥ | rēaḥ | ||
![]() | שָׂדֶה | śāḏe | śāḏe | ||
![]() | שְׂדֵה | śəḏē | śəḏē | ||
![]() | בָּנַי | bānay | bānay | ||
![]() | בְּנֵי | bənē | bənē | ||
![]() | צָרְכִּי | ṣorkī | ṣorkī | ||
![]() | חָכְמָה | ḥāḵəmā | ḥāḵəmā | ambiguous case: could be ḥāḵəmā or ḥoḵmā, but I think ḥāḵəmā is the preferred default | |
![]() | שִׁפְרָה | šip̄rā | šip̄rā | ||
![]() | שָׁכְבְּךָ | šoḵbəḵā | šoḵbəḵā | ||
![]() | הָפְכָּה | hop̄kā | hop̄kā | made-up word, but a particular potentially problematic Unicode situation | |
![]() | קָטְבּוֹ | qoṭbō | qoṭbō | another particular potentially problematic Unicode situation | |
![]() | נִשְׂרְפָה | niśrəp̄ā | niśrəp̄ā | ||
![]() | בָּנָיו | bānāw | bānāw | ||
![]() | בָּנֶיהָ | bānehā | bānehā | ||
![]() | מִצְוֹת | miṣwōṯ | miṣwōṯ | ||
![]() | זִוּוּג | ziwwūḡ | ziwwūḡ | ||
![]() | רֹאשׁ | rōš | rōš | ||
![]() | רֵאשִׁית | rēšīṯ | rēšīṯ | ||
![]() | רִאשׁוֹן | rīšōn | rīšōn | ||
![]() | מְלָאכָה | məlāḵā | məlāḵā | ||
![]() | מְלֶאכֶת | məleḵeṯ | məleḵeṯ | ||
![]() | חֵטְא | ḥēṭ | ḥēṭ | ||
![]() | בָּרָאתָ | bārāṯā | bārāṯā | ||
![]() | חַטֹּאות | ḥaṭṭōṯ | ḥaṭṭōṯ | ||
![]() | יְראוּ | yərū | yərū | ||
![]() | וַיֶּאְסֹר | wayyeʾsōr | wayyeʾsōr | ||
![]() | הָחְלַט | hoḥlaṭ | hoḥlaṭ | ||
![]() | וַיֵּבְךְּ | wayyēḇk | wayyēḇk | ||
![]() | אַרְאֶךָּ | ʾarʾekkā | ʾarʾekkā | ||
![]() | וַיַּשְׁקְ | wayyašq | wayyašq | ||
![]() | אַתְּ | ʾatt | ʾatt | ||
![]() | וּוָווֹ | ūwāwō | ūwāwō | ||
![]() | וָו | wāw | wāw | ||
![]() | תָּו | tāw | tāw | ||
![]() | קַו | qaw | qaw | ||
![]() | לָאו | lāw | lāw | ||
![]() | חַי | ḥay | ḥay | ||
![]() | חָי | ḥāy | ḥāy | pausal | |
![]() | פִּיו | pīw | pīw | ||
![]() | כִּסְלֵו | kislēw | kislēw | ||
![]() | גּוֹי | gōy | gōy | ||
![]() | גֹּי | gōy | gōy | ||
![]() | גֹּיִים | gōyīm | gōyīm | ||
![]() | רָאוּי | rāʾūy | rāʾūy | ||
![]() | קִיא | qī | qī | ||
![]() | יָבִיאוּ | yāḇīʾū | yāḇīū | 5 | |
![]() | יְבִיאוּן | yəḇīʾūn | yəḇīūn | 5 | |
![]() | מֵאוּן | mēʾūn | mēʾūn | ||
![]() | מֵיאוּן | mēʾūn | mēyūn | 3 | |
![]() | בּוֹאוּ | bōʾū | bōʾū | ||
![]() | בֹּאוּ | bōʾū | bōʾū | ||
![]() | בּוּאוּ | būʾū | būʾū | made-up word, but may help identify the issue | |
![]() | אָבִיאָה | ʾāḇīʾā | ʾāḇīʾā | ||
![]() | מֵאָה | mēʾā | mēʾā | ||
![]() | גֵּיאָהּ | gēʾāh | gēʾāh | ||
![]() | אָבוֹאָה | ʾāḇōʾā | ʾāḇōʾā | ||
![]() | אָבֹאָה | ʾāḇōʾā | ʾāḇōʾā | ||
![]() | נְשׂוּאָה | nəśūʾā | nəśūʾā | ||
![]() | קִיאוֹ | qīʾō | qīō | 3 | |
![]() | גֵּאוֹ | gēʾō | gēʾō | ||
![]() | גֵּיאוֹ | gēʾō | gēʾō | ||
![]() | בּוֹאוֹ | bōʾō | bōʾō | ||
![]() | בֹּאוֹ | bōʾō | bōʾō | ||
![]() | מִלּוּאוֹ | millūʾō | millūʾō | ||
![]() | מִי | mī | mī | ||
![]() | אִיִּים | ʾiyyīm | ʾiyyīm | ||
![]() | אִיּוֹב | ʾiyyōḇ | ʾiyyōḇ | ||
![]() | אִיּוּן | ʾiyyūn | ʾiyyūn | ||
![]() | אַיִן | ʾayin | ʾayin | ||
![]() | בּוֹא | bō | bō | ||
![]() | יְפֵהפֶה | yəp̄ēp̄e | yəp̄ēp̄e | ||
![]() | אֹהֶל | ʾōhel | ʾōhel | ||
![]() | הָאֹהֱלָה | hāʾōhĕlā | hāʾōhĕlā | ||
![]() | אָהֳלוֹ | ʾohŏlō | ʾāhŏlō | 2 | |
![]() | אָהָלְךָ | ʾoholəḵā | ʾāhāləḵā | 2 | |
![]() | יִשָּׂשכָר | yiśśāḵār | yiśśāḵār | Still undecided if this actually needs to be handled | |
![]() | הוֹשִׁיעָה נָּא | hōšīʿā nnā | hōšīʿā nnā | ||
![]() | עַד בֹּאֲךָ | ʿaḏ bōʾăḵā | ʿaḏ bōʾăḵā | ||
![]() | וַיַּשְׁקְ אֶת הַצֹּאן | wayyašq ʾeṯ haṣṣōn | wayyašq ʾeṯ haṣṣōn | ||
![]() | בְּנֵי בְרָק | bənē ḇərāq | bənē ḇərāq | ||
![]() | בְרָק | ḇərāq | ḇərāq | ||
![]() | אִישׁ יְהוּדִי הָיָה בְּשׁוּשַׁן הַבִּירָה וּשְׁמוֹ מָרְדֳּכַי בֶּן יָאִיר בֶּן־שִׁמְעִי בֶּן־קִישׁ אִישׁ יְמִינִי׃ | ʾīš yəhūḏī hāyā bəšūšan habbīrā ūšəmō mordŏḵay ben yāʾīr ben-šimʿī ben-qīš ʾīš yəmīnī. | ʾīš yəhūḏī hāyā bəšūšan habbīrā ūšəmō mordŏḵay ben yāʾīr ben-šimʿī ben-qīš ʾīš yəmīnī. | ||
![]() | אִ֣ישׁ יְהוּדִ֔י הָיָ֖ה בְּשׁוּשַׁ֣ן הַבִּירָ֑ה וּשְׁמ֣וֹ מָרְדֳּכַ֗י בֶּ֣ן יָאִ֧יר בֶּן־שִׁמְעִ֛י בֶּן־קִ֖ישׁ אִ֥ישׁ יְמִינִֽי׃ | ʾīš yəhūḏī hāyā bəšūšan habbīrā ūšəmō mordŏḵay ben yāʾīr ben-šimʿī ben-qīš ʾīš yəmīnī. | ʾi֣yš yəhūḏi֔y hāyā֖h bəšūša֣n habbīrā֑h ūšəm֣ō mordŏḵa֗y be֣n yāʾi֧yr ben-šimʿi֛y ben-qi֖yš ʾi֥yš yəmīniֽy. | 2 | fully accented verse; stress should not be indicated in the final syllable |
![]() | וַיְהִי הַמַּבּוּל אַרְבָּעִים יוֹם עַל־הָאָרֶץ וַיִּרְבּוּ הַמַּיִם וַיִּשְׂאוּ אֶת־הַתֵּבָה וַתָּרָם מֵעַל הָאָרֶץ׃ | wayəhī hammabbūl ʾarbāʿīm yōm ʿal-hāʾā́reṣ wayyirbū hammáyim wayyiśəʾū ʾeṯ-hattēḇā wattā́rom mēʿal hāʾāreṣ. | wayhī hammabbūl ʾarbāʿīm yōm ʿal-hāʾāreṣ wayyirbū hammayim wayyiśʾū ʾeṯ-hattēḇā wattārām mēʿal hāʾāreṣ. | 4 | a reminder of why this is hard |
![]() | וַיְהִ֧י הַמַּבּ֛וּל אַרְבָּעִ֥ים י֖וֹם עַל־הָאָ֑רֶץ וַיִּרְבּ֣וּ הַמַּ֗יִם וַיִּשְׂאוּ֙ אֶת־הַתֵּבָ֔ה וַתָּ֖רָם מֵעַ֥ל הָאָֽרֶץ׃ | wayəhī hammabbūl ʾarbāʿīm yōm ʿal-hāʾā́reṣ wayyirbū hammáyim wayyiśəʾū ʾeṯ-hattēḇā wattā́rom mēʿal hāʾāreṣ. | wayhi֧y hammabb֛ūl ʾarbāʿi֥ym y֖ōm ʿal-hāʾā֑reṣ wayyirb֣ū hamma֗yim wayyiśʾū֙ ʾeṯ-hattēḇā֔h wattā֖rām mēʿa֥l hāʾāֽreṣ. | 4 | fully accented verse version of the above |
implicit ktiv/qre that would be nice to have | |||||
![]() | הִוא | hī | hī | ||
![]() | יְרוּשָׁלִַם | yərūšālayim | yərūšālayim | ||
![]() | יְרוּשָׁלִָם | yərūšālāyim | yərūšālāyim | pausal form | |
![]() | יְרוּשָׁלֲמָה | yərūšālaymā | yərūšālaymā | ||
![]() | יְרוּשָׁלֳמָה | yərūšālāymā | yərūšālāymā | ||
ktiv male tests | |||||
![]() | חַיָּיב | ḥayyāḇ | ḥayyāḇ | ||
![]() | חַוָּוה | ḥawwā | ḥawwā | ||
![]() | הֱוֵוה | hĕwē | hĕwē | ||
![]() | הַיְינוּ | haynū | haynū | ||
![]() | הִתְכַּוְּונוּ | hiṯkawwənū | hiṯkawwənū | ||
![]() | גַּוְונָא | gawnā | gawnā | ||
![]() | מְייוּחָד | məyūḥāḏ | məyūḥāḏ | there is no way to tell that it really should be məyuḥāḏ, but anyway this test is for the double yod | |
![]() | כְּדַאי | kəḏay | kəḏay | ||
![]() | כּוּלָּם | kullām | kullām | shuruk does not necessarily imply a long vowel | |
![]() | קִידּוּשׁ | qiddūš | qiddūš | chiriq male does not necessarily imply a long vowel |
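
The "Differs at" column appears to give the 1-based position of the first character at which Expected and Actual diverge (for example, 4 for turkiz against tur'kiz in the modern tests). The following is a minimal sketch of that comparison in plain Lua. It is not the test framework's code: `utf8_chars` and `first_difference` are illustrative names, and characters are counted as UTF-8 code points, so a decomposed letter with a combining mark would count as two.

```lua
-- Split a UTF-8 string into a list of code points (as byte strings).
-- Assumes well-formed UTF-8 without NUL bytes.
local function utf8_chars(s)
    local chars = {}
    for ch in s:gmatch("[\1-\127\194-\244][\128-\191]*") do
        chars[#chars + 1] = ch
    end
    return chars
end

-- Return the 1-based index of the first differing character,
-- or nil when the two strings are identical.
local function first_difference(expected, actual)
    local a, b = utf8_chars(expected), utf8_chars(actual)
    for i = 1, math.max(#a, #b) do
        if a[i] ~= b[i] then
            return i
        end
    end
    return nil
end

print(first_difference("turkiz", "tur'kiz")) --> 4
print(first_difference("makom", "makom"))    --> nil
```

When one string is a prefix of the other, the sketch reports the position just past the shorter string.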
| | Text | Expected | Actual | Differs at | Comments |
---|---|---|---|---|---|
![]() | מַקְלֵעַ | maklea' | maklea' | ||
![]() | אַבְּסוּרְד | 'ab'sur'd | 'ab'sur'd | not sure about what should be expected here | |
![]() | בִּיּוֹמֶטְרִיָּה | biyometriya | biyometriya | ||
![]() | קַפְרִיסִין | kafrisin | kafrisin | ||
![]() | חֹרֶף | khoref | khoref | ||
![]() | טוּרְקִיז | turkiz | tur'kiz | 4 | |
![]() | טַחַב | takhav | takhav | ||
![]() | יִוָּלֵד | yivaled | yivaled | ||
![]() | יָקִינְתּוֹן | yakinton | yakinton | ||
![]() | כֻּתְנָה | kutna | kutna | ||
![]() | נַגָּרִיָּה | nagariya | nagariya | ||
![]() | נַעֲלֶה | na'ale | na'ale | ||
![]() | מִצְווֹת | mitsvot | mitsvot | ||
![]() | מָקוֹם | makom | makom | ||
![]() | פֶּרוּאָנִי | peru'ani | peru'ani | ||
![]() | צִדְפָּה | tsidpa | tsidpa | ||
![]() | תׇּכְנָה | tokhna | tokhna | ||
![]() | רְאוּ | r'u | r'u | ||
![]() | גּ׳וּק | juk | juk | ||
![]() | ג׳וּק | juk | juk | ||
![]() | גִּ׳ירָאפָה | jirafa | jirafa | ||
![]() | גִ׳ירָאפָה | jirafa | jirafa | ||
![]() | זַ׳רְגוֹן | zhargon | zhargon | ||
![]() | קַפּוּצִ׳ינוֹ | kapuchino | kapuchino | ||
![]() | סְקוֹץ׳ | s'koch | s'koch | ||
![]() | סְתוֹם תַּ׳פֶּה | s'tom ta′pe | s'tom ta′pe | ||
![]() | אִמָּא׳לֶה | 'ima′le | 'ima′le | ||
![]() | חָזָ״ל | khaza″l | khaza″l | ||
![]() | נַחַ״ל | nakha″l | nakha″l | ||
![]() | רה״מ | rh″m | rh″m | ||
![]() | ב״ה | b″h | b″h | ||
![]() | ת״א | t″' | t″' |
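
The modern-style forms in this second table can be read as a post-processing of the scholarly output: doubled consonants are collapsed and each scholarly symbol is mapped to a simpler Latin spelling. The following is a minimal standalone sketch in plain Lua under those assumptions, not the module's own code. The mapping table is a small hypothetical subset inferred from the test rows, `to_modern` is an illustrative name, and non-ASCII characters are written as UTF-8 byte escapes so that the example does not depend on `mw.ustring`.

```lua
local macron_above = "\204\132" -- U+0304 COMBINING MACRON
local macron_below = "\204\177" -- U+0331 COMBINING MACRON BELOW
local a_macron = "\196\129"     -- U+0101 (a with macron)
local i_macron = "\196\171"     -- U+012B (i with macron)

-- Hypothetical subset of a scholarly-to-modern mapping.
local to_modern_map = {
    ["q"] = "k",
    ["w"] = "v",
    ["b" .. macron_below] = "v",
    ["k" .. macron_below] = "kh",
    ["p" .. macron_above] = "f",
    [a_macron] = "a",
    [i_macron] = "i",
}

local utf8_char = "^[\1-\127\194-\244][\128-\191]*"
local combining_macron = "^\204[\132\177]"

local function to_modern(text)
    -- Collapse doubled ASCII letters (gemination is not shown in this style).
    text = text:gsub("(%a)%1", "%1")
    local out, i = {}, 1
    while i <= #text do
        -- Take one UTF-8 character, then look for a combining macron after it.
        local ch = text:match(utf8_char, i) or text:sub(i, i)
        local mark = text:match(combining_macron, i + #ch)
        if mark and to_modern_map[ch .. mark] then
            out[#out + 1] = to_modern_map[ch .. mark]
            i = i + #ch + #mark
        else
            out[#out + 1] = to_modern_map[ch] or ch
            i = i + #ch
        end
    end
    return table.concat(out)
end

-- "qap" + macron above + "r" + i-macron + "s" + i-macron + "n"
print(to_modern("qap" .. macron_above .. "r" .. i_macron .. "s" .. i_macron .. "n")) --> kafrisin
print(to_modern("nagg" .. a_macron .. "riyy" .. a_macron))                           --> nagariya
```

Looking up the letter together with its combining mark before falling back to the bare letter keeps a spirantized consonant from being split into two unrelated symbols.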
local export = {}
local U = require("Module:string/char")
local gsub = mw.ustring.gsub
--[[
-- Uncomment this to redefine gsub so that it prints to the Lua log
-- the names of the code points in the replacements it's making.
local function print_code_point_names(text)
	if not text then return "" end
	local names = require "Module:array"()
	for cp in mw.ustring.gcodepoint(text) do
		names:insert(require "Module:Unicode data".lookup_name(cp))
	end
	return names:concat ", "
end
local actual_gsub = mw.ustring.gsub
local gsub = function(...)
	local old, pattern, repl = ...
	local new, count = actual_gsub(...)
	if old ~= new then
		mw.log(table.concat({
			print_code_point_names(old),
			print_code_point_names(new),
			pattern,
			tostring(repl)
		}, "\n") .. "\n")
	end
	return new, count
end
--]]
local sheva = U(0x05B0)
local hataf_segol = U(0x05B1)
local hataf_patah = U(0x05B2)
local hataf_qamats = U(0x05B3)
local hiriq = U(0x05B4)
local tsere = U(0x05B5)
local segol = U(0x05B6)
local patah = U(0x05B7)
local qamats = U(0x05B8)
local qamats_qatan = U(0x05C7)
local holam = U(0x05B9)
local holam_haser_for_waw = U(0x05BA)
local qubuts = U(0x05BB)
local dagesh_mappiq = U(0x05BC)
local shin_dot = U(0x05C1)
local sin_dot = U(0x05C2)
local macron_above = U(0x0304)
local macron_below = U(0x0331)
local macron = "[" .. macron_above .. macron_below .. "]"
local alef = "א"
local he = "ה"
local waw = "ו"
local yod = "י"
local vowel_letters = alef .. he .. waw .. yod
local vowel_letter = "[" .. vowel_letters .. "]"
-- '0' represents silent sheva
local vowel_points = (
sheva .. hataf_segol .. hataf_patah .. hataf_qamats .. hiriq .. tsere ..
segol .. patah .. qamats .. qamats_qatan .. holam .. qubuts .. '0' ..
holam_haser_for_waw
)
local vowel_point = "[" .. vowel_points .. "]"
local short_vowels = segol .. patah .. hiriq .. qubuts .. qamats_qatan
local short_vowel = "[" .. short_vowels .. "]"
local shuruq = waw .. dagesh_mappiq
local holam_male = waw .. holam
-- use dummy characters that do not match as punctuation
-- the dummy letter stands in for final silent alef or he, or for the hiatus before a furtive patah,
-- or comes before a pre-transliterated waw to aid in matching
local dummy_letter = U(0x0627) -- ARABIC LETTER ALEF
local dummy_geresh = U(0x064E) -- ARABIC FATHA
local dummy_gershayim = U(0x064B) -- ARABIC FATHATAN
local real_geresh = '׳'
local real_gershayim = '״'
local letter_modifier = "??"
local letters = "אבגדהוזחטיכךלמםנןסעפףצץקרשת"
local letter = "[" .. letters .. "]" .. letter_modifier
local letter_not_waw = "" .. letter_modifier
local gutturals = "אהחע"
local guttural = "[" .. gutturals .. "]"
local vowel_letter_or_geresh = ""
-- note, the geresh and gershayim are included in this, which is why dummies are used in their place
local word_break_chars = "%s%p"
local word_break = "[" .. word_break_chars .. "]"
local word_start = "%f[^" .. word_break_chars .. "]" -- matches the boundary but not the actual word break characters
local word_end = "%f[" .. word_break_chars .. "]" -- matches the boundary but not the actual word break characters
local tr_vowels = "aeiouāēīōūəăĕŏ0"
local biblical_to_modern = {
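	['ʾ'] = '\'',
	['b' .. macron_below] = 'v',
	['g' .. macron_above] = 'g',
	['d' .. macron_below] = 'd',
	['w'] = 'v',
	['ž'] = 'zh',
	['ḥ'] = 'kh',
	['ṭ'] = 't',
	['k' .. macron_below] = 'kh',
	['ʿ'] = '\'',
	['p' .. macron_above] = 'f',
	['ṣ'] = 'ts',
	['č'] = 'ch',
	['q'] = 'k',
	['š'] = 'sh',
	['ś'] = 's',
	['t' .. macron_below] = 't',
	['ə'] = '\'',
	['ĕ'] = 'e',
	['ă'] = 'a',
	['ŏ'] = 'o',
	['ī'] = 'i',
	['ē'] = 'e',
	['ā'] = 'a',
	['ō'] = 'o',
	['ū'] = 'u',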
}
-- helper function to remove vowel letters but keep gereshes
local function gereshes(str)
	return gsub(str, vowel_letter, '')
end
local biblical = {
{
-- replace geresh and gershayim with their dummy equivalents so that they won't match as word boundaries
= dummy_geresh,
= dummy_gershayim,
},
{
-- The default order is: consonant, vowel point, dagesh or mappiq, shin or sin dot.
-- The desired order is: consonant, shin or sin dot, dagesh or mappiq, vowel point.
-- Also, move geresh and gershayim closer to the letter for easier handling (will be moved back later if not actually a modifier)
)(" .. vowel_point .. "*)(" .. dagesh_mappiq .. "*)(*)(*)"] = "%1%4%5%3%2",
},
{
-- special case: change qamats in כל to qamats qatan
-- the problem is that כל might be preceded by prefixed clitics, which may be chained indefinitely,
-- while other unrelated words might happen to end in כל with a qamats gadol; therefore, match either
-- the entire word or only when preceded by a precisely recognized prefix
= "%1" .. qamats_qatan .. "%2",
" .. dagesh_mappiq .. "?" .. patah .. "כ" .. dagesh_mappiq .. ")" .. qamats .. "(ל)" .. word_end] = "%1" .. qamats_qatan .. "%2",
= "%1" .. qamats_qatan .. "%2",
כ" .. dagesh_mappiq .. ")" .. qamats .. "(ל)" .. word_end] = "%1" .. qamats_qatan .. "%2", -- patah is very archaic
" .. dagesh_mappiq .. "?" .. sheva .. "כ)" .. qamats .. "(ל)" .. word_end] = "%1" .. qamats_qatan .. "%2",
},
{
-- remove final alef and he, but only when preceded by a vowel
" .. word_end] = "%1" .. dummy_letter,
" .. word_end] = "%1" .. dummy_letter,
},
{
-- these are the cases, other than the above, where a final letter should be ignored
" .. word_end] = "ī",
)" .. vowel_letter_or_geresh .. "-" .. word_end] = "%1",
)" .. vowel_letter_or_geresh .. "-" .. word_end] = "%1",
},
{
= "0%1" .. sheva, -- two shevas in a row
= "%10", -- after a short vowel, assume(!) a silent sheva
= "%10", -- gutturals cannot have a vocal sheva
= "%1" .. dummy_letter .. "ww", -- when waw + dagesh is not a shuruq
= "%1" .. dummy_letter .. "ww%2", -- when waw + dagesh is not a shuruq
= "%1" .. dummy_letter .. "w" .. holam, -- when waw + holam is not a holam male
)" .. dagesh_mappiq] = "%1", -- handle mappiq (very rarely occurs on an alef)
},
{
= shuruq .. "ww", -- another potential case when waw + dagesh is not a shuruq
= shuruq .. "w" .. holam, -- another potential case when waw + holam is not a holam male
-- tentatively lengthen hiriqs with vowel letters
= function(vlg, l) return "ī" .. gereshes(vlg) .. l end,
-- rearrange furtive patach (mappiq should already have been removed, but handle it just in case)
= dummy_letter .. "a%1",
},
{
-- remove vowel letters
= function(l, vlg) return l .. gereshes(vlg) .. shuruq end,
= function(vlg, l) return shuruq .. gereshes(vlg) .. l end,
)"] = function(vlg, l) return shuruq .. gereshes(vlg) .. l end,
= function(vp, vlg, l) return vp .. gereshes(vlg) .. l end,
)"] = function(vp, vlg, l) return vp .. gereshes(vlg) .. l end,
},
{
-- handle two-character combinations first
= 'j',
= 'ž',
' .. dummy_geresh] = 'č',
= 'š',
= 'ś',
},
{
= 'ʾ',
= 'b' .. macron_below,
= 'g' .. macron_above,
= 'd' .. macron_below,
= 'h',
= 'z',
= 'ḥ',
= 'ṭ',
= 'y',
'] = 'k' .. macron_below,
= 'l',
'] = 'm',
'] = 'n',
= 's',
= 'ʿ',
'] = 'p' .. macron_above,
'] = 'ṣ',
= 'q',
= 'r',
= 't' .. macron_below,
},
{
)' .. macron .. '?' .. dagesh_mappiq] = '%1', -- assume(!) dagesh qal at the beginning of a word
()' .. macron .. '?' .. dagesh_mappiq] = '0%1', -- dagesh qal after sheva, and assume(!) silent sheva
= '%1' .. sheva .. '%1', -- vocal sheva between identical consonants
= 'ū',
},
{
-- restore geresh and gershayim order
)(" .. dagesh_mappiq .. "*)(" .. vowel_point .. "*)"] = "%2%3%1",
},
{
-- handle ירושלם
= "ayi", -- in this case, the vowels are reversed by Unicode normalization rules
= "ayi", -- just in case they're in the correct order
= "āyi", -- pausal form of above
= "āyi", -- as above
-- handle ירושלמה
" .. patah] = "ay", -- in this case, the vowels are reversed by Unicode normalization rules
"] = "ay", -- just in case they're in the correct order
" .. qamats] = "āy", -- pausal form of above
"] = "āy", -- as above
},
{
= 'ə',
= 'ĕ',
= 'ă',
= 'ŏ',
= 'i',
= 'ē',
= 'e',
= 'a',
= 'ā',
= 'o',
= 'u',
= '',
= '',
= 'ō',
= 'wō',
},
{
= '%1%1', -- gemination
},
{
(k' .. macron_below .. ')'] = '%1%2', -- special case for יששכר
},
{
= 'o%1', -- assume(!) qamats qatan before silent sheva
= 'ō',
= 'w',
= 'š', -- assume(!) shin if no shin or sin dot
},
{
-- handle bgdkpt letters in unvocalized words (such as acronyms)
-" .. macron .. "-)" .. word_end] = function(w) return gsub(w, "()" .. macron, "%1") end
},
{
"] = "",
-- short vowels in non-final closed syllables (this rule should be expanded)
= "u%1%1",
= "i%1%1",
},
{
= "", -- final sheva is always silent
= '′',
= '″',
= '.', -- sof pasuq
= '-', -- maqaf
},
}
function export.tr(text, lang, sc)
	-- default to modern for Hebrew, but not for other languages, such as Aramaic
	local modern = lang == "he"
	return export.biblical(text, modern)
end
function export.biblical(text, modern)
	-- decompose
	text = mw.ustring.toNFD(text)
	-- wrap with spaces to make initial and final replacements easier
	text = ' ' .. text .. ' '
	for _, replacements in ipairs(biblical) do
		for regex, replacement in pairs(replacements) do
			text = gsub(text, regex, replacement)
		end
	end
	-- unwrap spaces
	text = mw.ustring.match(text, "^ (.*) $")
	if text == nil then error("Something went wrong, wrapped spaces were deleted.") end
	-- must happen before recomposition
	if modern then
		text = gsub(text, "()%1", "%1")
		text = gsub(text, "" .. macron .. "?", function(x) return biblical_to_modern[x] or x end)
		text = gsub(text, "''", "'")
	end
	-- recompose
	text = mw.ustring.toNFC(text)
	return text
end
return export
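
The overall shape of `export.biblical` (wrap the text in spaces, run an ordered list of replacement passes, then unwrap) can be tried outside Scribunto with a minimal plain-Lua sketch. The rules below are hypothetical ASCII stand-ins, not the module's rules, and `apply_passes` is an illustrative name. The `word_end` frontier pattern is written here as `%f[%s%p]`, which is an assumption about the original definition.

```lua
local word_break_chars = "%s%p"
-- Assumed form: matches the boundary before a word break character.
local word_end = "%f[" .. word_break_chars .. "]"

-- Passes run in list order. Rules inside one pass are visited with pairs(),
-- whose order is unspecified, so they must not depend on each other.
local passes = {
    {
        -- pass 1 (hypothetical): drop a word-final h after a vowel
        ["([aeiou])h" .. word_end] = "%1",
    },
    {
        -- pass 2 (hypothetical): independent digraph replacements
        ["sh"] = "š",
        ["ts"] = "ṣ",
    },
}

local function apply_passes(text)
    -- Wrapping in spaces lets the frontier pattern match at the string edges.
    text = " " .. text .. " "
    for _, rules in ipairs(passes) do
        for pattern, replacement in pairs(rules) do
            text = text:gsub(pattern, replacement)
        end
    end
    -- Remove the wrapping spaces again.
    return (text:match("^ (.*) $"))
end

print(apply_passes("torah shalom")) --> tora šalom
```

Without the trailing space, `%f[%s%p]` would not match at the very end of the string, because Lua treats the position past the end as a NUL character, which is neither whitespace nor punctuation.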