Module:ang-pron

Hello, you have come here looking for the meaning of the word Module:ang-pron. In DICTIOUS you will not only get to know all the dictionary meanings for the word Module:ang-pron, but we will also tell you about its etymology, its characteristics and you will know how to say Module:ang-pron in singular and plural. Everything you need to know about the word Module:ang-pron you have here. The definition of the word Module:ang-pron will help you to be more precise and correct when speaking or writing your texts. Knowing the definition ofModule:ang-pron, as well as those of other words, enriches your vocabulary and provides you with more and better linguistic resources.

This module generates IPA for Old English words. There are three entry points:

  • show directly implements Template:ang-IPA, and is meant to be called from that template.
  • phonemic() generates the raw phonemic IPA for Old English text.
  • phonetic() generates the raw phonetic IPA for Old English text.

Test cases can be found in Module:ang-pron/testcases.

The primary documentation for this module can be found in the documentation for Template:ang-IPA.


--[=[

Implementation of pronunciation-generation module from spelling for
Old English.

Author: Benwing

Generally, the user should supply the spelling, properly marked up with
macrons for long vowels, and ċ ġ ċġ sċ for soft versions of these consonants.
In addition, the following symbols can be used:

-- acute accent on a vowel to override the position of primary stress
--   (in a diphthong, put it over the first vowel)
-- grave accent to add secondary stress
-- circumflex to force no stress on the word or prefix (e.g. in a compound)
-- . (period) to force a syllable boundary
-- - (hyphen) to force a prefix/word or word/word boundary in a compound word;
--   the result will be displayed as a single word but the consonants on
--   either side treated as if they occurred at the beginning/end of the word
-- + (plus) is the opposite of -; it forces a prefix/word or word/word boundary
--   to *NOT* occur when it otherwise would
-- _ (underscore) to force the letters on either side to be interpreted
--   independently, when the combination of the two would normally have a
--   special meaning

FIXME:

1. Implement < and > which works like - but don't trigger secondary stress
   (< after a prefix, > before a suffix) (DONE)
2. Recognize -lēas and -l as suffixes. (DONE)
2b. Recognize -fæst, -ful, -full as suffixes (so no voicing of initial
    fricative). (DONE)
3. If explicit syllable boundary in cluster after prefix, don't recognize as
   prefix (hence ġeddung could be written ġed.dung, bedreda bed.reda) (DONE)
4. Two bugs in swīþfèrhþ: missing initial stress, front h should be back (DONE)
5. Check Urszag's code changes for /h/. (DONE)
6. Bug in wasċan; probably sċ between vowels should be ʃʃ (DONE)
7. Bug in ġeddung, doesn't have allowed onset with ġe-ddung (DONE)
8. āxiġendlīc -- x is not an allowed onset (DONE)
9. Handle prefixes/suffixes denoted with initial/final hyphen -- shouldn't
   trigger automatic stress when multisyllabic. (DONE)
10. Don't remove user-specified accents on monosyllabic words. (DONE)
11. Final -þu/-þo after a consonant should always be voiceless (but should be
    overridable). (DONE; OVERRIDABLE THROUGH "EXPLICIT ALLOPHONE" NOTATION)
12. Fricative voiced between voiced sounds even across prefix/compound
    boundary when before (but not after) the boundary. (DONE)
13. Fricative between unstressed vowels should be voiceless (e.g. adesa);
    maybe only after the stress? (DONE)
14. Resonant after fricative/stop in a given syllable should be rendered
    as syllabic (e.g. ādl , botm , bōsm, bēacn ;
	also -mn e.g stemn /ˈstemn̩/. (DONE, BUT REVERSED)
15. Add aġēn- and onġēan- prefixes with secondary stress for verbs.
    (WILL NOT DO)
16. and- (and maybe all others) should be unstressed as verbal prefix.
    andswarian is an exception. (DONE)
17. Support multiple pronunciations as separate numbered params. (DONE)
17b. Additional specifiers should follow each pronun as PRONUN<K:V,K2:V2,...>.
    This includes the current pos=.
18. Double hh should be pronounced as . (DONE)
19. Add -bǣre as a suffix with secondary stress. (DONE)
20. Add -līċ(e), līnes(s) as suffixes with secondary stress. -līnes(s)
    should behave like -līċ(e) in that what's before is checked to determine
	the pos. (DONE)
21. -lēasnes should be a recognized suffix with secondary stress. (DONE)
22. Fix handling of crinċġan, dynċġe, should behave as if ċ isn't there. (DONE)
23. Rewrite to use ]. (DONE)
24. Ignore final period/question mark/exclamation point. (DONE)
25. Implement pos=verbal for handling un-. (DONE)
26. Simplify geminate consonants within a single syllable. (DONE)
27. Implement "explicit allophone" notation, e.g.  . (DONE)
28. ''-sian'' should be voiceless in verbs. (DONE)
29. ''-rian'' should have /j/ after short vowels in verbs. (DONE)
30. BUG: Secondary stress not getting applied in ''-lēas'' (see ]).

QUESTIONS:

1. Should /an/, /on/ be pronounced ? Same for /am/, /om/. 
2. Should final /ɣ/ be rendered as ? 
3. Should word-final double consonants be simplified in phonetic representation?
   Maybe also syllable-final except obstruents before ? 
4. Should we use /x/ instead of /h/? 
5. Should we recognize from- along with fram-? 
6. Should we recognize bi- along with be-? (danger of false positives) 
7. Should fricative be voiced before voicd sound across word boundary?
   (dæġes ēage ?) 
8. Ask about pronunciation of bræġn, is the n syllabic? It's given as
   /ˈbræjn̩/. Similarly, seġl given as /ˈsejl̩/. 
9. Ask about pronunciation of ġeond-, can it be either  or ? 
10. Is final -ol pronounced  e.g regol ? Hundwine has created
    entries this way. What about final -oc etc.? 
11. Is final -ian pronounced  or ? Cf. sċyldigian given as
    {{IPA|ang|/ˈʃyldiɣiɑn/|/ˈʃyldiɣjɑn/}}. What about spyrian given as /ˈspyr.jɑn/?
	
12. seht given as /seçt/ but sehtlian given as /ˈsextliɑn/. Which one is
    correct? 
13. Final -liċ or -līċ, with or without secondary stress?
14. Should we special-case -sian ? Then we need support for  notation
    to override phonetics.
]=]

local strutils = require("Module:string utilities")
local m_table = require("Module:table")
local m_IPA = require("Module:IPA")
local lang = require("Module:languages").getByCode("ang")
local com = require("Module:ang-common")

local u = require("Module:string/char")
local rsubn = mw.ustring.gsub
local rfind = mw.ustring.find
local rmatch = mw.ustring.match
local rsplit = mw.text.split
local rgsplit = mw.text.gsplit
local ulen = mw.ustring.len
local ulower = mw.ustring.lower

-- version of rsubn() that discards all but the first return value
local function rsub(term, foo, bar, n)
	local retval = rsubn(term, foo, bar, n)
	return retval
end

-- like str:gsub() but discards all but the first return value
local function gsub(term, foo, bar, n)
	local retval = term:gsub(foo, bar, n)
	return retval
end

local export = {}

-- When auto-generating primary and secondary stress accents, we use these
-- special characters, and later convert to normal IPA accent marks, so
-- we can distinguish auto-generated stress from user-specified stress.
local AUTOACUTE = u(0xFFF0)
local AUTOGRAVE = u(0xFFF1)

-- When the user uses the "explicit allophone" notation such as  or  to
-- force a particular allophone, we internally convert that notation into a
-- single special character.
local EXPLICIT_TH = u(0xFFF2)
local EXPLICIT_DH = u(0xFFF3)
local EXPLICIT_S = u(0xFFF4)
local EXPLICIT_Z = u(0xFFF5)
local EXPLICIT_F = u(0xFFF6)
local EXPLICIT_V = u(0xFFF7)
local EXPLICIT_G = u(0xFFF8)
local EXPLICIT_GH = u(0xFFF9)
local EXPLICIT_H = u(0xFFFA)
local EXPLICIT_X = u(0xFFFB)
local EXPLICIT_C = u(0xFFFC)
local EXPLICIT_I = u(0xFFFD)

local explicit_cons = EXPLICIT_TH .. EXPLICIT_DH .. EXPLICIT_S .. EXPLICIT_Z ..
	EXPLICIT_F .. EXPLICIT_V .. EXPLICIT_G .. EXPLICIT_GH .. EXPLICIT_H ..
	EXPLICIT_X .. EXPLICIT_C

-- Map "explicit allophone" notation into special char. See above.
local char_to_explicit_char = {
	 = EXPLICIT_TH,
	 = EXPLICIT_DH,
	 = EXPLICIT_S,
	 = EXPLICIT_Z,
	 = EXPLICIT_F,
	 = EXPLICIT_V,
	 = EXPLICIT_G,
	 = EXPLICIT_GH,
	 = EXPLICIT_H,
	 = EXPLICIT_X,
	 = EXPLICIT_C,
	 = EXPLICIT_I,
}

-- Map "explicit allophone" notation into normal spelling, for supporting ann=.
local char_to_spelling = {
	 = "þ",
	 = "þ",
	 = "s",
	 = "s",
	 = "f",
	 = "f",
	 = "g",
	 = "g",
	 = "h",
	 = "h",
	 = "h",
	 = "i",
}

-- Map "explicit allophone" notation into phonemes, for phonemic output.
local explicit_char_to_phonemic = {
	 = "θ",
	 = "θ",
	 = "s",
	 = "s",
	 = "f",
	 = "f",
	 = "ɡ", -- IPA ɡ!
	 = "ɡ", -- IPA ɡ!
	 = "x",
	 = "x",
	 = "x",
	 = "i",
}

-- Map "explicit allophone" notation into IPA phones, for phonetic output.
local explicit_char_to_phonetic = {
	 = "θ",
	 = "ð",
	 = "s",
	 = "z",
	 = "f",
	 = "v",
	 = "ɡ", -- IPA ɡ!
	 = "ɣ",
	 = "h",
	 = "x",
	 = "ç",
	 = "i",
}

local accent = com.MACRON .. com.ACUTE .. com.GRAVE .. com.CFLEX .. AUTOACUTE .. AUTOGRAVE
local accent_c = ""
local stress_accent = com.ACUTE .. com.GRAVE .. com.CFLEX .. AUTOACUTE .. AUTOGRAVE
local stress_accent_c = ""
local back_vowel = "aɑou"
local front_vowel = "eiyæœø" .. EXPLICIT_I
local vowel = back_vowel .. front_vowel
local vowel_or_accent = vowel .. accent
local vowel_c = ""
local vowel_or_accent_c = ""
local non_vowel_c = ""
local front_vowel_c = ""
-- The following include both IPA symbols and letters (including regular g and IPA ɡ)
-- so it can be used at any step of the process.
local obstruent = "bcċçdfgɡɣhkpqstvxzþðθʃʒ" .. explicit_cons
local resonant = "lmnŋrɫ"
local glide = "ġjwƿ"
local cons = obstruent .. resonant .. glide
local cons_c = ""
local voiced_sound = vowel .. "lrmnwjbdɡ" -- WARNING, IPA ɡ used here

-- These rules operate in order, and apply to the actual spelling,
-- after (1) macron decomposition, (2) syllable and prefix splitting,
-- (3) placement of primary and secondary stresses at the beginning
-- of the syllable. Each syllable will be separated either by ˈ
-- (if the following syllable is stressed), by ˌ (if the following
-- syllable has secondary stress), or by . (otherwise). In addition,
-- morpheme boundaries where the consonants on either side should be
-- treated as at the beginning/end of word (i.e. between prefix and
-- word, or between words in a compound word) will be marked with ⁀
-- before the syllable separator, and the beginning and end of text
-- will be marked by ⁀⁀. The output of this is fed into phonetic_rules,
-- and then is used to generate the displayed phonemic pronunciation
-- by removing ⁀ symbols.
local phonemic_rules = {
	{com.MACRON, "ː"},
	{"eoː", "oː"}, -- e.g. ġeōmor
	{"eaː", "aː"},
	{"ː?", {
		-- Alternative notation for short diphthongs: iu̯, eo̯, æɑ̯
		-- Alternative notation for long diphthongs: iːu̯, eːo̯, æːɑ̯
		 = "æ͜ɑ",
		 = "æ͜ɑː",
		 = "e͜o",
		 = "e͜oː",
		 = "i͜u",
		 = "i͜uː",
		 = "i͜y",
		 = "i͜yː",
	}},
	-- sċ between vowels when at the beginning of a syllable should be ʃ.ʃ
	{"(" .. vowel_c .. "ː?)(?)sċ(" .. vowel_c .. ")", "%1ʃ%2ʃ%3"},
	-- other sċ should be ʃ; note that sċ divided between syllables becomes s.t͡ʃ
	{"sċ", "ʃ"},
	-- x between vowels when at the beginning of a syllable should be k.s;
	-- remaining x handled below
	{"(" .. vowel_c .. "ː?)(?)x(" .. vowel_c .. ")", "%1k%2s%3"},
	-- z between vowels when at the beginning of a syllable should be t.s;
	-- remaining z handled below
	{"(" .. vowel_c .. "ː?)(?)z(" .. vowel_c .. ")", "%1t%2s%3"},
	-- short front vowel + -rian, -riend, -rienne, -riende in verb or verbal is
	-- rendered with /j/; we need to carefully change the syllable structure
	-- when doing this
	{"(" .. front_vowel_c .. ")%.ri%.(an⁀)", "%1r.ġ%2", {"verb"}},
	{"(" .. front_vowel_c .. ")%.ri%.(end⁀)", "%1r.ġ%2", {"verb", "verbal"}},
	{"(" .. front_vowel_c .. ")%.ri%.(en%.e⁀)", "%1r.ġ%2", {"verb", "verbal"}},
	{"nċ(?)ġ", "n%1j"},
	{"ċ(?)ġ", "j%1j"},
	{"c(?)g", "g%1g"},
	{"ċ(?)ċ", "t%1t͡ʃ"},
	{".", {
		 = "t͡ʃ",
		 = "k",
		 = "j",
		 = "x",
		 = "θ",
		 = "θ",
		 = "w",
		 = "ks",
		 = "ts",
		 = "ɡ", -- map to IPA ɡ
		 = "ɑ",
		 = "ø",
	}},
}

local fricative_to_voiced = {
	 = "v",
	 = "z",
	 = "ð",
}

local fricative_to_unvoiced = {
	 = "f",
	 = "s",
	 = "θ",
}

-- These rules operate in order, on the output of phonemic_rules.
-- The output of this is used to generate the displayed phonemic
-- pronunciation by removing ⁀ symbols.
local phonetic_rules = {
	-- Fricative voicing between voiced sounds. Note, the following operates
	-- across a ⁀ boundary for a fricative before the boundary but not after.
	{"(*)()(*)",
		function(s1, c, s2)
			return s1 .. fricative_to_voiced .. s2
		end
	},
	-- Fricative between unstressed vowels should be devoiced.
	-- Note that unstressed syllables are preceded by . while stressed
	-- syllables are preceded by a stress mark.
	{"(%.*%.)()",
		function(s1, c)
			return s1 .. fricative_to_unvoiced
		end
	},
	-- Final -sian, -siend, -sienne, -siende (and variants such as -siġan,
	-- -siġend, etc.) in verb or verbal is rendered with ; clǣnsian will
	-- have to be special-cased with ''''
	{"(" .. cons_c .. "ː?" .. "%.)z(i%.j?ɑn⁀)", "%1s%2", {"verb"}},
	{"(" .. cons_c .. "ː?" .. "%.)z(i%.j?end⁀)", "%1s%2", {"verb", "verbal"}},
	{"(" .. cons_c .. "ː?" .. "%.)z(i%.j?en%.e⁀)", "%1s%2", {"verb", "verbal"}},
	-- Final unstressed -þu/-þo after a consonant should be devoiced.
	{"(" .. cons_c .. "ː?" .. "%.)ð(⁀)",
		function(s1, s2)
			return s1 .. "θ" .. s2
		end
	},
	{"x", {
		 = "ʍ",
		 = "l̥",
		 = "n̥",
		 = "r̥",
	}},
	-- Note, the following will not operate across a ⁀ boundary.
	{"n(?)", "ŋ%1"}, -- WARNING, IPA ɡ used here
	{"n(?)j", "n%1d͡ʒ"},
	{"j(?)j", "d%1d͡ʒ"},
	{"()x", "%1h"},      --  occurs as a syllable-initial allophone
	{"(" .. front_vowel_c .. ")x", "%1ç"}, --  occurs after front vowels
	-- An IPA ɡ after a word/prefix boundary, after another ɡ or after n
	-- (previously converted to ŋ in this circumstance) should remain as ɡ,
	-- while all other ɡ's should be converted to ɣ except that word-final ɡ
	-- becomes x. We do this by converting the ɡ's that should remain to regular
	-- g (which should never occur otherwise), convert the remaining IPA ɡ's to ɣ
	-- or x, and then convert the regular g's back to IPA ɡ.
	{"ɡ(?)ɡ", "g%1g"}, -- WARNING, IPA ɡ on the left, regular g on the right
	{"()(?)ɡ", "%1%2g"}, -- WARNING, IPA ɡ on the left, regular g on the right 
	{"ɡ", "ɣ"},
	{"g", "ɡ"}, -- WARNING, regular g on the left, IPA ɡ on the right
	{"l(?)l", "ɫ%1ɫ"},
	{"r(?)r", "rˠ%1rˠ"},
	{"l(?" .. cons_c .. ")", "ɫ%1"},
	{"r(?" .. cons_c .. ")", "rˠ%1"},
	-- Geminate consonants within a single syllable are pronounced singly.
	-- Does not apply e.g. to ''ǣttren'', which will be divided as ''ǣt.tren''.
	{"(" .. cons_c .. ")%1", "%1"},
	{"rˠrˠ", "rˠ"},
	-- [In the sequence vowel + obstruent + resonant in a single syllable,
	-- the resonant should become syllabic, e.g. ādl , blōstm ,
	-- fæþm , bēacn . We allow anything but a syllable or word
	-- boundary betweent the vowel and the obstruent.] [BASED ON INPUT FROM
	-- ], I'VE DECIDE AGAINST THIS]
	-- {"(" .. vowel_c .. "*ː?)", "%1" .. com.SYLLABIC},
	-- also -mn e.g stemn /ˈstemn̩/; same for m + other resonants except m
	-- {"(" .. vowel_c .. "*mː?)", "%1" .. com.SYLLABIC},
	{".", explicit_char_to_phonetic},
}

local function apply_rules(word, rules, pos)
	for _, rule in ipairs(rules) do
		local allowed_pos = rule
		if not allowed_pos or m_table.contains(allowed_pos, pos) then
			word = rsub(word, rule, rule)
		end
	end
	return word
end

local function lookup_stress_spec(stress_spec, pos)
	return stress_spec or (pos == "verbal" and stress_spec) or nil
end

local function split_on_word_boundaries(word, pos)
	local retparts = {}
	local parts = strutils.split(word, "()")
	local i = 1
	local saw_primary_stress = false
	while i <= #parts do
		local split_part = false
		local insert_position = #retparts + 1
		if parts ~= "<" and parts ~= ">" then
			-- Split off any prefixes.
			while true do
				local broke_prefix = false
				for _, prefixspec in ipairs(com.prefixes) do
					local prefix_pattern = prefixspec
					local stress_spec = prefixspec
					local pos_stress = lookup_stress_spec(stress_spec, pos)
					local prefix, rest = rmatch(parts, "^(" .. prefix_pattern .. ")(.*)$")
					if prefix then
						if not pos_stress then
							-- prefix not recognized for this POS, don't split here
						elseif stress_spec.restriction and not rfind(rest, stress_spec.restriction) then
							-- restriction not met, don't split here
						elseif rfind(rest, "^%+") then
							-- explicit non-boundary here, so don't split here
						elseif not rfind(rest, vowel_c) then
							-- no vowels, don't split here
						elseif rfind(rest, "^..?$") then
							-- only two letters, unlikely to be a word, probably an ending, so don't split
							-- here
						else
							local initial_cluster, after_cluster = rmatch(rest, "^(" .. non_vowel_c .. "*)(.-)$")
							if rfind(initial_cluster, "..") and (
								not (com.onsets_2 or com.secondary_onsets_2 or
									com.onsets_3)) then
								-- initial cluster isn't a possible onset, don't split here
							elseif rfind(initial_cluster, "^x") then
								-- initial cluster isn't a possible onset, don't split here
							elseif rfind(after_cluster, "^" .. vowel_c .. "$") then
								-- remainder is a cluster + short vowel,
								-- unlikely to be a word so don't split here
							else
								-- break the word in two; next iteration we process
								-- the rest, which may need breaking again
								parts = rest
								if pos_stress == "unstressed" then
									-- don't do anything
								elseif pos_stress == "secstressed" or (saw_primary_stress and pos_stress == "stressed") then
									prefix = rsub(prefix, "(" .. vowel_c .. ")", "%1" .. AUTOGRAVE, 1)
								elseif pos_stress == "stressed" then
									prefix = rsub(prefix, "(" .. vowel_c .. ")", "%1" .. AUTOACUTE, 1)
									saw_primary_stress = true
								else
									error("Unrecognized stress spec for pos=" .. pos .. ", prefix=" .. prefix .. ": " .. pos_stress)
								end
								table.insert(retparts, insert_position, prefix)
								insert_position = insert_position + 1
								broke_prefix = true
								break
							end
						end
					end
				end
				if not broke_prefix then
					break
				end
			end

			-- Now do the same for suffixes.
			while true do
				local broke_suffix = false
				for _, suffixspec in ipairs(com.suffixes) do
					local suffix_pattern = suffixspec
					local stress_spec = suffixspec
					local pos_stress = lookup_stress_spec(stress_spec, pos)
					local rest, suffix = rmatch(parts, "^(.-)(" .. suffix_pattern .. ")$")
					if suffix then
						if not pos_stress then
							-- suffix not recognized for this POS, don't split here
						elseif stress_spec.restriction and not rfind(rest, stress_spec.restriction) then
							-- restriction not met, don't split here
						elseif rfind(rest, "%+$") then
							-- explicit non-boundary here, so don't split here
						elseif not rfind(rest, vowel_c) then
							-- no vowels, don't split here
						else
							local before_cluster, final_cluster = rmatch(rest, "^(.-)(" .. non_vowel_c .. "*)$")
							if rfind(final_cluster, "%..") then
								-- syllable division within or before final
								-- cluster, don't split here
							else
								-- break the word in two; next iteration we process
								-- the rest, which may need breaking again
								parts = rest
								if pos_stress == "unstressed" then
									-- don't do anything
								elseif pos_stress == "secstressed" then
									suffix = rsub(suffix, "(" .. vowel_c .. ")", "%1" .. AUTOGRAVE, 1)
								elseif pos_stress == "stressed" then
									error("Primary stress not allowed for suffixes (suffix=" .. suffix .. ")")
								else
									error("Unrecognized stress spec for pos=" .. pos .. ", suffix=" .. suffix .. ": " .. pos_stress)
								end
								table.insert(retparts, insert_position, suffix)
								broke_suffix = true
								break
							end
						end
					end
				end
				if not broke_suffix then
					break
				end
			end
		end

		local acc = rfind(parts, "(" .. stress_accent_c .. ")")
		if acc == com.CFLEX then
			-- remove circumflex but don't accent
			parts = gsub(parts, com.CFLEX, "")
		elseif acc == com.ACUTE or acc == AUTOACUTE then
			saw_primary_stress = true
		elseif not acc and parts ~= "<" and parts ~= ">" then
			-- Add primary or secondary stress on the part; primary stress if no primary
			-- stress yet, otherwise secondary stress.
			acc = saw_primary_stress and AUTOGRAVE or AUTOACUTE
			saw_primary_stress = true
			parts = rsub(parts, "(" .. vowel_c .. ")", "%1" .. acc, 1)
		end
		table.insert(retparts, insert_position, parts)
		i = i + 2
	end

	-- remove any +, which has served its purpose
	for i, part in ipairs(retparts) do
		retparts = gsub(part, "%+", "")
	end
	return retparts
end

local function break_vowels(vowelseq)
	local function check_empty(char)
		if char ~= "" then
			error("Something wrong, non-vowel '" .. char .. "' seen in vowel sequence '" .. vowelseq .. "'")
		end
	end

	local vowels = {}
	local chars = strutils.split(vowelseq, "(" .. vowel_c .. accent_c .. "*)")
	local i = 1
	while i <= #chars do
		if i % 2 == 1 then
			check_empty(chars)
			i = i + 1
		else
			if i < #chars - 1 and com.diphthongs[
				rsub(chars, stress_accent_c, "") .. rsub(chars, stress_accent_c, "")
			] then
				check_empty(chars)
				table.insert(vowels, chars .. chars)
				i = i + 3
			else
				table.insert(vowels, chars)
				i = i + 1
			end
		end
	end
	return vowels
end

-- Break a word into alternating C and V components where a C component is a run
-- of zero or more consonants and a V component in a single vowel or dipthong.
-- There will always be an odd number of components, where all odd-numbered
-- components (starting from 1) are C components and all even-numbered components
-- are V components.
local function break_into_c_and_v_components(word)
	local cons_vowel = strutils.split(word, "(" .. vowel_or_accent_c .. "+)")
	local components = {}
	for i = 1, #cons_vowel do
		if i % 2 == 1 then
			table.insert(components, cons_vowel)
		else
			local vowels = break_vowels(cons_vowel)
			for j = 1, #vowels do
				if j == 1 then
					table.insert(components, vowels)
				else
					table.insert(components, "")
					table.insert(components, vowels)
				end
			end
		end
	end
	return components
end

local function split_into_syllables(word)
	local cons_vowel = break_into_c_and_v_components(word)
	if #cons_vowel == 1 then
		return cons_vowel
	end
	for i = 1, #cons_vowel do
		if i % 2 == 1 then
			-- consonant
			local cluster = cons_vowel
			local len = ulen(cluster)
			if i == 1 then
				cons_vowel = cluster .. cons_vowel
			elseif i == #cons_vowel then
				cons_vowel = cons_vowel .. cluster
			elseif rfind(cluster, "%.") then
				local before_break, after_break = rmatch(cluster, "^(.-)%.(.*)$")
				cons_vowel = cons_vowel .. before_break
				cons_vowel = after_break .. cons_vowel
			elseif len == 0 then
				-- do nothing
			elseif len == 1 then
				cons_vowel = cluster .. cons_vowel
			elseif len == 2 then
				local c1, c2 = rmatch(cluster, "^(.)(.)$")
				if c1 == "s" and c2 == "ċ" then
					cons_vowel = "sċ" .. cons_vowel
				else
					cons_vowel = cons_vowel .. c1
					cons_vowel = c2 .. cons_vowel
				end
			else
				-- check for onset_3 preceded by consonant(s).
				local first, last3 = rmatch(cluster, "^(.-)(...)$")
				if #first > 0 and com.onsets_3 then
					cons_vowel = cons_vowel .. first
					cons_vowel = last3 .. cons_vowel
				else
					local first, last2 = rmatch(cluster, "^(.-)(..)$")
					if com.onsets_2 or (com.secondary_onsets_2 and not first:find("$")) then
						cons_vowel = cons_vowel .. first
						cons_vowel = last2 .. cons_vowel
					else
						local first, last = rmatch(cluster, "^(.-)(.)$")
						cons_vowel = cons_vowel .. first
						cons_vowel = last .. cons_vowel
					end
				end
			end
		end
	end

	local retval = {}
	for i = 1, #cons_vowel do
		if i % 2 == 0 then
			-- remove any stray periods.
			table.insert(retval, rsub(cons_vowel, "%.", ""))
		end
	end
	return retval
end

-- Combine syllables into a word, moving stress markers (acute/grave) to the
-- beginning of the syllable.
local function combine_syllables_moving_stress(syllables, no_auto_stress)
	local modified_syls = {}
	for i, syl in ipairs(syllables) do
		if syl:find(com.ACUTE) or syl:find(AUTOACUTE) and not no_auto_stress then
			syl = "ˈ" .. syl
		elseif syl:find(com.GRAVE) or syl:find(AUTOGRAVE) and not no_auto_stress then
			syl = "ˌ" .. syl
		elseif i > 1 then
			syl = "." .. syl
		end
		syl = rsub(syl, stress_accent_c, "")
		table.insert(modified_syls, syl)
	end
	return table.concat(modified_syls)
end

-- Combine word parts (split-off prefixes, suffixes or parts of a compound word)
-- into a single word. Separate parts with ⁀ and the put ⁀⁀ at word boundaries.
local function combine_parts(parts)
	local text = {}
	for i, part in ipairs(parts) do
		if i > 1 and not rfind(part, "^") then
			-- Need a syllable boundary if there isn't a stress marker.
			table.insert(text, "." .. part)
		else
			table.insert(text, part)
		end
	end
	return "⁀⁀" .. table.concat(text, "⁀") .. "⁀⁀"
end

local function transform_word(word, pos, no_auto_stress)
	word = com.decompose(word)
	local parts = split_on_word_boundaries(word, pos)
	for i, part in ipairs(parts) do
		local syllables = split_into_syllables(part)
		parts = combine_syllables_moving_stress(syllables,
			no_auto_stress or (#parts == 1 and #syllables == 1))
	end
	return combine_parts(parts)
end

local function default_pos(word, pos)
	if not pos then
		-- verbs in -an/-ōn/-ēon, inflected infinitives in -enne
		if rfind(word, "n$") or rfind(word, "ēon$") or rfind(word, "enne$") then
			pos = "verb"
		else
			-- adjectives in -līċ, adverbs in -līċe and nouns in -nes can follow
			-- nouns or participles (which are "verbal"); truncate the ending
			-- and check what precedes
			word = rsub(word, "^(.*" .. vowel_c .. ".*)le?$", "%1")
			word = rsub(word, "^(.*" .. vowel_c .. ".*)nss?$", "%1")
			-- participles in -end(e)/-en/-ed/-od, verbal nouns in -ing/-ung
			if rfind(word, "ende?$") or rfind(word, "d$") or rfind(word, "en$")
				or rfind(word, "ng$") then
				pos = "verbal"
			else
				pos = "noun"
			end
		end
	elseif pos == "adj" or pos == "adjective" then
		pos = "noun"
	elseif pos ~= "noun" and pos ~= "verb" and pos ~= "verbal" then
		error("Unrecognized part of speech: " .. pos)
	end
	return pos
end

local function generate_phonemic_word(word, pos)
	word = gsub(word, "$", "")
	word = rsub(word, "%", char_to_explicit_char)
	pos = default_pos(word, pos)
	local is_prefix_suffix
	if word:find("^%-") or word:find("%-$") then
		is_prefix_suffix = true
		word = gsub(word, "^%-?(.-)%-?$", "%1")
	end
	word = transform_word(word, pos, is_prefix_suffix)
	word = apply_rules(word, phonemic_rules, pos)
	return word, pos
end

function export.phonemic(text, pos)
	if type(text) == "table" then
		pos = text.args
		text = text
	end
	local result = {}
	text = ulower(text)
	for word in rgsplit(text, " ") do
		local phonemic, respos = generate_phonemic_word(word, pos)
		table.insert(result, phonemic)
	end
	result = table.concat(result, " ")
	result = rsub(result, ".", explicit_char_to_phonemic)
	return gsub(result, "⁀", "")
end

function export.phonetic(text, pos)
	if type(text) == "table" then
		pos = text.args
		text = text
	end
	local result = {}
	text = ulower(text)
	for word in rgsplit(text, " ") do
		local phonemic, respos = generate_phonemic_word(word, pos)
		word = apply_rules(phonemic, phonetic_rules, respos)
		table.insert(result, word)
	end
	return gsub(table.concat(result, " "), "⁀", "")
end

function export.show(frame)
	local parent_args = frame:getParent().args
	local params = {
		 = { required = true, default = "hlǣf-dīġe", list = true },
		 = {},
		 = {},
	}
	local args = require("Module:parameters").process(parent_args, params)

	local IPA_args = {}
	for _, arg in ipairs(args) do
		local phonemic = export.phonemic(arg, args.pos)
		local phonetic = export.phonetic(arg, args.pos)
		table.insert(IPA_args, {pron = '/' .. phonemic .. '/'})
		if phonemic ~= phonetic then
			table.insert(IPA_args, {pron = ''})
		end
	end

	local anntext
	if args.ann == "1" then
		anntext = {}
		for _, arg in ipairs(args) do
			-- remove all spelling markup except ġ/ċ and macrons
			arg = rsub(com.decompose(arg), "", "")
			arg = rsub(arg, "%", char_to_spelling)
			m_table.insertIfNot(anntext, "'''" .. arg .. "'''")
		end
		anntext = table.concat(anntext, ", ") .. ":&#32;"
	elseif args.ann then
		anntext = "'''" .. args.ann .. "''':&#32;"
	else
		anntext = ""
	end

	return anntext .. m_IPA.format_IPA_full { lang = lang, items = IPA_args }
end

return export