@Erutuon Hello, hello. I thought you might enjoy a break from your normal work to do an easy side project. A while ago, while working on the long-stalled Sanskrit declension module at Module:User:JohnC5/Sandbox, I realized that doing the entire module in Devanagari was too annoying given its alphabetic properties. Also, someone (I don't recall who) asked me to make this module extensible to other scripts. This led me to the realization that I should have the internals of the module be in IAST, with transliterators and reverse-transliterators at either end. I was wondering whether you could either make me an IAST to Devanagari transliterator or make a module-accessible entry point to this code. The one extra requirement would be for this to ignore accents on the letters (so ápas would become अपस्). Theoretically, adding coverage for other scripts in the future would just involve creating transliterators for those scripts and plugging them in. Does this make more sense as a separate module or as functionality of this one? Thanks. —JohnC5 18:20, 30 August 2017 (UTC)
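To make the accent requirement concrete, here is a minimal sketch in Python rather than Scribunto Lua, so it is an illustration only and not the module's code. It strips acute and grave accents by decomposing the text, then transliterates a deliberately tiny subset of IAST. The names strip_accents and to_devanagari and the small mapping tables are hypothetical; digraphs such as kh or ai and most letters are not handled.

```python
import unicodedata

ACCENTS = {"\u0301", "\u0300"}  # combining acute and grave (accent marks)
VIRAMA = "\u094d"

# Hypothetical, deliberately incomplete tables: just enough for the example.
CONSONANTS = {"k": "क", "t": "त", "d": "द", "n": "न", "p": "प",
              "m": "म", "r": "र", "v": "व", "s": "स"}
INDEPENDENT = {"a": "अ", "ā": "आ", "i": "इ", "u": "उ"}
MATRAS = {"a": "", "ā": "ा", "i": "ि", "u": "ु"}  # inherent a has no sign


def strip_accents(iast: str) -> str:
    """Drop accent marks but keep macrons, underdots and other diacritics."""
    decomposed = unicodedata.normalize("NFD", iast)
    kept = "".join(ch for ch in decomposed if ch not in ACCENTS)
    return unicodedata.normalize("NFC", kept)


def to_devanagari(iast: str) -> str:
    """Transliterate the supported subset of IAST, ignoring accents."""
    out = []
    after_consonant = False
    for ch in strip_accents(iast):
        if ch in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # consonant cluster: suppress inherent a
            out.append(CONSONANTS[ch])
            after_consonant = True
        elif ch in INDEPENDENT:
            out.append(MATRAS[ch] if after_consonant else INDEPENDENT[ch])
            after_consonant = False
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    if after_consonant:
        out.append(VIRAMA)  # word-final consonant
    return "".join(out)


print(to_devanagari("ápas"))
```

Because the accent is removed before any lookup, the tables only need unaccented keys, which is the main point of doing the stripping as a separate first step.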
@Svartava, Kutchkutch There were problems in the Latn-to-Deva conversion of short vowels and retroflex laterals, affecting inflection tables and PoS headers. I haven't investigated the question of short vowels in Prakrit in the Kannada script: I haven't found much evidence beyond the fact that the Kannada script was using the long vowel symbols for Sanskrit /e/ and /o/ by the end of the 19th century, so I haven't tested its handling of short vowels. --RichardW57 (talk) 22:58, 6 June 2022 (UTC)
The inflection code used, and still uses, the transliteration data tables.
The most striking difference between Module:typing-aids/data/sa and Module:typing-aids/data/inc-pra-Deva is that the former converts ĕ and ŏ to ए and ओ whereas the latter converts them to ऎ and ऒ. I propose we keep it that way. The code in Module:typing-aids/data/sa needs to be modified to work out whether 'l' with underdot is a consonant or a vowel. I'm pretty sure that as a vowel, it does not occur next to a vowel except for a very small chance after 'a' (MW documented a case of initial ar̥-), and as a consonant ḷ occurs in only one cluster, ḷh. That should be enough to cope with real words. I've replaced at least one usage of the 'inc-pra-Deva' code in the headword template family by 'sa', improving a headword link as a result.
My changes in this area were started on 6 June, and have been done under user names RichardW57 and RichardW57m.
If the disambiguation logic fails, the inflection module, when outputting Devanagari, can first do a global edit to replace 'ḷ' with 'L', and Module:typing-aids/data/sa can be made to recognise that as the consonant. The headword templates contain an override parameter |deva=, so the system can work even if I can't produce an adequate resolution algorithm. --RichardW57 (talk) 22:58, 6 June 2022 (UTC)
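The heuristic described above (vocalic ḷ does not normally sit next to a vowel; consonantal ḷ clusters only as ḷh) can be sketched as follows. This is Python for illustration, not the Lua in Module:typing-aids/data/sa; the function name mark_consonantal_l and the vowel list are assumptions, and the input is assumed to have had accents removed already. It rewrites consonantal ḷ as 'L', matching the fallback convention proposed above, and deliberately ignores the rare ḷ-after-'a' vowel case.

```python
import unicodedata

# Assumed vowel inventory; every entry is a single code point after NFC.
VOWELS = set(unicodedata.normalize("NFC", "aāiīuūeēoōĕŏṛṝ"))
L_UNDERDOT = "\u1e37"  # ḷ


def mark_consonantal_l(iast: str) -> str:
    """Replace ḷ with 'L' where the heuristic takes it to be a consonant."""
    text = unicodedata.normalize("NFC", iast)
    out = []
    for i, ch in enumerate(text):
        if ch == L_UNDERDOT:
            prev_ch = text[i - 1] if i > 0 else ""
            next_ch = text[i + 1] if i + 1 < len(text) else ""
            # Next to a vowel, or heading the cluster ḷh: treat as consonant.
            if prev_ch in VOWELS or next_ch in VOWELS or next_ch == "h":
                ch = "L"
        out.append(ch)
    return "".join(out)


print(mark_consonantal_l("kḷpta"))  # ḷ between consonants stays vocalic
print(mark_consonantal_l("iḷā"))    # ḷ between vowels becomes L
```

Any word the heuristic gets wrong would still need the |deva= override.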
By "module's testcases", are you referring to Module:typing-aids/testcases? Kutchkutch (talk) 01:16, 8 June 2022 (UTC)
@RichardW57: I was looking at local Lvowels = "āeĕēiïīoŏōuŭuṛṝl̥̄l̥ḹ", which you introduced here. It contains two grapheme clusters that consist of multiple code points: l̥̄ and l̥. mw.ustring patterns operate on code points, so patterns that use it don't work as intended. For instance, local Lvowel1 = "" and local Lvowel2 = "" are going to match (among other things) a small letter l, a combining ring below, or a combining macron. {{subst:chars|m|sa|ḷhl}} resolves to ळ्ह्ल् (ḷhl) because of {"(ḷ)(h?"..Lvowel2..")", "L%2"}, even though the last letter isn't a vowel. That's a contrived example that might never occur in real inputs, but it illustrates what's going on. Maybe you will have a better idea whether Lvowel1 and Lvowel2 will match where they shouldn't, or fail to match where they should, and cause real inputs to the function to give the wrong outputs. — Eru·tuon 17:54, 19 September 2023 (UTC)
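The code-point issue can be reproduced outside Scribunto. The sketch below uses Python's re module, which, like mw.ustring patterns, treats a bracketed set as a set of individual code points; it is an analogy, not the module's code, and the shortened vowel list is an assumption. It shows the set matching a bare l, and one possible workaround that matches whole clusters by alternation, longest first.

```python
import re

# Shortened, assumed vowel list; two entries span several code points.
VOWELS = ["a", "ā", "i", "ī", "u", "ū", "ṛ", "ṝ", "ḹ",
          "l\u0325",         # l + combining ring below
          "l\u0325\u0304"]   # l + combining ring below + combining macron

# A character class sees code points, so it also contains plain "l",
# the bare ring below and the bare macron.
char_class = re.compile("[" + "".join(VOWELS) + "]")

# Alternation keeps each cluster intact; longest alternatives come first.
by_length = sorted(VOWELS, key=len, reverse=True)
alternation = re.compile("(?:" + "|".join(re.escape(v) for v in by_length) + ")")

print(len("l\u0325\u0304"))                       # code points in one cluster
print(char_class.fullmatch("l") is not None)      # class accepts a bare l
print(alternation.fullmatch("l") is not None)     # alternation rejects it
print(alternation.fullmatch("l\u0325") is not None)
```

Lua patterns have no alternation operator, so an equivalent fix in the module would probably need separate patterns for the multi-code-point vowels or a loop over the vowel list.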