The most fraught aspect of transliteration is in automated declension tables, where manual correction will be most difficult. In general, transliteration from the Lao script will be the most difficult, as different writing systems lose different information compared with the more westerly scripts (Burmese, Latin, etc.). There are also differences in what we want to do with transliterations. Where inflection tables derive or present subordinate lemmas, we want to link to the form of the lemma in the 'main' script, where information is centralised, namely the Latin script. (Perhaps Sanskrit offers some design ideas: there, the transliteration and the main script will always require different strings, one Latin and one Devanagari.)
For transliteration to Latin, we frequently do not need to know the precise writing system. For example, we do not need to know which of the various Burmese script writing systems is being transliterated from. The plan of action is as follows. At all stages, the working system should continue to work for almost every case. (Although transliteration from Thai has problems, they arise in relatively few places.)
@Erutuon, Octahedron80, Wyang, Bhagadatta, AryamanA, -sche, Erutuon, Atitarev, Benwing2, Victar, Mahagaja As there will be apparently pointless changes to transliteration and inflection tables as I change one module at a time while preserving a working system, be dimly aware that my intended steps are: (belatedly signed) RichardW57 (talk) 18:47, 10 May 2021 (UTC)
Note that Module:languages/data2 has recently been renamed Module:languages/data/2. --RichardW57m (talk) 09:35, 16 March 2023 (UTC)
These are passed into the inflection tables using the optional argument |impl=, which takes the values yes, no and both. This option will be passed down to orJoin() as a field impl of the argument options and thence to trwo(). A value of both will be treated as not specifying any value. RichardW57m (talk) 12:12, 7 May 2021 (UTC)
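A minimal sketch, in Python rather than the Lua of the real modules, of how the |impl= value might be normalised before it is handed to orJoin() and trwo(). The helper names normalise_impl and build_options, and the True/False/None encoding, are assumptions for illustration; only the three permitted values and the rule that both means "unspecified" come from the comment above.

```python
from typing import Optional

# |impl= values as described above; None stands for "no value specified".
_IMPL_VALUES = {"yes": True, "no": False, "both": None}


def normalise_impl(arg: Optional[str]) -> Optional[bool]:
    """Map the |impl= template argument to True, False or None.

    An absent or empty argument and the value 'both' all give None,
    i.e. the transliterator receives no hint.
    """
    if arg is None or arg.strip() == "":
        return None
    key = arg.strip().lower()
    if key not in _IMPL_VALUES:
        raise ValueError(f"|impl= must be yes, no or both, not {arg!r}")
    return _IMPL_VALUES[key]


def build_options(args: dict) -> dict:
    # Hypothetical: a dict standing in for the Lua options table that
    # orJoin() would receive and pass on to trwo().
    return {"impl": normalise_impl(args.get("impl"))}
```

Keeping "both" and "absent" identical at this stage means trwo() only ever has to handle three states.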
These are passed into the inflection tables using the optional arguments |impl= (as for Thai), |y=, whose permitted values are both, yaa, yung and synonyms thereof, and, for nominal inflection, |liap=. The option |liap= does not seem to be helpful, as we will not be transliterating PHO TAM as 'bh'.
The option |y= specifies whether Latin 'y' in inflections is reflected in Lao script as the letter YO (option yaa) or NYO (option yung).
The algorithm for transliterating Lao letter NYO will be: if |y= is undefined, then default it to yaa if YO is present, else to yung. There will also be an algorithm to detect the usage of nuktas. RichardW57m (talk) 12:12, 7 May 2021 (UTC)
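The defaulting rule for |y= can be sketched as below. This is a Python illustration, not the Lao transliteration module; the function name resolve_y is hypothetical, synonyms of yaa and yung are not modelled, and treating both like an absent argument is an assumption carried over from the |impl= convention.

```python
from typing import Optional

YO = "\u0EA2"  # LAO LETTER YO


def resolve_y(y_arg: Optional[str], text: str) -> str:
    """Return 'yaa' or 'yung' for transliterating NYO in `text`.

    Assumption: 'both' is treated like an absent argument, as for |impl=.
    """
    if y_arg in (None, "", "both"):
        # Default rule: if YO occurs, Latin y is written with YO (yaa);
        # otherwise assume Latin y is written with NYO (yung).
        return "yaa" if YO in text else "yung"
    if y_arg in ("yaa", "yung"):
        return y_arg
    raise ValueError(f"unrecognised |y= value: {y_arg!r}")
```

An explicit |y= always wins; the presence of YO only matters when no usable value was given.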
Thai Mon Pali (i.e. Burmese script Pali in the older Mon tradition of Thailand) appears to use ည MYANMAR LETTER NNYA as a single letter, so we may eventually need to tag that in Burmese script transliteration. Clumsier workarounds might be better. RichardW57m (talk) 12:12, 7 May 2021 (UTC)
Three schemes will ultimately be available:
|subst= parameter of {{ux}}. The third option may be necessary if an inflected word is written inconsistently. RichardW57m (talk) 12:12, 7 May 2021 (UTC)
@Octahedron80, Benwing2: The transliteration is now successfully activated, and seems to be working. For example, it transliterates the usage examples at တွံ (tvaṃ). There are some bits of mopping up to do:--RichardW57m (talk) 13:58, 8 June 2021 (UTC)
Compare the headwords for Pali and Sanskrit at ទាស. From previous comments, it seems Benwing would prefer the layout for Pali, with transliteration only in the first definition line, it being implicitly assumed by default that the Roman script equivalent is the transliteration. (See ທັມມະ (damma) and သံဃ (saṃgha) for examples of where it isn't.) On this basis, we need to fix entries which are using {{head}} instead of Pali headwords. They fall into two categories:
|tr=-
Which is the better option?
Presumably the Sanskrit headwords also need to be fixed. So far as I am aware, all Sanskrit headwords are implemented by Sanskrit-specific templates.
Cases where the transliteration is not the same as the Roman script form of the word are now being put in cat:Terms with redundant transliterations/pi. How should we handle them? One possibility I can think of is to give the headword template the option |tr=+ to say, "Contrary to normal practice for Pali, supply the transliteration". (If not supplied with |tr=, Module:pi-headword defaults it to |tr=-.) This would need to be interpreted by Module:pi-headword rather than by Module:headword. Another option is to note that this is simply a list of words whose Roman script forms and transliterations are different, and think about how to rename it. Such a category might also apply to Sanskrit in some fashion; it depends on how systematic alternative spellings are to be handled. (Common examples are anusvara v. homorganic nasal and gemination of the consonant in rC clusters.) --RichardW57m (talk) 13:58, 8 June 2021 (UTC)
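A sketch, in Python rather than the Lua of Module:pi-headword, of how the proposed |tr=+ could sit alongside the existing default of |tr=-. The name headword_tr and its auto_translit callback are hypothetical; the code only illustrates the three-way decision, not how Module:headword then renders the result.

```python
from typing import Callable, Optional


def headword_tr(tr_arg: Optional[str], term: str,
                auto_translit: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the transliteration to display, or None for none at all."""
    if tr_arg is None or tr_arg in ("", "-"):
        # Existing Pali default: absent |tr= becomes |tr=-, i.e. the
        # Roman script form is assumed to be the transliteration.
        return None
    if tr_arg == "+":
        # Proposed override: supply the automatic transliteration after all.
        return auto_translit(term)
    return tr_arg  # explicit manual transliteration
```

Because "+" is resolved before anything reaches Module:headword, the generic module never needs to know about it.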
A typical example, for a word such as ສຣີຣໍ (sarīraṃ), is a definition line
# {{pi-sc|Lao|sarīraṃ}}, ''which is'' {{inflection of|pi|ສຣີຣ|tr={{l|pi|sarīra}}||nom//acc|s|t=body}}
which yields
This definition succinctly links to both the Lao script stem (where the Lao script inflection will be found), and to the Roman script stem, where other meanings and general information will be recorded. Unfortunately, the transliteration marked up as a link is deemed not to be the same as the transliteration of the Lao stem. Consequently, the page is placed in cat:Terms with manual transliterations different from the automated ones/pi.
@Benwing2, Octahedron80: What I want is something like the link_tr property of language objects, that causes transliterations to be converted to links, but on a selective basis. There is a dirty trick to get what I want, which is to convert the template invocation to
{{inflection_of|en|ສຣີຣ#Pali|tr={{l|pi|sarīra}}|sc=Laoo||nom//acc|s|t=body}}
What I would like is something like
{{inflection_of|pi|ສຣີຣ|link_tr=1||nom//acc|s|t=body}}
to provide what I want. Thoughts?
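A rough Python model of what a selective |link_tr=1 might do: compare the plain transliteration with the automatic one first, and only then wrap it in a link for display, so the link markup can no longer cause a spurious mismatch. The function name, the anchor default, and the use of a bare wikilink in place of the markup {{l}} actually generates are all assumptions.

```python
from typing import Optional, Tuple


def render_inflection_tr(tr: str, auto_tr: Optional[str], link_tr: bool,
                         anchor: str = "Pali") -> Tuple[str, bool]:
    """Return (displayed transliteration, mismatch with automatic one).

    The comparison uses the plain string; linking happens afterwards.
    """
    mismatch = auto_tr is not None and tr != auto_tr
    # Bare wikilink as a stand-in for what {{l|pi|...}} would produce.
    shown = f"[[{tr}#{anchor}|{tr}]]" if link_tr else tr
    return shown, mismatch
```

For the sarīra example, the comparison sees "sarīra" on both sides, so no maintenance category would be triggered.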
@Benwing2, Octahedron80: Pali words in the Thai and Lao scripts cannot always be transliterated without knowing the writing system. For inflection tables, where manual overrides would be horrendously tedious, I use the trwo entry point to the module. I then need to pass the transliteration down to full_link() in Module:links. Is there any good way to disable the automatic generation and comparison? Delving into the code, it looks as though calling links.full_link(term, nil, nil, true) would work to suppress categorisation as having a redundant transliteration, but the fourth argument seems to be purely for internal calls. I also want mismatches to be accepted: what full_link() calls a manual transliteration (variable manual_tr) is actually itself an 'automatic' transliteration, but one that uses knowledge of the writing system in use. I'm currently using the above trick (in function orJoin() in Module:pi-decl/noun) to avoid the check. --RichardW57 (talk) 21:21, 8 June 2021 (UTC)
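A simplified Python model of the categorisation described above, for discussion only: it is not a transcription of Module:links, and the trusted_tr flag is the hypothetical switch being asked for, not an existing parameter. The category names are the ones quoted in this thread.

```python
from typing import List, Optional


def tr_categories(manual_tr: Optional[str], auto_tr: Optional[str],
                  lang_code: str, trusted_tr: bool = False) -> List[str]:
    """Maintenance categories for a link that was given a transliteration.

    trusted_tr (hypothetical): the supplied transliteration is itself
    automatic, e.g. from trwo() with writing-system knowledge, so neither
    the 'redundant' nor the 'different' check should apply.
    """
    if trusted_tr or manual_tr is None or auto_tr is None:
        return []
    if manual_tr == auto_tr:
        return [f"Terms with redundant transliterations/{lang_code}"]
    return ["Terms with manual transliterations different from the "
            f"automated ones/{lang_code}"]
```

A single flag covering both checks matches the request: suppress "redundant" and accept mismatches together.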
Repinging @Benwing2, Octahedron80 as the previous ping may not have worked. --RichardW57 (talk) 08:04, 2 July 2021 (UTC)
ś and ṣ are not used by Pali and should be removed. Pali uses only s. --Octahedron80 (talk) 00:39, 8 June 2021 (UTC)
Assuming @Octahedron80 knows what he is talking about (quotations would be good), we have a problem with transliterating this vowel sign. While in most Burmese script writing systems for Pali the correct transliteration is ī (e.g. ဂီတ (gīta)), he claims that the normal Mon way of writing iṃ in Pali is to use SIGN II. (That's not borne out by the writings of @咽頭べさ, for which see the example at တသ္မိံ (tasmiṃ) and the source of the example.) Therefore, this cannot be handled by the tr() interface. The natural method is to treat it as an exception and input the transliteration manually. --RichardW57 (talk) 04:54, 2 July 2021 (UTC)
Please note that the masculine/neuter locative singular seems to be an invalid example: Mason reports that the Burmese used -smi/-smi(ṃ) and -mhi/-mhi(ṃ) indiscriminately, and given the poor discrimination between final -i and -ī, I wouldn't place a lot of trust in random manuscripts. --RichardW57 (talk) 04:54, 2 July 2021 (UTC)
| Languages | I | Ī | IṂ | ĪṂ |
|---|---|---|---|---|
| Burmese/Shan Pali & Burmese | ကိ | ကီ | ကိံ | ကီံ (Burmese only) |
| Mon Pali & Mon | ကိ | ကဳ | ကီ (or does ကိံ also occur?) | does not occur |
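Following the table above, the Mon convention can be modelled as a pre-substitution (SIGN II, U+102E, rewritten as SIGN I U+102D plus ANUSVARA U+1036) ahead of an otherwise shared transliterator, while MON II (U+1033) is read as ī. This is a toy Python model, not the real Burmese script transliteration module: it covers only KA and the signs in the table, and the mon flag is a hypothetical stand-in for however the writing system would actually be signalled.

```python
KA = "\u1000"        # MYANMAR LETTER KA
SIGN_I = "\u102D"    # MYANMAR VOWEL SIGN I
SIGN_II = "\u102E"   # MYANMAR VOWEL SIGN II
MON_II = "\u1033"    # MYANMAR VOWEL SIGN MON II
ANUSVARA = "\u1036"  # MYANMAR SIGN ANUSVARA

CONSONANTS = {KA: "k"}  # only KA, enough for the table
VOWEL_SIGNS = {SIGN_I: "i", SIGN_II: "\u012B", MON_II: "\u012B"}  # i, ī, ī


def normalise_mon(text: str) -> str:
    """Mon convention: rewrite SIGN II as <SIGN I, ANUSVARA>."""
    return text.replace(SIGN_II, SIGN_I + ANUSVARA)


def translit(text: str, mon: bool = False) -> str:
    if mon:
        text = normalise_mon(text)
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])  # vowel sign replaces inherent a
                i += 2
                continue
            out.append("a")  # inherent vowel
        elif ch == ANUSVARA:
            out.append("\u1E43")  # ṃ
        else:
            out.append(ch)  # anything else passes through unchanged
        i += 1
    return "".join(out)
```

With this shape, ကီ yields kī by default and kiṃ when mon=True, which is the ambiguity that makes a writing-system option necessary.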
The |instr= parameter will be a quick-fix method to start implementing it: convert II to <I, Ṃ> before applying transliteration, but an inflection table option will be the nicer solution. (I haven't encoded that parameter yet; I've written a test case at User:RichardW57/sandbox.) I would like non-Shan Burmese verbal inflection tables to show it as a footnoted alternative form for the first person singular of the aorists in -i. I must start working on adding footnotes to inflection tables. I believe we need it to be possible for entries to add footnotes to inflection tables. At the moment my technical problem is how one should specify footnotes via the templates' parameters. --RichardW57 (talk) 06:04, 2 July 2021 (UTC)
|subst= has now been added to {{pi-decl-noun}}, {{pi-conj-special}} and {{pi-conj-future}}. --RichardW57 (talk) 19:02, 3 July 2021 (UTC)
@RichardW57 Is the issue that some transliterations were not displaying due to the fact I made incomplete transliterations return nil? If so, that was working as intended, and you should find a more suitable way of displaying problematic characters in the output that doesn't involve displaying the raw input characters with the Latin (i.e. wrong) script code. Thank you. Theknightwho (talk) 00:03, 5 March 2023 (UTC)
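To illustrate the behaviour described in the last comment, here is a toy Python transliterator that returns None (Lua's nil) as soon as any input character is unmapped, together with one possible caller policy that omits the transliteration rather than showing raw script characters under a Latin script code. Both function names and the table format are hypothetical; this is a sketch of the idea, not the modules' actual code.

```python
from typing import Dict, Optional


def translit_or_none(text: str, table: Dict[str, str]) -> Optional[str]:
    """Transliterate character by character; None if anything is unmapped."""
    out = []
    for ch in text:
        if ch not in table:
            return None  # incomplete: refuse rather than leak source characters
        out.append(table[ch])
    return "".join(out)


def format_with_tr(text: str, table: Dict[str, str]) -> str:
    # Hypothetical caller policy: drop the transliteration when it is
    # incomplete, instead of displaying untransliterated input as Latin.
    tr = translit_or_none(text, table)
    return text if tr is None else f"{text} ({tr})"
```

Other policies (e.g. a maintenance category) would fit the same None check; the point is that the decision moves to the caller.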