Wiktionary talk:Votes/2019-05/Lemmatizing Akkadian words in their transliteration

Oppose

Very few people actually know cuneiform script

People who do not know cuneiform cannot add entries for it reliably either: they do not know what is an alternative form, what is an alternative reading, or what is the same word. Once Latin-script entries are allowed, the strangest things will be added. Fay Freak (talk) 19:34, 11 May 2019 (UTC)

Actually, having people write in a script they are not familiar with, in a language they don't speak, is a lot more likely to go wrong than just writing the transliteration that was given in the source book they just read. – Tom 144 (𒄩𒇻𒅗𒀸) 20:03, 11 May 2019 (UTC)

With Egyptian the opposite has been the case: editors have been banned for having no clue of Egyptian and copying transcriptions over wrongly. Fay Freak (talk) 20:05, 11 May 2019 (UTC)

Late comment, but for users who aren't editors (I believe these exist as well), it's certainly much easier to type in a Latin transliteration than cuneiform. The difference with Sanskrit is that lots of languages are still written in Nagari script and typing in that script is reasonably accessible. פֿינצטערניש (Fintsternish), she/her (talk) 12:02, 15 August 2019 (UTC)

Egyptian

Egyptian will be moved to Egyptian script when Unicode is ripe. Opening Latin script for Akkadian is a step backwards. Fay Freak (talk) 19:34, 11 May 2019 (UTC)

Books

Books and journals have suffered from antiquated technology and from a lack of oversight: even if a scholar could write cuneiform, the editor, printer or whoever else had to handle it could not, so everything was made easier and dirtier with transcriptions and transliterations. If cuneiform could have been printed with ease from the 1860s on, everything would be in cuneiform with transcriptions. But the way of naming the signs used in the originals has failed because the names of signs are ambiguous, so that one needs sign lists (Zeichenlexika). It’s all worse than Arabic in Latin script, because the transcriptions of Arabic are clear, whereas with Akkadian we have transliterations, the naming of the cuneiform signs used, and furthermore ambiguous transcriptions. This should never have been done in the first place; technology forced it. Technology does not force it anymore. People are just too inert to make a break with the twentieth century and start doing things differently, the way one always would have if one could have. Likewise the world still suffers under copyright, though the advantages of processing data without it would be immense, only because many cling to the old ways. So few progressive people. Fay Freak (talk) 19:34, 11 May 2019 (UTC)

Books include cuneiform in their texts, but not to display every single word, because it's cumbersome to readers who rely on transliterations. It's pointless to list words in cuneiform to someone who cannot read them. But they would still be displayed. - Tom 144 (𒄩𒇻𒅗𒀸) 19:59, 11 May 2019 (UTC)

Oh well, the bulk of Arabic and Sanskrit dictionaries are in native script, some with transcriptions and some without; of those with both, some are sorted according to the native script and some by the transcriptions, though there are also those without native script. But in general dictionaries are not aimed at people who know nothing of the language at all and are only there to harvest for their comparative studies; this is a sad profession. Typically, or normatively, one learns the script, learns about the grammar, then reads texts and uses dictionaries for them. Fay Freak (talk) 20:10, 11 May 2019 (UTC)

Well, those languages are alive, and they are written in scripts that are still used today. For languages written in cuneiform, textbooks do not advise you to learn the script first, but to learn it as you study the language, because otherwise you are never going to finish. –Tom 144 (𒄩𒇻𒅗𒀸) 20:26, 11 May 2019 (UTC)

Search

One can still search transcriptions. Just use the search function and narrow the search using what it lists for you. Fay Freak (talk) 19:34, 11 May 2019 (UTC)

What's the point of having them lemmatized in cuneiform if we will still search for transcriptions? – Tom 144 (𒄩𒇻𒅗𒀸) 19:54, 11 May 2019 (UTC)

I think it is easier for editors to organize around what’s attested. We can have various transcription and transliteration entries (non-lemma, like Pinyin), but nothing will depend on any system or reading. Fay Freak (talk) 19:59, 11 May 2019 (UTC)

It also makes it unnecessarily difficult to look for words in categories

Typical problem if you are not fluent in a script. Fay Freak (talk) 19:34, 11 May 2019 (UTC)

Yes, it is particularly difficult when the number of signs one needs to know for a minimum understanding of the language adds up to 200, and there can be up to five different versions of each one depending on the period, area, or scribe. Tom 144 (𒄩𒇻𒅗𒀸) 19:52, 11 May 2019 (UTC)

Not all users might be able to display the signs properly

If they have not enabled the display themselves, then they do not care enough either. The fonts are even free, as is in particular the Chicago Assyrian Dictionary. Fay Freak (talk) 19:34, 11 May 2019 (UTC)

Cuneiform signs changed depending on the period, so they do not accurately represent words of all times.

Transcriptions do not accurately represent the words either, because the readings are ambiguous. Fay Freak (talk) 19:34, 11 May 2019 (UTC)

@Fay Freak Actually, I was arguing about the fonts, not about the words. Signs passed through various stages (OB lapid, various different cursives, and Neo-Assyrian). I haven't finished editing the Rationale, so I see why you might have thought I was talking about the phonological representations of the signs, but in my defence, I added a note that said I was still working on it. So I'd appreciate it if you'd let me finish before trying to refute my arguments. – Tom 144 (𒄩𒇻𒅗𒀸) 19:48, 11 May 2019 (UTC)

A similar thing applies to transliterations: there are multiple systems, schools, complications. And yes, actually you were talking about phonological representations. I noticed that, while I was writing, you changed “transcriptions” to “transliteration”, but there was little to change in my arguments.

But hm, transliterations make the matter worse on one side because, as I think, people rather search transcriptions, hardly ever transliterations. Fay Freak (talk) 19:56, 11 May 2019 (UTC)

But transcriptions do not map one-to-one to cuneiform characters, so, as you said, information about the spelling would be lost (due to ambiguities). By keeping them in transliterations you ensure legibility, and soft redirects can be made from transcriptions. –Tom 144 (𒄩𒇻𒅗𒀸) 20:10, 11 May 2019 (UTC)

If we can do that we can also have the cuneiform as lemmas. Or do we want people who have no clue about the underlying writing system to add masses of transliteration entries, and this is what you want to further? Fay Freak (talk) 20:13, 11 May 2019 (UTC)

All our cuneiform entries rely on transcriptions, because that's how words are written in the sources we provide. There is actually little knowledge of the script required to contribute to Wiktionary, given that in the worst case you can simply use the module typing-aids and input the transliteration. Pages have to include the transliteration anyway, so why sacrifice legibility by lemmatizing in cuneiform? –Tom 144 (𒄩𒇻𒅗𒀸) 20:21, 11 May 2019 (UTC)

Why do we even care what the others do?

If we only have what others have, people can just go to the others. Fay Freak (talk) 19:34, 11 May 2019 (UTC)

We care because it's the very source we rely on. We cannot give readers what others do not have, because our content depends on theirs. But again, cuneiform would still be displayed, so we wouldn’t really stop providing content that we provide now. – Tom 144 (𒄩𒇻𒅗𒀸) 15:48, 13 May 2019 (UTC)

Proposal

@JohnC5 and I were discussing this, and there are two proposals I'd like to put forth, which are not mutually exclusive:

Adding transcriptions wherever possible, including category pages and perhaps headers. Because of the way categories are processed, this would need to be done using JavaScript, but I would support doing this for Sanskrit as well.
Creating lemmatized entries as reconstructions rendered in transcriptions, with the attested declensions displayed in the declension table. So, for example, the lemmatized entry for 𒊓𒄠𒋢𒌝 (ša₁₀-am-šu₁₁-um /⁠šamšum⁠/) would be found at *šamšu. Unless you use transcriptions, the only issue that @Tom 144 brings up that will be resolved is legibility, and that can be achieved with the first proposal without the need for the second.

@Canonicalization, Dan_Polansky, Emascandam, Mahagaja, Mofvanes, Nicole Sharp, Suzukaze-c, Tom 144, ჯეო --{{victar|talk}} 17:41, 6 July 2019 (UTC)

I don’t understand anything of it. What does “lemmatized” mean and why will one suddenly have the word for the sun in the reconstruction namespace? @Tom 144 is making it more complicated than ever. It used to be easy: Just write and search what the language’s users actually wrote, and execute the Unicode Consortium’s will. Now you want “systems” and “transcriptions” which can be saved in many ways. The transcriptions are more complicated and unnatural than the actual script ever was. Unicode is the tool to get back to the roots. Also I am interested in what @Profes.I., who has added cuneiform spellings, has to say about all that. Would it be more complicated? Fay Freak (talk) 18:57, 6 July 2019 (UTC)

@Fay Freak: "What I want"? You're preaching to the choir here -- I'm just trying to find a more practical middle ground. I'm not sure why you think lemmatized needs to be in quotes, like it's some foreign concept, and if you read my comments, you'll see that I'm anti-lemmatization. But if my hand is forced and transliterations/transcriptions are made the norm, I think, on both grounds, they should be made reconstructions, because that's really what they are, in the end. --{{victar|talk}} 20:06, 6 July 2019 (UTC)

@Victar: I know that you do not want it; here you see the defect English has from merging thou and ye. Transcriptions are probably not reconstructions, since they are replacement names for cuneiform signs. However, they are uncertain for other reasons. A cuneiform sign has multiple names given to it, as anyone who has ever looked into one of the sign lexica knows, and even after this vote has passed one still does not know what can be done. If a word has multiple transcriptions because of multiple sign names, these are not properly alternative forms either, because the underlying form is the same. I think this vote is illegal and ineffective even if it passes, because it is too unspecific. Anyone, anyone who can, can also create Unicode Akkadian entries, since he can rightly say “I do not understand what transcription system or encoding, this is all not thought through”. Fay Freak (talk) 20:16, 6 July 2019 (UTC)

@Victar: I find the idea of lemmatizing at reconstructions in order to lemmatize at transcriptions interesting, but I believe it'd be better to do it only for words that are written exclusively with logograms, since one could say that they aren't explicitly attested, and to keep syllabically attested entries in the mainspace. I understand that you probably wouldn't agree with this though. – Tom 144 (𒄩𒇻𒅗𒀸) 19:02, 12 July 2019 (UTC)

@Tom 144: There was a lot more said here. Can you try and address the entirety of it? --{{victar|talk}} 20:18, 12 July 2019 (UTC)

@Victar: I guess I didn't understand in what way the first proposal was any different from the way we handle transcriptions today. – Tom 144 (𒄩𒇻𒅗𒀸) 06:12, 13 July 2019 (UTC)

So what will be the page titles? On which “transliteration” will the entry for the word for “threshing floor; lot; site; earnest money” transcribed “maškanu” be? You can write it 𒈦𒃷 (/⁠MAŠ.GAN₂, MAŠ.KAN₂, MAŠ.KÁN, MAŠ.GÁN⁠/) with many transliterations for the same sign, and these two cuneiform signs are one of many ways to express the same word in cuneiform, there are several logographic and syllabic ways to write it and each has multiple transliterations. Other words have hugely differing transcriptions: the word for black cumin has zibibiānu, sibibiānu, zabibânu, sabubânu, šibibânu, šibibiānu, šipipiānu, zizibibiānu in the CAD, volume 2 page 102, many are plausible. You don’t even know what an alternative form is, going from transliterations, or from transcriptions. The possible lemmatization practices contravene WT:EL. Will you have sections like “alternative transliterations”?

And now we are told that you do not even know which namespace to use. You guys are even unsure what is “explicitly attested”, which would not happen that way with the original cuneiform. These are important questions for the architecture of Wiktionary which have not been important for printed cuneiform scholarship. You have not seen how you would fit the things that have been printed on paper into the structure of Mediawiki.

Apart from possible TOS violations pursued by this vote because of distorting the principal direction the Wiktionary environment is provided for (there are certain applications which the software under this domain is provided for and practices that exceed the permitted frame, which I will not pursue here), this vote is not binding or effective anyhow because of being underspecified. You won’t be able yourself to put it into practice in a way that satisfies you if you want to do it to a serious extent. And if one tries, people can only watch chaos ensue. Fay Freak (talk) 21:15, 12 July 2019 (UTC)

@Fay Freak: It is conventional in the field to reference a sign by its voiced form, so MAŠ.GÁN would be the way to go. For example you will never see UD being referenced as UT. Why wouldn't I know what an alternative form is? From the transliteration you can unambiguously get the cuneiform script, given that it offers more information than the cuneiform. If I can tell alternative forms from alternative readings in cuneiform, then why wouldn't I be able to do it for transliterations? Most of the multiple forms of zibibiānu you listed are actually multiple orthographies, not multiple readings, so if you lemmatized at cuneiform, you'd still face the same problem. Also, you can see in the CAD yourself that words that have not been syllabically attested are marked with an asterisk, so I do not understand how using the reconstructed namespace would contradict important scholarship, as you believe. – Tom 144 (𒄩𒇻𒅗𒀸) 06:12, 13 July 2019 (UTC)

Writing MAŠ.GÁN, MAŠ.GAN₂ or whatever in printed works is fine. On Mediawiki you have to have a certain page title, and the choice between various transliterations and their encodings is arbitrary. And some of the signs like U+2080–U+2089 aren’t much easier to type than cuneiform either. Hence it is more straightforward to just have the entries under cuneiform. It is a novelty, and I dare say in my view an abuse, to use the reconstruction namespace for terms that are attested, while only their transcription or transliteration is not (whatever, we are confusing them again): having terms only in the reconstruction namespace because the transcription is uncertain while the spelling is known is topsy-turvy and turns things upside-down. Fay Freak (talk) 11:22, 13 July 2019 (UTC)
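
For readers following the point about page titles: the notational part of the ambiguity (MAŠ.GÁN versus MAŠ.GAN₂) can be illustrated with a minimal Python sketch. It assumes the common Assyriological convention that an acute accent marks sign index 2 and a grave accent marks index 3, and folds both, along with plain ASCII digits, into subscript digits from U+2080–U+2089. The function names are hypothetical and this is not how Wiktionary or its typing-aids module actually works.

```python
import unicodedata

# Combining accents used as sign indices (assumed convention:
# acute = index 2, grave = index 3).
ACUTE, GRAVE = "\u0301", "\u0300"
# ASCII digits mapped to the subscript digits U+2080-U+2089.
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def normalize_sign(name: str) -> str:
    """Rewrite one sign name so its index is a subscript digit (hypothetical helper)."""
    # NFD splits "Á" into "A" + combining acute, and "Š" into "S" + caron.
    decomposed = unicodedata.normalize("NFD", name)
    index = ""
    if ACUTE in decomposed:
        index = "₂"
    elif GRAVE in decomposed:
        index = "₃"
    # Drop only the index accents; the caron of "Š" is left in place.
    stripped = decomposed.replace(ACUTE, "").replace(GRAVE, "")
    base = unicodedata.normalize("NFC", stripped)
    return base.translate(SUBSCRIPTS) + index


def normalize_transliteration(text: str) -> str:
    """Normalize a dot-separated sign sequence such as 'MAŠ.GÁN'."""
    return ".".join(normalize_sign(sign) for sign in text.split("."))


variants = ["MAŠ.GAN₂", "MAŠ.KAN₂", "MAŠ.KÁN", "MAŠ.GÁN"]
titles = sorted({normalize_transliteration(v) for v in variants})
print(titles)
```

Applied to the four transliterations of 𒈦𒃷 quoted above, this leaves two candidate titles rather than four: the accent-versus-subscript choice disappears, but the choice between the GAN₂ and KAN₂ names of the same sign does not, so a convention such as the one Tom 144 mentions would still be needed.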