Latest comment: 5 years ago · 68 comments · 7 people in discussion
This approach is the wrong way to solve the problem, and it causes more problems by virtue of its existence.
By having the server do the heavy lifting of parsing and looking up whether or not a title exists for a language, we use up a finite resource (Lua memory/processing time) and needlessly add a ton of computation to the project. The computation is repeated far more often than is necessary. Since Lua memory is finite, and exhausting it causes pages to load incorrectly, this harms the user experience and creates additional work for editors in tracking down and refactoring pages which have errors. Over time this issue will only worsen as more pages reach the Lua memory limit.
This does not need to be accomplished in real time. Using a module for this task is akin to going to the grocery store once for every ingredient as you are trying to cook a recipe. The better strategy is to go to the store periodically and save lots of time and resources. This task would be easy to run periodically, and in a way which could be more complete and require less effort.
The results, while useful, are far from necessary. Even if this module accomplished something that could not otherwise be accomplished, that still wouldn't justify its existence. Breaking page after page over time is more problematic than not having a convenient, language-specific list of red links ready to hand. Fortunately there are other, easier ways to accomplish this task with less downside, so the point isn't terribly relevant.
I suggest that we get rid of this method of generating red-link categories, and instead move the idea to the toolserver as a group project that periodically (every time a dump is released, or conservatively monthly) regenerates such lists without the need for exceptions. All words in all languages could easily be covered, without the need for ever-growing exception lists. If the project were done on the toolserver, it could be maintained and operated by a group rather than an individual, so the problem of a single point of failure is reduced. I am willing to help with the implementation, and I know for a fact that others have already implemented the same concept at least half a dozen times, so there are others who are capable of doing so as well. - TheDaveRoss 12:33, 14 March 2019 (UTC)
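For illustration, a periodic job over the pages-articles XML dump might look roughly like the sketch below, using only the Python standard library. Everything here is an assumption for the sake of example: the namespace URI differs between dump versions, the regex covers only the simple {{m}}/{{l}}/{{t}}/{{t+}} forms, and none of the names come from an existing tool.

```python
# Hedged sketch: stream a bzipped pages-articles dump, collect linked
# (language, target) pairs, and subtract the titles that actually exist.
import bz2
import re
import xml.etree.ElementTree as ET

# Assumed namespace; check the <mediawiki> element of the real dump.
NS = "{http://www.mediawiki.org/xml/export-0.10/}"
LINK = re.compile(r"\{\{(?:[ml]|t\+?)\|([^|{}]+)\|([^|{}]+)")

def iter_pages(dump_path):
    """Yield (title, wikitext) for every page in the dump."""
    with bz2.open(dump_path, "rb") as f:
        for _, elem in ET.iterparse(f):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                text = elem.findtext(f"{NS}revision/{NS}text") or ""
                yield title, text
                elem.clear()  # free the subtree we just processed

def redlinks(dump_path):
    titles, links = set(), set()
    for title, text in iter_pages(dump_path):
        titles.add(title)
        links.update(LINK.findall(text))
    return {(lang, target) for lang, target in links if target not in titles}
```

A real run would also need the entry-name normalization discussed further down, since link text and page titles can differ.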
Sounds sensible, and I would be in favour of such a solution if and only if someone is willing to actually implement it - I personally don't have the technical know-how. As it stands, I find these categories really useful (especially the translation redlink categories, which allow me to weed out nonsensical neologisms in dead languages very easily), and if possible would very much like there to be a replacement if it is to be deleted. — Mnemosientje (t · c) 13:07, 14 March 2019 (UTC)
Agreed. The fact that it's limited to a few languages makes it less useful. I'm willing to help implement this. – Jberkel 14:41, 14 March 2019 (UTC)
Good idea. It all depends on whether someone wants to work on it, but I imagine that more interesting data could be generated from the dump, like the most-redlinked entries in particular languages, and redlink checking could be extended to templates that don't currently have it, like {{der}}, {{der3}}, {{alter}}. I'm interested in learning how this would be implemented; maybe I could contribute. — Eru·tuon 23:19, 18 March 2019 (UTC)
Having most-redlinked stats would be great for determining which new entries to create first. Unfortunately we don't have full HTML dumps yet (phab:T182351), so if we work with XML dumps we need to add special parsing logic for some of the templates you mentioned, which means we won't get 100% link coverage, but it should be good enough. – Jberkel 11:41, 19 March 2019 (UTC)
Oh, this category is only used in {{m}}, {{l}} and {{t}}, so any static approach should deliver the same results without a lot of extra effort (I first assumed that the checking happened in the bowels of Module:links or similar). – Jberkel 13:03, 20 March 2019 (UTC)
I've created a program for printing all instances of a particular template. I imagine it might simplify things to have lists of all instances of each linking template. The program could be modified to print each relevant template (and its redirects) to a separate file, or it could simply be run multiple times, because for me at least it takes less than 90 seconds (even for a widely used template like {{m}}).
We could create custom logic to determine which parameters in each template indicate an entry name, or try to generate the output of the invoked module functions by replicating the module system and supplying fake frame objects to the functions. I imagine the first option would be simpler; at least, I don't know how easy replicating the module system would be. — Eru·tuon 23:26, 22 March 2019 (UTC)
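To illustrate the first option: the custom logic could be little more than a table mapping each template to the positional parameter that names the entry. A minimal sketch, with a hypothetical mapping not taken from any module:

```python
# Which positional parameter holds the link target, per template.
# Illustrative values only; the real templates have more edge cases.
ENTRY_PARAM = {"l": 2, "m": 2, "t": 2, "t+": 2}

def entry_target(name, args):
    """args: raw template arguments, e.g. ["de", "Test", "t=test"]
    for {{m|de|Test|t=test}}."""
    pos = ENTRY_PARAM.get(name)
    if pos is None:
        return None
    positional = [a for a in args if "=" not in a]  # drop named arguments
    return positional[pos - 1] if len(positional) >= pos else None

# entry_target("m", ["de", "Test"])  -> "Test"
```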
I've taken the first approach, using a wikitext parser. For this to work I had to convert the language data modules to JSON, and then reimplement Language:makeEntryName. The bulk of the links come from just a few templates, so doing parameter extraction is not too bad. Here are the first results, sorted by link count: User:Jberkel/lists/missing/20190320/all. Some results are interesting, but there are also many instances where the link target is simply wrong. – Jberkel 00:02, 23 March 2019 (UTC)
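For readers following along, a rough Python analogue of that reimplementation, assuming the converted JSON exposes per-language "entry_name" replacement lists (the field names and shapes here are assumptions; the real data modules use Lua patterns, which need translating):

```python
import json
from itertools import zip_longest

with open("languages.json") as f:  # assumed export of the data modules
    LANGUAGES = json.load(f)

def make_entry_name(lang_code, link_text):
    """Normalize link text to page-title form (e.g. strip Hebrew vowel
    points or Latin macrons) before checking whether the page exists."""
    rules = LANGUAGES.get(lang_code, {}).get("entry_name", {})
    for pat, repl in zip_longest(rules.get("from", []),
                                 rules.get("to", []), fillvalue=""):
        if pat:  # a missing "to" entry means: delete the matched text
            link_text = link_text.replace(pat, repl)
    return link_text
```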
Yes, I want to do this, and split the output into several pages, maybe just one per language. With the orange links it's sometimes tricky to tell whether the link is wrong or the entry is missing. I'll see if I can put this on the toolserver as suggested above. – Jberkel 00:39, 23 March 2019 (UTC)
For Arabic and Hebrew, it would also be helpful to show the original link text (with vowel diacritics), because there could be several different words with the same letters but different diacritics. Perhaps also for other languages. Not sure how to make that work, though. — Eru·tuon 02:01, 23 March 2019 (UTC)
That's doable. Right now everything is transformed to the canonical form because it is easier to aggregate. How many redlinks should we display per language? Just the most common ones? And maybe it's worth prioritizing redlinks from translation tables, or from entries in another language. What exactly is needed, and how are these categories used currently? @Erutuon, TheDaveRoss, DCDuring, Mnemosientje – Jberkel 07:47, 23 March 2019 (UTC)
@Jberkel Thanks for the sortable table. I started to use it to clean up some obvious errors in "en", like using "en" where "kn" would be right. Unfortunately, the sort order reverted to frequency rather than language, so the process takes longer than it should. How can I make the sort order and focus durable? DCDuring (talk) 13:47, 23 March 2019 (UTC)
@DCDuring: Yes, the arrow just uses the "what links here" page. To find the links you can also use an insource search, e.g. insource:"{{m|de|Test}}". In the next run I'll include links back to the entries. – Jberkel 21:33, 25 March 2019 (UTC)
@Jberkel: Right. I thought you probably would press on in that direction, but that perhaps something else would come up. (You know about bright, shiny objects, I'm sure.) Thanks for doing this. DCDuring (talk) 00:01, 26 March 2019 (UTC)
Did the sort order change when you clicked a link and then went back? The JavaScript doesn't automatically re-sort the table to its previous sort order when you visit it again. That could probably be done, but it would require some custom JavaScript. — Eru·tuon 20:20, 23 March 2019 (UTC)
@Erutuon: Exactly. It would be nice, and not just for this. But I don't know how many people value such a capability, nor how hard it is to code. For the task at hand, Jberkel says he intends to produce from the dump something more usable than the sortable table. I can wait for that. DCDuring (talk) 00:01, 26 March 2019 (UTC)
For the use case mentioned by Mnemosientje, clearing out fictitious translations, I think all redlinks have to be accessible, because it's very likely that a fictitious translation is redlinked just once. — Eru·tuon 19:32, 23 March 2019 (UTC)
@Jberkel: Very nice! (In my browser, it takes a while for the JavaScript to process the collapsible elements, so the page is unresponsive for several seconds, but that may be fixed when the list is broken up.) Not all the blue links are correct. MediaWiki:Gadget-OrangeLinks.js is not quite smart enough to tell that Category:Norwegian Bokmål lemmas in stille is not a Norwegian category, or that Category:Ancient Greek redlinks in παστός doesn't indicate that there's an Ancient Greek entry, for instance.
To show only the blue links (not red or orange), I entered $('.mw-parser-output tr').has('span a.new, span a.partlynew').hide() in my browser's JavaScript console. One, rock samphire, had the wrong header: Translingual instead of English. (I wonder if anyone checks for that type of situation.) Several of the blue links are Norwegian or Ancient Greek, which are incorrectly oranged by the gadget, but some are genuine (rimay has a Quechua entry, обући a Serbo-Croatian entry). I wonder what's going on with the latter. — Eru·tuon 03:59, 8 April 2019 (UTC)
@Erutuon Bugs squashed and table regenerated; the Quechua/Serbo-Croatian entries are gone now. What about the Norwegian links? When is {{m|no}} actually used? We have around 1,900 pages with a "Norwegian" L2 (compared to 61K Bokmål, 42K Nynorsk). – Jberkel 19:37, 8 April 2019 (UTC)
@Jberkel: I don't really know about the Norwegians. (To me it seems like there should be one Norwegian with qualifiers for Bokmål and Nynorsk, but I don't make the rules.) Maybe Donnanz can explain. — Eru·tuon 19:42, 8 April 2019 (UTC)
@Jberkel: The objective is to replace Norwegian with Bokmål and Nynorsk, except for surnames and given names, which are impossible to separate. This has already been done to a large extent in the Norwegian entries. You are going to find thousands of entries like this, as I have concentrated on the Norwegian entries, not the English ones; anyway, I fixed quiet, but was unsure with appease and mute, so I removed Norwegian stille completely from those. DonnanZ (talk) 21:17, 8 April 2019 (UTC)
I forgot to say that Bokmål and Nynorsk entries may already exist, but links using {{t|no}} won't link directly to them, so they have to be checked carefully. DonnanZ (talk) 21:24, 8 April 2019 (UTC)
@Jberkel: (edit conflict) The new language-specific pages that you're creating are still slow to load on my computer because of the collapsible elements. I think I'll try to make a faster collapsibility script that can be used instead of jQuery.makeCollapsible (mw-collapsible).
Maybe the best solution is to decrease the page size and implement proper pagination. This could perhaps be done with a module, which would generate the table. – Jberkel 06:37, 9 April 2019 (UTC)
Yeah, that would work. I suppose the data could be stored in a plain-text format and the module could grab as much of it as it needed. A format that uses tabs as separators has worked well for storing possibly incorrect headers and is easy to parse with Lua. The data currently displayed could be laid out with the following tab-separated fields: language code, title, entries that link to it. This would be briefer than a data module or JSON. — Eru·tuon 07:19, 9 April 2019 (UTC)
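A sketch of that format, producer and consumer side; the one-split-per-line reader shows why it is cheap to parse (paths and example data are illustrative):

```python
# Layout proposed above: language code, title, then the linking entries.
def write_rows(path, rows):
    # rows: iterable of (lang, title, [linking entries]),
    # e.g. ("de", "Test", ["test", "Tester"])  -- made-up example
    with open(path, "w", encoding="utf-8") as f:
        for lang, title, sources in rows:
            f.write("\t".join([lang, title, *sources]) + "\n")

def read_rows(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            lang, title, *sources = line.rstrip("\n").split("\t")
            yield lang, title, sources
```

Tabs are safe as separators here because MediaWiki page titles cannot contain them.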
Looks like there's no easy way to do pagination without creating (placeholder) pages. Still, with the module the output is a lot more manageable, and the concerns are nicely separated. There are a lot of pages to create, and I'm wondering if it's really worth it (who is going to page through 60k entries?). An alternative would be to just show the top lists here and provide the full datasets on Toolforge. – Jberkel 19:53, 9 April 2019 (UTC)
Perhaps pages could be created on request. But I think it would be a good idea to provide all the data somewhere, so that anyone can copy the data to create a list (rather than having to request it from you).
I imagine that weeding out fictitious translations would be easier with lists of redlinks in {{t}} and {{t+}}. I'm not sure it would be useful to create separate lists for all templates, though. — Eru·tuon 21:28, 9 April 2019 (UTC)
@Jberkel: I signed up for a Wikimedia developer account (without requesting access to Toolforge), but now I can't figure out what my username and password were and don't know how to recover them, and creating new accounts is currently disabled. So I guess being added to the project is out for now. — Eru·tuon 20:57, 10 April 2019 (UTC)
I'm curious to see the code, to find out if there are ways to optimize it. I wonder whether my template extractor, or an optimized entry-name generator, would be useful. I'm mad at myself for not carefully remembering my username and password. Maybe I'll figure it out one of these days. — Eru·tuon 21:40, 10 April 2019 (UTC)
The code is on GitLab. It's written in Java/Python and uses Spark, which is probably overkill but works really well; even when used on a single machine, the processing is distributed across all cores. The tradeoff is more overhead in shuffling/serializing data, but I haven't actually measured it. Parsing the markup also takes time, as it builds a complete AST. I'm wondering if that tree could be serialized, so that all future processing on it would be very fast. What's also really nice in terms of productivity is Spark's SQL-like query interface, which lets you do all sorts of aggregations on flat data files (you can see a snapshot of an interactive session here). – Jberkel 22:27, 10 April 2019 (UTC)
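For a flavor of that interface, the "most-redlinked per language" aggregation might look roughly like this in PySpark (file names, separator and columns are invented for the example):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("redlinks").getOrCreate()

# One row per link occurrence: source page, language, link target.
links = spark.read.csv("links.tsv", sep="\t",
                       schema="source STRING, lang STRING, target STRING")
# One row per existing page title.
titles = spark.read.text("titles.txt").withColumnRenamed("value", "target")

# Keep only links whose target page is missing, then rank by frequency.
missing = links.join(titles, "target", "left_anti")
top = (missing.groupBy("lang", "target")
              .agg(F.count("*").alias("n"))
              .orderBy(F.desc("n")))
top.show(30)
```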
The parsing might be faster if the script were working from files that contain all the instances of the templates. My program can print all instances of {{l}}, {{m}}, {{t}}, {{t+}} each to a separate file with the titles of the pages on which they were found, in the format '\1' <title> '\n' (<template> '\n')+. (The \1 character can be replaced with something else if necessary.) The program takes under 2 minutes even to print a large number of different templates, like this list of form-of templates. The sizes of the files for each template were as follows: {{l}}, 62M; {{t}}, 29M; {{t+}}, 27M; {{m}}, 17M. Maybe adding the intermediate step of grabbing all instances of the templates and printing them in a usable format would speed things up.
The method of finding a template is rudimentary: it just looks for matching nested {{ and }}, where {{ is followed by a non-empty string in which neither | nor }} matches at any position. But I think that works for most pages in the main namespace, where fancy template syntax is rare. — Eru·tuon 23:22, 10 April 2019 (UTC)
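That rudimentary method translates almost directly into code; a Python sketch of the matcher as described (crude by design):

```python
def find_templates(text):
    """Collect top-level {{...}} spans, counting nested braces."""
    out, i = [], 0
    while (start := text.find("{{", i)) != -1:
        depth, j = 1, start + 2
        while j < len(text) and depth:
            if text.startswith("{{", j):
                depth, j = depth + 1, j + 2
            elif text.startswith("}}", j):
                depth, j = depth - 1, j + 2
            else:
                j += 1
        if depth:            # unbalanced braces: skip past this "{{"
            i = start + 2
            continue
        name = text[start + 2:j - 2].split("|", 1)[0]
        if name:             # non-empty template name, per the description
            out.append(text[start:j])
        i = j
    return out

# find_templates("x {{m|de|Test}} y {{t+|fi|vesi}}")
#   -> ["{{m|de|Test}}", "{{t+|fi|vesi}}"]
```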
Yes, I think an index is the way to go. We could generate something containing a mapping of all templates to pages (once), then we could run any type of query really efficiently. I'll try that for the next version. – Jberkel 09:35, 11 April 2019 (UTC)
It would also be great to extend this to more templates, especially the most common etymology templates ({{der}}, {{inh}}, {{bor}}, {{cog}}, {{noncog}}). That would require a Java version of getNonEtymological from Module:etymology to convert etymology language codes to regular language or language-family codes. Other templates that do not accept etymology language codes would be easier to add. — Eru·tuon 22:13, 11 April 2019 (UTC)
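The conversion itself is essentially a walk up a parent mapping. A sketch of a getNonEtymological analogue, with made-up example codes (the real mapping lives in the etymology-language data modules):

```python
# Hypothetical excerpt of an etymology-code -> parent-code mapping.
ETYM_PARENT = {
    "la-vul": "la",  # e.g. Vulgar Latin -> Latin (illustrative entries)
    "la-med": "la",  # e.g. Medieval Latin -> Latin
}

def non_etymological(code):
    """Follow parents until reaching a regular language or family code."""
    seen = set()
    while code in ETYM_PARENT and code not in seen:  # guard against cycles
        seen.add(code)
        code = ETYM_PARENT[code]
    return code
```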
Oh, the other thing is adding the Reconstruction and Appendix namespaces. I was puzzled at first that the all.jsonl file doesn't include any links from languages whose codes end in -pro besides Proto-Norse (gmq-pro), but that's probably the only such language that has entries in the mainspace. — Eru·tuon 23:08, 11 April 2019 (UTC)
For curiosity's sake, here is a census of the total byte counts of some templates that can contain links to Wiktionary entries. (Redirects are included.) — Eru·tuon 23:18, 11 April 2019 (UTC)
That's a useful list of cases to cover, then. Something else I remembered which is missing: nested links are currently skipped, e.g. {{m|en|[[...]] [[...]]}}. – Jberkel 23:27, 11 April 2019 (UTC)
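Covering those might just mean a second pass over the argument text. A sketch, with a simplified regex that also handles the piped [[target|display]] form:

```python
import re

# Matches [[target]] and [[target|display]] inside a template argument.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def embedded_targets(arg):
    """embedded_targets("[[free]] [[variable]]") -> ["free", "variable"]"""
    return WIKILINK.findall(arg)
```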
Wonderful! It's interesting how with the addition of the first etymology templates some languages jumped up in the rankings, especially Middle English, which wasn't even in the top 30 or so before. And there are some new words that apparently were only linked in etymology templates, like κεράσιον (kerásion) at the top of the Ancient Greek list. — Eru·tuon 17:22, 13 April 2019 (UTC)
Yes, it's getting more useful, and interesting to see the connections and patterns. Do you know what's up with all those Lojban entries in the "all" list? Are they legit or is some filtering needed? – Jberkel 18:48, 13 April 2019 (UTC)
Hmm, they aren't actual redlinks or "orange links". Lojban is an appendix-reconstructed language, but the logic you're using doesn't account for that, so it looks for an entry in mainspace rather than the Appendix namespace. I guess appendix-reconstructed languages should be filtered out, like reconstructed links, until you've developed the arcane logic to handle them. — Eru·tuon 18:55, 13 April 2019 (UTC)
It occurs to me that the link data could also be used (together with data on {{senseid}} templates) to find links that go to a nonexistent sense id. Oh wait, I guess at the moment the values of id parameters are not being saved, so it would require some changes to the data generation. — Eru·tuon 21:18, 13 April 2019 (UTC)
It would require indexing all instances of {{senseid}}. How many do we have? While useful, I don't think it's a widely used template.
My {{senseid}} file from the mainspace comes to 132093 bytes, and there are 4195 total instances on 2273 pages. There are a few more in the Reconstruction namespace. (I should have my program look there as well.) I just figure it might be easier to track sense ids with your program than to create a separate one that duplicates some of the same work.
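If the id values were saved during extraction, the cross-check itself would be a simple set difference. A sketch with deliberately simplified regexes (not a full parser; real usage has more templates and parameter orders than these cover):

```python
import re

SENSEID = re.compile(r"\{\{senseid\|([^|}]+)\|([^|}]+)\}\}")
ID_LINK = re.compile(r"\{\{[lm]\|([^|}]+)\|([^|}]+)\|[^{}]*?\bid=([^|}]+)")

def dangling_sense_ids(pages):
    """pages: iterable of (title, wikitext). Return links whose
    (target page, lang, id) was never defined by a {{senseid}}."""
    defined, used = set(), []
    for title, text in pages:
        for lang, sid in SENSEID.findall(text):
            defined.add((title, lang, sid))
        for lang, target, sid in ID_LINK.findall(text):
            used.append((title, (target, lang, sid)))
    return [(src, ref) for src, ref in used if ref not in defined]
```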
{{ja-r}} would be pretty tricky to parse correctly, so I would omit it (though there are bound to be a lot of redlinks that will be excluded that way). What about other templates like {{l}} and {{m}}? Does your program gather them when they're inside other templates, like when {{l}} is inside the |t= parameter of {{l}} or {{m}}? — Eru·tuon 21:07, 14 April 2019 (UTC)
I wonder why {{bor}} couldn't handle the formatting; it's very messy to nest templates like that. My program doesn't handle the other cases you mentioned; I'll add that. I can also take a look at the senseids. My general idea for the project is to move it towards something like a framework, which covers extraction, parsing, etc., and which can easily be extended to perform all sorts of analyses. – Jberkel 21:42, 14 April 2019 (UTC)
I think we should move the logic into the linking templates/modules, so one can just write {{bor|en|ja|御殻}}. Easier for editors and easier to parse for machines. – Jberkel 06:12, 15 April 2019 (UTC)
An inefficient system that strains the server extensively and causes memory errors (until it is modded to exclude specific pages), yet is of little help (it doesn't point out redlinks, but entries that contain redlinks). It is a much better idea to generate these from dumps, as has been done in User:Jberkel/lists/wanted. This template and system have all the hallmarks of an irredeemable kludge. — SURJECTION / T / C / L / 16:00, 24 January 2022 (UTC)
Keep. Compared to Jberkel's lists, it has the following advantages:
* Allows users to view all pages containing redlinks, whereas Jberkel's list only shows the top 1000, with no "next" button
* Allows users to view recent changes
* Continuously updated in real time
* Usefully categorizes redlinks into l, m, t and t+
* Once set up, it works automatically until it is intentionally deactivated, so it is not hampered by a user's temporary or permanent inactivity (unlike Wiktionary Statistics, which is no longer being updated, presumably because Ungoliant has been inactive since mid-January) and does not require communication with individual users. Martin123xyz (talk) 12:47, 24 February 2022 (UTC)
@Martin123xyz: The code is open source, the lists are generated on Toolforge, and it's possible to assign more users to the project (Erutuon also has access), so it doesn't really depend on one single user. Regarding the 1000 limit, you can download the raw data, which contains all the links. It's obviously not as powerful as a fully dynamic system, but I think it's a good tradeoff and helps conserve server resources. – Jberkel 19:06, 1 April 2022 (UTC)
Three of those things are easily achievable with Jberkel's solution (the first, third and fourth); the second is not useful (why would you want to see changes to entries with redlinks?), and the fifth is inconsequential if someone is going to view the redlink categories anyway. The disadvantages of this approach on the other hand are easy to name and most of them cannot be fixed:
* It adds considerable strain to server resources and is probably one of the major causes behind memory errors.
* It must be enabled separately for each language, which means that it must be configured weeks if not months in advance (to let all of the necessary categories update).
* It doesn't show redlinks, but pages with redlinks, which defeats at least half of its entire purpose.
* It cannot be sorted correctly due to the previous point.
* It does not consider redlinks in etymological templates, etc.
Delete. Putting some effort into more discriminating dump-derived lists would help address the underlying need, if that isn't done already. For example, getting lists of redlinked terms from translation boxes (which could easily be split by language) or from other groups of terms that have an associated language parameter. DCDuring (talk) 12:26, 27 September 2022 (UTC)
Delete per the above. @Surjection, you may also be interested to know that {{redlink category}} is the reason why it wasn't possible to use {{multitrans}} on water/translations (with 3,500 translations), as it was causing the preprocessor node count to hit the limit of 1,000,000 after processing around 1,800 translations. I've created {{tt-lite}} and {{tt+lite}}, which are identical except that they exclude {{redlink category}}, and we're comfortably within all limits. They can be deprecated and deleted once this RFD is over. Theknightwho (talk) 12:07, 12 January 2023 (UTC)Reply
Delete. I'm surprised I haven't voted on this already. This is currently transcluded in 1,757,720 entries, in many cases dozens or even hundreds of times. There are something like 467 pages in its exclusion list, which got there mostly because its module was pushing them past the limits the system would allow. I'm the one who's added most of them, and I can tell you that it often makes a difference of a megabyte or more out of the 50 MB limit. What's more, it's executed every time the page is viewed, every day, 24/7, 365 days a year, all to collect information that hasn't changed in years for 99.9% of those pages. This has to be one of the stupidest clever hacks in a project that's seen a lot of them. Chuck Entz (talk) 15:56, 12 January 2023 (UTC)
Note: I have added the {{rfd}} template to all subcategories that would be affected, so hopefully anyone perusing these categories will have their attention drawn to this discussion.
Formally RFD-deleted. Note that Cat:Redlinks by language will not be deleted, as it also contains "LANG terms with red links in their headword lines" and "LANG terms with red links in their inflection tables" subcategories which are not the subject of this deletion request. This, that and the other (talk) 08:00, 29 January 2023 (UTC)Reply
@Jberkel's lists are not something new. I respect Jberkel's work, but the categories had more advantages; @Martin123xyz explained it well above. Furthermore, Jberkel's lists do not appear in Redlinks or in Entry maintenance. If there are new contributors, how would they find these lists? Gorec (talk) 19:09, 31 January 2023 (UTC)