Module:category tree/topic


This module implements the topic subsystem, which generates the descriptions and categorization for topical category pages on Wiktionary, i.e. those of the format LANGCODE:LABEL, such as Category:fr:Birds and Category:en:Cities in Georgia, USA; and the corresponding umbrella (language-independent) categories LABEL, such as Category:Birds and Category:Cities in Georgia, USA. It is not to be confused with the poscatboiler subsystem, which handles all other types of categories.

Following is the documentation on the submodules for the topic subsystem, which specify data on individual topics.

Use WT:CLTR to propose category additions or changes.
The category tree modules are intentionally protected. New topics, or substantive changes to existing topics, need to obtain prior consensus. To do this, post at WT:CLTR (Wiktionary:Category and label treatment requests).


Introduction

This is the documentation page for the main data module for the Module:category tree/topic category tree subsystem, as well as for its submodules. Collectively, these modules handle generating the descriptions and categorization for topic pages such as Category:en:Birds, Category:es:France and Category:zh:State capitals of Germany, and the corresponding non-language-specific pages such as Category:Birds, Category:France and Category:State capitals of Germany. (All other categories handled through the {{auto cat}} system are handled by the Module:category tree/poscatboiler subsystem.)

The main data module at Module:category tree/topic does not contain data itself, but rather imports the data from its submodules, and applies some post-processing.

  • To find which submodule implements a specific category, use the search box on the right.
  • To add a new data submodule, copy an existing submodule and modify its contents. Then, add its name to the subpages list at the top of Module:category tree/topic.

Concepts

Per-language and umbrella categories

The topic cat system internally makes a distinction based on which languages a category applies to:

  1. Per-language categories. These are of the form langcode:label (e.g. Category:es:Birds and Category:de:States of the United States). Here, langcode is the language code of a recognized full Wiktionary language (see WT:LOL for the list of all such languages and their codes), and label is a topic, generally one that can apply to multiple languages. The category is intended to contain terms in the language in question that are either related to, instances of or types of the topic in question (depending on the type of category; see below). Associated with each per-language category is an umbrella category; see below. The following restrictions apply to per-language categories:
    1. The language mentioned by langcode must currently be a full language, not an etymology-only language. (Etymology-only languages include lects such as Provençal, considered a variety of Occitan, and Biblical Hebrew, considered a variety of Hebrew. See here for the list of such lects.)
    2. The category label specified by label as found in the category name always begins with a capital letter, whether or not the underlying form of the label is capitalized (contrast Category:en:Birds with Category:en:France). Internally, this is different, and the internal form of a label begins with a lowercase or uppercase letter as appropriate (birds but France).
  2. Umbrella categories. These are of the form label, i.e. a bare category label. As with per-language categories, this label is always capitalized in the category name, regardless of the underlying form of the label. Examples are Category:Birds, Category:France and Category:State capitals of Germany. Umbrella categories serve to group all the per-language categories for a particular topic. They also serve to group more specific subcategories, e.g. under Category:Birds can be found Category:Birds of prey, Category:Freshwater birds, Category:Columbids (which includes doves and pigeons), etc. as well as Category:Eggs and Category:Feathers. Umbrella categories should not normally directly contain any terms.
  3. Unlike for the poscatboiler system, language-specific categories do NOT currently exist. These would be topics that only make sense for a given language or small set of languages, and which are allowed for that language or those languages. Currently, all topics are cross-language even if in practice they don't make sense except in conjunction with a subset of languages; but this may change in the future.
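The distinction between per-language and umbrella category names can be illustrated with a minimal sketch in plain Lua. The function name parse_topic_category and the language-code pattern are assumptions made for illustration only; the real module validates the code against the list of recognized languages rather than relying on a pattern.

```lua
-- Hypothetical helper (not part of the module): split a category name,
-- given without the "Category:" prefix, into a language code and a label.
-- ASSUMPTION: a language code is lowercase letters and hyphens before the
-- first colon. The real code checks the language list instead.
local function parse_topic_category(name)
	local langcode, label = name:match("^([a-z][a-z%-]+):(.+)$")
	if langcode then
		return { kind = "per-language", langcode = langcode, label = label }
	end
	-- No language prefix: treat the whole name as an umbrella label.
	return { kind = "umbrella", label = name }
end

local per_lang = parse_topic_category("fr:Birds")
local umbrella = parse_topic_category("Cities in Georgia, USA")
print(per_lang.kind, per_lang.langcode, per_lang.label)
print(umbrella.kind, umbrella.label)
```

Note that in both cases the label still begins with a capital letter, as it does in the category name; converting it to its internal (possibly lowercase-initial) form is a separate step.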

Category types

In addition to the above distinction, the topic cat system divides categories according to the category type, which specifies the relationship between the category and the members of that category:

  1. Related-to categories (type = "related-to") contain terms that are semantically related to the category topic. For example, Category:en:Chess contains terms such as checkmate, rank (a row on a chessboard), endgame, en passant, Grandmaster, etc. "Related to" is a nebulous criterion, and as a result the terms in the category should be related to the category as directly as possible, to avoid the category becoming a grab bag of random terms.
  2. Name (type = "name") categories contain terms that are names of individual, specific instances of the category. For example, Category:Chess openings contains names of specific openings, such as Ruy Lopez and Sicilian Defense. Even more clearly, Category:Moons of Jupiter contains names of individual moons that orbit the planet Jupiter.
  3. Type (type = "type") categories contain terms for types of the entity described by the category name. For example, Category:Checkmate patterns contains types of checkmates, such as ladder mate and smothered mate. Even more clearly, Category:Hobbyists contains terms for types of hobbyists, such as oenophile (a wine enthusiast), numismatist (a coin collector), etc. (If this were a name category, it would contain names of specific, presumably famous, hobbyists, which would probably not be dictionary-worthy material.)
  4. Set (type = "set") categories are used when the distinction between names and types of a given topic may not always be clear, but the overall membership is still well-defined. For example, Category:Heraldic charges contains terms for components of coats of arms, e.g. bend sinister (a diagonal band from lower left to upper right), fleur-de-lis (a stylized image of a lily, as is commonly associated with New Orleans) and quatrefoil (a symmetrical shape made from the outline of four circles).
  5. Grouping (type = "grouping") categories are higher-level categories that are used only to group more specific categories and should not contain elements themselves (but nevertheless sometimes do). An example is Category:Industries, which contains subcategories devoted to particular industries (e.g. Category:Banking, Category:Mining, Category:Music industry, Category:Oil industry, etc.).
  6. Top-level (type = "toplevel") categories are special high-level categories that list all the categories of one of the above types, and which are always named List of type categories, e.g. Category:List of related-to categories (listing all the "related-to" umbrella categories) or Category:es:List of name categories (listing all the Spanish name-type categories). The number of top-level categories is fixed.

Note that name, type and set categories are conceptually similar to each other, in that each contains terms that have an is-a relationship with the topic in question, whereas related-to categories express a weaker sort of relation between term and topic, merely asserting that the term is in some way "related" or "pertinent" to the topic in question. For this reason, when creating new topics, you should always strive to create name, type or set topics whenever possible, and avoid related-to topics unless there is no alternative and you're convinced this topic is really necessary. Before creating such a category:

  1. Consider whether there is another category already in existence that will cover this semantic space.
  2. Consider whether you can convert the category to a name, type or set category.
  3. Investigate whether there needs to be a category for the semantic concept at all (in particular, abstract concepts often do not merit related-to categories).
  4. Make sure there are enough terms to fill up this category in at least two languages (one of which should be English). What qualifies as "enough" varies a bit from topic to topic but generally should be at least 10.
  5. Make sure the terms you add or consider adding to this category are directly related to the topic at hand. Do not add terms merely because the term contains the name of the topic in it (e.g. if you create a category named brick, do not add terms like brick house, thick as a brick or yellow brick road merely because they have the word "brick" in them; instead, use the ===Related terms=== section of the brick lemma to include these terms).

Note also that name, type and set categories typically use the plural in their topic name, while related-to categories often use the singular. This is not a hard and fast rule, however, and there are exceptions in both directions. If it's not obvious what type of category a given topic refers to, consider making this explicit in the topic name, e.g. names of stars or types of stars rather than just stars. (In the future, all, or at least most, topic categories may be named in such a fashion.)

Adding, removing or modifying categories

A sample entry is as follows (in this case, found in Module:category tree/topic/History):

labels["ancient history"] = {
	type = "related-to",
	description = "default",
	parents = {"history"},
}

This generates the description and categorization for all per-language categories of the form langcode:Ancient history (e.g. Category:en:Ancient history) as well as for the umbrella category Category:Ancient history (see above for the definition of per-language and umbrella categories).

The meaning of this snippet is as follows:

  • The label itself needs to use proper capitalization or lower case in the first letter of the label, even though the label as it appears in the category name is always capitalized, consistent with the principle that category names begin with a capital letter. In this case, the label is lowercase, and other labels that reference it need to use the same casing (as in the example below). By contrast, a label like Ancient Near East (as in the example below) is capitalized because the label refers to a specific region, and toponyms are capitalized in English.
  • the type field specifies the category type, as described above. This label is a "related-to" category.
  • The description field gives the description text that will appear when a user visits the category page. Certain special values are recognized, including "default", which generates a default description. The value of the default description depends on the label's name, the language of the category, and the label's type. In this case, it is equivalent to "{{{langname}}} terms related to [[ancient]] [[history]]" (where {{{langname}}} is replaced with the name of the language in question) and "terms related to [[ancient]] [[history]]" for the umbrella category. See #Descriptions below for more information on specifying descriptions.
  • The parents field gives the labels of the parent categories. Here, the category specifies a single parent "history". This means that a category such as Category:en:Ancient history will have Category:en:History as its parent. An additional top-level list parent will automatically be added (in this case Category:en:List of related-to categories) as well as the umbrella parent Category:Ancient history.
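The parent computation described in the last bullet can be sketched as follows in plain Lua. The function names are hypothetical, the capitalization helper handles only ASCII, and the treatment of the umbrella case (no language prefix, no umbrella parent) is an assumption rather than something this page states.

```lua
-- Uppercase the first letter (ASCII only in this sketch).
local function ucfirst(s)
	return (s:gsub("^%l", string.upper))
end

-- Hypothetical helper: list the parent categories of a topic category.
-- `langcode` is nil for an umbrella category (assumed behavior).
local function parent_categories(label, data, langcode)
	local prefix = langcode and (langcode .. ":") or ""
	local result = {}
	-- Explicit parents from the `parents` field.
	for _, parent in ipairs(data.parents) do
		result[#result + 1] = "Category:" .. prefix .. ucfirst(parent)
	end
	-- The automatically added top-level list parent.
	result[#result + 1] = "Category:" .. prefix .. "List of " .. data.type .. " categories"
	-- The umbrella parent, for per-language categories only.
	if langcode then
		result[#result + 1] = "Category:" .. ucfirst(label)
	end
	return result
end

local parents = parent_categories(
	"ancient history",
	{ type = "related-to", description = "default", parents = { "history" } },
	"en"
)
for _, p in ipairs(parents) do
	print(p)
end
```

For the sample entry this prints Category:en:History, Category:en:List of related-to categories and Category:Ancient history, matching the three parents named above.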

Another example follows:

labels["places in Romance of the Three Kingdoms"] = {
	type = "name",
	displaytitle = "places in ''Romance of the Three Kingdoms''",
	description = "=places in ''{{w|Romance of the Three Kingdoms}}''",
	parents = {"Romance of the Three Kingdoms", "China"},
}

This is a subcategory of "Romance of the Three Kingdoms" (a 14th century Chinese historical novel) and accordingly specifies "Romance of the Three Kingdoms" as the parent, along with "China" (note the capitalization, in accordance with the principles laid out above). A description is given explicitly, preceded by = (which in this case prepends "names of specific" to the description). The displaytitle field is also set so that the name of the work is italicized.

Category label fields

The following fields are recognized for the object describing a label:

type
The type of the label ("related-to", "name", "type", "set", "grouping" or "toplevel", as described above). Mandatory. It is possible to specify multiple comma-separated types, for "mixed" categories that can contain more than one type of term. For example, the label flags currently has type = "related-to,name,type" because it contains a mixture of terms related to flags (e.g. flagpole and grommet), terms for individual flags (e.g. Star-Spangled Banner) and terms for types of flags (e.g. prayer flag, flag of convenience). Mixed categories are strongly dispreferred and should be split into separate per-type categories.
description
A plain English description for the label. This should generally be no longer than one sentence. Place additional, longer explanatory text in the additional field described below, and put {{wikipedia}} boxes in the topright field described below so that they are correctly right-aligned with the description. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} will be expanded appropriately; see #Template substitutions in field values below. Certain values are handled specially, including "default" (and variants such as "default with the", "default wikify" and "default no singularize") and phrases preceded by an = sign, as explained in more detail below.
parents
A table listing one or more parent labels of this label. This controls the parent categories that the category is contained within, as well as the chain of breadcrumbs appearing across the top of the page (see below).
  • An item in the table can be either a single string (the parent label), or a table containing (at least) the two elements name and sort. In the latter case, name specifies the parent label name, while the sort value specifies the sort key to use to sort it in that category. The default sort key is the category's label.
  • If a parent label begins with Category: it is interpreted as a raw category name, rather than as a label name. It can still have its own sort key as usual.
  • The first listed parent controls the category's parent breadcrumb in the chain of breadcrumbs at the top of the page. (The breadcrumb of the category itself is determined by the breadcrumb setting, as described below.)
breadcrumb
The text of the last breadcrumb that appears at the top of the category page.
  • By default, it is the same as the category label, with the first letter capitalized.
  • The value can be either a string, or a table containing two elements called name and nocap. In the latter case, name specifies the breadcrumb text, while nocap can be used to disable the automatic capitalization of the breadcrumb text that normally happens.
  • Note that the breadcrumbs collectively are the chain of links that serve as a navigation aid for the hierarchical organization of categories. For example, a category like Category:en:Ancient Near East will have a breadcrumb chain similar to "Fundamental » All languages » English » All topics » History » Ancient history » Ancient Near East", where each breadcrumb is a link to a category at the appropriate level. The last breadcrumb here is "Ancient Near East", and its text is controlled by this field.
displaytitle
Apply special formatting such as italics to the category page title, as with the {{DISPLAYTITLE:...}} magic word (see mw:Help:Magic words). The same formatting is also applied to breadcrumbs, descriptions and other mentions of the label in formatted text. The value of this is either a string (which should be the formatted label, e.g. "The Matrix", "people in Romance of the Three Kingdoms" or "Glee (TV series)") or a Lua function to generate the formatted category title. The Lua function is passed two parameters: the raw label (without any preceding language code) and the language object of the category's language (or nil for umbrella categories). It should return the appropriately formatted label. If the value of this field is a string, template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} will be expanded appropriately; see below. See Module:category tree/topic/Culture for examples of using displaytitle.
topright
Introductory text to display right-aligned, before the edit and recent-entries boxes on the right side. This field should be used for {{wikipedia}} and other similar boxes. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} are expanded appropriately, just as with description; see #Template substitutions in field values below. Compare the preceding field, which is similar to topright but used for left-aligned text placed above the description.
preceding
Introductory text to display directly before the text in the description field. The difference between the two is that description text will also be shown in the list of children categories shown on the parent category's page, while the preceding text will not. For this reason, use preceding instead of description for {{also}} hatnotes and similar text, and keep description relatively short. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} are expanded appropriately, just as with description; see #Template substitutions in field values below. Compare the topright field, which is similar to preceding but is right-aligned, placed above the edit and recent-entries boxes.
additional
Additional text to display directly after the text in the description field. The difference between the two is that description text will also be shown in the list of children categories shown on the parent category's page, while the additional text will not. For this reason, use additional instead of description for long explanatory notes, "See also" references and the like, and keep description relatively short. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} are expanded appropriately, just as with description; see #Template substitutions in field values below.
wp
Display a box linking to a Wikipedia entry in the upper right corner. The value can either be true to link to an entry that is the same as the label; a string, to link to that entry; or a list of strings or true, to generate multiple boxes, one per list item. For example, if the label pesäpallo has wp = true, a box will be generated that links to Pesäpallo on Wikipedia, and if the label football (American) has wp = "American football", a box will be generated that links to American football on Wikipedia.
wpcat
Display a box linking to a Wikipedia category in the upper right corner. This is similar to wp except that the link is to a category (the generated entry or entries is/are prepended with Category:). For example, if the label animals has wpcat = true set, a box will be generated that links to Category:Animals on Wikipedia.
commonscat
Display a box linking to a Wikimedia Commons category in the upper right corner. This is similar to wpcat except that the link is to Wikimedia Commons instead of Wikipedia. For example, if the label racquet sports has commonscat = true set, a box will be generated that links to Category:Racquet sports on Wikimedia Commons.
topic
Text indicating the topic being handled by this category. This appears in the auto-generated "additional" message following the description, which indicates what type this category is (based on the type field) and what sorts of terms should go into it. This does not normally need to be specified, as it's derived directly from the label. But it is useful e.g. for the label types of planets, which sets topic = "planets", because the auto-generated "additional" message contains the text " ... It should contain terms for types of {{{topic}}}, ...", and using the label directly will result in redundant text. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} are expanded appropriately, just as with description; see #Template substitutions in field values below. The value of this field can be "default" or "default with the", which will be expanded appropriately based on the label.
umbrella
A table describing the umbrella category that collects all language-specific categories associated with this label. The umbrella category is named using the label, without any language prefix. For example, for the label ancient history, the umbrella category is named Category:Ancient history, and is a parent category (in addition to any categories specified using parents) of Category:en:Ancient history, Category:fr:Ancient history and all other per-language categories for this label. This table contains the following fields:
description
A plain English description for the umbrella category. By default, it is derived from the description field of the label itself by removing language references (specifically, {{{langname}}} , {{{langcode}}}:, {{{langcode}}} and {{{langcat}}} ) and adding This category concerns the topic: before the result. Text is automatically added to the end indicating that this category is an umbrella category that only contains other categories, and does not contain pages describing terms.
breadcrumb
The last breadcrumb in the chain of breadcrumbs at the top of the category page; see above. By default, this is the category label.
topright
Like the topright field on regular category pages; see above.
preceding
Like the preceding field on regular category pages; see above.
additional
Like the additional field on regular category pages; see above.
topic
Like the topic field on regular category pages; see above.
umbrella_description
The same as the description subfield of the umbrella field.
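Two of the field conventions above, comma-separated types and parents given either as strings or as tables with name and sort, can be illustrated with a short plain-Lua sketch. The helper names split_types and normalize_parent and the raw field are hypothetical, and "Category:Example" is a placeholder name; only the conventions themselves come from the field descriptions.

```lua
-- Hypothetical helper: split a possibly mixed type such as
-- "related-to,name,type" into a list of individual types.
local function split_types(type_field)
	local types = {}
	for t in type_field:gmatch("[^,]+") do
		types[#types + 1] = t
	end
	return types
end

-- Hypothetical helper: bring a parent entry into a uniform shape.
-- A bare string becomes { name = ... }; the sort key defaults to the
-- category's own label; a name starting with "Category:" is flagged as
-- a raw category name rather than a label.
local function normalize_parent(parent, label)
	if type(parent) == "string" then
		parent = { name = parent }
	end
	return {
		name = parent.name,
		sort = parent.sort or label,
		raw = parent.name:find("^Category:") ~= nil,
	}
end

local flag_types = split_types("related-to,name,type")
local simple = normalize_parent("history", "ancient history")
local raw = normalize_parent({ name = "Category:Example", sort = "example" }, "ancient history")
print(#flag_types, simple.sort, raw.raw)
```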

Template substitutions in field values

Template invocations can be inserted in the text of description, parents (both name and sort key), breadcrumb, toc_template and toc_template_full values, and will be expanded appropriately. In addition, the following special template-like invocations are recognized and replaced by the equivalent text:

{{PAGENAME}}
The name of the current page. (Note that two braces are used here instead of three, as with the other parameters described below.)
{{{langname}}}
The name of the language that the category belongs to. Not recognized in umbrella fields.
{{{langcode}}}
The code of the language that the category belongs to (e.g. en for English, de for German). Not recognized in umbrella fields.
{{{langcat}}}
The name of the language's main category, which adds "language" to the regular name. Not recognized in umbrella fields.
{{{langlink}}}
A link to the language's main category. Not recognized in umbrella fields.
{{{umbrella_msg}}}
The message normally at the end of the description for umbrella categories, indicating that the category contains no terms but only subcategories.
{{{topic}}}
The value of the topic field (or the umbrella.topic field for umbrella categories), if specified; else, the value of displaytitle (if specified) or the label, with "the" added if the description is "default with the" or a variant containing "with the" (such as "default with the wikify").
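The triple-brace references above behave like simple named placeholders. The following plain-Lua sketch shows one way such a substitution could work; the function name substitute and the choice to leave unknown references untouched are assumptions, and expansion of real template invocations (which requires the MediaWiki parser) is not covered.

```lua
-- Hypothetical helper: replace {{{name}}} references with values from a
-- table. References with no entry in `values` are left unchanged, because
-- gsub keeps the original match when the callback returns nil.
local function substitute(text, values)
	return (text:gsub("{{{([%w_]+)}}}", function(key)
		return values[key]
	end))
end

local expanded = substitute(
	"{{{langname}}} terms related to chess ({{{langcode}}}). {{{umbrella_msg}}}",
	{ langname = "French", langcode = "fr" }
)
print(expanded)
```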

Descriptions

The description field is of one of three types:

  1. An English sentence, ending in a period.
  2. A phrase preceded by = and not ending in a period.
  3. The value "default" or one of its variants, such as "default with the" or "default wikify".

If preceded by =, the description is generated from the specified phrase by prepending {{{langname}}} (which is replaced with the language name) followed by standard type-dependent text, and appending a period. The text prepended is currently as follows:

Type        Text
related-to  terms related to
set         terms for types or instances of
name        names of specific
type        terms for types of
grouping    categories concerning more specific variants of
toplevel    N/A

For example, for the label biblical characters, the description is currently "=characters in the [[Bible]]", which expands to "{{{langname}}} names of specific characters in the [[Bible]]." and in turn to e.g. "French names of specific characters in the [[Bible]]." (if the category is Category:fr:Biblical characters).

Note that no standard text is provided for top-level categories, all of which include a custom description.
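The expansion of =-prefixed descriptions can be sketched in plain Lua as follows. The function name expand_description is hypothetical, and the prefix strings are copied from the type_data table in the module source; top-level categories are deliberately left out because they always carry a custom description.

```lua
-- Type-dependent text prepended to "=" descriptions.
local type_prefix = {
	["related-to"] = "terms related to",
	set = "terms for types or instances of",
	name = "names of specific",
	type = "terms for types of",
	grouping = "categories concerning more specific variants of",
}

-- Hypothetical helper: expand a description of the second kind
-- ("=phrase"). Full sentences are returned unchanged; the "default"
-- variants are not handled in this sketch.
local function expand_description(desc, category_type)
	local phrase = desc:match("^=(.*)$")
	if not phrase then
		return desc
	end
	local prefix = assert(type_prefix[category_type], "no standard text for this type")
	return "{{{langname}}} " .. prefix .. " " .. phrase .. "."
end

local expanded = expand_description(
	"=places in ''{{w|Romance of the Three Kingdoms}}''",
	"name"
)
print(expanded)
```

The {{{langname}}} reference in the result is substituted in a later step, as described under "Template substitutions in field values".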

If "default" or one of its variants is used as the description, a default description is generated as if the description consisted of = prepended to the label, except that the word the might be added to the beginning of the label, and the words in the label might be wikilinked. Specifically:

  1. If the description is of the form "default with the" (or a form such as "default with the wikify", "default with the no singularize", etc.), the word the is prefixed to the label.
  2. If the description is of the form "default wikify" (or a related form), the label is linked to Wikipedia. If the label ends in -s, the label is linked to a Wikipedia entry based on the singular form of the label (which converts -ies to -y; converts -xes, -ches or -shes, respectively, to -x, -ch or -sh; and otherwise just removes -s), unless the description is "default wikify no singularize" or a related form, in which case the label is linked unchanged.
  3. Otherwise, the code attempts to link the entire label or the individual words of the label to Wiktionary terms, as follows:
    1. If the label ends in -s and no singularize is not specified in the description, and the singular form of the label (generated according to the algorithm described just above) is a Wiktionary term, the label is linked to that term. Note that "is a Wiktionary term" simply means that a page of this name exists; the code does not currently check to see whether there is an English entry or whether the term is a lemma.
    2. Otherwise, if the label itself is a Wiktionary term, the label is linked to that term.
    3. Otherwise, the label is split into individual words, and each word is checked to see if a page named according to that word exists. If so, the individual words are linked to their corresponding Wiktionary entries; otherwise, the label is left unlinked. Note that the last word is handled specially if it ends in -s and no singularize is not found in the description, in that the code first attempts to link the word to its singular equivalent, falling back to the word itself if the singular equivalent doesn't name a Wiktionary term.

For example, a label video games will be linked as [[video game]]s because the page video game exists, but Arabian deities will be linked as [[Arabian]] [[deity|deities]] because neither Arabian deity nor Arabian deities exists as a page. The use of no singularize is needed with labels such as linguistics, comics and humanities, because their respective singular forms linguistic, comic and humanity exist as Wiktionary pages.

Finally, note that the components of a default-type description (wikify, with the and no singularize) can be given in any order if more than one of them needs to be specified.
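The singularization rule and the order-independent "default" components described above can be sketched in plain Lua. Both function names are hypothetical and this is only an illustration of the stated rules, not the module's actual code; the page-existence checks that decide what gets linked are omitted.

```lua
-- Sketch of the stated rule: -ies becomes -y; -xes, -ches and -shes lose
-- their -es; otherwise a final -s is removed.
local function singularize(label)
	if label:find("ies$") then
		return (label:gsub("ies$", "y"))
	end
	local stem = label:match("^(.*x)es$")
		or label:match("^(.*ch)es$")
		or label:match("^(.*sh)es$")
	if stem then
		return stem
	end
	return (label:gsub("s$", ""))
end

-- Hypothetical parser for "default" descriptions. The components may
-- appear in any order, so each is searched for independently.
local function parse_default(desc)
	local rest = desc:match("^default(.*)$")
	if not rest then
		return nil
	end
	return {
		with_the = rest:find("with the", 1, true) ~= nil,
		wikify = rest:find("wikify", 1, true) ~= nil,
		no_singularize = rest:find("no singularize", 1, true) ~= nil,
	}
end

print(singularize("video games"), singularize("deities"), singularize("linguistics"))
local opts = parse_default("default wikify with the")
print(opts.with_the, opts.wikify, opts.no_singularize)
```

The call singularize("linguistics") returns linguistic, which shows why such labels need no singularize.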

Handlers

It is also possible to have handlers that can handle arbitrarily-formed labels, e.g. political divisions of country for any country (categories such as Category:tg:Emirates of the United Arab Emirates) or divisions of polity for any division and polity (e.g. Category:fr:Counties of South Korea or Category:pt:Municipalities of Tocantins, Brazil). Currently, handlers exist only in the toponym-handling code in Module:category tree/topic/Places and in Module:category tree/topic/Names. As an example, the following is the handler for script letter names:

table.insert(handlers, function(label)
	local script = label:match("^(.*) letter names$")
	if script then
		local sc = require("Module:scripts").getByCanonicalName(script)
		if sc then
			local script_page
			local appendix = ("Appendix: %s script"):format(script)
			local appendix_title = mw.title.new(appendix)
			if appendix_title and appendix_title.exists then
				script_page = appendix
			else
				script_page = "w:" .. sc:getWikipediaArticle()
			end
			local link = ("[[%s|%s script]]"):format(script_page, script)
			return {
				type = "name",
				description = ("{{{langname}}} terms that serve as names for letters and symbols directly based on letters, " ..
					"such as [[digraph]]s and letters with [[diacritic]]s, of the %s."):format(link),
				parents = {"letter names"},
			}
		end
	end
end)

The handler is passed a single argument (the label), checks whether the passed-in label has a recognized form, and if so, returns an object that follows the same format as described above for directly-specified labels. In this case, the handler makes sure the given script name specifies an actual script, and constructs an appropriate link for the script, depending on whether an appendix page for the script exists (falling back to Wikipedia).

NOTE: The handler needs to be prepared to handle both umbrella categories and per-language categories. The label is passed in as it appears in the category; this means the handler may need to handle both uppercase-initial and lowercase-initial variants of the label. (For this handler, this isn't an issue because the script always appears uppercased.) One way to do that is to convert the label to lowercase-initial before further processing, using mw.getContentLanguage():lcfirst().

Note also that if a handler is specified, the module should return a table holding both the label and handler data; see the above modules.
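The case-handling advice above can be illustrated with a self-contained plain-Lua sketch. Everything here is hypothetical: the "types of X" handler, its returned fields, the dispatch loop in find_label_data, and the ASCII-only lcfirst stand-in, which replaces mw.getContentLanguage():lcfirst() so that the example runs outside MediaWiki.

```lua
local handlers = {}

-- Stand-in for mw.getContentLanguage():lcfirst(); ASCII only.
local function lcfirst(s)
	return (s:gsub("^%u", string.lower))
end

-- Hypothetical handler for labels of the form "types of X". The label is
-- lowercased first so that "Types of planets" (as it appears in a category
-- name) and "types of planets" are treated alike.
table.insert(handlers, function(label)
	local topic = lcfirst(label):match("^types of (.+)$")
	if topic then
		return {
			type = "type",
			description = "default",
			topic = topic,
			parents = { topic }, -- assumed parent, for illustration only
		}
	end
end)

-- Assumed dispatch: try each handler in turn and use the first result.
local function find_label_data(label)
	for _, handler in ipairs(handlers) do
		local data = handler(label)
		if data then
			return data
		end
	end
	return nil
end

local data = find_label_data("Types of planets")
print(data.type, data.topic)
```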

Subpages


local raw_handlers = {}
local raw_categories = {}


--[=[
This module implements the topic category subsystem. It is currently implemented with a single raw handler that
handles both language-specific and umbrella topic categories, and a corresponding handler for thesaurus categories.
The topmost topic category [[:Category:All topics]] is special and potentially could be handled as a separate raw
category, but currently it's handled as part of the raw topic handler. The topmost thesaurus category
[[:Category:Thesaurus]] is in fact handled as a raw category.
]=]

local functions_module = "Module:fun"
local labels_utilities_module = "Module:labels/utilities"
local languages_module = "Module:languages"
local string_pattern_escape_module = "Module:string/patternEscape"
local string_replacement_escape_module = "Module:string/replacementEscape"
local string_utilities_module = "Module:string utilities"
local table_module = "Module:table"

local topic_data_module = "Module:category tree/topic/data"
local topic_utilities_module = "Module:category tree/topic/utilities"
local thesaurus_data_module = "Module:category tree/topic/thesaurus data"

local concat = table.concat
local insert = table.insert
local dump = mw.dumpObject
local is_callable = require(functions_module).is_callable
local pattern_escape = require(string_pattern_escape_module)
local replacement_escape = require(string_replacement_escape_module)
local split = require(string_utilities_module).split

local type_data = {
	["related-to"] = {
		desc = "terms related to",
		additional = "'''NOTE''': This is a \"related-to\" category. It should contain terms directly related to " ..
		"{{{topic}}}. Please do not include terms that merely have a tangential connection to {{{topic}}}. " ..
		"Be aware that terms for types or instances of this topic often go in a separate category.",
	},
	set = {
		desc = "terms for types or instances of",
		additional = "'''NOTE''': This is a set category. It should contain terms for {{{topic}}}, not merely " ..
		"terms related to {{{topic}}}. It may contain more general terms (e.g. types of {{{topic}}}) or more " ..
		"specific terms (e.g. names of specific {{{topic}}}), although there may be related categories "..
		"specifically for these types of terms.",
	},
	name = {
		desc = "names of specific",
		additional = "'''NOTE''': This is a name category. It should contain names of specific {{{topic}}}, not " ..
		"merely terms related to {{{topic}}}, and should also not contain general terms for types of {{{topic}}}.",
	},
	type = {
		desc = "terms for types of",
		additional = "'''NOTE''': This is a type category. It should contain terms for types of {{{topic}}}, not " ..
		"merely terms related to {{{topic}}}, and should also not contain names of specific {{{topic}}}.",
	},
	grouping = {
		desc = "categories concerning more specific variants of",
		additional = "'''NOTE''': This is a grouping category. It should not directly contain any terms, but " ..
		"only subcategories. If there are any terms directly in this category, please move them to a subcategory.",
	},
	toplevel = {
		desc = "UNUSED", -- all categories of this type hardcode their description
		additional = "'''NOTE''': This is a top-level list category. It should not directly contain any terms, but " ..
		"only a {{{topic}}}.",
	},
}


local function invalid_type(types)
	local valid_types = {}
	for typ, _ in pairs(type_data) do
		insert(valid_types, ("'%s'"):format(typ))
	end
	error(("Invalid type '%s', should be one or more of %s, comma-separated")
		:format(types, mw.text.listToText(valid_types)))
end


local function split_types(types)
	types = types or "related-to"
	local splitvals = split(types, "%s*,%s*")
	for i, typ in ipairs(splitvals) do
		-- FIXME: Temporary
		if typ == "topic" then
			typ = "related-to"
		end
		if not type_data[typ] then
			invalid_type(types)
		end
		splitvals[i] = typ
	end
	return splitvals
end


local function gsub_escaping_replacement(str, from, to)
	return (str:gsub(pattern_escape(from), replacement_escape(to)))
end


local function ucfirst(txt)
	local italics, raw_txt = txt:match("^('*)(.-)$")
	return italics .. mw.getContentLanguage():ucfirst(raw_txt)
end


local function lcfirst(txt)
	local italics, raw_txt = txt:match("^('*)(.-)$")
	return italics .. mw.getContentLanguage():lcfirst(raw_txt)
end


local function convert_spec_to_string(data, desc)
	if not desc then
		return desc
	end
	local desc_type = type(desc)
	if desc_type == "string" then
		return desc
	elseif desc_type == "number" then
		return tostring(desc)
	elseif not is_callable(desc) then
		error("Internal error: `desc` must be a string, number, function, callable table or nil; received a " ..
			desc_type)
	end
	desc = desc {
		lang = data.lang,
		sc = data.sc,
		label = data.label,
		category = data.category,
		topic_data = data.topdata,
	}
	if not desc then
		return desc
	end
	desc_type = type(desc)
	if desc_type == "string" then
		return desc
	end
	error("Internal error: the value returned by `desc` must be a string or nil; received a " .. desc_type)
end


local function get_and_cache(data, obj, key)
	local val = convert_spec_to_string(data, obj[key])
	obj[key] = val
	return val
end


local function process_default(desc)
	local stripped_desc = desc
	local no_singularize, wikify, add_the
	while true do
		local new_stripped_desc = stripped_desc:match("^(.+) no singularize$")
		if new_stripped_desc then
			no_singularize = true
		end
		if not new_stripped_desc then
			new_stripped_desc = stripped_desc:match("^(.+) wikify$")
			if new_stripped_desc then
				wikify = true
			end
		end
		if not new_stripped_desc then
			new_stripped_desc = stripped_desc:match("^(.+) with the$")
			if new_stripped_desc then
				add_the = true
			end
		end
		if new_stripped_desc then
			stripped_desc = new_stripped_desc
		else
			break
		end
	end
	if stripped_desc == "default" then
		return true, no_singularize, wikify, add_the
	else
		return false
	end
end


local function format_desc(data, desc)
	local desc_parts = {}
	local types = split_types(data.topdata.type)
	for _, typ in ipairs(types) do
		insert(desc_parts, type_data[typ].desc .. " " .. desc)
	end
	return "{{{langname}}} " .. require(table_module).serialCommaJoin(desc_parts) .. "."
end


local substitute_template_specs

local function format_displaytitle(data, include_lang_prefix, upcase)
	local topdata, lang, label = data.topdata, data.lang, data.label
	local displaytitle = substitute_template_specs(data, topdata.displaytitle)
	if not displaytitle then
		return nil
	end
	if upcase then
		displaytitle = ucfirst(displaytitle)
	end
	if include_lang_prefix and lang then
		displaytitle = ("%s:%s"):format(lang:getCode(), displaytitle)
	end

	return displaytitle
end


local function get_breadcrumb(data)
	local topdata, lang, label = data.topdata, data.lang, data.label
	local ret

	if lang then
		ret = topdata.breadcrumb or format_displaytitle(data, false, "upcase")
	else
		ret = topdata.umbrella and topdata.umbrella.breadcrumb or
			topdata.breadcrumb or format_displaytitle(data, false, "upcase")
	end
	if not ret then
		ret = label
	end

	if type(ret) == "string" or type(ret) == "number" then
		ret = {name = ret}
	end

	local name = substitute_template_specs(data, ret.name)
	local nocap = ret.nocap

	return {name = name, nocap = nocap}
end


local function make_category_name(lang, label)
	if lang then
		return lang:getCode() .. ":" .. ucfirst(label)
	else
		return ucfirst(label)
	end
end


local function replace_special_descriptions(data, desc)
	if not desc then
		return desc
	end

	if desc:find("^=") then
		desc = desc:gsub("^=", "")
		return format_desc(data, desc)
	end

	local is_default, no_singularize, wikify, add_the = process_default(desc)
	if is_default then
		local linked_label = require(topic_utilities_module).link_label(data.label, no_singularize, wikify)
		if add_the then
			linked_label = "the " .. linked_label
		end
		return format_desc(data, linked_label)
	else
		return desc
	end
end


local function get_displaytitle_or_label(data)
	return format_displaytitle(data, false) or data.label
end


local function process_default_add_the(data, topic)
	local is_default, _, _, add_the = process_default(topic)
	if is_default then
		topic = get_displaytitle_or_label(data)
		if add_the then
			topic = "the " .. topic
		end
	end
	return topic, is_default
end


substitute_template_specs = function(data, desc)
	desc = convert_spec_to_string(data, desc)
	if not desc then
		return nil
	end
	
	local topdata, lang, label = data.topdata, data.lang, data.label
	if desc:find("{{{umbrella_msg}}}") then
		local catname = ucfirst(label)
		desc = gsub_escaping_replacement(desc, "{{{umbrella_msg}}}",
			"This category contains no dictionary entries, only other categories. The subcategories are of two " ..
			"sorts:\n\n* Subcategories named like \"{{{thespref}}}aa:" .. catname ..
			"\" (with a prefixed language code) are categories of terms in specific languages. " ..
			"You may be interested especially in [[:Category:{{{thespref}}}en:" .. catname .. "]], for English terms.\n" ..
			"* Subcategories of this one named without the prefixed language code are further categories just like " ..
			"this one, but devoted to finer topics."
		)
	end
	if desc:find("{{{topic}}}") then
		-- Compute the value for {{{topic}}}. If the user specified `topic`, use it. (If we're an umbrella category,
		-- allow a separate value for `umbrella.topic`, falling back to `topic`.) Otherwise, see if the description
		-- was specified as 'default' or a variant; if so, parse it to determine whether to add "the" to the label.
		-- Otherwise, just use the label directly.
		local topic = not lang and topdata.umbrella and topdata.umbrella.topic or topdata.topic
		if topic then
			topic = process_default_add_the(data, topic)
		else
			local desc
			if not lang then
				desc = topdata.umbrella and get_and_cache(data, topdata.umbrella, "description") or
					get_and_cache(data, topdata, "umbrella_description")
			end
			desc = desc or get_and_cache(data, topdata, "description")
			local defaulted_desc, is_default = process_default_add_the(data, desc)
			if is_default then
				topic = defaulted_desc
			else
				topic = get_displaytitle_or_label(data)
			end
		end

		desc = gsub_escaping_replacement(desc, "{{{topic}}}", topic)
	end
	
	desc = desc:gsub("{{{thespref}}}", data.thesaurus_data and "Thesaurus:" or "")

	return desc
end


local function process_box(data, def_topright_parts, val, pattern)
	if not val then
		return
	end
	local defval = ucfirst(data.label)
	if type(val) ~= "table" then
		val = {val}
	end
	for _, v in ipairs(val) do
		if v == true then
			insert(def_topright_parts, pattern:format(defval))
		else
			insert(def_topright_parts, pattern:format(v))
		end
	end
end


local function get_topright(data)
	local topdata, lang = data.topdata, data.lang
	local def_topright_parts = {}
	process_box(data, def_topright_parts, topdata.wp, "{{wikipedia|%s}}")
	process_box(data, def_topright_parts, topdata.wpcat, "{{wikipedia|category=%s}}")
	process_box(data, def_topright_parts, topdata.commonscat, "{{commonscat|%s}}")

	local def_topright
	if #def_topright_parts > 0 then
		def_topright = concat(def_topright_parts, "\n")
	end

	if lang then
		return substitute_template_specs(data, topdata.topright or def_topright)
	else
		return topdata.umbrella and substitute_template_specs(data, topdata.umbrella.topright) or
			substitute_template_specs(data, def_topright)
	end
end


local function remove_lang_params(desc)
	desc = desc:gsub("^{{{langname}}} ", "")
	desc = desc:gsub("{{{langcode}}}:", "")
	desc = desc:gsub("^{{{langcode}}} ", "")
	desc = desc:gsub("^{{{langcat}}} ", "")
	return desc
end


local function get_additional_msg(data)
	local types = split_types(data.topdata.type)
	if #types > 1 then
		local parts = {"'''NOTE''': This is a mixed category. It may contain terms of any of the following category types:"}
		for i, typ in ipairs(types) do
			insert(parts, ("* %s {{{topic}}}%s"):format(type_data[typ].desc, i == #types and "." or ";"))
		end
		insert(parts, "'''WARNING''': Such categories are strongly dispreferred and should be split into separate per-type categories.")
		return concat(parts, "\n")
	elseif data.label == "all topics" then
		return "'''NOTE''': This is the topmost topic category for {{{langname}}}. It should not directly contain " ..
		"any terms, but only lists of topic categories organized by type."
	else
		return type_data[types[1]].additional
	end
end


local function get_labels_categorizing(data)
	local m_labels_utilities = require(labels_utilities_module)
	return m_labels_utilities.format_labels_categorizing(
		m_labels_utilities.find_labels_for_category(data.label, "topic", data.lang), nil, data.lang)
end


-- Return the description along with the text following and preceding the description. The description and additional
-- (i.e. following) text are returned in the form of closures so the work of calculating the text (which can be
-- expensive, especially in the case of the additional text, where get_labels_categorizing() scans the entire set of
-- labels for any that categorize into this category) is not done when not needed, e.g. in higher levels of the
-- breadcrumb chain, where only the breadcrumb and parents (in fact, really just the first parent) are actually needed.
local function get_description_additional_preceding(data)
	local topdata, lang, label = data.topdata, data.lang, data.label
	local desc, additional, preceding

	-- This is kind of hacky, but it works for now.
	local function postprocess_thesaurus(txt)
		if not txt then
			return nil
		end
		if not data.thesaurus_data then
			return txt
		end
		-- Replace " terms" only where it ends a word (followed by a non-letter or the end of the text).
		txt = txt:gsub(" terms%f[%A]", " thesaurus entries")
		return txt
	end

	if lang then
		desc = function()
			return postprocess_thesaurus(substitute_template_specs(data,
				replace_special_descriptions(data, get_and_cache(data, topdata, "description"))))
		end
		preceding = topdata.preceding
		additional = function()
			local additional_parts = {}
			if topdata.additional then
				insert(additional_parts, topdata.additional)
			end
			if not data.thesaurus_data then
				insert(additional_parts, get_additional_msg(data))
				local labels_msg = get_labels_categorizing(data)
				if labels_msg then
					insert(additional_parts, labels_msg)
				end
			end
			return postprocess_thesaurus(substitute_template_specs(data, concat(additional_parts, "\n\n")))
		end
	else
		if label == "all topics" then
			desc = "This is the topmost topic category for all languages."
			additional = "It contains no dictionary entries, only other categories. The subcategories are of two " ..
				"sorts:\n\n" ..
				"* Subcategories listed at the beginning, without a prefixed language code, are grouping " ..
				"categories similar to this category, but are devoted to general subject areas. Under them are " ..
				"finer-grained subject areas.\n" ..
				"* Subcategories named like \"aa:All topics\" (with a prefixed language code) are top-level " ..
				"categories like this one, but for specific languages. You may be interested especially in " ..
				"[[:Category:en:All topics]], for English terms.\n" ..
				"Note that categories under this tree categorize terms semantically rather than grammatically. " ..
				"Grammatical categories (such as all French verbs, or all English irregular plural forms) " ..
				"have a different naming structure, with the language name spelled out, such as " ..
				"[[:Category:French verbs]] or [[:Category:English irregular plurals]]."
			return desc, additional
		end

		-- Assume that if the description field contains a function, the function will return non-nil, so we don't
		-- have to call the function at this point (in case it is heavyweight).
		local has_umbrella_desc = topdata.umbrella and topdata.umbrella.description or topdata.umbrella_description

		desc = function()
			local desc = topdata.umbrella and get_and_cache(data, topdata.umbrella, "description") or
				get_and_cache(data, topdata, "umbrella_description")
			if not desc then
				desc = get_and_cache(data, topdata, "description")
				if desc then
					desc = replace_special_descriptions(data, desc)
					desc = remove_lang_params(desc)
					desc = desc:gsub("%.$", "")
					desc = "This category concerns the topic: " .. desc .. "."
				end
			end
			if not desc then
				desc = "Categories concerning " .. label .. " in various specific languages."
			end
			return postprocess_thesaurus(substitute_template_specs(data, desc))
		end

		preceding = topdata.umbrella and topdata.umbrella.preceding or not has_umbrella_desc and topdata.preceding
		if preceding then
			preceding = remove_lang_params(preceding)
		end

		additional = function()
			local additional_parts = {}
			local topdata_additional = topdata.umbrella and topdata.umbrella.additional or
				not has_umbrella_desc and topdata.additional
			if topdata_additional then
				insert(additional_parts, remove_lang_params(topdata_additional))
			end
			insert(additional_parts, "{{{umbrella_msg}}}")
			if not data.thesaurus_data then
				insert(additional_parts, get_additional_msg(data))
				local labels_msg = get_labels_categorizing(data)
				if labels_msg then
					insert(additional_parts, labels_msg)
				end
			end
			return postprocess_thesaurus(substitute_template_specs(data, concat(additional_parts, "\n\n")))
		end
	end

	preceding = substitute_template_specs(data, preceding)
	return desc, additional, preceding
end


local function normalize_sort_key(data, sort)
	local lang, label = data.lang, data.label
	if not sort then
		-- When defaulting sort key to label, strip 'The ' (e.g. in 'The Matrix', 'The Hunger Games')
		-- and 'A ' (e.g. in 'A Song of Ice and Fire', 'A Christmas Carol') from label.
		local stripped_sort = label:match("^[Tt]he (.*)$")
		if stripped_sort then
			sort = stripped_sort
		end
		if not stripped_sort then
			stripped_sort = label:match("^[Aa] (.*)$")
			if stripped_sort then
				sort = stripped_sort
			end
		end
		if not stripped_sort then
			sort = label
		end
	end

	sort = substitute_template_specs(data, sort)

	if not lang then
		sort = " " .. sort
	end

	return sort
end


local function get_topic_parents(data)
	local topdata, lang, label = data.topdata, data.lang, data.label
	local parents = topdata.parents

	if not lang and label == "all topics" then
		return {{ name = "Category:Fundamental", sort = "topics" }}
	end

	if not parents or #parents == 0 then
		return nil
	end

	local ret = {}

	for _, parent in ipairs(parents) do
		parent = mw.clone(parent)

		if type(parent) ~= "table" then
			parent = {name = parent}
		end

		parent.sort = normalize_sort_key(data, parent.sort)

		if type(parent.name) ~= "string" then
			error(("Internal error: parent.name is not a string: parent = %s"):format(dump(parent)))
		end
		if parent.name:find("^Category:") or parent.nontopic then
			-- leave as-is
			parent.nontopic = nil
		else
			parent.name = make_category_name(lang, parent.name)
		end
		parent.name = substitute_template_specs(data, parent.name)
		
		insert(ret, parent)
	end

	local function make_list_of_type_parent(typ)
		return {
			name = make_category_name(lang, ("list of %s categories"):format(typ)),
			sort = (not lang and " " or "") .. label,
		}
	end

	if topdata.type ~= "toplevel" then
		local types = split_types(topdata.type)
		for _, typ in ipairs(types) do
			insert(ret, make_list_of_type_parent(typ))
		end
		if #types > 1 then
			insert(ret, make_list_of_type_parent("mixed"))
		end
	end

	-- Add umbrella category.
	if lang then
		insert(ret, {
			name = make_category_name(nil, label),
			sort = lang:getCanonicalName(),
		})
	end

	return ret
end


local function get_thesaurus_parents(data)
	local topdata, lang, label = data.topdata, data.lang, data.label
	local parent_substitutions = data.thesaurus_data.parent_substitutions
	local parents = topdata.parents

	if not parents or #parents == 0 then
		return nil
	end

	local ret = {}

	for _, parent in ipairs(parents) do
		-- Process parent categories as follows:
		-- 1. skip non-topic cats and meta-categories that start with "List of"
		-- 2. map "en:All topics" to "English thesaurus entries" (and same for other languages), but map "All topics" itself to the root "Thesaurus" category
		-- 3. check if this parent is to be substituted, if so, substitute it
		-- 4. prepend "Thesaurus:" to all other category names
		parent = mw.clone(parent)

		if type(parent) ~= "table" then
			parent = {name = parent}
		end

		parent.sort = normalize_sort_key(data, parent.sort)

		if type(parent.name) ~= "string" then
			error(("Internal error: parent.name is not a string: parent = %s"):format(dump(parent)))
		end
		if parent.name:find("^Category:") or parent.nontopic then
			-- skip
		elseif parent.name == "all topics" or parent_substitutions[parent.name] == "all topics" then
			if not lang then
				insert(ret, {
					name = "Thesaurus",
					sort = label,
				})
			else
				insert(ret, {
					name = "thesaurus entries",
					sort = parent.sort,
					lang = lang:getCode(),
					is_label = true,
				})
			end
		else
			parent.name = "Thesaurus:" .. make_category_name(lang, parent_substitutions[parent.name] or parent.name)
			parent.name = substitute_template_specs(data, parent.name)
			insert(ret, parent)
		end
	end

	-- Add the non-thesaurus version of this category as a parent, unless it is a thesaurus-only category.
	if not topdata.thesaurusonly then
		insert(ret, { name = make_category_name(lang, label), sort = " " })
	end

	-- Add umbrella category.
	if lang then
		insert(ret, {
			name = "Thesaurus:" .. make_category_name(nil, label),
			sort = lang:getCanonicalName(),
		})
	end

	return ret
end


local function generate_spec(category, lang, upcase_label, thesaurus_data)
	local label_data = require(topic_data_module)
	local label

	-- Convert label to lowercase if possible
	local lowercase_label = mw.getContentLanguage():lcfirst(upcase_label)

	-- Check if the label exists
	local labels = label_data["LABELS"]

	if labels[lowercase_label] then
		label = lowercase_label
	else
		label = upcase_label
	end

	local topdata = labels[label]

	-- Go through handlers
	if not topdata then
		for _, handler in ipairs(label_data["HANDLERS"]) do
			topdata = handler.handler(label)
			if topdata then
				topdata.module = handler.module
				break
			end
		end
	end

	if not topdata then
		return nil
	end

	local data = {
		category = category,
		lang = lang,
		label = label,
		topdata = topdata,
		thesaurus_data = thesaurus_data,
	}

	local description, additional, preceding = get_description_additional_preceding(data)
	local parents
	if thesaurus_data then
		parents = get_thesaurus_parents(data)
	else
		parents = get_topic_parents(data)
	end

	return {
		lang = lang and lang:getCode() or nil,
		description = description,
		additional = additional,
		preceding = preceding,
		parents = parents,
		breadcrumb = get_breadcrumb(data),
		displaytitle = format_displaytitle(data, "include lang prefix", "upcase"),
		topright = get_topright(data),
		module = topdata.module,
		can_be_empty = not lang,
		hidden = false,
	}
end


-- Handler for `Thesaurus:...` categories.
table.insert(raw_handlers, function(data)
	local code, upcase_label = data.category:match("^Thesaurus:(%l[%a-]*%a):(.+)$")
	local lang
	if code then
		lang = require(languages_module).getByCode(code)
		if not lang then
			mw.log(("Category '%s' looks like a language-specific thesaurus category but unable to match language prefix"):
				format(data.category))
			return nil
		end
	else
		upcase_label = data.category:match("^Thesaurus:(.+)$")
	end

	if upcase_label then
		local thesaurus_data = require(thesaurus_data_module)
		-- substituted category names are not allowed
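		local substitutions = thesaurus_data.parent_substitutions
		if substitutions[upcase_label] or substitutions[lcfirst(upcase_label)] then
			error(("Category is not allowed as a Thesaurus category: %s (see the list of parent substitutions at " ..
				"[[Module:category tree/topic/thesaurus data]])"):format(data.category))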
		end
		return generate_spec(data.category, lang, upcase_label, thesaurus_data)
	end
end)


-- Handler for regular topic categories.
table.insert(raw_handlers, function(data)
	local code, upcase_label = data.category:match("^(%l[%a-]*%a):(.+)$")
	local lang
	if code then
		lang = require(languages_module).getByCode(code)
		if not lang then
			mw.log(("Category '%s' looks like a language-specific topic category but unable to match language prefix"):
				format(data.category))
			return nil
		end
	else
		upcase_label = data.category
	end

	return generate_spec(data.category, lang, upcase_label)
end)


-----------------------------------------------------------------------------
--                                                                         --
--                              RAW CATEGORIES                             --
--                                                                         --
-----------------------------------------------------------------------------


raw_categories["Thesaurus"] = {
	description = "Category for entries of the Wiktionary thesaurus, located in a separate namespace.",
	additional = [=[
There are '''three ways to browse''' the thesaurus:
* Look under ''']''' to get started.
* Use the search box below.
* Browse the thesaurus by topic using the links under "Subcategories" below.

The main project page is ].

{{ws header|<nowiki/>|link=}}]=],
	parents = {
		"Category:Fundamental",
		"Category:Wiktionary projects",
	},
}

return {RAW_CATEGORIES = raw_categories, RAW_HANDLERS = raw_handlers}