Module:User:Erutuon/patterns

Contains a function that determines if a pattern will behave in exactly the same way in both the basic Lua string functions (string.find, string.match, string.gsub, string.gmatch) and the Ustring functions (mw.ustring.find, mw.ustring.match, mw.ustring.gsub, mw.ustring.gmatch). This assumes text is validly encoded in UTF-8, the encoding used by MediaWiki.
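
To see why the distinction matters, the following sketch shows how the basic string functions treat a multi-byte character as separate bytes. It is an illustration only, meant to run in plain Lua rather than Scribunto. The variable names are made up for the example, and "é" is written with byte escapes so the listing does not depend on how the file is encoded.

```lua
-- "é" (U+00E9) is two bytes in UTF-8: \195\169.
local e_acute = "\195\169"
local word = "caf" .. e_acute -- "café"

-- A literal multi-byte character is just a fixed byte sequence,
-- so the basic string functions handle it correctly.
local plain, plain_count = string.gsub(word, e_acute, "e")
-- plain == "cafe", plain_count == 1

-- Inside a set, each byte is a separate member, so both bytes of "é"
-- are replaced individually.
local in_set, set_count = string.gsub(word, "[" .. e_acute .. "]", "e")
-- in_set == "cafee", set_count == 2

-- A dot matches one byte, not the whole character.
local dot_match = string.match(e_acute, ".")
-- #dot_match == 1

-- A quantifier applies only to the last byte, so "é?" matches a lone \195.
local lone = string.match("\195", e_acute .. "?")
-- lone == "\195"
```

The last three cases are the ones in which a pattern has to go through the Ustring functions to treat "é" as a single character.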

Beware: the function tells you that a pattern requires the Ustring functions if it contains character classes ("%w", "%s", "%d", ...), but this is not always true. It depends on which characters you actually need the character class to match. For example, if you're matching language codes, which only contain ASCII, then the character class "%a" only needs to match ASCII alphabetic characters ("[A-Za-z]"), not, for instance, Greek alphabetic characters ("αβγδεζηθ..."), and the basic string function can be used.
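
As a small illustration of that case, the sketch below matches a language code with "%a" using only the basic string function. The helper name leadingLetters is hypothetical, and the example assumes the default "C" locale, in which "%a" covers only ASCII letters.

```lua
-- Hypothetical helper: returns the leading run of ASCII letters, or nil.
local function leadingLetters(text)
	return string.match(text, "^%a+")
end

-- An ASCII-only language code is matched as expected.
local code = leadingLetters("grc-pro") -- "grc"

-- Greek letters consist of non-ASCII bytes, which "%a" does not match
-- in the C locale, so nothing is found here.
local greek = leadingLetters("\206\177\206\178") -- "αβ" as UTF-8 bytes; nil
```

If the input might contain non-ASCII letters that should count as alphabetic, the Ustring function is still the one to use.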

Usage

Don't use this in an actual template-invoked function to decide between string and mw.ustring on the fly. That would be very inefficient. If you want, use it to test a pattern and decide whether string will work just as well as mw.ustring in a given instance. Switching to string saves a noticeable amount of processing time if the function involves a lot of pattern-matching.

local UstringNotNeeded = require("Module:User:Erutuon/patterns").canUseString(pattern_to_check)
--> true if the pattern behaves the same in the string functions, false if it requires the Ustring functions

The function log wraps each of the mw.ustring functions find, match, gmatch and gsub so that a message is sent to the log whenever a call could be replaced by the corresponding string function.
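
The wrapping technique can be sketched outside Scribunto as follows. Because mw.ustring and mw.log exist only inside Scribunto, this sketch substitutes a local table (lib) and a message list (messages). The predicate looksSafe is a deliberately simplified, hypothetical stand-in for canUseString that only rejects patterns containing non-ASCII bytes.

```lua
-- Stand-in for mw.ustring so that the sketch runs in plain Lua (assumption).
local lib = { find = string.find, match = string.match }

-- Stand-in for the output of mw.log (assumption).
local messages = {}

-- Hypothetical, simplified replacement for canUseString.
local function looksSafe(patt)
	return not string.find(patt, "[\128-\255]")
end

for _, funcname in ipairs{ "find", "match" } do
	-- Keep a reference to the original function, then replace the table
	-- entry with a wrapper that records a message before delegating.
	local old_func = lib[funcname]
	lib[funcname] = function (str, patt, ...)
		if looksSafe(patt) then
			table.insert(messages, funcname .. ": " .. patt)
		end
		return old_func(str, patt, ...)
	end
end

-- The wrapped function still returns the original result.
local result = lib.match("abc123", "^abc")
```

After the call, result holds "abc" and messages contains a single entry for the match call.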


local export = {}

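-- Non-ASCII bytes. Only \128-\191 and \194-\244 are actually used in UTF-8.
local nonASCIIByte = "[\128-\255]"
-- A multi-byte UTF-8 character: a leading byte followed by continuation bytes.
local nonASCIIChar = "[\194-\244][\128-\191]+"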

-- Character classes that will match non-ASCII codepoints if they are used in
-- Ustring pattern-matching functions.
-- (The letter list is assumed: the Lua 5.1 class letters other than "z".)
local ustringClass = "%%[acdlpsuwxACDLPSUWX]"

local function isPattInPatt(str, patt1, patt2)
	for match in string.gmatch(str, patt1) do
		if string.find(match, patt2) then
			return true
		end
	end
	return false
end

-- Function to determine whether a pattern will behave exactly the same in the
-- basic string functions as it does in the Ustring functions.

-- The Lua-implemented version of Ustring has a function that supposedly makes
-- this determination, but it's overly conservative (disqualifying a pattern
-- if it contains non-ASCII bytes). Not sure about the PHP version of Ustring
-- that is actually used by Scribunto.

-- This does not check that the string is well-formed UTF-8.
function export.canUseString(pattern)
	assert(type(pattern) == "string", ("argument #1 to canUseString should be string, but is " .. type(pattern)))
	
	-- Remove percent sign followed by anything besides a letter.
	pattern = string.gsub(pattern, "%%%A", "")
	
	-- If non-ASCII inside a set, for example "[é]" or "[^é]"
	if isPattInPatt(pattern, "%b[]", nonASCIIByte) then
		return false
	end
	
	-- In Ustring, the classes listed in the pattern all contain multi-byte characters.
	if string.find(pattern, ustringClass) then
		return false
	end
	
	-- Quantifier following multi-byte character:
	-- in basic string function, quantifiers act on bytes, not on UTF-8 characters.
	if string.find(pattern, nonASCIIChar .. "[%*%+%?%-]") then
		return false
	end
	
	-- In basic string function, dot matches a single byte, not a UTF-8 character.
	if string.find(pattern, "%.") then
		return false
	end
	
	return true
end

function export.log()
	for _, funcname in ipairs{ "find", "match", "gmatch", "gsub" } do
		local old_func = mw.ustring[funcname]
		mw.ustring[funcname] = function (str, patt, ...)
			if export.canUseString(patt) then
				mw.log(funcname, str, patt, "can use string")
			end
			return old_func(str, patt, ...)
		end
	end
end

return export