here is a shell command you can run to test whether the value of MAYBE_NCNAME is an NCName or not; it returns exit status 0 if it is and 1 otherwise
printf '%s\n' '<transform xmlns="http://www.w3.org/1999/XSL/Transform" xmlns:exsldyn="http://exslt.org/dynamic" version="1.0"><param name="thing"/><template match="/"><choose><when test="/self::node()[translate(normalize-space($thing), &quot; /([,*&quot;, &quot;&quot;)=string($thing) and exsldyn:evaluate(concat(&quot;not(self::exsldyn:&quot;, $thing, &quot;)&quot;))]">ok</when><otherwise>ng</otherwise></choose></template></transform>' | xsltproc --stringparam thing "${MAYBE_NCNAME}" --html --novalid - /dev/null 2>/dev/null | grep -F -q -x 'ok'
@Lady Can't do it in sed?
@aschmitz you could do it in sed but i’m not sure the regex would be shorter and personally i would rather not worry that i may have a bug in my regular expression
@aschmitz i would like to think you could LC_ALL=POSIX and write some extended regular expressions which just manually manage the UTF-8 bytes but (a) that sounds terrible and (b) i’m not actually sure that all implementations allow this in practice
i need at least cross-platform between macOS and debian for anything i write, and unicode support and error handling are among those things that are liable to be subtly different between those platforms
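for illustration, here is a rough sketch of the LC_ALL=POSIX regex idea, restricted to ASCII only. it is an assumption-laden subset, not a real NCName check: it rejects every non-ASCII name start or name character that the NCName production allows, precisely to avoid the hand-managed UTF-8 bytes mentioned above. the function name `is_ascii_ncname` is made up for this example.

```shell
# hypothetical helper: succeeds only for ASCII-only NCNames.
# assumption: non-ASCII names are rejected on purpose, so this is
# stricter than the real NCName production.
is_ascii_ncname() {
  # grep works line by line, so reject embedded newlines up front;
  # otherwise "foo<newline>bar" would match on its first line.
  case $1 in
    *'
'*) return 1 ;;
  esac
  # LC_ALL=POSIX makes the bracket ranges plain byte ranges.
  # first character: letter or underscore; then letters, digits, '.', '_', '-'.
  printf '%s\n' "$1" | LC_ALL=POSIX grep -E -q -x '[A-Za-z_][A-Za-z0-9._-]*'
}
```

usage would look like `is_ascii_ncname "${MAYBE_NCNAME}" && echo ok`. the colon is left out of both bracket expressions, which is what separates an NCName from a plain XML Name.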
@aschmitz (i am not actually trying to validate NCNames in shell right now i just needed a fast way to confirm that the approach worked. but i might at some point, for example as a filename restriction in some code)
@aschmitz (probably the actual easiest way is just to construct some xml and see if xmllint can parse it or throws an error. with appropriate escaping of the input of course)
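a minimal sketch of that xmllint idea, with some assumptions: the function name `is_ncname` is made up, and because an element name cannot really be escaped, the "escaping" here is a pre-filter that rejects whitespace and XML-significant characters before anything reaches the parser. the colon is rejected in the shell too, since the parser alone only checks that the name is a valid XML Name.

```shell
# hypothetical helper: succeeds if "$1" parses as an element name
# with no colon, according to xmllint.
is_ncname() {
  case $1 in
    # empty input, whitespace, or markup characters could either
    # break the document or smuggle in attributes (e.g. 'a b="c"'),
    # so they are rejected here instead of being handed to xmllint.
    '' | *[[:space:]]* | *[\<\>\&\"\'/=:]*) return 1 ;;
  esac
  # what remains is used verbatim as the name of an empty element;
  # xmllint exits non-zero if the document is not well-formed.
  printf '<%s/>\n' "$1" | xmllint --noout - 2>/dev/null
}
```

usage would look like `is_ncname "${MAYBE_NCNAME}" && echo ok`. which characters count as name characters is then whatever the installed libxml2 accepts, which is the point of delegating to it, but it may still differ between versions.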
@Lady At least cross-platform, I guess.