@aschmitz (probably the actual easiest way is just to construct some xml and see if xmllint can parse it or throws an error. with appropriate escaping of the input of course)
@aschmitz (i am not actually trying to validate NCNames in shell right now i just needed a fast way to confirm that the approach worked. but i might at some point, for example as a filename restriction in some code)
@aschmitz i would like to think you could LC_ALL=POSIX and write some extended regular expressions which just manually manage the UTF-8 bytes but (a) that sounds terrible and (b) i’m not actually sure that all implementations allow this in practice
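a minimal sketch of the narrow end of that idea, assuming you only care about the ASCII subset of NCName: an ASCII letter or underscore, then ASCII letters, digits, `.`, `_`, or `-`. `is_ascii_ncname` is a hypothetical helper name. it rejects every non-ASCII name even though XML allows many of them, precisely because matching raw UTF-8 bytes in bracket expressions is the part that may not be portable.

```shell
# is_ascii_ncname VALUE
# Exit status 0 if VALUE is an ASCII-only NCName, 1 otherwise.
# Assumption: non-ASCII NCNames (valid in XML) are deliberately rejected here.
is_ascii_ncname() {
    case $1 in
        *'
'*) return 1 ;;  # grep matches line by line, so refuse embedded newlines first
    esac
    # LC_ALL=C makes the ranges plain byte ranges; -x anchors the whole line.
    printf '%s\n' "$1" | LC_ALL=C grep -E -q -x '[A-Za-z_][A-Za-z0-9._-]*'
}
```

usage would look like `is_ascii_ncname "${MAYBE_NCNAME}" && echo ok`. it only uses `case`, `printf`, and `grep -E -q -x`, all of which are in POSIX.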
i need at least cross-platform between macOS and debian for anything i write, and unicode support and error handling are among those things that are liable to be subtly different between those platforms
@aschmitz (and if we want to be fully honest, unicode support in grep/sed is dicey so the answer might actually be no)
@aschmitz (grep actually would probably be more appropriate but same difference)
@aschmitz you could do it in sed but i’m not sure the regex would be shorter and personally i would rather not worry that i may have a bug in my regular expression
everyone decided to build operating systems around C code and a portable shell and then they standardized that, but everyone agreed on a whole lot of other things also and they decided that wasn’t worth the bother
Posix be like, “ok but suppose someone is developing an operating system which never has to process XML” 🙄
this might be the easiest portableish way of doing this and that is a condemnation of the current state of computing on Posix
here is a shell command you can run to test whether the value of MAYBE_NCNAME is an NCName or not; it returns exit status 0 if it is and 1 otherwise
printf '%s\n' '<transform xmlns="http://www.w3.org/1999/XSL/Transform" xmlns:exsldyn="http://exslt.org/dynamic" version="1.0"><param name="thing"/><template match="/"><choose><when test="/self::node()[translate(normalize-space($thing), &quot; /([,*&quot;, &quot;&quot;)=string($thing) and exsldyn:evaluate(concat(&quot;not(self::exsldyn:&quot;, $thing, &quot;)&quot;))]">ok</when><otherwise>ng</otherwise></choose></template></transform>' | xsltproc --stringparam thing "${MAYBE_NCNAME}" --html --novalid - /dev/null 2>/dev/null | grep -F -q -x 'ok'
/self::node()[translate(normalize-space($thing), ' /([,*', '')=string($thing) and exsldyn:evaluate(concat('not(self::exsldyn:', $thing, ')'))]
i think is safe
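the `translate(normalize-space(...))` guard in that expression rejects values containing whitespace or any of `/`, `(`, `[`, `,`, `*` before they reach `evaluate`. a rough shell-side equivalent of the same check, as a sketch only: `has_unsafe_xpath_chars` is a hypothetical name, and this is not a replacement for the check inside the stylesheet.

```shell
# has_unsafe_xpath_chars VALUE
# Exit status 0 if VALUE contains whitespace or one of / ( [ , *
# (the characters the XPath guard strips), 1 otherwise.
has_unsafe_xpath_chars() {
    case $1 in
        *[[:space:]]*|*'/'*|*'('*|*'['*|*','*|*'*'*) return 0 ;;
    esac
    return 1
}
```

something like `has_unsafe_xpath_chars "${MAYBE_NCNAME}" && exit 1` could then run before handing the value to xsltproc.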
this probably actually needs a touch of hardening (making sure $thing does not contain a '[' or '/', or else you could run into serious issues) but
XSLT 1.0 + EXSLT is a constant exercise in things you probably weren’t intended to be able to do but nevertheless can
@akjcv mastodon instance actors have periods in them which i think are otherwise disallowed so it might have problems
but otherwise i don't know why it would be an issue; it's only socially problematic because an instance maybe can't block that account without breaking federation
@aescling there's a bunch of virtual console on my wii u but i don't think it's that one
@aescling not in delaware
@aescling praying it comes to nintendo switch online i guess
Administrator / Public Relations for GlitchCat. Not actually glitchy, nor a cat. I wrote the rules for this instance.
“Constitutionally incapable of not going hard” — @aescling
“Fedi Cassandra” – @Satsuma
I HAVE EXPERIENCE IN THINGS. YOU CAN JUST @ ME.
I work for a library but I post about Zelda fanfiction.