
want to know if something is a valid NCName? easy! just do

/self::node()[exsldyn:evaluate(concat('not(self::exsldyn:', $thing, ')'))]


XSLT 1.0 + EXSLT is a constant exercise in things you probably weren’t intended to be able to do but nevertheless can

this probably actually needs a touch of hardening (making sure $thing does not contain a '[' or '/', or else you could run into serious issues) but

/self::node()[translate(normalize-space($thing), ' /([,*', '')=string($thing) and exsldyn:evaluate(concat('not(self::exsldyn:', $thing, ')'))]

i think is safe

here is a shell command you can run to test whether the value of MAYBE_NCNAME is an ncname or not; returns exit status 0 if it is and 1 otherwise

printf '%s\n' '<transform xmlns="http://www.w3.org/1999/XSL/Transform" xmlns:exsldyn="http://exslt.org/dynamic" version="1.0"><param name="thing"/><template match="/"><choose><when test="/self::node()[translate(normalize-space($thing), &quot; /([,*&quot;, &quot;&quot;)=string($thing) and exsldyn:evaluate(concat(&quot;not(self::exsldyn:&quot;, $thing, &quot;)&quot;))]">ok</when><otherwise>ng</otherwise></choose></template></transform>' | xsltproc --stringparam thing "${MAYBE_NCNAME}" --html --novalid - /dev/null 2>/dev/null | grep -F -q -x 'ok'

this might be the easiest portableish way of doing this and that is a condemnation of the current state of computing on Posix

Posix be like, “ok but suppose someone is developing an operating system which never has to process XML” 🙄

everyone decided to build operating systems around C code and a portable shell and then they standardized that, but everyone agreed on a whole lot of other things also and they decided that wasn’t worth the bother

@Lady yeah, I believe you that it might well be the easiest portableish way to do this but when i look at the code ... well let's just say that 'easy' isn't the first word that sprang to mind.

@aschmitz you could do it in sed but i’m not sure the regex would be shorter and personally i would rather not worry that i may have a bug in my regular expression

@aschmitz (grep actually would probably be more appropriate but same difference)

@aschmitz (and if we want to be fully honest, unicode support in grep/sed is dicey so the answer might actually be no)

@aschmitz i would like to think you could LC_ALL=POSIX and write some extended regular expressions which just manually manage the UTF-8 bytes but (a) that sounds terrible and (b) i’m not actually sure that all implementations allow this in practice
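for a sense of what the manual route looks like, here is a minimal sketch that only handles the ASCII subset of NCName (a letter or underscore first, then letters, digits, '.', '-' or '_'). the function name is_ascii_ncname is made up for illustration, and the sketch deliberately dodges the hard part: it treats every non-ASCII name as invalid, so it rejects names that really are valid NCNames. it uses a case pattern rather than grep so that embedded newlines can't slip through a line-based match.

```shell
# Hypothetical helper: ASCII-only approximation of an NCName check.
# Assumption: rejecting all non-ASCII input is acceptable, which is
# NOT true of real NCNames (they allow many Unicode characters).
is_ascii_ncname() (
  # Runs in a subshell so the locale change does not leak out.
  # The C locale keeps ranges like A-Z byte-wise and predictable.
  LC_ALL=C
  case $1 in
    # Empty, bad first character, or any disallowed character anywhere.
    '' | [!A-Za-z_]* | *[!A-Za-z0-9._-]*) exit 1 ;;
  esac
  exit 0
)
```

usage would look like `is_ascii_ncname "${MAYBE_NCNAME}" && echo ok`, with exit status 0 for an accepted name and 1 otherwise, matching the convention of the xsltproc command.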

i need at least cross-platform between macOS and debian for anything i write, and unicode support and error handling are among those things that are liable to be subtly different between those platforms

@aschmitz (i am not actually trying to validate NCNames in shell right now i just needed a fast way to confirm that the approach worked. but i might at some point, for example as a filename restriction in some code)

@aschmitz (probably the actual easiest way is just to construct some xml and see if xmllint can parse it or throws an error. with appropriate escaping of the input of course)
