@Lady No fan of U+FFFC?
@aschmitz U+FFFC is allowed in XML documents; i need a character which is NOT allowed in XML documents but which is still a valid Unicode character
there are three of these in XML 1.1: U+0000 (not ideal), U+FFFE, and U+FFFF (both of these last two are great); XML 1.0 additionally excludes most of the C0 controls
@aschmitz (well, do not use U+FFFE in a UTF‐16 environment where it might be confused for a byte‐swapped U+FEFF)
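For readers following along, here is a minimal sketch of the two points above, assuming the `Char` productions of XML 1.0 and XML 1.1 as the definition of "allowed in an XML document". The function names are hypothetical and not from the thread. Note that XML 1.1 admits most C0 controls only as character references, which this sketch does not model.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if code point cp matches the Char production of XML 1.1."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_scalar_value(cp: int) -> bool:
    """Surrogates are code points but not Unicode scalar values."""
    return not 0xD800 <= cp <= 0xDFFF


# Scalar values that each XML version refuses anywhere in a document.
excluded_xml11 = [
    cp for cp in range(0x110000)
    if is_scalar_value(cp) and not is_xml11_char(cp)
]
excluded_xml10 = [
    cp for cp in range(0x110000)
    if is_scalar_value(cp) and not is_xml10_char(cp)
]

# The UTF-16 caveat: a big-endian BOM (U+FEFF) read with the wrong
# byte order comes out as U+FFFE.
byte_swapped_bom = "\ufeff".encode("utf-16-be").decode("utf-16-le")

if __name__ == "__main__":
    print([f"U+{cp:04X}" for cp in excluded_xml11])
    print(f"U+{ord(byte_swapped_bom):04X}")
```

Under XML 1.1 the excluded list comes out as exactly U+0000, U+FFFE, and U+FFFF; under XML 1.0 it also contains the C0 controls other than tab, line feed, and carriage return.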
@Lady Fair enough I suppose, though expecting that your input will always be valid feels like asking for a certain kind of trouble. But if you're the one writing it you're probably okay. (And yeah, though FFFE is theoretically allowed I'd avoid it for the reason you say unless you can guarantee it won't show up early.)
@aschmitz the other best‐practice with noncharacters is to never store them in a place where anyone other than the program which understands their meaning will see them
having the noncharacters produce XML which isn’t well‐formed provides a bit of a guarantee against that; a downstream recipient SHOULD error out if it receives a document where the noncharacter wasn’t handled/removed
@aschmitz i am very disappointed in the state of XML parsers as well
@Lady Ideally! (In my world, most XML parsers are extremely far from validating, but a final check that things are valid as they depart is feasible enough, at least.)
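The "final check as things depart" mentioned above could look roughly like this. It is a sketch only, assuming XML 1.0's `Char` production and a U+FFFF sentinel; `SENTINEL` and `check_departing` are hypothetical names, not anything either poster describes using.

```python
import re

# Hypothetical internal marker; must never leave the program.
SENTINEL = "\uffff"

# Matches any single character outside the XML 1.0 Char production,
# including lone surrogates that a Python str can contain.
_FORBIDDEN = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_departing(text: str) -> str:
    """Return text unchanged, or raise if it holds a forbidden character."""
    match = _FORBIDDEN.search(text)
    if match:
        raise ValueError(
            f"U+{ord(match.group()):04X} at offset {match.start()} "
            "is not allowed in XML 1.0"
        )
    return text


if __name__ == "__main__":
    print(check_departing("<a>fine</a>"))
    try:
        check_departing(f"<a>{SENTINEL}</a>")
    except ValueError as err:
        print("refused:", err)
```

Running the check on the serialized output, rather than trusting a downstream parser to reject it, means an unhandled sentinel fails loudly at the source even when the recipient's parser is lenient.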