character encodings re: HTML tips for 90s kids trying to make 20s websites, metadata edition
@Packbat the charset <meta> tag is the last‐resort option; browsers will look at the Content-Type provided by the server first, so specifying it does nothing UNLESS your server doesn’t specify a content type. also, if you write XHTML, UTF-8 is the default option because all XML defaults to UTF‐8. if you’re the type of person who always closes your tags anyway, you might as well just write XHTML and save yourself some trouble
(XHTML is also distinguished from HTML by its content‐type; be sure to use an extension which your web server will serve as XML, not as HTML, if you go this route)
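For anyone testing this locally, here is a rough sketch of a dev server that sends both a charset parameter for HTML and the XHTML content type. It assumes Python's built-in `http.server`; the class name `CharsetAwareHandler`, the `serve` helper, and the port are made up for illustration, and a real host (Apache, nginx, Neocities, etc.) will have its own way of configuring this.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class CharsetAwareHandler(SimpleHTTPRequestHandler):
    # Hypothetical handler: extend the stock extension -> Content-Type map.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        # Declare the charset in the header, so no <meta charset> is needed.
        ".html": "text/html; charset=utf-8",
        # Serve .xhtml files as XML so the browser uses its XML parser.
        ".xhtml": "application/xhtml+xml",
    }


def serve(port=8000):
    """Serve the current directory on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CharsetAwareHandler).serve_forever()
```

Calling `serve()` from the directory that holds your pages should be enough to try it out; the mapping is the only part that matters here.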
re: character encodings re: HTML tips for 90s kids trying to make 20s websites, metadata edition
@Lady I actually have no idea how to check that! We're gonna edit a note into the OP, though, because that seems important
re: character encodings re: HTML tips for 90s kids trying to make 20s websites, metadata edition
@Packbat these instructions are for firefox but the process is similar for other browsers:
• open the Web Inspector (Tools > Browser Tools > Web Developer Tools)
• navigate to the Network tab
• reload the page
• click on the first request in the results (should be for the page you are on)
• in the panel that opens, look under “Response Headers” for “content-type”
if it says `text/html; charset=utf-8`, then no charset declaration is necessary
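If you would rather check from a script than from the dev tools, something like the following might work. It is only a sketch using Python's standard library; the function names `charset_from_content_type` and `server_charset` are invented for this example, and `server_charset` needs network access and a URL you supply.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None when
    # the header carries no charset parameter.
    return msg.get_content_charset()


def server_charset(url):
    """Fetch `url` and report the charset the server declares (needs network)."""
    with urlopen(url) as response:
        # response.headers is an email.message.Message subclass.
        return response.headers.get_content_charset()


print(charset_from_content_type("text/html; charset=utf-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

A result of `None` is the case where a charset declaration in the page itself still matters.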
re: character encodings re: HTML tips for 90s kids trying to make 20s websites, metadata edition
@Packbat (a lot of web servers just default to this now because almost everyone in 2023 is writing HTML pages as utf-8)
character encodings re: HTML tips for 90s kids trying to make 20s websites, metadata edition
@Lady @Packbat I wouldn't recommend writing modern web pages in XHTML, though. The separate XHTML specs have been retired, and were never updated past equivalence with HTML 4.01. An XML serialization was defined for HTML 5 (this is *not* XHTML, although some people colloquially call it XHTML5), but it's rarely used. Some of the syntax and semantics are different enough that resources for HTML might not work as-is, and certain things that the HTML parser allows, like directly writing non-namespaced SVG in the document, are more work when you're dealing with XML. A bunch of the JavaScript DOM interfaces behave differently too.
character encodings re: HTML tips for 90s kids trying to make 20s websites, metadata edition
@kepstin @Packbat i disagree that the XML serialization of HTML5 isn't XHTML; it's in the xhtml namespace, has the mime type application/xhtml+xml, and most importantly is what i meant in the post
there are minor differences but generally the xml ones are “correct” and the html ones are “for compatibility reasons” (like uppercasing all tag names, despite nobody doing this in 2024)—with a few exceptions
character encodings re: HTML tips for 90s kids trying to make 20s websites, metadata edition
@Lady @Packbat official word on the name is at https://html.spec.whatwg.org/dev/introduction.html#html-vs-xhtml - "The XML syntax for HTML was formerly referred to as 'XHTML', but this specification does not use that term (among other reasons, because no such term is used for the HTML syntaxes of MathML and SVG)."
The application/xhtml+xml MIME type is used primarily for compatibility reasons - the XML serialization of modern HTML was designed to be a more or less forwards-compatible update from XHTML 1.1, to ease adoption.
I'd still recommend using non-XML HTML for hand-written stuff, just because the syntax is simpler and it'll match examples that you find on the web on sites like MDN, without needing a mental translation. (e.g. boolean attributes always trip me up in the XML syntax.)
In my opinion, the main reasons to choose XML are: a) you want the browser to refuse to render the page on syntax errors, instead of trying to recover and keep going; or b) you want to embed XML formats (other than SVG or MathML) into the document using XML namespaces.
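To get a feel for the strict-versus-forgiving difference without a browser, here is a small sketch using Python's standard-library parsers as stand-ins. These are not the browsers' parsers, so treat it as an analogy only; the helper names (`is_well_formed_xml`, `TagCollector`, `html_start_tags`) and the sample markup are made up for illustration. The sloppy sample includes an unclosed `<br>` and a valueless boolean attribute, both fine in the HTML syntax and both errors in XML.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed_xml(markup):
    """Return True if an XML parser accepts the markup, False if it rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Forgiving HTML tokenizer that records each start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup):
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


sloppy = "<p>unclosed break<br><input disabled></p>"
strict = '<p>closed break<br/><input disabled="disabled"/></p>'

print(is_well_formed_xml(sloppy))  # False
print(is_well_formed_xml(strict))  # True
print(html_start_tags(sloppy))     # ['p', 'br', 'input']
```

The XML parser rejects the sloppy version outright, while the HTML tokenizer carries on and still reports every tag.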
re: character encodings re: HTML tips for 90s kids trying to make 20s websites, metadata edition
@Packbat *unless your server doesn’t specify a charset parameter in the content‐type, lol