**Vaporeon ☭** @vaporeon_@glaceon.social · Mar 10, 2026, 18:32

Yesterday, I finally got JSON parsing working, so today, I must fight HTML

**Vaporeon ☭** @vaporeon_@glaceon.social · Mar 10, 2026, 19:08

<- reading things about SGML and the HTML4 DTD and such

This was a reasonable introduction, but I still don't understand:

For some elements, e.g. <BODY>, it's possible to omit the start tag and the end tag... So how do I know when such an element begins and when it is over? (Since the HTML of Mastodon posts doesn't use <HEAD> and <BODY>, I suppose I can ignore this and expect every element to have a start tag, but I still would like to know...)

The SGML says which elements are allowed to be contained in an element, for example:

<!ELEMENT P - O (%inline;)*            -- paragraph -->

this says that a <P> must have a start tag, the end tag is allowed to be omitted, and that its children are allowed to be zero or more %inline;... So what is my parser supposed to do if a forbidden element is encountered? Since for <P>, the closing tag may be omitted, I guess the reasonable thing to do would be to close that paragraph and open whatever the new tag is. E.g. <P>Line one<P>Line two, upon encountering the second <P>, it would open a new paragraph, since <P> is not allowed to contain a <P>.

But in general, if the end tag may not be omitted, what do I do then? Throw an error?
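The guess above can be sketched as a small stack-based parser. This is only a sketch under stated assumptions, not how any real HTML parser or either specification does it: the `ALLOWED`, `INLINE`, `END_TAG_OPTIONAL`, and `VOID` tables are hypothetical, heavily reduced stand-ins for the DTD, `#root` is an invented placeholder for the document body, and raising `ValueError` is just one possible answer to the "what do I do then?" question.

```python
import re

# Assumption: a tiny subset of the DTD's %inline; entity, for illustration only.
INLINE = {"a", "b", "br", "em", "i", "span", "strong"}
# Elements declared "- O" (end tag may be omitted); again only a subset.
END_TAG_OPTIONAL = {"p", "li"}
# Elements with no content and no end tag.
VOID = {"br"}
# Hypothetical content models; "#root" is a placeholder for the body.
BLOCK = {"p", "div", "ul"}
ALLOWED = {
    "#root": BLOCK | INLINE,
    "div": BLOCK | INLINE,
    "ul": {"li"},
    "li": BLOCK | INLINE,
    "p": INLINE,
}

# Matches a start/end tag (attributes are skipped) or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)")


def parse(html):
    root = {"tag": "#root", "children": []}
    stack = [root]  # open elements, innermost last
    for match in TOKEN.finditer(html):
        closing, name, text = match.groups()
        if text is not None:
            stack[-1]["children"].append(text)
            continue
        name = name.lower()
        if closing:
            # Implicitly close omitted-end-tag elements above the match.
            while stack[-1]["tag"] != name:
                if stack[-1]["tag"] not in END_TAG_OPTIONAL:
                    raise ValueError(
                        f"unexpected </{name}> inside <{stack[-1]['tag']}>"
                    )
                stack.pop()
            stack.pop()
            continue
        # Start tag: if the current element may not contain it, close the
        # current element when its end tag is optional, otherwise give up.
        while name not in ALLOWED.get(stack[-1]["tag"], INLINE):
            if stack[-1]["tag"] not in END_TAG_OPTIONAL:
                raise ValueError(
                    f"<{name}> not allowed inside <{stack[-1]['tag']}>"
                )
            stack.pop()
        node = {"tag": name, "children": []}
        stack[-1]["children"].append(node)
        if name not in VOID:
            stack.append(node)
    # Anything still open at the end must have an optional end tag.
    for node in stack[1:]:
        if node["tag"] not in END_TAG_OPTIONAL:
            raise ValueError(f"missing </{node['tag']}>")
    return root
```

With these tables, `parse("<p>Line one<p>Line two")` produces two sibling `p` nodes, because the second start tag closes the first paragraph, while `parse("<b><p>x</p></b>")` raises, since `b` has a required end tag and may not contain `p`.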

**æʃliŋ, loaf of autismo** @aescling@cat.family · Mar 10, 2026, 19:18

@vaporeon_ the living standard is much more explicit about answering these kinds of questions. see, e.g., what they have to say about 

**Vaporeon ☭** @vaporeon_@glaceon.social · Mar 10, 2026, 19:22

@aescling That page ate my laptop's entire remaining RAM and I had to close it... I did see that there's a multipage version, can you perhaps link that? Sorry, I'm still on the laptop with less than 700MB of RAM...

**æʃliŋ, loaf of autismo** @aescling@cat.family · Mar 10, 2026, 19:38

@vaporeon_ try https://html.spec.whatwg.org/multipage/grouping-content.html#the-p-element
