2007-01-16
Unifying config file syntaxes with nesting
In his post “Ok, now we are getting somewhere! (Re: XML-based configurations)”, Gunnar Wolf writes that he doesn’t regard XML as a suitable standard for configuration files that need some sort of nesting. (The Apache configuration, for example, needs nesting, while the config for a simple /usr/bin/sendmail replacement like nullmailer or msmtp doesn’t.) He seems to like YAML, as you can see in his first post on the subject: Configuration files for humans and for computers.
I looked into YAML a bit, and to me it doesn’t seem like a good replacement for XML in config files. Sure, the files look simpler, but I would have a heck of a time memorizing the syntax. A colon (:) is intuitive for a direct mapping, but it is harder to remember that a pipe (“|”) marks a literal that preserves newlines, while the default is a scalar that maps newlines to spaces. Folded style (“>”) is even harder to remember: it folds multiple lines with the same indentation into a single line unless there is a differently indented or empty line in between.
More importantly, I honestly hate syntaxes that rely on indentation. Indentation is dead useful when a nested config or source file is read by a human, but it often causes problems when the file is edited by a human, especially when tabs are used. A program could (and most often would) see a difference between “<tab>test” and “<8*space>test”, while a human would most probably not see one with default tab stops.
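To make the tab-versus-spaces point concrete, here is a minimal Python sketch. It assumes the usual tab stop of eight columns; the variable names are only for illustration. The two lines look the same in most editors, but a program comparing them byte for byte sees two different strings.

```python
# Two lines that look identical with default tab stops (every 8 columns).
tab_line = "\ttest"            # one tab character, then "test"
space_line = " " * 8 + "test"  # eight spaces, then "test"

# A program comparing the raw text sees two different strings.
looks_same_to_program = tab_line == space_line

# Only after expanding tabs to 8-column stops do they become equal,
# which is roughly what a human sees on screen.
looks_same_to_human = tab_line.expandtabs(8) == space_line

print(looks_same_to_program, looks_same_to_human)
```

An indentation-sensitive parser has to choose between these two views, and whichever it chooses will surprise somebody.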
To sum it up: I see the problems with XML, most notably badly formatted program output which doesn’t use indentation (some programs even put everything on a single line) and the sometimes overwhelming number of constructs. But YAML doesn’t seem to be the right solution to me either, especially due to the indentation-used-as-syntax-element problem.
What I would really like to see is a syntax that doesn’t use indentation as a syntax element and doesn’t require the name of the opening tag when closing it, but still allows it. In other words, both of the following constructs (or an equivalent which looks less like XML) should be legal:
<x>
whatever
</>

<x>
whatever
</x>
While the following should raise an error:
<x>
whatever
</y>
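The rule behind these three examples can be sketched as a small stack-based checker. This is only an illustration of the idea in Python, not an existing format or library: the tag grammar in TAG and the function name check_nesting are made up for this sketch, and attributes, comments and escaping are ignored.

```python
import re

# Hypothetical tag grammar for this sketch: <name> opens a block,
# </name> or the anonymous </> closes the innermost open block.
TAG = re.compile(r"<(/?)([A-Za-z_][\w-]*)?>")

def check_nesting(text):
    """Return True if all blocks are closed correctly.

    Raises ValueError if a named closing tag does not match the
    innermost open block, or if the document is unbalanced.
    """
    stack = []
    for match in TAG.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2)  # None for <> and </>
        if not closing:
            if name is None:
                raise ValueError("opening tag needs a name")
            stack.append(name)
            continue
        if not stack:
            raise ValueError("closing tag without an open block")
        expected = stack.pop()
        # An anonymous </> closes whatever is innermost; a named
        # closing tag must agree with the block it closes.
        if name is not None and name != expected:
            raise ValueError(f"expected </{expected}> or </>, got </{name}>")
    if stack:
        raise ValueError(f"unclosed block: <{stack[-1]}>")
    return True

print(check_nesting("<x>\nwhatever\n</>"))
print(check_nesting("<x>\nwhatever\n</x>"))
```

Calling check_nesting on the third example, with </y> closing <x>, raises a ValueError. Indentation plays no role at all; the optional name on the closing tag is purely a cross-check.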

chithanh said,
January 16, 2007 at 21:35 CET (+0100)
This is part of XML’s SGML legacy. As SGML allows applications where closing tags are optional, it might be ambiguous which tag ends when the name is missing.
Gunnar said,
January 17, 2007 at 01:40 CET (+0100)
Well… The thing is, I don’t understand who came up with this silly SGML syntax. As you say, why repeat what you have already said? I used TeX for the first time in 1983 (or around 1983, at least), and it was much saner. {\something … }: ‘something’ is valid from where it is declared until its block is closed. Of course, it’s possible to close a brace in the wrong place, or to misplace \something so it falls before the brace, but getting a balanced document after copying or deleting a portion is (almost) as easy as adding braces until the balance is restored, and you can fix the little details later. The same goes for POD, although it has a somewhat stranger syntax, or for YAML, or for whatnot.
No wonder you basically do the same when you program. It would be silly to write <while (condition)> (…) </while>.
sven said,
January 17, 2007 at 15:37 CET (+0100)
chithanh: Well, I find the ambiguity of unnamed end tags far less worrying than leaving out end tags completely.
Gunnar: Well, speaking of TeX: Why do I have to close \begin{document} with \end{document} then? But you are right that fixing the syntax by inserting a few closing braces and fixing the “little” details later is convenient: Just fix the syntax first and see what the interpreter does with it to find out where the braces should actually be put. That’s often far easier than reading all the source to find the right place, especially since most interpreters don’t point you to the place where you introduced the error but to the point where the interpreter decided it couldn’t recover anymore (often at the very end of the code or document source).
But when I program, I often use stuff like this (bash shell code in this case, since most readers should be familiar with the syntax):
This is to give me hints where certain loops end or at least should end. That’s also what I use indenting for: To give me visual hints about the program structure. But as said in my original post: I don’t think indentation is suitable to give the interpreter/compiler information about the syntactical structure of a program.
I’m wondering a bit why no Python user has jumped on that assertion yet. I’ve not used Python myself yet, but as far as I know, it uses indentation as a syntax element.