XML Syntax Rules
The syntax rules of XML are very simple and logical. The rules are easy to learn, and easy to use.
XML Documents Must Have a Root Element
XML documents must contain one root element that is the parent of all other elements:
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
In this example <note> is the root element:
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The XML Prolog
This line is called the XML prolog:
<?xml version="1.0" encoding="UTF-8"?>
The XML prolog is optional. If it exists, it must come first in the document.
XML documents can contain international characters, like Norwegian øæå or French êèé.
To avoid errors, you should specify the encoding used, or save your XML files as UTF-8.
UTF-8 is the default character encoding for XML documents.
Character encoding can be studied in our Character Set Tutorial.
UTF-8 is also the default encoding for HTML5, CSS, JavaScript, PHP, and SQL. |
All XML Elements Must Have a Closing Tag
In HTML, some elements might work well, even with a missing closing tag:
<p>This is a paragraph.
<br>
In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
<p>This is a paragraph.</p>
<br />
The XML prolog does not have a closing tag. This is not an error. The prolog is not a part of the XML document. |
XML Tags are Case Sensitive
XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.
Opening and closing tags must be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
"Opening and closing tags" are often referred to as "Start and end tags". Use whatever you prefer. It is exactly the same thing.
XML Elements Must be Properly Nested
In HTML, you might see improperly nested elements:
<b><i>This text is bold and italic</b></i>
In XML, all elements must be properly nested within each other:
<b><i>This text is bold and italic</i></b>
In the example above, "Properly nested" simply means that since the <i> element is opened inside the <b> element, it must be closed inside the <b> element.
XML Attribute Values Must be Quoted
XML elements can have attributes in name/value pairs just like in HTML.
In XML, the attribute values must always be quoted.
INCORRECT:
<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>
CORRECT:
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>
The error in the first document is that the date attribute in the note element is not quoted.
Entity References
Some characters have a special meaning in XML.
If you place a character like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element.
This will generate an XML error:
<message>salary < 1000</message>
To avoid this error, replace the "<" character with an entity reference:
<message>salary < 1000</message>
There are 5 pre-defined entity references in XML:
< | < | less than |
> | > | greater than |
& | & | ampersand |
' | ' | apostrophe |
" | " | quotation mark |
Only < and & are strictly illegal in XML, but it is a good habit to replace > with > as well. |
Comments in XML
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
Two dashes in the middle of a comment are not allowed.
Not allowed:
<!-- This is a -- comment -->
Strange, but allowed:
<!-- This is a - - comment -->
White-space is Preserved in XML
XML does not truncate multiple white-spaces (HTML truncates multiple white-spaces to one single white-space):
XML: | Hello Tove |
HTML: | Hello Tove |
XML Stores New Line as LF
Windows applications store a new line as: carriage return and line feed (CR+LF).
Unix and Mac OSX uses LF.
Old Mac systems uses CR.
XML stores a new line as LF.
Well Formed XML
XML documents that conform to the syntax rules above are said to be "Well Formed" XML documents.