Explaining XML Basics

Objective

After completing this lesson, you will be able to explain XML basics.

XML Basics

XML is a markup language that defines a set of rules for encoding documents in a format that is human-readable and machine-readable. It is a textual data format supporting Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services.

The XML specification published by the World Wide Web Consortium (W3C) defines a metalanguage. Application-specific languages are derived from it by restricting structure and content. These restrictions are expressed in schema languages such as Document Type Definition (DTD) or XML Schema. Examples of XML languages are RSS, MathML, GraphML, XHTML, XAML, Scalable Vector Graphics (SVG), GPX, and XML Schema itself.

An XML document consists of text characters, in the simplest case in ASCII encoding, and is thus human-readable. By definition, it does not contain binary data.

The most important structural unit of an XML application is the element. The name of an XML element can be chosen by the user. Elements can contain additional elements, text nodes, and other nodes, and these can be mixed if necessary. Elements are the carriers of information in an XML document, regardless of whether that information is text, images, or something else.

According to the tag hierarchy, the elements are called root node, parent node, or child node. An XML document can also be understood as a tree structure. Such a result tree is generated by parsers, for example.
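
The tree view described above can be sketched with Python's standard-library parser. This is only an illustration: the sample document and its element names (order, customer, items, item) are hypothetical, and the helper outline is not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are made up for illustration.
XML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<order id="4711">
  <customer>Example Corp</customer>
  <items>
    <item>Pump</item>
    <item>Valve</item>
  </items>
</order>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:  # iterating an element yields its direct children
        rows.extend(outline(child, depth + 1))
    return rows


# The parser turns the text into a tree; root is the root node.
root = ET.fromstring(XML_TEXT.encode("utf-8"))
tree_rows = outline(root)

for depth, tag in tree_rows:
    print("  " * depth + tag)
```

Here order is the root node, items is a child of order, and items is in turn the parent of the two item elements.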

XML - Well-formed and Valid

The XML specification defines an XML document as well-formed text, meaning that it satisfies some syntax rules.

Some key points:

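  • If the XML document has an XML declaration, the declaration is the very first thing in the document. The declaration is optional in XML 1.0, but recommended.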
  • A single root element contains all the other elements.
  • All elements have a start-tag and an end-tag, or consist of a single empty-element tag such as <br/>.
  • Tag names are case sensitive, so start-tag and end-tag have to match exactly.
  • The document contains only properly encoded legal Unicode characters.
  • An element must not have more than one attribute with the same name.

If these rules are not observed, then the XML document is not well-formed. Most XML editors offer a well-formedness check.
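
A well-formedness check of this kind can be approximated in a few lines of Python. The sketch below assumes that a document counts as well-formed if the standard-library parser accepts it. The function name is_well_formed and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, each violating one of the rules listed above
# (except the first one, which is well-formed).
samples = {
    "matching tags": "<note><to>Anna</to></note>",
    "case mismatch": "<note><to>Anna</To></note>",
    "two root elements": "<note/><note/>",
    "duplicate attribute": '<note id="1" id="2"/>',
    "missing end-tag": "<note><to>Anna</note>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}

for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```

Note that this only checks syntax. It says nothing about whether the document is valid against a DTD or an XSD.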

In addition to being well-formed, an XML document may be valid. This means that it contains a reference to a Document Type Definition (DTD) or an XML schema (XSD), and that its elements and attributes are declared in that DTD / XSD and follow the grammatical rules for them that the DTD / XSD specifies.

XML Schema Definition

XML Schema Definition (XSD) is a W3C Recommendation for defining structures for XML documents. Unlike classic XML DTDs, the structure is described in the form of an XML document. In addition, a large number of data types are supported.

XML Schema describes data types, individual XML Schema instances (documents), and groups of such instances in a complex schema language. A concrete XML schema is also referred to as an XSD (XML Schema Definition) and is usually stored in a file with the extension ".xsd". Unlike DTDs, XML Schemas can be used to distinguish between the name of the XML type and the name of the XML tag used in the instance.
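
Because an XSD is itself an XML document, it can be read with an ordinary XML parser. The following sketch uses a small hypothetical schema (the names order, OrderType, customer, and quantity are assumptions) and lists which tag name is bound to which type name. It only reads the schema; Python's standard library does not validate documents against it.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the XML Schema language itself.
XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: the tag "order" uses the separately named type "OrderType".
XSD_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="customer" type="xs:string"/>
      <xs:element name="quantity" type="xs:positiveInteger"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

schema = ET.fromstring(XSD_TEXT)

# Map each declared tag name to the name of its type.
declared = {
    el.get("name"): el.get("type")
    for el in schema.iter(f"{{{XS}}}element")
}

# Collect the names of the complex types defined in the schema.
named_types = [ct.get("name") for ct in schema.iter(f"{{{XS}}}complexType")]

for tag_name, type_name in declared.items():
    print(f"tag <{tag_name}> has type {type_name}")
```

The tag name order and the type name OrderType are independent of each other, which is the distinction between tag and type mentioned above.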

XML Namespaces

You can use your own element names (tags) in XML documents. To ensure the uniqueness of these tags, individual tags can be assigned to a specific namespace. Each namespace is uniquely identified by a URI (Uniform Resource Identifier). For example, a URL can be used as a URI. The URI does not have to include a protocol such as http://, and it does not have to point to a document. Namespaces must be unique, which is why internet addresses are almost always used in URIs for XML applications: internet addresses are unique, so the uniqueness of the namespace is guaranteed as well. If a tag belongs to a particular namespace, this namespace must also be assigned to the tag. To do this, add the xmlns attribute to the tag, with the namespace URI as the attribute value.

The namespace then also applies to all child elements of this tag. If you add the attribute to the root element, the namespace applies to the entire document.
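
The inheritance of a default namespace can be observed with a parser. In this minimal sketch, the namespace URI http://example.com/catalog and the element names are hypothetical. Python's ElementTree reports each tag in the form {namespace}name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: xmlns is set once, on the root element.
XML_TEXT = """<catalog xmlns="http://example.com/catalog">
  <product>
    <name>Pump</name>
  </product>
</catalog>"""

root = ET.fromstring(XML_TEXT)

# Walk the whole tree and collect the namespace-qualified tag of every element.
tags = [el.tag for el in root.iter()]

for tag in tags:
    print(tag)
```

Although only the root element carries the xmlns attribute, the child elements product and name are reported in the same namespace.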

This type of namespace specification works as long as elements from different namespaces are not mixed. If elements from different namespaces appear mixed, it is usually more appropriate to use qualified names:

The figure shows where the default namespace and the qualified names are placed within the code.
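
As a text-based illustration of mixing a default namespace with qualified names, consider the following sketch. The namespace URIs and the prefixes cust and ord are assumptions chosen for the example. Both namespaces define an element called id, and the prefix keeps the two apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the default namespace covers unprefixed tags,
# and the prefix "cust" marks tags from a second namespace.
XML_TEXT = """<order xmlns="http://example.com/order"
       xmlns:cust="http://example.com/customer">
  <id>4711</id>
  <cust:id>C-100</cust:id>
</order>"""

# Prefixes used for lookups in this script; they are local to the script
# and do not have to match the prefixes used in the document.
NS = {
    "ord": "http://example.com/order",
    "cust": "http://example.com/customer",
}

root = ET.fromstring(XML_TEXT)

# Same local name "id", but two different namespaces.
order_id = root.find("ord:id", NS).text
customer_id = root.find("cust:id", NS).text

print("order id:", order_id)
print("customer id:", customer_id)
```

What identifies an element is the combination of namespace URI and local name. The prefix is only a shorthand for the URI.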
