OK, so I'm not quite sure of the difference between these languages. Could someone clarify? I know that XML has user-defined tags and HTML has predefined ones, but that's basically the extent of my knowledge.
I know that HTML5 is supposed to replace HTML, but wasn't XML supposed to do that as well? Basically, which languages here are a substitute for the other, and which complement? Does XML replace XHTML?
You can Google it or check Wikipedia for the exact definitions. I'll just give an example:
HTML :
XHTML:
HTML 5:
XML is the syntax on which XHTML is based:
First, there was SGML, the conceptual ancestor of both HTML and XML.
Then, HTML was created as a specific set of SGML tags used to define how web pages should be presented.
XML was created as a simplification of SGML.
XHTML was created to recast HTML as well-formed XML (requiring closing tags, for example, which hadn't been strictly necessary in SGML and HTML).
HTML 5 is the current version of HTML. It rejects the motivation behind XHTML and allows a looser specification of markup than the rules of XML would require.
XML is a meta-language: it provides a syntax for creating other languages without constraining what they can express through a predefined grammar. XML is defined as a restricted subset (an application profile) of SGML. Adherence to XML's strict syntax requirements is called well-formedness; the aim is uniform processing of a document across different applications and user agents.
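To make well-formedness concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The helper name `is_well_formed` and the sample strings are made up for illustration; the point is only that an XML parser rejects anything that breaks the syntax rules, whatever the tags are called.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Every start tag has a matching end tag, properly nested.
print(is_well_formed("<note><to>Ann</to></note>"))

# <to> is never closed, so the parser stops with an error.
print(is_well_formed("<note><to>Ann</note>"))

# HTML-style <br> without a closing tag is not allowed in XML either.
print(is_well_formed("<p>one<br>two</p>"))

# The self-closing form <br/> is fine.
print(is_well_formed("<p>one<br/>two</p>"))
```

Note that this checks syntax only. Whether the tags make sense for a given language is a separate question, answered by a Doctype or a Schema.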
SGML is a meta-language like XML, and is in fact XML's parent. SGML allows a great deal of freedom in syntax but provides no data typing convention. Unlike SGML, XML has a rigid and greatly simplified syntax that leaves little room for ambiguity. XML also supports data type definitions, again unlike SGML. Namespace declarations in XML are scoped to the declaring element and its descendants, much like lexical scope, while SGML provides no support for namespaces.
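The namespace scoping can be seen with a small sketch in Python's standard `xml.etree.ElementTree`. The `urn:example:a` and `urn:example:b` namespace names and the element names are invented for illustration. ElementTree reports each tag as `{namespace}name`, which makes the scope of each declaration visible.

```python
import xml.etree.ElementTree as ET

# A default namespace declared on an element applies to that element and
# its descendants until an inner element declares a different one.
doc = (
    '<root xmlns="urn:example:a">'
    '<child/>'
    '<other xmlns="urn:example:b"><leaf/></other>'
    '<tail/>'
    '</root>'
)

tags = [el.tag for el in ET.fromstring(doc).iter()]
for tag in tags:
    print(tag)
```

Here `child` and `tail` inherit the outer namespace, while `other` and `leaf` fall under the inner declaration, which ends when `other` is closed.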
Doctype (DTD, Document Type Definition) is an SGML-based language whose syntax is completely unlike XML's. It is used to define markup language grammars and broad data type conventions that tell data elements apart from text.
XML Schema is a language, itself written in XML, for defining language grammars with precise structural form as well as specific data typing conventions for elements, structures, and attributes. Unlike SGML vocabularies, languages written in Schema are structurally self-aware: they know their own internal requirements at any point in the structure. Because of this self-awareness, a language defined by a schema can be validated immediately by reference to the Schema document, whereas a language defined in Doctype requires separate software with static definitions in order to perform validation.
HTML 1.0 was written in English text and is neither SGML nor XML.
HTML 2 - 4 are written in SGML and feature SGML flexibilities, such as uppercase tags or start tags without a matching closing tag.
XHTML 1.0 is an SGML-defined form of the HTML language with some extended requirements to gain progressive compatibility with XML syntax.
XHTML 1.1 is the HTML language defined in XML with XML well-formedness requirements.
HTML5, like HTML 1.0, is not defined using any meta-language. It is written in English text and moves radically in opposition to the uniform requirements of an XML serialization. HTML5 appears to have been created for usability and media delivery without regard for structure or language hierarchies.
XHTML5 stands for "XML Serialization of HTML5". It is an XML syntax for HTML5 that can be used when serializing a DOM tree back into HTML5 (a DOM tree loses the ability to distinguish between tag-soup tags and properly formed tags), and it must adhere to the stricter XML rules and namespaces. It is meant for easier machine reading or data interchange, or for when two HTML5 documents need to be compared. It is specified together with, and in, the HTML5 standard (thanks to hsivonen for pointing this out).
XML is a syntax: it defines how you write data, but not what data you can write. For example:
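As a minimal sketch of "syntax, not vocabulary", the Python snippet below feeds two unrelated documents to the same standard-library XML parser. The `recipe` and `invoice` vocabularies, and the helper `tag_names`, are invented for illustration. The parser knows nothing about either vocabulary and accepts both, because both follow the XML syntax rules.

```python
import xml.etree.ElementTree as ET

# Two made-up vocabularies; XML itself defines neither of them.
recipe = "<recipe><title>Toast</title><ingredient qty='2'>bread</ingredient></recipe>"
invoice = "<invoice><customer>Ann</customer><total currency='EUR'>12.50</total></invoice>"

def tag_names(text: str) -> list[str]:
    """List element names in document order (hypothetical helper)."""
    return [el.tag for el in ET.fromstring(text).iter()]

print(tag_names(recipe))
print(tag_names(invoice))
```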
HTML is a vocabulary: it defines what kinds of elements you can write (e.g. BODY, P, LI, etc.) but isn't very strict about how you write them (see "Tag soup").
XHTML is (approximately) the HTML vocabulary except written using the (much stricter) XML syntax. It's therefore (because the syntax is stricter) easier for software to parse, but it's harder for non-programmers to write correctly. It isn't very popular, because Internet Explorer doesn't support it properly.
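The difference in strictness can be sketched with Python's standard library. This is only an illustration: `html.parser.HTMLParser` is a simple tolerant tokenizer, not a browser's HTML parser, and the `TagCollector` class, the `xml_accepts` helper, and the sample markup are made up here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class TagCollector(HTMLParser):
    """Record every start tag the tolerant HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        self.start_tags.append(tag)

def xml_accepts(text: str) -> bool:
    """Return True if a strict XML parser accepts `text`."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Tag soup: uppercase names and no closing tags at all.
soup = "<P>first paragraph<BR>second line<LI>item"

collector = TagCollector()
collector.feed(soup)
collector.close()

print(collector.start_tags)  # the HTML tokenizer copes with the input
print(xml_accepts(soup))     # the XML parser refuses it
```

The tolerant parser has to guess what the author meant, which is why software that consumes HTML is harder to write than software that consumes XHTML.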
HTML5 is the next-generation version of HTML (the current version being HTML 4). It is still in draft, not yet a standard, and only partially supported by some browsers (and so, experimental). HTML5 will explicitly support being served either using the XML syntax or as tag soup.
The standards for all those languages are maintained by the World Wide Web Consortium.
The exact differences and subtleties are beyond the scope of a question on Stack Overflow, but w3schools.com has some tutorials that can help you get started on this.
I'd suggest reading the intro to each of the languages you asked about on w3schools. That should give you some idea as to the differences.
HTML is the HyperText Markup Language, which is designed to create structured documents and provide for semantic meaning behind the documents. HTML5 is the next version of the HTML specification.
XML is the Extensible Markup Language, which provides rules for creating, structuring, and encoding documents. You often see XML being used to store data and to allow for communication between applications. It's programming language-agnostic - all of the major programming languages provide mechanisms for reading and writing XML documents, either as part of the core or in external libraries.
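As a rough sketch of XML used for data exchange, the Python snippet below builds a small document with the standard `xml.etree.ElementTree` module, serializes it to text, and parses it back as a receiving application might. The `order` and `item` element names and their attributes are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Sending side: build a document and serialize it to a string.
root = ET.Element("order", id="42")
item = ET.SubElement(root, "item", sku="A-1")
item.text = "Widget"
payload = ET.tostring(root, encoding="unicode")
print(payload)

# Receiving side: any XML library, in any language, could parse this text.
parsed = ET.fromstring(payload)
order_id = parsed.get("id")
first_item = parsed.find("item")
print(order_id, first_item.get("sku"), first_item.text)
```

Only the text in `payload` crosses the boundary between the two sides, so the receiver does not need to be written in the same language as the sender.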
XHTML is an XML-based HTML. It serves the same function as HTML, but with the same rules as XML documents. These rules deal with the structure of the markup.