Many blogs use the concept of "tags" and "categories" to add metadata to a post. What is the best practice for semantic markup for this information, such that a machine reading the blog post could easily identify the tags?
Currently I add "tag" to the rel attribute on the link, e.g.
<a rel="tag" class="tag" href="/tags.html#site-configuration">#site-configuration</a>
I suppose one could use Dublin Core's html format for keyword:
<meta name = "DC.Subject"
content = "site-configuration">
and add this to the page header, or can meta tags go in the body? Is one or the other preferable, or some entirely different option?
Is there a better strategy in terms of providing precise and standardized definitions for content?
Is HTML5 a reasonable choice if I want to be so picky about metadata, or should I be using an XML doctype?
What are the pros and cons of the different approaches?
The first step would be to get the plain HTML semantically right. In the case of (X)HTML5, you should build an appropriate outline using the sectioning content elements section, article, aside and nav, and use the header and footer elements to separate the metadata content from the main content. Also think of inline-level semantics like time (publication date), dfn (definitions), abbr (abbreviations/acronyms) etc. And make use of the meta name and rel values that are defined in the spec.

The second step would be to make use of metadata attribute values that are not defined in the specification, but are registered at specified places (so they are valid to use), like name keywords for meta elements and rel values for a/area/link elements.

The third step would be to enhance the markup with semantic, machine-readable annotations. There are three common ways to do this:

- RDFa
- Microdata
- Microformats (using class and rel values)

RDFa and Microdata are similar (both extensible and rather complex), while Microformats is simpler (but not so expressive/extensible). I wrote a short answer over at Programmers about the differences, and a more detailed answer about the differences between Microdata and RDFa.
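As a rough sketch of the first two steps in plain HTML5, before any annotation syntax is layered on top: the page below uses article, header, footer, time and abbr for structure, the spec-defined keywords name for meta, and rel="tag" on the tag link. The title, date and post text are made up for illustration; only the tag link is taken from the question. The dcterms.subject name is assumed to be a registered extension name, so check the registry before relying on it.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Configuring the site</title>
  <!-- Step 1: "keywords" is a meta name defined in the HTML spec -->
  <meta name="keywords" content="site-configuration">
  <!-- Step 2: an extension name, assumed here to be registered (verify first) -->
  <meta name="dcterms.subject" content="site-configuration">
</head>
<body>
  <article>
    <header>
      <h1>Configuring the site</h1>
      <!-- made-up publication date, in machine-readable form -->
      <p>Published <time datetime="2013-01-15">15 January 2013</time></p>
    </header>
    <p>Post text, with inline semantics such as
      <abbr title="Hypertext Markup Language">HTML</abbr> where they apply.</p>
    <footer>
      <!-- rel="tag" is a link type defined in the HTML spec -->
      <p>Tags: <a rel="tag" href="/tags.html#site-configuration">#site-configuration</a></p>
    </footer>
  </article>
</body>
</html>
```

Note that the meta elements describe the whole page, while the rel="tag" link sits inside the article it belongs to. That matters on pages that list several posts.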
In the case of RDFa or Microdata, your main job would be to find vocabularies/ontologies that are able to describe/classify your content. Such vocabularies can be created by anyone (you could even create one yourself), but it's often advisable to use well-known/popular ones, for example so that search engines can make use of your annotations (popular example: Schema.org).
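To illustrate, here is a minimal sketch of the same blog post annotated with the Schema.org vocabulary, once in Microdata and once in RDFa. It assumes BlogPosting is the type that fits your posts and uses its headline, datePublished, articleBody and keywords properties; the title, date and text are placeholders.

```html
<!-- Microdata: itemscope/itemtype open an item, itemprop names its properties -->
<article itemscope itemtype="https://schema.org/BlogPosting">
  <h1 itemprop="headline">Configuring the site</h1>
  <p>Published <time itemprop="datePublished" datetime="2013-01-15">15 January 2013</time></p>
  <div itemprop="articleBody"><p>Post text.</p></div>
  <p>Tags:
    <a rel="tag" href="/tags.html#site-configuration">
      <!-- itemprop on the span, so the property value is the text, not the URL -->
      <span itemprop="keywords">site-configuration</span>
    </a>
  </p>
</article>

<!-- RDFa: vocab/typeof open the item, property names its properties -->
<article vocab="https://schema.org/" typeof="BlogPosting">
  <h1 property="headline">Configuring the site</h1>
  <p>Published <time property="datePublished" datetime="2013-01-15">15 January 2013</time></p>
  <div property="articleBody"><p>Post text.</p></div>
  <!-- The tag is plain text here: rel is itself an RDFa attribute, so mixing
       rel="tag" links into RDFa markup brings extra processing rules that
       this sketch avoids -->
  <p>Tags: <span property="keywords">site-configuration</span></p>
</article>
```

In practice you would pick one of the two syntaxes; they are shown together only to make the parallel visible.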
In the case of Microformats, you'd have to find a Microformat (on the wiki at microformats.org) that suits your needs. If there is none for your case, you could propose a new Microformat (but it would take some time for it to be accepted, if it is accepted at all).
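For a blog post, one candidate is h-entry from microformats2, which marks tags with the p-category class. The sketch below combines it with rel="tag". One caveat, as far as I read the rel-tag spec: the tag name is taken from the last path segment of the link URL, so a fragment-based URL like the one in the question may not yield the tag you expect. The /tags/site-configuration URL here is a hypothetical alternative, and the title, date and text are placeholders.

```html
<!-- h-entry (microformats2): the class names carry the semantics -->
<article class="h-entry">
  <h1 class="p-name">Configuring the site</h1>
  <p>Published <time class="dt-published" datetime="2013-01-15">15 January 2013</time></p>
  <div class="e-content"><p>Post text.</p></div>
  <!-- p-category marks the tag; rel="tag" marks the link as a tag link -->
  <p>Tags:
    <a class="p-category" rel="tag" href="/tags/site-configuration">site-configuration</a>
  </p>
</article>
```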
You could also use XHTML5, if you need/want XML support. If you "only" use the (X)HTML defined in the specification and no additional XML schemas/vocabularies, it won't matter from a semantic perspective if you use HTML(5) or XHTML(5).
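To make that concrete, here is a sketch of a stripped-down post in the XML serialization (XHTML5). The elements and attribute values are the same as in HTML; only the syntax changes (namespace declaration, explicitly closed elements), and the document would have to be served as application/xhtml+xml to be parsed as XML. The title is a placeholder.

```html
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>Configuring the site</title>
    <!-- void elements must be closed explicitly in XML -->
    <meta name="keywords" content="site-configuration" />
  </head>
  <body>
    <article>
      <h1>Configuring the site</h1>
      <p>Tags: <a rel="tag" href="/tags.html#site-configuration">#site-configuration</a></p>
    </article>
  </body>
</html>
```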