I have been wondering which of those formats is "best". Schema.org, Microdata, and RDFa are a bit of a pain to implement. They can break validation and take quite an effort to put into documents.
JSON-LD is, at least for me, a much better way to implement structured data. But does it work? What level of support is there for it (at least by Google)?
Schema.org is a vocabulary that can, like any other vocabulary, be used in many forms. The website http://schema.org/ has examples using Microdata and the RDF syntaxes RDFa and JSON-LD, but these are not the only syntaxes it can be used with. You could, for example, use it with any other RDF syntax like Turtle or RDF/XML.
There is no best syntax. They all have advantages and disadvantages. See for example my answer about differences between Microdata and RDFa. Note that you can use different syntaxes (and vocabularies) in the same document.
Now, if you have a specific consumer in mind, you should consult their documentation. However, support of syntaxes comes and goes, and not everything they might support is necessarily documented, and not everything that is documented necessarily works.
In the case of Google, you are probably interested in their Rich Snippets. Their documentation about Rich Snippets mentions Microdata, Microformats and RDFa. However, note that not all linked examples use the Schema.org vocabulary; some use the older Data-vocabulary.org or Microformats (you can’t use vocabularies like Schema.org or Data-vocabulary.org with Microformats). And there are also some Rich Snippets that aren’t listed on that page, like the Sitelinks Search Box, for which they even recommend the JSON-LD syntax.
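As a rough illustration of what such a JSON-LD blob looks like, here is a minimal Python sketch that builds Sitelinks Search Box style markup from Schema.org's WebSite and SearchAction types. The example.com URLs, the function name, and the search_term_string placeholder are assumptions for illustration; check the consumer's current documentation for the exact properties it expects.

```python
import json

# Hypothetical site and search URLs, used only for illustration.
SITE_URL = "https://www.example.com/"
SEARCH_URL = "https://www.example.com/search?q={search_term_string}"


def build_search_box_markup(site_url: str, search_url: str) -> dict:
    """Return a Schema.org WebSite description with a SearchAction."""
    return {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "url": site_url,
        "potentialAction": {
            "@type": "SearchAction",
            # The {search_term_string} placeholder is replaced by the query.
            "target": search_url,
            "query-input": "required name=search_term_string",
        },
    }


markup = build_search_box_markup(SITE_URL, SEARCH_URL)
print(json.dumps(markup, indent=2))
```

The printed JSON is what would go into the page; nothing in the visible HTML has to change, which is the convenience (and the decoupling) discussed in this answer.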
As general advice: Search engines typically favor visible content over hidden metadata. For example, having keywords as hidden metadata easily allows authors to claim that their documents are about something different than they really are (either because they are trying to trick the search engine, or because they forget to update the content in both places). Therefore, uncoupling the metadata from the content, as is the case with JSON-LD, could (possibly!) lead to the same issues current search engines have with hidden metadata. (Whether, and which, search engines actually handle it like that is a question which is off-topic on Stack Overflow.)
Another possible advantage of coupling the metadata with the content (for example, with RDFa) is that you could easily and automatically generate the same information in JSON-LD, Turtle etc., because everything’s just RDF. Just parse the RDFa, convert it to the formats of your preference, and embed (in script) or link (with rel-alternate) it if it makes sense.
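The embed-or-link step can be sketched as follows. This is a minimal Python sketch that assumes the RDFa has already been parsed and converted to a JSON-LD document by some RDF tool (that step is not shown). The function names, the sample data, and the /about.jsonld URL are hypothetical.

```python
import json
from html import escape


def embed_as_script(data: dict) -> str:
    """Embed a JSON-LD document in the page as a script data block."""
    payload = json.dumps(data)
    # Escape "<" so the payload can never contain a literal "</script>".
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


def link_as_alternate(href: str) -> str:
    """Point to a separate JSON-LD file instead of embedding it."""
    safe_href = escape(href, quote=True)
    return f'<link rel="alternate" type="application/ld+json" href="{safe_href}">'


# Hypothetical data, standing in for the converted RDFa.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Alice Example",
}

print(embed_as_script(person))
print(link_as_alternate("/about.jsonld"))
```

Embedding keeps everything in one response, while linking keeps the page smaller; which one makes sense depends on the consumer you have in mind.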
But yes, adding RDFa is often more complex than adding a JSON-LD blob, because you have to adapt it to the existing markup. (However, it should not "break validation" unless you’re making mistakes.)
The lines between Microdata, RDFa, and JSON-LD are indeed currently very blurry, and there is still no widely accepted de facto standard among the three. This will have to wait for now, perhaps a couple of years or more.
Meanwhile, Microdata should not be labeled with Schema.org like you mentioned because those two are different things. Schema.org is a vocabulary so it can be used for Microdata, RDFa, and JSON-LD.
Using Schema.org as the vocabulary and using JSON-LD as the data representation is probably the most anticipated pair because of two common aspects about them:
- Easy to read for humans; and
- Lightweight and machine-readable
but even so, there are still disconnects between the two, as in this example.
Regarding JSON-LD support: since Bing, Google, Yahoo!, and Yandex all acknowledge the use of schema.org, it is perhaps safe to say they also support it, like in this example.
2017 Update
Google has been very proactive in promoting JSON-LD with schema.org over the past two or three years.
It seems Google is leaning towards the use of JSON-LD but it hasn't implemented it for every use-case!
Google is in the process of adding JSON-LD support to more
markup-powered features. So far, JSON-LD is supported for all
Knowledge Graph features, sitelink search boxes, Event Rich Snippets,
and Recipe Rich Snippets; Google recommends the use of JSON-LD for
those features. For the remaining Rich Snippets types and breadcrumbs,
Google recommends the use of microdata or RDFa.
http://developers.google.com/structured-data/schema-org
Google uses JSON-LD as reference examples for Structured Data SEO for their Knowledge Graph (companies and people).
See https://developers.google.com/structured-data/customize/overview
I personally use a combination of JSON-LD and Microdata for my sites (for the time being).
I would say they have other means to identify if the information you provide through JSON-LD is relevant to their search engine (like checking your page is actually talking about what it claims to talk about).
(updating answers!)
About "popularity", please see this question/answers.
Microdata is the most popular today: in a universe of 34 million domains, 5.63 million (~17%) use "content markup" (I will use the jargon markup) via RDFa (0.9 million), Microdata (2.5 million) or Microformats, and less than half as many use separated semantic descriptors, the most popular being JSON-LD, with 2.12 million (6%).
PS: we prefer "per-domain statistics" (instead of per-page statistics) because pages in the same domain generally share the same templates and other local-authority convention enforcements.
In a universe of "domains expressing semantics" (7.75 million) the statistic profile is:
- 73% markup semantic
- 27% separated semantic
- (... intersection as mix "separated+markup" can be zero to simplify...)
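The percentages above can be re-derived from the quoted counts. The sketch below is only a sanity check of that arithmetic, under the same simplifying assumption as the list (no overlap between the markup and separated groups) and taking the 2.12 million JSON-LD domains as the whole separated group. The variable and function names are mine.

```python
# Per-domain counts quoted above, in millions of domains.
total_domains = 34.0
markup_domains = 5.63     # RDFa, Microdata or Microformats
separated_domains = 2.12  # JSON-LD

# Assumes zero overlap between the two groups, as the list above does.
semantic_domains = markup_domains + separated_domains


def share(part: float, whole: float) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


print(f"domains expressing semantics: {semantic_domains:.2f} million")
print(f"markup, of all domains:       {share(markup_domains, total_domains):.1f}%")
print(f"separated, of all domains:    {share(separated_domains, total_domains):.1f}%")
print(f"markup, of semantic domains:    {share(markup_domains, semantic_domains):.1f}%")
print(f"separated, of semantic domains: {share(separated_domains, semantic_domains):.1f}%")
```

Rounded to whole percentages, these shares match the 17%, 6%, 73% and 27% quoted in the text.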
Rule of thumb in 2017
Use markup semantic with Microdata and, after that, if you need to express something more to machines, use JSON-LD.
Use markup semantic because it is the most popular, and because marked-up content will be verifiable/auditable simultaneously by humans and machines.
Important: remember that Microdata, RDFa (a W3C standard) and JSON-LD (a W3C standard) can be (easily) translated to RDF, so all these formats are compatible.
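To make "translated to RDF" concrete, here is a toy Python sketch that turns one flat JSON-LD node into (subject, predicate, object) triples. It is not the W3C JSON-LD to RDF algorithm: it assumes every term belongs to the http://schema.org/ vocabulary, handles only "@id", "@type" and plain string values, and ignores real "@context" processing. The names and the sample data are hypothetical; a real pipeline would use an RDF library.

```python
from typing import Iterator, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOCAB = "http://schema.org/"  # assumed vocabulary for every term


def flat_node_to_triples(node: dict) -> Iterator[Triple]:
    """Yield (subject, predicate, object) triples for a flat JSON-LD node.

    Toy illustration only: no nesting, no lists, no language tags or datatypes.
    """
    subject = node["@id"]
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue  # these carry no triple of their own in this toy mapping
        if key == "@type":
            yield (subject, RDF_TYPE, VOCAB + value)
        else:
            yield (subject, VOCAB + key, value)


# Hypothetical sample node.
person = {
    "@context": "http://schema.org/",
    "@id": "https://example.com/#me",
    "@type": "Person",
    "name": "Alice Example",
}

for triple in flat_node_to_triples(person):
    print(triple)
```

The same two triples could equally have come from Microdata or RDFa attributes on the visible name, which is why the formats can coexist.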
PS: for HTML tables see also W3C's tabular-metadata. For open non-HTML resources, such as CSV files, use the RDF-compatible W3C tabular-data-model and/or frictionlessdata/specs.
From scratch, JSON-LD would be the way to go. Let's let one of the primary creators of JSON-LD, Manu Sporny, weigh in:
The desire for better Web APIs is what motivated the creation of
JSON-LD, not the Semantic Web. If you want to make the Semantic Web a
reality, stop making the case for it and spend your time doing
something more useful, like actually making machines smarter or
helping people publish data in a way that’s useful to them.
JSON-LD is all about publishing the data in ways that are useful/easy to implement because...
it’s based on technology that most web developers use today.