dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
mediatype := [ type "/" subtype ] *( ";" parameter )
data := *urlchar
parameter := attribute "=" value
value := token / quoted-string
According to this BNF from the RFCs, the comma that separates the data from the MIME type can also appear inside the MIME type (in a quoted-string parameter value) and inside the data, so there is no simple way (e.g. a regular expression) to break the URI into its parts. A full parser is needed.
Does anyone know of a data URI library for Java? My Google search didn't turn up anything.
There is a Java data URI parser implementation available on GitHub called jDataUri.
Disclaimer: I am the author
I ended up implementing my own parser. The RFCs provide BNFs, so it is possible to implement a full lexer and syntax analyser. For this simple case, however, I just used a simple scan plus a stack mechanism to track the quoted strings and locate the separating comma. javax.activation's MimeType is used for the actual MIME parsing.
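For anyone who wants to see the idea, here is a minimal sketch of that kind of scan. It is not my actual code: the class and method names (`DataUriSplitter`, `findSeparator`, `split`) are made up for illustration, and it uses a single in-quotes flag instead of a stack, which should be enough as long as quoted strings don't nest. It only locates the separating comma and leaves the header (MIME type, parameters and any `;base64` marker) unparsed, so you would still hand that part to something like MimeType.

```java
final class DataUriSplitter {

    /**
     * Returns the index of the first comma that is not inside a
     * quoted-string, or -1 if there is none.
     */
    static int findSeparator(String uri) {
        boolean inQuotes = false;
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (inQuotes) {
                if (c == '\\') {
                    i++;              // quoted-pair: skip the escaped character
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;      // opening quote
            } else if (c == ',') {
                return i;             // first comma outside quotes
            }
        }
        return -1;
    }

    /**
     * Splits a data URI into { header, data }, where header is everything
     * between "data:" and the separating comma (still unparsed).
     */
    static String[] split(String uri) {
        if (!uri.regionMatches(true, 0, "data:", 0, 5)) {
            throw new IllegalArgumentException("Not a data URI: " + uri);
        }
        int comma = findSeparator(uri);
        if (comma < 0) {
            throw new IllegalArgumentException("No separating comma: " + uri);
        }
        return new String[] { uri.substring(5, comma), uri.substring(comma + 1) };
    }
}
```

With this sketch, `DataUriSplitter.split("data:text/plain;note=\"a,b\";base64,SGk=")` gives `text/plain;note="a,b";base64` as the header and `SGk=` as the data; the comma inside the quoted parameter value is skipped. Commas that appear later in the data don't matter, since only the first unquoted comma is treated as the separator.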