Correct use of Apache Tika MediaType

2019-02-20 23:26发布

问题:

I want to use APache Tika's MediaType class to compare mediaTypes.

I first use Tika to detect the MediaType. Then I want to start an action according to the MediaType.

So if the MediaType is from type XML I want to do some action, if it is a compressed file I want to start an other action.

My problem is that there are many XML types, so how do I check if it is an XML using the MediaType ?

Here is my previous (before Tika) implementation:

if (contentType.contains("text/xml") || 
    contentType.contains("application/xml") || 
    contentType.contains("application/x-xml") || 
    contentType.contains("application/atom+xml") || 
    contentType.contains("application/rss+xml")) {
        processXML();
}

else if (contentType.contains("application/gzip") || 
    contentType.contains("application/x-gzip") || 
    contentType.contains("application/x-gunzip") || 
    contentType.contains("application/gzipped") || 
    contentType.contains("application/gzip-compressed") || 
    contentType.contains("application/x-compress") || 
    contentType.contains("gzip/document") || 
    contentType.contains("application/octet-stream")) {
        processGzip();
}

I want to switch it to use Tika something like the following:

MediaType mediaType = MediaType.parse(contentType);
if (mediaType == APPLICATION_XML) {
    return processXml();
} else if (mediaType == APPLICATION_ZIP || mediaType == OCTET_STREAM) {
    return processGzip();
}

But the problem is that Tika.detect(...) returns many different types which don't have a MediaType constant.

How can I just identify the MediaType if it is type XML ? Or if it is type Compress ? I need a "Father" type which includes all of it's childs, maybe a method which is: "boolean isXML()" which includes application/xml and text/xml and application/x-xml or "boolean isCompress()" which includes all of the zip + gzip types etc

回答1:

What you'll need to do is walk the types hierarchy, until you either find what you want, or run out of things to check. That can be done with recursion, or could be done with a loop

The key method you need is MediaTypeRegistry.getSupertype(MediaType)

Your code would want to be something like:

// Define your media type constants here
MediaType FOO = MediaType.parse("application/foo");

// Work out the file's type
MediaType type = detector.detect(stream, metadata);

// Is it one we want in the tree?
while (type != null && !type.equals(MediaType.OCTET_STREAM)) {
   if (type.equals(MediaType.Application_XML)) {
       doThingForXML();
   } else if (type.equals(MediaType.APPLICATION_ZIP)) { 
       doThingForZip();
   } else if (type.equals(FOO)) {
       doThingForFoo();
   } else {
       // Check parent
       type = registry.getSuperType(type);
   }
}