I want to define a dedicated MIME type, text/txt, for *.txt files, so that Tika can apply a more specific parser than the one used for text/plain files.
The glob *.txt is included in the definition of the type text/plain in tika-mimetypes.xml. Moreover, it seems to me that you cannot redefine a MIME type in custom-mimetypes.xml, only add new globs or magic patterns. Additionally, if I define the text/txt type in tika-mimetypes.xml as a subtype of text/plain with only the glob *.txt, Tika still detects a txt file as text/plain.
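For reference, a subtype definition of the kind described above would look roughly like the following. This is a sketch reconstructed from the description, not a copy of the actual entry; it assumes the usual Tika mimetypes layout (a mime-info root containing mime-type elements), and text/txt is the custom type name from this question, not a registered type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <!-- Hypothetical custom type: plain text files that use the .txt extension -->
  <mime-type type="text/txt">
    <!-- Declares text/txt as a specialization of text/plain -->
    <sub-class-of type="text/plain"/>
    <!-- The only detection rule: match on the file name -->
    <glob pattern="*.txt"/>
  </mime-type>
</mime-info>
```

Note that the stock text/plain entry keeps its own *.txt glob, so the same pattern ends up claimed by both types.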
Is it absurd to define a subtype of text/plain only for txt files? If not, is it possible to define it only with custom-mimetypes.xml? If not, what is the easiest way to extend Tika so that it can parse txt files differently than (let's say) STEP 3D CAD .stp files or .cfg files?
The use case in detail: I have a large source of data composed of (recursive) archives. Some plain text files are huge and I don't want Tika to parse them. However, I want to keep all the txt files.
Edit: to clarify, I don't want to keep .cfg files either (*.cfg is also a glob of text/plain).