Subject: [jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect


Nick Burch commented on TIKA-1946:

Ideally different file formats would have different mimetypes, but when there's
a well-known mimetype shared by several file types which are logically similar
but format-wise different, there's not much we can do...

In the past, we have added our own tika-custom parent types, to group together
a handful of similar well-known ones, but normally we have to go for
type/format suffixes on well-known ones to clarify/differentiate when formats
actually differ.

Maybe we should add something along the lines of {{EncryptedDocumentException}}
for parsers to throw when they can't handle a specific file. I think for now we
might have a few places where they just throw a plain Tika exception.
Perhaps {{UnsupportedFormatException}}? That would also make it easier to
catch and retry with another parser, especially if we get the proper fallback
/ multiple parser stuff going in 2.x.
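
As a rough sketch of the catch-and-retry idea, assuming nothing about how it
would really be wired into Tika: {{UnsupportedFormatException}} here is the
hypothetical exception proposed above (in Tika itself it would presumably
extend {{TikaException}}), and {{SimpleParser}} and {{parseWithFallback}} are
made-up names for illustration only, not existing Tika APIs.

```java
import java.util.Arrays;
import java.util.List;

public class FallbackSketch {

    /** Hypothetical exception; not an existing Tika class in this sketch. */
    static class UnsupportedFormatException extends Exception {
        UnsupportedFormatException(String message) {
            super(message);
        }
    }

    /** Hypothetical, heavily simplified stand-in for a parser: bytes in, text out. */
    interface SimpleParser {
        String parse(byte[] data) throws UnsupportedFormatException;
    }

    /** Tries each parser in order, moving on only when one reports an unsupported format. */
    static String parseWithFallback(List<SimpleParser> parsers, byte[] data)
            throws UnsupportedFormatException {
        UnsupportedFormatException last = null;
        for (SimpleParser parser : parsers) {
            try {
                return parser.parse(data);
            } catch (UnsupportedFormatException e) {
                last = e; // remember why this parser declined, then try the next one
            }
        }
        if (last != null) {
            throw last;
        }
        throw new UnsupportedFormatException("no parsers configured");
    }

    public static void main(String[] args) throws Exception {
        // Toy parsers that look only at the first byte, purely for demonstration.
        SimpleParser first = data -> {
            if (data.length > 0 && data[0] == 5) {
                return "handled by first parser";
            }
            throw new UnsupportedFormatException("first parser cannot handle this file");
        };
        SimpleParser second = data -> {
            if (data.length > 0 && data[0] == 6) {
                return "handled by second parser";
            }
            throw new UnsupportedFormatException("second parser cannot handle this file");
        };

        System.out.println(parseWithFallback(Arrays.asList(first, second), new byte[] {6}));
    }
}
```

The point of a dedicated exception type is that the loop can tell "this parser
does not handle this variant" apart from a genuine failure, which a plain
exception would not let it do.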

For now, adding formats/types to the mimetypes and having the parser claim just
those seems a good first step, though.
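
To illustrate what "claim just those" could mean, here is a minimal sketch.
The {{ParserInfo}} class is invented for this example, and the version
parameter on the type names is only an assumed naming scheme, not necessarily
what would end up in the Tika mimetypes file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ClaimedTypesSketch {

    /** Hypothetical parser descriptor: a name plus the exact types it claims. */
    static class ParserInfo {
        final String name;
        final Set<String> supportedTypes;

        ParserInfo(String name, String... types) {
            this.name = name;
            this.supportedTypes = new HashSet<>(Arrays.asList(types));
        }

        /** True only if the detected type is one this parser explicitly claims. */
        boolean claims(String detectedType) {
            return supportedTypes.contains(detectedType);
        }
    }

    public static void main(String[] args) {
        // Assumed naming: the shared well-known type plus a version parameter.
        ParserInfo wp6 = new ParserInfo(
                "WordPerfect 6.x parser",
                "application/vnd.wordperfect; version=6.x");

        // The parser claims the variant it understands...
        System.out.println(wp6.claims("application/vnd.wordperfect; version=6.x"));
        // ...and neither another variant nor the bare shared type.
        System.out.println(wp6.claims("application/vnd.wordperfect; version=5.1"));
        System.out.println(wp6.claims("application/vnd.wordperfect"));
    }
}
```

With the types split this way, a file detected as a variant the parser does
not claim simply never reaches it, so no exception is needed for that case.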

> Add mime detection and parser for WordPerfect
> ---------------------------------------------
> Key: TIKA-1946
> URL:
> Project: Tika
> Issue Type: Improvement
> Components: mime, parser<...

> Reporter: Nick C
> Fix For: 2.0, 1.15
> Attachments:
> I noticed some code on github for parsing WordPerfect files
> ( Also looks like the author
> [~pascal.essiembre] has contributed to Tika before

This message was sent by Atlassian JIRA
