Subject: [jira] [Commented] (TIKA-1946) Add mime detection
and parser for WordPerfect




[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768945#comment-15768945
]

Nick Burch commented on TIKA-1946:
----------------------------------

Ideally different file formats would have different mimetypes, but when there's
a well-known mimetype shared by several file types which are logically similar
but format-wise different, there's not much we can do...

In the past, we have added our own tika-custom parent types to group together
a handful of similar well-known ones, but normally we have to go for
type/format suffixes on the well-known ones to clarify/differentiate when the
formats actually differ.

Maybe we should add something along the lines of {{EncryptedDocumentException}}
for parsers to throw when they can't handle a specific file. I think for now we
might have a few places where they just throw a straight Tika exception.
Perhaps {{UnsupportedFormatException}}? That would also make it easier to
catch and re-try with another parser, especially if we get the proper fallback
/ multiple parser stuff going on 2.x.
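
A rough sketch of how that catch-and-retry could look, purely for illustration.
Everything here is hypothetical: {{UnsupportedFormatException}} does not exist
yet, the nested {{TikaException}} is only a stand-in for the real one, the
{{SimpleParser}} interface is far simpler than the real Parser API, and the
"version byte" is a made-up marker rather than the actual WordPerfect header
layout.

```java
import java.util.Arrays;
import java.util.List;

public class FallbackSketch {

    /** Stand-in for Tika's base exception, so the sketch compiles on its own. */
    static class TikaException extends Exception {
        TikaException(String msg) { super(msg); }
    }

    /** Hypothetical: the parser knows the type but cannot handle this format variant. */
    static class UnsupportedFormatException extends TikaException {
        UnsupportedFormatException(String msg) { super(msg); }
    }

    /** Hypothetical, much-simplified parser interface. */
    interface SimpleParser {
        String parse(byte[] data) throws TikaException;
    }

    /** Builds a toy parser that only accepts data whose first byte equals the given version. */
    static SimpleParser versionParser(int version) {
        return data -> {
            if (data.length == 0 || data[0] != version) {
                throw new UnsupportedFormatException("not a v" + version + " file");
            }
            return "parsed as v" + version;
        };
    }

    /**
     * Tries each parser in order. Only UnsupportedFormatException triggers a
     * retry with the next parser; any other TikaException propagates at once.
     */
    static String parseWithFallback(List<SimpleParser> parsers, byte[] data)
            throws TikaException {
        UnsupportedFormatException last = null;
        for (SimpleParser parser : parsers) {
            try {
                return parser.parse(data);
            } catch (UnsupportedFormatException e) {
                last = e; // remember the failure and move on to the next parser
            }
        }
        if (last != null) {
            throw last; // every parser declined the file
        }
        throw new TikaException("no parsers configured");
    }

    public static void main(String[] args) throws TikaException {
        List<SimpleParser> parsers = Arrays.asList(versionParser(6), versionParser(5));
        // The v6 parser declines, so the v5 parser gets a turn.
        System.out.println(parseWithFallback(parsers, new byte[] {5}));
    }
}
```

The point of a dedicated subtype is that the caller can tell "wrong format
variant, try someone else" apart from a genuine parse failure, which a straight
Tika exception doesn't allow.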

For now, adding formats/types to the mimetypes and having the parser claim just
those seems a good first step though.
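
As a minimal sketch of that first step, assuming we go with a version parameter
on the shared well-known type: the parameter values below are made up for
illustration, and the plain string set is only a stand-in for however the
parser really declares its supported types.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MimeClaimSketch {

    // Hypothetical type strings: the shared well-known type plus a format parameter.
    static final String WP5 = "application/vnd.wordperfect; version=5.1";
    static final String WP6 = "application/vnd.wordperfect; version=6.x";

    // The parser claims only the specific variant it can actually handle.
    static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(WP6));

    /** Exact string match only; real code would compare normalised media types. */
    static boolean claims(String detectedType) {
        return SUPPORTED.contains(detectedType);
    }

    public static void main(String[] args) {
        System.out.println(claims(WP6));                           // true
        System.out.println(claims(WP5));                           // false
        System.out.println(claims("application/vnd.wordperfect")); // false
    }
}
```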

> Add mime detection and parser for WordPerfect
> ---------------------------------------------
>
> Key: TIKA-1946
> URL: https://issues.apache.org/jira/browse/TIKA-1946
> Project: Tika
> Issue Type: Improvement
> Components: mime, parser
> Reporter: Nick C
> Fix For: 2.0, 1.15
>
> Attachments: wordperfect_mimes_fuller.zip
>
>
> I noticed some code on github for parsing WordPerfect files
> (https://github.com/Norconex/importer) Also looks like the author
> [~pascal.essiembre] has contributed to Tika before



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


