Maybe this could be done with processGzippedXML() from Crawler-Commons? Is that possible?
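If neither the parse-zip plugin nor processGzippedXML() is usable, one possible workaround is to decompress the sitemap before parsing it. The following is only a sketch under stated assumptions, not Nutch or crawler-commons code. It uses just the JDK's java.util.zip.GZIPInputStream, detects gzip by its magic bytes (0x1f 0x8b), and caps the decompressed size at an assumed 50 MB, the uncompressed limit in the sitemaps.org protocol. The class GunzipSitemap and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gunzip a .xml.gz sitemap in memory with the JDK only,
// so the plain XML can be passed to any sitemap/XML parser. This is not
// Nutch's parse-zip plugin and not crawler-commons' processGzippedXML().
final class GunzipSitemap {

    // Assumed cap on decompressed size (50 MB, the sitemaps.org limit);
    // it also guards against gzip bombs.
    static final int MAX_DECOMPRESSED_BYTES = 50 * 1024 * 1024;

    private GunzipSitemap() {}

    // True if the data starts with the gzip magic bytes 0x1f 0x8b.
    static boolean looksGzipped(byte[] data) {
        return data != null && data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    // Decompresses gzip bytes; fails if the output grows past the cap.
    static byte[] gunzip(byte[] gz) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                if (out.size() > MAX_DECOMPRESSED_BYTES) {
                    throw new IOException("decompressed sitemap exceeds size cap");
                }
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns plain XML bytes whether or not the fetched content was gzipped.
    static byte[] toPlainXml(byte[] content) {
        return looksGzipped(content) ? gunzip(content) : content;
    }

    // Compresses bytes with gzip; included only to build example input.
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

Fetched sitemap content could be run through toPlainXml() before it reaches the XML parser; checking the magic bytes rather than the .gz extension also covers servers that send gzipped sitemaps under other names. How to hook this into the Nutch 2.x parse pipeline is not shown and would need to be worked out separately.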
On 08/01/2017 05:21 PM, Michael Chen wrote:
I was trying to parse .xml.gz sitemaps with Nutch 2.x, but couldn't
build the parse-zip plugin. The parse-ext, parse-swf and feed plugins also
failed to build. This seems to be a known issue (NUTCH-874) and is marked for
Is there a workaround to parse gzipped files? Is anyone actively working
on porting these plugins?