Subject: Re: Regarding checksum error in Hadoop in my latest PR

Hi Omkar,

The test fails because nothing is generated, i.e. the test detected broken
functionality. :)
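Some background on the warning in your log: on the local filesystem, Hadoop's
ChecksumFileSystem keeps a hidden checksum file named .<name>.crc next to each
data file, and a missing or empty data file produces exactly this kind of
"Problem opening checksum file" warning. A quick illustration in a throwaway
directory (the paths are made up, just mimicking your failing run):

```shell
# Recreate the file layout Hadoop's ChecksumFileSystem uses on local disk:
# each data file has a hidden sibling checksum file named .<file>.crc.
d=$(mktemp -d)
touch "$d/part-r-00000"        # zero-length data file, as in the failing run
touch "$d/.part-r-00000.crc"   # the sibling checksum file
ls -A "$d"
```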

It's best to test the generator from the command line to see whether it
behaves as expected, e.g. (with a non-empty CrawlDb):

$ bin/nutch generate path/to/crawldb path/to/segments/
Generator: segment: path/to/segments/20170811152636
Generator: finished at 2017-08-11 15:26:37, elapsed: 00:00:03

$ tree path/to/segments/
`-- 20170811152636
    `-- crawl_generate

The folder crawl_generate is empty! The Generator is complex, with multiple
steps working in temporary folders. Possibly it's only the final copying step
that is broken. But I see no other way than to debug the fetch list generation
to find out what the reason is.
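To make the breakage easy to spot, a small check like the following flags a
segment whose crawl_generate holds no non-empty part files (the segment path
is hypothetical; here the empty layout is recreated in a temp dir):

```shell
# Reproduce the empty crawl_generate layout, then detect it.
seg=$(mktemp -d)/20170811152636        # stand-in for path/to/segments/<id>
mkdir -p "$seg/crawl_generate"         # empty, as in the failing test
if [ -z "$(find "$seg/crawl_generate" -type f -size +0c)" ]; then
  echo "EMPTY: no fetch list written under $seg"
fi
```

The same check against a healthy segment should print nothing, since
crawl_generate would then contain non-empty part files.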

I strongly recommend testing all the other tools from the command line as
well. For the bulk of them, just run a sample crawl via bin/crawl.


On 08/09/2017 11:56 AM, Omkar Reddy wrote:
> Hello dev@,
> I am facing an EOFException in the file and I cannot figure out how to
> solve it. The exception is as follows:
> 2017-08-09 12:57:06,026 WARN fs.FSInputChecker (<init>(157)) - Problem
> opening checksum file:
> file:/tmp/hadoop-omreddy/mapred/temp/generate-temp-16af5bc1-1a80-412b-b0ca-481c82877f3b/fetchlist-0/part-r-00000.
> Ignoring exception:
> I cannot understand the reason for it. This PR[1] is part of an effort to
> upgrade Nutch to use the new MapReduce API.
> Please find the detailed log of the test here[0]. Any suggestions/help would
> be appreciated.
> Thanks,
> Omkar.
> [0]
> [1]
