
hadoopInputFormat and elasticsearch


I want to write a batch job that reads data from *elasticsearch* using
*elasticsearch-hadoop* (
and *hadoopInputFormat*

example code (from

elasticsearch-hadoop creates one Hadoop InputSplit (task) per Elasticsearch
shard, so if my index has 20 shards, it will be split into 20 InputSplits.

/My question is:/
What will happen if my job restarts (failover) after finishing half of the
InputSplits?
Does hadoopInputFormat remember which InputSplits are finished and know how
to continue from where it stopped (perhaps by re-reading the unfinished
InputSplits from the beginning), or does it start over from the beginning?
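
To make the two possibilities concrete, here is a toy model in plain Python. It is only a sketch of the question: it does not use, and says nothing about, the real elasticsearch-hadoop or hadoopInputFormat APIs. The names make_splits and run_job and the split labels are hypothetical.

```python
# Toy model only: none of these names come from elasticsearch-hadoop
# or hadoopInputFormat. It illustrates the question, not the answer.

def make_splits(num_shards):
    """Create one (hypothetical) split label per shard."""
    return [f"split-{shard}" for shard in range(num_shards)]


def run_job(splits, already_finished=frozenset(), fail_after=None):
    """Process splits in order, skipping those already finished.

    If fail_after is given, simulate a failure once that many splits
    have been processed in this attempt.
    Returns (all finished splits, splits processed in this attempt).
    """
    finished = set(already_finished)
    processed = []
    for split in splits:
        if split in finished:
            continue  # remembered from an earlier attempt
        if fail_after is not None and len(processed) == fail_after:
            break  # simulated failure
        processed.append(split)
        finished.add(split)
    return finished, processed


splits = make_splits(20)

# First attempt fails after 10 of the 20 splits.
finished, first_attempt = run_job(splits, fail_after=10)

# Possibility A: the restart remembers which splits are finished.
_, resumed = run_job(splits, already_finished=finished)

# Possibility B: the restart remembers nothing and reads everything again.
_, restarted = run_job(splits)

print(len(first_attempt), len(resumed), len(restarted))  # prints "10 10 20"
```

In possibility A only the 10 unfinished splits are read after the restart; in possibility B all 20 are read again, so the first 10 are read twice. Which of these hadoopInputFormat does is exactly what I am asking.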

