Re: Druid Heavy Data Load performs only partial load


Hi Kiran,

I guess it's because you set rollup to true, so some rows were rolled up
during ingestion. Would you validate the data by checking some aggregation
query results?
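
For example, a quick check could be a native timeseries query like the sketch
below (not from the original message; the datasource name and interval are
taken from the spec quoted underneath, and the query-time "count" aggregator
counts Druid rows, i.e. rows after rollup):

  {
    "queryType" : "timeseries",
    "dataSource" : "data_1_million",
    "granularity" : "all",
    "intervals" : ["2018-08-01/2018-10-19"],
    "aggregations" : [
      { "type" : "count", "name" : "druid_rows" }
    ]
  }

POSTed to the Broker (http://localhost:8082/druid/v2 on a default single-node
setup), a result noticeably smaller than the number of input rows would point
to rollup rather than to dropped data.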

Jihoon

On Fri, Oct 19, 2018 at 4:03 AM Kiran Jagtap <kiranja@xxxxxxxxxx> wrote:

> Hi Team,
> Thank you so much & I appreciate your help & support.
> I'm facing an issue loading heavy data into a Druid single-node setup:
> the data load job is successful, but only partial data gets loaded.
>
> Machine config: Ubuntu Linux 16.04 LTS, 4 CPUs, 16 GB RAM, 500 GB disk space
>
> Data CSV file size: 290 MB
> Total rows: 1 million
> Total columns: 21
> Total dimensions: 21
>
> Druid data ingestion config:
>
> {
>   "type" : "index",
>   "spec" : {
>     "ioConfig" : {
>       "type" : "index",
>       "firehose" : {
>         "type" : "local",
>         "baseDir" : "data/",
>         "filter" : "0_1000000.csv"
>       },
>       "appendToExisting" : false
>     },
>     "dataSchema" : {
>       "dataSource" : "data_1_million",
>       "granularitySpec" : {
>         "type" : "uniform",
>         "segmentGranularity" : "day",
>         "queryGranularity" : "day",
>         "intervals" : ["2018-08-01/2018-10-19"],
>         "rollup" : true
>       },
>       "parser" : {
>         "type" : "string",
>         "parseSpec" : {
>           "format" : "csv",
>           "hasHeaderRow" : true,
>           "dimensionsSpec" : {
>             "dimensions" : [
>               "col_1",
>               "col_2",
>               "col_3",
>               "col_4",
>               "col_5",
>               "col_6",
>               "col_7",
>               "col_8",
>               "col_9",
>               "col_10",
>               "col_11",
>               "col_12",
>               "col_13",
>               "col_14",
>               "col_15",
>               "col_16",
>               "col_17",
>               "date_col_18",
>               "date_col_19",
>               "col_20",
>               "col_21"
>             ]
>           },
>           "timestampSpec" : {
>             "column" : "date_col_19"
>           }
>         }
>       },
>       "metricsSpec" : []
>     },
>     "tuningConfig" : {
>       "type" : "index",
>       "targetPartitionSize" : 50000000,
>       "maxRowsInMemory" : 10000000,
>       "forceExtendableShardSpecs" : true
>     }
>   }
> }
> The Druid data ingestion job is successful, but it only loads 0.7 million
> rows out of 1 million rows.
> Your help & pointers to solve this issue would be highly appreciated.
>
> Thanks & Sincerely,
> Kiran Jagtap
>
> "Legal Disclaimer: This electronic message and all contents contain
> information from Cybage Software Private Limited which may be privileged,
> confidential, or otherwise protected from disclosure. The information is
> intended to be for the addressee(s) only. If you are not an addressee, any
> disclosure, copy, distribution, or use of the contents of this message is
> strictly prohibited. If you have received this electronic message in error
> please notify the sender by reply e-mail to and destroy the original
> message and all copies. Cybage has taken every reasonable precaution to
> minimize the risk of malicious content in the mail, but is not liable for
> any damage you may sustain as a result of any malicious content in this
> e-mail. You should carry out your own malicious content checks before
> opening the e-mail or attachment." www.cybage.com
>
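
As a footnote to the spec quoted above (a sketch, not something suggested in
the thread itself): with "rollup" : true, "queryGranularity" : "day", and an
empty metricsSpec, input rows that share the same day-truncated timestamp and
identical values in all 21 dimensions are merged into a single Druid row, so
0.7 million stored rows can still account for all 1 million input rows. One
way to keep the original count recoverable is to ingest a count metric, e.g.:

  "metricsSpec" : [
    { "type" : "count", "name" : "count" }
  ]

Summing that metric at query time (a longSum over "count") then returns the
number of ingested rows, while a plain row count returns the number of
rolled-up rows. Alternatively, "rollup" : false in the granularitySpec keeps
one Druid row per input row.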