
Druid Heavy Data Load perform only partial load


Hi Team,
Thank you in advance; I appreciate your help and support.
I'm running into an issue loading a large data set into a single-node Druid setup: the ingestion job succeeds, but only part of the data actually gets loaded.

Machine config : Linux-ubuntu 16.04 LTS, 4 CPU, 16GB RAM, 500 GB disk space

Data csv file size : 290MB
Total rows : 1 million
Total columns : 21
Total dimensions : 21

Druid data ingestion config :

{
  "type" : "index",
  "spec" : {
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "data/",
        "filter" : "0_1000000.csv"
      },
      "appendToExisting" : false
    },
    "dataSchema" : {
      "dataSource" : "data_1_million",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "day",
        "intervals" : ["2018-08-01/2018-10-19"],
        "rollup" : true
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "csv",
          "hasHeaderRow" : true,
          "dimensionsSpec" : {
            "dimensions" : [
              "col_1",
              "col_2",
              "col_3",
              "col_4",
              "col_5",
              "col_6",
              "col_7",
              "col_8",
              "col_9",
              "col_10",
              "col_11",
              "col_12",
              "col_13",
              "col_14",
              "col_15",
              "col_16",
              "col_17",
              "date_col_18",
              "date_col_19",
              "col_20",
              "col_21"
            ]
          },
          "timestampSpec" : {
            "column" : "date_col_19"
          }
        }
      },
      "metricsSpec" : []
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 50000000,
      "maxRowsInMemory" : 10000000,
      "forceExtendableShardSpecs" : true
    }
  }
}
The ingestion job reports success, but only about 0.7 million of the 1 million rows end up in the datasource.
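One thing I considered: Druid silently drops rows whose timestamp fails to parse or falls outside the `intervals` configured in the granularitySpec (here `2018-08-01/2018-10-19`), and with `rollup` enabled plus `day` queryGranularity, rows sharing identical dimensions within a day are merged. To rule out the first cause, a quick local check of the CSV can count how many rows actually fall inside the interval. A minimal sketch (the timestamp format `%Y-%m-%d %H:%M:%S` is an assumption; adjust it to match the real data):

```python
import csv
import io
from datetime import datetime

# Interval from the ingestion spec: "2018-08-01/2018-10-19" (end exclusive)
START = datetime(2018, 8, 1)
END = datetime(2018, 10, 19)

def count_in_interval(csv_text, ts_column="date_col_19"):
    """Count total data rows vs. rows whose timestamp parses and falls
    inside the configured interval; Druid drops the rest silently."""
    total = in_interval = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        total += 1
        try:
            # Assumed timestamp format; change to match the actual column.
            ts = datetime.strptime(row[ts_column], "%Y-%m-%d %H:%M:%S")
        except (KeyError, ValueError):
            continue  # unparseable timestamp -> would be dropped
        if START <= ts < END:
            in_interval += 1
    return total, in_interval

# Tiny synthetic sample: two rows inside the interval, one outside.
sample = (
    "col_1,date_col_19\n"
    "a,2018-08-15 10:00:00\n"
    "b,2018-09-01 00:00:00\n"
    "c,2018-11-01 00:00:00\n"
)
print(count_in_interval(sample))  # (3, 2)
```

If the in-interval count matches the 0.7 million figure, the "missing" rows were filtered by the interval rather than lost during ingestion; if not, rollup merging is the next suspect.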
Any help or pointers to solve this issue would be highly appreciated.

Thanks & Sincerely,
Kiran Jagtap
