Re: Large CSV files with headers


> The tokenize language has a skipFirst you can use to skip the header line

OK. But it would be nice if I could make use of the header line (and not
just throw it away).
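
For reference, a minimal sketch of what the skipFirst suggestion might look like when applied to the route quoted further down. The skipFirst attribute on <tokenize> is taken from the linked tokenize-language documentation; the route id is made up for illustration, and the fragment is untested. Because the header line is discarded before unmarshalling, useMaps has no keys to work with, so it is left out here:

```xml
<route id="CSVRouteSkipHeader">
    <from uri="file:/tmp/data/" />
    <split streaming="true" parallelProcessing="true">
        <!-- skipFirst drops the first token, i.e. the header line -->
        <tokenize token="\n" skipFirst="true" />
        <unmarshal>
            <!-- no header is available per line, so rows come back as lists, not maps -->
            <csv delimiter=";" />
        </unmarshal>
        <log message="Got ${body}"/>
        <to uri="mock:nextStageProcessor"/>
    </split>
</route>
```

This keeps streaming and parallel processing intact, but the column names are lost, which is exactly the limitation described above.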


> There are a number of different CSV data formats / components you can
> use, if the <csv> is not good enough for you.

Which one would you recommend?


On Tue, Apr 17, 2018 at 6:39 PM, Claus Ibsen <claus.ibsen@xxxxxxxxx> wrote:

> Hi
>
> The tokenize language has a skipFirst you can use to skip the header line
> https://github.com/apache/camel/blob/master/camel-core/src/main/docs/tokenize-language.adoc
>
> There are a number of different CSV data formats / components you can
> use, if the <csv> is not good enough for you.
>
>
> On Tue, Apr 17, 2018 at 6:11 PM, Frizz <frizzthecat@xxxxxxxxxxxxxx> wrote:
> > I have some CSV files with a header line, so setting useMaps="true" would
> > be the natural thing to do. Works great.
> >
> > My CSV files are very big, so using streaming/parallelProcessing would be
> > the natural thing to do. Also works great.
> >
> > Unfortunately using useMaps="true" AND streaming/parallelProcessing does
> > not work: It results in lots of empty Lists/Maps. Which is understandable,
> > but not nice.
> >
> > >> So the question remains: How to efficiently process large CSV files
> > that have a header line? <<
> >
> > By the way, this is my route:
> >
> > <route id="CSVRoute">
> >     <from uri="file:/tmp/data/" />
> >     <split streaming="true" parallelProcessing="true">
> >         <tokenize token="\n" />
> >         <unmarshal>
> >             <csv delimiter=";" useMaps="true" />
> >         </unmarshal>
> >         <log message="Got ${body}"/>
> >         <to uri="mock:nextStageProcessor"/>
> >     </split>
> > </route>
>
>
>
> --
> Claus Ibsen
> -----------------
> http://davsclaus.com @davsclaus
> Camel in Action 2: https://www.manning.com/ibsen2
>
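
One possible way to keep the header while still streaming, sketched under the assumption that the camel-csv data format in use supports the lazyLoad option together with useMaps (worth checking against the camel-csv documentation for your Camel version). Instead of tokenizing the file into lines first, the whole file is handed to the CSV data format, which reads the header once and, with lazyLoad enabled, is expected to return an iterator over the rows rather than loading them all into memory. The splitter then walks that iterator in streaming mode. The route id is hypothetical and the fragment is untested:

```xml
<route id="CSVRouteLazy">
    <from uri="file:/tmp/data/" />
    <unmarshal>
        <!-- assumption: lazyLoad yields an iterator of rows; useMaps keys each row by the header -->
        <csv delimiter=";" useMaps="true" lazyLoad="true" />
    </unmarshal>
    <split streaming="true">
        <!-- split the lazily produced rows one at a time -->
        <simple>${body}</simple>
        <log message="Got ${body}"/>
        <to uri="mock:nextStageProcessor"/>
    </split>
</route>
```

The difference from the quoted route is the order of operations: there, each line is unmarshalled in isolation, so only the first line ever sees the header. Here the parser sees the header once and applies it to every following row.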