Re: Large CSV files with headers


Hi

The tokenize language has a skipFirst option that you can use to skip the header line:
https://github.com/apache/camel/blob/master/camel-core/src/main/docs/tokenize-language.adoc

There are also several other CSV data formats / components you can
use if <csv> is not good enough for you.
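
A minimal sketch of how that could look in your route, not a tested
configuration. Because each line is unmarshalled on its own once the file
is split, the csv data format never sees the header row, so this sketch
assumes the column names have to be declared explicitly with <header>
elements. The names orderId, customer and amount are made up for
illustration; use the columns from your own file.

```xml
<route id="CSVRoute">
    <from uri="file:/tmp/data/" />
    <split streaming="true" parallelProcessing="true">
        <!-- skipFirst drops the first token, i.e. the header line -->
        <tokenize token="\n" skipFirst="true" />
        <unmarshal>
            <!-- hypothetical column names, replace with your own;
                 they become the keys of each Map -->
            <csv delimiter=";" useMaps="true">
                <header>orderId</header>
                <header>customer</header>
                <header>amount</header>
            </csv>
        </unmarshal>
        <log message="Got ${body}"/>
        <to uri="mock:nextStageProcessor"/>
    </split>
</route>
```

The rest of the route is unchanged from yours, so streaming and
parallelProcessing stay in place.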


On Tue, Apr 17, 2018 at 6:11 PM, Frizz <frizzthecat@xxxxxxxxxxxxxx> wrote:
> I have some CSV files with a header line, so setting useMaps="true" would
> be the natural thing to do. Works great.
>
> My CSV files are very big, so using streaming/parallelProcessing would be
> the natural thing to do. Also works great.
>
> Unfortunately using useMaps="true" AND streaming/parallelProcessing does
> not work: It results in lots of empty Lists/Maps. Which is understandable,
> but not nice.
>
> >> So the question remains: How to efficiently process large CSV files that
> have a header line? <<
>
> By the way, this is my route:
>
> <route id="CSVRoute">
>     <from uri="file:/tmp/data/" />
>     <split streaming="true" parallelProcessing="true">
>         <tokenize token="\n" />
>         <unmarshal>
>             <csv delimiter=";" useMaps="true" />
>         </unmarshal>
>         <log message="Got ${body}"/>
>         <to uri="mock:nextStageProcessor"/>
>     </split>
> </route>



-- 
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2


