Re: Reducing database connection with JdbcIO



Hello, 
We wrote our own JdbcIO with a connection pool per JVM (lazily initialized in @Setup). In processElement we acquire a connection and release it when done.
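
A minimal sketch of that pattern, not our actual code: the class name JvmPool and its methods are hypothetical, and the pool is generic so the example runs without Beam or a JDBC driver. In a real DoFn the factory would open a java.sql.Connection, getOrCreate would be called from @Setup, and acquire/release would wrap the write in processElement.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool of reusable resources, shared by everything in one JVM. */
final class JvmPool<T> {
    // Single shared instance per JVM, created on first use.
    private static JvmPool<?> shared;

    private final BlockingQueue<T> idle;

    private JvmPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /**
     * Lazily creates the shared pool. Intended to be called from each DoFn
     * instance's @Setup; only the first call's size and factory are used.
     */
    @SuppressWarnings("unchecked")
    static synchronized <T> JvmPool<T> getOrCreate(int size, Supplier<T> factory) {
        if (shared == null) {
            shared = new JvmPool<>(size, factory);
        }
        return (JvmPool<T>) shared;
    }

    /** Blocks until a resource is free, so at most `size` are in use at once. */
    T acquire() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
    }

    /** Returns a previously acquired resource; call this in a finally block. */
    void release(T resource) {
        idle.add(resource);
    }

    /** Number of resources currently not in use. */
    int idleCount() {
        return idle.size();
    }
}
```

Assuming one JVM per worker, the total number of open connections is then bounded by the pool size times the number of workers, regardless of how many DoFn instances or threads each worker runs.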

Best Regards,
Aleksandr Gortujev.

On 14 March 2018 at 12:49 PM, "Derek Chan" <derekcsy@xxxxxxxxx> wrote:
Hi,

We are new to Beam and need some help.

We are working on a flow that ingests events and writes the aggregated counts to a database. The input rate is rather low (~2000 messages per second), but the processing is heavy enough that we need to scale out to 5 or 6 nodes. The output (via JDBC) is aggregated, so its volume is also low. But because of the number of workers, the job holds 3000 connections to the database and keeps hitting the database connection limit.

Is there a way that we can reduce the concurrency only at the output stage? (In Spark we would have done a repartition/coalesce).

And, if it matters, we are using Apache Beam 2.2 via Scio, on Google Dataflow.

Thank you in advance!