Re: SQL Filter Pushdowns in Apache Beam SQL


It is currently the latter: all the data is read and then filtered within the pipeline. Note that this doesn't mean all the data is loaded into memory, since how the join is executed depends on the Runner that is powering the pipeline.
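To make the difference concrete, here is a minimal sketch of the two strategies in plain Python against an in-memory SQLite table. It is not Beam code and does not use any Beam or JdbcIO API. The table name `ref`, the helper names, and the row counts are all hypothetical and only there for illustration.

```python
import sqlite3


def make_reference_db(num_rows=1000):
    """Build a hypothetical reference table standing in for the JDBC source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ref (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO ref VALUES (?, ?)",
        [(i, f"name-{i}") for i in range(num_rows)],
    )
    conn.commit()
    return conn


def read_then_filter(conn, keys):
    """No pushdown: the source returns every row and the filter runs afterwards."""
    rows_read = 0
    matched = {}
    # Rows are streamed one at a time, so nothing forces the whole table into memory.
    for row_id, name in conn.execute("SELECT id, name FROM ref"):
        rows_read += 1
        if row_id in keys:
            matched[row_id] = name
    return matched, rows_read


def pushed_down(conn, keys):
    """Pushdown: the keys seen in the window become part of the source query."""
    ordered = sorted(keys)
    if not ordered:
        return {}, 0
    placeholders = ",".join("?" for _ in ordered)
    query = f"SELECT id, name FROM ref WHERE id IN ({placeholders})"
    rows = conn.execute(query, ordered).fetchall()
    return dict(rows), len(rows)
```

Both functions return the same matches for a given set of window keys. The only difference is how many rows leave the database: the whole table in the first case, only the matching rows in the second.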

Kenn shared this doc [1], which starts to look at integrating Runners and IOs into the SQL shell and at defining a way to map properties from SQL onto the IO connector. It seems natural that the filter would get pushed down to the IO connector as well. Please take a look and feel free to comment.

1: https://docs.google.com/document/d/1ZFVlnldrIYhUgOfxIT2JcmTFFSWTl4HwAnQsnwiNL1g/edit#heading=h.4zubkdp87wok

On Wed, Jun 13, 2018 at 7:39 AM Harshvardhan Agrawal <harshvardhan.agr93@xxxxxxxxx> wrote:
Hi,

We are currently playing with Apache Beam’s SQL extension on top of Flink. One of the features we are interested in is SQL predicate pushdown, which Spark provides. Does Beam support that?

For example:
I have an unbounded dataset that I want to join with some static reference data stored in a database. Will Beam figure out all the unique keys in the window and push them down to the JDBC source, or will it bring all the data from the JDBC source into memory and then perform the join?

Thanks,
Harsh
--
Regards,
Harshvardhan