Re: TPCH/TPCDS benchmark


Hi,

I have a branch in my GitHub repository for testing the TPC-H queries [1] [2].
All queries are supported (four of them need minor rewrites).

When checking the results of the benchmark, please keep in mind that so far
we have focused our efforts on extending the functionality and unifying the
semantics for batch and stream inputs.
We have not yet tried to improve the performance of SQL queries (besides
improvements to Flink in general).
We also have not enabled join reordering in the optimizer because statistics
are missing. Joins are therefore executed in the order in which the tables
are referenced in the query.
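
To illustrate why the reference order matters when joins are not reordered,
here is a toy sketch (plain Python, not Flink code). It sums the intermediate
result sizes of a left-deep plan that joins tables strictly in the order
given. The row counts are rounded TPC-H scale factor 1 sizes; the
selectivities and the function name intermediate_rows are assumptions made
for this example only.

```python
# Toy cost model, NOT Flink code. Row counts are rounded TPC-H SF1 sizes;
# the selectivities below are assumed values for illustration.
ROWS = {"lineitem": 6_000_000, "orders": 1_500_000, "customer": 150_000}

# Assumed selectivity of each join predicate (foreign-key style joins).
SELECTIVITY = {
    frozenset(["lineitem", "orders"]): 1 / 1_500_000,
    frozenset(["orders", "customer"]): 1 / 150_000,
}


def intermediate_rows(order):
    """Sum of intermediate result sizes for a left-deep plan that joins
    the tables strictly in the given order (no reordering)."""
    joined = [order[0]]
    size = ROWS[order[0]]
    total = 0.0
    for table in order[1:]:
        # Apply every predicate that connects the new table to the tables
        # joined so far; with no predicate the join is a cross product.
        sel = 1.0
        for other in joined:
            sel *= SELECTIVITY.get(frozenset([other, table]), 1.0)
        size = size * ROWS[table] * sel
        total += size
        joined.append(table)
    return total


good = intermediate_rows(("customer", "orders", "lineitem"))
bad = intermediate_rows(("lineitem", "customer", "orders"))
print(f"customer, orders, lineitem: {good:.3g} intermediate rows")
print(f"lineitem, customer, orders: {bad:.3g} intermediate rows")
```

In this model the second order starts with a cross product of lineitem and
customer, so its intermediate results are far larger. Under these
assumptions, listing tables so that each one shares a join predicate with an
earlier table is likely to help as long as reordering is disabled.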

Best, Fabian

[1] https://github.com/fhueske/flink/tree/tableTPCH
[2]
https://github.com/fhueske/flink/blob/tableTPCH/flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/tpch/TPCHQueries.scala


On Wed, Oct 3, 2018 at 23:24, Jin Sun <isunjin@xxxxxxxxx> wrote:

> Hi team,
>
>
>
> Do we have a TPC-H/TPC-DS benchmark located somewhere that we can run to
> validate performance? Some people ran the benchmark to compare Hive,
> Presto, and Spark, but not Flink:
> https://www.slideshare.net/ssuser6bb12d/hive-presto-and-spark-on-tpcds-benchmark
> The benchmark they used is located here:
> https://github.com/hortonworks/hive-testbench
>
>
>
> Jin
>
>