[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Hashjoin implementation

Hi vino,


I was running a join operation on two DataSets and writing the result to disk and the results were correct.
I just was not able to identify the moment when the Hashtable is built. ( is not used in this case?)

Do you have an idea why I cannot find it?

Here is a part of my code:
DataSet<RowCustomers> customers = env.fromElements( new RowCustomers(1, Mayer));
tEnv.registerDataSet("Customers", customers, "customerID, customerName");
DataSet<RowOrders> orders = env.fromElements( new RowOrders(1, 1, new String[]{Pen, Paper"}, "22.08.2018"));
tEnv.registerDataSet("Orders", orders, "orderID, customerID, items, date");
DataSet<Tuple2 <RowCustomers, RowOrders>> result = customers.join(orders, JoinOperatorBase.JoinHint.REPARTITION_HASH_FIRST)

Thanks a lot.

Am 11.09.2018 um 04:24 schrieb vino yang <yanghua1127@xxxxxxxxx>:

Hi Benjamin,

The approximate location is this package, the more accurate location is here.[1]

Specifically, Hash Join is divided into two steps:

1) build side
2) probe side

Thanks ,vino.

Benjamin Burkhardt <benjamin.burkhardt@xxxxxxxxxxxxxxxxxx> 于2018年9月10日周一 下午10:09写道:

can anyone tell me where the default hybrid hash join function for partitioning (shuffle phase) is implemented?
Even after deeper dinning I was not able to figure out where it is located.

Might be somewhere here?

Thanks in advance.