git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Behaviour of Process Window Function


Hi Harshvardhan,

> 1) Does the state in the process window function qualify as KeyedState or OperatorState? 
KeyedState

> We want to be able to rehydrate the guava cache at the beginning of each window by making an external rest call and clear the cache at the end of that respective window. How can we enforce this behaviour in Flink?
Why do you want to clear cache after window if the cache is shared across all keys. Do you want to load cache per key?
If you want to aggregate elements incrementally, I think it is hard to get start and end in `ProcessWindowFunction` or in `IncrementalAggregation` function. However, I think we can get start and end in the trigger function, i.e., do cache load and clear in the trigger function.

Best, Hequn


On Fri, Sep 7, 2018 at 11:28 AM vino yang <yanghua1127@xxxxxxxxx> wrote:
Hi Harshvardhan,

1) Yes, ProcessWindowFunction extends AbstractRichFunction, through getRuntimeContext,you can access keyed state API.
2) ProcessWindowFunction has given you considerable flexibility, you can based on processing time / event time / timer / it's clear method / customized implementation, the specific design depends on your business logic, how long you need to save the cache.

Thanks, vino.

Harshvardhan Agrawal <harshvardhan.agr93@xxxxxxxxx> 于2018年9月6日周四 下午10:10写道:
Hello,

We have a Flink pipeline where we are windowing our data after a keyBy. i.e.
myStream.keyBy().window().process(MyIncrementalAggregation(), MyProcessFunction()).

I have two questions about the above line of code:
1) Does the state in the process window function qualify as KeyedState or OperatorState? If operator state, can we access KeyedState from the Process Window function?
2) We also have certain reference information that we want to share across all keys in the process window function. We are currently storing all that info in a Guava cache. We want to be able to rehydrate the guava cache at the beginning of each window by making an external rest call and clear the cache at the end of that respective window. How can we enforce this behaviour in Flink? Do I need to use a timerservice for this where the callback will be a window.maxtimestamp() or just clearing the cache in the clear method will do the trick?

--
Regards,
Harshvardhan Agrawal