Re: Streaming


Forwarding Shimin's mail to Aitozi.

Aitozi

We are using HyperLogLog to count daily UV, but it only provides an approximate value. I also tried count distinct in the Flink Table API without a window, but that requires setting a retention time.
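
For comparison with the approximate HyperLogLog count, an exact daily UV can be sketched by keeping one set of user ids per day. This is only a minimal plain-Java illustration: the class ExactDailyUv and its methods are hypothetical names, not Flink API, and in a real job this state would live in Flink-managed state rather than a HashMap. The expire call stands in for what a retention time does.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Flink: exact daily UV with one set of user ids per day.
class ExactDailyUv {
    private final Map<LocalDate, Set<String>> usersByDay = new HashMap<>();

    /** Records one page view; returns true if this user is new for that day. */
    boolean record(LocalDate day, String userId) {
        return usersByDay.computeIfAbsent(day, d -> new HashSet<>()).add(userId);
    }

    /** Exact number of distinct users seen on the given day. */
    int uv(LocalDate day) {
        Set<String> users = usersByDay.get(day);
        return users == null ? 0 : users.size();
    }

    /** Drops a finished day's state, the manual analogue of a retention time. */
    void expire(LocalDate day) {
        usersByDay.remove(day);
    }
}
```

The memory cost grows with the number of distinct users per day, which is the trade-off against HyperLogLog's fixed-size, approximate state.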

However, the time resolution of this operator is 1 millisecond, so it ends up with too many timers on the Java heap, which might lead to OOM.
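
One common way to reduce the number of timers in a hand-written ProcessFunction is to round the target timestamp to a coarser resolution before registering it, so that many events map to the same timer. The sketch below shows only the arithmetic in plain Java; TimerCoalescing, coalesce and distinctTimers are hypothetical names, and whether the Table API retention cleanup can be tuned this way is an open question here, not something this sketch establishes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: shows how rounding timer timestamps reduces the number of distinct timers.
class TimerCoalescing {
    /** Rounds a timer timestamp down to a multiple of resolutionMs. */
    static long coalesce(long timestampMs, long resolutionMs) {
        return Math.floorDiv(timestampMs, resolutionMs) * resolutionMs;
    }

    /** Counts the distinct timer timestamps produced for the given event times. */
    static int distinctTimers(long[] eventTimesMs, long retentionMs, long resolutionMs) {
        Set<Long> timers = new HashSet<>();
        for (long eventTime : eventTimesMs) {
            // Each event would register a cleanup timer at eventTime + retention, rounded down.
            timers.add(coalesce(eventTime + retentionMs, resolutionMs));
        }
        return timers.size();
    }
}
```

With a resolution of 1 ms, every distinct event time yields its own timer; with a resolution of 1000 ms, all events within the same second share one. Because the timestamp is rounded down, a coalesced timer can fire up to one resolution interval earlier than the exact time.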

Cheers
Shimin


> On June 27, 2018, at 5:34 PM, zhangminglei <18717838093@xxxxxxx> wrote:
> 
> Aitozi
> 
> From my side, I do not think distinct is very easy to deal with, even when working together with Kafka's exactly-once support.
> 
> For UV, we can use a Bloom filter to filter PV to get UV in the end.
> 
> Window is usually used in an aggregate operation, so I think all of these should be realized with windows.
> 
> I am not familiar with this field, so I would still like to hear how others respond to this question.
> 
> Cheers
> Minglei
> 
> 
> 
>> On June 27, 2018, at 5:12 PM, aitozi <gjying1314@xxxxxxxxx> wrote:
>> 
>> Hi, community
>> 
>> I am using Flink to deal with the following situations.
>> 
>> 1. "distinct count" to calculate the UV/PV.
>> 2. Calculate the topN over the past 1 hour or 1 day.
>> 
>> Are these all realized with windows? Or is there a best practice for doing this?
>> 
>> 3. And when dealing with distinct, if there is no need to do a keyBy
>> beforehand, how does the window deal with this?
>> 
>> Thanks 
>> Aitozi.
>> 
>> 
>> 
>> --
>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>