git.net


[osprofiler] Distributed tracing in OpenStack



On 4/12/19 2:34 PM, Monty Taylor wrote:
> 
> 
> On 4/11/19 9:42 PM, Ilya Shakhat wrote:
>> Hi,
>>
>> Distributed tracing is one of the must-have features when one wants 
>> to track the full path of a request going through different services 
>> and APIs. This makes it similar to a shared request-id, but with nice 
>> visualization at the end [1]. In OpenStack, tracing can be achieved 
>> via the osprofiler library. The library was introduced 5 years ago, 
>> when there was no standard approach to tracing, which is why it 
>> stands apart from what has since become mainstream. There is still 
>> no single standard, but the major players are the OpenTracing and 
>> OpenCensus communities. OpenTracing is represented by Uber's Jaeger, 
>> the de facto default tracer in the Kubernetes world.
>>
>> Issues and limitations to be fixed:
>> 1. Compatibility. While the osprofiler library supports many 
>> different storage drivers, it has only one way of transferring trace 
>> context over the wire. Ideally the library should be compatible with 
>> other third-party tracers and allow traces to start in front of 
>> OpenStack APIs (e.g. in user apps) and continue after them (e.g. in 
>> storage systems or network management tools). [2]
>> 2. Operation mode. With osprofiler, tracing is initiated by a user 
>> request, while in industrial solutions tracing can be managed 
>> centrally via dynamic sampling policies.
>> 3. In-process trace propagation. Depending on the execution model 
>> (threaded, async), the way the current trace context is stored 
>> differs. OSProfiler supports the thread-local model, which was 
>> recently broken by the new async implementation in openstacksdk [3].
> 
> FWIW - we should have re-fixed that issue in SDK for all cases other 
> than parallel uploading of Large Object segments to swift. The 
> parallelism support now relies on the calling context's parallelism. 
> The large-object segment uploader is the one piece we should 
> double-check to make sure we're not losing those interactions.
> 
> That said - if we move forward with this plan - let's make sure it 
> works in openstacksdk - and that we're testing it so that we don't 
> break it.

Do we need to wrap logical operations that may make more than one remote 
call in a single span?

I ask because in the cloud layer of openstacksdk there are methods, 
like "create_image" or "get_server", which can wind up making multiple 
calls to multiple services, even though they are a single logical 
operation to the user. I don't know enough about OpenTracing best 
practices - do we care about such aggregations? Or is simply wrapping 
the HTTP call at the ksa layer enough?
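For what it's worth, the common answer in OpenTracing-style tracing is: yes - wrap the logical operation in a parent span and give each HTTP call its own child span, so the trace shows both the aggregate and the individual requests. A toy sketch of that nesting (the tracer class and the operation names are illustrative only, not the openstacksdk or opentracing API):

```python
import contextlib

class ToyTracer:
    """Minimal tracer: tracks the active span with a stack and records
    (operation, parent_operation) pairs as spans finish."""
    def __init__(self):
        self.finished = []
        self._stack = []

    @contextlib.contextmanager
    def span(self, operation):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(operation)
        try:
            yield
        finally:
            self._stack.pop()
            self.finished.append((operation, parent))

tracer = ToyTracer()

def http_call(method, path):
    # ksa-layer style instrumentation: one child span per HTTP request.
    with tracer.span("%s %s" % (method, path)):
        pass  # the actual request would happen here

def create_image():
    # cloud-layer style instrumentation: one parent span per logical
    # operation, aggregating however many calls it ends up making.
    with tracer.span("create_image"):
        http_call("POST", "/v2/images")
        http_call("PUT", "/v2/images/{id}/file")

create_image()
# Both HTTP spans report the logical operation as their parent.
assert tracer.finished[0] == ("POST /v2/images", "create_image")
assert tracer.finished[1] == ("PUT /v2/images/{id}/file", "create_image")
assert tracer.finished[2] == ("create_image", None)
```

With only ksa-layer spans the trace would show two unrelated HTTP calls; the extra parent span is what ties them back to the user-visible operation in the visualization.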

>> With OpenTracing it is possible to select the appropriate model 
>> alongside the tracer configuration.
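To illustrate why the storage model matters for point 3: a `threading.local` slot is shared by every coroutine running on the same event-loop thread, so concurrent tasks clobber each other's trace context, while a `contextvars.ContextVar` gives each asyncio task its own logical context. A small self-contained demonstration (plain stdlib; not the OSProfiler or OpenTracing API):

```python
import asyncio
import contextvars
import threading

# Thread-local storage: one slot per OS thread, shared by every
# coroutine that runs on that thread's event loop.
tls = threading.local()
# contextvars: each asyncio task gets its own logical context.
current_span = contextvars.ContextVar("current_span", default=None)

async def handle_request(name, results):
    # Simulate "start a span" at the beginning of a request.
    tls.span = name
    current_span.set(name)
    await asyncio.sleep(0)  # yield so the other task can run
    # After the yield, the thread-local slot has been overwritten by
    # the concurrent task, but the ContextVar is still task-private.
    results.append((name, tls.span, current_span.get()))

async def main():
    results = []
    await asyncio.gather(handle_request("trace-a", results),
                         handle_request("trace-b", results))
    return results

results = asyncio.run(main())
# contextvars kept each task's trace context intact...
assert all(ctx == name for name, _, ctx in results)
# ...while thread-local state leaked between the concurrent tasks.
assert any(seen != name for name, seen, _ in results)
```

This is essentially the failure mode behind [3]: thread-local context that was correct under the threaded model silently attaches spans to the wrong trace once requests run concurrently on one thread.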
>>
>> What's the plan:
>> Switching to OpenTracing could be a good option to gain compatibility 
>> with third-party solutions. The actual change would go into the 
>> osprofiler library, but it indirectly affects all OpenStack projects 
>> (should it be a global team goal then?). I'm going to make a PoC of 
>> the proposed change, so reviews would be highly appreciated.
>>
>> Comments, suggestions?
> 
> Generally supportive. I have specific impl feedbacks - but I'll leave 
> those on the patches.
> 
>> Thanks,
>> Ilya
>>
>> [1] e.g. 
>> http://logs.openstack.org/15/650915/4/check/tempest-smoke-py3-osprofiler-redis/7c6c14e/osprofiler-traces/trace-3e5cc660-8815-4079-86b9-778af8469d79.html.gz 
>>
>> [2] https://bugs.launchpad.net/osprofiler/+bug/1798565
>> [3] https://bugs.launchpad.net/osprofiler/+bug/1818493