[Openstack] [Nova][Cyborg] Cyborg-Nova integration -- new submitted Cyborg implementation code


Hi,

I noticed at the PTG that more and more developers are interested in Cyborg, and the PTG also produced some valuable outputs.
According to the PTG summary (https://etherpad.openstack.org/p/cyborg-nova-ptg-stein ), four possible solutions for how Nova interacts with Cyborg were proposed, including the device profile proposal, the device context proposal, etc. The fourth solution (from Jay) is very similar to the draft implementation I wrote before the PTG, which I have now submitted upstream.

FYI, the original description of the fourth solution in the etherpad is as follows (https://etherpad.openstack.org/p/cyborg-nova-ptg-stein , L112-L123):
      a) CTX_UUID=$(cyborg device-context-create --resources=FPGA_GZIP:1 --requires=SOME_FOO_CAPABILITY --config-data=key1:val1) **or** cyborg device-context-create --device-profile-id=$PROFILE_ID
      ++
      alex: the different is just create context/profile on the fly?
      Is $CTX_UUID a profile or a specific instance? If the former, this is #1; if the latter, it is #2
      (jaypipes) it's neither. It's the equivalent of a port binding request, just for an accelerator or super amazing device thingy. It's not like #1 because the device context is (eventually) bound to an instance and a specific device/slot identifier. It's not like #2 because it's not pre-creating any device.
      (efried) oh, okay, so it's kind of like a "dynamic profile" - it only exists as long as this request. It's the equivalent of a Neutron port (bind request) or a Cinder attachment ID. Ight.
      b) nova boot --device-context-id=$CTX_UUID
      c) placement finds a compute node that has availability for 1 FPGA_GZIP and has SOME_FOO_CAPABILITY
      d) nova-conductor sends async call to Cyborg to identify a specific slot on the chosen compute host to start dedicating to the instance. This step could be called "pre_bind()" or something like that?
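
To make step (c) concrete, here is a rough sketch of the matching it describes: a host qualifies if it has enough of every requested resource class and carries every required trait. This is only an illustration, not the placement API; the DeviceContext and Host classes, the find_candidates() helper, and the host names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceContext:
    """Hypothetical stand-in for the device context from step (a)."""
    resources: dict                      # e.g. {"FPGA_GZIP": 1}
    required_traits: set = field(default_factory=set)


@dataclass
class Host:
    """Hypothetical view of what a compute node reports."""
    name: str
    inventory: dict                      # resource class -> available amount
    traits: set


def find_candidates(ctx, hosts):
    """Return names of hosts that satisfy both resources and traits."""
    return [
        h.name
        for h in hosts
        if all(h.inventory.get(rc, 0) >= amount
               for rc, amount in ctx.resources.items())
        and ctx.required_traits <= h.traits
    ]


ctx = DeviceContext({"FPGA_GZIP": 1}, {"SOME_FOO_CAPABILITY"})
hosts = [
    Host("node1", {"FPGA_GZIP": 1}, set()),                    # missing trait
    Host("node2", {"FPGA_GZIP": 2}, {"SOME_FOO_CAPABILITY"}),  # matches
    Host("node3", {"FPGA_GZIP": 0}, {"SOME_FOO_CAPABILITY"}),  # no capacity
]
candidates = find_candidates(ctx, hosts)
```

With the sample data above, only node2 has both the capacity and the trait.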

My current code works like this: when a new VM request comes in to Nova, the information about the acceleration device is stored in the device context, which has a 'resources' field and a 'required traits' field. After selecting a host by invoking placement, Nova calls the Cyborg API to allocate (bind) an accelerator on the selected host; Cyborg's allocation API [1] does that. Before making that call, Nova first invokes the parser() function in OS-ACC [2] to parse the device context into parameters that the Cyborg API accepts. OS-ACC also provides an attach() function that generates the XML file for a PCI device.
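
As a rough sketch of that parse-then-allocate order: the code below is not the OS-ACC parser() or the Cyborg allocation API from the patches in the references. The function names, the input string format (borrowed from the etherpad CLI example), and the request body fields are all assumptions made for illustration.

```python
def parse_device_context(resources, requires="", config_data=""):
    """Hypothetical parser: turn 'FPGA_GZIP:1'-style strings into fields."""

    def pairs(text):
        # Split "k1:v1,k2:v2" into a dict, ignoring empty items.
        out = {}
        for item in (s.strip() for s in text.split(",")):
            if item:
                key, _, value = item.partition(":")
                out[key] = value
        return out

    return {
        # A resource class given without a count is assumed to mean 1.
        "resources": {rc: int(n or 1) for rc, n in pairs(resources).items()},
        "required_traits": [t.strip() for t in requires.split(",") if t.strip()],
        "config_data": pairs(config_data),
    }


def build_allocation_request(ctx, host, instance_uuid):
    """Hypothetical body for an allocation (binding) call on the chosen host."""
    return {
        "host": host,
        "instance_uuid": instance_uuid,
        "resources": ctx["resources"],
        "required_traits": ctx["required_traits"],
        "config_data": ctx["config_data"],
    }


# Parse first, then build the request for the host that placement selected.
ctx = parse_device_context("FPGA_GZIP:1", "SOME_FOO_CAPABILITY", "key1:val1")
request = build_allocation_request(ctx, host="node2", instance_uuid="example-uuid")
```

The host name and instance UUID here are placeholders; in the real flow they would come from the scheduling result and the boot request.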

I hope my code helps move this problem forward faster. Reviews are welcome.

Ref.
[1] Cyborg will provide two RESTful APIs for allocation (binding) and deallocation (unbinding), as described in https://review.openstack.org/#/c/596187/ and https://review.openstack.org/#/c/597991/ .
[2] The parser in OS-ACC converts these parameters into a format the Cyborg API accepts, so that the API can be invoked with them when binding or unbinding. The related patch is https://review.openstack.org/#/c/606036/ .

Thanks,
Xin-Ran


