[ironic] Re: OpenStack Ironic Issue


Hi Akshay,

On 09.10.20 06:58, Akshay 346 wrote:
> Hello Team,
> 
> I hope you all are good.
> 
> I am using openstack ironic deployment and have some issues and some 
> observations. These are:
> 
> Issue:  At the time of "openstack server create" for launching baremetal 
> node, I came across the following multiple observations:
> 
> - Sometimes when I launch baremetal node on openstack, after one time 
> pxe booting, the baremetal node goes down again and then comes up and 
> goes into second time booting and gets stuck there in "Probing" state ( 
> Seen on node's console) BUT according to openstack horizon, it is up and 
> running and according to "openstack baremetal node show", it is in 
> "Active" state.

Right: in order to deploy a node, Ironic will boot the node via PXE
into a ramdisk (with the Ironic Python Agent) to download and install
the user image. Once this is done, it boots the node from the just
installed disk. These are the two boot events you see.

At the moment when Ironic boots the node the second time, Ironic is done
with the deployment. At this stage the node moves to active, which means
there is now a user instance on this node. Whether or not the node is
able to boot from this image does not affect this state.

> 
> - And sometimes when i launch  baremetal node on openstack, after one 
> time pxe booting, the baremetal node goes down again and then comes up, 
> the "spawning" state on openstack horizon  goes into ERROR.
> 
> Error seen in "nova-compute-ironic-0" container is :
> 
> "ERROR nova.compute.manager [instance: 
> edd447c6-12ac-49ba-b0bc-f419aff4892a] 
> nova.exception.InstanceDeployFailure: Failed to provision instance 
> edd447c6-12ac-49ba-b0bc-f419aff4892a: Timeout reached while waiting for 
> callback for node 75210cc4-ad98-442d-ace1-89ce69467580"

In this case, something went wrong during the deployment. The Ironic
deploy logs should point to the cause. The specific error you quote
means Ironic timed out waiting for the node to call back, i.e. the
deploy ramdisk never reported in to Ironic within the allowed time.

When a deployment fails, Ironic may try to clean the node, and this
is the second boot you see.
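
For reference, how long the conductor waits for that callback is set
in ironic.conf. A minimal fragment, assuming a stock configuration (the
value shown is, as far as I know, the upstream default; please check
it against your release):

```ini
[conductor]
# Seconds to wait for the deploy ramdisk (IPA) to call back before
# the deployment is marked as failed. 0 disables the timeout.
deploy_callback_timeout = 1800
```

Raising it only helps if the ramdisk is merely slow to come up. If the
node never reaches the Ironic API at all, the logs are the place to
look.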

> - The baremetal node always takes near about 2 hours to be in 
> "available" state from "cleaning" and "clean-wait". Is it correct 
> behaviour ?

That depends on how you configured cleaning, but if Ironic, for 
instance, needs to erase all disks, cleaning can take a while.
If you have added your keys to the IPA image, you can log into the
node while it is cleaning and actually check what it is doing.
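
As a rough sketch of the knobs involved, these ironic.conf options
control automated cleaning. The values are illustrative assumptions,
not a recommendation for your site:

```ini
[conductor]
# Run automated cleaning whenever a node is released (default: true).
automated_clean = true

[deploy]
# 0 disables the full in-band disk erase step during automated cleaning.
erase_devices_priority = 0
# Still wipe partition tables and other disk metadata, which is
# usually far quicker than erasing every disk in full.
erase_devices_metadata_priority = 10
```

Whether a metadata-only wipe is acceptable depends on your security
requirements.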

HTH,
  Arne

--
Arne Wiebalck
CERN IT