git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[nova][cinder][ops] question/confirmation of legacy vol attachment migration


On 10/10, Matt Riedemann wrote:
> On 10/10/2019 5:00 AM, Gorka Eguileor wrote:
> > > 1. Yeah if the existing legacy attachment record doesn't have a connector I
> > > was worried about not properly cleaning on for that old connection, which is
> > > something I mentioned before, but also as mentioned we potentially have that
> > > case when a server is deleted and we can't get to the compute host to get
> > > the host connector, right?
> > >
> > Hi,
> >
> > Not really... In that case we still have the BDM info in the DB, so we
> > can just make the 3 Cinder REST API calls ourselves (begin_detaching,
> > terminate_connection and detach) to have the volume unmapped, the export
> > removed, and the volume return to available as usual, without needing to
> > go to the storage array manually.
>
> I'm not sure what you mean. Yes we have the BDM in nova but if it's really
> old it won't have the host connector stashed away in the connection_info
> dict and we won't be able to pass that to the terminate_connection API:
>
> https://github.com/openstack/nova/blob/19.0.0/nova/compute/api.py#L2186
>
> Are you talking about something else? I realize ^ is very edge case since
> we've been storing the connector in the BDM.connection_info since I think at
> least Liberty or Mitaka.

Hi,

I didn't know that Nova didn't use to store the connector...  For those
cases it is definitely going to be a problem.

If you have one such cases, and the Nova compute node is down (so you
cannot get the connector info), then we should just wait until the node
is back up to do the migration.


>
> >
> >
> > > 2. If I were to use os-terminate_connection, I seem to have a tricky
> > > situation on the migration flow because right now I'm doing:
> > >
> > > a) create new attachment with host connector
> > > b) complete new attachment (put the volume back to in-use status)
> > >    - if this fails I attempt to delete the new attachment
> > > c) delete the legacy attachment - I intentionally left this until the end to
> > > make sure (a) and (b) were successful.
> > >
> > > If I change (c) to be os-terminate_connection, will that screw up the
> > > accounting on the attachment created in (a)?
> > >
> > > If I did the terminate_connection first (before creating a new attachment),
> > > could that leave a window of time where the volume is shown as not
> > > attached/in-use? Maybe not since it's not the begin_detaching/os-detach
> > > API...I'm fuzzy on the cinder volume state machine here.
> > >
> > > Or maybe the flow would become:
> > >
> > > a) create new attachment with host connector
> > This is a good idea in itself, but it's not taking into account weird
> > behaviors that some Cinder drivers may have when you call them twice to
> > initialize the connection on the same host.  Some drivers end up
> > creating a different mapping for the volume instead of returning the
> > existing one; we've had bugs like this before, and that's why Nova made
> > a change in its live instance migration code to not call
> > intialize_connection on the source host to get the connection_info for
> > detaching.
>
> Huh...I thought attachments in cinder were a dime a dozen and you could
> create/delete them as needed, or that was the idea behind the new v3
> attachments stuff. It seems to at least be what I remember John Griffith
> always saying we should be able to do.

Sure, you can create them freely, but the old and new API's were not
meant to be mixed, which is what we would be doing here.

The more I look at this, the more I think it is a bad idea.

First, when you create the new attachment on a volume that was attached
using the old APIs (Cinder attachment exists in DB without connection
info):

- Cinder DB attachment entry has the instance_uuid: then, since the
  volume to be migrated is not multi-attach, we will reuse the
  attachment that already exists [1] and a new one will not be created.
  So we will end up with just 1 attachment entry like if we didn't do
  anything.

- If the DB attachment entry doesn't have the instance_uuid, then the
  attachment creation will fail [2] because it is not a multi-attach
  volume.

If somehow we were able to create a second attachment entry, then the
attachment_update will raise an exception [3], because it's expecting to
have the connector information in all the attachments (because old and
new attachment flows were not meant to be mixed).

Even if we pass through that, this is not a multi-attach volume, so it
will still fail because it cannot have 2 attachments [4].

Even if we get past that, when we create the attachment with the
connector info or create it and then update it with the connection info,
we'll get a new call to initialize_connection, and depending on the
driver this will create a new export and mapping or reuse the old one.

- If we create a new one the code could be fine when you call Cinder to
  remove that attachment, because we have 2 exports and we'll remove
  one.  "Hopefully" the one we want.

- The problem is if the driver's initialize_connection was idempotent,
  because then the new attach API expects it to return "True" when
  called a second time to initialize_connection with the same connector
  info, yet I don't think this was documented anywhere [5], so I don't
  think there are any drivers that are doing this.  If drivers are
  idempotent and they don't return "True", then when we terminate the
  old attach connection we'll remove the export that is used by both
  connections, breaking the attachment.

And this is not even thinking of what will happen on the OS-Brick side,
which is most likely not good.  For example, if we have a multipath
iSCSI volume and the driver created a new target-portal, then the new
devices that will appear on the system will be aggregated to the already
existing multipath DM, which means that the terminate connection of the
first one will flush the shared multipath DM, thus destroying the
mapping from the new API flow.

I stand by my initial recommendation, being able to update the existing
attachment to add the connection information from Nova.

Cheers,
Gorka.

[1]: https://opendev.org/openstack/cinder/src/commit/b8198de09aa13113d16d0cef8916223e66f9d8c1/cinder/volume/api.py#L2107-L2118
[2]: https://opendev.org/openstack/cinder/src/commit/b8198de09aa13113d16d0cef8916223e66f9d8c1/cinder/volume/api.py#L2121-L2127
[3]: https://opendev.org/openstack/cinder/src/commit/b8198de09aa13113d16d0cef8916223e66f9d8c1/cinder/volume/api.py#L2223
[4]: https://opendev.org/openstack/cinder/src/commit/b8198de09aa13113d16d0cef8916223e66f9d8c1/cinder/volume/api.py#L2227-L2233
[5]: https://opendev.org/openstack/cinder/src/commit/b8198de09aa13113d16d0cef8916223e66f9d8c1/cinder/volume/driver.py#L1470-L1476

>
> Also if you can't refresh the connection info on the same host then a change
> like this:
>
> https://review.opendev.org/#/c/579004/
>
> Which does just that - refreshes the connection info doing reboot and start
> instance operations - would break on those volume drivers if I'm following
> you.
>

The part related to the new API looks fine, but doing that for the old
initialize_connection doesn't look right to me.

Cheers,
Gorka.


> >
> >
> > > b) terminate the connection for the legacy attachment
> > >    - if this fails, delete the new attachment created in (a)
> > > c) complete the new attachment created in (a)
> > >    - if this fails...?
> > >
> > > Without digging into the flow of a cold or live migration I want to say
> > > that's closer to what we do there, e.g. initialize_connection for the new
> > > host, terminate_connection for the old host, complete the new attachment.
> > >
> > I think any workaround we try to find has a good chance of resulting in
> > a good number of bugs.
> >
> > In my opinion our options are:
> >
> > 1- Completely detach and re-attach the volume
>
> I'd really like to avoid this if possible because it could screw up running
> applications and the migration operation itself is threaded out to not hold
> up the restart of the compute service. But maybe that's also true of what
> I've got written up today though it's closer to what we do during
> resize/cold migrate (though those of course involve downtime for the guest).

I agree, this is not an idea we should pursue.


>
> > 2- Write new code in Cinder
> >
> > The new code can be either a new action or we can just add a
> > microversion to attachment create to also accept "connection_info", and
> > when we provide connection_info on the call the method confirms that
> > it's a "migration" (the volume is 'in-use' and doesn't have any
> > attachments) and it doesn't bother to call the cinder-volume to export
> > and map the volume again and simply saves this information in the DB.
>
> If the volume is in-use it would have attachments, so I'm not
> following you there. Even if the volume is attached the "legacy" way
> from a nova perspective, using os-initialize_connection, there is a
> volume attachment record in the cinder DB (I confirmed this in my
> devstack testing and the notes are in my patch). It's also precisely
> the problem I'm trying to solve which is without deleting the old
> legacy attachment, when you delete the server the volume is detached
> but still shows up in cinder as in-use because of the orphaned
> attachment.
>
> >
> > I know the solution it's not "clean/nice/elegant", and I'd rather go
> > with option 1, but that would be terrible user experience, so I'll
> > settle for a solution that doesn't add much code to Cinder, is simple
> > for Nova, and is less likely to result in bugs.
> >
> > What do you think?
> >
> > Regards,
> > Gorka.
> >
> > PS: In this week's meeting we briefly discussed this topic and agreed to
> > continue the conversation here and retake it on next week's meeting.
> >
>
> Thanks for discussing it and getting back to me.
>
> --
>
> Thanks,
>
> Matt