
[cinder][tooz] Lock files are left behind



On 10/15/19 10:41 AM, Herve Beraud wrote:
> 
> 
> On Tue, Oct 15, 2019 at 16:48, Ben Nemec <openstack at nemebean.com> wrote:
> 
> 
> 
>     On 10/15/19 7:48 AM, Herve Beraud wrote:
>      > I proposed some patches through heat templates and puppet-cinder to
>      > remove lock files older than one week and avoid file system growth.
>      >
>      > This is a cron-based solution that lets us fix the issue on stable
>     branches.
>      > As a second step, I'll help the fasteners project fix the root
>     cause
>      > by reviewing and testing the proposed patch (locks based on file
>     offsets).
>      > In future versions I hope we will use a patched fasteners and can
>     then
>      > drop the cron-based solution.
>      >
>      > Could you please give reviews/feedback:
>      > - https://review.opendev.org/688413
>      > - https://review.opendev.org/688414
>      > - https://review.opendev.org/688415
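[For illustration, a cron-based cleanup of this kind typically boils down to a single find(1) invocation. The lock_path and file pattern below are assumptions for the sketch, not taken from the reviews; consult the reviews themselves for the real details.]

```shell
# Sketch only: lock_path and name pattern are assumed, not from the reviews.
LOCK_PATH="${LOCK_PATH:-/var/lib/cinder/tmp}"
if [ -d "$LOCK_PATH" ]; then
    # Remove "-delete_*" lock files untouched for more than 7 days.
    find "$LOCK_PATH" -maxdepth 1 -type f \
        -name 'cinder-*-delete_*' -mtime +7 -delete
fi
```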
> 
>     I'm rather hesitant to recommend this. It looks like the change is only
>     removing the -delete lock files, which are a fraction of the total lock
>     files created by Cinder, and I don't particularly want to encourage
>     people to start monkeying with the lock files while a service is
>     running. Even with this limited set of deletions, I think we need a
>     Cinder person to look and verify that we aren't making bad assumptions
>     about how the locks are used.
> 
> 
> Yes, these changes should be validated by the Cinder team.
> I chose this approach so that we can fix this on stable branches too, 
> and to avoid introducing a non-backportable new feature.
> 
> 
>     In essence, I don't think this is going to meaningfully reduce the
>     amount of leftover lock files and it sets a bad precedent for how to
>     handle them.
> 
>     Personally, I'd rather see a boot-time service added for each OpenStack
>     service that goes out and wipes the lock file directory before starting
>     the service.
> 
> 
> I agree, that could be an alternative to the proposed changes.
> I guess the boot-time service would also involve some sort of puppet 
> code, am I right?

That's probably how you'd implement it in TripleO. Or maybe Ansible now. 
Best to check with the TripleO team on that since my knowledge is quite 
out of date on that project now.
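[For what it's worth, a boot-time cleanup could be as small as a oneshot systemd unit ordered before the service. Everything below — the unit wording, the service name, and the lock path — is illustrative, not an existing TripleO artifact.]

```ini
[Unit]
Description=Wipe leftover Cinder lock files before the service starts
Before=openstack-cinder-volume.service

[Service]
Type=oneshot
# Assumed lock_path; match whatever lock_path is set in cinder.conf.
ExecStart=/usr/bin/find /var/lib/cinder/tmp -maxdepth 1 -type f -name 'cinder-*' -delete

[Install]
WantedBy=multi-user.target
```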

> 
> 
>     On a more general note, I'm going to challenge the assertion that
>     "Customer file system growing slowly and so customer risk to facing some
>     issues to file system usage after a long period." I have yet to hear an
>     actual bug report from the leftover lock files. Every time this
>     comes up
>     it's because someone noticed a lot of lock files and thought we were
>     leaking them. I've never heard anyone report an actual functional or
>     performance problem as a result of the lock files. I don't think we
>     should "fix" this until someone reports that it's actually broken.
>     Especially because previous attempts have all resulted in very real
>     bugs
>     that did break people.
> 
> 
> Yes, I agree it's more an assumption than reality; I've never seen 
> anybody report a disk usage issue or anything similar caused by 
> leftover lock files.
> 
> 
>     Maybe we should have oslo.concurrency drop a file named _README (or
>     something else likely to sort first in the file listing) into the
>     configured lock_path that explains why the files are there and the
>     proper way to deal with them.
> 
> 
> Good idea.
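[A sketch of what that marker could look like — the file name, wording, and helper are placeholders; nothing like this exists in oslo.concurrency today.]

```python
import os
import tempfile

README_TEXT = """\
The files in this directory are lock files created by an OpenStack
service via oslo.concurrency.  Leftover files here are normal and
harmless.  Do NOT delete them while the service is running; they can
be safely removed only while the service is stopped.
"""

def write_lock_path_readme(lock_path):
    """Drop an explanatory _README into the configured lock_path."""
    with open(os.path.join(lock_path, '_README'), 'w') as f:
        f.write(README_TEXT)

# Example: write the marker into a scratch directory.
write_lock_path_readme(tempfile.mkdtemp())
```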
> 
> Anyway, even if nobody is facing a file system issue related to 
> leftover files, I don't think it's good to let a file system grow 
> unbounded. We need to try to address this to prevent potential disk 
> usage issues caused by lock files, but in a safe way that avoids 
> introducing race conditions with Cinder.
> 
> Cinder people need to confirm that my proposed changes fit well with 
> Cinder's mechanisms, or choose a better approach.

I'm opposed in general to external solutions. If lock files are to be 
cleaned up, it needs to happen either when the service isn't running so 
there's no chance of deleting an in-use lock, or it needs to be done by 
the service itself when it knows that it is done with the lock. Any 
fixes outside the service run the risk of drifting from the 
implementation if, for example, Cinder made a change to its locking 
semantics such that locks you could safely remove previously no longer 
could be. I believe Neutron implements lock file cleanup using [0], 
which is really the only way runtime lock cleanup should be done IMHO.

0: 
https://github.com/openstack/oslo.concurrency/blob/85c341aced7b181724b68c9d883768b5c5f7e982/oslo_concurrency/lockutils.py#L194
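[For reference, the helper linked above essentially acquires the lock before unlinking the file, so removal only happens when no one is inside the critical section. Below is a rough stdlib sketch of that pattern — illustrative, not the oslo.concurrency code — and, as the docstring notes, it is only safe when the service knows the lock will not be contended again.]

```python
import fcntl
import os

def remove_lock_file(path):
    """Delete a lock file by first acquiring it ourselves.

    Only safe when the owning service knows it is done with this lock;
    a process already blocked on the old file descriptor would still
    end up locking the orphaned inode.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # wait out any current holder
        os.unlink(path)                 # remove while we hold the lock
    finally:
        os.close(fd)
```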

> 
> 
>      >
>      > Thanks
>      >
>      >
>      > On Mon, Sep 30, 2019 at 03:35, Rikimaru Honjo
>      > <honjo.rikimaru at ntt-tx.co.jp> wrote:
>      >
>      >     On 2019/09/28 1:44, Ben Nemec wrote:
>      >      >
>      >      >
>      >      > On 9/23/19 11:42 PM, Rikimaru Honjo wrote:
>      >      >> Hi Eric,
>      >      >>
>      >      >> On 2019/09/20 23:10, Eric Harney wrote:
>      >      >>> On 9/20/19 1:52 AM, Rikimaru Honjo wrote:
>      >      >>>> Hi,
>      >      >>>>
>      >      >>>> I'm using Queens cinder with the following setting.
>      >      >>>>
>      >      >>>> ---------------------------------
>      >      >>>> [coordination]
>      >      >>>> backend_url = file://$state_path
>      >      >>>> ---------------------------------
>      >      >>>>
>      >      >>>> As a result, files like the following remained under
>      >     the state path after some operations. [1]
>      >      >>>>
>      >      >>>> cinder-63dacb3d-bd4d-42bb-88fe-6e4180164765-delete_volume
>      >      >>>> cinder-32c426af-82b4-41de-b637-7d76fed69e83-delete_snapshot
>      >      >>>>
>      >      >>>> In my understanding, these are lock files created for
>      >     synchronization by tooz.
>      >      >>>> But these lock files were not deleted after the
>     operations finished.
>      >      >>>> Is this behaviour correct?
>      >      >>>>
>      >      >>>> [1]
>      >      >>>> e.g. Delete volume, Delete snapshot
>      >      >>>
>      >      >>> This is a known bug that's described here:
>      >      >>>
>      >      >>> https://github.com/harlowja/fasteners/issues/26
>      >      >>>
>      >      >>> (The fasteners library is used by tooz, which is used by
>     Cinder
>      >     for managing these lock files.)
>      >      >>>
>      >      >>> There's an old Cinder bug for it here:
>      >      >>> https://bugs.launchpad.net/cinder/+bug/1432387
>      >      >>>
>      >      >>> but that's marked as "Won't Fix" because Cinder needs it
>     to be
>      >     fixed in the underlying libraries.
>      >      >> Thank you for your explanation.
>      >      >> I understood the state.
>      >      >>
>      >      >> But, I have one more question.
>      >      >> Can I assume this bug doesn't affect synchronization?
>      >      >
>      >      > It does not. In fact, it's important to not remove lock files
>      >     while a service is running or you can end up with synchronization
>      >     issues.
>      >      >
>      >      > To clean up the leftover lock files, we generally recommend
>      >     clearing the lock_path for each service on reboot before the
>      >     services have started.
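[To make the warning above concrete: deleting an in-use lock file silently breaks mutual exclusion, because the next locker re-creates the path as a brand-new inode. A small stdlib demonstration (the file name is arbitrary):]

```python
import fcntl
import os
import tempfile

lock_path = os.path.join(tempfile.mkdtemp(), 'cinder-demo-lock')

# "Process A" takes the lock.
fd_a = os.open(lock_path, os.O_CREAT | os.O_RDWR)
fcntl.flock(fd_a, fcntl.LOCK_EX)

# An external cleanup job removes the file while A still holds it...
os.unlink(lock_path)

# ..."process B" then re-creates the path as a new inode and acquires
# it immediately, even though A has never released its lock.
fd_b = os.open(lock_path, os.O_CREAT | os.O_RDWR)
fcntl.flock(fd_b, fcntl.LOCK_EX | fcntl.LOCK_NB)  # does not block

# A and B now both believe they hold "the" lock.
```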
>      >
>      >     Thank you for your information.
>      >     I think that I understood this issue completely.
>      >
>      >     Best Regards,
>      >
>      >
>      >      >>
>      >      >> Best regards,
>      >      >>
>      >      >>> Thanks,
>      >      >>> Eric
>      >      >>>
>      >      >>
>      >      >
>      >
>      >     --
>      >     _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
>      >     Rikimaru Honjo
>      >     E-mail: honjo.rikimaru at ntt-tx.co.jp
>      >
>      >
>      >
>      >
>      > --
>      > Hervé Beraud
>      > Senior Software Engineer
>      > Red Hat - Openstack Oslo
>      > irc: hberaud
>      > -----BEGIN PGP SIGNATURE-----
>      >
>      > wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+
>      > Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+
>      > RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP
>      > F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G
>      > 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g
>      > glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw
>      > m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ
>      > hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0
>      > qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y
>      > F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3
>      > B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O
>      > v6rDpkeNksZ9fFSyoY2o
>      > =ECSj
>      > -----END PGP SIGNATURE-----
>      >
> 
> 
> 
> -- 
> Hervé Beraud
> Senior Software Engineer
> Red Hat - Openstack Oslo
> irc: hberaud
>