[openstack-ansible] strange execution delays
Restarting systemd-logind would sometimes hang indefinitely, which is
why we've defaulted to just a hard stop/start of the container. After that,
the problem slowly begins to creep up again.
If you haven't seen this behavior, then that's still helpful. We'll scour
the environment trying to find *something* that might be causing it.
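As a starting point for that search, here is a minimal sketch (not something we run today) that counts the two timeout messages from the log excerpt quoted below, per container, so the affected containers can be compared against the healthy ones. The function name count_timeouts and the assumption that each log entry sits on a single line in traditional syslog format are mine for illustration.

```python
import re
from collections import Counter

# Message fragments copied from the log excerpt quoted in this thread.
PATTERNS = (
    "Failed to create session: Connection timed out",
    "Failed to activate service 'org.freedesktop.systemd1': timed out",
)

# Assumed traditional syslog prefix: "Mon D HH:MM:SS hostname rest-of-line".
SYSLOG_RE = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+(\S+)\s+(.*)$")


def count_timeouts(lines):
    """Return a Counter mapping hostname -> number of timeout messages."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_RE.match(line.strip())
        if match is None:
            continue  # not a syslog-formatted line, skip it
        host, message = match.groups()
        if any(pattern in message for pattern in PATTERNS):
            counts[host] += 1
    return counts
```

Passing it an open log file (for example the container's syslog, wherever that lives in a given deployment) and printing count_timeouts(fh).most_common() would show which containers hit the timeouts most often.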
On Thu, Jan 2, 2020 at 1:03 PM Mohammed Naser <mnaser at vexxhost.com> wrote:
> Hi Joe,
> Those timeouts are almost 99% the reason behind this issue. I'd
> suggest restarting systemd-logind and seeing how that fares:
> systemctl restart systemd-logind
> If the issue persists or happens again, I'm not sure, but those
> timeouts are 100% a cause of the issue here.
> On Mon, Dec 30, 2019 at 2:51 PM Joe Topjian <joe at topjian.net> wrote:
> > Hi Mohammed,
> >> Do you have any PAM modules that might be hitting some sorts of
> >> external API for auditing purposes that may be throttling you?
> > Not unless OSA would have configured something. The deployment is *very*
> > standard, heavily leveraging default values.
> > DNS of each container is configured to use LXC host for resolution. The
> > host is using the systemd-based resolver, but is pointing to a local,
> > dedicated upstream resolver. I want to point the problem there, but we've
> > run into this issue in two different locations, one of which has an
> > upstream DNS resolver that I'm confident does not throttle requests. But,
> > hey, it's DNS - maybe it's still the cause.
> >> How is systemd-logind feeling? Anything odd in your system logs?
> > Yes. We have a feeling it's *something* with systemd, but aren't exactly
> > sure what. Affected containers' logs end up with a lot of the following:
> > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su: Successful su for root by root
> > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su: + ??? root:root
> > Dec 3 20:30:17 infra1-repo-container-a0f194b3 su: pam_unix(su:session): session opened for user root by (uid=0)
> > Dec 3 20:30:27 infra1-repo-container-a0f194b3 dbus-daemon: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
> > Dec 3 20:30:42 infra1-repo-container-a0f194b3 su: pam_systemd(su:session): Failed to create session: Connection timed out
> > Dec 3 20:30:43 infra1-repo-container-a0f194b3 su: pam_unix(su:session): session closed for user root
> > But we aren't sure if those timeouts are a symptom or a cause.
> > Thanks for your help!
> > Joe
> Mohammed Naser - vexxhost
> D. 514-316-8872
> D. 800-910-1726 ext. 200
> E. mnaser at vexxhost.com
> W. https://vexxhost.com