
[openstack-ansible] strange execution delays


Hi Mohammed,

Restarting systemd-logind would sometimes hang indefinitely, which is why
we've defaulted to a hard stop/start of the container. After that, the
problem slowly creeps back up again.
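Because a plain `systemctl restart systemd-logind` can hang forever, one option is to put a deadline on the restart. A hung restart then fails quickly, and you can fall back to the container stop/start instead of waiting on it. This is a minimal sketch that assumes coreutils `timeout` is available. `run_with_deadline` is a hypothetical helper name, not something OpenStack-Ansible ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, but give up after a fixed number of
# seconds instead of hanging forever. coreutils `timeout` exits with status
# 124 when the deadline is reached, and this function passes that status on.
run_with_deadline() {
  local seconds="$1" rc
  shift
  timeout "$seconds" "$@"
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${seconds}s: $*" >&2
  fi
  return "$rc"
}
```

For example: `run_with_deadline 60 systemctl restart systemd-logind || echo "restart failed or hung; falling back to a container stop/start"`. The 60-second deadline is an arbitrary assumption. Note that the deadline only stops the operator's shell from hanging; it does not fix whatever is blocking systemd-logind.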

If you haven't seen this behavior before, that's still helpful to know.
We'll scour the environment to find *something* that might be causing it.
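One way to measure how the problem creeps back between container restarts is to count the two timeout signatures from the log excerpt quoted further down in this thread. This is a minimal sketch. `count_session_timeouts` is a hypothetical helper. The default path `/var/log/syslog` is an assumption; the right path depends on how syslog is set up in the container image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines matching either timeout signature seen
# in the affected containers (dbus failing to activate systemd1, and
# pam_systemd failing to create a session). Prints a line count via grep -c.
count_session_timeouts() {
  local log="${1:-/var/log/syslog}"   # assumed default log location
  grep -c -E \
    "Failed to activate service 'org\.freedesktop\.systemd1': timed out|pam_systemd\([^)]*\): Failed to create session: Connection timed out" \
    "$log"
}
```

Running `count_session_timeouts /var/log/syslog` at intervals after a container restart shows whether these entries start accumulating again. That gives a rough signal for when the problem is returning.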

Thanks,
Joe


On Thu, Jan 2, 2020 at 1:03 PM Mohammed Naser <mnaser at vexxhost.com> wrote:

> Hi Joe,
>
> I'm about 99% sure those timeouts are the reason behind this issue. I'd
> suggest restarting systemd-logind and seeing how that fares:
>
> systemctl restart systemd-logind
>
> If the issue persists or happens again, I'm not sure what else is going
> on, but those timeouts are definitely a cause of the issue here.
>
> Thanks,
> Mohammed
>
> On Mon, Dec 30, 2019 at 2:51 PM Joe Topjian <joe at topjian.net> wrote:
> >
> > Hi Mohammed,
> >
> >> Do you have any PAM modules that might be hitting some sorts of
> >> external API for auditing purposes that may be throttling you?
> >
> >
> > Not unless OSA (OpenStack-Ansible) configured something. The deployment
> > is *very* standard and relies heavily on default values.
> >
> > DNS in each container is configured to use the LXC host for resolution.
> > The host uses the systemd-based resolver, which points to a local,
> > dedicated upstream resolver. I'm tempted to blame that, but we've run
> > into this issue in two different locations, one of which has an upstream
> > DNS resolver that I'm confident does not throttle requests. But, hey,
> > it's DNS; maybe it's still the cause.
> >
> >>
> >> How is systemd-logind feeling?  Anything odd in your system logs?
> >
> >
> > Yes. We have a feeling it's *something* with systemd, but we aren't
> > exactly sure what. Affected containers' logs end up with a lot of the
> > following entries:
> >
> > Dec  3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: Successful su for root by root
> > Dec  3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: + ??? root:root
> > Dec  3 20:30:17 infra1-repo-container-a0f194b3 su[4170]: pam_unix(su:session): session opened for user root by (uid=0)
> > Dec  3 20:30:27 infra1-repo-container-a0f194b3 dbus-daemon[47]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
> > Dec  3 20:30:42 infra1-repo-container-a0f194b3 su[4170]: pam_systemd(su:session): Failed to create session: Connection timed out
> > Dec  3 20:30:43 infra1-repo-container-a0f194b3 su[4170]: pam_unix(su:session): session closed for user root
> >
> > But we aren't sure whether those timeouts are a symptom or the cause.
> >
> > Thanks for your help!
> >
> > Joe
>
>
>
> --
> Mohammed Naser - vexxhost
> -----------------------------------------------------
> D. 514-316-8872
> D. 800-910-1726 ext. 200
> E. mnaser at vexxhost.com
> W. https://vexxhost.com
>
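To test the DNS theory from the quoted message above, you can time lookups through the system resolver and compare the results. Run them once from inside an affected container and once on the LXC host. This is a minimal sketch that assumes GNU `date` (for `%N`) and `getent` are available. `time_lookup_ms` is a hypothetical helper name. Because `getent` goes through NSS, `/etc/hosts` entries are consulted before DNS. Use a name that is not in `/etc/hosts` when checking the resolver itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time a single name lookup through the system resolver
# (NSS via getent) and print the elapsed wall-clock time in milliseconds.
# Returns non-zero if the name does not resolve.
time_lookup_ms() {
  local name="$1" start end
  start=$(date +%s%N)                  # nanoseconds since the epoch (GNU date)
  getent hosts "$name" > /dev/null || return 1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}
```

Lookups that are consistently slow inside the containers but fast on the host would point at the container-to-host resolver path. Similar timings on both sides would make DNS a less likely culprit.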