Re: [users@httpd] Apache crashes with: AH03104: apr_thread_create


On 22.09.2018 at 22:17, Jerry Martinez wrote:
Hello!

Apache has been randomly crashing (for a few months now) and I cannot seem
to understand why. I cannot replicate the crash even when hitting the server
with 4,000 requests @ a concurrency of 500. This is a production server and
I am willing to compensate someone for their efforts resolving this. Below
is one of the many error messages:

[Fri Sep 21 11:27:24 2018] [mpm_event:alert] (11)Resource temporarily
unavailable: AH03104: apr_thread_create: unable to create worker thread

apr_thread_create() on Linux/Unix is mostly pthread_create(). The man page for that on SLES 12 tells us:

=== SNIP ===

       EAGAIN Insufficient resources to create another thread.

       EAGAIN A system-imposed limit on the number of threads was
              encountered. There are a number of limits that may trigger
              this error: the RLIMIT_NPROC soft resource limit (set via
              setrlimit(2)), which limits the number of processes and
              threads for a real user ID, was reached; the kernel's
              system-wide limit on the number of processes and threads,
              /proc/sys/kernel/threads-max, was reached (see proc(5)); or
              the maximum number of PIDs, /proc/sys/kernel/pid_max, was
              reached (see proc(5)).

       EAGAIN The system lacked the necessary resources to create another
              thread, or the system-imposed limit on the total number of
              threads in a process {PTHREAD_THREADS_MAX} would be exceeded.

=== SNIP ===

Since your system seems to have lots of free memory, I don't expect a memory shortage, unless there's a memory leak and the memory numbers at the time of the crash are very different from the ones you showed below. Each thread needs a thread stack in memory.

What could happen is that the number of threads your user can create (summed over all of its processes) hits the nproc limit. Note that although it is called nproc (number of processes), what it actually limits on Linux is the (much bigger) number of threads per user.

Other candidate limits are the system-wide total number of threads or processes and the number of file descriptors per process.
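To compare these limits against actual usage, something like the following minimal Python sketch could be run as the account httpd runs under. It is only an illustration: the helper names (read_int, format_limit, threads_if_owned, count_threads_for_uid) are made up for this example, and it assumes a Linux /proc filesystem. Note that it reports the limits of the shell it is started from; the limits of an already running httpd process can be read from /proc/<pid>/limits instead.

```python
import os
import resource


def read_int(path):
    """Return the integer stored in a /proc file, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def format_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def threads_if_owned(status_text, uid):
    """Return the Threads: count from /proc/<pid>/status text if the
    process's real UID equals uid, otherwise 0."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        fields[key] = value.split()
    uids = fields.get("Uid")        # real, effective, saved, filesystem UID
    threads = fields.get("Threads")
    if not uids or not threads or uids[0] != str(uid):
        return 0
    return int(threads[0])


def count_threads_for_uid(uid, proc_root="/proc"):
    """Sum the thread counts of all processes owned by uid (None without /proc)."""
    try:
        pids = [name for name in os.listdir(proc_root) if name.isdigit()]
    except OSError:
        return None
    total = 0
    for pid in pids:
        try:
            with open(os.path.join(proc_root, pid, "status")) as fh:
                total += threads_if_owned(fh.read(), uid)
        except OSError:
            continue  # the process exited while we were scanning
    return total


def main():
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("nproc  soft/hard:", format_limit(soft), "/", format_limit(hard))
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("nofile soft/hard:", format_limit(soft), "/", format_limit(hard))
    print("threads-max:", read_int("/proc/sys/kernel/threads-max"))
    print("pid_max:", read_int("/proc/sys/kernel/pid_max"))
    uid = os.getuid()
    print("threads owned by uid", uid, ":", count_threads_for_uid(uid))


if __name__ == "__main__":
    main()
```

If the thread count for the httpd user is close to the nproc soft limit around the time of a crash, that limit is the first suspect.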

One thing is a bit strange, though: Apache httpd typically does not start single threads. When it needs more concurrency, it starts new processes, each with ThreadsPerChild worker threads. So it seems that due to increased load, or more likely (if it is a reverse proxy) due to a temporary slowness of the backend, your web server needs to start new processes. The maximum number of processes is set in your MPM config.
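As a rough worked example of that scaling, the sketch below computes how many child processes and worker threads httpd can grow to. The function name mpm_capacity is hypothetical, and the numbers passed in are what I believe the documented event MPM defaults to be (ThreadsPerChild 25, MaxRequestWorkers 400, ServerLimit 16); please check them against your own build. It counts worker threads only; each child also runs a few helper threads such as the listener.

```python
def mpm_capacity(threads_per_child, max_request_workers, server_limit):
    """Return (max child processes, max worker threads) for a threaded MPM.

    Assumption: the number of children is MaxRequestWorkers divided by
    ThreadsPerChild, rounded down, and capped by ServerLimit.
    """
    processes = min(max_request_workers // threads_per_child, server_limit)
    return processes, processes * threads_per_child


if __name__ == "__main__":
    # Assumed event MPM defaults; verify against your configuration.
    processes, workers = mpm_capacity(
        threads_per_child=25, max_request_workers=400, server_limit=16
    )
    print("child processes:", processes)  # 16
    print("worker threads:", workers)     # 400
```

Under these assumptions a fully scaled-up httpd needs a bit more than 400 threads, all counted against the same user's nproc limit.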

So even if you find out why no more threads can be created and remove that limit, the next problem might be that your httpd ends up with all worker threads busy. Then you need to find out why the load is so high or, more likely, why some backend is getting slow.

BTW: to get a better idea of which processes and threads are in use, you can add %P (process ID) and %{tid}P (thread ID) to your access log format. And regularly retrieving the number of busy and idle workers from server-status (mod_status) will tell you exactly when the increase in threads starts and how quickly it goes up.
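For the regular retrieval of busy and idle workers, a small poller along these lines might do. It is a sketch under assumptions: the URL is a guess at where mod_status is mapped on your server, the function names are made up, and it relies on the machine-readable "?auto" output containing "BusyWorkers:" and "IdleWorkers:" lines.

```python
import time
import urllib.request

# Assumption: mod_status is reachable here; adjust host and path to your setup.
STATUS_URL = "http://localhost/server-status?auto"


def parse_worker_counts(status_text):
    """Extract (busy, idle) worker counts from '?auto' status output.

    Either value is None if its line is missing or not a number.
    """
    counts = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("BusyWorkers", "IdleWorkers"):
            try:
                counts[key] = int(value.strip())
            except ValueError:
                pass
    return counts.get("BusyWorkers"), counts.get("IdleWorkers")


def poll(url=STATUS_URL, interval_seconds=10, samples=6):
    """Print a timestamped busy/idle line every interval_seconds."""
    for _ in range(samples):
        with urllib.request.urlopen(url, timeout=5) as response:
            text = response.read().decode("ascii", errors="replace")
        busy, idle = parse_worker_counts(text)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), "busy:", busy, "idle:", idle)
        time.sleep(interval_seconds)
```

Calling poll() from cron or a loop and keeping the output gives a timeline to compare against the timestamp of the next AH03104 alert.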

Regards,

Rainer


Below is more information that might be useful:

cat /etc/SuSE-release
SUSE Linux Enterprise Server 12 (x86_64)
VERSION = 12
PATCHLEVEL = 2
# This file is deprecated and will be removed in a future service pack or release.
# Please check /etc/os-release for details about this release.
cat /etc/os-release
NAME="SLES"
VERSION="12-SP2"
VERSION_ID="12.2"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP2"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp2"
lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               1200.199
CPU max MHz:           3200.0000
CPU min MHz:           1200.0000
BogoMIPS:              4794.82
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-11
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2
x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase
tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc
cqm_occup_llc
free -m
              total       used       free     shared    buffers     cached
Mem:          7547       2691       4856        365          6       1965
-/+ buffers/cache:        719       6828
Swap:         2062          0       2062

Apache information
Server Version: Apache/2.4.34 (Unix) OpenSSL/1.0.2l
Server MPM: event
**All MPM event settings are default.**
Should I enable some type of core dump settings? I do have the scoreboard
enabled (mod_status) and if it helps at all, this is where the error is
being triggered:
https://github.com/apache/httpd/blob/571b20fb11ae3eb1498b2e279423b2d53eda7e4b/server/mpm/event/event.c#L2620

Thank you so much in advance!


Jerry Martinez
