RE: [users@httpd] Apache crashes with: AH03104: apr_thread_create


Rainer, thank you for helping!

As I mentioned, the event MPM settings are at their defaults. I don't think this web server needs to handle more than 400 concurrent requests (ServerLimit (16) x ThreadsPerChild (25)), since it serves about 380K page views per month. The homepage receives 80% of this traffic, and each homepage view triggers 20 requests in total (the page plus its assets). If I do the math, this yields ~3 req/s on average. Although increasing user limits is beyond my competency level, I do understand (and agree) that it is more important to research the cause of the load spike. However, we aggressively perform full-page caching, which should rule out PHP-FPM and MySQL as culprits.
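
For reference, the arithmetic behind that estimate can be sketched as follows. The traffic figures are the ones quoted above; the 30-day month and the per-view cost of the non-homepage pages (anywhere from 1 to 20 requests) are assumptions, and the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_requests_per_second(page_views_per_month, homepage_share,
                                requests_per_homepage_view,
                                requests_per_other_view, days_per_month=30):
    """Average request rate implied by a monthly page-view count."""
    homepage_views = page_views_per_month * homepage_share
    other_views = page_views_per_month - homepage_views
    total_requests = (homepage_views * requests_per_homepage_view
                      + other_views * requests_per_other_view)
    return total_requests / (days_per_month * SECONDS_PER_DAY)


# Figures from the message: 380K page views/month, 80% on the homepage,
# 20 requests per homepage view (the page plus its assets).
# Assumption: every other page costs between 1 and 20 requests per view.
low = average_requests_per_second(380_000, 0.80, 20, 1)
high = average_requests_per_second(380_000, 0.80, 20, 20)
print(f"average load: {low:.1f} to {high:.1f} req/s")
```

This is a monthly average only; it says nothing about short bursts, which is what would matter for a sudden spike in worker threads.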

Is there any way that I could trigger the error, for example with ApacheBench? What would be even more helpful is to see which resource (URL) forced Apache to spawn a new process and resulted in the crash. Any ideas?

Thanks!!

Jerry Martinez

-----Original Message-----
From: Rainer Jung [mailto:rainer.jung@xxxxxxxxxxx] 
Sent: Saturday, September 22, 2018 5:12 PM
To: users@xxxxxxxxxxxxxxxx; Jerry Martinez
Subject: Re: [users@httpd] Apache crashes with: AH03104: apr_thread_create

On 22.09.2018 at 22:17, Jerry Martinez wrote:
> Hello!
> 
> Apache has been randomly crashing (for a few months now) and I cannot 
> seem to understand why. I cannot replicate the crash even when hitting 
> the server with 4,000 requests @ a concurrency of 500. This is a 
> production server and I am willing to compensate someone for their 
> efforts resolving this. Below is one of the many error messages:
> 
> [Fri Sep 21 11:27:24 2018] [mpm_event:alert] (11)Resource temporarily
> unavailable: AH03104: apr_thread_create: unable to create worker 
> thread

apr_thread_create() on Linux/Unix is mostly pthread_create(). The man page for that on SLES 12 tells us:

=== SNIP ===

        EAGAIN Insufficient resources to create another thread.

        EAGAIN A system-imposed limit on the number of threads was
               encountered. There are a number of limits that may trigger
               this error: the RLIMIT_NPROC soft resource limit (set via
               setrlimit(2)), which limits the number of processes and
               threads for a real user ID, was reached; the kernel's
               system-wide limit on the number of processes and threads,
               /proc/sys/kernel/threads-max, was reached (see proc(5)); or
               the maximum number of PIDs, /proc/sys/kernel/pid_max, was
               reached (see proc(5)).

        EAGAIN The system lacked the necessary resources to create another
               thread, or the system-imposed limit on the total number of
               threads in a process {PTHREAD_THREADS_MAX} would be exceeded.

=== SNIP ===

Since your system seems to have lots of free memory, I don't expect a memory shortage, unless there is a memory leak and the memory numbers you showed below look very different when the crash actually happens. Each thread needs a thread stack in memory.

What could happen is that the number of threads your user can create (summed over all of its processes) hits the nproc limit. Note that although it is called nproc = number of processes, what it limits on Linux is actually the (much bigger) number of threads per user.
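
One way to check this is to add up the threads of all processes owned by the httpd user and compare the sum with the "Max processes" soft limit of a running httpd process. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names and the example PID are hypothetical.

```python
import os
import re


def parse_status(status_text):
    """Return (real_uid, thread_count) from the text of /proc/<pid>/status."""
    uid = int(re.search(r"^Uid:\s+(\d+)", status_text, re.M).group(1))
    threads = int(re.search(r"^Threads:\s+(\d+)", status_text, re.M).group(1))
    return uid, threads


def parse_nproc_limit(limits_text):
    """Return the soft 'Max processes' limit from /proc/<pid>/limits text.

    Returns None when the limit is 'unlimited'.
    """
    soft = re.search(r"^Max processes\s+(\S+)", limits_text, re.M).group(1)
    return None if soft == "unlimited" else int(soft)


def threads_for_uid(uid, proc_root="/proc"):
    """Sum the thread counts of all processes whose real UID is `uid`."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                owner, threads = parse_status(fh.read())
        except (OSError, AttributeError):
            continue  # process exited meanwhile, or fields were missing
        if owner == uid:
            total += threads
    return total


# Example use, with a hypothetical PID of one httpd child process:
#   pid = 12345
#   uid, _ = parse_status(open(f"/proc/{pid}/status").read())
#   limit = parse_nproc_limit(open(f"/proc/{pid}/limits").read())
#   print(threads_for_uid(uid), "threads in use, soft limit:", limit)
```

Reading the limit from /proc/<pid>/limits rather than from the current shell matters, because the limits of the running httpd can differ from those of an interactive login.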

Other limits could be total number of threads or processes and number of file descriptors per process.
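
The system-wide limits named in the man page can be read directly from /proc. The sketch below assumes Linux; the helper names are hypothetical. It also parses /proc/loadavg, whose fourth field has the form running/total and gives the number of scheduling entities (processes and threads) that currently exist.

```python
import os
import resource


def read_int(path):
    """Read a single integer from a /proc-style file."""
    with open(path) as fh:
        return int(fh.read().strip())


def system_thread_limits(kernel_dir="/proc/sys/kernel"):
    """Return the system-wide thread and PID limits as a dict."""
    return {
        "threads-max": read_int(os.path.join(kernel_dir, "threads-max")),
        "pid_max": read_int(os.path.join(kernel_dir, "pid_max")),
    }


def total_scheduling_entities(loadavg_text):
    """Return the total from the 'running/total' field of /proc/loadavg."""
    running_and_total = loadavg_text.split()[3]
    return int(running_and_total.split("/")[1])


def own_fd_limit():
    """Return (soft, hard) RLIMIT_NOFILE of the *calling* process only."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


# Example use on the affected host:
#   print(system_thread_limits())
#   print(total_scheduling_entities(open("/proc/loadavg").read()))
#   print(own_fd_limit())
```

Note that own_fd_limit() reports the limit of the script itself; the value that applies to httpd is the "Max open files" line in /proc/<pid>/limits of an httpd process.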

What is a bit strange though: typically Apache httpd does not start single threads. When it needs more concurrency, it starts new processes, each process having ThreadsPerChild worker threads. So it seems that due to increased load (or, more likely if it is a reverse proxy, due to a temporary slowness of the backend) your web server needs to start new processes. The maximum number is in your MPM config.
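
As a rough illustration of that scaling model: with the values mentioned in this thread (ThreadsPerChild 25, ServerLimit 16), the number of child processes needed for a given concurrency can be estimated as below. This is a simplified sketch with hypothetical function names; it ignores the spare-thread settings, the extra non-worker threads in each child, and processes that are still shutting down gracefully.

```python
import math


def processes_needed(concurrent_connections, threads_per_child=25,
                     server_limit=16):
    """Estimate child processes needed, capped at ServerLimit."""
    wanted = math.ceil(concurrent_connections / threads_per_child)
    return min(max(wanted, 1), server_limit)


def worker_threads(processes, threads_per_child=25):
    """Worker threads created by that many child processes."""
    return processes * threads_per_child


for concurrency in (3, 60, 500):
    procs = processes_needed(concurrency)
    print(concurrency, "connections ->", procs, "processes,",
          worker_threads(procs), "worker threads")
```

Because threads are created a whole process at a time, a small rise in concurrency can require 25 new threads at once, which is where a per-user thread limit would be hit.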

So even if you find the reason for not being able to create more threads and can get rid of it, the next problem might be that your httpd ends up with all worker threads busy, and you then need to find out why the load is so high or, more likely, why some backend gets slow.

BTW: if you want to get a better idea of which processes and threads get used, you can add %P (process ID) and %{tid}P (thread ID) to your access log format. Retrieving the number of busy and idle workers from server-status regularly can also tell you exactly when the increase in threads starts and how quickly it goes up.
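
The regular polling could be sketched like this. It assumes mod_status is reachable at http://localhost/server-status (the URL, interval and function names are assumptions) and uses the machine-readable ?auto variant, which reports BusyWorkers and IdleWorkers one per line.

```python
import time
import urllib.request


def parse_worker_counts(auto_text):
    """Extract BusyWorkers and IdleWorkers from server-status?auto output."""
    counts = {}
    for line in auto_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("BusyWorkers", "IdleWorkers"):
            counts[key] = int(value.strip())
    return counts


def poll(url="http://localhost/server-status?auto", interval=10, samples=6):
    """Print a timestamped busy/idle worker count every `interval` seconds."""
    for _ in range(samples):
        with urllib.request.urlopen(url, timeout=5) as resp:
            text = resp.read().decode("ascii", "replace")
        print(time.strftime("%Y-%m-%d %H:%M:%S"), parse_worker_counts(text))
        time.sleep(interval)


# Example use (adjust the URL to wherever server-status is mapped):
#   poll(interval=10, samples=360)   # about one hour of samples
```

Correlating these timestamps with access log entries that carry %P and %{tid}P should show which requests were in flight when the worker count started to climb.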

Regards,

Rainer


> Below is more information that might be useful:
> 
>> cat /etc/SuSE-release
> SUSE Linux Enterprise Server 12 (x86_64)
> VERSION = 12
> PATCHLEVEL = 2
> # This file is deprecated and will be removed in a future service pack or release.
> # Please check /etc/os-release for details about this release.
>   
>> cat /etc/os-release
> NAME="SLES"
> VERSION="12-SP2"
> VERSION_ID="12.2"
> PRETTY_NAME="SUSE Linux Enterprise Server 12 SP2"
> ID="sles"
> ANSI_COLOR="0;32"
> CPE_NAME="cpe:/o:suse:sles:12:sp2"
>   
>> lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                12
> On-line CPU(s) list:   0-11
> Thread(s) per core:    2
> Core(s) per socket:    6
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 63
> Model name:            Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
> Stepping:              2
> CPU MHz:               1200.199
> CPU max MHz:           3200.0000
> CPU min MHz:           1200.0000
> BogoMIPS:              4794.82
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              15360K
> NUMA node0 CPU(s):     0-11
> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
> syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts 
> rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq 
> dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm 
> pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes 
> xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm 
> tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 
> smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc
>   
>> free -m
>               total       used       free     shared    buffers     cached
> Mem:          7547       2691       4856        365          6       1965
> -/+ buffers/cache:        719       6828
> Swap:         2062          0       2062
> 
> Apache information
> Server Version: Apache/2.4.34 (Unix) OpenSSL/1.0.2l
> Server MPM: event
> **All MPM event settings are default.**
>   
> Should I enable some type of core dump settings? I do have the 
> scoreboard enabled (mod_status) and if it helps at all, this is where 
> the error is being triggered:
> https://github.com/apache/httpd/blob/571b20fb11ae3eb1498b2e279423b2d53eda7e4b/server/mpm/event/event.c#L2620
> 
> Thank you so much in advance!
> 
> 
> Jerry Martinez


