ModSecurity: Spontaneously high CPU usage
This bug is hard to replicate, but I’ll try to describe it here. We’ve enabled modsec on our servers, but on some of them we’re seeing extremely high CPU usage. It shows up after Apache has been running in detection mode for some time. We have mlogc logging enabled.
It might be triggered by a graceful restart, although I’m not sure; there’s no way to replicate it reliably.
Here’s the backtrace of a process that’s consuming 100% CPU:
(gdb) bt
#0 0x00002b36cebad969 in run_child_cleanups () from /usr/local/apache/lib/libapr-1.so.0
#1 0x00002b36cebad9af in cleanup_pool_for_exec () from /usr/local/apache/lib/libapr-1.so.0
#2 0x00002b36cebad9c6 in cleanup_pool_for_exec () from /usr/local/apache/lib/libapr-1.so.0
#3 0x00002b36cebad9c6 in cleanup_pool_for_exec () from /usr/local/apache/lib/libapr-1.so.0
#4 0x00002b36cebad9eb in apr_pool_cleanup_for_exec () from /usr/local/apache/lib/libapr-1.so.0
#5 0x00002b36cebba4e9 in apr_proc_create () from /usr/local/apache/lib/libapr-1.so.0
#6 0x00002b36cf1e86c2 in suphp_script_handler (r=0x1122ceb8) at mod_suphp.c:953
#7 0x00002b36cf1e8ef9 in suphp_handler (r=0x1122ceb8) at mod_suphp.c:569
#8 0x000000000044ac13 in ap_run_handler ()
#9 0x000000000044b4dc in ap_invoke_handler ()
#10 0x00000000004b9ddd in ap_internal_redirect ()
#11 0x00000000004e1f11 in handler_redirect ()
#12 0x000000000044ac13 in ap_run_handler ()
#13 0x000000000044b4dc in ap_invoke_handler ()
#14 0x00000000004b9ddd in ap_internal_redirect ()
#15 0x00000000004b90d3 in ap_die ()
#16 0x00000000004b92ca in ap_process_request ()
#17 0x00000000004b5abd in ap_process_http_connection ()
#18 0x00000000004546cf in ap_run_process_connection ()
#19 0x0000000000454b33 in ap_process_connection ()
#20 0x00000000004e3157 in process_socket ()
#21 0x00000000004e3a82 in worker_thread ()
#22 0x00002b36cebbb3a1 in dummy_worker () from /usr/local/apache/lib/libapr-1.so.0
#23 0x00000037c180683d in start_thread () from /lib64/libpthread.so.0
#24 0x00000037c10d4fcd in clone () from /lib64/libc.so.6
strace shows nothing (as far as I tried). Apache runs perfectly fine without modsec (SecRuleEngine Off).
# httpd -V
Server version: Apache/2.2.29 (Unix)
Server built: May 14 2015 10:34:22
Cpanel::Easy::Apache v3.28.8 rev9999
Server's Module Magic Number: 20051115:36
Server loaded: APR 1.5.1, APR-Util 1.5.4
Compiled using: APR 1.5.1, APR-Util 1.5.4
Architecture: 64-bit
Server MPM: Worker
threaded: yes (fixed thread count)
forked: yes (variable process count)
Server compiled with....
-D APACHE_MPM_DIR="server/mpm/worker"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses disabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D DYNAMIC_MODULE_LIMIT=128
-D HTTPD_ROOT="/usr/local/apache"
-D SUEXEC_BIN="/usr/local/apache/bin/suexec"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="conf/mime.types"
-D SERVER_CONFIG_FILE="conf/httpd.conf"
Hopefully we’ll have more info as we gather stats.
About this issue
- State: closed
- Created 9 years ago
- Comments: 28 (6 by maintainers)
Commits related to this issue
- CHANGES: Adds info on: #890, #2049 — committed to owasp-modsecurity/ModSecurity by deleted user 3 years ago
This example httpd.conf (attached as test.conf.txt) reproduces the bug with only the core Apache modules and ModSecurity.
I was able to isolate the corruption of the pool’s cleanup linked list to the functions in re_operators.c that use rule->ruleset->mp (a global per-process pool) in the threads that handle individual HTTP requests. These functions should use storage pools that belong to the HTTP request rather than a pool that is shared by all the worker threads in the process.
I’ll submit a pull request with fixes for the pool usage in re_execute.c in a moment.
We’re seeing this bug after a big update to our WAF. Today, after we deployed it to 35 servers, Apache broke down within an hour on 7 of them and started using 100% CPU. There’s nothing in strace on the broken processes, and ltrace shows why: they’re all stuck in infinite loops, executing a dummy function over and over again:
which is consistent with the backtrace above.
I only see four places in the entire modsec 2.9.2 repo where apr_pool_cleanup_null is registered as a cleanup function, so it should be easy to at least shift blame to Apache.
Although I don’t have an easy “do this and you’ll see the bug” reproduction, I can replicate it fairly reliably. With cgroups CPU limits I can do that while leaving some CPU to spare for investigation, so please let me know (ideally within a week) if there’s anything you’d like me to look into.
We’re not using mlogc or SecAuditLog.
This might be related to mlogc being specified via:
SecAuditLog "|/usr/bin/mlogc /etc/apache2/conf/mlogc.conf"
It looks like this happens after a graceful restart has been invoked. Maybe modsec is somehow impeding the pool cleanup? Or maybe there’s some sort of stack corruption?