davy
02/19/13

Indianio: rlimits crash

[Indianio is an online hand-in assignment platform I develop and maintain at work. It checks what the students submit (file type, content of archives, etc.) and rejects it if necessary. Accepted files can be processed further: compiling, converting to PDF, automatic unit tests, etc. It even has integration with BlueJ and SPOJ.]

As administrator of Indianio, I had the displeasure of watching it crash and burn a few weeks ago. During an exam. Granted, the number of students was higher than Indianio had ever had to endure before. And all those students wanted to hand in their solutions at the same time: at the end of the exam. But still, there were only 287 students...

When I received the phone call with the message that the server was unreachable, I immediately logged into the server. The Apache daemon had crashed, and the error log was filled with

[emerg] (43)Identifier removed: couldn't grab the accept mutex
...
[emerg] (22)Invalid argument: couldn't grab the accept mutex

After fiddling about a bit with AcceptMutex (to no avail), I noticed the following lines, preceding the ones about mutexes:

[alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 33
[alert] Child $p returned a Fatal error... Apache is exiting!

A resource exhaustion! The Apache server tried to use system resources, but ran into a limitation. Had I set my limits.conf wrong? Turns out, kind of.

To protect the server against processing scripts that go haywire, there is a limit of 25 processes for all users that are part of the Indianio platform (group indianio). As it happens, the account that I use to administer the server is also a member of this group. But that poses no problem, because everything happens through sudo, which resets the rlimits, right?

Well, on my server, it didn't:

davy@indianio:~$ ulimit -u
25
davy@indianio:~$ sudo bash -c "ulimit -u"
25
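The reason is that resource limits belong to a process and are handed down to every child it starts, unless something along the way explicitly changes them. A minimal sketch of that inheritance follows. It uses the open-files limit (-n) instead of the process limit (-u), so that trying it out cannot starve your own shell of processes. The function name child_limit is made up for this example.

```shell
#!/bin/sh
# Lower the soft open-files limit inside a subshell, then ask a freshly
# started child shell which limit it sees. The subshell keeps the change
# from leaking into the calling shell.
child_limit() {
    (
        ulimit -S -n "$1"      # lower the soft limit for this subshell only
        sh -c 'ulimit -S -n'   # a new process started from it reports its own limit
    )
}

child_limit 64
```

On a typical Linux system this prints 64: the child shell never set a limit itself, it simply inherited the one from its parent. The same mechanism carried my limit of 25 processes through sudo.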

This limitation was also imposed on the Apache daemon and its subprocesses, after restarting it with sudo /etc/init.d/apache2 restart. The output of the CGI-script I installed to verify my suspicion:

id
+ id
uid=33(www-data) gid=33(www-data) groups=33(www-data)
ulimit -u
+ ulimit -u
25
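The script itself isn't shown here. A minimal sketch of such a check, assuming a bash script run by Apache as CGI, could look like the following. The function name report_limits and the label text are made up for this example, and the sketch does not reproduce the command tracing visible in the output above.

```shell
#!/bin/bash
# Hypothetical CGI check: report which user the web server runs the
# script as, and which process limit that process has inherited.
report_limits() {
    echo "Content-Type: text/plain"   # CGI header
    echo                              # blank line ends the headers
    id                                # user and groups of the CGI process
    printf 'max user processes: '
    ulimit -u                         # soft limit on the number of processes (bash)
}

report_limits
```

Because the CGI process is a descendant of the Apache parent process, whatever limit it reports here is the one Apache itself was started with.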

Oops...

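Note that this doesn't happen when restarting Apache with apachectl, because that doesn't create a new process: it signals the Apache parent process that is already running, which keeps the limits it was started with.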
