[Indianio is an online hand-in assignment platform I develop and maintain at work. It checks what the students submit (file type, content of archives, etc.) and rejects it if necessary. Accepted files can be processed further: compiling, converting to PDF, automatic unit tests, etc. It even has integration with BlueJ and SPOJ.]
As administrator of Indianio, I had the displeasure of watching it crash and burn a few weeks ago. During an exam. Granted, the number of students was higher than Indianio had ever had to endure before. And all those students wanted to hand in their solutions at the same time: at the end of the exam. But still, there were only 287 students...
When I received the phone call with the message that the server was unreachable, I immediately logged into the server. The Apache daemon had crashed, and the error log was filled with
[emerg] (43)Identifier removed: couldn't grab the accept mutex
[emerg] (22)Invalid argument: couldn't grab the accept mutex
After fiddling about a bit with AcceptMutex (to no avail), I noticed the following lines, preceding the ones about mutexes:
[alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 33
[alert] Child $p returned a Fatal error... Apache is exiting!
A resource exhaustion! The Apache server tried to use system resources, but ran into a limitation. Had I set my limits.conf wrong? Turns out, kind of.
To protect the server against processing scripts that go haywire, there is a limit of 25 processes for all users that are part of the Indianio platform (group indianio). As it happens, the account that I use to administer the server is also a member of this group. But that poses no problem, because everything happens through sudo, which resets the rlimits, right?
Well, on my server, it didn't:
davy@indianio:~$ ulimit -u
davy@indianio:~$ sudo bash -c "ulimit -u"
This limitation was also imposed on the Apache daemon and its subprocesses, after restarting it with sudo /etc/init.d/apache2 restart. The output of the CGI script I installed to verify my suspicion:
id
+ id
uid=33(www-data) gid=33(www-data) groups=33(www-data)
ulimit -u
+ ulimit -u
25
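The script itself isn't shown here, so the following is only a minimal sketch of a diagnostic CGI script that would produce output along these lines. The function name report_limits and the choice of plain-text output are assumptions for illustration. It relies on bash, since ulimit -u is not available in every /bin/sh.

```shell
#!/bin/bash
# Hypothetical diagnostic CGI script; a reconstruction, not the original.

# Print the identity and the process limit of the current process.
# Runs in a subshell so that command tracing stays local to this function.
report_limits() {
    (
        set -x        # echo each command before running it, prefixed with "+"
        id            # user and groups the web server runs this script as
        ulimit -u     # soft limit on the number of processes for this user
    ) 2>&1            # the trace goes to stderr; merge it into the response
}

# CGI response: header, empty line, then the body.
printf 'Content-Type: text/plain\r\n\r\n'
report_limits
```

Because the script is executed by an Apache child process, the values it prints are the ones the daemon itself inherited, not those of an interactive login shell.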
Note that this doesn't happen when restarting Apache with apachectl, because that doesn't create a new process.
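On Linux, the limits a running process actually inherited can also be read from /proc, without installing a CGI script. The sketch below assumes a kernel that provides /proc/<pid>/limits; the helper name nproc_limit_of is made up for this example.

```shell
#!/bin/bash
# Print the soft "Max processes" limit (RLIMIT_NPROC) of a running process,
# as reported by the kernel in /proc/<pid>/limits (Linux only).
nproc_limit_of() {
    local pid="$1"
    # Line format: "Max processes  <soft>  <hard>  processes"
    awk '/^Max processes/ { print $3 }' "/proc/${pid}/limits"
}

# Example: inspect the current shell.
nproc_limit_of "$$"
```

To inspect the Apache daemon, pass the PID of its parent process instead. Reading the limits of a process owned by another user may require root.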