[Indianio is an online hand-in assignment platform I develop and maintain at work. It checks what the students submit -file type, content of archives, etc.- and rejects it if necessary. Accepted files can be processed further: compiling, converting to PDF, automatic unit-tests, etc. It even has integration with BlueJ and SPOJ.]
As administrator of Indianio, I had the displeasure of watching it crash and burn a few weeks ago. During an exam. Granted, the number of students was higher than Indianio ever had to endure before. And all those students wanted to hand in their solutions at the same time: at the end of the exam. But still, there were only 287 students...
When I received the phone call with the message that the server was unreachable, I immediately logged into the server. The Apache daemon had crashed, and the error log was filled with
[emerg] (43)Identifier removed: couldn't grab the accept mutex
[emerg] (22)Invalid argument: couldn't grab the accept mutex
After fiddling about a bit with AcceptMutex (to no avail), I noticed the following lines, preceding the ones about mutexes:
[alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 33
[alert] Child $p returned a Fatal error... Apache is exiting!
A resource exhaustion! The Apache server tried to use system resources, but ran into a limitation. Had I set my limits.conf wrong? Turns out, kind of.
To protect the server against processing scripts that go haywire, there is a limit of 25 processes for all users that are part of the Indianio platform (group indianio). As it happens, the account that I use to administer the server is also a member of this group. But that poses no problem, because everything happens through
sudo, which resets the rlimits, right?
Well, on my server, it didn't:
davy@indianio:~$ ulimit -u
davy@indianio:~$ sudo bash -c "ulimit -u"
This limitation was also imposed on the Apache daemon and its subprocesses, after restarting it with
sudo /etc/init.d/apache2 restart. The output of the CGI-script I installed to verify my suspicion:
id
+ id
uid=33(www-data) gid=33(www-data) groups=33(www-data)
ulimit -u
+ ulimit -u
25
Note that this doesn't happen when restarting Apache with
apachectl, because that doesn't create a new process.
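If you want to check this without installing a CGI script: on Linux, the limits of a running process can be read from /proc/<pid>/limits. A minimal sketch, assuming a Linux /proc filesystem and awk being available; the function name soft_nproc_limit is made up for this example and is not part of Indianio or Apache:

```shell
#!/bin/bash
# Print the soft "Max processes" limit (RLIMIT_NPROC) found in a file
# that uses the /proc/<pid>/limits layout (Linux-specific).
# $1: path to such a file, e.g. /proc/1234/limits
soft_nproc_limit() {
    # The line looks like: "Max processes  <soft>  <hard>  processes"
    awk '/^Max processes/ { print $3 }' "$1"
}

# Example: show the limit that applies to this shell and its children.
if [ -r /proc/self/limits ]; then
    soft_nproc_limit /proc/self/limits
fi
```

Pointed at the limits file of one of the Apache processes, this should report the same value that ulimit -u printed from inside the CGI script, since children inherit the limits of the process that started them.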
I just had to share this with you; marvelous song!
I'm currently working on my computer at work. Not locally, but remotely, from a computer room during an exam. All the computers here are switched into exam mode, which means everything is filtered, except for access to the Citrix server and some other stuff. Using putty on the Citrix server, I set up a tunnel to a server in the data center. On the desktop I'm currently working on, something similar was done in advance, in such a way that the two tunnels would connect. Next to putty, I'm also running a TightVNC viewer, connected, through those tunnels, with krfb, which is sharing this desktop.
I feel like a bug digging through an onion!
Since I installed my ChAssNAS device (aka snake) in my basement, I've been having problems with the ethernet connection. The thing becomes unreachable at seemingly random times and nothing but waiting helps. Restarting the entire thing, disabling and enabling the
eth0 device, reconnecting the cable: no dice. I found a thread on the issue tracker of snake-os in which some guys suggest that EMI/RFI are causing troubles. I tried their suggestion with a few ferrite beads I had lying around, but that didn't seem to improve anything.
Instead of trying to improve my probably poor attempt at eliminating noise, I ordered a usb2ethernet thingy from DealExtreme. The idea was that this would suffice until I could replace the NS-K330 with a Raspberry Pi. But alas, delivery from Hong Kong seems to take more time than usual. Luckily, my ADSL-router also has a USB network connection: snake now has an
eth1 device, provided by
cdc_ether. The connection is only USB1.1, so the speed is not really that high, but it'll do for now.
Extra content! I found an old picture on my cellphone from when I first screwed around with the NS-K330. 'Twas the time that I bricked the board by flashing the wrong SnakeOS image (the one without bootloader, damn!). The only option was desoldering the flash chip and reflashing it with the proper firmware. Luckily, I had a Seeeduino lying around to reflash the 3.3V SPI chip. Instead of resoldering the flash chip in its original place, I did what you can see in the picture. I didn't want to risk a second tricky desolder operation, since the first already destroyed a trace...