lfelton May 5 2008, 08:38 AM

Hi - could someone give me some pointers to help identify this problem?

Our Linux server was "down" this morning; actually, it was responding extremely slowly to all requests.

I rebooted it and all was fine... for a while. Then it slowed down, and then it slowed further to an almost complete stop.

I was logged in via SSH and even that was slow to respond. CPU and memory usage were minimal, and the disks were all fine. Network traffic seemed high: around a million packets per hour transmitted and slightly fewer received. There didn't seem to be an unusually high number of httpd processes (25 or so), and traffic continued at about the same rate even when I shut httpd down.

I rebooted again and monitored the network more closely. Initially send/receive traffic was idling at about 0.5 packets per second, and the HTTP response (tested from a remote network) was very fast, as it normally is. Around 7-8 minutes after boot, send/receive traffic jumped to about 100 packets per second, and at the same time the HTTP response slowed noticeably, though it was still usable. I am now 48 minutes past boot; the packet rate is holding very steady, and HTTP is still slow but not at a standstill.
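For anyone wanting to reproduce the packets-per-second measurement described above without extra tools, here is a minimal Python sketch. It assumes a Linux box where the kernel exposes per-interface counters in `/proc/net/dev`; it reads the counters twice and divides the difference by the sampling interval. The function names and the 5-second default interval are just illustrative choices, not anything from the original setup.

```python
import time


def parse_proc_net_dev(text):
    """Return {iface: (rx_packets, tx_packets)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Receive block: bytes packets errs drop fifo frame compressed multicast
        # Transmit block: bytes packets errs drop fifo colls carrier compressed
        counters[iface.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_rates(before, after, interval):
    """Packets per second (rx, tx) per interface between two snapshots."""
    rates = {}
    for iface, (rx1, tx1) in after.items():
        if iface in before:
            rx0, tx0 = before[iface]
            rates[iface] = ((rx1 - rx0) / interval, (tx1 - tx0) / interval)
    return rates


def sample(path="/proc/net/dev", interval=5.0):
    """Take two snapshots `interval` seconds apart and return the rates (Linux only)."""
    with open(path) as f:
        before = parse_proc_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_net_dev(f.read())
    return packet_rates(before, after, interval)
```

On the server, calling `sample()` and printing the result per interface (for example `eth0`) shows whether the rate sits near idle or jumps like the ~100 packets per second seen here. If sysstat is installed, `sar -n DEV` reports similar per-interface packet rates.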

Can anyone suggest how I can identify the cause of this problem?

michaelk May 5 2008, 03:06 PM

What linux distribution/version are you running?
Have you looked at the logs?

lfelton May 6 2008, 04:17 AM

Oops, sorry, I should have mentioned that.

Logs were not much help I'm afraid.

It turns out that I was on the right track with the network traffic. This server is located in an ISP's data center (DC), and its main function is to sit on a fast internet connection and send and receive customer files, which are synced with servers in our own DC. The problem exposed two issues. First, the throttle mechanism that limits how much data an individual user can send us had been "broken" when a script was updated (a self-inflicted wound). Second, the sync process is not sophisticated enough to cope with this scenario. We are B2B, and a new customer on a (very) fast internet pipe had accidentally left a huge multi-file upload running and gone home. Because of the problems with the sync process, this acted like a DoS attack and ate all the available bandwidth.
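The original throttle script isn't shown, so this is only a generic illustration of a per-user limit: a minimal token-bucket sketch in Python, assuming the upload handler can check each chunk before accepting it. `TokenBucket`, `PerUserThrottle` and their `rate`/`capacity` parameters (bytes per second and burst size in bytes) are hypothetical names. Each customer gets their own bucket, so one fast uploader can exhaust only their own allowance rather than the whole link.

```python
import time


class TokenBucket:
    """Hypothetical limiter: average `rate` bytes/s, bursts up to `capacity` bytes."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self, nbytes):
        """Consume tokens for a chunk of nbytes; return False if it must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class PerUserThrottle:
    """Hypothetical wrapper: one bucket per customer, created on first use."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, user, nbytes):
        bucket = self.buckets.get(user)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[user] = bucket
        return bucket.allow(nbytes)
```

A chunk that is refused would typically be delayed and retried rather than dropped; how that fits into the real upload and sync path depends on code not shown in this thread.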

thanks for your help!

