Yesterday I was benchmarking my OpenLDAP server. For this I used the production cluster with 210 machines: I sshed to all of them, started my little LDAP bench program in a `while [ true ]` loop, and went home. This morning I came back, stopped all the jobs, and had a look at my LDAP server. It was still fine and I was quite happy with the outcome. This afternoon I got a mail from one of the cluster admins saying that I had shredded quite a few machines by filling up /var. This happened because every network connection is logged in /var/log/messages. When the log wanted to roll over, gzip failed because there was no space left, and sendmail had gone into zombie mode as it couldn't log anymore. So, my first real fuck-up. But at least we learned that any user could take down the cluster through syslog. This has been fixed :)