[Linux-aus] linux.conf.au

Andrew Pollock me at andrew.net.au
Tue Dec 6 09:59:03 UTC 2005


On Tue, Dec 06, 2005 at 02:09:18PM +1300, Nick Phillips wrote:
> On 6/12/2005, at 12:31 PM, Michael Still wrote:
> 
> >Nick Phillips wrote:
> >>On 5/12/2005, at 9:36 AM, Dassa wrote:
> >>>I'm getting a connection refused when I try to get there.
> >>Thanks again. Restarted it again. Get in now while the damn thing is
> >>still up!
> >>Looks like we need to have a serious look at it :-(
> >
> >Given there is money riding on some LA machines being up (I'm  
> >thinking the LCA machine more than others), and that there is  
> >certainly "reputational capital" riding on the other machines being  
> >up, is it time to roll out some sort of monitoring solution for  
> >those machines?
> >
> >I'm willing to put some spare time if I ever have any into looking  
> >into it if people think it's a good idea.
> 
> I think some kind of monitoring would be a good idea -- it just looks  
> extremely amateurish to have the main site for something like this  
> yoyoing frantically the whole time.
> 
> That said, it would be better to work out what the problem is and fix  
> it. However, if we can do both...
> 
> Andrew -- can you give us any more details about what was going on  
> that meant we had to reboot the UML yesterday? Could that have been  
> causing the problems we were seeing inside the UML?

/tmp filled up on umlhost, and on inspection, there were a number of large
deleted files being held open in /tmp by the UML processes. I figured a
reboot would cause these files to be closed, and thus properly removed from
disk.

That said, there are a few there again now (but /tmp isn't full).
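
A rough sketch of how deleted-but-still-open files like these can be spotted
on Linux: the kernel shows each open descriptor as a symlink under
/proc/<pid>/fd, and appends " (deleted)" to the link target once the file has
been unlinked. The function name, the default "/tmp" prefix and the proc_root
parameter are illustrative assumptions, not anything taken from umlhost.

```python
import os

DELETED_SUFFIX = " (deleted)"


def find_deleted_open_files(prefix="/tmp", proc_root="/proc"):
    """Return (pid, fd, path, size) for unlinked files still held open.

    size is None when it cannot be determined. Reading other users'
    /proc/<pid>/fd entries normally needs root.
    """
    base = prefix.rstrip("/") + "/"
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
            except OSError:
                continue  # descriptor closed while we were looking
            if not target.endswith(DELETED_SUFFIX):
                continue
            path = target[: -len(DELETED_SUFFIX)]
            if not path.startswith(base):
                continue
            try:
                # stat() follows the /proc link to the still-open file,
                # so it reports the space that is still in use.
                size = os.stat(link).st_size
            except OSError:
                size = None
            results.append((int(entry), int(fd), path, size))
    return results
```

Calling find_deleted_open_files() as root and sorting the result by size
should show which processes are pinning the space. The space is only freed
when the owning process closes the descriptor or exits.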

https://umlhost.linux.org.au/cacti/graph.php?local_graph_id=7&rra_id=all
sort of summarises the situation (Mike has access and can pull the graphs
out).

I'm happy to run up Nagios on my personal box, which is in Brisbane, and
should provide good "third-party" monitoring. However, I think it's better
to treat the cause, not the symptoms. If you'd like help troubleshooting,
just ask. I've got root access, but historically the LA admins have kept at
arm's length from the administration of the LCA stuff.
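
As a rough illustration of what an outside check for the "connection
refused" symptom could look like, here is a minimal sketch that follows the
Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and
3 (UNKNOWN). The function name, the timeout and the choice of host and port
are assumptions for illustration. A real Nagios setup would more likely use
its stock check plugins.

```python
import socket

# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=10.0):
    """Try a TCP connect and return (exit_code, one-line status message)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepted a connection"
    except ConnectionRefusedError:
        # Host is reachable but nothing is listening on the port.
        return CRITICAL, f"TCP CRITICAL - {host}:{port} refused the connection"
    except socket.timeout:
        return CRITICAL, f"TCP CRITICAL - {host}:{port} timed out after {timeout}s"
    except OSError as exc:
        # DNS failures, unreachable networks and similar.
        return UNKNOWN, f"TCP UNKNOWN - {host}:{port}: {exc}"
```

A wrapper script would print the message and exit with the code, for example
code, msg = check_tcp("linux.conf.au", 80) followed by print(msg) and
sys.exit(code), where port 80 is assumed to be the web server.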

regards

Andrew



