I have been tasked with preparing a list of checks to do as a part of a weekly system health check routine which my team is supposed to do. The problem is neither I nor any of my colleagues have ever been a professional system administrator and the best we can come up with is pretty laughable.
The system runs Siemens SIMATIC IT and LIMS, but I’m interested in some generic checks/tests for operating systems and database servers. Someone else will take care of tests specific to the applications being run.
The setup is a follows:
All servers are virtual, running in the vSphere5 environment.
- Web server – MS Windows Server 2003 R2
- 2 servers running SIMATIC IT components one for Historian and one for the Production Modeler and other components – MS Windows Server 2003 R2
- Database server – MS Windows Server 2003 R2 + MS SQL Server 2005
- Database + LIMS server – MS Windows Server 2008 R2 + Oracle Database 11g
Most probably we will not get access to the vCenter console, so the idea is to connect a remote desktop to those servers, make some constructive checks/ tests and prepare a report.
As I wrote already, there is no much besides checking for a free disk space, that I can come up with. I can also think of checking the level of fragmentation of a file system and file system errors with ChkDsk, looking into windows event viewer for some important errors and warnings, checking the level of index fragmentation in databases and maybe collecting some statistics of response times and times of execution of some important queries.
I will greatly appreciate any help. Besides information about what should be checked, hints for what not to do on a system that is under load 24/5 will also be very helpful. For example running a defragmenter even just for analysis on a database server under load might be a very bad idea, but I don’t know it yet.
Thank you.