9

This should probably be community wiki. I'm trying to come up with a list of all the sysadmin tasks that we should be doing on a regular basis because I believe we're not doing enough at our company. The attitude around here is that fixing problems is inconvenient, but we don't have time to do preventative maintenance or continuous improvement.

Daily:

  • swap nightly backup tape/drive
  • check that antivirus updates were pushed out to all systems

Weekly:

  • swap weekly backup tape/drive
  • clean temporary files from all systems
  • defrag all systems

Monthly:

  • plan infrastructure improvements
  • deliver/send obsolete equipment to electronics recycler
  • rebuild or replace aging workstations
  • test restore from backup

Annual:

  • rebuild or replace aging servers
  • replace UPS batteries
Scott
  • You're right, it should be community wiki. Also, don't be in too much of a hurry to segregate tasks like that. e.g. Planning, rebuilding machines, etc. should be done as required, not assigned as a weekly (or any other specific period) task. – John Gardeniers Oct 14 '10 at 21:10

6 Answers

7

If you have insufficient time to do preventative maintenance and spend most of your time solving problems, your entire methodology needs to be revised. Rather than tell you what you should be doing each period, I'll give you some ideas so that you won't have to do many of these things by hand at all.

First up, you need a good monitoring system and as much automation as you can manage. These two items free up more time than many admins realise until they have them set up well.

Just a few of the things your monitoring system should be doing for you are:

  • Alert you when mail or spam filter queues grow too large or too suddenly.
  • Alert you when drive space gets too low, CPU use gets too high, etc.
  • Record things like disk utilisation so that you can see trends over time.
  • Same thing with mailboxes.
  • Alert you when the firewall registers an abnormal number of hits.
  • Same thing for anything serving the outside world. e.g. DNS and web servers.
  • Alert you if AV updates are too old or if any machine has the AV software turned off or uninstalled.
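
As a rough illustration of the drive space and trend items in the list above, here is a minimal sketch. It is not tied to Nagios, MRTG or any other product; the function names, the 10% threshold and the CSV layout are assumptions made for the example. In practice the result would be fed into whatever alerting your monitoring system already provides.

```python
import csv
import shutil
import time
from pathlib import Path


def check_disk(path, min_free_fraction=0.10):
    """Return (free_fraction, alert) for the volume that holds `path`.

    `min_free_fraction` is an assumed threshold; tune it per volume.
    """
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction < min_free_fraction


def record_sample(log_file, path, free_fraction, now=None):
    """Append one sample to a CSV file so trends can be graphed later."""
    timestamp = int(now if now is not None else time.time())
    with Path(log_file).open("a", newline="") as handle:
        csv.writer(handle).writerow([timestamp, str(path), f"{free_fraction:.4f}"])
```

Run from a scheduler every few minutes, `check_disk` covers the alerting side and `record_sample` builds up the history you need to see how fast a volume is filling.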

Defragging shouldn't even be on your list of tasks because it should be an automated process. At your desired interval have the server run disk checks and a defrag after a reboot. Consider tying this in with a system to install queued updates and patches (which have previously been tested on a non-production machine).

Temporary folders can also be cleaned with automation. I created a simple application that is triggered after a reboot, waits 10 minutes, and then cleans out all temporary locations. The delay ensures it doesn't delete files that may be required for an install or upgrade that completes after the reboot (learned that the hard way!).
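
The delayed cleanup described above could look something like the sketch below. This is not the application I actually use, only an illustration of the idea; the function names are made up for the example, and the list of temporary locations is something you would supply for your own systems.

```python
import shutil
import time
from pathlib import Path

# Ten minutes, matching the delay described above.
DELAY_SECONDS = 10 * 60


def clean_directory(directory):
    """Delete everything inside `directory`; return how many entries were removed."""
    removed = 0
    for entry in Path(directory).iterdir():
        try:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed += 1
        except OSError:
            # File in use or permission denied: leave it for the next run.
            continue
    return removed


def clean_after_delay(directories, delay_seconds=DELAY_SECONDS, sleep=time.sleep):
    """Wait so that post-reboot installers can finish, then clean each location."""
    sleep(delay_seconds)
    return sum(clean_directory(d) for d in directories if Path(d).is_dir())
```

Hook `clean_after_delay` into whatever your platform uses to run a job at startup, and pass it the temporary locations you want emptied.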

One thing you must do manually at whatever time period suits you is to monitor the monitoring system and automation, just to be safe. I check mine daily but haven't actually encountered an issue for over a year.

When you do get your system and automation going, make sure you also have a version control system to put it in. It can be really annoying to discover that the last little tweak broke something else but you can't remember exactly what you changed.

John Gardeniers
  • Which single monitoring system can do all of that? If it exists, I want it! – Cypher Oct 15 '10 at 00:22
  • @Cypher, you're thinking too narrowly. A single monitoring system will normally include multiple components, such as Nagios and MRTG, just as an OS is more than one component. – John Gardeniers Oct 15 '10 at 01:09
  • i suppose i was really hoping i had missed some amazing tool that could replace the dozen or so tools i currently use for all of those things. :) +1 for automation and automated monitoring tools: if i have to do something more than once, it gets automated. – Cypher Oct 15 '10 at 16:54
  • @Cypher, that amazing tool you're looking for is nothing more than the computer itself. Have it work for you, rather than you work for it. ;) – John Gardeniers Oct 16 '10 at 00:55
1

In Daily, I would have Check Event Logs, either manually or through a script of some sort.

Perhaps Monthly could include OS updates?

I would also say annually take a look at where maintenance/warranty is on your servers.

Christopher
1

Monthly:

  • review infrastructure usage - this is arguably lumped in with the 'plan infrastructure improvements' bit, but you can't make plans unless you know (i.e. 'have hard data') which bits need improvement.

Quarterly:

  • Test Infrastructure failover - from the app layer (webserver, email) to the network layer (switch, network link) to the physical layer (power), if you've got redundancy in the system that you expect to be able to save you, it needs to be maintained and tested periodically.
pjz
1

Here are some monthly backups you might not have thought of:

1) Even if automated, I still copy my core network switch config to a local machine
2) Firewall configs
3) SAN configs
4) exported ISA configs (win 2003)
5) DHCP static reservations (win 2008)
6) DNS entries (win 2008)
7) Encryption keys (stored in binary files) to KeePass, especially since our backups are encrypted - additionally saved outside of our backup systems
8) our IT documentation folder, additionally saved outside of our backup systems
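
If the configs above have already been exported to files (how you export them depends on the device or service, so that part is left out), a small script can gather them into a dated folder on a local machine. This is only a sketch; `archive_exports` and the folder layout are assumptions made for illustration.

```python
import datetime
import shutil
from pathlib import Path


def archive_exports(export_files, archive_root, today=None):
    """Copy already-exported config files into archive_root/YYYY-MM-DD/.

    Returns (copied, missing) so a missing export is noticed, not silently skipped.
    """
    today = today or datetime.date.today()
    target = Path(archive_root) / today.isoformat()
    target.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for source in map(Path, export_files):
        if source.is_file():
            shutil.copy2(source, target / source.name)  # copy2 keeps timestamps
            copied.append(source.name)
        else:
            missing.append(str(source))
    return copied, missing
```

Reporting the `missing` list matters most here: an export job that quietly stopped producing files is exactly the kind of thing a monthly check should catch.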
jftuga
0

In Daily, I would recommend subscribing to the well-known vulnerability and patch mailing lists and reading them, and having a process for patching/updates.

Something relevant might only come up once a month, but it takes just one missed message about a product's vulnerability to cause a lot of disruption.

I think this could be trimmed down into a couple of words to fit on one line, if you agree.

BTW, this is a great list. I look forward to seeing its completion.

Nick O'Neil
0

Internal Audits:

  • Compare the systems that went live with the list of systems being backed up. Did anything sneak into production without backup? (at least monthly if not more often depending on how much gets deployed)
  • Go visit your tapes if you have an offsite vault. Make sure they are where they are supposed to be. (once or twice a year)
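
The first audit above is essentially a set comparison, which could be sketched as follows. Where the two lists come from (inventory, backup software report) is site-specific, so the function simply takes host names; the name `audit_backup_coverage` is an assumption for the example.

```python
def audit_backup_coverage(live_systems, backed_up_systems):
    """Compare live hosts against hosts with backup jobs.

    Returns (not_backed_up, stale_jobs): hosts in production without a backup,
    and backup jobs for hosts that are no longer listed as live.
    Names are compared case-insensitively with surrounding whitespace ignored.
    """
    live = {name.strip().lower() for name in live_systems if name.strip()}
    backed_up = {name.strip().lower() for name in backed_up_systems if name.strip()}
    return sorted(live - backed_up), sorted(backed_up - live)
```

For example, `audit_backup_coverage(["web01", "DB01", "mail01"], ["web01", "db01", "old01"])` reports `mail01` as missing a backup and `old01` as a leftover job.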
damorg