10

What software or system do you guys on Server Fault use to remind you to do routine maintenance? How do you checklist and log the various items you are supposed to check? Do you have an internal process document? Do you have cron mail you every week with reminders to check system logs?

Also, do you work on a team to do system maintenance, and if so, how do you coordinate who will do what maintenance?

If you use a bug/issue tracking system to enter tasks, do you have a cron job enter recurring tasks?

voretaq7
Zak

6 Answers

5

I'm currently using Request Tracker (http://www.bestpractical.com/rt).
All maintenance events get an associated ticket in the "systems" queue. Notes on problems encountered, who did what work and when, etc. are all entered into the ticket, along with the necessary approvals.

At the moment our recurring tasks (quarterly patching, etc.) are manually created, but they could be automated easily enough (cron job + email).
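
The "cron job + email" idea could look roughly like the sketch below, which only builds the message for a recurring ticket. It assumes the tracker creates tickets from mail sent to a queue address; the addresses, the subject format, and the function names are all hypothetical, not taken from our setup.

```python
from datetime import date
from email.message import EmailMessage

# Hypothetical addresses: substitute the address your tracker's
# mail gateway listens on and a sender it will accept.
QUEUE_ADDRESS = "systems@rt.example.com"
SENDER = "cron@example.com"


def quarter_of(day):
    """Return the calendar quarter (1-4) that a date falls in."""
    return (day.month - 1) // 3 + 1


def build_patching_ticket(day):
    """Build the email that would open a quarterly patching ticket."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = QUEUE_ADDRESS
    msg["Subject"] = f"Quarterly patching: {day.year} Q{quarter_of(day)}"
    msg.set_content(
        "Recurring maintenance ticket created automatically.\n"
        "Record problems encountered, who did what, and approvals here.\n"
    )
    return msg


# To actually send it from a cron job (assumes a local mail relay):
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(build_patching_ticket(date.today()))
```

Run from cron on the first day of each quarter, this would leave one ticket per patching cycle in the queue without anyone having to remember to create it.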

Coordinating who is doing what work is relatively easy for us as there are only two people in our admin group, but as we scale up the plan is to create a master ticket for each maintenance event and use child tickets, assigned to the responsible parties, to delegate the work.


Daily stuff (log checks, etc.) is another matter: I have all of that farmed out to automated processes:

  • InterMapper keeps an eye on the servers' overall status (SNMP queries looking for high load, low disk space, etc.), functionality of our web interfaces, and sundry other things that could indicate trouble.
  • Syslog-NG collects logs from our hosts & feeds them through a bunch of scripts that check for obvious badness. I cast my eye over the logs occasionally to sanity-check the scripts, but it's not regularly scheduled.
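
A script that "checks for obvious badness" can be as small as the sketch below. The patterns are illustrative guesses, not the ones our scripts actually use, and `find_badness` is a hypothetical name.

```python
import re

# Illustrative patterns only: tune these to whatever counts as
# "obvious badness" in your own logs.
SUSPICIOUS = [
    re.compile(r"Failed password", re.IGNORECASE),
    re.compile(r"\bI/O error\b", re.IGNORECASE),
    re.compile(r"Out of memory", re.IGNORECASE),
]


def find_badness(lines):
    """Return the log lines that match any suspicious pattern."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in SUSPICIOUS)
    ]
```

Calling `find_badness(sys.stdin)` lets the same function sit at the end of whatever pipeline delivers the collected logs; what happens with the hits (mail, ticket, pager) is left out here.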
voretaq7
2

Outlook and OneNote

GregD
2

Properly implemented automation does away with the need for task lists and checklists altogether. Why do you want to check things manually when you have computers that can do the job far more effectively and efficiently?

Anything that needs periodic checking is checked by the monitoring system. Routine tasks are automated whenever practical, and reminders are sent for the few tasks that need to be done manually. Documentation is another matter, but done right, your computers can mostly create their own documentation.

Stop looking for better manual ways and start looking for better automated ways to do any job. The computers are there to work for us, not us to work for them.

John Gardeniers
  • Good rule of thumb: A sysadmin should always be both competent and lazy. The desire not to do work will lead good sysadmins to implement good automation. – voretaq7 Mar 24 '10 at 17:55
  • Let me give a specific example: I need to monitor for security patches for Apache, then generate a new build and test it when a patch does come out. The routine part is monitoring for a new Apache release. Can't just update directly from the (main) repository because it won't have the correct modules compiled in. Also, need to audit to make sure that releases have been checked for. Does that make more sense? – Zak Mar 24 '10 at 22:35
  • Also, I don't want to just roll the latest batch of whatever software until the build has passed QA. Much of QA is automated, but not all of it. – Zak Mar 24 '10 at 22:38
  • And is there a reason that can't all be scripted? Automated checks for updates, sending you an alert when some are available, followed by a scripted compile and install, ready for you to test. Let the machine do the bulk of the work and tell you when your attention is required. – John Gardeniers Mar 25 '10 at 00:06
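
As a rough sketch of the "automated check, then alert" step discussed in these comments: compare the installed version with the latest released one and produce an audit line either way, so there is a record that the check ran. How the latest version string is obtained is not shown and would depend on your setup; the function names and the version numbers in the usage note are purely illustrative.

```python
from datetime import datetime


def parse_version(text):
    """Turn a dotted version such as '2.2.14' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_rebuild(installed, latest):
    """True when the latest release is newer than the installed build."""
    return parse_version(latest) > parse_version(installed)


def audit_line(when, installed, latest):
    """One log line per check, so the check itself can be audited."""
    status = "REBUILD NEEDED" if needs_rebuild(installed, latest) else "up to date"
    return f"{when.isoformat()} installed={installed} latest={latest} {status}"
```

For example, `audit_line(datetime.now(), "2.2.9", "2.2.10")` reports that a rebuild is needed; comparing tuples of integers avoids the trap where the string "2.2.9" sorts after "2.2.10". The build and the manual QA steps would hang off that result.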
1

Project work is driven out of the Project Management app (email and calendar integrated, with the ability to document detailed work and schedule it for particular people).

For maintenance, upgrades, fixes, etc. we have a ticketing system that more or less integrates with our Change Management process to handle requests and scheduling.

For completely internally-driven work and work on long cycles (quarterly, yearly, etc.):

Reminders to do things are calendared. Informal/Semi-formal documentation exists ("wiki") for what the general schedule might be.

Some "how to" and procedural documentation exists on how to carry out tasks and is accessible to the team at large, but people also have their own admin "black books" and logs with notes and recipes.

damorg
1

A monitoring system can help with these things:

  • We document each round of monthly maintenance in a Word document with checkboxes. Each month we save the report to a folder on our NAS, and we monitor that folder's minimum file age. If the minimum file age is above 40 days, we get an alarm.

  • One part of our routine maintenance is to reboot selected servers and appliances once a month. We use "system uptime" sensors (SNMP/WMI) on our monitoring software and if the uptime is above 40 days we get an alarm.

  • For backups we monitor the minimum file age in each server's backup folder on our NAS. If the minimum file age is above 10 days, we get an alarm.
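
The file-age checks above can be approximated outside a monitoring product with a few lines of Python. This is a minimal sketch, not how our monitoring software implements its sensor: it reads "minimum file age" as the age of the most recently modified file in the folder, and the function names are made up for illustration.

```python
import os
import time

SECONDS_PER_DAY = 86400


def newest_file_age_days(folder, now=None):
    """Age in days of the most recently modified file, or None if there are none."""
    now = time.time() if now is None else now
    with os.scandir(folder) as entries:
        mtimes = [entry.stat().st_mtime for entry in entries if entry.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / SECONDS_PER_DAY


def is_overdue(folder, max_age_days, now=None):
    """True when the folder is empty or its newest file is older than the limit."""
    age = newest_file_age_days(folder, now)
    return age is None or age > max_age_days
```

With this, `is_overdue(report_folder, 40)` would correspond to the maintenance-report alarm and `is_overdue(backup_folder, 10)` to the backup alarm. An empty folder is treated as overdue, on the assumption that a missing report or backup is exactly what should raise an alarm.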

Dirk Paessler
1

I use Checkpanel (https://checkpanel.com) to manage my recurring maintenance tasks. It provides reusable checklists and an easy interface to log results of each check.

After checking an item, it is not just "done" but remains available for further checks. Each check is recorded so that you can easily review a history of all past checks of an item – including optional details (e.g. error messages for failed checks).

You can set a recurrence interval for each item to make sure that you check it at least once per week, every 2 days, etc. There's a consolidated view of all due items. If you want, you can also receive a daily email listing all due items.

There's a server maintenance checklist template which you can use as a basis for your own checklists. Other templates include checklists for web applications, WordPress and more.

Disclosure: I am the founder of Checkpanel.