52

So we've all probably had this situation: you debug some problem, only to realize it was caused by a config change you made six months ago, and you can't remember why you did it. So you undo it and fix the problem, and now some other problem comes back. Oh yeah, NOW I remember! Then you fix it properly.

It's because you didn't take proper notes, you fool! But what's a good way to do this?

In engineering we have loads of software meant to help us detect and track changes. Source control, code reviews, and so on. Every change is tracked, every change requires a comment as to what it is. And typical engineering departments require good comments so that in six months when you're figuring out why you broke it like that, you can use a historical 'blame' feature or binary search builds to pinpoint the problem. These tools are very effective communication tools and historical records.

But in serverland, we have 500 different services, all with different ways of configuring them. And they don't always have a text format (consider setting permissions on a folder or altering the pagefile location) though they may have a textual representation.

In our environment, we check in whatever config files we can to Perforce, but there are very few of those. We can't exactly check in the Active Directory DB, though perhaps a dump that could be diff'd...

In the past I have tried keeping a manual change log in our wiki, but it's super hard to maintain the discipline to do this (I know, not a good excuse, but it really is tough).

MY QUESTION: What strategies and tools do you use to cope with this problem of tracking configuration changes to your servers?

-- Update --

Note: I'm not looking for shared note-taking tools (I'm familiar with OneNote, etc.) so much as automated tools specifically meant to help with tracking server changes. There's no comprehensive tool for tracking server config changes, but perhaps there are some for specific applications like GPOs.

Also I am very interested in specific strategies that you've found useful. "We share notes in Sharepoint" is pretty vague. How do you maintain the discipline? What format do you use to track your changes? How do you organize your change data? I'd really like examples as well as ideas.

user9517
scobi

12 Answers

20

In Linux land, people are pursuing a couple of different strategies:

  • Configuration constraint systems, like cfengine or puppet or chef. These are similar to Windows GPOs. The point is that all the server configuration is intentionally documented in a single place and you know at what granularity (server room, group, specific server) the policy is enacted. This won't quite save you from "what the hell was different six months ago?" but it does let you just nuke a server config and rebuild from scratch. You might put the cfengine and puppet policies under revision control to answer that question.
  • Revision controlling /etc. Generally, Linux programs store their configuration in one place, /etc. The daring are beginning to write scripts to put /etc into revision control. One such program I know of is etckeeper:
Description: store /etc in git, mercurial, bzr or darcs
 The etckeeper program is a tool to let /etc be stored in a git, mercurial,
 bzr or darcs repository. It hooks into APT to automatically commit changes
 made to /etc during package upgrades. It tracks file metadata that version
 control systems do not normally support, but that is important for /etc, such
 as the permissions of /etc/shadow. It's quite modular and configurable, while
 also being simple to use if you understand the basics of working with version
 control.
jldugger
  • 1
    +1 for mention of both types of system, and specifically etckeeper which makes this quite easy - works with git or hg. – RichVel Dec 08 '11 at 15:03
  • 1
    I use one to install the other, and thus have both. – Dan Garthwaite Apr 01 '15 at 17:24
  • FYI the **cfengine** link points to www.cfengine.org, which is now broken. The official site is now located at [www.cfengine.com](http://www.cfengine.com). Also **etckeeper** now has a home page at [etckeeper.branchable.com](http://etckeeper.branchable.com/) – e_i_pi Jul 19 '17 at 01:24
  • @e_i_pi and also puppet is no longer puppetlabs. – jldugger Jul 19 '17 at 18:39
10

One of the problems in this situation is that, really, it's a combination business process/technological problem. And it is definitely bigger than just tracking what changes an admin made. You also need to keep an eye out for unexpected changes, and good coordination between admins or units so that a change on an AD controller doesn't break a database permissions setting on some departmental server. I.e., your question is a giant can of worms :)

In my organization, we are about a year into rolling out processes and systems to address this. For the business process side we formed a Change Management team. According to SOP, all changes to production environments are coordinated through them. They compile all the changes, along with scope, systems affected, services affected, etc. They enforce good documentation on the changes, as well as both roll-out and roll-back plans. They also host weekly (open) meetings to go over upcoming environment changes, then send out emails detailing all of these changes. The end goal of this process is that, effectively, everybody in IT knows everything else that is going on. This helps stop the problem of, for example, a SysAdmin installing a kernel patch and rebooting a system, which then takes down the timeclock database.

As for the technological side, I can only speak for the Unix/Linux guys since I don't deal with Windows. They have been rolling out Puppet, by Reductive Labs, for configuration management of all of those systems. Simply put, it is a client/server system where one defines a machine configuration on the server, and the client pulls those changes every so often (30 minutes by default). Additionally, if any changes are made locally to managed files, they are reverted at that time as well. We use it for managing running services, firewall configurations, user authorization, etc.
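The "revert local changes to managed files" behaviour can be illustrated with a few lines of shell. This is only a sketch of the idea, not how Puppet itself is implemented; the function name `enforce_file` and the notion of a local "desired" copy of each file are assumptions made for the example.

```shell
#!/bin/sh
# enforce_file DESIRED TARGET
# Hypothetical helper: make TARGET identical to the DESIRED copy if it has
# drifted (or is missing), and say so. Does nothing when they already match.
enforce_file() {
    if cmp -s "$1" "$2"; then
        return 0                      # already in the desired state
    fi
    cp "$1" "$2" && echo "reverted $2"
}
```

Run from cron every 30 minutes against a directory of known-good files, something like this gives a crude version of the same pull-and-revert loop, without any of the reporting or dependency handling a real configuration management system provides.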

I would also recommend looking into something like TippingPoint. It is a client service that watches system configuration, and sends alerts on changes. It makes us security folks most happy. It is largely used for tracking malicious or unpublished changes.

Scott Pack
  • When you store puppet config files in a VCS, you get a complete history and log of your server configurations, very neat :) But, converting every thing to a puppet script requires another discipline :D – hayalci May 23 '09 at 21:32
  • I never said it was easy, only useful :) The trick with puppet is to make prolific use of modules, and to remember that your efforts *will* be rewarded. Now if only RSA enVision had a parser for the logs... – Scott Pack May 23 '09 at 22:36
  • You are absolutely correct that the problem is bigger than just the technology of recording changes. But let's not expand the problem into the realm of the unsolvable either. Having an effective tool can focus your team, and not having one destroys the morale of trying to effect a change in their way of thinking. I've implemented a few different systems; the best is probably still the wiki page with a table of changes, but it's still not perfect. etckeeper is definitely a plus, but hard to scale across systems. And most important: Active Directory! This is the key need. – ckg Jul 20 '15 at 13:50
4

I have been at 4 or 5 companies now; I don't really remember.

We all had this problem. None of us have solved it 100 percent, but at the company I am at now we have what I think is the best strategy to date.

Sharepoint/Wiki/Evernote/PINs

  • Sharepoint
    • moan all you want...it has some very nice list features.
    • IP address lists
    • inventory
    • service accounts and use
    • change notification logs
  • Wiki
    • How-to's
    • long range task lists
  • Evernote
    • my partner and I use this to put everything we don't want in Wiki
    • more how-to's that are technical in nature
    • scratch notes we both need to see
    • task accounting for the week
    • contractor task lists
    • evernote clipper makes it easy to screen shot AD/rights settings
    • available everywhere
  • PINs
    • Password repository
Thomas Denton
2

There are probably better tools for some of these, but this is what we use:

  • Track configuration changes and upgrades/patches on a per-server basis in a private wiki
  • Also keep howtos and a record of problems/solutions in the wiki
  • Use Sharepoint or Google Docs to keep authoritative copies of things like static IP lists
  • Use Subversion to track changes to configuration files
Brent
  • i like using source control on config files - do you enforce "useful" comments when checking-in or -out a version? – warren Sep 08 '09 at 10:40
  • No, in fact I have written a couple of scripts (submit and revert) to make submitting and reverting changes easier. However, we are now experimenting with etckeeper. – Brent Sep 08 '09 at 14:52
2

For Windows, check out Microsoft's System Center series or any competitor in configuration and service management for that platform.

The changes need to be routed through a decent change management routine which by itself approves and logs them before they're actually done. This can be 100% manual for starters. With some of the better integrated tools you could ask the tool to do the actual changes and get "automatic" logging out of it to a central configuration database - rather than go bare-hands into an individual server's console, digging through settings by hand to try and fix a problem cowboy-style.

Oskar Duveborn
2

You absolutely should have a change management process in place, especially if there are multiple people who have the ability/access to make system-level changes in your environment. This also provides a way for management to sign off on potential changes; the downside is that it introduces latency into the change process, since you can't make changes on the fly.

Some ways of tracking changes might include validating events in your SEM (assuming you have a Security Event Manager) or using tools such as Nessus (which, with a lot of work, can audit your environment to find changes).

David Yu
2

This is a more localised, *nix based answer. I've not found any good tools to emulate it under Windows.

There are a few ways to implement this ... and to catch it when you forget.

Revision control systems like Subversion, git, CVS or RCS are a good way of tracking the history of a config file. If you don't want to install a revision control system on your production servers, storing configuration file directories either locally or remotely using something like rsnapshot will give you most of the benefits of revision control, but you lose the ability to audit changes or leave commit logs (although this could be worked around with comments inside the files themselves).

To help you remember to log the changes, automated reporting of configuration changes via a nightly, cron'ed tripwire run is a good start. After building tripwire's database of the current state of files, any change to them will result in an email during the next run. You will continue to receive this mail until the database is updated, thus "resetting" the tripwire.
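The baseline-and-compare cycle described above can be sketched with standard tools. This is not tripwire, and has none of its tamper protection; it only shows the mechanism. The function names `snapshot` and `check_against`, and the paths in the commented usage lines, are assumptions for illustration.

```shell
#!/bin/sh
# snapshot DIR
# Print "checksum size path" for every regular file under DIR, sorted by path.
snapshot() {
    (cd "$1" && find . -type f -exec cksum {} + | sort -k 3)
}

# check_against BASELINE DIR
# Print the differences between the stored BASELINE and the current state of
# DIR. Exit status is 0 when nothing changed and 1 when something did.
check_against() {
    snapshot "$2" | diff "$1" -
}

# Hypothetical nightly use (paths and mail address are placeholders):
#   snapshot /etc > /var/lib/etc.baseline              # build or "reset" the baseline
#   check_against /var/lib/etc.baseline /etc \
#       || echo "files under /etc changed" | mail -s "config drift" admin@example.invalid
```

As with tripwire, the report keeps firing until the baseline is rebuilt, which is the nudge to go and write down what was changed and why. Keep the baseline file outside the directory being watched, otherwise it will report itself.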

Greg Work
1

I would use an issue tracking system such as flyspray (any will do, but I like flyspray for non-programming stuff). Before anyone touches a config, the improvement/problem should be logged. When you fix/implement it, the changes go in the ticket.

A wiki can be nice to document the current setup, but it's easy for it to get out of date - and it seems to take more effort to update IMO.

You're not going to find something automated to do this - although you could probably set it up so changes to certain config files got automatically emailed to the issue tracker if you wanted.

I think it's just a matter of a good policy, low-barrier tools and discipline.

Draemon
1

We created something homegrown to do change log tracking in our environment; it's not anything super-complicated, and it works quite well.

  • A self-policing policy is set up: any change that, in your estimation, either deviates from an out-of-the-box setup or could potentially cause issues should be documented in the changelog system.
    • opposite side of this 'coin' is if you are troubleshooting a problem, search for recent or related changelog entries.
  • Sign into the system and choose the server, service, or hardware component that you are changing
    • the components are previously entered into the same system with basic 'demographic' information (location, vendor, serial number, responsible department)
  • Choose from a drop-down of basic categories
    • Unscheduled downtime
    • Patching
    • Hardware Maintenance
    • Software Installation
  • Put in details of what you did, saw, observed
  • a copy is sent to the responsible party and stored as XML files that are indexed by a search appliance.
  • Profit
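
For anyone wanting to try the same workflow without the CGI front end, the "record an entry as an XML file" step might look roughly like the sketch below. The element names, the file naming scheme and the function `log_change` are all assumptions for illustration, not the format the system described here actually uses.

```shell
#!/bin/sh
# xml_escape: escape &, < and > in the text read from stdin.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# log_change DIR COMPONENT CATEGORY DETAILS
# Hypothetical helper: write one changelog entry as an XML file in DIR and
# print the path of the new file. Names are built from a UTC timestamp plus
# the shell's PID, so two entries written by the same shell within the same
# second would collide; good enough for a sketch.
log_change() {
    dir=$1 component=$2 category=$3 details=$4
    stamp=$(date -u +%Y%m%dT%H%M%SZ)
    file="$dir/$stamp-$$.xml"
    {
        printf '<change>\n'
        printf '  <when>%s</when>\n' "$stamp"
        printf '  <who>%s</who>\n' "$(id -un | xml_escape)"
        printf '  <component>%s</component>\n' "$(printf '%s' "$component" | xml_escape)"
        printf '  <category>%s</category>\n' "$(printf '%s' "$category" | xml_escape)"
        printf '  <details>%s</details>\n' "$(printf '%s' "$details" | xml_escape)"
        printf '</change>\n'
    } > "$file"
    printf '%s\n' "$file"
}
```

A directory of files like this can be searched with plain grep until something better is available, for example `grep -l '<category>Patching</category>' changelog/*.xml`.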

As I said, nothing fancy. It uses Perl CGI (it was written a billion years ago) and a Google Search Appliance for indexing.

Shortcomings:

  • Groups of services are hard to work with. For example, if you just added the same patch to all 25 of your domain controllers, we don't have a "Domain Controller" group, so you have to manually select them all
  • Doesn't integrate with hardware, software, or event log error reporting to help with troubleshooting
  • Relatedly, all of the 'demographic' data mentioned above has to be entered manually

Anyway, if after all that you'd be interested in the code, let me know and I can probably grab it to share.

Guamaniac
1

As said, it's often a cultural issue. After all, some development shops don't bother with comments anymore (self-documenting code is a fashionable buzzword today!) and some treat a version control system as the holy grail of historical records. Obviously, these aren't perfect.

So, the only true way to fix it is to make it a cultural solution. Ensure all reasons for change are logged in a bug tracker (or knowledgebase, or wiki), and ensure all changes are logged in a change control system.

We have emergency service customers. Every change that happens to their systems is logged, and every time we log into their systems, we have to log it. For some of them, we have to phone for permission first (and I guess they log that too!). It is a disciplinary offence to change a customer system without logging it.

It sounds onerous, but it's not. You quickly get into the habit of adding yourself to the access log and change log. It's no worse than having to write a comment when checking in a code change.

I recommend a bugtracker as the change control reason log, as they're usually easy to update (I use Mantis).

gbjbaanb
1

If you're looking for the "enterprise solution" (i.e., you have more money than god and want to have a really cool tool), the tool I used to support and provide onsite work for does this as one of its multitudinous features.

No idea what the base pricing is, but before HP bought Opsware, it was ~$350,000 US (with no support, and trust me - you wanted support when I started with Opsware).

Several of the customers we had while I worked there used the application configuration and snapshot features in conjunction with Tripwire.

Of course, if you have no budget - this is a Bad Choice™ :)

And, fwiw, the ad that appeared at the top of this page for me when I reloaded it was for spiceworks. Looks mighty similar to HPSA :)

warren
1

If all you want to do is track changes and not manage the whole process (i.e., via Chef or Puppet), just rsync your etc directory (wherever that might be) into a local git repo.

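# Pull each host's config directory into a per-host subdirectory of the
# current directory (the trailing ... stands for the rest of your host list).
for HOST in alpha bravo charlie delta ...; do
    rsync -avz --exclude-from=exclusions -e ssh "admin@$HOST:/opt/local/etc/" "./$HOST"
done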

You can, of course, add other sources as needed.
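
The loop above only copies the files; the history comes from committing the result each time. A minimal sketch of that step, assuming git is installed and the per-host directories live in the directory you pass in, could look like this. The function name `commit_snapshot` and the placeholder committer identity are assumptions for the example.

```shell
#!/bin/sh
# commit_snapshot REPO MESSAGE
# Hypothetical helper: record the current contents of REPO as one git commit,
# creating the repository on first use. Prints "no changes" and makes no
# commit when nothing differs from the previous snapshot.
commit_snapshot() {
    repo=$1 message=$2
    [ -d "$repo/.git" ] || git init -q "$repo" || return 1
    git -C "$repo" add -A || return 1
    if git -C "$repo" diff --cached --quiet; then
        echo "no changes"
        return 0
    fi
    # Placeholder identity so the commit works even where git is unconfigured.
    git -C "$repo" -c user.name="config-snapshot" \
        -c user.email="snapshot@example.invalid" \
        commit -q -m "$message"
}
```

Calling `commit_snapshot . "nightly config pull"` after the rsync loop means `git log -p alpha/` then shows what changed on that host and when. The "why" still has to come from whoever made the change, so a manual run with a meaningful message right after a deliberate change is worth the extra ten seconds.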

PartialOrder