I have hundreds of systems managed by a small group of people whose membership has changed over time. Each system is installed from a base image (whose version depends on the age of the installation) and is then customised (forked) over time in various ways according to the needs of the client.
I have a copy of each version of the install image. Over 90% of the install image is the same between versions. Customisations are usually less than 3%.
I need to find out which versions are installed and what customisations have been made since the install.
Due to bandwidth constraints, I can't do a network diff or an rsync --dry-run over the network*.
However, I envisage being able to run a script over each install image, send the result as a database to each system to compare with its own filesystem, and have the system report back - like a "fingerprint", if you will.
The "fingerprint" (filesystem tree + checksum for each file & folder) would be limited to the set of files that are modifiable (so not /proc, /sys, /tmp, pipes, sockets, etc.).
The "fingerprint" can't be a single MD5 of the whole filesystem, because one change would result in a different fingerprint and we can't be sure which files may have been customised.
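To illustrate the kind of per-path fingerprint I have in mind, here is a minimal sketch in Python. It is only an assumption of how such a database might look, not an existing tool: the names `build_fingerprint`, `file_sha256` and `EXCLUDED_TOP_LEVEL` are hypothetical, SHA-256 is an arbitrary choice of checksum, and folders are recorded with a simple type marker rather than a real checksum.

```python
import hashlib
import os
import stat

# Assumed exclusion list (hypothetical name); adjust to whatever is
# considered non-modifiable on the systems in question.
EXCLUDED_TOP_LEVEL = ("proc", "sys", "tmp")


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_fingerprint(root, excluded=EXCLUDED_TOP_LEVEL):
    """Map every relative path under root to a checksum or a type marker."""
    fingerprint = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if os.path.relpath(dirpath, root) == os.curdir:
            # Prune excluded top-level directories in place so os.walk
            # never descends into them.
            dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            mode = os.lstat(full).st_mode  # lstat: do not follow symlinks
            if stat.S_ISDIR(mode):
                fingerprint[rel] = "dir"
            elif stat.S_ISLNK(mode):
                fingerprint[rel] = "link:" + os.readlink(full)
            elif stat.S_ISREG(mode):
                fingerprint[rel] = "sha256:" + file_sha256(full)
            # Pipes, sockets and device nodes fall through and are skipped.
    return fingerprint
```

Because every path keeps its own entry, a single customised file changes one entry rather than the whole fingerprint.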
I'm looking for a utility that will report 2 things:
- Suggest which version best matches the filesystem as it currently stands from a database of filesystem "fingerprints" (metadata of tree structure + file & folder checksum), and
- List which files/folders have changed (customised) from that version, including new files and deleted files.
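The two reports above could be sketched roughly as follows, assuming each fingerprint is a plain mapping of relative path to checksum string. The function names `diff_fingerprints` and `best_match` are hypothetical, and "best match" is assumed here to mean the version with the fewest differing paths, which is only one possible metric.

```python
def diff_fingerprints(reference, actual):
    """List paths that are new, deleted or changed relative to a reference."""
    ref_paths, act_paths = set(reference), set(actual)
    return {
        "new": sorted(act_paths - ref_paths),
        "deleted": sorted(ref_paths - act_paths),
        "changed": sorted(
            p for p in ref_paths & act_paths if reference[p] != actual[p]
        ),
    }


def best_match(versions, actual):
    """Pick the version whose fingerprint differs from actual in the fewest paths.

    versions maps a version name to its fingerprint (path -> checksum).
    Returns the chosen version name and its diff against actual.
    """
    def distance(name):
        diff = diff_fingerprints(versions[name], actual)
        return sum(len(paths) for paths in diff.values())

    best = min(versions, key=distance)
    return best, diff_fingerprints(versions[best], actual)
```

Counting every differing path equally is a simplification; weighting by file size or ignoring known config folders might give a better guess in practice.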
Additionally, it would be good if I could create new databases from existing ones so that I can take information from customisations to make new versions (e.g. Version 2.0.3-withmodX).
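As a rough sketch of what I mean by deriving a new database, assuming fingerprints are plain path-to-checksum mappings: the hypothetical `derive_version` below copies a base fingerprint and overlays only the selected customised paths from one system, so client-specific changes can be left out of the new version.

```python
def derive_version(base, customised, paths):
    """Build a new fingerprint from base plus selected customised paths.

    base and customised map relative path -> checksum; paths lists the
    customisations to carry over into the derived version.
    """
    derived = dict(base)  # copy so the base database is left untouched
    for path in paths:
        if path in customised:
            # New or changed file: take the customised checksum.
            derived[path] = customised[path]
        else:
            # Path was deleted on the customised system.
            derived.pop(path, None)
    return derived
```

For example, a "2.0.3-withmodX" database could be derived from the 2.0.3 fingerprint and the paths that modX is known to touch.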
I've considered:
- Backup utilities - they presume that versions have a 1:1 linear progression per client
- Image management systems - they tend to presume that images go server->client with only known customisations (e.g. new files, specific config folders), whereas we want information to flow client (referencing the database)->server.
I could, perhaps, use git in some way to generate a '.git' database of the filesystem and then send multiple .git databases to compare against, then:
- Least number of git status lines = version.
- git status output against version = customisations.
Is there such a "fingerprint"-ing utility for filesystems or is there some utility that will make this easier to build?
*although I'm wondering if rsync can output a database of meta-information which could be used to build such a tool easily.