16

So we have a file share that was started 10 years or so ago, and it started off with the best intentions. But now it's gotten bloated, there are files in there that nobody knows who put there, it's hard to find information, etc., etc. You probably know the problem. So what I'm wondering is what people do in this situation. Does anyone know of a decent program that can go through a file share and find files that nobody has touched? Duplicate files? Any other suggestions on cleaning this mess up?


Well, the file share is Windows-based and it's close to 3 TB. Is there a utility out there that can do some reporting for me? We like the idea of being able to find anything older than 6 months and then moving it to archive; the only problem is that with a file share this big, that could be really hard to do by hand.

Tom O'Connor
  • 27,440
  • 10
  • 72
  • 148

9 Answers

30

Oftentimes we counsel Customers to "scorch the earth" and start fresh.

I have yet to see a good solution that works without involving non-IT stakeholders. The best scenario I've seen yet is a Customer where management identified "stewards" of various data areas and delegated control of the AD groups that control access to those shared areas to those "stewards". That has worked really, really well, but has required some training on the part of the "stewards".

Here's what I know doesn't work:

  • Naming individual users in permissions. Use groups. Always. Every time. Without fail. Even if it's a group of one user, use a group. Job roles change, turnover happens.
  • Letting non-IT users alter permissions. You'll end up with "computer Vietnam" (the parties involved have "good" intentions, nobody can get out, and everybody loses).
  • Having overly grandiose ideas about permissions. "We want users to be able to write files here but not modify files they've already written", etc. Keep things simple.

Things that I've seen work (some well, others not-so-well):

  • Publish a "map" indicating where various data types are to be stored, typically by functional area. This is a good place to do interviews with various departments and learn how they use file shares.
  • Consider "back billing" for space usage or, at the very least, regularly publishing a "leader board" of the departmental space users.
  • Did I mention naming groups exclusively in permissions?
  • Develop a plan for data areas that "grow without bounds" to take old data "offline" or to "nearline" storage. If you allow data to grow forever it will, taking your backups with it to infinity.
  • Plan on some kind of trending for space usage and folder growth. You can use commercial tools (someone mentioned TreeSize Professional or SpaceObServer from JAM Software) or you can code something reasonably effective up yourself with a "du" program and some scripting "glue" (a rough sketch follows this list).
  • Segment file shares based on "SLA". You might consider having both a "business-critical" share that crosses departmental lines, and a "nice to have running but not critical" share. The idea is to keep the "business-critical" share segregated for backup/restore/maintenance purposes. Having to take down business to restore 2TB of files from backup, when all that was really needed to go about business was about 2GB of files, is a little silly (and I've seen it happen).
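
If you go the roll-your-own route for the trending bullet above, a small Python sketch along these lines, run from a scheduled task, will give you a dated per-folder size report you can stitch into a growth trend. The share path is just a placeholder; point it at your own top level.

    # Minimal "du"-style size report: one CSV row per top-level folder, with
    # today's date so repeated runs can be stitched into a growth trend.
    import csv
    import os
    import sys
    from datetime import date

    SHARE_ROOT = r"\\fileserver\shared"   # placeholder path; point at your share


    def folder_size(path):
        """Total size in bytes of all files under `path`."""
        total = 0
        for dirpath, _dirnames, filenames in os.walk(path):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # skip files we can't stat (locked, no permission, ...)
        return total


    rows = []
    for entry in os.scandir(SHARE_ROOT):
        if entry.is_dir():
            size_gb = folder_size(entry.path) / (1024 ** 3)
            rows.append((date.today().isoformat(), entry.name, round(size_gb, 2)))

    writer = csv.writer(sys.stdout)
    writer.writerow(["date", "folder", "size_gb"])
    writer.writerows(sorted(rows, key=lambda r: r[2], reverse=True))

Redirect the output to a dated CSV each run and you have the raw data for a simple growth chart per department.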
Evan Anderson
  • 141,071
  • 19
  • 191
  • 328
  • 1
    I'd +42 if I could. =) – Wesley Jul 07 '09 at 00:15
  • 2
    An alternative to "TreeSize Pro" etc. is WinDirStat - free and visually useful; you see immediately where the space is going, and which file types. Unpack it to your home directory and you can run it anywhere. – nray Jul 07 '09 at 04:55
  • While the above may SEEM time-consuming, trust me - the alternative (trying to 'fix' it yourself, trying to get software to complete these human tasks) will burn a lot more time, with less-than-useful results. – Kara Marfia Jul 07 '09 at 14:05
  • 1
    You'll end up with "computer Vietnam"...explains the additional burden. – Saif Khan Jul 07 '09 at 17:44
6

I agree with Evan that starting over is a good idea. I've done 4 "file migrations" over the years at my current company, and each time we set up a new structure and copied (some) files over, backed up the old shared files and took them offline.

One thing we did on our last migration might work for you. We had a somewhat similar situation with what we called our "Common" drive, which was a place where anyone could read/write/delete. Over the years, a lot of stuff accumulated there, as people shared stuff across groups. When we moved to a new file server, we set up a new Common directory, but we didn't copy anything to it for the users. We left the old Common in place (and called it Old Common), made it read-only, and told everyone they had 30 days to copy anything they wanted to the new directories. After that, we hid the directory but we would un-hide it on request. During this migration, we also worked with all the departments and created new shared directories and helped people identify duplicates.

We've used TreeSize for years for figuring out who's using disk space. We've tried SpaceHound recently and some of my co-workers like it, but I keep going back to TreeSize.

After our most recent migration, we tried setting up an Archive structure that people could use on their own, but it hasn't worked very well. People just don't have the time to keep track of what's active and what's not. We're looking at tools that could do the archiving automatically, and in our case it would work to periodically move all the files that haven't been touched for 6 months off to another share.
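
Something like the following rough Python sketch is the sort of sweep I mean. The paths and the 180-day cutoff are placeholders, and last-access timestamps aren't always reliable (access-time updates can be disabled on a volume), so test it against a copy before trusting it.

    # Sketch of an automated archive sweep: move files whose last-access time
    # is older than a cutoff to a parallel folder tree on an archive share,
    # preserving the relative path. Paths and cutoff are placeholders.
    import os
    import shutil
    import time

    SOURCE = r"\\fileserver\shared"
    ARCHIVE = r"\\archiveserver\shared-archive"
    CUTOFF_DAYS = 180   # "not touched for 6 months"

    cutoff = time.time() - CUTOFF_DAYS * 86400

    for dirpath, _dirnames, filenames in os.walk(SOURCE):
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                if os.path.getatime(src) >= cutoff:
                    continue  # accessed recently enough, leave it where it is
            except OSError:
                continue
            dest = os.path.join(ARCHIVE, os.path.relpath(src, SOURCE))
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.move(src, dest)
            print("archived", src)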

Ward - Reinstate Monica
  • 12,788
  • 28
  • 44
  • 59
  • 'Common' drive is fantastic. I'm going to steal that if I ever get a new file server approved. ;) – Kara Marfia Jul 07 '09 at 14:07
  • 1
    The trick that I've used with a "common drive" that has worked well is a script that deletes directories and files more than a specified number of days old (analogous to a janitor cleaning up the mess everyone left). Having said that, I've seen multiple occasions where employees left sensitive material on the "common drive" because they didn't understand that it was accessible to everyone. Clearly, some education was in order. – Evan Anderson Jul 07 '09 at 14:21
2

At 3 TB you probably have a lot of huge unnecessary files and duplicated junk in there. One useful method I've found is to do searches, starting with files > 100 MB (I might even go up to 500 MB in your case), then work the threshold down. It makes the job of finding the real space wasters more manageable.
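
If you don't have a tool handy, a quick script can do the same search. This Python sketch (the share path is a placeholder) lists everything over the threshold, biggest first:

    # Quick large-file sweep: list every file over a size threshold, biggest
    # first. Start high (say 500 MB) and lower THRESHOLD_MB as you clean up.
    import os

    SHARE_ROOT = r"\\fileserver\shared"   # placeholder path
    THRESHOLD_MB = 100

    hits = []
    for dirpath, _dirnames, filenames in os.walk(SHARE_ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue
            if size > THRESHOLD_MB * 1024 * 1024:
                hits.append((size, path))

    for size, path in sorted(hits, reverse=True):
        print(f"{size / (1024 ** 2):8.1f} MB  {path}")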

Maximus Minimus
  • 8,937
  • 1
  • 22
  • 36
1

My first order of business would be to use an enterprise file manager/analyzer/reporter/whatever-you-want-to-call-it such as TreeSize Professional or SpaceObServer. You can see what files are where, and sort by creation date, access date, and a host of other criteria, including statistics on file types and owners. SpaceObServer can scan various file systems, including remote Linux/UNIX systems via an SSH connection. That can give you great visibility into your collection of files. From there, you can "Divide and Conquer".

Wesley
  • 32,320
  • 9
  • 80
  • 116
1

You might want to consider just blanket archiving anything more than six months old to another share, and watching for file accesses on that share. Files that are consistently accessed could then be moved back to the primary server.
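
A rough sketch of the "watch for accesses" half, assuming access-time updates are enabled on the archive volume (the path and the 30-day window are placeholders):

    # List files on the archive share that have been accessed recently; these
    # are candidates to move back to the primary server.
    import os
    import time

    ARCHIVE = r"\\archiveserver\shared-archive"   # placeholder path
    RECENT_DAYS = 30

    cutoff = time.time() - RECENT_DAYS * 86400

    for dirpath, _dirnames, filenames in os.walk(ARCHIVE):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getatime(path) > cutoff:
                    print("recently accessed:", path)
            except OSError:
                pass  # unreadable file, skip it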

Another option is something like the Google Search Appliance. That way you can let Google's app smartly figure out what people are looking for when they search for things and it will "archive" by putting less-accessed documents further down on the search page.

Adam Brand
  • 6,057
  • 2
  • 28
  • 40
1

On our Windows 2003 R2 file server we use the built-in reporting functionality of File Server Resource Manager (FSRM); it will send you least-recently-accessed file lists along with other reports.

JamesBarnett
  • 1,129
  • 8
  • 12
0

I move all existing data onto a new read-only shared folder: if the end user needs to update a file, they can copy it into the fresh new shared drive.

This way, all the old stuff stays available, but I can take it out of the backup schedule.

On top of that, once every year, I remove folders (after checking that the archive is healthy) that haven't been updated/accessed for 3 years.

DutchUncle
  • 1,265
  • 8
  • 16
0

Perhaps the first step is to get an idea of the size of the problem. How much space is occupied by the file share? How many files are we talking about?
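
A quick script can answer both questions. Here's a minimal Python sketch (the share path is a placeholder):

    # Quick sizing pass: total file count and space used under the share.
    import os

    SHARE_ROOT = r"\\fileserver\shared"   # placeholder path

    count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(SHARE_ROOT):
        for name in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
                count += 1
            except OSError:
                pass  # skip unreadable files

    print(f"{count} files, {total_bytes / (1024 ** 3):.1f} GB")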

If you're lucky, you'll find that certain portions of the file share follow naming conventions, either on a per-user, per-business process, or per-department basis. This can help you parcel out the task of triaging the files.

In a worst-case scenario, you can take the whole thing offline and wait to see who complains. Then you can find out who they are and what they were using it for. (Evil, but it works.)

dthrasher
  • 207
  • 2
  • 7
0

I think the best solution is to move to a new drive. If the number of people accessing the share is reasonable, ask them and find out which parts are truly needed. Move those to the new share. Then encourage everyone to use the new share. After some period of time, take down the old share. See who screams and then move that data over to the new share. If no one asks for something for 3-6 months, you can safely delete or archive it.

Steve Rowe
  • 223
  • 4
  • 7