
This is related to this post:

Scalable (> 24 TB) NAS for research department

but perhaps a little more general.

Background:

We're a research lab of around 10 people who do a lot of experiments that involve taking pictures at one of several lab setups and then analyzing them on one of several lab computers. Each experiment may produce 2 or 3 GB of data, and we are generating data at the rate of about 10 TB/year.

Right now, we are storing the data on a 6-bay Netgear ReadyNAS Pro, but even with 2 TB drives, this only gives us 10 TB of storage. Also, right now we are not backing up at all. Our short-term backup plan is to get a second ReadyNAS, put it in a different building and mirror the first unit onto the second. Obviously, this is somewhat non-ideal.

Our options:

1) We can pay our university $400/TB/year for "backed up" online storage. We trust them more than we trust us, but not a whole lot.

2) We can continue to buy small NASs and mirror them between offices. One limit, although stupid, is that we don't have an unlimited number of ethernet jacks.

3) We can try to implement our own data storage solution, which is why I'm asking you guys.
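For a rough sense of what option 1 costs over time, here is a minimal sketch. It assumes the university bills for everything stored at the end of each year, that data grows by a steady 10 TB/year, and that nothing is ever deleted. The function name, the `initial_tb` parameter and the billing model are illustrative assumptions, not anything the university has confirmed.

```python
def university_cost(years, growth_tb_per_year=10, price_per_tb_year=400, initial_tb=0):
    """Project stored volume and annual bill for per-TB, per-year storage.

    Assumption: the bill for a year is based on the total stored at year end.
    Returns a list of (year, stored_tb, annual_cost_usd) tuples.
    """
    rows = []
    for year in range(1, years + 1):
        stored_tb = initial_tb + growth_tb_per_year * year
        rows.append((year, stored_tb, stored_tb * price_per_tb_year))
    return rows


if __name__ == "__main__":
    for year, stored_tb, cost in university_cost(5):
        print(f"year {year}: {stored_tb} TB stored, ${cost:,} for the year")
```

Because the data is retained indefinitely, the annual bill keeps growing even though the yearly intake stays flat, which is worth keeping in mind when comparing against one-time hardware purchases.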

One thing to consider is that we're a very transient population and none of us are network administration experts. I will probably be here only another year or so, and graduate students, who are here the longest, have a 5-6 year time scale. So nothing can require expert oversight.

Our data transfer rates are low - most of the data will just sit on the server waiting for someone to look at it once or twice - so we don't need a really high speed system.

Given these constraints, can someone recommend a fairly low-cost, scalable, more or less turnkey shared data storage system with backup in a separate physical location? Does such a thing exist, or should we just pay the university to take care of it for us?

As a second question, our professor just got tenure and is putting together a budget. Here the goal is to ask for as much as you can and hope you get a fraction of it. So the same question, minus the low-cost: without budget constraints, can you recommend a scalable, turnkey, backed-up storage system?

Thanks

Marc
  • Is the output data (the 2 to 3 GB per experiment) mostly read-only at that point? – tony roth Jun 23 '10 at 18:34
  • @tony roth, Yes, the general workflow is to take the data on the local machine, transfer it to the network drive, then analyze it on another computer. At that point, we use only the (much smaller, ~100 MB) analysis files. We still need to retain the original data, but at that point it is definitely read only, and most likely not even accessed again. – Marc Jun 23 '10 at 19:16
  • Is this gis data, just curious? I'd put the smaller files on the university system and get a cheap jbod for mostly read only data. – tony roth Jun 23 '10 at 22:17
  • It's not GIS; it's pictures of worms crawling around, actually. I'm thinking the best bet may be what you say, or rather to periodically pull the read-only stuff off the network drive onto two duplicate USB drives and store one set in the lab and one set in another building. – Marc Jun 23 '10 at 22:41
  • Sounds like you need a distributed file system. How large are your smallest files, and do they compress well? – Andrew Jun 24 '10 at 02:08
  • Worms! aaagghhhh... and yes, your idea seems like the least administration and the lowest cost. You say growth is 10 TB a year; how many years do you think the processed files will be used? Also, how long does it take to process the files? If the numbers are appropriate, cloud storage may be an option. Sorry, another question: is it a custom app that does the analysis? – tony roth Jun 24 '10 at 14:20
  • @tony roth - yes, custom app. We thought about cloud, but because our data processing needs are so limited, it doesn't make sense to pay the higher rates for transfer and storage. @andrew - basically each experiment produces ~4000 JPEGs of ~500 KB each and one ~1 KB text file that lists the times the pictures were taken and sometimes other data like the temperature. – Marc Jun 27 '10 at 14:02

3 Answers


I think this is one of the cases where you should outsource it. Let the university IT department handle the storage; they take care of the backup and maintenance of the storage solution. It will be better in the long run.

JamesBarnett

There's an excellent and extremely detailed article on building NAS "pods" by a company that developed the system for its own use, at http://www.backblaze.com/petabytes-on-a-budget-how-to-build-cheap-cloud-storage.html . They describe it as "67 TB for $7,867", which is very good going. They run JFS on top of RAID-6 volumes under Debian; they then offer that via HTTPS, but there's nothing to stop you putting (e.g.) Samba in there instead (you don't say what your current remote-file-access protocol is).

Disclaimer: I know nothing about these people except what I have read, and I haven't tried to build one of these myself. Nevertheless, unless they have been faking photos, they really do build and deploy a bunch of these things, and they haven't yet gone out of business.

Edit: it took me a little longer to find the specific supplier list (the detailed parts list is in the original link above), but it's at http://blog.backblaze.com/2009/10/07/backblaze-storage-pod-vendors-tips-and-tricks/#more-199 . I really do admire the way these guys have thrown open their detailed infrastructure for reuse; but as they say in the original posting:

Finally, we thank the thousands of engineers who slaved away for millions of hours to bring us the pod components that are either inexpensive or totally free, such as the Intel Processor, Gigabit Ethernet, ridiculously dense hard drives, Linux, Tomcat, JFS, etc. We realize we’re standing on the shoulders of giants.

I don't know about their product (I have my own tape stacker for backups) but I approve of their humility.

MadHatter
  • This product offers a lot of storage for a modest price, but what disk speed do you require? They run 4 SATA channels over a PCI SATA card (max. 133 MB/s), and each of these channels is further split across several drives. I have read reports from people using them who say they get about 20 MB/s effectively. If that's enough for you, this might be a good option. – Christian Jan 03 '11 at 11:58
  • Thanks for the link! I'm not sure if we will implement this, but it would definitely solve our storage problems and is very well documented! – Marc Jan 25 '11 at 13:51
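To put the roughly 20 MB/s figure from the comment above in context, here is a back-of-the-envelope sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a sustained rate with no protocol overhead, so treat the results as optimistic estimates; the function name is made up for illustration.

```python
def transfer_seconds(size_bytes, rate_mb_per_s):
    """Seconds needed to move size_bytes at a sustained rate in MB/s (decimal MB)."""
    return size_bytes / (rate_mb_per_s * 1e6)


if __name__ == "__main__":
    one_experiment = 3e9   # upper end of the 2 to 3 GB per experiment in the question
    one_year = 10e12       # the 10 TB/year growth figure in the question
    print(f"one experiment: {transfer_seconds(one_experiment, 20) / 60:.1f} minutes")
    print(f"one year of data: {transfer_seconds(one_year, 20) / 86400:.1f} days")
```

Under these assumptions a single experiment copies in a few minutes and a whole year of data in under a week of continuous transfer, so the low throughput looks acceptable for a write-once, rarely-read archive.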

Ask your university folks if there is a lesser tier of storage, perhaps something that isn't backed up or is only backed up weekly or monthly. If there is a cheaper "no backup" option, then buy 2-3 times the storage and write some scripts to make backups semi-automatically. Backup, especially tape backup, can be two-thirds the cost of networked storage.

Also ask if they offer "near-line storage".
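As one possible shape for the semi-automatic backup scripts mentioned above, here is a minimal sketch using only the Python standard library. It assumes both the primary and the backup storage are mounted as ordinary directories; the `mirror` and `sha256_of` names are hypothetical, and in practice a tool such as rsync may be a better fit. It copies new or changed files and verifies each copy by checksum, and it never deletes anything from the backup.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror(source_dir, backup_dir):
    """Copy every file under source_dir into backup_dir and verify each copy.

    Returns the list of relative paths that were copied on this run.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        # Skip files whose backup already exists with identical contents.
        if dst.exists() and sha256_of(dst) == sha256_of(src):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        # Re-read the copy to catch a bad write before trusting the backup.
        if sha256_of(dst) != sha256_of(src):
            raise IOError(f"verification failed for {rel}")
        copied.append(rel)
    return copied
```

Running `mirror("/path/to/primary", "/path/to/backup")` from a scheduled job (the paths are placeholders) would cover the write-once raw data; hashing every file on every run is slow at multi-terabyte scale, so a real script would probably compare sizes and modification times first.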

edgester