Background
I work at a research department that handles biomedical data, and we are currently considering revising our IT structure. We have several instruments that generate GBs of data on a daily basis, connected to network-isolated computers. The data is moved around the network and processed at intermediate steps before it is transferred to the national data storage service for universities.
What we need to improve is the intermediate step, where the data is stored short term (~3 months) so that researchers can access it without having to query a remote data center. As it is, the intermediate server is used for a number of different purposes and usually runs out of space. We intend to buy a NAS that will be dedicated to short-term storage of instrument data. I was given the responsibility of coming up with alternatives.
I started off by charting out what we need, which led to the following list of our requirements:
- at least 8TB space: this should not really be an issue with modern setups
- Gb bandwidth: same as above
- rack-mount: so that the NAS will physically be close to the other servers we have
- expandable: in case our data volume increases in the near future (I assume it will)
- minimal maintenance: we don't have the liberty (economically and bureaucratically) to have full-time system admins. As it is, the most tech-savvy scientists help out with server maintenance. None of us are IT professionals...
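To put the capacity and expandability requirements in perspective, here is a rough sizing sketch. Only the ~3 month retention period comes from our actual situation; the daily volumes, the growth rate, and the headroom factor are placeholder assumptions for illustration, not measured figures.

```python
# Rough capacity estimate for short-term instrument data storage.
# All inputs in the example loop below are hypothetical placeholders.

def required_capacity_tb(daily_gb, retention_days, annual_growth=0.0,
                         years=0, headroom=0.2):
    """Return the usable capacity in TB (decimal, 1 TB = 1000 GB).

    daily_gb       -- data generated per day today, in GB
    retention_days -- how long data stays on the NAS before removal
    annual_growth  -- assumed yearly growth of the daily volume (0.3 = 30%)
    years          -- how many years ahead to plan for
    headroom       -- extra fraction kept free so the array never fills up
    """
    grown_daily_gb = daily_gb * (1 + annual_growth) ** years
    stored_tb = grown_daily_gb * retention_days / 1000
    return stored_tb * (1 + headroom)


if __name__ == "__main__":
    # Compare a few assumed daily volumes, today and three years out.
    for daily_gb in (20, 50, 100):
        now = required_capacity_tb(daily_gb, retention_days=90)
        later = required_capacity_tb(daily_gb, retention_days=90,
                                     annual_growth=0.3, years=3)
        print(f"{daily_gb:>4} GB/day: {now:5.1f} TB now, {later:5.1f} TB in 3 years")
```

Note that this is usable capacity after redundancy, so the raw disk capacity has to be larger depending on the RAID level chosen.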
Question(s)
I started reading up on storage systems, and the list of most common questions on meta was a great resource. I also found two similar questions asking about storage in a research environment.
However, both questions seem to focus on long-term storage and on individual appliances, whereas I am mostly interested in figuring out what features/specs/qualities are valuable in this context.
Based on prior knowledge and recent reading, I figure there are a couple of aspects which could be of importance when choosing a NAS in our case:
Support for SAS drives - is it really crucial? I understand that SAS drives are generally of higher quality, but assuming that there is redundancy in the array, what's the big deal if a SATA disk dies?
Link aggregation - I have to say I am not well-read on the network layers and the devices that go along with them, but my limited understanding of link aggregation is that with multiple network cards a NAS can theoretically double/triple its bandwidth, and that the multiple links can also be used for error correction (at least according to Synology). I would appreciate any additional information that might help me make sense of this and distinguish reality from marketing talk.
Multiple networks - it would make sense for us to be able to have the NAS available in two different VLANs that do not see one another, due to the isolation criteria we have on some computers. If the NAS has two Ethernet ports, is it as simple as connecting it to the two different networks and being done with it?
Hot-swap etc - there seem to be a number of different versions of this feature. My understanding is that hot-swap refers to an extra disk connected to the NAS, which is written to first when one disk fails. Is this correct? If so, is hot-swap a cool feature to have, or a must even though the array is running single/double redundancy?
Another version of "hot-swap" (I am not sure what it is called) allows disks to be replaced while the server is online, so it's sort of a hot-replacement (Drobo offers something like this). Is it a common feature, or something specific to Drobo? Are there similar technologies available? Is there a "catch" that I might not be aware of? Otherwise I think it's pretty interesting, since it allows for online expansion of the storage space.
These are the features I have been pondering. I would really appreciate some insight into these, and possibly into others I might have missed.