Questions tagged [deduplication]

94 questions
6 votes, 2 answers

How does ZFS Block Level Deduplication fit with Variable Block Size?

According to The First Google Result for "ZFS Deduplication" ... What to dedup: Files, blocks, or bytes? ... Block-level dedup has somewhat higher overhead than file-level dedup when whole files are duplicated, but unlike file-level dedup, it…
700 Software
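
The trade-off in the excerpt above can be illustrated outside of ZFS. Below is a minimal Python sketch contrasting file-level and block-level dedup keys; the fixed 128 KiB block size and user-space hashing are simplifications, since ZFS actually dedups per record, with a variable recordsize, inside the pool:

    import hashlib

    BLOCK_SIZE = 128 * 1024  # hypothetical fixed block; ZFS records are variable-sized

    def file_level_keys(path):
        # File-level dedup: one hash for the whole file, so a one-byte
        # change makes the entire file unique again.
        with open(path, "rb") as f:
            return {hashlib.sha256(f.read()).hexdigest()}

    def block_level_keys(path):
        # Block-level dedup: one hash per block, so unchanged blocks keep
        # deduplicating even when other parts of the file differ.
        keys = set()
        with open(path, "rb") as f:
            while block := f.read(BLOCK_SIZE):
                keys.add(hashlib.sha256(block).hexdigest())
        return keys

With the block-level variant, changing part of a large file leaves most of its blocks deduplicable, which is the advantage the quoted article describes.
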
5 votes, 2 answers

Offline Deduplication Software for ZFS

I have a 300 TB FreeNAS server to back up several Linux nodes. Backups work with daily snapshot and rsync tasks. The users often move big datasets (2-5 TB) between the servers, so big files often get backed up several times on several…
philipp
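
ZFS has no built-in offline dedup job, so answers to questions like this usually fall back on file-level tools (rdfind, fdupes, jdupes) that hash files after the fact and replace duplicates with hard links. A rough Python sketch of that approach, assuming all files live on one dataset (hard links cannot cross filesystems) and that linked copies will not later be modified independently; the root path is hypothetical:

    import hashlib
    import os

    def sha256_of(path, chunk=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while data := f.read(chunk):
                h.update(data)
        return h.hexdigest()

    def hardlink_duplicates(root):
        # Offline, file-level dedup: keep the first file seen for each digest
        # and replace later identical files with hard links to it.
        seen = {}
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                digest = sha256_of(path)
                if digest in seen and not os.path.samefile(path, seen[digest]):
                    tmp = path + ".dedup-tmp"
                    os.link(seen[digest], tmp)  # hard link to the kept copy
                    os.replace(tmp, path)       # atomically swap the link in
                else:
                    seen.setdefault(digest, path)

    hardlink_duplicates("/mnt/backup")  # hypothetical backup root
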
5 votes, 3 answers

Store multiple versions of large binary file with minimal data duplication (preferably Linux)

I need to store multiple versions of a ~150 GB binary file (qcow2) on Linux servers with local storage, and was hoping there is some solution that involves just keeping diffs that can be merged as needed, so that I don't have to create another copy…
user160910
5 votes, 3 answers

Proper use of disk-to-disk-to-tape backup using de-duplication and LTO5

I currently have ~12 TB of data for a full disk-to-tape (LTO3) backup. Needless to say, it now requires over 16 tapes, so I'm looking at other solutions. Here is what I've come up with; I'd like to hear the community's thoughts. Server for…
Michael
5 votes, 18 answers

How to Eliminate Tape Backup and Off-site Storage Service?

PLEASE READ UPDATE AT THE BOTTOM. THANKS! ;) Environment info (all Windows): 2 sites; 30 servers at site #1 (3 TB of backup data); 5 servers at site #2 (1 TB of backup data); MPLS backbone tunnel connecting site #1 and site #2. Current backup process: Online…
Daniel Lucas
5 votes, 5 answers

Full-featured online backup providers for medium-sized enterprise?

I've been banging my head against the wall trying to find an online backup service that supports all of the following enterprisey features: full-system backup for Linux and Windows 2003/2008 servers, including Windows registries, System State,…
rmalayter
5 votes, 6 answers

Capacity Optimization / Deduplication Options for Primary Storage

I'm exploring options for making more efficient use of our primary storage. Our current NAS is an HP ProLiant DL380 G5 with an HP StorageWorks MSA20 and one other disk shelf that I'm not sure about. The vast majority of our files are PDF…
4 votes, 2 answers

How to verify that a deduplication has taken place?

Microsoft Windows Server 2012 and onwards offers a de-duplication service that periodically scans files, finds identical chunks, and removes the excess copies to save space. To a user browsing the files, they should all look the same. My problem is…
DraxDomax
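
One common check is to compare a file's logical size with its allocated size: once the optimization job has processed a file, its data moves to the chunk store and the file itself becomes a sparse reparse point with very little allocation. A hedged Python sketch for Windows, assuming GetCompressedFileSizeW reflects that post-dedup allocation; the path is hypothetical:

    import ctypes
    import os
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCompressedFileSizeW.argtypes = [wintypes.LPCWSTR, wintypes.LPDWORD]
    kernel32.GetCompressedFileSizeW.restype = wintypes.DWORD
    INVALID_FILE_SIZE = 0xFFFFFFFF

    def allocated_size(path):
        # Bytes actually allocated on disk; small for dedup'd (sparse) files.
        high = wintypes.DWORD(0)
        low = kernel32.GetCompressedFileSizeW(path, ctypes.byref(high))
        if low == INVALID_FILE_SIZE and ctypes.get_last_error() != 0:
            raise ctypes.WinError(ctypes.get_last_error())
        return (high.value << 32) | low

    path = r"D:\share\big.vhdx"  # hypothetical file on a dedup-enabled volume
    logical = os.path.getsize(path)
    on_disk = allocated_size(path)
    print(f"logical {logical:,} bytes, on disk {on_disk:,} bytes, saved {logical - on_disk:,}")
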
4 votes, 1 answer

On deduped volumes, how can I determine space used in a folder with Measure-DedupFileMetadata?

I'm trying to understand how Measure-DedupFileMetadata works so I can recursively go through some folders to report on how much space is actually used. I don't know how to interpret the output. If I understand the documentation correctly,…
Dan Buhler
4 votes, 2 answers

File System with Real-Time Data Deduplication

Is there a file system that stores files under a hash so there are no duplicates? It can be under any operating system. I know Git does that, but I'm looking for something that can run in real-time.
mik
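
What the question describes is essentially a content-addressed store, the model Git and most dedup backends share: the hash of the content is the storage key, so identical data is written once. A minimal Python sketch under that assumption (the store directory is hypothetical; real-time dedup filesystems such as ZFS with dedup=on do this per block on write rather than per file):

    import hashlib
    import os
    import shutil

    STORE = "/var/lib/cas-store"  # hypothetical object-store directory

    def put(path):
        # Store a file under its content hash; identical content is kept once.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(1 << 20):
                h.update(chunk)
        digest = h.hexdigest()
        dest = os.path.join(STORE, digest[:2], digest)
        if not os.path.exists(dest):  # duplicate content: nothing new to write
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(path, dest)
        return digest

    def get(digest, out_path):
        # Materialize a stored object back into a regular file.
        shutil.copy2(os.path.join(STORE, digest[:2], digest), out_path)
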
4 votes, 0 answers

Windows NTFS Data Deduplication and Snapshot Backups

We have a file server (fs00) on Google Cloud Platform (GCP): running Windows Server 2019 (with Desktop Experience installed); one OS/system disk (250 GB SSD); an independent data disk (5 TB, standard tier); backups performed via disk snapshots with…
3 votes, 1 answer

Can an unoptimize dedup job be resumed?

I have a Start-DedupJob -type Unoptimize ... that's been running for 7 hours and is still at 0% progress. I'm reconfiguring the server and need to restart. Disk activity has been pegged to near capacity, with reading and writing often over a…
Louis Waweru
3 votes, 1 answer

ZFS reporting wrong space usage

I have a backup server with ZFS (Ubuntu 16.04; 32 GB RAM, 4x6 TB HDD, raidz2). Recently I found a problem with the available space.
# zpool list -v
NAME  SIZE   ALLOC  FREE   EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
pool  21.6T  19.9T  1.76T  …
3 votes, 1 answer

How to use OverlayFS with Docker volumes?

For some use cases I'd like to be able to create a volume with docker volume create and fill it with data. Then I want to create a new volume that is just a copy of the first one, but no data needs to be copied, only changed files…
tabb
3 votes, 2 answers

Deduplication and cost savings on IaaS object stores (S3/Azure Blobs)

Do any of the commercial IaaS object stores (S3, Azure Blobs etc.) avoid charging multiple times for storing duplicate data (identical files, or parts of files)? For instance, we have a 15 TB dataset of tweets and one of our team wants to make a…
Jedi
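
As far as their public pricing documents go, S3 and Azure Blob Storage bill for every stored byte and do not dedup across objects on your behalf, so this kind of saving is usually done client-side: hash the content, upload each unique blob once, and keep your own index from logical names to hashes. A hedged boto3 sketch of that idea (the bucket name is hypothetical; multipart/ETag subtleties and the index itself are omitted):

    import hashlib

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    BUCKET = "team-dedup-store"  # hypothetical bucket

    def upload_dedup(path):
        # Key objects by their SHA-256 and skip the upload if the key exists,
        # so identical files are stored (and billed) only once.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(1 << 20):
                h.update(chunk)
        key = "blobs/" + h.hexdigest()
        try:
            s3.head_object(Bucket=BUCKET, Key=key)  # already stored
        except ClientError as e:
            if e.response["Error"]["Code"] == "404":
                s3.upload_file(path, BUCKET, key)
            else:
                raise
        return key  # record key -> logical filename in your own index
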