Append-only
Append-only is a property of computer data storage such that new data can be appended to the storage, but where existing data is immutable.
Access control
Many file systems' Access Control Lists implement an "append-only" permission:
- chattr in Linux can be used to set the append-only flag on files and directories. This corresponds to the O_APPEND flag in open().[1]
- NTFS ACL has a control for "Create Folders / Append Data", but it does not seem to keep data immutable.[2]
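As an illustration of the O_APPEND behaviour mentioned above, the following minimal sketch opens a file in append mode and shows that a write lands at the end of the file even after an explicit seek to the start. The helper names `append_record` and `demo` and the file name `audit.log` are hypothetical. Note that O_APPEND by itself only affects the descriptor it is set on; it is the file attribute set with chattr that stops other programs from opening the file for ordinary writing.

```python
import os
import tempfile


def append_record(path: str, data: bytes) -> None:
    """Append data to the file at path using an O_APPEND descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        # Seeking back to the start does not help an overwrite attempt:
        # with O_APPEND the offset is moved to end-of-file before each write.
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, data)
    finally:
        os.close(fd)


def demo() -> bytes:
    """Write two records and return the resulting file contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "audit.log")  # hypothetical file name
        append_record(path, b"first\n")
        append_record(path, b"second\n")
        with open(path, "rb") as f:
            return f.read()
```

On a POSIX system, `demo()` should return both records in the order they were written, because the second write could not overwrite the first.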
Many cloud storage providers offer the ability to limit access to append-only.[3] This feature is especially important for backup policies: it mitigates the risk of data loss if the computer being backed up becomes infected with ransomware capable of deleting or encrypting that computer's backups.[4][5]
Data structures
Many data structures and databases implement immutable objects, effectively making their data structures append-only. Implementing an append-only data structure has many benefits, such as ensuring data consistency, improving performance,[6] and permitting rollbacks.[7][8]
The prototypical append-only data structure is the log file. Log-structured data structures found in log-structured file systems and databases work in a similar way: every change (transaction) that happens to the data is logged by the program, and on retrieval the program must combine the pieces of data found in this log file.[9] Blockchains add cryptography to the logs so that every transaction is verifiable.
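The log-and-replay idea described above can be sketched as a toy in-memory key-value log. Each entry also stores a hash chained to the previous entry, which is a simplified stand-in for the verifiability that blockchains provide (real blockchains additionally rely on signatures and consensus). The class name `AppendOnlyLog`, its methods, and the `GENESIS` constant are hypothetical names chosen for this example.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional, Tuple

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AppendOnlyLog:
    """Toy log: entries are only ever appended, never edited in place."""

    def __init__(self) -> None:
        # Each entry is (payload_json, chained_hash).
        self._entries: List[Tuple[str, str]] = []

    def append(self, op: str, key: str, value: Any = None) -> int:
        """Record a change and return the number of entries so far."""
        payload = json.dumps({"op": op, "key": key, "value": value}, sort_keys=True)
        prev_hash = self._entries[-1][1] if self._entries else GENESIS
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._entries.append((payload, digest))
        return len(self._entries)

    def replay(self, upto: Optional[int] = None) -> Dict[str, Any]:
        """Rebuild the state from the first `upto` entries (all if None)."""
        state: Dict[str, Any] = {}
        for payload, _ in self._entries[:upto]:
            entry = json.loads(payload)
            if entry["op"] == "set":
                state[entry["key"]] = entry["value"]
            elif entry["op"] == "delete":
                state.pop(entry["key"], None)
        return state

    def verify(self) -> bool:
        """Recompute the hash chain; any edited entry breaks it."""
        prev_hash = GENESIS
        for payload, digest in self._entries:
            expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
            if expected != digest:
                return False
            prev_hash = digest
        return True
```

Because nothing is overwritten, replaying only a prefix of the log (the `upto` argument) recovers an earlier version of the data, which is how an append-only design permits rollbacks.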
Append-only data structures may also be mandated by the hardware or software environment:
- All objects are immutable in purely functional programming languages, where every function is pure and global states do not exist.[10]
- Flash storage cells can only be written to once before erasing. Erasing on a flash drive works on the level of pages which cover many cells at once, so each page is treated as an append-only set of cells until it fills up.[9][11]
- Hard drives that use shingled magnetic recording cannot be written to randomly because writing on a track would clobber a neighboring, usually later, track. As a result, each "zone" on the drive is append-only.[12][6]
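The hardware constraints in the list above can be modelled, very roughly, as a region with a write pointer that only moves forward until the whole region is reset. The sketch below is a toy model rather than a real device interface; `Zone`, `ZoneFullError`, and the method names are hypothetical.

```python
class ZoneFullError(Exception):
    """Raised when an append would not fit in the remaining space."""


class Zone:
    """Toy model of an append-only zone with a forward-only write pointer."""

    def __init__(self, capacity: int) -> None:
        self._data = bytearray(capacity)
        self._write_pointer = 0  # next writable offset

    def append(self, chunk: bytes) -> int:
        """Write chunk at the write pointer and return its start offset."""
        if self._write_pointer + len(chunk) > len(self._data):
            raise ZoneFullError("zone is full; it must be reset before reuse")
        start = self._write_pointer
        self._data[start:start + len(chunk)] = chunk
        self._write_pointer += len(chunk)
        return start

    def read(self, offset: int, length: int) -> bytes:
        """Read previously written bytes; random reads are allowed."""
        if offset < 0 or offset + length > self._write_pointer:
            raise ValueError("read beyond the written part of the zone")
        return bytes(self._data[offset:offset + length])

    def reset(self) -> None:
        """Discard the whole zone at once, the only way to reclaim space."""
        self._write_pointer = 0
```

There is deliberately no method for writing at an arbitrary offset: the only operations are appending, reading, and resetting the entire zone.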
Append-only data structures grow over time, with more and more space dedicated to "stale" data found only in the history and more time spent parsing that data. A number of append-only systems implement rewriting (copying garbage collection), in which a new structure is created that contains only the current version and, optionally, a few older ones.[7][13]
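A minimal sketch of such rewriting, assuming a log whose entries are `(op, key, value)` tuples with the hypothetical operations `"set"` and `"delete"`: the function copies only the live data into a new log and leaves the old one untouched. The name `compact` and the entry layout are illustrative, not taken from any particular system.

```python
from typing import Any, Dict, List, Tuple

Entry = Tuple[str, str, Any]  # (op, key, value); layout assumed for this sketch


def compact(entries: List[Entry]) -> List[Entry]:
    """Return a new log holding only the current value of each live key."""
    latest: Dict[str, Any] = {}
    for op, key, value in entries:
        if op == "set":
            # Remove and re-insert so ordering follows the most recent write.
            latest.pop(key, None)
            latest[key] = value
        elif op == "delete":
            latest.pop(key, None)
    # The old log is not modified; stale entries are simply not copied over.
    return [("set", key, value) for key, value in latest.items()]
```

For example, a log that sets `a`, sets `b`, overwrites `a`, and then deletes `b` compacts to a single entry holding the latest value of `a`. Once the new log is in place, the old one can be discarded as a whole.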
See also
- Access control list
- Cloud storage
- Comparison of file hosting services
- Data structure
- Purely-functional data structure
- Log-structured merge-tree
References
- – Linux User's Manual – User Commands
- "powershell - How to give "only append" access to user in windows , for logging purposes". Server Fault.
- Jim Donovan (September 11, 2018). "Why Use Immutable Storage?". Wasabi.
- Eugene Kolodenker, William Koch, Gianluca Stringhini, Manuel Egele (April 2017). "PayBreak: Defense Against Cryptographic Ransomware". Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security: 599–611. doi:10.1145/3052973.3053035.
Due to the threat of ransomware targeting the key vault, our implementation stores the harvested key material into an append-only file protected with Administrator privileges.
- Pont, Jamie; Abu Oun, Osama; Brierley, Calvin; Arief, Budi; Hernandez-Castro, Julio (2019). "A Roadmap for Improving the Impact of Anti-ransomware Research". Secure IT Systems, Proceedings of 24th Nordic Conference, NordSec 2019. Springer International Publishing. pp. 137–154. ISBN 978-3-030-35055-0.
- Magic Pocket Hardware Engineering Teams. "Extending Magic Pocket Innovation with the first petabyte scale SMR drive deployment". dropbox.tech.
- "Redis Persistence". Redis.
- "Additional Notes". Borg Deduplicating Archiver 1.1.11 documentation.
- Reid, Colin; Bernstein, Phil (1 January 2010). "Implementing an Append-Only Interface for Semiconductor Storage" (PDF). IEEE Data Eng. Bull. 33: 14–20.
- "Thirteen ways of looking at a turtle". F# for fun and profit. Retrieved 2018-11-13.
- "NVMe Zoned Namespace". ZonedStorage.io.
The internals of Solid State Drives are such that they implement a log-structured data structure, where data is written sequentially to the media.
- Jake Edge (March 26, 2014). "Support for shingled magnetic recording devices". LWN.net. Retrieved December 14, 2014.
- Brewer, Eric; Ying, Lawrence; Greenfield, Lawrence; Cypher, Robert; T'so, Theodore (2016). "Disks for Data Centers". Proceedings of USENIX FAST 2016.
Because of the write restrictions imposed by SMR, when data is deleted, that deleted capacity can not be reused until the system copies the remaining live data in that SMR zone to another part of the disk, a form of garbage collection (GC).