Does the archive size of tar, zip and rar affect the time it takes to delete a file from it?

1

1

Does it take longer to delete files from a large tar, zip, or rar archive than from a smaller one? I would think that for a file to be deleted from an archive, all the data that exists after the deleted file would have to be re-written to the archive, so a larger archive would take longer than a smaller one, where there is less data to re-write. If not, how are these archives able to remove data from the middle of the archive without re-writing the rest of the data?

Daniel

Posted 2013-08-27T21:27:18.193

Reputation: 517

Answers

1

You're exactly right. It depends a bit on the precise archive format and compression used, but generally, at a minimum, all the data stored "after" the deleted file must be rewritten.
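As a rough illustration of that point, here is a minimal sketch that treats an archive as members stored back to back in one byte string. The layout and the `delete_member` helper are assumptions made for the example, not any real archive format; the sketch only counts how many trailing bytes have to be moved when a member is removed.

```python
def delete_member(archive: bytes, offset: int, length: int) -> tuple:
    """Remove `length` bytes starting at `offset`.

    Returns the new archive and the number of trailing bytes that had
    to be moved (rewritten) to close the gap.
    """
    tail = archive[offset + length:]
    return archive[:offset] + tail, len(tail)


# Toy "archive": three members stored back to back (hypothetical layout).
members = [b"A" * 10, b"B" * 20, b"C" * 30]
archive = b"".join(members)

# Deleting the first member moves everything stored after it.
after_first, moved_first = delete_member(archive, 0, 10)

# Deleting the last member moves nothing.
after_last, moved_last = delete_member(archive, 30, 30)

print(moved_first, moved_last)
```

In this toy layout the cost depends on how much data sits after the deleted member, which is why a file near the start of a large archive is the expensive case.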

David Schwartz

Posted 2013-08-27T21:27:18.193

Reputation: 58 310

To be perfectly honest, neither did you explain why the data stored needs to be rewritten. Maybe it's a vocabulary quandary, as rewriting can mean different things to different people. – Doktoro Reichard – 2013-08-29T08:27:52.117

0

tar does not support modifying a compressed archive in place, so you must completely rewrite the tar archive, temporarily keeping an uncompressed copy. The details depend on the tar archive format.

Znik

Posted 2013-08-27T21:27:18.193

Reputation: 259

-1

With regard to @David (the previous poster), I feel that the answer given is somewhat lacking.

Let's analyze the questions:

1. Does it take longer to delete files from a large tar, zip, or rar archive than from a smaller one?

Yes, it does, because the archive is bigger. However, this is an absurd generalization. Consider the two main factors that may affect this: archive size and the number of files archived.

If there is only one file archived, essentially what you're doing is deleting the archive itself. If there are many files, however, the archiving programs (and formats) have different ways to treat files.

Tar, for instance, was designed as a sequential format for storing files in tape archives. One of its disadvantages is that, since there is no "table of contents", a program needs to iterate through the whole archive to find a folder or a file.
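To make the sequential lookup concrete, here is a small sketch using Python's standard `tarfile` module on an in-memory, uncompressed archive. The helper names `build_tar` and `headers_read_until` and the file names are made up for the example; the sketch counts how many member headers are visited before a given name is found.

```python
import io
import tarfile


def build_tar(names):
    """Build an uncompressed tar archive in memory, one small file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = name.encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def headers_read_until(tar_bytes, wanted):
    """Walk the archive from the start; return how many headers were visited."""
    count = 0
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:  # members are yielded in storage order
            count += 1
            if member.name == wanted:
                return count
    return None


names = ["a.txt", "b.txt", "c.txt", "d.txt"]
tar_bytes = build_tar(names)

first = headers_read_until(tar_bytes, "a.txt")
last = headers_read_until(tar_bytes, "d.txt")
print(first, last)
```

The later a member sits in the archive, the more headers have to be read before it turns up, since there is no index to jump to.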

Rar, on the other hand, has an option to create solid archives. A solid archive is one where all the files are compressed together as one big stream. This means that, whenever someone wants to access, edit, add or delete a file, the entire archive must first be decompressed, and then recompressed.

And now we come to something new: compression ratio. If the files are highly compressed, it will take more time, no matter the algorithm, to access them, although this depends on the kind of files being compressed (text files, not .docx, have high redundancy, so they can be de/recompressed quickly).

2. How are these archives able to remove data from the middle of the archive without re-writing the rest of the data?

The reasoning before this question isn't always valid, except for the rar "solid" archive.

Barring Tar (for reasons shown on the Wikipedia link), both zip and rar have something of a "table of contents" that enables the archives to selectively extract data. All this is done without recompressing the existing data, although some things need to be altered inside the archive to tell it that the file no longer exists.

Think of an archive as a small box, where each file is crammed and squeezed in order to fit it. As soon as you take one item out, the box shrinks in order to fill the space.
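The zip "table of contents" can be inspected with Python's standard `zipfile` module. The sketch below builds a small in-memory archive (the file names and contents are made up for the example), lists where each entry's data starts according to the directory, and reads one member without touching the others.

```python
import io
import zipfile

# Build a small zip archive in memory; each member is compressed on its own.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "alpha " * 100)
    zf.writestr("b.txt", "beta " * 100)
    zf.writestr("c.txt", "gamma " * 100)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    # infolist() is populated from the directory stored at the end of the archive.
    entries = [(i.filename, i.header_offset, i.compress_size) for i in zf.infolist()]
    # Only this one member is decompressed.
    text = zf.read("b.txt").decode()

offsets = [offset for _, offset, _ in entries]
for name, offset, size in entries:
    print(name, offset, size)
```

Because the directory records a byte offset for every entry, removing a member and closing the gap means the later members' bytes move and their recorded offsets have to be updated, even though none of them is recompressed.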

Doktoro Reichard

Posted 2013-08-27T21:27:18.193

Reputation: 4 896

You never answered the question you posed in "2". How is it possible? The question wasn't about extraction or recompression but about re-writing. – David Schwartz – 2013-08-29T03:28:16.363

I actually did, if you read about the "table of contents". Essentially each file inside the archive is compressed individually. The only re-writing that might happen is shifting the files that come after (the contents) into the positions occupied by the deleted file. But that is trivial, as you don't need to alter anything in the compressed files. The best comparison would be deleting a line of text inside a document, where each line is a file. You don't have to re-write all the other files just to delete one (except in the rar 'solid' case). – Doktoro Reichard – 2013-08-29T08:02:25.387

So, if the file comes at the beginning of the archive, then you do have to rewrite the entire archive, right? How is that trivial? – David Schwartz – 2013-08-29T18:01:33.393

@David It's trivial in the sense that you don't need to reprocess the remaining files; all the heavy work has already been done. Again, I refer you to my analogy: if you delete a line anywhere in the document, you don't need to write all the lines again. – Doktoro Reichard – 2013-08-29T18:47:00.570