Is redirection with `>>` equivalent to `>` when target file doesn't yet exist?

80

18

Consider a shell like Bash or sh. The basic difference between > and >> manifests itself when the target file exists:

  • > truncates the file to zero size, then writes;
  • >> doesn't truncate, it writes (appends) to the end of the file.

If the file does not exist it is created with zero size; then written to. This is true for both operators. It may seem the operators are equivalent when the target file doesn't yet exist.

Are they really?

Kamil Maciorowski

Posted 2018-07-23T08:42:56.713

Reputation: 38 429

Answers

107

tl;dr

No. >> is essentially "always seek to end of file" while > maintains a pointer to the last written location.


Full answer

(Note: all my tests done on Debian GNU/Linux 9).

Another difference

No, they are not equivalent. There is another difference. It may manifest itself regardless of whether the target file existed before or not.

To observe it, run a process that generates data and redirect to a file with > or >> (e.g. pv -L 10k /dev/urandom > blob). Let it run and change the size of the file (e.g. with truncate). You will see that > keeps its (growing) offset while >> always appends to the end.

  • If you truncate the file to a smaller size (it can be zero size)
    • > won't care, it will write at its own offset as if nothing happened. Right after the truncation that offset is beyond the end of the file, so the next write makes the file regain its old size and grow further; the missing data is filled with zeros (sparsely, if possible);
    • >> will append to the new end, the file will grow from its truncated size.
  • If you enlarge the file
    • > won't care, it will write at its own offset as if nothing happened. Right after the size change that offset is somewhere inside the file, so the file stops growing for a while; once the offset reaches the new end, the file grows normally again;
    • >> will append to the new end, the file will grow from its enlarged size.
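
The experiment above can be reproduced deterministically without pv. This is only a minimal sketch: it assumes GNU coreutils truncate is available, and the function and file names (write_with_resize, shrink_gt and so on) are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: resize a file while a writer still holds it open via > or >>.
# Assumes GNU coreutils `truncate`; all names here are hypothetical.
dir=$(mktemp -d) || exit 1

# Write 10 bytes, resize the target behind the writer's back, write 5 more.
# $1 = path of the file being written, $2 = size to force mid-stream.
write_with_resize() {
  printf 'AAAAAAAAAA'
  truncate -s "$2" "$1"
  printf 'BBBBB'
}

# Print the size of a file in bytes, without padding.
size_of() { wc -c < "$1" | tr -d ' '; }

write_with_resize "$dir/shrink_gt"   0   >  "$dir/shrink_gt"
write_with_resize "$dir/shrink_gtgt" 0   >> "$dir/shrink_gtgt"
write_with_resize "$dir/grow_gt"     100 >  "$dir/grow_gt"
write_with_resize "$dir/grow_gtgt"   100 >> "$dir/grow_gtgt"

shrink_gt=$(size_of "$dir/shrink_gt")      # 15: 10 zero bytes, then BBBBB
shrink_gtgt=$(size_of "$dir/shrink_gtgt")  # 5: BBBBB appended at the new end
grow_gt=$(size_of "$dir/grow_gt")          # 100: BBBBB landed inside the file
grow_gtgt=$(size_of "$dir/grow_gtgt")      # 105: BBBBB appended past the end

echo "shrink: >  $shrink_gt bytes, >> $shrink_gtgt bytes"
echo "grow:   >  $grow_gt bytes, >> $grow_gtgt bytes"
rm -rf "$dir"
```

Piping the shrunk > file through od -c (before it is removed) would show the ten leading zero bytes.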

Another example is to append (with a separate >>) something extra when the data generating process is running and writing to the file. This is similar to enlarging the file.

  • The generating process with > will write at its desired offset and overwrite the extra data eventually.
  • The generating process with >> will skip the new data and append past it (a race condition may occur and the two streams may get interleaved, but no data should be overwritten).
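
A small sketch of this second scenario, with the extra append placed at a fixed point in the stream so the result does not depend on timing. The function name gen and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: something else appends to the file while the generator is mid-stream.
# All names here are hypothetical.
dir=$(mktemp -d) || exit 1

# $1 = path of the file the generator's stdout is redirected to.
gen() {
  printf 'AAAAA\n'
  printf 'EXTRA\n' >> "$1"     # a separate >> adds data behind the generator
  printf 'BBBBBBBBBB\n'        # longer than EXTRA, so it can fully cover it
}

gen "$dir/gt"   >  "$dir/gt"
gen "$dir/gtgt" >> "$dir/gtgt"

# With > the generator kept writing at its own offset and overwrote EXTRA.
if grep -q EXTRA "$dir/gt"; then extra_in_gt=yes; else extra_in_gt=no; fi
# With >> the generator appended after EXTRA, so it is still there.
if grep -q EXTRA "$dir/gtgt"; then extra_in_gtgt=yes; else extra_in_gtgt=no; fi

echo "EXTRA survives with >:  $extra_in_gt"
echo "EXTRA survives with >>: $extra_in_gtgt"
lines_gtgt=$(wc -l < "$dir/gtgt" | tr -d ' ')
rm -rf "$dir"
```

Here the appends are strictly sequential, so there is no interleaving; with truly concurrent writers the order of the appended chunks would not be guaranteed.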

Example

Does it matter in practice? There is this question:

I'm running a process which produces a lot of output on stdout. Sending it all to a file [...] Can I use some kind of log rotation program?

This answer says the solution is logrotate with the copytruncate option, which acts like this:

Truncate the original log file in place after creating a copy, instead of moving the old log file and optionally creating a new one.

According to what I wrote above, redirecting with > will make the truncated log large in no time. Sparseness will save the day, no significant disk space should be wasted. Nevertheless each consecutive log will have more and more leading zeros in it that are completely unnecessary.

But if logrotate creates copies without preserving sparseness, these leading zeros will need more and more disk space every time a copy is made. I haven't investigated the tool's behavior; it may be smart enough about sparseness, or compress on the fly (if compression is enabled). Still, the zeros can only cause trouble or be neutral at best; there is nothing good in them.

In this case using >> instead of > is significantly better, even if the target file does not exist yet.
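
To illustrate how the leading zeros pile up, here is a rough sketch that imitates copy-then-truncate rotation by hand. It does not run logrotate itself: the cp plus truncate pair is only my stand-in for what copytruncate is described to do, GNU coreutils truncate is assumed, and all names are hypothetical.

```shell
#!/bin/sh
# Sketch: hand-rolled "copy, then truncate in place" rotation between writes.
# This imitates the described copytruncate behavior; it is NOT logrotate.
# Assumes GNU coreutils `truncate`; all names here are hypothetical.
dir=$(mktemp -d) || exit 1

# $1 = path of the log the writer's stdout is redirected to.
writer() {
  for n in 1 2 3; do
    printf 'log entry\n'       # 10 bytes of real data per round
    cp "$1" "$1.$n"            # keep a copy of the current log ...
    truncate -s 0 "$1"         # ... then empty the log in place
  done
  printf 'log entry\n'
}

# Print the size of a file in bytes, without padding.
size_of() { wc -c < "$1" | tr -d ' '; }

writer "$dir/gt.log"   >  "$dir/gt.log"
writer "$dir/gtgt.log" >> "$dir/gtgt.log"

# Sizes of the three rotated copies followed by the live log.
gt_sizes="$(size_of "$dir/gt.log.1") $(size_of "$dir/gt.log.2") $(size_of "$dir/gt.log.3") $(size_of "$dir/gt.log")"
gtgt_sizes="$(size_of "$dir/gtgt.log.1") $(size_of "$dir/gtgt.log.2") $(size_of "$dir/gtgt.log.3") $(size_of "$dir/gtgt.log")"

echo "with >:  $gt_sizes"      # 10 20 30 40
echo "with >>: $gtgt_sizes"    # 10 10 10 10
rm -rf "$dir"
```

Every round holds only 10 bytes of real data, yet with > each file is 10 bytes larger than the previous one; the difference is leading zeros.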


Performance

As we can see, the two operators act differently not only when they begin but also later. This may cause some (subtle?) performance difference. For now I have no meaningful test results to support or disprove it, but I think you shouldn't automatically assume their performance is the same in general.

Kamil Maciorowski

Posted 2018-07-23T08:42:56.713

Reputation: 38 429

9 So >> is essentially "always seek to end of file" while > maintains a pointer to the last written location. Seems that there might be some subtle performance difference in the way they work as well... – Mokubai – 2018-07-23T08:53:42.837

@Mokubai Well said. I'm going to use your first sentence as tl;dr, if you don't mind. – Kamil Maciorowski – 2018-07-23T08:57:11.737

No worries, go for it. ;) – Mokubai – 2018-07-23T08:58:07.377

Where can we find some documented reference about how both > and >> differ? – jjmontes – 2018-07-23T10:51:21.570

10

On the system call level, >> uses the O_APPEND flag to open(). And actually, > uses O_TRUNC, while >> doesn't. The combination of O_TRUNC | O_APPEND would also be possible, the shell language just doesn't provide that feature.

– ilkkachu – 2018-07-23T10:51:49.533

3

@jjmontes, the standard source would be POSIX: http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap02.html#tag_18_07 but of course Bash's manual also has descriptions on the redirection operators, including the non-standard ones it supports: https://www.gnu.org/software/bash/manual/html_node/Redirections.html

– ilkkachu – 2018-07-23T10:53:38.967

2

@ilkkachu I found this to be of interest, as it explains details about O_APPEND which I was wondering about after your comment :): https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix

– jjmontes – 2018-07-23T11:10:47.810

@KamilMaciorowski the reason I suspect there might be some small performance difference is that the always-seek-to-EOF behavior implies a checkLengthOfFile() followed by writeAtLocation(endOfFile), while > simply does a writeAtLocation(currentLocation). Both would trigger the same checks at the filesystem level to make sure that the file is long enough or needs a sparse expansion, but one needs an extra step to check the length first. I would expect with good disk caching that the difference would be somewhere between trivial and non-existent though... – Mokubai – 2018-07-23T12:40:42.007

1 @Mokubai, Any sane OS would have the file length at hand when it's open, and checking a flag and moving the offset to the end should just disappear in all the other bookkeeping. Trying to emulate O_APPEND with an lseek() before each write() would be different though, there'd be the extra system call overhead. (And of course it wouldn't work, since another process could write() in between.) – ilkkachu – 2018-07-23T12:49:42.017

1 @ilkkachu I get you. Reading your other comment this is handled all within the filesystem level bits and bobs and is essentially free due to the way the file handle is opened. I admit my comment was from a position of slightly naive and out of date knowledge of reading about how things used to be done... – Mokubai – 2018-07-23T12:57:24.653

@Mokubai Unix/Linux virtual file systems usually implement a vnode structure for every open file. BSD's can be seen at https://man.openbsd.org/vnode.9. I haven't examined BSD closely, but I suspect the void *v_data; /* private data for fs */ field in the structure contains the relevant metadata such as current file length and current file offset. Since write() operations involve locking the structures, figuring out where to write is a simple "look up the file open flags, then use the current length or offset depending on the value of the flags".

– Andrew Henle – 2018-07-23T16:28:29.717

@AndrewHenle: I believe that you are (at least partly) looking at this backwards. On the one hand, I’m surprised that the file size isn’t featured clearly in the vnode structure; it would seem it’s an attribute common to all filesystem types. But perhaps it is buried under v_data. On the other hand, remember that multiple processes can have the same file open simultaneously, with different file offsets. The file offset can’t be in the inode/vnode structure, but must be in the file (description) structure; there can be multiple file table entries for a single file. … (Cont’d) – Scott – 2018-07-30T02:51:56.730

(Cont’d) …  I could find only a very old copy of file.h here. This course handout shows the relationship between the structures. This code from BSD 4 shows f_offset being copied from the file structure to u.u_offset, and this shows readi and writei using u.u_offset as the offset into the file.  … (Cont’d)

– Scott – 2018-07-30T03:03:16.843

(Cont’d) … P.S. I couldn’t find where O_APPEND is handled. – Scott – 2018-07-30T03:03:18.967