tl;dr
No. >> is essentially "always seek to end of file", while > maintains a pointer to the last written location.
Full answer
(Note: all my tests were done on Debian GNU/Linux 9.)
Another difference
No, they are not equivalent. There is another difference. It may manifest itself regardless of whether the target file existed before or not.
To observe it, run a process that generates data and redirect its output to a file with > or >> (e.g. pv -L 10k /dev/urandom > blob). Let it run and change the size of the file (e.g. with truncate). You will see that > keeps its (growing) offset while >> always appends to the end.
- If you truncate the file to a smaller size (it can be zero size):
  - > won't care; it will write at its desired offset as if nothing happened. Just after the truncation the offset is beyond the end of the file, so the file regains its old size and grows further. The missing data is filled with zeros (in a sparse way, if possible).
  - >> will append to the new end; the file will grow from its truncated size.
- If you enlarge the file:
  - > won't care; it will write at its desired offset as if nothing happened. Just after the size change the offset is somewhere inside the file, so the file stops growing for a while, until the offset reaches the new end. Then the file grows normally.
  - >> will append to the new end; the file will grow from its enlarged size.
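The truncation case can be reproduced deterministically, without pv, by holding one redirection open across several writes. This is only a minimal sketch, assuming a POSIX shell; the file names gt.log and gtgt.log are made up for the illustration, and ": > file" stands in for truncate -s 0.

```shell
#!/bin/sh
# Sketch: how '>' and '>>' react when the file is truncated mid-write.
# File names are hypothetical; ': > file' replaces 'truncate -s 0 file'.
dir=$(mktemp -d) || exit 1

# One '>' redirection held open across both writes.
{
  printf 'AAAA'          # descriptor offset is now 4
  : > "$dir/gt.log"      # something else truncates the file to 0 bytes
  printf 'BBBB'          # still written at offset 4, leaving a 4-byte hole
} > "$dir/gt.log"

# Same sequence with '>>': every write goes to the current end of file.
{
  printf 'AAAA'
  : > "$dir/gtgt.log"
  printf 'BBBB'          # appended at the new end, i.e. at offset 0
} >> "$dir/gtgt.log"

# $(( ... )) strips any padding that some wc implementations print.
gt_size=$(( $(wc -c < "$dir/gt.log") ))
gtgt_size=$(( $(wc -c < "$dir/gtgt.log") ))
echo "size with >:  $gt_size"     # 8: four zero bytes, then BBBB
echo "size with >>: $gtgt_size"   # 4: just BBBB
```

Running od -c on gt.log afterwards shows the four leading zero bytes that > left behind.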
Another example is to append (with a separate >>) something extra while the data-generating process is running and writing to the file. This is similar to enlarging the file.
- The generating process with > will write at its desired offset and eventually overwrite the extra data.
- The generating process with >> will skip the new data and append past it (a race condition may occur and the two streams may get interleaved, but no data should be overwritten).
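The same trick shows the "separate appender" case. Again this is a sketch under assumptions: a POSIX shell, made-up file names, and a single extra printf playing the role of the second process, so no interleaving can occur here.

```shell
#!/bin/sh
# Sketch: a second writer appends while the first redirection is still open.
# File names are hypothetical.
dir=$(mktemp -d) || exit 1

{
  printf 'AAAA'                         # offset of this descriptor: 4
  printf 'xxxxxxxx' >> "$dir/gt.log"    # separate appender adds 8 bytes
  printf 'BBBB'                         # written at offset 4, over the first x's
} > "$dir/gt.log"

{
  printf 'AAAA'
  printf 'xxxxxxxx' >> "$dir/gtgt.log"
  printf 'BBBB'                         # appended after the x's
} >> "$dir/gtgt.log"

gt_content=$(cat "$dir/gt.log")
gtgt_content=$(cat "$dir/gtgt.log")
echo "with >:  $gt_content"     # AAAABBBBxxxx
echo "with >>: $gtgt_content"   # AAAAxxxxxxxxBBBB
```

With > half of the extra data is already gone; with >> both writers' output survives in full.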
Example
Does it matter in practice? There is this question:
I'm running a process which produces a lot of output on stdout. Sending it all to a file [...] Can I use some kind of log rotation program?
This answer says the solution is logrotate with the copytruncate option, which acts like this:
Truncate the original log file in place after creating a copy, instead of moving the old log file and optionally creating a new one.
According to what I wrote above, redirecting with > will make the truncated log large in no time. Sparseness will save the day: no significant disk space should be wasted. Nevertheless, each consecutive log will have more and more leading zeros in it that are completely unnecessary.
But if logrotate creates copies without preserving sparseness, these leading zeros will need more and more disk space every time a copy is made. I haven't investigated the tool's behavior; it may be smart enough about sparseness, or it may compress on the fly (if compression is enabled). Still, the zeros can only cause trouble or be neutral at best; there is nothing good in them.
In this case using >> instead of > is significantly better, even if the target file does not exist yet and is about to be created.
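The growing run of leading zeros can be illustrated with a copy-then-truncate rotation simulated by hand. This is a sketch only: rotate() is a hypothetical stand-in written for this example, not logrotate itself, and the file names are made up. It says nothing about how the real tool copies or compresses.

```shell
#!/bin/sh
# Sketch: copy-then-truncate rotation against a writer using '>' or '>>'.
# rotate() is a hypothetical stand-in for copytruncate, not logrotate.
dir=$(mktemp -d) || exit 1

rotate() {   # $1 = log file, $2 = suffix for the copy
  cp "$1" "$1.$2" && : > "$1"
}

size() {     # apparent size in bytes, without wc padding
  echo $(( $(wc -c < "$1") ))
}

{
  printf 'entry 1\n'; rotate "$dir/gt.log" 1
  printf 'entry 2\n'; rotate "$dir/gt.log" 2
  printf 'entry 3\n'
} > "$dir/gt.log"

{
  printf 'entry 1\n'; rotate "$dir/gtgt.log" 1
  printf 'entry 2\n'; rotate "$dir/gtgt.log" 2
  printf 'entry 3\n'
} >> "$dir/gtgt.log"

gt_sizes="$(size "$dir/gt.log.1") $(size "$dir/gt.log.2") $(size "$dir/gt.log")"
gtgt_sizes="$(size "$dir/gtgt.log.1") $(size "$dir/gtgt.log.2") $(size "$dir/gtgt.log")"
echo "with >:  $gt_sizes"     # 8 16 24
echo "with >>: $gtgt_sizes"   # 8 8 8
```

Each entry is 8 bytes, so with > every log after the first carries 8 more leading zero bytes than the previous one, while with >> every log holds exactly one entry.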
Performance
As we can see, the two operators act differently not only when they begin but also later. This may cause some (subtle?) performance difference. For now I have no meaningful test results to support or disprove it, but I think you shouldn't automatically assume their performance is the same in general.
So >> is essentially "always seek to end of file" while > maintains a pointer to the last written location. Seems that there might be some subtle performance difference in the way they work as well... – Mokubai – 2018-07-23T08:53:42.837
@Mokubai Well said. I'm going to use your first sentence as tl;dr, if you don't mind. – Kamil Maciorowski – 2018-07-23T08:57:11.737
No worries, go for it. ;) – Mokubai – 2018-07-23T08:58:07.377
Where can we find some documented reference about how both > and >> differ? – jjmontes – 2018-07-23T10:51:21.570
On the system call level, >> uses the O_APPEND flag to open(). And actually, > uses O_TRUNC, while >> doesn't. The combination of O_TRUNC | O_APPEND would also be possible, the shell language just doesn't provide that feature. – ilkkachu – 2018-07-23T10:51:49.533
@jjmontes, the standard source would be POSIX: http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap02.html#tag_18_07 but of course Bash's manual also has descriptions on the redirection operators, including the non-standard ones it supports: https://www.gnu.org/software/bash/manual/html_node/Redirections.html – ilkkachu – 2018-07-23T10:53:38.967
@ilkkachu I found this to be of interest, as it explains details about O_APPEND which I was wondering about after your comment :): https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix – jjmontes – 2018-07-23T11:10:47.810
@KamilMaciorowski the reason I suspect there might be some small performance difference is because the "always seek to EOF" implies there is either a checkLengthOfFile() followed by writeAtLocation(endOfFile) while the > simply does a writeAtLocation(currentLocation). Both would trigger the same checks at the filesystem level to make sure that the file is long enough or needs a sparse expansion, but one needs an extra step to check the length first. I would expect with good disk caching that the difference would be somewhere between trivial and non-existent though... – Mokubai – 2018-07-23T12:40:42.007
@Mokubai, Any sane OS would have the file length at hand when it's open, and checking a flag and moving the offset to the end should just disappear in all the other bookkeeping. Trying to emulate O_APPEND with an lseek() before each write() would be different though, there'd be the extra system call overhead. (And of course it wouldn't work, since another process could write() in between.) – ilkkachu – 2018-07-23T12:49:42.017
@ilkkachu I get you. Reading your other comment this is handled all within the filesystem level bits and bobs and is essentially free due to the way the file handle is opened. I admit my comment was from a position of slightly naive and out of date knowledge of reading about how things used to be done... – Mokubai – 2018-07-23T12:57:24.653
@Mokubai Unix/Linux virtual file systems usually implement a vnode structure for every open file. BSD's can be seen at https://man.openbsd.org/vnode.9. I haven't examined BSD closely, but I suspect the void *v_data; /* private data for fs */ field in the structure contains the relevant metadata such as current file length and current file offset. Since write() operations involve locking the structures, figuring out where to write is a simple "look up the file open flags, then use the current length or offset depending on the value of the flags". – Andrew Henle – 2018-07-23T16:28:29.717
Of course, logrotate is not the answer this century. https://superuser.com/a/868519/38062 https://superuser.com/a/291397/38062 https://unix.stackexchange.com/a/392924/5132 – JdeBP – 2018-07-25T14:10:06.467
@AndrewHenle: I believe that you are (at least partly) looking at this backwards. On the one hand, I’m surprised that the file size isn’t featured clearly in the vnode structure; it would seem it’s an attribute common to all filesystem types. But perhaps it is buried under v_data. On the other hand, remember that multiple processes can have the same file open simultaneously, with different file offsets. The file offset can’t be in the inode/vnode structure, but must be in the file (description) structure; there can be multiple file table entries for a single file. … (Cont’d) – Scott – 2018-07-30T02:51:56.730
(Cont’d) … I could find only a very old copy of file.h here. This course handout shows the relationship between the structures. This code from BSD 4 shows f_offset being copied from the file structure to u.u_offset, and this shows readi and writei using u.u_offset as the offset into the file. … (Cont’d) – Scott – 2018-07-30T03:03:16.843
(Cont’d) … P.S. I couldn’t find where O_APPEND is handled. – Scott – 2018-07-30T03:03:18.967