19

This question was previously asked on Stack Overflow, but the good folks there recommended that I try the community over here instead.

I am researching sparse files with regard to various filesystems and am trying to find something concrete that states whether sparse files are supported by the Network File System (NFS) or Server Message Block (SMB).

I understand that SMB is widely used in Windows and that, according to this entry, an SMB server can support sparse files even if the underlying file system does not. However, if I am right, a file system that does not support sparse files would just fill the 'holes' with zeroes, and this could lead to a performance problem.

With regard to NFS, I have not been able to find anything about whether NFS supports sparse files.

Hence, my question is:

Are sparse files supported in NFS and SMB?

winhung

2 Answers

12

NFS: it has partial support for sparse files. Basically, it supports creating a sparse file but, when reading, the file is expanded to include zeroes. This means that, while you can create a sparse file via NFS, reading that same file back sends every hole over the network as literal zeroes. A simple test shows that behavior:

cd /mnt/nfs
truncate test.img -s 1G
ls -lh test.img

-rw-r--r--. 1 root root 1.0G Oct 26 11:29 test.img

du -hs test.img

0 test.img

As you can see, the test.img file has an on-disk size of 0 bytes. However, reading it back using dd if=test.img of=/dev/null bs=1M iflag=direct shows

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.2269 s, 105 MB/s

The read runs at only 105 MB/s (a local sparse file reads far faster), so it is clear that the sparse file is expanded to include all the zeros when it is transferred.
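The test above mixes two separate questions: whether the file is stored sparsely, and whether it is transferred sparsely. A minimal sketch for checking the first one on any mount point follows. The function name `sparse_report` and the file name `sparse-test.img` are hypothetical, and GNU `truncate` and `stat` are assumed.

```shell
#!/bin/sh
# Hypothetical helper: create a sparse file in the given directory and report
# its apparent size versus the space actually allocated for it.
sparse_report() {
    f="$1/sparse-test.img"
    truncate -s 1M "$f"                 # 1 MiB file consisting only of a hole
    apparent=$(stat -c %s "$f")         # logical size in bytes (what ls -l shows)
    blocks=$(stat -c %b "$f")           # number of allocated blocks
    blocksize=$(stat -c %B "$f")        # size in bytes of each reported block
    echo "apparent=$apparent allocated=$((blocks * blocksize))"
    rm -f "$f"
}
```

Running `sparse_report /mnt/nfs` and then `sparse_report /tmp` lets you compare an NFS mount with a local filesystem. An allocated size well below the apparent size only indicates sparse storage; it says nothing about what crosses the wire on a read, which is what the `dd` timing measures.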

NFSv4.2 extends the protocol with special handling for transferring sparse files over the network. In other words, with NFSv4.2 the above dd should complete almost instantly, provided that both the client and the server implement the new READ_PLUS operation.

SMB: it has the same behavior as NFS, at least in my test environments, using a Samba v3.6.x server with CIFS v1 and a Linux client using mount.cifs. Maybe under Windows it behaves differently...

shodanshok
  • 1
    Can NFS support sparse files if the NFS server's underlying file system does *not* support sparse file? – Andrew Henle Oct 26 '15 at 10:59
  • 2
    @shodanshok: your test is invalid. Executing the same commands on a file system that *does* support sparse files yields the same result. `dd` reads in block by block and whether the underlying file system supports sparse files or not, holes are turned into zeros by the OS. Try it on ext4 and you will see the same numbers. – abligh Oct 26 '15 at 13:01
  • @AndrewHenle if the underlying FS does not support sparse files, how can NFS expose nonexistent support? Anyway, nowadays it is quite difficult to find a filesystem without sparse file support, as all recent (ext3/4, xfs, etc) Linux filesystems support that feature. – shodanshok Oct 26 '15 at 15:41
  • 1
    @abligh You are wrong. Executing the `dd` command over a local sparse file will give much faster results. See here for an example: `root@hubble:~# truncate -s 1G test.img root@hubble:~# dd if=test.img of=/dev/null bs=1M iflag=direct 1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB) copied, 0.10478 s, 10.2 GB/s` As you can see, reading a local sparse file gives I/O speed north of **10 GB/s** – shodanshok Oct 26 '15 at 15:46
  • 2
    @shodanshok - Oh I see, you are looking at the speed, not the amount transferred. Perhaps clarifying that in your answer would be helpful. The canonical test for a file being stored in a sparse way is `du -s` vs `ls -l`, but you are correct that doesn't help with the transmission over the network; but in either case (as `strace` will confirm) `dd` is reading the whole file, including 'holes' as zeroes, the difference only being where the 'zeroes' originate (server or client side). However note (as per my answer) that NFS 4.2 *does* fully support sparse files. – abligh Oct 26 '15 at 15:53
  • @abligh: So what NFS version will allow non-sparse-aware programs to be able to read zeros from holes without the zeros going over the wire? That's useful for partially-downloaded torrents, for example, where a fast client can scan through zeros much faster than 100MB/s even without being sparse-aware. Linux's NFS4.2 implementation still sends the zeros over the wire (Linux4.11.7 server, Linux 4.8.11 client, using `vers=4.2` over TCP). – Peter Cordes Aug 11 '17 at 10:06
  • One of the initial NFS sparse-file support [mailing list messages](https://www.ietf.org/mail-archive/web/nfsv4/current/msg07423.html) suggested this optimization as the first and easiest thing to implement. Did it end up getting left out? If so, I'd hardly say that NFS4.2 fully supports holes. It lets sparse-aware programs skip them, but it seems to fail to optimize reading from holes. It would be useful for files with pre-allocated extents too, which IIRC are different from holes according to `SEEK_HOLE`. – Peter Cordes Aug 11 '17 at 10:07
  • @PeterCordes the NFS 4.2 spec fully supports sparse files (see my own answer to this question). Of course different implementations of both server and client may or may not have sparse file support, as my understanding is that support is optional. So the software version that supports these is going to be a question for the vendor, and I'd expect it might vary between Linux / BSD / Windows. – abligh Aug 11 '17 at 12:07
  • @abligh: But does NFS 4.2 include anything that lets a client & server avoid transferring zeros over the wire when you `dd` a sparse file? Would the client have to use `READ_PLUS` on every read? With a Linux client+server, I can copy sparse files efficiently, but only by using a sparse-aware program like `qemu-img convert -t none -T none -f raw -O raw src dst`. So I have NFS4.2 sparse file support, but it's not speeding up reads by non-sparse-aware programs that behave like `dd`. – Peter Cordes Aug 11 '17 at 12:17
  • @PeterCordes: the client must support the READ_PLUS call. From the IETF docs: `When a client sends a READ operation, it is not prepared to accept a READ_PLUS-style response providing a compact encoding of the scope of holes. If a READ occurs on a sparse file, then the server must expand such data to be raw bytes. If a READ occurs in the middle of a hole, the server can only send back bytes starting from that offset. By contrast, if a READ_PLUS occurs in the middle of a hole, the server can send back a range which starts before the offset and extends past the requested length` – shodanshok Aug 11 '17 at 16:17
  • @shodanshok: IDK if Linux supports this yet. I found http://nfsv4bat.org/Documents/ConnectAThon/2014/prototyping.pdf, and other client/server patches from 2014/2015 about adding support for READ_PLUS to Linux. I don't see any NFS mount options for enabling use of `READ_PLUS`, though, so IDK if this got merged or what. – Peter Cordes Aug 12 '17 at 02:32
  • Ah, [these slides from Mar 2017](http://events.linuxfoundation.org/sites/events/files/slides/AllMar18_0.pdf) say that patches for `READ_PLUS` to speed up reading holes are still a work-in-progress. @abligh: sorry for being a jerk in my earlier comments. I assumed that since Linux supported NFS4.2, it supported and took advantage of all features of it. [This mailing list message](https://www.spinics.net/lists/linux-nfs/msg48443.html) points out some issues (like race conditions) that make it non-trivial for the server to reliably check for holes while servicing a READ_PLUS... – Peter Cordes Aug 12 '17 at 02:37
10

NFS

Yes, NFS 4.2 fully supports sparse files (see this canonical document and this presentation).

Prior to NFS 4.2, the NFS client/server model supported sparse files in the sense that the API supported all POSIX file operations. This meant that writing a sparse file to a server whose backing file system supported sparse files resulted in a sparse file being created (rather than lots of zeros being stored). But reading the file would result in the transmission of a lot of zeroes for the sparse regions. I.e. the answer is 'partially'.

NFS 4.2 adds the ability for the client to 'see' holes in files, and therefore for the server to avoid transmitting all those zeroes. From the Internet-Draft (ID):

1.4.3.  Sparse Files

Sparse files are ones which have unallocated or uninitialized data
blocks as holes in the file.  Such holes are typically transferred as
0s during I/O. READ_PLUS (see Section 15.10) allows a server to send
back to the client metadata describing the hole and DEALLOCATE (see
Section 15.4) allows the client to punch holes into a file.  In
addition, SEEK (see Section 15.11) is provided to scan for the next
hole or data from a given location.

Although the specification supports sparse files, a lazy implementor could still skip sparse file support in either the client or the server.
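The DEALLOCATE operation quoted above is the protocol-level counterpart of punching a hole in a file. A minimal sketch of hole punching from the shell is given below, assuming util-linux `fallocate` and GNU `head` and `stat`; the function name `punch_demo` and the file name are hypothetical. Whether a given NFS client actually turns this request into DEALLOCATE is an assumption that depends on the client and server implementation, so the sketch reports failure instead of assuming success.

```shell
#!/bin/sh
# Hypothetical demo: write real data, then try to punch a hole in the middle.
punch_demo() {
    f="$1/punch-test.img"
    # 4 MiB of non-zero data so that blocks are really allocated.
    head -c 4194304 /dev/urandom > "$f"
    before=$(stat -c %b "$f")
    # Ask to deallocate the middle 2 MiB; the file length must stay the same.
    if fallocate --punch-hole --offset 1048576 --length 2097152 "$f" 2>/dev/null; then
        punched=yes
    else
        punched=no   # filesystem or protocol version lacks hole punching
    fi
    after=$(stat -c %b "$f")
    echo "size=$(stat -c %s "$f") blocks_before=$before blocks_after=$after punched=$punched"
    rm -f "$f"
}
```

Calling `punch_demo /mnt/nfs` on an NFS mount and comparing the result with `punch_demo /tmp` shows whether hole punching is accepted there. A `punched=no` result suggests that the client, the server, or the backing file system does not implement it.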

SMB

I know less about SMB, but I believe it does support sparse files too, if the relevant FS capability bit is set. See here for more info.

abligh