There is some assurance that if you send 20 bytes at the very beginning of a TCP stream, they will not arrive as two 10-byte pieces. This is because the TCP stack has no reason to split such a small amount of data at the start of a stream: 20 bytes fits comfortably within a single segment, whose maximum size (the MSS) is derived from the path MTU. However, if the send happens anywhere in the middle of a stream, all bets are off. It could be that your protocol stack takes 10 bytes of the data to fill a segment and sends it out, and then the next ten bytes go into another segment.
Your protocol stack breaks data into chunks and places them into a queue. The chunk sizes are based on the path MTU. If you perform a send operation while there is still queued data pending, the protocol stack will typically peek at the segment at the tail of the queue and check whether there is room in it for more data. The room could be as small as one byte, so even a two-byte send could be split in two.
On the other end, the segmentation of data means that there can be partial reads. A receive operation can potentially wake up and obtain data as soon as a single segment arrives. In the widely implemented sockets API, a receive call can ask for 20 bytes, but it could return with only 10. Of course, a buffering layer can be built on top of it which blocks until 20 bytes are received, or the connection breaks. In the POSIX world, that layer can be the standard I/O streams: you can `fdopen` a socket descriptor to obtain a `FILE *` stream, and use `fread` on it to fill a buffer, so that the full request is satisfied with as many `read` calls as it takes.
UDP datagrams frame the data. Each send call generates a datagram (but see below about corking). The other side receives a full datagram (and, in the socket API, it must specify a buffer large enough to hold it, or else the datagram will be truncated). Large datagrams get fragmented by IP fragmentation, and are re-assembled transparently to applications. If any fragment is missing, the entire datagram is lost; there is no way to read partial data in that situation.
There exist extensions to the interface that allow multiple send operations to contribute to a single datagram. In Linux, a socket can be "corked" (prevented from sending). While it is corked, written data is accumulated into a single unit. Then, when the socket is "uncorked", that unit goes out as a single datagram.