Can you tell by the network traffic whether a video was watched or downloaded from YouTube?

1

1

My question is about popular YouTube downloaders like youtube-dl (a command-line program) or VideoDownloadHelper (a Firefox browser extension).

Comparing two cases:

  1. Watching a video on YouTube
  2. Downloading the video using a downloader (to be specific, let's assume youtube-dl)

Is it possible to tell – for instance by inspecting the network traffic – that the video was downloaded and not "only watched" on YouTube?

Maybe one could compare network traffic using programs like Wireshark? I cannot do that myself, but maybe this will help somebody to answer the question.

humanityANDpeace

Posted 2012-11-10T15:35:54.923

Reputation: 642

Unless it's some kind of special player+stream combo using anti-copying measures, when online videos are played they're also downloaded to your local machine, and you can copy them from your browser cache. – Karan – 2012-11-10T16:00:08.803

I'll have to do some background research on this browser cache. Maybe that way I can find a method or some software whose downloads look no different from simply watching a video. – humanityANDpeace – 2012-11-10T16:28:07.403

Answers

2

Yes, it's possible to differentiate between these two use cases when looking at network traffic. The simple explanation is:

  • When you're downloading the raw video file with youtube-dl, you're loading a complete file at once.
  • When you're watching a YouTube video through the browser, the Flash client downloads the video in chunks. The chunks fill up a buffer, and once that buffer is about to run out, the player fetches the next chunks.

Both can be done through HTTP these days. You can observe the client behavior when you load up a video. It is never completely downloaded at once: the buffer is played out, then the next part is loaded. This is visible in the network traffic, because multiple requests for one resource are sent to YouTube over the course of time.
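As a rough illustration of how that difference could be spotted in a capture, here is a minimal Python sketch. It assumes you have already extracted, for one video, a list of (timestamp, bytes) pairs for the responses seen on the wire, for example from a Wireshark export. The function name, the 2-second idle threshold and the minimum gap count are hypothetical values chosen for illustration, not anything YouTube or youtube-dl defines.

```python
from typing import List, Tuple


def looks_like_streaming(
    events: List[Tuple[float, int]],
    idle_gap_s: float = 2.0,   # assumed threshold: a pause this long counts as "buffer full"
    min_gaps: int = 3,         # assumed: how many pauses we want to see before deciding
) -> bool:
    """Guess whether a trace resembles a player refilling its buffer.

    events: (timestamp in seconds, bytes transferred) per observed response.
    Returns True if the requests are separated by several idle pauses,
    False if the data arrived in one continuous burst.
    """
    times = sorted(t for t, _ in events)
    # Time between consecutive requests for the same video.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    idle_pauses = sum(1 for gap in gaps if gap >= idle_gap_s)
    return idle_pauses >= min_gaps


# A player-like trace: quick initial buffering, then one chunk every ~5 s.
player_trace = [(0.0, 500_000), (0.2, 500_000), (0.4, 500_000),
                (5.0, 500_000), (10.0, 500_000), (15.0, 500_000), (20.0, 500_000)]

# A downloader-like trace: a single request that pulls the whole file.
download_trace = [(0.0, 50_000_000)]

print(looks_like_streaming(player_trace))    # True
print(looks_like_streaming(download_trace))  # False
```

This only looks at request timing; a downloader that deliberately paces its requests would pass the same check.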

To cite Kuschnig et al. (see below):

A video segment is split into chunks of size l_ch, which are served by a standard HTTP server. The download of the video chunks is coordinated by the client. For that purpose, the client maintains n_c HTTP-based request-response streams and schedules the downloads of the different chunks by using a separate queue for each stream.
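To make the quoted scheme a bit more concrete, here is a small Python sketch of the bookkeeping only (no network access). The chunk size and stream count from the quote are called l_ch and n_c here. Assigning chunks to the queues round-robin is my own assumption for illustration; the paper may schedule them differently, and the function name is hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def schedule_chunks(segment_bytes: int, l_ch: int, n_c: int) -> List[Deque[Tuple[int, int]]]:
    """Split a segment into byte ranges of at most l_ch bytes and
    distribute them over n_c per-stream queues.

    Each queue entry is an inclusive (first_byte, last_byte) pair, the
    same convention an HTTP Range header uses.
    Round-robin assignment is an assumption made for this sketch.
    """
    if l_ch <= 0 or n_c <= 0:
        raise ValueError("l_ch and n_c must be positive")

    queues: List[Deque[Tuple[int, int]]] = [deque() for _ in range(n_c)]
    for index, start in enumerate(range(0, segment_bytes, l_ch)):
        end = min(start + l_ch, segment_bytes) - 1  # last chunk may be shorter
        queues[index % n_c].append((start, end))
    return queues


# A 10-byte "segment", 4-byte chunks, two streams.
for stream_id, queue in enumerate(schedule_chunks(10, 4, 2)):
    print(stream_id, list(queue))
```

Each stream would then issue one request per queue entry, which is why a capture shows many requests for a single video.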

If you want more specifics about YouTube streaming traffic, I could of course explain more. We are currently conducting various simulated experiments on optimizing YouTube buffering and analyzing diverse video streaming scenarios.

Further reading:

  • Kuschnig, Robert, Ingo Kofler, and Hermann Hellwagner. "Evaluation of HTTP-based request-response streams for internet video streaming." Proceedings of the second annual ACM conference on Multimedia systems. ACM, 2011. (PDF)

  • Stockhammer, Thomas. "Dynamic adaptive streaming over HTTP: standards and design principles." Proceedings of the second annual ACM conference on Multimedia systems. ACM, 2011. (PDF)

slhck

Posted 2012-11-10T15:35:54.923

Reputation: 182 472

This answer was written in 2012 and the points it makes are somewhat misleading in today's environment. YouTube in particular is being quite aggressive with their DASH deployment, in fact requiring the use of that fragmentation protocol if you want to obtain the highest quality content rendition. Meanwhile, as regards the OP's question about distinguishing automated access, the fact that youtube-dl now supports DASH seems to obscure the premise of this answer.

– Glenn Slayden – 2017-01-31T02:02:44.660

@GlennSlayden You're right in saying that DASH is now predominant as a streaming technology in YouTube, and that youtube-dl is using this protocol to fetch content. However, the traffic itself should look different, as a regular player download would fill the buffer and then enter an oscillating state where it depletes the buffer to a certain extent, then fills it up again. I am assuming that youtube-dl would do a best-effort download at full rates. (Of course, this remains to be verified…) – slhck – 2017-01-31T08:54:58.597

@slhck Good points. Accurately simulating the oscillation pattern of a real-time download would quickly devolve to simply having the unattended download proceed in real time. If some automatic process does need to maintain that particular fiction, it could still try to obtain the same "net" (pun) bandwidth by pulling multiple feeds in parallel. The client would present an IP address just as suspiciously over-ravenous as before, barring specific subterfuge in that regard. – Glenn Slayden – 2017-01-31T10:22:30.257

So is it not entirely possible for a downloader to mimic data requests in a manner similar to that of the Flash client? Along with the correct user agent string and what not, would it still be possible to differentiate? – Karan – 2012-11-10T22:03:00.230

Well, then of course it's splitting hairs between what's a proper video client or merely a downloader acting as such :) You're right of course: You could definitely mimic video player requests, and changing user agent strings would be another way to obscure traffic. I'm sure if you're clever enough you could fool any detection algorithm. – slhck – 2012-11-10T22:07:09.127

True. Referring to the original question, Google/the music industry is not so stupid as to be ignorant of the fact that content can be and is downloaded (often with multiple connections to the server using download accelerators). Guess either they don't care as long as it's for personal use, or don't want to reduce their popularity and/or spark off an arms race by introducing some form of DRM, or whatever. In any case, I doubt there'd be much left if all copyrighted content not uploaded by the copyright owners themselves were to be removed from YouTube. :) – Karan – 2012-11-10T22:14:11.267

-1

Yes, it is different (at least in the special case of youtube-dl): the traffic generated while watching on the youtube.com website uses an encrypted https:// transfer, whereas the traffic generated by youtube-dl uses unencrypted http://.

Anybody sniffing the packets can therefore tell that the file was not watched on YouTube, at least not in the ordinary way.

humanityANDpeace

Posted 2012-11-10T15:35:54.923

Reputation: 642

Can't youtube-dl be made to use an https connection? Does YouTube always use https? – Karan – 2012-11-10T16:36:56.363

I see no reason why it should be impossible to make youtube-dl use https connections. Still, handling https is a little trickier, and it does not seem to be required to achieve the goal (providing a mechanism to download the resource). Even with https connections, it would still not achieve the side goal of downloading the data in a way that mimics watching the video, since I doubt the elaborate behaviour of the browser is imitated. I think youtube-dl is more like a small Python app. – humanityANDpeace – 2012-11-12T08:57:59.907

Why the downvote? It answers the question by showing an example of a case where the traffic is different, so it at least partially responds to the question. It took some work to use Wireshark and investigate this. I feel unappreciated for this work. – humanityANDpeace – 2012-11-12T08:59:34.597

Although I don't know who downvoted, don't take it so seriously. It's just how the site works. – Karan – 2012-11-12T16:51:37.123

@Karan: Thanks for the consolation. Still, I am confused to see a downvote on a "not-wrong", even partly helpful answer of mine. Instead of downvoting, I would rather see better answers voted up. I am confused, since I thought the site works in such a way that wrong answers are downvoted. – humanityANDpeace – 2012-11-14T09:24:10.843

Voting is done by people, and since when have people been known to always do what's sane or "right"? :) – Karan – 2012-11-14T23:33:07.637