Why do images from some Tumblr pages not load, but using wget on them works?

8

4

While helping a friend whose Internet connection had the problem that “some pages won’t load”, I noticed that the images in certain blogs' image posts weren’t loading in the browser. I found it weird for the following reasons:

  1. Only images that are part of the post won’t load. User avatars, banners, headers, various theme and/or page-related images still appear.
  2. Happens with any browser on the computer (Tested on Firefox and Chrome/ium both with and without ad/script blockers).
  3. Using wget on the images' direct links works.
  4. This does not apply to all Tumblr pages. Most load properly, but a list of the pages whose posts don’t load images shows that they’re mostly from the same bunch of users.
  5. The problem seems to be blog-specific in the sense that if a certain blog's image post doesn't load in the browser, other blogs (unaffected or not) that reblogged the same post won't load the image in the browser either. Conversely, if an affected blog reblogs from an unaffected one, the image loads fine.
  6. The images are from user-created Tumblr posts where the user uploads an image to post and are hosted by Tumblr. For example (this example is not one of the affected blogs), in this image post (randomly selected), this would be the direct link to the image in the post. Image posts automatically make the images a link to another page in Tumblr using a (usually) larger version of the image used in the post that is closer to the size of what the user uploaded for the post.

What can possibly be the reason for this happening? The part that really gets me is the fact that wget works, so I think I can assume that it’s not a problem with the network connection.

Update:

Here is an example of a reblogged post that fails to load on the browsers. The main blog has other image posts that load properly. This is the direct link to the image in the post and here is the one for the bigger version (both don't load here). wget works for both, but upon going to any direct link with Firefox, this error appears:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

<Error>
    <Code>AccessDenied</Code>
    <Message>Access Denied</Message>
    <RequestId>A626307DF577B411</RequestId>
    <HostId>J9GxX1HY9vX3ElWjYf7M48ByvKXLRIwRBJ2al2voS3J/C+WhILWHyd3crFhhNtkXuvG0zaxBTxw=</HostId>
</Error>

RequestId and HostId change every time. My friend and I are located in the Philippines.
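When checking many direct links, it can help to pull the fields out of that error body instead of reading the XML by eye. The following is a minimal sketch, assuming the body always has the shape shown above (an `Error` root with flat child elements); the function name `parse_error_body` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_error_body(xml_text: str) -> Dict[str, str]:
    """Return the child elements of an <Error> document as a dict.

    Assumes the flat layout shown in the question; nested elements are ignored.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "Error":
        raise ValueError("not an <Error> document: root is <%s>" % root.tag)
    # Each direct child (Code, Message, RequestId, HostId) becomes one entry.
    return {child.tag: (child.text or "") for child in root}
```

Calling it on the response above gives a dict whose `Code` entry is `AccessDenied`, which is easier to log next to each URL than the raw XML.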

Update [2015/03/08]

After further tests, and while replying to emails from Tumblr support, I found that wget has also stopped working on some occasions (getting 403 errors on direct links).

Update [2015/03/09]

Turning off the Tumblr rules for HTTPS-Everywhere seems to sometimes fix the problem.


Note:

  • In the example for #6, direct links both point to the same image. Usually, though, the one used in the image post (as compared to the zoomable image page) uses a smaller version of the image to fit the theme of the page. The example uses a theme made for larger screens so it does not need the smaller version.

maki57

Posted 2015-03-04T23:54:27.683

Reputation: 327

Did I read 5 correctly, that other people cannot view images that are reblogged by the person with the issue? – Paul – 2015-03-04T23:57:32.867

I posted an answer, but what might help is if you could provide actual URLs to the blog posts which seem to break as well as URLs to the images that seem problematic. Please be sure to edit your question to add these details if possible. – JakeGould – 2015-03-05T00:09:18.390

@Paul I meant that if I view an image post by tumblrUser1 that doesn't load on the browser and if tumblrUser2, tumblrUser3 ... tumblrUserN reblogs tumblrUser1's post, the browser will also not be able to load it on those other users' pages. – maki57 – 2015-03-05T02:10:34.343

The examples you show are all PNG images. What is your friend’s operating system? Please edit the question to clarify that. It could be a core OS issue connected to PNG images. – JakeGould – 2015-03-05T02:14:04.847

@Paul I meant that if I view an image post by tumblrUser1 that doesn't load on my current browser and if tumblrUser2, tumblrUser3 ... tumblrUserN reblogs tumblrUser1's post, the browser will also not be able to load the image on those other users' pages. – maki57 – 2015-03-05T02:18:41.383

@JakeGould I don't think I can link the posts with the broken images because of the potentially NSFW nature of the blog. I'll edit an example in once I'm out of work or find a similarly broken one that's not NSFW. As for the PNG part, other blogs with image posts have PNG images that still work, so I don't think that's the problem. I'll edit in the fact that it only applied when viewing certain blogs. – maki57 – 2015-03-05T02:21:12.477

@maki57 The wget works, but are you saying that loading the direct image link in the browser doesn't work? What error do you get? – Paul – 2015-03-05T02:27:21.920

@Paul The "broken image" icon where the image should be in some cases or a blank space where it should be. Using Firefox's "Inspect Element", a mouseover on the direct link written on the element of the image tells me that it failed to load image. – maki57 – 2015-03-05T02:30:01.347

No I mean if you put the image URL directly into the address bar. You should get an error message, not a broken image icon. – Paul – 2015-03-05T02:31:07.107

@Paul I'll try this again once I'm clear to access the NSFW blogs. – maki57 – 2015-03-05T02:41:35.913

@maki57 Can you please edit your question to let us know where in the world you are seeing these errors? Meaning where are you—and the problematic Tumblr Internet connection—physically located? – JakeGould – 2015-03-05T19:08:15.440

Answers

10

UPDATE: It seems the core issue with images not loading stemmed from the way the EFF’s HTTPS Everywhere plugin/extension handled some Tumblr URLs. The developers were notified and a fix appears to be in place. This answer basically breaks down the detective work done to uncover the issue as outlined by the initial question and could prove useful for further debugging/diagnosis if a similar issue appears in the future.


EDIT: The larger content about image leeching seems invalid, so I will add a new idea at the top and leave the image leeching info at the bottom just in case it is useful to someone.

Amazon CloudFront CDN Ideas

Okay, using the URLs you have provided—as well as some of my real world experience with Amazon CloudFront CDN setups—I think I discovered something. It seems like Tumblr’s Amazon CloudFront CDN config is choking for some reason. Here is why I think that is the case.

Let’s take this example URL:

http://36.media.tumblr.com/d685b02fdf2d3f167c22d9a97e27e87a/tumblr_nfpq5qPZ4v1tognpro1_1280.png

Now let’s run curl -I to get header information on that file:

curl -I http://36.media.tumblr.com/d685b02fdf2d3f167c22d9a97e27e87a/tumblr_nfpq5qPZ4v1tognpro1_1280.png

The output for that would be something like this:

HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 782141
Connection: keep-alive
Accept-Ranges: bytes
Cache-Control: max-age=1209600
Date: Thu, 05 Mar 2015 02:15:44 GMT
Server: nginx
X-Cache: Miss from cloudfront
Via: 1.1 7e54fc06cd70e4752fe050bbe5c130be.cloudfront.net (CloudFront)
X-Amz-Cf-Id: QyIUyzfaJJN3PU_xWkW0P-D2kjg_1cVenKzFAoY2PubgZQlBHWorZQ==

Now the things to pay attention to here are the Date header (the time at which the response was generated; a cached copy keeps the Date from when the endpoint stored it) and the X-Cache header (Amazon content delivery status). Typical behavior on Amazon CloudFront is that the first access returns “Miss from cloudfront”, and if you then run another curl -I right afterwards you should get “Hit from cloudfront”.

But that’s not what I saw just now. Here is a breakdown of the Date and X-Cache status of a bunch of accesses I made:

  • Date: Thu, 05 Mar 2015 02:19:37 GMT = X-Cache: Miss from cloudfront
  • Date: Thu, 05 Mar 2015 02:19:39 GMT = X-Cache: Miss from cloudfront
  • Date: Thu, 05 Mar 2015 02:19:44 GMT = X-Cache: Miss from cloudfront
  • Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Miss from cloudfront
  • Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Hit from cloudfront
  • Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Hit from cloudfront
  • Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Hit from cloudfront

The reason why there are multiple items with the same exact date which are Hit from cloudfront near the end is that this is what happens on a CDN: if the endpoint of the CDN has the file, then Date correlates to the date and time at which that endpoint stored its copy of the file, not to the time of your request.

You notice the first four accesses are seconds apart, with different dates/times, and all of them are Miss from cloudfront, right? That means the CDN endpoint is just echoing back that there was an attempt to access that file at those times and all attempts were misses.
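The comparison above can be repeated without eyeballing each response. Here is a minimal sketch, assuming you have saved the text printed by several `curl -I` runs; the helper names `parse_headers` and `cache_samples` are made up for illustration and are not part of any tool.

```python
from typing import Dict, List, Tuple


def parse_headers(raw: str) -> Dict[str, str]:
    """Turn the text printed by `curl -I` into a dict keyed by lower-cased header name."""
    headers: Dict[str, str] = {}
    for line in raw.splitlines():
        # Skip the status line ("HTTP/1.1 200 OK") and anything without a colon.
        if line.startswith("HTTP/") or ":" not in line:
            continue
        # Split on the first colon only: the Date value itself contains colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def cache_samples(responses: List[str]) -> List[Tuple[str, str]]:
    """Return one (Date, status) pair per response, where status is HIT, MISS or UNKNOWN."""
    samples: List[Tuple[str, str]] = []
    for raw in responses:
        headers = parse_headers(raw)
        x_cache = headers.get("x-cache", "").lower()
        if x_cache.startswith("hit"):
            status = "HIT"
        elif x_cache.startswith("miss"):
            status = "MISS"
        else:
            status = "UNKNOWN"  # header absent or in a form this sketch does not know
        samples.append((headers.get("date", ""), status))
    return samples
```

A run of MISS entries with changing dates followed by HIT entries that all share one date is the pattern described above.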

So my armchair assessment of this is that Tumblr’s systems are not keeping up with the Amazon CloudFront CDN or the Amazon CloudFront CDN is not keeping up with Tumblr. But in some way, things are amiss on their server side. And since this is a CDN, someone accessing the files in one location might not notice an issue while someone else in another location would have issues viewing the image.

Which is all to say, I don’t think this can easily be cleared up on the client side.


EDIT: So the original poster added some new URLs, and this still points to a server-side issue, but I just wanted to post the details for the record.

EdgeCast & Highwinds CDN Ideas

So the original poster added more specifics, so here are more details based on the blog post that is being used as an example:

http://claystorks.tumblr.com/post/112741831192/soulmister-claystorks-windspeare-explain

And these image URLs are provided as examples of URLs in that post:

https://gs1.wac.edgecastcdn.net/8019B6/data.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_500.png

https://gs1.wac.edgecastcdn.net/8019B6/data.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_1280.png

And those two image URLs do indeed fail. But from my side, looking at the original source code of the blog post from Brooklyn, New York, USA, I am not seeing those EdgeCast (gs1.wac.edgecastcdn.net) URLs. Rather, these are the URLs I am seeing:

http://41.media.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_500.png

http://41.media.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_1280.png

So my first thought is: why is the original poster seeing those EdgeCast (gs1.wac.edgecastcdn.net) URLs? But then if I do a traceroute to 41.media.tumblr.com I see that it is a server managed by Highwinds (!?!?). In contrast, the initial URLs passed on by the original poster use the 36.media.tumblr.com hostname, and you can see they are managed by Amazon CloudFront CDN servers.

Which is all to say, as I said before, all of this seems to be a server-side issue with Tumblr and their CDN management. But from my side, in Brooklyn, New York, USA, I am clearly seeing content being delivered as expected from Highwinds CDN servers as well as Amazon CloudFront CDN servers. Where these EdgeCast URLs are coming from or how/why they are then failing is out of anyone’s control on the client side. This would definitely be something to contact Tumblr tech staff about because there is no way a desktop end-user could resolve this.
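To keep track of which CDN a given name belongs to, a small lookup by domain suffix is enough. This is only a sketch: the `cloudfront.net` and `edgecastcdn.net` suffixes appear in the headers and URLs above, while `hwcdn.net` for Highwinds is my assumption and is not taken from the traceroute output. The function name `classify_cdn` is made up for illustration.

```python
# Map of domain suffix -> CDN operator.
# "hwcdn.net" is an assumption; adjust it to whatever your own lookups show.
CDN_SUFFIXES = {
    "cloudfront.net": "Amazon CloudFront",
    "edgecastcdn.net": "EdgeCast",
    "hwcdn.net": "Highwinds",
}


def classify_cdn(hostname: str) -> str:
    """Guess the CDN operator from a hostname, or return "unknown"."""
    host = hostname.rstrip(".").lower()
    for suffix, operator in CDN_SUFFIXES.items():
        # Match the suffix itself or any subdomain of it, but not look-alikes.
        if host == suffix or host.endswith("." + suffix):
            return operator
    return "unknown"
```

The input would be a name that actually reveals the CDN, such as the host in a Via header or a name seen in traceroute output. A Tumblr hostname like 41.media.tumblr.com says nothing by itself and comes back as "unknown".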


Image Leeching Ideas

Might not be relevant anymore, but here for reference.

Your stating this gives me a clue:

Using wget on the images' direct links works.

Many sites have rules in place, usually set via Apache, that prevent image leeching. More details on how those rules work are provided here and are summarized as this:

Using .htaccess, you can disallow hot linking on your server, so those attempting to link to an image or CSS file on your site, for example, is either blocked (failed request, such as a broken image) or served a different content (ie: an image of an angry man).

Your description, and the fact you can access the images via wget, leads me to believe that the images you are having issues with are not hosted on Tumblr by users, but are rather images that are placed on a Tumblr blog but actually hosted on another site.

When standard image leeching procedures are put in place, viewing an embedded image on one site that is hosted on another site—which blocks leeching—would result in a broken image link or perhaps a “Stop Leeching!” image being returned. This is because basic anti-leeching rules—such as those in that example page—crosscheck image referrers to make sure the page requesting the image matches the domain hosting the image.

So when you are accessing the image via wget you are accessing the image directly. So image leeching rules would not kick in. Thus you can get the image via wget but not when it is embedded in another page.
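The referrer check described above can be sketched in a few lines. This is an illustration of the general idea and not any particular site's configuration: it assumes the rule lets requests with no Referer header through (as wget sends by default) and otherwise requires the referring page to be on the image host's own domain. The function name `allows_request` is made up.

```python
from typing import Optional
from urllib.parse import urlsplit


def allows_request(referer: Optional[str], image_host: str) -> bool:
    """Mimic a basic anti-leeching rule for an image served from image_host."""
    # Assumption: an absent or empty Referer is allowed, which covers
    # direct visits and tools such as wget that send no Referer by default.
    if not referer:
        return True
    ref_host = (urlsplit(referer).hostname or "").lower()
    image_host = image_host.lower()
    # Allow the image host itself and its subdomains; block everything else.
    return ref_host == image_host or ref_host.endswith("." + image_host)
```

With a rule like this, a direct fetch passes while the same image embedded in a page on another domain is refused, which is the behavior the leeching theory would predict.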

JakeGould

Posted 2015-03-04T23:54:27.683

Reputation: 38 217

1They're Tumblr image posts hosted by Tumblr. I'll edit the description. – maki57 – 2015-03-05T01:21:41.053

I may be mistaken, but I thought Tumblr used EdgeCast. Either way, thanks for the very interesting explanation. Does this still apply when considering the update I added to the question? – maki57 – 2015-03-05T11:31:57.657

1@maki57 Seems like Tumblr uses Amazon CloudFront, EdgeCast and Highwinds to serve CDN content from their sites. And from my vantage point in Brooklyn, NY I cannot reproduce this error; those Edgecast URLs fail for me but the page you link to gives me Highwinds CDNs. More details in my answer, but this is a server-side issue that needs to be brought up with Tumblr. Will vote to close this question for now since this is really not something you will be able to solve from the desktop which is what this site is about. – JakeGould – 2015-03-05T19:07:17.347

1You still were able to answer my main question of "why", anyway, so I still thank you very much for that. I'll report it to Tumblr soon. In the meantime, I'll just tell my friend to use wget for now. – maki57 – 2015-03-05T22:52:40.640

I was doing more tests yesterday and noticed that, on Firefox at least, disabling the settings for Tumblr in HTTPS-Everywhere sometimes works, but it's weird because it only happens if the add-on is enabled. Is this still connected with the answer you gave? – maki57 – 2015-03-08T23:31:56.090

1

@maki57 Well, looking at what HTTPS Everywhere does and the Tumblr-specific ruleset, it seems like that plugin might be highlighting a flaw in the way Tumblr deals with HTTPS. That plugin forces HTTPS, and the URL you are having issues with seems to be what “HTTPS Everywhere” forces all assets to use. Which is based on how Tumblr might work, but it could also be that Tumblr does not properly sync their EdgeCast HTTPS servers? I would let the developers of “HTTPS Everywhere” know as well.

– JakeGould – 2015-03-08T23:54:39.307

5

I am currently having this very problem. This is a safe-for-work example (well, it’s a silly comic) of an affected blog.

I found, however, that the problem happened only in Chrome for me. After a while, I realized that the cause of the issue was the extension “HTTPS Everywhere.” When I installed it in Firefox, I had the same problem there too. And actually, if I disable the HTTPS rule “Tumblr (partial)” (which I guess means *.tumblr.com), it works fine again.

So, the issue seems to be that, at least sometimes, when HTTPS is used to access an image, you are redirected to an invalid EdgeCast URL. For example, this image URL works fine:

http://36.media.tumblr.com/57d2af15f7b21037364125f9f32c4379/tumblr_nktjzyNkv91s667kio1_1280.png

But if you change the protocol from http to https you get redirected to this URL which does not work:

https://gs1.wac.edgecastcdn.net/8019B6/data.tumblr.com/57d2af15f7b21037364125f9f32c4379/tumblr_nktjzyNkv91s667kio1_1280.png

I am not sure if this counts as an error on Tumblr’s side or not. I guess that if clients are not supposed to access their media servers with HTTPS, you cannot really blame them for it.
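The http versus https comparison can be scripted so it is easy to repeat for a list of image URLs. This is a minimal sketch: `force_https` only mimics the scheme change described above (it is not HTTPS Everywhere's actual rule logic), and `fetch_status` is a hypothetical callable you supply that returns the final HTTP status code for a URL.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit


def force_https(url: str) -> str:
    """Return the same URL with the scheme switched from http to https."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url  # already https (or something else): leave untouched
    return parts._replace(scheme="https").geturl()


def compare_schemes(url: str, fetch_status: Callable[[str], int]) -> Dict[str, int]:
    """Fetch the URL as given and with https forced; map each URL to its status."""
    https_url = force_https(url)
    return {url: fetch_status(url), https_url: fetch_status(https_url)}
```

If the plain http URL returns 200 while the https variant ends in an error after the redirect, that points at the forced scheme rather than at the image itself.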

EDIT: And actually the problem seems to have been dealt with as reported in this GitHub thread.

jdehesa

Posted 2015-03-04T23:54:27.683

Reputation: 151

1

I’ve noticed this behavior more while on my mobile carrier, T-Mobile. I'm thinking this is some sort of traffic shaping based on image size or some carrier-built “difficulty metric” for retrieving said item.

In previous testing, over a year ago, I shared the broken post with a friend who has Verizon, and the image loaded fine.

While I can’t test this image I’m about to provide—as my friend is unavailable—this image doesn’t load for me. I am running stock Android (5.0.1) on a Nexus 5 using Chrome as a browser.

http://41.media.tumblr.com/efebad51567e927b8f130f9bdc4efae3/tumblr_ndvnpjcBZa1qewacoo1_500.png

When I try to load the image directly I get a 504 gateway timeout error.

EDIT: This is @JakeGould posting the actual image for reference.


Further testing and details: I'm in Baltimore MD, running off of LTE data and the following image did work: http://40.media.tumblr.com/a5e0a96d36170c997aabad7efc630d3e/tumblr_njnalkSD7M1s5cyzso1_500.jpg

Further testing shows that PNG doesn't seem to be the issue. Most of the other images I hit that worked were a mix of png and jpg, but all were on non "41" servers.
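To check whether failures really cluster on one numbered media host, the results can be tallied per host. This is a rough sketch, assuming you record each tested URL together with whether it loaded; the names `media_shard` and `tally_by_shard` are made up, and the numeric prefix is treated as nothing more than a label.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlsplit

# Matches hosts such as 41.media.tumblr.com and captures the numeric prefix.
MEDIA_HOST = re.compile(r"^(\d+)\.media\.tumblr\.com$")


def media_shard(url: str) -> Optional[str]:
    """Return the numeric prefix of an N.media.tumblr.com URL, else None."""
    host = (urlsplit(url).hostname or "").lower()
    match = MEDIA_HOST.match(host)
    return match.group(1) if match else None


def tally_by_shard(results: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, int]]:
    """Map each prefix to (loaded count, failed count); other hosts go under "other"."""
    counts: Dict[str, List[int]] = {}
    for url, loaded in results:
        shard = media_shard(url) or "other"
        pair = counts.setdefault(shard, [0, 0])
        pair[0 if loaded else 1] += 1
    return {shard: (pair[0], pair[1]) for shard, pair in counts.items()}
```

A handful of samples is not enough to draw conclusions, but a tally like this makes it quicker to see whether the "41" pattern holds as more URLs are tested.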

Final note: I got home, hopped on my Wi-Fi (Comcast) with my phone (the device I have been testing on), and all the photos I couldn't see due to the 504 error I can now see.

EDIT: New to Super User; trimmed and edited the post so it is more factual and less discussion.

UPDATE: Issue seems to be tied to LTE. Loaded up Tumblr, found some images that wouldn't load, forced my phone down to 3G, reloaded the page, and all images showed. Reverted the phone back to LTE, cleared the cache, and the images that previously didn't load under LTE now load.
(I'm testing again and now I can't reproduce it. So maybe the above behavior was a fluke.)

userWCB

Posted 2015-03-04T23:54:27.683

Reputation: 11

This is good information, but what might also help is if you could provide some details on your physical location. I can see the image linked to quite well here in Brooklyn, NY, USA. And from my vantage point the image is being delivered by Highwinds CDN. – JakeGould – 2015-03-06T17:47:20.223