UPDATE: It seems the core issue with images not loading stemmed from the way the EFF’s HTTPS Everywhere plugin/extension handled some Tumblr URLs. The developers were notified and a fix appears to be in place. This answer basically breaks down the detective work done to uncover the issue as outlined by the initial question and could prove useful for further debugging/diagnosis if a similar issue appears in the future.
EDIT: The larger content about image leeching seems invalid, so I will add a new idea at the top and leave the image leeching info at the bottom just in case it is useful to someone.
Amazon CloudFront CDN Ideas
Okay, using the URLs you have provided—as well as some of my real world experience with Amazon CloudFront CDN setups—I think I discovered something. It seems like Tumblr’s Amazon CloudFront CDN config is choking for some reason. Here is why I think that is the case.
Let’s take this example URL:
http://36.media.tumblr.com/d685b02fdf2d3f167c22d9a97e27e87a/tumblr_nfpq5qPZ4v1tognpro1_1280.png
Now let’s run curl -I to get header information on that file:
curl -I http://36.media.tumblr.com/d685b02fdf2d3f167c22d9a97e27e87a/tumblr_nfpq5qPZ4v1tognpro1_1280.png
The output for that would be something like this:
HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 782141
Connection: keep-alive
Accept-Ranges: bytes
Cache-Control: max-age=1209600
Date: Thu, 05 Mar 2015 02:15:44 GMT
Server: nginx
X-Cache: Miss from cloudfront
Via: 1.1 7e54fc06cd70e4752fe050bbe5c130be.cloudfront.net (CloudFront)
X-Amz-Cf-Id: QyIUyzfaJJN3PU_xWkW0P-D2kjg_1cVenKzFAoY2PubgZQlBHWorZQ==
Now the things to pay attention to here are the Date (the date and time of the file on the CloudFront endpoint) and X-Cache (Amazon content delivery status) headers. Typical behavior on Amazon CloudFront is the first access will convey a “Miss from cloudfront” and then if you do another curl -I right away afterwards there should be a “Hit from cloudfront”.
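To make that comparison easier to repeat, here is a minimal shell sketch that keeps only the Date and X-Cache lines from header output. The function names cache_headers and probe_cache, along with the default count and delay, are my own inventions for illustration; only curl -I comes from the steps above.

```shell
# Print only the Date and X-Cache header lines from HTTP response headers
# read on standard input. Header names are matched case-insensitively and
# carriage returns (HTTP header lines end in CRLF) are stripped first.
cache_headers() {
  tr -d '\r' | grep -i -E '^(date|x-cache):'
}

# Hypothetical helper: request the headers of a URL several times, pausing
# between attempts, and print the two headers of interest for each attempt.
# Usage: probe_cache URL [COUNT] [DELAY_SECONDS]
probe_cache() {
  url=$1
  count=${2:-5}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -I "$url" | cache_headers
    echo '---'
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example (needs network access):
# probe_cache http://36.media.tumblr.com/d685b02fdf2d3f167c22d9a97e27e87a/tumblr_nfpq5qPZ4v1tognpro1_1280.png 7 2
```

On a healthy setup I would expect the first attempt to report a miss and the later ones a hit, but that is an expectation about typical CloudFront behavior, not a guarantee.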
But that’s not what I saw just now. Here is a breakdown of the Date and X-Cache status of a bunch of accesses I made:
Date: Thu, 05 Mar 2015 02:19:37 GMT = X-Cache: Miss from cloudfront
Date: Thu, 05 Mar 2015 02:19:39 GMT = X-Cache: Miss from cloudfront
Date: Thu, 05 Mar 2015 02:19:44 GMT = X-Cache: Miss from cloudfront
Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Miss from cloudfront
Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Hit from cloudfront
Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Hit from cloudfront
Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Hit from cloudfront
The reason there are multiple items with the exact same date which are Hit from cloudfront near the end is that this is what happens on a CDN: if the endpoint of the CDN has the file, then Date correlates to the actual creation/modification date of the file that endpoint has.
You notice the first four accesses are seconds apart, with different dates/times, and all of them are Miss from cloudfront, right? That means the CDN endpoint is just echoing back that there was an attempt to access that file at those times and all attempts were misses.
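If you collect a lot of these header dumps, counting hits and misses by eye gets tedious. Here is a rough sketch of a tally. The function name tally_cache and the output format are my own choices, and it assumes the X-Cache values start with “Hit” or “Miss” as in the CloudFront output above.

```shell
# Count X-Cache hits and misses in HTTP header text read on standard input.
# Any other header lines are ignored, so the input can be the concatenated
# output of several "curl -I" runs.
tally_cache() {
  tr -d '\r' | awk '
    BEGIN { hits = 0; misses = 0 }
    tolower($0) ~ /^x-cache: *hit/  { hits++ }
    tolower($0) ~ /^x-cache: *miss/ { misses++ }
    END { printf "misses=%d hits=%d\n", misses, hits }
  '
}

# Example (needs network access):
# for n in 1 2 3 4 5; do curl -s -I "$url"; sleep 2; done | tally_cache
```

For the seven accesses listed above this would report misses=4 hits=3, where one miss followed by hits is what I would normally expect.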
So my armchair assessment of this is that Tumblr’s systems are not keeping up with the Amazon CloudFront CDN or the Amazon CloudFront CDN is not keeping up with Tumblr. But in some way, things are amiss on their server side. And since this is a CDN, someone accessing the files in one location might not notice an issue while someone else in another location would have issues viewing the image.
Which is all to say, I don’t think this can easily be cleared up on the client side.
EDIT: So the original poster added some new URLs, and this still points to a server-side issue, but I just wanted to post the details for the record.
EdgeCast & Highwinds CDN Ideas
The original poster added more specifics, so here are more details based on the blog post that is being used as an example:
http://claystorks.tumblr.com/post/112741831192/soulmister-claystorks-windspeare-explain
And these image URLs are provided as examples of URLs in that post:
https://gs1.wac.edgecastcdn.net/8019B6/data.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_500.png
https://gs1.wac.edgecastcdn.net/8019B6/data.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_1280.png
And those two image URLs do indeed fail. But from my side, looking at the original source code of the blog post from Brooklyn, New York, USA, I am not seeing those EdgeCast (gs1.wac.edgecastcdn.net) URLs. Rather, these are the URLs I am seeing:
http://41.media.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_500.png
http://41.media.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_1280.png
So my first thought is: why is the original poster seeing those EdgeCast (gs1.wac.edgecastcdn.net) URLs? But then if I do a traceroute to 41.media.tumblr.com I see that it is a server managed by Highwinds (!?!?). In contrast, the initial URLs passed on by the original poster use the 36.media.tumblr.com hostname, and you can see they are managed by Amazon CloudFront CDN servers.
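Since three different CDNs are in play here, it can help to label the hostnames you run into (traceroute hops, Via headers, or the hosts in the image URLs themselves). The sketch below is only a suffix lookup. The function name cdn_for_host is made up, and the hwcdn.net suffix for Highwinds is an assumption on my part that does not come from the output above, so please double check it before relying on it.

```shell
# Map a hostname to a CDN name based on its domain suffix.
# The suffix list is a small, hand-picked assumption, not an official registry.
cdn_for_host() {
  case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
    *.cloudfront.net|*.cloudfront.net.)   echo 'Amazon CloudFront' ;;
    *.edgecastcdn.net|*.edgecastcdn.net.) echo 'EdgeCast' ;;
    *.hwcdn.net|*.hwcdn.net.)             echo 'Highwinds' ;;  # assumed suffix
    *)                                    echo 'unknown' ;;
  esac
}

# Examples:
# cdn_for_host 7e54fc06cd70e4752fe050bbe5c130be.cloudfront.net
# cdn_for_host gs1.wac.edgecastcdn.net
```

A Tumblr hostname such as 41.media.tumblr.com comes back as unknown, which is the point: the name in the page source does not tell you which CDN is behind it, so you need the name it resolves or routes to.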
Which is all to say, as I said before, all of this seems to be a server-side issue with Tumblr and their CDN management. From my side in Brooklyn, New York, USA, I am clearly seeing content being delivered as expected from Highwinds CDN servers as well as Amazon CloudFront CDN servers. Where these EdgeCast URLs are coming from, or how/why they are then failing, is out of anyone’s control on the client side. This would definitely be something to contact Tumblr tech staff about because there is no way a desktop end-user could resolve this.
Image Leeching Ideas
Might not be relevant anymore, but here for reference.
You stating this gives me a clue:
Using wget on the images' direct links works.
Many sites have rules in place, usually set via Apache, that prevent image leeching. More details on how those rules work are provided here and are summarized as follows:
Using .htaccess, you can disallow hot linking on your server, so those
attempting to link to an image or CSS file on your site, for example,
is either blocked (failed request, such as a broken image) or served a
different content (ie: an image of an angry man).
Your description, and the fact that you can access the images via wget, leads me to believe that the images you are having issues with are not hosted on Tumblr by users, but are rather images that are placed on a Tumblr blog but actually hosted on another site.
When standard image leeching procedures are put in place, viewing an embedded image on one site that is hosted on another site—which blocks leeching—would result in a broken image link or perhaps a “Stop Leeching!” image being returned. This is because basic anti-leeching rules—such as those in that example page—crosscheck image referrers to make sure the page requesting the image matches the domain hosting the image.
So when you are accessing the image via wget you are accessing the image directly. So image leeching rules would not kick in. Thus you can get the image via wget but not when it is embedded in another page.
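One way to test that theory from the command line is to request the image twice, once directly and once while claiming to come from the page that embeds it, and compare the results. This is only a sketch: the function names fetch_status and hotlink_verdict and the wording of the verdicts are mine, and comparing status codes is a simplification because some sites answer a leech with a perfectly valid replacement image.

```shell
# Print the HTTP status code for a URL, optionally sending a Referer header.
# Usage: fetch_status URL [REFERER]
fetch_status() {
  if [ -n "${2:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}' -e "$2" "$1"
  else
    curl -s -o /dev/null -w '%{http_code}' "$1"
  fi
}

# Compare the status of a direct request with that of a request that
# carried a Referer, and print a rough verdict.
# Usage: hotlink_verdict DIRECT_STATUS EMBEDDED_STATUS
hotlink_verdict() {
  if [ "$1" = "$2" ]; then
    echo 'same status: leech protection unlikely'
  elif [ "$1" = 200 ]; then
    echo 'direct works, embedded differs: leech protection likely'
  else
    echo 'direct request fails: not explained by leech protection'
  fi
}

# Example (needs network access; both URLs are placeholders):
# hotlink_verdict "$(fetch_status "$image_url")" "$(fetch_status "$image_url" "$blog_url")"
```

If both requests come back the same, leeching rules are probably not what is breaking the image.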
Did I read 5 correctly, that other people cannot view images that are reblogged by the person with the issue? – Paul – 2015-03-04T23:57:32.867
I posted an answer, but what might help is if you could provide actual URLs to the blog posts which seem to break as well as URLs to the images that seem problematic. Please be sure to edit your question to add these details if possible. – JakeGould – 2015-03-05T00:09:18.390
@Paul I meant that if I view an image post by tumblrUser1 that doesn't load on the browser and if tumblrUser2, tumblrUser3 ... tumblrUserN reblogs tumblrUser1's post, the browser will also not be able to load in on those other users' pages. – maki57 – 2015-03-05T02:10:34.343
The examples you show are all PNG images. What is your friend’s operating system? Please edit the question to clarify that. It could be a core OS issue connected to PNG images. – JakeGould – 2015-03-05T02:14:04.847
@Paul I meant that if I view an image post by tumblrUser1 that doesn't load on my current browser and if tumblrUser2, tumblrUser3 ... tumblrUserN reblogs tumblrUser1's post, the browser will also not be able to load the image on those other users' pages. – maki57 – 2015-03-05T02:18:41.383
@JakeGould I don't think I can link the posts with the broken images because of the potentially NSFW nature of the blog. I'll edit an example in once I'm out of work or find a similarly broken one that's not NSFW. As for the PNG part, other blogs with image posts have PNG images that still work, so I don't think that's the problem. I'll edit in the fact that it only applied when viewing certain blogs. – maki57 – 2015-03-05T02:21:12.477
@maki57 The wget works, but are you saying that loading the direct image link in the browser doesn't work? What error do you get? – Paul – 2015-03-05T02:27:21.920
@Paul The "broken image" icon where the image should be in some cases or a blank space where it should be. Using Firefox's "Inspect Element", a mouseover on the direct link written on the element of the image tells me that it failed to load image. – maki57 – 2015-03-05T02:30:01.347
No I mean if you put the image URL directly into the address bar. You should get an error message, not a broken image icon. – Paul – 2015-03-05T02:31:07.107
@Paul I'll try this again once I'm clear to access the NSFW blogs. – maki57 – 2015-03-05T02:41:35.913
@maki57 Can you please edit your question to let us know where in the world you are seeing these errors? Meaning where are you—and the problematic Tumblr Internet connection—physically located? – JakeGould – 2015-03-05T19:08:15.440