
I am making an image hosting service. This is my first time building a large site, so I don't have much knowledge about creating a reliable web service, but I THINK I have this figured out.

How should I set up my site hosting to accommodate large amounts of traffic? This is what I was thinking. Tell me if this is a good idea:

  1. Pick any cheap provider that has PHP and MySQL.
  2. Store only the back-end stuff on that cheap provider (PHP scripts, server config files, SQL database).
  3. Use Amazon S3 to store all the front-end things, like CSS, JS, and images, plus of course all the images that users upload (this is an image hosting website).

Does that work? That means the cost of heavy traffic falls almost entirely on Amazon S3, right? The cheap provider shouldn't get hit with any significant costs because all it's doing is running scripts and updating the database? Or will that also add up (and run slowly)?

Should I move the database to Amazon SimpleDB? I also hear I can run the site using Amazon EC2, but it looks like that takes a lot of work to set up (and is expensive). I guess what I'm asking can be summarized as: what's the most cost-effective way to reliably run an image-hosting website?

Thanks.

lifeformed
  • If you're serious about doing this, go work for someone who does this stuff already and get the prerequisite knowledge; then, if you're still interested in doing this, you'll know how. – Chris S Apr 29 '11 at 14:50
  • @Chris - It's just a small project, I just hope I'm not getting in over my head. I already have almost all of the site built; I think I'll see how it goes and enlist some help if it gets crazy – lifeformed Apr 29 '11 at 18:39

2 Answers


The most cost-effective way to run a large-scale image (or any other) hosting website is not to run one -- bandwidth costs alone will be astronomical if you achieve any level of popularity.

That all being said, I would test the waters on some commodity virtual private server host first (vps.net, linode.com, etc. -- not endorsements, google around). If you're growing fast enough and it looks like you might be able to at least break even on costs, you can expand to a scenario like the one you described.
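To get a feel for whether breaking even is plausible, a back-of-the-envelope bandwidth estimate helps. This is only a sketch: the traffic figures and the per-GB price below are made-up placeholders, not any provider's actual rate, and `monthly_egress_cost` is a hypothetical helper.

```python
def monthly_egress_cost(views_per_month, avg_image_bytes, price_per_gb):
    """Estimate the monthly outbound-transfer cost of serving images.

    price_per_gb is a placeholder; look up your provider's current rate.
    """
    gigabytes = views_per_month * avg_image_bytes / 1e9
    return gigabytes * price_per_gb


# Hypothetical traffic: 5 million views of 200 kB images at $0.10 per GB.
cost = monthly_egress_cost(5_000_000, 200_000, 0.10)
print(f"Estimated monthly transfer cost: ${cost:.2f}")
```

Storage and request charges come on top of this, so treat the number as a lower bound.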


From an architecture standpoint I would suggest that if you're going "into the cloud" you go all the way -- Shoving data back and forth to your back-end systems will be slower if it's not "near" the front-end, and I believe cloud providers are pretty generous with bandwidth that doesn't leave their networks.

Also take a lesson from the recent Amazon EBS-related outages and ensure that you have an appropriate level of redundancy in your cloud-based services. Customers tend to get whiny when you lose all their family photos :-)

voretaq7

So, here are a few points that may help you out.

  1. Provider: If cost is your main factor, it is up to you to find the provider that suits your needs for the front-end side of things. Reliability, cost, and scaling are all factors you will need to weigh.
  2. Unless the user's browser sends the file directly to S3 through some kind of client-side program (Flash, JS, etc.), your servers will have to receive each file and then upload it to S3 on the user's behalf. This adds a lot of load as well as bandwidth costs. However, it also gives you much better control over what can be uploaded and how. Once you hand control over to the client, you will not be able to truly control what gets uploaded.
  3. S3 is great for storing static content, and it will be key to building a site like this and keeping costs in line. Make sure you properly control who has upload permissions to which buckets. For example, if you keep CSS and JavaScript in one bucket, only you should be able to upload to it; otherwise a malicious user could replace your content with some nasty files. On the other hand, if you let users upload content directly to save on bandwidth, make sure it goes to a separate bucket, ideally one per user. This is not trivial to enforce, and it is much harder if you give the client direct upload access.
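One way to keep some control over direct uploads is S3's browser-based POST upload, where your server hands the client a policy document that limits what the upload may look like. The sketch below only builds and base64-encodes such a policy. It assumes a per-user key prefix inside one bucket rather than a bucket per user, and the bucket name, the `uploads/<user_id>/` layout, and `build_upload_policy` are all hypothetical. Signing the policy with your AWS credentials is deliberately left out; a real upload form needs that signature as well.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_upload_policy(bucket, user_id, max_bytes, now=None, ttl_minutes=15):
    """Build a base64-encoded S3 POST policy restricting one user's uploads."""
    now = now or datetime.now(timezone.utc)
    expiration = now + timedelta(minutes=ttl_minutes)
    policy = {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            # The object key must live under this user's prefix (assumed layout).
            ["starts-with", "$key", f"uploads/{user_id}/"],
            # The declared content type must be an image type.
            ["starts-with", "$Content-Type", "image/"],
            # Reject empty or oversized files.
            ["content-length-range", 1, max_bytes],
        ],
    }
    return base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")


# Hypothetical usage: user 42 may upload images of up to 5 MB.
policy_b64 = build_upload_policy("example-user-uploads", 42, 5_000_000)
```

Note that the content-type condition only checks what the client declares, not the actual bytes, so moderation is still your job.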

Depending on your upload configuration (client-side upload vs. server-side upload), your needs will be different. The client-side model is cheaper up front in server costs, but be aware that someone will probably find a way to store any kind of file, and you will be responsible for moderating that content. With the server-side model, be prepared for your server costs to grow with user traffic, as you will need to build out more servers to handle upload requests.
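If you go with the server-side model, the extra control could look something like the check below, run before anything is forwarded to S3. It is a minimal sketch: it only inspects the leading bytes for PNG, JPEG, and GIF signatures, and the size limit and function names are assumptions. A matching header does not prove the rest of the file is harmless.

```python
# Leading-byte signatures of the image formats we are willing to accept.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_image_type(data):
    """Return the MIME type implied by the leading bytes, or None."""
    for signature, mime in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return None


def accept_upload(data, max_bytes=5_000_000):
    """Accept only non-empty, size-limited payloads with a known image header."""
    return 0 < len(data) <= max_bytes and sniff_image_type(data) is not None


# A PHP script posing as an image is rejected; a PNG header is accepted.
print(accept_upload(b"<?php echo 1; ?>"))
print(accept_upload(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
```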

Once you have the content hosted, you will also want to look into a CDN (Content Delivery Network) such as Amazon's CloudFront (if you want to stay on the Amazon stack) or Akamai. These will increase your costs at first, but save you money on heavily requested content.

Amazon SimpleDB is an interesting style of database. It is 'eventually consistent' by default, which means that data written to the database may not be immediately readable, similar to Amazon S3. If you are going to use the database to keep data synced across multiple nodes for many real-time transactions, I would not recommend it.
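To make the 'eventually consistent' point concrete, here is a toy sketch of what calling code has to put up with: a read right after a write may come back empty, so the caller retries with a growing delay. `LaggingStore` is a made-up stand-in that simulates the lag, not the SimpleDB API, and `read_with_retry` is a hypothetical helper.

```python
import time


def read_with_retry(fetch, attempts=5, delay_seconds=0.1):
    """Call fetch() until it returns something other than None, backing off between tries."""
    for attempt in range(attempts):
        value = fetch()
        if value is not None:
            return value
        if attempt < attempts - 1:
            time.sleep(delay_seconds * (2 ** attempt))
    return None


class LaggingStore:
    """Toy store where a write only becomes visible after a few reads."""

    def __init__(self, lag_reads):
        self._lag_reads = lag_reads
        self._data = {}
        self._reads_left = {}

    def put(self, key, value):
        self._data[key] = value
        self._reads_left[key] = self._lag_reads

    def get(self, key):
        if self._reads_left.get(key, 0) > 0:
            self._reads_left[key] -= 1
            return None  # the write has not "propagated" yet
        return self._data.get(key)


store = LaggingStore(lag_reads=2)
store.put("img-1", "cat.png")
result = read_with_retry(lambda: store.get("img-1"), delay_seconds=0.01)
print(result)
```

Retrying like this is tolerable for something like showing a freshly uploaded image, but it is exactly the kind of delay you do not want in the middle of real-time transactions.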

Flashman
  • Hehe, just saw the other posts/comments. I have to agree. It is going to be very expensive once you get big and it is a very complex problem that will require talent and experience. – Flashman Apr 29 '11 at 15:00