3

I'm building a URL shortener web application and I would like to know the best architecture for it in order to provide a fast and reliable service.

I would like to split it into separate services running on different machines:

  • The first machine will host the application itself, served by Apache, nginx, or whatever.
  • The second one will contain the database.
  • The third one will be responsible for handling the short URL requests.

UPDATE:

The service is not actually a URL shortener; it was just easier to explain it that way.

I just need one machine that receives an HTTP request and inserts a record into a database, and I need it to do this simple task very efficiently. The system will run on Linux (I don't know the distro yet) and I'm totally open to any language or technology. I was thinking of using Yaws, Tornado or Snap for that service, but I don't know yet, and it's time to plan the architecture for that part. The database will be built on Hadoop.

For the third machine I just need to accept one kind of HTTP request (GET www.domain.com/shorturl), but it has to do it really fast and it should be stable enough.
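To make it concrete, here is a rough sketch of what that third machine could look like using Tornado (just one of the options mentioned above); the insert_record() function is a placeholder, since how the Hadoop-backed database actually gets written to isn't decided yet:

```python
# Hypothetical sketch of the "third machine": accept GET /<shorturl> and record it.
# Tornado is only one of the candidate stacks; insert_record() is a stand-in for
# whatever write the Hadoop-backed store ends up needing.
import tornado.ioloop
import tornado.web


def insert_record(short_url):
    # Placeholder: replace with the real database/queue write.
    print("would insert:", short_url)


class ShortUrlHandler(tornado.web.RequestHandler):
    def get(self, short_url):
        insert_record(short_url)
        self.set_status(204)  # nothing useful to return; just acknowledge


def make_app():
    return tornado.web.Application([
        (r"/([A-Za-z0-9_-]+)", ShortUrlHandler),
    ])


if __name__ == "__main__":
    make_app().listen(8080)
    tornado.ioloop.IOLoop.current().start()
```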

Ben Pilbrow
masylum
  • I'd recommend getting something working and profitable before worrying about making it high performance. – Chris S Jun 03 '10 at 18:41
  • Well, it's actually not *so* difficult to build this kind of application. The problem is, there are already just *so many* of them around... and good short domain names tend to be already taken; after "makeashorterlink.com", they just started to sound silly. – Massimo Jun 03 '10 at 18:53

3 Answers

2

Do you really think there is a need for yet another URL shortener? There are just so many of them around... unless you've by chance managed to acquire a very short and appropriate domain name, I just don't think your site is going to be noticed by anyone. Just my two cents, of course.

Anyway, to the technical part:

  • What language are you going to write your application in?
  • On which operating system are you planning to run it?
  • Will you be using free or commercial software?

It's difficult to answer your question without even knowing this.

The only answer that makes any sense here is "avoid Java like the plague". A Java application server is overkill for many applications, and it would for sure be overkill for such a simple one.

I'd go for Linux/Apache/MySQL/PHP here... if I could think of any good reason to even start the project, of course.


Edit:

Ok, now it makes a little more sense; but the suggestion to start as simple as possible and then worry about scaling up is still valid. If your application really is that simple, any decent web server/language/database combination should be able to process lots of requests per second on modern hardware (but I still strongly suggest avoiding Java).

If performance is paramount, I'd go with a CGI application written in C; that will be the fastest possible solution, orders of magnitude faster than any interpreted or VM language, and having it do simple INSERTs and SELECTs against a database shouldn't be difficult. But I think LAMP is more than enough for your needs... Facebook actually runs on it, you know?
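Just to show how little such a handler actually has to do, here is a sketch of the same CGI idea, written in Python rather than C (a C version would do the same steps, just faster), with SQLite standing in for whatever database you end up using:

```python
#!/usr/bin/env python3
# Sketch only: a CGI script that records the requested short code and returns
# nothing. SQLite is a stand-in for the real database; a compiled C version
# would perform the same two steps (INSERT, respond) with less overhead.
import os
import sqlite3


def main():
    short_code = os.environ.get("PATH_INFO", "/").lstrip("/")

    conn = sqlite3.connect("/tmp/hits.db")
    conn.execute("CREATE TABLE IF NOT EXISTS hits "
                 "(code TEXT, ts DATETIME DEFAULT CURRENT_TIMESTAMP)")
    conn.execute("INSERT INTO hits (code) VALUES (?)", (short_code,))
    conn.commit()
    conn.close()

    # CGI response: headers, blank line, empty body.
    print("Status: 204 No Content")
    print()


if __name__ == "__main__":
    main()
```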

Massimo
  • mod_perl would be faster than CGI in C. A C module for Apache or nginx would be faster yet. And Facebook uses HipHop to compile their PHP into C, and uses an event engine to call it, which technically isn't a LAMP stack. – user6738237482 Jun 03 '10 at 20:32
  • You can *always* find something faster if you dig deep enough; you could even roll your own web server. But I don't think such a simple application will need something like this, even if it *really* grew to thousands of requests per second. – Massimo Jun 03 '10 at 21:00
  • `orders of magnitude faster than any interpreted or VM language`: you have the benchmarks to back that up? – BMDan Apr 14 '11 at 12:11
0

Are these requests just recording data, or do they also send back something of interest? If they're just logging, then just use Apache and fling the Apache logs into Hadoop. If they have to return some sort of data, then it's not at all clear to me how they get the data that they're returning.

Still, Apache set up to just return a static file for any request is pretty damned fast.
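If you do go the log route, getting the data into Hadoop can be as simple as a Hadoop Streaming-style mapper that reads the access log on stdin; here's a rough sketch (it assumes the standard combined LogFormat, so adjust the regex to yours):

```python
# Sketch of a Hadoop Streaming-style mapper: read Apache combined-format log
# lines on stdin and emit "<requested path>\t1" for each GET. Assumes the
# default combined LogFormat; tweak the regex if yours differs.
import re
import sys

LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3})')

for line in sys.stdin:
    m = LOG_LINE.match(line)
    if not m:
        continue
    ip, timestamp, method, path, status = m.groups()
    if method == "GET":
        print("%s\t1" % path)
```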

user10501
0

First, I know you said it's not a URL shortener, but if it's anything similar, an RDBMS is a terrible way to store this data; since there's no real relationship between any two pieces of data, you want a flat storage engine. Consider Mongo (or Couch, depending on your actual solution space).
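For example, a minimal sketch with pymongo (assuming a local mongod; the "shortener"/"links" names are made up here) could be as simple as:

```python
# Minimal sketch of flat key/value storage in MongoDB; the database and
# collection names are invented for illustration.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
links = client.shortener.links


def save_link(code, target_url):
    # One flat document per short code; no joins, no schema.
    links.insert_one({"_id": code, "url": target_url})


def resolve(code):
    doc = links.find_one({"_id": code})
    return doc["url"] if doc else None
```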

As to your solution, beware premature optimization. There are a lot of ways to go crazy with this; since you asked, the craziest that I can think of offhand might be to fire up Varnish, write all your pages in the VCL, and have it connect to memcache on the backend to store and retrieve the corresponding data. But realistically, that's batshit crazy unless you're under patently absurd loads.

BMDan