
I've written a real-time socket API that communicates with a data provider and emits data models to connections subscribed to model events. Imagine the provider is StackExchange and the model is a question, i.e. every question is emitted, as it is created, to each connection subscribed to receive it.

It dawned on me that this would require massive bandwidth. If each model is ~5 kB of JSON data and I'm emitting on average 25 models per second to each connection, that's roughly 5 kB * 25 * 86,400 s = 10,800 MB (about 10.8 GB) per connection per day, or about 324 GB per connection per month. On a standard 5 TB/month VPS plan, that means I can only handle about 15 connections, which is well below what the usage would be (it could be as high as 1,000 connections).
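The arithmetic above can be checked with a short Python sketch. It assumes decimal units (1 kB = 1,000 bytes, 1 TB = 10^9 kB) and a 30-day month; the constant and function names are illustrative only.

```python
# Back-of-the-envelope transfer estimate using the figures from the question:
# ~5 kB per model, 25 models/s per connection, a 5 TB/month plan.
# Assumes decimal units (1 kB = 1,000 bytes, 1 TB = 10**9 kB) and a 30-day month.
# Integer arithmetic in kB avoids floating-point rounding in the capacity check.

MODEL_SIZE_KB = 5
MODELS_PER_SECOND = 25
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30
KB_PER_GB = 10**6
KB_PER_TB = 10**9


def kb_per_connection_per_month() -> int:
    """Kilobytes sent to one "send me everything" subscriber in a month."""
    return MODEL_SIZE_KB * MODELS_PER_SECOND * SECONDS_PER_DAY * DAYS_PER_MONTH


def max_connections(quota_tb: int) -> int:
    """How many full-firehose subscribers fit in a monthly transfer quota."""
    return quota_tb * KB_PER_TB // kb_per_connection_per_month()


def quota_needed_tb(connections: int) -> float:
    """Monthly transfer (TB) needed to serve `connections` full-firehose subscribers."""
    return connections * kb_per_connection_per_month() / KB_PER_TB


if __name__ == "__main__":
    per_month_gb = kb_per_connection_per_month() / KB_PER_GB
    print(f"{per_month_gb:.0f} GB per connection per month")
    print(f"{max_connections(5)} connections fit in 5 TB/month")
    print(f"{quota_needed_tb(1000):.0f} TB/month needed for 1,000 connections")
```

Run as a script, it prints 324 GB per connection per month, 15 connections for a 5 TB plan, and 324 TB/month for 1,000 connections, before any compression or filtering.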

Does this mean that my app is basically useless because I can't afford to run it? It's open source and can't be monetized, so there is no option for ads or investment.

Should I just let it go?


Edit: Users can subscribe to a subset of the data, but the option to "send me everything" is a core use case. This means I have to use that as the worst case metric.

rtheunissen
  • Is it the same 25 models for everyone? Maybe you can look for sponsors who'll use it and can help with costs? – djsmiley2kStaysInside Jul 03 '15 at 16:56
  • Think more carefully about the design. Almost nobody needs to receive _everything_. They're usually only interested in a small subset of data. – Michael Hampton Jul 03 '15 at 17:44
  • Multicast is ideal for that kind of thing. – MadHatter Jul 03 '15 at 17:58
  • @MichaelHampton I've edited the question to mention that users can opt for only a subset, but by worst case analysis they *can* opt to receive everything. – rtheunissen Jul 04 '15 at 03:17
  • @MadHatter I've been reading and watching things about multicast all day and I'm still not 100% sure how to actually implement it. It has also led me down a UDP / P2P rabbit hole haha. – rtheunissen Jul 04 '15 at 03:19
  • @paranoid-android in general, multicast will only work on a network you control from end to end. It will not work over the internet. P2P methods might work. – Grant Jul 04 '15 at 03:41
  • possible duplicate of [Can you help me with my capacity planning?](http://serverfault.com/questions/384686/can-you-help-me-with-my-capacity-planning) – kasperd Jul 04 '15 at 11:00


Open source (usually) doesn't mean it can't be monetized. You can give the source code or a VM image away and let people host their own server, or charge them to do the hosting for them. Plenty of open source projects go with that method - free to run it yourself, or software as a service for $x/month or $x/GB of traffic.

Other than that, you can save bandwidth by:

  • Limiting what parts of the JSON data you send. If there are parts of the data most people don't need, make sending them optional.

  • Put filters on subscriptions so they can request a subset of events. In your example that might mean my connection only shows serverfault questions with the bandwidth tag.

  • Make sure everything is being compressed. Text compresses quite well, so the bandwidth saving should be substantial.
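The three suggestions above (trimming fields, filtering subscriptions, compressing) can be combined in a minimal Python sketch. The question model, its field names (`id`, `site`, `title`, `tags`, `body`) and the subscription format are hypothetical, not StackExchange's actual API schema, and `gzip` stands in for whatever compression your transport offers.

```python
import gzip
import json


def make_question(qid, site, tags):
    """Hypothetical question model; field names are illustrative only."""
    return {
        "id": qid,
        "site": site,
        "title": f"Question {qid}",
        "tags": tags,
        "body": "Lorem ipsum dolor sit amet. " * 100,  # bulk many clients may not need
    }


def project(model, fields):
    """Keep only the fields a subscriber asked for."""
    return {k: model[k] for k in fields if k in model}


def matches(model, subscription):
    """Hypothetical subscription filter: optional 'site' and optional required 'tag'."""
    site = subscription.get("site")
    tag = subscription.get("tag")
    if site is not None and model["site"] != site:
        return False
    if tag is not None and tag not in model["tags"]:
        return False
    return True


def encode(models):
    """Serialise a batch as JSON; return both the raw bytes and a gzip-compressed copy."""
    raw = json.dumps(models).encode("utf-8")
    return raw, gzip.compress(raw)


stream = [
    make_question(1, "serverfault", ["bandwidth", "vps"]),
    make_question(2, "stackoverflow", ["python"]),
    make_question(3, "serverfault", ["nginx"]),
]

# "Only serverfault questions tagged bandwidth, and only id/title/tags."
sub = {"site": "serverfault", "tag": "bandwidth", "fields": ["id", "title", "tags"]}
selected = [project(m, sub["fields"]) for m in stream if matches(m, sub)]

full_raw, full_gz = encode(stream)
sel_raw, sel_gz = encode(selected)
print("everything:", len(full_raw), "bytes raw,", len(full_gz), "bytes gzipped")
print("filtered:  ", len(sel_raw), "bytes raw,", len(sel_gz), "bytes gzipped")
```

The actual saving depends entirely on your real payloads, so measure it. With a WebSocket transport, compression is more commonly negotiated through the permessage-deflate extension (RFC 7692) than done by hand.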

All of these also benefit your users. They pay for bandwidth too, and nobody wants 100GB of data they didn't need sent through their connection.

There are also hosting plans with unmetered bandwidth. Then your only issue is data per second, not per month. Beware that many cheap "unlimited" plans are really "unlimited until we feel you have used too much", so do some research before buying.
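On an unmetered plan the limit becomes sustained throughput, which can be estimated with a small Python sketch using the question's figures (~5 kB per model, 25 models/s per connection). The `compression_ratio` parameter is a hypothetical knob: 1.0 means uncompressed, and the real ratio has to be measured on your own data.

```python
# Sustained throughput estimate for an unmetered plan, where the limit is the
# port speed (Mbit/s) rather than monthly transfer.
# Figures from the question: ~5 kB (5,000 bytes) per model, 25 models/s per connection.
# compression_ratio is hypothetical: 1.0 = uncompressed, 4.0 = payload shrinks to 1/4.

MODEL_SIZE_BYTES = 5_000
MODELS_PER_SECOND = 25


def required_mbit_per_s(connections: int, compression_ratio: float = 1.0) -> float:
    """Outbound bandwidth (Mbit/s) needed for `connections` full-firehose subscribers."""
    bits_per_second = MODEL_SIZE_BYTES * 8 * MODELS_PER_SECOND * connections
    return bits_per_second / compression_ratio / 1_000_000


def max_connections_on_port(port_mbit: int, compression_ratio: float = 1.0) -> int:
    """Full-firehose subscribers a port of `port_mbit` Mbit/s can sustain."""
    per_connection_mbit = required_mbit_per_s(1, compression_ratio)
    return int(port_mbit // per_connection_mbit)


if __name__ == "__main__":
    print(required_mbit_per_s(1000), "Mbit/s for 1,000 connections, uncompressed")
    print(max_connections_on_port(100), "connections on a 100 Mbit/s port, uncompressed")
```

At roughly 1 Mbit/s per "send me everything" subscriber, 1,000 connections need about 1 Gbit/s of sustained outbound bandwidth before compression, which is exactly the kind of usage that tends to trigger a fair-use clause.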

Grant
  • Thank you very much for your answer, @Grant. I've edited the question to indicate that users do have the option to only subscribe to a specific 'channel' of the data, but I would like to cater for the worst case, where every client subscribes to everything. Unmetered bandwidth is not a bad idea; I'll take a look at some providers. :) – rtheunissen Jul 04 '15 at 03:22