1

I want to build a small warehousing application for my company. We have a central warehouse which distributes to 8 sales points across the country. They insist on an in-house solution. I am thinking to setup a central mySQL db Linux server and have the branches connect to it to store sales.

Queries to the db from the branches will be minimum, maybe 10 per hour. However I need all the branches to be able to store each sale data ( product ID, customer ID ) in the central db at peak time at most once every five minutes.

My question is can I get away with simple 24mbps/768kbps DSL lines? If not what is the bandwith requirement? Can I rely on a load balancing router to combine additional lines if needed? Can you propose some server hardware specs?

Ok let me clarify this a bit. All I need is to store data (prodctID, itemsSold) when something is sold and to retrieve availability in other stores. Eg to return quantity of certain product in other branches in order to re-supply from central warehouse or have some other branch send something over. I am guessing one row (branchName, itemQuantity) per every other branch (7 branches - 7 rows) whenever a branch is out of something. I think the data sent is minimal but I dont know if there is overhead. How can I estimate that?

user66734
  • 11
  • 1
  • 3
  • 1
    A take-away from the responses is "know thy queries" which can be hard to do if what is generating them was purchased and you don't have access to the code. At somepoint you will have to prototype (at least that is what I would be telling mgt) and that may have to be adjusted based upon what you discover. In the SQL Server world there is the Profiler tool to grab what hit's a db. Don't know about mySQL. – jl. Jan 13 '11 at 15:03
  • Just to provide a clue, it's probably worth mentioning that most credit card transactions (POS, ATM, etc.) in my of the world are still done using simple dial-up modems, many of which are only 14.4Kbps. – John Gardeniers Jan 17 '11 at 04:23
  • This is enlightening I never made the connection with POS. You are right I dont see why it cant be done. – user66734 Jan 17 '11 at 11:11

5 Answers5

4

Only you can answer this question by measuring or estimating the bandwidth usage based on the number of queries and/or transactions and the size of the dataset per query and/or transaction, multiplied by the number of queries and/or transactions over a specific period of time.

joeqwerty
  • 108,377
  • 6
  • 80
  • 171
2

As you are not going to transfer much data, you won't need too much bandwidth. Additionally compression using f.e. SSH Tunnels could help.

In my experience, latency is a much greater issue for remote (DB) applications.

FloE
  • 143
  • 6
  • Thank you I was thinking about encrypting connections but I thought it would slow down things. Compression didn't occur to me. – user66734 Jan 17 '11 at 11:05
1

You also could set up local MySQL instances in a cluster-like solution. Propagating of changes would be managed asynchronously by the database itself.

rabbit
  • 21
  • 3
  • You mean something like caching the db locally? This is not an option for security reasons. I wouldn't want the db copied by employees. – user66734 Jan 17 '11 at 11:08
1

Given the narrow scope of your project, it actually should be fairly simple to figure out the kind of bandwidth you'll be consuming and the amount of time it will take.

Sending queries

This is pretty simple to calculate. Let's assume this is your query:

SELECT COUNT(*) as Qty, Branch
FROM ProdsTable
GROUP BY Branch
ORDER BY Branch

That's 79 characters. 79 characters = 632 bytes, you have a 24Mb inbound connection so that query will take 24*1024*1024/632 simultanious queries (39819) before becoming bandwidth limited. I can't tell you how long that will take with any certainty though because:

  • It's available bandwidth will definately be dictated by the upstream speed of the client connection
  • There are additional headers and authentication that take place with queries, especially if you need to initiate the connection before sending the query

But it should be reasonably quick.

Retrieving data

Let's assume:

productID CHAR(20)
itemsSold INT

That's a grand total of 20 + 4 bytes for each row. 7 rows = 7*(20+4) = 168 bytes. You have 768Kb of outbound bandwidth, so you can send 4681 requests of that size simultaniously before they start getting squeezed at the bandwidth end.

Now forget everything I've just said

Because there's so much more to it than just that. As I've already alluded to, there's overheads in authentication, initiating connections, and then you've got latency over DSL links, possible contention ratio issues, and because it's not over a switched network there's every possibility that a whole bunch of TCP re-assembly and re-transmission is going to be required for every single query and this can affect perceived speed dramatically.

The only way to really know is to try.

Mark Henderson
  • 68,316
  • 31
  • 175
  • 255
  • Yes I ve calculated the data myself. Seems to me the bottleneck would be the servers upload. I can't think of a way to estimate the overhead though. Just wondering if anyone has done something similar and came up with a way or had tested and came up with numbers. – user66734 Jan 17 '11 at 11:19
  • I guess I 'll just try it. Are you aware of an app that is used in this sort of case (to log amount of data received)? Wireshark? – user66734 Jan 17 '11 at 11:31
  • @user66734 - Yeah, wireshark is a great place to start, just tell it to capture on the MySQL port and check the resulting file size. Also it will give you a good idea of how much background "noise" there is regarding authentication and keepalives. – Mark Henderson Jan 17 '11 at 19:13
0

The bandwidth requirement should be related not only to the frequency of queries, but it should be related to the queries output and table(s) size. For example, a single query may return one single row and another query may return thousands of rows. So, there is no specific answer unless you have an approximate of your data size and type of queries.

Khaled
  • 35,688
  • 8
  • 69
  • 98
  • Ok let me clarify this a bit. All I need is to store data (prodctID, itemsSold) when something is sold and to retrieve availability in other stores. Eg to return quantity of certain product in other branches in order to re-supply from central warehouse or have some other branch send something over. I am guessing one row (branchName, itemQuantity) per every other branch (7 branches - 7 rows) whenever a branch is out of something. I think the data sent is minimal but I dont know if there is overhead. How can I estimate that? – user66734 Jan 16 '11 at 14:38