5

It seems that network connections are usually faster than a local disk seek, as discussed in the question "Are networks now faster than disks?"

This question came up while I was using Berkeley DB, an embedded database that uses a caching mechanism to improve performance. When the database is very large and there is not enough memory to cache it, disk seeks reduce performance.

If the network is faster, I wonder whether I can put the database in the memory of several remote computers and access it over the network to avoid disk seeks. This could be an alternative solution for workstation-class PCs.

PS: I am not a native English speaker, so I apologize for any inaccurate expressions. Thanks!

Dz.

3 Answers

4

As per the question and answer you reference, it may well be faster to contact a host over a network than it is to perform a local disk seek operation (depending on the network and the disks in question, of course).

That doesn't always translate into faster operation for a real-life working system. When you talk about putting databases "in the memory" of various distributed systems (leaving aside the availability and latency issues that might arise), remember that those systems perform their own memory management and might page your data out to their local disk, giving you the worst of both worlds. They may also have other work to do that keeps a system resource such as the network connection busy and cuts into your speed advantage.

There's a big difference between a relatively simple cache system and trying to run a database in the memory of a number of distributed systems as you seem to be doing. Some database transactions might become very cheap (aka fast) but others may become much more expensive, and you may find that a need to design for fast performance in this kind of system places constraints on your DB design that negate any benefits.

So my answer to you is a rather boring one: It depends. You'd need to test your specific system under load to see if any possible theoretical performance gains translate into real ones for your particular situation.
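As a starting point for such a test, here is a minimal Python sketch that times random block reads from a file and small TCP round trips. It is only an illustration under loud assumptions: the scratch file is small and will likely sit in the OS page cache, and the echo server runs on loopback, so neither number reflects a real disk seek or a real LAN. The names `time_random_reads`, `time_round_trips`, `echo_server` and `run_benchmark` are made up for this example. To get meaningful figures you would point the read test at a file much larger than RAM and the round-trip test at the actual remote host.

```python
import os
import random
import socket
import tempfile
import threading
import time


def time_random_reads(path, reads=200, block=4096):
    """Return the mean seconds per random block read from `path`."""
    size = os.path.getsize(path)
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(reads):
            f.seek(random.randrange(0, size - block))
            f.read(block)
        return (time.perf_counter() - start) / reads


def echo_server(listener):
    """Echo bytes back on one accepted connection until the peer closes it."""
    conn, _ = listener.accept()
    with conn:
        while data := conn.recv(64):
            conn.sendall(data)


def time_round_trips(host, port, trips=200):
    """Return the mean seconds per one-byte request/response round trip."""
    with socket.create_connection((host, port)) as s:
        # Disable Nagle's algorithm so tiny writes are sent immediately.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(trips):
            s.sendall(b"x")
            s.recv(64)
        return (time.perf_counter() - start) / trips


def run_benchmark():
    """Run both measurements against local stand-ins (assumption: loopback + temp file)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))  # 8 MiB scratch file
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    threading.Thread(target=echo_server, args=(listener,), daemon=True).start()
    try:
        port = listener.getsockname()[1]
        return {
            "disk_read_s": time_random_reads(tmp.name),
            "net_rtt_s": time_round_trips("127.0.0.1", port),
        }
    finally:
        listener.close()
        os.remove(tmp.name)


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name}: {seconds * 1000:.3f} ms")
```

Mean latency hides the tail, so for a real comparison it would be worth recording each sample and looking at percentiles under a realistic load.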

Rob Moir
1

The answer to this question, as with many performance questions, is:

"Maybe. Benchmark your situation and find out."

womble
  • I just want to know whether it is feasible, because it will definitely take me a lot of time to benchmark it... – Dz. Jul 20 '11 at 12:05
  • The question you linked to already answers the question "is it feasible?" – womble Jul 20 '11 at 23:37
0

Considering the popularity of memcached I'd say definitely yes.

I get sub-1ms times with over 4 network hops in my network, while a disk seek on a 7200rpm drive can take anywhere from 1ms (if you're extremely lucky), through 15ms (average), up to a few seconds (when a sector is unreadable and the drive performs re-reads).
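The spindle speed alone sets a floor on part of that cost. A small Python sketch of the arithmetic, where `rotation_ms` is a helper name made up for this example, and which covers only rotational delay (head movement comes on top and is not modelled here):

```python
def rotation_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


full_turn = rotation_ms(7200)
# On average the wanted sector is half a revolution away from the head.
average_wait = full_turn / 2

print(f"{full_turn:.2f} ms per revolution")
print(f"{average_wait:.2f} ms average rotational latency")
```

At 7200rpm this works out to roughly 8.33 ms per revolution and about 4.17 ms of average rotational wait before any seek time is counted.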

You should still have a dedicated LAN for access to memcached instances.
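On spreading the data over the memory of several machines: the usual idea is that the client hashes each key to decide which instance holds it. The Python sketch below shows plain modulo hashing only. It is not memcached's actual client algorithm (real clients often use consistent hashing instead), and the node addresses and the `pick_node` helper are hypothetical.

```python
import hashlib

# Hypothetical cache instances on a dedicated LAN (assumed addresses).
NODES = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def pick_node(key, nodes):
    """Map a key to one node with a stable hash, so every client agrees."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "order:77"):
        print(key, "->", pick_node(key, NODES))
```

The drawback of modulo hashing is that adding or removing a node remaps most keys, which is one reason to test the whole setup before relying on it.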

Hubert Kario
  • 1
    That memcached is widely used says *nothing* about whether network is faster than disk. – womble Jul 20 '11 at 10:10
  • 1
    "unreadable sectors" is a failure mode IMHO, thus not comparable with normal performance of memcached (or anything else) - otherwise you could get into all sorts of nonsensical arguments like "well, *if* there are x unreadable sectors, *and* the network drops y% of packets, *but* some of the databases are huge, *yet* it's two days before a full moon, *but also* ..." – Piskvor left the building Jul 20 '11 at 11:34
  • @Piskvor: The problem is that a) HDDs do perform re-reads, b) even a fully operational, out-of-the-box HDD will need to perform a re-read every million or so sectors, and c) it will need to perform multiple re-reads (each taking an additional 8.33ms) every 100 million sectors or so. What I meant is that the variability of disk is much greater than the variability of network. – Hubert Kario Jul 20 '11 at 18:30
  • "the variability of disk is much greater than variability of network." -- oh if only 'twere true, 'twould be... 'tweriffic! – womble Jul 20 '11 at 23:38
  • @Hubert Kario: A re-read every million sectors or so? That is quite plausible - but I still see retransmits on a network significantly more often than once in a million packets. – Piskvor left the building Jul 21 '11 at 07:51