3

Most NoSQL autoscaling runs into trouble because data has to be migrated during peak load. What if the data were stored on shared storage such as CLVM, which has less overhead (compared to NFS or a shared file system)? If each bucket/shard is a separate LV, a compute node can mount one or more LVs depending on how many shards it is responsible for. Under high load, a node gives up a few shards (unmounts their LVs) and a newly started node mounts them instead. This decouples the storage and compute sides of the database and makes compute horizontally scalable. I know Server Fault doesn't accept open-ended discussions, so suggesting a forum where I could post this would also help. Any help understanding the pitfalls of this idea is welcome too.
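Here is a rough sketch of the handoff I have in mind, written in Python around the usual LVM commands. The volume group name, mount root, and shard names are made up for illustration; I'm assuming exclusive activation (lvchange -aey) so only one node can write a shard at a time.

```python
#!/usr/bin/env python3
# Sketch of moving one shard between compute nodes when each shard is its own LV.
# The volume group "vg_shards", the mount root, and the shard name are hypothetical.
import subprocess

VG = "vg_shards"            # hypothetical clustered volume group
MOUNT_ROOT = "/srv/shards"  # hypothetical mount point root

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def release_shard(shard):
    """Run on the overloaded node: stop serving the shard, unmount, deactivate its LV."""
    run(["umount", f"{MOUNT_ROOT}/{shard}"])
    run(["lvchange", "-an", f"{VG}/{shard}"])    # give up the LV cluster-wide

def claim_shard(shard):
    """Run on the newly started node: activate the LV exclusively, mount, start serving."""
    run(["lvchange", "-aey", f"{VG}/{shard}"])   # exclusive activation: one writer only
    run(["mount", f"/dev/{VG}/{shard}", f"{MOUNT_ROOT}/{shard}"])

if __name__ == "__main__":
    # e.g. the autoscaler decides shard_07 should move to a new node
    release_shard("shard_07")   # executed on the old node
    claim_shard("shard_07")     # executed on the new node
```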

kalyan
  • 229
  • 1
  • 2
  • 11

2 Answers

2

What if the data were stored on shared storage such as CLVM, which has less overhead ... This decouples the storage and compute sides of the database and makes compute horizontally scalable. ... Suggesting a forum where I could post this would also help. Any help understanding the pitfalls of this idea is welcome too.

If only compute is scaled up, essentially everything else is scaled down in comparison, except latency. You might end up trading a compute scale-up for a disk I/O scale-down (a new bottleneck), but you may be OK with that tradeoff and can add additional, faster disks to avoid it; the additional chassis provides plenty of space.

Refer to the DB-Engines website, DataStax's "Data Replication in NoSQL Databases Explained", Linux Journal's "High-Availability Storage with HA-LVM", and Wikipedia's pages on horizontal/vertical scaling and database scaling for additional information.

There are solutions such as Memcached (or MemcacheDB) that are easier to add than switching from your NoSQL database to different software, but software that promises better scaling is the better long-term solution.
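As a sketch of how little code a cache layer takes to add (assuming the pymemcache client and a memcached-compatible server on localhost:11211; the key and the fetch_from_db() helper are illustrative stand-ins for your existing database lookup):

```python
# Minimal cache-aside sketch against a memcached-compatible server.
# Assumes pymemcache is installed and a server is listening on localhost:11211.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def fetch_from_db(key):
    # placeholder for the NoSQL query you already run today
    return b"value-for-" + key.encode()

def get_with_cache(key):
    value = cache.get(key)
    if value is None:                        # cache miss: fall through to the database
        value = fetch_from_db(key)
        cache.set(key, value, expire=300)    # keep it for five minutes
    return value

print(get_with_cache("user:42"))
```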

In this comparison of Cassandra vs. MongoDB the verdict is: "If write scalability and 100% availability is your thing, Cassandra is a better fit for you." There are always going to be some tradeoffs; you need a complete cost/benefit analysis that considers money and time - the worst situation is to go one way and wind up dead-ended, forced to reapproach the problem when the same or a different wall is hit.

It's up to you to decide whether the solution you want must be inexpensive in the short term or the long term, and what feature set (and complexity) suits your needs and is worth your time. The first site linked above provides a means to compare different feature sets and costs, along with links to many other sites.

Let me suggest another way to go, distributed shared memory or reflective memory.

"A distributed-memory system, often called a multicomputer, consists of multiple independent processing nodes with local memory modules which is connected by a general interconnection network. Software DSM systems can be implemented in an operating system, or as a programming library and can be thought of as extensions of the underlying virtual memory architecture. When implemented in the operating system, such systems are transparent to the developer; which means that the underlying distributed memory is completely hidden from the users. In contrast, software DSM systems implemented at the library or language level are not transparent and developers usually have to program them differently. However, these systems offer a more portable approach to DSM system implementations. A distributed shared memory system implements the shared-memory model on a physically distributed memory system".

Along with a software solution for shared memory there's also a hardware solution. Here are a few vendors for distributed computing interface boards:

  • Dolphin - PXH812 Host / Target Adapter - connects the PCIe bus of one computer to another at 64 Gb/s with 138 nanoseconds latency, over copper at up to 5 meters (200 m with fiber). This is the transparent version (no software required, works with any OS, and can mix CPU architectures); there's also the PXH810 NTB Host Adapter (non-transparent bridging) with a Shared-Memory Cluster Interconnect (SISCI) API, but the OS is limited to Windows, Linux, VxWorks, or RTX. See their "PCI Express Reflective Memory" webpage.

  • Curtiss-Wright - SCRAMNet GT200 - dual-port memory appears on the host bus as additional host memory. The host reads and writes data through one port while the network writes data to memory through a second port. Data written to memory by the host is automatically transmitted by the hardware to all nodes on the network.

  • Abaco - PCIE-5565RC Interface Card - Shares 128 or 256 MB SDRAM on Linux, VxWorks, or Windows. Interface cards are available for PCI Express, PMC, PCI and VME.

Along with board-to-board connections, you can add a hub to most of the above products and connect anywhere from a few up to 256 nodes. It might be worth waiting until next year, and PCIe 4.0.

Rob
  • 320
  • 1
  • 3
  • 9
1

MongoDB, for instance, has the concept of the replica set for this situation. In this case multiple MongoDB instances serve the same data. If one fails, the others will continue serving the data. Shared storage is not necessary or desirable for a replica set; each MongoDB instance keeps a separate copy of the data.

This is entirely orthogonal to sharding, in which data is split among different MongoDB instances or replica sets.
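A rough pymongo sketch of that split (the host names, the rs0 set name, and the appdb.events namespace are hypothetical): reads can be spread across replica set members with a read preference, while write scaling comes from sharding, which is configured separately against a mongos router.

```python
# Replica set = read scaling and availability; sharding = write scaling.
# Host names, the replica set name, and the namespace below are made up.
from pymongo import MongoClient

# Reads can be served by secondaries via a read preference; every member
# still holds a full copy of the data, so this does not scale writes.
client = MongoClient(
    "mongodb://node1:27017,node2:27017,node3:27017/"
    "?replicaSet=rs0&readPreference=secondaryPreferred"
)

# Write scaling is the separate, orthogonal step: shard the collection,
# run against a sharded cluster's mongos router.
admin = client.admin
admin.command("enableSharding", "appdb")
admin.command("shardCollection", "appdb.events", key={"customer_id": "hashed"})
```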

Michael Hampton
  • 237,123
  • 42
  • 477
  • 940
  • That helps with read scaling, but write scaling is not possible without resharding and rebalancing the cluster. So scaling compute still requires the disks to be rebalanced. – kalyan Nov 25 '18 at 15:08