
I have multiple client e-commerce web applications running on a single VM; each web application is a Node.js Express application.

The Express web application communicates with the back end via APIs to retrieve the content for the pages, the list of products, etc. However, to make product searches quicker we're currently using an in-memory database (LokiJS) for all searches to run against. This works pretty well for us at the moment, as the average product catalogue for each client is only around 80 items. A simple cron running within each client's web app refreshes the in-memory database with the latest product list via an API call.
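
Roughly what this looks like today (a simplified sketch; the endpoint URL, client name and cron schedule are placeholders, not our real values):

    const express = require('express');
    const axios = require('axios');
    const cron = require('node-cron');
    const loki = require('lokijs');

    const app = express();
    const db = new loki('catalogue.db');
    const products = db.addCollection('products', { indices: ['name'] });

    // pull the latest product list from the back-end API into the in-memory store
    async function refreshCatalogue() {
      const { data } = await axios.get('https://api.example.com/clients/acme/products');
      products.clear();
      products.insert(data);
    }

    refreshCatalogue();
    cron.schedule('*/15 * * * *', refreshCatalogue); // refresh every 15 minutes

    // all product searches run against the in-memory copy, not the back end
    app.get('/search', (req, res) => {
      res.json(products.find({ name: { $regex: new RegExp(req.query.q, 'i') } }));
    });

    app.listen(3000);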

One of the downsides is that, since we're using PM2 to run each application as a 2-process cluster, the in-memory database has to be duplicated, as you cannot share memory between the processes. To keep the in-memory databases in sync we use IPC to pass messages to all processes within the cluster.
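
The sync works roughly along these lines (a sketch using PM2's programmatic message bus; the app name and topic are placeholders, and products is the LokiJS collection from the snippet above):

    const pm2 = require('pm2');

    // broadcaster side (run by the cron in one worker): push the fresh
    // catalogue to every process in this app's PM2 cluster
    function broadcastCatalogue(freshProducts) {
      pm2.connect((err) => {
        if (err) return console.error(err);
        pm2.list((err, procs) => {
          if (err) return console.error(err);
          procs
            .filter((p) => p.name === 'client-shop') // placeholder app name
            .forEach((p) => {
              pm2.sendDataToProcessId({
                id: p.pm_id,
                type: 'process:msg',
                topic: 'catalogue:refresh',
                data: { products: freshProducts },
              }, (err) => { if (err) console.error(err); });
            });
          pm2.disconnect();
        });
      });
    }

    // receiver side (every worker): replace the local in-memory copy
    process.on('message', (packet) => {
      if (packet && packet.topic === 'catalogue:refresh') {
        products.clear();
        products.insert(packet.data.products);
      }
    });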

As we're bringing larger clients on board with larger product catalogues, we don't really want to be keeping duplicate in-memory databases for each child process.

The way our offering works is that although the product catalogues aren't large per client, the search volume is quite high: a client with 80 items may still get thousands of searches a day.

I had a couple of options in mind, but I'm unsure which would be best:

Option 1 - Single global Elasticsearch cluster

I have spoken to others and thought about using Elasticsearch, pointing every single client web application at one global Elasticsearch cluster to perform product searches against; this way we can use events to keep the Elasticsearch index up to date in real time.
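
Roughly what I had in mind (a sketch with the v7 @elastic/elasticsearch client; the per-client index naming and field names are just assumptions):

    const { Client } = require('@elastic/elasticsearch');

    const es = new Client({ node: 'http://localhost:9200' }); // placeholder node URL

    // one document per product, one index per client (naming is an assumption)
    async function indexProduct(clientId, product) {
      await es.index({ index: `products-${clientId}`, id: String(product.id), body: product });
    }

    // full-text search against a single client's catalogue
    async function searchProducts(clientId, term) {
      const { body } = await es.search({
        index: `products-${clientId}`,
        body: {
          query: { multi_match: { query: term, fields: ['name', 'description', 'colours'] } },
        },
      });
      return body.hits.hits.map((hit) => hit._source);
    }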

However, I don't know how this will perform as we scale up; I don't want Elasticsearch to become a bottleneck.

Option 2 - Local NoSQL database

The second option is to simply replace LokiJS as the in-memory database with a shared NoSQL database (such as MongoDB) on each VM. All the web apps then use the local database for queries, and each web app is still responsible for updating the product catalogue for its own store. Then as we add more VMs, each VM will have its own local database for any apps running on it to use.
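
Something along these lines (a sketch with the official mongodb driver; the connection string, database and field names are placeholders):

    const { MongoClient } = require('mongodb');

    // one mongod per VM; every app process on that VM connects to it
    const client = new MongoClient('mongodb://localhost:27017');

    async function getProducts() {
      await client.connect();
      return client.db('catalogues').collection('products');
    }

    // each web app refreshes only its own client's documents
    async function refreshCatalogue(clientId, freshProducts) {
      const products = await getProducts();
      await products.deleteMany({ clientId });
      await products.insertMany(freshProducts.map((p) => ({ ...p, clientId })));
    }

    // a typical search, scoped to one client
    async function searchProducts(clientId, term, maxPrice) {
      const products = await getProducts();
      return products
        .find({ clientId, name: { $regex: new RegExp(term, 'i') }, price: { $lte: maxPrice } })
        .limit(20)
        .toArray();
    }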

We are heavy users of faceted search, and of aggregations on facets to get counts for each facet value.
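
For example, if we went with Option 2, I assume the facet counts could be produced in a single round trip with an aggregation like this (a sketch; the facet fields are made up, and products is a MongoDB collection handle as above):

    // returns facet counts plus a page of results in one query using $facet
    async function facetedSearch(products, clientId, maxPrice) {
      const [result] = await products.aggregate([
        { $match: { clientId, price: { $lte: maxPrice } } },
        {
          $facet: {
            byColour: [{ $unwind: '$colours' }, { $sortByCount: '$colours' }],
            byBrand: [{ $sortByCount: '$brand' }],
            results: [{ $sort: { price: 1 } }, { $limit: 20 }],
          },
        },
      ]).toArray();
      return result; // { byColour: [{ _id, count }, ...], byBrand: [...], results: [...] }
    }

(With Option 1, Elasticsearch would cover the same need with terms aggregations.)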

asked by Tam2
  • Wait, what exactly is being duplicated in memory? Do different clients have identical product catalogues? – Michael Hampton Dec 04 '20 at 15:39
  • No, the clients will have different product catalogues, but if I create a new in-memory datastore in Node, that datastore can only be accessed by that one particular process. When spinning up 2 instances of the application they are effectively 2 individual processes that don't share memory, so they each hold a copy of the client's product catalogue. The way to overcome that would be to move the datastore outside of Node to a shared one such as MongoDB, and then both instances can connect to the same instance of MongoDB. – Tam2 Dec 04 '20 at 16:11
  • Seems more like something you would use redis for, (or Amazon's ElastiCache branded version) rather than mongodb. – Michael Hampton Dec 04 '20 at 16:43
  • We store our products as individual JSON documents, and need the ability to perform a search against the documents for things like name, price, colours etc. I don't think you can perform searches against JSON documents in Redis as it's just key-value? – Tam2 Dec 04 '20 at 16:54
  • No, you certainly need a document store. But your question was about caching, not your document storage. – Michael Hampton Dec 04 '20 at 16:56
  • Sorry, my initial question might not have been clear. Our main datastore is a SQL Server; this ultimately holds all the products for all our clients. Then for each client's website we need to show the inventory and also allow the customers to search through that inventory. We don't want to have to call our main datastore for each search request, so we want a local cache/datastore for the clients' websites to query against (think Shopify). So although it is a cache, I think we still need a document store as the cache so we have the ability to search against it. – Tam2 Dec 04 '20 at 16:59
  • Eh? Why don't you want to search the data in SQL Server? That's what it's for! – Michael Hampton Dec 04 '20 at 17:00
  • So we have 50 client websites at the moment, and I didn't think it would be performant for an API call to be made to our backend every single time a customer performed a search on their website? I thought that the API/SQL Server would become a bottleneck very quickly as we added more websites. The same SQL database is used for the back-end application where the clients add products to the system. Again, my assumption may be completely wrong and I may not need to worry about the API/SQL Server becoming a bottleneck for a while yet? – Tam2 Dec 04 '20 at 17:05
  • You're essentially making a new database and copying everything to it in order to avoid searching the database. It really doesn't seem to make sense. SQL Server can handle anything you throw at it. This whole network of sites (including Stack Overflow) runs on it. – Michael Hampton Dec 04 '20 at 17:10
  • Fair point. I guess I should probably move a few sites to search directly against our API + SQL Server, monitor the performance, and see if it holds up or if we need a separate database/cache just for inventory searches, as the API is also used in the applications for retrieving other page content directly. – Tam2 Dec 04 '20 at 17:15
  • If the SQL server becomes a bottleneck in that case, implement caching of SQL queries on the client nodes by using memcached or similar caching solutions. – Tero Kilkanen Dec 05 '20 at 19:09
  • Great, thanks for the advice both. We currently use axios-cache-adapter to cache API calls for the CMS side of things; I guess we'd extend the use of this to cache the products too (a rough sketch of this is below). – Tam2 Dec 06 '20 at 14:06
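
Following the suggestion in the comments, a rough sketch of extending axios-cache-adapter to cache the product API calls (the TTL, base URL and route are placeholders):

    const axios = require('axios');
    const { setupCache } = require('axios-cache-adapter');

    // cache API responses in memory for 5 minutes (TTL is an assumption)
    const cache = setupCache({ maxAge: 5 * 60 * 1000 });

    const api = axios.create({
      adapter: cache.adapter,
      baseURL: 'https://api.example.com', // placeholder
    });

    // repeated identical requests within the TTL are served from the cache
    // instead of hitting the API / SQL Server again
    async function searchProducts(clientId, term) {
      const { data } = await api.get(`/clients/${clientId}/products`, { params: { q: term } });
      return data;
    }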

0 Answers