
I am trying to understand what ElasticSearch exactly does when it comes to persistence.

When I index documents in ElasticSearch, that same step also stores them in ElasticSearch. However, the reason I index my documents with ElasticSearch is to be able to search for them through its API and search algorithms.

For the rest of my application, I also have a relational database where I store most of my application's data. That relational database also contains the same documents I index in ElasticSearch.

As a result, I have the documents saved in both ElasticSearch and the relational database. I adopted ElasticSearch for its extended search abilities, but now I wonder whether saving the documents in both ElasticSearch and the relational database is redundant.

Would it be wise to delete the documents from the relational database and use the indexed documents in ElasticSearch as the data source?

Socrates


Maybe. This is a design decision that our friends over at DBA Stack Exchange may have more to say about. Sometimes you keep redundant copies because of different search or reporting needs, or because different database engines have different characteristics.

As part of this design, understand how safe your data is in Elasticsearch, especially since it is a distributed system. The Jepsen report on Elasticsearch is particularly interesting: network partitions can result in document loss in some scenarios. From the report:

My recommendations for Elasticsearch users are unchanged: store your data in a database with better safety guarantees, and continuously upsert every document from that database into Elasticsearch. If your search engine is missing a few documents for a day, it’s not a big deal; they’ll be reinserted on the next run and appear in subsequent searches. Not using Elasticsearch as a system of record also insulates you from having to worry about ES downtime during elections.
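The "system of record plus continuous upsert" pattern from that quote can be sketched as below. This is a minimal, hypothetical illustration: `sqlite3` stands in for the relational database, and `InMemorySearchIndex` is an invented stand-in for an Elasticsearch index. What matters is that each document is written under its database primary key, so re-running the job overwrites existing entries and restores any that went missing. Assuming the official Python client (8.x), the analogous call would be `index` with an explicit `id`, which creates the document or replaces it; a real job would repeat `sync_once` on a schedule.

```python
import sqlite3


class InMemorySearchIndex:
    """Hypothetical stand-in for an Elasticsearch index: documents stored by id."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, source):
        # Create the document if the id is new, otherwise overwrite it.
        # Running this twice with the same input leaves the index unchanged.
        self.docs[doc_id] = dict(source)


def sync_once(conn, index):
    """One pass of the sync job: upsert every row from the system of record."""
    pushed = 0
    for doc_id, title, body in conn.execute(
        "SELECT id, title, body FROM documents ORDER BY id"
    ):
        index.upsert(str(doc_id), {"title": title, "body": body})
        pushed += 1
    return pushed


# Demo: the relational database remains the system of record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(1, "Invoice 2021", "Paid in full"), (2, "Invoice 2022", "Overdue")],
)
index = InMemorySearchIndex()
sync_once(conn, index)

# Simulate the index losing a document (e.g. during a network partition).
del index.docs["2"]

# The next scheduled run restores it from the database.
restored = sync_once(conn, index)
print(sorted(index.docs))
```

This sketch only creates and overwrites documents; rows deleted from the database would need a separate cleanup step to disappear from the index.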

As a practical example, I know of an enterprise with a sprawling MediaWiki that built a search engine in Elasticsearch. The wiki and other sources have their own DBMSes supporting lots of CRUD applications. Tens of millions of documents of all kinds are then stuffed into the search engine, so people can usually find things.
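A setup like that, with several sources feeding one search index, might look roughly like the sketch below. Everything here is hypothetical: the source names, the `source:key` id scheme, and the naive substring `search` standing in for Elasticsearch's full-text queries. It illustrates that each source keeps its own database as the system of record while the index holds only copies, with ids namespaced by source (an assumed convention) so two sources that reuse the same primary key do not overwrite each other.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    source: str  # hypothetical source name, e.g. "wiki" or "tickets"
    key: str     # primary key inside that source's own database
    title: str
    body: str


def to_index_entry(rec):
    # Namespace the id by source so keys from different DBMSes cannot collide.
    doc_id = f"{rec.source}:{rec.key}"
    return doc_id, {"source": rec.source, "title": rec.title, "body": rec.body}


def build_index(*sources):
    """Copy records from every source into one index (a plain dict here)."""
    index = {}
    for records in sources:
        for rec in records:
            doc_id, doc = to_index_entry(rec)
            index[doc_id] = doc
    return index


def search(index, term):
    """Naive case-insensitive substring match; a stand-in for real full-text search."""
    term = term.lower()
    return sorted(
        doc_id
        for doc_id, doc in index.items()
        if term in doc["title"].lower() or term in doc["body"].lower()
    )


wiki = [SourceRecord("wiki", "42", "VPN setup", "How to configure the VPN")]
tickets = [SourceRecord("tickets", "42", "VPN down", "Users cannot connect")]
index = build_index(wiki, tickets)
results = search(index, "vpn")
print(results)
```

Both records share the key `42` in their own databases, yet both survive in the index because their ids carry the source name.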

John Mahowald