
I am running Apache Cassandra 3.11.1 and have 6 tables whose partition sizes are in a failing state.

The max partition size is larger than 100 MB.

For these 6 tables the partition sizes range from roughly 200 MB up to 5 GB. The tables are split across 3 keyspaces and are specific to Akka Persistence eventsByTag (e.g. eventsByTag1, eventsByTag2).

Much of the data in these tables is not used, but it still needs to be available.

I'm looking at changing the data model; at the same time, however, I'm trying to better understand the impact of having large partition sizes.

Other than running out of memory or hitting Cassandra's limits, what are the other negative impacts of having large partitions if most of the data is not accessed?

A specific case that might be related (not confirmed): I'm currently running Cassandra with materialized views and Elasticsearch. At times the projections used to update Elasticsearch with data from Cassandra fail, and I'm not yet sure whether this is related.

The error message I receive in this case is:

Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException:
Cassandra timeout during read query at consistency LOCAL_QUORUM
(2 responses were required but only 1 replica responded)
Charles Green

1 Answer


With this version of Cassandra it should be better than before, although there can still be performance problems attributable to reading across many SSTables, making selections only on the partition key, etc.
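
If re-modelling can't happen immediately, one client-side mitigation for read timeouts like the one above is to page through the wide partition in small chunks and raise the per-request timeout. It doesn't remove the server-side cost of the large partition; it only spreads the read over many smaller requests. A minimal sketch with the 3.x DataStax Java driver (the keyspace, table and column names are assumptions, not the actual Akka Persistence schema):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class WidePartitionRead {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("akka")) {

            // Hypothetical table/column names; adjust to the real eventsByTag schema.
            Statement stmt = new SimpleStatement(
                    "SELECT persistence_id, sequence_nr, event FROM events_by_tag1 WHERE tag = ?",
                    "someTag")
                .setFetchSize(500)                          // page in small chunks instead of one huge read
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)
                .setReadTimeoutMillis(30_000);              // per-request client-side timeout override

            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {                            // the driver fetches further pages transparently
                // process row...
            }
        }
    }
}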

This presentation gives a good overview of the work done to support "wide partitions", although re-modelling the data is still the recommended approach.
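
As a rough illustration of the kind of re-modelling meant here (the table and column names below are made up, not the real Akka Persistence eventsByTag schema), adding a time bucket to the partition key caps how large any single tag partition can grow, because each (tag, bucket) pair becomes its own partition:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class BucketedTagTable {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("akka")) {

            // Hypothetical re-modelled table: the time bucket in the partition key
            // limits each partition to one bucket's worth of events per tag instead
            // of the whole history, so no single partition grows without bound.
            session.execute(
                "CREATE TABLE IF NOT EXISTS events_by_tag_bucketed ("
              + "  tag text,"
              + "  bucket text,"            // e.g. a day like '2018-01-11'; size the bucket so partitions stay well under 100 MB
              + "  event_time timeuuid,"
              + "  persistence_id text,"
              + "  event blob,"
              + "  PRIMARY KEY ((tag, bucket), event_time)"
              + ") WITH CLUSTERING ORDER BY (event_time ASC)");
        }
    }
}

Queries then iterate over the buckets covering the time range of interest instead of reading one unbounded partition.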

Alex Ott
  • Many thanks Alex. The PPT and video are very helpful. SSTable counts and the growth of objects on the JVM that need to be serialized and garbage collected as partition sizes increase were my main takeaways. – Charles Green Jan 11 '18 at 06:00