
I'm currently trying out a setup for a large project. The project will use tens of thousands of tables to chunk the big data into separate pieces that are faster to search. To test this I'm creating those tables, but I've noticed that they are created very slowly.

Adjusting the schema for these tables requires me to (of course) drop the existing tables first. But at 10-30 seconds per table, that adds up to days of waiting time.

The command used to drop a table:

    echo "use keyspace;TRACING ON;drop table table28;exit;" | cqlsh --request-timeout=60000 > trace
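For what it's worth, the drops can at least be batched into one cqlsh session instead of starting cqlsh once per table. A rough sketch (the keyspace name and the table1..table100 naming are just placeholders based on the command above):

    # generate all DROP statements and pipe them through one cqlsh invocation
    {
      echo "USE keyspace;"
      for i in $(seq 1 100); do
        echo "DROP TABLE IF EXISTS table$i;"
      done
    } | cqlsh --request-timeout=60000

Each DROP still has to propagate and reach schema agreement across the cluster, so this mostly removes per-invocation overhead rather than the 10-30 second schema change itself.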

The data will exceed 1,000,000,000,000 rows, which is why it is being split up per timeframe. We always know which timeframe a query targets, so we split the tables up by timeframe. Each table has fewer than 5 columns, though.
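For illustration, a simplified sketch of what one of these per-timeframe tables might look like (the column names here are made up; the real tables just have fewer than 5 columns):

    -- hypothetical per-timeframe table, one per month (e.g. March 2014)
    CREATE TABLE records_2014_03 (
        id uuid,
        event_time timestamp,
        value text,
        PRIMARY KEY (id, event_time)
    );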

I was hoping someone could help me debug this and see how the performance can be improved. The trace is linked below: https://ufile.io/gz9mz

Niels

1 Answer


More than a few hundred tables in Cassandra is a clear sign of a bad data model - if you have thousands of tables, then you need to rethink how you're trying to solve your task. Keep in mind that every table has memory allocated on-heap and off-heap to hold its metadata, etc.

Why do you need to separate the data into chunks - why doesn't it work inside one table? Are you using queries with ALLOW FILTERING? Can you describe the use case?
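For illustration, a sketch of the single-table alternative: put a time bucket into the partition key, so a query for a known timeframe only touches the matching partition. Table, column, and bucket names here are hypothetical:

    -- hypothetical single table, partitioned by a time bucket
    CREATE TABLE records (
        bucket text,              -- e.g. '2014-03'
        event_time timestamp,
        id uuid,
        value text,
        PRIMARY KEY ((bucket), event_time, id)
    );

    -- a query for a known timeframe reads only that bucket's partition
    SELECT * FROM records
    WHERE bucket = '2014-03'
      AND event_time >= '2014-03-01' AND event_time < '2014-04-01';

In practice the bucket would have to be fine-grained enough (or combined with another component in the partition key) to keep individual partitions at a reasonable size.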

P.S. This question is more for StackOverflow or DBA StackExchange...

Alex Ott
  • Search speeds need to be within milliseconds on minimal hardware (cost optimization). Imagine every entry has a date. What would be faster: searching through 1,000,000,000,000 records, or searching through 50,000,000 from March 2014? – Niels Mar 24 '19 at 17:23
  • The system recommended I post it here as it's a performance issue I'm having – Niels Mar 24 '19 at 17:37
  • If you're searching by partition key & clustering columns, then it shouldn't matter whether there are 10^40 entries in the table or 50*10^20 - as long as the data is more or less evenly distributed. If you're not searching by partition key, then you're using Cassandra wrong and need to change the data model... That's the cause in 99.9999% of cases where Cassandra is slow. That's why I recommend posting a question to StackOverflow with your data model (table structure), what queries are performed, etc. – Alex Ott Mar 24 '19 at 18:42