6

I've inherited a very large and very messy database recently and need to clean it up. To give some idea of size, the primary database currently contains 3 big tables each with around 300 million rows and takes up about 225GB storage space. Over 5 million rows are added each day.

Because of a critical disk space shortage (predecessor did not archive old data or manage it's size at all), I was forced to delete about 280 million rows from the largest table. This process took over 25 hours to complete and the database needed to be cut off from customer-facing applications during that time.

Now, I need to reindex the table because selects and inserts take a very long time. However, I can't just take the database offline indefinitely, I need to be able to estimate the amount of time needed to perform the reindex. I have never reindexed a table so large before, so I don't really have any good reference points to draw upon.

The primary table includes a clustered, monotonically-increasing primary key, and a non-unique non-clustered key as well. I have plenty of disk space available to perform the reindex.

So my question is this: About how long do people thing this will take me? What is a good rule of thumb for estimating reindexing time?

Whiteknight
  • 183
  • 1
  • 1
  • 5
  • 1
    Why not take a current back up of the database, restore it to a lab machine of somewhat similar specs, and try it? – user6373 Nov 12 '09 at 17:47
  • 1
    that's a good question. The problem is that I don't have a spare machine available with enough muscle to try it. It's going to take 225GB of storage to hold the replicated database and (I'm estimating) up to another 100GB of storage for the log. The database server is pretty beefy (2 quad core xeons, 16GB RAM, etc) and no other machines I have available are even close to that. So any experimental will be "slower, but I'm not sure how much slower", which isn't really a good answer and isn't something I'm going to take the time to learn. – Whiteknight Nov 12 '09 at 18:12

4 Answers4

3

Have you actually checked how fragmented your tables / indexes are? Try running the following query against the database(the query runs on SQL2005 or above). Note this query will impact your server and should be run at a quiet time:

SELECT    OBJECT_NAME(i.OBJECT_ID) AS TableName,
        i.name AS IndexName,
        indexstats.avg_fragmentation_in_percent
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') indexstats
        INNER JOIN sys.indexes i ON i.OBJECT_ID = indexstats.OBJECT_ID
    AND i.index_id = indexstats.index_id

You can selectively reindex the tables/indexes which are most fragmented.

SuperCoolMoss
  • 1,252
  • 11
  • 20
1

Any database operation is highly dependant on the hardware you are running it on.

Since you say you cut out a very large amount of rows already, it shouldn't be a problem to run it during off-peak hours.

You could set up replication and offer that database to your customers while you clean up the old one and set it in read-only so they can still get any data they might need.

gekkz
  • 4,219
  • 2
  • 20
  • 19
1

No way to estimate how long it will take - so many different things would have an impact. Similar to what Paul writes about the length of a CHECKDB, very similar things would come into play with indexing and more. The best answer would be how long has it taken in the past? If you can't determine that, next best option would probably be to try it on a similar non-production environment, but even that won't necessarily match (i.e. concurrency concerns, etc. would be different).

As a side note, you may want to look into ONLINE INDEX OPERATIONS in Sql 2005 and above...I can't add a second hyperlink, but google "online index operations sql server" and click the top link.

boydc7
  • 531
  • 2
  • 6
  • Supplement for "second link" https://docs.microsoft.com/en-us/sql/relational-databases/indexes/guidelines-for-online-index-operations?view=sql-server-2017 – Mark Schultheiss Jun 13 '18 at 10:39
1

Once you get the table rebuilt initially, you'll want to set up maintenance jobs to periodically defrag/rebuild the indexes. Michelle Ufford aka @SQLFool has an excellent set of scripts for this:

http://sqlfool.com/2009/06/index-defrag-script-v30/

You set the thresholds of when you want to defrag vs rebuild. It automatically detects which indexes it can rebuild online, and does that as well, giving you some uptime benefits.

Be aware that index operations can throw a ton of activity into the transaction log, which can really slow down database mirroring and transaction log backups.

Brent Ozar
  • 4,425
  • 17
  • 21
  • 1
    I can also recommend Ola Hallengren's soloution which also includes integrity checking and backups: http://ola.hallengren.com/ – SuperCoolMoss Nov 13 '09 at 19:23