4

In SQL Server, I'm looking at TableA, which currently has a uniqueidentifier clustered primary key. The GUID has no meaning in any context.

(I'll give you a second to clean up your keyboard and monitor and set down the soda.)

I'd like to drop that primary key and add a new unique integer primary key to the table. My question is this: when I drop the index, modify the column from uniqueidentifier to int, and add the new clustered unique primary key to the modified column, will the new PK values be in the order of insertion into the table, or will they be in some other order? Is this the right way to go here? Will this work? (I'm kind of a noobkin with regard to table creation/modification.)

stack
  • 143
  • 4

4 Answers4

4

When you drop a clustered index the table becomes a heap. Since heaps have a very different physical structure from indexes, the data will have to be copied into the new structure. Heaps have no order whatsoever. When you add back a new clustered index, the data will be copied from the heap into the new index, and the order will be defined by the new clustered key.

If you want to preserve the existing order then all you have to do is assign the new integer ids properly:

ALTER TABLE Table ADD Integer_Id INT;
GO

WITH cte AS (
  SELECT ROW_NUMBER() OVER (ORDER BY Guid_Id) as RowOrderByGuid,
    Guid_Id
  FROM Table)
UPDATE t
  SET t.Integer_Id = c.RowOrderByGuid
FROM Table t
JOIN cte c ON t.Guid_Id = c.Guid_Id;

Now the order of Integer_Ids will match the order of Guids. You can drop the Guid column and add a clustered index on the new Integer column and the physical order of records was preserved.

Remus Rusanu
  • 8,253
  • 1
  • 19
  • 22
3

By definition, a clustered index imposes a physical ordering on actual data pages; so, yes, if you drop a clustered index and create a new one, this will force a physical reordering of data.

In your case, I think it's safe to assume the following will happen:

  • The existing clustered index will be dropped, but the actual data on disk is not going to move because of this.
  • You will modify the column type (or drop the existing column and create a new one), setting costraints for it to be not null, unique, primary key, identity and autoincrementing (this is vital, or SQL Server won't even let you add it as it wouldn't know what to put into it).
  • At this point, the column will be automatically populated by SQL Server. I don't know for sure what will happen here, but I think it will be populated in the order rows are phyically stored in the database. But I'm just guessing about this.
  • Problem is, ordering can be quite messy when UIDs are involved; so you don't know how data is actually stored now, and you don't know how it will be stored later; if my guessing about column population is correct, there will not be a huge reordering... but it could happen; and, even if I'm correct, the index building is anyway going to take a while, if the table is large enough.

Bottom line: you will have a huge impact, and you could get rows from an unordered SELECT in the same order you get them now. You will have to try.

Massimo
  • 68,714
  • 56
  • 196
  • 319
2

A clustered index, by definition, determines the physical order of the data, so when you create the new clustered index, the data will be re-ordered; if it's a large table, plan for that to take a while.

SqlACID
  • 2,166
  • 18
  • 18
  • Cool that's good to know. I thought indexes were just how SQL Server read the data and not necessarily having to do with the actual physical location of it. I'm still pretty limited in my knowledge about indexes so this was helpful. – Sean Howat Jun 03 '10 at 17:14
  • 1
    That's true for normal (unclustered) indexes; the clustered one (there can be only one per table) is different exactly for this reason. – Massimo Jun 03 '10 at 17:16
2

If you create a table with a clustered primary key and then drop the clustered PK the physical order of the data in the table will be undisturbed. However, the physical order of query results are not guaranteed to be the same as the order in the table, so this ordering is fairly meaningless.

If you then add an integer column and create a clustered primary key on that, the table will be rearranged into whatever order the key sorts in. This may or may not be the same physical order as the GUID depending on how the key is assigned. You could explicitly assign it based on the sort order of the GUID key (e.g. using row_number() over the old key ordering), or you could assign it some other way. Unless you take steps to ensure that the ordering is explicitly made the same the physical order or the rows in the table is not guaranteed to drive the ordering of your new key.