I have the following table
CREATE TABLE DiaryEntries
(
[userId] [uniqueidentifier] NOT NULL,
[setOn] [datetime] NOT NULL, -- always set to GETDATE().
[entry] [nvarchar](255) NULL
)
Each user will insert around 3 entries per day. There will be around 1'000'000 users. This means 3'000'000 new records in this table every single day. Once a record is older than 1 month, we delete it.
Most of the queries have the following WHERE clause:
WHERE userId = @userId AND setOn > @setOn
Most queries return no more than 3 rows, except for one which returns all rows inserted within this month (which is at most 90 rows).
The date and userId cannot be changed, once the record is inserted.
Now my question is - how to best arrange the index for this table? I am stuck with the two alternatives:
- Clustered index on (userId, setOn) - this will give me fast searches, but I am worried about excessive page splits, because we will insert a lot of intermediary values (same userId but different date).
- Non-Clustered indexes on (userId) and on (setOn) - this will also cause page splits on (userId) index (but is it as expensive as in the first option?). The search is slowed down because we are using NC index.
- Clustered index on an additional column (id) and Non-clustered index on (userId, setOn) - this will eliminate page splits of the data table, but will still cause some on the NC index. This option is also not optimal for search, since we search using NC index.
What are your suggestions? Any other options?
PS - Thanks for your time.
After 2 days of pondering, I came up with a different solution to this problem.
CREATE TABLE MonthlyDiaries
(
[userId] uniqueidentifier NOT NULL,
[setOn] datetime NOT NULL, -- always set to GETDATE().
[entry1_1] bigint NULL, -- FK to the 1st entry of the 1st day of the month.
[entry1_2] bigint NULL, -- FK to the 2nd entry of the 1st day of the month.
[entry1_3] bigint NULL,
[entry2_1] bigint NULL,
[entry2_2] bigint NULL,
[entry2_3] bigint NULL,
...
[entry31_1] bigint NULL,
[entry31_2] bigint NULL,
[entry31_3] bigint NULL,
PRIMARY KEY (userId, setOn)
)
CREATE TABLE DiaryEntries
(
[id] bigint IDENTITY(1,1) PRIMARY KEY CLUSTERED,
[entry] nvarchar(255) NOT NULL
)
Basically I grouped 31 days into a single row. This means I insert a new record only once a month per user. This reduces page splits from 3 times a day per user to once a month per user. Obviously there are drawbacks, here are some of them
- Row size is huge - however in 99.999% of the time I query only a single row from MonthlyDiaries.
- I am potentially using more space than I need, because some days might have no entries. Not a big deal.
- To find an entry for a specific day will require an extra index seek on DiaryEntries. I believe it is not going to be such a great cost, since I am retrieving not more than 90 rows and in 80% of the cases I retrieve only 1 row.
Overall, I think it is a good trade-off: reducing from 3 page splits/day/user to just 1 page split/month/user, but in return paying a small price by making my searches slightly slower. What do you think?