5

I'm inheriting a database that has 400 tables and only 150 foreign key constraints registered. Knowing what I do about the application and looking at the table columns, it's easy to say that there ought to be a lot more.

I'm afraid that the current application software will break if I started adding the missing FKs because the developers have probably come to rely on this "freedom", but step one in fixing the problem is to come up with the list of missing FKs so we can evaluate them as a team.

To make matters worse, the referencing columns don't share a naming convention.

The relationships ARE coded informally into the hundreds of ad-hoc queries and stored procedures, so my hope is to parse these files programmatically looking for JOINS between actual tables (but not table variables, etc).

Challenges I foresee in this approach are: newlines, optional aliases and table hints, alias resolution.

  • Any better ideas? (Besides quitting)
  • Are there any pre-built tools that can solve this?
  • I don't think regex can handle this. Do you disagree?
  • SQL Parsers? I tried using Microsoft.SqlServer.Management.SqlParser.Parser but all that is exposed is the lexer - can't get an AST out of it - all that stuff is internal.
Jason Kleban
  • 786
  • 3
  • 8
  • 19

3 Answers3

4

I feel your pain.

The free SQL Search SSMS Addin might be helpful for you.

In general, Yes, regex can handles this, but you should be aware of the point of diminishing returns in attempting to conjure regex magic. You may be better off just reviewing and searching through the code while mapping out the relationships.

SQL Search might make this a lot easier for you.

unhappyCrackers1
  • 977
  • 1
  • 6
  • 18
2

Here's what I came up with. This query looks for foreign-key-like columns (int, bigint, guid) which are not the primary key of the table and which are not currently registered with a foreign key constraint. Sure, I get a few Sort Order and Quantity columns, but it really narrows down the list with less effort than parsing out SQL scripts.

WITH ExistingFKCs AS
(
    SELECT
        CU.TABLE_NAME, -- Referencing Table
        CU.COLUMN_NAME -- Referencing Column
    FROM 
        INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS C
        JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE CU ON C.CONSTRAINT_NAME = CU.CONSTRAINT_NAME
)
SELECT 
    T.TABLE_NAME AS [Table Name], 
    COL.COLUMN_NAME AS [Column Name]
FROM 
    INFORMATION_SCHEMA.TABLES T 
    JOIN INFORMATION_SCHEMA.COLUMNS COL ON 
        T.TABLE_TYPE = 'BASE TABLE' AND
        COL.TABLE_NAME = T.TABLE_NAME
    LEFT JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS PKC ON 
        PKC.CONSTRAINT_TYPE = 'Primary Key' AND 
        PKC.TABLE_NAME = COL.TABLE_NAME
    LEFT JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE PKCU ON 
        PKCU.TABLE_NAME = PKC.TABLE_NAME AND 
        PKCU.CONSTRAINT_NAME = PKC.CONSTRAINT_NAME AND
        PKCU.COLUMN_NAME = COL.COLUMN_NAME
    LEFT JOIN ExistingFKCs EFKS ON
        EFKS.TABLE_NAME = COL.TABLE_NAME AND
        EFKS.COLUMN_NAME = COL.COLUMN_NAME
WHERE 
    PKCU.COLUMN_NAME IS NULL
    AND EFKS.COLUMN_NAME IS NOT NULL
    AND COL.DATA_TYPE IN ('int', 'bigint','uniqueidentifier')
ORDER BY T.TABLE_NAME, COL.COLUMN_NAME
Jason Kleban
  • 786
  • 3
  • 8
  • 19
2

Powerful SQL Parser can help to analyze the hundreds of ad-hoc queries and stored procedures automatically, and from the query parse tree generated by SQL Parser, you can easily to find relationship of variables table/columns.

Here is sample:

SELECT a.ASSMT_NO,
   b.LINK_PARAM,
   c.EXPL                                               AS LINK_PG,
   (SELECT count()
    FROM   GRAASPST t
    WHERE  t.ASSMT_NO = a.ASSMT_NO
           AND t.ROLE != '02')                          AS PSN_CNT,
   (SELECT count()
    FROM   GRAASPST t
    WHERE  t.ASSMT_NO = a.ASSMT_NO
           AND t.ROLE != '02'
           AND ASSMT_FIN_YN = 'Y')                      AS PSN_FIN_CNT,
   (SELECT Avg(assmt_pts)
    FROM   GRAASSMT t
    WHERE  t.ASSMT_NO = a.ASSMT_NO
           AND t.ASSMT_TGT_SEQ_NO = a.ASSMT_TGT_SEQ_NO) AS ASSMT_PTS_AVG,
   a.ASSMT_RES,
   a.ASSMT_RPT_SUB_TITLE
FROM   GRAASTAT a
   JOIN GRAASRET b
     ON b.DELIB_REQ_NO = a.DELIB_REQ_NO
   JOIN GRTCODDT c
     ON c.DIV_CD = 'GR013'
        AND c.CD = b.DELIB_SLCT
   JOIN CMUSERMT d
     ON d.USERID = a.REGID
WHERE  a.ASSMT_NO = :ASSMT_NO
ORDER  BY a.ASSMT_TGT_SEQ_NO 

After analyzing this query, you may get something like this:

JoinTable1         JoinColumn1       JoinTable2    JoinColumn2     
GRAASRET       DELIB_REQ_NO      GRAASTAT      DELIB_REQ_NO    
GRTCODDT       CD            GRAASRET      DELIB_SLCT      
CMUSERMT       USERID        GRAASTAT      REGID              
GRAASPST       ASSMT_NO      GRAASTAT      ASSMT_NO        
GRAASSMT       ASSMT_NO      GRAASTAT      ASSMT_NO        
GRAASSMT       ASSMT_TGT_SEQ_NO  GRAASTAT      ASSMT_TGT_SEQ_NO

You can check this demo for detailed information.

James
  • 36
  • 1