General discussion of SQL topics, aimed at the novice-to-intermediate Microsoft SQL Server user. Currently focuses on SQL Server 2005.

Thursday, October 23, 2008

TSQL Code: Compare Tables of Databases on Separate Servers

NOTE: This script was originally developed to compare table row counts, but with the modifications mentioned below it can also be helpful if you have a database that is shipped to another server and often needs to be ETL'd into another database, and you are worried that some records aren't getting transferred properly (or at all).

The other day I came across a question about how to compare the tables in one database to the tables in another database, in the context of a migration project. The person asking had a database on an older server running SQL Server 2000 and had migrated it to a new server running SQL Server 2005. They wanted a way to visually display a comparison of the two databases, table by table, to prove the conversion was successful. Results meeting these requirements are shown in Figure 1.

AW_Compare_All 
Figure 1 (results of running sp_CompareTables_All)

In response, I started developing a TSQL script that creates a stored procedure you can run from the current database while specifying the location of the original database. The script collects the tables and counts the records in each table for both databases. It displays the table names on the outside (left and right) of the results, with the record counts next to each other on the inside (beside their table names), which allows for a very simple and pleasing visual comparison. I wanted to (and did) avoid cursors and other techniques that can bog down system resources (such as the stored procedures 'sp_MSForEachDB' and 'sp_MSForEachTable').

Now, this is a rough draft that I threw together and tested over a lunch period, so there are some issues in the script that could still be cleaned up, and it lacks the ability to detect whether a table exists in one database but not the other.

At the end of the script I'll also point out a quick change that switches the results from displaying all tables and their row counts to displaying only the tables with mismatched row counts (which may be useful if you want to use this script to troubleshoot databases that tend not to migrate all records).

The first problem is how best to access the server holding the original database; while there are many options, I chose the built-in stored procedure "sp_addlinkedserver". This procedure is simple to implement and integrates seamlessly into TSQL code. Testing for this script was performed against the AdventureWorks database (developed for SQL 2005) on both test systems. The 'originating' database was on a SQL 2005 instance (although testing was briefly performed on SQL 2000 and SQL 2008 to validate compatibility). The 'originating' server is called TestServer35, with the database on an instance called Dev05; the database on both instances is called AdventureWorks2005. This information becomes important when using the sp_addlinkedserver command. I used the following TSQL code:

EXEC sp_addlinkedserver
   @server = 'TestServer35-5',
   @srvproduct = '',
   @catalog = 'AdventureWorks2005',
   @provider = 'SQLNCLI',
   @datasrc = 'Server35\Dev05'

As you can see, the linked server is referenced as TestServer35-5. We will use this reference in a four-part identifier (Server.Catalog.Schema.Table). The next obstacle is obtaining a listing of tables and their row counts. I used a script I had modified last year for this, since it runs on both SQL 2005 and SQL 2000 (you can view my script in SQLServerCentral.com's Script section at: http://www.sqlservercentral.com/scripts/Administration/61766/). I take the results of this and store them into a temporary table; I do the same for the new database (on the local server where this stored proc runs).
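Before building the procedure, it can be worth confirming that the linked server and the four-part naming resolve correctly. A minimal sanity check might look like the following (Person.Contact is simply a table that ships with AdventureWorks and is not part of the script itself):

--Quick test of the linked server using a four-part identifier (Server.Catalog.Schema.Table)
SELECT TOP (5) ContactID, LastName
FROM [TestServer35-5].[AdventureWorks2005].[Person].[Contact]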

Then comes the simple part: joining the two temp tables into a final temp table. I chose this route because I wanted the two databases in separate temp tables in case I want to work with that data later; in fact, I'll use it in my update to determine whether a table is missing from one of the databases (a rough sketch of that comparison follows).
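As a rough preview of that update (a sketch only, not part of the stored procedure below), a FULL OUTER JOIN between the two temp tables would surface any table that exists on only one side:

--Sketch: assumes #tblNew and #tblOLD have been populated as in the procedure below
SELECT ol.tblName AS OldTable, ol.CountRows AS OldRowCount,
       nw.CountRows AS NewRowCount, nw.tblName AS NewTableName
FROM #tblOLD ol
FULL OUTER JOIN #tblNew nw
ON ol.tblName = nw.tblName
WHERE ol.tblName IS NULL OR nw.tblName IS NULL --tables missing from one of the databases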

Here is the TSQL code I used (remember, if you want to use this you will need to change the linked server information to the correct values, and create this stored proc in the appropriate database):

--Change the database name to the appropriate database
USE [AdventureWorks2005];
GO

CREATE PROCEDURE sp_CompareTables_All
AS

CREATE TABLE #tblNew
( tblName varchar(50), CountRows int )

--Row counts for the new database (on the local server)
INSERT INTO #tblNew
( tblName, CountRows )
SELECT o.name AS "Table Name", i.rowcnt AS "Row Count"
FROM sysobjects o, sysindexes i
WHERE i.id = o.id
AND i.indid IN (0, 1)
AND o.xtype = 'U'
AND o.name <> 'sysdiagrams'
ORDER BY o.name

CREATE TABLE #tblOLD
( tblName varchar(50), CountRows int )

--Row counts for the original database (via the linked server)
--********
--Replace TestServer35-5 and AdventureWorks2005 below with your appropriate values
--********
INSERT INTO #tblOLD
( tblName, CountRows )
SELECT lo.name AS "Table Name", li.rowcnt AS "Row Count"
FROM [TestServer35-5].[AdventureWorks2005].[dbo].[sysobjects] lo,
     [TestServer35-5].[AdventureWorks2005].[dbo].[sysindexes] li
WHERE li.id = lo.id
AND li.indid IN (0, 1)
AND lo.xtype = 'U'
AND lo.name <> 'sysdiagrams'
ORDER BY lo.name

CREATE TABLE #tblDiff
( OldTable varchar(50), OldRowCount int, NewRowCount int, NewTableName varchar(50) )

--Join the two sets side by side for an easy visual comparison
INSERT INTO #tblDiff
( OldTable, OldRowCount, NewRowCount, NewTableName )
SELECT ol.tblName, ol.CountRows, nw.CountRows, nw.tblName
FROM #tblNew nw
JOIN #tblOLD ol
ON (ol.tblName = nw.tblName AND ol.CountRows = nw.CountRows)

SELECT * FROM #tblDiff

DROP TABLE #tblNew
DROP TABLE #tblOLD
DROP TABLE #tblDiff

You simply execute the code with the following TSQL:

EXECUTE sp_CompareTables_All

The results of this script are shown in Figure 1 (above).

Now, this is great if you want to have a list that you can go through yourself to verify each table matches in row counts. But, what if that database has 1000 or more tables? What if you are just, simply put, lazy? Why not utilize SQL Server to process this information for you?

Well, I sure enough did just that. With a very small modification to this script you can easily have it only display any tables that don't match up in record counts.

All you have to do is change the "ON" clause of the INSERT INTO #tblDiff block so that it joins when the CountRows are NOT equal. The following is the modified line of code; the rest of the stored procedure remains the same:

ON (ol.tblName = nw.tblName AND ol.CountRows <> nw.CountRows)
 

I also renamed the stored procedure from "sp_CompareTables_All" to "sp_CompareTables_Diff"; this is optional, but it makes it clear which stored proc is being used.
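If you do create it under the new name, it is executed the same way as before:

EXECUTE sp_CompareTables_Diff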

To get some results, I made a few modifications to the AdventureWorks2005 database: I added a couple of rows to one table and removed some rows from two tables. The results of the stored proc showing only the differing tables are shown in Figure 2. (A sketch of one way to create such a mismatch on a test copy follows the figure.)

AW_Compare_Diff
Figure 2 (results of running sp_CompareTables_Diff)
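If you want to reproduce a similar mismatch yourself, one way (on a throwaway test copy only; this isn't part of the stored procedure) is to delete a couple of rows from a table that nothing else references:

--Test copy only: dbo.DatabaseLog is a standalone log table in AdventureWorks with no
--foreign keys pointing at it, so a small delete succeeds and creates a row-count mismatch
DELETE TOP (2) FROM dbo.DatabaseLog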

As you can see, switching this script between showing all tables and showing only the differing tables is very simple. Even setting it up is simple; the hardest part of the whole thing is adding the linked server (which is fairly straightforward as well).

In a future post I'll revisit this script and add the ability to display tables that exist in one database but not in the other. Be sure to check back for that update.

Until next time...Happy Coding!!

Friday, October 17, 2008

Searching text of programmable objects...

This is a follow-up posting to my article "Finding and Listing ALL Procedures, Functions, and Views in SQL Server 2005" I wrote for SQLServerCentral.com (http://www.sqlservercentral.com/articles/T-SQL/63471/). This will work with both SQL Server 2005 and SQL Server 2008.

I've found a few instances where I would have liked to search through a stored procedure or a view for specific text in its definition; specifically, to check whether it references another stored procedure or view. Originally, to solve this I just queried the syscomments legacy view for the search string, joining it with sysobjects. While this was effective, it has its limitations...such as only searching objects created after the release of SQL Server. So, what happens if you want to search for a specific string in ALL of the programmable objects? Well, then we fall back to our trusty view sys.all_objects! We can join it with sys.all_sql_modules and get almost all of what we need...almost.
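For reference, the legacy approach looked something like this minimal sketch (the search string is a placeholder; also keep in mind that syscomments splits long definitions across multiple 4000-character rows, another of its limitations):

DECLARE @SearchString NVARCHAR(4000)
SET @SearchString = 'ENTER_SEARCH_TEXT_HERE'
SELECT o.name, c.text
FROM sysobjects o
JOIN syscomments c
ON c.id = o.id
WHERE c.text LIKE '%' + @SearchString + '%'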

Now, we can do a clever query that will join sys.all_objects and sys.all_sql_modules. Something like:

DECLARE @SearchString NVARCHAR(MAX)
SET @SearchString = 'ENTER_SEARCH_TEXT_HERE'
SELECT [ao].[name], [asm].[definition]
FROM sys.all_objects ao
JOIN sys.all_sql_modules asm
ON [ao].[object_id] = [asm].[object_id]
WHERE [asm].[definition] LIKE '%' + @SearchString + '%'
--AND [ao].[is_ms_shipped] = 0 --Only search user & post SQL release objects, or
--Change the above value to 1 to only include objects provided by Microsoft with the release of SQL Server
--Optional filters to limit the search results:
--AND [ao].[type] = 'P' --Search only Stored Procedures
--AND [ao].[type] <> 'V' --Exclude Views
--See referenced article for listing of additional object types

This would do the job for the most part; the only missing items would be constraints and rules...and how many times do you really need to search a constraint for a specific string of text?
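For the rare case where you really do need to search constraint text on its own, the catalog views sys.check_constraints and sys.default_constraints each expose a definition column, so a minimal sketch would be:

DECLARE @SearchString NVARCHAR(MAX)
SET @SearchString = 'ENTER_SEARCH_TEXT_HERE'
SELECT [name], [definition]
FROM sys.check_constraints --sys.default_constraints works the same way
WHERE [definition] LIKE '%' + @SearchString + '%'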

Now, the execution plan shows that the join query above can be a little bit of a resource monger. Here's the estimated execution plan I get for it:

Plan_For_Defintion_Search_by_Joins

This execution plan is OK; there are a lot of Clustered Index Scans and Compute Scalars. You might be able to improve it with some ingenious indexing, hints, and such...but, again...why bother? You'd have to maintain and monitor yet another query plan. So I'm not too keen on this plan myself, and I don't like that this query still doesn't do everything...I want everything or nothing (cue my pouting and tantrum).

As I said, I want it all. The problem for me is that there is that one-in-a-million chance I'll need to search a constraint, which means I'll have to either modify my code or discover a new method...why do it again later if we can make it do what we want now? Plus, and this may be more important to me, I want something that is easier to read and understand later and that uses fewer resources. Time and resources are the name of the game we play as DBAs!

How do we solve this? Can it be done? It sure can! Enter the sacred "OBJECT_DEFINITION" function! This little function helps us out a lot by eliminating the need to manually join the views, and it returns exactly the column we want (the definition)! Let's take a look at the same search using this function instead of the joins:

DECLARE @SearchString NVARCHAR(MAX)
SET @SearchString = 'ENTER_SEARCH_TEXT_HERE'
SELECT [name], OBJECT_DEFINITION([object_id]) AS [definition]
FROM sys.all_objects
--Use the view's object_id directly so objects outside the default schema are found too
WHERE OBJECT_DEFINITION([object_id]) LIKE '%' + @SearchString + '%'
--Optional filters to limit the search results:
--AND [is_ms_shipped] = 0 --Only search user & post SQL release objects
--Change the above value to 1 to only include objects provided by Microsoft with the release of SQL Server
--AND [type] = 'P' --Search only Stored Procedures
--AND [type] <> 'V' --Exclude Views
--See referenced article for listing of additional object types

As you can see, this is a lot cleaner to read. We can easily tell what is being queried and how the query results should be returned to us. Not much to look at here; we query the sys.all_objects view to find the proper object and then tell SQL that we want to view the object's definition...simple enough.

Now, let's look at the execution plan for this query. We know the query is straightforward, so I'm guessing the execution plan will be too! Let's see:

Plan_For_Defintion_Search_by_Object_Definition

Well, it definitely is a smaller execution plan; almost half the size of the original query's plan! Not bad at all. I'd venture a guess that this will be much less resource-intensive on our machine, and that means less time! I like less time!

Now we can see that using SQL Server's "Object_Definition" function improves our ability to search definitions. The takeaway: when possible, use a built-in SQL function; they are there to help us. As they say, 'why reinvent the wheel?' I'd say this object_definition function is already optimized to run inside SQL Server, and it's a fairly safe bet that we DBAs couldn't optimize our JOIN statement to that level...and if you happen to come even close, I'd be willing to wager you spent more time than it was worth getting there. Which brings me back to my original statement...why bother?

Keep looking for those built-in functions; you'll find there are many of them inside SQL Server that will not only make your DBA life easier and quicker, but also more fun!

Until next time, Happy Coding!!!