General discussion of SQL topics, aimed at the novice-to-intermediate Microsoft SQL Server user. Currently focuses on SQL Server 2005.

Wednesday, December 31, 2008

Updated: Search Stored Procedures for Dependencies / References

In a discussion on the MSDN Forums I posted my original code for searching a Stored Procedure (or really any programmable object within SQL Server) for specific text (http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/847b421d-332d-40f8-9649-b26fc2306920). You can view my original blog post describing this code block at: http://sqln.blogspot.com/2008/10/searching-text-of-programmable-objects.html.

Adam Haines posted a wonderful enhancement to my T-SQL code. He returns the matching text as XML instead of plain text; this allows the user to click on the returned value in the results grid and have the full text open in a new window.

The new code, with the XML returned results, is:

DECLARE @StringToSearch NVARCHAR(MAX)
SET @StringToSearch = 'ENTER_SEARCH_TEXT_HERE'

SELECT
    [name],
    (
        SELECT OBJECT_DEFINITION(obj2.object_id) AS [text()]
        FROM sys.all_objects obj2
        WHERE obj2.object_id = obj1.object_id
        FOR XML PATH(''), TYPE
    ) AS Obj_text,
    [type] AS ObjType
FROM sys.all_objects obj1
--Pass object_id directly; OBJECT_ID([name]) cannot resolve unqualified names outside the default schema or dbo
WHERE OBJECT_DEFINITION(obj1.object_id) LIKE '%' + @StringToSearch + '%'
--Optional to limit search results....
--AND [is_ms_shipped] = 0 --Only search user & post SQL release objects
--Change above to 1 if want to search only Microsoft provided objects released with SQL Server
--AND [type] = 'P' --Search only Stored Procedures
--AND [type] <> 'V' --Exclude Views
--See referenced article for listing of additional object types
--http://www.sqlservercentral.com/articles/T-SQL/63471/
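
One side effect of the [text()] approach is that characters such as < and & in a definition are shown as entities (&lt; and &amp;) when the XML is opened. The following is a minimal sketch of a variation, not part of Adam's enhancement, that returns the definition as an XML processing instruction so the text is displayed unescaped. The node name "definition" and the SchemaName column are my own additions for illustration, and I would expect this to fail for a definition that contains the sequence ?>, so treat it as an assumption to test rather than a drop-in replacement.

```sql
DECLARE @StringToSearch NVARCHAR(MAX)
SET @StringToSearch = N'ENTER_SEARCH_TEXT_HERE'

SELECT
    SCHEMA_NAME(obj.[schema_id]) AS SchemaName,
    obj.[name],
    (
        -- The processing-instruction node test keeps the text unescaped;
        -- "definition" is an arbitrary (hypothetical) node name
        SELECT OBJECT_DEFINITION(obj.[object_id]) AS [processing-instruction(definition)]
        FOR XML PATH(''), TYPE
    ) AS Obj_text,
    obj.[type] AS ObjType
FROM sys.all_objects obj
WHERE OBJECT_DEFINITION(obj.[object_id]) LIKE N'%' + @StringToSearch + N'%'
```

When the XML value is clicked, the definition appears wrapped in <?definition ... ?> instead of being entity-encoded.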

Thanks again for this enhancement Adam!

Until next time, Happy Coding!!!

Friday, December 12, 2008

Books Online could be better...and we, the users, already can make it better!

Yesterday, Kalen Delaney wrote a short article discussing whether Microsoft should build in more warning messages, or otherwise do more to warn users about potential problems caused by performing tasks/commands within SQL Server. You can read her article at: http://www.sqlmag.com/Articles/ArticleID/101033/101033.html

I agree that, to a certain extent, the DBA is responsible for the database and should know how to use the software thoroughly...so ideally the database would never be at an unknown risk of data loss. Unfortunately, SQL Server has jumped from the realm of an application to the realm of a data platform! This changes things quite a bit; there is no realistic expectation that a single DBA (or even a small team of DBAs) can know the database software and all of its commands and internal workings 100%. So there is some portion of relief that Microsoft must provide...whether that takes the form of messages, errors, or warnings to the DBA I don't know, and I won't speculate on a method. I do agree that Books Online (BOL) is the SQL Server documentation, and that it is Microsoft's responsibility to ensure that the documentation provides all of the information (including inner workings where necessary) on commands and features that have potentially unsafe effects.

The question is, where does this responsibility end? And at what point does it fall onto the DBA's (or end user's) shoulders to piece together all of this information and understand how they are using the data platform and what data loss potential that usage creates?

The first thing that comes to my mind is that Microsoft has already taken a step by adding a way for the users of BOL to add their own content that can point this type of thing out. Why is it that we DBAs aren't doing this? I hope that after reading this our community will take a moment or two to place a useful comment (especially one warning of potential data loss areas) in BOL for the other, less knowledgeable DBAs.

Until next time, Happy Coding!

Thursday, October 23, 2008

TSQL Code: Compare Tables of Databases on Separate Servers

NOTE: This script was originally developed to compare table row counts, but with the modifications mentioned below it can also help if you have a database that is shipped to another server, often needs to be loaded (ETL) into another database, and you are worried that some records aren't getting transferred properly (or at all).

The other day I came across a question on how to compare the tables in one database to the tables in another database; this was in connection with a migration project. This person had a database on an older server running SQL Server 2000 and chose to migrate it to a new server running SQL Server 2005. They wanted to visually display a comparison of the two databases, table by table, to prove the conversion was successful. Results meeting these requirements are shown in Figure 1.

AW_Compare_All 
Figure 1 (results of running sp_CompareTables_All)

In response I started to develop a TSQL script that creates a stored procedure you run from the current (new) database, pointing it at the location of the original database. The procedure collects the tables and the record count of each table for both databases. It displays the table names on the outside (left and right) of the results, with the record counts next to each other on the inside (next to their table names), which allows for a very simple and pleasing visual comparison. I wanted to avoid, and did avoid, cursors and other techniques that can bog down system resources (such as the stored procs 'sp_MSForEachDB' and 'sp_MSForEachTable').

Now, this is a rough draft that I threw together and tested over a lunch period; so there are some issues that can still be cleaned up in the script, and it lacks the ability to detect a table that exists in one database but not the other.

At the end of the post I'll also provide a quick change that switches the results from displaying all tables and their row counts to displaying only the tables with mismatched row counts (which may be useful if you want to use this script to troubleshoot databases that tend not to migrate all records).

The first problem is how best to access the server with the original database; while there are many options, I chose the built-in stored procedure "sp_addlinkedserver". This procedure is simple to use and integrates seamlessly into TSQL code. Testing for this script was performed on the AdventureWorks database (developed for SQL 2005) on both test systems. The 'originating' database was on a SQL 2005 instance (although I also tested briefly against SQL 2000 and SQL 2008 to validate compatibility). The 'originating' server is called TestServer35 and the database is on an instance called Dev05; the database on both instances is called AdventureWorks2005. This information becomes important when using the sp_addlinkedserver command. I used the following TSQL code:

EXEC sp_addlinkedserver
    @server='TestServer35-5',
    @srvproduct='',
    @catalog='AdventureWorks2005',
    @provider='SQLNCLI',
    @datasrc='Server35\Dev05'

As you can see, the linked server is referenced as TestServer35-5. We will use this reference in a four-part identifier (Server.Catalog.Schema.Table). The next obstacle is to obtain a listing of tables and their row counts. I used a script I had modified last year, since it runs on both SQL 2005 and SQL 2000 (you can view my script in SQLServerCentral.com's Scripts section at: http://www.sqlservercentral.com/scripts/Administration/61766/). I take the results of this and store them in a temporary table; I do the same for the new database (on the local server, where this stored proc is running).

Then comes the simple part of joining the two temp tables into a final temp table. I chose this route because I wanted the two databases' results in separate temp tables in case I want to work with that data later, which I will do in my update that determines whether a table is missing from one of the databases.

Here is the TSQL code I used (remember, if you want to use this you will need to change the linked server information to your own, and create this stored proc in the appropriate database):
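
Before building anything on top of the linked server, it may be worth confirming that it actually responds. The following is a minimal sketch, assuming the linked server was created under the name TestServer35-5 as shown above and that your login is mapped to a login on the remote server. sp_testlinkedserver is available from SQL Server 2005 on, so it will not work if the local server is SQL 2000.

```sql
-- Raises an error if the linked server cannot be reached
EXEC sp_testlinkedserver N'TestServer35-5';

-- Sample four-part query (Server.Catalog.Schema.Table):
-- list a few user tables from the remote database
SELECT TOP 5 lo.name
FROM [TestServer35-5].[AdventureWorks2005].[dbo].[sysobjects] lo
WHERE lo.xtype = 'U'
ORDER BY lo.name;
```

If the second query returns table names, the four-part identifier is working and the stored procedure should be able to read the remote system tables.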

--Change the database name to the appropriate database
USE [AdventureWorks2005];
GO

CREATE PROCEDURE sp_CompareTables_All
AS
SET NOCOUNT ON

--Row counts for the new (local) database
CREATE TABLE #tblNew
( tblName nvarchar(128), CountRows bigint )

INSERT INTO #tblNew
( tblName, CountRows )
SELECT o.name, i.rowcnt
FROM sysobjects o
JOIN sysindexes i
ON i.id = o.id
WHERE i.indid IN (0,1) --heap (0) or clustered index (1): one row per table
AND o.xtype = 'U'
AND o.name <> 'sysdiagrams'

--Row counts for the old database, read through the linked server
CREATE TABLE #tblOld
( tblName nvarchar(128), CountRows bigint )

INSERT INTO #tblOld
( tblName, CountRows )
SELECT lo.name, li.rowcnt
--********
--Replace TestServer35-5 and AdventureWorks2005 below with your appropriate values
--********
FROM [TestServer35-5].[AdventureWorks2005].[dbo].[sysobjects] lo
JOIN [TestServer35-5].[AdventureWorks2005].[dbo].[sysindexes] li
ON li.id = lo.id
WHERE li.indid IN (0,1)
AND lo.xtype = 'U'
AND lo.name <> 'sysdiagrams'

--Pair the tables by name only, so every table found in both databases is listed
--(also requiring equal row counts here would hide the mismatched tables)
CREATE TABLE #tblDiff
( OldTable nvarchar(128), OldRowCount bigint, NewRowCount bigint, NewTableName nvarchar(128) )

INSERT INTO #tblDiff
( OldTable, OldRowCount, NewRowCount, NewTableName )
SELECT ol.tblName, ol.CountRows, nw.CountRows, nw.tblName
FROM #tblNew nw
JOIN #tblOld ol
ON (ol.tblName = nw.tblName)

SELECT * FROM #tblDiff
ORDER BY OldTable

DROP TABLE #tblNew
DROP TABLE #tblOld
DROP TABLE #tblDiff
GO
 

You simply execute the code with the following TSQL:

EXECUTE sp_CompareTables_All
 

The results of this script are shown in Figure 1 (above).

Now, this is great if you want to have a list that you can go through yourself to verify each table matches in row counts. But, what if that database has 1000 or more tables? What if you are just, simply put, lazy? Why not utilize SQL Server to process this information for you?

Well, sure enough, I did just that. With a very small modification to this script you can have it display only the tables that don't match in record counts.

All you have to do is change the "ON" clause of the INSERT INTO #tblDiff block so that it joins when the CountRows are NOT equal. The following is the modified line; the rest of the stored procedure stays the same:

ON (ol.tblName = nw.tblName AND ol.CountRows <> nw.CountRows)
 

I also renamed the stored procedure from "sp_CompareTables_All" to "sp_CompareTables_Diff"; this is optional, but it makes clear which stored proc is being used.

To get some results I made a few modifications to the AdventureWorks2005 database: I added a couple of rows to one table and removed some rows from two others. The results of the stored proc showing only the mismatched tables are shown in Figure 2.

AW_Compare_Diff
Figure 2 (results of running sp_CompareTables_Diff)

As you can see, changing this script to show all tables or only the mismatched tables is very simple. Even setting up the script is simple; the hardest part of the whole thing is adding a linked server (which is fairly simple too).

In a future post I'll revisit this script and include the ability to display tables that exist on one database, but not in the other. Be sure to check back for this update.
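
Until then, here is one possible approach, offered only as a sketch and not necessarily the one the update will take: a FULL OUTER JOIN keeps the rows from both sides, so a table that exists in only one database shows up with NULLs on the other side. The sample table names and row counts below are made up for illustration; in the stored procedure the two temp tables would be filled from sysobjects and sysindexes as shown earlier.

```sql
-- Stand-in data (hypothetical table names and counts)
CREATE TABLE #tblOld ( tblName nvarchar(128), CountRows bigint )
CREATE TABLE #tblNew ( tblName nvarchar(128), CountRows bigint )

INSERT INTO #tblOld ( tblName, CountRows )
SELECT N'Customer', 100 UNION ALL
SELECT N'Orders', 250 UNION ALL
SELECT N'LegacyAudit', 40      -- exists only in the old database

INSERT INTO #tblNew ( tblName, CountRows )
SELECT N'Customer', 100 UNION ALL
SELECT N'Orders', 248 UNION ALL
SELECT N'NewSettings', 5       -- exists only in the new database

-- FULL OUTER JOIN keeps unmatched tables from either side
SELECT ol.tblName AS OldTable,
       ol.CountRows AS OldRowCount,
       nw.CountRows AS NewRowCount,
       nw.tblName AS NewTableName,
       CASE
           WHEN ol.tblName IS NULL THEN 'Missing from old database'
           WHEN nw.tblName IS NULL THEN 'Missing from new database'
           WHEN ol.CountRows <> nw.CountRows THEN 'Row counts differ'
           ELSE 'Match'
       END AS ComparisonResult
FROM #tblOld ol
FULL OUTER JOIN #tblNew nw
ON ol.tblName = nw.tblName
ORDER BY COALESCE(ol.tblName, nw.tblName)

DROP TABLE #tblOld
DROP TABLE #tblNew
```

The CASE expression checks for the NULL sides first, because comparing a NULL row count with <> would never evaluate to true.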

Until next time...Happy Coding!!

Friday, October 17, 2008

Searching text of programmable objects...

This is a follow-up to my article "Finding and Listing ALL Procedures, Functions, and Views in SQL Server 2005", which I wrote for SQLServerCentral.com (http://www.sqlservercentral.com/articles/T-SQL/63471/). The code here works with both SQL Server 2005 and SQL Server 2008.

I've found a few instances where I would've liked to search through a Stored Procedure or a View for specific text in its definition; specifically, to check whether it references another Stored Procedure or View. Originally I solved this by querying the legacy syscomments view for the search string, joined with sysobjects. While this was effective, it has its limitations...for example, it only searches objects created post-release of SQL Server (it skips the objects Microsoft ships). So, what happens if you want to search for a specific string in ALL of the programmable objects? Well, then we fall back on our trusty view, sys.all_objects! We can join it with sys.all_sql_modules and get almost all of what we need...almost.

Now, we can do a clever query that will join sys.all_objects and sys.all_sql_modules. Something like:

DECLARE @SearchString NVARCHAR(MAX)
SET @SearchString = 'ENTER_SEARCH_TEXT_HERE'
SELECT [ao].[name], [asm].[definition]
FROM sys.all_objects ao
JOIN sys.all_sql_modules asm
ON [ao].[object_id] = [asm].[object_id]
WHERE [asm].[definition] LIKE '%' + @SearchString + '%'
--AND [ao].[is_ms_shipped] = 0 --Only search user & post SQL release objects, or
--Change the above value to 1 to only include objects provided by Microsoft with the release of SQL Server
--Optional  to limit search results....
--AND [ao].[type] = 'P' --Search only Stored Procedures
--AND [ao].[type] <> 'V' --Exclude Views
--See referenced article for listing of additional object types

This does the job for the most part; the only missing items are constraints and rules. Then again, how many times do you really need to search a constraint for a specific string of text?

Now, the execution plan shows that this can be a bit of a resource hog. Here's the estimated execution plan I get for this:

Plan_For_Defintion_Search_by_Joins

This Execution Plan is OK; there are a lot of Clustered Index Scans and Compute Scalars. You might be able to improve upon this with some ingenious indexing, hints, and such...but, again...why bother? You'll have to maintain it and monitor yet another query plan. So, I'm not too keen on this plan myself, and I don't like that this query still doesn't do everything...I want everything or nothing (start my pouting and tantrum).

As I said, I want it all. The problem for me here is that there is that 1 in 1,000,000 chance I'll need to search a constraint, and that means I'll have to either modify my code or discover a new method...why bother doing it again later if we can just make it do what we want now? Plus, and this may be more important for me, I want something that is easier to read and understand at a later time and that uses fewer resources. Time and resources are the name of the game we play as DBAs!

How do we solve this? Can it be done? It sure can! Enter the sacred "OBJECT_DEFINITION" function! This little function helps us out a lot by eliminating the need to manually join tables, and what it returns is exactly the column (the definition column) that we want! Let's take a look at the same search using this function instead of joins:

DECLARE @SearchString NVARCHAR(MAX)
SET @SearchString = 'ENTER_SEARCH_TEXT_HERE'
SELECT [name], OBJECT_DEFINITION([object_id]) AS [definition]
FROM sys.all_objects
--Pass [object_id] directly; OBJECT_ID([name]) cannot resolve unqualified names outside the default schema or dbo
WHERE OBJECT_DEFINITION([object_id]) LIKE '%' + @SearchString + '%'
--Optional  to limit search results....
--AND [is_ms_shipped] = 0 --Only search user & post SQL release objects
--Change the above value to 1 to only include objects provided by Microsoft with the release of SQL Server
--AND [type] = 'P' --Search only Stored Procedures
--AND [type] <> 'V' --Exclude Views
--See referenced article for listing of additional object types

As you can see, this is a lot cleaner to read. We can easily tell what is being queried and how the results will be returned to us. Not much to look at here; we query the sys.all_objects view to find each object and then tell SQL Server that we want to view the object's definition...simple enough.

Now, let's look at the execution plan for this query. We know the query is straightforward; I'm guessing the execution plan will be also! Let's see:
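
One thing to watch for with any LIKE-based search: the characters _, % and [ are wildcards, and object names such as sp_addlinkedserver are full of underscores, so a search can match more than you intended. Here is a minimal sketch of the same search with the wildcards escaped. The @Pattern variable, the backslash escape character, and the SchemaName column are my own choices for illustration.

```sql
DECLARE @SearchString NVARCHAR(MAX)
DECLARE @Pattern NVARCHAR(MAX)
SET @SearchString = N'sp_addlinkedserver'

-- Escape the escape character first, then each LIKE wildcard
SET @Pattern = N'%'
    + REPLACE(REPLACE(REPLACE(REPLACE(@SearchString,
          N'\', N'\\'),
          N'%', N'\%'),
          N'_', N'\_'),
          N'[', N'\[')
    + N'%'

SELECT SCHEMA_NAME([schema_id]) AS SchemaName, [name], [type]
FROM sys.all_objects
WHERE OBJECT_DEFINITION([object_id]) LIKE @Pattern ESCAPE N'\'
```

Without the escaping, the underscore in the search text would match any single character, so something like 'spXaddlinkedserver' would also be returned.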

Plan_For_Defintion_Search_by_Object_Definition

Well, it definitely is a smaller execution plan. Almost 1/2 the plan of the original query! Not bad at all. I'd venture a guess that this will be much less resource-intensive on our machine, and that means less time! I like less time!

Now, we can see that using SQL Server's "OBJECT_DEFINITION" function improves our ability to search the definitions. I'm sure you can see from the above that, when possible, you should use a built-in SQL function; they are here to help us. As they say...'why reinvent the wheel?' I'd say for sure that the OBJECT_DEFINITION function is already optimized to run in SQL Server, and it's a fairly safe bet that we DBAs wouldn't be able to optimize our JOIN statement to this level...and if you do happen to come close to the same optimization, I'd be willing to wager that you spent more time than it was worth to get to that point. With that said, it brings me back to my original question...why bother?

Keep looking for those built-in functions; you'll find there are many of them inside SQL Server that will make your DBA life not only easier and quicker, but also more fun!

Until next time, Happy Coding!!!

Friday, August 22, 2008

VS/SQL 2008 Developers need more space!

I’m not just talking about needing a larger office (although, that couldn’t hurt). I’m talking about hard drive space!

With the release of SQL Server 2008, I’ve officially moved my trusty development machine over to the latest and greatest MS technology: Visual Studio 2008 Professional Edition and SQL Server 2008 Developer Edition.

Before you do the same, make sure you have plenty of hard drive space[1]. If you already have Visual Studio 2008 installed and then try to install SQL 2008, you’ll get a friendly little message stating that you MUST have VS 2008 Service Pack 1 installed. I’m not quite sure why you MUST have VS 2008 SP 1 installed to run SQL Server 2008 and VS 2008 on the same machine; but if I do find a reason (other than my guesses: to enable some sort of feature, or to let the two applications see each other) I’ll post an update. Either way, you need to budget hard drive space for the service pack as well as for the two products.

Here is a simple breakdown of the hard drive space requirement for each product I installed:

Product Name | Hard Drive Space Required | URL with Hard Drive Requirement Specifications
Visual Studio 2008 (Professional Edition) | 2.2 GB | http://msdn.microsoft.com/en-us/vs2008/products/bb894726.aspx
Visual Studio 2008 Service Pack 1 | 3.8 GB (plus 600 MB on installed Hard Drive for VS 2008) | http://download.microsoft.com/download/A/2/8/A2807F78-C861-4B66-9B31-9205C3F22252/VS2008SP1Readme.htm
SQL Server 2008 (Developer Edition) | 2.0 GB (required even if installing components to other hard drives) | http://msdn.microsoft.com/en-us/library/ms143506.aspx

As you can see, VS 2008 SP 1 is no small update. Also note that these hard drive space requirements are based on installing all features/options of each application and the final installation space used will typically be smaller than the requirements specifications. Requirements typically include space required for temporary files during installation of the product.

This comes to a total of around 8 GB of space during installation, and around 6.5 GB upon completion of installation. This, of course, does not include any project files, database files or add-ons.

What if hard drive space is at a premium and you just want to test out these new applications? How do you use less space? The simplest way is to use the Express editions of each product and only install the languages/features you absolutely need or want to use/learn. Installing a language such as C++ for a developer who only knows and uses VB would seem pointless, unless that developer intends to learn and develop with C++.

Is there a way to bypass the requirement to install VS 2008 SP 1? Well, I haven’t tried it myself; but, conceptually, if you install SQL 2008 first and then install VS 2008, you won’t get any message during either installation requiring SP 1. Now, that doesn’t mean that VS and SQL will play nice with each other. It also doesn’t mean that you won’t get some sort of message or error at a later time stating that you need to install SP 1. As mentioned earlier, I don’t know why the SQL 2008 installation requires VS 2008 SP 1 on a machine that has VS 2008; I’d imagine there is a good reason though. Try at your own risk.

Until next time, Happy Coding!


[1] The installation hard drive size requirement is based on my personal experience and may vary per installation.

Tuesday, August 12, 2008

SQL Server 2008 Officially RTM!

On August 11, 2008 Microsoft announced through their MSDN Flash newsletter that SQL Server 2008 has been released to manufacturing (RTM). This has been a long-awaited release for many of us in the SQL community! Congratulations to Buck Woody and the many, many others on the SQL development team on this release!

While August 11, 2008 is probably the official RTM date, those of you that have an MSDN subscription may have been enjoying the RTM a week earlier (August 6th, 2008 through the MSDN Subscriptions website) as announced by Buck Woody's blog (Carpe Datum). You can view the post at: http://blogs.msdn.com/buckwoody/archive/2008/08/06/news-flass-sql-server-2008-is-here.aspx.

Undoubtedly, you can find many blogs and posts on the new features of SQL Server 2008. It looks to be a great improvement on SQL Server 2005 in just about every area: security, development, and administration, to name a few. There seems to be a little something for everyone to get excited about, ranging from Business Intelligence for the enterprise users, to IntelliSense for TSQL syntax for the programming DBA, to the Resource Governor for the IT administrators.

I look forward to using SQL Server 2008, and to enjoying a more easily administered database with tons of new features, flexibility, and control!

Until next time...Happy Coding!

Wednesday, July 2, 2008

Publication: Introduction to the Transaction Log...

Today SQLServerCentral.com (SSC) published my article that introduces the Transaction Log. The primary purpose of this article is to help newer DBAs understand how the transaction log works, and to clarify some commonly misunderstood parts of maintaining the transaction log.

This article covers a very broad and general introduction; including how the log is filled by SQL Server, easy to follow analogies to help grasp how transaction logs work with SQL Server, and some Good Practices to use when Transaction Logs are in use by SQL Server.

You can read the entire article at: http://www.sqlservercentral.com/articles/Design+and+Theory/63350/

Please feel free to join in the discussion for this article; there are some great posts that help to further clarify the topic and also provide additional information that was not included in the article. You can join the discussion at: http://www.sqlservercentral.com/Forums/FindPost527448.aspx.

Tuesday, July 1, 2008

Concept: Finding the Right Order...

The other day I was helping a friend of mine with a space issue on their server, and one of the topics I discussed was the importance of finding the best order in which to perform the operations so as to use the least amount of extra hard drive space. While discussing this, it came to mind that while plugging away with SQL we often get so intent on getting the job done (often in the first way that works, which is not necessarily the best way) that we take for granted all of the hard drive space and server power available to us!

With just a little bit of forethought, planning, and understanding what we are trying to accomplish, we can actually use less disk space...and we could also potentially improve the performance we get out of SQL Server! I don't know about you, but getting better performance always ranks high on my list of desired accomplishments. I think this can be best demonstrated with an example....

For simplicity I will use a database with a single table in which each inserted record takes up 1 MB of hard drive space. Thus, 1,000,000 records in our "simple database" result in a database size of 1,000,000 MB. Now, let's say that on average this database grows by about 100,000 records per year. This means that with 1,000,000 records this database has been in use for about 10 years. Now, assume there is a decent DR plan in place and the transaction log is kept at a fairly decent size of maybe 1,000 MB to accommodate a month's worth of data before the FULL backup operation. Simple enough, right?

So, 10 years have passed and the boss walks in and says the company has just merged with another company that has an identical database setup and stats, only with different data, and the two databases will be merged (since this is hypothetical, we will pretend that the data can be merged without any issues and that doing so simply doubles our database size). Also, the boss says that they are archiving 5 years' worth of data from both databases to match the business rules for on-site hard copy storage and archiving. And lastly, the boss says "...because you don't look busy..." that they only want enough space in the database to allow for 2 years of data to be added, and that they need the extra hard drive space for other needs.

At first glance you might say: not a problem. We'll just merge the two databases, take out 5 years' worth of data, perform a FULL backup, shrink the transaction log (leaving enough free space for a month's worth of data), and then shrink the database down (leaving enough room for 2 years' worth of data to be added later). I'm going to ignore the transaction log size in this analysis, because it will typically end up at the same size regardless of the steps taken; how it gets there depends on the database recovery model and the method used to perform these steps, and analyzing each possibility is beyond the scope and purpose of this blog. Now, let's look at how this might look on our system after each step...

First step in the above scenario would be to merge the two databases (remember that since the databases are originally from different companies, they are physically stored on separate hard drives).

HD Space Used: 2,000,000 MB
SQL Record Count: 2,000,000 records
SQL Space Used: 2,000,000 MB

Second step would be to take out 5 years worth of data.

HD Space Used: 2,000,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,000,000 MB

Third step would be to perform a FULL backup (this is based on a FULL recovery model).

HD Space Used: 2,000,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,000,000 MB

Fourth and fifth steps would be to shrink the log and the database.

HD Space Used: 1,200,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,200,000 MB

 

From the above we can see that the maximum HD space used is 2,000,000 MB. Depending on our available HD space, this might not be much of a problem. The real issue is the amount of resources needed; since this is hypothetical, we can only infer from the step-by-step that the resource use would be fairly moderate. There are too many variables to be accurate about actual performance, so I won't even attempt to measure it.

Now, what if we put a little thought into these requirements from the "boss", and especially, what if the available HD space were only 1,800,000 MB? Now we have a problem, but one that is easily resolved by finding the right order. In this case, simply moving Step # 1 (the merge) to the very end and performing an extra (final) FULL backup and a (final) transaction log shrink allows us to stay under 1,800,000 MB, and can also save our server a lot of processing time because it has fewer records to process. We can also use the processing power of the secondary server (the one holding the data from the company being merged in) to cut the total time required to archive, which in turn decreases the time required to merge because fewer records are being merged. Here is how the steps would look....
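
For reference, the backup and shrink steps might look something like the following in TSQL. This is only a sketch under stated assumptions: the database name (SimpleDB), the logical file names (SimpleDB_Data, SimpleDB_Log), and the backup paths are all hypothetical, and the target sizes are simply the figures from the scenario. Note that under the FULL recovery model it is a log backup, not a full backup, that allows log space to be reused, so the sketch includes one before shrinking the log.

```sql
-- Third step: FULL backup (hypothetical database name and path)
BACKUP DATABASE [SimpleDB]
TO DISK = N'D:\Backups\SimpleDB_Full.bak'
WITH INIT;

-- Log backup so the inactive part of the log can be reused before shrinking
BACKUP LOG [SimpleDB]
TO DISK = N'D:\Backups\SimpleDB_Log.trn'
WITH INIT;
GO

USE [SimpleDB];
GO

-- Fourth step: shrink the log file to about 1,000 MB (target size is given in MB)
DBCC SHRINKFILE (N'SimpleDB_Log', 1000);

-- Fifth step: shrink the data file to 1,200,000 MB,
-- leaving the reserved growth room described in the scenario
DBCC SHRINKFILE (N'SimpleDB_Data', 1200000);
GO
```

You can look up the real logical file names for your own database in sys.database_files before running anything like this.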

Step # 1: Remove data from each database at its original location

HD Space Used (per Server): 1,000,000 MB
SQL Record Count (per Server): 500,000 records
SQL Space Used (per Server): 500,000 MB

Step # 2: Perform FULL backups for each database

HD Space Used (per Server): 1,000,000 MB
SQL Record Count (per Server): 500,000 records
SQL Space Used (per Server): 500,000 MB

Step # 3: Shrink the transaction log and databases

HD Space Used (per Server): 500,000 MB
SQL Record Count (per Server): 500,000 records
SQL Space Used (per Server): 500,000 MB

Step # 4: Merge the two databases

HD Space Used: 1,000,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,000,000 MB

Step # 5: Perform FULL backups for database and shrink transaction log

HD Space Used: 1,200,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,200,000 MB

 

Conclusion:

Now, you can easily see in the above scenario that we only ever use a maximum of 1,200,000 MB of space, and that is only because we are reserving the free space. Logically, by archiving the data before merging the two databases we save server processing power, because the merge processes ONLY the data that will remain in the database. Keep in mind that sometimes the order of operations is pre-determined by outside factors, and sometimes certain tasks must be performed in a particular sequence.

I hope this illustrates that it's not always good enough to just find a way to make things work; and that with a little planning in the beginning you can save yourself time, lower the requirements to perform the tasks at hand, and also improve the performance of your SQL Server!
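
The comparison can also be tallied with a quick query. The sketch below simply records the "HD Space Used" figure from each step of both plans (the plan names and the table variable are my own labels for illustration) and reports the peak for each plan against the 1,800,000 MB that is available.

```sql
DECLARE @AvailableMB int
SET @AvailableMB = 1800000

DECLARE @Steps TABLE
( PlanName varchar(20), StepNo int, StepName varchar(60), DiskMB int )

-- "HD Space Used" after each step, copied from the two walk-throughs above
INSERT INTO @Steps ( PlanName, StepNo, StepName, DiskMB )
SELECT 'Merge first', 1, 'Merge the two databases', 2000000 UNION ALL
SELECT 'Merge first', 2, 'Remove 5 years of data', 2000000 UNION ALL
SELECT 'Merge first', 3, 'FULL backup', 2000000 UNION ALL
SELECT 'Merge first', 4, 'Shrink log and database', 1200000 UNION ALL
SELECT 'Archive first', 1, 'Remove 5 years of data (per server)', 1000000 UNION ALL
SELECT 'Archive first', 2, 'FULL backups (per server)', 1000000 UNION ALL
SELECT 'Archive first', 3, 'Shrink log and databases (per server)', 500000 UNION ALL
SELECT 'Archive first', 4, 'Merge the two databases', 1000000 UNION ALL
SELECT 'Archive first', 5, 'FULL backup and shrink log', 1200000

-- Peak disk use per plan, and whether it fits in the available space
SELECT PlanName,
       MAX(DiskMB) AS PeakDiskMB,
       CASE WHEN MAX(DiskMB) <= @AvailableMB THEN 'Yes' ELSE 'No' END AS FitsAvailableSpace
FROM @Steps
GROUP BY PlanName
ORDER BY PlanName
```

The "Archive first" plan peaks at 1,200,000 MB and fits, while the "Merge first" plan peaks at 2,000,000 MB and does not.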

Wednesday, June 18, 2008

Listing User-Defined Stored Procedures in SQL Server 2005

You can easily obtain a listing of the User-Defined Stored Procedures in any of your databases. In fact, Microsoft has made this information very easily accessible. You only need to query a catalog view called "sys.procedures", using a small piece of TSQL code such as the following...

USE [your_database_name_here];
GO
SELECT * FROM sys.procedures
ORDER BY [name];
GO

Your results will vary based on what Stored Procedures you or anyone else with access to your database have created. The following is a sample of the results I have obtained when running this code on my 'model' database.

User-Defined Stored Procedures

The most important thing to keep in mind with this TSQL code is that each database may contain different User-Defined SPs.
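
If SELECT * returns more columns than you need, a trimmed-down variation might look like the sketch below. The choice of columns, and the filter on is_ms_shipped, are my own suggestions rather than anything required.

```sql
USE [your_database_name_here];
GO
-- Schema, name, and dates for each procedure not shipped by Microsoft
SELECT SCHEMA_NAME([schema_id]) AS SchemaName,
       [name],
       create_date,
       modify_date
FROM sys.procedures
WHERE is_ms_shipped = 0
ORDER BY SchemaName, [name];
GO
```

Including the schema name helps when two schemas contain procedures with the same name.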

Do you want to get a listing of EVERY single Stored Procedure in your database(s)? If so, then be sure to check out SQLServerCentral.com for one of my upcoming articles! I'll post the direct link to the article as soon as it becomes available!

You'll want to check this article out when it is published...you'll NEVER have to search the Internet again to find out what SPs are in SQL Server 2005!! UPDATE: You can read this article on SQLServerCentral.com (http://www.sqlservercentral.com/articles/T-SQL/63471/). Please leave feedback if you have a few moments.

TIP:

Do you want all of the databases you create from here on (not previously created databases) to include a specified stored procedure or set of stored procedures? If so, create the desired Stored Procedure(s) in your 'model' database. Any new database will then get the SP(s) you created in 'model', because new databases are based on the 'model' database!
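
Here is a minimal sketch of the tip in action. The procedure name (usp_ListUserTables) and the database name (ModelTipDemo) are hypothetical; substitute your own, and try it on a test server first.

```sql
USE [model];
GO
-- Hypothetical helper procedure; any procedure created here is copied into new databases
CREATE PROCEDURE dbo.usp_ListUserTables
AS
    SELECT SCHEMA_NAME([schema_id]) AS SchemaName, [name]
    FROM sys.tables
    WHERE is_ms_shipped = 0
    ORDER BY SchemaName, [name];
GO

-- A new database is created as a copy of 'model', so it already contains the procedure
CREATE DATABASE [ModelTipDemo];
GO
USE [ModelTipDemo];
GO
EXECUTE dbo.usp_ListUserTables;
GO

-- Cleanup when you are done experimenting:
-- USE [master]; DROP DATABASE [ModelTipDemo];
-- USE [model]; DROP PROCEDURE dbo.usp_ListUserTables;
```

Keep in mind that everything in 'model' is copied, so only add objects there that you really want in every new database.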

 

-----------------------

Legal Disclaimer/Warning
Please remember that any SP whose outcome or effect you are not sure of should only be run on test systems; never use anything in a production environment that has not been thoroughly tested. I am not encouraging you to use any type of Stored Procedures (documented or undocumented); I am only informing you of the method that can be used to obtain a listing of all procedures that are found within SQL Server 2005. Microsoft strongly states that undocumented Stored Procedures, Extended Stored Procedures, functions, views, tables, columns, properties, or metadata are NOT to be used and do not have any associated support; and Microsoft Customer Support Services will not support any databases or applications that leverage or use any undocumented entry points. Please refer to: http://msdn.microsoft.com/en-us/library/ms166021.aspx for Microsoft's legal disclaimer and further information on Microsoft's support for use of stored procedures.

Thursday, May 22, 2008

Optimizing Indexes

Summary:

Optimizing Indexes is quite a complicated subject. There are many techniques, and the most difficult part is that there are so many different scenarios for when and how to optimize your indexes. Instead of trying to describe a method of how to optimize, I am going to discuss how to identify when optimizing indexes should be considered and where to go to find the different options.

The first step is NOT to just rebuild all indexes, or to defrag them all. There are some people who believe that rebuilding all indexes is the way to solve this; don't listen to them!! PLEASE DON'T!!! I'm rarely the person to advise against listening to other people; most people can help you learn about what's good or bad...but if you go the route of rebuilding and defragging all your indexes, you can in fact cause additional problems that you didn't have to cause. In particular, you will be using system resources that don't need to be used, and causing unneeded fragmentation at the OS level! Any decent DBA is always concerned with the use of system resources; remember that system resources are always a precious commodity and should never be wasted.

The first step is to analyze the database. The next step is to identify the queries that can/will benefit from optimizing the index(es). The third step is to optimize the index. The final step is to periodically review these methods to identify when optimizing indexes will be required; this will be an ongoing project and will ultimately require reiterating through your queries to identify when an index has become a candidate for optimization. There is no cut-and-dried method to warn you, to prevent indexes from becoming out of tune, or to auto-optimize the queries and indexes.

Analyzing the Database:

So, how do you analyze a database to determine whether an index needs to be rebuilt or defragmented? Well, first you need to understand how indexes are built and what is causing them to become fragmented. Review my blog entry "Introduction To Indexes" to learn about how indexes are built. Also, take the time to review this white paper called "Microsoft SQL Server 2000 Index Defragmentation Best Practices"; even though it is written for SQL Server 2000, it still applies to SQL Server 2005. It is a long read, but will help out well beyond the scope of this blog entry.

Here, in a nutshell, is how to determine when your index is fragmented. Use SQL Profiler to identify poorly performing queries; in particular, use the template "SQLProfilerTSQL_Duration", which already contains the trace events needed to identify the offending queries. Once the queries are identified you can start looking into which indexes these queries are accessing; especially queries that are retrieving ranges instead of singletons. These queries are at the highest risk of having fragmented indexes; remember, though, that this is just a method to determine the potential problem areas. Your own judgement will be best to make the final determination.
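Once Profiler has pointed you at a few suspect tables, SQL Server 2005 can also report fragmentation directly through the sys.dm_db_index_physical_stats function. The query below is only a sketch of one way to look at it; it assumes the database runs at compatibility level 90, and the 'LIMITED' scan mode is my choice to keep the check lightweight.

```sql
USE [your_database_name_here];
GO
-- Report fragmentation for every index in the current database.
DECLARE @db_id INT;
SET @db_id = DB_ID();

SELECT
    OBJECT_NAME(ips.object_id)       AS table_name,
    i.[name]                         AS index_name,
    ips.index_type_desc,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(@db_id, NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
    ON  i.object_id = ips.object_id
    AND i.index_id  = ips.index_id
WHERE ips.index_id > 0   -- index_id 0 is a heap (a table with no clustered index)
ORDER BY ips.avg_fragmentation_in_percent DESC;
GO
```

Books Online suggests, as a rough starting point only, reorganizing an index at roughly 5% to 30% fragmentation and rebuilding above 30%; very small indexes (a low page_count) are usually not worth worrying about. Treat those numbers as a guide, not a rule.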

When to Consider Optimizing Indexes:

The first sign that you should consider optimizing your indexes is when you are observing performance degradation and have no obvious culprit for it. Now, keep in mind that performance degradation by itself doesn't indicate index fragmentation. This can only be determined by analyzing your database properly. Next, identify which queries are utilizing the most I/O; these are the next candidates. Then consider queries that have workloads that are suspected to benefit from index defrags; this can be quite difficult and should be carefully determined.
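For the "most I/O" step, one possible starting point in SQL Server 2005 is the sys.dm_exec_query_stats view. This is a sketch under a few assumptions: you have VIEW SERVER STATE permission, the database is at compatibility level 90 (needed for CROSS APPLY), and the TOP (10) cutoff is an arbitrary number I picked.

```sql
-- Show the cached queries that have done the most logical I/O.
SELECT TOP (10)
    qs.total_logical_reads + qs.total_logical_writes AS total_logical_io,
    qs.total_physical_reads,
    qs.execution_count,
    st.[text] AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY total_logical_io DESC;
```

Keep in mind that these numbers only cover plans still in the cache since the last restart, so a query that ran once yesterday may not show up at all.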

All of these are covered in the MSDN white paper called "Microsoft SQL Server 2000 Index Defragmentation Best Practices", mentioned earlier. This paper even goes into very good detail on how to make these determinations, which will probably be enough to get you through while you gain experience.

Getting Help:

Sometimes it's nice to have someone more experienced help out; if that is how you feel then go with that feeling and seek out that someone.

If you don't have physical access to an experienced DBA then seek advice from trusted websites, forums, and/or discussion groups. Remember when seeking advice that the quality of the answer can only match the quality of the information you provided. For example, simply stating that you have a database that needs indexes rebuilt or defragged will most likely get you answers telling you to use DBCC commands or some other commonly used index rebuilding command. The purpose of providing detailed information is to get an answer that is specific to your scenario; so be sure to provide as much information as possible without breaking any company policies.

Conclusion:

As you can see, Indexes can be fairly simple to optimize. It's determining when to optimize and what to optimize that becomes difficult.
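To show how small the "optimize" step itself can be, here is a sketch of the two SQL Server 2005 commands most often used once you have decided an index needs attention. The table dbo.Customers and the index ix_Customers_Addresses are hypothetical names used for illustration; substitute your own.

```sql
-- Lighter option: defragment the leaf level in place (the index stays online).
ALTER INDEX ix_Customers_Addresses ON dbo.Customers
REORGANIZE;
GO

-- Heavier option: drop and re-create the index structure.
ALTER INDEX ix_Customers_Addresses ON dbo.Customers
REBUILD;
GO
```

You would normally run only one of these for a given index, and only after your analysis says it is warranted.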

There are methods to fine-tune how your indexes are handled automatically; this is covered in the white paper I mention in the next section. You'll always find different opinions and experiences; embrace all you can and mix and match what works best for you and your situation. There is no one-size-fits-all for optimizing indexes, just as there isn't a one-size-fits-all for database solutions. It's all about customizing to your needs, and utilizing your available resources to make your work easier and more enjoyable.

I can't stress enough that reading the "Microsoft SQL Server 2000 Index Defragmentation Best Practices" white paper will help out tremendously. Originally I had planned on providing tips on how to identify the queries that required indexes to be rebuilt, and how to find additional help on this topic. During my research I came across this white paper, and it covered absolutely everything I had planned to cover and more!

Until next time, Happy Coding!!

Additional Resources:

Microsoft SQL Server 2000 Index Defragmentation Best Practices (http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx)
SQL Server Best Practices (http://msdn.microsoft.com/en-us/sqlserver/bb671432.aspx)
How To: Optimize SQL Indexes (http://msdn.microsoft.com/en-us/library/ms979195.aspx)
Database Journal: Reducing SQL Server Index Fragmentation (http://www.databasejournal.com/features/mssql/article.php/2238211)
SQLServer Performance: SQL Server Index Fragmentation and Its Resolution (http://www.sql-server-performance.com/articles/per/index_fragmentation_p1.aspx)

Tuesday, May 6, 2008

CREATE INDEX (Transact-SQL)

Summary:

This covers the syntax and some examples on how to create an Index for a table. I’ll finish this blog entry with an alternative method for creating an index using SQL Server Management Studio (SSMS).

Syntax:

The following is the main syntax for CREATE INDEX from Books Online (BOL). You can view the entire syntax by visiting the referenced link.
Reference: http://msdn.microsoft.com/en-us/library/ms188783.aspx
NOTE: If you do not understand how to read this syntax please review my blog entry “Understanding MSDN Syntax Conventions”.

CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
    ON <object> ( column [ ASC | DESC ] [ ,...n ] )
    [ INCLUDE ( column_name [ ,...n ] ) ]
    [ WITH ( <relational_index_option> [ ,...n ] ) ]
    [ ON { partition_scheme_name ( column_name )
         | filegroup_name
         | default
         }
    ]
[ ; ]

UNIQUE – The index key used may not be duplicated in another row.

CLUSTERED | NONCLUSTERED – When using CLUSTERED, the logical order of the key values determines the physical order of the rows in the table. You may only have a single clustered index per table. CLUSTERED indexes should be created before any NONCLUSTERED indexes. In cases where the CLUSTERED index is created after the NONCLUSTERED index, the NONCLUSTERED indexes will be rebuilt.
NOTE: NONCLUSTERED is the default for CREATE INDEX; if you omit the [CLUSTERED | NONCLUSTERED] argument, SQL Server creates the index as NONCLUSTERED, whether or not the table already has a CLUSTERED index. (CLUSTERED is the default only for a PRIMARY KEY constraint, and only when the table does not already have a CLUSTERED index.)

index_name – Gives your index a name. A common practice is to prefix the name with “idx_” or “ix_”. An example of an index for Last Names of a customer table might be called “idx_Customers_Last_Names” or “ix_Customers_Last_Names”.

<object> - Name of the table the index is being created for. This can be up to a three-part name, such as DatabaseName.Schema.TableName (CREATE INDEX does not accept a server name); as with other commands, you do not always have to fully qualify the <object>. You only need to identify the table without causing confusion. For example, if you have two tables called Customers, then you’d at minimum need to qualify the table with the schema name, such as Colorado.Customers and California.Customers.

column [ASC | DESC] – Specify the column, or columns, to make the index for. You must specify at least one column, and in SQL Server 2005 an index key can contain at most 16 columns with a combined size of no more than 900 bytes. Typically it is suggested, for CLUSTERED indexes, to use only the columns that can logically be organized; such as the First_Name and Last_Name columns. In NONCLUSTERED indexes you usually want to add ‘helper’ columns; such as the Street_Address and Phone_Number. See my blog entry “Introduction to Indexes” for additional details in choosing the appropriate columns for indexing.

<relational_index_option> - You can specify additional options with this argument, this goes beyond the scope of this blog entry. I might cover this in a later entry. For now, if you want specific details as to what options you can use with this argument and how to use the argument then review the BOL syntax command at the referenced location above.

partition_scheme_name (column_name) | filegroup_name | default - You can specify where the index is stored: on a partition scheme (partitioned by the named column), on a specific filegroup, or on the default filegroup. This goes beyond the scope of this blog entry. I might cover this in a later entry. For now, if you want specific details as to what options you can use with this argument and how to use the argument then review the BOL syntax command at the referenced location above.
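To give a feel for how the optional clauses fit together, here is a sketch that uses INCLUDE, WITH, and ON in a single statement. The table dbo.Customers, its City and Phone_Number columns, the index name, and the fill factor of 90 are all assumptions made for illustration; they are not recommendations.

```sql
-- A NONCLUSTERED index keyed on City that also carries Phone_Number
-- at the leaf level, so a query needing only those two columns can be
-- answered from the index alone.
CREATE NONCLUSTERED INDEX ix_Customers_City   -- hypothetical name
    ON dbo.Customers (City ASC)
    INCLUDE (Phone_Number)          -- non-key 'helper' column
    WITH (FILLFACTOR = 90)          -- one example of a <relational_index_option>
    ON [PRIMARY];                   -- store the index on the PRIMARY filegroup
GO
```

Note that INCLUDE is only allowed on NONCLUSTERED indexes.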

Simple Terminology:

As you can see from just the little bit of the syntax I’ve posted, this can be quite a complicated T-SQL command. Yet, it will be among the most commonly used throughout the creation and life-span of your tables. You’ll constantly find yourself tweaking your indexes as your needs change and the data defined within the database changes. You can think of indexes as children of your table; as with all children, they will grow in complexity and evolve as their experiences grow. The indexes on a table go through similar phases of ‘life’.

As with the Customers table example, you might originally only be storing the customer’s first name, last name, street address, city, state, zip. So you may have a CLUSTERED index on the last name, then first name columns. Maybe even create a NONCLUSTERED index for the street address.

Now, let’s say a couple of years later you find that you now want to store the customer’s phone number, fax number, maybe mailing list columns with an opt-in designator for your mailing lists. Then you decide it would be nice to just look up customers by their phone numbers, or find the customers who’ve ‘opted-in’ to certain mailing lists. You might then create additional NONCLUSTERED indexes to make these searches more efficient. Especially the Mailing Lists opt-in columns (assuming you have 100s of mailing lists…following Normalization rules would mean this should be in a separate table; but for this example it is in the Customers table).

Now, let’s say a year later you decide to Normalize your Customers table and separate the Mailing Lists columns into a “Mailing_Lists” table. Obviously the indexes for the Mailing Lists won’t be needed in the Customers table, thus you’d drop those indexes; and most likely you would’ve created the appropriate indexes in the “Mailing_Lists” table when you created the table.
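Dropping an index that is no longer needed is a one-line command in SQL Server 2005. The sketch below assumes a hypothetical index named ix_Customers_Mailing_List_OptIn on dbo.Customers; the existence check is just a precaution so the script can be re-run safely.

```sql
-- Drop the index only if it is actually there.
IF EXISTS (SELECT 1
           FROM sys.indexes
           WHERE [name] = N'ix_Customers_Mailing_List_OptIn'   -- hypothetical name
             AND object_id = OBJECT_ID(N'dbo.Customers'))
BEGIN
    DROP INDEX ix_Customers_Mailing_List_OptIn ON dbo.Customers;
END;
GO
```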

As you can see, the indexes can have different reasons to be tweaked. I find most commonly I will look into tweaking the indexes when I have large queries running that are taking up resources. I can usually find an index that could be added or modified that can help to improve the efficiency of the search results being returned. There are many different methods and ways to determine when to use an index and how to optimize your indexes; I’d suggest trial and error (with test systems only) as a first option. I’d also suggest reading up on optimizing queries and/or SQL Server performance (in that order). Queries are what drive your data, what gives you your results.

It’s usually a good idea to be in the habit of obtaining performance information, especially in large databases, and to periodically review the usage of your indexes and adjust them as appropriate. There is no perfect formula, but there are many good methods and discussions on how to achieve the best performance. Always be willing to read and try to understand your options; and when possible spend time testing to see how things are affected by your changes. What might look good today could end up causing problems you won’t see until a few days have passed…this is why I must stress…TEST, TEST, TEST!
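One way to review index usage in SQL Server 2005 is the sys.dm_db_index_usage_stats view. The query below is a sketch of that idea, not a complete tuning method; it lists every index on user tables in the current database alongside how often it has been read and written.

```sql
USE [your_database_name_here];
GO
-- Reads (seeks, scans, lookups) versus writes (updates) per index.
-- A NULL in the usage columns means the index has not been touched
-- since the counters were last reset.
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.[name]                 AS index_name,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON  us.object_id   = i.object_id
    AND us.index_id    = i.index_id
    AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, index_name;
GO
```

The counters are cleared whenever SQL Server restarts, so an index that looks unused may simply not have been needed yet. An index with many updates and almost no reads is a candidate for a closer look, not an automatic drop.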

Example Syntax:

The following will create a UNIQUE CLUSTERED index on the Customers table using the Last Name and First Name columns (notice the order of names is Last then First, because searches will typically be performed on the last name, and then the results would be sorted by the first name):

USE myDemoTable;
GO
CREATE UNIQUE CLUSTERED INDEX idx_Customers_Names
ON Customers (Last_Name, First_Name);
GO

The following example will create a NONCLUSTERED index on the Customers table using the Street Address column:

USE myDemoTable;
GO
CREATE NONCLUSTERED INDEX ix_Customers_Addresses
ON Customers (Street_Address);
GO

The following example creates a UNIQUE NONCLUSTERED index on the Customers table using the customer’s phone number column. This will ensure that no new customer can be given a phone number that an existing customer already has:

USE myDemoTable;
GO
CREATE UNIQUE NONCLUSTERED INDEX idx_Customers_Phone_Numbers
ON Customers (Phone_Number);
GO

If you were to attempt to enter a new customer and use a phone number that already exists with another customer you will get a “Msg 2601, Level 14” error code that states you cannot insert a duplicate key.
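If you want to see this error for yourself without touching a real table, the following sketch reproduces it with a temporary table. The table #Customers_Demo, its columns, and the sample names and phone number are all made up for the demonstration.

```sql
-- Throwaway table with a UNIQUE index on the phone number.
CREATE TABLE #Customers_Demo (
    First_Name   NVARCHAR(50) NOT NULL,
    Last_Name    NVARCHAR(50) NOT NULL,
    Phone_Number VARCHAR(20)  NOT NULL
);

CREATE UNIQUE NONCLUSTERED INDEX idx_Customers_Demo_Phone_Numbers
ON #Customers_Demo (Phone_Number);

-- The first insert succeeds.
INSERT INTO #Customers_Demo (First_Name, Last_Name, Phone_Number)
VALUES (N'John', N'Smith', '555-0100');

-- The second insert reuses the phone number and should be rejected.
BEGIN TRY
    INSERT INTO #Customers_Demo (First_Name, Last_Name, Phone_Number)
    VALUES (N'Jane', N'Doe', '555-0100');
END TRY
BEGIN CATCH
    -- Report what went wrong instead of stopping the script.
    SELECT ERROR_NUMBER()   AS error_number,
           ERROR_SEVERITY() AS error_severity,
           ERROR_MESSAGE()  AS error_message;
END CATCH;

DROP TABLE #Customers_Demo;
```

The CATCH block should report the same error number and level described above.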

Also, note that in the second example I used the prefix “ix_” and in the other examples I used the prefix “idx_”. I use the “idx_” prefix because for my personal uses this means it is a UNIQUE index; thus anytime I see “idx_SOMETHING” I know it is a UNIQUE index and will not allow duplicate keys. I use “ix_” to mean that it is NOT unique and is NONCLUSTERED.

Remember that NONCLUSTERED is the default index type for CREATE INDEX; however, I strongly recommend stating the type of index in every command for two reasons. First, the distinction can easily be made when reviewing the code at a later time. Second, just because NONCLUSTERED is the default right now does not guarantee it will be in future SQL Server releases. The less you leave to be interpreted the more compatible you can make your code for future releases (and for backwards compatibility in many cases).

Using SSMS to create your indexes:

You can create indexes within SSMS in several places. The more common places to create indexes are the Database Engine Tuning Advisor, the Table Designer, Database Diagrams, and Object Explorer.

The easiest method, in my opinion, is to create a new index using Object Explorer. In Object Explorer you will want to navigate to the table you want to create your index on. Expand the table by clicking on the plus sign to the immediate left of the table icon to show the folders containing the objects for that table (Columns, Keys, Constraints, etc). Right-click on the folder labeled “Indexes” and select “New Index…”; this will bring up a new window called “New Index”. Here you can name your index, add the columns for the index and choose many options to go with the creation of your index.

If the “New Index…” option is grayed out when you right-click the “Indexes” folder then this means you have the table opened in “Design” mode. If this is the case, you can either close the “Table Designer” window and then access the “New Index…” option, or you can right-click anywhere in the “Table Designer” and select “Indexes/Keys…”. This will bring up a slightly different window, but one that is just as easy to follow to create your indexes.

To delete an index you have two simple methods. In Object Explorer you can right-click the index name under the table’s “Indexes” folder and select “Delete”. Otherwise, when in “Table Designer” you can right-click anywhere and select “Indexes/Keys…”; in the resulting window, highlight the index name you wish to delete and then click on “Delete”. CAUTION: There is NO confirmation or UNDO in this window; once that button has been clicked you have removed your index. You cannot CANCEL out of the window either. So make sure this is what you want to do before clicking that button!

Conclusion:

Indexes are helpful, simple to create and very powerful in making your queries and database operate at a very efficient level. Anyone can quickly learn to create indexes, modify indexes, and to drop (delete) indexes. Most people will spend a fair amount of time reading about indexes when first learning about them; this is because they are so versatile in usage and can provide such a powerful result when leveraged correctly.

I suggest at minimum to try to understand how indexes are determined and how to optimize them. These are key aspects to indexes that can make the most difference. I also suggest for you to schedule in a regular review period for indexes on your most heavily accessed tables and queries. This doesn’t need to be daily, weekly, or even monthly; but, it should be done over some periodic time because your data needs and accessing will evolve as your database evolves.

Until next time, happy coding!

Friday, May 2, 2008

Introduction To Indexes

Summary:

This is a basic introduction to what Indexes are and how to determine what should be indexed. I will cover the basic concept behind indexes, how they are intended to be helpful, why you would want to use them, and how to determine what should be included in your index.

I’ll also cover the differences between Clustered and Non-Clustered indexes, and provide some tips that will help you to know how to differentiate when to use which kind of index, as well as an analogy to break down the differences between Clustered and Non-Clustered indexes in simple terms.

What indexes are and why to use them:

Indexes are intended to help you to efficiently find information within your table; they are meant to help you lower the amount of CPU resources needed to find this information, and also to minimize the amount of Input/Output (I/O) used to access this information. All of this can result in a much faster result being returned.

Indexes in SQL Server can be thought of in a similar fashion as an index in the back of a book or a table of contents in the front of the book. In a book, an Index is intended to allow you to look up a word that you are interested in finding and point you to the page(s) where that word has been referenced. A Table Of Contents can be used to look up a topic you are interested in, and will point you to the section covering that topic; some tables of contents will even point you to sub-sections that refine the context of the topic, which can help you to more accurately focus your reading on relevant information.

Indexes within SQL Server are designed to perform these same functions, and provide the same helpful information. Indexes are created either through a T-SQL command or through a form of interface application that connects directly to the database and supports T-SQL commands (such as SQL Server Management Studio, or Enterprise Manager, etc). NOTE: I will cover the topics of “Creating Indexes” and “Tuning Indexes” in separate blog entries. Once indexes are created SQL Server will automatically use them to ‘help’ in returning result sets; there are no options you must specify to take advantage of the created indexes…this is all built-in to SQL Server.

There are some minor differences in SQL Server 2000 and SQL Server 2005; since I primarily use SQL Server 2005 my discussion is based on this version. However, all of this information can be confirmed with the use of Books Online (BOL) for SQL Server 2000. I will try to avoid using specific information that can’t be used in SQL Server 2000; however, I am unable to guarantee that all information applies to previous SQL Server versions. Please use BOL if you are unsure whether any of this information will work with your version of SQL Server.

How SQL Server Retrieves Data without Indexes:

Before we can go into how an index works, we should really understand how SQL Server finds data without the indexes; this will illustrate the importance of using Indexes. My examples will be based on a Customer Table that holds basic customer information such as the customer’s first and last name, their street address, city, state and zip code. When I use the term “Customer” I am referring to the Customer Table; this also goes for “First Name”, “Last Name”, etc. These are referencing their respective columns within the Customer Table.

What happens when SQL Server attempts to retrieve data from a table that does not contain any indexes is called a “Table Scan”. This is where SQL Server will go through each record (column by column, then row by row) to find matching records. After it has gone through the entire table it will return any matching results it has found. As you can probably imagine, this is no quick task. This would be similar to picking up a book and deciding to find any pages that contain the word “alligator”. There might be a page with that word, there might not. In either case, without an index or table of contents you’d have to flip through every page to see if it contains this search word. If the book is under 30 pages, this may not be too bad…but, what if the book is 1500+ pages; that’s a whole different story.
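You can ask SQL Server to show you its plan for a query without actually running it, which is a handy way to see a Table Scan for yourself. The sketch below uses a made-up temporary table, #Customers_Demo, that deliberately has no indexes.

```sql
-- A throwaway table with no indexes at all (a 'heap').
CREATE TABLE #Customers_Demo (
    First_Name NVARCHAR(50) NOT NULL,
    Last_Name  NVARCHAR(50) NOT NULL,
    City       NVARCHAR(50) NOT NULL
);
GO

-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- With SHOWPLAN_TEXT on, this query is not executed;
-- SQL Server returns the plan it would use instead.
SELECT First_Name, Last_Name
FROM #Customers_Demo
WHERE Last_Name = N'Smith';
GO

SET SHOWPLAN_TEXT OFF;
GO

DROP TABLE #Customers_Demo;
GO
```

Because the table has no index to use, the plan text should contain a Table Scan operator.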

This brings us to Indexes to help lower the amount of time SQL Server needs to spend finding matches in results!

Different Types of Indexes (and the general structure of an index):

There are two basic types of indexes: Clustered and Non-Clustered. Before I get into the specifics of each type and the differences between the types I want to cover the basic structure of an index; and the minor structural differences between the two types of indexes.

Basically all indexes are formulated with what is called “index key(s)” for a column (or combination of columns). These basically are pointers that tell SQL Server where specific words (or data) are contained within a table. Each table is broken into pages (this is physical storage of the data; each page is 8 KB and typically is constructed based on the order of entry…not a sort order). A page can contain hundreds of records, or just a few records. Because of the size limit of a data page, it all depends on the data types and how many columns are being stored for each record. There are formulas that can pinpoint the number of records being stored on a data page; this is beyond the scope of this blog entry.

Each index key has a definition of the page where the indexed word (or data) is stored. SQL Server will use these keys to determine what page to go to; so if the search term shows up in data page # 4 of 27 pages, SQL will skip over the first 3 pages and search page # 4. It will halt its search if there are no other keys specifying that data is stored in other pages; there are often cases where data will be stored across multiple pages (like page #s 4, 12, 13, 21, 25, 26) or a data record could span multiple pages (such as page #4 & 5). Again, this is beyond the scope of this blog entry.

The important thing to know is that data isn’t naturally stored in logical order within the physical file; this is why we need indexes to speed up the search. Also, since searches aren’t pre-determined when data is being entered, SQL cannot have the data pre-sorted for them; and in many cases you may have a specific method of sorting the data…but you might not be searching on the sorted data (such as in the Customer table, you might sort data based on the customer’s location like City/State…but want to search for customers with a Last Name starting with R; this type of query and sort method can’t be pre-determined…hence the difficulty in storing data logically). Data is basically stored on a First In basis, and the rows just sequentially increase as data is entered, or removed. We need indexes to tell us where the data is so that we can easily find the data matching our unknown (at entry time) search criteria and sorting methods.

This brings us to how do we tell SQL Server the best method to search if we don’t know it ourselves?! The answer is Clustered and Non-Clustered indexes, some research and knowledge of the data being stored, and a lot of testing (and refined Tuning as the database is used).

Clustered indexes:

Clustered indexes are a logical sorting and storing of index keys. This can allow SQL Server to very efficiently find data. This is all based on the defined columns within the clustered index. So, why don’t we want to just have every column within every table included in the clustered index? Because the more columns that are included in the clustered index, the more it takes away from the performance of your INSERT, UPDATE, and DELETE statements.

Basically the idea is that Indexes, especially Clustered, are wonderful to increase the speed to obtain results for Queries and Reports. However, there is a tradeoff, which is that it takes more time to write and/or modify data within the table. This is because the clustered index needs to update itself.

Imagine a file cabinet holding your customer information; you originally store all data by the customer’s Last name and then the First name. Now, you decide that for getting information it would be quicker to store all of the data within each file by the date the data was added (and in case of similar dates then by alphabetical titles). So, now you need to find a customer John Smith and a paper that holds his Personal Address information…you look under Smith, find John, then in his folder look up the Address section, then Personal, and you quickly get the information.

Now, what if you have to update his file to include his birthdate? That might go under Personal Information…not Personal Address though! So, now you find the last name, first name, then the information section, then the personal information page, and add it to the file. Wouldn’t it have been quicker to just find his file and add his personal information to the end of his folder? That’s where the balance comes in; part of the question is how well your system performs with just reading queries/reports versus entering/modifying information…but the other part is what will be done more often. Are you going to query the information more often, or are you going to update/modify the information more often? If you are mostly querying, then erring on the side of extra columns in your clustered index MIGHT be ok; but if you are mostly inserting/modifying data then you want to use only the number of columns it takes to filter the result sets into manageable sections (maybe filter only by name; last name, then first name…or maybe name and location; such as last name, first name, then city/state). It all just depends on what you are doing, how much data is being stored, and how you anticipate this data will be accessed (i.e. often, seldom, with reports, lots of updates, lots of inserts, few inserts, etc).

Now, an important note is that you can ONLY have 1 clustered index per table! So, this can bring up the point of where you want the data to be accessible, but you don’t want to bog down the system each time you need to add/modify/delete data. So, how do you do this? How do you walk this fine line of optimal performance? Here come non-clustered indexes to the rescue!

Non-Clustered indexes:

You can have up to 249 Non-Clustered indexes per table; however, just as with Clustered Indexes there is such a thing as too many! Non-clustered indexes have their intended usage, which will get covered in the Tips section.

Non-Clustered indexes are indexes that aren’t sorted at the physical data page layer. In the simplified picture used here, this means that SQL Server can be pointed to the data page containing the data matching the search criteria; but the index stops after that point, and it is then up to SQL Server to search that page and pick out the data. Remember, Clustered Indexes point to the exact location of the data, and are quick because the data itself is sorted. Non-Clustered indexes only point to the page; the Index Keys in the index are sorted also…but not the data they point to. So, there is some performance loss between Clustered and Non-Clustered. Always try to think of non-clustered indexes as an alternative to listing every column in a clustered index, and as a method to handle data that is accessed less often or that returns more exact-match results.

So what does all of this mean? Well, basically if you were to tell SQL Server to find results of last names that start with ‘RE’ and used a non-clustered index to index the last name column then the index would point you to the page(s) that contain the last name(s) starting with “RE”, but there could be names also that start with “RA” or “RI” or “RH”, it all depends on what is actually stored at the time the query is executed. This then means that SQL would go through these other results until it finds the matching result; this is still faster than an entire table scan (which a table scan would start with Last Name “A” and End with Last Name “Z”).
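Here is a small sketch of that last-name search. The temporary table #Customers_Demo, the index name, and the sample names are all made up for illustration; I wrote the pattern as 'Re%' so it behaves the same under case-sensitive and case-insensitive collations.

```sql
-- Throwaway table with a non-clustered index on the last name.
CREATE TABLE #Customers_Demo (
    First_Name NVARCHAR(50) NOT NULL,
    Last_Name  NVARCHAR(50) NOT NULL
);

CREATE NONCLUSTERED INDEX ix_Customers_Demo_Last_Name
ON #Customers_Demo (Last_Name);

-- SQL Server 2005 takes one row per INSERT ... VALUES statement.
INSERT INTO #Customers_Demo (First_Name, Last_Name) VALUES (N'Ana',   N'Ramirez');
INSERT INTO #Customers_Demo (First_Name, Last_Name) VALUES (N'Bill',  N'Reed');
INSERT INTO #Customers_Demo (First_Name, Last_Name) VALUES (N'Carol', N'Reynolds');
INSERT INTO #Customers_Demo (First_Name, Last_Name) VALUES (N'Dave',  N'Rhodes');
INSERT INTO #Customers_Demo (First_Name, Last_Name) VALUES (N'Erin',  N'Riley');

-- A leading-characters search like this is the kind of query
-- an index on Last_Name is able to help with.
SELECT First_Name, Last_Name
FROM #Customers_Demo
WHERE Last_Name LIKE N'Re%';

DROP TABLE #Customers_Demo;
```

With only five rows SQL Server may well decide a scan is cheaper than using the index; the benefit shows up as the table grows. Note that a pattern starting with a wildcard (such as '%son') cannot narrow the search this way.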

Tips (putting this all together):

Now, that we have an understanding of what Clustered and Non-Clustered indexes are, and how SQL uses them let’s look at how we can determine what we should index (and what type of index to use). Remember:

  • A table can only have a single Clustered Index
  • A table can have up to 249 Non-Clustered indexes
  • Clustered Indexes point to the exact location of matching results
  • Non-Clustered Indexes only point to the data page containing matching results, so some scanning will occur upon searching for matching results (however, much less scanning than not having any indexes)
  • Clustered Indexes can retrieve data much more efficiently; BUT have a cost of slower data INSERT, UPDATE, and DELETE
  • Non-Clustered indexes locate rows through the Clustered Index key(s) if the table has a Clustered Index; if it does not, they use a Row ID (file, page, and row slot) to find the matching data

So now let’s put this all together into a simple, understandable analogy. I will use a library as our ‘database’; each section of the library (such as Romance, History, Computers, etc.) will be one of our ‘tables’, and of course each book will be our ‘record’, or ‘row’.

Our catalog will come in two forms. The ‘clustered’ catalog uses the Dewey Decimal System, and its cards are stored in the index box in alphabetical order by book title, followed by author name. The ‘non-clustered’ catalog simply tells us the section (our ‘table’), the shelf number, where the top shelf is #1 and the bottom shelf is #5 (this simulates our ‘data pages’), and finally the title of the book and the author (these are our First Name and Last Name columns). Its cards are not sorted, but a separate sheet lists every book title with its author name, and we can look at that sheet to quickly get the number of the index card the book is stored on.

With this in mind, imagine searching for a book called “James’ Awesome T-SQL Blog Book” by James R (no such book exists, by the way). If we did this with the ‘clustered’ catalog, we could quickly look in the title section and find the book that matches; if there were two books with this title, we could then narrow the result by looking at the author name.

Now imagine this same scenario with the ‘non-clustered’ catalog. Let’s say the book title is somewhere around index card #100 of 15000. Since we can quickly review the sheet listing the index cards, we can see that the card is around #100. So, we open the index drawer, find our book information, and now we are off.

Since the ‘clustered’ index card includes the Dewey Decimal number, we can go to the closest matching book number and quickly jump up or down as needed to reach the exact book. With the ‘non-clustered’ index information we only know the section, title and shelf number. Once we get to the proper shelf, we go through each book one at a time to find the right one. This is of course much quicker than starting at the beginning of the section and looking through each book or, even worse, starting at the first book in the library and going sequentially through the entire collection until we find the book (which may not even be there!).

Choosing Type of Index:

When to choose Clustered index and when to choose Non-Clustered indexes…

When to choose Clustered Indexes:

  • Columns that are frequently accessed
  • Columns intended to be stored sequentially (such as Last Name, First Name, etc.)
  • Columns that will be queried in ranges (when you use a WHERE clause with BETWEEN, >, or < type operators)

Once you have determined the columns that should be in your clustered index, it becomes ESPECIALLY CRITICAL to consider what types of queries will be run most often and which queries MUST have optimal performance. If the columns you picked are not used by these queries, then you may want to revise your clustered index so that it does NOT include unnecessary columns (consider creating a non-clustered index for those columns instead). If these queries require columns that meet the selection tips above and are not already included, then consider revising your index to include them.
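Here is a small sketch of a clustered index chosen for range queries. The dbo.OrderHeader table, its columns and the dates are hypothetical; the assumption is that the most important queries on this table filter on a range of order dates.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.OrderHeader
(
    OrderID    INT IDENTITY(1,1) NOT NULL,
    OrderDate  DATETIME          NOT NULL,
    CustomerID INT               NOT NULL,
    TotalDue   MONEY             NOT NULL
);

-- Cluster on the column that is queried in ranges,
-- so rows with neighboring dates are stored next to each other.
CREATE CLUSTERED INDEX CIX_OrderHeader_OrderDate
    ON dbo.OrderHeader (OrderDate);

-- Range query using BETWEEN (both end points are included).
SELECT OrderID, OrderDate, CustomerID, TotalDue
FROM dbo.OrderHeader
WHERE OrderDate BETWEEN '20081201' AND '20081231';

-- The same idea with > and < type operators; this form also
-- picks up orders placed during the day on December 31.
SELECT OrderID, OrderDate, CustomerID, TotalDue
FROM dbo.OrderHeader
WHERE OrderDate >= '20081201'
  AND OrderDate <  '20090101';
```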

When to choose Non-Clustered Indexes:

  • Columns frequently in the WHERE clause that return EXACT matches
  • Columns that contain many distinct values (such as Last Name or Street Address, but not City, State or Zip Code)
  • Queries that do NOT return large result sets
  • Columns that are needed for critical queries and aren’t already covered by your clustered index, or that may be queried in a non-sequential manner your clustered index doesn’t cover
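The following is a rough sketch of a non-clustered index that fits these tips. The dbo.Employee table, its columns and the sample address are hypothetical; the assumption is that street addresses are nearly unique while state codes repeat heavily.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.Employee
(
    EmployeeID    INT IDENTITY(1,1) NOT NULL,
    FirstName     NVARCHAR(50)      NOT NULL,
    LastName      NVARCHAR(50)      NOT NULL,
    StreetAddress NVARCHAR(100)     NOT NULL,
    StateCode     CHAR(2)           NOT NULL
);

CREATE CLUSTERED INDEX CIX_Employee_EmployeeID
    ON dbo.Employee (EmployeeID);

-- Many distinct values and exact-match searches: a good non-clustered candidate.
CREATE NONCLUSTERED INDEX IX_Employee_StreetAddress
    ON dbo.Employee (StreetAddress);

-- Exact match that returns a small result set.
SELECT EmployeeID, FirstName, LastName
FROM dbo.Employee
WHERE StreetAddress = N'123 Main Street';

-- StateCode has few distinct values, so an index on it alone
-- would usually help much less; no index is created for it here.
```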

Keep in mind that once you have created your indexes, whether they are clustered or non-clustered, you can always revise them to meet your current needs. Also, remember that your needs will change, and this will often require revising your indexes to meet those needs!

Conclusion:

Indexes are here to help you; however, they are a complicated concept to master. In most cases it is easiest to figure out which columns qualify for being indexed and then simply try them out. Don’t be afraid to try a few different combinations of clustered and non-clustered indexes; performance is never one size fits all, and your indexes should not be chosen with that mindset.

I would also suggest setting a personal reminder, or adding an item to your occasional checklist, to review your indexes, see how your queries are using them, and look for ways to improve the performance of your queries with the indexes for that table. Your indexes will always be evolving, your queries will always be evolving, and your tables will always be evolving…don’t let them evolve without your intervention.
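One possible way to do this periodic check is the index usage view that SQL Server 2005 provides. The query below is a sketch rather than a complete health check. It assumes you run it in the database you want to review, and keep in mind that the counters are reset when the SQL Server service restarts.

```sql
-- How often has each index on a user table been used since the last restart?
SELECT  OBJECT_NAME(i.object_id) AS TableName,
        i.name                   AS IndexName,
        i.type_desc              AS IndexType,
        s.user_seeks,            -- searches that went straight to matching keys
        s.user_scans,            -- reads of the whole index
        s.user_lookups,          -- lookups into the table on behalf of a non-clustered index
        s.user_updates           -- maintenance caused by INSERT, UPDATE, and DELETE
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON  s.object_id   = i.object_id
       AND s.index_id    = i.index_id
       AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY TableName, IndexName;
```

An index that shows many updates but few or no seeks, scans or lookups may be costing more than it returns and is a candidate for review.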

Remember that Speed = Happy Users…to me a Happy User = a Happy DBA! =)

Until next time, happy coding!