General discussion of SQL topics, aimed at the novice-to-intermediate Microsoft SQL Server user. Currently focused on SQL Server 2005.

Showing posts with label DBCC. Show all posts

Tuesday, July 1, 2008

Concept: Finding the Right Order...

The other day I was helping a friend with a space issue on their server, and one of the topics we discussed was the importance of finding the best order in which to perform operations so that they use the least amount of extra hard drive space. While discussing this, it occurred to me that while plugging away with SQL we can get so intent on getting the job done (often in the first way that works, which is not necessarily the best way) that we take for granted all of that hard drive space and server power available to us!

With just a little bit of forethought, planning, and understanding what we are trying to accomplish, we can actually use less disk space...and we could also potentially improve the performance we get out of SQL Server! I don't know about you, but getting better performance always ranks high on my list of desired accomplishments. I think this can be best demonstrated with an example....

For simplicity, I will use a database with a single table in which each inserted record takes up 1 MB of hard drive space. Thus, 1,000,000 records in our "simple database" result in a database size of 1,000,000 MB. Now, let's say that on average this database grows by about 100,000 records per year, which means that with 1,000,000 records this database has been in use for about 10 years. Finally, assume there is a decent DR plan in place and the transaction log is kept at a fairly modest size of maybe 1,000 MB, enough to accommodate a month's worth of data between FULL backups. Simple enough, right?

So, 10 years have passed and the boss walks in and says the company has just merged with another company that has an identical database setup and stats, only with different data, and the two databases are to be merged (since this is hypothetical, we will pretend that the merge can be done without any issues and simply doubles our database size). The boss also says they are archiving 5 years' worth of data from both databases to match the business rules for on-site hard copy storage and archiving. And lastly, the boss says "...because you don't look busy..." they only want enough free space in the database to allow for 2 years of data to be added, because the rest of the hard drive space is needed for other purposes.

At first glance you might say: not a problem. We'll just merge the two databases, take out 5 years' worth of data, perform a FULL backup, shrink the transaction log (leaving enough free space for a month's worth of data), and then shrink the database (leaving enough room for 2 years' worth of data to be added later). I'm going to ignore the transaction log size in this analysis, because it will typically end up at roughly the same size regardless of the steps taken; the details differ based on the database recovery model and the method used to perform these steps, and analyzing each possibility is beyond the scope and purpose of this blog. Now, let's look at how our system might look after each step...

The first step in the above scenario would be to merge the two databases (remember that since the databases originally belong to different companies, they are physically stored on separate hard drives).

HD Space Used: 2,000,000 MB
SQL Record Count: 2,000,000 records
SQL Space Used: 2,000,000 MB

Second step would be to take out 5 years worth of data.

HD Space Used: 2,000,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,000,000 MB

Third step would be to perform a FULL backup (this is based on a FULL recovery model).

HD Space Used: 2,000,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,000,000 MB

Fourth and Fifth steps to shrink the log and database.

HD Space Used: 1,200,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,200,000 MB
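For concreteness, here is a rough T-SQL sketch of what the final shrink steps (4 and 5) might look like. The database and file names ([MergedDB], [MergedDB_Data], [MergedDB_Log]) are made up for illustration; DBCC SHRINKFILE takes its target size in MB.

```sql
USE MergedDB;
GO

-- Shrink the log, leaving roughly a month's worth of headroom (~1,000 MB).
DBCC SHRINKFILE (MergedDB_Log, 1000);
GO

-- Shrink the data file down to current data plus two years of growth
-- (1,000,000 MB of data + 200,000 MB reserve = 1,200,000 MB target).
DBCC SHRINKFILE (MergedDB_Data, 1200000);
GO
```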

 

From the above we can see that the maximum HD space used would be 2,000,000 MB, which, depending on our available HD space, might not be a problem. The real issue is the amount of resources needed; since this is hypothetical, we can only judge step by step that the resource usage would be fairly moderate. There are too many variables to measure actual performance accurately, so I won't even attempt to measure it.

Now, what if we put a little thought into these requirements from the "boss", and especially, what if the HD space available to us was only 1,800,000 MB? Now we have a problem that can easily be resolved just by finding the right order. In this case, moving Step # 1 (the merge) to the very end and performing an extra (final) FULL backup and a (final) transaction log shrink keeps us under 1,800,000 MB, and it also saves the server a lot of processing time, because it has fewer records to process. We can also use the processing power of the secondary server (the server holding the data from the company being merged in) to do its own archiving, cutting the total time required; the merge itself then takes less time because fewer records are being merged. Here is how the steps would look....

Step # 1: Remove data from each database at its original location

HD Space Used (per Server): 1,000,000 MB
SQL Record Count (per Server): 500,000 records
SQL Space Used (per Server): 1,000,000 MB

Step # 2: Perform FULL backups for each database

HD Space Used (per Server): 1,000,000 MB
SQL Record Count (per Server): 500,000 records
SQL Space Used (per Server): 1,000,000 MB

Step # 3: Shrink the transaction log and databases

HD Space Used (per Server): 500,000 MB
SQL Record Count (per Server): 500,000 records
SQL Space Used (per Server): 500,000 MB

Step # 4: Merge the two databases

HD Space Used: 1,000,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,000,000 MB

Step # 5: Perform FULL backups for database and shrink transaction log

HD Space Used: 1,200,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,200,000 MB
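Put together, the reordered process might be sketched in T-SQL like this. All object names, the linked server, and the [EntryDate] archiving column are assumptions for illustration, not part of the original scenario.

```sql
-- Step 1: remove old data on EACH server, at its original location.
DELETE FROM dbo.BigTable
WHERE EntryDate < DATEADD(year, -5, GETDATE());

-- Step 2: full backup of each (now smaller) database.
BACKUP DATABASE CompanyDB TO DISK = N'X:\Backups\CompanyDB_Full.bak';

-- Step 3: shrink the log and data files on each server (sizes in MB).
DBCC SHRINKFILE (CompanyDB_Log, 1000);
DBCC SHRINKFILE (CompanyDB_Data, 500000);

-- Step 4: merge the remote data into the surviving database
-- (pretending, as in the scenario, that a straight copy works).
INSERT INTO dbo.BigTable
SELECT * FROM [RemoteServer].CompanyDB2.dbo.BigTable;

-- Step 5: final full backup and log shrink.
BACKUP DATABASE CompanyDB TO DISK = N'X:\Backups\CompanyDB_Merged.bak';
DBCC SHRINKFILE (CompanyDB_Log, 1000);
```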

 

Conclusion:

Now, you can easily see that in the above scenario we only use a maximum of 1,200,000 MB of space, and that is only because we are deliberately reserving that free space. Logically, by archiving the data before merging the two databases, we save the server processing power because it processes ONLY the data that will remain in the database. Sometimes the order of operations is pre-determined by outside factors, and sometimes certain tasks must be performed in a particular sequence.

I hope this illustrates that it's not always good enough to just find a way to make things work; with a little planning up front you can save yourself time, lower the resources required to perform the tasks at hand, and also improve the performance of your SQL Server!

Thursday, May 22, 2008

Optimizing Indexes

Summary:

Optimizing indexes is quite a complicated subject. There are many techniques, and what makes it most difficult is that there are so many different scenarios for when and how to optimize your indexes. Instead of trying to describe a single optimization method, I am going to discuss how to identify when optimizing indexes should be considered and where to find the different options.

The first step is NOT to just rebuild all indexes, or to defrag them all. Some people believe that rebuilding all indexes is the way to solve this; don't listen to them!! PLEASE DON'T!!! I'm rarely one to advise against listening to other people; most people can help you learn what's good or bad...but if you go the route of rebuilding and defragging all your indexes, you can in fact cause additional problems that you didn't need to cause; in particular, using system resources that don't need to be used, and causing unneeded fragmentation at the OS level! Any decent DBA is always concerned with system resource usage; remember that system resources are a precious commodity and should never be wasted.

The first step is to analyze the database. The next step is to identify the queries that can or will benefit from optimizing the index(es). The third step is to optimize the indexes. The final step is to periodically review these methods to identify when optimizing indexes will be required again; this is an ongoing project and will ultimately require re-iterating through your queries to identify when an index has become a candidate for optimization. There is no cut-and-dried method to warn you, to prevent indexes from becoming out of tune, or to auto-optimize the queries and indexes.
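To give the "optimize the index" step some shape, here is a sketch of the per-index tools available in SQL Server 2005; the table and index names here are hypothetical, and the light/heavy thresholds are a judgment call, not gospel.

```sql
-- Light fragmentation -> reorganize (online, can be interrupted safely).
ALTER INDEX IX_BigTable_EntryDate ON dbo.BigTable REORGANIZE;

-- Heavy fragmentation -> rebuild (rebuilds the index from scratch).
ALTER INDEX IX_BigTable_EntryDate ON dbo.BigTable REBUILD;

-- SQL Server 2000-style equivalents, still available in 2005:
-- DBCC INDEXDEFRAG ('MyDB', 'dbo.BigTable', 'IX_BigTable_EntryDate');
-- DBCC DBREINDEX ('dbo.BigTable', 'IX_BigTable_EntryDate');
```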

Analyzing the Database:

So, how do you analyze a database to determine whether an index needs to be rebuilt or should be defragmented? Well, first you need to understand how indexes are built and what causes them to become fragmented. Review my blog entry "Introduction To Indexes" to learn how indexes are built. Also, take the time to review the white paper "Microsoft SQL Server 2000 Index Defragmentation Best Practices"; even though it is written for SQL Server 2000, it still applies to SQL Server 2005. It is a long read, but it will help well beyond the scope of this blog entry.

Here, in a nutshell, is how to determine when your index is fragmented. Use SQL Profiler to identify poor-performing queries; in particular, use the template "SQLProfilerTSQL_Duration", which already contains the trace events needed to identify the offending queries. Once the queries are identified, you can start looking into which indexes these queries are accessing; pay special attention to queries that retrieve ranges rather than singletons, as these are at the highest risk of suffering from fragmented indexes. Remember, though, that this is just a method for finding potential problem areas; your own judgment is still needed to make the final determination.
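On SQL Server 2005 you can also go straight to the engine's own fragmentation numbers with the sys.dm_db_index_physical_stats DMV (the successor to DBCC SHOWCONTIG). The thresholds below are illustrative starting points, not hard rules:

```sql
-- List indexes in the current database with noticeable fragmentation.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name                     AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10
  AND ips.page_count > 100   -- ignore tiny indexes; fragmentation there is noise
ORDER BY ips.avg_fragmentation_in_percent DESC;
```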

When to Consider Optimizing Indexes:

The first sign that you should consider optimizing your indexes is observing performance degradation with no obvious culprit. Now, keep in mind that performance degradation doesn't immediately indicate index fragmentation; that can only be determined by properly analyzing your database. Next, identify which queries are using the most I/O; these are the next candidates. Then consider queries with workloads suspected to benefit from index defragmentation; this can be quite difficult and should be determined carefully.
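One way to find the queries using the most I/O on SQL Server 2005 is the sys.dm_exec_query_stats DMV. This sketch ranks cached query plans by total logical reads; note it only sees what is currently in the plan cache:

```sql
-- Top 10 cached statements by total logical reads.
SELECT TOP 10
       qs.total_logical_reads,
       qs.execution_count,
       SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
                 (CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                  END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```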

All of these are covered in the MSDN white paper "Microsoft SQL Server 2000 Index Defragmentation Best Practices" mentioned earlier. The paper goes into very good detail on how to make these determinations, which will probably be enough to get you through while you gain experience.

Getting Help:

Sometimes it's nice to have someone more experienced help out, if that is how you feel then go with that feeling and seek out that someone.

If you don't have access to an experienced DBA, then seek advice from trusted websites, forums, and/or discussion groups. Remember that when seeking advice, the quality of the answer can only match the quality of the information you provide. For example, simply stating that you have a database that needs indexes rebuilt or defragged will most likely get you answers telling you to use DBCC commands or some other commonly used index rebuilding command. The point of seeking advice is to provide detailed information so that you get an answer specific to your scenario; so be sure to provide as much information as possible without breaking any company policies.

Conclusion:

As you can see, Indexes can be fairly simple to optimize. It's determining when to optimize and what to optimize that becomes difficult.

There are methods to fine-tune the automatic handling of your indexes; this is covered in the white paper listed in the Additional Resources section below. You'll always find different opinions and experiences; embrace all you can and mix and match what works best for you and your situation. There is no one-size-fits-all for optimizing indexes, just as there isn't one for database solutions. It's all about customizing to your needs, and utilizing your available resources to make your work easier and more enjoyable.

I can't stress enough that reading the "Microsoft SQL Server 2000 Index Defragmentation Best Practices" white paper will help tremendously. It covers the topic so well that, although I had originally planned to provide tips on how to identify the queries that require index rebuilds and where to find additional help on the topic, during my research I found that this white paper covers absolutely everything I had planned to cover, and more!

Until next time, Happy Coding!!

Additional Resources:

Microsoft SQL Server 2000 Index Defragmentation Best Practices (http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx)
SQL Server Best Practices (http://msdn.microsoft.com/en-us/sqlserver/bb671432.aspx)
How To: Optimize SQL Indexes (http://msdn.microsoft.com/en-us/library/ms979195.aspx)
Database Journal: Reducing SQL Server Index Fragmentation (http://www.databasejournal.com/features/mssql/article.php/2238211)
SQLServer Performance: SQL Server Index Fragmentation and Its Resolution (http://www.sql-server-performance.com/articles/per/index_fragmentation_p1.aspx)