General discussion of SQL Topics; aimed for the novice-intermediate level Microsoft SQL Server User. Currently focuses on using SQL Server 2005.

Wednesday, July 2, 2008

Publication: Introduction to the Transaction Log...

Today SQLServerCentral.com (SSC) had published my article that introduces the Transaction Log. The primary purpose of this article was to help the newer DBAs understand how the transaction log works, and to clarify some commonly misunderstood parts of maintaining the transaction log.

This article covers a very broad and general introduction; including how the log is filled by SQL Server, easy to follow analogies to help grasp how transaction logs work with SQL Server, and some Good Practices to use when Transaction Logs are in use by SQL Server.

You can read the entire article at: http://www.sqlservercentral.com/articles/Design+and+Theory/63350/

Please feel free to join in the discussion for this article; there are some great posts that help to further clarify the topic and also provides additional information that was not included with this article. You can join the discussion at: http://www.sqlservercentral.com/Forums/FindPost527448.aspx.

Tuesday, July 1, 2008

Concept: Finding the Right Order...

The other day I was helping out a friend of mine with a space issue on their server, and one of the topics I discussed was the importance of find the best order to perform the operations in to utilize the least amount of extra hard drive space. While discussing this it came to mind that often times while plugging away with SQL we can get so intent on getting the job done (often times in the first way that works, while this is not necessarily the best way), that we might take for granted all of that hard drive space and server power available to us!

With just a little bit of forethought, planning, and understanding what we are trying to accomplish, we can actually use less disk space...and we could also potentially improve the performance we get out of SQL Server! I don't know about you, but getting better performance always ranks high on my list of desired accomplishments. I think this can be best demonstrated with an example....

For simplicity reasons I will use a database with a single table that each record being inserted will equal 1 MB of hard drive space. Thus, 1,000,000 records in our "simple database" will result in a database size of 1,000,000 MB. Now, let's say that on average this database grows by about 100,000 records per year. This would mean that with 1,000,000 records this database has been used for about 10 years. Now, assume there is a decent DR in place and the transaction log is kept at a fairly decent size of maybe 1,000 MB to accommodate for a months worth of data before the FULL backup operation. Simple enough, right?

So, 10 years has passed and the boss walks in and says they just merged with another company that has an identical database setup and stats with only different data and the two databases will be merged (since this is hypothetical, we will pretend that a merge of the data can be done without any issues and it would simply double our database size). Also, the boss says that they are archiving 5 years worth of data from both databases to match the business rules for hard copy storage on site and archiving. And lastly, the boss says "...because you don't look busy..." that they only want to have enough space in the database to allow for 2 years of data to be added and that they need the extra hard drive space available for other needs.

From a first glance you might say not a problem. We'll just merge the two databases, take out 5 years worth of data, perform a FULL backup, shrink the transaction log (leaving enough free space for a month worth of data), and then shrink the database down (leaving enough room for 2 years worth of data to be added later). I'm going to ignore the transaction log size in this analysis because there will typically be one out come of the size regardless of the steps taken; and this will differ based on the database recovery mode and method taken to perform these steps; and to analyze each possibility is beyond the scope and purpose of this blog. Now, let's look at how this might look on our system after each process...

First step in the above scenario would be to merge the two databases (remember that since the databases are originally from different companies, they are physically stored on separate hard drives).

HD Space Used: 2,000,000 MB
SQL Record Count: 2,000,000 records
SQL Space Used: 2,000,000 MB

Second step would be to take out 5 years worth of data.

HD Space Used: 2,000,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,000,000 MB

Third step would be to perform a FULL backup (this is based on a FULL recovery model).

HD Space Used: 2,000,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,000,000 MB

Fourth and Fifth steps to shrink the log and database.

HD Space Used: 1,200,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,200,000 MB

 

From the above we can see that the maximum HD space being used would be 2,000,000 MB. Which depending on our available HD space this wouldn't be too much of a problem. The real issue would be the amount of resources needed; since this is hypothetical we can only imagine from a step-by-step that the resources being used would be fairly moderate. There are too many variables to be accurate when it comes to actual performance measuring, so I won't even attempt to measure it.

Now, what if we put a little thought towards these requirements from the "boss", and especially what if the HD space available to use was 1,800,000 MB. Now, we have a problem that can easily be resolved with just finding the right order. In this case, just making a simple change of placing Step # 1 at the very end and performing an extra (final) FULL backup and a (final) transaction log shrink will allow us to remain under 1,800,000 MB and also can save our server a lot of processing time, because it will have to process less records. We can also utilize the secondary server (the server holding the data from the company being merged with) processing power to cut total time being required to archive and thus decreasing the time required to merge because of the lower number of records being merged. Here is how the steps would look....

Step # 1: Remove data from each database at it's original location

HD Space Used (per Server): 1,000,000 MB
SQL Record Count (per Server): 500,000 records
SQL Space Used (per Server): 1,000,000 MB

Step # 2: Perform FULL backups for each database

HD Space Used (per Server): 1,000,000 MB
SQL Record Count (per Server): 500,000 records
SQL Space Used (per Server): 1,000,000 MB

Step # 3: Shrink the transaction log and databases

HD Space Used (per Server): 500,000 MB
SQL Record Count (per Server): 500,000 records
SQL Space Used (per Server): 500,000 MB

Step # 4: Merge the two databases

HD Space Used: 1,000,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,000,000 MB

Step # 5: Perform FULL backups for database and shrink transaction log

HD Space Used: 1,200,000 MB
SQL Record Count: 1,000,000 records
SQL Space Used: 1,200,000 MB

 

Conclusion:

Now, you can easily see in the above scenario that we only use a maximum of 1,200,000 MB of space and that is because we are reserving that free space. Logically, by performing the data archiving before merging the two databases we save our server processing power because it will process ONLY that of which will remain in the database. Sometimes the order of operations will already be pre-determined from outside factors, sometimes it may be required to perform certain tasks in a particular sequence.

I hope this illustrates that it's not always good enough to just find a way to make things work; and that with little planning in the beginning you can save yourself time, lower the requirements to perform the tasks at hand, and also improve the performance of your SQL Server!