Wednesday, October 28, 2009

Locking A Table, While It Is Being Loaded, And Minimizing Down Time

I came across an interesting thread in the MVP newsgroups a few weeks back and thought I would share its content here.  The thread was about providing a reliable method to load a table while keeping downtime to a minimum.  As an added requirement, the users still need to be able to query the old data while the new data is loading.  I would like to point out that I did not come up with this solution.  I got it from Aaron Bertrand, http://sqlblog.com/blogs/aaron_bertrand/, and in my opinion his solution is by far the best way to attack this problem.

Instinctively, the first solution that comes to mind for most of us is an INSERT INTO...SELECT statement with locking hints.  That approach does not scale and does not meet our business requirements.  I will start by creating a sample table, and then we can dig into how to solve this problem.

USE [tempdb]
GO
 
IF EXISTS(SELECT 1 FROM sys.tables WHERE name = 'TestData')
BEGIN
    DROP TABLE [dbo].[TestData];
END
GO
 
CREATE TABLE [dbo].[TestData](
RowNum INT PRIMARY KEY,
SomeCode CHAR(2)
);
GO
 
INSERT INTO [dbo].[TestData] (RowNum,SomeCode)
SELECT TOP 10000
    ROW_NUMBER() OVER (ORDER BY t1.NAME) AS RowNumber,
    CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65) AS SomeCode
FROM 
    [Master].[dbo].[SysColumns] t1,
    [Master].[dbo].[SysColumns] t2
GO

Okay, with our table out of the way, we can start thinking about how to solve this problem.  The first step is to create two new schemas.  The first schema is called “Holder”, and it acts as a container for the table that we will be loading with new data.

--create Holder schema
CREATE SCHEMA [Holder];
GO

The Holder schema does just what the name implies… it holds the table that I will be inserting into.  The next step is to create a table with the same definition as the one above, but in the Holder schema.

--Create TestData table in the Holder schema
CREATE TABLE [Holder].[TestData](
RowNum INT PRIMARY KEY,
SomeCode CHAR(2)
);
GO

With our schema and table created, I only have one other schema to create.  The last schema is the Switch schema.  The Switch schema is used as an intermediary schema to house the current source table (dbo.TestData) while the loaded table in the Holder schema (Holder.TestData) is transferred to the dbo schema (dbo.TestData). 

--Create Switch schema
CREATE SCHEMA [Switch];
GO

This solution adheres to all of our business rules and reduces downtime to the amount of time required to perform a schema metadata operation, which is nearly instantaneous.  This is a very scalable solution because loading the data is completely transparent to the users, all the while allowing them to query the stale data.  Let’s have a look at the final solution:

--Create procedure to load the table
CREATE PROCEDURE [dbo].[usp_LoadTestData]
AS
BEGIN
 
SET NOCOUNT ON;
 
--Truncate holder table
TRUNCATE TABLE [Holder].[TestData];
 
--load new data into holder table
INSERT INTO [Holder].[TestData] (RowNum,SomeCode)
SELECT TOP 500
    ROW_NUMBER() OVER (ORDER BY t1.NAME) AS RowNumber,
    CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65) AS SomeCode
FROM 
    [Master].[dbo].[SysColumns] t1,
    [Master].[dbo].[SysColumns] t2
 
BEGIN TRANSACTION
    -- move "live" table into the switch schema
    ALTER SCHEMA [Switch] TRANSFER [dbo].[TestData];
 
    -- move holder populated table into the "live" or dbo schema
    ALTER SCHEMA [dbo] TRANSFER [Holder].[TestData];
 
    -- Move the prior table to the holder schema
    ALTER SCHEMA [Holder] TRANSFER [Switch].[TestData];
COMMIT TRANSACTION
 
END
GO

Let’s see how it works.  Execute the code and then query the dbo.TestData table.  You will see that the table now contains 500 rows instead of the 10000 I started with. 

EXEC dbo.[usp_LoadTestData];
SELECT * FROM dbo.TestData;

Note: If you are interested in seeing how the solution handles locks, you can strip all the code out of the stored procedure and run everything but the commit transaction.  You can then open another window and try to query the table, which will result in a wait until the schema transfer is committed. 
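
Here is a minimal sketch of that experiment, assuming the schemas and tables created above are in place.  Run the first batch in one window and leave the transaction open; the query in a second window will wait until the first window commits.

--Window 1: perform the schema switch but do not commit yet
BEGIN TRANSACTION;
    ALTER SCHEMA [Switch] TRANSFER [dbo].[TestData];
    ALTER SCHEMA [dbo] TRANSFER [Holder].[TestData];
    ALTER SCHEMA [Holder] TRANSFER [Switch].[TestData];
--COMMIT TRANSACTION; --hold off on this for now
 
--Window 2: this query waits on the schema modification (Sch-M) locks held above
SELECT COUNT(*) FROM [dbo].[TestData];
 
--Back in Window 1: commit to release the locks and unblock Window 2
COMMIT TRANSACTION;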

That’s it!!! This is by far the best method I have seen to solve this business problem.  It is very scalable, has very little downtime, and is not affected by the NOLOCK hint.  I am really happy and thankful that Aaron shared his solution.  It has given me a lot of insight into solving this problem and similar ones.  Hopefully this post will have the same effect on you.

Until next time, happy coding.

Friday, October 23, 2009

Converting A Scalar User Defined Function To An Inline Table Valued Function

In my last post, http://jahaines.blogspot.com/2009/10/io-stats-what-are-you-missing.html, I talked about the performance problems associated with scalar user defined functions and how SSMS may report invalid IO statistics for scalar UDFs.  In this post, I will be focusing on how to transform those pesky scalar UDFs into more scalable functions.  When developing user defined functions, you have to keep a few things in mind.  Firstly, a scalar UDF is evaluated for each row returned by the query.  Additionally, SQL Server is not able to use statistics to optimize any type of function other than an inline table valued function.  Lastly, most code logic does not necessarily need to be encapsulated in a function.  You may get better performance if you use a derived table instead of a function; however, the biggest problem with a derived table is that it cannot be encapsulated and reused across an application.  An inline TVF is really useful if you need to encapsulate business logic and reuse it throughout an application.  Let’s start by creating the sample table DDL.

USE [tempdb]
GO
 
IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'TestData')
BEGIN
    DROP TABLE dbo.[TestData];
END
GO
 
--Create Sample Table
CREATE TABLE dbo.TestData(
RowNum INT PRIMARY KEY,
SomeId INT,
SomeDate DATETIME
);
GO
 
INSERT INTO dbo.TestData 
SELECT TOP 1000 
    ROW_NUMBER() OVER (ORDER BY t1.NAME) AS RowNumber,
    ABS(CHECKSUM(NEWID()))%250 AS SomeId, 
    DATEADD(DAY,ABS(CHECKSUM(NEWID()))%1000,'2008-01-01') AS SomeDate
FROM 
    Master.dbo.SysColumns t1
GO

Next I am going to create two functions: one scalar UDF and one inline table valued function.  If you do not know what an inline TVF is, think of it as a parameterized view that is subject to the same restrictions as a view.  For more information you can read the following entry in BOL, http://msdn.microsoft.com/en-us/library/ms189294.aspx.

--Create Scalar Function
CREATE FUNCTION dbo.fn_SomeFunction(@SomeId INT)
RETURNS DATETIME
AS
BEGIN
    DECLARE @dt DATETIME
    SELECT @dt = MAX(SomeDate)
    FROM dbo.TestData
    WHERE SomeId = @SomeId
    
    RETURN @dt
END
GO
 
--Create Inline Table Valued Function
CREATE FUNCTION dbo.fn_SomeInlineFunction()
RETURNS TABLE
RETURN(
    SELECT SomeId, MAX(SomeDate) AS SomeDate
    FROM dbo.TestData
    GROUP BY SomeId
)
GO

All of our DDL is in place.  All that is left is to test the performance.  If you read my last post, you should be expecting the inline TVF to outperform the scalar UDF.  Let’s see what actually transpires.  Discard the query results by clicking Tools –> Options –> Query Results –> SQL Server –> Results To Grid –> Discard results after execution.  Next, open SQL Server Profiler and use the standard template.  Run the code below and capture the performance counters in Profiler.

SELECT SomeId,dbo.fn_SomeFunction(SomeId)
FROM dbo.[TestData]
GO
 
SELECT TestData.SomeId,max_dt.SomeDate
FROM dbo.[TestData]
INNER JOIN dbo.fn_SomeInlineFunction() AS max_dt
    ON max_dt.SomeId = TestData.SomeId
GO

The results of the queries should look similar to my results below.  The things to note are the reads and the CPU required to satisfy each query.  The reads and CPU required by the scalar UDF are astronomically greater than those of the inline TVF.

[Screenshot: Profiler trace comparing the reads and CPU of the scalar UDF query and the inline TVF query]

If the above screenshot is not enough to discourage you from using scalar UDFs, I do not know what is.  The point is that there are plenty of great alternatives for encapsulating code logic without scalar functions.  Inline TVFs offer a set-based approach to encapsulating business logic, and the optimizer is able to use existing statistics and indexes to optimize them.  My recommendation is to try to convert all scalar UDFs to inline TVFs.  I know this is not always possible, but it is a good start.  I typically stay away from scalar and multi-statement UDFs unless absolutely necessary.  I hope that you have learned something new and that you can use this example to get the sign-off needed to change those problematic scalar UDFs into inline TVFs.
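
If the logic really does need to stay parameterized per row, another option is a parameterized inline TVF consumed with CROSS APPLY.  The sketch below uses an assumed function name; it mirrors the scalar UDF above but remains visible to the optimizer:

--A parameterized inline TVF that mirrors dbo.fn_SomeFunction (assumed name)
CREATE FUNCTION dbo.fn_SomeInlineFunction_Param(@SomeId INT)
RETURNS TABLE
RETURN(
    SELECT MAX(SomeDate) AS SomeDate
    FROM dbo.TestData
    WHERE SomeId = @SomeId
);
GO
 
--Consumed row by row via CROSS APPLY, yet still optimized as part of the outer query
SELECT td.SomeId, max_dt.SomeDate
FROM dbo.TestData AS td
CROSS APPLY dbo.fn_SomeInlineFunction_Param(td.SomeId) AS max_dt;
GO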

Until next time, happy coding.

Tuesday, October 20, 2009

IO Stats – What Are You Missing?

In this post, I will address a common misconception that IO statistics are always reliable.  The truth of the matter is that IO stats can sometimes yield incorrect information, which in turn may encourage bad coding habits.  I have had developers tell me that a scalar UDF query is better because it shows less IO than the set-based TVF query.  In situations like these it is important to stress that you must account for more than IO when optimizing, and that in some cases the IO numbers themselves can be misinterpreted.  The question on the table is, “How can IO statistics be wrong?”  The short answer is that IO stats are mostly correct, and only under certain circumstances are they misrepresented in SSMS.  So what are these magical circumstances?  The IO statistics become invalid any time a scalar UDF is used.  STATISTICS IO only accounts for the base table and not any of the IO incurred inside the scalar UDF, which understates the query’s IO.  Let’s look at an example.

First I will create the tables, with data.

USE [tempdb]
GO
 
IF OBJECT_ID('tempdb.dbo.t1') IS NOT NULL
BEGIN
    DROP TABLE tempdb.dbo.t1;
END
GO
 
CREATE TABLE t1(
id INT,
col CHAR(1)
);
GO
 
INSERT INTO t1 VALUES (1,'a');
INSERT INTO t1 VALUES (2,'b');
GO
 
IF OBJECT_ID('tempdb.dbo.t2') IS NOT NULL
BEGIN
    DROP TABLE tempdb.dbo.t2;
END
GO
 
CREATE TABLE t2(
t2_id INT IDENTITY(1,1),
t1_id INT,
col CHAR(1)
);
GO
 
INSERT INTO t2 VALUES (1,'c');
INSERT INTO t2 VALUES (1,'d');
INSERT INTO t2 VALUES (1,'e');
INSERT INTO t2 VALUES (1,'f');
INSERT INTO t2 VALUES (2,'d');
INSERT INTO t2 VALUES (2,'g');
GO

The next step is to create our scalar UDF.

CREATE FUNCTION dbo.fn_ConcatenateCols(@id INT)
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @rtn varchar(8000)
    
    SELECT @rtn = COALESCE(@rtn + ',','') + t2.col
    FROM dbo.t2
    WHERE t2.t1_id = @id
    
    RETURN @rtn
END
GO

Now that I have all my sample DDL in place, we can run a simple test to measure our IO.

SET NOCOUNT ON 
GO
SET STATISTICS IO ON
GO
 
--Missing I/O
SELECT id,dbo.fn_ConcatenateCols(id)
FROM [dbo].[t1];
 
SET STATISTICS IO OFF
GO
/*
id          
----------- --------
1           c,d,e,f
2           d,g
 
Table 't1'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/

So what is missing from the results above?  If you look closely, you will see that t2 is nowhere in the IO stats.  Let’s start a Profiler trace and run the same query again.  Open SQL Server Profiler and use the standard template, then rerun the query.  If you want the most accurate number of reads, make sure to discard query results: Tools –> Options –> Query Results –> SQL Server –> Results To Grid –> Discard Results After Execution.  This time around you will see the number of reads is 9, as shown below.

[Screenshot: Profiler trace showing 9 logical reads for the same query]

As you can see, the reads reported by STATISTICS IO are a lot different from the actual number of logical reads.  This behavior can be a huge surprise to many unsuspecting victims.  Yes, I did use the word victim :).  I say victim because this usually happens to someone who expects IO to be reported correctly and trusts Microsoft enough not to question the information.  This “victim” never thinks twice about questioning the numbers returned, which can lead to huge performance problems.
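
To see the difference for yourself, here is one set-based sketch of the same concatenation, using FOR XML PATH (available since SQL Server 2005).  With this form the reads against t2 show up in STATISTICS IO, because no scalar UDF is hiding them:

SET STATISTICS IO ON;
 
--Set-based concatenation; the logical reads against t2 are now reported
SELECT
    t1.id,
    STUFF(x.cols, 1, 1, '') AS cols --strip the leading comma
FROM dbo.t1
CROSS APPLY (
    SELECT ',' + t2.col
    FROM dbo.t2
    WHERE t2.t1_id = t1.id
    FOR XML PATH('')
) AS x(cols);
 
SET STATISTICS IO OFF;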

The takeaway is that developers should always be careful when using scalar functions because they can really degrade performance, and that you should never trust anyone’s word, not even Microsoft’s or mine.  Always test for yourself.  If you have not done so, I recommend reading my post on correlated subqueries, as it applies to functions as well: http://jahaines.blogspot.com/2009/06/correlated-sub-queries-for-good-or-evil.html.  In my next post, I will show you how to get rid of scalar functions and use inline TVFs to optimize performance while encapsulating code logic.

Until next time, happy coding.

Friday, October 16, 2009

Exporting Binary Files To The File System

In my last post I demonstrated how to use SSIS to load binary files into a SQL Server 2005 VARBINARY(MAX) column, http://jahaines.blogspot.com/2009/10/ssis-importing-binary-files-into.html.  This post will focus on recreating the binary documents on the file system.  I will be using a combination of TSQL and the BCP utility to perform the export, http://msdn.microsoft.com/en-us/library/ms162802.aspx.  I will be using the same table and data from the last post.  I will start by creating the TSQL to dynamically create a BCP command.

Below is the stored procedure I will use to export the data.  You will see that I have decided to use a cursor to process all documents.  The procedure also accepts a DocID, which limits the export to a single document.  A cursor is fine here because we can only execute one BCP command at a time; however, you could create an SSIS package that executes the stored procedure across multiple streams if you need parallel processing.

CREATE PROCEDURE usp_ExportBinaryFiles(
    @DocID INT = NULL,
    @OutputFilePath VARCHAR(500) = 'C:\'
)
AS 
BEGIN
 
DECLARE @sql VARCHAR(8000)
 
IF @DocID IS NULL --Open Cursor to export all images
BEGIN
 
    DECLARE curExportBinaryDocs CURSOR FAST_FORWARD FOR
    SELECT 'BCP "SELECT Doc FROM [tempdb].[dbo].[Documents] WHERE DocId =' 
        + CAST(DocId AS VARCHAR(5)) + '" queryout ' + @OutputFilePath 
        + DocName + '.' + DocType + ' -S A70195\Dev -T -fC:\Documents\Documents.fmt'
    FROM dbo.Documents
 
    OPEN curExportBinaryDocs
    FETCH NEXT FROM curExportBinaryDocs INTO @sql
 
    WHILE @@FETCH_STATUS = 0
        BEGIN
            --PRINT @sql
            EXEC xp_cmdshell @sql,NO_OUTPUT
            
            FETCH NEXT FROM curExportBinaryDocs INTO @sql
        END
 
    CLOSE curExportBinaryDocs
    DEALLOCATE curExportBinaryDocs
END
ELSE --Export a single image
BEGIN
    SELECT @sql = 'BCP "SELECT Doc FROM [tempdb].[dbo].[Documents] WHERE DocId =' 
        + CAST(DocId AS VARCHAR(5)) + '" queryout ' + @OutputFilePath 
        + DocName + '.' + DocType + ' -S A70195\Dev -T -fC:\Documents\Documents.fmt'
    FROM dbo.Documents
    WHERE DocID = @DocID
    
    --PRINT @sql
    EXEC xp_cmdshell @sql,NO_OUTPUT
END
 
END
GO

The above stored procedure dynamically builds a BCP command.  Currently the stored procedure has two main parameters, @DocID and @OutputFilePath.  @DocID is used when you want to export a single row; if no value is supplied, the stored procedure exports all documents.  @OutputFilePath is the directory where the files will be exported.  You will note that I have hard-coded my server name and format file path; you can add additional parameters if you need these attributes to be dynamic.  The next item on my list is to actually create the format file.  Create a .fmt file somewhere on your file system.  I put mine in the same folder as my other documents.  Copy the code below into the format file.

9.0   
1   
1       SQLBINARY     0       0       ""   1     Doc                                       ""  

You may be asking yourself why the format file looks so skimpy.  The format file is sparse because it contains only what we need: the VARBINARY(MAX) column and its data type.  You can find more info on format files here: http://msdn.microsoft.com/en-us/library/ms191516.aspx.

Once the format file is in place, we can execute our stored procedure to process all documents.

EXEC dbo.usp_ExportBinaryFiles @OutputFilePath = 'C:\Documents\BCP_Out\'
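
Two practical notes, sketched below: the procedure shells out through xp_cmdshell, which is disabled by default and has to be enabled first (weigh the security implications), and a single document can be exported by passing @DocID (the value 1 is only an example).

--Enable xp_cmdshell if it is not already on (requires sysadmin; review your security policy first)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
GO
 
--Export a single document (DocID 1 is just an example value)
EXEC dbo.usp_ExportBinaryFiles @DocID = 1, @OutputFilePath = 'C:\Documents\BCP_Out\';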

That’s it!  It is that easy to write VARBINARY data out to the file system.  SQL Server 2005 has a slew of tools that make working with binary data very simple.  I hope that you have learned something new.  Please stay tuned, as I plan to focus more on TSQL concepts and performance considerations.

Until next time happy coding.

Wednesday, October 14, 2009

SSIS – Importing Binary Files Into A VARBINARY(MAX) Column

Have you ever had the need to import images, Word documents, Excel documents, or any other type of file into a SQL Server table?  If so, you are in luck.  SQL Server 2005 gives us a powerful and scalable method to import binary files or documents into a relational table.  I will be using the terms binary file and document interchangeably throughout this post to describe a file on the file system.  I will be utilizing SSIS and the VARBINARY(MAX) data type to import the documents.  Let us start by creating the sample DDL.  In this example, I will need a staging table and a table to house the binary data.

USE [tempdb]
GO
 
IF OBJECT_ID('tempdb.dbo.[Doc_Stage]') IS NOT NULL
BEGIN
    DROP TABLE [dbo].[Doc_Stage];
END
GO
 
CREATE TABLE [dbo].[Doc_Stage](
DocId INT IDENTITY(1,1) PRIMARY KEY,
DocName VARCHAR(50) NOT NULL,
DocPath VARCHAR(1000) NOT NULL,
DocType VARCHAR(4) NOT NULL
);
GO
 
IF OBJECT_ID('tempdb.dbo.[Documents]') IS NOT NULL
BEGIN
    DROP TABLE [dbo].[Documents];
END
GO
 
CREATE TABLE [dbo].[Documents](
DocId INT IDENTITY(1,1) PRIMARY KEY,
DocName VARCHAR(50) NOT NULL,
Doc VARBINARY(MAX) NULL,
DocType VARCHAR(4)
);
GO

Next, I need to create the SSIS package.  The first step is to add an Execute SQL Task to the designer canvas.  Configure the Execute SQL Task, as shown below.

Note: You will need to create the source connection to the database where you created the DDL

[Screenshot: Execute SQL Task configuration]

The SQL Statement that I used is below.

TRUNCATE TABLE [dbo].[Doc_Stage] ;

Next I will create a variable called FilePath that is of the string data type.  Now I am ready to add a ForEach Loop Container to the canvas.  Add the container to the canvas and configure it as shown below.

Note: I am grabbing all file types. If you only want a specific type, change the Files specification to the extension that you want, e.g. *.jpg.

[Screenshots: ForEach Loop Container configuration – collection and variable mappings]

Next, I will have to create another Execute SQL Task, but this time, I have to drag the task into the ForEach Loop Container.  Configure the task as shown below.

[Screenshot: Execute SQL Task configuration inside the ForEach Loop Container]

The SQL Statement is presented below:

INSERT INTO [dbo].[Doc_Stage](DocName,DocPath,DocType) VALUES ('DocName','DocPath','jpg');

Now for the tricky part.  In this step, I have to build an expression to dynamically build an insert statement.  The insert will capture the document name, path, and type.  Click the expressions tab and create an expression, on the property SQLStatementSource (near the bottom).  Below is the code for the expression.

"INSERT INTO [dbo].[Doc_Stage](DocName,DocPath,DocType) VALUES ('" +
REPLACE(RIGHT(@[User::FilePath],FINDSTRING(Reverse(@[User::FilePath] ) ,"\\", 1)-1),RIGHT(@[User::FilePath],4),"")
+ "','" + @[User::FilePath] 
+ "','" + RIGHT( @[User::FilePath] ,FINDSTRING( REVERSE(@[User::FilePath] ),".",1)-1)
+ "');"
 

With that out of the way, we can press on.  The last task I will need is a Data Flow Task.  Drag the data flow task to the canvas.  Here is what my canvas looks like at present:

[Screenshot: control flow canvas with the Execute SQL Task, ForEach Loop Container, and Data Flow Task]

Open the Data Flow task and drag an OLE DB Source to the canvas and configure it as shown below.

[Screenshot: OLE DB Source configuration]

Make sure to click the Columns tab to set the column mappings.  Next, I will drag an Import Column Transformation to the canvas and configure it as shown below.

[Screenshots: Import Column Transformation configuration]

Note: Make sure to take note of the output column’s LineageID.  You will need to take this ID and add it to the Input Column’s FileDataColumnID.

[Screenshots: Import Column Transformation input and output column properties]

We are almost there!!! The last step is to add an OLE DB Destination and configure it as shown below.  Do not forget to click the Mappings tab to map the columns.

[Screenshot: OLE DB Destination configuration]

That’s it!!!! Click the debug button and all of the components should light up green. I have successfully implemented a document library solution that allows the insertion of any document type into SQL Server 2005.  Stay tuned because I will show you how to export the images to the file system using BCP and TSQL.

Until next time happy coding.

Friday, October 9, 2009

Missing Committed Rows, In The Read Committed Isolation Level

Today, I am going to talk about the possibility of a SELECT statement missing committed rows in the read committed isolation level.  You may be asking yourself, what!!!! How can committed rows be missed in the READ COMMITTED isolation level?  It sounds like an oxymoron to me, but it can and does happen.  Perhaps you have noticed this behavior, or perhaps it has never been apparent to you or your users.  Granted, this only occurs under certain circumstances, but it can really confuse a lot of people.  The problem exists because of the method SQL Server uses to scan a table.  When SQL Server scans a table, it takes a shared lock one row at a time.  The problem occurs when another transaction holds an exclusive lock that prevents the scan from proceeding any further.  What do you think will happen if a row is inserted before the currently scanned row while the scan is blocked?  You got it… the row does not appear in the result set.  Take a look at the graphic below to see what is actually happening.

[Diagram: a row-by-row shared-lock scan blocked by an exclusive lock, while a new row is inserted behind the scan’s current position]

Now it is time to see this behavior in action.

In a new query window, run the below code. I will be referring to this window as Query1.

create table t (a int primary key, b int)
insert t values (1, 1)
insert t values (2, 2)
insert t values (3, 3)
 
begin tran
update t set b = 2 where a = 3

Next, create a new query window, which I will refer to as Query2, and paste and execute the following code.

select * from t ORDER BY a ASC

Note: I use an order by clause to guarantee the sort.  Otherwise, the data will be returned as it is found on disk.

Now open a third window, which I will refer to as Query3.  Paste and execute the code below.

INSERT INTO t VALUES (0,10)

Return to Query1 (The query with the UPDATE statement), and execute the code below.

INSERT INTO t VALUES (4,10)
select * from t
commit tran

After jumping through all these hoops, here are the results:

Query1:

[Screenshot: Query1 result set]

Query 2:

[Screenshot: Query2 result set]

Query 3:

[Screenshot: Query3 output]

Query 3 does not really tell us much, other than that our insert was committed to the database.  The query that tells the story is Query 2.  As you can see, Query 2 is missing the row we inserted and committed in Query 3.  If you repeat the steps above but change the sort order to DESC, you will see that all rows are returned, unless of course you decide to insert a row after the current MAX value of a.  Another thing of note is the clustered index.  If you change the clustered index to a NONCLUSTERED index, you will get all rows.  Why does this happen?  When a clustered index is scanned, the data comes back in the order of the index sort; when a heap is scanned, the data comes back as it is found on disk, unless an ORDER BY is specified.  Please realize that no sort is ever guaranteed without an ORDER BY clause.  As you can see, the direction of the scan impacts which data is missing and which data is displayed.  How do you resolve this issue?  The only answer is to choose an isolation level with more consistency and less concurrency, like serializable or snapshot.  Here is the BOL entry for transaction isolation levels: http://msdn.microsoft.com/en-us/library/ms173763.aspx.
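
For reference, here is a minimal sketch of using snapshot isolation for the reading session.  The database name is a placeholder for wherever you created table t, and the database-level option has to be enabled first:

--Enable snapshot isolation on the database that holds table t (placeholder name)
ALTER DATABASE [YourDatabase] SET ALLOW_SNAPSHOT_ISOLATION ON;
GO
 
--In the reading session (Query2), versioned reads are not blocked by the open UPDATE transaction
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT * FROM t ORDER BY a ASC;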

There you have it.  I have demonstrated how committed rows can be skipped in the read committed isolation level.  I wonder what else can happen in the read committed isolation level?  If you want a hint, try updating a row that has not been scanned yet and a row that has already been scanned, in the final commit step.  I think you will be surprised to find that your query returns invalid data and is missing data.

Happy coding.

Monday, October 5, 2009

Determine Which Indexes Are Not Being Used

Have you ever had the need to find out which indexes are actually being used?  If you are using SQL Server 2005 or greater, you are in luck.  SQL Server 2005 introduced a new dynamic management view that gives all the usage statistics, sys.dm_db_index_usage_stats (http://msdn.microsoft.com/en-us/library/ms188755.aspx).  What does this mean?  It means we can capture which indexes are being used and which ones are just an administrative burden.  What’s the catch?  The catch is that the usage statistics do not persist across a SQL Server restart.  The statistics will not be of much value shortly after a server restart, so an option may be to insert the results of the view into a permanent table for later analysis.  Let’s look at an example.

SELECT 
    QUOTENAME(s.name) + '.' + QUOTENAME(t.name) AS TableName,
    i.name AS IdxName,
    i.type_desc AS IdxType,
    ius.user_seeks,
    (ius.user_seeks*1.) / NULLIF((ius.user_seeks+ius.user_scans+ius.user_lookups),0) AS [%Seek],
    ius.user_scans,
    (ius.user_scans*1.) / NULLIF((ius.user_seeks+ius.user_scans+ius.user_lookups),0) AS [%Scan],
    ius.user_lookups,
    (ius.user_lookups*1.) / NULLIF((ius.user_seeks+ius.user_scans+ius.user_lookups),0) AS [%Lookup],
    ius.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats ius
    ON  ius.object_id = i.object_id
        AND ius.index_id = i.index_id
        AND database_id = DB_ID()--Current DB
INNER JOIN sys.tables t
    ON t.object_id = i.object_id
INNER JOIN sys.schemas s
    ON t.schema_id = s.schema_id
WHERE
    t.type = 'U'
    AND t.is_ms_shipped = 0
ORDER BY ius.user_seeks + ius.user_scans + ius.user_lookups DESC
 

Note: I have used a left join.  If an index has never been used, it will not have an entry in sys.dm_db_index_usage_stats.

The columns that really drive this view are user_seeks, user_scans, user_lookups, and user_updates.  User_seeks represents the number of seeks on a given index since SQL Server last started.  Likewise, user_scans and user_lookups represent the number of scans and lookups, respectively.  One of the most important columns is user_updates, which represents the number of inserts, updates, and deletes against the index.  If you find an index where the number of user_updates is far greater than the number of seeks, lookups, or scans, you should consider dropping that index.  There is no magic number that dictates when an index should be dropped because of its maintenance overhead, so use your best judgment.
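
As a rough starting point (the filter conditions below are assumptions, not hard rules), you could narrow the query above to nonclustered, non-constraint indexes that have recorded no reads at all since the last restart, ordered by how often they are written:

--Rough sketch: nonclustered indexes with writes but no recorded reads since the last restart
SELECT 
    QUOTENAME(s.name) + '.' + QUOTENAME(t.name) AS TableName,
    i.name AS IdxName,
    ISNULL(ius.user_updates,0) AS user_updates
FROM sys.indexes i
INNER JOIN sys.tables t
    ON t.object_id = i.object_id
INNER JOIN sys.schemas s
    ON t.schema_id = s.schema_id
LEFT JOIN sys.dm_db_index_usage_stats ius
    ON  ius.object_id = i.object_id
        AND ius.index_id = i.index_id
        AND ius.database_id = DB_ID()
WHERE
    t.type = 'U'
    AND t.is_ms_shipped = 0
    AND i.type_desc = 'NONCLUSTERED'
    AND i.is_primary_key = 0
    AND i.is_unique_constraint = 0
    AND ISNULL(ius.user_seeks + ius.user_scans + ius.user_lookups,0) = 0
ORDER BY ISNULL(ius.user_updates,0) DESC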

There you have it: a simple method to get the index usage details.  How else can this dynamic management view be used?  I will leave that to you to find out, but I will leave you with one other use.

The other use for this view is to identify the most queried tables.  By looking at the index usage stats, we can deduce how often the table is queried. 

--Most Accessed Tables
SELECT 
    DB_NAME(ius.database_id) AS DBName,
    OBJECT_NAME(ius.object_id) AS TableName,
    SUM(ius.user_seeks + ius.user_scans + ius.user_lookups) AS TimesAccessed    
FROM sys.indexes i
INNER JOIN sys.dm_db_index_usage_stats ius
    ON  ius.object_id = i.object_id
        AND ius.index_id = i.index_id
WHERE
    ius.database_id = DB_ID()
GROUP BY 
    DB_NAME(ius.database_id),
    OBJECT_NAME(ius.object_id)
ORDER BY SUM(ius.user_seeks + ius.user_scans + ius.user_lookups) DESC

That is it.  I have provided two uses of sys.dm_db_index_usage_stats.  I hope this post will help you identify underused indexes or your most active tables.

Happy Coding.

Friday, October 2, 2009

SQL Server MVP – October 2009

I am very honored and pleased to announce that I have been awarded and recognized as a SQL Server MVP.  An MVP is not defined as the most technically savvy professional in the world; an MVP is a community member dedicated to sharing knowledge, professional growth, mentorship, and community.  This award is so prestigious that I feel blessed to have been nominated and even more blessed to receive it.  I feel a little insignificant in receiving the award, as I know the caliber of past awardees.  To me, it feels like comparing myself to giants, but I guess we all have to start somewhere :).  I believe the recipients of this award are what give it merit.  All recipients of this award go above and beyond to help the SQL Server community, and this is exactly what I plan to do.  Now that I have been awarded, I feel even more enthused and privileged to help such a great community.  No other online community has ever made me feel more at home than the SQL Server community.  I can truly say that I have found my home and calling.