Can You Dig It? – Missing Indexes

Follow the rest of this series at the Can You Dig It? – Plan Cache series introduction post.

3295969599_eb16a58118_m1
She can dig it!
D Sharon Pruitt

When I started using XQuery to dig into the plan cache, it was just searching for some parallelism and I really wasn’t aware that I was using XQuery.  I just kind of glanced over it and let it be.

Fast forward a couple years later and I was at a PASS session where Grant Fritchey (Blog | @GFritchey) was presenting.  In his session he discussed some performance tuning techniques and mentioned a method for finding missing indexes.  His method was quite different that what I had been used to.

Missing Index DMVs

Prior to that session, I used a fairly common technique to find missing indexes.  I would query some DMVs included with the release SQL Server 2005 released that were designed just for this purpose.

There are four DMVs that can be used for this.  They are:

Together these DMVs can be used find indexes in a server that could help improve the performance of some queries on the server.  I talk about using this information some in this blog post.  There is also more good information on this here and here.

There’s a slight problem with using these DMVs though.  While they do a decent job of indicating indexes that would be useful, they don’t let you know the specific queries that will be improved by adding the indexes.  Certainly an index that can be used in many places and across multiple queries would be good.  Knowing the query that will be improved will be better.

Missing Index Queries

Listing the indexes that would improve queries is possible using a couple DMVs other that the missing index DMVs above.  With a query against sys.dm_exec_cached_plans and a little XQuery this information is readily available.

As was mentioned in a previous post, the DMV sys.dm_exec_cached plans has the SHOWPLAN XML for an execution plan.  Within an execution plan one of the possible elements is MissingIndexes.  This element, when present, contains the details for an index that would improve the performance of the query.  It includes the name of table that could use the index, the columns for the index and included columns, and information of the improvement in the query.

To illustrate this, let’s consider the following query:

USE AdventureWorks
GO

SELECT DueDate FROM Sales.SalesOrderHeader
WHERE DueDate Between '20010701' AND '20010731'
GO

Looking at the execution plan for the query above, the following should be returned:

image

The execution plan indicates that this query could be improved by the addition of an index.  We can also look at the SHOWPLAN XML and find that there is an element for MissingIndexes which contains the following information:

image

Missing Indexes in the Plan Cache

Finding one execution plan with this missing index information in it is useful.  Going into the plan cache and finding all of the plans that this element exists within is a treasure trove and that is exactly what can be done.

Now that we know that there is a MissingIndexes element possible in the SHOWPLAN XML, we can utilize the exist() method to search for the element.  XQuery can then be leveraged to search the entire plan cache through the DMV sys.dm_exec_cached_plans to find all plans with the MissingIndexes element.

Building this logic into a query can be accomplished with the following:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO

WITH XMLNAMESPACES(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT query_plan, usecounts
FROM sys.dm_exec_cached_plans ph
CROSS APPLY sys.dm_exec_query_plan(ph.plan_handle) tp
WHERE tp.query_plan.exist('//MissingIndex')=1

Executing the query will return a list of all of the execution plans in the plan cache that have a MissingIndex element.  This result set below shows the output:

image

Once all of the plans with MissingIndex elements are discovered, the plans can be shred to determine the SQL text, the index that is missing, and the count of the number of times the plan was used.  The query below can be used to expand out this information:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO

;WITH XMLNAMESPACES(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
,PlanMissingIndexes
AS (
SELECT query_plan, usecounts
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE qp.query_plan.exist('//MissingIndexes') = 1
)
, MissingIndexes
AS (
SELECT
stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex/@Database)[1]', 'sysname') AS DatabaseName
,stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex/@Schema)[1]', 'sysname') AS SchemaName
,stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex/@Table)[1]', 'sysname') AS TableName
,stmt_xml.value('(QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)[1]', 'float') AS impact
,pmi.usecounts
,STUFF((SELECT DISTINCT ', ' + c.value('(@Name)[1]', 'sysname')
FROM stmt_xml.nodes('//ColumnGroup') AS t(cg)
CROSS APPLY cg.nodes('Column') AS r(c)
WHERE cg.value('(@Usage)[1]', 'sysname') = 'EQUALITY'
FOR  XML PATH('')), 1, 2, '') AS equality_columns
,STUFF((SELECT DISTINCT ', ' + c.value('(@Name)[1]', 'sysname')
FROM stmt_xml.nodes('//ColumnGroup') AS t(cg)
CROSS APPLY cg.nodes('Column') AS r(c)
WHERE cg.value('(@Usage)[1]', 'sysname') = 'INEQUALITY'
FOR  XML PATH('')), 1, 2, '') AS inequality_columns
,STUFF((SELECT DISTINCT ', ' + c.value('(@Name)[1]', 'sysname')
FROM stmt_xml.nodes('//ColumnGroup') AS t(cg)
CROSS APPLY cg.nodes('Column') AS r(c)
WHERE cg.value('(@Usage)[1]', 'sysname') = 'INCLUDE'
FOR  XML PATH('')), 1, 2, '') AS include_columns
,query_plan
,stmt_xml.value('(@StatementText)[1]', 'varchar(4000)') AS sql_text
FROM PlanMissingIndexes pmi
CROSS APPLY query_plan.nodes('//StmtSimple') AS stmt(stmt_xml)
WHERE stmt_xml.exist('QueryPlan/MissingIndexes') = 1
)
SELECT DatabaseName
,SchemaName
,TableName
,equality_columns
,inequality_columns
,include_columns
,usecounts
,impact
,query_plan
,CAST('' AS xml) AS SQLText
,CAST('' AS xml) AS PotentialDDL
FROM MissingIndexes
ORDER BY DatabaseName
,SUM(usecounts) OVER(PARTITION BY DatabaseName
,SchemaName
,TableName) DESC
,SUM(usecounts) OVER(PARTITION BY TableName
,equality_columns
,inequality_columns) DESC
,usecounts DESC

The output can be use to determine the databases and tables that would most benefit from additional indexes.  The query output also includes the percent improvement, use count, query plan, SQL text, and potential DDL for creating the index.

image

Missing Index Wrap Up

If you are looking for more on the MissingIndex element in the execution plan, check out Jonathan Kehayias’ (Blog | @SQLSarg) post Digging into the SQL Plan Cache: Finding Missing Indexes.  I’ve accommodated some of pieces of his query into the one I have above.  In his, he queries to the StmtSimple element instead of MissingIndexes, this is pure genius since it makes the @StatementText variable available.  Also, the method he uses to retrieve the columns for the missing index is much cleaner than was in my original query so that was adopted as well.

Since I mention it every now and then and think it’s been a couple posts since it was last included, the <?query – Some Information –?> syntax was something I picked up when looking at Adam Machanic’s (Blog | @AdamMachanic) Who Is Active stored procedure.  You better be using this and if not, start today.

Now for the icing on the wrap-up.  After the last two posts on the plan cache I hope you’ve started to see some of the benefits of letting SQL Server identify potential bottlenecks.  This information is readily available and sitting there for the taking.  In the next few posts, we’ll dive into the structure of the XML schema to help inspire some additional ways to leverage the plan cache.

Final note, I hadn’t checked before writing this, but it appears Grant Fritchey has a post on this as well.  Go ahead and check that one out as well – it’s More Refinements on the Missing Indexes Query.

17 thoughts on “Can You Dig It? – Missing Indexes

  1. Thank you for this; I’ve found it very useful. I’m wondering if there isn’t some way of rearranging and extending this somehow to show the statements in cache that are using a specific index.

    I have some indexes with many include columns that make me wonder whether the SQL that is using them could be optimized to select fewer columns and therefore let me include fewer columns in the index that supports the query. In some cases I would just like to be able to document why a particular index exists. For example, “This index was created to optimize the performance of the xyz stored procedure”.

    Do you happen to have such a query in your collection?

    Like

  2. I like the idea of being able to see the queries that would be impacted by adding a missing index. But when I run this query, I get a very different list than with the query that I've been using (Brent Ozar's 2009-04-01 FindMissingIndexes query). Do you have an theory on why the results are so different?

    One thing I particularly like about the one I am using is that there is a reasonable ranking and cut-off of the suggested index additions (WHERE CONVERT(INT, (migs.user_seeks + migs.user_scans) * migs.avg_total_user_cost * (migs.avg_user_impact * 0.01)) > 10000). Is there a good way to put an equivalent cut-off formula into this query?

    Like

    1. The differances in the results is due to how the information is retained after it is accumulated. They both source from the QO when queries are submitted for execution. The DMVs accumulate in memory until their buffer is reached. Typically this is large enough that it's size isn't an issue. The plan cache is also accumulated in memory but these are within execution plans and if a plan ages out of the plan cache the missing index information will go away with it.

      When it comes to ranking of the plan cache missing indexes, the query is set up to segment by database and rank the highest suggested indexes table. When it comes to indexing strategies, a lot of what you choose depends on what plans you want to influence and what you are willing to spend to do that. Through these results, you know specifically which plans will benefit. If there is a plan with thousands of executions and it has a suggested index that would improve performance by 90% that's almost a no brainer.

      One thing to be careful with use ranks from the missing index DMVs is that values like 10,000 don't really mean anything. If the server was just restarted, there probably won't be many results with a value over 10,000. If it has been up for a few months, then all of the results may be over 10,000. Over all, I would probably look at the to 5-25% of the results for consideration rather than specific ranking.

      If you haven't looked at my Index Analysis series, you may want to dig into that as well. It summarizes and correlates some index statistics to help provide a holistic view of the indexes in your environment.

      Overall, I think you need to use all three of these items. Look in your plan cache to see missing indexes by plan/table. Look in the missing index DMVs to see overall what could be used. Review your tables with the index analysis script to see where the index would sit in relation to other indexes on the table. From there you should have enough information to know if the index will benefit the table, plans on the server, and is worth the cost.

      Like

Comments are closed.