Find Query Plans That May Utilize Parallelism

When ever I go to a new client to assist with performance issues, I inevitably download the Troubleshooting Performance Problems in SQL Server 2005 white paper.  In my opinion and many others, it is one of those documents that should be at the click of a button to open and I keep it handy on my desktop.

One of the most common wait types that I observe at clients is CXPACKET, known on the street as parallelism.  And since most of these are OLTP database systems, there is a good need for looking into the causes of said wait type.

When I’ve identified that parallelism is the reason for the server running hot and hopefully not catching on fire I start to look at what is running or has run on the server that is utilizing parallelism.  When I first starting to look for execution plans that might use parallelism, I used the query in the white paper (mentioned above) that would return back a list of all of the plans that are currently in the procedure cache that would utilize parallelism if they were selected.

That query is:

SELECT
p.*,
q.*,
cp.plan_handle
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) q
WHERE cp.cacheobjtype = 'Compiled Plan'
AND p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";max(//p:RelOp/@Parallel)', 'float') > 0

But recently I got to thinking; and this can be a dangerous game. Do I care about a plan that is only used once? Also, is it better to first look at plans that utilize more CPU than other plans? How do I rank one plan against another plan?

To accomplish this, I turned to the DMV sys.dm_exec_query_stats. This DMV aggregates performance statistics for cached query plans. The trouble with this DMV is that it is the query level and not the procedure level so conditional statements and looping will affect how often some SQL statements are within a procedure. I’ve put together a couple queries that combine the query above with sys.dm_exec_query_stats.

First, I put together the top 50 statements by average CPU utilization that is part of an execution plan that utilizes parallelism. What I like about this query is that I get an exact SQL statement to look at. Unfortunately, this isn’t an exact science because the statement with the high average CPU may not be the statement that is utilizing parallelism. On the other hand, who is going to complain if a high CPU statement without parallelism is tuned?

SELECT TOP 50
OBJECT_NAME(p.objectid, p.dbid) as [object_name]
,qs.execution_count
,qs.total_worker_time
,qs.total_logical_reads
,qs.total_elapsed_time
,CASE statement_end_offset WHEN -1 THEN q.text
ELSE SUBSTRING(q.text, statement_start_offset/2, (statement_end_offset-statement_start_offset)/2) END as sql_statement
,p.query_plan
,q.text
,cp.plan_handle
FROM sys.dm_exec_query_stats qs
INNER JOIN sys.dm_exec_cached_plans cp ON qs.plan_handle = cp.plan_handle
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) as q
WHERE cp.cacheobjtype = 'Compiled Plan'
AND p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";max(//p:RelOp/@Parallel)', 'float') > 0
ORDER BY qs.total_worker_time/qs.execution_count DESC

The second query I have here ignores the statement level detail of sys.dm_exec_query_stats and summarizes the views at the plan_handle level. This presents some problems though since execution counts amongst all query statements isn’t always the same within a plan and which number would be best to use? Minimum executions? Maximum executions? Average executions? It all depends which side you want to error on.  For the script, though, I went with maximum executions.

WITH cQueryStats
AS (
SELECT qs.plan_handle
,MAX(qs.execution_count) as execution_count
,SUM(qs.total_worker_time) as total_worker_time
,SUM(qs.total_logical_reads) as total_logical_reads
,SUM(qs.total_elapsed_time) as total_elapsed_time
FROM sys.dm_exec_query_stats qs
GROUP BY qs.plan_handle
)
SELECT TOP 50
OBJECT_NAME(p.objectid, p.dbid) as [object_name] ,qs.execution_count
,qs.total_worker_time
,qs.total_logical_reads
,qs.total_elapsed_time
,p.query_plan
,q.text
,cp.plan_handle
FROM cQueryStats qs
INNER JOIN sys.dm_exec_cached_plans cp ON qs.plan_handle = cp.plan_handle
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) as q
WHERE cp.cacheobjtype = 'Compiled Plan'
AND p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";max(//p:RelOp/@Parallel)', 'float') > 0
ORDER BY qs.total_worker_time/qs.execution_count DESC

If you are already on SQL Server 2008, you do have the benefit of having the DMV sys.dm_exec_procedure_stats at your disposal.  Using this DMV will solve the problem of not having accurate execution information at the procedure level.  For those blesses with SQL Server 2008, you can use this query to identify execution statistics for stored plans that utilize parallelism.

SELECT TOP 50
OBJECT_NAME(p.objectid, p.dbid) as [object_name]
,ps.total_worker_time/ps.execution_count as avg_worker_time
,ps.execution_count
,ps.total_worker_time
,ps.total_logical_reads
,ps.total_elapsed_time
,p.query_plan
,q.text
,cp.plan_handle
FROM sys.dm_exec_procedure_stats ps
INNER JOIN sys.dm_exec_cached_plans cp ON ps.plan_handle = cp.plan_handle
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) as q
WHERE cp.cacheobjtype = 'Compiled Plan'
AND p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";max(//p:RelOp/@Parallel)', 'float') > 0
ORDER BY ps.total_worker_time/ps.execution_count DESC

If anyone has any queries that they use to address and research parallelism in their environments, I would be more than happy to see them.