A few weeks back spoke at a Pragmatic Works Training on the T’s on indexing that was titled 5 Ways to Improve Performance through Indexing. The abstract for the sessions is:
Sometimes the hardest performance issues in your environment have simple solutions. One of these simple solutions is through properly indexing the database. In this session, we’ll look at five ways that performance of a database can be improved through indexing. When we finish, you’ll be armed with scripts and methodologies that you can put into place today to improve database performance.
After the session, I had a number of questions that came in. I hadn’t had a chance to answer the questions until now, hopefully the questions and answers helps some people. Hopefully, these answer the questions that are out there, if there are more, please let me know.
Q: Are filtered indexes available in Standard Edition of SQL Server 2005?
No, they were not introduced until SQL Server 2008.
Q: Would using dynamic SQL force the use of a filtered index?
That is a good option for ensuring that the query will utilize the filtered index. The downsides to dynamic SQL would be the prejudice that people have against it and the opportunity for poor coding standards (lack of parameterization and using EXEC). An added benefit of this practice would be that stored procedures wouldn’t be limited to the filter criteria for the query, in cases where the filtered index didn’t cover the predicates, it could still be used on those values.
Q: Are partitioned tables generally considered candidates for filtered indexes?
When partitioning tables, it can be quite useful to use filtered indexes to cover the ranges of values in table, provided those values will be used when applying predicate.
Q: How would you balance between the query plan improvements when using a parameterized query versus a non-parameterized query of filtered index?
Look at the performance with and without the filtered index, and then compare what is lost through parameterization. For instance, if a filtered index improves a 20 minute query to 1 minute, and then parameterization adds back on 30 seconds, then this is likely a good balance of performance.
Q: Question for the end: how is a filtered index good on a large table that gets a large number of inserts every day?
It depends which part of the data is getting filtered. If the data that is static is within a filtered index, then there is really no impact to the index maintenance of the tables. If the filtered portion of the index is being modified, then the cost is similar to that of a regular non-clustered index. It’s not more expensive to have filtered indexes than their non-filtered counterparts, then filtering just provides the limited scope of data that is indexes which decreases the effort for creating and maintaining the index. Along with the value in making a scan across a smaller set of pages less costly than the whole table.
Q: Could we get more discussion regarding Merge Joins?
For more information on Merge Joins, I recommend reading Craig Freedman’s blog post on the topic.
Q: Could you discuss cases where an index can help with certain queries and hurt others (on the same table), e.g., reading data vs. inserting data. Thank you.
Supposes you had a system where the modified date for a column was being indexed. This column would be updated with every update that occurs on the table. The order of rows with the clustered index key would shift often, resulting in a lot of shifting of data between pages in the index. An index, though, on the modified date may be useful for a small subset of queries that leverage modified date to identify recently updated rows, such as an ETL process that extracts the last days worth of modifications. Without the index, the ETL may be slow and cause deadlocks to occur on the table. But with the index, there is an index that is constantly thrashed throughout the day. In this case, you need to balance the read / write activity to choose the lesser of two evils.
Q: How do you determine in which order you should include the columns in an index. Does the column order in an index make a difference?
Generally, the most selective of the columns should be first. The trouble is, with some columns, certain values may be more selective than other values and depending on the set of queries, one order may be better than the other. In this case, create the different combinations of the index and test to see which is preferred by SQL Server. Depending on the rate of change in the column values, it could be most beneficial to have numerous variations of the same set of indexed columns, this won’t happen too often, though.
Q: Do indexes have to be co-located with tables or different set of disks on file groups in SQL Server
This depends entirely on the underlying disk and the level of use of the database. If you have some indexes that is accessed more frequently and it’s causing contention in the database, then look at splitting the indexes apart onto other disk arrays. There isn’t a hard and fast rule that they need to be separated.
Q: I understand a best practice is not to modify the schema of a database of a third party application. Do you recommend tweaking indexes of third party applications?
I won’t say that I recommend changing indexes on third-party applications, but I can say that I’ve done it before. Sometimes building indexes on third-party applications is the difference between having a weekend and babysitting their application.
Q: Is it possible to create clustered and non-clustered on same column? What about the performance in that case?
It is possible to have a clustered index and a non-clustered index with the same key values. This can be useful if you have queries that need to get the values in those columns quickly and the base table is wide. The way to think about it is, all of the int values for a single column in a non-clustered index, will take up much less space that the same int in a clustered index where the average row size is 900-bytes. By using the non-clustered index, some queries would be much faster than if they could only access the clustered index.
Q: Off the cuff, the INCLUDE to cover sounds good. Is there a corresponding cost/beneift to number/size of columns included versus db size or some other variable
There is definitely a cost/benefit for the number of columns that should appear in the INCLUDED column list. The point of this cost/benefit will be different per table. The cost of including a large number of unchanging values is not the same as including two or three values that change often. You will want to monitor the rate of usage compared to updates and then review what columns are updated most often. In some cases reverting to using a lookup can be more beneficial. From a rule of thumb perspective, I try not to add more than half of the columns from a table into an index.
Q: When avoiding lookups by adding columns to indexes, do you tend to add an INCLUDE over adding them to the index? Can you explain the advantage?
Generally, I’ll add the column in as an INCLUDED column. The reason for this is that columns that are being retried through a lookup generally are not being used as predicates. Since they are search columns I don’t need to be concerned about their order and can add them into the index without any thought to order.
Q: how did he get info for Case Study 1?
I used the Index Analysis script that can be found on this page and described in these posts.
Q: How do you get this info showed in the table, regarding number of seeks and lookups?
Same as the previous answer.
Q: Which report produces this summary?
Again, same as the previous answer.
Based on the last three questions, is there interest from people for me to show some examples of where I’ve used the Index Analysis script and how to get recommendations out of it?
2 thoughts on “Q&A From “5 Ways to Improve Performance through Indexing””
Great Post Jason !!! ..
Your welcome. Thanks for stopping by.
Comments are closed.