A couple weeks back I was asked “How come Microsoft has yet to put TRIM() in SQL?”
I don’t really know why there isn’t a TRIM() function in SQL Server. There is an LTRIM for function removing leading spaces. And an RTRIM function for removing trailing spaces. But there isn’t a TRIM function.
Let’s be honest, the “why” doesn’t matter to me – I don’t want the function to be a part of SQL Server. So, Instead of figuring out why, I’m going to write about my opinions. This is a great place for them, this being my blog and all.
There are two main reasons that I am against having a TRIM function in SQL Server…
What Do You Know
First, a large majority of people tend to stick with what they know. Once you find something that works, why not stick with it until there is a reason not to use it any longer? Unfortunately this can have some unintended consequences.
Most of the developers that have built the applications that I consult on were originally designed by Application Developers versus SQL Server Developers. This distinction is important, because a lot of these developers are then aware of the .NET or Java function for TRIM in the languages that they are using. In their cases, using it will not have a seriously negative impact on their environments.
But that “knowledge of a function” doesn’t translate well into SQL Server. In SQL Server, the use of TRIM would have a serious impact on the effectiveness of indexing being used to query the database. In case you don’t believe me, this will be demonstrated below.
What Do You Need
Secondly, how often do leading spaces need to be removed from a field? Sure, I’ve removed them while loading data into a database in an ETL process. But most often, leading space saved to a field in a database is just “bad” data that I’d rather clean up.
Are there reasons to have leading spaces in a field? Probably, but a vast majority of the time, they won’t be there and they aren’t going to be needed.
By this logic, most of the time only trailing spaces will need to be removed from a field.
Considering the Two Reasons
If leading spaces don’t need to removed but people know about TRIM from their primary programming language, will they be inclined to find out if there is a function that only removes trailing spaces? Or would a person be more inclined to just use what they know. I understand the nature of laziness, and am pretty confident that people would end up just using TRIM().
Depending on where the white space is that needs to be removed, it can have a serious impact on performance. As an example of the performance impact, let’s consider the following query:
USE AdventureWorks SELECT * FROM Person.Contact WHERE EmailAddress = 'firstname.lastname@example.org'
When executed, the query returns the following execution plan:
You’ll notice that since there is an index on the column EmailAddress, that the Query Optimizer chooses an Index Seek on the column to find the value(s) that are being filtered. Then it looks up the rest of the columns with a Key Lookup operation.
Suppose though, that the column had trailing spaces in it that needed to be removed when querying the data. To do this, you’d need to use the RTRIM() function. The query in this case would likely look like this:
USE AdventureWorks SELECT * FROM Person.Contact WHERE RTRIM(EmailAddress) = 'email@example.com'
And when executed the query would return the following execution plan:
And in this case, the Query Optimizer has chosen to use an Index Scan. It can do this because SQL Server knows how the value begins and can scan for similar values within the index. It then follows-up with a Key Lookup for the rest of the values in the query.
But what we really are talking about is a function that removes the leading and the trailing spaces. To do this you would need to use the LTRIM() function in conjunction with the RTRIM() function. And now the query would look something like this:
USE AdventureWorks SELECT * FROM Person.Contact WHERE LTRIM(RTRIM(EmailAddress)) = 'firstname.lastname@example.org'
Running that query would produce the following execution plan:
As you see now, the execution plan is quite different from the previous two executions. Now, the Query Optimizer doesn’t know the first letters of the values in the EmailAddress column. With the existing index being useless, the query doesn’t have any other good candidate to find the values from the WHERE clause. This results in the query just doing a Clustered Index Scan.
Of course, the last execution plan looks like it is simpler, but closer scrutiny of the execution would show that the IO between each of the queries has some significant differences.
--EmailAddress = 'email@example.com'Table 'Contact'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. --RTRIM(EmailAddress) = 'firstname.lastname@example.org' Table 'Contact'. Scan count 1, logical reads 178, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. --LTRIM(RTRIM(EmailAddress)) = 'email@example.com' Table 'Contact'. Scan count 1, logical reads 569, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Based on my people going with what they know, the addition of a TRIM() function would likely result in some serious performance issues with some queries. This would likely happen just as a matter of development without any thought as to the effect and difference between the performance of RTRIM(), LTRIM(), and TRIM().
Some People Want TRIM
Now my opinion isn’t the only one that is out there. Quite often developers I work with ask about the TRIM() function. And Pinal Dave (blogging at SQLAuthority) put in a Connect item regarding adding a TRIM() function. He’s had a number of posts regarding this topic and ways to get around the lack of a TRIM() function.
While I appreciate the discussions I’ve had with him, one of those times on the cursed “it depends”, I’ll have to disagree with him on adding a TRIM() function. The pain in this discovery from people just coming to SQL Server, is worth the benefit of unnecessary performance death that could occur by overuse of a TRIM() function. I actually would encourage people to vote down this Connect item.
8 thoughts on “Please, no TRIM()?”
I know that using functions at the left of the predicate defeats the purpose of index usage. However, there are ways where you can clean your data and pass it to SQL for a performed search. Ex.:
DECLARE @Email VARCHAR(320) = LTRIM(RTRIM(‘firstname.lastname@example.org’));
SELECT * FROM Person.Contact
WHERE EmailAddress = @Email;
This is just a simple example and the extension can be vast in how you can add a previous step where you’re performing cleaning prior to sending to SQL. So, using a TRIM() would be kind of beneficial for these types of checks.
But of course RTRIM should be able to use a Seek.
Read this, then go vote for the Connect item.
I will be there and will be eager to talk to you. You have always good points to discuss.
I had enjoyed conversation with you as well. In fact, this is very good blog post.
Thanks. Will you be at the MVP Summit next week?
I wouldn't say I'm against a TRIM() function, but mainly for the purpose of loading legacy data and removing spaces in that situation. But, in reality, I can do that in the source query or in SSIS or other ETL process.
An interesting side note is that FoxPro has a TRIM() function, but it behaves as an RTRIM() and only trims trailing spaces. If you want to trim leading AND trailing spaces you need to use ALLTRIM(). Let's just say I was not impressed when I found that out.
Comments are closed.