Without much ado, here are some things that I’ve learned through the years that I’d wished I had known from the beginning.
Data Types Really Matter
In one of my first data warehouse projects I ran into a small issue with data typing. I used DTS to load the data to staging tables and then used stored procedures to clean it all up.
One item in particular that I needed to clean up was the dates that came through the data file. They came in the format “ddmmyyyy”. I needed to update the date string to the format “dd/mm/yyyy” prior to converting it to datetime. When I created the staging table I opted to use the data type varchar(10) to store the date string.
And this was where the problem began. In development, my multiple thousand row data file loaded without significant issues. I ran everything through a few times and thought that I had an ETL that was working pretty good. But as soon as the package was deployed to the test server things didn’t quite work out so well. The test environment used a production file that had a few million records and the ETL process came to a grinding halt.
A numerous hours of execution and troubleshooting later I discovered that the problem was the varchar(10) columns that were growing from 8 to 10 characters in length. With millions of rows and multiple columns containing unformatted date strings I was being rewarded with more page splits than I could ever want.
Changing the data type to char(10) was a little better performing. And by a little I mean the ETL finished in minutes versus hours.
Data Types Really, Really Matter
A few years later I was working on the ETL processes for another data warehouse that was aggregating cash register data for a major US retailer. The database a little bit big – at the time it was a little over 1 TB for 90 days worth of data on SQL Server 2000.
After the initial design was implemented, I went in to review some of the initial assumptions on data types and recommended some changes on int data types to smallint and tinyint. The total space savings didn’t surpass 100 GB, but it was pretty close. And the time to rebuild all of the tables was a serious pain.
Properly data typing the database ahead of time is well worth the time and effort.
Always Clean Up After Yourself
A number of years ago, I wrote a process for maintaining databases that automated a number of the basics; check db, backups, index tuning, etc. In my desire to be a verbose as necessary in the event of an error. The procedures output a small log file (1-2 KB) with each execution that could be reviewed to determine the root of any issue.
Low and behold – after a few years these small files can become large folders of data if they aren’t cleaned out on a regular basis. I had a former co-worker call a few years back asking if the files in the folder could be purged. It had grown to a few GB in size.
Since then I always make sure I incorporate clean up of any log files in the creation of any process.
Don’t Clean Up Presentations The Night Before Presenting
This one came up last year, I spent a dozen hours in the office practicing and fine tuning my demos for a SQL Server 2008 SSIS presentation. I built SQL scripts for all of the databases. They would build the databases, the data, tables, and stored procedures the presentation needed. As the final step before closing my laptop the night before the presentation, I deleted all of the databases since the scripts would rebuild them all right before the presentation.
Oops, I forgot to create a script for one of the databases and the Data Profiler demo suffered significantly because of this. I don’t play with my presentations anymore the night before.
I am tagging: