Welcome to the twenty-sixth post in the 31 Days of SSIS. We’re getting near the end now. If you have not read the other posts, they can be found through the introductory post.
This series has been a combination of my thoughts on practices for working with SSIS and proof of concept packages that represent solutions I’ve worked on in the past. We’ve done a few days of packages now, so I thought I would shift gears again and talk about one of the best practices that I try to follow.
This best practice is to keep the unit of work for each SSIS package small, focused, and limited in the task it is trying to accomplish. This means that when I write an SSIS package, one of my goals is to limit the number of Data Flows to as few as is logically possible.
For instance, if I have three files to process then I will usually build this as three SSIS packages. If all three of those packages have a shared process with identical inputs, then I may create a fourth SSIS package and pass data from the first three packages to the fourth with a Raw File.
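As a rough sketch of how that hand-off can be wired together, the helper below builds the dtexec command line that would run one of the file packages while overriding a raw-file path variable. The package name LoadFileA.dtsx and the variable User::RawFilePath are hypothetical stand-ins; dtexec’s /File and /Set options are the actual SSIS command-line syntax.

```python
# Sketch: chaining file-specific packages into a shared downstream package.
# Package and variable names here are hypothetical; dtexec's /File and /Set
# options are the real SSIS command-line interface.

def dtexec_command(package_path, variables):
    """Build a dtexec command line that runs one package,
    overriding package variables via /Set."""
    cmd = ["dtexec", "/File", package_path]
    for name, value in variables.items():
        # /Set takes "propertyPath;value" as a single argument
        cmd += ["/Set", rf"\Package.Variables[User::{name}].Value;{value}"]
    return cmd

# Each import file gets its own small package; all of them hand their
# output to the shared package through a raw file at this path.
cmd = dtexec_command("LoadFileA.dtsx", {"RawFilePath": r"C:\etl\filea.raw"})
print(" ".join(cmd))
```

The same helper would be called once per file package and once for the shared fourth package, with the raw-file path as the only contract between them.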
There are some good reasons for developing small SSIS packages, and they are tied to the effect this practice has on your environment. I could simply say it eases development, but rather than doing that I want to look at the specific areas this practice will impact.
- Development Impact: When you develop smaller packages, you end up with smaller units of work to accomplish. We work in a mostly agile world where people count on us to deliver often. By breaking apart the work and delivering smaller packages, you can deliver incrementally and achieve the same goals the other development teams are already working towards. This is the difference between one SSIS package that can import most of the files and an SSIS solution that has 9 of the 10 packages done.
- Performance Impact: As packages are broken down into smaller units of work, you will begin to find that some of these packages have dependencies on other packages. Other times, you will find that some perceived dependencies are false and don’t really exist; they may have been created only to help the visual flow of the data. As these opportunities are discovered, ETL frameworks can be utilized to leverage parallelism within the process execution.
- Maintenance Impact: With packages that are broken down into smaller tasks, the cost and effort of maintaining the SSIS packages are reduced. If an SSIS package has 20-30 different components and updates a dozen or so tables, there is always the concern and risk that one change to one import will have an actual or perceived side effect. When the SSIS package that updates TableA is changed, it’s helpful to know that the changes to TableB won’t have to be retested, since it is in another SSIS package.
- Troubleshooting Impact: If you’ve had to troubleshoot monster SSIS packages before, you will already understand this point. The smaller an SSIS package is and the more focused its task, the easier it will be to troubleshoot issues, because the list of things that could have led to the issue is shorter. If there is an issue with a single task, it is easy to identify since there are far fewer components to examine.
- Recycled Impact: As the story below details, as SSIS packages are broken down into smaller tasks you may also find ways to leverage them that you never considered in the original design. This will open up your environment to other opportunities to solve new problems with existing architectures.
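The performance point above can be illustrated with a small scheduling sketch. Once false dependencies are removed, a driver can run every package whose real dependencies are satisfied in the same parallel batch. The package names and the run_package stand-in are hypothetical; a real ETL framework would invoke dtexec or a SQL Agent job step instead.

```python
# Sketch: running independent small packages in parallel.
# Package names and run_package are stand-ins, not real artifacts.
from concurrent.futures import ThreadPoolExecutor

# Each package lists only the packages it truly depends on. False
# dependencies (added just for visual flow in a monolith) don't appear.
dependencies = {
    "LoadFileA.dtsx": [],
    "LoadFileB.dtsx": [],
    "LoadFileC.dtsx": [],
    "GeoCode.dtsx": ["LoadFileA.dtsx", "LoadFileB.dtsx", "LoadFileC.dtsx"],
}

def run_package(name):
    # Stand-in for actually executing the package.
    return name

def run_all(deps):
    """Run packages level by level: everything whose dependencies have
    already finished runs concurrently in the same batch."""
    finished, batches = set(), []
    while len(finished) < len(deps):
        ready = [p for p in deps
                 if p not in finished and all(d in finished for d in deps[p])]
        with ThreadPoolExecutor() as pool:
            batches.append(sorted(pool.map(run_package, ready)))
        finished.update(ready)
    return batches

batches = run_all(dependencies)
# The three file loads form one parallel batch; GeoCode runs after them.
print(batches)
```

The same idea scales to any dependency graph: each pass collects everything that is unblocked, so the degree of parallelism is limited only by the real dependencies rather than by how the work was drawn on a single package's design surface.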
A Two-Part Story
Let’s examine a recent use of this best practice, one that really exemplifies its value. Let’s start with a small package.
In a recent engagement I had to add in some new functionality to an existing SSIS process. To protect the innocent, let’s say that the requirement was to geo-code addresses after they arrived. To do so, I was going to be using a web service. The geo-code information then needed to be placed in a special geo-code table. The one additional requirement was that this didn’t need to happen for all imports and sometimes would need to be run some time after the original import was already executed. These aren’t the true requirements, but they are a close enough approximation.
During design, I considered a couple of options:
- The functionality could be added to the package that did the address inserts and updates. After the record was added, the geo-code information could be retrieved from the web service. The data would then be added to the geo-code table. For imports that used the geo-code web service, there would be a flag to turn this functionality on or off.
- The functionality could be added as a new SSIS package. The new SSIS package would receive an input file, probably a raw file, that was generated when the first package executed. The imports that needed geo-coding would then have this SSIS package run against the output from the previous package.
Of course, I went with the second option. This was implemented fairly well and after a bit of development was ready to deploy. The new SSIS package did a great job with its place in the existing architecture.
But wait… of course, there is more. The part of this that really sells the concept of smaller, focused SSIS packages is what happened a couple weeks after deploying the new SSIS package.
On a snowy afternoon, I received an e-mail wondering if there was any way a few hundred addresses could be run through the new process. This wasn’t part of the initial requirements, but they thought to ask anyway. The problem was that addresses were also loaded in a different way, and those needed geo-coding as well. The existing process only allowed the user to request geo-coding for one address at a time.
The win here is that with a query and an export of the results to SSIS, the manual work was eliminated. Since the design focused on a single task that was independent of the import, the package could be leveraged for previously unexpected uses. We got mad props for the design of the package, and one person saved themselves from a serious case of sore wrists.
This practice isn’t absolute, and not every one of my packages is small and focused. But I do prefer to develop in this direction and look for opportunities to do so as often as possible.
What are your thoughts on this practice? Anything that you would like to add? Feel free to leave a comment.
6 thoughts on “31 Days of SSIS – Small Package, Big Win (26/31)”
I’m new to SSIS and am lapping up your blog, thanks! I’ve got a question about the multiple-package approach, though. How do I have a variable or parameter that is shared across all the packages? I’d like to have one row in a log table that keeps being updated, so I need to store the ID of that row. Having read what feels like half the internet, I can’t see how to manage that.
Thank you again, katy
This is what I have always bitched about to my fellow developers, but they tend to put the whole ETL inside one package (to keep it simple!!!!)
But the package takes forever to load, and it is utter hell to maintain anything inside the “one SSIS package to rule them all”.
I agree, complete hell. The other thing is when people have huge data flows writing to multiple destinations. Usually those packages show a complete disregard for controlling which destinations are written to and for transaction management. One failure on one destination can have a significant negative impact.
And thanks for the comment. 🙂