No matter how busy (or not busy) your life is, you probably follow a pretty established routine day in and day out. You get up, shower, eat, work, sleep, and so on. Face it: your life is a loop. Sure, there are a few things you do each day that make that particular day unique, but you can chalk those up as branch logic.
When I began thinking of my own life as a loop, I did what all programmers do with loops: I optimized it by reducing the time each task inside it takes. While I can't shorten each loop iteration (Mother Nature takes care of that), I can definitely try to pack more productivity into each day.
At first, it was easy to find ways to save time. I cut my lunch hour to half an hour. I stopped hanging out in the shower for an extra twenty minutes in the morning. As anyone skilled at optimizing loops will tell you, though, you need to look at the small things, too. That's because the time it takes to perform even a small activity can add up fast when you have to do it hundreds or even thousands of times.
For instance, I read an article the other day on AnchorDesk (http://reviews-zdnet.com.com/AnchorDesk/4520-7298_16-5131343.html) that said it takes about 48 seconds to boot a PC. The author obviously doesn't have 26 applications loaded in his system tray like I do. Even at just 48 seconds a day, though, the article points out that you waste about 4.9 hours each year on that one activity (48 seconds × 365 days comes to roughly 4.9 hours). As you can see, even a small latency adds up over time. So, now I put my laptop in Hibernate mode each night instead of shutting it down completely. That cuts my morning startup time in half.
I had a peculiar run-in with latency the other day. I was building a batch process in T-SQL that looped over several million records in a table and calculated cost buckets using a rather complex set of algorithms. I declared several variables to hold interim computations, and those variables had to be reset to 0 at the beginning of each loop iteration. Naturally, I figured that the best way to do that was to use a series of SET statements, like so:
SET @foo1 = 0
SET @foo2 = 0
SET @foo3 = 0
--etc
When I tested the process, it took quite a long time to complete. I did my best to optimize the rest of the procedure, but its performance still wasn't where I wanted it to be. On a whim, I turned my sights to those SET statements. If you're familiar with T-SQL (and if you're a Microsoft developer, you'd better be), you know that there is more than one way to put a value into a variable. You can use the aforementioned SET statement, or you can use a SELECT statement:
SELECT @foo1 = 0, @foo2 = 0, @foo3 = 0
I had always thought that the difference between the two was merely syntactical, and that the decision to use one or the other was a matter of personal preference. I was grasping at straws, though, so I decided to run a simple loop test to make sure. I ran the following two routines and examined the performance results:
--METHOD 1 (SLOWER)
DECLARE
    @counter INT,
    @foo1 INT, @foo2 INT, @foo3 INT,
    @foo4 INT, @foo5 INT, @foo6 INT,
    @foo7 INT, @foo8 INT, @foo9 INT
SET @counter = 0
WHILE @counter < 10000
BEGIN
    SET @foo1 = 0
    SET @foo2 = 0
    SET @foo3 = 0
    SET @foo4 = 0
    SET @foo5 = 0
    SET @foo6 = 0
    SET @foo7 = 0
    SET @foo8 = 0
    SET @foo9 = 0
    SET @counter = @counter + 1
END
--METHOD 2 (FASTER)
DECLARE
    @counter INT,
    @foo1 INT, @foo2 INT, @foo3 INT,
    @foo4 INT, @foo5 INT, @foo6 INT,
    @foo7 INT, @foo8 INT, @foo9 INT
SET @counter = 0
WHILE @counter < 10000
BEGIN
    SELECT
        @foo1 = 0, @foo2 = 0, @foo3 = 0,
        @foo4 = 0, @foo5 = 0, @foo6 = 0,
        @foo7 = 0, @foo8 = 0, @foo9 = 0
    SET @counter = @counter + 1
END
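If you want to reproduce the comparison, one simple way to time each routine is to bracket it with GETDATE() calls and report the difference. This harness is just a sketch (it isn't part of my original test), and since GETDATE() only resolves to a few milliseconds, you may need to raise the iteration count on a fast server:
--Minimal timing sketch (not part of the original test): wrap either
--method with these lines to measure elapsed milliseconds.
DECLARE @start DATETIME
SET @start = GETDATE()
--...paste METHOD 1 or METHOD 2 here...
PRINT 'Elapsed ms: ' + CAST(DATEDIFF(ms, @start, GETDATE()) AS VARCHAR(20))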
As my comments in the code denote, the second method, which uses a single SELECT statement, runs much faster than the series of SET statements (three times faster on my server), and that was with only 10,000 loop iterations, never mind the millions of iterations I was really dealing with. The secret behind the performance difference is revealed if you view the execution plan of each test routine. When you issue nine SET statements, SQL Server processes nine separate single-variable assignments, one per statement; the second method accomplishes the same amount of work with a single SELECT statement. The more variables you need to assign in a batch like this, the bigger the performance difference becomes, and the relationship is almost linear. Remember that SQL Server, at its heart, is optimized to work on sets of data. It has a much harder time dealing with sequentially processed data (for now), so you have to learn to play to its strengths whenever possible.
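If you'd rather inspect those plans from a script than from Query Analyzer's graphical view, SET SHOWPLAN_TEXT returns the plan for each statement without executing it. The snippet below is only a sketch of that technique, but it makes the point: each SET compiles to its own plan entry, while the combined SELECT compiles to one.
--With SHOWPLAN_TEXT ON, SQL Server returns plan text per statement
--instead of executing the batch.
SET SHOWPLAN_TEXT ON
GO
DECLARE @foo1 INT, @foo2 INT, @foo3 INT
SET @foo1 = 0                           --one plan entry
SET @foo2 = 0                           --another
SET @foo3 = 0                           --and another
SELECT @foo1 = 0, @foo2 = 0, @foo3 = 0  --a single plan entry
GO
SET SHOWPLAN_TEXT OFF
GO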
Note that if you're assigning a single variable, neither method has an advantage; the performance gain only kicks in when you need to set many variables at once. Once I switched to SELECT statements to assign my variable values, my batch process ran a heck of a lot faster (ten times faster in my case, but your mileage may vary depending on your particular situation).
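To give you an idea of how this fits into a real batch, here is a simplified sketch of the loop shape I described earlier. The table and column names (CostRecords, RecordID) are invented for illustration, and the actual cost-bucket math is omitted:
--Simplified sketch of the batch pattern; CostRecords and RecordID are
--hypothetical names, and the cost-bucket calculations are elided.
DECLARE @id INT, @foo1 INT, @foo2 INT, @foo3 INT
SELECT @id = MIN(RecordID) FROM CostRecords
WHILE @id IS NOT NULL
BEGIN
    --reset all interim variables in one statement instead of three SETs
    SELECT @foo1 = 0, @foo2 = 0, @foo3 = 0
    --...complex cost-bucket calculations for record @id go here...
    SELECT @id = MIN(RecordID) FROM CostRecords WHERE RecordID > @id
END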
Hopefully, you can see that sometimes making a simple change to your code can have a profound effect on its performance. Personally, I'm looking forward to the release of Yukon so that I can build these types of long-running stored procedures using my favorite .NET language and forget about a lot of this T-SQL nonsense (though not completely). Now, if I could only get my clients to settle on what they want the first time around, I could spare you the “My Life Is a Recursion Loop” article next issue. My name is Jonathan Goodyear, and I am the angryCoder.