Many application developers face the challenges of working with SQL Server 2000.
These challenges include retrieving and grouping large amounts of data, building result sets, and tracking changes to data. All of them call for sound strategies that hold up against a seemingly endless number of possibilities. This installment of "The Baker's Dozen" presents a variety of real-world database situations and shows how you can use Transact-SQL and the SQL-92 standard to tackle them.
Beginning with the End in Mind
A major objective of "The Baker's Dozen" is to provide tips for those wishing to become more productive in a certain technology. Hopefully, even those with experience may still find a gold nugget in one of the tips. Just like the legendary Van Amsterdam Bakery that gave us the original Baker's Dozen name, I've spent the last few months baking a batch of tasty tips to serve up to developers, in hopes of helping you meet the challenges of SQL Server 2000.
Most of the tips raise a business application requirement, and address how you can use Transact-SQL code to solve it. The tips in this issue are:
- Return multiple result sets from a single stored procedure
- Write user-defined functions (UDFs) to return a scalar value
- Write a UDF to parse a comma-separated list of selection keys and return a table that can be used for subsequent JOINs
- Use LIKE to build text-search routines
- Understand and use CASE syntax to evaluate run-time conditions
- Baker's Dozen Spotlight: Use subqueries and derived tables to handle involved requirements, and a brief discussion about different techniques
- Use table variables, and understand the differences between table variables and temporary tables
- Use datepart functions to summarize daily data by a week-ending date
- Learn how to use functions like CAST and CONVERT to make different data types work together
- Use triggers to define specific actions when database row(s) are modified
- Extend triggers to implement basic audit trail capability
- Use Dynamic SQL to build queries where key conditions are not known until runtime
- Retrieve a list of tables and columns from a particular database
You can find all listings in this article on the Common Ground Solutions Web site at www.commongroundsolutions.net. So once again, borrowing from the theme of Van Amsterdam's bakery, here are thirteen productivity tips, a "Baker's Dozen".
Tip 1: Return Multiple Result Sets
Requirement: You need to create a simple stored procedure to return all the Order Information for a single customer. The result set must contain three tables: the order header, order detail, and order notes.
A single SQL 2000 stored procedure can easily return multiple result sets. The general rule is that any SQL SELECT statement in a stored procedure with no explicit output declaration (to a table, variable, etc.) returns the results to the calling routine.
Listing 1 shows a basic stored procedure that contains one argument, an integer key for a customer account. The procedure queries the Order Header, Detail, and Notes tables. Note that the procedure creates a variable of type table, and stores the list of order keys to be used for all three queries. Tip #7 covers table variables in more detail.
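To make this concrete, here is a minimal sketch in the spirit of Listing 1; the table and column names (OrderHeader, OrderDetail, OrderNotes, and so on) are hypothetical stand-ins for your own schema.
CREATE PROCEDURE dbo.GetOrderInfo
   @CustomerID int
AS
BEGIN
   -- Store the customer's order keys once, for reuse by all three queries
   DECLARE @tOrderKeys TABLE (OrderID int)

   INSERT INTO @tOrderKeys (OrderID)
      SELECT OrderID FROM OrderHeader WHERE CustomerID = @CustomerID

   -- Each SELECT below has no output destination, so each one
   -- comes back to the caller as a separate result set
   SELECT OH.* FROM OrderHeader OH
      JOIN @tOrderKeys K ON K.OrderID = OH.OrderID

   SELECT OD.* FROM OrderDetail OD
      JOIN @tOrderKeys K ON K.OrderID = OD.OrderID

   SELECT ORN.* FROM OrderNotes ORN
      JOIN @tOrderKeys K ON K.OrderID = ORN.OrderID
END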
Because the three queries do not contain an output destination, the results are returned to the calling procedure. You can execute the stored procedure from a script in Query Analyzer, which displays the three results sets when the query runs. If you called the stored procedure from C#, the data adapter fills the result set (dataset) with the three data tables, like so:
// cSQLString holds the EXEC statement; oMyConn is an open SqlConnection
SqlDataAdapter oDa =
    new SqlDataAdapter(cSQLString, oMyConn);
DataSet DsReturn = new DataSet();
oDa.Fill(DsReturn, "MyResults");
// Tables[0] will contain the order headers
// Tables[1] will contain the order details
// Tables[2] will contain the order notes
Normally, you would use some type of data access methodology to call stored procedures. A future Baker's Dozen article will discuss these types of development strategies in a distributed computing environment.
Tip 2: UDFs to Return a Value
Requirement: You need to sum customer debits and credits based on a cutoff date, and return the sum as a customer balance. You would like to use this calculation across multiple queries.
This is another example where you can use Transact SQL to prepare a result set with little or no data munging in the business layer.
SQL 2000 allows developers to write and call user-defined functions. UDFs allow developers to encapsulate logic and functionality, and return a scalar value that they can use across different queries. Listing 2 presents a basic UDF that sums a table of debits and credits for a customer and date threshold, and returns the result. You can use the function in a query as follows:
DECLARE @dCutOffDate DATETIME
SET @dCutOffDate = CAST('11/30/2004' AS DateTime)
SELECT CustID,
dbo.CUSTBAL(CustID, @dCutOffDate)
AS CustBalance
FROM...
If the application has five queries that need the customer balance, your code can use the UDF every time, instead of repeating the same code across every query.
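For reference, a scalar UDF along the lines of Listing 2 might look like the following sketch; the DebitsCredits table, its columns, and the sign convention (credits stored as negative amounts) are assumptions for illustration.
CREATE FUNCTION dbo.CUSTBAL
   (@iCustID int, @dCutOffDate DATETIME)
RETURNS DECIMAL(14,2) AS
BEGIN
   DECLARE @nBalance DECIMAL(14,2)

   -- Assumption: debits are stored as positive amounts, credits as negative
   SELECT @nBalance = SUM(Amount)
      FROM DebitsCredits
      WHERE CustID = @iCustID
        AND CompDate <= @dCutOffDate

   RETURN ISNULL(@nBalance, 0)
END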
Of course, that is a basic demonstration of UDFs. Now I'll add a twist. The Customer Balance UDF queries the Debits and Credits table based on a COMPDATE column. Suppose the Debits and Credits tables contain three dates that represent the different stages of a record (DRDATE for draft, WIPDATE for work-in-progress, and COMPDATE for completed date). The balance UDF needs to examine the record status to determine the date to which the cutoff date should be compared.
The balance UDF could implement a series of in-line IF or CASE statements to read the status and use the corresponding date column. However, other processes may need this logic. Listing 3 demonstrates a second UDF that retrieves the customer record and returns the appropriate date, given the customer's status. Now, instead of saying:
AND CompDate <= @dCutOffDate
You can use the following:
AND dbo.GetDate(Status,DrDate,WIPDate,CompDate)
<= @dCutOffDate
Of course, this means that the original query calls a UDF which in turn calls another UDF. The interest in isolating code should be weighed against performance. If no other queries/functions use this logic to determine the date, then perhaps this second UDF is not necessary. You should make this decision after identifying all the technical requirements of the application.
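For illustration, a sketch of that second UDF might look like the following; the name matches the call shown above, and the single-character status codes ('D', 'W', 'C') are assumptions.
CREATE FUNCTION dbo.GetDate
   (@cStatus char(1),
    @dDrDate DATETIME,
    @dWIPDate DATETIME,
    @dCompDate DATETIME)
RETURNS DATETIME AS
BEGIN
   -- The dbo. prefix on every call distinguishes this UDF from the
   -- built-in GETDATE() function. Status codes are illustrative:
   -- D = draft, W = work-in-progress, anything else = completed
   RETURN CASE @cStatus
             WHEN 'D' THEN @dDrDate
             WHEN 'W' THEN @dWIPDate
             ELSE @dCompDate
          END
END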
Please note that SQL Server 2000 UDFs carry some restrictions. You cannot modify data in any existing tables inside a UDF, nor can you create temporary tables inside a UDF. The general purpose of a UDF is to leverage code to return results, not to modify data.
Tip 3: UDFs to Return a Table
Requirement: In Tip #1, you implemented a stored procedure to return multiple result sets for a single account. Now imagine that you need to expand this requirement so that users can select one account, ten accounts, or even fifty accounts, and view the order information for all these accounts.
Some developers use a combination of building an IN statement with dynamic SQL, temporary tables, and other means to address this situation. Because UDFs can return tables as well as scalar values, you can use this capability to more cleanly address the requirement for running a stored procedure for a variable number of accounts. You'll do so by modifying the stored procedure in Tip #1 to perform the following:
- Pass the list of selected accounts as a comma-separated parameter (e.g., 1,2,17,912,1091) to the stored procedure.
- Create a UDF (Listing 4) that receives the comma-separated list of keys, builds a table variable with one integer-key row for each entry in the list, and returns the table variable (a sketch of such a function appears at the end of this tip).
- Call the UDF as part of the JOIN into the Orders tables to retrieve all the order information for the list of accounts.
A developer can leverage this UDF any time a query runs against a variable number of accounts, products, or other variable set of keys. Listing 5 shows a version of the stored procedure from Tip #1, modified to use the UDF.
Note that the UDF in Listing 4 contains a varchar parameter of 8,000 bytes. Assuming the comma-separated list of values represents integer key values, you should be able to pass several hundred integer keys in the list before the varchar limit of 8,000 bytes is reached. Those who require something larger (either thousands of keys, or a wider character data type) will need to look at alternate approaches, such as XML.
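A minimal sketch of such a comma-list-to-table UDF follows; the name CsvToTable matches the call shown in Tip 7, and the parsing loop is just one straightforward way to implement it.
CREATE FUNCTION dbo.CsvToTable
   (@cKeyList varchar(8000))
RETURNS @tKeys TABLE (IntKey int)
AS
BEGIN
   DECLARE @nPos int, @cEntry varchar(20)

   WHILE LEN(@cKeyList) > 0
   BEGIN
      SET @nPos = CHARINDEX(',', @cKeyList)
      IF @nPos = 0
         BEGIN
            -- Last (or only) entry in the list
            SET @cEntry   = @cKeyList
            SET @cKeyList = ''
         END
      ELSE
         BEGIN
            SET @cEntry   = LEFT(@cKeyList, @nPos - 1)
            SET @cKeyList = SUBSTRING(@cKeyList, @nPos + 1, 8000)
         END

      IF LEN(@cEntry) > 0
         INSERT INTO @tKeys (IntKey) VALUES (CAST(@cEntry AS int))
   END
   RETURN
END
You can test the function directly with SELECT IntKey FROM dbo.CsvToTable('1,2,17,912,1091').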
Tip 4: Use LIKE to Perform Text Searches
Requirement: Users of a job placement application have asked for the ability to list all candidates who match keyword searches.
The T-SQL LIKE command allows you to search for patterns within strings. For example, suppose a user wants to search for the phrase "XP" within a memo column that contains "skills include Windows XP, FrontPage, etc." Developers can use the LIKE command and the wildcard percentage character (%) to perform pattern matching.
LIKE has multiple uses, depending on the type of search you want to perform. Most searches need to check for a pattern that exists anywhere within a column. However, some searches only need to return rows where a column begins with a search pattern. Additionally, some searches may be interested in rows where a column ends with a search pattern.
-- Search anywhere in the column
SELECT * FROM Applicants WHERE
Skills LIKE '%XP%'
-- Search where skills begins with XP
SELECT * FROM Applicants WHERE
Skills LIKE 'XP%'
-- Search where skills ends with XP
SELECT * FROM Applicants WHERE
Skills LIKE '%XP'
Additionally, you can use the underscore character as a single-character wildcard (for example, Name LIKE '_EVIN' matches both 'KEVIN' and 'DEVIN').
Tip 5: A Case for CASE
Requirement: You need to produce an accounting aging report that summarizes customer account balances by standard aging brackets (balance aged 1-30 days, 31-60 days, etc.). You also need to summarize the results either by account or by balance reason.
Tip #2 demonstrated basic CASE functionality to evaluate a set of customer status codes and return a corresponding date value. Now look at another example of how you can use CASE to meet a need.
Some development solutions take detail or partly-summarized result sets and perform further summarization in a business component. Although not invalid, this approach overlooks the ability to build the final result set entirely in SQL 2000. For example, some solutions retrieve the accounting data from SQL Server and then process the results into the appropriate aging brackets. Listing 6 demonstrates how to group each accounting balance into the corresponding aging bracket. By using a CASE statement to evaluate a date column against a date range (a date column between AsOfDate - 90 days and AsOfDate - 61 days, etc.), the query can place the amount into the corresponding bracket.
That addresses the initial requirement to group customer accounting balances into aging brackets. You can also use the CASE statement in the GROUP BY clause to handle the need to sum the results by account or by category. Although it's difficult to completely avoid dynamic SQL or writing multiple queries that differ only slightly, effective use of CASE can certainly reduce these instances.
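Here is a hedged sketch of the aging approach; the AcctTrans table, its columns, and the simplified bracket boundaries are illustrative rather than taken from Listing 6.
DECLARE @dAsOfDate DATETIME
SET @dAsOfDate = GETDATE()

SELECT CustomerID,
   SUM(CASE WHEN TranDate BETWEEN @dAsOfDate - 30 AND @dAsOfDate
            THEN Amount ELSE 0 END) AS Aged0130,
   SUM(CASE WHEN TranDate BETWEEN @dAsOfDate - 60 AND @dAsOfDate - 31
            THEN Amount ELSE 0 END) AS Aged3160,
   SUM(CASE WHEN TranDate BETWEEN @dAsOfDate - 90 AND @dAsOfDate - 61
            THEN Amount ELSE 0 END) AS Aged6190,
   SUM(CASE WHEN TranDate < @dAsOfDate - 90
            THEN Amount ELSE 0 END) AS AgedOver90
FROM AcctTrans
GROUP BY CustomerID
The same technique applies in the GROUP BY clause; just be sure the CASE expression in GROUP BY matches the expression in the SELECT list exactly.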
Tip 6: The Baker's Dozen Spotlight - Subqueries and Derived Tables
Requirement: A construction company has a table of construction jobs, and two tables for hours worked per job and materials purchased per job. You need to produce a basic result that lists each job, the total hours worked per job, and the total material costs per job. Some jobs have labor hours but no costs, and a few jobs have material costs but no labor hours.
On the surface, this may seem like a fairly easy requirement. However, the situation poses an interesting problem, and also opens a discussion on query methodologies.
Some developers may approach the situation by using a pair of OUTER JOIN statements, like so:
SELECT J.JobID, SUM(H.Hours) AS TotHours,
SUM(C.Costs) AS TotCosts
FROM JobMast J
LEFT JOIN JobHours H ON H.JobID = J.JobID
LEFT JOIN JobCosts C ON C.JobID = J.JobID
GROUP BY J.JobID
Unfortunately, this query doesn't produce correct results. Because of the aggregation involved (a single job could have two labor rows and three cost rows), the results for hours and costs will be very high (i.e., double-counted, triple-counted, etc).
Fortunately, the SQL-92 standard lets you solve this problem by implementing subqueries that return a scalar value in the SELECT statement. The following query produces reliable results.
SELECT J.JobID,
(SELECT SUM(H.Hours) FROM JobHours H
WHERE H.JobID = J.JobID) AS TotHours,
(SELECT SUM(C.Costs) FROM JobCosts C
WHERE C.JobID = J.JobID) AS TotCosts
FROM JobMast J
GROUP BY J.JobID
The above code represents a correlated subquery. A correlated subquery depends on the results of the outer query. As such, it cannot be executed independently.
SQL Server executes the subquery once for every record in the outer query. Some view this practice as inefficient, and elect to use derived tables instead.
SELECT J.JobID, H.TotHours, C.TotCosts
FROM JobMast J
LEFT JOIN (SELECT JobID, SUM(Hours) AS TotHours
FROM JobHours
GROUP BY JobID) H
ON H.JobID = J.JobID
LEFT JOIN (SELECT JobID, SUM(Costs) AS TotCosts
FROM JobCosts
GROUP BY JobID) C
ON C.JobID = J.JobID
In the above query, the two subqueries in parentheses produce derived tables. Derived tables are essentially temporary, ad hoc views that last only for the duration of the query. The outer query can reference the contents of a derived table just like any other table. (Note that you must always reference a derived table with an alias.) You can also create a derived table from a subquery in a FROM clause. Derived tables are best suited to cases where you need a temporary result set for just one query.
Everyone knows the old joke about getting ten economists in a room and getting ten different views of economics. Although that's somewhat of an exaggeration, you could ask ten different developers which approach is preferable, and you'd likely see some votes for the correlated subquery approach because of readability, some votes for the derived table approach on the basis of better potential performance, and no doubt a few votes for still another alternative.
Developers of similar strength and experience can and will disagree on the emphasis of performance (or the promise of performance) versus maintenance and readability. Sometimes there is no one single correct answer. The key is to apply consistency in approaching these types of situations.
Before I move on, let me add a final piece to the requirement. The result set I constructed contained a summary of hours for each job. Instead of a summary of hours, you may want to sum hours multiplied by an hourly employee rate to produce total labor dollars.
As an additional complication, the rates may change throughout a job (an employee may work the first half of a job at one rate and the second half at a higher rate). Listing 7 expands the query above to demonstrate how to match the hours for a given work date against the rate in effect on that date. The additional logic finds the employee rate with the highest (MAX) effective date that is less than or equal to the date worked. Either query above can use the additional subquery.
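A sketch of that additional logic follows; the EmpRates table and the EmpID and WorkDate columns are assumptions, and the derived table here can stand in for the hours derived table in either query above.
SELECT D.JobID, SUM(D.Hours * D.Rate) AS TotLaborDollars
FROM
   (SELECT H.JobID, H.Hours,
           -- Rate in effect on the date the hours were worked
           (SELECT R.Rate
              FROM EmpRates R
              WHERE R.EmpID = H.EmpID
                AND R.EffectiveDate =
                    (SELECT MAX(R2.EffectiveDate)
                       FROM EmpRates R2
                       WHERE R2.EmpID = H.EmpID
                         AND R2.EffectiveDate <= H.WorkDate)) AS Rate
      FROM JobHours H) D
GROUP BY D.JobID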
Tip 7: Table Variables vs. Temporary Tables
The listings for Tips 1 and 3 introduced table variables as a means of storing temporary result sets for repeated uses in the stored procedure. In those examples, the results of the table-valued UDF (that worked with the list of account keys) were directly joined against the Orders Header table. If you wanted to use the table containing the account keys for another join, you could use the following:
DECLARE @tAcctKeys TABLE (IntKey int)
INSERT INTO @tAcctKeys
SELECT IntKey FROM
dbo.CsvToTable(@cAcctKeyList)
-- now you can use @tAcctKeys in any subsequent
-- joins in the stored procedure
According to Microsoft documentation, table variables provide some of the same benefits of temporary tables, with the additional benefit of slightly improved performance. Table variables have the same general scope as any other variable: they are automatically cleared at the end of a procedure or function. SQL Server recompiles stored procedures less often when you use table variables as opposed to temporary tables. Additionally, transactions that involve table variables last only for the duration of an update on the table variable. As a result, table variables require less locking and logging resources. Because table variables have limited scope and are not part of the persistent database, transaction rollbacks do not affect them.
Although table variables may perform slightly better, there is a slight misconception that they perform better than temporary tables because SQL Server stores the contents in memory instead of the tempdb system database. According to Microsoft, a table variable is not a memory-only structure. A table variable could hold more data than can fit in memory; therefore, it has to have a place on disk to store data. Table variables are created in the tempdb database similar to temporary tables. If memory is available, both table variables and temporary tables are created and processed while in memory (data cache).
Even with this fact, developers generally prefer table variables over temporary tables for the reasons described above. However, table variables carry some notable limitations. You cannot TRUNCATE a table variable, nor can you modify the structure of a table variable after you've created it. Additionally, you cannot use either of the following statements with table variables:
-- neither line will work with table variables
INSERT @tTable EXEC <sp_mystoredproc>
SELECT * INTO @tTable FROM <mytable>
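If you need either of these operations, a temporary table remains the fallback on SQL Server 2000. The procedure and column names below are hypothetical; the point is simply that both statements are legal against a temporary table.
-- Both operations work with a temporary table
CREATE TABLE #tResults (CustID int, Balance decimal(14,2))

-- dbo.MyStoredProc must return a result set matching the columns of #tResults
INSERT INTO #tResults
   EXEC dbo.MyStoredProc

-- SELECT INTO creates the new temporary table on the fly
SELECT * INTO #tResultsCopy FROM #tResults

DROP TABLE #tResultsCopy
DROP TABLE #tResults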
Tip 8: Date Functions
Requirement: You need to retrieve daily order information and summarize it by week for a weekly report or weekly graph.
The two brief code snippets below demonstrate how to use SQL Server's DATEPART function to solve this requirement. The first snippet presents a UDF that converts any date during the week to a Saturday date for that week. The second snippet demonstrates the UDF as part of a query against orders, which summarizes the data by the week-ending Saturday date. You can use this result set for a weekly report or graph. This represents another example where you can use Transact SQL to prepare a result set with little or no data munging in the business layer.
CREATE FUNCTION dbo.GetEndOfWeek
(@dDate DateTime)
-- Converts date to the Saturday date for the week
RETURNS DateTime AS
BEGIN
DECLARE @dRetDate DateTime
SET @dRetDate =
@dDate + ( 7-DATEPART(weekday,@dDate))
RETURN @dRetDate
END
SELECT dbo.GetEndOfWeek(OrderDate) AS WeekEnding,
SUM(Amount) AS WeekAmount
FROM OrderHdr
GROUP BY dbo.GetEndOfWeek(OrderDate)
DATEPART returns an integer based on the portion of the date that you want to analyze. SQL Server Books Online lists all of the date portions (dateparts) that the DATEPART function can analyze. For example, you could use the quarter datepart to summarize data by quarter.
The Weekday datepart is scoped to SQL Server's setting for the first day of the week. The default is Sunday. To set the first day of the week to a different day (e.g., summarize sales from Monday to Sunday instead of Sunday to Saturday), use the SET DATEFIRST command:
-- Change first day of week from default of
-- Sunday (7) to Monday (1)
SET DATEFIRST 1
One final note on dates: in many instances, you might build queries that compare dates where the date column has a time other than 12 AM, yet the date comparison variable is a date with no time defined.
For example, a row with a date of "05-26-2003 22:10:00" will not be retrieved if the query calls for all rows where Date <= CAST('05-26-2003' AS DATETIME). This is because SQL Server compares 05-26-2003 at 22:10:00 to 05-26-2003 at 00:00:00 (12 AM), and finds that the former is not <= the latter. If you want all data through 5-26, including any transactions that hit right up to 11:59 PM that day, you can add one day to the comparison date and change the logic to less than (Date < CAST('05-26-2003' AS DATETIME) + 1).
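For example, using the OrderHdr table from the earlier snippet:
-- Misses any 05-26-2003 rows stamped after midnight
SELECT * FROM OrderHdr
WHERE OrderDate <= CAST('05-26-2003' AS DATETIME)

-- Catches the entire day by comparing against the following day
SELECT * FROM OrderHdr
WHERE OrderDate < CAST('05-26-2003' AS DATETIME) + 1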
Tip 9: Data Conversions with CAST and CONVERT
Requirement: You need to produce an English-like message that describes the results of an accounting process. The string should read something like "100 Employee Checks were generated on 11/15/2004 at 5:07:02 PM, for a total dollar value of $134,123.11".
SQL Server provides the CAST and CONVERT functions to make different data types work together. Listing 8 demonstrates how to use variables of different data types (date, decimal) to build a message string that could be used for a log or display. As is the case with other material presented in this article, SQL Server Books Online is an excellent reference on the different options for these functions.
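A hedged sketch of the idea follows; the variable names and the specific CONVERT styles are illustrative, not taken verbatim from Listing 8.
DECLARE @iCheckCount int, @dRunDate DATETIME,
        @nTotal MONEY, @cMsg varchar(200)

SET @iCheckCount = 100
SET @dRunDate    = GETDATE()
SET @nTotal      = 134123.11

SET @cMsg = CAST(@iCheckCount AS varchar(10))
   + ' Employee Checks were generated on '
   + CONVERT(varchar(10), @dRunDate, 101)   -- style 101 = mm/dd/yyyy
   + ' at '
   + CONVERT(varchar(8), @dRunDate, 108)    -- style 108 = hh:mm:ss (24-hour)
   + ', for a total dollar value of $'
   + CONVERT(varchar(20), @nTotal, 1)       -- money style 1 adds commas

PRINT @cMsg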
Tip 10: Update Triggers
Requirement: You need to make sure that any UPDATES performed against a table automatically update a timestamp column.
An UPDATE trigger is a specific type of stored procedure that fires every time an UPDATE statement executes against the table. You can use triggers to enforce specific rules, and/or to ensure that specific columns are updated.
UPDATE triggers provide access to two critical system tables that contain the state of the row before it was updated (DELETED) and after it was updated (INSERTED). You can use the contents of these tables to implement basic audit trail functionality, which I'll cover in Tip #11. For now, I'll concentrate on setting up an automatic update of a LASTUPDATED column.
In constructing an UPDATE trigger, you need to remember that the trigger fires once, regardless of the number of rows affected by an UPDATE statement. Those new to SQL Server sometimes make the mistake of writing trigger code with the assumption that only one row will be updated.
CREATE TRIGGER Upd_Client ON dbo.Client
FOR UPDATE
AS
UPDATE C
SET LastUpdate = GETDATE()
FROM Client C
JOIN Inserted I
ON I.PrimaryKey = C.PrimaryKey
Tip 11: Use UPDATE Triggers to Implement Audit Trail Changes
Requirement: You need to log changes to specific columns to an audit trail log. The log should contain the table modified, the primary key of the row modified, the name of the column modified, the values before and after the change (old value/new value), and the date/time of the update.
Many clients want to track changes to specific database items. UPDATE triggers allow you to implement audit trails by providing the INSERTED and DELETED tables that you used in the previous tip.
In Listing 9, you use an UPDATE trigger on a Product Master table to track changes to an item by querying the INSERTED and DELETED tables for changes to particular columns. For this example, the trigger checks the PRICE and DESCRIPTION columns for changes, and then writes the before and after values from DELETED and INSERTED into an application log table.
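The following sketch shows the shape of such a trigger for the PRICE column; the ProductMast and AuditLog tables, along with their column names, are assumptions for illustration.
CREATE TRIGGER Upd_ProductMast ON dbo.ProductMast
FOR UPDATE
AS
IF UPDATE(Price)
   INSERT INTO AuditLog
      (TableName, PrimaryKey, ColumnName, OldValue, NewValue, UpdateTime)
   SELECT 'ProductMast', D.ProductID, 'Price',
          CAST(D.Price AS varchar(20)),   -- value before the change
          CAST(I.Price AS varchar(20)),   -- value after the change
          GETDATE()
   FROM Deleted D
   JOIN Inserted I ON I.ProductID = D.ProductID
   WHERE D.Price <> I.Price
-- Repeat the IF UPDATE(...) block for the DESCRIPTION column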
Note This tip provides basic functionality for Audit Trail logging. If you want to build a completely automated audit trail solution, you should consider building a data-driven script to generate UPDATE triggers for those tables/columns where you require logging.
Several companies make products that can help you build automated audit trail solutions. I can personally recommend SQLAudit from Red Matrix Technologies which provides these capabilities (and more). Considering the amount of development effort to build a comprehensive solution for audit trail, purchasing a third-party tool like SQLAudit may be a wise economic choice.
Tip 12: Dynamic SQL
Although it's a controversial subject, dynamic SQL statements can sometimes provide value in data-driven applications where you cannot determine critical query syntax until runtime. Listing 10 demonstrates two uses of dynamic SQL. The first example constructs a simple query where the table name is a variable. The second example demonstrates how to direct the results of a query that returns one row to an output variable.
Note: In Listing 10, the dynamic SQL strings are declared with Unicode data types (nchar and nvarchar); sp_executesql requires its statement and parameter-definition strings to be Unicode.
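A minimal sketch of both techniques follows; the table name and count are illustrative, and as always with dynamic SQL, only concatenate object names that come from a trusted source.
DECLARE @cTableName sysname, @cSQL nvarchar(4000), @nCount int

-- Use 1: the table name is not known until runtime
SET @cTableName = N'OrderHdr'
SET @cSQL = N'SELECT * FROM ' + @cTableName
EXEC (@cSQL)

-- Use 2: return a single value from the dynamic query to an output variable
SET @cSQL = N'SELECT @nCountOut = COUNT(*) FROM ' + @cTableName
EXEC sp_executesql @cSQL, N'@nCountOut int OUTPUT',
     @nCountOut = @nCount OUTPUT

SELECT @nCount AS TotalRows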
You can find excellent discussions on dynamic SQL by searching for "dynamic SQL" on Google. Many online articles demonstrate a variety of dynamic SQL techniques, and some even discuss SQL injection with respect to dynamic SQL.
Tip 13: Get a List of Tables and Columns for a Database
A common beginner's question is how to obtain a list of all tables and columns for a database. You can easily query the Information Schema to retrieve this list:
-- Return a list of table names
SELECT DISTINCT Table_Name FROM
Northwind.INFORMATION_SCHEMA.COLUMNS
ORDER BY Table_Name
-- Return a list of columns
-- (perform a SELECT * to see a full list)
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE,
CHARACTER_MAXIMUM_LENGTH FROM
Northwind.INFORMATION_SCHEMA.COLUMNS
ORDER BY Table_Name
-- You could combine the capability of LIKE
-- to find columns with a particular search
-- pattern
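For example, the following query lists every Northwind column whose name contains the word Date:
SELECT TABLE_NAME, COLUMN_NAME
FROM Northwind.INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%Date%'
ORDER BY TABLE_NAME, COLUMN_NAME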
Closing Thoughts
My Web site (www.commongroundsolutions.net) lists several good books on SQL Server 2000 and Transact-SQL. In addition, my Web site also lists dozens of online sources for SQL Server, Transact-SQL, SQL-92, etc.