This installment of “The Baker’s Dozen” presents a variety of tips and techniques to become productive with data handling techniques using ADO.NET 2.0 in Visual Studio 2005.
ADO.NET 2.0 is faster than the first version of ADO.NET; in some instances, significantly faster. While many view ADO.NET 2.0 as more evolutionary than revolutionary, it provides many functions to give developers greater control over data access and data manipulation. It also leverages the new database capabilities in SQL Server 2005. In addition, ADO.NET 2.0 simplifies the creation of multiple-database solutions.
Beginning with the End in Mind
The ability to effectively work with data is essential in a line-of-business application. ADO.NET has a rich object model, but takes time and experience to master, especially when coming from other development environments.
While ADO.NET in Visual Studio .NET 2003 offered tremendous power, ADO.NET 2.0 in Visual Studio 2005 provides even more capabilities, along with major performance gains when inserting, filling, and merging large amounts of data. Here’s a quick rundown of what’s in store for this Baker’s Dozen installment.
- New capabilities for the DataTable that previously were only available for the DataSet
- Baker’s Dozen Spotlight: maximizing the use of typed DataSets
- The new base class Provider model and how to write generic code to leverage it
- Performance and optimization enhancements in ADO.NET 2.0
- Baker’s Dozen Potpourri: miscellaneous data-munging
- Data relations and look-ups using typed DataSets
- Asynchronous command execution
- New capabilities in the LOAD method to create a DataSet without a DataAdapter
- More control over the rowstate
- The new SqlBulkCopy class
- Multiple active result sets (MARS)
- Connection pooling
- Connection statistics
In addition to the Spotlight and Potpourri sections, I’ve added a new section to this series, the Baker’s Dozen Commentary, which appears at the end.
Tip 1: The DataTable Breaks Free
In Visual Studio .NET 2003, several methods were available only on the DataSet. One example is writing (and reading) XML data: a developer who wanted to write the contents of a single data table to an XML file had to create a DataSet, add the data table to it, and call the DataSet’s WriteXml method, just to gain access to that functionality. ADO.NET 2.0 now provides these methods directly on the data table. Table 1 lists them.
In addition, ADO.NET 2.0 allows a developer to serialize a data table in remoting scenarios. The new RemotingFormat property (covered in detail in Tip 4) applies to the data table.
Finally, ADO.NET 2.0 allows developers to create a new data table from any data view. The ToTable method contains an overload that lets the developer select a subset of columns for the new table. And if that isn’t enough, hold onto your hat: there’s more! ToTable contains an additional overload to filter distinct rows. (Previously, you had to write additional code to accomplish this.) Now a developer simply specifies an array of columns whose values must be unique.
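As a minimal sketch of both capabilities (the Customers table and its columns are hypothetical), the following writes a data table directly to XML and then uses ToTable to build a distinct-value table from a data view:

// Requires System.Data.
DataTable dtCustomers = new DataTable("Customers");
dtCustomers.Columns.Add("CustomerID", typeof(int));
dtCustomers.Columns.Add("State", typeof(string));
dtCustomers.Rows.Add(1, "TX");
dtCustomers.Rows.Add(2, "TX");

// No wrapper DataSet required in ADO.NET 2.0.
dtCustomers.WriteXml("Customers.xml", XmlWriteMode.WriteSchema);

// Distinct State values, projected into a new table.
DataView dvCustomers = new DataView(dtCustomers);
DataTable dtStates = dvCustomers.ToTable(true, "State");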
I’m very pleased with the data table enhancements in ADO.NET 2.0, though there are some further enhancements that I hope Microsoft will consider for the next version. Even with the new independence of the data table, some capabilities still require a DataSet. For instance, you can only establish a data relation between two data tables in the same DataSet, so situations where master codes exist in multiple transaction tables require some additional coding when relations need to be established.
Tip 2: Getting the Most Out of Typed Datasets
A typed DataSet is a subclass of the standard ADO.NET DataSet. It exposes the tables, rows, columns, etc., as strongly-typed properties. Visual Studio 2005 can check these strongly-typed names at compile time rather than at runtime. In this sense, strongly-typed DataSets are self-documenting: IntelliSense will display the names in the typed DataSet object hierarchy, and exposing column names as properties in the IDE can be very helpful in a large database application. Typed DataSets also avoid the boxing/unboxing that untyped DataSets require.
Typed DataSets simplify handling null values: for each nullable column, the strongly-typed data row exposes one generated method to check whether the column value is null and another to set it to null.
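As a minimal sketch, assume a hypothetical typed DataSet named OrdersDataSet with an OrderHeader table and a nullable ShipDate column; the generated row class then exposes IsShipDateNull and SetShipDateNull methods:

OrdersDataSet dsOrders = new OrdersDataSet();
// ... fill dsOrders from the database ...

OrdersDataSet.OrderHeaderRow oRow = dsOrders.OrderHeader[0];

if (oRow.IsShipDateNull())       // generated null-check method
    Console.WriteLine("Not shipped yet");
else
    Console.WriteLine(oRow.ShipDate);

oRow.SetShipDateNull();          // generated null-assignment method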
When used properly, typed DataSets are especially beneficial in a multi-developer environment as well as applications that make heavy use of result sets and reporting. Developers can define typed DataSets as separate project DLLs and then set references to them from other projects.
Listing 1 contains sample code to demonstrate the basic use of typed DataSets. In addition, developers can subclass a typed DataSet class to add validation code or other methods. Listing 2 includes a brief example that extends typed DataSets through an interface and a subclassed definition.
Recently I developed a reporting solution involving a stored procedure that returned ten tables. The scenario represented a one-to-many-to-many relationship with a large number of columns that I didn’t want to retype into the VS.NET typed DataSet designer. I wanted to take the structure of the stored procedure results, turn it into an XML schema, and use it to create a typed DataSet for the project (Listing 3).
A common complaint about typed DataSets concerns the default naming conventions in the typed DataSet class for DataTable/DataRow objects and methods/events. The generated names may not be consistent with preferred naming conventions. Fortunately, developers can use typed DataSet annotations to solve some of the common naming issues. Annotations allow developers to modify the names of elements in the typed DataSet without modifying the actual schema.
Annotations also allow you to specify a nullValue annotation, which instructs the typed DataSet class how to react when a column is DbNull. You can optionally tell the typed DataSet to return an empty string or a default replacement value. Shawn Wildermuth demonstrates typed DataSet annotations in an excellent online article (see the Recommended Reading sidebar).
Tip 3: Supporting Multiple Databases with Provider Factories
In ADO.NET 1.0, supporting multiple databases meant coding against interfaces for the ADO.NET objects (IDbConnection, IDbDataAdapter, IDbCommand, etc.), as well as writing a case statement to determine which connection object to use. ADO.NET 2.0 introduces a new provider factory class, which provides a much easier way to support multiple databases by loading the desired data provider for a specific connection string.
A developer can code against the new DbProviderFactory and DbProviderFactories classes and pass the desired provider’s invariant name at runtime. Table 2 displays the methods that a DbProviderFactory object exposes. Note that the names of the providers must appear in the machine configuration file. Listing 4 demonstrates the use of provider factories.
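Here is a minimal sketch of the pattern; the provider invariant name would normally come from a configuration file, and cConnectionString is assumed to be defined elsewhere:

// Requires System.Data and System.Data.Common.
DbProviderFactory oFactory =
    DbProviderFactories.GetFactory("System.Data.SqlClient");

using (DbConnection oConn = oFactory.CreateConnection())
{
    oConn.ConnectionString = cConnectionString;  // assumed defined
    oConn.Open();

    DbCommand oCmd = oConn.CreateCommand();
    oCmd.CommandText = "SELECT COUNT(*) FROM Customers";
    object oCount = oCmd.ExecuteScalar();
}

Note that the code never references a SqlClient class directly; swapping databases becomes a matter of changing the invariant name and connection string rather than the code.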
Tip 4: Performance Enhancements in ADO.NET 2.0
Microsoft has made two significant enhancements to ADO.NET that result in better performance. First, they rewrote the indexing engine, so many operations execute more quickly, especially as the size of a DataSet grows. Any performance gain will vary based on many application environment factors, but many applications should see improvements by a factor of two or greater. One test published on MSDN that inserted a million rows into a DataTable took 30 minutes in Visual Studio .NET 2003 and 45 seconds in Visual Studio 2005! Smaller amounts of data will see smaller gains; but in general, insert, update, delete, fill, and merge operations all execute more quickly in Visual Studio 2005.
Second, many developers know that passing and returning DataSets across physical boundaries carries substantial overhead and incurs performance penalties. ADO.NET 2.0 supports true binary serialization in remoting environments by providing a new RemotingFormat property, which a developer can set to SerializationFormat.Binary.
Depending on the size of the DataSet, binary serialization may yield a result as much as six times smaller than XML serialization, resulting in faster transfer speeds and lower bandwidth requirements. Note that this enhancement only applies in remoting environments, because Web services (by definition) pass XML.
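A minimal sketch, assuming a filled DataSet named dsOrders (serializing to a file here purely to illustrate; a remoting channel would normally perform this step for you):

// Requires System.IO and
// System.Runtime.Serialization.Formatters.Binary.
dsOrders.RemotingFormat = SerializationFormat.Binary;

BinaryFormatter oFormatter = new BinaryFormatter();
using (FileStream oStream =
       new FileStream("Orders.bin", FileMode.Create))
{
    oFormatter.Serialize(oStream, dsOrders);
}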
Tip 5: Baker’s Dozen Potpourri: Data Munging
Roll up your sleeves! It’s time for some elbow grease. Listing 5 demonstrates some miscellaneous tasks that developers new to ADO.NET often need to accomplish (a brief sketch of a few of them follows the list):
- Filtering rows that contain NULL values
- Filtering on columns in a parent/child relationship
- Defining primary keys and performing lookups
- Adding a new row
- Creating a filter and looping through the results
- Handling null data with a strongly-typed DataRow
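Here’s a brief sketch of a few of these tasks, using a hypothetical dtOrders table with OrderID, ShipDate, and OrderAmount columns:

// Filter rows where ShipDate is NULL.
DataRow[] aUnshipped = dtOrders.Select("ShipDate IS NULL");

// Define a primary key and perform a lookup.
dtOrders.PrimaryKey =
    new DataColumn[] { dtOrders.Columns["OrderID"] };
DataRow oFound = dtOrders.Rows.Find(1234);

// Create a filter and loop through the results.
DataView dvLarge = new DataView(dtOrders);
dvLarge.RowFilter = "OrderAmount > 1000";
foreach (DataRowView oRowView in dvLarge)
    Console.WriteLine(oRowView["OrderID"]);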
Tip 6: Data Relations and Look-Ups with Typed Datasets
Listing 1 references a typed DataSet with two tables. If the developer defines a relation between the two tables in the typed DataSet, the XML Schema Definition Tool will add methods to navigate through the data without the need to specify the name of the DataRelation.
In the specific example of a master table called DtOrderHeader and a child table called DtOrderDetail, the XML Schema Definition Tool automatically adds a GetDtOrderDetailRows method (similar to GetChildRows) to the strongly typed DataRow class for DtOrderHeader, and a GetDtOrderHeaderRow method (similar to GetParentRow) to the strongly typed DataRow class for DtOrderDetail. This demonstrates yet another advantage of strongly typed DataSets: the XML Schema Definition Tool has done some of the work for the developer.
Additionally, if a typed data table contains a primary key, the class exposes a Find method associated with the column name. A developer can perform a Find against a table with a primary key of OrderID by using the method FindByOrderID. If the primary key is a concatenation of two columns (CustomerID and OrderID), the developer would use FindByCustomerIDOrderID. (As mentioned in Tip 2, developers can use typed DataSet annotations for alternate naming conventions.)
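A minimal sketch of both the generated navigation methods and the generated Find method, reusing the hypothetical OrdersDataSet from Tip 2 (the ItemID column is also hypothetical):

foreach (OrdersDataSet.DtOrderHeaderRow oHeader
         in dsOrders.DtOrderHeader)
{
    // Generated navigation method; no DataRelation name needed.
    foreach (OrdersDataSet.DtOrderDetailRow oDetail
             in oHeader.GetDtOrderDetailRows())
        Console.WriteLine(oDetail.ItemID);
}

// Generated Find method against the OrderID primary key.
OrdersDataSet.DtOrderHeaderRow oFound =
    dsOrders.DtOrderHeader.FindByOrderID(1234);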
Tip 7: Asynchronous Command Execution
ADO.NET 2.0 introduces a new asynchronous API that allows a program to continue while specific database commands execute. Previously, a developer could simulate asynchronous processing using asynchronous delegates or the ThreadPool class. However, ADO.NET 2.0 provides true asynchronous functionality.
Three command methods in ADO.NET 2.0 provide asynchronous capabilities: ExecuteNonQuery, ExecuteReader, and ExecuteXmlReader. ADO.NET provides Begin/End pairs for each of these methods (BeginExecuteReader and EndExecuteReader, and so on):
// Kick off the query; control returns immediately.
IAsyncResult oAR = command.BeginExecuteReader();
// ... other processing ...
// Polling model: loop until the command completes.
while (!oAR.IsCompleted)
{
    // do something useful while waiting
}
SqlDataReader oReader = command.EndExecuteReader(oAR);
The developer also needs to know when the operation is complete. ADO.NET 2.0 provides three models to detect when an asynchronous command has completed:
- Callback model: the developer supplies a delegate to execute when the command completes. The Begin methods include an overload that accepts the callback as a parameter (a sketch follows the note below).
- Synchronization objects (wait model): the IAsyncResult object returned above exposes a WaitHandle property that code can block on.
- Polling: the IAsyncResult object also exposes an IsCompleted property that subsequent code can test for completion, as the snippet above shows.
Important note: You must add async=true to the connection string for asynchronous execution to work. If you don’t plan to use asynchronous commands, Microsoft recommends leaving this setting false (the default).
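Here’s a sketch of the callback model. The Northwind database and Orders table are assumptions; note the async=true keyword from the note above, and that the connection must stay open until the callback finishes with the reader:

string cConn = "Server=.;Database=Northwind;" +
               "Integrated Security=SSPI;async=true";

SqlConnection oConn = new SqlConnection(cConn);
SqlCommand oCmd = new SqlCommand("SELECT * FROM Orders", oConn);
oConn.Open();

// The delegate fires on a worker thread when the command completes.
oCmd.BeginExecuteReader(delegate(IAsyncResult oAR)
{
    SqlCommand oCallbackCmd = (SqlCommand)oAR.AsyncState;
    using (SqlDataReader oReader = oCallbackCmd.EndExecuteReader(oAR))
    {
        while (oReader.Read()) { /* process each row */ }
    }
    oConn.Close();  // safe to close once the reader is finished
}, oCmd);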
Tip 8: New Data LOAD Options
In ADO.NET 1.0, populating a DataSet from a back-end database meant either calling the Fill method of the DataAdapter or creating a DataReader and looping through it. ADO.NET 2.0 provides greater capabilities for creating a data table when all that exists is a data reader:
SqlDataReader oReader =
    oMyCommand.ExecuteReader();
DataTable DtTable = new DataTable();
DtTable.Load(oReader, LoadOption.OverwriteChanges);
// note that additional LoadOption enumerations
// exist, such as PreserveChanges and Upsert
ADO.NET 2.0 also provides a new DataTableReader class, which a developer can use to stream data from a DataSet or DataTable via the new CreateDataReader method.
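For instance, a brief sketch that streams the table just loaded above:

DataTableReader oTableReader = DtTable.CreateDataReader();
while (oTableReader.Read())
    Console.WriteLine(oTableReader[0]);  // first column of each row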
Tip 9: More Control Over RowState
ADO.NET 2.0 provides two new methods that give greater control over a DataRow’s RowState: SetAdded and SetModified. These methods change the RowState of an unchanged DataRow; in ADO.NET 1.0, RowState was a read-only property with no way to set it directly.
// Developer can use these to set the RowState
// of an unchanged row
oMyDataRow.SetAdded();
oMyDataRow.SetModified();
Tip 10: SqlBulkCopy
ADO.NET 2.0 provides the ability to bulk copy the contents of a DataReader, DataTable, array of DataRows, or a DataSet to a SQL Server database. Listing 6 demonstrates this welcome new capability for transferring data from one source to another.
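As a minimal sketch (the destination connection string, table name, and dtOrders source table are assumptions):

// Requires System.Data.SqlClient.
using (SqlBulkCopy oBulkCopy = new SqlBulkCopy(cDestConnectionString))
{
    oBulkCopy.DestinationTableName = "dbo.OrderArchive";
    oBulkCopy.WriteToServer(dtOrders);  // overloads also accept a
                                        // DataReader or DataRow array
}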
Tip 11: Multiple Active Result Sets (MARS)
SQL Server 2005 supports Multiple Active Result Sets (MARS), which permits multiple open DataReaders on a single connection. MARS has the potential to economize on database connections, because multiple stored procedures or SQL pass-through queries can execute over the same connection rather than each requiring its own. Listing 7 demonstrates how to use MARS.
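A minimal sketch (the Northwind database is an assumption); the key is the MultipleActiveResultSets keyword in the connection string:

string cConn = "Server=.;Database=Northwind;" +
               "Integrated Security=SSPI;" +
               "MultipleActiveResultSets=True";

using (SqlConnection oConn = new SqlConnection(cConn))
{
    oConn.Open();
    SqlCommand oCmd1 = new SqlCommand("SELECT * FROM Orders", oConn);
    SqlCommand oCmd2 = new SqlCommand("SELECT * FROM Customers", oConn);

    // Both readers are open on the same connection at the same time.
    using (SqlDataReader oReader1 = oCmd1.ExecuteReader())
    using (SqlDataReader oReader2 = oCmd2.ExecuteReader())
    {
        while (oReader1.Read() && oReader2.Read()) { /* ... */ }
    }
}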
Tip 12: Connection Pooling
ADO.NET 2.0 improves connection pooling through two new methods: ClearPool and ClearAllPools. Both the SQL Server and Oracle providers support them. ClearPool clears the pool associated with a specific connection, while ClearAllPools clears all of the provider’s connection pools.
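Both are static methods; a brief sketch, assuming an existing SqlConnection named oConn:

// Clear the pool associated with one connection...
SqlConnection.ClearPool(oConn);

// ...or clear every SQL Server connection pool in the application.
SqlConnection.ClearAllPools();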
Tip 13: Connection Statistics
A developer can retrieve Connection Statistics from the SqlConnection class by setting the connection object’s StatisticsEnabled property to true and calling the RetrieveStatistics method (Listing 8). Table 3 lists the statistics.
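A minimal sketch (cConnectionString is assumed to be defined elsewhere):

// Requires System.Collections and System.Data.SqlClient.
SqlConnection oConn = new SqlConnection(cConnectionString);
oConn.StatisticsEnabled = true;   // begin gathering statistics
oConn.Open();
// ... execute commands here ...

IDictionary oStats = oConn.RetrieveStatistics();
foreach (DictionaryEntry oEntry in oStats)
    Console.WriteLine("{0} = {1}", oEntry.Key, oEntry.Value);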
In Addition…
Coming up with thirteen tips is always difficult, especially when a particular technology has more than thirteen new capabilities! ADO.NET 2.0 also contains the following enhancements:
- Support for the new SQL DataTypes (user-defined types, the XML DataType, and large character support)
- Query notifications to refresh cached data when the original server data changes
- Promotable transactions
- Namespace-qualified tables
- Batch updates
- New Execute methods for the DataReader
The Baker’s Dozen Commentary
An old debate has resurfaced: DataSets versus custom entities, passing XML across boundaries versus passing custom objects, and so on. Here are my twin copper Lincolns:
I’m immediately skeptical of any argument that begins with “never use DataSets” or “always use DataSets.” To paraphrase the old expression, the form of an application follows its functionality. I’ll freely admit that I come down more on the DataSet side, primarily because the applications I’ve built have requirements that are well addressed by the built-in capabilities of ADO.NET. Had the applications been different, a more hardcore OOP approach to creating entities might have been preferable.
But for the ubiquitous line-of-business, data-driven applications, the DataSet approach provides many benefits. Tasks such as two-way data-binding, RowFilter/Sorting, XML integration, setting up relationships between data objects, preserving rowstate information, merging changes, and complex reporting operations are much easier with DataSets.
You may have to write additional code, sometimes a substantial amount of it, in the absence of DataSets. Those advocating this approach should ask themselves: is it worthwhile? And with the performance enhancements in ADO.NET 2.0, will custom code to accomplish these tasks execute as quickly as the native ADO.NET 2.0 capabilities? In my opinion, the ability to finish an application more quickly using the native capabilities of .NET is a fundamental argument that developers should seriously evaluate before opting for approaches that exclude DataSets.
Closing Thoughts
Over the last several months, I’ve received many positive e-mails and comments on the Baker’s Dozen series. I’m reminded of the famous quote, “No man is an island.” I want to thank folks like Bonnie Berent, Cathi Gero, and Mike Antonovich, who have provided valuable input and guidance over the years. My editors (Erik Ruthruff and Melanie Spiller) deserve a medal for dealing with me. I also want to give a special thanks to Rod Paddock and John Petersen who have given me incredible support and direction. You’re all first class folks.