Simplifying ADO.NET Data Access, Part 2

Published on Author Michael

Part 2 of a series on simplifying ADO.NET code.

In the previous article we talked about the basics of ADO.NET. There are times where ADO.NET is still the correct choice for data access. Unfortunately it was written back in the original days of .NET and hasn’t been updated with newer features like generics. In this article we will update the ADO.NET types to make them easier to use. This is a stepping stone to simplifying the overall data access process.

A Word On Generics

One of the common issues with ADO.NET (and other “legacy” types) is that they do not play well with LINQ or the var keyword. The issue is that, prior to generics, if you wanted a collection you derived from the generic CollectionBase type and exposed custom, typed equivalents. Since the base collection type works with object LINQ and var cannot infer the type making them of limited use. LINQ does have the OfType<T> method that allows you to safely convert the older syntax to the newer one. The only downside is the extra typing. We’ll be using this method (as an extension) throughout this article to make it cleaner. You can apply this same approach to any other places this occurs in the framework.

DataSet

The core information that a DataSet stores is the list of tables. We will add a few extensions to reduce the typing involved to get information about the tables and to play nice with LINQ. Here’s how you might do it today.

var users = new List();
using (var conn = new SqlConnection(m_connectionString))
{
    using (var cmd = new SqlCommand("SELECT Id, Name FROM Users", conn))
    {
        conn.Open();

        var ds = new DataSet();

        var da = new SqlDataAdapter(cmd);                    
        da.Fill(ds);

        if (ds.Tables.Count > 0)
        {
            foreach (DataRow row in ds.Tables[0].Rows)                        
            {
                users.Add(new User(Convert.ToInt32(row[0])) { Name = Convert.ToString(row[1]) });                            
            };
        };
    };
};

return users;

Here’s the updated version.

if (ds.TableCount() > 0)
{
    foreach (var row in ds.Tables[0].AsEnumerable())                        
    {
        users.Add(new User(Convert.ToInt32(row[0])) { Name = Convert.ToString(row[1]) });                            
    };
};

Here’s the extension methods.

public static class DataSetExtensions
{        
    public static bool HasTables ( this DataSet source )
    {
        return source.Tables.Count > 0;
    }

    public static int TableCount ( this DataSet source )
    {
        return source.Tables.Count;
    }

    public static IEnumerable TablesAsEnumerable ( this DataSet source )
    {
        return source.Tables.OfType();
    }
}

DataTable

For a DataTable the data revolves around the columns and the rows. For columns the biggest frustration is adding a new column. In order to add a new column you can use the Add method. This is fine for one or two columns but quickly gets annoying when you have lots of columns.

var dt = new DataTable();
dt.Columns.Add("Column1", typeof(string));
dt.Columns.Add("Column2", typeof(int));
dt.Columns.Add("Column3", typeof(double));

A simple extension method can be added that allows you to chain together the creation of columns. (You can extend this to include other overloads as well).

public static DataTable AddColumn ( this DataTable source, string name, Type type )
{
    var column = new DataColumn(name, type);
    source.Columns.Add(column);

    return source;
}

And the updated sample code.

var dt = new DataTable()
               .AddColumn("Column1", typeof(string))
               .AddColumn("Column2", typeof(int))
               .AddColumn("Column3", typeof(double));

Turning to the rows themselves we have similar extensibility points as DataSet had.

public static bool HasRows ( this DataTable source )
{
    return source.Rows.Count > 0;
}

public static int RowCount ( this DataTable source )
{
    return source.Rows.Count;
}

public static IEnumerable<DataRow> RowsAsEnumerable ( this DataTable source )
{
    return source.Rows.OfType<DataRow>();
}

The final extensibility point is adding a new row. For whatever reason adding a new row to a table is a 2 step process: create the row, add the row. Here’s how it generally looks.

var row = dt.NewRow();
row["column1"] = 10;
row["column2"] = "Name";
dt.Rows.Add(dt);

With a simple extension method or two we can combine the two steps into one as this is the typical case. The first method adds a new, empty row. The second method adds a row along with its values and returns back the table so chaining can occur.

public static DataRow AddNewRow ( this DataTable source )
{
    var row = source.NewRow();
    source.Rows.Add(row);

    return row;
}

public static DataTable InsertRow ( this DataTable source, params object[] values )
{
    var row = source.NewRow();

    if (values != null)
        for (int index = 0; index < values.Length; ++index)
            row[index] = values[index];

    source.Rows.Add(row);
    return source;
}

And the updated sample.

var row = dt.AddNewRow();
row["column1"] = 10;
row["column2"] = "Name";
            
//Or
dt.InsertRow(10, "Name")
  .InsertRow(11, "Name 2");

DataRow

A DataRow is simply an array of values (one for each column). Even though it contains just a value there is a lot of extensibility to the type. The values are typed as object but can be any primitive type (and a few others). The key for working with a row is getting the typed value easily.

foreach (var row in ds.Tables[0].AsEnumerable())                        
{
    users.Add(new User(row.Field<int>("Id")) { Name = row.Field<string>("Name") });       
};

The type already provides support for getting column values by type but it will fail if the column is null. A database null (represented by DBNull) is quite a bit different from a .NET null. Sometimes a null column matters and sometimes we just want it to default. When it matters use the standard method. When null doesn’t matter then use some extension methods to default the column to the type’s default. There is an overload that allows you to specify the default value to use instead of the type’s default.

Here’s one for Int32.

public static int GetInt32ValueOrDefault ( this DataRow source, string columnName )
{
    return GetInt32ValueOrDefault(source, columnName, 0);
}

public static int GetInt32ValueOrDefault ( this DataRow source, 
                                 string columnName, int defaultValue )
{
    VerifyColumnExists(source, columnName);

    return GetValueCore(source, columnName, Int32.TryParse, defaultValue);
}

private static T GetValueCore<T>( DataRow source, 
                            string columnName, TryParseDelegate<T> parser, T defaultValue 
                            ) where T : struct
{
    T value;
    TryGetValueOrDefault<T>(source, columnName, parser, out value, defaultValue);
    return value;
}

private delegate bool TryParseDelegate<T>( string value, out T result );

private static bool TryGetValue<T>( DataRow source, string columnName, 
                             TryParseDelegate<T> parser, 
                             out Nullable<T> result ) where T : struct
{
    string value;
    if (TryGetStringValue(source, columnName, out value))
    {
        if (String.IsNullOrEmpty(value))
        {
            result = null;
            return true;
        };

        T actualResult;
        if (parser(value, out actualResult))
        {
            result = actualResult;
            return true;
        };
    };

    result = null;
    return false;
}

private static bool TryGetValueOrDefault<T>( DataRow source, string columnName, 
                                TryParseDelegate<T> parser, 
                                out T result, 
                                T defaultValue = default(T) ) where T : struct
{
    Nullable<T> actualResult;
    if (TryGetValue(source, columnName, parser, out actualResult))
    {
        result = actualResult.HasValue ? actualResult.Value : defaultValue;
        return true;
    };

    result = defaultValue;
    return false;
}

private static bool CheckColumnExists ( DataRow source, string columnName )
{
    return source.Table.Columns.Contains(columnName);
}

private static void VerifyColumnExists ( DataRow source, string columnName )
{
    if (!CheckColumnExists(source, columnName))
        throw new ArgumentException("Column does not exist.", "columnName");
}

A couple of notes about the implementation. For strings an empty string is assumed to be the default value. It is generally not a good idea to return a null string unless “lack of string” has meaning, which it generally doesn’t. In most cases conversion occurs by first translating to string and then using the standard parse method to convert to the final type. For boolean a special method is used that also allows you to include some additional (non-standard) boolean mappings like “yes” and “no”.

The methods include the type as part of the method name. This results in more methods but ensures that invalid types are not allowed. If you desire you could create a generic version of the method. You reduce the surface area of the methods but introduce a potential runtime error if a bad type is specified.

The updated sample code.

foreach (var row in ds.Tables[0].AsEnumerable())                        
{
    users.Add(new User(row.GetInt32ValueOrDefault("Id")) {
                          Name = row.GetStringValueOrDefault("Name") });                        
};

For symmetry, there are also Try methods for each of the types in the download.

One final extension method allows you to determine if a column exists at all. Sometimes you have to write code that is generic enough that the columns returned may be inconsistent. This method can be used to determine if a column exists before you try to retrieve the value (and throw an exception).

public static bool ColumnExists ( this DataRow source, string columnName )
{
    if (columnName == null)
        throw new ArgumentNullException("columnName");
    if (String.IsNullOrEmpty(columnName))
        throw new ArgumentException("Column name cannot be empty.", "columnName");

    return CheckColumnExists(source, columnName);
}

DataReader

In a lot of ways DataReader (or more correctly DbDataReader) can be extended just like DataRow. As such there are corresponding extensions to read values by type and for handling null values. The only real difference is that DataReader only exposes strongly typed getters by ordinal. Since ordinals (while faster) are generally not a good idea we can add extension methods to get typed values by name as well. Here’s how it might look in code.

using (var dr = cmd.ExecuteReader())
{
    while (dr.Read())
    {
        roles.Add(new Role(dr.GetInt32OrDefault("Id")) { 
                              Name = dr.GetStringOrDefault("Name") });                            
    };
};

There are a couple of implementation details to be aware of. To retrieve the value by name a lookup is done to get the ordinal value. The standard method is then used to retrieve the value by ordinal. Because of this the actual type of the value must match the method being called (Int32 for an int column). If this isn’t done then an exception will occur. The numeric types try to allow for mismatch by using coercing. The coercion type is included in the download.

There are other areas for extensibility as well. Sometimes it is useful to get the column names from the reader. This can be done by using a simple for loop but converting it to an extension method is simple.

public static string[] GetNames ( this IDataReader source )
{
    if (source.IsClosed)
        throw new InvalidOperationException("The reader is closed.");

    int count = source.FieldCount;
    string[] names = new string[count];
    for (int nIdx = 0; nIdx < count; ++nIdx)
        names[nIdx] = source.GetName(nIdx);

    return names;
}

Most of the remaining extension methods are simply copies of the standard methods but with support for named columns rather than ordinals.

Enhancements

This code has been in use for a while so I’m reluctant to change it but there are some areas that could stand improving.

  • DataRowExtensions and DataReaderExtensions convert the data in different ways. The result is that a value may convert with one method but not the other. This will generally only occur when the underlying type does not match the retrieved type. Ideally these should be the same and I’d lean toward the DataRowExtensions approach which is more resilient to differing types.
  • DataRowExtensions and DataReaderExtensions both provide methods for the various types. Switch this to use a generic method for getting values. Under the hood the logic will remain the same but the surface area will be reduced. This will have the disadvantage of bad types not being detected until runtime.

Next Time

Now that we’ve provided some extensions to ADO.NET it’s time to switch gears and get started replacing all the boilerplate code for retrieving data.

One Response to Simplifying ADO.NET Data Access, Part 2