Entity Framework Pitfalls: Null Semantics

Imagine you have a simple class:

public class MyEntity
{
    public int Id { get; set; }
    public string Name { get; set; }
}

You may want to query it:

string name = ...;

var entities = ctx
    .MyEntities
    .Where(x => x.Name == name)
    .ToList();

Entity Framework generates a query such as this:

SELECT [Extent1].[Id] AS Id, [Extent1].[Name] AS Name
FROM [dbo].[MyEntities] AS [Extent1]
WHERE ([Extent1].[Name] = @p__linq__0)
OR (([Extent1].[Name] IS NULL) AND (@p__linq__0 IS NULL))

Meaning: return all records from the MyEntities table if

  1. The Name column is identical to the name variable or
  2. Both the Name column and the name variable are NULL

Seems excessively complex, doesn’t it? Well, this is to account for the case where the name parameter is null. In that case, a simple COLUMN = VALUE comparison won’t work, because in SQL, NULL = NULL does not evaluate to TRUE (it yields UNKNOWN); the proper syntax for comparing with NULL is COLUMN IS NULL. By generating this kind of query, Entity Framework is playing it safe. Mind you, this only happens if you do not use the literal null; if you do, Entity Framework is smart enough to generate a simpler query:

var entities = ctx
    .MyEntities
    .Where(x => x.Name == null)
    .ToList();

Results in:

SELECT [Extent1].[Id] AS Id, [Extent1].[Name] AS Name
FROM [dbo].[MyEntities] AS [Extent1]
WHERE ([Extent1].[Name] IS NULL)

However, there is a configuration setting, UseDatabaseNullSemantics (the opposite of the old UseCSharpNullComparisonBehavior of ObjectContext), that can be used to control this behavior. If it is set to true (the default is false):

ctx.Configuration.UseDatabaseNullSemantics = true;

All SQL queries are then generated in the unsafe form:

SELECT [Extent1].[Id] AS Id, [Extent1].[Name] AS Name
FROM [dbo].[MyEntities] AS [Extent1]
WHERE ([Extent1].[Name] = @p__linq__0)

The result is that, even if you have a record whose Name column is NULL, it won’t be returned: remember that in SQL, by default, NULL is never equal to anything, not even another NULL!

Therefore, unless you are 100% sure that the parameters you use in LINQ queries will never be null, do not mess with UseDatabaseNullSemantics, even if the default behavior generates somewhat poorer SQL.

Of course, if you are using SQL Server, you can set ANSI_NULLS to OFF, and the problem goes away; in that case, NULL = NULL evaluates to TRUE. Keep in mind, however, that this setting is deprecated: in a future version of SQL Server, ANSI_NULLS will always be ON.
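A minimal T-SQL sketch of the difference, assuming the MyEntities table above has a row whose Name is NULL and that ANSI_NULLS is at its default of ON:

```sql
DECLARE @name NVARCHAR(50) = NULL

-- unsafe form: matches nothing, because [Name] = NULL is never TRUE
SELECT [Id], [Name] FROM [dbo].[MyEntities] WHERE [Name] = @name

-- safe form, equivalent to what Entity Framework generates by default:
-- the row with the NULL Name is returned
SELECT [Id], [Name] FROM [dbo].[MyEntities]
WHERE ([Name] = @name) OR (([Name] IS NULL) AND (@name IS NULL))
```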

Data Platform Airlift 2015

Today I had the great pleasure to attend the Data Platform Airlift 2015 event, at Microsoft Portugal! Moreover, I presented a session together with Luís Calado on what’s new in the Azure world! My part was specifically about DocumentDB, one of Microsoft’s offers in the field of NoSQL.

Videos and materials for all presentations will be made available on Channel 9 in the coming days but, in the meantime, you can find my slide deck here.

As always, looking forward to your feedback!

Entity Framework Pitfalls: Using Stored Procedures for Basic Operations

Entity Framework, as of version 6, allows the usage of stored procedures for its CUD (Create, Update and Delete) operations. We just need to override the DbContext’s OnModelCreating method and call MapToStoredProcedures.

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<Master>().MapToStoredProcedures();

    //rest goes here

    base.OnModelCreating(modelBuilder);
}

The problem is, this is all or nothing: you cannot use a stored procedure for just one of the operations, you need to use stored procedures for all operations. And, no, it is not possible (yet) to use custom SQL, just stored procedures.
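What you can do is customize the names of the generated procedures and their parameters. A sketch, assuming the Master entity above (the procedure names here are my own, not the Master_Insert/Master_Update/Master_Delete defaults):

```csharp
using System.Data.Entity;

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<Master>().MapToStoredProcedures(s => s
        // override the default procedure names for each CUD operation
        .Insert(i => i.HasName("sp_insert_master"))
        .Update(u => u.HasName("sp_update_master"))
        .Delete(d => d.HasName("sp_delete_master")));

    base.OnModelCreating(modelBuilder);
}
```

You still have to supply all three procedures, but at least their names (and parameter names, via the same API) are under your control.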

Profiling SQL Server in .NET

The main tool for profiling SQL Server is, of course, SQL Server Profiler. You can, however, achieve more or less the same results from .NET; I’ll show you how.

First, you need to have the assemblies Microsoft.SqlServer.ConnectionInfo and Microsoft.SqlServer.ConnectionInfoExtended at hand; they are normally found in C:\Program Files\Microsoft SQL Server\120\SDK\Assemblies (for SQL Server 2014), provided you installed the Shared Management Objects package from the SQL Server Feature Pack (again, this link is for SQL Server 2014, but there are feature packs for each version of SQL Server).

Then, write something like this:

var connInfo = new SqlConnectionInfo(@".\SQLEXPRESS");
//connInfo.UserName = "";
//connInfo.Password = "";

using (var reader = new TraceServer())
{
    reader.InitializeAsReader(connInfo, @"C:\Program Files (x86)\Microsoft SQL Server\120\Tools\Profiler\Templates\Microsoft SQL Server\120\Standard.tdf");

    while (reader.Read() == true)
    {
        for (var i = 1; i < reader.FieldCount; ++i)
        {
            Console.Write(reader.GetName(i));
            Console.Write(": ");
            Console.WriteLine(reader.GetValue(i));
        }

        Console.WriteLine();
    }
}

And this is it. You will get a new entry for each event that happens on the server, such as SQL being executed. Among others, you get the following information:

  • ApplicationName: the name of the application that is executing the SQL, as supplied in its connection string;
  • NTUserName: the Windows user name of the connected user;
  • LoginName: the SQL Server login name of the connected user;
  • Duration: the request duration;
  • SPID: the executing Server Process ID;
  • StartTime: the start timestamp of the request;
  • EndTime: the end timestamp;
  • TextData: textual data, such as the SQL being executed.

One thing to keep in mind is that you need to pass a valid SQL Server Profiler template to the InitializeAsReader method. If you prefer, you can add one of the existing templates to your Visual Studio project, set it to copy to the output folder, and reference it locally; otherwise, you will need to know the full path where it is installed (C:\Program Files (x86)\Microsoft SQL Server\120\Tools\Profiler\Templates\Microsoft SQL Server\120 is for SQL Server 2014).

Entity Framework Pitfalls: Command Interceptors and Identity Keys

The IDbCommandInterceptor (sorry, no official documentation) interface was introduced in Entity Framework 6 as part of the new interception and logging API, and allows the interception of the SQL and its parameters that are sent to the database as the result of DbContext CRUD operations.

It offers two methods for each of the basic ADO.NET operations, those defined in DbCommand, one called before the operation is executed, and the other called afterwards:

  • ExecuteNonQuery (UPDATEs, INSERTs, DELETEs) –> NonQueryExecuting, NonQueryExecuted;
  • ExecuteScalar (SELECTs returning a single value) –> ScalarExecuting, ScalarExecuted;
  • ExecuteReader (SELECTs returning rows) –> ReaderExecuting, ReaderExecuted.

As usual, the methods ending with “ing” are executed before and those ending with “ed” are executed afterwards, always synchronously.

One might naively assume that INSERTs would always trigger a NonQueryExecuting/NonQueryExecuted call, and indeed it is so, unless we have an IDENTITY primary key, in which case, Entity Framework will instead call ReaderExecuting/ReaderExecuted. It’s easy to understand why: when we use an IDENTITY, we need to retrieve the generated value immediately after the INSERT, hence Entity Framework will generate code like:

INSERT INTO [dbo].[SomeTable] (...)
VALUES (...)
SELECT [Id]
FROM [dbo].[SomeTable]
WHERE @@ROWCOUNT > 0 AND [Id] = SCOPE_IDENTITY()

The INSERT and SELECT are combined in the same command, which justifies the choice of ReaderExecuting/ReaderExecuted. Because in Entity Framework we always use IDENTITY – don’t we? –, it’s ReaderExecuting/ReaderExecuted that you should be implementing if you want to change the INSERT SQL or its parameters.
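As a sketch of what that looks like in practice (the class name and the logging are my own; the members are those of EF6’s IDbCommandInterceptor), an interceptor that wants to see INSERTs against IDENTITY tables must implement ReaderExecuting:

```csharp
using System;
using System.Data.Common;
using System.Data.Entity.Infrastructure.Interception;

public class InsertLoggingInterceptor : IDbCommandInterceptor
{
    public void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        // INSERTs into tables with IDENTITY keys come through here,
        // because EF appends a SELECT to fetch the generated key
        if (command.CommandText.StartsWith("INSERT", StringComparison.OrdinalIgnoreCase))
        {
            Console.WriteLine(command.CommandText);
        }
    }

    public void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext) { }
    public void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext) { }
    public void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext) { }
    public void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext) { }
    public void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext) { }
}
```

Register it once, for example at application startup, with DbInterception.Add(new InsertLoggingInterceptor()).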

SQL Server Auditing

Probably the best mechanism for auditing changes to tables in SQL Server is Change Data Capture, which I already covered here. Unfortunately, it only works with the Enterprise editions of SQL Server, something that not everyone has access to.

I once had to implement a similar solution that should work on any edition of SQL Server, which is what I am going to talk about. Its only requirement was that for any changes made to target tables, their old values would be stored in a shared history table, with the indication of the timestamp and user who made the change.

Let’s imagine we have some table, anything will do:

CREATE TABLE dbo.audit_target
(
    id INT IDENTITY PRIMARY KEY,
    a NVARCHAR(50),
    b INT
)

As you can see, it’s just a regular table with a single-column primary key and a couple of other columns; any number and type of columns will do.

Now let’s create a table for holding the change history:

CREATE TABLE dbo.history
(
    id INT IDENTITY PRIMARY KEY,
    [table_name] NVARCHAR(30) NOT NULL,
    table_id INT NOT NULL,
    [changes] NVARCHAR(MAX),
    [timestamp] SMALLDATETIME NOT NULL DEFAULT GETDATE(),
    [user] NVARCHAR(30) NOT NULL
)

The table_name column will hold the name of the table whose changes we are tracking; table_id, the id of the updated row; changes, an XML document containing all the old values; timestamp, the date and time the change occurred; and user, the person who made the change (more on this later).

OK, next we need a stored procedure to populate the changes table:

CREATE PROCEDURE dbo.generic_audit_procedure
(
    @procid INT,
    @xml XML
)
AS
BEGIN
    SET NOCOUNT ON

    -- check if there are changes, in which case, xml will not be empty
    IF ((@xml IS NULL) OR (CAST(@xml AS NVARCHAR(MAX)) = N''))
    BEGIN
        RETURN 0
    END

    -- get the current table id for the current process (trigger)
    DECLARE @tablename NVARCHAR(30)

    SELECT @tablename = OBJECT_NAME(t.id)
    FROM sys.sysobjects p
    INNER JOIN sys.sysobjects t
    ON p.parent_obj = t.id
    WHERE p.id = @procid

    IF ((@tablename IS NULL) OR (@tablename = N''))
    BEGIN
        RAISERROR(N'Could not get table name', 16, 16)
        ROLLBACK TRAN
        RETURN
    END

    -- get the primary key column for the current table
    DECLARE @idname NVARCHAR(30)

    SELECT @idname = name
    FROM sys.syscolumns
    WHERE id = OBJECT_ID(@tablename)
    AND colstat = 1

    IF ((@idname IS NULL) OR (@idname = ''))
    BEGIN
        RAISERROR(N'Could not get id column for the current table', 16, 16)
        ROLLBACK TRAN
        RETURN
    END

    -- get the current user from the context
    DECLARE @username NVARCHAR(30)

    SELECT @username = CONVERT(NVARCHAR(30), CONTEXT_INFO())

    IF ((@username IS NULL) OR (@username = ''))
    BEGIN
        RAISERROR(N'Could not get current username', 16, 16)
        ROLLBACK TRAN
        RETURN
    END

    DECLARE @tmp TABLE([key] NVARCHAR(MAX), [value] XML)

    INSERT INTO @tmp SELECT q.[key], q.[value]
    FROM
    (
        SELECT T.N.value(N'(*[local-name(.)=sql:variable("@idname")])[1]', N'INT') AS [key], T.N.query(N'*') AS [value]
        FROM @xml.nodes('/row') AS T(N)
    ) q

    DECLARE @ROWS INT
    SET @ROWS = 0

    -- loop all modified records
    WHILE (1 = 1)
    BEGIN
        DECLARE @id NVARCHAR(MAX)
        DECLARE @value XML

        -- pick the first record
        SELECT TOP 1 @id = [key], @value = [value]
        FROM @tmp

        IF (@@ROWCOUNT = 0)
        BEGIN
            BREAK
        END

        -- insert into the shared table
        INSERT INTO dbo.history (table_name, table_id, [user], [changes])
        VALUES (@tablename, @id, @username, CAST(@value AS NVARCHAR(MAX)))

        -- increment the updated rows
        SET @ROWS = @ROWS + @@ROWCOUNT

        -- remove the processed record
        DELETE FROM @tmp
        WHERE [key] = @id
    END

    RETURN @ROWS
END

The generic_audit_procedure will get the current table from the current process id, as passed in the @procid parameter, the current user from CONTEXT_INFO (again, more on this later), and the changes from the @xml parameter, and will insert everything into the history table.

And the final part, a trigger to “glue” together the changes made to a table to the stored procedure:

CREATE TRIGGER dbo.generic_audit_trigger
ON dbo.audit_target -- replace this with other target tables' names
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON

    -- get the old values of all changed rows
    -- this needs to be done in the trigger
    DECLARE @xml XML
    SET @xml = (SELECT * FROM deleted FOR XML PATH('row'))

    -- call the generic stored procedure
    EXEC dbo.generic_audit_procedure @@PROCID, @xml
END

This trigger runs after changes are accepted (AFTER UPDATE), casts the old values of the changed rows (the DELETED pseudo-table, which holds the pre-update values; INSERTED would give you the new ones) as XML, and then invokes the generic_audit_procedure stored procedure with it as an argument, together with the current process id (@@PROCID). Of course, you need to create such a trigger for every table that you wish to audit.

The only thing left is to provide a way for SQL Server to know the name of the current user. We do that by executing the following statements (in my case, this was automated by the data access library):

DECLARE @username VARBINARY(128) = CAST(N'rjperes' AS VARBINARY(128))
SET CONTEXT_INFO @username
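From .NET, a sketch of how the data access library might automate this with plain ADO.NET (the connection string is a placeholder):

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Text;

using (var connection = new SqlConnection(@"Data Source=.\SQLEXPRESS;Initial Catalog=MyDb;Integrated Security=SSPI"))
{
    connection.Open();

    using (var command = connection.CreateCommand())
    {
        command.CommandText = "SET CONTEXT_INFO @username";
        // bind the user name as UTF-16 bytes, matching the NVARCHAR cast above
        command.Parameters.Add(new SqlParameter("@username", SqlDbType.VarBinary, 128)
        {
            Value = Encoding.Unicode.GetBytes("rjperes")
        });
        command.ExecuteNonQuery();
    }

    // CONTEXT_INFO is per session: subsequent commands on this open
    // connection (and the triggers they fire) will see the value just set
}
```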

So, whenever a change is made to a table, it will appear like this:

[image: a row of the history table, with the old values recorded as XML in the changes column]

You can see that the changes column will contain an XML node with all the old values that were changed by the update made to the target table.

In a nutshell:

  1. Add a trigger to each target table that calls a stored procedure upon change;
  2. Save the application user in CONTEXT_INFO;
  3. Modify the data in a table;
  4. The old values get recorded in the auditing table.

There are some limitations with this implementation:

  • Only single-column primary key target tables are supported;
  • All of the “old” values are recorded, not just those that changed;
  • It only keeps the “old” values, not the new ones, although this is by design and easy to change;
  • Binary columns could use a better treatment;
  • Doesn’t keep track of deleted or inserted records;
  • Doesn’t track schema changes (add/drop/modify columns, etc);
  • It is necessary to set CONTEXT_INFO explicitly with the application user that we want to record;
  • Although I didn’t implement it, it would be easy to add a version column to the history table, to keep track of how many changes were made to each target table.

However, it suited what I needed perfectly; maybe it will work for you too!

Custom Entity Framework Code First Convention for Discriminator Values

Since version 6, Entity Framework Code First allows the injection of custom conventions. These conventions define rules that will be applied by default to all mapped entities and properties, unless explicitly changed.

The conventions API includes a couple of interfaces: IConvention (marker only, should always be included), IConceptualModelConvention<T> (for the conceptual space of the model) and IStoreModelConvention<T> (for the store, or physical, side of the model). Also worthy of mention are the convenience class Convention, which allows access to all mapped types and properties and doesn’t override any of the other conventions, and TypeAttributeConfigurationConvention<T>, for tying a convention to a custom attribute. Some of the included attributes leverage these interfaces to configure some aspects of the mappings at design time; other configuration needs to be done explicitly in an override of OnModelCreating.

Entity Framework permits using a column for distinguishing between different types, when the Table Per Class Hierarchy / Single Table Inheritance pattern (please see Entity Framework Code First Inheritance for more information) is used for mapping a hierarchy of classes to a single table, as part of “soft delete” solutions, or, less known, for differentiating between multiple tenants. This column is called a discriminator.

In order to configure an entity to use a discriminator column, there is no out of the box attribute, so we must resort to code configuration:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<MyMultiTenantEntity>().Map(m => m.Requires("tenant_id").HasValue("first_tenant"));

    base.OnModelCreating(modelBuilder);
}

Because there’s really no need to keep repeating this code, let’s implement an attribute for indicating a discriminator column in an entity:

[Serializable]
[AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = true)]
public sealed class DiscriminatorAttribute : Attribute
{
    public DiscriminatorAttribute(String columnName, Object discriminatorValue)
    {
        this.ColumnName = columnName;
        this.DiscriminatorValue = discriminatorValue;
    }

    public String ColumnName { get; private set; }

    public Object DiscriminatorValue { get; private set; }

    public override Boolean Equals(Object obj)
    {
        var other = obj as DiscriminatorAttribute;

        if (other == null)
        {
            return (false);
        }

        return ((this.ColumnName == other.ColumnName) && (Object.Equals(this.DiscriminatorValue, other.DiscriminatorValue) == true));
    }

    public override Int32 GetHashCode()
    {
        return (String.Concat(this.ColumnName, ":", this.DiscriminatorValue).GetHashCode());
    }
}

As you can see, the DiscriminatorAttribute attribute can only be applied to a class, at most once. This makes sense, because most likely you will only have a single discriminator column per entity:

[Discriminator("tenant_id", "first_tenant")]
public class MyMultiTenantEntity
{
    //...
}

You need to specify both a column name and a discriminator value, which can be of any type, usually a string or an integer.

Now, let’s write a custom convention that knows how to handle our custom attribute and perform the mapping:

WARNING! DYNAMICS AND REFLECTION AHEAD!

PROCEED WITH CAUTION!

public sealed class DiscriminatorConvention : TypeAttributeConfigurationConvention<DiscriminatorAttribute>
{
    private static readonly MethodInfo entityMethod = typeof(DbModelBuilder).GetMethod("Entity");
    private static readonly MethodInfo hasValueMethod = typeof(ValueConditionConfiguration).GetMethods().Single(m => (m.Name == "HasValue") && (m.IsGenericMethod == false));

    private readonly DbModelBuilder modelBuilder;
    private readonly ISet<Type> types = new HashSet<Type>();

    public DiscriminatorConvention(DbModelBuilder modelBuilder)
    {
        this.modelBuilder = modelBuilder;
    }

    public override void Apply(ConventionTypeConfiguration configuration, DiscriminatorAttribute attribute)
    {
        if (this.types.Contains(configuration.ClrType) == true)
        {
            //if the type has already been processed, bail out
            return;
        }

        //add the type to the list of processed types
        this.types.Add(configuration.ClrType);

        dynamic entity = entityMethod.MakeGenericMethod(configuration.ClrType).Invoke(modelBuilder, null);

        Action<dynamic> action = arg =>
        {
            var valueConditionConfiguration = arg.Requires(attribute.ColumnName);
            hasValueMethod.Invoke(valueConditionConfiguration, new Object[] { attribute.DiscriminatorValue });
        };

        entity.Map(action);
    }
}

This class uses a bit of dynamics and reflection because the entity types are not known at compile time, and hence we cannot use generics directly. Because the Apply method will be called multiple times, we need to keep track of which entities have already been processed by this convention, so as to avoid reprocessing them. We need to pass it the instance of DbModelBuilder, because otherwise our custom convention would have no way to apply the mapping, but I think it is a reasonable trade-off.

Et voilà! In order to make use of it, we need to register the convention in OnModelCreating:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Conventions.Add(new DiscriminatorConvention(modelBuilder));

    base.OnModelCreating(modelBuilder);
}

And that’s it! Happy conventions!

Entity Framework Pitfalls: Mapping Discriminator Columns

When you use a discriminator column, that is, a column that holds a value telling Entity Framework what type a row refers to, or what restriction it shall apply when querying the entity, you cannot map this column as a property in your entity.

For example, imagine you want to use “soft deletes”, that is, use a database column to represent the “deleted” state of a record instead of actually physically deleting it from the table; you might have an IS_DELETED column of some integer type, holding 1 if the record is deleted and 0 otherwise. In that case, you would build a mapping like this in OnModelCreating:

modelBuilder.Entity<MySoftDeletableEntity>().Map<MySoftDeletableEntity>(m => m.Requires("is_deleted").HasValue(0));

However, you won’t be able to have a corresponding IsDeleted property in the MySoftDeletableEntity, because Entity Framework will throw an exception complaining about it. It is somewhat sad, because it might be useful to refer to it, but that’s the way it is.