Entity Framework Pitfalls: Skip/Take Position Matters

In LINQ, you normally do paging through the Skip and Take operators. These are the LINQ counterparts to whatever the relational database uses for pagination – ROW_NUMBER(), LIMIT…OFFSET, TOP, ROWNUM, etc.

There are some gotchas to it:

  • When you do paging, Entity Framework demands that you add ordering to your query. This makes sense as without an explicit ORDER BY clause, search results may come in an unexpected order;
  • Skip and Take yield different results if added to different places in the LINQ expression.

Consider these two LINQ queries:

ctx
.MyEntities
.Where(x => x.Name == "")
.OrderBy(x => x.Id)
.Skip(10)
.Take(5)
.ToList();

And:

ctx
.MyEntities
.OrderBy(x => x.Id)
.Skip(10)
.Take(5)
.Where(x => x.Name == "")
.ToList();

The first one will produce this SQL:

SELECT TOP (5)
[Filter1].[Id] AS [Id],
[Filter1].[Name] AS [Name],
[Filter1].[Date] AS [Date]
FROM ( SELECT [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], [Extent1].
[Date] AS [Date], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number
]
FROM [dbo].[MyEntities] AS [Extent1]
WHERE N'' = [Extent1].[Name]
) AS [Filter1]
WHERE [Filter1].[row_number] > 10
ORDER BY [Filter1].[Id] ASC

Whereas the second will produce this one:

SELECT
[Limit1].[Id] AS [Id],
[Limit1].[Name] AS [Name],
[Limit1].[Date] AS [Date]
FROM ( SELECT TOP (5) [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], [Extent1].[Date] AS [Date]
FROM ( SELECT [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], [Extent1].[Date] AS [Date], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[MyEntities] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 10
ORDER BY [Extent1].[Id] ASC
) AS [Limit1]
WHERE N'' = [Limit1].[Name]
ORDER BY [Limit1].[Id] ASC

Basically, the TOP(5) expression is placed in the wrong place. So, in LINQ, it’s not just the presence of a Skip and Take operator, it’s where you place it that matters!

Entity Framework Pitfalls: Null Semantics

Imagine you have a simple class:

public class MyEntity
{
public int Id { get; set; }
public string Name { get; set; }
}

You may want to query it:

string name = ...;

var entities = ctx
.MyEntities
.Where(x => x.Name == name)
.ToList();

Entity Framework generates a query such as this:

SELECT [Extent1].[Id] AS Id, [Extent1].[Name] AS Name
FROM [dbo].[MyEntities] AS [Extent1]
WHERE ([Extent1].[Name] = @p__linq__0)
OR (([Extent1].[Name] IS NULL) AND (@p__linq__0 IS NULL))

Meaning: return all records from the MyEntities table if

  1. The Name column is identical to the name variable or
  2. Both the Name column and the name variable are NULL

Seems excessively complex, doesn’t it? Well, this is to account for the case where the name parameter is null. In  that case, a simple COLUMN = VALUE comparison won’t work, because in SQL, NULL = NULL returns FALSE! The syntax for comparing with NULL is COLUMN IS NULL. By generating this kind of query, Entity Framework is playing safe. Mind you, this only happens if you do not use the literal null, if you do, Entity Framework is smart enough to generate a simpler query:

var entities = ctx
.MyEntities
.Where(x => x.Name == null)
.ToList();

Results in:

SELECT [Extent1].[Id] AS Id, [Extent1].[Name] AS Name
FROM [dbo].[MyEntities] AS [Extent1]
WHERE ([Extent1].[Name] IS NULL)

However, there is a configuration setting, UseDatabaseNullSemantics (the opposite of the old UseCSharpNullComparisonBehavior of ObjectContext), that can be used to control this behavior. If set to true (the opposite of the default):

ctx.Configuration.UseDatabaseNullSemantics = true;

It generates all SQL queries as the unsafe version:

SELECT [Extent1].[Id] AS Id, [Extent1].[Name] AS Name
FROM [dbo].[MyEntities] AS [Extent1]
WHERE ([Extent1].[Name] = @p__linq__0)

The result is, even if you have a record whose Name column is NULL, it won’t be returned: remember that in SQL, by default, NULL <> NULL!

Therefore, if you are not 100% sure that the parameter that you will be using in the LINQ query will never be null, do not mess with UseDatabaseNullSemantics even if it generates somewhat poorer SQL.

Of course, if you are using SQL Server, you can set ANSI_NULLS to OFF, and the problem goes away; in that case, NULL = NULL.

Entity Framework Pitfalls: Queries Over Navigation Properties When There Are Foreign Key Properties Are Ignored

Long title, I know! Smile

Another one for Entity Framework Code First. The problem is: imagine you have an entity with a relation to another entity (one-to-one, many-to-one) where you have both the navigation property and the foreign key property:

public class Source
{
public int Id { get; set; }

[ForeignKey("TargetId")]
public virtual Target Target { get; set; }

public int? TargetId { get; set; }
}

public class Target
{
public int Id { get; set; }
public virtual ICollection<Source> Sources { get; set; }
}

You may want to query all Sources that have a Target set:

var sourcesWithTarget = ctx
.Sources
.Where(source => source.Target != null)
.ToList();

However, you will get this SQL instead:

SELECT [Extent1].[Id] AS [Id], [Extent1].[TargetId] AS [TargetId], 
FROM [dbo].[Source] AS [Extent1]

See? No WHERE clause!

If, instead, we query on the foreign key property:

var sourcesWithTarget = ctx
.Sources
.Where(source => source.TargetId != 0)
.ToList();

Then we get what we want:

SELECT [Extent1].[Id] AS [Id], [Extent1].[TargetId] AS [TargetId], 
FROM [dbo].[Source] AS [Extent1]
WHERE 0 <> [Extent1].[TargetId]

Weird, huh?

Stay tuned for more!

Entity Framework Pitfalls: Migrations and DbContext Construction

If you want to create or apply a migration over a DbContext-derived context and your context doesn’t have a public parameterless constructor, it will fail. This happens with both .NET Framework and .NET Core: the problem is that the migrations framework has no way of knowing how to create the context so as to get information from it, the migrations framework actually instantiates it!

There are two ways to go around it:

  • Create a public parameterless constructor on your context;
  • Create a class that implements IDbContextFactory<T>, where the generic parameter is your DbContext-derived class.

Having a parameterless constructor is not always ideal, because you may need to pass something in, like a connection string, but you can do it just for executing migrations and them remove it again. Just make sure that the connection string in use is the right one.

As for the IDbContextFactory<T> approach, it’s quite simple:

public class BlogContextFactory : IDbContextFactory<BlogContext>

{

    public BlogContext Create()

    {

        var connectionString = "get a connection string somehow";

        return new BlogContext(connectionString);

    }

}

Basically, you have to implement the Create method so as to return an instance of your context, doesn’t really matter how you instantiate it. The migrations API will find this class automatically.

If you don’t do any of this, you may get weird errors, like, “unable to find the DbContext”, which doesn’t really help much.

Another problem you may face is if you have multiple DbContexts, in that case, you have to tell migrations which one to use.

Entity Framework Pitfalls: Entity Refresh

After an entity is loaded by Entity Framework, it gets stored in its first level cache. After that, whenever EF executes a query that returns this entity, identified by its primary key, EF always returns the cached entry, and makes no attempt to see if the cached data (the rest of the properties) is still up to date with that returned from the query. This feature is common to other ORMs, but it may result in unexpected behavior – we cannot rely on the state of a loaded entity if the database can change by other processes..

The solution for EF pre-Core is either:

  • Remove (evict) the entity from first level cache, by marking it as detached, when we want it to be reloaded from the database:
MyEntity e = ...;

 

ctx.Entry(e).State = EntityState.Detached;

  • Explicitly ask for it to be reloaded:
MyEntity e = ...;

 

ctx.Entry(e).Reload();

In EF Core, there is no Reload method – although I did write an implementation for it –, so the only option is to detach the entity from the context.

Entity Framework Pitfalls: Attaching New Entities With Existing Related Ones

One of the patterns in EF 6.x for attaching new entities to a context is to simply change it’s state to Added:

ctx.Entry(myEntity).State = EntityState.Added;

Entity Framework 6.x will traverse the entity graph of all the related entities and set all their states to Added too. This might be a problem.

Imagine we want to attach a disconnected (new) entity that is linked to an existing one, for which we already have a reference. In this case, we don’t want this reference to be added, because doing so will raise a violation on the database and it will come out as an exception.

What we need to do is to explicitly set the state of any of these existing instances to Unchanged. Fortunately, Entity Framework Core will solve this with the introduction of a new TrackGraph API.

Thanks for Tony Sneed (@tonysneed) for reminding me.

Entity Framework Pitfalls: Cascade Deletes

Introduction

Entity Framework supports cascading deletes for all of its associations:

  • One to one
  • One to many
  • Many to many

Meaning, when we remove an entity, it will automatically descend to its dependent entities (cascade) and delete them too.

Of course, cascading deletes of many to one really does not make much sense. The problem with cascade deletes configuration in EF is, sometimes it cannot be done with attributes, which is a mapping mechanism that most people like and use, and the different associations need different approaches.

One to One

For one to one relations, we need to use code mappings, say, we have an entity Master associated with a dependent entity Detail:

protected override void OnModelCreating(DbModelBuilder modelBuilder)

{

    modelBuilder

        .Entity<Master>()

        .HasOptional(x => x.Detail)

        .WithRequired(x => x.Master)

        .WillCascadeOnDelete(true);

 

    base.OnModelCreating(modelBuilder);

}

Deleting a Master will also delete the Detail, but, of course, not the other way around. If you are curious, EF does this through an ON DELETE CASCADE INDEX, so it is done by the database, not by EF – more on this later.

image

One to Many

We just need to mark the one endpoint with a [Required] attribute, like in a Detail entity that is a child of a Master:

public class Detail

{

    public int Id { get; set; }

 

    [Required]

    public virtual Master Master { get; set; }

}

This way, when we delete the Master entity, it will first delete all of its related Detail entities. This is also achieved through the automatically created ON DELETE CASCADE INDEX.

It is also certainly possible to achieve this using code configuration, but not really necessary – and more verbose:

protected override void OnModelCreating(DbModelBuilder modelBuilder)

{

    modelBuilder

        .Entity<Master>()

        .HasMany(x => x.Details)

        .WithRequired(x => x.Master)

        .WillCascadeOnDelete();

 

    base.OnModelCreating(modelBuilder);

}

Many to Many

This one is the best: guess what, we don’t have to do anything, it just works out of the box!

Conclusion

Because Entity Framework relies on INDEXes to cascade the deletes, you cannot just change the code mappings after you created the model – either by adding attributes or through fluent configuration – and expect things to work. If you do, you will likely need to update the database. Do so by creating and applying a new migration. One thing that is still (as of EF 6.x) not supported is deleting orphans, but I already talked about it. EF Core will handle it properly.

Entity Framework Pitfalls: TransactionScope

Most people using Entity Framework use the TransactionScope API to manage database transactions. This is a powerful API, that can be used to enlist .NET APIs and even WS-*-compatible web services in transactions, that may well be distributed. Microsoft used to recommend it for working with EF, but isn’t doing it anymore, or, at least, developers need to understand a thing or two.

Most modern APIs offer asynchronous methods in addition, or instead, of blocking ones, and Entity Framework is a good example – just think of SaveChangesAsync, FindAsync and ToListAsync. The problem with TransactionScope prior to version 4.5.1 was that it didn’t handle well transaction-enabled methods running in other threads – which is what EF’s asynchronous methods do. In this version, Microsoft introduced the

TransactionScopeAsyncFlowOption flag, which can be used to tell TransactionScope to flow transactions across async threads.

Also, if we open two database connections inside a TransactionScope, the transaction gets automatically promoted to a distributed one. The problem is, distributed transactions require Microsoft Distributed Transaction Coordinator service running, and it is not available in Azure, for example (Web Apps).

So, the key here is: if we cannot be certain of the environment where EF will run, which may include pre-.NET 4.5.1 or Azure, we should use explicit transactions instead. It’s not that difficult, it’s just a matter of calling Database.BeginTransaction() and either committing or rolling back/disposing of the returned object when we are done with it.

Entity Framework Pitfalls – Composite Primary Keys

When you have a composite primary key, you will get two properties on your POCO class that together represent the id. In this case, conventions won’t be enough, so you need to mark both properties with the KeyAttribute attribute:

public class MyCompositeEntity

{

    [Key]

    public int PartialKey1 { get; set; }

 

    [Key]

    public int PartialKey2 { get; set; }

}

That won’t suffice, though. You also need to specify the order by which the columns should be used, through the ColumnAttribute’s Order property:

public class MyCompositeEntity

{

    [Key]

    [Column(Order = 0)]

    public int PartialKey1 { get; set; }

 

    [Key]

    [Column(Order = 1)]

    public int PartialKey2 { get; set; }

}

Mind you, the order is zero-based. It will be used in two places:

  • When generating the primary key index;
  • When querying using the Find method of the DbSet<T>:
var entity = ctx.MyCompositeEntities.Find(

    1,    //PartialKey1 (Order = 0)

    2     //PartialKey2 (Order = 1)

);

Entity Framework Pitfalls: DbConfiguration Classes Are Automatically Loaded

Entity Framework, since version 6, allows us to specify certain aspects of its configuration by code through a DbConfiguration-derived class. This configuration can be specified in one of three ways:

However, if there is a DbConfiguration-derived class in the same assembly as your DbContext, it will be automatically set as the configuration class, provided that:

  • It is public;
  • It is non-abstract;
  • It has a public parameterless constructor.

So, a DbConfiguration in the same assembly as the DbContext is automatically loaded. If there is more than one, the DbContext constructor will throw an exception.