EF and recursion

Through this post, I will show you a concrete case of a bad EF usage I recently saw.

My model is the following:

image_thumb3

We want to get a tree with a constraint: if the node type is "G" and the node doesn’t have any child, it should not be present on the tree.

How to do this?

This is the code I saw:

public static List<Node> GetTree()
{
using (var context = new TreeEntities())
{
List<Node> nodes = context.Nodes.Where(n => n.Parent == null).ToList();
foreach (var n in nodes)
LoadChildren(context, n);
foreach (var n in nodes.Where(n => n.Type == "G" && n.Children.Count == 0).ToList())
nodes.Remove(n);
return nodes;
} } public static void LoadChildren(TreeEntities context, Node parentNode) {
List<Node> nodes = context.Nodes.Where(n => n.ParentId == parentNode.Id).ToList();
foreach (var n in nodes)
LoadChildren(context, n);
foreach (var n in nodes.Where(n => n.Type == "G" && n.Children.Count == 0).ToList())
context.Nodes.Detach(n); }


I did a test with the following entities set:



using (var context = new TreeEntities())
{
for (int i0 = 0; i0 < 10; i0++)
{
Node n0 = new Node { Name = i0.ToString(), Type = i0 == 9 ? null : "G" };
context.Nodes.AddObject(n0);
if (i0 > 7)
break;
for (int i1 = 0; i1 < 10; i1++)
{
Node n1 = new Node { Name = n0.Name + i1.ToString(), Type = i1 == 9 ? null : "G", Parent = n0 };
context.Nodes.AddObject(n1);
if (i1 > 7)
break;
for (int i2 = 0; i2 < 10; i2++)
{
Node n2 = new Node { Name = n1.Name + i2.ToString(), Type = i2 == 9 ? null : "G", Parent = n1 };
context.Nodes.AddObject(n2);
if (i2 > 7)
break;
for (int i3 = 0; i3 < 10; i3++)
{
Node n3 = new Node { Name = n2.Name + i3.ToString(), Type = i3 == 9 ? null : "G", Parent = n2 };
context.Nodes.AddObject(n3);
if (i3 > 7)
break;
for (int i4 = 0; i4 < 10; i4++)
context.Nodes.AddObject(new Node { Name = n3.Name + i4.ToString(), Parent = n3 });
}
}
}
}
context.SaveChanges(); }


With this first version, this code runs on 5 minutes and 41 seconds and generates 46 226 SQL queries!



We don’t need to be a DBA to understand that if the number of SQL queries depends on the number of datarows, the code does not support heavy load.



So, how to improve it?



One of the big advantages using EF is the fact that it makes relationships itself. It’s why this code works. Indeed, we never set nodes Parent / Children. EF did it for us.



So you could imagine a code like this:



public static List<Node> GetTree()
{
using (var context = new TreeEntities())
{


return context.Nodes.Where(n => n.Type != "G" || n.Children.Any()).AsEnumerable().Where(n => n.ParentId == null).ToList();
} }


Before continuing: note that I use ParentId != null and not Parent != null. Indeed, when we enumerate entities, parent could not be loaded yet. If it’s the case, ParentId will be different than null but Parent will be equal to null.



Now, there is a big issue with this code: it does not work as expected. If we have a node with type equals to G with a child with type equals to G too without any child, the first node should not appear in the tree because its only one child should not appear. For this case, the previous code does not work.



SQL is not well-known for recursion.



I will try two different approaches:



  • a first one where recursion will be only on the code. Note that it means loading too many entities from DB (as the first version did).
  • a second, in a future post, where recursion is managed into the DB


My first solution is the following:



public static IEnumerable<T> GetEntitiesInCache<T>(this ObjectContext context, EntityState entityState = EntityState.Unchanged | EntityState.Modified | EntityState.Added)
{
return context.ObjectStateManager.GetObjectStateEntries(entityState).Select(ose => ose.Entity).OfType<T>(); } public static List<Node> GetTree(TreeEntities context) {
using (var context = new TreeEntities())
{
foreach (var n in context.Nodes);

bool continueLoop;
do
{
continueLoop = false;
foreach (var n in context.GetEntitiesInCache<Node>().Where(n => n.Type == "G" && n.Children.Count == 0).ToList())
{
continueLoop = true;
context.Nodes.Detach(n);
}
} while (continueLoop);

return context.GetEntitiesInCache<Node>().Where(n => n.ParentId == null).ToList();
} }


With my entities set, the 5 minutes and 41 seconds becomes only 1 second and only one SQL query (instead of 46 226) with my version.



Note that I use a local DB. Else, difference would be bigger.



Note also this code



       foreach (var n in context.Nodes);


which allows to load every nodes in the context before detaching “bad” nodes.

This entry was posted in 7671, 7674. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>