EF: Why Include method is an anti-pattern IMHO even with many to one navigation properties? 2/3

In my yesterday post, I explained why I think Include method is an anti-pattern.

Darrel commented that the issue was because I use Include with one to many navigation property on many side.

In the sample, I have only one Include to side many and four to side (zero or) one.

But Darrel is right. If I use only one side, Include is not so bad. Not so bad but not the best one anyway.

I will take 2 samples to illustrate it.

In these samples, I will compare time to execute the query from the .NET code (so including SQL generation, SQL execution and EF materialization).

Using Include method my first sample is the following:

private static OnlineSale GetOnlineSaleWithGraph(ContosoRetailDWEntities context)


{


    return context.OnlineSales.Include(os => os.Customer).Include(os => os.Product.ProductSubcategory.ProductCategory).Include(os => os.Store).FirstOrDefault();


}


 



Now with my way, the code is the following:



private static OnlineSale GetOnlineSaleWithGraph(ContosoRetailDWEntities context)


{


    OnlineSale onlineSale = null;


    object lockObject = new object();


    Task onlineSaleTask = new Task(() =>


        {


            using (var context2 = new ContosoRetailDWEntities())


            {


                var query = context2.OnlineSales;


                ObjectQuery<OnlineSale> objectQuery = (ObjectQuery<OnlineSale>)query;


                objectQuery.MergeOption = MergeOption.NoTracking;


                onlineSale = objectQuery.FirstOrDefault();


            }


            if (onlineSale != null)


                lock (lockObject)


                {


                    context.OnlineSales.Attach(onlineSale);


                }


        });


    onlineSaleTask.Start();


 


    Task customerTask = new Task(() =>


        {


            Customer customer;


            using (var context2 = new ContosoRetailDWEntities())


            {


                var query = context2.OnlineSales.Select(os => os.Customer);


                ObjectQuery<Customer> objectQuery = (ObjectQuery<Customer>)query;


                objectQuery.MergeOption = MergeOption.NoTracking;


                customer = objectQuery.FirstOrDefault();


            }


            if (customer != null)


                lock (lockObject)


                {


                    context.Customers.Attach(customer);


                }


        });


    customerTask.Start();


 


    Task productTask = new Task(() =>


    {


        using (var context2 = new ContosoRetailDWEntities())


        {


            ObjectQuery<OnlineSale> objectQuery = context2.OnlineSales;


            objectQuery.MergeOption = MergeOption.NoTracking;


            var query = objectQuery.Select(os => new { os.Product, os.Product.ProductSubcategory, os.Product.ProductSubcategory.ProductCategory });


            var product = query.FirstOrDefault();


            if (product != null)


                lock (lockObject)


                {


                    context.Products.Attach(product.Product);


                    if (product.ProductSubcategory != null)


                    {


                        context.ProductSubcategories.Attach(product.ProductSubcategory);


                        if (product.ProductCategory != null)


                            context.ProductCategories.Attach(product.ProductCategory);


                    }


                }


        }


    });


    productTask.Start();


 


    Task storeTask = new Task(() =>


    {


        Store store;


        using (var context2 = new ContosoRetailDWEntities())


        {


            var query = context2.OnlineSales.Select(os => os.Store);


            ObjectQuery<Store> objectQuery = (ObjectQuery<Store>)query;


            objectQuery.MergeOption = MergeOption.NoTracking;


            store = objectQuery.FirstOrDefault();


        }


        if (store != null)


            lock (lockObject)


            {


                context.Stores.Attach(store);


            }


    });


    storeTask.Start();


 


    Task.WaitAll(onlineSaleTask, customerTask, productTask, storeTask);


 


    return onlineSale;


}


 



As you can see, I use a sort of Include to load ProductSubCategory and ProductCategory with the Product. I do it because as I get only one product, I’m sure that I won’t duplicate ProductSubCategory or ProductCategory.



In my tests, the code using Include run on 962 ms and 611 ms when the execution plan is on SQL Server cache and mine in 749 ms (22 % faster) and 548 ms when SQL Server uses its cache (10% faster).



Now I will do the same getting 100 OnlineSales



private static List<OnlineSale> GetTenLastOnlineSalesWithGraph(ContosoRetailDWEntities context)


{


    return context.OnlineSales.Take(100).Include(os => os.Customer).Include(os => os.Product.ProductSubcategory.ProductCategory).Include(os => os.Store).ToList();


}


 



Now my code is the following:



private static List<OnlineSale> GetTenLastOnlineSalesWithGraph(ContosoRetailDWEntities context)


{


    List<OnlineSale> onlineSales = null;


    object lockObject = new object();


    Task onlineSalesTask = new Task(() =>


        {


            using (var context2 = new ContosoRetailDWEntities())


            {


                var query = context2.OnlineSales.Take(100);


                ObjectQuery<OnlineSale> objectQuery = (ObjectQuery<OnlineSale>)query;


                objectQuery.MergeOption = MergeOption.NoTracking;


                onlineSales = objectQuery.ToList();


            }


            lock (lockObject)


            {


                foreach (var onlineSale in onlineSales)


                    context.OnlineSales.Attach(onlineSale);


            }


        });


    onlineSalesTask.Start();


 


    Task customersTask = new Task(() =>


        {


            List<Customer> customers;


            using (var context2 = new ContosoRetailDWEntities())


            {


                var query = context2.OnlineSales.Take(100).Select(os => os.Customer).Distinct();


                ObjectQuery<Customer> objectQuery = (ObjectQuery<Customer>)query;


                objectQuery.MergeOption = MergeOption.NoTracking;


                customers = objectQuery.ToList();


            }


            lock (lockObject)


            {


                foreach (var customer in customers)


                    context.Customers.Attach(customer);


            }


        });


    customersTask.Start();


 


    Task productsTask = new Task(() =>


    {


        List<Product> products;


        using (var context2 = new ContosoRetailDWEntities())


        {


            var query = context2.OnlineSales.Take(100).Select(os => os.Product).Distinct();


            ObjectQuery<Product> objectQuery = (ObjectQuery<Product>)query;


            objectQuery.MergeOption = MergeOption.NoTracking;


            products = objectQuery.ToList();


        }


        lock (lockObject)


        {


            foreach (var product in products)


                context.Products.Attach(product);


        }


    });


    productsTask.Start();


 


    Task productSubCategoriesTask = new Task(() =>


    {


        List<ProductSubcategory> productSubCategories;


        using (var context2 = new ContosoRetailDWEntities())


        {


            var query = context2.OnlineSales.Take(100).Select(os => os.Product.ProductSubcategory).Distinct();


            ObjectQuery<ProductSubcategory> objectQuery = (ObjectQuery<ProductSubcategory>)query;


            objectQuery.MergeOption = MergeOption.NoTracking;


            productSubCategories = objectQuery.ToList();


        }


        lock (lockObject)


        {


            foreach (var productSubCategory in productSubCategories)


                context.ProductSubcategories.Attach(productSubCategory);


        }


    });


    productSubCategoriesTask.Start();


 


    Task productCategoriesTask = new Task(() =>


    {


        List<ProductCategory> productCategories;


        using (var context2 = new ContosoRetailDWEntities())


        {


            var query = context2.OnlineSales.Take(100).Select(os => os.Product.ProductSubcategory.ProductCategory).Distinct().Distinct();


            ObjectQuery<ProductCategory> objectQuery = (ObjectQuery<ProductCategory>)query;


            objectQuery.MergeOption = MergeOption.NoTracking;


            productCategories = objectQuery.ToList();


        }


        lock (lockObject)


        {


            foreach (var productCategory in productCategories)


                context.ProductCategories.Attach(productCategory);


        }


    });


    productCategoriesTask.Start();


 


    Task storesTask = new Task(() =>


    {


        List<Store> stores;


        using (var context2 = new ContosoRetailDWEntities())


        {


            var query = context2.OnlineSales.Take(100).Select(os => os.Store).Distinct();


            ObjectQuery<Store> objectQuery = (ObjectQuery<Store>)query;


            objectQuery.MergeOption = MergeOption.NoTracking;


            stores = objectQuery.ToList();


        }


        lock (lockObject)


        {


            foreach (var store in stores)


                context.Stores.Attach(store);


        }


    });


    storesTask.Start();


 


    Task.WaitAll(onlineSalesTask, customersTask, productsTask, productSubCategoriesTask, productCategoriesTask, storesTask);


 


    return onlineSales;


}


 



In my tests, the code using Include run on 1118 ms and 657 ms when SQL Server uses its cache and mine in 941 ms (16% faster) and 587 ms (11% faster).



 



It’s interesting to note that, for performance aspect, in many to one navigation property, I prefer starting from my base query and use a Distinct but when I have a many to one navigation property, I prefer starting with my result EntitySet and use a Where with an Any on my base query.



context2.ProductCategories.Where(ca => context2.Customers.Take(50).SelectMany(c => c.OnlineSales).Any(os => os.Product.ProductSubcategory.ProductCategoryKey == ca.ProductCategoryKey));



 



So even like this you can see that Include method has not the best performance even if it is not catastrophic contrary to yesterday.



 



Note that my computer is “only” a dual core and the DB is local. Else, my way would be more better.



 



In fact, I think that Include is very good if you have a one to one property and if the Include has only one branch per graph depth.



Just note that the Include has something very interesting comparing to my code: Include method returns an IQueryable<T> after using it.

This entry was posted in 7671, 7674. Bookmark the permalink.

5 Responses to EF: Why Include method is an anti-pattern IMHO even with many to one navigation properties? 2/3

  1. Darrel Miller says:

    My tests have shown that the vast majority of the time spent doing a query with an include is spent compiling the query. If you compile the query you will see the compiled query is significantly faster.

    Also, if your database is on your local machine then you are not experiencing network latency. It is the network latency that makes Includes valuable.

    The other thing to note is that my experience with includes was based on a three tier architecture using DevForce from Ideablade. DevForce is layered on top of EF and allows you to issue the EF query and remote the results on the client machine across the internet. In my scenario I was using Include to avoid roundtrips that were costing 100ms just in latency.
    I can see that if you have EF running on an app server that is sitting in a datacenter next to the DB server where latency is < 10ms then the benefit of Include is hugely reduced.

  2. If I really need the 10% performance improvement, I think I would tend to write a stored procedure that does exactly the job I need, and I would keep a “clean” source code.

    Just my 2 cents.

  3. Matthieu MEZIL says:

    @Fabrice. With stored procedure, you lose EF flexibility if you want to have different where, order by, etc.

  4. Matthieu MEZIL says:

    @Darrel: “It is the network latency that makes Includes valuable”. Disagree versus my way to do it because my queries are executed in parallel. And so the number of bytes received from the server influences network latency and my way avoids duplicated information and so reduces a lot the size of it.

  5. Gabriel says:

    Howdy! I know this is kinda off topic however , I’d fiuegrd I’d ask. Would you be interested in trading links or maybe guest authoring a blog article or vice-versa? My website discusses a lot of the same subjects as yours and I feel we could greatly benefit from each other. If you might be interested feel free to shoot me an e-mail. I look forward to hearing from you! Great blog by the way!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>