L2E: SQL impact

As I wrote yesterday,

“Behind the EF magic, there is the SQL reality. I’m sticking to my guns, if you don’t know SQL, you can’t write best Entity Framework queries.

I find a good sample with grouping functions.”

Yesterday, I showed you how to have a Count and a Max on only one LINQ query.

Now imagine that I want to get the orders count and the max orders date per customer.

How to do it?

If you are sure that all customers have some orders or if you want to exclude customers without any order, the best way is probably to write this:

from o in context.Orders
group o by o.Customer into g
select new { Customer = g.Key, OrdersCount = g.Count(), LastOrderDate = g.Max(o => o.OrderDate) };

In this case, the SQL is the very similar than the L2E query:

SELECT

[GroupBy1].[K1] AS [CustomerId],

[GroupBy1].[K2] AS [LastName],

[GroupBy1].[K3] AS [FirstName],

[GroupBy1].[K4] AS [BirthDay],

[GroupBy1].[K5] AS [AddressLine],

[GroupBy1].[K6] AS [City],

[GroupBy1].[K7] AS [PostalCode],

[GroupBy1].[K8] AS [Region],

[GroupBy1].[K9] AS [Country],

[GroupBy1].[A1] AS [C1],

[GroupBy1].[A2] AS [C2]

FROM ( SELECT

      [Extent2].[CustomerId] AS [K1],

      [Extent2].[LastName] AS [K2],

      [Extent2].[FirstName] AS [K3],

      [Extent2].[BirthDay] AS [K4],

      [Extent2].[AddressLine] AS [K5],

      [Extent2].[City] AS [K6],

      [Extent2].[PostalCode] AS [K7],

      [Extent2].[Region] AS [K8],

      [Extent2].[Country] AS [K9],

      COUNT(1) AS [A1],

      MAX([Extent1].[OrderDate]) AS [A2]

      FROM  [dbo].[Orders] AS [Extent1]

      INNER JOIN [dbo].[Customers] AS [Extent2] ON [Extent1].[CustomerId] = [Extent2].[CustomerId]

      GROUP BY [Extent2].[CustomerId], [Extent2].[LastName], [Extent2].[FirstName], [Extent2].[BirthDay], [Extent2].[AddressLine], [Extent2].[City], [Extent2].[PostalCode], [Extent2].[Region], [Extent2].[Country]

)  AS [GroupBy1]

 

With a basic execution plan:

image

But, if you can have some customers without any orders and you want it on the result, it is more difficult.

A basic idea would be to get previous query union customers without any orders:

(from o in context.Orders  group o by o.Customer intoselect new { Customer = g.Key, OrdersCount = g.Count(), LastOrderDate = (DateTime?)g.Max(o => o.OrderDate) }).Union(
from c in context.Customers
where !c.Orders.Any()
select new { Customer = c, OrdersCount = 0, LastOrderDate = (DateTime?)null });

In this case, the SQL query is not good:

SELECT

[Distinct1].[C1] AS [C1],

[Distinct1].[C2] AS [C2],

[Distinct1].[C3] AS [C3],

[Distinct1].[C4] AS [C4],

[Distinct1].[C5] AS [C5],

[Distinct1].[C6] AS [C6],

[Distinct1].[C7] AS [C7],

[Distinct1].[C8] AS [C8],

[Distinct1].[C9] AS [C9],

[Distinct1].[C10] AS [C10],

[Distinct1].[C11] AS [C11],

[Distinct1].[C12] AS [C12]

FROM ( SELECT DISTINCT

      [UnionAll1].[C1] AS [C1],

      [UnionAll1].[CustomerId] AS [C2],

      [UnionAll1].[LastName] AS [C3],

      [UnionAll1].[FirstName] AS [C4],

      [UnionAll1].[BirthDay] AS [C5],

      [UnionAll1].[AddressLine] AS [C6],

      [UnionAll1].[City] AS [C7],

      [UnionAll1].[PostalCode] AS [C8],

      [UnionAll1].[Region] AS [C9],

      [UnionAll1].[Country] AS [C10],

      [UnionAll1].[C2] AS [C11],

      [UnionAll1].[C3] AS [C12]

      FROM  (SELECT

            1 AS [C1],

            [GroupBy1].[K1] AS [CustomerId],

            [GroupBy1].[K2] AS [LastName],

            [GroupBy1].[K3] AS [FirstName],

            [GroupBy1].[K4] AS [BirthDay],

            [GroupBy1].[K5] AS [AddressLine],

            [GroupBy1].[K6] AS [City],

            [GroupBy1].[K7] AS [PostalCode],

            [GroupBy1].[K8] AS [Region],

            [GroupBy1].[K9] AS [Country],

            [GroupBy1].[A1] AS [C2],

             CAST( [GroupBy1].[A2] AS datetime2) AS [C3]

            FROM ( SELECT

                  [Extent2].[CustomerId] AS [K1],

                  [Extent2].[LastName] AS [K2],

                  [Extent2].[FirstName] AS [K3],

                  [Extent2].[BirthDay] AS [K4],

                  [Extent2].[AddressLine] AS [K5],

                  [Extent2].[City] AS [K6],

                  [Extent2].[PostalCode] AS [K7],

                  [Extent2].[Region] AS [K8],

                  [Extent2].[Country] AS [K9],

                  COUNT(1) AS [A1],

                  MAX([Extent1].[OrderDate]) AS [A2]

                  FROM  [dbo].[Orders] AS [Extent1]

                  INNER JOIN [dbo].[Customers] AS [Extent2] ON [Extent1].[CustomerId] = [Extent2].[CustomerId]

                  GROUP BY [Extent2].[CustomerId], [Extent2].[LastName], [Extent2].[FirstName], [Extent2].[BirthDay], [Extent2].[AddressLine], [Extent2].[City], [Extent2].[PostalCode], [Extent2].[Region], [Extent2].[Country]

            )  AS [GroupBy1]

      UNION ALL

            SELECT

            1 AS [C1],

            [Extent3].[CustomerId] AS [CustomerId],

            [Extent3].[LastName] AS [LastName],

            [Extent3].[FirstName] AS [FirstName],

            [Extent3].[BirthDay] AS [BirthDay],

            [Extent3].[AddressLine] AS [AddressLine],

            [Extent3].[City] AS [City],

            [Extent3].[PostalCode] AS [PostalCode],

            [Extent3].[Region] AS [Region],

            [Extent3].[Country] AS [Country],

            0 AS [C2],

            CAST(NULL AS datetime2) AS [C3]

            FROM [dbo].[Customers] AS [Extent3]

            WHERE  NOT EXISTS (SELECT

                  1 AS [C1]

                  FROM [dbo].[Orders] AS [Extent4]

                  WHERE [Extent3].[CustomerId] = [Extent4].[CustomerId]

            )) AS [UnionAll1]

)  AS [Distinct1]

image

So it is probably not the best way…

Another one could be to use what I presented yesterday: a group by a constant to get the count and the max on c.Orders:

from c in context.Customers
let ordersInfo = (from o in c.Orders                   group o by 1 into g                   select new { OrdersCount = g.Count(), LastOrderDate = g.Max(o => o.OrderDate) }).FirstOrDefault()
select new { Customer = c, OrdersCount = (int?)ordersInfo.OrdersCount ?? 0, OrderDate = (DateTime?)ordersInfo.LastOrderDate };

Note the cast into int? or DateTime?. Contrary to .NET, SQL returns null for OrdersCount and OrderDate if a customer has no orders.

If the previous query is good for L2O, it is not so good with L2E because of the generated SQL!

SELECT

[Extent1].[CustomerId] AS [CustomerId],

[Extent1].[LastName] AS [LastName],

[Extent1].[FirstName] AS [FirstName],

[Extent1].[BirthDay] AS [BirthDay],

[Extent1].[AddressLine] AS [AddressLine],

[Extent1].[City] AS [City],

[Extent1].[PostalCode] AS [PostalCode],

[Extent1].[Region] AS [Region],

[Extent1].[Country] AS [Country],

CASE WHEN ([Limit1].[C1] IS NULL) THEN 0 ELSE [Limit1].[C1] END AS [C1],

 CAST( [Limit1].[C2] AS datetime2) AS [C2]

FROM  [dbo].[Customers] AS [Extent1]

OUTER APPLY  (SELECT TOP (1)

      [GroupBy1].[A1] AS [C1],

      [GroupBy1].[A2] AS [C2]

      FROM ( SELECT

            [Project1].[K1] AS [K1],

            COUNT([Project1].[A1]) AS [A1],

            MAX([Project1].[A2]) AS [A2]

            FROM ( SELECT

                  1 AS [K1],

                  1 AS [A1],

                  [Project1].[OrderDate] AS [A2]

                  FROM ( SELECT

                        [Extent2].[OrderDate] AS [OrderDate]

                        FROM [dbo].[Orders] AS [Extent2]

                        WHERE [Extent1].[CustomerId] = [Extent2].[CustomerId]

                  )  AS [Project1]

            )  AS [Project1]

            GROUP BY [K1]

      )  AS [GroupBy1] ) AS [Limit1]

image

As you can see, it is worse than the previous one. So contrary to my yesterday post in this case, it is not a good idea to use group by a constant.

We can also try this:

from c in context.Customers
let orders = c.Orders
select new { Customer = c, OrdersCount = (int?)orders.Count ?? 0, LastOrderDate = (DateTime?)orders.Max(o => o.OrderDate) };

But the generated SQL query is not better:

SELECT

[Project3].[CustomerId] AS [CustomerId],

[Project3].[LastName] AS [LastName],

[Project3].[FirstName] AS [FirstName],

[Project3].[BirthDay] AS [BirthDay],

[Project3].[AddressLine] AS [AddressLine],

[Project3].[City] AS [City],

[Project3].[PostalCode] AS [PostalCode],

[Project3].[Region] AS [Region],

[Project3].[Country] AS [Country],

CASE WHEN ([Project3].[C1] IS NULL) THEN 0 ELSE [Project3].[C2] END AS [C1],

 CAST( [Project3].[C3] AS datetime2) AS [C2]

FROM ( SELECT

      [Project2].[CustomerId] AS [CustomerId],

      [Project2].[LastName] AS [LastName],

      [Project2].[FirstName] AS [FirstName],

      [Project2].[BirthDay] AS [BirthDay],

      [Project2].[AddressLine] AS [AddressLine],

      [Project2].[City] AS [City],

      [Project2].[PostalCode] AS [PostalCode],

      [Project2].[Region] AS [Region],

      [Project2].[Country] AS [Country],

      [Project2].[C1] AS [C1],

      [Project2].[C2] AS [C2],

      (SELECT

            MAX([Extent4].[OrderDate]) AS [A1]

            FROM [dbo].[Orders] AS [Extent4]

            WHERE [Project2].[CustomerId] = [Extent4].[CustomerId]) AS [C3]

      FROM ( SELECT

            [Project1].[CustomerId] AS [CustomerId],

            [Project1].[LastName] AS [LastName],

            [Project1].[FirstName] AS [FirstName],

            [Project1].[BirthDay] AS [BirthDay],

            [Project1].[AddressLine] AS [AddressLine],

            [Project1].[City] AS [City],

            [Project1].[PostalCode] AS [PostalCode],

            [Project1].[Region] AS [Region],

            [Project1].[Country] AS [Country],

            [Project1].[C1] AS [C1],

            (SELECT

                  COUNT(1) AS [A1]

                  FROM [dbo].[Orders] AS [Extent3]

                  WHERE [Project1].[CustomerId] = [Extent3].[CustomerId]) AS [C2]

            FROM ( SELECT

                  [Extent1].[CustomerId] AS [CustomerId],

                  [Extent1].[LastName] AS [LastName],

                  [Extent1].[FirstName] AS [FirstName],

                  [Extent1].[BirthDay] AS [BirthDay],

                  [Extent1].[AddressLine] AS [AddressLine],

                  [Extent1].[City] AS [City],

                  [Extent1].[PostalCode] AS [PostalCode],

                  [Extent1].[Region] AS [Region],

                  [Extent1].[Country] AS [Country],

                  (SELECT

                        COUNT(1) AS [A1]

                        FROM [dbo].[Orders] AS [Extent2]

                        WHERE [Extent1].[CustomerId] = [Extent2].[CustomerId]) AS [C1]

                  FROM [dbo].[Customers] AS [Extent1]

            )  AS [Project1]

      )  AS [Project2]

)  AS [Project3]

image

Ok now, I stop playing with your patience. What is the best way to write it?

I think it’s the following:

from c in context.Customers
from o in
    (from o2 in context.Orders      where o2.CustomerId == c.CustomerId      group o2 by o2.CustomerId into g      select new { OrdersCount = g.Count(), LastOrderDate = (DateTime?)g.Max(o => o.OrderDate) }).DefaultIfEmpty()
select new { Customer = c, OrdersCount = (int?)o.OrdersCount ?? 0, o.LastOrderDate };

Now the SQL is very simple and the execution plan too:

SELECT

[Extent1].[CustomerId] AS [CustomerId],

[Extent1].[LastName] AS [LastName],

[Extent1].[FirstName] AS [FirstName],

[Extent1].[BirthDay] AS [BirthDay],

[Extent1].[AddressLine] AS [AddressLine],

[Extent1].[City] AS [City],

[Extent1].[PostalCode] AS [PostalCode],

[Extent1].[Region] AS [Region],

[Extent1].[Country] AS [Country],

CASE WHEN ([GroupBy1].[A1] IS NULL) THEN 0 ELSE [GroupBy1].[A1] END AS [C1],

CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS datetime2) ELSE  CAST( [GroupBy1].[A2] AS datetime2) END AS [C2]

FROM  [dbo].[Customers] AS [Extent1]

OUTER APPLY  (SELECT

      [Extent2].[CustomerId] AS [K1],

      COUNT(1) AS [A1],

      MAX([Extent2].[OrderDate]) AS [A2]

      FROM [dbo].[Orders] AS [Extent2]

      WHERE [Extent2].[CustomerId] = [Extent1].[CustomerId]

      GROUP BY [Extent2].[CustomerId] ) AS [GroupBy1]

image

Note the query cost of 11%. It’s the same than the first SQl query without customers with no orders! Awesome!

 

Why do I write this post? Why showing these execution plans and the generated SQL?

I think it’s very important. Contrary to the marketing speech, I don’t think that EF is magic and easy to use. Of course for a demo with 3 rows on 3 tables in DB, you can write the query you want but for a real application with performance issues, it is not so easy… Understand me, I still think that EF is awesome and I really love it. However, I also think that it’s very important to understand L2E impacts on SQL and it is not easy…

With this post, I wanted to show how SQL execution plans can be different for the same result function of the way to write L2E queries.

I realized many audits on Entity Framework, particularly to fix performance issues and I never saw any real EF performance issues. Performance issues are, in all cases I was shown, the fault of a bad EF knowledge or a bad DB conception.

This entry was posted in 12253, 7671, 7674, 9104. Bookmark the permalink.

2 Responses to L2E: SQL impact

  1. Leonilyn says:

    I agree left / right joins are very important but sdelom used. Not relevant in some situation but when they are called for they do a much better job than inner joins. I guess people simply don’t know that they exist, and how they work.Nice tutorial.

  2. Randal says:

    The column names of the reuslt set that are returned by EXCEPT or INTERSECT are the same names as those returned by the query on the left side of the operand.Column names or aliases in ORDER BY clauses must reference column names returned by the left-side query.

Leave a Reply

Your email address will not be published. Required fields are marked *


*