Denormalizing data in RavenDB

One thing to know about RavenDB, or NoSQL document databases in general, is that you don’t do joins to combine data. Instead you model your documents so that the data needed for the most common actions is stored in the document itself, which often means denormalizing data. When you first get started with document databases that feels strange. After all, with relational databases we are taught to normalize data as much as possible and never repeat the same value.

Normalized data is great for updates and for minimizing the size of the database, but it is less than ideal for querying. When querying we need to join various tables to turn abstract foreign keys into something the end user can actually understand. And while relational databases are pretty good at joining tables, these operations are not free; we pay for them with every query we run. Now it turns out that most applications are read heavy rather than write heavy. As a result, optimizing for writes actually hurts the large majority, something like 99%, of the database operations we do.

With a document database like RavenDB we can’t even do a join. If we normalize data anyway, the client has to actively fetch the related documents to turn those abstract identities into values that are meaningful to a user. That is why the documents in a RavenDB database are normally much more denormalized than similar data in a SQL Server database would be. The result is that for most operations a single IDocumentSession.Load() is enough to work with a document.

 

What data makes sense to denormalize?

Not everything makes sense to denormalize. Normally only relatively static data that is frequently needed is denormalized. Why relatively static data? Simple: every time the master document for that piece of data is updated, all documents that hold a denormalized copy need to be updated as well. And while that is not especially difficult, it would become a bottleneck if it happened too often. Fortunately there is plenty of data that fits these criteria.
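To make the cost of such an update concrete, here is a minimal in-memory sketch of what has to happen when a master document changes. The CompanyRef and OrderDoc classes and the RenameCompany helper are hypothetical names used only for illustration and are not part of the RavenDB API; against a real database the same work would be done with a query or a patch rather than a loop over a list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for stored documents; not RavenDB types.
public class CompanyRef
{
    public string Id { get; set; }
    public string Name { get; set; }   // denormalized copy of the company name
}

public class OrderDoc
{
    public string Id { get; set; }
    public CompanyRef Company { get; set; }
}

public static class DenormalizedUpdates
{
    // Copies the new name into every order that references the company
    // and returns the number of orders that had to be rewritten.
    public static int RenameCompany(IEnumerable<OrderDoc> orders, string companyId, string newName)
    {
        var touched = 0;
        foreach (var order in orders)
        {
            if (order.Company != null && order.Company.Id == companyId)
            {
                order.Company.Name = newName;
                touched++;
            }
        }
        return touched;
    }

    public static void Main()
    {
        var orders = new List<OrderDoc>
        {
            new OrderDoc { Id = "orders/1", Company = new CompanyRef { Id = "companies/11", Name = "Old Name" } },
            new OrderDoc { Id = "orders/2", Company = new CompanyRef { Id = "companies/12", Name = "Other" } },
            new OrderDoc { Id = "orders/3", Company = new CompanyRef { Id = "companies/11", Name = "Old Name" } },
        };

        var touched = RenameCompany(orders, "companies/11", "New Name");
        Console.WriteLine(touched + " orders updated");   // prints "2 orders updated"
    }
}
```

A single rename touches every order that ever referenced the company, which is why frequently changing data is a poor candidate for denormalization.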

 

The RavenDB example data

The de facto sample data for SQL Server is the Northwind database. And by sheer coincidence it so happens that RavenDB also ships with this same database, except now in document form. With lots of .NET developers being familiar with SQL Server, this Northwind database is often their first look at how a document database should be structured.

image

As you can see in the screenshot from the RavenDB Studio, a relatively small number of collections replaces the tables from SQL Server. Nice :-)

image

The structure used to save an order is also nice and simple, just the Order and OrderLine classes saved in a single document.

    using System;                       // DateTime
    using System.Collections.Generic;   // List<T>

    // An order and its lines are stored together in a single document.
    public class Order
    {
        public string Id { get; set; }
        public string Company { get; set; }    // identity of the customer document
        public string Employee { get; set; }   // identity of the employee document
        public DateTime OrderedAt { get; set; }
        public DateTime RequireAt { get; set; }
        public DateTime? ShippedAt { get; set; }
        public Address ShipTo { get; set; }
        public string ShipVia { get; set; }    // identity of the shipper document
        public decimal Freight { get; set; }
        public List<OrderLine> Lines { get; set; }
    }

    public class OrderLine
    {
        public string Product { get; set; }    // identity of the product document
        public string ProductName { get; set; }
        public decimal PricePerUnit { get; set; }
        public int Quantity { get; set; }
        public decimal Discount { get; set; }
    }

 

One missing thing

Nice as this may be, there is one thing missing. Other than the name of the product being sold and its price, no data is denormalized. This means that even for the most basic display of an order we need to load additional documents. For example, the Company property in an order just contains the identity of a customer. If we want to display the order, the very least we have to do is load the company and display the customer’s name instead of its identity. And the same is true for the employee and shipper.

While this sample database is not denormalized, it turns out it is quite easy to do so ourselves.

 

Denormalizing the RavenDB Northwind database

The first step is to store the related name along with each referenced identity, as seen below.

image

 

The order is the same, but this time we can do common user interaction operations with just the one document, without loading additional documents. It turns out this is quite easy to do. The RavenDB documentation has a nice description of how to do that using INamedDocument and DenormalizedReference<T>. This technique makes it really easy to work with denormalized data consistently and to create a document structure like the one above. The changes to the Order and OrderLine classes are minimal. All I had to do was replace the string typed Company, Employee, ShipVia and Product properties with ones of type DenormalizedReference<T>, for example DenormalizedReference<Company>.

    public class Order
    {
        public string Id { get; set; }
        public DenormalizedReference<Company> Company { get; set; }
        public DenormalizedReference<Employee> Employee { get; set; }
        public DateTime OrderedAt { get; set; }
        public DateTime RequireAt { get; set; }
        public DateTime? ShippedAt { get; set; }
        public Address ShipTo { get; set; }
        public DenormalizedReference<Shipper> ShipVia { get; set; }
        public decimal Freight { get; set; }
        public List<OrderLine> Lines { get; set; }
    }

    public class OrderLine
    {
        public DenormalizedReference<Product> Product { get; set; }
        public string ProductName { get; set; }
        public decimal PricePerUnit { get; set; }
        public int Quantity { get; set; }
        public decimal Discount { get; set; }
    }

 

The DenormalizedReference<T> class and the INamedDocument interface are also really simple and come straight from the RavenDB documentation.

    public class DenormalizedReference<T> where T : INamedDocument
    {
        public string Id { get; set; }
        public string Name { get; set; }

        // Lets a full document be assigned directly to a reference property;
        // only its Id and Name are copied.
        public static implicit operator DenormalizedReference<T>(T doc)
        {
            return new DenormalizedReference<T>
            {
                Id = doc.Id,
                Name = doc.Name
            };
        }
    }

    public interface INamedDocument
    {
        string Id { get; }
        string Name { get; }
    }

 

The implicit cast operator in DenormalizedReference<T> makes using this really simple. Just assign a document to the property and it will take care of creating the proper reference.

    var order = session.Load<Order>("orders/42");
    order.Company = session.Load<Company>("companies/11");
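For that assignment to compile, Company has to implement INamedDocument. The sketch below shows the conversion at work without a database. The Company and Order classes here are cut-down assumptions (the real Northwind classes have more properties, and the Phone property and sample values are made up), and DenormalizedReference<T> and INamedDocument are repeated from the listing above so the example stands alone.

```csharp
using System;

public interface INamedDocument
{
    string Id { get; }
    string Name { get; }
}

public class DenormalizedReference<T> where T : INamedDocument
{
    public string Id { get; set; }
    public string Name { get; set; }

    public static implicit operator DenormalizedReference<T>(T doc)
    {
        return new DenormalizedReference<T> { Id = doc.Id, Name = doc.Name };
    }
}

// Assumed, simplified master document.
public class Company : INamedDocument
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Phone { get; set; }   // stays on the master document only
}

// Assumed, simplified order with a single denormalized reference.
public class Order
{
    public string Id { get; set; }
    public DenormalizedReference<Company> Company { get; set; }
}

public static class ImplicitCastDemo
{
    public static void Main()
    {
        var company = new Company { Id = "companies/11", Name = "Example Traders", Phone = "555-0100" };
        var order = new Order { Id = "orders/42" };

        // The implicit operator copies only Id and Name into the order.
        order.Company = company;

        Console.WriteLine(order.Company.Id + ": " + order.Company.Name);
        // prints "companies/11: Example Traders"
    }
}
```

Note that the operator reads doc.Id without a null check, so assigning a null document, for example when a load comes back empty, would throw a NullReferenceException.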

 

One useful extension method

Loading the single document and doing common operations should be easy now, but there are still operations where you will need more data from the related entity. Loading that entity is easy enough.

    var customer = session.Load<Company>(order.Company.Id);

 

However, with DenormalizedReference<T> the structure and type are already captured in the Order class. Combining that with a simple extension method makes the code even simpler, which is always nice :-)

    public static class IDocumentSessionExtensions
    {
        // Loads the full document that a denormalized reference points to.
        public static T Load<T>(this IDocumentSession session, DenormalizedReference<T> reference)
            where T : INamedDocument
        {
            return session.Load<T>(reference.Id);
        }
    }

 

This simple extension method lets us load the customer as follows:

    var customer = session.Load(order.Company);





 



That saves another few keystrokes and is completely type safe. Sweet :-)



 



Enjoy!
