There are many ways of storing data when developing applications, some more mature and capable than others. Storing data of one sort or another is extremely common, as almost every application out there needs to store data in some way. After all, even a game usually stores the user’s achievements.
But it’s not games I am interested in. Sure, they are interesting to develop and play, but most developers I know are busy developing line of business (LOB) applications of one sort or another. One thing line of business applications have in common is that they work with, and usually store, data of some sort.
When looking at data oriented applications we can categorize data storage architectures based on different characteristics and capabilities.
Level 0: Data Dumps
The most basic way of working with data is just dumping whatever the user works with in the UI to some proprietary data file. This typically means we are working with really simple Create Read Update Delete (CRUD) type data entry forms and not even storing the data in a structured way. This is extremely limited in capabilities and should generally be avoided. Any time you have to work with a slightly larger data set or update the structure you are in for a world of hurt.
Level 1: Structured Storage
At level 1 we are still working with CRUD style data entry forms but at least we have started using a formal database of some sort. The database can be a relational database like SQL Server or MySQL but a NoSQL database like MongoDB is equally valid. While the database allows us to do much more, the user interface is not much better. We are still loading complete objects and storing them in a CRUD fashion. This might work reasonably well in a low usage scenario with a low chance of conflicts but is really not suitable for anything more complex than a basic business application. We are only storing the current state of the data, and as the database stores whatever is sent from the UI or the business processing code, there is no sense of meaning attached to any change made.
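To make the "no sense of meaning" point concrete, here is a minimal sketch of a level 1 save using Python's built-in sqlite3 module. The customer table, its columns and the save_customer helper are all made up for illustration and are not taken from any real application.

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, credit_limit INTEGER)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Alice', 1000)")


def save_customer(conn, customer):
    """CRUD style save: overwrite the whole row with whatever the UI sent."""
    conn.execute(
        "UPDATE customer SET name = ?, credit_limit = ? WHERE id = ?",
        (customer["name"], customer["credit_limit"], customer["id"]),
    )


# The UI loads the complete object, changes a field and sends it all back.
save_customer(conn, {"id": 1, "name": "Alice", "credit_limit": 5000})

row = conn.execute(
    "SELECT name, credit_limit FROM customer WHERE id = 1"
).fetchone()
print(row)  # ('Alice', 5000)
```

After the update the database only knows that the credit limit is 5000. Who raised it, why, and what it was before are all gone.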
Level 2: Command Query Responsibility Segregation
When we need to develop better and more complex business applications we really should use Command Query Responsibility Segregation (CQRS) as a minimum. In this case we separate the read actions from the write actions. We no longer just send an object to be stored from the user interface to the back end; instead we send commands to update the data. These commands should be related to the business actions the application works with. In other words, if a business analyst sees the command names they should be able to make sense of what the commands do without looking at the code implementations.
While this is a lot better, we are still only storing the current state of the data. And that is the problem, as it can be very hard to figure out how something got to be in a given state. So if a user detects that something is wrong with the data and suspects a bug in the program, we might have a hard time figuring out how it got to be that way. And once we do, fixing the issue might be extremely hard as well.
There are other limitations with just storing the current state, like not being able to produce the reports users ask for, or only with great difficulty. Altering business rules after the fact is just as hard. And if you think that doesn’t happen, just try working on a large government project where the slowness of the decision process means that rules are only finalized after the fact.
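As a rough sketch of what that separation could look like, the snippet below uses a hypothetical RaiseCreditLimit command with its own handler on the write side and a separate query function on the read side. All names and the in-memory store are assumptions made for illustration; a real CQRS setup would normally sit on top of a database and some kind of command dispatcher.

```python
from dataclasses import dataclass

# Hypothetical in-memory store standing in for a real database.
_customers = {1: {"name": "Alice", "credit_limit": 1000}}


@dataclass(frozen=True)
class RaiseCreditLimit:
    """A command named after a business action, not after a table."""
    customer_id: int
    new_limit: int
    approved_by: str


def handle_raise_credit_limit(cmd: RaiseCreditLimit) -> None:
    """Write side: validate the business rule, then change the state."""
    customer = _customers[cmd.customer_id]
    if cmd.new_limit <= customer["credit_limit"]:
        raise ValueError("new limit must be higher than the current limit")
    customer["credit_limit"] = cmd.new_limit


def get_credit_limit(customer_id: int) -> int:
    """Read side: a query that never changes anything."""
    return _customers[customer_id]["credit_limit"]


handle_raise_credit_limit(
    RaiseCreditLimit(customer_id=1, new_limit=5000, approved_by="bob")
)
print(get_credit_limit(1))  # 5000
```

The command name tells a business analyst what is going on, but note that the handler still just overwrites the stored value.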
Level 3: Event Sourcing
The most advanced level to be working at is using Event Sourcing (ES). An event sourced application resembles a CQRS style application in a lot of ways except for one vital part. With an event sourced application we are no longer storing the current state of the data; instead we are storing all the events that led up to it. All these events are stored as one big stream of changes and are used to deduce the current state of the data in the application. These events typically never change once written; after all, we don’t change history (although our view of it might change over time). This has some large benefits, as we can now track exactly how the state came to be as it is, making it easier to find bugs. And if the bug is in how we used those business events, then we can fix the bug and often that is enough to deduce the correct state.
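Deducing the current state from a stream of events is essentially a fold over that stream. Here is a minimal sketch, assuming a made-up bank account with three hypothetical event types; the event names and the dictionary used for state are illustrative choices, not part of any particular event sourcing library.

```python
from dataclasses import dataclass
from functools import reduce


# Hypothetical events; each one records something that already happened.
@dataclass(frozen=True)
class AccountOpened:
    owner: str


@dataclass(frozen=True)
class MoneyDeposited:
    amount: int


@dataclass(frozen=True)
class MoneyWithdrawn:
    amount: int


def apply(state, event):
    """Return the new state that results from applying one event."""
    if isinstance(event, AccountOpened):
        return {"owner": event.owner, "balance": 0}
    if isinstance(event, MoneyDeposited):
        return {**state, "balance": state["balance"] + event.amount}
    if isinstance(event, MoneyWithdrawn):
        return {**state, "balance": state["balance"] - event.amount}
    return state  # unknown events are ignored


def current_state(events):
    """Replay the whole stream, oldest event first."""
    return reduce(apply, events, None)


# The stream is append-only; existing events are never edited.
event_stream = [AccountOpened("Alice"), MoneyDeposited(100), MoneyWithdrawn(30)]

state = current_state(event_stream)
print(state)  # {'owner': 'Alice', 'balance': 70}
```

If the bug turns out to be in apply, fixing that function and replaying the same events gives the corrected state without touching the stored history.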
The usual queries done in an application are much harder on an event stream. To fix that, the events are usually projected out to a read model, which makes querying much easier. This read model is normally stored in some appropriate database like SQL Server or a NoSQL database but could also just be kept in memory. However, the event stream is the true source of truth and not the projections, as these are just a derived result. This means we can delete all projections and completely rebuild them from the existing events, resulting in much more flexibility. Need to do an expensive query in version two of an application? Well, just create a projection designed for that purpose and rebuild it from all previously stored events. This is similar to our view of history changing.
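The sketch below shows two projections built from the same stream: one that could have shipped in version one and one added later and built from the full history. The event shape (plain dictionaries with type, account and amount keys) and the projection names are assumptions for illustration, and the read models are kept in memory to keep things short.

```python
from collections import defaultdict

# Hypothetical stored events, oldest first.
events = [
    {"type": "MoneyDeposited", "account": "A", "amount": 100},
    {"type": "MoneyDeposited", "account": "B", "amount": 50},
    {"type": "MoneyWithdrawn", "account": "A", "amount": 30},
]


def project_balances(events):
    """Version one read model: the current balance per account."""
    balances = defaultdict(int)
    for event in events:
        if event["type"] == "MoneyDeposited":
            balances[event["account"]] += event["amount"]
        elif event["type"] == "MoneyWithdrawn":
            balances[event["account"]] -= event["amount"]
    return dict(balances)


def project_withdrawal_counts(events):
    """Version two read model, added later and built from the same events."""
    counts = defaultdict(int)
    for event in events:
        if event["type"] == "MoneyWithdrawn":
            counts[event["account"]] += 1
    return dict(counts)


balances = project_balances(events)
withdrawal_counts = project_withdrawal_counts(events)
print(balances)           # {'A': 70, 'B': 50}
print(withdrawal_counts)  # {'A': 1}

# Throwing a projection away is safe: it can always be rebuilt.
balances = None
balances = project_balances(events)
```

Neither projection is ever written to directly. Both are derived from the events, which is why dropping and rebuilding them loses nothing.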
There are some more benefits from storing events instead of just the current state. We can now do temporal queries, or queries over time, on how the data got to be how it is. These kinds of queries have a lot of uses, for example fraud detection. Another possibility is displaying the state at any previous point in time and running reports or analysis on the data as it was then.
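A temporal query can be as simple as replaying the events only up to a chosen moment. The following is a minimal sketch, assuming every event carries a timestamp and the stream is stored in chronological order; the tuple layout and the balance_as_of name are made up for illustration.

```python
from datetime import datetime

# Hypothetical timestamped events: (when, kind, amount), oldest first.
events = [
    (datetime(2024, 1, 1), "deposited", 100),
    (datetime(2024, 2, 1), "withdrawn", 30),
    (datetime(2024, 3, 1), "deposited", 500),
]


def balance_as_of(events, moment):
    """Replay only the events that happened at or before the given moment."""
    balance = 0
    for timestamp, kind, amount in events:
        if timestamp > moment:
            break  # relies on the events being in chronological order
        balance += amount if kind == "deposited" else -amount
    return balance


print(balance_as_of(events, datetime(2024, 2, 15)))  # 70
print(balance_as_of(events, datetime(2024, 12, 31)))  # 570
```

With only the current state stored, the first of those two questions simply cannot be answered.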
It’s kind of hard to say at what level you should be working. Level 0, limited as it is, might be appropriate for your application. Lots of applications are at level 1 and are just basic forms over data CRUD applications. In some cases that might be appropriate but in a lot of cases it is actually suboptimal. Level 2 with CQRS is a pretty sweet place to be. You can capture the business intent with commands and have reasonable flexibility. At level 3 with event sourcing you gain a lot of flexibility and strength. If you are doing a more complex business application you should be working at this level. But as always there is no free lunch, so don’t go there if the application is really not that complex.