Sign up with your email address to be the first to know about new products, VIP offers, blog features & more.
[mc4wp_form id="4890"]
Zapisz Zapisz

Data – Fact or Opinion?

Posted on 0 No tags 0

There’s a lot of available information talking about different types of database technology, how to format your data, how to index it for fast retrieval – you name it. At this point though I’d like to talk about something a little more abstract: what to store, and where to store it. One way to divide data that I strongly recommend you follow is to always keep in mind whether a piece of data is unarguable fact or business-logic derived opinion. I also suggest that documents or tables only ever contain one of those two pieces of data.

To explore the difference let’s take one of the classic database examples, checking accounts, and play with it for a bit. Often used to illustrate the examples of transaction handling, the classic example has an account table that contains an identifier and a balance:

ID      | Balance
--------+---------
Al      |    $100
Bob     |    $100

When Al pays Bob $20, transactions make sure that the UPDATE statement that adds $20 to Bob’s account and the one that subtracts $20 from Al’s account either both occur or both fail – which is a pretty big deal in this model. This also means that that action has to get properly serialized with one in which Bob is paying Charlie, though, even if that shouldn’t affect Al in the slightest…

The thing is that the whole nature of a “current balance” is somewhat fictitious. Let’s say that on the same day Bob pays out a $110 check. Does it bounce? That depends on whether the debits are applied before or after the deposits, which is ultimately a business decision. By tracking information in this way we’re combining things that actually happen (facts) with how we perceive them (opinions).

To break these out, let’s start by moving to two tables. One of them, the fact table, contains events:

EVENT 1 : Monday    : Al deposits $100
EVENT 2 : Tuesday   : Bob deposits $100
EVENT 3 : Wednesday : Charlie presents $110 check against Bob's account
EVENT 4 : Wednesday : Al pays Bob $20

Now when the nightly batch runs and the bank determines whether or not to pay the check presented in EVENT 3, business logic can be applied. It would be common to have a process that runs through the days events and applies all of the deposits before applying the withdrawals, for example. This process might actually update an account table as described above, but that table would be ephemeral and could be rebuilt at any time by “replaying” the event information. At the end of the night an “EVENT 5 : Bob pays Charlie $110” event could be added to the list.

A change in business logic might affect the way that the sequence is interpreted next week (possible resulting in a “EVENT x : Charlie’s check is returned NSF” row instead). The very fact that the same events might create a different result indicates that the account table is more business-logic based than fact-based. That’s nice for another reason, by the way – the database is by far the hardest part of your system to scale, so almost any time you can do more work in the app after the fact and put less logic in the database, you come out a winner.

One advantage to this is that these event-based tables are inherently WORMs – Write Only Read Many systems. You can easily have distributed event capture for later reconciliation. Even if you get into a situation with partial system outages, and Al withdraws $70 twice in the same day, the fact that he did so should be logged as true. There may be business-level consequences to those actions, but that doesn’t change the fact that the actions occurred.

In that vein, resist any urges to update an event after its entered. You don’t know what business systems have already taken action based on the existence of the event, and figuring that out is a real PITA. Let’s say for example that after the 4 event series above its determined that the check that Charlie presented was for $11 rather than $110 and had been mis-entered. If the event was adjusted, that wouldn’t account for the fact that either Charlie was sent money or Bob was sent an NSF notification, both real-world events that can’t be rolled back.

The desire to do such an update actually indicates missing information in the captured event. Let’s reword them as follows:

EVENT 1 : Monday    : Teller 1 claims that Al deposits $100
EVENT 2 : Tuesday   : Teller 1 claims that Bob deposits $100
EVENT 3 : Wednesday : Teller 3 enters that Charlie presents $110 check 
                    : "24601" against Bob's account
EVENT 4 : Wednesday : Teller 1 claims that Al pays Bob $20 in the bank
EVENT 5 : Wednesday : Automated system QBRT transfers $110 to Charlie's bank from 
                    : Bob's account for check "24601"

All of these things have happened. Now we can add:

EVENT 6 : Thursday  : Manager 6 adjusts check amount "24601" from $110 to $11

Which will possibly inspire:

EVENT 7 : Friday    : Automated system QBRT presents Charlie's bank with claim for 
                    : $99 for check "24601"
EVENT 8 : Friday    : Automated system QBRT deposits credits $99 to Bob's account 
                    : related to check "24601"

At this point a business decision might dictate that Bob’s statement only reflect the actual amount of the $11 check, hiding the bank’s mistakes – a perfectly acceptable decision since no NSF notices were drafted. That should never be mirrored in the real world log of events, though, since the money transfer did actually happen.

This has been greatly simplified; in reality there would be a large number of separate fact-based event tables (at Kimbia, for example, we keep track of a gateway communication being initiated and receiving a response as two separate but related events), and there’d be a lot of “cached” business logic derived states. Additionally, business-logic initiated actions (like mailing a statement or even presenting information in a webapp) might be saved, since they’d represent real-world events that can never be rolled back, and might alter future business-logic actions (such as whether or not to admit the $99 error discussed above to the account holder).

This approach of maintaining these separate views of the world may appear trickier at first, but where it can be followed you’ll often find that systems become easier to scale and easier to enhance in the future.

signature