Data Model Examples to Help You Predict the Future
Data Model Examples to Help You Predict the Future

Hi, my name is DJ Spiess. One common theme
I often see in data model examples is a data model design that forces you to update records.
Let’s say you have an instrument that collects data for you. It can be online, offline, initializing,
collecting, sleeping, waking up, and in error. The status is kept in a field on the instrument
record. Everytime the status changes, you update the record. Easy. Then the instrument goes down, and after a
week of being in error someone finally notices. You look at the instrument record, and yep,
it’s in error. The question is inevitably why? You’re not sure so the next question
is, what was it doing before the error occurred. If you’re writing over your status every time
you update it, you can’t answer that. You can’t even tell when it went down. In this video, we’re going to look at different
ways you can track status in your applications. Ultimately what I’d like to show you, is why
updating your database is a bad idea in general. There are exceptions, but for most cases you
want to just append data. Finally we’ll look at some of the benefits of storing a state
history in your data model, including predicting the future with data mining! So in our first data model example we have
an instrument with a status. It also has an id, name, manufacturer number, description,
and installation location. We just saw that updating the record is really a bad idea.
We can’t see a history of the statuses, when it changed status, etc. Unlike the id, name,
description, and maunufacturer number, the status is something that is likely to change.
In fact, I’d argue the installation location is just like the status. It should not be
part of the instrument record. Status and location are things that are likely to change. One data model pattern I’ve seen for status
is to create a status table like this. The status table has the name of the status, the
timestamp, the previous state, and a reference back to the entity for which this is a record.
This is great because you can perform a query where the instrument id is the entity you
want. You can even narrow your search for the last week, or what ever you need. The
advantage of this approach is it doesn’t affect your Instrument table. The concept of a status
is in a different table. The downside to this data model example is
when you have many things you need to status, and you want to easily add more. You wouldn’t
want to duplicate the status table for every type, so you include a second pointer back
to the next entity type. This doesn’t scale well, and gets ugly as you get more statusable
things to track. This means you have to move the reference
back into your statusable entity. Here’s another approach. Let’s add a status history table
id in our instrument table. We’ll make the business rule that the status history is created
once when the instrument is created. You can enforce this many ways, but that’s another
topic. The advantage of this data model example is
when we need to add new statusable entities, we just include a mandatory key reference
back to the status history. How we do statuses never changes, and it doesn’t grow as we add
more things to status. The other important thing to note, is this
doesn’t just apply for status. When we introduced this problem earlier, we also mentioned we
would not like location in our instrument definition. You would break out location in
a similar way to how we broke out status. You could even add two new statuses, installed
and uninstalled, to work with your location data. Which data model example is better really
depends on how likely you are to add statusable things, and how many. If you’re often add
things to your data model or if you have more than 10 statusable things, I’d use the status
history. If your list of statusable things is small and unlikely to change, use the single
status table. Both patterns can grow to include allowable statuses, status types, etc. Anything
to make your data mining easier. What we’re breaking up is definitional data
vs operational data. Things that define our instrument go into the instrument table. Things
that define it’s operational state should be tracked in a state table. Every record
in your operational table should define a single fact. In the example earlier, we are
defining the state of the instrument for a single datetime. Repeats in the data will
not change the current status of our instrument. The most recent status is the status of the
instrument. Of course we could do this without keeping
the history of the state. We’re really just normalizing the data. Why would we want to
keep the history? To predict the future, and any future questions you don’t know you need
to answer. Of course we could do this without keeping
the history of the state. We’re really just normalizing the data. Why would we want to
keep the history? To predict the future, and any future questions you don’t know you need
to answer. Whenever you update data, you’re really deleting
data and then doing a new insert. When you delete data, you lose information, including
information you didn’t know you needed. If someone comes to you in the future and says
“hey the 90210 instrument type seems sketchy, how often are they going into an error state”,
you can’t answer the question unless you’re keeping the history of statuses. If you only append new data, and never update
or delete, this allows you to answer questions you didn’t know you needed to. You could tell
your boss, the 90210 instruments seem to fail more than the 80205s. In fact, you could even
make a future prediction when 50% will have failed and need to be replaced. That helps
you predict required budgets, time needed, and so on. It’s one of the ideas behind big
data. When I work on database modeling, if I come
across a point where I need an update – that’s a red flag for me. This doesn’t mean updates
are completely evil, it means I need to think about the problem more. Data should not be
deleted until triggered by your data retention policy. Keep everything until you’re certain
you don’t need it. Even then it’s part of a data deletion process, not your day-to-day
business as usual activities when you do delete data. Hopefully this has given you a few ideas on
how to track status in your data. Make sure to subscribe so you can keep up to date with
new videos. See you in the next video!

2 thoughts on “Data Model Examples to Help You Predict the Future”

  1. Juan Vasquez says:

    having a log for your status is a good idea… tx again

  2. Dominic Shaw says:

    At the end of your video (around 6:03), the video replays what you said from 5:23 to 6:02 (minus the clap at the end though).

Leave a Reply

Your email address will not be published. Required fields are marked *