Analyse data straight away to ease future development

May 18th 2014

I'm a fan of analytical applications. Lord knows I've written quite a few in my time.

The absolute central component of analytical apps, is the data behind them. This data is quite often huge sets, with extremely complex structures linking it all together.

Over time, things can change that will invalidate certain information. Dead data is absolutely useless to everyone involved. It costs money to store, and more time and money to make worthwhile again.

Take a users IP address for example. An IP address is used typically to identify the geographical information for a users visit. They are also (typically) extremely dynamic. An IP address for a user in London one day, could easily be assigned to a user in Newcastle the next.

So, what's the deal?

OK. So, let's pretend you're designing a new system to give your users an insight into visitor information. Your data looks something like this:

Text Snippet:
{
    "ip_address": "123.123.123.123",
    "time": 1104537600,
    "date": "01 / 01 / 2005 @ 0:0:0 UTC"
}

That's cool. But, it's 2014. When you try to map this IP to a physical geographical location, it'll be wrong (99.9% of the time).

What you really need to do is to identify the geographical information associated to the IP immediately. Before you store the document you should add the dynamic data to it. A well structured document would look something like this:

Text Snippet:
{
    "ip_address": "123.123.123.123",
    "geo": {
        "country": "GB",
        "city": "London"
    },
    "time": 1104537600,
    "date": "01 / 01 / 2005 @ 0:0:0 UTC"
}

Not to keep reiterating this, but I can't stress it enough. Mapping the information immediately makes it a lot easier in the future. Regardless of whether you think you'll EVER use it, or display it. Maybe one day you'll ask yourself: "I wonder how many users came from London.". When you do, you'll be able to easily get a count.

Actually, thinking about it, there 'could' be ways to retrospectively get the geographical information, but that would be an absolute pain, and I wouldn't envy the person who needed to do that!

To summarise: Analyse data straight away and make sure you've collected all you can when it comes to dynamic sets.


Visit me on Google+

Leave a comment