What Is Big Data? And How Is a Statistician Different From a Data Analyst?

In case you've been hearing about big data and don't yet know what exactly it means, today is your lucky day.

First, a little background to help put things in the right perspective. Data is simply information that has been gathered and can be stored. Twenty years ago, gathering information was relatively expensive. It often required people going around with forms, as in a national census. Then another set of people would be hired to enter the data into a storage and retrieval system. And those storage and retrieval systems were so expensive that only big organizations could afford a significantly sized one.

Recently, all that has changed. It is now dirt cheap, and sometimes free, to own a storage and retrieval system (examples are your Facebook account and your email account). The gathering process is now digital and automated, with no need for people and forms. We now have more data than we know what to do with. And that is the foundation on which big data is built.

Big data is when you automatically gather and store all the data about your business (or life, yours or someone else's), and it runs into billions upon billions of records. Then you decide you are not going to discard anything but will try to use everything, in the hope of learning things you never knew and making accurate predictions. That is big data. It is how Facebook figures out people you may know.

image: collegerag.net

Now to the second part: Statistician vs Data Analyst.

In those days of forms and pens for data gathering, there were data entry people (who gathered and cleaned the data for use) and there were statisticians, who worked on the data (often a sample of a bigger population) and tried to make inferences and predictions.
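To make the idea of working from a sample concrete, here is a minimal Python sketch of estimating a population average from a small sample. Everything in it is an assumption made up for illustration: the "population" is 100,000 random household sizes, the sample size is 500, and the helper name estimate_mean is hypothetical. The interval uses the common normal approximation (z = 1.96 for roughly 95%), which is only one of several ways a statistician might do this.

```python
import math
import random
import statistics


def estimate_mean(sample, z=1.96):
    """Return (mean, low, high): the sample mean with an approximate 95% interval."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    stderr = statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - z * stderr, mean + z * stderr


# Hypothetical population: made-up household sizes between 1 and 10
rng = random.Random(42)
population = [rng.randint(1, 10) for _ in range(100_000)]

# The statistician only gets to see a small sample of it
sample = rng.sample(population, 500)

mean, low, high = estimate_mean(sample)
print(f"estimate from sample: {mean:.2f} (roughly {low:.2f} to {high:.2f})")
print(f"actual population mean: {statistics.mean(population):.2f}")
```

The point of the sketch is the contrast: the statistician looks at 500 values and reasons about 100,000, while the big data approach described above keeps and uses all of them.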

These days, there are no more data entry people. Computers, chips and technologies now abound that gather data automatically and put it in usable form. The people needed to work on this data are those who can work with these new technologies, mine data on a scale traditional statisticians never experienced, and build analyses that traditional statistical tools can't handle.

In a sentence, data analysts are the new statisticians. The real ones know as much statistics as a statistician but use new tools a traditional statistician doesn't use. 

Unfortunately, there are more fake data analysts than genuine ones. I am one of the fake ones too, but I am already fixing that.


  1. You must be kidding me. Thanks for clarifying though. I like to think of myself more as a data analyst than a statistician though. I'm even looking forward to doing a postgraduate program in data analysis.

    1. Hello Chief,

      No, I'm not. I came into this data analysis by chance. So I have to now go through the right learning.

      Great to hear you are planning on a more structured formal learning at an advanced level. That's super cool!

  2. Nice Article! Lool @ the last statement! I would recommend running Data Analysis courses on Udacity, Coursera and Lynda.com. I am a Data Analyst myself. Good work on your site and Blog. Keep it up!


