Tutorial On Creating A Frequency Distribution Chart With Microsoft Excel, R and Python

I recently joined an amazing community of data scientists in Nigeria. We are a mix of experts and beginners. Today, I created a tutorial for the beginners showing how to do a common task, plotting a frequency distribution, in both Python and R. I also decided to include my dearest Microsoft Excel as a control.

The sample data is fictionalized data for Dominos Pizza Nigeria: one day of sales for their Lekki branch. You can download the raw data file to practice along here: https://dl.dropboxusercontent.com/u/28140414/Dominos%20Pizza.csv



So the business question we want to tackle is: Is there a pattern in the quantities each customer buys? To be more specific, we want to examine the frequency distribution of the quantities purchased per sales transaction.
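To make "frequency distribution" concrete before opening any tool, here is a minimal sketch in plain Python. The quantities below are made-up illustrative values, not the Dominos Pizza data; the idea is simply to count how many transactions had each quantity.

```python
from collections import Counter

# Hypothetical quantities per sales transaction (illustrative only, not the real data)
quantities = [1, 2, 2, 1, 3, 1, 2, 5, 1, 2]

# Count how many transactions had each quantity
frequency = Counter(quantities)

# Print the frequency table in order of quantity
for quantity in sorted(frequency):
    print(quantity, frequency[quantity])
```

A histogram is just this table drawn as bars, which is what each of the three tools produces.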

In Excel, it is extremely straightforward. Just plot a histogram on the quantity field.



Now let's do the same with R.

I use R 3.3.2 and RStudio.

First, I import the csv file into RStudio.






Though it is not necessary for what we want to do, I like to run the summary command on the dataframe/table for any data I bring into R. > summary(Dominos_Pizza)


Again, this is not a required step: I check out the standard plot of the Quantity field. > plot(Dominos_Pizza$Quantity)



Finally, I create the histogram of the Quantity field. > hist(Dominos_Pizza$Quantity)




For now I don't bother customizing the graph elements (labels, color, title, etc.)

It is Python time.

I use Rodeo IDE and Anaconda. 



I import Pandas and use it to read in the csv file. 
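A minimal sketch of this step might look like the following. The Quantity column name comes from the R commands above; the Pizza column, the row values, and the variable name dominos_pizza are assumptions made for illustration, and an in-memory stand-in replaces the downloaded file so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for the downloaded file so this sketch is self-contained.
# With the real file you would call pd.read_csv("Dominos Pizza.csv") instead.
# The "Pizza" column and all row values here are hypothetical.
sample_csv = io.StringIO(
    "Pizza,Quantity\n"
    "Flavor A,2\n"
    "Flavor B,1\n"
    "Flavor A,3\n"
    "Flavor C,1\n"
    "Flavor B,2\n"
)

dominos_pizza = pd.read_csv(sample_csv)

# Rough counterpart of R's summary(): basic statistics for the numeric columns
print(dominos_pizza.describe())
```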






And here is the basic plot, like the one we did in R.
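One way to get a plot similar to R's plot() on a single column is sketched below. It assumes matplotlib is installed alongside pandas (it ships with Anaconda), and it uses made-up quantities rather than the real file.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line to see the plot in an IDE

import pandas as pd

# Hypothetical quantities per transaction (illustrative only, not the real data)
quantity = pd.Series([1, 2, 2, 1, 3, 1, 2, 5, 1, 2], name="Quantity")

# Plot each value against its row index as points, like R's plot() on a vector
ax = quantity.plot(style="o")
ax.set_xlabel("Transaction index")
ax.set_ylabel("Quantity")

# Save the chart to an image file (the file name is arbitrary)
ax.figure.savefig("quantity_plot.png")
```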



Finally, I create the histogram.
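A sketch of the histogram step, again on made-up quantities and assuming matplotlib is available. The bin edges are my own choice, placed at half-integers so each bar sits over a whole-number quantity; calling hist() with no arguments would use pandas' default of 10 bins.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line to see the plot in an IDE

import pandas as pd

# Hypothetical quantities per transaction (illustrative only, not the real data)
quantity = pd.Series([1, 2, 2, 1, 3, 1, 2, 5, 1, 2], name="Quantity")

# One bin per whole-number quantity from 1 to 5
bin_edges = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]

# Series.hist draws the histogram and returns the matplotlib Axes
ax = quantity.hist(bins=bin_edges)
ax.set_xlabel("Quantity")
ax.set_ylabel("Number of transactions")

# Save the chart to an image file (the file name is arbitrary)
ax.figure.savefig("quantity_hist.png")
```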




I will try to follow up with more tutorials on complex tasks, including some that are best suited to R and others that are best suited to Python. As for Excel, it is in a completely different class: it is a spreadsheet application.

Got any particular task you would like me to create a tutorial around? Ask away!

6 comments:

  1. Hello Michael, thank you for this easy-to-follow tutorial. I am a beginner in R and was able to replicate your steps. I was also at the Bootcamp but your friend F. Okoye sent me this link. Thanks again.

    1. Cool. Glad to hear you were also at the Bootcamp. I've got to thank Francis for the kindness.

      Thanks for trying out the tutorial steps. I hope to publish more in the future.

  2. Amazing! I went a little further and used R to sort and sum the total orders for each pizza flavor, to check which was the best selling for that day. I got Pepperoni Suya. That took me quite some time as a beginner, but it was exciting to get it right. Please, I would appreciate a lecture, or a link to one, on cleaning data before analyzing it. Thank you.

    1. That's impressive!

      I will try to create other tutorials around data cleaning.

  3. Hi Michael

    Useful Post. Any more information about the "amazing community of Data Scientists in Nigeria" you mentioned you joined?

    1. Hi Layibiyi,

      Well, we now have a vibrant WhatsApp community and are gearing up for the next bootcamp.

      I have met other amazing data analysts in the community.

      Cheers.

