Setting Up My First Amazon Web Services (AWS) Virtual Server for Data Analytics

Today, I finally joined the AWS train. As part of the big data analytics project I am working on for a client, I need to host a cloud server that mines data from Twitter and Google and saves it daily to a CSV file. Ordinarily, I would have used my laptop as the server for this task, but since this is a commercial project and I can pass the cost on to the client, I went for a dependable, affordable virtual server instead.

I then went on Google to search. I didn't want to go with Microsoft Azure, as I have had issues with its payment system and I heard Azure isn't the most pocket-friendly option for a very small business. I stumbled on Alibaba Cloud, but before long I settled on Amazon Web Services.



Setting up was really easy; easier than I expected, since the common refrain online is that AWS is not as straightforward to use as Azure. Now, having experienced both, I disagree.


I used the EC2 service and created a t2.micro Windows Server 2012 R2 instance.
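For anyone who prefers launching instances from code rather than the console, the same step can be scripted with boto3. This is a minimal sketch; the AMI ID, key pair name, and region below are placeholders I made up for illustration, not values from my actual setup.

```python
def launch_params(ami_id, key_name):
    """Build the parameter dict for EC2 run_instances.

    t2.micro matches the instance type I chose in the console;
    ami_id should be a Windows Server 2012 R2 AMI for your region.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": "t2.micro",
        "MinCount": 1,
        "MaxCount": 1,
        "KeyName": key_name,  # key pair used to decrypt the Windows admin password
    }


def launch_instance(ami_id, key_name, region="us-east-1"):
    """Actually launch the instance. Assumes AWS credentials are configured locally."""
    import boto3  # imported here so the parameter helper works without boto3 installed

    ec2 = boto2 = boto3.client("ec2", region_name=region)
    return ec2.run_instances(**launch_params(ami_id, key_name))
```

The split between building parameters and making the API call keeps the launch settings easy to review (and test) before anything is actually billed.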




After connecting to the server via Remote Desktop Connection, I set it up for my data analysis work. I installed Microsoft R Open and RStudio for my R-based analytics work, and Anaconda and PyCharm for my Python-based analytics work.

Below is what my server setup looks like.


I even pinned my commonly used tools to the taskbar -- PowerShell, Task Scheduler, RStudio and PyCharm.

What is left is to estimate how much this will cost me per month and whether I need to automate startup/shutdown of the server to save compute time and costs. Part of the startup/shutdown decision depends on whether Twitter approves my request for unrestricted access to their Tweet database without charging me some crazy amount. I have already initiated the request, and they have asked me to provide some information they will use to decide whether to grant it. My current workaround for the limit on how many tweets I can scrape per 15 minutes is to run the scraping script every three hours; even then, I don't think I get a representative sample of tweets, and some tweets keep showing up repeatedly.
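One way to deal with the repeated tweets between three-hourly runs is to deduplicate on tweet ID before appending to the daily CSV. The sketch below is my own illustration, not the actual script: the helper name and the two-column CSV layout (tweet ID, text) are assumptions.

```python
import csv
import os


def append_new_tweets(csv_path, tweets):
    """Append only tweets whose IDs are not already in the CSV.

    `tweets` is an iterable of (tweet_id, text) pairs.
    Returns the number of new rows written.
    """
    # Collect tweet IDs already saved in earlier runs (first column).
    seen = set()
    if os.path.exists(csv_path):
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if row:
                    seen.add(row[0])

    # Keep only tweets we have not stored before.
    new_rows = [(tid, text) for tid, text in tweets if tid not in seen]

    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(new_rows)
    return len(new_rows)
```

Running this at each scheduled scrape means a tweet that shows up in several consecutive windows is only stored once, which also makes the daily CSV a cleaner input for the analysis stage.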

Overall, I am loving my new adventure into the deep side of data analytics, and I will keep you all updated on my progress and share what I learn.
