Using Jupyter Notebook for my first data project

Recently I’ve gained an interest in Jupyter Notebook, datasets, and acquiring data. Of course, I just had to spin up a pet project for such an adventure, so I decided to build it in Jupyter Notebook.

I honestly wasn’t sure what to do for this notebook, as I really just wanted to learn how to use Jupyter Notebook, plot data, and so on. I’ve always enjoyed watching videos and reading about how expensive the Hong Kong housing market is. After realizing that the Government of Hong Kong provides a decent bit of economic data for public use, I decided to chart its data on the housing market.

Starting Off With Pandas

My first challenge was acquiring the data, so I decided that I needed multiple scripts to pull the economic data. From what I read while figuring out how to extract data from spreadsheets, Pandas seemed like the best candidate.

Occupations and monthly salary data retrieved with Pandas

With the exception of the capital markets script, all the scripts for the project were written in Python. Why was capital markets in PHP? The capital markets data came in JSON format, which was way easier to work with than spreadsheets, so I wrote that one in PHP with the thought of migrating it to Python in the near future. The data provider for my capital markets data also suggested that I pull the most recent data every time. However, I wanted to extract the data and keep my own copy, and since it was only updated monthly or so, I didn’t need to fetch the latest data when it was going to be the same for the entire month.
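The idea of not re-pulling data that only changes about once a month can be sketched in Python, the language this script was meant to migrate to. This is only a rough illustration and not the code from the project: the function name `load_with_cache`, the `fetch` callable, and the 30-day cutoff are all assumptions made up for the example.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable

# Assumption: the data changes roughly monthly, so a cached copy younger
# than 30 days is treated as fresh.
MAX_AGE_SECONDS = 30 * 24 * 60 * 60


def load_with_cache(
    cache_path: Path,
    fetch: Callable[[], Any],
    max_age: float = MAX_AGE_SECONDS,
) -> Any:
    """Return the cached JSON if it is recent enough, otherwise fetch and cache it."""
    if cache_path.exists():
        # Age of the cache file, based on when it was last written.
        age = time.time() - cache_path.stat().st_mtime
        if age < max_age:
            return json.loads(cache_path.read_text(encoding="utf-8"))

    # `fetch` is a hypothetical callable that returns the decoded JSON,
    # for example a small wrapper around a request to the data provider.
    data = fetch()
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

With something like this, the provider only gets hit when the local copy is missing or older than the cutoff, and every other run reads the saved file.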

It turns out Pandas does a lot of the heavy lifting when it comes to parsing spreadsheets. It took a bit of time to figure out how the data was structured once the spreadsheet had been parsed, and even with the data structure printed out in front of me, it still takes some trial and error to figure out how to read individual elements.

Pandas DataFrame for Consumer Price Indices
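For anyone going through the same trial and error, here is a minimal sketch of the ways I know to poke at a DataFrame and read elements out of it. The column names and numbers are made-up placeholders, not real Consumer Price Index figures, and a real frame would come from parsing the downloaded spreadsheet instead of being typed in by hand.

```python
import pandas as pd

# Made-up placeholder values, NOT real CPI figures.
df = pd.DataFrame(
    {
        "Year": [2001, 2001, 2002],
        "Month": [1, 2, 1],
        "Index": [10.0, 11.0, 12.0],
    }
)

print(df.columns.tolist())  # column labels, in order
print(df.dtypes)            # the type Pandas inferred for each column
print(df.head())            # the first few rows

# Three common ways to read elements:
first_row = df.iloc[0]         # by position: a Series holding the first row
index_column = df["Index"]     # by column label: a Series holding one column
one_cell = df.loc[1, "Index"]  # by row label and column label: a single value

# Boolean filtering: keep only the rows where Year is 2001.
rows_2001 = df[df["Year"] == 2001]
```

Checking `df.columns` and `df.dtypes` first usually saves some guessing, since parsed spreadsheets often end up with column labels that differ from what the sheet looks like on screen.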

A Website To Accompany The Notebook

I wanted an alternative to just a Jupyter Notebook for displaying the data, and thought that letting people easily read it through a website could be useful. My first reaction to the plan was telling myself “Another website!?” because, to be honest, I had gotten so sick of web projects. Eventually, I was like “you know what, I’m gonna spend most of the time on the Jupyter Notebook anyways” and decided to “greenlight” the website idea.

HK Housing Stats website screenshot

Back to PHP

Initially I wanted to create the website in Node.js with Express.js. However, after realizing I was spending more time trying to figure out how to connect to the MySQL database from Node.js than actually building the website, I decided to scrap the idea for now and move back to PHP. I really did not want to push through another PHP project, but here we go again, I guess.

Switching to PHP made the website development go so much faster, and I can see why they call the LAMP stack one of the easiest to use.

Charting And Data Troubles

I can’t tell whether I messed up the data types when inserting into the database, didn’t properly understand how to chart with matplotlib, or both, but there was a decent bit of converting I had to do to get matplotlib to recognize the series of values I wanted it to plot on the X and Y axes.

My first attempt at charting data from my database (the failed HSI monthly price chart)
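One common cause of charts like this, and only a guess at what happened here, is that the values arrive from the database as text. matplotlib treats strings as categories and plots them in the order they appear instead of as numbers or dates. A minimal sketch of the kind of converting involved, assuming both columns came back as strings (the column names, the rows, and the `plot_prices` helper are all made up for the example, and the values are placeholders, not real index prices):

```python
import pandas as pd

# Hypothetical rows as they might come back from a database query where both
# columns were stored as text. Placeholder values, not real index prices.
rows = [("2020-01", "100.5"), ("2020-02", "101.25"), ("2020-03", "99.0")]
df = pd.DataFrame(rows, columns=["month", "price"])

# Convert the text columns into real dates and numbers before plotting.
df["month"] = pd.to_datetime(df["month"], format="%Y-%m")
df["price"] = pd.to_numeric(df["price"])
df = df.sort_values("month")


def plot_prices(frame):
    """Plot price against month; expects the converted columns from above."""
    # Imported here so the conversion step can run without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(frame["month"], frame["price"])
    ax.set_xlabel("Month")
    ax.set_ylabel("Price")
    fig.autofmt_xdate()  # tilt the date labels so they don't overlap
    return fig
```

In a notebook, calling `plot_prices(df)` in a cell would then draw the line with a proper date axis and a numeric Y axis.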

Finishing The Project

After finishing the project, I decided that I should probably make the repository public on GitHub. However, I wanted to keep the website separate in case I wanted to host it on its own down the road. In the end, the repository for the Jupyter Notebook itself went up on GitHub as a nice example of what I learned to do with Pandas, Python, and Jupyter Notebook.

I’m still not fully accustomed to Jupyter Notebook, since if you split a block of code across different cells, each earlier cell must be executed before the next part will work. But I must say, the layout for code and output in Jupyter Notebook is way cleaner and better to look at than whatever I’m seeing in Spyder or the command line.

I think I’ll continue to poke around in Jupyter Notebook, with the hopes of improving my ability to chart database rows and trying out a library that isn’t matplotlib.

To check out the Jupyter Notebook: https://github.com/angusleung100/hkhousinganalysis

To check out the website: https://hkhousingstats.techiskey.net/

Angus

Student, Blogger, and Developer, with an interest in fintech, aerospace defence, and finance.
