An automated, around-the-clock ETL pipeline using GCP
The GitHub repository for this project can be found here.
There is a niche market in Shanghai, China, where property management companies seek out long-term leases with owners of the European-style houses in the Former French Concession. These houses, built from the late 1800s to the early 1930s, are now in a state of disrepair.
The management companies invest money to fix and renovate apartments in these houses. The newly renovated apartments are then marketed and rented out, at a higher price, to international (and local) professionals in the city.
These properties, because of their age, are especially vulnerable to Shanghai’s humidity and rain. Black mold, leaks, and water damage can get out of control if not monitored carefully.
The goal of this project is to construct a data pipeline for live weather updates (specifically humidity, temperature, and rain) so that property managers can visualize trends in weather and take preventative measures during stretches of high humidity and before heavy rainfall.
I collected the data for this pipeline from the OpenWeatherMap API. Specifically, I used the “Current Weather Data” and “One Call” APIs under the free subscription plan.
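As a rough illustration of what the three kinds of requests might look like, here is a minimal sketch using only the Python standard library. The coordinates, function names, and choice of query parameters are my assumptions for illustration, not the code from the repository, and the One Call paths shown are the version 2.5 ones, so they may differ for newer versions of the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.openweathermap.org/data/2.5"
# Approximate coordinates for central Shanghai (assumption).
SHANGHAI_LAT, SHANGHAI_LON = 31.23, 121.47


def _query(api_key: str, **extra) -> str:
    """Build the query string shared by all three requests."""
    params = {"lat": SHANGHAI_LAT, "lon": SHANGHAI_LON, "units": "metric"}
    params.update(extra)
    params["appid"] = api_key
    return urlencode(params)


def current_weather_url(api_key: str) -> str:
    """URL for the "Current Weather Data" endpoint."""
    return f"{BASE_URL}/weather?{_query(api_key)}"


def forecast_url(api_key: str) -> str:
    """URL for One Call, keeping only the daily forecast section."""
    return f"{BASE_URL}/onecall?{_query(api_key, exclude='current,minutely,hourly,alerts')}"


def history_url(api_key: str, unix_time: int) -> str:
    """URL for the One Call historical ("timemachine") endpoint."""
    return f"{BASE_URL}/onecall/timemachine?{_query(api_key, dt=unix_time)}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Perform the GET request and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(current_weather_url(my_key))` with a valid API key would return a dictionary that includes fields such as `main.humidity` and `main.temp`.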
Right now, I pull current weather conditions every minute, weather history every hour, and 7-day forecast predictions every day. At this rate, I will reach 100,000 data points in about two months.
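The arithmetic behind that estimate can be checked quickly. The sketch below assumes each pull counts as one data point, and pairs each pull type with a cron expression in the unix-cron format that Cloud Scheduler accepts. The job names and exact cron strings are illustrative assumptions, not the project's actual configuration.

```python
# Hypothetical job names mapped to (cron expression, pulls per day).
SCHEDULES = {
    "current": ("* * * * *", 24 * 60),  # every minute
    "history": ("0 * * * *", 24),       # at the top of every hour
    "forecast": ("0 0 * * *", 1),       # once a day at midnight
}

# Assumption: one pull produces one data point.
pulls_per_day = sum(per_day for _, per_day in SCHEDULES.values())
days_to_target = 100_000 / pulls_per_day

print(pulls_per_day)          # 1465
print(round(days_to_target))  # 68
```

Under these assumptions the per-minute pulls dominate the total, and 68 days is a little over two months.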
The data is stored in Google Firestore. The collections are categorized by the three types of API pulls (current, forecast, and history). Within each collection, a document is a dictionary of weather information for the time requested.
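One way to picture this layout is the sketch below, which maps a single API response to a collection name, a document ID, and the document body. Using the requested time as the document ID is my assumption, as are the function names. The `save` helper needs the `google-cloud-firestore` package and GCP credentials, so the import is kept inside the function.

```python
from datetime import datetime, timezone

# One collection per type of API pull, as described above.
COLLECTIONS = ("current", "forecast", "history")


def to_document(pull_type: str, requested_at: int, payload: dict) -> tuple[str, str, dict]:
    """Return (collection, document_id, data) for one API response.

    requested_at is a Unix timestamp for the time the data refers to.
    """
    if pull_type not in COLLECTIONS:
        raise ValueError(f"unknown pull type: {pull_type!r}")
    # Assumption: an ISO-8601 UTC timestamp serves as the document ID.
    moment = datetime.fromtimestamp(requested_at, tz=timezone.utc)
    doc_id = moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    return pull_type, doc_id, payload


def save(pull_type: str, requested_at: int, payload: dict) -> None:
    """Write one document to Firestore (requires google-cloud-firestore)."""
    from google.cloud import firestore  # imported lazily; needs credentials

    collection, doc_id, data = to_document(pull_type, requested_at, payload)
    firestore.Client().collection(collection).document(doc_id).set(data)
```

With timestamp-based IDs, documents in a collection sort chronologically, which makes "last 24 hours" style lookups straightforward.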
Workflow
API Calls:
Scheduled with Cloud Scheduler:
Data Query:
Humidity % in Shanghai over the last 24 hours, coded by weather condition
Histogram of humidity % over the last 24 hours (counts are every minute)