Analysis of Meteorological Data

Data analysis can be described as the process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. In this article, we will perform data analysis on meteorological data. You can find the dataset at

This dataset provides historical records of many meteorological parameters such as pressure, temperature, humidity, wind speed, and visibility. It contains hourly temperature readings for roughly ten years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. The data corresponds to Finland, a country in Northern Europe.

We will clean the data and then analyze it to test the hypothesis:

"Has the apparent temperature and humidity, compared monthly across 10 years of the data, indicated an increase due to global warming?"


If you already have Jupyter Notebook and all the necessary Python libraries (NumPy, pandas, Matplotlib, Seaborn) installed, then you are ready to get started.


Importing the necessary python libraries

First, we start by importing all the libraries that we will need for the analysis.
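A minimal import cell, assuming the standard aliases used throughout the rest of the article:

```python
import numpy as np               # numerical arrays and math routines
import pandas as pd              # tabular data loading and manipulation
import matplotlib.pyplot as plt  # plotting
import seaborn as sns            # statistical visualization on top of matplotlib
```

In a Jupyter Notebook you would typically also run the `%matplotlib inline` magic so that plots render inside the notebook.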

Now let us discuss the libraries:

NumPy is a general-purpose array-processing package for scientific computing with Python.

pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool.

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

Loading the Dataset

To load the data from a CSV (comma-separated values) file into a pandas DataFrame, we use the read_csv() function.

As the output of the above code snippet shows, the dataset contains 96,453 records and 12 columns.

A data type object (an instance of the numpy.dtype class) describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted.

Before starting with the visualization, we need to convert the date feature into a datetime object. For this we use the to_datetime() function.

Running the df.dtypes command after the conversion confirms that the date feature is now a datetime object:

To have a look at the top few rows of the dataframe, we can use the head() method, which returns the first n rows (5 by default) of a dataframe or series.
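For instance, on any dataframe (a toy one here, for illustration):

```python
import pandas as pd

df_demo = pd.DataFrame({'x': range(10)})

print(df_demo.head())   # first 5 rows (the default)
print(df_demo.head(3))  # first 3 rows
```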

Resampling the data

Since we have hourly data, we need to resample it to a monthly frequency. Resampling is a convenient method for frequency conversion; the object must have a datetime-like index.

In the above code snippet, "MS" denotes month-start frequency, so each monthly bin is labelled with the first day of the month.

Chaining the mean() function averages the hourly readings within each bin, giving the average apparent temperature and humidity for every month.

Plotting the variation in Apparent Temperature and Humidity with time

We will be using Seaborn to plot the variation in apparent temperature and humidity with time.

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

The above code gives the following output:

The above plot shows that humidity remained almost constant over these years. The apparent temperature is also almost the same (since the peaks lie along the same line).

If we want to retrieve the data of one particular month from every year, say April, then:

Plotting the variation in Apparent temperature and Humidity for the month of April every year

The above code displays the following plot:


As for our hypothesis:

"Has the apparent temperature and humidity, compared monthly across 10 years of the data, indicated an increase due to global warming?"

No change in average humidity was observed over the ten years from 2006 to 2016.

We can observe an increase in the average apparent temperature in 2009, then a drop in 2010, a slight increase in 2011, a significant drop in 2015, and finally an increase in 2016.

Thank you for reading my article!

You can find the source code at