Data Analysis using Python

Before we start the data analysis, let's get to know a little about Jupyter Notebook and Google Colab.

Jupyter Notebook is a web-based interactive computing platform that combines computer code, plain-language descriptions, data, rich visualizations (charts, graphs, figures, even 3D models), and interactive controls.

Google Colab is a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources, including GPUs and TPUs. Colab is especially well suited to machine learning, data science, and education.

For this exercise we will be using Google Colab.

The data we are going to analyze is the CBSE Class-X result data. This dataset contains CBSE Class-X results for schools in different regions. Below is a sample from the dataset. It gives the number of students who registered (Regd.), appeared (Appd.), and passed (Pass) the exam in the Ajmer region for each school category: GOVT, GOVT AIDED, INDEPENDENT, JNV (Jawahar Navodaya Vidyalaya), KV (Kendriya Vidyalaya), and CTSA (Central Tibetan School Administration).

Region  Status  CTSA  GOVT  GOVT AIDED  INDEPENDENT  JNV   KV
Ajmer   Regd.   0     8537  0           102253       4945  9194
Ajmer   Appd.   0     8509  0           101822       4928  9160
Ajmer   Pass    0     8154  0           100178       4911  8875

To get started, let's create a new notebook in Google Colab (https://colab.google/) and load the dataset file.

We will use the Python pandas library to read the CSV file and display the first three rows of the dataset.
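Here is a minimal sketch of this step. It assumes the CSV file has the same columns as the sample table above (Region, Status, then one column per school category). So that the snippet runs on its own, it reads the three sample rows from an in-memory string via io.StringIO; with the real dataset you would pass the path of the uploaded file instead (in Colab, files uploaded through the Files pane land in /content). The file name in the comment is hypothetical.

```python
import io
import pandas as pd

# The three sample rows from the table above. With the real dataset, pass the
# file path instead, e.g. pd.read_csv("/content/cbse_results.csv")
# (hypothetical file name).
SAMPLE_CSV = """Region,Status,CTSA,GOVT,GOVT AIDED,INDEPENDENT,JNV,KV
Ajmer,Regd.,0,8537,0,102253,4945,9194
Ajmer,Appd.,0,8509,0,101822,4928,9160
Ajmer,Pass,0,8154,0,100178,4911,8875
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# head(3) returns the first three rows. In a notebook cell, ending the cell
# with df.head(3) renders it as a table; print() works everywhere.
print(df.head(3))
```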

Next, let's find the total number of students for each status (Regd., Appd., and Pass) with all school categories combined.
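One way to do this, sketched under the assumption that every column other than Region and Status holds a school-category count, is a row-wise sum across the category columns. The new column name Total is our own choice. For the Ajmer sample rows this gives 124929 registered, 124419 appeared, and 122118 passed.

```python
import io
import pandas as pd

# Sample rows from the article; use the real CSV path in practice.
SAMPLE_CSV = """Region,Status,CTSA,GOVT,GOVT AIDED,INDEPENDENT,JNV,KV
Ajmer,Regd.,0,8537,0,102253,4945,9194
Ajmer,Appd.,0,8509,0,101822,4928,9160
Ajmer,Pass,0,8154,0,100178,4911,8875
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Assumption: every column except Region and Status is a school category.
categories = [c for c in df.columns if c not in ("Region", "Status")]

# axis=1 sums across the columns of each row, giving one total per
# (Region, Status) row with all school categories combined.
df["Total"] = df[categories].sum(axis=1)

print(df[["Region", "Status", "Total"]])
```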

Then we will create a dict with Region as the key and the total number of students who appeared and passed as the value.
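A sketch of building that dict, assuming each region has exactly one row per status (which DataFrame.pivot requires) and storing each value as a [appeared, passed] list; the variable names are illustrative. Only the Ajmer sample is available here, so the result has a single key.

```python
import io
import pandas as pd

# Sample rows from the article; use the real CSV path in practice.
SAMPLE_CSV = """Region,Status,CTSA,GOVT,GOVT AIDED,INDEPENDENT,JNV,KV
Ajmer,Regd.,0,8537,0,102253,4945,9194
Ajmer,Appd.,0,8509,0,101822,4928,9160
Ajmer,Pass,0,8154,0,100178,4911,8875
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
categories = [c for c in df.columns if c not in ("Region", "Status")]
df["Total"] = df[categories].sum(axis=1)

# Reshape: one row per Region, one column per Status value.
totals = df.pivot(index="Region", columns="Status", values="Total")

# Region -> [total appeared, total passed]
region_totals = {
    region: [int(row["Appd."]), int(row["Pass"])]
    for region, row in totals.iterrows()
}

print(region_totals)  # {'Ajmer': [124419, 122118]}
```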

We will then convert this dict to a DataFrame and use it to plot a stacked bar chart.
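A possible sketch of this step using pandas' built-in plotting (which draws with matplotlib; both are preinstalled in Colab). The dict is hard-coded with the Ajmer totals computed from the sample rows so the block runs on its own; the column names Appeared and Passed and the chart labels are our own choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Region -> [appeared, passed]; only the Ajmer sample totals are used here.
region_totals = {"Ajmer": [124419, 122118]}

# orient="index": dict keys become row labels, list items become columns.
chart_df = pd.DataFrame.from_dict(
    region_totals, orient="index", columns=["Appeared", "Passed"]
)

# stacked=True draws the Passed segment on top of the Appeared segment.
ax = chart_df.plot(kind="bar", stacked=True, figsize=(10, 5))
ax.set_xlabel("Region")
ax.set_ylabel("Number of students")
ax.set_title("CBSE Class-X: students appeared and passed by region")
plt.tight_layout()
plt.show()
```

Note that stacking Passed on top of Appeared makes each bar as tall as appeared plus passed, so passing students are counted twice. If you want the bar height to equal the number who appeared, stack Passed with a derived column such as Appeared minus Passed instead.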

Finally, we will calculate the pass percentage for each region and plot a bar chart to visualize the results.
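A minimal sketch, taking pass percentage as passed / appeared × 100 (our reading of the metric). For the Ajmer sample this is 122118 / 124419 × 100 ≈ 98.15%. Sorting the regions by pass percentage before plotting is an optional design choice that makes the chart easier to scan once all regions are loaded.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Region -> [appeared, passed]; only the Ajmer sample totals are used here.
region_totals = {"Ajmer": [124419, 122118]}
results = pd.DataFrame.from_dict(
    region_totals, orient="index", columns=["Appeared", "Passed"]
)

# Pass percentage = passed / appeared * 100, rounded to two decimals.
results["Pass %"] = (results["Passed"] / results["Appeared"] * 100).round(2)
print(results)

# Highest pass percentage first.
ax = results["Pass %"].sort_values(ascending=False).plot(kind="bar", figsize=(10, 5))
ax.set_xlabel("Region")
ax.set_ylabel("Pass percentage (%)")
ax.set_title("CBSE Class-X pass percentage by region")
plt.tight_layout()
plt.show()
```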
