Final Project: Exploratory Data Analysis with SQL

 

This project is part of the course SQL: A Practical Introduction for Querying DatabasesHere is the source. I will be answering the questions asked on MySQL.

Project Overview

Imagine you have been hired by a non-profit organization that strives to improve socio-economic conditions and educational outcomes for children and youth in the City of Chicago. Your job is to analyze the census, crime, and school data.

You will be asked questions that will help you understand the data just like a real world data professional would. You will be assessed on the correctness of both your SQL queries and results.

Task I: Review and familiarize yourself with the datasets

fig.1 Snapshot of Task 1

Task II: Load the datasets in database tables

I downloaded the datasets available in ‘.sql ‘ format and imported all the tables one by one in MySQL database ‘practicedb’ .

fig2 Imported tables

Task III: Write and execute queries to analyze the data

Problem 1

Find the total number of crimes recorded in the CRIME table.

Each case number is for a particular crime so counting distinct case numbers will answer the question.

SELECT COUNT(DISTINCT case_number) AS total_crime FROM chicago_crime_data;
Fig.3 Result of above query
/*Counting rows will give the same result as there are no duplicates*/
SELECT COUNT(*) total_rows FROM chicago_crime_data
Fig.4
Problem 2
Retrieve first 10 rows from the CRIME table.
SELECT* FROM chicago_crime_data LIMIT 10;
Fig.5 Result of query

Problem 3
How many crimes involve an arrest?

Arrest column is a boolean type either TRUE or FALSE.

SELECT COUNT(*) AS crimes_with_arrest 
FROM chicago_crime_data
WHERE arrest = 'TRUE';
Fig.6 Result of above query

Problem 4
Which unique types of crimes have been recorded at GAS STATION locations?

Crime type is in PRIMARY_TYPE field and location is in LOCATION DESCRIPTION field.

SELECT DISTINCT PRIMARY_TYPE, LOCATION_DESCRIPTION 
FROM chicago_crime_data
WHERE LOCATION_DESCRIPTION
LIKE '%Gas station%';
Fig. 7 Result of query

Problem 5
In the CENUS_DATA table list all Community Areas whose names start with the letter ‘B’.
SELECT community_area_name 
FROM CENSUS_DATA
WHERE community_area_name
LIKE 'b%';
Fig.8 Result of above query

Problem 6
Which schools in Community Areas 10 to 15 are healthy school certified?

Sometimes it’s hard to find the exact column names and which table have this column, so we can find the column names and the table with query like below:

SELECT * FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%healthy%'
ORDER BY TABLE_NAME ;
Fig.9 Result of query
SELECT name_of_school 
FROM CHICAGO_PUBLIC_SCHOOLS
WHERE Healthy_School_Certified = 'Yes'
AND Community_Area_Number
BETWEEN 10 AND 15;
FIg. 10 Only one school

Problem 7
What is the average school Safety Score?
/*Rounded the figure to 2 decimals.*/
SELECT ROUND(AVG(Safety_Score),2)AS avg_sfety_score
FROM CHICAGO_PUBLIC_SCHOOLS ;
Fig. 11

Problem 8
List the top 5 Community Areas by average College Enrollment [number of students]
SELECT  Community_Area_Name, AVG(College_Enrollment) AS AVG_ENROLLMENT 
FROM CHICAGO_PUBLIC_SCHOOLS
GROUP BY Community_Area_Name
ORDER BY AVG_ENROLLMENT DESC
LIMIT 5;
FIg. 12 Result of query
Problem 9
Use a sub-query to determine which Community Area has the least value for school Safety Score?

Safety score is in character data type in table, so first I found distinct safety scores.

SELECT DISTINCT safety_score 
FROM chicago_public_schools
ORDER BY safety_score ;
Fig. 13

Blanks are also present, so the least value is 1.

SELECT COMMUNITY_AREA_NAME, safety_score 
FROM (SELECT COMMUNITY_AREA_NAME, safety_score
FROM chicago_public_schools WHERE safety_score = 1) school;
Fig.14 Only one area

Problem 10
Without using an explicit JOIN operator] Find the Per Capita Income of the Community Area which has a school Safety Score of 1.
SELECT cs.COMMUNITY_AREA_NAME, cs.COMMUNITY_AREA_NUMBER, PER_CAPITA_INCOME 
FROM census_data cd, chicago_public_schools cs
WHERE cs.COMMUNITY_AREA_NUMBER=cd.COMMUNITY_AREA_NUMBER AND safety_score=1;
Fig. 15 Result of above query

Thank you for reading! If you want to give any suggestion or any feedback please feel free to comment (:

Comments