This project is part of the course SQL: A Practical Introduction for Querying Databases. Here is the source. I will be answering the questions asked on MySQL.
Project Overview
Imagine you have been hired by a non-profit organization that strives to improve socio-economic conditions and educational outcomes for children and youth in the City of Chicago. Your job is to analyze the census, crime, and school data.
You will be asked questions that will help you understand the data just like a real world data professional would. You will be assessed on the correctness of both your SQL queries and results.
Task I: Review and familiarize yourself with the datasets

Task II: Load the datasets in database tables
I downloaded the datasets available in ‘.sql ‘ format and imported all the tables one by one in MySQL database ‘practicedb’ .

Task III: Write and execute queries to analyze the data
Problem 1
Find the total number of crimes recorded in the CRIME table.
Each case number is for a particular crime so counting distinct case numbers will answer the question.
SELECT COUNT(DISTINCT case_number) AS total_crime FROM chicago_crime_data;

/*Counting rows will give the same result as there are no duplicates*/
SELECT COUNT(*) total_rows FROM chicago_crime_data

SELECT* FROM chicago_crime_data LIMIT 10;

Arrest column is a boolean type either TRUE or FALSE.
SELECT COUNT(*) AS crimes_with_arrest
FROM chicago_crime_data
WHERE arrest = 'TRUE';

Crime type is in PRIMARY_TYPE field and location is in LOCATION DESCRIPTION field.
SELECT DISTINCT PRIMARY_TYPE, LOCATION_DESCRIPTION
FROM chicago_crime_data
WHERE LOCATION_DESCRIPTION
LIKE '%Gas station%';

SELECT community_area_name
FROM CENSUS_DATA
WHERE community_area_name
LIKE 'b%';

Sometimes it’s hard to find the exact column names and which table have this column, so we can find the column names and the table with query like below:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%healthy%'
ORDER BY TABLE_NAME ;

SELECT name_of_school
FROM CHICAGO_PUBLIC_SCHOOLS
WHERE Healthy_School_Certified = 'Yes'
AND Community_Area_Number
BETWEEN 10 AND 15;

/*Rounded the figure to 2 decimals.*/
SELECT ROUND(AVG(Safety_Score),2)AS avg_sfety_score
FROM CHICAGO_PUBLIC_SCHOOLS ;

SELECT Community_Area_Name, AVG(College_Enrollment) AS AVG_ENROLLMENT
FROM CHICAGO_PUBLIC_SCHOOLS
GROUP BY Community_Area_Name
ORDER BY AVG_ENROLLMENT DESC
LIMIT 5;

Safety score is in character data type in table, so first I found distinct safety scores.
SELECT DISTINCT safety_score
FROM chicago_public_schools
ORDER BY safety_score ;

Blanks are also present, so the least value is 1.
SELECT COMMUNITY_AREA_NAME, safety_score
FROM (SELECT COMMUNITY_AREA_NAME, safety_score
FROM chicago_public_schools WHERE safety_score = 1) school;

SELECT cs.COMMUNITY_AREA_NAME, cs.COMMUNITY_AREA_NUMBER, PER_CAPITA_INCOME
FROM census_data cd, chicago_public_schools cs
WHERE cs.COMMUNITY_AREA_NUMBER=cd.COMMUNITY_AREA_NUMBER AND safety_score=1;

Comments
Post a Comment