Customer & Sales Analysis: MySQL and Power BI

 

Analyzing customer purchase behavior, sales, and revenue.



Final Dashboard after analysis in Power BI

Introduction

In this project, I’ll Identify the gender breakdown of Superstore company’s customer base, the most popular product categories for different age groups and genders, trends in sales and transactions over timeproducts that are in high demand, and product categories that are low in stock, and most profitable product categories. The analysis can be used to target marketing and sales efforts more effectively, improve product assortment and marketing campaigns, forecast future sales and make informed decisions about marketing and sales strategies, improve inventory management and prevent stockouts, and focus marketing and sales efforts on the most profitable areas.

1. About Dataset

The source of the dataset is a subset of Sample superstore. It has 4 tables; Customer, Inventory, Product, and Sales.

  1. A glimpse of customer dataset 4 columns:

2. Glimpse of product dataset 4 columns:

3. Glimpse of sales dataset 6 columns:

4. Glimpse of inventory dataset:

2. Data Exploration and Cleaning

I’ve added the SQL script of this on GitHub. I’ll start with Number of Rows of each table:

Checking the unique values in each id column for each table:

A mismatch in rows for the product table, means there are Duplicates in product table, then I identify the duplicates.

The same Products have different price values. Now in this case I’ll keep the products having maximum price value. To delete the duplicates, I’ll need a column that is unique so I’ll create a row index column with the help of auto-increment.

With the help of Common Table Expression and ROW NUMBER function, I deleted the duplicates and 8332 row(s) affected, only unique 1862 rows left. I verified if there are any duplicates with:

For the product id ‘FUR-BO-1000046’ there were different prices 104,145 etc. Now only the maximum-price product is left.

Identifying Null or blank values: I found 15 records in stock with values zero, it is possible the stocks are zero, so no need to remove them. Some product id and quantity are missing in this case I’ll replace the quantity with mean and remove the product_id as there are only 3 missing records.

Missing values in the sales table

I checked the quantity column for any inconsistencies such as alphabets or special characters, as it is a character data type. So, I used the query “SELECT quantity FROM sales WHERE quantity NOT REGEXP ‘[^0-9]’;” and 15 rows blank rows returned, which means there are no special characters or alphabets. Then I replaced the blank quantities with average (4) and changed the data type to Integer.

3. Analyze and Share

Connecting MySQL with Power BI: To visualize the data, I utilized Power BI by connecting it to MySQL. I created a VIEW to represent the query results that were intended for visual representation.

Views and Tables in the Database

I first analyzed the data related to a particular table, such as the number of customers by gender, age range, and time period. I then analyzed the data based on the project objective.

3.1 Number of customers by gender: Female 673 & Male 131.

3.2 Age range: The customer’s Age ranges from 18 to 80.

3.3 Order dates: From 2019–01–03 to 2022–12–30.

3.4 Transaction & Sales by Year:

Sales have increased from 2.6 million in 2019 to 4.3 million in 2022, with the number of transactions increasing as well. 2022 had the highest number of transactions, while 2019 had the lowest. The trend of increasing sales and transactions is evident from year to year

3.5 Number of Transactions & Sales by Month:

December achieved 1.9 million in sales, while February recorded approximately 0.35 million. November saw 755 transactions, whereas February had 162. Notably, the last quarter of the year experienced higher sales and transaction volumes.

3.6 Sales & Number of Transactions by Day of Week:

Sunday garnered 2.5 million in sales, whereas Tuesday reached approximately 0.7 million. Additionally, Sunday recorded 956 transactions, and Tuesday had 274. Evidently, customers tend to engage in more shopping activities during the latter part of the week.

3.7 Transaction and Sales by Age: Customers who are 58 years old contribute the highest to sales and transactions, while those who are 67 years old contribute the least. To better understand the distribution of sales in respect of age, I created age bins of size 10. It tells us that customers aged between 20 to 29 have made the most transactions.

3.8 Sales by Gender: As there are more female customers, it was anticipated that the highest sales would come from females compared to male customers.

3.9 Percentage of products with zero stocks

3.10 Total revenue generated by the company for each product category

We can see that the most revenue is coming from Furniture 57% of the total, then office supplies then the least from Technology contributing only 14% to total revenue.

3.11 Top 5 customers who have made the highest total purchases, considering the customer’s age and gender.

The top 5 customers predominantly consist of females, outnumbering males. There is a significant age variation among these customers, ranging from 20 to 61, with similar patterns observed for both genders.

3.12 Most profitable product category by calculating the average revenue per unit sold.

Furniture is the most profitable product category based on the average revenue per unit sold, followed by Office Supplies and Technology. Furniture accounts for 57% of the contribution, with revenue of 1.9M, while Technology represents the least profitable category with a contribution of 15%.

3.13 The average age of customers for each product category.

The Furniture category has an average age of 44, while both Technology and Office Supplies categories have an average age of 43. This suggests that there is a slight variation in the age of customers across these product categories, with Furniture attracting slightly older customers on average.

3.14 Top product category that has the highest average transaction amount.

Technology has the highest average transaction amount (2573). Office Supplies closely follows with a slightly lower average transaction amount (2552). Furniture has a slightly lower average transaction amount (2513).

This suggests that customers, on average, spend the most per transaction in the Technology category, followed by Office Supplies and then Furniture. It indicates that Technology products may have a higher price point compared to the other categories.

3.15 Popular product category by Gender

Among both gender office Supplies is the popular product category. However, the contribution of female customers is more than males in every product category.

3.16 Popular product category by Age Group

Here also Office supplies are the most popular category among all the age groups, followed by furniture then technology. And the popularity of the category decreases as we move toward the younger age groups.

3.17 Product Category having Zero Stock in the inventory.

I checked the inventory for products with zero stock. Office supplies have the highest number of zero stock, followed by furniture and technology, which have the same percentage.

3.18 YoY% Revenue Growth

4. Conclusion

4.1 Target Audience: The company’s customer base is predominantly female, so it makes sense to target more female customers in its marketing and advertising campaigns. And in terms of Age, target middle age customers as this age group contributes more to sales.

4.2 Continue to focus on Furniture as it is the company’s most profitable product category. This could be done by expanding the company’s selection of office supplies, or by offering discounts on office supplies. Market office supplies to older adults. Office supplies are also popular among older adults, so the company could market its office supplies to this group.

4.3 Consider expanding the Technology product category it is currently the company’s least profitable product category, but it has the potential to be more profitable. The company could consider expanding this category by adding new furniture products, or by offering discounts on furniture products.

4.4 Product Stock: It is important to understand why there is zero stock for product categories. Also, monitor inventory levels on a regular basis to ensure that the company has enough products to meet customer demand, company can set up alerts for low inventory levels so that it is notified when a product category is running low on stock.

4.5 Increase marketing and advertising efforts during the last quarter of the year. This is when sales and transactions are highest, so it makes sense to focus marketing efforts during this time. Target more customers on SundayOffer discounts or promotions during the off-peak seasons. This could help to attract customers during times when sales are typically lower.

Any thoughts or suggestions are welcome in the comment or you can directly message and connect with me on Linkedin.

 

Comments

Popular posts from this blog

Why Data Analysts Will Be in High Demand in 2025: Insights and Career Tips for Aspiring Analysts A Look Back at 2024: The Year for Data Analysts

5 Free Job Simulations for Data Analytics: Build Your Skills and Get Noticed by Recruiters

I Asked ChatGPT How to Spot Phrases That Give Away AI-Written Text - Here's What It Told Me