Understand The Basics of Data Analysis using SQL

Vishesh Arora 10 Aug, 2021 • 6 min read

This article was published as a part of the Data Science Blogathon

Introduction

SQL is one of the most widely used skills when it comes to Data Analysis. Analysts around the world rely on SQL for their analysis. In this article, we will cover the basics of Data Analysis using SQL.

You can also learn the basics of SQL in my previous article on Analytics Vidhya.

ORDER BY

The ORDER BY keyword is used to order records based on values in a certain column. It allows you to arrange records in ascending and descending order. For example, let us assume we have sales records in a table called “Orders”. Let us have a look at the data in this table.

SELECT *
FROM Orders
LIMIT 10;

The output will look like this:

order by |  Data Analysis using SQL

The table contains information about the buyer (USER_ID), the product they bought (PRODUCT_ID), the amount they spent (SUBTOTAL and TOTAL), quantity, and the date of purchase.

Let us order these records in decreasing order of amount spent. This way, the highest-grossing orders will come on top, and smaller orders will come at the bottom. We will use the ORDER BY and DESC keywords for this.

SELECT *
FROM Orders
ORDER BY Total DESC;

The output will look like this:

order by desc |  Data Analysis using SQL

You can see that the records are now arranged in decreasing order of the order total. In case you want to see records in increasing (ascending) order, you can use the ASC keyword followed by the ORDER BY. Alternately, you can omit the ASC keyword, and SQL will order records in increasing order by default.

GROUP BY & Aggregate Functions

The GROUP BY statement is used to group records based on values in a certain column. For example, we can group all records for a particular user together and then use aggregate functions (SUM, AVG, COUNT, MAX, MIN) on top of it.

Let us try to see the average order value of each user. We will use the GROUP BY statement and the AVG function for this. Also, let us arrange them in decreasing order of average order value.

SELECT User_id, AVG(Total) AS "Average Order Value"
FROM Orders
GROUP BY User_id
ORDER BY AVG(Total) DESC;

The output will look like this:

groupby |  Data Analysis using SQL

You can see the users with the highest average order value appear on top.

Aggregate functions are often used in conjunction with the GROUP BY statement. Let us see another example to understand better. Let us see the number of orders placed by each user. To do this, we will use the GROUP BY statement and the COUNT function.

SELECT User_id, COUNT(*) AS "Number of orders"
FROM Orders
GROUP BY User_id;

COUNT(*) will count the number of records associated with each USER_ID. The output will look like this:

Groupby count

We can see the number of orders associated with each user.

JOINS

JOINs in SQL allow you to join two or more tables on a common column (field). JOINs are extremely useful because they help us combine data from multiple tables in a single output. Let us understand this with an example.

The Orders table contains information about customers’ orders. The Products table contains information about different products. Let us join these two tables.

SELECT User_id, Total, Discount, Product_id, Vendor, Price, Rating
FROM Orders
JOIN Products ON Orders.Product_id = Products.id;

The JOIN clause allows us to join the Orders table with the Products table on a common column. The ON clause lets us specify the columns on which the tables should be joined.

The output will look like this:

Joins

We can see the Orders table has been joined on the Products table. We can see columns from both tables together in a single output.

We can also combine JOINs with the GROUP BY statement or the ORDER BY keyword and do more meaningful analyses. Let us see an example of this.

We will see the total amount spent by different users, and the total cost of the products they had bought. We will then arrange records by the total amount spent by users in decreasing order.

SELECT User_id, SUM(Total) AS "Total amount spent", SUM(Price) AS "Total cost", AVG(Rating) AS "Average Rating"
FROM Orders
JOIN Products ON Orders.Product_id = Products.id
GROUP BY User_id
ORDER BY SUM(Total) DESC;

The output will look like this:

join groupby |  Data Analysis using SQL

We have combined the two tables, Orders and Products, and then used the GROUP BY statement and ORDER BY in the same query to see the total amount spent and the total cost of products bought by different users.

Types of JOINs

JOINs can be used in a variety of ways to solve different business use cases. There are 5 major types of JOINs in SQL.

1. INNER JOIN: An Inner Join returns all records from both the tables where the join condition is satisfied.

2. LEFT (OUTER) JOIN: A Left Join returns all records in the left table, and matches records from the right table.

3. RIGHT (OUTER) JOIN: A Right Join returns all records in the right table, and matches records from the left table. A right join works similar to a left join.

4. FULL (OUTER) JOIN: A Full Join returns all records from both tables.

5. SELF JOIN: A Self Join is used to join a table on its own. It is quite useful in some use cases.

Types of join | v

                                                                        Source: https://learnsql.com/blog/learn-and-practice-sql-joins/

WHERE & LIKE

The WHERE statement helps us in filtering records based on values in a column. Let us look at an example to see how it works. We will look at orders where the total field is more than 100.

SELECT *
FROM Orders
WHERE Total > 100;

The output will look like this:

Where and like |  Data Analysis using SQL

We can see that the result only contains orders where the total is more than 100.

The LIKE operator is used with the WHERE statement to look for a pattern in a column. We can also use wildcards (like % and _) with the LIKE operator. The % sign represents 0 or more than 0 characters. The _ sign represents exactly 1 character.

If we want to filter records where the product name contains “Shoes”, we can use the LIKE operator.

SELECT *
FROM Products
WHERE Title LIKE '%Shoes%';

The output will look like this:

Use of Like operator |  Data Analysis using SQL

We can see that all the records have “Shoes” in their title. The LIKE operator is very useful when columns contain string values.

AND & OR

The AND and OR operators are used to combine multiple conditions. For instance, if we want to filter records where the product title contains shoes and the price is less than 50, we can use the AND operator.

SELECT *
FROM Products
WHERE Title LIKE '%Shoes%'
AND Price < 50;

This will filter out records where the product title contains shoes, and the price is less than 50. The output will only contain records where both these conditions are met. The output will look like this:

And or |  Data Analysis using SQL

Only 3 records are returned in the output. Well, guess what. The shoes are expensive!

If we want to filter records where the price is less than 25 or the rating is more than 4.5, we can use the OR operator.

SELECT *
FROM Products
WHERE Price < 25
OR Rating > 4.5;

The output will look like this:

Where condition |  Data Analysis using SQL

We can see that the output contains records where the price is less than 25 or the rating is more than 4.5. It returns all records where at least one of these conditions is met.

The AND and OR operators can be used in conjunction also to create custom filters.

EndNote

In this article, we looked at the basics of Data Analysis in SQL. This article was written by Vishesh Arora. You can connect with me on LinkedIn. You can also read my other blogs on Analytics Vidhya. To know more about me, you can check out my portfolio.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
Vishesh Arora 10 Aug 2021

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers

Clear

Related Courses