Single Table Analysis with MYSQL

Amrutha K 23 Jun, 2022 • 5 min read

This article was published as a part of the Data Science Blogathon.

Introduction on SQL

In this article, we will see how to use SQL statements for data analysis. Data analysis can be done on only single tables or on multiple tables. In this particular article, we will mainly focus on only a single table analysis. Nowadays SQL is very much popular and is a very important topic. In this article, we will do querying and analysis of the database. In this overall practical section, we use the MYSQL workbench to run our queries. Make sure to download the workbench and community server.

Let’s get started.

First, let us understand SQL.

SQL

SQL stands for Structured Query Language. It is a language that is used for relational databases to query and get data from the database. SQL lets you access and manipulates databases. Learning SQL is always fun and very easy. There are many flavors of SQL like MYSQL, SQLite, SQL Server, Postgre SQL, and Oracle.

Out of these, we use MYSQL for now. All these flavors are very much similar to each other with just little differences. And we use a workbench editor for MYSQL.

What is SQL? Learn the basics of data analytics | Ironhack Blog

Database

Now for practice, we use the mavenmovies database. I uploaded the mavenmovies database SQL file to my GitHub repository. Please head into it and download it. Run the SQL file in the workbench to get a database with tables.

In this database, there are 16 related databases. And mainly it contains the details about customers, Business, and inventory. Each table has rows and columns. Rows refer to records and columns refer to the fields. Tables contain primary keys and Foreign keys. Primary keys are always unique. The primary key identifies a unique record in the database. Whereas Foreign keys are non-unique.

Elements of MYSQL Statement

The elements of the SQL statement are called “BIG 6” elements. These elements form the structure of an SQL statement. These are,

SELECT – It identifies the column or columns that you need to query

FROM – Identifies the table(s) that you are querying from

WHERE – To filter the results based on some conditions. It is optional.

GROUP BY – To group the data in the results. It is optional

HAVING – To filter results by group filtering. It is optional

ORDER BY – To order the results either in ascending or descending order. It is optional

An SQL statement looks like this,

SELECT column-name FROM table-name WHERE logical-condition GROUP BY column-name HAVING logical-condition ORDER BY column-name

LIKE Operator

This like operator is very useful when you need to fetch records but you know only little about the words in it. It allows you to use a pattern like how it actually was.

Examples :

WHERE name LIKE ‘Denise%’ -- records where the name starts with ‘Denise’, with any number of characters after 
WHERE description LIKE ‘%fancy%’ -- records that contain ‘fancy’, with any characters before OR after 
WHERE name LIKE ‘%Johnson’ -- records that end with ‘Johnson’, with any number of characters before 
 WHERE first_name LIKE ‘_erry’ -- records that end with ‘erry’, with exactly one character before (i.e. Terry, Jerry)

Aggregate Functions

COUNT() – count the records

COUNT DISTINCT() – count the distinct values

MIN() – to find the smallest value

MAX() – to find the largest value

AVG() – to find the average value

SUM() – to find the sum of values

Working with Workbench

Top MySQL Workbench Alternatives - Ubiq BI

Fetch the list of the first
name, last name, and email of
each of our customers

SELECT first_name,last_name ,email from customer;

Fetch the records of films and see if there are
any other rental durations

SELECT DISTINCT rental_duration from film;

Fetch records of customers whose first name is Mary.

SELECT * from customer where first_name="mary";

Fetch payment records of the first 100 customers where the amount is greater than 5 and payments that are done after 6th January 2006.

SELECT * from payment where amount>5 AND payment_date>'2006-01-06' AND customer_id<=100;

Fetch the list of films that
include a Behind the Scenes
special feature

select title,special_features from film where special_features like "%Behind the Scenes%";

Fetch the count of
titles sliced by rental duration

select rental_duration , count(film_id) from film GROUP BY rental_duration;

Fetch the count of
films, along with the average,
min, and max rental rate,
grouped by replacement cost

select replacement_cost, count(film_id) as number_of_films, avg(rental_rate) as avg_rental_rate,min(rental_rate) as cheapest_rental,max(rental_rate) as most_expensive_rental from film group by replacement_cost;

Fetch the list of
customer_ids with less than 15
rentals all-time

select customer_id, count(rental_id) from rental group by customer_id having count(rental_id)<15;

Fetch the list of all film
titles along with their lengths
and rental rates, and sort them
from longest to shortest

select title,length,rental_rate from film order by length DESC;

CASE Statements (MYSQL)

CASE statements are very helpful when working with databases. These CASE statements allow us to make some conditional logic to specify how our output should be. CASE statements are defined using WHEN and THEN. The top WHEN/THEN pair executes
first. If true, the CASE is complete. If
not, it continues to test each
condition until the END statement is reached. If multiple conditions
are true, then the output will be based on the first condition, since it’s the
first to be evaluated (top to bottom). If none of the given conditions are true then the NULL value is returned as output.

SQL CASE Statement | Examples to Implement SQL CASE Statement

CASE statements can also work with the logical operators that we used with WHERE statements. These logical operators include:

= Equals
Does NOT Equal
> Greater Than
< Less Than
>= Greater Than Or Equal To
<= Less Than Or Equal To
BETWEEN A Range Between Two Values

LIKE Matching a Pattern Like This

LIKE Matching a Pattern Like This
IN() Equals One of These Values

Fetch the list of first and
last names of all customers, and
label them as either ‘store 1
active’, ‘store 1 inactive’, ‘store 2
active’, or ‘store 2 inactive’

select distinct first_name, last_name,
	case
    when store_id =1 and active=0 then "store 1 inactive"
    when store_id =1 and active=1 then "store 1 active"
    when store_id =2 and active=0 then "store 2 inactive"
    when store_id =2 and active=1 then "store 2 active"
    else "no sense"
    end
    as Active_status
from customer;

Create a table dividing the length of the film providing the length.

SELECT distinct
	length,
    CASE
		WHEN length<60 THEN 'UNDER 1 HR'
        WHEN length BETWEEN 60 AND 90 THEN '1 - 1.5'
        WHEN length>90 THEN 'OVER 1.5 HRS'
        ELSE ' UH NO... CHECK LOGIC'
	END 
    AS length_bucket
FROM film;

Create a table that contains the film title and specify whether the movie is suitable for the recommendation. If not please specify the reason for it.

select distinct	
	title,
    case 
    when rental_duration<=4 then 'rental too short'
    when rental_rate>=3.99 then'too expensive'
    when rating in('tNC-17', 'R') then 'too adult'
    when length not between 60 and 90 then ' too short or too long'
    when description like '%Shark%' then 'nope has sharks'
    else 'great recommendation'
    END
    as fit_for_recommendation
from film;

Create a table to
count the number of customers
broken down by store_id (in
rows), and active status (in
columns)

select store_id,
       count(case when active=0 then active else null end) as inactive_count,
       count(case when active=1 then active else null end) as active_count
from customer
group by store_id;

Conclusion on MYSQL

SQL is very important to learn. SQL is a very useful and popular technology that many companies are using nowadays. It is a very easy and understandable language. You just need to understand the logic in the query. If you understand the language then it will be very easy for you to write the query to the given question. Overall in this article, we have seen,

  • How to perform data analysis on the mavenmovies database
  • We have used only a single table for any query
  • BIG 6 elements of SQL statement with some examples
  • CASE statements with examples

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Amrutha K 23 Jun 2022

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers

Clear