Multi-Table Analysis with MYSQL
This article was published as a part of the Data Science Blogathon.
Introduction on MYSQL
In this article, we will see how to work with multiple tables. Learning SQL is very important nowadays. It is a very popular technology that most companies are using. SQL queries are used for querying tables in the process of data analysis. This Analysis can be done using only one single table and also using multiple tables. Now we will see how to analyze multiple tables.
We will work these queries in the MYSQL editor called workbench. Make sure to download the workbench and the community server along with it. Not only workbench you can use any MYSQL editor to work with.
Let’s get started.
For practical purposes, now we will use the mavenmovies database that contains 16 relational tables. It is quite enough database for us to implement our queries. It has details of customers like names, addresses and details about the business like staff, actors, rentals and also it has details of inventory like films, categories and all. It is quite an interesting database.
I uploaded the SQL file of the mavenmovies database to my Github repository. Please head into it and download the SQL file.
The process of structuring the tables and columns in a relational database in such a way as to minimize redundancy and also to preserve data integrity is called database Normalization.
This normalization includes:
- Eliminating duplicate data
- Reducing errors and anomalies
If you don’t do normalization to your database then you will end with lots of duplicate records. In normalization, the data from a single merged table are divided into multiple related tables.
Table Relationships and Cardinality
Cardinality refers to the uniqueness of values in a column of a table and is commonly used to describe
how two tables relate. Relationships between tables include one-to-one, one-to-many or many-to-many. In a table, there may be two types of keys namely a primary key and a foreign key.
- Primary keys are always unique. The primary key uniquely identifies an instance in the table. No two records have the same primary key in a table.
- Foreign keys are non-unique. Records may have the same foreign key. They can repeat. So there can be many instances with the same foreign key.
One-to-many relationship tables are created by connecting a foreign key in one table to a primary key in another table.
Using JOINS for Multiple-Table Querying!
We use joins for multi-table querying. First, we join the tables that e need, and then we query it from the resultant. The whole point of table relationships is to enable multi-table querying which is getting data from multiple tables at once. Mavenmovies database tables are related to each other as follows.
Types of JOINs
Basically, there are five types of JOINs. They are,
|INNER JOIN||It returns records that exist in BOTH tables and
excludes unmatched records from either table
|LEFT JOIN||It returns ALL records that are in the LEFT table and also matching records from the RIGHT table|
|RIGHT JOIN||It returns ALL records that are in the RIGHT table and also matching records from the LEFT table|
|FULL OUTER JOIN||It returns ALL records from BOTH tables, including
non-matching records also
|UNION||It returns all data from one table, with all data
from another table, attached at the end like a stack
SELECT column(s) FROM table1 ANY JOIN table2 ON table1.column=table2.column
For INNER JOIN,
INNER JOIN rightTableName ON leftTable.columnName = rightTable.columnName
For LEFT JOIN,
LEFT JOIN rightTableName ON leftTable.columnName = rightTable.columnName
For RIGHT JOIN,
RIGHT JOIN rightTableName ON leftTable.columnName = rightTable.columnName
For FULL JOIN,
FULL JOIN rightTableName ON leftTable.columnName = rightTable.columnName
UNION SELECT sameColumnName FROM secondTableName
Basically, in practice, we mostly use only two joins. One is inner join and the other is left join. These joins return records only when the ON condition is satisfied. When you write queries using
multiple tables, you need to specify
both the table and column name
(i.e. inventory.inventory_id) else it will give an ambiguity error. Mainly Left joins are used when you need to fetch all the records from the left table and also some records from the right table. Similarly, Right joins are used when you need to fetch all the records from the right table and also some records from the left table.
Bridging Unrelated Tables
There are some cases where you need to fetch data from two tables with have no columns in common. Here you have to search for the third table where it has a column that was in both table 1 and table 2. This third table serves as a bridge between the two tables.
Here in the mavenmovies database, we have no key to connect the customer table directly to the city, but we can join the customer to address (using
address_id), and address to city (using city_id). Here, the address table serves as our bridge between the customer table and the city table.
Working with Workbench
Fetch the records with the film’s title,
description, and the store_id value
associated with each item, and its
SELECT film.title,film.description, inventory.store_id,inventory.inventory_id FROM film INNER JOIN inventory ON film.film_id=inventory.film_id;
Fetch the list of all titles, and
figure out how many actors are
associated with each title
select film.title, count(film_actor.actor_id) as number_of_actors from film left join film_actor on film.film_id=film_actor.film_id group by film.film_id;
Fetch the list of
all actors, with each title that they
select actor.first_name,actor.last_name,film.title from actor inner join film_actor on film_actor.actor_id=actor.actor_id inner join film on film.film_id=film_actor.film_id;
Fetch the list of distinct titles
and their descriptions, currently
available in the inventory at store 2
select film.title,film.description,inventory.store_id from film inner join inventory on film.film_id=inventory.film_id and inventory.store_id=2;
select actor.first_name,actor.last_name,film.title from film inner join film_actor on film.film_id=film_actor.film_id inner join actor on film_actor.actor_id=actor.actor_id where last_name="GUINESS";
Create one list of all staff
and advisor names, and include a
column noting whether they are a
staff member or advisor
select "Advisor" as type, first_name,last_name from advisor union select "Staff" as type,first_name,last_name from staff;
Create a list of all customer names and actor’s names including a column noting whether they are a customer or an actor
select 'customer' as type,first_name,last_name from customer union select 'actor' as type,first_name,last_name from actor;
Fetch the list of customer names along with store_id, active status, and their address.
select customer.first_name,customer.last_name, customer.store_id,customer.active, address.address,city.city,country.country from customer left join address on customer.address_id=address.address_id left join city on address.city_id=city.city_id left join country on city.country_id=country.country_id;
Conclusion on MYSQL
This is all about querying multiple tables. In real we use multi-table analysis than single-table analysis. MYSQL makes it very easy to query from the database in a very simple manner if the logic in the question is very well understood. Overall in this MYSQL article, we have seen,
- how to use joins for multi-table analysis
- Normalization and Cardinality
- Different types of join what we use in joining multiple tables
- how to work with unrelated tables by using a bridge table
- Some examples of multi-table analysis
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Leave a Reply Your email address will not be published. Required fields are marked *