What is Common Table Expressions (CTE) in SQL?
Dive into the world of Common Table Expressions (CTE) in SQL, a powerful tool for simplifying complex queries and enhancing code readability. Discover how CTEs help developers create modular, maintainable SQL code while unlocking the full potential of your database queries, from basic data retrieval to intricate, recursive operations.
This article was published as a part of the Data Science Blogathon.
Table of contents
- What is CTE in SQL?
- Creating and Using CTE in SQL
- Syntax of CTE
- Querying with CTEs
- Chaining Multiple CTEs
- Tips and Best Practices
- Example of CTE in SQL
- How to Avoid Error in Common Table Expressions?
- CTE in SQL vs. Other Techniques
- Advantages of Using CTEs
- Types of CTEs in SQL
- Summing Up
- Frequently Asked Questions
What is CTE in SQL?
CTE is the short form for Common Table Expressions. CTE is one of the most powerful tools of SQL (Structured Query Language), and it also helps to clean the data. It is the concept of SQL (Structured Query Language) used to simplify coding and help to get the result as quickly as possible. CTE is the temporary table used to reference the original table. If the original table contains too many columns and we require only a few of them, we can make CTE (a temporary table) containing the required columns only.
Complex problems can be solved easily with good readability of code using CTE. A code should always be easier to read so that if you are referring it after some time, you will get that code immediately. Even if the third person is referring to your code, he/she can understand it very well if the readability of your code is easy and good. CTE is one of the important concepts of SQL. If you have knowledge of CTE, you are stepping into the world of advanced SQL. Common Table Expressions (commonly known as CTE) helps Data analysts, data scientists, or any other analysts who want to extract their data efficiently and effectively from a table containing big data.
Creating and Using CTE in SQL
Syntax of CTE
To define a CTE, use the WITH keyword, followed by the CTE name, optional column list, and the AS keyword with a query expression in parentheses:
WITH CTE_NAME AS ( SELECT column_name1, column_name2,..., column_nameN FROM table_name WHERE condition ) SELECT column_name1, column_name2,..., column_nameN FROM CTE_NAME;
Querying with CTEs
1. Selecting data from CTEs
After defining a CTE, you can select data from it using a standard SELECT statement:
SELECT column1, column2 FROM CTE_name WHERE ...
2. Inserting, updating, and deleting data using CTEs
You can use CTEs to manipulate data in your base tables. To insert, update, or delete data, reference the CTE in the respective statement:
-- Insert INSERT INTO target_table SELECT ... FROM CTE_name ... -- Update UPDATE target_table SET ... FROM CTE_name WHERE ... -- Delete DELETE target_table FROM target_table JOIN CTE_name ON …
Chaining Multiple CTEs
To use multiple CTEs, chain them together, separating each definition with a comma:
WITH CTE1 AS (SELECT ...), CTE2 AS (SELECT ... FROM CTE1 ...), CTE3 AS (SELECT ... FROM CTE2 ...) SELECT ... FROM CTE3 ...;
Tips and Best Practices
- Use descriptive names for your CTEs to improve code readability.
- Keep CTEs concise, focusing on a single operation or transformation.
- Use non-recursive CTEs to simplify complex queries, and reserve recursive CTEs for hierarchical or iterative tasks.
- When chaining multiple CTEs, consider the order of execution to optimize query performance.
Example of CTE in SQL
Below code is executed in MySQL:
-- Creating databse CREATE DATABASE employee_details; -- Use database USE employee_details; -- Creating table CREATE TABLE employee ( emp_id INT PRIMARY KEY, emp_name VARCHAR(20), email_id VARCHAR(30), city VARCHAR(25), Verification_status CHAR(3) ); -- Inserting values into the table INSERT INTO employee VALUES (1,'Suhas','[email protected]','Mumbai','Yes'); INSERT INTO employee VALUES (2,'Yohani','[email protected]','Mumbai','No'); INSERT INTO employee VALUES (3,'Reshama','[email protected]','Pune','Yes'); INSERT INTO employee VALUES (4,'Raj','[email protected]','Bangalore','No'); INSERT INTO employee VALUES (5,'Shivani','shi[email protected]','Bangalore','Yes'); SELECT * FROM employee;
The Output is given below:
In the example above, I have created the table named ’employee’ which contains the employee details.
The description of the columns of ’employee’ table is listed below:
- emp_id: refers to the unique id of each employee
- emp_name: refers to the name of each employee
- email_id: refers to an email id of each employee
- city: refers to the city where the employee is situated
- Verification_status: If the verification of the background details of an employee is done, this value is stored as ‘Yes’ whereas If the verification of the background details of an employee is not done, this value is stored as ‘No’.
Suppose we want to extract specific information of employees, then we can achieve this by creating CTE.
If we want to check the verification status of employees whether it is verified or not and want to see the records of the employees whose verification is not done yet. So, we require only two columns in our CTE (temporary table) and the columns are emp_name and Verification_status containing the value as ‘No’ from the table ’employee’ .
The following code shows the example of creating CTE using the table ’employee’:
-- Creating CTE WITH mycte AS ( SELECT emp_name , Verification_status FROM employee WHERE Verification_status = 'No' ) SELECT * FROM mycte;
The Output is given below:
Here, ” mycte ” is the name of the CTE (temporary table). ” mycte ” contains the columns emp_name and Verification_status. It contains the details of the employees whose verification is incomplete. This is what we wanted to achieve.
We can make one or more than one CTE from the same table separated by comma.
How to Avoid Error in Common Table Expressions?
his error has occurred because you created the CTE but not used it by selecting any of the columns from CTE. See the corrected code above for creating CTE, where I have created CTE and then selected the columns from this CTE.So, it is necessary to use the CTE by selecting the required columns from it , in order to avoid the error.It is important to note that, you can use the CTE in a query where you have created it. But you cannot use this CTE in later queries , that is , inside the queries that will be created later.For Example:I have created CTE named “mycte” in above example. Then my next query is as follows:
This SQL query is not valid. Because here in this SQL query, I cannot use “mycte” CTE.
” mycte ” CTE does not exists for this SQL query.
” mycte ” CTE exists for the only SQL query where it has been created. That is why CTE (Common Table Expression) is known as a Temporary table in SQL.
Now, let us understand how to use the JOIN query inside CTE. Consider the MySQL code below.
-- Creating table product CREATE TABLE product ( p_id INT PRIMARY KEY, p_name VARCHAR(20), category VARCHAR(30) ); -- Creating table sales CREATE TABLE sales ( p_id INT PRIMARY KEY, p_name VARCHAR(20), gross_sales DECIMAL ); -- Inserting values into the table 'product' INSERT INTO product VALUES (1, 'Mobile', 'Electronics'); INSERT INTO product VALUES (2, 'TV', 'Electronics'); INSERT INTO product VALUES (3, 'Car', 'Toy'); INSERT INTO product VALUES (4, 'Video game', 'Toy'); INSERT INTO product VALUES (5, 'Earphones', 'Electronics'); -- Inserting values into the table 'sales' INSERT INTO sales VALUES (1, 'Mobile', 50000); INSERT INTO sales VALUES (2, 'TV', 40000); INSERT INTO sales VALUES (3, 'Car', 50000); INSERT INTO sales VALUES (5, 'Earphones', 500000); -- Show all columns from the table 'product' SELECT * FROM product; -- Show all columns from the table 'product' SELECT * FROM sales;
“product ” table:
” sales ” table:
In the above example, I have created two tables ” product ” and ” sales “.
The description of the columns of the table ” product ” :
- p_id: refers to the unique ID of the product
- p_name: refers to the name of the product
- category: refers to the category to which each product belongs
The description of the columns of the table ” sales ” :
- p_id: refers to the unique ID of the product
- p_name: refers to the name of the product
- gross_sales: refers to the gross sales of each product
Then, I inserted values into the tables ” product ” and ” sales “.
-- Creating CTE WITH TEMP_CTE AS ( SELECT p.category AS category, COUNT(*) AS No_of_products, SUM(s.gross_sales) AS Total_gross_sales FROM product p JOIN sales s ON p.p_id=s.p_id GROUP BY category ORDER BY Total_gross_sales DESC ) SELECT * FROM TEMP_CTE;
Now, we have to find the gross sales in each category of the product along with the number of products in each category.
To achieve this, I created CTE named “TEMP_CTE”. In this, I have used JOIN to join the two tables “product” and “sales”. I wanted to join the tables over the records which have matches in both the tables. Hence, I have used INNER JOIN. INNER JOIN is also called JOIN. So if you write JOIN in your query instead of INNER JOIN, it will be valid only. p_id is the common column in both the tables.
To find the total gross sales and no. of products, we need to use aggregate functions sum() and count() respectively. To use the aggregate functions, GROUP BY clause has to be there. So we have grouped our result by category and find the sum of the gross sales in each category along with the product count.
Lastly, Total gross sales is ordered into descending, so that we can see the category with the highest gross sales.
CTE in SQL vs. Other Techniques
Comparing CTEs with subqueries
1. Performance implications
CTEs and subqueries often have similar performance characteristics. However, CTEs can improve readability and maintainability in complex scenarios. In some cases, the query optimizer might generate better execution plans for CTEs, resulting in faster performance.
2. Use cases for each approach
- Use CTEs for simplifying large queries, breaking them into smaller, more readable parts, or when the same subquery is used multiple times within the query.
- Use subqueries for single-use, simple, or correlated operations that don’t benefit from CTE’s modularity.
Comparing CTEs with temp tables
1. Differences in usage and storage
CTEs are temporary result sets, used for readability and maintainability, existing only for the scope of a single query. They are not stored in the database.
Temp tables are physical tables stored temporarily in the database, allowing indexing, modification, and access across multiple queries within the same session.
2. Scenarios for choosing one over the other
- Use CTEs for single-query scenarios where readability and maintainability are crucial, and there’s no need to persist data beyond the query’s scope.
- Use temp tables when you need to store intermediate results for multiple queries, require indexing for better performance, or need to modify data within the table during the session.
Advantages of Using CTEs
CTEs provide various advantages that make them a valuable tool for SQL developers:
1. Improved query readability
CTEs break down complex queries into smaller, more manageable parts. By naming each part and isolating it within a CTE, developers can easily understand the purpose of each component and the overall query logic, making the code more readable.
2. Enhanced maintainability
As CTEs improve readability, they also make the code easier to maintain. When updating or modifying a query, developers can quickly identify the relevant CTEs and make changes without affecting the rest of the query. This modularity simplifies troubleshooting and reduces the likelihood of introducing errors.
3. Encapsulation of logic
CTEs allow developers to encapsulate specific portions of a query’s logic, promoting a modular approach to query design. This abstraction not only makes the query more maintainable but also promotes code reusability, as the same CTE can be referenced multiple times within a query, reducing code duplication.
Types of CTEs in SQL
1. Non-Recursive CTEs
Purpose and Usage
Non-Recursive CTEs are used to simplify complex queries and improve code readability. They serve as a temporary result set, allowing developers to break down and organize various parts of the query logically, which can be referenced later within the same query.
- Aggregating sales data from multiple tables, making it easier to calculate KPIs and metrics.
- Filtering and transforming data before joining it with other tables, improving query performance and readability.
2. Recursive CTEs
Purpose and Usage
Recursive CTEs enable querying hierarchical or iterative data structures, such as parent-child relationships, tree traversal, or finding the shortest path in a graph. They consist of an initial, non-recursive part (called anchor member) and a recursive part (called recursive member), which references the CTE itself.
- Retrieving the entire hierarchy of employees and their managers in an organization.
- Finding the shortest path between two nodes in a transportation network or the least number of steps to solve a puzzle.
To sum up, I would like to say; Common Table Expressions is a very easy concept to implement. Some of the key takeaways are:
- Using CTE makes the SQL code readable, increasing the code’s efficiency.
- CTE makes the analysis easy for analysts.
- Debugging which is one of the important parts of the queries, becomes easier due to CTE.
- To extract specific information (to be more specific, columns) from a lot of data, CTE is the most effective way to handle this situation.
If you are giving an interview for any analyst position or data scientist, the interviewer checks the knowledge of SQL. That time CTE plays a vital role in showcasing your knowledge and let the interviewer you your problem-solving skills and capabilities with greater complexity.
I hope this article about Common Table Expressions finds you insightful and helps you in your knowledgeable data journey with SQL. Hopefully, your knowledge of the amount of salt in the food has increased!
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Frequently Asked Questions
A. CTE and subqueries often have comparable performance. However, CTEs can make your query more readable and maintainable, especially in complex scenarios. Sometimes, CTEs can be faster due to better execution plans generated by the query optimizer.
A. CTEs are temporary result sets, used for readability and maintainability, and exist only for the scope of a single query. Temp tables are physical tables stored temporarily in the database, and they can be indexed, modified, and accessed across multiple queries within the same session.
A. To use 3 CTEs, you can chain them together, separated by commas:
CTE1 AS (SELECT …),
CTE2 AS (SELECT … FROM CTE1 …),
CTE3 AS (SELECT … FROM CTE2 …)
SELECT … FROM CTE3 …;
A. CTEs in SQL can be categorized into two types: Non-Recursive and Recursive. Non-Recursive CTEs are used for simplifying complex queries, while Recursive CTEs are used for querying hierarchical or iterative data structures, like retrieving parent-child relationships or traversing trees.