What is Common Table Expressions (CTE) in SQL?

Bhakti Pawar 04 Jul, 2024
11 min read

In structured query language (SQL), mastering Common Table Expressions (CTEs) is akin to unlocking a powerful data manipulation and analysis toolset. CTEs SQL, often abbreviated as CTE, are the cornerstone of efficient SQL coding. They offer a streamlined approach to handling complex queries and enhance code readability.

CTE, or Common Table Expression, serves as a temporary table within SQL, providing a concise and efficient way to reference and manipulate data from the original table. With CTEs, developers can streamline their code, focusing only on the necessary columns and operations, thus enhancing efficiency and readability. This comprehensive guide delves deep into the realm of CTEs, exploring their syntax, applications, and best practices.

Learning Objectives:

  • Understand what Common Table Expressions (CTEs) are and their purpose in SQL.
  • Learn how to create and utilize CTEs in SQL queries
  • Recognize the advantages of using CTEs for simplifying complex queries and improving code readability

This article was published as a part of the Data Science Blogathon.

What is CTE in SQL?

CTE is the short form for Common Table Expressions. CTE is one of the most powerful tools of SQL (Structured Query Language), and it also helps to clean the data. SQL is the concept used to simplify coding and help get results as quickly as possible. CTE in SQL is the temporary table used to reference the original table. If the original table contains too many columns and requires only a few, we can make CTE (a temporary table) containing only the required columns.

Complex problems can be solved easily with good readability of code using CTE. A code should always be easier to read so that if you refer to it after some time, you will immediately get that code. Even if the third person is referring to your code, he/she can understand it very well if the readability of your code is easy and good. CTE function in SQL is one of the most important concepts in SQL. If you know about CTE, you are stepping into the world of advanced SQL. Common Table Expressions (CTE) helps Data analysts, data scientists, or any other analysts who want to extract their data efficiently and effectively from a table containing big data.

Types of CTEs in SQL

1. Non-Recursive CTEs

Non-recursive Common Table Expressions simplify complex queries and improve code readability. They serve as a temporary result set, allowing developers to logically break down and organize various parts of the query, which can be referenced later within the same query.

Real-life examples

  • Aggregating sales data from multiple tables makes calculating KPIs and metrics easier.
  • Filtering and transforming data before joining it with other tables improves query performance and readability.

2. Recursive CTEs

Recursive Common Table Expressions enables querying hierarchical or iterative data structures, such as parent-child relationships, tree traversal, or finding the shortest path in a graph. They consist of an initial, non-recursive part (called anchor member) and a recursive part (called recursive member), referencing the CTE in SQL.

Real-life examples

  • Retrieving the entire hierarchy of employees and their managers in an organization.
  • Finding the shortest path between two nodes in a transportation network or the least number of steps to solve a puzzle.

Advantages of Using CTEs

CTEs provide various advantages that make them a valuable tool for SQL developers:

1. Improved query readability

CTEs break down complex queries into smaller, more manageable parts. By naming each part and isolating it within a SQL CTE, developers can easily understand the purpose of each component and the overall query logic, making the code more readable.

2. Enhanced maintainability

CTE function in SQL improve readability and make the code easier to maintain. When updating or modifying a query, developers can quickly identify the relevant CTEs SQL and make changes without affecting the rest. This modularity simplifies troubleshooting and reduces the likelihood of introducing errors.

3. Encapsulation of logic

CTEs in SQL allow developers to encapsulate specific portions of a query’s logic, promoting a modular approach to query design. This abstraction makes the query more maintainable and promotes code reusability, as the same CTE can be referenced multiple times within a query, reducing code duplication.

Creating and Using CTE in SQL

Let us now learn how to create and use CTE in SQL.

Syntax of CTE

To define a CTE, use the WITH keyword, followed by the CTE name, optional column list, and the AS keyword with a query expression in parentheses: 

WITH CTE_NAME AS
( 
 SELECT column_name1, column_name2,..., column_nameN
 FROM table_name
 WHERE condition
)cte in sql
SELECT column_name1, column_name2,..., column_nameN
FROM CTE_NAME;

Querying with CTEs

Let us now utilize Common Table Expressions (CTEs) to select and manipulate data in SQL queries.

1. Selecting data from CTEs

After defining a CTE, you can select data from it using a standard SELECT statement:

SELECT column1, column2

FROM CTE_name

WHERE ...

2. Inserting, updating, and deleting data using CTEs

You can use CTEs to manipulate data in your base tables. To insert, update, or delete data, reference the CTE in the respective statement:

-- Insert

INSERT INTO target_table

SELECT ... FROM CTE_name ...

-- Update

UPDATE target_table

SET ...

FROM CTE_name

WHERE ...

-- Delete

DELETE target_table

FROM target_table

JOIN CTE_name ON …

Chaining Multiple CTEs

To use multiple CTEs, chain them together, separating each definition with a comma:

Example of CTE in SQL

Here is a tutorial on how to implement CTE in SQL server.

Example 1:

Below code is executed in MySQL:

-- Creating databse

CREATE DATABASE employee_details;
-- Use database

USE employee_details;

-- Creating table

CREATE TABLE employee
(
emp_id INT PRIMARY KEY,
emp_name `VARCHAR`(20),
email_id `VARCHAR`(30),
city `VARCHAR`(25),
Verification_status `CHAR`(3)
);
-- Inserting values into the table
INSERT INTO employee VALUES (1,'Suhas','[email protected]','Mumbai','Yes');
INSERT INTO employee VALUES (2,'Yohani','[email protected]','Mumbai','No');
INSERT INTO employee VALUES (3,'Reshama','[email protected]','Pune','Yes');
INSERT INTO employee VALUES (4,'Raj','[email protected]','Bangalore','No');
INSERT INTO employee VALUES (5,'Shivani','[email protected]','Bangalore','Yes');
SELECT * FROM employee;

The Output is given below:

Output | common table expressions

The Output is given below:

In the example above, I created a table named ’employee’ containing employee details.

The description of the column list of the ’employee’ table is listed below:

  • emp_id: refers to the unique employee ID of each employee
  • emp_name: refers to the name of each employee
  • email_id: refers to the employee ID of each employee
  • city:  refers to the city where the employee is situated
  • Verification_status: If an employee’s background details are verified, this value is stored as ‘Yes’; if they are not verified, it is stored as ‘No.’

Suppose we want to extract specific information about employees; then, we can achieve this by creating SQL CTE.

If we want to check the verification status of employees, whether they are verified or not, and see the records of the employees whose verification is not done yet, we require only two columns in our CTE (temporary table): emp_name and Verification_status, which contain the value ‘No’ from the table ’employee.’

The following code shows the example of creating CTE in SQL using the table ’employee’:

-- Creating CTE
WITH mycte AS
(
 SELECT  emp_name , Verification_status
 FROM employee
 WHERE Verification_status = 'No'
)
SELECT * FROM mycte;

The Output is given below:

Output | common table expressions

Here, ” mycte ” is the name of the CTE (temporary table). ” mycte contains the columns emp_name and Verification_status. It contains the details of the employees whose incomplete verification is what we wanted to achieve.

We can make one or more than one CTE from the same table separated by a comma.

How to Avoid Error in Common Table Expressions?

This error occurs because you create the Common Table Expression but do not use it when selecting any of the columns from CTE. See the corrected code above for creating CTE, where I have created CTE and then selected the columns from this CTE in SQL. So, it is necessary to use the CTE function in SQL by selecting the required columns to avoid the error. It is important to note that you can use the CTE in a query where you created it. But you cannot use this Common Table Expression in later queries, that is, inside the queries that will be created later. For example, I created a Common Table Expression named “mycte” in the above example. Then my next SQL query is as follows:

SELECT *

FROM mycte;

This SQL query is not valid. Because here in this SQL query, I cannot use “mycte” CTE.

” mycte ” CTE does not exists for this SQL query.

” mycte ” CTE exists for the only SQL query where it has been created. That is why CTE (Common Table Expression) is known as a Temporary table in SQL.

Example 2:

Now, let us understand how to use the JOIN query inside CTE. Consider the MySQL code below.

-- Creating table product

CREATE TABLE product

(

p_id INT PRIMARY KEY,

p_name `VARCHAR`(20),

category `VARCHAR`(30)

);

-- Creating table sales

CREATE TABLE sales

(

p_id INT PRIMARY KEY,

p_name `VARCHAR`(20),

gross_sales DECIMAL

);

-- Inserting values into the table 'product'

INSERT INTO product VALUES (1, 'Mobile', 'Electronics');

INSERT INTO product VALUES (2, 'TV', 'Electronics');

INSERT INTO product VALUES (3, 'Car', 'Toy');

INSERT INTO product VALUES (4, 'Video game', 'Toy');

INSERT INTO product VALUES (5, 'Earphones', 'Electronics');

-- Inserting values into the table 'sales'

INSERT INTO sales VALUES (1, 'Mobile', 50000);

INSERT INTO sales VALUES (2, 'TV', 40000);

INSERT INTO sales VALUES (3, 'Car', 50000);

INSERT INTO sales VALUES (5, 'Earphones', 500000);

-- Show all columns from the table 'product'

SELECT * FROM product;

-- Show all columns from the table 'product'

SELECT * FROM sales;

Output :

“product ” table:

CTE in SQL

” sales ” table:

 Source: Author

In the above example, I have created two tables product and sales.

The description of the columns of the table ” product ” :

  • p_id: refers to the unique ID of the product
  • p_name: refers to the name of the product
  • category: refers to the category to which each product belongs

The description of the columns of the table ” sales ” :

  • p_id: refers to the unique ID of the product
  • p_name: refers to the name of the product
  • gross_sales: refers to the gross sales of each product

Then, I inserted values into the tables product and sales .

-- Creating CTE

WITH TEMP_CTE AS

(

SELECT p.category AS category,

       COUNT(*) AS No_of_products,

       SUM(s.gross_sales) AS Total_gross_sales

FROM product p JOIN sales s 

ON p.p_id=s.p_id

GROUP BY category

ORDER BY Total_gross_sales DESC

)

SELECT * FROM TEMP_CTE;

Now, we have to find the gross sales in each product category and the number of products in each category.

To achieve this, I created the Common Table Expression named “TEMP_CTE”. I have used JOIN to join the two tables, “product” and “sales.” I wanted to join the tables over the records with matches in both tables. Hence, I have used INNER JOIN. INNER JOIN is also called JOIN. So, if you write JOIN in your query instead of INNER JOIN, it will be valid only. p_id is the common column in both tables.

To find the total gross sales and no. of products, we need to use aggregate functions sum() and count(), respectively. To use the aggregate functions, the GROUP BY clause has to be there. So, we have grouped our results by category and found the sum of the gross sales in each category and the product count.

Lastly, Total gross sales are ordered in descending order to see the category with the highest gross sales.

Why do We Need CTE in the SQl Server?

CTEs in SQL Server provide numerous benefits compared to traditional methods for creating complex queries. Here is the reason why they are helpful:

Enhanced Clarity: Breaking down complex queries into smaller, manageable CTEs can greatly improve the readability of your code. This will simplify the process of maintaining and altering the queries for both you and others in the future.

Modularization: Common Table Expressions behave as temporary named result sets that can be referenced within the main query. This methodical strategy helps in concentrating on precise data alterations and computations in every CTE, thus simplifying the overall logic comprehension.

Reusability allows for referencing a CTE multiple times in a query because it is a named result set. This eliminates the necessity of repeating intricate subqueries, cutting down on redundancy and enhancing code efficiency.

Hierarchical Data Management: Recursive CTEs are a unique form of CTE designed for handling hierarchical data structures. This is especially handy for tasks such as navigating bills-of-materials or organizational charts

CTE in SQL vs. Other Techniques

Comparing CTEs with subqueries

1. Performance implications

CTEs and subqueries often have similar performance characteristics. However, CTEs can improve readability and maintainability in complex scenarios. In some cases, the query optimizer might generate better execution plans for CTEs, resulting in faster performance.

2. Use cases for each approach
  • Use CTE in SQL Server to simplify large queries, break them into smaller, more readable parts, or when the same subquery is used multiple times within the query.
  • Use subqueries for single-use, simple, or correlated operations that don’t benefit from CTE’s modularity.

Comparing CTEs with temp tables

1. Differences in usage and storage

The Common Table Expressions are temporary result sets that exist only for the scope of a single query and are not stored in the database. They are used for readability and maintainability.

Temp tables are physical tables stored temporarily in the database. They allow indexing, modification, and access across multiple queries within the same session.

2. Scenarios for choosing one over the other
  • Use CTEs for single-query scenarios where readability and maintainability are crucial, and data need not be persistent beyond the query’s scope.
  • Use temp tables when you need to store intermediate results for multiple queries, require indexing for better performance, or need to modify data within the table during the session.

Tips and Best Practices

  • Descriptive Naming: Opt for meaningful names for your CTEs to enhance the clarity of your code, aiding in easier understanding and maintenance.
  • Conciseness: Keep your CTEs focused on a single task or transformation, avoiding unnecessary complexity that could obscure their purpose.
  • Non-Recursive vs. Recursive CTEs: Reserve recursive CTEs for tasks involving hierarchical or iterative operations, while non-recursive CTEs can simplify complex queries without such structures.
  • Optimize Query Performance: When chaining multiple CTEs, consider the order of execution to optimize query performance, ensuring efficient data retrieval and processing.
  • Readability: Prioritize readability in your SQL queries by structuring CTEs and their associated statements in a clear and organized manner, making it easier for other developers to comprehend your code.

Conclusion

In conclusion, Common Table Expressions (CTEs) in SQL are a powerful tool for simplifying complex queries and enhancing code readability. By breaking down intricate SQL operations into modular, maintainable components, CTEs empower developers to unleash the full potential of their database queries. Whether basic data retrieval or tackling recursive operations, CTEs offer a versatile solution for various SQL challenges. With their ability to streamline coding processes and improve query structure, CTEs are indispensable for data analysts, data scientists, and other professionals working with large datasets.

By diving into the world of CTEs, SQL developers can elevate their skills and optimize their database operations, ultimately driving greater efficiency and effectiveness in data manipulation. Some of the key takeaways are:

  • Using CTE makes the SQL code readable, increasing the code’s efficiency.
  • CTE makes the analysis easy for analysts.
  • Debugging, one of the important parts of the queries, becomes easier due to CTE.
  • To extract specific information (to be more specific, columns) from a lot of data, CTE is the most effective way to handle this situation.

If you are giving an interview for any analyst position or data scientist, the interviewer checks the knowledge of SQL. That time, CTE plays a vital role in showcasing your knowledge and letting the interviewer see your problem-solving skills and capabilities with greater complexity.

I hope this article about Common Table Expressions finds you insightful and helps you in your knowledgeable data journey with SQL. Hopefully, your knowledge of salt in the food has increased! Also, you will get a proper information about the cte in SQL server.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion. 

Frequently Asked Questions

Q1. What does a CTE do in SQL?

A. A Common Table Expression (CTE) in SQL is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps to simplify complex queries by breaking them into more manageable parts.

Q2. Is CTE better than a subquery?

A. CTEs can be better than subqueries for readability and maintenance. They make queries easier to understand by giving parts of the query a name, which is helpful in complex operations. However, performance-wise, they are often similar to subqueries.

Q3. What is CTE in SQL interview?

A. In an SQL interview, a CTE might be discussed to test your understanding of writing and optimizing queries. You might be asked to explain its syntax, advantages, and how it differs from subqueries or temporary tables.

Q4. What is the difference between CTE and table in SQL?

A. A CTE is a temporary, named result set defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. A table, on the other hand, is a permanent data storage structure in the database. CTEs are for short-term use within a query, while tables store data persistently.

Bhakti Pawar 04 Jul, 2024

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers