Advanced SQL for Data Science

Simran Joshi 02 Sep, 2022 • 9 min read

This article was published as a part of the Data Science Blogathon.

Introduction 

The article focuses on techniques to deal with a wide range of data types; mastering these can be useful for the user. The article doesn’t focus on the basics of SQL, including standard syntax, functions, and applications but aims to expand the fundamental knowledge of SQL. The writing also covers the concept of subqueries.

sql server

Let’s get started!

SQL Functions

SQL provides built-in functions for performing operations, categorized into two types. The types are:

1. Aggregate Functions

These functions are used to perform operations on values of the column, and a single value is returned. The SQL provides the following aggregate functions:

AVG(): It returns the calculated average value from values of the selected numeric column

Syntax of AVG() function:

Select AVG(column_name) From table_name;

Example of AVG() function:

Select AVG(Salary) AS AverageSalary From Employees;

COUNT(): This function is used to count the number of rows returned in a Select Statement.

Syntax of COUNT() function:

Select COUNT(column_name) From table_name;

Example of COUNT() function:

Select COUNT(*) AS NumEmployees From Employees;

FIRST(): The function returns the first value of the selected column.

Syntax of FIRST() function:

Select FIRST(column_name) From table_name;

Example of FIRST() function:

Select FIRST(Employee_ID) AS FirstEmployee From Employees;

LAST(): The function returns the last value of the selected column.

Syntax of LAST() function:

Select LAST(column_name) From table_name;

Example of LAST() function:

Select LAST(Employee_ID) AS LastEmployee From Employees;

MAX(): The function returns the maximum value of the selected column.

Syntax of MAX() function:

Select MAX(column_name) From table_name;

Example of MAX() function:

Select MAX(Salary) AS MaxSalary From Employees;

MIN(): The function returns the minimum value of the selected column.

Syntax of MIN() function:

Select MIN(column_name) From table_name;

Example of MIN() function:

Select MIN(Salary) AS MinSalary From Employees;

SUM(): The function returns the sum of the values of the selected column.

Syntax of SUM() function:

Select SUM(column_name) From table_name;

Example of SUM() function:

Select SUM(Salary) AS TotalSalary From Employees;

2. Scalar Functions

The scalar functions are based on user input and return a single value. Let’s understand through scalar functions:

UCASE(): The function converts the value of a field to uppercase.

Syntax of UCASE() function:

Select UCASE(column_name) From table_name;

Example of UCASE() function:

Select UCASE(Ename) From Employees;

LCASE(): The function converts the value of a field to lowercase.

Syntax of LCASE() function:

Select LCASE(column_name) From table_name;

Example of LCASE() function:

Select LCASE(Ename) From Employees;

MID(): The function extracts texts from the text field.

Syntax of MID() function:

Select MID(column_name,start,length) FROM table_name;

Specifying the length is not compulsory here and the start represents the start position.

Example of MID() function:

Select MID(Ename, 1, 4) From Employees;

LEN(): The function returns the length of the specified value.

Syntax of LEN() function:

Select LENGTH(column_name) From table_name;

Example of LEN() function:

Select LENGTH(Ename) From Employees;

ROUND(): The function returns the round numeric value to the specified decimal places. This arithmetic operation is performed considering IEEE 754 standard.

Syntax of ROUND() function:

Select ROUND(column_name, decimals) From table_name;

decimals in the syntax specify the number of decimals to be fetched.

Example of ROUND() function:

Select ROUND(Salary, 0) From Employees;

NOW(): The function returns the current date and time of the system.

Syntax of NOW() function:

Select NOW() From table_name;

Example of NOW() function:

Select Ename, NOW() From Employees;

FORMAT(): The function formats how a field is to be presented.

Syntax of FORMAT() function:

Select FORMAT(column_name, format) From table_name;

Example of FORMAT() function:

Select Ename, FORMAT(NOW(), 'YYYY-MM-DD') AS Date From Employees;

CONCAT(): The function joins the values stored in different columns, or it can be used to join two strings simply.

Syntax of CONCAT() function:

Select CONCAT(string_1, string_2,...., string_n) AS Alias_Name;

Select CONCAT(column_name1, column_name2,...., column_name_n) From table_name;

Example of CONCAT() function:

Select CONCAT('Hello', ' Everyone') As Gesture;

Select CONCAT(FirstName, LastName) AS EmployeeName From Employee;

REPLACE(): The function replaces the occurrence of a specified value with the new one.

Syntax of REPLACE() function:

Select REPLACE(Original_Value, Value_to_Replace, New_Value) AS Alias_Name;

Select REPLACE(Column_Name, Character/string_to_replace, new_String/character ) AS Alias_Name FROM Table_Name;

Example of REPLACE() function:

Select REPLACE('APPSE', 'S', 'L');

Select LastName, REPLACE(LastName, 'r', 'a') AS Replace_r_a From Employees;

POSITION(): The function returns the position of the first occurrence of a specified substring in a string.

Syntax of POSITION() function:

Select POSITION(substring IN string/column_name);

Example of POSITION() function:

Select POSITION("A" IN "APPLE") As Position;

Select POSITION("a" in FirstName) From employees;

SQL Joins

As the name suggests, JOIN means combining something, which refers to combining two or more tables. The JOIN combines the data of two or more tables in a database. The joins are used if we want to access the data of multiple tables simultaneously. The joining of tables is done based on a common field between them.

According to ANSI standards, there are five types of JOIN:

  • INNER Join
  • LEFT OUTER Join
  • RIGHT OUTER Join
  • FULL OUTER Join
  • CROSS Join

Firstly, let’s look at how SQL JOIN works:

Suppose we have two tables:

1. Parent Table

ID Name Age Address Salary
1 Ram 26 Mumbai 20000
2 Jack 28 Delhi 18000
3 John 25 Pune 25000
4 Amy 32 Delhi 22000

2. Student Table

Student_Id Class Class_ID Grades
101 9 1 A
102 8 3 B
103 10 4 A

So, if we use the following JOIN statement:

Select ID, Name, Student_ID, Grades
From Parent p, Student s
Where p.ID = s.Class_ID;

The result would be:

ID Name Student_ID Grades
3 John 102 B
1 Ram 101 A
4 Amy 103 A
1 Ram 101 A

Now, let’s look at different types of joins:

1. SQL OUTER JOIN

In the outer JOIN of SQL, the content of the specified tables is integrated whether their data matches or not.

Outer join is done in two ways:

  1. Left outer join, or a left join, returns all the rows from the left table, combining them with the matching rows of the right table. If there is no matching data, it returns NULL values.
  2. Right outer join, or a right join, returns all the rows from the right table, combining them with the matching rows of the left table. If there is no matching data, it returns NULL values.

Syntax of LEFT JOIN:

Select table1.column1, table2.column2,....
From table1 LEFT JOIN table2
ON table1.coulmn_field = table2.column_field;
Example of LEFT JOIN:
Select ID, Name, Student_Id, Grades
From Parent LEFT JOIN Student 
ON Parent.ID = Student.Class_ID;

Syntax of RIGHT JOIN:

Select table1.column1, table2.column2,....
From table1 RIGHT JOIN table2
ON table1.coulmn_field = table2.column_field;

Example of RIGHT JOIN:

Select ID, Name, Student_Id, Grades
From Parent RIGHT JOIN Student 
ON Parent.ID = Student.Class_ID;

2. SQL FULL JOIN

The full join or full outer join of SQL returns the combination of both right and left outer join, and the resulting table has all the records from both tables. If no matches are found, then the NULL value is returned.

Syntax of FULL OUTER JOIN:

Select * 
From table1 FULL OUTER JOIN table2
ON table1.column_name = table2.column_name;

Example of FULL OUTER JOIN:

Select *
From Parent FULL OUTER JOIN Student
ON Parent.ID = Student.Class_ID;

3. SQL CROSS JOIN

The SQL cross join used to combine the data gives the cartesian product of the sets of rows from the joined table. When each row of the first table is combined with every single row of the second table, it is called Cartesian join or Cross join.

The resulting number of rows equals the product of the number of rows in the first table to the number of rows in the second table.

Syntax of CROSS JOIN:

Select table1.column1, table2.column2,....
From table1 CROSS JOIN table2
ON table1.coulmn_field = table2.column_field;

Example of CROSS JOIN:

Select * 
From Parent CROSS JOIN Student
ON Parent.ID = Student.Class_ID;

Grouping Data

In SQL, the GROUP BY statement is used for organizing data into groups based on similarity in the data. Further data is organized with the help of the equivalent functions. In simple words, if different rows of a specified column have the same values, they are placed together in a group. The following criteria are taken into consideration while using Group By statement:

  • In SQL, the Select statement is used with the GROUP BY clause.
  • Where clause is used before GROUP BY clause.
  • ORDER BY clause is placed after this clause.

Syntax of GROUP BY clause:

Select column1, function_name(column2)
From table_name
Where condition
GROUP BY column1, column2
ORDER BY column1, column2;

Example of GROUP BY clause:

Select Name, SUM(Salary), Age 
From Employee
GROUP BY Age;

One important point to remember here is that the Where clause is used for deciding purposes. It is used to place conditions on columns to decide the part of the result table. Here, we cannot use aggregate functions like COUNT(), SUM(), etc. with the Where clause. So, we use the Having clause.

Syntax of Having Clause:

Select column1, function_name(column2)
From table_name
Where condition
GROUP BY column1, column2
Having condition
ORDER BY column1, column2;

Example of Having Clause:

Select Name, SUM(Salary), ID 
From Employee
GROUP BY Name
Having SUM(Salary)>18000
ORDER BY ID;

The Select statement can use constants, aggregate functions, expressions, and column names in the GROUP BY clause.

SQL CASE

The CASE statement operates like if-then-else logical queries. When the specified condition is true, the statement returns the specified value. If the condition turns out to be false, it executes the ELSE part. When there is no specified ELSE condition, it returns a NULL value.

The CASE statement is used in Select, Delete, and Insert statements with Where, ORDER BY, and GROUP BY clauses.

Syntax of CASE statement:

CASE 
WHEN condition_1 THEN statement_1
WHEN condition_2 THEN statement_2
.
.
.
WHEN condition_N THEN statement_N
ELSE result
END;

The above query will go through each condition one by one. If the expression matches the query, it will print the result accordingly and skip all the condition statements afterward. If no condition matches the term, the control would go to the ELSE part and return its result. Here, the ELSE part is optional; in that case, it returns a NULL value if no condition satisfies the expression.

Example of CASE Statement:

Select Student_ID, Name, Subject, Marks,
CASE
WHEN Marks>=50 THEN 'PASS'
ELSE 'FAIL'
END AS Student_Result
From Student;

SQL View

To hide the complexity of the data and prevent unnecessary access to the database, SQL introduces the concept of VIEW. It allows the user to pick a particular column rather than the complete table. The view or virtual table as it is considered depends on the result-set of the predefined SQL query.

In the views, the rows don’t have any physical existence in the database, and just like SQL tables, views store data in rows and columns using the Where clause.

Syntax to create View from Single Table

Create VIEW View_name AS
Select column_name1, column_name2,....., column_nameN
From table_name
Where condition;

Syntax to create View from Multiple Tables

Create VIEW View_name AS
Select table_name1.column_name1, table_name2.column_name1,.....
From table_name1, table_name2,....., table_nameN
Where condition;

We can also modify the current view and insert new data, but it can only be done if the following conditions are followed:

  • Views based on one table can only be updated, not the one formed from multiple tables.
  • The view fields should not contain any NULL values.
  • The view doesn’t contain any subquery and DISTINCT keyword in its definition.
  • The view cannot be modified if the Select statement used to create a view contains JOIN, HAVING, or GROUP BY clause.
  • The view cannot be updated if any field contains an aggregate function.

Syntax to Update a View:

CREATE OR REPLACE VIEW View_name AS
Select column_name1, column_name2,...., column_nameN
From table_name
Where condition;

To delete the current view from the database DROP statement is used:

DROP VIEW View_name;

SQL UNION, UNION ALL, and EXCEPT

The UNION operator combines the result of two or more Select queries and results in a single output.

Syntax of UNION operator:

Select column_name1, column_name2,...., column_nameN From table_name1
UNION
Select column_name1, column_name2,...., column_nameN From table_name2
UNION
Select column_name1, column_name2,...., column_nameN From table_name3;

The UNION ALL operator has the same functionality as the UNION operator, the only difference is that UNION ALL operator shows the common rows in the result, whereas the UNION operator does not.

Syntax of UNION ALL operator:

Select column_name1, column_name2,...., column_nameN From table_name1
UNION ALL
Select column_name1, column_name2,...., column_nameN From table_name2;

The EXCEPT operator is used to filter out data. The statement combines the two select statements and returns the records that are present in the first Select query and not in the second Select query. It works in the same way as the minus operator does in mathematics.

Syntax of EXCEPT operator:

Select column_name1, column_name2,...., column_nameN From table_name1
EXCEPT
Select column_name1, column_name2,...., column_nameN From table_name2;

SQL Subqueries

A subquery or inner query is a nested SQL query inside a larger query. The subquery can be included inside a Select, Insert, Update, Delete statement or inside another subquery. Writing subqueries requires comparison operators, such as >, <, or =, and the comparison operator can also be a multiple-row operator like IN, ALL, or ANY.

Syntax to write an inner query:

Select column_name1, column_name2, ...., column_nameN
From table_name
Where operator (Select column_name1, ..., column_nameN
                               From table_name);

The subquery is executed before the outer or main query, and the main query uses its result.

There are some conditions required to be followed for writing a subquery:

  1. Subqueries are enclosed within parentheses.
  2. A subquery has only one column in the SELECT clause unless there are multiple columns in the main query for comparing the selected columns of the subquery.
  3. The subquery cannot contain an ORDER BY command, but the GROUP BY clause can be used to perform the same function, the main query can use an ORDER BY clause.
  4. Subqueries that return multiple rows can only be used with multiple value operators like IN operator.
  5. The BETWEEN operator cannot be used with a subquery but within it.

Example of a subquery:

Select *
From Employees
Where ID IN(Select ID From Employees_Another
            Where Salary > 4500);

The above statement selects the data of employees in the Employees table whose given salary is more than 4500 in the Employees_Another table.

Update Employees
SET Salary = Salary * 0.25
Where Age IN(Select Age From Employees_Another
             Where Age >= 27);

The above query updates the employees’ salary value in the Employees table if their age is greater than or equal to 27 in the Employees_Another table.

Conclusion

We now completely grasp SQL concepts and advanced queries and functions. Now you can pick any SQL project and start working on it. SQL is one of the most efficient languages for dealing with data in adequate time and space. If you are building a career in data science, give SQL a go!

In a nutshell, we learned about:

  • How can aggregated functions be used to get the required data, and how can we compute various functions
  • Creating and working on virtual tables
  • Some additional clauses like CASE, UNION, UNION ALL, and EXCEPT
  • The concept and working of SQL subqueries

SQL is all about constant practicing. Keep on practicing, and you will master it in no time!

Thank you.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Simran Joshi 02 Sep 2022

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers

Clear

Related Courses