Learning SQL after Excel couldn’t be simpler!
I’ve spent more than a decade working on Excel. Yet, there is so much to learn. If you dislike coding, excel could be your rescue into data science world (to some extent). Once you understand Excel operations, learning SQL is very easy.
Why can’t you use Excel for serious data science work?
Now at this stage, you might ask, why can’t I use excel for all my work. There are several reasons for it:
- For large datasets, excel is not effective. Computations on large datasets either won’t happen or would take a lot of time. Just a caveat: Microsoft recently launched Power BI and I need to explore it. It might have changed the boundaries of large data.
- There is no audit trail in Excel. With tools based on coding and work flow management, you can re-look and re-run the process again and again. It is very difficult to do so in excel. If you change or delete a cell in Excel in an accidental manner, it is difficult to trace it back.
- Finally, Excel takes a lot of time to update libraries with latest algorithms in data science and machine learning. Try searching for XGboost and FTRL in excel!
Moving to SQL would address point 1 and point 2 to some extent. Moreover, SQL is one of the most sought out skills in a data scientist.
If you don’t know SQL yet and have worked in Excel, you can get started right now. I’ve designed this tutorial with keeping in mind, the most commonly used excel operations. Your previous experience blended with this tutorial can quickly make you a SQL expert. (Note: If you find any trouble, please write to me in comments section below.
Related : Basics of SQL and RDBMS for Beginners
List of Common Excel Operations
Here is the list of commonly used excel operation. In this tutorial, I’ve performed all these operations in SQL:
- View Data
- Sort Data
- Filter Data
- Delete Records
- Add Records
- Update Data in Existing Record
- Show Unique Values
- Write an expression to generate new column
- LookUp data from another table
- Pivot Table
1. View Data
In excel, we can view all the records directly. But, SQL requires a command to process this request. This can be done by using SELECT command.
SELECT column_1,column_2,…column_n | * FROM table_name;
A. View all data of Employee table
B. View only ECODE and Gender data of Employee table
2. Sort Data
Organizing information becomes important when you have more data. It helps to generate quick inferences. You can quickly organize an excel worksheet by sorting your data in ascending or descending order.
SELECT column_1,column_2,…column_n | * FROM table_name order by column_1 [desc], column_2 [desc];
A. Arrange records of Employee table in Descending order of Total_Payout.
Select * from Employee order by Total_Payout desc;
B. Arrange records of Employee table by City (ascending) and Total_Payout(descending).
Select * from Employee order by City, Total_Payout desc;
3. Filter Data
In addition to sorting, we often apply filter to analyze data in a better way. When data is filtered, only rows that meet the filter criteria is displayed while other rows get hidden. Also, we can apply multiple criterion to filter data.
SELECT column_1,column_2,…column_n | * FROM table_name where column_1 operator value;
Below are the common list of operators, we can use to form a condition.
|<>||Not equal. Note: In some versions of SQL this operator may be written as !=|
|>=||Greater than or equal|
|<=||Less than or equal|
|BETWEEN||Between an inclusive range|
|LIKE||Search for a pattern|
|IN||To specify multiple possible values for a column|
A. Filter observations associated with city “Delhi”
Select * from Employee where City="Delhi";
B. Filter observations of Department “Admin” and Total_Payout >= 500
Select * from Employee where Department="Admin" and Total_Payout >=500;
4. Delete Records
Deleting records or columns is a commonly used operation in Excel. In excel, we simply press ‘Delete’ key on keyboard to delete a record. Similarly, SQL has command DELETE to remove records from a table.
DELETE FROM table_name WHERE some_column=some_value;
A. Delete observations which have Total_Payout >=600
Delete * from Employee where Total_Payout >=600;
It removes two record only since these two observations satisfy the condition stated above. But be careful! if we do not provide any condition, it will remove all records from a table.
B. Delete observations which have Total_Payout >=600 and Department =”Admin”
Delete * from Employee where Total_Payout >=600 and Department ="Admin";
Above command will remove only one record which satisfies the condition.
5. Add records
We have seen methods to remove records, we can also add records to SQL table as we do in excel. INSERT command helps to perform this operation.
INSERT INTO table_name VALUES (value1, value2, value3,…); -> Insert values to all columns
INSERT INTO table_name (column1,column2,column3,…) VALUES (value1,value2,value3,…); -> Insert values to selected columns
Insert into employee values('A002','05-Nov-12',0.8,'Female','Admin',12.05,26,313.3,'Mumbai');
Select * from Employee where ECODE='A002';
B. Insert values to ECODE (A016) and Department (HR) only.
Insert into employee (ECODE, Department) values('A016','HR');
Select * from Employee where Department='HR';
6. Update Data in Existing Observations
Suppose, we want to update the name of “HR” department to “Manpower” for all employees. For such cases, SQL has a command UPDATE which performs this function.
UPDATE table_name SET column1=value1,column2=value2,… WHERE some_column=some_value;
Exercise: Rename Department “HR” to “Manpower”
Update Employee SET Department='Manpower' where Department='HR';
7. Show unique values
We can show unique values of variable(s) by applying DISTINCT keyword prior to variable name.
SELECT DISTINCT column_name,column_name FROM table_name;
Exercise: Show unique values of City
8. Write an expression to generate new column
In excel, we can create a column, based on existing column using functions or operators. This can be done in SQL using the commands below.
A. Create a new column Incentive which is 10% of Total_Payout
B. Create a new column City_Code which has first three characters of City.
Select *, Left(City,3) as City_Code from Employee where Department="Admin";
For more details on SQL functions, I would recommend you to refer this link.
9. LookUp data from another table
The most used function of excel by any BI professional / data analyst is VLOOKUP(). It helps to map data from other table to parent table. In other words, we can say that it is the ‘excel’ way of joining 2 data sets through a common key.
In SQL, we have similar functionality known as JOIN.
SQL JOIN is used to combine rows from two or more tables, based on a common field between them. It has multiple types:
- INNER JOIN: Return rows when there is a match in both tables
- LEFT JOIN: Return all rows from the left table and the matched rows from the right table
- RIGHT JOIN: Return all rows from the right table and the matched rows from the left table
- FULL JOIN: Return all rows when there is a match in ONE of the tables
SELECT table1.column1, table2.column2..... FROM table1 INNER | LEFT| RIGHT| FULL JOIN table2 ON table1.column = table2.column;
Exercise: Below is city category table “City_Cat”, now I want to map city category to Employee table and show all records of Employee table.Here, I want to show all records of table Employee. So, we will use left join.
SELECT Employee.*,City_Cat.City_Category FROM Employee LEFT JOIN City_Cat ON Employee.City = City_Cat.City;
To know more about JOIN operations, I would recommend you to refer this link.
10. Pivot Table
Pivot Table is an advanced way of analyzing data in excel. Not only it is useful, but it allows you to extract the hidden insights from data.
Moreover, it helps us to generate inference by summarizing data and allow us to manipulate it in different ways. This operation can be done in SQL by using aggregate functions and GROUP BY command.
SELECT column, aggregate_function(column) FROM table WHERE column operator value GROUP BY column;
A. Show sum of Total_Payout by Gender
SELECT Gender, Sum(Total_Payout) from Employee Group by Gender;
SELECT Gender, City, Count(City), Sum(Total_Payout) from Employee Group by Gender, City;
Did you find the article useful? Do let us know your thoughts about this transition guide in the comments section below