I just completed Analytics Vidhya’s course on SQL for Data Science and here’s what I’ve learned in that.
Data Science essentially applies to get insights from data and analyzing it to draw conclusions. We all know, most of the data is present in databases and SQL is nothing but the language which helps us communicate with databases.
So learning SQL for Data Science is more of a prerequisite for being successful in this field.
The following article discusses how to manage SQL from Python which is a powerful skill to master.
What I used: Anaconda Command Prompt, Sublime Text Editor, OS: Windows 10, SQL Command Line
You need to set up a virtual environment. Why?
- Virtual Environment as the name suggests is different from the actual Python Environment.
- To manipulate SQL, you will need to install or make changes in some packages. The virtual environment will ensure that these changes do not affect other projects, the main python installation, or even people working on the same machine.
- So, a virtual environment provides you your copy of Python so that you can make changes, installing or uninstalling packages freely.
Creating a virtual environment
(I executed the following on Anaconda Command Prompt:)
conda create — name myenv (here myenv is the name of your virtual environment)
To activate this environment, use-
conda activate myenv ##This will create a directory namely myenv
Note how (base) changes to (myenv) when you activate your environment.
Similarly, to deactivate this environment, use
conda deactivate myenv
Installing required packages:
Run this command:
pip install mysql-connector-python
The required package is installed.
We will write our python code for executing SQL queries in Sublime Text Editor. To execute Sublime Text python files from the command prompt, you need to create an environment variable. If you don’t know how to do that, you can check this.
Now, when you type subl in the command prompt, it will open the Sublime Text Editor from right there. Create a new python file. I have named my file HelloWorld.py
Run a test command print(“Hello World”) to ensure that your file is executing.
Connecting to MySQL
db = mysql.connector.connect(host=’localhost’, user=’yourusername’, passwd=’yourpass’, database=’fifa19', port=’3307', auth_plugin=’mysql_native_password’)
1. type your MySQL username and password in the arguments user and passwd.
2. database will have the name of the database you want to use.
3. MySQL port is 3307. If your port is 3306, you don’t need to specify explicitly.
4. auth_plugin=’mysql_native_password’: this argument is specified to avoid authentication error.
This creates an object of MySQL.connector.connection
Viewing data from Database
We use the cursor method to create a cursor object which is used to execute statements to communicate with MySQL databases.
The following code is written in the text editor.
crs = db.cursor() crs.execute(“select * from players limit 10”) result = crs.fetchall()
1. crs is the name of my cursor object.
2. execute() accepts SQL queries as a parameter and executes the given query.
3. The result variable stores the result set of fetchall() method.
4. fetchall() method returns the result set of the query in the form of tuples.
5. pprint() prints the output in a more formatted and readable way. [You can import pprint with: from pprint import pprint]
Save and run the file through the command prompt. The output will look as follows:
Write the following code in the text editor:
cmd=”create table contacts(Name Varchar(255),PhoneNo int(12))” print(cmd) crs.execute(cmd)
Note that we execute all queries on the cursor object.
Save and run the file on the command prompt. This will create a table named ‘contacts’ in your database. You can check the same using your sql command line.
The code for the insert command looks like:
insert_command= “insert into contacts(Name,PhoneNo) values(%s,%s)” values=(“Sejal”,987654321)
Here, db.commit() is required to be executed every time we make changes to the database. This method is used to ensure that changes in the database are consistent.
Run the file and you will be able to see that data has been added to your table. You can verify this using SQL command line.
The code for the searching records looks like:
crs.execute(“select Name from players where Nationality=’United States’”) result = crs.fetchall()
Here, we are trying to look for names of players whose Nationality is the United States.
The output will look as follows:
The code for deleting records looks like:
crs.execute(“delete from contacts”) db.commit()
Remember, we use db.commit() to ensure that changes are reflected in the database.
After the file is run, you can see that the records have been deleted from the database.
Hope you learned something in a relatively easier way how to manipulate SQL in Python!