How to Read and Write With CSV Files in Python: Ultimate Guide to Working With CSV Files in Data Science
Introduction
CSV is a file format you will frequently come across while working in the field of Data Science. It is a plain-text format for tabular data, which makes it easy to read, share, and process. CSV files can be exported from spreadsheets and databases, converted from other formats such as JSON, or created directly from code in languages like Python or Java. In this article, I will introduce you to the basics of CSV files and walk you through the different ways of reading and writing CSV files in Python. We are focusing on Python because it ships with a built-in csv module that makes it easy to read data from CSV files and write data to them. The step-by-step Python tutorials in this article are meant to be simple and easy to follow, even for beginners.
Learning Objectives
- Understand the definition and structure of a CSV file
- Familiarize yourself with the different ways of reading a CSV
- Learn how to write to a CSV file in Python
Source: diegomariano.com
This article was published as a part of the Data Science Blogathon.
What is a CSV?
CSV stands for “Comma Separated Values.” It is one of the simplest ways to store tabular data as plain text: each line is a record, and the fields within a record are separated by commas. Knowing how to work with CSV files is important because, as data scientists, we rely on CSV data almost every day.
Structure of CSV
![Structure of CSV - header and observations/records [python read csv]](https://av-eks-blogoptimized.s3.amazonaws.com/26434csv sample10661.jpg)
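To make the structure concrete, here is a small CSV held in a Python string: the first line is the header (field names) and each following line is one record, with fields separated by commas. The column names and values are hypothetical, and io.StringIO is used only so the sketch runs without a file on disk.

```python
import csv
import io

# A hypothetical CSV: one header line followed by two records
sample = "Name,Age,City\nAlex,25,London\nBrad,31,Paris\n"

# csv.reader accepts any file-like object; StringIO stands in for a real file
rows = list(csv.reader(io.StringIO(sample)))

header, records = rows[0], rows[1:]
print(header)   # ['Name', 'Age', 'City']
print(records)  # [['Alex', '25', 'London'], ['Brad', '31', 'Paris']]
```

Note that every value comes back as a string, including the ages; converting types is up to you.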
How to Read CSV Files in Python?
There are several ways to read data from a CSV file; we will go through them one by one.
Using csv.reader
You can read CSV files using the csv.reader object from Python’s csv module.
Steps to read a CSV file using csv reader:
1. Import the csv library.
import csv
2. Open the CSV file.
The built-in open() function opens a file and returns a file object.
file = open('Salary_Data.csv')
type(file)
The type of file is “_io.TextIOWrapper”, which is the file object that open() returns for a file opened in text mode.
3. Use the csv.reader object to read the CSV file.
csvreader = csv.reader(file)
4. Extract the field names.
Use the built-in next() function to obtain the header.
next() returns the current row from the reader and advances it to the following row.
The first time you call next(), it returns the header; the next call returns the first record, and so on.
header = next(csvreader)
header
5. Extract the rows.
Loop over the remaining rows of csvreader and append each one to a list.
rows = []
for row in csvreader:
    rows.append(row)
rows
![Rows/records in csv [python read csv]](https://av-eks-blogoptimized.s3.amazonaws.com/19301rows71106.jpg)
6. Close the file.
The .close() method closes the opened file. Once it is closed, we cannot perform any further operations on it.
file.close()
Complete Code
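The complete code for steps 1 to 6 can be sketched as a single function. This is a minimal sketch, not the author's original listing: read_csv_file is a hypothetical helper name, and because the real Salary_Data.csv is not available here, the sketch first writes a stand-in file with hypothetical column names and values to a temporary directory (skip that part and call read_csv_file('Salary_Data.csv') on your own data).

```python
import csv
import os
import tempfile

def read_csv_file(path):
    """Read a CSV file and return (header, rows), following steps 1-6 above."""
    file = open(path, newline='')   # 2. open the file
    csvreader = csv.reader(file)    # 3. create a csv.reader object
    header = next(csvreader)        # 4. the first row is the header
    rows = []
    for row in csvreader:           # 5. every remaining row is a record
        rows.append(row)
    file.close()                    # 6. close the file
    return header, rows

# Hypothetical stand-in for Salary_Data.csv (column names and values are made up),
# written to a temporary directory so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), 'Salary_Data.csv')
with open(path, 'w', newline='') as f:
    f.write('YearsExperience,Salary\n1.0,40000\n2.0,45000\n')

header, rows = read_csv_file(path)
print(header)  # ['YearsExperience', 'Salary']
print(rows)    # [['1.0', '40000'], ['2.0', '45000']]
```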
It is easy to forget to close an open file. To avoid that, we can use the with statement, which releases the resources automatically: the file is closed when the block ends, even if an error occurs. In simple terms, there is no need to call the .close() method when using the with statement.
Implementing the above code using the with statement:
Syntax: with open(filename, mode) as alias_filename:
Modes:
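‘r’ – to read an existing file (the default mode),
‘w’ – to write to a file, creating it if it doesn’t exist and overwriting it if it does,
‘a’ – to append to the end of a file, creating it if it doesn’t exist,
‘+’ – combined with another mode (for example ‘r+’ or ‘w+’) to open a file for both reading and writing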
import csv
rows = []
# newline='' lets the csv module handle line endings itself
with open("Salary_Data.csv", 'r', newline='') as file:
    csvreader = csv.reader(file)
    header = next(csvreader)
    for row in csvreader:
        rows.append(row)
print(header)
print(rows)
![CSV python file [python read csv]](https://av-eks-blogoptimized.s3.amazonaws.com/21110with13394.jpg)
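To see how the ‘w’, ‘a’ and ‘r’ modes differ in practice, here is a short sketch. It uses a hypothetical file, notes.txt, in a fresh temporary directory so nothing in your working directory is touched.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

with open(path, 'w') as f:   # 'w' creates the file (or empties an existing one)
    f.write('first\n')

with open(path, 'w') as f:   # opening with 'w' again discards 'first'
    f.write('second\n')

with open(path, 'a') as f:   # 'a' keeps the existing content and writes at the end
    f.write('third\n')

with open(path, 'r') as f:   # 'r' only reads; the file must already exist
    content = f.read()

print(content)  # 'second' and 'third' on separate lines
```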
Using .readlines()
Now the question is: is it possible to fetch the header and rows using only open() and the with statement, without the csv library? Let’s see…
The .readlines() method is the answer. It returns all the lines in a file as a list of strings, so each item in the list is one row of our CSV file, as raw text (including the trailing newline and not yet split into fields).
The first item of file.readlines() is the header, and the rest are the records.
with open('Salary_Data.csv') as file:
    content = file.readlines()
header = content[:1]
rows = content[1:]
print(header)
print(rows)
![CSV file using .readlines() [python read csv]](https://av-eks-blogoptimized.s3.amazonaws.com/22485readlines69269.jpg)
**The trailing ‘\n’ (newline character) at the end of each line can be removed using the .strip() method.
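A minimal sketch of turning .readlines() output into fields: .strip() removes the newline and .split(',') separates the values. The sample lines are hypothetical. The last line shows the limitation of this approach: a quoted field that contains a comma is split in the wrong place, while csv.reader parses it correctly.

```python
import csv

# Hypothetical lines, as .readlines() would return them (newline included)
content = ['Name,City\n', 'Alex,London\n', '"Smith, Brad",Paris\n']

# .strip() drops the trailing newline; .split(',') separates the fields
naive = [line.strip().split(',') for line in content]
print(naive[1])  # ['Alex', 'London']
print(naive[2])  # ['"Smith', ' Brad"', 'Paris']  (the quoted comma is split wrongly)

# csv.reader understands quoting, so the same line parses into two fields
parsed = list(csv.reader(content))
print(parsed[2])  # ['Smith, Brad', 'Paris']
```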
What if we have a huge dataset with hundreds of features and thousands of records? Would plain lists still be practical to handle?
This is where the pandas library comes into the picture.
Using pandas
Let’s have a look at how pandas is used to read data from a CSV file.
1. Import pandas library.
import pandas as pd
2. Load CSV files to pandas using read_csv().
Basic Syntax: pandas.read_csv(filename, delimiter=',')
data = pd.read_csv("Salary_Data.csv")
data
![csv file python pandas [python read csv]](https://av-eks-blogoptimized.s3.amazonaws.com/62185data97273.jpg)
3. Extract the field names.
The .columns attribute returns the header (field names).
data.columns
![.columns in csv python pandas [python read csv]](https://av-eks-blogoptimized.s3.amazonaws.com/28062data columns19493.jpg)
4. Extract a column.
Each column of the DataFrame can be accessed by its field name, either as an attribute (as below) or with square brackets, as in data['Salary'].
data.Salary
![Row extraction in csv pandas [python read csv]](https://av-eks-blogoptimized.s3.amazonaws.com/68951salary92794.jpg)
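The pandas steps can be sketched end to end without the Salary_Data.csv file, because read_csv also accepts a file-like object such as io.StringIO. The YearsExperience column and all values below are hypothetical. The sketch also shows how to get a row rather than a column: .iloc selects rows by position.

```python
import io
import pandas as pd

# Hypothetical stand-in for Salary_Data.csv
csv_text = "YearsExperience,Salary\n1.0,40000\n2.0,45000\n3.0,52000\n"
data = pd.read_csv(io.StringIO(csv_text))

columns = list(data.columns)   # the header / field names
salaries = data['Salary']      # one column, same as data.Salary
first_row = data.iloc[0]       # one row, selected by its position

print(columns)             # ['YearsExperience', 'Salary']
print(salaries.tolist())   # [40000, 45000, 52000]
print(first_row)           # the first record as a Series indexed by field name
```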
Using csv.DictReader
A dict is Python’s built-in structure for storing key-value pairs, implemented as a hash table. The built-in dict() function creates a dictionary from keyword arguments or from an iterable of key-value pairs. The csv module’s DictReader reads each row of a CSV file into a dictionary whose keys are the field names. Here’s how you can do that.
1. Import the csv module
import csv
2. Open the CSV file using the open() function with the mode set to ‘r’ for reading.
with open('Salary_Data.csv', 'r', newline='') as csvfile:
3. Create a DictReader object using csv.DictReader().
Note that, unless you pass the fieldnames parameter, DictReader uses the first row of the CSV file as the field names.
    reader = csv.DictReader(csvfile)
4. Use the csv.DictReader object to read the CSV file.
Iterate through the rows of the CSV file with a for loop over the DictReader object. Each row is a dictionary with the field names as keys and that row’s entries as values.
    for row in reader:
        print(row)
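Putting the DictReader steps together gives the minimal sketch below. As before, it writes a hypothetical stand-in for Salary_Data.csv (made-up column names and values) to a temporary directory so it runs on its own. Each row is a dictionary keyed by the header fields, and all values are strings, so they need converting before arithmetic.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for Salary_Data.csv
path = os.path.join(tempfile.mkdtemp(), 'Salary_Data.csv')
with open(path, 'w', newline='') as f:
    f.write('YearsExperience,Salary\n1.0,40000\n2.0,45000\n')

records = []
with open(path, 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)   # field names come from the first row
    fieldnames = reader.fieldnames     # ['YearsExperience', 'Salary']
    for row in reader:
        print(row)                     # one dictionary per record
        records.append(row)

# Values are strings; convert them where numbers are needed
total = sum(int(row['Salary']) for row in records)
print(total)  # 85000
```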
How to Write to a CSV File in Python?
We can write to a CSV file in multiple ways.
Using csv.writer
The csv.writer() function returns a writer object that converts the input data into delimited strings and writes them to the given file object.
For example, let’s assume we are recording the data of 3 students (Name, M1 Score, M2 Score):
header = ['Name', 'M1 Score', 'M2 Score']
data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]
Now let’s see how this data can be written to a CSV file using csv.writer:
1. Import csv library.
import csv
2. Define a filename and open the file using open().
3. Create a csvwriter object using csv.writer().
4. Write the header.
5. Write the rest of the data.
Code for steps 2-5
filename = 'Students_Data.csv'  # 2. define a filename
with open(filename, 'w', newline="") as file:  # 2. open the file
    csvwriter = csv.writer(file)  # 3. create a csvwriter object
    csvwriter.writerow(header)  # 4. write the header
    csvwriter.writerows(data)  # 5. write the rest of the data
Below is how our CSV file looks.
![CSV file using csv.writer [python read csv]](https://av-eks-blogoptimized.s3.amazonaws.com/20904students_data10718.jpg)
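Why use csv.writer rather than joining strings yourself? The sketch below writes to an in-memory io.StringIO buffer instead of Students_Data.csv so the output can be inspected directly. The writer automatically quotes a field that contains the delimiter, and by default ends each row with ‘\r\n’; opening a real file with newline="" keeps that line ending from being translated again on Windows, which would otherwise produce blank lines between rows. The name ‘Smith, Alex’ is hypothetical.

```python
import csv
import io

header = ['Name', 'M1 Score', 'M2 Score']
data = [['Smith, Alex', 62, 80], ['Brad', 45, 56]]  # hypothetical rows

buffer = io.StringIO()          # stands in for the opened file
csvwriter = csv.writer(buffer)
csvwriter.writerow(header)
csvwriter.writerows(data)       # numbers are converted to strings for you

output = buffer.getvalue()
print(repr(output))
# 'Name,M1 Score,M2 Score\r\n"Smith, Alex",62,80\r\nBrad,45,56\r\n'
```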
Using .writelines()
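.writelines() takes a list (or any iterable) of strings and writes them to the file one after another. It does not add separators or newlines itself, so we first build each line as a comma-joined string ending in ‘\n’. Note that, unlike csv.writer, this approach does not quote values that contain commas.
header = ['Name', 'M1 Score', 'M2 Score']
data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]
filename = 'Student_scores.csv'
with open(filename, 'w') as file:
    # Build one string per line: fields joined by commas, ending in a newline
    lines = [','.join(header) + '\n']
    for row in data:
        lines.append(','.join(str(x) for x in row) + '\n')
    # Write all the lines at once
    file.writelines(lines)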
![CSV file using .writelines() [python read csv]](https://av-eks-blogoptimized.s3.amazonaws.com/95763student_scores59602.jpg)
Using pandas
Follow these steps to write to a CSV file using pandas:
1. Import pandas library.
import pandas as pd
2. Create a pandas dataframe using pd.DataFrame.
Syntax: pd.DataFrame(data, columns)
The data parameter takes the records/observations, and the columns parameter takes the columns/field names.
header = ['Name', 'M1 Score', 'M2 Score']
data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]
data = pd.DataFrame(data, columns=header)
3. Write to a CSV file using to_csv().
Syntax: DataFrame.to_csv(filename, sep=',', index=False)
**The separator (sep) is ',' by default.
index=False stops pandas from writing the row index as an extra first column.
data.to_csv('Stu_data.csv', index=False)
Below is how our CSV file looks.
![Writing csv using pandas [python read csv]](https://av-eks-blogoptimized.s3.amazonaws.com/44622pandas to csv40456.jpg)
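When no file path is given, to_csv() returns the CSV text as a string instead of writing a file, which makes it easy to see what index=False changes. A short sketch using the same student data:

```python
import pandas as pd

header = ['Name', 'M1 Score', 'M2 Score']
data = pd.DataFrame([['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]],
                    columns=header)

# Without a path, to_csv returns the text; splitlines() splits it into lines
without_index = data.to_csv(index=False).splitlines()
with_index = data.to_csv().splitlines()

print(without_index[:2])  # ['Name,M1 Score,M2 Score', 'Alex,62,80']
print(with_index[:2])     # [',Name,M1 Score,M2 Score', '0,Alex,62,80']
```

With the default index=True, pandas adds an unnamed first column holding the row numbers 0, 1, 2, which is usually not wanted in the output file.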
Using csv.DictWriter
You can write data into a CSV file using the csv module’s DictWriter by following the steps below.
1. Import the csv module.
import csv
2. Using the open() function, create a new file object, specifying the file name and setting the mode to ‘w’ for writing.
with open('Students_Data.csv', 'w', newline='') as csvfile:
3. Define the data you want to write to the CSV file as a list of dictionaries.
    data = [{'Name': 'Alex', 'M1 Score': 62, 'M2 Score': 80},
            {'Name': 'Brad', 'M1 Score': 45, 'M2 Score': 56},
            {'Name': 'Joey', 'M1 Score': 85, 'M2 Score': 98}]
4. Create a csv.DictWriter object, specifying the file object, the fieldnames parameter, and optionally the delimiter.
Note that the delimiter is ‘,’ by default.
    fieldnames = ['Name', 'M1 Score', 'M2 Score']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
5. Write the header row using the writeheader() method.
    writer.writeheader()
6. Now, use the writerows() method to write the data to the CSV file.
    writer.writerows(data)
This will create a new file named ‘Students_Data.csv’ with Name, M1 Score, and M2 Score as the header (column names) and one row for each dictionary in the data list.
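The DictWriter steps can be assembled into one block. To keep the sketch self-contained it writes to an io.StringIO buffer rather than Students_Data.csv; replacing the buffer with the open(...) call from step 2 writes a real file. Joey’s M2 Score is deliberately left out (a hypothetical change to the data) to show that a key missing from a dictionary is written as an empty field, because DictWriter’s restval parameter defaults to ''.

```python
import csv
import io

fieldnames = ['Name', 'M1 Score', 'M2 Score']
data = [{'Name': 'Alex', 'M1 Score': 62, 'M2 Score': 80},
        {'Name': 'Brad', 'M1 Score': 45, 'M2 Score': 56},
        {'Name': 'Joey', 'M1 Score': 85}]  # hypothetical: M2 Score missing

buffer = io.StringIO()  # stands in for open('Students_Data.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(data)

lines = buffer.getvalue().splitlines()
print(lines)
# ['Name,M1 Score,M2 Score', 'Alex,62,80', 'Brad,45,56', 'Joey,85,']
```

A dictionary with a key that is not in fieldnames raises ValueError by default; passing extrasaction='ignore' tells DictWriter to skip such keys instead.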
Conclusion
By now, you should be familiar with the different ways of handling CSV files in Python. I hope this article was informative. Feel free to share it with your study buddies.
Key Takeaways
- A Comma Separated Values (CSV) file is one of the simplest ways to store tabular data as a readable plain-text file.
- A file in the CSV format holds organized tabular data, similar to an Excel sheet.
- You can read a CSV file in Python using csv.reader, .readlines(), or csv.DictReader, and write to one using csv.writer, csv.DictWriter, or .writelines().
- Pandas can be used for both reading and writing data in a CSV.
Frequently Asked Questions
Q1. How do I write data to a CSV file in Python?
A. You can write data to a CSV file in Python using pandas, the csv module’s writer and DictWriter, or the file object’s .writelines() method.
Q2. How do I read a CSV file as text in Python?
A. There are several ways to read CSV files in Python, including csv.reader, .readlines(), pandas, and csv.DictReader; .readlines() gives you each line as plain text.
Q3. Can you read and write to a CSV file at the same time in Python?
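A. You can open a file for both reading and writing with a mode such as ‘r+’, but editing a CSV file in place is error-prone, because changing one row usually changes the length of everything after it. The usual approach is to read the data, modify it in memory, and then write it to a new file (or overwrite the original).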
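The usual way to update a CSV file is to read all of it into memory, modify the data, and write it back out. The sketch below does that with the student scores: it adds a hypothetical Total column and writes the result to a second file. Both file names are illustrative, and the files live in a temporary directory.

```python
import csv
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'Students_Data.csv')
dst = os.path.join(tmp_dir, 'Students_Data_total.csv')  # hypothetical output name

# Create the input file (same data as the csv.writer example)
with open(src, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'M1 Score', 'M2 Score'])
    writer.writerows([['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]])

# 1. Read everything into memory; the input file is closed after this block
with open(src, 'r', newline='') as f:
    rows = list(csv.DictReader(f))

# 2. Modify the data in memory: add a Total column
for row in rows:
    row['Total'] = int(row['M1 Score']) + int(row['M2 Score'])

# 3. Write the result to a new file
with open(dst, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['Name', 'M1 Score', 'M2 Score', 'Total'])
    writer.writeheader()
    writer.writerows(rows)

# Read the new file back to check the result
with open(dst, newline='') as f:
    result = list(csv.reader(f))
print(result[1])  # ['Alex', '62', '80', '142']
```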
Q4. How to create CSV in Python?
A. To create a CSV file in Python, you can use the built-in csv module. First, import the module and open a new file in ‘w’ mode using a with open(...) statement. Then create a csv.writer object and use it to write rows of data to the file. The with statement closes the file automatically when the block ends.
References
Check out the complete code from the GitHub repo.
Feel free to check out my other blog posts from my Analytics Vidhya Profile.
You can find me on LinkedIn and Twitter if you would like to connect. I would be glad to connect with you.
For an immediate exchange of thoughts, please write to me at [email protected].
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.