CSV is a file format you will frequently come across while working in the field of Data Science. It is a plain-text format for tabular data, which makes it easy to read, understand, and process. CSV files can be produced by converting a JSON file or written directly from code, for example in Python or Java.
In this article, we will introduce the basics of CSV files and walk you through the different ways of reading and writing CSV files in Python. We focus on Python because its built-in csv module makes it easy to read data from CSV files and write data to them. The step-by-step tutorials in this article are meant to be simple and easy to follow, even for beginners. We will also explain how to parse data, work with different data types and data structures, and use libraries like Pandas and NumPy in Python 3 for handling CSV files.
This article was published as a part of the Data Science Blogathon.
Understanding how to read CSV files in Python is essential for any data scientist. CSV, which stands for “Comma Separated Values,” is one of the most common formats for storing tabular data as plain text. As data scientists, we frequently encounter CSV data in our daily workflows, so knowing how to read CSV files in Python is crucial for handling and analyzing data sets efficiently.
Consider a file named “Salary_Data.csv”. The first line of a CSV file is usually the header. It contains the names of the fields/features, which appear as the column names when the file is displayed as a table.
After the header, each line of the file is an observation/record. The values within a record are separated by commas.
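The header/record layout can be seen by parsing a small CSV string. This is a minimal sketch. The column names and values are made up for illustration, and io.StringIO stands in for a real file. The example also hints at one reason to prefer the csv module over splitting lines on commas: a quoted field may itself contain a comma.

```python
import csv
import io

# Hypothetical CSV content (not the real Salary_Data.csv); io.StringIO lets
# csv.reader treat an in-memory string as if it were an open file.
sample = (
    "Name,City,Salary\n"
    'Alex,"Austin, TX",50000\n'
    "Brad,Boston,62000\n"
)

rows = list(csv.reader(io.StringIO(sample)))
header, records = rows[0], rows[1:]

print(header)   # ['Name', 'City', 'Salary']
print(records)  # [['Alex', 'Austin, TX', '50000'], ['Brad', 'Boston', '62000']]

# A naive split on commas breaks the quoted field apart
naive = sample.splitlines()[1].split(",")
print(naive)    # ['Alex', '"Austin', ' TX"', '50000']
```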
There are two main ways to read CSV files in Python:
csv module: This is the built-in module for working with CSV files in Python. It provides basic functionality for reading and writing CSV data. Here’s an example of how to read a CSV file using csv.reader:
```python
import csv

# Open the CSV file in read mode
with open('data.csv', 'r') as csvfile:
    # Create a reader object
    csv_reader = csv.reader(csvfile)
    # Iterate through the rows in the CSV file
    for row in csv_reader:
        # Each row is a list of strings, one per field
        print(row)
```
pandas library: A widely used third-party library for data analysis. Here’s an example of how to read a CSV file using pandas:
```python
import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv('data.csv')

# Access data in the DataFrame using column names or indexing
print(df['column_name'])  # replace 'column_name' with a real column name
print(df.iloc[0])         # access the first row
```
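One practical difference between the two approaches is how values are typed. The sketch below uses a made-up two-column dataset, with io.StringIO in place of data.csv. It suggests that csv.reader returns every field as a string, while pandas infers a dtype for each column.

```python
import csv
import io

import pandas as pd

# Hypothetical two-column data standing in for a file like data.csv
text = "Experience,Salary\n1.1,39343\n2.0,43525\n"

# csv.reader yields every field as a string
rows = list(csv.reader(io.StringIO(text)))
print(rows[1])  # ['1.1', '39343']

# pandas infers a dtype for each column
df = pd.read_csv(io.StringIO(text))
print(df.dtypes)  # Experience: float64, Salary: int64
```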
There are many different ways to read data in a CSV file, which we will now see one by one.
You can read CSV files using the csv.reader object from Python’s csv module. Steps to read a CSV file using csv.reader:
Step 1: Import the csv library.
```python
import csv
```
Step 2: Open the CSV file. The built-in open() function opens a file and returns a file object.
```python
file = open('Salary_Data.csv')
type(file)
```
The type of file is “_io.TextIOWrapper”, which is the file object returned by open() in text mode.
Step 3: Create a csv.reader object from the file object.
```python
csvreader = csv.reader(file)
```
Step 4: Extract the field names. Use the built-in next() function to obtain the header.
next() returns the current row and advances the reader to the next one.
The first call to next() returns the header, the second call returns the first record, and so on.
```python
header = next(csvreader)
header
```
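The next() behaviour described above can be checked on a tiny in-memory file. This is a minimal sketch with hypothetical data, and io.StringIO stands in for Salary_Data.csv. Each call consumes one row, and a later loop (or list()) picks up where next() stopped.

```python
import csv
import io

# Hypothetical three-line CSV: a header followed by two records
text = "Name,Salary\nAlex,50000\nBrad,62000\n"
csvreader = csv.reader(io.StringIO(text))

header = next(csvreader)        # first call: the header row
first_record = next(csvreader)  # second call: the first record
remaining = list(csvreader)     # the reader resumes where next() stopped

print(header)        # ['Name', 'Salary']
print(first_record)  # ['Alex', '50000']
print(remaining)     # [['Brad', '62000']]

# Once the reader is exhausted, next() with a default avoids StopIteration
print(next(csvreader, None))  # None
```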
Step 5: Extract the rows/records. Create an empty list called rows, iterate through the csvreader object, and append each row to the rows list.
```python
rows = []
for row in csvreader:
    rows.append(row)
rows
```
Step 6: Close the file. The .close() method closes the opened file. Once it is closed, we cannot perform any operations on it.
```python
file.close()
```
Putting all the steps together:
```python
import csv

file = open("Salary_Data.csv")
csvreader = csv.reader(file)
header = next(csvreader)
print(header)
rows = []
for row in csvreader:
    rows.append(row)
print(rows)
file.close()
```
Naturally, we might forget to close an open file. To avoid that, we can use the with statement, which releases the resources automatically. In other words, there is no need to call the .close() method when the file is opened in a with statement.
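Basic syntax: with open(filename, mode) as alias_filename:
Common modes:
'r': read (the default)
'w': write, truncating the file if it already exists
'a': append to the end of the file
'x': create a new file, failing if it already exists
Adding 'b' or 't' selects binary or text mode (text is the default), and '+' opens the file for both reading and writing.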
```python
import csv

rows = []
with open("Salary_Data.csv", 'r') as file:
    csvreader = csv.reader(file)
    header = next(csvreader)
    for row in csvreader:
        rows.append(row)
print(header)
print(rows)
```
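To see that the with statement really closes the file, the sketch below writes a small hypothetical file to a temporary directory and checks the file object's closed attribute inside and outside the block. The file name and contents are illustrative assumptions, not the article's dataset.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.csv")  # hypothetical file name

    # Create a small file to read back
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["Name", "Salary"], ["Alex", 50000]])

    with open(path, "r") as file:
        rows = list(csv.reader(file))
        print(file.closed)  # False: still inside the with block

    print(file.closed)  # True: closed automatically on leaving the block
    print(rows)         # [['Name', 'Salary'], ['Alex', '50000']]
```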
Now the question is: is it possible to fetch the header and rows using only the open() function and the with statement, without the csv library? Let’s see.
The .readlines() method is the answer. It returns all the lines in a file as a list of strings, and each item in the list is one row of our CSV file.
The first item returned by file.readlines() is the header, and the rest are the records.
```python
with open('Salary_Data.csv') as file:
    content = file.readlines()
header = content[:1]
rows = content[1:]
print(header)
print(rows)
```
Each line in the output still ends with the newline character '\n', which can be removed using the .strip() method.
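Combining .strip() with .split(',') turns each line into a list of fields, similar to what csv.reader produces. This is a minimal sketch with hypothetical data, and io.StringIO stands in for the opened file because it supports .readlines() in the same way. Unlike csv.reader, this simple split does not understand quoted fields that contain commas.

```python
import io

# io.StringIO stands in for open('Salary_Data.csv'); the data is hypothetical
file = io.StringIO("Name,Salary\nAlex,50000\nBrad,62000\n")
content = file.readlines()
print(repr(content[0]))  # 'Name,Salary\n' (the newline is still attached)

# strip() drops the newline; split(',') turns each line into a list of fields
header = content[0].strip().split(",")
rows = [line.strip().split(",") for line in content[1:]]
print(header)  # ['Name', 'Salary']
print(rows)    # [['Alex', '50000'], ['Brad', '62000']]
```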
What if we have a huge dataset with hundreds of features and thousands of records? Handling it as plain lists would quickly become unwieldy.
This is where the pandas library comes into the picture.
Let’s have a look at how pandas is used to read data from a CSV file.
```python
import pandas as pd
```
Basic syntax: pandas.read_csv(filename, delimiter=',')
```python
data = pd.read_csv("Salary_Data.csv")
data
```
.columns is used to obtain the header/field names.
```python
data.columns
```
The data in each column of a DataFrame can be accessed using its field name.
```python
data.Salary
```
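The sketch below runs the same calls on a small hypothetical dataset loaded from io.StringIO instead of Salary_Data.csv. Attribute access such as data.Salary only works for column names that are valid Python identifiers. Bracket access such as data["Salary"] works for any name, including names with spaces like "M1 Score".

```python
import io

import pandas as pd

# Hypothetical data in place of Salary_Data.csv
csv_text = "Name,Salary\nAlex,50000\nBrad,62000\n"
data = pd.read_csv(io.StringIO(csv_text))

print(list(data.columns))     # ['Name', 'Salary']
print(data.Salary.tolist())   # [50000, 62000] (attribute access)
print(data["Salary"].mean())  # 56000.0 (bracket access)
```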
A dictionary in Python is like a hash table: it contains keys and their values. You can create one with the dict() function or with curly braces, for example {'Name': 'Alex'}. When working with CSV files, the csv module’s DictReader class reads each row into a dictionary whose keys are the field names from the header. Here’s a simple guide on how to use it to read a CSV file in Python.
Open the file, create a DictReader object, and iterate through its rows with a for loop to see the field names as keys along with their respective values. The loop must stay inside the with block, because the file is closed when the block ends.
```python
import csv

with open('Salary_Data.csv', 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
```
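The sketch below shows what DictReader yields, using hypothetical data in an io.StringIO buffer rather than Salary_Data.csv. The field names are taken from the first line, each row is a dictionary keyed by those names, and the values are still strings that need converting before arithmetic.

```python
import csv
import io

# Hypothetical data standing in for Salary_Data.csv
text = "Name,Salary\nAlex,50000\nBrad,62000\n"
reader = csv.DictReader(io.StringIO(text))

print(reader.fieldnames)  # ['Name', 'Salary'] (taken from the first line)
rows = list(reader)
print(rows[0])            # {'Name': 'Alex', 'Salary': '50000'}

# Values are strings, so convert them before doing arithmetic
total = sum(int(row["Salary"]) for row in rows)
print(total)              # 112000
```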
We can write to a CSV file in multiple ways.
The csv.writer() function returns a writer object that converts the input data into a delimited string.
For example, let’s assume we are recording the data of 3 students (Name, M1 Score, M2 Score):
```python
header = ['Name', 'M1 Score', 'M2 Score']
data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]
```
Now let’s see how this data can be written to a CSV file using csv.writer:
Step 1: Import the csv library.
```python
import csv
```
Step 2: Define a filename and open the file using open().
Step 3: Create a csvwriter object using csv.writer().
Step 4: Write the header.
Step 5: Write the rest of the data.
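Code for steps 2-5:
```python
filename = 'Students_Data.csv'
with open(filename, 'w', newline="") as file:  # 2. open the file for writing
    csvwriter = csv.writer(file)  # 3. create a csvwriter object
    csvwriter.writerow(header)    # 4. write the header
    csvwriter.writerows(data)     # 5. write the rest of the data
```
Passing newline="" stops Python from translating the line endings that the writer emits itself, which would otherwise produce blank lines between rows on Windows.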
The resulting Students_Data.csv has the header row Name,M1 Score,M2 Score followed by one line per student, for example Alex,62,80.
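To inspect exactly what csv.writer produces, the sketch below writes to an io.StringIO buffer instead of Students_Data.csv. The extra student name containing a comma is a hypothetical addition. It suggests that the writer quotes such a field automatically and ends each row with '\r\n' by default.

```python
import csv
import io

header = ['Name', 'M1 Score', 'M2 Score']
data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]

buffer = io.StringIO()  # in-memory stand-in for Students_Data.csv
csvwriter = csv.writer(buffer)
csvwriter.writerow(header)
csvwriter.writerows(data)
csvwriter.writerow(['Smith, Jo', 70, 75])  # hypothetical name containing a comma

output = buffer.getvalue()
print(output.splitlines())
# ['Name,M1 Score,M2 Score', 'Alex,62,80', 'Brad,45,56', 'Joey,85,98', '"Smith, Jo",70,75']
```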
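You can also write the file by hand with the file object’s .write() and .writelines() methods. .writelines() takes an iterable of strings and writes each one to the file, so each record list is first converted to a single comma-separated string.
```python
header = ['Name', 'M1 Score', 'M2 Score']
data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]
filename = 'Student_scores.csv'
with open(filename, 'w') as file:
    # Join the header fields with commas and end the line with a newline
    file.write(','.join(header) + '\n')
    # Convert each value to a string, join with commas, one line per record
    file.writelines(','.join(str(x) for x in row) + '\n' for row in data)
```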
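A quick way to check a hand-written file is to parse it back with csv.reader. The sketch below repeats the .writelines() approach into an io.StringIO buffer instead of Student_scores.csv. It also shows a limitation of manual joining: nothing is quoted, so a value containing a comma (the hypothetical "Smith, Jo") turns into an extra column.

```python
import csv
import io

header = ['Name', 'M1 Score', 'M2 Score']
data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]

# Same hand-written approach, but into an in-memory buffer
file = io.StringIO()
file.write(','.join(header) + '\n')
file.writelines(','.join(str(x) for x in row) + '\n' for row in data)

# Parse it back to check that each line splits into the expected fields
parsed = list(csv.reader(io.StringIO(file.getvalue())))
print(parsed[0])  # ['Name', 'M1 Score', 'M2 Score']
print(parsed[1])  # ['Alex', '62', '80']

# Manual joining does no quoting: a value containing a comma adds a column
bad_line = ','.join(['Smith, Jo', '70', '75'])
print(next(csv.reader([bad_line])))  # ['Smith', ' Jo', '70', '75']
```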
Follow these steps to write to a CSV file using pandas:
```python
import pandas as pd
```
Syntax: pd.DataFrame(data, columns)
The data parameter takes the records/observations, and the columns parameter takes the column/field names.
```python
header = ['Name', 'M1 Score', 'M2 Score']
data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]
data = pd.DataFrame(data, columns=header)
```
Syntax: DataFrame.to_csv(filename, sep=',', index=False)
The separator is ',' by default.
index=False leaves the row index numbers out of the file.
```python
data.to_csv('Stu_data.csv', index=False)
```
The resulting Stu_data.csv contains the header row Name,M1 Score,M2 Score followed by one line per student, with no index column.
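The effect of index=False is easy to see because DataFrame.to_csv returns the CSV text as a string when no file path is given. The sketch below compares the first two lines with and without the index, using the same student data. It only illustrates the call and does not write Stu_data.csv.

```python
import pandas as pd

header = ['Name', 'M1 Score', 'M2 Score']
data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]
df = pd.DataFrame(data, columns=header)

# With no path, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[:2])     # [',Name,M1 Score,M2 Score', '0,Alex,62,80']
print(without_index.splitlines()[:2])  # ['Name,M1 Score,M2 Score', 'Alex,62,80']
```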
You can write data into a CSV file using the csv module’s DictWriter class by following the steps below.
Import the csv module, then create a new file object using the open() function, specifying the file name with the mode set to 'w' for writing. Define the records as a list of dictionaries along with the field names, and create a DictWriter. Note that the delimiter is ',' by default. Finally, write the header and the rows:
```python
import csv

with open('Students_Data.csv', 'w', newline='') as csvfile:
    data = [{'Name': 'Alex', 'M1 Score': 62, 'M2 Score': 80},
            {'Name': 'Brad', 'M1 Score': 45, 'M2 Score': 56},
            {'Name': 'Joey', 'M1 Score': 85, 'M2 Score': 98}]

    fieldnames = ['Name', 'M1 Score', 'M2 Score']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()    # first line: the field names
    writer.writerows(data)  # one line per dictionary
```
This will create a new file named ‘Students_Data.csv’ with Name, M1 Score, and M2 Score as the header/column names and the data values under the data variable.
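The sketch below shows two details of DictWriter, writing to an io.StringIO buffer instead of Students_Data.csv. A row that lacks a key (the hypothetical second record) is filled with restval, which defaults to an empty string and is set to 'NA' here as an illustrative choice. Reading the result back with DictReader returns every value as a string.

```python
import csv
import io

fieldnames = ['Name', 'M1 Score', 'M2 Score']
data = [{'Name': 'Alex', 'M1 Score': 62, 'M2 Score': 80},
        {'Name': 'Brad', 'M1 Score': 45}]  # hypothetical row missing 'M2 Score'

buffer = io.StringIO()
# restval fills in missing keys; without it the field would be left empty
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='NA')
writer.writeheader()
writer.writerows(data)

# Read it back: every value returns as a string
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(rows[0])  # {'Name': 'Alex', 'M1 Score': '62', 'M2 Score': '80'}
print(rows[1])  # {'Name': 'Brad', 'M1 Score': '45', 'M2 Score': 'NA'}
```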
By now, you should be familiar with the various techniques for handling CSV files in Python, including the essential task of reading them. We hope this article has been informative. Feel free to share it with your study buddies to spread the knowledge and enhance everyone’s Python skills.
Knowing how to read and write CSV files in Python is an essential skill for any data scientist or analyst. It can save time, improve productivity, and make data processing more efficient. Whether you’re just starting out or looking to take your skills to the next level, our Data Science Black Belt program is an excellent resource to enhance your knowledge in data science. The program covers basics of Python programming to advanced machine learning concepts. With hands-on projects and case studies, you’ll gain practical experience and learn how to apply your skills to real-world problems.
A. You can write data to a CSV file in Python using pandas, the csv module’s writer and DictWriter, or the file object’s .writelines() method.
A. There are many ways to read CSV files as plain text in Python including using csv.reader, .readlines(), pandas, or csv.DictReader.
A. To read a CSV file in Python row by row, you can create a csv.reader object and iterate through each row using a for loop. Because the reader yields one row at a time, this approach processes the file without loading all of it into memory at once.
A. To create a CSV file in Python, you can use the built-in csv module. First, import the module and open a new file using a with open(...) statement. Then create a writer object with csv.writer() and use it to write rows of data to the file. The with statement closes the file automatically when the block ends.
To read a CSV file in Python, you can use the pandas library’s read_csv function, which loads CSV data into a DataFrame for further analysis. To write CSV files, you can use the csv.writer function from Python’s csv module, which lets you handle data manipulation tasks effectively.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.