Rahul Shah — Updated On August 17th, 2023

Introduction

Every data analysis project requires a dataset. Datasets are available in various file formats, such as .xlsx, .json, .csv, and .html. Conventionally, datasets are most often distributed in .csv format. CSV (Comma-Separated Values) files, as the name suggests, have data items separated by commas. They are plain text files, which keeps them small, and they can be viewed and saved in tabular form in popular tools such as Microsoft Excel and Google Sheets. The commas used in CSV files are known as delimiters: think of a delimiter as a boundary that separates any two adjacent data items. The comma is also the default delimiter, or separator, that pandas assumes when parsing a file. In this article, you will learn in detail about the various delimiters pandas can handle.

Learning Objectives

  • In this Python 3 tutorial, you will learn about the different types of delimiters in pandas.
  • You will learn to use the read_csv function.
  • You will also learn how to read CSV files that use a separator other than a comma.

This blog was published as a part of Data Science Blogathon 7


Pandas Refresher

Pandas is a popular Python library widely used by data scientists and analysts. It is built on top of another popular library, NumPy. Pandas is conventionally used for analyzing and manipulating data, but it is not limited to that. Its basic data structures are the Series and the DataFrame. A Series is a one-dimensional labelled array that can hold data items of any data type.

A pandas DataFrame is a two-dimensional, table-like structure whose columns can hold data items of any data type. A DataFrame can also be thought of as a combination of one or more pandas Series objects that share the same index.
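
To make the relationship between the two structures concrete, here is a minimal sketch. The names and ages are made-up sample values, not data from this article.

```python
import pandas as pd

# A Series: a one-dimensional labelled array (sample values are made up).
ages = pd.Series([25, 32, 47], name="age")

# A DataFrame: a two-dimensional table in which each column is a Series.
people = pd.DataFrame({
    "name": pd.Series(["Asha", "Ben", "Chen"]),
    "age": ages,
})

print(people.shape)         # (number of rows, number of columns)
print(type(people["age"]))  # selecting a single column gives back a Series
```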

Reading CSV Data Files Using Pandas Function

To load and read CSV files, we import the pandas library and call its read_csv function.

import pandas as pd
df = pd.read_csv(filepath_or_buffer)  # filepath_or_buffer: the path to the CSV file

Syntax

pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)

Note: This signature comes from an older pandas release. Newer releases have removed some of these parameters (for example squeeze, prefix, mangle_dupe_cols, tupleize_cols, error_bad_lines, and warn_bad_lines), so check the documentation for your installed version.
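na_filter: Detects missing-value markers (empty strings and the values listed in na_values) and loads them as NaN. For data that has no missing values, setting na_filter=False can improve performance when reading a large file.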
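
The effect of na_filter can be seen on a small in-memory example. This is only a sketch: the CSV text and column names are invented, and io.StringIO stands in for a real file.

```python
import io

import pandas as pd

# Made-up CSV text in which Ben's score is missing (empty field).
csv_text = "name,score\nAsha,90\nBen,\nChen,75\n"

# Default behaviour: the empty field is detected and loaded as NaN.
with_filter = pd.read_csv(io.StringIO(csv_text))

# na_filter=False: no missing-value detection, so the empty field is kept as-is.
without_filter = pd.read_csv(io.StringIO(csv_text), na_filter=False)

print(with_filter["score"].isna().sum())     # 1 missing value detected
print(without_filter["score"].isna().sum())  # 0, nothing is treated as missing
```

Skipping the detection step is only safe when you already know the file has no missing values.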

The read_csv function has dozens of parameters, of which only one, filepath_or_buffer, is mandatory; the others are optional and used as needed. This mandatory parameter specifies the CSV file we want to read. By default, read_csv treats the first row of the file as the column names (header) and creates an integer index that starts at zero and increases by one per row.

Note: On Windows, remember to use double backslashes (\\) when specifying the file path in an ordinary string, because a single backslash starts an escape sequence in Python.
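
The following sketch shows a call that passes only the file path. To keep it self-contained, it first writes a tiny file named abc.csv into a temporary folder; the file contents and the Windows paths in the comments are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write a tiny comma-separated file so the example can run anywhere.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "abc.csv"
csv_path.write_text("name,age\nAsha,25\nBen,32\n")

# Only the file path is required.
df = pd.read_csv(csv_path)

print(df.columns.tolist())  # the first row became the column names
print(df.index.tolist())    # the default index counts up from zero

# On Windows, a hard-coded (hypothetical) path could be written as any of:
#   pd.read_csv("C:\\Users\\Rahul\\Desktop\\abc.csv")  # escaped backslashes
#   pd.read_csv(r"C:\Users\Rahul\Desktop\abc.csv")     # raw string
#   pd.read_csv("C:/Users/Rahul/Desktop/abc.csv")      # forward slashes
```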


Sep Parameter: The Default Delimiter in Pandas

One of the optional parameters of the read_csv function is sep, short for separator. This parameter is the delimiter we talked about before: it tells pandas which delimiter is used in our dataset or, in layman's terms, how the data items are separated in our CSV file.

The default value of the sep parameter is the comma (,). If we don't specify sep in our read_csv() call, pandas assumes that the file uses a comma as the delimiter. That is why a call that passes only the file path works for an ordinary comma-separated file.
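
A quick way to convince yourself of this default is to read the same data twice, once without sep and once with sep=",", and compare the results. The CSV text below is invented for the illustration.

```python
import io

import pandas as pd

# Made-up comma-separated text standing in for a file on disk.
csv_text = "city,population\nPune,7\nOslo,1\n"

implicit = pd.read_csv(io.StringIO(csv_text))           # sep not given
explicit = pd.read_csv(io.StringIO(csv_text), sep=",")  # sep given explicitly

print(implicit.equals(explicit))  # True: both calls produce the same DataFrame
```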

Using Other Delimiters in Pandas

It often happens that a dataset in .csv format has data items separated by a delimiter other than a comma, such as a semicolon, colon, tab, or vertical bar. In such cases, we need to pass the sep parameter to the read_csv() function. For example, suppose a file named Example.csv is a semicolon-separated CSV file. It can be read with the syntax given below.

df = pd.read_csv("C:\\Users\\Rahul\\Desktop\\Example.csv", sep=';')

On executing this code, we get a dataframe named df.

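
To see why the sep argument matters, the sketch below reads the same semicolon-separated text with and without it. The data is made up and held in memory with io.StringIO rather than taken from Example.csv.

```python
import io

import pandas as pd

# Made-up semicolon-separated text.
semicolon_text = "name;age;city\nAsha;25;Pune\nBen;32;Oslo\n"

# Without sep, pandas looks for commas, finds none, and builds one wide column.
wrong = pd.read_csv(io.StringIO(semicolon_text))

# With sep=';', each field lands in its own column.
right = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)
```

A DataFrame with a single column whose name contains the delimiter is usually the first hint that the wrong separator was used.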

Vertical-bar Separator

Similarly, a vertical-bar-delimited file can be read using the syntax below:

df = pd.read_csv("C:\\Users\\Rahul\\Desktop\\Example.csv", sep='|')

Colon Separator

And a colon-delimited file can be loaded using the syntax below:

df = pd.read_csv("C:\\Users\\Rahul\\Desktop\\Example.csv", sep=':')
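
The vertical-bar and colon cases can be tried without any files on disk. In this sketch the same made-up rows are joined with each delimiter and read back, so the two resulting DataFrames should match.

```python
import io

import pandas as pd

# Made-up rows; the first row acts as the header.
rows = [["name", "age"], ["Asha", "25"], ["Ben", "32"]]

frames = {}
for delimiter in ["|", ":"]:
    # Build the delimited text, then read it back with the matching sep.
    text = "\n".join(delimiter.join(row) for row in rows)
    frames[delimiter] = pd.read_csv(io.StringIO(text), sep=delimiter)

print(frames["|"])
print(frames["|"].equals(frames[":"]))  # same data regardless of delimiter
```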

Tab Separator

We may often come across files with the .tsv extension. These files hold tab-separated values; in other words, they use the tab character as the delimiter. Such files can be read using the same read_csv() function of pandas by specifying the tab character, written as '\t', as the delimiter. For example:

df = pd.read_csv("C:\\Users\\Rahul\\Desktop\\Example.tsv", sep='\t')
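
Here is a self-contained version of the tab-separated case, using invented data held in memory. Note that the separator is the escape sequence "\t" (a real tab character), not the letter t.

```python
import io

import pandas as pd

# Made-up tab-separated text; "\t" is a single tab character.
tsv_text = "name\tage\nAsha\t25\nBen\t32\n"

df_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_tab.columns.tolist())
print(df_tab.shape)
```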

Similarly, any other separator can be used, based on the delimiter identified in our data.
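
When the delimiter is not known in advance, one option worth knowing is to pass sep=None together with engine="python", which asks pandas to try to detect the separator itself. This is a sketch on made-up data; automatic detection can guess wrong on unusual files, so it is still worth checking the result.

```python
import io

import pandas as pd

# Made-up text whose delimiter we pretend not to know.
text = "name;age\nAsha;25\nBen;32\n"

# sep=None with the Python engine lets pandas sniff the delimiter.
detected = pd.read_csv(io.StringIO(text), sep=None, engine="python")

print(detected.columns.tolist())
print(detected.shape)
```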

You can use the to_csv() method to export data from a pandas DataFrame or Series to a CSV file, or to append it to an existing CSV file.
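
As a rough illustration, the sketch below writes one made-up DataFrame to a semicolon-separated file, appends a second one with mode="a", and reads the result back. The file name export.csv and the temporary folder are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical output file inside a temporary folder.
out_path = Path(tempfile.mkdtemp()) / "export.csv"

first = pd.DataFrame({"name": ["Asha"], "age": [25]})
second = pd.DataFrame({"name": ["Ben"], "age": [32]})

# Write the first frame with a header, using ';' as the delimiter.
first.to_csv(out_path, sep=";", index=False)

# Append the second frame; header=False avoids repeating the column names.
second.to_csv(out_path, sep=";", index=False, mode="a", header=False)

# Read the file back with the same delimiter.
round_trip = pd.read_csv(out_path, sep=";")
print(round_trip)
```

Passing index=False keeps the row index out of the file, so reading it back does not produce an extra unnamed column.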

Conclusion

It is always useful to check how the data is stored in our dataset, and understanding the data is necessary before starting to work on it. A delimiter can usually be identified by inspecting the first few lines of the file. Based on that inspection, we can pass the relevant delimiter to the sep parameter. In this article, we learned about different CSV separators and how to read files that use them.
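
Besides opening the file and looking at it, the inspection step can be assisted with csv.Sniffer from Python's standard library. This is a sketch under assumptions: the sample text is invented, and the list of candidate delimiters passed to sniff() is our own choice.

```python
import csv
import io

import pandas as pd

# Made-up sample standing in for the first few lines of a file.
sample = "name|age|city\nAsha|25|Pune\nBen|32|Oslo\n"

# Ask the sniffer to choose among a few candidate delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=",;|:\t")
print(repr(dialect.delimiter))

# Use the detected delimiter as the sep argument.
df_sniffed = pd.read_csv(io.StringIO(sample), sep=dialect.delimiter)
print(df_sniffed.shape)
```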

Key Takeaways

  • The Python pandas library is very useful for preprocessing data, from loading to cleaning.
  • The comma is the default delimiter in CSV files and the default value of the sep parameter of read_csv.
  • Vertical bars, colons, semicolons, and tabs are some of the other delimiters that pandas can read.

Frequently Asked Questions

Q1. How to read a CSV file with a delimiter?

A. We can read a CSV file that uses a particular delimiter by passing that delimiter to the sep parameter of the pd.read_csv function from the Python pandas library.

Q2. Is there a way to read a CSV file in Python that has a delimiter other than a comma?

A. Yes, pandas can read CSV files that use different delimiters, such as semicolons, vertical bars, colons, or tabs, through the sep parameter of read_csv.

Q3. What is the first row of a CSV file?

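A. By default, read_csv treats the first row of a CSV file as the column headers. If the file has no header row, pass header=None so that the first row is read as data.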
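
The difference is easy to see on a small header-less example. The data and the column names passed through names are made up for this sketch.

```python
import io

import pandas as pd

# Made-up CSV text with no header row.
no_header_text = "Asha,25\nBen,32\n"

# Default: the first data row is consumed as the header.
default_read = pd.read_csv(io.StringIO(no_header_text))

# header=None keeps every row as data; names supplies the column labels.
fixed_read = pd.read_csv(
    io.StringIO(no_header_text), header=None, names=["name", "age"]
)

print(default_read.shape)  # (1, 2): one row was lost to the header
print(fixed_read.shape)    # (2, 2): both rows kept
```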

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

About the Author

Rahul Shah

IT Engineering Graduate currently pursuing Post Graduate Diploma in Data Science.
