3 Ways to Deal With SettingWithCopyWarning in Python

Padhma M 17 Mar, 2024 • 10 min read

Introduction

While learning data science, we have to learn how to work with many libraries, such as NumPy, Matplotlib, and seaborn. It is important to learn machine learning algorithms and to know how to use these packages to your advantage. It is just as important to know how to handle errors and warnings while working with data. No matter how long you’ve worked with pandas in Python, sooner or later you’re bound to encounter the SettingWithCopyWarning. If you’re trying to wrap your head around what it is and why it keeps showing up even when you “do get the output you expected”, then this article is for you.

Learning Objectives

  • Understand the difference between views and copies.
  • Understand why the SettingWithCopyWarning should be dealt with.
  • Understand the different ways of dealing with the SettingWithCopyWarning.

This article was published as a part of the Data Science Blogathon.

What Is the SettingWithCopyWarning?

It doesn’t pay to ignore warnings. Even when they don’t make sense. – Debra Doyle

One of the things I was taught while learning to code in Python was not to be bothered by ‘Warnings’ in your code. “Focus on fixing major bugs and errors; warnings aren’t a big deal” was the advice I got. I realized it was terrible advice when I started working on real-world problems. Sometimes warnings can cost you more than you think. While working with the pandas DataFrame, I faced one such warning, the ‘SettingWithCopy’.

In order to explain the logic behind the warning, I will take you through a hands-on tutorial in Python. I have used the Car Sales dataset from Kaggle as a sample dataset. The dataset contains information about different types of cars.

[Image: a glimpse of the data and the structure of the dataset]

Here we have imported pandas using the import pandas statement (you can import numpy and matplotlib as well if you want). We then used the read_csv function to read the data and the info function to learn about every column: how many non-null values it has and whether it holds categorical or numerical data.

[Image: output of the info function for the car sales data]

Inference: Some columns contain null (NaN) values, since their non-null count is lower than the total number of rows.
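For readers following along without the screenshots, the loading step might look like the sketch below. The CSV text is a tiny made-up stand-in for the Kaggle file (only the column names come from this article, and the model names other than Boxter are hypothetical). It is read from an in-memory buffer so the snippet runs on its own; with the real dataset you would pass the file path to read_csv instead.

```python
import io

import pandas as pd

# Made-up stand-in rows; only the column names follow the article's dataset.
csv_text = """Manufacturer,Model,Sales_in_thousands,Price_in_thousands,Fuel_capacity
Porsche,Boxter,10.0,40.0,17.0
Porsche,Model_B,2.0,,17.0
Chevrolet,Model_C,140.0,13.0,15.0
Chevrolet,Model_D,130.0,,15.0
"""

car_sales = pd.read_csv(io.StringIO(csv_text))

# info() prints the dtype and the non-null count of every column.
car_sales.info()

# The same non-null counts as a Series, handy for spotting missing values.
non_null_counts = car_sales.count()
print(non_null_counts)
```

In this stand-in data, Price_in_thousands has fewer non-null entries than the other columns, which is the pattern the inference above describes.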

Examples

Let’s assume a scenario where we have received an update that the Fuel_capacity of all the Porsche cars has been increased from 17.0 to 18.0, and we have been asked to make the change. Let’s go ahead and do it.

car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity'] = 18.0

Uh-oh! We have triggered the famous ‘SettingWithCopyWarning’.

If we take a look at the dataframe now, we can see the values are not updated.

car_sales[car_sales['Manufacturer'] == 'Porsche']

We have to understand that “SettingWithCopy” is a warning and not an error. An error breaks your code and prevents you from moving on without fixing it. A warning, in contrast, lets the code run and produce output while telling you that something may be wrong with it.

In this case, sometimes we might get the output we intended to and also be tempted to ignore the warning. But we should never ignore this warning because it means that the operation we are trying to perform may not have worked as we expected, and there can be some unexpected issues in the future.
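To make the difference between a warning and an error concrete, here is a minimal sketch on a small made-up dataframe (the values are placeholders, not the Kaggle data). It records whatever warnings pandas emits instead of printing them, then checks the original dataframe. The exact warning category depends on the pandas version, so the sketch only prints its name.

```python
import warnings

import pandas as pd

# Placeholder data that mimics two of the article's columns.
car_sales = pd.DataFrame(
    {
        "Manufacturer": ["Porsche", "Porsche", "Chevrolet"],
        "Fuel_capacity": [17.0, 17.0, 15.0],
    }
)
is_porsche = car_sales["Manufacturer"] == "Porsche"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Chained assignment: the code keeps running, pandas only warns.
    car_sales[is_porsche]["Fuel_capacity"] = 18.0

# Show which warning categories were recorded, if any.
for recorded in caught:
    print(recorded.category.__name__)

# The original rows still hold the old value.
porsche_capacity = car_sales.loc[is_porsche, "Fuel_capacity"].tolist()
print(porsche_capacity)
```

The script runs to the end without an exception, yet the Porsche rows still read 17.0, which is exactly the kind of silent failure the warning is pointing at.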

These are the words of Jeff Reback, one of the core developers of pandas, on why you should never ignore this warning.

[Image: Jeff Reback’s comment on the SettingWithCopyWarning]

In order to understand how to fix this warning and what to do when we face it, it is imperative to know the difference between Views and Copies in Pandas and how they work.

Difference Between Views & Copies

An important step in machine learning is preprocessing, and there will be many situations in which we have to work with views and copies of a dataframe. It is especially important to know about views and copies when we use the .loc and .iloc indexers, as they can return either a view or a copy depending on how they are used. Some methods, such as groupby, return neither a view nor a copy but a grouped object. So we must stay familiar with what each operation returns. If you want to know more about this, I recommend going through the pandas API reference: https://pandas.pydata.org/docs/reference/index.html

In the code above, where we try to return all the Porsche cars from the data, the result we receive may either be a view or a copy of the dataframe.

A view (or a shallow copy) is a subset of the original object which doesn’t have its own memory and address space. It is just a projection of the object we are trying to access.

A copy (or a deep copy) is a duplicate of the original object which has its own memory and address space. It is a separate entity, and Pandas throws it away once we are done operating on it.

One of the main differences between views and copies is that modifying a view modifies the original dataframe and vice versa, whereas modifying a copy doesn’t affect the original dataframe.

Let’s say we change ‘Sales_in_thousands’ for the car with ‘Model’ Boxter to 9.35 through a view.

[Image: the view and the original dataframe after the change]

As the image shows, modifying a view modifies the original dataframe as well.

In contrast, modifying a copy leaves the original dataframe unchanged.

Pandas inherits this behavior of views and copies from the underlying NumPy arrays. A NumPy array holds a single data type, so whether a view or a copy is returned can be predicted. A dataframe, however, can hold columns of different data types, and although Pandas builds on NumPy, it follows a complex set of rules to optimize space and decide whether to return a view or a copy. Because of that, whenever we index a dataframe, there is no set way to predict whether a view or a copy is returned. To quote the pandas documentation,

Outside of simple cases, it’s very hard to predict whether it will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees) ... That’s what SettingWithCopy is warning you about!
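The predictable NumPy behavior that Pandas builds on can be seen in a few lines. This is a small illustrative sketch with made-up values: basic slicing gives a view, while boolean (mask) indexing gives a copy.

```python
import numpy as np

original = np.arange(6)  # [0, 1, 2, 3, 4, 5]

# Basic slicing returns a view: it shares memory with the original,
# so writing through it changes the original array too.
view = original[1:4]
view[0] = 100

# Boolean (mask) indexing returns a copy: it owns its memory,
# so writing to it leaves the original array alone.
masked_copy = original[original > 2]
masked_copy[0] = -1

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, masked_copy))  # False
print(original.tolist())                        # [0, 100, 2, 3, 4, 5]
```

The write through the view shows up in the original array, while the write to the masked copy does not.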

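To check whether a view or a copy is returned, you can use the internal attributes _is_view and _is_copy. _is_view returns a boolean, and _is_copy returns either a weak reference to the original dataframe or None. Both are private attributes, so treat what they report as a hint rather than a guarantee.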
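Because _is_view and _is_copy are internal attributes, they may change or disappear between pandas versions. One alternative, offered here as an assumption rather than an official recipe, is to ask NumPy whether a column is backed by the same memory in two dataframes. The helper name shares_column_memory and the data are hypothetical.

```python
import numpy as np
import pandas as pd

# Placeholder data that mimics two of the article's columns.
car_sales = pd.DataFrame(
    {
        "Manufacturer": ["Porsche", "Porsche", "Chevrolet"],
        "Fuel_capacity": [17.0, 17.0, 15.0],
    }
)


def shares_column_memory(left, right, column):
    """Return True if `column` is backed by the same memory in both frames."""
    return np.shares_memory(left[column].to_numpy(), right[column].to_numpy())


# A boolean-mask selection and an explicit copy of the dataframe.
porsche_cars = car_sales[car_sales["Manufacturer"] == "Porsche"]
explicit_copy = car_sales.copy()

mask_shares = shares_column_memory(car_sales, porsche_cars, "Fuel_capacity")
copy_shares = shares_column_memory(car_sales, explicit_copy, "Fuel_capacity")
print(mask_shares)
print(copy_shares)
```

A result of False means the selection is backed by its own memory, so writing to it cannot reach the original dataframe.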

If you want to read more about views, then you can look at this official documentation from pandas https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

What Causes the SettingWithCopyWarning & How to Fix It?

Let’s look at three of the most common causes of the SettingWithCopyWarning and how to handle each of them.

Chained Assignment

One of the most common reasons Pandas generates this warning is when it detects chained assignment or chained indexing.

There are two things we do with a Pandas dataframe, we either

  • Set – assign a value to something
  • Or, Get – access values from something

A chained assignment is when we try to assign (set) something by using more than one indexing operation.

Recall the example below, which we used previously.

car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity'] = 18.0

Here, two indexing operations are combined to set a value. First, we access (get) all the ‘Porsche’ cars from the dataframe; then we assign (set) a new value to ‘Fuel_capacity’.

We want to modify the original dataframe, but this operation may create a copy and modify that instead. This is what the warning is telling us: ‘A value is trying to be set on a copy of a slice from a DataFrame.’

We discussed above that Pandas can create either a view or a copy when we access (get) a subset of a dataframe.
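One way to picture the two steps is to write them on separate lines. The sketch below uses made-up placeholder data, and the name porsche_cars for the intermediate object is hypothetical. Warnings are silenced only to keep the demo output short.

```python
import warnings

import pandas as pd

# Placeholder data that mimics two of the article's columns.
car_sales = pd.DataFrame(
    {
        "Manufacturer": ["Porsche", "Porsche", "Chevrolet"],
        "Fuel_capacity": [17.0, 17.0, 15.0],
    }
)
is_porsche = car_sales["Manufacturer"] == "Porsche"

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demo output quiet
    # Step 1 (get): the first indexing operation builds an intermediate object.
    porsche_cars = car_sales[is_porsche]
    # Step 2 (set): the assignment writes to that intermediate object.
    porsche_cars["Fuel_capacity"] = 18.0

intermediate_values = porsche_cars["Fuel_capacity"].tolist()
original_values = car_sales.loc[is_porsche, "Fuel_capacity"].tolist()
print(intermediate_values)  # [18.0, 18.0]
print(original_values)      # [17.0, 17.0]
```

With a boolean mask, as here, the intermediate object is a copy, so the new value lands there and never reaches car_sales.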

Let’s see if the operation we are trying to perform is on a view or a copy.

car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity']._is_view

# output
True

car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity']._is_copy

# output
# (a weakref object referring to the parent dataframe)

_is_view has returned ‘True’, which suggests a view, while _is_copy has returned a ‘weakref’, which suggests a copy. Hence, the result of the ‘get’ operation is ambiguous: it can be either in the end. This is why ignoring the ‘SettingWithCopyWarning’ is a bad idea. It can eventually break something in your code when you least expect it.

The problem of chained assignment can be tackled easily by combining the back-to-back indexing operations into a single operation using .loc.

car_sales.loc[car_sales.Manufacturer == 'Porsche', 'Fuel_capacity'] = 18.0
car_sales[car_sales.Manufacturer == 'Porsche']['Fuel_capacity']

#output
124    18.0
125    18.0
126    18.0
Name: Fuel_capacity, dtype: float64

Hidden Chaining

The second most common reason that triggers this warning is Hidden Chaining. It can be tricky and hard to track down the source of this problem as it may span across your entire codebase. Let’s look at a scenario for Hidden Chaining. We’ll go ahead and create a new dataframe containing all the ‘Chevrolet’ cars while bearing in mind to use .loc from our previous lesson.

chevrolet_cars = car_sales.loc[car_sales.Manufacturer == 'Chevrolet']
chevrolet_cars

We do some other operations for some time and play around with our code.

chevrolet_cars['Model'].value_counts()
....
# few lines of code
chevrolet_cars['Sales_in_thousands'].std()
....
chevrolet_cars['__year_resale_value'].max()
....
# few lines of code
chevrolet_cars.loc[20,'Price_in_thousands'] = 17.638

Boom! This warning again!!

There was no chained assignment in that last line of code, but it still went ahead and triggered that warning. Let’s look at the values in our dataframe.

It has updated our value. So should we go ahead and ignore the warning this time? Probably not.

There is no obvious chained assignment in this code. In reality, chained assignment can occur on one line or across multiple lines of code. When we created the ‘chevrolet_cars’ dataframe, we used a get operation, so there is no guarantee whether it returned a view or a copy. When we later set a value on ‘chevrolet_cars’, we might therefore be modifying the original dataframe as well.

Identifying this problem can be very tedious in real codebases spanning thousands of lines, but it is very simple to tackle this. When you want to create a new dataframe, explicitly make a copy using the .copy() method. This will make it clear to Pandas that we are operating on a new dataframe.

chevrolet_cars = car_sales.loc[car_sales.Manufacturer == 'Chevrolet'].copy()
chevrolet_cars.loc[20,'Price_in_thousands'] = 17.638
chevrolet_cars.loc[20, 'Price_in_thousands']

#output
17.638
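To confirm that an explicit copy really is independent of the original, we can compare both dataframes after the update. This is a minimal sketch on made-up placeholder data; the index labels are chosen so that row 20 exists, as in the example above.

```python
import pandas as pd

# Placeholder data; the index labels mimic row labels from a larger dataset.
car_sales = pd.DataFrame(
    {
        "Manufacturer": ["Chevrolet", "Chevrolet", "Porsche"],
        "Price_in_thousands": [13.0, 16.0, 40.0],
    },
    index=[19, 20, 21],
)

# .copy() states the intent clearly: this is a new, independent dataframe.
chevrolet_cars = car_sales.loc[car_sales["Manufacturer"] == "Chevrolet"].copy()
chevrolet_cars.loc[20, "Price_in_thousands"] = 17.638

print(chevrolet_cars.loc[20, "Price_in_thousands"])  # 17.638
print(car_sales.loc[20, "Price_in_thousands"])       # 16.0
```

The new dataframe holds the updated price, and the original keeps its old value.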

False Positives

A false positive is a case where the warning is triggered when it isn’t supposed to be. It is safe to ignore the warning in this case. Many of the scenarios that caused false positives have been fixed in Pandas over the years. They are discussed in the Pandas documentation if you want to take a look.

Let’s say we want only the cars with Vehicle_type as ‘Passenger‘, and we would like to create a new dataframe column which will be a boolean indicating whether the car is available or not.

car_sales = car_sales[car_sales['Vehicle_type'] == 'Passenger']
car_sales['In Stock'] = True

#output
:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

If you look at the dataframe, you will see that the new column has been added. In this case, we don’t mind if the original dataframe is overwritten.

We can go ahead and suppress the warning by changing the default behavior as follows:

  • pd.set_option('mode.chained_assignment', None) – suppresses the warning
  • pd.set_option('mode.chained_assignment', 'warn') – issues a warning (the default)
  • pd.set_option('mode.chained_assignment', 'raise') – raises an exception

Changing the behavior is not recommended unless you know what you are doing.
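If you do change the setting, one way to limit the blast radius is to change it only for a block of code. The sketch below assumes a pandas version that provides the mode.chained_assignment option and uses pd.option_context to switch it to 'raise' temporarily. The data is a made-up placeholder, and the concrete exception class can differ between versions, so the sketch only records its name.

```python
import pandas as pd

# Placeholder data that mimics two of the article's columns.
car_sales = pd.DataFrame(
    {
        "Manufacturer": ["Porsche", "Porsche", "Chevrolet"],
        "Fuel_capacity": [17.0, 17.0, 15.0],
    }
)
is_porsche = car_sales["Manufacturer"] == "Porsche"

error_name = None
try:
    # option_context restores the previous setting when the block exits.
    with pd.option_context("mode.chained_assignment", "raise"):
        car_sales[is_porsche]["Fuel_capacity"] = 18.0
except Exception as exc:  # the concrete exception class varies by version
    error_name = type(exc).__name__

print(error_name)

# Whether or not an exception was raised, the original is unchanged.
porsche_capacity = car_sales.loc[is_porsche, "Fuel_capacity"].tolist()
print(porsche_capacity)
```

Turning the warning into an exception in tests or during development can help catch chained assignments early, while the global default stays untouched.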

How do you avoid setting with copy warning?

To avoid the SettingWithCopyWarning, which typically arises when modifying a slice of a DataFrame, follow these guidelines:

  • Use .loc or .iloc: When you want to modify the original DataFrame, select the rows and columns and assign the value in a single .loc or .iloc operation, so that no intermediate object is involved. For example:
df.loc[df['column'] > 5, 'new_column'] = 0
  • Create a copy explicitly: If you intend to modify a copy of the DataFrame, make it explicit using the .copy() method. This makes the intent clear in your code. For example:
df_copy = df[df['column'] > 5].copy()
df_copy['new_column'] = 0
  • Suppress the warning: Although not recommended, you can suppress the warning using pd.options.mode.chained_assignment = None. However, this might hide potential issues in your code, so exercise caution when using this approach.

Conclusion

Hope this article has given you a detailed overview of the ‘SettingWithCopyWarning’ in pandas. Most of the scenarios that trigger the ‘SettingWithCopyWarning’ can be avoided by communicating your intent to Pandas clearly and understanding why the warning is triggered in the first place. When you want to modify the original dataframe, use .loc, and when you want a copy, say so explicitly with .copy(). This will not only prevent future warnings and errors but also make your codebase more robust and easier to maintain.

You can take a look at these issues on GitHub #5390 and #5597 for background discussion.

Key Takeaways

  • ‘SettingWithCopy’ is a common warning (and not an error) you may encounter while working with a pandas DataFrame in Python.
  • Chained assignments, hidden chaining, and false positives are some of the most common reasons the SettingWithCopyWarning pops up.
  • A view is a subset of the original object which doesn’t have its own memory and address space, whereas a copy is a duplicate of the original object, which has its own memory and address space.

If you want to master pandas, then for further reading, you can check out this free course on pandas: https://courses.analyticsvidhya.com/courses/pandas-for-data-analysis-in-python

Frequently Asked Questions

Q1. How do you copy a DataFrame in Python using pandas lib?

A. We can use the copy() method to create a copy of a dataframe. If you plan to make changes to a dataframe, work on a copy so that you can roll back to the original if needed.

Q2. How do I stop Pandas from printing warnings?

A. For the SettingWithCopyWarning, you can set pd.options.mode.chained_assignment to None, which suppresses the warning. However, be careful because this might hide potential issues in your code.

Q3. What is the difference between a DataFrame and a Series?

A. DataFrame and Series are both data structures in pandas that you can use to manipulate data. A dataframe can have multiple columns, whereas a series is a single column. A dataframe can also hold columns of different data types, such as object and float64, while a series, being a single column, has one data type for all of its values.

Q4. What is chained indexing in Python?

A. Chained indexing happens when you use multiple indexing operations, like [ ], one after another, on a DataFrame or a Series. This can sometimes lead to unexpected behavior and make it unclear whether you’re working with a copy or a view of the data. It’s generally better to combine them into a single .loc or .iloc operation.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
