3 ways to deal with SettingWithCopyWarning in Pandas
It doesn’t pay to ignore warnings. Even when they don’t make sense. – Debra Doyle
One of the things I was taught while learning to code was to not be bothered by ‘Warnings’ in your code. “Focus on fixing major bugs and errors, warnings aren’t a big deal”, was the advice I got. I realized it was terrible advice when I started working on real-world situations. Sometimes warnings can cost you more than you think. One such warning is the ‘SettingWithCopy’ warning in Pandas.
No matter how long you’ve worked with pandas, sooner or later you’re bound to encounter the SettingWithCopyWarning. If you’re trying to wrap your head around what it is and why it keeps showing up even when you “do get the output you expected”, then this article is for you.
- Views Vs Copies
- Chained Assignment
- Hidden Chaining
- False Positives
In order to explain the logic behind the warning, I have used the Car Sales dataset from Kaggle. The dataset contains information about different types of cars.
Here is a glimpse of the data and the structure of the dataset.
Let’s assume a scenario where we have received an update that the Fuel_capacity of all the Porsche cars has been increased from 17.0 to 18.0, and we have been asked to make the change. Let’s go ahead and change the values.
car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity'] = 18.0
Uh-oh! We have triggered the famous ‘SettingWithCopy’ warning.
If we take a look at the dataframe now, we can see the values are not updated.
car_sales[car_sales['Manufacturer'] == 'Porsche']
We have to understand that “SettingWithCopy” is a warning and not an error. An error breaks your code and prevents you from moving on without fixing it. A warning, on the other hand, lets the code run and produce output while telling you that something may be wrong with it.
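The difference between a warning and an error can be seen in a minimal sketch that uses only Python's standard `warnings` module. The two function names and the messages below are hypothetical and exist only for illustration; they are not part of pandas.

```python
import warnings


def update_with_warning(value):
    # A warning reports a possible problem but lets execution continue.
    warnings.warn("value may not have been updated", UserWarning)
    return value + 1


def update_with_error(value):
    # An error (exception) stops execution unless it is caught.
    raise ValueError("value could not be updated")


# Record the warning instead of printing it, so we can inspect it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = update_with_warning(17)

print(result)       # the function still returned a value
print(len(caught))  # and one warning was recorded along the way

try:
    update_with_error(17)
    reached_end = True   # never executed: the exception interrupts the call
except ValueError:
    reached_end = False

print(reached_end)
```

The warning case still hands back a result, which is exactly why it is tempting to ignore it.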
In this case, sometimes we might get the output we intended to and also be tempted to ignore the warning. But we should never ignore this warning because it means that the operation we are trying to perform may not have worked as we expected and there can be some unexpected issues in the future.
These are the words of Jeff Reback, one of the core developers of pandas, on why you should never ignore this warning.
In order to understand how to fix this warning and what to do when we face it, it is imperative to know the difference between Views and Copies in Pandas and how they work.
Views Vs Copies
In the code above where we try to return all the Porsche cars from the data, the result we receive may either be a view or a copy of the dataframe.
A view (or a shallow copy) is a subset of the original object which doesn’t have its own memory and address space. It is just a projection of the object we are trying to access.
A copy (or a deep copy) is a duplicate of the original object which has its own memory and address space. It is a separate entity that Pandas throws away once we are done operating on it.
One of the main differences between views and copies is that modifying a view modifies the original dataframe and vice versa, whereas modifying a copy doesn’t affect the original dataframe.
Let’s say we change ‘Sales_in_thousands’ for the car with ‘Model‘ Boxter to 9.35.
You can see above, modifying a view modifies the original dataframe as well.
On the contrary, modifying a copy doesn’t necessarily modify the original dataframe.
Pandas inherits this behavior of views and copies from the underlying NumPy arrays. A NumPy array holds a single data type, so whether an indexing operation returns a view or a copy can be predicted. Pandas builds on NumPy, but a dataframe can hold columns of several data types, and Pandas follows a complex set of rules to optimize space when it decides whether to return a view or a copy. Because of that, whenever we index a dataframe, there is no reliable way to predict whether a view or a copy is returned. To quote the pandas documentation,
Outside of simple cases, it’s very hard to predict whether it will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees) ………. That’s what SettingWithCopy is warning you about!
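The NumPy side of this behavior is predictable and can be checked directly. The sketch below uses a small made-up array (the values are assumptions, not taken from the dataset): basic slicing returns a view, while boolean-mask indexing returns a copy.

```python
import numpy as np

values = np.array([17.0, 17.0, 17.0, 15.0])

# Basic slicing returns a view: it shares memory with the original array,
# so writing through it changes the original.
view = values[:3]
view[0] = 18.0

# Boolean-mask indexing returns a copy: it has its own memory,
# so writing to it leaves the original untouched.
copy = values[values == 17.0]
copy[:] = 99.0

print(np.shares_memory(values, view))  # True
print(np.shares_memory(values, copy))  # False
print(values.tolist())                 # [18.0, 17.0, 17.0, 15.0]
```

Only the write through the view reached `values`. With a dataframe the same question is much harder to answer in advance, which is the point the documentation makes.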
To check whether a view or a copy is returned, you can use the internal attributes _is_view or _is_copy. _is_view returns a boolean and _is_copy returns a reference to the original dataframe or None.
Let’s look at three of the most common causes of this warning and how to handle them.
Chained Assignment
One of the most common reasons Pandas generates this warning is that it detects chained assignment or chained indexing.
There are two things we do with a Pandas dataframe. We either:
- Set – assign a value to something
- Or, Get – access values from something
A chained assignment is when we try to assign (set) something by using more than one indexing operation.
Recall the example below which we used previously.
car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity'] = 18.0
Here, two indexing operations are combined to set a value. First, we access (get) all the ‘Porsche’ cars from the dataframe, then we try to assign (set) a new value to ‘Fuel_capacity’.
We want to modify the original dataframe, but this operation may create a copy and modify that instead. This is what the warning is telling us: ‘A value is trying to be set on a copy of a slice from a DataFrame’.
We discussed above that Pandas can return either a view or a copy when we access (get) a subset of a dataframe.
Let’s see if the operation we are trying to perform is on a view or a copy.
car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity']._is_view # output True
car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity']._is_copy #output
_is_view has returned ‘True’ meaning it’s a view while _is_copy has returned a ‘weakref’ meaning it’s a copy. Hence, the output of the ‘get‘ operation is ambiguous. It can be anything in the end. This is why ignoring the ‘SettingWithCopy’ warning is a bad idea. It can eventually lead to breaking something in your code when you least expect it.
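The effect of chained assignment can be reproduced on a tiny dataframe. This is a minimal sketch: `toy_sales` and its rows are a hypothetical stand-in for the real dataset, and the exact warning class reported depends on the pandas version, so the code prints the class names rather than assuming one.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for a few rows of the car sales data.
toy_sales = pd.DataFrame({
    "Manufacturer": ["Porsche", "Porsche", "Toyota"],
    "Model": ["Boxter", "Carrera Coupe", "Corolla"],
    "Fuel_capacity": [17.0, 17.0, 13.2],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Step 1 (get): the boolean mask builds a new, temporary dataframe.
    # Step 2 (set): the assignment lands on that temporary object.
    toy_sales[toy_sales["Manufacturer"] == "Porsche"]["Fuel_capacity"] = 18.0

# Names of the warning classes pandas raised (version dependent).
print([w.category.__name__ for w in caught])

# The original dataframe still holds the old values.
print(toy_sales["Fuel_capacity"].tolist())  # [17.0, 17.0, 13.2]
```

The assignment ran without an error, yet `toy_sales` is unchanged, which is the silent failure the warning is trying to flag.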
The problem of chained assignment can be tackled easily by combining the back-to-back indexing operations into a single operation using .loc.
car_sales.loc[car_sales.Manufacturer == 'Porsche', 'Fuel_capacity'] = 18.0
car_sales[car_sales.Manufacturer == 'Porsche']['Fuel_capacity']
#output
124    18.0
125    18.0
126    18.0
Name: Fuel_capacity, dtype: float64
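For readers without the dataset at hand, the same fix can be tried on a small made-up dataframe. In this sketch `toy_sales`, its rows and the `is_porsche` variable are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the car sales data.
toy_sales = pd.DataFrame({
    "Manufacturer": ["Porsche", "Porsche", "Toyota"],
    "Model": ["Boxter", "Carrera Coupe", "Corolla"],
    "Fuel_capacity": [17.0, 17.0, 13.2],
})

# One indexing operation: the row selector and the column label
# go into the same .loc call, so the set happens on toy_sales itself.
is_porsche = toy_sales["Manufacturer"] == "Porsche"
toy_sales.loc[is_porsche, "Fuel_capacity"] = 18.0

print(toy_sales["Fuel_capacity"].tolist())  # [18.0, 18.0, 13.2]
```

Because there is no intermediate object between the get and the set, there is nothing for Pandas to be unsure about.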
Hidden Chaining
The second most common cause of this warning is Hidden Chaining. It can be tricky to track down the source of this problem, as it may span your entire codebase.
Let’s look at a scenario for Hidden Chaining. We’ll go ahead and create a new dataframe containing all the ‘Chevrolet’ cars while bearing in mind to use .loc from our previous lesson.
chevrolet_cars = car_sales.loc[car_sales.Manufacturer == 'Chevrolet']
chevrolet_cars
We do some other operations for some time and play around with our code.
chevrolet_cars['Model'].value_counts()
.... # few lines of code
chevrolet_cars['Sales_in_thousands'].std()
....
chevrolet_cars['__year_resale_value'].max()
.... # few lines of code
chevrolet_cars.loc[20,'Price_in_thousands'] = 17.638
Boom! This warning again!!
There was no chained assignment in that last line of code but it still went ahead and triggered that warning. Let’s look at the values in our dataframe.
It has updated our value. So should we go ahead and ignore the warning this time? Probably not.
There is no obvious chained assignment in this code. In reality, it can occur on one line or even across multiple lines of code. When we created the ‘chevrolet_cars’ dataframe, we used a get operation. So there is no guarantee whether this returned a view or a copy. So, we might be trying to modify the original dataframe as well.
Identifying this problem can be very tedious in real codebases spanning thousands of lines but it is very simple to tackle this. When you want to create a new dataframe, explicitly make a copy using the .copy() method. This will make it clear to Pandas that we are operating on a new dataframe.
chevrolet_cars = car_sales.loc[car_sales.Manufacturer == 'Chevrolet'].copy()
chevrolet_cars.loc[20,'Price_in_thousands'] = 17.638
chevrolet_cars.loc[20, 'Price_in_thousands']
#output
17.638
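What the explicit `.copy()` buys us can be checked on a small made-up dataframe. In this sketch `toy_sales`, its index labels and its prices are hypothetical values chosen for illustration, not rows from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the car sales data.
toy_sales = pd.DataFrame(
    {
        "Manufacturer": ["Chevrolet", "Chevrolet", "Porsche"],
        "Model": ["Cavalier", "Malibu", "Boxter"],
        "Price_in_thousands": [13.26, 16.535, 41.43],
    },
    index=[19, 20, 124],
)

# .copy() makes the subset an independent dataframe with its own data.
chevrolet_cars = toy_sales.loc[toy_sales["Manufacturer"] == "Chevrolet"].copy()
chevrolet_cars.loc[20, "Price_in_thousands"] = 17.638

print(chevrolet_cars.loc[20, "Price_in_thousands"])  # 17.638
print(toy_sales.loc[20, "Price_in_thousands"])       # 16.535
```

The new dataframe carries the update and the original keeps its old value, so the intent is unambiguous both to Pandas and to whoever reads the code later.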
False Positives
A false positive is a case where the warning is triggered even though the code does what we intend. It is safe to ignore the warning in this case. Many of the scenarios that caused false positives have been fixed in Pandas over the years. They are discussed in the Pandas documentation if you want to take a look.
Let’s say we want only the cars with Vehicle_type as ‘Passenger‘ and we would like to create a new column which will be a boolean indicating whether the car is available or not.
car_sales = car_sales[car_sales['Vehicle_type'] == 'Passenger']
car_sales['In Stock'] = True
#output
:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
If you look at the dataframe, the new column has been added. In this case, we are not bothered if it overwrites the original dataframe.
We can go ahead and suppress the warning by changing the default behavior as follows:
- pd.set_option('mode.chained_assignment', None) – suppresses the warning
- pd.set_option('mode.chained_assignment', 'warn') – issues a warning (the default)
- pd.set_option('mode.chained_assignment', 'raise') – raises an exception
Changing the behavior is not recommended unless you know what you are doing.
Most of the scenarios of the ‘SettingWithCopy’ warning can be avoided by communicating to Pandas clearly and understanding why it’s triggered in the first place. When you want to modify the original dataframe, use .loc, or when you want a copy, specify it directly. This will not only prevent future warnings and errors, but it will also make your codebase more robust for maintenance.
Thank you for reading all the way down here! Let me know in the comments if you have any feedback, criticism, or concerns. Have a good day!
- Cover Image: Photo by Muhammad Daudy on Unsplash
- Output Images: Author
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion