python - Why does modifying this DataFrame inside a function change the global DataFrame outside the function?

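Why does the function below change the global DataFrame named df? Shouldn't it only change the local df inside the function and leave the global df untouched?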

import pandas as pd

df = pd.DataFrame()

def adding_var_inside_function(df):
    df['value'] = 0

print(df.columns) # Index([], dtype='object')
adding_var_inside_function(df)
print(df.columns) # Index([u'value'], dtype='object')

Best answer

From the docs:

Mutability and copying of data

All pandas data structures are value-mutable (the values they contain can be altered) but not always
size-mutable. The length of a Series cannot be changed, but, for
example, columns can be inserted into a DataFrame. However, the vast
majority of methods produce new objects and leave the input data
untouched. In general, though, we like to favor immutability where
sensible.
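As a rough sketch of the distinction the docs draw (using a small toy DataFrame that is not from the question): a method such as DataFrame.assign returns a new object and leaves its input untouched, whereas item assignment and .loc assignment change the existing DataFrame in place.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Methods such as assign() return a new DataFrame; the input is left untouched.
df_new = df.assign(b=0)
print(list(df.columns))      # ['a']
print(list(df_new.columns))  # ['a', 'b']

# Item assignment inserts a column into the existing object (size-mutable).
df["b"] = 0
print(list(df.columns))      # ['a', 'b']

# Values are mutable too: .loc assignment changes a single cell in place.
df.loc[0, "a"] = 100
print(df.loc[0, "a"])        # 100
```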

Here is another example showing that values (cells) are mutable:

In [21]: df
Out[21]:
   a  b  c
0  3  2  0
1  3  3  1
2  4  0  0
3  2  3  2
4  0  4  4

In [22]: df2 = df

In [23]: df2.loc[0, 'a'] = 100

In [24]: df
Out[24]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4

df2 is not a copy; it is a reference to df. Both names are bound to the same object:

In [28]: id(df) == id(df2)
Out[28]: True
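The same sharing happens when a DataFrame is passed to a function: the parameter is just another name for the caller's object, so `df['value'] = 0` inside the function modifies that object. Only rebinding the parameter (`df = ...`) affects the local name alone. A minimal sketch, with hypothetical helper names `mutate` and `rebind`:

```python
import pandas as pd


def mutate(frame):
    # Item assignment changes the object the caller passed in.
    frame["value"] = 0


def rebind(frame):
    # Rebinding the local name points it at a brand-new object;
    # the caller's variable still refers to the original DataFrame.
    frame = pd.DataFrame({"other": [1]})
    frame["value"] = 0


df = pd.DataFrame({"a": [1, 2]})
alias = df
print(alias is df)           # True: two names, one object

rebind(df)
after_rebind = list(df.columns)
print(after_rebind)          # ['a']

mutate(df)
after_mutate = list(df.columns)
print(after_mutate)          # ['a', 'value']
print(list(alias.columns))   # ['a', 'value'] (same object)
```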

If you don't want your function to mutate the DataFrame argument, work on a copy and return it:

def adding_var_inside_function(df):
    df = df.copy()
    df['value'] = 0
    return df

In [30]: df
Out[30]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4

In [31]: adding_var_inside_function(df)
Out[31]:
     a  b  c  value
0  100  2  0      0
1    3  3  1      0
2    4  0  0      0
3    2  3  2      0
4    0  4  4      0

In [32]: df
Out[32]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4
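Because this version returns a new DataFrame instead of changing its argument, the caller has to use the return value. A minimal alternative sketch, swapping in DataFrame.assign for the explicit copy() (the helper name `add_value_column` is hypothetical):

```python
import pandas as pd


def add_value_column(frame):
    # assign() builds and returns a new DataFrame with the extra column;
    # the DataFrame passed in is not modified.
    return frame.assign(value=0)


df = pd.DataFrame({"a": [100, 3, 4]})
result = add_value_column(df)
cols_before = list(df.columns)
print(cols_before)           # ['a']
print(list(result.columns))  # ['a', 'value']

# To update the caller's variable, rebind it to the returned object.
df = add_value_column(df)
print(list(df.columns))      # ['a', 'value']
```

Either way, the caller's original DataFrame is only replaced if the caller explicitly reassigns the name.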