dask_expr._collection.DataFrame.replace - Dask documentation


DataFrame.replace(to_replace=None, value=_NoDefault.no_default, regex=False)

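Replace values given in to_replace with value.

This docstring was copied from pandas.core.frame.DataFrame.replace.

Some inconsistencies with the Dask version may exist.

Values of the Series/DataFrame are replaced with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.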
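A minimal sketch of that difference. Because this docstring is copied from pandas, the sketch calls pandas directly; the toy frame and the sentinel -1 are made up for illustration. `replace` matches by value anywhere in the frame, whereas `.loc` writes only to the location you name.

```python
import pandas as pd

# Toy data for illustration.
df = pd.DataFrame({"A": [0, 1, 0], "B": [5, 0, 7]})

# Value-based: every cell equal to 0 becomes -1, whatever its row or column.
by_value = df.replace(0, -1)

# Position-based: .loc only touches the row/column pair named explicitly.
by_position = df.copy()
by_position.loc[0, "A"] = -1

print(by_value)
print(by_position)
```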

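to_replace : str, regex, list, dict, Series, int, float, or None

How to find the values that will be replaced.

numeric, str or regex:

numeric: numeric values equal to to_replace will be replaced with value

str: strings exactly matching to_replace will be replaced with value

regex: strings matching the regular expression to_replace will be replaced with value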
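The three scalar rules side by side, as a small pandas sketch on made-up data. Note that the regex form rewrites only the matched prefix of "bat", because substitution goes through re.sub rather than replacing the whole cell.

```python
import pandas as pd

# numeric: cells equal to 1 are replaced.
nums = pd.Series([1, 2, 1]).replace(1, 0)

# Toy strings for illustration.
words = pd.Series(["bat", "ba", "cat"])

# str: only the cell that is exactly "ba" matches; "bat" is left alone.
exact = words.replace("ba", "X")

# regex: the pattern is applied to every string cell via re.sub,
# so the matched "ba" inside "bat" is rewritten too.
pattern = words.replace("^ba", "X", regex=True)

print(nums.tolist(), exact.tolist(), pattern.tolist())
```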

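list of str, regex, or numeric:

First, if to_replace and value are both lists, they must be the same length.

Second, if regex=True then all of the strings in **both** lists will be interpreted as regular expressions; otherwise they will be matched literally. This matters little for value, since only a few substitution regexes can be used there.

The str, regex and numeric rules above still apply.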
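A pandas sketch of the list rules on made-up strings: the two lists are paired element-wise, and regex decides whether the strings in to_replace are compared literally or treated as patterns.

```python
import pandas as pd

# Toy data for illustration.
s = pd.Series(["ab", "cd", "x"])

# Pairs: "^a" -> "A" and "d$" -> "D".
# With regex=False (the default) the strings are compared literally,
# so nothing matches and the values come back unchanged.
literal = s.replace(["^a", "d$"], ["A", "D"])

# With regex=True every string in to_replace is used as a pattern.
as_regex = s.replace(["^a", "d$"], ["A", "D"], regex=True)

print(literal.tolist(), as_regex.tolist())
```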

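dict:

Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and 'y' with 'z'. To use a dict in this way, the optional value parameter should not be given.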
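A minimal pandas sketch of the flat-dict form on toy data; note that no value argument is passed.

```python
import pandas as pd

# Toy data for illustration.
s = pd.Series(["a", "y", "q"])

# Each key is an existing value and each dict value is its replacement.
out = s.replace({"a": "b", "y": "z"})
print(out.tolist())  # ['b', 'z', 'q']
```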

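For a DataFrame, a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and the value 'z' in column 'b' and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists, except that you are specifying the column to search in.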
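A pandas sketch of the per-column form on made-up data. The 1 in column b survives because column b is only searched for 5.

```python
import pandas as pd

# Toy data for illustration.
df = pd.DataFrame({"a": [1, 2, 1], "b": [1, 5, 5]})

# Look for 1 only in column "a" and for 5 only in column "b";
# both are replaced by the single scalar passed as value.
out = df.replace({"a": 1, "b": 5}, 0)
print(out)
```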

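For a DataFrame, nested dictionaries such as {'a': {'b': np.nan}} are read as follows: look in column 'a' for the value 'b' and replace it with NaN. The optional value parameter should not be specified when using a nested dict this way. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) **cannot** be regular expressions.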
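A pandas sketch of the nested-dict form on toy data. Only column a is searched, so the 'b' values in column x are kept.

```python
import numpy as np
import pandas as pd

# Toy data for illustration.
df = pd.DataFrame({"a": ["b", "c", "b"], "x": ["b", "b", "b"]})

# Outer key = column to search; inner dict = {old value: new value}.
out = df.replace({"a": {"b": np.nan}})
print(out)
```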

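None:

This means that the regex argument must be a string, a compiled regular expression, or a list, dict, ndarray or Series of such elements. If value is also None, then this must be a nested dictionary or Series.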
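A pandas sketch of the to_replace=None case, reusing the frame from the regular-expression examples on this page. Since value is also left unset here, regex carries everything as a nested dict.

```python
import pandas as pd

df = pd.DataFrame({"A": ["bat", "foo", "bait"], "B": ["abc", "bar", "xyz"]})

# to_replace and value stay at their defaults; regex maps
# column "A" -> {pattern: replacement}.
out = df.replace(regex={"A": {r"^ba.$": "new"}})
print(out)
```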

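See the examples section for examples of each of these.

value : scalar, dict, list, str, regex, default None

Value to replace any values matching to_replace with. For a DataFrame, a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings, and lists or dicts of such objects are also allowed.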
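A pandas sketch (toy data) of value as a per-column dict: the same to_replace is applied only to the columns named in the dict.

```python
import pandas as pd

# Toy data for illustration.
df = pd.DataFrame({"A": [0, 1], "B": [0, 2]})

# Replace 0 with 100 in column "A" only; "B" is absent from the dict,
# so its 0 is left as it was.
out = df.replace(0, {"A": 100})
print(out)
```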

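inplace : bool, default False (Not supported in Dask)

If True, performs the operation in place and returns None.

limit : int, default None (Not supported in Dask)

Maximum size gap to forward or backward fill.

Deprecated since version 2.1.0.

regex : bool or same types as to_replace, default False

Whether to interpret to_replace and/or value as regular expressions. Alternatively, this can be a regular expression or a list, dict, or array of regular expressions, in which case to_replace must be None.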
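A pandas sketch of regex=True also affecting value: because substitution goes through re.sub, group references such as \1 in the replacement string are expanded. The date strings are made up for illustration.

```python
import pandas as pd

# Toy data: two ISO-style dates and one string that does not match.
dates = pd.Series(["2024-01-05", "2023-12-31", "n/a"])

# With regex=True the replacement string is handed to re.sub, so the
# group references \3, \2 and \1 reorder the captured pieces.
out = dates.replace(r"^(\d{4})-(\d{2})-(\d{2})$", r"\3/\2/\1", regex=True)
print(out.tolist())  # ['05/01/2024', '31/12/2023', 'n/a']
```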

method : {'pad', 'ffill', 'bfill'} (Not supported in Dask)

The method to use for replacement when to_replace is a scalar, list or tuple and value is None.

Deprecated since version 2.1.0.
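Since Dask does not support method or limit (and pandas deprecated both in 2.1.0), one possible substitute is to mask the values first and then fill explicitly. This is a sketch in pandas, assuming the goal is to reproduce s.replace([1, 2], method='bfill') from the examples; mask, isin and bfill also exist on Dask Series.

```python
import pandas as pd

# Toy data taken from the s.replace([1, 2], method='bfill') example.
s = pd.Series([1, 2, 3, 4, 5])

# Step 1: turn every value we want to replace into a missing value.
masked = s.mask(s.isin([1, 2]))

# Step 2: back-fill from the next valid observation, then cast back,
# because the intermediate NaNs forced a float dtype.
result = masked.bfill().astype(s.dtype)
print(result.tolist())  # [3, 3, 3, 4, 5]

# bfill's own limit caps how many consecutive gaps are filled,
# similar in spirit to the deprecated limit argument of replace.
limited = masked.bfill(limit=1)
print(limited.tolist())  # [nan, 3.0, 3.0, 4.0, 5.0]
```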

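TypeError

If to_replace is not a scalar, array-like, dict, or None.

If to_replace is a dict and value is not a list, dict, ndarray, or Series.

If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.

When replacing multiple bool or datetime64 objects and the arguments to to_replace do not match the type of the value being replaced.

ValueError

If a list or an ndarray is passed to to_replace and value but they are not the same length.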
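A short pandas sketch of the length check on toy data. Only the exception type is relied on here, not the exact wording of its message.

```python
import pandas as pd

# Toy data for illustration.
s = pd.Series([1, 2, 3])

try:
    # Two values to look for but only one replacement: the lengths differ.
    s.replace([1, 2], [10])
except ValueError as exc:
    message = str(exc)
else:
    message = None

print(message)
```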
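Regex substitution is performed under the hood with re.sub. The substitution rules are the same as for re.sub.
Regular expressions only substitute on strings. This means you cannot, for example, provide a regular expression matching floating point numbers and expect the numeric-dtype columns of your frame to be matched. However, if those floating point numbers are stored as strings, this works.
This method has a lot of options. You are encouraged to experiment with it to gain an intuition for how it works.
When a dict is used as the to_replace value, the keys of the dict act as the to_replace part and the values of the dict act as the value parameter.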
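A pandas sketch of two of these notes on made-up data: a regex leaves a float column untouched while matching the same number stored as text, and a dict in to_replace behaves like two parallel lists.

```python
import pandas as pd

# Toy frame: the same number stored once as a float and once as text.
df = pd.DataFrame({"num": [1.5, 2.5], "txt": ["1.5", "2.5"]})

# Regex replacement only inspects string cells, so the float column
# is left as it was.
out = df.replace(r"^1\.5$", "one and a half", regex=True)
print(out)

# A dict in to_replace is shorthand for two parallel lists:
# keys play the role of to_replace, values play the role of value.
s = pd.Series(["a", "b", "c"])
via_dict = s.replace({"a": "x", "b": "y"})
via_lists = s.replace(["a", "b"], ["x", "y"])
print(via_dict.equals(via_lists))  # True
```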

Scalar `to_replace` and `value`

>>> s = pd.Series([1, 2, 3, 4, 5])  
>>> s.replace(1, 5)  
0    5
1    2
2    3
3    4
4    5
dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],  
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)  
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e
List-like `to_replace`
>>> df.replace([0, 1, 2, 3], 4)  
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])  
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e
>>> s.replace([1, 2], method='bfill')  
0    3
1    3
2    3
3    4
4    5
dtype: int64
Dict-like `to_replace`
>>> df.replace({0: 10, 1: 100})  
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>> df.replace({'A': 0, 'B': 5}, 100)  
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> df.replace({'A': {0: 100, 4: 400}})  
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e
Regular expression `to_replace`
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],  
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)  
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)  
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>> df.replace(regex=r'^ba.$', value='new')  
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})  
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')  
      A    B
0   new  abc
1   new  new
2  bait  xyz
Compare the behavior of s.replace({'a': None}) and s.replace('a', None) to understand the peculiarities of the to_replace parameter:
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])  
When a dict is used as the to_replace value, the value(s) in the dict act as the value parameter. s.replace({'a': None}) is equivalent to s.replace(to_replace={'a': None}, value=None, method=None):
>>> s.replace({'a': None})  
0      10
1    None
2    None
3       b
4    None
dtype: object
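When value is not explicitly passed and to_replace is a scalar, list or tuple, replace uses the method parameter (default 'pad') to do the replacement. That is why, in this case, the 'a' values in rows 1 and 2 are replaced by 10 and the 'a' in row 4 is replaced by 'b'.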
>>> s.replace('a')  
0    10
1    10
2    10
3     b
4     b
dtype: object
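On the other hand, when None is passed explicitly as value, pandas (1.4.0 and later) uses it as the replacement rather than padding from neighbouring rows. A short pandas sketch on the same Series; that the Dask wrapper forwards an explicit None the same way is an assumption.

```python
import pandas as pd

s = pd.Series([10, "a", "a", "b", "a"])

# An explicit value=None is treated as the replacement itself,
# so every "a" becomes None and no padding from earlier rows happens.
out = s.replace("a", None)
print(out)
```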
When regex=True, value is not None and to_replace is a string, the replacement is applied in all columns of the DataFrame.
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],  
...                    'B': ['a', 'b', 'c', 'd', 'e'],
...                    'C': ['f', 'g', 'h', 'i', 'j']})
>>> df.replace(to_replace='^[a-g]', value='e', regex=True)  
   A  B  C
0  0  e  e
1  1  e  e
2  2  e  h
3  3  e  i
4  4  e  j
If value is not None and to_replace is a dictionary, the dictionary keys are the DataFrame columns that the replacement is applied to.
>>> df.replace(to_replace={'B': '^[a-c]', 'C': '^[h-j]'}, value='e', regex=True)  
   A  B  C
0  0  e  f
1  1  e  g
2  2  e  e
3  3  d  e
4  4  e  e