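Pandas has a well-known method for splitting a string or text column on dashes, whitespace, or other delimiters, returning the pieces as lists in a column (Series). In pandas terminology, a single DataFrame column is called a Series.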

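We can use the pandas Series.str.split() function to split strings into multiple columns around a given separator or delimiter. It is similar to Python's string split() method, but it applies to an entire DataFrame column. This is the simplest way to split a column, as shown below.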

This method splits each string in the Series starting from the beginning (the left side) of the string.

    Series.str.split(pat=None, n=-1, expand=False)

Let's try to understand how this method works.
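To make the role of expand concrete, here is a minimal sketch on a small made-up Series (the sample values and the names s, as_lists, and as_frame are illustrative assumptions, not part of the article's dataset). With the default expand=False each element becomes a list; with expand=True the pieces are spread across the columns of a DataFrame.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
s = pd.Series(["a-b", "c-d", "e-f"])

# expand=False (the default): a Series whose elements are lists
as_lists = s.str.split(pat="-")
print(as_lists)

# expand=True: a DataFrame with one column per piece
as_frame = s.str.split(pat="-", expand=True)
print(as_frame)
```

The list form is handy when the number of pieces varies per row; the DataFrame form is what you need when assigning the pieces to new columns.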

    # import pandas as pd
    import pandas as pd

    # initialize the DataFrame
    df = pd.DataFrame({
        'Email': ['Alex.jhon@gmail.com', 'Hamza.Azeez@gmail.com', 'Harry.barton@hotmail.com'],
        'Number': ['+44-3844556210', '+44-2245551219', '+44-1049956215'],
        'Location': ['Alameda,California', 'Sanford,Florida', 'Columbus,Georgia']})

    print("Dataframe series:\n", df)

We created a DataFrame df with three columns: Email, Number, and Location. Note that the strings in each column follow a specific pattern; look closely at the Email column, for instance, and you will see that it can be split into two columns. The examples that follow solve exactly this kind of problem.

    Dataframe series:
                          Email          Number            Location
    0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California
    1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida
    2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia

We will use the Series.str.split() function to split the Number column, passing - to the split() method. Make sure to pass True to the expand keyword.

Example 1:

    print("\n\nSplit 'Number' column by '-' into two individual columns:\n",
          df.Number.str.split(pat='-', expand=True))

This example splits every value of the Number series on -.

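    Split 'Number' column by '-' into two individual columns:
         0           1
    0  +44  3844556210
    1  +44  2245551219
    2  +44  1049956215

If we pass only the expand argument, as in Series.str.split(expand=True), the strings are split on whitespace. To split on -, on a comma, or on a regular expression that matches part of the string, you must supply it through the pat parameter.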
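The difference between the default whitespace split and an explicit pat can be seen in a short sketch. The two sample values and the variable names below are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical sample data: one value with a space, one with a dash
s = pd.Series(["New York", "+44-3844556210"])

# No pat: split on whitespace only, so the dashed value stays in one piece
by_space = s.str.split(expand=True)
print(by_space)

# pat='-': split on the dash, so the value containing a space stays whole
by_dash = s.str.split(pat="-", expand=True)
print(by_dash)
```

Rows that produce fewer pieces than the widest row are padded with missing values, which is worth checking before assigning the result to named columns.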

Let's give these split columns proper names.

    df[['Dialling Code', 'Cell-Number']] = df.Number.str.split('-', expand=True)
    print(df)

We created two new series, Dialling Code and Cell-Number, and filled them with the values split from the Number series.

                          Email          Number            Location  Dialling Code  \
    0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California            +44
    1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida            +44
    2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia            +44

       Cell-Number
    0   3844556210
    1   2245551219
    2   1049956215

Example 2:

In this example, we split the Location series on a comma.

    df[['City', 'State']] = df.Location.str.split(',', expand=True)
    print(df)

This splits the Location series and stores its values in two separate series, City and State.

                          Email          Number            Location      City  \
    0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California   Alameda
    1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida   Sanford
    2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia  Columbus

            State
    0  California
    1     Florida
    2     Georgia

Let's look at one last example. We will extract the full name from the Email series.

    full_name = df.Email.str.split(pat='@', expand=True)
    print(full_name)

                  0            1
    0     Alex.jhon    gmail.com
    1   Hamza.Azeez    gmail.com
    2  Harry.barton  hotmail.com

Now we separate the first name and the last name on the dot.

    df[['First Name', 'Last Name']] = full_name[0].str.split('.', expand=True)
    print(df)

                          Email          Number            Location  First Name  \
    0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California        Alex
    1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida       Hamza
    2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia       Harry

       Last Name
    0       jhon
    1      Azeez
    2     barton

Passing n=-1 to .split() together with expand=True has no visible effect: -1 is the default value of n and means the number of splits is unlimited.

    print(df['Email'].str.split('@', n=-1, expand=True))

                  0            1
    0     Alex.jhon    gmail.com
    1   Hamza.Azeez    gmail.com
    2  Harry.barton  hotmail.com

The complete example code is shown below.

    # import pandas as pd
    import pandas as pd

    # create a new DataFrame
    df = pd.DataFrame({
        'Email': ['Alex.jhon@gmail.com', 'Hamza.Azeez@gmail.com', 'Harry.barton@hotmail.com'],
        'Number': ['+44-3844556210', '+44-2245551219', '+44-1049956215'],
        'Location': ['Alameda,California', 'Sanford,Florida', 'Columbus,Georgia']})

    print("Dataframe series :\n", df)

    # split Number on '-' and show the result
    print("\n\nSplit 'Number' column by '-' into two individual columns :\n",
          df.Number.str.split(pat='-', expand=True))

    # store the pieces of Number in two named columns
    df[['Dialling Code', 'Cell-Number']] = df.Number.str.split('-', expand=True)
    print(df)

    # split Location on ',' into City and State
    df[['City', 'State']] = df.Location.str.split(',', expand=True)
    print(df)

    # split Email on '@', then split the name part on '.'
    full_name = df.Email.str.split(pat='@', expand=True)
    print(full_name)

    df[['First Name', 'Last Name']] = full_name[0].str.split('.', expand=True)
    print(df)

Output:

    Dataframe series :
                          Email          Number            Location
    0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California
    1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida
    2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia

    Split 'Number' column by '-' into two individual columns :
         0           1
    0  +44  3844556210
    1  +44  2245551219
    2  +44  1049956215

                          Email          Number            Location  Dialling Code  \
    0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California            +44
    1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida            +44
    2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia            +44

       Cell-Number
    0   3844556210
    1   2245551219
    2   1049956215

                          Email          Number            Location  Dialling Code  \
    0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California            +44
    1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida            +44
    2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia            +44

       Cell-Number      City       State
    0   3844556210   Alameda  California
    1   2245551219   Sanford     Florida
    2   1049956215  Columbus     Georgia

                  0            1
    0     Alex.jhon    gmail.com
    1   Hamza.Azeez    gmail.com
    2  Harry.barton  hotmail.com

                          Email          Number            Location  Dialling Code  \
    0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California            +44
    1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida            +44
    2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia            +44

       Cell-Number      City       State  First Name  Last Name
    0   3844556210   Alameda  California        Alex       jhon
    1   2245551219   Sanford     Florida       Hamza      Azeez
    2   1049956215  Columbus     Georgia       Harry     barton
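As a follow-up to the note on n earlier in the article, the effect of this parameter is easier to see with a value other than -1. The sketch below uses made-up date strings (the data and variable names are assumptions for illustration): n=-1 splits at every delimiter, while n=1 stops after the first one, and both work together with expand=True.

```python
import pandas as pd

# Hypothetical sample data with two delimiters per value
s = pd.Series(["2021-05-17", "1999-12-31"])

# n=-1 (the default): split at every '-', giving three columns
all_parts = s.str.split("-", n=-1, expand=True)
print(all_parts)

# n=1: split only at the first '-', giving two columns
first_only = s.str.split("-", n=1, expand=True)
print(first_only)
```

Limiting n is useful when only the leading piece matters and the remainder may itself contain the delimiter.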