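A common task in pandas is splitting a string (text) column on a delimiter such as a dash, a space, or a comma and returning the pieces as separate columns. In pandas terms, a single DataFrame column is a Series.
We can use the pandas Series.str.split() function to split each string around a given separator (delimiter) into multiple columns. It works like Python's built-in string split() method, but it is applied to every value in the DataFrame column at once. The simplest ways to split a column are shown below.
Series.str.split() splits each string in the Series from the beginning, at the specified delimiter.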
Series.str.split(pat=None, n=-1, expand=False)
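To make the expand parameter in the signature above concrete, here is a minimal sketch. The codes Series is made-up sample data, not part of the article's DataFrame. It shows that the default expand=False keeps a single column of lists, while expand=True returns a DataFrame with one column per piece.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
codes = pd.Series(["a-b-c", "d-e-f"])

# expand=False (the default): one column, each cell holds a list of pieces.
as_lists = codes.str.split(pat="-")

# expand=True: the pieces are spread across the columns of a new DataFrame.
as_columns = codes.str.split(pat="-", expand=True)

print(as_lists.tolist())
print(as_columns.shape)
```

The list form is handy when the number of pieces varies per row; the expanded form is the one that can be assigned directly to new DataFrame columns.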
Let's look at how this method works.
# import pandas under its usual alias
import pandas as pd
# initialize the DataFrame
df = pd.DataFrame({
    'Email': ['Alex.jhon@gmail.com', 'Hamza.Azeez@gmail.com', 'Harry.barton@hotmail.com'],
    'Number': ['+44-3844556210', '+44-2245551219', '+44-1049956215'],
    'Location': ['Alameda,California', 'Sanford,Florida', 'Columbus,Georgia']})
print("Dataframe series :\n", df)
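We created a DataFrame df with three columns: Email, Number, and Location. The strings in each column follow a fixed pattern. For example, every Email value is a name and a domain joined by @. On closer inspection, each of these columns can be split into two columns, which is what the examples below do.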
Dataframe series :
                      Email          Number            Location
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia
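We will use Series.str.split() to split the Number column, passing - as the delimiter. Make sure to pass True for the expand keyword so that the result comes back as separate columns.
Example 1: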
print("\n\nSplit 'Number' column by '-' into two individual columns :\n",
      df.Number.str.split(pat='-', expand=True))
This example splits every value of the Number series on -.
Split 'Number' column by '-' into two individual columns :
     0           1
0  +44  3844556210
1  +44  2245551219
2  +44  1049956215
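If we pass only the expand argument, as in Series.str.split(expand=True), the strings are split on whitespace. To split on -, on a comma, or on a regular expression instead, the delimiter must be passed through the pat parameter.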
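The difference between omitting pat and passing it can be sketched as follows. The spaced and mixed Series are hypothetical sample data. The sketch assumes pandas' default behaviour in which a pattern longer than one character is interpreted as a regular expression.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
spaced = pd.Series(["Alameda California", "Sanford   Florida"])
mixed = pd.Series(["+44-3844556210,Alameda", "+44-2245551219,Sanford"])

# pat omitted: each string is split on runs of whitespace.
by_space = spaced.str.split(expand=True)

# pat given as a character class: split on either '-' or ','.
# A multi-character pattern is treated as a regular expression by default.
by_regex = mixed.str.split(pat=r"[-,]", expand=True)

print(by_space)
print(by_regex)
```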
Let's give these split columns proper names.
df[['Dialling Code', 'Cell-Number']] = df.Number.str.split('-', expand=True)
print(df)
We created two new columns (Series), Dialling Code and Cell-Number, and filled them with the values split from the Number column.
                      Email          Number            Location Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California           +44
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida           +44
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia           +44

  Cell-Number
0  3844556210
1  2245551219
2  1049956215
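Assigning the split result to two named columns works here because every Number value contains exactly one -. As a minimal sketch with made-up data (the numbers Series below is hypothetical and not part of the article's df), this shows what expand=True returns when one row lacks the delimiter: the shorter row is padded with a missing value.

```python
import pandas as pd

# Hypothetical data: the second value has no '-' to split on.
numbers = pd.Series(["+44-3844556210", "2245551219"])

parts = numbers.str.split("-", expand=True)

# Row 1 yields a single piece, so its second column is a missing value
# (shown as None or NaN depending on the pandas version and dtype).
print(parts)
print(parts[1].isna().tolist())
```

Checking for missing values after the split is a cheap way to spot rows that did not match the expected pattern.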
Example 2:
In this example, we split the Location series on the comma (,).
df[['City', 'State']] = df.Location.str.split(',', expand=True)
print(df)
This splits the Location series and stores the pieces in two separate columns, City and State.
                      Email          Number            Location      City  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California   Alameda
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida   Sanford
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia  Columbus

        State
0  California
1     Florida
2     Georgia
Let's look at one last example. We will extract the full name from the Email series by splitting on @.
full_name = df.Email.str.split(pat='@', expand=True)
print(full_name)
              0            1
0     Alex.jhon    gmail.com
1   Hamza.Azeez    gmail.com
2  Harry.barton  hotmail.com
Now we split the full name on the dot (.) to separate the first name from the last name.
df[['First Name', 'Last Name']] = full_name[0].str.split('.', expand=True)
print(df)
                      Email          Number            Location First Name  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California       Alex
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida      Hamza
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia      Harry

  Last Name
0      jhon
1     Azeez
2    barton
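The two-step name split can also be chained without an intermediate DataFrame. This is a sketch of one possible alternative, not the article's method. It relies on the .str accessor's indexing (.str[0]) to pick the first piece of each list. The emails Series repeats the article's three addresses so the block runs on its own.

```python
import pandas as pd

emails = pd.Series([
    "Alex.jhon@gmail.com",
    "Hamza.Azeez@gmail.com",
    "Harry.barton@hotmail.com",
])

# Split on '@' (each cell becomes a list), keep the part before '@',
# then split that part on '.' into two columns.
names = emails.str.split("@").str[0].str.split(".", expand=True)
names.columns = ["First Name", "Last Name"]

print(names)
```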
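Passing n=-1 to .split() together with expand=True changes nothing: -1 is the default value of n and means the number of splits is not limited, so the result is the same as when n is omitted.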
print(df['Email'].str.split('@', n=-1, expand=True))
              0            1
0     Alex.jhon    gmail.com
1   Hamza.Azeez    gmail.com
2  Harry.barton  hotmail.com
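The n parameter only makes a visible difference when it is a positive number smaller than the number of delimiters in a string. A minimal sketch, using a hypothetical dates Series that is not part of the article's data:

```python
import pandas as pd

# Hypothetical sample data with two '-' delimiters per value.
dates = pd.Series(["2021-03-15", "2022-11-02"])

# n=-1 (the default): split at every '-', giving three columns.
all_parts = dates.str.split("-", n=-1, expand=True)

# n=1: split only at the first '-', giving two columns.
first_only = dates.str.split("-", n=1, expand=True)

print(all_parts)
print(first_only)
```

Limiting n is useful when only the first delimiter is meaningful and the remainder of the string should stay intact.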
The complete example code is as follows.
# import pandas under its usual alias
import pandas as pd
# create a new DataFrame
df = pd.DataFrame({
    'Email': ['Alex.jhon@gmail.com', 'Hamza.Azeez@gmail.com', 'Harry.barton@hotmail.com'],
    'Number': ['+44-3844556210', '+44-2245551219', '+44-1049956215'],
    'Location': ['Alameda,California', 'Sanford,Florida', 'Columbus,Georgia']})
print("Dataframe series :\n", df)
# preview the split of 'Number' without modifying df
print("\n\nSplit 'Number' column by '-' into two individual columns :\n",
      df.Number.str.split(pat='-', expand=True))
# split 'Number' on '-' and store the pieces in two new columns
df[['Dialling Code', 'Cell-Number']] = df.Number.str.split('-', expand=True)
print(df)
# split 'Location' on ',' into city and state
df[['City', 'State']] = df.Location.str.split(',', expand=True)
print(df)
# split 'Email' on '@'; column 0 holds the full name
full_name = df.Email.str.split(pat='@', expand=True)
print(full_name)
# split the full name on '.' into first and last name
df[['First Name', 'Last Name']] = full_name[0].str.split('.', expand=True)
print(df)
Dataframe series :
                      Email          Number            Location
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia


Split 'Number' column by '-' into two individual columns :
     0           1
0  +44  3844556210
1  +44  2245551219
2  +44  1049956215
                      Email          Number            Location Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California           +44
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida           +44
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia           +44

  Cell-Number
0  3844556210
1  2245551219
2  1049956215
                      Email          Number            Location Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California           +44
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida           +44
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia           +44

  Cell-Number      City       State
0  3844556210   Alameda  California
1  2245551219   Sanford     Florida
2  1049956215  Columbus     Georgia
              0            1
0     Alex.jhon    gmail.com
1   Hamza.Azeez    gmail.com
2  Harry.barton  hotmail.com
                      Email          Number            Location Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California           +44
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida           +44
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia           +44

  Cell-Number      City       State First Name Last Name
0  3844556210   Alameda  California       Alex      jhon
1  2245551219   Sanford     Florida      Hamza     Azeez
2  1049956215  Columbus     Georgia      Harry    barton