DataFrame created by DataFrame.apply() - Dask DataFrame


Hello guys,

I just discovered Dask. I have to deal with huge data (1.6 TB per CSV file), and I think Dask can help me.

I need to apply “basic” data transformations, and I am using the apply() function to do so.

I have this function:

def extract_data(row):
    ret = dict()
    # do regexp stuff on a specific column
    # generate a few values, store them in the dict ret
    return ret['value1'], ret['value2']
Then I apply this function to the Dask DataFrame:
meta = [('value1', str), ('value2', str)]
newddf = ddf.apply(extract_data, axis=1, meta=meta)
print(newddf) gives me something like this:
Dask DataFrame Structure:
                value1  value2
npartitions=1                                                            
               object  object    
                  ...     ...       
Dask Name: apply, 12 tasks
When I try to run newddf.head(), I get an error:

**AttributeError** : 'DataFrame' object has no attribute 'name'
What did I do wrong?
I can run exactly the same code on a pandas DataFrame with no issue.
Thanks for your help!
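@pfrenard Welcome to Discourse!
I was able to reproduce this, and the error is in how you’re defining meta. extract_data returns a tuple, so applying it row-wise produces a single Series whose elements are tuples, not a DataFrame with two columns. A list of (name, dtype) pairs describes a DataFrame, so meta needs to describe a Series instead. You can use something like meta = ("Result", object):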
import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({'x': list(range(5))})
ddf = dd.from_pandas(df, npartitions=2)

def extract_data(row):
    ret = {'value1': 'p', 'value2': 'q'}
    return ret['value1'], ret['value2']

# Each row yields a tuple, so meta describes a single Series: (name, dtype)
meta = ("Result", object)
newddf = ddf.apply(extract_data, axis=1, meta=meta)
newddf.compute()
Ref docs: dask.dataframe.DataFrame.apply — Dask documentation
