添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接
  • Pandas read_sql_query converting integer column to float
  • Element wise numeric comparison in Pandas dataframe column value with list
  • Convert this Word DataFrame into Zero One Matrix Format DataFrame in Python Pandas
  • Dropping ID of multiindex dataframe if a specific column only includes NaAs for that ID
  • Iterate pandas.DataFrame efficiently while accessing more than one index row at a time
  • Pandas: How to split on multiple delimiters?
  • Flattening pandas DataFrame
  • Pandas dataframe sorting string values and by descending aggregated values
  • What is the quickest way to increment date string YYYY-MM-DD in Python?
  • Pandas DataFrame: Merge rows with same id
  • How to write a function that iterates over groups of pandas dataframe and applies hierarchical conditions?
  • How to style highlight categorical dtype in pandas style?
  • Performance decrease for huge amount of columns. Pyspark
  • Separate pandas df by repeating range in a column
  • Convert dataframe column with values in the form of a list into a string in the same column
  • What is going wrong with `unionAll` of Spark `DataFrame`?
  • Same function over multiple data frames in R
  • Accessing column with df[col] gives: Error 'x' must be atomic for 'sort.list'
  • Spark Dataframe groupBy and sort results into a list
  • Strategies for formatting JSON output from R
  • How to "negative select" columns in spark's dataframe
  • Spark - Group by HAVING with dataframe syntax?
  • Combine PySpark DataFrame ArrayType fields into single ArrayType field
  • How to get the schema definition from a dataframe in PySpark?
  • Converting a list of data frames into individual data frames in R
  • django

  • Django rest framework extra serializer field with get and post
  • Django: How to import a javascript inside another javascript file
  • Does django csrf token must be unique on every request?
  • Dealing with multiple apps overriding management commands in Django
  • Abstract base class inheritance in Django with foreignkey
  • Django LEFT JOIN?
  • can you submit a form within a jquery modal dialog
  • sort by latitude in geodjango
  • Fetch API for Django POST requests
  • Unable to create an integration test for Django's reset password flow
  • Or, if I try to write this dataframe as an excel file, with utf-8 encoding:

    ew = pd.ExcelWriter('test.xlsx',options={'encoding':'utf-8'})
    df.to_excel(ew)
    

    I get the following error:

    IllegalCharacterError                     Traceback (most recent call last)
    <ipython-input-4-62adec25ae8d> in <module>()
          1 ew = pd.ExcelWriter('test.xlsx',options={'encoding':'utf-8'})
          2 #df.to_excel("test.xlsx")
    ----> 3 df.to_excel(ew)
    /usr/local/lib/python2.7/dist-packages/pandas/util/decorators.pyc in wrapper(*args, **kwargs)
         86                 else:
         87                     kwargs[new_arg_name] = new_arg_value
    ---> 88             return func(*args, **kwargs)
         89         return wrapper
         90     return _deprecate_kwarg
    /usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep)
       1258         formatted_cells = formatter.get_formatted_cells()
       1259         excel_writer.write_cells(formatted_cells, sheet_name,
    -> 1260                                  startrow=startrow, startcol=startcol)
       1261         if need_save:
       1262             excel_writer.save()
    /usr/local/lib/python2.7/dist-packages/pandas/io/excel.pyc in write_cells(self, cells, sheet_name, startrow, startcol)
        679             colletter = get_column_letter(startcol + cell.col + 1)
        680             xcell = wks.cell("%s%s" % (colletter, startrow + cell.row + 1))
    --> 681             xcell.value = _conv_value(cell.val)
        682             style_kwargs = {}
    /usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in value(self, value)
        360     def value(self, value):
        361         """Set the value and infer type and display options."""
    --> 362         self._bind_value(value)
        364     @property
    /usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in _bind_value(self, value)
        269             elif self.guess_types:
        270                 value = self._infer_value(value)
    --> 271         self.set_explicit_value(value, self.data_type)
    /usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in set_explicit_value(self, value, data_type)
        235             raise ValueError('Invalid data type: %s' % data_type)
        236         if isinstance(value, STRING_TYPES):
    --> 237             value = self.check_string(value)
        238         self._value = value
        239         self.data_type = data_type
    /usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in check_string(self, value)
        220         value = value[:32767]
        221         if next(ILLEGAL_CHARACTERS_RE.finditer(value), None):
    --> 222             raise IllegalCharacterError
        223         return value
    IllegalCharacterError: 
    

    How can I write a pandas dataframe containing unicode to an excel file?

    Accepted answer

    Not a Unicode issue as such... \x16 (or in Unicode strings \u0016 refers to the same character) is ASCII control code 22 (SYN). Pandas says it's invalid to have control codes (other than tab and newlines) in an Excel file, and though I don't know much about Excel files it would certainly be impossible to include them in an XML 1.0 file, which is what's inside a xlsx.

    So most likely there is no way to include arbitrary character sequences (with control codes) in an Excel. You should filter them out before writing, or if you really need to preserve the original data use some form of ad hoc encoding recognised only by your application.

    Similar question

  • Pandas - Writing an excel file containing unicode - IllegalCharacterError
  • pandas writing to excel sheet deleting other sheets in file
  • problems writing a pandas DataFrame into a unicode text file
  • Writing Pandas Dataframes as well as numpy arrays to a common Excel File
  • Unicode error when writing Python Pandas dataframe to CSV file
  • Writing a pandas DataFrame to CSV file
  • Reading an Excel file in python using pandas
  • Pandas cannot open an Excel (.xlsx) file
  • Writing pandas DataFrame to JSON in unicode
  • Writing large Pandas Dataframes to CSV file in chunks
  • score:0

    I don't know this particular language, but generally there is an error with excel and UTF8. If you just open a file of UTF8 characters with excel programatically, it will corrupt them (it doesn't seem to handle all the bits in the character, but truncates it to effectively the first 2 and last 2 hex numbers of the 8 present in extended characters).

    A work around, to load a utf file correctly into excel, is to get the program insert a macro into your excel sheet after you have loaded it which imports the data. I have some code to do this in C#, if that's any help?

    does your input contain any extended characters (i.e. àâäçæèëéêìïîñòöôœûüùÿÀÂÄÇÆÈËÉÊÌÏÎÑÒÖÔŒÛÜÙŸ) and if you take them out, does it work?

    When I encounter this error, I usually go around it by writing the file to a '.csv instead of '.xlsx' files. So instead of

    yourdataframe.to_excel('Your workbook name.xlsx')
    

    I would do:

    yourdataframe.to_csv('Your workbook name.csv')
    

    It appears the way pandas decodes .csv files by default is:

    encoding : string, optional
    A string representing the encoding to use in the output file,
    defaults to 'ascii' on Python 2 and 'utf-8' on Python 3.
    

    On the other hand default encoding of .xlsx files is:

    encoding: string, default None
    encoding of the resulting excel file. Only necessary for xlwt,
    other writers support unicode natively.
    

    This difference is responsible for that error. You will also get the error when you write data with strings that start with - or + to a .xlsx file.

    Some more answer related to the same question

  • Unicode Encode Error when writing pandas df to csv
  • Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file?
  • Python: Writing Images and dataframes to the same excel file
  • pandas to_csv: suppress scientific notation in csv file when writing pandas to csv
  • PANDAS & glob - Excel file format cannot be determined, you must specify an engine manually
  • How to stop writing a blank line at the end of csv file - pandas
  • Pandas dataframe and character encoding when reading excel file
  • Python Pandas - Read csv file containing multiple tables
  • Insert pandas chart into an Excel file using XlsxWriter
  • Read Excel XML .xls file with pandas
  • score:0

    for writing a data frame containing unicode characters to multiple sheets in a single excel file below code can be helpful:

    %pip install xlsxwriter
    from pandas import ExcelWriter
    import xlsxwriter
    writer = ExcelWriter('notes.xlsx')
    for key in dict_df:
            data[key].to_excel(writer, key,index=False,engine='xlsxwriter')
    writer.save()
                                            

    I've answered a similar question at this post: https://stackoverflow.com/a/63950544/1851492, below is the same content.

    If you don't want to install another excel writer engine (e.g. xlsxwriter), you may try to remove these illegal characters by looking for the pattern which cause IllegalCharacterError raised.

    Open cell.py which under the path /path/to/your/python/site-packages/openpyxl/cell/, look for check_string function, you'll see it using a defined regular expression pattern ILLEGAL_CHARACTERS_RE to find those illegal characters. Trying to locate its definition you'll see this line:

    ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')

    This line is what you need to remove those characters. Copy this line to your program and execute below code before your dataframe is writing to excel:

    dataframe = dataframe.applymap(lambda x: ILLEGAL_CHARACTERS_RE.sub(r'', x) if isinstance(x, str) else x)

    The above line will apply remove those characters to every cells.

    Use this to remove any error that you might be getting. You can save to excel post this.

    df = df.applymap(lambda x: x.encode('unicode_escape').
                     decode('utf-8') if isinstance(x, str) else x)
                                            

    The same problem happened to me. I solved it as follows:

    First, install python package xlsxwriter:

    pip install xlsxwriter
    

    Second, replace the default engine 'openpyxl' with 'xlsxwriter':

    df.to_excel("test.xlsx", engine='xlsxwriter')
                                                
  • Store Excel file exported from Pandas in AWS
  • From password-protected Excel file to pandas DataFrame
  • Writing Percentages in Excel Using Pandas
  • Writing multiple pandas dataframes to multiple excel worksheets
  • Setting default number format when writing to Excel from Pandas
  • Pandas unable to open this Excel file
  • Writing a formated binary file from a Pandas Dataframe
  • Read multiple excel file with different sheets names in pandas
  • Error when writing python pandas dataframe to csv file
  • No module named xlsxwriter error while writing pandas df to excel
  • More Query from same tag

  • Python series where values are lists, get another series with list of indexes correspond to each item list
  • Python: how to keep leading zeros with dataframe.to_csv
  • How to use two columns in x-axis
  • What is the fastest way to stream a large csv file?
  • Logic operator for boolean indexing with np.nan in Pandas
  • How do you specify a Pandas DataFrame schema/structure in a docstring?
  • tz_localize: KeyError: ('Asia/Singapore', u'occurred at index 0')
  • Issue with Merging in Pandas
  • How do I separate a dictionary that has two keys into two seperate new columns?
  • How to check all the columns of a dataframe are the same?
  • Select values from any column/row based on criteria
  • For loop is only storing the last value in colum
  • Plotly Distplot subplots
  • How to use the `to_latex` method of pandas in Jupyter to obtain code that one can use directly in LaTeX?
  • Pandas Series fillna with default date
  • Numpy: does matrix-vector multiplication not skip computation when some vector elements are equal to zero? ImportError importing .pyd - DLL load failed. Cython How do I convert a numpy Array to a data type that tensorflow can classify? TypeError when passing 2d numpy array to C++ Numpy Documentation Pandas: how to concatenate a MultiIndex DataFrame with a single index DataFrame, and custom ordering Pandas: Check if value is epoch time using python pandas lag multi-index irregular time series data by number of months Pandas slice string based on str.find as position to start and stop Multiplying multiple columns in a DataFrame How not to show/dipslay an index in csv when adding a new row using python pandas