ew = pd.ExcelWriter('test.xlsx',options={'encoding':'utf-8'})
df.to_excel(ew)
I get the following error:
IllegalCharacterError Traceback (most recent call last)
<ipython-input-4-62adec25ae8d> in <module>()
1 ew = pd.ExcelWriter('test.xlsx',options={'encoding':'utf-8'})
2 #df.to_excel("test.xlsx")
----> 3 df.to_excel(ew)
/usr/local/lib/python2.7/dist-packages/pandas/util/decorators.pyc in wrapper(*args, **kwargs)
86 else:
87 kwargs[new_arg_name] = new_arg_value
---> 88 return func(*args, **kwargs)
89 return wrapper
90 return _deprecate_kwarg
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep)
1258 formatted_cells = formatter.get_formatted_cells()
1259 excel_writer.write_cells(formatted_cells, sheet_name,
-> 1260 startrow=startrow, startcol=startcol)
1261 if need_save:
1262 excel_writer.save()
/usr/local/lib/python2.7/dist-packages/pandas/io/excel.pyc in write_cells(self, cells, sheet_name, startrow, startcol)
679 colletter = get_column_letter(startcol + cell.col + 1)
680 xcell = wks.cell("%s%s" % (colletter, startrow + cell.row + 1))
--> 681 xcell.value = _conv_value(cell.val)
682 style_kwargs = {}
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in value(self, value)
360 def value(self, value):
361 """Set the value and infer type and display options."""
--> 362 self._bind_value(value)
364 @property
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in _bind_value(self, value)
269 elif self.guess_types:
270 value = self._infer_value(value)
--> 271 self.set_explicit_value(value, self.data_type)
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in set_explicit_value(self, value, data_type)
235 raise ValueError('Invalid data type: %s' % data_type)
236 if isinstance(value, STRING_TYPES):
--> 237 value = self.check_string(value)
238 self._value = value
239 self.data_type = data_type
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in check_string(self, value)
220 value = value[:32767]
221 if next(ILLEGAL_CHARACTERS_RE.finditer(value), None):
--> 222 raise IllegalCharacterError
223 return value
IllegalCharacterError:
How can I write a pandas dataframe containing unicode to an excel file?
Accepted answer
Not a Unicode issue as such... \x16
(or in Unicode strings \u0016
refers to the same character) is ASCII control code 22 (SYN). Pandas says it's invalid to have control codes (other than tab and newlines) in an Excel file, and though I don't know much about Excel files it would certainly be impossible to include them in an XML 1.0 file, which is what's inside a xlsx.
So most likely there is no way to include arbitrary character sequences (with control codes) in an Excel. You should filter them out before writing, or if you really need to preserve the original data use some form of ad hoc encoding recognised only by your application.
Similar question
Pandas - Writing an excel file containing unicode - IllegalCharacterError
pandas writing to excel sheet deleting other sheets in file
problems writing a pandas DataFrame into a unicode text file
Writing Pandas Dataframes as well as numpy arrays to a common Excel File
Unicode error when writing Python Pandas dataframe to CSV file
Writing a pandas DataFrame to CSV file
Reading an Excel file in python using pandas
Pandas cannot open an Excel (.xlsx) file
Writing pandas DataFrame to JSON in unicode
Writing large Pandas Dataframes to CSV file in chunks
score:0
I don't know this particular language, but generally there is an error with excel and UTF8. If you just open a file of UTF8 characters with excel programatically, it will corrupt them (it doesn't seem to handle all the bits in the character, but truncates it to effectively the first 2 and last 2 hex numbers of the 8 present in extended characters).
A work around, to load a utf file correctly into excel, is to get the program insert a macro into your excel sheet after you have loaded it which imports the data. I have some code to do this in C#, if that's any help?
does your input contain any extended characters (i.e. àâäçæèëéêìïîñòöôœûüùÿÀÂÄÇÆÈËÉÊÌÏÎÑÒÖÔŒÛÜÙŸ) and if you take them out, does it work?
When I encounter this error, I usually go around it by writing the file to a '.csv
instead of '.xlsx'
files.
So instead of
yourdataframe.to_excel('Your workbook name.xlsx')
I would do:
yourdataframe.to_csv('Your workbook name.csv')
It appears the way pandas
decodes .csv
files by default is:
encoding : string, optional
A string representing the encoding to use in the output file,
defaults to 'ascii' on Python 2 and 'utf-8' on Python 3.
On the other hand default encoding of .xlsx
files is:
encoding: string, default None
encoding of the resulting excel file. Only necessary for xlwt,
other writers support unicode natively.
This difference is responsible for that error. You will also get the error when you write data with strings that start with -
or +
to a .xlsx
file.
Some more answer related to the same question
Unicode Encode Error when writing pandas df to csv
Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file?
Python: Writing Images and dataframes to the same excel file
pandas to_csv: suppress scientific notation in csv file when writing pandas to csv
PANDAS & glob - Excel file format cannot be determined, you must specify an engine manually
How to stop writing a blank line at the end of csv file - pandas
Pandas dataframe and character encoding when reading excel file
Python Pandas - Read csv file containing multiple tables
Insert pandas chart into an Excel file using XlsxWriter
Read Excel XML .xls file with pandas
score:0
for writing a data frame containing unicode characters to multiple sheets in a single excel file below code can be helpful:
%pip install xlsxwriter
from pandas import ExcelWriter
import xlsxwriter
writer = ExcelWriter('notes.xlsx')
for key in dict_df:
data[key].to_excel(writer, key,index=False,engine='xlsxwriter')
writer.save()
I've answered a similar question at this post: https://stackoverflow.com/a/63950544/1851492, below is the same content.
If you don't want to install another excel writer engine (e.g. xlsxwriter), you may try to remove these illegal characters by looking for the pattern which cause IllegalCharacterError
raised.
Open cell.py
which under the path /path/to/your/python/site-packages/openpyxl/cell/
, look for check_string
function, you'll see it using a defined regular expression pattern ILLEGAL_CHARACTERS_RE
to find those illegal characters. Trying to locate its definition you'll see this line:
ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')
This line is what you need to remove those characters. Copy this line to your program and execute below code before your dataframe is writing to excel:
dataframe = dataframe.applymap(lambda x: ILLEGAL_CHARACTERS_RE.sub(r'', x) if isinstance(x, str) else x)
The above line will apply remove those characters to every cells.
Use this to remove any error that you might be getting. You can save to excel post this.
df = df.applymap(lambda x: x.encode('unicode_escape').
decode('utf-8') if isinstance(x, str) else x)
The same problem happened to me. I solved it as follows:
First, install python package xlsxwriter:
pip install xlsxwriter
Second, replace the default engine 'openpyxl' with 'xlsxwriter':
df.to_excel("test.xlsx", engine='xlsxwriter')
Store Excel file exported from Pandas in AWS
From password-protected Excel file to pandas DataFrame
Writing Percentages in Excel Using Pandas
Writing multiple pandas dataframes to multiple excel worksheets
Setting default number format when writing to Excel from Pandas
Pandas unable to open this Excel file
Writing a formated binary file from a Pandas Dataframe
Read multiple excel file with different sheets names in pandas
Error when writing python pandas dataframe to csv file
No module named xlsxwriter error while writing pandas df to excel
More Query from same tag
Python series where values are lists, get another series with list of indexes correspond to each item list
Python: how to keep leading zeros with dataframe.to_csv
How to use two columns in x-axis
What is the fastest way to stream a large csv file?
Logic operator for boolean indexing with np.nan in Pandas
How do you specify a Pandas DataFrame schema/structure in a docstring?
tz_localize: KeyError: ('Asia/Singapore', u'occurred at index 0')
Issue with Merging in Pandas
How do I separate a dictionary that has two keys into two seperate new columns?
How to check all the columns of a dataframe are the same?
Select values from any column/row based on criteria
For loop is only storing the last value in colum
Plotly Distplot subplots
How to use the `to_latex` method of pandas in Jupyter to obtain code that one can use directly in LaTeX?
Pandas Series fillna with default date
Numpy: does matrix-vector multiplication not skip computation when some vector elements are equal to zero?
ImportError importing .pyd - DLL load failed. Cython
How do I convert a numpy Array to a data type that tensorflow can classify?
TypeError when passing 2d numpy array to C++
Numpy Documentation
Pandas: how to concatenate a MultiIndex DataFrame with a single index DataFrame, and custom ordering
Pandas: Check if value is epoch time using python
pandas lag multi-index irregular time series data by number of months
Pandas slice string based on str.find as position to start and stop
Multiplying multiple columns in a DataFrame
How not to show/dipslay an index in csv when adding a new row using python pandas