I've tried io, repr() etc, they don't work!
Problem inputting å (\xe5):
(None of these work)
import sys, io
print(sys.stdin.read(1))
sys.stdin = io.TextIOWrapper(sys.stdin.detach(), errors='replace', encoding='iso-8859-1', newline='\n')
print(sys.stdin.read(1))
x = sys.stdin.buffer.read(1)
print(x.decode('utf-8'))
They all give me roughly UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: unexpected end of data
I also tried starting Python after running export PYTHONIOENCODING=utf-8; that doesn't work either.
Now, here's where I'm at:
import sys, codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
sys.stdin = codecs.getwriter("utf-8")(sys.stdin.detach())
x = sys.stdin.read(1)
print(x.decode('utf-8', 'replace'))
This gives me: �
It's close...
How can I take a \xe5 and turn it into å in my console?
And without breaking input() as well, because this solution breaks it.
Note: I know this has been asked before, but none of those answers solve it, especially not io.
Some info about my system:
os.environ['LANG'] == 'C'
sys.getdefaultencoding() == 'utf-8'
sys.stdout.encoding == 'ANSI_X3.4-1968'
sys.stdin.encoding == 'ANSI_X3.4-1968'
My OS: Arch Linux running xterm
Running locale -a gives me: C | POSIX | sv_SE.utf8
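For reference, ANSI_X3.4-1968 is the formal name for plain ASCII, so the values above mean the standard streams accept nothing beyond 7-bit characters. A small sketch for dumping the same settings in one go follows; `describe_io_encodings` is a made-up helper name, not something from the standard library.

```python
import locale
import os
import sys


def describe_io_encodings():
    """Collect the settings that influence how stdin/stdout are decoded."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "preferred": locale.getpreferredencoding(False),
        # getattr guards against streams that are missing or replaced
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "default": sys.getdefaultencoding(),
    }


for key, value in describe_io_encodings().items():
    print(f"{key:>16}: {value}")
```

Running it before and after changing LANG or PYTHONIOENCODING shows whether the change actually reached the interpreter.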
I've followed these:
(and some 50 more)
Solution (sort of, still breaks input())
sys.stdout = codecs.getwriter("latin-1")(sys.stdout.detach())
sys.stdin = codecs.getwriter("latin-1")(sys.stdin.detach())
x = sys.stdin.read(1)
print(x.decode('latin-1', 'replace'))
I went with a programmatic approach in Python 3 instead of changing the terminal's codec:
import sys, codecs
sys.stdout = codecs.getwriter("latin-1")(sys.stdout.detach())
sys.stdin = codecs.getwriter("latin-1")(sys.stdin.detach())
sys.stdout.write(sys.stdin.read(1).decode('latin-1', 'replace'))
Not only does this let you choose an encoding that matches your terminal's, it also requires no outside influence (such as export LANG=sv_SE.ISO-8859-1).
The only downside is that
input('something: ')
will break. The fix for that is:
# Since it's bad practice to give a function the same name
# as a builtin, we'll call it something we're used to
# that isn't in use any more.
def raw_input(txt):
    sys.stdout.write(txt)
    sys.stdout.flush()
    sys.stdin.flush()
    # With the wrappers above, readline() returns bytes, not str.
    return sys.stdin.readline().strip()
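An alternative that might avoid the input() breakage altogether, sketched under the assumption that the terminal really sends and expects latin-1: wrap the detached streams in io.TextIOWrapper, so reads return str instead of bytes. `as_latin1` and `rewrap_standard_streams` are hypothetical names, and the demonstration uses in-memory streams so it does not depend on the real terminal.

```python
import io
import sys


def as_latin1(binary_stream, **options):
    """Wrap a binary stream in a text layer that uses latin-1 (hypothetical helper)."""
    return io.TextIOWrapper(binary_stream, encoding="latin-1", **options)


def rewrap_standard_streams():
    """Swap sys.stdin/sys.stdout for latin-1 text wrappers (hypothetical helper)."""
    sys.stdin = as_latin1(sys.stdin.detach())
    # line_buffering=True flushes on every newline, as an interactive console would
    sys.stdout = as_latin1(sys.stdout.detach(), line_buffering=True)


# Demonstration on in-memory streams:
fake_stdin = as_latin1(io.BytesIO(b"\xe5\n"))
first_char = fake_stdin.read(1)   # a str, already decoded from latin-1

sink = io.BytesIO()
fake_stdout = as_latin1(sink)
fake_stdout.write(first_char)
fake_stdout.flush()
written = sink.getvalue()         # the same single latin-1 byte, re-encoded
```

After calling rewrap_standard_streams(), input('something: ') should return a str without a custom raw_input. On Python 3.7 and later, sys.stdin.reconfigure(encoding="latin-1") and sys.stdout.reconfigure(encoding="latin-1") change the encoding in place without detaching anything.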
A big thanks to Martijn for explaining why, and that the data is in fact latin-1!
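The latin-1 point can be checked without a terminal. In UTF-8, 0xe5 is the lead byte of a three-byte sequence, so a lone 0xe5 is reported as truncated; in latin-1 every byte maps straight to the code point with the same number. A minimal sketch (variable names are just for illustration):

```python
raw = b"\xe5"  # the single byte the terminal sends for å

# latin-1: byte 0xe5 becomes U+00E5, which is å
decoded = raw.decode("latin-1")

# strict UTF-8: the byte announces a multi-byte sequence that never arrives
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

# UTF-8 with errors='replace': the broken sequence becomes U+FFFD
replaced = raw.decode("utf-8", "replace")

# what a UTF-8 terminal would have sent for å instead
utf8_bytes = "å".encode("utf-8")

# ascii() keeps the output printable even on an ASCII-only stdout
print(ascii(decoded), utf8_error, ascii(replaced), utf8_bytes)
```

This matches both symptoms in the question: the "unexpected end of data" error from strict decoding, and the replacement character from decoding with 'replace'.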