I have the following line in Python 2:-

msgstr = string.join(popmsg[1], "\n") # popmsg[1] is a list containing the lines of the message

... so I changed it to:-

s = "\n"
msgstr = s.join(popmsg[1]) # popmsg[1] is a list containing the lines of the message

However this still doesn't work because popmsg[1] isn't a list of
strings, I get the error:-

TypeError: sequence item 0: expected str instance, bytes found

So how do I do this? I can see clumsy ways by a loop working through
the list in popmsg[1] but surely there must be a way that's as neat
and elegant as the Python 2 way was?
--
Chris Green
·
On 2020-08-26 at 14:22:10 +0100,
Post by Chris Green
I have the following line in Python 2:-
msgstr = string.join(popmsg[1], "\n") # popmsg[1] is a list containing the lines of the message
... so I changed it to:-
s = "\n"
msgstr = s.join(popmsg[1]) # popmsg[1] is a list containing the lines of the message
However this still doesn't work because popmsg[1] isn't a list of
strings, I get the error:-
TypeError: sequence item 0: expected str instance, bytes found
So how do I do this? I can see clumsy ways by a loop working through
the list in popmsg[1] but surely there must be a way that's as neat
and elegant as the Python 2 way was?
Join bytes objects with a byte object:

b"\n".join(popmsg[1])
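
As a minimal sketch of why this works (the list below is a made-up stand-in for popmsg[1], not real poplib output), the separator has to be the same type as the items being joined:

```python
# Hypothetical stand-in for popmsg[1]: a list of bytes lines.
lines = [b"Subject: test", b"", b"Hello"]

# A bytes separator joins bytes items without complaint.
msgbytes = b"\n".join(lines)

# A str separator on the same list raises the TypeError from the question.
try:
    "\n".join(lines)
except TypeError as exc:
    error = str(exc)
```

The rule is symmetric: str.join wants str items and bytes.join wants bytes-like items.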
Post by 2***@potatochowder.com
On 2020-08-26 at 14:22:10 +0100,
Post by Chris Green
I have the following line in Python 2:-
msgstr = string.join(popmsg[1], "\n") # popmsg[1] is a list containing
the lines of the message
Post by Chris Green
... so I changed it to:-
s = "\n"
msgstr = s.join(popmsg[1]) # popmsg[1] is a list containing the lines of the message
However this still doesn't work because popmsg[1] isn't a list of
strings, I get the error:-
TypeError: sequence item 0: expected str instance, bytes found
So how do I do this? I can see clumsy ways by a loop working through
the list in popmsg[1] but surely there must be a way that's as neat
and elegant as the Python 2 way was?
b"\n".join(popmsg[1])
Aaahhh! Thank you (and the other reply).
--
Chris Green
·
Post by Chris Green
Post by 2***@potatochowder.com
b"\n".join(popmsg[1])
Aaahhh! Thank you (and the other reply).
But note: joining bytes like strings is uncommon, and may indicate that
you should be working in strings to start with. Eg you may want to
convert popmsg from bytes to str and do a str.join anyway. It depends on
exactly what you're dealing with: are you doing text work, or are you
doing "binary data" work?

I know many network protocols are "bytes-as-text", but that is
accomplished by implying an encoding of the text, eg as ASCII, where
characters all fit in single bytes/octets.

Cheers,
Cameron Simpson <***@cskk.id.au>
Post by Cameron Simpson
Post by Chris Green
Post by 2***@potatochowder.com
b"\n".join(popmsg[1])
Aaahhh! Thank you (and the other reply).
But note: joining bytes like strings is uncommon, and may indicate that
you should be working in strings to start with. Eg you may want to
convert popmsg from bytes to str and do a str.join anyway. It depends on
exactly what you're dealing with: are you doing text work, or are you
doing "binary data" work?
I know many network protocols are "bytes-as-text", but that is
accomplished by implying an encoding of the text, eg as ASCII, where
characters all fit in single bytes/octets.
Yes, I realise that making everything a string before I start might be
the 'right' way to do things but one is a bit limited by what the mail
handling modules in Python provide.

E.g. in this case the only (well the only ready made) way to get a
POP3 message is using poplib and this just gives you a list of lines
made up of "bytes as text" :-

popmsg = pop3.retr(i+1)

I join the lines to feed them into mailbox.mbox() to create a mbox I
can analyse and also a message which can be sent using SMTP.

Should I be converting to string somewhere? I guess the POP3 and SMTP
libraries will cope with strings as input. Can I convert to string
after the join for example? If so, how? Can I just do:-

msgbytes = b'\n'.join(popmsg[1])
msgstr = str(msgbytes)

(Yes, I know it can be one line, I was just being explicit).

... or do I need to stringify the lines returned by popmsg() before
joining them together?
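
A rough sketch of the options, assuming UTF-8 purely for illustration (a real message declares its own character sets, and the sample lines here are invented): calling str() on a bytes object with no encoding argument gives the repr text starting with b', not the decoded message, so .decode() is the usual tool. Decoding after the join and decoding each line before it give the same result when a single encoding applies.

```python
# Hypothetical stand-in for popmsg[1]; the encoding is an assumption.
lines = [b"Subject: caf\xc3\xa9", b"", b"Hello"]
msgbytes = b"\n".join(lines)

# str() with no encoding argument yields the repr, e.g. "b'Subject: ...'".
as_repr = str(msgbytes)

# Decode after joining.
as_text = msgbytes.decode("utf-8", errors="replace")

# Or decode each line first, then join with a str separator.
per_line = "\n".join(line.decode("utf-8", errors="replace") for line in lines)
```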


Thank you for all your help and comments!

(I'm a C programmer at heart, preceded by being an assembler
programmer. I started programming way back in the 1970s, I'm retired
now and Python is for relaxation (?) in my dotage)
--
Chris Green
·
Post by Chris Green
Post by Cameron Simpson
But note: joining bytes like strings is uncommon, and may indicate that
you should be working in strings to start with. Eg you may want to
convert popmsg from bytes to str and do a str.join anyway. It depends on
exactly what you're dealing with: are you doing text work, or are you
doing "binary data" work?
I know many network protocols are "bytes-as-text", but that is
accomplished by implying an encoding of the text, eg as ASCII, where
characters all fit in single bytes/octets.
Yes, I realise that making everything a string before I start might be
the 'right' way to do things but one is a bit limited by what the mail
handling modules in Python provide.
I do ok, though most of my message processing happens to messages
already landed in my "spool" Maildir by getmail. My setup uses getmail
to get messages with POP into a single Maildir, and then I process the
message files from there.
Post by Chris Green
E.g. in this case the only (well the only ready made) way to get a
POP3 message is using poplib and this just gives you a list of lines
made up of "bytes as text" :-
popmsg = pop3.retr(i+1)
Ok, so you have bytes? You need to know.
Post by Chris Green
I join the lines to feed them into mailbox.mbox() to create a mbox I
can analyse and also a message which can be sent using SMTP.
Should I be converting to string somewhere?
I have not used poplib, but the Python email modules have a BytesParser,
which gets you a Message object; I would feed the poplib bytes to that
to parse the received message. A Message object can then be transcribed
as text via its .as_string method. Or you can do other things with it.

I think my main points are:

- know whether you're using bytes (uninterpreted data) or text (strings
of _characters_); treating bytes _as_ text implies an encoding, and
when that assumption is incorrect you get mojibake[1]

- look at the email modules' parsers, which return Messages, a
representation of the message in a structure (so that MIME subparts
etc are correctly broken out, and the character sets are _known_, post
parse)

[1] https://en.wikipedia.org/wiki/Mojibake
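
To make the parser suggestion concrete, here is a small sketch using the standard library's BytesParser. The message lines are invented for illustration; with poplib they would be the second element of the retr() result.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical message lines standing in for popmsg[1].
lines = [b"From: a@example.com", b"Subject: test", b"", b"Hello"]
raw = b"\r\n".join(lines)

# Parse the raw bytes into a structured message object.
msg = BytesParser(policy=policy.default).parsebytes(raw)

subject = msg["Subject"]   # header access by name
text = msg.as_string()     # whole message transcribed as str
```

The policy argument is optional; policy.default is used here only because it returns the newer EmailMessage API.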

Cheers,
Cameron Simpson <***@cskk.id.au>
Post by Cameron Simpson
Post by Chris Green
Post by Cameron Simpson
But note: joining bytes like strings is uncommon, and may indicate that
you should be working in strings to start with. Eg you may want to
convert popmsg from bytes to str and do a str.join anyway. It depends on
exactly what you're dealing with: are you doing text work, or are you
doing "binary data" work?
I know many network protocols are "bytes-as-text", but that is
accomplished by implying an encoding of the text, eg as ASCII, where
characters all fit in single bytes/octets.
Yes, I realise that making everything a string before I start might be
the 'right' way to do things but one is a bit limited by what the mail
handling modules in Python provide.
I do ok, though most of my message processing happens to messages
already landed in my "spool" Maildir by getmail. My setup uses getmail
to get messages with POP into a single Maildir, and then I process the
message files from there.
Most of my mail is delivered by SMTP; I run a Postfix SMTP *server*
on my desktop machine which stays on permanently.

The POP3 processing is solely to collect E-Mail that ends up in the
'catchall' mailbox on my hosting provider. It empties the POP3
catchall mailbox, checks for anything that *might* be for me or other
family members then just deletes the rest.
Post by Cameron Simpson
Post by Chris Green
E.g. in this case the only (well the only ready made) way to get a
POP3 message is using poplib and this just gives you a list of lines
made up of "bytes as text" :-
popmsg = pop3.retr(i+1)
Ok, so you have bytes? You need to know.
The documentation says (and it's exactly the same for Python 2 and
Python 3):-

POP3.retr(which)
Retrieve whole message number which, and set its seen flag. Result
is in form (response, ['line', ...], octets).

Which isn't amazingly explicit unless 'line' implies a string.
Post by Cameron Simpson
Post by Chris Green
I join the lines to feed them into mailbox.mbox() to create a mbox I
can analyse and also a message which can be sent using SMTP.
Should I be converting to string somewhere?
I have not used poplib, but the Python email modules have a BytesParser,
which gets you a Message object; I would feed the poplib bytes to that
to parse the received message. A Message object can then be transcribed
as text via its .as_string method. Or you can do other things with it.
- know whether you're using bytes (uninterpreted data) or text (strings
of _characters_); treating bytes _as_ text implies an encoding, and
when that assumption is incorrect you get mojibake[1]
- look at the email modules' parsers, which return Messages, a
representation of the message in a structure (so that MIME subparts
etc are correctly broken out, and the character sets are _known_, post
parse)
OK, thanks Cameron.
--
Chris Green
·
Post by Chris Green
Post by Cameron Simpson
I do ok, though most of my message processing happens to messages
already landed in my "spool" Maildir by getmail. My setup uses getmail
to get messages with POP into a single Maildir, and then I process the
message files from there.
Most of my mail is delivered by SMTP; I run a Postfix SMTP *server*
on my desktop machine which stays on permanently.
I run postfix on my machines too, including my laptop, but mostly for
sending - it means I can queue messages while offline, and they'll go
out later.

I don't receive SMTP on my laptop (which is where my mail lives); I
receive elsewhere such as the machine hosting my email domain (which
also runs postfix), and the various external addresses I have (one for
each ISP of course, and a couple of external email addresses such as a
GMail one (largely to interact with stuff like Google Groups, which is
pretty parochial)).

So I use getmail to fetch from most of these (GMail just forwards a copy
of everything "personal" to my primary address) and deliver to a spool
Maildir on my laptop, and the mailfiler processes the spool Maildir.
Post by Chris Green
The POP3 processing is solely to collect E-Mail that ends up in the
'catchall' mailbox on my hosting provider. It empties the POP3
catchall mailbox, checks for anything that *might* be for me or other
family members then just deletes the rest.
Very strong email policy, that one. Personally I fear data loss, and
process everything; anything which doesn't match a rule lands in my
"UNKNOWN" mail folder for manual consideration when I'm bored. It is
largely spam, but sometimes has a message wanting a new filing rule.
Post by Chris Green
Post by Cameron Simpson
Post by Chris Green
E.g. in this case the only (well the only ready made) way to get a
POP3 message is using poplib and this just gives you a list of lines
made up of "bytes as text" :-
popmsg = pop3.retr(i+1)
Ok, so you have bytes? You need to know.
The documentation says (and it's exactly the same for Python 2 and
Python 3):-
POP3.retr(which)
Retrieve whole message number which, and set its seen flag. Result
is in form (response, ['line', ...], octets).
Which isn't amazingly explicit unless 'line' implies a string.
Aye. But "print(repr(a_pop_line))" will tell you. Almost certainly a
string-of-bytes, so I would expect bytes. The docs are probably
unchanged during the Python2->3 move.
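
A sketch of that check, using a made-up tuple in the documented (response, lines, octets) shape rather than a live POP3 session (the values, including the octet count, are invented):

```python
# Hypothetical retr() result; a real one comes from pop3.retr(i+1).
popmsg = (b"+OK message follows", [b"Subject: test", b"", b"Hello"], 20)
response, lines, octets = popmsg

# repr() shows the b'...' prefix when the item is a bytes object.
print(repr(lines[0]))

# The same check without reading the output by eye.
all_bytes = all(isinstance(line, bytes) for line in lines)
```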
Post by Chris Green
Post by Cameron Simpson
Post by Chris Green
I join the lines to feed them into mailbox.mbox() to create a mbox I
can analyse and also a message which can be sent using SMTP.
Ah. I like Maildirs for analysis; every message has its own file, which
makes adding and removing messages easy, and avoids contention with
other things using the Maildir.

My mailfiler can process Maildirs (scan, add, remove) and add to
Maildirs and mboxes.

Cheers,
Cameron Simpson <***@cskk.id.au>
Cameron Simpson <***@cskk.id.au> wrote:
[snip]
Post by Cameron Simpson
Post by Chris Green
The POP3 processing is solely to collect E-Mail that ends up in the
'catchall' mailbox on my hosting provider. It empties the POP3
catchall mailbox, checks for anything that *might* be for me or other
family members then just deletes the rest.
Very strong email policy, that one. Personally I fear data loss, and
process everything; anything which doesn't match a rule lands in my
"UNKNOWN" mail folder for manual consideration when I'm bored. It is
largely spam, but sometimes has a message wanting a new filing rule.
It's not *that* strong, the catchall is for *anything* that is
addressed to either of the two domains hosted there. I.e. mail for
***@isbd.net will arrive in the catchall mailbox. So I just
search the To: address for anything that might be a typo for one of
our names or anything else that might be of interest. I have an
associated configuration file that specifies the patterns to look for
so I can change things on the fly as it were.

One of the scripts that I'm having trouble converting to Python 3 is
the one that does this catchall management.
Post by Cameron Simpson
Post by Chris Green
Post by Cameron Simpson
Post by Chris Green
E.g. in this case the only (well the only ready made) way to get a
POP3 message is using poplib and this just gives you a list of lines
made up of "bytes as text" :-
popmsg = pop3.retr(i+1)
Ok, so you have bytes? You need to know.
The documentation says (and it's exactly the same for Python 2 and
Python 3):-
POP3.retr(which)
Retrieve whole message number which, and set its seen flag. Result
is in form (response, ['line', ...], octets).
Which isn't amazingly explicit unless 'line' implies a string.
Aye. But "print(repr(a_pop_line))" will tell you. Almost certainly a
string-of-bytes, so I would expect bytes. The docs are probably
unchanged during the Python2->3 move.
Yes, I added some print statements to my catchall script to find out
and, yes, the returned value is a list of 'byte strings'. It's a pity
there isn't a less ambiguous name for 'string-of-bytes'! :-)
Post by Cameron Simpson
Post by Chris Green
Post by Cameron Simpson
Post by Chris Green
I join the lines to feed them into mailbox.mbox() to create a mbox I
can analyse and also a message which can be sent using SMTP.
Ah. I like Maildirs for analysis; every message has its own file, which
makes adding and removing messages easy, and avoids contention with
other things using the Maildir.
My mailfiler can process Maildirs (scan, add, remove) and add to
Maildirs and mboxes.
I've switched to maildir several times in the past and have always
switched back because they have so many 'standards'. I use mutt as my
MUA and that does handle maildir as well as anything but still doesn't
do it for me. :-)
--
Chris Green
·
Post by Chris Green
POP3.retr(which)
Retrieve whole message number which, and set its seen flag. Result
is in form (response, ['line', ...], octets).
Which isn't amazingly explicit unless 'line' implies a string.
The last term in the result is a clue... The count of octets is how
many 8-bit bytes are in the message. They may not match the number of
characters if one is parsing/decoding the lines as UTF-8, say.
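
A tiny illustration of that mismatch, with an invented sample line and UTF-8 assumed:

```python
# A line containing two non-ASCII characters, encoded as UTF-8.
line = "Grüße".encode("utf-8")

octet_count = len(line)                  # number of 8-bit bytes
char_count = len(line.decode("utf-8"))   # number of characters
```

Here the octet count is 7 and the character count is 5, because ü and ß each take two bytes in UTF-8.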

Look at the email RFCs, which specify lines as "octets", i.e. sequences
of 8-bit bytes.

How those bytes are to be interpreted when displaying the message is up
to the presence of MIME character set declarations -- and can vary between
parts of a multipart/mixed message, so can NOT be done before storing in
the mailbox.
--
Wulfraed Dennis Lee Bieber AF6VN
***@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
Post by Chris Green
Yes, I realise that making everything a string before I start might be
the 'right' way to do things but one is a bit limited by what the mail
handling modules in Python provide.
E.g. in this case the only (well the only ready made) way to get a
POP3 message is using poplib and this just gives you a list of lines
made up of "bytes as text" :-
popmsg = pop3.retr(i+1)
Which is reasonable. The headers are limited to ASCII
https://tools.ietf.org/html/rfc5322#section-2.2

Bodies depend upon the MIME headers, if present... but as a "message"
https://tools.ietf.org/html/rfc2045#section-2.10 they are "octets" (i.e.
bytes, which need to be interpreted/translated when rendering the body for
display).

It is only when parsing the headers, and possibly sections of the body,
that one learns HOW these bytes are to be interpreted.

{from a sample message in my client}
Header region:

Content-Type: multipart/alternative;
boundary="000000000000bc9cdb05adbfeb7d"

Body region:

--000000000000bc9cdb05adbfeb7d
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
...

--000000000000bc9cdb05adbfeb7d
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: 8bit
...

The raw message is considered just a sequence of bytes. One has to
apply the proper encode/decode operation for the "lines" after determining
the character set used by the section being processed.
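
One way this per-section decoding might look with the standard email package is sketched below. The message is invented (boundary "XYZ", and two parts deliberately given different charsets); falling back to ASCII when no charset is declared is an assumption of this sketch.

```python
from email import message_from_bytes, policy

# Hypothetical two-part message; each part declares its own charset.
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xc3\xa9\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/html; charset="ISO-8859-1"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"<p>caf\xe9</p>\r\n"
    b"--XYZ--\r\n"
)

msg = message_from_bytes(raw, policy=policy.default)

decoded = {}
for part in msg.walk():
    if part.is_multipart():
        continue  # containers carry no text of their own
    # Use the charset the part declares; ASCII is this sketch's fallback.
    charset = part.get_content_charset() or "ascii"
    payload = part.get_payload(decode=True)  # raw bytes of this part
    decoded[part.get_content_type()] = payload.decode(charset, errors="replace")
```

The same character arrives as different bytes in the two parts, which is why the decode step has to wait until the headers of each part have been read.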

You probably do NOT want to convert from the bytes before adding the
message to the mailbox. After all, when you later read the mailbox for
subsequent processing/display, that software is going to read the headers
to determine how to render the message. If you've already processed it,
things might get confused...

https://docs.python.org/3/library/mailbox.html
"""
add(message)

Add message to the mailbox and return the key that has been assigned to
it.

Parameter message may be a Message instance, an email.message.Message
instance, a string, a byte string, or a file-like object (which should be
open in binary mode).
"""
NOTE that "byte string" is valid.
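
A short sketch of passing the joined bytes straight to mailbox.mbox.add(). The file name and message lines are invented, and a temporary directory is used so nothing real is touched:

```python
import mailbox
import os
import tempfile

# Hypothetical message lines standing in for popmsg[1].
lines = [b"From: a@example.com", b"Subject: test", b"", b"Hello"]
msgbytes = b"\n".join(lines)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "catchall.mbox")  # made-up file name
    box = mailbox.mbox(path)
    try:
        key = box.add(msgbytes)   # bytes accepted as-is, no decoding
        box.flush()
        count = len(box)
        subject = box[key]["Subject"]
    finally:
        box.close()
```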

I'd also like to point out that raw mail messages use <cr><lf> as the
line ending. I'd have to experiment to determine if the mailbox module
converts "\n" to "\r\n" when writing to the file (which, on Windows, is the
normal line ending anyway) or wants the full ending specified (I'd have to
set up a Linux client to get a sample message in a mailbox for examination
-- since I can't tell if my mail client on Windows explicitly writes the
\r\n or relies on the I/O system to translate \n => \r\n).
--
Wulfraed Dennis Lee Bieber AF6VN
***@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
Post by Chris Green
I have the following line in Python 2:-
msgstr = string.join(popmsg[1], "\n") # popmsg[1] is a list containing the lines of the message
... so I changed it to:-
s = "\n"
msgstr = s.join(popmsg[1]) # popmsg[1] is a list containing the lines of the message
However this still doesn't work because popmsg[1] isn't a list of
strings, I get the error:-
TypeError: sequence item 0: expected str instance, bytes found
So how do I do this? I can see clumsy ways by a loop working through
the list in popmsg[1] but surely there must be a way that's as neat
and elegant as the Python 2 way was?
Well, the simple fix is to set s to b"\n" but that may not solve all of your
problems. The issue is that popmsg[1] is a list of bytes objects. You probably
want a list of strings. I would look further back and think about getting a
list of strings in the first place. Without knowing how popmsg was created
we can't tell you how to do that.

Of course, if a bytes object is what you want then the above will work. You
can also convert to string after the join.

Cheers.
--
D'Arcy J.M. Cain
Vybe Networks Inc.
A unit of Excelsior Solutions Corporation - Propelling Business Forward
http://www.VybeNetworks.com/
IM:***@VybeNetworks.com VoIP: sip:***@VybeNetworks.com
Post by Chris Green
I have the following line in Python 2:-
msgstr = string.join(popmsg[1], "\n") # popmsg[1] is a list containing the lines of the message
... so I changed it to:-
s = "\n"
msgstr = s.join(popmsg[1]) # popmsg[1] is a list containing the lines of the message
However this still doesn't work because popmsg[1] isn't a list of
strings, I get the error:-
TypeError: sequence item 0: expected str instance, bytes found
So how do I do this? I can see clumsy ways by a loop working through
the list in popmsg[1] but surely there must be a way that's as neat
and elegant as the Python 2 way was?
In Python 3, bytestring literals require the 'b' prefix:

msgstr = b"\n".join(popmsg[1])