This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Created on 2014-09-08 12:22 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name               Uploaded                              Description
re_errors_regex.patch   serhiy.storchaka, 2014-11-10 16:50

In some cases the standard re module and the third-party regex module raise exceptions with different error messages.
1. re.match(re.compile('.'), 'A', re.I)
  re:    Cannot process flags argument with a compiled pattern
  regex: can't process flags argument with a compiled pattern
2. re.compile('(?P<foo_123')
  re:    unterminated name
  regex: missing >
3. re.compile('(?P<foo_123>a)(?P=foo_123')
  re:    unterminated name
  regex: missing )
4. re.sub('(?P<a>x)', r'\g<a', 'xx')
  re:    unterminated group name
  regex: missing >
5. re.sub('(?P<a>x)', r'\g<', 'xx')
  re:    unterminated group name
  regex: bad group name
6. re.sub('(?P<a>x)', r'\g<a a>', 'xx')
  re:    bad character in group name
  regex: bad group name
7. re.sub('(?P<a>x)', r'\g<-1>', 'xx')
  re:    negative group number
  regex: bad group name
8. re.compile('(?P<foo_123>a)(?P=!)')
  re:    bad character in backref group name '!'
  regex: bad group name
9. re.sub('(?P<a>x)', r'\g', 'xx')
  re:    missing group name
  regex: missing <
10. re.compile('a\\')
    re.sub('x', '\\', 'x')
  re:    bogus escape (end of line)
  regex: bad escape
11. re.compile(r'\1')
  re:    bogus escape: '\1'
  regex: unknown group
12. re.compile('[a-')
  re:    unexpected end of regular expression
  regex: bad set
13. re.sub(b'.', 'b', b'c')
  re:    expected bytes, bytearray, or an object with the buffer interface, str found
  regex: expected bytes instance, str found
14. re.compile(r'\w', re.UNICODE | re.ASCII)
  re:    ASCII and UNICODE flags are incompatible
  regex: ASCII, LOCALE and UNICODE flags are mutually incompatible
15. re.compile('(abc')
  re:    unbalanced parenthesis
  regex: missing )
16. re.compile('abc)')
  re:    unbalanced parenthesis
  regex: trailing characters in pattern
17. re.compile(r'((.)\1+)')
  re:    cannot refer to open group
  regex: can't refer to an open group
It looks as though the re messages are better in some cases and the regex messages are better in others. In any case, it would be good to unify the error messages of both modules.
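For anyone who wants to see what their own interpreter reports, here is a minimal sketch that runs several of the cases above against the standard re module only. The CASES table and the collect_errors helper are names made up for this illustration, not part of either module. The wording printed depends on the Python version, so no expected output is shown.

```python
import re

# Hypothetical table for illustration:
# (case number from the list above, callable that is expected to fail).
CASES = [
    (1, lambda: re.match(re.compile('.'), 'A', re.I)),
    (2, lambda: re.compile('(?P<foo_123')),
    (7, lambda: re.sub('(?P<a>x)', r'\g<-1>', 'xx')),
    (10, lambda: re.compile('a\\')),
    (11, lambda: re.compile(r'\1')),
    (12, lambda: re.compile('[a-')),
    (14, lambda: re.compile(r'\w', re.UNICODE | re.ASCII)),
    (15, lambda: re.compile('(abc')),
    (16, lambda: re.compile('abc)')),
    (17, lambda: re.compile(r'((.)\1+)')),
]


def collect_errors(cases):
    """Run each callable and record the exception it raises (or None)."""
    results = {}
    for number, call in cases:
        try:
            call()
        except (re.error, TypeError, ValueError) as exc:
            results[number] = exc
        else:
            results[number] = None
    return results


# Print the exception type and message for each case.
for number, exc in collect_errors(CASES).items():
    if exc is None:
        print(f"{number}: no error raised")
    else:
        print(f"{number}: {type(exc).__name__}: {exc}")
```

Running the same table with the third-party regex module imported under the name re should give the other column of the comparison, assuming regex is installed.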
I'm dubious about this issue. It suggests that the wording of the exceptions is part of the API of the two modules.
If the idea is just to copy the best error messages from one module to the other, then I guess there is no harm. But if the idea is to guarantee to keep the two modules' messages in sync, then I think it is unnecessary and harmful.
> re:    Cannot process flags argument with a compiled pattern
> regex: can't process flags argument with a compiled pattern
Error messages usually start with a lowercase letter, and I think that all the other ones in the re module do.
By the way, which is preferred, "cannot" or "can't"? The regex module always uses "can't", but re module uses "cannot" except for "TypeError: can't use a bytes pattern on a string-like object", I think.
Also, you said that one of the re module's messages was better, but didn't say which! Did you mean this one?
> re:    expected bytes, bytearray, or an object with the buffer interface, str found
> regex: expected bytes instance, str found
> By the way, which is preferred, "cannot" or "can't"? The regex module always
> uses "can't", but re module uses "cannot" except for "TypeError: can't use
> a bytes pattern on a string-like object", I think.
It's an interesting question. Grepping the CPython sources gives these counts:
Cannot  210
cannot  865
Can't   216
can't   796
Lowercase beats uppercase by about 4:1, and the short and long forms are 
about equally common.
I leave the decision to native English speakers.
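The ratios can be checked directly from the four counts quoted above. This is only arithmetic on those numbers; the variable names are made up for the illustration.

```python
# Counts quoted in the message above (grep over the CPython sources).
counts = {"Cannot": 210, "cannot": 865, "Can't": 216, "can't": 796}

lowercase = counts["cannot"] + counts["can't"]   # 865 + 796
uppercase = counts["Cannot"] + counts["Can't"]   # 210 + 216
long_form = counts["Cannot"] + counts["cannot"]  # 210 + 865
short_form = counts["Can't"] + counts["can't"]   # 216 + 796

print(f"lowercase : uppercase = {lowercase / uppercase:.1f} : 1")
print(f"long : short = {long_form / short_form:.2f} : 1")
```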
> Also, you said that one of the re module's messages was better, but didn't
> say which! Did you mean this one?
> > re:    expected bytes, bytearray, or an object with the buffer interface,
> > str found
> > regex: expected bytes instance, str found
Neither is good. The re variant is too verbose, but it is more correct.
Maybe 6, 7, 8, 10, 11, 16, 18 are better in re.
Steven and Mark are correct that a tracker patch cannot change a 3rd party module. On the other hand, we are free to improve error messages in new versions. And we are free to borrow ideas from 3rd party modules. I changed the title accordingly.
(Back compatibility comes into play in not making message enhancements in bugfix releases even though message details are not part of the documented API. People who write code that depends on those details, and doctests need not so depend, should expect to revise for new versions. I expect that some of our re tests would need to be changed.)
Re and regex are a bit special in that regex is the only re replacement (that I know of) and is (almost) a drop-in replacement. So some people *are*, on their own, replacing re with regex by installing regex (easy with pip) and adding 'import regex as re' at the top of their code.
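A minimal sketch of that drop-in pattern, written so that it does not depend on message wording. The fallback import and the is_valid_pattern helper are assumptions made for this illustration, not something either module provides.

```python
try:
    import regex as re  # third-party module, used if it is installed
except ImportError:
    import re  # standard library fallback


def is_valid_pattern(pattern):
    """Return True if the pattern compiles.

    Only the exception class is checked, never the message text,
    so the result is the same whichever module was imported.
    """
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(is_valid_pattern('abc'))
print(is_valid_pattern('(abc'))
```

Code written this way keeps working when the message wording changes in a new version.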
Serhiy suggested either picking the best or writing a new one. I think a new one combining both would be best in many of the cases. As a user, I like "name missing terminal '>'" for #2 (is there an adjective for a name in this context?) and for #4, "group name missing terminal '>'". (Note that we usually quote literals, as in #8.) For #12, I would like a parallel construction "set expression missing terminal ']'" if that is possible. But the currently vague re message "unexpected end of regular expression" might be raised at a point where the specific information is lost and only the general version is correct.
As for #14, either UNICODE and LOCALE *are* compatible (for re) or this is buggy.
>>> import re
>>> re.compile(r'\w', re.UNICODE | re.LOCALE)
re.compile('\\w', re.LOCALE|re.UNICODE)
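To check how a given interpreter treats these flag combinations, something like the following could be used. It is a sketch only: probe_flags is a made-up helper, and the outcome for the LOCALE combinations may differ from the session above on other Python versions, so no expected output is shown.

```python
import re


def probe_flags(flags, pattern=r'\w'):
    """Return None if re accepts the flags for a str pattern, else the exception."""
    try:
        re.compile(pattern, flags)
    except (ValueError, re.error) as exc:
        return exc
    return None


for name, flags in [
    ("UNICODE | LOCALE", re.UNICODE | re.LOCALE),
    ("UNICODE | ASCII", re.UNICODE | re.ASCII),
    ("ASCII | LOCALE", re.ASCII | re.LOCALE),
]:
    exc = probe_flags(flags)
    if exc is None:
        print(f"{name}: accepted")
    else:
        print(f"{name}: {type(exc).__name__}: {exc}")
```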
On Fri, Sep 19, 2014 at 08:41:57PM +0000, Mark Lawrence wrote:
> I do not believe that any changes should be made to the re module 
> until such time as there is a fully approved PEP [....]
Why is this so controversial? We're not talking about functional changes 
to the re module, we're talking about improving error messages. Firstly, 
the actual wording of error messages is not part of the API and is 
subject to change without notice. Secondly, nobody is talking about 
keeping the two modules synchronised on an on-going basis. This is just 
to improve the re error messages using regex as inspiration.
Some error messages use the indefinite article:
    "expected a bytes-like object, %.200s found"
    "cannot use a bytes pattern on a string-like object"
    "cannot use a string pattern on a bytes-like object"
but others don't:
    "expected string instance, %.200s found"
    "expected str instance, %.200s found"
Messages tend to be abbreviated, so I think that it would be better to just omit the article.
I don't think that the error message "bad repeat interval" is an improvement (Why is it "bad"? What is an "interval"?). I think that saying that the min is greater than the max is clearer.
> Messages tend to be abbreviated, so I think that it would be better to just
> omit the article.
I agree, but this came from standard error messages, which are not 
consistent. I opened a thread on Python-Dev.
"expected a bytes-like object" and "expected str instance" are standard error 
messages raised in bytes.join and str.join, not in re. We could change them 
though.
> I don't think that the error message "bad repeat interval" is an improvement
> (Why is it "bad"? What is an "interval"?). I think that saying that the min
> is greater than the max is clearer.
Agreed. I'll change this in re. Which message is better in case of overflow: "the 
repetition number is too large" (in re) or "repeat count too big" (in regex)?
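Both repeat cases can be reproduced with the re module as follows. This is a sketch: repeat_error is a made-up helper, the 20-digit count is chosen only because it is larger than any repeat limit, and the printed wording depends on the Python version.

```python
import re


def repeat_error(pattern):
    """Return the exception raised while compiling pattern, or None."""
    try:
        re.compile(pattern)
    except (re.error, OverflowError) as exc:
        return exc
    return None


# First pattern: min greater than max. Second pattern: repeat count overflow.
for pattern in ('a{3,2}', 'a{99999999999999999999}'):
    exc = repeat_error(pattern)
    print(f"{pattern!r}: {type(exc).__name__}: {exc}")
```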
New changeset 068365acbe73 by Serhiy Storchaka in branch 'default':
Issue #22364: Improved some re error messages using regex for hints.
https://hg.python.org/cpython/rev/068365acbe73
History
Date                 User              Action  Args
2022-04-11 14:58:07  admin             set     github: 66560
2015-03-25 19:05:44  serhiy.storchaka  set     status: open -> closed; resolution: fixed; stage: patch review -> resolved
2015-03-25 19:04:39  python-dev        set     nosy: + python-dev; messages: + msg239279
2015-03-02 08:04:39  serhiy.storchaka  link    issue433028 dependencies
2015-03-01 11:04:50  serhiy.storchaka  set     messages: + msg236954
2015-02-24 22:04:55  serhiy.storchaka  set     files: + regex_errors2.diff; messages: + msg236551
2015-02-24 21:56:26  serhiy.storchaka  set     files: + re_errors_3.patch; messages: + msg236549
2015-02-20 08:59:21  serhiy.storchaka  set     messages: + msg236257
2015-02-18 23:24:47  mrabarnett        set     messages: + msg236201
2015-02-18 18:24:20  serhiy.storchaka  set     files: + regex_errors.diff; messages: + msg236188
2015-02-10 10:29:59  serhiy.storchaka  set     files: + re_errors_2.patch; messages: + msg235678
2015-02-07 23:37:38  serhiy.storchaka  set     files: + re_errors_diff.txt; messages: + msg235534
2015-02-07 23:34:38  serhiy.storchaka  set     files: + re_errors.patch; messages: + msg235532; stage: needs patch -> patch review
2014-11-12 00:46:21  terry.reedy       set     messages: + msg231057
2014-11-10 16:50:24  serhiy.storchaka  set     files: + re_errors_regex.patch; keywords: + patch; messages: + msg230965
2014-11-02 15:07:52  serhiy.storchaka  set     dependencies: + Add additional attributes to re.error, Other mentions of the buffer protocol
2014-11-02 07:34:55  rhettinger        set     nosy: + rhettinger; messages: + msg230481
2014-11-01 22:13:22  ezio.melotti      set     messages: + msg230464; stage: needs patch
2014-10-05 17:44:35  serhiy.storchaka  set     messages: + msg228599; title: Unify error messages of re and regex -> Improve some re error messages using regex for hints
2014-09-19 23:56:03  steven.daprano    set     messages: + msg227132; title: Improve some re error messages using regex for hints -> Unify error messages of re and regex
2014-09-19 23:03:38  terry.reedy       set     messages: + msg227130; title: Unify error messages of re and regex -> Improve some re error messages using regex for hints
2014-09-19 20:41:57  BreamoreBoy       set     messages: + msg227119
2014-09-19 08:03:28  serhiy.storchaka  set     messages: + msg227084
2014-09-18 21:08:20  BreamoreBoy       set     nosy: + BreamoreBoy; messages: + msg227065
2014-09-12 22:07:15  terry.reedy       set     nosy: + terry.reedy; messages: + msg226847
2014-09-11 17:16:40  serhiy.storchaka  set     messages: + msg226790
2014-09-09 16:08:09  mrabarnett        set     messages: + msg226641
2014-09-09 14:15:41  serhiy.storchaka  set     assignee: serhiy.storchaka; messages: + msg226635
2014-09-08 19:30:18  steven.daprano    set     nosy: + steven.daprano; messages: + msg226599
2014-09-08 12:32:22  serhiy.storchaka  set     messages: + msg226576
2014-09-08 12:22:56  serhiy.storchaka  create