This issue tracker
has been migrated to
GitHub
,
and is currently
read-only
.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on
2016-11-07 23:08
by
vstinner
, last changed
2022-04-11 14:58
by
admin
. This issue is now
closed
.
Files
File name
Uploaded
Description
site-not-use-re.patch
serhiy.storchaka
,
2016-11-08 12:48
The changeset 223731925d06 of the issue issue #28082 "use IntFlag for re constants" made Python startup 34% slower:
- rev 5637c9b4dd4c: Median +- std dev: 19.5 ms +- 0.0 ms
- rev 223731925d06: Median +- std dev: 26.2 ms +- 0.0 ms
The change adds "import enum" in Lib/re.py, so the enum module is imported, whereas it wasn't before. It seems like importing enum takes 6.7 ms.
I propose to revert the change 223731925d06 in Python 3.6 and reopen the issue #28082 to try to find another solution for the re module which doesn't impact Python startup performance.
Ethan Furman added the comment:
> Victor, can you add the commands you are using for timing so I can work on making enum imports faster?
Results of the python_startup benchmark:
https://speed.python.org/timeline/#/?exe=4&ben=python_startup&env=1&revs=50&equid=off&quarts=on&extr=on
This website uses the following benchmark suite:
https://github.com/python/performance
You can run the benchmark using:
python3 -m performance run -b python_startup
Oh, I understood why I had issues to reproduce the startup performance slowdown. When Python is run from the source code using ./python, the re module is not imported. When Python is installed, the re is not imported by default neither:
haypo@speed-python$ prefix/bin/python3 -c 'import sys; print("re" in sys.modules)'
False
BUT when Python is started from a virtual environment (created by the "venv" module), the re module is important by default.
haypo@speed-python$ venv/bin/python3 -c 'import sys; print("re" in sys.modules)'
If the site module is not imported, the re module is not imported:
haypo@speed-python$ venv/bin/python3 -S -c 'import sys; print("re" in sys.modules)'
False
The /home/haypo/benchmarks/prefix/lib/python3.6/site.py file is generated by the venv module and contains:
def venv(...):
if candidate_confs:
import re
config_line = re.compile(CONFIG_LINE)
New changeset d903a243c281 by Victor Stinner in branch '3.6':
Issue #28637: Revert issue #28082, don't import enum in re
https://hg.python.org/cpython/rev/d903a243c281
With being this close to release and with two weeks to go until the final beta, it seems a bit premature to immediately revert this without exploring all the alternatives especially since it seems the effects are limited to venvs. In any case, we do need to have a final answer well before the b4 cutoff (2016-11-21). Possible options:
1. leave the re contents reverted (current status)
2. find a way to mitigate the performance impact of importing re and enum (perhaps making them builtins in Setup.dist?)
3. avoid the use of re in site.py venv
4. others?
That would be a tiny but nice enhancement.
re is a commonly used module. I would prefer to not make its import time slower. Python 3 startup is already something like 3x slower than Python 2. Please don't make it even slower :-( We worked on optimizing Python 3 startup time.
> 2. find a way to mitigate the performance impact of importing re and enum (perhaps making them builtins in Setup.dist?)
I reverted the change, because Guido and Ethan seem to be in favor of a revert, and I don't expect any simple solution for this issue.
I'm quite sure that importing enum.py is slow. Optimizing enum.py import time is a large task.
Another option to explore is to delay the creation/instanciation of the re flags, something like lazy module import... but at the module attribute level :-)
Please leave this as a "release blocker" until we have had a chance to explore all the options for 3.6.0b4. We may still very well decide that reverting is the least bad option but I don't want to rule out other options without some further investigation. We have time and the reverting could be a fairly significant change.
Hum strange, the changeset d903a243c281 (the revert) optimized regex_compile: 480 ms (rev 573bc1f9900e) => 403 ms (rev cf7711887b4a).
But it didn't restore python_startup performance: 26.ms (rev 573bc1f9900e) => 24.6 ms (rev cf7711887b4a).
Another change made re import slower: revision 88110cfbf4dc of issue #28193. If I also revert this change locally, python_startup performance is restored to 19.6 ms.
By the way, regex_compile performance is the same with and without the revision 88110cfbf4dc, but the benchmark purges the re cache at each iteration of the benchmark.
* 3.4: 18.7 ms +- 0.4 ms
* 3.5: 20.3 ms +- 0.7 ms
* 3.6: 26.9 ms +- 0.6 ms (rev
c4319c0d0131
, before the revert)
* 3.7: 26.6 ms +- 0.5 ms (rev
36af3566b67a
, before the revert)
* 3.7: 24.6 ms +- 0.0 ms (rev
cf7711887b4a
, after the revert)
So what does that benchmark measure? For me, python 3.6 startup is 44 ms
and python 2.7 startup is 78 ms (real time; user time is proportionally
less).
On Mon, Nov 7, 2016 at 5:40 PM, STINNER Victor <[email protected]>
wrote:
> STINNER Victor added the comment:
> python_startup:
> * 2.7: 7.74 ms +- 0.28 ms
> * 3.4: 18.7 ms +- 0.4 ms
> * 3.5: 20.3 ms +- 0.7 ms
> * 3.6: 26.9 ms +- 0.6 ms (rev c4319c0d0131, before the revert)
> * 3.7: 26.6 ms +- 0.5 ms (rev 36af3566b67a, before the revert)
> * 3.7: 24.6 ms +- 0.0 ms (rev cf7711887b4a, after the revert)
> ----------
> _______________________________________
> Python tracker <[email protected]>
> <http://bugs.python.org/issue28637>
> _______________________________________
STINNER Victor added the comment:
>BUT when Python is started from a virtual environment (created by the
>"venv" module), the re module is important by default.
>haypo@speed-python$ venv/bin/python3 -c 'import sys; print("re" in sys.modules)'
Exciting, I just verified that this is true and running python3 from a venv really seems to be the only situation, in which the re module gets imported during startup (at least it's only this one branch in site.py that uses it).
If adding a single enum import to re causes such a big startup time difference I wonder how much more could be gained for the venv case by not importing re at all!
Turns out that the complete code block in site.py that is used by venvs and that was partially shown by @haypo is:
CONFIG_LINE = r'^(?P<key>(\w|[-_])+)\s*=\s*(?P<value>.*)\s*$'
def venv(known_paths):
global PREFIXES, ENABLE_USER_SITE
env = os.environ
if sys.platform == 'darwin' and '__PYVENV_LAUNCHER__' in env:
executable = os.environ['__PYVENV_LAUNCHER__']
else:
executable = sys.executable
exe_dir, _ = os.path.split(os.path.abspath(executable))
site_prefix = os.path.dirname(exe_dir)
sys._home = None
conf_basename = 'pyvenv.cfg'
candidate_confs = [
conffile for conffile in (
os.path.join(exe_dir, conf_basename),
os.path.join(site_prefix, conf_basename)
if os.path.isfile(conffile)
if candidate_confs:
import re
config_line = re.compile(CONFIG_LINE)
virtual_conf = candidate_confs[0]
system_site = "true"
# Issue 25185: Use UTF-8, as that's what the venv module uses when
# writing the file.
with open(virtual_conf, encoding='utf-8') as f:
for line in f:
line = line.strip()
m = config_line.match(line)
if m:
d = m.groupdict()
key, value = d['key'].lower(), d['value']
if key == 'include-system-site-packages':
system_site = value.lower()
elif key == 'home':
sys._home = value
sys.prefix = sys.exec_prefix = site_prefix
# Doing this here ensures venv takes precedence over user-site
addsitepackages(known_paths, [sys.prefix])
# addsitepackages will process site_prefix again if its in PREFIXES,
# but that's ok; known_paths will prevent anything being added twice
if system_site == "true":
PREFIXES.insert(0, sys.prefix)
else:
PREFIXES = [sys.prefix]
ENABLE_USER_SITE = False
return known_paths
So all the re module is good for here is to parse simple config file records with key/value pairs separated by '='. ´Shouldn't it be straightforward to implement that logic right inside that block directly without requiring a giant import?
This should easily be doable for 3.6 still, seems as if it would solve the whole issue and probably speed up the performance tests much more than any reverted changesets could.
What do you think?
site-not-use-re.patch: The change on site.py LGTM. I don't see why re was used in the first place for such simple text format ;-) I suggest to push the fix into Python 3.6 and 3.7.
But what is the change in Lib/collections/__init__.py?
@serhiy.storchaka you've beaten me by a few minutes (still waiting for the test suite to finish). Your patch is "contaminated" by an additional change in collections/__init__.py though so I'm still uploading my version (which also tries to stick as close as possible to the existing behaviour though that probably won't matter much)
It seems to me that current regular expression is not correct. It can keep trailing whitespaces it a value.
I'm not experienced with venv and don't know what tests are needed.
The effect is limited to venv only in a narrow sense. The startup time for running programs from an IDLE editor is also affected. Many other programs also import re either directly or indirectly on startup. For instance, importing argparse imports re. Running Python from Command Prompt:
Python 3.6.0b3 (default, Nov 1 2016, 03:21:01) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; 're' in sys.modules
False
>>> import argparse
>>> import sys; 're' in sys.modules
New changeset a822818ec74e by Serhiy Storchaka in branch '3.6':
Issue #28637: No longer use re in site.py.
https://hg.python.org/cpython/rev/a822818ec74e
New changeset a4ea837a7f84 by Serhiy Storchaka in branch 'default':
Issue #28637: No longer use re in site.py.
https://hg.python.org/cpython/rev/a4ea837a7f84
Guido: "So what does that benchmark measure? For me, python 3.6 startup is 44 ms and python 2.7 startup is 78 ms (real time; user time is proportionally less)."
It measures something like "time python -c pass".
The performance module creates a virtual environment to run benchmarks to get reproductible timings. If you benchmark the Python startup time using your system Python, the timing depends on the number of installed .pth files.
The code of python_startup benchmark:
https://github.com/python/performance/blob/master/performance/benchmarks/bm_python_startup.py
Note: I didn't write the benchmark, it comes from the old benchmark suite.
re: Ethan's question - I think the enum use should be restored in re.
I realize issue28082 (yay palindrome number) is not an urgent change but we created IntEnum for the purpose of more identifiable integer constants.
So a microbenchmark of "import re" slows down. so what? I don't find this to be a big deal. Other standard library modules also use enum and I expect more to do so in the future. In realistic size programs other things use enum as well so there isn't a hit.
PS thanks for the site.py improvements!
Importing empty module: 160 us
Creating empty class: 30 us
Creating empty function: 0.16 us
Creating empty Enum/IntEnum: 125/150 us
Creating Enum/IntEnum member: 25/27 us
Creating empty namedtuple: 600 us
Creating namedtuple member: 50 us
Importing the itertools module: 40 us
Importing the io module: 900 us
Importing the os module: 1600 us
Importing the functools module: 2100 us
Importing the re module (with all sre submodules): 3300 us
Python startup time: 43000 us
About the re module, there is still a regression on the performance of
the import time: see my msg280271.
We may also revert the changeset 88110cfbf4dc (issue #28193) if we
want to keep "import re" as fast as Python 3.5.
The question is if we consider that the re module is part of the
Python core modules which must be blasing fast to import.
Honestly now the basic startup time is under control I am fine with
restoring the re enum. We can optimize that later.
--Guido (mobile)
Oh nice, python_startup benchmark at Nov 10 (rev at 8ebaa546a033) is now faster than python_startup at Feb 29 (rev dfeadfb4c8cc): new 16.8 ms, old 19.5. And it's much before when I opened the issue: 26.6 ms. Thanks.
https://speed.python.org/timeline/#/?exe=4&ben=python_startup&env=1&revs=50&equid=off&quarts=on&extr=on
The main difference is the removal of "import re" from Lib/site.py. If I added back "import re", python_startup comes back to 24.2 ms +- 0.0 ms. I understand that "import re" adds 7.4 ms. Any additional import has an important cost of python startup performance :-/
New changeset 5fd69d4a93e0 by Victor Stinner in branch '3.6':
Issue #28637: Reapply changeset 223731925d06
https://hg.python.org/cpython/rev/5fd69d4a93e0
Ethan Furman: "Does this mean we can put enum back in re?"
I checked manually: adding "import enum" to Lib/re.py has no more impact on "python_startup" benchmark.
Guido van Rossum: "Honestly now the basic startup time is under control I am fine with restoring the re enum. We can optimize that later."
I concur with Guido, so I reapplied Ethan's changeset. Enjoy readable str() on re flags ;-)
The initial bug (performance regression in python_startup) is fixed. Thanks everyone for the nice discussion. I close the issue.
Please open new issues if you want to continue efforts on making CPython faster ;-)
2022-04-11 14:58:39adminsetgithub: 72823
2017-02-17 15:37:46gvanrossumsetnosy:
- gvanrossum
2017-02-17 10:25:34christian.heimessetnosy:
+ christian.heimes
2016-11-14 11:55:02vstinnersetstatus: open -> closed
resolution: fixed
messages:
+ msg280760
2016-11-14 11:54:14vstinnersetmessages:
+ msg280758
2016-11-14 11:40:55python-devsetmessages:
+ msg280753
2016-11-12 07:09:08serhiy.storchakasetmessages:
+ msg280636
2016-11-12 04:33:07rhettingersetnosy:
+ rhettinger
messages:
+ msg280629
2016-11-11 02:46:44gvanrossumsetmessages:
+ msg280559
2016-11-11 02:43:18ethan.furmansetmessages:
+ msg280558
2016-11-10 23:28:11vstinnersetmessages:
+ msg280552
2016-11-09 16:04:51gvanrossumsetmessages:
+ msg280409
2016-11-09 09:05:18vstinnersetmessages:
+ msg280383
2016-11-09 08:49:35serhiy.storchakasetmessages:
+ msg280380
2016-11-09 07:30:29gregory.p.smithsetnosy:
+ gregory.p.smith
messages:
+ msg280378
2016-11-09 01:24:59gvanrossumsetmessages:
+ msg280370
title: Python startup performance regression due to enum in re -> Python startup performance regression
2016-11-08 23:37:29ethan.furmansetmessages:
+ msg280362
title: Python startup performance regression -> Python startup performance regression due to enum in re
2016-11-08 21:40:08vstinnersetmessages:
+ msg280353
2016-11-08 20:55:04vstinnersetmessages:
+ msg280346
2016-11-08 18:18:45python-devsetmessages:
+ msg280328
2016-11-08 15:11:25terry.reedysetnosy:
+ terry.reedy
messages:
+ msg280312
2016-11-08 14:17:56wolmasetnosy:
+ vinay.sajip
2016-11-08 13:12:32serhiy.storchakasetmessages:
+ msg280306
2016-11-08 12:58:08serhiy.storchakasetfiles:
+ site-not-use-re.patch
2016-11-08 12:58:01wolmasetfiles:
+ no_use_re_in_site_py.patch
messages:
+ msg280305
2016-11-08 12:57:02serhiy.storchakasetmessages:
+ msg280304
2016-11-08 12:53:04vstinnersetmessages:
+ msg280302
2016-11-08 12:48:57serhiy.storchakasetfiles:
+ site-not-use-re.patch
nosy:
+ serhiy.storchaka
messages:
+ msg280301
keywords:
+ patch
2016-11-08 12:08:24wolmasetnosy:
+ wolma
messages:
+ msg280296
2016-11-08 01:50:08gvanrossumsetmessages:
+ msg280274
2016-11-08 01:40:29vstinnersetmessages:
+ msg280273
2016-11-08 01:19:46gvanrossumsetmessages:
+ msg280272
2016-11-08 01:18:14vstinnersetmessages:
+ msg280271
2016-11-08 01:01:53ned.deilysetpriority: release blocker
messages:
+ msg280269
2016-11-08 00:49:34vstinnersetpriority: release blocker -> (no value)
messages:
+ msg280268
2016-11-08 00:49:15vstinnersetmessages:
+ msg280267
2016-11-08 00:35:17ned.deilysetmessages:
+ msg280266
2016-11-07 23:46:30python-devsetnosy:
+ python-dev
messages:
+ msg280262
2016-11-07 23:38:35vstinnersetmessages:
+ msg280261
2016-11-07 23:31:02vstinnersetmessages:
+ msg280260
2016-11-07 23:27:43ethan.furmansetmessages:
+ msg280259
2016-11-07 23:20:31gvanrossumsetnosy:
+ gvanrossum
messages:
+ msg280258
2016-11-07 23:08:51vstinnersetpriority: normal -> release blocker
nosy:
+ ned.deily
2016-11-07 23:08:21vstinnercreate