pandas: PERF: pandas import is too slow
The following test demonstrates the problem. The entire contents of testme.py is the single line import pandas; even so, it takes almost 6 seconds to import pandas on my Lenovo T60.
[mpenning@Mudslide panex]$ time python testme.py
real 0m5.759s
user 0m5.612s
sys 0m0.120s
[mpenning@Mudslide panex]$
[mpenning@Mudslide panex]$ uname -a
Linux Mudslide 3.2.0-4-686-pae #1 SMP Debian 3.2.57-3+deb7u1 i686 GNU/Linux
[mpenning@Mudslide panex]$ python -V
Python 2.7.3
[mpenning@Mudslide panex]$
[mpenning@Mudslide panex]$ pip freeze
Babel==1.3
Cython==0.20.1
Flask==0.10.1
Flask-Babel==0.8
Flask-Login==0.2.7
Flask-Mail==0.7.6
Flask-OpenID==1.1.1
Flask-SQLAlchemy==0.16
Flask-WTF==0.8.4
Flask-WhooshAlchemy==0.54a
Jinja2==2.7.1
MarkupSafe==0.18
Pygments==1.6
SQLAlchemy==0.7.9
Sphinx==1.2.2
Tempita==0.5.1
WTForms==1.0.5
Werkzeug==0.9.4
Whoosh==2.5.4
argparse==1.2.1
backports.ssl-match-hostname==3.4.0.2
blinker==1.3
ciscoconfparse==1.1.1
decorator==3.4.0
docutils==0.11
dulwich==0.9.6
## FIXME: could not find svn URL in dependency_links for this package:
flup==1.0.3.dev-20110405
hg-git==0.5.0
ipaddr==2.1.11
itsdangerous==0.23
matplotlib==1.3.1
mercurial==3.0
mock==1.0.1
nose==1.3.3
numexpr==2.4
numpy==1.8.1
numpydoc==0.4
pandas==0.13.1
pyparsing==2.0.2
python-dateutil==2.2
python-openid==2.2.5
pytz==2013b
six==1.6.1
speaklater==1.3
sqlalchemy-migrate==0.7.2
tables==3.1.1
tornado==3.2.1
wsgiref==0.1.2
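The time python testme.py run above also counts interpreter startup. The following is a minimal sketch, assuming Python 3.7+ (for capture_output), that isolates the import itself. It times the import inside a fresh child interpreter, so nothing already loaded in the parent's sys.modules hides the cost. The helper name time_import is hypothetical and not part of pandas.

```python
import subprocess
import sys


def time_import(module_name: str) -> float:
    """Return the seconds a fresh interpreter spends importing module_name.

    The import runs in a child process so that modules already cached in this
    interpreter's sys.modules cannot hide the cost. Interpreter startup is
    excluded because the timer starts inside the child, just before the import.
    """
    code = (
        "import time, importlib; "
        "t0 = time.perf_counter(); "
        f"importlib.import_module({module_name!r}); "
        "print(time.perf_counter() - t0)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise if the module cannot be imported
    )
    return float(result.stdout.strip())
```

For example, time_import("pandas") reports the pandas import alone. Calling it twice in a row gives a rough comparison between a cold and a warm OS file cache.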
About this issue
- State: closed
- Created 10 years ago
- Reactions: 5
- Comments: 50 (22 by maintainers)
Please do not ignore this issue. It's closed, but I have also run into long import times. Maybe it should be picked up again. Could we raise awareness of this issue and increase its priority? Otherwise it hurts the popularity of pandas.
+1. I see anywhere from 400 to 700 ms.
I know this has been closed for a while, but I'm seeing the same thing, and it is not pandas-specific. We keep our pandas environment in a virtualenv on a drive on a server. That drive is then mounted by each client, which lets us maintain a sane package environment for all users. However, this clearly sacrifices startup time to an unreasonable extent. The import times in seconds are as follows:
So clearly this is a setup issue, but how do other companies deal with this problem? I find it hard to believe that packages are installed locally on every user's box, and, if that isn't the case, that users put up with these long startup times.
The network itself is working fine; transfer speeds are ~120 MB/s.
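One plausible explanation, assumed here rather than confirmed in this thread, is that importing from a network mount is dominated by per-file metadata round trips (stat/open) rather than bandwidth. That would be consistent with the healthy ~120 MB/s transfer speed. The sketch below uses a hypothetical helper, package_file_stats, to count the module files under an installed package without importing it. It calls importlib.util.find_spec, which only locates the package. The count serves as a rough gauge of how many files an import could touch.

```python
import importlib.util
import os
from typing import Tuple

# File suffixes treated as importable modules (an assumed, non-exhaustive list).
MODULE_SUFFIXES = (".py", ".pyc", ".so", ".pyd")


def package_file_stats(package_name: str) -> Tuple[int, int]:
    """Return (file_count, total_bytes) of module files under a package.

    Heuristic only: it counts files on disk, not the files a particular
    import actually opens.
    """
    spec = importlib.util.find_spec(package_name)  # locates, does not import
    if spec is None or spec.submodule_search_locations is None:
        raise ValueError(f"{package_name!r} is not an installed package")
    count = 0
    total_bytes = 0
    for root in spec.submodule_search_locations:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(MODULE_SUFFIXES):
                    count += 1
                    total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return count, total_bytes
```

Running package_file_stats("pandas") on the mounted virtualenv gives an approximate number of files that may each need a network round trip. It does not measure actual import behavior.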
For reference, here's an import profile using Python 3.7's importtime and tuna:
Can somebody please profile a simple "import pandas" so we can see whether the problem is easily identified?
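In response to the request for a profile, note that Python 3.7+ can print per-module import times with python -X importtime -c "import pandas", which writes one line per imported module to stderr. tuna can render that log, for example with python -X importtime -c "import pandas" 2> import.log followed by tuna import.log. The following is a minimal sketch of a hypothetical helper, import_profile, that runs the same command and lists the imports with the largest cumulative time. It assumes the default "self [us] | cumulative | imported package" line format.

```python
import subprocess
import sys
from typing import List, Tuple


def import_profile(module_name: str, top: int = 10) -> List[Tuple[int, int, str]]:
    """Return up to `top` (self_us, cumulative_us, module) rows, slowest first.

    Rows are parsed from the "import time:" lines that -X importtime
    writes to stderr, sorted by cumulative microseconds.
    """
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = fields
        try:
            rows.append((int(self_us), int(cumulative_us), name.strip()))
        except ValueError:
            # Skips the header line, whose first two fields are not numbers.
            continue
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]
```

Printing import_profile("pandas") shows which submodules or dependencies dominate the import. Indentation in the raw log (stripped here) indicates nesting.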
Our solution is to set up a web server and send POST requests to the algorithm part. The server imports pandas once at startup, so the import time is no longer paid on every run.
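The web-server approach helps because a long-running process pays the import cost once and every later request reuses the loaded module. The following is a minimal sketch using only the standard library's http.server. The handler, the make_server helper, the reply fields, and the 127.0.0.1 binding are illustrative assumptions, and the actual "algorithm part" is left as a placeholder that merely echoes the request keys.

```python
import importlib
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlgorithmHandler(BaseHTTPRequestHandler):
    """Answers POST requests using a module imported once at server startup."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder for the real algorithm: the module is already loaded,
        # so no import cost is paid while handling this request.
        reply = {
            "module": self.server.heavy_module.__name__,
            "keys_received": sorted(payload),
        }
        body = json.dumps(reply).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging; remove this override to see requests.
        pass


def make_server(module_name: str = "pandas", port: int = 8000) -> HTTPServer:
    """Create a server that imports module_name once, before serving."""
    server = HTTPServer(("127.0.0.1", port), AlgorithmHandler)
    server.heavy_module = importlib.import_module(module_name)
    return server
```

Calling make_server().serve_forever() imports pandas once and then answers POST requests on port 8000. Clients send JSON with any HTTP library and never wait for the import themselves.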
FYI, I have an SSD in my PC, so if some people have a disk-seek issue, I don't see it. numpy 1.12 takes 0.17 seconds to import.
For me, the first load takes 4 s. Then the OS caches the library in memory, so subsequent loads take around 300-500 ms. Wait a little while, and try again.
Best, Jacob
On Tue, Jan 10, 2017 at 10:42 PM, Jeff Reback notifications@github.com wrote:
– +919971876580 twitter: @JacobSingh ( http://twitter.com/#!/JacobSingh ) web: http://www.jacobsingh.name Skype: pajamadesign gTalk: jacobsingh@gmail.com
I don't think that will help; then I'd have the delay on every reload (since all my code works with pandas).
I did this in a terminal window:
while true; do date && python -m timeit -n1 -r1 "import pandas"; sleep 2; done
Doing this keeps pandas in the OS cache. It's a stupid hack, but it keeps load times down to 300-500 ms.
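The shell loop above can also be written in Python. The following is a minimal sketch that follows the comment's reasoning: re-running the import in a fresh interpreter every couple of seconds keeps the package files in the OS page cache. The function name keep_import_warm and its iterations parameter are hypothetical. With iterations=None the function loops forever, like while true.

```python
import subprocess
import sys
import time
from typing import List, Optional


def keep_import_warm(module_name: str, interval_s: float = 2.0,
                     iterations: Optional[int] = None) -> List[float]:
    """Re-import module_name in a fresh interpreter every interval_s seconds.

    Prints a timestamp and the measured import time for each run. With
    iterations=None it never returns; otherwise it stops after that many
    runs and returns the import times in seconds.
    """
    code = (
        "import time, importlib; t0 = time.perf_counter(); "
        f"importlib.import_module({module_name!r}); "
        "print(time.perf_counter() - t0)"
    )
    timings: List[float] = []
    run = 0
    while iterations is None or run < iterations:
        out = subprocess.run([sys.executable, "-c", code],
                             capture_output=True, text=True, check=True)
        timings.append(float(out.stdout))
        print(f"{time.strftime('%H:%M:%S')} import {module_name}: {timings[-1]:.3f} s")
        run += 1
        if iterations is None or run < iterations:
            time.sleep(interval_s)  # no sleep after the final bounded run
    return timings
```

Like the shell version, this only helps while the loop keeps running. Once it stops, the OS is free to evict the cached files again.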
-J
On Sat, Jan 7, 2017 at 7:27 AM, Bryce Guinta notifications@github.com wrote:
Same problem (pandas 0.18), although mine is not as awful: 400 ms just to import pandas on a local SSD. I can't imagine how bad this would be for someone using, say, a networked filesystem.
I have the same problem: 6 s import time, local install (Anaconda, pandas 0.14.1). This is impossibly slow, especially when importing in multiple processes.
I have the same problem. Was this closed because you found a solution? I’d be grateful if you could share it. Thanks.