pandas: PERF: pandas import is too slow

The following test demonstrates the problem. The contents of testme.py are literally “import pandas”; however, it takes almost 6 seconds to import pandas on my Lenovo T60.

[mpenning@Mudslide panex]$ time python testme.py 

real    0m5.759s
user    0m5.612s
sys     0m0.120s
[mpenning@Mudslide panex]$
[mpenning@Mudslide panex]$ uname -a
Linux Mudslide 3.2.0-4-686-pae #1 SMP Debian 3.2.57-3+deb7u1 i686 GNU/Linux
[mpenning@Mudslide panex]$ python -V
Python 2.7.3
[mpenning@Mudslide panex]$
[mpenning@Mudslide panex]$ pip freeze
Babel==1.3
Cython==0.20.1
Flask==0.10.1
Flask-Babel==0.8
Flask-Login==0.2.7
Flask-Mail==0.7.6
Flask-OpenID==1.1.1
Flask-SQLAlchemy==0.16
Flask-WTF==0.8.4
Flask-WhooshAlchemy==0.54a
Jinja2==2.7.1
MarkupSafe==0.18
Pygments==1.6
SQLAlchemy==0.7.9
Sphinx==1.2.2
Tempita==0.5.1
WTForms==1.0.5
Werkzeug==0.9.4
Whoosh==2.5.4
argparse==1.2.1
backports.ssl-match-hostname==3.4.0.2
blinker==1.3
ciscoconfparse==1.1.1
decorator==3.4.0
docutils==0.11
dulwich==0.9.6
## FIXME: could not find svn URL in dependency_links for this package:
flup==1.0.3.dev-20110405
hg-git==0.5.0
ipaddr==2.1.11
itsdangerous==0.23
matplotlib==1.3.1
mercurial==3.0
mock==1.0.1
nose==1.3.3
numexpr==2.4
numpy==1.8.1
numpydoc==0.4
pandas==0.13.1
pyparsing==2.0.2
python-dateutil==2.2
python-openid==2.2.5
pytz==2013b
six==1.6.1
speaklater==1.3
sqlalchemy-migrate==0.7.2
tables==3.1.1
tornado==3.2.1
wsgiref==0.1.2
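
The measurement at the top of this report can also be repeated from inside Python. This is a minimal sketch rather than anything from the original report: the helper name time_import is hypothetical, and each figure includes interpreter startup, so it is comparable to "time python testme.py" rather than to the import alone. Running it several times in a row makes the difference between a cold first load and later loads visible.

```python
import subprocess
import sys
import time


def time_import(module, runs=3):
    """Time `import <module>` in a fresh interpreter, `runs` times.

    Hypothetical helper: every run starts a new Python process, so the
    result includes interpreter startup as well as the import itself.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        timings.append(time.perf_counter() - start)
    return timings


def report(module, runs=3):
    """Print one line per run, in seconds."""
    for i, seconds in enumerate(time_import(module, runs), start=1):
        print(f"run {i}: import {module} took {seconds:.3f}s")
```

Calling report("pandas") in an environment where pandas is installed prints one line per run; comparing against report("sys") gives a rough baseline for interpreter startup.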

About this issue

  • State: closed
  • Created 10 years ago
  • Reactions: 5
  • Comments: 50 (22 by maintainers)

Most upvoted comments

Please do not ignore this issue. It’s closed, but I have also run into long import times. Maybe it should be picked up again: raise awareness of the issue and raise its priority? Otherwise it will hurt the popularity of pandas.

+1. I see anywhere between 400 - 700ms.

I know this has been closed for a while, but I’m seeing the same thing and it is not pandas-specific. We have our pandas environment in a virtualenv on a drive on a server. That drive is then mounted by each client. This allows us to maintain a sane package environment among all users. However, this clearly sacrifices startup time to an unreasonable extent. The import times in seconds are as follows:

Package   Server (s)   Client (s)
pandas    1.22         6.23
numpy     0.2          1.2

So clearly this is a setup issue, but how do other companies deal with this problem? I find it hard to believe that packages are installed locally on every user’s box and if that isn’t the case, that they experience these long startup times.

The network itself is working fine…transfer speeds are ~120MB/s.

For reference, here’s an import profile using Python 3.7’s importtime and tuna:

python3.7 -X importtime -c "import pandas" 2> pandas.log
tuna pandas.log
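
If tuna is not available, the same log can be summarised with a few lines of Python. The sketch below assumes the log was written by CPython's -X importtime option (3.7 or later), whose lines look like "import time: self [us] | cumulative | imported package"; the function name slowest_imports is hypothetical.

```python
import re

# Matches data lines such as:
#   import time:       153 |        153 |   zipimport
# The header line has no digits in the first column, so it is skipped.
_LINE = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S+)\s*$")


def slowest_imports(log_text, top=10):
    """Return up to `top` (module, self_us, cumulative_us) tuples,
    sorted by self time, largest first."""
    rows = []
    for line in log_text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # header line or unrelated stderr output
        self_us = int(match.group(1))
        cumulative_us = int(match.group(2))
        rows.append((match.group(3), self_us, cumulative_us))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


def print_slowest(path, top=10):
    """Read a log file written by `-X importtime` and print a summary."""
    with open(path) as handle:
        for name, self_us, cumulative_us in slowest_imports(handle.read(), top):
            print(f"{self_us / 1000:9.1f} ms self  {cumulative_us / 1000:9.1f} ms total  {name}")
```

With the log produced by the command above, print_slowest("pandas.log") lists the modules that spend the most time in their own top-level code, which is usually where to look first.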


Can somebody please profile a simple “import pandas” and we can see if the problem is easily identified?

Our solution is to set up a web server that hosts the algorithm part and to send POST requests to it. The server imports “pandas” once at startup, so the import time is no longer paid on every run.
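
A rough illustration of that idea, using only the standard library. Everything here is an assumption made for the sketch: the commenter did not say which server or protocol details they used, the names AlgoHandler and make_server are hypothetical, and the statistics module stands in for pandas so the example runs anywhere.

```python
import json
import statistics  # stand-in for the slow import (e.g. pandas); loaded once at startup
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlgoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: POST a JSON list of numbers, get back the mean."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        values = json.loads(self.rfile.read(length))
        body = json.dumps({"mean": statistics.mean(values)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet; drop this override to see request logs


def make_server(port=0):
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), AlgoHandler)
```

Calling make_server(8000).serve_forever() starts the long-lived process. Short-lived client scripts then POST their data to it and never import the heavy package themselves.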

FYI – I have an SSD on my PC so if there is a disk seek issue that some people have, I don’t see it. numpy 1.12 takes 0.17 seconds to import.

For me, the first load is 4s. Then the OS caches the library in memory, so it’s around 300-500ms. Wait a little while, and try again.

Best, Jacob

On Tue, Jan 10, 2017 at 10:42 PM, Jeff Reback wrote:

> not sure what you think is cached


I don’t think that will help - then I’ll have the delay every reload (since all my code works with pandas).

I’d done this in a terminal window:

while true; do date && python -m timeit -n1 -r1 "import pandas"; sleep 2; done

Doing this keeps pandas in the OS cache. Stupid hack, but keeps loading down to 300-500ms.

-J

On Sat, Jan 7, 2017 at 7:27 AM, Bryce Guinta wrote:

One workaround is to isolate all the code that interacts with pandas and lazily import that code only when you need it so that the wait period is only during program execution. (that’s what I do)



Same problem, (pandas 0.18) although mine is not as awful: 400ms just to import pandas on a local SSD. I can’t imagine how bad this would be for someone using say a networked filesystem.

I have the same problem: 6s import time on a local install (Anaconda, pandas 0.14.1). This is impossibly slow, especially when importing in multiple processes.

I have the same problem. Was this closed because you found a solution? I’d be grateful if you could share it. Thanks.