zipline: Benchmark downloading is broken

Fix benchmark downloading from Google with pandas-datareader. This issue was originally brought up here.

We now get benchmark data from Google instead of Yahoo, as seen here.

However, it appears that as of only a week or two ago, Google changed the URL from which they serve their financial data, causing pandas-datareader to break. This is also preventing us from rebuilding the test_examples data. (For more info see the original post above.)

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 36 (7 by maintainers)

Most upvoted comments

This problem still exists, and even after fixing the URL to finance.google.com you get an error that you're sending automated requests. We can work around that, but the Google Finance API is just plain unstable anyway. Better off using Quandl… or a custom bundle symbol.
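For reference, switching to the Quandl bundle is roughly a matter of setting an API key and re-ingesting. The key name and bundle name below follow Zipline's documented defaults; your_algo.py and the dates are placeholders, so treat this as a sketch rather than a guaranteed recipe:

export QUANDL_API_KEY=your_api_key_here
zipline ingest -b quandl
zipline run -f your_algo.py --bundle quandl --start 2015-1-1 --end 2017-1-1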

What I am failing to understand is why we're downloading a benchmark from any website at all when we already have a bundle. set_benchmark doesn't seem to make any difference here, which is very strange. We should be able to use a benchmark symbol from our own custom set.

Quick solution: use a manually downloaded local copy of SPY (Yahoo lets you download the entire history manually). I modified benchmarks.py to look for a local CSV copy instead. I'm attaching the modified benchmarks.py file; it should replace the existing one (so make a copy of the original first before you overwrite it). The benchmarks.py file is usually found in %USERPROFILE%\Anaconda3\envs\py34\Lib\site-packages\zipline\data. If you didn't create a separate environment, don't specify py34 after envs.

Also make sure that your local directory is reflected in this line in the code: new_dir = 'c:/Downloaded_csv'

benchmarks.txt
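If you'd rather write the local-CSV loader yourself than use the attached file, a minimal sketch might look like the following. The function name and signature mirror the get_benchmark_returns function in that era's benchmarks.py, but check your installed version; the CSV is assumed to be a Yahoo-style export with Date and Close columns.

import os
import pandas as pd

new_dir = 'c:/Downloaded_csv'  # point this at your own download directory

def get_benchmark_returns(symbol, first_date, last_date):
    # Load daily closes from a local CSV (e.g. SPY.csv downloaded from Yahoo)
    # and return daily percent changes, which is what Zipline expects.
    path = os.path.join(new_dir, '%s.csv' % symbol)
    data = pd.read_csv(path, index_col='Date', parse_dates=True)
    closes = data['Close'].sort_index().loc[first_date:last_date]
    return closes.pct_change(1).iloc[1:]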

You can try setting the benchmark to an asset that's already in your bundle. For example, if you're running the example algos with AAPL, tell Zipline to use AAPL as your benchmark.

from zipline.api import symbol, set_benchmark

def initialize(context):
    set_benchmark(symbol("AAPL"))

My experience has been that Zipline still downloads the SPY data (limited to a year) but at least refrains from using it in the backtest, and thus the backtest doesn’t fail.

Changing the benchmark data source to morningstar worked for me.

To do this, in [your_env]/lib/python3.5/site-packages/zipline/data/benchmarks.py make the 2 changes marked by # NEW

data = pd_reader.DataReader(
    symbol,
    'morningstar',  # NEW
    first_date,
    last_date
)

data = data.reset_index(0, drop=True)  # NEW
data = data['Close']

However, I agree with @Sentdex: fetching the benchmark data from the local bundle would be an improvement – both in speed and stability.

Edit: Morningstar support was added in pandas-datareader v0.6.0, so a version upgrade may be necessary.
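If you do need the upgrade, something like this should do it:

pip install --upgrade "pandas-datareader>=0.6.0"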

Google has changed the URL for its finance data. Instead of http://www.google.com/ you now need to use https://finance.google.com/. Open the source code of the pandas-datareader package and change the URLs.

Hi, is there an official solution out there for running backtests without the system breaking every time because of this benchmark issue? @yiorgosn's solution doesn't work for me; I think you have to do more than just replace the file. I get the exact same failure even with his file, and I have of course updated the directory path so it looks in the right place for the CSV.

Is there a way I can run a backtest without a benchmark until this is fixed, without ripping up the code and removing every mention of benchmarks? Thanks.

@niklas-amslgruber there’s a fix on master that uses IEX. You should be able to run a backtest up to 5 years from the current date using the zipline master branch, which you can install using:

git clone git@github.com:quantopian/zipline.git
pip install zipline/

or fork it and then do the same steps above, replacing quantopian with your-github-username.

Hoping to do a release of zipline in the next week or two as well so people can just pip install without cloning.

Also doing work in #2107 on a more permanent fix, but I haven't had the chance to finish it.