zipline: Getting benchmark data via IEX API does not work anymore

Zipline uses IEX API to get benchmark data in benchmarks.py:

def get_benchmark_returns(symbol):
    """
    Get a Series of benchmark returns from IEX associated with `symbol`.
    Default is `SPY`.

    Parameters
    ----------
    symbol : str
        Benchmark symbol for which we're getting the returns.

    The data is provided by IEX (https://iextrading.com/), and we can
    get up to 5 years worth of data.
    """
    r = requests.get(
        'https://api.iextrading.com/1.0/stock/{}/chart/5y'.format(symbol)
    )
    data = r.json()

    df = pd.DataFrame(data)

    df.index = pd.DatetimeIndex(df['date'])
    df = df['close']

    return df.sort_index().tz_localize('UTC').pct_change(1).iloc[1:]

However, according to the IEX FAQ page, the chart API was removed on June 15, 2019. Any request to this endpoint, for SPY or any other symbol, now returns nothing but an HTTP 403 error. The functionality of the deprecated APIs has moved to the new API, IEX Cloud, which requires a per-user token in every request. Any idea how to fix this issue in the long run?

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 55 (5 by maintainers)

Most upvoted comments

Inspired by this comment in #1951, here is the workaround for people who do NOT need benchmarks at all:

By default, zipline downloads benchmark data by making an HTTP request in get_benchmark_returns() in zipline/data/benchmarks.py. It returns a pd.Series, which ensure_benchmark_data() in zipline/data/loader.py then saves to a csv file. So we can create a dummy benchmark file by setting all data entries to zero.

First, replace benchmarks.py with:

import pandas as pd
from trading_calendars import get_calendar

def get_benchmark_returns(symbol, first_date, last_date):
    cal = get_calendar('NYSE')
    
    dates = cal.sessions_in_range(first_date, last_date)

    data = pd.DataFrame(0.0, index=dates, columns=['close'])
    data = data['close']

    return data.sort_index().iloc[1:]

Then in loader.py, replace data = get_benchmark_returns(symbol) with data = get_benchmark_returns(symbol, first_date, last_date).

In this example NYSE is used, but it also works when I use AlwaysOpenCalendar in my backtest, so I did not try to change it to some other calendar.

This is only a hack. In the long run, I would suggest changing the benchmark download to use another API, in case you want to use benchmarks in the future.

Another fix is getting a free IEX API key and altering the API request in benchmarks.py:

r = requests.get(
    "https://cloud.iexapis.com/stable/stock/{}/chart/5y?chartCloseOnly=True&token={}".format(symbol, IEX_KEY)
)
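To avoid pasting the key into the source file, the URL can be built by a small helper and the key read from the environment. This is only a sketch: the helper names and the IEX_TOKEN environment variable are my own assumptions, not part of zipline or IEX; the base URL and query parameters are the ones from the request above.

```python
import os
from urllib.parse import quote, urlencode

# Base URL taken from the request shown above.
IEX_CLOUD_BASE = "https://cloud.iexapis.com/stable"


def build_iex_chart_url(symbol, token, chart_range="5y"):
    """Return the IEX Cloud chart URL for `symbol` with the token in the query string."""
    query = urlencode({"chartCloseOnly": "True", "token": token})
    return "{}/stock/{}/chart/{}?{}".format(
        IEX_CLOUD_BASE, quote(symbol, safe=""), chart_range, query
    )


def iex_token_from_env(var_name="IEX_TOKEN"):
    """Read the token from an environment variable (the variable name is hypothetical)."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError("Set the {} environment variable first".format(var_name))
    return token
```

In benchmarks.py the request would then become something like requests.get(build_iex_chart_url(symbol, iex_token_from_env())).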

My temporary fix (on zipline-live, which is branched off from 1.1.0):

diff --git a/zipline/data/benchmarks.py b/zipline/data/benchmarks.py
index 45137428..72d9c3cc 100644
--- a/zipline/data/benchmarks.py
+++ b/zipline/data/benchmarks.py
@@ -30,8 +30,10 @@ def get_benchmark_returns(symbol):
     The data is provided by IEX (https://iextrading.com/), and we can
     get up to 5 years worth of data.
     """
+    IEX_TOKEN = 'pk_TOKEN_COMES_HERE'  # FIXME: move to param
     r = requests.get(
-        'https://api.iextrading.com/1.0/stock/{}/chart/5y'.format(symbol)
+        'https://cloud.iexapis.com/stable/stock/{}/chart/5y?token={}'.format(
+            symbol, IEX_TOKEN)
     )
     data = json.loads(r.text)

My take on solution:

  1. Put proper error handling in place. No vital functionality requires benchmark data, and there’s no reason for a full on crash if it can’t be reached. Just print a warning message and move on with the backtest.

  2. A hard-coded benchmark from a hard-coded source makes no sense. Using SPY as the benchmark doesn’t make sense either, in particular since this API call doesn’t seem to take dividends into account. If you really need a benchmark, make the symbol and bundle configurable.

People who are serious enough about backtesting to bother with setting up a local Zipline are not very likely to rely on Yahoo, Quandl, Google or other free sources, and they are very likely to use proper benchmarks instead of the price series of an ETF.
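A minimal sketch of what point 1 could look like, assuming the fetch function is passed in as an argument. The wrapper name safe_benchmark_returns is hypothetical and nothing like it exists in zipline; the caller would also have to cope with a None result.

```python
import warnings


def safe_benchmark_returns(fetch, symbol):
    """Call `fetch(symbol)`; on any failure, warn and return None instead of raising."""
    try:
        return fetch(symbol)
    except Exception as exc:  # broad on purpose: network, HTTP and parsing errors alike
        warnings.warn(
            "Could not load benchmark data for {!r} ({}); "
            "continuing without a benchmark.".format(symbol, exc)
        )
        return None
```

Usage would be along the lines of returns = safe_benchmark_returns(get_benchmark_returns, 'SPY'), with the backtest skipping benchmark-relative metrics when returns is None.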

Here’s a fix that goes back to Yahoo as a benchmark source. Replace this method in benchmarks.py, and don’t forget to change the call to it in loader.py:

import numpy as np
import pandas as pd
import pandas_datareader.data as pd_reader

def get_benchmark_returns(symbol, first_date, last_date):
    """
    Get a Series of benchmark returns from Yahoo associated with `symbol`.
    Default is `SPY`.

    Parameters
    ----------
    symbol : str
        Benchmark symbol for which we're getting the returns.

    The data is provided by Yahoo Finance
    """
    data = pd_reader.DataReader(
        symbol,
        'yahoo',
        first_date,
        last_date
    )

    data = data['Close']

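    # Add NaN rows for these three dates; label assignment appends them
    # at the end of the Series.
    data[pd.Timestamp('2008-12-15')] = np.nan
    data[pd.Timestamp('2009-08-11')] = np.nan
    data[pd.Timestamp('2012-02-02')] = np.nan

    # Sort before forward filling so each added row takes the previous
    # session's close rather than the last close in the Series.
    data = data.sort_index().ffill()

    return data.tz_localize('UTC').pct_change(1).iloc[1:]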
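One detail worth checking in a function like this is the order of the forward fill and the sort. Assigning to a date that is not yet in the index appends the row at the end, so filling before sorting copies the last close into the gap. A toy example with made-up prices (not real Yahoo data):

```python
import numpy as np
import pandas as pd

# Made-up closes with a missing session on 2020-01-02.
closes = pd.Series(
    [100.0, 102.0, 104.0],
    index=pd.DatetimeIndex(['2020-01-01', '2020-01-03', '2020-01-06']),
)

# Assigning to a label that is not in the index appends a new row at the end.
closes[pd.Timestamp('2020-01-02')] = np.nan

# Filling first copies the last row's value (104.0) into the appended row.
fill_then_sort = closes.ffill().sort_index()

# Sorting first lets the gap take the previous session's close (100.0).
sort_then_fill = closes.sort_index().ffill()
```

With sort_then_fill the return on the filled day is zero, which is presumably what a placeholder row is meant to produce.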

@zipper-123 Perhaps you have a faulty cached file. Remember that ensure_benchmark_data in loader.py first attempts to read from disk. That happened to me while I was tinkering with a solution.

I ended up implementing a minor variation of the solution suggested by @marketneutral to sidestep the issue until it’s properly fixed. I kept the signature of get_benchmark_returns, and just used a wider date range than I’ll ever need.

In benchmarks.py:

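from datetime import datetime

import pandas as pd
from trading_calendars import get_calendar

def get_benchmark_returns(symbol):
    cal = get_calendar('NYSE')
    # Date range deliberately wider than any backtest will need.
    first_date = datetime(1930, 1, 1)
    last_date = datetime(2030, 1, 1)
    dates = cal.sessions_in_range(first_date, last_date)
    # All-zero dummy benchmark series.
    data = pd.DataFrame(0.0, index=dates, columns=['close'])
    data = data['close']
    return data.sort_index().iloc[1:]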

And bypass the cache in loader.py by wrapping the early return in a string literal, which effectively comments it out:

    """
    if data is not None:
        return data
    """

Zipline will have to move its data source for SPY somewhere other than IEX, or change the setup to require an API key. Another way to fix it:

  • sign up here: https://iexcloud.io/cloud-login#/register
  • find where your benchmarks.py file is: in ipython, run import zipline, then zipline.__file__
  • Open the zipline lib in an IDE like Atom
  • Add your token from the IEX dashboard (instead of pk_numbers... below), and change the requests line to:
    token = 'pk_numbersnumbersnumbers'
    r = requests.get(
        'https://cloud.iexapis.com/stable/stock/{}/chart/5y?token={}'.format(symbol, token)
    )
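The "find where your benchmarks.py file is" step can also be done with a few lines of standard-library Python instead of an interactive session. The helper name locate_module_file is my own; the module path zipline.data.benchmarks follows the file path mentioned earlier in this thread.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the path of the file behind `module_name`, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. zipline itself) is not installed.
        return None
    return spec.origin if spec is not None else None
```

Calling locate_module_file('zipline.data.benchmarks') should give the file to edit, or None if zipline is not installed in the active environment.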

I guess it’s the nature of the beast with financial data that it’s hard to find for free…but annoying.

Yes… mine is not a temporary fix. It’s a solution. I’ve been working with it for more than a month.

The temporary fix quoted above (the zipline-live diff that adds IEX_TOKEN to the request in benchmarks.py) resolved it for me, though I also had to add import json.

Any chance this will be in the official release?