get_all_tickers: ParserError: Error tokenizing data. C error: Expected 1 fields in line 23, saw 46

When I try:

from get_all_tickers import get_tickers as gt

tickers = gt.get_tickers()

I get an error:


tickers = gt.get_tickers(NASDAQ=False)
---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
c:\Users\Mislav\Documents\GitHub\stocksee\stocksee\ib_market_data.py in 
----> 36 tickers = gt.get_tickers(NASDAQ=False)

C:\ProgramData\Anaconda3\lib\site-packages\get_all_tickers\get_tickers.py in get_tickers(NYSE, NASDAQ, AMEX)
     71     tickers_list = []
     72     if NYSE:
---> 73         tickers_list.extend(__exchange2list('nyse'))
     74     if NASDAQ:
     75         tickers_list.extend(__exchange2list('nasdaq'))

C:\ProgramData\Anaconda3\lib\site-packages\get_all_tickers\get_tickers.py in __exchange2list(exchange)
    136 
    137 def __exchange2list(exchange):
--> 138     df = __exchange2df(exchange)
    139     # removes weird tickers
    140     df_filtered = df[~df['Symbol'].str.contains("\.|\^")]

C:\ProgramData\Anaconda3\lib\site-packages\get_all_tickers\get_tickers.py in __exchange2df(exchange)
    132     response = requests.get('https://old.nasdaq.com/screening/companies-by-name.aspx', headers=headers, params=params(exchange))
    133     data = io.StringIO(response.text)
--> 134     df = pd.read_csv(data, sep=",")
    135     return df
    136 

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    674         )
    675 
--> 676         return _read(filepath_or_buffer, kwds)
    677 
    678     parser_f.__name__ = name

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    452 
    453     try:
--> 454         data = parser.read(nrows)
    455     finally:
    456         parser.close()

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in read(self, nrows)
   1131     def read(self, nrows=None):
   1132         nrows = _validate_integer("nrows", nrows)
-> 1133         ret = self._engine.read(nrows)
   1134 
   1135         # May alter columns / col_dict

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in read(self, nrows)
   2035     def read(self, nrows=None):
   2036         try:
-> 2037             data = self._reader.read(nrows)
   2038         except StopIteration:
   2039             if self._first_chunk:

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 23, saw 46

About this issue

Original URL
State: open
Created 3 years ago
Reactions: 5
Comments: 22

Commits related to this issue

Make `get_tickers_fixed.py` distinguish ecxchanges The supposed 'sane' module taken from this issue [1] is broken. It always returns all tickers even if only a single exchange is selected. This comm... — committed to SuperstonkQuants/q4_majorshortsqueezes by JyzayaEU 3 years ago
Make `get_tickers_fixed.py` distinguish ecxchanges The supposed 'sane' module taken from this issue [1] is broken. It always returns all tickers even if only a single exchange is selected. This comm... — committed to SuperstonkQuants/q4_majorshortsqueezes by JyzayaEU 3 years ago

Most upvoted comments

Nasdaq API got updated, so the old URL is no longer available I believe.

The following is a quick implementation of the new API.

import requests
import pandas as pd

headers = {
    'authority': 'api.nasdaq.com',
    'accept': 'application/json, text/plain, */*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'origin': 'https://www.nasdaq.com',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nasdaq.com/',
    'accept-language': 'en-US,en;q=0.9',
}

params = (
    ('tableonly', 'true'),
    ('limit', '25'),
    ('offset', '0'),
    ('download', 'true'),
)

r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)
data = r.json()['data']
df = pd.DataFrame(data['rows'], columns=data['headers'])

Possums on Feb 13, 2021

I made a quick&dirty fix in get_tickers.py. Filtering by mktcap_min, mktcap_max and sectors works for me. I didn’t test regions. Github doesn’t allow me to upload a .py file, so you need to remove the ‘.txt’ ending of this one and replace the corresponding file in the package. get_tickers.py.txt Thanks to @Possums for the basics!

ErlerPhilipp on Jan 26, 2021

hey @here, as it doesn’t look like the author cares much about this. I’d suggest that somebody that does, and yet doesn’t want to just use the script provided above, create their own repo/fork this with the appropriate modifications.

There’s no point in creating PRs if the author is MIA.

arainboldt on Apr 12, 2021

Copy and paste this code to get_tickers.py in your library by searching in your PC where it is. Cheers.

import pandas as pd
from enum import Enum
import io
import requests

_EXCHANGE_LIST = ['nyse', 'nasdaq', 'amex']

_SECTORS_LIST = set(['Consumer Non-Durables', 'Capital Goods', 'Health Care',
       'Energy', 'Technology', 'Basic Industries', 'Finance',
       'Consumer Services', 'Public Utilities', 'Miscellaneous',
       'Consumer Durables', 'Transportation'])


headers = {
    'authority': 'api.nasdaq.com',
    'accept': 'application/json, text/plain, */*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'origin': 'https://www.nasdaq.com',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nasdaq.com/',
    'accept-language': 'en-US,en;q=0.9',
}

def params(exchange):
    return (
        ('letter', '0'),
        ('exchange', exchange),
        ('render', 'download'),
    )

params = (
    ('tableonly', 'true'),
    ('limit', '25'),
    ('offset', '0'),
    ('download', 'true'),
)

def params_region(region):
    return (
        ('letter', '0'),
        ('region', region),
        ('render', 'download'),
    )


class Region(Enum):
    AFRICA = 'AFRICA'
    EUROPE = 'EUROPE'
    ASIA = 'ASIA'
    AUSTRALIA_SOUTH_PACIFIC = 'AUSTRALIA+AND+SOUTH+PACIFIC'
    CARIBBEAN = 'CARIBBEAN'
    SOUTH_AMERICA = 'SOUTH+AMERICA'
    MIDDLE_EAST = 'MIDDLE+EAST'
    NORTH_AMERICA = 'NORTH+AMERICA'

class SectorConstants:
    NON_DURABLE_GOODS = 'Consumer Non-Durables'
    CAPITAL_GOODS = 'Capital Goods'
    HEALTH_CARE = 'Health Care'
    ENERGY = 'Energy'
    TECH = 'Technology'
    BASICS = 'Basic Industries'
    FINANCE = 'Finance'
    SERVICES = 'Consumer Services'
    UTILITIES = 'Public Utilities'
    DURABLE_GOODS = 'Consumer Durables'
    TRANSPORT = 'Transportation'


def get_tickers(NYSE=True, NASDAQ=True, AMEX=True):
    tickers_list = []
    if NYSE:
        tickers_list.extend(__exchange2list('nyse'))
    if NASDAQ:
        tickers_list.extend(__exchange2list('nasdaq'))
    if AMEX:
        tickers_list.extend(__exchange2list('amex'))
    return tickers_list


def get_tickers_filtered(mktcap_min=None, mktcap_max=None, sectors=None):
    tickers_list = []
    for exchange in _EXCHANGE_LIST:
        tickers_list.extend(__exchange2list_filtered(exchange, mktcap_min=mktcap_min, mktcap_max=mktcap_max, sectors=sectors))
    return tickers_list


def get_biggest_n_tickers(top_n, sectors=None):
    df = pd.DataFrame()
    for exchange in _EXCHANGE_LIST:
        temp = __exchange2df(exchange)
        df = pd.concat([df, temp])
        
    df = df.dropna(subset={'marketCap'})
    df = df[~df['Symbol'].str.contains("\.|\^")]

    if sectors is not None:
        if isinstance(sectors, str):
            sectors = [sectors]
        if not _SECTORS_LIST.issuperset(set(sectors)):
            raise ValueError('Some sectors included are invalid')
        sector_filter = df['Sector'].apply(lambda x: x in sectors)
        df = df[sector_filter]

    def cust_filter(mkt_cap):
        if 'M' in mkt_cap:
            return float(mkt_cap[1:-1])
        elif 'B' in mkt_cap:
            return float(mkt_cap[1:-1]) * 1000
        else:
            return float(mkt_cap[1:]) / 1e6
    df['marketCap'] = df['marketCap'].apply(cust_filter)

    df = df.sort_values('marketCap', ascending=False)
    if top_n > len(df):
        raise ValueError('Not enough companies, please specify a smaller top_n')

    return df.iloc[:top_n]['Symbol'].tolist()


def get_tickers_by_region(region):
    if region in Region:
        response = requests.get('https://old.nasdaq.com/screening/companies-by-name.aspx', headers=headers,
                                params=params_region(region))
        data = io.StringIO(response.text)
        df = pd.read_csv(data, sep=",")
        return __exchange2list(df)
    else:
        raise ValueError('Please enter a valid region (use a Region.REGION as the argument, e.g. Region.AFRICA)')

def __exchange2df(exchange):
    r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)
    data = r.json()['data']
    df = pd.DataFrame(data['rows'], columns=data['headers'])
    return df

def __exchange2list(exchange):
    df = __exchange2df(exchange)
    df_filtered = df[~df['symbol'].str.contains("\.|\^")]
    return df_filtered['symbol'].tolist()

def __exchange2list_filtered(exchange, mktcap_min=None, mktcap_max=None, sectors=None):
    df = __exchange2df(exchange)
    df = df.dropna(subset={'marketCap'})
    df = df[~df['symbol'].str.contains("\.|\^")]

    if sectors is not None:
        if isinstance(sectors, str):
            sectors = [sectors]
        if not _SECTORS_LIST.issuperset(set(sectors)):
            raise ValueError('Some sectors included are invalid')
        sector_filter = df['sector'].apply(lambda x: x in sectors)
        df = df[sector_filter]

    def cust_filter(mkt_cap):
        if 'M' in mkt_cap:
            return float(mkt_cap[1:-1])
        elif 'B' in mkt_cap:
            return float(mkt_cap[1:-1]) * 1000
        elif mkt_cap == '':
            return 0.0
        else:
            return float(mkt_cap[1:]) / 1e6
    df['marketCap'] = df['marketCap'].apply(cust_filter)
    if mktcap_min is not None:
        df = df[df['marketCap'] > mktcap_min]
    if mktcap_max is not None:
        df = df[df['marketCap'] < mktcap_max]
    return df['symbol'].tolist()

def save_tickers(NYSE=True, NASDAQ=True, AMEX=True, filename='tickers.csv'):
    tickers2save = get_tickers(NYSE, NASDAQ, AMEX)
    df = pd.DataFrame(tickers2save)
    df.to_csv(filename, header=False, index=False)

def save_tickers_by_region(region, filename='tickers_by_region.csv'):
    tickers2save = get_tickers_by_region(region)
    df = pd.DataFrame(tickers2save)
    df.to_csv(filename, header=False, index=False)


if __name__ == '__main__':

    tickers = get_tickers()
    print(tickers[:5])

    tickers = get_tickers(AMEX=False)

    # default filename is tickers.csv, to specify, add argument filename='yourfilename.csv'
    save_tickers()

    # save tickers from NYSE and AMEX only
    save_tickers(NASDAQ=False)

    # get tickers from Asia
    tickers_asia = get_tickers_by_region(Region.ASIA)
    print(tickers_asia[:5])

    # save tickers from Europe
    save_tickers_by_region(Region.EUROPE, filename='EU_tickers.csv')

    # get tickers filtered by market cap (in millions)
    filtered_tickers = get_tickers_filtered(mktcap_min=500, mktcap_max=2000)
    print(filtered_tickers[:5])

    # not setting max will get stocks with $2000 million market cap and up.
    filtered_tickers = get_tickers_filtered(mktcap_min=2000)
    print(filtered_tickers[:5])

    # get tickers filtered by sector
    filtered_by_sector = get_tickers_filtered(mktcap_min=200e3, sectors=SectorConstants.FINANCE)
    print(filtered_by_sector[:5])

    # get tickers of 5 largest companies by market cap (specify sectors=SECTOR)
    top_5 = get_biggest_n_tickers(5)
    print(top_5)

justege on May 30, 2021

I made a quick&dirty fix in get_tickers.py. Filtering by mktcap_min, mktcap_max and sectors works for me. I didn’t test regions. Github doesn’t allow me to upload a .py file, so you need to remove the ‘.txt’ ending of this one and replace the corresponding file in the package. get_tickers.py.txt Thanks to @Possums for the basics!

Hey, the “Symbol” column name shall be in all lower case “symbol”

JaisinhBhosale9712 on May 1, 2021

Updated the code by DimaDDM to be able to get individual exchanges rather than return all symbols get_tickers_by_region also updated but untested

get_tickers.py.txt

s-kerin on Feb 14, 2021

Nasdaq API got updated, so the old URL is no longer available I believe.

The following is a quick implementation of the new API.

import requests
import pandas as pd

headers = {
    'authority': 'api.nasdaq.com',
    'accept': 'application/json, text/plain, */*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'origin': 'https://www.nasdaq.com',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nasdaq.com/',
    'accept-language': 'en-US,en;q=0.9',
}

params = (
    ('tableonly', 'true'),
    ('limit', '25'),
    ('offset', '0'),
    ('download', 'true'),
)

r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)`
data = r.json()['data']
df = pd.DataFrame(data['rows'], columns=data['headers'])

There is a small typo on the following line:

r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)'

Simply remove the extra quotation mark at the end. This works @Possums

frank-ceballos on Feb 13, 2021

@patroucheva, You can fix the issue by changing ‘Symbol’ to ‘symbol’ on lines 162 and 163

Coombszy on Jan 26, 2021

@ErlerPhilipp Great! The standard way to suggest code changes is to create a pull request, so that might be why you can’t upload a .py file (it’s slightly more work, though).

krikru on Jan 26, 2021

I am seeing this same issue. I really hope it gets fixed soon!

divvision33 on Jan 25, 2021

I am getting the same error. Lets hope it gets fixed. @shilewenuw

sarino5 on Jan 25, 2021