wetterdienst: KeyError: 'station_id' in NoaaGhcnRequest

Describe the bug

When getting data for specific stations around Amsterdam using NoaaGhcnRequest, I get a KeyError: 'station_id', even though stations_object finds the stations. When I run the code from issue 741 (https://github.com/earthobservations/wetterdienst/issues/741), I don’t seem to have a problem.

To Reproduce

from wetterdienst.provider.noaa.ghcn.api import NoaaGhcnRequest, NoaaGhcnParameter
import datetime as dt
import pandas as pd

stations_object = NoaaGhcnRequest(
    parameter=NoaaGhcnParameter.DAILY.TEMPERATURE_AIR_MIN_200,
    start_date=dt.datetime(2010, 1, 1),
    end_date=dt.datetime(2022, 1, 1)
).filter_by_station_id('NLE00101920')

print(stations_object)


def get_data_from_stations_request(
    stations_object: NoaaGhcnRequest,
) -> pd.DataFrame:
    """
    Takes a stations request object and processes its queries

    Args:
        stations_object: request object that holds all required information
        for downloading the data

    Returns:
        DataFrame with the content of all query results

    """
    observation_data = []

    for result in stations_object.values.query():
        observation_data.append(result.df)

    return pd.concat(observation_data)

df = get_data_from_stations_request(stations_object)
print(df)

Expected behavior

The code gets the temperature data.

Desktop (please complete the following information):

  • OS: Windows
  • Python version: 3.8

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

Dear @nhcb ,

all I can say for now is I HATE timezones 😄 it is definitely related to the DST…

Cool, thanks! I appreciate the quick support 😃. I will give it a try.

Dear @nhcb ,

once again thanks for reporting such issues! It’s the beauty of the detail to bring up such errors/hiccups, I will just look into it.

Edit1:

Turns out we cannot localize the given date with “Europe/Amsterdam”, and the following raises NonExistentTimeError:

import pandas as pd
import pytz

df = pd.DataFrame({"date": ["1914-11-08 00:00:00"]})
df.date = pd.to_datetime(df.date).dt.tz_localize(pytz.timezone("Europe/Amsterdam"))

because pd.to_datetime returns timezone-naive timestamps, and tz_localize then runs into the error since the wall-clock time 1914-11-08 00:00:00 doesn’t exist in that timezone. However, we can pass nonexistent="shift_forward" to tz_localize, which then automatically chooses the next valid time (in this case 1914-11-08 01:00:00 +01
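
The workaround described in this comment can be sketched as follows. This is a minimal illustration, not the fix that went into wetterdienst. It assumes a recent pandas and uses a modern DST gap (2021-03-28 02:30 in Amsterdam, chosen here because it does not depend on the installed timezone database) instead of the 1914 date from the report.

```python
import pandas as pd

# On 2021-03-28 clocks in Amsterdam jumped from 02:00 straight to 03:00,
# so the wall-clock time 02:30 never occurred (illustrative date, not from the issue).
df = pd.DataFrame({"date": ["2021-03-28 02:30:00", "2021-03-28 12:00:00"]})
naive = pd.to_datetime(df["date"])  # timezone-naive timestamps

# Default behaviour (nonexistent="raise"): localizing fails on the gap.
# The exact exception class may differ between pandas versions.
try:
    naive.dt.tz_localize("Europe/Amsterdam")
    raised = False
except Exception:
    raised = True

# Workaround: move nonexistent times forward to the first valid instant.
localized = naive.dt.tz_localize("Europe/Amsterdam", nonexistent="shift_forward")
print(localized)
```

Times that do exist are left untouched; only the value inside the gap is shifted. Whether shifting is acceptable depends on how precise the timestamps need to be, since daily values are usually not affected by a one-hour move.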