pandas: BUG: read_csv with date_parser leaves file locked open on failure
Problem description
When read_csv() is called with date_parser and the parser raises an exception, the file is left open and locked. My use case is enforcing a strict datetime format that must include the time zone offset; another thread stated that date_parser is the way to accomplish this. I would like to move a file that fails to load into another directory, but I cannot, because the failure in date_parser keeps the file open until the Python session is terminated.
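A possible workaround, sketched here under assumptions rather than taken from the pandas code base: open the file yourself and hand the open handle to the reader, so that a `with` block closes it even when parsing raises. The helper name `load_or_quarantine`, its `loader` callable, and the `rejected_dir` argument are all hypothetical names introduced for illustration.

```python
import shutil
from pathlib import Path


def load_or_quarantine(path, loader, rejected_dir):
    """Run loader(handle) on the file; move the file to rejected_dir on failure.

    `loader` is a hypothetical callable that accepts an open text handle.
    """
    path = Path(path)
    try:
        # We own the handle, so the with-block closes it even if loader raises.
        with open(path, newline="") as handle:
            return loader(handle)
    except ValueError:
        # The handle is already closed here, so the file can be moved.
        rejected_dir = Path(rejected_dir)
        rejected_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(rejected_dir / path.name))
        raise
```

Since read_csv() accepts an open file object as well as a path, the loader could be something like `lambda fh: pd.read_csv(fh, parse_dates=['datetime'], index_col=['datetime'], date_parser=strict_parser)`. Whether this avoids the lock on every pandas version has not been verified here; the sketch only guarantees that the handle opened by the helper is closed before the move.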
Code Sample, a copy-pastable example if possible
import pandas as pd
import datetime
def strict_parser(dates):
    datetimes = [
        pd.Timestamp(datetime.datetime.strptime(date, '%Y-%m-%d %H:%M:%S%z'), tz='UTC')
        for date in dates
    ]
    return pd.DatetimeIndex(datetimes)
filename = 'c:/temp/data.csv'
data = pd.read_csv(filename, parse_dates=['datetime'], index_col=['datetime'], date_parser=strict_parser)
The file data.csv is:
datetime,data
2010-05-05 09:30:00-0500,10
2010-05-05 09:35:00-0500,20
2010-05-05 09:40:00,30
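To make the failure concrete, here is a minimal standard-library sketch that checks the same strict format before pandas ever touches the file. The function name `find_bad_datetimes` and the constant `STRICT_FORMAT` are hypothetical; the format string is the one used in strict_parser above.

```python
import csv
import datetime
import io

# Same format as strict_parser: the UTC offset (%z) is mandatory.
STRICT_FORMAT = "%Y-%m-%d %H:%M:%S%z"


def find_bad_datetimes(handle, column="datetime"):
    """Return (line_number, value) pairs that do not match STRICT_FORMAT."""
    bad = []
    reader = csv.DictReader(handle)
    for row in reader:
        value = row.get(column) or ""
        try:
            datetime.datetime.strptime(value, STRICT_FORMAT)
        except ValueError:
            # reader.line_num counts lines read so far, header included.
            bad.append((reader.line_num, value))
    return bad


sample = io.StringIO(
    "datetime,data\n"
    "2010-05-05 09:30:00-0500,10\n"
    "2010-05-05 09:35:00-0500,20\n"
    "2010-05-05 09:40:00,30\n"
)
bad_rows = find_bad_datetimes(sample)
print(bad_rows)
```

For the sample data, only the last row is reported, because it has no offset. A check like this could run inside a `with open(...)` block, so the file is closed before any attempt to move it.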
Output of pd.show_versions()
pandas: 0.19.1
About this issue
- State: open
- Created 7 years ago
- Comments: 15 (11 by maintainers)
I don't think I know the pandas code well enough to submit a change that fixes this. Do you want me to just create a blank pull request? (Sorry, I'm new to GitHub and the collaborative workflow and am not sure exactly how it works.)