pyjanitor: Problems with equalities in contional_join
Brief Description
I don’t know if I’m doing this correctly but, I want to apply 2 equalities and 2 inequalities (dates). If I apply one equality it doesn’t return an error but, when I apply both equalities an error is returned.
System Information
- Operating system: Windows
- OS details : 10
- Python version: 3.9.18
Minimally Reproducible Code
import pandas as pd
import janitor
import random
df1 = pd.DataFrame({
'index1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'C', 'C'],
'index2': [1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1],
'date': pd.to_datetime([
'2024-01-01', '2024-01-10', '2024-04-01',
'2024-01-01', '2024-01-10', '2024-04-01',
'2024-01-01', '2024-01-10', '2024-04-01',
'2021-10-01', '2021-11-01'
])
})
df1['value'] = [random.randrange(1, 50, 1) for i in range(df1.shape[0])]
df2 = pd.DataFrame({
'index1': ['A', 'B', 'B', 'C'],
'index2': [1, 1, 2, 1],
'date_min': pd.to_datetime(['2024-01-01', '2024-01-01', '2024-01-01', '2021-10-01']),
'date_max': pd.to_datetime(['2024-01-30', '2024-01-30', '2024-01-30', '2021-10-30']),
})
df1.conditional_join(
df2,
('index1', 'index1', '=='),
('index2', 'index2', '=='),
('date', 'date_min', '>='),
('date', 'date_max', '<='),
)
Error Messages
AttributeError: ‘NoneType’ object has no attribute ‘size’
About this issue
- Original URL
- State: closed
- Created 4 months ago
- Comments: 18 (5 by maintainers)
All done, @samukweku!
Thanks for the help!
Basically, I have a function that completes missing dates on a DataFrame with groups. This is an extract of that code:
As you can see when I merge df_dates with df_group_dates I create a bigger DataFrame that is later filtered. This sometimes causes a memory error when the DataFrame is big.
Now, I can replace:
with:
I haven’t done any test to see if this is more efficient but, at least the code is smaller and more readable.
@dugarte-vox a new version of pyjanitor has been released. It should be fine on pandas > 2. Test and let’s know if there are any issues. By the way, out of curiosity, what is your use case for conditional_join that regular pandas could not solve?
We haven’t migrated to pyproject.yaml yet, need your infra expertise on the PR
Sorry for the late response. I’m currently using pandas 2.2.1. I’ve tried with a lower version of pandas (2.0.3) as you suggested and the example is working.
Likely be a minor release
Ok to do so! I trust your judgment on whether to do a patch or minor 😄.