pandas: GroupBy using TimeGrouper does not work

BUG: TimeGrouper not too friendly with other groups, e.g.

df.set_index('Date').groupby([pd.TimeGrouper('6M'),'Branch']).sum() should work


Hi everybody,

I found two issues with TimeGrouper:

  1. TimeGrouper does not work at all:

Let’s take the following example:

import datetime as DT
import pandas as pd

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        DT.datetime(2013, 1, 1, 13, 0), DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0), DT.datetime(2013, 10, 3, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0), DT.datetime(2013, 12, 2, 14, 0),
    ]})

gr = df.groupby(pd.TimeGrouper(freq='6M'))

def testgr(df):
    print(df)

gr.apply(testgr)

This will raise the Exception: “Exception: All objects passed were None”

  2. With previous pandas versions it was not possible to combine TimeGrouper with another criterion, such as “Branch” in my case.

Thank you very much

Andy

About this issue

  • State: closed
  • Created 11 years ago
  • Comments: 20 (12 by maintainers)

Most upvoted comments

You need to call set_index first, as TimeGrouper operates on the index.

In [15]: df
Out[15]: 
  Branch Buyer                Date  Quantity
0      A  Carl 2013-01-01 13:00:00         1
1      A  Mark 2013-01-01 13:05:00         3
2      A  Carl 2013-10-01 20:00:00         5
3      A   Joe 2013-10-03 10:00:00         8
4      A   Joe 2013-12-02 12:00:00         9
5      B  Carl 2013-12-02 14:00:00         3

In [16]: df.set_index('Date').groupby(pd.TimeGrouper('6M')).sum()
Out[16]: 
            Quantity
2013-01-31         4
2013-07-31       NaN
2014-01-31        25

If you apply a custom function, you need to handle the string columns yourself, but you can return pretty much anything you want (make it a Series) to get this kind of functionality. Your function is passed a slice of the original frame.

In [55]: def testf(df):
   ....:     if (df['Buyer'] == 'Mark').sum() > 0:
   ....:         return pd.Series(dict(quantity = df['Quantity'].sum(), buyer = 'mark'))
   ....:     return pd.Series(dict(quantity = df['Quantity'].sum()*100, buyer = 'other'))
   ....: 

In [56]: df.set_index('Date').groupby(pd.TimeGrouper('6M')).apply(lambda x: x.groupby('Branch').apply(testf))
Out[56]: 
                   buyer quantity
           Branch                
2013-01-31 A        mark        4
2014-01-31 A       other     2200
           B       other      300