How can I calculate averages and min max values in a climate data frame after Creating an Index on Date-time in Pandas

Ethan Jackson

I have a climate data dataframe in CSV format with no headers, so I read the data as follows:

df_MERRA_data = pd.read_csv(fname, delimiter=',', header=None)
df_MERRA_data.columns = \
    ['Date-time','Temperature','Wind speed','Sun shine','Precipitation','Relative Humidity']

Then I create an index:

def extract_YYYYMMDD(Date_val):
    year_val = np.rint(Date_val/1000000).astype(int)
    month_val = np.rint(Date_val/10000).astype(int) - year_val*100
    day_val = np.rint(Date_val/100).astype(int) - year_val*10000 - month_val*100
    hour_val = np.rint(Date_val).astype(int) - year_val*1000000 - month_val*10000 - day_val*100
    return year_val, month_val, day_val, hour_val

year, month, day, hour = extract_YYYYMMDD(df_MERRA_data['Date-time'])
df_MERRA_data['Year'] = year
df_MERRA_data['Month'] = month
df_MERRA_data['Day'] = day
df_MERRA_data['Hour'] = hour

Now make an index based on date and time

df_MERRA_data['Date-time'] = pd.to_datetime(df_MERRA_data[['Year','Month','Day','Hour']], errors='coerce')\
    .dt.strftime('%Y-%m-%d %H')
df_MERRA_data.set_index(['Date-time'], inplace=True)

Then later when I want to extract data on the index I get the error:

df_MERRA_year = df_MERRA_data['Date-time'].dt.year

KeyError: 'Date-time'

These are the top three rows in the dataframe:

df_MERRA_data.head(3)

Out[3]:

               Temperature  Wind speed  Sun shine  ...  Month  Day  Hour
Date-time                                          ...
1985-01-01 00         17.5        14.4         87  ...      1    1     0
1985-01-01 01         17.3        14.4         88  ...      1    1     1
1985-01-01 02         17.1        14.4         88  ...      1    1     2

[3 rows x 9 columns]

Can someone please help me access the records on the index so that I can determine yearly, monthly, weekly and daily averages, maximums, minimums, etc.?

Thank you in advance, I am a Pandas Novice.

I have changed the Date-time column name to Date_time and datetime with no success. I have also tried the code without:

df_MERRA_data.set_index(['Date-time'], inplace=True)

Thank you

Answer

Searching for a solution to your problem, I found this article. It says that you can do the aggregation via resampling, for example:

df.resample('Y').mean()
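
One caveat: resample needs a real DatetimeIndex. In the question, .dt.strftime turns the dates back into plain strings, and after set_index 'Date-time' is no longer a column, which is why df_MERRA_data['Date-time'] raises the KeyError. Here is a minimal sketch of one way around both issues. It assumes the raw values are integers of the form YYYYMMDDHH (as the extract_YYYYMMDD function suggests); the frame df and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for the CSV (values are made up).
df = pd.DataFrame({
    "Date-time": [1985010100, 1985010101, 1985010102, 1986020100],
    "Temperature": [17.5, 17.3, 17.1, 20.0],
})

# Parse the integer timestamps directly and keep real datetimes (no strftime).
df["Date-time"] = pd.to_datetime(df["Date-time"].astype(str), format="%Y%m%d%H")
df = df.set_index("Date-time")

# The dates now live in the index, so use df.index, not df["Date-time"].
years = df.index.year

# Yearly mean, grouping on the year taken from the index.
yearly_mean = df.groupby(df.index.year).mean()
print(yearly_mean)
```

Grouping on df.index.year (or df.index.month, and so on) is an alternative to resample that does not depend on the frequency alias strings.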

You can change the resample frequency, for example:

  • D (daily)
  • W (weekly)
  • M (monthly)
  • Q (quarterly)
  • A (yearly)
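
As a rough illustration of switching frequencies, the sketch below resamples made-up hourly values by day and by week. The frame df and its contents are hypothetical. As far as I know, recent pandas releases (2.2 and later) prefer 'ME', 'QE' and 'YE' over 'M', 'Q' and 'A'/'Y', so the example sticks to 'D' and 'W', which are unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data covering 14 days, starting Tuesday 1985-01-01.
hours = np.arange(14 * 24)
idx = pd.Timestamp("1985-01-01") + pd.to_timedelta(hours, unit="h")
df = pd.DataFrame({"Temperature": (hours % 24).astype(float)}, index=idx)

# Same call, different frequency string.
daily = df.resample("D").mean()    # one row per calendar day
weekly = df.resample("W").mean()   # one row per week (weeks end on Sunday)

print(len(daily), len(weekly))
```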

Here you can read more about aggregation. The function for average is mean, for maximum is max and for minimum is min.
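
If you want several statistics at once, agg accepts a list of function names. A small sketch on made-up hourly data (the frame df and the Temperature values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data covering two days (values are made up).
hours = np.arange(48)
idx = pd.Timestamp("1985-01-01") + pd.to_timedelta(hours, unit="h")
df = pd.DataFrame({"Temperature": hours.astype(float)}, index=idx)

# Mean, max and min per day in a single call.
daily_stats = df.resample("D").agg(["mean", "max", "min"])
print(daily_stats)
```

The result has two column levels (the original column name, then the statistic), so daily_stats["Temperature"]["max"] picks out the daily maximum.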
