I have climate data in CSV format with no headers, so I read the data as follows:

    import numpy as np
    import pandas as pd

    df_MERRA_data = pd.read_csv(fname, delimiter=',', header=None)
    df_MERRA_data.columns = ['Date-time', 'Temperature', 'Wind speed',
                             'Sun shine', 'Precipitation', 'Relative Humidity']

Then I extract the year, month, day and hour from the Date-time values:

    def extract_YYYYMMDD(Date_val):
        year_val = np.rint(Date_val/1000000).astype(int)
        month_val = np.rint(Date_val/10000).astype(int) - year_val*100
        day_val = np.rint(Date_val/100).astype(int) - year_val*10000 - month_val*100
        hour_val = np.rint(Date_val).astype(int) - year_val*1000000 - month_val*10000 - day_val*100
        return year_val, month_val, day_val, hour_val

    year, month, day, hour = extract_YYYYMMDD(df_MERRA_data['Date-time'])
    df_MERRA_data['Year'] = year
    df_MERRA_data['Month'] = month
    df_MERRA_data['Day'] = day
    df_MERRA_data['Hour'] = hour

Now I make an index based on date and time:

    df_MERRA_data['Date-time'] = pd.to_datetime(
        df_MERRA_data[['Year', 'Month', 'Day', 'Hour']], errors='coerce'
    ).dt.strftime('%Y-%m-%d %H')
    df_MERRA_data.set_index(['Date-time'], inplace=True)

*Then later when I want to extract data on the index I get the error:*

    df_MERRA_year = df_MERRA_data['Date-time'].dt.year

    KeyError: 'Date-time'

These are the top three rows in the dataframe:

    df_MERRA_data.head(3)

    Out[3]:

                   Temperature  Wind speed  Sun shine  ...  Month  Day  Hour
    Date-time                                          ...
    1985-01-01 00         17.5        14.4         87  ...      1    1     0
    1985-01-01 01         17.3        14.4         88  ...      1    1     1
    1985-01-01 02         17.1        14.4         88  ...      1    1     2

    [3 rows x 9 columns]

Can someone please help me access the records through the index so that I can determine yearly, monthly, weekly and daily averages, maximums, minimums, etc.?

Thank you in advance, I am a Pandas Novice.

I have tried renaming the Date-time column to Date_time and to datetime, with no success. I have also tried the code without:

    df_MERRA_data.set_index(['Date-time'], inplace=True)

Thank you

Answer

Searching for a solution to your problem I found this article. It says that you can do the aggregation via resampling, for example:

    df.resample('Y').mean()

You can change the resample frequency, for example:

D (daily)
W (weekly)
M (monthly)
Q (quarterly)
A (yearly)
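
One caveat before trying these frequencies: `resample` only works on a real datetime index. In the question, `.dt.strftime(...)` turns the dates into strings, and `set_index` then moves `Date-time` out of the columns, which is why `df_MERRA_data['Date-time']` raises `KeyError`. Here is a minimal sketch of one way around both issues, assuming the raw stamps are integers of the form YYYYMMDDHH (the sample values and the short name `df` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the MERRA file: 14 days of hourly YYYYMMDDHH stamps.
hours = pd.Timestamp("1985-01-01") + pd.to_timedelta(np.arange(24 * 14), unit="h")
df = pd.DataFrame({
    "Date-time": hours.strftime("%Y%m%d%H").astype("int64"),
    "Temperature": np.arange(24 * 14, dtype=float),
})

# Parse the stamps straight into datetimes and keep them as datetimes (no strftime).
df["Date-time"] = pd.to_datetime(df["Date-time"].astype(str), format="%Y%m%d%H")
df = df.set_index("Date-time")

# The dates now live in the index, so read them from df.index, not df["Date-time"].
years = df.index.year

# With a DatetimeIndex in place, resampling works at any of the frequencies above.
daily_mean = df.resample("D").mean()
weekly_mean = df.resample("W").mean()
print(daily_mean.head())
```

The spelling of the monthly, quarterly and yearly aliases has changed between pandas releases (pandas 2.2 introduced `ME`, `QE` and `YE` in place of `M`, `Q` and `Y`/`A`), so it is worth checking which one your installed version expects.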

Here you can read more about aggregation. The function for average is mean, for maximum is max and for minimum is min.
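
Several of these functions can be applied in one pass with `agg`. The sketch below uses made-up temperatures and a hypothetical frame `df` that already has a datetime index. The yearly figures are computed by grouping on `df.index.year`, an alternative to `resample` swapped in here because it does not depend on a frequency alias.

```python
import pandas as pd

# Made-up sample: four readings spread over two days and two years.
idx = pd.to_datetime(
    ["1985-01-01 00", "1985-01-01 12", "1985-01-02 00", "1986-01-01 00"],
    format="%Y-%m-%d %H",
)
df = pd.DataFrame({"Temperature": [10.0, 20.0, 30.0, 40.0]}, index=idx)

# Daily mean, max and min in one call; days without readings come out as NaN.
daily_stats = df["Temperature"].resample("D").agg(["mean", "max", "min"])

# Yearly statistics by grouping on the year taken from the index.
yearly_stats = df["Temperature"].groupby(df.index.year).agg(["mean", "max", "min"])
print(yearly_stats)
```

Each result is a DataFrame with one column per function, so a single value can be read with, for example, `yearly_stats.loc[1985, "max"]`.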

