Forecasting Across Grouped Time Series

Objective

After completing this lesson, you will be able to implement forecasting logic across grouped time series using SAP HANA PAL.

Forecasting Multiple Time Series

In real-world forecasting scenarios, it's common to generate separate forecasts for multiple categories, such as countries, stores, or product lines. This lesson walks through how to scale time series forecasting across many such groups using the group_key parameter in SAP HANA PAL and custom looping logic.

Training the Model with the AdditiveModelForecast Algorithm

The Python machine learning client for SAP HANA (hana-ml) exposes all SAP HANA Predictive Analysis Library (PAL) functions as well as the Automated Predictive Library (APL) functions in Python for use based on SAP HANA DataFrames input data.

Next, we train individual models using the same AdditiveModelForecast algorithm [1] - as we did for the univariate time series model scenario.

However, we now specify the country of residence as group key, which delivers individual models for each country.

Code Snippet
12
amf = AdditiveModelForecast(massive=True) amf.fit(data=hdf_overnightstays_agg, group_key='COUNTRYOFRESIDENCE')

<hana_ml.algorithms.pal.tsa.additive_model_forecast.AdditiveModelForecast at 0x1d9c79d7020>

Time Series Forecast - Individually by Country

We create an SAP HANA DataFrame that contains the dates/months for which we want to create a forecast.

Next, we begin by identifying the most recent month from the training history.

Code Snippet
1234
str_lastdate = hdf_overnightstays_agg.tail(1, ref_col='MONTH').collect().iloc[0,0] str_lastdate = str(str_lastdate)[0:10] print(str_lastdate)

2024-05-01

Now, we create an SAP HANA DataFrame that contains the first day of the desired months to forecast.

Code Snippet
123456
months_to_forecast=12 hdf_overnightstays_future = binomial(conn, n=1, p=1, num_random=months_to_forecast) hdf_overnightstays_future = hdf_overnightstays_future.select('*', (f'''ADD_MONTHS(TO_DATE ('{str_lastdate}', 'YYYY-MM-DD'), ID+1)''', 'MONTH') ) hdf_overnightstays_future = hdf_overnightstays_future.select('MONTH', ('0', 'TARGET')) hdf_overnightstays_future.head(10).collect()
 MONTHTARGET
02024-06-010
12024-07-010
22024-08-010
32024-09-010
42024-10-010
52024-11-010
62024-12-010
72025-01-010
82025-02-010
92025-03-010

Then, we create an SAP HANA DataFrame that contains the countries for which forecasts should be delivered.

Code Snippet
123
hdf_overnightstays_countries = hdf_overnightstays.agg([('sum', 'OVERNIGHTSTAYS', 'TOTAL' )], group_by='COUNTRYOFRESIDENCE' ) \ .sort('TOTAL', desc=True).select('COUNTRYOFRESIDENCE').head(10) hdf_overnightstays_countries.collect()
 COUNTRYOFRESIDENCE
0Switzerland
1Germany
2United States
3United Kingdom
4France
5Italy
6Netherlands
7Belgium
8India
9Spain

Following, we combine the above two SAP HANA DataFrames to create a new DataFrame that contains the future dates/months for each country.

Code Snippet
123
hdf_overnightstays_topredict = hdf_overnightstays_future.add_id().set_index('ID').join(hdf_overnightstays_countries.add_id('ID').set_index('ID'), how='cross' ) hdf_overnightstays_topredict = hdf_overnightstays_topredict.drop('ID') hdf_overnightstays_topredict.head(10).collect()
 MONTHTARGETCOUNTRYOFRESIDENCE
02024-06-010Switzerland
12024-06-010Germany
22024-06-010United States
32024-06-010United Kingdom
42024-06-010France
52024-06-010Italy
62024-06-010Netherlands
72024-06-010Belgium
82024-06-010India
92024-06-010Spain

Applying the Model to Generate Forecasts

Next, we score the trained time series model to create forecasts for the individual countries for the specified dates.

Code Snippet
12
hdf_overnightstays_pred = amf.predict(data=hdf_overnightstays_topredict, group_key='COUNTRYOFRESIDENCE') hdf_overnightstays_predicted = hdf_overnightstays_pred[0]

We are able to look into the details of the components of the forecast, that is, trend, seasonality, as shown below:

Code Snippet
1
hdf_overnightstays_pred[1].collect()
 GROUP IDMONTHTRENDSEASONALHOLIDAYEXOGENOUS
0Belgium2024-06-0146941.398949{"seasonalities":3149.2922197280824}{}{}
1Belgium2024-07-0146645.399501{"seasonalities":53969.69442748901}{}{}
2Belgium2024-08-0146339.533404{"seasonalities":47670.95477129075}{}{}
3Belgium2024-09-0146033.667307{"seasonalities":-10867.900939089028}{}{}
4Belgium2024-10-0145737.667859{"seasonalities":-35024.826999824596}{}{}
.....................
115United States2025-01-01256174.088611{"seasonalities":-105260.40367463966}{}{}
116United States2025-02-01255710.827254{"seasonalities":-107113.21705733694}{}{}
117United States2025-03-01255292.397641{"seasonalities":-63771.189585235305}{}{}
118United States2025-04-01254829.136284{"seasonalities":-74262.95570659888}{}{}
119United States2025-05-01254380.818842{"seasonalities":35157.57804977409}{}{}

120 rows × 6 columns

Assessing Accuracy

We check whether there are any errors as follows:

Code Snippet
1
hdf_overnightstays_pred[2].collect()

GROUP_ID ERROR_TIMESTAMP ERRORCODE MESSAGE

We plot the forecast specifically for Germany as seen below:

Code Snippet
12
forecast_line_plot(pred_data=hdf_overnightstays_predicted.filter(''' "GROUP_ID" = 'Germany' ''').drop('GROUP_ID').set_index('MONTH'), confidence=("YHAT_LOWER", "YHAT_UPPER"), enable_plotly=True)
Forecast for Germany plotted in a line graph

Now, we plot the forecast for the United States as follows:

Code Snippet
12
forecast_line_plot(pred_data=hdf_overnightstays_predicted.filter(''' "GROUP_ID" = 'United States' ''').drop('GROUP_ID').set_index('MONTH'), confidence=("YHAT_LOWER", "YHAT_UPPER"), enable_plotly=True)
Forecast for USA plotted in a line graph

We can combine the historical dataset used for training the model, and the forecasted values into a single SAP HANA DataFrame as shown below:

Code Snippet
123456789
hdf_overnightstays_predicted = hdf_overnightstays_predicted.select( 'MONTH', ('GROUP_ID', 'COUNTRYOFRESIDENCE'), ('NULL', 'OVERNIGHTSTAYS_SUM'), ('YHAT', 'FORECAST'), ('YHAT_LOWER', 'FORECAST_LOWER'), ('YHAT_UPPER', 'FORECAST_UPPER') ) hdf_overnightstays_predicted.head(10).collect()
 MONTHCOUNTRYOFRESIDENCEOVERNIGHTSTAYS_SUMFORECASTFORECAST_LOWERFORECAST_UPPER
02024-06-01BelgiumNone50090.69116947179.35458952902.229641
12024-07-01BelgiumNone100615.09392897772.912096103203.241535
22024-08-01BelgiumNone94010.48817591129.24329696876.143447
32024-09-01BelgiumNone35165.76636832295.30331338081.503388
42024-10-01BelgiumNone10712.8408597792.91422413545.382103
52024-11-01BelgiumNone15575.70697312869.71051118361.714953
62024-12-01BelgiumNone43761.77340240908.29150946512.689291
72025-01-01BelgiumNone22965.28217820305.02596125710.640517
82025-02-01BelgiumNone36852.83205333948.72528639735.768094
92025-03-01BelgiumNone42565.99072739522.28448745665.217573
Code Snippet
123456
hdf_overnightstays_agg = hdf_overnightstays_agg.select('*', ('NULL', 'FORECAST'), ('NULL', 'FORECAST_LOWER'), ('NULL', 'FORECAST_UPPER') ) hdf_overnightstays_all = hdf_overnightstays_predicted.union(hdf_overnightstays_agg) hdf_overnightstays_all.sort('MONTH').tail(20).collect()
 MONTHCOUNTRYOFRESIDENCEOVERNIGHTSTAYS_SUMFORECASTFORECAST_LOWERFORECAST_UPPER
02025-04-01ItalyNone7.301358e+046.667947e+047.887995e+04
12025-04-01United StatesNone1.805662e+051.680320e+051.921834e+05
22025-04-01United KingdomNone9.974975e+049.186932e+041.070457e+05
32025-04-01BelgiumNone2.574547e+042.259784e+042.862533e+04
42025-04-01SwitzerlandNone1.378666e+061.322983e+061.430933e+06
52025-04-01SpainNone3.213217e+042.940963e+043.476349e+04
62025-04-01IndiaNone6.515502e+045.293711e+047.653153e+04
72025-04-01GermanyNone2.407886e+052.251664e+052.552813e+05
82025-04-01FranceNone1.020115e+059.917646e+041.046622e+05
92025-04-01NetherlandsNone3.838326e+043.441906e+044.207428e+04
102025-05-01United StatesNone2.895384e+052.755186e+053.024338e+05
112025-05-01ItalyNone6.399058e+045.780458e+046.952312e+04
122025-05-01United KingdomNone1.080254e+059.963275e+041.155868e+05
132025-05-01SwitzerlandNone1.502996e+061.445847e+061.553558e+06
142025-05-01IndiaNone1.180078e+051.052617e+051.293825e+05
152025-05-01FranceNone1.204332e+051.173433e+051.233076e+05
162025-05-01SpainNone3.689542e+043.382324e+043.987497e+04
172025-05-01NetherlandsNone5.181526e+044.768336e+045.550257e+04
182025-05-01GermanyNone2.789746e+052.621818e+052.946275e+05
192025-05-01BelgiumNone1.842008e+041.533416e+042.117920e+04

Optionally, we can save the merged dataset to a table in SAP HANA Cloud. For instance, SAP Analytics Cloud could access such data.

Code Snippet
1
hdf_overnightstays_all.save('OVERNIGHTSTAYS_FORECAST_COUNTRIES', force=True)

<hana_ml.dataframe.DataFrame at 0x1d9c79d69f0>

Finally, we plot the entire dataset, i.e., the historical values and predictions, for a single country - in this case for Germany.

Code Snippet
1
hdf_overnightstays_data = hdf_overnightstays_all.filter('''"COUNTRYOFRESIDENCE" = 'Germany' ''').collect()
Code Snippet
12345678910111213
fig, ax = plt.subplots() ax.fill_between(hdf_overnightstays_data['MONTH'].values, hdf_overnightstays_data['FORECAST_LOWER'].values, hdf_overnightstays_data['FORECAST_UPPER'].values, alpha=0.2) ax.plot(hdf_overnightstays_data['MONTH'].values, hdf_overnightstays_data['OVERNIGHTSTAYS_SUM'].values, '-') plt.xticks(rotation=45) plt.xlabel("MONTH") plt.ylabel("OVERNIGHTSTAYS") plt.grid() plt.show()
A graph showing the entire dataset of historical and forecasted valued for a single country, Germany

References