In real-world forecasting scenarios, it's common to generate separate forecasts for multiple categories, such as countries, stores, or product lines. This lesson walks through how to scale time series forecasting across many such groups using the group_key parameter in SAP HANA PAL and custom looping logic.
Training the Model with the AdditiveModelForecast Algorithm
The Python machine learning client for SAP HANA (hana-ml) exposes all SAP HANA Predictive Analysis Library (PAL) functions as well as the Automated Predictive Library (APL) functions in Python for use based on SAP HANA DataFrames input data.
Next, we train individual models using the same AdditiveModelForecast algorithm [1] - as we did for the univariate time series model scenario.
However, we now specify the country of residence as group key, which delivers individual models for each country.
12amf = AdditiveModelForecast(massive=True)
amf.fit(data=hdf_overnightstays_agg, group_key='COUNTRYOFRESIDENCE')<hana_ml.algorithms.pal.tsa.additive_model_forecast.AdditiveModelForecast at 0x1d9c79d7020>
Time Series Forecast - Individually by Country
We create an SAP HANA DataFrame that contains the dates/months for which we want to create a forecast.
Next, we begin by identifying the most recent month from the training history.
1234str_lastdate = hdf_overnightstays_agg.tail(1, ref_col='MONTH').collect().iloc[0,0]
str_lastdate = str(str_lastdate)[0:10]
print(str_lastdate)2024-05-01
Now, we create an SAP HANA DataFrame that contains the first day of the desired months to forecast.
123456months_to_forecast=12
hdf_overnightstays_future = binomial(conn, n=1, p=1, num_random=months_to_forecast)
hdf_overnightstays_future = hdf_overnightstays_future.select('*', (f'''ADD_MONTHS(TO_DATE ('{str_lastdate}', 'YYYY-MM-DD'), ID+1)''', 'MONTH') )
hdf_overnightstays_future = hdf_overnightstays_future.select('MONTH', ('0', 'TARGET'))
hdf_overnightstays_future.head(10).collect()| MONTH | TARGET | |
|---|---|---|
| 0 | 2024-06-01 | 0 |
| 1 | 2024-07-01 | 0 |
| 2 | 2024-08-01 | 0 |
| 3 | 2024-09-01 | 0 |
| 4 | 2024-10-01 | 0 |
| 5 | 2024-11-01 | 0 |
| 6 | 2024-12-01 | 0 |
| 7 | 2025-01-01 | 0 |
| 8 | 2025-02-01 | 0 |
| 9 | 2025-03-01 | 0 |
Then, we create an SAP HANA DataFrame that contains the countries for which forecasts should be delivered.
123hdf_overnightstays_countries = hdf_overnightstays.agg([('sum', 'OVERNIGHTSTAYS', 'TOTAL' )], group_by='COUNTRYOFRESIDENCE' ) \
.sort('TOTAL', desc=True).select('COUNTRYOFRESIDENCE').head(10)
hdf_overnightstays_countries.collect()| COUNTRYOFRESIDENCE | |
|---|---|
| 0 | Switzerland |
| 1 | Germany |
| 2 | United States |
| 3 | United Kingdom |
| 4 | France |
| 5 | Italy |
| 6 | Netherlands |
| 7 | Belgium |
| 8 | India |
| 9 | Spain |
Following, we combine the above two SAP HANA DataFrames to create a new DataFrame that contains the future dates/months for each country.
123hdf_overnightstays_topredict = hdf_overnightstays_future.add_id().set_index('ID').join(hdf_overnightstays_countries.add_id('ID').set_index('ID'), how='cross' )
hdf_overnightstays_topredict = hdf_overnightstays_topredict.drop('ID')
hdf_overnightstays_topredict.head(10).collect()| MONTH | TARGET | COUNTRYOFRESIDENCE | |
|---|---|---|---|
| 0 | 2024-06-01 | 0 | Switzerland |
| 1 | 2024-06-01 | 0 | Germany |
| 2 | 2024-06-01 | 0 | United States |
| 3 | 2024-06-01 | 0 | United Kingdom |
| 4 | 2024-06-01 | 0 | France |
| 5 | 2024-06-01 | 0 | Italy |
| 6 | 2024-06-01 | 0 | Netherlands |
| 7 | 2024-06-01 | 0 | Belgium |
| 8 | 2024-06-01 | 0 | India |
| 9 | 2024-06-01 | 0 | Spain |
Applying the Model to Generate Forecasts
Next, we score the trained time series model to create forecasts for the individual countries for the specified dates.
12hdf_overnightstays_pred = amf.predict(data=hdf_overnightstays_topredict, group_key='COUNTRYOFRESIDENCE')
hdf_overnightstays_predicted = hdf_overnightstays_pred[0]We are able to look into the details of the components of the forecast, that is, trend, seasonality, as shown below:
1hdf_overnightstays_pred[1].collect()| GROUP ID | MONTH | TREND | SEASONAL | HOLIDAY | EXOGENOUS | |
|---|---|---|---|---|---|---|
| 0 | Belgium | 2024-06-01 | 46941.398949 | {"seasonalities":3149.2922197280824} | {} | {} |
| 1 | Belgium | 2024-07-01 | 46645.399501 | {"seasonalities":53969.69442748901} | {} | {} |
| 2 | Belgium | 2024-08-01 | 46339.533404 | {"seasonalities":47670.95477129075} | {} | {} |
| 3 | Belgium | 2024-09-01 | 46033.667307 | {"seasonalities":-10867.900939089028} | {} | {} |
| 4 | Belgium | 2024-10-01 | 45737.667859 | {"seasonalities":-35024.826999824596} | {} | {} |
| ... | ... | ... | ... | ... | ... | ... |
| 115 | United States | 2025-01-01 | 256174.088611 | {"seasonalities":-105260.40367463966} | {} | {} |
| 116 | United States | 2025-02-01 | 255710.827254 | {"seasonalities":-107113.21705733694} | {} | {} |
| 117 | United States | 2025-03-01 | 255292.397641 | {"seasonalities":-63771.189585235305} | {} | {} |
| 118 | United States | 2025-04-01 | 254829.136284 | {"seasonalities":-74262.95570659888} | {} | {} |
| 119 | United States | 2025-05-01 | 254380.818842 | {"seasonalities":35157.57804977409} | {} | {} |
120 rows × 6 columns
Assessing Accuracy
We check whether there are any errors as follows:
1hdf_overnightstays_pred[2].collect()GROUP_ID ERROR_TIMESTAMP ERRORCODE MESSAGE
We plot the forecast specifically for Germany as seen below:
12forecast_line_plot(pred_data=hdf_overnightstays_predicted.filter(''' "GROUP_ID" = 'Germany' ''').drop('GROUP_ID').set_index('MONTH'),
confidence=("YHAT_LOWER", "YHAT_UPPER"), enable_plotly=True)
Now, we plot the forecast for the United States as follows:
12forecast_line_plot(pred_data=hdf_overnightstays_predicted.filter(''' "GROUP_ID" = 'United States' ''').drop('GROUP_ID').set_index('MONTH'),
confidence=("YHAT_LOWER", "YHAT_UPPER"), enable_plotly=True)
We can combine the historical dataset used for training the model, and the forecasted values into a single SAP HANA DataFrame as shown below:
123456789hdf_overnightstays_predicted = hdf_overnightstays_predicted.select(
'MONTH',
('GROUP_ID', 'COUNTRYOFRESIDENCE'),
('NULL', 'OVERNIGHTSTAYS_SUM'),
('YHAT', 'FORECAST'),
('YHAT_LOWER', 'FORECAST_LOWER'),
('YHAT_UPPER', 'FORECAST_UPPER')
)
hdf_overnightstays_predicted.head(10).collect()| MONTH | COUNTRYOFRESIDENCE | OVERNIGHTSTAYS_SUM | FORECAST | FORECAST_LOWER | FORECAST_UPPER | |
|---|---|---|---|---|---|---|
| 0 | 2024-06-01 | Belgium | None | 50090.691169 | 47179.354589 | 52902.229641 |
| 1 | 2024-07-01 | Belgium | None | 100615.093928 | 97772.912096 | 103203.241535 |
| 2 | 2024-08-01 | Belgium | None | 94010.488175 | 91129.243296 | 96876.143447 |
| 3 | 2024-09-01 | Belgium | None | 35165.766368 | 32295.303313 | 38081.503388 |
| 4 | 2024-10-01 | Belgium | None | 10712.840859 | 7792.914224 | 13545.382103 |
| 5 | 2024-11-01 | Belgium | None | 15575.706973 | 12869.710511 | 18361.714953 |
| 6 | 2024-12-01 | Belgium | None | 43761.773402 | 40908.291509 | 46512.689291 |
| 7 | 2025-01-01 | Belgium | None | 22965.282178 | 20305.025961 | 25710.640517 |
| 8 | 2025-02-01 | Belgium | None | 36852.832053 | 33948.725286 | 39735.768094 |
| 9 | 2025-03-01 | Belgium | None | 42565.990727 | 39522.284487 | 45665.217573 |
123456hdf_overnightstays_agg = hdf_overnightstays_agg.select('*', ('NULL', 'FORECAST'),
('NULL', 'FORECAST_LOWER'),
('NULL', 'FORECAST_UPPER')
)
hdf_overnightstays_all = hdf_overnightstays_predicted.union(hdf_overnightstays_agg)
hdf_overnightstays_all.sort('MONTH').tail(20).collect()| MONTH | COUNTRYOFRESIDENCE | OVERNIGHTSTAYS_SUM | FORECAST | FORECAST_LOWER | FORECAST_UPPER | |
|---|---|---|---|---|---|---|
| 0 | 2025-04-01 | Italy | None | 7.301358e+04 | 6.667947e+04 | 7.887995e+04 |
| 1 | 2025-04-01 | United States | None | 1.805662e+05 | 1.680320e+05 | 1.921834e+05 |
| 2 | 2025-04-01 | United Kingdom | None | 9.974975e+04 | 9.186932e+04 | 1.070457e+05 |
| 3 | 2025-04-01 | Belgium | None | 2.574547e+04 | 2.259784e+04 | 2.862533e+04 |
| 4 | 2025-04-01 | Switzerland | None | 1.378666e+06 | 1.322983e+06 | 1.430933e+06 |
| 5 | 2025-04-01 | Spain | None | 3.213217e+04 | 2.940963e+04 | 3.476349e+04 |
| 6 | 2025-04-01 | India | None | 6.515502e+04 | 5.293711e+04 | 7.653153e+04 |
| 7 | 2025-04-01 | Germany | None | 2.407886e+05 | 2.251664e+05 | 2.552813e+05 |
| 8 | 2025-04-01 | France | None | 1.020115e+05 | 9.917646e+04 | 1.046622e+05 |
| 9 | 2025-04-01 | Netherlands | None | 3.838326e+04 | 3.441906e+04 | 4.207428e+04 |
| 10 | 2025-05-01 | United States | None | 2.895384e+05 | 2.755186e+05 | 3.024338e+05 |
| 11 | 2025-05-01 | Italy | None | 6.399058e+04 | 5.780458e+04 | 6.952312e+04 |
| 12 | 2025-05-01 | United Kingdom | None | 1.080254e+05 | 9.963275e+04 | 1.155868e+05 |
| 13 | 2025-05-01 | Switzerland | None | 1.502996e+06 | 1.445847e+06 | 1.553558e+06 |
| 14 | 2025-05-01 | India | None | 1.180078e+05 | 1.052617e+05 | 1.293825e+05 |
| 15 | 2025-05-01 | France | None | 1.204332e+05 | 1.173433e+05 | 1.233076e+05 |
| 16 | 2025-05-01 | Spain | None | 3.689542e+04 | 3.382324e+04 | 3.987497e+04 |
| 17 | 2025-05-01 | Netherlands | None | 5.181526e+04 | 4.768336e+04 | 5.550257e+04 |
| 18 | 2025-05-01 | Germany | None | 2.789746e+05 | 2.621818e+05 | 2.946275e+05 |
| 19 | 2025-05-01 | Belgium | None | 1.842008e+04 | 1.533416e+04 | 2.117920e+04 |
Optionally, we can save the merged dataset to a table in SAP HANA Cloud. For instance, SAP Analytics Cloud could access such data.
1hdf_overnightstays_all.save('OVERNIGHTSTAYS_FORECAST_COUNTRIES', force=True)<hana_ml.dataframe.DataFrame at 0x1d9c79d69f0>
Finally, we plot the entire dataset, i.e., the historical values and predictions, for a single country - in this case for Germany.
1hdf_overnightstays_data = hdf_overnightstays_all.filter('''"COUNTRYOFRESIDENCE" = 'Germany' ''').collect()12345678910111213fig, ax = plt.subplots()
ax.fill_between(hdf_overnightstays_data['MONTH'].values,
hdf_overnightstays_data['FORECAST_LOWER'].values,
hdf_overnightstays_data['FORECAST_UPPER'].values, alpha=0.2)
ax.plot(hdf_overnightstays_data['MONTH'].values,
hdf_overnightstays_data['OVERNIGHTSTAYS_SUM'].values, '-')
plt.xticks(rotation=45)
plt.xlabel("MONTH")
plt.ylabel("OVERNIGHTSTAYS")
plt.grid()
plt.show()