Understanding Machine Learning for Time Series Forecasting

Setting Up the Environment and Preparing Data Using SAP HANA DataFrames

Training an SAP HANA PAL Univariate Time Series Model

Forecasting with Multiple Time Series

Forecasting Across Grouped Time Series

Objective

After completing this lesson, you will be able to implement forecasting logic across grouped time series using SAP HANA PAL.

Forecasting Multiple Time Series

In real-world forecasting scenarios, it's common to generate separate forecasts for multiple categories, such as countries, stores, or product lines. This lesson walks through how to scale time series forecasting across many such groups using the group_key parameter in SAP HANA PAL and custom looping logic.

Training the Model with the AdditiveModelForecast Algorithm

The Python machine learning client for SAP HANA (hana-ml) exposes all SAP HANA Predictive Analysis Library (PAL) functions as well as the Automated Predictive Library (APL) functions in Python for use based on SAP HANA DataFrames input data.

Next, we train individual models using the same AdditiveModelForecast algorithm [1] - as we did for the univariate time series model scenario.

However, we now specify the country of residence as group key, which delivers individual models for each country.

Code Snippet

12
amf = AdditiveModelForecast(massive=True)
amf.fit(data=hdf_overnightstays_agg, group_key='COUNTRYOFRESIDENCE')

<hana_ml.algorithms.pal.tsa.additive_model_forecast.AdditiveModelForecast at 0x1d9c79d7020>

Time Series Forecast - Individually by Country

We create an SAP HANA DataFrame that contains the dates/months for which we want to create a forecast.

Next, we begin by identifying the most recent month from the training history.

Code Snippet

1234
str_lastdate = hdf_overnightstays_agg.tail(1, ref_col='MONTH').collect().iloc[0,0]
str_lastdate = str(str_lastdate)[0:10]

print(str_lastdate)

2024-05-01

Now, we create an SAP HANA DataFrame that contains the first day of the desired months to forecast.

Code Snippet

123456
months_to_forecast=12

hdf_overnightstays_future = binomial(conn, n=1, p=1, num_random=months_to_forecast)
hdf_overnightstays_future = hdf_overnightstays_future.select('*', (f'''ADD_MONTHS(TO_DATE ('{str_lastdate}', 'YYYY-MM-DD'), ID+1)''', 'MONTH') )
hdf_overnightstays_future = hdf_overnightstays_future.select('MONTH', ('0', 'TARGET'))
hdf_overnightstays_future.head(10).collect()

	MONTH	TARGET
0	2024-06-01	0
1	2024-07-01	0
2	2024-08-01	0
3	2024-09-01	0
4	2024-10-01	0
5	2024-11-01	0
6	2024-12-01	0
7	2025-01-01	0
8	2025-02-01	0
9	2025-03-01	0

Then, we create an SAP HANA DataFrame that contains the countries for which forecasts should be delivered.

Code Snippet

123
hdf_overnightstays_countries = hdf_overnightstays.agg([('sum', 'OVERNIGHTSTAYS', 'TOTAL'  )], group_by='COUNTRYOFRESIDENCE' ) \
                            .sort('TOTAL', desc=True).select('COUNTRYOFRESIDENCE').head(10)
hdf_overnightstays_countries.collect()

	COUNTRYOFRESIDENCE
0	Switzerland
1	Germany
2	United States
3	United Kingdom
4	France
5	Italy
6	Netherlands
7	Belgium
8	India
9	Spain

Following, we combine the above two SAP HANA DataFrames to create a new DataFrame that contains the future dates/months for each country.

Code Snippet

123
hdf_overnightstays_topredict = hdf_overnightstays_future.add_id().set_index('ID').join(hdf_overnightstays_countries.add_id('ID').set_index('ID'), how='cross' )
hdf_overnightstays_topredict = hdf_overnightstays_topredict.drop('ID')
hdf_overnightstays_topredict.head(10).collect()

	MONTH	COUNTRYOFRESIDENCE
0	2024-06-01	Switzerland
1	2024-06-01	Germany
2	2024-06-01	United States
3	2024-06-01	United Kingdom
4	2024-06-01	France
5	2024-06-01	Italy
6	2024-06-01	Netherlands
7	2024-06-01	Belgium
8	2024-06-01	India
9	2024-06-01	Spain

Applying the Model to Generate Forecasts

Next, we score the trained time series model to create forecasts for the individual countries for the specified dates.

Code Snippet

12
hdf_overnightstays_pred = amf.predict(data=hdf_overnightstays_topredict, group_key='COUNTRYOFRESIDENCE')
hdf_overnightstays_predicted = hdf_overnightstays_pred[0]

We are able to look into the details of the components of the forecast, that is, trend, seasonality, as shown below:

Code Snippet

1
hdf_overnightstays_pred[1].collect()

	GROUP ID	MONTH	TREND	SEASONAL	HOLIDAY	EXOGENOUS
0	Belgium	2024-06-01	46941.398949	{"seasonalities":3149.2922197280824}	{}	{}
1	Belgium	2024-07-01	46645.399501	{"seasonalities":53969.69442748901}	{}	{}
2	Belgium	2024-08-01	46339.533404	{"seasonalities":47670.95477129075}	{}	{}
3	Belgium	2024-09-01	46033.667307	{"seasonalities":-10867.900939089028}	{}	{}
4	Belgium	2024-10-01	45737.667859	{"seasonalities":-35024.826999824596}	{}	{}
...	...	...	...	...	...	...
115	United States	2025-01-01	256174.088611	{"seasonalities":-105260.40367463966}	{}	{}
116	United States	2025-02-01	255710.827254	{"seasonalities":-107113.21705733694}	{}	{}
117	United States	2025-03-01	255292.397641	{"seasonalities":-63771.189585235305}	{}	{}
118	United States	2025-04-01	254829.136284	{"seasonalities":-74262.95570659888}	{}	{}
119	United States	2025-05-01	254380.818842	{"seasonalities":35157.57804977409}	{}	{}

120 rows × 6 columns

Assessing Accuracy

We check whether there are any errors as follows:

Code Snippet

1
hdf_overnightstays_pred[2].collect()

GROUP_ID ERROR_TIMESTAMP ERRORCODE MESSAGE

We plot the forecast specifically for Germany as seen below:

Code Snippet

12
forecast_line_plot(pred_data=hdf_overnightstays_predicted.filter(''' "GROUP_ID" = 'Germany' ''').drop('GROUP_ID').set_index('MONTH'),
                   confidence=("YHAT_LOWER", "YHAT_UPPER"), enable_plotly=True)

Forecast for Germany plotted in a line graph

Now, we plot the forecast for the United States as follows:

Code Snippet

12
forecast_line_plot(pred_data=hdf_overnightstays_predicted.filter(''' "GROUP_ID" = 'United States' ''').drop('GROUP_ID').set_index('MONTH'),
                   confidence=("YHAT_LOWER", "YHAT_UPPER"), enable_plotly=True)

Forecast for USA plotted in a line graph

We can combine the historical dataset used for training the model, and the forecasted values into a single SAP HANA DataFrame as shown below:

Code Snippet

123456789
hdf_overnightstays_predicted = hdf_overnightstays_predicted.select(
                                  'MONTH',
                                  ('GROUP_ID', 'COUNTRYOFRESIDENCE'),
                                  ('NULL', 'OVERNIGHTSTAYS_SUM'),
                                  ('YHAT', 'FORECAST'),
                                  ('YHAT_LOWER', 'FORECAST_LOWER'),
                                  ('YHAT_UPPER', 'FORECAST_UPPER')
                                 )
hdf_overnightstays_predicted.head(10).collect()

	MONTH	COUNTRYOFRESIDENCE	OVERNIGHTSTAYS_SUM	FORECAST	FORECAST_LOWER	FORECAST_UPPER
0	2024-06-01	Belgium	None	50090.691169	47179.354589	52902.229641
1	2024-07-01	Belgium	None	100615.093928	97772.912096	103203.241535
2	2024-08-01	Belgium	None	94010.488175	91129.243296	96876.143447
3	2024-09-01	Belgium	None	35165.766368	32295.303313	38081.503388
4	2024-10-01	Belgium	None	10712.840859	7792.914224	13545.382103
5	2024-11-01	Belgium	None	15575.706973	12869.710511	18361.714953
6	2024-12-01	Belgium	None	43761.773402	40908.291509	46512.689291
7	2025-01-01	Belgium	None	22965.282178	20305.025961	25710.640517
8	2025-02-01	Belgium	None	36852.832053	33948.725286	39735.768094
9	2025-03-01	Belgium	None	42565.990727	39522.284487	45665.217573

Code Snippet

123456
hdf_overnightstays_agg = hdf_overnightstays_agg.select('*', ('NULL', 'FORECAST'),
                                  ('NULL', 'FORECAST_LOWER'),
                                  ('NULL', 'FORECAST_UPPER')
                                  )
hdf_overnightstays_all = hdf_overnightstays_predicted.union(hdf_overnightstays_agg)
hdf_overnightstays_all.sort('MONTH').tail(20).collect()

	MONTH	COUNTRYOFRESIDENCE	OVERNIGHTSTAYS_SUM	FORECAST	FORECAST_LOWER	FORECAST_UPPER
0	2025-04-01	Italy	None	7.301358e+04	6.667947e+04	7.887995e+04
1	2025-04-01	United States	None	1.805662e+05	1.680320e+05	1.921834e+05
2	2025-04-01	United Kingdom	None	9.974975e+04	9.186932e+04	1.070457e+05
3	2025-04-01	Belgium	None	2.574547e+04	2.259784e+04	2.862533e+04
4	2025-04-01	Switzerland	None	1.378666e+06	1.322983e+06	1.430933e+06
5	2025-04-01	Spain	None	3.213217e+04	2.940963e+04	3.476349e+04
6	2025-04-01	India	None	6.515502e+04	5.293711e+04	7.653153e+04
7	2025-04-01	Germany	None	2.407886e+05	2.251664e+05	2.552813e+05
8	2025-04-01	France	None	1.020115e+05	9.917646e+04	1.046622e+05
9	2025-04-01	Netherlands	None	3.838326e+04	3.441906e+04	4.207428e+04
10	2025-05-01	United States	None	2.895384e+05	2.755186e+05	3.024338e+05
11	2025-05-01	Italy	None	6.399058e+04	5.780458e+04	6.952312e+04
12	2025-05-01	United Kingdom	None	1.080254e+05	9.963275e+04	1.155868e+05
13	2025-05-01	Switzerland	None	1.502996e+06	1.445847e+06	1.553558e+06
14	2025-05-01	India	None	1.180078e+05	1.052617e+05	1.293825e+05
15	2025-05-01	France	None	1.204332e+05	1.173433e+05	1.233076e+05
16	2025-05-01	Spain	None	3.689542e+04	3.382324e+04	3.987497e+04
17	2025-05-01	Netherlands	None	5.181526e+04	4.768336e+04	5.550257e+04
18	2025-05-01	Germany	None	2.789746e+05	2.621818e+05	2.946275e+05
19	2025-05-01	Belgium	None	1.842008e+04	1.533416e+04	2.117920e+04

Optionally, we can save the merged dataset to a table in SAP HANA Cloud. For instance, SAP Analytics Cloud could access such data.

Code Snippet

1
hdf_overnightstays_all.save('OVERNIGHTSTAYS_FORECAST_COUNTRIES', force=True)

<hana_ml.dataframe.DataFrame at 0x1d9c79d69f0>

Finally, we plot the entire dataset, i.e., the historical values and predictions, for a single country - in this case for Germany.

Code Snippet

1
hdf_overnightstays_data = hdf_overnightstays_all.filter('''"COUNTRYOFRESIDENCE" = 'Germany' ''').collect()

Code Snippet

12345678910111213
fig, ax = plt.subplots()
ax.fill_between(hdf_overnightstays_data['MONTH'].values, 
                hdf_overnightstays_data['FORECAST_LOWER'].values, 
                hdf_overnightstays_data['FORECAST_UPPER'].values, alpha=0.2)

ax.plot(hdf_overnightstays_data['MONTH'].values, 
        hdf_overnightstays_data['OVERNIGHTSTAYS_SUM'].values, '-')

plt.xticks(rotation=45)
plt.xlabel("MONTH")
plt.ylabel("OVERNIGHTSTAYS")
plt.grid()
plt.show()

A graph showing the entire dataset of historical and forecasted valued for a single country, Germany

References

[1]SAP HANA PAL Algorithm - Additive Model Forecast

Continue to quiz