Understanding Machine Learning for Time Series Forecasting

Setting Up the Environment and Preparing Data Using SAP HANA DataFrames

Training an SAP HANA PAL Univariate Time Series Model

Forecasting with Multiple Time Series

Building a Univariate Forecasting Model with SAP HANA PAL

Objective

After completing this lesson, you will be able to apply SAP HANA PAL to train a univariate time series model on the overnight stays dataset.

Training a Univariate Time Series Model Using SAP HANA PAL

In this lesson, you will train a univariate time series forecasting model using the AdditiveModelForecast algorithm from SAP HANA's Predictive Analysis Library (PAL). You'll use monthly aggregated overnight stays as the input data and forecast future values using a trained SAP HANA PAL model. This is a foundation step in using SAP HANA for operational time series predictions.

The Python machine learning client for SAP HANA (hana-ml), exposes all SAP HANA Predictive Analysis Library (PAL) functions as well as the Automated Predictive Library (APL) functions in Python for use based on the SAP HANA DataFrames input data.

Next, we train a time series model on the monthly aggregates by using the AdditiveModelForecast algorithm [1].

Code Snippet

12
amf = AdditiveModelForecast()
amf.fit(data=hdf_overnightstays_agg)

<hana_ml.algorithms.pal.tsa.additive_model_forecast.AdditiveModelForecast at 0x1d9bdeefb00>

Additive model time series analysis uses an additive model to forecast time series data. It handles data with strong seasonal effects and is robust to shift the historical trend.

Additive model time series analysis - also called 'Prophet' - uses a decomposable time series model with three main components: trend, seasonality, and holidays or events.

We can explore the trained model by looking at a graphical report as shown below:

Code Snippet

1
UnifiedReport(amf).build().display()

100%|██████████| 6/6 [00:51<00:00, 8.57s/it]

A graphical display of the training data in the time series report

Creating the SAP HANA DataFrame for Forecasts

Up to now, we have not created a prediction. In order to create a forecast, we firstly need to create an SAP HANA DataFrame that contains the dates/months we would like forecasts for.

Therefore, we need to find out the most recent date of the training dataset which is displayed below.

Code Snippet

123
str_lastdate = hdf_overnightstays_agg.tail(1, ref_col='MONTH').collect().iloc[0,0]
str_lastdate = str(str_lastdate)[0:10]
print(str_lastdate)

2024-05-01

Next, based on the last known date / month shown above, we create a new SAP HANA DataFrame that displays the following 12 months.

Code Snippet

123456
months_to_forecast=12

hdf_future = binomial(conn, n=1, p=1, num_random=months_to_forecast)
hdf_future = hdf_future.select('*', (f'''ADD_MONTHS(TO_DATE ('{str_lastdate}', 'YYYY-MM-DD'), ID+1)''', 'MONTH') )
hdf_future = hdf_future.select('MONTH', ('0', 'TARGET'))
hdf_future.head(20).collect()

	MONTH	TARGET
0	2024-06-01	0
1	2024-07-01	0
2	2024-08-01	0
3	2024-09-01	0
4	2024-10-01	0
5	2024-11-01	0
6	2024-12-01	0
7	2025-01-01	0
8	2025-02-01	0
9	2025-03-01	0
10	2025-04-01	0
11	2025-05-01	0

Note the use of binomial in the code:

Binomial is a function designed for generating random numbers following a binomial distribution directly within the SAP HANA database. The two columns generated are ID and GENERATED_NUMBER.

Based on the parameters given to the function we can appreciate that the generated number is always 1.0, which is practically not used for our purposes in this cell. The useful column is ID which is a consecutive number from 0 to 11 (associated to the 12 generated random numbers).

Column ID is useful because that number is summed to the last_date of the training dataset, so to compute the dates for the forecasting. One can see that the last_date of the training dataset summed by ''ID+1" becomes the value in the MONTH column, which is then used for forecasting.

In summary, the use of binomial is an intermediate step (technical rather than scientific) to create the timestamps for the forecasting.

Screenshot of the code and generated data: Refer to the accompanying text for more detail.

Applying the Trained Model

Afterwards, we apply the trained time-series model to predict the overnight stays for following 12 months.

Code Snippet

12
hdf_predicted = amf.predict(data=hdf_future)
hdf_predicted.head(5).collect()

	MONTH	YHAT	YHAT_LOWER	YHAT_UPPER
0	2024-06-01	3.892362e+06	3.818124e+06	3.963618e+06
1	2024-07-01	4.786169e+06	4.713556e+06	4.852429e+06
2	2024-08-01	4.817140e+06	4.743560e+06	4.891867e+06
3	2024-09-01	3.776572e+06	3.702854e+06	3.850597e+06
4	2024-10-01	3.535125e+06	3.459114e+06	3.610827e+06

We visualize the forecast in the graphical report as shown below.

We clearly observe a repeating pattern identified in the training dataset. Most overnight stays are in the summer, but the winter season also has its peaks.

Code Snippet

1
UnifiedReport(amf).build().display()

100%|██████████| 6/6 [00:50<00:00, 8.44s/it]

A graphical display of the training data and forecast result in the time series report

Combining Historical and Predicted Value in the SAP HANA DataFrame

Finally we merge the historical dataset and the predicted values into a single SAP HANA DataFrame as shown below.

To create a complete and continuous view of the time series, it's helpful to combine historical data with future predictions. This merged dataset can be used for visualizing trends, sharing with business stakeholders, or making the data accessible to systems like SAP Analytics Cloud.

Code Snippet

1234567891011
hdf_predicted = hdf_predicted.select('MONTH',
                                    ('NULL', 'OVERNIGHTSTAYS_SUM'),
                                    ('YHAT', 'FORECAST'),
                                    ('YHAT_LOWER', 'FORECAST_LOWER'),
                                    ('YHAT_UPPER', 'FORECAST_UPPER'))
hdf_overnightstays_agg = hdf_overnightstays_agg.select('*', 
                                                      ('NULL', 'FORECAST'),
                                                      ('NULL', 'FORECAST_LOWER'),
                                                      ('NULL', 'FORECAST_UPPER'))
hdf_all = hdf_predicted.union(hdf_overnightstays_agg)
hdf_all.sort('MONTH').tail(5).collect()

	MONTH	OVERNIGHTSTAYS_SUM	FORECAST	FORECAST_LOWER	FORECAST_UPPER
0	2025-01-01	None	2.719344e+06	2.644614e+06	2.794261e+06
1	2025-02-01	None	3.456029e+06	3.376482e+06	3.543326e+06
2	2025-03-01	None	3.412205e+06	3.326795e+06	3.500738e+06
3	2025-04-01	None	2.878771e+06	2.786906e+06	2.965247e+06
4	2025-05-01	None	3.346712e+06	3.245006e+06	3.435816e+06

Optionally, we can save the merged dataset to a table in SAP HANA Cloud. For instance, SAP Analytics Cloud could access such data.

Code Snippet

1
hdf_all.save('OVERNIGHTSTAYS_FORECAST_TOTAL', force=True)

<hana_ml.dataframe.DataFrame at 0x1d9c242e0c0>

References

[1]SAP HANA PAL Algorithm - Additive Model Forecast

Continue to quiz