In this lesson, you will train a univariate time series forecasting model using the AdditiveModelForecast algorithm from SAP HANA's Predictive Analysis Library (PAL). You'll use monthly aggregated overnight stays as the input data and forecast future values using a trained SAP HANA PAL model. This is a foundation step in using SAP HANA for operational time series predictions.
The Python machine learning client for SAP HANA (hana-ml), exposes all SAP HANA Predictive Analysis Library (PAL) functions as well as the Automated Predictive Library (APL) functions in Python for use based on the SAP HANA DataFrames input data.
Next, we train a time series model on the monthly aggregates by using the AdditiveModelForecast algorithm [1].
12amf = AdditiveModelForecast()
amf.fit(data=hdf_overnightstays_agg)<hana_ml.algorithms.pal.tsa.additive_model_forecast.AdditiveModelForecast at 0x1d9bdeefb00>
Additive model time series analysis uses an additive model to forecast time series data. It handles data with strong seasonal effects and is robust to shift the historical trend.
Additive model time series analysis - also called 'Prophet' - uses a decomposable time series model with three main components: trend, seasonality, and holidays or events.
We can explore the trained model by looking at a graphical report as shown below:
1UnifiedReport(amf).build().display()100%|██████████| 6/6 [00:51<00:00, 8.57s/it]

Creating the SAP HANA DataFrame for Forecasts
Up to now, we have not created a prediction. In order to create a forecast, we firstly need to create an SAP HANA DataFrame that contains the dates/months we would like forecasts for.
Therefore, we need to find out the most recent date of the training dataset which is displayed below.
123str_lastdate = hdf_overnightstays_agg.tail(1, ref_col='MONTH').collect().iloc[0,0]
str_lastdate = str(str_lastdate)[0:10]
print(str_lastdate)2024-05-01
Next, based on the last known date / month shown above, we create a new SAP HANA DataFrame that displays the following 12 months.
123456months_to_forecast=12
hdf_future = binomial(conn, n=1, p=1, num_random=months_to_forecast)
hdf_future = hdf_future.select('*', (f'''ADD_MONTHS(TO_DATE ('{str_lastdate}', 'YYYY-MM-DD'), ID+1)''', 'MONTH') )
hdf_future = hdf_future.select('MONTH', ('0', 'TARGET'))
hdf_future.head(20).collect()| MONTH | TARGET | |
|---|---|---|
| 0 | 2024-06-01 | 0 |
| 1 | 2024-07-01 | 0 |
| 2 | 2024-08-01 | 0 |
| 3 | 2024-09-01 | 0 |
| 4 | 2024-10-01 | 0 |
| 5 | 2024-11-01 | 0 |
| 6 | 2024-12-01 | 0 |
| 7 | 2025-01-01 | 0 |
| 8 | 2025-02-01 | 0 |
| 9 | 2025-03-01 | 0 |
| 10 | 2025-04-01 | 0 |
| 11 | 2025-05-01 | 0 |
Note the use of binomial in the code:
Binomial is a function designed for generating random numbers following a binomial distribution directly within the SAP HANA database. The two columns generated are ID and GENERATED_NUMBER.
Based on the parameters given to the function we can appreciate that the generated number is always 1.0, which is practically not used for our purposes in this cell. The useful column is ID which is a consecutive number from 0 to 11 (associated to the 12 generated random numbers).
Column ID is useful because that number is summed to the last_date of the training dataset, so to compute the dates for the forecasting. One can see that the last_date of the training dataset summed by ''ID+1" becomes the value in the MONTH column, which is then used for forecasting.
In summary, the use of binomial is an intermediate step (technical rather than scientific) to create the timestamps for the forecasting.

Applying the Trained Model
Afterwards, we apply the trained time-series model to predict the overnight stays for following 12 months.
12hdf_predicted = amf.predict(data=hdf_future)
hdf_predicted.head(5).collect()| MONTH | YHAT | YHAT_LOWER | YHAT_UPPER | |
|---|---|---|---|---|
| 0 | 2024-06-01 | 3.892362e+06 | 3.818124e+06 | 3.963618e+06 |
| 1 | 2024-07-01 | 4.786169e+06 | 4.713556e+06 | 4.852429e+06 |
| 2 | 2024-08-01 | 4.817140e+06 | 4.743560e+06 | 4.891867e+06 |
| 3 | 2024-09-01 | 3.776572e+06 | 3.702854e+06 | 3.850597e+06 |
| 4 | 2024-10-01 | 3.535125e+06 | 3.459114e+06 | 3.610827e+06 |
We visualize the forecast in the graphical report as shown below.
We clearly observe a repeating pattern identified in the training dataset. Most overnight stays are in the summer, but the winter season also has its peaks.
1UnifiedReport(amf).build().display()100%|██████████| 6/6 [00:50<00:00, 8.44s/it]

Combining Historical and Predicted Value in the SAP HANA DataFrame
Finally we merge the historical dataset and the predicted values into a single SAP HANA DataFrame as shown below.
To create a complete and continuous view of the time series, it's helpful to combine historical data with future predictions. This merged dataset can be used for visualizing trends, sharing with business stakeholders, or making the data accessible to systems like SAP Analytics Cloud.
1234567891011hdf_predicted = hdf_predicted.select('MONTH',
('NULL', 'OVERNIGHTSTAYS_SUM'),
('YHAT', 'FORECAST'),
('YHAT_LOWER', 'FORECAST_LOWER'),
('YHAT_UPPER', 'FORECAST_UPPER'))
hdf_overnightstays_agg = hdf_overnightstays_agg.select('*',
('NULL', 'FORECAST'),
('NULL', 'FORECAST_LOWER'),
('NULL', 'FORECAST_UPPER'))
hdf_all = hdf_predicted.union(hdf_overnightstays_agg)
hdf_all.sort('MONTH').tail(5).collect()| MONTH | OVERNIGHTSTAYS_SUM | FORECAST | FORECAST_LOWER | FORECAST_UPPER | |
|---|---|---|---|---|---|
| 0 | 2025-01-01 | None | 2.719344e+06 | 2.644614e+06 | 2.794261e+06 |
| 1 | 2025-02-01 | None | 3.456029e+06 | 3.376482e+06 | 3.543326e+06 |
| 2 | 2025-03-01 | None | 3.412205e+06 | 3.326795e+06 | 3.500738e+06 |
| 3 | 2025-04-01 | None | 2.878771e+06 | 2.786906e+06 | 2.965247e+06 |
| 4 | 2025-05-01 | None | 3.346712e+06 | 3.245006e+06 | 3.435816e+06 |
Optionally, we can save the merged dataset to a table in SAP HANA Cloud. For instance, SAP Analytics Cloud could access such data.
1hdf_all.save('OVERNIGHTSTAYS_FORECAST_TOTAL', force=True)<hana_ml.dataframe.DataFrame at 0x1d9c242e0c0>