Building a Univariate Forecasting Model with SAP HANA PAL

Objective

After completing this lesson, you will be able to apply SAP HANA PAL to train a univariate time series model on the overnight stays dataset.

Training a Univariate Time Series Model Using SAP HANA PAL

In this lesson, you will train a univariate time series forecasting model using the AdditiveModelForecast algorithm from SAP HANA's Predictive Analysis Library (PAL). You'll use monthly aggregated overnight stays as the input data and forecast future values using a trained SAP HANA PAL model. This is a foundation step in using SAP HANA for operational time series predictions.

The Python machine learning client for SAP HANA (hana-ml), exposes all SAP HANA Predictive Analysis Library (PAL) functions as well as the Automated Predictive Library (APL) functions in Python for use based on the SAP HANA DataFrames input data.

Next, we train a time series model on the monthly aggregates by using the AdditiveModelForecast algorithm [1].

Code Snippet
12
amf = AdditiveModelForecast() amf.fit(data=hdf_overnightstays_agg)

<hana_ml.algorithms.pal.tsa.additive_model_forecast.AdditiveModelForecast at 0x1d9bdeefb00>

Additive model time series analysis uses an additive model to forecast time series data. It handles data with strong seasonal effects and is robust to shift the historical trend.

Additive model time series analysis - also called 'Prophet' - uses a decomposable time series model with three main components: trend, seasonality, and holidays or events.

We can explore the trained model by looking at a graphical report as shown below:

Code Snippet
1
UnifiedReport(amf).build().display()

100%|██████████| 6/6 [00:51<00:00, 8.57s/it]

A graphical display of the training data in the time series report

Creating the SAP HANA DataFrame for Forecasts

Up to now, we have not created a prediction. In order to create a forecast, we firstly need to create an SAP HANA DataFrame that contains the dates/months we would like forecasts for.

Therefore, we need to find out the most recent date of the training dataset which is displayed below.

Code Snippet
123
str_lastdate = hdf_overnightstays_agg.tail(1, ref_col='MONTH').collect().iloc[0,0] str_lastdate = str(str_lastdate)[0:10] print(str_lastdate)

2024-05-01

Next, based on the last known date / month shown above, we create a new SAP HANA DataFrame that displays the following 12 months.

Code Snippet
123456
months_to_forecast=12 hdf_future = binomial(conn, n=1, p=1, num_random=months_to_forecast) hdf_future = hdf_future.select('*', (f'''ADD_MONTHS(TO_DATE ('{str_lastdate}', 'YYYY-MM-DD'), ID+1)''', 'MONTH') ) hdf_future = hdf_future.select('MONTH', ('0', 'TARGET')) hdf_future.head(20).collect()
 MONTHTARGET
02024-06-010
12024-07-010
22024-08-010
32024-09-010
42024-10-010
52024-11-010
62024-12-010
72025-01-010
82025-02-010
92025-03-010
102025-04-010
112025-05-010

Note the use of binomial in the code:

Binomial is a function designed for generating random numbers following a binomial distribution directly within the SAP HANA database. The two columns generated are ID and GENERATED_NUMBER.

Based on the parameters given to the function we can appreciate that the generated number is always 1.0, which is practically not used for our purposes in this cell. The useful column is ID which is a consecutive number from 0 to 11 (associated to the 12 generated random numbers).

Column ID is useful because that number is summed to the last_date of the training dataset, so to compute the dates for the forecasting. One can see that the last_date of the training dataset summed by ''ID+1" becomes the value in the MONTH column, which is then used for forecasting.

In summary, the use of binomial is an intermediate step (technical rather than scientific) to create the timestamps for the forecasting.

Screenshot of the code and generated data: Refer to the accompanying text for more detail.

Applying the Trained Model

Afterwards, we apply the trained time-series model to predict the overnight stays for following 12 months.

Code Snippet
12
hdf_predicted = amf.predict(data=hdf_future) hdf_predicted.head(5).collect()
 MONTHYHATYHAT_LOWERYHAT_UPPER
02024-06-013.892362e+063.818124e+063.963618e+06
12024-07-014.786169e+064.713556e+064.852429e+06
22024-08-014.817140e+064.743560e+064.891867e+06
32024-09-013.776572e+063.702854e+063.850597e+06
42024-10-013.535125e+063.459114e+063.610827e+06

We visualize the forecast in the graphical report as shown below.

We clearly observe a repeating pattern identified in the training dataset. Most overnight stays are in the summer, but the winter season also has its peaks.

Code Snippet
1
UnifiedReport(amf).build().display()

100%|██████████| 6/6 [00:50<00:00, 8.44s/it]

A graphical display of the training data and forecast result in the time series report

Combining Historical and Predicted Value in the SAP HANA DataFrame

Finally we merge the historical dataset and the predicted values into a single SAP HANA DataFrame as shown below.

To create a complete and continuous view of the time series, it's helpful to combine historical data with future predictions. This merged dataset can be used for visualizing trends, sharing with business stakeholders, or making the data accessible to systems like SAP Analytics Cloud.

Code Snippet
1234567891011
hdf_predicted = hdf_predicted.select('MONTH', ('NULL', 'OVERNIGHTSTAYS_SUM'), ('YHAT', 'FORECAST'), ('YHAT_LOWER', 'FORECAST_LOWER'), ('YHAT_UPPER', 'FORECAST_UPPER')) hdf_overnightstays_agg = hdf_overnightstays_agg.select('*', ('NULL', 'FORECAST'), ('NULL', 'FORECAST_LOWER'), ('NULL', 'FORECAST_UPPER')) hdf_all = hdf_predicted.union(hdf_overnightstays_agg) hdf_all.sort('MONTH').tail(5).collect()
 MONTHOVERNIGHTSTAYS_SUMFORECASTFORECAST_LOWERFORECAST_UPPER
02025-01-01None2.719344e+062.644614e+062.794261e+06
12025-02-01None3.456029e+063.376482e+063.543326e+06
22025-03-01None3.412205e+063.326795e+063.500738e+06
32025-04-01None2.878771e+062.786906e+062.965247e+06
42025-05-01None3.346712e+063.245006e+063.435816e+06

Optionally, we can save the merged dataset to a table in SAP HANA Cloud. For instance, SAP Analytics Cloud could access such data.

Code Snippet
1
hdf_all.save('OVERNIGHTSTAYS_FORECAST_TOTAL', force=True)

<hana_ml.dataframe.DataFrame at 0x1d9c242e0c0>

References