[[{“value”:”
Machine Learning on structured data might not be what many people have in mind when talking about AI. Yet, it is a form of AI and the massive interest in Large Language Models, agents and assistants is also leading to many new Machine Learning Projects.
With this blog you can get hands-on with Machine Learning by implementing a time-series forecast in SAP HANA Cloud. In the requirements and projects that I am seeing, time-series forecasting is by far the most popular Machine Learning approach in a SAP related project. From Accounts Receivable forecasting, to demand forecasting and even solar power prediction, a time-series forecast might get you there.
IMPORTANT: You can download all code from this repository.
Architecture
The architecture for this example is as straightforward as it can be. Your data is in SAP HANA Cloud, where we also have Machine Learning algorithms (Predictive Analysis Library and Automated Predictive Library). We then use Python in a Jupyter Notebook running in SAP Business Application Studio to trigger the Machine Learning algorithms in SAP HANA Cloud. This is possible with SAP’s hana_ml Python package. You can also choose any other Python environment of your choice to use that package to trigger the built-in algorithms. No data ever has to leave SAP HANA Cloud, since data and ML algorithm are already in the same place.
This blog focuses mostly on a stand-alone SAP HANA Cloud. However, you can also trigger the same Machine Learning in SAP Datasphere’s embedded SAP HANA Cloud. This blog has more details.
More components are typically needed for a productive environment. To automate the creation of predictions the Python code can be schedules for instance with Cloud Foundry or SAP AI Core. And the time-series algorithm we are using is just one of 100+ algorithms for you to choose from. Here is an overview. See the hana_ml documentation for details.
Prerequisites
This blog assumes that:
- You have SAP Business Application Studio up and running and configured as described in “Setting up Python in SAP Business Application Studio to trigger HANA Cloud Machine Learning“
- You have access to a SAP HANA Cloud instance, that is configured as described in the same above blog.
Data Load
You will use historic data from the Swiss tourism area to predict how many nights people will spend in a hotel each month in the next 12 months. Download the file HOTELNIGHTS.csv and import it into SAP HANA Cloud, for instance with the SAP HANA Database Explorer. Select “Import Data”.
Click through the options:
- name the new table: HOTELNIGHTS
- set the following column data types
- MONTH: Date
- ARRIVALS: Integer
- HOTELNIGHTS: Integer
You will see that the information in that table is quite detailed. It shows for the period from January 2020 to March 2026 by month how many people from which Country spent how many nights in a hotel in the various regions of Switzerland.
Time-series forecast
Now use the data for a time-series forecast! Script in Python and the hana_ml package translates your logic into SQL and has SAP HANA Cloud do the work.
IMPORTANT: You can download the notebook HANAML time series forecast.ipynb, which has the following code included. If you upload that file into your environment and execute the first cell, you will be prompted to select the Python environment. This can be easily overlooked…
If you are meeting the prerequisites (see above), you should already be able to connect form a Jupyter notebook in BAS to connect to your SAP HANA Cloud. Continue now by storing the database user’s details a in a credentials file.
{
“SAP_HANA_CLOUD”: {
“HANA_ADDRESS”: “YOURENDPOINT”,
“HANA_PORT”: 443,
“HANA_USER”: “AIUSER”,
“HANA_PASSWORD”: “YOURSECRETPASSWORD”
}
}
Load these credentials in Python and verify that the connection still returns True.
import json
with open(‘./credentials.json’, ‘r’) as creds:
credentials = json.load(creds)
SAP_HANA_CLOUD_ADDRESS = credentials[“SAP_HANA_CLOUD”][“HANA_ADDRESS”]
SAP_HANA_CLOUD_PORT = credentials[“SAP_HANA_CLOUD”][“HANA_PORT”]
SAP_HANA_CLOUD_USER = credentials[“SAP_HANA_CLOUD”][“HANA_USER”]
SAP_HANA_CLOUD_PASSWORD = credentials[“SAP_HANA_CLOUD”][“HANA_PASSWORD”]import hana_ml.dataframe as dataframe
conn = dataframe.ConnectionContext(address = SAP_HANA_CLOUD_ADDRESS,
port = SAP_HANA_CLOUD_PORT,
user = SAP_HANA_CLOUD_USER,
password = SAP_HANA_CLOUD_PASSWORD)
conn.connection.isconnected()
In case you see a compatibility warning for Python 3.0.2, feel free to ignore it.
Continue by creating a hana_ml DataFrame, which points to the Data in SAP HANA Cloud. The data remains in SAP HANA Cloud. No records have yet left the system. This is by default and on purpose. The intention is to keep the data in the system and do any data processing within SAP HANA Cloud.
data_hdf = conn.table(‘HOTELNIGHTS’)
It is possible to retrieve data though. You just need to request it with the collect()-method, which downloads the data of the hana_ml DataFrame as Pandas DataFrame. To avoid downloading all records, first restrict the data to 5 rows with the head()-method. The sequence is very important here. First reduce the number of records, then download.
data_hdf.head(5).collect()
To get a more detailed feel for the data, have some statistics calculated.
data_hdf.describe().collect()
There are 13 Swiss regions in the data for example and the maximum number of hotel nights in a specific months in a specific region from a specific country is 223091. Important here is the fact that every value in that table was calculated in SAP HANA Cloud. Imagine you are working with a massive table, you will want to leverage the processing power of SAP HANA Cloud. That’s ensured here, since the hana_ml package created the necessary SQL statement. That SQL syntax can be seen by adding “.select_statement” to the command. The screenshot shows only a fraction of the SQL.
import pprint
pprint.pp(data_hdf.describe().select_statement)
One of many other ways to explore the data is an interactive standard report.
from hana_ml.visualizers.unified_report import UnifiedReport
UnifiedReport(data_hdf).build().display()
We are happy with the data and need to prepare it now for a monthly forecast of all hotel nights in the country. Aggregate by month.
agg_hdf = data_hdf.agg([(‘sum’, ‘HOTELNIGHTS’, ‘HOTELNIGHTS’)], group_by=’MONTH’)
agg_hdf = agg_hdf.sort(‘MONTH’)
agg_hdf.head(5).collect()
As usual, that aggregation is also done on SAP HANA Cloud and only 5 aggregated values are downloaded. Also as usual, you could add “.select_statement” to see the exact SQL syntax that is used.
Plot the monthly data to get a visual impression for the data. This requires the installation of another Python package. Run this cell once. After the package is installed:
- you may want to delete the cell (or comment it out) to avoid unnecessary reinstallation attempts)
- and you may need to restart the Python kernel so that the new package can be used. Click the “Restart” icon in the menu on top and run the earlier cells again.
!pip install nbformat # Once installed, turn this line into a comment to avoid running it again
Download the monthly aggregates and create a line chart.
import plotly.express as px
data_df = agg_hdf.collect()
fig = px.line(data_df, x=’MONTH’, y=’HOTELNIGHTS’, title=’Hotel nights by Month’)
fig.show()
There is a beautiful repeating pattern in the data. True seasonality. Peaks in the summer, lows in November and lower peaks in winter. Train a time-series model! We use the AdditiveModelForecast which is an implementation of Meta’s Prophet algorithm.
from hana_ml.algorithms.pal.tsa.additive_model_forecast import AdditiveModelForecast
amf = AdditiveModelForecast()
amf.fit(data=agg_hdf)
Explore the trained model in another interactive standard report.
from hana_ml.visualizers.unified_report import UnifiedReport
UnifiedReport(amf).build().display()
With that trained model we can forecast the unknown future. Just prepare a hana_ml DataFrame that contains the months (or dates), for which we want forecasts. In a productive scenario you will probably connect to a table that’s permanently updated with new records. Hence, to keep the code dynamic for scheduling, find out from the data the most recent month for which data is available.
str_lastdate = agg_hdf.tail(1, ref_col=’MONTH’).collect().iloc[0,0]
str_lastdate = str(str_lastdate)[0:10]
print(str_lastdate)
Use that month as basis create the required dates for the forecast. We need 12 records, always the first day of the following 12 months.
from hana_ml.algorithms.pal.random import binomial
months_to_forecast=12
future_hdf = binomial(conn, n=1, p=1, num_random=months_to_forecast)
future_hdf = future_hdf.select(‘*’, (f”’ADD_MONTHS(TO_DATE (‘{str_lastdate}’, ‘YYYY-MM-DD’), ID+1)”’, ‘MONTH’) )
future_hdf = future_hdf.select(‘MONTH’, (‘0’, ‘TARGET’))
future_hdf.head(20).collect()
Apply the trained model to predict those dates. And you get the prediction (YHAT) as well as the prediction interval (YHAT_LOWER and YHAT_UPPER).
predicted_hdf = amf.predict(data=future_hdf)
predicted_hdf.head(20).collect()
Before using the predictions, have a closer look at the trained model and the forecasted values in the standard report.
The outlier tab points out a few months in 2020 that are unusual. Indeed, there was something that year that impacted travelling and hotel nights…. Cutting off that old history would be a good idea. You could go back to the top of the Notebook and add a filter to the data, ie
data_hdf = data_hdf.filter(“YEAR(MONTH) >= 2022”)
In this tutorial we continue with the model as it is. You could save the predictions just as they are. However, you may also want to combine the past actuals with the predictions into a single table since that might make it easier to put a dashboard on top for instance.
predicted_hdf = predicted_hdf.select(‘MONTH’,
(‘NULL’, ‘HOTELNIGHTS’),
(‘YHAT’, ‘FORECAST’),
(‘YHAT_LOWER’, ‘FORECAST_LOWER’),
(‘YHAT_UPPER’, ‘FORECAST_UPPER’))
agg_hdf = agg_hdf.select(‘*’,
(‘NULL’, ‘FORECAST’),
(‘NULL’, ‘FORECAST_LOWER’),
(‘NULL’, ‘FORECAST_UPPER’))
actualsandpred_hdf = predicted_hdf.union(agg_hdf)
actualsandpred_hdf.sort(‘MONTH’).tail(5).collect()
Finally, save that combined data into a table in SAP HANA Cloud. In case that table doesn’t exist yet, it will be automatically created in the format of the hana_ml DataFrame.
actualsandpred_hdf.save(‘HOTELNIGHTS_FORECAST’, force=True)
With this you have enriched your data in SAP HANA Cloud through Machine Learning! The actuals in the HOTELNIGHTS column end in March 2026 with the FORECAST continuing on from April 2026 to March 2027.
Related topics
With such a forecast you can already achieve a lot. You may also want to extend it further or use a completely different algorithm. These resources can help you along, but feel free to ask if you have any questions!
- Documentation hana_ml package
- Documentation of the underlying Predictive Analysis Library
- Sample code for the Predictive Analysis Library (PAL) and also the Automated Predictive Library (APL)
“}]]
[[{“value”:”Machine Learning on structured data might not be what many people have in mind when talking about AI. Yet, it is a form of AI and the massive interest in Large Language Models, agents and assistants is also leading to many new Machine Learning Projects. With this blog you can get hands-on with Machine Learning by implementing a time-series forecast in SAP HANA Cloud. In the requirements and projects that I am seeing, time-series forecasting is by far the most popular Machine Learning approach in a SAP related project. From Accounts Receivable forecasting, to demand forecasting and even solar power prediction, a time-series forecast might get you there.IMPORTANT: You can download all code from this repository. ArchitectureThe architecture for this example is as straightforward as it can be. Your data is in SAP HANA Cloud, where we also have Machine Learning algorithms (Predictive Analysis Library and Automated Predictive Library). We then use Python in a Jupyter Notebook running in SAP Business Application Studio to trigger the Machine Learning algorithms in SAP HANA Cloud. This is possible with SAP’s hana_ml Python package. You can also choose any other Python environment of your choice to use that package to trigger the built-in algorithms. No data ever has to leave SAP HANA Cloud, since data and ML algorithm are already in the same place.This blog focuses mostly on a stand-alone SAP HANA Cloud. However, you can also trigger the same Machine Learning in SAP Datasphere’s embedded SAP HANA Cloud. This blog has more details. More components are typically needed for a productive environment. To automate the creation of predictions the Python code can be schedules for instance with Cloud Foundry or SAP AI Core. And the time-series algorithm we are using is just one of 100+ algorithms for you to choose from. Here is an overview. See the hana_ml documentation for details. PrerequisitesThis blog assumes that:You have SAP Business Application Studio up and running and configured as described in “Setting up Python in SAP Business Application Studio to trigger HANA Cloud Machine Learning”You have access to a SAP HANA Cloud instance, that is configured as described in the same above blog. Data LoadYou will use historic data from the Swiss tourism area to predict how many nights people will spend in a hotel each month in the next 12 months. Download the file HOTELNIGHTS.csv and import it into SAP HANA Cloud, for instance with the SAP HANA Database Explorer. Select “Import Data”. Click through the options:name the new table: HOTELNIGHTSset the following column data typesMONTH: DateARRIVALS: IntegerHOTELNIGHTS: Integer You will see that the information in that table is quite detailed. It shows for the period from January 2020 to March 2026 by month how many people from which Country spent how many nights in a hotel in the various regions of Switzerland. Time-series forecastNow use the data for a time-series forecast! Script in Python and the hana_ml package translates your logic into SQL and has SAP HANA Cloud do the work. IMPORTANT: You can download the notebook HANAML time series forecast.ipynb, which has the following code included. If you upload that file into your environment and execute the first cell, you will be prompted to select the Python environment. This can be easily overlooked…If you are meeting the prerequisites (see above), you should already be able to connect form a Jupyter notebook in BAS to connect to your SAP HANA Cloud. Continue now by storing the database user’s details a in a credentials file.{
“SAP_HANA_CLOUD”: {
“HANA_ADDRESS”: “YOURENDPOINT”,
“HANA_PORT”: 443,
“HANA_USER”: “AIUSER”,
“HANA_PASSWORD”: “YOURSECRETPASSWORD”
}
} Load these credentials in Python and verify that the connection still returns True.import json
with open(‘./credentials.json’, ‘r’) as creds:
credentials = json.load(creds)
SAP_HANA_CLOUD_ADDRESS = credentials[“SAP_HANA_CLOUD”][“HANA_ADDRESS”]
SAP_HANA_CLOUD_PORT = credentials[“SAP_HANA_CLOUD”][“HANA_PORT”]
SAP_HANA_CLOUD_USER = credentials[“SAP_HANA_CLOUD”][“HANA_USER”]
SAP_HANA_CLOUD_PASSWORD = credentials[“SAP_HANA_CLOUD”][“HANA_PASSWORD”]import hana_ml.dataframe as dataframe
conn = dataframe.ConnectionContext(address = SAP_HANA_CLOUD_ADDRESS,
port = SAP_HANA_CLOUD_PORT,
user = SAP_HANA_CLOUD_USER,
password = SAP_HANA_CLOUD_PASSWORD)
conn.connection.isconnected()In case you see a compatibility warning for Python 3.0.2, feel free to ignore it.Continue by creating a hana_ml DataFrame, which points to the Data in SAP HANA Cloud. The data remains in SAP HANA Cloud. No records have yet left the system. This is by default and on purpose. The intention is to keep the data in the system and do any data processing within SAP HANA Cloud.data_hdf = conn.table(‘HOTELNIGHTS’) It is possible to retrieve data though. You just need to request it with the collect()-method, which downloads the data of the hana_ml DataFrame as Pandas DataFrame. To avoid downloading all records, first restrict the data to 5 rows with the head()-method. The sequence is very important here. First reduce the number of records, then download.data_hdf.head(5).collect() To get a more detailed feel for the data, have some statistics calculated.data_hdf.describe().collect() There are 13 Swiss regions in the data for example and the maximum number of hotel nights in a specific months in a specific region from a specific country is 223091. Important here is the fact that every value in that table was calculated in SAP HANA Cloud. Imagine you are working with a massive table, you will want to leverage the processing power of SAP HANA Cloud. That’s ensured here, since the hana_ml package created the necessary SQL statement. That SQL syntax can be seen by adding “.select_statement” to the command. The screenshot shows only a fraction of the SQL.import pprint
pprint.pp(data_hdf.describe().select_statement) One of many other ways to explore the data is an interactive standard report.from hana_ml.visualizers.unified_report import UnifiedReport
UnifiedReport(data_hdf).build().display() We are happy with the data and need to prepare it now for a monthly forecast of all hotel nights in the country. Aggregate by month. agg_hdf = data_hdf.agg([(‘sum’, ‘HOTELNIGHTS’, ‘HOTELNIGHTS’)], group_by=’MONTH’)
agg_hdf = agg_hdf.sort(‘MONTH’)
agg_hdf.head(5).collect()As usual, that aggregation is also done on SAP HANA Cloud and only 5 aggregated values are downloaded. Also as usual, you could add “.select_statement” to see the exact SQL syntax that is used.Plot the monthly data to get a visual impression for the data. This requires the installation of another Python package. Run this cell once. After the package is installed:you may want to delete the cell (or comment it out) to avoid unnecessary reinstallation attempts)and you may need to restart the Python kernel so that the new package can be used. Click the “Restart” icon in the menu on top and run the earlier cells again.!pip install nbformat # Once installed, turn this line into a comment to avoid running it again Download the monthly aggregates and create a line chart.import plotly.express as px
data_df = agg_hdf.collect()
fig = px.line(data_df, x=’MONTH’, y=’HOTELNIGHTS’, title=’Hotel nights by Month’)
fig.show()There is a beautiful repeating pattern in the data. True seasonality. Peaks in the summer, lows in November and lower peaks in winter. Train a time-series model! We use the AdditiveModelForecast which is an implementation of Meta’s Prophet algorithm.from hana_ml.algorithms.pal.tsa.additive_model_forecast import AdditiveModelForecast
amf = AdditiveModelForecast()
amf.fit(data=agg_hdf) Explore the trained model in another interactive standard report.from hana_ml.visualizers.unified_report import UnifiedReport
UnifiedReport(amf).build().display() With that trained model we can forecast the unknown future. Just prepare a hana_ml DataFrame that contains the months (or dates), for which we want forecasts. In a productive scenario you will probably connect to a table that’s permanently updated with new records. Hence, to keep the code dynamic for scheduling, find out from the data the most recent month for which data is available. str_lastdate = agg_hdf.tail(1, ref_col=’MONTH’).collect().iloc[0,0]
str_lastdate = str(str_lastdate)[0:10]
print(str_lastdate) Use that month as basis create the required dates for the forecast. We need 12 records, always the first day of the following 12 months.from hana_ml.algorithms.pal.random import binomial
months_to_forecast=12
future_hdf = binomial(conn, n=1, p=1, num_random=months_to_forecast)
future_hdf = future_hdf.select(‘*’, (f”’ADD_MONTHS(TO_DATE (‘{str_lastdate}’, ‘YYYY-MM-DD’), ID+1)”’, ‘MONTH’) )
future_hdf = future_hdf.select(‘MONTH’, (‘0’, ‘TARGET’))
future_hdf.head(20).collect() Apply the trained model to predict those dates. And you get the prediction (YHAT) as well as the prediction interval (YHAT_LOWER and YHAT_UPPER).predicted_hdf = amf.predict(data=future_hdf)
predicted_hdf.head(20).collect()Before using the predictions, have a closer look at the trained model and the forecasted values in the standard report. The outlier tab points out a few months in 2020 that are unusual. Indeed, there was something that year that impacted travelling and hotel nights…. Cutting off that old history would be a good idea. You could go back to the top of the Notebook and add a filter to the data, ie data_hdf = data_hdf.filter(“YEAR(MONTH) >= 2022”) In this tutorial we continue with the model as it is. You could save the predictions just as they are. However, you may also want to combine the past actuals with the predictions into a single table since that might make it easier to put a dashboard on top for instance.predicted_hdf = predicted_hdf.select(‘MONTH’,
(‘NULL’, ‘HOTELNIGHTS’),
(‘YHAT’, ‘FORECAST’),
(‘YHAT_LOWER’, ‘FORECAST_LOWER’),
(‘YHAT_UPPER’, ‘FORECAST_UPPER’))
agg_hdf = agg_hdf.select(‘*’,
(‘NULL’, ‘FORECAST’),
(‘NULL’, ‘FORECAST_LOWER’),
(‘NULL’, ‘FORECAST_UPPER’))
actualsandpred_hdf = predicted_hdf.union(agg_hdf)
actualsandpred_hdf.sort(‘MONTH’).tail(5).collect() Finally, save that combined data into a table in SAP HANA Cloud. In case that table doesn’t exist yet, it will be automatically created in the format of the hana_ml DataFrame.actualsandpred_hdf.save(‘HOTELNIGHTS_FORECAST’, force=True) With this you have enriched your data in SAP HANA Cloud through Machine Learning! The actuals in the HOTELNIGHTS column end in March 2026 with the FORECAST continuing on from April 2026 to March 2027. Related topicsWith such a forecast you can already achieve a lot. You may also want to extend it further or use a completely different algorithm. These resources can help you along, but feel free to ask if you have any questions!Documentation hana_ml packageDocumentation of the underlying Predictive Analysis LibrarySample code for the Predictive Analysis Library (PAL) and also the Automated Predictive Library (APL)”}]] Read More Technology Blog Posts by SAP articles
#SAPCHANNEL