In this lesson, you'll learn how to add a standalone data lake instance to the SAP HANA Cloud in your SAP BTP account.
Watch the following video to learn about the business case for adding a standalone data lake.
Add a Standalone Data Lake
You can add a standalone data lake to SAP HANA Cloud from the SAP HANA Cloud Central overview page. Select the Create (1) button to start the Create Instance Wizard and deploy an SAP HANA Cloud, data lake instance.
In the first step of the Create Instance Wizard, you must decide which type of instance you want to create. In this example, select the SAP HANA Cloud, data lake (2) option. After the instance type selection, continue with the Next Step (3).
In the General – Basics area, you must specify the instance name and description of the data lake you're going to create. You must also specify who is allowed to connect to your data lake. The options are:
- Allow only BTP IP addresses. This is the most secure option, but it also means that the data lake can only be accessed from applications inside the SAP BTP environment. Connections from outside SAP BTP are not possible.
- Allow all IP addresses. This option allows connections from any IP address. It is only suitable if you have a public service that needs to be accessible from anywhere in the world.
- Allow specific IP addresses and IP ranges (in addition to BTP). This option allows you to specify your company's IP addresses or IP ranges. With this option, employees on the corporate network can access the data lake in SAP BTP.
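To illustrate how the third option behaves conceptually, here is a minimal Python sketch that checks whether a client address falls inside a list of allowed addresses and CIDR ranges. It is not SAP code. The `ALLOWED_RANGES` values (taken from the address blocks reserved for documentation) and the `is_allowed` helper are hypothetical, and the filtering actually performed by SAP HANA Cloud may differ.

```python
import ipaddress

# Hypothetical allowlist: documentation-reserved addresses standing in
# for a company's real IP addresses and ranges.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.17"]


def is_allowed(client_ip, allowed=ALLOWED_RANGES):
    """Return True if client_ip matches any allowed address or CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    # ip_network() turns a single address into a one-address network,
    # so plain addresses and CIDR ranges are handled the same way.
    return any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in allowed
    )


print(is_allowed("203.0.113.45"))   # inside the /24 range
print(is_allowed("198.51.100.17"))  # exact single-address match
print(is_allowed("192.0.2.1"))      # not on the list
```

A check like this can be useful before you open the wizard, for example to confirm that the ranges you plan to enter really cover the machines that need access.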
After specifying the Basics and Connection information, continue with the Next Step (3).
In the Create Instance – Data Lake Relational Engine step, you enable the data lake relational engine. If you enable the data lake relational engine, you can use the data lake to ingest, store, and analyze high volumes of data in a disk-based database. If you don't enable the data lake relational engine, you only get a file-based data lake with limited capabilities.
In the Size area, you can specify how many vCPUs will be allocated for the Coordinator and the Workers. The number of vCPUs for the Compute instance is calculated from the number of vCPUs for the Coordinator and the Workers. You can also specify how many Workers should be created.
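As a rough illustration of this sizing arithmetic, the following Python sketch adds up the vCPUs. It assumes that the Workers value is a per-worker figure and that the Compute total is the simple sum of the Coordinator vCPUs and all Worker vCPUs. The lesson does not state the exact formula, so treat the calculation and the function name `total_compute_vcpus` as assumptions.

```python
def total_compute_vcpus(coordinator_vcpus, vcpus_per_worker, worker_count):
    """Estimate total compute vCPUs (assumed: coordinator + all workers)."""
    if min(coordinator_vcpus, vcpus_per_worker, worker_count) < 0:
        raise ValueError("vCPU and worker counts must not be negative")
    # Assumption: every worker gets the same number of vCPUs.
    return coordinator_vcpus + vcpus_per_worker * worker_count


# Example: a 2-vCPU coordinator with two 4-vCPU workers.
print(total_compute_vcpus(2, 4, 2))  # 2 + 4 * 2 = 10
```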
For the data lake relational engine option, you must also provide a strong password for the database administrator (HDLADMIN).
In the Create Instance – Data Lake Relational Engine Advanced Settings (optional) step, you specify whether the data lake must be maximally compatible with SAP HANA or SAP Relational Engine. Which option you choose depends on the applications you want to connect to the data lake and which of the two they are designed for. If you develop your own applications on top of the data lake, the developers' knowledge of SAP HANA or SAP Relational Engine is the decisive factor.
If you decide to use SAP Relational Engine, then you can choose some general options that are specific to SAP Relational Engine. The options are:
- Collation. A collation describes how to sort and compare characters from a particular character set or encoding. You can choose from more than 30 different collations.
- Case Sensitivity. If required, case sensitivity can be switched on or off for collations.
- NChar Collation. For the NChar data type, you can also specify the collation to use. Here you can choose between UTF8BIN and UCA. Note: The NChar data type stores Unicode character data.
- NChar Case Sensitivity. The NChar case sensitivity behavior can be specified as well. The options are Ignore, Respect, UpperFirst, or LowerFirst.
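To see why collation and case sensitivity matter, the following Python sketch contrasts a binary-style ordering (comparing raw UTF-8 bytes) with a case-insensitive ordering. It only illustrates the general concept. It does not reproduce the UTF8BIN or UCA collations of the data lake, and the helper names are hypothetical.

```python
def binary_sort(words):
    """Sort by raw UTF-8 bytes, so uppercase letters sort before lowercase."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


def case_insensitive_sort(words):
    """Sort while ignoring case (ties keep their original order)."""
    return sorted(words, key=str.casefold)


def equal(a, b, case_sensitive=True):
    """Compare two strings with or without respecting case."""
    return a == b if case_sensitive else a.casefold() == b.casefold()


words = ["banana", "apple", "Cherry", "Apple"]
print(binary_sort(words))            # ['Apple', 'Cherry', 'apple', 'banana']
print(case_insensitive_sort(words))  # ['apple', 'Apple', 'banana', 'Cherry']
print(equal("Apple", "apple"))                        # False
print(equal("Apple", "apple", case_sensitive=False))  # True
```

The same data can therefore sort and compare differently depending on the settings you choose here, which affects the results of ordered queries and equality filters.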
By default, the provisioning wizard configures and schedules data lake backups. You can manually disable this feature if no backups are required.
Customer Controlled Key Management
The Encryption Key Management Service is also available for SAP HANA data lake. It allows you to use the customer-controlled encryption key (CCEK) feature, which integrates with the SAP Data Custodian Key Management Service (KMS). Follow the links to find more information on how to use the Customer Controlled Key Management Services and SAP Data Custodian.
After you finish the wizard, the SAP HANA Cloud, data lake instance is created and started.
In SAP HANA Cloud Central, you can see the difference between an SAP HANA Cloud, data lake (standalone data lake) and an integrated data lake attached to an SAP HANA Cloud, SAP HANA database.
The standalone data lake is represented under its own header, which shows the data lake instance name you specified.
The integrated data lake is represented under the header of the SAP HANA database it’s associated with.