Enhance Data Product in SAP Databricks

Objective

After completing this lesson, you will be able, as a Data Scientist, to enhance data products in SAP Databricks.

Contents

  • Persona
  • Log On to SAP Databricks
  • SAP Databricks Enhancing Data Products
  • Usage of sap-bdc-connect-sdk
  • Verify Published Data Product

Persona

In this lesson, we will work with a data product in SAP Databricks that was shared from SAP Business Data Cloud and create an enhanced version of it. We will then share the new data product with SAP Datasphere using the SAP Business Data Cloud SDK.

Prerequisites

  • You are logged on to SAP Databricks.

Log On to SAP Databricks

  1. Open a Chrome browser or Microsoft Edge browser and enter the SAP Databricks URL.

    Alternatively, click here.

  2. Provide Email and select Continue.

    • Email: @sapexperienceacademy.com
    • Password:

    525_2_image02
  3. Select Continue with SSO.

    525_2_image03
  4. Select the default workspace to continue.

    525_2_image04
  5. The SAP Databricks welcome page opens.

    525_2_image05

SAP Databricks Enhancing Data Products

  1. Select Catalog in the navigation menu on the left.

    525_2_image35
  2. Under Delta Shares Received, expand companycode_share -> companycode -> and select the companycode table.

This Data Product was shared from SAP Business Data Cloud to SAP Databricks and is available for consumption.

    525_2_image36
  3. Switch to the Sample Data tab. If a compute resource isn't started, choose Select compute.

    525_2_image07
  4. Select Serverless and Start and Close.

    525_2_image08
  5. Once the service starts, the sample data should appear.

    Here we are accessing the remote data shared from SAP Business Data Cloud.

    525_2_image09
  1. Under My organization catalog section, expand company_code_data_product -> company_code -> and select the company_code_clusters table.

    This table has been populated with company code clusters. It’s the enhanced dataset we will share back to SAP Datasphere after reviewing how we created the table.

    525_2_image10
  2. Switch to the Sample Data tab to view data.

    This table has been created and populated to store company code clusters using a Python notebook. We won’t be reprocessing the notebook again as part of this lesson but will review how data has been enhanced.

    525_2_image11
  3. Select Workspaces in the left navigation menu.

    525_2_image37
  1. Expand Workspace folder and select Project_Artifacts folder.

    525_2_image37
  2. See the Company_Clustering notebook used to create and populate the company_code_clusters table. Select the file to open it.

    525_2_image61
  3. Read through the explanations and the code in the notebook, which describe step by step how the company code clusters were created.

    525_2_image16
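The lesson does not reproduce the notebook's code, but the kind of step it performs can be pictured with a toy, standard-library-only clustering sketch. The feature values, the choice of a simple 1-D k-means, and all parameters below are illustrative assumptions, not the Company_Clustering notebook's actual logic:

```python
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """Toy 1-D k-means, illustrating the kind of clustering a notebook
    like Company_Clustering might apply to a numeric company-code feature.
    (The lesson does not show the actual algorithm used.)"""
    random.seed(seed)
    centers = random.sample(values, k)
    for _ in range(iters):
        # Assign each value to its nearest center
        groups = {i: [] for i in range(k)}
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return centers

revenue = [1.0, 1.2, 0.9, 10.0, 11.0, 9.5]  # made-up feature values
print(sorted(round(c, 2) for c in kmeans_1d(revenue, k=2)))
```

With two well-separated groups like this, the two centers converge to the group means regardless of the random start.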
  4. Now we will share the company code clusters table via Delta Share and publish it to SAP Datasphere as a new data product.

    Select Catalog in the left navigation menu.

    525_2_image39
  5. Under My organization catalog section, expand company_code_data_product -> company_code and select the company_code_clusters table.

    525_2_image40
  6. Next select Share and choose Share via Delta Sharing.

    525_2_image41
  7. Select Create a new share with the table.

    Provide Share name and Recipients:

    • Share name : company_code_clustering_share_

      Note

Make sure the username in the Share name is lowercase. In the following steps we will execute code in a notebook, and that code is case-sensitive, so all characters in the username must be lowercase.
    • Recipients : sap-business-data-cloud

    Select Share.

    525_2_image18
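Because the notebook code later references this share name and is case-sensitive, it can help to think of the name as being built like this. The helper function below is ours, purely illustrative, and not part of the lesson or the SDK:

```python
def build_share_name(username: str) -> str:
    # The notebook code is case-sensitive, so the username part of the
    # share name must be all lowercase. (Illustrative helper only.)
    return f"company_code_clustering_share_{username.lower()}"

print(build_share_name("JDoe"))
```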
  1. Select the Gear icon on the catalog and then select Delta Sharing.

    525_2_image19
  2. Switch to Shared by me and filter for your username (no space at the end) and select the delta share just created.

    Note

    Refresh the browser if the data share does not appear.

    525_2_image20
  3. Assets in the share you created will be displayed.

    525_2_image21

Usage of sap-bdc-connect-sdk

In this section we will import another notebook and execute five code cells:

  • Install SDK
  • Create a client
  • Create a share
  • Create the share CSN
  • Publish a Data Product
  1. Download the notebook we will use to publish a data product from SAP Databricks here (right-click and choose Save link as).

  2. After saving the notebook locally, select Workspace in the left navigation, expand Workspace -> Users, right-click your username, and select Import.

    525_2_image12
  3. Select Browse to locate the file.

    525_2_image23
  4. Import the Publish_Data_Product_Company_Clustering.py notebook.

    525_2_image24
  1. Select the Publish_Data_Product_Company_Clustering file to open the notebook.

    525_2_image25
  2. Select Environment on the right panel to change default settings.

    525_2_image25
  3. Set the environment version to 3 and Apply the change.

    525_2_image25
  4. Confirm the change.

    525_2_image25
  5. Execute the first code block by clicking Run on the upper left corner of the cell.

This code installs the SDK. It should take about a minute to complete. A green check mark appears next to Run once it completes.

    Note

Ignore version-related warnings and pip dependency errors.

    525_2_image26
  6. Execute the second code block.

    This code creates a client:

    • DatabricksClient receives dbutils as a parameter, which is a SAP Databricks utility that can be used inside the Databricks notebooks
    • BdcConnectClient receives the DatabricksClient as a parameter to get information from the SAP Databricks environment (e.g. secrets, api_token, workspace_url_base)

    525_2_image27
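The wiring between the two clients can be pictured with a small stand-in sketch. These classes are simplified stand-ins, not the real sap-bdc-connect-sdk classes, and the workspace URL below is a placeholder:

```python
# Stand-in sketch of the client wiring described above; the real SDK
# classes have the same relationship but richer behavior.
class DatabricksClient:
    def __init__(self, dbutils):
        # dbutils is the Databricks notebook utility object
        self.dbutils = dbutils

    @property
    def workspace_url_base(self):
        # In the real SDK this is read from the workspace context
        return "https://<workspace>.databricks.com"


class BdcConnectClient:
    def __init__(self, databricks_client):
        # Pulls environment info (secrets, api_token, workspace_url_base)
        # from the wrapped DatabricksClient
        self.databricks_client = databricks_client


# Inside a Databricks notebook, dbutils is provided automatically;
# None stands in for it here.
client = BdcConnectClient(DatabricksClient(dbutils=None))
print(client.databricks_client.workspace_url_base)
```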
  1. Execute the third code block to create a share.

    A share is a mechanism for distributing and accessing data across different systems. Creating or updating a share involves including specific attributes, such as @openResourceDiscoveryV1, in the request body, aligning with the Open Resource Discovery protocol. This procedure ensures that the share is properly structured and described according to specified standards, facilitating effective data sharing and management.

    Note

There are two places you need to modify in this code: share_name and title. Make sure <lowercase_username> in the share_name is lowercase.
    • share_name : “company_code_clustering_share_<lowercase_username>”
    • title : “Company Code Clustering Data Product From

    525_2_image28
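The description above can be pictured as a request body along these lines. The nesting is a guess based only on the fields the lesson names (share_name, title, and the @openResourceDiscoveryV1 attribute); the actual payload expected by sap-bdc-connect-sdk may be structured differently:

```python
import json

# Hypothetical request-body sketch: only the field names come from the
# lesson; the layout is an assumption.
share_request = {
    "share_name": "company_code_clustering_share_<lowercase_username>",
    "@openResourceDiscoveryV1": {
        "title": "Company Code Clustering Data Product From ...",
    },
}

print(json.dumps(share_request, indent=2))
```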
  2. Execute the fourth code block to create the CSN.

    The CSN serves as a standardized format for configuring and describing shares within a network. To create or update the CSN for a share, it’s advised to prepare the CSN content in a separate file and include this content in the request body. This approach ensures accuracy and compliance with the CSN interoperability specifications, facilitating consistent and effective share configuration across systems.

    Note

    Make sure share_name is in lowercase as in the previous step.
    • share_name: “company_code_clustering_share_<lowercase_username>”

    525_2_image29
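CSN (Core Schema Notation) content is JSON that describes entities and their elements. A minimal sketch for the shared table might look as follows; the element names and types are illustrative assumptions, not the real schema of company_code_clusters:

```python
import json

# Illustrative CSN skeleton for the shared table. As the lesson advises,
# in practice this content is kept in a separate file and loaded into
# the request body.
csn = {
    "definitions": {
        "company_code_clusters": {
            "kind": "entity",
            "elements": {
                "CompanyCode": {"type": "cds.String"},  # assumed element
                "Cluster": {"type": "cds.Integer"},     # assumed element
            },
        }
    }
}

print(json.dumps(csn, indent=2))
```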
  3. Execute the fifth code block to publish the data product to SAP Datasphere.

    A Data Product is an abstraction that represents a type of data or data set within a system, facilitating easier management and sharing across different platforms. It bundles resources or API endpoints to enable efficient data access and utilization by integrated systems. Publishing a Data Product allows these systems to access and consume the data, ensuring seamless communication and resource sharing.

    Note

    Make sure share_name is in lowercase as in the previous step.
    • share_name: “company_code_clustering_share_<lowercase_username>”
      525_2_image30

Verify Published Data Product

Log on to SAP Datasphere.

  1. Open a Chrome browser or Microsoft Edge browser and enter the SAP Datasphere URL.

    Alternatively, click here.

  1. Log in with your user credentials.

    Username:

    Password:

    525_3_image08
  2. Once the SAP Datasphere welcome page opens, select Catalog & Marketplace and then Search.

    525_2_image32
  1. Show the filters using the filter icon, then filter for Data Products and for SAP Databricks (System Type).

    525_3_image08
  2. In display options (upper right corner) switch to Display as List.

    525_3_image08

Note

The published data product will not appear in SAP Datasphere immediately, because publishing triggers a series of actions that run at scheduled intervals. If you don't see your data product, wait a few minutes and try again. Alternatively, you can continue with the next lesson using the pre-delivered data product Company Code Clustering Data Product published in SAP Datasphere.
  1. Enter your search term in the search bar and select Search (or press Enter).

    The data product you just published from SAP Databricks should appear first in the list.

    525_2_image33
  2. Open the data product to verify it’s the one you published from SAP Databricks.

    525_2_image33
  3. Go back to Home.

    525_2_image34

Congratulations! You have successfully published a Data Product from SAP Databricks.

