Text Embedding
Text embedding is the process of converting words into a numerical form that can be understood and utilized by machine learning algorithms. This method represents a word or a piece of text as a high-dimensional vector within a predefined space, where words or text with similar meanings are positioned closer to one another.
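To make "positioned closer to one another" concrete, here is a minimal sketch that compares toy vectors with cosine similarity. The three-dimensional vectors and the words they stand for are invented for illustration only; they are not output from any SAP model, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-picked 3-dimensional "embeddings" (assumption, not model output).
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}

sim_related = cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"])
sim_unrelated = cosine_similarity(toy_embeddings["king"], toy_embeddings["banana"])

# Related words should score higher than unrelated ones.
print(f"king vs queen:  {sim_related:.3f}")
print(f"king vs banana: {sim_unrelated:.3f}")
```

A value near 1 means the vectors point in almost the same direction, which is how similarity searches interpret "similar meaning".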
Creating text embeddings in SAP HANA Cloud involves using the VECTOR_EMBEDDING function to convert text into vector representations. This functionality is useful for enabling advanced text search and similarity queries within the SAP HANA Cloud database.
Introduction
SAP HANA Cloud can create vector embeddings for texts with the VECTOR_EMBEDDING Function (Vector).
The VECTOR_EMBEDDING function receives the following arguments:
- The first argument is the text to be embedded.
- The second argument specifies the type of the text. It must be either 'DOCUMENT' or 'QUERY'. 'DOCUMENT' must be used for texts that are stored in the database and that must be searchable. 'QUERY' must be used for texts that are used in queries.
- The third argument specifies which model and which version of the model are used. A list of available models and versions is available in VECTOR_EMBEDDING Function (Vector).
The VECTOR_EMBEDDING function creates a vector embedding for the passed text using the model identified by the passed model and version string.
The dimension of the returned vector depends on the used model.
Models might have limits on the number of tokens in the text. Tokens beyond the token limit will be ignored.
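The effect of a token limit can be sketched as follows. This is only an illustration under a strong assumption: tokens are treated as whitespace-separated words, whereas real embedding models use their own tokenizers, so actual token counts will differ. The helper name `truncate_to_token_limit` is hypothetical.

```python
def truncate_to_token_limit(text, token_limit):
    """Keep only the first `token_limit` tokens of `text`.

    Assumption: tokens are whitespace-separated words. Real embedding models
    use their own tokenizers, so actual token counts will differ.
    """
    tokens = text.split()
    return " ".join(tokens[:token_limit])


long_text = "alpha beta gamma delta epsilon"
short_limit = 3  # tiny limit chosen for illustration only

# Everything after the limit has no influence on the resulting embedding.
visible_to_model = truncate_to_token_limit(long_text, short_limit)
print(visible_to_model)
```

In practice this means that long texts are best split into paragraphs that fit within the limit of the chosen model before they are embedded.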
Available models and versions
| Model Name and Version | Vector Dimension | Token Limit | Supported Languages | Description |
|---|---|---|---|---|
| SAP_NEB.20240715 | 768 | 256 | de, en, es, fr, pt | This SAP HANA text embedding model is based on the transformer architecture. It encodes textual information into vectors that can be used for tasks such as information retrieval, text classification, clustering, and semantic similarity search. |
Try it out!
The following section uses the SQL console within Lobby to work through some practical examples.
The following statement demonstrates how the function is used. Copy the query and paste it into a new SQL console window in the Database Explorer. Then run the query by pressing the 'Run' icon or by pressing the 'F8' key:
    SELECT VECTOR_EMBEDDING('Hello world!', 'DOCUMENT', 'SAP_NEB.20240715') FROM DUMMY;

- The first argument is the text that should be embedded (Hello world!).
- The second argument specifies the type of the text. In this case, 'DOCUMENT'.
- The third argument specifies which model and which version of the model should be used (SAP_NEB.20240715).
This results in a vector representation of the text 'Hello world!':

How to Automatically Create Vector Embeddings
It's possible to use generated columns or triggers to automatically create embeddings for texts on insert and update. Generated columns are the simpler approach, but they cannot refer to NCLOB columns. Therefore, if texts are stored in NCLOB columns, triggers must be used. If the model has a small token limit, NVARCHAR(5000) columns should be sufficient, because tokens beyond the limit are ignored anyway.
Let’s look at both methods for automatic generation of embeddings.
Create Embeddings with Generated Columns
A typical table definition for storing text paragraphs and their automatically generated embedding could look like this:
- Copy the query below and run it in the SQL console window of the Database Explorer to create a table called MY_EMBEDDINGS:

      CREATE TABLE MY_EMBEDDINGS (
        TITLE NVARCHAR(100),
        PARAGRAPH_ID INTEGER,
        PARAGRAPH NVARCHAR(5000),
        EMBEDDING REAL_VECTOR GENERATED ALWAYS AS VECTOR_EMBEDDING(PARAGRAPH, 'DOCUMENT', 'SAP_NEB.20240715')
      );

- Run the following query to insert a new row into the table:

      INSERT INTO MY_EMBEDDINGS (TITLE, PARAGRAPH_ID, PARAGRAPH)
        VALUES ('Hello world!', 1, 'Hello world!');

- Observe the results to make sure the query was executed successfully:

- Inspect the table from the catalog to verify that the embedded data has been automatically created:

As can be seen, the embeddings have been automatically generated and stored in the EMBEDDING column:

Create Embeddings using Triggers
Another way to generate embeddings on the fly is by using triggers. This is also the option to use when the column to be embedded is of type NCLOB.
The following SQL creates a table called MY_EMBEDDINGS_T, which uses the NCLOB data type for the PARAGRAPH column. It also defines a trigger on this table that is invoked whenever a row is inserted or the PARAGRAPH column is updated.
- Copy this query into the SQL console in the Database Explorer and run it:

      CREATE TABLE "MY_EMBEDDINGS_T" (
        TITLE NVARCHAR(100),
        PARAGRAPH_ID INTEGER,
        PARAGRAPH NCLOB,
        EMBEDDING REAL_VECTOR
      );

      CREATE TRIGGER CREATE_MY_EMBEDDING
        BEFORE INSERT OR UPDATE OF PARAGRAPH ON "MY_EMBEDDINGS_T"
        REFERENCING NEW ROW AS newrow
        ONLINE
        BEGIN
          newrow.EMBEDDING = VECTOR_EMBEDDING(
            :newrow.PARAGRAPH, 'DOCUMENT', 'SAP_NEB.20240715');
        END;

- Verify the table has been created and note the column data types.

- Insert two new rows into the table by running the following:

      INSERT INTO MY_EMBEDDINGS_T (TITLE, PARAGRAPH_ID, PARAGRAPH)
        VALUES ('Hello world!', 1, 'Hello world!');
      INSERT INTO MY_EMBEDDINGS_T (TITLE, PARAGRAPH_ID, PARAGRAPH)
        VALUES ('SAP', 2, 'SAP was started in 1972 by five former IBM employees with a vision of creating a standard application software for real-time business processing.');

- Observe the results by inspecting the table data:

      SELECT * FROM "MY_EMBEDDINGS_T";

Each insert has invoked the trigger, which called the VECTOR_EMBEDDING function automatically, and the EMBEDDING column has been populated:

How to Write Queries
Use the VECTOR_EMBEDDING function in top-k similarity searches to directly create an embedding from a question. An example using the table definition from the previous section could look like this:
- Run the following SQL statement:

      SELECT TOP 5 TITLE, PARAGRAPH,
        COSINE_SIMILARITY(VECTOR_EMBEDDING('When was SAP founded?', 'QUERY', 'SAP_NEB.20240715'), EMBEDDING) AS "SIMILARITY"
      FROM MY_EMBEDDINGS_T
      ORDER BY "SIMILARITY" DESC;
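The ranking performed by a top-k similarity query can be sketched outside the database as follows. This is a conceptual illustration only: the three-dimensional vectors are made up rather than produced by SAP_NEB.20240715, the "Weather" row is hypothetical, and the helper `top_k` simply mirrors the combination of COSINE_SIMILARITY, ORDER BY ... DESC, and TOP.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k):
    """Score every row against the query and return the k best, highest first."""
    scored = [(title, cosine_similarity(query_vec, emb)) for title, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical stored rows: (title, made-up embedding).
rows = [
    ("Hello world!", [0.1, 0.9, 0.2]),
    ("SAP", [0.8, 0.1, 0.6]),
    ("Weather", [0.2, 0.3, 0.9]),
]

# Made-up embedding standing in for the question text.
query_vec = [0.7, 0.2, 0.5]

results = top_k(query_vec, rows, k=2)
for title, score in results:
    print(f"{title}: {score:.3f}")
```

The database performs the same steps in a single statement, with the added benefit that the question is embedded by the same model as the stored paragraphs.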
This lesson discussed the benefits of text embeddings and provided a hands-on SQL tutorial for text embedding in SAP HANA Cloud.
The next lesson looks at multi-modeling with Text Embedding, Vector Engine, Spatial Engine and Document Store.