You can access your results directly by opening the output data set or, depending on your business needs, consume the output data set in a BI story. If the application data set contains more columns than your training data set, the application process ignores the additional columns.
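The column handling described above can be pictured with a minimal sketch. It is not Smart Predict's implementation: the function `select_known_columns`, the column names, and the sample row are all hypothetical, and a row is assumed to be a plain dictionary.

```python
def select_known_columns(application_row, training_columns):
    """Keep only the columns that also exist in the training data set.

    Hypothetical helper: columns that appear only in the application
    data set are dropped, mirroring the "ignored" behavior described above.
    """
    return {
        name: value
        for name, value in application_row.items()
        if name in training_columns
    }


# Assumed example data: "campaign_code" was not part of the training data set.
training_columns = {"customer_id", "age", "income"}
application_row = {
    "customer_id": 17,
    "age": 42,
    "income": 51000,
    "campaign_code": "X9",
}

filtered = select_known_columns(application_row, training_columns)
print(filtered)
```

In this sketch, the extra `campaign_code` column has no effect on the result, which is the point of the rule: additional columns in the application data set do not cause an error, they are simply not used.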
Apply Predictive Model
Open the relevant predictive model, then select the Apply Predictive Model icon to open the apply model dialog.
Apply to Population
In the Apply to Population section, use the Data Source field to select the new data set (the application data set) to which you want to apply your predictive model.
Generated Data Set
In this section, you select the additional columns that you want to include in your output data set.
Replicated Columns: Select the variables of the training data set that you want to include in the output data set. The application process does not consider any columns of the application data set that do not belong to the training data set.
Statistics & Predictions: In the Statistics & Predictions dropdown, select the data that you want to add to the output data set. If you don't select any statistics or predictions, only the target variable and the key variables are included.
The Statistics & Predictions options include:
- Apply Date: The start date of the predictive model application. The column type is TIMESTAMP.
- Train Date: The start date of the predictive model training. The column type is TIMESTAMP.
- Assigned Bin: During the application step, Smart Predict uses the bins defined in the training step to assign each observation from the input data set to the relevant bin. It compares each value obtained by the predictive model with the limits of each bin defined in the training step, then assigns the observation to the bin whose limits contain that value. In the example below, the table on the left shows how Smart Predict arranged the data when building the model, and the output data set shows the result when the model was applied to an input data set containing observations on 700 customers.
- Outlier Indicator: For each row in the application data set, the Outlier Indicator is 1 if the row is an outlier with respect to the target; otherwise, it is 0. An observation is considered an outlier when its prediction error is greater than three times the average prediction error found on similar observations.
- Predicted Category: Classification predictive models use a nominal target with two values only. For each row in the application data set, the Predicted Category is the target category determined by the predictive model.
- Predicted Probability: Classification predictive models use a nominal target with two values only. For each row in the application data set, the Predicted Probability is the probability that the Predicted Category is the target value.
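The Assigned Bin option above compares each predicted value with bin limits fixed during training. A minimal sketch of that lookup follows. It is an illustration under assumptions, not Smart Predict's actual code: the function `assign_bin`, the limit values, and the choice to place a value that equals a limit in the upper bin are all hypothetical.

```python
from bisect import bisect_right


def assign_bin(predicted_value, bin_limits):
    """Return the 1-based bin number for a predicted value.

    bin_limits is a sorted list of inner boundaries taken from the
    training step (hypothetical values here). Assumption: a value that
    equals a boundary falls into the upper bin.
    """
    return bisect_right(bin_limits, predicted_value) + 1


# Assumed boundaries, giving four bins:
# bin 1: below 0.25, bin 2: [0.25, 0.5), bin 3: [0.5, 0.75), bin 4: 0.75 and above
bin_limits = [0.25, 0.5, 0.75]

predictions = [0.1, 0.25, 0.6, 0.9]
assigned = [assign_bin(p, bin_limits) for p in predictions]
print(assigned)  # [1, 2, 3, 4]
```

The limits are never recomputed from the application data in this sketch. They are an input, which reflects the description above: bins are defined once at training time and only looked up at application time.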
Output As: Give a name to your generated data set.
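The Outlier Indicator rule listed under Statistics & Predictions (prediction error greater than three times the average prediction error on similar observations) can be written as a small check. This is a sketch only: how Smart Predict identifies "similar observations" and measures the error is not stated here, so the function `outlier_indicator` takes the average error as a ready-made input and assumes the error is compared as an absolute value.

```python
def outlier_indicator(prediction_error, average_error_similar, factor=3.0):
    """Return 1 if the observation is an outlier, otherwise 0.

    Assumptions (not confirmed by the documentation): the error is
    compared as an absolute value, and average_error_similar has
    already been computed for the group of similar observations.
    """
    if abs(prediction_error) > factor * average_error_similar:
        return 1
    return 0


# Hypothetical errors, each paired with the average error of its similar observations.
observations = [(10.0, 3.0), (9.0, 3.0), (-12.0, 3.5)]
flags = [outlier_indicator(err, avg) for err, avg in observations]
print(flags)  # [1, 0, 1]
```

The comparison is strict, so an error of exactly three times the average is not flagged, which matches the "greater than" wording of the rule.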