Input Data Set Recap for Apply Regression Scenarios
The application data set must contain the same information structure as the corresponding training data set, as follows:
- The same number of variables (extra columns are ignored).
- The same variable names as in the corresponding training data set.
- The same variable order.
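Before applying the model, you can check these requirements on your side. The following Python sketch uses a hypothetical helper that is not part of Smart Predict. It assumes you have the column names of both data sets as lists, treats extra application columns as ignorable, and reports training variables that are missing or that appear in a different order.

```python
from collections.abc import Sequence


def check_application_columns(
    training_columns: Sequence[str], application_columns: Sequence[str]
) -> list[str]:
    """Return a list of problems; an empty list means the structure matches.

    Hypothetical helper, not a Smart Predict API. Extra application columns
    are ignored, so only presence and relative order of training variables are checked.
    """
    problems = []
    training_set = set(training_columns)
    missing = [c for c in training_columns if c not in application_columns]
    if missing:
        problems.append(f"missing variables: {missing}")
    # Training variables as they appear in the application data set (extras dropped).
    present = [c for c in application_columns if c in training_set]
    # The same variables, in the order used during training.
    expected = [c for c in training_columns if c in present]
    if present != expected:
        problems.append(f"variables out of order: {present}")
    return problems


train = ["customer_id", "age", "region", "deal_value"]
print(check_application_columns(train, ["customer_id", "age", "region", "deal_value", "notes"]))  # []
print(check_application_columns(train, ["customer_id", "region", "age", "deal_value"]))
```

The second call reports that `age` and `region` are swapped; the extra `notes` column in the first call is not reported because it would be ignored.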
By analyzing the training data set, Smart Predict generates a regression model that explains and predicts the target variable based on the variables identified as influencers.
Once the regression model is trained, it can be applied to an application data set, generating the predicted values of the target in the output data set.
Apply Your Predictive Model
Open the relevant predictive model and select the Apply Predictive Model icon to open the Apply Predictive Model dialog.

Apply to Population
In the Data Source field, select the new data set (application data set) onto which you want to apply your predictive model.

Generated Data Set
In this section, you have several options to select the additional columns you want to include in your output data set.
Replicated columns: Select the variables from the training data set that you want to include in the output data set. The application process ignores any columns in the application data set that do not belong to the training data set.
Statistics & Predictions: From the Statistics & Predictions dropdown list, select the statistics and predictions to add to the output data set. If you do not select any statistics or predictions, only the target variable and the key variables are included.

The Statistics & Predictions options include:
- Apply Date: The apply date is the start date of the predictive model application. The column type is TIMESTAMP.
- Train Date: The train date is the start date of the predictive model training. The column type is TIMESTAMP.
- Assigned Bin: When you apply a regression predictive model to an input data set, you can add the assigned bin of each observation to the output data set. During the training step, Smart Predict uses past observations in a training data set to create a predictive model. In the application step, Smart Predict associates each observation with a predicted value.
- The bins are defined during training: observations are ranked from the highest to the lowest predicted value and grouped into 10 bins (or groups). Each bin represents 10% of the observations, and within each bin, the observations either have the same value or fall within a range.
- In the application step, Smart Predict compares each value obtained by the predictive model with the bin limits defined in the training step, and assigns each observation from the input data set to the relevant bin.
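The binning logic can be sketched in Python as follows. This is an illustration under assumptions, not Smart Predict's implementation: the function names are hypothetical, the bin limits are taken as deciles of the training predictions (computed with the standard library's `statistics.quantiles`), and bin 1 is assumed to hold the highest predicted values.

```python
import bisect
import statistics


def bin_limits(training_predictions: list[float], n_bins: int = 10) -> list[float]:
    """Cut points (ascending) that split training predictions into n_bins equal-count groups."""
    return statistics.quantiles(training_predictions, n=n_bins, method="inclusive")


def assign_bin(prediction: float, limits: list[float]) -> int:
    """Return 1 for the highest-value bin down to len(limits) + 1 for the lowest."""
    # bisect_right counts how many limits are <= prediction (0 means lowest bin).
    ascending_index = bisect.bisect_right(limits, prediction)
    return len(limits) + 1 - ascending_index


# Illustrative training predictions 1.0 to 100.0 give 10 observations per bin.
limits = bin_limits([float(x) for x in range(1, 101)])
print(assign_bin(95.0, limits), assign_bin(15.0, limits))  # 1 9
```

The limits are computed once from the training predictions and then reused unchanged for every application observation, which is what makes the bin shares comparable between training and application.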
In the following example, a regression model is used to predict deal values for the next quarter. The data set contains observations on 3,000 customers. Assigned bins are used to monitor the population structure: each bin should contain approximately 10% of the observations. If this share increases or decreases for one or several bins, the population is changing, and the predictive model may need to be retrained with more recent data.
- On the left, the distribution per bin in the output data set is similar to the distribution in the training data set.
- On the right, 14% of customers in the apply data set fall in the top bin, which is more than the 10% expected from the build (training) data set.
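To monitor the population structure as in this example, you can compute the share of observations per assigned bin and flag bins that move away from the expected 10%. In this Python sketch, the function names and the 3-percentage-point tolerance are hypothetical choices, and the input counts are illustrative values built to reproduce a 14% top bin among 3,000 customers.

```python
from collections import Counter


def bin_shares(assigned_bins: list[int], n_bins: int = 10) -> dict[int, float]:
    """Share of observations (0.0 to 1.0) in each bin, numbered 1..n_bins."""
    counts = Counter(assigned_bins)
    total = len(assigned_bins)
    return {b: counts[b] / total for b in range(1, n_bins + 1)}


def drifting_bins(
    assigned_bins: list[int], n_bins: int = 10, tolerance: float = 0.03
) -> dict[int, float]:
    """Bins whose share differs from the expected 1 / n_bins by more than tolerance."""
    expected = 1 / n_bins
    shares = bin_shares(assigned_bins, n_bins)
    return {b: s for b, s in shares.items() if abs(s - expected) > tolerance}


# Illustrative counts for 3,000 customers: 420 (14%) in the top bin.
assigned = [1] * 420 + [b for b in range(2, 10) for _ in range(290)] + [10] * 260
print(drifting_bins(assigned))  # {1: 0.14}
```

With these counts, only the top bin is flagged; the other bins, between roughly 8.7% and 9.7%, stay within the chosen tolerance.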

- Outlier Indicator: For each row in the application data set, the outlier indicator is 1 if the row is an outlier with respect to the target, and 0 otherwise. An observation is considered an outlier when its prediction error is greater than three times the average prediction error found on similar observations.
- Predicted Value: Adds the value predicted by the regression model to the output data set.
- Prediction Explanation: Displays the reasons why Smart Predict generated a specific prediction for a specific entity of the application data set.
An explanation (or reason) is a combination of a variable and its value; for example, age: 35. It corresponds to the value that a given variable takes for the entity and that contributes to the specific prediction.
The strength indicates how much this value affects the prediction and the direction of that impact. In a regression model, a positive strength increases the predicted value, while a negative strength decreases it.
Smart Predict can generate up to 10 explanations. When the predictive model uses more than 10 influencers to generate the predictions, Smart Predict aggregates the explanations with the lowest absolute strength (the least contributing influencers) into two groups:
- Positive Others: aggregates the smallest positive influencers.
- Negative Others: aggregates the smallest negative influencers.
The strength associated with Positive Others and Negative Others is the sum of the strengths of the aggregated explanations. When the predictive model uses 10 or fewer influencers to generate predictions, these Others groups aren't generated because the list of explanations is already complete.
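The aggregation rule for explanations can be sketched in Python. This is a hypothetical helper, not a Smart Predict API: it assumes you already have a signed strength for each explanation (labeled as `variable: value`), keeps the `max_explanations` strongest by absolute strength, and sums the remaining ones by sign. Whether the two Others groups count toward the limit of 10 is not stated, so keeping 10 individual explanations plus the groups is an assumption here.

```python
def summarize_explanations(
    strengths: dict[str, float], max_explanations: int = 10
) -> list[tuple[str, float]]:
    """Rank explanations by absolute strength and aggregate the weakest ones.

    Hypothetical helper: `strengths` maps labels such as "age: 35" to a signed
    strength (positive raises the predicted value, negative lowers it).
    """
    ranked = sorted(strengths.items(), key=lambda item: abs(item[1]), reverse=True)
    if len(ranked) <= max_explanations:
        # 10 or fewer influencers: the list is complete, no Others groups.
        return ranked
    kept, rest = ranked[:max_explanations], ranked[max_explanations:]
    result = list(kept)
    # Zero-strength explanations fall in neither group in this sketch.
    positives = [s for _, s in rest if s > 0]
    negatives = [s for _, s in rest if s < 0]
    # Each group's strength is the sum of the strengths it aggregates.
    if positives:
        result.append(("Positive Others", sum(positives)))
    if negatives:
        result.append(("Negative Others", sum(negatives)))
    return result


# Illustrative, made-up strengths for 12 influencers.
values = [5.0, -4.0, 3.5, 3.0, -2.5, 2.0, 1.8, -1.5, 1.2, 1.1, 0.3, -0.2]
example = {f"var{i}: x": s for i, s in enumerate(values)}
for label, strength in summarize_explanations(example):
    print(label, strength)  # ends with Positive Others 0.3 and Negative Others -0.2
```

Sorting by absolute value rather than by signed value keeps strong negative influencers (such as `-4.0` above) among the individual explanations.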
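The Outlier Indicator rule can also be sketched in Python. Because the indicator compares a prediction with the target, this sketch assumes the actual target values are available. It also uses a hypothetical group key (for example, the assigned bin) as a stand-in for "similar observations", which is not defined precisely, and computes the average absolute error per group on the same rows; Smart Predict may derive these averages differently.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def outlier_indicator(
    actuals: Sequence[float],
    predictions: Sequence[float],
    groups: Sequence[Hashable],
    factor: float = 3.0,
) -> list[int]:
    """Return 1 for rows whose absolute error exceeds factor x their group's mean error, else 0."""
    errors = [abs(a - p) for a, p in zip(actuals, predictions)]
    totals = defaultdict(float)
    counts = defaultdict(int)
    for err, group in zip(errors, groups):
        totals[group] += err
        counts[group] += 1
    return [
        1 if err > factor * totals[group] / counts[group] else 0
        for err, group in zip(errors, groups)
    ]


# Illustrative data: nine small errors and one large error in the same group.
flags = outlier_indicator([10.0] * 10, [9.0] * 9 + [60.0], ["bin 3"] * 10)
print(flags)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Here the mean absolute error in the group is 5.9, so the threshold is about 17.7 and only the row with an error of 50 is flagged.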
Output As: Give a name to your generated data set.