# Using Platt Scaling and Isotonic Regression to Minimize LogLoss Error in R

• Shan says:

Nice article. Can you please elaborate on the following snippet:

k <- ldply(levels(bin.pred), function(x) {
idx <- x == bin.pred
c(sum(obs[idx]) / length(obs[idx]), mean(pred[idx]))
})

• NSS says:

@Shan

I would like to draw your attention to the text quoted in the article from the original author of reliability diagrams: “For each bin, the mean predicted value is plotted against the true fraction of positive cases. If the model is well calibrated the points will fall near the diagonal line.”

So the above snippet deals with two things: (1) the observed values and (2) the predicted values.

For each bin, it takes the mean of the predicted probabilities, mean(pred[idx]), and pairs it with the observed relative frequency of positive cases in that same bin, sum(obs[idx]) / length(obs[idx]). These pairs are then plotted against each other.

Ideally, the points should lie near the 1:1 line.
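To make the per-bin calculation concrete, here is a minimal sketch in base R. The data are simulated purely for illustration (they are not from the article), the ten equal-width bins are an assumption, and `sapply` stands in for the `ldply` call so that no extra package is needed.

```r
# Simulated data, for illustration only (not the article's dataset)
set.seed(42)
n    <- 1000
pred <- runif(n)                          # predicted probabilities
obs  <- rbinom(n, size = 1, prob = pred)  # observed 0/1 outcomes

# Assumption: 10 equal-width bins on [0, 1]
bin.pred <- cut(pred, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# For each bin: observed fraction of positives and mean predicted value
k <- t(sapply(levels(bin.pred), function(x) {
  idx <- x == bin.pred                    # rows whose prediction falls in bin x
  c(obs.freq  = sum(obs[idx]) / length(obs[idx]),
    mean.pred = mean(pred[idx]))
}))

# Reliability diagram: mean prediction vs. observed frequency, with the 1:1 line
plot(k[, "mean.pred"], k[, "obs.freq"], type = "b",
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Mean predicted value", ylab = "Observed fraction of positives")
abline(0, 1, lty = 2)
```

Because the simulated outcomes are drawn with probability equal to the prediction, the points here should sit close to the dashed diagonal; a poorly calibrated model would bend away from it.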

I hope I made this clear.

If you do not want to go into the details of reliability diagrams, you can use the reliability.plot function from the verification package.

Regards.

• Anurag says:

If it is a multi-class classification problem, what should be the approach to calibrating the outcomes?

• NSS says:

@Anurag, convert it into a one-vs-rest problem (each class against all the others) and proceed as above for each class.
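A rough sketch of how that one-vs-rest idea could look in R follows. Everything here is an assumption for illustration: the class names, the simulated scores, and especially the final renormalization step, which the reply above does not mention but is one common way to make the per-class probabilities sum to 1.

```r
# Simulated three-class problem, for illustration only
set.seed(1)
n       <- 600
classes <- c("a", "b", "c")
truth   <- factor(sample(classes, n, replace = TRUE), levels = classes)

# Hypothetical uncalibrated scores, one column per class,
# nudged upwards for the true class so they carry some signal
scores <- matrix(runif(n * 3), ncol = 3, dimnames = list(NULL, classes))
hit <- cbind(seq_len(n), as.integer(truth))
scores[hit] <- pmin(scores[hit] + 0.3, 1)

# One-vs-rest: a separate logistic (Platt) fit per class.
# For brevity the fit is applied to the same data; in practice, fit on
# cross-validation output and apply to the test set.
calibrated <- sapply(classes, function(cl) {
  d   <- data.frame(x = scores[, cl], y = as.integer(truth == cl))
  fit <- glm(y ~ x, data = d, family = binomial)
  predict(fit, newdata = d, type = "response")
})

# Assumption: rescale each row so the class probabilities sum to 1
calibrated <- calibrated / rowSums(calibrated)
```

Isotonic regression could be swapped in for the `glm` step in the same loop if that is the calibration method being used.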

Regards.

• GGN says:

Where exactly is the Platt scaling being performed below?

# performing platt scaling on the dataset
colnames(dataframe)<-c("x","y")

• NSS says:

@GGN, just below this. Platt scaling is just a name for fitting a logistic regression model on the model's output for the cross-validation dataset.
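As a minimal sketch of that idea (not the article's exact code), the step amounts to a `glm` call with a binomial family. The simulated scores, the object name `model_log`, and the `new_scores` data frame are all assumptions made up for this illustration; only the `x`/`y` column naming follows the quoted line.

```r
# Simulated cross-validation output, for illustration only:
# x = raw model score, y = actual 0/1 label
set.seed(123)
n <- 500
x <- runif(n)
y <- rbinom(n, size = 1, prob = x^2)   # raw scores are deliberately miscalibrated
dataframe <- data.frame(x, y)
colnames(dataframe) <- c("x", "y")

# Platt scaling: logistic regression of the label on the raw score
model_log <- glm(y ~ x, data = dataframe, family = binomial)

# Apply the fitted mapping to new raw scores (hypothetical test-set outputs)
new_scores <- data.frame(x = c(0.1, 0.5, 0.9))
calibrated <- predict(model_log, newdata = new_scores, type = "response")
print(calibrated)
```

The calibrated values returned by `predict(..., type = "response")` are the ones to feed into the log loss calculation in place of the raw scores.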

# performing platt scaling on the dataset