Andrew Ng Machine Learning Course Notes -- Bayesian Statistics and Regularization


Bayesian statistics: it turns out that if you add this prior term, the optimization objective you end up optimizing turns out to be this, where you add an extra term that penalizes your parameter theta for being large. This algorithm tends to keep your parameters small, and shrinking the parameters has the effect of keeping the functions you fit smoother and less likely to overfit. Logistic regression, for example, would otherwise be very prone to overfitting, but it turns out that with this sort of Bayesian regularization with a Gaussian prior, logistic regression becomes a very effective text classification algorithm.

Regularization:

Online learning: a setting in which you have to make predictions even while you are in the process of learning.
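As an illustration of that predict-while-learning loop, here is a sketch of an online perceptron: it sees examples one at a time, commits to a prediction before the label is revealed, and then updates. The function and its parameters are my own illustrative choices, not code from the lecture.

```python
import numpy as np

def online_perceptron(stream, dim, lr=1.0):
    """Predict-then-update loop over a stream of (x, y) pairs, y in {-1, +1}.

    Returns the learned theta and the number of online mistakes.
    """
    theta = np.zeros(dim)
    mistakes = 0
    for x, y in stream:
        pred = 1 if theta @ x >= 0 else -1  # predict before seeing the label
        if pred != y:
            mistakes += 1
            theta += lr * y * x             # perceptron update only on errors
    return theta, mistakes
```

The key point is that performance is measured by the mistakes made along the way, since every prediction happens before the learner has seen that example's label.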

Advice for applying machine learning:

Diagnostics for debugging learning algorithms:

Sort of talk briefly about error analysis and ablative analysis:

Advice for how to get started on a machine learning problem:

In case 1, let's say that J of the SVM is indeed greater than J of BLR. But we know that Bayesian logistic regression was trying to maximize J of theta; that's the definition of Bayesian logistic regression. So this means that the value of theta output by Bayesian logistic regression actually fails to maximize J, because the support vector machine returned a value of theta that does a better job of maximizing J. This tells me that Bayesian logistic regression did not actually maximize J correctly, and so the problem is with the optimization algorithm: the optimization algorithm has not converged.

The other case is as follows: this means that Bayesian logistic regression actually attains a higher value for the optimization objective J than the SVM does. The SVM, which does worse on your optimization objective, actually does better on the weighted accuracy measure. So this means that something that does worse on your optimization objective J can actually do better on the weighted accuracy objective, and this really means that maximizing J of theta does not correspond that well to maximizing your weighted accuracy criterion. That tells you that J of theta is maybe the wrong optimization objective to be maximizing.
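The two-case diagnostic above can be sketched as a small decision procedure. Here `J` and `accuracy` are hypothetical callables you supply (your optimization objective and your weighted accuracy measure); the premise is that the SVM's theta already beats BLR's theta on accuracy.

```python
def diagnose(J, accuracy, theta_blr, theta_svm):
    """Diagnostic from the passage: given that the SVM does better on
    weighted accuracy, decide whether to blame the optimizer or the
    objective J itself."""
    assert accuracy(theta_svm) > accuracy(theta_blr), "premise of the diagnostic"
    if J(theta_svm) > J(theta_blr):
        # Case 1: BLR was supposed to maximize J, yet the SVM's theta
        # scores higher on J -- the optimizer has not converged.
        return "problem is the optimization algorithm"
    else:
        # Case 2: BLR attains a higher J but still loses on accuracy --
        # maximizing J does not track the accuracy criterion.
        return "problem is the objective function J"
```

The value of this diagnostic is that it tells you where to spend effort next: better optimization in case 1, a different objective in case 2.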