How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation
An Academic presentation by Dr. Nancy Agnes, Head, Technical Operations, Phdassistance Group
www.phdassistance.com Email: [email protected]


DESCRIPTION

Every Machine Learning pipeline has performance measurements. They inform you if you're progressing and give you a number. A metric is required for all machine learning models, whether linear regression or a SOTA method like BERT. Every machine learning task, like its performance measurements, can be broken down into Regression or Classification. For both, there are hundreds of metrics to choose from, but we'll go through the most common ones and the information they give regarding model performance. Learn More: https://bit.ly/309WBaU

TRANSCRIPT

Page 1: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation
An Academic presentation by Dr. Nancy Agnes, Head, Technical Operations, Phdassistance Group
www.phdassistance.com Email: [email protected]

Page 2: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

TODAY'S OUTLINE

Introduction

Regression metrics

Classification metrics

Conclusion

About phdassistance

Page 3: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

INTRODUCTION

Every Machine Learning pipeline has performance measurements.

They inform you if you're progressing and give you a number.

A metric is required for all machine learning models, whether linear regression or a SOTA method like BERT.

Page 4: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

Every machine learning task, like its performance measurements, can be broken down into Regression or Classification.

For both, there are hundreds of metrics to choose from, but we'll go through the most common ones and the information they give regarding model performance.

It's critical to understand how your model interprets your data!

Page 5: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

Loss functions are not the same as metrics. Loss functions display a model's performance.

They're often differentiable in the model's parameters and are used to train a machine learning model (using some form of optimization like Gradient Descent).

Metrics are used to track and quantify a model's performance (during training and testing), and they don't have to be differentiable.

If the performance measure is differentiable for some tasks, it may also be utilized as a loss function (possibly with additional regularizations), such as MSE.
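To make the distinction concrete, here is a minimal sketch (the one-feature linear model and the synthetic data are invented purely for illustration) in which MSE acts as the differentiable loss that drives gradient descent and is then reported as an evaluation metric:

import numpy as np

# Synthetic toy data: y = 3x + noise (for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=100)
y = 3 * X + rng.normal(scale=0.1, size=100)

w = 0.0   # single weight of a bias-free linear model
lr = 0.5  # learning rate

for _ in range(200):
    pred = w * X
    grad = 2 * np.mean((pred - y) * X)  # gradient of MSE used as the loss
    w -= lr * grad

# The same MSE, now used as a metric: one number summarizing performance
print("MSE as a metric:", np.mean((w * X - y) ** 2))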

Page 6: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

Hire PhD Assistance experts to develop new frameworks and novel techniques for improving optimization in your engineering dissertation services.

Page 7: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

REGRESSION METRICS

The output of regression models is continuous.

As a result, we'll need a measure based on computing some type of distance between anticipated and actual values.

We'll go through these machine learning measures in depth in order to evaluate regression models:

Page 8: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

MEAN ABSOLUTE ERROR (MAE)

The Mean Absolute Error is the average of the absolute difference between the ground truth and predicted values. There are a few essential factors to consider for MAE:

Because it does not exaggerate errors, it is more resistant to outliers than MSE.

It tells us how far the forecasts differed from the actual result. However, because MAE uses the absolute value of the residual, we won't know which way the error goes, i.e. whether we're under- or over-predicting the data.

Page 9: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

There is no need to second-guess error interpretation.

In contrast to MSE, which is differentiable, MAE is non-differentiable.

This measure, like MSE, is straightforward to apply.
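As a quick sketch of how MAE might be computed in practice (the numbers below are arbitrary illustrative values), both a manual version and scikit-learn's helper give the same result:

import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# Manual computation: mean of the absolute residuals
mae_manual = np.mean(np.abs(y_true - y_pred))

# scikit-learn equivalent
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)  # both 0.5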

Hire PhD Assistance experts to develop your algorithm and coding implementation for improving secure access in your engineering dissertation services.

Page 10: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

MEAN SQUARED ERROR (MSE)

The mean squared error is arguably the most often used regression metric. It simply calculates the average of the squared difference between the target value and the regression model's predicted value.

A few essential features of MSE:

Because it is differentiable, it can be better optimized.

Page 11: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

It penalizes even minor errors by squaring them, resulting in an overestimation of how bad the model is.

The squaring factor (scale) must be considered while interpreting errors.

Due to the squaring effect, it is more sensitive to outliers than other measures.
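A comparable sketch for MSE (same illustrative values as before):

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# Manual computation: mean of the squared residuals
mse_manual = np.mean((y_true - y_pred) ** 2)

# scikit-learn equivalent
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)  # both 0.375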

ROOT MEAN SQUARED ERROR (RMSE)

The Root Mean Squared Error is the square root of the average of the squared difference between the target value and the value predicted by the regression model.

It corrects a few flaws in MSE.

Page 12: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

A few essential points of RMSE:

It maintains MSE's differentiable feature.

It tempers MSE's heavy penalization of errors by taking the square root.

Because the scale is now the same as the random variable, error interpretation is simple.

Because scale factors are effectively standardized, outliers are less likely to cause problems.

Its application is similar to MSE.
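Since RMSE is simply the square root of MSE, a minimal sketch (again with illustrative values) is:

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# Square-rooting MSE puts the error back on the scale of the target variable
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # about 0.612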

Page 13: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

PhD Assistance experts have experience in handling dissertations and assignments in cloud security and machine learning techniques with assured 2:1 distinction. Talk to Experts Now.

R² COEFFICIENT OF DETERMINATION

The R² coefficient of determination is a post metric, meaning it is determined after other metrics have been calculated.

The purpose of computing this coefficient is to answer the question "How much (what percentage) of the total variance in Y (target) is explained by the variation in X (regression line)?" The sum of squared errors is used to compute this.

Page 14: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

A few thoughts on the R² results:

If the regression line's sum of squared errors is small, R² will be close to 1 (ideal), indicating that the regression captured most of the variance in the target variable.

In contrast, if the regression line's sum of squared errors is high, R² will be close to 0, indicating that the regression failed to capture any variance in the target variable.

The range of R² appears to be (0, 1), but it is really (−∞, 1], since the ratio of the squared error of the regression line to the squared error of the mean can exceed 1 if the squared error of the regression line is sufficiently high (greater than the squared error of the mean).
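A small sketch of R² computed from the sums of squares described above (illustrative values; scikit-learn's r2_score gives the same answer):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the regression line
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))  # both about 0.949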

PhD Assistance has vast experience in developing dissertation research topics for students pursuing the UK dissertation in business management. Order Now.

Page 15: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

ADJUSTED R²

The R² technique has various flaws, such as deceiving the researcher into assuming that the model is improving when the score rises while, in fact, no learning is taking place.

This can occur when a model overfits the data; in that case the variance explained will be 100%, but no learning will have occurred. To correct this, R² is adjusted with the number of independent variables.

Adjusted R² is usually lower than R², since it accounts for the rising number of predictors and only indicates improvement when there actually is one.
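scikit-learn has no built-in adjusted R², so a small helper might look like this (the function name adjusted_r2 and the data are ours; the formula 1 − (1 − R²)(n − 1)/(n − p − 1) is the standard adjustment for n samples and p predictors):

import numpy as np
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_predictors):
    # Penalize R² for the number of independent variables
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

y_true = np.array([3.0, -0.5, 2.0, 7.0, 4.2, 1.1])
y_pred = np.array([2.5,  0.0, 2.0, 8.0, 4.0, 1.5])

print(adjusted_r2(y_true, y_pred, n_predictors=2))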

Page 16: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

CLASSIFICATION METRICS

Classification problems are one of the most explored fields in the world. Almost all production and industrial contexts have use cases.

The list goes on and on: speech recognition, facial recognition, text categorization, and so on.

We need a measure that compares discrete classes in some way, since classification algorithms produce discrete output.

Page 17: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

Classification metrics assess a model's performance and tell you whether the classification is good or bad, but each one does it in a unique way.

So, in order to assess classification models, we'll go through the following measures in depth:

ACCURACY

Classification accuracy is the easiest measure to use and apply: it is defined as the number of correct predictions divided by the total number of predictions, multiplied by 100.

We can compute this by manually looping over the ground truth and predicted values, or we can use the scikit-learn module.
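As a sketch of both routes mentioned above (the labels are invented; scikit-learn's accuracy_score is the relevant helper):

from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

# Manual loop over ground truth and predictions
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
acc_manual = 100 * correct / len(y_true)

# scikit-learn equivalent (returns a fraction, so multiply by 100)
acc_sklearn = 100 * accuracy_score(y_true, y_pred)

print(acc_manual, acc_sklearn)  # both about 83.3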

Page 18: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

CONFUSION MATRIX (NOT A METRIC, BUT FUNDAMENTAL TO OTHERS)

The Confusion Matrix is a tabular representation of the ground-truth labels versus the model predictions.

Each row of the confusion matrix represents the examples in a predicted class, whereas each column represents the occurrences in an actual class. The Confusion Matrix isn't strictly a performance metric, but it serves as a foundation on which other metrics assess the results.

In order to understand the confusion matrix, we need to establish a value for the null hypothesis as an assumption.
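A minimal sketch using scikit-learn's confusion_matrix (the cancer-style labels are invented; note that scikit-learn's convention puts the actual classes on the rows and the predicted classes on the columns, i.e. the transpose of the layout described above):

from sklearn.metrics import confusion_matrix

# 1 = cancerous (positive class), 0 = non-cancerous
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]   -> 3 true negatives, 1 false positive
#  [1 3]]  -> 1 false negative, 3 true positives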

Page 19: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

PRECISION AND RECALL

The precision metric focuses on Type-I errors (FP). We make a Type-I error when we reject a true null hypothesis (H0).

In this example, a Type-I error is mistakenly labeling a non-cancerous patient as cancerous. A precision score of 1 indicates that your model produced no false positives and can correctly distinguish correct from incorrect labeling of cancer patients.

What it cannot detect is Type-II error, or false negatives, which occur when a cancerous patient is mistakenly diagnosed as non-cancerous.

Page 20: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

A low precision score (below 0.5) indicates that your classifier produces a significant number of false positives, which might be due to an imbalanced class or poorly tuned model hyperparameters.

Recall is the ratio of true positives to all actual positives in the ground truth. The recall metric focuses on Type-II errors (FN). We make a Type-II error when we accept a false null hypothesis (H0).

In this situation, a Type-II error is mislabeling a cancerous patient as non-cancerous. A recall score of 1 indicates that your model did not miss any true positives and can correctly distinguish correct from incorrect labeling of cancer patients.

Page 21: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

What it cannot detect is Type-I error, or false positives, which occur when a non-cancerous patient is mistakenly diagnosed as malignant.

A low recall score (below 0.5) indicates that your classifier produces a lot of false negatives, which might be caused by an imbalanced class or untuned model hyperparameters.

To avoid FP/FN in an imbalanced class problem, you must prepare your data ahead of time using over/under-sampling or focal loss.
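A minimal sketch of both metrics on the same invented cancer-style labels used above for the confusion matrix:

from sklearn.metrics import precision_score, recall_score

# 1 = cancerous (positive class), 0 = non-cancerous
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Precision = TP / (TP + FP): how many predicted positives were correct
print(precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75

# Recall = TP / (TP + FN): how many actual positives were found
print(recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75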

Page 22: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

F1-SCORE

The F1-score metric combines precision and recall. In fact, the F1 score is the harmonic mean of the two.

A high F1 score therefore denotes both high precision and high recall. It offers a good balance between precision and recall and performs well on imbalanced classification tasks.

A low F1 score, on its own, tells you (nearly) nothing; it merely indicates poor performance at some level. Low recall means we failed to do well on a large portion of the positive cases in the test set. Low precision means that, among the cases we identified as positive, we did not get many of them right.

Page 23: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

However, a low F1 does not indicate which of the two is the problem. A high F1 indicates that we are likely to have good precision and good recall on a significant portion of the decisions (which is informative).

With a low F1, it's unclear what the issue is (low recall or low precision). Is F1 merely a gimmick? No, it's widely used and regarded as a good metric for arriving at a decision, but only with a few adjustments.

When you combine FPR (false positive rate) with F1, you can keep Type-I errors in check and figure out what is to blame for your poor F1 score.
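A sketch of the harmonic-mean relationship (same invented labels as before; f1_score is scikit-learn's built-in):

from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

# F1 is the harmonic mean of precision and recall
f1_manual = 2 * p * r / (p + r)
print(f1_manual, f1_score(y_true, y_pred))  # both 0.75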

Page 24: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

AUROC (AREA UNDER THE RECEIVER OPERATING CHARACTERISTIC CURVE)

AUC-ROC scores/curves are also known as AUROC. They use true positive rates (TPR) and false positive rates (FPR).

Intuitively, TPR/recall is the proportion of positive data points that are correctly classified as positive, out of all positive data points. To put it another way, the higher the TPR, the fewer positive data points we'll overlook.

FPR/fallout is the proportion of negative data points that are wrongly considered positive, out of all negative data points. To put it another way, the higher the FPR, the more negative data points will be misclassified as positive.

Page 25: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

We first compute the two former measures using many different thresholds for the logistic regression, and then plot them on a single graph to merge the FPR and the TPR into a single metric. The ROC curve represents the result, and the measure we use is the area under the curve, which we refer to as AUROC.

A no-skill classifier is one that cannot distinguish between the classes and will always predict a random or constant class. On the ROC plot, the no-skill baseline is the diagonal line, corresponding to an AUROC of 0.5 regardless of class balance. (For a precision-recall curve, by contrast, the no-skill line is horizontal at the ratio of positive cases in the dataset, which is 0.5 for a well-balanced dataset.)

The area represents the likelihood that a randomly chosen positive example ranks higherthan a randomly chosen negative example (i.e., has a higher probability of being positivethan negative).

Page 26: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

As a result, a high AUROC merely implies that a randomly chosen positive example is likely to be ranked above a randomly chosen negative one.

A high AUROC also indicates that your algorithm is good at ranking the test data, with the majority of negative instances at one end of the scale and the majority of positive cases at the other.

When your problem has a large class imbalance, ROC curves aren't a smart choice. The explanation for this is not obvious, but it can be deduced from the formulae.

You can still use them in that situation after resampling the imbalanced set or utilizing focal loss techniques. Other than academic study and comparing different classifiers, the AUROC metric is of limited practical use.
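A minimal sketch of the threshold sweep and the area under it (the scores are invented; roc_curve and roc_auc_score are scikit-learn built-ins and expect predicted probabilities of the positive class, not hard labels):

from sklearn.metrics import roc_auc_score, roc_curve

y_true   = [0, 0, 1, 1, 0, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

# FPR and TPR at every threshold; plotting tpr against fpr gives the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# The area under that curve
print(roc_auc_score(y_true, y_scores))  # 0.875 here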

Page 27: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

CONCLUSION

I hope you now see the value of performance metrics in model evaluation and are aware of a few small techniques for interpreting your model.

One thing to keep in mind is that these metrics can be tweaked to fit your specific use case. Take, for instance, a weighted F1-score.

It calculates the metric for each label and then takes their average, weighted by support (the number of true instances for each label).

Page 28: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

Another example is weighted accuracy, or, in technical terms, Balanced Accuracy.

Balanced accuracy is employed in binary and multiclass classification problems to cope with imbalanced datasets.

It's defined as the average of the recall obtained on each class.
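As a sketch of both adaptations on an invented, imbalanced three-class example (average="weighted" and balanced_accuracy_score are scikit-learn built-ins):

from sklearn.metrics import f1_score, balanced_accuracy_score

y_true = [0, 0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 1, 2, 2]

# Weighted F1: per-label F1 scores averaged, weighted by support
print(f1_score(y_true, y_pred, average="weighted"))

# Balanced accuracy: the average of the recall obtained on each class
print(balanced_accuracy_score(y_true, y_pred))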

Page 29: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

ABOUT PHDASSISTANCE

PhD Assistance experts help you with research proposals in a wide range of subjects. We have specialized academicians who are professional and qualified in their particular specializations, such as English, physics, chemistry, computer science, criminology, biological science, arts and literature, law, sociology, biology, geography, social science, nursing, medicine, software programming, information technology, graphics, animation, 3D drawing, CAD, construction, etc.

We also offer other services such as manuscript writing, coursework writing, dissertation writing, manuscript writing and editing, and animation services.

Page 30: How to Identify Right Performance Evaluation Metrics In Machine Learning Based Dissertation - Phdassistance

CONTACT US

UNITED KINGDOM

+44 7537144372

INDIA

+91-9176966446

[email protected]