Simpler Machine Learning with SKLL 1.0
DESCRIPTION
As the popularity of machine learning techniques spreads to new areas of industry and science, the number of potential machine learning users is growing rapidly. While the fantastic scikit-learn library is widely used in the Python community for tackling such tasks, there are two significant hurdles in place for people working on new machine learning problems:

• Scikit-learn requires writing a fair amount of boilerplate code to run even simple experiments.
• Obtaining good performance typically requires tuning various model parameters, which can be particularly challenging for beginners.

SciKit-Learn Laboratory (SKLL) is an open source Python package, originally developed by the NLP & Speech group at the Educational Testing Service (ETS), that addresses these issues by providing the ability to run scikit-learn experiments with tuned models without writing any code beyond what generates the features. This talk will provide an overview of performing common machine learning tasks with SKLL, and highlight some of the new features that are present as of the 1.0 release.

TRANSCRIPT
Simpler Machine Learning with SKLL 1.0
Dan Blanchard Educational Testing Service
PyData NYC 2014
Survived or perished?

• first class, female, 1 sibling, 35 years old
• third class, female, 2 siblings, 18 years old
• second class, male, 0 siblings, 50 years old

Can we predict survival from data?
SciKit-Learn Laboratory
SKLL
It's where the learning happens
Learning to Predict Survival

1. Split up given training set: train (80%) and dev (20%)

```
$ ./make_titanic_example_data.py
Loading train.csv... done
Writing titanic/train/socioeconomic.csv... done
Writing titanic/train/family.csv... done
Writing titanic/train/vitals.csv... done
Writing titanic/train/misc.csv... done
Writing titanic/train+dev/socioeconomic.csv... done
Writing titanic/train+dev/family.csv... done
Writing titanic/train+dev/vitals.csv... done
Writing titanic/train+dev/misc.csv... done
Writing titanic/dev/socioeconomic.csv... done
Writing titanic/dev/family.csv... done
Writing titanic/dev/vitals.csv... done
Writing titanic/dev/misc.csv... done
Loading test.csv... done
Writing titanic/test/socioeconomic.csv... done
Writing titanic/test/family.csv... done
Writing titanic/test/vitals.csv... done
Writing titanic/test/misc.csv... done
```
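The 80/20 shuffle-and-split that make_titanic_example_data.py performs can be sketched with the standard library alone. This is a minimal illustration, not the actual script (which also splits the columns out into the four per-category feature files); the toy rows and the seed are assumptions for the example.

```python
import random

def split_rows(rows, train_frac=0.8, seed=123456789):
    """Shuffle rows deterministically and split into train/dev portions."""
    rng = random.Random(seed)
    rows = list(rows)
    rng.shuffle(rows)
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

# Toy stand-in for the rows of the Kaggle train.csv
rows = [{"PassengerId": i, "Survived": i % 2} for i in range(100)]
train, dev = split_rows(rows)
print(len(train), len(dev))  # 80 20
```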
Learning to Predict Survival

2. Pick classifiers to try:

1. Decision Tree
2. Naive Bayes
3. Random Forest
4. Support Vector Machine (SVM)
Learning to Predict Survival

3. Create configuration file for SKLL:

```ini
[General]
experiment_name = Titanic_Evaluate_Untuned
task = evaluate

[Input]
# directory with feature files for training learners
train_directory = train
# directory with feature files for evaluating performance
test_directory = dev
# family.csv: # of siblings, spouses, parents, children
# misc.csv: departure port
# socioeconomic.csv: fare & passenger class
# vitals.csv: sex & age
featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
id_col = PassengerId
label_col = Survived

[Output]
# directory to store evaluation results
results = output
# directory to store trained models
models = output
```
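A configuration like the one above describes a grid of jobs: SKLL runs every listed featureset against every listed learner, so here 1 featureset × 4 learners yields 4 jobs. The expansion can be sketched with the standard library (an illustration of the cross-product, not SKLL's internal code):

```python
from itertools import product

featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]

# One job per (featureset, learner) pair
jobs = [(fs, learner) for fs, learner in product(featuresets, learners)]
print(len(jobs))  # 4
```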
Learning to Predict Survival

4. Run the configuration file with run_experiment:

```
$ run_experiment evaluate.cfg
Loading train/family.csv... done
Loading train/misc.csv... done
Loading train/socioeconomic.csv... done
Loading train/vitals.csv... done
Loading dev/family.csv... done
Loading dev/misc.csv... done
Loading dev/socioeconomic.csv... done
Loading dev/vitals.csv... done
Loading train/family.csv... done
Loading train/misc.csv... done
Loading train/socioeconomic.csv... done
Loading train/vitals.csv... done
Loading dev/family.csv... done
...
```
Learning to Predict Survival

5. Examine results:

```
Experiment Name: Titanic_Evaluate_Untuned
SKLL Version: 1.0.0
Training Set: train (712)
Test Set: dev (179)
Feature Set: ["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]
Learner: RandomForestClassifier
Scikit-learn Version: 0.15.2
Total Time: 0:00:02.065403

+-------+------+------+-----------+--------+-----------+
|       |  0.0 |  1.0 | Precision | Recall | F-measure |
+-------+------+------+-----------+--------+-----------+
| 0.000 | [96] |   19 |     0.865 |  0.835 |     0.850 |
+-------+------+------+-----------+--------+-----------+
| 1.000 |   15 | [49] |     0.721 |  0.766 |     0.742 |
+-------+------+------+-----------+--------+-----------+
(row = reference; column = predicted)
Accuracy = 0.8100558659217877
```
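The summary metrics follow directly from the confusion matrix in the report (rows are true labels, columns are predictions); a quick arithmetic check:

```python
# Confusion matrix from the report: rows = reference, columns = predicted
matrix = [[96, 19],   # true 0: 96 correct, 19 predicted as 1
          [15, 49]]   # true 1: 15 predicted as 0, 49 correct

total = sum(sum(row) for row in matrix)
accuracy = (matrix[0][0] + matrix[1][1]) / total
print(round(accuracy, 4))  # 0.8101  (96 + 49 correct out of 179)

# Precision, recall, F-measure for class 1 ("Survived")
precision = matrix[1][1] / (matrix[0][1] + matrix[1][1])  # 49 / 68
recall = matrix[1][1] / sum(matrix[1])                    # 49 / 64
f_measure = 2 * precision * recall / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))  # 0.721 0.766 0.742
```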
Aggregate Evaluation Results

| Learner                | Dev. Accuracy |
|------------------------|---------------|
| RandomForestClassifier | 0.8101        |
| DecisionTreeClassifier | 0.7989        |
| SVC                    | 0.7709        |
| MultinomialNB          | 0.7095        |
Tuning the Learner

Can we do better than default hyperparameters?

```ini
[General]
experiment_name = Titanic_Evaluate
task = evaluate

[Input]
train_directory = train
test_directory = dev
featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
id_col = PassengerId
label_col = Survived

[Tuning]
grid_search = true
objective = accuracy

[Output]
results = output
```
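With grid_search enabled, SKLL tries a grid of hyperparameter values for each learner via cross-validation and keeps the combination that maximizes the chosen objective. The idea, stripped down to the standard library (the parameter grid and the scoring function here are stand-ins for illustration, not SKLL's built-in grids or actual code):

```python
from itertools import product

# Hypothetical hyperparameter grid (SKLL ships sensible grids per learner)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0], "gamma": [0.001, 0.01, 0.1]}

def cv_score(params):
    """Stand-in for cross-validated accuracy; a real run would train a model."""
    return 1.0 / (1.0 + abs(params["C"] - 1.0) + abs(params["gamma"] - 0.01))

names = sorted(param_grid)
candidates = [dict(zip(names, values))
              for values in product(*(param_grid[n] for n in names))]
best = max(candidates, key=cv_score)
print(best)  # {'C': 1.0, 'gamma': 0.01}
```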
Tuned Evaluation Results

| Learner                | Untuned Accuracy | Tuned Accuracy |
|------------------------|------------------|----------------|
| RandomForestClassifier | 0.8101           | 0.8380         |
| DecisionTreeClassifier | 0.7989           | 0.7989         |
| SVC                    | 0.7709           | 0.8156         |
| MultinomialNB          | 0.7095           | 0.7095         |
Using All Available Data

Use training and dev sets together to generate predictions on test:

```ini
[General]
experiment_name = Titanic_Predict
task = predict

[Input]
train_directory = train+dev
test_directory = test
featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
id_col = PassengerId
label_col = Survived

[Tuning]
grid_search = true
objective = accuracy

[Output]
results = output
```
Test Set Accuracy

| Learner                | Train only, Untuned | Train only, Tuned | Train + Dev, Untuned | Train + Dev, Tuned |
|------------------------|---------------------|-------------------|----------------------|--------------------|
| RandomForestClassifier | 0.727               | 0.756             | 0.746                | 0.780              |
| DecisionTreeClassifier | 0.703               | 0.742             | 0.670                | 0.742              |
| SVC                    | 0.608               | 0.679             | 0.612                | 0.679              |
| MultinomialNB          | 0.627               | 0.627             | 0.622                | 0.622              |
Advanced SKLL Features

• Read & write .arff, .csv, .jsonlines, .libsvm, .megam, .ndj, and .tsv data
• Parameter grids for all supported scikit-learn learners
• Custom learners
• Parallelize experiments on DRMAA clusters via GridMap
• Ablation experiments
• Collapse/rename classes from config file
• Feature scaling
• Rescale predictions to be closer to observed data
• Command-line tools for joining, filtering, and converting feature files
• Python API
Currently Supported Learners

| Classifiers                   | Regressors  |
|-------------------------------|-------------|
| Linear Support Vector Machine | Elastic Net |
| Logistic Regression           | Lasso       |
| Multinomial Naive Bayes       | Linear      |

Available as both classifiers and regressors: AdaBoost, Decision Tree, Gradient Boosting, K-Nearest Neighbors, Random Forest, Stochastic Gradient Descent, Support Vector Machine
Contributors

• Nitin Madnani
• Mike Heilman
• Nils Murrugarra Llerena
• Aoife Cahill
• Diane Napolitano
• Keelan Evanini
• Ben Leong
References

• Dataset: kaggle.com/c/titanic-gettingStarted
• SKLL GitHub: github.com/EducationalTestingService/skll
• SKLL Docs: skll.readthedocs.org
• Titanic configs and data splitting script in the examples dir on GitHub

Twitter: @dsblanch
GitHub: dan-blanchard
Bonus Slides
SKLL API

```python
from skll import Learner, Reader

# Load training examples
train_examples = Reader.for_path('myexamples.megam').read()

# Train a linear SVM
learner = Learner('LinearSVC')
learner.train(train_examples)

# Load test examples and evaluate; evaluate() returns the confusion matrix,
# the overall accuracy on the test set, precision/recall/f-score for each
# class, the tuned model parameters, and the objective function score on
# the test set
test_examples = Reader.for_path('test.tsv').read()
conf_matrix, accuracy, prf_dict, model_params, obj_score = learner.evaluate(test_examples)

# Generate predictions from trained model
predictions = learner.predict(test_examples)

# Perform 10-fold cross-validation with a radial SVM; cross_validate()
# returns per-fold evaluation results and per-fold training set
# objective scores
learner = Learner('SVC')
fold_result_list, grid_search_scores = learner.cross_validate(train_examples)
```
SKLL API

```python
import numpy as np
from os.path import join
from skll import FeatureSet, NDJWriter, Writer

# Create some training examples
labels = []
ids = []
features = []
for i in range(num_train_examples):
    label = "dog" if i % 2 == 0 else "cat"
    labels.append(label)
    ids.append("{}{}".format(label, i))
    features.append({"f1": np.random.randint(1, 4),
                     "f2": np.random.randint(1, 4)})
feat_set = FeatureSet('training', ids, labels=labels, features=features)

# Write them to a file
train_path = join(_my_dir, 'train', 'test_summary.jsonlines')
Writer.for_path(train_path, feat_set).write()
# Or, equivalently:
NDJWriter(train_path, feat_set).write()
```
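The .jsonlines/.ndj format written above is simply one JSON object per line; the field names used here ("id" for the example's identifier, "y" for its label, "x" for its feature dict) are the convention assumed from SKLL's readers and may differ in other versions. A dependency-free sketch of writing and reading it back:

```python
import json
from io import StringIO

examples = [
    {"id": "dog0", "y": "dog", "x": {"f1": 2, "f2": 1}},
    {"id": "cat1", "y": "cat", "x": {"f1": 3, "f2": 2}},
]

# Write: one JSON object per line
buf = StringIO()
for example in examples:
    buf.write(json.dumps(example) + "\n")

# Read it back
buf.seek(0)
round_tripped = [json.loads(line) for line in buf]
print(round_tripped == examples)  # True
```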