Developing Machine Learning Applications
Geoff Holmes, University of Waikato
Outline
• What application development have we done?
• What lessons have we learned?
• What is needed in terms of the future of machine learning application development?
Applications – a taxonomy
• UCI data sets – very much like our early agricultural data
• Competition data – usually larger than above, often difficult
• Signal control applications (often involve reinforcement learning) – eg autonomous helicopters, vehicles, learning the signature of a great pianist, learning to sail, learning to drive racing cars faster, learning to play soccer (often linked to robotics)
• Key to success = objective measurement – eg Human Computer Interaction, Speech and Image Recognition, Computer Games, etc.
WEKA Waikato Environment for Knowledge Analysis
• Machine Learning at Waikato started in 1993
• Build an interface to enable several ML methods to be compared on same data
• Explore datasets of importance to the agricultural sector in NZ
• Apple bruising, Venison bruising, Bull behaviour, Grass grubs, Pasture production, Pea seed colour, Slugs, Squash harvest, Wasp nests, White clover persistence
• Cow culling
• Datasets were very much of the “bring out your dead” variety
WEKA – unscientific study from Google Scholar
• For the query “WEKA applications”
• Bioinformatics
• Grid Computing
• Medicine
• Business and Finance
• Computer Networks
• Education
Early lessons learned
• Using WEKA is good but only static solutions are possible
• Datasets need to be large enough to yield significant and meaningful results
• Datasets involving human judgement tend to be unreliable
Scientific Equipment Application Methodology
• Obtain samples and reference data from existing technology (eg wet chemistry) – establish targets Y.
• Process same samples using a proxy (eg NIR) – new X
• Construct new dataset with new X and Y
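The three steps above can be sketched in a few lines. This is a minimal illustration, not the method used in the project: the sample IDs, the toy values, and the helper names `build_dataset` and `fit_line` are all hypothetical, and a single-feature least-squares fit stands in for whatever learner is actually trained on the full spectrum.

```python
from statistics import mean

def build_dataset(reference, spectra):
    """Join proxy measurements (X) with reference values (Y) by sample ID."""
    ids = sorted(set(reference) & set(spectra))  # keep samples measured both ways
    X = [spectra[i] for i in ids]
    y = [reference[i] for i in ids]
    return ids, X, y

def fit_line(x, y):
    """Ordinary least squares for y = a * x + b on a single feature."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

# Hypothetical toy data: sample ID -> reference value from wet chemistry (Y)
reference = {"s1": 2.0, "s2": 4.0, "s3": 6.0, "s4": 8.0}
# Sample ID -> short NIR-like spectrum (X); real spectra have far more values
spectra = {
    "s1": [0.1, 1.0],
    "s2": [0.2, 2.0],
    "s3": [0.1, 3.0],
    "s5": [0.3, 9.0],  # no reference value, so this sample is dropped
}

ids, X, y = build_dataset(reference, spectra)
# Fit the target against the second spectral value only (illustrative choice)
slope, intercept = fit_line([row[1] for row in X], y)
```

Only samples that have both a reference value and a proxy measurement enter the dataset, which is why `s4` and `s5` are left out here.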
Near Infrared Spectroscopy
• Once concept was proven we needed a system to support commercial use (ie alongside the LIMS)
• Developed S2 (with WEKA interface):
• Used continuously at Hill Laboratories and BLGG (Holland) since around 2005 – never gone wrong!
• So far it is the best application of the technology that we have ever come across.
• Faster than wet chemistry
• Predictions can be more accurate
• Large cost savings – multiple analyses per sample
S2
NIR – lessons learned
• A very lightweight input/output solution using a dropbox methodology was successful, as it is transparent and seamless alongside a LIMS.
• Instrument data is extremely reliable
• In this industry compliance is important, which implies that a single algorithm is better than choosing the best method per dataset.
• As data is abundant, models are rebuilt from time to time.
• No facility for users to develop new applications.
Gas Chromatography Mass Spectrometry
• Analytical instrument that combines the features of gas chromatography and mass spectrometry to identify different substances within a test sample
• Typical Applications
• Environmental monitoring
• Food and beverage analysis
• Criminal forensics (CSI!)
• Drugs/explosives detection
Example Chromatogram (PAH) – ion counts
MS fingerprints
Machine Learning Approach
• Chromatograms are pre-processed to extract features
• Dataset constructed combining pre-processed chromatograms with analyst checked compound concentrations
• Learn the relationship between pre-processed chromatograms and compound concentrations:
• extensive pre-processing of data
• parallel processing – 5000 * 300 values per instance (NIR = 1000)
• pre-processing varies among compounds
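To give a feel for the scale quoted above, the sketch below works through the arithmetic and shows one simple way of shrinking a trace. It rests on assumptions: reading 5000 * 300 as time points by mass channels is a guess, 8-byte values are assumed for the size estimate, and `bin_sum` is a hypothetical stand-in for the compound-specific pre-processing, which the slides do not describe.

```python
SCANS = 5000        # first factor from the slide (assumed: time points)
MZ_CHANNELS = 300   # second factor from the slide (assumed: mass channels)
NIR_VALUES = 1000   # values per NIR instance, from the slide

values_per_instance = SCANS * MZ_CHANNELS          # 1,500,000 values
ratio_to_nir = values_per_instance // NIR_VALUES   # 1500 times an NIR instance
bytes_as_doubles = values_per_instance * 8         # 12,000,000 bytes if stored as doubles

def bin_sum(signal, width):
    """Sum consecutive blocks of `width` values; the last block may be shorter."""
    return [sum(signal[i:i + width]) for i in range(0, len(signal), width)]

# Toy ion-count trace reduced from 10 values to 3 features
trace = [0, 1, 5, 9, 4, 1, 0, 0, 2, 7]
features = bin_sum(trace, 4)
```

At roughly 12 MB per instance under these assumptions, even a few thousand chromatograms are a substantial load, which is consistent with the emphasis on pre-processing and parallel processing.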
Process Requirements
Solution = Advanced DAta Mining System (ADAMS)
• get database IDs of chromatograms
• load chromatograms from DB
• identify and reject outliers
• obtain calibration set information, check correctness of set
• align with calibration chromatogram, check correlation
• compound-specific outlier detection
• generate artificial chromatogram with peaks of compound and spike compound
• generate output for WEKA
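The "align with calibration chromatogram, check correlation" step in the list above can be illustrated as follows. This is a hedged sketch and not ADAMS code: the function names, the integer-shift alignment, the zero padding, and the 0.9 correlation threshold are all assumptions chosen for illustration.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5

def shift(signal, k):
    """Shift right by k points (left if k < 0), padding with zeros; needs |k| < len(signal)."""
    n = len(signal)
    if k >= 0:
        return [0.0] * k + signal[:n - k]
    return signal[-k:] + [0.0] * (-k)

def align(signal, reference, max_shift):
    """Pick the integer shift that best correlates with the reference."""
    best = max(range(-max_shift, max_shift + 1),
               key=lambda k: pearson(shift(signal, k), reference))
    return best, shift(signal, best)

def process(signal, reference, max_shift=3, min_corr=0.9):
    """Align, then reject (return None) if the correlation is still too low."""
    _, aligned = align(signal, reference, max_shift)
    return aligned if pearson(aligned, reference) >= min_corr else None

calibration = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0]
sample = [0.0, 0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0]  # same peak, two points late

best_shift, aligned = align(sample, calibration, max_shift=3)
accepted = process(sample, calibration)
rejected = process([1.0] * 8, calibration)  # featureless trace fails the check
```

Chaining small steps like these, each of which can reject an instance, mirrors the shape of the workflow listed above; a real system would need a more robust alignment than a single global shift.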
Limitations and future directions
• What we have seen so far works with data resident in memory (RAM) all the time
• This implies a limit can easily be reached, esp in applications like GCMS.
• We would like to be able to learn from potentially infinite data sources but with finite memory (RAM).
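One way to picture learning from an unbounded source with bounded memory is a model that is updated one instance at a time and never stores the data. The sketch below is an assumption-laden illustration, not the API of any particular stream-mining framework: the class `OnlineLinearRegressor`, its `learn_one` method, the learning rate, and the synthetic `stream` generator are all hypothetical.

```python
class OnlineLinearRegressor:
    """Linear model updated one instance at a time with stochastic gradient descent."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def learn_one(self, x, y):
        # Gradient step on the squared error of this single instance
        error = self.predict(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error

def stream(n):
    """Hypothetical data source: yields (x, y) pairs with y = 2x + 1."""
    for i in range(n):
        x = (i % 10) / 10
        yield [x], 2 * x + 1

model = OnlineLinearRegressor(n_features=1)
for x, y in stream(5000):
    model.learn_one(x, y)  # each instance is seen once, then discarded
```

Memory use depends only on the number of features, not on how many instances have passed through, so the same loop would work whether the stream holds five thousand instances or never ends.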
Solution = MOA
Future Directions
• Investigate how to get users to deploy their own DM solutions
• Implement incremental pre-processing techniques (Joao has already started!), eg incremental outlier detection.
• Implement incremental algorithms, especially for regression.
• Encourage work on abstention classifiers, uncertainty associated with point predictions etc.
• Meta-mine which units of a workflow are useful in tandem
• Investigate fusion: ADAMS with MOA, data (image+features), tasks (multiview, multitask, transfer)
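As an illustration of what incremental outlier detection could look like, the sketch below keeps a running mean and variance (Welford's update) and flags values by z-score. The z-score rule is a swapped-in example technique and not the work referred to above; the class name, the threshold of 3.0, and the warm-up of five instances are assumptions.

```python
class RunningOutlierDetector:
    """Flags values far from the running mean, using constant memory."""

    def __init__(self, threshold=3.0, warmup=5):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.threshold = threshold
        self.warmup = warmup

    def update(self, x):
        # Welford's single-pass update of mean and squared deviations
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_outlier(self, x):
        if self.n < self.warmup:
            return False  # too little data to judge
        std = (self.m2 / (self.n - 1)) ** 0.5
        if std == 0:
            return x != self.mean
        return abs(x - self.mean) / std > self.threshold

    def process(self, x):
        """Return True if x is flagged; only unflagged values update the statistics."""
        flagged = self.is_outlier(x)
        if not flagged:
            self.update(x)
        return flagged

detector = RunningOutlierDetector()
values = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 50.0, 10.1]
flags = [detector.process(v) for v in values]
```

Keeping flagged values out of the running statistics is a design choice: it stops one extreme reading from inflating the variance and masking later outliers, at the cost of adapting more slowly if the data genuinely drifts.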
Finally
Questions or Comments?