Transcript
Page 1: Machine Learning Application Development

Developing Machine Learning Applications

Geoff Holmes, University of Waikato


Page 2: Machine Learning Application Development

Outline

• What application development have we done?

• What lessons have we learned?

• What is needed in terms of the future of machine learning application development?


Page 3: Machine Learning Application Development

Applications – a taxonomy

• UCI data sets – very much like our early agricultural data

• Competition data – usually larger than above, often difficult

• Signal control applications (often involve reinforcement learning) – eg autonomous helicopters, vehicles, learning the signature of a great pianist, learning to sail, learning to drive racing cars faster, learning to play soccer (often linked to robotics)

• Key to success = objective measurement – eg Human Computer Interaction, Speech and Image Recognition, Computer Games, etc.


Page 4: Machine Learning Application Development

WEKA Waikato Environment for Knowledge Analysis

• Machine Learning at Waikato started in 1993

• Build an interface to enable several ML methods to be compared on the same data

• Explore datasets of importance to the agricultural sector in NZ

• Apple bruising, Venison bruising, Bull behaviour, Grass grubs, Pasture production, Pea seed colour, Slugs, Squash harvest, Wasp nests, White clover persistence

• Cow culling

• Datasets were very much of the “bring out your dead” variety


Page 5: Machine Learning Application Development

WEKA – unscientific study from Google Scholar

• For the query “WEKA applications”

• Bioinformatics

• Grid Computing

• Medicine

• Business and Finance

• Computer Networks

• Education


Page 6: Machine Learning Application Development

Early lessons learned

• Using WEKA is good but only static solutions are possible

• Datasets need to be large enough to yield significant and meaningful results

• Datasets involving human judgement tend to be unreliable


Page 7: Machine Learning Application Development

Scientific Equipment Application Methodology

• Obtain samples and reference data from existing technology (eg wet chemistry) – establish targets Y.

• Process same samples using a proxy (eg NIR) – new X

• Construct new dataset with new X and Y
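
The three steps above amount to joining two measurements of the same samples. A minimal sketch, assuming samples are keyed by a shared ID; the function name, the dictionary layout and the numbers are illustrative assumptions, not part of the original system:

```python
def build_dataset(spectra, reference):
    """Pair each sample's proxy measurement (X) with its reference value (Y).

    spectra:   dict mapping sample ID -> list of proxy readings (e.g. NIR)
    reference: dict mapping sample ID -> reference value (e.g. wet chemistry)
    Samples missing from either source are skipped.
    """
    rows = []
    for sample_id in sorted(spectra.keys() & reference.keys()):
        rows.append((sample_id, spectra[sample_id], reference[sample_id]))
    return rows


# Made-up values, for illustration only.
spectra = {
    "s1": [0.12, 0.30, 0.45],
    "s2": [0.10, 0.28, 0.50],
    "s3": [0.11, 0.31, 0.47],  # no reference value, so it is dropped
}
reference = {"s1": 4.2, "s2": 3.9, "s4": 5.0}  # s4 has no spectrum

dataset = build_dataset(spectra, reference)
```

Only samples measured by both technologies end up in the new dataset, which is why the same physical samples have to go through both processes.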


Page 8: Machine Learning Application Development

Near Infrared Spectroscopy

• Once the concept was proven, we needed a system to support commercial use (ie alongside the LIMS)

• Developed S2 (with WEKA interface):

• Used continuously at Hill Laboratories and BLGG (Holland) since around 2005 – never gone wrong!

• So far it is the best application of the technology that we have ever come across.

• Faster than wet chemistry

• Predictions can be more accurate

• Large cost savings – multiple analyses per sample


Page 9: Machine Learning Application Development

S2


Page 10: Machine Learning Application Development

NIR – lessons learned

• Very lightweight input/output solution using dropbox methodology was successful as it is transparent and seamless alongside a LIMS.

• Instrument data is extremely reliable

• In this industry, compliance is important, which implies that a single algorithm is better than choosing the best method per dataset.

• As data is abundant, models are rebuilt from time to time.

• No facility for users to develop new applications.
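
The dropbox methodology in the first bullet can be pictured as a pair of folders. This is a sketch under the assumption that "dropbox" means a shared input folder the LIMS writes to and an output folder it reads from; the file format, folder names and `mean_predictor` are hypothetical stand-ins, not the S2 implementation:

```python
import tempfile
from pathlib import Path


def process_inbox(inbox, outbox, predict):
    """Process every file waiting in `inbox` once, writing results to `outbox`."""
    handled = []
    for path in sorted(Path(inbox).glob("*.csv")):
        values = [float(v) for v in path.read_text().strip().split(",")]
        result = predict(values)
        (Path(outbox) / (path.stem + ".out")).write_text(f"{result}\n")
        path.unlink()  # remove the input so it is not processed twice
        handled.append(path.name)
    return handled


def mean_predictor(values):
    # Hypothetical stand-in for a trained model.
    return sum(values) / len(values)


with tempfile.TemporaryDirectory() as tmp:
    inbox, outbox = Path(tmp) / "in", Path(tmp) / "out"
    inbox.mkdir()
    outbox.mkdir()
    (inbox / "sample1.csv").write_text("1.0,2.0,3.0")
    handled = process_inbox(inbox, outbox, mean_predictor)
    written = (outbox / "sample1.out").read_text().strip()
```

In a deployment the function would be called on a timer. The calling system only ever sees files appear and disappear, which is what makes the approach transparent.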


Page 11: Machine Learning Application Development

Gas Chromatography Mass Spectrometry

• Analytical instrument that combines the features of gas chromatography and mass spectrometry to identify different substances within a test sample

• Typical Applications

• Environmental monitoring

• Food and beverage analysis

• Criminal forensics (CSI!)

• Drugs/explosives detection


Page 12: Machine Learning Application Development

Example Chromatogram (PAH) – ion counts


Page 13: Machine Learning Application Development

MS fingerprints


Page 14: Machine Learning Application Development

Machine Learning Approach

• Chromatograms are pre-processed to extract features

• Dataset constructed by combining pre-processed chromatograms with analyst-checked compound concentrations

• Learn the relationship between pre-processed chromatograms and compound concentrations:

• extensive pre-processing of data

• parallel processing – 5000 * 300 values per instance (NIR = 1000)

• pre-processing varies among compounds
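
To put the instance sizes above in perspective, the arithmetic can be spelled out. The value counts come from the slide; the 8 bytes per value is an assumption (64-bit floating point storage), so the memory figure is only indicative:

```python
GCMS_VALUES = 5000 * 300  # values per GCMS instance, from the slide
NIR_VALUES = 1000         # values per NIR instance, from the slide
BYTES_PER_VALUE = 8       # assumption: 64-bit floating point storage


def instance_bytes(n_values, bytes_per_value=BYTES_PER_VALUE):
    """Raw storage needed for one instance, ignoring any overhead."""
    return n_values * bytes_per_value


ratio = GCMS_VALUES // NIR_VALUES             # how much larger a GCMS instance is
gcms_mb = instance_bytes(GCMS_VALUES) / 1e6   # megabytes per GCMS instance
```

Under that assumption a single GCMS instance holds 1,500,000 values (1500 times an NIR instance) and needs about 12 MB of raw storage.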


Page 15: Machine Learning Application Development

Process Requirements


Page 16: Machine Learning Application Development

Solution = Advanced DAta Mining System

• get database IDs of chromatograms

• load chromatograms from DB

• identify and reject outliers

• obtain calibration set information, check correctness of set

• align with calibration chromatogram, check correlation

• compound-specific outlier detection

• generate artificial chromatogram with peaks of compound and spike compound

• generate output for WEKA
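
As an illustration of the "align with calibration chromatogram, check correlation" step, here is one plausible way to do it: try a range of shifts and keep the one with the highest Pearson correlation. The slides do not say how ADAMS performs alignment, so the method, the function names and the toy signals are all assumptions:

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def best_shift(signal, calibration, max_shift):
    """Return (shift, correlation) for the shift that best matches `calibration`.

    A positive shift means the signal is delayed by that many samples.
    """
    best = (0, -1.0)
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            a = signal[shift:]
            b = calibration[:len(a)]
        else:
            b = calibration[-shift:]
            a = signal[:len(b)]
        n = min(len(a), len(b))
        r = pearson(a[:n], b[:n])
        if r > best[1]:
            best = (shift, r)
    return best


# Toy chromatograms: the same peak, delayed by two samples in `signal`.
calibration = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
signal = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]

shift, correlation = best_shift(signal, calibration, max_shift=3)
accepted = correlation >= 0.9  # hypothetical acceptance threshold
```

A chromatogram whose best correlation stays below the threshold would be flagged instead of being passed on to the later steps.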

Page 17: Machine Learning Application Development

Limitations and future directions

• What we have seen so far works with data resident in memory (RAM) all the time

• This implies a limit can easily be reached, especially in applications like GCMS.

• We would like to be able to learn from potentially infinite data sources but with finite memory (RAM).
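
The idea of learning from an unbounded stream with fixed memory can be shown with a very small model. This sketch fits a one-variable linear regression by keeping only five running statistics, so memory use does not grow with the number of examples; it is an illustration of the principle, not MOA's API or any of its algorithms:

```python
class StreamingLinearRegression:
    """Least-squares line fitted one example at a time in constant memory."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # running sum of squared deviations of x
        self.sxy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        """Fold one (x, y) example into the statistics, then discard it."""
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.sxx += dx * (x - self.mean_x)
        self.sxy += dx * (y - self.mean_y)

    def predict(self, x):
        slope = self.sxy / self.sxx if self.sxx > 0 else 0.0
        return self.mean_y + slope * (x - self.mean_x)


def stream():
    # Stand-in for a data source that could be arbitrarily long.
    for i in range(100):
        yield float(i), 2.0 * i + 1.0


model = StreamingLinearRegression()
for x, y in stream():
    model.update(x, y)

prediction = model.predict(200.0)
```

Each example is seen once and never stored, which is the property that batch learning with all data resident in RAM lacks.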


Page 18: Machine Learning Application Development

Solution = MOA


Page 19: Machine Learning Application Development

Future Directions

• Investigate how to get users to deploy their own DM solutions

• Implement incremental pre-processing techniques (Joao has already started!), eg incremental outlier detection.

• Implement incremental algorithms, especially for regression.

• Encourage work on abstention classifiers, uncertainty associated with point predictions etc.

• Meta-mine which units of a workflow are useful in tandem

• Investigate fusion: ADAMS with MOA, data (image+features), tasks (multiview, multitask, transfer)
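
For readers unfamiliar with the abstention classifiers mentioned above, the basic idea is that a model may decline to predict when it is not confident enough. A minimal sketch, assuming the underlying classifier already produces class probabilities; the function name, labels and threshold are illustrative:

```python
def predict_or_abstain(class_probabilities, threshold=0.8):
    """Return the most probable label, or None when confidence is below threshold."""
    label, confidence = max(class_probabilities.items(), key=lambda kv: kv[1])
    return label if confidence >= threshold else None


confident = predict_or_abstain({"pass": 0.95, "fail": 0.05})
uncertain = predict_or_abstain({"pass": 0.55, "fail": 0.45})
```

Here `confident` is a label while `uncertain` is `None`, signalling that the sample should be referred elsewhere, for example to an analyst.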


Page 20: Machine Learning Application Development

Finally

Questions or Comments?


