Slide 1: Object Classification in the Virtual Observatory: A VO Status Report
Tom McGlynn, NASA/GSFC
THE US NATIONAL VIRTUAL OBSERVATORY
NVO Summer School, September 2005
Slide 2: How do we know what we want in the VO?
• Pretend the VO exists.
• What is the science we are doing with it?
• Now try to do that science and see what gets in the way.
Slide 3: Can we classify ROSAT X-ray sources?
[Figure: all RASS sources (124,730)]
• Classified RASS sources: ~7,000
• Total RASS sources: ~130,000
Slide 4: What do we want to do?
• Find counterparts to ROSAT X-ray sources in optical, IR, and radio.
• Train a classifier to use multiwavelength information to determine the type of objects.
• Classify all of the objects seen by ROSAT.
Slide 5: What is classification?
• Translation from observables to distinct physical processes.
• Each element is classified independently of the others.
• Is classification different from measurement?
• Classification versus cataloging.
• We usually classify 'objects', but also:
  – Events: GRBs, solar flares, …
  – Simulated data
  – Pixels/regions in an image: Earth and planetary studies, shocked regions, …
Slide 6: It's not just us.
http://aria.arizona.edu/courses/tutorials/class/html/class.html
[Figure: a typical plot of objects to be classified?]
There is lots of information and discussion of classification outside astronomy.
Slide 7: Examples
• Moving versus fixed stars
• Classes of stellar spectra (ordered by strength of Balmer lines)
  – Substitute for a measurement
  – Cf. dwarf versus giant
• Osterbrock diagram: AGN versus star-forming emission-line galaxies
• Bautz-Morgan types of clusters of galaxies
  – Dominance of the cluster by the central galaxy
• Types of X-ray sources: AGN, SNRs, pulsars, XRBs, …
Slide 8: Galaxy Classification
Slide 9: Why do we classify?
• Understand a given field.
• Generate statistical samples.
• Compare different regions/observations.
• Find rare objects.
• Remove unwanted backgrounds.
• Plan subsequent observations.
• …
Slide 10: Do we know what we are looking for?
• Yes: We have a good idea of the kinds of objects that are in the field.
  – Supervised classification
  – Find out which regions of observable 'phase space' belong to which classes and use that knowledge to classify new sources.
• No: We don't really know what we're looking at.
  – Unsupervised classification
  – Is there any structure in the phase space distribution?
The sketch below contrasts the two approaches.
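A minimal sketch of the contrast, assuming scikit-learn; the observables and the toy labels are illustrative, not the ones used for the ROSAT work.

```python
# Contrast supervised and unsupervised classification on a toy table
# of per-source observables (columns are made up for illustration).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))              # e.g. a hardness ratio and a flux ratio
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # pretend "true" classes for 200 sources

# Supervised: learn which regions of phase space belong to which class
# from labeled training sources, then classify new ones.
clf = KNeighborsClassifier(n_neighbors=5).fit(X[:150], y[:150])
print("predicted classes:", clf.predict(X[150:155]))

# Unsupervised: ask only whether the phase space has structure;
# the groups found are anonymous until we interpret them.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print("cluster labels:", km.labels_[:5])
```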
Slide 11: Supervised versus unsupervised classification
[Figure: Supervised and Unsupervised Land Use Classification, Chris Banman]
http://www.emporia.edu/earthsci/student/banman5/perry3.html
Slide 12: Supervised classification
• Often has a 'training' phase where a priori knowledge is used to tune the classifier algorithm. Training takes most of the time.
  – But the Osterbrock diagram is based on theoretical modeling.
• We specify a list of output classes.
• May give a list of probabilities of membership in more than one class.
• Algorithms: neural networks, nearest neighbor, decision trees (see the sketch below).
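A minimal sketch of a supervised tree classifier with probabilistic output, assuming scikit-learn. ClassX used oblique decision trees; scikit-learn's axis-parallel tree stands in for them here, and the training data are synthetic.

```python
# Train a decision tree and ask for per-class membership probabilities,
# as the slide describes. Data and class names are invented.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 3))                 # rows: sources, cols: observables
y_train = np.digitize(X_train[:, 0], [-0.5, 0.5])   # classes 0, 1, 2 (made up)

tree = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)

# Probabilities of membership in more than one class:
X_new = rng.normal(size=(2, 3))
print(tree.predict_proba(X_new))                    # one row of probabilities per source
```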
Slide 13: Supervised classifier training
[Figures: neural networks; oblique decision trees]
Slide 14: Unsupervised classification
• Tries to find natural groupings of data.
• User often specifies the number of classes to find.
• Classes found are anonymous: it is up to the user to define their physical meaning.
• Algorithms: self-organizing maps, K-means, C-means, hierarchical clustering, Gaussian mixtures (see the sketch below).
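A minimal sketch of one of the listed algorithms, a Gaussian mixture, assuming scikit-learn; the two-column data are synthetic.

```python
# Fit a Gaussian mixture with a user-chosen number of components.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two made-up observables per source (e.g. two colors), two real groups.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(+2, 1, (100, 2))])

gmm = GaussianMixture(n_components=2).fit(X)   # we specify the number of classes
labels = gmm.predict(X)                        # labels are anonymous: 0, 1, ...
# Deciding what "class 0" physically means is left to the user.
print(np.bincount(labels))
```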
Slide 15: Self-organizing maps
[Figures: a self-organizing map of catalogs in VizieR; K-means; fuzzy C-means]
Slide 16: Some key questions
1. (S) What output classes are we interested in, and what degree of resolution do we want? (Star versus galaxy, or A0V versus SBa?)
   (U) How many classes might we expect?
2. What input data sets are we going to use?
3. How are we going to get them?
4. How do we combine them?
5. What observables are available? Which are useful?
6. (S) What training sets are available?
   (U) How do we understand the output classes?
7. What algorithm are we going to use in classification?
8. How can we test the results so that we believe them?
Slide 17: Specification/Count of Output Classes
We weren't sure how detailed our classifications could be and had to experiment with the classifiers to see what might be feasible.
Does the VO help? Not directly. The choice will often be implicit in the problem, but by making other aspects of classification easier, the VO makes experimenting with this choice easier.
Slide 18: What input data sets are we going to use?
We knew which datasets we were going to use, but we added one along the way.
Does the VO help? Maybe. VO registries can help find resources, but these will often be implicit in the problem.
Slide 19: How are we going to get the data?
We used custom interfaces to get data from different resources, but VOTables were developed early enough for us to use (a Perl VOTable parser from the ClassX effort). This took a fair bit of work.
Does the VO help? A lot. There are just a few standard ways to get the data and nice standard ways of defining it. Limits on some services are still annoying. New libraries can make this part really easy, though large XML files are cumbersome to process in many tools. (A sketch of a standard retrieval path follows.)
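A minimal sketch, assuming astropy. The service URL is hypothetical; any IVOA Simple Cone Search service takes RA, DEC, and SR parameters and returns a VOTable, which astropy parses directly.

```python
# Fetch a cone-search result and read the VOTable into a table.
from urllib.parse import urlencode
from urllib.request import urlretrieve
from astropy.io.votable import parse_single_table

base = "https://example.org/scs"            # hypothetical cone-search endpoint
query = urlencode({"RA": 10.68, "DEC": 41.27, "SR": 0.1})
urlretrieve(f"{base}?{query}", "sources.xml")

# Parsing the (possibly large) XML by hand was the hard part in 2005;
# today one call turns the VOTable into an astropy Table.
table = parse_single_table("sources.xml").to_table()
print(table.colnames)
```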
Slide 20: How do we combine them?
We used custom software. This took a lot of work, and we had to deal with the issue of multiple counterparts to each X-ray source.
Does the VO help? A lot. XMatch does a lot of what we want, though not everything. Note that the spatial matching capabilities in TOPCAT allow merging of data from ConeSearch too. (A matching sketch follows.)
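A minimal sketch of positional matching, assuming astropy; the coordinates are made up, and ClassX's own matcher was custom software.

```python
# Match an X-ray source list against a catalog of candidate counterparts.
import astropy.units as u
from astropy.coordinates import SkyCoord

xray = SkyCoord(ra=[10.01, 55.30] * u.deg, dec=[41.20, -3.15] * u.deg)
optical = SkyCoord(ra=[10.02, 55.29, 120.0] * u.deg,
                   dec=[41.21, -3.16, 22.5] * u.deg)

# Nearest counterpart for each X-ray source, plus its separation.
idx, sep2d, _ = xray.match_to_catalog_sky(optical)
matched = sep2d < 30 * u.arcsec          # an illustrative matching radius
print(idx, sep2d.arcsec, matched)
```

For the multiple-counterpart issue the slide mentions, astropy's search_around_sky returns every pair within the radius rather than just the nearest neighbor.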
Slide 21: What observables are available? Which are useful?
This took a lot of work. Understanding what variables were available and getting full descriptions was difficult.
Does the VO help? A little. Visualization tools like Mirage are nice for getting a feel for the data, but non-VO tools (e.g., IDL itself) may do this just as well. Documentation in the VO is probably not better than before, but a common framework for getting information to users is available, if providers ever get around to providing adequate documentation.
Slide 22: Classification needs the right information, not all information.
[Figure: the Hughes effect, from "Classification of Multi-Spectral Data by Joint Supervised-Unsupervised Learning" (Shahshahani & Landgrebe)]
A sketch of the effect follows.
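A minimal sketch of the Hughes effect, assuming scikit-learn: with a fixed, small training set, adding uninformative features eventually hurts accuracy. All data here are synthetic.

```python
# Accuracy versus number of pure-noise features, with a tiny training set.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(7)
n_train, n_test = 30, 1000                  # deliberately tiny training set
y_tr = rng.integers(0, 2, n_train)
y_te = rng.integers(0, 2, n_test)

def features(y, n_noise):
    # One informative column (mean shifted by class) plus noise columns.
    info = rng.normal(loc=2.0 * y, scale=1.5)[:, None]
    return np.hstack([info, rng.normal(size=(len(y), n_noise))])

for n_noise in (0, 5, 50, 200):
    clf = GaussianNB().fit(features(y_tr, n_noise), y_tr)
    acc = clf.score(features(y_te, n_noise), y_te)
    print(f"{n_noise:4d} noise features -> test accuracy {acc:.2f}")
```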
Slide 23: Training set/ground truth data
We knew most of the training data in advance.
Does the VO help? The VO registry may point out some possibilities, but training or truth data may be implicit in the problem.
Slide 24: What algorithm are we going to use in classification?
We had experience with oblique decision trees.
Does the VO help? A little. VOStat provides a few capabilities for unsupervised classification, but the Web interface is a little flaky. Web service interfaces to a few standard classifiers might be nice. The VO could do a lot more here.
Slide 25: VOStat
• See www.vostat.org
• Statistics routines on-line with a VO interface.
• Downloadable library
• Fairly minimal Web interface
• Includes K-means and hierarchical clustering tools (a hierarchical-clustering sketch follows).
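A minimal sketch of hierarchical clustering of the kind VOStat offered, assuming scipy; the input data are synthetic.

```python
# Build a merge tree over synthetic data, then cut it into flat clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

Z = linkage(X, method="ward")                  # agglomerative merge tree
labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
print(np.bincount(labels)[1:])                 # sizes of the two clusters
```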
Slide 26: How can we test the results so that we believe them?
We found a number of independently classified sets of objects and checked for consistency.
Does the VO help? Yes. This is probably where we can most effectively use VO resources we discover in the registry. However, a couple of the samples we used were not yet published. (A consistency-check sketch follows.)
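A minimal sketch of the consistency check, assuming scikit-learn; the labels here are invented, while the real check used published and unpublished samples.

```python
# Compare our classifications against an independently classified sample.
from sklearn.metrics import accuracy_score, confusion_matrix

independent = ["AGN", "star", "AGN", "XRB", "star", "AGN"]
ours        = ["AGN", "star", "star", "XRB", "star", "AGN"]

print(confusion_matrix(independent, ours, labels=["AGN", "star", "XRB"]))
print("agreement:", accuracy_score(independent, ours))
```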
Slide 27: Testing the results
• Classify independently classified datasets.
• Check faint sources?
Slide 28: Overall…
A lot of progress has been made since we started ClassX, but plenty of issues still remain.
Slide 29: A ClassX phase space slice
Slide 30: Science
• Probabilistic classifications of all ROSAT X-ray sources: McGlynn et al. 2004, ApJ, 616, 1284 (2004ApJ...616.1284M)
• New HMXRBs: Suchkov & Hanisch 2004, ApJ, 612, 437 (2004ApJ...612..437S)