International Atomic Energy Agency
October 2013 INIS Training Seminar 1
Subject Analysis:
Computer Assisted Indexing
07 – 11 October 2013
Vienna, Austria
Bekele Negeri, Nuclear Information Specialist, INIS Unit
(Adapted from A. Nevyjel’s presentation)
Subject Indexing Tools
There are two main INIS products used for indexing: WinFibre and CAI.
• WinFibre: for input preparation, both bibliographic and subject indexing
• CAI (Computer Assisted Indexing): for subject classification and indexing
INIS/ETDE Thesaurus and INIS Subject Category Codes are incorporated in both.
Indexing with FIBRE
Computer-assisted Indexing - CAI
• Kick-off Meeting: Jan 2004
• Implementation and Customisation: Jun 2004
• Production Indexing: from Jun 2004, ongoing
• CAI version 1.0 final acceptance: Aug 2004
• Tuning of the system: from Aug 2004, ongoing
• CAI batch processing for Member States: Dec 2004
• CAI online from remote for MS: Nov 2007
CAI Thesaurus Extension
• Thesaurus
  • Valid Descriptors: 22,051
  • Forbidden Terms: 8,675
  • Total: 30,726
• CAI
  • Hidden Terms: ~35,000
Terminological Knowledge Base
CAI Thesaurus Extension
“Hidden terms” are character patterns representing the different ways a concept can appear in free text; each concept is indexed by one or more descriptors.
• handled like “forbidden terms”, with one or more USE relations
• CAI internal only
• not exported to the INIS production system
• not exported to FIBRE
• not printed in any version of the thesaurus
• support identification of descriptors in the free text
Hidden Terms: Compounds and Isotopes
Descriptor          Hidden term    Free text
MAGNESIUM BORIDES   MgB_2          MgB2
ACETIC ACID         C_2H_4O_2      C2H4O2
CESIUM 137          (none)         Cesium 137, Cesium-137
CESIUM 137          "1"3"7cs       137Cs
CESIUM 137          137 caesium    137 Caesium, 137-Caesium
CESIUM 137          caesium 137    Caesium 137, Caesium-137
CESIUM 137          137 cesium     137 Cesium, 137-Cesium
CESIUM 137          137 cs         137 Cs, 137-Cs
CESIUM 137          cs 137         Cs 137, Cs-137
CESIUM 137          cs"1"3"7       Cs137
CESIUM 137          cs137          Cs137
Hidden Terms: Elementary Particles and Countries
Descriptor          Hidden term    Free text
ELECTRON NEUTRINOS  #nu#_e         νe
MUON NEUTRINOS      #nu#_#mu#      νμ
TAU NEUTRINOS       #nu#_#tau#     ντ
RHO-770 MESONS      #rho#-770      ρ-770
OMEGA-782 MESONS    #omega#-782    ω-782
Country Names:
Descriptor          Hidden term
CAMBODIA            kampuchea
COTE D'IVOIRE       ivory coast
GREECE              hellas
MYANMAR             burma
THAILAND            siam
Hidden Terms: UK/US Spellings
Descriptor                  Hidden term
A CENTERS                   a centres
ACTIVITY METERS             activity metres
ANALOG COMPUTERS            analogue computers
ANESTHESIA                  anaesthesia
ARCHAEOLOGY                 archeology
AUSTRIAN ORGANIZATIONS      austrian organisations
BALLISTIC MISSILE DEFENSE   ballistic missile defence
BAYARD-ALPERT GAGES         bayard-alpert gauges
BEAM ANALYZERS              beam analysers
BEHAVIOR                    behaviour
CATALOGS                    catalogues
Hidden Terms: Other Spellings
Singular/Plural
Descriptor                  Hidden term
FUNGI                       fungus
FUNGI                       funguses
G MATRIX                    g matrices
G MATRIX                    g matrixes
Reverse Sequence
Descriptor                  Hidden term
ATOM-MOLECULE COLLISIONS    atom-molecule scattering
ATOM-MOLECULE COLLISIONS    molecule-atom scattering
ATOM-MOLECULE COLLISIONS    atom-molecule reactions
ATOM-MOLECULE COLLISIONS    molecule-atom reactions
ATOM-MOLECULE COLLISIONS    atom-molecule interactions
ATOM-MOLECULE COLLISIONS    molecule-atom interactions
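The hidden-term idea from the preceding slides can be illustrated with a minimal sketch. This is not the CAI implementation: the dictionary layout, the function name `suggest_descriptors`, and the whole-word matching rule are assumptions made for illustration; only the term/descriptor pairs come from the slides.

```python
import re

# Hypothetical miniature knowledge base: lower-cased hidden term -> descriptors
# reached through USE relations. The pairs are taken from the slides; the data
# structure itself is an assumption.
HIDDEN_TERMS = {
    "kampuchea": ["CAMBODIA"],
    "ivory coast": ["COTE D'IVOIRE"],
    "siam": ["THAILAND"],
    "behaviour": ["BEHAVIOR"],
    "molecule-atom scattering": ["ATOM-MOLECULE COLLISIONS"],
}

def suggest_descriptors(text, hidden_terms=HIDDEN_TERMS):
    """Return descriptors whose hidden term occurs in the free text."""
    lowered = text.lower()
    found = []
    for pattern, descriptors in hidden_terms.items():
        # Require non-word characters (or text boundaries) on both sides,
        # so "siam" does not fire inside "Siamese".
        if re.search(r"(?<!\w)" + re.escape(pattern) + r"(?!\w)", lowered):
            for descriptor in descriptors:
                if descriptor not in found:
                    found.append(descriptor)
    return found

if __name__ == "__main__":
    print(suggest_descriptors("Behaviour of soils in Kampuchea and the Ivory Coast"))
```

Lower-casing everything keeps the table small, but it is also the source of the case-sensitivity problems that the presentation lists as open issues.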
Further Improvements Necessary
• “+” and “-” signs
  • K+ → KAONS PLUS, KAONS MINUS, POTASSIUM IONS
• Case sensitivity
  • TiN → TIN (instead of TITANIUM NITRIDES)
  • gas → GALLIUM SULFIDES
  • “…who is the …” → WHO (World Health Organization)
• Verbs versus nouns
  • “… this leads us to …” → LEAD
  • “… this leaves it ….” → LEAVES
• Homographic terms
  • Solutions → SOLUTIONS or MATHEMATICAL SOLUTIONS
• Nuclear reactions, e.g. 14N(γ,α)10B
  • Targets
  • Beams
  • Reactions
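One possible way to reduce the case-sensitivity errors listed above is to check a case-sensitive table before falling back to case-insensitive matching. The sketch below only illustrates that idea; the two tables and the function `match_token` are hypothetical and do not describe how CAI works or is planned to work.

```python
# Hypothetical lookup tables; entries are limited to the examples on the slide.
CASE_SENSITIVE = {
    "TiN": "TITANIUM NITRIDES",
    "GaS": "GALLIUM SULFIDES",
    "WHO": "WHO",
}
CASE_INSENSITIVE = {
    "tin": "TIN",
    "lead": "LEAD",
}

def match_token(token):
    """Return a descriptor for one token, or None if nothing matches."""
    # Exact-case entries win, so "TiN" is not confused with the metal tin.
    if token in CASE_SENSITIVE:
        return CASE_SENSITIVE[token]
    # Everything else is compared in lower case.
    return CASE_INSENSITIVE.get(token.lower())

if __name__ == "__main__":
    for token in ["TiN", "tin", "gas", "who", "WHO"]:
        print(token, "->", match_token(token))
```

Exact-case matching addresses TiN, GaS and WHO, but not the verb/noun problem: "lead" used as a verb would still be mapped to LEAD, which would need part-of-speech information.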
INDEXING PROBLEMS
• General terms (energy, physics, materials, uses, etc.)
• Misleading CAI suggestions:
Thesaurus terms:
PRODUCTION and PARTICLE PRODUCTION
SOLUTIONS and MATHEMATICAL SOLUTIONS
IGNITION and THERMONUCLEAR IGNITION
WALLS and THERMONUCLEAR REACTOR WALLS
PLANTS and NUCLEAR POWER PLANTS
MEMBRANES (classic) and membrane (in brane theory)
COLOR and COLOR MODEL (elementary particle characteristics)
TRANSPORT, etc.
INDEXING PROBLEMS
Chemical compounds / case sensitivity / homonyms:
INDIUM IONS for “in ions”
ASTATINE 200 for “at 200 °C”
HELIUM 6 for “consisting of 6 He 3 tubes”
VISIBLE RADIATION for “light weight”
Temperature, pressure, etc. ranges
Abbreviations:
TNA for Thermal Neutron Analysis and TRINONYLAMINE
MPA for Maximum Permissible Activity and MPa (megapascal)
CAI online for Member States
introduced in July 2007
• CAI Batch used by
  • China
  • Czech Republic (seldom)
  • Georgia (only in 2012)
  • Germany
  • Iran
  • Uzbekistan
  • Vietnam
• CAI Online in use by
  • Austria
  • Bulgaria
  • Cuba
  • Israel (registering)
  • Japan
  • Mexico
  • Netherlands (seldom)
  • Uruguay
CAI online and CAI batch are now regular services for Member States
CAI Batch and Online Processing
• Input: MemSt-CC-yymmdd-xxxxxxxxxxx
  • MemSt is a standard prefix (meaning “member state”)
  • CC is the country code
  • yymmdd is the date when the file was generated
  • xxxxxxxxxxx is any additional identification
• Examples
  • MemSt-AR-041203-thisismytestfile
  • MemSt-FR-041212-fileidentification
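A Member State could check a file name against this convention before submitting it. The sketch below is only an illustration: the function `parse_input_name` is hypothetical, and it assumes a two-letter upper-case country code, which the slide's examples suggest but do not state.

```python
import re
from datetime import datetime

# MemSt-CC-yymmdd-xxxxxxxxxxx
# Assumption: CC is exactly two upper-case letters; the trailing
# identification may be any non-empty string.
FILENAME_RE = re.compile(r"^MemSt-([A-Z]{2})-(\d{6})-(.+)$")

def parse_input_name(name):
    """Split a CAI input file name into country code, date and identification."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid CAI input file name: {name!r}")
    country, stamp, ident = match.groups()
    # strptime raises ValueError for impossible dates such as 041332.
    created = datetime.strptime(stamp, "%y%m%d").date()
    return {"country": country, "date": created, "id": ident}

if __name__ == "__main__":
    print(parse_input_name("MemSt-AR-041203-thisismytestfile"))
```

Parsing the date rather than only counting digits catches typing errors such as a month of 13 before the file is sent.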
CAI Batch Processing
• Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
• These files carry the CAI-suggested descriptors in tag 800, preceded by the string ##CAI suggestions##;
• Example:
  • 800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3; …
• The files are sent back to the Member State for reviewing
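For reviewing outside of FIBRE, the suggested descriptors can be pulled out of a tag 800 line with a few lines of code. This is a minimal sketch based only on the example line above: the function `parse_suggestions` is hypothetical, and treating "^" as the separator between tag number and content is an assumption drawn from that single example.

```python
MARKER = "##CAI suggestions##;"

def parse_suggestions(line):
    """Return the CAI-suggested descriptors from one tag 800 line.

    An empty list is returned when the line is not tag 800 or when the
    marker is absent.
    """
    tag, sep, value = line.partition("^")
    if not sep or tag.strip() != "800":
        return []
    value = value.strip()
    if not value.startswith(MARKER):
        return []
    body = value[len(MARKER):]
    # Descriptors are separated by semicolons; drop empty fragments.
    return [part.strip() for part in body.split(";") if part.strip()]

if __name__ == "__main__":
    print(parse_suggestions("800^##CAI suggestions##; CESIUM 137; SOILS; CONTAMINATION"))
```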
CAI Batch and Online Processing
Reviewing Process
• Delete all suggested descriptors which are too general
• Add relevant descriptors which were not found
  • numerical values, e.g. pressure ranges, temperature ranges, ...
  • nuclear reactions
  • chemical compounds, alloys, etc.
• CAI cleans up BT/NTs (broader/narrower terms) from manual additions
• Clean up suggestions from homographic terms
CAI Batch and Online Processing
Finalisation Process
CAI batch
• When reviewing of a record is completed: delete “##CAI suggestions##”
• When reviewing of all records is completed: submit the file to the “INIS Input Box”
CAI online
• When reaching the last record: press the “export and exit” button
• The file goes directly to the INIS production system or, if required, is sent back to the Member State for reviewing
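For CAI batch, the two finalisation steps can be sketched as a pair of helpers: one removes the marker from a reviewed tag 800 line, the other checks that no marker is left before the file is submitted. Both function names are hypothetical, and the exact form of the line after the marker is removed is an assumption.

```python
MARKER = "##CAI suggestions##"

def mark_reviewed(line):
    """Remove the CAI marker, and the separator that follows it, from one line."""
    if MARKER not in line:
        return line
    head, _, tail = line.partition(MARKER)
    # Drop the "; " that followed the marker so the descriptors start cleanly.
    return head + tail.lstrip("; ")

def ready_to_submit(lines):
    """True when no line still carries the marker, i.e. all records were reviewed."""
    return not any(MARKER in line for line in lines)

if __name__ == "__main__":
    record = "800^##CAI suggestions##; CESIUM 137; SOILS"
    print(mark_reviewed(record))
    print(ready_to_submit([record]))
```

A check like `ready_to_submit` makes it harder to send a file to the INIS Input Box while some records are still unreviewed.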