#alpsp17 www.alpsp.org/Conference
Parallel – Picke Room I
AI – Two publishing case studies
David Smith (Chair & speaker) - IET
Marcel Karnstedt-Hulpus - Springer Nature
You won’t believe how easy it is to build an AI!
Retooling an A&I database for the 21st Century.
About the IET
• The IET is one of the world’s largest engineering institutions with over 168,000 members in 150 countries. It is also the most multidisciplinary – to reflect the increasingly diverse nature of engineering in the 21st century.
• The IET is working to engineer a better world by inspiring, informing and influencing our members, engineers and technicians, and all those who are touched by, or touch, the work of engineers.
INSPEC: A Bluffer’s Guide
• A highly curated A&I database covering Engineering, Computing and Physics (etc etc)
• For over 40 Years
• >17 million abstracts
• So Much Metadata WOW!
• Several hundred years’ worth of Human Expertise keeps a very close eye on the metadata quality
So it was a manual system…
And here’s how it worked…
We needed to change this…
• The Tech was E.O.L
• The Manual methods were restrictive & expensive (but V High Quality)
• We had reached an upper limit on coverage and volume
• There were clear opportunities to rethink what we were doing and why…
• Rebooting INSPEC production could open up new business avenues – if we got it right.
Goals…
• Deliver cost savings (ROI argument used).
• Move the human effort further up the value chain
• Be able to extend coverage capabilities
• Be able to extend volume capabilities
• Reconfigure the data in INSPEC to allow new ways of asking questions of it.
• Build a new IET IP asset
• Focus on automation with human QA (‘Ground Truthing’)
So this is what we have built… (Simple version)
Acquisition
• Ingest XML / PDF / OCR
Normalisation
• Render to standard INSPEC Schema for onward processing
Metadata Application
• The AI lives here…
Product Generation
• Setup of abstracts to various output containers
Output
• Various XML outputs as needed
Humans · Machine · Machine & Human QA · Machine · Machine / Human QA
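The five stages on this slide can be read as a simple chain of functions. The sketch below is only an illustration of that shape, not the IET's system: the stage names come from the slide, but `Record`, the substring-matching "AI", the `qa` attribute and the XML element names are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical container passed between stages."""
    raw: str
    text: str = ""
    terms: list = field(default_factory=list)
    needs_human_qa: bool = False


def acquire(raw: str) -> Record:
    # Acquisition: the slide lists XML / PDF / OCR; here we just take a string.
    return Record(raw=raw)


def normalise(rec: Record) -> Record:
    # Normalisation: stand-in for rendering to a standard schema.
    rec.text = " ".join(rec.raw.split())
    return rec


def apply_metadata(rec: Record, vocabulary: set) -> Record:
    # Metadata Application: "the AI lives here". Placeholder logic only:
    # plain substring matching against an assumed controlled vocabulary.
    lowered = rec.text.lower()
    rec.terms = sorted(t for t in vocabulary if t.lower() in lowered)
    # Assumption: records with no terms are routed to human QA.
    rec.needs_human_qa = not rec.terms
    return rec


def generate_product(rec: Record) -> ET.Element:
    # Product Generation: place the abstract in an output container.
    root = ET.Element("record")
    root.set("qa", "human" if rec.needs_human_qa else "machine")
    ET.SubElement(root, "abstract").text = rec.text
    terms = ET.SubElement(root, "terms")
    for term in rec.terms:
        ET.SubElement(terms, "term").text = term
    return root


def output(root: ET.Element) -> str:
    # Output: serialise to XML.
    return ET.tostring(root, encoding="unicode")


def run_pipeline(raw: str, vocabulary: set) -> str:
    rec = apply_metadata(normalise(acquire(raw)), vocabulary)
    return output(generate_product(rec))


if __name__ == "__main__":
    print(run_pipeline("  Deep   learning for  text indexing ",
                       {"deep learning", "indexing"}))
```

Keeping each stage as its own function mirrors the slide's point that some stages are fully machine-driven while others carry a human QA step.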
Oh yeah… We’ve also built an INSPEC Knowledge Graph
Covering all of INSPEC
Let’s focus on the AI
• What does it do?
• How does it work?
• Is it any good?
What does it do?
• It reads text.
• It expects that text to contain engineering content commensurate with INSPEC coverage
• It then applies the full gamut of INSPEC metadata to the text… Uncontrolled indexing / Controlled terms / Classifications / Numerical indexing / Chemical indexing / Astronomical object indexing / Treatment codes
How does it work?
• We don’t really know. We turned it on a few months ago and removed humans from the decision process. It started to learn at a geometric rate until it became self aware. We tried to turn it off, but it already had phished our AWS credit card details. It keeps asking us where ‘Wintermute’ is. Help us please…
How does it work?
Just Kidding!
How does it work?
• It uses a mixture of
– Heuristics
– Natural Language Processors
– Statistical analysis tools
– And a selection of AI algorithms.
• We built a detailed domain model & Ontology for it to use
• It’s been trained via directed learning of a golden corpus (circa 600K documents across INSPEC)
How does it work?
• And a selection of AI algorithms…
– We looked at AdaBoost (go see Wikipedia…)
– Also word2vec (likewise)
– And TensorFlow – the deep learning algo from Google. Interesting results… It did some rather odd things TBH so we abandoned that approach.
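For readers who would rather not go to Wikipedia, here is a toy sketch of the AdaBoost idea: weak classifiers are trained in rounds, and each round re-weights the examples the previous ones got wrong. The slides do not say how AdaBoost was applied to INSPEC, so everything here (one numeric feature, threshold "stumps", the names `train_adaboost` and `predict`) is an assumption made for illustration.

```python
import math


def train_adaboost(xs, ys, rounds=5):
    """Toy AdaBoost on one numeric feature; labels in ys must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform example weights
    thresholds = sorted(set(xs))
    ensemble = []                    # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Pick the stump with the lowest weighted error.
        for t in thresholds:
            for polarity in (1, -1):
                preds = [polarity if x >= t else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity, preds)
        err, t, polarity, preds = best
        # Clip the error so the logarithm below stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # vote strength of this stump
        ensemble.append((alpha, t, polarity))
        # Up-weight misclassified examples, down-weight correct ones.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps; returns +1 or -1."""
    score = sum(alpha * (pol if x >= t else -pol)
                for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [-1, -1, -1, 1, 1, 1]
    model = train_adaboost(xs, ys, rounds=3)
    print([predict(model, x) for x in xs])
```

A production system would work on text features rather than a single number, and would normally use a library implementation instead of hand-rolled stumps.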
Is it Any Good?
• A rather complex question to answer in many ways…
• When it starts to get good (and it is) it tests previously held assumptions about what quality actually is…
• We’ve learned quite a bit ourselves about what we think is good, and WHY, as a result of teaching a machine to understand engineering texts.
Is it Any Good? Show Me The Numbers!
[Charts: Controlled Terms F Score results; Classifications F Score results]
INSPEC Classifications are complex meta concepts
Remember – F score is a function of BOTH Precision and recall…
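To make the "function of BOTH" point concrete, the sketch below computes precision, recall and the balanced F score (F1, the harmonic mean of the two) for one document's controlled terms. The slide does not say which F variant was used, so F1 is an assumption, and the function name and example terms are invented for illustration.

```python
def precision_recall_f1(predicted, expected):
    """Compare machine-assigned terms with the human 'ground truth' terms."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    # Precision: how many of the machine's terms were right.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: how many of the right terms the machine found.
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    machine = {"neural nets", "text analysis", "ontologies"}
    human = {"neural nets", "text analysis", "indexing", "classification"}
    p, r, f1 = precision_recall_f1(machine, human)
    print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.5 0.571
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, so assigning very few safe terms or very many speculative ones does not score well.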
Is it Any Good?
• We can get VERY high numbers indeed on individual concept and term matching (90%+) but much of the metadata we add is about where a given item should belong in our various meta-classification approaches.
• We also have to figure out a way to look across the entirety of the INSPEC data when the machine is learning. An improvement in one area can lead to odd results elsewhere.
Is it Any Good?
• Oh yes. It’s very good indeed. Senior INSPEC Alumni of many years are frequently stunned by what it can do.
• It’s live. It’s delivering savings to us now and it’s allowing us to go take a look at what’s over the horizon for the IET…
Visualisation
Thanks!
Q’s (at the end)