Getting the Results You Want From Unstructured Data: An Overview for Developers
Meet the Expert
Aaron Chavez, VP of Engineering
#datatoresults
Pioneer of web services for real-time text and image analysis
• Founded in 2005
• 40,000+ users
• Used in 36+ countries
• More than 1 billion API calls monthly
• Deep learning experts
• Recently acquired by IBM
Poll
How many of you have an NLP/AI application planned, in development, or in use?
A. We’re trying to get an idea of how we could use deep learning.
B. We have a napkin sketch and see potential.
C. We have code written and are testing it.
D. We have an application ready and in use.
What You’ll Learn Today
1. What types of problems are better solved by NOT using AI/NLP
2. How to assess your approach from a qualitative and quantitative perspective
3. The best way to test what you are developing
Expect to Be Surprised
Bad Surprises
• Machines haven’t replaced us yet; every system has its quirks and shortcomings, and that’s OK!
• If it has to be perfect, don’t use an intelligent system. If it can be reduced to a process, it doesn’t require intelligence
• Use the results in aggregate
• Use the results in conjunction with human expertise
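The last two points can be sketched in a few lines of Python. This is a minimal illustration, not any particular service's API: the (label, confidence) pairs, the 0.6 cutoff, and the function names are all assumptions made up for the example. Low-confidence results go to a person, and the rest are read as an aggregate rather than item by item.

```python
from collections import Counter

# Hypothetical (label, confidence) pairs returned by a text-analysis service.
results = [
    ("positive", 0.92), ("negative", 0.55), ("positive", 0.81),
    ("neutral", 0.40), ("positive", 0.77), ("negative", 0.88),
]

REVIEW_THRESHOLD = 0.6  # assumed cutoff; tune it on your own data


def split_for_review(results, threshold=REVIEW_THRESHOLD):
    """Separate confident results from those a person should check."""
    confident = [r for r in results if r[1] >= threshold]
    needs_review = [r for r in results if r[1] < threshold]
    return confident, needs_review


def label_shares(results):
    """Fraction of results per label: an aggregate view, not a per-item verdict."""
    counts = Counter(label for label, _ in results)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


confident, needs_review = split_for_review(results)
shares = label_shares(confident)
```

A single wrong label barely moves the shares, which is why aggregate use tolerates the quirks that per-item use does not.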
Good Surprises
You need to experiment because there is so much out there. Many services that you never would have thought possible are actually possible.
• Question Answering: Get information through voice recognition. For example, ask your phone to navigate to a location, or ask for data found on the web: “Who is the president of the United States?”
• Websites that track infectious diseases or “crisis data” in real time
• Less esoteric applications, such as learning about a sales prospect and their company’s recent business decisions
Roadmaps and trajectories are important. Even if something is impossible today, it might be just around the corner.
IBM’s Watson for Oncology
Memorial Sloan Kettering and IBM are collaborating to train IBM Watson to help doctors identify treatment options for patients with cancer and assist in vital research.
Medical imaging analysis + Machine learning + Computer vision + Medical expertise
Assess Qualitatively and Quantitatively
Qualitative
• Is this tool really trying to solve the same problem you want it to solve?
• Is it feature-complete?
• Informal testing will not bring you hard numbers, but you can work through the checklist
Examples of Qualitative Exploration
• Does it accept my data as-is?
• Can I use existing data instead of finding it myself?
• NLP: does it support the language(s) I need?
• Is there scoring/ranking that allows for fine-tuning of results?
• “Tagging” versus “Classifying” versus “More like this”
Quantitative
Do not rely on your gut. Your product is more important than that.
– Be scientific
– Don’t get hung up on the “quirks” of these systems
You can’t “feel” the difference between 70% accurate and 80% accurate, but such margins can separate success and failure.
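To make the 70% versus 80% point concrete, here is a minimal Python sketch of scoring two systems against the same hand-labeled set. The labels, the two "systems", and the function names are invented for illustration, and the margin of error uses the simple normal approximation, which is only a rough guide on small samples.

```python
import math


def accuracy(predictions, gold):
    """Share of predictions that match your own definition of correct."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def margin_of_error(acc, n, z=1.96):
    """Approximate 95% margin of error for an accuracy measured on n items."""
    return z * math.sqrt(acc * (1 - acc) / n)


# Hypothetical hand-labeled test set and the outputs of two candidate systems.
gold = ["sports", "politics", "sports", "tech", "politics",
        "tech", "sports", "tech", "politics", "sports"]
system_a = ["sports", "politics", "tech", "tech", "politics",
            "sports", "sports", "tech", "sports", "sports"]
system_b = ["sports", "politics", "sports", "tech", "politics",
            "tech", "tech", "tech", "politics", "politics"]

acc_a = accuracy(system_a, gold)
acc_b = accuracy(system_b, gold)
moe_small = margin_of_error(acc_a, n=len(gold))  # only 10 labeled items
moe_large = margin_of_error(acc_a, n=1000)       # same accuracy, 1,000 items
```

On ten items the margin of error is wider than the gap between the two systems, so the comparison is not yet meaningful. With around a thousand labeled items the margin shrinks to a few percentage points and the difference becomes measurable.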
#deeplearningseries
Test It EXACTLY How You Will Use It
Your Data, Your Application
» Use YOUR data, based on
» YOUR definition of correct, for
» YOUR use case
Your Data, Your Application
– Obviously, accuracy matters, but what else?
– If you plan to run the service in live applications, test for reliability
– If you need real-time results, test for latency
– If you want to process high volumes of data, test for throughput
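The three checks above can be measured with a small harness. The Python sketch below is one possible shape for it: `call_service` is a hypothetical stand-in that only sleeps and fails on empty input, so swap in your real client call. It sends requests one at a time, so the throughput figure reflects a single sequential client, not what a service can handle under concurrent load.

```python
import statistics
import time


def call_service(text):
    """Hypothetical stand-in for a real API call; replace with your client."""
    time.sleep(0.001)  # simulated network and processing time
    if not text:
        raise ValueError("empty input")
    return {"status": "OK"}


def benchmark(call, inputs):
    """Run call() over inputs and report reliability, latency, and throughput."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    latencies = []
    failures = 0
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        try:
            call(item)
        except Exception:
            failures += 1  # count the failure and keep going
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return {
        "requests": len(inputs),
        "failure_rate": failures / len(inputs),          # reliability
        "median_latency_s": statistics.median(latencies),  # latency
        "p95_latency_s": latencies[p95_index],
        "throughput_rps": len(inputs) / elapsed,         # throughput
    }


# 19 normal requests plus one input that the stand-in rejects.
report = benchmark(call_service, ["some text"] * 19 + [""])
```

Run it with a sample of your own documents, at the sizes and request rates you expect in production, since latency and failure rates often depend on input length.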
Test Holistically
– Don’t test an intermediate result when you can test the whole
– What if your goal is to show a better ad using text classification?
• Don’t just measure the accuracy of a text classifier
• Measure the overall improvement in the system when you add text classification
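For the ad example, a holistic test compares the end metric with and without the classifier in the loop. The Python sketch below assumes click-through rate is the metric you care about; the counts and function names are made up for illustration, and a real comparison would also need enough traffic to rule out chance.

```python
def click_through_rate(clicks, impressions):
    """Clicks divided by impressions for one arm of the experiment."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_lift(baseline, variant):
    """Relative change of the variant metric over the baseline metric."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (variant - baseline) / baseline


# Hypothetical counts: ads chosen without vs. with text classification.
baseline_ctr = click_through_rate(clicks=200, impressions=10_000)
variant_ctr = click_through_rate(clicks=230, impressions=10_000)
lift = relative_lift(baseline_ctr, variant_ctr)
```

With these invented numbers the lift is about 15%. That figure, not the classifier's standalone accuracy, is what tells you whether text classification improved the system.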
Use Case
Spiderbook Redefines CRM to be Customer Relationship Discovery
“The problem I ran into was that most NLP and named entity recognition algorithms had been developed using pristine data sets, hand-curated for test suites.”
“Those algorithms are unable to accurately analyze the content you find on the Web, which is not perfectly written articles, blog posts or tweets.”
Aman Naimat, Spiderbook co-founder
Next Steps
• Run the Alchemy demos for language, vision, face detection
  – http://www.alchemyapi.com/products/demo
  – Let your imagination run!
• Access these resources on our website:
  – Get started with the guide: http://www.alchemyapi.com/developers/getting-started-guide/
  – SDKs available at: https://github.com/AlchemyAPI
• Test deep learning with your own applications:
  – Free API Key: http://www.alchemyapi.com/api/register.html
• Need help? Contact [email protected]
What We’ve Covered
1. What types of problems are better solved by NOT using AI/NLP
2. How to assess your approach from a qualitative and quantitative perspective
3. The best way to test what you are developing
Q&A
Contact us
www.alchemyapi.com
You will receive an email with a recording of this webinar, the slides, and additional resources soon.
Thank you for attending!