getting the results you want from unstructured data

Posted on 30-Jul-2015


Getting the Results You Want From Unstructured Data: An Overview for Developers

Aaron Chavez, VP of Engineering

Meet the Expert

#datatoresults

Pioneer of web services for real-time text and image analysis

•  Founded in 2005
•  40,000+ users
•  Used in 36+ countries
•  More than 1 billion API calls monthly
•  Deep learning experts
•  Recently acquired by IBM


Poll

How many of you have an NLP/AI application planned, in development, or in use?
A.  We’re trying to get an idea of how we could use deep learning.
B.  We have a napkin sketch and see potential.
C.  We have code written and are testing it.
D.  We have an application ready and in use.


What You’ll Learn Today

1. What types of problems are better solved by NOT using AI/NLP
2. How to assess your approach from a qualitative and quantitative perspective
3. The best way to test what you are developing


Expect to Be Surprised


Bad Surprises

•  Machines haven’t replaced us yet; every system has its quirks and shortcomings, and that’s ok!
•  If it has to be perfect, don’t use an intelligent system. If it can be reduced to a process, it doesn’t require intelligence.
•  Use the results in aggregate.
•  Use the results in conjunction with human expertise.
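Two of these points can be made concrete with a small Python sketch. The first function handles a task that reduces to a fixed process (pulling ISO-format dates out of text) with a regular expression, so its output is exact and repeatable and no intelligent system is needed. The second shows one way to use imperfect per-document labels in aggregate: if your own testing has measured a classifier's true-positive and false-positive rates, the observed share of positive labels can be corrected with the "adjusted classify and count" formula. The date task, the function names, and the error rates are illustrative assumptions, not something from the webinar.

```python
import re
from datetime import date

# Hypothetical task that reduces to a process: find ISO 8601 calendar dates (YYYY-MM-DD).
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_iso_dates(text: str) -> list[date]:
    """Rule-based extraction: the same input always yields the same, exact output."""
    found = []
    for year, month, day in ISO_DATE.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            # Matches the pattern but is not a real date (e.g. 2015-13-45); skip it.
            continue
    return found


def adjusted_positive_share(observed_share: float, tpr: float, fpr: float) -> float:
    """Estimate the true share of positives from a noisy classifier's output.

    observed_share: fraction of documents the classifier labeled positive.
    tpr, fpr: true- and false-positive rates measured on your own test set.
    """
    if tpr <= fpr:
        raise ValueError("classifier must do better than chance (tpr > fpr)")
    estimate = (observed_share - fpr) / (tpr - fpr)
    # Sampling noise can push the estimate slightly outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))
```

For example, `extract_iso_dates("Posted 2015-07-30, edited 2015-13-45.")` returns only `date(2015, 7, 30)`. With a measured true-positive rate of 0.8, a false-positive rate of 0.2, and 44% of documents labeled positive, the corrected estimate is about 40%: individual labels are still wrong often, but the aggregate is usable.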


Good Surprises

You need to experiment: there is so much out there that many services you never would have thought possible are actually possible.

•  Question answering: get information through voice recognition. For example, ask your phone to navigate you to a location, or ask for data found on the web: “Who is the president of the United States?”
•  Websites that track infectious diseases or “crisis data” in real time
•  Less esoteric applications, such as learning about a sales prospect and their company’s recent business decisions

Roadmaps and trajectories are important. Even if something is impossible today, it might be just around the corner.

IBM’s Watson for Oncology

Memorial Sloan Kettering and IBM are collaborating to train IBM Watson to help doctors identify treatment options for patients with cancer and to assist in vital research.

Medical imaging analysis + Machine learning + Computer vision + Medical expertise


Assess Qualitatively and Quantitatively


Qualitative

•  Is this tool really trying to solve the same problem you want it to solve?
•  Is it feature-complete?
•  Informal testing will not bring you hard numbers, but you can work through the checklist.


Examples of Qualitative Exploration

•  Does it accept my data as-is?
•  Can I use existing data instead of finding it myself?
•  NLP: does it support the language(s) I need?
•  Is there scoring/ranking that allows for fine-tuning of results?
•  “Tagging” versus “Classifying” versus “More like this”: which one does my problem need?
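When a service returns a relevance or confidence score with each tag, those scores are what let you fine-tune results. This Python sketch assumes a hypothetical response shape (a list of dictionaries with `text` and `relevance` keys; check your provider's actual format) and uses made-up sample tags. It shows how raising the score threshold trades recall for precision, measured against the tags you consider correct.

```python
from typing import TypedDict


class Tag(TypedDict):
    # Hypothetical shape of one scored tag returned by an analysis service.
    text: str
    relevance: float  # assumed to be in [0, 1]


def filter_tags(tags: list[Tag], threshold: float) -> set[str]:
    """Keep only tags whose score meets the threshold."""
    return {t["text"] for t in tags if t["relevance"] >= threshold}


def precision_recall(predicted: set[str], expected: set[str]) -> tuple[float, float]:
    """Compare the kept tags against the tags you consider correct."""
    hits = len(predicted & expected)
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(expected) if expected else 1.0
    return precision, recall


# Made-up sample output and hand-labeled expectations, for illustration only.
tags: list[Tag] = [
    {"text": "IBM", "relevance": 0.95},
    {"text": "Watson", "relevance": 0.80},
    {"text": "oncology", "relevance": 0.55},
    {"text": "napkin", "relevance": 0.30},
]
expected = {"IBM", "Watson", "oncology"}

for threshold in (0.2, 0.5, 0.9):
    kept = filter_tags(tags, threshold)
    p, r = precision_recall(kept, expected)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

On this sample, a low threshold keeps a wrong tag, a very high one drops correct tags, and a middle value keeps exactly the expected set. On real data the best threshold depends on whether a missed tag or a wrong tag costs you more.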


Quantitative

Do not rely on your gut. Your product is more important than that.

–  Be scientific
–  Don’t get hung up on the “quirks” of these systems

You can’t “feel” the difference between 70% accurate and 80% accurate, but such margins can separate success from failure.
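One way to be scientific about "70% versus 80%" is to put a confidence interval around the accuracy you measure, so you know whether your test set is large enough to tell them apart. This Python sketch uses the Wilson score interval, one standard choice among several for a proportion; the sample sizes are illustrative assumptions.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an accuracy of correct/total (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("need at least one test item")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half, center + half


# The same 70% measured accuracy on a small and a larger test set.
for correct, total in [(35, 50), (700, 1000)]:
    low, high = wilson_interval(correct, total)
    print(f"{correct}/{total} correct: 95% interval {low:.3f} to {high:.3f}")
```

With 50 test items, a measured 70% comes out at roughly 0.56 to 0.81, which still includes 80%. With 1,000 items the interval narrows to roughly 0.67 to 0.73, and the two systems can be told apart.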


Test It EXACTLY How You Will Use It

Your Data, Your Application

» Use YOUR data,
» based on YOUR definition of correct,
» for YOUR use case
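To make "YOUR definition of correct" concrete, an evaluation can take the correctness rule as a parameter instead of assuming exact string match. In this Python sketch, `fake_predict` is a hypothetical stand-in for whatever call wraps the service you are testing, and the examples are made up. The `same_company` rule (ignore case and common legal suffixes) is only one possible definition of correct.

```python
from typing import Callable, Iterable


def evaluate(
    predict: Callable[[str], str],
    examples: Iterable[tuple[str, str]],
    is_correct: Callable[[str, str], bool],
) -> float:
    """Accuracy of `predict` on your own (input, expected) pairs under your own rule."""
    results = [is_correct(predict(text), expected) for text, expected in examples]
    return sum(results) / len(results) if results else 0.0


def normalize_company(name: str) -> str:
    """One possible notion of 'the same company': ignore case and legal suffixes."""
    name = name.lower().strip().rstrip(".")
    for suffix in (" inc", " corp", " corporation", " ltd"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name.strip().rstrip(",")


def same_company(predicted: str, expected: str) -> bool:
    return normalize_company(predicted) == normalize_company(expected)


# Hypothetical stand-in for a call to the service under test.
def fake_predict(text: str) -> str:
    return {"a": "IBM Corp.", "b": "Acme", "c": "Globex"}[text]


examples = [("a", "IBM"), ("b", "Acme, Inc."), ("c", "Initech")]
print(evaluate(fake_predict, examples, same_company))  # 2 of 3 correct under this rule
print(evaluate(fake_predict, examples, lambda p, e: p == e))  # exact match: 0 of 3
```

The same outputs score 0% or 67% depending on the rule, which is why the rule should come from your application rather than from the test suite a vendor happened to use.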


–  Obviously, accuracy matters, but what else?

–  If you plan to run the service in live applications, test for reliability

–  If you need real-time results, test for latency

–  If you want to process high volumes of data, test for throughput
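The reliability, latency, and throughput checks can share one harness. This Python sketch assumes the service is wrapped in a plain function (`flaky_service` below is a hypothetical stand-in); it times each request, counts failures instead of stopping on them, and reports median and 95th-percentile latency of successful calls plus overall throughput. It sends requests one at a time, so if production traffic will be concurrent, measure throughput under that load as well.

```python
import statistics
import time
from typing import Any, Callable, Sequence


def benchmark(call: Callable[[Any], Any], inputs: Sequence[Any]) -> dict[str, float]:
    """Call `call` once per input, sequentially, and summarize how it behaved."""
    latencies: list[float] = []  # latencies of successful calls only
    failures = 0
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        try:
            call(item)
        except Exception:
            # Reliability: record the failure and keep going.
            failures += 1
            continue
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    ok = len(latencies)
    if ok >= 2:
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        p95 = statistics.quantiles(latencies, n=20)[-1]
    else:
        p95 = latencies[0] if latencies else float("nan")
    return {
        "requests": float(len(inputs)),
        "error_rate": failures / len(inputs) if inputs else 0.0,
        "median_latency_s": statistics.median(latencies) if latencies else float("nan"),
        "p95_latency_s": p95,
        "throughput_per_s": ok / elapsed if elapsed > 0 else float("nan"),
    }


def flaky_service(text: str) -> str:
    # Hypothetical stand-in: fails on empty input, otherwise "analyzes" quickly.
    if not text:
        raise ValueError("empty document")
    time.sleep(0.001)
    return text.upper()


report = benchmark(flaky_service, ["a", "", "b", "c"])
print(report)
```

Feeding the harness a sample of your real documents, including the malformed ones, matters more than the exact statistics it reports.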


Test Holistically

–  Don’t test an intermediate result when you can test the whole system

–  What if your goal is to show a better ad using text classification?

•  Don’t just measure the accuracy of a text classifier

•  Measure the overall improvement in the system when you add text classification
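Measuring "the overall improvement in the system" usually means comparing the end metric with and without the new component. As a sketch, suppose (hypothetically) you serve ads to two comparable groups of users, one where ads are chosen with the text classifier and one without, and count clicks. This Python code computes the relative lift in click-through rate and a two-proportion z-statistic, a standard test of whether the difference is larger than chance; the counts are made up for illustration.

```python
import math


def ctr_lift(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> tuple[float, float]:
    """Relative CTR lift of group B over control A, and a two-proportion z-statistic."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if rate_a == 0 or se == 0:
        raise ValueError("need some clicks in the control group and some variation")
    lift = rate_b / rate_a - 1
    z = (rate_b - rate_a) / se
    return lift, z


# Made-up counts: A = ads chosen without the classifier, B = with it.
lift, z = ctr_lift(clicks_a=200, views_a=10_000, clicks_b=260, views_b=10_000)
print(f"lift={lift:.1%}, z={z:.2f}")  # |z| above about 1.96 suggests a real difference at ~95%
```

A classifier that scores well in isolation but produces no measurable lift here has not improved the product, which is the point of testing holistically.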


Use Case

Spiderbook Redefines CRM to be Customer Relationship Discovery

“The problem I ran into was that most NLP and named entity recognition algorithms had been developed using pristine data sets, hand-curated for test suites.”

“Those algorithms are unable to accurately analyze the content you find on the Web, which is not perfectly written articles, blog posts or tweets.”

Aman Naimat, Spiderbook co-founder


Next Steps

•  Run the Alchemy demos for language, vision, and face detection: http://www.alchemyapi.com/products/demo
•  Let your imagination run!
•  Access these resources on our website:
    •  Get started with the guide: http://www.alchemyapi.com/developers/getting-started-guide/
    •  SDKs available at: https://github.com/AlchemyAPI
•  Test deep learning with your own applications:
    •  Free API key: http://www.alchemyapi.com/api/register.html
•  Need help? Contact support@alchemyapi.com


What We’ve Covered

1. What types of problems are better solved by NOT using AI/NLP
2. How to assess your approach from a qualitative and quantitative perspective
3. The best way to test what you are developing


Q&A

Contact us

1-877-253-0308
questions@alchemyapi.com
www.alchemyapi.com

You will receive an email with a recording of this webinar, the slides, and additional resources soon.

Thank you for attending!
