Presentation on CAPTCHA
Presented by: AVINASH MAURYA
IT VI SEM 0829213008
Definition
Background
Applications
Types of CAPTCHAs
Breaking CAPTCHAs
Proposed Approach
Conclusion
2
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart
A program that can tell whether its user is a human or a computer.
The challenge: develop a software program that can create and grade challenges most humans can pass but computers cannot
3
First used by AltaVista in 1997
• Reduced spam add-URL submissions by over 95%
CMU/Yahoo!
• Automated the creation and grading of challenges
PARC
• Relies on document image degradation to prevent successful OCR
• Conducted user-focused studies to assess the effectiveness of CAPTCHAs
4
CAPTCHAs are based on open AI problems
Breaking CAPTCHAs helps advance AI by solving these open problems
Improving CAPTCHAs helps tell computers and humans apart
Win-win situation
5
Pessimal Print: A Reverse Turing Test. Allison L. Coates, Henry S. Baird, Richard J. Fateman
Telling Humans and Computers Apart Automatically. Luis von Ahn, Manuel Blum, and John Langford
CAPTCHA: Using Hard AI Problems for Security. Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford
Using Machine Learning to Break Visual Human Interaction Proofs (HIPs). Kumar Chellapilla, Patrice Y. Simard
6
Free email services
Online polls
Dictionary attacks
Newsgroups, blogs, etc…
Spam
7
Text based
• Gimpy, ez-gimpy
• Gimpy-r, Google CAPTCHA
• Simard’s HIP (MSN)
Graphic based
• Bongo
• Pix
Audio based
8
Gimpy, ez-gimpy
• Pick a word or words from a small dictionary
• Distort them and add noise and background
Gimpy-r, Google’s CAPTCHA
• Pick random letters
• Distort them, add noise and background
Simard’s HIP
• Pick random letters and numbers
• Distort them and add arcs
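The "pick random letters and numbers" step, together with grading the typed answer, can be sketched as below. This is a minimal illustration, not any of the named systems: the alphabet, the length of 6, and the function names `new_challenge` and `grade` are assumptions, and the rendering and distortion steps (noise, background, arcs) are left out because they need an imaging library.

```python
import secrets
import string

# Assumed alphabet: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6):
    """Pick random characters; a real CAPTCHA would render and distort them as an image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade(expected, answer):
    """Compare case-insensitively, ignoring surrounding whitespace."""
    return secrets.compare_digest(
        expected.upper().encode(), answer.strip().upper().encode()
    )
```

The server keeps the expected string and shows the user only the distorted image; `grade` is then called with whatever the user typed.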
9
10
Bongo
• Display two series of blocks
• User must find the characteristic that sets the two series apart
• User is asked to determine which series each of four single blocks belongs to
Difference? thick vs. thin lines
11
PIX
• Create a large database of labeled images
• Pick a concrete object
• Pick four images of the object from the images database
• Distort the images
• Ask the user to pick the object from a list of words
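The PIX steps can be sketched roughly as follows. The tiny in-memory `IMAGE_DB`, its file names, and the function names are hypothetical stand-ins for the large labeled database, and the image distortion step is omitted.

```python
import random

# Hypothetical labeled image database: object name -> image file names.
IMAGE_DB = {
    name: [f"{name}{i}.jpg" for i in range(1, 7)]
    for name in ("dog", "pool", "tree", "car")
}


def make_pix_challenge(db, rng=random):
    """Pick a concrete object, four of its images, and the word list shown to the user."""
    answer = rng.choice(sorted(db))
    images = rng.sample(db[answer], 4)
    words = sorted(db)  # the user picks the object from this list
    return answer, images, words


def grade(answer, picked_word):
    """The challenge is passed only if the user picks the object's label."""
    return picked_word == answer
```

The user sees `images` and `words`; `answer` stays on the server until grading.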
12
13
Dog, Pool
Pick a word or a sequence of numbers at random
Render them into an audio clip using TTS software
Distort the audio clip
Ask the user to identify and type the word or numbers
14
Most text based CAPTCHAs have been broken by software
• OCR
• Segmentation
Other CAPTCHAs were broken by streaming the tests to unsuspecting users to solve.
15
Very similar to PIX
Pick a concrete object
Get 6 images at random from images.google.com that match the object
Distort the images
Build a list of 100 words: 90 from a full dictionary, 10 from the objects dictionary
Prompt the user to pick the object from the list of words
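The word-list step can be sketched as below. The function name and arguments are hypothetical, and one detail is an assumption the slides do not settle: the correct object is counted as one of the 10 words drawn from the objects dictionary.

```python
import random


def build_word_list(answer, full_dictionary, object_dictionary, rng=random):
    """Return 100 shuffled words: 90 from the full dictionary, 10 from the objects dictionary.

    Assumption: the correct object is one of the 10 object words.
    """
    # Nine decoy objects, none of them equal to the correct answer.
    decoy_objects = rng.sample([w for w in object_dictionary if w != answer], 9)
    taken = set(decoy_objects) | {answer}
    # Ninety filler words that do not repeat any object word already chosen.
    fillers = rng.sample([w for w in full_dictionary if w not in taken], 90)
    words = decoy_objects + [answer] + fillers
    rng.shuffle(words)
    return words
```

Shuffling matters here: without it the position of the correct word in the list would give the answer away.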
16
Make an HTTP call to images.google.com and search for the object
Screen scrape the result of 2-3 pages to get the list of images
Pick 6 images at random
Randomly distort both the images and their URLs before displaying them
Expire the CAPTCHA in 30-45 seconds
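The selection and expiry steps might look like the sketch below. It deliberately leaves out the HTTP call and the screen scraping, since the structure of the result pages is not given here; `image_urls` is assumed to be the already-scraped list, and the names `make_challenge`, `grade`, and `CAPTCHA_TTL_SECONDS` are illustrative. Distortion of the images and URLs is also omitted.

```python
import random
import time

CAPTCHA_TTL_SECONDS = 30  # assumed value from the 30-45 second range


def make_challenge(answer, image_urls, now=None, rng=random):
    """Pick 6 of the scraped image URLs and stamp the challenge with an expiry time."""
    now = time.time() if now is None else now
    return {
        "answer": answer,
        "images": rng.sample(image_urls, 6),
        "expires_at": now + CAPTCHA_TTL_SECONDS,
    }


def grade(challenge, picked_word, now=None):
    """Reject late answers first, then check the picked word."""
    now = time.time() if now is None else now
    if now > challenge["expires_at"]:
        return False
    return picked_word == challenge["answer"]
```

Checking the expiry before the answer means a correct response relayed too late, for example through a third party, still fails.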
17
The database already exists and is public
The database is constantly being updated and maintained
Adding “concrete objects” to the dictionary is virtually instantaneous
Distortion prevents caching hacks
Quick expiration limits streaming hacks
18
Not accessible to people with disabilities (which is the case of most CAPTCHAs)
Relies on Google’s infrastructure
Unlike CAPTCHAs using random letters and numbers, the number of challenge words is limited
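A quick calculation shows what the limited word list means for blind guessing. The comparison assumes a 100-word list for the proposed approach and, for the random-text case, a hypothetical 6-character challenge over 36 symbols (A-Z and 0-9); neither figure for the text case comes from the slides.

```python
from fractions import Fraction


def blind_guess_probability(choices):
    """Chance that a single uniformly random guess is correct."""
    return Fraction(1, choices)


word_list_guess = blind_guess_probability(100)        # one pick from the 100-word list
random_text_guess = blind_guess_probability(36 ** 6)  # assumed 6 characters, 36 symbols

print(word_list_guess)    # 1/100
print(random_text_guess)  # 1/2176782336
```

This only covers uninformed guessing; it says nothing about how hard either challenge is for recognition software.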
19