{saturated fat: 6.7g, added sugar: 8.1g,
fibre: 8.5g, proteins: 9.9g,
salt: 1.4g}
[email protected]
Practical Image Processing
Social Entrepreneur
Huge Machine Learning Enthusiast
ME
TESSERACT
https://github.com/tesseract-ocr
PROBLEM: BINARIZATION
python: PIL library does not work here
colour histogram
~ 500 000 pixel values, each pixel value is a different colour
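A colour histogram simply counts how often each distinct pixel value occurs. The following is a minimal sketch in plain Python, assuming the image has already been loaded into a flat list of (R, G, B) tuples; the function name `colour_histogram` and the four-pixel sample are illustrative only.

```python
from collections import Counter

def colour_histogram(pixels):
    """Count how often each distinct pixel value occurs."""
    return Counter(pixels)

# Tiny stand-in for an image: a flat list of (R, G, B) tuples.
pixels = [(255, 255, 255), (0, 0, 0), (255, 255, 255), (200, 30, 30)]
hist = colour_histogram(pixels)

# The most frequent colour and how many pixels carry it.
most_common_colour, count = hist.most_common(1)[0]
```

On a scanned label with around 500 000 pixels, the same counter would be expected to contain a very large number of distinct keys, which is what makes a direct black/white split awkward.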
k-means clustering
image source: Wikipedia
http://stackoverflow.com/questions/3241929/python-find-dominant-most-common-color-in-an-image
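As a rough illustration of k-means applied to pixel colours, here is a small pure-Python sketch. It is not the code from the linked answer; the function name `kmeans_colours`, the fixed iteration count, and the seeding from distinct colours are assumptions made to keep the example short and repeatable.

```python
import random

def kmeans_colours(pixels, k, iterations=10, seed=0):
    """Cluster (R, G, B) tuples into k groups and return the cluster centres."""
    rng = random.Random(seed)
    # Start from k distinct colours so no two centres coincide.
    # Assumes the image has at least k distinct colours.
    centres = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            # Assign the pixel to the nearest centre (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centres[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster ends up empty
                centres[i] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centres

# Dark "ink" pixels and light "paper" pixels.
pixels = [(0, 0, 0), (10, 10, 10), (250, 250, 250), (255, 255, 255)] * 5
centres = kmeans_colours(pixels, k=2)
```

With k = 2 the two centres play the role of "ink" and "paper"; each pixel could then be mapped to black or white according to the centre it is closest to.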
kmeans: results on full image
SOLUTION: sloppy way
colour (pixel values) histogram
find middle pixel value
everything below it goes black
everything above it goes white
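The sloppy threshold can be sketched in a few lines. This assumes a greyscale image stored as a list of rows with values from 0 to 255, and it reads "middle pixel value" as the midpoint between the darkest and brightest value present; the median would be another reasonable reading. The function name `binarize` is illustrative.

```python
def binarize(grey, threshold=None):
    """Map each greyscale value to 0 (black) or 255 (white)."""
    if threshold is None:
        flat = [v for row in grey for v in row]
        # Assumption: "middle" means halfway between min and max.
        threshold = (min(flat) + max(flat)) / 2
    # Values exactly at the threshold are treated as white here.
    return [[0 if v < threshold else 255 for v in row] for row in grey]

grey = [[10, 200],
        [90, 130]]
binary = binarize(grey)  # threshold is (10 + 200) / 2 = 105
```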
LINE DELETION
work with windows or full image?
remove black regions > 400 pixels?
remove uninterrupted black regions?
Are we missing something?
black regions > 400 pixels
window-wise
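One possible reading of "black regions > 400 pixels" is: find connected groups of black pixels and whiten any group larger than the limit, on the idea that table lines are much bigger than letters. The sketch below assumes a binarized image as a list of rows (0 = black, 255 = white) and 4-connectivity; the function name and the connectivity choice are assumptions. It processes whatever image it is given, so it could be called once per window or once on the full image.

```python
from collections import deque

def remove_large_black_regions(img, max_size=400):
    """Return a copy of img with black regions larger than max_size whitened."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 0 or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected black region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] == 0:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > max_size:
                for ry, rx in region:
                    out[ry][rx] = 255
    return out

# Toy example with a lowered limit: the 5-pixel line goes, the 2-pixel blob stays.
img = [[0, 0, 0, 0, 0],
       [255, 255, 255, 255, 255],
       [255, 0, 0, 255, 255]]
cleaned = remove_large_black_regions(img, max_size=3)
```

A letter that touches a line belongs to the same connected region as the line, so it would be removed along with it.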
uninterrupted black regions
window-wise
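The slide does not spell out how "uninterrupted" is tested, so the following is only one hypothetical interpretation: split the image into vertical strips (windows) and whiten any row segment that is black across the entire width of its window. Text rarely fills a whole window without a gap, whereas a horizontal rule does. The function name and the default window width are assumptions.

```python
def remove_uninterrupted_runs(img, window=50):
    """Whiten row segments that are black across a whole window (0 = black)."""
    out = [row[:] for row in img]
    width = len(img[0])
    for start in range(0, width, window):
        end = min(start + window, width)
        for y, row in enumerate(img):
            # A segment counts as "uninterrupted" only if every pixel is black.
            if all(v == 0 for v in row[start:end]):
                for x in range(start, end):
                    out[y][x] = 255
    return out

img = [[0, 0, 0, 255],
       [0, 255, 0, 0]]
cleaned = remove_uninterrupted_runs(img, window=2)
```

Vertical lines could be handled the same way by running the function on the transposed image.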
bonus: MIN FILTER
min filter PIL default: 3 pixels
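To show what a min filter does, here is a plain-Python sketch that replaces every pixel with the darkest value in its size x size neighbourhood. In Pillow the corresponding filter is `ImageFilter.MinFilter`; the version below is a stand-in for illustration, and its border handling (neighbourhoods clipped at the image edge) is an assumption that may differ from Pillow's.

```python
def min_filter(img, size=3):
    """Replace each pixel by the minimum of its size x size neighbourhood."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Neighbourhood is clipped at the image border.
            neighbourhood = [
                img[ny][nx]
                for ny in range(max(0, y - r), min(h, y + r + 1))
                for nx in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(min(neighbourhood))
        out.append(row)
    return out

# A single black pixel in the middle of a white 5 x 5 image.
img = [[255] * 5 for _ in range(5)]
img[2][2] = 0
filtered = min_filter(img)  # the black pixel grows into a 3 x 3 black block
```

Because black is the minimum value, the filter thickens black strokes and closes small white gaps inside them.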
CHALLENGE: WHITE PIXELS IN LETTERS