Steganography
By: Joe Jupin
Supervised by: Dr. Longin Jan Latecki
Overview
  Introduction
    Clandestine Communication
    Digital Applications of Steganography
  Background
    Uncompressed Images
    Compressed Images
    Steganalysis
    The Images Used
  Finding and Extracting Messages from Bitmaps
  Detecting Messages in JPEGs
  Future Work
Introduction
  Clandestine Communication
    Cryptography
      Scrambles the message into ciphertext
    Steganography
      Hides the message in unexpected places
  Digital Applications of Steganography
    Messages can be hidden in digital data
      MS Word (doc)
      Web pages (htm)
      Executables (exe)
      Sound files (mp3, wav, cda)
      Video files (mpeg, avi)
      Digital images (bmp, gif, jpg)
Background
  Uncompressed Images
    Grayscale bitmap images (bmp)
      256 shades of intensity from black to white
      Can be obtained from color images
      Arranged into a 2-D matrix
    Messages are hidden in the least significant bits (lsb)
      Matrix values change slightly
      Interested in patterns that form messages
Character   Integer     Binary
Space       32          00100000
0 – 9       48 – 57     00110000 – 00111001
A – Z       65 – 90     01000001 – 01011010
a – z       97 – 122    01100001 – 01111010

Length = 12
Message = Hello Stego!
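To make the lsb idea concrete, here is a minimal sketch of hiding the 12-character message in a grayscale pixel array and reading it back. The bit order (most significant bit first), the start position (pixel 0), the one-bit-per-pixel layout, and the names `embed_lsb` and `extract_lsb` are assumptions for illustration; the slides do not specify them.

```python
def embed_lsb(pixels, message):
    """Hide ASCII text in the least significant bits, one bit per pixel."""
    # Each character becomes 8 bits, most significant bit first (assumed order).
    bits = [(ord(ch) >> shift) & 1 for ch in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message does not fit in the image")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the lsb, then set it to the message bit
    return stego


def extract_lsb(pixels, length):
    """Read `length` characters back from the least significant bits."""
    chars = []
    for c in range(length):
        value = 0
        for p in pixels[8 * c: 8 * c + 8]:
            value = (value << 1) | (p & 1)
        chars.append(chr(value))
    return "".join(chars)


# Stand-in for a flattened 320x240 grayscale bitmap (values 0..255).
cover = [(37 * i) % 256 for i in range(320 * 240)]
stego = embed_lsb(cover, "Hello Stego!")
recovered = extract_lsb(stego, 12)
largest_change = max(abs(a - b) for a, b in zip(cover, stego))
print(recovered, largest_change)
```

No pixel changes by more than one intensity level, which is why the matrix values "change slightly" and the message cannot be seen.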
Background
  Compressed Images
    Grayscale JPEG images (jpg)
      Joint Photographic Experts Group (JPEG)
      Converts image to YCbCr colorspace
      Divides into 8x8 blocks
      Uses Discrete Cosine Transform (DCT)
        – Obtain frequency coefficients
        – Scaled by quantization to remove some frequencies
        – At a high quality setting the loss will not be noticed
      Huffman Coding
      Affects the image's statistical properties
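The DCT and quantization steps can be sketched on a single 8x8 block as follows. This is a simplified illustration: it uses one uniform quantization step instead of a real JPEG quantization table, and the block values, the step sizes, and the names `dct2` and `quantize` are assumptions made for the example.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct2(block):
    """2-D DCT-II of an N x N block with orthonormal scaling."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, step):
    """Divide every coefficient by one uniform step and round (a simplification)."""
    return [[round(c / step) for c in row] for row in coeffs]


def count_nonzero(matrix):
    return sum(1 for row in matrix for value in row if value != 0)


# Hypothetical smooth block, level-shifted by 128 as JPEG does before the DCT.
block = [[(100 + 4 * x + 2 * y) - 128 for y in range(N)] for x in range(N)]
coeffs = dct2(block)
fine = quantize(coeffs, 2)     # small step: stands in for a high quality setting
coarse = quantize(coeffs, 32)  # large step: stands in for a low quality setting
print(count_nonzero(fine), count_nonzero(coarse))
```

The coarser step zeroes more of the small high-frequency coefficients, which is where the compression comes from and why the coefficient statistics are sensitive to later changes.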
Background
  Steganalysis
    The Images Used
      From Star Trek Website
      1,000 color jpeg images
      320x240 or 240x320
      www.startrek.com
      There will be Klingons
Finding and Extracting Messages from Bitmaps
  Problem
    Messages can be hidden in lsb’s
    May be anywhere in image
    Cannot see message in image
    Would take forever to be processed by a human
Finding and Extracting Messages from Bitmaps
  Procedure
    Inject messages into images
    Take a Boolean snapshot of even and odd pixels
    Construct a string of all possible characters
      An n-pixel image has n-7 individual character enumerations (320 x 240 - 7 = 76,793)
    Use character properties to match a message pattern in the enumerated string
      Define a ‘message’ (pattern of message characters)
      Define ‘message characters’ (used in messages)
      Use ‘stego stems’ (patterns)
    A test can be performed faster by using tiled samples
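The enumeration step above can be sketched as follows. The slides do not define ‘message characters’, ‘stego stems’, or the pattern score, so this sketch substitutes a simple rule: a message is a run of at least `min_length` printable characters spaced 8 pixels apart. `MESSAGE_CHARS`, `min_length`, `enumerate_characters`, and `find_messages` are illustrative assumptions, not the author's TestMsg code.

```python
# Characters assumed to appear in messages (an illustrative choice).
MESSAGE_CHARS = set(map(ord, "abcdefghijklmnopqrstuvwxyz"
                             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                             "0123456789 .,!?'"))


def enumerate_characters(pixels):
    """Decode the 8-bit window starting at every pixel: n pixels give n - 7 codes."""
    lsb = [p & 1 for p in pixels]  # Boolean snapshot: even pixel -> 0, odd pixel -> 1
    codes = []
    for start in range(len(lsb) - 7):
        value = 0
        for bit in lsb[start:start + 8]:
            value = (value << 1) | bit
        codes.append(value)
    return codes


def find_messages(pixels, min_length=6):
    """Return (pixel_offset, text) for each run of message characters."""
    codes = enumerate_characters(pixels)
    found = []
    for phase in range(8):  # a message may start at any of the 8 bit alignments
        start, run = None, []
        positions = list(range(phase, len(codes), 8))
        for pos in positions + [None]:  # the trailing None flushes the final run
            code = codes[pos] if pos is not None else -1
            if code in MESSAGE_CHARS:
                if not run:
                    start = pos
                run.append(chr(code))
            else:
                if len(run) >= min_length:
                    found.append((start, "".join(run)))
                run = []
    return found


# All-even stand-in image, so only the injected text decodes to message characters.
cover = [2 * (i % 100) for i in range(320 * 240)]
text, offset = "Hello Stego!", 1003
bits = [(ord(ch) >> s) & 1 for ch in text for s in range(7, -1, -1)]
stego = cover[:]
for i, bit in enumerate(bits):
    stego[offset + i] = (stego[offset + i] & 0xFE) | bit

print(len(enumerate_characters(stego)))  # 76793
print(find_messages(stego))
```

On a real image the neighbouring windows can also decode to printable characters by chance, which is one way short garbage text can attach to the head or tail of a recovered message.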
Steganography is the art and science of communicating in a way which hides the existence of the communication. In
contrast to cryptography, where the "enemy" is allowed to detect, intercept and modify messages without being able to
violate certain security premises guaranteed by a cryptosystem, the goal of steganography is to hide messages
inside other "harmless" messages in a way that does not allow any "enemy" to even detect that there is a second
secret message present [Markus Kuhn 1995-07-03].
Finding and Extracting Messages from Bitmaps
  Observation
    Only considered linear unencrypted messages
    Trial performed on 100 grayscale bitmaps
      97 clean
      3 stego
    Took an average of 9 seconds per image to find with 100% accuracy (no training, cold)
      Occasionally some garbage text at head or tail
    Took an average of 3 seconds per image to test with 100% accuracy
      Clean images had pattern scores of less than 10
      Stego images had pattern scores of 31 or more
Finding and Extracting Messages from Bitmaps
  Conclusion
    Messages are detectable and extractable from non-encrypted uncompressed images
    Linear messages can be found in any direction with more computation
    This method can be foiled by hashing the message into the image
Detecting Messages in JPEGs
  Problem
    Cannot use an enumeration scheme to detect or find a message
    May only be able to detect because of encoding schemes and encryption
    Cannot see message in image
    Statistical properties of an image change when a message is injected
Detecting Messages in JPEGs
  Procedure
    Obtain the 4-level 2-D wavelet decomposition of the images
    Obtain the orientation decomposition of frequency space statistics
      72 features plus the class (0 = clean, 1 = stego)
      Includes: mean, variance, skewness and kurtosis of the coefficients and of the prediction error in each subband
    Normalize the data by 0-1 min-max
    Train Fisher Linear Discriminant (FLD)
    Test the FLD threshold
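Two of the steps above, the four summary statistics and the 0-1 min-max normalization, can be sketched in plain Python. The wavelet decomposition itself is not shown. The sketch assumes population (divide-by-n) moments and non-excess kurtosis, which the slides do not specify, and `moments` and `min_max_normalize` are illustrative names.

```python
def moments(values):
    """Return (mean, variance, skewness, kurtosis) of a list of coefficients."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance (assumed)
    if var == 0:
        return mean, 0.0, 0.0, 0.0  # constant input: higher moments set to 0 here
    std = var ** 0.5
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n  # non-excess kurtosis
    return mean, var, skew, kurt


def min_max_normalize(rows):
    """Scale every feature column to the range [0, 1] across all instances."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical subband coefficients and a tiny 3-instance, 2-feature data set.
print(moments([1, 2, 3, 4, 5]))
print(min_max_normalize([[1, 10], [3, 20], [2, 30]]))
```

Normalizing per column keeps large-valued features such as variance from dominating small-valued ones such as skewness when the classifier is trained.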
-0.004 17.120 120.485 0.059 0.363 1.041 3.809 -0.291
-0.146 838.622 97.874 0.887 0.034 1.391 3.948 -0.703
-2.200 15627.538 47.077 -1.128 -0.465 2.060 3.726 -0.738
0.011 15.318 90.017 0.594 0.268 0.969 3.877 -0.172
-0.523 920.19 62.226 -1.366 -0.146 1.326 3.944 -0.705
4.418 15572.229 23.531 -0.123 -0.541 1.980 3.571 -0.705
-0.004 0.935 182.339 -1.808 0.601 1.226 4.692 0.205
-0.079 193.451 364.874 -9.569 -0.116 1.133 4.244 -0.577
1.899 3640.213 24.731 0.766 -0.349 1.681 3.426 -0.625
0
0.590963 0.050189 0.080103 0.345166 0.343829 0.332710 0.001311 0.021374
0.482941 0.094929 0.084698 0.411032 0.331954 0.572352 0.260870 0.337264
0.135543 0.065238 0.079329 0.542244 0.187500 0.603208 0.306227 0.424866
0.370270 0.032725 0.025054 0.381317 0.412698 0.385321 0.001666 0.043085
0.402427 0.053992 0.155397 0.553661 0.476190 0.432629 0.237224 0.271698
0.422609 0.096439 0.087974 0.463496 0.471598 0.242233 0.153389 0.360447
0.395349 0.026724 0.044753 0.738226 0.479060 0.367367 0.073430 0.361345
0.427911 0.042625 0.055986 0.558653 0.350634 0.332762 0.165738 0.301011
0.611057 0.054988 0.166710 0.497393 0.518569 0.373766 0.153005 0.320611
0
meanV12 meanH12 meanD12 varV12 varH12 varD12 skwV12 skwH12
skwD12 krtV12 krtH12 krtD12 meanEv12 meanEh12 meanEd12 varEv12
varEh12 varEd12 skwEv12 skwEh12 skwEd12 krtEv12 krtEh12 krtEd12
meanV23 meanH23 meanD23 varV23 varH23 varD23 skwV23 skwH23
skwD23 krtV23 krtH23 krtD23 meanEv23 meanEh23 meanEd23 varEv23
varEh23 varEd23 skwEv23 skwEh23 skwEd23 krtEv23 krtEh23 krtEd23
meanV34 meanH34 meanD34 varV34 varH34 varD34 skwV34 skwH34
skwD34 krtV34 krtH34 krtD34 meanEv34 meanEh34 meanEd34 varEv34
varEh34 varEd34 skwEv34 skwEh34 skwEd34 krtEv34 krtEh34 krtEd34
class
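Feature vectors with a class label, like those listed above, are what the Fisher linear discriminant is trained on. A minimal two-class sketch follows, using two features instead of 72 to keep it short. The midpoint threshold, the small `ridge` term added for numerical stability, and the toy data are assumptions; the slides do not say how the FLD threshold was chosen.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_fld(clean, stego, ridge=1e-6):
    """Return (w, threshold) with w = Sw^-1 (m1 - m0) and a midpoint threshold."""
    d = len(clean[0])
    m0 = [sum(r[j] for r in clean) / len(clean) for j in range(d)]
    m1 = [sum(r[j] for r in stego) / len(stego) for j in range(d)]
    # Within-class scatter matrix, with a small ridge on the diagonal (assumption).
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((clean, m0), (stego, m1)):
        for r in rows:
            diff = [r[j] - mu[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = solve(sw, [m1[j] - m0[j] for j in range(d)])
    threshold = (dot(w, m0) + dot(w, m1)) / 2
    return w, threshold


def classify(w, threshold, x):
    """Return 1 (stego) if the projection exceeds the threshold, else 0 (clean)."""
    return 1 if dot(w, x) > threshold else 0


# Hypothetical normalized feature vectors, two features per image.
clean = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.05, 0.15]]
stego = [[0.80, 0.70], [0.90, 0.85], [0.75, 0.90], [0.85, 0.80]]
w, threshold = train_fld(clean, stego)
print(classify(w, threshold, [0.1, 0.1]), classify(w, threshold, [0.9, 0.9]))
```

Projecting onto `w` reduces each feature vector to one number, so testing the classifier comes down to comparing that number with a single threshold.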
Detecting Messages in JPEGs
  Observation
    Trials performed on 2000 images
      1000 clean and 1000 stego
      Random selection of 1000 instances without replacement (500 each class)
      Messages in stego had sufficient size
    Results show overwhelming accuracy
      Bior3.1 True Neg 100%, True Pos 98.6%
      Rbio5.5 True Neg 99.8%, True Pos 98.8%
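The sampling and scoring described above can be sketched as follows: draw 500 instances per class without replacement, then report true negative and true positive rates. The instance layout (class label as the last element), the fixed seed, and the names `stratified_sample` and `rates` are assumptions for illustration.

```python
import random


def stratified_sample(instances, per_class, seed=0):
    """Pick `per_class` instances of each class (label is the last element)."""
    rng = random.Random(seed)
    picked = []
    for label in (0, 1):
        pool = [inst for inst in instances if inst[-1] == label]
        picked += rng.sample(pool, per_class)  # random.sample draws without replacement
    rng.shuffle(picked)
    return picked


def rates(true_labels, predicted):
    """Return (true negative rate, true positive rate)."""
    tn = sum(1 for t, p in zip(true_labels, predicted) if t == 0 and p == 0)
    tp = sum(1 for t, p in zip(true_labels, predicted) if t == 1 and p == 1)
    return tn / true_labels.count(0), tp / true_labels.count(1)


# Placeholder data: 1000 clean (class 0) and 1000 stego (class 1) instances.
instances = [(i, 0) for i in range(1000)] + [(i, 1) for i in range(1000)]
sample = stratified_sample(instances, 500)
print(len(sample), sum(1 for inst in sample if inst[-1] == 0))
print(rates([0, 0, 1, 1], [0, 1, 1, 1]))
```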
Detecting Messages in JPEGs
  Conclusion
    Messages of sufficient size can be detected in stego images with great accuracy
    Improved accuracy may be due to a large training set
      1000 (800/200)
      500 (400/100)
    Restricted domain
      Many similar images
Detecting Messages in JPEGs
  Problems
    Authors did not handle log of zero problem
      Replaced with small value
    Differing jpeg sizes need differing message sizes
      Dynamic message injection
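The log-of-zero workaround mentioned above can be sketched in a few lines. The slides do not give the replacement value or the log base, so `EPSILON = 1e-12`, the base-2 logarithm, and the name `safe_log2` are assumptions.

```python
import math

EPSILON = 1e-12  # hypothetical small value substituted for zero magnitudes


def safe_log2(x, eps=EPSILON):
    """Log of a coefficient magnitude, with zero replaced by a small value."""
    return math.log2(max(abs(x), eps))


print(safe_log2(8), safe_log2(0))
```

The replacement avoids a math domain error on zero-valued coefficients while leaving every magnitude at or above `eps` unchanged.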
Detecting Messages in JPEGs
  Other Classifiers
    Tests were run on J4.8, SMO, Logistic and Naïve Bayes for bior3.1 and rbio5.5 with 80/20 split and default settings
  Results
Future Work
  Would like to find optimal stems
    Pattern matching
    Text mining
    Cryptanalysis
  Would like to optimize TestMsg code
    C/assembly code
References
  Petitcolas, F.A.P., Anderson, R., Kuhn, M.G., "Information Hiding - A Survey", July 1999, URL: http://www.cl.cam.ac.uk/~fapp2/publications/ieee99-infohiding.pdf (11/26/01 17:00)
  Farid, Hany, "Detecting Steganographic Messages in Digital Images", Department of Computer Science, Dartmouth College, Hanover NH 03755
  Moby™ Words II, Copyright (c) 1988-93, Grady Ward. All Rights Reserved.
  Lyu, Siwei and Farid, Hany, "Steganalysis Using Color Wavelet Statistics and One-Class Support Vector Machines", Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
  Farid, Hany, "Detecting Hidden Messages Using Higher Order Statistical Models", Department of Computer Science, Dartmouth College, Hanover NH 03755
Spy Vs. Spy
by Antonio Prohias from MAD Magazine
Have a good Winter Break!