"Detecting Steganographic Content on the Internet",Niels Provos and 
Peter Honeyman, ISOC NDSS'02, San Diego, CA, February 2002.
   
   Summary
   -------
   
   The authors note that the popular press has repeatedly asserted that 
terrorists are using steganography to hide communication from law 
enforcement. To test these assertions, the authors construct a 
statistical test for detecting steganographic content in JPEG images. 
They use this test in combination with a web crawler to gather and 
examine images from e-Bay and a USENET archive. Since the statistical 
test only indicates the "likely" presence of hidden data, suspicious 
images are then subjected to a dictionary attack to determine whether 
there really is hidden data. As of the writing of this paper, 
examination of 2 million images from e-Bay and 1 million from a USENET 
archive has not yielded any actual hidden data.
   
   The authors start off by describing how typical steganographic 
systems identify redundant bits in digital images and replace those 
bits with a hidden message. When such a scheme is applied to GIF 
images, the resulting hidden message can be susceptible to visual 
attacks, apparently based on examining histograms of color frequencies. 
The JPEG format differs from GIF in that it uses 64 DCT (Discrete 
Cosine Transform) coefficients to encode each 8x8 pixel block. Altering 
the least significant bits of the already quantized coefficients can be 
done without creating visually perceptible changes to the image.
   
   To detect changes to JPEG data, the authors perform a chi-square 
significance test on the frequencies of the DCT coefficients. Using 
this test on three popular and/or free steg packages (JSteg/-Shell, 
JPHide, and OutGuess), the authors develop statistical signatures for 
each and calibrate their system on a set of 1500 images from a digital 
camera. The system has false negative rates of 10%, 20%, and 60% 
respectively for the steg packages in question.
   
   The verification of hidden content is done by a fairly standard 
dictionary attack, run in parallel using a multilingual dictionary 
with PIN numbers and concatenated English words. For JPHide and 
JSteg-Shell, known header and appended signature content can be used to 
verify a correct password guess. The JSteg-Shell system uses only a 
40-bit key, so the entire keyspace could be searched.
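   
   As a rough illustration of the crib-based verification described 
above, the following Python sketch runs a dictionary attack against a 
toy scheme. Everything in it is an assumption made for illustration: 
the 8-byte header TOYHDR01, the SHA-256 keystream, and the helper names 
toy_crypt, candidates, and dictionary_attack are hypothetical and are 
not the formats or code used by JPHide, JSteg-Shell, or Stegbreak.

```python
import hashlib
from itertools import product

# Hypothetical known header (the "crib"); real tools use their own formats.
MAGIC = b"TOYHDR01"


def keystream(password, length):
    """Derive a toy keystream from a password (illustrative only)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = password.encode() + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]


def toy_crypt(data, password):
    """XOR data with the password-derived keystream (encrypts and decrypts)."""
    ks = keystream(password, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def candidates(words, pins):
    """Yield single words, PINs, then every concatenation of two words."""
    yield from words
    yield from pins
    for first, second in product(words, repeat=2):
        yield first + second


def dictionary_attack(blob, guesses):
    """Return the first guess whose decryption starts with the known header."""
    for guess in guesses:
        if toy_crypt(blob[:len(MAGIC)], guess) == MAGIC:
            return guess
    return None


words = ["alpha", "bravo", "charlie"]
pins = [f"{n:04d}" for n in range(10000)]

# Hide a message under a password built from two dictionary words.
blob = toy_crypt(MAGIC + b"meet at noon", "bravoalpha")
found = dictionary_attack(blob, candidates(words, pins))
print("recovered password:", found)
```

   Only the header bytes are decrypted per guess, which keeps each trial 
cheap. For scale, exhausting a 40-bit key means trying 2^40 keys, about 
1.1 x 10^12.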
   
   Verifying a correct guess for OutGuess is difficult, due to the lack 
of cribs such as header and signature information. The authors conclude 
by stating the results of their study: they found no hidden data in the 
e-Bay/USENET images they examined. They believe this shows either:
   1) No one is using steganography on the internet.
   2) People use more sophisticated techniques than they can detect.
   3) People used the techniques they targeted, but with good passwords. 
or,
   4) They looked at the wrong source of images.  
   
   Questions
   ---------
   1) Is it true that we expect hidden data to increase the entropy of 
      redundant data? If the hidden data weren't encrypted, would this 
      assumption fail, and would that impact the authors' scheme? 
      Yes, and probably not.
   2) The chi-square test relies on having an accurate measure of the 
      expected frequencies in a histogram. The authors compute this 
      expectation from the (possibly altered) image by assuming that 
      "an image with hidden data embedded has similar frequency for 
      adjacent DCT coefficients". Is this a valid assumption? 
      Yes, but it also probably leads to the false positives.
   3) Stegdetect seems to generate lots of false negatives. Why do you 
      think this is, and how significant a problem is it? Do you agree 
      that the false positive rate is more important? 
      It's significant and not good.
   4) The authors assume that smaller images, which are harder for them 
      to handle, are not suitable for message transmission. Do you 
      agree? 
      You could packetize messages into smaller images and evade 
      detection that way.
   5) How effective do you think the dictionary attack is? Do you think 
      users will use poor passwords? 
      Not very effective, although it's debatable whether or not people 
      will use good passwords.
   7) Is there any way to improve Stegbreak, particularly in verifying 
      the breaking of OutGuess? 
      Not discussed.
   8) What do you think of the performance of the dictionary attack? 
      Not discussed.
   9) How easy is it to defeat their statistical detection system? 
      Not too hard; the authors already know how, and have published it.
   10) Do the results of the paper lead you to think steganography is 
      not widely used? Do you think the authors were looking in the 
      right place?
   11) Do you believe steganography is an effective way to hide 
      communication? 
      There are probably easier secure ways to communicate.
   12) Do you know of any other research that would support the 
      conclusion that steganography is not being used widely on the 
      internet? 
      Honeyman has done further studies of website images, but maybe 
      with the same tools described here.
   13) Is anyone else surprised there wasn't an image-retrieving web 
      crawler available? 
      They could have used wget (maybe...).
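   
   Question 2 concerns how the expected frequencies are estimated. A 
minimal Python sketch of that idea follows, under loud simplifications: 
the "coefficients" are synthetic integers drawn from a peaked 
distribution rather than real quantized DCT values, the message 
overwrites the least significant bit of every value, and the function 
names are hypothetical. It computes only the chi-square statistic over 
value pairs (2k, 2k+1), taking the pair average as the expected count; 
converting that statistic into a probability, as a full test would, is 
left out.

```python
import random
from collections import Counter


def pair_chi_square(coeffs, min_expected=5):
    """Chi-square statistic over value pairs (2k, 2k+1).

    The expected count for each pair is estimated from the data itself
    as the average of the two observed counts. Returns the statistic
    and the number of pairs that contributed to it.
    """
    hist = Counter(coeffs)
    stat = 0.0
    pairs = 0
    for even in sorted({c & ~1 for c in hist}):
        n_even, n_odd = hist[even], hist[even + 1]
        expected = (n_even + n_odd) / 2
        if expected < min_expected:
            continue  # skip sparse pairs, where the statistic is unreliable
        stat += (n_even - expected) ** 2 / expected
        pairs += 1
    return stat, pairs


def embed_lsb(coeffs, rng):
    """Overwrite the least significant bit of every value with a random bit."""
    return [(c & ~1) | rng.getrandbits(1) for c in coeffs]


def synthetic_coeffs(n, rng):
    """Toy stand-in for quantized coefficients: peaked at 0, symmetric."""
    out = []
    for _ in range(n):
        magnitude = int(rng.expovariate(0.5))
        out.append(magnitude if rng.random() < 0.5 else -magnitude)
    return out


rng = random.Random(2002)  # fixed seed so the run is repeatable
clean = synthetic_coeffs(20000, rng)
stego = embed_lsb(clean, rng)

clean_stat, clean_pairs = pair_chi_square(clean)
stego_stat, stego_pairs = pair_chi_square(stego)
print(f"clean: statistic {clean_stat:.1f} over {clean_pairs} pairs")
print(f"stego: statistic {stego_stat:.1f} over {stego_pairs} pairs")
```

   On this toy data the statistic for the unmodified values comes out 
far larger than for the embedded ones, because embedding equalizes the 
two counts in each pair. Real images and messages that fill only part 
of the capacity would be expected to narrow that gap.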
   
   Pros
   ----
   -- Good idea for a paper (looking for hidden content is an 
interesting idea).
   -- Could also detect hidden watermarking.
   -- Some analysis (probability) used in determining how to improve 
performance.
   -- Chi-square test of DCT coefficients is an interesting approach.
   
   Cons
   ----
   -- Seems very possible there is content they failed to detect.
   -- Inconclusive results (can't *prove* there's nothing out there). 
   -- Dictionary might not be strong enough.
   -- Poor job with related work section.
   -- Steg more appropriate for insider attacks (keeping receivers 
anonymous).
   
   Rating
   ------
   Accept - 0
   Weak Accept - 7
   Weak Reject - 1
   Reject - 0