"Detecting Steganographic Content on the Internet",Niels Provos and Peter Honeyman, ISOC NDSS'02, San Diego, CA, February 2002. Summary ------- The authors note the popular press has repeatedly asserted that terrorists are using steganography to hide communication from law enforcement. In order to test these assertions, the authors construct a statistical test fordetecting steganographic content in JPEG images. They use this test is used incombination with a webcrawler to gather and examine image from e-Bay and a USENETarchive. Since the statistical test only indicates the "likely" presence of hidden data, suspicious images are then subjected to a dictionary attack to determine if there really is hidden data. As of the writing of this paper, examination of 2million images from e-Bay and 1 million from a USENET archive have not yielded any discoveries of actual data. The authors start off by describing how typical steganographic systems identify redundant bits in digital images, and replace those bits with a hidden message. When such a scheme us applied to GIF images, the resulting hidden message can be susceptible to visual attacks, apparently based on examining histograms of color frequencies. The JPEG format differs from the GIF in that it uses 64 DCT (Discrete Cosine Transform) coefficients to encode 8x8 pixel blocks. Altering the least significant bits of the already quantized coefficients can be done without creating visibly perceptible changes to the image. To detect changes to JPEG data, the authors perform a chi-square significance test on the frequencies of the DCT coefficients. Using this test on three popular and/or free steg packages (JSteg/-Shell, JPHide, and OutGuess), the authors develop statistical signatures for each and calibrated their system on a set 1500 images from a digital camera. The system has false negative rates of 10%, 20%, and 60% respectively for the steg packages in question. 

The verification of hidden content is done by a fairly standard dictionary attack, run in parallel using a multilingual dictionary with PIN numbers and concatenated English words. For JPHide and JSteg-Shell, known header and appended signature content can be used to verify a correct password guess. The JSteg-Shell system uses only a 40-bit key, so the keyspace could be searched exhaustively. Verifying a correct guess for OutGuess is difficult, due to the lack of cribs such as header and signature information.

The authors conclude by stating the results of their study: they found no hidden data in the eBay/USENET images they examined. They believe this shows one of the following:

1) No one is using steganography on the internet.
2) People use more sophisticated techniques than the authors can detect.
3) People used the techniques the authors targeted, but with good passwords.
4) The authors looked at the wrong source of images.

Questions
---------

1) Is it true that we expect hidden data to increase the entropy of redundant data? If the hidden data weren't encrypted, would this assumption fail, and would that impact the authors' scheme?
   Yes, and probably not.

2) The chi-square test relies on having an accurate measure of the expected frequencies in a histogram. The authors compute this expectation from the (possibly altered) image by assuming that "an image with hidden data embedded has similar frequency for adjacent DCT coefficients". Is this a valid assumption?
   Yes, but it also probably leads to the false positives.

3) Stegdetect seems to generate lots of false negatives. Why do you think this is, and how significant a problem is it? Do you agree that the false positive rate is more important?
   It's significant and not good.

4) The authors assume that smaller images, which are harder for them to handle, are not suitable for message transmission. Do you agree?
   You could packetize messages into smaller images and evade detection that way.

5) How effective do you think the dictionary attack is? Do you think users will use poor passwords?
   Not very effective, although it's debatable whether or not people will use good passwords.

7) Is there any way to improve Stegbreak, particularly in verifying the breaking of OutGuess?
   Not discussed.

8) What do you think of the performance of the dictionary attack?
   Not discussed.

9) How easy is it to defeat their statistical detection system?
   Not too hard; the authors already know how, and have published it.

10) Do the results of the paper lead you to think steganography is not widely used? Do you think the authors were looking in the right place?

11) Do you believe steganography is an effective way to hide communication?
   There are probably easier secure ways to communicate.

12) Do you know of any other research that would support the conclusion that steganography is not being used widely on the internet?
   Honeyman has done further studies of website images, but maybe with the same tools described here.

13) Is anyone else surprised there wasn't an image-retrieving web crawler available?
   They could have used wget (maybe...).

Pros
----

-- Good idea for a paper (looking for hidden content is an interesting idea).
-- Could also detect hidden watermarking.
-- Some analysis (probability) used in determining how to improve performance.
-- Chi-square test of DCT coefficients is an interesting approach.

Cons
----

-- Seems very possible there is content they failed to detect.
-- Inconclusive results (can't *prove* there's nothing out there).
-- Dictionary might not be strong enough.
-- Poor job with related work section.
-- Steg more appropriate for insider attacks (keeping receivers anonymous).

Rating
------

Accept - 0
Weak Accept - 7
Weak Reject - 1
Reject - 0