Infranet: Circumventing Web Censorship and Surveillance
Nick Feamster, Magdalena Balazinska, Greg Harfst, Hari Balakrishnan, and David Karger
Proceedings of the USENIX Security Symposium, 2002

Infranet is another framework for overcoming censorship, specifically censorship of WWW access. It is aimed at state-level or corporate-level censors that an ordinary user does not have the capability to fight. Most previous attempts, such as Publius and Tangler, tried to overcome or bypass censorship by hiding information from the censor's eyes, using well-known proxies or encryption. The main problem with these approaches is that what they are doing can be obvious to the censor, and given a state-level censor's power it is trivial to simply block access to the suspicious places. As the authors claim in the paper, "To be effective against blocking, a scheme should be covert as well as secure." Infranet adds a new flavor to the batch: covertness. Infranet tries to hide the very fact that an effort to circumvent censorship is going on at all. By making the attempt invisible, ideally, to the censor, Infranet may be able to provide a more robust anti-censorship service than the others.

Infranet uses a client-side proxy called the requester and a server-side proxy called the responder. The user's browser uses the requester as its web proxy. The requester modulates the user's request into a sequence of URLs and sends them to the responder. The responder decodes the sequence, fetches the actual content requested, embeds it in another data format such as an image, and sends it back. To the censor, the client appears to be making an ordinary sequence of web requests and getting back some image in response, and that is all. (A toy sketch of this exchange appears after the contributed comments below.)

Contributed by Erin Wolf
1. Research into other methods.
2. Using other users' link traversals as a basis for probability analysis, to allow users to blend in.
3. Server testing and analysis at the end, as well as ideas for reducing the load on the server in future incarnations of the software.
4. Using an Apache module so that installation is fairly easy and standard.
5. Clarity of methods: they consistently outlined what was of primary importance and what was secondary (e.g., on the uplink, security was primary and bandwidth concerns were secondary).

Contributed by Apu Kapadia
1. A bold step; most researchers are afraid to do such research because of funding issues. They give bold examples, such as pornography and Nazi propaganda, of content being censored. They support the use of covert channels, which could be bad news for US Govt. labs.
2. Interesting use of steganography: embedding web pages inside web pages.
3. They suggest ways of distribution (bundling with Apache).
4. Does not rely on headers, SSL, etc.; all the content is hidden within the legitimate parts of the web page (images).
5. They have an actual implementation.

Contributed by Geetanjali Sampemane
1. Bootstrapping: the scheme assumes a user can find a responder by some out-of-band mechanism. If the user can, so can the censor.
2. The assumption that the censor will not block a responder if the responder also serves legitimate content (in addition to surreptitiously serving censored content) is weak.
3. Insider/rubber-hose attack: the censor can pretend to be a responder and acquire a list of users trying to access banned content (those that request it). Similarly, the censor can pretend to be a valid requester, obtain a list of responders (by the same means a legitimate user would), and shut off access to them.
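To make the requester/responder exchange described above concrete, here is a minimal sketch of the upstream channel under some simplifying assumptions: the requester and responder share an ordered dictionary of innocuous URLs on the responder's site, and each visible request carries a few bits of the hidden (censored) URL. The dictionary contents, chunk size, and function names (modulate_request, demodulate_request) are illustrative assumptions, not the paper's code; Infranet's actual modulation functions are designed so the visible requests resemble plausible surfing behavior, which this toy version does not attempt.

    # Toy sketch of Infranet's upstream channel (illustrative only).
    # Assumption: requester and responder share an ordered list of innocuous
    # URLs on the responder's site; each visible request encodes a few bits
    # of the hidden (censored) URL that the user actually wants.

    SHARED_DICTIONARY = [        # hypothetical shared, ordered URL list
        "/index.html", "/news.html", "/sports.html", "/weather.html",
        "/about.html", "/contact.html", "/faq.html", "/archive.html",
    ]
    BITS_PER_REQUEST = 3         # log2(len(SHARED_DICTIONARY))

    def modulate_request(hidden_url):
        """Requester side: turn a hidden URL into a sequence of visible URLs."""
        bits = "".join(f"{byte:08b}" for byte in hidden_url.encode("utf-8"))
        bits += "0" * (-len(bits) % BITS_PER_REQUEST)   # pad to a whole number of chunks
        return [SHARED_DICTIONARY[int(bits[i:i + BITS_PER_REQUEST], 2)]
                for i in range(0, len(bits), BITS_PER_REQUEST)]

    def demodulate_request(visible_urls, num_bytes):
        """Responder side: recover the hidden URL from the visible request sequence."""
        bits = "".join(f"{SHARED_DICTIONARY.index(u):0{BITS_PER_REQUEST}b}"
                       for u in visible_urls)
        return bytes(int(bits[i:i + 8], 2) for i in range(0, num_bytes * 8, 8)).decode("utf-8")

    hidden = "http://banned.example/page.html"
    visible = modulate_request(hidden)
    assert demodulate_request(visible, len(hidden.encode("utf-8"))) == hidden

The downstream direction works analogously in spirit, except that the responder hides the requested content steganographically inside images it would serve anyway.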
Contributed by Paul Kennedy
1. Distribution model: "word of mouth" and hoping the software does not fall into the hands of the bad guys does not really seem like a workable scheme.
2. In Figure 3 they indicate there is an optional "initialize modulation function" step. Would it be possible to inject a lot of bogus "init modulation function" messages? The IKEY is responder-specific and subject to attack.
3. They assume the confidentiality of the dictionary shared between the requester and the responder.
4. The range-mapping model is based on one-hop conditional probabilities. Would the traffic look different if we tried correlations over longer runs?
5. A serious con is server load, in that they encode content in every image. This leads to another problem: where are you going to get the random content that you encode in all these images? Although the content is compressed before it is hidden (at least for text), you would want there to be no observable difference between the "random garbage" and a "hidden document".
6. On page 10, the authors suggest allowing the user to override a potential link, saying this will introduce more noise. Most people do not know what "random" looks like (and this is a real danger): correlations and patterns are easy for people to see, and they try to avoid any pattern.
7. The implementation only uses cookies; you are not going to change the world with cookies.
8. Proxy: Squid.

1. Does not look like completed research, but rather an interim report on an extensible framework that can use pluggable components.
2. Like the framework, but the details are lacking, the performance numbers are questionable, and so on.
3. How does the range mapping actually work? (A rough sketch appears at the end of these notes.)
4. Use of covert channels and steganography to overcome censorship: are there the usual philosophical objections to such technology?
5. Is the web-surfing trace used in the paper in any way representative?

* Strong Accept - 8, Accept - 4, Reject - 0, Strong Reject - 0
* Probably one of the best-rated papers we have covered so far. :)
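On the question of how the range mapping actually works (raised above): as a rough illustration, the idea is in the spirit of arithmetic coding over a sorted space of hidden requests. The responder splits the current range of candidate hidden URLs across the visible links on a page roughly in proportion to how likely a normal visitor is to follow each link; the requester "clicks" the link whose range contains its target, and both sides narrow the range until only one candidate remains, so a single plausible-looking click can carry several bits. The sketch below is a simplified toy under these assumptions (a shared sorted candidate list, made-up link probabilities, hypothetical function names); it is not the authors' actual algorithm.

    # Toy sketch of range mapping (illustrative, not the paper's algorithm).
    # Assumption: both sides share a sorted list of candidate hidden URLs and
    # per-link traversal probabilities; each visible "click" narrows the range
    # of candidates, so one request can convey several bits of the hidden URL.
    from bisect import bisect_left

    def split_range(lo, hi, link_probs):
        """Divide candidate indexes [lo, hi) among the links, roughly by probability."""
        total, ranges, start = sum(link_probs), [], lo
        for k, p in enumerate(link_probs):
            if k == len(link_probs) - 1:
                end = hi                                   # last link takes whatever is left
            else:
                end = start + max(1, round((hi - lo) * p / total))
                end = min(end, hi - 1)                     # always leave room, so the range shrinks
            ranges.append((start, end))
            start = end
        return ranges

    def requester_clicks(target, candidates, link_probs):
        """Requester side: the sequence of link choices that narrows the range to the target."""
        lo, hi, clicks = 0, len(candidates), []
        goal = bisect_left(candidates, target)
        while hi - lo > 1:
            for choice, (s, e) in enumerate(split_range(lo, hi, link_probs)):
                if s <= goal < e:
                    clicks.append(choice)
                    lo, hi = s, e
                    break
        return clicks

    def responder_decodes(clicks, candidates, link_probs):
        """Responder side: replay the same splits to recover the hidden URL."""
        lo, hi = 0, len(candidates)
        for choice in clicks:
            lo, hi = split_range(lo, hi, link_probs)[choice]
        return candidates[lo]

    candidates = sorted([
        "http://banned.example/a", "http://banned.example/b",
        "http://banned.example/c", "http://banned.example/d",
        "http://banned.example/e", "http://banned.example/f",
    ])
    link_probs = [0.5, 0.3, 0.2]     # made-up link traversal probabilities
    clicks = requester_clicks("http://banned.example/e", candidates, link_probs)
    assert responder_decodes(clicks, candidates, link_probs) == "http://banned.example/e"

This toy also makes Paul Kennedy's point 4 concrete: the splits here use only per-link probabilities, so a censor correlating longer runs of clicks might still distinguish modulated traffic from genuine surfing.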