NNSquad - Network Neutrality Squad
[ NNSquad ] SSL vs. "Referers": Friend or Foe?
SSL vs. "Referers": Friend or Foe? http://lauren.vortex.com/archive/000895.html In a recent posting in some other venues, I noted with pleasure that Google is now testing the use of "SSL by default" for Google Search ( http://j.mp/nRuYTG [NNSquad] ). In passing, I very briefly touched on the implications of SSL for "referer" data that is traditionally passed along to Web sites when a user clicks a link. I received a surprisingly high level of diametrically opposed reactions. On one side, people were saying, "Good riddance! Referers are privacy invasive and never should have been implemented in the first place!" On the other hand, I also got many messages with claims along the lines of, "This is just Google's attempt to ruin my analytics -- they don't really care about privacy." The latter assertion is the easier to address. I've been talking to Google folks for years about SSL issues, and there has been a consistent desire to move their services toward this protection on a default basis (as they've already done with Gmail and Google+). The collateral impact on referers has been an issue of concern all along, and possible workarounds such as enhanced Webmaster Tools data and other techniques have always been part of the discussions. But the still largely status quo of "postcard security" data on the Internet, where any entity -- commercial, government, or others -- who have access to a data stream can read most information in the clear, has become intolerable, and securing these paths to the extent practicable must be viewed as an important priority. For now, SSL is a practical means to that end. The "Good Riddance" reaction probably needs a bit more exploration. Let's remember what "referers" (typically misspelled in this manner due to an original misspelling in the HTTP specifications) really do. When a user views info on a Web site, the associated site's logs will typically record a variety of data regarding the connection, including source IP address, various browser-related configuration information, and other information -- most notably for our discussion the referer. The referer is the URL of the page that contained the link that the user clicked to reach the destination site -- the page that "referred" the user. In the case of a search results page, that referer will usually including the user's search query as embedded in the URL itself. However, when a user click arrives via a site that was viewed through SSL, the information that would otherwise normally have been relayed (like the referer) will usually no longer appear. Note however that the IP address of the user will still be present. The passing of referer information is a function not only of the sites involved but also of the user's browser. Various browser extensions and plugins have long existed that allow users to optionally block referers if they wish. There are various reasons why referers were originally implemented. One important one was to aid in session sequencing, since knowing the full URL of the previous page -- that referring page -- could be useful to maintaining session transactional states, especially in the absence of more advanced methodologies that would further evolve later. Some critics of referers make the claim that only "snooping businesses" are interested in such data, and so cutting it off would harm nobody of real merit. But this really is not true. 
I believe if you took a poll, you'd find that the vast majority of Web site operators -- including nonprofits, individuals, and so on, not just commercial enterprises -- use referer data to better understand what people find to be of interest on their sites, and to gain some sense of how their sites are being referenced by the broader world.

I know that I find this data to be of significant interest, and I don't run any ads or other monetizing elements on my blog. While there are other ways to discover relevant links over time, being able to see immediately when there's a "flood" of hits referred from a particular site (e.g., a Slashdot posting!) can be very important, not just as a point of knowledge but from a site management standpoint as well (a rough sketch of this kind of referer tally appears at the end of this posting). Visible search terms in referers tell me which issues from my postings are of particular worth to readers, and help me determine followups and future emphasis.

Could I continue posting new items if all log referers suddenly vanished? Sure. It would mean switching to more limited tools that are less real-time in nature, like retrospective searching, to try to understand the dynamics of users viewing my site, but the fundamental ability to run my blog would of course not be significantly undermined. Still, there would be a notable diminishing of the "value proposition" between readers and the site.

While you may never have thought of them in this way, referers can be viewed as something of an "equalizing" agent between large and small Web sites. When you conduct a search on a search engine, that site obviously knows your query, so that it can provide you with a list of results. You then usually visit sites based on that list, and (hopefully) obtain the information of interest. This transaction -- which typically occurs without your being charged any fee by either party -- still has real value.

Questions: Is it unreasonable for the site that actually provides the information answering your query to see the same data (the search query itself) that the search engine itself had? The search engine must have the query to process your request, and can use this information to improve its search results over time. Is it reasonable to argue that the actual content site should have the same opportunity to improve its services through the use of this data?

These questions can certainly be argued either way. I personally come down on the side of the best possible use of data in a responsible and egalitarian manner whenever possible.

In any case, the increasingly routine and default use of SSL, with the many important benefits it brings, is likely moving the era of traditional referers toward a gradual diminution and ultimately an effective closure in many respects. Other analytical mechanisms (either existing or yet to be developed and deployed) will likely take up some of the slack, and in some cases provide even greater insights.

But perhaps of even greater importance in the long run is the reality that questions surrounding the collection and use of transactional data, even for relatively routine operations on the Internet, can be much more complex than they might appear at first glance, and that seemingly obvious "simple" solutions (such as blanket restrictions) may actually create or exacerbate far more problems than they solve.

This is true regardless of who is referering to ... I mean referring to ... uh, *talking about* these issues!
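As a concrete footnote to the log-watching point above, here is a rough Python sketch of the kind of quick referer tally I have in mind, assuming a combined-format access log at a hypothetical path; none of this reflects any particular site's actual tooling:

    import re
    from collections import Counter
    from urllib.parse import urlsplit

    # In the common "combined" Apache log format, the referer is the
    # second-to-last quoted field on each line (before the user-agent).
    LINE_RE = re.compile(r'"(?P<referer>[^"]*)" "[^"]*"$')

    def referring_hosts(log_path):
        """Tally hits per referring host in a combined-format access log."""
        counts = Counter()
        with open(log_path) as log:
            for line in log:
                match = LINE_RE.search(line.rstrip())
                if not match:
                    continue
                referer = match.group("referer")
                if referer and referer != "-":  # '-' means none was sent
                    counts[urlsplit(referer).netloc] += 1
        return counts

    # Hypothetical usage -- the log path varies by server configuration:
    for host, hits in referring_hosts("/var/log/apache2/access.log").most_common(10):
        print(f"{hits:6d}  {host}")

The same loop could also pull search terms out of search-engine referers (as in the earlier sketch) to see which queries are driving traffic to a site.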
--Lauren--
Lauren Weinstein (lauren@vortex.com): http://www.vortex.com/lauren
Co-Founder: People For Internet Responsibility: http://www.pfir.org
Founder:
 - Network Neutrality Squad: http://www.nnsquad.org
 - Global Coalition for Transparent Internet Performance: http://www.gctip.org
 - PRIVACY Forum: http://www.vortex.com
Member: ACM Committee on Computers and Public Policy
Blog: http://lauren.vortex.com
Google+: http://vortex.com/g+lauren
Twitter: https://twitter.com/laurenweinstein
Tel: +1 (818) 225-2800 / Skype: vortex.com