AOL released the logs of all searches made by more than 650,000 of its users from March through May of 2006. The company did not ask any of these users for permission to release the information, nor did it take extensive steps to hide personally identifiable information in the search queries. It took only the token step of anonymizing the screen names by replacing each with a number. As has been reported on Slashdot, Technorati, and other sites, the data has already yielded some rather... interesting information. Personally, I grabbed a mirror and loaded the files into a MySQL database to play with. So far, it's been interesting.
Normalized queries:
36,389,567 lines of data
21,011,340 instances of new queries (with or without click-through)
7,887,022 requests for "next page" of results
19,442,629 user click-through events
16,946,938 queries without user click-through
10,154,742 unique (normalized) queries
657,426 unique user IDs
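
One sanity check on the numbers above: the click-through and no-click-through counts add up to the total line count (19,442,629 + 16,946,938 = 36,389,567), so every line appears to be one or the other. For anyone who wants to reproduce counts like these, here is a minimal sketch of the kind of query involved. It is not the exact setup used for this post: it uses Python's built-in sqlite3 instead of MySQL, the table and column names (searches, anon_id, query, query_time, item_rank, click_url) are assumptions about how the files might be loaded, and the sample rows are made up.

```python
import sqlite3

# Made-up sample rows in an ASSUMED layout:
# (anon_id, query, query_time, item_rank, click_url)
# A row with no click-through has NULL item_rank and click_url.
SAMPLE_ROWS = [
    (1, "cheap flights", "2006-03-01 10:00:00", None, None),
    (1, "cheap flights", "2006-03-01 10:01:00", 1, "http://example.com"),
    (2, "weather", "2006-03-02 09:00:00", None, None),
    (2, "cheap flights", "2006-03-02 09:05:00", 2, "http://example.org"),
]


def summarize(rows):
    """Load rows into an in-memory table and return summary counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE searches ("
        "anon_id INTEGER, query TEXT, query_time TEXT, "
        "item_rank INTEGER, click_url TEXT)"
    )
    conn.executemany("INSERT INTO searches VALUES (?, ?, ?, ?, ?)", rows)
    # COUNT(click_url) skips NULLs, so it counts click-through rows only.
    row = conn.execute(
        "SELECT COUNT(*), "
        "COUNT(click_url), "
        "COUNT(*) - COUNT(click_url), "
        "COUNT(DISTINCT query), "
        "COUNT(DISTINCT anon_id) "
        "FROM searches"
    ).fetchone()
    conn.close()
    keys = ("lines", "click_throughs", "no_click_through",
            "unique_queries", "unique_users")
    return dict(zip(keys, row))


if __name__ == "__main__":
    for name, value in summarize(SAMPLE_ROWS).items():
        print(f"{name}: {value}")
```

The same SELECT should carry over to MySQL more or less unchanged once the files are loaded into a table with matching columns.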
Tuesday, August 08, 2006