Today, while researching some hits on my website, I ran into a question that Google Analytics couldn't answer and, for various reasons, probably never will. Naturally, I wondered if I could write a program to solve the problem for me. I wanted to use the same architecture as Analytics, but didn't actually know what that architecture was. Basically, I wanted to know, "How does Google Analytics work?"
It turns out to be pretty easy. There are essentially only four steps:
- A visitor loads a page on your website
- In the process, her browser loads and runs some JavaScript from Google
- That JavaScript collects information about the visitor
- The information is sent to Google by requesting a URI and passing the details as CGI (query string) parameters
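To make the last step concrete, here is a minimal sketch of how a script might pack its observations into a URI and request it. The endpoint URL, the function names, and the parameter names are all made up for illustration; they are not Google's. Requesting a tiny image is one common way to fire off a GET request from a page, and I'm assuming that approach here rather than claiming it is exactly what Analytics does.

```javascript
// Hypothetical collection endpoint; this is NOT Google's real URL.
const ENDPOINT = "https://stats.example.com/hit.gif";

// Turn an object of observations into a query string, escaping each
// key and value so characters like "?" and "&" survive the trip.
function toQueryString(details) {
  return Object.keys(details)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(details[key])))
    .join("&");
}

// Build the full URI that carries the observations as parameters.
function buildHitUrl(endpoint, details) {
  return endpoint + "?" + toQueryString(details);
}

// Request the URI by loading it as an image. Outside a browser there is
// no Image constructor, so nothing is sent and the function returns false.
function sendHit(url) {
  if (typeof Image === "undefined") return false;
  const img = new Image(1, 1);
  img.src = url; // assigning src triggers the GET request
  return true;
}
```

In a page, calling `sendHit(buildHitUrl(ENDPOINT, { title: document.title }))` would be enough to report a single observation.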
A program on Google's end then stores all the detailed observations that were recorded by the JavaScript. It probably augments those observations with some of its own (IP address, user agent string, etc.). At some point, all those details are analyzed and displayed in pretty graphs.
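As a rough sketch of that receiving end, the function below takes the requested URI plus a couple of things only the server can see, and turns them into one record ready to be stored. It is written in JavaScript and uses the standard `URL` class; the function name, the record layout, and the placeholder base address are my own assumptions, and I have no idea what Google's program actually looks like.

```javascript
// Hypothetical server-side helper: turn one incoming request into a record.
//   requestUrl    - the path and query that was requested, e.g. "/hit.gif?title=Home"
//   headers       - request headers, with lowercase names
//   remoteAddress - the visitor's IP address as seen by the server
//   now           - optional Date, mainly so the timestamp is predictable in tests
function recordHit(requestUrl, headers, remoteAddress, now) {
  // A base is required to parse a relative URL; this one is just a placeholder.
  const url = new URL(requestUrl, "http://placeholder.invalid");

  // Copy every query parameter into a plain object (values arrive decoded).
  const observations = {};
  for (const [key, value] of url.searchParams) {
    observations[key] = value;
  }

  // Augment the client's observations with details only the server knows.
  return {
    observations,
    ip: remoteAddress,
    userAgent: headers["user-agent"] || "",
    receivedAt: (now || new Date()).toISOString(),
  };
}
```

Each record could then be appended to a log file or a database table for the analysis step to chew on later.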
As an experiment, I was able to make a basic implementation in about 70 lines of JavaScript. It would take about that many lines of Perl to do the CGI side. The analysis would be harder, but also quite a bit of fun.
Information collected
Here's a list of the useful information that Google Analytics' JavaScript collects and submits to Google for tracking. The DOM documentation hides all kinds of other useful tidbits that might be worth tracking.
- title of the current document
- document URI's hostname
- document URI's path and query portion
- URI of the referring page
- computer screen dimensions
- computer screen color depth
- character set of the document being viewed
- default language of the browser (English, Spanish, etc.)
- whether Java is enabled
- which version of Flash is installed, if any
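Each item in the list above maps onto a property the browser exposes through `document`, `screen`, or `navigator`. Here is a sketch of gathering them into one object. The function names and key names are hypothetical, not the parameter names Analytics uses. The Flash check assumes the plugin registers itself as "Shockwave Flash" with a version number in its description. Note that current browsers no longer ship Java or Flash plugins, so the last two values will usually come back as `false` and an empty string.

```javascript
// Pull a "major.minor" Flash version out of the plugin list, or "" if absent.
// Assumes the plugin is named "Shockwave Flash" and its description
// contains a version such as "10.0".
function flashVersion(nav) {
  const plugin = nav.plugins && nav.plugins["Shockwave Flash"];
  if (!plugin || !plugin.description) return "";
  const match = /(\d+)\.(\d+)/.exec(plugin.description);
  return match ? match[1] + "." + match[2] : "";
}

// Gather the observations. The browser objects are passed in as
// arguments so the function can also be exercised with stand-ins.
function collectDetails(doc, scr, nav) {
  return {
    title: doc.title,                                    // title of the current document
    hostname: doc.location.hostname,                     // document URI's hostname
    page: doc.location.pathname + doc.location.search,   // path and query portion
    referrer: doc.referrer,                              // URI of the referring page
    screenSize: scr.width + "x" + scr.height,            // screen dimensions
    colorDepth: scr.colorDepth,                          // screen color depth
    charset: doc.characterSet || doc.charset || "",      // document character set
    language: nav.language || "",                        // browser's default language
    javaEnabled: typeof nav.javaEnabled === "function" ? nav.javaEnabled() : false,
    flash: flashVersion(nav),                            // Flash version, if any
  };
}
```

In a page you would call it as `collectDetails(document, screen, navigator)`.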