Today, while researching some hits on my website, I encountered something that Google Analytics couldn't tell me and, for various reasons, probably never will. Naturally, I wondered if I could write a program to solve the problem myself. I wanted to use the same architecture as Analytics, but didn't actually know what that architecture was. Basically, I wanted to know, "How does Google Analytics work?"
It turns out to be pretty easy. There are essentially only four steps:
- A visitor loads a page on your website
- In the process, her browser loads and runs some JavaScript from Google
- That JavaScript collects information about the visitor
- The information is sent to Google by requesting a URI and passing the details as query-string (CGI) parameters
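The last step can be sketched in a few lines. This is only an illustration of the idea, not Google's actual code: the function name `buildBeaconUrl`, the endpoint `https://stats.example.com/collect`, and the parameter names are all made up for the example.

```javascript
// Hypothetical helper: encode the collected details as query-string
// parameters on a tracking URI. The names here are illustrative only.
function buildBeaconUrl(endpoint, details) {
  const pairs = Object.keys(details).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(String(details[key]));
  });
  // A random number makes each URI unique so caches don't swallow repeats.
  pairs.push('rand=' + Math.floor(Math.random() * 1e9));
  return endpoint + '?' + pairs.join('&');
}

// In a browser, requesting the URI can be as simple as loading an image:
//   new Image().src = buildBeaconUrl('https://stats.example.com/collect', details);
```

Requesting the URI as an image means the page never has to read a response; the request itself carries everything.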
A program on Google's end then stores all the detailed observations that were recorded by the JavaScript. It probably augments those observations with some of its own (IP address, user agent string, etc.). At some point, all those details are analyzed and displayed in pretty graphs.
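A rough sketch of that receiving side might look like the following. It is written in JavaScript rather than as a Perl CGI script, and it is a guess at the general shape, not a description of what Google runs. The function name `recordFromRequest` and the field names `ip`, `userAgent`, and `receivedAt` are assumptions for the example.

```javascript
// Hypothetical server-side step: turn one beacon request into a record.
// `requestUrl` is the path and query the browser requested; `headers` and
// `remoteAddress` come from whatever HTTP server receives the request.
function recordFromRequest(requestUrl, headers, remoteAddress) {
  // The base is only needed so URL can parse a relative request path.
  const parsed = new URL(requestUrl, 'http://localhost');
  const record = {};
  parsed.searchParams.forEach(function (value, key) {
    record[key] = value; // values arrive already percent-decoded
  });
  // Observations the server can add on its own.
  record.ip = remoteAddress;
  record.userAgent = headers['user-agent'] || '';
  record.receivedAt = new Date().toISOString();
  return record;
}
```

Each record could then be appended to a log file or database for the analysis step to work through later.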
As an experiment, I was able to make a basic implementation in about 70 lines of JavaScript. It would take about that many lines of Perl to do the CGI side. The analysis would be harder, but also quite a bit of fun.
Information collected
Here's a list of the useful information that Google Analytics' JavaScript collects and submits to Google for tracking. Hidden within the DOM documentation are all kinds of other useful tidbits that might be worth tracking.
- title of the current document
- document URI's hostname
- document URI's path and query portion
- URI of the referring page
- computer screen dimensions
- computer screen color depth
- character set of the document being viewed
- default language of the browser (English, Spanish, etc.)
- whether Java is enabled
- which version of Flash is installed, if any
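Most of the items above map directly onto standard DOM properties. Here is a minimal sketch of gathering them; the function name `collectPageDetails` and the keys of the returned object are invented for the example, and Flash detection is left out because it depends on browser-specific plugin checks.

```javascript
// Sketch: gather the listed details from browser-style objects. Passing the
// objects in (rather than using globals) keeps the function easy to test.
function collectPageDetails(doc, scr, nav) {
  return {
    title: doc.title,
    hostname: doc.location.hostname,
    path: doc.location.pathname + doc.location.search,
    referrer: doc.referrer,
    screen: scr.width + 'x' + scr.height,
    colorDepth: scr.colorDepth + '-bit',
    // Older browsers exposed the character set as document.charset.
    charset: doc.characterSet || doc.charset || '',
    // Older Internet Explorer used navigator.browserLanguage.
    language: nav.language || nav.browserLanguage || '',
    javaEnabled: typeof nav.javaEnabled === 'function' ? nav.javaEnabled() : false
  };
}

// In a browser: collectPageDetails(document, screen, navigator)
```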
4 comments:
Most other packages send a request for a dummy 1-pixel by 1-pixel image and then capture the analytics details. Does Google Analytics work the same way too? Is there a way to figure this out?
Yes Akshay, Google Analytics returns a 1px square image. If you use Firefox and load the LiveHTTPHeaders extension, you can watch the request headers. You could also read the source code of Google's urchin.js. That's the file linked to when installing Analytics on a website.
I'm curious about the performance impact of Google Analytics as I'm thinking about doing it for some iPhone web sites.
How big is urchin.js? Is it cached? If so, is it cached across sites (i.e., if the user got a cached copy of urchin.js from some other site and then visits our site, is he required to download it again)?
Does the server request issued by the JS occur after the user sees the page? Can I make the JS download after the page is shown, so the user's experience is not impacted by the tracking? (Some GPRS connections are very, very slow, and the latency to connect to a new server is very high, ~2 seconds.)
Sorry for all the questions, but let me know if you've looked into this, otherwise I'll investigate.
Ajay, the performance impact is relatively small, but noticeable over slow connections and connections with high latency. The ga.js file (previously called urchin.js) is served with gzip compression enabled and is about 9.1K after compression (22.7K without it). The file is served with a Cache-Control header whose max-age value appears, in many cases, to prevent the file from being fetched again.
The JavaScript request used for tracking is made after all the rest of the page has been loaded. When browsing sites across the internet, I have noticed times where the page load is halted while "Waiting for google-analytics.com..." but my home connection is quite slow.
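For anyone who wants to make sure the tracking script never holds up the page, one option is to inject it only after the load event fires. The sketch below is a general technique, not something taken from Google's install instructions; the function name `loadTrackerAfterPageLoad` is invented, and `scriptUrl` stands for whatever tracking script you use.

```javascript
// Sketch: fetch a tracking script only after the page's load event, so it
// cannot delay rendering. `win` and `doc` are the browser's window and
// document objects.
function loadTrackerAfterPageLoad(win, doc, scriptUrl) {
  function inject() {
    const script = doc.createElement('script');
    script.src = scriptUrl;
    script.async = true;
    doc.body.appendChild(script);
  }
  if (doc.readyState === 'complete') {
    inject(); // The page has already finished loading.
  } else {
    win.addEventListener('load', inject);
  }
}

// In a browser: loadTrackerAfterPageLoad(window, document, scriptUrl)
```

The trade-off is that visitors who leave before the page finishes loading are never counted.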