Sep 06 2020
Light as a Feather: Removing Google Analytics
Gathering analytics data is becoming increasingly difficult for publishers. uBlock is commonly used by the average internet user, and Pi-hole is quickly growing among technologists. I applaud these privacy-conscious folks for exercising personal agency at a time when internet privacy is harder to come by than ever. From a publisher’s point of view, users ghosting analytics is simply a case of “rolling with the punches”.
While I had Google Analytics on my site for about two years, I seldom checked it. It felt like giving Google free data more than anything. Part of the issue is that I never really took the time to learn the ins and outs of the Google Analytics tooling. Likewise, I did not run marketing campaigns where this feedback could be vital.
What are my “business” requirements for analytics on this blog? Not much. It’s a personal site tucked really far back in the corners of the internet. Since nothing is monetized, engagement is not something that’s really on the todo list.
With this in mind, it seems excessive that the Google Analytics tracker constituted approximately 75% of the page weight. I’ve decided that it is better to drop Google Analytics – a giant step towards my philosophy of “only give the users what they ask for.”
So what now?
I’ve decided to exchange Google Analytics for a completely server-side analytics tool called awstats. Awstats parses server logs to gather information about traffic to the site. I have these statistics digested daily, and review the results about once per month. The installation is relatively easy, and reviewing the data is as simple as rsyncing a PDF to my local machine. Of course, I don’t run the rsync command directly. It is baked into my daily routine (last paragraph).
Some of the general information that is reported includes:
- Number of unique visitors YTD (likely IP-based since there is no cookie tracking)
- Total number of visits YTD
- Total page views
- Bandwidth used
The above statistics are broken down further by:
- month of year
- day for the current month
- day of week
- IP address/host with last visit
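To make the log-parsing idea concrete, here is a minimal sketch of how figures like these can be derived from an access log. It is not how awstats is implemented. It assumes the server writes the common Apache/nginx "combined" log format, and the names `LOG_PATTERN` and `summarize` are made up for illustration. It also counts every request as a hit, whereas a real tool separates pages from other hits and groups requests into visits.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input: Apache/nginx "combined" log format, e.g.
# 203.0.113.5 - - [06/Sep/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 1200 "-" "Mozilla/5.0"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Tally unique IPs, requests, bytes sent, and requests per month."""
    visitors = set()
    requests = 0
    bandwidth = 0
    by_month = Counter()

    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format

        visitors.add(match["ip"])  # IP-based uniqueness, no cookies involved
        requests += 1

        size = match["size"]
        bandwidth += 0 if size == "-" else int(size)  # "-" means no body sent

        # %b expects English month abbreviations (the default C locale).
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        by_month[when.strftime("%Y-%m")] += 1

    return {
        "unique_visitors": len(visitors),
        "requests": requests,
        "bandwidth_bytes": bandwidth,
        "by_month": dict(by_month),
    }
```

Calling `summarize(open(path))` with the path to an access log returns a small dictionary of totals. Breaking the numbers down by day of week or by host would be the same loop with a different `Counter` key.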
Other sections include:
- Most viewed pages
- Operating system
- Origin: direct hits, links from search engines, and links from external pages (e.g. Hacker News)
- Unknown user agents
While going through some of this data, I learned that the most frequent unknown user agent visiting this site is Blackboard_Safeassign. Interesting that my site is archived by Blackboard and used in conjunction with their plagiarism detection software.
One other interesting discovery is that 80.2% of my traffic is from non-mobile devices. Though with 17.7% being Linux, I’m sure a good chunk of the desktop traffic is just from me. However, it is also very possible that some bots are spoofing their operating system.
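For a sense of why spoofing is so easy, here is a rough sketch of how an operating system breakdown can be computed from the User-Agent header alone. The keyword table and the names `OS_KEYWORDS`, `classify_os`, and `os_share` are hypothetical and far cruder than what a real log analyzer ships with. The point is that the header is self-reported, so a bot can claim to be whatever it likes.

```python
from collections import Counter

# Hypothetical, deliberately tiny keyword table. Order matters:
# Android user agents also contain "Linux", and iPhone user agents
# also contain "Mac OS X", so the more specific keywords come first.
OS_KEYWORDS = [
    ("Android", "Android"),
    ("iPhone", "iOS"),
    ("iPad", "iOS"),
    ("Windows", "Windows"),
    ("Mac OS X", "macOS"),
    ("Linux", "Linux"),
]


def classify_os(user_agent):
    """Return a coarse OS label based on substrings of the User-Agent."""
    for keyword, name in OS_KEYWORDS:
        if keyword in user_agent:
            return name
    return "Unknown"


def os_share(user_agents):
    """Return each OS label's share of the given user agents, in percent."""
    counts = Counter(classify_os(ua) for ua in user_agents)
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}
```

Anything that matches no keyword lands in "Unknown", which is roughly the bucket a crawler with a non-browser user agent would fall into.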