This website runs as a static site hosted on GitHub pages. I don't currently have any way of telling how many people are reading it, or what pages they're reading etc.
The simple way to get this data would be to add Google Analytics to the website. However, I think it's an interesting exercise to work out how to implement something similar ourselves. This would let us specify the amount of information we'd like to log about each visitor and we'd also have control of the data ourselves, rather than shipping it to a third party.
The web server accepts requests, and stores the information in a database.
What analytics do we care about?
We get a couple of data points from the HTTP request itself:
- The visitor's country (derived from the IP address)
On top of this, we can send the following in the request body:
- URL of the visited page
- Referrer URL
- Screen width (so we can tell if the user is on web or mobile)
I'm interested in preserving the privacy of website visitors, so we:
- Won't store IP addresses
- Won't set any cookies to uniquely identify visitors
- Will honour the Do Not Track (
Requests will be
https://analytics.routley.io/track with the body
I think I'm going to start with a SQLite database. I'm going to run the analytics server on a free Google Cloud Instance, and I'm not sure if I need the power or scalability of MySQL or Postgres. I'm not super worried about losing historic data if the instance dies for whatever reason.
Arguably a time series database like Prometheus would be a better fit for our data - I'm avoiding this because I use Prometheus at work but don't use SQL and would like to get more experience with it.
We can write SQL queries to find out information about our visitors:
- Number of views per page per day
- Bucketed screen widths, so we can see what devices visitors are using
- What countries people are visiting from
- Where visitors are being referred form