fathom: Don’t assign unique identifiers/fingerprints to visitors by default

Disclaimer: I’m not a lawyer and this isn’t legal advice.

It would be useful for General Data Protection Regulation (GDPR) compliance to not store IP addresses, cookie identifiers, or other unique fingerprints. The current unique identifiers can be decoded back to IP addresses. See EU GDPR and personal data in web server logs for context.
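
Here is a minimal Python sketch of why a derived identifier can be "decoded". It assumes, purely for illustration, that the identifier is an unsalted SHA-256 hash of the IP address. Fathom’s actual scheme isn’t described here, and `visitor_id` and `recover_ip` are hypothetical names made up for this example. IPv4 has only 2^32 addresses, so such a hash can be reversed by trying candidates. To keep it quick, the sketch only scans one documentation /24 range.

```python
import hashlib
import ipaddress
from typing import Optional


def visitor_id(ip: str) -> str:
    # Hypothetical scheme: unsalted SHA-256 of the IP address string.
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def recover_ip(target_id: str, network: str) -> Optional[str]:
    # Hash every address in the network until one matches the stored ID.
    # Scanning all 2**32 IPv4 addresses works the same way, just for longer.
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if visitor_id(candidate) == target_id:
            return candidate
    return None


stored = visitor_id("203.0.113.42")
print(recover_ip(stored, "203.0.113.0/24"))  # 203.0.113.42
```

Any identifier computed deterministically from the IP alone can be reversed this way. Not storing one at all is the safer default.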

I’d like to see this as the default mode, or at least as an option. This could be a unique selling point.

IP addresses also aren’t all that useful any more for assigning a unique identifier, as mobile devices roam between different networks several times in a normal day (home, mobile carrier, work, café, etc.). IPv6 reduces their usefulness further by assigning new addresses periodically (usually on every reboot or reconnect, or every 48, 24, or 12 hours, depending on operating system and network environment).

Here are some ideas and alternative approaches for getting the same data in aggregate without assigning unique identifiers to each user:

Pageviews per session and number of unique users:

  1. Set a cookie in the responses to /collect that acts as a short-lived, per-session pageview counter. E.g. Set-Cookie: ana_pageviews=1; path=/collect; max-age=3600 (1 hour session). On the second request the browser sends that cookie back, and the response sets it to 2, etc.
  2. Increment $unique_sessions (unique users) by one for each request that arrives without this cookie. Increment $sessions_with_atleast_2_pageviews by one when a request arrives with ana_pageviews=1 (the session’s second pageview), etc.
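
The two steps above could look roughly like this. It’s a minimal Python sketch, not Fathom’s code. `handle_collect` is a hypothetical handler. It takes the request’s cookies as a dict, updates an in-memory `Counter` of aggregate totals, and returns the `Set-Cookie` value to send back. Because max-age is re-sent on every request, the "session" ends after an hour without pageviews.

```python
from collections import Counter
from typing import Dict

SESSION_MAX_AGE = 3600  # 1 hour session, as in the example above


def handle_collect(cookies: Dict[str, str], stats: Counter) -> str:
    """Update aggregate counters for one /collect request and return the
    Set-Cookie value to send back. No per-visitor identifier is stored."""
    raw = cookies.get("ana_pageviews", "")
    # A missing or malformed cookie is treated as the start of a new session.
    previous = int(raw) if raw.isdecimal() else 0
    pageviews = previous + 1

    stats["pageviews"] += 1
    if pageviews == 1:
        # No counter cookie: a new session begins.
        stats["unique_sessions"] += 1
    else:
        # e.g. the 2nd pageview bumps sessions_with_atleast_2_pageviews.
        stats[f"sessions_with_atleast_{pageviews}_pageviews"] += 1

    return f"ana_pageviews={pageviews}; path=/collect; max-age={SESSION_MAX_AGE}"


stats: Counter = Counter()
handle_collect({}, stats)                                # first pageview, no cookie
print(handle_collect({"ana_pageviews": "1"}, stats))     # ana_pageviews=2; path=/collect; max-age=3600
print(stats["unique_sessions"], stats["sessions_with_atleast_2_pageviews"])  # 1 1
```

You can then derive pages per session as `stats["pageviews"] / stats["unique_sessions"]` without ever recording who made which request.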

User-retention/repeat visitors:

  1. Set a cookie in the responses to /collect that includes an imprecise timestamp (e.g. only daily precision, so the values aren’t too unique). E.g. Set-Cookie: ana_lastvisit=2018-04-23; path=/collect; max-age=7776000 (3 months). Reset it on every visit.
  2. Don’t store the exact timestamp. Instead, compute the time since the last visit as now() - $cookie['ana_lastvisit'] and record only that. Maybe don’t track this within an active session (when $cookie['ana_pageviews'] is set)?
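
A minimal Python sketch of these two steps follows. `handle_lastvisit` is a hypothetical helper, not an existing API. It returns only the number of days since the last visit, for the caller to add to an aggregate histogram, along with the refreshed cookie. It skips the measurement while an `ana_pageviews` session cookie is present, as suggested above.

```python
from datetime import date
from typing import Dict, Optional, Tuple

RETENTION_MAX_AGE = 7776000  # 3 months (90 days), as in the example above


def handle_lastvisit(cookies: Dict[str, str], today: date) -> Tuple[Optional[int], str]:
    """Return (days since last visit, or None if not measured) and the
    Set-Cookie value that resets ana_lastvisit to today's date."""
    days_since: Optional[int] = None
    in_session = "ana_pageviews" in cookies
    raw = cookies.get("ana_lastvisit")
    if raw and not in_session:
        try:
            last = date.fromisoformat(raw)
        except ValueError:
            last = None  # ignore malformed cookie values
        if last is not None and last <= today:
            days_since = (today - last).days
    # Only day precision goes into the cookie, never an exact timestamp.
    header = f"ana_lastvisit={today.isoformat()}; path=/collect; max-age={RETENTION_MAX_AGE}"
    return days_since, header


days, header = handle_lastvisit({"ana_lastvisit": "2018-04-20"}, date(2018, 4, 23))
print(days)    # 3
print(header)  # ana_lastvisit=2018-04-23; path=/collect; max-age=7776000
```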

What else is needed to track?

On the use of cookies: the cookies are transparent (even self-explanatory), their use is easy to explain in a privacy policy, and in my opinion they should be GDPR-friendly. They’re not used to track the behaviour of individual users, just the movements and trends in the herd.

Disclaimer: I’m not a lawyer and this isn’t legal advice.

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

@rosswintle, put this on repeat and get back to working on Kownter. It’ll be a great alternative (more alternatives are great!) and you just need to stay motivated. Make it your own and it’ll probably turn out great. Nick some stylesheets and graphs from Fathom if you like the visuals, and stuff your own data in them.

This isn’t legal advice and I’m not anyone’s lawyer. The following could very well be totally wrong: fewer magical identifiers mean more transparency. It also means people won’t contact the operator of the analytics service to ask for a copy of the data belonging to $magical_token, or ask to have the data of $magical_token deleted. I specifically suggested cookie names that describe the data they contain, instead of a single magical cookie containing all the data. Individual cookies are easier for people to inspect. Opting out is as simple as disabling cookies, and their use is easy to document in a privacy policy.

The following is provided for context (GDPR sections mentioning identifiers as relevant to this discussion):

GDPR Recital 30:

Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.

GDPR Article 4 (Definitions), point 1:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

This is an interesting discussion and I hope you don’t mind me both chipping in and following. @dannyvankooten - you look like you’re WAY more into this than I am. But I’m currently working out this exact same thing on my (much smaller and probably less viable) Kownter project. In fact, one of the reasons I’m building it is to see what can be done without cookies.

Some kind of aggregate flow between pages, as @da2x suggests, is pretty much the conclusion I came to. I don’t need to see individuals’ paths if I can see the aggregate drop-off as people move through the user journey. In this post I attempted to explain this, saying: “we can still report the ratio of conversions against page views.”
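
A minimal Python sketch shows how that ratio can be computed from aggregate pageview counts alone. The page paths and counts below are hypothetical, and `funnel_dropoff` is an illustrative name rather than an API from Kownter or Fathom. It compares pageviews, not people, so reloads and visitors who enter mid-journey make it an approximation.

```python
from typing import Dict, List, Tuple


def funnel_dropoff(pageviews: Dict[str, int], steps: List[str]) -> List[Tuple[str, float]]:
    """For each step after the first, return its pageviews divided by the
    previous step's pageviews. Only aggregate totals are used."""
    ratios = []
    for prev, curr in zip(steps, steps[1:]):
        before = pageviews.get(prev, 0)
        ratio = pageviews.get(curr, 0) / before if before else 0.0
        ratios.append((curr, ratio))
    return ratios


# Hypothetical totals for a three-step journey.
counts = {"/pricing": 1000, "/signup": 250, "/welcome": 50}
print(funnel_dropoff(counts, ["/pricing", "/signup", "/welcome"]))
# [('/signup', 0.25), ('/welcome', 0.2)]
```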

OK, so there are probably some advanced cases where you want to know more than that, but if you want that, you probably actually want GA, right?

I’ve also been wondering if it makes sense, is useful, and is performant enough (server-side) to store a cookie, but to recycle it on each visit. And, if you do this, is it any more private than just setting a cookie and leaving it there? I’m not sure it is.

I am not actually convinced that a session ID is personally identifiable as there’s no reverse-lookup. If you were storing a session ID alongside an IP address then that would be different. I know GDPR says something about web-tokens, but I think what you/we are doing here is way within the spirit of GDPR.

Interested to see how this progresses.