A Marketer’s Guide to Data Tracking: What’s Really Going on Within Your Website?
By Hessie Jones
I never thought I could write this article, mainly because I don’t have the technical chops to build credibility for this work. But in recent weeks I have come to understand that, as a marketer and someone who is attuned to data privacy and, more particularly, the resulting harms, I needed to go deeper: to understand not just what was being done, but how it was being done.
I was fortunate enough to meet Allen Woods through a few privacy colleagues. Allen served 24 years in uniform with the British Army and a further 25 years working for the UK Ministry of Defence on IT related matters, mainly compliance issues in UK Defence Supply Chain and Logistics IT. His relevant experience and his keen understanding of the evolving privacy legislation made him the ideal person for me to learn from. I would bet that marketers, especially those who have been doing digital media and advertising for years, have no idea of the extent to which code has become the pervasive vehicle that brought surveillance capitalism to the state we see today.
So, I took some time to learn. This is a glimpse of what is happening on a common website.
Some definitions to know and why each is significant:
- Why this is important: AJAX has simply made it easier to send and receive information between browsers and host servers, improving the end user experience and minimizing disruption. Facebook and Twitter rely on tech like AJAX to keep web pages up to date (Likes, retweets, timestamps).
Supporting event driven functions means that a program flow is determined by the user actions (mouse clicks, threads, messages from programs). This is centered on performing certain actions based on the user input.
- Why this is important: The W3C has identified ways in which HttpRequests passively expose information about you: 1) browser fingerprinting, 2) persistent “super cookies”, which can be correlated with other techniques to re-identify you, and 3) request headers that may include IP information, browser, version and OS. These are considered Unsanctioned Web Tracking.
- Why this is important: The reasons for its popularity: there are plugins readily available for building websites or web apps, and its “Write less, do more” tagline means you get more done with fewer lines of code.
Cookies: Cookies are blocks of data that are created by a web server while a user is browsing a website and put on the user’s device or computer by the web server. Each cookie has a name and a value that is stored on the user’s device.
- Why this is important: Cookies are state indicators, markers that contain small amounts of information stored on your device, that can be recognized by a web site, or other web sites that know the cookie is there and can act on the data contained in them. Examples include username, password, website preferences.
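To make the name and value idea concrete, here is a minimal sketch in Python that reads a cookie the way a server would set it. The header text, the cookie name `visitor_id` and the domain are made-up assumptions for illustration, not taken from any real site.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value; name, value and domain are invented.
raw_header = "visitor_id=abc123; Max-Age=63072000; Path=/; Domain=.example.com"


def describe_cookie(header_value: str) -> dict:
    """Return the name, value and a few attributes of the first cookie found."""
    jar = SimpleCookie()
    jar.load(header_value)
    name, morsel = next(iter(jar.items()))
    return {
        "name": name,
        "value": morsel.value,
        # Max-Age is how long, in seconds, the browser is asked to keep the cookie.
        "max_age_seconds": morsel["max-age"],
        # Domain controls which hosts the browser sends the cookie back to.
        "domain": morsel["domain"],
    }


print(describe_cookie(raw_header))
```

In this invented example the cookie asks to be kept for 63,072,000 seconds, which is two years. A long lifetime attached to an identifier-like value is the kind of detail worth noticing when reviewing what a page sets.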
The importance of stating these definitions up front is to understand how they have contributed to the state of data collection today, which seems to fly in the face of the EU GDPR, CCPA, PIPEDA and other privacy regulations. What’s become increasingly clear to me is the massive movement and exchange of data across the web without the consent or knowledge of the end user or the site controller (who is ultimately responsible for the site), let alone of the web or application developers who often promote the use or creation of this seemingly innocuous code. You, as a website owner, are by default collecting data as a proxy for someone or something.
Allen Woods has taken me through some developer code to further understand what is happening. To demonstrate this process I’ve chosen a popular clothing website, which shall remain nameless.
Understand what’s happening on your website
In order to begin to understand what’s going on in a website, it makes sense to understand what to look for. To display what’s on the page there is HTML but increasingly people are calling code components from just about anywhere.
Most people do not check what code they have on their pages, or what code goes into the client device at the point at which a page is delivered. So let’s demonstrate this.
Here’s a video to get you started: Opening Up Developer Code for Any Website.
Generally, when you view a page using the browser’s “view source” option, the inserted code may look something like this:
Just six lines of HTML: The lines may vary in count depending on the component, nevertheless, the code insertion is relatively tiny.
To understand what’s really happening, I opened up an API file associated with a popular traffic analyzer. What I found there were nearly 18,000 lines of code that did not appear in the original file above, but appeared in my browser, on my device, because they had been requested by the traffic analyzer.
What does this mean? This tool is one of the more popular site visitor analysis toolkits. It is in use by an estimated 29MM sites worldwide, and each page is used to transmit a visitor device profile back to the code owners.
The website owners get a summary of their specific visitor data but the code owners are able to leverage ALL of the data collected by all 29MM sites on an ongoing basis. This massive collection of data intelligence continues, unabated.
For website visitors, some 18,000 lines of code are dropped onto their device by this third party, and that code responds to their interactions with the page. Most of the time this is happening without their knowledge or permission.
It should be noted that such code is usually requested by a website from another computer somewhere on the web. That raises an issue of control, because those owning the code can and do make changes to it, to their benefit and at risk to the end users.
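One way to see which code a page asks other computers for is to list the script addresses in its HTML and keep those that point at a different host. The sketch below does that in Python under stated assumptions: the sample page, the host names and the helper `third_party_scripts` are all hypothetical, and a real review would also need to account for scripts that load further scripts once they run.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_scripts(html: str, site_host: str) -> list:
    """Return script URLs whose host differs from the site's own host."""
    collector = ScriptCollector()
    collector.feed(html)
    external = []
    for src in collector.sources:
        host = urlparse(src).netloc  # empty for relative paths such as /js/site.js
        if host and host != site_host:
            external.append(src)
    return external


# Hypothetical page source; the hosts are invented for illustration.
sample_page = """
<html><head>
<script src="/js/site.js"></script>
<script async src="https://analytics.example.net/tracker.js"></script>
<script>console.log("inline");</script>
</head><body></body></html>
"""

print(third_party_scripts(sample_page, "shop.example.com"))
```

The comparison here is deliberately strict, so a script served from another subdomain of the same company would also be listed. That errs on the side of showing more for a human to check.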
Remember, these requests contain references to the objects (page elements and the command codes) that make up a web page and can be made to do things like transfer the data between the website and the server requesting it. The code below is placed into the browser memory cache and then it becomes a live program that is executed based on the page events like the user clicking or entering information.
There are a number of keywords to look out for when looking at code. One of them is “HttpRequest”, which is one of the indicators that AJAX is being used as a coding technique to transmit something.
Through some simple searches I pulled up the following HttpRequests — (7 counted in total)
Next I did a search for the word “cookies”.
I found 67 instances of “cookies”. Here are some examples:
What this means:
You will see keywords like “mouse”, “document” (which refers to the page) and “window.location”, all of which are a means by which coders can detect things like device and browser capabilities for each end user.
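The searches described above were done by hand in the browser, but the same counting can be sketched in a few lines of Python. This is only an illustration: the keyword list is drawn from the terms mentioned in this article, while the sample script and the `collector.example.net` address are invented and do not come from the analyzer file that was inspected.

```python
import re

# Keywords taken from the terms discussed in this article.
KEYWORDS = ["HttpRequest", "cookie", "mouse", "window.location"]


def count_keywords(source: str, keywords=KEYWORDS) -> dict:
    """Count case-insensitive occurrences of each keyword in a block of code."""
    counts = {}
    for keyword in keywords:
        pattern = re.escape(keyword)  # treat the dot in window.location literally
        counts[keyword] = len(re.findall(pattern, source, flags=re.IGNORECASE))
    return counts


# Invented JavaScript fragment standing in for a downloaded third-party file.
sample_script = """
var req = new XMLHttpRequest();
req.open("POST", "https://collector.example.net/hit");
var id = document.cookie;
document.addEventListener("mousemove", onMove);
var page = window.location.href;
"""

print(count_keywords(sample_script))
```

A count on its own proves nothing about intent. It simply tells you where to start reading, which is the point of the exercise.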
All of this code is dropped into an end user browser by the popular web analyzer tool without the explicit knowledge of a site controller. Depending on the number and capabilities of the components on the web page, the delivery of a single web page may mean tens of thousands of lines of code will be dropped on the client side, ALL of which site controllers will have no control over. The code may then be modified by the original developers for any number of reasons and may contain capabilities other than the stated original purpose.
It should be noted that these components also have licensing terms and conditions that make use of the term “indemnification”, which Shoshana Zuboff referred to as “sadistic” in her book, The Age of Surveillance Capitalism. On a typical Terms and Conditions page, the indemnification clause for this web analyzer tool may look like this:
“To the extent permitted by applicable law, You will indemnify, hold harmless and defend [Company] and its wholly-owned subsidiaries, at Your expense, from any and all third-party claims, actions, proceedings, and suits brought against [Company] or any of its officers, directors, employees, agents or affiliates, and all related liabilities, damages, settlements, penalties, fines, costs or expenses (including, reasonable attorneys’ fees and other litigation expenses) incurred by [Company] or any of its officers, directors, employees, agents or affiliates, arising out of or relating to (i) Your breach of any term or condition of this Agreement…”
Why do marketers need to understand this?
Those who manage a website to run communication or acquisition campaigns need to understand the basics of carrying out a simple code review. First, legally speaking, the “controller” is the person responsible for how the site functions. Second, “open source” does not mean “free of responsibility”: each site request (to the data processor) has its own terms and conditions by which controllers are legally bound. And as we’ve seen, often unbeknownst to the controller, the code that is dropped into the user device more often than not comes not from the site server but from the third party data processor (represented by standard .js calls below):
It should be pointed out that this simple exercise does not require coding knowledge. It only requires knowing which keywords in a computer language to look for as clues that something is happening that you may initially be unaware of.
What are the implications on end user information and privacy?
- The seamless exchange of information between client side and server environments that AJAX has enabled has made data collection from a dynamic and interactive user environment far more commonplace.
- There is shape and form to web pages that can all be manipulated by coders who have the skill and knowledge to do so.
- HttpRequests passively expose your identity through your IP, persistent cookies on your website, your browser, version and login preferences, location etc.
- Veiled as optimizing user experience, what is actually going on is a massive market for data gathering right under the noses of site owners and their visitors.
- What I have found (and I am still learning) is that website construction is many layered. It is more than just what you see on a screen, and what happens in end user devices needs to be understood and properly managed.
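The point in the list above about what a request passively exposes can be illustrated with a short Python sketch that reads a block of request headers and picks out the revealing ones. Everything in the sample is an assumption made for illustration: the header values, the host names and the choice of which headers count as revealing.

```python
from email import message_from_string

# Hypothetical request headers; all values are invented.
sample_headers = (
    "Host: shop.example.com\n"
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0\n"
    "Accept-Language: en-GB,en;q=0.9\n"
    "Referer: https://shop.example.com/checkout\n"
    "Cookie: visitor_id=abc123\n"
)

# An assumed shortlist of headers that say something about the visitor.
REVEALING = ["User-Agent", "Accept-Language", "Referer", "Cookie"]


def revealing_headers(raw: str) -> dict:
    """Return only the headers from the shortlist that are present."""
    parsed = message_from_string(raw)  # HTTP headers share the "Name: value" layout
    return {name: parsed[name] for name in REVEALING if parsed[name] is not None}


for name, value in revealing_headers(sample_headers).items():
    print(f"{name}: {value}")
```

Note that the visitor’s IP address does not appear in this list because it is normally visible from the network connection itself rather than sent as a header. The receiving server still sees it with every request.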
Caveat: The information contained within is just one pathfinder (of many) which marketers can use to learn more about what’s happening within their own environment. It should be seen as an indication of a real need to check what your site is actually doing in a client device, i.e. your customers’ machines. There is a real commercial risk if this goes unchecked because, as a controller (legally responsible for the site), your site may be acting as a proxy for something much wider in scope.
This post originally appeared on beacontrustnetwork.Substack