A Marketer’s Guide to Data Tracking: What’s Really Going on Within Your Website?

MyData Canada
Aug 25, 2021

By Hessie Jones

I never thought I could write this article, mainly because I don’t have the technical chops to build credibility for this work. But in recent weeks I have come to understand that, as a marketer and someone who is attuned to data privacy and, more particularly, the resulting harms, I needed to go beyond understanding what was being done and learn how it was being done.

I was fortunate enough to meet Allen Woods through a few privacy colleagues. Allen served 24 years in uniform with the British Army and a further 25 years working for the UK Ministry of Defence on IT-related matters, mainly compliance issues in UK Defence Supply Chain and Logistics IT. His relevant experience and keen understanding of evolving privacy legislation made him an ideal teacher. I would bet that marketers, especially those who have been doing digital media and advertising for years, have no idea of the extent to which code has become the pervasive vehicle that brought surveillance capitalism to the state we see today.

So, I took some time to learn. This is a glimpse of what is happening on a common website.

Some definitions to know and why each is significant:

AJAX: Short for “Asynchronous JavaScript and XML”. With AJAX, a web application can send and receive data in the background without “interfering with the display behaviour of the existing page”, thereby changing content dynamically without needing the page to reload each time.

  • Why this is important: AJAX has simply made it easier to send and receive information between browsers and host servers, improving the end user experience and minimizing disruption. Facebook and Twitter rely on tech like AJAX to keep web pages up to date (FB Likes, RTs, timestamps).
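To make the idea concrete, here is a minimal sketch of a background request, not the code of any particular site. The function name `postInBackground`, the `example.com` address and the data being sent are all made up for illustration; `XMLHttpRequest` itself is the standard browser object defined further down.

```javascript
// Hypothetical helper: send data in the background without reloading the page.
// XHR defaults to the browser's XMLHttpRequest. It is a parameter only so the
// sketch can also be tried outside a browser with a stand-in class.
function postInBackground(url, data, XHR = globalThis.XMLHttpRequest) {
  const request = new XHR();
  // The third argument `true` makes the request asynchronous.
  request.open("POST", url, true);
  request.setRequestHeader("Content-Type", "application/json");
  // Runs later, when the server answers. The page itself never reloads.
  request.onload = () => {
    console.log("Server replied with status", request.status);
  };
  request.send(JSON.stringify(data));
  return request;
}

// In a browser this might be called when someone clicks a "like" button:
// postInBackground("https://example.com/likes", { postId: 42, action: "like" });
```

Nothing visible happens on the page when this runs, which is what makes the technique convenient for both useful features and quiet data collection.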

JavaScript (.js): JavaScript is a multi-paradigm programming language and a core technology of the World Wide Web. Over 97% of websites use it client-side for web page behavior, often through third-party libraries. All major web browsers have a dedicated JavaScript engine to execute the code on the user’s device.

As a multi-paradigm language, JavaScript supports event-driven (tied to user actions), functional (logic), and imperative (how a computer operates) programming styles.

Supporting event-driven programming means that the program flow is determined by events such as user actions (mouse clicks, key presses) or messages from other programs or threads. The code is centered on performing certain actions in response to user input.

  • Why this is important: JavaScript operates in your browser as part of page rendering. All browsers support JavaScript, and code is often written to deal with the different capabilities each browser may have.
  • A simple bit of embedded JavaScript is all that’s needed to record any kind of activity on a webpage — even if you don’t actually submit anything!
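As an illustration of how little code that takes, here is a minimal sketch. The function name `watchActivity` and the `report` callback are hypothetical; `addEventListener` and the event names are standard browser features. In a real page the first argument would be `document`.

```javascript
// Hypothetical sketch: record page activity as it happens, with no form submit.
// `target` would be `document` in a browser. `report` is whatever the script
// author decides to do with each observation (for example, send it elsewhere).
function watchActivity(target, report) {
  const eventTypes = ["mousemove", "click", "keydown", "scroll"];
  for (const type of eventTypes) {
    target.addEventListener(type, (event) => {
      report({
        type: event.type,
        key: event.key,     // which key was pressed (keydown events only)
        x: event.clientX,   // pointer position (mouse events only)
        y: event.clientY,
        at: Date.now(),     // when it happened
      });
    });
  }
}

// In a browser: watchActivity(document, (entry) => console.log(entry));
```

Here `report` only logs to the console, but it could just as easily pass each entry to a background request.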

XMLHttpRequest: XMLHttpRequest (XHR) is an API in the form of an object, which contains both fields (in the form of attributes or properties) and code (in the form of procedures). The procedures transfer data between the web browser and the web server. The object is provided by the browser’s JavaScript environment, and data retrieved via XHR can be used to keep modifying a loaded web page.

  • Why this is important: The W3C treats the following as Unsanctioned Web Tracking, all of which HTTP requests can passively expose: 1) browser fingerprinting, 2) persistent “super cookies”, correlated with other techniques to re-identify you, and 3) request headers that may include IP information, browser, version and OS.
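The fingerprinting point can be sketched in a few lines. This is a simplified illustration under my own assumptions, not how any specific tracker works: the function name `describeDevice` is made up, while `navigator.userAgent`, `navigator.language`, `screen.width` and `screen.height` are standard browser properties that any script can read without asking.

```javascript
// Hypothetical sketch: combine details a script can read without permission.
// In a browser, `nav` would be `navigator` and `scr` would be `screen`.
// They are parameters so the idea can be tried outside a browser too.
function describeDevice(nav, scr) {
  const traits = [
    nav.userAgent,                  // browser, version and operating system
    nav.language,                   // preferred language
    scr.width + "x" + scr.height,   // screen size
    new Date().getTimezoneOffset(), // minutes from UTC, a rough location hint
  ];
  // Each trait is harmless alone. Joined together they narrow down the device.
  return traits.join("|");
}

// In a browser: describeDevice(navigator, screen);
```

The IP address does not even need code like this, since the server sees it on every request it receives.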

jQuery: “jQuery is a fast, small, and feature-rich JavaScript library. It makes things like HTML document traversal and manipulation, event handling, animation, and AJAX much simpler with an easy-to-use API that works across a multitude of browsers. With a combination of versatility and extensibility, jQuery has changed the way that millions of people write JavaScript.”

  • Why this is important: jQuery is popular because plugins are readily available for building websites or web apps. Its “Write less, do more” tagline means you get more done with fewer lines of code.
  • jQuery powers 76% of the top 1MM websites and an estimated 41MM sites in the US. This has built a huge dependency, with jQuery accounting for over 65% of JavaScript library usage.

Cookies: Cookies are blocks of data that are created by a web server while a user is browsing a website and put on the user’s device or computer by the web server. Each cookie has a name and a value that is stored on the user’s device.

  • Why this is important: Cookies are state indicators, markers that contain small amounts of information stored on your device, that can be recognized by a web site, or other web sites that know the cookie is there and can act on the data contained in them. Examples include username, password, website preferences.
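For a sense of what that name-and-value structure looks like to a script, here is a minimal sketch. In a browser, the cookies a page can see are available as one string in `document.cookie`; the function name `parseCookies` and the sample cookie names are made up for illustration.

```javascript
// Turn a cookie string such as document.cookie ("name=value; name2=value2")
// into a plain object of names and values.
function parseCookies(cookieString) {
  const cookies = {};
  for (const pair of cookieString.split(";")) {
    const separator = pair.indexOf("=");
    if (separator === -1) continue; // skip empty or malformed pieces
    const name = pair.slice(0, separator).trim();
    const rawValue = pair.slice(separator + 1).trim();
    try {
      // Values are often URL-encoded, so try to decode them.
      cookies[name] = decodeURIComponent(rawValue);
    } catch (error) {
      cookies[name] = rawValue; // keep the raw text if decoding fails
    }
  }
  return cookies;
}

// Sample input with made-up names:
const sample = parseCookies("visitor_id=abc123; theme=dark");
console.log(sample.visitor_id); // prints "abc123"
```

Any script running on the page, including third-party script, can read the same string unless the cookie was set with protections such as `HttpOnly`.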

These definitions matter up front because they help explain how data collection reached its current state, one that seems to fly in the face of the EU GDPR, CCPA, PIPEDA and other privacy regulations. What’s become increasingly clear to me is the massive movement and exchange of data across the web without the consent or knowledge of the end user or the site controller (who is ultimately responsible for the site), let alone of the web or application developers who often promote the use or creation of this seemingly innocuous code. You, as a website owner, are by default collecting data as a proxy for someone or something.

Allen Woods has taken me through some developer code to further understand what is happening. I’ve chosen a popular clothing website, which shall remain nameless, to demonstrate this process.

Understand what’s happening on your website

In order to begin to understand what’s going on in a website, it makes sense to understand what to look for. To display what’s on the page there is HTML but increasingly people are calling code components from just about anywhere.

Most people do not check what code goes into the client device at the point at which the page is delivered. So let’s demonstrate this.

Here’s a video to get you started: Opening Up Developer Code for Any Website.

I’ve picked a code snapshot from a popular traffic analyzer. On that website, the code being requested is quite complex and contains several thousand lines of JavaScript, either referenced inside the HTML or delivered as separate modules. The code presented here is just a few hundred lines that have also been compressed, or “obfuscated”, which makes it difficult to read even for seasoned coders.

Viewed in the page with a browser’s “view source” option, the call to this code amounts to just six lines of HTML. The line count may vary depending on the component; nevertheless, the code insertion is relatively tiny.
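To show why such a small insertion can pull in so much, here is a hypothetical loader written as a sketch. It is not the snippet from the traffic analyzer discussed here; the function name `loadAnalytics` and the `analytics.example.net` address are placeholders. `createElement` and `appendChild` are standard browser calls.

```javascript
// Hypothetical loader: roughly what a pasted tracking snippet does.
// `doc` would be `document` in a browser. The URL is a placeholder.
function loadAnalytics(doc, scriptUrl) {
  const tag = doc.createElement("script");
  tag.async = true;     // do not hold up the rest of the page
  tag.src = scriptUrl;  // the real code lives on someone else's server
  doc.head.appendChild(tag);
  return tag;
}

// In a browser:
// loadAnalytics(document, "https://analytics.example.net/tag.js");
```

The page only ever contains these few lines. Whatever sits at the other end of the address is fetched and run in the visitor’s browser, and its owner can change it at any time.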

To understand what’s really happening, I opened up an API file associated with a popular traffic analyzer. What I found were nearly 18,000 lines of code that did not appear in the original file above but appeared in my browser, on my device, because they had been requested by the traffic analyzer.

What does this mean? This is one of the more popular site visitor analysis toolkits, in use by an estimated 29MM sites worldwide, and each page is used to transmit a visitor device profile back to the code owners.

The website owners get a summary of their specific visitor data but the code owners are able to leverage ALL of the data collected by all 29MM sites on an ongoing basis. This massive collection of data intelligence continues, unabated.

For website visitors, nearly 18,000 lines of code are dropped onto their device by this third-party site for every user interaction. Most of the time this happens without their knowledge or permission.

JavaScript is one of the more commonly used languages in web development today. There are others, with each core language supported by developers worldwide who write their own “add-ins”, or libraries, that can be shared. Many of them use techniques like AJAX as part of normal operation, which is part of the problem, because the language and libraries can be used for both good and bad. The result is code dropped onto the user’s device by third-party servers that can capture every keystroke, data input, navigation behaviour, location and so on, all elicited from the end user without their knowledge or permission.

If you know how to look and what to look for, you can download and inspect the code for any of the components your web site may use. I pasted some of the code from one of the more popular analytics engines that has been dropped into my browser, on my computer (Note: without consent), into a standard text editor and used its search facilities to take a quick look at some key words or phrases used in JavaScript.

It should be noted that such code is usually requested by a web site from another computer somewhere on the web. That raises an issue of control, because those who own the code can and do make changes to it, to their benefit and at risk to the end users.

HTTPRequests:

Remember, these requests contain references to the objects (page elements and the command codes) that make up a web page and can be made to do things like transfer the data between the website and the server requesting it. The code below is placed into the browser memory cache and then it becomes a live program that is executed based on the page events like the user clicking or entering information.

There are a number of keyword phrases to look out for when looking at code. One of them is “HTTPRequest”, which is one of the indicators that AJAX is being used as a coding technique to transmit something.

Through some simple searches I pulled up 7 HttpRequests in total.

Next I did a search for the word “cookies”.

I found 67 instances of “cookies”.

What this means:

You will see keywords like “mouse”, “document” (which refers to the page) and “window.location”, all of which are a means by which coders can detect things like device and browser capabilities for each end user.
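The text-editor searches described above can also be done with a short script. This is a minimal sketch, assuming you have already copied the code into a string; the function name `countKeywords` and the sample code line are made up for illustration.

```javascript
// Count how often each keyword appears in a piece of code, ignoring case.
function countKeywords(source, keywords) {
  const counts = {};
  const haystack = source.toLowerCase();
  for (const keyword of keywords) {
    const needle = keyword.toLowerCase();
    let count = 0;
    if (needle.length > 0) {
      let position = haystack.indexOf(needle);
      while (position !== -1) {
        count += 1;
        // Continue searching just past the match we found.
        position = haystack.indexOf(needle, position + needle.length);
      }
    }
    counts[keyword] = count;
  }
  return counts;
}

// A made-up line of compressed code to search through:
const code = 'var r=new XMLHttpRequest();document.cookie="id=1";var u=window.location.href;var c=document.cookie;';
console.log(countKeywords(code, ["HttpRequest", "cookie", "window.location", "mouse"]));
// prints { HttpRequest: 1, cookie: 2, 'window.location': 1, mouse: 0 }
```

A count on its own proves nothing about intent, but it tells you where to start reading and which questions to ask your developers or vendors.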

All of this code is dropped into an end user browser by the popular web analyzer tool without the explicit knowledge of a site controller. Depending on the number of capabilities of components on the web page, the delivery of a single web page may mean tens of thousands of lines of code will be dropped on the client side, ALL of which site controllers will have no control over. The code may then be modified by the original developers for any number of reasons and may contain capabilities other than the stated original purpose.

It should be noted that these components also have licensing terms and conditions that will make use of the term “indemnification”, which Shoshana Zuboff referred to as “sadistic” in her book, The Age of Surveillance Capitalism. On a typical Terms and Conditions page, indemnification from this web analyzer tool may look like this:

“To the extent permitted by applicable law, You will indemnify, hold harmless and defend [Company] and its wholly-owned subsidiaries, at Your expense, from any and all third-party claims, actions, proceedings, and suits brought against [Company] or any of its officers, directors, employees, agents or affiliates, and all related liabilities, damages, settlements, penalties, fines, costs or expenses (including, reasonable attorneys’ fees and other litigation expenses) incurred by [Company] or any of its officers, directors, employees, agents or affiliates, arising out of or relating to (i) Your breach of any term or condition of this Agreement…”

Why do marketers need to understand this?

Those who manage a website to run communication or acquisition campaigns need to understand the basics of carrying out a simple code review. First, legally speaking, the “controller” is the person responsible for how the site functions. Second, “open source” does not mean “free of responsibility”: each site request (to the data processor) comes with its own terms and conditions by which controllers are legally bound. And as we’ve seen, often unbeknownst to the controller, the code that is dropped into the user device more often than not comes not from the site server but from the third-party data processor, typically through standard .js calls in the page.

It should be pointed out that this simple exercise does not require coding knowledge. It requires only knowing the kinds of keywords in a computer language to look for as clues that something is happening that you may initially be unaware of.
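One such check is listing which scripts a page pulls in from servers other than your own. Here is a minimal sketch, assuming you have the page source as text (for example, copied from “view source”). The function name `findThirdPartyScripts` and the `example` addresses are made up, and a simple pattern match like this will miss scripts that are added later by other scripts.

```javascript
// List script addresses in a page's HTML that point to a different host.
function findThirdPartyScripts(html, siteHost) {
  // Matches <script ... src="..."> and captures the address.
  const pattern = /<script[^>]*\ssrc\s*=\s*["']([^"']+)["']/gi;
  const external = [];
  for (const match of html.matchAll(pattern)) {
    // Relative addresses resolve against the site's own host.
    const url = new URL(match[1], "https://" + siteHost);
    if (url.hostname !== siteHost) {
      external.push(url.href);
    }
  }
  return external;
}

// Made-up page source with one local and one external script:
const page =
  '<script src="/js/site.js"></script>' +
  '<script async src="https://analytics.example.net/tag.js"></script>';
console.log(findThirdPartyScripts(page, "shop.example"));
// prints [ 'https://analytics.example.net/tag.js' ]
```

The comparison is deliberately strict, so a subdomain of your own site would also be listed. Each address in the result is a place to ask who owns the code, what it collects and under which terms.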

What are the implications on end user information and privacy?

  • This seamless exchange of information between client-side and server environments, which AJAX has enabled, has made data collection from a dynamic and interactive user environment more commonplace.
  • There is shape and form to web pages that can all be manipulated by coders who have the skill and knowledge to do so.
  • Through seemingly innocent pieces of JavaScript, web scrolling, mouse movements and keystrokes can be tracked and recorded without your knowledge or consent.
  • HttpRequests passively expose your identity through your IP, persistent cookies, your browser and its version, login preferences, location, etc.
  • Any third-party code library written in JavaScript should be assumed, until checked, to be capable of either transmitting data or storing data on the user device. In all such instances, the data gathering goes back to wherever the host machine for the code resides.
  • Veiled as optimizing user experience, what is actually going on is a massive market for data gathering right under the noses of site owners and their visitors.
  • What I have found (and I am still learning) is that website construction is many-layered. It is more than just what you see on a screen; what happens in end user devices also needs to be understood and properly managed.

Caveat: The information contained within is just one pathfinder (of many) which marketers can use to learn more about what’s happening within their own environment. It should be seen as an indication of a real need to check what your site is actually doing in a client device, i.e. your customers’ machines. There is a real commercial risk if this goes unchecked because, as a controller (legally responsible for the site), your site may be acting as a proxy for something much wider in scope.

This post originally appeared on beacontrustnetwork.Substack
