What can you track with web analytics without a cookie consent banner or privacy policy, while staying compliant with GDPR and CCPA?
The digital landscape is undergoing a fundamental transformation, driven by a confluence of regulatory pressure and shifting consumer expectations regarding data privacy. The era of ubiquitous, consentless tracking, epitomized by the third-party cookie, is rapidly drawing to a close. Regulations like the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) have established stringent requirements for how organizations collect and process user data, imposing significant penalties for non-compliance. This regulatory environment, coupled with a growing public awareness of "surveillance capitalism," has created a significant market opportunity for technologies that prioritize privacy by design.
This new economy demands a paradigm shift from data maximization to data minimization. For web analytics, this means moving away from tools that collect vast amounts of personally identifiable information towards solutions that provide actionable insights while respecting user anonymity. The value proposition of a modern analytics platform is no longer just the data it provides, but the legal confidence and trust it instills. A successful platform in this climate must function as a "Compliance-as-a-Service" offering, where the core product is not merely traffic metrics, but demonstrable legal peace of mind for website administrators.
This document serves as the foundational legal and architectural blueprint for a privacy-first, cookieless web analytics platform. It deconstructs the complex and often overlapping requirements of the GDPR, the ePrivacy Directive, and the CCPA to establish a unified, "highest-common-denominator" compliance model. This model is then translated into two distinct and technically specified operational modes: a completely anonymous mode that falls outside the scope of privacy law, and a richer, pseudonymised mode built upon the defensible legal basis of "Legitimate Interest." The final section provides a direct, actionable implementation plan intended for the engineering team, ensuring that the principles of privacy-by-design are embedded into the core of the product architecture.
To build a platform that offers its customers true legal confidence, it is not sufficient to comply with a single regulation. The architecture must be resilient to the strictest interpretations across all relevant legal frameworks. This section establishes the non-negotiable legal principles derived from GDPR, the ePrivacy Directive, and CCPA that will dictate the platform's design, creating a robust and defensible compliance posture.
The cornerstone of all data protection law is the definition of "personal data" or "personal information." The scope of this definition determines when the regulations apply. A conservative and comprehensive interpretation is essential for a product whose primary value is legal assurance.
The GDPR provides a famously expansive definition, stating that 'personal data' means any information relating to an identified or identifiable natural person. An individual is considered "identifiable" if they can be singled out, "directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier". Recital 30 names IP addresses and cookie identifiers as examples of such "online identifiers", so data points common in web analytics fall squarely within the definition.
The CCPA casts an even wider net. Its definition of "personal information" includes any information that "identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household". The CCPA explicitly lists identifiers such as IP addresses, geolocation data, and internet activity like browsing and search history. Crucially, it also includes "inferences drawn" from any other personal information to create a profile about a consumer's preferences and characteristics.
A critical concept underpinning these definitions is the "singling out" doctrine, articulated in Recital 26 of the GDPR. It clarifies that to determine whether a person is identifiable, one should account for "all the means reasonably likely to be used... to identify the natural person directly or indirectly". This means that data does not need to contain a name or email address to be considered personal. A combination of several seemingly innocuous data points can, in aggregate, become personal data if their combined uniqueness allows for an individual to be distinguished from a group. This is often referred to as the "mosaic effect." For instance, collecting a specific browser version, a precise screen resolution, a list of installed browser plugins, and a timezone can create a highly unique "device fingerprint". While each piece of information in isolation may be low-risk, their combination constitutes a powerful online identifier and is therefore personal data.
For the purpose of building a maximally safe analytics platform, the unified definition of personal data must be the most conservative one. The platform will treat any data point, or combination of data points, as "personal data" if it could be used to reasonably single out an individual, their specific device, or their household. This approach acknowledges that the primary legal risk often lies not in collecting a single problematic data point, but in collecting a set of data points whose cumulative entropy is high enough to create a unique or near-unique signature. This principle dictates that the platform's data collection policy cannot be a simple checklist of allowed fields; it must be a holistic assessment of the total information gathered per visitor event. Consequently, techniques like "binning" screen sizes into categories and "truncating" user-agent strings are not merely best practices but are legal necessities to reduce the identifiability of the overall dataset.
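The binning and truncation techniques named above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the breakpoint values and the function names `bin_screen_width` and `truncate_version` are assumptions chosen for the example, since the text does not specify cut-offs.

```python
import re

# Hypothetical cut-offs in CSS pixels; the document does not prescribe
# specific values, only that exact dimensions must not be kept.
BREAKPOINTS = [(576, "xs"), (768, "sm"), (992, "md"), (1200, "lg")]


def bin_screen_width(width: int) -> str:
    """Map an exact screen width to a coarse, low-entropy label."""
    for upper_bound, label in BREAKPOINTS:
        if width < upper_bound:
            return label
    return "xl"


def truncate_version(version: str) -> str:
    """Keep only the major component of a dotted version string."""
    match = re.match(r"\d+", version)
    return match.group(0) if match else "unknown"
```

With these helpers, thousands of distinct width values collapse into five labels, and a full version such as `108.0.5359.124` is reduced to `108`, which is what lowers the cumulative entropy of each stored event.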
While the GDPR governs the processing of personal data, the ePrivacy Directive (often called the "cookie law") governs a specific action: the storing of information on, or the gaining of access to information already stored in, a user's terminal equipment (e.g., computer or smartphone). This distinction is of paramount importance for any service aiming to operate without a consent banner.
Article 5(3) of the Directive establishes the core rule: this action is only allowed with the user's prior, informed consent. The rule is technology-neutral, applying not only to traditional HTTP cookies but to any equivalent technology, including localStorage, sessionStorage, tracking pixels, and scripts that actively probe the user's device for fingerprinting data.
For this specific action of accessing a user's device, the ePrivacy Directive's consent requirement takes precedence over the GDPR's more flexible lawful bases. An organization cannot rely on "Legitimate Interest" under the GDPR to justify setting a non-essential cookie or running a fingerprinting script; explicit, opt-in consent is the only valid legal basis. This is a frequent point of confusion, but it is a settled matter of law.
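The only way to legally bypass this consent requirement is to fall under one of two narrow exemptions. The action must be either (1) carried out for the sole purpose of transmitting a communication over an electronic communications network, or (2) strictly necessary to provide an information society service explicitly requested by the user.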
Regulators have consistently interpreted the "strictly necessary" exemption very narrowly. A classic example of a strictly necessary cookie is one that remembers the items in a user's shopping cart as they navigate an e-commerce site. Conversely, cookies and similar technologies used for analytics, audience measurement, and advertising are explicitly and repeatedly classified as not strictly necessary by data protection authorities (DPAs) like the UK's Information Commissioner's Office (ICO).
The critical takeaway is that the ePrivacy Directive is an action-based law, not a data-based one. Its rules are triggered by the act of accessing the user's device, regardless of whether the data collected is personal or anonymous. Even if the goal is to collect 100% anonymous data, a consent banner is legally required whenever the method involves storing non-essential information on, or reading it from, the user's device. This single principle dictates the entire client-side architecture of a "no banner" analytics tool. The tracking script must be a "zero-footprint" script: it cannot use cookies, localStorage, sessionStorage, or any other form of client-side storage for non-essential purposes. Furthermore, it cannot engage in active device fingerprinting, such as probing for installed fonts or using canvas rendering to generate an identifier. The only data it can permissibly collect without triggering ePrivacy consent are data points that the browser transmits automatically as part of a standard HTTP request (e.g., IP address, User-Agent header) or that are available via JavaScript APIs that do not access stored information (e.g., location.pathname, document.referrer).
While the ePrivacy Directive sets the strict EU-wide rule, the law allows for some national variation in its implementation. This has led a few national regulators to carve out specific, narrow exemptions for analytics tools that meet stringent privacy-preserving conditions. The most well-documented and influential of these comes from France's Commission Nationale de l'Informatique et des Libertés (CNIL).
The CNIL's guidance provides a limited exemption from the ePrivacy consent requirement for analytics trackers whose purpose is strictly limited to measuring the audience of a site or app on behalf of the publisher. This exemption is not a blanket pass for all analytics and is contingent upon meeting a cumulative list of strict conditions:
It is crucial to contrast this with the position of stricter DPAs. The UK's ICO, for example, offers no such analytics exemption in its current guidance. The ICO's position is that analytics is not "strictly necessary" for the user-requested service and therefore always requires consent.
The platform's strategy should be to leverage the CNIL model as the architectural foundation for its more advanced tracking mode. By engineering a mode that meticulously adheres to the CNIL's stringent conditions, the platform can offer its customers a legally defensible position in at least one major EU jurisdiction. In other jurisdictions, the same design can be presented as a reasonable, good-faith interpretation of the spirit of the law, particularly given the privacy-preserving measures implemented. However, customers must be told that this approach is not without risk, as a DPA in a country without such an explicit exemption could still take a stricter view.
Building on the legal principles established in the previous section, the platform's architecture will be bifurcated into two distinct operational modes. Each mode is designed to provide a specific level of legal assurance and data richness, translating abstract regulatory requirements into concrete product features and technical specifications.
This mode is designed to offer the highest possible level of legal certainty. Its objective is to operate outside the purview of data privacy regulations like GDPR and CCPA by ensuring that no personal data is ever logged or stored, with raw identifiers handled only transiently in memory.
Guiding Principle: Anonymised at the Point of Collection. To obviate the need for a privacy policy and any associated compliance obligations, the data collected must be rendered irreversibly anonymous. This is a higher standard than pseudonymisation. According to GDPR Recital 26, the regulation "does not... concern the processing of... anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable". For data to be truly anonymous, re-identification must be impossible by any reasonably likely means. Pseudonymisation, which involves replacing identifiers with a code, is insufficient because the data can still be attributed to a specific person with additional information; therefore, pseudonymised data remains personal data. Mode 1 must achieve true, irreversible anonymisation.
Technical Architecture: The architecture for Mode 1 is predicated on transient, in-memory processing and immediate data aggregation.
What CAN be tracked in Mode 1:
What CANNOT be tracked in Mode 1:
Legal Justification: The legal basis for this mode is straightforward: the data privacy laws do not apply. Because no personal data is ever stored or filed, the GDPR and CCPA are not triggered for the stored dataset. Furthermore, because the client-side script does not store information on or access information from the user's device, the consent requirement of the ePrivacy Directive is not engaged. Mode 1 is the digital equivalent of a simple, anonymous door counter at a physical store—it counts entries but knows nothing about the individuals passing through.
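The door-counter behaviour described above can be sketched as a simple in-memory aggregate. This is an illustrative sketch only: the hourly bucket, the choice of dimensions, and the names `pageviews` and `record_pageview` are assumptions, not part of the specification.

```python
from collections import Counter
from datetime import datetime, timezone

# One counter per (hour, pathname, country) bucket. No per-visitor row is
# ever created, so there is nothing to link back to an individual.
pageviews: Counter = Counter()


def record_pageview(pathname: str, country_code: str, when: datetime) -> None:
    """Increment the aggregate bucket for this event and keep nothing else."""
    # Truncate the timestamp to the hour (an assumed granularity) so that
    # exact arrival times are not retained.
    hour_bucket = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00Z")
    pageviews[(hour_bucket, pathname, country_code)] += 1
```

Because only counts are kept, two visits to the same page from the same country within the same hour are indistinguishable from one visitor loading the page twice, which is exactly the property Mode 1 relies on.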
This mode is designed for customers who require richer analytics, including the ability to count unique visitors, while still avoiding the need for a consent banner. It operates on a more complex legal footing, acknowledging the processing of personal data but justifying it under a specific legal basis.
Guiding Principle: Privacy-Preserving Pseudonymous Analytics. Mode 2 acknowledges that it processes personal data. However, it is architected to do so without requiring user consent. Instead, it relies on the "Legitimate Interest" lawful basis provided under GDPR Article 6(1)(f). This is only possible because the platform continues to adhere to the core principle of avoiding ePrivacy consent triggers—that is, still no cookies and no active device fingerprinting. The collection of personal data is limited to what is sent by the browser by default.
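The Legitimate Interests Assessment (LIA): A Non-Negotiable Prerequisite. The use of Legitimate Interest is not a free pass; it requires the data controller (the website owner using the platform) to conduct and document a Legitimate Interests Assessment (LIA). The platform must provide its customers with clear guidance and ideally a template for this assessment. The LIA is a three-part test: a purpose test (is a legitimate interest being pursued?), a necessity test (is the processing necessary for that purpose?), and a balancing test (do the individual's interests, rights, and freedoms override that interest?).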
Technical Architecture & Additional Data Points: Mode 2's architecture is carefully designed to align with the strict conditions of the CNIL analytics exemption, providing a strong, defensible position.
By implementing these measures, Mode 2 provides significantly more utility than Mode 1 while remaining within a defensible legal framework that does not require a consent banner. The onus is on the platform to provide the tools and on the customer to maintain a compliant privacy policy and honor opt-out requests.
This section provides the direct, actionable specifications for the engineering team. It translates the legal and architectural principles from the preceding sections into a concrete implementation plan, removing ambiguity and ensuring that the final product is compliant by design.
The following table serves as the single source of truth for data handling. It dictates how every potential data point must be processed, stored, and/or discarded in each of the platform's operational modes. Adherence to this matrix is critical for maintaining the legal integrity of the product.
Data Point | Raw Data Example | Mode 1 Implementation (The Anonymity Standard) | Mode 2 Implementation (The Legitimate Interest Standard) | Legal Rationale & Notes |
---|---|---|---|---|
IP Address | 198.51.100.1 | USE & DISCARD. Use transiently in-memory for country lookup, then immediately discard. Must never be logged or stored. | PSEUDONYMISE & DISCARD. Use transiently in-memory for geo-lookup and as an input for the daily salted hash. Must discard the raw IP immediately after use. | A raw IP address is an "online identifier" and constitutes personal data under both GDPR and CCPA. Discarding it is the cornerstone of Mode 1's anonymity. Hashing is a form of pseudonymisation, not anonymisation; the result is still personal data. |
User Agent | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36... | PARSE, AGGREGATE, DISCARD. Parse in-memory for Browser, OS, and Device Type families (e.g., 'Chrome', 'macOS', 'Desktop'). Store only these low-entropy strings. Discard the full UA string. | PARSE, TRUNCATE, STORE. Parse for families and major versions (e.g., 'Chrome 108', 'macOS 13'). Store these truncated strings. Discard the full UA string. | A full User-Agent string is a high-entropy vector for device fingerprinting and can contribute to "singling out" a user. Aggregation and truncation are essential data minimisation techniques required by GDPR and are aligned with the principles of the CNIL exemption model. |
Screen Dimensions | 1920x1080 | STRICTLY FORBIDDEN. The combination with other data points is too identifying. | BIN & STORE. Convert exact dimensions into predefined categorical bins based on common responsive design breakpoints (e.g., 'xs', 'sm', 'md', 'lg', 'xl'). Store only the bin label, not the raw numbers. | Reduces the entropy of the data to mitigate fingerprinting risk. Exact screen dimensions are a key component of unique device fingerprints. |
Geolocation | (Derived from IP) | COUNTRY ONLY. Store only the two-letter ISO country code (e.g., 'FR', 'DE'). | COUNTRY & REGION. Store the country code and the first-level administrative division (e.g., state, province, region). City-level data is too granular and high-risk. | Location data is explicitly personal data. Granularity significantly increases identifiability. The CNIL exemption allows up to postal code level, but for a global product, limiting to region is a more conservative and defensible posture. |
Referrer URL | https://www.google.com/search?q=analytics | DOMAIN ONLY. Extract and store only the effective top-level domain + 1 (e.g., google.com). The full path and all query parameters must be scrubbed. | PATH ONLY. Extract and store the origin and pathname (e.g., https://www.example.com/blog/article-name). All query parameters must be scrubbed. | Query parameters frequently contain PII or session identifiers, posing a significant compliance risk. Aggressive scrubbing is a critical data minimisation measure. |
Session ID | (from sessionStorage) | NOT RECOMMENDED. It provides little value in a fully anonymous context and adds unnecessary complexity. The goal is simplicity and absolute legal certainty. | PERMITTED. Generate and store a random string in sessionStorage. This identifier is ephemeral, scoped to the browser tab/session, and is automatically cleared by the browser. | sessionStorage does not persist across sessions, but it still stores information on the user's device, so the technology-neutral rule of ePrivacy Article 5(3) applies to it. Its consent-free use rests on the argument that it is "strictly necessary" for analyzing a single user journey within one session, an argument that regulators who classify analytics as non-essential may reject. |
Unique User ID | (cross-session) | STRICTLY FORBIDDEN. This is the clearest violation of the "no banner" principle. | Daily Salted Hash ONLY. Implement as HASH(IP + UA + SiteID + DailySalt). This is the only defensible method for consentless unique visitor counting. See section 2.2 for a full analysis of the risks. | Any persistent or stable identifier requires ePrivacy consent. The daily salted hash is a high-risk but marketable implementation of pseudonymisation that relies on its 24-hour lifespan as a key mitigating factor. |
Custom Event Properties | {'category': 'books', 'author': 'jane_doe'} | PERMITTED (with strict validation). The system must filter incoming properties against a deny-list of keys that are likely to contain PII (e.g., 'email', 'name', 'user_id', 'address'). The customer bears ultimate responsibility, but the platform must provide technical safeguards. | PERMITTED (with strict validation). Same requirements as Mode 1. The customer must be explicitly warned in the documentation and their privacy policy that they are responsible for not sending PII in custom properties. | Under GDPR, both the data controller (the customer) and the data processor (the platform) have compliance obligations. The processor must implement technical measures to help prevent data breaches and unauthorized processing. |
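Two rows of the matrix, referrer scrubbing and the custom-property deny-list, lend themselves to a short Python sketch. The function names are hypothetical and the deny-list holds only the four example keys from the table. Note that this sketch returns the full hostname for Mode 1; reducing it to the effective top-level domain + 1 (e.g., google.com) would additionally require the Public Suffix List, which is not shown here.

```python
from typing import Optional
from urllib.parse import urlsplit

# Example keys taken from the matrix; a production list would be longer.
DENIED_KEYS = {"email", "name", "user_id", "address"}


def referrer_host(referrer: str) -> Optional[str]:
    """Mode 1: keep only the hostname, dropping path, query and fragment."""
    try:
        return urlsplit(referrer).hostname or None
    except ValueError:  # malformed URL
        return None


def referrer_origin_and_path(referrer: str) -> Optional[str]:
    """Mode 2: keep scheme, host and path; drop query string and fragment.

    Using `hostname` rather than `netloc` also drops any embedded
    credentials (user:pass@host) and, as a simplification, the port.
    """
    try:
        parts = urlsplit(referrer)
    except ValueError:
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    return f"{parts.scheme}://{parts.hostname}{parts.path}"


def filter_properties(props: dict) -> dict:
    """Drop custom event properties whose key is on the deny-list."""
    return {k: v for k, v in props.items() if str(k).lower() not in DENIED_KEYS}
```

A key-based deny-list is only a safeguard: it cannot detect PII placed in an innocuously named property, which is why the matrix leaves ultimate responsibility with the customer.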
The following pseudocode outlines the server-side logic for handling an incoming analytics event. This logic must be executed for every request to ensure that raw personal data is handled transiently and that the correct data points are stored based on the customer's selected compliance mode.
Function handle_analytics_request(request, site_config):
    // 1. Extract raw data from request headers and body
    // x-forwarded-for may hold a comma-separated proxy chain: take the first (client)
    // entry, and trust the header only when it is set by the platform's own proxy.
    raw_ip = first_entry(request.headers['x-forwarded-for']) or request.remote_addr
    raw_ua = request.headers['user-agent']
    payload = request.body
    // 2. Initialize enriched data object with common, safe fields
    enriched_data = {
        site_id: site_config.id,
        timestamp: new Date().toISOString(),
        event_name: payload.event,
        pathname: payload.pathname,
        utm_source: payload.utm_source,
        //... other UTM fields
    }
    // 3. Perform transient, in-memory enrichment using raw data
    geo_data = geo_lookup_from_ip(raw_ip)
    parsed_ua = parse_user_agent(raw_ua)
    // 4. Apply mode-specific logic
    // raw_ip and raw_ua are used only in steps 3 and 4 and are never stored.
    if site_config.mode == 'MODE_2_LEGITIMATE_INTEREST':
        // --- Mode 2: Legitimate Interest Standard ---
        enriched_data.country = geo_data.country_code
        enriched_data.region = geo_data.region_code
        enriched_data.device_type = parsed_ua.device_type
        enriched_data.os_family = parsed_ua.os_family
        enriched_data.os_version_major = parsed_ua.os_major_version
        enriched_data.browser_family = parsed_ua.browser_family
        enriched_data.browser_version_major = parsed_ua.browser_major_version
        enriched_data.screen_bin = bin_screen_dimensions(payload.screen_width, payload.screen_height)
        enriched_data.referrer_path = extract_origin_and_path(payload.referrer)
        enriched_data.session_id = payload.session_id
        // Generate the pseudonymised daily visitor ID
        daily_salt = get_daily_salt() // CRITICAL: Fetches the secret, daily-rotated salt
        // Join with a separator so that field boundaries cannot be confused
        visitor_signature = join('|', [raw_ip, raw_ua, site_config.id, daily_salt])
        enriched_data.daily_visitor_id = sha256(visitor_signature)
    else: // Default to Mode 1 for maximum safety
        // --- Mode 1: Anonymity Standard ---
        enriched_data.country = geo_data.country_code
        enriched_data.device_type = parsed_ua.device_type
        enriched_data.os_family = parsed_ua.os_family
        enriched_data.browser_family = parsed_ua.browser_family
        enriched_data.referrer_domain = extract_domain(payload.referrer)
    // 5. Filter and store the final data object
    // raw_ip and raw_ua were never copied into enriched_data and are discarded when
    // this function returns. They must never be written to disk or any persistent log.
    final_data = filter_for_allowed_fields(enriched_data, site_config.mode)
    save_to_database(final_data)
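The hashing step in the Mode 2 branch can be made concrete with Python's standard library. This is a minimal sketch of HASH(IP + UA + SiteID + DailySalt) under one assumption that is not in the matrix: the fields are joined with a separator character instead of being concatenated directly, so that two different (IP, UA) pairs cannot produce the same input string. The function name `daily_visitor_id` is illustrative.

```python
import hashlib


def daily_visitor_id(ip: str, user_agent: str, site_id: str, daily_salt: str) -> str:
    """Return a hex SHA-256 digest that is stable only while the salt is unchanged."""
    # The ASCII unit separator (0x1f) is an assumed delimiter; it keeps the
    # boundaries between fields unambiguous.
    material = "\x1f".join([ip, user_agent, site_id, daily_salt])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

The same visitor on the same site yields the same identifier until the salt rotates; after rotation the identifier changes and cannot be linked to the previous day's value without the discarded salt. The output remains pseudonymised personal data, as the matrix notes.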
Daily Salt Management: The security and integrity of the daily salted hash mechanism depend entirely on the proper management of the salt. This process must be automated, secure, and auditable.
import secrets

# `cache` (e.g., a Redis or Memcached client) and `log_critical_alert` are
# assumed to be provided by the surrounding application.

# This function should be executed by a secure, isolated, and reliable
# scheduler (e.g., cron job) at 00:00:00 UTC every day.
def rotate_daily_salt():
    # Generate a new, cryptographically secure random string (64 hex characters).
    new_salt = secrets.token_hex(32)
    # Store the new salt in a fast-access, volatile cache (e.g., Redis, Memcached).
    # The Time-To-Live (TTL) ensures it is automatically purged after 24 hours
    # even if the rotation job fails on the next day.
    cache.set('analytics_daily_salt', new_salt, ttl=86400)  # 86400 seconds = 24 hours
    # CRITICAL: The old salt is now overwritten or has expired. There is no
    # history of past salts, making retrospective re-identification of visitors
    # from previous days computationally infeasible.
    return new_salt

# This function is called by the request handler.
def get_daily_salt():
    salt = cache.get('analytics_daily_salt')
    if not salt:
        # This is a fallback mechanism in case the cache is empty or has been cleared.
        # It should trigger a high-priority alert to system administrators, as it
        # indicates a potential issue with the scheduled rotation job.
        log_critical_alert("Daily salt not found in cache. Generating emergency salt.")
        # Return the freshly generated salt directly instead of re-reading the cache.
        return rotate_daily_salt()
    return salt
Template Privacy Policy Clause for Mode 2 Customers: Customers using Mode 2 must update their privacy policy to inform users about the data processing. Providing a clear, legally sound template is a crucial part of the service.
Website Analytics:
To understand how visitors interact with our website and to continuously improve our service, we use [Your Analytics Platform Name], a privacy-focused web analytics service. [Your Analytics Platform Name] provides us with aggregated statistical data and is designed to respect your privacy.
We process this information on the legal basis of our Legitimate Interest (as per Article 6(1)(f) of the GDPR) to monitor and enhance our website and services. We have conducted a Legitimate Interest Assessment and have concluded that our interest in this processing is not overridden by your rights and freedoms, particularly given the privacy-preserving measures we have taken.
The data we process includes:
We do not use cookies for this purpose and do not collect any data that directly identifies you, such as your name or email address.
Your Right to Object: You have the right to object to this processing. You can exercise this right by enabling the "Do Not Track" (DNT) setting in your browser, which our analytics service respects, or by visiting our opt-out page here: [Link to Customer's Opt-Out Page].
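For the platform to respect the "Do Not Track" setting promised in the clause above, the request handler would need to check the `DNT` request header before recording anything. A minimal server-side sketch follows; the function name `should_record` and the plain-dict representation of headers are assumptions made for illustration.

```python
def should_record(headers: dict) -> bool:
    """Return False when the request carries the header `DNT: 1`."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalized = {str(k).lower(): str(v).strip() for k, v in headers.items()}
    return normalized.get("dnt") != "1"
```

A request for which `should_record` returns False would be dropped before any enrichment or storage takes place.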
This report has established a dual-mode architectural framework designed to provide a compliant, cookieless web analytics service that meets the market's demand for privacy-first technologies. By systematically analyzing the GDPR, ePrivacy Directive, and CCPA, and adopting a conservative "highest-common-denominator" approach, the platform can offer its customers two distinct levels of legal assurance. Mode 1 achieves true anonymity, placing it outside the scope of privacy law, while Mode 2 leverages the Legitimate Interest legal basis to provide richer insights through carefully controlled pseudonymisation.
The core of this framework is a deep understanding that legal compliance in this space is not about a checklist of data points, but about a holistic assessment of the processing methods and the cumulative identifiability of the data collected. The primacy of the ePrivacy Directive dictates a "zero-footprint" client-side script, while the principles of data minimization and the "singling out" doctrine from GDPR guide the necessary server-side aggregation and truncation.
To ensure the long-term success and legal integrity of the platform, the following recommendations should be adopted: