Testing is a critical activity for optimizing any site and an important part of an overall data-driven website strategy. When it comes to disclosing web analytics tools in a site’s privacy policy or terms of service, I lump testing tools together with clickstream, voice-of-customer, and audience paneling tools. My sites’ privacy policies typically describe them collectively as third-party tools that collect information about the user.
Is there a privacy concern here? No, because the scope of the data collected is clearly defined – the data collected by testing tools is far less detailed than the data collected by clickstream tools or contained in transactional records.
Is there a user experience/user sentiment concern here? Possibly. Although the user experience varies widely by platform and by a site’s customization features, by the very nature of a test some visitors are getting a sub-optimal experience. Should a policy of transparency include informing site visitors that these tools are in use, or even that they are participating as panelists in an experiment, and could that head off any negative sentiment the testing creates?
This all leads to my question:
Do you explicitly inform your sites’ visitors of any A/B or multivariate tests/tools that are deployed on your sites?
- No.
- Yes, we cover all testing generically in our TOS/Privacy Policy.
- Yes, we mention specific tools and/or tests in our TOS/Privacy Policy.
- Yes, pages that are part of test treatments inform visitors of the existence and nature of the test.
I’d love to hear about any specific experiences related to testing and disclosure in the comments!
On May 25th, Google announced the availability of a browser add-on for Internet Explorer, Chrome, and Firefox that prevents a user’s browser from reporting site usage data to Google Analytics. In effect, the add-on stops visit and visitor information from being reported for any site that uses Google Analytics to collect clickstream data.
Although I have years of experience implementing and using other web analytics tools, today I use Google Analytics on nearly every site I manage. It has become the de facto standard web analytics tool for content and small e-commerce sites for a reason: it is easy to implement, has enterprise-grade features and a large user base, and it is FREE. Here are the reasons why I’m not freaking out about a potential loss of visitor data from this tool:
It Isn’t about the Individual Visit
The power of clickstream analytics tools, like Google Analytics, comes from deriving actionable insights by exploring aggregate site traffic across discrete time periods and specific traffic segments. You simply aren’t going to get much actionable insight from looking at one person’s visit to your site, or even from tracking one person’s visits over a longer time period. In fact, Google Analytics’ terms of service explicitly forbid implementing it in a way that can uniquely identify individual visitors.
Aggregate Data Is More about Precision than Accuracy
Here is a thought exercise: what if Google Analytics, or some other clickstream analytics tool, is delivering actionable insights that boost your site’s conversion rate but is only collecting data from about 95-99% of your site’s visitors? The missing 1-5% isn’t a big deal, as long as you can reasonably assume that it behaves like the other 95-99% of your visitors.
Because JavaScript sometimes fails to load, the tracking call back to Google sometimes doesn’t fire quickly enough, and some visitors already use ad-blocking and privacy tools, I generally assume that I am CURRENTLY missing about 1-5% of my sites’ pageviews. Moreover, if you have a site with a large amount of traffic (millions of pageviews per month), Google Analytics suggests estimating traffic data from a sample of your site’s traffic to speed up the processing of your reports.
Back to the thought exercise: I expect adoption of this plug-in to be somewhere in the neighborhood of 1-5% of all users. Is your traffic data fatally flawed if you are missing 2-10% of your pageviews? What about 20%?
Unless the people who install the plug-in behave differently (as a group) from those who do not, Google Analytics will become somewhat less accurate with no loss in precision. In the context of most sites’ objectives, there will be no reason to question the validity of the conclusions drawn from Google Analytics unless the plug-in is widely adopted. This is because actionable site optimization metrics are based on rates (conversion rate, funnel exit rate) rather than on absolute numbers.
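The rates-versus-counts argument can be checked with a toy example. This is a minimal sketch with made-up numbers (10,000 visits, a 3% conversion rate, and 5% of traffic hidden by the opt-out), not real Google Analytics data; the key assumption, as above, is that the hidden visits convert at the same rate as the tracked ones.

```python
from typing import List


def conversion_rate(visits: List[bool]) -> float:
    """Share of visits that ended in a conversion (True = converted)."""
    return sum(visits) / len(visits)


# Hypothetical traffic: 10,000 visits grouped in blocks of 100; the first
# 3 visits in every block convert, for an overall 3% conversion rate.
all_visits = [i % 100 < 3 for i in range(10_000)]

# Hypothetical opt-out: every 20th block (5% of visits) is never reported.
# Each dropped block converts at 3%, just like the visits that remain.
tracked = [v for i, v in enumerate(all_visits) if (i // 100) % 20 != 0]

print(len(all_visits), sum(all_visits), conversion_rate(all_visits))  # 10000 300 0.03
print(len(tracked), sum(tracked), conversion_rate(tracked))  # 9500 285 0.03
```

Visits and conversions both fall by 5%, but the conversion rate does not move. If the hidden block had held a disproportionate share of converting visits, the tracked rate would shift, which is exactly the case where opt-out users behave differently as a group.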
Clickstream is Only Part of the Puzzle
There is an ever-increasing amount of information that is being generated by people interacting with your brand online. On your site, there is the potential to collect transactional data, direct voice-of-customer data, site testing data, contact us form data, etc., that is typically integrated with, but discrete from, Google Analytics. Off site, there are interactions with your brand on social media, email marketing activities, and any offline interactions that may also be generating data. It isn’t that your clickstream data isn’t important – it is just that there are other sources of data that may prompt action on the part of the analyst.
Allowing the Opt-out Is the Right Thing to Do
As site owners, we should never lose sight of our objectives. There is a reason why our sites exist (sell something, provide information, display advertising) that is fundamentally more important than how we measure and improve our sites’ ability to achieve those goals. Perhaps unfairly, some people’s concerns over privacy will cause them to block a tool that is likely being used to understand and improve their experiences, but we should respect their wishes and accept this as a new browsing paradigm in an environment with many other evolving browsing paradigms.
There Are Alternatives to Google Analytics
Of course, there are other web analytics packages out there if Google Analytics is no longer getting the job done. It is pretty standard for “enterprise-level” web analytics solutions to include a clickstream tool, a CRM tool, a data warehousing tool, a testing and optimization tool, a social media monitoring and engagement tool, etc., along with their “enterprise-level” cost and implementation difficulty. There are other free tools out there with the features that you would expect from a free tool. Google also sells Urchin, which doesn’t rely on JavaScript to collect data but instead uses server log files as its primary data source.
In summary, I don’t think that there will be widespread adoption of the Google Analytics opt-out. Even if there is, it won’t totally strip away the value of the tool and there are other clickstream analytics tools out there (as well as other sources of web analytics data).
By combining functions in Excel and SQL that return text from specific positions within a string with functions that find the position of the “@” character in each email address, you can easily extract domain names from lists of email addresses.
EXCEL:
The base function for this is RIGHT. RIGHT takes two arguments: text, the text being parsed, and num_chars, the number of characters to return, counted from the end of the string. In Excel, RIGHT takes the form RIGHT(text,[num_chars]).
The text argument is obvious; it is the text of the email address to be parsed.
The num_chars argument is determined using a combination of two other functions, LEN and FIND. We use LEN to determine the length of the entire email address and subtract the position of the @ character, determined using FIND. The resulting difference is the length of the domain portion of the email address.
LEN(text) returns the number of characters in the string.
FIND(find_text,within_text,[start_num]) returns the position of find_text within within_text, counting from 1. start_num, which is optional and which we won’t use here, specifies the character at which to start the search.
To put this all together, let’s put my email address in cell A1, place our derived formula in cell B1, and work out the result.
=RIGHT(A1,LEN(A1)-FIND("@",A1)) calculates to:
=RIGHT(A1,21-4), then to:
=RIGHT(A1,17), which returns:
tomsanalytics.com
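To check the formula’s logic outside a spreadsheet, here is a minimal Python sketch that mimics the three worksheet functions. The helpers excel_len, excel_find, and excel_right are hypothetical stand-ins written for this example, and user@example.com is a made-up address. The detail they reproduce is that FIND counts positions from 1, not 0, which is why LEN minus FIND is exactly the domain length.

```python
def excel_len(text: str) -> int:
    """Stand-in for Excel's LEN: the number of characters in text."""
    return len(text)


def excel_find(find_text: str, within_text: str, start_num: int = 1) -> int:
    """Stand-in for Excel's FIND: case-sensitive, 1-based position.

    Excel shows #VALUE! when find_text is not found; this sketch raises
    ValueError instead.
    """
    index = within_text.find(find_text, start_num - 1)
    if index == -1:
        raise ValueError("#VALUE!")
    return index + 1  # Python counts from 0, Excel counts from 1


def excel_right(text: str, num_chars: int = 1) -> str:
    """Stand-in for Excel's RIGHT: the last num_chars characters of text."""
    if num_chars < 0:
        raise ValueError("#VALUE!")
    # max() keeps a too-large num_chars from wrapping around; Excel
    # simply returns the whole string in that case.
    return text[max(0, len(text) - num_chars):]


def domain_from_email(email: str) -> str:
    # Mirrors =RIGHT(A1,LEN(A1)-FIND("@",A1))
    return excel_right(email, excel_len(email) - excel_find("@", email))


print(domain_from_email("user@example.com"))  # example.com
```

For user@example.com, LEN returns 16 and FIND returns 5, so RIGHT keeps the last 11 characters, following the same steps as the worked example.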

SQL:
SQL is a little bit trickier: rather than selecting text from the right, we are going to use a left-to-right function, SUBSTRING, even though SQL Server does offer a RIGHT function as well.
SUBSTRING returns a portion of a string based on the arguments passed to it. SUBSTRING takes three arguments: value_expression, the text being parsed; start_expression, the position of the first character to return; and length_expression, the number of characters to return, starting at start_expression. In SQL, the function looks like this: SUBSTRING(value_expression,start_expression,length_expression).
In this case, our email address is the value_expression, the position of the character following the “@” symbol is the start_expression, and the length of the remaining string is the length_expression.
To determine start_expression, we use another function, CHARINDEX, which works much like FIND in Excel and returns the position of the “@” counting from 1 (or 0 if it is not found). Using SQL’s version of LEN and the same math, we can determine length_expression.
Putting it all together, let’s assume a table named email_addresses with a column named email:
SELECT SUBSTRING(email, CHARINDEX('@', email) + 1, LEN(email) - CHARINDEX('@', email)) AS domain_name
FROM email_addresses
WHERE email LIKE '%@%'
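The WHERE clause filters out malformed addresses that contain no “@”. For those, CHARINDEX returns 0 rather than failing, so without the filter the query would return the whole address as if it were a domain name.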
The math works the same way as in Excel, except that because SUBSTRING works from the left, we add one to the position returned by CHARINDEX so that the returned text starts just after the “@”.
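The same logic can be traced in Python. This is a hedged sketch rather than a SQL Server reference: charindex, sql_len, substring, and domain_name are hypothetical helpers that imitate three documented T-SQL behaviors: 1-based positions, CHARINDEX returning 0 when the search string is missing, and LEN ignoring trailing spaces. Collation-dependent case sensitivity is not modeled, and the example addresses are made up.

```python
from typing import Optional


def charindex(expression_to_find: str, expression_to_search: str) -> int:
    """Stand-in for CHARINDEX: 1-based position of the first match, or 0.

    Real CHARINDEX follows the column's collation (often case-insensitive);
    this sketch always compares case-sensitively.
    """
    return expression_to_search.find(expression_to_find) + 1  # -1 becomes 0


def sql_len(value: str) -> int:
    """Stand-in for LEN, which does not count trailing spaces."""
    return len(value.rstrip(" "))


def substring(value: str, start: int, length: int) -> str:
    """Stand-in for SUBSTRING with a 1-based start position."""
    if length < 0:
        raise ValueError("negative length")  # SQL Server raises an error too
    begin = max(start, 1) - 1          # positions before 1 return nothing
    end = max(start + length - 1, 0)   # last 1-based position to include
    return value[begin:end]


def domain_name(email: str) -> Optional[str]:
    at = charindex("@", email)
    if at == 0:  # plays the role of WHERE email LIKE '%@%'
        return None
    # SUBSTRING(email, CHARINDEX('@', email) + 1, LEN(email) - CHARINDEX('@', email))
    return substring(email, at + 1, sql_len(email) - at)


print(domain_name("user@example.com"))  # example.com
print(domain_name("not-an-email"))      # None
```

Returning None for an address without an “@” stands in for the row being filtered out; dropping that check would make the sketch return the whole string, just as the query would without its WHERE clause.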
Simple and powerful. I hope this is useful to someone – the inspiration for this post came from this post at Chandoo.org. Pointy Haired Dilbert is easily my favorite Excel blog, as well as one of the most useful and entertaining ones out there.

