A little over a year ago, I wrote a blog post about why you shouldn’t freak out about people opting out of being tracked in Google Analytics. Yesterday, Google announced that it is using SSL to encrypt search queries and responses for people who are using Google.com and are logged into their Google accounts. As a result, Google Analytics and other clickstream measurement tools will no longer be able to determine the keywords behind Google organic search referrals to the studied site. They will still report that the visit was referred by Google organic search.
Although this has widespread implications for both Search Engine Optimization and site optimization activities, I’d encourage you not to freak out about this change, for many of the same reasons that I outlined last year.
Why I’m Not Freaking Out – And You Shouldn’t Either
I have no idea what to expect from this change – there is no way for me to predict how many of my sites’ visitors are going to be coming from logged-in Google account holders. I do know the current impact of Google organic search on my portfolio of sites – it is the single biggest driver of organic (as opposed to paid) search traffic, providing, on average, 75% of organic search visits. According to both comScore and Hitwise, Google had a 66% share of U.S. searches in September 2011.
This traffic is important to the overall goals of all of our sites. We spend a lot of time and effort on building and maintaining our sites’ search traffic and this change has serious implications for both the quality and quantity of data that we use to drive these efforts. Despite this, I’m still not freaking out.
Using the same thought exercise as my previous post, imagine if 50% of Google’s organic traffic had its keywords obfuscated because those visitors were logged into their Google accounts when they performed the search that ultimately brought them to your site:
It (Still) Isn’t about the Individual Visit (or in This Case, Search) – Search data, like all other clickstream data, is useful in aggregate. By looking at the keywords searched and grouping them by theme, we can calculate the value of certain types of keywords vs. others (for example, brand terms vs. product terms). Losing 50% of this data will affect the number of Long Tail keywords that get reported and potentially affect the reporting of narrow segments of traffic that have few reported keywords, but not radically affect our conclusions on the aggregate “search intent” of our visitors.
Aggregate Data is More about Precision than Accuracy – With this thought exercise, we are again losing a bit of Accuracy without losing Precision. There is no reason to suspect that individuals’ search and post-search behavior is going to change because they happen to be logged into their Google accounts. The assumption that these individuals, as a group, do not behave significantly differently from all Google organic visitors is easily tested within Google Analytics by comparing the group with reported keywords to the group whose search terms are obfuscated. Because this assumption is testable, we can conclude that the aggregate “search intent” of the missing 50% is likely to be similar to that of the fully reported 50%.
Perhaps most importantly, from a privacy standpoint, this is the right thing to do. I spend a lot of time connected to Wi-Fi in other offices, coffee shops, hotels, conferences, and other places where a nefarious system administrator could easily snoop on my search queries and other unencrypted web usage data. Google’s new two-factor authentication lets me access Google products (including Google Analytics!) securely while connected to potentially sketchy Wi-Fi. Now I have the same level of comfort while using Google search in potentially unfriendly places.
The data will show the impact of the change over the next few days as it is rolled out to everyone. Regardless of how much data ends up being affected, I hope this post has made a strong argument for not freaking out about it.
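One way to make that comparison concrete is a two-proportion z-test on a behavior metric such as conversion rate. This is a minimal sketch under stated assumptions, not something Google Analytics reports itself: the segment totals are hypothetical numbers you would export for the two groups, and the 1.96 cutoff assumes a two-sided test at the 5% significance level.

```python
import math

def two_proportion_z(conv_a, visits_a, conv_b, visits_b):
    """Return the z statistic for the difference between two conversion rates."""
    p_a = conv_a / visits_a
    p_b = conv_b / visits_b
    # Pooled rate under the null hypothesis that both groups convert equally
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    return (p_a - p_b) / se

# Hypothetical totals: group A has reported keywords, group B has obfuscated keywords
z = two_proportion_z(conv_a=420, visits_a=20000, conv_b=400, visits_b=19000)

# |z| below 1.96 means no significant difference at the 5% level (two-sided)
print(f"z = {z:.2f}, significant: {abs(z) >= 1.96}")
```

If the two groups do not differ significantly, that supports treating the obfuscated group’s search intent as similar to the reported group’s.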
I was inspired to write this post this evening after looking at my LinkedIn social graph and seeing the recent career arcs of some of my former colleagues (more on this later).
This post is to thank a person who has had a tremendous amount of influence on my career, Avinash Kaushik, and to highlight one of his videos. The video was recorded back in 2007 and covers a topic that is near and dear to my heart:
In the wrong context or to the wrong people, talking about “culture” causes people’s eyes to glaze over. Based on my experience, I am no longer one of those people. My tale of developing a data-driven culture in a large corporation follows:
In 2006, I accepted a position as a Senior Manager of Web Analytics for a large business services firm. As the product manager of the organization’s enterprise-wide web analytics software and data collection framework, I had my hands full developing a data capture and reporting framework as part of a complete web reboot by the company. Although implementing an enterprise clickstream tool, along with a framework for integrating web data into the company’s data warehouse, was a technically complex task, it was fairly straightforward once the requirements were determined.
What was not straightforward, however, was developing a data-driven culture around how the company used its web data.
The organization had nine different, quasi-independent business units with about thirty-five different web sites. Nobody in the business units was focused on web analytics; however, each business unit had a web team that was focused on managing the content on its sites. My goal was to transform those positions from content managers into data-driven product managers of their web sites.
So how did I attempt to accomplish this? I empowered the web managers with their own data. I trained the web teams on both our clickstream and data warehouse tools and gave them the ability to independently develop actionable insight about their clients’ web usage.
Less than a year later, these managers could look at both individual and aggregate customer data and determine how specific web-based activity affected their business units’ bottom line. They had total visibility into all of the company’s marketing data, allowing them to explore the data and develop objective arguments for action.
Looking at my social graph on LinkedIn, I see that, three years later, some have moved into different roles either within or outside of the firm, but at least four of those former web managers have moved on to be web analysts, two with a top-tier web analytics consulting firm.
My approach here was directly influenced by Avinash’s first book, blog, and talks that he was giving at the time. His guidance was, and continues to be, useful and inspirational for the entire online marketing community.
This post is in praise of a simple tool: the QR Code. QR codes are graphics that represent text strings, typically website URLs. Using their cameras and QR scanning software, smartphone users can scan QR codes to open specific URLs in their mobile browsers.
Although the “QR” code is only one type of two-dimensional code (other common ones: Aztec Code, MaxiCode), the term “QR Code” has been extended to encompass any two-dimensional code that is readable by scanner software on mobile devices. The QR Code is a published, license-free standard, so the platforms for both consumption and generation are interchangeable.
So what’s to like about QR codes?
QR codes are easy to consume. All the major mobile platforms either support the QR Code standard natively or have free QR scanning applications readily available. To consume a QR code, a smart phone user needs to simply “take a picture” of it with their phone.
QR codes are easy to create. Since the QR Code is based on a published standard, there are a number of web services that will produce them from a URL. I’ve been using Kaywa’s generator, but URL shorteners such as goo.gl and bit.ly now also generate them along with their shortened URLs.
QR codes are easy to track. Much like a shortened URL or a vanity URL, the URL behind a QR code can be tagged so that traffic generated by QR code scans can be tracked. This is a key practice when attempting to measure the use of, and return from, QR codes.
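As a minimal sketch of that tagging step, the function below appends Google Analytics’ campaign parameters (utm_source, utm_medium, utm_campaign) to a URL before it is handed to a QR generator. The function name, the example URL, and the parameter values are hypothetical; any scheme your analytics tool recognizes would work the same way.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl

def tag_qr_url(url, source, medium, campaign):
    """Append campaign-tracking parameters to a URL destined for a QR code."""
    parts = urlsplit(url)
    # Keep any query parameters the URL already has, then add the campaign tags
    query = parse_qsl(parts.query)
    query += [("utm_source", source), ("utm_medium", medium), ("utm_campaign", campaign)]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical placement: a QR code on a bus shelter ad
print(tag_qr_url("https://www.example.com/product", "bus_shelter", "qr_code", "spring_launch"))
```

Using a distinct source value per placement (bus shelter, business card, magazine ad) lets you compare scans by placement in your reports.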
Where are they useful? With the explosion in advanced smartphone usage, there is increasing opportunity to embed these codes in a wide variety of applications. I have personally seen QR codes used in billboards, magazine advertisements, bus shelters, bus wraps, business cards, conference badges, and, oddly enough, men’s rooms.
So why aren’t they everywhere? Sadly, they simply aren’t yet. They are still rare enough “in the wild” that I am surprised when I see them, even in situations with obvious utility.
What are some other uses? I’d like to see QR codes wherever a web resource could be useful. I’d like one on my appliances or in my car that points me to product information. I’d like one at Starbucks and Chipotle that would allow me to order and pay while standing in line. I’d like to see them on TV, allowing me to connect with shows and their advertisers in addition to vanity URLs. The potential applications are legion.
Is there a privacy concern here? No, because the scope of the data collected is clearly defined – the data collected by testing tools is far less detailed than that collected by clickstream tools or transactional systems.
Is there a user experience/user sentiment concern here? Possibly. Although the user experience varies widely by platform and site customization features, by the very nature of a test, certain visitors are getting a sub-optimal experience. Should informing site visitors about the use of these tools, or even about the fact that they are participating as panelists in an experiment, be part of a policy of transparency that could head off any negative sentiment created by the testing?
This all leads to my question:
Do you explicitly inform your sites’ visitors of any A/B or multivariate tests or tools deployed on your sites? For instance, do pages that are part of a test treatment tell visitors about the existence and nature of the test?
I’d love to hear about any specific experiences related to testing and disclosure in the comments!
On May 25th, Google announced the availability of a browser add-on for Internet Explorer, Chrome, and Firefox that prevents a user’s browser from reporting site usage data to Google Analytics. The plug-in prevents visit and visitor information from being reported by any site that uses Google Analytics to collect clickstream data.
Although I have years of experience implementing and using other web analytics tools, today I use Google Analytics on nearly every site that I manage. It has become the de facto standard web analytics tool for content and small e-commerce sites for a reason: it is easy to implement, has enterprise-grade features and a large user base, and it is FREE. Here are the reasons why I’m not freaking out about a potential loss of visitor data from this tool:
It Isn’t about the Individual Visit
The power of clickstream analytics tools like Google Analytics comes from deriving actionable insights by exploring aggregate site traffic across discrete time periods and specific traffic segments. You simply aren’t going to get much actionable insight from looking at one person’s visit to your site, or even from tracking one person’s visits over a longer time period. In fact, Google Analytics’ terms of service explicitly forbid implementing it in a way that can uniquely identify individual visitors.
Aggregate Data is More about Precision than Accuracy
Here is a thought exercise: what if Google Analytics or some other clickstream analytics tool is delivering actionable insights that boost your site’s conversion rate but is only collecting data from about 95-99% of your site’s visitors? That extra 1-5% isn’t a big deal as you can safely assume that the missing 1-5% is acting like the other 95-99% of your visitors.
Back to the thought exercise: I expect adoption of this plug-in to be somewhere in the neighborhood of 1-5% of all users. Is your traffic data fatally flawed if you are missing 2-10% of your pageviews? What about 20%?
Unless the people who install the plug-in behave differently (as a group) from those who do not, Google Analytics will become somewhat less accurate with no loss in precision. In the context of most sites’ objectives, there will be no reason to question the validity of the conclusions drawn from Google Analytics unless the plug-in is widely adopted. This is because actionable site optimization metrics are based on rates (conversion rate, funnel exit rate) rather than on absolute numbers.
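The difference between losing absolute numbers and losing rates can be shown with a few lines of arithmetic. This is a sketch with hypothetical traffic figures, and it rests on one loud assumption: the opted-out visitors convert at the same rate as everyone else.

```python
def conversion_rate(conversions, visits):
    """Conversions per visit: the kind of rate that optimization decisions rest on."""
    return conversions / visits

# Hypothetical month of traffic: 10,000 visits and 200 conversions (a 2% rate)
all_visits, all_conversions = 10_000, 200

# ASSUMPTION: 5% of visits (500) come from opted-out users who also convert at 2%,
# so 500 visits and 10 conversions go unreported
tracked_visits = all_visits - 500
tracked_conversions = all_conversions - 10

print(f"Unreported visits: {all_visits - tracked_visits}")
print(f"True rate: {conversion_rate(all_conversions, all_visits):.1%}")
print(f"Reported rate: {conversion_rate(tracked_conversions, tracked_visits):.1%}")
```

The visit count is off by 500, but the conversion rate that drives optimization decisions is unchanged; it would only drift if opted-out users behaved differently as a group.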
Clickstream is Only Part of the Puzzle
There is an ever-increasing amount of information that is being generated by people interacting with your brand online. On your site, there is the potential to collect transactional data, direct voice-of-customer data, site testing data, contact us form data, etc., that is typically integrated with, but discrete from, Google Analytics. Off site, there are interactions with your brand on social media, email marketing activities, and any offline interactions that may also be generating data. It isn’t that your clickstream data isn’t important – it is just that there are other sources of data that may prompt action on the part of the analyst.
Allowing the Opt-out Is the Right Thing to Do
As site owners, we should never lose sight of our objectives. There is a reason why our sites exist (sell something, provide information, display advertising) that is fundamentally more important than how we measure and improve our sites’ ability to achieve those goals. Perhaps unfairly, some people’s concerns over privacy will cause them to block a tool that is likely being used to understand and improve their experiences, but we should respect their wishes and accept this as a new browsing paradigm in an environment with many other evolving browsing paradigms.
There Are Alternatives to Google Analytics
In summary, I don’t think that there will be widespread adoption of the Google Analytics opt-out. Even if there is, it won’t totally strip away the value of the tool and there are other clickstream analytics tools out there (as well as other sources of web analytics data).
By using functions in Excel and SQL that return the text from specific locations within a string combined with ones that can isolate the location of the “@” character in every email address, you can easily extract domain names from lists of email addresses.
The base function for this is RIGHT. RIGHT gets passed two arguments, text, which is the text being parsed, and num_chars, which is the number of characters returned by the function. RIGHT takes the form in Excel of RIGHT(text,[num_chars]).
The text argument is obvious; it is the text of the email address to be parsed.
The num_chars argument is determined using a combination of two other functions, LEN and FIND. We use LEN to determine the length of the overall email address and subtract the position of the @ character, determined using FIND. The resulting difference is the length of the domain portion of the email address.
LEN(text) returns the number of characters in the string.
FIND(find_text,within_text,start_num) returns the position of find_text within within_text. start_num, which we won’t use here, sets the character position at which the search begins.
To put this all together, let’s put my email address in cell A1, place our derived function in cell B1, and derive the result.
=RIGHT(A1,LEN(A1)-FIND("@",A1)) returns everything after the "@" in cell A1, i.e., the domain name.
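To check the arithmetic outside Excel, here is a minimal Python sketch that mirrors the RIGHT/LEN/FIND formula step by step. The function name email_domain and the sample addresses are hypothetical; the sketch keeps Excel’s 1-based position so each line maps onto one part of the formula.

```python
def email_domain(address):
    """Mirror =RIGHT(A1,LEN(A1)-FIND("@",A1)): return the text after the first "@"."""
    # str.index raises ValueError when "@" is missing, much as FIND returns #VALUE!
    at_pos = address.index("@") + 1    # 1-based position, like Excel's FIND
    num_chars = len(address) - at_pos  # LEN(A1) - FIND("@",A1)
    # RIGHT(text, num_chars); guard the zero case, since address[-0:] is the whole string
    return address[-num_chars:] if num_chars else ""

print(email_domain("jane.doe@example.com"))
```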
SQL is a little bit trickier. Instead of a right-to-left text selection function, we are going to use a left-to-right function, SUBSTRING.
SUBSTRING returns a string of text based on the arguments passed to the function. SUBSTRING takes three arguments: value_expression, which is the text being parsed; start_expression, which is the position of the first character returned; and length_expression, which is the number of characters returned, starting at start_expression. This function in SQL looks like this: SUBSTRING(value_expression,start_expression,length_expression).
In this case our email address is the value_expression. The character following the “@” symbol is the start_expression, with the length of the remaining string being the length_expression.
To determine start_expression, we deploy another SQL function, CHARINDEX, which works exactly like FIND in Excel. Using SQL’s version of LEN and the same math, we can determine length_expression.
Putting it all together, let’s assume a table named email_addresses with a column named email:
SELECT SUBSTRING(email, CHARINDEX('@', email) + 1, LEN(email) - CHARINDEX('@', email)) AS domain_name
FROM email_addresses
WHERE email LIKE '%@%'
The WHERE clause filters out malformed email addresses with no "@". For those values, CHARINDEX returns 0, and the expression would return the entire string instead of a domain name.
The math works the same way as in Excel, except that we are working from the left, so we add one to the CHARINDEX result to start at the character immediately after the "@".
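The query above is written for SQL Server (T-SQL). As a hedged, testable sketch, the same logic can be run against SQLite from Python’s standard library. SQLite’s substr, instr, and length stand in for SUBSTRING, CHARINDEX, and LEN; note that instr takes the string being searched first and the search text second, the reverse of CHARINDEX. The sample rows are hypothetical.

```python
import sqlite3

# In-memory database with an email_addresses table (sample rows are hypothetical)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_addresses (email TEXT)")
conn.executemany(
    "INSERT INTO email_addresses (email) VALUES (?)",
    [("jane.doe@example.com",), ("sales@example.org",), ("not-an-email",)],
)

# instr() plays the role of CHARINDEX, substr() of SUBSTRING, length() of LEN
query = """
    SELECT substr(email, instr(email, '@') + 1, length(email) - instr(email, '@')) AS domain_name
    FROM email_addresses
    WHERE email LIKE '%@%'
    ORDER BY rowid
"""
domains = [row[0] for row in conn.execute(query)]
conn.close()

print(domains)
```

The row without an "@" is dropped by the WHERE clause, just as in the T-SQL version.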
Simple and powerful. I hope this is useful to someone – the inspiration for this post came from this post at Chandoo.org. Pointy Haired Dilbert is easily my favorite, as well as one of the most useful and entertaining, Excel blogs out there.
Tom’s Planner is a web-based collaborative project management tool. Despite the name, I have nothing to do with Tom’s Planner. It must be run by “that other Tom”.
Perhaps due to its moniker, but more likely due to its power, flexibility, and ease of use, Tom’s Planner has earned a spot in my toolkit. With simple drag-and-drop functionality, you can create Gantt charts and easily export them to an image, an MS Project file, or even its own (public or private) web space. That is pretty much all the application does, but it does it well, making it ideal for Project Managers who don’t want the hassle of learning the intricacies of modern Project Management software.
Check it out! They offer a fully functional, registration-free (!) demo at:
Rupert Murdoch has been grumbling about Google’s use of the Wall Street Journal’s content in its Google News product and threatening to disallow Google’s spiders on WSJ properties. In the first few minutes of the interview above, he gives his opinion on Google and the state of his internet businesses. It is worth a watch.
Much has been made of the traffic and revenue losses that the Journal would incur if it voluntarily de-listed itself from Google and/or Google News. I think that all of the analysis I have read in the past week misses the subtext of his posturing: Murdoch’s frustration with Google goes well beyond its re-purposing of WSJ content.
First, some context: News Corporation, the parent company of the Wall Street Journal, is one of the largest media conglomerates in the world. The company’s market cap is currently about $35 billion USD. In addition to the numerous newspapers, magazines, book publishers, film and TV studios and distributors, and other companies under the News Corp. umbrella, the company also runs some of the largest web sites in the world, which together generate tens of millions of site visits every day.
News Corp Sites in U.S. Top 100 (Alexa):
Fox News (39)
Wall Street Journal (88)
Of these, only Hulu (which is 45% owned by News Corp.) and Fox News do not currently participate in Google’s AdSense program, which helps sites monetize their unsold ad inventory. The enormous amount of inventory on the remaining sites that is available through Google AdWords (the buy side of AdSense) implies that they lean heavily on AdSense as a revenue-generating device. Looking at the cost of advertising on MySpace, Photobucket, and the WSJ through AdWords, and at the subsequent revenue split with Google, News Corp. cannot be happy with the revenue Google’s platform produces relative to the amount of traffic these sites get.
I am by no means implying that the pricing for this inventory is unfair; Google runs a continuous auction-based system to determine advertiser placement. My point is that Murdoch is feuding with Google because AdSense isn’t able to generate the incremental revenue that News Corp. expects from these properties based on the amount of traffic that they receive.
Here is a solution for News Corp: MySpace has a self-service advertising program called MySpace Ads. Spend a few million making the system easier for advertisers to target their desired audiences and roll out the platform across all News Corp. properties. That way, News Corp. has the ability to keep 100% of these ad revenues as well as cross-sell their inventory. After a couple of years of development, they could even be in a position to offer their platform to other publishers and directly compete with AdSense. That would make News Corp. a serious player in the internet advertising business.