You Say Tomato…

Over at Coffee, Sun & Analytics, Xavier has a couple of posts on session length. Some good thoughts there, but I was surprised by this statement:

Session length = number of pages users viewed during their session on the site.

Call me old school, but I thought session length was the amount of time a user spent, not the number of pages they viewed.

I can’t find any definitive phrase for what Xavier is talking about. At Accrue we called it Session Depth or Visit Depth (we said a session was user-centric: a user may visit many sites during a session). At Yahoo there’s no new term; it’s just “pageviews per session.”
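To make the distinction concrete, here’s a tiny sketch of both metrics computed from one visitor’s pageviews. The log format and timestamps are made up for illustration; this isn’t any particular vendor’s schema.

```python
from datetime import datetime

# One visitor's pageviews within a single session (hypothetical data).
pageviews = [
    ("/home",        datetime(2005, 4, 1, 9, 0, 0)),
    ("/products",    datetime(2005, 4, 1, 9, 2, 30)),
    ("/products/42", datetime(2005, 4, 1, 9, 5, 10)),
    ("/checkout",    datetime(2005, 4, 1, 9, 11, 45)),
]

# What Xavier calls session length: the number of pages viewed.
session_depth = len(pageviews)

# What I'd call session length: time from the first to the last pageview.
# (Time spent on the final page can't be known from the log alone.)
session_duration = pageviews[-1][1] - pageviews[0][1]

print(f"Session depth:    {session_depth} pageviews")   # 4
print(f"Session duration: {session_duration}")          # 0:11:45
```

Same data, two very different numbers, which is exactly why the terminology matters.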

Argh. Here we have two people with a lot of experience in web analytics and we’re not even speaking the same language. What a mess!


Google Acquires Urchin

In other news, Google announced that they acquired Urchin, a web analytics vendor and service. This makes a lot of sense for Google, but not for some of the reasons I’ve seen speculated on.

One speculation is that it gives Google web analytics capabilities to analyze their own site. Actually, no, it doesn’t. Google has far too much traffic, and analysis needs far too complex, for a packaged product like Urchin.

Another is that Google can now offer this as an additional capability to their AdWords / AdSense customers. I don’t buy this. Google’s already got enough reporting capabilities in the SEM (search engine marketing) area, and Urchin isn’t going to add any value here that couldn’t have been built more cheaply in-house.

It’s also not because Google is just a bunch of Nice People and they want to have another tool in their portfolio of cool stuff.

So if Google doesn’t need this for their own analytics, or to offer to AdWords customers, why bother? After all, Urchin isn’t a game-changing technology. There are better solutions available, no matter which axis you measure on.

Simple. Google did this because they want more ways to get off-network surfing data. They want to know what people who aren’t using any Google services are using instead. That information is partially available through AdSense, because AdSense lives on third-party sites. That’s a rich source of data. A nice way to get even more off-network data is to supply folks with a hosted analytics service that most small and medium-sized web sites can use. Simply put a web bug / beacon in your page, and we’ll track your visitors for you. And for us.
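For the non-analytics folks: the mechanics of a beacon are almost embarrassingly simple. The page includes a tiny image served from the analytics vendor’s domain, and every request for that image (complete with query-string parameters, referrer, and cookie) lands in the vendor’s logs. Here’s a bare-bones sketch of the server side; it’s mine, purely illustrative, and the hostname and parameter names are invented, not Urchin’s actual implementation.

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# A 1x1 transparent GIF: the "web bug" the browser actually receives.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

class BeaconHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The customer's page embeds something like
        #   <img src="http://tracker.example.com/b.gif?u=PAGE_URL&r=REFERRER">
        # so every pageview on their site shows up here, along with whatever
        # cookie the tracking host has set for this visitor.
        params = parse_qs(urlparse(self.path).query)
        print("pageview:", params.get("u"),
              "referrer:", params.get("r"),
              "visitor cookie:", self.headers.get("Cookie"))
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

if __name__ == "__main__":
    HTTPServer(("", 8080), BeaconHandler).serve_forever()
```

The site owner gets their reports; whoever runs the tracking host gets a log of everyone’s visitors.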

(Before you get all cynical on me: yes, Overture bought Keylime many years ago, for SEM reporting, and perhaps for off-network information; I don’t know. The difference between the Overture/Keylime and Google/Urchin deals is that Yahoo! and Overture are different legal entities, and have different privacy policies. As a result, Yahoo and Overture cannot share third-party information about web surfers. Whether or not that makes business sense is beside the point – Yahoo’s pretty rabid about privacy.)

One final element of this announcement: if there’s no privacy backlash, and web sites brush off the concept of Google as big brother, the low-end market for web analytics is effectively dead. Omniture, WebSideStory and (perhaps) CoreMetrics will survive, but it’s going to be tough for anyone else – which is going to give the newly independent WebTrends second thoughts about resurrecting WebTrends Live / WebTrends OnDemand.


Shaking up the Analytics Landscape

In case you missed it, NetIQ is spinning out WebTrends. I won’t speculate why – oh hell, of course I will. I thought (and still think) that WebTrends fit with NetIQ only slightly better than Andromedia fit with Macromedia – which is to say, not very well. The two companies have different lines of business, and web analytics ended up being a sideshow. WebTrends and NetIQ sell to different people in the organization – simple as that.

I’ve heard rumors of a somewhat similar web analytics deal coming down soon, as the vendor sells off its analytics business to focus on a different line of products.

Consolidation, or diversification? Apparently some people think they can’t make enough in the web analytics business, while others think they can. Interesting times.


Mojo et al

A bit of a buzz today around Om Malik’s How Yahoo Got Its Mojo Back with the attendant lovers and haters commenting along. As is the norm, a lot of the haters (of both Yahoo and Google) don’t know what they are talking about.

I still find it surprising how seemingly intelligent people can march up and down about how one service is amazing and the other is absolute rubbish. If it works for you, great. I know people who swear by the gmail interface, and others who swear at it. Some people want My Yahoo, and others prefer Google News. So be it.

Regardless, I think what makes both wonderful is the competition. Microsoft is coming? Hey, jump in – the water’s great.


Flickrizing Yahoo!

Not a lot of blogging lately – not because there’s nothing to talk about, but because I’m up to my eyeballs in resumes and recruiting. (If you can code, and you understand web data, get in touch!)

Regarding Yahoo!’s purchase of Flickr – some random thoughts:

  1. I suspect Flickr will influence Y! more than the other way ’round.
  2. Tagging (aka folksonomies) will show up in other places on Y!.
  3. We (SDS) need a strategy for figuring out how to analyze/report on tags (perhaps with technology similar to what powers the buzz index); see the sketch after this list.
  4. Tags are going to give Overture and Google a whole new set of opportunities and headaches for context advertising. On the surface, they look like they could be used like search terms, but in so many ways, they’re a lot different.
  5. I’m glad I had the foresight (or lack of imagination) to create a Flickr ID that’s the same as my Yahoo! ID.
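On point 3: I don’t know what the buzz index machinery actually looks like under the hood, but the kind of tag reporting I have in mind starts with nothing fancier than frequency and co-occurrence counts over tagged items. A made-up sketch, with invented photos and tags:

```python
from collections import Counter
from itertools import combinations

# Hypothetical (photo_id, tags) pairs pulled from a feed or a log.
tagged_photos = [
    ("p1", {"sunset", "beach", "vacation"}),
    ("p2", {"sunset", "hawaii", "vacation"}),
    ("p3", {"cat", "funny"}),
    ("p4", {"beach", "hawaii"}),
]

tag_counts = Counter()   # how often each tag is used
pair_counts = Counter()  # which tags show up together on the same photo

for _, tags in tagged_photos:
    tag_counts.update(tags)
    pair_counts.update(frozenset(pair) for pair in combinations(sorted(tags), 2))

print(tag_counts.most_common(3))   # top tags: the obvious buzz-style report
print(pair_counts.most_common(3))  # co-occurring tags: interesting for ad context
```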

Firefox? Yes Please.

I’m one of those guys that runs around thinking people should use the Firefox browser. Many people inside Y! do use it, but they are generally the early adopters. A week or two ago, posters went up around campus announcing an internal test of a new service (no, it’s not Yahoo! 360°). It looks pretty useful, so I went to check it out.

Windows and IE only. I couldn’t believe it.

So forgive me for snickering when I read “Yahoo vows to open all services to Firefox users.” The article said Yahoo would not launch any new services without Firefox support. Cool.

Maybe the product in internal test will get modified to support other browsers, or operating systems, before it’s released. That would be great. And then maybe Launch could support non-Windows machines too…

Update: Sigh.


The web data pipelines

I wanted to address another observation given in the article Things That Throw Your Stats. The author makes the statement:

Web analysis is statistics, not accounting.

While I think his overall message is a disservice to the people trying hard to increase accuracy and accountability on the web, I won’t go on about that here. Instead, I want to point out that his view of web analysis is too narrow.

Actually there are three different components to web analysis. At Yahoo! we have many sources of data, but fundamentally three data pipelines:

  • Operational
  • Financial
  • Analytical

Each may start from a central place, such as the web server log files, but the three move through the infrastructure at different speeds, and in different ways, because they are used for different things.

The operational data pipeline is largely concerned with availability, quality of service, consistency, correctness, etc. Some of the analysis needs to be available in real-time, and some of it much less so. A lot of the analysis is accounting, but there are statistics involved for things like failure prediction.

The financial data pipeline is all about the money. If you can’t account for it, you can’t charge for it. Since Y! is largely ad-driven, it’s important to get this aspect right. A 10% “fudge” won’t sit right with advertisers, nor with shareholders, nor with the fine folks who brought you Sarbanes-Oxley. Not everything needs to be collected (e.g. click paths aren’t very interesting), just metrics like ad views and clickthroughs. It’s not real-time, but needs to be available relatively soon after a campaign ends, or at the end of an accounting quarter. This is largely straight accounting, yet there are statistics involved, for things like detecting click fraud.
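As a toy illustration of that split (numbers invented, and the fraud check deliberately crude): the accounting part is just summing up what you’ll bill against, and the statistics part is asking whether those sums are believable.

```python
from statistics import median

# Invented per-campaign tallies: campaign -> (ad views, ad clicks).
campaigns = {
    "spring-sale":  (120_000, 1_450),
    "brand-promo":  (80_000, 760),
    "shady-widget": (10_000, 2_900),   # suspiciously high clickthrough rate
}

# Accounting: the numbers the invoices are built from.
for name, (views, clicks) in campaigns.items():
    print(f"{name}: {views} ad views, {clicks} clicks, CTR {clicks / views:.2%}")

# Statistics (crudely): flag campaigns whose CTR is far above the typical rate.
typical_ctr = median(clicks / views for views, clicks in campaigns.values())
for name, (views, clicks) in campaigns.items():
    if clicks / views > 3 * typical_ctr:
        print(f"review {name}: CTR is over 3x the median -- possible click fraud")
```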

The analytics data pipeline largely parallels the financial pipeline, but doesn’t have to be SOX-compliant. Also, much more data is collected (e.g. browser string), and even more data is algorithmically computed (e.g. visit duration). The intention, of course, is to use analytics to impact the other two systems. The tricky part is that the way to positively impact the operational and financial systems is by improving the user experience (better response times, more engaging content, etc.), and that improvement largely has to be inferred from observed behavior. There’s some accounting here, but largely statistics, advanced metrics, and data research/mining, with a heavy dose of human-based synthesis. Some of the results of the analytics systems feed the operational pipeline, for things like providing targeted advertising based on observed interest.
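To put the three side by side, here’s a toy sketch of a single logged request being projected into three very different records. The event and all of its field names are invented; this isn’t our actual infrastructure, just the shape of the idea.

```python
# A single raw web-server log record (all fields hypothetical).
raw_event = {
    "timestamp":  "2005-04-01T09:05:10Z",
    "url":        "/news/story?id=123",
    "status":     200,
    "latency_ms": 87,
    "ad_id":      "ad-456",
    "ad_clicked": False,
    "user_agent": "Mozilla/5.0 (Windows) Firefox/1.0.2",
    "cookie_id":  "abc123",
}

def to_operational(e):
    # Near-real-time: availability and quality of service.
    return {"timestamp": e["timestamp"], "status": e["status"],
            "latency_ms": e["latency_ms"]}

def to_financial(e):
    # Slower, audited, minimal: only what gets billed against.
    return {"ad_id": e["ad_id"], "ad_view": 1, "ad_click": int(e["ad_clicked"])}

def to_analytical(e):
    # Richest: keep behavioral detail for sessionization, segmentation, etc.
    return {"cookie_id": e["cookie_id"], "timestamp": e["timestamp"],
            "url": e["url"], "user_agent": e["user_agent"]}

for project in (to_operational, to_financial, to_analytical):
    print(project.__name__, project(raw_event))
```

Same source record, three projections, three very different retention, latency, and audit requirements.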

While the group I’m in largely focuses on strategic uses of the web data – the analytics pipeline – it’s never done in a vacuum; we’re always cognizant of the other two pipelines. All three groups – operational, financial, and analytical – are doing analysis, all with the same source data and all towards the same overall goals. The data we keep, the tools we use, and the methods we employ can be very different, but it’s always a combination of accounting and statistics – never just one or the other.
