Algorithms Can’t Cure All – Google Gives up Arbitrage Fight

Analysis, Search, SEM on May 21, 2007 at 7:59 pm

I’m sure you’ve heard by now that Google has been kicking Made-For-AdSense (MFA) publishers out of AdSense. This follows multiple unsuccessful attempts to use algorithms to make the business of AdSense Arbitrage unprofitable.

First came AdSense Smart Pricing. Google intended to more closely align advertiser benefit with publisher payout (e.g., drop payouts to crappy, poorly converting websites). It certainly reduced payouts to MFA sites, but they continued to flourish.

Later came the AdWords Quality Score, introduced to curb long-tail arbitrage (arbitrageurs using millions of low-cost keywords to drive traffic to MFA landing pages). Quality Score is a loosely defined metric that accounts for the ‘quality’ of the relationship between the keyword, ad text and landing page. Arbitrageurs adapted and found other sources of traffic (SEO chief among them).

So, finally, after four years of automated attempts at making AdSense Arbitrage uneconomic, Google is kicking MFA publishers out of AdSense. I’m sure they are using algorithms to identify accounts for manual review, but they’ve clearly made an important directional shift in how they think about the problem.

Avoiding the Algorithm Trap - Scalability does not require pure automation

At Quova, we initially tried to build an all-automated IP Geolocation system. In 2002 we acquired Real Mapping, a Dutch company that had taken a purely manual approach to mapping the Internet (rooms of analysts). We made the purchase to consolidate the market, however we got lucky with the technology synergies (yeah, I hate the word synergy too). Their manual approach was a great complement to Quova’s automated algorithms. By the time I left Quova in 2004 we had achieved the ideal blend: expert network geography analysts teaching an automated mapping system.

I’ve seen countless hours poured into automated solutions to intractable problems. In many cases (particularly in startups), the answer isn’t a more elegant algorithm. A lot more can be accomplished quickly if you use automation to solve 95% of the problem and manual labor to get the rest. I’ll use Excel to clean data to the point where I can manually clean the rest or we’ll outsource a project to Elance instead of trying to automate the full task.
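As a sketch of that 95/5 split (the data, rules and names here are hypothetical), automation handles the unambiguous records and everything else lands in a manual-review queue:

```python
# A minimal sketch of the 95/5 split: automate the unambiguous cases,
# queue the rest for a human. All data and rules here are illustrative.

def normalize_phone(raw):
    """Return a 10-digit US phone number, or None if it needs a human."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 10:
        return digits                      # unambiguous: automate
    if len(digits) == 11 and digits.startswith("1"):
        return digits[1:]                  # strip the country code
    return None                            # ambiguous: send to manual queue

records = ["(206) 555-0142", "1-425-555-0199", "call Bob after 5", "555-0100"]

cleaned, manual_queue = [], []
for raw in records:
    phone = normalize_phone(raw)
    (cleaned if phone else manual_queue).append(phone or raw)

print(cleaned)       # the bulk the algorithm handled
print(manual_queue)  # the rest goes to an analyst (or Elance)
```

The point isn’t the phone logic; it’s that the ambiguous 5% never blocks shipping the other 95%.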

But what about Google’s Algorithms?

Although Google is most known for their algorithmic prowess, they depend heavily on legions of people that review:

  • web sites
  • ad text
  • search results

They are even now reaching out to the web at large, asking for help in identifying things like search spam and paid links.

In many ways, Google has become a master of blending automation with manual techniques. I have to admit that I’m surprised how long it took them to acknowledge that their algorithms alone couldn’t beat the Arbitrageurs.

Google Analytics – Screwing the Power User

Products, Search on May 17, 2007 at 10:37 pm

I’ll be the first to admit that the prior Google Analytics user interface left a lot to be desired. It was unintuitive, and it took a while to find information. The new interface is much improved and certainly prettier.

However, pretty isn’t always powerful. Among the power-user features Google excluded, I was most disappointed to find they took away their most powerful one: export to CSV.

It is actually a little worse. They didn’t take away the feature entirely - they changed how it performs, making it close to useless for large sites. You can now export only 100 rows (as opposed to full reports). This is the feature I use most - I routinely use it to understand how users are getting to our site (which page classes, from which sources), as well as to troubleshoot site-wide issues…

The functionality is still available in the old interface, but Google states that interface will be going away in 2 months. My inquiry to customer support resulted in a claim that Google is working to add the feature back in:

Unfortunately, it is not possible to export the entire data table
in the new version of Google Analytics. We understand that this is an
important feature and are working hard to add it back into the product.

We’ve invested heavily in Google Analytics: tracking URLs and tracking codes are scattered throughout the site and our marketing campaigns. I guess I’ll wait 2 months to see if Google makes good on its promise, but I’m not looking forward to implementing a new solution.

New Google Custom Search Button Popup

Products, Search on March 27, 2007 at 6:45 pm

I saw this popup today while I was searching our internal Trac repository:

I’m running Firefox & Google Toolbar 3.0.2.

This prompt was generated when I placed my cursor in the search box. Presumably the Google Toolbar recognized the box and was prepared to auto-generate the button. I generally don’t like toolbars and add-ins doing things to my browsing experience without my explicit permission, but this didn’t bother me. In fact, it made me realize what a useful feature this could be.

Google getting greyer – Inline text ads?

Search, SEM on March 22, 2007 at 7:40 am

Rahul pointed me to this Techcrunch article that covers Google’s launch of inline text ads. For those of you unfamiliar with text link ads, these are the incredibly obnoxious, usually double-underlined links - actually ads - that appear in many forums and sketchy content sites.

I think Arrington sums it up well:

They’ve crossed a hazy ethical line here. If this product was announced on its own, it would be heavily debated by the blogs and press. But by burying it in other, bigger news, they’ve mostly avoided the critical analysis that this actually deserves.

Yes, we should have seen this coming. The most recent change to the AdSense TOS raised a blog storm when publishers realized that inline text ads violated it. Google backed down and ‘clarified’ the policy, but I wouldn’t be surprised to see the old policy return.

I’ve fought to keep these things off of Judy’s Book. The typical pitch is “incremental revenue”, which may be true, but it is typically accomplished by tricking users. I haven’t seen an implementation of the Google text link ads, but their FAQ suggests that the links will be indistinguishable from true text links until a user mouses over the link.

There’s a nasty negative SEO impact for advertisers, too. Here is the example Google gives of the type of phrase that would benefit from text link ads:

“Widgets are fun! I encourage all my friends to Buy a high-quality widget today.”

If you invented this widget, the author would typically give you a link. However, if the publisher has text link ads installed, the publisher would instead link to an ad. No PageRank gets passed to you. Widespread adoption of Google’s text link ads will penalize the organic search results of advertisers that choose to participate - not to mention degrade the user experience of thousands of sites across the web.

The Value of Ranking #1

Analysis, Judy's Book, Search, SEO on February 28, 2007 at 9:20 pm

Matt McGee of Small Business SEM wrote a brilliant post on the value of occupying the first search position. Here’s a graph detailing click share against position:


I guess we all intuitively understood this, but I was shocked that the top position saw 10x the traffic of the fifth position (which is still above the fold).

Shotguns and Sniper Rifles

I think this suggests a lot about SEO strategy. JB receives lots of long-tail search traffic. Our approach to SEO has been largely on-page optimization - we haven’t undertaken any link-building efforts. We’ve aggregated a lot of our reviews into topically relevant pages that would be easy to link to, but we’ve largely taken a ‘build it and the links will come’ approach.

This data makes me rethink that strategy. Anecdotally, I know that we rarely rank in the top position, although we frequently make the top 5. The next step would be to take a look at our long-tail organic traffic and determine the rank of those terms. It would be simple to determine an ‘upside’ from this data and develop a much shorter list of terms to focus on. I’m curious what focused link building could do for some of those terms. It seems like something worth experimenting with.
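That upside calculation is simple enough to sketch. The click-share curve below is purely illustrative (it roughly echoes the 10x gap between #1 and #5 noted above), and the terms and traffic numbers are made up:

```python
# Back-of-the-envelope 'upside' estimate for focused link building.
# The click-share figures are illustrative only, loosely matching the
# observation that position #1 draws ~10x the clicks of position #5.
CLICK_SHARE = {1: 0.40, 2: 0.12, 3: 0.085, 4: 0.06, 5: 0.04}

def upside(monthly_visits, current_rank, target_rank=1):
    """Estimated extra visits/month if a term moved from current to target rank."""
    factor = CLICK_SHARE[target_rank] / CLICK_SHARE[current_rank]
    return int(monthly_visits * factor) - monthly_visits

# Hypothetical long-tail terms: (term, visits/month, current rank)
terms = [("seattle sushi reviews", 300, 5), ("tacoma dentists", 120, 3)]
ranked = sorted(terms, key=lambda t: upside(t[1], t[2]), reverse=True)
for term, visits, rank in ranked:
    print(f"{term}: +{upside(visits, rank)} visits/mo if it reached #1")
```

Sorting terms by estimated upside would give exactly the "much shorter list" to aim link building at.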

Can Google Avoid the RealPlayer Phenomenon?

Products, Search on February 17, 2007 at 9:46 pm

I don’t know what it is with media players, but they are all scummy (yes, even Apple’s QuickTime). They are all locked in mortal combat with each other, trying to win the battle of the default settings. After an iTunes upgrade, shortcuts litter the desktop, new programs are running in the background and QuickTime has been set as the default program for all media files.

I was surprised when I installed Picasa to find this options menu (yes, these are the defaults).


The new default setting battle is over the search box. Google’s not at Real’s level yet, but I definitely didn’t feel great about those checked boxes.

New Google Webmaster Link Tool – Proof that it pays to promote

Analysis, Search, SEO on February 10, 2007 at 12:31 am

There has been plenty of coverage around the blogosphere about the new Google Webmaster Link Tool.
Danny Sullivan’s writeup at SearchEngineLand is top notch and well worth a read if you’re looking to understand the tool.

Here’s a screenshot of my links:


Five posts really stand out: they generated 88% (1661) of my inbound links. The post that made the front page of Digg generated 54% (1020) on its own. With each of these five posts, I realized they would likely be interesting to a wider audience. I emailed 2-4 very relevant bloggers each time and frequently got picked up - the rest of the links followed from those few pickups.

It is worth stating that the number of unique domains in that pool is much smaller than the number of links. I’d love for Google to include a unique domain count column in this report as well - it would greatly increase the value of the tool.

Is the Semantic Web (Web 3.0) Dead On Arrival?

Business, Search on February 7, 2007 at 10:33 pm

Warning, kind of long…

I saw Tim Berners-Lee evangelizing the Semantic Web back in ~ 1998 while I was at MIT. I love the concepts behind the semantic web, and I fully appreciate the power of what the semantic web might some day look like, but I think we’re nearing the 18th year of his evangelical crusade.

A quick primer on the Semantic Web: it’s all about standardized, structured data. Calendars, places, names, research data, etc. would all be published in standardized formats that could be mashed up, shared and used throughout the web. As always, Wikipedia has a good article on the semantic web, and much information can be found on the W3C standards page and the microformats page. There is also a great write-up of the semantic web at ReadWriteWeb.

RSS and Microformats are the most well-known variants on the semantic web concept. RSS continues to be a success, but Microformats have languished.

Why RSS Succeeded

It’s all about value thresholds and defaults.

  • Value thresholds for users: It only takes 2 RSS feeds and an RSS reader for a user to derive value. So, if 2 sites a user visits have RSS feeds, they’ll benefit from using a single RSS reader. The value threshold for changing user behavior is very low: only a few sites need to publish the format for users to see value and begin to change their behavior.
  • Defaults: There are really 3 dominant blog platforms, and they’ve been dominant for a long time. They made the creation of RSS feeds a default very early. Writers had to do nothing to enable RSS feeds - they were automatically generated. So, as blogging took hold, default settings created the supply of RSS feeds that were consumed by visitors.
  • Value thresholds for publishers: Once a critical mass of visitors seeking RSS formed, mainstream sites found huge value in publishing an RSS feed. It connected them to their users offsite and drove visitors from other mashed up sites using their feeds. But, it wasn’t until the critical mass of users formed that mainstream sites adopted RSS.

Challenges of Microformats

Ok, let’s look at microformats and the challenges they face. De facto data standards for calendar events, contact information and addresses (courtesy of the Microsoft Office monopoly and the USPS) have been around forever, but they have not received anywhere near the adoption of RSS online. Is the hcard really any different from the vcard?

  • Value thresholds for users: Users don’t get behavior-changing benefit from 1 or 2 sites that use hcalendars, hcards or hreviews. Sure, an hcard might make it easier for a user to add a contact to their address book, but it won’t fundamentally change the way the user interacts online. Any developers that want to create utilities for users will need to support unformatted data as well, unless there are tons of sites publishing to the standard. There will be no community of users demanding the implementation of microformats.
  • Default settings - too much work: Microformats are all about providing structure. In order to adopt microformats, publishers need to do extra work. They can’t write a standard blog entry that’s a review; it has to be something different. The same goes for announcing an event or providing contact info.
  • Value thresholds for publishers: Unless tons of users (or tons of other websites) are consuming microformats, publishers have little incentive to publish to them (or to any other pre-defined standard). For example, we implemented hreview microformats at Judy’s Book, and they have had absolutely no impact on our business. We’ve gotten great distribution of our reviews, but microformats haven’t played a role in that. At all.
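The ‘extra work’ is concrete: a review has to carry hreview-style markup before any tool can consume it. Here’s a sketch of both sides - the marked-up snippet a publisher would have to write, and a minimal stdlib consumer. The class names follow the hreview convention (hreview, item, rating, description), but the page content is hypothetical:

```python
# A sketch of the extra structure microformats demand. The snippet uses
# hreview-style class names; a plain blog post would carry none of this.
from html.parser import HTMLParser

SNIPPET = """
<div class="hreview">
  <span class="item">Joe's Diner</span>
  <span class="rating">4</span>
  <p class="description">Great pancakes, slow service.</p>
</div>
"""

class HReviewParser(HTMLParser):
    """Collects the text inside elements tagged with hreview field classes."""
    FIELDS = {"item", "rating", "description"}

    def __init__(self):
        super().__init__()
        self.current = None   # field name of the element we're inside
        self.review = {}

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        hits = self.FIELDS & set(classes.split())
        if hits:
            self.current = hits.pop()

    def handle_data(self, data):
        if self.current and data.strip():
            self.review[self.current] = data.strip()
            self.current = None

parser = HReviewParser()
parser.feed(SNIPPET)
print(parser.review)
```

The consumer is trivial once the markup exists - which is exactly the point: all the cost falls on the publisher, who sees none of the benefit.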

A critical mass of users consuming microformats isn’t likely to form. Users aren’t going to change their behavior because a site uses hreviews, hcards or hcalendars. Firefox 3 is looking at ways to let users use microformats within the browser experience, but even the use cases addressed in this post are really edge cases that hit a user once a week at best.

Publishers aren’t going to do the extra work without any direct benefit to themselves (happier users, more users, etc.).

So, is the Semantic Web DOA?

I don’t think so. I just think it will unfold very differently than conventional wisdom expects. Publishers won’t adopt the standards. Aggregators will build an audience, define (or maybe adopt) the standards and then drive adoption among smaller publishers hoping to access the aggregators’ audience.

Vertical Search, Google Base & Intelligent Agents

Method 1: A big player defines a standard. Google, Yahoo and Ask have incorporated Judy’s Book’s reviews into their local products. Google and Yahoo defined their own formats, and since we wanted our reviews there, we published to those standards. We’ve received substantial traffic from those relationships, but they had nothing to do with microformats, the semantic web or other industry standards. If the major players define a format and allow any site to submit their content, small sites will scramble over themselves to get their content in that format.

Hellooo Google Base. If Google starts sending search traffic through Google Base content to websites, we’ll see mass adoption of Google’s defined formats.

Combine this standard with APIs to access the content, and mashups will spring up everywhere. And, more sites will submit in the format.

Method 2: Intelligent Agents standardize the information.
Ask actually approached the problem differently: they crawled our web pages, determined our page structure and then extracted the reviews. This intelligent-agent approach is far more powerful than the Google Base approach, and I believe it is more likely to drive the creation of a semantic web.

Aggregators & ‘Intelligent Agents’ (Vertical Search) are already solving the hard problem of dealing with disparate data formats: Kayak for flights, CalendarData / Trumba for events, Yodlee for financial information, Trulia for real estate, and Dapper for data.

These services have directly addressed the user and publisher value thresholds. They attempt to aggregate ALL of the information on a particular topic, without publishers having to do anything different. Finding reviews, contact info and calendar events out in the wild is a hard but not intractable problem.

Not all of the information these Agents find out in the wild will be deciphered. They’ll take an 80/20 approach and won’t get everything. But as these aggregators gain traction and start sending traffic to other sites, publishers will begin ‘review optimizing’ or ‘calendar optimizing’. They’ll optimize to whatever standards the aggregation sites require, much as happens with search optimization today. And it is only a matter of time before the Intelligent Agents open up via API so that mashups can form the world over.
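A toy version of that agent approach, with a wholly hypothetical page template: rather than waiting for the publisher to adopt a standard, the agent applies a per-site pattern (which a real agent would learn from the site’s own repeated structure) and pulls the reviews out of plain HTML:

```python
# A toy 'intelligent agent': extract reviews from a site's own repeated
# structure instead of requiring a standard format. The page template and
# pattern here are entirely hypothetical.
import re

PAGE = """
<li><b>Pho Bac</b> - 5 stars - "Best broth in Seattle"</li>
<li><b>Dick's Drive-In</b> - 4 stars - "Cheap and fast"</li>
"""

# One pattern per site template; a real agent would infer this from
# sampled pages rather than hand-coding it.
PATTERN = re.compile(
    r'<b>(?P<name>[^<]+)</b> - (?P<stars>\d) stars - "(?P<quote>[^"]+)"'
)

reviews = [m.groupdict() for m in PATTERN.finditer(PAGE)]
for r in reviews:
    print(f"{r['name']}: {r['stars']}/5 - {r['quote']}")
```

This is the 80/20 trade-off in miniature: the pattern misses pages that deviate from the template, but it works today, on content publishers already have.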

Voila, you have a semantic web. Maybe the standards of the semantic web will be those defined today, but more likely they’ll be whatever Google says they will be.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. | Dave Naffziger's Blog | Dave & Iva Naffziger