The Value of Ranking #1

Analysis, Judy's Book, Search, SEO. Posted on February 28, 2007 at 9:20 pm

Matt McGee of Small Business SEM wrote a brilliant post on the value of occupying the first search position. Here’s a graph detailing click share against position:

aol-search-data.gif

I guess we all intuitively understood this, but I was shocked that the top position saw 10x the traffic of the fifth position (which is still above the fold).

Shotguns and Sniper Rifles

I think that this suggests a lot about SEO strategy. JB receives lots of long-tail search traffic. Our approach towards SEO has been largely on-page optimization - we haven’t undertaken any linkbuilding efforts. We’ve aggregated a lot of our reviews into topically relevant pages that would be easy to link to, but we’ve largely taken a “build it and links will come” approach.

This data makes me rethink that strategy. Anecdotally, I know that we rarely rank in the top position, although we frequently make the top 5. I wonder if the next step would be to take a look at our long tail organic traffic and determine the rank of those terms. It would be simple to determine an ‘upside’ from this data and develop a much shorter list of terms to focus on. I’m curious what focused link building could do for some of those terms. It seems like something worth experimenting with.
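
One way to put a number on that ‘upside’ is sketched below in Python. It is only a rough illustration: the click-share figures are made-up placeholders (chosen so that position 1 gets 10x the clicks of position 5, as in the graph), and the search terms, visit counts and function names are all hypothetical.

```python
# Hypothetical click share (in percent) by search position. These are
# placeholders, NOT values read from the graph; the only property taken
# from the post is that position 1 gets about 10x the clicks of position 5.
CLICK_SHARE = {1: 40, 2: 12, 3: 8, 4: 6, 5: 4}


def upside(current_visits, current_rank, target_rank=1, click_share=CLICK_SHARE):
    """Estimate the extra visits gained by moving a term to target_rank."""
    if current_rank not in click_share or target_rank not in click_share:
        return None  # no click-share estimate for this position
    scale = click_share[target_rank] / click_share[current_rank]
    return current_visits * (scale - 1)


def shortlist(terms, top_n=3):
    """Rank (term, monthly_visits, rank) tuples by estimated upside."""
    scored = []
    for term, visits, rank in terms:
        gain = upside(visits, rank)
        if gain is not None and gain > 0:
            scored.append((term, gain))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical long-tail terms: (term, monthly visits, current rank)
example_terms = [
    ("seattle plumber reviews", 100, 5),
    ("bellevue dentist reviews", 300, 2),
    ("kirkland sushi reviews", 50, 1),
]
for term, gain in shortlist(example_terms):
    print(f"{term}: about {gain:.0f} extra visits")
```

Terms already in the top position drop out, and the rest are ordered by estimated gain, which gives the ‘much shorter list’ to experiment with.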

The Secret List of Sites Banned by Digg

Analysis, Digg. Posted on February 19, 2007 at 9:56 am


Update

Nearly every one of these sites has been unbanned. Only 6 sites on this list remain banned.

neogaf.com
thevideosense.com
blinklist.com
geocities.com
digg.com
idontlikeyouinthatway.com

The first 4 sites were the ‘temporary ban’ type. Digg is digg and will likely remain banned. In fact, idontlikeyouinthatway is the only site from the original list that was permanently banned. It would seem likely that all the permanent bans were lifted (but not the temporary ones), and that idontlikeyouinthatway was rebanned (or maybe it was extra banned to begin with)…

Original Post:

Ever wonder which sites are banned by Digg? Who would have thought that 3 of the top 10 Alexa sites and sites like CareerBuilder, DHL and 43Things would be banned? To develop as complete a list as possible, I tested the top 10,000 Alexa domains and top 1,000 Blogshares blogs to see which were banned. Overall, I found 183 banned sites.

The banned sites fell into several categories:

  • User Generated Content sites without subdomains. One bad actor on these sites can ruin it for everyone. Popular UGC sites like Myspace, Squidoo, 43Things and Geocities are all banned, whereas sites like Typepad, Blogspot and WordPress do just fine because it is easy to ban one bad actor. If I were Seth Godin, I’d give Squidoo lenses their own subdomains pronto - there is good content on Squidoo that will never see the light of Digg.
  • Sites about SEO & Affiliate Marketing. These include TopRankBlog, DigitalPoint, Revenews, John Chow, Paula Mooney, etc. There is some great content that’s been banned … and plenty of poor content as well (theRichJerk).
  • International Sites, particularly Asian sites (Baidu, Sohu, Sina, Yandex, etc.). I can’t speak to the quality of these sites, but four of them are in Alexa’s top 20 and others are very popular. Digg and Digg users would certainly benefit from international versions of its site. (Hint, follow the Google model, not the Yahoo model).
  • Scummy sites. There are plenty of sites here that I’m not surprised to find banned. Gossip Sites (perezhilton), Adult-themed sites (pornotube), adware/spyware sites (smileycentral), etc.

I’m sure that plenty of sites were banned due to attempts at gaming Digg, but I obviously can’t distinguish those from the sites on the list above.

The big list of banned domains:

Domain (Alexa)

baidu.com (4)
myspace.com (6)
sina.com.cn (10)
sohu.com (16)
163.com (17)
rapidshare.com (26)
wretch.cc (32)
yandex.ru (43)
rapidshare.de (65)
geocities.com (69)
digg.com (75)
digitalpoint.com (103)
126.com (105)
pornotube.com (188)
ynet.co.il (192)
21cn.com (194)
elmundo.es (248)
smileycentral.com (300)
libero.it (329)
livejasmin.com (330)
freewebs.com (339)
careerbuilder.com (388)
o2.pl (393)
sina.com (397)
juggcrew.com (404)
anonym.to (435)
startimes2.com (446)
ezinearticles.com (453)
forumer.com (469)
bangbros.com (512)
fishki.net (526)
donews.com (562)
6rooms.com (605)
yoqoo.com (617)
cjb.net (630)
myfreepaysite.com (637)
tvix.cn (666)
nichedsites.com (712)
tinyurl.com (727)
surfjunky.com (780)
as.com (785)
bolaa.com (819)
iwebtool.com (824)
perezhilton.com (832)
askjolene.com (835)
text-link-ads.com (949)
ce.cn (984)
getafreelancer.com (1053)
douban.com (1168)
thesuperficial.com (1210)
tiscali.it (1218)
1shoppingcart.com (1358)
katz.ws (1376)
clubic.com (1386)
segundamano.es (1580)
porkolt.com (1628)
indiafm.com (1656)
43things.com (1694)
wikimapia.org (1724)
ecademy.com (1749)
dreamhost.com (1819)
clickbank.net (1827)
thumblogger.com (1857)
hidebehind.com (1916)
oneindia.in (2004)
directtrack.com (2008)
egotastic.com (2019)
globes.co.il (2197)
tlen.pl (2228)
globe7.com (2263)
javimoya.com (2349)
wwtdd.com (2395)
serials.ws (2414)
sexyclips.org (2444)
techweb.com.cn (2504)
goarticles.com (2654)
furl.net (2662)
lix.in (2695)
care2.com (2747)
consumptionjunction.com (2825)
box.net (2879)
usfreeads.com (2923)
lynxtrack.com (2986)
dhl-usa.com (3010)
newsnow.co.uk (3051)
mojoflix.com (3063)
blueyonder.co.uk (3119)
fleshbot.com (3159)
freepay.com (3180)
lunarpages.com (3187)
9down.com (3289)
blinklist.com (3319)
bigpond.com (3382)
jajah.com (3596)
xpeeps.com (3603)
zooloo.co.il (3689)
m90.org (3696)
infos-du-net.com (3743)
agloco.com (3755)
johnchow.com (3887)
idontlikeyouinthatway.com (3898)
nothingtoxic.com (4007)
brinkster.com (4076)
blingo.com (4216)
earnersforum.com (4219)
6x.to (4260)
cheapflights.co.uk (4300)
naughtyathome.com (4333)
microsiervos.com (4335)
stubhub.com (4353)
justjared.com (4382)
petitiononline.com (4544)
assisass.com (4683)
ebags.com (4714)
ffshrine.org (4751)
planetnana.co.il (4769)
searchwarp.com (4912)
pimpmyspace.org (4954)
pokernews.com (4970)
totallycrap.com (5052)
giveawayoftheday.com (5089)
vbseo.com (5322)
dlisted.com (5323)
suite101.com (5361)
blogmarks.net (5436)
exploitedbabysitters.com (5480)
wierdporno.com (5537)
webworkshop.net (5846)
netidentity.com (5871)
neogaf.com (5932)
nforce.nl (5982)
parisexposed.com (6053)
defamer.com (6182)
therichjerk.com (6218)
yigg.de (6325)
ebooksclub.org (6371)
rs6.net (6400)
articlesbase.com (6445)
weakgame.com (6450)
podomatic.com (6524)
humornsex.com (6615)
vidaextra.com (6738)
clixgalore.com (6852)
todaysfreevideo.com (7001)
freeworldgroup.com (7022)
steakandcheese.com (7081)
webgains.com (7150)
crackserver.com (7159)
spankwire.com (7294)
funnyinside.com (7295)
bastardly.com (7403)
bildirgec.org (7417)
softsearch.ru (7442)
koreus.com (7560)
toprankblog.com (7568)
kingsofchaos.com (7642)
mihd.net (7977)
nastyboards.com (8118)
serialz.to (8121)
azjmp.com (8155)
totallynsfw.com (8260)
gambling911.com (8265)
shoutwire.com (8374)
poosieflix.com (8387)
stormpay.com (8475)
revenews.com (8703)
knuttz.net (8765)
gamereplays.org (8816)
indianpad.com (8867)
stormfront.org (8874)
habrahabr.ru (8900)
jkonline.cn (8976)
presseportal.de (9295)
thevideosense.com (9320)
bet365.com (9826)
offtopic.com (9841)
sweetnjuicey.com (9938)
fishki.ne (blogshares)
geeksmakemehot.com (blogshares)
mess.be (blogshares)
microsiervos.co (blogshares)
sfoxes.blogspot.com (blogshares)
popbytes.com (blogshares)
theundersigned.net (blogshares)

Methodology:

  • How to test a domain on Digg. Digg performs several validation checks when a URL is submitted. After these checks, Digg takes you to a page to enter the title and description. The checks occur in this order:
    • Is the URL valid?
    • Has the URL been submitted before?
    • Is the domain banned? Digg has three types of banning:
      • url is on the banned submit list. This seems to be a permanent ban.
      • This URL has been reported by users and cannot be submitted at this time. Perhaps a temporary ban? Sites previously listed with this tag don’t appear to be currently banned.
      • Please link directly to the story source. This URL has been reported as a news middle-man, it will remain blocked for 0 days. It looks like the bans start at 300 days or so…
  • Getting the top 10,000 domains. I used Ruby to query Amazon’s Alexa Top Sites web service and get the list of the top 10,000 sites. Five minutes later, I was $25 poorer and 10,000 domains richer.
  • Constructing queryable URLs. Alexa doesn’t provide subdomain information, so I added a “www” to the front of every domain, and a fake parameter to the back of each domain, thus creating a valid, unique URL for testing. So, 43things.com became http://www.43things.com?a13=1
  • I then tested all 10,000 URLs (in the middle of the night so as to not load Digg’s servers) to see if they passed all three tests. The ones that failed the ‘banned domain’ test are those I included in the list above.
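
The URL construction and ban classification described above can be sketched roughly as follows. This is a Python illustration rather than the Ruby actually used, it makes no requests to Digg, and it assumes that the message fragments quoted in the list appear verbatim in the response text. The function names and the ban-type labels are hypothetical.

```python
# Message fragments quoted in the methodology. Treating them as exact
# substrings of Digg's response is an assumption, and the labels are
# guesses ("seems to be a permanent ban", "perhaps a temporary ban").
BAN_MESSAGES = {
    "permanent": "is on the banned submit list",
    "temporary": "has been reported by users and cannot be submitted at this time",
    "middle-man": "has been reported as a news middle-man",
}


def build_test_url(domain, param="a13=1"):
    """Prefix 'www.' and append a fake parameter to get a unique, testable URL."""
    return f"http://www.{domain}?{param}"


def classify_ban(response_text):
    """Return the ban type whose message appears in the response, or None."""
    for ban_type, fragment in BAN_MESSAGES.items():
        if fragment in response_text:
            return ban_type
    return None


print(build_test_url("43things.com"))
```

Keeping the fake parameter as an argument makes it easy to vary per run, so each test URL stays unique and never trips the ‘submitted before’ check.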

Known Flaws:

  • Digg blocks at the subdomain level. I didn’t have the data to query subdomains. So, I added a www at the front of every domain. I missed all subdomains such as mydiggspamblog.blogspot.com or ww2.myspamsite.com
  • Not all websites accepted my fake parameter. These domains failed the valid URL test. 6% of websites didn’t return a valid page when presented with the parameter - most commonly because they perform some redirect when a user requests the domain root. Check out the diamond retailer: http://www.tiffany.com for an example.
  • Of course, I missed many, many websites that were banned by Digg.

New Google Webmaster Link Tool – Proof that it pays to promote

Analysis, Search, SEO. Posted on February 10, 2007 at 12:31 am

There has been plenty of coverage around the blogosphere about the new Google Webmaster Link Tool.
Danny Sullivan’s writeup at SearchEngineLand is top notch and well worth a read if you’re looking to understand the tool.

Here’s a screenshot of my links:

google-webmaster-tools.jpg

There are really 5 posts that stand out. These five posts generated 88% (1661) of my inbound links. The post that got to the front page of Digg generated 54% (1020) of my inbound links. With each of these 5 posts, I realized that they would likely be interesting to a wider audience. I emailed 2-4 very relevant bloggers each time and frequently got picked up - the rest of the links followed from those few pickups.

It is worth stating that the number of unique domains in that pool is much smaller than the number of links. I’d love for Google to include a unique domain count column in this report as well - it would greatly increase the value of the tool.
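
Until such a column exists, a unique domain count is easy to approximate from an exported list of linking URLs. The sketch below is a minimal Python illustration. The export format and the example URLs are assumptions, and it treats "www.example.com" and "example.com" as the same domain.

```python
from urllib.parse import urlsplit


def unique_domains(link_urls):
    """Collect the distinct hostnames (ignoring a leading 'www.') in a list of URLs."""
    domains = set()
    for url in link_urls:
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    return domains


# Hypothetical inbound links
links = [
    "http://www.example.com/post-1",
    "http://example.com/post-2",
    "http://blog.example.org/roundup",
]
print(len(links), "links from", len(unique_domains(links)), "unique domains")
```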

134 Countries where you can Google but you can’t Yahoo

Analysis, Geolocation, Search. Posted on January 3, 2007 at 8:29 am

I’ve pulled together a list of countries that Google has entered where Yahoo hasn’t. I chose those countries where Google resolves a country-specific domain in that country’s native language and Yahoo doesn’t. I also chose to limit my analysis to the countries recognized by ISO 3166 (those that IANA has established ccTLDs for).

4307479_thumbnail4.jpg

Yes, it is far easier for Google to launch their home page in another country than for Yahoo to launch their homepage. However, making search available and customized for each country begins the long process of brand establishment. I noticed it in Peru and again in the Czech Republic – they know and love Google. They don’t know Yahoo.

Here’s a look at their addressable search markets:

                   Population   Internet Users   Internet Penetration
Yahoo and Google   4.31B        893M             21%
Google Only        1.67B        158M             9%
Neither            0.52B        19M              4%

Google reaches a population 40% larger than Yahoo’s and an Internet user base 20% larger. This Internet user base is also growing faster than Yahoo’s user base. Google’s approach bears itself out in their financials. % non-US (International) Revenue:

         2005   Q3 2006
Google   39%    44%
Yahoo    30%    33%
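
For anyone who wants to check the reach figures quoted above, the arithmetic can be reproduced from the addressable-market table. The short Python sketch below uses only the numbers in that table; the variable and function names are just for illustration. The extra reach works out to roughly 39% of population and 18% of Internet users, which the text rounds to 40% and 20%.

```python
# Figures from the addressable-market table, in millions.
markets = {
    "Yahoo and Google": {"population_m": 4310, "users_m": 893},
    "Google Only": {"population_m": 1670, "users_m": 158},
    "Neither": {"population_m": 520, "users_m": 19},
}


def penetration_pct(market):
    """Internet users as a percentage of population."""
    return 100 * market["users_m"] / market["population_m"]


def extra_reach_pct(field):
    """How much larger Google's reach is than Yahoo's, in percent."""
    return 100 * markets["Google Only"][field] / markets["Yahoo and Google"][field]


for name, market in markets.items():
    print(f"{name}: {penetration_pct(market):.0f}% penetration")
print(f"Extra population reached: {extra_reach_pct('population_m'):.0f}%")
print(f"Extra Internet users reached: {extra_reach_pct('users_m'):.0f}%")
```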

The list of countries where you can Google but you can’t Yahoo:

Country – # Internet Users
Turkey – 16M
Poland – 11M
Pakistan – 11M
Portugal – 8M
Chile – 7M
Ukraine – 5M
Belgium – 5M
Czech Republic – 5.1M
South Africa – 5M
Egypt – 5M
Nigeria – 5M
Romania – 5M
Colombia – 5M
Morocco – 5M
Peru – 5M
Israel – 4M
Belarus – 4M
New Zealand – 3M
Hungary – 3M
Venezuela – 3M
Saudi Arabia – 2M
Slovakia – 2M
Bulgaria – 2M
Ireland – 2M
Serbia & Montenegro – 1.5M
Croatia – 1.5M
United Arab Emirates – 1.4M
Lithuania – 1.2M
Slovenia – 1.1M
Jamaica – 1.1M
Kenya – 1.1M
Latvia – 1M
Costa Rica – 1M
Puerto Rico – 1M
Zimbabwe – 1M
Tunisia – 1M
Dominican Republic – 0.9M
Uzbekistan – 0.9M
Bosnia-Herzegovina – 0.8M
Guatemala – 0.8M
Estonia – 0.7M
Uruguay – 0.7M
Azerbaijan – 0.7M
El Salvador – 0.6M
Jordan – 0.6M
Ecuador – 0.6M
Senegal – 0.5M
Haiti – 0.5M
Uganda – 0.5M
Bolivia – 0.5M
Moldova – 0.4M
Ghana – 0.4M
Kazakhstan – 0.4M
Luxembourg – 0.3M
Bangladesh – 0.3M
Panama – 0.3M
Kyrgyzstan – 0.3M
Sri Lanka – 0.3M
Mongolia – 0.3M
Iceland – 0.3M
Oman – 0.2M
Zambia – 0.2M
Honduras – 0.2M
Libya – 0.2M
Paraguay – 0.2M
Cuba – 0.2M
Mauritius – 0.2M
Georgia – 0.2M
Nepal – 0.2M
Qatar – 0.2M
Cote d’Ivoire – 0.2M
Guyana – 0.2M
Trinidad & Tobago – 0.2M
Bahrain – 0.2M
Armenia – 0.2M
Congo, Dem. Rep. – 0.1M
Nicaragua – 0.1M
Malta – 0.1M
Ethiopia – 0.1M
Bahamas – 0.1M
Guadeloupe – 0.1M
Namibia – 0.1M
Fiji – 0.1M
Botswana – 0.1M
Brunei Darussalam – 0.1M
Malawi – 0.1M
Gambia – 49K
Lesotho – 43K
Cambodia – 41K
French Guiana (FR) – 38K
Greenland – 38K
Rwanda – 38K
Congo – 36K
Guernsey & Alderney – 36K
Turkmenistan – 36K
Belize – 35K
Afghanistan – 30K
Suriname – 30K
Virgin Islands (US) – 30K
Jersey – 27K
Burundi – 25K
Laos – 25K
Dominica – 20.1K
Antigua & Barbuda – 20K
Liechtenstein – 20K
Sao Tome & Principe – 20K
Seychelles – 20K
Grenada – 19K
Maldives – 19K
San Marino – 14K
Micronesia – 14K
Djibouti – 9K
Solomon Islands – 8K
St. Vincent & the Grenadines – 8K
Vanuatu – 8K
Gibraltar – 6K
Samoa – 6K
Tajikistan – 5K
British Virgin Islands – 4K
Cook Islands – 4K
Anguilla – 3K
Tonga – 3K
Kiribati – 2K
East Timor – 1K
Saint Helena (UK) – 1K
Norfolk Island – 0.7K
Niue – 0.5K
Nauru – 0.3K
American Samoa – –
Man, Isle of – –
Mayotte (FR) – –
Montserrat – –
Pitcairn Islands – –
Tokelau – –

A few other interesting tidbits.

  • Most interesting Google domain: google.off.ai
  • Domain squatters. Google owns google.com.ua (ukraine) and google.com.vc (st vincent and the grenadines), but not google.ua and google.vc. Google also doesn’t own google.cm – which could be a typo for google.com.
  • Biggest countries (by Internet users) without Google: Iran (7.5M), Sudan (2.8M), Algeria (1.9M), Syria (800K)

I used internet user statistics from http://www.internetworldstats.com.

As always, it’s easy to miss things. Please let me know if I missed something.

What your Alexa stats would look like after 20 diggs in 30 days

Analysis, Digg. Posted on December 9, 2006 at 3:03 pm

nwfdailynews.com (Northwest Florida) has had 20 stories dugg in the last 30 days (including 5 on 1 day alone). The graph below shows Alexa’s traffic stats along with the dates that the stories were dugg. Top Digger giantapplecore (ranked #39) submitted all 20 of those stories, and 147 stories from NWFDailyNews over the last two months.

Just about all the articles submitted by giantapplecore are syndicated AP articles. Given the tremendous success giantapplecore has had with Digg, I would guess that NWFDailyNews is a model for monetizing Digg traffic. Take a look at the adsense placements on this page: I would expect that the upper right box has been moved around a bunch and been fairly well optimized.

alexa2.PNG

Although NWFDailyNews seems to be oriented towards conservatives, it doesn’t look like giantapplecore’s submissions have a particular political bias. He is very good at writing compelling articles and headlines and I would guess that he has found a profitable system for making money off of Digg and is just repeating it over and over.

Syndicated news stories aren’t what Digg is about (kind of like blog spam), but news.yahoo.com is the third most submitted site and they are largely a news syndicator.

What’s even more impressive is how giantapplecore has used Digg to build nwfdailynews from nothing 6 months ago to a top 20,000 site on Alexa:

graph.png

Which sites do the Top Diggers Read?

Analysis, Digg. Posted on December 9, 2006 at 12:30 am

How do you influence the influencers?

I’ve become slightly obsessed with figuring out Digg. I’m fairly confident that the top Diggers represent the key to understanding Digg, so I scraped the most recent 150 submitted stories from the top 100 Diggers (just under 15,000 stories) with the goal of finding out what they read.

A few observations:

  • Although the list largely consisted of several tech-centric sites, there were a few surprises on the list – several domains that I’d never heard of before
  • Many Top Diggers have a pretty narrow reading list. However, there are those that are pretty adventurous – I’ll put together a separate post about those diggers
  • Several sites have benefited tremendously from “Patron Diggers” – one or two diggers that religiously read and submit from the site (in some cases they may be site owners, employees, etc.) See this post on nwfdailynews.com and GiantAppleCore.

The first output from my research is the top 50 domains that Top Diggers are reading. I’ve highlighted the domains where a single Digger accounted for an inordinate share of Diggs.

#   Domain                Total Submits from Top 100 Diggers  # Unique Diggers  Patron Digger (# Submits)
1   youtube.com           514                                 73                Foenetik (105)
2   news.com.com          358                                 57
3   news.yahoo.com        298                                 56
4   nytimes.com           249                                 49                Parislemon (72)
5   news.bbc.co.uk        210                                 53                iFelix (71)
6   today.reuters.com     179                                 44
7   thinkprogress.org     175                                 11                jlegum (119)
8   physorg.com           166                                 28
9   livescience.com       157                                 16                starexplorer (104)
10  arstechnica.com       154                                 35
11  nwfdailynews.com      147                                 1                 giantAppleCore (147)
12  washingtonpost.com    142                                 35
13  engadget.com          119                                 41
14  breitbart.com         117                                 10                elebrio (87)
15  abcnews.go.com        114                                 33
16  cnn.com               105                                 33
17  wired.com             102                                 40
18  video.google.com      92                                  37
19  gearlive.com          91                                  2                 andru (90)
20  theinquirer.net       91                                  26
21  businessweek.com      89                                  29
22  gizmodo.com           86                                  26
23  msnbc.msn.com         85                                  45
24  forbes.com            79                                  34
25  linuxdevices.com      79                                  7                 deviceguru (50)
26  howtoforge.com        76                                  5                 hausmasta (72)
27  eweek.com             71                                  23
28  money.cnn.com         68                                  27
29  eurekalert.org        66                                  11
30  lewrockwell.com       66                                  2                 Rhiannon1214 (50)
31  betanews.com          65                                  10
32  informationweek.com   63                                  20
33  kotaku.com            61                                  18
34  space.com             60                                  10
35  usatoday.com          56                                  31
36  joystiq.com           55                                  18
37  theregister.co.uk     54                                  21
38  sciencedaily.com      53                                  12
39  sports.espn.go.com    53                                  14
40  timesonline.co.uk     53                                  31
41  latimes.com           52                                  21
42  tgdaily.com           51                                  11                Bleek-II (30)
43  newscientist.com      50                                  19
44  news.zdnet.com        48                                  16
45  mediamatters.org      47                                  6                 snipehack (38)
46  time.com              47                                  25
47  blogs.zdnet.com       46                                  20
48  dailytech.com         45                                  16                Bleek-II (20)
49  guardian.co.uk        45                                  27
50  sfgate.com            45                                  22                Tomboy501 (16)

nwfdailynews.com is the Northwest Florida News. Go figure.
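
For the curious, the tally behind a table like this can be sketched in a few lines of Python. This is an illustration rather than the script actually used: it assumes the scraped data is a list of (digger, URL) pairs, and the 20% cutoff for flagging a Patron Digger is a hypothetical threshold, since the post only says ‘an inordinate share’.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

PATRON_SHARE = 0.2  # hypothetical cutoff; the post does not state one


def domain_of(url):
    """Hostname of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def summarize(submissions):
    """Turn (digger, url) pairs into (domain, total, unique diggers, patron) rows."""
    per_domain = defaultdict(Counter)
    for digger, url in submissions:
        per_domain[domain_of(url)][digger] += 1

    rows = []
    for domain, counts in per_domain.items():
        total = sum(counts.values())
        top_digger, top_count = counts.most_common(1)[0]
        patron = (top_digger, top_count) if top_count / total >= PATRON_SHARE else None
        rows.append((domain, total, len(counts), patron))
    # Most-submitted domains first
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical scraped submissions
sample = [
    ("alice", "http://www.example.com/story-1"),
    ("alice", "http://example.com/story-2"),
    ("bob", "http://example.com/story-3"),
    ("carol", "http://news.example.org/item"),
]
for row in summarize(sample):
    print(row)
```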

Update

I’ve uploaded the source data I used for the analysis. Use that file to answer questions and comments along the lines of “You forgot ‘insert favorite site here’. I see that site on Digg all the time”. You’ll feel better knowing that I didn’t forget it – this was a study of the top diggers, not a study of all the sites submitted to Digg.

What I learned from my first attempt to get Dugg

Analysis, Digg. Posted on December 3, 2006 at 10:01 pm

I submitted a post to digg on the Google Holiday gifts, hoping that it would be Dugg. As far as I can tell, I was the first person on the net to post a picture of the Google present, although someone had already posted the specs on a Webmasterworld thread. My post didn’t hit the front page, but several days later a very similar one did. My instincts were right (that the content was interesting to Digg users), but my execution was off.

Here is my submission:

Google’s 2006 Holiday Gift to Publishers – See a Photo (11 diggs)
submitted by davenaff 5 days ago (via http://www.naffziger.net)
Google is sending out LCD photo frames to Publishers. Check out a picture of the Google Holiday package – it looks like they are preparing to send this out worldwide.

Here is the submission that hit the front page (submitted two days later):

Here it is: ‘The 2006 Google Christmas Card’ (747 Diggs)
submitted by CLIFFosakaJAPAN 3 days ago (via http://www.neatorama.com)
What do you have to do to get on Google’s Christmas card mailing list? Shawn Hogan describes the gift he received, a digital photo frame. I feel a little jealous…

I learned a few things:

  • Write titles that appeal to a broad audience – I narrowed the audience by using the words ‘to Publishers’. The successful post title was unclear about who would receive a Google holiday card (or that it was even something physical). Heck, for all the reader knew, they too could receive the Google holiday card.
  • Ask a question – Asking a question in the description encourages interaction and user comments – especially if the user knows the answer (and thinks the submitter doesn’t). It also makes users think about the story more. My most successful AdWords campaigns would combine a question with a call to action. I wonder if the same thing applies here.
  • Be active – Ideally get a top Digger to submit. CLIFFosakaJAPAN is the #5 digg user. The top Digg users must be getting pitched stories, and if they aren’t I’m sure they will in the future. A story submitted by a top user has a much greater chance of getting to page 1. I wouldn’t be surprised if these pitches aren’t that sophisticated and I highly doubt that many PR agencies have established relationships with them (they should).

There is a ton of good content on the web about submitting to Digg, but these were my first takeaways and I felt it was useful to highlight them with a real-world successful/unsuccessful example.

Sample Size vs. Sample Bias

Analysis, Business. Posted on October 11, 2006 at 5:40 pm

There are numerous posts online about how the various online measurement firms present very different views on things like unique users and page views. A few of the better ones I’ve read include Fred Wilson’s ‘Whose Numbers are right?’, Donna Bogatin’s ‘Data Attraction: Hard science or numbers game?’ and Avinash Kaushik’s detailed analysis of how Hitwise and Comscore get their data. I’ve even commented on Alexa’s fallibility w.r.t. Judy’s Book Traffic.

Invariably, someone comments that the company with the biggest sample must be the most reliable. However, people often overlook the point that sample size and sample bias are both essential to creating a statistically valid sample. In fact, sample bias is typically far more important if you’re trying to extrapolate the data you find.

Sample Size

Sample size is a hard concept for people to grok. Yes, a larger sample is better, but a larger sample only narrows the margin of error of an estimate. Offline survey firms routinely run statistically valid surveys that use only 500 to 1000 people to estimate the way that the nation feels on an issue. Using a simplified example, the margin of error at a 99% confidence level for different sample sizes looks like this (assuming a population of 250 Million & using this sample size tool http://www.surveysystem.com/sscalc.htm):

  • 1000 (+/- 4.08%)
  • 10,000 (+/- 1.29%)
  • 100,000 (+/- 0.41%)
  • 1,000,000 (+/- 0.13%)
  • 10,000,000 (+/- 0.04%)

As you can tell, the extra 9 million sample points don’t greatly increase your confidence that Google Search improved or declined vis-a-vis Yahoo search. A larger sample does help collect enough data to estimate long-tail usage, but it doesn’t better quantify top websites.
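
The margins listed above can be approximated with the standard formula for a proportion. The Python sketch below is only an illustration and makes several assumptions: a worst-case 50/50 split, a z-value of 2.58 for 99% confidence, and a finite population correction for the 250 million population. The results should land close to the list above; small differences in the last digit can come from the z-value and rounding used.

```python
import math

Z_99 = 2.58  # assumed z-value for 99% confidence (commonly rounded from 2.576)


def margin_of_error(sample_size, population=250_000_000, z=Z_99, p=0.5):
    """Margin of error as a fraction, with a finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * fpc


for n in (1_000, 10_000, 100_000, 1_000_000, 10_000_000):
    print(f"{n:>10,}: +/- {margin_of_error(n) * 100:.2f}%")
```

Because the margin shrinks with the square root of the sample size, each 10x increase in the sample only cuts the margin by about a factor of three.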

Sample Bias

Offline measurement firms go to great lengths to reduce the bias of a sample. Folks like Gallup, Ipsos & Nielsen understand that sample bias can completely corrupt the results of a survey or sample. Examples of bias offline include:

  • bias that someone has a landline (younger people are less likely to have landlines)
  • bias that someone is home when you call (families are more likely to be home at dinner)
  • bias introduced by where you approach someone (in a mall or a coffee shop as an example)

There is a great Wikipedia entry on the topic of selection bias (the kind discussed above), and sample bias in general.

Online, bias can be far worse and it is much more difficult to generate unbiased samples. A few examples of the bias introduced by Hitwise, Comscore & Alexa:

  • Hitwise: biased towards users at home (they get their logs from consumer ISPs)
  • Comscore: biased towards people that click on ads to ‘speed up their internet’ or ‘protect your computer from email viruses’
  • Alexa: biased towards people that install the Alexa toolbar. These are widely believed to be webmasters that are curious about their own sites and those of others

In addition, all of the basic stats on these services can easily be gamed, introducing further bias. See Markus Frind’s post on how these services get gamed.

It is easy to see that online, sample bias has a greater impact on data quality than differences in sample size. One final note on sample bias: it is ok if you are aware of the bias and can quantify it (and therefore remove it). This can be an incredibly complex process, and can easily yield imperfect results that are believed to be accurate.
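
As a toy illustration of quantifying and removing a known bias, the Python sketch below reweights a sample that over-represents home users. Everything in it is hypothetical: the groups, the counts and the population mix are invented for the example, and it assumes the true mix of the population is known, which is exactly the hard part in practice.

```python
def raw_share(sample):
    """Share of all sampled users who visited the site, ignoring any bias."""
    total = sum(n for n, _ in sample.values())
    hits = sum(h for _, h in sample.values())
    return hits / total


def reweighted_share(sample, population_mix):
    """Weight each group's visit rate by that group's share of the real population."""
    return sum(population_mix[group] * (hits / n) for group, (n, hits) in sample.items())


# Hypothetical panel: group -> (users sampled, users who visited the site)
panel = {"home": (900, 90), "work": (100, 30)}
# Hypothetical true mix of where people browse
true_mix = {"home": 0.6, "work": 0.4}

print(f"raw estimate:        {raw_share(panel):.0%}")
print(f"reweighted estimate: {reweighted_share(panel, true_mix):.0%}")
```

The two estimates differ noticeably even though the sample is the same, and the reweighted figure is only as good as the assumed population mix.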

All of these services provide interesting data, but they are not dependable enough to base strategic business decisions on.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. | Dave Naffziger's Blog | Dave & Iva Naffziger