The Statistics Behind Digg Submissions
Ever since Digg announced their API, I’ve been eager to see what stats I could generate. Since my wife is out at Book Club tonight, I spent a bit of time with Digg’s API. Everything below is based on the stories submitted in May:
How long does it take for stories to get promoted?
Very few stories get promoted within 2 hours of submission, and very few get promoted after 24 hours. In practice, a story has a window of opportunity that lasts about 24 hours after submission.
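For anyone who wants to reproduce this kind of chart, here is a minimal sketch of how the delay between submission and promotion could be bucketed by hour. The record layout (a `submit_date` and `promote_date` in Unix seconds, with `promote_date` set to `None` for stories that were never promoted) is an assumption made for illustration, and the sample records are made up rather than taken from the May data.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative assumptions.
stories = [
    {"submit_date": 0, "promote_date": 3 * 3600},
    {"submit_date": 0, "promote_date": 10 * 3600 + 1800},
    {"submit_date": 0, "promote_date": 30 * 3600},
    {"submit_date": 0, "promote_date": None},  # never promoted
]

def hours_to_promotion(story):
    """Return whole hours between submission and promotion, or None if never promoted."""
    if story["promote_date"] is None:
        return None
    return (story["promote_date"] - story["submit_date"]) // 3600

def promotion_delay_histogram(stories):
    """Count promoted stories by the number of whole hours they took to get promoted."""
    delays = (hours_to_promotion(s) for s in stories)
    return Counter(d for d in delays if d is not None)

histogram = promotion_delay_histogram(stories)
```

Plotting `histogram` by hour would show where most promotions fall inside the 24-hour window.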
Introducing ‘Promote Rate’
To date, the most interesting studies done on Digg have involved basic analysis of already promoted stories. Pronet Advertising has a good look at the top 10 brands on Digg, and SEOMoz has a YouMoz article on Digg that talks about the best time to submit a story.
While both of these articles are quite interesting, I think the greatest indicator of success on Digg is something I’ve been calling ‘Promote Rate’. Basically, it is the percentage of submitted stories with a given set of characteristics that were promoted to the first page.
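As a rough sketch of the definition above, Promote Rate for a group is just promoted stories divided by submitted stories in that group. The `promoted` flag, the `category` values and the sample records below are assumptions for illustration, not the actual fields or data I worked with.

```python
from collections import defaultdict

def promote_rate(stories, key):
    """Map each value of key(story) to the fraction of its stories that were promoted."""
    submitted = defaultdict(int)
    promoted = defaultdict(int)
    for story in stories:
        group = key(story)
        submitted[group] += 1
        if story["promoted"]:
            promoted[group] += 1
    return {group: promoted[group] / submitted[group] for group in submitted}

# Made-up sample records (hypothetical field names and values).
sample = [
    {"category": "linux_unix", "promoted": True},
    {"category": "linux_unix", "promoted": False},
    {"category": "business_finance", "promoted": False},
    {"category": "business_finance", "promoted": False},
]
rates = promote_rate(sample, key=lambda s: s["category"])
```

Passing a different `key` function (hour of submission, day of week, whether the submitter has an image) gives the Promote Rate for that characteristic instead.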
Best Time of Day to Submit to Digg:
Promote rates are higher on the weekends and in the evenings. A story submitted around 9 PM on a weekday enjoys a 66% higher promote rate than one submitted at 8 AM.
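To make the "66% higher" comparison concrete, here is a small sketch of bucketing a story by its submission hour and expressing one promote rate relative to another. The `submit_date` field in Unix seconds and the use of UTC are assumptions; the time zone matters a lot for an hour-of-day analysis, so swap in whichever one you actually mean.

```python
from datetime import datetime, timezone

def submit_hour(story):
    """Hour of day (0-23) of submission, assuming submit_date is Unix seconds and UTC."""
    return datetime.fromtimestamp(story["submit_date"], tz=timezone.utc).hour

def relative_lift(rate_a, rate_b):
    """How much higher rate_a is than rate_b, as a fraction (0.66 means 66% higher)."""
    return rate_a / rate_b - 1

# Made-up rates chosen only to illustrate the arithmetic, not the measured values.
evening_rate = 0.083
morning_rate = 0.05
lift = relative_lift(evening_rate, morning_rate)
```

With these illustrative inputs, `lift` comes out at about 0.66, which is what a "66% higher promote rate" means.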
Best Category to Submit to:
OK, so an article submitted to “Linux/Unix” looks to be 16x more likely to get promoted than one submitted to “Business & Finance”. Certainly Diggers prefer Linux stories to the latest TPG buyout.
How much of this preference comes from the topic itself, and how much from the category the article is filed under? I looked at all of the stories submitted with the word ‘Linux’ in the title inside and outside the “Linux/Unix” category:
Articles with the word ‘Linux’ in the title are promoted 9x more frequently if they are submitted in the “Linux/Unix” category.
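The comparison above can be sketched roughly as follows: keep only stories whose title contains the word, split them by whether they were filed in the category, and compute a promote rate for each side. The field names, the case-insensitive substring match and the sample records are all assumptions for illustration.

```python
def split_promote_rates(stories, word, category):
    """Promote rate of stories whose title contains `word`, inside vs. outside `category`."""
    counts = {True: [0, 0], False: [0, 0]}  # in_category -> [promoted, submitted]
    for story in stories:
        if word.lower() not in story["title"].lower():
            continue  # title does not mention the word at all
        in_category = story["category"] == category
        counts[in_category][1] += 1
        counts[in_category][0] += int(story["promoted"])

    def rate(pair):
        promoted, submitted = pair
        return promoted / submitted if submitted else None

    return rate(counts[True]), rate(counts[False])

# Made-up sample records (hypothetical field names and values).
sample = [
    {"title": "Linux kernel released", "category": "linux_unix", "promoted": True},
    {"title": "Why I switched to Linux", "category": "linux_unix", "promoted": False},
    {"title": "Linux on a toaster", "category": "gadgets", "promoted": True},
    {"title": "Linux at work", "category": "business_finance", "promoted": False},
    {"title": "Dell ships linux laptops", "category": "hardware", "promoted": False},
    {"title": "Is Linux ready?", "category": "software", "promoted": False},
    {"title": "TPG buyout news", "category": "business_finance", "promoted": True},
]
inside, outside = split_promote_rates(sample, "Linux", "linux_unix")
```

Dividing `inside` by `outside` gives the "Nx more frequently" figure; in this toy sample it is 2x, not the 9x seen in the real data.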
Does having a user image matter?
Users with images have more stories promoted than users without images. I would posit that a user image may indicate an active user with more friends, but submit stories without an image at your own risk =).
Anyway, that’s all for this evening. I’m looking at a few more things and will post a follow-up in a little while.
Notes:
- Be careful with causality. While I think some of the conclusions are reasonable, I haven’t always done the work needed to establish causality, so we may just be seeing correlation.
- I experienced XML errors with a small fraction of the calls to the Digg API - I didn’t try to recover these records, so the dataset is not 100% complete.
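The second note amounts to skipping any response that fails to parse. A minimal sketch of that approach, using Python's standard `xml.etree.ElementTree`, is below. The element names in the sample strings are made up and are not meant to mirror Digg's actual response format.

```python
import xml.etree.ElementTree as ET

def parse_responses(raw_responses):
    """Parse each XML string, skipping (and counting) any that are malformed."""
    parsed, failed = [], 0
    for raw in raw_responses:
        try:
            parsed.append(ET.fromstring(raw))
        except ET.ParseError:
            failed += 1  # record is dropped, not recovered
    return parsed, failed

# Hypothetical responses: the second one is truncated and will not parse.
responses = [
    '<stories><story id="1"/></stories>',
    '<stories><story id="2">',
]
roots, failures = parse_responses(responses)
```

Keeping the `failures` count around makes it easy to report how incomplete the dataset is.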