Algorithmic vs. manual search engine spam detection
Search engines try to build every spam detection method into their algorithms (spam here meaning highly irrelevant results), but in many cases they still rely on people to verify whether a website is really breaking the webmaster guidelines (for Google, the quality guidelines).
When does a search engine use manual spam detection?
Required resources
Search engine algorithms keep getting more advanced, and engines like Google can detect almost everything. The only catch: “It takes a lot of resources to run every check all the time!” Therefore only the cheap checks run continuously, more expensive checks run less frequently, and the final arbitrage on suspected spam is done manually.
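The post doesn't describe how any engine actually schedules its checks, so the following is only a minimal Python sketch of the cost-based idea above. The `Check` class, the cost figures, `CHEAP_LIMIT` and `EXPENSIVE_INTERVAL` are all hypothetical; cheap checks run on every crawl, costly ones only periodically, and the manual arbitrage step is not modeled.

```python
from dataclasses import dataclass

# Hypothetical thresholds (arbitrary units); real engines don't publish these.
CHEAP_LIMIT = 1.0          # checks at or below this cost run on every crawl
EXPENSIVE_INTERVAL = 10    # costlier checks run only on every 10th crawl


@dataclass
class Check:
    name: str
    cost: float  # assumed estimated cost per page


def checks_for_crawl(checks, crawl_number):
    """Return the names of the checks to run on this crawl:
    cheap ones always, expensive ones only periodically."""
    selected = []
    for check in checks:
        if check.cost <= CHEAP_LIMIT:
            selected.append(check.name)
        elif crawl_number % EXPENSIVE_INTERVAL == 0:
            selected.append(check.name)
    return selected


checks = [
    Check("keyword_stuffing", 0.2),
    Check("hidden_text_css", 0.8),
    Check("link_network_analysis", 25.0),
]

print(checks_for_crawl(checks, 3))   # ['keyword_stuffing', 'hidden_text_css']
print(checks_for_crawl(checks, 10))  # ['keyword_stuffing', 'hidden_text_css', 'link_network_analysis']
```

A spam tactic that only an expensive check can catch is, in this model, inspected far less often, which is the point the post returns to later.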
Human intervention
Another problem besides server resources is: “When is spam intentional?” Is some text hidden because someone wants to fool the search engine, or because it is part of a drop-down menu? Some intentions can be “guessed” by algorithms, but sometimes a human is needed to do the arbitrage. Search engines do as much as possible within their algorithms and mostly use humans to teach the algorithm how to recognize spam, but all major search engines use human arbitrage.
The arbitrage process
There are several degrees of illegal activity that Google (let’s use it as an example) could distinguish between:
- No spam, legal content and activities
- Possible spam in content. Detected, but impossible to confirm algorithmically. (action: orange/red flag, little repercussion in ranking)
- Certain light spam in content. Detected by algorithm. (action: little repercussion in ranking)
- Certain heavy spam in content. Detected by algorithm. (action: red flag, major drawback in ranking)
- Certain heavy spam in content. Flagged and verified by human. (action: if heavy -> removed from visible index)
- Possible spam in external factors. Detected, but impossible to confirm algorithmically. (action: orange/red flag)
- Certain spam in external factors. Detected, but impossible to determine whether the owner is to blame. (action: orange/red flag)
- Certain spam in external factors. Detected by algorithm. (action: if heavy -> removed from visible index)
- Certain heavy spam in external factors. Flagged and verified by human. (action: if heavy -> removed from visible index)
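The degrees above are the author’s speculation, not a published Google policy. As a rough sketch, they can be read as a lookup table keyed on where the spam sits and how certain the finding is. The key names (`"possible"`, `"owner_unclear"`, `"human_verified_heavy"`, and so on) and the `arbitrage_action` function are hypothetical labels for the list items; combinations the list doesn’t cover (such as light, algorithmically certain spam in external factors) fall through to “no action specified”.

```python
# Hypothetical encoding of the degrees listed above; the keys and action
# strings are illustrative labels, not a search engine's real categories.
ACTIONS = {
    ("content", "possible"): "orange/red flag, little ranking repercussion",
    ("content", "certain_light"): "little ranking repercussion",
    ("content", "certain_heavy"): "red flag, major ranking drawback",
    ("content", "human_verified_heavy"): "removed from visible index",
    ("external", "possible"): "orange/red flag",
    ("external", "owner_unclear"): "orange/red flag",
    ("external", "certain_heavy"): "removed from visible index",
    ("external", "human_verified_heavy"): "removed from visible index",
}


def arbitrage_action(area: str, finding: str) -> str:
    """Map (area, finding) to the action from the list above.

    area: "content" or "external"
    finding: "none" or one of the finding labels used as keys in ACTIONS
    """
    if finding == "none":
        return "no action"  # no spam, legal content and activities
    return ACTIONS.get((area, finding), "no action specified")


print(arbitrage_action("content", "certain_heavy"))   # red flag, major ranking drawback
print(arbitrage_action("external", "certain_light"))  # no action specified
```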
The flags themselves have little or no effect on your ranking, but as more red flags are raised for each violation, the arbitrage gets easier for the search engine. When a human editor is needed, your websites (yes, websiteS!) will be examined more closely, and your spammy tactic might get more than one site penalized. If you’re a notorious spammer, you get watched more closely and have a much bigger chance of being caught. If you’re really cool, your technique becomes the cause of a new algorithm change (though you will never find out whether it did).
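To make the escalation idea concrete, here is a minimal sketch of flag accumulation across one owner’s sites. It assumes the engine can attribute sites to an owner, and the `FlagTracker` class and both thresholds are hypothetical. Once an owner’s combined red flags reach the threshold, every site the tracker has linked to that owner goes to human review, and a known spammer reaches that point sooner.

```python
from collections import defaultdict

# Hypothetical thresholds: combined red flags needed to trigger human review.
REVIEW_THRESHOLD = 3
NOTORIOUS_THRESHOLD = 1  # known spammers are watched more closely


class FlagTracker:
    def __init__(self):
        self.flags = defaultdict(int)  # site -> number of red flags
        self.owner_of = {}             # site -> owner id (assumed attributable)
        self.notorious = set()         # owners already caught before

    def raise_flag(self, site, owner):
        """Record one red flag for a site and remember who owns it."""
        self.owner_of[site] = owner
        self.flags[site] += 1

    def sites_for_review(self, owner):
        """If the owner's combined flags reach the applicable threshold,
        return all of that owner's known sites for human review."""
        sites = [s for s, o in self.owner_of.items() if o == owner]
        total = sum(self.flags[s] for s in sites)
        threshold = NOTORIOUS_THRESHOLD if owner in self.notorious else REVIEW_THRESHOLD
        return sorted(sites) if total >= threshold else []


tracker = FlagTracker()
tracker.raise_flag("a.example", "owner1")
tracker.raise_flag("b.example", "owner1")
print(tracker.sites_for_review("owner1"))  # []
tracker.raise_flag("b.example", "owner1")
print(tracker.sites_for_review("owner1"))  # ['a.example', 'b.example']
```

Summing flags per owner rather than per site is what lets one questionable flag on each of several sites add up to a review of all of them.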
Using this to your advantage
Always consider the server load needed to detect your spam tactic. Search engines hate wasting resources, so you might get away with it.
Don’t raise too many red flags: you might get away with some heavy spam if it raises just one questionable flag. Humans are an even more expensive resource, so search engines reserve them for arbitrage when the outcome is almost certain (or for detecting new spam techniques so the algorithm can be updated).
Once you’ve been flagged as a notorious spammer: “Change your identity!” and break every link between your detected and undetected spam. They will keep a close eye on you!
December 6th, 2006 at 12:40 pm
Excellent article! Maybe you should post the damn thing on Digg!
February 20th, 2007 at 1:33 pm
All makes sense to me.
Where does the user-submitted spam report fit in, I wonder? Surely wading through these spam reports is a human process, and I’m sure the spam report form gets spammed by people trying to get their competition banned.
I wouldn’t want to be the one writing the spam detection algos.