Man, I Hate Content Thieves
If you’ve ever run your own blog, you may have noticed that the Internet is a cesspool of evil-doers who’ve set up their own WordPress blogs to scour other blogs’ RSS feeds and leech new posts to drop onto their own sites. They write no original content of their own, and oftentimes they don’t even pick which feeds to suck up. There are WordPress plugins that not only leech and publish but also hunt down RSS feeds on their own that appear relevant to whatever topics the scumbag selects. This is known as autoblogging, blog scraping and splogging; it’s so prevalent that there are actually multiple terms for the practice.
There are even some autoblogging plugins (hosted, if you can believe it, on the main WordPress repository and not on some shady torrent site) that go into each post they leech and use sophisticated thesaurus-like tricks to reword and rephrase segments, and even somehow rework each post’s tags, so that the actual author has a more difficult time Googling down his leeched content. And more often than not they hotlink the accompanying article images, meaning that when someone views their blog, that person’s browser hits our server, draining a bit of bandwidth each time and adding to our CPU load. Bandwidth (as you may have noticed if you’ve tried to load our website) and CPU are very finite resources for us. Each little hit matters; it’s not entirely a point of pride.
What’s even more frustrating is that these guys and their ISPs are about as responsive to requests to cease the theft as those door close buttons in elevators are (close, damnit!). You’re lucky if you can get the guy to remove your site from his list, and that’s a hollow victory, since he’ll keep on doing his thing to a bunch of other blogs and just tell you, “Hey, relax, guy, thought you’d be cool with it.” Among their other active plugins, which undoubtedly and invariably include AdSense widgets, are advanced search engine optimization (SEO) plugins that sift through the autoblog’s unoriginal content to dynamically compile the best set of meta keywords and descriptions in order to rank higher on Google.
In short, they make money by registering a cheap domain name, installing WordPress (if their ISP didn’t already install it) and a couple of plugins, and leaving the thing on autopilot, pulling in money with no work beyond that and signing up for an AdSense account. Google could be an ally to the likes of us here, since no one is better at detecting this activity and downranking it accordingly, but Google has no way of knowing that we don’t have an arrangement with these guys, so that ain’t happening. But WordPress hosting the plugins that facilitate this? Could someone please tell me what possible redeeming value a plugin with the following in its description could have to warrant its redistribution through official channels? Look (but please don’t download) at this one, which has had over 150,000 downloads so far. Just look at it. Blows my mind:
We routinely stumble upon sites that do this to us, and it makes us mad and dizzy. Maybe our panties wad easily, but it’s just infuriating. Even though trying to stop each one is a pointless game of whack-a-mole, we try anyway. The real cost is not so much that people who would otherwise (possibly) visit our site go to some other guy’s site and click his ads, but that we collectively spend a bunch of hours on each pursuit: time and energy we’d otherwise spend writing more original content, which should be our top priority. And then we’ll go back and forth with each other on, for example, installing an RSS fingerprint plugin to detect these guys automatically, or whether to do a server trick that sends some sort of nasty image expressing our disapproval to the sites that hotlink our images. Then one of us will note that that might screw up some of our RSS or Google Reader views, then another will say well no, not if you add this to .htaccess or apache2.conf, or we should try this cloaking thing, or … argh. A simple cost/benefit analysis, in my estimation, says to let it slide in almost every instance, but we’re human and get emotional.
I was writing this just as a rant, to vent, but hey, if you’re ever in the mood, maybe Google a fragment of a recent post (say, one posted six or seven slots down) and see if you find anyone. And if you happen to want to be a lawyer when you grow up and you spot a thief of either our content or that of another blog you frequent, see if you can write an intimidating or otherwise persuasive letter to make them stop, sent to whoever you can figure out to be the blog’s contact, whether from the blog itself, the whois data, the ISP, or all of them. Actually, you know what, don’t do that; it’s a waste of time. And please don’t tip us off either, as we’ll just go crazy whack-a-moling. Just keep reading our site instead, and we’ll try to keep you entertained, informed and fully digested on mobility stuff, or whatever would belong in our mission statement were we to write one. And to those of you reading this post anywhere other than our actual site, our RSS feed or Google Reader, please come on over to MobilityDigest.com, as, though the content may be similar to what you’re reading now, we worked harder on our theme than your guy did.
That’s it, I’m done.
Doug Simmons
Well yes, hence having the RSS feed in the first place, but these guys don’t cite us unless it’s a manual copy/paste job. Some even replace the author’s name (as reported automatically by WordPress) with their own. I could plant our logo or whatever into every post but that wouldn’t help beautify the streams.
If you ever stumble onto one of these sites, you’re given no indication that its content came from another source.
I’m aware it does, but we’re presently trying to paddle upstream toward a larger audience, and switching the feed to teasers at this point would be premature, just as trying to charge for subscriptions would be. For now we’re cool with people reading us in Google Reader, as that makes up a significant chunk of our audience (I use it), and Google does at least indicate where the content came from with a link to jump onto the site.
To try to harness that traffic further, I fired up something that appends each article’s comments to the end of that article in the feed, hopefully enticing people to jump over to the site to get in on the action, or at least giving them a more entertaining experience whether they read in Google Reader, their RSS client or on the site.
I understand your frustrations. On the other hand, aren’t you publishing to reach as many people as you can? Don’t these really help with that goal?
Nice article Doug!
Look at your RSS feed: it displays the entire content. I use Google Reader and really enjoy it when I can get articles in my reader instead of going to the site, because it’s faster.
Also, that means most ads aren’t really translating into $$$!
Sites like Gizmodo and CNET only show excerpts in their feeds, which works for them, but I like how Engadget does it: there are excerpts for most original or lengthy articles, and for the rest you see most, if not all, of the content in the feed. Here’s one way to sort it out:
Most tech blog content is a combination of news from various other sites, reviews and original content (though the definition of that is still unclear).
So news should be displayed in the feed as full posts, and anything else as excerpts.
Mobility Digest definitely has flavor!!!!!!
It is unfortunate that some choose to receive their content from a distance rather than from the source, but some simply can’t stand the heat that results from some of the more volatile articles.
I personally prefer to experience the fun firsthand; some of the more polarizing articles have me chuckling to myself for hours!!!!
I say NUKE the bastards! Back in the old CB radio days, you could do some real damage to a troublemaker with a really strong antenna. Maybe you could find a way to redirect all the spam you have to deflect to these idiots’ sites. That would make for some interesting reading.
Yeah, well, don’t quote me on this, but I’ve suggested we coordinate some good old-fashioned DDoS strikes on these sons of bitches. I’ve got 100 Mbps of pain myself to contribute…
Heh, I was hoping this article would get scraped for the irony (see the trackback). Actually, this site in particular is a less egregious case, as it’s just a teaser that links to our site with no image hotlinking, and I assume he’s doing something similar for all the other blogs he’s scraping, but still.
I just googled “tuck and paste”… at least the autobloggers are linking to you guys instead of WMP LUSER, though…