Matt Cutts (@mattcutts), head of the Google Webspam Team, put out a tweet today asking for help identifying scraper site URLs that outrank the original source of content on Google. Here at Mobility Digest, we could not be more pleased! Mobility Digest certainly is not the largest mobile content provider on the web, but we have been the victim of scrapers many times in the past. So what do you do if you come across one of these scraper sites with a better ranking than the original? Head on over to Google’s Scraper Report Page:
There is a pretty big conversation happening on this at Hacker News. Opinions seem to be mixed on Google’s intentions, but at least for me it’s a start. Check it out and report any scraper sites you encounter. Not familiar with what scraping is?
Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Internet Explorer or Mozilla Firefox.
Web scraping is closely related to web indexing, which indexes information on the web using a bot or web crawler and is a universal technique adopted by most search engines. In contrast, web scraping focuses more on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to web automation, which simulates human browsing using computer software. Uses of web scraping include online price comparison, contact scraping, weather data monitoring, website change detection, research, web mashup and web data integration.
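To make the "unstructured HTML into structured data" idea a little more concrete, here is a minimal sketch using only Python's standard library. Everything in it is made up for illustration: the sample HTML, the phone names and prices, and the names `PriceTableParser`, `scrape_prices` and `to_csv` are assumptions, not taken from any real site or tool. A real scraper would first fetch the page over HTTP and would have to cope with much messier markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<table>
  <tr><th>Phone</th><th>Price</th></tr>
  <tr><td>Example Phone A</td><td>199.00</td></tr>
  <tr><td>Example Phone B</td><td>349.50</td></tr>
</table>
"""


class PriceTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._in_cell = False
            self._row[-1] = self._row[-1].strip()
        elif tag == "tr" and self._row is not None:
            # Header rows only contain <th> cells, so they end up empty
            # and are skipped here.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def scrape_prices(html):
    """Turn the HTML table into a list of {"phone", "price"} records."""
    parser = PriceTableParser()
    parser.feed(html)
    parser.close()
    return [{"phone": name, "price": float(price)} for name, price in parser.rows]


def to_csv(records):
    """Serialize the records as CSV text, ready for a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["phone", "price"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(scrape_prices(SAMPLE_HTML)))
```

The same pattern (pull specific pieces out of the markup, store them in rows) is what makes price comparison useful, and it is also what lets a scraper site lift whole articles and republish them.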