If you’re keeping track of your website’s traffic through Google Analytics, you’ve probably noticed referral visits from a website called semalt.com in recent weeks. Semalt is a web crawler designed to gather data for Semalt’s marketing platform. The visits showing up in your logs come from automated programs interacting with your site, not from people.
The difference between Semalt.com and reputable crawlers
If you look through your Google Analytics referral data, you’ll notice that other large web crawlers such as Googlebot, MJ12bot, Rogerbot and Bingbot don’t show up there. Semalt’s crawlers appearing in your traffic data is unusual, because most bots identify themselves as web crawlers and are thus excluded from it. Semalt’s visits are instead counted like visits from real people, which skews your traffic data, especially for smaller sites where semalt.com visits make up a larger percentage of traffic.
Semalt also doesn’t respect robots.txt (an easy way for webmasters to keep bots away from their sites), instead asking that concerned webmasters seek them out and add themselves to a no-crawl list that Semalt maintains. I reached out to Semalt’s Alex Andrianov on Twitter to ask if their crawlers were ignoring robots.txt. He confirmed that Semalt.com’s crawler doesn’t respect robots.txt and claimed that they were unable to make it do so.
How to stop semalt.com from visiting your site
As Alex suggests, you can submit your site to Semalt at their site to ask for removal from their crawl, though there’s no way to tell if they’ll act on this request. As I’m inclined to distrust crawlers that don’t respect robots.txt, I’ve opted to block their access to my site through .htaccess, as outlined by logorrhoea.net.
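For reference, a referrer-based block in .htaccess might look something like the minimal sketch below. It assumes an Apache server with mod_rewrite enabled and .htaccess overrides allowed, and it is an illustration rather than necessarily the exact rule logorrhoea.net describes.

```apache
# Minimal sketch: refuse any request whose Referer header mentions semalt.com.
# Assumes Apache with mod_rewrite available; the block is skipped if it is not.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # [NC] makes the match case-insensitive
    RewriteCond %{HTTP_REFERER} semalt\.com [NC]
    # "-" leaves the URL unchanged; [F] answers with 403 Forbidden
    RewriteRule .* - [F]
</IfModule>
```

Requests carrying a semalt.com referrer should then receive a 403 response, while requests with any other referrer (or none) are left alone.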
Update 15/4/14: Semalt manager Alex Andrianov suggested that parts of this post may be factually incorrect, as I failed to note that I am not a Semalt customer. I would like to state that I am not a Semalt client, but I stand by the information listed here as true and welcome any factual corrections.