Interview with Semalt.com

Yesterday I posted about Semalt.com’s crawler and their unusual choice not to have their crawlers identify themselves as web crawlers or obey robots.txt, causing heartaches for analytics-loving webmasters across the web. Semalt’s manager Alex Andrianov reached out through Twitter and offered to answer some of my questions via email. The exchange is included in full below.

Hi, thanks for taking the time to chat with me in a bit more detail about Semalt. Happy to update my blog post with factual corrections you’re able to provide.

You’ve mentioned on twitter that Semalt does not obey robots.txt, further saying that “can’t change it”. Could you explain in a bit more detail what keeps Semalt’s bots from identifying themselves as bots or obeying robots.txt? Is this a talent issue, where your developers haven’t been able to discover the processes to undertake this, or is this part of a business decision on Semalt’s part?

Are there plans in the future to have Semalt’s bots identify themselves properly as crawlers and to obey robots.txt?

You also claimed that my comments at http://www.closetoclever.com/semalt-com/ were incorrect, as I was not a Semalt client. Were there any specific factual errors that you would like to address?

Thanks again for taking the time to answer these,

Jessica Rose

 

Hello Jessica Rose

Thanks for your email.

First of all I would like to bring apology on behalf of my company if our bots caused you some difficulties. I can assure you, all the visits on your website were accidental. At this moment our specialists are taking drastic actions to prevent these visits. Thank you for pointing to our service drawbacks. We appreciate your help and it is very important to us.

Our service has been launched quite recently and unfortunately there are still some bugs and shortcomings. Please, respect this fact. We are working hard trying to fix the existing errors and I hope soon our users won’t have any claims.

As you might notice, every user can manually remove their URLs from the Semalt database using Semalt Crawler. Furthermore, our Support center specialists are ready to come to the aid and remove URLs from the base once the website owner submits a request. We consider every single request and guarantee that every user will get a proper respond.

We realize this may bring some inconveniences, but unfortunately at the moment we can’t offer another way of solving this issue.

As for the comment posted on your blog, I believe it’s impossible to evaluate all the pros and cons unless you have the complete picture of the service. Probably once you try to use Semalt features you will change your mind.

Anyway, we thank you for your feedback, since we appreciate every opinion relating Semalt.

Sincerely yours,

Semalt LLC manager, Alex Andrianov

 

Thanks for the response, but would it be possible to have you address my specific questions more directly?

1. Are you claiming that your bots’ failure to identify themselves as web crawlers is due to a technical failure?
2. Are you claiming that your bots not obeying robots.txt is due to a technical failure?
3. Do you have plans to make your bots identify themselves as web crawlers?
4. Do you have plans to have your bots comply with robots.txt?

Jessie

 

Dear Jessica,

I will try to give the most definite answers to your questions. As I mentioned before our service has recently appeared on the web which causes some technical unavailability. Today we upgrade the web scanning process and adjust our robots. Unfortunately sometimes Semalt bots visit random websites, but we do all our best to solve this problem in the shortest possible time.

Thank you for your email and interest to Semalt.com service. Your opinion is very important to us.

Sincerely yours,

Semalt LLC manager, Alex Andrianov

I’m not sure that’s answering much. I’m really looking to find out:

1. Will your crawlers be respecting robots.txt after your upgrade?
2. Will your crawlers be identifying themselves as web crawlers after your upgrade?

Jessie

He hasn’t yet replied to this email, but responded to tweets on the subject:
[Image: semalt3]
[Image: semalt4]

What we learned from this exchange:

Nothing, really. There were some vague claims that the problems I’ve listed were “bugs”, but nothing that specifically addressed Semalt’s bots ignoring robots.txt or failing to properly identify themselves as web crawlers. Apparently several weeks of visits to sites across the web were “accidental”.

Why this is nonsense:

Given how easy it is to create robots.txt-compliant crawlers, the failure of bots to identify themselves as web crawlers or obey robots.txt can only be viewed as a deliberate choice of the designer or gross incompetence. While my technical skills are also substandard, I’m confident that I would be able to put together a simple web crawler that obeys robots.txt over the weekend (check in on Tuesday, I’ll be posting the results of my efforts). For a professional enterprise that sources data through crawling the web to claim that following industry conventions is beyond their technical ability leaves me wondering if they’re fools or liars.
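To give a sense of scale, here is a minimal sketch in Python of the two conventions in question: checking robots.txt before fetching, and sending a User-Agent that says "I am a bot". This is only an illustration, not the weekend crawler mentioned above. The crawler name, the bot-info URL, and the sample robots.txt are all made up; `urllib.robotparser` and `urllib.request` are part of the Python standard library.

```python
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity (an assumption for illustration). The point
# is that the User-Agent names the bot instead of imitating a browser.
USER_AGENT = "ExampleCrawler/0.1 (+https://example.com/bot-info)"

# Made-up robots.txt content, parsed locally so the sketch needs no network.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def filter_allowed(parser: RobotFileParser, urls: list[str]) -> list[str]:
    """Keep only the URLs that robots.txt permits for our user agent."""
    return [url for url in urls if parser.can_fetch(USER_AGENT, url)]


def fetch(url: str) -> bytes:
    """Download a page while identifying as a crawler (not called here)."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        return response.read()


# Hypothetical URLs: one in a public area, one under the disallowed path.
candidates = [
    "https://example.com/blog/",
    "https://example.com/private/stats.html",
]
allowed = filter_allowed(build_parser(SAMPLE_ROBOTS), candidates)
print(allowed)
```

A real crawler would load each site's live rules with `RobotFileParser.set_url(...)` followed by `read()` instead of parsing a hard-coded string, and would only call `fetch` on URLs that pass the check.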

4 thoughts on “Interview with Semalt.com”

  1. Classic. You went through the trouble of enumerating your questions. Yet, he seems to be either unable or unwilling (more likely) to answer. I would consider this very rude at least. Either that or someone needs to go learn themselves some English…

    Saying things about people not minding google/bing bots simply shows incompetence. Goes to show that as much as you’d like to “communicate while not communicating anything” you will end up implying things.

    1. Yeah, it was fairly frustrating. It makes me increasingly sure they’re scraping intentionally and have no plans to stop

    1. I have tried http://semalt.com/project_crawler.php but I think it is not working. They are doing it intentionally to harm websites. I have also seen ranking drops. This is very serious thing. It should be stopped as early as possible before they infect other sites with their So called Semalt Crawler (A VIRUS).
