There is a Ukrainian bot out there that is crawling and distorting the stats for millions of websites, and to some degree, it is affecting every single site I’ve looked at this past week.
Go to Google Analytics, look at your referrers, and I’ll bet you a beer that SEMalt and crawler.semalt.com are both listed with dozens of visits over this past month. In some cases we’re seeing a history with them dating back to January 2 of this year.
What is SEMalt Up To?
According to one of their employees:
Then he answered again, and fed me this load of BS:
An accident? That’s a lie.
In a page on their website it says this:
Semalt crawler bots visit website and gather statistical data for our service simulating real user behavior: unique IP, browser, display resolution etc. This information is used exclusively within the Semalt.com project and isn’t revealed to a third party.
On their “about” page (their menu is in their footer) they claim to offer various tools, like keyword ranking, brand monitoring, reports, competitor explorer, website analyzer and a report system.
Presumably, these crawls are feeding their “competitor explorer” with info they then provide to their paying subscribers, but I don’t know that to be true.
Here on SEMpdx, this is what the referrals looked like for March, when they visited nearly every single day, 94 times in total.
Does it Matter?
If you’re a medium-sized website you probably didn’t notice, but if you’re a local business that only gets a few hundred visitors a month, you may just find that they are your number one referrer, and that’s severely distorting your stats.
They appeared to be friendly enough when I first Tweeted at them the other day, but I’ve done some digging now, and I distrust them…
Why do I distrust them?
- They visit sites from no consistent IP address or IP range
- They are stealing your bandwidth
- They are using your server resources
- They are skewing your stats
- They do not follow robots.txt
Depending on the size of your site, they may be drastically skewing your overall statistics, from your overall visitor count to your conversion percentages and bounce rates.
For example, for one new local client with only about 300 visitors last month, SEMalt accounted for over 70 visits, which is more than 20% of all their traffic!
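To put a number on that, here is a minimal Python sketch of the calculation. The function name `referrer_share` is made up for illustration, and 300 and 70 are the rounded figures from the example above, not exact counts.

```python
def referrer_share(total_visits: int, referrer_visits: int) -> float:
    """Return the percentage of all visits that came from one referrer."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return 100.0 * referrer_visits / total_visits


# Rounded figures from the client example: ~300 visits, ~70 of them from SEMalt.
share = referrer_share(300, 70)
print(f"{share:.1f}% of last month's visits came from the bot")
```

With those rounded inputs the share works out to roughly 23%, which is consistent with the "more than 20%" figure.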
A Special Message
SEMalt has managed to anger this site owner so much that he added a special message just for them in his website header:
How can SEMalt be stopped?
They put up a page where you can supposedly list your domain for removal, but again, I don’t think I trust them. Here’s a link to their “removal” tool
Since I’ve discovered that blocking them via robots.txt didn’t work, and found that blocking their IP wasn’t possible, I began looking for the best way to edit our .htaccess file, and I had to try a couple of options before I found something that would work on the SEMpdx server.
Rather than provide you with .htaccess code here, which may or may not work for you in your hosting environment, I’ll refer you to a very useful post, where there are a lot of folks discussing the semalt situation and that’s where they show several options for .htaccess editing.
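For readers who want to see the general idea rather than server-specific rules, here is a rough sketch in Python. It is not .htaccess code and not what runs on the SEMpdx server. It only illustrates the logic of refusing requests whose Referer header points at semalt.com or one of its subdomains, which is the usual way these crawler visits are identified. The names `is_blocked_referrer` and `block_referrer_spam` are hypothetical, and the sketch assumes a Python site served through WSGI.

```python
from urllib.parse import urlsplit

# Domains whose referrals should be refused (assumption: only semalt.com here).
BLOCKED_REFERRER_DOMAINS = ("semalt.com",)


def is_blocked_referrer(referer: str) -> bool:
    """True if the Referer points at a blocked domain or any of its subdomains."""
    if "//" not in referer:
        referer = "//" + referer  # let urlsplit parse bare hostnames too
    try:
        host = (urlsplit(referer).hostname or "").lower()
    except ValueError:
        return False  # malformed header: treat as not blocked
    return any(host == d or host.endswith("." + d) for d in BLOCKED_REFERRER_DOMAINS)


def block_referrer_spam(app):
    """Wrap a WSGI app so requests with a blocked Referer get a 403."""
    def middleware(environ, start_response):
        if is_blocked_referrer(environ.get("HTTP_REFERER", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware
```

Matching on the hostname rather than searching the whole header avoids blocking an innocent page that merely mentions semalt in its URL.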
If they would simply obey a site’s robots.txt, I think a lot of people would not worry about it, and might even try their service. Until they do though, we’re aggressively blocking them.
Scott Hendison is the CEO of Search Commander, Inc. and a recovering affiliate marketer. He is also one of the founding board members of SEMpdx. Find out more about him at his website, SearchCommander.com.
We noticed it a few weeks ago and have seen up to 300 visits within that time. Either way, they quickly made a name for themselves by penetrating statistics rather than using more traditional approaches. It would be fascinating to see how their link profile grows now.
Step 1 would be to filter them out of referrer data using a custom filter as given in Google Analytics here https://support.google.com/analytics/answer/1034842?hl=en
Yes, they can be filtered, good point, and we had to do that for a handful of client reports.
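For anyone cleaning up exported referral data by hand, the same exclusion can be sketched in a few lines of Python. This is only an illustration: the pattern is an assumption about what a semalt exclude filter might look like, the function name `split_referrals` is made up, and the rows are invented sample data, not real stats.

```python
import re

# Assumed pattern: matches semalt.com itself and any subdomain of it.
SPAM_REFERRER = re.compile(r"(^|\.)semalt\.com$", re.IGNORECASE)


def split_referrals(rows):
    """Split (source, visits) rows into kept rows and filtered-out spam rows."""
    kept, dropped = [], []
    for source, visits in rows:
        target = dropped if SPAM_REFERRER.search(source) else kept
        target.append((source, visits))
    return kept, dropped


# Invented sample rows, for illustration only.
rows = [
    ("google.com", 120),
    ("semalt.semalt.com", 40),
    ("crawler.semalt.com", 30),
    ("twitter.com", 10),
]
kept, dropped = split_referrals(rows)
print("real referral visits:", sum(v for _, v in kept))
print("filtered spam visits:", sum(v for _, v in dropped))
```

Keeping the dropped rows around, rather than discarding them, makes it easy to report how much of the traffic was bot noise.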
I know a lot of people have been talking about semalt. I know they are not respecting robots.txt files. I will try to block them via the .htaccess file. I wish they could code their software better so we didn’t have to block them.
They could easily choose to follow robots.txt if they cared to, and the .htaccess option seems like a drastic option but is likely best – however – that’s a hell of a lot of work editing those files for every single domain or client.
In some (many) cases we don’t even have access to .htaccess or robots, but from their own “bulk blocking” page, we were able to drop in 130+ domain names all at once, and within two days all the referral visits seemed to stop.
Thanks for the helpful information. All our sites in Germany are polluted by Semalt yet.
must read RT @nabble_nl Here’s proof that @SoundFrostorg is the spambot used by @SemaltCom https://blog.nabble.nl/post/93306955157/semalt-infecting-computers-to-spam-the-web
I’d like to clarify why Semalt has negative response. Daily our technical robots visit many websites. These robots harvest statistical data for our service and don’t cause any harm to the users’ web resources.
We understand each user who asks us to disable crawling activity on their websites. We bring apologies to each user and honor the request. So we’ve created a special tool to remove sites from the list of web resources we visit – Semalt Crawler https://semalt.com/project_crawler.php
Anyway, we’re always glad to answer all the questions regarding to Semalt.
Thanks for responding –
You claim that “these robots… don’t cause any harm to the users’ web resources”. I would argue that is false. Merely by using the resources and distorting referrer stats you are causing problems.
You also claim you are “glad to answer all the questions regarding to Semalt” – Well, here’s mine… Why don’t you follow users’ robots.txt?
You’re definitely not alone, especially in being threatened by their Twitter PR.
https://thenewfr0ntier.blogspot.nl/2014/03/anyone-running-blogger-or-wordpress.html
https://blog.nabble.nl/post/93306955157/semalt-infecting-computers-to-spam-the-web
and my own
https://blog.flameeyes.eu/2014/08/antibiotics-for-the-internet-or-why-blocking-semalt-crawlers
I wish I had read Nabble’s post before writing mine, they are much worse than I made them out to be.
Unless you manage websites for local businesses and Semalt accounts for well over 50% of your monthly traffic. I’ve got one site now where Semalt accounts for 72.25% of the traffic and has a 100% bounce rate – and that’s not counting youtube.downloader and kambasoft which skew the data even further.
Filtering this stuff is a huge pain.
Agreed. F*ck them 😉
How to block semalt in nginx? thx
Sorry, Alena, that’s not something I can help with. It’s probably easiest to just go to their site and add your domain to their “do not crawl” list.
I can confirm that URLs submitted to their removal list end up crawled again after a few months. I did a blanket submit of all our clients back in February and Semalt is starting to show up in the stats again for some of them. They are not gaining any points with us or our clients…
I didn’t submit all clients until April 1st or 2nd, and I just spot-checked four, with none having any return visits yet… Thanks for the heads up though, Michael, I’ll keep an eye out.
Glad to see I am not the only webbie who went ballistic when they saw this crap in their stats. Kinda like the U2 shit on my iPhone, no real biggie, just a pain, something I didn’t ask for and don’t want……. and MAJOR distraction in my day.
I am thinking to send an invoice to the scammers.
My tab to the uninvited Ukrainians will include:
The time I had to take to track their IP [semalt.com (217.23.11.15) and semalt.semalt.com (217.23.7.144)] Ran a tracert which gave me these yesterday and today, but it looks like they change from time to time, so I’m ready to block all traffic from 217.23 (their Dutch ISP).
Time to research who the f**k they are.
Time to handhold one of my clients, an elderly artist whose low volume website is an underpaid labor of love on my part (treat your legacy clients well, they got ya up and running!!).
Her usual 20 or 30 hits of real visitors a day was indeed significantly skewed, and those visits are very important to HER!!! More to the point, it took my patience and time to explain that she had not really had a sudden magic jump in visitors, nor was her site in “danger”, and who this flaky robotic visitor was, and what their hidden agenda really was... the time for this task is still TBD!!!!
This is why webbies get grey!
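The idea above of blocking everything under 217.23 amounts to a range check on the visitor's address. Here is a minimal Python sketch of it, assuming "217.23" means the whole 217.23.0.0/16 block. The function name `in_blocked_range` is made up for illustration. Bear in mind that a range this wide belongs to an ISP, so it would also shut out any legitimate visitors on it, and the crawler has been reported to come from no consistent IP range.

```python
import ipaddress

# Assumption: "217.23" is read as the full 217.23.0.0/16 block.
BLOCKED_RANGE = ipaddress.ip_network("217.23.0.0/16")


def in_blocked_range(addr: str) -> bool:
    """True if addr is a valid IP address inside the blocked range."""
    try:
        return ipaddress.ip_address(addr) in BLOCKED_RANGE
    except ValueError:
        return False  # not a valid IP address at all


# The two addresses reported in the comment above.
for addr in ("217.23.11.15", "217.23.7.144"):
    print(addr, in_blocked_range(addr))
```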
SEMalt is back – just noticed – 63 visits over the past 30 days