Google’s John Mueller posted one of those PSA tweets this morning asking hosting companies not to serve robot detection interstitials with a 200 status code. John also said not to place a noindex on that page. If you do both, the pages served that way can be dropped from the Google index and search results.
Instead, remove the noindex and serve a 5xx status code; that will help Googlebot handle the robot detection interstitial properly.
Here is that tweet:
If you run a hosting service, and have a “you might be a bot” interstitial, make sure it uses a non-200 HTTP result code (perhaps 503), and that it doesn’t have a noindex on it. Serving a noindex+200 will result in pages being dropped from search, if search engines see it. pic.twitter.com/LFGQcq2dzf
— 🐄 John 🐄 (@JohnMu) January 17, 2022
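To make the advice concrete, here is a minimal sketch of an interstitial handler that follows it, assuming a Flask app; the looks_like_unverified_bot() check is a hypothetical placeholder for whatever heuristic a hosting stack actually uses:

```python
# Sketch of John's advice, assuming a Flask app. The bot check below
# is a hypothetical placeholder, not a real detection method.
from flask import Flask, request

app = Flask(__name__)

def looks_like_unverified_bot(req) -> bool:
    # Hypothetical heuristic; substitute your real bot detection logic.
    return req.headers.get("X-Suspected-Bot") == "1"

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def serve(path):
    if looks_like_unverified_bot(request):
        # Serve the interstitial with a 503, not a 200, and without a
        # noindex tag, so search engines treat the block as temporary.
        body = "<html><body>You might be a bot. Please verify.</body></html>"
        return body, 503, {"Retry-After": "3600"}
    return f"Normal page content for /{path}", 200
```

The Retry-After header is optional but tells crawlers when to come back, reinforcing that the 503 is a temporary condition rather than a missing page.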
Then John explained how to do proper Googlebot detection, a topic we have covered numerous times here before. Here is that tweet:
If you want to make sure Googlebot doesn’t run into these pages, you can find out how to verify it with https://t.co/32pLcCkiXz (some SEO tools will crawl with a Googlebot user-agent, so you can’t rely on the user-agent alone).
— 🐄 John 🐄 (@JohnMu) January 17, 2022
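The approach Google documents for verifying Googlebot is a reverse DNS lookup on the requesting IP, confirmed with a forward lookup, since the user-agent string alone can be spoofed. Here is a minimal sketch of that check in Python (the example IP is illustrative, drawn from Google's published crawler ranges):

```python
# Sketch of Googlebot verification via reverse DNS, as Google documents:
# the user-agent can be faked, but the DNS round trip cannot.
import socket

def is_verified_googlebot(ip: str) -> bool:
    try:
        # Reverse DNS: the hostname must end in googlebot.com or google.com.
        host, _, _ = socket.gethostbyaddr(ip)
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward DNS: the hostname must resolve back to the same IP.
        return ip in {info[4][0] for info in socket.getaddrinfo(host, None)}
    except socket.herror:   # no reverse DNS record for this IP
        return False
    except socket.gaierror:  # forward lookup failed
        return False

# Illustrative example; 66.249.64.0/19 is a published Googlebot range.
print(is_verified_googlebot("66.249.66.1"))
```

The forward-lookup step matters: without it, anyone controlling reverse DNS for their own IP block could claim a googlebot.com hostname.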
Forum discussion at Twitter.