You may end up confusing Google when you have “garbage” parameters trailing in your URLs, especially when it comes to translated content parameters. In one interesting conversation, a large multilingual site found its translated content excluded from Google Search with a “crawled currently not indexed” status.
The SEO seemed very knowledgeable and appeared to have done his homework before coming to John Mueller of Google for help. John basically said this might be related to the parameter at the end with the language code. John said “what can happen is that when we recognize that there are a lot of these parameters there that lead to the same content, then our systems can kind of get stuck into a situation well maybe this parameter is not very useful and we should just ignore it.”
John then gave some tips on how to use the URL parameter tool in Search Console to help Google know that those URLs should be indexed. He also suggested using redirects and clean URLs to reinforce that signal when Google crawls those URLs.
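To make the redirect idea concrete, here is a minimal sketch of how a site might decide what to do with an incoming URL. It is only an illustration: the function name, the list of supported languages and the exact cleanup rule are assumptions, not something John or the site owner described. Only the hl parameter name comes from the conversation.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical values for illustration; a real site would use its own list.
SUPPORTED_LANGS = {"en", "de", "ja"}
LANG_PARAM = "hl"


def resolve_language_url(url):
    """Return (status, location) for a requested URL.

    (200, None)      -> serve the page as requested
    (301, clean_url) -> redirect a garbage-suffixed URL to its clean form
    (404, None)      -> the language value is not one the site knows
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    lang_values = [value for key, value in params if key == LANG_PARAM]

    if not lang_values:
        # No language parameter: serve the default-language page.
        return 200, None

    raw = lang_values[0]
    # Keep only the leading run of letters and hyphens, so "ja)." becomes "ja".
    candidate = re.match(r"[A-Za-z-]*", raw).group(0).lower()

    if candidate not in SUPPORTED_LANGS:
        return 404, None
    if raw == candidate and len(lang_values) == 1:
        return 200, None

    # Rebuild the query with a single, clean language parameter.
    cleaned = [(key, value) for key, value in params if key != LANG_PARAM]
    cleaned.append((LANG_PARAM, candidate))
    clean_url = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(cleaned), parts.fragment)
    )
    return 301, clean_url
```

With this sketch, a link such as https://example.com/docs?hl=ja). would be redirected to the clean hl=ja URL, while an unknown value such as hl=xx would get a 404 instead of silently falling back to the English page.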
Here is the video, it starts at the 53:14 mark:
Here is the transcript:
Question:
I work on a fairly large multilingual site and in April last year, just all in one go all of our translation content or translated content moved from valid to excluded crawled currently not indexed and there it has stayed since April. You know because it happened all at once we thought maybe there was some systemic change on our side we get a massive change to our hosting platform, content management system, etc. We combed through the code extensively, we can’t find anything, we can’t find any change to content, we don’t see any notes in the Google Search release notes that look like they’ll be affecting us as far as we can tell. We’ve also been pretty thorough going through and just doing best practice searches with Search Console. We’ve cleaned up our hreflang, canonicals, URL parameters, manual actions and every other tool that’s listed on developers.google.com/search. I’m just about out of ideas. I don’t know what’s happened or what to do next to try to fix the issue but I’d really like to get our translated content back in the index.
Answer:
I took a look at that briefly before and passed some of that on to the team here as well. One of the things that I think is sometimes tricky is you have the parameter at the end with the language code, I think hl equals whatever. From our point of view what can happen is that when we recognize that there are a lot of these parameters there that lead to the same content, then our systems can kind of get stuck into a situation well maybe this parameter is not very useful and we should just ignore it. And to me it sounds a lot like something around that line happened.
And partially you can help this with the URL parameter tool in Search Console to make sure that that parameter is actually set – I do want to have everything indexed.
Partially what you could also do is maybe to crawl a portion of your website with, I don’t know, local crawler to see what kind of parameter URLs actually get picked up and then double check that those pages actually have useful content for those languages. In particular things like a common one that I’ve seen on sites is maybe you have all languages linked up and the Japanese version says oh we don’t have a Japanese version here’s our English one instead. Then our systems could say well the Japanese version is the same as the English version maybe there are some other languages the same as the English version we should just ignore them.
And sometimes this is from links within the website, sometimes it’s also external links, people who are linking to your site. If the parameter is at the end of your URL, then it’s very common that there’s some kind of garbage attached to the parameter as well. And if we crawl all of those URLs with that garbage and we say oh well this is not a valid language here’s the English version, then it again kind of reinforces that loop where systems say well maybe this parameter is not so useful.
So the cleaner approach there would be if you have kind of garbage parameters, to redirect to the cleaner ones. Or to maybe even show a 404 page and say well we don’t know what you’re talking about with this URL. And to really cleanly make sure that whichever URLs we find we actually get some useful content that is not the same as other content which we’ve already seen.
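If you want to act on the suggestion to crawl part of your site and double check the language versions, the comparison step might look something like the sketch below. It assumes you have already crawled the pages and extracted the body text for each language code. The function names and the exact-match comparison are assumptions made for illustration. This is not how Google detects duplicates, which the transcript does not describe.

```python
import hashlib
import re


def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_variants(pages, default_lang="en"):
    """Return the language codes whose text matches the default version.

    pages maps a language code (the hl value) to the body text that was
    crawled for that language version of a single page.
    """
    default_fp = fingerprint(pages[default_lang])
    return sorted(
        lang
        for lang, text in pages.items()
        if lang != default_lang and fingerprint(text) == default_fp
    )


# Hypothetical crawl result for one page in three languages.
crawled = {
    "en": "Welcome to our docs.",
    "de": "Willkommen in unserer Dokumentation.",
    "ja": "Welcome to  our docs.\n",  # fell back to the English text
}
print(find_duplicate_variants(crawled))
```

Any language code this flags is a candidate for the “we don’t have a Japanese version, here’s our English one instead” situation John describes. Exact matching is deliberately strict, so a page that adds a short fallback notice on top of the English text would need a fuzzier comparison.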
Forum discussion at YouTube Community.