Google Says It Could Make Sense To Use Noindex Header With LLMs.txt

Google’s John Mueller answered a question about llms.txt and duplicate content, saying that it wouldn’t make sense for an llms.txt file to be viewed as duplicate content, but also that it could make sense to take steps to prevent it from being indexed.

LLMs.txt

Llms.txt is a proposal for a new content format standard that large language models can use to retrieve the main content of a web page without having to deal with non-content data such as advertising and navigation. It offers web publishers a way to provide a curated, Markdown-formatted version of their most important content. The llms.txt file sits at the root level of a website (example.com/llms.txt).
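
For illustration, a minimal llms.txt following the proposal’s conventions (an H1 title, a blockquote summary, and sections of Markdown links) might look like the sketch below. The site name and URLs here are hypothetical:

```
# Example Widgets

> Example Widgets makes configurable widgets for web publishers.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): how to install and configure a widget
- [API reference](https://example.com/docs/api.md): endpoint documentation

## Optional

- [Company history](https://example.com/about.md)
```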

Contrary to some claims made about llms.txt, it is not in any way similar in purpose to robots.txt. The purpose of robots.txt is to control robot behavior, while the purpose of llms.txt is to provide content to large language models.

Will Google View Llms.txt As Duplicate Content?

Someone on Bluesky asked whether Google could see llms.txt as duplicate content, which is a fair question: someone outside the website might link to the llms.txt file, and Google could begin surfacing that content instead of, or in addition to, the HTML content.

This is the question asked:

“Will Google view LLMs.txt files as duplicate content? It seems stiff necked to do so, given that they know that it isn’t, and what it is really for.

Should I add a “noindex” header for llms.txt for Googlebot?”

Google’s John Mueller answered:

“It would only be duplicate content if the content were the same as a HTML page, which wouldn’t make sense (assuming the file itself were useful).

That said, using noindex for it could make sense, as sites might link to it and it could otherwise become indexed, which would be weird for users.”

Noindex For Llms.txt

Using a noindex header for llms.txt is a good idea because it prevents the content from entering Google’s index. Because llms.txt is a plain text file rather than HTML, the noindex directive can’t be added with a meta tag; it has to be delivered as an X-Robots-Tag HTTP response header. Blocking the file with robots.txt would be counterproductive: it would stop Google from crawling the file at all, which means Googlebot would never see the noindex.
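
As a sketch, assuming an Apache server with mod_headers enabled, a rule like the following in a site’s .htaccess would attach the header to just the llms.txt file:

```
# Send the X-Robots-Tag noindex header for llms.txt only
# (requires mod_headers)
<Files "llms.txt">
  Header set X-Robots-Tag "noindex"
</Files>
```

On nginx, the equivalent would be an add_header X-Robots-Tag "noindex"; directive inside a location = /llms.txt block. Either way, the file has to remain crawlable, since Googlebot only sees the header when it fetches the file.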

