The Robots Meta Tag and Its Options for Search Engine Guidance
Getting search engines to crawl and index a website the way you want can be challenging. While robots.txt controls which content crawlers may access, it does not control whether that content is indexed. This is where robots meta tags and the x-robots-tag HTTP header come into play.
Clarifying Misconceptions: First and foremost, robots.txt cannot control indexation. A common mistake is to put a noindex rule in robots.txt to achieve this. Google never officially supported that rule, and in July 2019 it announced that it would stop honoring it from September 1, 2019.
Introduction to the Robots Meta Tag: A robots meta tag is an HTML snippet that instructs search engine robots on permissible actions for a particular page. It enables control over crawling, indexing, and the presentation of information in search results. This tag is placed within the <head> section of a webpage.
Example: <meta name="robots" content="noindex, nofollow">
Importance of the Robots Meta Tag for SEO: The robots meta tag is commonly used to prevent certain pages from appearing in search results. However, its applications extend to other scenarios as well.
Types of content you may want to keep out of search engine indexes include:
- Thin pages lacking value for users.
- Pages in the staging environment.
- Admin and thank-you pages.
- Internal search results.
- PPC landing pages.
- Pages featuring upcoming promotions, contests, or product launches.
- Duplicate content (canonical tags are the recommended way to point search engines to the preferred version).
Attributes and Values of the Robots Meta Tag:
Robots meta tags comprise two attributes: “name” and “content.”
Name attribute and user-agent values:
The name attribute designates which crawlers should adhere to the instructions. User-agents (UAs) represent specific crawlers, such as Googlebot or Googlebot-image.
The UA value “robots” pertains to all crawlers. Multiple robots meta tags can be included in the <head> section. For example, to prevent images from appearing in Google or Bing image search:
<meta name="googlebot-image" content="noindex">
<meta name="MSNBot-Media" content="noindex">
(Note: Both the name and the content values are case-insensitive, so variants such as “Googlebot-Image,” “msnbot-media,” and “Noindex” all work the same way.)
Content attribute and crawling/indexing directives: The content attribute provides instructions for crawling and indexing content on the page. If no robots meta tag is present, crawlers assume “index” and “follow,” meaning the page may be shown in search results and all of its links may be crawled (except links carrying the rel="nofollow" attribute).
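As a small illustration of the defaults described above, the page below spells out the default values explicitly, which should behave the same as omitting the robots meta tag, and marks a single link with rel="nofollow". It is only a sketch: the page title, link text, and URL are placeholders that do not come from the original text.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Default robots behavior (illustrative)</title>
  <!-- Same effect as leaving the tag out: the page may be indexed and its links followed -->
  <meta name="robots" content="index, follow">
</head>
<body>
  <!-- The page-level default still applies to other links; this one link opts out.
       https://example.com/ is a placeholder URL. -->
  <a href="https://example.com/" rel="nofollow">Example link</a>
</body>
</html>
```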
Google supports the following values for the content attribute:
- all (default value, typically not used).
- noindex (prevents indexing and search result display).
- nofollow (prevents crawling of all links on the page).
- none (combination of noindex and nofollow, not recommended due to lack of support by other search engines like Bing).
- noarchive (prevents Google from showing cached copies in the SERP).
- notranslate (prevents Google from offering translations in the SERP).
- noimageindex (prevents Google from indexing embedded images).
- unavailable_after (instructs Google not to show a page in search results after a specified date/time, given in a widely used format such as RFC 850).
- nosnippet (prevents all text and video snippets within the SERP; also works as noarchive).
(Note: Since October 2019, Google offers more granular snippet control due to the European Copyright Directive.)
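To make the values listed above more concrete, here are a few illustrative tags. They are alternatives for the <head> of a page rather than a set to be used together, and the date in the unavailable_after example is a hypothetical one chosen only to show the RFC 850 style.

```html
<!-- Alternatives for the <head> of a page; pick the one that matches your intent. -->

<!-- Keep the page out of the index; its links are still followed by default -->
<meta name="robots" content="noindex">

<!-- Allow indexing, but show no cached copy and do not index embedded images -->
<meta name="robots" content="noarchive, noimageindex">

<!-- Stop showing the page after a given time (hypothetical date, RFC 850 style) -->
<meta name="robots" content="unavailable_after: Tuesday, 01-Jan-30 00:00:00 GMT">

<!-- Address Google's crawler only instead of all robots -->
<meta name="googlebot" content="notranslate">
```

Multiple values in one content attribute are separated by commas, as in the second tag.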
Additional Supported Robots Tags
Using the Max-snippet, Max-image-preview, and Max-video-preview Directives: These directives limit how much of a page's text, images, and video Google may show as a preview in search results.
max-snippet:
- Use 0 to opt out of text snippets.
- Use -1 for no limit on the text preview.
- For a specific snippet length, set a value (e.g., 160).
max-image-preview:
- “none” prevents image snippets.
- “standard” allows default image previews.
- “large” displays the largest image previews (recommended with 1200px wide images for Google Discover visibility).
max-video-preview:
- Use 0 to opt out of video snippets.
- Use -1 for no limit on the video preview.
- For a specific duration, set a value (e.g., 15 seconds).
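The three directives can be combined in a single tag. The following is a minimal sketch that reuses the example values from the lists above (160 and 15) plus the “large” image setting; the specific combination is an illustration, not a recommendation from the original text.

```html
<!-- Up to 160 characters of text, large image previews, up to 15 seconds of video -->
<meta name="robots" content="max-snippet:160, max-image-preview:large, max-video-preview:15">

<!-- Alternative: opt out of text snippets while placing no limit on video previews -->
<meta name="robots" content="max-snippet:0, max-video-preview:-1">
```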
Using the Data-Nosnippet HTML Attribute:
In addition to the new robots directives, Google introduced the data-nosnippet HTML attribute. It can be used to mark parts of a page's text that should not appear in snippets. The attribute can be applied to div, span, and section elements. It is a boolean attribute, valid with or without a value.
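A short sketch of how the attribute might look on each of the three supported elements follows. The surrounding text is invented purely for illustration.

```html
<p>
  This sentence may be used in a search snippet.
  <span data-nosnippet>This sentence should be left out of snippets.</span>
</p>

<!-- The attribute can also cover a whole block -->
<div data-nosnippet>
  <p>Nothing inside this block should be shown in a snippet.</p>
</div>

<!-- As a boolean attribute, it may carry a value, but none is required -->
<section data-nosnippet="true">
  <p>This section is excluded from snippets as well.</p>
</section>
```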
Use Cases for the X-Robots-Tag:
The x-robots-tag HTTP header is used to prevent search engine indexing for content like images and PDFs. Unlike the robots meta tag, the x-robots-tag is sent as an HTTP header and not placed in the HTML of the page.
Setting Up the X-Robots-Tag:
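The X-Robots-Tag configuration depends on the web server you use and on which pages or files you want to keep out of the index. For example, in Apache (with mod_headers enabled), the following line sets a noindex header when it is placed inside a <FilesMatch "\.pdf$"> block in the server configuration or .htaccess file. Without such a wrapper, it applies to every response:
Header set X-Robots-Tag "noindex"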
Considerations for Robots Meta Tag vs. X-Robots-Tag:
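While the robots meta tag is suitable for HTML pages, the x-robots-tag is the only option for keeping non-HTML files such as images or PDFs out of the index. It can also be more efficient for bulk changes, since a single server rule can cover many URLs. Support for the header is not universal among search engines, however, so if traffic from engines other than Google matters to you, check whether they honor it; the robots meta tag is the more widely supported choice.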
Avoiding Crawlability and Indexation Mistakes
To maintain effective indexation control and avoid issues, consider the following precautions:
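- Do not disallow noindexed pages in robots.txt: a crawler that is blocked from a page never sees its noindex directive, so the page can remain in the index.
- Keep noindexed content in the sitemap until it has been deindexed, so crawlers revisit it and pick up the directive.
- When moving a site from staging to production, remove the noindex directives that protected the staging environment once testing is complete.
- Do not rely on a robots.txt disallow to hide “secret” URLs: robots.txt is publicly readable, so listing them there reveals them. Use noindex instead.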