.Stat Suite documentation

Enabling and optimising external indexing by web crawlers

This page describes how to configure the search engine crawler policy for the three .Stat Suite applications: Data Explorer, Data Viewer and Data Lifecycle Manager.

robots tags

For each application, the content of both the HTML <meta name="robots" /> tag and the x-robots-tag HTTP header can be configured through the ROBOTS_POLICY environment variable. The default value is all for both Data Explorer and Data Viewer, while it is none for Data Lifecycle Manager.
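
How the variable is set depends on your deployment. As a minimal sketch, assuming a Docker-based deployment where data-explorer stands in for your actual image name, the policy could be overridden at container start-up:

# Hypothetical example: disable indexing and link-following for a
# Data Explorer container ("data-explorer" is a placeholder image name).
docker run -e ROBOTS_POLICY=none -d data-explorer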

Main directives for the search engine crawlers:

  • <meta name="robots" content="follow"> : a command for the search engine crawler to follow the links in that webpage
  • <meta name="robots" content="index"> : a command for the search engine crawler to index that webpage
  • <meta name="robots" content="nofollow"> : a command for the search engine crawler NOT to follow the links in that webpage
  • <meta name="robots" content="noindex"> : a command for the search engine crawler NOT to index that webpage
  • <meta name="robots" content="none"> : corresponds to the combination "noindex,nofollow"
  • <meta name="robots" content="all"> : corresponds to the combination "index,follow"
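
To check which policy a running application actually applies, you can inspect its HTTP response headers. A quick sketch, assuming your instance is reachable at the placeholder host https://de.example.org:

# Send a HEAD request and print only the x-robots-tag response header
# (the host below is a placeholder; replace it with your own).
curl -sI https://de.example.org | grep -i x-robots-tag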

robots.txt file

For each application, a robots.txt file is served with the following default content, which disallows crawling of all routes for all user agents:

User-agent: *
Disallow: /

To serve your own robots.txt, you can, depending on your deployment strategy:

  • For a source code installation, edit the file src/server/robots.txt before build
  • For a Docker installation, bind-mount your own file over the default one (the source of a bind mount must be an absolute path):
docker run --mount type=bind,source=path/custom_robots.txt,target=/server/robots.txt -d data-explorer
  • Redirect the route domain/robots.txt to your own file, e.g. at the reverse-proxy level
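
Whichever option you choose, you can verify that your custom file is the one actually being served. A minimal check, again using the placeholder host https://de.example.org:

# Fetch the robots.txt as a crawler would; the output should be your
# custom content, not the default "Disallow: /" rule.
curl -s https://de.example.org/robots.txt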