Robots.txt
robots.txt is a file placed in the root of a web server to tell search engine robots how to treat the site: which paths they may visit and, optionally, how often and when they should revisit it to keep their indexes up to date.
Usage
- Create a file named robots.txt.
- The file must be accessible via HTTP at the local URL "/robots.txt".
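A crawler that honours the protocol fetches "/robots.txt" before requesting any other URL and checks each URL it wants to retrieve against the rules found there. The following is a minimal sketch of that check using Python's standard urllib.robotparser module; the host name is just the placeholder used in the examples below, and "MyCrawler" is a hypothetical user-agent name.

from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (placeholder host).
rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# Ask whether a given user agent may retrieve a given URL.
url = "http://www.example.com/some/page.html"
if rp.can_fetch("MyCrawler", url):
    print("allowed to fetch", url)
else:
    print("disallowed by robots.txt:", url)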
Format
<field>:<optionalspace><value><optionalspace>
User-agent: <the name of the robot the record applies to>
Disallow: <a partial URL that is not to be visited; this can be a full path or a partial path, and any URL that starts with this value will not be retrieved>
Request-rate: <the maximum request rate>
Visit-time: <the time range during which visits are allowed>
Crawl-delay: <the number of seconds to wait between successive requests to the same server>
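To make the line format concrete, the following sketch splits such lines into (field, value) pairs in Python. It strips "#" comments and blank lines but does not group records by User-agent or match partial URLs, so it illustrates the syntax rather than replacing a real parser.

def parse_lines(text):
    """Yield (field, value) pairs from robots.txt-style text (illustration only)."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue                          # skip blank lines
        field, _, value = line.partition(":")
        yield field.strip().lower(), value.strip()

sample = "User-agent: *\nDisallow: /tmp/  # these will soon disappear\n"
print(list(parse_lines(sample)))
# [('user-agent', '*'), ('disallow', '/tmp/')]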
Examples
The examples below are taken from robotstxt.org
The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/", or "/foo.html":
# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/  # This is an infinite virtual URL space
Disallow: /tmp/             # these will soon disappear
Disallow: /foo.html
Request-rate: 1/5           # maximum rate is one page every 5 seconds
Visit-time: 1000-1200       # only visit between 10:00 and 12:00 UTC (GMT)
Crawl-delay: 10             # 10 seconds to wait between successive requests to the same server
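The same rules can be checked programmatically. The sketch below feeds the directives of this example to Python's urllib.robotparser and queries it; "SomeBot" is a hypothetical user agent. Note that this standard-library parser understands Crawl-delay and Request-rate (Python 3.6 and later) but has no support for Visit-time, which is therefore omitted here.

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cyberworld/map/
Disallow: /tmp/
Disallow: /foo.html
Request-rate: 1/5
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("SomeBot", "/tmp/scratch.html"))  # False: the URL starts with /tmp/
print(rp.can_fetch("SomeBot", "/index.html"))        # True: matched by no Disallow line
print(rp.crawl_delay("SomeBot"))                      # 10
print(rp.request_rate("SomeBot"))                     # RequestRate(requests=1, seconds=5)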
This example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/", except the robot called "cybermapper":
# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/  # This is an infinite virtual URL space

# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:
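An empty Disallow value disallows nothing, so the record for "cybermapper" grants it access to everything, while all other robots still fall back to the "*" record. This can be verified with the same standard-library parser used above; "OtherBot" and the tested URL are arbitrary illustrations.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cyberworld/map/",
    "",
    "User-agent: cybermapper",
    "Disallow:",
])

print(rp.can_fetch("cybermapper", "/cyberworld/map/area51.html"))  # True: its own record allows everything
print(rp.can_fetch("OtherBot", "/cyberworld/map/area51.html"))     # False: falls back to the * record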
This example indicates that no robots should visit this site further:
# go away
User-agent: *
Disallow: /
Effectiveness
- The protocol is purely advisory: it keeps well-behaved crawlers out of the listed paths, but it cannot prevent access by robots that choose to ignore it.