:mod:`robotparser` --- Parser for robots.txt
=============================================

.. module:: robotparser
   :synopsis: Loads a robots.txt file and answers questions about
              fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   single: robots.txt

.. note::
   The :mod:`robotparser` module has been renamed :mod:`urllib.robotparser` in
   Python 3.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to Python 3.

This module provides a single class, :class:`RobotFileParser`, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the :file:`robots.txt` file.  For more details on the
structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.

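
As a rough illustration of the file structure and of the kind of question the
parser answers, the sketch below feeds a made-up :file:`robots.txt` to
``parse()`` instead of fetching one over the network.  The sample rules, the
``ExampleBot`` agent name, and the ``www.example.com`` URLs are assumptions
chosen for illustration.  The ``try``/``except`` import is only there so the
same code also runs under Python 3, where the module is
:mod:`urllib.robotparser`.

```python
try:
    import robotparser  # Python 2 name
except ImportError:
    import urllib.robotparser as robotparser  # Python 3 name

# Hypothetical robots.txt content: every agent is kept out of two
# directories, and one named agent is kept out of the whole site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
# parse() takes the file's lines, so no network access is needed here.
rp.parse(SAMPLE_ROBOTS_TXT.splitlines())

# An agent with no dedicated record falls back to the "*" rules.
any_agent_home = rp.can_fetch("*", "http://www.example.com/")
any_agent_cgi = rp.can_fetch("*", "http://www.example.com/cgi-bin/search")

# ExampleBot has its own record, which disallows everything.
example_bot_home = rp.can_fetch("ExampleBot", "http://www.example.com/")
```

Under these sample rules, ``any_agent_home`` should be ``True``, while
``any_agent_cgi`` and ``example_bot_home`` should be ``False``.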

.. class:: RobotFileParser(url='')

   This class provides methods to read, parse and answer questions about the
   :file:`robots.txt` file at *url*.


   .. method:: set_url(url)

      Sets the URL referring to a :file:`robots.txt` file.


   .. method:: read()

      Reads the :file:`robots.txt` URL and feeds it to the parser.


   .. method:: parse(lines)

      Parses *lines*, a sequence of lines from a :file:`robots.txt` file.


   .. method:: can_fetch(useragent, url)

      Returns ``True`` if the *useragent* is allowed to fetch the *url*
      according to the rules contained in the parsed :file:`robots.txt`
      file.


   .. method:: mtime()

      Returns the time the :file:`robots.txt` file was last fetched.  This is
      useful for long-running web spiders that need to check for new
      :file:`robots.txt` files periodically.


   .. method:: modified()

      Sets the time the :file:`robots.txt` file was last fetched to the current
      time.
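
One way a long-running spider might use ``mtime()`` and ``modified()``
together is sketched below.  The ``needs_refresh`` helper, the
``MAX_AGE_SECONDS`` threshold, and the sample rules are hypothetical and not
part of the module.  The sketch also assumes that ``mtime()`` returns ``0``
until a fetch time has been recorded, which is how the implementation behaves
as far as we know but is not stated in the descriptions above.

```python
import time

try:
    import robotparser  # Python 2 name
except ImportError:
    import urllib.robotparser as robotparser  # Python 3 name

MAX_AGE_SECONDS = 3600  # hypothetical refresh interval: one hour


def needs_refresh(parser, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the parser's rules are older than max_age seconds."""
    if now is None:
        now = time.time()
    return (now - parser.mtime()) > max_age


rp = robotparser.RobotFileParser()

# Nothing has been fetched yet, so the rules count as stale
# (assumes mtime() is 0 at this point).
stale_before = needs_refresh(rp)

# Stand-in for rp.read(): load some rules, then record the fetch time.
rp.parse(["User-agent: *", "Disallow: /private/"])
rp.modified()

# The fetch time was just recorded, so no refresh is due yet.
fresh_after = needs_refresh(rp)
```

A spider would typically call ``read()`` again, followed by ``modified()``,
whenever such a check reports that the rules are stale.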

The following example demonstrates basic use of the :class:`RobotFileParser`
class. ::

   >>> import robotparser
   >>> rp = robotparser.RobotFileParser()
   >>> rp.set_url("http://www.musi-cal.com/robots.txt")
   >>> rp.read()
   >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
   False
   >>> rp.can_fetch("*", "http://www.musi-cal.com/")
   True

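The example above hard-codes the location of :file:`robots.txt`.  A crawler
usually starts from a page URL instead, so the sketch below derives the
:file:`robots.txt` URL from it before asking the parser.  The helper names
``robots_url_for`` and ``is_allowed`` are hypothetical, and ``is_allowed``
performs a real network request through ``read()``, so it may raise if the
site cannot be reached.

```python
try:
    from urlparse import urlsplit, urlunsplit  # Python 2
    import robotparser
except ImportError:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
    import urllib.robotparser as robotparser


def robots_url_for(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query/fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(page_url, useragent="*"):
    """Fetch the site's robots.txt and check whether page_url may be fetched.

    This contacts the remote site, so callers should be prepared for
    network errors.
    """
    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url_for(page_url))
    rp.read()
    return rp.can_fetch(useragent, page_url)


# Building the robots.txt URL itself needs no network access.
derived = robots_url_for(
    "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
```

Here ``derived`` is ``"http://www.musi-cal.com/robots.txt"``, the same URL
passed to ``set_url()`` in the example above.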