How to Create a Robots.txt File Using pyrobotstxt?
In this tutorial, I will show how you can use pyrobotstxt to create robots.txt files.
Installation
You need to install the pyrobotstxt package using pip with the following command. Installation should also work with other tools, e.g. pipenv or poetry. If you encounter any installation problems, please create an issue.
pip install pyrobotstxt
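If you prefer pipenv or poetry, the standard commands for those tools should work as well.
pipenv install pyrobotstxt
poetry add pyrobotstxt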
Usage
There are several use cases for pyrobotstxt.
How to Find Robot Details?
You can use it to check the details of a search bot. Just specify a keyword, e.g. Google, and it will provide a list of Google bots from the pyrobotstxt database.
from pyrobotstxt import RobotsTxt
print(RobotsTxt().robots_name("Google"))
You can also do a reverse search by providing a robot name. Again, you will get details about that bot.
from pyrobotstxt import RobotsTxt
print(RobotsTxt().robots_details("Googlebot"))
How to Create a Robots.txt File?
You can create a robots.txt file by creating an object of the RobotsTxt class.
from pyrobotstxt import RobotsTxt
robots_file = RobotsTxt()
You can add a header and a footer section. These are comments for humans looking into the robots.txt file. In the header section, you can also include the file creation date, which is handy for archiving purposes.
robots_file.include_header("Welcome Crawlers", append_date=True)
robots_file.include_footer("Good Bye Crawlers")
In a robots.txt file, rules are specified per user agent. pyrobotstxt offers a UserAgent class, which you can use to create multiple user agents. The default user agent is *.
from pyrobotstxt import UserAgent
ua_general = UserAgent(name="*")
After creating a user agent, you can add all the routes/pages/images you want to Allow or Disallow.
ua_general.add_allow(
allow_items=["/home", "/deep", "/home"],
unique=True,
comments="This is a list of allowed items",
)
ua_general.add_disallow(
disallow_items=["/nopi$", "/topi?a", "/img*.png$"],
unique=True,
comments="This is a list of allowed items",
)
Here is a complete example of a user agent. You can also include a sitemap of your website.
ua_general_google = UserAgent(name="Google")
ua_general_google.add_allow(
allow_items=["/home", "/deep", "/home"],
unique=True,
comments="This is a list of allowed items",
)
ua_general_google.add_disallow(
disallow_items=["/nopi$", "/topi?a", "/img*.png$"],
unique=True,
comments="This is a list of allowed items",
)
ua_general_google.add_sitemap("https://seowings.org/sitemap.xml")
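For reference, the block this user agent should produce in the generated robots.txt looks roughly like the following. This is a sketch based on standard robots.txt syntax, not pyrobotstxt's exact output; comment placement and ordering may differ, and the duplicate /home entry is dropped because unique=True.
User-agent: Google
# This is a list of allowed items
Allow: /home
Allow: /deep
# This is a list of disallowed items
Disallow: /nopi$
Disallow: /topi?a
Disallow: /img*.png$
Sitemap: https://seowings.org/sitemap.xml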
After you have prepared the user agents, you can add them to the RobotsTxt object. This object keeps a list of all the user agents.
robots_file.add_user_agent(ua_general)
robots_file.add_user_agent(ua_general_google)
You can also include an image (in ASCII format) in your robots.txt file. For example, add the following call to your script to embed an ASCII image in the generated file.
robots_file.include_image("logo_dark.png", 90)
Finally, you can save the file to the desired location. The default file name is robots.txt.
robots_file.write("robots.txt")
How to Read a Robots.txt File?
You can read a robots.txt file from a remote server or website by creating an object of the RobotsTxt class. Once the file has been read from the remote server, you can use it for any operation specified in our API documentation, e.g. write it to a local file.
# Read Remote File
robots_file = RobotsTxt()
robots_file.read("https://nike.com/robots.txt")
robots_file.write("nike_robots.txt")
print(robots_file.robots_details("Baiduspider"))
Developer Tutorial
How to Install pyrobotstxt on your Computer?
You can install it from a local source checkout, in editable mode, inside a virtual environment.
C:\src\pyrobotstxt> python -m venv .venv
C:\src\pyrobotstxt> .\.venv\Scripts\activate
(.venv) C:\src\pyrobotstxt> pip install -e .
How to Contribute to pyrobotstxt?
Feature Suggestions
If you have any feature suggestions or improvements, please create an issue on this repository.
Pull Requests
If you have improved anything in pyrobotstxt, please create a pull request and we will merge it after review.
Collaborations
You can contact us using our website serpwings.com/contact.