How to Create a Robots.txt File using pyrobotstxt?

25-02-2023
Python

In this tutorial, I will show how you can use pyrobotstxt to create Robots.txt files.

Installation

You need to install the pyrobotstxt package using pip with the following command. It should also work with other methods, e.g. pipenv or poetry. If you encounter any installation problems, please create an issue.

pip install pyrobotstxt
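
To verify that the installation worked, you can ask pip for the package metadata:

pip show pyrobotstxt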

Usage

There are several use cases for pyrobotstxt.

How to Find Robot Details?

You can use it to check the details of a search bot. Just specify a keyword, e.g. Google, and it will provide a list of Google bots in the pyrobotstxt database.

from pyrobotstxt import RobotsTxt
print(RobotsTxt().robots_name("Google"))

You can also do a reverse search by providing a robot name. Again, you will get details about that bot.

from pyrobotstxt import RobotsTxt
print(RobotsTxt().robots_details("Googlebot"))

How to Create a Robots.txt File?

You can create a robots.txt file by creating an object of the RobotsTxt class.

from pyrobotstxt import RobotsTxt
robots_file = RobotsTxt()

You can add a header and a footer section. These are comments for humans looking at this robots.txt file. In the header section, you can also include the file creation date, which is handy for archiving purposes.

robots_file.include_header("Welcome Crawlers", append_date=True)
robots_file.include_footer("Good Bye Crawlers")
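
Assuming pyrobotstxt renders these sections as standard robots.txt comments (the exact layout is up to the library), the top and bottom of the generated file would look roughly like this:

# Welcome Crawlers
... user agent rules ...
# Good Bye Crawlers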

In a robots.txt file, rules are specified per user agent. pyrobotstxt offers a UserAgent class that you can use to create multiple user agents. The default user agent is *.

from pyrobotstxt import UserAgent
ua_general = UserAgent(name="*")

After creating a user agent, you can add all the routes, pages, and images you want to Allow or Disallow. Setting unique=True removes duplicate entries (such as the repeated /home below), and the comments argument lets you add a human-readable note to the rules.

ua_general.add_allow(
    allow_items=["/home", "/deep", "/home"],
    unique=True,
    comments="This is a list of allowed items",
)

ua_general.add_disallow(
    disallow_items=["/nopi$", "/topi?a", "/img*.png$"],
    unique=True,
    comments="This is a list of allowed items",
)
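
A note on the patterns: entries such as /nopi$ and /img*.png$ use the standard robots.txt wildcard syntax, where * matches any sequence of characters and $ anchors the pattern to the end of the URL.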

Here is a complete example of a user agent for Google. You can also include the sitemap of your website.

ua_general_google = UserAgent(name="Google")
ua_general_google.add_allow(
    allow_items=["/home", "/deep", "/home"],
    unique=True,
    comments="This is a list of allowed items",
)
ua_general_google.add_disallow(
    disallow_items=["/nopi$", "/topi?a", "/img*.png$"],
    unique=True,
    comments="This is a list of allowed items",
)
ua_general_google.add_sitemap("https://seowings.org/sitemap.xml")

After you have prepared the user agents, you can add them to the RobotsTxt object, which keeps a list of all the user agents.

robots_file.add_user_agent(ua_general)
robots_file.add_user_agent(ua_general_google)

You can also include an image, converted to ASCII art, in your robots.txt file. For example, add the following call to your script to embed an ASCII image in your robots.txt file.

robots_file.include_image("logo_dark.png", 90)

Finally, you can save the file to the desired location. The default file name is robots.txt.

robots_file.write("robots.txt")
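
Putting the pieces above together, a complete script looks like this. It only reuses the calls shown earlier; the import of UserAgent from the package root is an assumption, mirroring the RobotsTxt import.

from pyrobotstxt import RobotsTxt, UserAgent  # UserAgent import assumed, like RobotsTxt

# container for the robots.txt content, with header and footer comments
robots_file = RobotsTxt()
robots_file.include_header("Welcome Crawlers", append_date=True)
robots_file.include_footer("Good Bye Crawlers")

# default rules that apply to all crawlers
ua_general = UserAgent(name="*")
ua_general.add_allow(
    allow_items=["/home", "/deep", "/home"],
    unique=True,
    comments="This is a list of allowed items",
)
ua_general.add_disallow(
    disallow_items=["/nopi$", "/topi?a", "/img*.png$"],
    unique=True,
    comments="This is a list of disallowed items",
)

# rules specific to Google, including a sitemap
ua_general_google = UserAgent(name="Google")
ua_general_google.add_allow(
    allow_items=["/home", "/deep", "/home"],
    unique=True,
    comments="This is a list of allowed items",
)
ua_general_google.add_disallow(
    disallow_items=["/nopi$", "/topi?a", "/img*.png$"],
    unique=True,
    comments="This is a list of disallowed items",
)
ua_general_google.add_sitemap("https://seowings.org/sitemap.xml")

# register the user agents and write the file to disk
robots_file.add_user_agent(ua_general)
robots_file.add_user_agent(ua_general_google)
robots_file.write("robots.txt")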

How to Read a Robots.txt File?

You can read a robots.txt file from a remote server or a website by creating an object of the RobotsTxt class.

Once the file has been read from the remote server, you can use it for any of the operations described in our API documentation, e.g. writing it to a local file.

# Read Remote File
from pyrobotstxt import RobotsTxt

robots_file = RobotsTxt()
robots_file.read("https://nike.com/robots.txt")
robots_file.write("nike_robots.txt")
print(robots_file.robots_details("Baiduspider"))

Developer Tutorial

How to Install pyrobotstxt on your Computer?

You can install pyrobotstxt for development in a virtual environment.

C:\src\pyrobotstxt> python -m venv .venv

C:\src\pyrobotstxt> .\.venv\Scripts\activate

(.venv) C:\src\pyrobotstxt> pip install -e .

How to Contribute to pyrobotstxt?

Feature Suggestions

If you have any feature suggestions or improvements, please create an issue on this repository.

Pull Requests

If you have improved anything in pyrobotstxt, please create a pull request and we will merge it after review.

Collaborations

You can contact us using our website serpwings.com/contact.