Best tool to check Robots.txt permissions of a website


Faisal Shahzad 28-02-2023 Web Analysis Tools

Robots.txt checker is an online tool to check, analyze, and validate your robots.txt file. This check is an important part of a website's technical SEO. Our tool analyzes the robots.txt file of a website and also provides a search function to look up a specific robot's configuration and the restrictions applied to a specific directory.

Why does checking robots.txt matter?

Most web admins need to put more focus on robots.txt configuration. By checking the robots.txt files of your website and your competitors', you can find common mistakes. These mistakes can be costly, such as allowing every web crawler and bot to crawl your website without any restriction.

How to use the SERP Wings robots.txt file checker?

Put your website URL in the text box and hit the Analyze button. Our backend will fetch the robots.txt file of the provided web address and then analyze it and present it to you in a readable form. You do not need to include robots.txt in the URL (a simple web address is enough); our algorithms are intelligent enough to check and validate your URL.
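
The fetch step is easy to reproduce locally. Here is a minimal sketch, assuming the requests library is installed and using www.website.com as a placeholder domain; like the tool, it accepts a plain web address and normalizes it into a robots.txt URL:

import requests

def fetch_robots_txt(address: str) -> str:
    """Fetch the robots.txt file for a bare domain or full URL."""
    # Add a scheme if missing, keep only scheme://host, append /robots.txt.
    if not address.startswith(("http://", "https://")):
        address = "https://" + address
    base = "/".join(address.split("/")[:3])
    response = requests.get(base + "/robots.txt", timeout=10)
    response.raise_for_status()
    return response.text

robots_txt = fetch_robots_txt("www.website.com")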

What are the important components of the robots.txt file?

A robots.txt file can restrict any crawl bot from accessing a file or directory. The restriction can be global or limited to specific parts of your website. Its key components are listed below, followed by a short example.

  • User Agents
  • Disallow Components
    • Blocked Pages
    • Blocked Directories
  • Allow Components
    • Allowed Pages
    • Allowed Directories
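
For illustration, a minimal robots.txt combining these components could look like this (all paths are placeholders):

# Rules for all bots
User-agent: *
Disallow: /private/
Allow: /private/welcome.html

# Rules for a single bot
User-agent: Googlebot
Disallow: /drafts/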

After analyzing your website's Allow and Disallow rules for a specific robot, you can reconfigure and change them from your content management system. You can edit them manually or through a plugin offered by your Content Management System (e.g. from the WordPress Plugin Directory).

Python Code to Check Robots.txt File

We use the following Python code to check the robots.txt file of a website. First, get a response object for your website (www.website.com) using the requests library, then parse the robots.txt text as follows.

import requests

# Fetch the robots.txt file; www.website.com is the placeholder from above.
response = requests.get("https://www.website.com/robots.txt")

# Collects one {bot: {"al": ..., "dal": ...}} entry per Allow/Disallow rule.
robot_txt = []

# Every record in a robots.txt file is introduced by a "User-agent:" line.
for item in response.text.split("User-agent:"):
    if item.strip():
        subitem = [
            line.strip() for line in item.split("\n") if line.strip()
        ]
        # The first line after the split is the bot name itself.
        bot = subitem[0]
        if not bot.startswith("#"):
            allow = [
                it.split("Allow:")[-1].strip()
                for it in subitem[1:]
                if it.startswith("Allow:")
            ]
            disallow = [
                it.split("Disallow:")[-1].strip()
                for it in subitem[1:]
                if it.startswith("Disallow:")
            ]

            if allow:
                robot_txt += [
                    {bot: {"al": al, "dal": ""}} for al in allow
                ]

            if disallow:
                robot_txt += [
                    {bot: {"al": "", "dal": dal}} for dal in disallow
                ]

            # Keep user agents that declare no Allow/Disallow rules at all.
            if not (allow or disallow):
                robot_txt += [{bot: {"al": "", "dal": ""}}]
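
If you only need a yes/no answer for a specific bot and URL, Python's standard library includes urllib.robotparser, which fetches and evaluates the file for you. A minimal sketch, again using the placeholder www.website.com:

from urllib import robotparser

parser = robotparser.RobotFileParser("https://www.website.com/robots.txt")
parser.read()  # downloads and parses the file

# True if Googlebot is allowed to crawl the given path, False otherwise.
print(parser.can_fetch("Googlebot", "https://www.website.com/private/"))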

Things to keep in mind when checking a robots.txt file

  • Check your current robots.txt file to find any issues with it
  • Make sure you block only the robots that you do not want to crawl your website.
  • Avoid adding a Crawl-delay directive, as major crawlers such as Googlebot ignore it.
  • You can also block the crawl agents of SEO tools, e.g. AHREFs or SEMRush (see the snippet after this list). A list of such agents can be found on www.seowings.org/robots-txt/
  • Revise the robots.txt file of your website periodically to make sure that it still complies with your SEO plan.
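
As an example of blocking SEO-tool crawlers, shutting out Ahrefs and Semrush takes only a few lines (AhrefsBot and SemrushBot are the user-agent tokens those tools document):

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /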

Conclusion

Analyzing the robots.txt file is an important yet often overlooked part of technical SEO. Make sure you analyze it and resolve any discrepancies with your SEO plan. Make a habit of checking it at regular intervals, and watch your competitors' robots.txt files to draw inspiration from their SEO strategies.