How to Create and Analyze XML Sitemaps using pysitemaps?
Sitemaps systematically inform third parties (users, search engines, other websites) about the structure of your website. They should be designed carefully to avoid unwanted errors, possible redirect codes, and invalid URLs.
Sitemaps can list articles, pages, images, PDF files, or videos. They help search engines find the appropriate content without in-depth crawling. They also reduce the need for deep crawls (search engines usually only go up to a crawl depth of 6).
pysitemaps is ideal for such tasks for static sites generated using Python.
What is the pysitemaps Python package?
pysitemaps is a Python package to systematically generate such sitemaps for static sites. There is no additional library dependency, as pysitemaps only uses the Python standard library (tested with version 3.10).
Which Problems can be solved with the pysitemaps package?
- No additional packages need to be installed
- Neat and clean code
- It can be added to any Python static site generator (your own or something like Pelican or MkDocs)
- It can read, write and analyze sitemaps
How to Create and Analyze XML Sitemaps using pysitemaps?
First, install it from PyPI using the following command:
pip install pysitemaps
Or you can install it from source. Download the source code, open a command-line console, and navigate to the directory containing the downloaded source code. Then install it using the following command:
pip install -e .
Now import Sitemap, XmlDocument and Url from the pysitemaps module and use them as shown in the following example.
from pysitemaps import Sitemap, Url, XmlDocument
How to Read Sitemap
Reading a sitemap from an XML file stored on a local computer is very easy with pysitemaps.
You only need to create a Sitemap object with the website name and then call the .read() function. In the .read() function you can also specify the full path of your sitemap.xml file.
An example of this procedure is shown below.
smp = Sitemap(website_name="seowings.org")
smp.read("sitemap.xml")
print(smp.as_dict())
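To make clearer what reading a sitemap involves, here is a minimal sketch that parses sitemap XML into a list of dictionaries using only the Python standard library. It is not the pysitemaps implementation. The helper name parse_urlset and the dictionary keys are assumptions, chosen to mirror the loc, lastmod and images_loc parameters used in this article.

```python
import xml.etree.ElementTree as ET

# Namespaces of the sitemap protocol and of Google's image extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def parse_urlset(xml_text):
    """Return one dict per <url> entry (hypothetical helper, not part of pysitemaps)."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append(
            {
                "loc": url.findtext("sm:loc", default="", namespaces=NS).strip(),
                "lastmod": url.findtext("sm:lastmod", default="", namespaces=NS).strip(),
                "images_loc": [
                    (img.text or "").strip()
                    for img in url.findall("image:image/image:loc", NS)
                ],
            }
        )
    return entries


# Small in-memory sample so the sketch runs without a sitemap.xml on disk.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.seowings.org/a.html</loc>
    <lastmod>2022-12-25</lastmod>
    <image:image><image:loc>https://www.seowings.org/a1.png</image:loc></image:image>
  </url>
</urlset>
"""

if __name__ == "__main__":
    print(parse_urlset(SAMPLE))
```

The structure returned by smp.as_dict() may differ from this list of dictionaries; the sketch only illustrates the kind of information a sitemap reader extracts.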
How to Create Sitemap
You can create a website sitemap by using the Sitemap object with website_name, the file_path of the sitemap.xml file, and an optional xsl_file.
Then append Url objects (or equivalent dictionaries, as in the second append call of the example) to the sitemap object. Later, you can use the .write() function to save the file to your local disk. The file will be saved to file_path, which you specified at the time of Sitemap object creation.
smp = Sitemap(
    website_name="https://www.seowings.org/",
    file_path="sitemap.xml",
    xsl_file="https://www.seowings.org/main-sitemap.xsl",
)
smp.append(
    Url(
        loc="https://www.seowings.org/a.html",
        lastmod="2022-12-25",
        images_loc=["https://www.seowings.org/a1.png"],
    )
)
smp.append(
    {
        "loc": "https://www.seowings.org/b.html",
        "lastmod": "2023-05-01",
        "images_loc": [
            "https://www.seowings.org/b1.png",
            "https://www.seowings.org/b2.png",
        ],
    }
)
smp.write()
print(smp.as_dict())
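For readers who want to see roughly what kind of XML such a sitemap file contains, the following sketch builds a comparable document with the standard library alone. It is an illustration under assumptions, not the code pysitemaps uses: build_sitemap_xml is a hypothetical helper, and the exact output written by smp.write() may differ.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def _q(namespace, tag):
    """Build the "{namespace}tag" form that ElementTree expects."""
    return f"{{{namespace}}}{tag}"


def build_sitemap_xml(urls, xsl_file=None):
    """Serialize url dicts to sitemap XML (hypothetical helper, not pysitemaps)."""
    # Note: register_namespace changes ElementTree's global prefix map.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(_q(SITEMAP_NS, "urlset"))
    for entry in urls:
        url = ET.SubElement(urlset, _q(SITEMAP_NS, "url"))
        ET.SubElement(url, _q(SITEMAP_NS, "loc")).text = entry["loc"]
        if entry.get("lastmod"):
            ET.SubElement(url, _q(SITEMAP_NS, "lastmod")).text = entry["lastmod"]
        for image_loc in entry.get("images_loc", []):
            image = ET.SubElement(url, _q(IMAGE_NS, "image"))
            ET.SubElement(image, _q(IMAGE_NS, "loc")).text = image_loc
    header = '<?xml version="1.0" encoding="UTF-8"?>\n'
    if xsl_file:
        # Optional stylesheet so browsers can render the sitemap nicely.
        header += f'<?xml-stylesheet type="text/xsl" href={quoteattr(xsl_file)}?>\n'
    return header + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(
        build_sitemap_xml(
            [
                {
                    "loc": "https://www.seowings.org/a.html",
                    "lastmod": "2022-12-25",
                    "images_loc": ["https://www.seowings.org/a1.png"],
                }
            ],
            xsl_file="https://www.seowings.org/main-sitemap.xsl",
        )
    )
```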
How to Locate/Find Sitemap of any Website
If you want to locate the sitemap of any website on the internet, create a Sitemap object and use .fetch(include_urls=False). This function will look for all possible locations where a website can have a sitemap.
The following snippet will locate the sitemap of the seowings.org website.
smp = Sitemap(website_name="https://www.seowings.org/")
smp.fetch(include_urls=False)
print(smp.as_dict())
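The idea of checking several possible locations can be sketched as follows. This is a guess at the general technique, not a description of what pysitemaps actually probes: the list of conventional file names and the helper candidate_sitemap_urls are assumptions, and the function performs no network access. It relies on the fact that a robots.txt file may advertise sitemaps through "Sitemap:" lines.

```python
from urllib.parse import urljoin

# Assumed conventional file names; pysitemaps may check different ones.
COMMON_SITEMAP_PATHS = ["sitemap.xml", "sitemap_index.xml", "sitemap-index.xml"]


def candidate_sitemap_urls(website, robots_txt=""):
    """List sitemap URLs worth probing (hypothetical helper, no network access)."""
    candidates = []
    # First, collect sitemaps advertised in robots.txt via "Sitemap:" lines.
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    # Then fall back to conventional file names at the site root.
    for path in COMMON_SITEMAP_PATHS:
        url = urljoin(website, path)
        if url not in candidates:
            candidates.append(url)
    return candidates


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow:\nSitemap: https://www.seowings.org/sitemap.xml\n"
    for url in candidate_sitemap_urls("https://www.seowings.org/", robots):
        print(url)
```

Each candidate would then be requested in turn, keeping the ones that return valid sitemap XML.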
How to Fetch Sitemap of any Website
You can also use the fetch function to download the complete sitemap of a website. For this, you need to set include_urls=True in the fetch function, as shown in the Python snippet below.
This snippet will locate and download the sitemap of the seowings.org website.
smp = Sitemap(website_name="https://www.seowings.org/")
smp.fetch(include_urls=True)
print(smp.as_dict())
How to Fetch the Index Sitemap of any Website?
smp = Sitemap(website_name="https://www.dw.com/")
smp.fetch(include_urls=False)
print(smp.as_dict())
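In the sitemap protocol, an index sitemap is a file whose root element is sitemapindex and which lists child sitemaps instead of pages. The sketch below shows one way to tell the two kinds of file apart with the standard library. The helper classify_sitemap and the example.com URLs are placeholders for illustration, not part of pysitemaps and not the real sitemap of any site.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def classify_sitemap(xml_text):
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text.strip())
    local_name = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
    if local_name == "sitemapindex":
        path = "sm:sitemap/sm:loc"  # an index points at other sitemap files
        kind = "index"
    else:
        path = "sm:url/sm:loc"  # a regular sitemap points at pages
        kind = "urlset"
    return kind, [(e.text or "").strip() for e in root.findall(path, NS)]


# Placeholder index document; the URLs are made up for the example.
SAMPLE_INDEX = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-2.xml</loc></sitemap>
</sitemapindex>
"""

if __name__ == "__main__":
    print(classify_sitemap(SAMPLE_INDEX))
```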
How to Extract Urls from XML Sitemap Document?
There are certain use cases when you might want to extract URLs from the sitemap's urlset tag. For this, you can create an XmlDocument object and then use the add_from_text function to extract Url objects from the raw text (the text needs to be a valid sitemap XML file).
In the following snippet, we show how to create an XmlDocument and then add Url objects to it using three different approaches:
- using a Url object
- using URL parameters, i.e. loc, lastmod and images_loc
- extracting Url objects from raw XML text.
from datetime import datetime
news_sitemap = XmlDocument(
    "sitemap-news.xml",
    lastmod=datetime.now().strftime("%Y-%m-%d"),
    include_urls=False,
)
news_sitemap.add_object(Url("b.html", "2023-05-02", ["img1.png", "img2.png"]))
news_sitemap.add_url(
    loc="c.html", lastmod="2023-01-02", images_loc=["img4.png", "img5.png"]
)
news_sitemap.add_from_text(
"""
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>z.html</loc>
<lastmod>2022</lastmod>
<image:image>
<image:loc>z.png</image:loc>
</image:image>
</url>
<url>
<loc>dz.html</loc>
<lastmod>2022</lastmod>
<image:image>
<image:loc>z.png</image:loc>
</image:image>
<image:image>
<image:loc>a.png</image:loc>
</image:image>
</url>
</urlset>
"""
)
print(news_sitemap.as_dict())
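Once URLs have been extracted, a simple analysis step is to check each entry for the kind of invalid values mentioned at the start of this article. The sketch below is one possible check, written with the standard library only. check_entry is a hypothetical helper and the entry dictionaries are an assumed shape, not the documented output of as_dict(). It uses two rules from the sitemap protocol: loc should be a fully qualified URL, and lastmod should be in W3C Datetime format.

```python
from datetime import datetime
from urllib.parse import urlparse

# Date-only forms of the W3C Datetime format.
DATE_FORMATS = ("%Y", "%Y-%m", "%Y-%m-%d")


def _valid_lastmod(value):
    """Accept date-only values and full ISO timestamps such as 2023-05-01T10:00:00+00:00."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    try:
        # "Z" is rewritten because older Python versions reject it in fromisoformat.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def check_entry(entry):
    """Return a list of problems found in one url dict (hypothetical helper)."""
    problems = []
    parts = urlparse(entry.get("loc", ""))
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("loc is not an absolute http(s) URL")
    lastmod = entry.get("lastmod")
    if lastmod and not _valid_lastmod(lastmod):
        problems.append("lastmod is not a recognizable date")
    return problems


if __name__ == "__main__":
    for entry in (
        {"loc": "https://www.seowings.org/a.html", "lastmod": "2022-12-25"},
        {"loc": "z.html", "lastmod": "2022"},
    ):
        print(entry["loc"], check_entry(entry))
```

Relative values such as z.html, used above for brevity, would be flagged by this check, so a production sitemap should use full URLs.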
How to Contribute?
Suggestions, criticism, comments and contributions are welcome. Please open an issue on the repository home page on GitHub.
If you have fixed a bug or added a useful feature, please clone this repository, include your changes, and create a pull request. We appreciate your efforts and will add them at the earliest opportunity.
Documentation
Documentation of pysitemaps is live at pysitemaps.pages.dev.