Technical SEO Basics: Sitemaps

A sitemap lays out the structural foundation of your website. Visualizing your site as a sitemap is a useful way to plan, build and expand it, and it also helps you spot where internal links from one page to another make sense.

Sitemaps have another important role: they point search engines to the pages on your website and ensure that none are missed by crawlers.

There are several types of sitemaps:

  • An HTML sitemap is meant for users. It’s usually a simple page containing links to the important pages within the site and gives a general overview.
  • An XML sitemap is meant for search engines and contains more information, for example:
    • when a page was last updated,
    • how often a page changes, and
    • how important a page is relative to the other pages of the site.

This information allows search engines to analyze content in a more logical and intelligent manner. XML sitemaps are especially useful for new websites that have not been discovered yet by search engines.
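To make this concrete, here is a minimal sketch of a single entry in an XML sitemap; the URL and values are placeholders, not taken from a real site:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- one <url> block per page you want search engines to know about -->
      <url>
        <loc>https://www.example.com/about</loc>   <!-- full URL of the page -->
        <lastmod>2023-05-01</lastmod>              <!-- when the page was last updated -->
        <changefreq>monthly</changefreq>           <!-- how often it typically changes -->
        <priority>0.8</priority>                   <!-- importance relative to other pages, 0.0 to 1.0 -->
      </url>
    </urlset>

Search engines treat changefreq and priority as hints rather than commands, but this is the structure the free generators mentioned in the next section produce.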


How to build an XML sitemap?

There are free tools for this, such as xml-sitemaps.com or Screaming Frog.

In general, leave the settings at their defaults. Don’t include images, as they will be discovered anyway. After creating the sitemap, upload it to your web server. If you are working with WordPress, have a look at the free Yoast SEO plugin, which already includes sitemap generation.

Submit the sitemap to Google Search Console as soon as your website is live. Please refer to Google’s help files and how-tos on creating a Google Search Console account. It’s very user friendly, so don’t be afraid to try things out on your own. As always, don’t hesitate to get in touch with questions; it’s definitely worth the time and effort to understand how it works.

Sitemaps are inclusionary files: they tell search engines which pages you want included in their index.

The robots.txt file is another method for controlling how search engines crawl your website. It’s a protocol that lets you specify which pages should not be crawled and the rate at which files are accessed. In other words, while a sitemap is inclusionary, robots.txt is exclusionary. Once you upload the robots.txt file to the root of your web server, it tells bots what to crawl and what not to crawl, although bots can choose to ignore it. It’s a publicly available file (yoururl.com/robots.txt) and will be ignored if it sits in a subfolder. Here are some basic directives that should help you understand the file better (a complete example follows the list):

  • User-agent: * (the asterisk acts as a wildcard for all search engine bots)
  • Crawl-delay: 10 (wait 10 seconds between requests to save bandwidth, otherwise the server can be overloaded)
  • Disallow: /directory/ (don’t crawl the referenced content)
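
Putting these directives together, a minimal robots.txt might look like the sketch below; the directory names and sitemap URL are placeholders, not taken from a real site:

    # Applies to all crawlers
    User-agent: *
    # Wait 10 seconds between requests to spare the server
    Crawl-delay: 10
    # Keep crawlers out of these sections
    Disallow: /admin/
    Disallow: /tmp/
    # Optional but useful: point crawlers to your XML sitemap
    Sitemap: https://www.example.com/sitemap.xml

The Sitemap line ties the two files together nicely: robots.txt tells crawlers what to stay out of, while the sitemap it points to tells them what you want included.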