Robots + Sitemap

robots.txt + discovered sitemap URLs.

robots.txt + sitemaps for amaze.au

robots.txt

User-agent: *
Disallow:

Sitemap: https://amaze.au/sitemap_index.xml

Sitemaps discovered

About Robots + Sitemap

This tool fetches robots.txt from the host and displays it as-is. It also extracts every Sitemap: directive from the file and probes the two standard sitemap paths (/sitemap.xml and /sitemap_index.xml) to find any sitemaps that exist but were not declared. The result gives you a complete picture of what the site invites crawlers to read and where to find the structured URL inventory.
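The extraction step described above can be sketched in a few lines. This is a minimal illustration and not the tool's actual implementation: the function names `declared_sitemaps` and `undeclared_candidates` are hypothetical, and the sketch only builds the candidate URLs without fetching them.

```python
from urllib.parse import urljoin

# The two conventional sitemap locations named in the text above.
STANDARD_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return the value of every Sitemap: directive, in file order."""
    found = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments (assumes sitemap URLs contain no '#').
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # Directive names are matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def undeclared_candidates(host_url: str, declared: list[str]) -> list[str]:
    """Return standard sitemap URLs that robots.txt did not declare."""
    candidates = [urljoin(host_url, path) for path in STANDARD_PATHS]
    return [url for url in candidates if url not in declared]
```

For the robots.txt shown for amaze.au, the declared list holds the sitemap_index.xml URL, which leaves /sitemap.xml as the only extra location worth probing.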

When to use it

Use this when migrating a site to confirm that the new server is serving the right crawl rules. SEO professionals use it to verify that they have not accidentally disallowed important paths. Bug bounty researchers read it for hints about admin paths or staging endpoints that are listed only here. Webmasters use it to find where competitors' sitemaps live, which helps with content strategy and with estimating indexable URL counts.

How to read the results

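Disallow lines tell crawlers which paths not to visit. A bare Disallow: with no path blocks nothing, so everything may be crawled. Allow: carves out exceptions to a Disallow: rule; when both match a URL, crawlers that follow RFC 9309 apply the most specific (longest) matching rule. User-agent: * applies to every crawler that has no group of its own, while a crawler named in its own group follows only that group's rules. Sitemap: lines list the sitemap locations the site declares. The Crawl-delay directive suggests a minimum gap between requests in seconds, but it is non-standard and some crawlers, including Googlebot, ignore it.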
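As a rough check of these reading rules, the robots.txt shown for amaze.au can be fed to Python's standard-library parser. This is a sketch only: the crawler name `ExampleBot` and the URL path are made up for illustration, and Python's parser applies rules in file order, which can differ from the longest-match behaviour of some crawlers when Allow: and Disallow: overlap.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content reported for amaze.au above.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://amaze.au/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical crawler with no group of its own,
# so it falls back to the User-agent: * group.
allowed = parser.can_fetch("ExampleBot", "https://amaze.au/any/path")

# No Crawl-delay line is present, so the parser reports None.
delay = parser.crawl_delay("ExampleBot")

# site_maps() (Python 3.8+) returns the declared Sitemap: URLs.
sitemaps = parser.site_maps()

print(allowed, delay, sitemaps)
```

Because the only Disallow: line is empty, every path is reported as crawlable.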

Frequently asked questions

Does robots.txt enforce access control?

No. It is a polite request, not a security control. Well-behaved crawlers such as Googlebot and Bingbot respect it. Malicious crawlers and scrapers are free to ignore it. For real access control, use authentication, firewall rules, or rate limits.

Why does the tool show sitemaps that were not in robots.txt?

Many sites publish sitemaps at the conventional /sitemap.xml path without declaring them in robots.txt. Some crawlers try this path even when it is not declared. The tool probes the standard locations (/sitemap.xml and /sitemap_index.xml) and reports any that return a 200 response.
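A probe of this kind might look like the sketch below. It is an assumption about how such a check could be written, not the tool's own code: `http_status` and `existing_sitemaps` are hypothetical names, and a HEAD request is used for brevity even though some servers answer HEAD differently from GET.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url, or 0 if the request failed."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; keep the code.
        return err.code
    except OSError:
        # DNS failures, refused connections, and timeouts land here.
        return 0


def existing_sitemaps(
    urls: Iterable[str],
    status_of: Callable[[str], int] = http_status,
) -> list[str]:
    """Keep only the URLs whose probe returns a 200 response."""
    return [url for url in urls if status_of(url) == 200]
```

Passing the status function as a parameter keeps the filter testable without network access: a stub that returns fixed codes can stand in for `http_status`.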

Should I block AI crawlers in robots.txt?

That is a policy choice. Many publishers now block GPTBot, ClaudeBot, Google-Extended, and similar named user-agent tokens. Be aware that some crawlers ignore robots.txt, and that a block does not remove content from training sets assembled before it was added.
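For publishers who do choose to block, the effect of a named-agent group can be checked before deploying it. The robots.txt below is a hypothetical example (it is not the amaze.au file), `example.com` is a placeholder host, and Python's standard-library parser stands in for a real crawler, which may interpret the file somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block three AI-related tokens, allow everyone else.
BLOCKING_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(BLOCKING_ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # placeholder URL for illustration

# Named agents follow their own group, which disallows everything.
blocked = {agent: not parser.can_fetch(agent, url)
           for agent in ("GPTBot", "ClaudeBot", "Google-Extended")}

# Any other crawler falls back to the User-agent: * group.
others_allowed = parser.can_fetch("Googlebot", url)

print(blocked, others_allowed)
```

Listing several User-agent: lines above one rule set keeps the policy in a single group, so adding another token later is a one-line change.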

How big can robots.txt be?

Google processes only the first 500 KiB (512,000 bytes) of robots.txt. Other crawlers may apply different limits. Content past the limit is ignored, so rules near the end of an oversized file may silently not apply. Keep robots.txt focused on the directives that matter and offload large URL lists to sitemaps.
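A quick size check can show whether a file would be cut off at that limit. This sketch assumes the 500 KiB figure quoted above and simply slices the raw bytes; `effective_robots` is a hypothetical helper, and real crawlers may handle the cutoff differently.

```python
# 500 KiB expressed in bytes: 500 * 1024 = 512,000.
GOOGLE_LIMIT_BYTES = 500 * 1024


def effective_robots(body: bytes, limit: int = GOOGLE_LIMIT_BYTES) -> tuple[bytes, bool]:
    """Return the part of body within the limit and whether it was cut."""
    truncated = len(body) > limit
    return body[:limit], truncated
```

Running this against the fetched response body before parsing makes it obvious when trailing rules would never be seen.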
