Robots + Sitemap
robots.txt + discovered sitemap URLs.
robots.txt + sitemaps for filetruth.com
robots.txt
500 Internal Server Error: The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator at [email protected] to inform them of the time this error occurred, and the actions you performed just before this error. More information about this error may be available in the server error log. Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.
Sitemaps discovered
No sitemap is declared in robots.txt, and none of the common paths returned a 200 response.
About Robots + Sitemap
This tool fetches robots.txt from the host and displays the response as-is. It also extracts every Sitemap: directive from the file and probes two conventional sitemap paths (/sitemap.xml and /sitemap_index.xml) to find sitemaps that exist but were not declared. The result shows what the site asks crawlers to read or skip, and where its structured URL inventory lives.
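The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function names extract_sitemaps and undeclared_candidates are hypothetical, and example.com stands in for any host.

```python
from urllib.parse import urljoin

# The two conventional paths named in the text above.
COMMON_SITEMAP_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def extract_sitemaps(robots_txt: str) -> list[str]:
    """Return the value of every Sitemap: line, in file order."""
    found = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments and whitespace. Assumption: sitemap URLs
        # carry no '#' fragment, which is the usual case.
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # The directive name is matched case-insensitively; empty values are skipped.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def undeclared_candidates(base_url: str, declared: list[str]) -> list[str]:
    """Conventional sitemap URLs on this host that robots.txt did not declare."""
    candidates = [urljoin(base_url, path) for path in COMMON_SITEMAP_PATHS]
    return [url for url in candidates if url not in declared]
```

Calling extract_sitemaps on the fetched body and passing the result to undeclared_candidates yields the list of URLs still worth probing.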
When to use it
Use this when migrating a site to confirm that the new server serves the right crawl rules. SEO professionals use it to verify that they have not accidentally disallowed important paths. Bug bounty researchers read it for hints about admin paths or staging endpoints that appear nowhere else. Webmasters use it to find competitors' sitemaps, which helps with content strategy and with estimating indexable URL counts.
How to read the results
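Disallow: lines tell crawlers which paths not to visit. A bare Disallow: with no path blocks nothing, so the crawler may fetch everything. Allow: carves exceptions out of a broader Disallow: rule; under RFC 9309 the most specific (longest) matching rule wins, and Allow: wins a tie. User-agent: * applies to every crawler that has no group of its own, while a crawler that matches a named group follows that group instead. Sitemap: lines list the sitemap locations the site declares. Crawl-delay is a non-standard directive that suggests a minimum gap between requests in seconds; some crawlers, including Googlebot, ignore it.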
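To see how these directives read in practice, the sketch below feeds a made-up robots.txt to Python's standard urllib.robotparser module. The sample file, the agent names otherbot and examplebot, and the example.com URLs are all illustrative assumptions. Note that this parser applies the first matching rule in file order, so its answers can differ from a longest-match crawler when Allow: and Disallow: rules overlap; the sample avoids that case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for illustration.
SAMPLE = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5

User-agent: examplebot
Disallow:

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# A crawler with no group of its own falls back to the * group.
blocked = parser.can_fetch("otherbot", "https://example.com/private/page")

# examplebot has its own group with a bare Disallow:, so nothing is blocked.
named = parser.can_fetch("examplebot", "https://example.com/private/page")

# Crawl-delay and Sitemap values as the parser reports them.
delay = parser.crawl_delay("otherbot")
sitemaps = parser.site_maps()  # available in Python 3.8 and later

print(blocked, named, delay, sitemaps)
```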
Frequently asked questions
Does robots.txt enforce access control?
No. It is a polite request, not a security control. Well-behaved crawlers such as Googlebot and Bingbot respect it. Malicious crawlers and scrapers are free to ignore it. For real access control, use authentication, firewall rules, or rate limits.
Why does the tool show sitemaps that were not in robots.txt?
Many sites publish a sitemap at the conventional /sitemap.xml path without declaring it in robots.txt. Some crawlers try that path even when it is not declared. The tool probes /sitemap.xml and /sitemap_index.xml and reports any that return a 200 response.
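A probe of this kind might look like the sketch below. It is an assumption about how such a check could be written, not the tool's own code: http_status and probe_sitemaps are hypothetical names, and the User-Agent string is a placeholder. The status lookup is passed in as a parameter so the logic can be exercised without network access.

```python
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

COMMON_SITEMAP_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status of a GET request, or None if the request fails."""
    request = Request(url, headers={"User-Agent": "sitemap-probe-sketch"})
    try:
        # urlopen follows redirects, so this is the status after any redirect.
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code  # 4xx and 5xx responses arrive as exceptions
    except OSError:
        return None  # DNS failure, refused connection, timeout, and similar


def probe_sitemaps(
    base_url: str,
    get_status: Callable[[str], Optional[int]] = http_status,
) -> list[str]:
    """Return the conventional sitemap URLs that answered with 200."""
    urls = [urljoin(base_url, path) for path in COMMON_SITEMAP_PATHS]
    return [url for url in urls if get_status(url) == 200]
```

With the default get_status, probe_sitemaps("https://example.com") performs two live requests; passing a stub function instead keeps the check offline.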
Should I block AI crawlers in robots.txt?
That is a policy choice. Many publishers now block GPTBot, ClaudeBot, Google-Extended, and similar named user agents. Be aware that some crawlers ignore robots.txt, and blocking does not prevent content from appearing in training sets assembled before the block.
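For publishers who do decide to block, the rules are one group per user agent. The sketch below generates such groups for the three agents named above and checks them with Python's standard parser. The helper name block_agents is hypothetical, and the agent list is only a starting point that each site would adjust to its own policy.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the answer above; extend the tuple as policy requires.
AI_AGENTS = ("GPTBot", "ClaudeBot", "Google-Extended")


def block_agents(agents) -> str:
    """Build one robots.txt group per agent that disallows the whole site."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in agents]
    return "\n".join(groups)


rules = block_agents(AI_AGENTS)
print(rules)

# Sanity check: parse the generated rules with the standard-library parser.
parser = RobotFileParser()
parser.parse(rules.splitlines())
```

After parsing, parser.can_fetch("GPTBot", url) reports whether the generated rules block a given URL for that agent, while agents without a group remain unaffected.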
How big can robots.txt be?
Google reads up to 500 KiB and ignores anything past that limit. Other crawlers may apply different limits. Rules near the end of an oversized file can therefore be dropped silently, which makes crawler behavior hard to predict. Keep robots.txt focused on the directives that matter and move large URL lists into sitemaps.
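A quick size check can flag a file at risk of truncation. The sketch below uses the 500 KiB figure stated above; the function name bytes_over_limit is hypothetical, and other crawlers may need a different limit value.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # 500 KiB, the limit stated above


def bytes_over_limit(robots_body: bytes, limit: int = GOOGLE_LIMIT_BYTES) -> int:
    """Return how many bytes fall past the limit (0 if the file fits)."""
    return max(0, len(robots_body) - limit)
```

Pass the raw response body as bytes rather than decoded text, since the limit is measured in bytes and multi-byte characters would otherwise be undercounted.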