Features
SiteMapr is a single-purpose tool that does one thing well: crawl your website and produce a search-engine-ready XML sitemap. Below is an honest, detailed look at what the tool can do, how each feature works, and what it deliberately does not do.
Parallel HTTP crawler
The crawler I wrote uses a parallel HTTP client with a concurrency limit, an internal URL queue, and per-host throttling. On most websites it processes hundreds of pages in seconds. The fastest crawls I have seen are around ten URLs per second per host; the bottleneck is almost always the target server, not the crawler.
Concurrency is intentionally bounded so SiteMapr does not behave like a denial-of-service tool against small sites. If your origin starts returning 429 or 503 responses during a crawl, the crawler backs off automatically.
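The general shape of a loop like this is a shared queue, a bounded batch of in-flight requests, and a long pause whenever the origin pushes back. The sketch below illustrates that pattern; the constants, names, and the naive regex link extractor are assumptions for the example, not SiteMapr's actual implementation.

```ts
// Illustrative sketch of a bounded, batch-style crawl loop with backoff.
const MAX_CONCURRENCY = 5;     // upper bound on simultaneous requests
const PER_HOST_DELAY_MS = 100; // pause between batches against the same host

function extractLinks(html: string, base: string): string[] {
  // Tiny href extractor for the sketch; a real crawler uses an HTML parser.
  const links: string[] = [];
  for (const match of html.matchAll(/href="([^"#]+)"/g)) {
    try {
      const url = new URL(match[1], base);
      if (url.host === new URL(base).host) links.push(url.origin + url.pathname);
    } catch {
      // ignore malformed hrefs
    }
  }
  return links;
}

async function crawl(entry: string, maxPages: number): Promise<Set<string>> {
  const queue: string[] = [entry];
  const visited = new Set<string>();
  let backoff = false;

  while (queue.length > 0 && visited.size < maxPages) {
    // Pull one bounded batch of unvisited URLs off the queue.
    const batch: string[] = [];
    while (queue.length > 0 && batch.length < MAX_CONCURRENCY &&
           visited.size + batch.length < maxPages) {
      const url = queue.shift()!;
      if (!visited.has(url) && !batch.includes(url)) batch.push(url);
    }

    await Promise.all(batch.map(async (url) => {
      visited.add(url);
      const res = await fetch(url);
      if (res.status === 429 || res.status === 503) {
        // Origin is pushing back: requeue the URL and slow the whole crawl down.
        visited.delete(url);
        queue.push(url);
        backoff = true;
        return;
      }
      for (const link of extractLinks(await res.text(), url)) {
        if (!visited.has(link) && !queue.includes(link)) queue.push(link);
      }
    }));

    // Throttle between batches; wait much longer after a 429/503.
    await new Promise((r) => setTimeout(r, backoff ? 5_000 : PER_HOST_DELAY_MS));
    backoff = false;
  }
  return visited;
}
```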
Robots.txt compliance, on by default
By default the crawler reads your robots.txt before doing any work and respects every Disallow directive that applies to its user agent. Crawl-delay is honored, and the Sitemap directive is ignored (the whole point of using SiteMapr is that you may not have a sitemap yet).
You can turn robots.txt compliance off in the advanced settings if you need to crawl everything for a private inventory. I recommend leaving it on for most use cases. If you are unfamiliar with how robots.txt interacts with sitemaps, the robots.txt guide on the blog covers the relationship in detail.
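For readers curious what "respects every Disallow directive" means mechanically, the sketch below shows a deliberately simplified check: fetch /robots.txt once, collect the Disallow rules for the matching user agent, and test each candidate path against them. Real robots.txt matching (wildcards, Allow precedence, longest-match rules) is more involved, and this is not SiteMapr's actual parser; the function names are illustrative.

```ts
// Simplified robots.txt check: no wildcard or Allow handling.
async function loadDisallowRules(origin: string, agent: string): Promise<string[]> {
  const res = await fetch(new URL("/robots.txt", origin));
  if (!res.ok) return []; // no robots.txt: nothing is disallowed

  const rules: string[] = [];
  let appliesToUs = false;
  for (const raw of (await res.text()).split("\n")) {
    const line = raw.split("#")[0].trim(); // strip comments
    const [key, ...rest] = line.split(":");
    const value = rest.join(":").trim();
    if (/^user-agent$/i.test(key)) {
      appliesToUs = value === "*" || agent.toLowerCase().includes(value.toLowerCase());
    } else if (appliesToUs && /^disallow$/i.test(key) && value) {
      rules.push(value);
    }
  }
  return rules;
}

function isAllowed(url: string, rules: string[]): boolean {
  const path = new URL(url).pathname;
  return !rules.some((prefix) => path.startsWith(prefix));
}
```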
Configurable depth, scope, and page limit
Three options control the breadth of a crawl (a short configuration sketch follows the list):
- Maximum pages caps the total URL count for one session. Defaults to 100, with a free-tier ceiling of 500 per session. Keeps small crawls quick and avoids runaway crawls on accidentally-large sites.
- Maximum depth caps how many link-hops from the entry URL the crawler will follow. Useful when you only want top-level pages or when the navigation structure is unusually deep.
- Include subdomains toggles whether the crawler follows links that go from example.com to blog.example.com. Off by default, because most sitemaps should be per-host.
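Taken together, the three settings amount to a small scope filter applied to every discovered URL. The sketch below shows one way that filter could look; the field names are illustrative, not SiteMapr's actual settings object.

```ts
// Hypothetical shape of the three scope controls described above.
interface CrawlScope {
  maxPages: number;           // total URLs per session (default 100, ceiling 500)
  maxDepth: number;           // link-hops from the entry URL
  includeSubdomains: boolean; // follow example.com -> blog.example.com links
}

// A discovered URL stays in scope when it is on the same host (or an allowed
// subdomain) and sits within the depth limit; the page cap is enforced by the
// crawl loop itself.
function inScope(url: URL, entry: URL, depth: number, scope: CrawlScope): boolean {
  const sameHost = url.host === entry.host;
  const isSubdomain = url.host.endsWith("." + entry.host);
  return depth <= scope.maxDepth && (sameHost || (scope.includeSubdomains && isSubdomain));
}
```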
Continue Crawling for sites over 500 URLs
The 500-URL session cap exists to keep individual crawls fast and bounded. For sites larger than that, the Continue Crawling button picks up where the previous session stopped: the crawler reuses the set of already-visited URLs and the queue of pending URLs discovered but not yet crawled, then runs another batch.
This is functionally a sitemap index pattern done client-side. For a 5,000-URL site you run ten consecutive batches, then export the combined result. The sitemap index article covers the server-side equivalent for very large sites.
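Mechanically, continuing a crawl just means carrying two lists across sessions: what has already been crawled, and what is still waiting. A minimal sketch of that hand-off, assuming browser localStorage as the persistence layer (an assumption for the example, not necessarily how SiteMapr stores it):

```ts
// Minimal sketch of carrying crawl state across sessions. The localStorage
// key and the CrawlState shape are illustrative.
interface CrawlState {
  visited: string[]; // URLs already crawled in previous sessions
  pending: string[]; // URLs discovered but not yet crawled
}

function saveState(state: CrawlState): void {
  localStorage.setItem("crawlState", JSON.stringify(state));
}

function loadState(): CrawlState {
  const raw = localStorage.getItem("crawlState");
  return raw ? (JSON.parse(raw) as CrawlState) : { visited: [], pending: [] };
}

// The next batch seeds its queue from state.pending instead of the entry URL
// and skips anything in state.visited, so ten 500-URL batches can cover a
// 5,000-URL site.
```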
Real-time progress feedback
Crawls run with a streaming progress bar that reports the current URL, total crawled, and total queued, updated in real time. The streaming uses Server-Sent Events under the hood, so the connection stays open for the duration of the crawl rather than polling.
For long crawls this matters because you can see immediately when a target server starts returning errors or when the crawler hits an unexpected redirect chain, instead of discovering it at the end.
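As a rough illustration of the Server-Sent Events approach, a page can subscribe with the browser's built-in EventSource and redraw the progress display on each message. The endpoint path and payload fields below are assumptions for the sake of the example, not SiteMapr's API.

```ts
// Consuming crawl progress over Server-Sent Events (assumed endpoint/fields).
const source = new EventSource("/crawl/progress");

source.onmessage = (event) => {
  const progress = JSON.parse(event.data) as {
    currentUrl: string;
    crawled: number;
    queued: number;
  };
  console.log(`${progress.crawled} crawled, ${progress.queued} queued: ${progress.currentUrl}`);
};

// Stop the browser's automatic reconnect attempts once the stream ends.
source.onerror = () => source.close();
```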
Three export formats: XML, TXT, CSV
Every completed crawl produces three downloadable files:
- XML follows the sitemap.org 0.9 protocol exactly: urlset, loc, and lastmod tags, valid against every major search engine. This is what you submit to Google Search Console; a sketch of the output shape follows this list.
- TXT is a simple list of URLs, one per line. Most search engines also accept this format. Useful for piping into other tools or for human review.
- CSV includes the URL plus per-page metadata (status code, content type, content length). This is the format I use most often for actually auditing a site, because spreadsheet sorting and filtering reveal patterns that the XML hides.
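For reference, the XML export boils down to wrapping each crawled URL in a url element with loc and lastmod children inside a urlset that carries the sitemap.org 0.9 namespace. A minimal serializer might look like the sketch below; the Page shape and helper names are assumptions, not SiteMapr's code.

```ts
// Minimal sketch of serializing crawl results into a sitemap.org 0.9 urlset.
interface Page {
  url: string;
  lastModified?: string; // W3C date string, e.g. "2024-01-15"
}

function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function toSitemapXml(pages: Page[]): string {
  const entries = pages.map((p) =>
    [
      "  <url>",
      `    <loc>${escapeXml(p.url)}</loc>`,
      p.lastModified ? `    <lastmod>${p.lastModified}</lastmod>` : "",
      "  </url>",
    ].filter(Boolean).join("\n")
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```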
No account required, ever
There is no signup. There is no email gate. There is no “upgrade to unlock” prompt. The full feature set, including the 500-URL cap and Continue Crawling, is available to anyone who visits the homepage and pastes a URL.
The site is supported by ads. If you find SiteMapr useful, the most helpful thing you can do is not block them. The next most helpful thing is to share the tool with someone who needs a sitemap.
What SiteMapr does not do
Honest disclosure of the tool's limits, so you do not waste time discovering them:
- JavaScript rendering. The crawler fetches HTML and parses link tags. It does not run a headless browser. For client-rendered single-page apps with no server-side fallback, the crawler will see only the entry page. The JavaScript-heavy sites article explains the alternatives.
- Authentication. SiteMapr cannot crawl URLs behind a login. If your content is members-only, it should not be in a public sitemap anyway, but if you need to audit a logged-in surface, use Screaming Frog or a custom Playwright crawler.
- Image, video, or news sitemap extensions. The output is a standard URL sitemap. If you need image-sitemap or video-sitemap metadata, you will need to generate those separately or use a CMS plugin that supports them natively.
- Scheduled regeneration. Each crawl is a one-off, run from the browser session. There is no “recrawl my site every Monday at 9am” feature. For sites that change frequently, generate a fresh sitemap whenever you publish.
- Sitemap submission. SiteMapr does not submit your sitemap to Google for you. You upload the XML to your own server, then submit the URL through Google Search Console. The submission guide on the blog walks through every step.
Where to start
The fastest path: enter your URL on the homepage, leave the defaults alone, and click Generate. For most websites that produces a usable sitemap in under thirty seconds.
If something does not work as expected, or if you have a feature request, the contact page goes directly to my inbox. I read and reply to every message personally.