
Built a CLI that tells me if GPTBot/ClaudeBot/Perplexity can actually reach my site (and where the block is)

I kept getting "your AI visibility is low" reports from various tools that wouldn't tell me *why*. Was the block in robots.txt? At the CDN? At origin? Different fixes, different teams. I guess this sits somewhere in the "generative engine optimization" bucket, but I wanted the tool to stay very concrete: can these crawlers reach the site, and if not, where are they being blocked? So I wrote a small Node CLI that answers that question deterministically:

```
npx @geosuite/ai-crawler-bots robots https://my-site.com
```

What it actually does:

- Parses robots.txt with line-level provenance — when a bot is Disallow'd, it tells me *which line in which group*.
- For each tracked bot (24 right now: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot, Perplexity-User, Bytespider, etc.), reports the verdict.
- Detects Cloudflare's "Managed Content" markers (`# BEGIN Cloudflare Managed content` … `# END`) and tells me whether my own rules would have allowed the bot.
- Also has a `check <url>` mode that does an actual HTTP probe with each bot's UA and distinguishes edge blocks (CDN fingerprints) from origin blocks. Different remediation.

Zero runtime dependencies, MIT, Node 20+. Source: github.com/TryGeoSuite/ai-crawler-bots

There are three companion tools in the same scope:

- `@geosuite/schema-templates` — 23 schema.org JSON-LD templates + offline validator.
- `@geosuite/llms-txt-generator` — sitemap.xml → llms.txt.
- `@geosuite/sitemap-builder` — crawls a custom site that has no sitemap and builds a valid sitemap.xml for it.

Honest disclaimer: I also build a hosted SaaS (trygeosuite.it) on top of similar logic, but the four CLIs are MIT-licensed and stand alone. I open-sourced them because I find it dishonest to sell a black box that does things any dev can verify.

Curious what other people are using to debug AI bot reachability — especially anyone running through Cloudflare, Akamai, or Vercel. The "managed content" injection broke my mental model the first time I hit it.
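To make the "line-level provenance" idea concrete, here is a minimal sketch of how a robots.txt verdict with line numbers can be computed. This is not the CLI's actual code or API — `robotsVerdict` and its return shape are hypothetical — and it assumes a simplified grammar (only `User-agent`, `Allow`, `Disallow`; exact-token group match beating `*`; longest-prefix rule winning, as in RFC 9309):

```javascript
// Hypothetical sketch: parse robots.txt, return a verdict for one bot token
// together with the 1-based line number of the rule that decided it.
function robotsVerdict(robotsTxt, botToken, path = "/") {
  const groups = []; // each: { agents: [{token, line}], rules: [{type, value, line}] }
  let current = null;
  let lastWasAgent = false;

  robotsTxt.split(/\r?\n/).forEach((raw, i) => {
    const line = raw.replace(/#.*$/, "").trim(); // strip comments
    if (!line) return;
    const m = line.match(/^([A-Za-z-]+)\s*:\s*(.*)$/);
    if (!m) return;
    const field = m[1].toLowerCase();
    const value = m[2].trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) { current = { agents: [], rules: [] }; groups.push(current); }
      current.agents.push({ token: value.toLowerCase(), line: i + 1 });
      lastWasAgent = true;
    } else if ((field === "disallow" || field === "allow") && current) {
      current.rules.push({ type: field, value, line: i + 1 });
      lastWasAgent = false;
    } else {
      lastWasAgent = false;
    }
  });

  // An exact user-agent token match takes precedence over the "*" group.
  const token = botToken.toLowerCase();
  const group =
    groups.find(g => g.agents.some(a => a.token === token)) ||
    groups.find(g => g.agents.some(a => a.token === "*"));
  if (!group) return { allowed: true, reason: "no matching group" };

  // Longest matching path prefix wins; empty Disallow values are no-ops.
  let best = null;
  for (const r of group.rules) {
    if (r.value && path.startsWith(r.value)) {
      if (!best || r.value.length > best.value.length) best = r;
    }
  }
  if (!best) return { allowed: true, reason: "no rule matched" };
  return { allowed: best.type === "allow", rule: best.type, line: best.line };
}
```

With a file like `User-agent: GPTBot` / `Disallow: /`, the verdict for GPTBot comes back `allowed: false` pointing at the `Disallow` line, while a bot with no matching group falls through to the `*` group or to "allowed".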
submitted by /u/Perix97

from Search Engine Optimization: The Latest SEO News https://ift.tt/6EfYB4J

