
GEO benchmark: 1,400 prompts across ChatGPT, Perplexity, Claude, and Gemini for 50 B2B SaaS companies. Methodology and findings inside.

Been doing GEO work for a while now and kept running into the same problem: no actual data on how the AI platforms differ in behavior, or on what separates companies that get recommended consistently from those that don't.

So we built it ourselves. 50 B2B SaaS companies, 7 buyer-intent prompts each, across 4 platforms. 1,400 prompts total.

Scored each company on 4 things:

  1. how often they get mentioned,
  2. where they appear when they do,
  3. how AI describes them,
  4. and how consistent that is across platforms.

Composite score out of 100.
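As a rough illustration of how the four components could roll up into one number, here is a minimal sketch. The 30-point mention cap and 20-point sentiment cap match the figures quoted later in the post; the 25/25 split between position and cross-platform consistency is my assumption, not the actual weighting used in the benchmark.

```python
# Hypothetical composite-score sketch. Mention (30) and sentiment (20)
# weights come from the post; the position/consistency weights (25 each)
# are assumed so the total reaches 100.
def composite_score(mention_rate, avg_position, sentiment, consistency):
    """Each input is a 0.0-1.0 fraction; returns a score out of 100."""
    return round(
        mention_rate * 30      # how often the company is mentioned
        + avg_position * 25    # where it appears when it is mentioned (assumed weight)
        + sentiment * 20       # how positively AI describes it
        + consistency * 25,    # agreement across the 4 platforms (assumed weight)
        1,
    )
```

With this framing, a company mentioned in half the prompts but described glowingly still caps out well below one that shows up everywhere, which matches the mention-frequency finding below.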

A few things worth sharing before I drop the full dataset.

Platform behavior is genuinely different across models.

Claude mentions 88% of the companies we tested. ChatGPT and Gemini both hit 100%. Perplexity sits at 90%.

These are not interchangeable systems. A brand can be consistently recommended on ChatGPT and completely absent from Claude. If you are only checking one platform, you are missing a real gap.

Optimizing for sentiment is probably a waste of time.

44 of 50 companies scored 19 or 20 out of 20 on sentiment.

AI speaks positively about almost everyone when it mentions them. The companies at the bottom of our leaderboard are not being described badly. They just are not being mentioned.

The lever is mention frequency and position, not how the model frames you when it does show up.

The gap between high and low scorers is bigger than I expected.

Companies scoring 60 and above average 18.8 out of 30 on mention rate. Companies scoring 35 and below average 3.0.

That is a 15.8-point delta.

It comes down to citation volume across a range of buyer prompts, not domain authority, not content quality in isolation.

The finding that surprised me most:

Make is present on all 4 platforms. Zapier is missing from Claude entirely. Zapier still scores 23 points higher. Being present does not mean being recommended. Mention frequency and position carry far more weight in the final score than whether a platform technically knows you exist.

One honest caveat: AI responses are non-deterministic. Same prompt, different run, sometimes different result. We estimate variance of roughly 3 to 8 points per company. Treat the directional patterns as reliable, treat individual scores as approximate.
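One way to put numbers on that caveat: re-run the same prompt set several times per company and report the spread. This is a stub, not the benchmark's actual pipeline; `score_run` fakes the re-issued prompts with random jitter in the 3-to-8-point range the post estimates.

```python
# Sketch of estimating run-to-run variance for a non-deterministic scorer.
# score_run() is a hypothetical stand-in for re-issuing all prompts for
# one company; here it is stubbed with random jitter around a base score.
import random
import statistics

def score_run(base_score):
    # Stub: a real run would re-query all 4 platforms and re-score.
    return base_score + random.uniform(-4, 4)

def variance_band(base_score, runs=5):
    """Return (max-min spread, sample stdev) across repeated runs."""
    scores = [score_run(base_score) for _ in range(runs)]
    return max(scores) - min(scores), statistics.stdev(scores)
```

If the observed spread for a company is wider than the gap between it and its neighbors on the leaderboard, their ordering should not be read as meaningful.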

Full dataset, methodology, category breakdowns, and all 50 companies ranked in the first comment. Curious what patterns others are seeing, especially on Claude where the selectivity gap seems most pronounced.

submitted by /u/ap-oorv

from Search Engine Optimization: The Latest SEO News https://ift.tt/iISpbM7
