Major Indexing Crisis: 2M Pages Not Indexed - Need Help!
Hi SEO community! I'm facing a massive indexing problem with my Next.js website and could really use some expert advice.
The Problem in Numbers:
- Current indexed pages: Only ~80k (down from a peak of 300k)
- Pages not being indexed: Nearly 2 million pages
- Platform: Next.js
Google Search Console Breakdown:
| Issue | Source | Status | Pages |
|---|---|---|---|
| Page with redirects | Site | Failed | 69,281 |
| Crawled but not indexed | Google Systems | Failed | 1,232,822 |
| Not found (404) | Site | Started | 336,376 |
| Alternate page with proper canonical tag | Site | Started | 9,882 |
| Soft 404 error | Site | Started | 1,098 |
| Server error (5xx) | Site | Started | 910 |
| Duplicate without user-selected canonical | Site | Started | 395 |
| Redirect error | Site | Started | 97 |
| Blocked by robots.txt | Site | Started | 51 |
| Blocked due to other 4xx issue | Site | Started | 26 |
Site Structure Context:
My site generates pages for every combination of:
- /machines/state
- /machines/state/category
- /machines/state/category/type
- /machines/state/city
- /machines/state/city/category
- /machines/state/city/category/type
This creates nearly 2M potential URLs across all US states, cities, equipment categories, and types.
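To see why this structure reaches ~2M URLs, here is a quick back-of-the-envelope sketch. The route levels come from the post; the per-level counts are assumptions for illustration only:

```typescript
// Combinatorial growth of the location/category routes described above.
// The counts below are hypothetical, chosen only to illustrate the scale.
const states = 50;     // US states
const cities = 300;    // assumed avg cities per state
const categories = 20; // assumed equipment categories
const types = 6;       // assumed types per category

function totalUrls(): number {
  const stateLevel = states;                              // /machines/state
  const stateCategory = states * categories;              // /machines/state/category
  const stateCategoryType = stateCategory * types;        // /machines/state/category/type
  const stateCity = states * cities;                      // /machines/state/city
  const stateCityCategory = stateCity * categories;       // /machines/state/city/category
  const stateCityCategoryType = stateCityCategory * types; // /machines/state/city/category/type
  return stateLevel + stateCategory + stateCategoryType +
         stateCity + stateCityCategory + stateCityCategoryType;
}

console.log(totalUrls()); // 2122050 with these assumed counts (~2.1M)
```

Note that the deepest level alone (state × city × category × type) accounts for the vast majority of the URLs, which is typical for this kind of programmatic site.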
The Mystery:
Here's what's really puzzling me:
Sitemaps are clean: my sitemaps contain only valid, existing pages.
Google is finding invalid URLs: many of the problematic pages Google reports don't exist in my sitemaps at all.
Key Questions:
- How is Google discovering these invalid state/city combinations that don't exist in my sitemaps?
- Why are 1.2M pages "Crawled but not indexed"? Is this a quality issue or technical problem?
- Should I be concerned about the 69k redirect pages? These might be from URL structure changes.
- Is there a systematic approach to handle this scale of indexing issues?
What I've Already Tried:
- Verified sitemaps are properly formatted and submitted
- Checked robots.txt for blocking issues
- Monitored server logs for 5xx errors
- Reviewed canonical tag implementation
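One more check worth adding to that list: crawl your own sitemaps and verify each listed URL actually returns 200. A minimal sketch of the first step, extracting `<loc>` entries from sitemap XML (the sample XML and `example.com` URLs are made up; in a live audit you would fetch your real sitemap and HEAD-request each URL):

```typescript
// Extract <loc> URLs from sitemap XML with a simple regex.
// For a real audit, pass the fetched sitemap body and then
// HEAD-request each URL to confirm it returns 200.
function extractLocs(sitemapXml: string): string[] {
  const matches = sitemapXml.match(/<loc>(.*?)<\/loc>/g) ?? [];
  return matches.map(m => m.replace(/<\/?loc>/g, "").trim());
}

// Hypothetical sample sitemap for demonstration.
const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/machines/texas</loc></url>
  <url><loc>https://example.com/machines/texas/excavators</loc></url>
</urlset>`;

console.log(extractLocs(sample));
// ["https://example.com/machines/texas", "https://example.com/machines/texas/excavators"]
```

Comparing this list against GSC's "Crawled but not indexed" export would tell you how much of the 1.2M overlaps with what you actually submitted versus URLs Google discovered elsewhere.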
My Theories:
- Google might be generating URLs from internal links or menu structures
- The massive scale might be triggering quality filters
- There could be crawl budget issues given the site size
- Next.js specific issues with server-side rendering?
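If the first theory is right (Google synthesizing URLs from internal links), one common mitigation is to validate route params against a whitelist of combinations that actually exist and return a hard 404 for everything else (in the Next.js App Router, via `notFound()`). The core check is just a set lookup; the combos below are hypothetical placeholders:

```typescript
// Hypothetical whitelist of state/city pairs that actually have inventory.
// In practice this would be loaded from your database at build or request time.
const validCombos = new Set([
  "texas/houston",
  "texas/dallas",
  "california/los-angeles",
]);

// Returns true only for combinations that exist; a Next.js page would
// call notFound() when this returns false, so Googlebot gets a real 404
// instead of a thin page or a soft 404.
function isValidStateCity(state: string, city: string): boolean {
  return validCombos.has(`${state}/${city}`);
}

console.log(isValidStateCity("texas", "houston"));     // true
console.log(isValidStateCity("texas", "los-angeles")); // false
```

Serving a genuine 404 for invalid combinations also tends to shift GSC entries out of "Soft 404" and "Crawled but not indexed" into plain "Not found (404)", which is the correct signal for pages that shouldn't exist.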
Looking For:
- Similar experiences with large-scale geographic/category sites
- Technical insights on why Google discovers non-sitemap URLs
- Strategies for managing massive indexing issues
- Next.js specific indexing best practices
Has anyone dealt with indexing issues at this scale? Any insights on managing millions of location-based pages would be incredibly helpful!
Tech Stack: Next.js, hosted on Vercel, standard sitemap implementation.
Thanks in advance for any help or insights!