
What to do with 6 million+ pages

I’m an in-house SEO working for a really old company that’s been around for decades and has lots of different facets to it.

There’s a real legacy issue with the website. Hundreds of people have had access to and autonomy over the site, and there’s so much crap. They’ve also used the site as an intranet and, though they’re pretty hard to find, there are notes from meetings, HR docs and so much more live on the site.

I’m running a crawl now, after noticing that I’ve got www pages linking to non-www pages, so I need to get everything on the same domain. I can’t do that until I know the extent of the issue. Historically I’ve not been able to crawl the full site, just because of time constraints.
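To put a number on the www vs non-www problem, one option is to scan a link export from the crawler and count links whose source and destination hosts differ only by the `www.` prefix. The sketch below is a rough illustration, not a definitive method: the `Source` and `Destination` column names are an assumption about the export format, and `example.com` and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def strip_www(host):
    """Drop a leading 'www.' so both host variants compare equal."""
    return host[4:] if host.startswith("www.") else host


def count_www_mismatches(rows, source_col="Source", dest_col="Destination"):
    """Count links whose two hosts differ only by the 'www.' prefix.

    `rows` is any iterable of dicts (for example a csv.DictReader).
    The column names are assumptions; adjust them to the real export.
    """
    counts = Counter()
    for row in rows:
        src = urlsplit(row[source_col]).hostname or ""
        dst = urlsplit(row[dest_col]).hostname or ""
        if src != dst and strip_www(src) == strip_www(dst):
            counts[(src, dst)] += 1
    return counts


# Made-up sample standing in for a real link export.
SAMPLE = """Source,Destination
https://www.example.com/a,https://example.com/b
https://www.example.com/a,https://www.example.com/c
https://example.com/b,https://www.example.com/a
https://www.example.com/c,https://example.com/b
https://www.example.com/c,https://other.org/x
"""

if __name__ == "__main__":
    counts = count_www_mismatches(csv.DictReader(io.StringIO(SAMPLE)))
    for (src, dst), n in counts.most_common():
        print(f"{src} -> {dst}: {n}")
```

Because the rows are read one at a time, the same function can be pointed at a file handle for a multi-million-row export without loading it all into memory.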

So I’ve always crawled specific subfolders and tackled deadweight in stages. Now I want to bite the bullet and do a full crawl because I want the full picture. But it’s up to 6 million pages now (and counting), excluding images, obviously.

When this is done, how do I even go about exporting this? Surely Excel and Google Sheets can’t handle that much data? Any advice around this would be amazing.
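For context, an Excel worksheet tops out at 1,048,576 rows, so a 6 million row export will not open in a single sheet. One possible workaround is to leave the export as CSV and stream it with a script, either summarising it as it goes or cutting it into sheet-sized chunks. The following is a minimal sketch under assumptions: the `Address` and `Status Code` column names and the sample rows are illustrative, not taken from a real export.

```python
import csv
import io
from collections import Counter


def summarise_status_codes(lines, status_col="Status Code"):
    """Tally status codes row by row, so memory use stays flat."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[status_col]] += 1
    return counts


def chunked_rows(lines, rows_per_chunk):
    """Yield (header, rows) pairs of at most `rows_per_chunk` data rows."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == rows_per_chunk:
            yield header, chunk
            chunk = []
    if chunk:  # final, partially filled chunk
        yield header, chunk


# Made-up sample standing in for a real crawl export.
SAMPLE = """Address,Status Code
https://www.example.com/,200
https://www.example.com/old,301
https://www.example.com/hr-notes,200
https://www.example.com/missing,404
https://www.example.com/minutes,200
"""

if __name__ == "__main__":
    print(summarise_status_codes(io.StringIO(SAMPLE)))
    for i, (header, rows) in enumerate(chunked_rows(io.StringIO(SAMPLE), 2), 1):
        print(f"part {i}: {len(rows)} rows")
```

With a real export, `io.StringIO(SAMPLE)` would be swapped for an open file handle (for example `open(path, newline="")`, where `path` is wherever the export was saved), and each chunk could be written back out with `csv.writer` at a row count a spreadsheet can open.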

Thank you!

PS using Screaming Frog

submitted by /u/Sick_Turtle

from Search Engine Optimization: The Latest SEO News https://ift.tt/3jCi7ce
