In a perfect world our websites would have no broken pages. But if that were a reality, then the unicorn frappuccino from Starbucks would actually be good.
In my work on the Visual Studio team at Microsoft, we have thousands of pages across subdomains. Finding every 404 and redirecting each one to a proper page would take a lifetime. So I used a technique I first learned from Annie Cushing to find the most important 404s to fix.
Data Collection
Step 1: Crawl for 404s
The first step is to find the broken pages. The tool I used for this is Screaming Frog. You can crawl up to 500 URLs for free; we use the paid version at Microsoft, and even that is affordable. There are lots of amazing things you can do with Screaming Frog, but for our purposes, we just have to put in the site URL and let it crawl.
Browse to the Response Codes tab and filter by Client Error (4xx) to find your 404 pages.
Bonus: Screaming Frog also provides inlink data, so you can see which of your own pages still link to each broken URL.
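If you don’t have a crawler handy, or just want to spot-check a short list of suspect URLs, a few lines of Python can run the same status-code check. This is a minimal sketch, not a replacement for a full crawl; the URLs below are placeholders, so swap in your own.

import requests

# Placeholder URLs: swap in the pages you want to spot-check.
urls = [
    "https://example.com/old-docs-page",
    "https://example.com/blog/retired-post",
]

for url in urls:
    try:
        # Follow redirects so we only flag URLs that truly end in a 4xx.
        response = requests.get(url, allow_redirects=True, timeout=10)
        if 400 <= response.status_code < 500:
            print(response.status_code, url)
    except requests.RequestException as exc:
        print("ERROR", url, exc)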
Step 2: Page Performance Data
After collecting my 4xx response pages, I head over to Google Analytics to grab some session/pageview data.
Option 1 (Sessions): Acquisition > Channel > Organic > Primary Dimension: Landing Page > Export
Pro tip: to export more than 10 rows, scroll to the bottom of the screen and select Show rows: 100.
Option 2 (Pageviews): Behavior > Site Content > All Pages > Export
You’ll want to select a healthy date range, too; I suggest a month or two for most sites.
Step 3: Backlink Data
Lastly, I’ll use Moz, Majestic, SEMRush, or Ahrefs to grab some backlink data.
Free: Open Site Explorer > URL > Target: this root domain > Link Source: only external > Request CSV
(You will have to log into a free account to be able to download.)
This is harder to use with smaller sites, because Mozscape mostly crawls domains above a certain authority.
Paid: SEMRush > Domain Analytics > Backlinks > Indexed Pages > Export
(Other options are Majestic, Ahrefs.)
Pro tip: You can also grab social share data from tools like Buzzsumo to add more insight to the analysis.
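Depending on the tool, the backlink export may list one row per individual link rather than one row per target page. If so, roll it up to a backlink count per URL before the lookup step. Here’s a quick sketch in pandas; the file name and the "Target URL" column are assumptions, so check your own export’s headers.

import pandas as pd

# One row per backlink in the raw export; we want one row per target page.
links = pd.read_csv("backlinks_export.csv")  # placeholder file name

backlink_counts = (
    links.groupby("Target URL")  # column name is an assumption; match it to your export
    .size()
    .reset_index(name="Backlinks")
    .rename(columns={"Target URL": "URL"})  # use one key name across all your sheets
)

backlink_counts.to_csv("backlinks_per_page.csv", index=False)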
Marry That Data
Using Excel or Google Sheets, collect all the data into three sheets of one workbook. This is where the real analysis comes in.
Step 4: Simplify URLs
Find and replace in Excel is a godsend. Use it. Open it with the keyboard shortcut Command/Ctrl + F, then switch to the Replace option.
Find all instances of http:// and https:// and replace them with nothing. This trick puts our URLs into the same format so that we can do that VLOOKUP.
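If your exports are sitting in CSV files, the same cleanup can be scripted. Here’s a sketch in pandas that strips the protocol; the file and column names are placeholders.

import pandas as pd

pages = pd.read_csv("404_pages.csv")  # placeholder file name

# Strip the protocol so every sheet's URLs share the same format.
pages["URL"] = (
    pages["URL"]
    .str.replace("https://", "", regex=False)
    .str.replace("http://", "", regex=False)
)

pages.to_csv("404_pages_clean.csv", index=False)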
Step 5: VLOOKUP TIME!
Watch this video from Annie Cushing that breaks down how to do a VLOOKUP. Then add some columns to your 404 page data tab for sessions and backlinks. These columns are where you’ll put those VLOOKUPs. I also added some table formatting for easy filtering later.
Next, you’ll want to follow the steps in the video tutorial to match the 404 pages to the pages that have received backlinks or sessions. Your final result is a single table that pairs each 404 page with its session and backlink numbers.
I wrapped my VLOOKUP in an IFERROR function so that 404 pages with no match show a blank value instead of an error.
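In the sheet, that formula ends up looking something like =IFERROR(VLOOKUP(A2,Sessions!A:B,2,FALSE),""), with your own sheet names and ranges, of course. If you’d rather script the join, a left merge in pandas does the same job as the VLOOKUP, and filling the gaps with blanks mirrors the IFERROR trick. The file and column names below are placeholders; the assumption is that every sheet shares a matching URL column after the cleanup in Step 4.

import pandas as pd

# Placeholder files: the cleaned 404 list plus the two lookup sheets.
errors = pd.read_csv("404_pages_clean.csv")          # column: URL
sessions = pd.read_csv("organic_landing_pages.csv")  # columns: URL, Sessions
backlinks = pd.read_csv("backlinks_per_page.csv")    # columns: URL, Backlinks

# Left joins keep every 404 row; unmatched pages get blanks, like IFERROR("").
merged = (
    errors
    .merge(sessions, on="URL", how="left")
    .merge(backlinks, on="URL", how="left")
    .fillna("")
)

merged.to_csv("404_priorities.csv", index=False)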
Step 6: Glitter
Conditional Formatting is your friend. When you’re staring at a wall of numbers, it’s hard to tell which ones matter most; color scales make the priorities jump out.
Excel > Home > Conditional Formatting > Color Scales
Now you’re able to prioritize which 404 pages need to be fixed and with what level of urgency.
Improve UX and SEO
Step 7: Get to Fixing
I recommend setting up 301 redirects to the pages that best fit the visitor’s intent. Redirecting everything to the homepage is easy, but take the time to find the best alternative to each broken page and point there instead.
When setting up these redirects, I recommend doing a quick check in Screaming Frog to see if you have any daisy-chain redirects going on.
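If you want to script that check as well, requests keeps the full redirect history, so you can flag any old URL that takes more than one hop to reach its destination. Another minimal sketch with placeholder URLs:

import requests

# Placeholder list: the old URLs you just redirected.
redirect_sources = [
    "https://example.com/old-page",
]

for url in redirect_sources:
    response = requests.get(url, allow_redirects=True, timeout=10)
    hops = [step.url for step in response.history]  # every intermediate redirect
    if len(hops) > 1:
        print(f"{len(hops)} hops: " + " -> ".join(hops) + " -> " + response.url)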