FAQ-Off Article
Source: Doc
Category: Documentation
Type: Doc
Last Modified: 8 October 2025
Location: Tools > Screaming Frog

Screaming Frog Guide

From FAQ-Off, the Calibre9 knowledge base

Introduction
Screaming Frog is a website crawler that you can run from your personal computer. It helps us to see websites as the Googlebot sees them, and to analyse technical issues at scale. It is the single most useful piece of SEO software in our entire library and is core to how we do technical audits.
This guide is maintained by Lachlan Cowie. If you’d like any additions or clarifications, please add a comment or Slack me.
Quick Start
Cowie’s Config
This is a copy of my usual crawl settings. It’s a great starting place for a default SEO crawl:
Importing Settings
→ Go to Configuration > Profiles > Load...
→ Select my config from your downloads folder (or a different config of your choice), then click Open
→ Then go to Configuration > Profiles and click Save Current Configuration as Default (this will use the current settings as the default each time you open Screaming Frog)
Exporting Settings
In case you’d like to share your own settings:
→ Go to Configuration > Profiles > Save As...
License
To enter your licence:
→ Go to Screaming Frog SEO Spider > Settings... > Licence
If you do not have a licence, please contact Chris Pride.
Common Issues
My crawl is taking forever 🤬🤬🤬
If the crawl is taking forever, it’s usually one of these settings that is responsible:
  • Images and Media (turn off crawling and storage for images and media)
  • Rendering (switch to text-only)
  • URL Rewriting (turn on, this is usually the first setting I change)
  • API Access (turn them all off)
  • Storage Mode (set to Database Storage)
  • Memory Allocation (increase to at least 10GB)
Keep in mind that most of these settings have a trade-off, and there are issues you may miss or misdiagnose because you have these settings changed. For particularly large websites, the crawl may take hours regardless of what your settings are. Plan accordingly.
“You are running out of memory for this crawl” error
This usually happens because Screaming Frog has not been allocated enough memory (RAM) for the crawl. You should:
  • Switch to Database Storage
  • Increase memory allocation (to at least 10GB)
I keep getting 403/510 errors
403 and 510 errors usually mean you have been blocked by the webserver. This can happen for a number of reasons:
Shopify Sites
For Shopify sites, you usually get blocked by crawling the pages too quickly. I recommend reducing the crawl speed to ~2 URL/s (see the Speed section below). The block is temporary, so you can try again in half an hour. If you need to crawl more quickly, use a VPN (or move your VPN to a different server).
WordPress and Other CMSs
WordPress blocks are usually permanent. You will need to fix your settings and then use a VPN (or move your VPN to a different server).
You have usually been blocked for one of these reasons:
  • You are ignoring nofollow tags (switch to respecting the rules)
  • You are ignoring disallow rules (switch to respecting the rules)
  • You are pretending to be Googlebot (switch to the Screaming Frog user agent)
  • You are crawling too quickly (lower the crawl speed)
I want to know how many blog posts are on the website
The search bar above the page list in Screaming Frog is the best way to filter pages by pathway, title, status, indexability and more:
Click the three lines to get the full suite of filter options:
If the blog posts do not have a unique URL pathway, I recommend using a Custom Extraction (see the Custom Extraction section below).
I want to crawl a specific list of URLs
To crawl a specific list of URLs, switch Screaming Frog to List Mode:
Once it is in List Mode, you can click Upload and either paste in the URLs or attach a CSV/Google Sheet.
I want to compare two crawls
To compare two previous crawls, switch to Compare Mode. You will need to be using Database Storage (to save previous crawls).
Click Select Crawl and pick the crawls you’d like to compare from your storage folder.
I want to change the colour of the UI
→ Go to Screaming Frog SEO Spider > Settings... > User Interface
→ Select pink (or a different, less interesting colour)
Images/resources are being counted as external links
This is usually because the resources are hosted on a separate domain belonging to the CMS or CDN. You can fix this in Screaming Frog by adding that domain to the CDN list (see the CDNs section below).
Settings
Storage Mode
This setting should be set to Database Storage. Database storage allows you to save and re-open crawls, which is very important for keeping historical snapshots of your clients’ sites. It also prevents your computer from having to hold the entire crawl in RAM (which can crash the computer on larger crawls).
Memory Allocation
This should be set to at least 10GB. Increasing the amount of RAM allocated to Screaming Frog allows it to crawl more quickly, and load more data into the UI.
Configuration
Spider - Crawl
 
Images & Media
These settings should usually be turned on. Oversized images and media are a frequent cause of site performance issues.
I sometimes turn these settings off for large sites, where crawling and storing the image/media files makes the crawl too large.
CSS
These settings should always be turned on. CSS (Cascading Style Sheets) files are a key part of how all modern websites are styled and rendered. Every single one of our clients uses CSS files on their site.
JavaScript
These settings should always be turned on. They will allow Screaming Frog to crawl and store JavaScript files. JavaScript is critical to how many modern web pages render and function.
Canonicals
These settings should always be turned on. They allow Screaming Frog to crawl and store canonical links.
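For reference, a canonical link is a tag in the page’s <head> that tells search engines which URL is the preferred version of the page. A hypothetical example (example.com is a placeholder):
  <link rel="canonical" href="https://www.example.com/services/" />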
Hreflang
Turning these options on will allow Screaming Frog to crawl and store hreflang links. I strongly recommend that you turn this on for all international clients (even if we are only working on the Australian site).
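For reference, hreflang links tell search engines which language or regional variant of a page to serve. A hypothetical pair of tags for Australian and UK versions of the same page (example.com is a placeholder):
  <link rel="alternate" hreflang="en-au" href="https://www.example.com/au/" />
  <link rel="alternate" hreflang="en-gb" href="https://www.example.com/uk/" />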
iframes
I recommend turning these settings on. An iframe is a second web document that has been embedded inside the client’s page. Our clients frequently use iframes for video content, contact forms and booking forms.
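For reference, an embedded iframe looks something like this (the YouTube URL and VIDEO_ID are placeholders):
  <iframe src="https://www.youtube.com/embed/VIDEO_ID" width="560" height="315"></iframe>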
Follow Internal “nofollow” & Follow External “nofollow”
Ticking these options will allow Screaming Frog to crawl links tagged with the “nofollow” attribute. I recommend turning this option on, so that you can see all of the interlinking on the site. Sometimes web developers or past SEOs will wrongly tag links as nofollow.
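For reference, a nofollowed link carries a rel attribute like this (the URL and anchor are placeholders):
  <a href="https://www.example.com/some-page/" rel="nofollow">Anchor text</a>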
If you are using this setting, make sure you are crawling with a VPN (Mullvad or equivalent). Some sites will intentionally have a hidden nofollow link, and then IP ban any crawler that follows it. This is called a “blackhole” or a “honeypot” and is designed to catch and stop malicious crawlers that ignore the rules set out in the nofollow attribute or robots.txt file.
Crawl Linked XML Sitemaps
I strongly recommend turning on Auto Discover XML Sitemaps via robots.txt. This allows Screaming Frog to find the site’s XML sitemaps via the robots.txt file (just as Googlebot does), which tends to help the crawler find all of the pages on the site much more consistently.
If you are still having issues with the crawler finding pages, you can feed it the sitemap directly using the “Crawl these Sitemaps” setting.
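For reference, auto discovery works because most sites declare their sitemap location in robots.txt with a line like this (the URL is a placeholder):
  Sitemap: https://www.example.com/sitemap.xml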
Spider - Extraction
 
Page Details
I usually leave all of these settings on. Page content is very important to SEO (and is pretty light on the crawler), so there is no reason to ever turn these settings off.
Structured Data
Never turn these settings off. Structured data is incredibly important to SEO, and is only becoming more and more important as platforms lean more heavily on LLMs to process their data. Good structured data can greatly increase visibility, particularly for long-tail transactional keywords.
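For reference, structured data is usually embedded as JSON-LD in the page’s <head>. A minimal, hypothetical example for a local business (every value is a placeholder):
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Plumbing Co",
    "url": "https://www.example.com/",
    "telephone": "+61 2 0000 0000"
  }
  </script>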
Directives
Never turn these settings off. Robots tags (especially noindex tags) are incredibly important to SEO. Pages with a noindex attribute in their robots tag will not be indexed by Google.
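For reference, a noindex directive usually sits in a meta robots tag in the page’s <head> (it can also be sent as an X-Robots-Tag HTTP header):
  <meta name="robots" content="noindex, follow" />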
Structured Data Validation
I usually leave these settings on. It is very important that we make sure the client has valid structured data, but validating it on every page can slow the crawl significantly for large sites.
Spider - Rendering
 
Rendering
Set the Rendering to “JavaScript”. Almost all modern websites rely on JavaScript either to render page content or to provide functionality.
For some very large sites, you may need to switch the rendering back to Text Only so that the crawl can be completed within a realistic timespan.
Spider - Advanced
 
Follow Redirects and Canonicals
Always leave these settings on. Redirects and canonicals are very important to technical SEO, and we should always follow those links where possible.
Respect Settings
Keep this shit off. We have no respect.
In all seriousness, it’s important that our crawler does not respect noindex or canonical tags because we are looking for issues and errors (pages that should not be noindex/canonicalised, but are).
Spider - Preferences
 
Page Title and Meta Description Width
This is where you set the pixel and character limits used to flag long or short page titles and meta descriptions. Pages over or under the limits will be placed into the “Page Title Over X Pixels” tab (or equivalent) once the crawl is complete.
As of writing this guide, the limits are:
  • Page Title – 571 Pixels
  • Meta Description – 923 Pixels
For maximum accuracy, lengths should always be measured in pixels. I recommend also having a minimum to help you find titles and descriptions that are malformed or mostly missing.
Non-descriptive Anchor Text
This setting allows you to adjust which links are flagged for having non-descriptive anchors. If you notice the client repeatedly uses a vague anchor in their links, I recommend adding that anchor to this list so that you can locate all instances of the issue.
Content - Spelling and Grammar
 
Spelling & Grammar
I usually leave these settings on. Screaming Frog’s spelling and grammar checking is pretty poor: it regularly flags proper nouns incorrectly, or makes incredibly pedantic grammar suggestions.
I still leave it on because it’s very useful to flick through the suggestions and see if you can find any particularly goofy mistakes. Users tend to perceive spelling mistakes as very unprofessional, and they can damage perceptions of the brand.
robots.txt
 
robots.txt
I usually leave this set to “ignore robots.txt but report status”.
This will allow Screaming Frog to crawl parts of the site that have been blocked by the robots.txt file. We do this to find any pages that should be indexed but have been incorrectly blocked from being crawled via robots.txt.
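For reference, this is the kind of mistake we are looking for: a hypothetical robots.txt where a section that should be indexable has been blocked alongside legitimately restricted paths (the paths are placeholders):
  User-agent: *
  Disallow: /cart/
  Disallow: /services/   # a money page blocked by mistake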
If you are using this setting, make sure you are crawling with a VPN (Mullvad or equivalent). Some sites will intentionally block a hidden page via robots.txt, and then IP ban any crawler that requests it. This is called a “blackhole” or a “honeypot” and is designed to catch and stop malicious crawlers that ignore the rules set out in the nofollow attribute or robots.txt file.
URL Rewriting
 
URL Rewriting
This setting removes all parameters from crawled URLs. I usually leave it off, because it is important that we check that all of the parameter URLs have been correctly canonicalised back to the original version of the page.
Some large sites, however, have too many pages with too many parameters, and we need this setting to be able to crawl the site properly. By removing parameters and combining those pages, we significantly cut down the number of URLs Screaming Frog needs to crawl.
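As a hypothetical example, with URL rewriting turned on, a parameterised URL like the first one below is rewritten and counted as the second (example.com and the parameters are placeholders):
  https://www.example.com/shirts/?colour=blue&sort=price
  https://www.example.com/shirts/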
CDNs
 
CDNs
Sometimes the crawl will have an issue where internal images and resources are marked as external links. This usually occurs because they are hosted on a different domain, usually belonging to the CDN (e.g. Cloudflare) or CMS (e.g. Shopify).
You can fix this issue by adding that domain to the CDN list; all of those resource URLs will then be counted as internal.
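As a hypothetical example, for a Shopify client whose product images are served from Shopify’s CDN, you would add a domain entry like this to the CDN list:
  cdn.shopify.com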
Include / Exclude
 
Include
This tab allows you to narrow the crawl so that it only includes certain URLs. We don’t often use this, but it can be very useful if you only want to crawl a particular subfolder on the site. For example, you could add /services/ to only crawl URLs within the services subfolder.
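Note that the Include (and Exclude) fields take regex rather than plain paths, so the pattern for that /services/ example would typically look something like this (example.com is a placeholder):
  https://www.example.com/services/.*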
Exclude
This tab allows you to exclude particular URLs and sub-folders from the crawl. We don’t often use this, but it can be great if you just wanted to crawl the core pages on a site and exclude the accounts/blog pages (for example). Just add a list of any pathways or URLs you want excluded to the input box.
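For reference, a hypothetical pair of exclude patterns that skips account and blog URLs (again, these are regex, and example.com is a placeholder):
  https://www.example.com/account/.*
  https://www.example.com/blog/.*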
Speed
 
Speed
This setting allows you to limit the number of threads Screaming Frog uses and/or restrict the number of URLs it crawls per second. It is mostly used for crawling Shopify sites, because Shopify will temporarily block your IP address if you make too many page requests too quickly. I recommend reducing your crawl speed to ~2 URL/s.
User-Agent
 
User-Agent
This setting affects the user agent of your Screaming Frog crawler (who the crawler says it is when it requests a page). I usually have this set to Googlebot (Smartphone) because that is the crawler Google primarily uses for indexing, and I’d like to see the pages as Google sees them. For some clients you may also want to use the GPTBot or the ClaudeBot to see how the pages are served to LLMs.
In some very rare cases, the site will block you for using the Googlebot user agent (because they can tell you are not making the request from an IP address that is actually owned by Google). As a result, I recommend using a VPN while you commit this kind of user agent identity theft.
Custom
 
Custom Extraction
The custom extraction settings allow you to separate pages based on specific pieces of HTML in the page’s code. This is very useful for making a list of blog pages when the client’s site does not have a blog subfolder. You can identify an element that exists on the template for blog pages, but not the rest of the site and then extract all of the pages with that element.
→ Open a blog page and find a styling element that is present on all of the blog pages and no other kind of page. “Recent Posts” and “Further Reading” sections are great for this.
→ Go to Screaming Frog and open the Custom Extraction settings. Click on the button with the globe icon.
→ Paste the URL from your blog into the address box, then scroll down and click on the element you found that is unique to blog posts on this site.
→ Click OK and then run the crawl. All of the blog posts should now be separated into a tab under Custom > Blogs (or whatever you called the extraction).
OR
You can also do this by highlighting the element in the Chrome DevTools (Cmd/Ctrl + Shift + C), then right-clicking it and selecting Copy > Copy XPath.
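As a hypothetical example, if the blog template wraps its “Recent Posts” section in a div with the class recent-posts (the class name is an assumption and will differ from site to site), the XPath you paste into the custom extractor would look something like this:
  //div[contains(@class, "recent-posts")]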
API Access
 
You can use this feature to connect your Screaming Frog to our Google Search Console account. This will allow you to see the current indexing status and performance of your pages within Screaming Frog.
I don’t usually use this option because I look at GSC separately, and I often begin audits before the client has given us access to their analytics, but it’s pretty cool and I recommend having a play around to see what works for you.
Crawl Analysis
 
Crawl Analysis
I strongly recommend leaving all of these settings on. They make Screaming Frog automatically perform additional analysis at the end of its crawls to find orphaned pages, sitemap issues, hreflang issues and more. This is very useful and there’s no good reason to turn it off.
I have noticed some occasional errors/inconsistencies in the crawl analysis, particularly when dealing with paginated products or sitemaps. I recommend double checking to make sure that a page is actually orphaned/missing from the sitemap.
Categories: Documentation | Tools