
JavaScript SEO: What you need to know

Did you know that while the Ahrefs blog is powered by WordPress, the rest of the site is largely built with the JavaScript framework React?

Most websites use some form of JavaScript to enable interactivity and improve usability. Some use it for menus, showing products or prices, getting content from various sources, or in some cases for all elements on the website. The reality in the modern internet is that JavaScript is ubiquitous.

As Google's John Mueller said:

The web has moved from plain HTML - as an SEO you can embrace that. Learn from JS devs & share SEO knowledge with them. JS’s not going away.

- 🍌 John 🍌 (@JohnMu) August 8, 2017

I'm not saying SEOs need to go out and learn to write JavaScript; it's quite the opposite. What SEOs mainly need to know is how Google handles JavaScript and how to troubleshoot issues. In very few cases will an SEO even be allowed to touch the code. My goal with this post is to help you learn exactly that.

JavaScript SEO is a part of technical SEO (Search Engine Optimization) that aims to make JavaScript-heavy websites easy to crawl and index, as well as search-friendly. The goal is for these websites to be found by search engines and to rank well.

Is JavaScript bad for SEO? Not at all. It's just different from what many SEOs are used to, and there's a bit of a learning curve. People do tend to use it for things where there's probably a better solution, but sometimes you have to work with what you have. You just need to know that JavaScript isn't perfect and isn't always the right tool for the job. Unlike HTML and CSS, it can't be parsed progressively, and it can be heavy on page load and performance. In many cases, you may be trading performance for functionality.

How Google processes pages with JavaScript

In the early days of search engines, the downloaded HTML response was enough to see the content of most pages. Thanks to the rise of JavaScript, search engines today have to render many pages like a browser would so that they can see the content the way a user sees it.

The system that handles the rendering process at Google is called the Web Rendering Service (WRS). Google has provided a simplified diagram to show how this process works.

Let's say we start the process at the URL.

1. Crawler

The crawler sends GET requests to the server. The server replies with headers and the content of the file, which is then saved.

The request will likely come from a mobile user agent, since Google is now mostly on mobile-first indexing. You can use the URL Inspection tool in Google Search Console to see how Google crawls your site. When you run it for a URL, check the coverage information for "Crawled as"; it should tell you whether you're still on desktop indexing or on mobile-first indexing.

The requests mostly come from Mountain View, CA, USA, but Google also does some crawling of locale-adaptive pages from outside the United States. I mention this because some websites block visitors, or treat them differently, based on a specific country or IP address, which can result in Googlebot not seeing your content.

Some websites may also use user-agent detection to show content to a specific crawler. Especially with JavaScript websites, Google may see something different from what a user sees. This is why Google tools such as the URL Inspection tool in Google Search Console, the Mobile-Friendly Test, and the Rich Results Test are important for troubleshooting JavaScript SEO issues. They show you what Google sees and are useful for checking whether Google is being blocked and whether it can see the content on the page. I'll cover how to test this in the section on the Renderer, because there are some key differences between the downloaded GET request, the rendered page, and even the testing tools.

It's also important to note that while Google reports the output of the crawling process as "HTML" in the image above, it is actually crawling and storing all of the resources it takes to build the page: HTML pages, JavaScript files, CSS, XHR requests, API endpoints, and more.

2. Processing

There are many systems that are obscured by the term "processing" in the image. I'll cover some of these that are relevant to JavaScript.

Resources and Links

Google doesn't navigate from page to page like a user would. Part of the processing is to check the page for links to other pages and files that are needed to build the page. These links are pulled out and added to the crawl queue that Google uses to prioritize and schedule crawls.

Google pulls resource links (CSS, JS, etc.) needed to build a page from things like <link> tags. Links to other pages, however, need to be in a specific format for Google to treat them as links: internal and external links need an <a> tag with an href attribute. There are many ways to make links work for users with JavaScript that are not search-friendly.

Good:

<a href="/page">simple is good</a>
<a href="/page" onclick="goTo('page')">still okay</a>

Bad:

<a onclick="goTo('page')">nope, no href</a>
<a href="javascript:goTo('page')">nope, missing link</a>
<a href="javascript:void(0)">nope, missing link</a>
<span onclick="goTo('page')">not the right HTML element</span>
<a href="#">no link</a>
Button, ng-click: there are many other ways this can be done incorrectly.

It's also worth noting that internal links added with JavaScript won't be picked up until after rendering. That should happen relatively quickly and, in most cases, isn't a cause for concern.

Caching

Every file that Google downloads, including HTML pages, JavaScript files, CSS files, and so on, is aggressively cached. Google will ignore your cache timings and fetch a fresh copy when it wants to. I'll talk a bit more about this and why it's important in the Renderer section.

Duplicate elimination

Duplicate content may be eliminated or deprioritized from the downloaded HTML before it gets sent to rendering. With app shell models, very little content and code may be shown in the HTML response. In fact, every page on the site may show the same code, and it could even be the same code shown on multiple websites. This can sometimes cause pages to be treated as duplicates and not be sent to rendering right away. Even worse, the wrong page, or even the wrong website, may show up in search results. This should sort itself out over time, but it can be problematic, especially with newer websites.

Most restrictive directives

Google chooses the most restrictive directives between the HTML and the rendered version of a page. If JavaScript changes a directive and it conflicts with the directive in the HTML, Google will simply obey whichever is most restrictive. Noindex overrides index, and a noindex in the HTML skips rendering entirely.
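As a quick illustration (the meta tag and the inline script below are hypothetical, not taken from any particular site), a page set up like this would never get its JavaScript change seen, because the noindex in the raw HTML already skips rendering:

<!-- Raw HTML response: this directive is picked up during processing -->
<meta name="robots" content="noindex">

<script>
  // Hypothetical attempt to flip the directive with JavaScript.
  // Since pages noindexed in the raw HTML skip rendering, Google never sees this run.
  document.querySelector('meta[name="robots"]').setAttribute('content', 'index, follow');
</script>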

3. Render queue

Each page now goes into the render queue. One of the biggest concerns many SEOs have with JavaScript and two-stage indexing (HTML first, then the rendered page) is that pages might not get rendered for days or even weeks. When Google looked into this, it found that pages went to the renderer after an average of 5 seconds, and the 90th percentile was minutes. The amount of time between fetching the HTML and rendering the page shouldn't be a concern in most cases.

4. Renderer

The renderer is where Google renders a page to see what a user is seeing. This is where the JavaScript and any changes made by JavaScript to the Document Object Model (DOM) are processed.

To do this, Google uses a headless Chrome browser that is now "evergreen", meaning it uses the latest Chrome version and supports the latest features. Until recently, Google rendered with Chrome 41, so many features were not supported.

Google has more information about the Web Rendering Service (WRS), which covers things like denying permissions, being stateless, flattening light DOM and shadow DOM, and more that is worth reading.

Rendering at web scale may be the eighth wonder of the world. It's a serious undertaking and requires a tremendous amount of resources. Because of the scale, Google takes a lot of shortcuts in the rendering process to speed things up. At Ahrefs, we're the only major SEO tool that renders web pages at scale, and we manage to render ~150 million pages a day to make our link index more complete. This allows us to check for JavaScript redirects, and we can also show links we found inserted with JavaScript, which we mark with a JS tag in the link reports.

Cached resources

Google relies heavily on caching resources. Pages are cached, files are cached, API requests are cached; basically, everything is cached before being sent to the renderer. Google isn't going out and downloading every resource each time a page is rendered; instead, it uses cached resources to speed up the process.

This can lead to some impossible states, where older file versions are used in the rendering process and the indexed version of a page may contain parts of older files. You can use file versioning or content fingerprinting to generate new file names when significant changes are made, forcing Google to download the updated version of the resource for rendering.
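As a minimal sketch of content fingerprinting, assuming a webpack build (the values are illustrative): the bundler can embed a hash of each file's contents in its name, so the file name changes whenever the file does.

// webpack.config.js (illustrative)
module.exports = {
  output: {
    // [contenthash] changes whenever the file contents change,
    // producing names like main.8e0d62a1.js and forcing a fresh fetch
    filename: '[name].[contenthash].js',
  },
};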

No fixed timeout

A common SEO myth is that the renderer only waits five seconds for your page to load. While it's always a good idea to make your site faster, this myth doesn't really make sense given the way Google caches files, as described above. Essentially, it's loading a page with everything already cached. The myth comes from testing tools like the URL Inspection tool, where resources are fetched live and a reasonable limit has to be set.

There is no fixed timeout for the renderer. What they're probably doing is something similar to the public Rendertron: likely waiting for something like networkidle0, where there is no more network activity, while also setting a maximum amount of time in case something gets stuck or someone tries to mine bitcoin on their pages.
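For context, here's a minimal sketch of what that kind of wait looks like in Puppeteer, which Rendertron is built on (the URL and timeout are placeholders, not values Google has published):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/', {
    // networkidle0: navigation is considered done once there have been
    // no network connections for at least 500 ms
    waitUntil: 'networkidle0',
    timeout: 30000, // placeholder upper bound so a stuck page can't hang forever
  });
  const html = await page.content(); // the rendered DOM, serialized as HTML
  console.log(html.length);
  await browser.close();
})();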

What Googlebot sees

Googlebot doesn't act like a user on websites. It won't click or scroll, but that doesn't mean there aren't workarounds. As for content, it will be seen as long as it is loaded into the DOM without any action required. I'll cover this in more detail in the troubleshooting section, but basically, if the content is in the DOM but merely hidden, it will be seen. If the content is only loaded into the DOM after a click, it won't be found.
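A quick illustration of that difference, using hypothetical markup and a made-up endpoint:

<!-- In the DOM but hidden: this text can still be indexed -->
<div class="tab-panel" style="display: none;">
  This text is in the initial DOM, so Googlebot can see it even though it's hidden.
</div>

<!-- Only fetched and added to the DOM after a click: Googlebot won't see this -->
<button onclick="loadMore()">Load more</button>
<div id="more"></div>
<script>
  function loadMore() {
    fetch('/api/more-content') // hypothetical endpoint
      .then(function (res) { return res.text(); })
      .then(function (html) {
        document.getElementById('more').innerHTML = html;
      });
  }
</script>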

Google also doesn't need to scroll to see your content, because it has a clever workaround. For mobile, it loads the page with a screen size of 411x731 pixels and then resizes the length to 12,140 pixels. Essentially, that becomes a really long phone with a screen size of 411x12,140 pixels. For desktop, it does the same, going from 1024x768 pixels to 1024x9,307 pixels.

Another interesting shortcut is that Google doesn't paint the pixels during the rendering process. Painting takes time and additional resources, and Google doesn't really need to see the final state with the pixels painted. It just needs to know the structure and the layout, and it gets that without having to actually paint the pixels. As Google's Martin Splitt puts it:

https://youtube.com/watch?v=Qxd_d9m9vzo&start=154

In Google Search, we don't really care about the pixels, because we don't really want to show them to anyone. We want to process the information and the semantic information, so we need something in between. We don't actually have to render the pixels.

A visual representation might help explain what gets skipped a bit better. If you run a test on the Performance tab in Chrome DevTools, you get a loading chart. The solid green part here represents the painting phase, and for Googlebot that never happens, so it saves resources.

Gray = downloads
Blue = HTML
Yellow = JavaScript
Purple = layout
Green = painting

5. Crawl queue

Google has a resource that talks a bit about crawl budget, but know that each site has its own crawl budget, and each request has to be prioritized. Google also has to balance crawling your site against crawling every other site on the internet. Newer sites in general, or sites with a lot of dynamic pages, will likely be crawled more slowly. Some pages will be updated less often than others, and some resources may also be requested less frequently.

One gotcha with JavaScript sites is that they may only update parts of the DOM. Browsing to another page as a user may not update some aspects in the DOM, such as title tags or canonical tags, but this may not be an issue for search engines. Remember that Google loads each page stateless: it doesn't store previous information and doesn't navigate between pages. I've seen SEOs get tripped up thinking there's a problem because of what they see after navigating from one page to another, such as a canonical tag that doesn't update, but Google may never see this state. Developers can fix this by updating the state using what's known as the History API, but again, it may not be a problem. Refresh the page and see what you see, or better yet, run it through one of Google's testing tools to check what Google sees. More on those in a second.
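For reference, here's a minimal sketch of what such an update can look like with the History API; the route, title, and helper name are made up for illustration, and it assumes a <link rel="canonical"> already exists in the head:

// Hypothetical client-side navigation handler
function navigateTo(path, pageTitle) {
  // Update the address bar without a full page reload
  history.pushState({}, '', path);

  // Also update the tags that often get forgotten on client-side navigations
  document.title = pageTitle;
  document.querySelector('link[rel="canonical"]')
    .setAttribute('href', 'https://example.com' + path);
}

navigateTo('/products/red-shoes', 'Red Shoes');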

View-source vs. inspect

If you right-click in a browser window, you'll see a couple of options for viewing the page's source code and for inspecting the page. View source shows you the same thing as a GET request would: the raw HTML of the page. Inspect shows you the processed DOM, after changes have been made, and is closer to the content Googlebot sees. It's basically the updated, most recent version of the page. You should use Inspect rather than view source when working with JavaScript.
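If you want to compare the two side by side, one quick way (assuming Chrome DevTools, where copy() is a console utility) is to dump the rendered DOM to your clipboard and diff it against view source:

// Run in the DevTools console: copies the current, rendered DOM as HTML
copy(document.documentElement.outerHTML);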

Google cache

Google's cache is not a reliable means of checking what Googlebot is seeing. Usually it is the original HTML, but sometimes it is the rendered HTML or an older version. The system is designed to see the content when a website is not available. It's not particularly useful as a debugging tool.

Google Testing Tools

Google's testing tools such as the URL Inspection tool in Google Search Console, the Mobile-Friendly Test, and the Rich Results Test are useful for troubleshooting. Still, even these tools differ slightly from what Google will see. I've already mentioned the five-second timeout in these tools that the renderer doesn't have, but the tools also differ in that they fetch resources in real time rather than using cached versions the way the renderer would. The screenshots in these tools also show pages with the pixels painted, which Google doesn't see in the renderer.

However, the tools are useful for seeing if the content is loaded in the DOM. The HTML displayed in these tools is the rendered DOM. You can search for a snippet of text to see if it loaded by default.

The tools also show you resources that may be blocked and display error messages in the console that are useful for troubleshooting.

Search for text in Google

Another quick check you can do is to simply search for a snippet of your content on Google. Search for "any part of your content" and see if the page returns. If so, then your content has likely been seen. Note that content that is hidden by default may not appear in your snippet in the SERPs.
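For example (the domain and phrase are placeholders; the site: operator simply narrows the search to your own site):

site:yourdomain.com "an exact sentence from the page you are checking"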

Ahrefs

In addition to rendering pages for the link index, you can enable JavaScript in Site Audit crawls to unlock more data in your audits.

The Ahrefs Toolbar also supports JavaScript and allows you to compare HTML to rendered versions of tags.

Rendering options

There are many options when it comes to rendering JavaScript. Google has a solid chart that I'm just going to show here. Any kind of SSR, static rendering, or prerendering setup is going to be fine for search engines. The main one that causes problems is full client-side rendering, where all of the rendering happens in the browser.

While Google would probably be fine even with client-side rendering, it's best to choose a different rendering option to support other search engines as well. Bing also supports JavaScript rendering, but the scale is unknown. Yandex and Baidu have limited support from what I've seen, and many other search engines have little to no support for JavaScript.

There is also the option of dynamic rendering, which means rendering for specific user agents. It's basically a workaround, but it can be useful for rendering for certain bots like search engines or even social media bots. Social media bots don't run JavaScript, so things like OG tags won't be seen unless you render the content before serving it to them.
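As a rough sketch of the idea, here's an Express middleware that serves prerendered HTML to bots and the normal client-side app to everyone else; the bot list is simplified and renderPage() is a hypothetical helper (real setups usually rely on a prerendering service):

const express = require('express');
const app = express();

// Simplified bot detection (illustrative, not exhaustive)
const BOTS = /googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit/i;

app.use(async (req, res, next) => {
  if (BOTS.test(req.headers['user-agent'] || '')) {
    // renderPage() is hypothetical: it would return fully rendered HTML,
    // e.g. from a headless browser or a prerendering service
    const html = await renderPage(req.originalUrl);
    return res.send(html);
  }
  next(); // regular users get the normal client-side rendered app
});

app.listen(3000);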

If you used the old AJAX crawling scheme, note that it is out of date and may no longer be supported.

Make your JavaScript website SEO-friendly

Many of the processes are similar to things SEOs are already used to, but there could be slight differences.

On-page SEO

All normal on-page SEO rules for content, title tags, meta descriptions, alt attributes, meta robots tags, etc. still apply. See On-Page SEO: A Workable Guide.

A couple of problems I keep seeing when working with JavaScript websites are that titles and descriptions are reused, and alt attributes are rarely set on images.

Allow crawling

Don't block access to resources. Google needs to be able to access and download resources so that it can render pages correctly. The easiest way to allow the needed resources to be crawled is to add the following to your robots.txt:

User-agent: Googlebot
Allow: .js
Allow: .css

URLs

Change URLs when you update content. I've already mentioned the History API, but you should know that JavaScript frameworks have a router that lets you map to clean URLs. You don't want to use hashes (#) for routing. This is especially a problem for Vue and some of the earlier versions of Angular. With a URL like abc.com/#something, anything after the # is typically ignored by a server. To fix this for Vue, you can work with your developer to change the following:

Vue Router: use "history" mode instead of the default "hash" mode.

const router = new VueRouter({
  mode: 'history',
  routes: [] // the array of route definitions
})

Duplicate content

With JavaScript, there can be several URLs for the same content, which leads to duplicate content issues. This can be caused by capitalization, IDs, parameters with IDs, and so on, so URLs like these can all exist:

domain.com/Abc
domain.com/abc
domain.com/123
domain.com/?id=123

The solution is simple: choose one version you want indexed and set canonical tags.
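A canonical tag for the preferred version would look something like this (the domain and path are placeholders):

<link rel="canonical" href="https://domain.com/abc" />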

SEO "plugin" options

In JavaScript frameworks, these are usually referred to as modules. You'll find versions for many of the popular frameworks like React, Vue, and Angular by searching for the framework plus the module name, such as "React Helmet". Meta tags, Helmet, and Head are all popular modules with similar functionality that let you set many of the popular tags needed for SEO.
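As a minimal sketch of what using such a module looks like, here's React Helmet setting a few tags; the component name, title, description, and URL are placeholders:

import React from 'react';
import { Helmet } from 'react-helmet';

function ProductPage() {
  return (
    <div>
      <Helmet>
        <title>Red Shoes | Example Store</title>
        <meta name="description" content="Placeholder description for this page." />
        <link rel="canonical" href="https://example.com/products/red-shoes" />
      </Helmet>
      <h1>Red Shoes</h1>
    </div>
  );
}

export default ProductPage;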

Error pages

Since JavaScript frameworks aren't server-side, they can't really throw a server error such as a 404. You have a couple of different options for creating error pages:

  1. Use a JavaScript redirect to a page that responds with a 404 status code
  2. Add a noindex tag to the page that failed, along with some kind of error message like "404 Page Not Found". This will be treated as a soft 404, since the actual status code returned is a 200 OK (a sketch of this approach follows below).
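Here's a minimal sketch of that second option in a client-side app; the element ID and message are hypothetical:

// Hypothetical handler for a route that doesn't exist
function renderNotFound() {
  // Tell search engines not to index this soft-404 page
  const robots = document.createElement('meta');
  robots.name = 'robots';
  robots.content = 'noindex';
  document.head.appendChild(robots);

  // Show the user an error message
  document.getElementById('app').innerHTML = '<h1>404 Page Not Found</h1>';
}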

Sitemap

JavaScript frameworks typically have routers that map to clean URLs, and these routers usually have an additional module that can also create sitemaps. You can find them by searching for your system plus "router sitemap", such as "Vue router sitemap". Many of the rendering solutions may also have sitemap options. The same applies here: just find the system you're using and google the system plus "sitemap", such as "Gatsby sitemap", and you're likely to find a solution that already exists.

Redirects

SEOs are used to 301/302 redirects, which are server-side. JavaScript, however, typically runs client-side. That's fine, as Google processes the page as it would after being redirected. The redirects still pass all signals such as PageRank. You can usually find these redirects in the code by searching for "window.location.href".
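For reference, a JavaScript redirect is as simple as the following (the target URL is a placeholder):

// Client-side redirect; Google treats the destination much like a normal redirect target
window.location.href = 'https://example.com/new-page';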

Internationalization

There are usually a few module options for the different frameworks that support some of the features needed for internationalization, such as hreflang. They have been ported to the various systems and include i18n and intl, and in many cases the same modules used for header tags, such as Helmet, can be used to add the needed tags.
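For example, the hreflang tags you'd add through such a module end up looking like this in the head (the URLs and locales are placeholders):

<link rel="alternate" hreflang="en" href="https://example.com/en/page" />
<link rel="alternate" hreflang="de" href="https://example.com/de/page" />
<link rel="alternate" hreflang="x-default" href="https://example.com/page" />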

Lazy loading

There are usually modules for handling lazy loading. In case you haven't noticed yet, there are modules for pretty much everything you need to do when working with JavaScript frameworks. Lazy and Suspense are the most popular modules for lazy loading. You'll want to lazy load images, but be careful not to lazy load content. This can be done with JavaScript, but it could mean that search engines don't pick the content up correctly.
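A small sketch of the distinction in React; the component and file names are made up:

import React, { Suspense, lazy } from 'react';

// Code-splitting a component with lazy/Suspense still renders it as part of the page
const Comments = lazy(() => import('./Comments')); // hypothetical component

function ArticlePage() {
  return (
    <div>
      {/* Images: lazy loading is fine */}
      <img src="/images/hero.jpg" alt="Hero image" loading="lazy" />

      {/* Content: keep it in the rendered output; don't gate text behind scroll
          or click events, or search engines may never see it */}
      <Suspense fallback={<p>Loading…</p>}>
        <Comments />
      </Suspense>
    </div>
  );
}

export default ArticlePage;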

Final thoughts

JavaScript is a tool to be used wisely, not something SEOs should fear. Hopefully this article has helped you understand how to work with it better, but don't be afraid to reach out to your developers, work with them, and ask them questions. They will be your greatest allies in helping to improve your JavaScript website for search engines.

Do you have any questions? Let me know on Twitter.
