As Google's John Mueller said:
The web has moved from plain HTML - as an SEO you can embrace that. Learn from JS devs & share SEO knowledge with them. JS’s not going away.
- 🍌 John 🍌 (@JohnMu) August 8, 2017
The system that handles the rendering process at Google is called the Web Rendering Service (WRS). Google has provided a simplified diagram to show how this process works.
Let's say we start the process at a URL.
The crawler sends GET requests to the server. The server replies with headers and the content of the file, which is then saved.
The request will likely come from a mobile user agent, as Google has now mostly moved to mobile-first indexing. You can use the URL Inspection tool in Search Console to see how Google crawls your website. When you inspect a URL, check the coverage information for "Crawled as"; it should tell you whether you are still on desktop indexing or have moved to mobile-first indexing.
The requests mostly come from Mountain View, CA, USA, but Google also crawls some locally customized pages from outside the United States. I mention this because some websites block visitors from specific countries or IP ranges, or treat them differently, which can prevent Googlebot from seeing your content.
Resources and Links
Google doesn't navigate from page to page like a user would. Part of the processing is to check the page for links to other pages and files that are needed to build the page. These links are pulled out and added to the crawl queue that Google uses to prioritize and schedule crawls.
Good:
  <a href="/seite">simple is good</a>
  <a href="/seite" onclick="goTo('seite')">still okay</a>

Bad:
  <a onclick="goTo('seite')">no, no href</a>
  <a href="#" onclick="goTo('seite')">no, link is missing</a>
  <a href="javascript:goTo('seite')">no, link missing</a>
  <span onclick="goTo('seite')">not the correct HTML element</span>
  <div onclick="goTo('seite')">no link</div>

Buttons, ng-click, and many other patterns can get this wrong as well.
Elimination of duplicates
Duplicate content can be eliminated or deprioritized from the downloaded HTML before it is sent to rendering. With app shell models, very little content and code may be shown in the HTML response; in fact, every page on the site may show the same code, and it could even be the same code shown across multiple websites. This can sometimes cause pages to be treated as duplicates and not sent to rendering right away. Worse, the wrong page or even the wrong website may show in search results. This should sort itself out over time, but it can be problematic, especially with newer websites.
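To make that concrete, here is a minimal sketch of an app shell response; the file name app.js and the element ID are just for illustration:

  <!DOCTYPE html>
  <html>
  <head>
    <title>My App</title>
  </head>
  <body>
    <!-- every URL on the site returns this same markup -->
    <!-- the actual content only appears after app.js runs -->
    <div id="root"></div>
    <script src="/app.js"></script>
  </body>
  </html>

Until rendering happens, two different URLs serving this same shell look identical to the duplicate detection step.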
The most restrictive directives

Google chooses the most restrictive directive between the HTML and the rendered version of a page. If JavaScript changes a directive and it conflicts with the one in the HTML, Google simply obeys whichever is more restrictive; a noindex in the HTML even skips rendering altogether.
3. Render queue
To do this, Google uses a headless Chrome browser that is now "evergreen," meaning it should always use the latest Chrome version and support the latest features. Until recently, Google rendered with Chrome 41, so many features were unsupported.
Google has more information about the Web Rendering Service (WRS), which covers things like denial of permissions, statelessness, flattening of the light DOM and shadow DOM, and more that is worth reading.
Google relies heavily on cached resources. Pages are cached, files are cached, API requests are cached; basically, everything is cached before it is sent to the renderer. Google doesn't go out and download every resource every time a page is rendered, but instead uses cached resources to speed up this process.
This can lead to some impossible states where previous file versions are used in the rendering process and the indexed version of a page can contain parts of older files. You can use file versioning or content fingerprinting to generate new file names when significant changes are made, requiring Google to download the updated version of the resource for rendering.
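With a bundler, content fingerprinting is typically a one-line setting. As a minimal sketch, assuming a webpack 5 build (the entry path is hypothetical):

  // webpack.config.js - minimal sketch
  module.exports = {
    entry: './src/index.js',
    output: {
      // [contenthash] changes whenever the file contents change,
      // e.g. main.8f1c2ab.js, so a stale cached copy can't be reused
      filename: '[name].[contenthash].js',
    },
  };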
No fixed timeout
A common SEO myth is that the renderer waits only five seconds for your page to load. While making your page faster is always a good idea, this myth doesn't really make sense given the way Google caches files, as described above. Basically, they're loading a page with everything already cached. The myth stems from testing tools like the URL Inspection tool, where resources are fetched live and a reasonable limit has to be set.
There is no fixed timeout for the renderer. What they're probably doing is something similar to the public Rendertron: likely waiting for something like networkidle0, where there is no more network activity, while also setting a maximum amount of time in case something gets stuck or someone tries to mine Bitcoin on their pages.
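You can approximate that behavior yourself with headless Chrome via Puppeteer; this is a sketch of the idea, not Google's actual implementation:

  // sketch: wait until the network goes quiet, with a hard upper limit
  const puppeteer = require('puppeteer');

  (async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // networkidle0 = no network connections for 500 ms;
    // timeout is the safety net in case something hangs
    await page.goto('https://example.com/', {
      waitUntil: 'networkidle0',
      timeout: 30000,
    });
    console.log(await page.content()); // the rendered HTML
    await browser.close();
  })();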
What Googlebot sees
Googlebot doesn't take actions on websites. It won't click or scroll, but that doesn't mean there aren't workarounds. As for content, it will be seen as long as it is loaded in the DOM with no action required. I'll cover this in more detail in the troubleshooting section, but basically: if the content is in the DOM but merely hidden, it will be seen; if it is only loaded into the DOM after a click, it will not be found.
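A quick illustration of the difference (the endpoint and handler names are made up):

  <!-- seen: the text is in the DOM, just hidden -->
  <div id="tab-2" style="display:none">
    This text can be indexed.
  </div>

  <!-- not seen: the text only enters the DOM after a click -->
  <button onclick="loadMore()">Show more</button>
  <script>
    function loadMore() {
      // Googlebot never clicks, so this never runs for it
      fetch('/more-content')
        .then(r => r.text())
        .then(html => document.body.insertAdjacentHTML('beforeend', html));
    }
  </script>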
Google also doesn't need to scroll to see your content because it has a clever workaround: for mobile, it loads the page with a screen size of 411x731 pixels and then stretches the length to 12,140 pixels. Essentially, this becomes a really long phone with a screen size of 411x12,140 pixels. It does the same for desktop, going from 1024x768 pixels to 1024x9,307 pixels.
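If you want to emulate this in your own tests, the Puppeteer sketch from the timeout section above can set the same oversized viewport before navigating; the numbers are the ones described above:

  // sketch: emulate the "really long phone" before page.goto()
  await page.setViewport({ width: 411, height: 12140 });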
Another interesting shortcut is that Google doesn't paint the pixels during the rendering process. Painting takes additional time and resources to finish loading a page, and Google doesn't really need to see the final state with the pixels painted. It just needs to know the structure and the layout, and it gets that without actually having to paint the pixels. As Google's Martin Splitt puts it:
In Google search, we don't really care about the pixels because we don't really want to show them to anyone. We want to process the information and the semantic information, so we need something in between. We don't actually have to paint the pixels.
A visual representation may help explain what gets skipped. If you run a test on the Performance tab in Chrome DevTools, you get a loading chart. The solid green part here represents the painting phase, and for Googlebot that never happens, so it saves those resources.
Gray = downloads
Blue = HTML
Violet = layout
Green = painting
5. Crawl queue
Google has a resource that talks a little about crawl budget, but know that each site has its own crawl budget and each request has to be prioritized. Google also has to balance crawling your website against every other website on the internet. Newer sites in general, or sites with many dynamic pages, will likely be crawled more slowly. Some pages are updated less often than others, and some resources may also be requested less frequently.
View-source vs. inspect

View-source shows the raw HTML as it came from the server, while inspect shows the processed DOM after JavaScript has run; for JavaScript sites, inspect reflects what actually gets loaded.

Google cache

Google's cache is not a reliable way to check what Googlebot sees. It is usually the original HTML, although it is sometimes the rendered HTML or an older version. The system was built to show the content when a website is unavailable; it's not particularly useful as a debugging tool.
Google Testing Tools
Google's testing tools, such as the URL Inspection tool in Google Search Console, the Mobile-Friendly Test, and the Rich Results Test, are useful for troubleshooting. Still, even these tools differ slightly from what Google actually sees. I've already mentioned the five-second timeout in these tools that the renderer doesn't have, but they also differ in that they fetch resources in real time rather than using cached versions the way the renderer does. The screenshots in these tools also show pages with the pixels painted, which Google doesn't see in the renderer.
However, the tools are useful for seeing if the content is loaded in the DOM. The HTML displayed in these tools is the rendered DOM. You can search for a snippet of text to see if it loaded by default.
The tools also show you resources that may be blocked and display error messages in the console that are useful for troubleshooting.
Search for text in Google
Another quick check you can do is to simply search for a snippet of your content on Google. Search for "any part of your content" and see if the page returns. If so, then your content has likely been seen. Note that content that is hidden by default may not appear in your snippet in the SERPs.
If you used the old AJAX crawling scheme, note that it is out of date and may no longer be supported.
Many of the processes are similar to things SEOs are already used to, but there could be slight differences.
All normal on-page SEO rules for content, title tags, meta descriptions, alt attributes, meta robots tags, etc. still apply. See On-Page SEO: A Workable Guide.
Don't block access to resources. Google needs to be able to access and download resources so pages can be rendered correctly. In your robots.txt, the easiest way to allow the needed resources to be crawled is to add:

  User-agent: Googlebot
  Allow: .js
  Allow: .css
The solution is simple. Choose a version to be indexed and set canonical tags.
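A canonical tag is a single link element in the head of each duplicate version of the page (the URL below is a placeholder):

  <link rel="canonical" href="https://example.com/page/" />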
SEO "plugin" options
- Add a noindex tag to the failing page, along with some kind of error message like "404 Page Not Found". This is treated as a soft 404, since the actual status code returned is a 200 OK, as sketched below.
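A sketch of the markup such a failing route could render (illustrative only):

  <!-- served with a 200 status code, but treated as a soft 404 -->
  <head>
    <meta name="robots" content="noindex">
  </head>
  <body>
    <h1>404 Page Not Found</h1>
  </body>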
Most frameworks have a few module options that support some of the features needed for internationalization, such as hreflang; they have been ported to the various systems under names like i18n and intl. In many cases, the same modules used for header tags, such as Helmet, can be used to add the tags as needed.
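For example, with React Helmet you could add hreflang tags from inside a component; this is a minimal sketch with placeholder URLs:

  // sketch: hreflang tags via react-helmet
  import React from 'react';
  import { Helmet } from 'react-helmet';

  function Page() {
    return (
      <Helmet>
        <link rel="alternate" hrefLang="en" href="https://example.com/en/" />
        <link rel="alternate" hrefLang="de" href="https://example.com/de/" />
      </Helmet>
    );
  }

  export default Page;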
Do you have any questions? Let me know on Twitter.