How do you monitor web performance

Author: liu, summerqy

http://www.alloyteam.com/2020/01/14184

You may have been asked: how does your web application perform? How would you answer? Is it better than most web applications on the market? This article looks at how to monitor web performance, including the indicators to monitor, the categories of monitoring, performance analysis, and how monitoring is carried out.

However, web performance monitoring is a big topic in itself. This article covers only part of it, and some of the content is not comprehensive.

Foreword: Why do we need monitoring?

Web performance affects user retention. Research by Google DoubleClick shows that 53% of visits to mobile pages are abandoned when the page takes longer than 3 seconds to load. The BBC found that it loses an additional 10% of users for every additional second of page load time.

Through monitoring, we want to know the current state and trend of web application performance and to find its bottlenecks. How does the application perform after a release? Did the release affect performance? Can we detect business failures early? How stable is the service?

What should be monitored?

First, we need to know what to monitor and which specific indicators matter.

Google developers proposed the RAIL model for measuring application performance: Response, Animation, Idle, and Load, which represent four distinct aspects of a web application's lifecycle. The recommended targets are: respond to user input within 100 ms; produce each frame of an animation or scroll within 10 ms; maximize idle time; and load the page within 5 seconds.

We can group these into three aspects: response speed, page stability, and external service calls.

  • Response speed: initial page access speed + interactive response speed

  • Page stability: page error rate

  • External service call: access speed for network requests

1. Page access speed: white screen time, first screen time, time to interactive

Let's look at some user-centric performance metrics suggested by Google developers.

These metrics are derived from how the user actually experiences the page.

1) first paint (FP) and first contentful paint (FCP)

First paint, and first paint with content.

Browsers have standardized these two indicators, and both can be read through the performance API. In most cases the two times are the same, but in some cases they differ.
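As a minimal sketch of how FP and FCP could be read, the function below picks the two values out of a list of paint entries. The names PaintEntry, PaintTimes, and extractPaintTimes are hypothetical helpers introduced for illustration; the entry names "first-paint" and "first-contentful-paint" are those defined by the Paint Timing API.

```typescript
// Minimal shape of a paint timing entry (a subset of PerformancePaintTiming).
interface PaintEntry {
  name: string;      // "first-paint" or "first-contentful-paint"
  startTime: number; // milliseconds since the time origin
}

// Hypothetical result type: a value stays undefined if the entry is missing.
interface PaintTimes {
  firstPaint?: number;
  firstContentfulPaint?: number;
}

// Picks FP and FCP out of a list of paint entries.
function extractPaintTimes(entries: PaintEntry[]): PaintTimes {
  const times: PaintTimes = {};
  for (const entry of entries) {
    if (entry.name === "first-paint") {
      times.firstPaint = entry.startTime;
    } else if (entry.name === "first-contentful-paint") {
      times.firstContentfulPaint = entry.startTime;
    }
  }
  return times;
}
```

In a browser, the input would typically come from performance.getEntriesByType("paint").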

2) First meaningful paint and hero element timing

The first meaningful paint, and the timing of the page's key elements.

We assume that the moment at which the DOM structure of a page changes most drastically is the moment at which its main content is displayed, and we treat that moment as the first meaningful paint. Browsers have not standardized this indicator. After all, it is difficult to agree on a single standard that defines the main content of a website.

Google Lighthouse's definition: https://docs.google.com/document/d/1BR94tJdZLsin5poeet0XoTW60M0SjvOJQttKT-JK8HI/view

3) Time to interactive

The time at which the page becomes interactive.

4) Long tasks

The browser's main thread is single-threaded. Running too many long tasks inevitably delays the response to user input. A good application maximizes idle time so that it can respond to user input as quickly as possible.

2. Page stability: page errors

  • Resource loading errors

  • JS execution errors

3. External service calls

  • CGI request time

  • CGI success rate

  • CDN resource load time

Categories of monitoring

Web performance monitoring can be broken down into two categories: Synthetic Monitoring (SYN) and Real User Monitoring (RUM).

1. Synthetic monitoring

Synthetic monitoring loads web pages in a simulated browser environment, collects the relevant performance indicators by simulating what an end user does, and finally outputs a website performance report. Examples include Lighthouse, PageSpeed, WebPageTest, and Pingdom.

1. Lighthouse

Lighthouse is an open-source automation tool from Google that can run in two ways: as a Chrome extension or as a command-line tool. The Chrome extension provides a friendlier interface for reading reports, while the command-line tool allows Lighthouse to be integrated into a continuous integration system.

It reports performance indicators such as white screen time, first screen time, and time to interactive, along with SEO and PWA results.

[Figure: Lighthouse speed test results for the Tencent Docs mobile home page]

2. PageSpeed

https://developers.google.com/speed/pagespeed/insights/

In addition to showing some key performance index data, it also gives some suggestions for optimizing performance.

[Figure: PageSpeed speed test results and optimization suggestions for the Tencent Docs mobile home page]

3. WebPageTest

It reports the speed test results together with a waterfall chart of resource loading.

4. Pingdom

https://www.pingdom.com/

Note: In addition to synthetic monitoring, Pingdom also provides real user monitoring.

Advantages and disadvantages of synthetic monitoring:

Advantages:

  • Non-invasive.

  • Easy and fast.

Disadvantages:

  • It is a simulation, not real user access.

  • Login state is not taken into account, so pages that require login cannot be monitored.

2. Real user monitoring

Real user monitoring is a passive monitoring technology offered as an application service. The monitored web application connects to the service through an SDK or similar means, collects and reports real users' access, interaction, and other performance data, and a performance analysis report is produced after the data has been cleaned and processed. Examples include oneapm, Datadog, and FrontJs.

1. oneapm

https://www.oneapm.com/bi/feature.html

Features include: market data, functional statistics, slow load tracking, page access, script errors, AJAX, combination analysis, reports, alarms, etc.

2. Datadog

https://www.datadoghq.com/rum/

3. FrontJs

https://www.frontjs.com/

Features include: access performance, exception monitoring, reports, trends, etc.

The advantages and disadvantages of this monitoring method:

Advantages:

  • It reflects real user access.

  • You can observe historical performance trends.

  • There are some additional functions: report push, monitoring alerts, etc.

Disadvantages:

  • It is intrusive and affects web performance to some extent.

Performance analysis

Before we get into monitoring, let's take a look at the browser-provided performance API, which is also the main source of performance monitoring data.

The performance API provides high-resolution timestamps, expressed in milliseconds with a fractional part that gives sub-millisecond precision, and they are not affected by changes to the operating system's clock.

Current support: all major browsers support it, so it can be used with confidence.

Basic attributes

performance.navigation: Indicates how the page was reached (normal navigation, reload, or history traversal) and how many redirects occurred

performance.timing: The timestamps of each phase of the page load

[Figure: the meaning of each stage in performance.timing]

performance.memory: Basic memory usage, a non-standard extension added by Chrome

performance.timeOrigin: High-resolution timestamp of the moment at which the performance measurement started

Basic method

performance.getEntries()

This method returns all performance entry objects. The getEntriesByName() and getEntriesByType() methods filter those entries and return only the entries with a given name or of a given type.

The mark() and measure() methods can be combined to time a piece of code, for example to measure how long a particular function takes to execute.

  • performance.getEntriesByName()

  • performance.getEntriesByType()

  • performance.mark()

  • performance.clearMarks()

  • performance.measure()

  • performance.clearMeasures()

  • performance.now() ...
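To make the mark and measure combination concrete, here is a minimal sketch that times one function call. UserTiming and timeFunction are hypothetical names introduced for illustration; UserTiming lists only the methods the sketch needs, and the browser's performance object is assumed to satisfy it.

```typescript
// The subset of the performance object that this sketch relies on.
interface UserTiming {
  mark(name: string): unknown;
  measure(name: string, startMark: string, endMark: string): unknown;
  getEntriesByName(name: string): { duration: number }[];
  clearMarks(name: string): void;
  clearMeasures(name: string): void;
}

// Runs fn between two marks and returns its result with the measured duration (ms).
function timeFunction<T>(
  perf: UserTiming,
  label: string,
  fn: () => T
): { result: T; duration: number } {
  const startMark = `${label}:start`;
  const endMark = `${label}:end`;

  perf.mark(startMark);
  const result = fn();
  perf.mark(endMark);

  // The measure entry stores the time elapsed between the two marks.
  perf.measure(label, startMark, endMark);
  const measures = perf.getEntriesByName(label);
  const duration =
    measures.length > 0 ? measures[measures.length - 1].duration : 0;

  // Clean up so that repeated calls do not accumulate entries.
  perf.clearMarks(startMark);
  perf.clearMarks(endMark);
  perf.clearMeasures(label);

  return { result, duration };
}
```

In a browser this could be called as timeFunction(performance, "render-list", () => renderList()), where renderList stands for whatever function is being timed.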

Available APIs

The performance interface is made up of several APIs, and there is some overlap between them.

1. PerformanceObserver API

This API uses the observer pattern to detect performance events.

Typical uses:

  • Get resource information

  • Monitor TTI

  • Monitor long tasks
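As an illustration of the observer pattern, here is a minimal sketch that watches long tasks. The observer constructor is passed in as a parameter so the sketch does not depend on a particular environment; in a browser the global PerformanceObserver is assumed to fit the ObserverCtor shape. watchLongTasks and the interface names are hypothetical.

```typescript
// Minimal shapes, modeled on the PerformanceObserver API.
interface LongTaskEntry {
  startTime: number; // when the task started (ms since time origin)
  duration: number;  // how long it blocked the main thread (ms)
}
interface EntryList {
  getEntries(): LongTaskEntry[];
}
interface ObserverLike {
  observe(options: { type: string; buffered?: boolean }): void;
  disconnect(): void;
}
type ObserverCtor = new (callback: (list: EntryList) => void) => ObserverLike;

// Subscribes to "longtask" entries and forwards each one to report.
// Returns a function that stops the observation.
function watchLongTasks(
  Observer: ObserverCtor,
  report: (task: LongTaskEntry) => void
): () => void {
  const observer = new Observer((list) => {
    for (const entry of list.getEntries()) {
      report({ startTime: entry.startTime, duration: entry.duration });
    }
  });
  // buffered: true also asks for entries recorded before observe() was called.
  observer.observe({ type: "longtask", buffered: true });
  return () => observer.disconnect();
}
```

In a browser that supports the Long Tasks API, this could be started with watchLongTasks(PerformanceObserver, (task) => console.log(task.duration)). Changing the type to "resource" would deliver resource entries instead.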

2. Navigation timing API

https://www.w3.org/TR/navigation-timing-2/

Are the stages contiguous? No.

Does every stage always occur? Not necessarily.

  • Number of redirects: performance.navigation.redirectCount

  • Redirect time: redirectEnd - redirectStart

  • DNS lookup time: domainLookupEnd - domainLookupStart

  • TCP connection time: connectEnd - connectStart

  • SSL/TLS handshake time: connectEnd - secureConnectionStart

  • Network request time (TTFB): responseStart - requestStart

  • Data transfer time: responseEnd - responseStart

  • DOM parsing time: domInteractive - responseEnd

  • Resource loading time: loadEventStart - domContentLoadedEventEnd

  • First packet time: responseStart - domainLookupStart

  • White screen time: responseEnd - fetchStart

  • First interactive time: domInteractive - fetchStart

  • DOM ready time: domContentLoadedEventEnd - fetchStart

  • Full page load time: loadEventStart - fetchStart

  • HTTP header size: transferSize - encodedBodySize
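The formulas in the list above can be sketched as one function. NavigationTimings and computeNavigationMetrics are hypothetical names; the field names are those of the Navigation Timing API. The guard on secureConnectionStart is an added assumption of this sketch: that field is 0 when no secure connection was set up, so the subtraction would otherwise be meaningless.

```typescript
// The navigation timing fields used by the formulas (all in milliseconds).
interface NavigationTimings {
  fetchStart: number;
  redirectStart: number;
  redirectEnd: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  secureConnectionStart: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
  domInteractive: number;
  domContentLoadedEventEnd: number;
  loadEventStart: number;
}

// Derives the per-phase durations and combined metrics listed above.
function computeNavigationMetrics(t: NavigationTimings) {
  return {
    redirect: t.redirectEnd - t.redirectStart,
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    // secureConnectionStart is 0 when the page was not loaded over a secure connection.
    ssl: t.secureConnectionStart > 0 ? t.connectEnd - t.secureConnectionStart : 0,
    ttfb: t.responseStart - t.requestStart,
    download: t.responseEnd - t.responseStart,
    domParse: t.domInteractive - t.responseEnd,
    resourceLoad: t.loadEventStart - t.domContentLoadedEventEnd,
    firstPacket: t.responseStart - t.domainLookupStart,
    whiteScreen: t.responseEnd - t.fetchStart,
    firstInteractive: t.domInteractive - t.fetchStart,
    domReady: t.domContentLoadedEventEnd - t.fetchStart,
    pageLoad: t.loadEventStart - t.fetchStart,
  };
}
```

In a browser, the input could be the first entry of performance.getEntriesByType("navigation"), read after the load event has fired so that loadEventStart is populated.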

3. Resource Timing API

https://w3c.github.io/resource-timing/

This data corresponds to the network waterfall chart data in the Chrome debugging tool.

4. Paint Timing API

https://w3c.github.io/paint-timing/

It provides the first paint time and the first contentful paint time.

5. User Timing API

https://www.w3.org/TR/user-timing-2/#introduction

It mainly uses the mark and measure methods to calculate the time spent in a given stage, for example the time a particular function takes.

6. High Resolution Time API

https://w3c.github.io/hr-time/#dom-performance-timeorigin

It mainly contains the now() method and the timeOrigin attribute.

7. Performance Timeline API

https://www.w3.org/TR/performance-timeline-2/#introduction

Summary

Based on the performance API, we can measure the following entry types:

mark, measure, navigation, resource, paint, and frame.

Counts:

  • Number of redirects

  • Number of JS resources

  • Number of CSS resources

  • Number of AJAX requests

  • Number of IMG resources

  • Total number of resources

Time spent in each phase:

  • Redirect time: redirectEnd - redirectStart

  • DNS lookup time: domainLookupEnd - domainLookupStart

  • TCP connection time: connectEnd - connectStart

  • SSL/TLS handshake time: connectEnd - secureConnectionStart

  • Network request time (TTFB): responseStart - requestStart

  • HTML download time: responseEnd - responseStart

  • DOM parsing time: domInteractive - responseEnd

  • Resource loading time: loadEventStart - domContentLoadedEventEnd

Other combined metrics:

  • White screen time: domLoading - fetchStart

  • Rough first screen time: loadEventEnd - fetchStart or domInteractive - fetchStart

  • DOM ready time: domContentLoadedEventEnd - fetchStart

  • Full page load time: loadEventStart - fetchStart

Load time by resource type:

  • Total JS load time

  • Total CSS load time
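The resource counts and the total JS and CSS load times could be derived from resource timing entries roughly as follows. This is a sketch under stated assumptions: the classification rules (by initiatorType, plus a .css suffix check for stylesheets loaded through link elements) are one plausible convention, "total load time" is taken to be the sum of entry durations, and all names are hypothetical.

```typescript
// Subset of PerformanceResourceTiming used here.
interface ResourceEntry {
  name: string;          // resource URL
  initiatorType: string; // e.g. "script", "link", "img", "xmlhttprequest", "fetch"
  duration: number;      // milliseconds
}

type ResourceKind = "js" | "css" | "img" | "ajax" | "other";

interface KindSummary {
  count: number;
  totalDuration: number;
}

// Assumed classification rules; adjust them to match your own pages.
function classifyResource(entry: ResourceEntry): ResourceKind {
  const path = entry.name.split(/[?#]/)[0].toLowerCase();
  if (entry.initiatorType === "script") return "js";
  if (entry.initiatorType === "link" && /\.css$/.test(path)) return "css";
  if (entry.initiatorType === "img") return "img";
  if (entry.initiatorType === "xmlhttprequest" || entry.initiatorType === "fetch") {
    return "ajax";
  }
  return "other";
}

// Counts resources and sums their durations per kind.
function summarizeResources(
  entries: ResourceEntry[]
): Record<ResourceKind, KindSummary> {
  const summary: Record<ResourceKind, KindSummary> = {
    js: { count: 0, totalDuration: 0 },
    css: { count: 0, totalDuration: 0 },
    img: { count: 0, totalDuration: 0 },
    ajax: { count: 0, totalDuration: 0 },
    other: { count: 0, totalDuration: 0 },
  };
  for (const entry of entries) {
    const bucket = summary[classifyResource(entry)];
    bucket.count += 1;
    bucket.totalDuration += entry.duration;
  }
  return summary;
}
```

In a browser, the input would be performance.getEntriesByType("resource"), and the total number of resources is simply the length of that list. Summing durations counts parallel downloads separately, so the sum can exceed the wall-clock time.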

How to monitor?

Now that we understand the performance API, let's look at how monitoring is carried out.

Overall process: collecting and reporting performance indicators, data storage, data aggregation and analysis, and finally display, alerting, and report push.

The main concern here is how performance data is collected.

Notes on recording performance indicators:

  • Make sure the data is correct

  • Try not to affect the application's performance

1. Basic performance report

Data to collect: report all of the navigation timing points; for the rest, report a selection of the items listed in the performance analysis section, for example white screen time, the total number of JS and CSS resources, and the total load time.

Other data worth reporting: whether the cache was used, whether gzip compression is enabled, and how the page was loaded. Once the performance data has been collected, it can be reported.

When should the data be reported?

Reporting method recommended by Google developers:
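A minimal sketch of one widely used reporting pattern: try navigator.sendBeacon first, and fall back to fetch with keepalive so the request can outlive the page. This is an assumption about what a reasonable implementation looks like, not a reproduction of a specific recommendation. Transport and reportMetrics are hypothetical names, and the transport is injected so the logic can run outside a browser.

```typescript
// The two sending mechanisms, injected so the sketch is environment-independent.
interface Transport {
  // Mirrors navigator.sendBeacon: returns true if the data was queued for sending.
  sendBeacon?: (url: string, body: string) => boolean;
  // Mirrors fetch, restricted to the options this sketch uses.
  fetch: (
    url: string,
    init: { method: string; body: string; keepalive: boolean }
  ) => unknown;
}

// Sends the metrics and returns which mechanism was used.
function reportMetrics(
  transport: Transport,
  url: string,
  metrics: Record<string, number>
): "beacon" | "fetch" {
  const body = JSON.stringify(metrics);
  if (transport.sendBeacon && transport.sendBeacon(url, body)) {
    return "beacon";
  }
  // keepalive lets the request complete even if the page is being unloaded.
  transport.fetch(url, { method: "POST", body, keepalive: true });
  return "fetch";
}
```

In a browser, the transport could be built from navigator.sendBeacon and window.fetch (both bound to their owners), and the call could be triggered when the page becomes hidden, so that reporting does not compete with the page's own loading work.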

2. Calculation of the first screen time

We know that first screen time is an important indicator, but it is difficult to obtain directly from the performance API. Let's look at the main ways of calculating it.

https://web.dev/first-meaningful-paint/

1) Custom instrumentation by the application: the most accurate way, because only the application's own developers know when the first screen has really finished loading.

2) Lighthouse's approach: use the trace events recorded during Chrome's rendering process.

3) Use the Chrome DevTools Protocol to get the number of layout nodes on the page. The idea is to find the moment at which the page layout changes the most.

4) The Aegis method: use the MutationObserver interface to monitor node changes on the document object.

Check whether the changed nodes appear in the first screen. If they do, the current time is recorded as the first screen render time. The loading time of images in the first screen must also be taken into account: iterate over all of the image entries obtained, and update the first screen render time according to each image's load start time and load completion time.

5) Use the MutationObserver interface, which provides the ability to watch for changes made to the DOM tree. It was designed as a replacement for the Mutation Events feature of the DOM3 Events specification.

Method: place a div in the first content module, use the MutationObserver API to watch that div for DOM changes, and check whether its height is greater than 0 or greater than a specified value. If it is, the main content has been rendered and the first screen time can be calculated.

6) A patented method: while the page is loading, repeatedly check whether the current page height is greater than the screen height. If it is, capture the screen image of the current page and compare the rendered images to determine whether the page fills the screen.

https://patentimages.storage.googleapis.com/bd/83/3d/f65775c31c7120/CN103324521A.pdf
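The core arithmetic of method 4 can be sketched as a pure function. This is one interpretation of the description, not the Aegis implementation: the first screen time starts as the time of the last node change that falls inside the first screen, and it is then pushed back to the completion time of any first-screen image that started loading before that moment and finished after it. All names are hypothetical, and the detection of whether a node or image is in the first screen is assumed to happen elsewhere.

```typescript
// A DOM change observed (for example by a MutationObserver callback).
interface NodeChange {
  time: number;           // when the change was observed (ms)
  inFirstScreen: boolean; // whether the changed node is within the first screen
}

// Load timing of an image that lies within the first screen.
interface ImageTiming {
  startTime: number;   // when loading started (ms)
  responseEnd: number; // when loading finished (ms)
}

// Estimates the first screen time from node changes and image timings.
function estimateFirstScreenTime(
  changes: NodeChange[],
  images: ImageTiming[]
): number {
  // Step 1: the latest first-screen node change.
  let domTime = 0;
  for (const change of changes) {
    if (change.inFirstScreen && change.time > domTime) {
      domTime = change.time;
    }
  }

  // Step 2: extend to images that were already loading at that moment.
  let firstScreenTime = domTime;
  for (const image of images) {
    if (image.startTime <= domTime && image.responseEnd > firstScreenTime) {
      firstScreenTime = image.responseEnd;
    }
  }
  return firstScreenTime;
}
```

For example, with node changes at 300 ms and 800 ms inside the first screen and an image loading from 400 ms to 1200 ms, the estimate moves from 800 ms to 1200 ms.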

3. Exception reporting

  • 1) JS errors: listen to the window.onerror event

  • 2) Unhandled promise rejections: listen to the unhandledrejection event

  • 3) Resource loading failures: window.addEventListener('error')

  • 4) Failed network requests: override window.XMLHttpRequest and window.fetch to capture request errors

  • 5) iframe errors: window.frames[0].onerror

  • 6) window.console.error
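Items 1 to 3 of the list above could be wired up roughly as follows. This is a sketch: installErrorMonitoring, ListenerTarget, and ErrorReport are hypothetical names, the event target is injected (in a browser it would be window), and the rule used to tell resource failures from script errors (the event target carries a src or href) is a common heuristic rather than a guarantee.

```typescript
// What gets reported for each captured problem.
interface ErrorReport {
  kind: "js" | "promise" | "resource";
  detail: string;
}

// The part of window that this sketch needs.
interface ListenerTarget {
  addEventListener(
    type: string,
    listener: (event: any) => void,
    useCapture?: boolean
  ): void;
}

// Registers listeners for script errors, resource failures, and unhandled rejections.
function installErrorMonitoring(
  target: ListenerTarget,
  report: (r: ErrorReport) => void
): void {
  // Resource load errors do not bubble, so listen in the capture phase.
  target.addEventListener(
    "error",
    (event) => {
      const source = event.target;
      const url = source && (source.src || source.href);
      if (url) {
        report({ kind: "resource", detail: String(url) });
      } else {
        report({ kind: "js", detail: String(event.message) });
      }
    },
    true
  );

  target.addEventListener("unhandledrejection", (event) => {
    report({ kind: "promise", detail: String(event.reason) });
  });
}
```

In a browser this would be installed once, as early as possible, with installErrorMonitoring(window, send), where send is whatever function queues the report for upload.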

4. CGI report

General principle: intercepting Ajax requests
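A minimal sketch of the interception idea, shown for fetch only: wrap the original function, time each call, and report duration and success. instrumentFetch, FetchLike, and RequestRecord are hypothetical names, and FetchLike is a deliberately simplified signature (URL only), so adapting it to the full fetch signature or to XMLHttpRequest is left as an assumption.

```typescript
// One record per intercepted request.
interface RequestRecord {
  url: string;
  ok: boolean;      // false for HTTP error statuses and for network failures
  status: number;   // 0 when the request failed before a response arrived
  duration: number; // milliseconds
}

// Simplified fetch signature used by this sketch.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Returns a function that behaves like fetchImpl but reports every call.
function instrumentFetch(
  fetchImpl: FetchLike,
  now: () => number,
  report: (record: RequestRecord) => void
): FetchLike {
  return (url) => {
    const start = now();
    return fetchImpl(url).then(
      (response) => {
        report({
          url,
          ok: response.ok,
          status: response.status,
          duration: now() - start,
        });
        return response; // pass the response through unchanged
      },
      (error) => {
        report({ url, ok: false, status: 0, duration: now() - start });
        throw error; // keep the caller's error handling intact
      }
    );
  };
}
```

From the collected records, the CGI time is the duration field and the CGI success rate is the share of records with ok set to true. In a browser, now could be () => performance.now().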

Data storage and aggregation

A single user visit can report dozens of data points, and each data point is multidimensional: current access time, platform, network, IP, and so on. This data is stored in a database, and meaningful figures can then be extracted through analysis and aggregation, for example the average visit time of all users on a given day, PV, and so on.

Statistical analysis methods: the average, percentiles, and the sample distribution.
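To illustrate the difference between the average and percentile methods, here is a minimal sketch. The function names are hypothetical, and the percentile uses the nearest-rank definition, which is one of several common conventions.

```typescript
// Arithmetic mean; returns NaN for an empty sample.
function mean(values: number[]): number {
  if (values.length === 0) return NaN;
  let sum = 0;
  for (const v of values) sum += v;
  return sum / values.length;
}

// Nearest-rank percentile: the smallest value such that at least p percent
// of the samples are less than or equal to it. p is in the range 0..100.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[Math.min(rank, sorted.length) - 1];
}
```

For the load times 100, 200, 300, 400, and 1000 ms, the mean is 400 ms while the 50th percentile is 300 ms: a single slow visit pulls the average up, which is why percentiles are often reported alongside it.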

