Sencha Inc. | HTML5 Apps

Blog

Understanding Hardware Acceleration on Mobile Browsers

July 15, 2011 | Ariya Hidayat

There have been many mentions of GPU (graphics processing unit) hardware acceleration in smartphone and tablet web browsers. So far, the content has been pretty general and hasn’t provided much technical direction apart from simple advice such as “use CSS translate3d”. This blog article tries to shed some more light on browser interactions with the GPU and explain what happens behind the scenes.

Accelerating Primitive Drawing

A web rendering engine, such as WebKit, takes a web page that is described structurally using HTML and the DOM and visually using CSS, transforms it into a series of painting commands, and passes these commands to the graphics stack. WebKit specifically talks to an abstract interface called GraphicsContext. There are different implementations of GraphicsContext depending on the underlying platform. For example, on iOS the GraphicsContext is bound to CoreGraphics. On Android, GraphicsContext uses the Skia graphics engine.
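To make the idea of an abstract drawing interface concrete, here is a minimal sketch. It is not WebKit's actual GraphicsContext (which is a much larger C++ class); the method names `fill_rect` and `draw_text`, the `RecordingContext` back-end, and `paint_page` are all hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod


class GraphicsContext(ABC):
    """Platform-neutral drawing interface (simplified, hypothetical)."""

    @abstractmethod
    def fill_rect(self, x, y, width, height, color):
        ...

    @abstractmethod
    def draw_text(self, text, x, y):
        ...


class RecordingContext(GraphicsContext):
    """Stand-in back-end that records commands instead of drawing them.

    A real port would forward each call to its platform graphics library.
    """

    def __init__(self, backend_name):
        self.backend_name = backend_name
        self.commands = []

    def fill_rect(self, x, y, width, height, color):
        self.commands.append(("fill_rect", x, y, width, height, color))

    def draw_text(self, text, x, y):
        self.commands.append(("draw_text", text, x, y))


def paint_page(ctx):
    """The engine paints through the interface and never names the back-end."""
    ctx.fill_rect(0, 0, 320, 48, "#336699")
    ctx.draw_text("Hello", 8, 30)


ctx = RecordingContext("example-backend")
paint_page(ctx)
```

The point of the design is that `paint_page` stays the same no matter which back-end sits behind the interface.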

A major responsibility of the graphics stack is rasterization: converting vector painting commands into color pixels on a screen. Rasterization also applies to text display. A single letter can consist of a chain of hundreds of curves. Rasterization produces a matrix of pixels of varying colors that gives users the impression of smoothly drawn text. The following picture shows the enlarged portion of a letter displayed on the screen:

Graphic showing letter 'A' and the zoomed in pixels thereof
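The pixels of varying colors along the edge of a glyph can be imitated with a toy rasterizer. The sketch below uses plain supersampling on a disc rather than real glyph curves, which is an assumption made for brevity; production text rasterizers are considerably more sophisticated. The function name `rasterize_disc` and its parameters are made up for this example.

```python
def rasterize_disc(cx, cy, radius, width, height, samples=4):
    """Return a height x width grid of coverage values in [0, 1].

    Each pixel is probed at samples x samples points; the fraction of
    points that fall inside the disc becomes the pixel's intensity.
    """
    grid = []
    for py in range(height):
        row = []
        for px in range(width):
            inside = 0
            for sy in range(samples):
                for sx in range(samples):
                    # Sample point at the centre of each sub-pixel cell.
                    x = px + (sx + 0.5) / samples
                    y = py + (sy + 0.5) / samples
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                        inside += 1
            row.append(inside / (samples * samples))
        grid.append(row)
    return grid


coverage = rasterize_disc(4.0, 4.0, 3.0, 8, 8)
```

Pixels fully inside the shape get a coverage of 1.0, pixels fully outside get 0.0, and pixels crossed by the outline get something in between, which is what produces the smooth-looking edge.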

The most common mobile graphics API is OpenGL for Embedded Systems, shortened to OpenGL ES, which operates quite similarly to its desktop OpenGL counterpart. A modern GPU has the power to carry out a lot of primitive drawing, from textured triangles to anti-aliased polygons, with massively parallel implementations of various graphics algorithms. This is, of course, evidenced by the many graphics-intensive games that run smoothly, often achieving the ultimate goal of 60 fps even on highly complex scenes.

If you’re building a browser, it makes sense to reduce the burden on the CPU and to delegate most of the primitive drawing (such as images, curves, gradients, and so on) to the GPU. This is one way that the GPU accelerates performance, and it is very often taken care of automatically by the graphics stack. On iOS, CoreGraphics uses many different GPU features for difficult drawing operations, building on its Mac OS X experience. On Android (since Honeycomb), Skia also has a full-featured OpenGL ES back-end which fits nicely with NVIDIA Tegra2 GPUs.

Backing Store

It is important to note here that GPUs were originally designed to tackle heavy-duty operations needed in engineering applications (CAD/CAM) and graphics-intensive games. But optimizing the primitive drawing typically found in a web page is very different from making game graphics fast. For a start, most web pages consist of a lot of text and an occasional image. Most web page user-interface elements have solid colors, with some gradients and rounded corners here and there. In contrast, top-selling games like Angry Birds, Need for Speed, and Quake hardly contain any text, and almost everything in the game world is an object with a texture. In addition, 3-D models with photo-realistic appearances are pretty common in such games.

Since the GPU is optimized for such complex use-cases, it should not come as a surprise that simply asking a GPU to draw images, curves, text glyphs, and other content does not magically translate into a fluid 60 frames/second for web page rendering. In addition, unlike games, web content can’t be predicted by the browser. A web page can be as simple as a Bing search page or as complicated as the New York Times front page. To achieve a really smooth browsing experience, user interactions with the browser should not be limited by the complexity of the page. In other words, even if the browser is busy loading images and rendering the page, the user should still be able to scroll around and zoom in/out as she wants.

Modern mobile browsers adopt an off-screen buffer approach to decouple the complexity of displaying a web page from the user interaction. Usually the web rendering engine (WebKit, for example) draws into the buffer instead of straight to the display. This buffer, often called the backing store, will be shown on screen based on user activity. When the web page is quite complicated and the user scrolls and zooms quickly, the backing store is often not filled fast enough. This is the reason why the checkerboard pattern is visible on an iPhone or iPad; it serves as a placeholder for the region of the buffer that is not fully rendered yet. This way, the web page can be scrolled around or zoomed in/out as fast as the user wants. The rendering process (which fills the backing store) may lag behind user interactions, but since it runs in a separate thread, it does not block any user interactions occurring in the main UI thread.
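The decoupling can be sketched with a small model. Everything here is an assumption made for illustration: the buffer is split into fixed-size regions, the region size of 256 pixels is arbitrary, and the class and method names are invented. Threading is left out; the two methods simply stand for the rendering side and the UI side.

```python
REGION = 256  # region edge in pixels; an arbitrary illustrative value


class BackingStore:
    def __init__(self):
        self.rendered = {}  # (col, row) -> rendered content for that region

    def render_region(self, col, row):
        """Called by the (possibly slow) rendering side when a region is ready."""
        self.rendered[(col, row)] = f"pixels for region {col},{row}"

    def visible_regions(self, scroll_x, scroll_y, view_w, view_h):
        """Called by the UI side; it never waits for rendering to finish."""
        first_col = scroll_x // REGION
        last_col = (scroll_x + view_w - 1) // REGION
        first_row = scroll_y // REGION
        last_row = (scroll_y + view_h - 1) // REGION
        result = {}
        for row in range(first_row, last_row + 1):
            for col in range(first_col, last_col + 1):
                # Anything not rendered yet is shown as a placeholder.
                result[(col, row)] = self.rendered.get((col, row), "checkerboard")
        return result


store = BackingStore()
store.render_region(0, 0)
on_screen = store.visible_regions(0, 0, 320, 480)
```

With a 320x480 viewport at the top of the page, four regions are visible; only the one that has been rendered shows content, and the other three fall back to the checkerboard until the renderer catches up.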

Another side effect of using a backing store is progressive rendering when the user zooms in and out. A backing store is nothing but a rectangle with a texture. For efficiency, the backing store is usually tiled, i.e. it comprises several small textured rectangles instead of a giant one. During pinching, all the browser does is scale the backing store up and down, thus giving an enlarged but blurry version of the web page. Since pinching typically happens in a few hundred milliseconds, there is no point in producing a faithful high-resolution rendering during the gesture. Once the user is done with pinching, or when there is an idle moment, the backing store is updated with a rendering of the web page at the correct resolution.
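The difference between scaling the texture and re-rendering the page can be modeled with two scale factors. This is only a sketch; the class name `ZoomableStore` and its methods are hypothetical, and a real browser tracks this per tile.

```python
class ZoomableStore:
    def __init__(self):
        self.rendered_scale = 1.0  # scale the texture was rasterized at
        self.display_scale = 1.0   # scale applied when showing the texture

    def pinch(self, scale):
        # Cheap: only the transformation changes, the texture is reused.
        self.display_scale = scale

    def is_blurry(self):
        # Stretching a texture beyond its rasterized scale looks blurry.
        return self.display_scale > self.rendered_scale

    def on_idle(self):
        # Expensive: stands for re-rasterizing the page at the new scale.
        self.rendered_scale = self.display_scale


store = ZoomableStore()
store.pinch(2.0)   # during the gesture the page is enlarged but blurry
store.on_idle()    # afterwards the page is rendered again, sharply
```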

Graphic showing blurry SVG vs sharp SVG, side by side

One of the disadvantages of using a backing store per page (regardless of whether it is tiled or not) is that it causes difficulty in implementing support for overflow:scroll and position:fixed. The main reason is that the panning and zooming actions from the user modify only the transformation matrix of the backing store, but do not update the backing store itself. For these two CSS features to work, the handling of the backing store has to be improved to account for content movement within the display.

Layer and Compositing

For web applications which have more dynamic content, including for example CSS animations, having a static off-screen buffer does not really help. However, the same backing store concept can be extended further. Instead of one giant backing store for the entire page, we can have multiple smaller backing stores, each associated with an animated element.

Take for example the famous falling leaves demo from WebKit. This demo really shows how creating backing stores at a more granular level can improve the frame rate. Rather than drawing the leaves (with different rotations and positions) for each animation step, WebKit creates a small layer for each leaf, sends those layers to the GPU once, and performs the animation by varying the transformation matrix and opacity of every layer (and thus of the corresponding leaf). Effectively, this creates a really smooth animation because (1) the CPU does not need to do anything besides the initial animation setup and (2) the GPU is only responsible for compositing different layers during the entire animation process. As evidenced by the 60 fps performance of many graphics-intensive mobile games, compositing such a simple collection of layers is a piece of cake for a modern GPU.

Graphic outlining the layers in the falling leaves demo
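The "upload once, then only change properties" pattern behind the falling leaves demo can be sketched as follows. This is a toy model, not WebKit code: the `Layer` and `Compositor` names, the upload counter, and the animation constants are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    texture_id: int        # handle to content that was uploaded once
    x: float = 0.0
    y: float = 0.0
    rotation: float = 0.0  # degrees
    opacity: float = 1.0


class Compositor:
    def __init__(self):
        self.uploads = 0
        self.layers = []

    def add_layer(self):
        # Expensive step: stands for painting the element and uploading
        # the result as a texture. It happens once per layer.
        self.uploads += 1
        layer = Layer(texture_id=self.uploads)
        self.layers.append(layer)
        return layer

    def frame(self):
        # Cheap step: each frame only refers to existing textures and
        # their current transformation and opacity.
        return [(l.texture_id, l.x, l.y, l.rotation, l.opacity)
                for l in self.layers]


def animate(compositor, frames):
    for _ in range(frames):
        for layer in compositor.layers:
            layer.y += 2.0
            layer.rotation = (layer.rotation + 5.0) % 360.0
            layer.opacity = max(0.0, layer.opacity - 0.01)
        compositor.frame()


compositor = Compositor()
for _ in range(3):
    compositor.add_layer()
animate(compositor, 60)
```

After 60 frames the upload counter is still 3: the per-frame work touched only positions, rotations, and opacities, never the leaf images themselves.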

The best practice of setting the CSS transformation matrix to translate3d or scale3d (even though there is no 3-D involved) comes from the fact that those types of matrix cause the animated element to get its own layer, which will then be composited together with the rest of the web page and other layers. But you should note that creating and compositing layers comes with a price, namely memory allocation. It is not wise to blindly composite every little element in the web page for the sake of hardware acceleration; you’ll eat memory.
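A rough back-of-the-envelope estimate shows how quickly layers add up. The sketch assumes each layer is stored as an uncompressed texture with 4 bytes per pixel (8-bit RGBA) and no extra padding; what a given graphics stack actually allocates may differ. The layer sizes are invented for the example.

```python
BYTES_PER_PIXEL = 4  # assumption: uncompressed 8-bit RGBA


def layer_bytes(width, height):
    """Approximate texture size of one layer, in bytes."""
    return width * height * BYTES_PER_PIXEL


def total_mib(layers):
    """Approximate total size of all layers, in MiB."""
    return sum(layer_bytes(w, h) for w, h in layers) / (1024 * 1024)


# Twenty full-screen panels on a 320x480 display (hypothetical sizes).
panels = [(320, 480)] * 20
print(f"{total_mib(panels):.2f} MiB")  # prints "11.72 MiB"
```

Under these assumptions, twenty screen-sized layers already take close to 12 MiB of texture memory, which is why promoting elements to layers should be done selectively.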

Conclusion

In short, making a web browser take advantage of GPU hardware acceleration is far from trivial. It involves making lots of changes at multiple levels, from primitive drawing acceleration to textured backing stores and layer compositing. But the best possible performance can be achieved when all of these work in harmony.

There are 11 responses.

Jamie Avins

3 years ago

Nice post Ariya!

David Kaneda

3 years ago

Awesome post!

Trygve Lie

3 years ago

Very good post.

It’s pretty clear that using the GPU does come with a price (memory allocation), as you point out. Is there a good way, or a rule of thumb, to keep this memory allocation under control?

Let’s take a simple example: Let’s say we have a document with 20 elements (think flip board) with added translate3d. These elements are intended to be moved from position A to B and we only move one element from A to B at a time. Will all 20 elements allocate memory in the GPU, or will memory be allocated only during the animation of an element from A to B?

I would expect that all 20 elements will allocate memory. If so; would an approach where we apply translate3d to the element moved from A to B only during movement be a memory saving approach?

Andrea Cammarata

3 years ago

Great post!

Ariya

3 years ago

@Trygve I don’t know the answer to your question. One thing I omitted from the blog entry is the role of the underlying graphics stack. In the case of iOS, CoreAnimation also plays an important role. If it has support for swapping textures as needed (which I suspect is the case, though it’s hard to verify since CA is not open-source), then it may push and pop the textures as needed, making it more difficult to find out whether a certain textured layer (associated with your element) is eating GPU memory or not. The way I usually describe it is as follows: all the tricks with translate3d/scale3d/opacity just give a hint (albeit a strong one) to the graphics stack to keep the layer in the GPU as long as possible. Whether that is guaranteed to be the case or not is a different story.

Trygve Lie

3 years ago

@Ariya I see. It would be nice if the device / OS vendors would provide some more insight into these subjects.

I must say I would like to see more live crash tests related to these topics. That might give us some more insight to where the limitations are.

Gaurav Mishra

3 years ago

Very robust explanation!
I think I need to read that again :-)

Mobile Money Machines

3 years ago

This is a great explanation on quite a tricky topic so thanks for sharing.

I have one quick question if that’s OK. You mention that when building a browser, MOST of the primitive drawing is delegated to the GPU. Are there any examples where this is not the case and it in fact makes sense to leave the CPU to do the work?

Thanks

Ariya Hidayat

3 years ago

That’s outside the scope of this blog entry, but the short answer is “it depends”. For example, some old-generation GPUs may have problems with the complicated glyphs used to render text, and thus the CPU needs to provide some assistance, e.g. performing path breakdown and simplification. Dual- or quad-core processors, with support for vectorized or data-level parallelism, can perform some initial operations for various graphics algorithms, which would help the GPU a lot.

engineer

3 years ago

Bravo, it seems to me that is an admirable phrase

 
Iris

3 years ago

Another great post from you, Ariya! You’re always so informative :)
