Understanding Hardware Acceleration on Mobile Browsers
There have been many mentions of GPU (graphics processing unit) hardware acceleration in smartphone and tablet web browsers. So far, the coverage has been pretty general and hasn’t provided much technical direction apart from simple advice such as “use CSS translate3d”. This blog article tries to shed some more light on browser interactions with the GPU and explain what happens behind the scenes.
Accelerating Primitive Drawing
A web rendering engine, such as WebKit, takes a web page that is described structurally using HTML and the DOM and visually using CSS, transforms it into a series of painting commands, and then passes these commands to the graphics stack. In WebKit specifically, the engine talks to an abstract interface called GraphicsContext. There are different implementations of GraphicsContext depending on the underlying platform. For example, on iOS the GraphicsContext is bound to CoreGraphics. On Android, GraphicsContext uses the Skia graphics engine.
A major responsibility of the graphics stack is rasterization: converting vector painting commands into color pixels on a screen. Rasterization also applies to text display. A single letter can consist of a chain of hundreds of curves. Rasterization produces a matrix of pixels of varying colors that gives users the impression of smoothly drawn text. The following picture shows the enlarged portion of a letter displayed on the screen:
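To make the idea of rasterization concrete, here is a minimal sketch in Python. It is not how CoreGraphics or Skia work internally; it assumes a simple supersampling scheme and uses a filled disc as a stand-in for a glyph outline. The function names and the 4x4 sample count are made up for this illustration.

```python
def rasterize_disc(size, cx, cy, radius, samples=4):
    """Return a size x size grid of coverage values in [0.0, 1.0].

    Each pixel is split into samples x samples sub-pixel cells. The
    fraction of cell centres that fall inside the disc becomes the
    pixel's coverage, which is what produces smooth-looking edges.
    """
    grid = []
    for py in range(size):
        row = []
        for px in range(size):
            inside = 0
            for sy in range(samples):
                for sx in range(samples):
                    # Sample at the centre of each sub-pixel cell.
                    x = px + (sx + 0.5) / samples
                    y = py + (sy + 0.5) / samples
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                        inside += 1
            row.append(inside / (samples * samples))
        grid.append(row)
    return grid


def to_ascii(grid, ramp=" .:*#"):
    """Map coverage values to characters so the result can be printed."""
    lines = []
    for row in grid:
        chars = [ramp[min(int(v * len(ramp)), len(ramp) - 1)] for v in row]
        lines.append("".join(chars))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ascii(rasterize_disc(8, 4.0, 4.0, 3.0)))
```

Pixels fully inside the shape get a coverage of 1.0, pixels fully outside get 0.0, and pixels on the edge get something in between. Those in-between values are the "varying colors" that make the outline look smooth.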
The most common mobile graphics API is OpenGL for Embedded Systems, shortened as OpenGL ES, which operates quite similarly to its desktop OpenGL counterpart. A modern GPU has the power to carry out a lot of primitive drawing, from textured triangles to anti-aliased polygons, with massively parallel implementations of various graphics algorithms. This is, of course, evidenced by a lot of graphics-intensive games which run smoothly, often achieving the ultimate goal of 60 fps on even highly complex scenes.
If you’re building a browser, it makes sense to reduce the burden on the CPU and to delegate most of the primitive drawing (such as images, curves, gradients, and so on) to the GPU. This is one way that the GPU accelerates performance and it is very often taken care of automatically by the graphics stack. On iOS, CoreGraphics leverages many different GPU features for difficult drawing operations, building on its Mac OS X experience. On Android (since Honeycomb), Skia also has a full-featured OpenGL ES back-end which fits nicely with NVIDIA Tegra 2 GPUs.
It is important to note here that GPUs were originally designed to tackle heavy-duty operations needed in engineering applications (CAD/CAM) and graphics-intensive games. But optimizing the primitive drawing typically found in a web page is very different from making game graphics fast. For a start, most web pages consist of a lot of text and an occasional image. Most web page user-interface elements have solid colors, with some gradients and rounded corners here and there. In contrast, top-selling games like Angry Birds, Need for Speed, and Quake hardly contain any text and almost everything in the game world is an object with a texture. In addition, 3-D models with photo-realistic appearances are pretty common in such games.
Since the GPU is optimized for those complex use cases, it should not come as a surprise that simply asking a GPU to draw images, curves, text glyphs, and other content does not magically translate into a fluid 60 frames/second for web page rendering. In addition, unlike games, web content can’t be predicted by the browser. A web page can be as simple as a Bing search page or as complicated as the New York Times front page. To achieve a really smooth browsing experience, user interactions with the browser should not be limited by the complexity of the page. In other words, even if the browser is busy loading images and rendering the page, the user should still be able to scroll around and zoom in/out as she wants.
Modern mobile browsers adopt an off-screen buffer approach to decouple the complexity of displaying a web page from the user interaction. Usually the web rendering engine (WebKit, for example) draws into the buffer instead of straight to the display. This buffer, often called the backing store, is then shown on screen according to user activity. When the web page is quite complicated and the user scrolls and zooms quickly, the backing store is often not filled fast enough. This is why the checkerboard pattern is visible on an iPhone or iPad; it serves as a placeholder for the region of the buffer that is not fully rendered yet. This way, the web page can be scrolled around or zoomed in/out as fast as the user wants. The rendering process (which fills the backing store) may lag behind user interactions, but since it runs in a separate thread, it does not block any user interactions occurring in the main UI thread.
Another side effect of using a backing store is progressive rendering when the user zooms in and out. A backing store is nothing but a rectangle with a texture. For efficiency, the backing store is usually tiled, i.e. it comprises several small textured rectangles instead of a giant one. During pinching, all the browser does is scale the backing store up and down, thus giving an enlarged but blurry version of the web page. Since pinching typically happens within a few hundred milliseconds, there is little point in a faithful high-resolution rendering during the gesture. Once the user is done with pinching, or when there is an idle moment, the backing store is updated with a rendering of the web page at the correct resolution.
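The tiling and the checkerboard placeholder can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 256-pixel tile size, the function names, and the idea of tracking rendered tiles in a set are all hypothetical and are not taken from any particular browser.

```python
import math

TILE_SIZE = 256  # assumed tile edge in pixels; real browsers pick their own


def visible_tiles(view_x, view_y, view_w, view_h, tile_size=TILE_SIZE):
    """Return (column, row) indices of the tiles that intersect the viewport.

    The viewport rectangle is given in page coordinates.
    """
    first_col = math.floor(view_x / tile_size)
    first_row = math.floor(view_y / tile_size)
    last_col = math.ceil((view_x + view_w) / tile_size) - 1
    last_row = math.ceil((view_y + view_h) / tile_size) - 1
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def plan_frame(viewport, rendered):
    """Split the visible tiles into ready tiles and placeholder tiles.

    `rendered` is the set of tiles the rendering thread has already
    filled. Anything visible but not in that set would be shown as a
    checkerboard until the renderer catches up.
    """
    ready, placeholder = [], []
    for tile in visible_tiles(*viewport):
        if tile in rendered:
            ready.append(tile)
        else:
            placeholder.append(tile)
    return ready, placeholder


if __name__ == "__main__":
    rendered = set(visible_tiles(0, 0, 320, 480))  # first screenful is done
    ready, placeholder = plan_frame((0, 300, 320, 480), rendered)
    print("draw from backing store:", ready)
    print("draw checkerboard:", placeholder)
```

After a quick scroll of 300 pixels, only the tiles that were already filled can be drawn right away. The rest show up as placeholders, which is exactly the situation where the user sees the checkerboard.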
One of the disadvantages of using a backing store per page (regardless of whether it is tiled or not) is that it causes difficulty in implementing support for overflow:scroll and position:fixed. The main reason is that the panning and zooming actions from the user modify only the transformation matrix of the backing store, but do not update the backing store itself. For these two CSS features to work, the handling of the backing store has to be improved to account for content movement within the display.
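A small Python sketch shows why position:fixed is awkward here. It assumes the transformation is just a uniform scale plus a translation, and the class and method names are invented for this example. The point is that everything baked into the backing store moves together when only the matrix changes.

```python
from dataclasses import dataclass


@dataclass
class ViewTransform:
    """Assumed backing-store transform: screen = scale * page + (tx, ty)."""

    scale: float = 1.0
    tx: float = 0.0
    ty: float = 0.0

    def to_screen(self, page_x, page_y):
        """Map a point in page coordinates to screen coordinates."""
        return (self.scale * page_x + self.tx, self.scale * page_y + self.ty)

    def scrolled_by(self, dx, dy):
        """Scrolling down by dy moves the page content up on screen."""
        return ViewTransform(self.scale, self.tx - dx, self.ty - dy)

    def zoomed(self, factor):
        """Zoom about the screen origin; no pixels are repainted."""
        return ViewTransform(
            self.scale * factor, self.tx * factor, self.ty * factor
        )


if __name__ == "__main__":
    # A header painted into the backing store at page position (0, 0).
    header_page_pos = (0, 0)

    view = ViewTransform().scrolled_by(0, 200)

    # The header's pixels scroll away with everything else ...
    print("header drawn at:", view.to_screen(*header_page_pos))
    # ... whereas position:fixed expects it to stay at screen (0, 0).
```

Since the header's pixels are part of the same texture as the rest of the page, scrolling by 200 pixels puts it at (0.0, -200.0) on screen. Keeping it pinned would mean repainting the backing store or giving the header a separate buffer.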
Layers and Compositing
For web applications which have more dynamic content, including for example CSS animations, having a static off-screen buffer does not really help. However, the same backing store concept can be extended further. Instead of one giant backing store for the entire page, we can have multiple smaller backing stores, each associated with an animated element.
Take for example the famous falling leaves demo from WebKit. This demo really shows how creating backing stores at a more granular level can improve the frame rate. Rather than drawing the leaves (with different rotations and positions) for each animation step, WebKit creates a small layer for each leaf, sends those layers to the GPU once, and performs the animation by varying the transformation matrix and opacity of every layer (and thus of the corresponding leaf). Effectively, this creates a really smooth animation because (1) the CPU does not need to do anything besides the initial animation setup and (2) the GPU is only responsible for compositing different layers during the entire animation process. As evidenced by the 60 fps performance of many graphics-intensive mobile games, compositing such a simple collection of layers is a piece of cake for a modern GPU.
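The division of labor can be sketched in Python as follows. This is a deliberately simplified model and not WebKit's implementation: each layer's texture is reduced to one solid color, the transform is reduced to a translation (the real demo also rotates), and the names Layer, animate, and composite_pixel are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    color: tuple          # stand-in for the texture uploaded once: solid RGB
    width: int
    height: int
    x: float = 0.0        # translation part of the layer's transform
    y: float = 0.0
    opacity: float = 1.0

    def covers(self, px, py):
        """True if the screen point (px, py) falls inside this layer."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def animate(layer, frame, total_frames, fall_distance):
    """Per-frame work: only transform and opacity change, never the texture."""
    t = frame / total_frames
    layer.y = fall_distance * t
    layer.opacity = 1.0 - t


def composite_pixel(background, layers, px, py):
    """Blend the layers back to front over the background (source-over)."""
    r, g, b = background
    for layer in layers:
        if layer.covers(px, py):
            a = layer.opacity
            lr, lg, lb = layer.color
            r = lr * a + r * (1 - a)
            g = lg * a + g * (1 - a)
            b = lb * a + b * (1 - a)
    return (r, g, b)


if __name__ == "__main__":
    white = (1.0, 1.0, 1.0)
    leaf = Layer(color=(0.0, 1.0, 0.0), width=10, height=10)
    for frame in range(0, 11, 5):
        animate(leaf, frame, 10, fall_distance=100)
        probe_y = leaf.y + 5  # look at a pixel in the middle of the leaf
        print(frame, composite_pixel(white, [leaf], 5, probe_y))
```

Note that animate never touches the layer's color. In this model that is the whole trick: the expensive painting happens once, and each frame only updates a few numbers before the layers are blended.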
The best practice of setting the CSS transform to translate3d or scale3d (even though there is no 3-D involved) comes from the fact that those types of transforms switch the animated element to its own layer, which is then composited together with the rest of the web page and other layers. But you should note that creating and compositing layers comes at a price, namely memory allocation. It is not wise to blindly composite every little element in the web page for the sake of hardware acceleration, because you’ll eat memory.
In short, making a web browser take advantage of GPU hardware acceleration is far from trivial. It involves making lots of changes at multiple levels, from primitive drawing acceleration to textured backing stores and layer compositing. But the best possible performance can be achieved when all of these work in harmony.
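To get a feel for the memory cost, here is a back-of-the-envelope estimate in Python. It assumes every layer is stored as an uncompressed 32-bit RGBA texture of exactly the element's size. Real browsers may pad, tile, or share textures, so treat the numbers as rough guidance only. The function names and the device_pixel_ratio parameter are choices made for this sketch.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit RGBA textures


def layer_bytes(width, height, device_pixel_ratio=1.0):
    """Estimate the texture memory for one layer of the given CSS size."""
    w = round(width * device_pixel_ratio)
    h = round(height * device_pixel_ratio)
    return w * h * BYTES_PER_PIXEL


def total_layer_megabytes(layers, device_pixel_ratio=1.0):
    """Sum the estimate over a list of (width, height) layer sizes."""
    total = sum(layer_bytes(w, h, device_pixel_ratio) for w, h in layers)
    return total / (1024 * 1024)


if __name__ == "__main__":
    # One hundred small 256x256 elements, each promoted to its own layer.
    many_small_layers = [(256, 256)] * 100
    print(total_layer_megabytes(many_small_layers), "MB")
```

Under these assumptions, a hundred modest 256x256 layers already add up to 25 MB, which is a lot on a memory-constrained phone. Promoting only the elements that actually animate keeps that number in check.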