Fast

How fast are web workers?

The next version of Firefox OS, the mobile operating system, will unleash the power of devices by taking full advantage of their multi-core processors. Classically, JavaScript has been executed on a single thread, but web workers offer a way to execute code in parallel. Doing so frees the main thread from work that might otherwise get in its way, so the browser can smoothly animate the UI.

A brief introduction to web workers

There are several types of web workers:

They each have specific properties, but share a similar design. The code running in a worker is executed in its own separate thread and runs in parallel with the main thread and other workers. The different types of workers all share a common interface.

Web workers

Dedicated web workers are instantiated by the main process and they can communicate only with it.

Shared workers

Shared workers can be reached by all processes running on the same origin (different browser tabs, iframes or other shared workers).

Service workers

Service workers have gained a lot of attention recently. They make it possible to proxy a web server programmatically to deliver specific content to the requester (e.g. the main thread). One use case for service workers is to serve content while offline. Service workers are a very new API, not fully implemented in all browsers, and are not covered in this article.


In order to verify that web workers make Firefox OS run faster, we need to validate their speed by benchmarking them.

The cost of creating web workers

This article focuses on Firefox OS. All measurements are made on a Flame device, which is powered by mid-range hardware.

The first set of benchmarks will look at the time it takes to create web workers. To do that, we set up a script that instantiates a web worker and sends a minimal message, to which the worker replies immediately. Once the response is received by the main thread, the time that the operation takes is calculated. The web worker is destroyed and the operation is repeated enough times to get a good idea of how long it takes on average to get a functional web worker. Instantiating a web worker is as easy as:

// Start a worker.
var worker = new Worker('worker-script.js');
 
// Terminate a worker.
worker.terminate();
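
A rough sketch of the measurement loop used here (assuming a hypothetical echo.js worker script that immediately posts any message it receives back to the main thread) could look like this:

// Time how long it takes to get a functional web worker.
// `echo.js` is a hypothetical worker that echoes every message back.
var start = performance.now();
var worker = new Worker('echo.js');
worker.onmessage = () => {
  var elapsed = performance.now() - start;
  console.log('Worker ready in ' + elapsed.toFixed(2) + 'ms');
  worker.terminate();
};
worker.postMessage('ping');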

The same method is applied to the creation of a broadcast channel:

// Open a broadcast channel.
var channel = new window.BroadcastChannel('channel-name');
 
// Close a broadcast channel.
channel.close();

Shared workers can’t really be benchmarked here because once they are created, the developer can’t destroy them. The browser is entirely responsible for their lifetime. For that reason, we can’t create and destroy shared workers at will to get a meaningful benchmark.

Web workers take about 40ms to be instantiated. Also, this time is pretty stable with variations of only a few milliseconds. Setting up a broadcast channel is usually done within 1ms.

Under normal circumstances, the browser UI is refreshed at a rate of 60 frames per second. This means that no JavaScript code should run longer than the time needed by a frame, i.e., 16.66ms (60 frames per second). Otherwise, you may introduce jankiness and lag in your application.

Instantiating web workers is pretty efficient, but still may not fit in the time allocated for a single frame. That’s why it’s important to create as few web workers as possible and reuse them.

Message latency

A critical aspect of web workers is having fast communication between your main thread and the workers. There are two different ways the main browser thread can communicate with a web worker.

postMessage

This API is the default and preferred way to send and receive messages from a web worker. postMessage() is easy to use:

// Send a message to the worker.
worker.postMessage(myMessage);
 
// Listen to messages from the worker.
worker.onmessage = evt => {
  var message = evt.data;
};

Broadcast Channel

This is a newly implemented API, only available in Firefox at the time of this writing. It lets us broadcast messages to all contexts sharing the same origin. All browser tabs, iframes, or workers served from the same origin can emit and receive messages:

// Send a message to the broadcast channel.
channel.postMessage(myMessage);
 
// Listen to messages from the broadcast channel.
channel.onmessage = evt => {
  var message = evt.data;
};

To benchmark this, we use a script similar to the one described above, except that the web worker is not destroyed but reused for each operation. To estimate the one-way latency, the round-trip time is divided by two.
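
As a hedged sketch (again assuming an echo-style worker that replies immediately), the latency measurement amounts to:

// `worker` is the dedicated worker created earlier and kept alive.
// Measure the round-trip latency, then halve it for the one-way latency.
var start = performance.now();
worker.onmessage = () => {
  var roundTrip = performance.now() - start;
  console.log('Latency: ' + (roundTrip / 2).toFixed(2) + 'ms');
};
worker.postMessage('ping');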

As you might expect, the simple postMessage is fast. It usually takes between 0 and 1ms to send a message, whether to a dedicated or a shared worker. The Broadcast Channel API takes about 1 to 2ms.

Under normal circumstances, exchanging messages with workers is fast and you should not feel too concerned about speed here. However, larger messages can take longer.

The size of messages

There are 2 ways to send messages to web workers:

  • Copying the message
  • Transferring the message

In the first case, the message is serialized, copied, and sent over. In the latter, the data is transferred. This means that the original sender can no longer use it once sent. Transferring data is almost instantaneous, so there is no real point in benchmarking that. However, only ArrayBuffer is transferable.
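
For reference, transferring an ArrayBuffer instead of copying it only requires listing it in the second argument of postMessage (the buffer size below is arbitrary, and `worker` is an existing web worker):

// Transfer the buffer to the worker instead of copying it.
var buffer = new ArrayBuffer(1024 * 1024);
worker.postMessage(buffer, [buffer]);
// The sender keeps a neutered, unusable buffer afterwards.
console.log(buffer.byteLength); // 0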

As expected, serializing, copying, and de-serializing data adds significant overhead to the message transmission. The bigger the message, the longer it takes to be sent.

The benchmark here sends a typed array to a web worker, progressively increasing its size at each iteration. There is a linear correlation between the size of the message and the transfer time. For each measurement, we can divide the size (in kilobytes) by the time (in milliseconds) to get the transfer speed in kB/ms.

Typically, on a Flame, the transfer speed is 45kB/ms for postMessage and 6kB/ms using a broadcast channel. This means that if you want your message to fit in a single frame, you should keep it under 350kB with postMessage and under 50kB when using the broadcast channel. Otherwise, it may introduce frame drops in your application.

In this benchmark, we use typed arrays because they make it possible to determine the size in kilobytes precisely. You can also transfer JavaScript objects, but due to the serialization process, they take longer to post. For small objects this doesn’t really matter, but if you need to send huge objects, you may as well serialize them to a binary format. You can use something similar to Protocol Buffers.
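
As a simple illustration of that idea (not an efficient binary format like Protocol Buffers, just a sketch assuming a hypothetical hugeObject and an existing worker), you could turn a large object into a transferable buffer with the standard TextEncoder API:

// Serialize a large object to binary and transfer it instead of copying it.
var json = JSON.stringify(hugeObject);
var bytes = new TextEncoder().encode(json); // Uint8Array
worker.postMessage(bytes.buffer, [bytes.buffer]);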

Web workers are fast if used correctly

Here is a quick summary of various benchmarks related to web workers, as measured on a Flame:

Operation                                      Value
Instantiation of a web worker                  40ms
Instantiation of a broadcast channel           1ms
Communication latency with postMessage         0.5ms
Communication latency with broadcast channel   1.5ms
Communication speed with postMessage           45kB/ms
Communication speed with broadcast channel     6kB/ms
Maximum message size with postMessage          350kB
Maximum message size with broadcast channel    50kB

Benchmarking is the only way to make sure that the solution you are implementing is fast. This process takes much of the guesswork out of web development.

If you want to run these benchmarks on a specific device, the app I built to make these measurements, web workers benchmark, is open source. You are also welcome to contribute by submitting new types of benchmarks.



How fast is PDF.js?

Hi, my name is Thorben and I work at Opera Software in Oslo, not at Mozilla. So, how did I end up writing for Mozilla Hacks? Maybe you know that there is no default PDF viewer in the Opera Browser, something we would like to change. But how to include one? Buy it from Adobe or Foxit? Start our own?

Introducing PDF.js

While investigating our options we quickly stumbled upon PDF.js. The project aims to create a full-featured PDF viewer in the browser using JavaScript and Canvas. Yeah, it sounds a bit crazy, but it makes sense: browsers need to be good at processing text, images, fonts, and vector graphics — exactly the things a PDF viewer has to be good at. The draw commands in PDFs are a subset of Postscript, and they are not so different from what Canvas offers. Also security is virtually no issue: using PDF.js is as secure as opening any other website.

Working on PDF.js

So Christian Krebs, Mathieu Henri and myself began looking at PDF.js in more detail and were impressed: it’s well designed, seems fast and big parts of the code are just wow!

But we also discovered some problems, mainly with performance on very large or graphics-heavy PDFs. We decided that the best way to get to know PDF.js better and to push the project further, was to help the project and address the major issues we found. This gave us a pretty good understanding of the project and its high potential. We were also very impressed by how much the performance of PDF.js improved while we worked on it. This is an active and well managed project.

Benchmarking PDF.js

Of course, our own tests gave us a skewed impression of performance: we had deliberately looked for super large, awkward and hard-to-render PDFs, but that is not what most people want to view. Most PDFs you actually want to view in PDF.js are fine. But how to test that?

Well, you could check the most popular PDFs on the Internet – as these are the ones you probably want to view – and benchmark them. A snapshot of 5 to 10k PDFs should be enough … but how do you get them?

I figured that search engines would be my friend. If you tell them to search for PDFs only, they give you the most relevant PDFs for that keyword, which in turn are probably the most popular ones. And if you use the most searched keywords you end up with a good approximation.

Benchmarking that many PDFs is a big task. So I got myself a small cluster of old computers and built a nice server application that supplied them with tasks. The current repository has almost 7000 PDFs and benchmarking one version of PDF.js takes around eight hours.

The results

Let’s skip to the interesting part with the pretty pictures.

[Figure: histogram of page processing times]

The graph above gives us almost all the interesting results at a glance. It is a histogram of the time it took to process each page of the benchmarked PDFs, relative to the average time it takes to process a page of the Tracemonkey paper (the default PDF you see when opening PDF.js). The user experience when viewing the Tracemonkey paper is good, and from my tests even 3 to 4 times slower is still okay. That means over 96% of all benchmarked pages (excluding PDFs that crashed) translate to a good user experience. That is really good news! Or, to use a very simple pie chart (in % of pages):

[Figure: overview pie chart]

You probably already noticed the small catch: around 0.8% of the PDFs crashed PDF.js when we tested them. We had a closer look at most of them and at least a third are actually so heavily damaged that probably no PDF viewer could ever display them.

And this leads us to another good point: we have to keep in mind that these results stand on their own, without comparison to other viewers. There are some PDFs on the Internet that are so complex that there is no hope that even native PDF viewers could display them nicely and quickly. The slowest tested PDF is an incredibly detailed vector map of the public transport system of Lisbon. Try to open it in Adobe Reader, it’s not fun!

Conclusion

From these results we concluded that PDF.js is a very valid candidate to be used as the default PDF viewer in the Opera Browser. There is still a lot of work to do to integrate PDF.js nicely into it, but we are working right now on integrating it behind an experimental flag (BTW: There is an extension that adds PDF.js with the default Mozilla viewer. The “nice” integration I am talking about would be deeper and include a brand new viewer). Thanks Mozilla! We are looking forward to working on PDF.js together with you guys!

PS: Both the code of the computational system and the results are publicly available. Have a look and tell us if you find them useful!

PPS: If anybody works at a big search engine company and could give me a list with the actual 10k most used PDFs, that would be awesome 🙂

Appendix: What’s next?

The corpus and the computational framework I described could be used to do all kinds of interesting things. As a next step, we hope to classify PDFs by the font formats, image formats and the like that they use, so you can quickly find PDFs to test a new feature with. We also want to look at which drawing instructions are used with which frequency in the PostScript, so we can better optimise for the very common ones, like we did with HTML in browsers. Let’s see what we can actually do 😉



Fast retro gaming on mobile

Emulation is the cool technique that makes retro gaming possible, i.e. playing old video games on modern devices. It allows pixel lovers to revive gaming experiences from the past. In this article we will demonstrate that the web platform is suitable for emulation, even on mobile, where by definition everything is limited.

Emulation is a challenge

Emulation consists of recreating all the internals of a game console in JavaScript. The original CPU and its functions are totally reimplemented. It communicates with both video and sound units whilst listening to the gamepad inputs.

Traditionally, emulators are built as native apps, but the web stack is equally powerful, provided the right techniques are used. On web based OSes, like Firefox OS, the only way to do retro gaming is to use HTML and JavaScript.

Emulators are resource-intensive applications. Running them on mobile is definitely a challenge, even more so given that Firefox OS is designed to power low-end devices, where computational resources are further limited. But fear not: techniques are available to make full-speed retro gaming a reality on our beloved handhelds.

In the beginning was the ROM

Video game emulation starts with ROM image files (ROM files for short). A ROM file is the representation of a game cartridge chip obtained through a process called dumping. In most video game systems, a ROM file is a single binary file containing all aspects of the game, including:

  • The logic (player movements, enemies’ artificial intelligence, level designs…)
  • The character and background sprites
  • The music

Let’s now consider the Sega Master System and Game Gear consoles. Take the homebrew game Blockhead as an example and examine the beginning of the file:

0xF3 0xED 0x56 0xC3 0x6F 0x00 0x3F 0x00 0x7D 0xD3 0xBF 0x7C 0xD3 0xBF 0xC9 0x00
0x7B 0xD3 0xBF 0x7A 0xD3 0xBF 0xC9 0x00 0xC9 0x70 0x72 0x6F 0x70 0x70 0x79 0x00
0xC9 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xC9 0x62 0x6C 0x6F 0x63 0x6B 0x68 0x65

The elements listed above are mixed together in the ROM. The difficulty consists of telling apart the different bytes:

  • opcodes (for operation code, they are CPU instructions, similar to basic JavaScript functions)
  • operands (think of them as parameters passed to opcodes)
  • data (for example, the sprites used by the game)

If we highlight these elements differently according to their types, this is what we get:

0xF3 0xED 0x56 0xC3 0x6F 0x00 0x3F 0x00 0x7D 0xD3 0xBF 0x7C 0xD3 0xBF 0xC9 0x00
0x7B 0xD3 0xBF 0x7A 0xD3 0xBF 0xC9 0x00 0xC9 0x70 0x72 0x6F 0x70 0x70 0x79 0x00
0xC9 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xC9 0x62 0x6C 0x6F 0x63 0x6B 0x68 0x65
Caption
Opcode Operand Data

Start small with an interpreter

Let’s start playing this ROM, one instruction at a time. First we put the binary content into an ArrayBuffer (you can use XMLHttpRequest or the File API for that). As we need to access data in different types, like 8 or 16 bit integers, the easiest way is to pass this buffer to a DataView.
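
As a sketch (the file name is made up for the example), fetching the ROM with XMLHttpRequest and wrapping it in a DataView could look like this:

// Load the ROM as a binary buffer, then wrap it in a DataView.
var xhr = new XMLHttpRequest();
xhr.open('GET', 'blockhead.sms', true);
xhr.responseType = 'arraybuffer';
xhr.onload = function() {
  var romBuffer = xhr.response; // ArrayBuffer
  var rom = new DataView(romBuffer);
  // ... start the fetch/decode/execute loop described below.
};
xhr.send();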

On the Master System, the entry point is the instruction located at index 0. We create a variable called pc, for program counter, and set it to 0. It will keep track of the location of the current instruction. We then read the 8 bit unsigned integer located at the current position of pc and place it into a variable called opcode. The instruction associated with this opcode is then executed. From there, we just repeat the process.

var rom = new DataView(romBuffer);
 
var pc = 0x0000;
while (true) {
  var opcode = rom.getUint8(pc++);
  switch(opcode) {
    // ... more to come here!
  }
}

For example, the 3rd instruction, located at index 3, has the value 0xC3. It matches the opcode `JP (nn)` (JP stands for jump). A jump transfers the execution of the program to somewhere else in the ROM. In terms of logic, that means updating the value of pc. The target address is the operand: we simply read the next 2 bytes as a little-endian 16 bit unsigned integer (0x006F in this case). Let’s put it all together:

var rom = new DataView(romBuffer);
 
var pc = 0x0000;
while (true) {
  var opcode = rom.getUint8(pc++);
  switch(opcode) {
    case 0xC3:
      // Code for opcode 0xC3 `JP (nn)`.
      // The Z80 stores 16 bit operands in little-endian order.
      pc = rom.getUint16(pc, true);
      break;
    case 0xED:
      // @todo Write code for opcode 0xED 0x56 `IM 1`.
      break;
    case 0xF3:
      // @todo Write code for opcode 0xF3 `DI`.
      break;
  }
}

Of course, for the sake of simplicity, many details are omitted here.

Emulators working this way are called interpreters. They are relatively easy to develop, but the fetch/decode/execute loop adds significant overhead.

Recompilation, the secret to full speed

Interpreters are just a first step towards fast emulation: using them ensures everything else is working (video, sound, and controllers). Interpreters can be fast enough on desktop, but they are definitely too slow on mobile and they drain the battery.

Let’s step back a second and examine the code above. Wouldn’t it be great if we could generate JavaScript code to mimic the logic? We know that when pc equals 0x0000, the next 3 instructions will always be executed one after another, until the jump is reached.

In other words, we want something like this:

var blocks = {
  0x0000: function() {
    // @todo Write code for opcode 0xF3 `DI`.
    // @todo Write code for opcode 0xED 0x56 `IM 1`.
    // Code for opcode 0xC3 `JP (nn)`.
    pc = 0x006F;
  },
  0x006F: function() {
    // @todo Write code for this opcode...
  }
};
var pc = 0x0000;
while (true) {
  blocks[pc]();
}

This technique is called recompilation.

The reason why it is fast is because each opcode and operand is only read once when the JavaScript code is compiled. It is then easier for the JavaScript VM to optimise the generated code.

Recompilation is said to be static when it uses static analysis to generate code. On the other hand, dynamic recompilation creates new JavaScript functions at runtime.
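
As a very rough sketch (not the actual jsSMS implementation), dynamic recompilation can be as simple as concatenating the JavaScript statements produced for a block and compiling them with the Function constructor at runtime:

// Build the body of a block from decoded instructions, then compile it
// into a real JavaScript function while the game is running.
function compileBlock(statements) {
  return new Function(statements.join('\n'));
}

blocks[0x0000] = compileBlock([
  '// @todo Code for opcode 0xF3 `DI`.',
  '// @todo Code for opcode 0xED 0x56 `IM 1`.',
  'pc = 0x006F;'
]);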

In jsSMS, the emulator in which I implemented these techniques, the recompiler is made of 4 components:

  • Parser: determines which parts of the ROM are opcodes, operands and data
  • Analyser: groups instructions into blocks (e.g. a jump instruction closes a block and opens a new one) and outputs an AST (abstract syntax tree)
  • Optimiser: applies several passes to make the code even faster
  • Generator: converts the AST into JavaScript code

Generating functions on the fly can take time. That’s why one of the approaches is to use static recompilation and generate as much JavaScript code as possible before the game even starts. Then, because static recompilation is limited, whenever we find unparsed instructions at runtime, we generate new functions as the game is being played.

So it is faster, but how much faster?

According to the benchmarks I ran on mobile, recompilers are about 3-4 times faster than interpreters.

Here are some benchmarks on different browser / device pairs:

  • Firefox OS v.1.1 Keon
  • Firefox OS v.1.1 Peak
  • Firefox 24 Samsung Galaxy S II
  • Firefox 24 LG Nexus 4

Optimisation considerations

When developing jsSMS, I applied many optimisations. Of course, the first thing was to implement the improvements suggested by this article about games for Firefox OS.

Before being more specific, keep in mind that emulators are a very particular type of gaming app. They have a limited number of variables and objects. This architecture is static, limited and as such is easy to optimise for performance.

Use typed arrays wherever possible

Resources of old consoles are limited and most concepts can be mapped to typed arrays (stack, screen data, sound buffer…). Using such arrays makes it easier for the VM to optimise.
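
For instance, the work RAM and the frame buffer of a Master System emulator map naturally to preallocated typed arrays (the sizes below are the classic Master System values, used here purely for illustration):

// 8 KB of work RAM and a 256x192 RGBA frame buffer, both preallocated.
var ram = new Uint8Array(0x2000);
var frameBuffer = new Uint8ClampedArray(256 * 192 * 4);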

Use dense arrays

A dense array is an array without holes. The most common way to get one is to set the length at creation time and fill it with default values. Of course, this doesn’t apply to arrays of unknown or variable size.

// Create an array of 255 items and prefill it with empty strings.
var denseArray = new Array(255);
for (var i = 0; i < 255; i++) {
  denseArray[i] = '';
}

Variables should be type stable

The type inferrer of the JavaScript VM tags variables with their type and uses this information to apply optimisations. You can help it by not changing the types of variables as the game runs. This implies the following consequences:

  • Set a default value at declaration: `var a = 0;` instead of `var a;`. Otherwise, the VM considers that the variable can be either a number or undefined.
  • Avoid recycling a variable for different types, e.g. a number and then a string.
  • Make Boolean variables real Booleans: avoid truthy or falsy values and use `!!` or `Boolean()` to coerce (see the sketch below).
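
Put together, a type-stable set of declarations might look like this sketch:

// Give every variable a stable type from the moment it is declared.
var cycles = 0;      // always a number, never undefined
var romName = '';    // always a string
var halted = false;  // always a Boolean
// Coerce ambiguous values instead of storing truthy/falsy ones.
halted = !!(cycles & 0x01);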

Some syntaxes are ambiguous to the VM. For example, the following code was tagged as unknown arithmetic type by SpiderMonkey:

pc += d < 128 ? d : d - 256;

A simple fix was to rewrite this to:

if (d >= 128) {
  d = d - 256;
}
pc += d;

Keep numeric types stable

SpiderMonkey stores all JavaScript numeric values differently depending on what they look like. It tries to map numbers to internal types (like u32 or float). The implication of this is that maintaining the same underlying type is very likely to help the VM.
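
A common trick (an assumption on my part, not necessarily how jsSMS does it everywhere) is to mask values back into integer range after arithmetic, which also signals the VM that the value never becomes a float:

var a = 0x12;     // 8 bit accumulator
var value = 0xF0; // operand
// The bitmask keeps the result an 8 bit integer in the 0-255 range.
a = (a + value) & 0xFF;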

To target these type changes, I used to use JIT Inspector, an extension for Firefox that exposes some internals of SpiderMonkey. However, it is not compatible with the latest versions of Firefox and no longer produces useful output. There is a bug filed to follow the issue, but don’t expect any changes soon: https://bugzilla.mozilla.org/show_bug.cgi?id=861069.

… and as usual profile and optimise

Using a JavaScript profiler will help you find the most frequently called functions. These are the ones you should focus on and optimise first.

Digging deeper in code

If you want to learn more about mobile emulation and recompilation, have a look at this talk in which the slides are actually a ROM running inside the emulator!

Conclusion

Mobile emulation shows how fast the web platform is, even on low-end devices. Using the right techniques and applying optimisations allows your games to run smoothly and at full speed. Documentation about emulation in the browser is scarce on the net, especially when it comes to modern JavaScript APIs. May this article help fill that gap.

There are so many video game consoles and so few web-based emulators. So now, enough with the theory: let’s start making apps for the sake of retro gaming!



Fast Park Launches Mobile Web Site, Bringing Parking Convenience to On-the-Go Customers

Fast Park is the only parking service company to bring its customers true mobile convenience. The mobile site includes Google or Mapquest maps to Fast Park facilities, daily Fast Park parking rates, facility amenities, and other useful information to improve a traveler’s parking experience. Cincinnati, OH (PRWEB) July 27, 2011. The Fast Park family of off-site airport parking facilities, with 16 …

