It is no secret that JavaScript - a single-threaded, dynamic, loosely typed language - is not a great fit for websites that need to do a lot of math or heavy computation. So when a college friend of mine and I decided to build a client-side image compression tool, we knew that we would need to write at least parts of the logic in a faster language.
This blog details the things we learned along the way.
If you think of the browser as an operating system (which some could argue it actually is), then Web Assembly is the language it understands. Just like an ARM processor understands ARM assembly, the browser understands WASM and can process it natively, without interpreting it the way it does JavaScript.
This opens up the web for far more complicated and computation-heavy tasks than could be imagined before. But we don’t actually write Web Assembly - in much the same way we don’t write assembly by hand. We write in a high-level language and use WASM as the compilation target.
First things first: which language do you choose to write the math-heavy code in? Rust? Go? C++? All are good options with varying levels of support for Web Assembly, but we settled on C++.
Not because it was the best option - we still don’t know if it was - but because our main algorithm - SVD, or Singular Value Decomposition - would span less than 30 lines of code, so a language that allowed rapid development was important. Rust, with its type safety and borrow checking, hinders fast development - at least for those not familiar with the language’s semantics (that is, us).
Go would probably have been fine as well, but given that neither of us was in the mood to learn a new language, we were pretty much left with C++. We of course limited ourselves to header-only libraries in C++, given that neither of us had any experience getting CMake or any other build tool to work.
This, along with Emscripten, provided for as seamless an experience as you can have while writing a nontrivial application in C++.
We could’ve started with the Web part, but up until then we were under the (wrong) impression that it would be more or less straightforward. So we started with the C++ code.
Of course, we had to face problems while building the CLI tool as well - including navigating hard-to-read documentation and out-of-date dependencies.
But after a while, we managed to create a CLI tool that took in the image path and then created the compressed image.
The algorithm we used - as I mentioned earlier - was SVD. I won’t claim to understand the math behind the madness here, but in a nutshell, it is quite similar to the PCA algorithm commonly used in machine learning. You keep the components that contribute the most to the image - say the top 100 - and discard the rest. That figure is referred to as the rank in SVD, and it controls the level of compression of the image.
Emscripten is an entire toolchain that leverages LLVM to compile C++ code to WASM. Once we had the logic written in C++, we needed to compile that to WASM and of course, expose these functions to the JavaScript land. While Emscripten has its flaws, for our use case, it was pretty much smooth sailing once we got the hang of how things worked.
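For a sense of what that compilation step looks like: this is roughly the shape of an emcc invocation (the source file and function name here are made up; the flags are real Emscripten options).

```shell
# Hypothetical build command. -O3 optimizes; EXPORTED_FUNCTIONS keeps our
# entry point from being dead-code-eliminated (plus malloc/free, which we
# later needed from JS); EXPORTED_RUNTIME_METHODS exposes the ccall/cwrap
# helpers used to call exported functions from JavaScript.
emcc compress.cpp -O3 \
  -sEXPORTED_FUNCTIONS=_compress,_malloc,_free \
  -sEXPORTED_RUNTIME_METHODS=ccall,cwrap \
  -o a.out.js
```

The output is the a.out.js glue file mentioned below plus an a.out.wasm module it loads.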
The guides could certainly be better - while the documentation is extensive, there’s no real way to stumble across a configuration option unless you already know it exists. I can recall multiple cases where we couldn’t figure something out and, in the end, had to read the actual generated JS glue code to see what configuration options we could set.
One specific example I’ll mention is the weirdness around importScripts and workers. Here’s the basic setup: if you enable pthreads and import a.out.js (the code that imports and sets up WASM for you) in a worker, then a.out.worker.js (the glue code for the pthread-backed workers) somehow magically ends up calling your custom worker.
And this means that everything you do in your worker affects the other generated workers - which is probably something no one wants. I struggled with this for three days and, in the end, ended up raising an issue on the Emscripten repo on GitHub. The behavior was so confusing that the first response I got was literally “Are you sure that’s what’s happening?”.
In the end, I had to manually look at the generated files, and wouldn’t you know it, there’s a configuration option to fix exactly this issue. It was a one-line fix, but finding it took far more effort than it should have. Emscripten’s docs specifically recommend calling WASM code in a worker to avoid blocking the main thread, and I would’ve hoped this gotcha was mentioned there - but alas.
But regardless of those stumbling blocks, we had a working app at the end of it. It still wasn’t very performant - we were making copies of data as if there were no tomorrow - but it worked.
People have long believed that WASM is the be-all and end-all solution to performance problems when, in actuality, it’s a lot easier to write slow C++ than it is to write slow JavaScript. JavaScript runtimes optimize away a lot of bad code, and you generally don’t have to worry about much beyond avoiding an unnecessary loop or forgetting a stopping condition in a recursive call.
This is especially true of memory issues. Most memory leaks I’ve encountered in JavaScript can be explained by an HTML element that is no longer in the DOM but is pointed to by some variable - thereby preventing garbage collection.
So long as you use your WeakMaps and WeakSets correctly, you shouldn’t face this issue.
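The WeakMap pattern above can be sketched in a few lines (the names here are hypothetical, and a plain object stands in for a DOM element so the example runs anywhere):

```javascript
// Caching per-element metadata without blocking garbage collection.
// A regular Map would keep `el` alive even after it leaves the DOM;
// a WeakMap entry becomes collectable once nothing else references its key.
const metadata = new WeakMap();

function remember(el, info) {
  metadata.set(el, info);
}

function recall(el) {
  return metadata.get(el); // undefined if we never saw this element
}

let el = { tag: 'canvas' }; // stand-in for a DOM element
remember(el, { compressed: true });
console.log(recall(el)); // { compressed: true }
el = null; // once unreachable, the WeakMap entry can be collected too
```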
But Web Assembly doesn’t have a garbage collector. Not only that, Web Assembly can’t access the DOM or memory allocated in the JavaScript land which means that you end up making a bunch of unnecessary copies when moving from WASM to JS.
Thankfully, there is a way to circumvent this, although it is rather ugly. From JavaScript, you allocate memory on the WASM heap, copy your data into it, and then pass that memory pointer along to C++ to read from. There’s still a copy - but there’s just one copy. Once the memory is allocated, JavaScript and Web Assembly can both read it without any overhead.
The reason this is ugly is not only that your JavaScript code ends up littered with malloc (a function I had hoped to never see again), but also that Emscripten, at the very least, doesn’t allow taking raw pointers as arguments to functions. So what you end up doing (and I found this after a painstaking amount of research) is passing along a 32-bit integer and then reinterpret_cast-ing it to a pointer on the C++ side.
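Here is a dependency-free sketch of that one-copy pattern. In a real Emscripten build you would call Module._malloc() and copy into the Module.HEAPU8 view; to keep this runnable anywhere, a raw WebAssembly.Memory and a hard-coded offset stand in for those.

```javascript
// One-copy pattern: move the bytes into WASM linear memory once, then hand
// only an integer offset across the JS/WASM boundary.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB
const heap = new Uint8Array(memory.buffer);            // Emscripten's HEAPU8

// Pretend this offset came back from calling malloc on the WASM side.
const ptr = 1024;

// The single copy: image bytes go into linear memory at `ptr`.
const pixels = Uint8Array.from([10, 20, 30, 40]);
heap.set(pixels, ptr);

// On the other side, C++ receives `ptr` as a 32-bit integer and
// reinterpret_casts it to uint8_t*. Both sides now read the same bytes;
// from JS we can verify that without making another copy:
const view = new Uint8Array(memory.buffer, ptr, pixels.length);
console.log(view[0], view[3]); // 10 40
```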
In our case, it was even more complex since, just like WASM, Web Workers can’t access the DOM. Furthermore, you communicate between the main thread and workers by message passing, which copies whatever you send along.
This, fortunately, has a simpler solution - at least in modern browsers: use a SharedArrayBuffer. It’s still fairly new - Chrome implemented it in 2018 and Firefox didn’t support it until 2020 - but it’s the only good way to get this working.
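The key property is that posting a SharedArrayBuffer to a worker shares the same memory instead of copying it. A minimal sketch - with both "sides" simulated as two views over the same buffer so it runs without spawning an actual worker:

```javascript
// Shared memory between main thread and worker, no copies.
const sab = new SharedArrayBuffer(16);
const mainView = new Int32Array(sab);
const workerView = new Int32Array(sab); // what the worker would see after
                                        // worker.postMessage(sab) - the
                                        // buffer is shared, not cloned.

Atomics.store(mainView, 0, 42);           // write from the "main thread"
console.log(Atomics.load(workerView, 0)); // 42 - visible on the "worker" side
```

The Atomics calls aren't strictly needed for this toy example, but they're how you safely coordinate reads and writes once real threads are involved.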
So after doing all those memory optimizations and hacking along with raw pointers like cavemen, we had a more performant app. But there were still things to improve, and we needed the browser dev tools to figure out what they were.
I was amazed by how good the browser tooling for Web Assembly is. Our application is still fairly simple, so I’m not sure how much that helped, but for our needs, the browser memory and performance panels were excellent.
As I mentioned in the previous section, unlike JavaScript, you’re responsible for cleaning up your memory in Web Assembly. This makes the situation ripe for memory leaks. We had a couple and it would’ve been too difficult to find them without the Memory tab in the Dev tools.
While Firefox provides a cool-looking graph-type view, Chrome is the one you want to use here. It shows the references each variable holds, and you can compare a snapshot to the previous one to figure out which data is responsible for the leak. The tree-like structure for references is much easier to grasp than the way Firefox presents it.
For timing improvements, you’d better be using Firefox. This, like before, is subjective, but I prefer how detailed and comfortable the Firefox profiler feels. Some of that is because it opens the entire report in a new tab, which makes things much more spacious and easier to understand. Just make sure to compile to WASM with debug symbols enabled, and you should be able to see which parts of your native code are taking the most time to run.
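Getting those debug symbols is just a compiler flag away - something like this (the source file name is made up; the flags are real emcc options):

```shell
# -g emits DWARF debug info so the profiler can map samples back to your
# C++ source; --profiling-funcs is a lighter alternative that just keeps
# human-readable function names in the output.
emcc compress.cpp -O2 -g -o a.out.js
```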
And in the end, we made a minimalist wrapper UI around the application. We decided not to go for any front-end framework since it probably would’ve been overkill and might have complicated things further. A side effect is a faster application with a smaller bundle size (the WASM module is less than 500 bytes when gzipped!), with support for mobile devices as well.
And there you have it. Web Assembly might not be the answer to everything but it’s certainly worth it if you’re willing to trade away a bit of your time for a lot more performance.
You can find our website here and the source code here and as always, thanks for reading!