This blog post details the things we learned along the way.
This means it opens up the web for much more complicated and computation-heavy tasks than could be imagined before. But we don’t actually write Web Assembly, just as we don’t write assembly by hand. We write in a high-level language and use WASM as the compilation target.
First things first: which language do you choose to write the math-heavy code in? Rust? Go? C++? All are good options with varying levels of support for Web Assembly, but we settled on C++.
Not because it was the best option (we still don’t know if it was), but because our main algorithm, SVD or Singular Value Decomposition, would span fewer than 30 lines of code, so writing it in a language that allows rapid development was important. Rust, with its type safety and borrow checking, hinders fast development, at least for those not familiar with the language semantics (that is, us).
Go would probably have been fine as well, but since neither of us was in the mood to learn a new language, we were pretty much left with C++. We of course limited ourselves to header-only libraries in C++, given that none of us had any experience getting CMake or any other build tool to work.
This, along with Emscripten, provided as seamless an experience as you can have while writing a nontrivial application in C++.
We could’ve started with the web part, but up until then we were under the (wrong) impression that it would be more or less straightforward. So we started with the C++ code.
Of course, we faced problems while building the CLI tool as well, including hard-to-read documentation and out-of-date dependencies.
But after a while, we managed to create a CLI tool that took in an image path and produced the compressed image.
The algorithm we used, as I mentioned earlier, was SVD. I won’t claim to understand the math behind the madness here, but in a nutshell, it is quite similar to PCA, an algorithm commonly used in Machine Learning. SVD breaks the image down into components ordered by how much they contribute to it; you keep the top few (say, the top 100) and discard the rest. The number of components you keep is referred to as the rank, and it controls the level of compression of the image.
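To make the rank idea concrete, here is a minimal, self-contained sketch of a rank-k approximation A_k = σ_1 u_1 v_1^T + ... + σ_k u_k v_k^T, which keeps only the k largest singular values σ_i and their singular vectors. This is not our actual implementation: rather than calling a full SVD from a linear-algebra library, it assumes power iteration with deflation (a simple way to find the dominant singular triplets one at a time), and the names Matrix, rankKApprox and l2norm, as well as the fixed iteration count, are hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major m x n matrix, e.g. one colour channel of an image.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

static double l2norm(const std::vector<double>& x) {
    double s = 0.0;
    for (double xi : x) s += xi * xi;
    return std::sqrt(s);
}

// Rank-k approximation via power iteration with deflation (a sketch, not a
// library SVD): each pass finds the dominant singular triplet (sigma, u, v) of
// the residual, adds sigma * u * v^T to the result and subtracts it from the
// residual.
Matrix rankKApprox(const Matrix& a, std::size_t k, int iterations = 200) {
    assert(k <= std::min(a.rows, a.cols));
    Matrix residual = a;
    Matrix result(a.rows, a.cols);
    for (std::size_t t = 0; t < k; ++t) {
        // Deterministic unit start vector; a random start would avoid the rare
        // case where it is orthogonal to the dominant direction.
        std::vector<double> v(a.cols, 1.0 / std::sqrt(static_cast<double>(a.cols)));
        std::vector<double> u(a.rows, 0.0);
        double sigma = 0.0;
        for (int it = 0; it <= iterations; ++it) {
            // u = R v, normalised; its length estimates sigma.
            for (std::size_t i = 0; i < a.rows; ++i) {
                u[i] = 0.0;
                for (std::size_t j = 0; j < a.cols; ++j) u[i] += residual.at(i, j) * v[j];
            }
            sigma = l2norm(u);
            if (sigma < 1e-12) return result;  // residual is (numerically) zero
            for (double& ui : u) ui /= sigma;
            if (it == iterations) break;  // keep u, v and sigma consistent
            // v = R^T u, normalised.
            for (std::size_t j = 0; j < a.cols; ++j) {
                v[j] = 0.0;
                for (std::size_t i = 0; i < a.rows; ++i) v[j] += residual.at(i, j) * u[i];
            }
            const double nv = l2norm(v);
            for (double& vj : v) vj /= nv;
        }
        // Accumulate sigma * u * v^T and deflate the residual.
        for (std::size_t i = 0; i < a.rows; ++i) {
            for (std::size_t j = 0; j < a.cols; ++j) {
                const double c = sigma * u[i] * v[j];
                result.at(i, j) += c;
                residual.at(i, j) -= c;
            }
        }
    }
    return result;
}
```

The compression comes from what you store: k singular triplets of an m x n channel take k(m + n + 1) numbers instead of mn, so a 1000 x 1000 channel at rank 100 needs 200,100 values instead of 1,000,000. Power iteration converges slowly when neighbouring singular values are close, so a library SVD is the more practical choice for real images.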
Emscripten’s guides could certainly be better. While the documentation is extensive, there’s no real way to stumble across a configuration option unless you already know it exists. I can recall multiple cases where we couldn’t figure something out and, in the end, had to read the generated JS glue code to see which configuration options we could set.
One specific example I’ll mention is the weirdness around importScripts and workers. Here’s the basic setup: if you enable pthreads and import a.out.js (the code that loads and sets up WASM for you) in a worker, a.out.worker.js (the glue code for the pthread-backed workers) somehow magically ends up calling your custom worker.
This means everything you do in your worker affects the other generated workers, which is probably something no one wants. I struggled with this for three days and eventually raised an issue on the Emscripten repo on GitHub. The behavior was so confusing that the first response I got was literally “Are you sure that’s what’s happening?”
In the end, I had to manually look through the generated files, and would you know it, there’s a configuration option to fix exactly this issue. It was a one-line fix, but finding it took far more effort than it should have. Emscripten’s docs specifically mention calling WASM code in a worker to avoid blocking the main thread, so I would’ve hoped this gotcha was mentioned there, but alas.
But regardless of those stumbling blocks, we had a working app at the end of it. It wasn’t very performant yet, and we were copying data as if there were no tomorrow, but it worked.
malloc (a function I hoped to never see again) but because Emscripten, at the very least, doesn’t allow raw pointers as function arguments. So what you end up doing (and I found this after a painstaking amount of research) is passing along a 32-bit integer and then using reinterpret_cast to turn it back into a pointer.
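A minimal sketch of this pattern, assuming embind is used for the bindings: the exported function receives the buffer address as an integer and converts it back with reinterpret_cast. The function name invertPixels and the module name image_module are hypothetical, and the __EMSCRIPTEN__ guard is only there so the same file also compiles natively for testing. On the JS side, the buffer would be allocated with Module._malloc, filled through Module.HEAPU8, and released with Module._free (depending on the Emscripten version, these may need to be exported explicitly).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifdef __EMSCRIPTEN__
#include <emscripten/bind.h>
#endif

// Hypothetical example: inverts 8-bit pixel values in place. The buffer address
// arrives as a plain integer (32 bits wide on wasm32) instead of a raw pointer,
// and is turned back into a pointer with reinterpret_cast.
void invertPixels(std::uintptr_t bufferAddr, std::size_t length) {
    assert(bufferAddr != 0 || length == 0);
    auto* pixels = reinterpret_cast<std::uint8_t*>(bufferAddr);
    for (std::size_t i = 0; i < length; ++i) {
        pixels[i] = static_cast<std::uint8_t>(255 - pixels[i]);
    }
}

#ifdef __EMSCRIPTEN__
// Exposes the function to JS as Module.invertPixels(ptr, length).
EMSCRIPTEN_BINDINGS(image_module) {
    emscripten::function("invertPixels", &invertPixels);
}
#endif
```

Because the pixels are modified in place inside the WASM heap, JS can read the result straight out of Module.HEAPU8 without another copy.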
In our case, it was even more complex since, just like WASM, Web Workers can’t access the DOM. Furthermore, the main thread and workers communicate by message passing, which makes a copy of whatever you send along.
This, fortunately, has a simpler solution, at least in modern browsers: use a SharedArrayBuffer. Support is still fairly recent (Chrome re-enabled it in 2018 and Firefox only did so in 2020, after both had disabled it in response to Spectre), but it is the only good way to get this working.
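On the C++ side, one way this plays out (assuming the module is built with Emscripten’s -pthread flag) is that the WASM memory itself is backed by a SharedArrayBuffer, so threads can work on the same pixel buffer without any copies. The sketch below is hypothetical rather than our code: scaleChannel and scaleAllChannels start one std::thread per channel of an interleaved pixel buffer, and each thread rescales only its own channel in place. Note that modern browsers only expose SharedArrayBuffer to cross-origin isolated pages (served with suitable Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy headers), and depending on the setup a pre-created thread pool (Emscripten’s PTHREAD_POOL_SIZE setting) may be needed so threads can start while the caller blocks on join.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-channel work: multiply one channel of an interleaved pixel
// buffer (e.g. RGBRGB...) by `factor`, clamping to the 0-255 range.
void scaleChannel(std::vector<std::uint8_t>& pixels, std::size_t channel,
                  std::size_t channels, double factor) {
    for (std::size_t i = channel; i < pixels.size(); i += channels) {
        const double scaled = std::clamp(pixels[i] * factor, 0.0, 255.0);
        pixels[i] = static_cast<std::uint8_t>(scaled);
    }
}

// Starts one thread per channel over the same buffer and waits for all of
// them. Each thread writes a disjoint set of bytes, so no locking is needed and
// nothing is copied. Built with Emscripten's -pthread, each std::thread runs
// in a Web Worker and the buffer lives in the shared WASM memory.
void scaleAllChannels(std::vector<std::uint8_t>& pixels,
                      const std::vector<double>& factors) {
    const std::size_t channels = factors.size();
    assert(channels > 0 && pixels.size() % channels == 0);
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < channels; ++c) {
        workers.emplace_back(scaleChannel, std::ref(pixels), c, channels, factors[c]);
    }
    for (std::thread& w : workers) w.join();
}
```

For example, scaleAllChannels(pixels, {2.0, 0.5, 1.0}) doubles the red channel, halves the green one and leaves blue untouched, all on the single shared buffer.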
So after doing all those memory optimizations and hacking away with raw pointers like cavemen, we had a more performant app. But there were still things to improve, and we needed the browser dev tools to figure out what they were.
I was amazed by how good the browser tooling for Web Assembly is. Our application is still fairly simple, so that may be part of it, but for our needs, the browsers’ memory and performance panels were excellent.
For memory profiling, Chrome is the one you want, even though Firefox provides a cool-looking graph-type view. Chrome shows the references each variable holds, and you can compare a snapshot to the previous one to figure out which data is responsible for a leak. The tree-like structure for references is much easier to grasp than the way Firefox presents it.
For speed improvements, you’d better be using Firefox. This, like before, is subjective, but I prefer how detailed and comfortable Firefox’s profiler feels. Part of it is that it opens the entire report in a new tab, which makes things much more spacious and easy to understand. Just make sure to compile to WASM with debug symbols enabled (for example, by passing -g to emcc), and you should see which of your native functions take the most time to run.
And in the end, we built a minimalist wrapper UI around the application. We decided not to use any front-end framework, since it probably would’ve been overkill and might have complicated things further. A side effect is a faster application with a smaller bundle size (the WASM module is less than 500 bytes when gzipped!) that also supports mobile devices.
And there you have it. Web Assembly might not be the answer to everything, but it’s certainly worth it if you’re willing to trade away a bit of your time for a lot more performance.