@Adrien: Hi
I’m investigating relatively high memory consumption in the htj2k image decoder… but I’m not super familiar with memory profiling of wasm/emscripten code. After decoding ~1200 MRI frames (each 600x600 pixels), I see that the heap of the web worker is around 700MB. But shouldn’t the decoder’s memory usage stay roughly constant?
As far as I could gather, the current decoding process goes like this (rough sketch after the list):
• the worker has its own heap, that it can grow
• the JS side copies the htj2k data to decode into a shared array buffer
• the JS side asks the decoder to run
• the JS side copies the raw pixel data out of a shared array buffer
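Roughly like this from the main thread’s side, as a TypeScript-ish sketch (decodeFrame, the message shape and the field names are made up for illustration, not the actual code):
```
// Sketch of the current flow as I understand it (illustrative names only).
function decodeFrame(decodeWorker: Worker, encoded: Uint8Array): Promise<Uint16Array> {
  // Copy the compressed HTJ2K bytes into a SharedArrayBuffer the worker can see.
  const input = new SharedArrayBuffer(encoded.byteLength);
  new Uint8Array(input).set(encoded);

  return new Promise((resolve) => {
    decodeWorker.onmessage = (e: MessageEvent<{ pixels: SharedArrayBuffer }>) => {
      // Copy the raw uint16 pixels back out of the shared buffer the worker filled.
      resolve(new Uint16Array(e.data.pixels).slice());
    };
    decodeWorker.postMessage({ cmd: 'decode', input });
  });
}
```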
I’m not really sure why the heap grows, but I could imagine that the Emscripten malloc implementation can’t shrink the ArrayBuffer used for the heap, since wasm memory can only grow (why it needs to grow to 700MB is still a mystery to me).
I was wondering why the decoder needs to allocate memory at all. Could the following also work?
• the worker has its own heap, which it cannot grow. It is basically the working memory used for bookkeeping while decoding, and should be relatively small, especially if doing streaming decode?
• the JS side knows the final resolution of the image (maybe this requires decoding the header first), so it allocates an ArrayBuffer of the right size and transfers it to the worker
• the JS side passes the htj2k data to the worker by transferring ArrayBuffers (they don’t need to be Shared), either all at once or in chunks if we can stream
• once the decoder is done, it transfers the ArrayBuffer back to the JS side
this way, the “heavy” buffers (those containing image data) would always be managed by the browser GC, and could be disposed of when no longer needed (rough sketch below).
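Something along these lines is what I have in mind; decodeFrameTransferred and the message shape are made up, and whether the decoder can actually write into a caller-provided buffer is exactly the open question:
```
// Sketch of the "transfer instead of share" idea (illustrative, not the current code).
function decodeFrameTransferred(decodeWorker: Worker, encoded: ArrayBuffer,
                                width: number, height: number): Promise<Uint16Array> {
  // Pre-allocate the output at its final size; ownership moves to the worker.
  const output = new ArrayBuffer(width * height * 2); // uint16 pixels

  return new Promise((resolve) => {
    decodeWorker.onmessage = (e: MessageEvent<{ pixels: ArrayBuffer }>) => {
      // The worker transfers the filled buffer back: no copy, and the browser GC owns it.
      resolve(new Uint16Array(e.data.pixels));
    };
    // Both buffers are transferred (detached on this side), not copied.
    decodeWorker.postMessage({ cmd: 'decode', input: encoded, output }, [encoded, output]);
  });
}
```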
also, interestingly: changing the concurrency of the decoder (worker count) doesn’t impact peak memory usage: I either end up with two workers with a heap of 350MB each, or one worker with a heap of 700MB. So I need to investigate whether we need some “back pressure” management, for when the htj2k data arrives faster than the CPU can decode it.
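If it comes to that, I imagine something as simple as capping the number of frames in flight would do; this DecodeQueue is just a sketch of the idea, not existing code:
```
// Minimal back-pressure sketch: never hand more than `limit` frames to the
// decoder at once, so compressed data can't pile up faster than we decode it.
class DecodeQueue {
  private inFlight = 0;
  private waiting: Array<() => void> = [];

  constructor(private limit: number) {}

  async run<T>(decode: () => Promise<T>): Promise<T> {
    while (this.inFlight >= this.limit) {
      // Wait until a slot frees up before accepting more compressed data.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    this.inFlight++;
    try {
      return await decode();
    } finally {
      this.inFlight--;
      this.waiting.shift()?.();
    }
  }
}
```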
answering myself: you apparently can’t simply “pass an ArrayBuffer” to C++; at best you can create a typed view on the Emscripten heap and copy the data from/to there (which is what the code currently does). So the mystery is rather why memory usage grows, since the encode/decode buffers should never be larger than 600x600x2 bytes (uint16)
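i.e. the worker side does something like this (a sketch: _malloc, _free and the HEAPU8/HEAPU16 views are standard Emscripten exports, assuming our build exports them, but _decode and its signature are invented for illustration):
```
// Sketch of the copy-in / copy-out through the Emscripten heap.
declare const Module: {
  _malloc(size: number): number;
  _free(ptr: number): void;
  _decode(src: number, srcLen: number, dst: number): void; // hypothetical entry point
  HEAPU8: Uint8Array;
  HEAPU16: Uint16Array;
};

function decodeOnWasmHeap(encoded: Uint8Array, width: number, height: number): Uint16Array {
  const srcPtr = Module._malloc(encoded.byteLength);
  const dstPtr = Module._malloc(width * height * 2); // 600x600 uint16 ≈ 720 kB
  try {
    Module.HEAPU8.set(encoded, srcPtr);                 // copy compressed bytes in
    Module._decode(srcPtr, encoded.byteLength, dstPtr); // run the decoder
    // Copy the decoded pixels back out of the wasm heap (HEAPU16 is indexed in elements).
    return Module.HEAPU16.slice(dstPtr / 2, dstPtr / 2 + width * height);
  } finally {
    Module._free(dstPtr);
    Module._free(srcPtr);
  }
}
```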
alright, I can reproduce the issue with a simple nodejs script that calls the htj2k decodeAsync directly (basically like the unit test). Decoding a series of 467 frames ends up using a bit over 430MB of RAM (Node+wasm).
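The repro is basically just this (a sketch: the frame paths and the decodeAsync import point at a hypothetical wrapper around openjphjs, not a verified API):
```
// Node repro sketch: decode frames in a loop and watch the process memory climb.
import { readFile } from 'node:fs/promises';
import { decodeAsync } from './htj2kDecoder'; // hypothetical wrapper module

async function main(): Promise<void> {
  for (let i = 0; i < 467; i++) {
    const encoded = new Uint8Array(await readFile(`frames/frame-${i}.jph`));
    await decodeAsync(encoded); // result is dropped immediately, nothing retained
    if (i % 50 === 0) {
      // RSS keeps climbing even though no decoded frame is kept alive here.
      console.log(i, Math.round(process.memoryUsage().rss / 1024 / 1024), 'MB RSS');
    }
  }
}

main();
```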
@pieper: I’m not sure if it’s a heavy-weight operation but maybe you could just reset/recreate the wasm worker environment periodically (even after each slice). This could wipe out any memory fragmentation.
@Adrien: yes, that’s a workaround I’m already using, currently killing idle workers after 5 seconds… I could certainly try to do that after each slice, but I’m a bit worried about the performance hit
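For reference, the workaround is roughly this (a sketch; the worker file name and helper are placeholders for the actual pool code):
```
// Terminate idle workers so the browser can reclaim their wasm heap,
// and recreate them on the next decode request.
let worker: Worker | null = null;
let idleTimer: ReturnType<typeof setTimeout> | undefined;

function getWorker(): Worker {
  if (!worker) {
    worker = new Worker(new URL('./htj2kDecodeWorker.js', import.meta.url));
  }
  clearTimeout(idleTimer);
  idleTimer = setTimeout(() => {
    worker?.terminate(); // drops the whole wasm heap, fragmentation and all
    worker = null;
  }, 5000);
  return worker;
}
```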
I suppose I could also try to adapt the C++ unit test of openjphjs to run through my series, run that through massif, and see how memory usage evolves.
alright, running the same frames through the C++ test, I never go above 6MB
@Bill_Wallace: We really need to update the wasm library for htj2k to the latest HTJ2K release build, as well as switch the memory manager it uses. I think that was done for JPEG-LS, and it made a huge difference in stability and memory usage. There are also bugs in the decoding process that caused the decoder to get slower and slower, which is why the decoder gets a new context on every decode. Re-using the same decoder should re-use the old memory set, or else deallocate it and create a new set.
Debugging it is on my todo list, along with getting the JPEG XL decoding working.
@Adrien: For me the two main “investigation tracks” would be:
• can openjph run without allocating memory?
• why does memory usage when run in wasm keep growing?
Investigating the first point is probably too complex for me for now, so I’m looking at https://emscripten.org/docs/api_reference/trace.h.html to see if I can get more info on point 2
@Bill_Wallace when you say “memory manager”, do you mean the malloc implementation that Emscripten compiles into the wasm binary?
@Bill_Wallace: Yes, the malloc implementation - I’m not all that familiar with wasm myself, so still learning to deal with it
@Adrien: it looks like we’re using emmalloc for htj2k as well