feat: Release the GIL when decoding the image #1

hazza203 · 2025-01-02T06:59:48Z

Hey there, firstly, thanks a heap for making this package. It's been great for us as we try and optimize some of our image loading workflows. We work with really large images so being able to load them in quicker is a blessing.

Though upon some testing I discovered that this package isn't so great for loading in multiple images at the same time via the use of threads, actually, it performs significantly worse than opencv. Most python image libraries which make use of C-extensions seem to release the GIL at some point to improve multi-threaded performance.

So I've added some handling here to release the GIL, and it works. But I'm not sure if this is the best solution / there may be a more optimal place to do it as I believe there are still some areas for optimization. Also just FYI, I'm not that experienced with writing python-C binding code, this are really my first crack at it.

Even after these changes, it is only on-par with opencv in terms of multi-threaded performance, whereas it still beats it in single threaded performance, so I reckon there are still areas for improvement.

Let me know what you think, if this change is even desired and if there may be some more optimizations we can do! I've added some timings and a test script below:

Benchmarks, using a 15k RGBA image (15000,13000)

	OpenCV	PyWuffs (current)	PyWuffs (proposed)
Single Threaded	1.83s	1.01s	1.01s
3 Concurrent Threads	2.02s	3.00s	1.44s
6 Concurrent Threads	2.40s	6.01s	2.21s
18 Concurrent Threads	5.56s	17.44s	5.88s

from concurrent.futures import ThreadPoolExecutor
import time
import cv2
import numpy as np
from pywuffs import ImageDecoderType, PixelFormat
from pywuffs.aux import ImageDecoder, ImageDecoderConfig


def load_wuffs(filepath: str) -> np.ndarray:
    """
    Returns numpy array from .png encoding.

    :param filename: filename to .png file.
    :param pixel_format: pixel format to decode image as; see PixelFormatType class.
    :return: image array
    """
    t = time.time()

    config = ImageDecoderConfig()
    config.enabled_decoders = [ImageDecoderType.PNG]
    config.pixel_format = PixelFormat.BGRA_NONPREMUL

    decoder = ImageDecoder(config)
    decoding_result = decoder.decode(filepath)
    image_data = decoding_result.pixbuf

    return time.time() - t


def load_cv2(filepath: str) -> np.ndarray:
    t = time.time()
    image = cv2.imread(filepath)
    return time.time() - t


if __name__ == "__main__":
    max_workers = 6
    thread_executor = ThreadPoolExecutor(max_workers=max_workers)
    image_path = "image.png"

    t = time.time()
    image = load_wuffs(image_path)
    print("Wuffs Single Thread Time: ", time.time() - t)

    t = time.time()
    image = load_cv2(image_path)
    print("OpenCV Single Thread Time: ", time.time() - t)

    futures = []
    t = time.time()
    for i in range(max_workers):
        futures.append(thread_executor.submit(load_wuffs, image_path))

    avg_time = sum([future.result() for future in futures]) / max_workers
    print("Wuffs Multi Thread Total Time: ", time.time() - t)
    print("Wuffs Multi Thread Avg Time: ", avg_time)

    futures = []
    t = time.time()
    for i in range(max_workers):
        futures.append(thread_executor.submit(load_cv2, image_path))

    avg_time = sum([future.result() for future in futures]) / max_workers
    print("OpenCV Multi Thread Total Time: ", time.time() - t)
    print("OpenCV Multi Thread Avg Time: ", avg_time)

dev0x13 · 2025-01-03T14:06:05Z

Hi! Thank you very much for the contribution! I like the idea, but something seems to be off: I'm getting a segmentation fault on Linux while running tests on your branch and getting the following error when running the benchmark script you provided:

pybind11::handle::dec_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::dec_ref() call was triggered on a NoneType object.
terminate called after throwing an instance of 'std::runtime_error'
  what():  pybind11::handle::dec_ref() PyGILState_Check() failure.

In order to figure out whether it's just my environment or not, I enabled CI for PRs - could you rebase your fork to the current state of the upstream please?

…arallelism

hazza203 · 2025-01-06T00:32:36Z

@dev0x13 Odd, I definitely didn't experience that when testing on my own environment. May I ask what python version you're running?

I've sync'd my fork, but it appears the workflows need approval to be run first

dev0x13 · 2025-01-06T15:06:30Z

@hazza203 So the CI fails too... I'm using Python 3.12 locally, and 3.7 and 3.13 are also used in CI.

However, the root cause of the issue seems to be wuffs_aux_wrap::MetadataEntry::data and wuffs_aux_wrap::ImageDecodingResult::pixbuf struct members which are pybind11::array_t<uint8_t>, i.e. basically Python objects (NumPy arrays). They are both manipulated in callbacks which are invoked by wuffs_aux::DecodeImage meaning that the solution you proposed is unsafe unless the GIL is reacquired in the said callbacks.

Having the aforementioned fields stored as NumPy arrays is somewhat a feature (yet implemented quite poorly) which was intended to eliminate the need to copy decoded data from C++ layer back to Python. I think it's possible to make these two fields std::vector<uint8_t> and then construct corresponding NumPy arrays as views so that the data is referenced instead of being copied (like this).

If you feel comfortable doing this yourself, I'll be happy to accept your PR with the needed changes, but if not, I could do it myself considering this a feature request.

hazza203 · 2025-01-07T02:28:14Z

@dev0x13

Understood, I've given this a shot, but my lack of experience with C++ and pybind is showing. I'm struggling to come up with a working solution for this, so maybe label it as a feature request.

I have run all of the tests locally, and most of them are actually passing and don't cause a seg fault. (python3.11). I've identified some clues as to what specifically is failing. It seems to be closely related to the REPORT_METADATA flags, and curiously, not for every image. This leads me to believe that the unsafe code is most likely related to wuffs_aux_wrap::MetadataEntry::data rather than wuffs_aux_wrap::ImageDecodingResult::pixbuf.

TLDR: Commenting out lines 47, 177, 216 in test/test_aux_image_decoder.py ensures that all tests run fine without a seg fault. But some tests fail of course due to missing metadata.

lena_exif.png file fails with any REPORT_METADATA_ flag
- it doesn't fail in the tests which sets config.max_incl_metadata_length = 8
lena.png fails with REPORT_METADATA_KVP
- it doesn't fail if you manually set config.max_incl_metadata_length = 8 in the test
test_decode_image_flags actually runs successfully
- This uses most of the REPORT_METADATA_ flags
- The TEST_IMAGES array doesn't include lena_exif.png

We don't use any flags when decoding, so for now, we can continue with my fork. Though it is unsafe, it works for our purposes. But ideally I'd love to get a solution on the main repo.

I'd love to see what the solution looks like if you do indeed get around to it. Thanks for the feedback and all of your hard work on this project!

dev0x13 · 2025-01-08T12:55:36Z

@hazza203

Thank you for the investigation and for the provided clues!
I've created a corresponding feature request to track the progress, and I'm closing this PR if you don't mind.
It's good to know that your patch seems to work for your use case, but I'd like to warn you additionally that these segfaults are actually a sign of an undefined behavior meaning that even if the code runs without a crash, you might be getting incorrect decoding results, so use at your own risk.
Thank you for your feedback and contribution!

dev0x13 · 2025-04-05T17:30:50Z

@hazza203 Implemented in #3 and available in v2.0.1.

hazza203 · 2025-04-06T23:53:47Z

@dev0x13 Sick! Nice work, can't wait to check it out

feat: Release the GIL when decoding the image

11f8976

dev0x13 self-assigned this Jan 3, 2025

Merge branch 'dev0x13:master' into feat/release-python-gil-optimize-p…

012feb5

…arallelism

dev0x13 mentioned this pull request Jan 8, 2025

Feature request: release GIL on decoder invocations #2

Closed

dev0x13 closed this Jan 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Release the GIL when decoding the image #1

feat: Release the GIL when decoding the image #1

Uh oh!

hazza203 commented Jan 2, 2025 •

edited

Loading

Uh oh!

dev0x13 commented Jan 3, 2025

Uh oh!

hazza203 commented Jan 6, 2025

Uh oh!

dev0x13 commented Jan 6, 2025

Uh oh!

hazza203 commented Jan 7, 2025 •

edited

Loading

Uh oh!

dev0x13 commented Jan 8, 2025

Uh oh!

dev0x13 commented Apr 5, 2025

Uh oh!

hazza203 commented Apr 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: Release the GIL when decoding the image #1

feat: Release the GIL when decoding the image #1

Uh oh!

Conversation

hazza203 commented Jan 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks, using a 15k RGBA image (15000,13000)

Uh oh!

dev0x13 commented Jan 3, 2025

Uh oh!

hazza203 commented Jan 6, 2025

Uh oh!

dev0x13 commented Jan 6, 2025

Uh oh!

hazza203 commented Jan 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dev0x13 commented Jan 8, 2025

Uh oh!

dev0x13 commented Apr 5, 2025

Uh oh!

hazza203 commented Apr 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hazza203 commented Jan 2, 2025 •

edited

Loading

hazza203 commented Jan 7, 2025 •

edited

Loading