Skip to content

Conversation

@hazza203
Copy link

@hazza203 hazza203 commented Jan 2, 2025

Hey there, firstly, thanks a heap for making this package. It's been great for us as we try and optimize some of our image loading workflows. We work with really large images so being able to load them in quicker is a blessing.

Though upon some testing I discovered that this package isn't so great for loading in multiple images at the same time via the use of threads, actually, it performs significantly worse than opencv. Most python image libraries which make use of C-extensions seem to release the GIL at some point to improve multi-threaded performance.

So I've added some handling here to release the GIL, and it works. But I'm not sure if this is the best solution / there may be a more optimal place to do it as I believe there are still some areas for optimization. Also just FYI, I'm not that experienced with writing python-C binding code, this are really my first crack at it.

Even after these changes, it is only on-par with opencv in terms of multi-threaded performance, whereas it still beats it in single threaded performance, so I reckon there are still areas for improvement.

Let me know what you think, if this change is even desired and if there may be some more optimizations we can do! I've added some timings and a test script below:

Benchmarks, using a 15k RGBA image (15000,13000)

OpenCV PyWuffs (current) PyWuffs (proposed)
Single Threaded 1.83s 1.01s 1.01s
3 Concurrent Threads 2.02s 3.00s 1.44s
6 Concurrent Threads 2.40s 6.01s 2.21s
18 Concurrent Threads 5.56s 17.44s 5.88s
from concurrent.futures import ThreadPoolExecutor
import time
import cv2
import numpy as np
from pywuffs import ImageDecoderType, PixelFormat
from pywuffs.aux import ImageDecoder, ImageDecoderConfig


def load_wuffs(filepath: str) -> np.ndarray:
    """
    Returns numpy array from .png encoding.

    :param filename: filename to .png file.
    :param pixel_format: pixel format to decode image as; see PixelFormatType class.
    :return: image array
    """
    t = time.time()

    config = ImageDecoderConfig()
    config.enabled_decoders = [ImageDecoderType.PNG]
    config.pixel_format = PixelFormat.BGRA_NONPREMUL

    decoder = ImageDecoder(config)
    decoding_result = decoder.decode(filepath)
    image_data = decoding_result.pixbuf

    return time.time() - t


def load_cv2(filepath: str) -> np.ndarray:
    t = time.time()
    image = cv2.imread(filepath)
    return time.time() - t


if __name__ == "__main__":
    max_workers = 6
    thread_executor = ThreadPoolExecutor(max_workers=max_workers)
    image_path = "image.png"

    t = time.time()
    image = load_wuffs(image_path)
    print("Wuffs Single Thread Time: ", time.time() - t)

    t = time.time()
    image = load_cv2(image_path)
    print("OpenCV Single Thread Time: ", time.time() - t)

    futures = []
    t = time.time()
    for i in range(max_workers):
        futures.append(thread_executor.submit(load_wuffs, image_path))

    avg_time = sum([future.result() for future in futures]) / max_workers
    print("Wuffs Multi Thread Total Time: ", time.time() - t)
    print("Wuffs Multi Thread Avg Time: ", avg_time)

    futures = []
    t = time.time()
    for i in range(max_workers):
        futures.append(thread_executor.submit(load_cv2, image_path))

    avg_time = sum([future.result() for future in futures]) / max_workers
    print("OpenCV Multi Thread Total Time: ", time.time() - t)
    print("OpenCV Multi Thread Avg Time: ", avg_time)

@dev0x13 dev0x13 self-assigned this Jan 3, 2025
@dev0x13
Copy link
Owner

dev0x13 commented Jan 3, 2025

Hi! Thank you very much for the contribution! I like the idea, but something seems to be off: I'm getting a segmentation fault on Linux while running tests on your branch and getting the following error when running the benchmark script you provided:

pybind11::handle::dec_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::dec_ref() call was triggered on a NoneType object.
terminate called after throwing an instance of 'std::runtime_error'
  what():  pybind11::handle::dec_ref() PyGILState_Check() failure.

In order to figure out whether it's just my environment or not, I enabled CI for PRs - could you rebase your fork to the current state of the upstream please?

@hazza203
Copy link
Author

hazza203 commented Jan 6, 2025

@dev0x13 Odd, I definitely didn't experience that when testing on my own environment. May I ask what python version you're running?

I've sync'd my fork, but it appears the workflows need approval to be run first

@dev0x13
Copy link
Owner

dev0x13 commented Jan 6, 2025

@hazza203 So the CI fails too... I'm using Python 3.12 locally, and 3.7 and 3.13 are also used in CI.

However, the root cause of the issue seems to be wuffs_aux_wrap::MetadataEntry::data and wuffs_aux_wrap::ImageDecodingResult::pixbuf struct members which are pybind11::array_t<uint8_t>, i.e. basically Python objects (NumPy arrays). They are both manipulated in callbacks which are invoked by wuffs_aux::DecodeImage meaning that the solution you proposed is unsafe unless the GIL is reacquired in the said callbacks.

Having the aforementioned fields stored as NumPy arrays is somewhat a feature (yet implemented quite poorly) which was intended to eliminate the need to copy decoded data from C++ layer back to Python. I think it's possible to make these two fields std::vector<uint8_t> and then construct corresponding NumPy arrays as views so that the data is referenced instead of being copied (like this).

If you feel comfortable doing this yourself, I'll be happy to accept your PR with the needed changes, but if not, I could do it myself considering this a feature request.

@hazza203
Copy link
Author

hazza203 commented Jan 7, 2025

@dev0x13

Understood, I've given this a shot, but my lack of experience with C++ and pybind is showing. I'm struggling to come up with a working solution for this, so maybe label it as a feature request.

I have run all of the tests locally, and most of them are actually passing and don't cause a seg fault. (python3.11). I've identified some clues as to what specifically is failing. It seems to be closely related to the REPORT_METADATA flags, and curiously, not for every image. This leads me to believe that the unsafe code is most likely related to wuffs_aux_wrap::MetadataEntry::data rather than wuffs_aux_wrap::ImageDecodingResult::pixbuf.

TLDR: Commenting out lines 47, 177, 216 in test/test_aux_image_decoder.py ensures that all tests run fine without a seg fault. But some tests fail of course due to missing metadata.

  • lena_exif.png file fails with any REPORT_METADATA_ flag
    • it doesn't fail in the tests which sets config.max_incl_metadata_length = 8
  • lena.png fails with REPORT_METADATA_KVP
    • it doesn't fail if you manually set config.max_incl_metadata_length = 8 in the test
  • test_decode_image_flags actually runs successfully
    • This uses most of the REPORT_METADATA_ flags
    • The TEST_IMAGES array doesn't include lena_exif.png

We don't use any flags when decoding, so for now, we can continue with my fork. Though it is unsafe, it works for our purposes. But ideally I'd love to get a solution on the main repo.

I'd love to see what the solution looks like if you do indeed get around to it. Thanks for the feedback and all of your hard work on this project!

@dev0x13
Copy link
Owner

dev0x13 commented Jan 8, 2025

@hazza203

Thank you for the investigation and for the provided clues!
I've created a corresponding feature request to track the progress, and I'm closing this PR if you don't mind.
It's good to know that your patch seems to work for your use case, but I'd like to warn you additionally that these segfaults are actually a sign of an undefined behavior meaning that even if the code runs without a crash, you might be getting incorrect decoding results, so use at your own risk.
Thank you for your feedback and contribution!

@dev0x13 dev0x13 closed this Jan 8, 2025
@dev0x13
Copy link
Owner

dev0x13 commented Apr 5, 2025

@hazza203 Implemented in #3 and available in v2.0.1.

@hazza203
Copy link
Author

hazza203 commented Apr 6, 2025

@dev0x13 Sick! Nice work, can't wait to check it out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants