ENH: Add libjpeg-turbo backend for GDCM JPEG codec #6149

Draft
blowekamp wants to merge 1 commit into InsightSoftwareConsortium:main from blowekamp:gdcm-jpegturbo-backend

@blowekamp
Member

Replace GDCM's vendored IJG 6b JPEG libraries (gdcmjpeg8/12/16) with
ITK's existing libjpeg-turbo, controlled by a GDCM_USE_JPEGTURBO CMake
option (default ON). The old IJG path is preserved as an opt-out fallback.

Motivation

GDCM currently vendors a patched IJG 6b JPEG library (~1998 vintage) built
as three separate static libraries (gdcmjpeg8, gdcmjpeg12, gdcmjpeg16)
for the three DICOM-required precisions. ITK already vendors libjpeg-turbo
3.0.x with native multi-precision support. Using a single modern library:

  • Reduces compiled code (one library instead of three)
  • Brings SIMD-accelerated JPEG decode/encode paths to GDCM
  • Consolidates JPEG dependency maintenance
  • Adds native lossless JPEG (SOF3 / Process 14) via jpeg_enable_lossless()

Changes

| File | Change |
| --- | --- |
| gdcmJPEGTurboCodec.h/.cxx | New codec class replacing the three per-bitdepth codecs |
| gdcmJPEGCodec.cxx | Dispatcher selects JPEGTurboCodec when turbo is enabled |
| MediaStorageAndFileFormat/CMakeLists.txt | Conditionally compile old or new codec sources |
| gdcm/CMakeLists.txt | GDCM_USE_JPEGTURBO clears GDCM_LJPEG_LIBRARIES |
| gdcm/Utilities/CMakeLists.txt | Skip building gdcmjpeg subdir when turbo is used |
| GDCM/src/CMakeLists.txt | Wire ITKJPEG include dirs and library |
| GDCM/itk-module.cmake | Add ITKJPEG to DEPENDS |

Implementation notes

  • Multi-precision: libjpeg-turbo 3.0 provides jpeg12_read_scanlines()
    and jpeg16_read_scanlines() prefixed functions; the codec dispatches at
    runtime based on cinfo.data_precision.
  • Lossless detection: uses jpegint.h's master->lossless field (same
    technique as libtiff's tif_jpeg.c), since libjpeg-turbo exposes no public
    process field.
  • Lossless encoding: jpeg_enable_lossless() replaces GDCM's IJG-specific
    jpeg_simple_lossless().
  • Stream I/O: custom jpeg_source_mgr/jpeg_destination_mgr implementations
    bridge libjpeg-turbo's scanline API to GDCM's std::istream/std::ostream.
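
The runtime precision dispatch described in the first note can be sketched without linking libjpeg-turbo. This is an illustration only: `ScanlineApi`, `PrecisionDispatch`, `SelectScanlineApi`, and `RowBufferBytes` are hypothetical names, not identifiers from the PR. The 1-8 / 9-12 / 13-16 bucketing is an assumption modelled on the dispatcher's "bit depths 1-16" range, and the sample container sizes (one byte for 8-bit, two bytes for 12- and 16-bit) are likewise assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the three scanline entry-point families
// (jpeg_read_scanlines, jpeg12_read_scanlines, jpeg16_read_scanlines).
// The real functions are not called here.
enum class ScanlineApi { Jpeg8, Jpeg12, Jpeg16 };

struct PrecisionDispatch {
  ScanlineApi api;          // which entry-point family would be used
  std::size_t sampleBytes;  // assumed size of one decoded sample
};

// Map the precision reported by the JPEG header to an entry-point family.
inline PrecisionDispatch SelectScanlineApi(int dataPrecision) {
  if (dataPrecision < 1 || dataPrecision > 16)
    throw std::invalid_argument("unsupported JPEG data precision");
  if (dataPrecision <= 8)
    return {ScanlineApi::Jpeg8, sizeof(std::uint8_t)};
  if (dataPrecision <= 12)
    return {ScanlineApi::Jpeg12, sizeof(std::int16_t)};
  return {ScanlineApi::Jpeg16, sizeof(std::uint16_t)};
}

// Size in bytes of one decoded row for the selected sample type.
inline std::size_t RowBufferBytes(int dataPrecision, std::size_t width,
                                  std::size_t components) {
  assert(components > 0);
  return width * components * SelectScanlineApi(dataPrecision).sampleBytes;
}
```

Under these assumptions, `RowBufferBytes(12, 4, 3)` yields 24 because each 12-bit sample occupies a two-byte container. Sizing the row buffer from the same dispatch result keeps the buffer and the chosen scanline function in agreement.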

Future work

This is an initial in-tree hack — the intent is to upstream the
JPEGTurboCodec to GDCM proper so ITK's vendored copy can track upstream
without carrying a large patch. A follow-up PR to the GDCM repository is
planned.

Test results (local)

Built and tested against the default preset (Release, x86-64 Linux):

cmake --build --preset default --target gdcmMSFF  # clean build
ctest --preset default -R "itkGDCM.*JPEG|itkGDCMImageReadWrite|itkGDCMImageIOTest" --output-on-failure

All 18 GDCM JPEG tests pass, including:

  • itkGDCMImageReadWriteTest_JPEGBaseline1 (8-bit lossy)
  • itkGDCM_ComplianceTestRGB_losslessJPEG-RGB (lossless SOF3)
  • itkGDCM_ComplianceTestRGB_lossyJPEG-YBR_FULL_422 (YCbCr lossy)
  • All JPEG2000, JPEG-LS, raw, and RLE compliance tests continue to pass

Confirmed gdcmjpeg8 target no longer built (ninja: error: unknown target 'gdcmjpeg8').

AI assistance

This PR was generated by a GitHub Copilot agent (Claude Sonnet 4.6) with
human direction and review.

  • Role: full implementation — API research, codec design, source/CMake
    authoring, build error diagnosis, and test verification
  • Human contribution: architecture decisions (in-tree hack vs upstream
    patch), option naming, build preset selection, code review, and all commits
  • Evidence of testing: see test results above; the agent ran all build
    and ctest commands and iterated on compilation errors (incorrect include
    path for jpegint.h, inverted if/else branch in CMakeLists)

This is an agent-driven initial effort. The code should be reviewed
carefully before merge, particularly the stream source/dest managers and
the multi-precision scanline dispatch paths.

Add JPEGTurboCodec that uses ITK's libjpeg-turbo instead of GDCM's
vendored IJG 6b libraries. Controlled by GDCM_USE_JPEGTURBO (ON by
default). When enabled, a single codec handles 8/12/16-bit JPEG
via runtime precision dispatch, replacing gdcmjpeg8/12/16 libraries.
The old IJG path is preserved as a fallback when the option is OFF.
@github-actions github-actions Bot added the type:Infrastructure (Infrastructure/ecosystem related changes, such as CMake or buildbots), type:Enhancement (Improvement of existing methods or implementation), and area:ThirdParty (Issues affecting the ThirdParty module) labels on Apr 27, 2026
@hjmjohnson
Member

@blowekamp I am supportive of this effort. Ideally, the upstream submission would be done first, and this side-step could be avoided.

I'll monitor this PR and take your guidance on whether to wait for an upstream incorporation or move forward with this temporary incorporation.

@hjmjohnson
Member

@greptile Review this draft.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR replaces GDCM's three vendored IJG 6b JPEG libraries (gdcmjpeg8/12/16) with ITK's existing libjpeg-turbo backend, gated by a new GDCM_USE_JPEGTURBO CMake option (default ON). The new JPEGTurboCodec class handles 8/12/16-bit lossy and lossless JPEG in a single codec using libjpeg-turbo's multi-precision scanline API.

  • P1 – fill_input_buffer returns FALSE at end-of-stream: the early return FALSE when end == pos signals suspension mode to the decompressor instead of inserting a fake EOI marker (FF D9), causing incorrect decode behaviour for JPEG data that exhausts the 4096-byte buffer exactly.
  • P1 – Memory leak under JPEG error in planar encode path: malloc'd tempbuffer in InternalCode's planar-configuration branch is not freed when the JPEG error handler fires longjmp; allocating from the libjpeg pool fixes this.

Confidence Score: 3/5

Not safe to merge as-is: two P1 bugs in the stream I/O and memory management paths of the new codec could cause silent decode failures and memory leaks in production DICOM workloads.

Two independent P1 defects in the core codec path (incorrect EOF handling in the source manager and a longjmp-skipped malloc) lower the ceiling to 4/5, and together with P2 concerns (private header reliance, YCbCr planar config, non-seekable stream breakage, and the INTERNAL cache type blocking opt-out) the score sits at 3/5.

gdcmJPEGTurboCodec.cxx — stream source manager (turbo_fill_input_buffer) and InternalCode planar path require fixes before merge.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Modules/ThirdParty/GDCM/src/gdcm/Source/MediaStorageAndFileFormat/gdcmJPEGTurboCodec.cxx | New 842-line codec; has two P1 bugs: fill_input_buffer returns FALSE on EOF instead of inserting fake EOI, and malloc'd tempbuffer in the planar-encode path leaks under JPEG error/longjmp. Also relies on private jpegint.h and incorrectly sets PlanarConfiguration=1 for YCbCr. |
| Modules/ThirdParty/GDCM/src/gdcm/Source/MediaStorageAndFileFormat/gdcmJPEGTurboCodec.h | Clean header declaring JPEGTurboCodec as a pimpl-based subclass of JPEGCodec; no issues found. |
| Modules/ThirdParty/GDCM/src/gdcm/Source/MediaStorageAndFileFormat/gdcmJPEGCodec.cxx | Dispatcher correctly selects JPEGTurboCodec for bit depths 1–16 under GDCM_USE_JPEGTURBO; guards and fallback path look correct. |
| Modules/ThirdParty/GDCM/src/CMakeLists.txt | Introduces GDCM_USE_JPEGTURBO as CACHE INTERNAL, which silently prevents user opt-out despite the PR describing an opt-out fallback. |
| Modules/ThirdParty/GDCM/src/gdcm/Source/MediaStorageAndFileFormat/CMakeLists.txt | Correctly conditionalises old IJG codec sources vs new TurboCodec source; include_directories and target_link_libraries wiring looks correct. |
| Modules/ThirdParty/GDCM/src/gdcm/CMakeLists.txt | Clears GDCM_LJPEG_LIBRARIES when turbo is active; properly slots into the existing LJPEG/system-LJPEG if/elseif/else chain. |
| Modules/ThirdParty/GDCM/src/gdcm/Utilities/CMakeLists.txt | One-line guard correctly skips building the IJG gdcmjpeg subdir when libjpeg-turbo is used. |
| Modules/ThirdParty/GDCM/itk-module.cmake | ITKJPEG added unconditionally to DEPENDS; correct for the default-ON turbo path, but will declare an unused dependency when GDCM_USE_JPEGTURBO is OFF. |

Comments Outside Diff (1)

  1. Modules/ThirdParty/GDCM/src/CMakeLists.txt, line 22

    P2 CACHE INTERNAL prevents users from opting out of libjpeg-turbo

    set(GDCM_USE_JPEGTURBO ON CACHE INTERNAL ...) hides the variable from the CMake GUI and prevents command-line override (-DGDCM_USE_JPEGTURBO=OFF). The PR description says the old IJG path is preserved as an opt-out fallback, but INTERNAL makes that fallback inaccessible. If opt-out is intended, use BOOL instead:

    set(GDCM_USE_JPEGTURBO ON CACHE BOOL "Use ITK's libjpeg-turbo as the JPEG backend instead of GDCM's IJG 6b")

Reviews (1): Last reviewed commit: "ENH: Add libjpeg-turbo backend for GDCM ..."

Comment on lines +79 to +80
return FALSE; // suspension

Contributor

P1 fill_input_buffer returns FALSE on stream-end instead of fake EOI

When the stream position equals the end position (end == pos), the function returns FALSE directly. In non-suspension libjpeg usage, fill_input_buffer returning FALSE signals suspension mode to the decompressor — it will not insert a synthetic end-of-image marker and the decoder enters an incorrect state. The block below that inserts a fake FF D9 EOI marker is never reached in this case.

The standard approach (matching libjpeg's example.c / the original gdcmJPEGBITSCodec.hxx) is to fall through to the gcount <= 0 branch which emits the fake EOI and returns TRUE. The early return FALSE must be removed.
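
A minimal sketch of the fall-through behaviour described above, written against a stand-in struct so it compiles without libjpeg headers. `SourceSketch` and `FillInputBuffer` are hypothetical names; only the `next_input_byte` and `bytes_in_buffer` fields mirror `jpeg_source_mgr`, and the 4096-byte buffer size is taken from the review summary. It is not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the codec's source manager state.
struct SourceSketch {
  static constexpr std::size_t kBufSize = 4096;
  std::istream *infile = nullptr;
  unsigned char buffer[kBufSize];
  const unsigned char *next_input_byte = nullptr;
  std::size_t bytes_in_buffer = 0;
};

// Refill the buffer and always report success (libjpeg's TRUE).
// An empty read is converted into a synthetic EOI marker (FF D9)
// instead of signalling suspension.
inline bool FillInputBuffer(SourceSketch &src) {
  assert(src.infile != nullptr);
  src.infile->read(reinterpret_cast<char *>(src.buffer),
                   static_cast<std::streamsize>(SourceSketch::kBufSize));
  std::streamsize got = src.infile->gcount();
  if (got <= 0) {
    src.buffer[0] = 0xFF;  // fake end-of-image marker
    src.buffer[1] = 0xD9;
    got = 2;
  }
  src.next_input_byte = src.buffer;
  src.bytes_in_buffer = static_cast<std::size_t>(got);
  return true;
}
```

With a stream holding exactly 4096 bytes, the first call fills the buffer completely and the second call produces the two-byte fake EOI, which is the case the early `return FALSE` mishandles.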

Comment on lines +684 to +700
  row_pointer[0] = tempbuffer;
  int offset = image_height * image_width;
  while (cinfo.next_scanline < cinfo.image_height) {
    gdcm_assert(row_stride % 3 == 0);
    JSAMPLE *ptempbuffer = tempbuffer;
    JSAMPLE *red = image_buffer + cinfo.next_scanline * row_stride / 3;
    JSAMPLE *green = red + offset;
    JSAMPLE *blue = green + offset;
    for (size_t i = 0; i < row_stride / 3; ++i) {
      *ptempbuffer++ = *red++;
      *ptempbuffer++ = *green++;
      *ptempbuffer++ = *blue++;
    }
    jpeg_write_scanlines(&cinfo, row_pointer, 1);
  }
  free(tempbuffer);
}
Contributor

P1 tempbuffer leaked when JPEG error fires longjmp in planar path

tempbuffer is allocated via malloc inside the planar-configuration branch of InternalCode. The setjmp recovery block calls jpeg_destroy_compress and returns false, but never frees tempbuffer. If jpeg_write_scanlines triggers an error that fires longjmp, this allocation leaks.

Replace malloc/free with a libjpeg pool allocation so it is owned by the compress struct:

JSAMPLE *tempbuffer = (JSAMPLE *)(*cinfo.mem->alloc_small)(
    (j_common_ptr)&cinfo, JPOOL_IMAGE, row_stride * sizeof(JSAMPLE));
// remove the free(tempbuffer) call below
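
Why pool ownership closes the leak can be shown with a self-contained model of the pattern. Everything here is a hypothetical stand-in: `CodecSketch` plays the role of the compress struct, `PoolAlloc` the role of `alloc_small`, `Destroy` the role of `jpeg_destroy_compress`, and `WriteRowOrFail` the role of a `jpeg_write_scanlines` call that hits a fatal error. The `g_liveBlocks` counter exists only to make the effect observable.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstddef>
#include <cstdlib>

// Hypothetical miniature of a libjpeg-style context: it owns every block
// handed out by PoolAlloc, so a single Destroy call releases them all.
struct CodecSketch {
  static constexpr int kMaxBlocks = 8;
  void *blocks[kMaxBlocks] = {};
  int count = 0;
  std::jmp_buf onError;
};

static int g_liveBlocks = 0;  // instrumentation only

inline void *PoolAlloc(CodecSketch &c, std::size_t bytes) {
  assert(c.count < CodecSketch::kMaxBlocks);
  void *p = std::malloc(bytes);
  if (p) {
    c.blocks[c.count++] = p;
    ++g_liveBlocks;
  }
  return p;
}

inline void Destroy(CodecSketch &c) {
  for (int i = 0; i < c.count; ++i) {
    std::free(c.blocks[i]);
    --g_liveBlocks;
  }
  c.count = 0;
}

// Stand-in for a library call that reports a fatal error via longjmp.
inline void WriteRowOrFail(CodecSketch &c, bool fail) {
  if (fail) std::longjmp(c.onError, 1);
}

// The context is owned by the caller, so it is not a local of the
// function that calls setjmp.
inline bool EncodePlanar(CodecSketch &c, bool injectError) {
  if (setjmp(c.onError)) {
    Destroy(c);  // error path: the row buffer is released with the pool
    return false;
  }
  void *row = PoolAlloc(c, 3 * 64);  // temporary interleaved row
  if (!row) {
    Destroy(c);
    return false;
  }
  WriteRowOrFail(c, injectError);
  Destroy(c);
  return true;
}
```

In this model both the success path and the injected-error path leave `g_liveBlocks` at zero, because the error handler never needs to know about the temporary row pointer.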

Comment on lines +75 to +87
std::streampos end = src->infile->seekg(0, std::ios::end).tellg();
src->infile->seekg(pos, std::ios::beg);

if (end == pos)
  return FALSE; // suspension

size_t toread = TURBO_INPUT_BUF_SIZE;
if ((end - pos) < (std::streamoff)TURBO_INPUT_BUF_SIZE)
  toread = (size_t)(end - pos);

src->infile->read((char *)src->buffer, toread);
std::streamsize gcount = src->infile->gcount();

Contributor

P2 seekg-based size probe in fill_input_buffer is expensive and breaks non-seekable streams

Seeking to end-of-stream on every buffer-fill call is expensive and silently fails for pipes, network streams, or std::istringstream-derived buffers that don't support seekg. The libjpeg convention (example.c, jdatasrc.c) is to unconditionally call read(BUFSIZE), then handle a short read via the gcount <= 0 path that already exists below.
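
The failure mode on forward-only streams can be reproduced with the standard library alone. In this sketch `ForwardOnlyBuf` is a hypothetical buffer that does not override `seekoff`/`seekpos`, so it inherits the `std::streambuf` defaults that report failure, roughly as a pipe wrapper might. `ProbeSizeBySeeking` mimics the probe in the quoted lines and `ReadChunk` mimics the read-then-check-`gcount()` convention; neither name comes from the PR.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical forward-only buffer: serves bytes from a string but keeps
// the default (failing) seek behaviour of std::streambuf.
class ForwardOnlyBuf : public std::streambuf {
public:
  explicit ForwardOnlyBuf(std::string data) : data_(std::move(data)) {
    char *b = &data_[0];
    setg(b, b, b + data_.size());
  }

private:
  std::string data_;
};

// The probe pattern: seek to the end, note the position, seek back.
// Returns false when the stream cannot answer, leaving failbit set.
inline bool ProbeSizeBySeeking(std::istream &in) {
  const std::streampos pos = in.tellg();
  const std::streampos end = in.seekg(0, std::ios::end).tellg();
  in.seekg(pos, std::ios::beg);
  return !in.fail() && end != std::streampos(-1);
}

// The alternative: read unconditionally and let gcount() report how
// many bytes actually arrived (0 at end of stream).
inline std::streamsize ReadChunk(std::istream &in, char *dst,
                                 std::streamsize n) {
  assert(dst != nullptr && n >= 0);
  in.read(dst, n);
  return in.gcount();
}
```

On a seekable `std::istringstream` the probe succeeds, while on `ForwardOnlyBuf` it fails and leaves the stream in a failed state. `ReadChunk` works on both and returns a short count, then zero, as the data runs out.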

Comment on lines +357 to +359
} else if (cinfo.jpeg_color_space == JCS_CMYK) {
  gdcm_assert(cinfo.num_components == 4);
  PI = PhotometricInterpretation::CMYK;
Contributor

P2 YCbCr path unconditionally sets PlanarConfiguration = 1

Standard baseline JPEG stores YCbCr components interleaved (PlanarConfiguration = 0). Setting this->PlanarConfiguration = 1 tells GDCM the pixel data is planar-separated, which would cause callers to de-interleave pixel data incorrectly. The original gdcmJPEGBITSCodec.hxx does not set PlanarConfiguration to 1 for this case. Verify whether this is intentional and if not, remove or guard it.
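
To make the consequence concrete, here is a small sketch of the two layouts for three-component 8-bit data. `PlanarToInterleaved` is a hypothetical helper, not a GDCM function; it converts a PlanarConfiguration = 1 buffer (all of plane 0, then plane 1, then plane 2) into a PlanarConfiguration = 0 buffer (components interleaved per pixel).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert plane-separated samples (C0 C0 ... C1 C1 ... C2 C2 ...)
// into pixel-interleaved samples (C0 C1 C2 C0 C1 C2 ...).
inline std::vector<std::uint8_t> PlanarToInterleaved(
    const std::vector<std::uint8_t> &planar, std::size_t pixels) {
  assert(planar.size() == 3 * pixels);
  std::vector<std::uint8_t> out(planar.size());
  for (std::size_t i = 0; i < pixels; ++i)
    for (std::size_t c = 0; c < 3; ++c)
      out[3 * i + c] = planar[c * pixels + i];
  return out;
}
```

For two pixels, the planar buffer `{1, 2, 10, 20, 100, 200}` becomes `{1, 10, 100, 2, 20, 200}`. If a buffer that is already interleaved is passed through the same conversion because it was labelled planar, `{1, 10, 100, 2, 20, 200}` comes out as `{1, 100, 20, 10, 2, 200}`, which is the kind of scrambling a wrong PlanarConfiguration value would cause downstream.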

extern "C" {
#include "itk_jpeg.h"
#include <itkjpeg-turbo/jpegint.h> // for cinfo.master->lossless
}
Contributor

P2 Dependency on private libjpeg-turbo internal header jpegint.h

#include <itkjpeg-turbo/jpegint.h> pulls in an internal, unsupported header to access cinfo->master->lossless. This field is not part of the public API and can be removed or renamed in future libjpeg-turbo releases without notice, silently breaking the build. A more stable alternative is to check cinfo.process == JPROC_LOSSLESS if available through public headers.
