Bug #1004: dcm2json produces invalid UTF-8 output for some incorrect DICOM files - DCMTK - OFFIS DCMTK and DICOM Projects


Added by Marco Eichelberg over 4 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: Marco Eichelberg
Category: Library and Apps
Target version: 3.6.7
Start date: 2021-08-27
Due date: (not set)
% Done: 100%
Estimated time: 1:00 h
Module: dcmdata
Operating System: (not set)
Compiler: (not set)

Description

The JSON specification requires JSON text to be encoded in UTF-8, so dcm2json converts DICOM datasets to UTF-8 before writing them as JSON.
However, DICOM files that do not contain (0008,0005) SpecificCharacterSet but do contain extended characters are currently copied to JSON unchanged, which can produce invalid UTF-8.
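To illustrate what "invalid UTF-8" means here: a byte such as 0xFC (the letter ü in ISO 8859-1, used purely as an example) copied unchanged into the output cannot start any UTF-8 sequence. The following is a minimal, standalone C++ sketch of a UTF-8 well-formedness check following the rules of RFC 3629. The function name isValidUtf8 is hypothetical and is not part of DCMTK.

```cpp
#include <cstddef>
#include <string>

// Sketch only (hypothetical helper, not a DCMTK API): returns true if
// 'bytes' is well-formed UTF-8 as defined in RFC 3629.
bool isValidUtf8(const std::string& bytes)
{
    // Smallest code point that may legally use a 2-, 3- or 4-byte sequence;
    // anything lower is an overlong encoding.
    static const unsigned long minCodePoint[5] = {0, 0, 0x80, 0x800, 0x10000};

    std::size_t i = 0;
    const std::size_t n = bytes.size();
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (lead < 0x80) {                  // plain 7-bit ASCII byte
            ++i;
            continue;
        } else if ((lead & 0xE0) == 0xC0) { // 110xxxxx: 2-byte sequence
            len = 2; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx: 3-byte sequence
            len = 3; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) { // 11110xxx: 4-byte sequence
            len = 4; cp = lead & 0x07;
        } else {
            return false;                   // stray continuation byte or 0xF8..0xFF
        }
        if (len > n - i) return false;      // sequence truncated at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80) return false; // must be 10xxxxxx
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp < minCodePoint[len]) return false;       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
        i += len;
    }
    return true;
}
```

With this check, "M\xC3\xBCller" (ü encoded as UTF-8) is accepted, while "M\xFCller" (the same name with a raw ISO 8859-1 byte) is rejected.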

dcm2json should check for and report this case, as dcm2xml and dcmconv +U8 already do.
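The check requested above can be sketched as follows. The sketch assumes that a missing (0008,0005) or the value "ISO_IR 6" means the DICOM default repertoire (7-bit ASCII), in which no byte above 0x7F is defined. The helper names are hypothetical, the attribute value is assumed to be already trimmed of DICOM padding, and multi-valued or ISO 2022 code extension cases are ignored; this is not DCMTK's actual implementation.

```cpp
#include <string>

// Hypothetical helper (not a DCMTK API): true if the value contains any byte
// outside the 7-bit ASCII range 0x00..0x7F.
bool hasExtendedCharacters(const std::string& value)
{
    for (unsigned char c : value)
        if (c > 0x7F) return true;
    return false;
}

// Hypothetical check: with no SpecificCharacterSet (empty string) or with
// "ISO_IR 6", only ASCII is defined, so extended characters cannot be mapped
// to UTF-8 reliably and should be reported instead of being passed through.
// Assumes a single, already trimmed SpecificCharacterSet value.
bool canConvertToUtf8(const std::string& specificCharacterSet, const std::string& value)
{
    const bool defaultRepertoire =
        specificCharacterSet.empty() || specificCharacterSet == "ISO_IR 6";
    return !(defaultRepertoire && hasExtendedCharacters(value));
}
```

In this sketch a caller would emit an error or warning whenever canConvertToUtf8 returns false, rather than writing the raw bytes to the JSON output.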

Furthermore, DICOM files not containing (0008,0005) SpecificCharacterSet (or containing the value "ISO_IR 6") should be written to JSON without setting SpecificCharacterSet to ISO_IR 192.
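The rule for (0008,0005) in the output can be expressed as a small decision function. This is a hypothetical sketch, not DCMTK code, and again assumes a single, already trimmed value: the default repertoire is left as it is, because ASCII data is already valid UTF-8, while any other declared character set is converted and is therefore written as ISO_IR 192.

```cpp
#include <string>

// Hypothetical helper (not a DCMTK API): returns the SpecificCharacterSet
// value to write into the JSON output, given the value in the source dataset.
// An empty string means "attribute absent" on input and "do not write it" on output.
std::string jsonSpecificCharacterSet(const std::string& sourceValue)
{
    // Absent or default repertoire: plain ASCII is already valid UTF-8,
    // so keep the original state instead of declaring ISO_IR 192.
    if (sourceValue.empty() || sourceValue == "ISO_IR 6")
        return sourceValue;
    // Any other character set: values are converted to UTF-8 before writing.
    return "ISO_IR 192";
}
```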

Reported 2021-08-26 by Mathieu Malaterre <mathieu.malaterre@gmail.com>.

Files

badUnc.dcm (513 KB): Example file containing extended characters but no SpecificCharacterSet. Added by Marco Eichelberg, 2021-08-27 13:00.


Updated by Marco Eichelberg over 4 years ago

Status changed from New to Closed
% Done changed from 0 to 100
Estimated time set to 1:00 h

Closed by commit #92da003ff.
