Enhanced character set support
Beginning with DCMTK 3.6.1 (20111208), support for handling enhanced character sets has been added to the toolkit. Also see this blog post for details.
The current status of implementation (as of 2013-07-26) is as follows:
- The OFCharacterEncoding class provides a wrapper around libiconv and allows for converting between different character encodings, e.g. from ISO 8859-1 (Latin-1) to UTF-8.
  - In addition, this class provides an interface for using the Windows-specific conversion mechanism to and from wide character encoding (UTF-16) -- without requiring libiconv.
- The DcmSpecificCharacterSet class builds on top of OFCharacterEncoding and allows for converting between different DICOM character sets.
  - For the input, all currently defined DICOM character sets are supported (including code extensions according to ISO 2022 and multi-byte character sets).
  - For the output, support is limited to character sets without code extensions, e.g. "ISO_IR 100" for ISO 8859-1 (Latin-1) or "ISO_IR 192" for Unicode in UTF-8.
- The DICOM data structure classes DcmFileFormat, DcmItem and DcmDirectoryRecord allow for converting all element values that are contained in this "dataset" and that are affected by Specific Character Set (0008,0005) to a given destination character set. This also includes updating the value of the data element Specific Character Set (0008,0005).
- The command line classes OFCommandLine and OFConsoleApplication have been extended by supporting wide character strings on Windows systems.
- The command line tools dcmdump and dcmconv have options for converting all element values that are affected by Specific Character Set (0008,0005) to e.g. UTF-8.
  - For example, the dcmdump option --convert-to-utf8 (+U8) can be used to display any DICOM encoded string in a dataset, such as an Asian patient name, on a UTF-8 console.
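To make the ISO 8859-1 (Latin-1) to UTF-8 case mentioned above more concrete, here is a minimal, hand-written sketch of what such a conversion does at the byte level. It does not use DCMTK or libiconv; the function name latin1ToUtf8 is hypothetical and only serves as an illustration. It relies on the fact that the Latin-1 byte values 0x80 to 0xFF correspond directly to the Unicode code points U+0080 to U+00FF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (NOT part of DCMTK): re-encode an ISO 8859-1 (Latin-1)
// byte string as UTF-8.
std::string latin1ToUtf8(const std::string &input)
{
    std::string output;
    output.reserve(input.size() * 2);  // worst case: every byte becomes two
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        const unsigned char byte = static_cast<unsigned char>(input[i]);
        if (byte < 0x80)
        {
            // 7-bit ASCII is encoded identically in both character sets
            output.push_back(static_cast<char>(byte));
        }
        else
        {
            // Latin-1 bytes 0x80..0xFF map to U+0080..U+00FF, which UTF-8
            // encodes as two bytes of the form 110xxxxx 10xxxxxx
            output.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            output.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return output;
}
```

For instance, the Latin-1 byte 0xFC ("ü") becomes the two UTF-8 bytes 0xC3 0xBC. Character sets that use code extensions or multi-byte encodings need considerably more logic than this, which is presumably why the toolkit delegates the work to libiconv.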
What's still missing is the following:
- The various getXXX() methods of the dcmdata classes do not support the character set conversion of the returned value, i.e. the "dataset" in which this element is contained has to be converted to the destination character set before calling the getXXX() method.
- The various setXXX() methods in this module also do not (yet) support character set conversion, i.e. the passed string value has to be encoded in the expected character set.
- All command line tools need to be extended by also supporting wide character strings (on Windows systems) as input, i.e. using wmain() instead of main() based on the DCMTK_MAIN_FUNCTION macro.
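The remark about the setXXX() methods means that the caller is responsible for the encoding of any string that is passed in. As an illustration only, the following sketch assumes that the application holds its strings in UTF-8 while the dataset uses "ISO_IR 100" (ISO 8859-1). The helper utf8ToLatin1 is hypothetical and not part of DCMTK; in a real application, OFCharacterEncoding would be the more natural choice for this step.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (NOT part of DCMTK): re-encode a UTF-8 string as
// ISO 8859-1 (Latin-1). Throws std::invalid_argument if the input contains
// a character outside U+0000..U+00FF or a malformed byte sequence.
std::string utf8ToLatin1(const std::string &input)
{
    std::string output;
    output.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        const unsigned char lead = static_cast<unsigned char>(input[i]);
        if (lead < 0x80)
        {
            // 7-bit ASCII is copied unchanged
            output.push_back(static_cast<char>(lead));
        }
        else if ((lead == 0xC2 || lead == 0xC3) && i + 1 < input.size()
                 && (static_cast<unsigned char>(input[i + 1]) & 0xC0) == 0x80)
        {
            // lead bytes 0xC2 and 0xC3 cover exactly U+0080..U+00FF
            const unsigned char trail = static_cast<unsigned char>(input[++i]);
            output.push_back(static_cast<char>(((lead & 0x03) << 6) | (trail & 0x3F)));
        }
        else
        {
            throw std::invalid_argument("string cannot be represented in ISO 8859-1");
        }
    }
    return output;
}
```

The resulting byte string could then be handed to a setXXX() method of a dataset whose Specific Character Set (0008,0005) is "ISO_IR 100". A string that does not fit the target character set, such as a Japanese name, is rejected here rather than silently damaged.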