DCMTK  Version 3.7.0
OFFIS DICOM Toolkit
DcmSpecificCharacterSet Class Reference

A class for managing and converting between different DICOM character sets.

Public Member Functions

 DcmSpecificCharacterSet ()
 constructor.
 
virtual ~DcmSpecificCharacterSet ()
 destructor
 
virtual void clear ()
 clear the internal state.
 
virtual operator OFBool () const
 query whether selectCharacterSet() has successfully been called for this object, i.e. whether convertString() may be called.
 
virtual OFBool operator! () const
 query whether selectCharacterSet() has not been called before, i.e. convertString() would fail.
 
virtual const OFString & getSourceCharacterSet () const
 get currently selected source DICOM character set(s).
 
virtual const OFString & getDestinationCharacterSet () const
 get currently selected destination DICOM character set.
 
virtual const OFString & getDestinationEncoding () const
 get currently selected destination encoding, i.e. the name of the character set as used by the underlying character encoding library for the conversion.
 
virtual unsigned getConversionFlags () const
 get flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion.
 
virtual OFCondition setConversionFlags (const unsigned flags)
 set flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion.
 
virtual OFCondition selectCharacterSet (const OFString &fromCharset, const OFString &toCharset="ISO_IR 192")
 select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert.
 
virtual OFCondition selectCharacterSet (DcmItem &dataset, const OFString &toCharset="ISO_IR 192")
 select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert.
 
virtual OFCondition convertString (const OFString &fromString, OFString &toString, const OFString &delimiters="")
 convert the given string from the selected source character set(s) to the selected destination character set.
 
virtual OFCondition convertString (const char *fromString, const size_t fromLength, OFString &toString, const OFString &delimiters="")
 convert the given string from the selected source character set(s) to the selected destination character set.
 

Static Public Member Functions

static OFBool isConversionAvailable ()
 check whether the underlying character set conversion library is available.
 
static size_t countCharactersInUTF8String (const OFString &utf8String)
 count characters in given UTF-8 string and return the resulting number of so-called "code points".
 

Protected Member Functions

virtual OFCondition determineDestinationEncoding (const OFString &toCharset)
 determine the destination character encoding (as used by the underlying character encoding library) from the given DICOM defined term (specific character set), and set the member variables accordingly.
 
virtual OFCondition selectCharacterSetWithoutCodeExtensions ()
 select a particular DICOM character set without code extensions for subsequent conversions.
 
virtual OFCondition selectCharacterSetWithCodeExtensions (const unsigned long sourceVM)
 select a particular DICOM character set with code extensions for subsequent conversions.
 
virtual OFCondition convertStringWithoutCodeExtensions (const char *fromString, const size_t fromLength, OFString &toString, const OFString &delimiters)
 convert the given string from the selected source character set (without code extensions) to the selected destination character set
 
virtual OFCondition convertStringWithCodeExtensions (const char *fromString, const size_t fromLength, OFString &toString, const OFString &delimiters, const OFBool hasEscapeChar)
 convert the given string from the selected source character set(s) to the selected destination character set.
 
virtual OFBool checkForEscapeCharacter (const char *strValue, const size_t strLength) const
 check whether the given string contains at least one escape character (ESC), because it is used for code extension techniques like ISO 2022
 
virtual OFString convertToLengthLimitedOctalString (const char *strValue, const size_t strLength) const
 convert given string to octal format, i.e. all non-ASCII and control characters are converted to their octal representation.
 

Private Types

typedef OFMap< OFString, OFCharacterEncoding > T_EncodingConvertersMap
 type definition of a map storing the identifier (key) of a character set and the associated character set converter
 

Private Attributes

OFString SourceCharacterSet
 selected source character set(s) based on one or more DICOM defined terms
 
OFString DestinationCharacterSet
 selected destination character set based on a single DICOM defined term
 
OFString DestinationEncoding
 selected destination encoding based on names supported by the underlying character encoding library
 
OFCharacterEncoding DefaultEncodingConverter
 character encoding converter
 
T_EncodingConvertersMap EncodingConverters
 map of character set conversion descriptors (only used if multiple character sets are needed)
 

Detailed Description

A class for managing and converting between different DICOM character sets.

The conversion relies on the OFCharacterEncoding class, which again relies on an underlying character encoding library (e.g. oficonv or libiconv).

Note
A current limitation is that only a single value is allowed for the destination character set (i.e. no code extensions). For the source character set, multiple values are supported.

Constructor & Destructor Documentation

◆ DcmSpecificCharacterSet()

DcmSpecificCharacterSet::DcmSpecificCharacterSet ( )

constructor.

Initializes the member variables.

Member Function Documentation

◆ checkForEscapeCharacter()

virtual OFBool DcmSpecificCharacterSet::checkForEscapeCharacter ( const char *  strValue,
const size_t  strLength 
) const
protected virtual

check whether the given string contains at least one escape character (ESC), because it is used for code extension techniques like ISO 2022

Parameters
strValue  input string to be checked for any escape character
strLength  length of the input string
Returns
OFTrue if an escape character has been found, OFFalse otherwise
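
The check described above can be illustrated with a minimal standalone sketch. The function name `containsEscape` is hypothetical and this is not DCMTK's implementation; it only assumes that ESC is the byte 0x1B and that the buffer is described by a pointer and an explicit length, so embedded NUL bytes do not end the search early.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of DCMTK): returns true if the buffer
// contains at least one ESC byte (0x1B), as used by ISO 2022 escape sequences.
bool containsEscape(const char *strValue, std::size_t strLength)
{
    // A null pointer is only acceptable for an empty buffer.
    assert(strValue != nullptr || strLength == 0);
    if (strLength == 0)
        return false;
    // memchr scans exactly strLength bytes and does not stop at NUL bytes.
    return std::memchr(strValue, 0x1B, strLength) != nullptr;
}
```

A caller could use such a check to decide whether the more expensive code-extension path is needed at all for a given value.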

◆ clear()

virtual void DcmSpecificCharacterSet::clear ( )
virtual

clear the internal state.

This also forgets about the currently selected character sets, so selectCharacterSet() has to be called again before a string can be converted with convertString().

◆ convertString() [1/2]

virtual OFCondition DcmSpecificCharacterSet::convertString ( const char *  fromString,
const size_t  fromLength,
OFString &  toString,
const OFString &  delimiters = "" 
)
virtual

convert the given string from the selected source character set(s) to the selected destination character set.

That means selectCharacterSet() has to be called prior to this method. Since the length of the input string has to be specified explicitly, the string can contain more than one NULL byte.

Note
The conversion code does not perform a thorough validation of the string. For example, characters that are permitted in the source character set but forbidden in DICOM (such as byte positions 0x80-0x9F in ISO_IR 100) may be converted without warning or error.
Parameters
fromString  input string to be converted (using the currently selected source character set)
fromLength  length of the input string (number of bytes without the trailing NULL byte)
toString  reference to variable where the converted string (using the currently selected destination character set) is stored
delimiters  optional string of characters that are regarded as delimiters, i.e. when found the character set is switched back to the default. CR, LF, FF and HT are always regarded as delimiters (see DICOM PS 3.5).
Returns
status, EC_Normal if successful, an error code otherwise
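
The role of the 'delimiters' parameter can be sketched without DCMTK. The helpers `isDelimiter` and `splitAtDelimiters` below are hypothetical names introduced for illustration only; they show how a length-delimited buffer might be cut into segments at CR, LF, FF, HT and any caller-supplied delimiter, which are the points where a converter would switch back to the default character set. The actual DCMTK implementation may be structured quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true for the four control characters that are always
// delimiters (CR, LF, FF, HT) and for any character listed in 'delimiters'.
bool isDelimiter(char c, const std::string &delimiters)
{
    if (c == '\r' || c == '\n' || c == '\f' || c == '\t')
        return true;
    return delimiters.find(c) != std::string::npos;
}

// Hypothetical helper: splits the buffer into runs of non-delimiter bytes,
// keeping every delimiter as its own one-character segment.
std::vector<std::string> splitAtDelimiters(const char *data, std::size_t length,
                                           const std::string &delimiters)
{
    assert(data != nullptr || length == 0);
    std::vector<std::string> parts;
    std::string current;
    for (std::size_t i = 0; i < length; ++i)
    {
        if (isDelimiter(data[i], delimiters))
        {
            if (!current.empty())
                parts.push_back(current);
            parts.push_back(std::string(1, data[i]));
            current.clear();
        }
        else
            current += data[i];
    }
    if (!current.empty())
        parts.push_back(current);
    return parts;
}
```

With `"^"` passed as delimiter, the eight bytes of `Doe^John` yield the three segments `Doe`, `^` and `John`; with an empty delimiter string the same input stays in one segment.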

◆ convertString() [2/2]

virtual OFCondition DcmSpecificCharacterSet::convertString ( const OFString &  fromString,
OFString &  toString,
const OFString &  delimiters = "" 
)
virtual

convert the given string from the selected source character set(s) to the selected destination character set.

That means selectCharacterSet() has to be called prior to this method.

Note
The conversion code does not perform a thorough validation of the string. For example, characters that are permitted in the source character set but forbidden in DICOM (such as byte positions 0x80-0x9F in ISO_IR 100) may be converted without warning or error.
Parameters
fromString  input string to be converted (using the currently selected source character set)
toString  reference to variable where the converted string (using the currently selected destination character set) is stored
delimiters  optional string of characters that are regarded as delimiters, i.e. when found the character set is switched back to the default. CR, LF, FF and HT are always regarded as delimiters (see DICOM PS 3.5).
Returns
status, EC_Normal if successful, an error code otherwise

◆ convertStringWithCodeExtensions()

virtual OFCondition DcmSpecificCharacterSet::convertStringWithCodeExtensions ( const char *  fromString,
const size_t  fromLength,
OFString &  toString,
const OFString &  delimiters,
const OFBool  hasEscapeChar 
)
protected virtual

convert the given string from the selected source character set(s) to the selected destination character set.

This method supports code extension techniques according to ISO 2022 for the input string.

Parameters
fromString  input string to be converted
fromLength  length of the input string (in bytes)
toString  reference to variable where to store the converted string
delimiters  string of characters regarded as delimiters
hasEscapeChar  flag indicating whether the input string contains one or more escape characters (ESC)
Returns
status, EC_Normal if successful, an error code otherwise

◆ convertStringWithoutCodeExtensions()

virtual OFCondition DcmSpecificCharacterSet::convertStringWithoutCodeExtensions ( const char *  fromString,
const size_t  fromLength,
OFString &  toString,
const OFString &  delimiters 
)
protected virtual

convert the given string from the selected source character set (without code extensions) to the selected destination character set

Parameters
fromString  input string to be converted
fromLength  length of the input string (in bytes)
toString  reference to variable where to store the converted string
delimiters  string of characters regarded as delimiters
Returns
status, EC_Normal if successful, an error code otherwise

◆ convertToLengthLimitedOctalString()

virtual OFString DcmSpecificCharacterSet::convertToLengthLimitedOctalString ( const char *  strValue,
const size_t  strLength 
) const
protected virtual

convert given string to octal format, i.e. all non-ASCII and control characters are converted to their octal representation.

The total length of the string is always limited to a particular maximum (see implementation). If the converted string would be longer, it is cropped and "..." is appended to indicate this cropping.

Parameters
strValue  input string to be converted and possibly cropped
strLength  length of the input string
Returns
resulting string in octal format
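
The following standalone sketch illustrates the idea of a length-limited octal rendering. It is not DCMTK's implementation: the function name `toLengthLimitedOctal`, the `maxLength` parameter with its default of 32, the backslash-plus-three-digits escape format, and the choice to count the "..." marker towards the limit are all assumptions made for illustration, since this page only refers to the implementation for the actual maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical sketch (not DCMTK code). Printable ASCII bytes are copied,
// control and non-ASCII bytes are written as a backslash followed by three
// octal digits. The result never exceeds 'maxLength' characters.
std::string toLengthLimitedOctal(const char *strValue, std::size_t strLength,
                                 std::size_t maxLength = 32)
{
    assert(strValue != nullptr || strLength == 0);
    assert(maxLength >= 3);  // room for the "..." marker
    std::string result;
    for (std::size_t i = 0; i < strLength && result.length() <= maxLength; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(strValue[i]);
        if (c < 0x20 || c >= 0x7F)
        {
            char buffer[5];  // backslash, three digits, terminating NUL
            std::snprintf(buffer, sizeof(buffer), "\\%03o", static_cast<unsigned>(c));
            result += buffer;
        }
        else
            result += static_cast<char>(c);
    }
    if (result.length() > maxLength)
    {
        // Crop and mark the cropping; this may cut an escape in the middle.
        result.erase(maxLength - 3);
        result += "...";
    }
    return result;
}
```

Such a rendering is mainly useful for log and error messages, where raw control or non-ASCII bytes would otherwise corrupt the output.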

◆ countCharactersInUTF8String()

static size_t DcmSpecificCharacterSet::countCharactersInUTF8String ( const OFString &  utf8String)
static

count characters in given UTF-8 string and return the resulting number of so-called "code points".

Please note that invalid UTF-8 encodings are not handled properly. ASCII strings (7-bit) are also supported, although OFString::length() is probably much faster.

Parameters
utf8String  valid character string with UTF-8 encoding
Returns
number of characters (code points) in given UTF-8 string
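
One common way to count code points in valid UTF-8 is to count every byte that is not a continuation byte (bit pattern 10xxxxxx). The sketch below uses `std::string` and the hypothetical name `countCodePoints`; it is an illustration of that technique under the stated assumption of valid input, not a copy of the DCMTK code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch (not DCMTK code): counts code points in a valid
// UTF-8 string. Invalid sequences are not detected, only miscounted.
std::size_t countCodePoints(const std::string &utf8String)
{
    std::size_t count = 0;
    for (const char ch : utf8String)
    {
        // Continuation bytes look like 10xxxxxx; any other byte starts
        // a new code point (ASCII or the lead byte of a sequence).
        if ((static_cast<unsigned char>(ch) & 0xC0) != 0x80)
            ++count;
    }
    // A code point occupies at least one byte.
    assert(count <= utf8String.length());
    return count;
}
```

For pure 7-bit ASCII input the result equals the byte length, which is why a plain length query is the cheaper choice in that case.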

◆ determineDestinationEncoding()

virtual OFCondition DcmSpecificCharacterSet::determineDestinationEncoding ( const OFString &  toCharset)
protected virtual

determine the destination character encoding (as used by the underlying character encoding library) from the given DICOM defined term (specific character set), and set the member variables accordingly.

Parameters
toCharset  name of the destination character set used for the output string
Returns
status, EC_Normal if successful, an error code otherwise

◆ getConversionFlags()

virtual unsigned DcmSpecificCharacterSet::getConversionFlags ( ) const
virtual

get flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion.

Note
This method will always return 0 if no encoder was selected using selectEncoding() before calling it.
Returns
a combination of the IllegalSequenceMode constants (bitwise or) that is currently set or 0 if the current mode cannot be determined.

◆ getDestinationCharacterSet()

virtual const OFString& DcmSpecificCharacterSet::getDestinationCharacterSet ( ) const
virtual

get currently selected destination DICOM character set.

Please note that the returned string, which contains a defined term, is always normalized, i.e. leading and trailing spaces have been removed.

Returns
currently selected destination DICOM character set or an empty string if none is selected (identical to ASCII, which is the default)

◆ getDestinationEncoding()

virtual const OFString& DcmSpecificCharacterSet::getDestinationEncoding ( ) const
virtual

get currently selected destination encoding, i.e. the name of the character set as used by the underlying character encoding library for the conversion.

If code extension techniques are used to switch between different character encodings, the main/default encoding is returned.

Returns
currently selected destination encoding or an empty string if none is selected

◆ getSourceCharacterSet()

virtual const OFString& DcmSpecificCharacterSet::getSourceCharacterSet ( ) const
virtual

get currently selected source DICOM character set(s).

Please note that the returned string can contain multiple values (defined terms separated by a backslash) if code extension techniques are used. Furthermore, the returned string is always normalized, i.e. leading and trailing spaces have been removed.

Returns
currently selected source DICOM character set(s) or an empty string if none is selected (identical to ASCII, which is the default)

◆ isConversionAvailable()

static OFBool DcmSpecificCharacterSet::isConversionAvailable ( )
static

check whether the underlying character set conversion library is available.

If not, no conversion between different character sets will be possible.

Returns
OFTrue if the character set conversion is available, OFFalse otherwise

◆ operator OFBool()

virtual DcmSpecificCharacterSet::operator OFBool ( ) const
virtual

query whether selectCharacterSet() has successfully been called for this object, i.e. whether convertString() may be called.

Returns
OFTrue if selectCharacterSet() was successfully called before, OFFalse if not (or clear() has been called in the meantime).

◆ operator!()

virtual OFBool DcmSpecificCharacterSet::operator! ( ) const
virtual

query whether selectCharacterSet() has not been called before, i.e. convertString() would fail.

Returns
OFTrue if selectCharacterSet() must be called before using convertString(), OFFalse if it has already been called.

◆ selectCharacterSet() [1/2]

virtual OFCondition DcmSpecificCharacterSet::selectCharacterSet ( const OFString &  fromCharset,
const OFString &  toCharset = "ISO_IR 192" 
)
virtual

select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert.

The defined terms for a particular character set can be found in the DICOM standard, e.g. "ISO_IR 100" for ISO 8859-1 (Latin 1) or "ISO_IR 192" for Unicode in UTF-8. An empty string denotes the default character repertoire, which is ASCII (7-bit). If multiple values are given for 'fromCharset' (separated by a backslash), code extension techniques are used and escape sequences may be encountered in the source string to switch between the specified character sets.

Parameters
fromCharset  name of the source character set(s) used for the input string as given in the DICOM attribute Specific Character Set (0008,0005). Leading and trailing spaces are removed automatically (if present).
toCharset  name of the destination character set used for the output string. Only a single value is permitted (no code extensions). Leading and trailing spaces are removed automatically (if present). The default value is "ISO_IR 192" (Unicode in UTF-8).
Returns
status, EC_Normal if successful, an error code otherwise
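
How a multi-valued 'fromCharset' breaks down into individual defined terms can be shown with a small standalone sketch. The helper names `trimSpaces` and `splitDefinedTerms` are hypothetical, and trimming each value separately is an assumption; the text above only states that leading and trailing spaces of the given string are removed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: removes leading and trailing space characters.
std::string trimSpaces(const std::string &value)
{
    const std::size_t first = value.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string();
    const std::size_t last = value.find_last_not_of(' ');
    return value.substr(first, last - first + 1);
}

// Hypothetical helper: splits a Specific Character Set value at each
// backslash. An empty term stands for the default repertoire (ASCII).
std::vector<std::string> splitDefinedTerms(const std::string &fromCharset)
{
    std::vector<std::string> terms;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t pos = fromCharset.find('\\', start);
        const std::size_t len =
            (pos == std::string::npos) ? std::string::npos : pos - start;
        terms.push_back(trimSpaces(fromCharset.substr(start, len)));
        if (pos == std::string::npos)
            break;
        start = pos + 1;
    }
    // Even an empty input yields one (empty) term.
    assert(!terms.empty());
    return terms;
}
```

A result with more than one term corresponds to the case where code extension techniques apply and escape sequences may appear in the source string.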

◆ selectCharacterSet() [2/2]

virtual OFCondition DcmSpecificCharacterSet::selectCharacterSet ( DcmItem &  dataset,
const OFString &  toCharset = "ISO_IR 192" 
)
virtual

select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert.

The source character set is determined from the DICOM element Specific Character Set (0008,0005) stored in the given dataset/item. The defined terms for the destination character set can be found in the DICOM standard, e.g. "ISO_IR 100" for ISO 8859-1 (Latin 1) or "ISO_IR 192" for Unicode in UTF-8. An empty string denotes the default character repertoire, which is ASCII (7-bit). If multiple values are found in the Specific Character Set element of the given 'dataset' (separated by a backslash), code extension techniques are used and escape sequences may be encountered in the source string to switch between the specified character sets.

Parameters
dataset  DICOM dataset or item from which the source character set should be retrieved. If the data element Specific Character Set (0008,0005) is empty or missing, the default character set (i.e. ASCII) is used.
toCharset  name of the destination character set used for the output string. Only a single value is permitted (no code extensions). Leading and trailing spaces are removed automatically (if present). The default value is "ISO_IR 192" (Unicode in UTF-8).
Returns
status, EC_Normal if successful, an error code otherwise

◆ selectCharacterSetWithCodeExtensions()

virtual OFCondition DcmSpecificCharacterSet::selectCharacterSetWithCodeExtensions ( const unsigned long  sourceVM)
protected virtual

select a particular DICOM character set with code extensions for subsequent conversions.

The corresponding DICOM defined terms for the source character set are determined from the member variable 'SourceCharacterSet'.

Parameters
sourceVM  value multiplicity of the member variable 'SourceCharacterSet'. Usually, this value has already been determined by the calling method.
Returns
status, EC_Normal if successful, an error code otherwise

◆ selectCharacterSetWithoutCodeExtensions()

virtual OFCondition DcmSpecificCharacterSet::selectCharacterSetWithoutCodeExtensions ( )
protected virtual

select a particular DICOM character set without code extensions for subsequent conversions.

The corresponding DICOM defined term for the source character set is determined from the member variable 'SourceCharacterSet'.

Returns
status, EC_Normal if successful, an error code otherwise

◆ setConversionFlags()

virtual OFCondition DcmSpecificCharacterSet::setConversionFlags ( const unsigned  flags)
virtual

set flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion.

Precondition
An encoding has been selected by successfully calling OFCharacterEncoding::selectEncoding(), i.e. OFCharacterEncoding::isLibraryAvailable() and *this evaluate to OFTrue.
Parameters
flags  the conversion flags that shall be used, a combination of the OFCharacterEncoding::ConversionFlags constants, e.g. TransliterateIllegalSequences | DiscardIllegalSequences.
Returns
EC_Normal if the flags were set, an error code otherwise, i.e. if the flags are not supported by the underlying implementation.
See also
OFCharacterEncoding::supportsConversionFlags()

Generated on Mon Dec 15 2025 for DCMTK Version 3.7.0 by Doxygen 1.9.1