DCMTK
Version 3.6.2
OFFIS DICOM Toolkit
A class for managing and converting between different DICOM character sets.
Public Member Functions | |
DcmSpecificCharacterSet () | |
constructor. | |
~DcmSpecificCharacterSet () | |
destructor | |
void | clear () |
clear the internal state. | |
operator OFBool () const | |
query whether selectCharacterSet() has successfully been called for this object, i.e. whether convertString() may be called. | |
OFBool | operator! () const |
query whether selectCharacterSet() has not been called before, i.e. convertString() would fail. | |
const OFString & | getSourceCharacterSet () const |
get currently selected source DICOM character set(s). | |
const OFString & | getDestinationCharacterSet () const |
get currently selected destination DICOM character set. | |
const OFString & | getDestinationEncoding () const |
get currently selected destination encoding, i.e. the name of the character set as used by the underlying character encoding library for the conversion. | |
unsigned | getConversionFlags () const |
get flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion. | |
OFCondition | setConversionFlags (const unsigned flags) |
set flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion. | |
OFCondition | selectCharacterSet (const OFString &fromCharset, const OFString &toCharset="ISO_IR 192") |
select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert. | |
OFCondition | selectCharacterSet (DcmItem &dataset, const OFString &toCharset="ISO_IR 192") |
select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert. | |
OFCondition | convertString (const OFString &fromString, OFString &toString, const OFString &delimiters="") |
convert the given string from the selected source character set(s) to the selected destination character set. | |
OFCondition | convertString (const char *fromString, const size_t fromLength, OFString &toString, const OFString &delimiters="") |
convert the given string from the selected source character set(s) to the selected destination character set. | |
Static Public Member Functions | |
static OFBool | isConversionAvailable () |
check whether the underlying character set conversion library is available. | |
static size_t | countCharactersInUTF8String (const OFString &utf8String) |
count characters in given UTF-8 string and return the resulting number of so-called "code points". | |
Protected Member Functions | |
OFCondition | determineDestinationEncoding (const OFString &toCharset) |
determine the destination character encoding (as used by the underlying character encoding library) from the given DICOM defined term (specific character set), and set the member variables accordingly. | |
OFCondition | selectCharacterSetWithoutCodeExtensions () |
select a particular DICOM character set without code extensions for subsequent conversions. | |
OFCondition | selectCharacterSetWithCodeExtensions (const unsigned long sourceVM) |
select a particular DICOM character set with code extensions for subsequent conversions. | |
OFBool | checkForEscapeCharacter (const char *strValue, const size_t strLength) const |
check whether the given string contains at least one escape character (ESC), which is used for code extension techniques such as ISO 2022. | |
OFString | convertToLengthLimitedOctalString (const char *strValue, const size_t strLength) const |
convert given string to octal format, i.e. all non-ASCII and control characters are converted to their octal representation. | |
Private Types | |
typedef OFMap< OFString, OFCharacterEncoding > | T_EncodingConvertersMap |
type definition of a map storing the identifier (key) of a character set and the associated character set converter | |
Private Attributes | |
OFString | SourceCharacterSet |
selected source character set(s) based on one or more DICOM defined terms | |
OFString | DestinationCharacterSet |
selected destination character set based on a single DICOM defined term | |
OFString | DestinationEncoding |
selected destination encoding based on names supported by the underlying character encoding library | |
OFCharacterEncoding | DefaultEncodingConverter |
character encoding converter | |
T_EncodingConvertersMap | EncodingConverters |
map of character set conversion descriptors (only used if multiple character sets are needed) | |
A class for managing and converting between different DICOM character sets.
The conversion relies on the OFCharacterEncoding class, which in turn relies on an underlying character encoding library (e.g. libiconv or ICU).
DcmSpecificCharacterSet::DcmSpecificCharacterSet | ( | ) |
constructor.
Initializes the member variables.
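OFBool DcmSpecificCharacterSet::checkForEscapeCharacter | ( | const char * | strValue, |
const size_t | strLength | ||
) | const |
protected |
check whether the given string contains at least one escape character (ESC), which is used for code extension techniques such as ISO 2022.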
strValue | input string to be checked for any escape character |
strLength | length of the input string |
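The check described above amounts to scanning the byte buffer for the ESC control character (0x1B), which introduces ISO 2022 escape sequences. A minimal standalone sketch of that idea follows; the function name containsEscapeCharacter is hypothetical and this is not the DCMTK implementation.

```cpp
#include <cstddef>

// Illustrative stand-in (hypothetical name), not the DCMTK implementation.
// Returns true if the buffer contains at least one ESC byte (0x1B).
// The length is passed explicitly, so embedded zero bytes do not end the scan.
bool containsEscapeCharacter(const char *strValue, std::size_t strLength)
{
    if (strValue == nullptr)
        return false;
    for (std::size_t i = 0; i < strLength; ++i)
    {
        if (strValue[i] == '\x1b')
            return true;
    }
    return false;
}
```

A caller would typically use such a check to decide whether a value needs the code extension handling at all.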
void DcmSpecificCharacterSet::clear | ( | ) |
clear the internal state.
This also forgets about the currently selected character sets, so selectCharacterSet() has to be called again before a string can be converted with convertString().
OFCondition DcmSpecificCharacterSet::convertString | ( | const OFString & | fromString, |
OFString & | toString, | ||
const OFString & | delimiters = "" |
||
) |
convert the given string from the selected source character set(s) to the selected destination character set.
That means selectCharacterSet() has to be called prior to this method.
fromString | input string to be converted (using the currently selected source character set) |
toString | reference to variable where the converted string (using the currently selected destination character set) is stored |
delimiters | optional string of characters that are regarded as delimiters, i.e. when found the character set is switched back to the default. CR, LF, FF and HT are always regarded as delimiters (see DICOM PS 3.5). |
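To illustrate the role of the 'delimiters' parameter, the following standalone sketch lists the positions in a string at which the active character set would be reset to the default. CR, LF, FF and HT are always treated as delimiters, as stated above; the helper names isDelimiter and findResetPositions are hypothetical and not part of DCMTK.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: CR, LF, FF and HT always count as delimiters,
// plus any character listed in extraDelimiters.
bool isDelimiter(char c, const std::string &extraDelimiters)
{
    if (c == '\r' || c == '\n' || c == '\f' || c == '\t')
        return true;
    return extraDelimiters.find(c) != std::string::npos;
}

// Hypothetical helper: returns the index of every delimiter in 'value',
// i.e. every position after which the default character set applies again.
std::vector<std::size_t> findResetPositions(const std::string &value,
                                            const std::string &extraDelimiters)
{
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < value.length(); ++i)
    {
        if (isDelimiter(value[i], extraDelimiters))
            positions.push_back(i);
    }
    return positions;
}
```

For a person name value, passing "^=" as extra delimiters would mark the component and component group separators as reset points.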
OFCondition DcmSpecificCharacterSet::convertString | ( | const char * | fromString, |
const size_t | fromLength, | ||
OFString & | toString, | ||
const OFString & | delimiters = "" |
||
) |
convert the given string from the selected source character set(s) to the selected destination character set.
That means selectCharacterSet() has to be called prior to this method. Since the length of the input string has to be specified explicitly, the string can contain more than one NULL byte.
fromString | input string to be converted (using the currently selected source character set) |
fromLength | length of the input string (number of bytes without the trailing NULL byte) |
toString | reference to variable where the converted string (using the currently selected destination character set) is stored |
delimiters | optional string of characters that are regarded as delimiters, i.e. when found the character set is switched back to the default. CR, LF, FF and HT are always regarded as delimiters (see DICOM PS 3.5). |
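OFString DcmSpecificCharacterSet::convertToLengthLimitedOctalString | ( | const char * | strValue, |
const size_t | strLength | ||
) | const |
protected |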
convert given string to octal format, i.e. all non-ASCII and control characters are converted to their octal representation.
The total length of the string is always limited to a particular maximum (see implementation). If the converted string would be longer, it is cropped and "..." is appended to indicate this cropping.
strValue | input string to be converted and possibly cropped |
strLength | length of the input string |
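The conversion described above can be sketched in standalone C++ as follows. The function name, the 'maxLength' parameter and its default value are assumptions made for illustration (the actual maximum is defined in the implementation), and in this sketch the limit is applied before the "..." marker is appended.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Illustrative sketch (hypothetical name and limit), not the DCMTK implementation.
// Printable ASCII characters are copied; all other bytes become "\ooo".
std::string toLengthLimitedOctalString(const char *strValue,
                                       std::size_t strLength,
                                       std::size_t maxLength = 16)
{
    std::string result;
    for (std::size_t i = 0; i < strLength; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(strValue[i]);
        if (c < 0x20 || c >= 0x7f)
        {
            // backslash + three octal digits + terminating zero byte
            char buffer[5];
            std::snprintf(buffer, sizeof(buffer), "\\%03o", static_cast<unsigned>(c));
            result += buffer;
        }
        else
        {
            result += static_cast<char>(c);
        }
    }
    // crop overlong output and mark the cropping
    if (result.length() > maxLength)
    {
        result.erase(maxLength);
        result += "...";
    }
    return result;
}
```

Such a representation is mainly useful for log and error messages, where raw non-ASCII bytes or escape sequences should not be written to the output.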
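static size_t DcmSpecificCharacterSet::countCharactersInUTF8String | ( | const OFString & | utf8String | ) |
static |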
count characters in given UTF-8 string and return the resulting number of so-called "code points".
Please note that invalid UTF-8 encodings are not handled properly. ASCII strings (7-bit) are also supported, although OFString::length() is probably much faster.
utf8String | valid character string with UTF-8 encoding |
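One common way to count code points in a valid UTF-8 string is to count every byte that is not a continuation byte (bit pattern 10xxxxxx). The standalone sketch below uses std::string instead of OFString and a hypothetical function name; like the method documented above, it does not detect invalid encodings.

```cpp
#include <cstddef>
#include <string>

// Illustrative sketch (hypothetical name), assuming valid UTF-8 input.
// Every code point starts with exactly one byte that is not of the
// form 10xxxxxx, so counting those bytes yields the number of code points.
std::size_t countCodePointsInUTF8(const std::string &utf8String)
{
    std::size_t count = 0;
    for (const char ch : utf8String)
    {
        if ((static_cast<unsigned char>(ch) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For pure 7-bit ASCII input the result equals the string length, which is why the length of the string is the cheaper choice in that case.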
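OFCondition DcmSpecificCharacterSet::determineDestinationEncoding | ( | const OFString & | toCharset | ) |
protected |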
determine the destination character encoding (as used by the underlying character encoding library) from the given DICOM defined term (specific character set), and set the member variables accordingly.
toCharset | name of the destination character set used for the output string |
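Conceptually, this step maps a DICOM defined term to the name of an encoding known to the conversion library. The standalone sketch below shows such a lookup for a few terms only; the function name is hypothetical, the table is deliberately incomplete, and the exact encoding name strings expected by a given library are an assumption.

```cpp
#include <map>
#include <string>

// Illustrative sketch (hypothetical name, partial table), not the DCMTK code.
// Returns true and sets 'encodingName' if the defined term is known.
bool lookupEncodingName(const std::string &definedTerm, std::string &encodingName)
{
    static const std::map<std::string, std::string> table = {
        {"", "ASCII"},                // default character repertoire
        {"ISO_IR 6", "ASCII"},        // explicit term for the default repertoire
        {"ISO_IR 100", "ISO-8859-1"}, // Latin 1
        {"ISO_IR 101", "ISO-8859-2"}, // Latin 2
        {"ISO_IR 192", "UTF-8"}       // Unicode in UTF-8
    };
    const auto it = table.find(definedTerm);
    if (it == table.end())
        return false;
    encodingName = it->second;
    return true;
}
```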
unsigned DcmSpecificCharacterSet::getConversionFlags | ( | ) | const |
get flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion.
const OFString& DcmSpecificCharacterSet::getDestinationCharacterSet | ( | ) | const |
get currently selected destination DICOM character set.
Please note that the returned string, which contains a defined term, is always normalized, i.e. leading and trailing spaces have been removed.
const OFString& DcmSpecificCharacterSet::getDestinationEncoding | ( | ) | const |
get currently selected destination encoding, i.e. the name of the character set as used by the underlying character encoding library for the conversion.
If code extension techniques are used to switch between different character encodings, the main/default encoding is returned.
const OFString& DcmSpecificCharacterSet::getSourceCharacterSet | ( | ) | const |
get currently selected source DICOM character set(s).
Please note that the returned string can contain multiple values (defined terms separated by a backslash) if code extension techniques are used. Furthermore, the returned string is always normalized, i.e. leading and trailing spaces have been removed.
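A caller that needs the individual defined terms can split the returned string at the backslash. The following standalone sketch uses std::string and hypothetical helper names; trimming each value separately is an assumption of this sketch, not documented behavior.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: remove leading and trailing spaces.
std::string trimSpaces(const std::string &s)
{
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string();
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: split a multi-valued Specific Character Set string
// into its defined terms. An empty input yields no terms; an empty value
// between or before backslashes is kept as an empty term.
std::vector<std::string> splitDefinedTerms(const std::string &value)
{
    std::vector<std::string> terms;
    if (value.empty())
        return terms;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t pos = value.find('\\', start);
        if (pos == std::string::npos)
        {
            terms.push_back(trimSpaces(value.substr(start)));
            break;
        }
        terms.push_back(trimSpaces(value.substr(start, pos - start)));
        start = pos + 1;
    }
    return terms;
}
```

The number of terms returned corresponds to the value multiplicity of the element; more than one term indicates that code extension techniques are in use.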
static OFBool DcmSpecificCharacterSet::isConversionAvailable | ( | ) |
static |
check whether the underlying character set conversion library is available.
If not, no conversion between different character sets will be possible.
DcmSpecificCharacterSet::operator OFBool | ( | ) | const |
query whether selectCharacterSet() has successfully been called for this object, i.e. whether convertString() may be called.
OFBool DcmSpecificCharacterSet::operator! | ( | ) | const |
query whether selectCharacterSet() has not been called before, i.e. convertString() would fail.
OFCondition DcmSpecificCharacterSet::selectCharacterSet | ( | const OFString & | fromCharset, |
const OFString & | toCharset = "ISO_IR 192" |
||
) |
select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert.
The defined terms for a particular character set can be found in the DICOM standard, e.g. "ISO_IR 100" for ISO 8859-1 (Latin 1) or "ISO_IR 192" for Unicode in UTF-8. An empty string denotes the default character repertoire, which is ASCII (7-bit). If multiple values are given for 'fromCharset' (separated by a backslash), code extension techniques are used and escape sequences may be encountered in the source string to switch between the specified character sets.
fromCharset | name of the source character set(s) used for the input string as given in the DICOM attribute Specific Character Set (0008,0005). Leading and trailing spaces are removed automatically (if present). |
toCharset | name of the destination character set used for the output string. Only a single value is permitted (no code extensions). Leading and trailing spaces are removed automatically (if present). The default value is "ISO_IR 192" (Unicode in UTF-8). |
OFCondition DcmSpecificCharacterSet::selectCharacterSet | ( | DcmItem & | dataset, |
const OFString & | toCharset = "ISO_IR 192" |
||
) |
select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert.
The source character set is determined from the DICOM element Specific Character Set (0008,0005) stored in the given dataset/item. The defined terms for the destination character set can be found in the DICOM standard, e.g. "ISO_IR 100" for ISO 8859-1 (Latin 1) or "ISO_IR 192" for Unicode in UTF-8. An empty string denotes the default character repertoire, which is ASCII (7-bit). If multiple values are found in the Specific Character Set element of the given 'dataset' (separated by a backslash), code extension techniques are used and escape sequences may be encountered in the source string to switch between the specified character sets.
dataset | DICOM dataset or item from which the source character set should be retrieved. If the data element Specific Character Set (0008,0005) is empty or missing, the default character set (i.e. ASCII) is used. |
toCharset | name of the destination character set used for the output string. Only a single value is permitted (no code extensions). Leading and trailing spaces are removed automatically (if present). The default value is "ISO_IR 192" (Unicode in UTF-8). |
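OFCondition DcmSpecificCharacterSet::selectCharacterSetWithCodeExtensions | ( | const unsigned long | sourceVM | ) |
protected |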
select a particular DICOM character set with code extensions for subsequent conversions.
The corresponding DICOM defined terms for the source character set are determined from the member variable 'SourceCharacterSet'.
sourceVM | value multiplicity of the member variable 'SourceCharacterSet'. Usually, this value has already been determined by the calling method. |
OFCondition DcmSpecificCharacterSet::selectCharacterSetWithoutCodeExtensions | ( | ) |
protected |
select a particular DICOM character set without code extensions for subsequent conversions.
The corresponding DICOM defined term for the source character set is determined from the member variable 'SourceCharacterSet'.
OFCondition DcmSpecificCharacterSet::setConversionFlags | ( | const unsigned | flags | ) |
set flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion.
flags | the conversion flags that shall be used, a combination of the OFCharacterEncoding::ConversionFlags constants, e.g. TransliterateIllegalSequences | DiscardIllegalSequences. |
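The 'flags' value is a bit mask, so individual constants are combined with the bitwise OR operator and tested with bitwise AND. The standalone sketch below shows the pattern with a stand-in enumeration; the numeric values are hypothetical and the real constants are defined by OFCharacterEncoding::ConversionFlags.

```cpp
// Stand-in enumeration for illustration only; the numeric values are
// hypothetical and do not necessarily match OFCharacterEncoding.
enum ConversionFlags : unsigned
{
    DiscardIllegalSequences = 1u << 1,
    TransliterateIllegalSequences = 1u << 2
};

// Check whether a particular flag is set in a combined bit mask.
inline bool hasFlag(unsigned flags, ConversionFlags flag)
{
    return (flags & flag) != 0;
}

// Combine both flags into a single value, as one would pass to a setter.
inline unsigned exampleFlags()
{
    return TransliterateIllegalSequences | DiscardIllegalSequences;
}
```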