A class for managing and converting between different DICOM character sets.

Public Member Functions
	DcmSpecificCharacterSet ()
	constructor.
	~DcmSpecificCharacterSet ()
	destructor
void	clear ()
	clear the internal state.
const OFString &	getSourceCharacterSet () const
	get currently selected source DICOM character set(s).
const OFString &	getDestinationCharacterSet () const
	get currently selected destination DICOM character set.
const OFString &	getDestinationEncoding () const
	get currently selected destination encoding, i.e. the name of the character set as used by libiconv for the conversion.
OFBool	getTransliterationMode () const
	get mode specifying whether a character that cannot be represented in the destination character encoding is approximated through one or more characters that look similar to the original one.
OFCondition	selectCharacterSet (const OFString &fromCharset, const OFString &toCharset="ISO_IR 192", const OFBool transliterate=OFFalse)
	select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert.
OFCondition	selectCharacterSet (DcmItem &dataset, const OFString &toCharset="ISO_IR 192", const OFBool transliterate=OFFalse)
	select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert.
OFCondition	convertString (const OFString &fromString, OFString &toString, const OFString &delimiters="")
	convert the given string from the selected source character set(s) to the selected destination character set.
OFCondition	convertString (const char *fromString, const size_t fromLength, OFString &toString, const OFString &delimiters="")
	convert the given string from the selected source character set(s) to the selected destination character set.
Static Public Member Functions
static OFBool	isConversionLibraryAvailable ()
	check whether the underlying character set conversion library is available.
static size_t	countCharactersInUTF8String (const OFString &utf8String)
	count characters in given UTF-8 string and return the resulting number of so-called "code points".
Protected Types
typedef OFMap< OFString, OFCharacterEncoding::T_Descriptor >	T_DescriptorMap
	type definition of a map storing the identifier (key) of a character set and the associated conversion descriptor
Protected Member Functions
OFCondition	determineDestinationEncoding (const OFString &toCharset)
	determine the destination character encoding (as used by libiconv) from the given DICOM defined term (specific character set), and set the member variables accordingly.
OFCondition	selectCharacterSetWithoutCodeExtensions ()
	select a particular DICOM character set without code extensions for subsequent conversions.
OFCondition	selectCharacterSetWithCodeExtensions (const unsigned long sourceVM)
	select a particular DICOM character set with code extensions for subsequent conversions.
void	closeConversionDescriptors ()
	close any currently open character set conversion descriptor(s).
OFBool	checkForEscapeCharacter (const char *strValue, const size_t strLength) const
	check whether the given string contains at least one escape character (ESC), which is used by code extension techniques such as ISO 2022.
OFString	convertToLengthLimitedOctalString (const char *strValue, const size_t strLength) const
	convert given string to octal format, i.e. all non-ASCII and control characters are converted to their octal representation.
Private Member Functions
	DcmSpecificCharacterSet (const DcmSpecificCharacterSet &)
DcmSpecificCharacterSet &	operator= (const DcmSpecificCharacterSet &)
Private Attributes
OFString	SourceCharacterSet
	selected source character set(s) based on one or more DICOM defined terms
OFString	DestinationCharacterSet
	selected destination character set based on a single DICOM defined term
OFString	DestinationEncoding
	selected destination encoding based on names supported by the libiconv toolkit
OFCharacterEncoding	EncodingConverter
	character encoding converter
T_DescriptorMap	ConversionDescriptors
	map of character set conversion descriptors (only used if multiple character sets are needed)

Detailed Description

A class for managing and converting between different DICOM character sets.

The conversion relies on the OFCharacterEncoding class, which in turn relies on the libiconv toolkit (if available).

Constructor & Destructor Documentation

DcmSpecificCharacterSet::DcmSpecificCharacterSet ( )

constructor.

Initializes the member variables.

Member Function Documentation

OFBool DcmSpecificCharacterSet::checkForEscapeCharacter	(	const char *	strValue,
		const size_t	strLength
	)		const `[protected]`

check whether the given string contains at least one escape character (ESC), which is used by code extension techniques such as ISO 2022.

Parameters:

strValue	input string to be checked for any escape character
strLength	length of the input string

Returns: OFTrue if an escape character has been found, OFFalse otherwise
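
The check described here amounts to a linear scan for the ESC control character (0x1B). The following is a minimal standalone sketch of that idea, not the DCMTK implementation: the name `containsEscape` is hypothetical, and plain C++ types stand in for OFBool.

```cpp
#include <cstddef>

// Hypothetical helper (not part of DCMTK): report whether the first
// `length` bytes of `value` contain the ESC control character (0x1B).
bool containsEscape(const char *value, std::size_t length)
{
    if (value == nullptr)
        return false;
    for (std::size_t i = 0; i < length; ++i)
    {
        // The explicit length means NUL bytes do not end the scan early.
        if (value[i] == '\x1b')
            return true;
    }
    return false;
}
```

Because the length is passed explicitly, the scan also covers bytes that follow an embedded NUL byte.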

void DcmSpecificCharacterSet::clear ( )

clear the internal state.

This also forgets about the currently selected character sets, so selectCharacterSet() has to be called again before a string can be converted with convertString().

void DcmSpecificCharacterSet::closeConversionDescriptors ( ) [protected]

close any currently open character set conversion descriptor(s).

Afterwards, no conversion descriptor is selected, as is the case right after initialization by the constructor.

OFCondition DcmSpecificCharacterSet::convertString	(	const OFString &	fromString,
		OFString &	toString,
		const OFString &	delimiters = `""`
	)

convert the given string from the selected source character set(s) to the selected destination character set.

That means selectCharacterSet() has to be called prior to this method.

Parameters:

fromString	input string to be converted (using the currently selected source character set)
toString	reference to variable where the converted string (using the currently selected destination character set) is stored
delimiters	optional string of characters that are regarded as delimiters, i.e. when found the character set is switched back to the default. CR, LF and FF are always regarded as delimiters (see DICOM PS 3.5).

Returns: status, EC_Normal if successful, an error code otherwise
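
The delimiter rule for the 'delimiters' parameter can be illustrated with a small standalone predicate. This is a sketch only, assuming plain `std::string` in place of OFString; `isDelimiter` is a hypothetical name and not part of the DCMTK API.

```cpp
#include <string>

// Hypothetical helper (not part of DCMTK): decide whether `c` is a
// delimiter. CR, LF and FF always count; `extraDelimiters` plays the
// role of the optional 'delimiters' argument.
bool isDelimiter(char c, const std::string &extraDelimiters)
{
    if (c == '\r' || c == '\n' || c == '\f')
        return true;
    return extraDelimiters.find(c) != std::string::npos;
}
```

A caller converting a Person Name value might, for example, pass `"\\^="` so that the backslash, caret and equals sign are also treated as delimiters. Which characters to pass depends on the value representation of the element being converted.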

OFCondition DcmSpecificCharacterSet::convertString	(	const char *	fromString,
		const size_t	fromLength,
		OFString &	toString,
		const OFString &	delimiters = `""`
	)

convert the given string from the selected source character set(s) to the selected destination character set.

That means selectCharacterSet() has to be called prior to this method. Since the length of the input string has to be specified explicitly, the string may contain embedded NULL bytes.

Parameters:

fromString	input string to be converted (using the currently selected source character set)
fromLength	length of the input string (number of bytes without the trailing NULL byte)
toString	reference to variable where the converted string (using the currently selected destination character set) is stored
delimiters	optional string of characters that are regarded as delimiters, i.e. when found the character set is switched back to the default. CR, LF and FF are always regarded as delimiters (see DICOM PS 3.5).

Returns: status, EC_Normal if successful, an error code otherwise
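
The reason for the explicit length can be seen in a short standalone sketch. It uses only the C++ standard library; the helper names are hypothetical and nothing here calls into DCMTK.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: length as seen by C string functions, which
// stop at the first NUL byte.
std::size_t lengthSeenByCString(const char *data)
{
    return std::strlen(data);
}

// Hypothetical helper: copy exactly `length` bytes, so that embedded
// NUL bytes (and everything after them) are preserved.
std::string copyWithExplicitLength(const char *data, std::size_t length)
{
    return std::string(data, length);
}
```

For a five-byte buffer holding `A`, `B`, NUL, `C`, `D`, the first helper reports 2 while the second keeps all five bytes, which is why a pointer plus a byte count is the safer interface for raw element values.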

OFString DcmSpecificCharacterSet::convertToLengthLimitedOctalString	(	const char *	strValue,
		const size_t	strLength
	)		const `[protected]`

convert given string to octal format, i.e. all non-ASCII and control characters are converted to their octal representation.

The total length of the string is always limited to a particular maximum (see implementation). If the converted string would be longer, it is cropped and "..." is appended to indicate this cropping.

Parameters:

strValue	input string to be converted and possibly cropped
strLength	length of the input string

Returns: resulting string in octal format
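
A rough standalone sketch of this kind of conversion is given below. It is not the DCMTK implementation, and several details are assumptions: the name `toLengthLimitedOctal`, the `\ooo` output format with three octal digits, treating DEL (0x7F) as a control character, the caller-supplied `maxLength`, and the choice to count the appended "..." towards that maximum.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of DCMTK): replace control and non-ASCII
// bytes by "\ooo" and crop the result to at most `maxLength` characters.
std::string toLengthLimitedOctal(const char *value, std::size_t length,
                                 std::size_t maxLength)
{
    std::string result;
    for (std::size_t i = 0; i < length; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(value[i]);
        if (c < 0x20 || c > 0x7E)
        {
            // Control character, DEL, or byte with the high bit set.
            char buffer[5];
            std::snprintf(buffer, sizeof(buffer), "\\%03o",
                          static_cast<unsigned int>(c));
            result += buffer;
        }
        else
        {
            result += static_cast<char>(c);
        }
    }
    if (result.size() > maxLength)
    {
        // Assumption: the ellipsis counts towards the limit. For
        // maxLength < 3 the result is just "...".
        const std::size_t keep = (maxLength > 3) ? maxLength - 3 : 0;
        result.resize(keep);
        result += "...";
    }
    return result;
}
```

Note that in this sketch the cropping works on the already converted text, so it may cut through the middle of an octal escape.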

static size_t DcmSpecificCharacterSet::countCharactersInUTF8String ( const OFString & utf8String ) [static]

count characters in given UTF-8 string and return the resulting number of so-called "code points".

Please note that invalid UTF-8 encodings are not handled properly. ASCII strings (7-bit) are also supported, although OFString::length() is probably much faster.

Parameters:

utf8String valid character string with UTF-8 encoding

Returns: number of characters (code points) in given UTF-8 string
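
One common way to count code points in valid UTF-8 is to count every byte that is not a continuation byte (bit pattern 10xxxxxx). The sketch below shows that technique with `std::string`; it is an illustration under that assumption, the name `countCodePoints` is hypothetical, and DCMTK may count differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of DCMTK): count code points in a
// string that is assumed to be valid UTF-8.
std::size_t countCodePoints(const std::string &utf8)
{
    std::size_t count = 0;
    for (const char ch : utf8)
    {
        // Continuation bytes look like 10xxxxxx; every other byte
        // starts a new code point.
        if ((static_cast<unsigned char>(ch) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

As with the documented method, invalid input is not detected: a stray continuation byte is simply skipped, and a truncated multi-byte sequence still counts as one code point.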

OFCondition DcmSpecificCharacterSet::determineDestinationEncoding ( const OFString & toCharset ) [protected]

determine the destination character encoding (as used by libiconv) from the given DICOM defined term (specific character set), and set the member variables accordingly.

Parameters:

toCharset name of the destination character set used for the output string

Returns: status, EC_Normal if successful, an error code otherwise

const OFString& DcmSpecificCharacterSet::getDestinationCharacterSet ( ) const

get currently selected destination DICOM character set.

Please note that the returned string, which contains a defined term, is always normalized, i.e. leading and trailing spaces have been removed.

Returns: currently selected destination DICOM character set or an empty string if none is selected (identical to ASCII, which is the default)

const OFString& DcmSpecificCharacterSet::getDestinationEncoding ( ) const

get currently selected destination encoding, i.e. the name of the character set as used by libiconv for the conversion.

If code extension techniques are used to switch between different character encodings, the main/default encoding is returned.

Returns: currently selected destination encoding or an empty string if none is selected

const OFString& DcmSpecificCharacterSet::getSourceCharacterSet ( ) const

get currently selected source DICOM character set(s).

Please note that the returned string can contain multiple values (defined terms separated by a backslash) if code extension techniques are used. Furthermore, the returned string is always normalized, i.e. leading and trailing spaces have been removed.

Returns: currently selected source DICOM character set(s) or an empty string if none is selected (identical to ASCII, which is the default)

OFBool DcmSpecificCharacterSet::getTransliterationMode ( ) const

get mode specifying whether a character that cannot be represented in the destination character encoding is approximated through one or more characters that look similar to the original one.

See selectCharacterSet().

Returns: current value of the mode. OFTrue means that the mode is enabled, OFFalse means disabled.

static OFBool DcmSpecificCharacterSet::isConversionLibraryAvailable ( ) [static]

check whether the underlying character set conversion library is available.

If the library is not available, no conversion between different character sets will be possible.

Returns: OFTrue if the character set conversion library is available, OFFalse otherwise

OFCondition DcmSpecificCharacterSet::selectCharacterSet	(	const OFString &	fromCharset,
		const OFString &	toCharset = `"ISO_IR 192"`,
		const OFBool	transliterate = `OFFalse`
	)

select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert.

The defined terms for a particular character set can be found in the DICOM standard, e.g. "ISO_IR 100" for ISO 8859-1 (Latin 1) or "ISO_IR 192" for Unicode in UTF-8. An empty string denotes the default character repertoire, which is ASCII (7-bit). If multiple values are given for 'fromCharset' (separated by a backslash), code extension techniques are used and escape sequences may be encountered in the source string to switch between the specified character sets.

Parameters:

fromCharset	name of the source character set(s) used for the input string as given in the DICOM attribute Specific Character Set (0008,0005). Leading and trailing spaces are removed automatically (if present).
toCharset	name of the destination character set used for the output string. Only a single value is permitted (no code extensions). Leading and trailing spaces are removed automatically (if present). The default value is "ISO_IR 192" (Unicode in UTF-8).
transliterate	mode specifying whether a character that cannot be represented in the destination character encoding is approximated through one or more characters that look similar to the original one. By default, this mode is disabled.

Returns: status, EC_Normal if successful, an error code otherwise
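
How a multi-valued 'fromCharset' breaks down into individual defined terms can be sketched as follows. This is a standalone illustration with `std::string`, not DCMTK code: `trimSpaces` and `splitDefinedTerms` are hypothetical names, and trimming each value separately is an assumption about the normalization.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: remove leading and trailing spaces.
std::string trimSpaces(const std::string &s)
{
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string();
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: split a Specific Character Set value at each
// backslash. The number of returned terms is the value multiplicity;
// an empty term denotes the default repertoire (ASCII).
std::vector<std::string> splitDefinedTerms(const std::string &charset)
{
    std::vector<std::string> terms;
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t pos = charset.find('\\', start);
        if (pos == std::string::npos)
        {
            terms.push_back(trimSpaces(charset.substr(start)));
            break;
        }
        terms.push_back(trimSpaces(charset.substr(start, pos - start)));
        start = pos + 1;
    }
    return terms;
}
```

With this sketch, `"ISO 2022 IR 6\\ISO 2022 IR 87"` yields two terms, which corresponds to the case where code extension techniques are used, while `" ISO_IR 100 "` yields the single term `"ISO_IR 100"`.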

OFCondition DcmSpecificCharacterSet::selectCharacterSet	(	DcmItem &	dataset,
		const OFString &	toCharset = `"ISO_IR 192"`,
		const OFBool	transliterate = `OFFalse`
	)

select DICOM character sets for the input and output string, between which subsequent calls of convertString() convert.

The source character set is determined from the DICOM element Specific Character Set (0008,0005) stored in the given dataset/item. The defined terms for the destination character set can be found in the DICOM standard, e.g. "ISO_IR 100" for ISO 8859-1 (Latin 1) or "ISO_IR 192" for Unicode in UTF-8. An empty string denotes the default character repertoire, which is ASCII (7-bit). If multiple values are found in the Specific Character Set element of the given 'dataset' (separated by a backslash), code extension techniques are used and escape sequences may be encountered in the source string to switch between the specified character sets.

Parameters:

dataset	DICOM dataset or item from which the source character set should be retrieved. If the data element Specific Character Set (0008,0005) is empty or missing, the default character set (i.e. ASCII) is used.
toCharset	name of the destination character set used for the output string. Only a single value is permitted (no code extensions). Leading and trailing spaces are removed automatically (if present). The default value is "ISO_IR 192" (Unicode in UTF-8).
transliterate	mode specifying whether a character that cannot be represented in the destination character encoding is approximated through one or more characters that look similar to the original one. By default, this mode is disabled.

Returns: status, EC_Normal if successful, an error code otherwise

OFCondition DcmSpecificCharacterSet::selectCharacterSetWithCodeExtensions ( const unsigned long sourceVM ) [protected]

select a particular DICOM character set with code extensions for subsequent conversions.

The corresponding DICOM defined terms for the source character set are determined from the member variable 'SourceCharacterSet'.

Parameters:

sourceVM value multiplicity of the member variable 'SourceCharacterSet'. Usually, this value has already been determined by the calling method.

Returns: status, EC_Normal if successful, an error code otherwise

OFCondition DcmSpecificCharacterSet::selectCharacterSetWithoutCodeExtensions ( ) [protected]

select a particular DICOM character set without code extensions for subsequent conversions.

The corresponding DICOM defined term for the source character set is determined from the member variable 'SourceCharacterSet'.

Returns: status, EC_Normal if successful, an error code otherwise

The documentation for this class was generated from the following file:

dcmdata/include/dcmtk/dcmdata/dcspchrs.h
