DCMTK  Version 3.6.4
OFFIS DICOM Toolkit
OFCharacterEncoding Class Reference

A class for managing and converting between different character encodings.

Public Types

enum  ConversionFlags { AbortTranscodingOnIllegalSequence = 1, DiscardIllegalSequences = 2, TransliterateIllegalSequences = 4 }
 Constants to control encoder behavior, e.g. regarding illegal character sequences.
 

Public Member Functions

 OFCharacterEncoding ()
 constructor.
 
 OFCharacterEncoding (const OFCharacterEncoding &rhs)
 copy constructor.
 
 ~OFCharacterEncoding ()
 destructor
 
OFCharacterEncoding & operator= (const OFCharacterEncoding &rhs)
 copy assignment.
 
 operator OFBool () const
 check whether this object refers to a valid encoder.
 
OFBool operator! () const
 check whether this object does not refer to a valid encoder.
 
OFBool operator== (const OFCharacterEncoding &rhs) const
 check whether two OFCharacterEncoding instances refer to the same encoder.
 
OFBool operator!= (const OFCharacterEncoding &rhs) const
 check whether two OFCharacterEncoding instances do not refer to the same encoder.
 
void clear ()
 clear the internal state.
 
unsigned getConversionFlags () const
 get flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion.
 
OFCondition setConversionFlags (const unsigned flags)
 set flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion.
 
OFCondition selectEncoding (const OFString &fromEncoding, const OFString &toEncoding)
 select source and destination character encoding for subsequent conversion(s).
 
OFCondition convertString (const OFString &fromString, OFString &toString, const OFBool clearMode=OFTrue)
 convert the given string between the selected character encodings.
 
OFCondition convertString (const char *fromString, const size_t fromLength, OFString &toString, const OFBool clearMode=OFTrue)
 convert the given string between the selected character encodings.
 

Static Public Member Functions

static OFBool hasDefaultEncoding ()
 determine whether the underlying implementation defines a default encoding.
 
static OFString getLocaleEncoding ()
 get the character encoding of the currently set global locale.
 
static OFBool supportsConversionFlags (const unsigned flags)
 determine whether the underlying implementation supports the given conversion flags.
 
static OFBool isLibraryAvailable ()
 check whether character set conversion is available, e.g. the underlying encoding library is available.
 
static OFString getLibraryVersionString ()
 get version information of the underlying character encoding library.
 
static size_t countCharactersInUTF8String (const OFString &utf8String)
 count characters in given UTF-8 string and return the resulting number of so-called "code points".
 

Private Attributes

OFshared_ptr< Implementation > TheImplementation
 shared pointer to internal implementation (interface to character encoding library)
 

Detailed Description

A class for managing and converting between different character encodings.

The implementation relies on ICONV (native implementation or libiconv) or ICU, depending on the configuration.

Remarks
An encoder can be shared by copy-constructing an OFCharacterEncoding object from an existing one. Both objects then refer to the same encoder, which is managed internally by an OFshared_ptr and is destroyed only after both objects have been destroyed.
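
The sharing behavior described in the remarks can be modeled with the standard library. The following is a minimal sketch, not DCMTK code: `Encoder`, `Handle`, and `encoderOutlivesFirstHandle` are hypothetical names, and `std::shared_ptr` stands in for `OFshared_ptr`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the internal encoder state; not a DCMTK type.
struct Encoder {};

// Hypothetical handle that shares its encoder on copy, as the remarks describe.
class Handle
{
public:
    Handle() = default;                                    // refers to no encoder
    void select() { enc_ = std::make_shared<Encoder>(); }  // create a fresh encoder
    void clear() { enc_.reset(); }                         // drop this handle's reference
    explicit operator bool() const { return static_cast<bool>(enc_); }
    std::weak_ptr<Encoder> watch() const { return enc_; }  // observe lifetime only

private:
    std::shared_ptr<Encoder> enc_;
};

// Returns true if the encoder survives clearing the first handle
// and is destroyed once the second handle is cleared as well.
bool encoderOutlivesFirstHandle()
{
    Handle a;
    a.select();
    Handle b(a);                            // the copy shares the encoder
    assert(a && b);                         // both handles refer to an encoder
    std::weak_ptr<Encoder> w = a.watch();
    a.clear();
    const bool aliveAfterFirst = !w.expired();
    b.clear();
    const bool goneAfterSecond = w.expired();
    return aliveAfterFirst && goneAfterSecond;
}
```

In this model the encoder is freed by whichever handle releases it last, which matches the reference-counted lifetime the remarks describe.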

Member Enumeration Documentation

◆ ConversionFlags

Constants to control encoder behavior, e.g. regarding illegal character sequences.

Currently defined constants may be used to control the implementation's behavior regarding illegal character sequences. An illegal character sequence is a sequence of characters in the source string that is only valid in the context of the source string's character set and has no valid representation in the character set of the destination string. Use these constants to control the transcoding behavior in case an illegal sequence is encountered.

Note
You may set a single constant or a combination of constants (bitwise OR) as the encoder behavior. Which flags and combinations are supported depends on the underlying implementation; use supportsConversionFlags() to query this information at runtime.
Enumerator
AbortTranscodingOnIllegalSequence 

Abort transcoding (returning an error condition) if an illegal sequence is encountered.

DiscardIllegalSequences 

Skip over any illegal character sequences that are encountered.

TransliterateIllegalSequences 

Replace illegal character sequences with an available representation in the destination character set that somewhat resembles the meaning (e.g. ö -> "o).

The actual results may vary depending on the underlying implementation.
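
Because the enumerators are distinct powers of two, they can be combined and tested with bitwise operators. The sketch below mirrors the enumerator values listed above in a stand-alone enum so that it compiles without DCMTK; `describeFlags` is a hypothetical helper, not part of the class.

```cpp
#include <string>

// Values mirror the ConversionFlags enumerators listed above.
enum ConversionFlags
{
    AbortTranscodingOnIllegalSequence = 1,
    DiscardIllegalSequences = 2,
    TransliterateIllegalSequences = 4
};

// Hypothetical helper: list the names of all flags set in a bitmask.
std::string describeFlags(unsigned flags)
{
    std::string out;
    auto add = [&](unsigned bit, const char* name) {
        if (flags & bit)
        {
            if (!out.empty())
                out += " | ";
            out += name;
        }
    };
    add(AbortTranscodingOnIllegalSequence, "AbortTranscodingOnIllegalSequence");
    add(DiscardIllegalSequences, "DiscardIllegalSequences");
    add(TransliterateIllegalSequences, "TransliterateIllegalSequences");
    return out.empty() ? std::string("0") : out;
}
```

A combination such as `TransliterateIllegalSequences | DiscardIllegalSequences` has the numeric value 6. Whether a given combination is accepted still depends on the underlying implementation.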

Constructor & Destructor Documentation

◆ OFCharacterEncoding() [1/2]

OFCharacterEncoding::OFCharacterEncoding ( )

constructor.

Will create an OFCharacterEncoding instance that does not refer to an encoder.

◆ OFCharacterEncoding() [2/2]

OFCharacterEncoding::OFCharacterEncoding ( const OFCharacterEncoding & rhs)

copy constructor.

Will share the encoder of another OFCharacterEncoding instance.

Parameters
rhs  another OFCharacterEncoding instance.

Member Function Documentation

◆ clear()

void OFCharacterEncoding::clear ( )

clear the internal state.

This resets the converter and potentially frees all used resources (if this is the last OFCharacterEncoding instance referring to the encoder).

◆ convertString() [1/2]

OFCondition OFCharacterEncoding::convertString ( const OFString & fromString,
OFString & toString,
const OFBool  clearMode = OFTrue 
)

convert the given string between the selected character encodings.

That means selectEncoding() has to be called prior to this method.

Parameters
fromString  input string to be converted (using the source character encoding)
toString  reference to variable where the converted string (using the destination character encoding) is stored (or appended, see parameter 'clearMode')
clearMode  flag indicating whether to clear the variable 'toString' before appending the converted string
Returns
status, EC_Normal if successful, an error code otherwise
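
Since the implementation may rely on ICONV, the select-then-convert flow and the effect of 'clearMode' can be illustrated directly with POSIX `iconv`. This is a rough sketch under stated assumptions, not DCMTK's implementation: `convertWithIconv` is a hypothetical free function, it opens a new conversion descriptor on every call, and it omits the final flush that stateful encodings would need.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <string>

// Hypothetical free function (not part of DCMTK): convert 'from' between two
// named encodings using POSIX iconv. Returns false on any conversion error.
bool convertWithIconv(const char* fromEncoding, const char* toEncoding,
                      const std::string& from, std::string& to,
                      bool clearMode = true)
{
    // Note: iconv_open() expects the destination encoding first.
    iconv_t cd = iconv_open(toEncoding, fromEncoding);
    if (cd == (iconv_t)-1)
        return false;
    if (clearMode)
        to.clear();                       // otherwise the result is appended

    std::string input(from);              // private copy, iconv() takes char**
    char* in = &input[0];
    std::size_t inLeft = input.size();
    char buffer[256];
    bool ok = true;
    while (inLeft > 0)
    {
        char* out = buffer;
        std::size_t outLeft = sizeof(buffer);
        const std::size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
        const int err = errno;            // save before any other call
        to.append(buffer, sizeof(buffer) - outLeft);
        if (rc == (std::size_t)-1 && err != E2BIG)
        {
            ok = false;                   // illegal or incomplete sequence
            break;
        }
        // E2BIG only means the output buffer was full; continue converting.
    }
    iconv_close(cd);
    return ok;
}
```

Calling the function a second time with `clearMode` set to false appends to the existing content of `to`, which corresponds to the documented behavior of the 'clearMode' parameter.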

◆ convertString() [2/2]

OFCondition OFCharacterEncoding::convertString ( const char *  fromString,
const size_t  fromLength,
OFString & toString,
const OFBool  clearMode = OFTrue 
)

convert the given string between the selected character encodings.

That means selectEncoding() has to be called prior to this method. Since the length of the input string has to be specified explicitly, the string can contain more than one NULL byte.

Parameters
fromString  input string to be converted (using the source character encoding). A NULL pointer is regarded as an empty string.
fromLength  length of the input string (number of bytes without the trailing NULL byte)
toString  reference to variable where the converted string (using the destination character encoding) is stored (or appended, see parameter 'clearMode')
clearMode  flag indicating whether to clear the variable 'toString' before appending the converted string
Returns
status, EC_Normal if successful, an error code otherwise

◆ countCharactersInUTF8String()

static size_t OFCharacterEncoding::countCharactersInUTF8String ( const OFString & utf8String)

count characters in given UTF-8 string and return the resulting number of so-called "code points".

Please note that invalid UTF-8 encodings are not handled properly. ASCII strings (7-bit) are also supported, although OFString::length() is probably much faster.

Parameters
utf8String  valid character string with UTF-8 encoding
Returns
number of characters (code points) in given UTF-8 string
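
Counting code points in valid UTF-8 amounts to counting every byte that is not a continuation byte (bit pattern 10xxxxxx). The following stand-alone sketch shows that technique; `countCodePoints` is a hypothetical function, and it is not claimed to be the code DCMTK uses. Like the documented method, it does not detect invalid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-alone counterpart: count code points by skipping
// UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
    {
        if ((byte & 0xC0) != 0x80)   // ASCII byte or lead byte starts a code point
            ++count;
    }
    return count;
}
```

For a pure 7-bit ASCII string the result equals the string length, which is why the length query is the cheaper choice in that case.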

◆ getConversionFlags()

unsigned OFCharacterEncoding::getConversionFlags ( ) const

get flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion.

Note
This method will always return 0 if no encoder was selected using selectEncoding() before calling it.
Returns
a combination of the ConversionFlags constants (bitwise OR) that is currently set, or 0 if the current mode cannot be determined.

◆ getLibraryVersionString()

static OFString OFCharacterEncoding::getLibraryVersionString ( )

get version information of the underlying character encoding library.

Typical output format: "LIBICONV, Version 1.14". If character encoding is not available the output is: "<no character encoding library available>"

Returns
name and version number of the character encoding library

◆ getLocaleEncoding()

static OFString OFCharacterEncoding::getLocaleEncoding ( )

get the character encoding of the currently set global locale.

Remarks
Calling this function might be rather expensive, depending on the character set conversion library employed. It may therefore be worth caching the result.
Note
The result may be an empty string, if the name of the current encoding cannot be determined.
Returns
the current locale's character encoding
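
One way to cache such a result is a small wrapper that runs the query once and reuses the answer. This is a sketch only: `CachedEncodingName` is a hypothetical class, and the query is passed in as a callable so that the example compiles without DCMTK.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical cache: runs the (possibly expensive) query once and reuses the result.
class CachedEncodingName
{
public:
    explicit CachedEncodingName(std::function<std::string()> query)
        : query_(std::move(query)), cached_(false) {}

    // Returns the cached name, running the query on first use only.
    const std::string& get()
    {
        if (!cached_)
        {
            value_ = query_();
            cached_ = true;
        }
        return value_;
    }

    // Call after the global locale changes so that the next get() queries again.
    void invalidate() { cached_ = false; }

private:
    std::function<std::string()> query_;
    std::string value_;
    bool cached_;
};
```

A separate flag records whether the query has run, because an empty string is a legitimate result when the encoding name cannot be determined. In real code the callable would presumably wrap getLocaleEncoding(), converting the returned OFString as needed.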

◆ hasDefaultEncoding()

static OFBool OFCharacterEncoding::hasDefaultEncoding ( )

determine whether the underlying implementation defines a default encoding.

Most implementations define a default encoding, i.e. one can pass an empty string as the toEncoding and/or fromEncoding argument(s) of selectEncoding() to select the current locale's encoding. However, some iconv implementations inside the C standard library do not understand this.

Warning
Using getLocaleEncoding() instead of an empty string argument in this case typically won't work, since the implementations that don't define a default encoding typically also don't support determining the current locale's encoding.
Returns
OFTrue if a default encoding is defined and empty strings are valid arguments to selectEncoding(), OFFalse otherwise.

◆ isLibraryAvailable()

static OFBool OFCharacterEncoding::isLibraryAvailable ( )

check whether character set conversion is available, e.g. the underlying encoding library is available.

If not, no conversion between different character encodings will be possible (apart from the Windows-specific wide character conversion functions).

Returns
OFTrue if character set conversion is possible, OFFalse otherwise

◆ operator OFBool()

OFCharacterEncoding::operator OFBool ( ) const

check whether this object refers to a valid encoder.

Returns
OFTrue if this refers to a valid encoder, OFFalse otherwise.

◆ operator!()

OFBool OFCharacterEncoding::operator! ( ) const

check whether this object does not refer to a valid encoder.

Returns
OFFalse if this refers to a valid encoder, OFTrue otherwise.

◆ operator!=()

OFBool OFCharacterEncoding::operator!= ( const OFCharacterEncoding & rhs) const

check whether two OFCharacterEncoding instances do not refer to the same encoder.

Note
This only tests whether both objects refer to exactly the same encoder, originating from one and the same call to selectEncoding(). The result will be OFTrue if both encoders were constructed independently of each other, even if exactly the same parameters were used.
Parameters
rhs  another OFCharacterEncoding instance.
Returns
OFFalse if both instances refer to the same encoder, OFTrue otherwise.

◆ operator=()

OFCharacterEncoding& OFCharacterEncoding::operator= ( const OFCharacterEncoding & rhs)

copy assignment.

Effectively calls clear() and then shares the encoder of another OFCharacterEncoding instance.

Parameters
rhs  another OFCharacterEncoding instance.
Returns
reference to this object

◆ operator==()

OFBool OFCharacterEncoding::operator== ( const OFCharacterEncoding & rhs) const

check whether two OFCharacterEncoding instances refer to the same encoder.

Note
This only tests whether both objects refer to exactly the same encoder, originating from one and the same call to selectEncoding(). The result will be OFFalse if both encoders were constructed independently of each other, even if exactly the same parameters were used.
Parameters
rhs  another OFCharacterEncoding instance.
Returns
OFTrue if both instances refer to the same encoder, OFFalse otherwise.
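
The distinction between encoder identity and parameter equality can be shown with a small model. This is a sketch only: `Encoder` and `Handle` are hypothetical types, and comparing `std::shared_ptr` values is an assumed stand-in for how the comparison might be realized.

```cpp
#include <memory>
#include <string>

// Hypothetical encoder state; not a DCMTK type.
struct Encoder
{
    std::string from;
    std::string to;
};

// Hypothetical handle whose equality compares encoder identity, not parameters.
struct Handle
{
    std::shared_ptr<Encoder> enc;

    // Each call creates a new, independent encoder.
    void select(const std::string& from, const std::string& to)
    {
        enc = std::make_shared<Encoder>(Encoder{from, to});
    }

    bool operator==(const Handle& rhs) const { return enc == rhs.enc; }
    bool operator!=(const Handle& rhs) const { return !(*this == rhs); }
};
```

In this model, a copy of a handle compares equal to its source, while a second handle configured with the same encoding names through its own select() call compares unequal.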

◆ selectEncoding()

OFCondition OFCharacterEncoding::selectEncoding ( const OFString & fromEncoding,
const OFString & toEncoding 
)

select source and destination character encoding for subsequent conversion(s).

The encoding names can be found in the documentation of the underlying implementation (e.g. libiconv). Typical names are "ASCII", "ISO-8859-1" and "UTF-8". An empty string denotes the encoding of the current locale (see getLocaleEncoding()).

Parameters
fromEncoding  name of the source character encoding
toEncoding  name of the destination character encoding
Returns
status, EC_Normal if successful, an error code otherwise

◆ setConversionFlags()

OFCondition OFCharacterEncoding::setConversionFlags ( const unsigned  flags)

set flags controlling converter behavior, e.g. specifying how illegal character sequences should be handled during conversion.

Precondition
An encoding has been selected by successfully calling OFCharacterEncoding::selectEncoding(), i.e. OFCharacterEncoding::isLibraryAvailable() and *this evaluate to OFTrue.
Parameters
flags  the conversion flags that shall be used, a combination of the OFCharacterEncoding::ConversionFlags constants, e.g. TransliterateIllegalSequences | DiscardIllegalSequences.
Returns
EC_Normal if the flags were set, an error code otherwise, i.e. if the flags are not supported by the underlying implementation.
See also
OFCharacterEncoding::supportsConversionFlags()

◆ supportsConversionFlags()

static OFBool OFCharacterEncoding::supportsConversionFlags ( const unsigned  flags)

determine whether the underlying implementation supports the given conversion flags.

Parameters
flags  the flags to query, a combination of OFCharacterEncoding::ConversionFlags constants, e.g. TransliterateIllegalSequences | DiscardIllegalSequences.
Returns
OFTrue if the given flags are supported, OFFalse if not or support is unknown.
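
Because support for flag combinations varies, a caller may want to try a preferred combination first and fall back to simpler ones. The sketch below shows one such selection strategy; `chooseFlags` is a hypothetical helper, and the support check is passed in as a predicate so that the example compiles without DCMTK.

```cpp
#include <functional>
#include <vector>

// Hypothetical helper: return the first candidate flag combination that the
// 'supports' predicate accepts, or 0 if none of them is accepted.
unsigned chooseFlags(const std::vector<unsigned>& candidates,
                     const std::function<bool(unsigned)>& supports)
{
    for (unsigned flags : candidates)
    {
        if (supports(flags))
            return flags;
    }
    return 0;
}
```

In real code the predicate would presumably be supportsConversionFlags(), and a non-zero result would then be passed to setConversionFlags() after an encoding has been selected.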

The documentation for this class was generated from the following file:


Generated on Thu Nov 29 2018 for DCMTK Version 3.6.4 by Doxygen 1.8.14