edu.utexas.its.eis.tools.qwicap.util
Class HTMLEntityCodec

java.lang.Object
  extended by edu.utexas.its.eis.tools.qwicap.util.HTMLEntityCodec

public final class HTMLEntityCodec
extends Object

HTMLEntityCodec converts most control characters, various reserved characters (ampersands, double-quotes, chevrons, etc.), and all Unicode characters with code points greater than or equal to 127, to and from HTML character entity references. When encoding, the symbolic form of an HTML character entity (e.g. "&middot;") is always used in preference to the numeric form (e.g. "&#183;"). All character entity references specified in the HTML 4 specification are supported.
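
The encoding rule described above can be illustrated with a small, self-contained sketch. It is not the Qwicap implementation: the class name EntityEncodeSketch and its tiny entity table are hypothetical, only a handful of the HTML 4 entities are listed, and the special handling of control characters is not modelled. It only shows the preference for a symbolic reference, with a numeric reference as the fallback for characters at or above 127.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in used for illustration only; not part of Qwicap.
final class EntityEncodeSketch {
    // A deliberately tiny subset of the HTML 4 entity table (assumption).
    private static final Map<Character, String> NAMED = new HashMap<>();
    static {
        NAMED.put('&', "&amp;");
        NAMED.put('<', "&lt;");
        NAMED.put('>', "&gt;");
        NAMED.put('"', "&quot;");
        NAMED.put('\u00B7', "&middot;");
        NAMED.put('\u00E9', "&eacute;");
    }

    // Encodes one UTF-16 char at a time, so characters outside the Basic
    // Multilingual Plane (surrogate pairs) are not handled correctly here.
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String named = NAMED.get(c);
            if (named != null) {
                out.append(named);                              // symbolic form preferred
            } else if (c >= 127) {
                out.append("&#").append((int) c).append(';');   // numeric fallback
            } else {
                out.append(c);                                  // left unchanged
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, EntityEncodeSketch.encode("a < b") yields "a &lt; b", and a character with no symbolic entity in the table, such as U+0100, falls back to "&#256;".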

Version:
1.3
Author:
Chris W. Johnson
See Also:
Character entity references in HTML 4

Constructor Summary
HTMLEntityCodec()
          Creates an HTMLEntityCodec instance for translating back and forth between Unicode characters and HTML character entity references.
HTMLEntityCodec(boolean EncodeDoubleQuotes)
          Creates an HTMLEntityCodec instance for translating back and forth between Unicode characters and HTML character entity references.
 
Method Summary
 Characters decode(Characters EnStr)
          Decodes a string, represented as a Characters object, by translating HTML character entity references into corresponding Unicode characters.
 boolean decode(Characters EnStr, StringBuffer Buff)
          Decodes a string, represented as a Characters object, by translating HTML character entity references into corresponding Unicode characters.
 boolean decode(Characters EnStr, StringBuilder Buff)
          Decodes a string, represented as a Characters object, by translating HTML character entity references into corresponding Unicode characters.
 String decode(String EnStr)
          Decodes a string by translating HTML character entity references into Unicode characters.
 boolean decode(String EnStr, StringBuffer Buff)
          Decodes a string by translating HTML character entity references into corresponding Unicode characters.
 boolean decode(String EnStr, StringBuilder Buff)
          Decodes a string by translating HTML character entity references into corresponding Unicode characters.
 Characters encode(Characters UnStr)
          Encodes a Unicode string, represented as a Characters object, by translating characters into HTML character entity references, as needed.
 boolean encode(Characters UnStr, StringBuffer Buff)
          Encodes a Unicode string, represented as a Characters object, by translating characters into HTML character entity references, as needed.
 boolean encode(Characters UnStr, StringBuilder Buff)
          Encodes a Unicode string, represented as a Characters object, by translating characters into HTML character entity references, as needed.
 boolean encode(CharSequence UnStr, StringBuffer Buff)
          Encodes a Unicode character sequence by translating characters into HTML character entity references, as needed.
 boolean encode(CharSequence UnStr, StringBuilder Buff)
          Encodes a Unicode character sequence by translating characters into HTML character entity references, as needed.
 String encode(String UnStr)
          Encodes a Unicode string by translating characters into HTML character entity references, as needed.
 String getEncoding(char Char)
          Returns the encoded form of the specified Unicode character as a String, or returns null, if the character will be left unchanged.
 HTMLEntityCodec setEncoding(char Char, String Encoded)
          Sets the encoded form of the specified Unicode character.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HTMLEntityCodec

public HTMLEntityCodec()
Creates an HTMLEntityCodec instance for translating back and forth between Unicode characters and HTML character entity references. Double-quote characters are translated to/from the HTML character entity reference "&quot;". Unprintable characters are omitted. Line separator characters ('\n') are left unaltered.


HTMLEntityCodec

public HTMLEntityCodec(boolean EncodeDoubleQuotes)
Creates an HTMLEntityCodec instance for translating back and forth between Unicode characters and HTML character entity references.

Parameters:
EncodeDoubleQuotes - true if the double-quote character (") should be replaced by an HTML character entity reference, false if it should be left unchanged.
Method Detail

getEncoding

public String getEncoding(char Char)
Returns the encoded form of the specified Unicode character as a String, or returns null, if the character will be left unchanged.

Parameters:
Char - The character to be encoded.
Returns:
The encoded form of the character as a String, or null if the character doesn't need to be encoded.

setEncoding

public HTMLEntityCodec setEncoding(char Char,
                                   String Encoded)
Sets the encoded form of the specified Unicode character.

Parameters:
Char - The character for which an encoding is being specified.
Encoded - The encoding to be used for the specified character, or null, if the character should be left unchanged.
Returns:
A reference to this HTMLEntityCodec instance.
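
The contract of getEncoding and setEncoding (a null encoding means "leave the character unchanged", and the setter returns the instance so calls can be chained) might be modelled as below. This is a sketch under stated assumptions: EncodingTableSketch is a hypothetical class backed by a plain HashMap, and nothing here reflects how the real class stores its table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in used for illustration only; not part of Qwicap.
final class EncodingTableSketch {
    private final Map<Character, String> table = new HashMap<>();

    // A null encoding removes the entry, so the character is left unchanged.
    // Returning this allows several calls to be chained.
    EncodingTableSketch setEncoding(char ch, String encoded) {
        if (encoded == null) {
            table.remove(ch);
        } else {
            table.put(ch, encoded);
        }
        return this;
    }

    // Returns null when the character has no encoding and passes through as-is.
    String getEncoding(char ch) {
        return table.get(ch);
    }
}
```

With this shape, a caller could write new EncodingTableSketch().setEncoding('\n', "<br>").setEncoding('"', "&quot;") to configure two characters in a single expression, and later pass null to restore pass-through behavior for either one.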

encode

public String encode(String UnStr)
Encodes a Unicode string by translating characters into HTML character entity references, as needed.

Parameters:
UnStr - A Unicode string.
Returns:
A string in which every character that requires encoding has been replaced by its HTML character entity reference.

encode

public boolean encode(CharSequence UnStr,
                      StringBuffer Buff)
Encodes a Unicode character sequence by translating characters into HTML character entity references, as needed.

Parameters:
UnStr - A Unicode character sequence.
Buff - A StringBuffer to which the output of this method is to be appended. The buffer is not cleared.
Returns:
true if any characters were encoded, false if no characters were encoded.

encode

public boolean encode(CharSequence UnStr,
                      StringBuilder Buff)
Encodes a Unicode character sequence by translating characters into HTML character entity references, as needed.

Parameters:
UnStr - A Unicode character sequence.
Buff - A StringBuilder to which the output of this method is to be appended. The buffer is not cleared.
Returns:
true if any characters were encoded, false if no characters were encoded.
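
The buffer-based overloads append to the supplied buffer without clearing it and report whether anything was encoded. A minimal sketch of that contract follows. AppendEncodeSketch is a hypothetical class that handles only four reserved characters; it is meant to show the append-and-report behavior, not the real encoding table.

```java
// Hypothetical stand-in used for illustration only; not part of Qwicap.
final class AppendEncodeSketch {
    // Appends the encoded form of text to buff (existing content is kept)
    // and returns true only if at least one character was replaced.
    static boolean encode(CharSequence text, StringBuilder buff) {
        boolean changed = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String entity;
            switch (c) {
                case '&': entity = "&amp;";  break;
                case '<': entity = "&lt;";   break;
                case '>': entity = "&gt;";   break;
                case '"': entity = "&quot;"; break;
                default:  entity = null;
            }
            if (entity != null) {
                buff.append(entity);
                changed = true;
            } else {
                buff.append(c);
            }
        }
        return changed;
    }
}
```

Because the buffer is not cleared, a caller can build markup incrementally: append a literal opening tag, encode user-supplied text into the same buffer, then append the closing tag. The boolean result lets a caller detect that the input contained nothing to encode.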

encode

public Characters encode(Characters UnStr)
Encodes a Unicode string, represented as a Characters object, by translating characters into HTML character entity references, as needed.

Parameters:
UnStr - A Unicode string.
Returns:
A sequence of characters in which every character that requires encoding has been replaced by its HTML character entity reference.

encode

public boolean encode(Characters UnStr,
                      StringBuffer Buff)
Encodes a Unicode string, represented as a Characters object, by translating characters into HTML character entity references, as needed.

Parameters:
UnStr - A Unicode string.
Buff - A StringBuffer to which the output of this method is to be appended. The buffer is not cleared.
Returns:
true if any characters were encoded, false if no characters were encoded.

encode

public boolean encode(Characters UnStr,
                      StringBuilder Buff)
Encodes a Unicode string, represented as a Characters object, by translating characters into HTML character entity references, as needed.

Parameters:
UnStr - A Unicode string.
Buff - A StringBuilder to which the output of this method is to be appended. The buffer is not cleared.
Returns:
true if any characters were encoded, false if no characters were encoded.

decode

public String decode(String EnStr)
Decodes a string by translating HTML character entity references into Unicode characters.

Parameters:
EnStr - The string to be decoded.
Returns:
A String in which all of the HTML character entity references in the original string have been translated into their Unicode equivalents.

decode

public boolean decode(String EnStr,
                      StringBuffer Buff)
Decodes a string by translating HTML character entity references into corresponding Unicode characters.

Parameters:
EnStr - The string to be decoded.
Buff - A StringBuffer to which the output of this method is to be appended. The buffer is not cleared.
Returns:
true if any characters were decoded, false if no characters were decoded.

decode

public boolean decode(String EnStr,
                      StringBuilder Buff)
Decodes a string by translating HTML character entity references into corresponding Unicode characters.

Parameters:
EnStr - The string to be decoded.
Buff - A StringBuilder to which the output of this method is to be appended. The buffer is not cleared.
Returns:
true if any characters were decoded, false if no characters were decoded.
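
Decoding in the same append-and-report style can be sketched as a single scan for "&...;" spans. The class DecodeSketch below is hypothetical and knows only five symbolic entities plus decimal and hexadecimal numeric references. How the real class treats unrecognized or malformed references is not stated in this documentation; the sketch simply copies them through unchanged, which is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in used for illustration only; not part of Qwicap.
final class DecodeSketch {
    // A deliberately tiny subset of the HTML 4 entity table (assumption).
    private static final Map<String, Character> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", '&');
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
        NAMED.put("quot", '"');
        NAMED.put("middot", '\u00B7');
    }

    // Appends the decoded text to buff and returns true if any reference was decoded.
    static boolean decode(String text, StringBuilder buff) {
        boolean changed = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int end = (c == '&') ? text.indexOf(';', i + 1) : -1;
            int codePoint = (end > i + 1) ? resolve(text.substring(i + 1, end)) : -1;
            if (codePoint >= 0) {
                buff.appendCodePoint(codePoint);
                changed = true;
                i = end + 1;          // skip past the whole reference
            } else {
                buff.append(c);       // ordinary character or unrecognized reference
                i++;
            }
        }
        return changed;
    }

    // Returns the code point for "name", "#NNN", or "#xHH", or -1 if unrecognized.
    private static int resolve(String body) {
        Character named = NAMED.get(body);
        if (named != null) {
            return named.charValue();
        }
        if (body.length() > 1 && body.charAt(0) == '#') {
            try {
                boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
                int value = hex ? Integer.parseInt(body.substring(2), 16)
                                : Integer.parseInt(body.substring(1));
                return Character.isValidCodePoint(value) ? value : -1;
            } catch (NumberFormatException e) {
                return -1;
            }
        }
        return -1;
    }
}
```

In this sketch both "&middot;" and "&#183;" decode to the same character (U+00B7), while a bare ampersand or an unknown name such as "&bogus;" is appended unchanged.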

decode

public Characters decode(Characters EnStr)
Decodes a string, represented as a Characters object, by translating HTML character entity references into corresponding Unicode characters.

Parameters:
EnStr - The string to be decoded.
Returns:
A Characters object in which all of the HTML character entity references in the original string have been translated into their Unicode equivalents.

decode

public boolean decode(Characters EnStr,
                      StringBuffer Buff)
Decodes a string, represented as a Characters object, by translating HTML character entity references into corresponding Unicode characters.

Parameters:
EnStr - The string to be decoded.
Buff - A StringBuffer to which the output of this method is to be appended. The buffer is not cleared.
Returns:
true if any characters were decoded, false if no characters were decoded.

decode

public boolean decode(Characters EnStr,
                      StringBuilder Buff)
Decodes a string, represented as a Characters object, by translating HTML character entity references into corresponding Unicode characters.

Parameters:
EnStr - The string to be decoded.
Buff - A StringBuilder to which the output of this method is to be appended. The buffer is not cleared.
Returns:
true if any characters were decoded, false if no characters were decoded.