edu.utexas.its.eis.tools.qwicap.servlet
Class ContentType

java.lang.Object
  extended by edu.utexas.its.eis.tools.qwicap.servlet.ContentType

public class ContentType
extends Object

The ContentType class parses the values of the "Content-type" headers in standard email messages (RFC 822), MIME email messages (RFC 2045 and 2046), and HTTP transactions (RFC 2616). Because those standards differ with regard to their default character sets, this class is incomplete unless its constructor is supplied with an implementation of ContentTypeDefaultCharSet that knows the default character set rule(s) for a particular standard.


About The "Content-Type" Header and HTTP

RFCs 822, 2045, 2046 and 2616 share a substantially common set of rules for their "Content-Type" identifiers. Seen from the perspective of the HTTP 1.1 standard (RFC 2616), which is the most derivative of the standards with regard to "Content-Type", the ultimate set of rules is derived in the following manner:

The HTTP 1.1 "Content-Type" header is discussed in section 14.17 of RFC 2616. The "Content-Type" header specifies a media type. Media types are discussed in section 3.7 of RFC 2616. Section 3.7 mentions several important facts:

Section 3.7.1 of RFC 2616, "Canonicalization and Text Defaults", also has something interesting to say:

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.
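The default rule quoted above can be sketched in a few lines. This is only an illustration of the HTTP rule itself; the class and method names (HttpDefaultCharsetSketch, effectiveCharset) are hypothetical and are not part of the Qwicap API or of the ContentTypeDefaultCharSet interface, whose methods are not described here.

```java
import java.util.Locale;

/** Hypothetical helper illustrating the RFC 2616 section 3.7.1 default; not Qwicap code. */
final class HttpDefaultCharsetSketch {

    /**
     * Returns the explicit charset if one was sent; otherwise "ISO-8859-1"
     * for any subtype of "text"; otherwise null (no default is defined).
     */
    static String effectiveCharset(String type, String explicitCharset) {
        if (explicitCharset != null && !explicitCharset.isEmpty()) {
            return explicitCharset;
        }
        if (type != null && type.toLowerCase(Locale.ROOT).equals("text")) {
            return "ISO-8859-1";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCharset("text", null));
        System.out.println(effectiveCharset("text", "UTF-8"));
        System.out.println(effectiveCharset("image", null));
    }
}
```

Other standards would substitute a different default in the second branch, which is presumably why the rule is supplied to the constructor rather than built into the class.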

Section 3.7.2 of RFC 2616, "Multipart Types", states that multi-part HTTP uses the same syntax defined in RFC 2046, section 5.1.1. The "abstract" at the start of RFC 2046 states:

The initial document in this set, RFC 2045, specifies the various headers used to describe the structure of MIME messages. This second document defines the general structure of the MIME media typing system and defines an initial set of media types. [....]

Thus, the HTTP 1.1 standard (RFC 2616) bases its handling of multipart data on RFC 2046, and RFC 2046 depends on RFC 2045 to specify the relevant headers. Section 5 of RFC 2045 defines the "Content-Type" header, and thus seems to be the ultimate source of the definition of the "Content-Type" header of HTTP 1.1 (RFC 2616). Section 5.1 of RFC 2045 provides the detailed definition of the "Content-Type" header syntax (using the augmented BNF of RFC 822):

    content := "Content-Type" ":" type "/" subtype
               *(";" parameter)
        

Where "parameter" is defined as:

    parameter := attribute "=" value
        

And "value" is defined as:

    value := token / quoted-string
        

However, while "token" is defined in section 5.1 of RFC 2045, "quoted-string" is not defined in RFC 2045 at all. To find the definition of "quoted-string" we have to refer back to RFC 822, section 3.3, "Lexical Tokens", which supplies the following definition:

    quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                ;   quoted chars.
        

Where "qtext" is defined as:

 
    qtext       =  <any CHAR excepting <">,     ; => may be folded
                    "\" & CR, and including
                    linear-white-space>
        

And "quoted-pair" is defined as:

    quoted-pair =  "\" CHAR                     ; may quote any char
        

So, a "quoted-pair" identifies a conventional backslash ('\') based single-character escaping mechanism, as explained in detail in RFC 822 section 3.4.1, where it is referred to as "quoting". That's a misleading choice of terminology, because the escaping mechanism can apply to any character, not just quotes, but that's the term RFC 822 uses, so we're stuck with it.
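To make the quoted-string and quoted-pair rules concrete, here is a minimal sketch of decoding a quoted-string into its literal value. It is not the Qwicap implementation; the class and method names are hypothetical, and line folding is ignored for brevity.

```java
/** Hypothetical helper illustrating RFC 822 quoted-string decoding; not Qwicap code. */
final class QuotedStringSketch {

    /** Removes the surrounding quotes and resolves each backslash quoted-pair. */
    static String unquote(String quoted) {
        int end = quoted.length() - 1;
        if (end < 1 || quoted.charAt(0) != '"' || quoted.charAt(end) != '"') {
            throw new IllegalArgumentException("Not a quoted-string: " + quoted);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < end; i++) {
            char c = quoted.charAt(i);
            if (c == '\\') {
                // quoted-pair: the next character is taken literally, whatever it is.
                if (i + 1 >= end) {
                    throw new IllegalArgumentException("Dangling backslash: " + quoted);
                }
                out.append(quoted.charAt(++i));
            } else if (c == '"') {
                throw new IllegalArgumentException("Unescaped quote inside: " + quoted);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"us-ascii\""));
        System.out.println(unquote("\"a\\\"b\\\\c\""));
    }
}
```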

By the way, RFC 2045, section 5.1, helpfully includes the following examples of "Content-Type" headers:

 
    Content-type: text/plain; charset=us-ascii (Plain text)

    Content-type: text/plain; charset="us-ascii"
        

These do not illustrate the quoted-pair mechanism, but do illustrate the concepts of parameters whose values are tokens (charset=us-ascii), parameters whose values are quoted-strings (charset="us-ascii"), and parenthetical comments ((Plain text)). Comments are not included in the RFC 2045 BNF defining "parameter", but the text of section 5.1 states: "comments are allowed in accordance with RFC 822 rules for structured header fields". So, back we go to RFC 822, where we find the following definition of "comment" in section 3.3:

   
    comment     =  "(" *(ctext / quoted-pair / comment) ")"
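Because "comment" appears in its own definition, comments may nest, and a quoted-pair inside a comment can hide a parenthesis. The following sketch removes comments from a header value while leaving quoted-strings untouched. It is an illustration under those two rules only; the names are hypothetical and it is not taken from the Qwicap source.

```java
/** Hypothetical helper illustrating RFC 822 comment removal; not Qwicap code. */
final class CommentStripperSketch {

    /** Returns the value with all (possibly nested) comments removed, then trimmed. */
    static String stripComments(String value) {
        StringBuilder out = new StringBuilder();
        int depth = 0;            // current comment nesting level
        boolean inQuotes = false; // inside a quoted-string, outside any comment
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && (inQuotes || depth > 0) && i + 1 < value.length()) {
                // quoted-pair: keep it verbatim inside a quoted-string, drop it inside a comment.
                if (inQuotes) {
                    out.append(c).append(value.charAt(i + 1));
                }
                i++;
            } else if (inQuotes) {
                if (c == '"') {
                    inQuotes = false;
                }
                out.append(c);
            } else if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            } else if (depth == 0) {
                if (c == '"') {
                    inQuotes = true;
                }
                out.append(c);
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("text/plain; charset=us-ascii (Plain text)"));
    }
}
```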
        

Thus it seems that an RFC 2616 "Content-Type" header must support all of the features and syntax defined for it in RFC 2045 and RFC 822. That support includes: an unlimited number of parameters following the type/subtype; parameters including a trailing, parenthetical comment; quoted parameter values; and a backslash-based escaping mechanism for use within quoted parameter values and comments.
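The feature list above can be sketched as one small parser. This is an independent illustration, not the ContentType implementation: the class name, its fields, and the simplified token rule (a token ends at whitespace, a parenthesis, a quote, or one of the listed stop characters) are assumptions made for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical single-pass parser for a "Content-Type" value; not Qwicap code. */
final class ContentTypeParseSketch {
    final String type;
    final String subtype;
    final Map<String, String> parameters = new LinkedHashMap<>(); // lower-cased name -> value

    private final String src;
    private int pos;

    ContentTypeParseSketch(String headerValue) {
        src = headerValue;
        skipSpaceAndComments();
        type = token("/;").toLowerCase(Locale.ROOT);
        expect('/');
        subtype = token(";").toLowerCase(Locale.ROOT);
        while (pos < src.length()) {            // *(";" parameter)
            expect(';');
            String name = token("=").toLowerCase(Locale.ROOT);
            expect('=');
            boolean quoted = pos < src.length() && src.charAt(pos) == '"';
            parameters.put(name, quoted ? quotedString() : token(";"));
        }
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        }
        pos++;
        skipSpaceAndComments();
    }

    private String token(String stops) {
        int start = pos;
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (Character.isWhitespace(c) || c == '(' || c == '"' || stops.indexOf(c) >= 0) break;
            pos++;
        }
        if (pos == start) throw new IllegalArgumentException("Token expected at index " + start);
        String result = src.substring(start, pos);
        skipSpaceAndComments();
        return result;
    }

    private String quotedString() {
        StringBuilder out = new StringBuilder();
        pos++;                                   // opening quote
        while (pos < src.length() && src.charAt(pos) != '"') {
            char c = src.charAt(pos++);
            if (c == '\\') {                     // quoted-pair
                if (pos >= src.length()) throw new IllegalArgumentException("Dangling backslash");
                c = src.charAt(pos++);
            }
            out.append(c);
        }
        if (pos >= src.length()) throw new IllegalArgumentException("Unterminated quoted-string");
        pos++;                                   // closing quote
        skipSpaceAndComments();
        return out.toString();
    }

    private void skipSpaceAndComments() {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == '(') {
                int depth = 0;
                do {                             // comments may nest and contain quoted-pairs
                    if (pos >= src.length()) throw new IllegalArgumentException("Unterminated comment");
                    c = src.charAt(pos++);
                    if (c == '\\') pos++;
                    else if (c == '(') depth++;
                    else if (c == ')') depth--;
                } while (depth > 0);
            } else {
                return;
            }
        }
    }
}
```

For example, constructing the sketch with the value text/html; charset="UTF-8" (Unicode 8-bit Encoding) yields type "text", subtype "html", and a single parameter "charset" with the value "UTF-8"; the trailing comment is discarded.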

Author:
Chris W. Johnson

Constructor Summary
ContentType(String ContentTypeStr, edu.utexas.its.eis.tools.qwicap.servlet.ContentTypeDefaultCharSet DefaultCharSet)
          Creates a ContentType instance which is a parsed representation of the value of a "Content-Type" header.
 
Method Summary
 String getCharacterSet()
          Returns the canonicalized name of the character set identified by the "charset" parameter, or of the default character set if that parameter was absent.
 boolean getCharacterSetWasSpecified()
          Returns true if the character set was explicitly identified in the content type, and false if it was missing.
 String getMIMEType()
          The MIME media type of this content.
 ContentTypeParameter getParameter(String ParamName)
          Returns the parameter that has the specified, case-insensitive name, or null if this content type did not include the specified parameter.
 ContentTypeParameter[] getParameters()
          Returns all of the parameters included in this content type.
 String getSubtype()
          The MIME subtype of this content.
 String getType()
          The MIME type of this content.
 String toString()
          The content type string passed to the constructor.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ContentType

public ContentType(String ContentTypeStr,
                   edu.utexas.its.eis.tools.qwicap.servlet.ContentTypeDefaultCharSet DefaultCharSet)
Creates a ContentType instance which is a parsed representation of the value of a "Content-Type" header.

Parameters:
ContentTypeStr - The value of a "Content-Type" header. For example: "text/html; charset="UTF-8" (Unicode 8-bit Encoding)".
DefaultCharSet - An instance of a class that can determine the default character set appropriate to a particular standard in the absence of a "charset" parameter in the content type value. Can be null, in which case the "default" charset will also be null.
Method Detail

toString

public String toString()
The content type string passed to the constructor.

Overrides:
toString in class Object
Returns:
The content type string passed to the constructor.

getMIMEType

public String getMIMEType()
The MIME media type of this content. For example, if the MIME media type is "text/plain", this method returns "text/plain".

Returns:
The MIME media type of this content, or null if the MIME media type was missing.

getType

public String getType()
The MIME type of this content. For example, if the MIME media type is "text/plain", the type is "text".

Returns:
The MIME type of this content, or null if the MIME type was missing.

getSubtype

public String getSubtype()
The MIME subtype of this content. For example, if the MIME media type is "text/plain", the subtype is "plain".

Returns:
The MIME subtype of this content, or null if the MIME type was missing.

getCharacterSet

public String getCharacterSet()
Returns the canonicalized name of the character set identified by the "charset" parameter, or of the default character set if that parameter was absent.

Returns:
The canonicalized name of the character set identified by the "charset" parameter, or null if the parameter was absent, and the default character set could not be determined.
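This page does not say how the name is canonicalized. One plausible reading, offered purely as an assumption, is that a charset alias is mapped to the canonical name known to the JVM. The sketch below shows that idea with the standard java.nio.charset.Charset class; the helper class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

/** Hypothetical helper; shows one way a charset name could be canonicalized. */
final class CharsetNameSketch {

    /** Maps a charset name or alias to the JVM's canonical name, if the JVM knows it. */
    static String canonicalName(String name) {
        if (name == null) {
            return null;
        }
        String trimmed = name.trim();
        try {
            return Charset.forName(trimmed).name();
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Unknown or malformed name: this sketch returns it unchanged.
            return trimmed;
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("us-ascii"));
        System.out.println(canonicalName("utf-8"));
    }
}
```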

getCharacterSetWasSpecified

public boolean getCharacterSetWasSpecified()
Returns true if the character set was explicitly identified in the content type, and false if it was missing. The ability, or inability, to obtain a definitive character set identification from other sources, like a protocol's defaults, is irrelevant to the value returned by this method.

Returns:
true if the character set was explicitly identified in the content type, or false otherwise.

getParameter

public ContentTypeParameter getParameter(String ParamName)
Returns the parameter that has the specified, case-insensitive name, or null if this content type did not include the specified parameter.

Parameters:
ParamName - The case-insensitive name of the parameter to retrieve.
Returns:
The requested parameter, or null, if there was no such parameter.
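The case-insensitive lookup described above can be sketched with a map ordered by String.CASE_INSENSITIVE_ORDER. This is an assumption about one reasonable approach, not a description of how ContentType stores its parameters; the class is hypothetical and uses plain String values in place of ContentTypeParameter objects.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a parameter table with case-insensitive names. */
final class ParameterLookupSketch {
    // The comparator makes "charset", "Charset" and "CHARSET" the same key.
    private final Map<String, String> params = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void put(String name, String value) {
        params.put(name, value);
    }

    /** Returns the value stored under the name, ignoring case, or null if absent. */
    String get(String name) {
        return params.get(name);
    }

    public static void main(String[] args) {
        ParameterLookupSketch table = new ParameterLookupSketch();
        table.put("Charset", "UTF-8");
        System.out.println(table.get("CHARSET"));
        System.out.println(table.get("boundary"));
    }
}
```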

getParameters

public ContentTypeParameter[] getParameters()
Returns all of the parameters included in this content type.

Returns:
The parameters included in this content type, or an empty array if there were no parameters.