java.lang.Object
  edu.utexas.its.eis.tools.qwicap.servlet.ContentType
public class ContentType
The ContentType class parses the values of the "Content-type" headers in standard email messages (RFC 822), MIME email messages (RFC 2045 and 2046), and HTTP transactions (RFC 2616). Because those standards differ with regard to their default character sets, this class is incomplete unless its constructor is supplied with an implementation of ContentTypeDefaultCharSet that knows the default character set rule(s) for a particular standard.
RFCs 822, 2045, 2046 and 2616 share a substantially common set of rules for their "Content-Type" identifiers. Seen from the perspective of the HTTP 1.1 standard (RFC 2616), which is the most derivative of the standards with regard to "Content-Type", the ultimate set of rules is derived in the following manner:
The HTTP 1.1 "Content-Type" header is discussed in section 14.17 of RFC 2616. The "Content-Type" header specifies a media type. Media types are discussed in section 3.7 of RFC 2616. Section 3.7 mentions several important facts:
Section 3.7.1 of RFC 2616, "Canonicalization and Text Defaults", also has something interesting to say:
When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.
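The default quoted above is exactly the kind of rule a ContentTypeDefaultCharSet implementation is expected to know. That interface's methods are not shown on this page, so the following is only a standalone sketch of the RFC 2616 rule; the class name HttpDefaultCharsetSketch and the method defaultCharsetFor are hypothetical and are not part of Qwicap.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Qwicap): the RFC 2616 section 3.7.1 text default. */
final class HttpDefaultCharsetSketch {

    /**
     * @param mediaType a bare media type such as "text/html" (no parameters), or null
     * @return "ISO-8859-1" for any subtype of "text"; null when this rule defines no default
     */
    static String defaultCharsetFor(String mediaType) {
        if (mediaType == null) {
            return null;
        }
        // Media type names are case-insensitive, so normalize before comparing.
        String lower = mediaType.trim().toLowerCase(Locale.ROOT);
        return lower.startsWith("text/") ? "ISO-8859-1" : null;
    }

    public static void main(String[] args) {
        System.out.println(defaultCharsetFor("text/html"));
        System.out.println(defaultCharsetFor("image/png"));
    }
}
```

A rule for plain RFC 822 or MIME messages would presumably return a different default, which is why the rule is kept separate from the parser.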
Section 3.7.2 of RFC 2616, "Multipart Types", states that multi-part HTTP uses the same syntax defined in RFC 2046, section 5.1.1. The "abstract" at the start of RFC 2046 states:
The initial document in this set, RFC 2045, specifies the various headers used to describe the structure of MIME messages. This second document defines the general structure of the MIME media typing system and defines an initial set of media types. [....]
Thus, the HTTP 1.1 standard (RFC 2616) bases its handling of multipart data on RFC 2046, and RFC 2046 depends on RFC 2045 to specify the relevant headers. Section 5 of RFC 2045 defines the "Content-Type" header, and thus seems to be the ultimate source of the definition of the "Content-Type" header of HTTP 1.1 (RFC 2616). Section 5.1 of RFC 2045 provides the detailed definition of the "Content-Type" header syntax (using the augmented BNF of RFC 822):
content := "Content-Type" ":" type "/" subtype *(";" parameter)
Where "parameter" is defined as:
parameter := attribute "=" value
And "value" is defined as:
value := token / quoted-string
However, while "token" is defined in section 5.1 of RFC 2045, "quoted-string" is not defined in RFC 2045 at all. To find the definition of "quoted-string" we have to refer back to RFC 822, section 3.3, "Lexical Tokens", which supplies the following definition:
quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                            ;   quoted chars.
Where "qtext" is defined as:
qtext = <any CHAR excepting <">,     ; => may be folded
         "\" & CR, and including
         linear-white-space>
And "quoted-pair" is defined as:
quoted-pair = "\" CHAR ; may quote any char
So, a "quoted-pair" identifies a conventional backslash ('\') based single-character escaping mechanism, as explained in detail in RFC 822 section 3.4.1, where it is referred to as "quoting". That's a misleading choice of terminology, because the escaping mechanism can apply to any character, not just quotes, but that's the term RFC 822 uses, so we're stuck with it.
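To make the quoted-pair rule concrete, here is a minimal sketch of decoding a quoted-string into its literal value. It is an illustration only, not Qwicap's code: the class QuotedStringSketch and the method unquote are hypothetical names, and the sketch assumes its input is a complete quoted-string, including both surrounding double quotes.

```java
/** Hypothetical helper (not part of Qwicap): decodes an RFC 822 quoted-string. */
final class QuotedStringSketch {

    /**
     * Removes the surrounding double quotes and resolves quoted-pairs.
     * A backslash makes the following character literal, whatever that character is.
     */
    static String unquote(String quoted) {
        int end = quoted.length() - 1; // index of the closing quote
        if (end < 1 || quoted.charAt(0) != '"' || quoted.charAt(end) != '"') {
            throw new IllegalArgumentException("Not a quoted-string: " + quoted);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < end; i++) {
            char c = quoted.charAt(i);
            if (c == '\\') {
                // quoted-pair: take the next character as-is
                if (++i >= end) {
                    throw new IllegalArgumentException("Backslash escapes the closing quote: " + quoted);
                }
                c = quoted.charAt(i);
            } else if (c == '"') {
                throw new IllegalArgumentException("Unescaped quote inside quoted-string: " + quoted);
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"us-ascii\""));
        System.out.println(unquote("\"a\\\"b\\\\c\""));
    }
}
```

Note that the backslash check comes before the quote check, so an escaped double quote is kept as data rather than being treated as the end of the string.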
By the way, RFC 2045, section 5.1, helpfully includes the following examples of "Content-Type" headers:
Content-type: text/plain; charset=us-ascii (Plain text)
Content-type: text/plain; charset="us-ascii"
These do not illustrate the quoted-pair mechanism, but do illustrate the concepts of parameters whose values are tokens (charset=us-ascii), parameters whose values are quoted-strings (charset="us-ascii"), and parenthetical comments ((Plain text)). Comments are not included in the RFC 2045 BNF defining "parameter", but the text of section 5.1 states: "comments are allowed in accordance with RFC 822 rules for structured header fields". So, back we go to RFC 822, where we find the following definition of "comment" in section 3.3:
comment = "(" *(ctext / quoted-pair / comment) ")"
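Because "comment" refers to itself, comments can nest, and a parser has to count parentheses rather than search for the first ")". The sketch below shows one way to remove comments from a header value; it is an assumption-laden illustration, not Qwicap's implementation. The class CommentStripperSketch and the method stripComments are hypothetical, and the sketch simply deletes each comment instead of replacing it with white space.

```java
/** Hypothetical helper (not part of Qwicap): removes RFC 822 comments from a header value. */
final class CommentStripperSketch {

    static String stripComments(String header) {
        StringBuilder out = new StringBuilder();
        int depth = 0;            // current comment nesting level
        boolean inQuotes = false; // inside a quoted-string (never true while depth > 0)
        for (int i = 0; i < header.length(); i++) {
            char c = header.charAt(i);
            if (c == '\\' && (inQuotes || depth > 0) && i + 1 < header.length()) {
                // quoted-pair: the next character is literal, so it cannot open or close anything
                char next = header.charAt(++i);
                if (depth == 0) {
                    out.append(c).append(next); // keep the escape intact inside a quoted-string
                }
                continue;
            }
            if (depth == 0 && c == '"') {
                inQuotes = !inQuotes;
                out.append(c);
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')' && depth > 0) {
                depth--;
            } else if (depth == 0) {
                out.append(c);
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unbalanced comment in: " + header);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("text/plain; charset=us-ascii (Plain text)"));
    }
}
```

Parentheses inside a quoted-string are left alone, since they are ordinary characters there and not comment delimiters.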
Thus it seems that an RFC 2616 "Content-Type" header must support all of the features and syntax defined for it in RFC 2045 and RFC 822. That support includes: an unlimited number of parameters following the type/subtype; parameters including a trailing, parenthetical comment; quoted parameter values; and a backslash-based escaping mechanism for use within quoted parameter values and comments.
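The features listed above can be exercised together with a small hand-written parser. The following is a sketch under stated assumptions and is not the implementation behind this class: ContentTypeSketch, its fields, and its helper methods are all hypothetical names, and the sketch is more lenient than RFC 2045 in that it tolerates white space and comments around every separator.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical parser (not part of Qwicap) for a "Content-Type" header value. */
final class ContentTypeSketch {
    final String type;
    final String subtype;
    final Map<String, String> parameters = new LinkedHashMap<>(); // names lower-cased

    private final String s;
    private int pos = 0;

    ContentTypeSketch(String value) {
        s = value;
        type = token().toLowerCase(Locale.ROOT);
        expect('/');
        subtype = token().toLowerCase(Locale.ROOT);
        skipSpaceAndComments();
        while (pos < s.length()) {           // *(";" parameter)
            expect(';');
            String name = token().toLowerCase(Locale.ROOT);
            expect('=');
            skipSpaceAndComments();
            boolean quoted = pos < s.length() && s.charAt(pos) == '"';
            parameters.put(name, quoted ? quotedString() : token());
            skipSpaceAndComments();
        }
    }

    private void skipSpaceAndComments() {
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == '(') {
                skipComment();
            } else {
                return;
            }
        }
    }

    private void skipComment() {             // comments may nest
        int depth = 0;
        do {
            if (pos >= s.length()) throw error("unterminated comment");
            char c = s.charAt(pos++);
            if (c == '\\') {
                pos++;                       // quoted-pair: skip the escaped character too
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            }
        } while (depth > 0);
    }

    private String token() {
        skipSpaceAndComments();
        int start = pos;
        while (pos < s.length() && isTokenChar(s.charAt(pos))) pos++;
        if (start == pos) throw error("token expected");
        return s.substring(start, pos);
    }

    private static boolean isTokenChar(char c) {
        // RFC 2045: any US-ASCII character except SPACE, CTLs, and tspecials
        return c > 32 && c < 127 && "()<>@,;:\\\"/[]?=".indexOf(c) < 0;
    }

    private String quotedString() {
        StringBuilder out = new StringBuilder();
        pos++;                               // opening quote
        while (pos < s.length()) {
            char c = s.charAt(pos++);
            if (c == '"') return out.toString();
            if (c == '\\') {                 // quoted-pair
                if (pos >= s.length()) break;
                c = s.charAt(pos++);
            }
            out.append(c);
        }
        throw error("unterminated quoted-string");
    }

    private void expect(char c) {
        skipSpaceAndComments();
        if (pos >= s.length() || s.charAt(pos) != c) throw error("'" + c + "' expected");
        pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at index " + pos + " in: " + s);
    }

    public static void main(String[] args) {
        ContentTypeSketch ct = new ContentTypeSketch("text/html; charset=\"UTF-8\" (Unicode 8-bit Encoding)");
        System.out.println(ct.type + "/" + ct.subtype + " " + ct.parameters);
    }
}
```

Type, subtype, and parameter names are lower-cased because they are matched case-insensitively; parameter values are stored exactly as decoded.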
Constructor Summary | |
---|---|
ContentType(String ContentTypeStr, edu.utexas.its.eis.tools.qwicap.servlet.ContentTypeDefaultCharSet DefaultCharSet) | Creates a ContentType instance which is a parsed representation of the value of a "Content-Type" header. |
Method Summary | |
---|---|
String | getCharacterSet() - Returns the canonicalized name of the character set identified by the "charset" parameter, if that parameter was present. |
boolean | getCharacterSetWasSpecified() - Returns true if the character set was explicitly identified in the content type, and false if it was missing. |
String | getMIMEType() - The MIME media type of this content. |
ContentTypeParameter | getParameter(String ParamName) - Returns the parameter that has the specified, case-insensitive name, or null if this content type did not include the specified parameter. |
ContentTypeParameter[] | getParameters() - Returns all of the parameters included in this content type. |
String | getSubtype() - The MIME subtype of this content. |
String | getType() - The MIME type of this content. |
String | toString() - The content type string passed to the constructor. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public ContentType(String ContentTypeStr, edu.utexas.its.eis.tools.qwicap.servlet.ContentTypeDefaultCharSet DefaultCharSet)
Creates a ContentType instance which is a parsed representation of the value of a "Content-Type" header.
Parameters:
ContentTypeStr - The value of a "Content-Type" header. For example: "text/html; charset="UTF-8" (Unicode 8-bit Encoding)".
DefaultCharSet - An instance of a class that can determine the default character set appropriate to a particular standard in the absence of a "charset" parameter in the content type value. Can be null, in which case the "default" charset will also be null.
Method Detail |
---|
public String toString()
The content type string passed to the constructor.
Overrides:
toString in class Object

public String getMIMEType()
The MIME media type of this content, or null if the MIME type was missing.

public String getType()
The MIME type of this content, or null if the MIME type was missing.

public String getSubtype()
The MIME subtype of this content, or null if the MIME type was missing.

public String getCharacterSet()
Returns the canonicalized name of the character set identified by the "charset" parameter, if that parameter was present, or null if the parameter was absent and the default character set could not be determined.

public boolean getCharacterSetWasSpecified()
Returns true if the character set was explicitly identified in the content type, and false if it was missing. The ability, or inability, to obtain a definitive character set identification from other sources, like a protocol's defaults, is irrelevant to the value returned by this method.
Returns:
true if the character set was explicitly identified in the content type, or false otherwise.

public ContentTypeParameter getParameter(String ParamName)
Returns the parameter that has the specified, case-insensitive name, or null if this content type did not include the specified parameter.
Parameters:
ParamName - The case-insensitive name of the parameter to retrieve.
Returns:
The requested parameter, or null if there was no such parameter.

public ContentTypeParameter[] getParameters()
Returns all of the parameters included in this content type.