On Mon, Jun 27, 2011 at 6:41 PM, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
> On 06/27/2011 06:07 PM, Austin Clements wrote:
>> Oh, right, of course. show_message_part will walk into the parts, so
>> format_part_content_raw will still be called on the leafs of a
>> requested multipart. Though, this approach results in each leaf being
>> transfer decoded and printed individually, so if you ask for a
>> multipart, you won't get the "raw" contents of the multipart (unless
>> it's part 0), so much as you get the concatenated "raw" contents of
>> each part in the multipart.
>
>
> let's take two labeled examples:
>
> A└┬╴multipart/signed 58292 bytes
> B ├┬╴multipart/mixed 56553 bytes
> C │├╴text/plain 1278 bytes
> D │├╴text/plain attachment [grub-install.out] 54109 bytes
> E │└╴text/x-diff attachment [597538.patch] 496 bytes
> F └╴application/pgp-signature attachment [signature.asc] 900 bytes
>
>
> X└┬╴multipart/signed 3863 bytes
> Y ├╴text/plain 1857 bytes
> Z └╴application/pgp-signature attachment [signature.asc] 900 bytes
>
> (i know, you won't use "A" or "Z" as part IDs once we have hierarchical
> part numbers, but consider them placeholders).
>
> if parts F or Z are ever going to be useful (e.g. to some external
> process that wants to validate the signature by hand), then the tool
> needs to provide some way of producing parts B and Y in a pristine form
> (that is, including MIME headers and without interpreting/applying any
> transfer encodings).
>
> Perhaps this means there are two flavors of "raw" that we should be
> distinguishing, something like:
>
> 0) "source" -- the equivalent to viewing the source of the message,
> with headers and without attempting to reverse transfer-encodings, etc.
>
> 1) "rare" -- (not entirely raw, but still bloody, ha ha) strip headers,
> reverse transfer encodings, etc.
>
> I think our current implementation of --format=raw emits "source" when
> applied to the entire message, but "rare" when applied to one of the parts.

Yes.

> I'm suggesting that it might be useful to be able to get "source" of a
> part. (and perhaps it might also be useful to get the whole message
> "rare" sometimes?)
>
> My first instinct was: if it's multipart, provide "source", if it's
> single-part, provide "rare". But that fails for the XYZ case above --
> we'd need Y (which is single-part) to be provided as "source" if we were
> ever to be able to make use of Z on its own, so i don't think it'll be
> that simple.
>
> OTOH, i'm not sure that "rare" is particularly meaningful for non-leaf
> parts.
>
>> That if you ask for a multipart, you should effectively get a slice
>> out of the original message bytes (since multipart/* parts can't have
>> non-identity transfer encodings). Are you also saying that should
>> extend to transfer encoded leaf parts, too?
>
> hmm. is it true that multipart/* parts can't have non-identity transfer
> encodings? that would simplify some things, but i don't have a
> reference handy that says it's the case.

RFC 2045, section 6.4: "If an entity is of type "multipart" the
Content-Transfer-Encoding is not permitted to have any value other
than "7bit", "8bit" or "binary"."

(And, for completeness, section 6.2: "The Content-Transfer-Encoding
values "7bit", "8bit", and "binary" all mean that the identity (i.e.
NO) encoding transformation has been performed.")

> At any rate, i'm not sure it affects the need for being able to emit
> both "rare" and "source" forms of at least the leaf (non-multipart) parts.
>
> i hope this is all at least somewhat clarifying and not just adding to
> the confusion,

Thanks. That's actually very informative and solidifies some of
what's been slowly coagulating in my mind. I was also thinking about
the two output variants you describe (though, being less clever, I
was thinking "raw" and "decoded").
The fact that multipart/* parts can only have identity encodings makes me wonder if the two could be merged by thinking of the decoded content of a leaf part as a child/body to the original, encoded part. On the other hand, that doesn't make sense for other formats, so perhaps that's not a fruitful approach.
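
To make the "source" versus "rare"/"decoded" distinction concrete, here is a rough sketch using Python's standard email package rather than notmuch's own code. The sample message and the helper names (part_source, part_decoded, has_identity_encoding) are made up for illustration. Note that as_bytes() re-serializes the parsed part, so it only approximates a true slice of the original message bytes and should not be relied on for signature verification.

```python
import base64
from email import message_from_bytes

# Hypothetical sample message (not taken from the thread): a single
# base64-encoded text/plain leaf inside a multipart/mixed container.
ENCODED = base64.b64encode(b"hello world\n")
RAW = b"\n".join([
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/mixed; boundary="b"',
    b"",
    b"--b",
    b"Content-Type: text/plain; charset=us-ascii",
    b"Content-Transfer-Encoding: base64",
    b"",
    ENCODED,
    b"--b--",
    b"",
])

# RFC 2045 section 6.2: these three values all mean "no transformation".
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}


def part_source(part):
    """'source' flavor: MIME headers plus the still-encoded body.

    as_bytes() regenerates the part from the parsed structure, so this
    approximates the original bytes rather than slicing them.
    """
    return part.as_bytes()


def part_decoded(part):
    """'rare'/'decoded' flavor: headers stripped, transfer encoding reversed.

    For a multipart container get_payload(decode=True) returns None.
    """
    return part.get_payload(decode=True)


def has_identity_encoding(part):
    """True if the part's Content-Transfer-Encoding is 7bit, 8bit or binary."""
    # RFC 2045 treats a missing header as "7bit".
    cte = part.get("Content-Transfer-Encoding", "7bit")
    return cte.strip().lower() in IDENTITY_ENCODINGS


msg = message_from_bytes(RAW)
leaf = msg.get_payload()[0]

print(has_identity_encoding(msg))   # multipart container
print(has_identity_encoding(leaf))  # base64 leaf
print(part_source(leaf))
print(part_decoded(leaf))
```

In this sketch the decoded form of the multipart itself comes back as None, which lines up with the doubt above about whether "rare" means anything for non-leaf parts.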