Re: [PATCH 1/4] show: indicate length of omitted body content (json)

Subject: Re: [PATCH 1/4] show: indicate length of omitted body content (json)

Date: Tue, 07 Aug 2012 09:57:26 -0400

To: Peter Wang

Cc: notmuch@notmuchmail.org

From: Austin Clements


Quoting Peter Wang <novalazy@gmail.com>:
> On Mon, 6 Aug 2012 12:47:10 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
>> What's the overall goal of adding this?  Are you planning to add size
>> information to one of the frontends?
>
> Yes, to my frontend.
>
>> > diff --git a/devel/schemata b/devel/schemata
>> > index 9cb25f5..3df2764 100644
>> > --- a/devel/schemata
>> > +++ b/devel/schemata
>> > @@ -69,7 +69,10 @@ part = {
>> >      # A leaf part's body content is optional, but may be included if
>> >      # it can be correctly encoded as a string.  Consumers should use
>> >      # this in preference to fetching the part content separately.
>> > -    content?:       string
>> > +    content?:       string,
>> > +    # If a leaf part's body content is not included, the content-length
>> > +    # may be included instead.
>>
>> You mentioned elsewhere that the content-length returned is an
>> estimate.  If that's the case, this comment should say as much.  Is it
>> actually the case, though?  g_mime_part_get_content_object is
>> remarkably poorly documented for such an important function, but based
>> on format_part_raw, it seems like the content-length your code returns
>> will be exactly the number of bytes returned by the raw format for a
>> leaf part.
>
> It's the exact length of the _encoded_ content.  If the transfer
> encoding is base64, multiplying by 3/4 will get a close estimate of the
> decoded content length.  I assume quoted-printable encoding would only
> be used if the content is mostly ASCII, so the encoded length can serve
> as the estimated decoded length then.

Ah, I see.  format_part_raw misled me; apparently the call to
g_mime_data_wrapper_write_to_stream is the key there, since *that*
decodes the transfer encoding of the data wrapper's underlying raw
stream.

In that case, the comment could either mention that this is the length
of the transfer-encoded content or say that it's an approximation of
the decoded length.  The advantage of claiming only the latter is that
it would leave open the possibility of, say, multiplying by 0.75 for
base64 transfer encoding to get a better decoded estimate (your
assumption about quoted-printable sounds completely reasonable).
Alternatively, we could add the transfer encoding to the output in the
future and let the caller do such approximations.
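
To make the approximation concrete, here is a rough sketch of what
either notmuch or a caller could do with the encoded length.  The enum
and function names are hypothetical (they are not GMime's or
notmuch's); the 3/4 factor for base64 and the pass-through for
quoted-printable are just the assumptions discussed above.

```c
#include <stdint.h>

/* Hypothetical encoding tags for illustration; not GMime's enum. */
typedef enum {
    ENC_IDENTITY,		/* 7bit, 8bit, binary: size is unchanged */
    ENC_BASE64,
    ENC_QUOTED_PRINTABLE
} transfer_encoding_t;

/* Estimate the decoded size of a part from its transfer-encoded
 * length.  Returns -1 if the encoded length is unknown (negative). */
static int64_t
estimate_decoded_length (int64_t encoded_len, transfer_encoding_t enc)
{
    if (encoded_len < 0)
	return -1;

    switch (enc) {
    case ENC_BASE64:
	/* 4 encoded bytes carry 3 decoded bytes.  Dividing first
	 * avoids overflow; line breaks and padding are ignored, so
	 * this slightly overestimates. */
	return encoded_len / 4 * 3 + (encoded_len % 4) * 3 / 4;
    case ENC_QUOTED_PRINTABLE:
	/* Assumed to be mostly ASCII, so treat the encoded length
	 * as the estimate. */
	return encoded_len;
    case ENC_IDENTITY:
    default:
	return encoded_len;
    }
}
```

If only the raw length is exported, a frontend would need the transfer
encoding as well to do the same thing on its side.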

>> > diff --git a/notmuch-show.c b/notmuch-show.c
>> > index 3556293..5c54257 100644
>> > --- a/notmuch-show.c
>> > +++ b/notmuch-show.c
>> > @@ -664,6 +664,14 @@ format_part_json (const void *ctx, sprinter_t *sp, mime_node_t *node,
>> >  	    sp->map_key (sp, "content");
>> >  	    sp->string_len (sp, (char *) part_content->data, part_content->len);
>> >  	    g_object_unref (stream_memory);
>> > +	} else {
>> > +	    GMimeDataWrapper *wrapper = g_mime_part_get_content_object (GMIME_PART (node->part));
>> > +	    GMimeStream *stream = g_mime_data_wrapper_get_stream (wrapper);
>> > +	    ssize_t length = g_mime_stream_length (stream);
>> > +	    if (length >= 0) {
>> > +		sp->map_key (sp, "content-length");
>> > +		sp->integer (sp, length);
>> > +	    }
>>
>> Do wrapper or stream need to be g_object_unref'd?
>
> No.
>
>> Any idea what the performance overhead of this is?  I'm just curious.
>> It might be approximately nothing, since GMime's parser is eager.
>
> The start and end bounds of the stream are already known so there's
> approximately nothing for g_mime_stream_length to do.  The other
> functions simply return field values.

Sounds good.
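
For the archives, the reason this is cheap can be modeled roughly as
below.  This is a simplified, hypothetical struct and function, not
GMime's actual stream implementation; it only illustrates that when
both bounds are already recorded, the length is a subtraction rather
than a read or seek.

```c
#include <stdint.h>

/* Hypothetical model of a bounded stream; not GMime's real struct. */
typedef struct {
    int64_t bound_start;	/* offset of the first byte */
    int64_t bound_end;		/* offset past the last byte, -1 if unknown */
} bounded_stream_t;

/* Constant-time length when both bounds are known, -1 otherwise. */
static int64_t
bounded_stream_length (const bounded_stream_t *s)
{
    if (s->bound_end < 0 || s->bound_end < s->bound_start)
	return -1;
    return s->bound_end - s->bound_start;
}
```

A negative result here plays the same role as the "length >= 0" check
in the patch: the key is simply omitted when the length is unknown.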

> I'll drop the changes for text output.
>
> Peter

