Re: [PATCH 0/5] notmuch batch count

Subject: Re: [PATCH 0/5] notmuch batch count

Date: Mon, 21 Jan 2013 18:21:52 +0100

To: Tomi Ollila, Mark Walters, notmuch@notmuchmail.org

From: Jani Nikula


On Wed, 16 Jan 2013, Tomi Ollila <tomi.ollila@iki.fi> wrote:
> On Wed, Jan 16 2013, Mark Walters <markwalters1009@gmail.com> wrote:
>
>> On Tue, 15 Jan 2013, Jani Nikula <jani@nikula.org> wrote:
>>> Hi all -
>>>
>>> Notmuch remote usage [1] is a pretty handy way of accessing a notmuch
>>> database on a remote server. However, the more you have saved searches
>>> and tags, the slower notmuch-hello becomes, and it ends up being by far
>>> the biggest usability issue with remote notmuch. This is because
>>> notmuch-hello issues a separate 'notmuch count' for each saved search
>>> and tag.
>>>
>>> One could argue that notmuch-hello should be fixed somehow, but I chose
>>> to try another route: batch support for notmuch count. This enables
>>> notmuch-hello to get the counts for all the saved searches or tags in a
>>> single call. The performance improvement is huge in remote usage, but
>>> it's not limited to that. Regular local usage benefits from it too, but
>>> it's not as obviously noticeable.
>>
>> This series looks good to me (that is, the code looks fine).
>>
>> Two questions are:
>>
>> Do we want this functionality? I think it is useful even on local setups
>> particularly if people have lots of tags (the section that shows all
>> tags can be quite noticeably sped up). It is a substantial improvement
>> on remote setups but I am not sure if that is sufficiently common to
>> warrant the change. At least the code path is the same so it will get
>> enough testing.
>
> I do want the functionality. Especially where I am now, it takes about
> 0.4 sec for 'ssh remote echo foo' to get executed (using connection sharing).
> Pipelining the count requests could make all the count requests Emacs
> does (in my current set) complete in less than 1 sec.
>
>> Secondly, if we do want the functionality, should it be more general so
>> that it can do searches etc. too? I think this is less clear. Count is
>> likely to be the most useful one since running several (simultaneous)
>> counts is probably more common than running several simultaneous searches.
>
> One could argue that we should send JSON "documents" to notmuch on
> stdin and notmuch would output JSON(/sexp) "documents". That is just
> SMOP. I bet Austin would like this solution, especially the part
> that involves writing or integrating a JSON parser >;).
> I'd be happy with this 'batch' approach.
>
> I'll be testing this soon, but refrain from reviewing the code
> until 0.15 is out.

id:87a9s5cp38.fsf@zancas.localnet ;)

J.


>
>>
>> Best wishes
>>
>> Mark
>
>
> Tomi
>
>
>>
>>
>>>
>>> Here's a script that demonstrates one-by-one count vs. batch count,
>>> locally and over ssh (assuming ssh key authentication is set up), over
>>> 10 iterations:
>>>
>>> #!/bin/bash
>>>
>>> echo "tag count:"
>>> notmuch search --output=tags "*" | wc -l
>>>
>>> for remote in "" "ssh example.com"; do
>>>     export remote
>>>     echo "one-by-one count:"
>>>     time sh -c 'for i in $(seq 10); do notmuch search --format=text0 --output=tags "*" | xargs -0 -I "{}" $remote notmuch count tag:"{}" > /dev/null; done'
>>>
>>>     echo "batch count:"
>>>     time sh -c 'for i in $(seq 10); do notmuch search --format=text --output=tags "*" | sed "s/.*/tag:\"&\"/" | $remote notmuch count --batch > /dev/null; done'
>>> done
>>>
>>> And here's the output of it in my setup:
>>>
>>> tag count:
>>> 36
>>> one-by-one count:
>>>
>>> real	0m2.349s
>>> user	0m0.552s
>>> sys	0m0.868s
>>> batch count:
>>>
>>> real	0m0.179s
>>> user	0m0.120s
>>> sys	0m0.064s
>>> one-by-one count:
>>>
>>> real	0m56.527s
>>> user	0m1.424s
>>> sys	0m1.164s
>>> batch count:
>>>
>>> real	0m2.407s
>>> user	0m0.068s
>>> sys	0m0.040s
>>>
>>> As can be seen, in local usage (the first pair of results) the speedup
>>> is more than 10x, although one-by-one notmuch count is usually
>>> sufficiently fast. The difference is more noticeable in remote use (the
>>> second pair of results), where the speedup is more than 20x here, and any
>>> additional, occasional network latency is multiplied by tag count. (That
>>> result is actually faster than usual for me, but it's still 5+ seconds
>>> to display or refresh notmuch-hello.)
>>>
>>> Mark has written a patch that I've been using to switch notmuch-hello to
>>> use batch count. That has made me switch from running notmuch in ssh to
>>> using remote notmuch. The great thing is that we could switch to using
>>> that in Emacs with no special casing for remote usage, and it would
>>> speed things up also in local use. I'm expecting Mark to post his patch
>>> in reply to this series.
>>>
>>> Mark actually wrote the elisp part based on the rough idea prior to any
>>> of this cli plumbing, so I felt obliged to follow up. So thanks Mark!
>>>
>>>
>>> BR,
>>> Jani.
>>>
>>>
>>> [1] http://notmuchmail.org/remoteusage/ (the page could use some
>>> cleanup; it's really not nearly as complicated as the page suggests)
>>>
>>>
>>> Jani Nikula (5):
>>>   cli: remove useless strdup
>>>   cli: extract count printing to a separate function in notmuch count
>>>   cli: add --batch option to notmuch count
>>>   man: document notmuch count --batch and --input options
>>>   test: notmuch count --batch and --input options
>>>
>>>  man/man1/notmuch-count.1 |   20 +++++++++
>>>  notmuch-count.c          |  111 +++++++++++++++++++++++++++++++++++-----------
>>>  test/count               |   46 +++++++++++++++++++
>>>  3 files changed, 150 insertions(+), 27 deletions(-)
>>>
>>> -- 
>>> 1.7.10.4
>> _______________________________________________
>> notmuch mailing list
>> notmuch@notmuchmail.org
>> http://notmuchmail.org/mailman/listinfo/notmuch
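
For anyone who wants to see what the batch input looks like without a
notmuch database at hand, here is a minimal sketch of the query-building
step from the script quoted above. The helper name tags_to_queries and the
two sample tags are made up for illustration; --batch is the option
proposed in this series, so it only exists once the patches are applied.

```shell
#!/bin/sh
# Hypothetical helper (not part of notmuch): read one tag per line on
# stdin and write one 'tag:"NAME"' query per line on stdout. This is the
# same transformation the sed call in the quoted script performs.
tags_to_queries() {
    # '&' is the POSIX sed way to refer to the whole matched line.
    sed 's/.*/tag:"&"/'
}

# Demonstration with two made-up tags, so no notmuch install is needed.
printf '%s\n' inbox unread | tags_to_queries
```

With the series applied, the helper would sit between the tag listing and
the count, mirroring the quoted script:
notmuch search --output=tags "*" | tags_to_queries | notmuch count --batch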
