I profiled it, but nothing jumped out at me. Here's the code I've used:

import notmuch2
import timeit

def msg2_threads():
    db = notmuch2.Database()
    ts = db.threads("date:2025")
    for t in ts:
        authors = {}
        tags = {}
        for msg in t:
            authors[msg.header("from")] = 1
            for tag in msg.tags:
                tags[tag] = 1
        author_list = list(authors.keys())
        tag_list = list(tags.keys())
    db.close()

def msg2_messages():
    db = notmuch2.Database()
    ts = db.messages("date:2025")
    threads = {}
    for msg in ts:
        threads[msg.threadid] = 1
    authors = {}
    tags = {}
    for msg in db.messages(" or ".join([f"thread:{t}" for t in list(threads.keys())])):
        authors[msg.header("from")] = 1
        for tag in msg.tags:
            tags[tag] = 1
    author_list = list(authors.keys())
    tag_list = list(tags.keys())
    db.close()

print(timeit.timeit(msg2_threads, number=10))
print(timeit.timeit(msg2_messages, number=10))

The second function takes *half* the time of the first on my machine, even
though they both get the same messages.
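
A single timeit total can be skewed by run order, for example if whichever function runs first also warms the disk cache for the other. One way to cross-check the numbers is to interleave the two functions over several rounds and keep the best time for each. The sketch below is only an illustration: compare_timings is a hypothetical helper, and slow_variant and fast_variant are stand-in workloads that take the place of msg2_threads and msg2_messages so the snippet runs without a notmuch database.

```python
import timeit


def compare_timings(funcs, repeat=5, number=1):
    """Return {function name: best seconds per call} for zero-argument callables."""
    best = {f.__name__: float("inf") for f in funcs}
    for _ in range(repeat):
        # Interleave the candidates so that no function always runs first.
        for f in funcs:
            elapsed = timeit.timeit(f, number=number) / number
            best[f.__name__] = min(best[f.__name__], elapsed)
    return best


# Stand-in workloads (hypothetical); swap in msg2_threads and msg2_messages.
def slow_variant():
    return sum(i * i for i in range(200_000))


def fast_variant():
    return sum(i * i for i in range(20_000))


if __name__ == "__main__":
    for name, seconds in compare_timings([slow_variant, fast_variant]).items():
        print(f"{name}: {seconds:.6f} s per call")
```

Taking the minimum rather than the sum reduces the influence of one-off slow runs. If the gap between the two functions persists under this kind of measurement, it is less likely to be an artifact of ordering.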
Cheers,
Lars
On Sun, 09 Feb 2025 14:56:42 -0400, David Bremner <david@tethera.net> wrote:
> Lars Kotthoff <lists@larsko.org> writes:
>
> > On a somewhat related note, I've noticed that getting threads is much
> > slower than getting messages that match the same query, extracting the thread
> > IDs, and then getting the messages for each of those threads. This seems to be
> > the case both in the old and new APIs — any ideas?
> >
>
> Retrieving threads with the C API does a fair amount of work, so it might be
> worth running under perf and seeing if there is a common hotspot in the
> notmuch library.
>
> d
> [...]