Excerpts from Austin Clements's message of Fri May 27 03:41:44 +0100 2011:
> >> > > Have you tried simply calling list() on your thread
> >> > > iterator to see how expensive it is? My bet is that it's quite cheap,
> >> > > both memory-wise and CPU-wise.
> >> > Funny thing:
> >> >  q=Database().create_query('*')
> >> >  time tlist = list(q.search_threads())
> >> > raises a NotmuchError(STATUS.NOT_INITIALIZED) exception. For some reason
> >> > the list constructor must read more than once from the iterator.
> >> > So this is not an option, but even if it worked, it would show
> >> > the same behaviour as my above test..
> >>
> >> Interesting. Looks like the Threads class implements __len__ and that
> >> its implementation exhausts the iterator. Which isn't a great idea in
> >> itself, but it turns out that Python's implementation of list() calls
> >> __len__ if it's available (presumably to pre-size the list) before
> >> iterating over the object, so it exhausts the iterator before even
> >> using it.
> >>
> >> That said, if list(q.search_threads()) did work, it wouldn't give you
> >> better performance than your experiment above.

True. Nevertheless, I think that list(q.search_threads()) should be
equivalent to [t for t in q.search_threads()], which is something to be
fixed in the bindings. Should I file an issue somehow? Or is it enough
to state this as a TODO here on the list?

> >> > would it be very hard to implement a Query.search_thread_ids() ?
> >> > This name is a bit off because it had to be done on a lower level.
> >>
> >> Lazily fetching the thread metadata on the C side would probably
> >> address your problem automatically. But what are you doing that
> >> doesn't require any information about the threads you're manipulating?
> > Agreed. Unfortunately, there seems to be no way to get a list of thread
> > ids or a reliable iterator thereof by using the current python bindings.
> > It would be enough for me to have the ids because then I could
> > search for the few threads I actually need individually on demand.
>
> There's no way to do that from the C API either, so don't feel left
> out. ]:--8) It seems to me that the right solution to your problem
> is to make thread information lazy (effectively, everything gathered
> in lib/thread.cc:_thread_add_message). Then you could probably
> materialize that iterator cheaply.

Alright. I'll put this on my mental notmuch wish list and hope that
someone will have addressed it before I run out of ideas for improving
my UI and have time to look at it myself. For now, I'll go with the
[t.get_thread_id() for t in q.search_threads()] approach to cache the
thread ids myself, and live with the fact that this takes time for
large result sets.

> In fact, it's probably worth
> trying a hack where you put dummy information in the thread object
> from _thread_add_message and see how long it takes just to walk the
> iterator (unfortunately I don't think profiling will help much here
> because much of your time is probably spent waiting for I/O).

I don't think I understand what you mean by dummy info in a thread
object.

> I don't think there would be any downside to doing this for eager
> consumers like the CLI.

One should think so, yes.
/p
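
P.S. The list() behaviour discussed above can be reproduced without
notmuch. This is only a minimal sketch: OneShot is a hypothetical
stand-in for a Threads-like wrapper (it is not the bindings' code), and
the result relies on CPython asking for a length hint before it starts
iterating, which is an implementation detail rather than a language
guarantee.

```python
class OneShot:
    """Hypothetical one-shot iterator whose __len__ consumes it."""

    def __init__(self, items):
        self._it = iter(items)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __len__(self):
        # Counting by walking the underlying iterator leaves nothing
        # behind for the caller to iterate over afterwards.
        return sum(1 for _ in self._it)


# list() asks for the length first (on CPython), so the iterator is
# already empty by the time list() starts pulling items from it.
via_list = list(OneShot(["a", "b", "c"]))

# A list comprehension only iterates and never calls __len__.
via_comp = [t for t in OneShot(["a", "b", "c"])]

print(via_list)
print(via_comp)
```

Here the comprehension sees all three items while list() comes back
empty. In the bindings the second read raises NotmuchError instead,
but the cause looks the same.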