> >> > - fuse-zip stores all changes in memory until unmounted
> >> > - fuse-zip (and libzip, for that matter) creates a new temporary
> >> >   file when updating an archive, which takes considerable time
> >> >   when the archive is very big.
> >>
> >> This isn't much of a hassle if you have a maildir per time period
> >> and archive off. Maybe if you sync flags it may be...
> >
> > That might be an interesting solution, a maildir per time period.
>
> Although using a zip file through FUSE as a maildir store is not
> much better, in my opinion.
>
> This is because it still doesn't solve the syscall overhead. For
> example, just going through the list of files to find those that
> changed requires the following syscalls:
> * reading the next directory entry (which is amortized, as entries
>   are read in batches, but the batch size is limited -- say one
>   syscall per 10 files?);
> * stat-ing the file.
>
> Now by adding FUSE we add an extra context switch for each syscall...
>
> Although this issue would be problematic only for reindexing, but
> still...

That's a price I would be willing to pay to have a single file instead
of many.

> > But still, fuse-zip caches all the data until unmounted. So even
> > with just reading, it keeps growing (I hope I'm not accusing
> > fuse-zip unfairly here, but this is my understanding from the
> > code). This could be simply alleviated by having it periodically
> > unmounted and mounted again (perhaps from cron).
>
> I think there is an option for a FUSE mount to specify whether the
> data should be cached by the kernel or not, so this shouldn't be a
> problem for FUSE itself, except if the zip FUSE handler does some
> extra caching.

To my understanding it's the handler itself.

> >> > Of course this solution would have some disadvantages too, but
> >> > for me the advantages would win. At the moment I'm not sure if I
> >> > want to continue working on that.
> >> > Maybe if there were more people interested.
> >>
> >> I'm *really* tempted to investigate making this work for archived
> >> mail. Of course, the list of mounted file systems could get insane
> >> depending on granularity, I guess...
> >
> > Well, if your granularity is one archive per year of mail, it
> > should not be that bad ...
>
> On the other hand, I strongly support having a more optimized
> backend for emails, especially for such cases. For example, a
> BerkeleyDB would perfectly fit such a use case, especially if we
> store the bodies and the headers in separate databases.
>
> Just a small experiment: below is the R `summary(emails)` of the
> sizes of my 700k emails:
> ~~~~
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       8    4364    5374   11510    7042 31090000
> ~~~~
>
> As seen, 75% of the emails are below 7k, and this without any
> compression...
>
> Moreover, we could organize the keys so that in a B-tree structure
> the emails in the same thread are closer together...

Now I'm not sure if you are talking about some BerkeleyDB FUSE
filesystem or direct support in notmuch. I don't have enough cycles to
modify notmuch, so I started to look at a simpler (code-wise)
solution ...

To summarize, what I personally want from the mail storage:
- ability to read and write mails
- should work with mutt (or mutt-kz)
- simple backup to a Windows drive (files can't contain the colon ':')

-- Vlad
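P.S. To make the per-file syscall cost discussed above concrete, here
is a rough Python sketch of the "find what changed" scan (the function
name and the mtime-based change test are my own illustration, not what
notmuch actually does). `os.scandir()` reads directory entries in
batches, so the readdir cost is amortized, but `entry.stat()` still
costs roughly one syscall per file -- and each of those crosses into
the FUSE daemon when the maildir lives on a fuse-zip mount:

```python
import os

def changed_since(maildir, last_scan_time):
    """Return filenames whose mtime is newer than last_scan_time.

    os.scandir() batches directory reads (one getdents() serves many
    entries), and entry.is_file() can usually be answered from the
    d_type field without an extra syscall.  entry.stat(), however,
    still needs one stat() per file, so the scan stays O(files) in
    syscalls -- doubled in context switches under FUSE.
    """
    changed = []
    with os.scandir(maildir) as it:
        for entry in it:
            if entry.is_file() and entry.stat().st_mtime > last_scan_time:
                changed.append(entry.name)
    return changed
```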
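The R `summary()` quoted above can also be reproduced without R; a
small Python sketch (the helper name is mine, and `method="inclusive"`
is what matches R's default quantile type 7):

```python
import statistics

def size_summary(sizes):
    """Five-number summary plus mean, like R's summary()."""
    s = sorted(sizes)
    # method="inclusive" matches R's default quantile algorithm (type 7)
    q1, med, q3 = statistics.quantiles(s, n=4, method="inclusive")
    return {"min": s[0], "q1": q1, "median": med,
            "mean": statistics.fmean(s), "q3": q3, "max": s[-1]}
```

Feeding it the on-disk sizes of every file in a maildir would give the
same six numbers as the R output above.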
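On clustering same-thread emails: if the key starts with the thread
id, a B-tree (e.g. BerkeleyDB's BTREE access method) stores the
thread's messages contiguously, so fetching a whole thread becomes one
short range scan. A minimal sketch of the idea over an in-memory
sorted key list (the `<thread-id>/<date>/<message-id>` key layout is
my invention for illustration):

```python
import bisect

def make_key(thread_id, date, message_id):
    # Thread id first, so all messages of a thread sort together;
    # date second, so within a thread they sort chronologically.
    return f"{thread_id}/{date}/{message_id}"

def thread_range(sorted_keys, thread_id):
    """Return all keys of one thread via two binary searches --
    the analogue of a cursor range scan in a B-tree store."""
    lo = bisect.bisect_left(sorted_keys, thread_id + "/")
    hi = bisect.bisect_left(sorted_keys, thread_id + "0")  # '0' > '/'
    return sorted_keys[lo:hi]
```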