Austin Clements <amdragon@MIT.EDU> writes:

>> +subdirs := compat completion emacs lib man parse-time-string
>> +subdirs := $(subdirs) performance-test util test
> += ?

Sure.

>> +CORPUS_NAME := notmuch-email-corpus-$(PERFTEST_VERSION).tar.xz
>
> Would it make sense to split out the different size corpora so a user
> could, say, only download the small one?

Currently the choice of test is local to a given test file; one doing
something particularly intense (or just lots of repetitions) might want
to only use a subset. So I'm not sure if separate downloading of smaller
corpora makes sense. This is all hypothetical at the moment, since the
one test file uses the full corpus.

> "\nPlease download ${TXZFILE} using\n\n"?

OK

>> +add_email_corpus takes arguments "--small" and "--medium" for when you
>> +want smaller corpuses to check.
>
> "corpora"?

Reworded to say

,----
| add_email_corpus takes arguments "--small" and "--medium" for when you
| want smaller subsets of the corpus to check.
`----

>
> I'm a bit confused by this. What happens if you don't specify --small
> or --medium? Is the "large"/default corpus just the combined small
> and medium corpora? Would be worth a comment, at least.

Hopefully the README makes this clear(er) now?

> This probably doesn't matter now, but I wonder if we want to unpack on
> first use to somewhere not test-specific and then cp -rl the corpus
> into the test directory. I haven't tried unpacking the corpus yet,
> but if you're running tests repeatedly to compare results, or running
> more than one performance test, it seems like a full decompress and
> unpack could get onerous.

Hmm. On my machine it is 10s for the copy versus 45s for a full unpack.
For some reason I tested with "cp -a", which is incredibly slow, so I
thought there was no loss. For comparison, the basic test takes about 10
minutes on the same machine.
In any case this can wait until we have a second test file and a second
call to add_email_corpus; adding caching now would not help.
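
For reference, a rough sketch of what the unpack-once-then-"cp -rl" idea
might look like. cache_corpus and its three arguments are made-up names
for illustration, not part of the patch. It assumes a tar that detects
the compression on extraction and a cp that supports -l (GNU versions
do), and that the cache and the test directory live on the same
filesystem so hard links work.

```shell
# Hypothetical helper, not part of the patch: unpack the corpus tarball
# once into a shared cache, then hard-link it into a test directory.
cache_corpus () {
    tarball=$1      # path to the corpus tarball
    cache_dir=$2    # shared location, not test-specific
    dest=$3         # per-test directory; assumed not to exist yet

    if [ ! -d "$cache_dir" ]; then
        # Unpack into a temporary directory first, so an interrupted
        # unpack never leaves a half-populated cache behind.
        rm -rf "$cache_dir.tmp"
        mkdir -p "$cache_dir.tmp" &&
        tar -xf "$tarball" -C "$cache_dir.tmp" &&
        mv "$cache_dir.tmp" "$cache_dir" || return 1
    fi

    # -l makes hard links instead of copying file contents.
    cp -rl "$cache_dir" "$dest"
}
```

One caveat with this approach: hard-linked files share their data with
the cache, so a test that rewrites corpus files in place would change
the cached copy too.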