Re: [PATCH] NEWS: cleartext indexing

On Mon 2017-10-30 12:16:25 -0400, Antoine Beaupré wrote:
> I think that assumption should be made clear in the documentation,
> because "security of your index" means nothing to me. Explicitly mention
> FDE as an example may be a good start.

Again, i'm not convinced that "full disk" encryption is what's
warranted, although filesystem, directory, or per-file encryption might
be part of the solution.

I don't want the documentation produced here to prescribe a particular
solution, because i *want* people to experiment and investigate.

I also don't think that the notmuch documentation is the right place to
put a primer on filesystem encryption, or a treatise on the comparison
of data at rest to data in flight.  I wouldn't object to pointers to
more discussion here though, for people who want to read more.

> Frankly, I don't have a good solution for this. I was thinking that
> there may be a way to just encrypt the whole notmuch database with gpg
> and decrypt it on the fly as needed, but that's probably a ludicrous
> idea.

Given the size of the database on the corpora i'm used to running
notmuch over, i don't see an efficient way to decrypt the whole thing on
the fly.  If the database were smaller, that'd be no problem.

But even if it was doable efficiently, you're still left with a question
of when you plan to unlock or re-lock the database.  And while the
database is unlocked, how do you limit access to it to notmuch itself?

And if you do limit it to notmuch itself, what about the other tools
that might want to interface with notmuch?  Are they allowed access to
the cleartext?

These are much more interesting questions to my mind than whether we use
gpg specifically or something else.

They would also apply to filesystem-level encryption.

> So I hear all those arguments and mostly agree with them. That's the
> "rationale about the decision" part, what I'm missing is the "mitigation
> strategies". What I'm hearing is simply "use FDE", but I already do that
> and I don't feel it brings much added security.

It's interesting that you heard "FDE" when i've explicitly said "not
FDE". Filesystem encryption ≠ "full disk" encryption ≠ per-directory
encryption.

For example, you could have a separate local filesystem that contains
only your message store and your notmuch index, mounted atop a distinct
crypto-mapped block device.

Then, when you're willing to allow access to your index (and the rest of
your "at-rest" mail) you simply map the block device and mount the
filesystem where the user account that uses notmuch can read it.

When you're done with mail, you umount and unmap.
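
As a rough sketch of that map/mount and umount/unmap cycle, assuming a
LUKS-style device managed with cryptsetup: the device path, mapping name
and mountpoint below are made up for illustration, and the script only
prints the commands rather than running them (they would typically need
root).

```python
import shlex

# Hypothetical names, chosen only for illustration.
DEVICE = "/dev/disk/by-partlabel/mailstore"
MAPPING = "mailstore"
MOUNTPOINT = "/home/user/mail"


def unlock_commands(device=DEVICE, mapping=MAPPING, mountpoint=MOUNTPOINT):
    """Commands to map the encrypted device and mount the filesystem."""
    return [
        ["cryptsetup", "open", device, mapping],
        ["mount", f"/dev/mapper/{mapping}", mountpoint],
    ]


def lock_commands(mapping=MAPPING, mountpoint=MOUNTPOINT):
    """Commands to unmount the filesystem and remove the mapping."""
    return [
        ["umount", mountpoint],
        ["cryptsetup", "close", mapping],
    ]


def render(commands):
    """Format the commands as shell lines, without running anything."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    print(render(unlock_commands()))
    print(render(lock_commands()))
```

The order matters in both directions: the mapping has to exist before
the filesystem can be mounted, and the filesystem has to be unmounted
before the mapping can be closed.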

This proposal also adds "encryption at rest" to the mail that wasn't
even encrypted in transit, as a bonus :)

You could even put the whole underlying block device on removable
storage, making it truly inaccessible when it is not plugged in.

Or you could try to use ext4's new-ish encryption features:

    https://lwn.net/Articles/639427/
    https://wiki.archlinux.org/index.php/Ext4#Using_file-based_encryption
    http://kernsec.org/files/lss2014/Halcrow_EXT4_Encryption.pdf

I'd be happy to experiment with that with you and report back to the
list if you like.  Maybe we could prototype a "notmuch lock" command?

Again, i don't think that notmuch documentation is a great place to
document these ideas, unless we're actually implementing them in a
simple and straightforward way so that the user can trigger the actions
easily.

And, sadly, note that my proposal above usually requires root access
(mount/umount/device-map) on most GNU/Linux systems (i haven't looked
into other systems in enough detail).  This is certainly an obstacle to
deployment.

> Having my emails encrypted adds another layer of security to that
> content. FDE is good for data "at rest", e.g. when i travel with my
> laptop. But when my laptop is opened and running, I like to think that a
> part of it isn't accessible without the security layers behind PGP and
> actual human confirmation.

It sounds to me like you think that every invocation of PGP is going to
be gated behind human confirmation.  This is not the general GnuPG use
case these days (due to caching in gpg-agent), except for people with an
external crypto token that itself requires physical presence to
decrypt.

When rendering a 10-message encrypted thread, the "human
confirmation" approach requires 10 touches of the crypto token.  IIRC,
even if the human is alert and ready to touch the token when needed,
we're talking about probably at least half a minute of delay between
"i'd like to read this thread" and "ah, there it is".  And the user
can't even do something else with that time because they're all tied up
watching for and performing the token touches.

I don't think that's acceptable for an e-mail client that you expect
users to actually use, unless you're running some sort of Skinnerian
behaviorist experiment.

So in practice, one authorization of your PGP key is likely to enable
arbitrary programs to access it for the duration of the gpg-agent
cache.  So your data is still actually accessible to any process that
can access both the agent and the message store.

That said, if there is a specific message which you think should not be
in the index, the cleartext-index series and the session-key series both
provide means for keeping a *specific* encrypted message in the
mailstore while ensuring that it is not indexed and no session-keys are
stashed.

An interesting proposal might be to add an additional per-message
property, which says explicitly "do not index cleartext or store session
keys for this message".  I don't believe there are many users who would
actually use this feature, but i would review a patch for it and provide
feedback.
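
To make the idea concrete, here is a minimal sketch of how such a
per-message opt-out could be consulted at index time.  Everything in it
is hypothetical: the property name, the Message class and the
indexing_plan() helper are invented for this example and are not part of
notmuch or of either patch series.

```python
from dataclasses import dataclass, field

# Hypothetical property name, invented for this sketch.
NO_CLEARTEXT = "index.no-cleartext"


@dataclass
class Message:
    message_id: str
    encrypted: bool
    properties: dict = field(default_factory=dict)


def indexing_plan(msg, index_cleartext, stash_session_keys):
    """Decide what the indexer may do with one message.

    index_cleartext and stash_session_keys stand in for the user's
    global choices; the per-message property overrides both.
    """
    if not msg.encrypted:
        # Unencrypted messages are indexed as usual; no session key exists.
        return {"index_cleartext": True, "stash_session_key": False}
    if msg.properties.get(NO_CLEARTEXT) == "true":
        # Explicit opt-out: keep the message, index and stash nothing.
        return {"index_cleartext": False, "stash_session_key": False}
    return {
        "index_cleartext": index_cleartext,
        "stash_session_key": stash_session_keys,
    }
```

The point of the sketch is only the precedence: an explicit per-message
opt-out wins over whatever the user has chosen globally.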

> Now, I understand there may be no solution to this. But shifting the
> burden of "secure this" to the user doesn't seem fair in this
> context. We should clearly expose this as a compromise that the user
> must be ready to take, not just be left as an exercise to the reader,
> because there may be *no* solution.

I disagree.  This patch series doesn't shift any burden to the user.
Without the series, the user has no way to make this decision -- they're
stuck with "encryption in transit means complete encryption at rest",
with all the poor usability and crypto-discouragement that entails.  The
series provides users a way to *decide* to shift the burden to
themselves, to improve their prospects for actually using end-to-end
encryption in transit.

> In other words, what I think you are proposing with this patchset is to
> consider PGP email encryption as a end to end encryption mechanism, but
> *not* as a "at rest" encryption mechanism. I think that's a tradeoff I
> may be ready to make, but at least it needs to be explicitly stated.

That's exactly what the documentation says: do not enable this without
considering the security of your index.  And again, i don't want to
prescribe solutions unless we're going to offer an easy way to implement
them.

> I am also not sure if it's the best way to implement such a
> tradeoff. Why not simply decrypt the actual email on delivery and store
> them in cleartext if you're going to have a cleartext copy on the side
> anyways? That would seem like a much simpler solution to the problem
> you're trying to solve here...

I hear this proposal about once a month, i think, usually from different
people.  I think it's a bad idea on several levels.

Personally, i expect my message store to be untouched, with the message
as-delivered.  This allows me to sync the message store itself to other
places (e.g. offline-imap, rsync, etc) without worrying that i'm
suddenly exposing new data.  It also allows me to inspect the delivered
messages for any metadata that might otherwise be destroyed during
decryption, if i ever want to analyze the traces that i'm leaving.  And
if i were to keep both copies of the message (as-delivered and
as-decrypted), i'd double the size of my mailstore (well, i'd double the
size of my encrypted mailstore anyway).

No thanks!  These are bad tradeoffs, when i can get cleartext indexes
and fast rendering without bothering with all of these other downsides.

> Yeah, that's definitely something that's missing from Linux
> systems. Android also suffers from that problem, even though it really
> tries hard to keep data from being shared between applications. This is
> better explained by Matt Green here:
>
> https://blog.cryptographyengineering.com/2016/11/24/android-n-encryption/
>
> But basically, iOS encrypts file per app, not per disk, so that app A
> doesn't have the crypto key material to decrypt data from app B at
> all. This is a fundamentally different principle than the way we do
> encryption now in Linux, and would require a fundamentally different
> approach to a lot of things for this to work at all on our
> workstations.

I agree with you that per-app, per-user, and per-file encryption all
offer several great properties that are better than filesystem-level
encryption on its own.  And GNU/Linux is behind on that front (though
see my remarks about ext4 encryption).

Note that a stored encrypted e-mail, with its session-key stored
separately in the index, is actually very similar to per-file
encryption.  The e-mail leaks more metadata than the file itself, but
the model's roughly equivalent.  The only thing that's missing is the
protection of the index itself, as you mention.

I'd be happy to explore what "per-app" encryption might mean on desktop
systems as well with you, but we should probably do that off-list, and
come back here when we've got something specific to propose.

> I definitely don't mean to block this. But I would like to see some
> changes to the documentation to better explain those trade-offs, even if
> it means just linking to this discussion. :)

Please propose a patch to the documentation that would satisfy you!  I
agree with you that having some more discussion would be useful, but the
full content of even this thread would be out of place in any of the
notmuch manpages.

        --dkg
