Re: feature request: caching message arrival time

On Sat 2019-06-01 16:19:19 +0200, Ralph Seichter wrote:
> I'm interested. Right now I frankly don't know what knowing when a
> message was first seen by Notmuch might be useful for. That makes it
> a bit difficult for me to contemplate your questions.

Sure, thanks for asking!

As i went to write this down, it became a lot longer than i'd expected.
sorry about that!  On the positive side, i may have convinced myself in
the process that the threat this mechanism would defend against is small
enough that it may not be worth the additional implementation (though if
the implementation were there, we'd certainly want to use it).

So, this is a story about Autocrypt state, out-of-order delivery, and
e-mails with suspicious date stamps ("from the future"). (if you're
reading this message haven't been following Autocrypt closely, you can
read up at https://www.autocrypt.org/)

------

When receiving an e-mail sent From: the peer foo@example.org, an
Autocrypt-capable client needs to update the Autocrypt state for that
peer's e-mail address ("foo@example.org").  This is the case for
messages that have an Autocrypt: header *and* for messages that *don't*
have one.

Both kinds of messages update the Autocrypt peer state, because if you
start receiving Autocrypt-free messages from someone who used Autocrypt
in the past, your client needs to make a note of that and consider it
when it makes its recommendation for new outbound messages to that peer.

Additionally, sometimes we receive e-mail messages out of order.
sometimes this is because we're suddenly running across a cache of old
messages, sometimes it's because we've just popped online after a day
off, and sometimes it's because SMTP had a hiccup (there are probably
many other reasons).

We also probably don't want to store state about everyone who has ever
sent us mail *without* using Autocrypt.  At the moment, at least, that's
probably most senders, and it's both a waste of space and a potential
privacy concern to record a lot of empty state that just indicates that
you got mail from someone at some point in the past.  So if we've never
seen an Autocrypt header from a given peer, there's no state to update.

So now consider the following set of e-mail messages all from the same
sender; mails with a * have an Autocrypt header, and the times
following the message indicates its Date: header in an abstract way
(higher numbers are later than lower numbers).

 A: (time 1)
 B*: (time 2)
 C: (time 3)

Let's assume that i update Autocrypt state about the peer upon receipt
of each message, regardless of what order the messages were sent.  We
want the Autocrypt state to be immutable, independent of the order of
delivery.

If i receive them at times 4, 5, and 6 in order (A, B, C) then i'll
think that the Autocrypt state for the peer is "we had an Autocrypt
header earlier (from B), but a more recent delivery (C) suggests that
they might not be using Autocrypt reliably" (depending on the actual
difference in time between the Date:s of B and C, the peer might end up
with an Autocrypt recommendation called "discourage").  This is the
correct state for us to end up in.

But now imagine that at times 4, 5, and 6 i receive the messages in the
order A, C, B.  If i don't store Autocrypt state for the peer at times 4
and 5, because i've never seen an Autocrypt header for the peer before,
and there is none in messages A and C.  Then my end result is that i'll
think that the Autocrypt state for the peer is just the Autocrypt header
from B.

But that's it's different from what we ended up with when we received
the messages in order.

Now, we can improve on this with the following extra technique: when a
peer goes from no Autocrypt state to having an Autocrypt state, we can
search the existing index for messages from that peer with a later Date:
header.  If we find such a message, then we should include it in our
calculations.  If we do that, then we end up with the correct state,
regardless of the order of delivery.  good!

So far, we haven't needed the firstseen= property yet.  There's one
final wrinkle that introduces the need for it: message Date: headers can
be wrong.  They can even be grossly wrong -- they can be from the
future.  This can happen when the sender's clock is bad, mainly, but it
can also happen through malice (someone wanting to forge a message to
mess with the receipient's state about a given peer, for example).

So Autocrypt defines the "effective date" of a message as the *earliest*
of two dates: the date that the message is first seen, and the Date:
header itself.  So we want our augmented Autocrypt header ingestion
routine to search for all other messages we know about from the sender
that have both a later firstseen= property *and* a later Date: header.

Otherwise, one poorly formed e-mail without an Autocrypt header with the
Date: set to the year 3000 (the "bogus future message") would make it so
that the peer's recommendation would be set to "discourage" when a
message that contains an Autocrypt: header first comes in.

Conclusion
----------

Upon writing all this down, perhaps that's not such a troubling threat.
Having such a bogus future message stored in the database would indeed
leave the peer with a "discouraged" Autocrypt state upon receipt of the
first Autocrypt: header.  But if that database search only happens upon
the first Autocrypt: header seen, then a second message from the same
peer would clear the "discouraged" recommendation without consulting the
bogus future message at all.

So if the threat of a bogus future message is overcome by just a single
additional Autocrypt-enabled message from the same peer, that's not
particularly bad.  And "bogus future message" probably isn't all that
likely either.

So this isn't very high on my list of priorities after all, though if
such a lastseen property were available, i'd definitely use it to
improve the Autocrypt experience in this minor way.

        --dkg

Re: feature request: caching message arrival time

Thread: