Re: Fixed Message-ID trouble

Subject: Re: Fixed Message-ID trouble

Date: Tue, 26 Sep 2023 13:44:00 +0200

Cc: Daniel Corbe, notmuch@notmuchmail.org

David Bremner <david@tethera.net> writes:

> Alexander Adolf <alexander.adolf@condition-alpha.com> writes:
>
>> Bearing in mind that re-recognising a message which has arrived
>> multiple times via different routes is a worthwhile feature, it would
>> seem to me that a hash over the invariant part of the message, that is
>> the body, would allow for such detection. In that light, it would seem
>> to me that the tuple (body_hash, message_id) could be a candidate for
>> a “unique enough”(tm) identifier?
>
> I always had the impression that the message body had too much variation
> imposed by different delivery routes for this to be very helpful:
> essentially the hash would be different for every file due to trailers
> added by mailing lists,

Ah, good point. I hadn't thought of mailing list trailers. Could these
perhaps be detected via the signature line separator "-- \n"?
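
To make the idea concrete, here is a rough Python sketch (standard library
only) of the (body_hash, message_id) tuple. It assumes that the first
text/plain part is "the body" and that a mailing-list trailer, if any,
comes after the "-- " separator line. Cutting at the first separator also
drops the author's own signature, which seems acceptable for a "sameness"
check. Both function names are hypothetical and not part of notmuch.

```python
import hashlib
from email import message_from_bytes, policy


def body_fingerprint(raw: bytes) -> str:
    """SHA-256 of the decoded text body, ignoring everything from the first
    "-- " separator line onwards (signature plus any trailer after it).

    Heuristic assumption: the first text/plain part is "the body".
    """
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    text = part.get_content() if part is not None else ""
    lines = text.splitlines()
    if "-- " in lines:
        # Drop the signature separator and everything below it.
        lines = lines[: lines.index("-- ")]
    # Ignore trailing whitespace on each line and blank lines at the edges.
    normalized = "\n".join(line.rstrip() for line in lines).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def message_identity(raw: bytes) -> tuple[str, str]:
    """Hypothetical "unique enough" key: (body_hash, message_id)."""
    msg = message_from_bytes(raw, policy=policy.default)
    message_id = str(msg.get("Message-ID", "")).strip()
    return (body_fingerprint(raw), message_id)
```

This only helps when the trailer follows a signature: the footer of this
very list starts with a line of underscores rather than "-- ", so when it
is appended inline to a message without a signature, the direct copy and
the list copy would still hash differently.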

I guess this also touches on the question of what a consensus definition
of "sameness" could be. If we take the message-id only, it'd be a purely
technical one. If we included the content in one way or another (for
instance via a hash over the body), that would rather be an editorial
definition of "sameness".

> re-encoding,

Like...? 8bit UTF-8 to/from quoted-printable transfer encoding...?
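
One way to take such re-encoding out of the equation, sketched in Python
under the assumption of a single-part text/plain message: undo the
Content-Transfer-Encoding and the charset before hashing. Hashing the
stored bytes gives different digests for an 8bit and a quoted-printable
copy of the same text, while hashing the decoded text does not. Both
helper names are hypothetical.

```python
import hashlib
from email import message_from_bytes, policy


def stored_body_hash(raw: bytes) -> str:
    """Hash the body bytes exactly as stored (assumes LF line endings)."""
    _headers, _sep, body = raw.partition(b"\n\n")
    return hashlib.sha256(body).hexdigest()


def decoded_body_hash(raw: bytes) -> str:
    """Hash the body after undoing transfer encoding and charset.

    Assumes a single-part text/* message.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    text = msg.get_content()  # QP/base64 decoded, then decoded via charset
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```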

> stupid "external message" headers added by malicious^Wcorporate mail
> servers, etc...

Added headers would not "muddy the waters", since in my mind the hash
would be computed over the body only.

> I could be wrong, maybe hashing is a useful approach, but I'd need to
> see some numbers to be convinced.

I fully agree that we need to adapt to the realities of how things are
actually used, not how they were intended to be used.

How would I find, for example, instances of multiple files for the
same message-id in my database?
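
Outside of notmuch, a rough Python sketch: walk a mail directory, parse
only the headers of each file, and group paths by Message-ID, keeping only
IDs that occur in more than one file. It assumes one message per file
(Maildir or MH, not mbox) and skips the .notmuch directory; the function
name is hypothetical. Within notmuch itself, notmuch-search(1) documents a
--duplicate=N option for --output=files, so something like
"notmuch search --output=files --duplicate=2 '*'" may already list the
second file of each message that has one.

```python
from collections import defaultdict
from email import policy
from email.parser import BytesHeaderParser
from pathlib import Path


def files_by_message_id(root: Path) -> dict[str, list[Path]]:
    """Map each Message-ID that appears in more than one file to those files.

    Assumes one message per file (Maildir or MH layout, not mbox).
    """
    parser = BytesHeaderParser(policy=policy.default)
    groups: defaultdict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        # Skip directories and notmuch's own Xapian database.
        if not path.is_file() or ".notmuch" in path.parts:
            continue
        with path.open("rb") as fh:
            headers = parser.parse(fh)  # parses headers; body is not MIME-parsed
        message_id = headers.get("Message-ID")
        if message_id:
            groups[str(message_id).strip()].append(path)
    return {mid: paths for mid, paths in groups.items() if len(paths) > 1}
```

Calling files_by_message_id(Path("~/Mail").expanduser()) on a hypothetical
~/Mail maildir would return a dict from each duplicated Message-ID to its
files.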

Cheers,

  --alexander
_______________________________________________
notmuch mailing list -- notmuch@notmuchmail.org
