Although this thread may now be off-topic, let me send a follow-up.

By searching with C-related terms, I found some articles about this
issue. It seems to be a common problem with regex + multibyte strings
in C (e.g. https://stackoverflow.com/a/15895746).

On Wed, Aug 21, 2019 at 12:58:04PM +0000, tptlab@tuta.io wrote:
> - [１] (U+FF11) is treated as [\x{F000}-\x{FFFF}]

Actually, it becomes [\xef\xbc\x91], i.e. a bracket expression over
three individual bytes. That's why it matches U+Fxxx characters (they
start with \xef in UTF-8). And without ^, it matches a single byte
inside a multibyte character: U+4444 (\xe4\x91\x84) and U+5C11
(\xe5\xb0\x91), for example, both contain \x91.

I'm not familiar with C and don't know whether pcre or \k solves this
issue, but it might be hard to fix if the root cause is how C handles
multibyte strings.
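
For what it's worth, here is a small C sketch of the byte-level behaviour described above. It does not call any real regex engine: byte_class_find() is a hypothetical stand-in for a byte-oriented bracket expression, and utf8_encode() and class_match_offset() are helper names I made up for illustration. The only assumption is that the engine sees the pattern and the text as raw UTF-8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one Unicode code point as UTF-8.
 * Writes 1-4 bytes into out and returns how many were written. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical stand-in for a byte-oriented bracket expression:
 * returns the offset of the first byte of text that is a member of
 * set, or -1 if no byte of text is in the set. */
long byte_class_find(const unsigned char *set, size_t set_len,
                     const unsigned char *text, size_t text_len)
{
    for (size_t i = 0; i < text_len; i++) {
        if (memchr(set, text[i], set_len) != NULL)
            return (long)i;
    }
    return -1;
}

/* Build the byte set from the UTF-8 encoding of class_cp (what "[X]"
 * turns into when X is multibyte) and search the UTF-8 encoding of
 * text_cp for any of those bytes. Returns the match offset or -1. */
long class_match_offset(uint32_t class_cp, uint32_t text_cp)
{
    unsigned char set[4], text[4];
    size_t set_len = utf8_encode(class_cp, set);
    size_t text_len = utf8_encode(text_cp, text);
    return byte_class_find(set, set_len, text, text_len);
}
```

With the class built from U+FF11 (bytes ef bc 91), class_match_offset(0xFF11, 0xF000) returns 0 because U+F000 also starts with \xef, class_match_offset(0xFF11, 0x4444) returns 1, and class_match_offset(0xFF11, 0x5C11) returns 2, since both contain \x91. An ASCII "1" (0x31) gives -1. An engine that decodes the pattern and the text into code points before matching would not show these false positives.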