Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch

Subject: Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch

Date: Sat, 27 Oct 2012 23:38:00 +0300

To: Austin Clements, Jani Nikula

Cc: notmuch@notmuchmail.org

From: Tomi Ollila


On Thu, Oct 25 2012, Austin Clements wrote:

> Quoth myself on Oct 22 at  4:14 am:
>> Overall this looks pretty good to me, and I must say, this parser is
>> amazingly flexible and copes well with a remarkably hostile grammar.
>> 
>> A lot of little comments below (sorry if any of this ground has
>> already been covered in the previous four versions).
>> 
>> I do have one broad comment.  While I'm all for ad hoc parsers for ad
>> hoc grammars like dates, there is one piece of the literature I think
>> this parser suffers for by ignoring: tokenizing.  I think it would
>> simplify a lot of this code if it did a tokenizing pass before the
>> parsing pass.  It doesn't have to be a serious tokenizer with
>> streaming and keywords and token types and junk; just something that
>> first splits the input into substrings, possibly just non-overlapping
>> matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.].  This would simplify the
>> handling of postponed numbers because, with trivial lookahead in the
>> token stream, you wouldn't have to postpone them.  Likewise, it would
>> eliminate last_field.  It would simplify keyword matching because you
>> wouldn't have to worry about matching substrings (I spent a long time
>> staring at that code before I figured out what it would and wouldn't
>> accept).  Most important, I think it would make the parser more
>> predictable for users; for example, the parser currently accepts
>> things like "saturtoday" because it's aggressively single-pass.
>
> I should add that I am not at all opposed to this patch as it is
> currently designed.  We need a date parser.  My comment about
> separating tokenization is just a way that this code could probably be
> simplified if someone were so inclined or if simplifying the code
> would help it pass any hurdles.

What if the current patch set, i.e. messages

$ grep Message-Id: ~/patch | sed 's/Message-Id: /id:/; y/<>/""/'
id:"e684cadbb5a01b6079ef344b0d6f97541847914a.1350854171.git.jani@nikula.org"
id:"a90d3b687895a26f765539d6c0420038a74ee42f.1350854171.git.jani@nikula.org"
id:"75a8f129d5e0d824b3e04ddfc1816c45fa0ec70d.1350854171.git.jani@nikula.org"
id:"606a94d565e6b21abfc59d6ba9676a807d669127.1350854171.git.jani@nikula.org"
id:"cbd383bfc4bf844bb0366f13f675d48956137c52.1350854171.git.jani@nikula.org"
id:"f21b8702728457c087478b26700e9448bc16c61d.1350854171.git.jani@nikula.org"
id:"37026480956679b12e82e4975f1837e93ef1c531.1350854171.git.jani@nikula.org"
id:"cff9c1dd87b8bc11326dca0b3589c81656500f5e.1350854171.git.jani@nikula.org"

(patches 1-8 / 9 -- NEWS patch is stale) would just be pushed: there are
just few trivial things to be tuned and NEWS rebased -- which I think 
Jani will gladly do... It is just so much easier for him to continue
and us others to review the new diffs than these whole patches again
and again... At least I volunteer to track that these remaining issues
(tokenizer not included :).

Tomi

Thread: