Quoth myself on Oct 22 at 4:14 am:

> Overall this looks pretty good to me, and I must say, this parser is
> amazingly flexible and copes well with a remarkably hostile grammar.
>
> A lot of little comments below (sorry if any of this ground has
> already been covered in the previous four versions).
>
> I do have one broad comment. While I'm all for ad hoc parsers for ad
> hoc grammars like dates, there is one piece of the literature I think
> this parser suffers for by ignoring: tokenizing. I think it would
> simplify a lot of this code if it did a tokenizing pass before the
> parsing pass. It doesn't have to be a serious tokenizer with
> streaming and keywords and token types and junk; just something that
> first splits the input into substrings, possibly just non-overlapping
> matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.]. This would simplify the
> handling of postponed numbers because, with trivial lookahead in the
> token stream, you wouldn't have to postpone them. Likewise, it would
> eliminate last_field. It would simplify keyword matching because you
> wouldn't have to worry about matching substrings (I spent a long time
> staring at that code before I figured out what it would and wouldn't
> accept). Most important, I think it would make the parser more
> predictable for users; for example, the parser currently accepts
> things like "saturtoday" because it's aggressively single-pass.

I should add that I am not at all opposed to this patch as it is
currently designed. We need a date parser. My comment about separating
tokenization is just a way that this code could probably be simplified
if someone were so inclined, or if simplifying the code would help it
clear any remaining hurdles.
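To make the suggestion concrete, here is roughly the tokenizing pass I
have in mind, sketched in Python for brevity (the patch itself is not
Python, and Python's re module lacks POSIX classes, so \d and [A-Za-z]
stand in for [[:digit:]] and [[:alpha:]]):

```python
import re

# Non-overlapping matches of: a run of digits, a run of letters, or a
# single separator character. Whitespace falls through and is dropped.
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+|[-+:/.]")

def tokenize(s):
    """Split a date string into a flat list of tokens."""
    return TOKEN_RE.findall(s)

print(tokenize("2024-10-22 04:14"))
# ['2024', '-', '10', '-', '22', '04', ':', '14']
```

With this in place, keyword lookup compares whole tokens, so
"saturtoday" arrives as the single token "saturtoday", matches no
keyword, and is rejected, rather than being accepted because "satur"
happens to prefix-match during a single pass over the characters.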