Re: [PATCH] v2: Added better support for multiple structured output formats.

Subject: Re: [PATCH] v2: Added better support for multiple structured output formats.

Date: Tue, 10 Jul 2012 15:13:31 -0400

To: craven@gmx.net

Cc: notmuch@notmuchmail.org

From: Austin Clements


Since it would be great to use the structure printer for show as well,
it would make sense to put it in its own source file, where it can
easily be shared between commands.

There are a few systematic code formatting problems in your patch.  To
be consistent with other notmuch code, there should be a space between
function names and parameter lists (in both declarations and calls)
and there should be a space between a keyword and a paren (e.g., "if
(" and "for (").  In a function definition, the function name should
start on a new line (this makes it easy to grep for a function's
definition).  Also, in general, try to keep lines under 80 characters
wide (we're not very consistent about this one, but it would be nice
to not become even less consistent about it).

More detailed comments below.

Quoth craven@gmx.net on Jul 10 at  3:30 pm:
> As discussed in <id:20120121220407.GK16740@mit.edu>, this patch adds
> support for new structured output formats (like s-expressions) by using
> stateful structure_printers. An implementation of the JSON structure
> printer that passes all tests is included. The output for JSON (and
> text) is identical to the current output. S-Expressions will be added in
> a later patch.
> 
> A large part of this patch just implements the differentiation between
> structured and non-structured output (all the code within 
> "if(format == unstructured_text_printer)").
> 
> In a second patch, the structured output code should be isolated in a
> separate file, and also used in all other parts of notmuch.
> 
> The interface is a structure structure_printer, which contains the following methods:
> 
> - initial_state: is called to create a state object, that is passed to all invocations. This should be used to keep track of the output file and everything else necessary to correctly format output.
> - map: is called when a new map (associative array, dictionary) is started. map_key and the primitives (string, number, bool) are used alternatingly to add key/value pairs. pop is used to close the map (see there). This function must return a nesting level identifier that can be used to close all nested structures (maps and lists), backing out to the returned nesting level.
> - list: is called when a new list (array, vector) is started. the primitives (string, number, bool) are used consecutively to add values to the list. pop is used to close the list. This function must return a nesting level identifier that can be used to close all nested structures (maps and lists), backing out to the returned nesting level.
> - pop: is called to return to a given nesting level. All lists and maps with a deeper nesting level must be closed.
> - number, string, bool: output one element of the specific type.
> 
> All functions should use the state object to insert delimiters etc. automatically when appropriate.
> 
> Example:
> int top, one;
> top = map(state);
> map_key(state, "foo");
> one = list(state);
> number(state, 1);
> number(state, 2);
> number(state, 3);
> pop(state, i);
> map_key(state, "bar");
> map(state);
> map_key(state, "baaz");
> string(state, "hello world");
> pop(state, top);
> 
> would output JSON as follows:
> 
> {"foo": [1, 2, 3], "bar": { "baaz": "hello world"}}
> ---
>  notmuch-search.c | 491 ++++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 361 insertions(+), 130 deletions(-)
> 
> diff --git a/notmuch-search.c b/notmuch-search.c
> index 3be296d..4127777 100644
> --- a/notmuch-search.c
> +++ b/notmuch-search.c
> @@ -28,6 +28,210 @@ typedef enum {
>      OUTPUT_TAGS
>  } output_t;
>  
> +/* structured formatting, useful for JSON, S-Expressions, ...
> +
> +- initial_state: is called to create a state object, that is passed to all invocations. This should be used to keep track of the output file and everything else necessary to correctly format output.
> +- map: is called when a new map (associative array, dictionary) is started. map_key and the primitives (string, number, bool) are used alternatingly to add key/value pairs. pop is used to close the map (see there). This function must return a nesting level identifier that can be used to close all nested structures (maps and lists), backing out to the returned nesting level.
> +- list: is called when a new list (array, vector) is started. the primitives (string, number, bool) are used consecutively to add values to the list. pop is used to close the list. This function must return a nesting level identifier that can be used to close all nested structures (maps and lists), backing out to the returned nesting level.
> +- pop: is called to return to a given nesting level. All lists and maps with a deeper nesting level must be closed.
> +- number, string, bool: output one element of the specific type.
> +
> +All functions should use state to insert delimiters etc. automatically when appropriate.
> +
> +Example:
> +int top, one;
> +top = map(state);
> +map_key(state, "foo");
> +one = list(state);
> +number(state, 1);
> +number(state, 2);
> +number(state, 3);
> +pop(state, i);
> +map_key(state, "bar");
> +map(state);
> +map_key(state, "baaz");
> +string(state, "hello world");
> +pop(state, top);
> +
> +would output JSON as follows:
> +
> +{"foo": [1, 2, 3], "bar": { "baaz": "hello world"}}
> +
> + */

The above comment should follow the style of other notmuch comments:

/* Put a star before every line, use full sentences starting with a
 * capital letter, and wrap the comment at 72 columns.
 */

For the descriptions of the function pointers, it probably makes sense
to comment them inline in the struct itself so the reader doesn't have
to match things up.  E.g.,

typedef struct structure_printer {
    /* (Description of map.) */
    int (*map) (void *state);
    ...
} structure_printer_t;

> +typedef struct structure_printer {
> +    int (*map)(void *state);

This is a rather less obvious application of notmuch code style, but
there should be a space before the parameter list here, too:
    int (*map) (void *state);
Likewise for the other fields, of course.

> +    int (*list)(void *state);
> +    void (*pop)(void *state, int level);
> +    void (*map_key)(void *state, const char *key);
> +    void (*number)(void *state, int val);
> +    void (*string)(void *state, const char *val);
> +    void (*bool)(void *state, notmuch_bool_t val);
> +    void *(*initial_state)(const struct structure_printer *sp, FILE *output);
> +} structure_printer_t;

Is there a reason to separate the structure printer's vtable from its
state?  It seems like this forces the caller to keep track of two
things instead of just one.  For show this isn't too bad because the
structure printer is the formatter, but I think it will be cumbersome
for the search code.  Keeping them together would also be at least
nominally more typesafe.

> +
> +/* JSON structure printer */
> +
> +/* single linked list implementation for keeping track of the array/map nesting state */

Same comment formatting comment here.

> +typedef struct json_list {
> +    int type;
> +    int first_already_seen;

Maybe just "first_seen"?  The "already" doesn't add anything.  Also,
this should probably be a notmuch_bool_t.

> +    struct json_list *rest;
> +} json_list_t;
> +
> +#define TYPE_JSON_MAP 1
> +#define TYPE_JSON_ARRAY 2

An enum would be preferable here.

> +
> +typedef struct json_state {
> +    FILE *output;
> +    json_list_t *stack;
> +    int level;
> +} json_state_t;
> +
> +int json_map(void *state);
> +int json_list(void *state);
> +void json_pop(void *state, int level);
> +void json_map_key(void *state, const char *key);
> +void json_number(void *state, int val);
> +void json_string(void *state, const char *val);
> +void json_bool(void *state, notmuch_bool_t val);
> +void *json_initial_state(const struct structure_printer *sp, FILE *output);
> +
> +structure_printer_t json_structure_printer = {
> +    &json_map,
> +    &json_list,
> +    &json_pop,
> +    &json_map_key,
> +    &json_number,
> +    &json_string,
> +    &json_bool,
> +    &json_initial_state
> +};
> +
> +int json_map(void *st) {
> +    json_state_t *state = (json_state_t*)st;
> +    FILE *output = state->output;
> +    if(state->stack != NULL && state->stack->type == TYPE_JSON_ARRAY && state->stack->first_already_seen) {
> +	fputs(",", output);
> +	if(state->level == 1)
> +	    fputs("\n", output);
> +	else
> +	    fputs(" ", output);
> +    }
> +    if(state->stack != NULL) {
> +	state->stack->first_already_seen = TRUE;
> +    }

You repeat the above code (or something very similar) in several
places.  It would make sense to put it in a separate function,
possibly with an additional char argument to control whether to follow
the comma with a newline or a space.

> +    fputs("{", output);
> +    void *ctx_json_map = talloc_new (0);
> +    json_list_t *el = talloc(ctx_json_map, json_list_t);

You're leaking ctx_json_map here.  It's also not necessary: you should
use st as the talloc context for el.  It still makes sense to free
these as you pop, but that way if the caller needs to abort it can
free the whole state structure in one swoop.

> +    el->type = TYPE_JSON_MAP;
> +    el->first_already_seen = FALSE;
> +    el->rest = state->stack;
> +    state->stack = el;
> +    return state->level++;
> +}
> +
> +int json_list(void *st) {
> +    json_state_t *state = (json_state_t*)st;
> +    FILE *output = state->output;
> +    if(state->stack != NULL && state->stack->type == TYPE_JSON_ARRAY && state->stack->first_already_seen) {
> +	fputs(",", output);
> +	if(state->level == 1)
> +	    fputs("\n", output);
> +	else
> +	    fputs(" ", output);
> +    }
> +    if(state->stack != NULL) {
> +	state->stack->first_already_seen = TRUE;
> +    }
> +    fputs("[", output);
> +    void *ctx_json_map = talloc_new (0);
> +    json_list_t *el = talloc(ctx_json_map, json_list_t);
> +    el->type = TYPE_JSON_ARRAY;
> +    el->first_already_seen = FALSE;
> +    el->rest = state->stack;
> +    state->stack = el;
> +    return state->level++;
> +}
> +
> +void json_pop(void *st, int level) {
> +    int i;
> +    json_state_t *state = (json_state_t*)st;
> +    FILE *output = state->output;
> +    for(i = state->level; i > level; i--) {

Maybe "while (state->level > level)"?

> +	json_list_t *tos = state->stack;
> +	if(tos->type == TYPE_JSON_MAP) {
> +	    fputs("}", output);
> +	}
> +	if(tos->type == TYPE_JSON_ARRAY) {

Maybe "else if"?

> +	    fputs("]", output);
> +	}
> +	state->stack = tos->rest;
> +	state->level--;
> +	talloc_free(tos);
> +    }
> +    if(state->level == 0)
> +	fputs("\n", output);
> +}
> +
> +void json_map_key(void *st, const char *key) {
> +    json_state_t *state = (json_state_t*)st;
> +    FILE *output = state->output;
> +    if(state->stack != NULL && state->stack->first_already_seen) {
> +	fputs(",\n", output);
> +    }
> +    fputs("\"", output);
> +    fputs(key, output);

The key needs to be escaped like any JSON string.

> +    fputs("\": ", output);
> +}
> +
> +void json_number(void *st, int val) {
> +    json_state_t *state = (json_state_t*)st;
> +    FILE *output = state->output;
> +    if(state->stack != NULL && state->stack->type == TYPE_JSON_ARRAY && state->stack->first_already_seen) {
> +	fputs(", ", output);
> +    }
> +    state->stack->first_already_seen = TRUE;
> +    fprintf(output, "%i", val);

"%d" is much more common (I had to look up %i!)

> +}
> +
> +void json_string(void *st, const char *val) {
> +    json_state_t *state = (json_state_t*)st;
> +    FILE *output = state->output;
> +    void *ctx = talloc_new(0);
> +    if(state->stack != NULL && state->stack->type == TYPE_JSON_ARRAY && state->stack->first_already_seen) {
> +	fputs(",", output);
> +	if(state->level == 1)
> +	    fputs("\n", output);
> +	else
> +	    fputs(" ", output);
> +    }
> +
> +    state->stack->first_already_seen = TRUE;
> +    fprintf(output, "%s", json_quote_str(ctx, val));

Take a look at the string quoting function in
id:"87d34hsdx8.fsf@awakening.csail.mit.edu".  It quotes directly to a
FILE * without allocating an intermediate buffer.  It's also actually
correct, unlike json_quote_str.

> +    talloc_free(ctx);
> +}
> +
> +void json_bool(void *st, notmuch_bool_t val) {
> +    json_state_t *state = (json_state_t*)st;
> +    FILE *output = state->output;

This needs to insert a comma like the other output functions.

> +    if(val)
> +	fputs("true", output);
> +    else
> +	fputs("false", output);

Maybe fputs(val ? "true" : "false", output)?

> +}
> +
> +void *json_initial_state(const struct structure_printer *sp, FILE *output) {
> +    (void)sp;
> +    json_state_t *st = talloc(0, json_state_t);
> +    st->level = 0;
> +    st->stack = NULL;
> +    st->output = output;
> +    return st;
> +}

This is as far as I've gotten.  I'll switch to your newer version and
review the rest when I get a chance.

Thread: