Re: [PATCH 3/7] go: Allow notmuch objects to be garbage collected

Date: Sat, 4 Aug 2012 15:28:40 -0400

To: Adrien Bustany

Cc: notmuch@notmuchmail.org

From: Austin Clements


Quoth Adrien Bustany on Jul 24 at  1:03 am:
> On 20/07/2012 06:23, Austin Clements wrote:
> >Quoth Adrien Bustany on Jul 19 at  9:25 pm:
> >>On 18/07/2012 23:40, Austin Clements wrote:
> >>>This is subtle enough that I think it deserves a comment in the source
> >>>code explaining that tracking the talloc owner reference, combined
> >>>with the fact that Go finalizers are run in dependency order, ensures
> >>>that the C objects will always be destroyed from the talloc leaves up.
> >>
> >>Definitely, I tend to comment in the commit message and forget about
> >>the code...
> >>
> >>>
> >>>Just one inline comment below.  Otherwise, I think this is all
> >>>correct.
> >>
> >>Agree with the comment, the Database should be the parent. I guess I
> >>wasn't sure of the talloc parenting.
> >>
> >>>
> >>>Is reproducing the talloc hierarchy in all of the bindings really the
> >>>right approach?  I feel like there has to be a better way (or that the
> >>>way we use talloc in the library is slightly broken).  What if the
> >>>bindings created an additional talloc reference to each managed
> >>>object, just to keep the object alive, and used talloc_unlink instead
> >>>of the destroy functions?
> >>
> >>Reproducing the hierarchy is probably error prone, and not that
> >>simple indeed :/
> >>I haven't checked at all the way you suggest, but if we use
> >>talloc_reference/unlink, we get the same issue no?
> >>- If we do for each new wrapped object talloc_reference(NULL,
> >>wrapped_object), then the object will be kept alive until we
> >>talloc_unlink(NULL, wrapped_object), but what about its parents? For
> >>example will doing that on a notmuch_message_t keep the
> >>notmuch_messages_t alive?
> >
> >Hmm.  This is what I was thinking.  You have an interesting point; I
> >think it's slightly wrong, but it exposes something deeper.  I believe
> >there are two different things going on here: some of the talloc
> >relationships are for convenience, while some are structural.  In the
> >former case, I'm pretty sure my suggestion will work, but in the
> >latter case the objects should *never* be freed by the finalizer!
> >
> >For example, notmuch_query_search_messages returns a new
> >notmuch_messages_t with the query as the talloc parent, but that
> >notmuch_messages_t doesn't depend on the query object; this is just so
> >you can conveniently delete everything retrieved from the query by
> >deleting the query.  In this case, you can either use parent
> >references like you did---which will prevent a double-free by forcing
> >destruction to happen from the leaves up but at the cost of having to
> >encode these relationships and of extending the parent object
> >lifetimes beyond what's strictly necessary---or you can use my
> >suggestion of creating an additional talloc reference.
> 
> Actually, checking the code of notmuch_query_search_messages, it
> seems that the notmuch_messages_t (and the notmuch_message_t as
> well) object *does* depend on the database and the query... So in
> that case I think we need the "owner" Object reference as I
> currently have (we want the Messages to keep the Query alive, and
> the Query keeps the Database alive).

It does depend on the database (I think just about everything depends
on the database, directly or indirectly, so I suppose everything will
need some parent pointer), but could you explain how it depends on the
query?  It uses the MSet derived from the query, but Xapian internally
handles the sharing and reference counting of all of its objects.

> That said, your example below looks valid, and it seems I'll need to
> add a flag to createMessage() (and some others) to disable the
> SetFinalizer call for certain instances (we probably want to keep it
> for e.g. SearchMessageByFilename).
> 
> - The candidates I found for adding a talloc reference and not a
> "full" Go reference (therefore avoiding keeping the parent alive
> longer than necessary) are GetAllTags, Thread.GetTags,
> Messages.CollectTags, and Message.GetTags (those are basically
> string lists)

Sounds reasonable (but I haven't gone through carefully).

> - The methods for which I should remove the SetFinalizer on the
> wrapper (as you showed in the example below) while keeping the Go
> reference are Threads.Get and Messages.Get

Sounds right.  I think those are the only cases where the object is
still owned by a container, other than strings (which Go has to copy
anyway).
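
To make the container-owned case concrete, here is a minimal sketch in
plain Go (no cgo), assuming a made-up cMessages type in place of the
real C object.  The type and function names are hypothetical and not
the ones in the patch.  The child wrapper holds a Go reference to its
parent and has no finalizer of its own, so collecting a child wrapper
cannot free anything the container still owns.

```go
package main

import (
	"fmt"
	"runtime"
)

// cMessages is a stand-in for the C container and the elements it owns.
// It is ordinary Go memory here; in real bindings it would be C memory.
type cMessages struct {
	ids   []string
	freed bool // set by the simulated destroy call
}

// Messages wraps the container. It is the only wrapper with a finalizer.
type Messages struct {
	c *cMessages
}

// Message wraps one element that the container still owns.
type Message struct {
	owner *Messages // Go reference that keeps the container wrapper alive
	pos   int
}

func newMessages(ids ...string) *Messages {
	ms := &Messages{c: &cMessages{ids: ids}}
	runtime.SetFinalizer(ms, func(x *Messages) {
		x.c.freed = true // stands in for the container's destroy call
	})
	return ms
}

// Get returns a wrapper for the element at pos. No finalizer is set on it:
// collecting the wrapper must not free memory the container still owns.
func (ms *Messages) Get(pos int) *Message {
	return &Message{owner: ms, pos: pos}
}

// ID reads through the owner, so the element is only reachable while the
// container wrapper is.
func (m *Message) ID() string {
	return m.owner.c.ids[m.pos]
}

// scenario replays the five steps from the earlier mail, with a forced
// collection between steps 3 and 4.
func scenario() (id string, containerFreed bool) {
	ms := newMessages("msg-1", "msg-2") // step 1
	m1 := ms.Get(0)                     // step 2
	_ = m1.ID()
	m1 = nil // step 3: the child wrapper is now garbage
	runtime.GC()
	m2 := ms.Get(0) // step 4: same underlying element
	id = m2.ID()    // step 5: still valid
	containerFreed = ms.c.freed
	runtime.KeepAlive(ms)
	return id, containerFreed
}

func main() {
	fmt.Println(scenario())
}
```

The cost is that a single live Message wrapper keeps the whole
container alive, which seems acceptable when the element cannot
outlive the container anyway.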

> I would also maybe remove all the Destroy() functions, since they
> now seem more dangerous than anything else...

Yeah, probably.

> I tried to write a test using runtime.GC to test the behaviour of
> the bindings, but for some reason some cases which are supposed to
> crash don't, which makes me sceptical about the validity of the test
> :-/

Hmm.  Go's collector is partially conservative, IIRC, so maybe it's
following a technically dead pointer?
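
Separately, a use-after-free in C is undefined behaviour and often
doesn't crash at all, so a missing crash doesn't tell you much.  It
may be more reliable to observe the finalizers directly.  Below is a
rough sketch in plain Go, assuming a hypothetical wrapper type in
place of the real bindings.  Finalizers run on a separate goroutine at
some point after the object becomes unreachable, and the runtime
doesn't promise to run them at all, so the helper polls with a timeout
instead of checking right after runtime.GC returns.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// wrapper is a hypothetical stand-in for a Go wrapper around a C object.
type wrapper struct {
	name string
}

// newTracked allocates a wrapper whose finalizer reports on done.
// done must be buffered so the finalizer goroutine never blocks.
//
//go:noinline
func newTracked(name string, done chan<- string) *wrapper {
	w := &wrapper{name: name}
	runtime.SetFinalizer(w, func(x *wrapper) { done <- x.name })
	return w
}

// wait forces collections until a finalizer reports or the timeout expires.
func wait(done <-chan string, timeout time.Duration) (string, bool) {
	deadline := time.After(timeout)
	for {
		runtime.GC()
		select {
		case name := <-done:
			return name, true
		case <-deadline:
			return "", false
		case <-time.After(10 * time.Millisecond):
			// Give the finalizer goroutine a chance, then collect again.
		}
	}
}

// checkDropped drops the only reference and reports whether, and for
// which object, the finalizer was observed.
func checkDropped() (string, bool) {
	done := make(chan string, 1)
	newTracked("dropped", done) // result discarded, so it is unreachable
	return wait(done, 2*time.Second)
}

// checkHeld keeps a reference for the whole wait; the finalizer must not
// run, so this is expected to report false.
func checkHeld() bool {
	done := make(chan string, 1)
	w := newTracked("held", done)
	_, ok := wait(done, 100*time.Millisecond)
	runtime.KeepAlive(w)
	return ok
}

func main() {
	name, ok := checkDropped()
	fmt.Println("dropped finalized:", ok, name)
	fmt.Println("held finalized:", checkHeld())
}
```

The "held" check is the one that can be asserted strictly: a wrapper
that is still referenced must never be finalized.  The "dropped" check
only shows that the finalizer was seen within the timeout, so a
failure there is weaker evidence.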

> Cheers
> 
> Adrien
> 
> >
> >However, in your example, the notmuch_message_t's are structurally
> >related to the notmuch_messages_t from whence they came.  They're all
> >part of one data structure and hence it *never* makes sense for a
> >caller to delete the notmuch_message_t's.  For example, even with the
> >code in this patch, I think the following could lead to a crash:
> >
> >1. Obtain a Messages object, say ms.
> >2. m1 := ms.Get()
> >3. m1 = nil
> >4. m2 := ms.Get()
> >5. m2.whatever()
> >
> >If a garbage collection happens between steps 3 and 4, the Message in
> >m1 will get finalized and destroyed.  But step 4 will return the same,
> >now dangling, pointer, leading to a potential crash in step 5.
> >
> >Maybe the answer in the structural case is to include the parent
> >pointer in the Go struct and not set a finalizer on the child?  That
> >way, if there's a Go reference to the parent wrapper, it won't go away
> >and the children won't get destroyed (collecting wrappers of children
> >is fine) and if there's a Go reference to the child wrapper, it will
> >keep the parent alive so it won't get destroyed and neither will the
> >child.
> >
> >>- If we do talloc_reference(parent, wrapped), then we reproduce the
> >>hierarchy again?
> >>
> >>Note that I have 0 experience with talloc, so I might as well be
> >>getting things wrong here.
> >>
> >>>
> >>>Quoth Adrien Bustany on Jul 18 at  9:34 pm:
> >>>>This makes notmuch appropriately free the underlying notmuch C objects
> >>>>when garbage collecting their Go wrappers. To make sure we don't break
> >>>>the underlying links between objects (for example, a notmuch_messages_t
> >>>>being GC'ed before a notmuch_message_t belonging to it), we add for each
> >>>>wrapper struct a pointer to the owner object (Go objects with a reference
> >>>>pointing to them don't get garbage collected).
> >>>>---
> >>>>  bindings/go/src/notmuch/notmuch.go |  153 +++++++++++++++++++++++++++++++-----
> >>>>  1 files changed, 134 insertions(+), 19 deletions(-)
> >>>>
> >>>>diff --git a/bindings/go/src/notmuch/notmuch.go b/bindings/go/src/notmuch/notmuch.go
> >>>>index 1d77fd2..3f436a0 100644
> >>>>--- a/bindings/go/src/notmuch/notmuch.go
> >>>>+++ b/bindings/go/src/notmuch/notmuch.go
> >>>>@@ -11,6 +11,7 @@ package notmuch
> >>>>  #include "notmuch.h"
> >>>>  */
> >>>>  import "C"
> >>>>+import "runtime"
> >>>>  import "unsafe"
> >>>>
> >>>>  // Status codes used for the return values of most functions
> >>>>@@ -47,40 +48,152 @@ func (self Status) String() string {
> >>>>  /* Various opaque data types. For each notmuch_<foo>_t see the various
> >>>>   * notmuch_<foo> functions below. */
> >>>>
> >>>>+type Object interface {}
> >>>>+
> >>>>  type Database struct {
> >>>>  	db *C.notmuch_database_t
> >>>>  }
> >>>>
> >>>>+func createDatabase(db *C.notmuch_database_t) *Database {
> >>>>+	self := &Database{db: db}
> >>>>+
> >>>>+	runtime.SetFinalizer(self, func(x *Database) {
> >>>>+		if (x.db != nil) {
> >>>>+			C.notmuch_database_destroy(x.db)
> >>>>+		}
> >>>>+	})
> >>>>+
> >>>>+	return self
> >>>>+}
> >>>>+
> >>>>  type Query struct {
> >>>>  	query *C.notmuch_query_t
> >>>>+	owner Object
> >>>>+}
> >>>>+
> >>>>+func createQuery(query *C.notmuch_query_t, owner Object) *Query {
> >>>>+	self := &Query{query: query, owner: owner}
> >>>>+
> >>>>+	runtime.SetFinalizer(self, func(x *Query) {
> >>>>+		if (x.query != nil) {
> >>>>+			C.notmuch_query_destroy(x.query)
> >>>>+		}
> >>>>+	})
> >>>>+
> >>>>+	return self
> >>>>  }
> >>>>
> >>>>  type Threads struct {
> >>>>  	threads *C.notmuch_threads_t
> >>>>+	owner Object
> >>>>+}
> >>>>+
> >>>>+func createThreads(threads *C.notmuch_threads_t, owner Object) *Threads {
> >>>>+	self := &Threads{threads: threads, owner: owner}
> >>>>+
> >>>>+	runtime.SetFinalizer(self, func(x *Threads) {
> >>>>+		if (x.threads != nil) {
> >>>>+			C.notmuch_threads_destroy(x.threads)
> >>>>+		}
> >>>>+	})
> >>>>+
> >>>>+	return self
> >>>>  }
> >>>>
> >>>>  type Thread struct {
> >>>>  	thread *C.notmuch_thread_t
> >>>>+	owner Object
> >>>>+}
> >>>>+
> >>>>+func createThread(thread *C.notmuch_thread_t, owner Object) *Thread {
> >>>>+	self := &Thread{thread: thread, owner: owner}
> >>>>+
> >>>>+	runtime.SetFinalizer(self, func(x *Thread) {
> >>>>+		if (x.thread != nil) {
> >>>>+			C.notmuch_thread_destroy(x.thread)
> >>>>+		}
> >>>>+	})
> >>>>+
> >>>>+	return self
> >>>>  }
> >>>>
> >>>>  type Messages struct {
> >>>>  	messages *C.notmuch_messages_t
> >>>>+	owner Object
> >>>>+}
> >>>>+
> >>>>+func createMessages(messages *C.notmuch_messages_t, owner Object) *Messages {
> >>>>+	self := &Messages{messages: messages, owner: owner}
> >>>>+
> >>>>+	return self
> >>>>  }
> >>>>
> >>>>  type Message struct {
> >>>>  	message *C.notmuch_message_t
> >>>>+	owner Object
> >>>>+}
> >>>>+
> >>>>+func createMessage(message *C.notmuch_message_t, owner Object) *Message {
> >>>>+	self := &Message{message: message, owner: owner}
> >>>>+
> >>>>+	runtime.SetFinalizer(self, func(x *Message) {
> >>>>+		if (x.message != nil) {
> >>>>+			C.notmuch_message_destroy(x.message)
> >>>>+		}
> >>>>+	})
> >>>>+
> >>>>+	return self
> >>>>  }
> >>>>
> >>>>  type Tags struct {
> >>>>  	tags *C.notmuch_tags_t
> >>>>+	owner Object
> >>>>+}
> >>>>+
> >>>>+func createTags(tags *C.notmuch_tags_t, owner Object) *Tags {
> >>>>+	self := &Tags{tags: tags, owner: owner}
> >>>>+
> >>>>+	runtime.SetFinalizer(self, func(x *Tags) {
> >>>>+		if (x.tags != nil) {
> >>>>+			C.notmuch_tags_destroy(x.tags)
> >>>>+		}
> >>>>+	})
> >>>>+
> >>>>+	return self
> >>>>  }
> >>>>
> >>>>  type Directory struct {
> >>>>  	dir *C.notmuch_directory_t
> >>>>+	owner Object
> >>>>+}
> >>>>+
> >>>>+func createDirectory(directory *C.notmuch_directory_t, owner Object) *Directory {
> >>>>+	self := &Directory{dir: directory, owner: owner}
> >>>>+
> >>>>+	runtime.SetFinalizer(self, func(x *Directory) {
> >>>>+		if (x.dir != nil) {
> >>>>+			C.notmuch_directory_destroy(x.dir)
> >>>>+		}
> >>>>+	})
> >>>>+
> >>>>+	return self
> >>>>  }
> >>>>
> >>>>  type Filenames struct {
> >>>>  	fnames *C.notmuch_filenames_t
> >>>>+	owner Object
> >>>>+}
> >>>>+
> >>>>+func createFilenames(filenames *C.notmuch_filenames_t, owner Object) *Filenames {
> >>>>+	self := &Filenames{fnames: filenames, owner: owner}
> >>>>+
> >>>>+	runtime.SetFinalizer(self, func(x *Filenames) {
> >>>>+		if (x.fnames != nil) {
> >>>>+			C.notmuch_filenames_destroy(x.fnames)
> >>>>+		}
> >>>>+	})
> >>>>+
> >>>>+	return self
> >>>>  }
> >>>>
> >>>>  type DatabaseMode C.notmuch_database_mode_t
> >>>>@@ -100,12 +213,13 @@ func NewDatabase(path string) (*Database, Status) {
> >>>>  		return nil, STATUS_OUT_OF_MEMORY
> >>>>  	}
> >>>>
> >>>>-	self := &Database{db: nil}
> >>>>-	st := Status(C.notmuch_database_create(c_path, &self.db))
> >>>>+	var db *C.notmuch_database_t;
> >>>>+	st := Status(C.notmuch_database_create(c_path, &db))
> >>>>  	if st != STATUS_SUCCESS {
> >>>>  		return nil, st
> >>>>  	}
> >>>>-	return self, st
> >>>>+
> >>>>+	return createDatabase(db), st
> >>>>  }
> >>>>
> >>>>  /* Open an existing notmuch database located at 'path'.
> >>>>@@ -134,12 +248,13 @@ func OpenDatabase(path string, mode DatabaseMode) (*Database, Status) {
> >>>>  		return nil, STATUS_OUT_OF_MEMORY
> >>>>  	}
> >>>>
> >>>>-	self := &Database{db: nil}
> >>>>-	st := Status(C.notmuch_database_open(c_path, C.notmuch_database_mode_t(mode), &self.db))
> >>>>+	var db *C.notmuch_database_t;
> >>>>+	st := Status(C.notmuch_database_open(c_path, C.notmuch_database_mode_t(mode), &db))
> >>>>  	if st != STATUS_SUCCESS {
> >>>>  		return nil, st
> >>>>  	}
> >>>>-	return self, st
> >>>>+
> >>>>+	return createDatabase(db), st
> >>>>  }
> >>>>
> >>>>  /* Close the given notmuch database, freeing all associated
> >>>>@@ -204,7 +319,7 @@ func (self *Database) GetDirectory(path string) (*Directory, Status) {
> >>>>  	if st != STATUS_SUCCESS || c_dir == nil {
> >>>>  		return nil, st
> >>>>  	}
> >>>>-	return &Directory{dir: c_dir}, st
> >>>>+	return createDirectory(c_dir, nil), st
> >>>
> >>>It looks like you have a nil owner for anything whose talloc parent is
> >>>the database.  Is this intentional?  Shouldn't the owner be self in
> >>>these cases, too?
> >>>
> >>>>  }
> >>>>
> >>>>  /* Add a new message to the given notmuch database.
> >>>>@@ -258,7 +373,7 @@ func (self *Database) AddMessage(fname string) (*Message, Status) {
> >>>>  	var c_msg *C.notmuch_message_t = new(C.notmuch_message_t)
> >>>>  	st := Status(C.notmuch_database_add_message(self.db, c_fname, &c_msg))
> >>>>
> >>>>-	return &Message{message: c_msg}, st
> >>>>+	return createMessage(c_msg, nil), st
> >>>>  }
> >>>>
> >>>>  /* Remove a message from the given notmuch database.
> >>>>@@ -319,12 +434,12 @@ func (self *Database) FindMessage(message_id string) (*Message, Status) {
> >>>>  		return nil, STATUS_OUT_OF_MEMORY
> >>>>  	}
> >>>>
> >>>>-	msg := &Message{message: nil}
> >>>>-	st := Status(C.notmuch_database_find_message(self.db, c_msg_id, &msg.message))
> >>>>+	var msg *C.notmuch_message_t
> >>>>+	st := Status(C.notmuch_database_find_message(self.db, c_msg_id, &msg))
> >>>>  	if st != STATUS_SUCCESS {
> >>>>  		return nil, st
> >>>>  	}
> >>>>-	return msg, st
> >>>>+	return createMessage(msg, nil), st
> >>>>  }
> >>>>
> >>>>  /* Return a list of all tags found in the database.
> >>>>@@ -339,7 +454,7 @@ func (self *Database) GetAllTags() *Tags {
> >>>>  	if tags == nil {
> >>>>  		return nil
> >>>>  	}
> >>>>-	return &Tags{tags: tags}
> >>>>+	return createTags(tags, nil)
> >>>>  }
> >>>>
> >>>>  /* Create a new query for 'database'.
> >>>>@@ -379,7 +494,7 @@ func (self *Database) CreateQuery(query string) *Query {
> >>>>  	if q == nil {
> >>>>  		return nil
> >>>>  	}
> >>>>-	return &Query{query: q}
> >>>>+	return createQuery(q, nil)
> >>>>  }
> >>>>
> >>>>  /* Sort values for notmuch_query_set_sort */
> >>>>@@ -459,7 +574,7 @@ func (self *Query) SearchThreads() *Threads {
> >>>>  	if threads == nil {
> >>>>  		return nil
> >>>>  	}
> >>>>-	return &Threads{threads: threads}
> >>>>+	return createThreads(threads, self)
> >>>>  }
> >>>>
> >>>>  /* Execute a query for messages, returning a notmuch_messages_t object
> >>>>@@ -505,7 +620,7 @@ func (self *Query) SearchMessages() *Messages {
> >>>>  	if msgs == nil {
> >>>>  		return nil
> >>>>  	}
> >>>>-	return &Messages{messages: msgs}
> >>>>+	return createMessages(msgs, self)
> >>>>  }
> >>>>
> >>>>  /* Destroy a notmuch_query_t along with any associated resources.
> >>>>@@ -607,7 +722,7 @@ func (self *Messages) Get() *Message {
> >>>>  	if msg == nil {
> >>>>  		return nil
> >>>>  	}
> >>>>-	return &Message{message: msg}
> >>>>+	return createMessage(msg, self)
> >>>>  }
> >>>>
> >>>>  /* Move the 'messages' iterator to the next message.
> >>>>@@ -659,7 +774,7 @@ func (self *Messages) CollectTags() *Tags {
> >>>>  	if tags == nil {
> >>>>  		return nil
> >>>>  	}
> >>>>-	return &Tags{tags: tags}
> >>>>+	return createTags(tags, self)
> >>>>  }
> >>>>
> >>>>  /* Get the message ID of 'message'.
> >>>>@@ -739,7 +854,7 @@ func (self *Message) GetReplies() *Messages {
> >>>>  	if msgs == nil {
> >>>>  		return nil
> >>>>  	}
> >>>>-	return &Messages{messages: msgs}
> >>>>+	return createMessages(msgs, self)
> >>>>  }
> >>>>
> >>>>  /* Get a filename for the email corresponding to 'message'.
> >>>>@@ -871,7 +986,7 @@ func (self *Message) GetTags() *Tags {
> >>>>  	if tags == nil {
> >>>>  		return nil
> >>>>  	}
> >>>>-	return &Tags{tags: tags}
> >>>>+	return createTags(tags, self)
> >>>>  }
> >>>>
> >>>>  /* The longest possible tag value. */
> 
> 

-- 
Austin Clements                                      MIT/'06/PhD/CSAIL
amdragon@mit.edu                           http://web.mit.edu/amdragon
       Somewhere in the dream we call reality you will find me,
              searching for the reality we call dreams.
