Possession is nine-tenths of design

Being explicit about ownership of resources really matters.

Ownership versus non-ownership can be construed as the difference between composition and aggregation (black and white UML diamonds, respectively), and it’s something we need to be absolutely clear about every time we design an object with an association to another object.

A garbage-collected runtime like the JVM will let you get away without thinking about this too much. About the hardest thing the GC has to do on your behalf is deal with cyclic references. You can design your garbage-collected classes without really caring who controls the lifespan of the things they use. Because the reaper is the GC god, and we trust the gods. Whether it matters that you don’t have to be as careful to avoid cycles in your design in Java/C#/Python/Ruby is an interesting question, and I don’t have an answer. My inclination is that it’s something you should have to be explicit about in your design, because even if an object isn’t actually freed, its state may not make any sense relative to the rest of your program after its useful life has expired. So the deal is that an unmanaged language makes you consider this where you don’t really have to in a managed environment. And maybe I’m glad to be made to. Maybe.

It seems to me that if we set up the appropriate heuristic, good behavior will simply follow good design. Let’s try:

1) Design for ownership

There is nothing intrinsically wrong with this code:

class Goofus
{
private:
    Gallant* m_gallant;
};

But I would suggest that we establish a policy that this code be read to mean, “A Goofus references a Gallant but does not control its lifespan.” In other words, this Goofus does not own its Gallant. The only improvement I can suggest is a comment:

Gallant* m_gallant; // not owned

In the case where a Goofus should own its Gallant, write

class Goofus
{
private:
    Holder<Gallant> m_gallant;
};

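where Holder is some reference-tracking wrapper. (Don’t use std::auto_ptr unless your class is non-copyable, because copying an auto_ptr silently transfers ownership away from the source. It was deprecated in C++11 and removed in C++17.) Our group mostly uses an intrusive-counted shared-ownership holder. The boost smart pointers work great.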

Your ownership strategy should be clear to you before you code and easy to diagram. And it should contain no cycles. In a tree structure, a parent should usually own its children, but the children should not own their parent.
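The tree case can be sketched as follows. This is only an illustration: the Node class and its member names are made up for the example, and std::unique_ptr stands in for whatever Holder you actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical tree node: the parent owns its children, and each child
// only refers back to its parent.
class Node
{
public:
    Node() : m_parent(nullptr) {}

    // Creates a child, takes ownership of it, and returns a non-owning pointer.
    Node* addChild()
    {
        m_children.push_back(std::unique_ptr<Node>(new Node()));
        Node* child = m_children.back().get();
        child->m_parent = this;
        return child;
    }

    Node* parent() const { return m_parent; }
    std::size_t childCount() const { return m_children.size(); }

private:
    Node* m_parent;                                 // not owned
    std::vector<std::unique_ptr<Node>> m_children;  // owned
};

// Usage sketch: destroying root destroys the whole subtree, with no cycle
// of owners to get in the way.
inline void treeDemo()
{
    Node root;
    Node* child = root.addChild();
    assert(child->parent() == &root);
}
```

Ownership flows in one direction only, from root to leaves, so the diagram of who deletes whom is the tree itself.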

But cycles are a part of life and do not always indicate bad design. In a free graph structure for example, it’s impossible to know statically how the ownership will work out in the running process. In this case, I say punt. What you have is a legitimately difficult ownership scenario, and you should separate the ownership issues from the data structure. You should make a little ownership god for this data structure who deals with these issues and nothing else.
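One possible shape for such an ownership god, as a rough sketch: a Graph object owns every node, and the edges between nodes are plain non-owning pointers. Graph, GraphNode and cycleDemo are names invented for this example, and std::unique_ptr is again an assumed stand-in for Holder.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical graph node: edges are plain references, so cycles among
// them have no effect on lifetime.
struct GraphNode
{
    std::vector<GraphNode*> neighbors;  // not owned
};

// Hypothetical "ownership god": owns every node and does nothing else.
class Graph
{
public:
    GraphNode* addNode()
    {
        m_nodes.push_back(std::unique_ptr<GraphNode>(new GraphNode()));
        return m_nodes.back().get();
    }

    std::size_t size() const { return m_nodes.size(); }

private:
    std::vector<std::unique_ptr<GraphNode>> m_nodes;  // owned
};

// Usage sketch: a two-node cycle that is freed when the Graph goes away.
inline std::size_t cycleDemo()
{
    Graph graph;
    GraphNode* a = graph.addNode();
    GraphNode* b = graph.addNode();
    a->neighbors.push_back(b);
    b->neighbors.push_back(a);  // a cycle, but only among non-owning edges
    assert(a->neighbors[0] == b && b->neighbors[0] == a);
    return graph.size();
}
```

The trade-off in this sketch is that nodes live as long as the Graph does. Removing individual nodes early would need extra bookkeeping inside Graph, which is exactly the kind of issue it exists to isolate.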

2) Everybody’s owned by somebody

Any call that may even potentially produce a resource should return an owning wrapper to that resource until it hits the level at which your design manages it. Once the resource has been nestled into its owned context for your design, you should feel free to return a raw pointer. This sounds complicated, but it isn’t.

Consider it this way: “new” makes memory, gives you a pointer and tells you it’s your problem to clean it up. Therefore “new” should ALWAYS IMMEDIATELY be wrapped in something that takes ownership of the created resource. Something like

Holder<Goofus> goofusHolder(new Goofus());

(The need for immediate wrapping has to do with exception safety. Even if you separate the holding operation by a line or two, you open up the possibility of an exception intervening and your leaking an un-held object.)

Now you really have to work to leak that new Goofus object. As your creation routines pass this new Goofus up the chain, return the Holder, not the Goofus. At the point that you store that Holder on an object whose lifespan is controlled by somebody else, the Goofus has been integrated into your ownership strategy and you may return the raw Goofus pointer from there on. Maybe this Goofus really is temporary and shouldn’t be held by anybody outside the current scope. Fine: just let your Holder expire without assigning it to anybody with a longer lifespan. Your Goofus will get magically deleted.
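Here is a minimal sketch of that flow. It assumes std::unique_ptr in the role of Holder (a sole-ownership stand-in, where our group’s Holder is shared), and the names makeGoofus, Registry, adopt and liveCount are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for Goofus; the counter exists only to make destruction observable.
struct Goofus
{
    static int liveCount;
    Goofus() { ++liveCount; }
    ~Goofus() { --liveCount; }
    Goofus(const Goofus&) = delete;
    Goofus& operator=(const Goofus&) = delete;
};
int Goofus::liveCount = 0;

// Assumption: std::unique_ptr plays the role of Holder here.
typedef std::unique_ptr<Goofus> GoofusHolder;

// Creation routines return the holder, never the raw pointer.
inline GoofusHolder makeGoofus()
{
    return GoofusHolder(new Goofus());  // wrapped immediately
}

// Hypothetical long-lived owner: once the holder is stored here, the Goofus
// is part of the ownership design and raw pointers may be handed out.
class Registry
{
public:
    Goofus* adopt(GoofusHolder holder)
    {
        m_goofus = std::move(holder);
        return m_goofus.get();
    }

private:
    GoofusHolder m_goofus;  // owned
};
```

A temporary Goofus is simply a GoofusHolder local that is never handed to a Registry. When the local goes out of scope, the Goofus goes with it.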

This implies that you shouldn’t see “delete” of a Goofus anywhere but in the Holder code. If you or others are writing “deletes” of Goofuses, you aren’t following the pattern. The general rule is that the cleanup of a resource should be implicitly and automatically established at the point of construction (RAII). If you find yourself manually invoking cleanup code for this specific resource somewhere else, you’re doing something wrong. (You may write specialized cleanup code, but its execution should be guaranteed by your ownership design.)
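Specialized cleanup can be attached to the holder type itself. The sketch below uses a std::unique_ptr with a custom deleter as one way to do that. The Handle type, openHandle and closeHandle are an invented C-style API used purely for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style resource API, invented for this sketch.
struct Handle { int id; };
int g_openHandles = 0;

Handle* openHandle(int id)
{
    ++g_openHandles;
    return new Handle{id};
}

void closeHandle(Handle* handle)
{
    --g_openHandles;
    delete handle;
}

// The specialized cleanup code lives in the deleter, so it runs whenever
// the holder expires and never has to be called by hand.
struct HandleCloser
{
    void operator()(Handle* handle) const { closeHandle(handle); }
};
typedef std::unique_ptr<Handle, HandleCloser> HandleHolder;

// Acquisition and cleanup are tied together at the point of construction.
inline HandleHolder acquire(int id)
{
    return HandleHolder(openHandle(id));
}
```

With this arrangement, client code never calls closeHandle directly. It only decides how long the HandleHolder lives.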

This brings up an interesting recipe for pain that we have in the existing codebase. We are clients of some code that returns unwrapped pointers to resources. Some of those resources are newly allocated and unmanaged, others are persistent managed buffers, and the code gives us no way of finding out which is which. We have a choice: leak (never delete, and lose the fresh allocations) or crash (delete, and free a buffer that somebody else still uses). If the code we were using returned a holder instead, there wouldn’t be any problem. Either we would be the only owner, and the resource would be deleted when our reference went away, or it would also be owned by some other system, and it would persist when our reference went away.
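To make the difference concrete, here is a sketch of what such an interface could look like if it returned a shared-ownership holder. BufferSource and its get method are hypothetical, and std::shared_ptr is assumed as the holder.

```cpp
#include <cassert>
#include <memory>
#include <vector>

typedef std::vector<char> Buffer;
typedef std::shared_ptr<Buffer> BufferHolder;  // assumed holder type

// Hypothetical provider that sometimes hands out a persistent buffer and
// sometimes a freshly allocated one.
class BufferSource
{
public:
    BufferSource() : m_persistent(std::make_shared<Buffer>(16)) {}

    BufferHolder get(bool wantFresh)
    {
        if (wantFresh)
            return std::make_shared<Buffer>(16);  // caller becomes the sole owner
        return m_persistent;                      // caller shares ownership
    }

private:
    BufferHolder m_persistent;  // owned, and possibly shared with callers
};
```

The caller treats both results identically: it drops its BufferHolder when it is done. The fresh buffer is deleted at that point and the persistent one survives, without the caller needing to know which it had.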

3) It’s OK to pass around a raw pointer

Raw pointers are fine. However, I have come to believe that it isn’t a good idea for anyone receiving a raw pointer to produce and hold another owning reference from it. (Certainly you don’t want to construct a new boost::shared_ptr from the raw pointer: it would keep its own reference count and eventually delete the object a second time, which is downright dangerous. If you must, use shared_from_this.) But generally, holding a second reference starts to indicate a subversion of the simple ownership strategy you intended at the beginning. It becomes difficult to reason about the lifespan of an object if you cannot be sure who might continue to hold it in memory.
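The sketch below illustrates both halves of that advice, using std::shared_ptr and std::enable_shared_from_this as the standard-library counterparts of the boost facilities. The member function self and the free function inspect are made-up names.

```cpp
#include <cassert>
#include <memory>

// Assumption: Gallant is managed by std::shared_ptr, the standard
// counterpart of boost::shared_ptr.
class Gallant : public std::enable_shared_from_this<Gallant>
{
public:
    // Safe: the returned pointer shares the existing reference count.
    // Only valid when this object is already owned by a shared_ptr.
    std::shared_ptr<Gallant> self() { return shared_from_this(); }
};

// The preferred pattern: receive a raw pointer, use it, and do not keep it.
inline bool inspect(const Gallant* gallant)
{
    return gallant != nullptr;
}

// Dangerous alternative, shown only as a comment:
//   std::shared_ptr<Gallant> second(rawPointer);
// This builds a second, independent reference count, and each count
// eventually deletes the same object.
```

Even the safe route through shared_from_this creates one more party that can keep the object alive, which is the subversion of the ownership strategy described above.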

Wrinkles

Multithreaded contexts make this stuff a little more complicated. In single-threaded, straight-line code, you can be sure that your raw pointer won’t be deleted out from under you. In a multithreaded environment, it makes sense to pass the holder around more, so that you never suffer gaps in ownership during which another thread can release the last reference and delete the resource.
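As a small sketch of passing the holder rather than the raw pointer across a thread boundary, assuming std::shared_ptr as the holder and std::thread for threading (worker and threadDemo are hypothetical names):

```cpp
#include <cassert>
#include <memory>
#include <thread>

struct Gallant { int value; };

// The worker takes the holder by value, so it keeps the Gallant alive for
// as long as it runs, whatever the other thread does.
inline void worker(std::shared_ptr<Gallant> gallant, int* out)
{
    *out = gallant->value;
}

inline int threadDemo()
{
    std::shared_ptr<Gallant> gallant = std::make_shared<Gallant>();
    gallant->value = 42;
    int result = 0;

    // std::thread copies its arguments before the constructor returns,
    // so the new thread has its own reference from the start.
    std::thread t(worker, gallant, &result);
    gallant.reset();  // this thread lets go; the object survives
    t.join();         // result is only read after the worker has finished
    return result;
}
```

Had worker taken a raw Gallant pointer instead, the reset in threadDemo could have deleted the object before the worker dereferenced it.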

Conclusion

The heuristic is

1) Design for ownership (beware possible ownership cycles)
2) Everybody’s owned by somebody (maybe even just a local variable)
3) It’s OK to pass around a raw pointer (in single-threaded code)

And maybe that’s really about it.
