Smart Pointers

Smart pointers have been around for quite a while, but they are a concept I am only beginning to appreciate (I wish I had learned about them earlier). Pointers are inescapable, especially when you want to make full use of C++’s object-oriented features, or when performance is key. However, using them opens up a Pandora’s box of memory management issues.

Pointers

For anyone who has just begun learning C/C++, pointers are not easy to grasp. Admittedly, even now I still run into trouble with them. For all intents and purposes, a “pointer” is just a fancy word for “memory address”. An analogy would be your ID number in your country’s registry: the ID is the pointer, and dereferencing it returns all your details, such as your birthdate, full name, etc.

Anyway, this article is not an introductory one about pointers; there are plenty of tutorials a quick Google search away. I’m here to discuss some of the pitfalls of using pointers willy-nilly, despite their capabilities. You know what they say: with great power comes great responsibility.

In C++, the programmer is responsible for freeing the heap memory they allocate once it is no longer needed. Memory allocated with new is freed using the delete keyword.

int* pNumber = new int;
/* do something */
delete pNumber;

The snippet above allocates enough heap memory to hold an int (sizeof(int) bytes, typically 4), then frees it. For a simple program that exits soon after it starts, forgetting to free memory isn’t very harmful nowadays, as the operating system reclaims all of a process’s memory when it exits.

However, if the program is destined to run for hours on end, forgetting to free memory is no longer harmless: the unfreed memory keeps piling up for as long as the program runs. This is known as a memory leak, and it is one of the worst issues to run into. A leak is guaranteed once the address of heap-allocated data is lost:

int* pNumber = new int;
pNumber = 0; // the only copy of the allocated address is overwritten

Once 0 is assigned to pNumber, the address of the int-sized block is gone, so there is no way to deallocate it. Allocated memory cannot be reused until it is deallocated, so the only way to get it back is to exit the program, or to wait for it to crash from memory exhaustion.

Give It Some Intelligence

Smart pointers were created to mitigate some of the problems that come with raw pointers. They are not “smart” in the “artificial intelligence” sense; they are smart because they keep track of references and deallocate the data they point to once no references remain (that’s the tl;dr definition). In this article, we attempt to develop a simple implementation of a smart pointer, but do take note that modern C++ (since C++11) has built-in smart pointers, such as std::unique_ptr and std::shared_ptr, for various use cases.

Now, whenever the execution of a program enters a function, a stack frame is created, and the function’s local variables are allocated in that frame. When the function exits and execution returns to the caller (the one who invoked the function), the entire stack frame is popped out of existence (I may be simplifying a bit, but yes, the whole frame is popped). This is also why a function that recursively calls itself indefinitely runs into a stack overflow: every call pushes a new frame, and eventually the stack outgrows the memory reserved for it.

When the stack frame is popped, all the local variables are freed, and for local objects of class type, their destructors are called first.

void a_function()
{
    int a=0;
    char b=1;
    /* do something */

    /* Implicitly perform the following on exit:
    delete &a;
    delete &b;
    */
}

The above code roughly illustrates my earlier point that when a function exits, all its local variables are freed. Of course, don’t uncomment those delete lines: calling delete on the address of a local variable is undefined behaviour, because delete may only be applied to memory obtained from new.

With that knowledge, can we craft something that performs automatic deletion of heap-allocated data? The answer is yes, we can!

A very simple implementation of a smart pointer is to wrap pointers in a class.

#include <stdio.h>

template<class ptype>
class SmartPtr {
    ptype* m_ptr;

public:
    SmartPtr(ptype* raw) 
    {
        printf("SmartPtr::ctor\n");
        m_ptr = raw;
    }

    ~SmartPtr() 
    {
        printf("SmartPtr::dtor\n");
        delete m_ptr;
    }

    // member access operator overload
    ptype* operator->() 
    {
        return m_ptr; // already a ptype*, no cast needed
    }

    // dereference the pointer
    ptype& operator*() 
    {
        return *m_ptr;
    }
};

void a_function() 
{
    SmartPtr<int> pInt(new int);
    *pInt = 2;
    (*pInt)++;
    printf("%d\n", *pInt);
}

int main() 
{
    a_function();
}

To make this smart pointer seamless to use, we retain the usual semantics of the dereference and member access operators (* and -> respectively) through operator overloading. By defining SmartPtr as a class template, we allow it to wrap a pointer to any type you can throw at it.

Using SmartPtr is pretty simple too. If you take a look at a_function(), you will see that the SmartPtr is used as a local variable. Because it lives on the function’s stack frame, its destructor, and with it the delete on the wrapped pointer, is guaranteed to run when execution leaves the function’s scope.

Now that we have a basic idea of how to make a smart pointer, we have to think about the common operations performed on a pointer and how SmartPtr can support them seamlessly. Operations such as copy construction and assignment scatter copies of the pointer around the program. With the simple SmartPtr above, every copy would delete the same memory when it is destroyed, which is undefined behaviour. So we need a way to count the references to the pointer, and delete the data only when the last reference goes away.

#include <stdio.h>
#include <stdlib.h>

static int _inner_ptr_id = 0;

// structure to hold a reference counter for a pointer
typedef struct  
{
    void* m_rawPtr; // the pointer to the data
    int m_refCount; // number of SmartPtr referencing this inner pointer
    int m_id; // id of this inner pointer
} SmartPtrAttr;

// SmartPtr class - holds a reference to an inner pointer
template<class T>
class SmartPtr 
{
private:
    static int next_id;

    SmartPtrAttr* m_ptr;
    int m_id;

public:
    // ctor
    SmartPtr(T* pRaw) 
    {
        m_id = SmartPtr::next_id++;
        m_ptr = new SmartPtrAttr;
        m_ptr->m_rawPtr = pRaw;
        m_ptr->m_refCount = 1;
        m_ptr->m_id = _inner_ptr_id++;
        printf("SmartPtr(%d)::ctor()\n", m_id);
        printf("..inner_ptr(%d) ref count: %d\n", m_ptr->m_id, 
            m_ptr->m_refCount);
    }

    // copy ctor
    SmartPtr(const SmartPtr& copy) 
    {
        m_ptr = copy.m_ptr;
        m_ptr->m_refCount++;

        m_id = SmartPtr::next_id++;
        printf("SmartPtr(%d)::copy_ctor()\n", m_id);
        printf("..inner_ptr(%d) ref count: %d\n", m_ptr->m_id, 
            m_ptr->m_refCount);
    }

    // dtor
    ~SmartPtr() 
    {
        printf("SmartPtr(%d)::dtor()\n", m_id);
        if (m_ptr) {
            m_ptr->m_refCount--;
            printf("..inner_ptr(%d) ref count: %d\n", m_ptr->m_id, 
                m_ptr->m_refCount);
            if(m_ptr->m_refCount == 0) {
                delete reinterpret_cast<T*>(m_ptr->m_rawPtr);
                m_ptr->m_rawPtr = 0;

                delete m_ptr;
            }
            m_ptr = 0;
        }
    }

    // member access operator overload
    T* operator->() 
    {
        return m_ptr ? reinterpret_cast<T*>(m_ptr->m_rawPtr) : 0;
    }

    // dereference the pointer
    T& operator*() 
    {
        return *reinterpret_cast<T*>(m_ptr->m_rawPtr);
    }

    // copy assignment operator overload
    //   copies the pointer over, incrementing the reference count of the 
    //   source pointer and decrementing the refcount of the dest pointer.
    SmartPtr& operator=(const SmartPtr& other) 
    {
        printf("SmartPtr(%d)::operator=()\n", m_id);
        SmartPtrAttr* pTemp = m_ptr;
        if(m_ptr) {
            m_ptr->m_refCount--;
            printf("..inner_ptr(%d) ref count: %d\n", 
                m_ptr->m_id, m_ptr->m_refCount);
        }

        m_ptr = other.m_ptr;
        if(m_ptr) {
            m_ptr->m_refCount++;
            printf("..inner_ptr(%d) ref count: %d\n", 
                m_ptr->m_id, m_ptr->m_refCount);
        }

        if(pTemp && pTemp->m_refCount == 0) { // old inner pointer has no owners left
            delete reinterpret_cast<T*>(pTemp->m_rawPtr);
            delete pTemp;
        }
        return *this;
    }
};

// some arbitrary class for demonstration
class Object 
{
private:
    static int next_id;
    int m_value;
    int m_id;
public:
    Object() 
    {
        m_id = Object::next_id++;
        m_value=0;
        printf("Object(%d)::ctor()\n", m_id);
    }

    ~Object() 
    {
        printf("Object(%d)::dtor()\n", m_id);
    }

    void Increment() { m_value++; }
    void Set(int value) { m_value = value; }
    void PrintValue() { printf("Object(%d)'s value: %d\n", m_id, m_value); }
};

// static variable initialization
template<class T> int SmartPtr<T>::next_id = 0;
int Object::next_id = 0;

#define PRINT_AND_EVAL(X) printf("\n %s\n",#X); X

// main entry point
int main(int argc, char** argv) 
{
    // normal ctor
    PRINT_AND_EVAL(SmartPtr<Object> pObject0(new Object()));

    // copy ctor
    PRINT_AND_EVAL(SmartPtr<Object> pObject1 = pObject0);

    // normal ctor
    PRINT_AND_EVAL(SmartPtr<Object> pObject2(new Object()));

    // copy assignment
    PRINT_AND_EVAL(pObject1 = pObject2);
    PRINT_AND_EVAL(pObject2 = pObject0);

    return 0;
}

Instead of having SmartPtr store the raw pointer to the data directly, we make it store a pointer to a shared structure (SmartPtrAttr) that holds the raw pointer and its reference count. That way, whenever the pointer is passed around through the assignment operator (=) or the copy constructor, we can update the reference count accordingly. When the count drops to 0, the SmartPtr that released the last reference deletes both the data and the SmartPtrAttr. This is about as primitive as garbage collection can get.

For the purpose of the demonstration, each smart pointer, object and inner pointer (SmartPtrAttr) has its own ID, so when you run the code above you can track the creation and destruction of each of them.

Of course, for production purposes, there’s really no need to reinvent the wheel. Modern C++ provides smart pointers (std::unique_ptr, std::shared_ptr and std::weak_ptr) in the <memory> header (check out <memory> on cppreference). Being passionately curious, though, I just wanted to get my hands dirty and find out how these things work.
