From Java to C++ – Copy&Move semantics

In my opinion, in Java we do not care that much about creating objects and assigning them to other variables. In general, thanks to the inner workings of the GC, we’re more relaxed in this area than C++ programmers. However, that’s not the only reason why I’ve decided to write this post. Recently, a survey result appeared in my social feed claiming that move semantics and understanding the object lifecycle are not a piece of cake. In order to cover all my bases, I’ve decided to dive into the topic a little deeper. Here we go.

DISCLAIMER! This article was written by me, but it was later reviewed by an experienced C++ programmer/trainer. The parts where something was changed (or a comment was added) are written in blue. The conclusion is that this post is quite trustworthy now 😉

Copy semantics

I’m mentioning copy semantics here mostly for the sake of completeness, and to introduce some working code examples. Let’s start with Java. Every time we assign an existing object to a new variable (or even to an existing one), only the reference is copied: both variables point to the same object afterwards. Below is a code snippet in Java:

class MyObject {
  int i = 5;
  String s = "test";
}


MyObject obj1 = new MyObject();
MyObject obj2 = obj1;

obj1.i = 9;

System.out.println("Obj1: " + obj1.i);
System.out.println("Obj2: " + obj2.i);

This will print:

Obj1: 9
Obj2: 9

For the C++ equivalent we have to use pointers, here std::shared_ptr, because plain C++ variables hold values, not references (the code used here was reviewed, so it’s valid):

#include <memory>
#include <string>
#include <iostream>

struct MyObject {
    int i = 5;
    std::string s = "init";
};

int main()
{       
    auto obj1 = std::make_shared<MyObject>();
    auto obj2 = obj1;

    obj1->i = 9;
    obj1->s = "new";

    std::cout << "obj1: " << obj1->i << ", " << obj1->s << '\n';
    std::cout << "obj2: " << obj2->i << ", " << obj2->s << '\n';
    
}

And the output is this:

obj1: 9, new
obj2: 9, new

Passing objects as params – copy it is

When an object is passed to a function/method by value, i.e. as long as we don’t operate on references or pointers, a copy of the object is created and passed as an argument. Whatever the function does to its parameter does not affect the caller’s object. Take a look at the following code:

#include <string>
#include <iostream>

struct MyObject {
    int i = 5;
    std::string s = "init";
};


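void changeObject(MyObject obj)
{
    // obj is a copy, so these changes are not visible to the caller
    obj.i = 6;
    obj.s = "changed in function";
    std::cout << "Inside the function " << obj.i << " " << obj.s << '\n';
}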


int main()
{        
   MyObject obj1;
   MyObject obj2 = obj1;

   std::cout << "obj1: " << obj1.i << ", " << obj1.s << '\n';
   std::cout << "obj2: " << obj2.i << ", " << obj2.s << '\n'; 

   changeObject(obj1);

   std::cout << "obj1: " << obj1.i << ", " << obj1.s << '\n';
   std::cout << "obj2: " << obj2.i << ", " << obj2.s << '\n';       
} 

Which produces this output:

obj1: 5, init
obj2: 5, init
Inside the function 6 changed in function
obj1: 5, init
obj2: 5, init
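
For contrast, here is a minimal sketch of the same idea with a reference parameter. The function names changeCopy and changeOriginal are mine, made up for illustration. The first one works on its own copy, the second one on the caller's object:

```cpp
#include <string>

struct MyObject {
    int i = 5;
    std::string s = "init";
};

// Receives its own copy; the caller's object is untouched.
void changeCopy(MyObject obj) {
    obj.i = 6;
    obj.s = "changed in function";
}

// Receives a reference; the caller's object is modified in place.
void changeOriginal(MyObject& obj) {
    obj.i = 6;
    obj.s = "changed in function";
}
```

Calling changeCopy(obj1) leaves obj1 at its initial values, while changeOriginal(obj1) changes it. The reference version is the closer match to what Java does with object parameters.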

Constructor for one?

So far we’ve seen nothing new. A custom constructor that takes an object of the same type as a parameter (called a copy constructor) works in exactly the same way. It’s up to the programmer how this constructor creates the new object: it can be a simple shallow copy, or a deep copy:

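struct MyObject {
    int i = 5;
    std::string s = "init";

    MyObject() = default;  // declaring a copy constructor suppresses the implicit default constructor

    MyObject(const MyObject& obj) : i(obj.i), s(obj.s) {
        // This is the copy constructor, and we can do whatever we want here.
        // Without the initializer list above, i and s would just get their default values.
        // However, the passed reference is CONST - no changes can be made to it
        // obj.i = 6; This won't compile
    }
};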
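
To make the shallow vs deep distinction more tangible, below is a small sketch of a class that owns a raw character buffer and deep-copies it. The class Buffer and all of its members are hypothetical, invented only for this illustration. In real code a std::string or std::vector member would do the job for us.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical class that owns a raw, heap-allocated character buffer.
class Buffer {
public:
    explicit Buffer(const char* text)
        : size_(std::strlen(text) + 1), data_(new char[size_]) {
        std::memcpy(data_, text, size_);
    }

    // Deep copy: allocate a separate buffer and copy the characters into it.
    // A shallow copy would only copy the pointer, so both objects would
    // share (and later both delete) the same memory.
    Buffer(const Buffer& other)
        : size_(other.size_), data_(new char[other.size_]) {
        std::memcpy(data_, other.data_, size_);
    }

    // Copy assignment is left out to keep the sketch short.
    Buffer& operator=(const Buffer&) = delete;

    ~Buffer() { delete[] data_; }

    char* data() { return data_; }
    const char* data() const { return data_; }

private:
    std::size_t size_;  // declared before data_, so it is initialized first
    char* data_;
};
```

After Buffer b(a); the two objects hold different pointers, so changing b.data()[0] does not affect a.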

Gimme some equal copy!

You may say, again, nothing new under the sun. Been there, done that. However, C++ has some more magic to offer than Java. In C++ it is possible to overload not only a class’s member functions, but also the operators used on it! That’s something you can’t see in Java (maybe I should add: yet). Overloading an operator is simple: technically it’s like writing a method with a special name. However! With great power comes great responsibility! Let’s see how it may go wrong.

int main()
{
   MyObject obj1;
   MyObject obj2 = obj1;

   std::cout << "Obj1: " << obj1.s << std::endl;
   std::cout << "Obj2: " << obj2.s << std::endl;
   std::cout << "-----" << std::endl;

   MyObject obj3; // New object created
   obj3.s = "b"; // New value assigned to the string

   obj3 = obj1; // Here is the main part!!

   std::cout << "Obj1: " << obj1.s << std::endl;
   std::cout << "Obj2: " << obj2.s << std::endl;
   std::cout << "Obj3: " << obj3.s << std::endl;

   return 0;
}
 

The output is as predicted:

Obj1: init
Obj2: init
-----
Obj1: init
Obj2: init
Obj3: init

The question is: what happens under the hood? In our situation, every object has its own instance of s (and therefore its own dynamically allocated memory). The default copy assignment copies the object member by member, so for s it calls the copy assignment of std::string. That operator makes a deep copy and takes care of the buffer previously held by obj3.s, so nothing leaks here. The picture changes when a class manages memory by hand, for example through a raw char* member: the default copy assignment would then copy only the pointer (a shallow copy), the memory previously owned by obj3 would leak, and both objects would end up pointing to the same buffer. That’s something we have to pay attention to when (re)assigning objects around! In Java, the GC will take care of the lost memory; in C++, no way.

So how do we fix this? The general rule of thumb is that we need to overload the assignment operator and, before we assign the new data, make sure that we free all the memory the object has allocated itself. With std::string members there is nothing to free by hand, so in our example a hand-written operator looks like this (and does exactly what the compiler-generated one would do):

struct MyObject {
    int i = 5;
    std::string s = "init";

    MyObject& operator=(const MyObject& objToCopyFrom) {
        if (this == &objToCopyFrom) return *this;  // self-assignment: nothing to do

        // No manual cleanup of s is needed: std::string releases or reuses its old buffer
        s = objToCopyFrom.s;
        i = objToCopyFrom.i;
        return *this;
    }
};
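
When a class owns raw memory instead of a std::string, the assignment operator really does have to clean up. The sketch below shows one way to do it. RawText and its helper duplicate are hypothetical names used only here, and this is an illustration rather than a recommendation to manage memory by hand.

```cpp
#include <cstring>

// Hypothetical class that manages a C-style string by hand.
class RawText {
public:
    explicit RawText(const char* text) : data_(duplicate(text)) {}
    RawText(const RawText& other) : data_(duplicate(other.data_)) {}

    RawText& operator=(const RawText& other) {
        if (this == &other) return *this;      // self-assignment: nothing to do

        char* fresh = duplicate(other.data_);  // allocate first: if this throws, *this is unchanged
        delete[] data_;                        // free the old buffer, otherwise it would leak
        data_ = fresh;
        return *this;
    }

    ~RawText() { delete[] data_; }

    const char* c_str() const { return data_; }

private:
    // Returns a freshly allocated copy of the given C string.
    static char* duplicate(const char* text) {
        char* copy = new char[std::strlen(text) + 1];
        std::strcpy(copy, text);
        return copy;
    }

    char* data_;
};
```

The order matters: the new buffer is created before the old one is deleted, so a failed allocation does not leave the object pointing at freed memory.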

So what’s the default behaviour?

In general, the logic behind the automatic generation of the copy assignment operator and copy constructor is quite robust for members that manage themselves, but caution is advised when a class owns raw resources. After copying, the two objects should be independent (changing one must not alter the state of the other, as we saw above), and neither of them should be left in an inconsistent state. To finish this topic, it is worth mentioning that it’s possible to explicitly indicate that we’re fine (or not) with the default implementations generated by the compiler. Here are the examples:

MyObject(const MyObject&) = default;    // Yes, I'm happy with default implementation
MyObject(const MyObject&) = delete;     // I'm not happy with defaults - remove the auto-generation and raise compiler error when such situation occurs
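
To see both variants in a complete class, here is a short sketch. The type names Copyable and NonCopyable are mine; the checks use the standard type traits from <type_traits>, so the compiler itself confirms what is allowed.

```cpp
#include <string>
#include <type_traits>

struct Copyable {
    std::string s = "init";
    Copyable() = default;
    Copyable(const Copyable&) = default;             // compiler-generated copy is fine
    Copyable& operator=(const Copyable&) = default;
};

struct NonCopyable {
    std::string s = "init";
    NonCopyable() = default;
    NonCopyable(const NonCopyable&) = delete;             // any attempt to copy is a compile error
    NonCopyable& operator=(const NonCopyable&) = delete;
};

// Checked at compile time.
static_assert(std::is_copy_constructible<Copyable>::value, "Copyable can be copied");
static_assert(!std::is_copy_constructible<NonCopyable>::value, "NonCopyable cannot be copied");
static_assert(!std::is_copy_assignable<NonCopyable>::value, "NonCopyable cannot be copy-assigned");
```

With these definitions, a line such as NonCopyable b = a; is rejected by the compiler instead of silently copying.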

Move semantics

With the previous chapter we’ve laid a foundation for explaining a more advanced concept, which is move semantics. To start with (and I was surprised to learn that), move semantics wasn’t present in C++ before C++11! So pay attention to that fact. The second thing we have to start with is the difference between an lvalue and an rvalue.

Lvalue vs Rvalue

Here, I would use a direct quote from ‘C++ Crash Course’:

We’ll consider a very simplified view of value categories. For now, you’ll just need a general understanding of lvalues and rvalues. An lvalue is any value that has a name, and an rvalue is anything that isn’t an lvalue.

To make it easier, here are the usual things that are rvalues: temporary objects, literal constants (except string literals, which are lvalues), function return values (unless the function returns an lvalue reference) and usually the results of built-in operators. Also lvalues that were passed through std::move().

I know that it might sound like explaining the unknown through the unknown; however, I think that a simple code example (taken from the same book) will explain the concept:

#include <cstdio>

void ref_type(int &x) {
    printf("lvalue reference %d\n", x);
}

void ref_type(int &&x) {
    printf("rvalue reference %d\n", x);
}

int main() {
    auto x = 1;
    ref_type(x);
    ref_type(2);
    ref_type(x + 2);
}

The output is:

lvalue reference 1
rvalue reference 2
rvalue reference 3
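
Function return values deserve one more small sketch, because the result depends on how the function returns. The helper names category, byValue and byReference are made up for this illustration:

```cpp
#include <string>

// Overload resolution tells us which value category the argument has.
std::string category(int&)  { return "lvalue"; }
std::string category(int&&) { return "rvalue"; }

// Returns a temporary: the call expression is an rvalue.
int byValue() { return 42; }

// Returns a reference to an existing object: the call expression is an lvalue.
int& byReference() {
    static int stored = 42;
    return stored;
}
```

Here category(byValue()) picks the int&& overload and category(byReference()) picks the int& one. One more thing that tends to surprise people: a named variable declared as int&& r = 5; is itself an lvalue, because it has a name, so category(r) picks the int& overload.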

The idea behind move semantics

Why is this concept important? Let’s go back to the example used in the previous subchapter. Our class there had a string inside. Imagine for a moment that this string holds a large number of chars, like millions or so. Copying all this data back and forth would be a tremendous waste of both memory and time. What is more, quite often, when we assign an object to a new variable (or pass it as a constructor parameter), we actually don’t care about the source object anymore. We just need the new variable to hold the data, and we are about to discard the source object anyway. That’s the perfect situation for move semantics.

Enough talk, show me the code. Let’s revisit an example already shown above.

#include <iostream>

int main()
{

    MyObject obj1;
    MyObject obj2 = obj1;

    std::cout << "Obj1: " << obj1.s << std::endl;
    std::cout << "Obj2: " << obj2.s << std::endl;
    std::cout << "-----" << std::endl;

    MyObject obj3;
    obj3.s = "b";

    obj3 = obj1;

    std::cout << "Obj1: " << obj1.s << std::endl;
    std::cout << "Obj2: " << obj2.s << std::endl;
    std::cout << "Obj3: " << obj3.s << std::endl;

    return 0;
}

The problem we had before was that after we’d assigned obj1 to obj2, we wanted to use both of them later. Or more precisely, we wanted to use the objects that were referenced by obj1 and obj2. As I’ve mentioned above, quite often we don’t need that. We just want to take over the contents of one object (so we’re interested in its values), but we don’t care that much about the source that is left behind. There are two ways to achieve that, similarly to copying above: through the move constructor or move assignment. What’s the difference between them and their copying counterparts? Just one: the use of rvalue references. As usual, code speaks a thousand words.

// I present only the new methods, fields of the class are public for readability

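MyObject(MyObject&& objToMove) noexcept  // Notice the lack of 'const', the && (rvalue reference) and noexcept
    : i(objToMove.i), s(std::move(objToMove.s)) {  // std::move comes from <utility>

    // here we clean up the source object
    objToMove.s.clear();  // assigning nullptr to a std::string is not valid
    objToMove.i = 0;
}

MyObject& operator=(MyObject&& objToMove) noexcept {  // Notice the lack of 'const', the && (rvalue reference) and noexcept
    if (this == &objToMove) return *this;  // moving onto itself: nothing to do

    i = objToMove.i;
    s = std::move(objToMove.s);  // takes over the buffer instead of copying the characters

    // here we clean up the source object
    objToMove.s.clear();
    objToMove.i = 0;

    return *this;
}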

We have a couple of changes here that we need to look at.

  • noexcept – in general we don’t expect these methods to throw any kind of exception. We cannibalize the source object, and all the operations are either simple copies of values or reassignments of memory addresses.
  • no const – pay attention to that! As we’re ‘emptying’ the source object, we must be able to actually change its state. Therefore we cannot use const for the parameter.
  • emptying the source object – already mentioned this one. In general, we leave the source object in a valid but ‘empty’ state, as if it had just been created. There’s no problem with reusing the existing variable by assigning a new object to it; however, until then its contents should not be relied upon.

The last point above can be a source of problems. The programmer must always remember to clean up the source object in order to avoid nasty runtime errors (like double frees). Therefore, when possible, try to reuse existing move constructors/operators, or use the std::exchange function (from <utility>, available since C++14), which assigns a new value to an object, for example nullptr, and returns the old one, so taking the value over and resetting the source happen in one step.
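
Here is a minimal sketch of how std::exchange fits into move operations when a class owns a raw pointer, which is exactly the case where a forgotten cleanup ends in a double free. The class Handle is hypothetical and exists only for this illustration:

```cpp
#include <utility>

// Hypothetical owner of a single heap-allocated int.
class Handle {
public:
    explicit Handle(int value) : ptr_(new int(value)) {}

    // Take the pointer and leave nullptr in the source, in one step.
    Handle(Handle&& other) noexcept
        : ptr_(std::exchange(other.ptr_, nullptr)) {}

    Handle& operator=(Handle&& other) noexcept {
        if (this != &other) {
            delete ptr_;                                // release what we owned so far
            ptr_ = std::exchange(other.ptr_, nullptr);  // take over and reset the source
        }
        return *this;
    }

    // Copying is disabled to keep the sketch focused on moving.
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;

    ~Handle() { delete ptr_; }  // deleting nullptr is a no-op, so a moved-from Handle is safe

    const int* get() const { return ptr_; }

private:
    int* ptr_;
};
```

Because the source is left holding nullptr, its destructor has nothing to free, and the memory is released exactly once by the new owner.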

OK, but how do I get an rvalue?

That’s a valid point. In general, if we create a new object and at the same time pass it to the constructor (or assign it), we should be fine. However, the examples above showed assigning an already existing object (so, an lvalue). To help us with move semantics, C++ introduced a function in the standard library called std::move. Its purpose is to cast any lvalue to an rvalue, and therefore to let existing objects take part in move semantics. Note that std::move itself does not move anything: the actual work is done by the move constructor or move assignment operator that gets selected. An example from ‘C++ Crash Course’:

#include <cstdio>
#include <utility>

void ref_type(int &x) {
    printf("lvalue reference %d\n", x);
}

void ref_type(int &&x) {
    printf("rvalue reference %d\n", x);
}

int main() {
    auto x = 1;
    ref_type(std::move(x));
    ref_type(2);
    ref_type(x + 2);
}

The output is:

rvalue reference 1
rvalue reference 2
rvalue reference 3
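
The same mechanism decides between copying and moving whole objects. Below is a small sketch with a made-up type Tracer that records which constructor created it. It also shows a common pitfall: std::move on a const object quietly falls back to a copy, because a move constructor cannot bind to something it is not allowed to modify.

```cpp
#include <string>
#include <utility>

// Hypothetical type that remembers how it was created.
struct Tracer {
    std::string origin = "constructed";

    Tracer() = default;
    Tracer(const Tracer&) : origin("copied") {}
    Tracer(Tracer&&) noexcept : origin("moved") {}
};

// An lvalue argument selects the copy constructor.
Tracer copyOf(const Tracer& source) { return Tracer(source); }

// std::move casts the lvalue to an rvalue, so the move constructor is selected.
Tracer moveOf(Tracer& source) { return Tracer(std::move(source)); }

// The argument is const: std::move yields a const rvalue,
// which can only bind to the copy constructor.
Tracer moveOfConst(const Tracer& source) { return Tracer(std::move(source)); }
```

For a Tracer t, copyOf(t).origin is "copied", moveOf(t).origin is "moved", and moveOfConst(t).origin is "copied" again.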

Summary

Copy and move semantics are crucial to understand in order to properly interact with objects and understand their lifecycle. Failing to do so can result in nasty runtime bugs and possible memory leaks. This topic has its own section in the C++ Core Guidelines, which the author of the language, Bjarne Stroustrup, co-edits and publishes for everyone to benefit from. So please, take a look at them too.
