Page Collection for ^2012-05

2012-05-08-1 CPlusPlus11

Changes to old habits - you need to move too!

With moving available, the old habit of always passing non-native types by const reference is no longer necessary. Moving makes call by value and return by value good alternatives.

In my experience, return by value was used a lot despite the overhead of a copy. Partly because functions are easier to understand that way (returning values through out parameters is not taught as good design), and partly because the return value optimization sometimes got away without a copy anyway, and still does!

With C++11 we can safely use call by value when we want to, even for complex types. So we should write code like this from now on:

   string manipulate(string s)
   {
      <... perform some manipulation on s ...>
      return s;
   }

Since we are going to change the value of the string we might as well take it by value and change the copy. The modified value is then returned. The move ctor will kick in and see to it that no unnecessary copies are made.

2012-05-08 CPlusPlus11

Still moving on

One obvious place for moving objects instead of copying is when making a swap. It is such a common operation that there is even a library function for it, std::swap. When used with types that implement the move ctor and move assignment operator it looks like this.

First we define the move members to illustrate our example.

   ...
   Class(Class&& c) { cout << "Class&& ctor" << endl; }
   Class& operator=(Class&& c) { cout << "Class&& operator=" << endl; return *this; }
   ...

Running this code

   Class c1;
   Class c2;
   swap(c1, c2);

Produces this output:

   Class&& ctor
   Class&& operator=
   Class&& operator=

This tells us that first one of the objects is moved into a temporary object that holds the value as part of the swap. Then there are two move assignments to get the values into their new places.

If we would implement this ourselves it could look like this:

   Class temp(move(c1));
   c1 = move(c2);
   c2 = move(temp);

This introduces the new function std::move. It can look a little strange, but its job is no more and no less than to cast its argument to an rvalue reference. This enables the use of the move members, and the output is the same as in the example above.

The move feature has been implemented in the standard library containers. So code like this:

   vector<Class> v;
   v.push_back(Class());

will use the move ctor when the new Class object is pushed into the vector. This was legal code before C++11 so this is an example where old code will benefit from the new features without any changes.

2012-05-09 CPlusPlus11

The future of threading

A major concern these days is how to handle all these cores we have available everywhere. How to program them? C++11 tries to address this problem by including support for threads in the standard library.

The basic stuff for starting and joining threads, and synchronization with the help of different types of mutexes, is there. Support for low level atomic operations is there too. We might come back to those issues later. Now we will look at the high level support provided by futures and async.

The basic functionality is this. The function async takes nearly anything executable and returns a future object. When you want to have the result from the async call you call the get method on the future object.

   auto f = async(doIt);
   ...
   auto resultFromDoIt = f.get();

Very simple in its basic form. But there is more...

The async execution can be controlled in more detail by specifying a launch policy: the call can be launched either async or deferred. If it is called with the async policy it will be started asynchronously if possible; otherwise you'll get an exception.

   auto f = async(launch::async, doIt);

When launch::deferred is used it is guaranteed that the function won't run until get is called. What this is good for is another question; debugging could be one use of it.

Besides the get method there are also methods, wait_for and wait_until, to poll for a result.

I also said async can be called with almost anything that can be executed. It can take functions, member functions and lambdas, and you can pass arguments to them.

Be careful though when passing arguments so that they don't go out of scope before the asynchronous computation is finished. All these lifetime problems are of course still relevant for async. Passing arguments by value is one solution to that problem.

2012-05-11 CPlusPlus11

Unrestricted Unions

Unions are somewhat of a white area on the C++ map for me. I can't remember ever having used a union in a C++ program. Nevertheless, C++11 has introduced a way to improve unions.

What it is all about is that before C++11 unions could not have members with non-trivial constructors. So if you had a union like this

   union U
   {
      int i;
      Class c;
   };

and class Class had a non-trivial constructor, it would be illegal. In C++11 this restriction is lifted and the code is legal.

2012-05-18 Computers

Running single tests with CppUnit?

As a C++ programmer you are probably used to, and have accepted, the verbose code you have to write in order to set up a test case with CppUnit. C++ does not have the flexibility of languages that can discover test cases by naming conventions and then set up almost everything automatically. In C++ you need to do most of that yourself by coding it.

So once upon a time you set up the boilerplate code from the CppUnit Cookbook to get the framework for your unit tests installed. Your main test program could very well look like this.

#include <cppunit/extensions/TestFactoryRegistry.h>
#include <cppunit/ui/text/TestRunner.h>

int main(int argc, char **argv)
{
  CppUnit::TextUi::TestRunner runner;
  CppUnit::TestFactoryRegistry &registry = CppUnit::TestFactoryRegistry::getRegistry();
  runner.addTest(registry.makeTest());
  bool wasSuccessful = runner.run("", false);
  return wasSuccessful ? 0 : 1;
}

This is what it takes to run your test cases using the TextUi::TestRunner. At that time you had a small set of tests but, as the project continued, more tests were added. And now, after some time, if you did your homework right, your test suite is anything but small. In fact it is starting to take some time to execute. Too long. When you are developing new test cases it has become a bottleneck in the test, code and refactor loop.

So you would like to limit the number of test cases you run in order to get up to speed. Can it be done? Yes!

The solution lies in that old code you picked from the Cookbook, so long ago you hardly remember it. Maybe you, like me, lived with it long enough to assume that this is just how it must be in C++, verbose and non-dynamic language that it is. Maybe you, like me, have used comments or preprocessor constructs to hide tests. Was I wrong! Single tests can be run; it has been there all the time, right at my fingertips. If you look carefully at the Cookbook code you'll see that the first parameter of the run method is an empty string. That string parameter is actually the name of the test or test suite to run! The empty string is just the special case of running all tests.

Now equipped with this knowledge it is easy to change the code to let us specify which test to run. The most straightforward way is to use the first argument, if present, of the unit test program as the name of the test case to run. Like this:

  ...
  bool wasSuccessful = runner.run((argc > 1) ? argv[1] : "", false);
  ...

2012-05-25 Computers

Automatic vectorization

Just the other day I saw a video presenting a new feature in Visual Studio 11 called auto-vectorization. It made me curious about the feature and whether it is available in the tools I have access to, e.g. g++.

The short answer is yes: if you use optimization level 3, compilation flag -O3, you get it. The situation seems to be the same for Visual Studio 11: you get it automatically. So that is fine, and end of story if you like. If you want some more details, read on.

How does this optimization work?

The idea is to use the CPU registers introduced with technologies such as MMX and SSE. These vector registers can hold multiple scalar values and perform an operation on all of them at once. That is exactly the situation in a loop going over an array.

int a[SIZE], b[SIZE], c[SIZE];
...
for (int i=0; i<SIZE; i++)
{
   a[i] = b[i] + c[i];
}

In the code fragment above the arithmetic in the loop can be vectorized meaning that the operations for more than one index can be performed in parallel using the vector registers. It is up to the compiler to analyse the code to figure out that the optimization can be applied.

How to use it

With gcc this optimization is on by default from optimization level 3, compilation flag -O3. You can also turn it on separately with the flag -ftree-vectorize. Note however that you also need to turn on the use of the vector registers; on my x86 machine I need to add the compiler flag -msse2.

There is however another useful flag, -ftree-vectorizer-verbose. It makes the compiler tell you when it has found a loop that it is able to vectorize. This is good since compiler optimization is otherwise a black box: you either have to run the program and measure it, or analyse the assembly code, to see whether the optimization took place. A verbose message is a great help to know you are on the right track.

Will my program run faster?

This must be tested of course. So I came up with this test program.

// Vectorization

#include <chrono>
#include <iostream>

using namespace std;

const int SIZE=2048;
int a[SIZE], b[SIZE], c[SIZE];

void foo ()
{
   for (int j=0; j<1000000; j++)
   {
      for (int i=0; i<SIZE; i++)
      {
         a[i] = 0;
         b[i] = 10;
         c[i] = 100;
      }

      for (int i=0; i<SIZE; i++)
      {
         a[i] = b[i] + c[i];
      }
   }
}

int main()
{
   auto start = chrono::steady_clock::now();

   foo();

   auto diff = chrono::steady_clock::now() - start;
   auto ms = chrono::duration_cast<chrono::milliseconds>(diff);
   cout << "It took " << ms.count() << " ms" << endl;
}

In order to get a measurable execution time I had to run the whole thing one million times. The optimized version ran in 1122 milliseconds while the unoptimized program ran in 4424 milliseconds. That is a speedup of about 4 times, which makes sense: on this machine a vector register is 128 bits and an int is 32 bits, so a vector register holds four integers at once.

Summary

Vectorization is an optimization technique that can be applied automatically by a smart compiler. It is also easy to understand how it works. Programs that will benefit from it use loops to manipulate data.

The optimization uses mechanisms on the CPU within a single core. It does illustrate, though, how you can speed up execution by using parallelism, and it helps in understanding the features of C++ AMP, Accelerated Massive Parallelism, which I hope to be able to show more about in the future.

2012-05-29 Power failure

Power failure during the afternoon. Now power is back and we are online again.

Below is the report from Fortum.

Resolved outages in the last 24 hours
2012-05-29 15:23
Power outage in Skärholmen.
End time: 2012-05-29 18:18.
At most 349 customer installations affected.