Floating point performance of the Raspberry Pi

One of the things I always like to do with any new computer I get is benchmark it by running computationally intensive simulation codes and comparing the results against other machines I have. So obviously I want to do the same with my Raspberry Pi. The interesting thing here is that the kinds of codes I typically run are highly floating point intensive, and there are currently issues with floating point performance on most of the Linux images available for the Pi.


There are several different codes I like to run, in several different languages. But it makes sense to start with a C program, as C code usually gives a fairly direct indication of the underlying hardware capability. So for this post I will use the code “gibbs.c”:

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

int main(void)
{
  int N=50000;    /* number of samples to output */
  int thin=1000;  /* thinning: iterations between outputs */
  int i,j;
  gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
  double x=0;
  double y=0;
  printf("Iter x y\n");
  for (i=0;i<N;i++) {
    for (j=0;j<thin;j++) {
      /* sample each variable from its full conditional in turn */
      x=gsl_ran_gamma(r,3.0,1.0/(y*y+4));
      y=1.0/(x+1)+gsl_ran_gaussian(r,1.0/sqrt(2*x+2));
    }
    printf("%d %f %f\n",i,x,y);
  }
  gsl_rng_free(r);
  return 0;
}

described in this post over on my main blog. This code requires the GSL, which can be installed on any Debian distro (or Ubuntu) with

% sudo apt-get install gsl-bin libgsl0-dev
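For reference, the full conditionals being sampled can be read straight off the GSL calls (gsl_ran_gamma takes a shape and a scale, and gsl_ran_gaussian a standard deviation):

```latex
x \mid y \;\sim\; \mathrm{Ga}\!\left(3,\; y^2+4\right), \qquad
y \mid x \;\sim\; N\!\left(\frac{1}{x+1},\; \frac{1}{2x+2}\right),
```

with the gamma parametrised by shape and rate, and the normal by mean and variance.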


Then the code can be compiled and run with:

% gcc -O3 gibbs.c -lm -lgsl -lgslcblas
% time ./a.out > /dev/null

So, this code takes around 8.5 seconds to run on my super-powerful Intel i7 laptop. However, it’s important to note that this is a very expensive high performance “mobile workstation” bought specifically for its floating point performance. A possibly more meaningful comparator for the Pi is my Asus EeePC 1000HE Intel Atom based netbook. The code takes 56 seconds to run on that. So, how does the Pi compare? It depends…


Running on the Debian wheezy beta image, it takes 27 minutes when compiled and run as above. Adding the -mfloat-abi=softfp flag to gcc improves this somewhat, reducing the time to 20 minutes. But this is still fairly pathetic, even compared to my netbook. As discussed in my previous post on this blog, the real problem here is that the standard images don’t have proper hard-float support. My understanding is that the softfp flag causes the floating point arithmetic itself to be carried out on the hardware FPU, but floating point values are still passed between functions using the soft-float calling convention, which slows things down considerably.
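The three ABI options come down to a single gcc flag (the flag spellings are as documented for gcc’s ARM options; the -mfpu=vfp value is my assumption for the Pi’s VFP unit):

```shell
# soft: everything emulated in software (the wheezy beta default)
gcc -O3 -mfloat-abi=soft gibbs.c -lm -lgsl -lgslcblas -o gibbs-soft

# softfp: FP instructions run on the VFP unit, but values are still
# passed between functions in integer registers
gcc -O3 -mfloat-abi=softfp -mfpu=vfp gibbs.c -lm -lgsl -lgslcblas -o gibbs-softfp

# hard: full hard-float calling convention (needs an armhf system
# such as Raspbian, where this is the default)
gcc -O3 -mfloat-abi=hard -mfpu=vfp gibbs.c -lm -lgsl -lgslcblas -o gibbs-hard
```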

The Raspbian image does have proper hard-float support, so compiling and running as above on a Raspbian image reduces the run-time to 2 minutes 10 seconds (130 seconds) – almost 10 times faster than on an image without hard-float support! This isn’t quite as fast as my netbook, but at least it is now the same order of magnitude, which is what I would hope and expect.

Obviously, “slower than netbook” might sound a bit disappointing, but remember that the Pi costs around 20 pounds and uses around 2.5 Watts of power. So for 2k pounds you could buy 100 Pis and they would together consume around 250 Watts – and this is what you should be comparing against my fancy laptop. Then the comparison seems pretty good. A straight comparison isn’t quite fair, as my laptop is quad-core, but still, the bang-for-buck is at least in the same kind of ballpark, if not a bit better, which is good to know.

Just to be clear, I don’t actually think that the Pi is about to revolutionise the world of high-performance computing. Serious HPC people will know that there are many interesting developments in that area based around novel many-core architectures – my fancy laptop is not an efficient way to do serious computation, either. The point I’m really trying to make is that it is perfectly possible to use the Pi for serious computing, and I do think that a small cluster of Pis (known as a “bramble”) will be a great and cheap way to learn about parallel, distributed, and scalable computing. If you can figure out how to process gigabytes of data on a “bramble”, you have probably gone a long way towards figuring out how to process petabytes of data in The Cloud.

I’m expecting my second Pi soon (got one from RS and expecting one from Farnell), but I’ll wait until the backlog is cleared before I try and get any more…


Published by

darrenjw

I am Professor of Stochastic Modelling within the School of Mathematics & Statistics at Newcastle University, UK. I am also a computational systems biologist.

13 thoughts on “Floating point performance of the Raspberry Pi”

  1. UPDATE! The new official Raspbian image is out, which ships with gcc 4.6 by default. I didn’t mention in the post, but I was actually using gcc 4.7 on the experimental Raspbian image. Anyway, using the default gcc on the new Raspbian image, the code takes 2 minutes 45 seconds using the -O3 flag, and 2 minutes 30 seconds using the -O2 flag. That is, the new official image is a bit slower than the experimental image, but the differing gcc version could explain it. Either way, it’s still an order of magnitude faster than the wheezy beta…

      1. OK, thanks. Please, do I need a flag in compilation or some configuration to use the floating point unit to improve performance of my program? (I’m working on face detection using OpenCV and C++.)

  2. Hello, I am trying to find out why gsl/spmatrix.h is not included on Raspbian jessie with the latest updates of gsl-bin.

    I was able to get your example to work, but I am trying to deal with gibbs and sparse matrices.
