R, OpenMP, MKL, disaster

Oct. 3, 2021

It all started off with a poor economist running this R code:

N = 520000
ALPHA = 0.01
y = rbinom(N,1,ALPHA)
mod = lm(y~1)
print(coefficients(mod))

The answer should be something close to ALPHA, i.e. 0.01. What R was printing out was weird numbers like 1e-10.

This was the evening before an important presentation. The economist had written the code snippet above after tearing his hairs out over wild and constantly changing regression results in his labour market study. He was out of hope. Society was incomprehensible. There was no point looking for regularities in it. But before giving up economics for good, he wanted to do a last check of his software stack. And hence this code, which by the simple logic of least-squares should give him 0.01. It was giving him 1e-10.

Now society may or may not be incomprehensible. But computers surely could not be allowed to break the laws of math. Something had to be done.

The only thing happening in the code above had to be a call to some BLAS/LAPACK equation solver. So our economist checked his BLAS. He discovered that at some point he had set Intel’s MKL as the default BLAS on his system in order to wring the most juice out of his middling laptop. He switched it back to OpenBLAS and the code above started working, his real regression began giving results, and he could give his presentation the next day.

But over the weekend the thought kept nagging at him that Intel used MKL to show off the performance of its processors. Surely it would not have a basic bug. And if it did have one it would be all over the Internet. Surely it was he who was doing something wrong.

Turned out that he had committed the basic computing sin. He had not read the manual. He was using a version of the MKL library that made threading and architecture choices at runtime (libmkl_rt.so). And the Intel manual said that if you were using this version of the library with a multithreaded program compiled with gcc you had to set the environment variable MKL_THREADING_LAYER=GNU. The economist set the variable, and lo, his R code worked right even with MKL.

Still, there was a problem. Every R user knows, and bemoans the fact, that R is not multi-threaded. The Intel documentation said that for single-threaded programs the environment variable should not matter. And when the economist called BLAS routines from his own single-threaded programs they would work fine with MKL without any environment variables set.

So the economist stared and stared at R’s linker and compiler options. And he saw -fopenmp. OpenMP, was the culprit! At just a couple of places in the R codebase OpenMP is used and that was messing up MKL.

Finally, the economist could have his own minimal reproducible example independent of R. Call it minrep.c:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <cblas.h>

#define N 480000
#define SCALE 0.01

int main()
{
  double *x;
  int failures;
  
  x = malloc(sizeof(double)*N);
  if (x==NULL){
    fprintf(stderr,"Memory allocation failed\n");
    exit(1);
  }
  
  #pragma omp parallel for
  for (int i=0;i<N;i++)
    x[i] = 1.0;
  
  cblas_dscal(N,SCALE,x,1);

  failures = 0;
  for (int i=0;i<N;i++){
    if (fabs(x[i]-SCALE)>1e-13)
      failures++;
  }

  printf("%d failures\n",failures);

  return !(failures==0);
}

Compile this with

gcc -o minrep -O2 -g -fopenmp minrep.c -lblas -lm

and with MKL as your BLAS you will get errors or not depending on whether MKL_THREADING_LAYER=GNU is set. Remove the OpenMP pragma and the errors go away even if you compile with -fopenmp.

The economist wanted to dig into why Intel and GNU’s OpenMP runtimes cannot happily coexit, but he has to go back to being an economist tomorrow, so the investigation has to be closed here.

As a good citizen he wanted to file a documentation bug to have this behaviour documented. But R’s bug tracker seems not to be open to the public. So the story has to be recorded here for Google to find.