Is there an efficient way to get "pmax" other than using the base R function?


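I want to use Rcpp to write a function that can outperform the pmax function from base R. I also try to handle missing values inside the Rcpp function, which may not be a good idea. All of the vectors are bound to contain some missing values, and all non-missing values are positive. That is why I recode missing values as -1, so that I can turn them back into NA in case every value at a position is missing and no maximum exists.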

This is my first attempt, which has not been successful yet:

library("benchr")
library("Rcpp")

Pmax <- function(...) {
  argd_list <- list(...)
  cppFunction("
  NumericVector cpp_pmax(List args) {
    List args0 = args[0];
    int n_arg = args.length();
    int n_vec = args0.length();
    NumericVector out(n_vec);
    out = args[0];
    for (int i = 1; i < n_arg; ++i) {
        NumericVector pa(n_vec);
        pa = args[i];
        for (int j = 0; j < n_vec; ++j) {
          if (R_IsNA(out[j])) {
            out[j] = -1;
          }
          if (R_IsNA(pa[j])) {
            pa[j] = -1;
          }
          out[j] = std::max(out[j],pa[j]);
        }
    }
    for (int j = 0; j < n_vec; ++j) {
      if (out[j] == -1) {
        out[j] = NA_REAL;
      }
    }
    return out;
  }
")
  output <- cpp_pmax(argd_list)
  return(output)
}


n <- 200000
x1 <- sample(0:1,n,replace = TRUE)
y1 <- sample(0:1,n,replace = TRUE)
z1 <- sample(0:1,n,replace = TRUE)
x1[sample(1:n,90)]<-NA
y1[sample(1:n,60)]<-NA
z1[sample(1:n,70)]<-NA

pm1 <- Pmax(x1,y1,z1)
pm2 <- pmax(x1,y1,z1,na.rm = TRUE)

all(pm1 == pm2)

benchr::benchmark(pmax(x1,y1,z1,na.rm = TRUE),Pmax(x1,y1,z1))

Benchmark summary:
  Time units : milliseconds 
expr                         n.eval   min lw.qu median  mean up.qu   max total relative
pmax(x1,y1,z1,na.rm = TRUE)     100  1.34  1.37   1.39  1.44  1.46  1.74   144     1.00
Pmax(x1,y1,z1)                  100 13.30 13.50  13.80 19.90 15.70 67.50  1990     9.88

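Edit:

I removed some of the loops and replaced -1 with NA outside of Rcpp. That sped things up a little, but it still does not beat base R pmax.

Although Rcpp::pmax is a nice implementation, it only handles two vectors, and I am not sure whether it can handle missing values. I got different results when values were missing.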
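One possible reason results can differ once missing values show up: for doubles, R's NA is stored as a NaN, and every ordered comparison with NaN is false, so a naive maximum depends on the order of its operands. The following is a minimal standalone C++ sketch of that effect and of two explicit policies; it is not how Rcpp::pmax or base pmax is implemented. The names max_na_rm and max_na_propagate are made up for illustration, and the sketch treats any NaN as missing, which is a simplification (R distinguishes NA from NaN).

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper: behaves like na.rm = TRUE for a single pair.
// A missing operand (NaN) is skipped; the result is NaN only if both are NaN.
double max_na_rm(double a, double b) {
    if (std::isnan(a)) return b;
    if (std::isnan(b)) return a;
    return std::max(a, b);
}

// Hypothetical helper: behaves like na.rm = FALSE for a single pair.
// Any missing operand makes the result missing.
double max_na_propagate(double a, double b) {
    if (std::isnan(a) || std::isnan(b)) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return std::max(a, b);
}

// For comparison, std::max(a, b) evaluates (a < b) ? b : a, so
// std::max(NaN, 1.0) yields NaN while std::max(1.0, NaN) yields 1.0.
```

Whichever policy is wanted, spelling it out avoids depending on which argument happens to come first.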

The second attempt is:

Pmax1 <- function(...) {
  args_list <- list(...)
  cppFunction("
  NumericVector cpp_pmax(List args) {
    List args0 = args[0];
    int n_arg = args.length();
    int n_vec = args0.length();
    NumericVector out(n_vec);
    out = args[0];
    for (int i = 1; i < n_arg; ++i) {
        NumericVector pa(n_vec);
        pa = args[i];
        for (int j = 0; j < n_vec; ++j) {
          if (R_IsNA(out[j])) {
            out[j] = -1;
          }
          if (R_IsNA(pa[j])) {
            pa[j] = -1;
          }
          out[j] = std::max(out[j],pa[j]);
        }
    }
    return out;
  }
")
  output <- cpp_pmax(args_list)
  output[output == -1] <- NA
  return(output)
}

Pmax2 <- function(...) {
  args_list <- list(...)
  cppFunction("
  NumericVector cpp_pmax(List args) {
    NumericVector out = args[0];
    int n_arg = args.length();
    int n_vec = out.length();
    for (int j = 0; j < n_vec; ++j) {
      if (NumericVector::is_na(out[j])) out[j] = -1;
    }
    for (int i = 1; i < n_arg; ++i) {
      NumericVector pa = args[i];
      for (int j = 0; j < n_vec; ++j) {
        if (NumericVector::is_na(pa[j])) pa[j] = -1;
        out[j] = std::max(out[j],pa[j]);
      }
    }
    return out;
  }
")
  output <- cpp_pmax(args_list)
  output[output == -1] <- NA
  return(output)
}

n <- 200000
x <- sample(0:5,n,replace = TRUE)
y <- sample(0:5,n,replace = TRUE)
z <- sample(0:5,n,replace = TRUE)
w <- sample(0:5,n,replace = TRUE)
x[sample(1:n,900)]<-NA
y[sample(1:n,600)]<-NA
z[sample(1:n,700)]<-NA
w[sample(1:n,800)]<-NA

benchr::benchmark(pmax(x,y,z,w,na.rm = TRUE),Pmax1(x,y,z,w),Pmax2(x,y,z,w))

Benchmark summary:
  Time units : milliseconds 
                      expr n.eval   min lw.qu median  mean up.qu  max total relative
pmax(x,y,z,w,na.rm = TRUE)    100  2.38  2.43   2.46  2.46  2.48  2.6   246     1.00
Pmax1(x,y,z,w)                100 16.00 16.90  17.20 19.40 17.70 61.2  1940     6.98
Pmax2(x,y,z,w)                100  9.44  9.74   9.90 11.30 10.10 45.6  1130     4.02

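Does anyone have an idea how to make it faster than base R pmax?

The idea is to have a general function that handles a varying number of vectors, all inside the Rcpp function.

Update based on the answers from @DirkEddelbuettel and @Cole

Thank you for helping optimize the code. Inspired by the answers from @DirkEddelbuettel and @Cole, I simply added Rcpp::pmax to remove one of the loops, which also helped speed things up.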

library("benchr")
library("Rcpp")

cppFunction("
  IntegerVector cpp_pmax1(List args) {
    IntegerVector tmp = args[0];
    IntegerVector out = clone(tmp);
    int n_arg = args.length();
    int n_vec = out.length();
    for (int i = 1; i < n_arg; ++i) {
      IntegerVector pa = args[i];
      for (int j = 0; j < n_vec; ++j) {
        if (pa[j] > out[j]) out[j] = pa[j];
      }
    }
    return out;
  }
")

cppFunction("
  IntegerVector cpp_pmax2(List args) {
    IntegerVector tmp = args[0];
    IntegerVector out = clone(tmp);
    int n_arg = args.length();
    int n_vec = out.length();
    for (int i = 1; i < n_arg; ++i) {
      IntegerVector pa = args[i];
      out = pmax(out,pa);
    }
    return out;
  }
")

Pmax1 <- function(...) {
  cpp_pmax1(list(...))
}


Pmax2 <- function(...) {
  cpp_pmax2(list(...))
}


n <- 200000
x <- sample(0:5,n,replace = TRUE)
y <- sample(0:5,n,replace = TRUE)
z <- sample(0:5,n,replace = TRUE)
w <- sample(0:5,n,replace = TRUE)
k <- sample(0:5,n,replace = TRUE)
x[sample(1:n,900)] <- NA
y[sample(1:n,600)] <- NA
z[sample(1:n,700)] <- NA
w[sample(1:n,800)] <- NA
k[sample(1:n,800)] <- NA

pm0 <- pmax(x,y,z,w,k,na.rm = TRUE)
pm1 <- Pmax1(x,y,z,w,k)
pm2 <- Pmax2(x,y,z,w,k)

benchr::benchmark(pmax(x,y,z,w,k,na.rm = TRUE),Pmax1(x,y,z,w,k),Pmax2(x,y,z,w,k))


Benchmark summary:
  Time units : microseconds 
                        expr n.eval  min lw.qu median mean up.qu  max  total relative
pmax(x,y,z,w,k,na.rm = TRUE)    100 2880  2900   2920 3050  3080 8870 305000     5.10
Pmax1(x,y,z,w,k)                100 2150  2180   2200 2310  2350 8060 231000     3.85
Pmax2(x,y,z,w,k)                100  527   558    572  812   719 7870  81200     1.00

Thanks!

Solution

As an aside, note that Rcpp sugar already provides Rcpp::pmax():

> library(Rcpp)
> cppFunction("NumericVector pm(NumericVector x,NumericVector y) { 
+              return pmax(x,y);}")
> pm(10.0*(1:10),rep(50,10))
 [1]  50  50  50  50  50  60  70  80  90 100
> pm(10.0*(1:10),c(rep(50,8),NA,50))
 [1]  50  50  50  50  50  60  70  80  NA 100
>

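There is quite possibly room for additional, more general functions, but hopefully this can also serve as a baseline for you.

Edit: In my first version I accidentally called Rcpp::pmax() when I meant to call pmax() (via pm()). The results are the same.

pm() and pmax() run at about the same speed, as one would expect, since both are vectorized: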

> library(microbenchmark)
> set.seed(123)
> x <- cumsum(rnorm(1e6))
> y <- cumsum(rnorm(1e6))
> microbenchmark(pmax(x,y),pm(x,y))
Unit: milliseconds
       expr     min      lq    mean  median      uq      max neval cld
 pmax(x,y) 3.94342 4.07488 4.66378 4.15433 5.39961  7.81931   100   a
   pm(x,y) 3.58781 3.68886 4.74249 3.75815 5.38444 22.31268   100   a
>

I guess you could try fcoalesce + fifelse (both from the data.table package) to define your Pmax function, as follows:

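library(data.table)  # provides fcoalesce() and fifelse()

Pmax <- function(...,na.rm = FALSE) {
  u <- list(...)
  if (na.rm) {
    return(
      Reduce(function(x,y) {
        # fill each side's NAs from the other, so NA survives only where both are NA
        x <- fcoalesce(x,y)
        y <- fcoalesce(y,x)
        fifelse(x <= y,y,x)
      },u)
    )
  }
  # without na.rm, an NA on either side makes the comparison NA, so NA propagates
  Reduce(function(x,y) fifelse(x <= y,y,x),u)
}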

Benchmark (tested with the data from the OP's post)

With na.rm = TRUE, Pmax is slower than base R pmax (about 2x at the median):

> microbenchmark::microbenchmark(
+   pmax(x1,y1,z1,na.rm = TRUE),
+   Pmax(x1,y1,z1,na.rm = TRUE),
+   check = "equivalent",
+   unit = "relati ..." ... [TRUNCATED]
Unit: relative
                        expr      min      lq     mean   median       uq
 pmax(x1,y1,z1,na.rm = TRUE) 1.000000 1.00000 1.000000 1.000000 1.000000
 Pmax(x1,y1,z1,na.rm = TRUE) 1.428545 1.87539 1.974959 2.022579 2.094833
      max neval
 1.000000   100
 1.387139   100

With the default na.rm option, you will see that Pmax is slightly faster than base R pmax:

> microbenchmark::microbenchmark(
+   pmax(x1,y1,z1),
+   Pmax(x1,y1,z1),
+   unit = "relative"
+ )
Unit: relative
           expr      min      lq     mean   median       uq      max neval
 pmax(x1,y1,z1) 1.387953 1.32482 1.053983 1.220124 1.143867 0.266205   100
 Pmax(x1,y1,z1) 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000   100

There seem to be some issues with memory allocation, as bench::mark shows.

bench::mark(pmax(x,y,z,w,na.rm = TRUE),Pmax2(x,y,z,w))

## # A tibble: 2 x 13
##   expression                     min  median `itr/sec` mem_alloc
##   <bch:expr>                 <bch:t> <bch:t>     <dbl> <bch:byt>
## 1 pmax(x,y,z,w,na.rm = TRUE)  5.79ms  6.28ms     157.    781.3KB
## 2 Pmax2(x,y,z,w)             39.56ms 54.48ms      19.7    9.18MB

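Memory coercion

Memory allocation is more than 10 times that of base pmax(). Your Rcpp code is relatively straightforward, so this hints at some kind of coercion. Looking at your sample data, you are passing integer vectors to a numeric signature, which causes costly coercion. Let's update the signature and the code to expect IntegerVector. To do that, I simply changed everything from NumericVector to IntegerVector.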

  expression                     min  median `itr/sec` mem_alloc
  <bch:expr>                 <bch:t> <bch:t>     <dbl> <bch:byt>
1 pmax(x,y,z,w,na.rm = TRUE)  1.89ms  2.33ms     438.    781.3KB
2 Pmax2_int(x,y,z,w)         37.42ms 49.88ms      17.6    2.32MB
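To make the coercion cost concrete, here is a rough standalone C++ sketch (plain std::vector, not Rcpp). An R integer vector stores 4-byte ints and a numeric vector stores 8-byte doubles, so an integer input cannot be reused as a numeric one: a new buffer of twice the size has to be allocated and filled for each argument. The names payload_bytes and widen are hypothetical, and the byte counts ignore R's small per-vector header.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: payload size in bytes for n elements of type T.
template <typename T>
constexpr std::size_t payload_bytes(std::size_t n) {
    return n * sizeof(T);
}

// Hypothetical helper: widening int -> double cannot share storage,
// so it allocates a fresh buffer and converts every element.
std::vector<double> widen(const std::vector<std::int32_t>& x) {
    return std::vector<double>(x.begin(), x.end());
}
```

For n = 200000, the integer payload is 800,000 bytes (about 781KB, in line with the mem_alloc reported for base pmax()), while each coerced double copy needs 1,600,000 bytes.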

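Recompilation

The OP's code places cppFunction inside the larger function. Unless we really need to recompile on every call, we can compile once and then call the compiled code from R. For this dataset size, this is the biggest performance gain.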

cppFunction("
  IntegerVector cpp_pmax_pre(List args) {
    IntegerVector out = args[0];
    int n_arg = args.length();
    int n_vec = out.length();
    for (int j = 0; j < n_vec; ++j) {
      if (IntegerVector::is_na(out[j])) out[j] = -1;
    }
    for (int i = 1; i < n_arg; ++i) {
      IntegerVector pa = args[i];
      for (int j = 0; j < n_vec; ++j) {
        if (IntegerVector::is_na(pa[j])) pa[j] = -1;
        out[j] = std::max(out[j],pa[j]);
      }
    }
    return out;
  }
")

Pmax2_int_pre <- function(...) {
  args_list <- list(...)
  output <- cpp_pmax_pre(args_list)
  output[output == -1] <- NA
  return(output)
}

bench::mark(pmax(x,y,z,w,na.rm = TRUE),Pmax2_int_pre(x,y,z,w))

## # A tibble: 2 x 13
##   expression                    min median `itr/sec` mem_alloc
##   <bch:expr>                 <bch:> <bch:>     <dbl> <bch:byt>
## 1 pmax(x,y,z,w,na.rm = TRUE) 2.31ms 2.42ms      397.   781.3KB
## 2 Pmax2_int_pre(x,y,z,w)     2.48ms 3.55ms      270.    2.29MB

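Bonus - NA_INTEGER

According to Rcpp for Everyone, NA_INTEGER should be equal to the minimum integer value, -2147483648. Using this, we can drop the NA replacement step, because when working with int data we can compare against NA directly: NA is smaller than every other value, so it never wins the comparison.

While implementing this, I also found an issue in the previous section: we need to clone the initial argument so that we do not accidentally modify it by reference. Even so, we are still slightly faster than base pmax().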

cppFunction("
  IntegerVector cpp_pmax_last(List args) {
    IntegerVector tmp = args[0];
    IntegerVector out = clone(tmp);
    int n_arg = args.length();
    int n_vec = out.length();
    for (int i = 1; i < n_arg; ++i) {
      IntegerVector pa = args[i];
      for (int j = 0; j < n_vec; ++j) {
        if (pa[j] > out[j]) out[j] = pa[j];
      }
    }
    return out;
  }
")

Pmax2_last <- function(...) {
  cpp_pmax_last(list(...))
}

bench::mark(pmax(x,y,z,w,na.rm = TRUE),Pmax2_last(x,y,z,w))

## # A tibble: 2 x 13
##   expression                    min median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr>                 <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
## 1 pmax(x,y,z,w,na.rm = TRUE) 5.98ms 6.36ms      154.     781KB        0
## 2 Pmax2_last(x,y,z,w)        5.09ms 5.46ms      177.     784KB        0
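The NA_INTEGER idea can also be checked outside of R. Below is a minimal standalone C++ sketch under two assumptions: the constant kNaInt stands in for R's integer NA (the minimum 32-bit integer), and all input vectors have the same length (no recycling). Both kNaInt and pmax_na_rm are made-up names. Because the sentinel is smaller than any other value, a plain > comparison already gives na.rm = TRUE behaviour, and copying the first argument before writing plays the role of clone().

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Assumption: stands in for R's integer NA, stored as the minimum 32-bit int.
constexpr std::int32_t kNaInt = std::numeric_limits<std::int32_t>::min();

// Hypothetical sketch of a parallel maximum that skips missing values.
// Inputs are taken by const reference and the first one is copied,
// so the caller's data is never modified.
std::vector<std::int32_t> pmax_na_rm(
        const std::vector<std::vector<std::int32_t>>& args) {
    if (args.empty()) return {};
    std::vector<std::int32_t> out = args[0];  // copy, analogous to clone()
    for (std::size_t i = 1; i < args.size(); ++i) {
        const std::vector<std::int32_t>& pa = args[i];
        for (std::size_t j = 0; j < out.size() && j < pa.size(); ++j) {
            // kNaInt can never be greater than out[j], so it never replaces
            // a real value; it survives only where every input is missing.
            if (pa[j] > out[j]) out[j] = pa[j];
        }
    }
    return out;
}
```

A position stays missing only when it is missing in every input, which matches what the -1 recoding in the earlier versions was meant to achieve, without the extra passes over the data.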
