Bug: 372344643

clpeak

A synthetic benchmarking tool that measures the peak capabilities of OpenCL devices. It reports only the peak metrics achievable with vector operations and does not represent a real-world use case.

Building

git submodule update --init --recursive --remote
mkdir build
cd build
cmake ..
cmake --build .

Sample output

Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Driver version  : 390.77 (Linux x64)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 767.48
      float2  : 810.81
      float4  : 843.06
      float8  : 726.12
      float16 : 735.98

    Single-precision compute (GFLOPS)
      float   : 15680.96
      float2  : 15674.50
      float4  : 15645.58
      float8  : 15583.27
      float16 : 15466.50

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7859.49
      double2  : 7849.96
      double4  : 7832.96
      double8  : 7799.82
      double16 : 7740.88

    Integer compute (GIOPS)
      int   : 15653.47
      int2  : 15654.40
      int4  : 15655.21
      int8  : 15659.04
      int16 : 15608.65

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 10.64
      enqueueReadBuffer          : 11.92
      enqueueMapBuffer(for read) : 9.97
        memcpy from mapped ptr   : 8.62
      enqueueUnmap(after write)  : 11.04
        memcpy to mapped ptr     : 9.16

    Kernel launch latency : 7.22 us
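
The report above is plain text, so comparing runs across devices or driver versions is easier once it is loaded into a data structure. The following is a minimal sketch, not part of clpeak, that assumes the layout shown in the sample: a section header ending in a parenthesized unit, followed by `name : value` lines. The function name `parse_clpeak` and the `SAMPLE` excerpt are hypothetical and exist only for illustration. Lines whose value is not a bare number, such as the driver version or the kernel launch latency with its `us` suffix, are skipped.

```python
import re

# A section header has no colon and ends in a parenthesized unit,
# e.g. "Global memory bandwidth (GBPS)".
HEADER = re.compile(r"([^:]+?)\s*\(([A-Za-z]+)\)")
# A metric line is "name : number", e.g. "float4  : 843.06".
METRIC = re.compile(r"(.+?)\s*:\s*(\d+(?:\.\d+)?)")


def parse_clpeak(text):
    """Return {section header: {metric name: value}} for a clpeak-style report."""
    results = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if HEADER.fullmatch(line):
            section = line
            results[section] = {}
            continue
        metric = METRIC.fullmatch(line)
        # Numeric lines before the first header (e.g. compute units) are ignored.
        if metric and section is not None:
            results[section][metric.group(1)] = float(metric.group(2))
    return results


# Excerpt of the sample report, used only to exercise the parser.
SAMPLE = """\
Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Driver version  : 390.77 (Linux x64)
    Compute units   : 80

    Global memory bandwidth (GBPS)
      float   : 767.48
      float4  : 843.06

    Transfer bandwidth (GBPS)
      enqueueMapBuffer(for read) : 9.97
        memcpy from mapped ptr   : 8.62
"""

report = parse_clpeak(SAMPLE)
print(report["Global memory bandwidth (GBPS)"]["float4"])
```

Keying the result by the full section header keeps the unit attached to every value, which avoids mixing GBPS and GFLOPS figures when several reports are merged.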