scs: ./out/demo_socp_gpu fails to solve its problem

Specifications

  • OS: Arch Linux
  • SCS Version: master at 5be0e1684d12c4cfd4d22c5fba236a84a092ab5b
  • Compiler: gcc

Description

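SCS fails to solve the problem generated by ./out/demo_socp_gpu 1000 0.5 0.5 1. It runs all 100000 iterations (max_iters) and returns an objective of -6159.03, while the true optimum is 2022.07.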

How to reproduce

Build, linking against Julia's OpenBLAS:

JULIA_HOME="/opt/julias/julia-1.6"
JULIA_LD_PATH="$JULIA_HOME/lib/julia"
BLASLDFLAGS="-L$JULIA_LD_PATH -lopenblas64_"
SCSFLAGS="USE_OPENMP=1 BLAS64=1 BLASSUFFIX=_64_"
make -j4 CFLAGS="-march=native" DLONG=0 ${SCSFLAGS} BLASLDFLAGS="${BLASLDFLAGS}" gpu
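Some context on the BLAS64=1 BLASSUFFIX=_64_ flags: Julia's bundled libopenblas64_ is built with 64-bit integers (ILP64) and exports its routines with a 64_ suffix, for example dgemm_64_ instead of the usual dgemm_. The sketch below shows how a build-time suffix can be pasted onto BLAS routine names and how a 64-bit BLAS integer type is selected. It uses hypothetical macro and type names (SKETCH_BLAS, SKETCH_BLASSUFFIX, sketch_blas_int), not SCS's own definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for build flags; a real build would pass these on
 * the compiler command line instead of hard-coding them. */
#define SKETCH_BLAS64 1
#define SKETCH_BLASSUFFIX _64_

/* Two-level macros so arguments are macro-expanded before pasting/stringizing. */
#define SKETCH_PASTE_(pre, name, post) pre##name##post
#define SKETCH_PASTE(pre, name, post) SKETCH_PASTE_(pre, name, post)
#define SKETCH_STR_(x) #x
#define SKETCH_STR(x) SKETCH_STR_(x)

/* Double-precision BLAS routine name with the configured suffix:
 * SKETCH_BLAS(gemm) expands to dgemm_64_ with the settings above. */
#define SKETCH_BLAS(name) SKETCH_PASTE(d, name, SKETCH_BLASSUFFIX)

/* ILP64 OpenBLAS takes 64-bit integers for dimensions and strides. */
#if SKETCH_BLAS64
typedef int64_t sketch_blas_int;
#else
typedef int sketch_blas_int;
#endif

static_assert(!SKETCH_BLAS64 || sizeof(sketch_blas_int) == 8,
              "an ILP64 BLAS needs a 64-bit integer type");

/* Symbol name the linker would have to resolve for gemm. */
static const char *sketch_gemm_symbol(void) {
  return SKETCH_STR(SKETCH_BLAS(gemm));
}
```

With these settings, strcmp(sketch_gemm_symbol(), "dgemm_64_") == 0. A suffix mismatch typically shows up as an undefined-symbol link error. A missing BLAS64 can link cleanly but pass 32-bit integers to a library expecting 64-bit ones.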

Then run it with:

LD_LIBRARY_PATH=$JULIA_LD_PATH:$LD_LIBRARY_PATH ./out/demo_socp_gpu 1000 0.5 0.5 1

Additional information

The direct and indirect CPU solvers, compiled the same way, work fine.

Output

seed : 1

A is 4000 by 1000, with 32 nonzeros per column.
A has 32000 nonzeros (0.800000% dense).
Nonzeros of A take 0.000238 GB of storage.
Row idxs of A take 0.000119 GB of storage.
Col ptrs of A take 0.000004 GB of storage.

ScsCone information:
Zero cone rows: 2000
LP cone rows: 2000
Number of second-order cones: 0, covering 0 rows, with sizes
[]
Number of rows covered is 4000 out of 4000.

true pri opt = 2022.070521
true dua opt = 2022.070521
------------------------------------------------------------------
               SCS v3.0.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 1000, constraints m: 4000
cones:    z: primal zero / dual free vars: 2000
          l: linear vars: 2000
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, warm_start: 0
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 32000, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 6.90e+00  9.46e+01  3.33e+04 -1.66e+04  1.00e-01  1.03e-03 
   250| 1.76e+04  4.31e+01  1.23e+04 -6.15e+03  1.00e-01  1.65e-01 
   500| 2.74e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.29e-01 
   750| 1.57e+04  4.26e+01  1.23e+04 -6.16e+03  1.00e-01  4.94e-01 
  1000| 1.64e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  6.85e-01 
  1250| 4.30e+21  2.67e+22  6.54e+22 -3.27e+22  1.00e-01  8.48e-01 
  1500| 1.90e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  9.48e-01 
  1750| 2.14e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.04e+00 
  2000| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.13e+00 
  2250| 6.45e+20  2.19e+22  4.21e+22  2.11e+22  1.00e-01  1.22e+00 
  2500| 2.07e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.30e+00 
  2750| 2.53e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.39e+00 
  3000| 2.02e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.48e+00 
  3250| 5.72e+20  3.01e+22  3.73e+22  1.87e+22  1.00e-01  1.57e+00 
  3500| 2.09e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.66e+00 
  3750| 2.43e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.75e+00 
  4000| 2.31e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.84e+00 
 [ ... ]
 99500| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.65e+01 
 99750| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.67e+01 
100000| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.68e+01 
------------------------------------------------------------------
status:  solved (inaccurate - reached max_iters)
timings: total: 3.68e+01s = setup: 5.47e-02s + solve: 3.68e+01s
         lin-sys: 3.16e+01s, cones: 7.88e-01s, accel: 4.77e-01s
------------------------------------------------------------------
objective = -6159.028853 (inaccurate)
------------------------------------------------------------------
true pri opt = 2022.070521
true dua opt = 2022.070521
scs pri obj= 0.000000
scs dua obj = -12318.057707
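The final numbers in this log are consistent with each other. The reported objective is the mean of scs pri obj and scs dua obj: (0 + (-12318.057707)) / 2 ≈ -6159.028853. The final gap column (1.23e+04) matches their absolute difference. The same relations hold in the successful run further down the thread. So the run ends far from the true optimum of 2022.070521 and is not merely short of tolerance. The sketch below encodes that check, assuming this reading of the log. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Summary values copied from the end of a solver log. */
typedef struct {
  double pobj; /* "scs pri obj" */
  double dobj; /* "scs dua obj" */
} SolveSummary;

/* Absolute value without pulling in libm. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Objective as printed on the "objective = ..." line: mean of the two. */
static double reported_objective(SolveSummary s) {
  return 0.5 * (s.pobj + s.dobj);
}

/* Duality gap as shown in the "gap" column: absolute difference. */
static double duality_gap(SolveSummary s) { return abs_d(s.pobj - s.dobj); }

/* Relative error of the reported objective against a known optimum. */
static double relative_error(SolveSummary s, double true_opt) {
  assert(true_opt != 0.0);
  return abs_d(reported_objective(s) - true_opt) / abs_d(true_opt);
}
```

For the GPU run above, the relative error against 2022.070521 is about 4. For the maintainer's run ({2022.070419, 2022.073782}), it is below 1e-6.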

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 30 (27 by maintainers)

Most upvoted comments

Try the following patch. I got all the tests to pass with this fix.

--- a/linsys/gpu/gpu.c
+++ b/linsys/gpu/gpu.c
@@ -19,13 +19,13 @@ void SCS(accum_by_atrans_gpu)(const ScsGpuMatrix *Ag,
     if (*buffer != SCS_NULL) {
       cudaFree(*buffer);
     }
-    cudaMalloc(buffer, *buffer_size);
+    cudaMalloc(buffer, new_buffer_size);
     *buffer_size = new_buffer_size;
   }

   CUSPARSE_GEN(SpMV)
   (cusparse_handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &onef, Ag->descr, x,
-   &onef, y, SCS_CUDA_FLOAT, SCS_CSRMV_ALG, buffer);
+   &onef, y, SCS_CUDA_FLOAT, SCS_CSRMV_ALG, *buffer);
 }

 /* this is slow, use trans routine if possible */
@@ -48,13 +48,13 @@ void SCS(accum_by_a_gpu)(const ScsGpuMatrix *Ag, const cusparseDnVecDescr_t x,
     if (*buffer != SCS_NULL) {
       cudaFree(*buffer);
     }
-    cudaMalloc(buffer, *buffer_size);
+    cudaMalloc(buffer, new_buffer_size);
     *buffer_size = new_buffer_size;
   }

   CUSPARSE_GEN(SpMV)
   (cusparse_handle, CUSPARSE_OPERATION_TRANSPOSE, &onef, Ag->descr, x, &onef, y,
-   SCS_CUDA_FLOAT, SCS_CSRMV_ALG, buffer);
+   SCS_CUDA_FLOAT, SCS_CSRMV_ALG, *buffer);
 }

 /* This assumes that P has been made full (ie not triangular) and uses the
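The patch fixes two bugs in the same grow-on-demand scratch-buffer pattern. First, the buffer was reallocated with the old *buffer_size instead of new_buffer_size, so the allocation could be smaller than what cuSPARSE reported it needs. Second, the SpMV call received buffer (the address of the pointer, a void **) instead of *buffer. cusparseSpMV's external-buffer parameter is a void *, and any object pointer converts to void * implicitly, so this second mistake compiles without a warning. Either bug can make the library touch memory outside a valid workspace.

The host-only sketch below shows the corrected pattern. malloc/free stand in for cudaMalloc/cudaFree, and the names ensure_scratch and fake_spmv are hypothetical. It illustrates the pattern and is not the code in linsys/gpu/gpu.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Grow-only scratch buffer. Host malloc/free stand in for cudaMalloc/cudaFree.
 * Returns 0 on success, -1 if the allocation fails. */
static int ensure_scratch(void **buffer, size_t *buffer_size,
                          size_t new_buffer_size) {
  assert(buffer != NULL && buffer_size != NULL);
  if (new_buffer_size > *buffer_size) {
    if (*buffer != NULL) {
      free(*buffer);
    }
    /* Allocate the NEW size; allocating *buffer_size (the old size) was the bug. */
    *buffer = malloc(new_buffer_size);
    if (*buffer == NULL) {
      *buffer_size = 0;
      return -1;
    }
    *buffer_size = new_buffer_size;
  }
  return 0;
}

/* Hypothetical consumer standing in for the externalBuffer argument of
 * cusparseSpMV: it expects the workspace itself (void *) and fills all of it. */
static void fake_spmv(void *external_buffer, size_t workspace_bytes) {
  memset(external_buffer, 0, workspace_bytes);
}
```

A caller passes buf, not &buf, to the consumer: after ensure_scratch(&buf, &cap, needed) succeeds, call fake_spmv(buf, needed). Passing &buf would also compile, and that is exactly the second bug in the original code.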

I presume this issue can be closed after #251 is merged.

Hmmm, actually this is likely something specific to the GPU solver. There is an issue in it, which I have run into before, that only trips on some GPUs. It's probably related to type sizes, but I have not been able to pin it down. I would recommend shelving the GPU solver for now; the MKL one is typically faster anyway.

Looks like the tests are passing except for hs21, which is probably just because the numerics are slightly different on the GPU and it’s producing a bad flag.
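One common reason GPU and CPU results differ slightly is that floating-point addition is not associative. Parallel reductions on a GPU add terms in a different order than a sequential CPU loop. This is a general observation and was not diagnosed for hs21 in this thread. The sketch below compares a left-to-right sum with a pairwise (tree-shaped) sum of the same values. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential left-to-right sum, as a simple CPU loop computes it. */
static double sum_sequential(const double *x, size_t n) {
  assert(x != NULL || n == 0);
  double s = 0.0;
  for (size_t i = 0; i < n; ++i) {
    s += x[i];
  }
  return s;
}

/* Pairwise (tree) sum: split in half and add the partial sums, similar in
 * shape to a parallel reduction. */
static double sum_pairwise(const double *x, size_t n) {
  assert(x != NULL || n == 0);
  if (n == 0) {
    return 0.0;
  }
  if (n == 1) {
    return x[0];
  }
  size_t half = n / 2;
  return sum_pairwise(x, half) + sum_pairwise(x + half, n - half);
}
```

Take x = {1.0, 1e17, -1e17, 1.0}, whose exact sum is 2. On a typical IEEE 754 target that evaluates in double precision (no -ffast-math), the sequential sum gives 1.0 and the pairwise sum gives 0.0. Adding 1.0 to ±1e17 is lost to rounding, since the spacing between doubles near 1e17 is 16. Differences like this are far smaller in well-conditioned problems, but they can move a borderline result across a test's tolerance.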

Thanks for posting. I am unable to reproduce this; when I run the command I get:

2021-10-16 14:47:37 (base) 0 bodonoghue@bodonoghue-[]-~/git/scs:
└──[ins] => out/demo_socp_gpu_indirect 1000 0.5 0.5 1
seed : 1

A is 4000 by 1000, with 32 nonzeros per column.
A has 32000 nonzeros (0.800000% dense).
Nonzeros of A take 0.000238 GB of storage.
Row idxs of A take 0.000119 GB of storage.
Col ptrs of A take 0.000004 GB of storage.

ScsCone information:
Zero cone rows: 2000
LP cone rows: 2000
Number of second-order cones: 0, covering 0 rows, with sizes
[]
Number of rows covered is 4000 out of 4000.

true pri opt = 2022.070521
true dua opt = 2022.070521
------------------------------------------------------------------
               SCS v3.0.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 1000, constraints m: 4000
cones:    z: primal zero / dual free vars: 2000
          l: linear vars: 2000
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, warm_start: 0
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 32000, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 6.90e+00  7.44e+00  2.65e+02  3.90e+03  1.00e-01  2.11e-02
    25| 3.80e-06  3.17e-04  3.36e-03  2.02e+03  1.00e-01  1.08e-01
------------------------------------------------------------------
status:  solved
timings: total: 6.66e-01s = setup: 5.58e-01s + solve: 1.08e-01s
         lin-sys: 8.57e-02s, cones: 2.84e-04s, accel: 6.22e-05s
------------------------------------------------------------------
objective = 2022.072100
------------------------------------------------------------------
true pri opt = 2022.070521
true dua opt = 2022.070521
scs pri obj= 2022.070419
scs dua obj = 2022.073782

You may be missing the GPU fixes I submitted here: https://github.com/cvxgrp/scs/commit/13e675d8c1f17e8f1e184281b25b8196c4ac74da.

I did not cut a new release / tag with those fixes. Is that the issue?

By the way, a better way to test the GPU build is:

make purge
make test_gpu
out/run_tests_gpu_indirect