I still haven't lost hope that some of you actually want to do something
useful with GPU programming, fool that I am. Pour encourager les autres,
I have attached OpenCL code and a C++ driver which implement the BFGS
update to the inverse of the (approximate) Hessian used by quasi-Newton
function minimizers.
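
For anyone who wants the algebra without opening the attachment: with
s = x_new - x_old and y = grad_new - grad_old, the standard BFGS update of
the inverse approximation K is usually written as

  K_new = K + (1 + y'Ky / s'y) (s s') / (s'y) - (K y s' + s y'K) / (s'y)

What follows is a minimal single-threaded C++ sketch of that formula, not
the attached code. The function name bfgs_inverse_update and the 1e-12
curvature threshold are invented for illustration, and the sketch assumes
K is symmetric and stored row-major in a flat vector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the attached code): in-place BFGS update of a
// symmetric inverse-Hessian approximation K, stored row-major as n*n doubles.
//   s = x_new - x_old,  y = grad_new - grad_old
// Returns false and leaves K untouched when s'y is not safely positive.
bool bfgs_inverse_update(std::vector<double>& K,
                         const std::vector<double>& s,
                         const std::vector<double>& y)
{
    const std::size_t n = s.size();
    assert(y.size() == n && K.size() == n * n);

    // Curvature s'y; the update is only well defined when this is positive.
    double sy = 0.0;
    for (std::size_t i = 0; i < n; ++i) sy += s[i] * y[i];
    if (!(sy > 1e-12)) return false;  // threshold is an arbitrary assumption

    // u = K y, an O(n^2) matrix-vector product.
    std::vector<double> u(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            u[i] += K[i * n + j] * y[j];

    // y'Ky = y'u
    double yu = 0.0;
    for (std::size_t i = 0; i < n; ++i) yu += y[i] * u[i];

    // Rank-two correction, another O(n^2) pass over K.
    const double c = (sy + yu) / (sy * sy);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            K[i * n + j] += c * s[i] * s[j]
                          - (u[i] * s[j] + s[i] * u[j]) / sy;
    return true;
}
```

Both O(n^2) loops are independent across the entries of K, which is what
makes this update a natural fit for a GPU. A quick sanity check on any
implementation is the secant condition: after the update, K_new y should
equal s up to rounding.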


For 10240 variables it produces

  CPU time  13.4657 seconds

  GPU time  0.603332 seconds

Norm of difference between CPU and GPU BFGS update of K = H inv:
5.23044e-26 for a 10240 times 10240 matrix

Code is attached.  Feedback or insults to keep me awake always appreciated.

