[Developers] opencl newfmin example

Sat May 12 08:31:45 PDT 2012

Has anyone else actually got this example to work?

Some advice: older GPUs (whatever counts as "older") probably
do not support double precision.
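One way to check whether a particular card can do double precision is to look at its extension list. Below is a minimal C sketch. It assumes the space-separated string returned by `clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...)` has already been fetched. The helper names `has_token` and `device_supports_double` are hypothetical, and the parsing is kept separate from the OpenCL calls so it runs without an OpenCL install. `cl_khr_fp64` is the standard extension name. Some older AMD drivers advertised `cl_amd_fp64` instead.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `name` (non-empty) appears as a whole
   space-separated token in `list`, so "cl_khr_fp64x" does not match. */
static bool has_token(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts = (p == list) || (p[-1] == ' ');
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return true;
        p += len; /* keep searching after this partial match */
    }
    return false;
}

/* Hypothetical helper: `extensions` is assumed to be the CL_DEVICE_EXTENSIONS
   string for the device. */
bool device_supports_double(const char *extensions)
{
    return has_token(extensions, "cl_khr_fp64") ||
           has_token(extensions, "cl_amd_fp64");
}
```

If this returns false, the kernels would have to fall back to `float`, with the larger roundoff that implies.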

WRT doing the BFGS update on the CPU: it does not seem
to perform as well as doing it on the GPU. I think this is
due to roundoff error, since the CPU carries out the additions in a different
order. It may be that with, say, 4K or more parameters, roundoff error
becomes important in this (artificial) example.
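To illustrate why the order of additions matters, here is a small C sketch (not the newfmin code) that sums the same single-precision values two ways. The first sums left to right, as a simple serial CPU loop would. The second sums pairwise, which is the tree pattern a typical parallel GPU reduction follows. That the GPU kernel actually reduces this way is an assumption. With 2^20 copies of 0.1f, the running sum drifts once it gets large, because each added 0.1 is rounded to the coarse spacing of big floats. The pairwise result happens to be exact for this particular input.

```c
#include <assert.h>
#include <stddef.h>

/* Left-to-right accumulation, like a simple serial CPU loop. */
float sum_sequential(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Pairwise (tree) accumulation: split in half, sum each half, add.
   Rounding error grows roughly with log2(n) instead of n. */
float sum_pairwise(const float *x, size_t n)
{
    assert(n > 0);
    if (n == 1)
        return x[0];
    size_t half = n / 2;
    return sum_pairwise(x, half) + sum_pairwise(x + half, n - half);
}
```

The same effect occurs in double precision, only smaller. That would fit with it showing up only once there are thousands of parameters.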

I stored the matrix by rows. It now appears that it should be stored
by columns for the fastest matrix * vector multiplication.
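Here is one plausible reason, assuming each GPU work-item computes one element y[i] of y = A x. At each step j, neighboring work-items read A[i][j] for consecutive i. With column storage those reads are adjacent in memory and can be coalesced. With row storage they are n elements apart. The C sketch below shows both layouts; the function names are hypothetical. The column-major version uses the loop order that mirrors that GPU access pattern. Both accumulate each y[i] over j in the same order, so they produce bitwise identical results.

```c
#include <stddef.h>

/* Row-major: element (i, j) is A[i*n + j]. One dot product per row. */
void matvec_rowmajor(const double *A, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += A[i * n + j] * x[j];
        y[i] = s;
    }
}

/* Column-major: element (i, j) is A[j*n + i]. For a fixed column j the
   inner loop over i touches consecutive addresses, like neighboring
   work-items reading one column in lockstep. */
void matvec_colmajor(const double *A, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            y[i] += A[j * n + i] * x[j];
}
```

On the CPU the column-major loop order is also cache friendly, because it streams through each column contiguously.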