Toradex's T20: NVIDIA Tegra 2 without NEON:
ib256 = In-place (input array overwritten with output), backward transform, size 256
ob256 = Out-of-place (input array is preserved), backward transform, size 256
// gcc version 4.5.2 (Sourcery G++ Lite 2011.03-41)
// Soft float, -fPIC
./bench -s ib256
Problem: ib256, setup: 7.16 s, time: 143.97 us, ``mflops'': 71.127
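The ``mflops'' column is a nominal figure, not a measured operation count. FFTW's benchmark convention for complex transforms derives it from the transform size N and the time per transform as 5 N log2(N) / (time in microseconds). The Python sketch below checks that against the soft-float ib256 result above. The function name is hypothetical, and it assumes ib256 is a complex transform of size 256.

```python
import math


def nominal_mflops(n: int, time_us: float) -> float:
    """Nominal MFLOPS for a complex FFT of size n that takes time_us microseconds.

    Assumes the 5 * N * log2(N) operation-count convention for complex transforms.
    """
    return 5.0 * n * math.log2(n) / time_us


# Soft-float ib256 run: 143.97 us per transform, reported as 71.127 ``mflops''.
# The recomputed value differs slightly because the printed time is rounded.
print(nominal_mflops(256, 143.97))
```

Because the figure depends only on N and the time, comparing ``mflops'' between two runs of the same problem is equivalent to comparing their times.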
// gcc version 4.7.3 20121001 (prerelease) (crosstool-NG linaro-1.13.1-4.7-2012.10-20121022 - Linaro GCC 2012.10)
// Hard float, -fPIC
Problem: ib256, setup: 7.20 s, time: 13.83 us, ``mflops'': 740.52
Problem: ob256, setup: 3.38 s, time: 12.45 us, ``mflops'': 822.8
Toradex's T30: NVIDIA Tegra 3 with NEON:
// gcc version 4.7.3 20121001 (prerelease) (crosstool-NG linaro-1.13.1-4.7-2012.10-20121022 - Linaro GCC 2012.10)
// Hard float, -fPIC, -mfpu=neon
Problem: ib256, setup: 6.98 s, time: 10.85 us, ``mflops'': 943.9
Problem: ob256, setup: 3.25 s, time: 9.87 us, ``mflops'': 1037.6
// Hard float, -fPIC, -mfpu=neon, --enable-neon
Problem: ib256, setup: 10.09 s, time: 8.14 us, ``mflops'': 1258.2
Problem: ob256, setup: 5.15 s, time: 7.18 us, ``mflops'': 1425.1
An i7-950 running an Ubuntu VM:
// --enable-sse2, -fPIC, -m32
Problem: ib256, setup: 42.86 ms, time: 845.89 ns, ``mflops'': 12106
Problem: ob256, setup: 21.93 ms, time: 665.65 ns, ``mflops'': 15383
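The results above mix time units (us on the Tegra boards, ns on the i7), so a direct comparison needs the times normalized first. The following is a minimal Python sketch, assuming the line layout shown in this listing. The names `parse_time_us` and `speedup` are hypothetical helpers and not part of the benchmark tool.

```python
import re

# Conversion factors from the units seen in this listing to microseconds.
UNIT_TO_US = {"s": 1e6, "ms": 1e3, "us": 1.0, "ns": 1e-3}

# Matches e.g. "Problem: ib256, setup: 7.16 s, time: 143.97 us, ..."
LINE_RE = re.compile(
    r"Problem:\s*(?P<problem>\w+),\s*setup:\s*[\d.]+\s*\w+,\s*"
    r"time:\s*(?P<time>[\d.]+)\s*(?P<unit>s|ms|us|ns),"
)


def parse_time_us(line: str) -> tuple[str, float]:
    """Return (problem name, time per transform in microseconds) for one result line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"not a benchmark result line: {line!r}")
    factor = UNIT_TO_US[match.group("unit")]
    return match.group("problem"), float(match.group("time")) * factor


def speedup(baseline_line: str, other_line: str) -> float:
    """How many times faster other_line is than baseline_line (same problem only)."""
    base_problem, base_us = parse_time_us(baseline_line)
    other_problem, other_us = parse_time_us(other_line)
    if base_problem != other_problem:
        raise ValueError("lines describe different problems")
    return base_us / other_us


t20_soft = "Problem: ib256, setup: 7.16 s, time: 143.97 us, ``mflops'': 71.127"
t20_hard = "Problem: ib256, setup: 7.20 s, time: 13.83 us, ``mflops'': 740.52"
i7_sse2 = "Problem: ib256, setup: 42.86 ms, time: 845.89 ns, ``mflops'': 12106"

print(f"T20 hard float vs soft float: {speedup(t20_soft, t20_hard):.1f}x")
print(f"i7-950 vs T20 hard float: {speedup(t20_hard, i7_sse2):.1f}x")
```

For ib256 this puts the hard-float T20 build at roughly ten times the speed of the soft-float build. The setup times are ignored here because they measure planning, not the transform itself.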