YUV(NV21) to RGBA function NEON optimizations.

Performance was measured with perf. Averaged over 3 runs, the results
were:

Old NEON:
    Decodes 7200 1920x1080 frames in 92,372,429,902.33 cycles (~12.83E6 / frame)
    (stddev 3.22E-3)

New NEON:
    Decodes 7200 1920x1080 frames in 66,456,635,523.00 cycles ( ~9.23E6 / frame)
    (stddev 5.16E-5)

That is about 28% fewer cycles per frame (roughly 1.39x faster).
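
The per-frame figures and the relative gain follow directly from the
cycle totals above. A small C sketch of that arithmetic is given here
for anyone re-checking the numbers; the constant and function names
are illustrative only and are not part of the patch:

```c
#include <stdio.h>

/* Totals copied from the perf figures above (average of 3 runs). */
static const double OLD_CYCLES = 92372429902.33;
static const double NEW_CYCLES = 66456635523.00;
static const double FRAMES     = 7200.0;

/* Average cycles spent decoding one 1920x1080 frame. */
static double cycles_per_frame(double total_cycles)
{
    return total_cycles / FRAMES;
}

/* Throughput ratio: how many times faster the new code runs. */
static double speedup(void)
{
    return OLD_CYCLES / NEW_CYCLES;
}

/* Percentage of cycles saved relative to the old code. */
static double cycle_reduction_pct(void)
{
    return (1.0 - NEW_CYCLES / OLD_CYCLES) * 100.0;
}

/* Illustrative helper: print the derived figures. */
static void print_summary(void)
{
    printf("old: %.3g cycles/frame\n", cycles_per_frame(OLD_CYCLES));
    printf("new: %.3g cycles/frame\n", cycles_per_frame(NEW_CYCLES));
    printf("speedup: %.2fx, cycles saved: %.1f%%\n",
           speedup(), cycle_reduction_pct());
}
```

Calling print_summary() from a main() reproduces the per-frame values
quoted above along with the two ways of expressing the gain.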

This code was wrapped in a C framework that was developed to obtain
these measurements, and it has not been tested within RenderScript.
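
One way a standalone harness could sanity-check the NEON output is to
compare it against a scalar reference. The sketch below is such a
reference, not the code in this patch. It assumes the common BT.601
fixed-point coefficients, a tightly packed NV21 buffer (Y plane
followed by interleaved V,U pairs at half resolution) and even
dimensions; the names nv21_to_rgba_ref and clamp_u8 are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate an intermediate result to the 0..255 range. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Hypothetical scalar reference for NV21 -> RGBA.
 * Assumes BT.601 video-range input and even width/height.
 */
static void nv21_to_rgba_ref(const uint8_t *yuv, uint8_t *rgba,
                             int width, int height)
{
    const uint8_t *y_plane  = yuv;
    const uint8_t *vu_plane = yuv + (size_t)width * height;

    for (int row = 0; row < height; row++) {
        /* Each chroma row is shared by two luma rows. */
        const uint8_t *vu_row = vu_plane + (size_t)(row / 2) * width;

        for (int col = 0; col < width; col++) {
            int y = y_plane[(size_t)row * width + col] - 16;
            /* NV21 stores V first, then U, one pair per 2x2 block. */
            int v = vu_row[col & ~1] - 128;
            int u = vu_row[(col & ~1) + 1] - 128;
            if (y < 0)
                y = 0;

            uint8_t *px = rgba + ((size_t)row * width + col) * 4;
            px[0] = clamp_u8((298 * y + 409 * v + 128) / 256);           /* R */
            px[1] = clamp_u8((298 * y - 100 * u - 208 * v + 128) / 256); /* G */
            px[2] = clamp_u8((298 * y + 516 * u + 128) / 256);           /* B */
            px[3] = 255;                                                 /* A */
        }
    }
}
```

The exact coefficients and rounding used by RenderScript may differ,
so a comparison against this reference should allow a small per-channel
tolerance.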

Change-Id: I4f4e25b968f858f9fca973b36c105c715d90acbf
Signed-off-by: David Butcher <[email protected]>