Update libjpeg-turbo.
This change updates our copy of libjpeg-turbo to r829 (1.2.80).
BUG=none
TEST=none
Review URL: https://chromiumcodereview.appspot.com/10386084
git-svn-id: http://src.chromium.org/svn/trunk/deps/third_party/libjpeg_turbo@136524 4ff67af0-8c30-449e-8e8b-ad334ec8d88c
diff --git a/ChangeLog.txt b/ChangeLog.txt
index 083a0a4..98481ec 100644
--- a/ChangeLog.txt
+++ b/ChangeLog.txt
@@ -1,3 +1,36 @@
+1.3 pre-beta
+============
+
+[1] Added support for additional scaling factors (3/8, 5/8, 3/4, 7/8, 9/8, 5/4,
+11/8, 3/2, 13/8, 7/4, 15/8, and 2) when decompressing. Note that the IDCT will
+not be SIMD-accelerated when using any of these new scaling factors.
+
+[2] Added SIMD acceleration for performing 4:2:2 upsampling on NEON-capable ARM
+platforms. This speeds up the decompression of 4:2:2 JPEGs by 20-25% on such
+platforms.
+
+[3] Creating or decoding a JPEG file that uses the RGB colorspace should now
+properly work when the input or output colorspace is one of the libjpeg-turbo
+colorspace extensions.
+
+[4] When libjpeg-turbo was built without SIMD support and merged (non-fancy)
+upsampling was used along with an alpha-enabled colorspace during
+decompression, the unused byte of the decompressed pixels was not being set to
+0xFF. This has been fixed. TJUnitTest has also been extended to test for the
+correct behavior of the colorspace extensions when merged upsampling is used.
+
+[5] The TurboJPEG dynamic library is now versioned. It was not strictly
+necessary to do so, because TurboJPEG uses versioned symbols, and if a function
+changes in an ABI-incompatible way, that function is renamed and a legacy
+function is provided to maintain backward compatibility. However, certain
+Linux distro maintainers will blindly reject any library that is not versioned,
+so this was an attempt to make them happy.
+
+[6] Fixed a bug whereby the libjpeg-turbo SSE2 SIMD code would not preserve the
+upper 64 bits of xmm6 and xmm7 on Win64 platforms, which violated the Win64
+calling conventions.
+
+
1.2.0
=====
@@ -19,6 +52,20 @@
when decompressing to a 4-component RGB buffer, the unused byte should be set
to 0xFF so that it can be interpreted as an opaque alpha channel.
+[5] Fixed regression issue whereby DevIL failed to build against libjpeg-turbo
+because libjpeg-turbo's distributed version of jconfig.h contained an INLINE
+macro, which conflicted with a similar macro in DevIL. This macro is used only
+internally when building libjpeg-turbo, so it was moved into config.h.
+
+[6] libjpeg-turbo will now correctly decompress erroneous CMYK/YCCK JPEGs whose
+K component is assigned a component ID of 1 instead of 4. Although these files
+are in violation of the spec, other JPEG implementations handle them
+correctly.
+
+[7] Added ARM v6 and ARM v7 architectures to libjpeg.a and libturbojpeg.a in
+the official OS X distribution package, so that those libraries can be used to
+build both OS X and iOS applications.
+
1.1.90 (1.2 beta1)
==================
@@ -170,7 +217,8 @@
[2] Created a new CMake-based build system for the Visual C++ and MinGW builds.
-[3] TurboJPEG/OSS can now compress from/decompress to grayscale bitmaps.
+[3] Grayscale bitmaps can now be compressed from/decompressed to using the
+TurboJPEG API.
[4] jpgtest can now be used to test decompression performance with existing
JPEG images.
@@ -186,11 +234,12 @@
[7] Added arithmetic encoding and decoding support (can be disabled with
configure or CMake options)
-[8] Added a TJ_YUV flag to TurboJPEG/OSS which causes both the compressor and
-decompressor to output planar YUV images.
+[8] Added a TJ_YUV flag to the TurboJPEG API, which causes both the compressor
+and decompressor to output planar YUV images.
-[9] Added an extended version of tjDecompressHeader() to TurboJPEG/OSS which
-allows the caller to determine the type of subsampling used in a JPEG image.
+[9] Added an extended version of tjDecompressHeader() to the TurboJPEG API,
+which allows the caller to determine the type of subsampling used in a JPEG
+image.
[10] Added further protections against invalid Huffman codes.
@@ -202,7 +251,7 @@
from a corrupt JPEG image.) Previously, these would cause libjpeg-turbo to
crash under certain circumstances.
-[2] Fixed typo in SIMD dispatch routines which was causing 4:2:2 upsampling to
+[2] Fixed typo in SIMD dispatch routines that was causing 4:2:2 upsampling to
be used instead of 4:2:0 when decompressing JPEG images using SSE2 code.
[3] configure script will now automatically determine whether the
@@ -223,7 +272,7 @@
[3] The Unix/Linux distribution packages now include the libjpeg run-time
programs (cjpeg, etc.) and man pages.
-[4] Created a 32-bit supplementary package for amd64 Debian systems which
+[4] Created a 32-bit supplementary package for amd64 Debian systems, which
contains just the 32-bit libjpeg-turbo libraries.
[5] Moved the libraries from */lib32 to */lib in the i386 Debian package.
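
For reference, entry [1] of the 1.3 changes above is driven through the standard libjpeg scale_num/scale_denom fields on the decompressor. A minimal sketch, assuming an input file named in.jpg and with error handling and output handling reduced to placeholders:

    /* Decompress a JPEG at 7/8 scale, one of the new scaling factors.
     * Per entry [1], the IDCT is not SIMD-accelerated for these factors. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <jpeglib.h>

    int main(void)
    {
      struct jpeg_decompress_struct cinfo;
      struct jpeg_error_mgr jerr;
      JSAMPARRAY buffer;
      int row_stride;
      FILE *infile = fopen("in.jpg", "rb");

      if (infile == NULL)
        return EXIT_FAILURE;
      cinfo.err = jpeg_std_error(&jerr);
      jpeg_create_decompress(&cinfo);
      jpeg_stdio_src(&cinfo, infile);
      jpeg_read_header(&cinfo, TRUE);
      cinfo.scale_num = 7;             /* request 7/8 output scaling */
      cinfo.scale_denom = 8;
      jpeg_start_decompress(&cinfo);   /* output_width/height now reflect 7/8 */
      row_stride = cinfo.output_width * cinfo.output_components;
      buffer = (*cinfo.mem->alloc_sarray)
        ((j_common_ptr)&cinfo, JPOOL_IMAGE, row_stride, 1);
      while (cinfo.output_scanline < cinfo.output_height)
        jpeg_read_scanlines(&cinfo, buffer, 1);  /* rows discarded in this sketch */
      jpeg_finish_decompress(&cinfo);
      jpeg_destroy_decompress(&cinfo);
      fclose(infile);
      return EXIT_SUCCESS;
    }
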
diff --git a/README b/README
index afa052d..0e9b429 100644
--- a/README
+++ b/README
@@ -1,7 +1,8 @@
-libjpeg-turbo note: This file is mostly taken from the libjpeg v8b README
-file, and it is included only for reference. Some parts of it may not apply to
-libjpeg-turbo. Please see README-turbo.txt for information specific to the
-turbo version.
+libjpeg-turbo note: This file contains portions of the libjpeg v6b and v8
+README files, with additional wordsmithing by The libjpeg-turbo Project.
+It is included only for reference, as some parts of it may not apply to
+libjpeg-turbo. Please see README-turbo.txt for information specific to
+libjpeg-turbo.
The Independent JPEG Group's JPEG software
@@ -256,8 +257,8 @@
The "official" archive site for this software is www.ijg.org.
The most recent released version can always be found there in
directory "files". This particular version will be archived as
-http://www.ijg.org/files/jpegsrc.v8b.tar.gz, and in Windows-compatible
-"zip" archive format as http://www.ijg.org/files/jpegsr8b.zip.
+http://www.ijg.org/files/jpegsrc.v8d.tar.gz, and in Windows-compatible
+"zip" archive format as http://www.ijg.org/files/jpegsr8d.zip.
The JPEG FAQ (Frequently Asked Questions) article is a source of some
general information about JPEG.
@@ -286,4 +287,4 @@
TO DO
=====
-Please send bug reports, offers of help, etc. to [email protected].
+Please send bug reports, offers of help, etc. to [email protected].
diff --git a/README-turbo.txt b/README-turbo.txt
index 327647c..f452618 100644
--- a/README-turbo.txt
+++ b/README-turbo.txt
@@ -3,31 +3,31 @@
*******************************************************************************
libjpeg-turbo is a derivative of libjpeg that uses SIMD instructions (MMX,
-SSE2, etc.) to accelerate baseline JPEG compression and decompression on x86
-and x86-64 systems. On such systems, libjpeg-turbo is generally 2-4x as fast
-as the unmodified version of libjpeg, all else being equal.
+SSE2, NEON) to accelerate baseline JPEG compression and decompression on x86,
+x86-64, and ARM systems. On such systems, libjpeg-turbo is generally 2-4x as
+fast as the unmodified version of libjpeg, all else being equal.
libjpeg-turbo was originally based on libjpeg/SIMD by Miyasaka Masaru, but
the TigerVNC and VirtualGL projects made numerous enhancements to the codec in
2009, including improved support for Mac OS X, 64-bit support, support for
-32-bit and big endian pixel formats (RGBX, XBGR, etc.), accelerated Huffman
-encoding/decoding, and various bug fixes. The goal was to produce a fully open
-source codec that could replace the partially closed source TurboJPEG/IPP codec
-used by VirtualGL and TurboVNC. libjpeg-turbo generally performs in the range
-of 80-120% of TurboJPEG/IPP. It is faster in some areas but slower in others.
+32-bit and big-endian pixel formats (RGBX, XBGR, etc.), accelerated Huffman
+encoding/decoding, and various bug fixes. The goal was to produce a fully
+open-source codec that could replace the partially closed-source TurboJPEG/IPP
+codec used by VirtualGL and TurboVNC. libjpeg-turbo generally achieves 80-120%
+of the performance of TurboJPEG/IPP. It is faster in some areas but slower in
+others.
In early 2010, libjpeg-turbo spun off into its own independent project, with
the goal of making high-speed JPEG compression/decompression technology
-available to a broader range of users and developers. The libjpeg-turbo shared
-libraries can be used as drop-in replacements for libjpeg on most systems.
+available to a broader range of users and developers.
*******************************************************************************
** License
*******************************************************************************
-libjpeg-turbo is licensed under a non-restrictive, BSD-style license
-(see README.) The TurboJPEG/OSS wrapper (both C and Java versions) and
+Most of libjpeg-turbo inherits the non-restrictive, BSD-style license used by
+libjpeg (see README.) The TurboJPEG/OSS wrapper (both C and Java versions) and
associated test programs bear a similar license, which is reproduced below:
Redistribution and use in source and binary forms, with or without
@@ -62,16 +62,16 @@
libjpeg-turbo includes two APIs that can be used to compress and decompress
JPEG images:
- TurboJPEG/OSS: This API wraps libjpeg-turbo and provides an easy-to-use
- interface for compressing and decompressing JPEG images in memory. It also
- provides some features that would not be straightforward to implement using
- the underlying libjpeg API, such as generating planar YUV images and
- performing multiple simultaneous lossless transforms on an image. The Java
- interface for libjpeg-turbo is written on top of TurboJPEG/OSS.
+ TurboJPEG API: This API provides an easy-to-use interface for compressing
+ and decompressing JPEG images in memory. It also provides some functionality
+ that would not be straightforward to achieve using the underlying libjpeg
+ API, such as generating planar YUV images and performing multiple
+ simultaneous lossless transforms on an image. The Java interface for
+ libjpeg-turbo is written on top of the TurboJPEG API.
- libjpeg API: This is the industry standard API for compressing and
- decompressing JPEG images. It is more difficult to use than TurboJPEG/OSS
- but also more powerful. libjpeg-turbo is both API/ABI-compatible and
+ libjpeg API: This is the de facto industry-standard API for compressing and
+ decompressing JPEG images. It is more difficult to use than the TurboJPEG
+ API but also more powerful. libjpeg-turbo is both API/ABI-compatible and
mathematically compatible with libjpeg v6b. It can also optionally be
configured to be API/ABI-compatible with libjpeg v7 and v8 (see below.)
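
For reference, a minimal sketch of in-memory compression through the TurboJPEG API described above. The image data is a flat gray placeholder, and real code should report tjGetErrorStr() on failure:

    /* Compress an RGB buffer to a JPEG in memory with the TurboJPEG API. */
    #include <stdlib.h>
    #include <string.h>
    #include <turbojpeg.h>

    int main(void)
    {
      const int width = 640, height = 480;
      unsigned char *rgb = malloc(width * height * tjPixelSize[TJPF_RGB]);
      unsigned char *jpegBuf = NULL;   /* tjCompress2() allocates this */
      unsigned long jpegSize = 0;
      tjhandle handle;

      if (rgb == NULL)
        return EXIT_FAILURE;
      memset(rgb, 127, width * height * tjPixelSize[TJPF_RGB]);

      handle = tjInitCompress();
      if (handle == NULL)
        return EXIT_FAILURE;
      if (tjCompress2(handle, rgb, width, 0 /* pitch: width * pixel size */,
                      height, TJPF_RGB, &jpegBuf, &jpegSize,
                      TJSAMP_420, 90 /* quality */, 0 /* flags */) < 0)
        return EXIT_FAILURE;

      /* ... write jpegBuf[0..jpegSize-1] somewhere ... */

      tjFree(jpegBuf);
      tjDestroy(handle);
      free(rgb);
      return EXIT_SUCCESS;
    }
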
@@ -101,13 +101,13 @@
architecture.
System administrators can also replace the libjpeg sym links in /usr/{lib} with
-links to the libjpeg dynamic library located in /opt/libjpeg-turbo/{lib}. This
-will effectively accelerate every dynamically linked libjpeg application on the
-system.
+links to the libjpeg-turbo dynamic library located in /opt/libjpeg-turbo/{lib}.
+This will effectively accelerate every application that uses the libjpeg
+dynamic library on the system.
The libjpeg-turbo SDK for Visual C++ installs the libjpeg-turbo DLL
-(jpeg62.dll, jpeg7.dll, or jpeg8.dll, depending on whether libjpeg v6b, v7, or
-v8 emulation is enabled) into c:\libjpeg-turbo[64]\bin, and the PATH
+(jpeg62.dll, jpeg7.dll, or jpeg8.dll, depending on whether it was built with
+libjpeg v6b, v7, or v8 emulation) into c:\libjpeg-turbo[64]\bin, and the PATH
environment variable can be modified such that this directory is searched
before any others that might contain a libjpeg DLL. However, if a libjpeg
DLL exists in an application's install directory, then Windows will load this
@@ -117,16 +117,16 @@
application's install directory to accelerate it.
The version of the libjpeg-turbo DLL distributed in the libjpeg-turbo SDK for
-Visual C++ requires the Visual C++ 2008 C run time DLL (msvcr90.dll).
+Visual C++ requires the Visual C++ 2008 C run-time DLL (msvcr90.dll).
msvcr90.dll ships with more recent versions of Windows, but users of older
Windows releases can obtain it from the Visual C++ 2008 Redistributable
Package, which is available as a free download from Microsoft's web site.
-NOTE: Features of libjpeg that require passing a C run time structure, such
+NOTE: Features of libjpeg that require passing a C run-time structure, such
as a file handle, from an application to libjpeg will probably not work with
the version of the libjpeg-turbo DLL distributed in the libjpeg-turbo SDK for
Visual C++, unless the application is also built to use the Visual C++ 2008 C
-run time DLL. In particular, this affects jpeg_stdio_dest() and
+run-time DLL. In particular, this affects jpeg_stdio_dest() and
jpeg_stdio_src().
Mac applications typically embed their own copies of the libjpeg dylib inside
@@ -146,7 +146,7 @@
libjpeg-turbo is a drop-in replacement for the TurboJPEG/IPP SDK used by
VirtualGL 2.1.x and TurboVNC 0.6 (and prior.) libjpeg-turbo contains a wrapper
library (TurboJPEG/OSS) that emulates the TurboJPEG API using libjpeg-turbo
-instead of the closed source Intel Performance Primitives. You can replace the
+instead of the closed-source Intel Performance Primitives. You can replace the
TurboJPEG/IPP package on Linux systems with the libjpeg-turbo package in order
to make existing releases of VirtualGL 2.1.x and TurboVNC 0.x use the new codec
at run time. Note that the 64-bit libjpeg-turbo packages contain only 64-bit
@@ -157,7 +157,7 @@
You can also build the VirtualGL 2.1.x and TurboVNC 0.6 source code with
the libjpeg-turbo SDK instead of TurboJPEG/IPP. It should work identically.
libjpeg-turbo also includes static library versions of TurboJPEG/OSS, which
-are used to build TurboVNC 1.0 and later.
+are used to build VirtualGL 2.2 and TurboVNC 1.0 and later.
========================================
Using libjpeg-turbo in Your Own Programs
@@ -256,24 +256,17 @@
libjpeg v7 and v8 API/ABI support
=================================
-libjpeg v7 and v8 added new features to the API/ABI, and, unfortunately, the
-compression and decompression structures were extended in a backward-
-incompatible manner to accommodate these features. Thus, programs that are
+With libjpeg v7 and v8, new features were added that necessitated extending the
+compression and decompression structures. Unfortunately, due to the exposed
+nature of those structures, extending them also necessitated breaking backward
+ABI compatibility with previous libjpeg releases. Thus, programs that are
built to use libjpeg v7 or v8 did not work with libjpeg-turbo, since it is
based on the libjpeg v6b code base. Although libjpeg v7 and v8 are still not
as widely used as v6b, enough programs (including a few Linux distros) have
made the switch that it was desirable to provide support for the libjpeg v7/v8
-API/ABI in libjpeg-turbo.
-
-Some of the libjpeg v7 and v8 features -- DCT scaling, to name one -- involve
-deep modifications to the code that cannot be accommodated by libjpeg-turbo
-without either breaking compatibility with libjpeg v6b or producing an
-unsupportable mess. In order to fully support libjpeg v8 with all of its
-features, we would have to essentially port the SIMD extensions to the libjpeg
-v8 code base and maintain two separate code trees. We are hesitant to do this
-until/unless the newer libjpeg code bases garner more community support and
-involvement and until/unless we have some notion of whether future libjpeg
-releases will also be backward-incompatible.
+API/ABI in libjpeg-turbo. Although libjpeg-turbo can now be configured as a
+drop-in replacement for libjpeg v7 or v8, it should be noted that not all of
+the features in libjpeg v7 and v8 are supported (see below.)
By passing an argument of --with-jpeg7 or --with-jpeg8 to configure, or an
argument of -DWITH_JPEG7=1 or -DWITH_JPEG8=1 to cmake, you can build a version
@@ -292,6 +285,11 @@
for convenience purposes. It has always been possible to implement this
feature with libjpeg v6b (see rdswitch.c for an example.)
+-- libjpeg: IDCT scaling extensions in decompressor
+ libjpeg-turbo supports IDCT scaling with scaling factors of 1/8, 1/4, 3/8,
+ 1/2, 5/8, 3/4, 7/8, 9/8, 5/4, 11/8, 3/2, 13/8, 7/4, 15/8, and 2/1 (only 1/4
+ and 1/2 are SIMD-accelerated.)
+
-- cjpeg: 32-bit BMP support
-- jpegtran: lossless cropping
@@ -312,16 +310,27 @@
-- libjpeg: DCT scaling in compressor
cinfo.scale_num and cinfo.scale_denom are silently ignored.
+ There is no technical reason why DCT scaling cannot be supported, but
+ without the SmartScale extension (see below), it would only be able to
+ down-scale using ratios of 1/2, 8/15, 4/7, 8/13, 2/3, 8/11, 4/5, and 8/9,
+ which is of limited usefulness.
--- libjpeg: IDCT scaling extensions in decompressor
- libjpeg-turbo still supports IDCT scaling with scaling factors of 1/2, 1/4,
- and 1/8 (same as libjpeg v6b.)
+-- libjpeg: SmartScale
+ cinfo.block_size is silently ignored.
+ SmartScale is an extension to the JPEG format that allows for DCT block
+ sizes other than 8x8. It would be difficult to support this feature while
+ retaining backward compatibility with libjpeg v6b.
-- libjpeg: Fancy downsampling in compressor
cinfo.do_fancy_downsampling is silently ignored.
+ This requires the DCT scaling feature, which is not supported.
-- jpegtran: Scaling
- Seems to depend on the DCT scaling feature, which isn't supported.
+ This requires both the DCT scaling and SmartScale features, which are not
+ supported.
+
+-- Lossless RGB JPEG files
+ This requires the SmartScale feature, which is not supported.
*******************************************************************************
@@ -333,12 +342,13 @@
===============
The optimized Huffman decoder in libjpeg-turbo does not handle restart markers
-in a way that makes libjpeg happy, so it is necessary to use the slow Huffman
-decoder when decompressing a JPEG image that has restart markers. This can
-cause the decompression performance to drop by as much as 20%, but the
-performance will still be much much greater than that of libjpeg v6b. Many
-consumer packages, such as PhotoShop, use restart markers when generating JPEG
-images, so images generated by those programs will experience this issue.
+in a way that makes the rest of the libjpeg infrastructure happy, so it is
+necessary to use the slow Huffman decoder when decompressing a JPEG image that
+has restart markers. This can cause the decompression performance to drop by
+as much as 20%, but the performance will still be much greater than that of
+libjpeg. Many consumer packages, such as PhotoShop, use restart markers when
+generating JPEG images, so images generated by those programs will experience
+this issue.
===============================================
Fast Integer Forward DCT at High Quality Levels
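
Because the same application source may be built against a copy of libjpeg-turbo configured for v6b, v7, or v8 emulation, struct fields that exist only in the newer ABIs can be guarded on JPEG_LIB_VERSION from jconfig.h. A small sketch; the helper is illustrative only, and as noted above, libjpeg-turbo silently ignores compressor-side scaling in any case:

    #include <stdio.h>
    #include <jpeglib.h>

    static void request_half_size(j_compress_ptr cinfo)
    {
    #if JPEG_LIB_VERSION >= 70
      cinfo->scale_num = 1;    /* these fields exist only in the v7/v8 ABIs */
      cinfo->scale_denom = 2;
    #else
      (void)cinfo;             /* v6b ABI: no compressor-side scaling fields */
    #endif
    }
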
diff --git a/README.chromium b/README.chromium
index 8e02302..cee676f 100644
--- a/README.chromium
+++ b/README.chromium
@@ -1,12 +1,12 @@
Name: libjpeg-turbo
URL: http://sourceforge.net/projects/libjpeg-turbo/
-Version: 1.2
+Version: 1.2.80
License File: LICENSE.txt
Security Critical: yes
Description:
This consists of three components:
-* A partial copy of libjpeg-turbo 1.2 (r733);
+* A partial copy of libjpeg-turbo 1.2.80 (r829);
* A build file (libjpeg.gyp), and;
* Patched header files used by Chromium.
diff --git a/cjpeg.c b/cjpeg.c
index 3daa259..0475c02 100644
--- a/cjpeg.c
+++ b/cjpeg.c
@@ -164,6 +164,9 @@
fprintf(stderr, " -targa Input file is Targa format (usually not needed)\n");
#endif
fprintf(stderr, "Switches for advanced users:\n");
+#ifdef C_ARITH_CODING_SUPPORTED
+ fprintf(stderr, " -arithmetic Use arithmetic coding\n");
+#endif
#ifdef DCT_ISLOW_SUPPORTED
fprintf(stderr, " -dct int Use integer DCT method%s\n",
(JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : ""));
@@ -184,9 +187,6 @@
fprintf(stderr, " -outfile name Specify name for output file\n");
fprintf(stderr, " -verbose or -debug Emit debug output\n");
fprintf(stderr, "Switches for wizards:\n");
-#ifdef C_ARITH_CODING_SUPPORTED
- fprintf(stderr, " -arithmetic Use arithmetic coding\n");
-#endif
fprintf(stderr, " -baseline Force baseline quantization tables\n");
fprintf(stderr, " -qtables file Use quantization tables given in file\n");
fprintf(stderr, " -qslots N[,...] Set component quantization tables\n");
@@ -277,9 +277,9 @@
if (! printed_version) {
fprintf(stderr, "%s version %s (build %s)\n",
PACKAGE_NAME, VERSION, BUILD);
- fprintf(stderr, "%s\n\n", LJTCOPYRIGHT);
- fprintf(stderr, "Based on Independent JPEG Group's libjpeg, version %s\n%s\n\n",
- JVERSION, JCOPYRIGHT);
+ fprintf(stderr, "%s\n\n", JCOPYRIGHT);
+ fprintf(stderr, "Emulating The Independent JPEG Group's libjpeg, version %s\n\n",
+ JVERSION);
printed_version = TRUE;
}
cinfo->err->trace_level++;
diff --git a/config.h b/config.h
new file mode 100644
index 0000000..25793c0
--- /dev/null
+++ b/config.h
@@ -0,0 +1,147 @@
+/* config.h. Generated from config.h.in by configure. */
+/* config.h.in. Generated from configure.ac by autoheader. */
+
+/* Build number */
+#define BUILD "20120511"
+
+/* Support arithmetic encoding */
+/* #undef C_ARITH_CODING_SUPPORTED */
+
+/* Support arithmetic decoding */
+/* #undef D_ARITH_CODING_SUPPORTED */
+
+/* Define to 1 if you have the <dlfcn.h> header file. */
+#define HAVE_DLFCN_H 1
+
+/* Define to 1 if you have the <inttypes.h> header file. */
+#define HAVE_INTTYPES_H 1
+
+/* Define to 1 if you have the <jni.h> header file. */
+/* #undef HAVE_JNI_H */
+
+/* Define to 1 if you have the `memcpy' function. */
+#define HAVE_MEMCPY 1
+
+/* Define to 1 if you have the <memory.h> header file. */
+#define HAVE_MEMORY_H 1
+
+/* Define to 1 if you have the `memset' function. */
+#define HAVE_MEMSET 1
+
+/* Define if your compiler supports prototypes */
+#define HAVE_PROTOTYPES 1
+
+/* Define to 1 if you have the <stddef.h> header file. */
+#define HAVE_STDDEF_H 1
+
+/* Define to 1 if you have the <stdint.h> header file. */
+#define HAVE_STDINT_H 1
+
+/* Define to 1 if you have the <stdlib.h> header file. */
+#define HAVE_STDLIB_H 1
+
+/* Define to 1 if you have the <strings.h> header file. */
+#define HAVE_STRINGS_H 1
+
+/* Define to 1 if you have the <string.h> header file. */
+#define HAVE_STRING_H 1
+
+/* Define to 1 if you have the <sys/stat.h> header file. */
+#define HAVE_SYS_STAT_H 1
+
+/* Define to 1 if you have the <sys/types.h> header file. */
+#define HAVE_SYS_TYPES_H 1
+
+/* Define to 1 if you have the <unistd.h> header file. */
+#if !defined(_MSC_VER)
+#define HAVE_UNISTD_H 1
+#endif
+
+/* Define to 1 if the system has the type `unsigned char'. */
+#define HAVE_UNSIGNED_CHAR 1
+
+/* Define to 1 if the system has the type `unsigned short'. */
+#define HAVE_UNSIGNED_SHORT 1
+
+/* Compiler does not support pointers to undefined structures. */
+/* #undef INCOMPLETE_TYPES_BROKEN */
+
+/* How to obtain function inlining. */
+#ifndef INLINE
+#if defined(__GNUC__)
+#define INLINE __attribute__((always_inline))
+#elif defined(_MSC_VER)
+#define INLINE __forceinline
+#else
+#define INLINE
+#endif
+#endif
+
+/* libjpeg API version */
+#define JPEG_LIB_VERSION 62
+
+/* libjpeg-turbo version */
+#define LIBJPEG_TURBO_VERSION 1.2.80
+
+/* Define to the sub-directory in which libtool stores uninstalled libraries.
+ */
+#define LT_OBJDIR ".libs/"
+
+/* Define if you have BSD-like bzero and bcopy */
+/* #undef NEED_BSD_STRINGS */
+
+/* Define if you need short function names */
+/* #undef NEED_SHORT_EXTERNAL_NAMES */
+
+/* Define if you have sys/types.h */
+#define NEED_SYS_TYPES_H 1
+
+/* Name of package */
+#define PACKAGE "libjpeg-turbo"
+
+/* Define to the address where bug reports for this package should be sent. */
+#define PACKAGE_BUGREPORT ""
+
+/* Define to the full name of this package. */
+#define PACKAGE_NAME "libjpeg-turbo"
+
+/* Define to the full name and version of this package. */
+#define PACKAGE_STRING "libjpeg-turbo 1.2.80"
+
+/* Define to the one symbol short name of this package. */
+#define PACKAGE_TARNAME "libjpeg-turbo"
+
+/* Define to the home page for this package. */
+#define PACKAGE_URL ""
+
+/* Define to the version of this package. */
+#define PACKAGE_VERSION "1.2.80"
+
+/* Define if shift is unsigned */
+/* #undef RIGHT_SHIFT_IS_UNSIGNED */
+
+/* Define to 1 if you have the ANSI C header files. */
+#define STDC_HEADERS 1
+
+/* Version number of package */
+#define VERSION "1.2.80"
+
+/* Use accelerated SIMD routines. */
+#define WITH_SIMD 1
+
+/* Define to 1 if type `char' is unsigned and you are not using gcc. */
+#ifndef __CHAR_UNSIGNED__
+/* # undef __CHAR_UNSIGNED__ */
+#endif
+
+/* Define to empty if `const' does not conform to ANSI C. */
+/* #undef const */
+
+/* Define to `__inline__' or `__inline' if that's what the C compiler
+ calls it, or to nothing if 'inline' is not supported under any name. */
+#ifndef __cplusplus
+/* #undef inline */
+#endif
+
+/* Define to `unsigned int' if <sys/types.h> does not define. */
+/* #undef size_t */
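
For reference, a small sketch of how the macros in this generated header drive conditional compilation; the program itself is illustrative only and reflects the settings above (arithmetic coding disabled, SIMD enabled):

    #include <stdio.h>
    #include "config.h"

    int main(void)
    {
      printf("%s %s (build %s)\n", PACKAGE_NAME, VERSION, BUILD);
    #ifdef C_ARITH_CODING_SUPPORTED
      puts("arithmetic encoding compiled in");
    #else
      puts("arithmetic encoding disabled");
    #endif
    #ifdef WITH_SIMD
      puts("SIMD routines enabled");
    #endif
      return 0;
    }
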
diff --git a/djpeg.c b/djpeg.c
index 0dd4f24..2438856 100644
--- a/djpeg.c
+++ b/djpeg.c
@@ -245,9 +245,9 @@
if (! printed_version) {
fprintf(stderr, "%s version %s (build %s)\n",
PACKAGE_NAME, VERSION, BUILD);
- fprintf(stderr, "%s\n\n", LJTCOPYRIGHT);
- fprintf(stderr, "Based on Independent JPEG Group's libjpeg, version %s\n%s\n\n",
- JVERSION, JCOPYRIGHT);
+ fprintf(stderr, "%s\n\n", JCOPYRIGHT);
+ fprintf(stderr, "Emulating The Independent JPEG Group's libjpeg, version %s\n\n",
+ JVERSION);
printed_version = TRUE;
}
cinfo->err->trace_level++;
diff --git a/google.patch b/google.patch
index 3e88c25..9fcfe9b 100644
--- a/google.patch
+++ b/google.patch
@@ -1,8 +1,8 @@
Index: jdmarker.c
===================================================================
---- jdmarker.c (revision 733)
+--- jdmarker.c (revision 829)
+++ jdmarker.c (working copy)
-@@ -906,7 +906,7 @@
+@@ -910,7 +910,7 @@
}
if (cinfo->marker->discarded_bytes != 0) {
@@ -11,7 +11,7 @@
cinfo->marker->discarded_bytes = 0;
}
-@@ -940,7 +940,144 @@
+@@ -944,7 +944,144 @@
return TRUE;
}
@@ -156,7 +156,7 @@
/*
* Read markers until SOS or EOI.
*
-@@ -1009,6 +1146,7 @@
+@@ -1013,6 +1150,7 @@
break;
case M_SOS:
@@ -166,7 +166,7 @@
cinfo->unread_marker = 0; /* processed the marker */
Index: jmorecfg.h
===================================================================
---- jmorecfg.h (revision 733)
+--- jmorecfg.h (revision 829)
+++ jmorecfg.h (working copy)
@@ -153,14 +153,18 @@
/* INT16 must hold at least the values -32768..32767. */
@@ -203,7 +203,7 @@
/*
Index: jpeglib.h
===================================================================
---- jpeglib.h (revision 733)
+--- jpeglib.h (revision 829)
+++ jpeglib.h (working copy)
@@ -15,6 +15,10 @@
#ifndef JPEGLIB_H
@@ -336,7 +336,7 @@
+#endif // THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_
Index: simd/jcgrass2-64.asm
===================================================================
---- simd/jcgrass2-64.asm (revision 733)
+--- simd/jcgrass2-64.asm (revision 829)
+++ simd/jcgrass2-64.asm (working copy)
@@ -30,7 +30,7 @@
SECTION SEG_CONST
@@ -349,7 +349,7 @@
Index: simd/jiss2fst.asm
===================================================================
---- simd/jiss2fst.asm (revision 733)
+--- simd/jiss2fst.asm (revision 829)
+++ simd/jiss2fst.asm (working copy)
@@ -59,7 +59,7 @@
%define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -369,9 +369,22 @@
EXTN(jsimd_idct_ifast_sse2):
push ebp
+Index: simd/jcclrss2-64.asm
+===================================================================
+--- simd/jcclrss2-64.asm (revision 829)
++++ simd/jcclrss2-64.asm (working copy)
+@@ -37,7 +37,7 @@
+
+ align 16
+
+- global EXTN(jsimd_rgb_ycc_convert_sse2)
++ global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE
+
+ EXTN(jsimd_rgb_ycc_convert_sse2):
+ push rbp
Index: simd/jiss2red-64.asm
===================================================================
---- simd/jiss2red-64.asm (revision 733)
+--- simd/jiss2red-64.asm (revision 829)
+++ simd/jiss2red-64.asm (working copy)
@@ -73,7 +73,7 @@
SECTION SEG_CONST
@@ -400,22 +413,9 @@
EXTN(jsimd_idct_2x2_sse2):
push rbp
-Index: simd/jcclrss2-64.asm
-===================================================================
---- simd/jcclrss2-64.asm (revision 733)
-+++ simd/jcclrss2-64.asm (working copy)
-@@ -37,7 +37,7 @@
-
- align 16
-
-- global EXTN(jsimd_rgb_ycc_convert_sse2)
-+ global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE
-
- EXTN(jsimd_rgb_ycc_convert_sse2):
- push rbp
Index: simd/ji3dnflt.asm
===================================================================
---- simd/ji3dnflt.asm (revision 733)
+--- simd/ji3dnflt.asm (revision 829)
+++ simd/ji3dnflt.asm (working copy)
@@ -27,7 +27,7 @@
SECTION SEG_CONST
@@ -437,7 +437,7 @@
push ebp
Index: simd/jsimdcpu.asm
===================================================================
---- simd/jsimdcpu.asm (revision 733)
+--- simd/jsimdcpu.asm (revision 829)
+++ simd/jsimdcpu.asm (working copy)
@@ -29,7 +29,7 @@
;
@@ -448,9 +448,22 @@
EXTN(jpeg_simd_cpu_support):
push ebx
+Index: simd/jdmerss2-64.asm
+===================================================================
+--- simd/jdmerss2-64.asm (revision 829)
++++ simd/jdmerss2-64.asm (working copy)
+@@ -35,7 +35,7 @@
+ SECTION SEG_CONST
+
+ alignz 16
+- global EXTN(jconst_merged_upsample_sse2)
++ global EXTN(jconst_merged_upsample_sse2) PRIVATE
+
+ EXTN(jconst_merged_upsample_sse2):
+
Index: simd/jdsammmx.asm
===================================================================
---- simd/jdsammmx.asm (revision 733)
+--- simd/jdsammmx.asm (revision 829)
+++ simd/jdsammmx.asm (working copy)
@@ -22,7 +22,7 @@
SECTION SEG_CONST
@@ -497,22 +510,9 @@
EXTN(jsimd_h2v2_upsample_mmx):
push ebp
-Index: simd/jdmerss2-64.asm
-===================================================================
---- simd/jdmerss2-64.asm (revision 733)
-+++ simd/jdmerss2-64.asm (working copy)
-@@ -35,7 +35,7 @@
- SECTION SEG_CONST
-
- alignz 16
-- global EXTN(jconst_merged_upsample_sse2)
-+ global EXTN(jconst_merged_upsample_sse2) PRIVATE
-
- EXTN(jconst_merged_upsample_sse2):
-
Index: simd/jdmrgmmx.asm
===================================================================
---- simd/jdmrgmmx.asm (revision 733)
+--- simd/jdmrgmmx.asm (revision 829)
+++ simd/jdmrgmmx.asm (working copy)
@@ -40,7 +40,7 @@
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
@@ -534,7 +534,7 @@
push ebp
Index: simd/jdsamss2.asm
===================================================================
---- simd/jdsamss2.asm (revision 733)
+--- simd/jdsamss2.asm (revision 829)
+++ simd/jdsamss2.asm (working copy)
@@ -22,7 +22,7 @@
SECTION SEG_CONST
@@ -583,7 +583,7 @@
push ebp
Index: simd/jiss2flt-64.asm
===================================================================
---- simd/jiss2flt-64.asm (revision 733)
+--- simd/jiss2flt-64.asm (revision 829)
+++ simd/jiss2flt-64.asm (working copy)
@@ -38,7 +38,7 @@
SECTION SEG_CONST
@@ -605,7 +605,7 @@
push rbp
Index: simd/jfss2int-64.asm
===================================================================
---- simd/jfss2int-64.asm (revision 733)
+--- simd/jfss2int-64.asm (revision 829)
+++ simd/jfss2int-64.asm (working copy)
@@ -67,7 +67,7 @@
SECTION SEG_CONST
@@ -627,7 +627,7 @@
push rbp
Index: simd/jcqnts2f.asm
===================================================================
---- simd/jcqnts2f.asm (revision 733)
+--- simd/jcqnts2f.asm (revision 829)
+++ simd/jcqnts2f.asm (working copy)
@@ -35,7 +35,7 @@
%define workspace ebp+16 ; FAST_FLOAT * workspace
@@ -649,7 +649,7 @@
push ebp
Index: simd/jdmrgss2.asm
===================================================================
---- simd/jdmrgss2.asm (revision 733)
+--- simd/jdmrgss2.asm (revision 829)
+++ simd/jdmrgss2.asm (working copy)
@@ -40,7 +40,7 @@
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
@@ -671,7 +671,7 @@
push ebp
Index: simd/jfmmxint.asm
===================================================================
---- simd/jfmmxint.asm (revision 733)
+--- simd/jfmmxint.asm (revision 829)
+++ simd/jfmmxint.asm (working copy)
@@ -66,7 +66,7 @@
SECTION SEG_CONST
@@ -693,7 +693,7 @@
push ebp
Index: simd/jcgryss2-64.asm
===================================================================
---- simd/jcgryss2-64.asm (revision 733)
+--- simd/jcgryss2-64.asm (revision 829)
+++ simd/jcgryss2-64.asm (working copy)
@@ -37,7 +37,7 @@
@@ -706,7 +706,7 @@
push rbp
Index: simd/jcqnts2i.asm
===================================================================
---- simd/jcqnts2i.asm (revision 733)
+--- simd/jcqnts2i.asm (revision 829)
+++ simd/jcqnts2i.asm (working copy)
@@ -35,7 +35,7 @@
%define workspace ebp+16 ; DCTELEM * workspace
@@ -728,7 +728,7 @@
push ebp
Index: simd/jiss2fst-64.asm
===================================================================
---- simd/jiss2fst-64.asm (revision 733)
+--- simd/jiss2fst-64.asm (revision 829)
+++ simd/jiss2fst-64.asm (working copy)
@@ -60,7 +60,7 @@
%define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -750,7 +750,7 @@
push rbp
Index: simd/jiss2flt.asm
===================================================================
---- simd/jiss2flt.asm (revision 733)
+--- simd/jiss2flt.asm (revision 829)
+++ simd/jiss2flt.asm (working copy)
@@ -37,7 +37,7 @@
SECTION SEG_CONST
@@ -772,7 +772,7 @@
push ebp
Index: simd/jiss2int.asm
===================================================================
---- simd/jiss2int.asm (revision 733)
+--- simd/jiss2int.asm (revision 829)
+++ simd/jiss2int.asm (working copy)
@@ -66,7 +66,7 @@
SECTION SEG_CONST
@@ -794,7 +794,7 @@
push ebp
Index: simd/jfsseflt-64.asm
===================================================================
---- simd/jfsseflt-64.asm (revision 733)
+--- simd/jfsseflt-64.asm (revision 829)
+++ simd/jfsseflt-64.asm (working copy)
@@ -38,7 +38,7 @@
SECTION SEG_CONST
@@ -816,7 +816,7 @@
push rbp
Index: simd/jccolss2-64.asm
===================================================================
---- simd/jccolss2-64.asm (revision 733)
+--- simd/jccolss2-64.asm (revision 829)
+++ simd/jccolss2-64.asm (working copy)
@@ -34,7 +34,7 @@
SECTION SEG_CONST
@@ -829,7 +829,7 @@
Index: simd/jcsamss2-64.asm
===================================================================
---- simd/jcsamss2-64.asm (revision 733)
+--- simd/jcsamss2-64.asm (revision 829)
+++ simd/jcsamss2-64.asm (working copy)
@@ -41,7 +41,7 @@
; r15 = JSAMPARRAY output_data
@@ -851,7 +851,7 @@
push rbp
Index: simd/jdclrss2-64.asm
===================================================================
---- simd/jdclrss2-64.asm (revision 733)
+--- simd/jdclrss2-64.asm (revision 829)
+++ simd/jdclrss2-64.asm (working copy)
@@ -39,7 +39,7 @@
%define WK_NUM 2
@@ -864,7 +864,7 @@
push rbp
Index: simd/jdcolmmx.asm
===================================================================
---- simd/jdcolmmx.asm (revision 733)
+--- simd/jdcolmmx.asm (revision 829)
+++ simd/jdcolmmx.asm (working copy)
@@ -35,7 +35,7 @@
SECTION SEG_CONST
@@ -877,7 +877,7 @@
Index: simd/jcclrmmx.asm
===================================================================
---- simd/jcclrmmx.asm (revision 733)
+--- simd/jcclrmmx.asm (revision 829)
+++ simd/jcclrmmx.asm (working copy)
@@ -40,7 +40,7 @@
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
@@ -890,7 +890,7 @@
push ebp
Index: simd/jfsseflt.asm
===================================================================
---- simd/jfsseflt.asm (revision 733)
+--- simd/jfsseflt.asm (revision 829)
+++ simd/jfsseflt.asm (working copy)
@@ -37,7 +37,7 @@
SECTION SEG_CONST
@@ -912,7 +912,7 @@
push ebp
Index: simd/jdmrgss2-64.asm
===================================================================
---- simd/jdmrgss2-64.asm (revision 733)
+--- simd/jdmrgss2-64.asm (revision 829)
+++ simd/jdmrgss2-64.asm (working copy)
@@ -39,7 +39,7 @@
%define WK_NUM 3
@@ -932,22 +932,9 @@
EXTN(jsimd_h2v2_merged_upsample_sse2):
push rbp
-Index: simd/jdmermmx.asm
-===================================================================
---- simd/jdmermmx.asm (revision 733)
-+++ simd/jdmermmx.asm (working copy)
-@@ -35,7 +35,7 @@
- SECTION SEG_CONST
-
- alignz 16
-- global EXTN(jconst_merged_upsample_mmx)
-+ global EXTN(jconst_merged_upsample_mmx) PRIVATE
-
- EXTN(jconst_merged_upsample_mmx):
-
Index: simd/jdcolss2.asm
===================================================================
---- simd/jdcolss2.asm (revision 733)
+--- simd/jdcolss2.asm (revision 829)
+++ simd/jdcolss2.asm (working copy)
@@ -35,7 +35,7 @@
SECTION SEG_CONST
@@ -958,9 +945,35 @@
EXTN(jconst_ycc_rgb_convert_sse2):
+Index: simd/jdmermmx.asm
+===================================================================
+--- simd/jdmermmx.asm (revision 829)
++++ simd/jdmermmx.asm (working copy)
+@@ -35,7 +35,7 @@
+ SECTION SEG_CONST
+
+ alignz 16
+- global EXTN(jconst_merged_upsample_mmx)
++ global EXTN(jconst_merged_upsample_mmx) PRIVATE
+
+ EXTN(jconst_merged_upsample_mmx):
+
+Index: simd/jcclrss2.asm
+===================================================================
+--- simd/jcclrss2.asm (revision 829)
++++ simd/jcclrss2.asm (working copy)
+@@ -38,7 +38,7 @@
+
+ align 16
+
+- global EXTN(jsimd_rgb_ycc_convert_sse2)
++ global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE
+
+ EXTN(jsimd_rgb_ycc_convert_sse2):
+ push ebp
Index: simd/jiss2red.asm
===================================================================
---- simd/jiss2red.asm (revision 733)
+--- simd/jiss2red.asm (revision 829)
+++ simd/jiss2red.asm (working copy)
@@ -72,7 +72,7 @@
SECTION SEG_CONST
@@ -989,22 +1002,9 @@
EXTN(jsimd_idct_2x2_sse2):
push ebp
-Index: simd/jcclrss2.asm
-===================================================================
---- simd/jcclrss2.asm (revision 733)
-+++ simd/jcclrss2.asm (working copy)
-@@ -38,7 +38,7 @@
-
- align 16
-
-- global EXTN(jsimd_rgb_ycc_convert_sse2)
-+ global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE
-
- EXTN(jsimd_rgb_ycc_convert_sse2):
- push ebp
Index: simd/jdmerss2.asm
===================================================================
---- simd/jdmerss2.asm (revision 733)
+--- simd/jdmerss2.asm (revision 829)
+++ simd/jdmerss2.asm (working copy)
@@ -35,7 +35,7 @@
SECTION SEG_CONST
@@ -1017,7 +1017,7 @@
Index: simd/jfss2fst-64.asm
===================================================================
---- simd/jfss2fst-64.asm (revision 733)
+--- simd/jfss2fst-64.asm (revision 829)
+++ simd/jfss2fst-64.asm (working copy)
@@ -53,7 +53,7 @@
%define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -1037,31 +1037,9 @@
EXTN(jsimd_fdct_ifast_sse2):
push rbp
-Index: simd/jimmxfst.asm
-===================================================================
---- simd/jimmxfst.asm (revision 733)
-+++ simd/jimmxfst.asm (working copy)
-@@ -59,7 +59,7 @@
- %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
-
- alignz 16
-- global EXTN(jconst_idct_ifast_mmx)
-+ global EXTN(jconst_idct_ifast_mmx) PRIVATE
-
- EXTN(jconst_idct_ifast_mmx):
-
-@@ -94,7 +94,7 @@
- ; JCOEF workspace[DCTSIZE2]
-
- align 16
-- global EXTN(jsimd_idct_ifast_mmx)
-+ global EXTN(jsimd_idct_ifast_mmx) PRIVATE
-
- EXTN(jsimd_idct_ifast_mmx):
- push ebp
Index: simd/jcqntmmx.asm
===================================================================
---- simd/jcqntmmx.asm (revision 733)
+--- simd/jcqntmmx.asm (revision 829)
+++ simd/jcqntmmx.asm (working copy)
@@ -35,7 +35,7 @@
%define workspace ebp+16 ; DCTELEM * workspace
@@ -1081,9 +1059,31 @@
EXTN(jsimd_quantize_mmx):
push ebp
+Index: simd/jimmxfst.asm
+===================================================================
+--- simd/jimmxfst.asm (revision 829)
++++ simd/jimmxfst.asm (working copy)
+@@ -59,7 +59,7 @@
+ %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
+
+ alignz 16
+- global EXTN(jconst_idct_ifast_mmx)
++ global EXTN(jconst_idct_ifast_mmx) PRIVATE
+
+ EXTN(jconst_idct_ifast_mmx):
+
+@@ -94,7 +94,7 @@
+ ; JCOEF workspace[DCTSIZE2]
+
+ align 16
+- global EXTN(jsimd_idct_ifast_mmx)
++ global EXTN(jsimd_idct_ifast_mmx) PRIVATE
+
+ EXTN(jsimd_idct_ifast_mmx):
+ push ebp
Index: simd/jfss2fst.asm
===================================================================
---- simd/jfss2fst.asm (revision 733)
+--- simd/jfss2fst.asm (revision 829)
+++ simd/jfss2fst.asm (working copy)
@@ -52,7 +52,7 @@
%define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -1105,7 +1105,7 @@
push ebp
Index: simd/jcgrammx.asm
===================================================================
---- simd/jcgrammx.asm (revision 733)
+--- simd/jcgrammx.asm (revision 829)
+++ simd/jcgrammx.asm (working copy)
@@ -33,7 +33,7 @@
SECTION SEG_CONST
@@ -1116,9 +1116,22 @@
EXTN(jconst_rgb_gray_convert_mmx):
+Index: simd/jdcolss2-64.asm
+===================================================================
+--- simd/jdcolss2-64.asm (revision 829)
++++ simd/jdcolss2-64.asm (working copy)
+@@ -35,7 +35,7 @@
+ SECTION SEG_CONST
+
+ alignz 16
+- global EXTN(jconst_ycc_rgb_convert_sse2)
++ global EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE
+
+ EXTN(jconst_ycc_rgb_convert_sse2):
+
Index: simd/jf3dnflt.asm
===================================================================
---- simd/jf3dnflt.asm (revision 733)
+--- simd/jf3dnflt.asm (revision 829)
+++ simd/jf3dnflt.asm (working copy)
@@ -27,7 +27,7 @@
SECTION SEG_CONST
@@ -1138,22 +1151,9 @@
EXTN(jsimd_fdct_float_3dnow):
push ebp
-Index: simd/jdcolss2-64.asm
-===================================================================
---- simd/jdcolss2-64.asm (revision 733)
-+++ simd/jdcolss2-64.asm (working copy)
-@@ -35,7 +35,7 @@
- SECTION SEG_CONST
-
- alignz 16
-- global EXTN(jconst_ycc_rgb_convert_sse2)
-+ global EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE
-
- EXTN(jconst_ycc_rgb_convert_sse2):
-
Index: simd/jdsamss2-64.asm
===================================================================
---- simd/jdsamss2-64.asm (revision 733)
+--- simd/jdsamss2-64.asm (revision 829)
+++ simd/jdsamss2-64.asm (working copy)
@@ -23,7 +23,7 @@
SECTION SEG_CONST
@@ -1202,7 +1202,7 @@
push rbp
Index: simd/jcgrass2.asm
===================================================================
---- simd/jcgrass2.asm (revision 733)
+--- simd/jcgrass2.asm (revision 829)
+++ simd/jcgrass2.asm (working copy)
@@ -30,7 +30,7 @@
SECTION SEG_CONST
@@ -1215,7 +1215,7 @@
Index: simd/jcsammmx.asm
===================================================================
---- simd/jcsammmx.asm (revision 733)
+--- simd/jcsammmx.asm (revision 829)
+++ simd/jcsammmx.asm (working copy)
@@ -40,7 +40,7 @@
%define output_data(b) (b)+28 ; JSAMPARRAY output_data
@@ -1235,9 +1235,82 @@
EXTN(jsimd_h2v2_downsample_mmx):
push ebp
+Index: simd/jsimd_i386.c
+===================================================================
+--- simd/jsimd_i386.c (revision 829)
++++ simd/jsimd_i386.c (working copy)
+@@ -61,6 +61,7 @@
+ simd_support &= JSIMD_SSE2;
+ }
+
++#ifndef JPEG_DECODE_ONLY
+ GLOBAL(int)
+ jsimd_can_rgb_ycc (void)
+ {
+@@ -82,6 +83,7 @@
+
+ return 0;
+ }
++#endif
+
+ GLOBAL(int)
+ jsimd_can_rgb_gray (void)
+@@ -127,6 +129,7 @@
+ return 0;
+ }
+
++#ifndef JPEG_DECODE_ONLY
+ GLOBAL(void)
+ jsimd_rgb_ycc_convert (j_compress_ptr cinfo,
+ JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
+@@ -179,6 +182,7 @@
+ mmxfct(cinfo->image_width, input_buf,
+ output_buf, output_row, num_rows);
+ }
++#endif
+
+ GLOBAL(void)
+ jsimd_rgb_gray_convert (j_compress_ptr cinfo,
+@@ -286,6 +290,7 @@
+ input_row, output_buf, num_rows);
+ }
+
++#ifndef JPEG_DECODE_ONLY
+ GLOBAL(int)
+ jsimd_can_h2v2_downsample (void)
+ {
+@@ -351,6 +356,7 @@
+ compptr->v_samp_factor, compptr->width_in_blocks,
+ input_data, output_data);
+ }
++#endif
+
+ GLOBAL(int)
+ jsimd_can_h2v2_upsample (void)
+@@ -636,6 +642,7 @@
+ in_row_group_ctr, output_buf);
+ }
+
++#ifndef JPEG_DECODE_ONLY
+ GLOBAL(int)
+ jsimd_can_convsamp (void)
+ {
+@@ -855,6 +862,7 @@
+ else if (simd_support & JSIMD_3DNOW)
+ jsimd_quantize_float_3dnow(coef_block, divisors, workspace);
+ }
++#endif
+
+ GLOBAL(int)
+ jsimd_can_idct_2x2 (void)
+@@ -1045,4 +1053,3 @@
+ jsimd_idct_float_3dnow(compptr->dct_table, coef_block,
+ output_buf, output_col);
+ }
+-
Index: simd/jcqnts2f-64.asm
===================================================================
---- simd/jcqnts2f-64.asm (revision 733)
+--- simd/jcqnts2f-64.asm (revision 829)
+++ simd/jcqnts2f-64.asm (working copy)
@@ -36,7 +36,7 @@
; r12 = FAST_FLOAT * workspace
@@ -1259,7 +1332,7 @@
push rbp
Index: simd/jcqnt3dn.asm
===================================================================
---- simd/jcqnt3dn.asm (revision 733)
+--- simd/jcqnt3dn.asm (revision 829)
+++ simd/jcqnt3dn.asm (working copy)
@@ -35,7 +35,7 @@
%define workspace ebp+16 ; FAST_FLOAT * workspace
@@ -1281,7 +1354,7 @@
push ebp
Index: simd/jcsamss2.asm
===================================================================
---- simd/jcsamss2.asm (revision 733)
+--- simd/jcsamss2.asm (revision 829)
+++ simd/jcsamss2.asm (working copy)
@@ -40,7 +40,7 @@
%define output_data(b) (b)+28 ; JSAMPARRAY output_data
@@ -1301,9 +1374,82 @@
EXTN(jsimd_h2v2_downsample_sse2):
push ebp
+Index: simd/jsimd_x86_64.c
+===================================================================
+--- simd/jsimd_x86_64.c (revision 829)
++++ simd/jsimd_x86_64.c (working copy)
+@@ -29,6 +29,7 @@
+
+ #define IS_ALIGNED_SSE(ptr) (IS_ALIGNED(ptr, 4)) /* 16 byte alignment */
+
++#ifndef JPEG_DECODE_ONLY
+ GLOBAL(int)
+ jsimd_can_rgb_ycc (void)
+ {
+@@ -45,6 +46,7 @@
+
+ return 1;
+ }
++#endif
+
+ GLOBAL(int)
+ jsimd_can_rgb_gray (void)
+@@ -80,6 +82,7 @@
+ return 1;
+ }
+
++#ifndef JPEG_DECODE_ONLY
+ GLOBAL(void)
+ jsimd_rgb_ycc_convert (j_compress_ptr cinfo,
+ JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
+@@ -118,6 +121,7 @@
+
+ sse2fct(cinfo->image_width, input_buf, output_buf, output_row, num_rows);
+ }
++#endif
+
+ GLOBAL(void)
+ jsimd_rgb_gray_convert (j_compress_ptr cinfo,
+@@ -197,6 +201,7 @@
+ sse2fct(cinfo->output_width, input_buf, input_row, output_buf, num_rows);
+ }
+
++#ifndef JPEG_DECODE_ONLY
+ GLOBAL(int)
+ jsimd_can_h2v2_downsample (void)
+ {
+@@ -242,6 +247,7 @@
+ compptr->width_in_blocks,
+ input_data, output_data);
+ }
++#endif
+
+ GLOBAL(int)
+ jsimd_can_h2v2_upsample (void)
+@@ -451,6 +457,7 @@
+ sse2fct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf);
+ }
+
++#ifndef JPEG_DECODE_ONLY
+ GLOBAL(int)
+ jsimd_can_convsamp (void)
+ {
+@@ -601,6 +608,7 @@
+ {
+ jsimd_quantize_float_sse2(coef_block, divisors, workspace);
+ }
++#endif
+
+ GLOBAL(int)
+ jsimd_can_idct_2x2 (void)
+@@ -750,4 +758,3 @@
+ jsimd_idct_float_sse2(compptr->dct_table, coef_block,
+ output_buf, output_col);
+ }
+-
Index: simd/jimmxint.asm
===================================================================
---- simd/jimmxint.asm (revision 733)
+--- simd/jimmxint.asm (revision 829)
+++ simd/jimmxint.asm (working copy)
@@ -66,7 +66,7 @@
SECTION SEG_CONST
@@ -1325,7 +1471,7 @@
push ebp
Index: simd/jcgrymmx.asm
===================================================================
---- simd/jcgrymmx.asm (revision 733)
+--- simd/jcgrymmx.asm (revision 829)
+++ simd/jcgrymmx.asm (working copy)
@@ -41,7 +41,7 @@
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
@@ -1338,7 +1484,7 @@
push ebp
Index: simd/jfss2int.asm
===================================================================
---- simd/jfss2int.asm (revision 733)
+--- simd/jfss2int.asm (revision 829)
+++ simd/jfss2int.asm (working copy)
@@ -66,7 +66,7 @@
SECTION SEG_CONST
@@ -1360,7 +1506,7 @@
push ebp
Index: simd/jcgryss2.asm
===================================================================
---- simd/jcgryss2.asm (revision 733)
+--- simd/jcgryss2.asm (revision 829)
+++ simd/jcgryss2.asm (working copy)
@@ -39,7 +39,7 @@
@@ -1373,7 +1519,7 @@
push ebp
Index: simd/jccolmmx.asm
===================================================================
---- simd/jccolmmx.asm (revision 733)
+--- simd/jccolmmx.asm (revision 829)
+++ simd/jccolmmx.asm (working copy)
@@ -37,7 +37,7 @@
SECTION SEG_CONST
@@ -1386,7 +1532,7 @@
Index: simd/jimmxred.asm
===================================================================
---- simd/jimmxred.asm (revision 733)
+--- simd/jimmxred.asm (revision 829)
+++ simd/jimmxred.asm (working copy)
@@ -72,7 +72,7 @@
SECTION SEG_CONST
@@ -1417,7 +1563,7 @@
push ebp
Index: simd/jsimdext.inc
===================================================================
---- simd/jsimdext.inc (revision 733)
+--- simd/jsimdext.inc (revision 829)
+++ simd/jsimdext.inc (working copy)
@@ -73,6 +73,9 @@
; * *BSD family Unix using elf format
@@ -1446,7 +1592,7 @@
; --------------------------------------------------------------------------
Index: simd/jdclrmmx.asm
===================================================================
---- simd/jdclrmmx.asm (revision 733)
+--- simd/jdclrmmx.asm (revision 829)
+++ simd/jdclrmmx.asm (working copy)
@@ -40,7 +40,7 @@
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
@@ -1459,7 +1605,7 @@
push ebp
Index: simd/jccolss2.asm
===================================================================
---- simd/jccolss2.asm (revision 733)
+--- simd/jccolss2.asm (revision 829)
+++ simd/jccolss2.asm (working copy)
@@ -34,7 +34,7 @@
SECTION SEG_CONST
@@ -1472,7 +1618,7 @@
Index: simd/jisseflt.asm
===================================================================
---- simd/jisseflt.asm (revision 733)
+--- simd/jisseflt.asm (revision 829)
+++ simd/jisseflt.asm (working copy)
@@ -37,7 +37,7 @@
SECTION SEG_CONST
@@ -1494,7 +1640,7 @@
push ebp
Index: simd/jcqnts2i-64.asm
===================================================================
---- simd/jcqnts2i-64.asm (revision 733)
+--- simd/jcqnts2i-64.asm (revision 829)
+++ simd/jcqnts2i-64.asm (working copy)
@@ -36,7 +36,7 @@
; r12 = DCTELEM * workspace
@@ -1516,7 +1662,7 @@
push rbp
Index: simd/jdclrss2.asm
===================================================================
---- simd/jdclrss2.asm (revision 733)
+--- simd/jdclrss2.asm (revision 829)
+++ simd/jdclrss2.asm (working copy)
@@ -40,7 +40,7 @@
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
@@ -1529,7 +1675,7 @@
push ebp
Index: simd/jcqntsse.asm
===================================================================
---- simd/jcqntsse.asm (revision 733)
+--- simd/jcqntsse.asm (revision 829)
+++ simd/jcqntsse.asm (working copy)
@@ -35,7 +35,7 @@
%define workspace ebp+16 ; FAST_FLOAT * workspace
@@ -1551,7 +1697,7 @@
push ebp
Index: simd/jiss2int-64.asm
===================================================================
---- simd/jiss2int-64.asm (revision 733)
+--- simd/jiss2int-64.asm (revision 829)
+++ simd/jiss2int-64.asm (working copy)
@@ -67,7 +67,7 @@
SECTION SEG_CONST
@@ -1573,7 +1719,7 @@
push rbp
Index: simd/jfmmxfst.asm
===================================================================
---- simd/jfmmxfst.asm (revision 733)
+--- simd/jfmmxfst.asm (revision 829)
+++ simd/jfmmxfst.asm (working copy)
@@ -52,7 +52,7 @@
%define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
@@ -1595,7 +1741,7 @@
push ebp
Index: jdarith.c
===================================================================
---- jdarith.c (revision 733)
+--- jdarith.c (revision 829)
+++ jdarith.c (working copy)
@@ -150,8 +150,8 @@
*/
@@ -1610,7 +1756,7 @@
temp = e->a - qe;
Index: jdhuff.c
===================================================================
---- jdhuff.c (revision 733)
+--- jdhuff.c (revision 829)
+++ jdhuff.c (working copy)
@@ -742,7 +742,7 @@
* this module, since we'll just re-assign them on the next call.)
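
The recurring change throughout google.patch is the PRIVATE marker appended to the assembly "global" directives (together with the new JPEG_DECODE_ONLY guards around the compression-side hooks in jsimd_i386.c and jsimd_x86_64.c), which appears to serve the same purpose as hidden symbol visibility: the SIMD symbols stay usable inside the library but are not exported from the final binary. A rough C analogue, with hypothetical names:

    #if defined(__GNUC__)
    #define JPEG_PRIVATE __attribute__((visibility("hidden")))
    #else
    #define JPEG_PRIVATE
    #endif

    /* Hypothetical internal helper: visible to the rest of the library's
     * objects, but not exported from the shared library or executable. */
    JPEG_PRIVATE void jsimd_example_helper(void)
    {
    }
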
diff --git a/jccolext.c b/jccolext.c
index acbfa23..dbac84a 100644
--- a/jccolext.c
+++ b/jccolext.c
@@ -2,7 +2,7 @@
* jccolext.c
*
* Copyright (C) 1991-1996, Thomas G. Lane.
- * Copyright (C) 2009-2011, D. R. Commander.
+ * Copyright (C) 2009-2012, D. R. Commander.
* This file is part of the Independent JPEG Group's software.
* For conditions of distribution and use, see the accompanying README file.
*
@@ -112,3 +112,35 @@
}
}
}
+
+
+/*
+ * Convert some rows of samples to the JPEG colorspace.
+ * This version handles extended RGB->plain RGB conversion
+ */
+
+INLINE
+LOCAL(void)
+rgb_rgb_convert_internal (j_compress_ptr cinfo,
+ JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
+ JDIMENSION output_row, int num_rows)
+{
+ register JSAMPROW inptr;
+ register JSAMPROW outptr0, outptr1, outptr2;
+ register JDIMENSION col;
+ JDIMENSION num_cols = cinfo->image_width;
+
+ while (--num_rows >= 0) {
+ inptr = *input_buf++;
+ outptr0 = output_buf[0][output_row];
+ outptr1 = output_buf[1][output_row];
+ outptr2 = output_buf[2][output_row];
+ output_row++;
+ for (col = 0; col < num_cols; col++) {
+ outptr0[col] = GETJSAMPLE(inptr[RGB_RED]);
+ outptr1[col] = GETJSAMPLE(inptr[RGB_GREEN]);
+ outptr2[col] = GETJSAMPLE(inptr[RGB_BLUE]);
+ inptr += RGB_PIXELSIZE;
+ }
+ }
+}
diff --git a/jccolor.c b/jccolor.c
index 0d8910a..3a0772b 100644
--- a/jccolor.c
+++ b/jccolor.c
@@ -3,7 +3,7 @@
*
* Copyright (C) 1991-1996, Thomas G. Lane.
* Copyright 2009 Pierre Ossman <[email protected]> for Cendio AB
- * Copyright (C) 2009-2011, D. R. Commander.
+ * Copyright (C) 2009-2012, D. R. Commander.
* This file is part of the Independent JPEG Group's software.
* For conditions of distribution and use, see the accompanying README file.
*
@@ -14,6 +14,7 @@
#include "jinclude.h"
#include "jpeglib.h"
#include "jsimd.h"
+#include "config.h"
/* Private subobject */
@@ -95,6 +96,7 @@
#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
#define rgb_ycc_convert_internal extrgb_ycc_convert_internal
#define rgb_gray_convert_internal extrgb_gray_convert_internal
+#define rgb_rgb_convert_internal extrgb_rgb_convert_internal
#include "jccolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -102,6 +104,7 @@
#undef RGB_PIXELSIZE
#undef rgb_ycc_convert_internal
#undef rgb_gray_convert_internal
+#undef rgb_rgb_convert_internal
#define RGB_RED EXT_RGBX_RED
#define RGB_GREEN EXT_RGBX_GREEN
@@ -109,6 +112,7 @@
#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
#define rgb_ycc_convert_internal extrgbx_ycc_convert_internal
#define rgb_gray_convert_internal extrgbx_gray_convert_internal
+#define rgb_rgb_convert_internal extrgbx_rgb_convert_internal
#include "jccolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -116,6 +120,7 @@
#undef RGB_PIXELSIZE
#undef rgb_ycc_convert_internal
#undef rgb_gray_convert_internal
+#undef rgb_rgb_convert_internal
#define RGB_RED EXT_BGR_RED
#define RGB_GREEN EXT_BGR_GREEN
@@ -123,6 +128,7 @@
#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
#define rgb_ycc_convert_internal extbgr_ycc_convert_internal
#define rgb_gray_convert_internal extbgr_gray_convert_internal
+#define rgb_rgb_convert_internal extbgr_rgb_convert_internal
#include "jccolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -130,6 +136,7 @@
#undef RGB_PIXELSIZE
#undef rgb_ycc_convert_internal
#undef rgb_gray_convert_internal
+#undef rgb_rgb_convert_internal
#define RGB_RED EXT_BGRX_RED
#define RGB_GREEN EXT_BGRX_GREEN
@@ -137,6 +144,7 @@
#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
#define rgb_ycc_convert_internal extbgrx_ycc_convert_internal
#define rgb_gray_convert_internal extbgrx_gray_convert_internal
+#define rgb_rgb_convert_internal extbgrx_rgb_convert_internal
#include "jccolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -144,6 +152,7 @@
#undef RGB_PIXELSIZE
#undef rgb_ycc_convert_internal
#undef rgb_gray_convert_internal
+#undef rgb_rgb_convert_internal
#define RGB_RED EXT_XBGR_RED
#define RGB_GREEN EXT_XBGR_GREEN
@@ -151,6 +160,7 @@
#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
#define rgb_ycc_convert_internal extxbgr_ycc_convert_internal
#define rgb_gray_convert_internal extxbgr_gray_convert_internal
+#define rgb_rgb_convert_internal extxbgr_rgb_convert_internal
#include "jccolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -158,6 +168,7 @@
#undef RGB_PIXELSIZE
#undef rgb_ycc_convert_internal
#undef rgb_gray_convert_internal
+#undef rgb_rgb_convert_internal
#define RGB_RED EXT_XRGB_RED
#define RGB_GREEN EXT_XRGB_GREEN
@@ -165,6 +176,7 @@
#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
#define rgb_ycc_convert_internal extxrgb_ycc_convert_internal
#define rgb_gray_convert_internal extxrgb_gray_convert_internal
+#define rgb_rgb_convert_internal extxrgb_rgb_convert_internal
#include "jccolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -172,6 +184,7 @@
#undef RGB_PIXELSIZE
#undef rgb_ycc_convert_internal
#undef rgb_gray_convert_internal
+#undef rgb_rgb_convert_internal
/*
@@ -306,6 +319,52 @@
/*
+ * Extended RGB to plain RGB conversion
+ */
+
+METHODDEF(void)
+rgb_rgb_convert (j_compress_ptr cinfo,
+ JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
+ JDIMENSION output_row, int num_rows)
+{
+ switch (cinfo->in_color_space) {
+ case JCS_EXT_RGB:
+ extrgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
+ num_rows);
+ break;
+ case JCS_EXT_RGBX:
+ case JCS_EXT_RGBA:
+ extrgbx_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
+ num_rows);
+ break;
+ case JCS_EXT_BGR:
+ extbgr_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
+ num_rows);
+ break;
+ case JCS_EXT_BGRX:
+ case JCS_EXT_BGRA:
+ extbgrx_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
+ num_rows);
+ break;
+ case JCS_EXT_XBGR:
+ case JCS_EXT_ABGR:
+ extxbgr_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
+ num_rows);
+ break;
+ case JCS_EXT_XRGB:
+ case JCS_EXT_ARGB:
+ extxrgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
+ num_rows);
+ break;
+ default:
+ rgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
+ num_rows);
+ break;
+ }
+}
+
+
+/*
* Convert some rows of samples to the JPEG colorspace.
* This version handles Adobe-style CMYK->YCCK conversion,
* where we convert R=1-C, G=1-M, and B=1-Y to YCbCr using the same
@@ -522,21 +581,25 @@
break;
case JCS_RGB:
- case JCS_EXT_RGB:
- case JCS_EXT_RGBX:
- case JCS_EXT_BGR:
- case JCS_EXT_BGRX:
- case JCS_EXT_XBGR:
- case JCS_EXT_XRGB:
- case JCS_EXT_RGBA:
- case JCS_EXT_BGRA:
- case JCS_EXT_ABGR:
- case JCS_EXT_ARGB:
if (cinfo->num_components != 3)
ERREXIT(cinfo, JERR_BAD_J_COLORSPACE);
- if (cinfo->in_color_space == cinfo->jpeg_color_space &&
- rgb_pixelsize[cinfo->in_color_space] == 3)
+ if (rgb_red[cinfo->in_color_space] == 0 &&
+ rgb_green[cinfo->in_color_space] == 1 &&
+ rgb_blue[cinfo->in_color_space] == 2 &&
+ rgb_pixelsize[cinfo->in_color_space] == 3)
cconvert->pub.color_convert = null_convert;
+ else if (cinfo->in_color_space == JCS_RGB ||
+ cinfo->in_color_space == JCS_EXT_RGB ||
+ cinfo->in_color_space == JCS_EXT_RGBX ||
+ cinfo->in_color_space == JCS_EXT_BGR ||
+ cinfo->in_color_space == JCS_EXT_BGRX ||
+ cinfo->in_color_space == JCS_EXT_XBGR ||
+ cinfo->in_color_space == JCS_EXT_XRGB ||
+ cinfo->in_color_space == JCS_EXT_RGBA ||
+ cinfo->in_color_space == JCS_EXT_BGRA ||
+ cinfo->in_color_space == JCS_EXT_ABGR ||
+ cinfo->in_color_space == JCS_EXT_ARGB)
+ cconvert->pub.color_convert = rgb_rgb_convert;
else
ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL);
break;
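
For reference, a minimal usage sketch (not part of the patch) of the compressor path added above: with the JPEG colorspace forced to JCS_RGB via jpeg_set_colorspace() and in_color_space set to one of the colorspace extensions, the selection code now resolves color_convert to rgb_rgb_convert instead of failing with JERR_CONVERSION_NOTIMPL. The buffer layout and function name here are illustrative assumptions; all libjpeg calls are the standard API.

    #include <stdio.h>
    #include "jpeglib.h"

    /* Sketch: compress a BGRA pixel buffer into an RGB (non-YCbCr) JPEG,
     * exercising the new rgb_rgb_convert path in jccolor.c. */
    static void write_rgb_jpeg_from_bgra(JSAMPLE *bgra, JDIMENSION width,
                                         JDIMENSION height, FILE *outfile)
    {
      struct jpeg_compress_struct cinfo;
      struct jpeg_error_mgr jerr;
      JSAMPROW row;

      cinfo.err = jpeg_std_error(&jerr);
      jpeg_create_compress(&cinfo);
      jpeg_stdio_dest(&cinfo, outfile);

      cinfo.image_width = width;
      cinfo.image_height = height;
      cinfo.input_components = 4;           /* B, G, R, A (A ignored on input) */
      cinfo.in_color_space = JCS_EXT_BGRA;  /* colorspace extension */
      jpeg_set_defaults(&cinfo);
      jpeg_set_colorspace(&cinfo, JCS_RGB); /* store RGB, not YCbCr */

      jpeg_start_compress(&cinfo, TRUE);
      while (cinfo.next_scanline < cinfo.image_height) {
        row = &bgra[(size_t)cinfo.next_scanline * width * 4];
        jpeg_write_scanlines(&cinfo, &row, 1);
      }
      jpeg_finish_compress(&cinfo);
      jpeg_destroy_compress(&cinfo);
    }
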
diff --git a/jcdctmgr.c b/jcdctmgr.c
index 711f9da..12f8872 100644
--- a/jcdctmgr.c
+++ b/jcdctmgr.c
@@ -182,7 +182,7 @@
/* fq will be one bit too large to fit in DCTELEM, so adjust */
fq >>= 1;
r--;
- } else if (fr <= (divisor / 2)) { /* fractional part is < 0.5 */
+ } else if (fr <= (divisor / 2U)) { /* fractional part is < 0.5 */
c++;
} else { /* fractional part is > 0.5 */
fq++;
diff --git a/jconfig.h b/jconfig.h
index 794edef..aa18469 100644
--- a/jconfig.h
+++ b/jconfig.h
@@ -2,9 +2,10 @@
/* Version ID for the JPEG library.
* Might be useful for tests like "#if JPEG_LIB_VERSION >= 60".
*/
-#ifndef JPEG_LIB_VERSION
#define JPEG_LIB_VERSION 62
-#endif /* JPEG_LIB_VERSION */
+
+/* libjpeg-turbo version */
+#define LIBJPEG_TURBO_VERSION 1.2.80
/* Support arithmetic encoding */
/* #undef C_ARITH_CODING_SUPPORTED */
@@ -13,44 +14,23 @@
/* #undef D_ARITH_CODING_SUPPORTED */
/* Define if your compiler supports prototypes */
-#ifndef HAVE_PROTOTYPES
#define HAVE_PROTOTYPES 1
-#endif /* HAVE_PROTOTYPES */
/* Define to 1 if you have the <stddef.h> header file. */
-#ifndef HAVE_STDDEF_H
#define HAVE_STDDEF_H 1
-#endif /* HAVE_STDDEF_H */
/* Define to 1 if you have the <stdlib.h> header file. */
-#ifndef HAVE_STDLIB_H
#define HAVE_STDLIB_H 1
-#endif /* HAVE_STDLIB_H */
/* Define to 1 if the system has the type `unsigned char'. */
-#ifndef HAVE_UNSIGNED_CHAR
#define HAVE_UNSIGNED_CHAR 1
-#endif /* HAVE_UNSIGNED_CHAR */
/* Define to 1 if the system has the type `unsigned short'. */
-#ifndef HAVE_UNSIGNED_SHORT
#define HAVE_UNSIGNED_SHORT 1
-#endif /* HAVE_UNSIGNED_SHORT */
/* Define if you want use complete types */
/* #undef INCOMPLETE_TYPES_BROKEN */
-/* How to obtain function inlining. */
-#ifndef INLINE
-#if defined(__GNUC__)
-#define INLINE __attribute__((always_inline))
-#elif defined(_MSC_VER)
-#define INLINE __forceinline
-#else
-#define INLINE
-#endif
-#endif
-
/* Define if you have BSD-like bzero and bcopy */
/* #undef NEED_BSD_STRINGS */
@@ -74,11 +54,5 @@
/* Define to empty if `const' does not conform to ANSI C. */
/* #undef const */
-/* Define to `__inline__' or `__inline' if that's what the C compiler
- calls it, or to nothing if 'inline' is not supported under any name. */
-#ifndef __cplusplus
-/* #undef inline */
-#endif
-
/* Define to `unsigned int' if <sys/types.h> does not define. */
/* #undef size_t */
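
Note that the new LIBJPEG_TURBO_VERSION macro above expands to the bare token 1.2.80 rather than a string or integer, so an application that wants to report it at build time would typically stringify it. A small sketch; the helper macro names are illustrative and not part of the library:

    #include <stdio.h>
    #include "jpeglib.h"   /* pulls in jconfig.h */

    /* Illustrative two-step stringification of the bare version token. */
    #define LJT_STRINGIFY_(x) #x
    #define LJT_STRINGIFY(x)  LJT_STRINGIFY_(x)

    int main(void)
    {
    #ifdef LIBJPEG_TURBO_VERSION
      printf("built against libjpeg-turbo %s\n",
             LJT_STRINGIFY(LIBJPEG_TURBO_VERSION));
    #endif
      return 0;
    }
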
diff --git a/jdcolext.c b/jdcolext.c
index 07da949..3b8aeff 100644
--- a/jdcolext.c
+++ b/jdcolext.c
@@ -102,3 +102,40 @@
}
}
}
+
+
+/*
+ * Convert RGB to extended RGB: just swap the order of source pixels
+ */
+
+INLINE
+LOCAL(void)
+rgb_rgb_convert_internal (j_decompress_ptr cinfo,
+ JSAMPIMAGE input_buf, JDIMENSION input_row,
+ JSAMPARRAY output_buf, int num_rows)
+{
+ register JSAMPROW inptr0, inptr1, inptr2;
+ register JSAMPROW outptr;
+ register JDIMENSION col;
+ JDIMENSION num_cols = cinfo->output_width;
+
+ while (--num_rows >= 0) {
+ inptr0 = input_buf[0][input_row];
+ inptr1 = input_buf[1][input_row];
+ inptr2 = input_buf[2][input_row];
+ input_row++;
+ outptr = *output_buf++;
+ for (col = 0; col < num_cols; col++) {
+ /* We can dispense with GETJSAMPLE() here */
+ outptr[RGB_RED] = inptr0[col];
+ outptr[RGB_GREEN] = inptr1[col];
+ outptr[RGB_BLUE] = inptr2[col];
+ /* Set unused byte to 0xFF so it can be interpreted as an opaque */
+ /* alpha channel value */
+#ifdef RGB_ALPHA
+ outptr[RGB_ALPHA] = 0xFF;
+#endif
+ outptr += RGB_PIXELSIZE;
+ }
+ }
+}
diff --git a/jdcolor.c b/jdcolor.c
index a9a9220..694de9b 100644
--- a/jdcolor.c
+++ b/jdcolor.c
@@ -3,7 +3,7 @@
*
* Copyright (C) 1991-1997, Thomas G. Lane.
 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
- * Copyright (C) 2009, 2011, D. R. Commander.
+ * Copyright (C) 2009, 2011-2012, D. R. Commander.
* This file is part of the Independent JPEG Group's software.
* For conditions of distribution and use, see the accompanying README file.
*
@@ -14,6 +14,7 @@
#include "jinclude.h"
#include "jpeglib.h"
#include "jsimd.h"
+#include "config.h"
/* Private subobject */
@@ -79,6 +80,7 @@
#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
#define ycc_rgb_convert_internal ycc_extrgb_convert_internal
#define gray_rgb_convert_internal gray_extrgb_convert_internal
+#define rgb_rgb_convert_internal rgb_extrgb_convert_internal
#include "jdcolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -86,6 +88,7 @@
#undef RGB_PIXELSIZE
#undef ycc_rgb_convert_internal
#undef gray_rgb_convert_internal
+#undef rgb_rgb_convert_internal
#define RGB_RED EXT_RGBX_RED
#define RGB_GREEN EXT_RGBX_GREEN
@@ -94,6 +97,7 @@
#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
#define ycc_rgb_convert_internal ycc_extrgbx_convert_internal
#define gray_rgb_convert_internal gray_extrgbx_convert_internal
+#define rgb_rgb_convert_internal rgb_extrgbx_convert_internal
#include "jdcolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -102,6 +106,7 @@
#undef RGB_PIXELSIZE
#undef ycc_rgb_convert_internal
#undef gray_rgb_convert_internal
+#undef rgb_rgb_convert_internal
#define RGB_RED EXT_BGR_RED
#define RGB_GREEN EXT_BGR_GREEN
@@ -109,6 +114,7 @@
#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
#define ycc_rgb_convert_internal ycc_extbgr_convert_internal
#define gray_rgb_convert_internal gray_extbgr_convert_internal
+#define rgb_rgb_convert_internal rgb_extbgr_convert_internal
#include "jdcolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -116,6 +122,7 @@
#undef RGB_PIXELSIZE
#undef ycc_rgb_convert_internal
#undef gray_rgb_convert_internal
+#undef rgb_rgb_convert_internal
#define RGB_RED EXT_BGRX_RED
#define RGB_GREEN EXT_BGRX_GREEN
@@ -124,6 +131,7 @@
#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
#define ycc_rgb_convert_internal ycc_extbgrx_convert_internal
#define gray_rgb_convert_internal gray_extbgrx_convert_internal
+#define rgb_rgb_convert_internal rgb_extbgrx_convert_internal
#include "jdcolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -132,6 +140,7 @@
#undef RGB_PIXELSIZE
#undef ycc_rgb_convert_internal
#undef gray_rgb_convert_internal
+#undef rgb_rgb_convert_internal
#define RGB_RED EXT_XBGR_RED
#define RGB_GREEN EXT_XBGR_GREEN
@@ -140,6 +149,7 @@
#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
#define ycc_rgb_convert_internal ycc_extxbgr_convert_internal
#define gray_rgb_convert_internal gray_extxbgr_convert_internal
+#define rgb_rgb_convert_internal rgb_extxbgr_convert_internal
#include "jdcolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -148,6 +158,7 @@
#undef RGB_PIXELSIZE
#undef ycc_rgb_convert_internal
#undef gray_rgb_convert_internal
+#undef rgb_rgb_convert_internal
#define RGB_RED EXT_XRGB_RED
#define RGB_GREEN EXT_XRGB_GREEN
@@ -156,6 +167,7 @@
#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
#define ycc_rgb_convert_internal ycc_extxrgb_convert_internal
#define gray_rgb_convert_internal gray_extxrgb_convert_internal
+#define rgb_rgb_convert_internal rgb_extxrgb_convert_internal
#include "jdcolext.c"
#undef RGB_RED
#undef RGB_GREEN
@@ -164,6 +176,7 @@
#undef RGB_PIXELSIZE
#undef ycc_rgb_convert_internal
#undef gray_rgb_convert_internal
+#undef rgb_rgb_convert_internal
/*
@@ -352,6 +365,51 @@
/*
+ * Convert plain RGB to extended RGB
+ */
+
+METHODDEF(void)
+rgb_rgb_convert (j_decompress_ptr cinfo,
+ JSAMPIMAGE input_buf, JDIMENSION input_row,
+ JSAMPARRAY output_buf, int num_rows)
+{
+ switch (cinfo->out_color_space) {
+ case JCS_EXT_RGB:
+ rgb_extrgb_convert_internal(cinfo, input_buf, input_row, output_buf,
+ num_rows);
+ break;
+ case JCS_EXT_RGBX:
+ case JCS_EXT_RGBA:
+ rgb_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf,
+ num_rows);
+ break;
+ case JCS_EXT_BGR:
+ rgb_extbgr_convert_internal(cinfo, input_buf, input_row, output_buf,
+ num_rows);
+ break;
+ case JCS_EXT_BGRX:
+ case JCS_EXT_BGRA:
+ rgb_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf,
+ num_rows);
+ break;
+ case JCS_EXT_XBGR:
+ case JCS_EXT_ABGR:
+ rgb_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf,
+ num_rows);
+ break;
+ case JCS_EXT_XRGB:
+ case JCS_EXT_ARGB:
+ rgb_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf,
+ num_rows);
+ break;
+ default:
+ rgb_rgb_convert_internal(cinfo, input_buf, input_row, output_buf,
+ num_rows);
+ break;
+ }
+}
+
+/*
* Adobe-style YCCK->CMYK conversion.
* We convert YCbCr to R=1-C, G=1-M, and B=1-Y using the same
* conversion as above, while passing K (black) unchanged.
@@ -493,9 +551,14 @@
}
} else if (cinfo->jpeg_color_space == JCS_GRAYSCALE) {
cconvert->pub.color_convert = gray_rgb_convert;
- } else if (cinfo->jpeg_color_space == cinfo->out_color_space &&
- rgb_pixelsize[cinfo->out_color_space] == 3) {
- cconvert->pub.color_convert = null_convert;
+ } else if (cinfo->jpeg_color_space == JCS_RGB) {
+ if (rgb_red[cinfo->out_color_space] == 0 &&
+ rgb_green[cinfo->out_color_space] == 1 &&
+ rgb_blue[cinfo->out_color_space] == 2 &&
+ rgb_pixelsize[cinfo->out_color_space] == 3)
+ cconvert->pub.color_convert = null_convert;
+ else
+ cconvert->pub.color_convert = rgb_rgb_convert;
} else
ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL);
break;
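
The decompression side is symmetric: for a JPEG whose stored colorspace is RGB, requesting one of the colorspace extensions as out_color_space now routes through rgb_rgb_convert rather than erroring out. A minimal sketch (error handling and the missing-allocation check are elided; the function name is an illustrative assumption):

    #include <stdio.h>
    #include <stdlib.h>
    #include "jpeglib.h"

    /* Sketch: decode an RGB-colorspace JPEG directly into a BGRA buffer,
     * exercising the new rgb_rgb_convert path in jdcolor.c.  The caller
     * owns the returned buffer (width * height * 4 bytes). */
    static unsigned char *read_rgb_jpeg_as_bgra(FILE *infile,
                                                JDIMENSION *width,
                                                JDIMENSION *height)
    {
      struct jpeg_decompress_struct dinfo;
      struct jpeg_error_mgr jerr;
      unsigned char *pixels;
      JSAMPROW row;

      dinfo.err = jpeg_std_error(&jerr);
      jpeg_create_decompress(&dinfo);
      jpeg_stdio_src(&dinfo, infile);
      jpeg_read_header(&dinfo, TRUE);

      dinfo.out_color_space = JCS_EXT_BGRA;  /* alpha byte comes back as 0xFF */
      jpeg_start_decompress(&dinfo);

      *width = dinfo.output_width;
      *height = dinfo.output_height;
      pixels = malloc((size_t)dinfo.output_width * dinfo.output_height * 4);
      while (dinfo.output_scanline < dinfo.output_height) {
        row = &pixels[(size_t)dinfo.output_scanline * dinfo.output_width * 4];
        jpeg_read_scanlines(&dinfo, &row, 1);
      }
      jpeg_finish_decompress(&dinfo);
      jpeg_destroy_decompress(&dinfo);
      return pixels;
    }
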
diff --git a/jdct.h b/jdct.h
index 7b49a97..3637448 100644
--- a/jdct.h
+++ b/jdct.h
@@ -95,9 +95,21 @@
#define jpeg_idct_islow jRDislow
#define jpeg_idct_ifast jRDifast
#define jpeg_idct_float jRDfloat
+#define jpeg_idct_7x7 jRD7x7
+#define jpeg_idct_6x6 jRD6x6
+#define jpeg_idct_5x5 jRD5x5
#define jpeg_idct_4x4 jRD4x4
+#define jpeg_idct_3x3 jRD3x3
#define jpeg_idct_2x2 jRD2x2
#define jpeg_idct_1x1 jRD1x1
+#define jpeg_idct_9x9 jRD9x9
+#define jpeg_idct_10x10 jRD10x10
+#define jpeg_idct_11x11 jRD11x11
+#define jpeg_idct_12x12 jRD12x12
+#define jpeg_idct_13x13 jRD13x13
+#define jpeg_idct_14x14 jRD14x14
+#define jpeg_idct_15x15 jRD15x15
+#define jpeg_idct_16x16 jRD16x16
#endif /* NEED_SHORT_EXTERNAL_NAMES */
/* Extern declarations for the forward and inverse DCT routines. */
@@ -115,15 +127,51 @@
EXTERN(void) jpeg_idct_float
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_7x7
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_6x6
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_5x5
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
EXTERN(void) jpeg_idct_4x4
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_3x3
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
EXTERN(void) jpeg_idct_2x2
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
EXTERN(void) jpeg_idct_1x1
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_9x9
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_10x10
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_11x11
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_12x12
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_13x13
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_14x14
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_15x15
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
+EXTERN(void) jpeg_idct_16x16
+ JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
/*
diff --git a/jddctmgr.c b/jddctmgr.c
index 044e469..1758bed 100644
--- a/jddctmgr.c
+++ b/jddctmgr.c
@@ -2,6 +2,7 @@
* jddctmgr.c
*
* Copyright (C) 1994-1996, Thomas G. Lane.
+ * Modified 2002-2010 by Guido Vollbeding.
 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2010, D. R. Commander.
* This file is part of the Independent JPEG Group's software.
@@ -115,6 +116,10 @@
method_ptr = jpeg_idct_2x2;
method = JDCT_ISLOW; /* jidctred uses islow-style table */
break;
+ case 3:
+ method_ptr = jpeg_idct_3x3;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
case 4:
if (jsimd_can_idct_4x4())
method_ptr = jsimd_idct_4x4;
@@ -122,6 +127,18 @@
method_ptr = jpeg_idct_4x4;
method = JDCT_ISLOW; /* jidctred uses islow-style table */
break;
+ case 5:
+ method_ptr = jpeg_idct_5x5;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
+ case 6:
+ method_ptr = jpeg_idct_6x6;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
+ case 7:
+ method_ptr = jpeg_idct_7x7;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
#endif
case DCTSIZE:
switch (cinfo->dct_method) {
@@ -157,6 +174,38 @@
break;
}
break;
+ case 9:
+ method_ptr = jpeg_idct_9x9;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
+ case 10:
+ method_ptr = jpeg_idct_10x10;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
+ case 11:
+ method_ptr = jpeg_idct_11x11;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
+ case 12:
+ method_ptr = jpeg_idct_12x12;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
+ case 13:
+ method_ptr = jpeg_idct_13x13;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
+ case 14:
+ method_ptr = jpeg_idct_14x14;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
+ case 15:
+ method_ptr = jpeg_idct_15x15;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
+ case 16:
+ method_ptr = jpeg_idct_16x16;
+ method = JDCT_ISLOW; /* jidctint uses islow-style table */
+ break;
default:
ERREXIT1(cinfo, JERR_BAD_DCTSIZE, compptr->_DCT_scaled_size);
break;
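
The new cases above are reached through the usual scale_num/scale_denom interface. For instance, requesting 3/8 scaling gives full-resolution components a _DCT_scaled_size of 3, so start_pass selects jpeg_idct_3x3 (subsampled chroma components may still get a larger scaled size via IDCT scaling). A sketch of the application side, with the scanline loop elided:

    #include <stdio.h>
    #include "jpeglib.h"

    /* Sketch: request 3/8 scaling so the decoder picks the new
     * reduced-size integer IDCTs. */
    static void decode_at_three_eighths(FILE *infile)
    {
      struct jpeg_decompress_struct dinfo;
      struct jpeg_error_mgr jerr;

      dinfo.err = jpeg_std_error(&jerr);
      jpeg_create_decompress(&dinfo);
      jpeg_stdio_src(&dinfo, infile);
      jpeg_read_header(&dinfo, TRUE);

      dinfo.scale_num = 3;    /* output = 3/8 of the original size */
      dinfo.scale_denom = 8;

      jpeg_start_decompress(&dinfo);
      /* ... read dinfo.output_height rows of dinfo.output_width pixels ... */
      jpeg_abort_decompress(&dinfo);
      jpeg_destroy_decompress(&dinfo);
    }
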
diff --git a/jdhuff.c b/jdhuff.c
index 6255a3b..3d74340 100644
--- a/jdhuff.c
+++ b/jdhuff.c
@@ -758,7 +758,7 @@
usefast = 0;
}
- if (cinfo->src->bytes_in_buffer < BUFSIZE * cinfo->blocks_in_MCU
+ if (cinfo->src->bytes_in_buffer < BUFSIZE * (size_t)cinfo->blocks_in_MCU
|| cinfo->unread_marker != 0)
usefast = 0;
diff --git a/jdinput.c b/jdinput.c
index 9fcd089..eddace6 100644
--- a/jdinput.c
+++ b/jdinput.c
@@ -2,7 +2,6 @@
* jdinput.c
*
* Copyright (C) 1991-1997, Thomas G. Lane.
- * Modified 2002-2009 by Guido Vollbeding.
* Copyright (C) 2010, D. R. Commander.
* This file is part of the Independent JPEG Group's software.
* For conditions of distribution and use, see the accompanying README file.
@@ -38,79 +37,6 @@
* Routines to calculate various quantities related to the size of the image.
*/
-
-#if JPEG_LIB_VERSION >= 80
-/*
- * Compute output image dimensions and related values.
- * NOTE: this is exported for possible use by application.
- * Hence it mustn't do anything that can't be done twice.
- */
-
-GLOBAL(void)
-jpeg_core_output_dimensions (j_decompress_ptr cinfo)
-/* Do computations that are needed before master selection phase.
- * This function is used for transcoding and full decompression.
- */
-{
-#ifdef IDCT_SCALING_SUPPORTED
- int ci;
- jpeg_component_info *compptr;
-
- /* Compute actual output image dimensions and DCT scaling choices. */
- if (cinfo->scale_num * cinfo->block_size <= cinfo->scale_denom) {
- /* Provide 1/block_size scaling */
- cinfo->output_width = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_width, (long) cinfo->block_size);
- cinfo->output_height = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_height, (long) cinfo->block_size);
- cinfo->min_DCT_h_scaled_size = 1;
- cinfo->min_DCT_v_scaled_size = 1;
- } else if (cinfo->scale_num * cinfo->block_size <= cinfo->scale_denom * 2) {
- /* Provide 2/block_size scaling */
- cinfo->output_width = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_width * 2L, (long) cinfo->block_size);
- cinfo->output_height = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_height * 2L, (long) cinfo->block_size);
- cinfo->min_DCT_h_scaled_size = 2;
- cinfo->min_DCT_v_scaled_size = 2;
- } else if (cinfo->scale_num * cinfo->block_size <= cinfo->scale_denom * 4) {
- /* Provide 4/block_size scaling */
- cinfo->output_width = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_width * 4L, (long) cinfo->block_size);
- cinfo->output_height = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_height * 4L, (long) cinfo->block_size);
- cinfo->min_DCT_h_scaled_size = 4;
- cinfo->min_DCT_v_scaled_size = 4;
- } else if (cinfo->scale_num * cinfo->block_size <= cinfo->scale_denom * 8) {
- /* Provide 8/block_size scaling */
- cinfo->output_width = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_width * 8L, (long) cinfo->block_size);
- cinfo->output_height = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_height * 8L, (long) cinfo->block_size);
- cinfo->min_DCT_h_scaled_size = 8;
- cinfo->min_DCT_v_scaled_size = 8;
- }
- /* Recompute dimensions of components */
- for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
- ci++, compptr++) {
- compptr->DCT_h_scaled_size = cinfo->min_DCT_h_scaled_size;
- compptr->DCT_v_scaled_size = cinfo->min_DCT_v_scaled_size;
- }
-
-#else /* !IDCT_SCALING_SUPPORTED */
-
- /* Hardwire it to "no scaling" */
- cinfo->output_width = cinfo->image_width;
- cinfo->output_height = cinfo->image_height;
- /* jdinput.c has already initialized DCT_scaled_size,
- * and has computed unscaled downsampled_width and downsampled_height.
- */
-
-#endif /* IDCT_SCALING_SUPPORTED */
-}
-#endif
-
-
LOCAL(void)
initial_setup (j_decompress_ptr cinfo)
/* Called once, when first SOS marker is reached */
diff --git a/jdmarker.c b/jdmarker.c
index 382cb07..0cdb887 100644
--- a/jdmarker.c
+++ b/jdmarker.c
@@ -2,6 +2,7 @@
* jdmarker.c
*
* Copyright (C) 1991-1998, Thomas G. Lane.
+ * Copyright (C) 2012, D. R. Commander.
* This file is part of the Independent JPEG Group's software.
* For conditions of distribution and use, see the accompanying README file.
*
@@ -322,13 +323,16 @@
/* Collect the component-spec parameters */
+ for (i = 0; i < cinfo->num_components; i++)
+ cinfo->cur_comp_info[i] = NULL;
+
for (i = 0; i < n; i++) {
INPUT_BYTE(cinfo, cc, return FALSE);
INPUT_BYTE(cinfo, c, return FALSE);
for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
ci++, compptr++) {
- if (cc == compptr->component_id)
+ if (cc == compptr->component_id && !cinfo->cur_comp_info[ci])
goto id_found;
}
diff --git a/jdmaster.c b/jdmaster.c
index c73ec02..225e825 100644
--- a/jdmaster.c
+++ b/jdmaster.c
@@ -2,6 +2,7 @@
* jdmaster.c
*
* Copyright (C) 1991-1997, Thomas G. Lane.
+ * Modified 2002-2009 by Guido Vollbeding.
* Copyright (C) 2009-2011, D. R. Commander.
* This file is part of the Independent JPEG Group's software.
* For conditions of distribution and use, see the accompanying README file.
@@ -89,6 +90,177 @@
* Compute output image dimensions and related values.
* NOTE: this is exported for possible use by application.
* Hence it mustn't do anything that can't be done twice.
+ */
+
+#if JPEG_LIB_VERSION >= 80
+GLOBAL(void)
+#else
+LOCAL(void)
+#endif
+jpeg_core_output_dimensions (j_decompress_ptr cinfo)
+/* Do computations that are needed before master selection phase.
+ * This function is used for transcoding and full decompression.
+ */
+{
+#ifdef IDCT_SCALING_SUPPORTED
+ int ci;
+ jpeg_component_info *compptr;
+
+ /* Compute actual output image dimensions and DCT scaling choices. */
+ if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom) {
+ /* Provide 1/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 1;
+ cinfo->_min_DCT_v_scaled_size = 1;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 2) {
+ /* Provide 2/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 2L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 2L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 2;
+ cinfo->_min_DCT_v_scaled_size = 2;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 3) {
+ /* Provide 3/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 3L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 3L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 3;
+ cinfo->_min_DCT_v_scaled_size = 3;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 4) {
+ /* Provide 4/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 4L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 4L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 4;
+ cinfo->_min_DCT_v_scaled_size = 4;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 5) {
+ /* Provide 5/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 5L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 5L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 5;
+ cinfo->_min_DCT_v_scaled_size = 5;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 6) {
+ /* Provide 6/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 6L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 6L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 6;
+ cinfo->_min_DCT_v_scaled_size = 6;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 7) {
+ /* Provide 7/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 7L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 7L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 7;
+ cinfo->_min_DCT_v_scaled_size = 7;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 8) {
+ /* Provide 8/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 8L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 8L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 8;
+ cinfo->_min_DCT_v_scaled_size = 8;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 9) {
+ /* Provide 9/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 9L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 9L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 9;
+ cinfo->_min_DCT_v_scaled_size = 9;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 10) {
+ /* Provide 10/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 10L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 10L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 10;
+ cinfo->_min_DCT_v_scaled_size = 10;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 11) {
+ /* Provide 11/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 11L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 11L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 11;
+ cinfo->_min_DCT_v_scaled_size = 11;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 12) {
+ /* Provide 12/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 12L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 12L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 12;
+ cinfo->_min_DCT_v_scaled_size = 12;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 13) {
+ /* Provide 13/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 13L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 13L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 13;
+ cinfo->_min_DCT_v_scaled_size = 13;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 14) {
+ /* Provide 14/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 14L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 14L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 14;
+ cinfo->_min_DCT_v_scaled_size = 14;
+ } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 15) {
+ /* Provide 15/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 15L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 15L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 15;
+ cinfo->_min_DCT_v_scaled_size = 15;
+ } else {
+ /* Provide 16/block_size scaling */
+ cinfo->output_width = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_width * 16L, (long) DCTSIZE);
+ cinfo->output_height = (JDIMENSION)
+ jdiv_round_up((long) cinfo->image_height * 16L, (long) DCTSIZE);
+ cinfo->_min_DCT_h_scaled_size = 16;
+ cinfo->_min_DCT_v_scaled_size = 16;
+ }
+
+ /* Recompute dimensions of components */
+ for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
+ ci++, compptr++) {
+ compptr->_DCT_h_scaled_size = cinfo->_min_DCT_h_scaled_size;
+ compptr->_DCT_v_scaled_size = cinfo->_min_DCT_v_scaled_size;
+ }
+
+#else /* !IDCT_SCALING_SUPPORTED */
+
+ /* Hardwire it to "no scaling" */
+ cinfo->output_width = cinfo->image_width;
+ cinfo->output_height = cinfo->image_height;
+ /* jdinput.c has already initialized DCT_scaled_size,
+ * and has computed unscaled downsampled_width and downsampled_height.
+ */
+
+#endif /* IDCT_SCALING_SUPPORTED */
+}
+
+
+/*
+ * Compute output image dimensions and related values.
+ * NOTE: this is exported for possible use by application.
+ * Hence it mustn't do anything that can't be done twice.
* Also note that it may be called before the master module is initialized!
*/
@@ -105,65 +277,24 @@
if (cinfo->global_state != DSTATE_READY)
ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
+ /* Compute core output image dimensions and DCT scaling choices. */
+ jpeg_core_output_dimensions(cinfo);
+
#ifdef IDCT_SCALING_SUPPORTED
- /* Compute actual output image dimensions and DCT scaling choices. */
- if (cinfo->scale_num * 8 <= cinfo->scale_denom) {
- /* Provide 1/8 scaling */
- cinfo->output_width = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_width, 8L);
- cinfo->output_height = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_height, 8L);
-#if JPEG_LIB_VERSION >= 70
- cinfo->min_DCT_h_scaled_size = cinfo->min_DCT_v_scaled_size = 1;
-#else
- cinfo->min_DCT_scaled_size = 1;
-#endif
- } else if (cinfo->scale_num * 4 <= cinfo->scale_denom) {
- /* Provide 1/4 scaling */
- cinfo->output_width = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_width, 4L);
- cinfo->output_height = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_height, 4L);
-#if JPEG_LIB_VERSION >= 70
- cinfo->min_DCT_h_scaled_size = cinfo->min_DCT_v_scaled_size = 2;
-#else
- cinfo->min_DCT_scaled_size = 2;
-#endif
- } else if (cinfo->scale_num * 2 <= cinfo->scale_denom) {
- /* Provide 1/2 scaling */
- cinfo->output_width = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_width, 2L);
- cinfo->output_height = (JDIMENSION)
- jdiv_round_up((long) cinfo->image_height, 2L);
-#if JPEG_LIB_VERSION >= 70
- cinfo->min_DCT_h_scaled_size = cinfo->min_DCT_v_scaled_size = 4;
-#else
- cinfo->min_DCT_scaled_size = 4;
-#endif
- } else {
- /* Provide 1/1 scaling */
- cinfo->output_width = cinfo->image_width;
- cinfo->output_height = cinfo->image_height;
-#if JPEG_LIB_VERSION >= 70
- cinfo->min_DCT_h_scaled_size = cinfo->min_DCT_v_scaled_size = DCTSIZE;
-#else
- cinfo->min_DCT_scaled_size = DCTSIZE;
-#endif
- }
/* In selecting the actual DCT scaling for each component, we try to
* scale up the chroma components via IDCT scaling rather than upsampling.
* This saves time if the upsampler gets to use 1:1 scaling.
- * Note this code assumes that the supported DCT scalings are powers of 2.
+ * Note this code adapts subsampling ratios which are powers of 2.
*/
for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
ci++, compptr++) {
int ssize = cinfo->_min_DCT_scaled_size;
while (ssize < DCTSIZE &&
- (compptr->h_samp_factor * ssize * 2 <=
- cinfo->max_h_samp_factor * cinfo->_min_DCT_scaled_size) &&
- (compptr->v_samp_factor * ssize * 2 <=
- cinfo->max_v_samp_factor * cinfo->_min_DCT_scaled_size)) {
+ ((cinfo->max_h_samp_factor * cinfo->_min_DCT_scaled_size) %
+ (compptr->h_samp_factor * ssize * 2) == 0) &&
+ ((cinfo->max_v_samp_factor * cinfo->_min_DCT_scaled_size) %
+ (compptr->v_samp_factor * ssize * 2) == 0)) {
ssize = ssize * 2;
}
#if JPEG_LIB_VERSION >= 70
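
To make the new branches above concrete, a worked example with illustrative numbers: upscaling by 9/8 falls through to the scale_num * DCTSIZE <= scale_denom * 9 branch, so _min_DCT_scaled_size becomes 9 and jddctmgr.c selects jpeg_idct_9x9.

    /* Worked example (sketch; setup elided): a 1000 x 500 image decoded
     * with scale_num/scale_denom = 9/8.  9*DCTSIZE = 72 <= 8*9 = 72, so
     * the 9/8 branch is taken, and jdiv_round_up() (ceiling division) gives
     *   output_width  = ceil(1000 * 9 / 8) = 1125
     *   output_height = ceil( 500 * 9 / 8) =  563
     * with _min_DCT_h_scaled_size = _min_DCT_v_scaled_size = 9. */
    struct jpeg_decompress_struct dinfo;
    /* ... jpeg_create_decompress(), source setup, jpeg_read_header() ... */
    dinfo.scale_num = 9;
    dinfo.scale_denom = 8;
    jpeg_calc_output_dimensions(&dinfo);  /* dinfo.output_width == 1125, etc. */
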
diff --git a/jdmerge.c b/jdmerge.c
index c813080..5336125 100644
--- a/jdmerge.c
+++ b/jdmerge.c
@@ -38,6 +38,7 @@
#include "jinclude.h"
#include "jpeglib.h"
#include "jsimd.h"
+#include "config.h"
#ifdef UPSAMPLE_MERGING_SUPPORTED
@@ -102,6 +103,7 @@
#define RGB_RED EXT_RGBX_RED
#define RGB_GREEN EXT_RGBX_GREEN
#define RGB_BLUE EXT_RGBX_BLUE
+#define RGB_ALPHA 3
#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
#define h2v1_merged_upsample_internal extrgbx_h2v1_merged_upsample_internal
#define h2v2_merged_upsample_internal extrgbx_h2v2_merged_upsample_internal
@@ -109,6 +111,7 @@
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
+#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef h2v1_merged_upsample_internal
#undef h2v2_merged_upsample_internal
@@ -130,6 +133,7 @@
#define RGB_RED EXT_BGRX_RED
#define RGB_GREEN EXT_BGRX_GREEN
#define RGB_BLUE EXT_BGRX_BLUE
+#define RGB_ALPHA 3
#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
#define h2v1_merged_upsample_internal extbgrx_h2v1_merged_upsample_internal
#define h2v2_merged_upsample_internal extbgrx_h2v2_merged_upsample_internal
@@ -137,6 +141,7 @@
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
+#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef h2v1_merged_upsample_internal
#undef h2v2_merged_upsample_internal
@@ -144,6 +149,7 @@
#define RGB_RED EXT_XBGR_RED
#define RGB_GREEN EXT_XBGR_GREEN
#define RGB_BLUE EXT_XBGR_BLUE
+#define RGB_ALPHA 0
#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
#define h2v1_merged_upsample_internal extxbgr_h2v1_merged_upsample_internal
#define h2v2_merged_upsample_internal extxbgr_h2v2_merged_upsample_internal
@@ -151,6 +157,7 @@
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
+#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef h2v1_merged_upsample_internal
#undef h2v2_merged_upsample_internal
@@ -158,6 +165,7 @@
#define RGB_RED EXT_XRGB_RED
#define RGB_GREEN EXT_XRGB_GREEN
#define RGB_BLUE EXT_XRGB_BLUE
+#define RGB_ALPHA 0
#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
#define h2v1_merged_upsample_internal extxrgb_h2v1_merged_upsample_internal
#define h2v2_merged_upsample_internal extxrgb_h2v2_merged_upsample_internal
@@ -165,6 +173,7 @@
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
+#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef h2v1_merged_upsample_internal
#undef h2v2_merged_upsample_internal
diff --git a/jdmrgext.c b/jdmrgext.c
index 95ddd55..2b93265 100644
--- a/jdmrgext.c
+++ b/jdmrgext.c
@@ -2,6 +2,7 @@
* jdmrgext.c
*
* Copyright (C) 1994-1996, Thomas G. Lane.
+ * Copyright (C) 2011, D. R. Commander.
* This file is part of the Independent JPEG Group's software.
* For conditions of distribution and use, see the accompanying README file.
*
@@ -54,11 +55,17 @@
outptr[RGB_RED] = range_limit[y + cred];
outptr[RGB_GREEN] = range_limit[y + cgreen];
outptr[RGB_BLUE] = range_limit[y + cblue];
+#ifdef RGB_ALPHA
+ outptr[RGB_ALPHA] = 0xFF;
+#endif
outptr += RGB_PIXELSIZE;
y = GETJSAMPLE(*inptr0++);
outptr[RGB_RED] = range_limit[y + cred];
outptr[RGB_GREEN] = range_limit[y + cgreen];
outptr[RGB_BLUE] = range_limit[y + cblue];
+#ifdef RGB_ALPHA
+ outptr[RGB_ALPHA] = 0xFF;
+#endif
outptr += RGB_PIXELSIZE;
}
/* If image width is odd, do the last output column separately */
@@ -72,6 +79,9 @@
outptr[RGB_RED] = range_limit[y + cred];
outptr[RGB_GREEN] = range_limit[y + cgreen];
outptr[RGB_BLUE] = range_limit[y + cblue];
+#ifdef RGB_ALPHA
+ outptr[RGB_ALPHA] = 0xFF;
+#endif
}
}
@@ -120,21 +130,33 @@
outptr0[RGB_RED] = range_limit[y + cred];
outptr0[RGB_GREEN] = range_limit[y + cgreen];
outptr0[RGB_BLUE] = range_limit[y + cblue];
+#ifdef RGB_ALPHA
+ outptr0[RGB_ALPHA] = 0xFF;
+#endif
outptr0 += RGB_PIXELSIZE;
y = GETJSAMPLE(*inptr00++);
outptr0[RGB_RED] = range_limit[y + cred];
outptr0[RGB_GREEN] = range_limit[y + cgreen];
outptr0[RGB_BLUE] = range_limit[y + cblue];
+#ifdef RGB_ALPHA
+ outptr0[RGB_ALPHA] = 0xFF;
+#endif
outptr0 += RGB_PIXELSIZE;
y = GETJSAMPLE(*inptr01++);
outptr1[RGB_RED] = range_limit[y + cred];
outptr1[RGB_GREEN] = range_limit[y + cgreen];
outptr1[RGB_BLUE] = range_limit[y + cblue];
+#ifdef RGB_ALPHA
+ outptr1[RGB_ALPHA] = 0xFF;
+#endif
outptr1 += RGB_PIXELSIZE;
y = GETJSAMPLE(*inptr01++);
outptr1[RGB_RED] = range_limit[y + cred];
outptr1[RGB_GREEN] = range_limit[y + cgreen];
outptr1[RGB_BLUE] = range_limit[y + cblue];
+#ifdef RGB_ALPHA
+ outptr1[RGB_ALPHA] = 0xFF;
+#endif
outptr1 += RGB_PIXELSIZE;
}
/* If image width is odd, do the last output column separately */
@@ -148,9 +170,15 @@
outptr0[RGB_RED] = range_limit[y + cred];
outptr0[RGB_GREEN] = range_limit[y + cgreen];
outptr0[RGB_BLUE] = range_limit[y + cblue];
+#ifdef RGB_ALPHA
+ outptr0[RGB_ALPHA] = 0xFF;
+#endif
y = GETJSAMPLE(*inptr01);
outptr1[RGB_RED] = range_limit[y + cred];
outptr1[RGB_GREEN] = range_limit[y + cgreen];
outptr1[RGB_BLUE] = range_limit[y + cblue];
+#ifdef RGB_ALPHA
+ outptr1[RGB_ALPHA] = 0xFF;
+#endif
}
}
diff --git a/jidctint.c b/jidctint.c
index a72b320..77d8121 100644
--- a/jidctint.c
+++ b/jidctint.c
@@ -2,6 +2,7 @@
* jidctint.c
*
* Copyright (C) 1991-1998, Thomas G. Lane.
+ * Modification developed 2002-2009 by Guido Vollbeding.
* This file is part of the Independent JPEG Group's software.
* For conditions of distribution and use, see the accompanying README file.
*
@@ -23,6 +24,27 @@
* The advantage of this method is that no data path contains more than one
* multiplication; this allows a very simple and accurate implementation in
* scaled fixed-point arithmetic, with a minimal number of shifts.
+ *
+ * We also provide IDCT routines with various output sample block sizes for
+ * direct resolution reduction or enlargement without additional resampling:
+ * NxN (N=1...16) pixels for one 8x8 input DCT block.
+ *
+ * For N<8 we simply take the corresponding low-frequency coefficients of
+ * the 8x8 input DCT block and apply an NxN point IDCT on the sub-block
+ * to yield the downscaled outputs.
+ * This can be seen as direct low-pass downsampling from the DCT domain
+ * point of view rather than the usual spatial domain point of view,
+ * yielding significant computational savings and results at least
+ * as good as common bilinear (averaging) spatial downsampling.
+ *
+ * For N>8 we apply a partial NxN IDCT on the 8 input coefficients,
+ * treating them as the lower frequencies and assuming the higher ones
+ * to be zero.  It turns out that the computational effort is similar
+ * to that of the 8x8 IDCT, relative to the output size.
+ * Furthermore, the scaling and descaling is the same for all IDCT sizes.
+ *
+ * CAUTION: We rely on the FIX() macro except for the N=1,2,4,8 cases
+ * since there would be too many additional constants to pre-calculate.
*/
#define JPEG_INTERNALS
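
To relate the comment added above to the constants used in the new routines below, the per-row/column kernel is the standard N-point IDCT, ignoring the overall normalization folded into dequantization and the CONST_BITS/PASS1_BITS fixed-point descaling. A hedged sketch in comment form:

    /*
     * 1-D kernel for an N-point output (up to overall scaling):
     *
     *   s[x] = sum_{k=0..N-1} C_k * S[k] * cos((2x+1)*k*PI / (2N)),
     *   with C_0 = 1/sqrt(2) and C_k = 1 for k >= 1.
     *
     * For N < 8 the S[k] are the N lowest-frequency dequantized
     * coefficients of a row/column; for N > 8 the coefficients with
     * k >= 8 are taken to be zero.  The "cK = sqrt(2)*cos(K*pi/2N)"
     * constants quoted in the routine headers below are these cosine
     * factors scaled by sqrt(2).
     */
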
@@ -38,7 +60,7 @@
*/
#if DCTSIZE != 8
- Sorry, this code only copes with 8x8 DCTs. /* deliberate syntax err */
+ Sorry, this code only copes with 8x8 DCT blocks. /* deliberate syntax err */
#endif
@@ -386,4 +408,2216 @@
}
}
+#ifdef IDCT_SCALING_SUPPORTED
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a 7x7 output block.
+ *
+ * Optimized algorithm with 12 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/14).
+ */
+
+GLOBAL(void)
+jpeg_idct_7x7 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12, tmp13;
+ INT32 z1, z2, z3;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[7*7]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 7; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ tmp13 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ tmp13 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ tmp13 += ONE << (CONST_BITS-PASS1_BITS-1);
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
+
+ tmp10 = MULTIPLY(z2 - z3, FIX(0.881747734)); /* c4 */
+ tmp12 = MULTIPLY(z1 - z2, FIX(0.314692123)); /* c6 */
+ tmp11 = tmp10 + tmp12 + tmp13 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */
+ tmp0 = z1 + z3;
+ z2 -= tmp0;
+ tmp0 = MULTIPLY(tmp0, FIX(1.274162392)) + tmp13; /* c2 */
+ tmp10 += tmp0 - MULTIPLY(z3, FIX(0.077722536)); /* c2-c4-c6 */
+ tmp12 += tmp0 - MULTIPLY(z1, FIX(2.470602249)); /* c2+c4+c6 */
+ tmp13 += MULTIPLY(z2, FIX(1.414213562)); /* c0 */
+
+ /* Odd part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+
+ tmp1 = MULTIPLY(z1 + z2, FIX(0.935414347)); /* (c3+c1-c5)/2 */
+ tmp2 = MULTIPLY(z1 - z2, FIX(0.170262339)); /* (c3+c5-c1)/2 */
+ tmp0 = tmp1 - tmp2;
+ tmp1 += tmp2;
+ tmp2 = MULTIPLY(z2 + z3, - FIX(1.378756276)); /* -c1 */
+ tmp1 += tmp2;
+ z2 = MULTIPLY(z1 + z3, FIX(0.613604268)); /* c5 */
+ tmp0 += z2;
+ tmp2 += z2 + MULTIPLY(z3, FIX(1.870828693)); /* c3+c1-c5 */
+
+ /* Final output stage */
+
+ wsptr[7*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[7*6] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[7*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);
+ wsptr[7*5] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);
+ wsptr[7*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);
+ wsptr[7*4] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);
+ wsptr[7*3] = (int) RIGHT_SHIFT(tmp13, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 7 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 7; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ tmp13 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ tmp13 <<= CONST_BITS;
+
+ z1 = (INT32) wsptr[2];
+ z2 = (INT32) wsptr[4];
+ z3 = (INT32) wsptr[6];
+
+ tmp10 = MULTIPLY(z2 - z3, FIX(0.881747734)); /* c4 */
+ tmp12 = MULTIPLY(z1 - z2, FIX(0.314692123)); /* c6 */
+ tmp11 = tmp10 + tmp12 + tmp13 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */
+ tmp0 = z1 + z3;
+ z2 -= tmp0;
+ tmp0 = MULTIPLY(tmp0, FIX(1.274162392)) + tmp13; /* c2 */
+ tmp10 += tmp0 - MULTIPLY(z3, FIX(0.077722536)); /* c2-c4-c6 */
+ tmp12 += tmp0 - MULTIPLY(z1, FIX(2.470602249)); /* c2+c4+c6 */
+ tmp13 += MULTIPLY(z2, FIX(1.414213562)); /* c0 */
+
+ /* Odd part */
+
+ z1 = (INT32) wsptr[1];
+ z2 = (INT32) wsptr[3];
+ z3 = (INT32) wsptr[5];
+
+ tmp1 = MULTIPLY(z1 + z2, FIX(0.935414347)); /* (c3+c1-c5)/2 */
+ tmp2 = MULTIPLY(z1 - z2, FIX(0.170262339)); /* (c3+c5-c1)/2 */
+ tmp0 = tmp1 - tmp2;
+ tmp1 += tmp2;
+ tmp2 = MULTIPLY(z2 + z3, - FIX(1.378756276)); /* -c1 */
+ tmp1 += tmp2;
+ z2 = MULTIPLY(z1 + z3, FIX(0.613604268)); /* c5 */
+ tmp0 += z2;
+ tmp2 += z2 + MULTIPLY(z3, FIX(1.870828693)); /* c3+c1-c5 */
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 7; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a reduced-size 6x6 output block.
+ *
+ * Optimized algorithm with 3 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/12).
+ */
+
+GLOBAL(void)
+jpeg_idct_6x6 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12;
+ INT32 z1, z2, z3;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[6*6]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 6; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ tmp0 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
+ tmp2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ tmp10 = MULTIPLY(tmp2, FIX(0.707106781)); /* c4 */
+ tmp1 = tmp0 + tmp10;
+ tmp11 = RIGHT_SHIFT(tmp0 - tmp10 - tmp10, CONST_BITS-PASS1_BITS);
+ tmp10 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ tmp0 = MULTIPLY(tmp10, FIX(1.224744871)); /* c2 */
+ tmp10 = tmp1 + tmp0;
+ tmp12 = tmp1 - tmp0;
+
+ /* Odd part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+ tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */
+ tmp0 = tmp1 + ((z1 + z2) << CONST_BITS);
+ tmp2 = tmp1 + ((z3 - z2) << CONST_BITS);
+ tmp1 = (z1 - z2 - z3) << PASS1_BITS;
+
+ /* Final output stage */
+
+ wsptr[6*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[6*5] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[6*1] = (int) (tmp11 + tmp1);
+ wsptr[6*4] = (int) (tmp11 - tmp1);
+ wsptr[6*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);
+ wsptr[6*3] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 6 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 6; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ tmp0 <<= CONST_BITS;
+ tmp2 = (INT32) wsptr[4];
+ tmp10 = MULTIPLY(tmp2, FIX(0.707106781)); /* c4 */
+ tmp1 = tmp0 + tmp10;
+ tmp11 = tmp0 - tmp10 - tmp10;
+ tmp10 = (INT32) wsptr[2];
+ tmp0 = MULTIPLY(tmp10, FIX(1.224744871)); /* c2 */
+ tmp10 = tmp1 + tmp0;
+ tmp12 = tmp1 - tmp0;
+
+ /* Odd part */
+
+ z1 = (INT32) wsptr[1];
+ z2 = (INT32) wsptr[3];
+ z3 = (INT32) wsptr[5];
+ tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */
+ tmp0 = tmp1 + ((z1 + z2) << CONST_BITS);
+ tmp2 = tmp1 + ((z3 - z2) << CONST_BITS);
+ tmp1 = (z1 - z2 - z3) << CONST_BITS;
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 6; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a reduced-size 5x5 output block.
+ *
+ * Optimized algorithm with 5 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/10).
+ */
+
+GLOBAL(void)
+jpeg_idct_5x5 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp0, tmp1, tmp10, tmp11, tmp12;
+ INT32 z1, z2, z3;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[5*5]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 5; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ tmp12 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ tmp12 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ tmp12 += ONE << (CONST_BITS-PASS1_BITS-1);
+ tmp0 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ tmp1 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ z1 = MULTIPLY(tmp0 + tmp1, FIX(0.790569415)); /* (c2+c4)/2 */
+ z2 = MULTIPLY(tmp0 - tmp1, FIX(0.353553391)); /* (c2-c4)/2 */
+ z3 = tmp12 + z2;
+ tmp10 = z3 + z1;
+ tmp11 = z3 - z1;
+ tmp12 -= z2 << 2;
+
+ /* Odd part */
+
+ z2 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+
+ z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c3 */
+ tmp0 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c1-c3 */
+ tmp1 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c1+c3 */
+
+ /* Final output stage */
+
+ wsptr[5*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[5*4] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[5*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);
+ wsptr[5*3] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);
+ wsptr[5*2] = (int) RIGHT_SHIFT(tmp12, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 5 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 5; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ tmp12 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ tmp12 <<= CONST_BITS;
+ tmp0 = (INT32) wsptr[2];
+ tmp1 = (INT32) wsptr[4];
+ z1 = MULTIPLY(tmp0 + tmp1, FIX(0.790569415)); /* (c2+c4)/2 */
+ z2 = MULTIPLY(tmp0 - tmp1, FIX(0.353553391)); /* (c2-c4)/2 */
+ z3 = tmp12 + z2;
+ tmp10 = z3 + z1;
+ tmp11 = z3 - z1;
+ tmp12 -= z2 << 2;
+
+ /* Odd part */
+
+ z2 = (INT32) wsptr[1];
+ z3 = (INT32) wsptr[3];
+
+ z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c3 */
+ tmp0 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c1-c3 */
+ tmp1 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c1+c3 */
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 5; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a reduced-size 3x3 output block.
+ *
+ * Optimized algorithm with 2 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/6).
+ */
+
+GLOBAL(void)
+jpeg_idct_3x3 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp0, tmp2, tmp10, tmp12;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[3*3]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 3; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ tmp0 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
+ tmp2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */
+ tmp10 = tmp0 + tmp12;
+ tmp2 = tmp0 - tmp12 - tmp12;
+
+ /* Odd part */
+
+ tmp12 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */
+
+ /* Final output stage */
+
+ wsptr[3*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[3*2] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[3*1] = (int) RIGHT_SHIFT(tmp2, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 3 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 3; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ tmp0 <<= CONST_BITS;
+ tmp2 = (INT32) wsptr[2];
+ tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */
+ tmp10 = tmp0 + tmp12;
+ tmp2 = tmp0 - tmp12 - tmp12;
+
+ /* Odd part */
+
+ tmp12 = (INT32) wsptr[1];
+ tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp2,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 3; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a 9x9 output block.
+ *
+ * Optimized algorithm with 10 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/18).
+ */
+
+GLOBAL(void)
+jpeg_idct_9x9 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13, tmp14;
+ INT32 z1, z2, z3, z4;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[8*9]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ tmp0 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
+
+ tmp3 = MULTIPLY(z3, FIX(0.707106781)); /* c6 */
+ tmp1 = tmp0 + tmp3;
+ tmp2 = tmp0 - tmp3 - tmp3;
+
+ tmp0 = MULTIPLY(z1 - z2, FIX(0.707106781)); /* c6 */
+ tmp11 = tmp2 + tmp0;
+ tmp14 = tmp2 - tmp0 - tmp0;
+
+ tmp0 = MULTIPLY(z1 + z2, FIX(1.328926049)); /* c2 */
+ tmp2 = MULTIPLY(z1, FIX(1.083350441)); /* c4 */
+ tmp3 = MULTIPLY(z2, FIX(0.245575608)); /* c8 */
+
+ tmp10 = tmp1 + tmp0 - tmp3;
+ tmp12 = tmp1 - tmp0 + tmp2;
+ tmp13 = tmp1 - tmp2 + tmp3;
+
+ /* Odd part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
+
+ z2 = MULTIPLY(z2, - FIX(1.224744871)); /* -c3 */
+
+ tmp2 = MULTIPLY(z1 + z3, FIX(0.909038955)); /* c5 */
+ tmp3 = MULTIPLY(z1 + z4, FIX(0.483689525)); /* c7 */
+ tmp0 = tmp2 + tmp3 - z2;
+ tmp1 = MULTIPLY(z3 - z4, FIX(1.392728481)); /* c1 */
+ tmp2 += z2 - tmp1;
+ tmp3 += z2 + tmp1;
+ tmp1 = MULTIPLY(z1 - z3 - z4, FIX(1.224744871)); /* c3 */
+
+ /* Final output stage */
+
+ wsptr[8*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[8*8] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[8*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);
+ wsptr[8*7] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);
+ wsptr[8*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);
+ wsptr[8*6] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);
+ wsptr[8*3] = (int) RIGHT_SHIFT(tmp13 + tmp3, CONST_BITS-PASS1_BITS);
+ wsptr[8*5] = (int) RIGHT_SHIFT(tmp13 - tmp3, CONST_BITS-PASS1_BITS);
+ wsptr[8*4] = (int) RIGHT_SHIFT(tmp14, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 9 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 9; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ tmp0 <<= CONST_BITS;
+
+ z1 = (INT32) wsptr[2];
+ z2 = (INT32) wsptr[4];
+ z3 = (INT32) wsptr[6];
+
+ tmp3 = MULTIPLY(z3, FIX(0.707106781)); /* c6 */
+ tmp1 = tmp0 + tmp3;
+ tmp2 = tmp0 - tmp3 - tmp3;
+
+ tmp0 = MULTIPLY(z1 - z2, FIX(0.707106781)); /* c6 */
+ tmp11 = tmp2 + tmp0;
+ tmp14 = tmp2 - tmp0 - tmp0;
+
+ tmp0 = MULTIPLY(z1 + z2, FIX(1.328926049)); /* c2 */
+ tmp2 = MULTIPLY(z1, FIX(1.083350441)); /* c4 */
+ tmp3 = MULTIPLY(z2, FIX(0.245575608)); /* c8 */
+
+ tmp10 = tmp1 + tmp0 - tmp3;
+ tmp12 = tmp1 - tmp0 + tmp2;
+ tmp13 = tmp1 - tmp2 + tmp3;
+
+ /* Odd part */
+
+ z1 = (INT32) wsptr[1];
+ z2 = (INT32) wsptr[3];
+ z3 = (INT32) wsptr[5];
+ z4 = (INT32) wsptr[7];
+
+ z2 = MULTIPLY(z2, - FIX(1.224744871)); /* -c3 */
+
+ tmp2 = MULTIPLY(z1 + z3, FIX(0.909038955)); /* c5 */
+ tmp3 = MULTIPLY(z1 + z4, FIX(0.483689525)); /* c7 */
+ tmp0 = tmp2 + tmp3 - z2;
+ tmp1 = MULTIPLY(z3 - z4, FIX(1.392728481)); /* c1 */
+ tmp2 += z2 - tmp1;
+ tmp3 += z2 + tmp1;
+ tmp1 = MULTIPLY(z1 - z3 - z4, FIX(1.224744871)); /* c3 */
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13 + tmp3,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp13 - tmp3,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 8; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a 10x10 output block.
+ *
+ * Optimized algorithm with 12 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/20).
+ */
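+/* For reference, applying the cK definition above:
+ * c4 = sqrt(2) * cos(4*pi/20) = sqrt(2) * cos(pi/5) = 1.144122806 and
+ * c8 = sqrt(2) * cos(8*pi/20) = sqrt(2) * cos(2*pi/5) = 0.437016024,
+ * the first two FIX() values used in the even part below.
+ */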
+
+GLOBAL(void)
+jpeg_idct_10x10 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp10, tmp11, tmp12, tmp13, tmp14;
+ INT32 tmp20, tmp21, tmp22, tmp23, tmp24;
+ INT32 z1, z2, z3, z4, z5;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[8*10]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ z3 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ z3 += ONE << (CONST_BITS-PASS1_BITS-1);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ z1 = MULTIPLY(z4, FIX(1.144122806)); /* c4 */
+ z2 = MULTIPLY(z4, FIX(0.437016024)); /* c8 */
+ tmp10 = z3 + z1;
+ tmp11 = z3 - z2;
+
+ tmp22 = RIGHT_SHIFT(z3 - ((z1 - z2) << 1), /* c0 = (c4-c8)*2 */
+ CONST_BITS-PASS1_BITS);
+
+ z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
+
+ z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c6 */
+ tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */
+ tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */
+
+ tmp20 = tmp10 + tmp12;
+ tmp24 = tmp10 - tmp12;
+ tmp21 = tmp11 + tmp13;
+ tmp23 = tmp11 - tmp13;
+
+ /* Odd part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
+
+ tmp11 = z2 + z4;
+ tmp13 = z2 - z4;
+
+ tmp12 = MULTIPLY(tmp13, FIX(0.309016994)); /* (c3-c7)/2 */
+ z5 = z3 << CONST_BITS;
+
+ z2 = MULTIPLY(tmp11, FIX(0.951056516)); /* (c3+c7)/2 */
+ z4 = z5 + tmp12;
+
+ tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */
+ tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */
+
+ z2 = MULTIPLY(tmp11, FIX(0.587785252)); /* (c1-c9)/2 */
+ z4 = z5 - tmp12 - (tmp13 << (CONST_BITS - 1));
+
+ tmp12 = (z1 - tmp13 - z3) << PASS1_BITS;
+
+ tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */
+ tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */
+
+ /* Final output stage */
+
+ wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*9] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*8] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*2] = (int) (tmp22 + tmp12);
+ wsptr[8*7] = (int) (tmp22 - tmp12);
+ wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*6] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*5] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 10 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 10; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ z3 <<= CONST_BITS;
+ z4 = (INT32) wsptr[4];
+ z1 = MULTIPLY(z4, FIX(1.144122806)); /* c4 */
+ z2 = MULTIPLY(z4, FIX(0.437016024)); /* c8 */
+ tmp10 = z3 + z1;
+ tmp11 = z3 - z2;
+
+ tmp22 = z3 - ((z1 - z2) << 1); /* c0 = (c4-c8)*2 */
+
+ z2 = (INT32) wsptr[2];
+ z3 = (INT32) wsptr[6];
+
+ z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c6 */
+ tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */
+ tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */
+
+ tmp20 = tmp10 + tmp12;
+ tmp24 = tmp10 - tmp12;
+ tmp21 = tmp11 + tmp13;
+ tmp23 = tmp11 - tmp13;
+
+ /* Odd part */
+
+ z1 = (INT32) wsptr[1];
+ z2 = (INT32) wsptr[3];
+ z3 = (INT32) wsptr[5];
+ z3 <<= CONST_BITS;
+ z4 = (INT32) wsptr[7];
+
+ tmp11 = z2 + z4;
+ tmp13 = z2 - z4;
+
+ tmp12 = MULTIPLY(tmp13, FIX(0.309016994)); /* (c3-c7)/2 */
+
+ z2 = MULTIPLY(tmp11, FIX(0.951056516)); /* (c3+c7)/2 */
+ z4 = z3 + tmp12;
+
+ tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */
+ tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */
+
+ z2 = MULTIPLY(tmp11, FIX(0.587785252)); /* (c1-c9)/2 */
+ z4 = z3 - tmp12 - (tmp13 << (CONST_BITS - 1));
+
+ tmp12 = ((z1 - tmp13) << CONST_BITS) - z3;
+
+ tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */
+ tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 8; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing an 11x11 output block.
+ *
+ * Optimized algorithm with 24 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/22).
+ */
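+/* For reference, applying the cK definition above:
+ * c2 = sqrt(2) * cos(2*pi/22) = 1.356927976 and c0 = sqrt(2) * cos(0)
+ * = 1.414213562, matching the FIX() values tagged c2 and c0 in the even
+ * part below.
+ */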
+
+GLOBAL(void)
+jpeg_idct_11x11 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp10, tmp11, tmp12, tmp13, tmp14;
+ INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25;
+ INT32 z1, z2, z3, z4;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[8*11]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ tmp10 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ tmp10 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ tmp10 += ONE << (CONST_BITS-PASS1_BITS-1);
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
+
+ tmp20 = MULTIPLY(z2 - z3, FIX(2.546640132)); /* c2+c4 */
+ tmp23 = MULTIPLY(z2 - z1, FIX(0.430815045)); /* c2-c6 */
+ z4 = z1 + z3;
+ tmp24 = MULTIPLY(z4, - FIX(1.155664402)); /* -(c2-c10) */
+ z4 -= z2;
+ tmp25 = tmp10 + MULTIPLY(z4, FIX(1.356927976)); /* c2 */
+ tmp21 = tmp20 + tmp23 + tmp25 -
+ MULTIPLY(z2, FIX(1.821790775)); /* c2+c4+c10-c6 */
+ tmp20 += tmp25 + MULTIPLY(z3, FIX(2.115825087)); /* c4+c6 */
+ tmp23 += tmp25 - MULTIPLY(z1, FIX(1.513598477)); /* c6+c8 */
+ tmp24 += tmp25;
+ tmp22 = tmp24 - MULTIPLY(z3, FIX(0.788749120)); /* c8+c10 */
+ tmp24 += MULTIPLY(z2, FIX(1.944413522)) - /* c2+c8 */
+ MULTIPLY(z1, FIX(1.390975730)); /* c4+c10 */
+ tmp25 = tmp10 - MULTIPLY(z4, FIX(1.414213562)); /* c0 */
+
+ /* Odd part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
+
+ tmp11 = z1 + z2;
+ tmp14 = MULTIPLY(tmp11 + z3 + z4, FIX(0.398430003)); /* c9 */
+ tmp11 = MULTIPLY(tmp11, FIX(0.887983902)); /* c3-c9 */
+ tmp12 = MULTIPLY(z1 + z3, FIX(0.670361295)); /* c5-c9 */
+ tmp13 = tmp14 + MULTIPLY(z1 + z4, FIX(0.366151574)); /* c7-c9 */
+ tmp10 = tmp11 + tmp12 + tmp13 -
+ MULTIPLY(z1, FIX(0.923107866)); /* c7+c5+c3-c1-2*c9 */
+ z1 = tmp14 - MULTIPLY(z2 + z3, FIX(1.163011579)); /* c7+c9 */
+ tmp11 += z1 + MULTIPLY(z2, FIX(2.073276588)); /* c1+c7+3*c9-c3 */
+ tmp12 += z1 - MULTIPLY(z3, FIX(1.192193623)); /* c3+c5-c7-c9 */
+ z1 = MULTIPLY(z2 + z4, - FIX(1.798248910)); /* -(c1+c9) */
+ tmp11 += z1;
+ tmp13 += z1 + MULTIPLY(z4, FIX(2.102458632)); /* c1+c5+c9-c7 */
+ tmp14 += MULTIPLY(z2, - FIX(1.467221301)) + /* -(c5+c9) */
+ MULTIPLY(z3, FIX(1.001388905)) - /* c1-c9 */
+ MULTIPLY(z4, FIX(1.684843907)); /* c3+c9 */
+
+ /* Final output stage */
+
+ wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*10] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*9] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*8] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*7] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*6] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*5] = (int) RIGHT_SHIFT(tmp25, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 11 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 11; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ tmp10 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ tmp10 <<= CONST_BITS;
+
+ z1 = (INT32) wsptr[2];
+ z2 = (INT32) wsptr[4];
+ z3 = (INT32) wsptr[6];
+
+ tmp20 = MULTIPLY(z2 - z3, FIX(2.546640132)); /* c2+c4 */
+ tmp23 = MULTIPLY(z2 - z1, FIX(0.430815045)); /* c2-c6 */
+ z4 = z1 + z3;
+ tmp24 = MULTIPLY(z4, - FIX(1.155664402)); /* -(c2-c10) */
+ z4 -= z2;
+ tmp25 = tmp10 + MULTIPLY(z4, FIX(1.356927976)); /* c2 */
+ tmp21 = tmp20 + tmp23 + tmp25 -
+ MULTIPLY(z2, FIX(1.821790775)); /* c2+c4+c10-c6 */
+ tmp20 += tmp25 + MULTIPLY(z3, FIX(2.115825087)); /* c4+c6 */
+ tmp23 += tmp25 - MULTIPLY(z1, FIX(1.513598477)); /* c6+c8 */
+ tmp24 += tmp25;
+ tmp22 = tmp24 - MULTIPLY(z3, FIX(0.788749120)); /* c8+c10 */
+ tmp24 += MULTIPLY(z2, FIX(1.944413522)) - /* c2+c8 */
+ MULTIPLY(z1, FIX(1.390975730)); /* c4+c10 */
+ tmp25 = tmp10 - MULTIPLY(z4, FIX(1.414213562)); /* c0 */
+
+ /* Odd part */
+
+ z1 = (INT32) wsptr[1];
+ z2 = (INT32) wsptr[3];
+ z3 = (INT32) wsptr[5];
+ z4 = (INT32) wsptr[7];
+
+ tmp11 = z1 + z2;
+ tmp14 = MULTIPLY(tmp11 + z3 + z4, FIX(0.398430003)); /* c9 */
+ tmp11 = MULTIPLY(tmp11, FIX(0.887983902)); /* c3-c9 */
+ tmp12 = MULTIPLY(z1 + z3, FIX(0.670361295)); /* c5-c9 */
+ tmp13 = tmp14 + MULTIPLY(z1 + z4, FIX(0.366151574)); /* c7-c9 */
+ tmp10 = tmp11 + tmp12 + tmp13 -
+ MULTIPLY(z1, FIX(0.923107866)); /* c7+c5+c3-c1-2*c9 */
+ z1 = tmp14 - MULTIPLY(z2 + z3, FIX(1.163011579)); /* c7+c9 */
+ tmp11 += z1 + MULTIPLY(z2, FIX(2.073276588)); /* c1+c7+3*c9-c3 */
+ tmp12 += z1 - MULTIPLY(z3, FIX(1.192193623)); /* c3+c5-c7-c9 */
+ z1 = MULTIPLY(z2 + z4, - FIX(1.798248910)); /* -(c1+c9) */
+ tmp11 += z1;
+ tmp13 += z1 + MULTIPLY(z4, FIX(2.102458632)); /* c1+c5+c9-c7 */
+ tmp14 += MULTIPLY(z2, - FIX(1.467221301)) + /* -(c5+c9) */
+ MULTIPLY(z3, FIX(1.001388905)) - /* c1-c9 */
+ MULTIPLY(z4, FIX(1.684843907)); /* c3+c9 */
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 8; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a 12x12 output block.
+ *
+ * Optimized algorithm with 15 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/24).
+ */
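+/* For reference, applying the cK definition above:
+ * c4 = sqrt(2) * cos(4*pi/24) = sqrt(2) * cos(pi/6) = 1.224744871, and
+ * c9 = sqrt(2) * cos(9*pi/24) = sqrt(2) * cos(3*pi/8) = 0.541196100, the
+ * same angle as c6 of the 8x8 kernel, hence the reuse of the predefined
+ * FIX_0_541196100 constant in the odd part below.
+ */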
+
+GLOBAL(void)
+jpeg_idct_12x12 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15;
+ INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25;
+ INT32 z1, z2, z3, z4;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[8*12]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ z3 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ z3 += ONE << (CONST_BITS-PASS1_BITS-1);
+
+ z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */
+
+ tmp10 = z3 + z4;
+ tmp11 = z3 - z4;
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */
+ z1 <<= CONST_BITS;
+ z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
+ z2 <<= CONST_BITS;
+
+ tmp12 = z1 - z2;
+
+ tmp21 = z3 + tmp12;
+ tmp24 = z3 - tmp12;
+
+ tmp12 = z4 + z2;
+
+ tmp20 = tmp10 + tmp12;
+ tmp25 = tmp10 - tmp12;
+
+ tmp12 = z4 - z1 - z2;
+
+ tmp22 = tmp11 + tmp12;
+ tmp23 = tmp11 - tmp12;
+
+ /* Odd part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
+
+ tmp11 = MULTIPLY(z2, FIX(1.306562965)); /* c3 */
+ tmp14 = MULTIPLY(z2, - FIX_0_541196100); /* -c9 */
+
+ tmp10 = z1 + z3;
+ tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669)); /* c7 */
+ tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384)); /* c5-c7 */
+ tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716)); /* c1-c5 */
+ tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580)); /* -(c7+c11) */
+ tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */
+ tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */
+ tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) - /* c7-c11 */
+ MULTIPLY(z4, FIX(1.982889723)); /* c5+c7 */
+
+ z1 -= z4;
+ z2 -= z3;
+ z3 = MULTIPLY(z1 + z2, FIX_0_541196100); /* c9 */
+ tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865); /* c3-c9 */
+ tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065); /* c3+c9 */
+
+ /* Final output stage */
+
+ wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*11] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*10] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*9] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*8] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*7] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
+ wsptr[8*6] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 12 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 12; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ z3 <<= CONST_BITS;
+
+ z4 = (INT32) wsptr[4];
+ z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */
+
+ tmp10 = z3 + z4;
+ tmp11 = z3 - z4;
+
+ z1 = (INT32) wsptr[2];
+ z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */
+ z1 <<= CONST_BITS;
+ z2 = (INT32) wsptr[6];
+ z2 <<= CONST_BITS;
+
+ tmp12 = z1 - z2;
+
+ tmp21 = z3 + tmp12;
+ tmp24 = z3 - tmp12;
+
+ tmp12 = z4 + z2;
+
+ tmp20 = tmp10 + tmp12;
+ tmp25 = tmp10 - tmp12;
+
+ tmp12 = z4 - z1 - z2;
+
+ tmp22 = tmp11 + tmp12;
+ tmp23 = tmp11 - tmp12;
+
+ /* Odd part */
+
+ z1 = (INT32) wsptr[1];
+ z2 = (INT32) wsptr[3];
+ z3 = (INT32) wsptr[5];
+ z4 = (INT32) wsptr[7];
+
+ tmp11 = MULTIPLY(z2, FIX(1.306562965)); /* c3 */
+ tmp14 = MULTIPLY(z2, - FIX_0_541196100); /* -c9 */
+
+ tmp10 = z1 + z3;
+ tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669)); /* c7 */
+ tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384)); /* c5-c7 */
+ tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716)); /* c1-c5 */
+ tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580)); /* -(c7+c11) */
+ tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */
+ tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */
+ tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) - /* c7-c11 */
+ MULTIPLY(z4, FIX(1.982889723)); /* c5+c7 */
+
+ z1 -= z4;
+ z2 -= z3;
+ z3 = MULTIPLY(z1 + z2, FIX_0_541196100); /* c9 */
+ tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865); /* c3-c9 */
+ tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065); /* c3+c9 */
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 8; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a 13x13 output block.
+ *
+ * Optimized algorithm with 29 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/26).
+ */
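+/* For reference, applying the cK definition above: c4 = 1.252223920 and
+ * c6 = 1.058554052, so the combined factor (c4+c6)/2 = 1.155388986 used in
+ * the first MULTIPLY() of the even part below; the other compound constants
+ * are formed the same way.
+ */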
+
+GLOBAL(void)
+jpeg_idct_13x13 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15;
+ INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26;
+ INT32 z1, z2, z3, z4;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[8*13]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ z1 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ z1 += ONE << (CONST_BITS-PASS1_BITS-1);
+
+ z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
+
+ tmp10 = z3 + z4;
+ tmp11 = z3 - z4;
+
+ tmp12 = MULTIPLY(tmp10, FIX(1.155388986)); /* (c4+c6)/2 */
+ tmp13 = MULTIPLY(tmp11, FIX(0.096834934)) + z1; /* (c4-c6)/2 */
+
+ tmp20 = MULTIPLY(z2, FIX(1.373119086)) + tmp12 + tmp13; /* c2 */
+ tmp22 = MULTIPLY(z2, FIX(0.501487041)) - tmp12 + tmp13; /* c10 */
+
+ tmp12 = MULTIPLY(tmp10, FIX(0.316450131)); /* (c8-c12)/2 */
+ tmp13 = MULTIPLY(tmp11, FIX(0.486914739)) + z1; /* (c8+c12)/2 */
+
+ tmp21 = MULTIPLY(z2, FIX(1.058554052)) - tmp12 + tmp13; /* c6 */
+ tmp25 = MULTIPLY(z2, - FIX(1.252223920)) + tmp12 + tmp13; /* c4 */
+
+ tmp12 = MULTIPLY(tmp10, FIX(0.435816023)); /* (c2-c10)/2 */
+ tmp13 = MULTIPLY(tmp11, FIX(0.937303064)) - z1; /* (c2+c10)/2 */
+
+ tmp23 = MULTIPLY(z2, - FIX(0.170464608)) - tmp12 - tmp13; /* c12 */
+ tmp24 = MULTIPLY(z2, - FIX(0.803364869)) + tmp12 - tmp13; /* c8 */
+
+ tmp26 = MULTIPLY(tmp11 - z2, FIX(1.414213562)) + z1; /* c0 */
+
+ /* Odd part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
+
+ tmp11 = MULTIPLY(z1 + z2, FIX(1.322312651)); /* c3 */
+ tmp12 = MULTIPLY(z1 + z3, FIX(1.163874945)); /* c5 */
+ tmp15 = z1 + z4;
+ tmp13 = MULTIPLY(tmp15, FIX(0.937797057)); /* c7 */
+ tmp10 = tmp11 + tmp12 + tmp13 -
+ MULTIPLY(z1, FIX(2.020082300)); /* c7+c5+c3-c1 */
+ tmp14 = MULTIPLY(z2 + z3, - FIX(0.338443458)); /* -c11 */
+ tmp11 += tmp14 + MULTIPLY(z2, FIX(0.837223564)); /* c5+c9+c11-c3 */
+ tmp12 += tmp14 - MULTIPLY(z3, FIX(1.572116027)); /* c1+c5-c9-c11 */
+ tmp14 = MULTIPLY(z2 + z4, - FIX(1.163874945)); /* -c5 */
+ tmp11 += tmp14;
+ tmp13 += tmp14 + MULTIPLY(z4, FIX(2.205608352)); /* c3+c5+c9-c7 */
+ tmp14 = MULTIPLY(z3 + z4, - FIX(0.657217813)); /* -c9 */
+ tmp12 += tmp14;
+ tmp13 += tmp14;
+ tmp15 = MULTIPLY(tmp15, FIX(0.338443458)); /* c11 */
+ tmp14 = tmp15 + MULTIPLY(z1, FIX(0.318774355)) - /* c9-c11 */
+ MULTIPLY(z2, FIX(0.466105296)); /* c1-c7 */
+ z1 = MULTIPLY(z3 - z2, FIX(0.937797057)); /* c7 */
+ tmp14 += z1;
+ tmp15 += z1 + MULTIPLY(z3, FIX(0.384515595)) - /* c3-c7 */
+ MULTIPLY(z4, FIX(1.742345811)); /* c1+c11 */
+
+ /* Final output stage */
+
+ wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*12] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*11] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*10] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*9] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*8] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
+ wsptr[8*7] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
+ wsptr[8*6] = (int) RIGHT_SHIFT(tmp26, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 13 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 13; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ z1 <<= CONST_BITS;
+
+ z2 = (INT32) wsptr[2];
+ z3 = (INT32) wsptr[4];
+ z4 = (INT32) wsptr[6];
+
+ tmp10 = z3 + z4;
+ tmp11 = z3 - z4;
+
+ tmp12 = MULTIPLY(tmp10, FIX(1.155388986)); /* (c4+c6)/2 */
+ tmp13 = MULTIPLY(tmp11, FIX(0.096834934)) + z1; /* (c4-c6)/2 */
+
+ tmp20 = MULTIPLY(z2, FIX(1.373119086)) + tmp12 + tmp13; /* c2 */
+ tmp22 = MULTIPLY(z2, FIX(0.501487041)) - tmp12 + tmp13; /* c10 */
+
+ tmp12 = MULTIPLY(tmp10, FIX(0.316450131)); /* (c8-c12)/2 */
+ tmp13 = MULTIPLY(tmp11, FIX(0.486914739)) + z1; /* (c8+c12)/2 */
+
+ tmp21 = MULTIPLY(z2, FIX(1.058554052)) - tmp12 + tmp13; /* c6 */
+ tmp25 = MULTIPLY(z2, - FIX(1.252223920)) + tmp12 + tmp13; /* c4 */
+
+ tmp12 = MULTIPLY(tmp10, FIX(0.435816023)); /* (c2-c10)/2 */
+ tmp13 = MULTIPLY(tmp11, FIX(0.937303064)) - z1; /* (c2+c10)/2 */
+
+ tmp23 = MULTIPLY(z2, - FIX(0.170464608)) - tmp12 - tmp13; /* c12 */
+ tmp24 = MULTIPLY(z2, - FIX(0.803364869)) + tmp12 - tmp13; /* c8 */
+
+ tmp26 = MULTIPLY(tmp11 - z2, FIX(1.414213562)) + z1; /* c0 */
+
+ /* Odd part */
+
+ z1 = (INT32) wsptr[1];
+ z2 = (INT32) wsptr[3];
+ z3 = (INT32) wsptr[5];
+ z4 = (INT32) wsptr[7];
+
+ tmp11 = MULTIPLY(z1 + z2, FIX(1.322312651)); /* c3 */
+ tmp12 = MULTIPLY(z1 + z3, FIX(1.163874945)); /* c5 */
+ tmp15 = z1 + z4;
+ tmp13 = MULTIPLY(tmp15, FIX(0.937797057)); /* c7 */
+ tmp10 = tmp11 + tmp12 + tmp13 -
+ MULTIPLY(z1, FIX(2.020082300)); /* c7+c5+c3-c1 */
+ tmp14 = MULTIPLY(z2 + z3, - FIX(0.338443458)); /* -c11 */
+ tmp11 += tmp14 + MULTIPLY(z2, FIX(0.837223564)); /* c5+c9+c11-c3 */
+ tmp12 += tmp14 - MULTIPLY(z3, FIX(1.572116027)); /* c1+c5-c9-c11 */
+ tmp14 = MULTIPLY(z2 + z4, - FIX(1.163874945)); /* -c5 */
+ tmp11 += tmp14;
+ tmp13 += tmp14 + MULTIPLY(z4, FIX(2.205608352)); /* c3+c5+c9-c7 */
+ tmp14 = MULTIPLY(z3 + z4, - FIX(0.657217813)); /* -c9 */
+ tmp12 += tmp14;
+ tmp13 += tmp14;
+ tmp15 = MULTIPLY(tmp15, FIX(0.338443458)); /* c11 */
+ tmp14 = tmp15 + MULTIPLY(z1, FIX(0.318774355)) - /* c9-c11 */
+ MULTIPLY(z2, FIX(0.466105296)); /* c1-c7 */
+ z1 = MULTIPLY(z3 - z2, FIX(0.937797057)); /* c7 */
+ tmp14 += z1;
+ tmp15 += z1 + MULTIPLY(z3, FIX(0.384515595)) - /* c3-c7 */
+ MULTIPLY(z4, FIX(1.742345811)); /* c1+c11 */
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 8; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a 14x14 output block.
+ *
+ * Optimized algorithm with 20 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/28).
+ */
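+/* For reference, applying the cK definition above:
+ * c4 = sqrt(2) * cos(4*pi/28) = sqrt(2) * cos(pi/7) = 1.274162392,
+ * c8 = sqrt(2) * cos(8*pi/28) = 0.881747734 and
+ * c12 = sqrt(2) * cos(12*pi/28) = 0.314692123, the three FIX() values
+ * applied to the DCTSIZE*4 coefficient in the even part below.
+ */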
+
+GLOBAL(void)
+jpeg_idct_14x14 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16;
+ INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26;
+ INT32 z1, z2, z3, z4;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[8*14]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ z1 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ z1 += ONE << (CONST_BITS-PASS1_BITS-1);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ z2 = MULTIPLY(z4, FIX(1.274162392)); /* c4 */
+ z3 = MULTIPLY(z4, FIX(0.314692123)); /* c12 */
+ z4 = MULTIPLY(z4, FIX(0.881747734)); /* c8 */
+
+ tmp10 = z1 + z2;
+ tmp11 = z1 + z3;
+ tmp12 = z1 - z4;
+
+ tmp23 = RIGHT_SHIFT(z1 - ((z2 + z3 - z4) << 1), /* c0 = (c4+c12-c8)*2 */
+ CONST_BITS-PASS1_BITS);
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
+
+ z3 = MULTIPLY(z1 + z2, FIX(1.105676686)); /* c6 */
+
+ tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */
+ tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */
+ tmp15 = MULTIPLY(z1, FIX(0.613604268)) - /* c10 */
+ MULTIPLY(z2, FIX(1.378756276)); /* c2 */
+
+ tmp20 = tmp10 + tmp13;
+ tmp26 = tmp10 - tmp13;
+ tmp21 = tmp11 + tmp14;
+ tmp25 = tmp11 - tmp14;
+ tmp22 = tmp12 + tmp15;
+ tmp24 = tmp12 - tmp15;
+
+ /* Odd part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
+ tmp13 = z4 << CONST_BITS;
+
+ tmp14 = z1 + z3;
+ tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607)); /* c3 */
+ tmp12 = MULTIPLY(tmp14, FIX(1.197448846)); /* c5 */
+ tmp10 = tmp11 + tmp12 + tmp13 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */
+ tmp14 = MULTIPLY(tmp14, FIX(0.752406978)); /* c9 */
+ tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426)); /* c9+c11-c13 */
+ z1 -= z2;
+ tmp15 = MULTIPLY(z1, FIX(0.467085129)) - tmp13; /* c11 */
+ tmp16 += tmp15;
+ z1 += z4;
+ z4 = MULTIPLY(z2 + z3, - FIX(0.158341681)) - tmp13; /* -c13 */
+ tmp11 += z4 - MULTIPLY(z2, FIX(0.424103948)); /* c3-c9-c13 */
+ tmp12 += z4 - MULTIPLY(z3, FIX(2.373959773)); /* c3+c5-c13 */
+ z4 = MULTIPLY(z3 - z2, FIX(1.405321284)); /* c1 */
+ tmp14 += z4 + tmp13 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */
+ tmp15 += z4 + MULTIPLY(z2, FIX(0.674957567)); /* c1+c11-c5 */
+
+ tmp13 = (z1 - z3) << PASS1_BITS;
+
+ /* Final output stage */
+
+ wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*13] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*12] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*11] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*3] = (int) (tmp23 + tmp13);
+ wsptr[8*10] = (int) (tmp23 - tmp13);
+ wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*9] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
+ wsptr[8*8] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
+ wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS);
+ wsptr[8*7] = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 14 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 14; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ z1 <<= CONST_BITS;
+ z4 = (INT32) wsptr[4];
+ z2 = MULTIPLY(z4, FIX(1.274162392)); /* c4 */
+ z3 = MULTIPLY(z4, FIX(0.314692123)); /* c12 */
+ z4 = MULTIPLY(z4, FIX(0.881747734)); /* c8 */
+
+ tmp10 = z1 + z2;
+ tmp11 = z1 + z3;
+ tmp12 = z1 - z4;
+
+ tmp23 = z1 - ((z2 + z3 - z4) << 1); /* c0 = (c4+c12-c8)*2 */
+
+ z1 = (INT32) wsptr[2];
+ z2 = (INT32) wsptr[6];
+
+ z3 = MULTIPLY(z1 + z2, FIX(1.105676686)); /* c6 */
+
+ tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */
+ tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */
+ tmp15 = MULTIPLY(z1, FIX(0.613604268)) - /* c10 */
+ MULTIPLY(z2, FIX(1.378756276)); /* c2 */
+
+ tmp20 = tmp10 + tmp13;
+ tmp26 = tmp10 - tmp13;
+ tmp21 = tmp11 + tmp14;
+ tmp25 = tmp11 - tmp14;
+ tmp22 = tmp12 + tmp15;
+ tmp24 = tmp12 - tmp15;
+
+ /* Odd part */
+
+ z1 = (INT32) wsptr[1];
+ z2 = (INT32) wsptr[3];
+ z3 = (INT32) wsptr[5];
+ z4 = (INT32) wsptr[7];
+ z4 <<= CONST_BITS;
+
+ tmp14 = z1 + z3;
+ tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607)); /* c3 */
+ tmp12 = MULTIPLY(tmp14, FIX(1.197448846)); /* c5 */
+ tmp10 = tmp11 + tmp12 + z4 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */
+ tmp14 = MULTIPLY(tmp14, FIX(0.752406978)); /* c9 */
+ tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426)); /* c9+c11-c13 */
+ z1 -= z2;
+ tmp15 = MULTIPLY(z1, FIX(0.467085129)) - z4; /* c11 */
+ tmp16 += tmp15;
+ tmp13 = MULTIPLY(z2 + z3, - FIX(0.158341681)) - z4; /* -c13 */
+ tmp11 += tmp13 - MULTIPLY(z2, FIX(0.424103948)); /* c3-c9-c13 */
+ tmp12 += tmp13 - MULTIPLY(z3, FIX(2.373959773)); /* c3+c5-c13 */
+ tmp13 = MULTIPLY(z3 - z2, FIX(1.405321284)); /* c1 */
+ tmp14 += tmp13 + z4 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */
+ tmp15 += tmp13 + MULTIPLY(z2, FIX(0.674957567)); /* c1+c11-c5 */
+
+ tmp13 = ((z1 - z3) << CONST_BITS) + z4;
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 8; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a 15x15 output block.
+ *
+ * Optimized algorithm with 22 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/30).
+ */
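+/* For reference, applying the cK definition above:
+ * c12 = sqrt(2) * cos(12*pi/30) = sqrt(2) * cos(2*pi/5) = 0.437016024 and
+ * c6 = sqrt(2) * cos(6*pi/30) = sqrt(2) * cos(pi/5) = 1.144122806, the
+ * first two FIX() values in the even part below.
+ */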
+
+GLOBAL(void)
+jpeg_idct_15x15 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16;
+ INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27;
+ INT32 z1, z2, z3, z4;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[8*15]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ z1 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ z1 += ONE << (CONST_BITS-PASS1_BITS-1);
+
+ z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
+
+ tmp10 = MULTIPLY(z4, FIX(0.437016024)); /* c12 */
+ tmp11 = MULTIPLY(z4, FIX(1.144122806)); /* c6 */
+
+ tmp12 = z1 - tmp10;
+ tmp13 = z1 + tmp11;
+ z1 -= (tmp11 - tmp10) << 1; /* c0 = (c6-c12)*2 */
+
+ z4 = z2 - z3;
+ z3 += z2;
+ tmp10 = MULTIPLY(z3, FIX(1.337628990)); /* (c2+c4)/2 */
+ tmp11 = MULTIPLY(z4, FIX(0.045680613)); /* (c2-c4)/2 */
+ z2 = MULTIPLY(z2, FIX(1.439773946)); /* c4+c14 */
+
+ tmp20 = tmp13 + tmp10 + tmp11;
+ tmp23 = tmp12 - tmp10 + tmp11 + z2;
+
+ tmp10 = MULTIPLY(z3, FIX(0.547059574)); /* (c8+c14)/2 */
+ tmp11 = MULTIPLY(z4, FIX(0.399234004)); /* (c8-c14)/2 */
+
+ tmp25 = tmp13 - tmp10 - tmp11;
+ tmp26 = tmp12 + tmp10 - tmp11 - z2;
+
+ tmp10 = MULTIPLY(z3, FIX(0.790569415)); /* (c6+c12)/2 */
+ tmp11 = MULTIPLY(z4, FIX(0.353553391)); /* (c6-c12)/2 */
+
+ tmp21 = tmp12 + tmp10 + tmp11;
+ tmp24 = tmp13 - tmp10 + tmp11;
+ tmp11 += tmp11;
+ tmp22 = z1 + tmp11; /* c10 = c6-c12 */
+ tmp27 = z1 - tmp11 - tmp11; /* c0 = (c6-c12)*2 */
+
+ /* Odd part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+ z3 = MULTIPLY(z4, FIX(1.224744871)); /* c5 */
+ z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
+
+ tmp13 = z2 - z4;
+ tmp15 = MULTIPLY(z1 + tmp13, FIX(0.831253876)); /* c9 */
+ tmp11 = tmp15 + MULTIPLY(z1, FIX(0.513743148)); /* c3-c9 */
+ tmp14 = tmp15 - MULTIPLY(tmp13, FIX(2.176250899)); /* c3+c9 */
+
+ tmp13 = MULTIPLY(z2, - FIX(0.831253876)); /* -c9 */
+ tmp15 = MULTIPLY(z2, - FIX(1.344997024)); /* -c3 */
+ z2 = z1 - z4;
+ tmp12 = z3 + MULTIPLY(z2, FIX(1.406466353)); /* c1 */
+
+ tmp10 = tmp12 + MULTIPLY(z4, FIX(2.457431844)) - tmp15; /* c1+c7 */
+ tmp16 = tmp12 - MULTIPLY(z1, FIX(1.112434820)) + tmp13; /* c1-c13 */
+ tmp12 = MULTIPLY(z2, FIX(1.224744871)) - z3; /* c5 */
+ z2 = MULTIPLY(z1 + z4, FIX(0.575212477)); /* c11 */
+ tmp13 += z2 + MULTIPLY(z1, FIX(0.475753014)) - z3; /* c7-c11 */
+ tmp15 += z2 - MULTIPLY(z4, FIX(0.869244010)) + z3; /* c11+c13 */
+
+ /* Final output stage */
+
+ wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*14] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*13] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*12] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*11] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*10] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
+ wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
+ wsptr[8*9] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
+ wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS);
+ wsptr[8*8] = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS);
+ wsptr[8*7] = (int) RIGHT_SHIFT(tmp27, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 15 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 15; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ z1 <<= CONST_BITS;
+
+ z2 = (INT32) wsptr[2];
+ z3 = (INT32) wsptr[4];
+ z4 = (INT32) wsptr[6];
+
+ tmp10 = MULTIPLY(z4, FIX(0.437016024)); /* c12 */
+ tmp11 = MULTIPLY(z4, FIX(1.144122806)); /* c6 */
+
+ tmp12 = z1 - tmp10;
+ tmp13 = z1 + tmp11;
+ z1 -= (tmp11 - tmp10) << 1; /* c0 = (c6-c12)*2 */
+
+ z4 = z2 - z3;
+ z3 += z2;
+ tmp10 = MULTIPLY(z3, FIX(1.337628990)); /* (c2+c4)/2 */
+ tmp11 = MULTIPLY(z4, FIX(0.045680613)); /* (c2-c4)/2 */
+ z2 = MULTIPLY(z2, FIX(1.439773946)); /* c4+c14 */
+
+ tmp20 = tmp13 + tmp10 + tmp11;
+ tmp23 = tmp12 - tmp10 + tmp11 + z2;
+
+ tmp10 = MULTIPLY(z3, FIX(0.547059574)); /* (c8+c14)/2 */
+ tmp11 = MULTIPLY(z4, FIX(0.399234004)); /* (c8-c14)/2 */
+
+ tmp25 = tmp13 - tmp10 - tmp11;
+ tmp26 = tmp12 + tmp10 - tmp11 - z2;
+
+ tmp10 = MULTIPLY(z3, FIX(0.790569415)); /* (c6+c12)/2 */
+ tmp11 = MULTIPLY(z4, FIX(0.353553391)); /* (c6-c12)/2 */
+
+ tmp21 = tmp12 + tmp10 + tmp11;
+ tmp24 = tmp13 - tmp10 + tmp11;
+ tmp11 += tmp11;
+ tmp22 = z1 + tmp11; /* c10 = c6-c12 */
+ tmp27 = z1 - tmp11 - tmp11; /* c0 = (c6-c12)*2 */
+
+ /* Odd part */
+
+ z1 = (INT32) wsptr[1];
+ z2 = (INT32) wsptr[3];
+ z4 = (INT32) wsptr[5];
+ z3 = MULTIPLY(z4, FIX(1.224744871)); /* c5 */
+ z4 = (INT32) wsptr[7];
+
+ tmp13 = z2 - z4;
+ tmp15 = MULTIPLY(z1 + tmp13, FIX(0.831253876)); /* c9 */
+ tmp11 = tmp15 + MULTIPLY(z1, FIX(0.513743148)); /* c3-c9 */
+ tmp14 = tmp15 - MULTIPLY(tmp13, FIX(2.176250899)); /* c3+c9 */
+
+ tmp13 = MULTIPLY(z2, - FIX(0.831253876)); /* -c9 */
+ tmp15 = MULTIPLY(z2, - FIX(1.344997024)); /* -c3 */
+ z2 = z1 - z4;
+ tmp12 = z3 + MULTIPLY(z2, FIX(1.406466353)); /* c1 */
+
+ tmp10 = tmp12 + MULTIPLY(z4, FIX(2.457431844)) - tmp15; /* c1+c7 */
+ tmp16 = tmp12 - MULTIPLY(z1, FIX(1.112434820)) + tmp13; /* c1-c13 */
+ tmp12 = MULTIPLY(z2, FIX(1.224744871)) - z3; /* c5 */
+ z2 = MULTIPLY(z1 + z4, FIX(0.575212477)); /* c11 */
+ tmp13 += z2 + MULTIPLY(z1, FIX(0.475753014)) - z3; /* c7-c11 */
+ tmp15 += z2 - MULTIPLY(z4, FIX(0.869244010)) + z3; /* c11+c13 */
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp27,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 8; /* advance pointer to next row */
+ }
+}
+
+
+/*
+ * Perform dequantization and inverse DCT on one block of coefficients,
+ * producing a 16x16 output block.
+ *
+ * Optimized algorithm with 28 multiplications in the 1-D kernel.
+ * cK represents sqrt(2) * cos(K*pi/32).
+ */
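+/* For reference, applying the cK definition above:
+ * c4 = sqrt(2) * cos(4*pi/32) = sqrt(2) * cos(pi/8) = 1.306562965 and
+ * c12 = sqrt(2) * cos(12*pi/32) = sqrt(2) * cos(3*pi/8) = 0.541196100;
+ * these are the same angles as c2 and c6 of the 8x8 kernel, which is why
+ * the code below reuses the predefined FIX_0_541196100 constant and notes
+ * "c4[16] = c2[8]", etc.
+ */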
+
+GLOBAL(void)
+jpeg_idct_16x16 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
+ JCOEFPTR coef_block,
+ JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+ INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13;
+ INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27;
+ INT32 z1, z2, z3, z4;
+ JCOEFPTR inptr;
+ ISLOW_MULT_TYPE * quantptr;
+ int * wsptr;
+ JSAMPROW outptr;
+ JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+ int ctr;
+ int workspace[8*16]; /* buffers data between passes */
+ SHIFT_TEMPS
+
+ /* Pass 1: process columns from input, store into work array. */
+
+ inptr = coef_block;
+ quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
+ wsptr = workspace;
+ for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
+ /* Even part */
+
+ tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+ tmp0 <<= CONST_BITS;
+ /* Add fudge factor here for final descale. */
+ tmp0 += 1 << (CONST_BITS-PASS1_BITS-1);
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+ tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */
+ tmp2 = MULTIPLY(z1, FIX_0_541196100); /* c12[16] = c6[8] */
+
+ tmp10 = tmp0 + tmp1;
+ tmp11 = tmp0 - tmp1;
+ tmp12 = tmp0 + tmp2;
+ tmp13 = tmp0 - tmp2;
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
+ z3 = z1 - z2;
+ z4 = MULTIPLY(z3, FIX(0.275899379)); /* c14[16] = c7[8] */
+ z3 = MULTIPLY(z3, FIX(1.387039845)); /* c2[16] = c1[8] */
+
+ tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447); /* (c6+c2)[16] = (c3+c1)[8] */
+ tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223); /* (c6-c14)[16] = (c3-c7)[8] */
+ tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */
+ tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */
+
+ tmp20 = tmp10 + tmp0;
+ tmp27 = tmp10 - tmp0;
+ tmp21 = tmp12 + tmp1;
+ tmp26 = tmp12 - tmp1;
+ tmp22 = tmp13 + tmp2;
+ tmp25 = tmp13 - tmp2;
+ tmp23 = tmp11 + tmp3;
+ tmp24 = tmp11 - tmp3;
+
+ /* Odd part */
+
+ z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+ z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+ z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+ z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
+
+ tmp11 = z1 + z3;
+
+ tmp1 = MULTIPLY(z1 + z2, FIX(1.353318001)); /* c3 */
+ tmp2 = MULTIPLY(tmp11, FIX(1.247225013)); /* c5 */
+ tmp3 = MULTIPLY(z1 + z4, FIX(1.093201867)); /* c7 */
+ tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586)); /* c9 */
+ tmp11 = MULTIPLY(tmp11, FIX(0.666655658)); /* c11 */
+ tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528)); /* c13 */
+ tmp0 = tmp1 + tmp2 + tmp3 -
+ MULTIPLY(z1, FIX(2.286341144)); /* c7+c5+c3-c1 */
+ tmp13 = tmp10 + tmp11 + tmp12 -
+ MULTIPLY(z1, FIX(1.835730603)); /* c9+c11+c13-c15 */
+ z1 = MULTIPLY(z2 + z3, FIX(0.138617169)); /* c15 */
+ tmp1 += z1 + MULTIPLY(z2, FIX(0.071888074)); /* c9+c11-c3-c15 */
+ tmp2 += z1 - MULTIPLY(z3, FIX(1.125726048)); /* c5+c7+c15-c3 */
+ z1 = MULTIPLY(z3 - z2, FIX(1.407403738)); /* c1 */
+ tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282)); /* c1+c11-c9-c13 */
+ tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411)); /* c1+c5+c13-c7 */
+ z2 += z4;
+ z1 = MULTIPLY(z2, - FIX(0.666655658)); /* -c11 */
+ tmp1 += z1;
+ tmp3 += z1 + MULTIPLY(z4, FIX(1.065388962)); /* c3+c11+c15-c7 */
+ z2 = MULTIPLY(z2, - FIX(1.247225013)); /* -c5 */
+ tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809)); /* c1+c5+c9-c13 */
+ tmp12 += z2;
+ z2 = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */
+ tmp2 += z2;
+ tmp3 += z2;
+ z2 = MULTIPLY(z4 - z3, FIX(0.410524528)); /* c13 */
+ tmp10 += z2;
+ tmp11 += z2;
+
+ /* Final output stage */
+
+ wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[8*15] = (int) RIGHT_SHIFT(tmp20 - tmp0, CONST_BITS-PASS1_BITS);
+ wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp1, CONST_BITS-PASS1_BITS);
+ wsptr[8*14] = (int) RIGHT_SHIFT(tmp21 - tmp1, CONST_BITS-PASS1_BITS);
+ wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp2, CONST_BITS-PASS1_BITS);
+ wsptr[8*13] = (int) RIGHT_SHIFT(tmp22 - tmp2, CONST_BITS-PASS1_BITS);
+ wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp3, CONST_BITS-PASS1_BITS);
+ wsptr[8*12] = (int) RIGHT_SHIFT(tmp23 - tmp3, CONST_BITS-PASS1_BITS);
+ wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*11] = (int) RIGHT_SHIFT(tmp24 - tmp10, CONST_BITS-PASS1_BITS);
+ wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*10] = (int) RIGHT_SHIFT(tmp25 - tmp11, CONST_BITS-PASS1_BITS);
+ wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*9] = (int) RIGHT_SHIFT(tmp26 - tmp12, CONST_BITS-PASS1_BITS);
+ wsptr[8*7] = (int) RIGHT_SHIFT(tmp27 + tmp13, CONST_BITS-PASS1_BITS);
+ wsptr[8*8] = (int) RIGHT_SHIFT(tmp27 - tmp13, CONST_BITS-PASS1_BITS);
+ }
+
+ /* Pass 2: process 16 rows from work array, store into output array. */
+
+ wsptr = workspace;
+ for (ctr = 0; ctr < 16; ctr++) {
+ outptr = output_buf[ctr] + output_col;
+
+ /* Even part */
+
+ /* Add fudge factor here for final descale. */
+ tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
+ tmp0 <<= CONST_BITS;
+
+ z1 = (INT32) wsptr[4];
+ tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */
+ tmp2 = MULTIPLY(z1, FIX_0_541196100); /* c12[16] = c6[8] */
+
+ tmp10 = tmp0 + tmp1;
+ tmp11 = tmp0 - tmp1;
+ tmp12 = tmp0 + tmp2;
+ tmp13 = tmp0 - tmp2;
+
+ z1 = (INT32) wsptr[2];
+ z2 = (INT32) wsptr[6];
+ z3 = z1 - z2;
+ z4 = MULTIPLY(z3, FIX(0.275899379)); /* c14[16] = c7[8] */
+ z3 = MULTIPLY(z3, FIX(1.387039845)); /* c2[16] = c1[8] */
+
+ tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447); /* (c6+c2)[16] = (c3+c1)[8] */
+ tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223); /* (c6-c14)[16] = (c3-c7)[8] */
+ tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */
+ tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */
+
+ tmp20 = tmp10 + tmp0;
+ tmp27 = tmp10 - tmp0;
+ tmp21 = tmp12 + tmp1;
+ tmp26 = tmp12 - tmp1;
+ tmp22 = tmp13 + tmp2;
+ tmp25 = tmp13 - tmp2;
+ tmp23 = tmp11 + tmp3;
+ tmp24 = tmp11 - tmp3;
+
+ /* Odd part */
+
+ z1 = (INT32) wsptr[1];
+ z2 = (INT32) wsptr[3];
+ z3 = (INT32) wsptr[5];
+ z4 = (INT32) wsptr[7];
+
+ tmp11 = z1 + z3;
+
+ tmp1 = MULTIPLY(z1 + z2, FIX(1.353318001)); /* c3 */
+ tmp2 = MULTIPLY(tmp11, FIX(1.247225013)); /* c5 */
+ tmp3 = MULTIPLY(z1 + z4, FIX(1.093201867)); /* c7 */
+ tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586)); /* c9 */
+ tmp11 = MULTIPLY(tmp11, FIX(0.666655658)); /* c11 */
+ tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528)); /* c13 */
+ tmp0 = tmp1 + tmp2 + tmp3 -
+ MULTIPLY(z1, FIX(2.286341144)); /* c7+c5+c3-c1 */
+ tmp13 = tmp10 + tmp11 + tmp12 -
+ MULTIPLY(z1, FIX(1.835730603)); /* c9+c11+c13-c15 */
+ z1 = MULTIPLY(z2 + z3, FIX(0.138617169)); /* c15 */
+ tmp1 += z1 + MULTIPLY(z2, FIX(0.071888074)); /* c9+c11-c3-c15 */
+ tmp2 += z1 - MULTIPLY(z3, FIX(1.125726048)); /* c5+c7+c15-c3 */
+ z1 = MULTIPLY(z3 - z2, FIX(1.407403738)); /* c1 */
+ tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282)); /* c1+c11-c9-c13 */
+ tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411)); /* c1+c5+c13-c7 */
+ z2 += z4;
+ z1 = MULTIPLY(z2, - FIX(0.666655658)); /* -c11 */
+ tmp1 += z1;
+ tmp3 += z1 + MULTIPLY(z4, FIX(1.065388962)); /* c3+c11+c15-c7 */
+ z2 = MULTIPLY(z2, - FIX(1.247225013)); /* -c5 */
+ tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809)); /* c1+c5+c9-c13 */
+ tmp12 += z2;
+ z2 = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */
+ tmp2 += z2;
+ tmp3 += z2;
+ z2 = MULTIPLY(z4 - z3, FIX(0.410524528)); /* c13 */
+ tmp10 += z2;
+ tmp11 += z2;
+
+ /* Final output stage */
+
+ outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[15] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp0,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp1,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp1,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp2,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp2,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp3,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp3,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp10,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp11,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp12,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp27 + tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+ outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp27 - tmp13,
+ CONST_BITS+PASS1_BITS+3)
+ & RANGE_MASK];
+
+ wsptr += 8; /* advance pointer to next row */
+ }
+}
+
+#endif /* IDCT_SCALING_SUPPORTED */
#endif /* DCT_ISLOW_SUPPORTED */
diff --git a/jpegcomp.h b/jpegcomp.h
index 1b9e0a4..ed9eeab 100644
--- a/jpegcomp.h
+++ b/jpegcomp.h
@@ -11,6 +11,8 @@
#if JPEG_LIB_VERSION >= 70
#define _DCT_scaled_size DCT_h_scaled_size
+#define _DCT_h_scaled_size DCT_h_scaled_size
+#define _DCT_v_scaled_size DCT_v_scaled_size
#define _min_DCT_scaled_size min_DCT_h_scaled_size
#define _min_DCT_h_scaled_size min_DCT_h_scaled_size
#define _min_DCT_v_scaled_size min_DCT_v_scaled_size
@@ -18,6 +20,8 @@
#define _jpeg_height jpeg_height
#else
#define _DCT_scaled_size DCT_scaled_size
+#define _DCT_h_scaled_size DCT_scaled_size
+#define _DCT_v_scaled_size DCT_scaled_size
#define _min_DCT_scaled_size min_DCT_scaled_size
#define _min_DCT_h_scaled_size min_DCT_scaled_size
#define _min_DCT_v_scaled_size min_DCT_scaled_size
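The added macros follow the existing compatibility pattern in jpegcomp.h: downstream code can refer to a component's per-axis scaled DCT size through a single name regardless of whether it is built against the libjpeg v7+ structure layout (separate DCT_h_scaled_size/DCT_v_scaled_size fields) or the v6b layout (one DCT_scaled_size field). A minimal sketch of such a use is shown below; print_scaled_dct_sizes() is a hypothetical function, and the example assumes it is compiled inside the libjpeg-turbo source tree so that jpegcomp.h is available.

#include <stdio.h>
#include "jpeglib.h"
#include "jpegcomp.h"

/* Hypothetical example: report each component's scaled DCT block size.
 * _DCT_h_scaled_size/_DCT_v_scaled_size expand to the v7+ fields when
 * JPEG_LIB_VERSION >= 70 and to the single v6b DCT_scaled_size otherwise. */
static void print_scaled_dct_sizes(j_decompress_ptr dinfo)
{
  int ci;
  jpeg_component_info *compptr;

  for (ci = 0, compptr = dinfo->comp_info; ci < dinfo->num_components;
       ci++, compptr++)
    fprintf(stderr, "component %d: %dx%d\n", ci,
            compptr->_DCT_h_scaled_size, compptr->_DCT_v_scaled_size);
}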
diff --git a/jpegtran.c b/jpegtran.c
index f861464..6a22ed2 100644
--- a/jpegtran.c
+++ b/jpegtran.c
@@ -78,14 +78,14 @@
fprintf(stderr, " -trim Drop non-transformable edge blocks\n");
#endif
fprintf(stderr, "Switches for advanced users:\n");
+#ifdef C_ARITH_CODING_SUPPORTED
+ fprintf(stderr, " -arithmetic Use arithmetic coding\n");
+#endif
fprintf(stderr, " -restart N Set restart interval in rows, or in blocks with B\n");
fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n");
fprintf(stderr, " -outfile name Specify name for output file\n");
fprintf(stderr, " -verbose or -debug Emit debug output\n");
fprintf(stderr, "Switches for wizards:\n");
-#ifdef C_ARITH_CODING_SUPPORTED
- fprintf(stderr, " -arithmetic Use arithmetic coding\n");
-#endif
#ifdef C_MULTISCAN_FILES_SUPPORTED
fprintf(stderr, " -scans file Create multi-scan JPEG per script file\n");
#endif
@@ -204,9 +204,9 @@
if (! printed_version) {
fprintf(stderr, "%s version %s (build %s)\n",
PACKAGE_NAME, VERSION, BUILD);
- fprintf(stderr, "%s\n\n", LJTCOPYRIGHT);
- fprintf(stderr, "Based on Independent JPEG Group's libjpeg, version %s\n%s\n\n",
- JVERSION, JCOPYRIGHT);
+ fprintf(stderr, "%s\n\n", JCOPYRIGHT);
+ fprintf(stderr, "Emulating The Independent JPEG Group's libjpeg, version %s\n\n",
+ JVERSION);
printed_version = TRUE;
}
cinfo->err->trace_level++;
diff --git a/jversion.h b/jversion.h
index a045405..71d7b91 100644
--- a/jversion.h
+++ b/jversion.h
@@ -2,7 +2,7 @@
* jversion.h
*
* Copyright (C) 1991-2010, Thomas G. Lane, Guido Vollbeding.
- * Copyright (C) 2010, D. R. Commander.
+ * Copyright (C) 2010, 2012, D. R. Commander.
* This file is part of the Independent JPEG Group's software.
* For conditions of distribution and use, see the accompanying README file.
*
@@ -14,23 +14,18 @@
#define JVERSION "8b 16-May-2010"
-#define JCOPYRIGHT "Copyright (C) 2010, Thomas G. Lane, Guido Vollbeding"
-
#elif JPEG_LIB_VERSION >= 70
#define JVERSION "7 27-Jun-2009"
-#define JCOPYRIGHT "Copyright (C) 2009, Thomas G. Lane, Guido Vollbeding"
-
#else
#define JVERSION "6b 27-Mar-1998"
-#define JCOPYRIGHT "Copyright (C) 1998, Thomas G. Lane"
-
#endif
-#define LJTCOPYRIGHT "Copyright (C) 1999-2006 MIYASAKA Masaru\n" \
+#define JCOPYRIGHT "Copyright (C) 1991-2010 Thomas G. Lane, Guido Vollbeding\n" \
+ "Copyright (C) 1999-2006 MIYASAKA Masaru\n" \
"Copyright (C) 2009 Pierre Ossman for Cendio AB\n" \
- "Copyright (C) 2009-2011 D. R. Commander\n" \
+ "Copyright (C) 2009-2012 D. R. Commander\n" \
"Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies)"
diff --git a/simd/jdmrgss2.asm b/simd/jdmrgss2.asm
index f1c3c9d..f0fcc2c 100644
--- a/simd/jdmrgss2.asm
+++ b/simd/jdmrgss2.asm
@@ -478,7 +478,7 @@
movq MMWORD [edi], xmmA
add edi, byte SIZEOF_XMMWORD/2
sub ecx, byte SIZEOF_XMMWORD/8
- psrldq xmmA, SIZEOF_XMMWORD/8*4
+ psrldq xmmA, SIZEOF_XMMWORD/2
.column_st7:
; Store one pixel (4 bytes) of xmmA to the output when it has enough
; space.
diff --git a/simd/jsimd.h b/simd/jsimd.h
index 6ee99cc..3d4751f 100644
--- a/simd/jsimd.h
+++ b/simd/jsimd.h
@@ -522,6 +522,10 @@
JPP((JDIMENSION output_width, JSAMPIMAGE input_buf,
JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf));
+EXTERN(void) jsimd_h2v1_fancy_upsample_neon
+ JPP((int max_v_samp_factor, JDIMENSION downsampled_width,
+ JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
+
/* SIMD Sample Conversion */
EXTERN(void) jsimd_convsamp_mmx JPP((JSAMPARRAY sample_data,
JDIMENSION start_col,
diff --git a/simd/jsimd_arm.c b/simd/jsimd_arm.c
index 5a095f2..cae84df 100644
--- a/simd/jsimd_arm.c
+++ b/simd/jsimd_arm.c
@@ -104,7 +104,7 @@
int bufsize = 1024; /* an initial guess for the line buffer size limit */
#endif
- if (simd_support != ~0)
+ if (simd_support != ~0U)
return;
simd_support = 0;
@@ -338,6 +338,15 @@
{
init_simd();
+ /* The code is optimised for these values only */
+ if (BITS_IN_JSAMPLE != 8)
+ return 0;
+ if (sizeof(JDIMENSION) != 4)
+ return 0;
+
+ if (simd_support & JSIMD_ARM_NEON)
+ return 1;
+
return 0;
}
@@ -355,6 +364,9 @@
JSAMPARRAY input_data,
JSAMPARRAY * output_data_ptr)
{
+ if (simd_support & JSIMD_ARM_NEON)
+ jsimd_h2v1_fancy_upsample_neon(cinfo->max_v_samp_factor,
+ compptr->downsampled_width, input_data, output_data_ptr);
}
GLOBAL(int)
diff --git a/simd/jsimd_arm_neon.S b/simd/jsimd_arm_neon.S
index b2f9c2a..9962b8a 100644
--- a/simd/jsimd_arm_neon.S
+++ b/simd/jsimd_arm_neon.S
@@ -2157,3 +2157,241 @@
.unreq SHIFT
.unreq LOOP_COUNT
.endfunc
+
+/*****************************************************************************/
+
+/*
+ * GLOBAL(void)
+ * jsimd_h2v1_fancy_upsample_neon (int max_v_samp_factor,
+ * JDIMENSION downsampled_width,
+ * JSAMPARRAY input_data,
+ * JSAMPARRAY * output_data_ptr);
+ *
+ * Note: the use of unaligned writes is the main remaining bottleneck in
+ * this code; addressing it could potentially yield a performance
+ * improvement of up to tens of percent on Cortex-A8/Cortex-A9.
+ */
+
+/*
+ * Upsample 16 source pixels to 32 destination pixels. The new 16 source
+ * pixels are loaded into q0. The previous 16 source pixels are kept in q1.
+ * The shifted-by-one source pixels are constructed in q2 from q0 and q1.
+ * Register d28 is used for multiplication by 3. Register q15 is used
+ * for adding the +1 bias.
+ */
+.macro upsample16 OUTPTR, INPTR
+ vld1.8 {q0}, [\INPTR]!
+ vmovl.u8 q8, d0
+ vext.8 q2, q1, q0, #15
+ vmovl.u8 q9, d1
+ vaddw.u8 q10, q15, d4
+ vaddw.u8 q11, q15, d5
+ vmlal.u8 q8, d4, d28
+ vmlal.u8 q9, d5, d28
+ vmlal.u8 q10, d0, d28
+ vmlal.u8 q11, d1, d28
+ vmov q1, q0 /* backup source pixels to q1 */
+ vrshrn.u16 d6, q8, #2
+ vrshrn.u16 d7, q9, #2
+ vshrn.u16 d8, q10, #2
+ vshrn.u16 d9, q11, #2
+ vst2.8 {d6, d7, d8, d9}, [\OUTPTR]!
+.endm
+
+/*
+ * Upsample 32 source pixels to 64 destination pixels. Compared to the
+ * 'upsample16' macro, the roles of the q0 and q1 registers are swapped for
+ * the even and odd groups of 16 pixels, which is why the "vmov q1, q0"
+ * instruction is not needed. This unrolling also allows loads and stores to
+ * be reordered to hide multiplication latency and reduce stalls.
+ */
+.macro upsample32 OUTPTR, INPTR
+ /* even 16 pixels group */
+ vld1.8 {q0}, [\INPTR]!
+ vmovl.u8 q8, d0
+ vext.8 q2, q1, q0, #15
+ vmovl.u8 q9, d1
+ vaddw.u8 q10, q15, d4
+ vaddw.u8 q11, q15, d5
+ vmlal.u8 q8, d4, d28
+ vmlal.u8 q9, d5, d28
+ vmlal.u8 q10, d0, d28
+ vmlal.u8 q11, d1, d28
+ /* odd 16 pixels group */
+ vld1.8 {q1}, [\INPTR]!
+ vrshrn.u16 d6, q8, #2
+ vrshrn.u16 d7, q9, #2
+ vshrn.u16 d8, q10, #2
+ vshrn.u16 d9, q11, #2
+ vmovl.u8 q8, d2
+ vext.8 q2, q0, q1, #15
+ vmovl.u8 q9, d3
+ vaddw.u8 q10, q15, d4
+ vaddw.u8 q11, q15, d5
+ vmlal.u8 q8, d4, d28
+ vmlal.u8 q9, d5, d28
+ vmlal.u8 q10, d2, d28
+ vmlal.u8 q11, d3, d28
+ vst2.8 {d6, d7, d8, d9}, [\OUTPTR]!
+ vrshrn.u16 d6, q8, #2
+ vrshrn.u16 d7, q9, #2
+ vshrn.u16 d8, q10, #2
+ vshrn.u16 d9, q11, #2
+ vst2.8 {d6, d7, d8, d9}, [\OUTPTR]!
+.endm
+
+/*
+ * Upsample a row of WIDTH pixels from INPTR to OUTPTR.
+ */
+.macro upsample_row OUTPTR, INPTR, WIDTH, TMP1
+ /* special case for the first and last pixels */
+ sub \WIDTH, \WIDTH, #1
+ add \OUTPTR, \OUTPTR, #1
+ ldrb \TMP1, [\INPTR, \WIDTH]
+ strb \TMP1, [\OUTPTR, \WIDTH, asl #1]
+ ldrb \TMP1, [\INPTR], #1
+ strb \TMP1, [\OUTPTR, #-1]
+ vmov.8 d3[7], \TMP1
+
+ subs \WIDTH, \WIDTH, #32
+ blt 5f
+0: /* process 32 pixels per iteration */
+ upsample32 \OUTPTR, \INPTR
+ subs \WIDTH, \WIDTH, #32
+ bge 0b
+5:
+ adds \WIDTH, \WIDTH, #16
+ blt 1f
+0: /* process 16 pixels if needed */
+ upsample16 \OUTPTR, \INPTR
+ subs \WIDTH, \WIDTH, #16
+1:
+ adds \WIDTH, \WIDTH, #16
+ beq 9f
+
+ /* load the remaining 1-15 pixels */
+ add \INPTR, \INPTR, \WIDTH
+ tst \WIDTH, #1
+ beq 2f
+ sub \INPTR, \INPTR, #1
+ vld1.8 {d0[0]}, [\INPTR]
+2:
+ tst \WIDTH, #2
+ beq 2f
+ vext.8 d0, d0, d0, #6
+ sub \INPTR, \INPTR, #1
+ vld1.8 {d0[1]}, [\INPTR]
+ sub \INPTR, \INPTR, #1
+ vld1.8 {d0[0]}, [\INPTR]
+2:
+ tst \WIDTH, #4
+ beq 2f
+ vrev64.32 d0, d0
+ sub \INPTR, \INPTR, #1
+ vld1.8 {d0[3]}, [\INPTR]
+ sub \INPTR, \INPTR, #1
+ vld1.8 {d0[2]}, [\INPTR]
+ sub \INPTR, \INPTR, #1
+ vld1.8 {d0[1]}, [\INPTR]
+ sub \INPTR, \INPTR, #1
+ vld1.8 {d0[0]}, [\INPTR]
+2:
+ tst \WIDTH, #8
+ beq 2f
+ vmov d1, d0
+ sub \INPTR, \INPTR, #8
+ vld1.8 {d0}, [\INPTR]
+2: /* upsample the remaining pixels */
+ vmovl.u8 q8, d0
+ vext.8 q2, q1, q0, #15
+ vmovl.u8 q9, d1
+ vaddw.u8 q10, q15, d4
+ vaddw.u8 q11, q15, d5
+ vmlal.u8 q8, d4, d28
+ vmlal.u8 q9, d5, d28
+ vmlal.u8 q10, d0, d28
+ vmlal.u8 q11, d1, d28
+ vrshrn.u16 d10, q8, #2
+ vrshrn.u16 d12, q9, #2
+ vshrn.u16 d11, q10, #2
+ vshrn.u16 d13, q11, #2
+ vzip.8 d10, d11
+ vzip.8 d12, d13
+ /* store the remaining pixels */
+ tst \WIDTH, #8
+ beq 2f
+ vst1.8 {d10, d11}, [\OUTPTR]!
+ vmov q5, q6
+2:
+ tst \WIDTH, #4
+ beq 2f
+ vst1.8 {d10}, [\OUTPTR]!
+ vmov d10, d11
+2:
+ tst \WIDTH, #2
+ beq 2f
+ vst1.8 {d10[0]}, [\OUTPTR]!
+ vst1.8 {d10[1]}, [\OUTPTR]!
+ vst1.8 {d10[2]}, [\OUTPTR]!
+ vst1.8 {d10[3]}, [\OUTPTR]!
+ vext.8 d10, d10, d10, #4
+2:
+ tst \WIDTH, #1
+ beq 2f
+ vst1.8 {d10[0]}, [\OUTPTR]!
+ vst1.8 {d10[1]}, [\OUTPTR]!
+2:
+9:
+.endm
+
+asm_function jsimd_h2v1_fancy_upsample_neon
+
+ MAX_V_SAMP_FACTOR .req r0
+ DOWNSAMPLED_WIDTH .req r1
+ INPUT_DATA .req r2
+ OUTPUT_DATA_PTR .req r3
+ OUTPUT_DATA .req OUTPUT_DATA_PTR
+
+ OUTPTR .req r4
+ INPTR .req r5
+ WIDTH .req ip
+ TMP .req lr
+
+ push {r4, r5, r6, lr}
+ vpush {d8-d15}
+
+ ldr OUTPUT_DATA, [OUTPUT_DATA_PTR]
+ cmp MAX_V_SAMP_FACTOR, #0
+ ble 99f
+
+ /* initialize constants */
+ vmov.u8 d28, #3
+ vmov.u16 q15, #1
+11:
+ ldr INPTR, [INPUT_DATA], #4
+ ldr OUTPTR, [OUTPUT_DATA], #4
+ mov WIDTH, DOWNSAMPLED_WIDTH
+ upsample_row OUTPTR, INPTR, WIDTH, TMP
+ subs MAX_V_SAMP_FACTOR, MAX_V_SAMP_FACTOR, #1
+ bgt 11b
+
+99:
+ vpop {d8-d15}
+ pop {r4, r5, r6, pc}
+
+ .unreq MAX_V_SAMP_FACTOR
+ .unreq DOWNSAMPLED_WIDTH
+ .unreq INPUT_DATA
+ .unreq OUTPUT_DATA_PTR
+ .unreq OUTPUT_DATA
+
+ .unreq OUTPTR
+ .unreq INPTR
+ .unreq WIDTH
+ .unreq TMP
+
+.endfunc
+
+.purgem upsample16
+.purgem upsample32
+.purgem upsample_row
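The routine above accelerates libjpeg's "fancy" (triangle-filter) h2v1 upsampling: each input pixel yields two outputs, a 3/4:1/4 blend with its left neighbour (rounding bias +1) and with its right neighbour (rounding bias +2), which is what the q15 bias register and the vshrn/vrshrn pair in the macros implement. A scalar sketch of the per-row arithmetic is shown below; it is an illustration written against plain C types, not the library's jdsample.c code or its JSAMPROW plumbing.

/* Scalar sketch of h2v1 "fancy" upsampling for one row of 8-bit samples.
 * Each input pixel produces two outputs: a 3:1 blend with its left
 * neighbour (bias +1) and with its right neighbour (bias +2); the first and
 * last output columns are plain copies.  Hypothetical helper. */
static void h2v1_fancy_upsample_row(const unsigned char *in,
                                    unsigned char *out, int width)
{
  int i;

  if (width <= 0) return;
  if (width == 1) { out[0] = out[1] = in[0]; return; }

  out[0] = in[0];                                   /* first column: copy */
  out[1] = (unsigned char)((in[0] * 3 + in[1] + 2) >> 2);

  for (i = 1; i < width - 1; i++) {
    out[2 * i]     = (unsigned char)((in[i] * 3 + in[i - 1] + 1) >> 2);
    out[2 * i + 1] = (unsigned char)((in[i] * 3 + in[i + 1] + 2) >> 2);
  }

  out[2 * width - 2] =
    (unsigned char)((in[width - 1] * 3 + in[width - 2] + 1) >> 2);
  out[2 * width - 1] = in[width - 1];               /* last column: copy */
}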
diff --git a/simd/jsimd_i386.c b/simd/jsimd_i386.c
index 94510d3..ad0f794 100644
--- a/simd/jsimd_i386.c
+++ b/simd/jsimd_i386.c
@@ -3,7 +3,7 @@
*
* Copyright 2009 Pierre Ossman <[email protected]> for Cendio AB
* Copyright 2009-2011 D. R. Commander
- *
+ *
* Based on the x86 SIMD extension for IJG JPEG library,
* Copyright (C) 1999-2006, MIYASAKA Masaru.
* For conditions of distribution and use, see copyright notice in jsimdext.inc
@@ -41,7 +41,7 @@
{
char *env = NULL;
- if (simd_support != ~0)
+ if (simd_support != ~0U)
return;
simd_support = jpeg_simd_cpu_support();
@@ -398,7 +398,7 @@
GLOBAL(void)
jsimd_h2v2_upsample (j_decompress_ptr cinfo,
- jpeg_component_info * compptr,
+ jpeg_component_info * compptr,
JSAMPARRAY input_data,
JSAMPARRAY * output_data_ptr)
{
@@ -412,7 +412,7 @@
GLOBAL(void)
jsimd_h2v1_upsample (j_decompress_ptr cinfo,
- jpeg_component_info * compptr,
+ jpeg_component_info * compptr,
JSAMPARRAY input_data,
JSAMPARRAY * output_data_ptr)
{
@@ -466,7 +466,7 @@
GLOBAL(void)
jsimd_h2v2_fancy_upsample (j_decompress_ptr cinfo,
- jpeg_component_info * compptr,
+ jpeg_component_info * compptr,
JSAMPARRAY input_data,
JSAMPARRAY * output_data_ptr)
{
@@ -481,7 +481,7 @@
GLOBAL(void)
jsimd_h2v1_fancy_upsample (j_decompress_ptr cinfo,
- jpeg_component_info * compptr,
+ jpeg_component_info * compptr,
JSAMPARRAY input_data,
JSAMPARRAY * output_data_ptr)
{
@@ -1053,4 +1053,3 @@
jsimd_idct_float_3dnow(compptr->dct_table, coef_block,
output_buf, output_col);
}
-
diff --git a/simd/jsimd_x86_64.c b/simd/jsimd_x86_64.c
index e27094f..98e36b1 100644
--- a/simd/jsimd_x86_64.c
+++ b/simd/jsimd_x86_64.c
@@ -3,7 +3,7 @@
*
* Copyright 2009 Pierre Ossman <[email protected]> for Cendio AB
* Copyright 2009-2011 D. R. Commander
- *
+ *
* Based on the x86 SIMD extension for IJG JPEG library,
* Copyright (C) 1999-2006, MIYASAKA Masaru.
* For conditions of distribution and use, see copyright notice in jsimdext.inc
@@ -275,7 +275,7 @@
GLOBAL(void)
jsimd_h2v2_upsample (j_decompress_ptr cinfo,
- jpeg_component_info * compptr,
+ jpeg_component_info * compptr,
JSAMPARRAY input_data,
JSAMPARRAY * output_data_ptr)
{
@@ -286,7 +286,7 @@
GLOBAL(void)
jsimd_h2v1_upsample (j_decompress_ptr cinfo,
- jpeg_component_info * compptr,
+ jpeg_component_info * compptr,
JSAMPARRAY input_data,
JSAMPARRAY * output_data_ptr)
{
@@ -327,7 +327,7 @@
GLOBAL(void)
jsimd_h2v2_fancy_upsample (j_decompress_ptr cinfo,
- jpeg_component_info * compptr,
+ jpeg_component_info * compptr,
JSAMPARRAY input_data,
JSAMPARRAY * output_data_ptr)
{
@@ -338,7 +338,7 @@
GLOBAL(void)
jsimd_h2v1_fancy_upsample (j_decompress_ptr cinfo,
- jpeg_component_info * compptr,
+ jpeg_component_info * compptr,
JSAMPARRAY input_data,
JSAMPARRAY * output_data_ptr)
{
@@ -758,4 +758,3 @@
jsimd_idct_float_sse2(compptr->dct_table, coef_block,
output_buf, output_col);
}
-
diff --git a/simd/jsimdext.inc b/simd/jsimdext.inc
index 265fa64..3a3092e 100644
--- a/simd/jsimdext.inc
+++ b/simd/jsimdext.inc
@@ -325,15 +325,15 @@
push rsi
push rdi
sub rsp, SIZEOF_XMMWORD
- movlpd XMMWORD [rsp], xmm6
+ movaps XMMWORD [rsp], xmm6
sub rsp, SIZEOF_XMMWORD
- movlpd XMMWORD [rsp], xmm7
+ movaps XMMWORD [rsp], xmm7
%endmacro
%imacro uncollect_args 0
- movlpd xmm7, XMMWORD [rsp]
+ movaps xmm7, XMMWORD [rsp]
add rsp, SIZEOF_XMMWORD
- movlpd xmm6, XMMWORD [rsp]
+ movaps xmm6, XMMWORD [rsp]
add rsp, SIZEOF_XMMWORD
pop rdi
pop rsi
diff --git a/tjbench.c b/tjbench.c
index f28780d..e529ffc 100644
--- a/tjbench.c
+++ b/tjbench.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C)2009-2011 D. R. Commander. All Rights Reserved.
+ * Copyright (C)2009-2012 D. R. Commander. All Rights Reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@@ -110,7 +110,7 @@
if((handle=tjInitDecompress())==NULL)
_throwtj("executing tjInitDecompress()");
- bufsize=(yuv==YUVDECODE? yuvsize:pitch*h);
+ bufsize=(yuv==YUVDECODE? yuvsize:pitch*scaledh);
if(dstbuf==NULL)
{
if((dstbuf=(unsigned char *)malloc(bufsize)) == NULL)
@@ -681,7 +681,7 @@
printf("-rgb, -bgr, -rgbx, -bgrx, -xbgr, -xrgb =\n");
printf(" Test the specified color conversion path in the codec (default: BGR)\n");
printf("-fastupsample = Use fast, inaccurate upsampling code to perform 4:2:2 and 4:2:0\n");
- printf(" YUV decoding in libjpeg decompressor\n");
+ printf(" YUV decoding\n");
printf("-quiet = Output results in tabular rather than verbose format\n");
printf("-yuvencode = Encode RGB input as planar YUV rather than compressing as JPEG\n");
printf("-yuvdecode = Decode JPEG image to planar YUV rather than RGB\n");
@@ -696,6 +696,7 @@
if(i!=nsf-1) printf(", ");
if(i==nsf-2) printf("or ");
}
+ if(i%8==0 && i!=0) printf("\n ");
}
printf(")\n");
printf("-hflip, -vflip, -transpose, -transverse, -rot90, -rot180, -rot270 =\n");
@@ -811,7 +812,8 @@
{
for(j=0; j<nsf; j++)
{
- if(temp1==scalingfactors[j].num && temp2==scalingfactors[j].denom)
+ if((double)temp1/(double)temp2
+ == (double)scalingfactors[j].num/(double)scalingfactors[j].denom)
{
sf=scalingfactors[j];
match=1; break;
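The new comparison matches a requested scaling factor by ratio rather than by exact numerator/denominator, so an unreduced fraction still selects the corresponding canonical entry. A minimal, hypothetical standalone illustration (not tjbench itself):

#include <stdio.h>
#include "turbojpeg.h"

/* Hypothetical demo: an unreduced request of 2/4 should select the
 * canonical 1/2 entry from the table returned by tjGetScalingFactors(). */
int main(void)
{
  int n, i, req_num = 2, req_denom = 4;
  tjscalingfactor *sf = tjGetScalingFactors(&n);

  for (i = 0; i < n; i++)
    if ((double)req_num / (double)req_denom ==
        (double)sf[i].num / (double)sf[i].denom) {
      printf("matched %d/%d\n", sf[i].num, sf[i].denom);
      break;
    }
  return 0;
}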
diff --git a/tjunittest.c b/tjunittest.c
index d14ec52..89a6d1d 100644
--- a/tjunittest.c
+++ b/tjunittest.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C)2009-2011 D. R. Commander. All Rights Reserved.
+ * Copyright (C)2009-2012 D. R. Commander. All Rights Reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@@ -77,7 +77,7 @@
const int _onlyRGB[]={TJPF_RGB};
enum {YUVENCODE=1, YUVDECODE};
-int yuv=0, alloc=0, alpha=0;
+int yuv=0, alloc=0;
int exitStatus=0;
#define bailout() {exitStatus=-1; goto bailout;}
@@ -502,6 +502,8 @@
for(i=0; i<2; i++)
{
int flags=0;
+ if(subsamp==TJSAMP_422 || subsamp==TJSAMP_420 || subsamp==TJSAMP_440)
+ flags|=TJFLAG_FASTUPSAMPLE;
if(i==1)
{
if(yuv==YUVDECODE) goto bailout;
@@ -617,15 +619,12 @@
if(doyuv) {yuv=YUVENCODE; alloc=0;}
doTest(35, 39, _3byteFormats, 2, TJSAMP_444, "test");
doTest(39, 41, _4byteFormats, 4, TJSAMP_444, "test");
- if(doyuv)
- {
- doTest(41, 35, _3byteFormats, 2, TJSAMP_422, "test");
- doTest(35, 39, _4byteFormats, 4, TJSAMP_422, "test");
- doTest(39, 41, _3byteFormats, 2, TJSAMP_420, "test");
- doTest(41, 35, _4byteFormats, 4, TJSAMP_420, "test");
- doTest(35, 39, _3byteFormats, 2, TJSAMP_440, "test");
- doTest(39, 41, _4byteFormats, 4, TJSAMP_440, "test");
- }
+ doTest(41, 35, _3byteFormats, 2, TJSAMP_422, "test");
+ doTest(35, 39, _4byteFormats, 4, TJSAMP_422, "test");
+ doTest(39, 41, _3byteFormats, 2, TJSAMP_420, "test");
+ doTest(41, 35, _4byteFormats, 4, TJSAMP_420, "test");
+ doTest(35, 39, _3byteFormats, 2, TJSAMP_440, "test");
+ doTest(39, 41, _4byteFormats, 4, TJSAMP_440, "test");
doTest(35, 39, _onlyGray, 1, TJSAMP_GRAY, "test");
doTest(39, 41, _3byteFormats, 2, TJSAMP_GRAY, "test");
doTest(41, 35, _4byteFormats, 4, TJSAMP_GRAY, "test");
diff --git a/turbojpeg-jni.c b/turbojpeg-jni.c
index 25cca7d..1ff9bba 100644
--- a/turbojpeg-jni.c
+++ b/turbojpeg-jni.c
@@ -635,7 +635,8 @@
if(t[i].r.w!=0) w=t[i].r.w;
if(t[i].r.h!=0) h=t[i].r.h;
bailif0(jdstBufs[i]=(*env)->GetObjectArrayElement(env, dstobjs, i));
- if((*env)->GetArrayLength(env, jdstBufs[i])<tjBufSize(w, h, jpegSubsamp))
+ if((unsigned long)(*env)->GetArrayLength(env, jdstBufs[i])
+ <tjBufSize(w, h, jpegSubsamp))
_throw("Destination buffer is not large enough");
bailif0(dstBufs[i]=(*env)->GetPrimitiveArrayCritical(env, jdstBufs[i], 0));
}
diff --git a/turbojpeg.c b/turbojpeg.c
index e27f0da..c875fd9 100644
--- a/turbojpeg.c
+++ b/turbojpeg.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C)2009-2011 D. R. Commander. All Rights Reserved.
+ * Copyright (C)2009-2012 D. R. Commander. All Rights Reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@@ -92,10 +92,22 @@
JXFORM_TRANSVERSE, JXFORM_ROT_90, JXFORM_ROT_180, JXFORM_ROT_270
};
-#define NUMSF 4
+#define NUMSF 16
static const tjscalingfactor sf[NUMSF]={
+ {2, 1},
+ {15, 8},
+ {7, 4},
+ {13, 8},
+ {3, 2},
+ {11, 8},
+ {5, 4},
+ {9, 8},
{1, 1},
+ {7, 8},
+ {3, 4},
+ {5, 8},
{1, 2},
+ {3, 8},
{1, 4},
{1, 8}
};
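The enlarged table holds the additional decompression scaling factors supported in this release. Applications normally do not index it directly; they query it through tjGetScalingFactors() and use TJSCALED() to compute the output dimensions, roughly as in the sketch below. tjGetScalingFactors(), TJSCALED(), tjDecompressHeader2(), and tjDecompress2() are existing TurboJPEG APIs; decode_bounded() and the bounding-box logic are illustrative only.

#include <stdlib.h>
#include "turbojpeg.h"

/* Hedged sketch: pick the largest supported scaling factor that keeps the
 * decoded image within maxw x maxh, then decompress at that size. */
static int decode_bounded(unsigned char *jpegBuf, unsigned long jpegSize,
                          int maxw, int maxh, unsigned char **dstBuf,
                          int *dstw, int *dsth)
{
  tjhandle handle = tjInitDecompress();
  tjscalingfactor *sf;
  int n, i, w, h, subsamp, retval = -1;

  if (!handle) return -1;
  if (tjDecompressHeader2(handle, jpegBuf, jpegSize, &w, &h, &subsamp) < 0)
    goto bailout;

  sf = tjGetScalingFactors(&n);
  for (i = 0; i < n; i++) {
    *dstw = TJSCALED(w, sf[i]);
    *dsth = TJSCALED(h, sf[i]);
    if (*dstw <= maxw && *dsth <= maxh) break;  /* table is sorted largest first */
  }
  if (i == n) goto bailout;

  *dstBuf = (unsigned char *)malloc(*dstw * *dsth * tjPixelSize[TJPF_RGB]);
  if (!*dstBuf) goto bailout;
  retval = tjDecompress2(handle, jpegBuf, jpegSize, *dstBuf, *dstw,
                         0 /* pitch */, *dsth, TJPF_RGB, 0);

bailout:
  tjDestroy(handle);
  return retval;
}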
@@ -160,12 +172,17 @@
cinfo->in_color_space=JCS_EXT_XBGR; break;
#else
case TJPF_RGB:
- if(RGB_RED==0 && RGB_GREEN==1 && RGB_BLUE==2 && RGB_PIXELSIZE==3)
- {
- cinfo->in_color_space=JCS_RGB; break;
- }
- default:
- _throw("Unsupported pixel format");
+ case TJPF_BGR:
+ case TJPF_RGBX:
+ case TJPF_BGRX:
+ case TJPF_XRGB:
+ case TJPF_XBGR:
+ case TJPF_RGBA:
+ case TJPF_BGRA:
+ case TJPF_ARGB:
+ case TJPF_ABGR:
+ cinfo->in_color_space=JCS_RGB; pixelFormat=TJPF_RGB;
+ break;
#endif
}
@@ -189,9 +206,6 @@
cinfo->comp_info[1].v_samp_factor=1;
cinfo->comp_info[2].v_samp_factor=1;
- #if JCS_EXTENSIONS!=1
- bailout:
- #endif
return retval;
}
@@ -229,10 +243,16 @@
#endif
#else
case TJPF_RGB:
- if(RGB_RED==0 && RGB_GREEN==1 && RGB_BLUE==2 && RGB_PIXELSIZE==3)
- {
- dinfo->out_color_space=JCS_RGB; break;
- }
+ case TJPF_BGR:
+ case TJPF_RGBX:
+ case TJPF_BGRX:
+ case TJPF_XRGB:
+ case TJPF_XBGR:
+ case TJPF_RGBA:
+ case TJPF_BGRA:
+ case TJPF_ARGB:
+ case TJPF_ABGR:
+ dinfo->out_color_space=JCS_RGB; break;
#endif
default:
_throw("Unsupported pixel format");
@@ -271,6 +291,149 @@
}
+#ifndef JCS_EXTENSIONS
+
+/* Conversion functions to emulate the colorspace extensions. This allows the
+ TurboJPEG wrapper to be used with libjpeg. */
+
+#define TORGB(PS, ROFFSET, GOFFSET, BOFFSET) { \
+ int rowPad=pitch-width*PS; \
+ while(height--) \
+ { \
+ unsigned char *endOfRow=src+width*PS; \
+ while(src<endOfRow) \
+ { \
+ dst[RGB_RED]=src[ROFFSET]; \
+ dst[RGB_GREEN]=src[GOFFSET]; \
+ dst[RGB_BLUE]=src[BOFFSET]; \
+ dst+=RGB_PIXELSIZE; src+=PS; \
+ } \
+ src+=rowPad; \
+ } \
+}
+
+static unsigned char *toRGB(unsigned char *src, int width, int pitch,
+ int height, int pixelFormat, unsigned char *dst)
+{
+ unsigned char *retval=src;
+ switch(pixelFormat)
+ {
+ case TJPF_RGB:
+ #if RGB_RED!=0 || RGB_GREEN!=1 || RGB_BLUE!=2 || RGB_PIXELSIZE!=3
+ retval=dst; TORGB(3, 0, 1, 2);
+ #endif
+ break;
+ case TJPF_BGR:
+ #if RGB_RED!=2 || RGB_GREEN!=1 || RGB_BLUE!=0 || RGB_PIXELSIZE!=3
+ retval=dst; TORGB(3, 2, 1, 0);
+ #endif
+ break;
+ case TJPF_RGBX:
+ case TJPF_RGBA:
+ #if RGB_RED!=0 || RGB_GREEN!=1 || RGB_BLUE!=2 || RGB_PIXELSIZE!=4
+ retval=dst; TORGB(4, 0, 1, 2);
+ #endif
+ break;
+ case TJPF_BGRX:
+ case TJPF_BGRA:
+ #if RGB_RED!=2 || RGB_GREEN!=1 || RGB_BLUE!=0 || RGB_PIXELSIZE!=4
+ retval=dst; TORGB(4, 2, 1, 0);
+ #endif
+ break;
+ case TJPF_XRGB:
+ case TJPF_ARGB:
+ #if RGB_RED!=1 || RGB_GREEN!=2 || RGB_BLUE!=3 || RGB_PIXELSIZE!=4
+ retval=dst; TORGB(4, 1, 2, 3);
+ #endif
+ break;
+ case TJPF_XBGR:
+ case TJPF_ABGR:
+ #if RGB_RED!=3 || RGB_GREEN!=2 || RGB_BLUE!=1 || RGB_PIXELSIZE!=4
+ retval=dst; TORGB(4, 3, 2, 1);
+ #endif
+ break;
+ }
+ return retval;
+}
+
+#define FROMRGB(PS, ROFFSET, GOFFSET, BOFFSET, SETALPHA) { \
+ int rowPad=pitch-width*PS; \
+ while(height--) \
+ { \
+ unsigned char *endOfRow=dst+width*PS; \
+ while(dst<endOfRow) \
+ { \
+ dst[ROFFSET]=src[RGB_RED]; \
+ dst[GOFFSET]=src[RGB_GREEN]; \
+ dst[BOFFSET]=src[RGB_BLUE]; \
+ SETALPHA \
+ dst+=PS; src+=RGB_PIXELSIZE; \
+ } \
+ dst+=rowPad; \
+ } \
+}
+
+static void fromRGB(unsigned char *src, unsigned char *dst, int width,
+ int pitch, int height, int pixelFormat)
+{
+ switch(pixelFormat)
+ {
+ case TJPF_RGB:
+ #if RGB_RED!=0 || RGB_GREEN!=1 || RGB_BLUE!=2 || RGB_PIXELSIZE!=3
+ FROMRGB(3, 0, 1, 2,);
+ #endif
+ break;
+ case TJPF_BGR:
+ #if RGB_RED!=2 || RGB_GREEN!=1 || RGB_BLUE!=0 || RGB_PIXELSIZE!=3
+ FROMRGB(3, 2, 1, 0,);
+ #endif
+ break;
+ case TJPF_RGBX:
+ #if RGB_RED!=0 || RGB_GREEN!=1 || RGB_BLUE!=2 || RGB_PIXELSIZE!=4
+ FROMRGB(4, 0, 1, 2,);
+ #endif
+ break;
+ case TJPF_RGBA:
+ #if RGB_RED!=0 || RGB_GREEN!=1 || RGB_BLUE!=2 || RGB_PIXELSIZE!=4
+ FROMRGB(4, 0, 1, 2, dst[3]=0xFF;);
+ #endif
+ break;
+ case TJPF_BGRX:
+ #if RGB_RED!=2 || RGB_GREEN!=1 || RGB_BLUE!=0 || RGB_PIXELSIZE!=4
+ FROMRGB(4, 2, 1, 0,);
+ #endif
+ break;
+ case TJPF_BGRA:
+ #if RGB_RED!=2 || RGB_GREEN!=1 || RGB_BLUE!=0 || RGB_PIXELSIZE!=4
+ FROMRGB(4, 2, 1, 0, dst[3]=0xFF;); return;
+ #endif
+ break;
+ case TJPF_XRGB:
+ #if RGB_RED!=1 || RGB_GREEN!=2 || RGB_BLUE!=3 || RGB_PIXELSIZE!=4
+ FROMRGB(4, 1, 2, 3,); return;
+ #endif
+ break;
+ case TJPF_ARGB:
+ #if RGB_RED!=1 || RGB_GREEN!=2 || RGB_BLUE!=3 || RGB_PIXELSIZE!=4
+ FROMRGB(4, 1, 2, 3, dst[0]=0xFF;); return;
+ #endif
+ break;
+ case TJPF_XBGR:
+ #if RGB_RED!=3 || RGB_GREEN!=2 || RGB_BLUE!=1 || RGB_PIXELSIZE!=4
+ FROMRGB(4, 3, 2, 1,); return;
+ #endif
+ break;
+ case TJPF_ABGR:
+ #if RGB_RED!=3 || RGB_GREEN!=2 || RGB_BLUE!=1 || RGB_PIXELSIZE!=4
+ FROMRGB(4, 3, 2, 1, dst[0]=0xFF;); return;
+ #endif
+ break;
+ }
+}
+
+#endif
+
+
/* General API functions */
DLLEXPORT char* DLLCALL tjGetErrorStr(void)
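When JCS_EXTENSIONS is not available (i.e. when building against a plain libjpeg rather than libjpeg-turbo's colorspace-extended jmorecfg.h), the TORGB/FROMRGB macros above repack the caller's pixel format into whatever classic RGB layout libjpeg was built with before compression, and back again after decompression. As an illustration, the sketch below is roughly what TORGB(4, 2, 1, 0) expands to for a BGRX/BGRA source under the common layout RGB_RED=0, RGB_GREEN=1, RGB_BLUE=2, RGB_PIXELSIZE=3; the function name is hypothetical.

/* Roughly the expansion of TORGB(4, 2, 1, 0) for a BGRX/BGRA source buffer,
 * assuming the classic libjpeg layout (RGB_RED=0, RGB_GREEN=1, RGB_BLUE=2,
 * RGB_PIXELSIZE=3).  Hypothetical standalone helper for illustration. */
static void bgrx_to_rgb(const unsigned char *src, int width, int pitch,
                        int height, unsigned char *dst)
{
  int rowPad = pitch - width * 4;          /* bytes of padding per source row */

  while (height--) {
    const unsigned char *endOfRow = src + width * 4;
    while (src < endOfRow) {
      dst[0] = src[2];                     /* R */
      dst[1] = src[1];                     /* G */
      dst[2] = src[0];                     /* B */
      dst += 3;  src += 4;                 /* X/A byte is dropped */
    }
    src += rowPad;
  }
}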
@@ -411,6 +574,9 @@
unsigned long *jpegSize, int jpegSubsamp, int jpegQual, int flags)
{
int i, retval=0, alloc=1; JSAMPROW *row_pointer=NULL;
+ #ifndef JCS_EXTENSIONS
+ unsigned char *rgbBuf=NULL;
+ #endif
getinstance(handle)
if((this->init&COMPRESS)==0)
@@ -430,6 +596,16 @@
if(pitch==0) pitch=width*tjPixelSize[pixelFormat];
+ #ifndef JCS_EXTENSIONS
+ if(pixelFormat!=TJPF_GRAY)
+ {
+ rgbBuf=(unsigned char *)malloc(width*height*RGB_PIXELSIZE);
+ if(!rgbBuf) _throw("tjCompress2(): Memory allocation failure");
+ srcBuf=toRGB(srcBuf, width, pitch, height, pixelFormat, rgbBuf);
+ pitch=width*RGB_PIXELSIZE;
+ }
+ #endif
+
cinfo->image_width=width;
cinfo->image_height=height;
@@ -462,6 +638,9 @@
bailout:
if(cinfo->global_state>CSTATE_START) jpeg_abort_compress(cinfo);
+ #ifndef JCS_EXTENSIONS
+ if(rgbBuf) free(rgbBuf);
+ #endif
if(row_pointer) free(row_pointer);
return retval;
}
@@ -498,8 +677,11 @@
JSAMPROW *outbuf[MAX_COMPONENTS];
int row, pw, ph, cw[MAX_COMPONENTS], ch[MAX_COMPONENTS];
JSAMPLE *ptr=dstBuf;
- unsigned long yuvsize=0;
+ unsigned long yuvsize=0;
jpeg_component_info *compptr;
+ #ifndef JCS_EXTENSIONS
+ unsigned char *rgbBuf=NULL;
+ #endif
getinstance(handle);
if((this->init&COMPRESS)==0)
@@ -525,6 +707,16 @@
if(pitch==0) pitch=width*tjPixelSize[pixelFormat];
+ #ifndef JCS_EXTENSIONS
+ if(pixelFormat!=TJPF_GRAY)
+ {
+ rgbBuf=(unsigned char *)malloc(width*height*RGB_PIXELSIZE);
+ if(!rgbBuf) _throw("tjEncodeYUV2(): Memory allocation failure");
+ srcBuf=toRGB(srcBuf, width, pitch, height, pixelFormat, rgbBuf);
+ pitch=width*RGB_PIXELSIZE;
+ }
+ #endif
+
cinfo->image_width=width;
cinfo->image_height=height;
@@ -607,6 +799,9 @@
bailout:
if(cinfo->global_state>CSTATE_START) jpeg_abort_compress(cinfo);
+ #ifndef JCS_EXTENSIONS
+ if(rgbBuf) free(rgbBuf);
+ #endif
if(row_pointer) free(row_pointer);
for(i=0; i<MAX_COMPONENTS; i++)
{
@@ -734,6 +929,10 @@
{
int i, retval=0; JSAMPROW *row_pointer=NULL;
int jpegwidth, jpegheight, scaledw, scaledh;
+ #ifndef JCS_EXTENSIONS
+ unsigned char *rgbBuf=NULL;
+ unsigned char *_dstBuf=NULL; int _pitch=0;
+ #endif
getinstance(handle);
if((this->init&DECOMPRESS)==0)
@@ -756,7 +955,10 @@
jpeg_mem_src_tj(dinfo, jpegBuf, jpegSize);
jpeg_read_header(dinfo, TRUE);
- if(setDecompDefaults(dinfo, pixelFormat)==-1) return -1;
+ if(setDecompDefaults(dinfo, pixelFormat)==-1)
+ {
+ retval=-1; goto bailout;
+ }
if(flags&TJFLAG_FASTUPSAMPLE) dinfo->do_fancy_upsampling=FALSE;
@@ -778,6 +980,21 @@
jpeg_start_decompress(dinfo);
if(pitch==0) pitch=dinfo->output_width*tjPixelSize[pixelFormat];
+
+ #ifndef JCS_EXTENSIONS
+ if(pixelFormat!=TJPF_GRAY &&
+ (RGB_RED!=tjRedOffset[pixelFormat] ||
+ RGB_GREEN!=tjGreenOffset[pixelFormat] ||
+ RGB_BLUE!=tjBlueOffset[pixelFormat] ||
+ RGB_PIXELSIZE!=tjPixelSize[pixelFormat]))
+ {
+ rgbBuf=(unsigned char *)malloc(width*height*3);
+ if(!rgbBuf) _throw("tjDecompress2(): Memory allocation failure");
+ _pitch=pitch; pitch=width*3;
+ _dstBuf=dstBuf; dstBuf=rgbBuf;
+ }
+ #endif
+
if((row_pointer=(JSAMPROW *)malloc(sizeof(JSAMPROW)
*dinfo->output_height))==NULL)
_throw("tjDecompress2(): Memory allocation failure");
@@ -794,8 +1011,15 @@
}
jpeg_finish_decompress(dinfo);
+ #ifndef JCS_EXTENSIONS
+ fromRGB(rgbBuf, _dstBuf, width, _pitch, height, pixelFormat);
+ #endif
+
bailout:
if(dinfo->global_state>DSTATE_START) jpeg_abort_decompress(dinfo);
+ #ifndef JCS_EXTENSIONS
+ if(rgbBuf) free(rgbBuf);
+ #endif
if(row_pointer) free(row_pointer);
return retval;
}
@@ -1065,7 +1289,7 @@
&xinfo[i]);
if(t[i].customFilter)
{
- int ci, by, y;
+ int ci, y; JDIMENSION by;
for(ci=0; ci<cinfo->num_components; ci++)
{
jpeg_component_info *compptr=&cinfo->comp_info[ci];
diff --git a/win/jconfig.h b/win/jconfig.h
index 9cf6f10..aefa9a2 100644
--- a/win/jconfig.h
+++ b/win/jconfig.h
@@ -1,30 +1,58 @@
-/* jconfig.vc --- jconfig.h for Microsoft Visual C++ on Windows 95 or NT. */
-/* see jconfig.doc for explanations */
+/* jconfig.h. Generated from jconfig.h.in by configure. */
+/* Version ID for the JPEG library.
+ * Might be useful for tests like "#if JPEG_LIB_VERSION >= 60".
+ */
+#define JPEG_LIB_VERSION 62
-#define HAVE_PROTOTYPES
-#define HAVE_UNSIGNED_CHAR
-#define HAVE_UNSIGNED_SHORT
-/* #define void char */
-/* #define const */
-#undef CHAR_IS_UNSIGNED
-#define HAVE_STDDEF_H
-#define HAVE_STDLIB_H
-#undef NEED_BSD_STRINGS
-#undef NEED_SYS_TYPES_H
-#undef NEED_FAR_POINTERS /* we presume a 32-bit flat memory model */
-#undef NEED_SHORT_EXTERNAL_NAMES
-#undef INCOMPLETE_TYPES_BROKEN
+/* libjpeg-turbo version */
+#define LIBJPEG_TURBO_VERSION 1.2.80
-/* Define "boolean" as unsigned char, not int, per Windows custom */
-#ifndef __RPCNDR_H__ /* don't conflict if rpcndr.h already read */
-typedef unsigned char boolean;
+/* Support arithmetic encoding */
+/* #undef C_ARITH_CODING_SUPPORTED */
+
+/* Support arithmetic decoding */
+/* #undef D_ARITH_CODING_SUPPORTED */
+
+/* Compiler supports function prototypes. */
+#define HAVE_PROTOTYPES 1
+
+/* Define to 1 if you have the <stddef.h> header file. */
+#define HAVE_STDDEF_H 1
+
+/* Define to 1 if you have the <stdlib.h> header file. */
+#define HAVE_STDLIB_H 1
+
+/* Compiler supports 'unsigned char'. */
+#define HAVE_UNSIGNED_CHAR 1
+
+/* Compiler supports 'unsigned short'. */
+#define HAVE_UNSIGNED_SHORT 1
+
+/* Compiler does not support pointers to unspecified structures. */
+/* #undef INCOMPLETE_TYPES_BROKEN */
+
+/* Compiler has <strings.h> rather than standard <string.h>. */
+/* #undef NEED_BSD_STRINGS */
+
+/* Linker requires that global names be unique in first 15 characters. */
+/* #undef NEED_SHORT_EXTERNAL_NAMES */
+
+/* Need to include <sys/types.h> in order to obtain size_t. */
+/* #undef NEED_SYS_TYPES_H */
+
+/* Broken compiler shifts signed values as an unsigned shift. */
+/* #undef RIGHT_SHIFT_IS_UNSIGNED */
+
+/* Use accelerated SIMD routines. */
+#define WITH_SIMD 1
+
+/* Define to 1 if type `char' is unsigned and you are not using gcc. */
+#ifndef __CHAR_UNSIGNED__
+/* # undef __CHAR_UNSIGNED__ */
#endif
-#define HAVE_BOOLEAN /* prevent jmorecfg.h from redefining it */
-#define inline __inline
+/* Define to empty if `const' does not conform to ANSI C. */
+/* #undef const */
-#ifdef JPEG_INTERNALS
-
-#undef RIGHT_SHIFT_IS_UNSIGNED
-
-#endif /* JPEG_INTERNALS */
+/* Define to `unsigned int' if <sys/types.h> does not define. */
+/* #undef size_t */