commit f21daff3dc11cf5881f1727c3c9d505f0810d20b
Author: Jason Garrett-Glaser
Date:   Wed Jul 15 12:43:35 2009 -0700

    Cacheline-split SSSE3 chroma MC
    ~70% faster chroma MC on 32-bit Conroe
    Also slightly faster SSSE3 intra_sad_8x8c

 common/frame.c        |  2 +-
 common/x86/mc-a.asm   | 76 +++++++++++++++++++++++++++++++++++++++++-------
 common/x86/mc-c.c     |  5 +++
 common/x86/sad-a.asm  | 28 ++++++++++++++----
 common/x86/x86inc.asm |  8 ++--
 tools/checkasm.c      |  7 ++--
 6 files changed, 101 insertions(+), 25 deletions(-)

commit 6c13403195d42b2c0ee707e9f2a6e9f9cd81afd6
Author: Jason Garrett-Glaser
Date:   Sun Jul 12 12:07:01 2009 -0700

    Improve documentation of qp/crf options

 x264.c | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

commit 49bf7673b2f52b2cd8e9c10d8d6e9cbbb5422cf7
Author: Jason Garrett-Glaser
Date:   Thu Jul 9 19:02:57 2009 -0700

    Merge array_non_zero into zigzag_sub
    Faster lossless, cleaner code.
    SSSE3 version of zigzag_sub_4x4_field, faster lossless interlaced coding.

 common/dct.c         | 54 ++++++++++++++++++++++++++++++++++++++++++++-----
 common/dct.h         |  5 ++-
 common/macroblock.h  |  4 +-
 common/x86/dct-a.asm | 32 ++++++++++++++++++++++++++++-
 common/x86/dct.h     |  5 +++-
 common/x86/util.h    | 33 ------------------------------
 encoder/macroblock.c | 38 ++++++++++++-----------------------
 tools/checkasm.c     | 38 ++++++++++++++++++++++++++++++++--
 8 files changed, 136 insertions(+), 73 deletions(-)

commit b63f5919e3f5367a0df3dbf218d5a94d2fdba5fb
Author: James Darnley
Date:   Thu Jul 9 11:25:55 2009 -0700

    Fix bug in reference frame autoadjustment
    For some types of input file, x264 did the adjustment before width/height were known.

 x264.c | 38 ++++++++++++++++++++------------------
 1 files changed, 20 insertions(+), 18 deletions(-)

commit 96e2229e96d65420d491596affa9aaa068d718d6
Author: Jason Garrett-Glaser
Date:   Tue Jul 7 11:13:39 2009 -0700

    Fix fprofile settings to match changes in defaults
    Also add b-adapt 2 to fprofile.

 Makefile | 14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

commit 3f6713d5c794d4fbfd3131985e33a822a40cb870
Author: Jason Garrett-Glaser
Date:   Fri Jul 3 02:33:44 2009 -0700

    Slightly faster dequant_flat assembly
    Eliminate some redundant shifts.

 common/x86/quant-a.asm | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

commit af2a4ecd7bcefc97c8aa83913c9a2980206f9cd0
Author: Jason Garrett-Glaser
Date:   Wed Jul 1 21:14:57 2009 -0700

    Totally new preset system for x264.c (not libx264), new defaults
    Other new features include "tune" and "profile" settings; see --help for more details.
    Unlike most other settings, "preset" and "tune" act before all other options.
    However, "profile" acts afterwards, overriding all other options.
    Our defaults have also changed: new defaults are --subme 7 --bframes 3 --8x8dct --no-psnr --no-ssim --threads auto --ref 3 --mixed-refs --trellis 1 --weightb --crf 23 --progress.
    Users will hopefully find these changes to greatly improve usability.

 common/common.c   |  24 ++-
 encoder/encoder.c |  20 ++
 x264.c            | 520 ++++++++++++++++++++++++++++++++++++++++-------------
 x264.h            |  15 +-
 4 files changed, 435 insertions(+), 144 deletions(-)

commit 72534d466a6bd99b9cbf32c74e667bea608c6dee
Author: Jason Garrett-Glaser
Date:   Wed Jul 1 16:33:12 2009 -0700

    Update Gabriel's email address in AUTHORS

 AUTHORS | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

commit f4dac817e45b967572f5c4e4af4644dc7d263512
Author: Jason Garrett-Glaser
Date:   Tue Jun 30 15:20:32 2009 -0700

    Early termination for chroma encoding
    Faster chroma encoding by terminating early if heuristics indicate that the block will be DC-only.
    This works because the vast majority of inter chroma blocks have no coefficients at all, and those that do are almost always DC-only.
    Add two new helper DSP functions for this: dct_dc_8x8 and var2_8x8. mmx/sse2/ssse3 versions of each.
    Early termination is disabled at very low QPs due to it not being useful there.
    Performance increase is ~1-2% without trellis, up to 5-6% with trellis=2.
    Increase is greater with lower bitrates.

 common/dct.c           |  25 +++++++++++
 common/dct.h           |   1 +
 common/pixel.c         |  28 ++++++++++++
 common/pixel.h         |   1 +
 common/x86/dct-a.asm   |  74 +++++++++++++++++++++++++++++++
 common/x86/dct.h       |   3 +-
 common/x86/pixel-a.asm | 113 ++++++++++++++++++++++++++++++++++++++++++++++++
 common/x86/pixel.h     |   3 +
 encoder/macroblock.c   |  60 +++++++++++++++++++++++++-
 tools/checkasm.c       |  19 ++++++++
 10 files changed, 325 insertions(+), 2 deletions(-)

commit 7fd6a9099f18ec028d6c73890258280e6f8a6c02
Author: David Conrad
Date:   Fri Jun 26 13:09:44 2009 -0700

    Fix bug in checkasm
    frame_init_lowres_core check didn't check the C plane.
    However, all x86 and PPC assembly was correct regardless of the unit test being incorrect.

 tools/checkasm.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

commit f6d31669a2547110b9c1323aa51437296f2f3506
Author: Jason Garrett-Glaser
Date:   Wed Jun 24 14:39:15 2009 -0700

    Add subpartition cost for sub-8x8 blocks
    Improves sub-p8x8 mode decision.

 encoder/cabac.c | 4 +++-
 encoder/cavlc.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

commit b484fe1bff3cb68b3325a9b77d802789cf77e600
Author: Jason Garrett-Glaser
Date:   Wed Jun 24 13:24:18 2009 -0700

    Yet more CABAC and CAVLC optimizations
    Also clean up a lot of pointless code duplication in CAVLC MV coding.

 encoder/cabac.c | 139 ++++++++++++++++++++++---------------------------------
 encoder/cavlc.c | 108 +++++++++++++------------------------------
 2 files changed, 87 insertions(+), 160 deletions(-)