Commit Graph

195 Commits

Author SHA1 Message Date
James Almer
7b3cb953f7 checkasm: add fixed_dsp tests
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-04-11 18:05:13 -03:00
Clément Bœsch
210678d3c5 Merge commit '3794062ab1a13442b06f6d76c54dce51ffa54697'
* commit '3794062ab1a13442b06f6d76c54dce51ffa54697':
  Remove Plan 9 support

Merged-by: Clément Bœsch <u@pkh.me>
2017-04-09 14:52:00 +02:00
James Almer
6747fc436e Merge commit 'effc1430b2fe5997d9d55bf28dc507c27125eb27'
* commit 'effc1430b2fe5997d9d55bf28dc507c27125eb27':
  Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately"

Merged-by: James Almer <jamrial@gmail.com>
2017-04-04 15:26:18 -03:00
Clément Bœsch
edfa7ac8ec Merge commit '81d7f0bbca837afda1f7e60d3ae52ab1360ab44b'
* commit '81d7f0bbca837afda1f7e60d3ae52ab1360ab44b':
  checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately

Merged-by: Clément Bœsch <u@pkh.me>
2017-04-01 11:54:29 +02:00
Clément Bœsch
b589e83f43 Merge commit '9498237049d15812cecb79df47b196c73013908b'
* commit '9498237049d15812cecb79df47b196c73013908b':
  checkasm: Add --test parameter to check only specific components

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-03-31 10:06:13 +02:00
Clément Bœsch
1c9f4b5078 lavc/vp9: split into vp9{block,data,mvs}
This is following Libav layout to ease merges.
2017-03-27 21:38:21 +02:00
James Almer
09ce5519f3 fate/checkasm: fix use of uninitialized memory on hevc_add_res tests 2017-03-24 22:11:34 -03:00
James Almer
36eae45510 fate/checkasm: use LOCAL_ALINGED_32 on hevc_add_res tests 2017-03-24 22:11:22 -03:00
Clément Bœsch
3d4039f964 Merge commit 'ed48a9d8143d2575a4458589cebde69ec326afd8'
* commit 'ed48a9d8143d2575a4458589cebde69ec326afd8':
  checkasm: Add a test for HEVC add_residual

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-24 12:37:09 +01:00
James Almer
0d34473d8e Merge commit 'dd5d4a0e1e3a30a254d1a57ecbdcedf230c6014b'
* commit 'dd5d4a0e1e3a30a254d1a57ecbdcedf230c6014b':
  checkasm: aarch64: Don't clobber x29 in checkasm_stack_clobber

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:31:36 -03:00
James Almer
f23078904f Merge commit '2816f8a8bb33bd67fec5e94f5d357918caf4e055'
* commit '2816f8a8bb33bd67fec5e94f5d357918caf4e055':
  build: Drop arch-specific checkasm Makefiles

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:01:47 -03:00
James Almer
3ddae9eee9 Merge commit '93d5b022a9fd3a1a1f9c521a1eac7f0410e05b81'
* commit '93d5b022a9fd3a1a1f9c521a1eac7f0410e05b81':
  build: Drop duplicate asm recipe

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 17:57:35 -03:00
James Almer
67b639b496 Merge commit 'c91d6a33f872574c95c8784277cf60ffcf6bff4f'
* commit 'c91d6a33f872574c95c8784277cf60ffcf6bff4f':
  checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 17:38:20 -03:00
James Almer
a2d34cc51b Merge commit 'f1b3e131385176c3c9d9783b25047856a0dcebf6'
* commit 'f1b3e131385176c3c9d9783b25047856a0dcebf6':
  checkasm: aarch64: Clobber the stack before calling functions

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 17:36:53 -03:00
James Almer
cab4c7fa19 Merge commit 'a05cc56124b4f1237f6355784de821e3290ddb44'
* commit 'a05cc56124b4f1237f6355784de821e3290ddb44':
  checkasm: arm/aarch64: Fix the amount of space reserved for stack parameters

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 17:35:38 -03:00
Clément Bœsch
50bbb67472 Merge commit 'e3f941cb03b139b866a0ad6dc95fbe1b247d54af'
* commit 'e3f941cb03b139b866a0ad6dc95fbe1b247d54af':
  checkasm: add a test for HEVC IDCT

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 12:17:39 +01:00
James Almer
30cadfe071 avcodec/lossless_videodsp: use ptrdiff_t for length parameters
Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-22 18:38:35 -03:00
Clément Bœsch
7c2a7f9c11 Merge commit '22c3ab18646924ce24dc6017a9e882ff69689e40'
* commit '22c3ab18646924ce24dc6017a9e882ff69689e40':
  checkasm: Add test for huffyuvdsp add_bytes

huffyuvdsp is renamed to llviddsp to be consistent with our codebase.

Note: af607b7e07 wasn't actually required for this test since this
commit is not actually testing huffyuvdsp.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-22 16:31:38 +01:00
Clément Bœsch
83cd80d10a Merge commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5'
* commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5':
  audiodsp/x86: yasmify vector_clipf_sse
  audiodsp: reorder arguments for vector_clipf

Merged the version from Libav after a discussion with James Almer on
IRC:

19:22 <ubitux> jamrial: opinion on 12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5?
19:23 <ubitux> it was apparently yasmified differently
19:23 <ubitux> (it depends on the previous commit arg shuffle)
19:24 <ubitux> i don't see the magic movsxdifnidn in your port btw
19:24 <ubitux> it's a port from 1d36defe94
19:25 <jamrial> seems better thanks to said arg shuffle
19:25 <jamrial> the loop is the same, but init is simpler
19:25 <jamrial> probably worth merging
19:25 <ubitux> OK
19:25 <ubitux> thanks
19:26 <jamrial> curious they didn't make len ptrdiff_t after the previous bunch of commits, heh
19:26 <ubitux> yeah indeed

Both commits are merged at the same time to prevent a conflict with our
existing yasmified ff_vector_clipf_sse.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 22:35:07 +01:00
Clément Bœsch
8414755486 Merge commit 'e9ef6171396dc4106526aaa86b620c61ca3d1017'
* commit 'e9ef6171396dc4106526aaa86b620c61ca3d1017':
  checkasm: add tests for audiodsp

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 19:10:56 +01:00
Clément Bœsch
c50b2164a6 Merge commit '2eb97af66af90ca3978229da151f0b8b3a5d9370'
* commit '2eb97af66af90ca3978229da151f0b8b3a5d9370':
  checkasm: add a test for blockdsp

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 19:05:05 +01:00
Clément Bœsch
e07fa3008b Merge commit 'de452e503734ebb0fdbce86e9d16693b3530fad3'
* commit 'de452e503734ebb0fdbce86e9d16693b3530fad3':
  pixblockdsp: Change type of stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 15:58:32 +01:00
Clément Bœsch
3c8f7a8f6b Merge commit 'e89cef40506d990a982aefedfde7d3ca4f88c524'
* commit 'e89cef40506d990a982aefedfde7d3ca4f88c524':
  checkasm: Read the unsigned value as it should

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:55:20 +01:00
James Almer
e5623aafd8 Merge commit '87c6c78604e4dd16f1f45862b27ca006da010527'
* commit '87c6c78604e4dd16f1f45862b27ca006da010527':
  vp8: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:11:44 -03:00
Clément Bœsch
8b13492c9e Merge commit '40ad05bab206c932a32171d45581080c914b06ec'
* commit '40ad05bab206c932a32171d45581080c914b06ec':
  checkasm: Cast unsigned to signed

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-03-15 12:32:15 +01:00
Clément Bœsch
92cb9a3869 Merge commit '9064777dbb335ab4809ae09e3fdcc0245f925cdc'
* commit '9064777dbb335ab4809ae09e3fdcc0245f925cdc':
  checkasm: add HEVC test for testing IDCT DC

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-02-02 11:40:58 +01:00
Clément Bœsch
a0860b0a38 Merge commit '6f9e34baea4f6f484392e4e67f606a0835d07b73'
* commit '6f9e34baea4f6f484392e4e67f606a0835d07b73':
  arm: Check for support for the .fpu directive

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-02-02 11:22:04 +01:00
Clément Bœsch
9f1c81e5ec Merge commit '71a0472114574993df7035f4de9aa007e03817b8'
* commit '71a0472114574993df7035f4de9aa007e03817b8':
  checkasm: arm: report the first clobbered register in checkasm_checked_call

Also includes 446353ea18, 59aeed93e4, and 37961044c6 to avoid breaking
too much stuff.

Merged-by: Clément Bœsch <u@pkh.me>
2017-01-24 19:21:29 +01:00
Martin Storsjö
388f6e6715 arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.

This is cherrypicked from libav commit
9c8bc74c2b.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-14 21:13:30 +01:00
Ronald S. Bultje
1c8fbd7b90 checkasm/vp9: benchmark all sub-IDCTs (but not WHT or ADST). 2016-12-27 10:02:33 -05:00
Diego Biurrun
3794062ab1 Remove Plan 9 support
Supporting the system was a nice joke for the 9 release, but it has
run its course. Nowadays Plan 9 receives no testing and has no
practical usefulness.
2016-12-03 09:15:01 +01:00
Martin Storsjö
9c8bc74c2b arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:54:07 +02:00
Ronald S. Bultje
06fec74cac checkasm: vp9dsp: benchmark all sub-IDCTs (but not WHT or ADST).
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-23 23:55:38 +02:00
Martin Storsjö
effc1430b2 Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately"
This reverts commit 81d7f0bbca.

Instead of just benchmarking dc separately, test all relevant subparts
(in the next commit).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-23 23:55:26 +02:00
Hendrik Leppkes
286d8bae61 Merge commit '7b1ae0e73ab7f7c5eabc70dbe2e579127c6e154f'
* commit '7b1ae0e73ab7f7c5eabc70dbe2e579127c6e154f':
  checkasm/arm: preserve the stack alignment checkasm_checked_call

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:21:32 +01:00
Hendrik Leppkes
c0af1ee90d Merge commit '80fbb7becae530167373fe5178966b7d7604306e'
* commit '80fbb7becae530167373fe5178966b7d7604306e':
  checkasm: vp8.mc: initialize the full src buffer after ec32574209

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:20:10 +01:00
Hendrik Leppkes
90b72f6bda Merge commit '8c816c0c9b12fdefd9046415e97df299880bc9b8'
* commit '8c816c0c9b12fdefd9046415e97df299880bc9b8':
  checkasm/arm: align the clobber check data properly for ldrd

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:06:10 +01:00
Hendrik Leppkes
4fe013fc70 Merge commit 'ec32574209f36467ef0d22c21a7e811ba98c15b6'
* commit 'ec32574209f36467ef0d22c21a7e811ba98c15b6':
  checkasm: vp8: mc: test unequal width/height for partitions

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:05:25 +01:00
Martin Storsjö
81d7f0bbca checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately
The dc-only mode is already checked to work correctly above, but this
allows benchmarking this mode for performance tuning, and allows making
sure that it actually is correctly hooked up.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-16 10:06:32 +02:00
Hendrik Leppkes
47f75839e4 Merge commit 'f8d17d53957056c053a46f9320fa7ae6fe1479a5'
* commit 'f8d17d53957056c053a46f9320fa7ae6fe1479a5':
  checkasm: Add tests for vp8dsp

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 15:29:08 +01:00
Hendrik Leppkes
f75035b06f Merge commit 'e48746deec48e9ff195841bc3266b4e153a878cd'
* commit 'e48746deec48e9ff195841bc3266b4e153a878cd':
  checkasm: h264dsp: Move the x and y variables into the randomize_buffer macro

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-13 23:02:39 +01:00
Ronald S. Bultje
0b37cd09a6 checkasm: add vp9dsp.itxfm_add tests.
This includes fixes by Henrik Gramner.

The forward transforms are derived from the reference encoder.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-11 11:09:05 +02:00
Diego Biurrun
9498237049 checkasm: Add --test parameter to check only specific components
Inspired by a patch from Martin Storsjö <martin@martin.st>.
2016-11-08 17:32:25 +01:00
Martin Storsjö
2e55e26b40 vp9: Flip the order of arguments in MC functions
This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:12:02 +02:00
Alexandra Hájková
ed48a9d814 checkasm: Add a test for HEVC add_residual 2016-10-22 17:33:35 +02:00
Martin Storsjö
dd5d4a0e1e checkasm: aarch64: Don't clobber x29 in checkasm_stack_clobber
x29 (FP) is a callee saved register and should be restored on
return. Instead of backing up x29 and restoring it here, back up
sp in a register that we are allowed to overwrite.

This fixes crashes in checkasm on aarch64 since f1b3e13138.
For some reason, gcc builds didn't crash, but clang builds do.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-18 16:17:12 +03:00
Diego Biurrun
2816f8a8bb build: Drop arch-specific checkasm Makefiles
They only contain one line and will never contain more.
2016-10-17 16:25:38 +02:00
Diego Biurrun
93d5b022a9 build: Drop duplicate asm recipe
And move the asm recipe to the top-level Makefile next to the other
local pattern rules for .o files.
2016-10-17 16:25:35 +02:00
Martin Storsjö
c91d6a33f8 checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack
This, combined with clobbering the stack space prior to the call,
increases the chances of finding cases where 32 bit parameters
are erroneously treated as 64 bit.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-16 23:26:33 +03:00
Martin Storsjö
f1b3e13138 checkasm: aarch64: Clobber the stack before calling functions
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-16 23:26:22 +03:00