Square root algorithms
Bjørn, Sat Apr 22 2006, 11:50AM
Does anyone know any efficient square root algorithms? I made this one in an attempt to make it very short and failsafe but it is the fastest one I have tried out so far.
I was surprised not to find anything faster so if anyone has any ideas or pointers towards something efficient it would be interesting to hear. I saw an algorithm years ago that was very efficient but I can't recall the name.
;r0 = sqrt(r1)
mov r0, #0
mov r3, #32768
loop1 orr r0, r0, r3
mul r2, r0, r0
cmp r2, r1
bichi r0, r0, r3
movnes r3, r3, lsr #1
bne loop1
Re:
Square root algorithms
ShawnLG, Sat Apr 22 2006, 03:48PM
This one says it is under 20 clock cycles on some cpus.
Re:
Square root algorithms
JimG, Sat Apr 22 2006, 04:38PM
The code you've shown is equivalent to the following C code:
int const isqrt( int const square )
{
int guess = 0;
int bit = 1 << 15; // do the first 15 LSBs
while ( bit > 0 )
{
guess |= bit; // build up a value
if ( guess * guess > square ) // is the square of the guess bigger than the orginal?
guess &= ~bit; // remove the bit
bit >>= 1;
}
return guess;
}
It doesn't scale very well beyond 32 bit inputs and will always take 96 instructions to execute, but the algorithm you choose is dependant on your situation. If you're ever going to be taking the square root of 8 bit numbers you would use a different algorithm from if you were getting the square root of an 80 bit floating point number.
Normally table lookup with a Newton iteration like Newton-Raphson yeilds fast and accurate results for larger numbers. A purely iterative approach works really well for small numbers. It's always a trade off of overhead
I have used the
BSR trick on Intel that is pointed out in the link ShawnLG provided to get an approximation before doing an iterative approach, as well.
Update: Fixed or to an and.
Re: Square root algorithms
Electroholic, Sat Apr 22 2006, 09:20PM
Copied from circuit cellar issue 187
Sum of odds
uint16_t Exh_sumodd-opt
(uint16_t A)
{
uint16_t x_odd = 1;
while (x_odd <= A)
{
A -= x_odd;
x_odd += 2;
}
return (x_odd>>1);
}
Re:
Square root algorithms
Alfons, Sat Apr 22 2006, 11:43PM
In BASIC, this will work:
10 INPUT "Calculate square root of";A
20 GOSUB 50
30 PRINT "Square root of "A" is: ";B
40 END
50 O=1:B=0
60 AA=A*100
70 AA=AA-O
80 B=B+1
90 O=O+2
100 IF AA<=0 THEN B=B/10:RETURN
110 GOTO 70
(The code was given to me by Fedor Steeman, a Tandy Color Computer enthousiast)
Re: Square root algorithms
Bjørn, Sun Apr 23 2006, 12:05AM
My code does exactly the same as the following C code except that it exits early if it finds an exact result.
Table lookup in Flash memory is quite slow and the accuracy is not what I was aiming for but I have made a note of the <20 cycle algorithm for other uses.
I tried the sum of odds and i was faster on 8 bit numbers than my first attempt but if I configure my code for 8 bit then it becomes a lot faster than the sum of odds.
Re: Square root algorithms
Simon, Sun Apr 23 2006, 12:15AM
The Newton method should work well. To keep the options coming, what about a Taylor expansion? Ie. approximating the square-root function with a polynomial. Each term in your approximation should only involve two extra multiplications at most.
This isn't the best applicaton for Taylor approximations, however. Exponents and trig are better examples.
I guess this is meant to be a general algorithm, otherwise that would change things a lot.
Re:
Square root algorithms
Bjørn, Mon Apr 24 2006, 04:02PM
It is supposed to be a general purpose algorithm for a microcomputer and have these properties that does not mix well:
- Predictable
- Accurate
- Fast
- Small
My first attempt unrolls well and I can use an unrolled version for 16 bit numbers and use that for small numbers and to approximate the square root of large numbers and use the looped version for 32 bit numbers.
Re: Square root algorithms
Carbon_Rod, Tue Apr 25 2006, 04:43AM
For low speed 8-bit real time fixed-root core design a look-up style hash table is nice as it uses no ram. =) Its cost is O (1) so it’s fast, simple, and predictable. There is a space-time trade off.. However, it also offers controlled number rounding if a weighted result is needed.
Did you mean an n-th root function or specifically square root? Estimated results using successive approximation techniques are common.
Re: Square root algorithms
Bjørn, Tue Apr 25 2006, 12:54PM
I was specifically thinking about square roots since that is what I needed at the time. I would be interested in efficient implementations of all kinds of useful functions, including IEEE 754 floating point functions. There is only room for 8192 instructions and there is a lot of other things to be included.
Re:
Square root algorithms
Bjørn, Sat May 13 2006, 12:41PM
I found some fairly optimal code, 51 cycles, no approximations and no tables.
;Wilco Dijkstra
;R0 = sqrt(R1)
;Use R0-R2
sqrt32f?A MOV r2, #3 << 30
MOV r0, #1 << 30
CMP r1, r0, ROR #2 * 0
SUBHS r1, r1, r0, ROR #2 * 0
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 1
SUBHS r1, r1, r0, ROR #2 * 1
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 2
SUBHS r1, r1, r0, ROR #2 * 2
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 3
SUBHS r1, r1, r0, ROR #2 * 3
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 4
SUBHS r1, r1, r0, ROR #2 * 4
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 5
SUBHS r1, r1, r0, ROR #2 * 5
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 6
SUBHS r1, r1, r0, ROR #2 * 6
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 7
SUBHS r1, r1, r0, ROR #2 * 7
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 8
SUBHS r1, r1, r0, ROR #2 * 8
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 9
SUBHS r1, r1, r0, ROR #2 * 9
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 10
SUBHS r1, r1, r0, ROR #2 * 10
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 11
SUBHS r1, r1, r0, ROR #2 * 11
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 12
SUBHS r1, r1, r0, ROR #2 * 12
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 13
SUBHS r1, r1, r0, ROR #2 * 13
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 14
SUBHS r1, r1, r0, ROR #2 * 14
ADC r0, r2, r0, LSL #1
CMP r1, r0, ROR #2 * 15
SUBHS r1, r1, r0, ROR #2 * 15
ADC r0, r2, r0, LSL #1
BIC r0, r0, #3 << 30 ; for rounding add: CMP r1, r0 ADC r0,r0, #1
mov pc,r14