I'm trying to calculate the Taylor Series
1 + x + x2 / 2! + x3 / 3! + ... + x10 / 10!.
My program gives me infinity every time, I'm brand new to MIPS. I am only concerned about the input when it is between 0 and 10, inclusive. we are stopping at xn/n! when n = 10. Heres what I came up with:
# A program to first, find the power of x^n and the factorial of n. x,n are both between 0-10 inclusive. Then it finds the taylor series
.data
pr1: .asciiz "Enter Float x: "
.text
.globl main
main:
la $a0, pr1 # prompt user for x
li $v0,4 # print string
syscall
li $v0, 6 # read single
syscall
mtc1 $v0, $f0 # f0 <--- x
exponent: # f0 --> x
mul.s $f1, $f0, $f0 # f1 --> x^2
mul.s $f2, $f1, $f0 # f2 --> x^3
mul.s $f3, $f2, $f0 # f3 --> x^4
mul.s $f4, $f3, $f0 # f4 --> x^5
mul.s $f5, $f4, $f0 # f5 --> x^6
mul.s $f6, $f5, $f0 # f6 --> x^7
mul.s $f7, $f6, $f0 # f7 --> x^8
mul.s $f8, $f7, $f0 # f8 --> x^9
mul.s $f9, $f8, $f0 # f9 --> x^10
factorial:
li $s0, 1 # n = 10
li $t0, 1
add $t0, $t0, $s0 # t0 = 2! = 2
add $t1, $t0, $s0 # t1 = 3
mul $t1, $t1, $t0 # t1 = 3! = 6
add $t2, $t0, $t0 # t2 = 4
mul $t2, $t2, $t1 # t2 = 4! = 24
addi $t3, $t1, -1 # t3 = 5
mul $t3, $t3, $t2 # t3 = 5! = 120
add $t4, $t1, $zero # t4 = 6
mul $t4, $t4, $t3 # t4 = 6! = 720
addi $t5, $t1, 1 # t5 = 7
mul $t5, $t5, $t4 # t5 = 7! = 5040
add $t6, $t1, $t0 # t6 = 8
mul $t6, $t6, $t5 # t6 = 8! = 40320
add $t7, $t1, $t0
addi $t7, $t7, 1 # t7 = 9
mul $t7, $t7, $t6 # t7 = 9! = 362880
mul $s1, $t0, 5 # s1 = 10
mul $s1, $s1, $t7 # s1 = 10! = 3628800
taylor:
mtc1 $s0, $f10 # $f10 = 1
cvt.s.w $f10, $f10 # convert to float
add.s $f0, $f0, $f10 # = 1 + x
mtc1 $t0, $f10 # move n! to cp1
cvt.s.w $f10, $f10 # convert to float
div.s $f10, $f10, $f1 # divide x^n/ n
add.s $f0, $f0, $f10 # = 1 + x + x^2/2!
mtc1 $t1, $f10 # move n! to cp1
cvt.s.w $f10, $f10 # convert to float
div.s $f10, $f10, $f2 # divide x^n/ n
add.s $f0, $f0, $f10 # = 1 + x +.. + x^3/3!
mtc1 $t2, $f10 # move n! to cp1
cvt.s.w $f10, $f10 # convert to float
div.s $f10, $f10, $f3 # divide x^n/ n
add.s $f0, $f0, $f10 # = 1 + x +..+ x^4/4!
mtc1 $t3, $f10 # move n! to cp1
cvt.s.w $f10, $f10 # convert to float
div.s $f10, $f10, $f4 # divide x^n/ n
add.s $f0, $f0, $f10 # = 1 + x + x^5/5!
mtc1 $t4, $f10 # move n! to cp1
cvt.s.w $f10, $f10 # convert to float
div.s $f10, $f10, $f5 # divide x^n/ n
add.s $f0, $f0, $f10 # = 1 + x + x^6/6!
mtc1 $t5, $f10 # move n! to cp1
cvt.s.w $f10, $f10 # convert to float
div.s $f10, $f10, $f6 # divide x^n/ n
add.s $f0, $f0, $f10 # = 1 + x + x^7/7!
mtc1 $t6, $f10 # move n! to cp1
cvt.s.w $f10, $f10 # convert to float
div.s $f10, $f10, $f7 # divide x^n/ n
add.s $f0, $f0, $f10 # = 1 + x + x^8/8!
mtc1 $t7, $f10 # move n! to cp1
cvt.s.w $f10, $f10 # convert to float
div.s $f10, $f10, $f8 # divide x^n/ n
add.s $f0, $f0, $f10 # = 1 + x + x^9/9!
mtc1 $s1, $f10 # move n! to cp1
cvt.s.w $f10, $f10 # convert to float
div.s $f10, $f10, $f9 # divide x^n/ n
add.s $f0, $f0, $f10 # = 1 + x + x^10/10!
end:
mov.s $f12, $f0 # argument
li $v0, 2 # print float
syscall
li $v0, 10
syscall
To start with, you have a few bugs:
(1) As Jester mentioned, your "read float" syscall had a bug.
(2) You're calculating factorial(n)
but the Taylor cosine series uses factorial(2n)
, so it increases more quickly.
(3) Another problem you may be having [would be having with the correct factorial] is that you're using integer math for the factorial. When calculating to 10 iterations, the factorial will overflow a 32 bit integer. For 10 iterations, we end up with [at least] factorial(18)
which is ~6.4e15 (i.e. does not fit in a 32 integer). With integer math, the factorial, instead of increasing steadily, will "oscillate" whenever the value wraps 32 bits.
I've taken a slightly different approach.
Rather than pre-calculate everything [as you've done], I created a solution that uses a loop. That may or may not help you with debugging your own implementation, but I'd be remiss if I didn't recommend refactoring.
Below is asm code for the loop implementation. There is also a C version that I used to debug the algorithm first. There is also an asm version with some debug statements.
Rationale:
You're trying to calculate cos
using Taylor Series summation/expansion [to 10 iterations]. The formula does use a "summation" operator, after all.
You're precalculating all x^n
terms and all factorial terms and putting them in different registers. That's fine as MIPS has 32 FP registers. But, with double precision, we'd have to use two registers, so that would mean only 16 numbers. In a way, what you did is the equivalent of a compiler doing "loop unrolling".
Another issue is that it can be hard to keep track of all those registers. What is holding what value.
And, suppose, the problem were to use 20 iterations instead of 10. We'd probably run out of registers. In practice, this might be necessary for other series expansions because we might not get convergence as quickly.
So, I'd recommend using a loop. Each power term is easily calculated from the previous one. Likewise, for each factorial.
Another advantage to using the loop approach is that rather than using a fixed number of iterations, we can monitor the term value [cur
] (which is getting smaller and smaller) and if it changes by less than a certain amount [or is smaller than that amount] (e.g. for double precision, 1e-14) we can stop because our values won't get any better than the precision our floating point format [and hardware] affords us.
Note: This is not shown below but is easy enough to implement.
Here's the asm version. The variable names used in the comments refer to the variable names used in the C code [which follows afterward].
# A program to calculate cos(x) using taylor series
.data
pr1: .asciiz "Enter Float x: "
sym_fnc: .asciiz "cos(x): "
nl: .asciiz "\n"
.text
.globl main
main:
li $s7,0 # clear debug flag #+
# prompt user for x value
li $v0,4 # print string
la $a0,pr1 # prompt user for x
syscall
# read user's x value
li $v0,6 # read float
syscall
jal qcos
la $a0,sym_fnc # string
jal prtflt
li $v0,10
syscall
# qcos -- calculate cosine
#
# RETURNS:
# f0 -- cos(x)
#
# arguments:
# f0 -- x value
#
# registers:
# f2 -- x value
# f4 -- sum
# f6 -- xpow (x^n)
# f8 -- n2m1
# f10 -- factorial (nfac)
# f12 -- RESERVED (used in pflt)
# f14 -- current term
# f16 -- x^2
#
# f18 -- a one value
#
# t0 -- zero value
# t1 -- one value
#
# t6 -- negation flag
# t7 -- iteration count
qcos:
move $s0,$ra # save return address
mov.s $f2,$f0 # save x value
mul.s $f16,$f2,$f2 # get x^2
li $t0,0 # get a zero
li $t1,1 # get a one
li $t6,1 # start with positive term
# xpow = 1
mtc1 $t1,$f6 # xpow = 1
cvt.s.w $f6,$f6 # convert to float
# n2m1 = 0
mtc1 $t0,$f8 # n2m1 = 0
cvt.s.w $f8,$f8 # convert to float
# nfac = 1
mtc1 $t1,$f10 # nfac = 1
cvt.s.w $f10,$f10 # convert to float
# get a one value
mtc1 $t1,$f18 # onetmp = 1
cvt.s.w $f18,$f18 # convert to float
# zero the sum
mtc1 $t0,$f4 # sum = 0
cvt.s.w $f4,$f4 # convert to float
li $t7,10 # set number of iterations
cosloop:
div.s $f14,$f6,$f10 # cur = xpow / nfac
# apply the term to the sum
bgtz $t6,cospos # do positive? yes, fly
sub.s $f4,$f4,$f14 # subtract the term
b cosneg
cospos:
add.s $f4,$f4,$f14 # add the term
cosneg:
subi $t7,$t7,1 # bump down iteration count
blez $t7,cosdone # are we done? if yes, fly
# now calculate intermediate values for _next_ term
# get _next_ power term
mul.s $f6,$f6,$f16 # xpow *= x2
# go from factorial(2n) to factorial(2n+1)
add.s $f8,$f8,$f18 # n2m1 += 1
mul.s $f10,$f10,$f8 # nfac *= n2m1
# go from factorial(2n+1) to factorial(2n+1+1)
add.s $f8,$f8,$f18 # n2m1 += 1
mul.s $f10,$f10,$f8 # nfac *= n2m1
neg $t6,$t6 # flip sign for next time
j cosloop
cosdone:
mov.s $f0,$f4 # set return value
move $ra,$s0 # restore return address
jr $ra
# dbgflt -- debug print float number
dbgflt:
bnez $s7,prtflt
jr $ra
# dbgnum -- debug print int number
dbgnum:
beqz $s7,dbgnumdone
li $v0,1
syscall
dbgnumdone:
jr $ra
# dbgprt -- debug print float number
dbgprt:
beqz $s7,dbgprtdone
li $v0,4
syscall
dbgprtdone:
jr $ra
# prtflt -- print float number
#
# arguments:
# a0 -- prefix string (symbol name)
# f12 -- number to print
prtflt:
li $v0,4 # syscall: print string
syscall
li $v0,2 # print float
syscall
li $v0,4 # syscall: print string
la $a0,nl # print newline
syscall
jr $ra
Here's the C code [it also has code for sin(x)
]:
// mipsqsin/mipstaylor -- fast sine/cosine calculation
#include <stdio.h>
#include <math.h>
#define ITERMAX 10
// qcos -- calculate cosine
double
qcos(double x)
{
int iteridx;
double x2;
double cur;
int neg;
double xpow;
double n2m1;
double nfac;
double sum;
// square of x
x2 = x * x;
// values for initial terms where n==0:
xpow = 1.0;
n2m1 = 0.0;
nfac = 1.0;
neg = 1;
sum = 0.0;
iteridx = 0;
// NOTES:
// (1) with the setup above, we can just use the loop without any special
// casing
while (1) {
// calculate current value
cur = xpow / nfac;
// apply it to sum
if (neg < 0)
sum -= cur;
else
sum += cur;
// bug out when done
if (++iteridx >= ITERMAX)
break;
// now calculate intermediate values for _next_ sum term
// get _next_ power term
xpow *= x2;
// go from factorial(2n) to factorial(2n+1)
n2m1 += 1.0;
nfac *= n2m1;
// now get factorial(2n+1+1)
n2m1 += 1.0;
nfac *= n2m1;
// flip sign
neg = -neg;
}
return sum;
}
// qsin -- calculate sine
double
qsin(double x)
{
int iteridx;
double x2;
double cur;
int neg;
double xpow;
double n2m1;
double nfac;
double sum;
// square of x
x2 = x * x;
// values for initial terms where n==0:
xpow = x;
n2m1 = 1.0;
nfac = 1.0;
neg = 1;
sum = 0.0;
iteridx = 0;
// NOTES:
// (1) with the setup above, we can just use the loop without any special
// casing
while (1) {
// calculate current value
cur = xpow / nfac;
// apply it to sum
if (neg < 0)
sum -= cur;
else
sum += cur;
// bug out when done
if (++iteridx >= ITERMAX)
break;
// now calculate intermediate values for _next_ sum term
// get _next_ power term
xpow *= x2;
// go from factorial(2n+1) to factorial(2n+1+1)
n2m1 += 1.0;
nfac *= n2m1;
// now get factorial(2n+1+1+1)
n2m1 += 1.0;
nfac *= n2m1;
// flip sign
neg = -neg;
}
return sum;
}
// testfnc -- test function
void
testfnc(int typ,const char *sym)
{
double (*efnc)(double);
double (*qfnc)(double);
double vale;
double valq;
double x;
double dif;
int iter;
switch (typ) {
case 0:
efnc = cos;
qfnc = qcos;
break;
case 1:
efnc = sin;
qfnc = qsin;
break;
default:
efnc = NULL;
qfnc = NULL;
break;
}
iter = 0;
for (x = 0.0; x <= M_PI_2; x += 0.001, ++iter) {
vale = efnc(x);
valq = qfnc(x);
dif = vale - valq;
dif = fabs(dif);
printf("%s: %d x=%.15f e=%.15f q=%.15f dif=%.15f %s\n",
sym,iter,x,vale,valq,dif,(dif < 1e-14) ? "PASS" : "FAIL");
}
}
// main -- main program
int
main(int argc,char **argv)
{
testfnc(0,"cos");
testfnc(1,"sin");
return 0;
}
Here's the asm version with debug statements I used. This is "printf debugging", just like in C.
Alternatively, both the mars
and spim
simulators have builtin GUI debugging. You can single step the code by clicking on a single button. You can see a live display of all register values.
Note: Personally, I prefer mars
. This works with mars
but I don't know if spim
supports .eqv
pseudo ops.
# A program to calculate cos(x) using taylor series
.data
pr1: .asciiz "Enter Float x: "
sym_fnc: .asciiz "cos(x): "
nl: .asciiz "\n"
#+ddef f2,x
.eqv fpr_x $f2 #+
sym_x: .asciiz "x[f2]: " #+
#+
#+ddef f16,x2
.eqv fpr_x2 $f16 #+
sym_x2: .asciiz "x2[f16]: " #+
#+
#+ddef f4,sum
.eqv fpr_sum $f4 #+
sym_sum: .asciiz "sum[f4]: " #+
#+
#+ddef f6,xpow
.eqv fpr_xpow $f6 #+
sym_xpow: .asciiz "xpow[f6]: " #+
#+
#+ddef f8,n2m1
.eqv fpr_n2m1 $f8 #+
sym_n2m1: .asciiz "n2m1[f8]: " #+
#+
#+ddef f10,nfac
.eqv fpr_nfac $f10 #+
sym_nfac: .asciiz "nfac[f10]: " #+
#+
#+ddef f14,cur
.eqv fpr_cur $f14 #+
sym_cur: .asciiz "cur[f14]: " #+
#+
.text
.globl main
main:
#+dask
la $a0,dbgask # prompt user #+
li $v0,4 # print string #+
syscall #+
# get debug flag #+
li $v0,5 #+
syscall #+
move $s7,$v0 #+
.data #+
dbgask: .asciiz "Debug (0/1) ? " #+
.text #+
#+
# prompt user for x value
li $v0,4 # print string
la $a0,pr1 # prompt user for x
syscall
# read user's x value
li $v0,6 # read float
syscall
jal qcos
la $a0,sym_fnc # string
jal prtflt
li $v0,10
syscall
# qcos -- calculate cosine
#
# RETURNS:
# f0 -- cos(x)
#
# arguments:
# f0 -- x value
#
# registers:
# f2 -- x value
# f4 -- sum
# f6 -- xpow (x^n)
# f8 -- n2m1
# f10 -- factorial (nfac)
# f12 -- RESERVED (used in pflt)
# f14 -- current term
# f16 -- x^2
#
# f18 -- a one value
#
# t0 -- zero value
# t1 -- one value
#
# t6 -- negation flag
# t7 -- iteration count
qcos:
move $s0,$ra # save return address
mov.s $f2,$f0 # save x value
#+dflt x
la $a0,sym_x #+
mov.s $f12,fpr_x #+
jal dbgflt #+
#+
mul.s $f16,$f2,$f2 # get x^2
#+dflt x2
la $a0,sym_x2 #+
mov.s $f12,fpr_x2 #+
jal dbgflt #+
#+
li $t0,0 # get a zero
li $t1,1 # get a one
li $t6,1 # start with positive term
# xpow = 1
mtc1 $t1,$f6 # xpow = 1
cvt.s.w $f6,$f6 # convert to float
# n2m1 = 0
mtc1 $t0,$f8 # n2m1 = 0
cvt.s.w $f8,$f8 # convert to float
# nfac = 1
mtc1 $t1,$f10 # nfac = 1
cvt.s.w $f10,$f10 # convert to float
# get a one value
mtc1 $t1,$f18 # onetmp = 1
cvt.s.w $f18,$f18 # convert to float
# zero the sum
mtc1 $t0,$f4 # sum = 0
cvt.s.w $f4,$f4 # convert to float
li $t7,10 # set number of iterations
cosloop:
#+dprt "cosloop: LOOP iter="
la $a0,dprt_1 #+
jal dbgprt #+
.data #+
dprt_1: .asciiz "cosloop: LOOP iter=" #+
.text #+
#+
#+dnum $t7
move $a0,$t7 #+
jal dbgnum #+
#+
#+dprt "\n"
la $a0,dprt_2 #+
jal dbgprt #+
.data #+
dprt_2: .asciiz "\n" #+
.text #+
#+
#+dflt xpow
la $a0,sym_xpow #+
mov.s $f12,fpr_xpow #+
jal dbgflt #+
#+
#+dflt nfac
la $a0,sym_nfac #+
mov.s $f12,fpr_nfac #+
jal dbgflt #+
#+
div.s $f14,$f6,$f10 # cur = xpow / nfac
#+dflt cur
la $a0,sym_cur #+
mov.s $f12,fpr_cur #+
jal dbgflt #+
#+
# apply the term to the sum
bgtz $t6,cospos # do positive? yes, fly
#+dprt "costerm: NEG\n"
la $a0,dprt_3 #+
jal dbgprt #+
.data #+
dprt_3: .asciiz "costerm: NEG\n" #+
.text #+
#+
sub.s $f4,$f4,$f14 # subtract the term
b cosneg
cospos:
#+dprt "costerm: POS\n"
la $a0,dprt_4 #+
jal dbgprt #+
.data #+
dprt_4: .asciiz "costerm: POS\n" #+
.text #+
#+
add.s $f4,$f4,$f14 # add the term
cosneg:
#+dflt sum
la $a0,sym_sum #+
mov.s $f12,fpr_sum #+
jal dbgflt #+
#+
subi $t7,$t7,1 # bump down iteration count
blez $t7,cosdone # are we done? if yes, fly
# now calculate intermediate values for _next_ term
#+dprt "cosloop: CALC\n"
la $a0,dprt_5 #+
jal dbgprt #+
.data #+
dprt_5: .asciiz "cosloop: CALC\n" #+
.text #+
#+
# get _next_ power term
mul.s $f6,$f6,$f16 # xpow *= x2
# go from factorial(2n) to factorial(2n+1)
add.s $f8,$f8,$f18 # n2m1 += 1
#+dflt n2m1
la $a0,sym_n2m1 #+
mov.s $f12,fpr_n2m1 #+
jal dbgflt #+
#+
mul.s $f10,$f10,$f8 # nfac *= n2m1
#+dflt nfac
la $a0,sym_nfac #+
mov.s $f12,fpr_nfac #+
jal dbgflt #+
#+
# go from factorial(2n+1) to factorial(2n+1+1)
add.s $f8,$f8,$f18 # n2m1 += 1
#+dflt n2m1
la $a0,sym_n2m1 #+
mov.s $f12,fpr_n2m1 #+
jal dbgflt #+
#+
mul.s $f10,$f10,$f8 # nfac *= n2m1
#+dflt nfac
la $a0,sym_nfac #+
mov.s $f12,fpr_nfac #+
jal dbgflt #+
#+
neg $t6,$t6 # flip sign for next time
j cosloop
cosdone:
mov.s $f0,$f4 # set return value
move $ra,$s0 # restore return address
jr $ra
# dbgflt -- debug print float number
dbgflt:
bnez $s7,prtflt
jr $ra
# dbgnum -- debug print int number
dbgnum:
beqz $s7,dbgnumdone
li $v0,1
syscall
dbgnumdone:
jr $ra
# dbgprt -- debug print float number
dbgprt:
beqz $s7,dbgprtdone
li $v0,4
syscall
dbgprtdone:
jr $ra
# prtflt -- print float number
#
# arguments:
# a0 -- prefix string (symbol name)
# f12 -- number to print
prtflt:
li $v0,4 # syscall: print string
syscall
li $v0,2 # print float
syscall
li $v0,4 # syscall: print string
la $a0,nl # print newline
syscall
jr $ra
Here's the debug output log for a single value:
Debug (0/1) ? 1
Enter Float x: 0.123
x[f2]: 0.123
x2[f16]: 0.015129001
cosloop: LOOP iter=10
xpow[f6]: 1.0
nfac[f10]: 1.0
cur[f14]: 1.0
costerm: POS
sum[f4]: 1.0
cosloop: CALC
n2m1[f8]: 1.0
nfac[f10]: 1.0
n2m1[f8]: 2.0
nfac[f10]: 2.0
cosloop: LOOP iter=9
xpow[f6]: 0.015129001
nfac[f10]: 2.0
cur[f14]: 0.0075645004
costerm: NEG
sum[f4]: 0.9924355
cosloop: CALC
n2m1[f8]: 3.0
nfac[f10]: 6.0
n2m1[f8]: 4.0
nfac[f10]: 24.0
cosloop: LOOP iter=8
xpow[f6]: 2.2888667E-4
nfac[f10]: 24.0
cur[f14]: 9.536944E-6
costerm: POS
sum[f4]: 0.99244505
cosloop: CALC
n2m1[f8]: 5.0
nfac[f10]: 120.0
n2m1[f8]: 6.0
nfac[f10]: 720.0
cosloop: LOOP iter=7
xpow[f6]: 3.4628265E-6
nfac[f10]: 720.0
cur[f14]: 4.8094813E-9
costerm: NEG
sum[f4]: 0.99244505
cosloop: CALC
n2m1[f8]: 7.0
nfac[f10]: 5040.0
n2m1[f8]: 8.0
nfac[f10]: 40320.0
cosloop: LOOP iter=6
xpow[f6]: 5.2389105E-8
nfac[f10]: 40320.0
cur[f14]: 1.2993329E-12
costerm: POS
sum[f4]: 0.99244505
cosloop: CALC
n2m1[f8]: 9.0
nfac[f10]: 362880.0
n2m1[f8]: 10.0
nfac[f10]: 3628800.0
cosloop: LOOP iter=5
xpow[f6]: 7.925948E-10
nfac[f10]: 3628800.0
cur[f14]: 2.1841789E-16
costerm: NEG
sum[f4]: 0.99244505
cosloop: CALC
n2m1[f8]: 11.0
nfac[f10]: 3.99168E7
n2m1[f8]: 12.0
nfac[f10]: 4.790016E8
cosloop: LOOP iter=4
xpow[f6]: 1.1991168E-11
nfac[f10]: 4.790016E8
cur[f14]: 2.503367E-20
costerm: POS
sum[f4]: 0.99244505
cosloop: CALC
n2m1[f8]: 13.0
nfac[f10]: 6.2270208E9
n2m1[f8]: 14.0
nfac[f10]: 8.7178289E10
cosloop: LOOP iter=3
xpow[f6]: 1.8141439E-13
nfac[f10]: 8.7178289E10
cur[f14]: 2.0809583E-24
costerm: NEG
sum[f4]: 0.99244505
cosloop: CALC
n2m1[f8]: 15.0
nfac[f10]: 1.30767428E12
n2m1[f8]: 16.0
nfac[f10]: 2.09227885E13
cosloop: LOOP iter=2
xpow[f6]: 2.7446184E-15
nfac[f10]: 2.09227885E13
cur[f14]: 1.3117842E-28
costerm: POS
sum[f4]: 0.99244505
cosloop: CALC
n2m1[f8]: 17.0
nfac[f10]: 3.55687415E14
n2m1[f8]: 18.0
nfac[f10]: 6.4023735E15
cosloop: LOOP iter=1
xpow[f6]: 4.1523335E-17
nfac[f10]: 6.4023735E15
cur[f14]: 6.485616E-33
costerm: NEG
sum[f4]: 0.99244505
cos(x): 0.99244505
-- program is finished running --