Complex math inside Altera Cyclone FPGA / Computer Science / Forums

Complex math inside Altera Cyclone FPGA

Linas

Tue Jul 30 2013, 07:01PM

Registered Member #1143 Joined: Sun Nov 25 2007, 04:55PM
Location: Vilnius, Lithuania
Posts: 721

Hello, just wondering: does anyone know how to build an FMAC core inside an FPGA?
The problem: I have a 16-bit ADC running at 30MHz, and I need to multiply each ADC value by an f32 constant and add the product to a running f32 sum (f32 MAC).
The question is how to deal with the latency of the f32 addition and f32 multiply megafunctions.
To make the problem easier to understand, the C code for a processor would look like this:
f32 real = 0.0f, imag = 0.0f;
int i;
while (1)
{
    i = 0;
    while (i < 128)
    {
        // ADC data goes into bits 7-22; the exponent and sign are 0, and the first mantissa bits are zero
        real += ADC_DATA * LUT_COS[i];
        imag -= ADC_DATA * LUT_SIN[i];
        i++;
    }
    // latch real and imag to the 64-bit ports so the MCU can take over; while the MCU does the PID work, reset real and imag for the next f32 MAC
    real = imag = 0.0f;
}

For the LUT I will use the ROM function and load the f32 constants at boot.
The f32 multiply has a 5-cycle delay, and addition/subtraction has the same 5-cycle delay.
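
One common way to live with a pipelined adder in an accumulator, sketched here in C as a software model and not taken from this thread: keep as many partial sums as the adder has pipeline stages, feed sample i into partial sum i % 5, and add the partial sums together once per 128-sample window. The multiplier latency only delays the product stream, so it needs no feedback handling. The names `ADD_LATENCY`, `N_SAMPLES` and `fmac_interleaved` are made up for illustration, and the 5-cycle figure is the one quoted above.

```c
#include <stddef.h>

#define ADD_LATENCY 5   /* adder pipeline depth quoted in the post (assumed) */
#define N_SAMPLES   128 /* samples per accumulation window */

/* Software model of an accumulator built around a pipelined adder.
 * Sample i is added to partial sum (i % ADD_LATENCY), so in hardware each
 * partial sum is only needed again ADD_LATENCY cycles later, after its
 * previous addition has left the adder pipeline. */
float fmac_interleaved(const float *adc, const float *lut)
{
    float partial[ADD_LATENCY] = {0.0f};

    for (size_t i = 0; i < N_SAMPLES; i++) {
        partial[i % ADD_LATENCY] += adc[i] * lut[i];
    }

    /* Final reduction: done once per window, outside the streaming loop. */
    float sum = 0.0f;
    for (size_t k = 0; k < ADD_LATENCY; k++) {
        sum += partial[k];
    }
    return sum;
}
```

Because the additions happen in a different order than in the plain loop, the f32 result can differ from the sequential sum in the last bits.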

Adrenaline

Tue Jul 30 2013, 07:07PM

Registered Member #235 Joined: Wed Feb 22 2006, 04:59PM
Location:
Posts: 80

Why Cyclone version, the II has 18x18 bit multiplies?
Why does the multiply take so long?
I know with Xilinx you can gang the discrete multiplier blocks together for a larger bit multiply and still execute in one clock cycle.

Linas

Tue Jul 30 2013, 07:28PM

Registered Member #1143 Joined: Sun Nov 25 2007, 04:55PM
Location: Vilnius, Lithuania
Posts: 721

Adrenaline wrote ...

Why Cyclone version, the II has 18x18 bit multiplies?
Why does the multiply take so long?
I know with Xilinx you can gang the discrete multiplier blocks together for a larger bit multiply and still execute in one clock cycle.

I am using 7x 9bit_dsp_block, 111x LUT, and 209x Reg to do a single f32 multiplication. Maybe it is possible to do that in 1 cycle with your own VHDL function, but I am no good at VHDL.

I will use EP4CE6E22C7

Adrenaline

Tue Jul 30 2013, 07:57PM

Registered Member #235 Joined: Wed Feb 22 2006, 04:59PM
Location:
Posts: 80

Ah, crap, my fault I missed the floating point aspect.
Do you really need floating point?
Isn't the ADC just an integer value? You can have your LUTs as fixed point or integer.

Linas

Tue Jul 30 2013, 08:18PM

Registered Member #1143 Joined: Sun Nov 25 2007, 04:55PM
Location: Vilnius, Lithuania
Posts: 721

yes, i need f32

Adrenaline

Tue Jul 30 2013, 08:33PM

Registered Member #235 Joined: Wed Feb 22 2006, 04:59PM
Location:
Posts: 80

I really don't know what I am looking at.
Is the sign always positive?
Are you using the exponent bits on the ADC data?
I understood your lower 7 bits are always zero?
"ADC data goes to 7-22 bit"
You should be able to get away with a 15-bit multiply for the mantissa bits.
The ADC data is still odd for an FPGA, for me, what is your overall project?

Linas

Tue Jul 30 2013, 09:03PM

Registered Member #1143 Joined: Sun Nov 25 2007, 04:55PM
Location: Vilnius, Lithuania
Posts: 721

I have a detector running at 30MHz with a 16-bit ADC, and I need to make a PID loop.
The STM32F4 has good I/O but a weak FMAC; the ADSP-21488 has a powerful FMAC but weak I/O.
The only way to go is FPGA+MCU/DSP: while the FPGA does the FMAC, the MCU/DSP does the PID calculation, so no time is wasted. The only problem is that I will have a time shift in the PID, but the cycle speed must be constant and greater than 200KHz (30MHz @ 128 pixels).
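
To spell out where the 200KHz figure comes from, here is a small C sketch of the arithmetic, assuming one ADC sample per pixel and 128 pixels per frame as stated above. The function name `frame_rate_hz` is only illustrative.

```c
/* Frames per second when every frame consists of `pixels` samples
 * taken at `sample_rate_hz` (assumes one ADC sample per pixel). */
double frame_rate_hz(double sample_rate_hz, unsigned pixels)
{
    return sample_rate_hz / (double)pixels;
}
```

With 30MHz and 128 pixels this gives 234375 frames per second, so each MAC window plus PID update has roughly 4.27 microseconds.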

Steve Conner

Tue Jul 30 2013, 09:20PM

Registered Member #30 Joined: Fri Feb 03 2006, 10:52AM
Location: Glasgow, Scotland
Posts: 6706

Trust me, you don't want to use floating point math inside a FPGA. You will throw away the advantage of the FPGA by making a poor copy of a DSP floating point unit inside it.

Use fixed point with a large bit depth instead. The Xilinx Spartan 3A-DSP series have ready-made hardware MAC blocks inside. Or map the FPGA's block RAM into the memory space of a DSP, so you can have the FPGA do the high speed I/O and lay the data out ready for the DSP to do the math.
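
A minimal C sketch of what a fixed-point MAC could look like, under assumptions that are not from the thread: the ADC samples are unsigned 16-bit, the sine/cosine table is stored as signed Q15 (raw value divided by 32768), and the accumulator is 64 bits wide so 128 products cannot overflow. The names `mac_q15` and `q15_acc_to_double` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply-accumulate n unsigned 16-bit samples with signed Q15 coefficients.
 * Each product fits in 32 bits; the 64-bit accumulator leaves ample headroom
 * for a 128-sample window. The result is still scaled by 2^15. */
int64_t mac_q15(const uint16_t *adc, const int16_t *lut_q15, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int64_t)adc[i] * (int64_t)lut_q15[i];
    }
    return acc;
}

/* Remove the Q15 scaling, e.g. on the MCU/DSP side before the PID math. */
double q15_acc_to_double(int64_t acc)
{
    return (double)acc / 32768.0;
}
```

The integer sum is exact, so the only rounding happens in the table entries and in the single conversion at the end.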

Linas

Tue Jul 30 2013, 09:31PM

Registered Member #1143 Joined: Sun Nov 25 2007, 04:55PM
Location: Vilnius, Lithuania
Posts: 721

Steve Conner wrote ...

Trust me, you don't want to use floating point math inside a FPGA. You will throw away the advantage of the FPGA by making a poor copy of the floating point unit of a DSP inside it.

Use fixed point with a large bit depth instead. The Xilinx Spartan 3A-DSP series have ready-made hardware MAC blocks inside. Or map the FPGA into the memory space of a DSP, so you can have the FPGA do the high speed I/O and lay the data out ready for the DSP to do the math.

I know. I tested the ADSP-21488 with the full PID loop code; only the ADC input was replaced by a u16 counter, and flags were used for performance measurement. I get more than 1.2MHz loop speed.
So my next move on the DSP side is to use the 176-pin ADSP-21489, clock it at 500MHz, use the dedicated 16-bit data port (originally meant for SRAM or other memory), and use flags or even dedicated memory pins like clkin/out, cs and so on to speed up reading from the FIFO (or, if the speed is good enough, directly from the ADC). The FPGA would only do simple I/O, so the ADSP doesn't waste time on detector timing generation and so on.

But if that can't do 200KHz, I must use the FPGA.

Steve Conner

Tue Jul 30 2013, 09:54PM

Registered Member #30 Joined: Fri Feb 03 2006, 10:52AM
Location: Glasgow, Scotland
Posts: 6706

So you got the ADSP to boot?
