Registered Member #1143
Joined: Sun Nov 25 2007, 04:55PM
Location: Vilnius, Lithuania
Posts: 721
Wastrel wrote ...
To what extent did you optimise the code on the ARM?
You mean the STM32F4 running the PID loop? If so, I was using C functions that map closely to asm instructions, and dap tricks to get high-speed calculations (like an atan2 approximation). Inline asm could boost the speed, but not by much. The compiler did a good job optimising the code.
Yes, STM32F4. A doubling of speed from C to asm is usually pretty easy. I don't know what you mean by DAP tricks. Mostly I'm thinking clock counting, assembler level math optimising, copying code from flash to SRAM. But then a 256 point FFT sounds complicated for a PID loop.
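For readers wondering what an "atan2 approximation" can look like, here is a minimal sketch. The thread does not say which approximation was actually used, so this is an assumption: it takes the common low-order polynomial atan(z) ≈ z*(pi/4 + 0.273*(1 - z)) for 0 <= z <= 1 (error of a few milliradians) and folds the other octants onto that range. The name fast_atan2f is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper, not the poster's code: approximate atan2(y, x)
 * in radians with an error of roughly 0.004 rad. */
float fast_atan2f(float y, float x)
{
    const float pi      = 3.14159265f;
    const float half_pi = 1.57079633f;
    const float qtr_pi  = 0.78539816f;
    float ax = fabsf(x);
    float ay = fabsf(y);
    float a;

    assert(isfinite(x) && isfinite(y));

    if (ax == 0.0f && ay == 0.0f)
        return 0.0f;                    /* undefined input, pick 0 */

    if (ay <= ax) {
        float z = ay / ax;              /* 0..1, first octant */
        a = z * (qtr_pi + 0.273f * (1.0f - z));
    } else {
        float z = ax / ay;              /* 0..1, second octant */
        a = half_pi - z * (qtr_pi + 0.273f * (1.0f - z));
    }

    if (x < 0.0f)
        a = pi - a;                     /* mirror into left half-plane */
    if (y < 0.0f)
        a = -a;                         /* mirror into lower half-plane */
    return a;
}
```

Whether the few-milliradian error is acceptable depends on the loop; a higher-order polynomial or a small lookup table trades cycles for accuracy.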
Registered Member #1143
Joined: Sun Nov 25 2007, 04:55PM
Location: Vilnius, Lithuania
Posts: 721
Wastrel wrote ...
I don't know what you mean by DAP tricks. Mostly I'm thinking clock counting, assembler level math optimising, copying code from flash to SRAM. But then a 256 point FFT sounds complicated for a PID loop.
DSP, spell correction kicked in :( An FFT is good for the whole spectrum, but because I calculate the magnitude of only one frequency, a DFT is very fast, since I keep the coefficients in CCM memory. It's a bit hard to write asm for the Cortex-M4; an old-school DSP was very easy to program in asm, with a small instruction set (20 pages), not like ARM (>1000).
I see what you mean by DFT; usually it's a synonym for FFT. You are doing a classic F(x)*sin(nx) and F(x)*cos(nx) with summations, with the sin and cos coefficients in tightly coupled memory. The M4 is Thumb based and that's a pain; also I can't find the latency information for the M4 that they provide so readily for the Cortex-A parts. It's possible a move to fixed point would help; even without that, my suspicion is we could speed up that routine hugely. If you are happy to let me look at the code, I'm happy to spend some time.
Registered Member #1143
Joined: Sun Nov 25 2007, 04:55PM
Location: Vilnius, Lithuania
Posts: 721
Wastrel wrote ...
I see what you mean by DFT; usually it's a synonym for FFT. You are doing a classic F(x)*sin(nx) and F(x)*cos(nx) with summations, with the sin and cos coefficients in tightly coupled memory. The M4 is Thumb based and that's a pain; also I can't find the latency information for the M4 that they provide so readily for the Cortex-A parts. It's possible a move to fixed point would help; even without that, my suspicion is we could speed up that routine hugely. If you are happy to let me look at the code, I'm happy to spend some time.
This is my main loop. All sync is done by the FPGA; I only need to toggle the start signal and the FPGA does the rest. When I toggle GPIO_13 I check whether the FIFO is empty; if not, I read the data and do a floating-point MAC operation. When Re and Im are calculated, I find the angle and generate feedback via FSMC to a 16-bit DAC.
Is while(i<128) right or would that normally be 256?
From the look of the routine, we can certainly speed up the math but my main suspicion would be the GPIO could be taking a lot of time. Have you tested how fast you can toggle a GPIO line with nothing else in the loop?
An example from a slightly earlier chip,
Some thoughts: currently you are doing a minimum of one dummy read every cycle with while(GPIOC->IDR < 32766); and it may be faster to cache it. If your FPGA can trigger on both edges, that would save another access.
There is a lot of IO in the routine, so I can't meaningfully compile and test here yet. The muls are very fast, so we just need to streamline the assembler and maybe unroll the loop by one. If possible please upload a working binary, I can match the disassembly to the source to work out what the chip is really wasting time on, I can patch that and then we can inline the changes.
I'm quite impressed with this chip, I'm going to order one of the discovery boards.
Registered Member #1143
Joined: Sun Nov 25 2007, 04:55PM
Location: Vilnius, Lithuania
Posts: 721
GPIO toggle using this method is as fast as you can get. It is precisely half the core speed; I tested it (125 MHz).
Without math I can read the detector at 62.49 kHz (62.5 kHz is theoretical); with math and feedback enabled I can do 58 kHz. But I don't know whether it will keep up with the new detector. Here is more about my problems
The core in that thread has 26 instructions. It takes about 36 clock cycles, based on a 250 MHz chip achieving 54 kHz. 10 instructions, mostly loads and stores, can be removed easily by keeping the variables in registers during the loop. 1 of those is conditional on the top 16 bits of an IDR read being 0 and it not needing a sign extend. Interleaving the sin and cos table would save 1 instruction (1 clock cycle) and free up an extra register.
    while(i<128) {
        sctable[2*i]=(float32_t)cosf((6.28318531f*i*POINT)/N);
        sctable[2*i+1]=(float32_t)sinf((6.28318531f*i*POINT)/N);
        i++;
    }
Where sctable needs to be 256 single floats in size.
FPGA acting on both edges would save 1 instruction (1 or 2 clock cycles). Moving to fixed point has the potential to save 4 instructions, but we might be short of registers; we'd need an extra four. If the frequency you are filtering is quite low, so the sin and cos tables change slowly, using the coefficients twice would save 1 instruction (a load, 2 clocks?) per cycle on average and may not affect precision too much.
If possible, please update your current core with the interleaved table and upload the binary; I will patch that, and when we know it works we can inline. From that point changes will be easier. I'm optimistic we can double the speed of this while keeping floats. What are the details of the new sensor: are the frames faster, or do they have more pixels?
Edited 13/07/2013 to correct a mistake in the code.
Registered Member #1143
Joined: Sun Nov 25 2007, 04:55PM
Location: Vilnius, Lithuania
Posts: 721
Small update: I made a new programmer based on the STM32F415 (I had a lot of problems with the STM32F103).
With help from LabVIEW, it can program flash in no time (less than 1 s for a small program).
Problem: I don't see any activity from the ADSP-21488. After power-up I should see some life on CS or another SPI flash bus line, but I don't, and sure enough, no toggling is seen on the flag pins.
Can anyone point me to the ADSP-21488 SPI flash boot pins? I was unable to find any data about them, so I used the same pins as on the larger device (from the LQFP-176). For boot I have the lowest clk config (00), and I am also booting as (00).
Or should I boot as SPI master? (Damn, I would have to cut some tracks then.)
OK, I set it to SPI master, and now I have some activity on the SPI bus: clk, data, CS.