Registered Member #49
Joined: Thu Feb 09 2006, 04:05AM
Location: Bigass Pile of Penguins
Posts: 362
Neural network texts and websites often speak of overtraining and the importance of ensuring that the network stays generally applicable. I find that many sources call this "memorization"... that is, the network simply memorizes the training set rather than approximating the underlying function. I can't get my head around this, and I think the memorization statement is wrong or a simplification.
Consider a network being trained to approximate AND: the training set contains only 4 cases, so it's easy for me to imagine that even a simple network could produce satisfactory performance by simply remembering each case.
However, they also often speak of how simple networks cannot approximate XOR. And yet, this input set is the same size as AND, so if the network is simply "memorizing", there should be no such thing as an impossible function.
The reason I'm asking is: if networks truly memorize the set itself when overtrained, we should be able to estimate the storage capacity of a network based on its architecture (x neurons, y synapses, z layers). The advantage would be that you could size your training set so that it is larger than the network is capable of memorizing, reducing the net's ability to overtrain.
My guess is that saying a network "memorizes" a set when it's overtrained is not accurate, but is a simplification used to make texts more accessible to the casual reader. As I understand it, overtraining is simply a name for when a network identifies trends that are valid in your training set but not in the 'general' set, and that no 'memorization' takes place. Anyone agree/disagree?
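To make the distinction concrete, here is a minimal sketch of what I mean (the data, network sizes and scikit-learn usage are just my own made-up illustration, not from any of the texts): a deliberately oversized net fitted to a handful of noisy samples drives its training error toward zero while its error on fresh samples of the same underlying function stays large, whereas a small net does roughly equally well on both.

```python
# Minimal overfitting demo: a large MLP memorizes a few noisy samples of sin(x)
# but generalizes poorly to fresh samples. Illustrative only; exact numbers vary.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(-3, 3, size=(n, 1))
    y = np.sin(x).ravel() + 0.1 * rng.normal(size=n)   # underlying function + noise
    return x, y

x_train, y_train = sample(10)      # tiny training set
x_test, y_test = sample(200)       # "general" set drawn from the same function

big = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=20000, tol=1e-7,
                   random_state=0).fit(x_train, y_train)
small = MLPRegressor(hidden_layer_sizes=(3,), max_iter=20000, tol=1e-7,
                     random_state=0).fit(x_train, y_train)

for name, net in [("big", big), ("small", small)]:
    train_err = np.mean((net.predict(x_train) - y_train) ** 2)
    test_err = np.mean((net.predict(x_test) - y_test) ** 2)
    print(f"{name:5s}  train MSE {train_err:.4f}   test MSE {test_err:.4f}")
```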
Registered Member #27
Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
Consider a network being trained to approximate AND: the training set contains only 4 cases, so it's easy for me to imagine that even a simple network could produce satisfactory performance by simply remembering each case.
For cases that are too simple to be useful, overtraining may give results that are as good or better. The problem arises with real problems, where it is impossible to train the net on anything but a tiny subset of all possible inputs. If it is overtrained it will fail on inputs it has not been trained on, and it will reach false optima where it can't continue to improve because it does very well on the training set.
However, they also often speak of how simple networks cannot approximate XOR. And yet, this input set is the same size as AND, so if the network is simply "memorizing", there should be no such thing as an impossible function.
A neural network with one input layer and one output layer is mathematically incapable of XOR and countless other functions (if you try to make an XOR gate out of transistors you will realise the problem). You would need at least one hidden layer or feedback to do XOR. It can also be shown that a neural network with one hidden layer can do everything a neural network with N hidden layers can do.
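To make that concrete, here is a small hand-worked sketch (my own illustration, not from a textbook): with one hidden layer and hand-picked threshold weights a 2-2-1 net computes XOR exactly, while a coarse search over single-unit weights finds nothing that matches the XOR truth table.

```python
# XOR needs a hidden layer: a hand-wired 2-2-1 threshold network computes it,
# while a coarse search over single threshold-unit weights never succeeds.
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
XOR = np.array([0, 1, 1, 0])

step = lambda z: (z > 0).astype(int)

def two_layer_xor(x):
    # hidden unit 1 fires on OR, hidden unit 2 fires on AND; output = OR AND NOT AND
    h = step(x @ np.array([[1, 1], [1, 1]]).T + np.array([-0.5, -1.5]))
    return step(h @ np.array([1, -1]) - 0.5)

print("2-2-1 network:", two_layer_xor(X), "target:", XOR)

# Try a grid of weights/biases for a single threshold unit: none reproduces XOR.
grid = np.linspace(-2, 2, 17)
found = any(np.array_equal(step(X @ np.array([w1, w2]) + b), XOR)
            for w1, w2, b in itertools.product(grid, grid, grid))
print("single unit can represent XOR on this grid:", found)   # False
```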
You are right that there are no impossible functions with one or more hidden layers, but for a network with no hidden layers even "memorizing" is out of reach, in the same way as the XOR function.
The reason I'm asking is: if networks truly memorize the set itself when overtrained, we should be able to estimate the storage capacity of a network based on its architecture (x neurons, y synapses, z layers). The advantage would be that you could size your training set so that it is larger than the network is capable of memorizing, reducing the net's ability to overtrain.
That works well and is always a good idea. It would be even better to try to estimate the size needed by some more advanced method so you scale the network to contain the resulting function rather than a fraction of the training set.
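As a crude back-of-the-envelope version of that idea (my own sketch; nobody in the thread gives an actual formula), you can at least count the free parameters of a candidate architecture and make sure the training set is comfortably larger:

```python
# Crude capacity estimate for a fully connected feed-forward net:
# count weights + biases and compare against the number of training cases.
def parameter_count(layer_sizes):
    """layer_sizes e.g. [8, 15, 1] for 8 inputs, 15 hidden units, 1 output."""
    return sum(n_in * n_out + n_out                 # weights + biases per layer
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

for sizes in ([8, 4, 1], [8, 15, 1]):
    p = parameter_count(sizes)
    print(f"{sizes}: {p} parameters "
          f"-> want a training set comfortably larger than {p} cases")
```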
My guess is that saying a network "memorizes" a set when it's overtrained is not accurate, but is a simplification used to make texts more accessible to the casual reader. As I understand it, overtraining is simply a name for when a network identifies trends that are valid in your training set but not in the 'general' set, and that no 'memorization' takes place. Anyone agree/disagree?
The memory effect is real, but whether it is the only, or even the most common, problem meant when overtraining is mentioned, I don't know. If you train a neural network with just a few data points it is easy to see that the network simply memorizes the data. Even if it gives the correct result it is not what we wanted: we wanted it to model the simplest function that fits the data, not the data itself.
Registered Member #49
Joined: Thu Feb 09 2006, 04:05AM
Location: Bigass Pile of Penguins
Posts: 362
Bjørn Bæverfjord wrote ...
The reason I'm asking is: if networks truly memorize the set itself when overtrained, we should be able to estimate the storage capacity of a network based on its architecture (x neurons, y synapses, z layers). The advantage would be that you could size your training set so that it is larger than the network is capable of memorizing, reducing the net's ability to overtrain.
That works well and is always a good idea. It would be even better to try to estimate the size needed by some more advanced method so you scale the network to contain the resulting function rather than a fraction of the training set.
Do such analyses have a name? I've been frustrated thus far in my search.
I'm imagining cases where the function behind the data is unknown, or possibly even nonexistent (stock prediction, horse race gambling, rainfall forecasting, etc.). In such a case one cannot, even with advanced methods like the ones you mentioned, size the net to a function that one doesn't know.
Thus it seems that one would want to size the net to the available data: large enough to hopefully model the function but small enough to be incapable of memorizing the entire training set. I haven't the foggiest idea of the form such analyses would take.
Registered Member #65
Joined: Thu Feb 09 2006, 06:43AM
Location:
Posts: 1155
Weighted variables do have limits. I have used a GUI application that tracks the relative weights in an easy-to-read overlapping graph that can be tuned/edited with a mouse click.
IIRC there was a site about OCR that uses the NN technique and also compares it with PID control in various situations. I will post the URL if I recall its location...
Registered Member #27
Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
For all functions that can be evaluated on a digital computer there exists at least one set of weights that will make a digitally simulated neural network with one hidden layer compute that function.
Do such analyses have a name? I've been frustrated thus far in my search.
I don't know of any name for it. The simple but fairly efficient method I have used is to split the training data into two sets with identical properties, then train on one set and test on the other. After trying a few different sizes it usually becomes clear which size is most promising.
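A minimal sketch of that procedure (my own illustration; scikit-learn's MLPClassifier and the random stand-in data are assumptions, not what anyone in the thread actually used):

```python
# Split the data in two, train candidate sizes on one half, score on the other.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 8)).astype(float)     # stand-in for the 600 cases
y = X[:, 0].astype(int) ^ X[:, 3].astype(int)           # stand-in target

X_train, X_sel, y_train, y_sel = train_test_split(X, y, test_size=0.5,
                                                  stratify=y, random_state=0)

for hidden in (2, 4, 8, 15):
    net = MLPClassifier(hidden_layer_sizes=(hidden,), activation="logistic",
                        max_iter=5000, random_state=0).fit(X_train, y_train)
    print(f"hidden={hidden:2d}  train acc={net.score(X_train, y_train):.2f}  "
          f"selection acc={net.score(X_sel, y_sel):.2f}")
```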
Registered Member #32
Joined: Sat Feb 04 2006, 08:58AM
Location: Australia
Posts: 549
wrote ...
The simple but fairly efficient method I have used is to split the training data into two sets with identical properties, then train on one set and test on the other.
This technique is becoming more popular for this problem, which really affects all model fitting. It's nice because it lends itself to automation: generate a batch of models that fit one subset of your data, pick the best n, test those on the other set, and pick the best one.
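Something like this, perhaps (hypothetical scikit-learn sketch with made-up stand-in data, not anyone's actual pipeline):

```python
# Automated version of the split-and-test idea: fit a batch of candidate models on
# one half of the data, keep the best few by training score, then pick the final
# model by its score on the held-out half.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.random((600, 8))
y = (X.sum(axis=1) > 4).astype(int)            # stand-in data and target

X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.5, random_state=1)

# generate a batch of candidate models with different capacities
candidates = [MLPClassifier(hidden_layer_sizes=(h,), max_iter=3000, random_state=1)
              for h in (2, 3, 4, 6, 8, 12, 15, 20)]
fitted = [(m.fit(X_a, y_a), m.score(X_a, y_a)) for m in candidates]

# keep the best n by score on the first subset...
best_n = sorted(fitted, key=lambda t: t[1], reverse=True)[:3]
# ...then choose the winner by score on the other subset
winner = max(best_n, key=lambda t: t[0].score(X_b, y_b))[0]
print("chosen hidden layer:", winner.hidden_layer_sizes,
      "held-out accuracy:", round(winner.score(X_b, y_b), 2))
```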
Registered Member #49
Joined: Thu Feb 09 2006, 04:05AM
Location: Bigass Pile of Penguins
Posts: 362
Well I'm actually doing just that. I have about 600 cases in my data pool. I take half and train on them, and after each training point I run the network on the selection set. I print out the residual error from the training point and the selection point and I can monitor how the network is doing.
My problem is that I just can't make any headway. I have 8 input neurons and one output neuron. I started by formulating my data in binary, i.e. all inputs and outputs were 0 or 1. This only gave me 512 unique datapoints, however, and when training on only 300 cases, I'm sure most of them fell on the few most common points. I figured this lent itself to overtraining.
So I changed to sigmoid activation functions and sigmoid inputs (but my output data was still binary). Now the problem is that if I keep the hidden neurons high (like 15), the training error will drop very low, but the selection error never goes down. If I drop the hidden neuron count lower, like to 4, then the training error drops to 30% (i.e. 30% of the cases output an incorrect result, assuming I'm generous and treat >.5 as a 1 and <.5 as a 0), but the selection error STILL doesn't budge. I'm at my wits' end; I guess my data truly is random.
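For reference, here is roughly what that setup looks like as a hand-rolled sketch (stand-in data and my own numpy code, not the actual network or 600-case pool): 8 sigmoid inputs, one hidden layer of 4 or 15 units, one sigmoid output, with training and selection error printed as training proceeds.

```python
# Hand-rolled sketch of the setup described above: 8 inputs, one sigmoid hidden
# layer, one sigmoid output, train/selection error printed during training.
# Hypothetical stand-in data; the real 600-case pool isn't available here.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((600, 8))                       # stand-in for the real input data
y = (X[:, 0] + X[:, 3] > 1.0).astype(float)    # stand-in binary target

X_tr, y_tr = X[:300], y[:300]                  # training half
X_sel, y_sel = X[300:], y[300:]                # selection half

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train(hidden, epochs=2000, lr=0.5):
    W1 = rng.normal(0, 0.5, (8, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for epoch in range(epochs):
        # forward pass on the training half
        h = sigmoid(X_tr @ W1 + b1)
        out = sigmoid(h @ W2 + b2).ravel()
        # backprop of mean squared error
        d_out = (out - y_tr) * out * (1 - out)              # (300,)
        d_h = (d_out[:, None] @ W2.T) * h * (1 - h)         # (300, hidden)
        W2 -= lr * h.T @ d_out[:, None] / len(y_tr); b2 -= lr * d_out.mean()
        W1 -= lr * X_tr.T @ d_h / len(y_tr);         b1 -= lr * d_h.mean(axis=0)
        if epoch % 500 == 0:
            sel = sigmoid(sigmoid(X_sel @ W1 + b1) @ W2 + b2).ravel()
            tr_err = np.mean((out > 0.5) != y_tr)            # >.5 counted as a 1
            sel_err = np.mean((sel > 0.5) != y_sel)
            print(f"hidden={hidden:2d} epoch={epoch:4d} "
                  f"train err={tr_err:.2f} selection err={sel_err:.2f}")

for hidden in (4, 15):
    train(hidden)
```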
Registered Member #27
Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
The only other thing I can think of is that the input representation might not expose the information in a way that the training algorithm can exploit. You could try Fourier, wavelet or some other transformation of the data before you present it to the network. You could even try to feed it several different transforms at the same time.
Neural networks are very sensitive to the way the information is presented to them, but with only 8 bits of data it is hard to imagine it should make a big difference.
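A hedged sketch of what feeding several representations at once might look like (my own numpy illustration, treating the 8 inputs as a short signal; a wavelet transform would need an extra library such as PyWavelets):

```python
# Present the same 8 inputs in several representations at once: the raw values
# plus the magnitudes of their discrete Fourier transform. Illustrative only.
import numpy as np

def augment_with_fft(X):
    """X: (n_samples, 8). Returns (n_samples, 8 + 5) with FFT magnitudes appended."""
    spectra = np.abs(np.fft.rfft(X, axis=1))     # rfft of 8 points -> 5 bins
    return np.hstack([X, spectra])

X = np.random.default_rng(0).integers(0, 2, size=(4, 8)).astype(float)
print(augment_with_fft(X).shape)   # (4, 13)
```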