Neural Network Overtraining

AndrewM
Tue Jan 16 2007, 03:27PM
AndrewM Registered Member #49 Joined: Thu Feb 09 2006, 04:05AM
Location: Bigass Pile of Penguins
Posts: 362
Neural network texts and websites often speak of overtraining, and the importance of ensuring that the network stays generally applicable. I find that many sources call this "memorization"... that is, the network simply memorizes the training set rather than approximating the underlying function. I can't get my head around this, and I think the memorization statement is wrong, or at least a simplification.

Consider a network being trained to approximate AND: the training set contains only 4 cases, so it's easy for me to imagine that even a simple network could produce satisfactory performance by simply remembering each case.

However, they also often speak of how simple networks cannot approximate XOR. And yet, this input set is the same size as AND, so if the network is simply "memorizing", there should be no such thing as an impossible function.

The reason I'm asking is: if networks truly memorize the set itself when overtrained, we should be able to estimate the storage capacity of a network based on its architecture (x neurons, y synapses, z layers). The advantage is that you could size your training set so that it is larger than the network is capable of memorizing, reducing the net's ability to overtrain.

My guess is that saying a network "memorizes" a set when it's overtrained is not accurate, but is a simplification used to make texts more accessible to the casual reader. As I understand it, overtraining is simply a name for when a network identifies trends that are valid in your training set but not in the 'general' set, and no 'memorization' takes place. Anyone agree/disagree?
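
A minimal sketch (not from the original thread) of the capacity estimate suggested above: count the free parameters of a fully connected feed-forward net and compare them with the size of the training set. The layer sizes and case count here are hypothetical, chosen only for illustration.

    # Rough free-parameter count for a fully connected feed-forward network.
    # If the training set carries far more numbers than the network has
    # weights, outright memorization becomes implausible.

    def parameter_count(layer_sizes):
        """Weights plus biases in a fully connected net."""
        total = 0
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            total += n_in * n_out + n_out
        return total

    layers = [8, 15, 1]   # hypothetical: 8 inputs, 15 hidden units, 1 output
    n_params = parameter_count(layers)
    n_cases = 600         # hypothetical training-pool size

    print("free parameters:", n_params)   # 151 for the sizes above
    print("training cases: ", n_cases)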
Bjørn
Tue Jan 16 2007, 08:07PM
Bjørn Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
AndrewM wrote ...

Consider a network being trained to approximate AND: the training set contains only 4 cases, so it's easy for me to imagine that even a simple network could produce satisfactory performance by simply remembering each case.
For cases that are too simple to be useful, overtraining may give results as good or better. The problem arises on real problems, where it is impossible to train the net on anything but a tiny subset of all possible inputs. If it is overtrained, it will fail on inputs it has not been trained on, and it will reach false optima where it can't continue to improve because it does very well on the training set.

AndrewM wrote ...

However, they also often speak of how simple networks cannot approximate XOR. And yet, this input set is the same size as AND, so if the network is simply "memorizing", there should be no such thing as an impossible function.
A neural network with one input layer and one output layer is mathematically incapable of XOR and countless other functions (if you try to make an XOR gate out of transistors you will realise the problem). You would need at least one hidden layer or feedback to do XOR. It can also be shown that a neural network with one hidden layer can do everything a neural network with N hidden layers can do.

You are right that there are no impossible functions for a network with one or more hidden layers, but for a network with no hidden layers even "memorizing" is out of reach, in the same way as the XOR function.
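
A minimal numpy sketch of this point, not taken from the thread: plain gradient descent on the XOR truth table with no hidden layer (which is just logistic regression) versus one small hidden layer. The hidden-layer width, learning rate and epoch counts are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR truth table

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_no_hidden(epochs=5000, lr=1.0):
        # Inputs wired straight to the output unit: this is logistic regression,
        # which cannot draw the decision boundary XOR needs.
        w, b = rng.normal(size=(2, 1)), np.zeros(1)
        for _ in range(epochs):
            p = sigmoid(X @ w + b)
            w -= lr * X.T @ (p - y) / len(X)   # cross-entropy gradient
            b -= lr * np.mean(p - y)
        return sigmoid(X @ w + b).ravel()

    def train_one_hidden(hidden=4, epochs=20000, lr=1.0):
        # One hidden layer of sigmoid units, trained by plain backpropagation.
        W1, b1 = rng.normal(size=(2, hidden)), np.zeros(hidden)
        W2, b2 = rng.normal(size=(hidden, 1)), np.zeros(1)
        for _ in range(epochs):
            h = sigmoid(X @ W1 + b1)
            p = sigmoid(h @ W2 + b2)
            d2 = (p - y) / len(X)              # output delta (sigmoid + cross-entropy)
            d1 = (d2 @ W2.T) * h * (1 - h)     # hidden delta
            W2 -= lr * h.T @ d2;  b2 -= lr * d2.sum(axis=0)
            W1 -= lr * X.T @ d1;  b1 -= lr * d1.sum(axis=0)
        return p.ravel()

    print("no hidden layer: ", train_no_hidden().round(2))   # stuck near 0.5 for all rows
    print("one hidden layer:", train_one_hidden().round(2))  # typically close to 0 1 1 0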

AndrewM wrote ...

The reason I'm asking is: if networks truly memorize the set itself when overtrained, we should be able to estimate the storage capacity of a network based on its architecture (x neurons, y synapses, z layers). The advantage is that you could size your training set so that it is larger than the network is capable of memorizing, reducing the net's ability to overtrain.
That works well and is always a good idea. It would be even better to estimate the size needed by some more advanced method, so that you scale the network to contain the resulting function rather than a fraction of the training set.

AndrewM wrote ...

My guess is that saying a network "memorizes" a set when it's overtrained is not accurate, but is a simplification used to make texts more accessible to the casual reader. As I understand it, overtraining is simply a name for when a network identifies trends that are valid in your training set but not in the 'general' set, and no 'memorization' takes place. Anyone agree/disagree?
The memory effect is real, but whether it is the only, or even the most common, problem meant when overtraining is mentioned, I don't know. If you train a neural network on just a few data points, it is easy to see that the network simply memorizes the data. Even if it gives the correct results, it is not what we wanted: we wanted it to model the simplest function that fits the data, not the data itself.
AndrewM
Tue Jan 16 2007, 09:14PM
AndrewM Registered Member #49 Joined: Thu Feb 09 2006, 04:05AM
Location: Bigass Pile of Penguins
Posts: 362
Bjørn Bæverfjord wrote ...

The reason I'm asking is: if networks truly memorize the set itself when overtrained, we should be able to estimate the storage capacity of a network based on its architecture (x neurons, y synapses, z layers). The advantage is that you could size your training set so that it is larger than the network is capable of memorizing, reducing the net's ability to overtrain.
That works well and is always a good idea. It would be even better to estimate the size needed by some more advanced method, so that you scale the network to contain the resulting function rather than a fraction of the training set.

Do such analyses have a name? I've been frustrated thus far in my search.

I'm imagining cases where the function behind the data is unknown, or possibly even nonexistent (stock prediction, horse race gambling, rainfall forecasting, etc.). In such a case one cannot, even with advanced methods like the ones you mentioned, size the net to a function that one doesn't know.

Thus it seems that one would want to size the net to the available data: large enough to hopefully model the function, but small enough to be incapable of memorizing the entire training set. I haven't the foggiest idea of the form such analyses would take.
Carbon_Rod
Tue Jan 16 2007, 10:42PM
Carbon_Rod Registered Member #65 Joined: Thu Feb 09 2006, 06:43AM
Location:
Posts: 1155
Weighted variables do have limits. I have used a GUI application that tracks the relative weights in an easy-to-read overlapping graph that can be tuned/edited with a mouse click.

IIRC there was a site about OCR that uses the NN technique and also draws comparisons with PID control situations. I will post the URL if I recall its location...

Cheers,
Bjørn
Wed Jan 17 2007, 02:19AM
Bjørn Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
For all functions that can be evaluated on a digital computer there exists at least one set of weights that will make a digitally simulated neural network with one hidden layer compute that function.

AndrewM wrote ...

Do such analyses have a name? I've been frustrated thus far in my search.
I don't know any name for it. The simple but fairly efficient method I have used is to split the training data into two sets with identical properties, then train on one set and test on the other. After trying a few different sizes it usually becomes clear which size is most promising.
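
A sketch of that split-and-compare method, not from the thread itself: it uses scikit-learn's MLPClassifier purely as a convenient stand-in trainer and random stand-in data, since the real dataset isn't available; the candidate hidden sizes are arbitrary.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Split the data into two halves with similar class proportions, train on
    # one half, test on the other, and repeat for a few candidate network sizes.
    rng = np.random.default_rng(1)
    X = rng.integers(0, 2, size=(600, 8)).astype(float)   # 600 cases, 8 binary inputs (stand-in)
    y = (X.sum(axis=1) > 4).astype(int)                   # arbitrary stand-in target rule

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0)  # two halves with "identical properties"

    for hidden in (2, 4, 8, 15):
        net = MLPClassifier(hidden_layer_sizes=(hidden,), activation="logistic",
                            max_iter=2000, random_state=0)
        net.fit(X_train, y_train)
        print(f"hidden={hidden:2d}  train acc={net.score(X_train, y_train):.2f}"
              f"  test acc={net.score(X_test, y_test):.2f}")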
Simon
Sat Jan 20 2007, 01:43AM
Simon Registered Member #32 Joined: Sat Feb 04 2006, 08:58AM
Location: Australia
Posts: 549
Bjørn wrote ...

The simple but fairly efficient method I have used is to split the training data into two sets with identical properties, then train on one set and test on the other.
This technique is becoming more popular for this problem, which is really a problem that affects all model fitting. It's nice because it lends itself to automation: generate a batch of models that fit one subset of your data, pick the best n, test these on the other set, and pick the best one.
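
Continuing the previous sketch (with the same caveats about stand-in data and arbitrary candidate sizes), the automated selection described here might look something like this:

    # Fit a batch of candidate nets on one half of the data, keep the best n by
    # training score, then choose the winner by its score on the held-out half.
    candidates = []
    for hidden in (2, 3, 4, 6, 8, 12, 15):
        for seed in (0, 1, 2):
            net = MLPClassifier(hidden_layer_sizes=(hidden,), activation="logistic",
                                max_iter=2000, random_state=seed)
            net.fit(X_train, y_train)
            candidates.append((net.score(X_train, y_train), net))

    best_n = sorted(candidates, key=lambda c: c[0], reverse=True)[:5]   # best n on the training half
    winner = max(best_n, key=lambda c: c[1].score(X_test, y_test))[1]   # best of those on the held-out half
    print("chosen hidden size:", winner.hidden_layer_sizes,
          " held-out accuracy:", round(winner.score(X_test, y_test), 2))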
AndrewM
Sun Jan 21 2007, 01:12AM
AndrewM Registered Member #49 Joined: Thu Feb 09 2006, 04:05AM
Location: Bigass Pile of Penguins
Posts: 362
Well, I'm actually doing just that. I have about 600 cases in my data pool. I take half and train on them, and after each training point I run the network on the selection set. I print out the residual error for the training set and the selection set, so I can monitor how the network is doing.

My problem is that I just can't make any headway. I have 8 input neurons and one output neuron. I started by formulating my data in binary, i.e. all inputs and outputs were 0 or 1. This only gave me 512 unique datapoints, however, and when training on only 300 cases, I'm sure most of them fell on the few most common points. I figured this lent itself to overtraining.

So I changed to sigmoid activation functions and sigmoid inputs (but my output data was still binary). Now the problem is that if I keep the number of hidden neurons high (like 15), the training error will drop very low, but the selection error never goes down. If I drop the hidden neurons lower, like to 4, then the error drops to 30% (i.e. 30% of the cases output an incorrect result, assuming that I'm generous and treat >.5 as a 1 and <.5 as a 0), but the selection error STILL doesn't budge. I'm at my wits' end; I guess my data truly is random.
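
A sketch of that monitoring loop, again with stand-in data and scikit-learn's MLPClassifier rather than whatever the actual setup here was; the 8 inputs, 15 hidden units and 50/50 split mirror the numbers in the post, everything else is arbitrary.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Train incrementally and print the error on the training half and on the
    # held-out "selection" half as training proceeds. Training error falling
    # while selection error stalls is the usual sign of overtraining.
    rng = np.random.default_rng(2)
    X = rng.random(size=(600, 8))                          # stand-in sigmoid-style inputs in [0, 1]
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)              # arbitrary stand-in binary target

    X_tr, X_sel, y_tr, y_sel = train_test_split(X, y, test_size=0.5, random_state=0)

    net = MLPClassifier(hidden_layer_sizes=(15,), activation="logistic",
                        learning_rate_init=0.1, random_state=0)
    for epoch in range(1, 201):
        net.partial_fit(X_tr, y_tr, classes=[0, 1])        # one pass over the training half
        if epoch % 40 == 0:
            print(f"epoch {epoch:3d}"
                  f"  train error {1 - net.score(X_tr, y_tr):.2f}"
                  f"  selection error {1 - net.score(X_sel, y_sel):.2f}")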
Bjørn
Sun Jan 21 2007, 05:18AM
Bjørn Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
The only other thing I can think of is that the input representation might not expose the information in a way that the training algorithm can exploit. You could try a Fourier, wavelet, or some other transformation of the data before you present it to the network. You could even try feeding it several different transforms at the same time.

Neural networks are very sensitive to the way the information is presented to them, but with only 8 bits of data it is hard to imagine it should make a big difference.
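
As a rough illustration of that idea (not from the thread), one could append a Fourier transform of each 8-value input vector to the raw values before training; whether any such re-representation helps depends entirely on the data.

    import numpy as np

    def with_fourier_features(X):
        # Append the real and imaginary parts of each row's FFT to the raw values.
        spectrum = np.fft.rfft(X, axis=1)          # 5 complex coefficients per 8-value row
        return np.hstack([X, spectrum.real, spectrum.imag])

    X = np.random.default_rng(3).random((600, 8))  # stand-in for the real data
    X_aug = with_fourier_features(X)
    print(X.shape, "->", X_aug.shape)              # (600, 8) -> (600, 18)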
