4hv.org :: Forums :: General Science and Electronics

probability distribution of chopped sage

Dr. Slack
Mon Nov 30 2015, 05:14PM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
I was making a stuffing for Sunday lunch, rocking my nice sharp knife back and forth over a few sage leaves in the fancy pro-chef way that my wife doesn't seem able to master. While there were still only a few large bits, I chased those individually. That rapidly became impractical as the bits got smaller, so I just heaped and chopped, working slowly through most of the pile before turning and re-heaping. Obviously, the pieces ended up with a range of sizes.

I started to wonder about the probability distribution of the sizes. My first thoughts were that it should be stable as chopping progressed, perhaps heading to an asymptotic distribution from any starting distribution. The chance of a piece being cut seemed to be somehow proportional to its size, so the big bits would shrink preferentially to small bits.

I realised that the dimensionality of the problem might be significant. The leaf fragments were 3D and the moving knife defined a 2D cut plane; would it generalise? My wife, peeling the potatoes, noticed that I looked deep in thought, and asked me what I was thinking about. I should have said something along the lines of how beautiful she looked; instead I said I was wondering about the probability distribution that results from N-dimensional hypervolumes being randomly cut by (N-1)-dimensional hyperplanes. Damn, another opportunity missed to appear more human than Vulcan.

Anyhow, in one dimension, consider sticks of spaghetti, lined up in a bundle. It would be very easy to simulate that. After the initial deterministic slicing of the leaves, most pieces are very long and thin, approximating well to one dimension, and while they lie flat, they tend not to line up well. However, cuts at that stage do tend to produce two long thin pieces, so staying 'one dimensional'. A random position, the cut point, can be computed simply along the length.

In two dimensions, consider a stack of pieces of paper. The choice of random cut line needs the sheet to be rotated to a random angle, before a position is chosen across a diameter. Each resulting polygon would be strictly convex, and could be defined by a set of vertices. How wrong would it be to approximate each fragment to a disc of the same area, with a corresponding simplification of keeping track of the pieces, and elimination of the rotation?
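(To make that 2D bookkeeping concrete, here is a minimal sketch of one way it might be set up, keeping each fragment as a convex-polygon vertex list; the names clip_halfplane and random_cut are illustrative, not code from this thread. A random cut direction is chosen, the polygon's projection onto that direction gives its "diameter", a random offset is picked across it, and clipping against the two half-planes yields the two children.)

import numpy as np

def clip_halfplane(verts, n, c):
    # Keep the part of a convex polygon (vertex list) satisfying dot(n, v) <= c.
    out = []
    m = len(verts)
    for i in range(m):
        a, b = verts[i], verts[(i + 1) % m]
        da, db = np.dot(n, a) - c, np.dot(n, b) - c
        if da <= 0:
            out.append(a)
        if da * db < 0:                                  # this edge crosses the cut line
            t = da / (da - db)
            out.append(a + t * (b - a))
    return out

def random_cut(verts):
    # Cut a convex polygon by a random line through it; return the two halves.
    verts = [np.asarray(v, float) for v in verts]
    theta = np.random.uniform(0, np.pi)                  # random cut direction
    n = np.array([np.cos(theta), np.sin(theta)])         # unit normal to the cut line
    proj = [np.dot(n, v) for v in verts]
    c = np.random.uniform(min(proj), max(proj))          # random position across the projection
    return clip_halfplane(verts, n, c), clip_halfplane(verts, -n, -c)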

In three dimensions, the particle would need to be randomly rotated around two axes, before the position of the cut plane is chosen across the diameter. I'm not sure that a real knife would cut real sage leaf fragments as randomly as that, they might be deflected rather than suffer a thin cut to the edge, but it's a nice idealisation to consider. Again, a spherical approximation might make things easier, at least at first.

In four dimensions, my head explodes, but I'm sure that just writing the sums down in k dimensions, then setting k to 4 or whatever, will have it all come out in the wash, as I'll be able to check whether it works for k = 1, 2 or 3.

Given the range of piece sizes, there will clearly be a difference between the distribution of the number of fragments of a given size and the distribution of the mass held in fragments of that size, which will tend to skew the distribution one way or the other.

Anyhow, before I actually do a simulation, does anybody have any idea what the distribution will be? Is there a way to do this analytically? I presume it will be asymptotic in any dimension. Will it be a known one, like Rayleigh, Boltzmann or Zipf? Will it differ significantly between numbers of dimensions, and as the number of dimensions increases, will that too be asymptotic?

If you make any guesses in this thread as to what you think the distribution will be, you are on scout's honour not to have done the simulation and found out the real answer beforehand. I'll probably have a crack at a simulation over the holidays, while the rest of the family sleep off the turkey.




Sulaiman
Mon Nov 30 2015, 06:53PM
Sulaiman Registered Member #162 Joined: Mon Feb 13 2006, 10:25AM
Location: United Kingdom
Posts: 3141
Nice problem, it made me (attempt to) think for a while.
I assume that the average particle size will asymptotically approach zero.
For no reason other than it being one of the most common distributions, I guess a Gaussian distribution of particle size at any one time after several iterations (cuts) have passed.

Assuming a one-dimensional blade randomly chopping a pile of leaves that are re-piled after each cut, I can't even think how I'd simulate it!

Next time just tell your wife how gorgeous she / her shoes/dress/handbag/hair is and don't torture us ;)
Electra
Tue Dec 01 2015, 12:25AM
Electra Registered Member #816 Joined: Sun Jun 03 2007, 07:29PM
Location:
Posts: 156
For simplicity, could you not assume the leaves are just a two-dimensional array of two-dimensional objects, and the heap is just a stack of a random number of two-dimensional pieces (so one knife stroke makes multiple cuts)? Consider the smallest size a piece could be, the equivalent of perhaps two blade widths. Are you factoring in any free space between fragments (you could have made a cut that didn't cut much at all)?

If you're looking at the distribution over time, or number of cuts vs. size, I would guess at a Poisson distribution.
(edit)
Realized that doesn't make sense. Yes, it should be the probability vs. size after a predetermined time or set number of cuts. I'm keeping the Poisson distribution as it's my favorite distribution.

Are you going to write something in 'C', or do you have an easier way of simulating it?
Dr. Slack
Tue Dec 01 2015, 07:44AM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
After a little bit of thought, I realised that the 1D case was very easy to simulate.

Although cutting through a random heap of stuff produces many cuts, without loss of generality it can be done one cut at a time. In the nD case, the chance of hitting a particular fragment depends on the diameter of that fragment as seen by the knife. In the 1D case, that's just the length of the fragment.

So, how do I pick which fragment to cut? Conceptually, I line them all up without gaps between them, and bring a knife down at some random point. Algorithmically, make an array with the coordinates of the end points, starting at 0, and incrementing by the length of each fragment. Find which fragment's coordinates contain the random position, then cut that fragment. Conveniently, that random position also gives a random point at which to cut the target fragment. Rinse and repeat.

But wait. Why work out where to cut the chosen fragment by interpolating the random position into its end points, when the random position does the whole job in one? To clarify, let's say I have a fragment of length 0.1 that happens to be somewhere along the line, so the array of endpoints contains the sequence [... 0.3, 0.4, ...]. Say my random position is 0.32, so it falls on that fragment. I could interpolate 0.32 between 0.3 and 0.4 to discover I cut it 0.2 of the way along. Or, I could simply say that the cutting operation leaves a new junction at 0.32. If I insert that random number into my array, it now contains 0.3, 0.32 and 0.4, placing the boundary between the new fragments at the right place. But it gets better. If all I do is add a random number to an array, then the array does not have to be kept ordered. I can therefore simply collect random numbers, and sort the array once only, at the end. I then difference adjacent position coordinates to get the lengths of the fragments.

This is such a clean and easy to describe operation, that I am sure the resulting distribution must have a simple analytic form, or at least be well known.

I start with a single line of unit length, represented by the array [0, 1]. Once cut into N fragments, the average length of each will be 1/N. This provides a convenient means to normalise the results. I histogram the lengths into bins whose widths are multiples of the average length after cutting. In a bastard child of matlab and python, the algorithm is

coords = [0, 1]
for (N_bits-1):
    coords.append(random())
coords.sort()
lengths = coords[1:end] - coords[0:end-1]
results = histogram(lengths, 1/N_bits)
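(For reference, a directly runnable NumPy rendering of the same pseudocode; only a sketch, with N_bits as above.)

import numpy as np

N_bits = 10**7                                     # number of fragments = number of cuts + 1
coords = np.concatenate(([0.0, 1.0], np.random.random(N_bits - 1)))
coords.sort()                                      # sort the cut points once, at the end
lengths = np.diff(coords)                          # fragment lengths, mean 1/N_bits
# histogram in bins whose width is the mean length, i.e. integer multiples of 1/N_bits
counts, edges = np.histogram(lengths * N_bits, bins=np.arange(0, 11))
print(counts / N_bits)                             # compare with (1 - 1/e) * exp(-b) for bin b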
As I ran this, it quickly became apparent that the fraction of fragments in the zeroth bin (0 to 1) was around 1-1/e, and the 1st bin (1 to 2) seemed to be smaller by a factor of e. To confirm this guess, I ran 10^7 fragments, and compared the results to what I'd expect if the bth bin contained a fraction exp(-b)*(1-1/e) of the fragments. I did not expect equality, but the difference should be around sqrt(m) for m items in a bin to be consistent with this hypothesis. These goodness factors are tabulated as well.

actual        guess    difference      sqrt(guess)
  6322302     6321205.6     1096.4           2514.2
  2323841     2325441.6     1600.6           1524.9
   855439      855482.1       43.1            924.9
   314410      314714.3      304.3            561.0
   116316      115776.9      539.1            340.3
    42835       42591.9      243.1            206.4
    15695       15668.7       26.3            125.2
     5820        5764.2       55.8             75.9
     2137        2120.5       16.5             46.0
      756         780.1       24.1             27.9
It convinces me. It also helps that the sum of exp(-b)*(1-1/e) for b from 0 to infinity is 1 (does this board support LaTeX?). Now it needs an analytic derivation of that distribution from statistics! It looks exactly like a geometric decay, falling by a constant factor per bin. For narrower bins, say 0.25 of an expected length, the ratio between adjacent bins is a consistent exp(0.25), all the way.

A cookie to Electra for Poisson! At least for some value of m, and integer bin widths.

The N-dimensional case will not afford that easy shortcut of leaving the array of coordinates unsorted. I can use the same concept of stacking the diameters, but as they are proportional to the 1/Nth power of the volume, the sum of the diameters will change even though the total hypervolume stays constant, so I will have to recompute the coordinate array every time. Perhaps I can make multiple cuts into the coordinate array and rebuild less frequently, though that would be expected to change the statistics slightly.
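(A minimal sketch of one way this diameter-stacking could go under the spherical approximation mentioned earlier; my own stand-in, not the code actually used here, and the helper names cap_fraction and chop are mine. Each fragment is treated as a k-ball of known hypervolume, the fragment to cut is chosen with probability proportional to its diameter, i.e. volume^(1/k), the cut position is uniform across that diameter, and the two cap volumes come from a crude numerical integration of the cross-sectional profile.)

import numpy as np

rng = np.random.default_rng()

def cap_fraction(k, h, grid=4001):
    # Fraction of a unit k-ball's volume on the x >= h side of a cut at height h,
    # from a Riemann sum over the cross-sectional profile (1 - x^2)^((k-1)/2).
    x = np.linspace(-1.0, 1.0, grid)
    w = (1.0 - x**2) ** ((k - 1) / 2.0)
    return w[x >= h].sum() / w.sum()

def chop(k, n_cuts):
    # Start from one fragment of unit hypervolume; repeatedly pick a fragment with
    # probability proportional to its diameter and split it at a random height.
    volumes = [1.0]
    for _ in range(n_cuts):
        d = np.array(volumes) ** (1.0 / k)            # diameters, up to a constant factor
        i = rng.choice(len(volumes), p=d / d.sum())   # bigger shadows get hit more often
        h = rng.uniform(-1.0, 1.0)                    # cut position across the diameter
        f = cap_fraction(k, h)
        v = volumes.pop(i)
        volumes += [v * f, v * (1.0 - f)]
    return np.array(volumes)

v = chop(k=3, n_cuts=2000)                            # the 3D "sage" case
hist, edges = np.histogram(v / v.mean(), bins=np.arange(0.0, 10.5, 0.5))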
Uspring
Tue Dec 01 2015, 11:15AM
Uspring Registered Member #3988 Joined: Thu Jul 07 2011, 03:25PM
Location:
Posts: 711
I believe the length distribution for a [0,1] interval randomly cut N times is:

P(x) = N * (1-x)^(N-1)

For large N and small x this is close to

N * e^(-N * x)
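(A quick numerical check of this, for what it's worth; only a sketch. It compares the empirical fraction of fragments longer than x with the survival function (1-x)^N implied by the density above.)

import numpy as np

N = 1000                                          # cuts per trial
lengths = np.concatenate([
    np.diff(np.sort(np.concatenate(([0.0, 1.0], np.random.random(N)))))
    for _ in range(200)                           # pool 200 trials to reduce noise
])
for x in (0.0005, 0.001, 0.002, 0.005):
    # empirical P(length > x) versus (1 - x)^N from the formula above
    print(x, (lengths > x).mean(), (1 - x) ** N)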

Dr. Slack
Tue Dec 01 2015, 05:58PM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
I've done a very crude simulation in N dimensions, with no attempt at the moment to speed things up over the brute-force way of one cut, then re-order. This means that 2000 cuts by this method take as long as 10^7 cuts by the fast 1D method, so I can't get a statistically meaningful result. However, the trend is clear.

If I histogram by volume, as the number of dimensions increases, the distribution skews to favour smaller pieces.

If I histogram by diameter, it peaks up to look more Gaussian.
Ash Small
Tue Dec 01 2015, 10:47PM
Ash Small Registered Member #3414 Joined: Sun Nov 14 2010, 05:05PM
Location: UK
Posts: 4245
Has this information actually proved useful in improving your cutting method? wink
Dr. Slack
Wed Dec 02 2015, 10:02AM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
Ash Small wrote ...

Has this information actually proved useful in improving your cutting method? wink

Funny you should ask that; the answer is no. As with all cutting-tool operations, the important thing is to have a sharp blade.

It has, however, shown me the weirdness of higher-dimensional spaces, while trying to understand the strange distributions I get when I allow the number of dimensions to increase to, say, 1000 or 1000000. For instance, start with a 10^6-dimensional hypercube with unit volume and unit width: the longest diagonal is 1000. Then shrink the volume to 1% of what it was, and the width, the parameter I was using to see whether it gets cut by the 999,999-dimensional hyperknife, stays essentially unchanged. However irrelevant this is to food preparation, it is relevant to channel coding, error correction, cryptography, etc.
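(Those two numbers check out on the back of an envelope:)

import numpy as np

k = 10**6
print(np.sqrt(k))         # longest diagonal of the unit k-cube: 1000.0
print(0.01 ** (1 / k))    # side length after shrinking the volume to 1%: about 0.9999954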

Fortunately sage sticks resolutely to three dimensions, no matter how finely I chop it, even if the leaves do approximate to 2 before the first slicing, and the threads to 1 briefly after that.
Uspring
Thu Dec 03 2015, 12:20PM
Uspring Registered Member #3988 Joined: Thu Jul 07 2011, 03:25PM
Location:
Posts: 711
Ash wrote:
Has this information actually proved useful in improving your cutting method?
I've given up on the multidimensional case, but let me give an example for the one-dimensional one, e.g. cutting chives. Say you have a 10 cm piece and you cut it 20 times randomly. The probability of a particular piece being longer than 1 cm is (1-0.1)^20 = 0.12. Since you have 21 pieces, it is very likely that at least one of them is longer than 1 cm. You could chop 40 times and make the probability that any of the pieces is longer than 1 cm drop to around 0.5. If the intent is to avoid any snippets longer than 1 cm, it is more economical to stop random chopping at 20 cuts, then look for the 1 or 2 long parts and cut those. This looks like common practice to me.
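(A quick simulation agrees with that arithmetic; a sketch only.)

import numpy as np

rng = np.random.default_rng()
trials = 10000
long_piece = 0
for _ in range(trials):
    cuts = np.sort(np.concatenate(([0.0, 10.0], rng.uniform(0.0, 10.0, 20))))  # 20 random cuts in 10 cm
    if np.diff(cuts).max() > 1.0:       # is at least one piece longer than 1 cm?
        long_piece += 1
print(long_piece / trials)              # close to 1 with 20 cuts; rerun with 40 cuts to see it drop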
