4hv.org :: Forums :: General Science and Electronics

probability distribution of chopped sage

Dr. Slack
Mon Nov 30 2015, 05:14PM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
I was making a stuffing for Sunday lunch, rocking my nice sharp knife back and forth over a few sage leaves in the fancy pro-chef way that my wife doesn't seem able to master. While there were still only a few large bits, I chased those individually. That rapidly became impractical as the bits got smaller, so I just heaped and chopped, working slowly through most of the pile before turning and re-heaping. Obviously, the pieces ended up with a range of sizes.

I started to wonder about the probability distribution of the sizes. My first thoughts were that it should be stable as chopping progressed, perhaps heading to an asymptotic distribution from any starting distribution. The chance of a piece being cut seemed to be somehow proportional to its size, so the big bits would shrink preferentially to small bits.

I realised that the dimensionality of the problem might be significant. The leaf fragments were 3D and the moving knife defined a 2D cut plane; would it generalise? My wife, peeling the potatoes, noticed that I looked deep in thought, and asked me what I was thinking about. I should have said something along the lines of how beautiful she looked; instead I said I was wondering about the probability distribution that results from N-dimensional hypervolumes being randomly cut by (N-1)-dimensional hyperplanes. Damn, another opportunity missed to appear more human than Vulcan.

Anyhow, in one dimension, consider sticks of spaghetti, lined up in a bundle. It would be very easy to simulate that. After the initial deterministic slicing of the leaves, most pieces are very long and thin, approximating well to one dimension, and while they lie flat, they tend not to line up well. However, cuts at that stage do tend to produce two long thin pieces, so staying 'one dimensional'. A random position, the cut point, can be computed simply along the length.

In two dimensions, consider a stack of pieces of paper. The choice of random cut line needs the sheet to be rotated to a random angle, before a position is chosen across a diameter. Each resulting polygon would be strictly convex, and could be defined by a set of vertices. How wrong would it be to approximate each fragment to a disc of the same area, with a corresponding simplification of keeping track of the pieces, and elimination of the rotation?
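(To make that 2D bookkeeping concrete, here is a minimal sketch of one way it might be set up, keeping each fragment as a convex-polygon vertex list; the names clip_halfplane and random_cut are illustrative, not code from this thread. A random cut direction is chosen, the polygon's projection onto that direction gives its "diameter", a random offset is picked across it, and clipping against the two half-planes yields the two children.)

import numpy as np

def clip_halfplane(verts, n, c):
    # Keep the part of a convex polygon (vertex list) satisfying dot(n, v) <= c.
    out = []
    m = len(verts)
    for i in range(m):
        a, b = verts[i], verts[(i + 1) % m]
        da, db = np.dot(n, a) - c, np.dot(n, b) - c
        if da <= 0:
            out.append(a)
        if da * db < 0:                                  # this edge crosses the cut line
            t = da / (da - db)
            out.append(a + t * (b - a))
    return out

def random_cut(verts):
    # Cut a convex polygon by a random line through it; return the two halves.
    verts = [np.asarray(v, float) for v in verts]
    theta = np.random.uniform(0, np.pi)                  # random cut direction
    n = np.array([np.cos(theta), np.sin(theta)])         # unit normal to the cut line
    proj = [np.dot(n, v) for v in verts]
    c = np.random.uniform(min(proj), max(proj))          # random position across the projection
    return clip_halfplane(verts, n, c), clip_halfplane(verts, -n, -c)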

In three dimensions, the particle would need to be randomly rotated around two axes, before the position of the cut plane is chosen across the diameter. I'm not sure that a real knife would cut real sage leaf fragments as randomly as that, they might be deflected rather than suffer a thin cut to the edge, but it's a nice idealisation to consider. Again, a spherical approximation might make things easier, at least at first.

In four dimensions, my head explodes, but I'm sure that just writing the sums down in k dimensions, then setting k to 4 or whatever, will have it all come out in the wash, as I'll be able to check whether it works for k = 1, 2 or 3.

Given the range of piece sizes, there will clearly be a difference between the distribution of the number of fragments of a given size and the distribution of the mass held in fragments of that size, which will tend to skew the distribution one way or the other.

Anyhow, before I actually do a simulation, does anybody have any idea what the distribution will be? Is there a way to do this analytically? I presume it will be asymptotic in any dimension. Will it be a known one, like Rayleigh, Boltzmann or Zipf? Will it differ significantly between numbers of dimensions, and as the number of dimensions increases, will that too be asymptotic?

If you make any guesses in this thread as to what you think the distribution will be, you are on scout's honour not to have done the simulation and found out the real answer beforehand. I'll probably have a crack at a simulation over the holidays, while the rest of the family sleep off the turkey.




Sulaiman
Mon Nov 30 2015, 06:53PM
Sulaiman Registered Member #162 Joined: Mon Feb 13 2006, 10:25AM
Location: United Kingdom
Posts: 3141
Nice problem, it made me (attempt to) think for a while.
I assume that the average particle size will asymptotically approach zero.
For no reason other than it being one of the most common distributions, I guess a Gaussian distribution of particle size at any one time after several iterations (cuts) have passed.

Assuming a one-dimensional blade randomly chopping a pile of leaves that are re-piled after each cut, I can't even think how I'd simulate it!

Next time just tell your wife how gorgeous she / her shoes/dress/handbag/hair is and don't torture us ;)
Electra
Tue Dec 01 2015, 12:25AM
Electra Registered Member #816 Joined: Sun Jun 03 2007, 07:29PM
Location:
Posts: 156
For simplicity, could you not assume the leaves are just a two-dimensional array of two-dimensional objects, and the heap is just a stack of a random number of two-dimensional pieces (so one knife stroke makes multiple cuts)? Consider the smallest size a piece could be, the equivalent of perhaps two blade widths. Are you factoring in any free space between fragments (you could have made a cut that didn't cut much at all)?

If you're looking at the distribution over time, or number of cuts vs. size, I would guess at a Poisson distribution.
(edit)
Realized that doesn't make sense. Yes, it should be the probability vs. size after a predetermined time or set number of cuts. I'm keeping the Poisson distribution as it's my favorite distribution.

Are you going to write something in 'C', or do you have an easier way of simulating it?
Dr. Slack
Tue Dec 01 2015, 07:44AM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
After a little bit of thought, I realised that the 1D case was very easy to simulate.

Although cutting through a random heap of stuff produces many cuts, without loss of generality it can be done one cut at a time. In the nD case, the chance of hitting a particular fragment depends on the diameter of that fragment as seen by the knife. In the 1D case, that's just the length of the fragment.

So, how do I pick which fragment to cut? Conceptually, I line them all up without gaps between them, and bring a knife down at some random point. Algorithmically, make an array with the coordinates of the end points, starting at 0, and incrementing by the length of each fragment. Find which fragment's coordinates contain the random position, then cut that fragment. Conveniently, that random position also gives a random point at which to cut the target fragment. Rinse and repeat.

But wait. Why work out where to cut the chosen fragment by interpolating the random position into its end points, when the random position does the whole job in one? To clarify, let's say I have a fragment of length 0.1 that happens to be somewhere along the line, so the array of endpoints contains the sequence [... 0.3, 0.4, ...]. Say my random position is 0.32, so it falls on that fragment. I could interpolate 0.32 between 0.3 and 0.4 to discover I cut it 0.2 of the way along. Or, I could simply say that the cutting operation leaves a new junction at 0.32. If I insert that random number into my array, it now contains 0.3, 0.32 and 0.4, placing the boundary between the new fragments at the right place. But it gets better. If all I do is add a random number to an array, then the array does not have to be kept ordered. I can therefore simply collect random numbers, and sort the array once only, at the end. I then difference adjacent position coordinates to get the lengths of the fragments.

This is such a clean and easy to describe operation, that I am sure the resulting distribution must have a simple analytic form, or at least be well known.

I start with a single line of unit length, represented by the array [0, 1]. Once cut into N fragments, the average length of each will be 1/N. This provides a convenient means to normalise the results. I histogram the lengths into bins whose widths are multiples of the average length after cutting. In a bastard child of matlab and python, the algorithm is

coords = [0, 1]
for (N_bits-1):
    coords.append(random())
coords.sort()
lengths = coords[1:end] - coords[0:end-1]
results = histogram(lengths, 1/N_bits)
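(For reference, a directly runnable NumPy rendering of the same pseudocode; only a sketch, with N_bits as above.)

import numpy as np

N_bits = 10**7                                     # number of fragments = number of cuts + 1
coords = np.concatenate(([0.0, 1.0], np.random.random(N_bits - 1)))
coords.sort()                                      # sort the cut points once, at the end
lengths = np.diff(coords)                          # fragment lengths, mean 1/N_bits
# histogram in bins whose width is the mean length, i.e. integer multiples of 1/N_bits
counts, edges = np.histogram(lengths * N_bits, bins=np.arange(0, 11))
print(counts / N_bits)                             # compare with (1 - 1/e) * exp(-b) for bin b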
As I ran this, it quickly became apparent that the fraction of fragments in the zeroth bin (0 to 1) was around 1-1/e, and the 1st bin (1 to 2) seemed to be smaller by a factor of e. To confirm this guess, I ran 10^7 fragments, and compared the results to what I'd expect if the bth bin contained a fraction exp(-b)*(1-1/e) of the fragments. I did not expect equality, but the difference should be around sqrt(m) for m items in a bin to be consistent with this hypothesis. These goodness factors are tabulated as well.

actual        guess    difference      sqrt(guess)
  6322302     6321205.6     1096.4           2514.2
  2323841     2325441.6     1600.6           1524.9
   855439      855482.1       43.1            924.9
   314410      314714.3      304.3            561.0
   116316      115776.9      539.1            340.3
    42835       42591.9      243.1            206.4
    15695       15668.7       26.3            125.2
     5820        5764.2       55.8             75.9
     2137        2120.5       16.5             46.0
      756         780.1       24.1             27.9
It convinces me. It also helps that the sum of exp(-b)*(1-1/e) for b from 0 to infinity is 1 (does this board support LaTeX?). Now it needs an analytic derivation of that distribution from statistics! It looks exactly like a geometric decay, falling by a constant factor per bin. For narrower bins, say 0.25 of an expected length, the ratio between adjacent bins is a consistent exp(0.25), all the way.

A cookie to Electra for Poisson! At least for some value of m, and integer bin widths.

The N-dimensional case will not afford that easy shortcut of leaving the array of coordinates unsorted. I can use the same concept of stacking the diameters, but as they are proportional to the 1/Nth power of the volume, the sum of the diameters will change even though the total hypervolume stays constant, so I will have to recompute the coordinate array every time. Perhaps I can make multiple cuts into the coordinate array and rebuild less frequently, though that would be expected to change the statistics slightly.
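(A minimal sketch of one way this diameter-stacking could go under the spherical approximation mentioned earlier; my own stand-in, not the code actually used here, and the helper names cap_fraction and chop are mine. Each fragment is treated as a k-ball of known hypervolume, the fragment to cut is chosen with probability proportional to its diameter, i.e. volume^(1/k), the cut position is uniform across that diameter, and the two cap volumes come from a crude numerical integration of the cross-sectional profile.)

import numpy as np

rng = np.random.default_rng()

def cap_fraction(k, h, grid=4001):
    # Fraction of a unit k-ball's volume on the x >= h side of a cut at height h,
    # from a Riemann sum over the cross-sectional profile (1 - x^2)^((k-1)/2).
    x = np.linspace(-1.0, 1.0, grid)
    w = (1.0 - x**2) ** ((k - 1) / 2.0)
    return w[x >= h].sum() / w.sum()

def chop(k, n_cuts):
    # Start from one fragment of unit hypervolume; repeatedly pick a fragment with
    # probability proportional to its diameter and split it at a random height.
    volumes = [1.0]
    for _ in range(n_cuts):
        d = np.array(volumes) ** (1.0 / k)            # diameters, up to a constant factor
        i = rng.choice(len(volumes), p=d / d.sum())   # bigger shadows get hit more often
        h = rng.uniform(-1.0, 1.0)                    # cut position across the diameter
        f = cap_fraction(k, h)
        v = volumes.pop(i)
        volumes += [v * f, v * (1.0 - f)]
    return np.array(volumes)

v = chop(k=3, n_cuts=2000)                            # the 3D "sage" case
hist, edges = np.histogram(v / v.mean(), bins=np.arange(0.0, 10.5, 0.5))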
Uspring
Tue Dec 01 2015, 11:15AM
Uspring Registered Member #3988 Joined: Thu Jul 07 2011, 03:25PM
Location:
Posts: 711
I believe the length distribution for a [0,1] interval randomly cut N times is:

P(x) = N * (1-x)^(N-1)

For large N and small x this is close to

N * e^(-N * x)
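(A quick numerical check of this, for what it's worth; only a sketch. It compares the empirical fraction of fragments longer than x with the survival function (1-x)^N implied by the density above.)

import numpy as np

N = 1000                                          # cuts per trial
lengths = np.concatenate([
    np.diff(np.sort(np.concatenate(([0.0, 1.0], np.random.random(N)))))
    for _ in range(200)                           # pool 200 trials to reduce noise
])
for x in (0.0005, 0.001, 0.002, 0.005):
    # empirical P(length > x) versus (1 - x)^N from the formula above
    print(x, (lengths > x).mean(), (1 - x) ** N)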

Dr. Slack
Tue Dec 01 2015, 05:58PM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
I've done a very crude simulation in N dimensions, with no attempt at the moment to speed things up over the brute-force way of one cut, then re-order. This means that 2000 cuts by this method take as long as 10^7 cuts by the fast 1D method, so I can't get a statistically meaningful result. However, the trend is clear.

If I histogram by volume, as the number of dimensions increases, the distribution skews to favour smaller pieces.

If I histogram by diameter, it peaks up to look more Gaussian.
Ash Small
Tue Dec 01 2015, 10:47PM
Ash Small Registered Member #3414 Joined: Sun Nov 14 2010, 05:05PM
Location: UK
Posts: 4245
Has this information actually proved useful in improving your cutting method? wink
Dr. Slack
Wed Dec 02 2015, 10:02AM
Dr. Slack Registered Member #72 Joined: Thu Feb 09 2006, 08:29AM
Location: UK St. Albans
Posts: 1659
Ash Small wrote ...

Has this information actually proved useful in improving your cutting method? wink

Funny you should ask that; the answer is no. As with all cutting-tool operations, the important thing is to have a sharp blade.

It has, however, shown me the weirdness of higher-dimensional spaces, while trying to understand the strange distributions I get when I allow the number of dimensions to increase to, say, 1000 or 1000000. For instance, start with a 10^6-dimensional hypercube with unit volume and unit width: the longest diagonal is 1000. Then shrink the volume to 1% of what it was, and the width, the parameter I was using to see whether it gets cut by the 999,999-dimensional hyperknife, stays essentially unchanged. However irrelevant this is to food preparation, it is relevant to channel coding, error correction, cryptography, etc.
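(Those two numbers check out on the back of an envelope:)

import numpy as np

k = 10**6
print(np.sqrt(k))         # longest diagonal of the unit k-cube: 1000.0
print(0.01 ** (1 / k))    # side length after shrinking the volume to 1%: about 0.9999954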

Fortunately sage sticks resolutely to three dimensions, no matter how finely I chop it, even if the leaves do approximate to 2 before the first slicing, and the threads to 1 briefly after that.
Uspring
Thu Dec 03 2015, 12:20PM
Uspring Registered Member #3988 Joined: Thu Jul 07 2011, 03:25PM
Location:
Posts: 711
Ash wrote:
Has this information actually proved useful in improving your cutting method?
I've given up on the multidimensional case, but let me give an example for the one-dimensional one, e.g. cutting chives. Say you have a 10 cm piece and you cut it 20 times randomly. The probability of a particular piece being longer than 1 cm is (1-0.1)^20 = 0.12. Since you have 21 pieces, it is very likely that at least one of them is longer than 1 cm. You could chop 40 times and make the probability that any of the pieces is longer than 1 cm drop to around 0.5. If the intent is to avoid any snippets longer than 1 cm, it is more economical to stop random chopping at 20 cuts, then look for the 1 or 2 long parts and cut those. This looks like common practice to me.
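(A quick simulation agrees with that arithmetic; a sketch only.)

import numpy as np

rng = np.random.default_rng()
trials = 10000
long_piece = 0
for _ in range(trials):
    cuts = np.sort(np.concatenate(([0.0, 10.0], rng.uniform(0.0, 10.0, 20))))  # 20 random cuts in 10 cm
    if np.diff(cuts).max() > 1.0:       # is at least one piece longer than 1 cm?
        long_piece += 1
print(long_piece / trials)              # close to 1 with 20 cuts; rerun with 40 cuts to see it drop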
