Re: Randomize three variables subject to sum always equal to 1

From: Lars-Åke Aspelin (larske_at_REMOOOVEtelia.com)
Date: 09/05/04


Date: Sun, 05 Sep 2004 09:18:15 GMT

On Sun, 05 Sep 2004 01:34:02 GMT, "kcc" <kcconline@comcast.NOSPAM.com>
wrote:

>"Lars-Åke Aspelin" <larske@REMOOOVEtelia.com> wrote in message
>news:of3kj0921l438ba7qil0an5dc98hemvcnl@4ax.com...
>> On Sat, 04 Sep 2004 02:50:43 GMT, "kcc" <kcconline@comcast.NOSPAM.com>
>> wrote:
>>
>> >The problem, or it may be a problem depending on the original poster's
>> >need, is that most of the solutions do not produce 3 RV's with the same
>> >distribution.
>>
>> And the one suggested below doesn't do that either.
>> We don't know if that is a problem or not as the original problem was
>> too vaguely defined when it comes to what "randomly selects" means.
>>
>> >Another way to think of the problem is to start with a pie. Generate 3
>> >RV's where each defines the location of where to make a cut. 0 is at 12
>> >o'clock, .5 at 6 o'clock, etc. These values would be transformed by
>> >defining 3 new RV's as the size of the slices. By definition, the sum
>> >will equal the whole.
>> >
>> >No VB is needed. Just use the following formulas:
>> >Row 1 has the uniformly dist RV's
>> >A1, B1 & C1=rand()
>> >Row 2 is the RV's sorted
>> >A2 = min(A1:C1), B2=large(A1:C1,2), C2=max(A1:C1)
>> >Then the desired RV's would be
>> >A3=B2-A2, B3=C2-B2, C3=1-C2+A2
>> >
>> >As with all the solutions, the RV's are not independent.
>> >It's been too many years for me to do the math, but the distribution
>> >appears to be exponential in shape with the highest density at 0
>> >and lowest at 1.
>> >If I could do the math, I also might prove a suspicion that the result is
>> >effectively the same as Tom's.
>> >
>>
>> No, it is not the same as the "divide by the sum" result given by Tom
>> where the "RV's" do have the same distribution (for what it's worth).
>> Actually your "A2", "B2", and "C2" variables have the averages of
>> 0.25, 0.5 and 0.75 respectively. That means that your
>> "A3", "B3" and "C3" variables have the averages of
>> 0.25, 0.25 and 0.5 respectively. Thus the three "RV's" do not have
>> the same distribution.
>>
>> Lars-Åke
>
>I hate being wrong. The pie analogy was fine, but the implementation
>was flawed. I don't know why it didn't occur to me that sorting would
>affect the distribution. If I skip the extra step and make
>A2=IF(A1=MIN($A1:$C1),LARGE($A1:$C1,2)-A1,IF(A1=LARGE($A1:$C1,2),MAX($A1:$C1)-A1,1-A1+MIN($A1:$C1)))
>and copy to B2 and C2, row 2 will have the distribution I was shooting for.
>This time I tested all three RV's rather than testing one and assuming
>they were the same. This time, each has a mean of 1/3, as expected.
>Ken
>

Yes, the sorting is obviously not as innocent as one would imagine.
Without the sorting your three distributions are equal as you state
above.
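
For anyone who wants to check the effect of the sorting outside Excel, here is
a rough Python sketch. The function names (sorted_slices, unsorted_slices,
column_means) are my own and do not come from the thread, and the simulation is
only a sanity check on the means, not a proof. The first sampler mirrors the
sorted A3/B3/C3 formulas, the second mirrors the corrected row-2 formula where
each slice runs clockwise from its own cut to the next one.

```python
import random

def sorted_slices(rng):
    """Sort three cuts, then take the gaps in sorted order (the first attempt)."""
    lo, mid, hi = sorted(rng.random() for _ in range(3))
    return (mid - lo, hi - mid, 1 - hi + lo)

def unsorted_slices(rng):
    """Slice i runs clockwise from cut i to the next cut (the corrected formula)."""
    cuts = [rng.random() for _ in range(3)]
    out = []
    for c in cuts:
        later = [d for d in cuts if d > c]
        # If no cut lies further round, wrap past 12 o'clock to the smallest cut.
        nxt = min(later) if later else min(cuts) + 1
        out.append(nxt - c)
    return tuple(out)

def column_means(sampler, n=100_000, seed=0):
    """Average each of the three slice sizes over n simulated pies."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(n):
        for i, value in enumerate(sampler(rng)):
            totals[i] += value
    return [t / n for t in totals]

if __name__ == "__main__":
    print("sorted:  ", column_means(sorted_slices))
    print("unsorted:", column_means(unsorted_slices))
```

The sorted version should come out near 0.25, 0.25 and 0.5, and the unsorted
one near 1/3 in every column, in line with the averages discussed above.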

I really liked your pie analogy and that's why I started fiddling about
with it. My first thought was that once you had selected your three
cuts there would be no changes to the sizes of the pie slices if you
"rotated" the pie to get one of the cuts at "12 o'clock".
The other two cuts will still be uniformly distributed over the pie.

With that "predefined cut" you just have to generate two random
cuts and this will be the same as cutting the interval [0,1] with two
cuts (which someone else might already have proposed).

So with the two randoms in A1 and B1 the following formulas give the
same "common" distribution to the three "RV's" as your example above:

In A2: "=MIN(A1:B1)" (the first slice)
In B2: "=ABS(A1-B1)" (the second slice)
In C2: "=1-MAX(A1:B1)" (the third slice)
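
The same three formulas can be written as a small Python function, in case
that is easier to experiment with than a worksheet. The name two_cut_slices is
just a label I picked; a and b play the role of A1 and B1.

```python
import random

def two_cut_slices(a, b):
    """Split [0,1] at the two cut positions a and b and return the three pieces."""
    first = min(a, b)        # =MIN(A1:B1)
    second = abs(a - b)      # =ABS(A1-B1)
    third = 1 - max(a, b)    # =1-MAX(A1:B1)
    return first, second, third

rng = random.Random(1)
slices = two_cut_slices(rng.random(), rng.random())
print(slices, sum(slices))
```

Whatever the two random numbers are, the three pieces add up to 1 (up to
floating point rounding).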

The distribution of all these "RV's" is not exponential but triangular,
with the probability density function (pdf) equal to 2*(1-x) for
0<=x<=1.
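
For those who want the math, here is a short sketch of where 2*(1-x) comes
from for the first slice. U and V are my names for the two random numbers in
A1 and B1, assumed independent and uniform on [0,1], and X is the first slice.

```latex
\begin{align*}
X &= \min(U, V), \\
P(X > x) &= P(U > x)\,P(V > x) = (1 - x)^2, \qquad 0 \le x \le 1, \\
f_X(x) &= -\frac{d}{dx}(1 - x)^2 = 2(1 - x), \\
E[X] &= \int_0^1 2x(1 - x)\,dx = \frac{1}{3}.
\end{align*}
```

The second slice gives the same result, since |U-V| > x holds on two corner
triangles of the unit square with total area (1-x)^2, and the third slice
1-max(U,V) is the first slice mirrored.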

You can obtain the same result with your formulas above by setting
one of the three random numbers in A1:C1 to any constant value in
[0,1], e.g. a 0 in C1. I just chose to leave it out of the calculations.
So there is no need for more than two random numbers if you are
satisfied with the triangular distribution for the three "RV's".
The "divide by the sum" proposal gives a "more balanced distribution"
but that involves three random numbers in the calculations.
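
One rough way to put a number on "more balanced" is to look at how often a
single variable ends up above 0.5 under each method. That reading of the
phrase is an assumption on my part, and the names below (divide_by_sum,
two_cuts, share_above_half) are made up for this sketch.

```python
import random

def divide_by_sum(rng):
    """Three uniforms, each divided by their total (the "divide by the sum" idea)."""
    u = [rng.random() for _ in range(3)]
    total = sum(u)  # all three being exactly 0 is ignored as practically impossible
    return [x / total for x in u]

def two_cuts(rng):
    """Two uniform cuts of [0,1], giving three pieces."""
    a, b = rng.random(), rng.random()
    return [min(a, b), abs(a - b), 1 - max(a, b)]

def share_above_half(sampler, n=100_000, seed=0):
    """Fraction of samples in which the first variable exceeds 0.5."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if sampler(rng)[0] > 0.5)
    return hits / n

if __name__ == "__main__":
    print("two cuts:     ", share_above_half(two_cuts))
    print("divide by sum:", share_above_half(divide_by_sum))
```

With the triangular pdf the exact value for two cuts is (1-0.5)^2 = 1/4, while
for divide by the sum it works out to 1/6 (the chance that one uniform exceeds
the sum of the other two), so the latter puts less weight on very uneven
splits.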

Lars-Åke


