Our problem from 2/23/15 was
You’re the owner of the Meathead Math gymnasium and are starting a new series of yoga classes. At first you think you’ll need to do some advertising but then you wonder if word of mouth will be sufficient. In particular, you estimate that every member of the gym has some probability p of being friends with any other member of the gym. For example, if you’re gym member i, there’s a probability p that you’re friends with gym member j, a probability p that you’re friends with gym member k, etc. You estimate that once a gym member learns about the new yoga class, they will tell each of their other gym member friends about it.
If you select one gym member at random to tell about the promotion, what value of p is required for there to be at least a 50% chance of at least half of the gym hearing about the promotion assuming every gym member will tell their friends when they hear about it? You may assume the number of members at the gym N is large.
This problem perfectly illustrates the issues you can face when you don’t solve the problem before presenting it :) When I came up with this problem, it seemed so natural that it must have an exact—if complex—solution. Turns out I was mistaken. After a number of my own attempts and doing some research on the problem, it seems the best we can do is invoke a result from Erdős and Rényi that tells us that a graph quickly goes from being almost entirely disconnected to almost entirely connect very quickly around a threshold value.
A full proof can be found in Section 1.2 or Durrett’s Dynamics on Graphs and in many textbooks on random graphs and graph theory. Below I include a couple of the broad strokes.
First, we note that each member of the gym is symmetric to each other and, as such, should have the same average degree. This average degree depends on the probability p and the number of people n and if it’s constant for each person, we can write
where λ is a constant.
Somewhat intuitively, the proof shows us that when λ<1, there is a very low chance of any connected component in the graph getting larger than O(log(n)), and when λ>1, there is a very high chance that a large component emerges containing some fraction β of all n vertices. It proceeds by approximating the problem as a Galton-Watson branching process. In particular, we start with a single vertex and iteratively add all the neighbors of the vertices we are considering in a breadth-first-seach manner. The distribution of the number of neighbors each vertex is given by a binomial distribution Bi(n,p) but as n gets large, this can be approximated with a Poisson distribution. The proof goes on to employ the extinction criterion for the Galton-Watson branching process to construct the following recurrence relationship
where β is the fraction of vertices that are part of the large component and λ>1. And that gives us the necessary components to solve this problem. We need to solve for p such that there a 50% chance of selecting a node that’s part of the large component. If we assume that λ>1, we know with near certainty that recurrence relationship above holds so we just need p when
Reassuringly, this value of p results in λ>1.
The graph below shows the results of simulating this problem and verifies that the analytic solution closely matches the empirical results.
As always, please comment with any thoughts or feedback you had on this problem or if you approached it differently. Thanks for reading!
I was going to say … this sounded like percolation theory to me. Some tricky stuff!