{"id":764,"date":"2013-12-20T18:48:29","date_gmt":"2013-12-20T17:48:29","guid":{"rendered":"http:\/\/pit-claudel.fr\/clement\/blog\/?p=764"},"modified":"2013-12-21T00:42:13","modified_gmt":"2013-12-20T23:42:13","slug":"the-cereal-box-problem-how-many-boxes-does-it-take-to-find-all-prizes","status":"publish","type":"post","link":"https:\/\/pit-claudel.fr\/clement\/blog\/the-cereal-box-problem-how-many-boxes-does-it-take-to-find-all-prizes\/","title":{"rendered":"The cereal box problem: How many boxes does it take to find all prizes?"},"content":{"rendered":"<p><img decoding=\"async\" class=\"wrap\" style=\"margin-top: 5px; margin-bottom: 1.5em\" src=\"http:\/\/pit-claudel.fr\/clement\/blog\/images\/chocapic.jpg\" alt=\"Chocapic cereals\" \/>Children&#8217;s cereal manufacturers often attract the attention of young clients by including small prizes and toys in every box; sometimes all prizes are identical, but most often individual prizes are part of a collection, and kids are encouraged to collect them and try to complete a full collection. How long does it take ?<\/p>\n<p>Simple probabilistic modeling shows that on average \\(n (1 + 1\/2 + \\ldots + 1\/n)\\) boxes are required to complete a full set of \\(n\\) prizes: for example, it takes on average \\(14.7\\) boxes to complete a full set of six prizes.<\/p>\n<p><!--more--><\/p>\n<h2 id=\"calc\">Numerical and experimental simulation<\/h2>\n<p>The expected number of boxes to buy before finding all \\(n\\) prices is shown below:<\/p>\n<div class=\"illustration\">\n  <a href=\"http:\/\/pit-claudel.fr\/clement\/blog\/images\/cereals-time-to-completion.png\"><img decoding=\"async\" src=\"http:\/\/pit-claudel.fr\/clement\/blog\/images\/cereals-time-to-completion.png\" alt=\"Theoretical time to completion chart\" \/><\/a><\/p>\n<p>This histogram shows the calculated average number of purchases required to accumulate at least one copy of each prize, for various sizes \\(n\\) of prizes sets: for example, finding all prizes in a series of twelve requires on average 37.24 purchases. Note that the curve is somewhat steeper than linear, due to a multiplicative logarithmic factor \\(\\left(1 + \\frac12 + \\frac13 + \\ldots + \\frac1{n}\\right) \\sim \\log(n)\\)<\/p>\n<\/div>\n<p>And here is an experimental simulation for \\(n = 6\\): the graph below shows how many purchases were required to obtain six prizes out of six, over 1,000,000 simulations.<\/p>\n<div class=\"illustration\">\n  <a href=\"http:\/\/pit-claudel.fr\/clement\/blog\/images\/cereals-completion-distribution.png\"><img decoding=\"async\" src=\"http:\/\/pit-claudel.fr\/clement\/blog\/images\/cereals-completion-distribution.png\" alt=\"Experimental time to completion distribution chart\" \/><\/a><\/p>\n<p>This histogram shows the experimental probability distribution of \\(T_6\\); that is, of the number of cereal boxes that one needs to purchase before finding all prizes in a series of six. Interestingly, despite the peak at \\(t = 11\\), the distribution&#8217;s long tail offsets the average to \\(14.65\\) in this experiment \u2014 close to the theoretical prediction of \\(14.7\\).<\/p>\n<\/div>\n<p class=\"fast-forward\">Interested in how these plots were obtained? Read on to find out, or <a href=\"#conclusion\">skip directly to the comments<\/a>  to share your thoughts!<\/p>\n<h2>Calculations and simulation code<\/h2>\n<h3>Theoretical average<\/h3>\n<p>Assume there are \\(n\\) different prizes, all of which are equally distributed in cereal boxes. Model buyer behavior as a sequence of purchases, and call \\(T_k\\) the time at which they find the \\(k\\)<sup>th<\/sup> prize : clearly \\(T_0 = 0\\) (no purchases are needed to find 0 prizes) and \\(T_1 = 1\\) (one finds their first prize immediately after buying their first cereal box). Naturally, \\(T_n\\) denotes the time required to find all prizes.<\/p>\n<p>To calculate \\(\\newcommand{\\E}[1]{\\mathrm{\\mathbf{E}}\\left[#1\\right]}\\E{T_n}\\), call \\(\\Delta_k\\) the elapsed time between the \\(k\\)<sup>th<\/sup> and \\(k+1\\)<sup>th<\/sup> prizes, i.e. \\(\\Delta_k = T_{k+1} &#8211; T_k\\). Summing over \\(k\\) yields \\(\\sum_{k=0}^{n-1} \\Delta_k = T_n &#8211; T_0\\) which, given the linearity of the expectation operator, yields $$<br \/>\n\t\\sum_{k=0}^{n-1} \\E{\\Delta_k} = \\E{T_n}<br \/>\n$$<\/p>\n<p>All that remains to be done now is to calculate \\(\\E{\\Delta_k}\\) for all \\(k \\in \\{ 0, \\ldots, n-1 \\}\\). By definition $$<br \/>\n\t\\newcommand{\\P}[1]{\\mathrm{\\mathbf{P}}\\left(#1\\right)}\\E{\\Delta_k} = \\sum_{\\delta = 1}^\\infty \\P{T_{k+1} = T_{k} + \\delta} \u00b7 \\delta<br \/>\n$$<\/p>\n<p>\\(\\P{T_{k+1} = T_{k} + \\delta}\\) is the probability of finding the \\(k+1\\)<sup>th<\/sup> exactly \\(\\delta\\) purchases after obtaining the \\(k\\)<sup>th<\/sup>; that is, the probability of completing \\(\\delta &#8211; 1\\) unproductive purchases, then a successful one. Now the probability of hitting a sequence of \\(\\delta &#8211; 1\\) unproductive purchases is exactly \\((k\/n)^{\\delta-1}\\); hence $$<br \/>\n\t\\P{T_{k+1} = T_{k} + \\delta} = \\left(\\frac{k}{n}\\right)^{\\delta-1}\\left(1 &#8211; \\frac{k}{n}\\right)<br \/>\n$$<\/p>\n<p>Summing over positive values of \\(\\delta\\) yields $$<br \/>\n\t\\begin{align*}<br \/>\n\t\t\\E{\\Delta_k} &#038;= \\sum_{\\delta = 1}^\\infty \\P{T_k = T_{k-1} + \\delta} \u00b7 \\delta = \\sum_{\\delta = 1}^\\infty (k\/n)^{\\delta-1} \u00b7 (1 &#8211; k\/n) \u00b7 \\delta\\\\<br \/>\n\t\t\t\t\t &#038;= (1 &#8211; k\/n) \\sum_{\\delta = 1}^\\infty \\delta (k\/n)^{\\delta-1} = (1 &#8211; k\/n) \\frac{1}{(1 &#8211; k\/n)^2}\\\\<br \/>\n\t\t\t\t\t &#038;= \\frac{1}{1 &#8211; k\/n} = \\frac{n}{n-k}<br \/>\n\t\\end{align*}<br \/>\n$$<\/p>\n<p>And summing over \\(k\\) yields $$<br \/>\n\t\\begin{align*}<br \/>\n\t\t\\E{T_n} &#038;= \\sum_{k=0}^{n-1} \\E{\\Delta_k} = \\sum_{k=0}^{n-1} \\frac{n}{n-k} = n \\sum_{k=1}^{n} \\frac{1}{k}\\\\<br \/>\n\t\t\t\t&#038;= n \\left(1 + \\frac12 + \\frac13 + \\ldots + \\frac1{n}\\right)<br \/>\n\t\\end{align*}<br \/>\n$$<\/p>\n<p>Which, for large \\(n\\), is \\(n \\left(\\log(n) + \\gamma + \\frac1{2n} + o\\left(\\frac1n\\right)\\right)\\) (where \\(\\gamma \\approx 0.57722\\) denotes the Euler\u2013Mascheroni constant).<\/p>\n<p class=\"details\">Note that, as expected, \\(\\E{\\Delta_k}\\) quickly increases as \\(k\\) grows larger: the more prizes have already been found, the harder it is to get a new one, to the point that finding the last prize requires \\(n\\) purchases on average.<\/p>\n<h3>Numerical simulations<\/h3>\n<p class=\"detail\">The experimental data above was simulated using the following code:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n\trepeat = 1000*1000\r\n\ttime_to = defaultdict(Counter)\r\n\t\r\n\tfor _ in range(repeat):\r\n\t\tpurchased = 0\r\n\t\tcollected_prizes = set()\r\n\t\t\r\n\t\twhile len(collected_prizes) &lt; n: # Until all prizes have been found\r\n\t\t\tpurchased += 1 # Purchase one more box\r\n\t\t\tprize = random.randint(1, n) # Open it\r\n\t\t\tif prize not in collected_prizes: # Is it a new prize ?\r\n\t\t\t\ttime_to&#x5B;len(collected_prizes) + 1]&#x5B;purchased]  += 1 # Yes, count it !\r\n\t\t\t\tcollected_prizes.add(prize)\t# And add it to list of already found prizes\r\n<\/pre>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>One might want to find some intuitive way to comprehend the formula derived above, \\(\\E{T_n} = n\\left(1 + 1\/2 + \\ldots + 1\/n\\right)\\). A relatively easy way to get such an intuition is to summarize the calculation above, realizing that each new packet bought after finding the \\(k\\)<sup>th<\/sup> prize brings an extra prize with probability \\((n-k)\/n\\) ; since each purchase is independent, it takes on average \\(n \/ (n-k)\\) purchases to find a new prize; then summing over \\(k\\) yields the final formula.<\/p>\n<p class=\"conclusion\">Have you ever completed a full set of cereal box prizes? How long did it take?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Children&#8217;s cereal manufacturers often attract the attention of young clients by including small prizes and toys in every box; sometimes all prizes are identical, but most often individual prizes are part of a collection, and kids are encouraged to collect them and try to complete a full collection. How long does it take ?<\/p>\n","protected":false},"author":1,"featured_media":823,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[33,57,32,17,8,14],"tags":[108,107,63],"_links":{"self":[{"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/posts\/764"}],"collection":[{"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/comments?post=764"}],"version-history":[{"count":5,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/posts\/764\/revisions"}],"predecessor-version":[{"id":827,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/posts\/764\/revisions\/827"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/media\/823"}],"wp:attachment":[{"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/media?parent=764"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/categories?post=764"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/tags?post=764"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}