![]() |
#1
|
|||
|
|||
![]()
I’m trying to use Excel to estimate total unique visitors and total 1, 2 and 3+ time users from three (non-exclusive) samples of varying size and %s. Then, to estimate the same values from all three samples combined – an example follow (and all the data are attached too)
#1: Total count: 300,000 Sample available: 30,000 Sample Size % of Total: 10.0% Similarly, groups 2 & 3 lay out the same way: Group 2: 5,000 (total) 2,000 (sample) 40.0% (sample %) Group 3: 40,000 10,000 25.0% Total 345,000 42,000 12.2% Next, I use individual names to count duplicates within each group to get numbers like the following: Group 1: Uniques: 24,000 Single Use Sample: 20,000 Dual Use Sample: 3,000 3+ Use Sample: 1,000 …using the same layout for other groups: Group 2: 1,150 (uniques) 1,000 (1x) 100 (2x) 50 (3+x) Group 3: 7,100 6,000 1,000 100 Adding across the groups, provides an incorrect “additive total” as follows: “Additive Total”: 32,250 27,000 4,100 1,150 If I actually combine the raw, individual name datafrom all three groups, I might get true, deduped total numbers like: 30,900 25,000 4,400 1,500 You’ll notice that the increased incidence of multi-use persons (versus the “additive total”) when combining the three different groups as some of the same users exist across the different groups. This, in turn, reduces the one-time use people and uniques relative to the “additive total” figures. I point this out because the next step, and ultimate objective, is to estimate the uniques and single, dual and 3+ use for each of the groups individually and in total: Here again, I can’t use straight-line math to get to a total as when the quantities of users increase, the uniques and single-use persons will be reduced versus a linearly-estimated total. As an example: I can’t take 24,000 uniques in the Group 1 sample and divide by the 10% sample size to get an estimated 240,000 total uniques as many of the sampled uniques would show up again if we actually had the full roster of 300,000 Group 1 people; the real uniques value, from the full dataset might be 180,000 or 210,000. Similarly, when combining the samples across groups, I can’t apply the 12% factor to the combined total sample uniques of 30,900 to get an estimated total of 253,821 as the multi-use counts would increase, decreasing the uniques and one-time use individuals. So that was a long, but (hopefully) illustrative, way of getting to the question of “What’s an Excel command or formula that can be applied to the individual sample groups, and to the combined sample totals, to estimate the full data set values?" It's worth noting that sample sizes might vary from 10-100% in the groups and across different sets of other groups of users that I need to compare to. The bottom line is I need estimates for this set of groups, and several other varied sets of groups so I can compare the absolute, estimated uniques, 1, 2 and 3+ numbers across the sets. Thanks for your help! |
![]() |
Tags |
duplicate, estimate, incidence |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Reporting Machine Usage | EZRider | Excel | 0 | 12-17-2012 08:58 AM |
PowerPoint high cpu usage | grey | PowerPoint | 2 | 05-11-2012 12:15 AM |
![]() |
coffee_king | Word VBA | 1 | 03-24-2012 07:52 PM |
Advice Needed on Possible VBA usage | HorizonSC | Excel Programming | 2 | 09-21-2011 02:47 AM |
powerpoint font usage | cio@creativeio.co.uk | PowerPoint | 0 | 09-13-2011 08:24 AM |