Some thoughts on data collection methods and how choices could skew outcomes

The choice of data collection medium could skew your data, introducing both known and unknown biases.

When you choose to collect data electronically, for example through online surveys, you systematically exclude everyone without computer access or without knowledge of how online surveys work.
Those excluded may or may not share a particular demographic profile, but either way you will not be hearing from them. It would then be wrong to generalise your findings to the population that your data collection strategy has excluded.
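A minimal Python sketch of this coverage problem, using an entirely hypothetical population of 1,000 people in which 700 have internet access and opinion on an invented yes/no survey question differs between the online and offline groups. It assumes, optimistically, that every person with internet access responds, so the only source of error is who the method can reach at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    has_internet: bool
    supports_policy: bool  # answer to a hypothetical yes/no survey question


def build_population() -> list[Person]:
    """Build a hypothetical population of 1,000 people (all figures invented)."""
    online = [Person(True, True)] * 420 + [Person(True, False)] * 280    # 60% "yes"
    offline = [Person(False, True)] * 60 + [Person(False, False)] * 240  # 20% "yes"
    return online + offline


def support_rate(people: list[Person]) -> float:
    """Share of people in the group who answered "yes"."""
    return sum(p.supports_policy for p in people) / len(people)


population = build_population()
# An online survey can only ever reach people with internet access.
online_respondents = [p for p in population if p.has_internet]

true_rate = support_rate(population)
online_rate = support_rate(online_respondents)

print(f"Whole population:   {true_rate:.0%}")    # 48%
print(f"Online survey only: {online_rate:.0%}")  # 60%
```

In this made-up example the online figure overstates support by 12 percentage points, and collecting more online responses would not close the gap, because the 300 offline people are never sampled.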

Knowing the potential impact and limitations of your data collection strategy can help you in a few ways:

  1. It prompts you to consider changes to your collection design to include or exclude certain demographics.
  2. It provides context for your analysis, so that your analysis notes which groups were systematically excluded (or otherwise) and doesn't assume coverage of them.
  3. It helps you identify and account for factors that may have underpinned your observations but that were not directly observed.


This post was published on Facebook, and I had cause to add the following clarifications based on reader feedback:

However, it depends on which side of the table you are sitting on:

  1. If you are the creator of a data collection and presentation initiative, clarify your goals upfront, decide whether you want to be honest or creative, run a proof-of-concept (POC) exercise, and then scale the effort.
  2. If you are the recipient of the outcome of a data-gathering and analysis initiative, see if you can read between the lines. Ask what this data set and/or presentation could be saying and not saying. Query how the data was collected. Query any external factors that the presentation hasn't included. Example: if you are confronted with sales data confirming that a product's sales are growing, that doesn't mean much until you consider the product's share of the total market. A product's market share may be declining, stagnant or growing without any direct correlation to its absolute sales numbers, so knowing other factors such as market share helps you understand the true state of affairs.

    Another example: is the presentation displaying cumulative sales over the whole period, or absolute sales for each sub-period within it? If the former, selling even a single unit in each sub-period keeps the total rising, so it looks as if sales have indeed grown. If the latter, it becomes very obvious that sales have stagnated.
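Both examples above can be checked with a few lines of Python. The quarterly figures below are hypothetical, chosen only so that the product's unit sales rise while the overall market grows faster, and `is_increasing` is an illustrative helper rather than any standard function.

```python
from itertools import accumulate

# Hypothetical quarterly figures, invented purely for illustration.
product_sales = [100, 110, 120, 130]     # units our product sold each quarter
market_sales = [1000, 1250, 1500, 1750]  # units the whole market sold each quarter


def is_increasing(series: list[float]) -> bool:
    """True if every value is strictly greater than the one before it."""
    return all(later > earlier for earlier, later in zip(series, series[1:]))


# Example 1: absolute sales grow every quarter, yet market share shrinks.
market_share = [p / m for p, m in zip(product_sales, market_sales)]
print("Sales growing?", is_increasing(product_sales))  # True
print("Share growing?", is_increasing(market_share))   # False
print("Share by quarter:", [f"{s:.1%}" for s in market_share])

# Example 2: one unit sold per quarter is flat, but its running total always rises.
per_quarter = [1, 1, 1, 1]
cumulative = list(accumulate(per_quarter))
print("Per-quarter sales growing?", is_increasing(per_quarter))  # False
print("Cumulative sales growing?", is_increasing(cumulative))    # True
```

The same per-quarter data produces a flat chart in one view and a steadily climbing one in the other, which is why it is worth asking which view you are being shown.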

Credit: XKCD cartoon:


