"I must begin, not with hypothesis, but with specific instances, no matter how minute."
"Genius is the error in the system."
"The beholder's eye, which moves like an animal grazing, follows paths prepared for it in the picture."
~Paul Klee
I am currently working on a new survey project that will, conceivably, become an annual survey to gather information about clients’ experiences of the health services they receive. For now, we are still in the learning (or pilot) phase of this particular survey process.
As we begin data analysis and gather feedback from the community regarding their experiences of the survey, I am flooded with thoughts about how we measure Public Health work more broadly, and about how the data cleaning process reflects the decisions we make every day to balance feasibility, effectiveness, and inclusivity.
Often in public health and public policy, we reach for easy-to-grab statistics: neatly packaged numbers and percentages that speak to a larger story. But I worry that what we paint with our research tools and statistics can become more of a paint-by-numbers exercise and less of a nuanced, unique picture of the landscape. Thus, we risk telling the same, expected story over time.
In this particular case, we are left with a decision about how to address what seems to be systematic error in our “dirty” data: whether to exclude surveys from individuals who had less capacity to complete the survey appropriately (i.e., they didn’t do it according to the instructions). Is this poor survey design? Perhaps. There is reason to consider whether the response categories fully fit the clients’ reality, or whether they better fit the institutional, anticipated framework (a dynamic similar to confirmation bias). Is this a demonstration of the links between education and health? Maybe; that certainly seems like an overall trend we are noticing in our demographic analyses. Frequently, however, it seems that we are excluding the responses that were more creative and more individualized. For example, some individuals designed their own, alternative Likert scales, or wrote qualitative descriptions rather than using the closed-ended response categories. Of course, these patterns are not consistent across any particular survey item or set of items, so we can’t simply exclude certain pieces of the survey. Instead, in our analysis, we may simply lose the responses of those few individuals who didn’t want to answer our questions in the prescribed manner.
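To make the trade-off concrete, here is a minimal sketch (in Python with pandas, using hypothetical column names and toy data rather than our actual survey) of one alternative to dropping these surveys outright: flagging nonconforming answers so they stay visible in the dataset and can be counted and read, even when they are excluded from the numeric summaries.

```python
import pandas as pd

# Hypothetical pilot data: a 1-5 Likert item where some clients invented
# their own scales or wrote free-text answers instead.
responses = pd.DataFrame({
    "client_id": [101, 102, 103, 104, 105],
    "q1_satisfaction": ["4", "5", "very happy with my nurse", "2", "10/10"],
})

VALID_CATEGORIES = {"1", "2", "3", "4", "5"}

# Flag rather than delete: nonconforming answers remain in the dataset,
# so we can report how many voices fall outside the closed-ended categories.
responses["q1_conforming"] = (
    responses["q1_satisfaction"].str.strip().isin(VALID_CATEGORIES)
)

# Numeric copy for the aggregate statistics; nonconforming answers become
# NaN instead of silently disappearing from the file.
responses["q1_numeric"] = pd.to_numeric(
    responses["q1_satisfaction"].where(responses["q1_conforming"]),
    errors="coerce",
)

n_excluded = (~responses["q1_conforming"]).sum()
print(f"Nonconforming responses retained for review: {n_excluded} of {len(responses)}")
```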
By design, our use of quantitative data and statistics is intended to give us an aggregate picture: to summarize trends and to make decisions for the program as a whole, not necessarily for each individual. There are also creative qualitative methods we may employ to capture the more individualized responses, but even these efforts will not recover the information and perspectives that are simply missing from different parts of the survey. Further, these responses will be missing from the easily digestible summary statistics. So, I am left with these questions: Are we unintentionally marginalizing a group of people in our data analysis process? Do we even have enough information to identify the group we are excluding? Is it appropriate to measure satisfaction with such broad strokes? Are we giving a voice to the voiceless, or silencing those voices that do not fit the anticipated response categories? When we use quantitative information to inform and improve health systems, are we confident that we are including the perspectives of those who are often not heard?
If the numbers are small enough and they don’t change the aggregate picture, we don’t worry too much about these issues. But should we worry more about the act of exclusion itself, at least from a philosophical, non-statistical perspective? Should we simply dismiss these outliers, whether or not they skew our summary statistics?
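One small, practical habit (again sketched with the hypothetical data from above, and no answer to the philosophical question) is to report the exclusion rate alongside every summary statistic, so the act of exclusion at least stays visible in the easily digestible numbers:

```python
import pandas as pd

# Continuing the sketch above: q1_numeric holds the conforming answers,
# with NaN wherever a client answered outside the 1-5 scale.
q1_numeric = pd.Series([4.0, 5.0, None, 2.0, None])

# Report the statistic and the exclusion rate together, so readers can
# see how many voices the aggregate picture leaves out.
print(f"Mean satisfaction (conforming responses only): {q1_numeric.mean():.2f}")
print(f"Responses excluded from this mean: {q1_numeric.isna().mean():.1%}")
```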
To be inclusive in our information-gathering processes is not easy, and certainly not simple to standardize. An obvious solution is to design a participatory evaluation process that deliberately includes the voices of the more marginalized clients. With limited resources and capacity, though, such a participatory process is often not feasible on a consistent basis, and it may not be ideal for measuring trends consistently over time. In Public Health, we often leave the needs of the individual to the clinician or direct service provider who can most fully assess them, but we miss this individualized perspective at the policy and program-design level. So, as I deal with “data cleaning” for this survey project, part of me wishes for a messier, less prescribed painting on the canvas: one that is harder to discern instantly but filled with depth and nuance.