Participation and the Crowd
The increasing presence of user-generated data has reached every level of our lives, from basic, non-professional daily applications to highly complex financial or even governmental analysis. From the rise of Web 2.0, Wikipedia and blogs to Volunteered Geographic Information (VGI), this phenomenon has shaken traditional approaches to data analysis. Or, at least, it should have.
This phenomenon presents both opportunities and risks that should be equally addressed (Coleman et al., 2009). Although it could be argued that it is unleashing a huge potential for more democratic knowledge creation, transparency and data volume, some issues have been detected, such as the types of ‘produsers’ (Coleman et al., 2009), privacy concerns and how this new type of data can be aligned with more conventional methods (Goodchild, 2007). When specifically discussing ‘produsers’, their potential motivations are important to bear in mind, as they can entail specific biases. How can we detect whether a user is contributing out of altruism or has professional or personal interests behind the contribution? Or, even worse, what if the data generation is part of a wider agenda with malicious or criminal intent? (Coleman et al., 2009). For these reasons, users’ motivations to contribute are an important field to address in order to understand the different possible scenarios and to be able to deploy appropriate vigilance and legal systems (Coleman et al., 2009).
I would also like to draw attention to some biases that users might introduce unintentionally. One such bias comes from the presence of “super users”, also called “zealots” (Anthony et al., 2005), “insiders” (Swartz, 2006) or “elite users” (Coleman et al., 2009): a small pool of users that contributes a large percentage of the work. Even with the best intentions, these users might be inadvertently introducing huge biases in the data, as their voices will be prioritized over those of the rest of the community. That will impose a view of “the people” in the data that is not representative of the whole. In the case of geodata, this might also imply that some areas receive more attention than others, creating a spatial imbalance.
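As a rough illustration of how this concentration of contributions could be quantified, the following Python sketch computes two simple indicators from a list of contributor identifiers, one entry per contribution. It is a minimal sketch and not a method taken from the cited authors: the function names `top_share` and `gini`, the 34% cut-off and the sample data are all hypothetical, and the Gini coefficient is only one of several possible inequality measures.

```python
from collections import Counter


def top_share(contributor_ids, fraction=0.1):
    """Share of all contributions made by the most active `fraction` of users."""
    # Contributions per user, most active first.
    counts = sorted(Counter(contributor_ids).values(), reverse=True)
    if not counts:
        return 0.0
    # Always include at least one user in the "top" group.
    k = max(1, int(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)


def gini(contributor_ids):
    """Gini coefficient of contributions per user (0 means perfectly equal)."""
    # Contributions per user, least active first.
    counts = sorted(Counter(contributor_ids).values())
    n = len(counts)
    total = sum(counts)
    if n == 0:
        return 0.0
    # Standard rank-weighted formula for values sorted in ascending order.
    weighted = sum(rank * c for rank, c in enumerate(counts, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical sample: user "a" made 8 of the 10 contributions.
edits = ["a"] * 8 + ["b", "c"]
print(top_share(edits, fraction=0.34))  # share held by the single most active user
print(gini(edits))
```

The same two functions could in principle be applied to area identifiers (for example, grid cells) instead of user identifiers to get a first impression of spatial imbalance, with the caveat that areas with no contributions at all would not appear in the list and would need to be accounted for separately.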
All in all, although user-generated data might be able to scale up the potential of current systems, we should be aware of its possible dangers. In particular, we should be attentive to the possibility that, through an automatic filtering of contributors, user-generated data could be replicating the social biases of the dominant class.