WEEKLY REFLECTION: WEEK 4

Summary and Reflection for Week 4 [Data Collection]

Core Reading


Main Terms of Week 4 Lecture [Data and Power]

  • Data: “units or morsels of information” (Gitelman and Jackson, 2013, p. 1) that can be processed by a computer (Posner and Klein, 2017).
  • Data Collection: the process of gathering and measuring information on specific topics and variables, which are often related to a system and have a structure.
Data Collection:

Data collection allows researchers to investigate and answer questions, and to reach different conclusions depending on what is gathered. It can be weaponized to cause harm through biased datasets, but it can also be used to create archives and snapshots of information, places, and systems at different points in time.



Main Tasks of Week 4 Workshop [Data and Data Analysis]

Collect data from online sources to create a dataset.


Reflection of Reading

Crawford's chapter describes the common practice of systems "treating everything as data that can be taken away." More generally, the hierarchy of training sets inherits and amplifies old biases layer by layer. From early corpora and email archives to later large image datasets, developers have equated "large enough" data with "real enough." As a result, which groups are seen, what labels are attached to them, and who remains unseen are all determined by the rigid hierarchical relationships within these datasets. The slogan "more data is better" seems neutral, but in practice it pushes already overlooked groups even further to the margins.

In “What Gets Counted Counts”, the authors point out how simplistically gender is divided in both digital and real-world systems. When registering on websites, applying for passports, or even going through airport security, people are often offered only "male" or "female" options. For non-binary people, this seemingly simple choice brings emotional distress and the erasure of their identity. The practice also makes uncounted groups invisible in policy-making and resource allocation. Furthermore, although many platforms now allow users to choose among multiple gender labels on the front end, the back end still reverts to a binary gender for advertising purposes. This may seem inclusive on the surface, but the underlying business logic remains hierarchical.
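To make the front-end/back-end gap concrete, here is a minimal Python sketch of how it could look in code. Everything in it is an assumption for illustration: the sample profiles, the `collapse_to_binary` function, and its arbitrary `fallback` rule are hypothetical and do not describe any real platform's implementation.

```python
from collections import Counter

# Hypothetical self-reported labels collected by an inclusive sign-up form.
profiles = [
    {"user": "a", "gender": "female"},
    {"user": "b", "gender": "non-binary"},
    {"user": "c", "gender": "male"},
    {"user": "d", "gender": "genderfluid"},
    {"user": "e", "gender": "female"},
]

BINARY_LABELS = ("male", "female")


def collapse_to_binary(label, fallback="male"):
    """Hypothetical back-end rule: keep binary labels, force all others into a fallback."""
    return label if label in BINARY_LABELS else fallback


# What the user sees and chooses on the front end.
front_end_counts = Counter(p["gender"] for p in profiles)

# What a binary-only back end would store for ad targeting (assumed behaviour).
back_end_counts = Counter(collapse_to_binary(p["gender"]) for p in profiles)

# People whose self-description no longer exists in the stored data.
erased = sum(1 for p in profiles if p["gender"] not in BINARY_LABELS)

print("Front end:", dict(front_end_counts))
print("Back end: ", dict(back_end_counts))
print("Self-descriptions erased:", erased)
```

In this toy example the form records four distinct labels, but the stored data contains only two. The people who chose the other labels are still counted, just under a category they did not choose, which is the kind of erasure the chapter describes.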

In my daily use of Douyin (the Chinese counterpart of TikTok), my digital experience is profoundly shaped by its extraction logic and algorithms. To use Douyin, I broadly agree to the collection of the following: account and device information, browsing and viewing time, likes, comments, reposts, search terms, following relationships, possible advertising identifiers, and the metadata and identification tags of the content I upload. Although these are usually written into the terms of service and privacy policy, "agreement" is often a one-click checkbox that is rarely negotiated item by item, and the platform's default product design means that I effectively consent simply by using it.

These small traces of interaction data directly drive the recommendation algorithm. The feed is not simply tailored to my preferences; rather, my interactions are treated as "raw data" to be abstracted and manipulated by the machine. My experience is therefore highly customized, yet also strictly limited: my choices on Douyin are already constrained by the preset classification system and algorithmic rules. My usage habits are used to continuously train the platform's algorithm, making it more effective and accurate. This makes me realize that every action I take on Douyin contributes free labor that reinforces this data extraction system.


Reflection of Creating Questionnaire

In creating a questionnaire with Microsoft Forms, I found that the biggest challenge lay in design and wording. The questions needed to be clear and unambiguous, guiding respondents to provide truthful information while avoiding bias, and this required considerable deliberation. Furthermore, since we only needed a small number of questions, each one had to be chosen carefully so that the questions were interconnected and supported by content analysis.

If time and resources had allowed, I would have done a few things differently. First, I would have devoted more time to thorough pre-testing. Second, I would have further refined the questionnaire content, detailing the themes and questions to create stronger connections between them.

This experience gave me a deeper understanding of the delicate relationship between data and power. The process of collecting, analyzing, and interpreting data inherently carries power. By setting the questions, questionnaire designers define the scope of discussion, and their interpretation of the data directly influences the final decisions and narratives. We must therefore recognize that while data may appear objective, its presentation and conclusions are shaped and driven by people. This is a reminder to maintain a high degree of transparency, ethical responsibility, and critical thinking when processing and sharing data.