Data Ethics



Kelly McConville & Kevork Horissian

Library Donuts & Coffee | Fall 2025

Data Ethics

  • Considered a subfield of applied ethics.

  • Data ethics studies and evaluates moral problems related to:

    • Data
    • Algorithms
    • Corresponding Practices

Ethical Guidelines

The American Statistical Association has published Ethical Guidelines for Statistical Practice.

Purpose of the Guidelines:

“The ethical guidelines aim to promote accountability by informing those who rely on any aspects of statistical practice of the standards they should expect. Society benefits from informed judgments supported by ethical statistical practice. All statistical practitioners are expected to follow these guidelines and encourage others to do the same.

In some situations, guideline principles may require balancing competing interests. If an unexpected ethical challenge arises, the ethical practitioner seeks guidance, not exceptions, in the guidelines. To justify unethical behaviors, or to exploit gaps in the guidelines, is unprofessional and inconsistent with these guidelines.”

Let’s consider some morally charged decision points in the data analysis process.

Data Ethics Case Studies

OkCupid Example

In 2016, a group of researchers publicly released a dataset of nearly 70,000 users of the online dating site OkCupid. The dataset included variables such as username, age, gender, location, relationship interests, and personality traits. When asked whether they had tried to anonymize the data, the researchers said “No. Data is already public” and “Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.”

  1. What is your reaction to the researchers’ statement?
  2. Do you think users of social media platforms are adequately informed as to how their data might be collected, stored, and used?
  3. Who benefits from this type of large-scale collection of personal data? Who is harmed? What might those harms or benefits look like?

Model for Predicting Clicks

Suppose that a job site created a machine learning model that predicts the likelihood that someone will interact with a job posting. The model finds that people from a certain demographic group are more likely to interact with a nannying job post than with a construction job post. When a new person from that demographic group visits the job site, the site uses the model's prediction and shows them nannying job posts instead of construction job posts.

  1. What is algorithmic bias? Describe how algorithmic bias relates to the given example.
  2. What other ethical issues are present in this example?

Publish or Perish: Replicability

The aphorism “publish or perish” refers to the intense pressure researchers face to publish academic work in order to sustain their careers. At the same time, the most prestigious journals in a scientific field receive many more submissions than they can publish, leading to high rejection rates.

  1. Describe two ways in which the competition to publish has contributed to the replicability crisis.
  2. Who benefits when results that might not replicate are published? Who is harmed? What might those harms or benefits look like?
  3. What is the best way to ensure that science produces both novel findings and replicable results?
  4. Should all scientists be required to engage in replication work or only a select group? What are potential advantages and disadvantages of either option?

Data Ethics at Bucknell