Ethics at the Core

COVID-19 Coverage : See how the pandemic is impacting the world of higher education.

Ethics at the Core

April 2019

By Carrie Klein and Michael Brown

During the past decade, campuses have begun leveraging large sets of data to uncover previously unseen trends and drive strategic decision making. From measuring financial health, to optimizing enrollment and aid packages, to supporting student learning and advising, data analytics has the potential to help an institution meet its mission and improve student success.

For instance, the capacity to leverage diverse data sets can provide opportunities for campuses to better serve students by learning about their interests and behaviors and by encouraging them to make successful choices.

Consider the University of Arizona, Tucson, which used identification card swipes to track student location and time-stamp data to gain insight into how students were spending their out-of-class time. By tracking card use for vending machine purchases, library interactions, and residence hall access (among nearly 1,000 other data points on campus) and combining these data with demographic and performance data, the institution could predict the likelihood that a particular student would drop out. That likelihood might be higher, for instance, if a student was accessing his or her room late at night or spending limited time in study sessions at the library. These predictions were then given to advisers, who were able to intervene and work with students to alter their routines and establish plans for success. By using analytics, in concert with adviser interventions, the university increased student retention rates by almost 3 percent.

Data such as these can be used for more than just supporting student success. A hallmark of analytics technologies is the ability to recombine and repurpose data for new priorities and analyses. For instance, when recombined with facilities-based data sets, student location data can also help an organization understand space and resource use and needs on campus. The flexibility of analytics to generate metrics on various institutional priorities and to inform decisions is among its strengths.

Yet, inherent in the unique nature of data analytics is the potential for users to violate privacy and ethics rules. With vast amounts of data being produced by students and being used by colleges and universities to meet institutional goals, it is important to consider the parameters of that data production and use. Given higher education’s legal duty of care to students, business officers and other leaders on campus must pay attention to, and understand, the ethics of using student-generated data, which is central to unbiased and accurate decision making.

Beware of Black Boxes

Analytics systems are often referred to as “black box technologies” because they are typically proprietary in nature. Often, vendors offer little transparency about what is collected, analyzed, and used. As a result, data users are in the dark about which variables vendors use to make predictions about which outcomes. This lack of transparency creates an opening for vendors to create biased and decontextualized algorithms—the formulas used to make predictions—which can hamper the ability of organizations to make accurately informed decisions.

Scholars are working to shed light on the potential for discrimination and bias to be baked in the design of algorithmic-driven systems such as analytics technologies. These include Safiya Noble, who has written Algorithms of Oppression: How Search Engines Reinforce Racism (NYU Press, 2018), and data scientist Cathy O’Neil and her book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Crown, 2016).

Algorithmic discrimination and bias can appear in higher education when, in an effort to provide intervention with resources and support, algorithms are used to identify students who may be at risk of failing. Many algorithms are constructed from performance and demographic variables to predict an outcome. For instance, a particular algorithm or predictive model might tell you who your low-income students are and then flag all low-income students for an intervention. Because a student’s demographics (race, gender, class, etc.) are immutable, it is unethical for campus leaders to correlate demographics with performance.

The capacity of algorithmic correlations and predictions to provide a depth of understanding to help solve problems related to inequities is limited. Furthermore, they can actually reflect and magnify patterns of discrimination. For instance, based on their characteristics, course load, and GPA, students could be dissuaded from taking a certain course or majoring in a field of interest because data analytics tools suggest, in advance, that they may not be successful. As a result of biased algorithms, students who are less prepared for college, who are first-generation college students, or who come from less affluent backgrounds could be more likely to be tracked onto specific pathways for completion.

While algorithms can be powerful tools, it is important to remember that they do not always take into account the interests, desires, and goals of students. Nor do they take into account the structures that might exist as deterrents to student success—variables that actually may be within an institution’s control to change.

The importance of understanding institutional and student contexts is vital when interpreting data, yet is often impossible given the lack of transparency of analytics tools. Algorithms that might work within one context do not always translate to another. For instance, algorithms developed and trained at small liberal arts colleges may not scale effectively for use at a large research institution or within a community college environment, where the student and institutional contexts are decidedly different.

The harm of black box systems is in their potential to inform data-driven decision making that is uninformed by the specific priorities and goals of an institution or the needs and interests of its students. Despite a lack of evidence, analytics users often assume that algorithmic decisions have greater accuracy, precision, and consistency than the decisions of their human counterparts. But black box algorithms actually elevate the importance of the correlation or prediction above data analysis because the details of the data are hidden. Assuming that using data analytics and black box algorithms is a superior way to make decisions is dangerous because it obscures and decontextualizes the nuances present in data and ultimately sidesteps the students who produced these data.

Codes of Practice

The good news is that business officers can take the lead on developing collaborative codes of practice—the guiding mechanisms for appropriate data use—and data governance policies that recognize the vested interest of students in how, where, and by whom their data are used. Ideally, business officers will do so through the lens of “data justice” and “data care.” These terms are evolving, but they generally speak to the need for an equitable approach to data use that puts the needs, interests, and contexts of those producing data (for example, students) at the forefront of analytics design, analysis, and decision making.

Given the nature of analytics and the consequences of the black box technologies that deliver these data—not only to inform institutional priorities, but also to shape student outcomes—it is necessary for higher education administrators to approach the use of these data thoughtfully. Unfortunately, there are few existing, and no universally accepted, guidelines or policies for practice. Current policies or guidelines are often limited to explanations of federal privacy policies or articulation agreements centered on data security and access.

Data security and access are certainly central to using data ethically. While organizations are legally and ethically bound to secure data and establish rules for access, related policies and practices are often unclear. Similarly, and equally ill-defined, data ownership and consent are important and complex components of ethics and privacy. Ownership of data collected via analytics technologies often lies with data users (colleges and universities) and not with data producers (students). However, ideas about data ownership are evolving, with calls for students to become the owners or co-owners (alongside their universities or technology vendors) of the data they produce. If individual student users are potential data owners with the ability to consent to participate (or not) in producing and using data, then clearer data ownership and consent policies and terms of use agreements are needed to ensure ethical use.

In an effort to provide better guidance and address ethical issues, scholars have developed a number of codes of practice that focus on promoting data transparency, security, ownership, control, stewardship, and trust. Although these codes of practice provide useful guidance, they are still evolving and fall short of comprehensively addressing the contexts and needs of organizations and students from an ethical perspective.

To address the bias and potential for discrimination in data analytics, Linnet Taylor—a data analytics researcher at Tilburg University in the Netherlands—has developed a data justice framework based on three pillars:

1. Visibility includes access to information, representation, and privacy, and focuses on how individuals are represented, profiled, and monitored through analytics systems.

2. Engagement with technology includes autonomy in making technology-related choices (including the choice not to use or be used by technologies) and sharing the benefits of data collection and use.

3. Nondiscrimination includes the ability to challenge bias within, and to be free from discrimination in, big data algorithms.

While this framework for data justice provides a useful foundation for establishing ethical use of data analytics, it does not speak to the unique nature of higher education. As noted, higher education institutions have a legal duty of care. International professors and lecturers Paul Prinsloo and Sharon Slade have been pioneering research on the ethical use of analytics in higher education. They argue that care should be extended to the collection, storage, and use of student data. They also contend that care is required because current policies and data justice initiatives alone do not take into account the specific contexts within higher education or the complexities of individual students. Care-based use of analytics should incorporate codes of practice and pillars of justice from a student-centered perspective.

Guidelines for Ethical Use

Data justice, care, and associated codes of practice combine to form a touchstone for grounding policies, processes, and practices of ethical analytics-informed decision making. From enrollment planning, to benchmarking, to strategic initiatives, to student support, codes of practice that incorporate the principles of data justice and care can be used as a framework for centering students in data analytics processes, unearthing the contextual complexities that influence data-informed decision making, and providing a useful starting point for conversations about ways to improve data-informed processes.

We recommend the following guidelines for approaching data analytics from a more ethical and equitable perspective:

Consider context. Data are invaluable for facilitating efficient and effective solutions to many challenges campuses face. However, campuses are as varied as their students. Each campus has its own mission and priorities, and students bring their own myriad experiences and goals. These contexts matter—and data are only as valuable as the meanings campus leaders derive from them.

Campus audits that are focused on understanding an institution’s readiness, capacity, and unique culture are helpful ways to better align analytics with an institution’s needs and priorities. Analytics and institutional alignment are especially important when working with predictive or prescriptive data, as algorithms developed in one context will not easily translate to the unique context and needs of another environment. To help contextualize data, higher education should approach the purchase, use, and implementation of analytics tools with the mission and priorities of the institution and the specific needs, goals, and experiences of students top of mind.

Collaborate broadly. Improved collaboration across higher education divisions and departments can help focus attention on student needs and interests. At the University of Georgia, Athens, information technology and institutional research teams are collaborating in an effort to understand data analytics from multiple perspectives, and other institutions have similarly combined their business intelligence and institutional research teams. By pooling expertise from a variety of campus stakeholders, colleges and universities can derive a more complex and nuanced understanding of analytics and implement more holistic interventions and responses to those data. Such an approach has great potential in higher education and encourages institutions to go even further in engaging stakeholders across campus in analytics initiatives.

An ethical approach based on data justice and data care would extend these collaborations beyond the better integration of various offices to include faculty, adviser, and student users of these tools—campus members who are often absent from analytics development and decision making. Various studies have indicated that data analytics and tools are more likely to be used inclusively when a broad group of stakeholders (via formal councils or meetings) is involved in their development and implementation. More importantly, collaborations that include data producers in addition to data users are more likely to generate data interpretations of greater relevance to the specific contexts, priorities, and goals of their organization.

Inclusive data governance models are not a new idea, but they remain an underutilized strategy. Rio Salado College, Tempe, Ariz.; University of Michigan, Ann Arbor; and Georgia Gwinnett College, Lawrenceville, all have good higher education models for leveraging the perspectives of faculty, advisers, and students in data policy development.

Push for transparency and open source systems. When purchasing proprietary, vendor-based analytics systems, it is essential to understand which variables comprise analytics tool algorithms, how those algorithms are constructed, and how resulting data interventions are determined. Higher education institutions can use their considerable purchasing power to push for greater transparency from outside analytics vendors.

In addition to advocating for transparent data from vendors, colleges and universities should consider adopting and adapting open source systems, which give institutions rights and full access for researching, modifying, and sharing system data. Transparent and open systems result in a better opportunity to understand institutional and student measures and to ensure ethical use of data.

Furthermore, institutions must be transparent about how data are collected, stored, used, and shared. Through easily accessible policies and communications, students should be able to understand what the institution knows about them and how it uses that knowledge. To improve equity, students should also have a level of control over providing consent for data use, management, and sharing, and the ability to opt out of analytics-based systems.

Refine policies to address access, security, and privacy issues. As analytics technologies pull vast amounts of data from a variety of sources and centralize those data into a single system, proper authorization and verification policies and processes, at a minimum, are essential for interaction with these systems. Equally important is a commitment to establishing the human and capital resources necessary to guard these data against breaches, to ensure student privacy, and to establish rules for accessing and using data analytics.

Beyond simply purchasing and deploying analytics tools, colleges and universities must work with vendors and their own information technologists to ensure data security and clarity regarding data-use processes. This clarity must extend to student access and consent. Arguably, students—being producers of the data that are used to improve institutional outcomes, in addition to their own individual outcomes—should have full access to their data and collaborate in the data consent and collection process. Moreover, students should have the right to consent to how their data will be used and for what purposes.

Address data inequities. Ethical and equitable use of data analytics requires that users address the structural, organizational, and individual inequities that exist in higher education, which can be exacerbated through the use of analytics and their algorithms. Auditing current policies and procedures and creating new ones where needed, along with developing communications related to ethical and equitable analytics use, is a good place to start.

Another good starting point is to provide training and development for all campus members regarding equitable analytics use, as well as implicit bias within postsecondary data and institutions. Furthermore, students—and the faculty and advisers interacting with them—should understand the predictive and prescriptive nature of many analytics technologies and how to use that data. They should also have a path for questioning, disputing, or remediating predictive or prescriptive interventions or privacy violations.

An Absolute Necessity

Data analytics offers great potential to improve organizational outcomes for higher education. Through real-time, visualized, and diverse measures, a more complex and a clearer view of institutional efficiencies can be realized. However, the collection and use of these data must exist within an ethical framework that leverages strategies for data justice, care, and codes of practice.

Ethical analytics policies and practices can help mitigate potential privacy and ethics violations and can provide the context and data-informed decision making needed to create a more holistic picture of higher education’s future. Ethical use of data analytics is not only the right thing to do, it also reminds us of why we do the work that we do. As members of a campus community, business officers and campus leaders are tasked with creating an institution of learning that helps students—who are at the heart of our work—to succeed.

CARRIE KLEIN is a Ph.D. candidate and research assistant in the higher education program, George Mason University, Fairfax, Va. MICHAEL BROWN is an assistant professor in higher education and student affairs in the school of education, college of human sciences, Iowa State University, Ames.

Business officers can take the lead on developing collaborative codes of practice and data governance policies.

Data ownership and consent are important and complex components of ethics and privacy.

Institutions can use their considerable purchasing power to push for greater transparency from outside analytics vendors.

Picking a Predictive Analytics Partner

Campuses today are overflowing with data—data that, if used appropriately, can help colleges and universities operate more efficiently and effectively. With the sheer volume of data available, it is sometimes difficult for an institution to know where to start its analytics efforts. Many institutions seek a partner to help them with their analytics needs, and these partnerships can bring benefits to both parties.

Yet, choosing a vendor can itself be one of the more challenging aspects of implementing predictive analytics at your institution. Among other risks, a bad fit between the campus and its vendor may decrease the likelihood of the system boosting student success. Failure to ask the right questions can also result in ethical lapses in how data are used and particular groups are affected.

Business officers are in a powerful position to ask questions of potential vendors about model transparency, data ownership, and protections for student data. Campus leaders should consider asking the following questions of prospective predictive analytics vendors:

What data go into your prediction? Ask the vendor to provide a list of all data elements the company uses in its predictive models and algorithms. With this information, institutional leaders can determine the sensitivity of the data elements and document where those data are stored. Documenting the data elements provides an important record of what vendor predictions are based on. Vendors should commit to providing the institution with an updated list of variables whenever they make changes.
What kind of data make up your training data? One way bias creeps into algorithms is through nonrepresentative training data. Some vendors use data from multiple institutions to make their models more robust. Ask vendors to document the diversity of the institutions and student data that helped train their models. College and university leaders should also ask how vendors plan to customize the algorithm for their institutions. Many vendors create their model using only the institution’s data. In that case, business officers should ask how far back the training data go, since it’s important to balance having the most recent data possible to reflect the current population with ensuring that there is a valid sample size to train the algorithm.
Can you test your algorithms on my institution’s data? The robustness of algorithms matters, but so does campus context. Before signing a long-term contract with a vendor, campus leaders should ask if the vendor is willing to conduct a pilot test. Some vendors will agree to test their algorithms on your institution’s data. This can help indicate if the system is discriminatory—overidentifying certain groups of students as “at risk” on campus or identifying all students as “at risk”—which would be of limited use to the institution. If a vendor doesn’t agree to a pilot test, it should show how its models performed at institutions similar to yours and agree to a disparate impact analysis of its tool output, after getting the contract, to ensure that the algorithm is correctly identifying at-risk students.
Who owns the cleaned source data, repurposed data, and byproduct data? In some cases, the vendor will not agree to return the cleaned source data to the institution, instead destroying it at the end of the contract. The contract should clearly lay out who owns what data and under what terms. The contract should also prohibit the vendor, or any partner of the vendor, from selling student data.
What are your physical, administrative, and technical data safeguards? Vendors represent a serious potential for data breaches if they do not have the correct protections in place. The massive breaches of customer data at Target Corp. and Home Depot Inc. in recent years happened because of flaws in vendor systems. While most colleges and universities have IT staff who know what questions to ask vendors, given the liability at stake, it is a good idea for business officers to check this box as well. Ask vendors how the company controls access to the institution’s data; if the data are encrypted using an industry-standard encryption both when stored at the vendor and when transferred between the institution and the vendor; and if the data will be stored in a physically secure location.

As demands for big data and artificial intelligence systems continue to increase on campus, business officers play an important leadership role in getting these collaborations right. For additional ethical and practical considerations for collaborating with a predictive analytics vendor, read Choosing a Predictive Analytics Vendor: A Guide for Colleges by searching the title at www.newamerica.org.

IRIS PALMER is senior policy analyst, education policy program, New America, Washington, D.C.

Structure Your Data Governance

Like many of our younger students, Georgia Gwinnett College (GGC), Lawrenceville, has grown up with technology—and by default, has learned to manage lots of data. A public, bachelor’s degree-granting institution in the Atlanta metro area, GGC opened in 2006 with 118 students and today serves more than 12,500 students. Our institution’s rapid growth during an age of prolific data availability has taught college leaders valuable lessons about how to effectively leverage analytics.

College leaders facing pressure from many sides for data analysis and use helped GGC make an important move. They created a data governance and management structure, along with policies and processes, to support efficiency in data use and communication and ensure that the institution’s use of data remains within the bounds of ethical and legal frameworks. Taking the time to develop an appropriate governance structure and culture that addresses ethical concerns related to data access and use can help every institution mitigate risk.

Establishing Policies, Roles, and Terms

Our first step was the creation of a data management and classification policy. We reviewed existing policies from many other institutions to craft a policy to fit GGC’s unique needs. This policy addresses access, use, security, privacy, ethics, and compliance with existing laws and regulations.

During this process, we identified all databases on campus along with their corresponding data stewards—those individuals who have the responsibility, authority, and accountability to manage a particular database. This list of existing campus databases and data stewards is updated annually. We also identified data categories including public, sensitive, internal, confidential, and highly sensitive to help facilitate the intake process.

These efforts required collaboration among multiple offices across campus to ensure that appropriate and ethical practices were followed for all the different kinds of data produced and used on campus.

In tandem, we created a data managers team led by the college’s executive director for the office of plans, policies, and analysis. The team consisted of representatives from various offices, including institutional research, educational technology, enrollment management technology, registrar, and advancement, with occasional attendance by representatives from financial aid and student accounts.

This team conducts intake and is responsible for consistent management of all data requests. It meets weekly to review requests submitted online through a data request form accessible only to employees. All individuals requesting data must sign a confidentiality agreement acknowledging their responsibilities with data usage. This policy and process allowed us to create a single point of contact on campus for requesting data and a single point for multioffice reviews of requests. Each request initiates a conversation about responsibility for the requested data and ensuring the accuracy of the data provided to requestors.

Asking the Right Questions

Every institution will approach and structure its data governance and management differently. However, just as we did at GGC, there are questions that every institution must ask to ensure that an effective governance structure can handle the ethical questions and considerations that inevitably arise surrounding data access and use. These include:

Policy. How effective and comprehensive is your data governance policy in regulating your organization’s data usage? Are your policies clear regarding data governance, confidentiality, and protection?
Process. How effective is your data request process? Is it responsive, timely, and consistent?
Culture. Does your institution value the ethical use of data? Is there compliance with the process or do you have to constantly police activity regarding data usage?
Chief data security officer. Does this individual exercise broad-based authority?

Asking these questions from the start can help ensure that you build an appropriate data governance and management structure that will mitigate institutional risk.

JULIANA LANCASTER is executive director for the office of plans, policies, and analysis; LAURA LEDFORD is executive director of registration services; and JENNIFER STEPHENS is deputy chief of staff, Georgia Gwinnett College, Lawrenceville.

What’s in Your Algorithm?

Algorithms are the underlying formulas that govern analytics technologies. These formulas are constructed from various institutional and student variables. Algorithms are the mechanisms through which data analytics is able to yield insights into institutional and student performance. The power of algorithms lies in their ability to find correlations and make predictions.

However, like all data, analytics and algorithms are value-laden, meaning that they reflect the biases that exist in the environments from which they are culled. As such, the potential exists, even when unintended, for analytics developers to integrate bias into their algorithms, which can result in discriminatory outcomes.