Ethics in Data Science and Proper Privacy and Usage of Data

Prateek Majumder 09 Aug, 2023 • 8 min read

This article was published as a part of the Data Science Blogathon.

Data drives decisions and can have a large influence on businesses. However, this valuable resource is not without its drawbacks. How can businesses acquire, store, and use data ethically? Which rights must be protected? Anyone in a business who handles data needs to follow certain ethical practices. Data is often someone’s personal information, so there must be a proper way to use it while maintaining privacy.

What is Ethics?

The term “ethics” comes from the Greek word Ethos, which means “habit” or “custom.” Ethics instructs us on what is right and wrong. Philosophers have pondered this crucial topic for a long time and have a lot to say about it. Most people associate ethics with morality: a natural sense of what is “good.” We as humans live in a society, and society has rules and regulations. We must be able to decide what is right and what is wrong. Ethics draws on feelings, laws, and social norms to help us tell right from wrong. Our ways of life must be reasonable and live up to the standards of society.

Why Is Ethics in Data Science Important?

Today, data science has a significant impact on how businesses are conducted in disciplines as diverse as medical sciences, smart cities, and transportation. Whether it’s the protection of personally identifiable data, implicit bias in automated decision-making, the illusion of free choice in psychographics, the social impacts of automation, or the apparent divorce of truth and trust in virtual communication, the dangers of data science without ethical considerations are as clear as ever. The need for a focus on data science ethics extends beyond a balance sheet of these potential problems because data science practices challenge our understanding of what it means to be human.

Algorithms, when implemented correctly, offer enormous potential for good in the world. When we employ them to perform jobs that previously required a person, the benefits may be enormous: cost savings, scalability, speed, accuracy, and consistency, to name a few. And when a system is carefully designed and trained on representative data, its outcomes can be more consistent than a human’s and less prone to social prejudice.


A Digital World

We all live in a digital world, where our day-to-day life depends on applications run by tech companies. When we need a taxi, we call an Uber. When we want to order food, we use Zomato, and so on. These companies hold our personal data: our email IDs, phone numbers, addresses, purchase histories, and more. The protection of personal data is thus an important concern today. Perhaps no aspect of data science ethics has received more attention in recent years than the safeguarding of personal data. Our relationships with social and economic networks have undergone a digital revolution, revealing who we are, what we believe, and what we do.

In India, the Personal Data Protection Bill affirms the rights of digital citizens and addresses the hazards of commercial exploitation of personal and personally identifiable data. The Data Protection Bill is a long-awaited and much-needed piece of legislation that would replace India’s outdated and inadequate data protection policy. It has the potential to raise users’ understanding of their privacy and hold data custodians and processors accountable.

Who regulates and owns our Data?

In codifying ethical benchmarks such as the right to be informed, the right to object, the right to access, the right to rectification, and the right to be forgotten, these legal frameworks attempt to rebalance the inequitable relationships of power and influence between organizations and individuals.

The divisions between public and private, individuals and society, and the resource-rich and resource-poor are being redefined as data becomes the new currency of the international economy. Who owns personal data, and which rights can be assigned with express or implied consent? To what degree should governmental and commercial institutions be permitted to gather and control enormous databases of human interaction? How much should these data controllers and processors be held liable for the loss or abuse of our personal information?

Data Science Ethics

Analysts, data scientists, and information technology professionals must be concerned about data science ethics. Anyone who works with data must understand the fundamentals, and must report any instance of data theft or of unethical data collection, storage, or use.

For example, from the first time a consumer enters their email address on your website to the time they purchase your goods, your organization may gather and keep data about their journey. People on the marketing team might be working with this data. Throughout, the person’s data must be protected.

Protected data has been exposed on the internet in the past, harming the people whose information was leaked. Misconfigured databases, spyware, theft, or publishing on a public forum can all lead to data leaks. Individuals and organizations must use safe computing practices, conduct frequent system audits, and adopt policies to address computer and data security. Companies must take appropriate cybersecurity steps to prevent the leakage of data and information. This is especially important for banks and financial institutions, which deal with customers’ money. Policies should also ensure that protections are maintained even when equipment is transferred or disposed of.

Some Ethical Practices

Making Decisions:

Data scientists should never make decisions without consulting the client, even if the decision is in the interest of the project. The aims and objectives of a project must be understood by both the data scientists and the client.

Let’s say a data scientist wishes to take action on behalf of a customer on a certain ongoing project. Even if the action is advantageous to the client and the project, it must be explained to the client, and no choice should be made on their behalf. Data scientists should only make decisions when it is expressly stated in the contract or when their authority allows them to.

Privacy and Confidentiality of Data:

Data scientists are continually involved in producing, developing, and receiving information. This often includes data concerning client affiliates, customers, workers, or other parties with whom the clients have a confidentiality agreement. Regardless of the sort of sensitive information, it is the data scientist’s responsibility to protect it. Such information should be shared or discussed only when the client gives permission. Complete privacy of clients’ or customers’ data must be maintained.

Even if a consumer consents to your organization collecting, storing, and analyzing their personally identifiable information (PII), that doesn’t mean they want it made public.

Personally identifiable information includes:

Phone Number, Address, Full Name, PAN card number, and so on.

To preserve people’s privacy, make sure you keep the information in a secure database so it doesn’t get into the wrong hands. Two-factor authentication and file encryption are two data security measures that help safeguard privacy.
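As a complement to the safeguards above, PII can be pseudonymized or masked before it reaches analysts. The following is only a minimal sketch using Python’s standard library; the function names, the sample record, and the way the key is generated are illustrative assumptions, not a prescribed method. In practice the key would come from a secrets manager and be kept apart from the data.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key, generated here only so the sketch runs.
# A real system would load it from a secrets manager, never from source code.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest so the raw value need not be stored for analysis."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_phone(phone: str) -> str:
    """Hide all but the last four characters of a phone number for display."""
    if len(phone) <= 4:
        return "*" * len(phone)
    return "*" * (len(phone) - 4) + phone[-4:]


# Made-up sample record, for illustration only.
record = {"full_name": "Asha Rao", "phone": "9876543210"}

# The version handed to analysts carries no raw identifiers.
safe_record = {
    "user_key": pseudonymize(record["phone"]),
    "phone_masked": mask_phone(record["phone"]),
}
print(safe_record)
```

The same phone number always maps to the same `user_key`, so records can still be joined for analysis without exposing the number itself.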

Data Ownership:

One of the important concepts of ethics in Data Science is that individuals own their data. Collecting someone’s personal data without their agreement is unethical and, in many jurisdictions, illegal. As a result, consent is required to acquire someone’s data.

Signed written agreements, digital privacy policies that require users to accept a company’s terms and conditions, and pop-ups with checkboxes that allow websites to track users’ online behavior using cookies are all common ways to obtain consent. To prevent ethical and legal issues, never assume a consumer agrees to you gathering their data; always ask for permission.
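The “always ask for permission” rule can be enforced in code by checking a consent record before anything is collected. The sketch below is a hypothetical illustration: `ConsentRecord`, `grant`, `may_collect`, `track_page_view`, and the purpose name `analytics_cookies` are all assumed names, not part of any particular framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Set, Tuple


@dataclass
class ConsentRecord:
    """Hypothetical record of which purposes a user has agreed to, and when."""
    user_id: str
    purposes: Set[str] = field(default_factory=set)
    granted_at: Optional[datetime] = None


def grant(record: ConsentRecord, purpose: str) -> None:
    """Store an explicit opt-in for one purpose, with a timestamp."""
    record.purposes.add(purpose)
    record.granted_at = datetime.now(timezone.utc)


def may_collect(record: ConsentRecord, purpose: str) -> bool:
    """Consent is never assumed: only purposes that were granted return True."""
    return purpose in record.purposes


def track_page_view(record: ConsentRecord, page: str,
                    log: List[Tuple[str, str]]) -> bool:
    """Append a page view to the log only if the user accepted analytics cookies."""
    if not may_collect(record, "analytics_cookies"):
        return False
    log.append((record.user_id, page))
    return True


events: List[Tuple[str, str]] = []
user = ConsentRecord(user_id="u-123")

track_page_view(user, "/pricing", events)  # skipped: no consent given yet
grant(user, "analytics_cookies")           # user ticks the cookie checkbox
track_page_view(user, "/pricing", events)  # recorded
print(events)
```

Keeping consent per purpose means that agreeing to analytics cookies does not silently extend to, say, marketing emails.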

Good intentions with Data:

The intentions behind collecting and analyzing data must be good. Data professionals must be clear about how and why they use the data. For example, if a team collects data on users’ spending habits in order to build an app that helps them manage expenses, the intention is good.


Transparency:

In addition to owning their personal information, data subjects have a right to know how you plan to acquire, keep, and use it. Be transparent when acquiring data. You should publish a policy that explains how cookies are used to track users’ activity, how the information gathered is kept in a secure database, and how it is used to train an algorithm that provides a tailored online experience. Users have a right to this information so that they can choose whether or not to accept your site’s cookies.

Some Real-Life Examples:

OkCupid Data Release

In 2016, Emil Kirkegaard and Julius Daugbjerg Bjerrekær of Denmark shared a dataset on the Open Science Framework that included information on roughly 70,000 members of the online dating service OkCupid. To construct the dataset, the researchers scraped information from OkCupid’s site, including usernames (but not actual names), ages, gender, religion, and personality characteristics, as well as the answers to the questions the site asks new members to help discover prospective matches.

The information, which was gathered between November 2014 and March 2015, is not anonymous and is quite personal. According to the researchers, the only reason they did not share users’ images was that doing so would have taken up too much hard disk space.

Anyone who has repeated a username from one site to another, or who has used a name that may be traced back to them, may suddenly be severely vulnerable. The data was scraped and uploaded in violation of the basic ethical norms that social scientists observe. When questioned on Twitter, the researchers claimed that the data was already public because it had been submitted on OkCupid.

This was a case of unethical data handling. Even though the profiles were visible on the site, scraping them in bulk and republishing them without the users’ consent was not right.
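The username-reuse risk in this case can be shown with a toy example. The sketch below uses entirely made-up records and a hypothetical helper, `link_by_username`; it only shows why a dataset that contains usernames but no real names is still not anonymous.

```python
# Made-up toy data: a scraped dataset that holds usernames but no real names...
scraped_profiles = [
    {"username": "starfan_92", "religion": "agnostic", "age": 29},
    {"username": "quiet_river", "religion": "hindu", "age": 34},
]

# ...and a public profile list from a different, unrelated site.
other_site_profiles = [
    {"username": "starfan_92", "real_name": "Person A"},
    {"username": "bookworm_7", "real_name": "Person B"},
]


def link_by_username(scraped, public):
    """Join the two datasets on username, attaching a real name where one matches."""
    names = {p["username"]: p["real_name"] for p in public}
    return [
        {**row, "real_name": names[row["username"]]}
        for row in scraped
        if row["username"] in names
    ]


reidentified = link_by_username(scraped_profiles, other_site_profiles)
print(reidentified)
```

A single reused username is enough to tie sensitive answers to a named person, which is why removing real names alone does not make a dataset safe to publish.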

Robinhood Data Breach

American financial services company Robinhood announced a data breach in November 2021 that affected over five million users of its trading app. The intruder gained access through a customer support system and obtained email addresses, names, phone numbers, and other information. According to the firm, no Social Security numbers, bank account numbers, or debit card numbers were exposed.

This was a case of data theft made possible by weaknesses in how customer data was secured. Organizations should take steps to prevent such incidents.

Data Science in the fight against Covid-19:

Outbreak analytics, a data science methodology aimed to guide outbreak response, has risen in response to the rising complexity of outbreak data.

The South Korean government used real-time analytics to improve the design of preventative plans and the surveillance of Covid-positive patients. The approach incorporates data from the IoT and AI systems that underpin smart city networks, as well as personal data supplied by confirmed patients. With big data analytics, researchers can trace patients’ movements, identify their contacts, and anticipate the possible magnitude of an outbreak in a specific location. The information is also used to create prevention plans and guidelines.

This is an example of how data is used for a good purpose.

Frequently Asked Questions

Q1. What is data ethics?

A. Data ethics refers to the moral and responsible use of data, particularly in the context of collecting, analyzing, sharing, and applying data in various fields and industries. It involves considering the potential ethical implications, societal impact, and privacy concerns associated with data-related activities. Data ethics aims to ensure that data practices align with ethical principles and respect the rights and well-being of individuals, communities, and society as a whole.

Q2. What are the 7 ethics in Data Science?

A. The seven ethics in Data Science comprise transparency, accountability, fairness, privacy, security, consent, and integrity. These principles guide ethical conduct throughout the data lifecycle, promoting open communication, responsible decision-making, unbiased algorithms, data protection, secure handling, informed consent, and maintaining the accuracy and reliability of data. Upholding these ethics fosters trust, avoids discrimination, safeguards privacy, prevents misuse, and ensures that Data Science advancements are ethically sound and contribute positively to individuals and society.


Data Science Ethics is an important topic of discussion in today’s world. Organizations and companies using data and implementing data science must follow a set of ethics while dealing with data. When used ethically, data may help you make better decisions and make a difference in the world.




Prateek is a final year engineering student from Institute of Engineering and Management, Kolkata. He likes to code, study about analytics and Data Science and watch Science Fiction movies. His favourite Sci-Fi franchise is Star Wars. He is also an active Kaggler and part of many student communities in College.
