The ethical advancement of scientific knowledge demands a delicate equilibrium between benefits and harms, particularly in health-related research. According to Article 4 of UNESCO’s Universal Declaration on Bioethics and Human Rights, ethically justifiable research requires maximizing direct and indirect benefits and minimizing possible harms when applying and advancing scientific knowledge or technologies. The National Institutes of Health [NIH] Data Sharing Policy and Implementation Guidance similarly states that data necessary for drawing valid conclusions and advancing medical research should be made as widely and freely available as possible (in order to share the benefits) while safeguarding the privacy of participants from potentially harmful disclosure of sensitive information. This paper discusses the challenges of maximizing research benefits and minimizing potential harms in the unique context of health-related research that uses Big Data from multiple sources, which are protected differently by the law.

Part I frames the ethical dilemma by discussing potential benefits and harms. It shows the constant misalignment, in health-related research that uses Big Data from multiple sources, between the benefits of using confidential information for scientific purposes and the value of keeping that information confidential. Part II addresses existing regulations, including their nature and legal coverage. It highlights the prevailing challenges that arise when combining data from multiple sources that are protected differently by the law. Part III compares different requirements for consent or authorization to use persons’ health information for research. It focuses on the difficulty existing regulations have in ensuring that those requirements are met when multiple sources of data are used. Part IV investigates whether exemptions from the authorization requirement could prevail in the context of information that falls outside the protection of HIPAA and the Protection of Human Subjects Regulations. Part V proposes a solution of a statistical nature, using the method of synthetic data to balance the conflicting considerations. Part VI shows how the use of synthetic data can overcome some of the ethical challenges.