Protecting American Genetic Data from National Security Threats

Shannon Bocquet
Feb 21, 2022

Authors: Shannon Bocquet and Brian Samuelson

Introduction

So far in the 21st century, research and development in biotechnology has advanced rapidly, and its products are increasingly prevalent in everyday life. The term “bioeconomy” has emerged to describe this growing field and its notable economic impact in the United States and throughout the world. Unlike other sectors of the U.S. economy, the bioeconomy draws on a wide range of industries, such as pharmaceutical production, manufacturing of personal protective equipment, and laboratory research. In addition, there are four primary drivers of the bioeconomy: the life sciences, biotechnology, engineering, and computing and information services. While research in the life sciences and biotechnology may be the more obvious components of the bioeconomy, engineering that produces automated robots and computing that quickly analyzes massive data sets are crucial to expanding this sector. The bioeconomy represents a rapidly expanding portion of the U.S. economy, estimated at $1 trillion out of the total $21 trillion U.S. GDP in 2020, or roughly 5 percent. The COVID-19 pandemic has demonstrated the United States' reliance on this broad economic sector and the broader global market.

Figure 1 | The four main drivers of the bioeconomy are engineering, life sciences, biotechnology, and computing and information sciences

Given the dependency of the U.S. on the bioeconomy, protecting it from malicious actors is a matter of national security in which all stakeholders in the sector are invested. Moreover, the pace at which these technologies are advancing has rendered many existing policies inadequate or obsolete for balancing innovation and homeland defense. Of primary concern are the massive amounts of data being produced across all subsections of the bioeconomy and the lack of consideration for potential vulnerabilities to this data or the vulnerabilities the data itself creates.

Biological Information Security

The rapid advancements in biotechnologies are only possible with simultaneous advancement in engineering capabilities and computing and information services. All four drivers of the bioeconomy feed off each other and enable further innovation in the other drivers, creating a positive-feedback loop of advancement. However, it is immense computing power and the capacity to analyze large datasets rapidly and efficiently that drive the other three sectors of the bioeconomy. With developments such as artificial intelligence and machine learning, data can readily be collected and analyzed for patterns or outliers without any mediation from the human scientist leading the experiment. These advancements, specifically in computing and bioinformatics, allow data to be analyzed from different perspectives and more quickly than a human could interpret it, offering opportunities for innovation in life sciences, biotechnology, and engineering that were not previously possible.

Data itself has now become one of the most critical resources in this era of biology. This is beneficial for scientific advancement, yet it presents a security vulnerability should data be misused or leaked to external actors. Ultimately, experiments conducted across the life sciences, biotechnology, or engineering fields are meant to collect data and interpret results. With the explosion of data production, individuals can effectively skip the need to conduct their own experiments by acquiring external data and leveraging the information it reveals before others can interpret or analyze it. The capacity to apply computing power and analytical tools is now what drives scientific advancement and innovation. Data as a resource is priceless, and current U.S. policies do not adequately address its value, especially when it comes to the bioeconomy.

Massive volumes of data are being generated in the bioeconomy, whether from sequencing of pathogens, such as through COVID-19 testing; human genetic sequencing with at-home genetic test kits; health records of populations; or numerous other data collection and storage mechanisms. The large databanks of health and biological information generated are immensely valuable for advancing science, studying population trends, and conducting surveillance of public health threats. However, these databanks are also troubling, as their storage on cloud-based servers poses a risk of malicious access and manipulation of data. The bioeconomy and cybersecurity have become intertwined in the 21st century and the era of mass data storage. This interface between the two fields has been coined “cyberbiosecurity,” as it involves protecting biological information stored on servers through cybersecurity methods of mitigating threats. Information stored in the cloud is exposed to potential cyberattacks or ransomware, leading to security concerns for proprietary data, industrial espionage, and exposure of personal health information. Cyberbiosecurity is thus a crucial area of investment to adequately protect the U.S. bioeconomy and combat national security concerns associated with advancements in biotechnology.

As highlighted during the pandemic with COVID-19 PCR tests, genetic sequencing is a critical technology that has fed the explosion of biological information. The first human genome sequence was completed in 2003 by the Human Genome Project, which spanned thirteen years of hand sequencing genes. Even upon the project’s completion, simply having the order of As, Ts, Gs, and Cs in the human genome did not mean researchers understood or were able to interpret where each human gene was located and what its distinct function was. Today, next-generation sequencing technology allows a biological sample to be run through machines and sequenced in a matter of minutes and for a fraction of the cost incurred earlier this century.

Figure 2 | The cost of human genome sequencing per megabase from 2001 to 2015

The substantial reduction in sequencing time alone has enabled the mass collection of genetic sequences from a range of hosts. These sequences are uploaded to some form of server for storage and are immediately susceptible to cyberattacks from malicious actors. Genomic information, regardless of the source, is stored on the same platforms without sufficient concern for exploitation. Particularly troubling is the lack of cybersecurity measures at direct-to-consumer (DTC) genetic testing companies, such as 23andMe or Ancestry.com, to protect their customers’ genomic information. The mass databases compiled by these private companies are gold mines for health analytics, given the broad range of health outcomes and genetic diversity they capture, and should be protected accordingly. As of late 2021, over 50 million individuals around the world have submitted their genetic information to DTC genetic testing companies to be sequenced. While Americans willingly submit their genetic data today, it is unknown how that information could be used in the future, which poses an inherent risk to homeland security.

China is actively seeking mechanisms to collect genetic data from around the world but has been specifically targeting databases with Americans’ genomic information due to the United States’ comparatively genetically diverse population. Chinese biotechnology companies are progressively moving into the American genomic market, both by investing in American DTC testing companies to obtain preexisting datasets and by directly conducting genomic testing to generate their own massive bio-databases. One of the more prominent, BGI, has active partnerships and contracts with a variety of health institutions across the United States, offering extremely cheap genomic testing services and, in return, gaining access to samples from Americans. Another Chinese biotech company, WuXi Healthcare Ventures, contributed to a $155 million investment into 23andMe in 2015, putting Chinese companies in a position to increase their biological databases tenfold.

International partnerships and free-market competition are key drivers of scientific innovation and the bioeconomy; however, the access Chinese biotech companies are gaining to genomic datasets is not limited to the companies themselves. While most private companies around the world have protections in place to ensure the privacy of their data, even from their own governments, Chinese companies are compelled by law to share any information and collaborate with the Chinese government.

Genomic databanks of millions of people are invaluable for running genome-wide studies in which sequences of genetically diverse people can be analyzed to note genetic patterns associated with race, ethnicity, age, socioeconomic class, and more. Access to such data allows for a better understanding of human genetic markers, which can contribute to precision medicine for sub-populations or could be used maliciously to discriminate against those with a certain genetic marker. The dual-use nature of genomic datasets is especially concerning, as China has proven through the targeting of ethnic Uighurs that it can, and will, use genetic information to persecute individuals or sub-populations. Once an actor gains access to an individual’s genetic data, there is no way to get it back, emphasizing the need for better initial protections of such information before access is granted.

Regulating Genomic Information Privacy

A holistic threat assessment must be conducted across federal, state, and local governments and in cooperation with the private sector to effectively address the current gaps in policy and continue to revise policy as technology advances in the future. There is currently no specific federal authority designated to coordinate actions related to the bioeconomy. Given the breadth of the bioeconomy, nearly every executive department has some role or involvement in at least one of its industries. It is necessary to form a single authority to oversee and coordinate the public and private sectors’ actions and interests in the bioeconomy. Such an authority would ideally be positioned within the White House’s Office of Science and Technology Policy (OSTP) or established as a new office within the National Security Council (NSC) structure, two institutions with existing expertise in enacting and amending policy. Most importantly, this authority would enable collaboration between the U.S. government, which generally funds small, initial research and development (R&D) projects, and the private sector, which makes up the majority of the bioeconomy and its profits. Successfully protecting the bioeconomy requires the private sector to have the capability to protect its own industry, something that can be ensured by enacting policies from one coordinated office. A single authority would have the ability to enact sector-wide security regulations requiring certain cybersecurity, physical security, and funding oversight mechanisms to minimize threats to biological technologies and mitigate risk to the U.S. homeland.

To effectively address the risk of genomic databases being exploited by nefarious actors, it is necessary to bridge the gap in policy between the two sectors of cybersecurity and biosecurity. To date, little collaboration has occurred between the two industries, with many biological professionals unaware of the emerging cyber threat to their profession. The growing issue of cyberbiosecurity needs to be addressed by the U.S. government through policy. This effort should be coordinated through a lead authority on the bioeconomy, either in OSTP or in a new office in the NSC. The swaths of data being generated and uploaded to cyberspace should be governed by a new cyberbiosecurity policy, which should specify the cybersecurity measures required when human genomic information is held by a private company or by an agency within the public sector.

One potential way to address the security of genomic information would be to implement better genetic privacy laws in the United States that regulate the access and sharing of genomic information with foreign entities. The Genetic Information Nondiscrimination Act (GINA) was passed in 2008 to prevent genetic discrimination against individuals by health insurers or employers, though it addresses genetic information privacy in terms of discrimination rather than data protection. The Health Insurance Portability and Accountability Act (HIPAA) is most notable for its privacy protections of health information, though even it neglects to address how genomic information is handled outside the healthcare setting.

Individual states have passed their own genetic privacy legislation, with California, Arizona, Florida, and Utah all passing laws in 2021 pertaining to how private companies, such as DTC testing companies, manage consumer information. However, it is necessary to pass nationwide genetic data protection regulations to truly address the gaps in genetic privacy and cyberbiosecurity. Additionally, such regulation should limit the access foreign biotechnology companies have to American genomic information to mitigate the exploitation of this data.

The expansion of computing power and information services within the bioeconomy has drastically increased the value of and demand for genomic data. The realm of cyberbiosecurity will continue to grow in significance as data collection and information analytics technologies proliferate. As the United States’ adversaries take a growing interest in American genomic data, it is essential for the biotechnology, cybersecurity, and national security communities to work together to preserve genetic privacy and continue to promote the U.S. bioeconomy.

This article was prepared by the authors in their personal capacity. The opinions expressed in this article are the authors' own and do not reflect the view of their places of employment.
