- Profile
Jingyi Wei: PhD from the Department of Bioengineering at Stanford University, mentored by Professor Silvana Konermann. As an undergraduate, she represented Peking University in the iGEM competition, where her team won a gold medal and the gold medal in the environmental category.
- Question A: Can you briefly introduce your research field to those outside the industry in one sentence?
Jingyi Wei: My main research direction is using CRISPR technology for gene editing. The objective in this field is to artificially upregulate or downregulate the expression level of one or more genes, either to study biological questions or to treat genetic diseases.
- Question B: Could you introduce some basic background on this field for those outside the industry?
Jingyi Wei: First, on the relationship between genes and cells: chromosomal DNA contains many genes, each of which can be represented as a sequence of four types of nucleotides (A, T, C, G). According to the central dogma of biology, the DNA of a gene is transcribed into RNA, which is then translated into protein to perform various functions. The expression levels of different genes determine the nature and function of a cell, including its growth, its response to stress, and apoptosis. Nearly all cells in one body carry the same DNA sequence, which can be understood as coming from the same template. However, because of gene regulation, different cells contain different amounts of RNA and protein for each gene, and this determines the cell type. In summary, the cells share the same gene sequences; what differs is how strongly each gene is expressed.
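The DNA to RNA to protein flow described here can be sketched in a few lines of Python. This is only an illustration: the example sequence and the function names `transcribe` and `translate` are made up, and the codon table is the standard genetic code written out compactly. It ignores real-world details such as introns and regulation.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order ("*" marks a stop codon).
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Turn the coding strand of a gene into mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCTGAAACCTAA"   # hypothetical toy gene, not a real one
mrna = transcribe(gene)    # "AUGGCUGAAACCUAA"
protein = translate(mrna)  # "MAET"
```

The same DNA string always yields the same protein in this toy model; what varies between cell types is how much of each RNA and protein is made, which the sketch does not attempt to capture.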
Regarding gene editing, in principle we do not edit germ cells, because that would affect the development of the whole person and fundamentally change them. As an example of clinical treatment, I participated in an AIDS-related project at Peking University. Our idea was to take hematopoietic stem cells from the patient or a donor, edit their genes, and then reintroduce them into the patient’s body. Before this, the patient needs to undergo a procedure such as myeloablation to remove the diseased blood cells, essentially ‘washing out’ the hematopoietic stem cells in the bone marrow. The edited hematopoietic stem cells can then expand and differentiate into the various types of blood cells. After enough time, the person’s blood cells can resist HIV invasion and proliferation, ultimately treating AIDS. With current technology it is very difficult to change all cells on a large scale; in practice, we target the cells that are diseased or related to the disease.
- Question C: Could you list a few of the hot directions and key issues in your field?
Jingyi Wei: Although we talk about gene editing, we cannot yet edit as we wish: current technology does not let us change a gene sequence into any form we desire. What is more mature is interfering with the expression level of a particular gene, either increasing or suppressing it. Technologies for replacing one segment of a gene with another have appeared recently, but they are not yet mature or efficient. So overall, what we mostly do now is disruption rather than replacement or writing.
One of the hot topics at this stage is “printing”: being able to realize any desired gene sequence and ‘print’ it into the genome. For this we first need better tools. Existing ones such as Cas9, the most commonly used tool for editing DNA, work primarily by cutting the DNA and relying on the cell’s endogenous repair pathways to introduce a frameshift. DNA bases are read in groups of three (codons), each of which encodes one amino acid. After we cut the DNA, imperfect repair can insert or delete bases and disrupt the original ‘triplets’, shifting the reading frame. This effectively destroys the protein, but it does not allow precise editing. For precise editing, one can supply a DNA template and edit through homologous recombination, but the efficiency of this technique is currently low.
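To make the frameshift idea concrete, here is a minimal Python sketch. The toy sequence and the helper names `codons` and `delete` are invented for illustration. It compares a one-base deletion, which shifts every downstream triplet, with a three-base deletion, which removes one codon but leaves the rest of the frame intact.

```python
def codons(seq):
    """Split a sequence into complete triplets, dropping leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete(seq, start, length):
    """Mimic an imperfect repair that loses `length` bases at `start`."""
    return seq[:start] + seq[start + length:]


original = "ATGGCTGAAACCTAA"               # hypothetical toy gene
one_base_lost = delete(original, 4, 1)     # frameshift
three_bases_lost = delete(original, 3, 3)  # in-frame deletion

print(codons(original))          # ['ATG', 'GCT', 'GAA', 'ACC', 'TAA']
print(codons(one_base_lost))     # ['ATG', 'GTG', 'AAA', 'CCT']
print(codons(three_bases_lost))  # ['ATG', 'GAA', 'ACC', 'TAA']
```

After the one-base deletion every codon downstream of the cut is different and the original stop codon (TAA) is no longer read in frame, which is why this kind of repair usually disrupts the protein.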
Recently, a promising technology called prime editing has been developed. It uses a Cas9 nickase to cut one DNA strand at the target site, and the guide RNA is extended with a template fragment. A reverse transcriptase then copies this template, so the DNA sequence near the cut site is replaced with the desired sequence. However, the efficiency is still rather low, with roughly ten out of a hundred cells being successfully edited, so the technology still needs further optimization.
The second important direction is delivery. These gene editing tools require getting a relatively large editing protein into human cells and then ensuring it reaches a sufficient expression level to cut and edit. This is relatively easy for some cells, where the foreign protein can be introduced by chemical methods or electroporation. It is difficult for others, such as neurons, stem cells, macrophages, T cells, and other less proliferative cell types, which do not readily take up foreign proteins, making gene editing challenging.
The third direction is to explore whether we can discover naturally occurring gene editing tools that outperform the current ones or compensate for their shortcomings. For example, our group recently discovered proteins such as Cas13d, which can directly and efficiently cut RNA (rather than DNA), opening up more possibilities for gene editing. In addition, there are currently no highly efficient tools for targeted gene insertion, so some research groups are exploring naturally occurring integrases or CRISPR-associated transposases (CAST) as gene editing tools. In short, the idea is to learn from nature.
- Question D: In these directions, what kind of bottlenecks have scientists around the world currently encountered? Why are these issues meaningful? Where does the difficulty of these bottlenecks lie?
Jingyi Wei: The first is how to improve the efficiency of these gene editing tools, which matters because the overall efficiency of gene editing is currently low. To solve this, you need to understand how the whole process works, such as how the proteins cut and by what mechanism, and then use rational design to try to raise the protein’s activity, for example by designing better proteins with the help of AlphaFold.
The second is the delivery issue I mentioned earlier: how to deliver into cells that do not readily take up foreign proteins. This matters because delivery efficiency largely determines the final editing efficiency. The problem is not unique to gene editing; many fields face the question of how to transfer a gene you want to express into cells and have it expressed at a high level. Current approaches include nanoparticle carriers and the more traditional, commercially available liposomes, which combine with the target cells to achieve delivery.
The third is how to efficiently discover and screen naturally occurring gene editing tools. As sequencing technology develops, more and more genomes become available, and many potential gene editing tools can be found in them. However, identifying candidate gene-editing proteins in these large datasets and then testing them at scale for editing activity and off-target rates currently consumes a great deal of manpower and resources, so this area needs significant optimization.
- Question E: What is your view on the ethical issues that gene editing may raise?
Jingyi Wei: First of all, I think existing technology has not yet reached the level where it can reliably edit germ cells. Editing efficiency is not high enough, and there can be off-target effects, meaning that other genes are disrupted while the intended gene is edited. In severe cases this could impair some bodily functions and make the consequences of gene editing worse than the original disease, so the technology still has safety concerns.
On the other hand, in the long term, if gene editing does reach the level of arbitrary rewriting, people might want to use germ cell editing to make their offspring taller, smarter, or stronger. This raises ethical issues, because it means intervening in your offspring artificially rather than through natural inheritance. Over time it might even lead to ‘superhumans’ who are above average in every respect, which could increase unfairness among people.
- Question F: Currently, some scientists treat genes as texts written with only four letters. Can we train large language models on them, and what are the difficulties?
Jingyi Wei: I think this is a very interesting question. I once took a course on applying deep learning to gene-related problems, and the results show it is indeed feasible; some directions can use large language models for training. For example, I have been using deep learning models to predict the editing efficiency of different guide RNAs. In short, a common gene editing tool such as CRISPR-Cas9 consists of a protein paired with a relatively short guide RNA that binds to the target DNA. The binding is essentially one-to-one and requires matching base pairing. However, a guide RNA is often only about 20 bases long, while the gene might be 200 or even 2000 bases long, so there are many candidate target sites, and different choices can lead to completely different efficiencies. So I train a sequence model, a CNN or an RNN, to learn which sequences give higher editing efficiency. In addition, deep learning can be used to predict how different transcription factors affect the expression of particular genes (because transcription factors usually have characteristic binding sequences), which is another application of deep learning in gene regulation.
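As a rough illustration of the input side of such a model, the sketch below enumerates candidate 20-base target sites in a gene and one-hot encodes them, which is a common way to feed sequences to a CNN or RNN. It is not the speaker's actual pipeline. The function names are hypothetical, the requirement that each site be followed by an NGG motif (the PAM) is an added assumption that holds for the commonly used SpCas9, and only the forward strand is scanned. The model and training step are left out.

```python
BASES = "ACGT"


def find_guides(gene, guide_len=20):
    """Return (position, spacer, pam) for every site followed by an NGG PAM.

    Assumption: SpCas9-style targeting, forward strand only.
    """
    gene = gene.upper()
    guides = []
    for i in range(len(gene) - guide_len - 3 + 1):
        spacer = gene[i:i + guide_len]
        pam = gene[i + guide_len:i + guide_len + 3]
        if pam[1:] == "GG":
            guides.append((i, spacer, pam))
    return guides


def one_hot(seq):
    """Encode each base as a 4-element row in A, C, G, T order."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


toy_gene = "A" * 20 + "TGG" + "C" * 5   # made-up sequence with one valid site
candidates = find_guides(toy_gene)
features = [one_hot(spacer) for _, spacer, _ in candidates]
```

Each entry in `features` is a 20 x 4 matrix; paired with a measured editing efficiency for that guide, it would form one training example for the kind of sequence model described above.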
So I think we can indeed treat gene sequences as texts containing only four letters (A, T, C, G) and train and predict with large language models. This works for some problems, but others are more complex. Although DNA is composed of just four nucleotides, it also carries epigenetic modifications, and these modifications, which lie outside the sequence itself, also affect gene expression and function. In addition, some proteins in the cell interact with DNA or RNA and affect gene expression. So to infer certain properties of genes, we may need information beyond the ATCG sequence and must encode it into the deep learning network, because it significantly affects the final predictions. Collecting this data is also a challenge.
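One simple way to encode information beyond the ATCG sequence is to add extra feature channels next to the one-hot sequence channels. The sketch below is an assumption-laden illustration rather than an established recipe: it adds a single per-base methylation flag, and the function name and the methylated positions are made up.

```python
BASES = "ACGT"


def encode_with_methylation(seq, methylated_positions):
    """Return one row per base: 4 one-hot sequence channels + 1 methylation flag."""
    methylated = set(methylated_positions)
    rows = []
    for i, base in enumerate(seq.upper()):
        sequence_channels = [1 if base == b else 0 for b in BASES]
        methylation_channel = [1 if i in methylated else 0]
        rows.append(sequence_channels + methylation_channel)
    return rows


# Hypothetical example: the C at index 1 is marked as methylated.
encoded = encode_with_methylation("ACG", [1])
# [[1, 0, 0, 0, 0], [0, 1, 0, 0, 1], [0, 0, 1, 0, 0]]
```

Other signals, such as protein binding, could be added as further channels in the same way, although, as noted above, obtaining such measurements is itself a challenge.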