Sequence analysis of SARS-CoV-2 part 2

I ended my previous post on the sequence analysis of SARS-CoV-2 with the amino acid alignment of the spike protein from SARS-CoV-2 (MN908947) and Bat coronavirus RaTG13 (MN996532). The spike protein is of particular interest because its binding to the angiotensin-converting enzyme 2 (ACE2) receptor is what allows the virus to enter cells and replicate. In the paper “Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation” (Wrapp et al.), the authors expressed the spike protein and determined its structure using cryogenic electron microscopy (cryo-EM). After aligning the spike proteins from MN908947 and MN996532, I wanted to know which protein domains contained the differences between the two viruses.

Supplementary figure 5 of Wrapp et al. contains a multiple sequence alignment of the spike proteins from SARS-CoV-2, Bat coronavirus RaTG13, and SARS-CoV, annotated with the protein domains. Below I show the sequence alignment from my previous post, broken into the individual protein domains as per Wrapp et al.

N-terminal domain.

MN996532.1      VNLTTRTQLPPAYTNSSTRGVYYPDKVFRSSVLHLTQDLFLPFFS
MN908947.3      VNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFS
                **************** ***************** **********

MN996532.1      NVTWFHAIHVSGTNGIKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIV
MN908947.3      NVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIV
                *************** ********************************************

MN996532.1      NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE
MN908947.3      NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE
                ************************************************************

MN996532.1      GKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPPGFSALEPLVDLPIGINITRFQT
MN908947.3      GKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQT
                ************************************* **********************

MN996532.1      LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETK
MN908947.3      LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETK
                ************************************************************

MN996532.1      CTLKS 
MN908947.3      CTLKS
                *****

Receptor-binding domain.

MN996532.1      PNITNLCPFGEVFNATTFASVYAWNRKRISN
MN908947.3      PNITNLCPFGEVFNATRFASVYAWNRKRISN
                **************** **************

MN996532.1      CVADYSVLYNSTSFSTFKCYGVSPTKLNDLCFTNVYADSFVITGDEVRQIAPGQTGKIAD
MN908947.3      CVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIAD
                ***********:****************************** *****************

MN996532.1      YNYKLPDDFTGCVIAWNSKHIDAKEGGNFNYLYRLFRKANLKPFERDISTEIYQAGSKPC
MN908947.3      YNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC
                ******************:::*:* ***:*********:******************.**

MN996532.1      NGQTGLNCYYPLYRYGFYPTDGVGHQPYRVVVLSFELLNAP
MN908947.3      NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAP
                **  *:***:**  *** **:***:*************:**

S1/S2 protease cleavage site insertion.

MN996532.1      FNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITP
MN908947.3      FNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITP
                ************************************************************

MN996532.1      GTNASNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY
MN908947.3      GTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY
                ***:********************************************************

MN996532.1      ECDIPIGAGICASYQTQTNS----RSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI
MN908947.3      ECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI
                ********************    ************************************

S2′ protease cleavage site and fusion peptide.

MN996532.1      RSFIEDLLFNKVTLADAGF
MN908947.3      RSFIEDLLFNKVTLADAGF
                *******************

Heptad repeat 1.

MN996532.1      GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLD
MN908947.3      GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLD
                ******************************************************************************

Central helix.

MN996532.1      KVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG
MN908947.3      KVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG
                **************************************************

Connector domain.

MN996532.1      TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGSCDVVIGIVNNTVYDPL
MN908947.3      TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPL
                *************************************************.****************
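The differences in the blocks above can also be counted rather than read off the consensus line. The following is a minimal Python sketch, not the method used to produce the alignment: the function name compare_aligned is hypothetical, and the input is simply two rows copied from one alignment block (here the last receptor-binding domain block and the S1/S2 block containing the gap).

```python
def compare_aligned(ref: str, query: str, gap: str = "-"):
    """Return (substitutions, gap_columns) for two rows of one alignment block.

    Columns are 1-based and relative to the block, not to the full protein.
    """
    if len(ref) != len(query):
        raise ValueError("aligned rows must have the same length")
    substitutions = []  # (column, residue in ref, residue in query)
    gap_columns = []    # columns where exactly one row has a gap character
    for col, (a, b) in enumerate(zip(ref, query), start=1):
        if a == b:
            continue
        if a == gap or b == gap:
            gap_columns.append(col)
        else:
            substitutions.append((col, a, b))
    return substitutions, gap_columns


# Last receptor-binding domain block (MN996532.1 first, MN908947.3 second).
rbd_ratg13 = "NGQTGLNCYYPLYRYGFYPTDGVGHQPYRVVVLSFELLNAP"
rbd_sars2 = "NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAP"

# S1/S2 block that contains the four-residue gap.
s1s2_ratg13 = "ECDIPIGAGICASYQTQTNS----RSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI"
s1s2_sars2 = "ECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI"

rbd_subs, rbd_gaps = compare_aligned(rbd_ratg13, rbd_sars2)
s1s2_subs, s1s2_gaps = compare_aligned(s1s2_ratg13, s1s2_sars2)

# Residues present in MN908947.3 at the columns gapped in MN996532.1.
inserted = "".join(s1s2_sars2[col - 1] for col in s1s2_gaps)

print("RBD block substitutions:", len(rbd_subs))
print("S1/S2 block gap columns:", s1s2_gaps, "inserted residues:", inserted)
```

For these two blocks the sketch reports 10 substitutions in the receptor-binding domain block and a four-column gap in the S1/S2 block, where MN908947.3 carries the residues PRRA. The column numbers are relative to the pasted block, so they do not correspond to spike protein coordinates.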

Once the sequence alignment is broken into its separate domains, we can clearly see that most of the variation is in the receptor-binding domain, which is the part that binds to the host ACE2 receptor. Wrapp et al. showed that ACE2 binds to the SARS-CoV-2 S ectodomain (which contains the receptor-binding domain) with approximately 10- to 20-fold higher affinity than it binds to SARS-CoV S. Furthermore, the S1/S2 protease cleavage site insertion results in an “RRAR” furin recognition site in SARS-CoV-2. It was noted in Wrapp et al. that:

Notably, in influenza viruses, amino acid insertions that create a polybasic furin site in a related position in influenza hemagglutinin proteins are often found in highly virulent avian and human influenza viruses (see ref).
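The “RRAR” site mentioned above can be located with a simple pattern search. The sketch below is illustrative only: it assumes the commonly cited minimal furin motif R-X-X-R (an assumption on my part, not something taken from Wrapp et al.), and the helper name find_furin_like_sites is hypothetical. A match to this pattern is not evidence of actual cleavage.

```python
import re

# Assumption: minimal furin recognition motif R-X-X-R, where X is any residue.
# The lookahead lets overlapping candidate sites be reported as well.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_like_sites(aligned_row: str, gap: str = "-"):
    """Return (1-based position, motif) pairs in the ungapped sequence."""
    seq = aligned_row.replace(gap, "")
    return [(m.start() + 1, m.group(1)) for m in FURIN_MINIMAL.finditer(seq)]


# Rows copied from the S1/S2 protease cleavage site block.
ratg13_row = "ECDIPIGAGICASYQTQTNS----RSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI"
sars2_row = "ECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI"

ratg13_sites = find_furin_like_sites(ratg13_row)
sars2_sites = find_furin_like_sites(sars2_row)

print("MN996532.1:", ratg13_sites)
print("MN908947.3:", sars2_sites)
```

On these two rows the pattern finds RRAR only in MN908947.3, at position 22 of the pasted block, and nothing in MN996532.1. Positions are relative to the block, not to the full spike protein.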

For my next post, I will dig into the raw sequencing reads, instead of relying on assembled viral genomes, to try to examine the viral diversity within individual patients and to learn a bit more about polybasic furin sites.

This work is licensed under a Creative Commons Attribution 4.0 International License.
