I ended my previous post on the sequence analysis of SARS-CoV-2 with the amino acid alignment of the spike protein from SARS-CoV-2 (MN908947) and Bat coronavirus RaTG13 (MN996532). The spike protein is of particular interest because its binding to the angiotensin-converting enzyme 2 (ACE2) receptor is what allows the virus to enter cells and replicate. In the paper "Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation" (Wrapp et al.), the authors expressed the spike protein and determined its structure using cryogenic electron microscopy (cryo-EM). After aligning the spike proteins from MN908947 and MN996532, I wanted to know which protein domains contained the differences between the two viruses.
Supplementary figure 5 from Wrapp et al. contains a multiple sequence alignment of the spike proteins from SARS-CoV-2, Bat coronavirus RaTG13, and SARS-CoV, annotated with the protein domains. Below I show the sequence alignment from my previous post broken into the individual protein domains as per Wrapp et al.
N-terminal domain.
MN996532.1 VNLTTRTQLPPAYTNSSTRGVYYPDKVFRSSVLHLTQDLFLPFFS
MN908947.3 VNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFS
**************** ***************** **********
MN996532.1 NVTWFHAIHVSGTNGIKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIV
MN908947.3 NVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIV
*************** ********************************************
MN996532.1 NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE
MN908947.3 NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE
************************************************************
MN996532.1 GKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPPGFSALEPLVDLPIGINITRFQT
MN908947.3 GKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQT
************************************* **********************
MN996532.1 LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETK
MN908947.3 LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETK
************************************************************
MN996532.1 CTLKS
MN908947.3 CTLKS
*****
Receptor-binding domain.
MN996532.1 PNITNLCPFGEVFNATTFASVYAWNRKRISN
MN908947.3 PNITNLCPFGEVFNATRFASVYAWNRKRISN
**************** **************
MN996532.1 CVADYSVLYNSTSFSTFKCYGVSPTKLNDLCFTNVYADSFVITGDEVRQIAPGQTGKIAD
MN908947.3 CVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIAD
***********:****************************** *****************
MN996532.1 YNYKLPDDFTGCVIAWNSKHIDAKEGGNFNYLYRLFRKANLKPFERDISTEIYQAGSKPC
MN908947.3 YNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC
******************:::*:* ***:*********:******************.**
MN996532.1 NGQTGLNCYYPLYRYGFYPTDGVGHQPYRVVVLSFELLNAP
MN908947.3 NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAP
**  *:***:**  *** **:***:*************:**
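The third line of each block is the conservation line: `*` marks identical residues, `:` a substitution between residues with strongly similar properties, `.` a weakly similar substitution, and a space anything else (including gaps). As a minimal sketch of how such a line can be derived for a pairwise alignment, the Python below assumes the residue groups commonly documented for Clustal output; the function names are my own and the groups should be checked against the aligner actually used.

```python
# Residue groups assumed from the Clustal conservation-symbol convention.
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def in_same_group(a, b, groups):
    """True if residues a and b both occur in at least one group."""
    return any(a in g and b in g for g in groups)


def consensus_line(seq_a, seq_b):
    """Build the conservation line for two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            symbols.append(" ")   # gap column
        elif a == b:
            symbols.append("*")   # identical
        elif in_same_group(a, b, STRONG):
            symbols.append(":")   # strongly similar
        elif in_same_group(a, b, WEAK):
            symbols.append(".")   # weakly similar
        else:
            symbols.append(" ")   # dissimilar
    return "".join(symbols)


# Third row of the receptor-binding domain block above.
ratg13 = "YNYKLPDDFTGCVIAWNSKHIDAKEGGNFNYLYRLFRKANLKPFERDISTEIYQAGSKPC"
sars2 = "YNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC"
print(consensus_line(ratg13, sars2))
```

For that row the sketch reproduces the conservation line shown in the alignment.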
S1/S2 protease cleavage site insertion.
MN996532.1 FNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITP
MN908947.3 FNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITP
************************************************************
MN996532.1 GTNASNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY
MN908947.3 GTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY
***:********************************************************
MN996532.1 ECDIPIGAGICASYQTQTNS----RSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI
MN908947.3 ECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI
********************    ************************************
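The four-residue gap in the last row can also be located programmatically. The Python below is a rough sketch: it reports each run of gaps in one aligned sequence together with the residues the other sequence carries there, then scans for the minimal furin-like pattern R-X-X-R. That pattern is a simplifying assumption on my part (real furin site prediction uses more context), and the function names are hypothetical.

```python
import re

# Last row of the S1/S2 block above.
ratg13 = "ECDIPIGAGICASYQTQTNS----RSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI"
sars2 = "ECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI"


def insertions(gapped, other):
    """For each gap run in `gapped`, return (0-based column, residues in `other`)."""
    return [(m.start(), other[m.start():m.end()])
            for m in re.finditer(r"-+", gapped)]


# Minimal R-X-X-R pattern (an assumption); the lookahead allows overlapping hits.
FURIN_LIKE = re.compile(r"(?=(R..R))")


def furin_like_sites(aligned_seq):
    """Return (0-based position, motif) pairs in the ungapped sequence."""
    ungapped = aligned_seq.replace("-", "")
    return [(m.start(), m.group(1)) for m in FURIN_LIKE.finditer(ungapped)]


print(insertions(ratg13, sars2))   # residues present only in MN908947.3
print(furin_like_sites(sars2))
print(furin_like_sites(ratg13))
```

On this row the inserted residues are "PRRA", and together with the following arginine they form the "RRAR" motif in MN908947.3; the RaTG13 row has no R-X-X-R match.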
S2' protease cleavage site and fusion peptide.
MN996532.1 RSFIEDLLFNKVTLADAGF
MN908947.3 RSFIEDLLFNKVTLADAGF
*******************
Heptad repeat 1.
MN996532.1 GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLD
MN908947.3 GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLD
******************************************************************************
Central helix.
MN996532.1 KVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG
MN908947.3 KVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG
**************************************************
Connector domain.
MN996532.1 TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGSCDVVIGIVNNTVYDPL
MN908947.3 TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPL
*************************************************.****************
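To put numbers on the domain-by-domain comparison, one option is to count substitutions and gap columns per domain. The sketch below does this in Python for two of the shorter domains above; the `compare` helper and the dictionary layout are illustrative assumptions, and the remaining domains would need to be added in the same way.

```python
def compare(seq_a, seq_b):
    """Count substitutions and gap columns between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = 0
    gap_columns = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            gap_columns += 1
        elif a != b:
            substitutions += 1
    return {"length": len(seq_a),
            "substitutions": substitutions,
            "gap_columns": gap_columns}


# (MN996532.1, MN908947.3) pairs copied from the alignment blocks above.
domains = {
    "S2' cleavage site and fusion peptide": (
        "RSFIEDLLFNKVTLADAGF",
        "RSFIEDLLFNKVTLADAGF",
    ),
    "Connector domain": (
        "TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGSCDVVIGIVNNTVYDPL",
        "TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPL",
    ),
}

for name, (ratg13, sars2) in domains.items():
    print(name, compare(ratg13, sars2))
```

Dividing the substitution count by the domain length gives a per-domain rate, which makes domains of different sizes easier to compare.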
Once we break the sequence alignment into its separate domains, we can clearly see that most of the substitutions are in the receptor-binding domain, which is the part that binds to the host ACE2 receptor. Wrapp et al. showed that ACE2 binds to the SARS-CoV-2 S ectodomain (the extracellular portion of the spike, which includes the receptor-binding domain) with approximately 10- to 20-fold higher affinity than ACE2 binds to SARS-CoV S. Furthermore, the S1/S2 protease cleavage site insertion results in an "RRAR" furin recognition site in SARS-CoV-2. It was noted in Wrapp et al. that:
Notably, in influenza viruses, amino acid insertions that create a polybasic furin site in a related position in influenza hemagglutinin proteins are often found in highly virulent avian and human influenza viruses (see ref).
For my next post, I will dig into the raw sequencing reads, instead of relying on assembled viral genomes, to try to examine the viral diversity within individual patients and to learn a bit more about polybasic furin sites.

This work is licensed under a Creative Commons
Attribution 4.0 International License.
