DNA Data File for Alpha/Beta/Folding


These are the 4 protein sequences that were used in music composed on KAM. The comments are by Dr. Mary Anne Clark.


The sequence below is the human equivalent of the bovine gamma lens crystallin. As before, [] indicates regions of beta-sheet structure (accordion pleats, more or less) and () indicates regions of alpha-helix (springs). In addition, I have added dashes to indicate the presence of "turns" in the protein, places where it bends. This may be overkill, and I didn't do it for the previous proteins. This is a "mostly beta" structure according to the CATH system for classifying protein folds. The protein is interesting because of the internal evidence for gene duplication: from an original sequence of about 40 amino acids, two cycles of tandem doubling seem to have produced this final four-motif structure.
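A minimal Python sketch of how this bracket notation could be read by a program. It assumes that each dash opens or closes a turn (the dashes in these sequences come in pairs, so each pair is taken to enclose a turn) and that ~, / and spaces are layout marks rather than residues. The function name `parse_annotated` and the state labels are hypothetical, not part of any existing tool.

```python
# Hypothetical helper: turns the bracket notation used in this file into
# per-residue (amino acid, structure) pairs.


def parse_annotated(seq: str) -> list[tuple[str, str]]:
    """Return (residue, state) pairs for an annotated sequence.

    Assumed conventions: [...] = beta sheet, (...) = alpha helix,
    -...- = turn (dashes taken as opening/closing pairs);
    '~', '/' and whitespace are layout marks and carry no residue.
    """
    pairs = []
    in_beta = in_helix = in_turn = False
    for ch in seq:
        if ch == "[":
            in_beta = True
        elif ch == "]":
            in_beta = False
        elif ch == "(":
            in_helix = True
        elif ch == ")":
            in_helix = False
        elif ch == "-":
            in_turn = not in_turn  # each dash opens or closes a turn
        elif ch.isalpha():
            if in_beta:
                state = "beta"
            elif in_helix:
                state = "helix"
            elif in_turn:
                state = "turn"
            else:
                state = "other"
            pairs.append((ch, state))
        # any other character ('~', '/', spaces) is skipped
    return pairs


if __name__ == "__main__":
    # Motif 1 of the crystallin sequence below, copied as annotated.
    motif1 = "~G[KITFY]EDR-AF-QGR[SYEC]TTDCPNL-QPY-FS RCN[SIRVES]/"
    for residue, state in parse_annotated(motif1)[:6]:
        print(residue, state)
```

Printing the first six pairs of motif 1 gives G as "other" followed by K, I, T, F and Y as "beta". A per-residue list like this is one possible starting point for assigning a different musical treatment to each structure class.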

I have aligned the four domains defined for the protein so that the similarities in the four motifs can be seen more easily. The slashes (/) separate the motifs and the short connecting peptide, matching the FT DOMAIN ranges listed below. You may want to incorporate some indication of this four-domain structure into the musical representation of the protein.

GN   CRYGB OR CRYG2.
OS   HOMO SAPIENS (HUMAN).
KW   EYE LENS PROTEIN; MULTIGENE FAMILY; DUPLICATION.
FT   INIT_MET      0      0
FT   DOMAIN        1     39       MOTIF 1.
FT   DOMAIN       40     83       MOTIF 2.
FT   DOMAIN       84     87       CONNECTING PEPTIDE.
FT   DOMAIN       88    128       MOTIF 3.
FT   DOMAIN      129    174       MOTIF 4.
SQ   SEQUENCE   174 AA;  20776 MW;  1FA10D9C CRC32;





    ~G[KITFY]EDR-AF-QGR[SYEC]TTDCPNL-QPY-FS RCN[SIRVES]/

   GC[WMIYER]-PNY-[QGHQYFL]RR[GEY]PD(YQQ)-WM-GLSDSIR[SCCL]IPP/

   HSGA/

    Y[RMKIYDR]-DEL-[RGQMSEL]TDDCLS(VQDRF)HLTEIH[SLNVLE]/

   GS[WILYEM]-PNY-[RGRQYLL]RP[GEY]RR-FLDWG-APNAKVG[SLRR]VMDLY~
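Counting only the amino-acid letters, the slash-separated segments of the sequence above can be checked against the FT DOMAIN ranges in the header. The sketch below assumes that "/" is used only as a segment separator and that all other non-letter characters are annotation; `check_domains`, `CRYSTALLIN` and `DOMAINS` are illustrative names.

```python
# Sketch: split the annotated crystallin sequence at '/' and compare each
# segment's residue count with the FT DOMAIN ranges listed in the header.

CRYSTALLIN = (
    "~G[KITFY]EDR-AF-QGR[SYEC]TTDCPNL-QPY-FS RCN[SIRVES]/"
    "GC[WMIYER]-PNY-[QGHQYFL]RR[GEY]PD(YQQ)-WM-GLSDSIR[SCCL]IPP/"
    "HSGA/"
    "Y[RMKIYDR]-DEL-[RGQMSEL]TDDCLS(VQDRF)HLTEIH[SLNVLE]/"
    "GS[WILYEM]-PNY-[RGRQYLL]RP[GEY]RR-FLDWG-APNAKVG[SLRR]VMDLY~"
)

# (name, first residue, last residue), copied from the FT DOMAIN lines
DOMAINS = [
    ("MOTIF 1", 1, 39),
    ("MOTIF 2", 40, 83),
    ("CONNECTING PEPTIDE", 84, 87),
    ("MOTIF 3", 88, 128),
    ("MOTIF 4", 129, 174),
]


def residues_only(segment: str) -> str:
    """Keep the one-letter amino-acid codes, dropping all annotation marks."""
    return "".join(ch for ch in segment if ch.isalpha())


def check_domains(annotated: str, domains) -> list[tuple[str, int, int]]:
    """Return (name, expected length, observed length) per '/'-separated segment."""
    segments = [residues_only(s) for s in annotated.split("/")]
    if len(segments) != len(domains):
        raise ValueError(f"{len(segments)} segments but {len(domains)} domains")
    return [
        (name, end - start + 1, len(seg))
        for (name, start, end), seg in zip(domains, segments)
    ]


if __name__ == "__main__":
    for name, expected, observed in check_domains(CRYSTALLIN, DOMAINS):
        print(f"{name:20s} expected {expected:3d}  observed {observed:3d}")
```

For this sequence the observed lengths are 39, 44, 4, 41 and 46, matching the header ranges and summing to the 174 residues given on the SQ line.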




The protein below is human beta-globin, whose sequence is identical in chimpanzees and bonobos. This is a "mostly alpha" structure. The regions of helix, some of which are quite long, are enclosed in (). I have also marked the "turns" in this protein, but as before, I am inclined to ignore them.

DE   HEMOGLOBIN BETA CHAIN.
GN   HBB.
OS   HOMO SAPIENS (HUMAN), PAN TROGLODYTES (CHIMPANZEE), AND PAN PANISCUS
OS   (PYGMY CHIMPANZEE) (BONOBO).
SQ   SEQUENCE   146 AA;  15867 MW;  EC9744C9 CRC32;



     ~VHLTP(EEKSAVTALW)-GK-VN(VDEVGGEALGRLLVV)Y(PWTQRF)F(ESF)GDLST 

     (PDAVM)-G-N(PKVKAHGKKVLGAFSDGL)-AH-(LDNLKGTFATLSELHCD)-KL-

     HVD-P-(ENFRLLGNVLVCVLAHHFGKE)FT(PPVQAAYQKVVAGVANALA)-HK-YH~
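To see which helices are long, the parenthesised regions can be listed with their positions. This is a sketch under an assumed reading of the notation: positions count letters only, starting at 1 with the first residue shown, so ~ and the turn dashes do not shift the numbering. `helix_segments` and `HBB` are hypothetical names.

```python
import re

# Human beta-globin exactly as annotated above, with line breaks joined.
HBB = (
    "~VHLTP(EEKSAVTALW)-GK-VN(VDEVGGEALGRLLVV)Y(PWTQRF)F(ESF)GDLST"
    "(PDAVM)-G-N(PKVKAHGKKVLGAFSDGL)-AH-(LDNLKGTFATLSELHCD)-KL-"
    "HVD-P-(ENFRLLGNVLVCVLAHHFGKE)FT(PPVQAAYQKVVAGVANALA)-HK-YH~"
)


def helix_segments(annotated: str) -> list[tuple[int, int, str]]:
    """Return (start, length, residues) for every (...) helix.

    Positions are 1-based and count only amino-acid letters.
    """
    segments = []
    position = 0  # residues consumed so far
    # Each match is either a whole parenthesised helix or one residue outside a helix.
    for match in re.finditer(r"\(([A-Z]+)\)|[A-Z]", annotated):
        helix = match.group(1)
        if helix is None:
            position += 1
        else:
            segments.append((position + 1, len(helix), helix))
            position += len(helix)
    return segments


if __name__ == "__main__":
    # Longest helices first.
    for start, length, residues in sorted(helix_segments(HBB), key=lambda s: -s[1]):
        print(f"helix at {start:3d}, {length:2d} residues: {residues}")
```

It finds nine helices covering 114 of the 146 residues; the longest is the 21-residue run ENFRLLGNVLVCVLAHHFGKE starting at position 101. Helix length is one obvious quantity that could drive the length of a musical phrase.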




Here are the protein sequences and corresponding DNA (actually the DNA equivalents of the mRNAs) for superoxide dismutase and triosephosphate isomerase. I have marked beta-sheet regions of the proteins with [] and alpha-helix regions with (). I think it would be interesting to distinguish these musically in some way.

Superoxide dismutase (SOD) is mostly beta, while triosephosphate isomerase (TPI) alternates between alpha-helix and beta-sheet segments.

>gi|134611|sp|P00441|SODC_HUMAN SUPEROXIDE DISMUTASE (CU-ZN).



~MAT[KAVCVL]KGDGPV[QGIINFEQK]ESNG[PVKVWGSIK]GLTEG[LHGFHVH]EFGDNTAG(CTS)AGPHFNPLSR

KHGGPKDEERHVG[DLGNV]TADKDG[VADVSIED]SVISLSGDHCIIGR[TLVVH] 

EKADDLGKGGNEE(STKT)GN

AGS[RLACGV]I[GI]AQ~



[] Beta strand    () helix
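A quick residue count per structure class makes the "mostly beta" description of SOD concrete. The sketch assumes [] and () are never nested, and it has no turn category because no turns are marked in this sequence; `structure_counts` and `SOD` are hypothetical names.

```python
from collections import Counter

# Superoxide dismutase as annotated above, line breaks joined.
SOD = (
    "~MAT[KAVCVL]KGDGPV[QGIINFEQK]ESNG[PVKVWGSIK]GLTEG[LHGFHVH]EFGDNTAG(CTS)AGPHFNPLSR"
    "KHGGPKDEERHVG[DLGNV]TADKDG[VADVSIED]SVISLSGDHCIIGR[TLVVH]"
    "EKADDLGKGGNEE(STKT)GN"
    "AGS[RLACGV]I[GI]AQ~"
)


def structure_counts(annotated: str) -> Counter:
    """Count residues inside [...] (beta), (...) (helix) and outside both (other)."""
    counts = Counter()
    state = "other"
    for ch in annotated:
        if ch == "[":
            state = "beta"
        elif ch == "(":
            state = "helix"
        elif ch in "])":
            state = "other"
        elif ch.isalpha():
            counts[state] += 1
    return counts


if __name__ == "__main__":
    counts = structure_counts(SOD)
    total = sum(counts.values())
    for state in ("beta", "helix", "other"):
        print(f"{state:6s} {counts[state]:4d}  ({100 * counts[state] / total:.0f}%)")
```

For the sequence above it counts 57 residues in beta strands, 7 in helices and 90 in neither, 154 in total.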




>gi|136060|sp|P00938|TPIS_HUMAN TRIOSEPHOSPHATE ISOMERASE (TIM).



~MAPSRK[FFVGGN]WKMNGR(KQSLGELIGTLNA)AKVPAD[TEVVCA]PPTAY(IDFARQK)

LDPKI[AVAAQ]NCYKVTNGAFTGEISP(GMIKD)CGAT[WVVL]GH(SERRHVF)GES

(DELIGQKVAHALA)EGL[GVIACI]GEK(LDERE)AGI(TEKVVFEQTKVIADN)VKDWSK

[VVLAY]EP(VWA)IGTGKTAT(PQQAQEVHEKLRGWLKSNV)S(DAVAQS)TR[IIYG]

GSVTGAT(CKEL)ASQPDVD[GFLV]G(GASL)KPE(FVDIIN)AKQ~



[] Beta strand    () helix
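The alternation in TPI shows up if each marked element is reduced to one letter (B for a [] strand, H for a () helix) and consecutive helices are merged. This is an illustrative sketch, not part of the original data; `element_order`, `collapse_runs` and `TPI` are hypothetical names.

```python
import re

# Triosephosphate isomerase as annotated above, line breaks joined.
TPI = (
    "~MAPSRK[FFVGGN]WKMNGR(KQSLGELIGTLNA)AKVPAD[TEVVCA]PPTAY(IDFARQK)"
    "LDPKI[AVAAQ]NCYKVTNGAFTGEISP(GMIKD)CGAT[WVVL]GH(SERRHVF)GES"
    "(DELIGQKVAHALA)EGL[GVIACI]GEK(LDERE)AGI(TEKVVFEQTKVIADN)VKDWSK"
    "[VVLAY]EP(VWA)IGTGKTAT(PQQAQEVHEKLRGWLKSNV)S(DAVAQS)TR[IIYG]"
    "GSVTGAT(CKEL)ASQPDVD[GFLV]G(GASL)KPE(FVDIIN)AKQ~"
)


def element_order(annotated: str) -> str:
    """One letter per marked element in sequence order: B for [...], H for (...)."""
    return "".join("B" if ch == "[" else "H" for ch in annotated if ch in "[(")


def collapse_runs(pattern: str) -> str:
    """Merge consecutive repeats of the same letter, e.g. 'BHHB' -> 'BHB'."""
    return re.sub(r"(.)\1+", r"\1", pattern)


if __name__ == "__main__":
    order = element_order(TPI)
    print(order)
    print(collapse_runs(order))
```

The raw order contains 8 strands and 13 helices; after merging adjacent helices it becomes eight strand-helix pairs, BHBHBHBHBHBHBHBH. This eight-fold beta/alpha pattern is the one for which the "TIM barrel" fold is named.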