Disclaimer: I have neither experience nor expertise in homology modeling. In my molecular visualization workshops I am often asked about it, so I have gathered the information below.
This document is a supplement to Protein Explorer. Homology modeling cannot be done within Protein Explorer, but a homology model produced outside of Protein Explorer with the methods below can then be loaded into Protein Explorer for visualization.
If you are reading this on paper, you can use the hyperlinks at
Summary.
Homology modeling approximates the 3D structure of a target protein for which only the sequence is available, provided an empirical 3D "template" structure is available with >30% sequence identity. In 2001, about 20% of sequences (in Swiss-Prot/TrEMBL) have suitable templates for homology modeling at least part of the sequence. Homology models are useful to get a rough idea where the alpha carbons of key residues sit in the folded protein. They can guide mutagenesis experiments, or hypotheses about structure-function relationships. Homology models are unreliable in predicting the conformations of insertions or deletions, i.e. portions of the sequence that don't align with the sequence of the template, as well as the details of sidechain positions. Homology models are unlikely to be useful in modeling ligand docking (drug design) unless the sequence identity with the template is >70%, and even then they are less reliable than an empirical crystallographic or NMR structure.
SWISS-MODEL makes it quick and easy to submit a target sequence and get back an automatically generated homology model, provided an empirical structure with >30% sequence identity exists to use as a template. (The template will be identified automatically, and the alignment made automatically.) These automated models may be useful, but will sometimes have errors that could be avoided if manual adjustments are made to the sequence alignment by an expert. Learning to optimise your models manually would take some time (see resources below).
DeepView is freeware integrated with SWISS-MODEL to help you visualize and evaluate the model, aligned with the template. The best way to learn how to do this is with Gale Rhodes' superb tutorial.
1. What is homology modeling?
Suppose you want to know the 3D structure of a target protein that has not been solved empirically by X-ray crystallography or NMR. You have only the sequence. If an empirically determined 3D structure is available for a sufficiently similar protein (50% or better sequence identity would be good), you can use software that arranges the backbone of your sequence identically to this template. This is called "homology modeling". It is, at best, moderately accurate for the positions of alpha carbons in the 3D structure, in regions where the sequence identity is high. It is inaccurate for the details of sidechain positions, and for inserted loops with no matching sequence in the solved structure.
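As a rough illustration of what "sequence identity" means here, the following minimal Python sketch counts identical residues over the aligned (non-gap) columns of a pairwise alignment. The function name, the example sequences, and the choice of denominator are assumptions made for illustration; alignment programs differ in how they define the denominator, so their reported percentages may not match this one.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both arguments are rows of the same pairwise alignment (equal length).
    The denominator used here (aligned columns only) is one common
    convention, chosen as an assumption for this sketch.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue  # insertion or deletion: not counted
        aligned_columns += 1
        if a == b:
            identical += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical / aligned_columns


# Hypothetical aligned sequences (one-letter code, "-" marks a gap).
target = "MKT-AYIAKQR"
template = "MKTLAYLAK-R"
print(round(percent_identity(target, template), 1))  # 8 of 9 aligned columns match: 88.9
```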
A homology modeling routine needs three items of input:
2. How good can homology modeling be?
Two proteins with a high level of sequence identity, and very similar secondary and tertiary structure (identical "folds"), will nevertheless not have exactly identical backbone conformations, even when determined under comparable conditions. A homology model can be expected to differ from the real structure to at least this extent. Overall differences in protein backbone structures are quantified with the root mean square deviation of the positions of alpha carbons, or rmsd. "A model can be considered 'accurate enough' or as 'accurate as you can get' when its rmsd is within the spread of deviations observed for experimental structures displaying a similar sequence identity level as the target and template sequences" (Schwede et al., 3DCrunch). How big is this spread?
The 3DCrunch project used the SWISS-MODEL routines to homology model all sequences in the Swiss-Prot database for which appropriate templates exist. (In 2001, about 20% of the sequences have templates with >30% sequence identity with at least part of the sequence [Liisa Holm, personal communication].) In the same project, in order to assess the accuracy of homology modeling, 1,200 models were made for previously solved structures (see Reliability of models generated by SWISS-MODEL). This enabled comparisons of homology models with empirical structures for the same sequence, where the homology model was made using a template with the most similar sequence available, other than the target sequence itself.
To provide a frame of reference for rmsd values, note that up to 0.5 Å rmsd of alpha carbons occurs in independent determinations of the same protein (Chothia and Lesk, 1996). Proteins with 50% sequence identity have on average 1 Å rmsd (Schwede et al., 3DCrunch). The values given above are for X-ray crystallographic determinations; NMR determinations have rmsd values severalfold higher.
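To make the rmsd values above concrete, here is a minimal Python sketch of the calculation for two lists of alpha carbon coordinates. It assumes the two structures have already been superimposed and that the alpha carbons are paired residue by residue; the superposition step itself is omitted. The function name and the coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root mean square deviation (in the units of the input, e.g. Å)
    between paired alpha carbon positions.

    Assumes the structures are already superimposed and that
    coords_a[i] and coords_b[i] are equivalent residues.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("at least one pair of coordinates is required")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: every model atom is displaced 1 Å along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
empirical = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(ca_rmsd(model, empirical))  # 1.0
```

Because the deviations are squared before averaging, a few badly placed residues (for example in a mismodeled loop) can dominate the rmsd of an otherwise good model.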
If we define a "highly successful homology model" as one having <=2 Å rmsd from the empirical structure, then the template must have >=60% sequence identity with the target for a success rate >70%. Even at high sequence identities (60%-95%), as many as one in ten homology models have an rmsd >5 Å vs. the empirical structure. Below 40% sequence identity, serious errors begin to appear more often. For the complete distribution of results, see Reliability of models generated by SWISS-MODEL, particularly Table I.
3. The importance of the sequence alignment.
The homology modeling routine will proceed to arrange the backbone of the target sequence according to that of the template, using the sequence alignment to decide where to position each residue. Therefore, the quality of the sequence alignment is of crucial importance. Misplaced indels (gaps representing insertions or deletions) will cause residues to be misplaced in space. Although there are many routines that will do alignments automatically, careful inspection and adjustment by someone with specialized training may improve the quality of the alignment, and hence, of the homology model. Good tutorials on such corrections will be found under the links Correcting alignments in Gert Vriend's Homology Modeling Course. DeepView (see below) provides features that assist in adjusting the alignment easily.
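The effect of a misplaced indel can be sketched in a few lines of Python. The function below (a hypothetical helper, not part of any modeling package) reads a pairwise alignment and reports which template residue each target residue would be placed on. The sequences are invented for illustration.

```python
def map_target_to_template(aligned_target, aligned_template, gap="-"):
    """Map 1-based target residue numbers to 1-based template residue
    numbers according to a pairwise alignment. Target residues aligned
    to a gap (insertions) map to None: the template offers no position.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    target_pos = 0
    template_pos = 0
    for a, b in zip(aligned_target, aligned_template):
        if a != gap:
            target_pos += 1
        if b != gap:
            template_pos += 1
        if a != gap:
            mapping[target_pos] = template_pos if b != gap else None
    return mapping


template = "ACDEFGH"
# Same target sequence (ACDFGH), two placements of the single gap.
good = map_target_to_template("ACD-FGH", template)
shifted = map_target_to_template("AC-DFGH", template)
print(good[3], shifted[3])  # 3 4
```

In this toy case, moving the gap by one column places target residue 3 on template residue 4 instead of template residue 3, so a model built from the second alignment would position that residue one place further along the template backbone.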
4. Databases of Ready-Made Homology Models.
ModBase is worth checking because if you find a model, it provides a PIR-formatted sequence alignment ready to paste into Protein Explorer's MSA3D (see below). 3DCrunch does not provide this. It might also be worth comparing models of the same sequence from ModBase vs. SWISS-MODEL because they use different algorithms.
It is quicker and easier to submit your sequence to SWISS-MODEL than to try to find a model in 3DCrunch, and you'll get the same "first approach" results either way. 3DCrunch appears not to have been updated since 1998, and only sequences in Swiss-Prot/TrEMBL were modeled, whereas you can submit any sequence to SWISS-MODEL.
5. Introductions to the Principles of Homology Modeling.
6. Tutorials and Courses: How To Do Homology Modeling.
DeepView comes with a built-in tutorial on homology modeling. This tutorial walks you through the steps but does not explain in detail what the program is doing.
The SWISS-MODEL homology modeling server returns a DeepView-ready PDB file, with the model and each template in a different layer. DeepView has automated routines to display the sequence alignment, adjust gap positions, show energetically unfavorable regions of the alignment, and find and fix sidechain clashes. It is very powerful, but the many keyboard shortcuts and hard-to-find options make it a challenge to use effectively on an occasional basis. The best place to start is Rhodes' tutorial (see immediately above).
This is the best starting place for beginners who want to learn about homology modeling. It guides you through the use of NCBI Entrez to find a sequence in the human genome, using SWISS-MODEL to get a homology model, and most importantly, using DeepView to visualize and evaluate the model.
7. Homology Modeling Servers and Software.
SWISS-MODEL accepts one-letter amino acid code. If you need to convert your sequence from three-letter code, you can do it at Paul Stothard's Sequence Manipulation Site (U Alberta, Edmonton, Canada).
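If you prefer to do the conversion locally, a minimal Python sketch follows. It covers only the 20 standard amino acids and assumes the three-letter codes are separated by spaces or hyphens; the function name and the example sequence are hypothetical, and this is not how the Sequence Manipulation Site itself works.

```python
# Standard three-letter to one-letter amino acid codes (20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(sequence):
    """Convert a sequence in three-letter code (codes separated by
    spaces or hyphens, any letter case) to one-letter code.
    Raises ValueError for codes outside the 20 standard residues.
    """
    codes = sequence.replace("-", " ").split()
    letters = []
    for code in codes:
        key = code.capitalize()  # "MET" or "met" becomes "Met"
        if key not in THREE_TO_ONE:
            raise ValueError("unrecognized residue code: " + code)
        letters.append(THREE_TO_ONE[key])
    return "".join(letters)


print(three_to_one("Met-Lys-Thr-Ala-Tyr"))  # MKTAY
```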
Requirements for Swiss-Model:
To use the WHAT IF model builder, you must choose your template and prepare your alignment first. Instructions for doing these are beyond the scope of the present guide.
The following opinion was sent to the Protein Data Bank Discussion Forum in November, 1999 by Gert Vriend:
One of the goals of the WHAT IF homology modelling module is to produce models that are as good as possible. Another goal is to make errors as obvious as possible when they are unavoidable. Todays modelling technology (which includes MD programs) cannot yet predict where a loop will find its new position if it is disturbed by for example mutations or by binding a ligand or sugar. In WHAT IF we therefore decided not to make a random motion (and without insult meant to my friends in the MD world, optimising a mutated loop by MD invariably looks like a random motion) but just to leave the backbone as 'untouched' as possible. The results in the biannual CASP competition are every round making more clear that this is (still) the best strategy. However, not moving the backbone accounts for about 2/3-rd of the total modelling error in WHAT IF's models.