Go to your openfold installation and make a new directory for the activity: mkdir activity
Edit the parameter path in the run_pretrained_openfold.py script at line 483 so that it is the correct path to the downloaded parameters. You should be able to change it from "openfold", "resources", "params", to os.path.dirname(__file__), "openfold", "resources", "params",
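The reason for this change can be sketched as follows: a bare relative path is resolved against the directory you run the script from, whereas a path built from os.path.dirname(__file__) follows the script's own location. The helper name params_dir and the example install location are hypothetical, used only for illustration.

```python
import os


def params_dir(script_file):
    """Build the parameter directory relative to the script's own location.

    `script_file` plays the role of `__file__` inside
    run_pretrained_openfold.py (hypothetical helper, for illustration only).
    """
    return os.path.join(
        os.path.dirname(script_file), "openfold", "resources", "params"
    )


if __name__ == "__main__":
    # Hypothetical install location, used only to show the resulting path.
    print(params_dir("/opt/openfold/run_pretrained_openfold.py"))
```

With the original relative form, running the script from activity/1_af2 would look for openfold/resources/params inside that working directory instead.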
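Make a new folder for this example: cd activity; mkdir 1_af2; cd 1_af2
Make a new folder for the fasta: mkdir fasta_dir; cd fasta_dir
Download a fasta file to use for prediction: wget https://www.rcsb.org/fasta/entry/1QYS -O 1QYS.fasta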
What is in the .fasta file?
Make an empty alignments directory in 1_af2 and another empty directory in there: cd ..; mkdir alignments; mkdir alignments/1QYS_1
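The inner directory name (1QYS_1) appears to mirror the tag of the sequence in the fasta header. As a minimal sketch, assuming the tag is the leading run of word characters after the ">" (an assumption about how the inference script names sequences, not something stated here), you can list the directory names a fasta file would need. The function name fasta_tags and the example header are illustrative.

```python
import re


def fasta_tags(fasta_text):
    """Return the leading word-character run of every FASTA header line.

    Assumption: alignment sub-directories are named after this tag.
    """
    tags = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = re.match(r"\w+", line[1:])
            if match:
                tags.append(match.group(0))
    return tags


if __name__ == "__main__":
    # Illustrative header and sequence, not copied from the downloaded file.
    example = ">1QYS_1|Chain A|example description\nMKTAYIAKQR\n"
    print(fasta_tags(example))
```

Running this on your own 1QYS.fasta is a quick way to check that alignments/1QYS_1 matches the header before starting inference.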
Copy the example inference script: cd ..; cp ../../examples/monomer/inference.sh .
Edit inference.sh to 1) point to the correct path of run_pretrained_openfold.py, 2) change the MMCIF_DIR to ../../mmcifs (see next step), and 3) add the --save_outputs argument.
Make the MMCIF_DIR with a placeholder .cif in it: mkdir ../../mmcifs; cp ../../tests/test_data/mmcifs/1hf9.cif ../../mmcifs
Run the inference script: bash inference.sh
What did that script create?
What is in the *_output_dict.pkl file?
The .pkl refers to a pickle file. To open this, use:
import pickle
with open('af2_pred_output_dict.pkl', 'rb') as f:
    results_dict = pickle.load(f)
print(results_dict.keys())
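Printing the keys alone does not show what each entry holds. A small sketch like the one below reports the shape of array-like entries and the type of everything else. The helper names are hypothetical, and no particular set of keys is assumed.

```python
import pickle


def summarize_outputs(results):
    """Return {key: shape tuple or type name} for every entry of a dict."""
    summary = {}
    for key, value in results.items():
        shape = getattr(value, "shape", None)
        # Array-like values expose .shape; fall back to the type name otherwise.
        summary[key] = tuple(shape) if shape is not None else type(value).__name__
    return summary


def load_and_summarize(path):
    """Load a pickle file and summarize its top-level entries."""
    with open(path, "rb") as f:
        return summarize_outputs(pickle.load(f))
```

For example, load_and_summarize('af2_pred_output_dict.pkl') returns one line of information per key, which makes it easier to spot which entries are per-residue and which are pairwise.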
How do the predicted structures compare to the native (1QYS)?
How different is the relaxed prediction from the unrelaxed prediction?
How confident is the model about its prediction?
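One way to approach the confidence question is sketched below, assuming the predicted PDB files store per-residue confidence (pLDDT) in the B-factor column, as AlphaFold-style outputs commonly do. The function names are hypothetical, and the parser only reads fixed-column ATOM records.

```python
def per_residue_bfactor(pdb_text):
    """Map (chain ID, residue number) -> mean B-factor over that residue's atoms."""
    totals = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        key = (line[21], int(line[22:26]))  # PDB columns 22 and 23-26
        b_factor = float(line[60:66])       # PDB columns 61-66
        running_sum, count = totals.get(key, (0.0, 0))
        totals[key] = (running_sum + b_factor, count + 1)
    return {key: s / n for key, (s, n) in totals.items()}


def mean_plddt(pdb_text):
    """Average the per-residue values (assumed to be pLDDT) over the chain."""
    values = list(per_residue_bfactor(pdb_text).values())
    return sum(values) / len(values) if values else float("nan")
```

Reading a predicted structure with open(path).read() and passing it to mean_plddt gives a single summary number, while per_residue_bfactor shows which regions the model is less sure about.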
Change the --config_preset argument in inference.sh to a different model version, e.g. one of model_2_ptm, model_3_ptm, model_4_ptm, or model_5_ptm.
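When comparing model versions, it can help to diff their settings programmatically rather than by eye. The sketch below is a generic nested-dictionary diff. The name diff_nested and the example values are hypothetical and do not reflect the actual OpenFold settings.

```python
MISSING = "<missing>"  # marker for keys present on only one side


def diff_nested(a, b, prefix=""):
    """Return {dotted.key: (value_in_a, value_in_b)} for every differing leaf."""
    diffs = {}
    for key in sorted(set(a) | set(b), key=str):
        path = f"{prefix}{key}"
        value_a, value_b = a.get(key, MISSING), b.get(key, MISSING)
        if isinstance(value_a, dict) and isinstance(value_b, dict):
            # Recurse into sub-sections that exist on both sides.
            diffs.update(diff_nested(value_a, value_b, prefix=path + "."))
        elif value_a != value_b:
            diffs[path] = (value_a, value_b)
    return diffs


if __name__ == "__main__":
    # Made-up settings, only to show the output format.
    preset_a = {"model": {"template": {"enabled": True}}, "seed": 1}
    preset_b = {"model": {"template": {"enabled": False}}, "seed": 1}
    print(diff_nested(preset_a, preset_b))
```

To apply this to the real presets, one option (an assumption about the config objects, worth verifying in your installation) is to convert the result of model_config for each preset name to a plain dict first and pass the two dicts to diff_nested.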
Open openfold/openfold/config.py and compare the settings between the models. (Look inside the model_config function.)
Make a new folder for this example: cd activity; mkdir 2_of; cd 2_of
Copy fasta_dir, alignments, and inference.sh from the previous example.
Add --openfold_checkpoint_path /path/to/openfold/resources/openfold_params/finetuning_ptm_1.pt to the inference script. Make sure to use --config_preset model_1_ptm.
Run the inference script: bash inference.sh
Make a new folder for this example: cd activity; mkdir 3_af2_msa; cd 3_af2_msa
Copy fasta_dir and inference.sh from the first example.
Download this script for computing MSAs and templates:
Generate the MSA and templates for the sequence in fasta_dir: python ../../get_mmseqs_msa_templates.py ./fasta_dir --alignment_dir ./alignments --mmcif_dir ../../mmcifs
Run the inference script: bash inference.sh
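To see how much evolutionary information the prediction had to work with, you can count the sequences in the generated alignments. This is a minimal sketch, assuming the MSA files are written in A3M format with an .a3m extension somewhere under ./alignments. The function names are hypothetical.

```python
from pathlib import Path


def count_a3m_sequences(a3m_text):
    """Count sequences in A3M text: one '>' header line per sequence."""
    return sum(1 for line in a3m_text.splitlines() if line.startswith(">"))


def msa_depths(alignment_dir):
    """Return {relative path of each .a3m file: number of sequences}."""
    root = Path(alignment_dir)
    return {
        str(path.relative_to(root)): count_a3m_sequences(path.read_text())
        for path in sorted(root.rglob("*.a3m"))
    }
```

Calling msa_depths("./alignments") from the example folder lists each alignment file with its depth, which is useful context when comparing this run to the empty-alignment run from the first example.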
Make a new folder for this example: cd activity; mkdir 4_af2_multimer; cd 4_af2_multimer
Copy inference.sh from the first example.
Make a new folder for the fasta: mkdir fasta_dir; cd fasta_dir
Download a fasta file to use for prediction: wget https://www.rcsb.org/fasta/entry/8AJY -O 8AJY.fasta
Generate the MSAs and templates for the sequences in fasta_dir: python ../../get_mmseqs_msa_templates.py ./fasta_dir --alignment_dir ./alignments --mmcif_dir ../../mmcifs
In order to run the multimer model with the MMseqs2 MSA and templates, we first need to disable sequence pairing. To do this, edit the _all_seq_msa_features method of DataPipelineMultimer so that it matches the following:
def _all_seq_msa_features(alignment_dir, alignment_index):
    """Get MSA features for unclustered uniprot, for pairing."""
    msas = []
    if alignment_index is not None:
        fp = open(os.path.join(alignment_dir, alignment_index["db"]), "rb")
        def read_msa(start, size):
            fp.seek(start)
            msa = fp.read(size).decode("utf-8")
            return msa
        start, size = next(iter((start, size) for name, start, size in alignment_index["files"]
                                if name == 'uniprot_hits.sto'))
        msa = parsers.parse_stockholm(read_msa(start, size))
        msas.append(msa)
        fp.close()
    else:
        for f in os.listdir(alignment_dir):
            path = os.path.join(alignment_dir, f)
            filename, ext = os.path.splitext(f)
            if ext == ".a3m":
                with open(path, "r") as fp:
                    msa = parsers.parse_a3m(fp.read())
            elif ext == ".sto" and filename not in ["uniprot_hits", "hmm_output"]:
                with open(path, "r") as fp:
                    msa = parsers.parse_stockholm(
                        fp.read()
                    )
            else:
                continue
            msas.append(msa)
        # uniprot_msa_path = os.path.join(alignment_dir, "uniprot_hits.sto")
        # if not os.path.exists(uniprot_msa_path):
        #     chain_id = os.path.basename(os.path.normpath(alignment_dir))
        #     raise ValueError(f"Missing 'uniprot_hits.sto' for {chain_id}. "
        #                      f"This is required for Multimer MSA pairing.")
        # with open(uniprot_msa_path, "r") as fp:
        #     uniprot_msa_string = fp.read()
        # msa = parsers.parse_stockholm(uniprot_msa_string)
    all_seq_features = make_msa_features([msa])
    valid_feats = msa_pairing.MSA_FEATURES + (
        'msa_species_identifiers',
    )
    feats = {
        f'{k}_all_seq': v for k, v in all_seq_features.items()
        if k in valid_feats
    }
    return feats
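The file-selection rule in the else branch above can be tried in isolation. The sketch below reproduces only that rule: .a3m files are parsed as A3M, .sto files are parsed as Stockholm unless they are named uniprot_hits or hmm_output, and everything else is skipped. The function name and the example file names are hypothetical.

```python
import os

SKIPPED_STOCKHOLM = {"uniprot_hits", "hmm_output"}


def classify_alignment_files(filenames):
    """Return (filename, format) pairs for the files the edited method would parse."""
    chosen = []
    for f in filenames:
        stem, ext = os.path.splitext(f)
        if ext == ".a3m":
            chosen.append((f, "a3m"))
        elif ext == ".sto" and stem not in SKIPPED_STOCKHOLM:
            chosen.append((f, "stockholm"))
        # Any other file is ignored, mirroring the `continue` branch.
    return chosen


if __name__ == "__main__":
    # Hypothetical directory listing, for illustration only.
    listing = ["bfd.a3m", "uniprot_hits.sto", "mgnify_hits.sto",
               "hmm_output.sto", "pdb70_hits.hhr"]
    print(classify_alignment_files(listing))
```

Note that, as written, the method passes only the last parsed alignment to make_msa_features, so the order returned by os.listdir matters when a chain directory holds more than one usable file.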
Within the inference script, change --config_preset to model_1_multimer_v3.
Run the inference script: bash inference.sh
(Use your target protein)