Go to your openfold installation and make a new directory for the activity: mkdir activity
Edit the run_pretrained_openfold.py script at line 483 so that it points to the correct path to the downloaded parameters. You should be able to change it from "openfold", "resources", "params", to os.path.dirname(__file__), "openfold", "resources", "params",
Make a new folder for this example: cd activity; mkdir 1_af2; cd 1_af2
Make a new folder for the fasta: mkdir fasta_dir; cd fasta_dir
Download a fasta file to use for prediction: wget https://www.rcsb.org/fasta/entry/1QYS -O 1QYS.fasta
What is in the .fasta file?
Make an alignments directory in 1_af2 and another empty directory in there: cd ..; mkdir alignments; mkdir alignments/1QYS_1
From inside 1_af2, copy the inference script: cp ../../examples/monomer/inference.sh .
Edit inference.sh to 1) point to the correct path of run_pretrained_openfold.py, 2) change the MMCIF_DIR to ../../mmcifs (see next step), and 3) add the --save_outputs argument.
Make the MMCIF_DIR with a placeholder .cif in it: mkdir ../../mmcifs; cp ../../tests/test_data/mmcifs/1hf9.cif ../../mmcifs
Run the inference script: bash inference.sh
What did that script create?
What is in the *_output_dict.pkl file?
The .pkl refers to a pickle file. To open this, use:
    import pickle
    with open('af2_pred_output_dict.pkl', 'rb') as f:
        results_dict = pickle.load(f)
    print(results_dict.keys())
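The keys alone do not say much about what each entry holds. Below is a minimal sketch that lists the type of each entry and, where present, its array shape. `summarize_outputs` and `load_and_summarize` are hypothetical helpers, not part of OpenFold, and they assume only that the pickle holds a dictionary.

```python
import pickle


def summarize_outputs(results_dict):
    """Return {key: description} for each entry of a loaded output dict.

    Hypothetical helper: it only assumes array-like values expose `.shape`.
    """
    summary = {}
    for key, value in results_dict.items():
        shape = getattr(value, "shape", None)
        if shape is not None:
            summary[key] = f"{type(value).__name__} with shape {tuple(shape)}"
        else:
            summary[key] = type(value).__name__
    return summary


def load_and_summarize(pkl_path):
    """Load a pickle file and print one line per key."""
    with open(pkl_path, "rb") as f:
        results_dict = pickle.load(f)
    for key, description in summarize_outputs(results_dict).items():
        print(f"{key}: {description}")
```

Call `load_and_summarize` with the path of your own *_output_dict.pkl file. Entries that are neither arrays nor scalars show up by type name only, which is usually enough to decide what to inspect next.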
How do the predicted structures compare to the native (1QYS)?
How different is the relaxed prediction from the unrelaxed prediction?
How confident is the model about its prediction?
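One possible way to approach the confidence question: AlphaFold-style pipelines commonly write the per-residue pLDDT into the B-factor column of the predicted PDB file. The sketch below assumes that convention holds for your output files and that they follow the fixed-column PDB layout. `plddt_from_pdb_lines` and `mean_plddt` are hypothetical helpers, and the file name in the usage comment is a placeholder.

```python
def plddt_from_pdb_lines(lines):
    """Collect (chain, residue number, B-factor) for every CA atom.

    Uses the fixed-column PDB layout: atom name in columns 13-16,
    chain ID in column 22, residue number in 23-26, B-factor in 61-66.
    """
    values = []
    for line in lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            res_num = int(line[22:26])
            b_factor = float(line[60:66])
            values.append((chain_id, res_num, b_factor))
    return values


def mean_plddt(values):
    """Average the B-factor column over all collected residues."""
    return sum(v[2] for v in values) / len(values)


# Assumed usage; replace the placeholder with one of your predicted PDB files:
# with open("prediction.pdb") as f:
#     per_residue = plddt_from_pdb_lines(f)
# print(mean_plddt(per_residue))
```

If the values look like ordinary crystallographic B-factors rather than scores between 0 and 100, the file is probably an experimental structure and the assumption does not apply.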
Change the --config_preset argument in inference.sh to a different model version, e.g. one of model_2_ptm, model_3_ptm, model_4_ptm, or model_5_ptm.
Open openfold/openfold/config.py and compare the settings between the models. (Look inside the model_config function.)
Make a new folder for this example: cd activity; mkdir 2_of; cd 2_of
Copy fasta_dir, alignments, and inference.sh from the previous example.
Add --openfold_checkpoint_path /path/to/openfold/resources/openfold_params/finetuning_ptm_1.pt to the inference script. Make sure to use --config_preset model_1_ptm.
Run the inference script: bash inference.sh
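For the preset comparison in config.py from the first example, reading the file side by side can get tedious. Here is a small sketch that reports where two nested dictionaries differ. `diff_nested` is a hypothetical helper, not part of OpenFold. The commented usage assumes that `model_config` returns an object with a `to_dict()` method, which you should check against your OpenFold version.

```python
def diff_nested(a, b, prefix=""):
    """Return {dotted.key: (value_in_a, value_in_b)} for every leaf that differs."""
    differences = {}
    for key in sorted(set(a) | set(b), key=str):
        path = f"{prefix}.{key}" if prefix else str(key)
        in_a, in_b = key in a, key in b
        if in_a and in_b and isinstance(a[key], dict) and isinstance(b[key], dict):
            # Both sides are nested sections: recurse and keep the dotted path.
            differences.update(diff_nested(a[key], b[key], path))
        elif not (in_a and in_b) or a[key] != b[key]:
            differences[path] = (a.get(key, "<missing>"), b.get(key, "<missing>"))
    return differences


# Assumed usage inside an OpenFold environment (not verified here):
# from openfold.config import model_config
# cfg_1 = model_config("model_1_ptm").to_dict()
# cfg_2 = model_config("model_2_ptm").to_dict()
# for path, (v1, v2) in diff_nested(cfg_1, cfg_2).items():
#     print(path, v1, v2)
```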
Make a new folder for this example: cd activity; mkdir 3_af2_msa; cd 3_af2_msa
Copy fasta_dir and inference.sh from the first example.
Download this script for computing MSAs and templates:
Generate MSA and templates for sequence in fasta_dir: python ../../get_mmseqs_msa_templates.py ./fasta_dir --alignment_dir ./alignments --mmcif_dir ../../mmcifs
Run the inference script: bash inference.sh
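Since this example differs from the first one only in the MSA it uses, it can help to check how many sequences the alignment step actually produced. This is a minimal sketch of an A3M reader. `read_a3m` is a hypothetical helper, and the path in the usage comment is a placeholder for whichever .a3m file ends up under your alignments directory.

```python
def read_a3m(text):
    """Parse A3M text into a list of (header, aligned_sequence) pairs.

    In A3M, lowercase letters mark insertions relative to the query.
    They are dropped here so every row has the length of the query.
    """
    entries = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append("".join(c for c in line if not c.islower()))
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


# Assumed usage; the path is a placeholder:
# with open("alignments/<your_chain_dir>/<your_file>.a3m") as f:
#     msa = read_a3m(f.read())
# print(len(msa), "sequences of length", len(msa[0][1]))
```

Comparing this depth with the empty alignment directory used in the first example gives some context for any change in prediction quality.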
Make a new folder for this example: cd activity; mkdir 4_af2_multimer; cd 4_af2_multimer
Copy inference.sh from the first example
Make a new folder for the fasta: mkdir fasta_dir; cd fasta_dir
Download a fasta file to use for prediction: wget https://www.rcsb.org/fasta/entry/8AJY -O 8AJY.fasta
Go back up to 4_af2_multimer and generate MSA and templates for sequences in fasta_dir: cd ..; python ../../get_mmseqs_msa_templates.py ./fasta_dir --alignment_dir ./alignments --mmcif_dir ../../mmcifs
In order to run the multimer model with the MMseqs2 MSA and templates, we first need to disable sequence pairing. To do this, edit the _all_seq_msa_features method of DataPipelineMultimer so that it matches the following:
    def _all_seq_msa_features(alignment_dir, alignment_index):
        """Get MSA features for unclustered uniprot, for pairing."""
        msas = []
        if alignment_index is not None:
            fp = open(os.path.join(alignment_dir, alignment_index["db"]), "rb")

            def read_msa(start, size):
                fp.seek(start)
                msa = fp.read(size).decode("utf-8")
                return msa

            start, size = next(iter((start, size) for name, start, size in alignment_index["files"]
                                    if name == 'uniprot_hits.sto'))
            msa = parsers.parse_stockholm(read_msa(start, size))
            msas.append(msa)
            fp.close()
        else:
            for f in os.listdir(alignment_dir):
                path = os.path.join(alignment_dir, f)
                filename, ext = os.path.splitext(f)
                if ext == ".a3m":
                    with open(path, "r") as fp:
                        msa = parsers.parse_a3m(fp.read())
                elif ext == ".sto" and filename not in ["uniprot_hits", "hmm_output"]:
                    with open(path, "r") as fp:
                        msa = parsers.parse_stockholm(
                            fp.read()
                        )
                else:
                    continue
                msas.append(msa)

            # uniprot_msa_path = os.path.join(alignment_dir, "uniprot_hits.sto")
            # if not os.path.exists(uniprot_msa_path):
            #     chain_id = os.path.basename(os.path.normpath(alignment_dir))
            #     raise ValueError(f"Missing 'uniprot_hits.sto' for {chain_id}. "
            #                      f"This is required for Multimer MSA pairing.")
            # with open(uniprot_msa_path, "r") as fp:
            #     uniprot_msa_string = fp.read()
            # msa = parsers.parse_stockholm(uniprot_msa_string)

        all_seq_features = make_msa_features([msa])
        valid_feats = msa_pairing.MSA_FEATURES + (
            'msa_species_identifiers',
        )
        feats = {
            f'{k}_all_seq': v for k, v in all_seq_features.items()
            if k in valid_feats
        }
        return feats
Within the inference script, change --config_preset to model_1_multimer_v3.
Run the inference script: bash inference.sh
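The multimer example predicts several chains at once, so it can be worth checking which sequences and chains the downloaded FASTA contains. The sketch below assumes RCSB-style headers of the form `ID_1|Chains A, B|description|organism`; that layout is an assumption to verify against your own file. `read_fasta` and `chains_in_header` are hypothetical helpers.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def chains_in_header(header):
    """Extract chain IDs from the second '|' field (assumed RCSB layout)."""
    fields = header.split("|")
    if len(fields) < 2:
        return []
    label = fields[1].strip()
    for prefix in ("Chains ", "Chain "):
        if label.startswith(prefix):
            return [c.strip() for c in label[len(prefix):].split(",")]
    return []


# Assumed usage from inside 4_af2_multimer:
# with open("fasta_dir/8AJY.fasta") as f:
#     for header, seq in read_fasta(f.read()):
#         print(chains_in_header(header), len(seq))
```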
(Use your target protein)