In-Class Activity

Setup for the activity.
1. Go to the openfold installation and make a new directory for the activity: mkdir activity.
2. Edit run_pretrained_openfold.py at line 483 so that it points to the downloaded parameters. Changing the path components from "openfold", "resources", "params", to os.path.dirname(__file__), "openfold", "resources", "params", should be enough.
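The change in step 2 matters because a relative path is resolved against the directory you launch the script from, which in this activity is one of the example folders rather than the openfold installation. The sketch below illustrates the difference. The function names, the install location, and the params_<preset>.npz file name are assumptions for illustration, not a copy of the script.

```python
import os

def relative_param_path(preset):
    # Resolved against the current working directory at run time.
    return os.path.join("openfold", "resources", "params", f"params_{preset}.npz")

def anchored_param_path(script_file, preset):
    # Resolved against the directory that contains the script itself,
    # so it no longer depends on where the script is launched from.
    return os.path.join(
        os.path.dirname(script_file),
        "openfold", "resources", "params", f"params_{preset}.npz",
    )

# Hypothetical install location, used only for this demonstration.
script = "/opt/openfold/run_pretrained_openfold.py"
print(relative_param_path("model_1_ptm"))
print(anchored_param_path(script, "model_1_ptm"))
```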
Make a prediction with AlphaFold2 (AF2) weights without an MSA or templates.
1. Make a new folder for this example: cd activity; mkdir 1_af2; cd 1_af2
2. Make a new folder for the fasta: mkdir fasta_dir; cd fasta_dir
3. Download a fasta file to use for prediction: wget https://www.rcsb.org/fasta/entry/1QYS -O 1QYS.fasta
  - What is a .fasta file?
4. Make an empty alignment directory in 1_af2 and another empty directory in there: cd ..; mkdir alignments; mkdir alignments/1QYS_1
5. Copy the inference script into 1_af2 (you should already be in 1_af2 after the previous step): cp ../../examples/monomer/inference.sh .
6. Edit the inference script by 1) updating the path to run_pretrained_openfold.py, 2) changing MMCIF_DIR to ../../mmcifs (see next step), and 3) adding the --save_outputs argument.
7. Make a new MMCIF_DIR with a placeholder .cif in it: mkdir ../../mmcifs; cp ../../tests/test_data/mmcifs/1hf9.cif ../../mmcifs
8. Run the inference script: bash inference.sh
  - What did that script create?
    - What is contained inside of the *_output_dict.pkl file?
      - The .pkl refers to a pickle file. To open this, use:
        
        import pickle
        with open('af2_pred_output_dict.pkl', 'rb') as f:
            results_dict = pickle.load(f)
        print(results_dict.keys())
  - How do the predicted structures compare to the native (1QYS)?
  - How different is the relaxed prediction from the unrelaxed prediction?
  - How confident is the model about its prediction?
  - Change the --config_preset argument in inference.sh to a different model version, e.g. one of model_2_ptm, model_3_ptm, model_4_ptm, or model_5_ptm.
    - How different are the predictions and confidence?
    - Look at openfold/openfold/config.py and compare the settings between the models. (Look inside the model_config function).
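For the confidence question above, one option is to summarize the per-residue pLDDT stored in the output dictionary. The sketch below assumes the dictionary has a `plddt` entry holding one value per residue. That key name is an assumption, so check it against the keys printed by the pickle snippet above before relying on it.

```python
from statistics import fmean

def summarize_plddt(results, key="plddt"):
    """Return (mean, min, max) of the per-residue confidence values."""
    if key not in results:
        # Report what is available so the right key can be chosen.
        raise KeyError(f"{key!r} not found; available keys: {sorted(results)}")
    values = [float(v) for v in results[key]]
    return fmean(values), min(values), max(values)
```

With `results_dict` loaded as shown earlier, `summarize_plddt(results_dict)` gives a quick number to compare across the model_1_ptm to model_5_ptm presets.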
Make a prediction with OpenFold (OF) weights without an MSA or templates.
1. Make a new folder for this example: cd activity; mkdir 2_of; cd 2_of
2. Copy fasta_dir, alignments, and inference.sh from the previous example.
3. Add --openfold_checkpoint_path /path/to/openfold/resources/openfold_params/finetuning_ptm_1.pt to the inference script. Make sure to use --config_preset model_1_ptm.
4. Run the inference script: bash inference.sh
  - How does the OF prediction compare to the AF2 prediction? Look at the predicted structures as well as the confidence values.
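To compare confidence between the AF2 and OF runs residue by residue, the values can be read from the predicted PDB files. This sketch assumes, following the AlphaFold convention, that per-residue pLDDT is written to the B-factor column of the output PDB. Verify that against the output dictionary before trusting the numbers.

```python
def ca_bfactors(pdb_text):
    """Map (chain, residue number) -> B-factor for each CA atom in PDB text."""
    values = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, chain in 22,
        # residue number in 23-26, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            values[key] = float(line[60:66])
    return values

def confidence_difference(pdb_a, pdb_b):
    """Per-residue (a - b) for residues present in both structures."""
    a, b = ca_bfactors(pdb_a), ca_bfactors(pdb_b)
    return {k: a[k] - b[k] for k in sorted(a.keys() & b.keys())}
```

Both functions take the file contents as text, so read each predicted PDB with `open(path).read()` first. The file names depend on what the inference script wrote in each example folder.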
Make a prediction with AF2 with MSAs and templates.
1. Make a new folder for this example: cd activity; mkdir 3_af2_msa; cd 3_af2_msa
2. Copy fasta_dir and inference.sh from the first example.
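3. Download this script for computing MSAs and templates and save it in the openfold directory (the next step calls it as ../../get_mmseqs_msa_templates.py):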
  
  get_mmseqs_msa_templates.py
4. Generate MSA and templates for sequence in fasta_dir: python ../../get_mmseqs_msa_templates.py ./fasta_dir --alignment_dir ./alignments --mmcif_dir ../../mmcifs
  - What do the MSA and template information look like?
5. Run the inference script: bash inference.sh
  - How do the predictions compare to those without MSA and templates?
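One quick way to see what the MSA looks like is to count its sequences and check their aligned length. The sketch below reads A3M text, where lowercase letters mark insertions relative to the query and are dropped to recover the aligned columns. It assumes the MSA script writes .a3m files into the alignment directory, so adjust if the files you find there use another format.

```python
def read_a3m(text):
    """Return a list of (header, aligned_sequence) from A3M-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Lowercase residues are insertions; removing them leaves one
            # character per query column.
            chunks.append("".join(c for c in line if not c.islower()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def msa_summary(text):
    """Report the number of sequences and the distinct aligned lengths."""
    records = read_a3m(text)
    lengths = {len(seq) for _, seq in records}
    return {"depth": len(records), "aligned_lengths": sorted(lengths)}
```

A single value in `aligned_lengths` equal to the query length is a sign the file parsed cleanly. The `depth` value is also a convenient starting point when subsampling the MSA later.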

Make a multimer prediction with AF2-Multimer.

Make a new folder for this example: cd activity; mkdir 4_af2_multimer; cd 4_af2_multimer
Copy inference.sh from the first example
Make a new folder for the fasta: mkdir fasta_dir; cd fasta_dir
Download a fasta file to use for prediction: wget https://www.rcsb.org/fasta/entry/8AJY -O 8AJY.fasta
Return to 4_af2_multimer and generate MSA and templates for the sequences in fasta_dir: cd ..; python ../../get_mmseqs_msa_templates.py ./fasta_dir --alignment_dir ./alignments --mmcif_dir ../../mmcifs
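A fasta file is plain text: each record has a header line starting with > followed by one or more sequence lines, and a complex can contain several records. Listing the records before running the model shows how many distinct chains to expect. The sketch below is a minimal reader. The rule that the tag is the header text before the first | is an assumption, suggested by the alignments/1QYS_1 directory name in the first example.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def record_tag(header):
    # Assumption: the tag is the header text before the first "|".
    return header.split("|", 1)[0].strip()
```

For example, `[(record_tag(h), len(s)) for h, s in read_fasta(open("fasta_dir/8AJY.fasta").read())]` lists each tag with its sequence length.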

In order to run the multimer model with the MMseqs2 MSA and templates, we first need to disable sequence pairing. To do this, edit the _all_seq_msa_features method of DataPipelineMultimer so that it matches the following:

    def _all_seq_msa_features(alignment_dir, alignment_index):
        """Get MSA features for unclustered uniprot, for pairing."""
        msas = []
        if alignment_index is not None:
            fp = open(os.path.join(alignment_dir, alignment_index["db"]), "rb")

            def read_msa(start, size):
                fp.seek(start)
                msa = fp.read(size).decode("utf-8")
                return msa

            start, size = next(iter((start, size) for name, start, size in alignment_index["files"]
                                    if name == 'uniprot_hits.sto'))

            msa = parsers.parse_stockholm(read_msa(start, size))
            msas.append(msa)
            fp.close()
        else:
            for f in os.listdir(alignment_dir):
                path = os.path.join(alignment_dir, f)
                filename, ext = os.path.splitext(f)

                if ext == ".a3m":
                    with open(path, "r") as fp:
                        msa = parsers.parse_a3m(fp.read())
                elif ext == ".sto" and filename not in ["uniprot_hits", "hmm_output"]:
                    with open(path, "r") as fp:
                        msa = parsers.parse_stockholm(
                            fp.read()
                        )
                else:
                    continue

                msas.append(msa)
            # uniprot_msa_path = os.path.join(alignment_dir, "uniprot_hits.sto")
            # if not os.path.exists(uniprot_msa_path):
            #     chain_id = os.path.basename(os.path.normpath(alignment_dir))
            #     raise ValueError(f"Missing 'uniprot_hits.sto' for {chain_id}. "
            #                      f"This is required for Multimer MSA pairing.")

            # with open(uniprot_msa_path, "r") as fp:
            #     uniprot_msa_string = fp.read()
            # msa = parsers.parse_stockholm(uniprot_msa_string)

        all_seq_features = make_msa_features([msa])
        valid_feats = msa_pairing.MSA_FEATURES + (
            'msa_species_identifiers',
        )
        feats = {
            f'{k}_all_seq': v for k, v in all_seq_features.items()
            if k in valid_feats
        }
        return feats

Within the inference script, change --config_preset to model_1_multimer_v3.
Run the inference script: bash inference.sh
- Do the predictions match the native structure?
- How confident is AF2-Multimer in its prediction?
- What happens when you remove MSAs or templates for this complex?
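To put a number on how well a prediction matches the native structure without superimposing the two, one option is the distance RMSD over CA atoms, which compares all pairwise CA-CA distances and is unaffected by rotation or translation. The sketch below assumes both structures are in PDB format and share chain IDs and residue numbering. If they do not, the keys need to be remapped first. Note that this measure cannot distinguish a structure from its mirror image.

```python
import math

def ca_coords(pdb_text):
    """Map (chain, residue number) -> (x, y, z) for each CA atom in PDB text."""
    coords = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: x, y, z occupy columns 31-38, 39-46, 47-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords

def distance_rmsd(coords_a, coords_b):
    """Root-mean-square difference of pairwise CA distances over shared residues."""
    keys = sorted(coords_a.keys() & coords_b.keys())
    if len(keys) < 2:
        raise ValueError("need at least two shared residues")
    total, count = 0.0, 0
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            da = math.dist(coords_a[keys[i]], coords_a[keys[j]])
            db = math.dist(coords_b[keys[i]], coords_b[keys[j]])
            total += (da - db) ** 2
            count += 1
    return math.sqrt(total / count)
```

Only residues present in both structures are compared, so residues missing from the experimental model are skipped rather than penalized.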

Project Time

(Use your target protein)

Compare predictions with the different AF2 model weights
Compare predictions with different input settings
- With or without MSA
- With or without templates
Compare AF2 and OF predictions of your protein
AF2 is known to model the effects of mutations poorly. Choose a few residues within the core of your target protein and mutate them to something you think should be disruptive. How does AF2 predict this mutant?
Can you find the minimal set of required inputs?
- How big of an MSA is needed?
- How many templates?
- How many recycles?
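For the mutation experiment above, a small helper can build the mutant sequence and guard against off-by-one mistakes by checking the wild-type residue at each position. This is a sketch under stated assumptions: the "L10D" notation with 1-based positions and the function names are illustrative choices, not part of OpenFold.

```python
def mutate(sequence, substitutions):
    """Apply point substitutions written like "L10D" (1-based position)."""
    residues = list(sequence)
    for sub in substitutions:
        wild, pos, new = sub[0], int(sub[1:-1]), sub[-1]
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position outside sequence")
        if residues[pos - 1] != wild:
            # Catches numbering mistakes before a prediction is run.
            raise ValueError(f"{sub}: expected {wild}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)

def to_fasta(tag, sequence, width=60):
    """Format one record as FASTA text with wrapped sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{tag}\n" + "\n".join(lines) + "\n"
```

Write the result of `to_fasta` to a new file in fasta_dir. If you give the mutant a new tag, it will likely need its own directory under alignments, as with alignments/1QYS_1 in the first example.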