In-Class Activity

  1. Setup for the activity.
    1. Go to the openfold installation and make a new directory for the activity: mkdir activity.
    2. Edit line 483 of the run_pretrained_openfold.py script so that it points to the downloaded parameters. You should be able to change it from "openfold", "resources", "params", to os.path.dirname(__file__), "openfold", "resources", "params", so that the path is resolved relative to the script rather than to the directory you run it from.
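
The edit in step 1.2 matters because a bare relative path is resolved against the directory the script is launched from, while a path built from os.path.dirname(__file__) is resolved against the script's own location. A minimal sketch of the difference follows; the install location /home/student/openfold and the helper params_dir_for are assumptions for illustration, not part of OpenFold.

```python
import os


def params_dir_for(script_path):
    """Build <directory of script>/openfold/resources/params."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, "openfold", "resources", "params")


# Hypothetical install location, used only for illustration.
script = os.path.join(os.sep, "home", "student", "openfold",
                      "run_pretrained_openfold.py")

# A bare relative path depends on the current working directory.
relative = os.path.join("openfold", "resources", "params")
print(os.path.abspath(relative))  # changes with the directory you run from

# A path anchored to the script location does not.
print(params_dir_for(script))
```

Because the later steps run inference.sh from inside activity/1_af2 and similar folders, the anchored form is the one that keeps working.
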
  2. Make a prediction with AlphaFold2 (AF2) weights without an MSA or templates.
    1. Make a new folder for this example: cd activity; mkdir 1_af2; cd 1_af2
    2. Make a new folder for the fasta: mkdir fasta_dir; cd fasta_dir
    3. Download a fasta file to use for prediction: wget https://www.rcsb.org/fasta/entry/1QYS -O 1QYS.fasta
      • What is a .fasta file?
    4. Go back to 1_af2 and make an empty alignments directory containing one empty subdirectory for this sequence: cd ..; mkdir alignments; mkdir alignments/1QYS_1
    5. Copy the inference script into 1_af2 (you should still be in 1_af2 after the previous step): cp ../../examples/monomer/inference.sh .
    6. Edit the inference script by 1) updating the path to run_pretrained_openfold.py, 2) changing MMCIF_DIR to ../../mmcifs (see next step), and 3) adding the --save_outputs argument.
    7. Make a new MMCIF_DIR with a placeholder .cif in it: mkdir ../../mmcifs; cp ../../tests/test_data/mmcifs/1hf9.cif ../../mmcifs
    8. Run the inference script: bash inference.sh
      • What did that script create?

        • What is contained inside of the *_output_dict.pkl file?
          • The .pkl extension indicates a Python pickle file. To open it, use:

            import pickle
            with open('af2_pred_output_dict.pkl', 'rb') as f:
                results_dict = pickle.load(f)
            print(results_dict.keys())
            
      • How do the predicted structures compare to the native (1QYS)?

      • How different is the relaxed prediction from the unrelaxed prediction?

      • How confident is the model about its prediction?

      • Change the --config_preset argument in inference.sh to a different model version, e.g. one of model_2_ptm, model_3_ptm, model_4_ptm, or model_5_ptm.

        • How different are the predictions and confidence?
        • Look at openfold/openfold/config.py and compare the settings between the models. (Look inside the model_config function).
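
For the question in step 2.3 about .fasta files: a FASTA file is plain text in which each record starts with a header line beginning with ">" and is followed by one or more lines of sequence. A minimal parser sketch follows; the example record is made up (it is not the real 1QYS sequence), and this parse_fasta is an illustrative helper, not OpenFold's own parser.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only.
example = """>EXAMPLE_1|Chain A|hypothetical protein
MKTAYIAKQR
QISFVKSHFS
"""

for header, seq in parse_fasta(example):
    record_id = header.split("|")[0]
    print(record_id, len(seq), seq)
```

To inspect the downloaded file, pass the contents of fasta_dir/1QYS.fasta to parse_fasta instead of the example string. The alignments/1QYS_1 directory from step 2.4 appears to follow the same record-ID convention as the first field of the header.
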
  3. Make a prediction with OpenFold (OF) weights without an MSA or templates.
    1. Make a new folder for this example: cd activity; mkdir 2_of; cd 2_of
    2. Copy fasta_dir, alignments, and inference.sh from the previous example.
    3. Add --openfold_checkpoint_path /path/to/openfold/resources/openfold_params/finetuning_ptm_1.pt to the inference script. Make sure to use --config_preset model_1_ptm.
    4. Run the inference script: bash inference.sh
      • How does the OF prediction compare to the AF2 prediction? Look at the predicted structures as well as the confidence values.
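
One way to put numbers on the confidence comparison is to read per-residue pLDDT values from the two predicted PDB files. The sketch below assumes, as is conventional for AlphaFold-style outputs, that the B-factor column of each ATOM record holds the pLDDT; it is worth checking this against the output dictionary before relying on it. The PDB lines are generated with dummy coordinates and made-up scores, and the helper names are hypothetical.

```python
from statistics import mean


def ca_plddt(pdb_text):
    """Map residue number -> B-factor of the CA atom (assumed to hold pLDDT)."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB is fixed-column: atom name in 13-16, residue number in 23-26,
        # B-factor in 61-66 (1-based columns).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def fake_ca_line(resseq, bfactor):
    """Format a fixed-column CA ATOM record with dummy coordinates."""
    return ("ATOM  {:>5}  CA  ALA A{:>4}    {:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}"
            .format(resseq, resseq, 0.0, 0.0, 0.0, 1.0, bfactor))


# Made-up scores standing in for the AF2 and OF predictions.
af2_pdb = "\n".join(fake_ca_line(i, b) for i, b in [(1, 92.5), (2, 88.0), (3, 71.5)])
of_pdb = "\n".join(fake_ca_line(i, b) for i, b in [(1, 90.5), (2, 80.0), (3, 75.5)])

af2_scores, of_scores = ca_plddt(af2_pdb), ca_plddt(of_pdb)
print("mean pLDDT (AF2, OF):", mean(af2_scores.values()), mean(of_scores.values()))
for res in sorted(af2_scores.keys() & of_scores.keys()):
    print(res, af2_scores[res], of_scores[res], round(of_scores[res] - af2_scores[res], 2))
```

To use it on real predictions, replace the generated strings with the text of the PDB files written in 1_af2 and 2_of.
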
  4. Make a prediction with AF2 with MSAs and templates.
    1. Make a new folder for this example: cd activity; mkdir 3_af2_msa; cd 3_af2_msa

    2. Copy fasta_dir and inference.sh from the first example.

    3. Download this script for computing MSAs and templates:

      get_mmseqs_msa_templates.py

    4. Generate the MSA and templates for the sequence in fasta_dir: python ../../get_mmseqs_msa_templates.py ./fasta_dir --alignment_dir ./alignments --mmcif_dir ../../mmcifs

      • What do the MSA and template information look like?
    5. Run the inference script: bash inference.sh

      • How do the predictions compare to those without MSA and templates?
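
To get a feel for what the MSA looks like, the sketch below reads an alignment in A3M format, assuming that is what the script writes into ./alignments. A3M is FASTA-like: the first record is the query, lowercase letters mark insertions relative to the query, and "-" marks deletions. The example alignment is made up and read_a3m is an illustrative helper, not OpenFold's parser.

```python
def read_a3m(text):
    """Return (headers, aligned_rows); lowercase insertion columns are dropped."""
    headers, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "#" comment/metadata lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            headers.append(line[1:])
            rows.append("")
        elif rows:
            # Removing lowercase insertions leaves one character per query column.
            rows[-1] += "".join(c for c in line if not c.islower())
    return headers, rows


# Made-up alignment for illustration only.
example = """>query
MKTAYIAK
>hit_1
MKTaaAYIAK
>hit_2
MK-AYI-K
"""

headers, rows = read_a3m(example)
print(len(rows), "sequences,", len(rows[0]), "aligned columns")
for h, r in zip(headers, rows):
    print(f"{h:>6} {r}")

# Fraction of query columns that each hit matches exactly.
query = rows[0]
for h, r in zip(headers[1:], rows[1:]):
    matches = sum(q == c for q, c in zip(query, r))
    print(h, "identity to query:", round(matches / len(query), 2))
```

The number of sequences and how closely they match the query give a rough sense of MSA depth and diversity, which is useful context when comparing against the single-sequence predictions.
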
  5. Make a multimer prediction with AF2-Multimer.
    1. Make a new folder for this example: cd activity; mkdir 4_af2_multimer; cd 4_af2_multimer

    2. Copy inference.sh from the first example.

    3. Make a new folder for the fasta: mkdir fasta_dir; cd fasta_dir

    4. Download a fasta file to use for prediction: wget https://www.rcsb.org/fasta/entry/8AJY -O 8AJY.fasta

    5. Go back to 4_af2_multimer and generate MSAs and templates for the sequences in fasta_dir: cd ..; python ../../get_mmseqs_msa_templates.py ./fasta_dir --alignment_dir ./alignments --mmcif_dir ../../mmcifs

    6. In order to run the multimer model with the MMseqs2 MSA and templates, we first need to disable sequence pairing. To do this, edit the _all_seq_msa_features method of DataPipelineMultimer so that it matches the following:

          def _all_seq_msa_features(alignment_dir, alignment_index):
              """Get MSA features for unclustered uniprot, for pairing."""
              msas = []
              if alignment_index is not None:
                  fp = open(os.path.join(alignment_dir, alignment_index["db"]), "rb")
      
                  def read_msa(start, size):
                      fp.seek(start)
                      msa = fp.read(size).decode("utf-8")
                      return msa
      
                  start, size = next(iter((start, size) for name, start, size in alignment_index["files"]
                                          if name == 'uniprot_hits.sto'))
      
                  msa = parsers.parse_stockholm(read_msa(start, size))
                  msas.append(msa)
                  fp.close()
              else:
                  for f in os.listdir(alignment_dir):
                      path = os.path.join(alignment_dir, f)
                      filename, ext = os.path.splitext(f)
      
                      if ext == ".a3m":
                          with open(path, "r") as fp:
                              msa = parsers.parse_a3m(fp.read())
                      elif ext == ".sto" and filename not in ["uniprot_hits", "hmm_output"]:
                          with open(path, "r") as fp:
                              msa = parsers.parse_stockholm(
                                  fp.read()
                              )
                      else:
                          continue
      
                      msas.append(msa)
                  # uniprot_msa_path = os.path.join(alignment_dir, "uniprot_hits.sto")
                  # if not os.path.exists(uniprot_msa_path):
                  #     chain_id = os.path.basename(os.path.normpath(alignment_dir))
                  #     raise ValueError(f"Missing 'uniprot_hits.sto' for {chain_id}. "
                  #                      f"This is required for Multimer MSA pairing.")
      
                  # with open(uniprot_msa_path, "r") as fp:
                  #     uniprot_msa_string = fp.read()
                  # msa = parsers.parse_stockholm(uniprot_msa_string)
      
              all_seq_features = make_msa_features([msa])
              valid_feats = msa_pairing.MSA_FEATURES + (
                  'msa_species_identifiers',
              )
              feats = {
                  f'{k}_all_seq': v for k, v in all_seq_features.items()
                  if k in valid_feats
              }
              return feats
      
    7. Within the inference script, change --config_preset to model_1_multimer_v3.

    8. Run the inference script: bash inference.sh

      • Do the predictions match the native structure?
      • How confident is AF2-Multimer in its prediction?
      • What happens when you remove MSAs or templates for this complex?
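
The else branch of the modified _all_seq_msa_features method in step 5.6 decides which files in a chain's alignment directory get parsed. The sketch below reproduces only that filename filter on a made-up directory listing, so you can see which files would be read. It does not call OpenFold, and the filenames and the select_msa_files helper are hypothetical.

```python
import os

# Stockholm files that the modified method skips.
SKIPPED_STO = {"uniprot_hits", "hmm_output"}


def select_msa_files(filenames):
    """Return (filename, parser_name) pairs that the filter would accept."""
    selected = []
    for f in filenames:
        stem, ext = os.path.splitext(f)
        if ext == ".a3m":
            selected.append((f, "parse_a3m"))
        elif ext == ".sto" and stem not in SKIPPED_STO:
            selected.append((f, "parse_stockholm"))
        # Anything else is ignored, mirroring the `continue` in the method.
    return selected


# Hypothetical directory listing for one chain.
listing = ["mmseqs_hits.a3m", "uniprot_hits.sto", "other_hits.sto",
           "hmm_output.sto", "templates.m8"]

for name, parser in select_msa_files(listing):
    print(name, "->", parser)
```

Note that, as written in step 5.6, the method collects parsed alignments in msas but passes only the last parsed one (the variable msa) to make_msa_features, so if more than one file matches, the order returned by os.listdir determines which alignment is used.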

Project Time

(Use your target protein)