Design scheme of polypeptide sequence and classification of amino acid properties



Polypeptides are complex molecules, and each sequence has its own chemical and physical properties. Most peptides are relatively easy to synthesize, although some sequences are difficult, and even an easily synthesized peptide can be hard to purify. Many peptides are poorly water-soluble, so these hydrophobic peptides often have to be dissolved in organic solvents or special buffer solutions during purification.

These organic solvents or buffer solutions are sometimes unsuitable for biological experiments, in which case customers cannot use the peptides in their research.

Therefore, when we receive a peptide order, we analyze the sequence to determine whether it will be difficult to synthesize or poorly water-soluble. For peptides with special sequences, our peptide specialists analyze the sequence carefully and give customers reasonable suggestions.

For example, when a polypeptide sequence is difficult to synthesize or has poor water solubility, we do not decline the order; instead, we suggest modifications so that customers ultimately obtain a satisfactory product. These modifications include altering the sequence itself or its two termini.

The common amino acids are classified by their properties in the lists at the end of this document. Below, we describe the main types of peptide problems and ways to reduce or overcome them, for customers’ reference. The design of these special peptides can only be finalized after discussion with Substance staff.

Design Protocols for Difficult-to-Synthesize Peptide Sequences

1. Shorten the sequence

In general, the longer the peptide chain, the lower the purity of the crude peptide. Most polypeptides with fewer than 15 residues are relatively easy to synthesize. When the chain exceeds 20 residues, however, consideration should be given to how the desired peptide can be obtained. In most cases, shortening the peptide to fewer than 20 residues gives satisfactory results.
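As a rough illustration, the length guideline above can be written as a small Python check. The function name length_risk and the label used for the 15-20 residue range are our own assumptions; the text only states the "fewer than 15" and "exceeds 20" thresholds.

```python
def length_risk(sequence: str) -> str:
    """Classify a peptide by chain length using the thresholds quoted in the text."""
    n = len(sequence)
    if n < 15:
        return "easy"                 # fewer than 15 residues: usually easy
    if n <= 20:
        return "intermediate"         # assumed label; the text gives no rule for 15-20
    return "consider shortening"      # more than 20 residues
```

A call such as length_risk("ACDEFGHIKLMNPQRSTVWYAC") would flag a 22-residue peptide for possible shortening.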

2. Reduce the number of hydrophobic residues

If the sequence contains many hydrophobic residues, especially at positions 7-12 counted from the carboxyl terminus, the polypeptide will be difficult to synthesize. This is probably because the polypeptide forms β-sheets during synthesis, which leads to incomplete coupling. In this case, the β-sheet can be disrupted by replacing one or more hydrophobic residues with polar residues, or by inserting a Gly or Pro.
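The check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the hydrophobic set (W, F, V, I, L, M, Y, A) is borrowed from the solubility section of this document, position 1 is taken to be the C-terminal residue, and the function name is hypothetical.

```python
# Assumed hydrophobic set, taken from the solubility section of this document.
HYDROPHOBIC = set("WFVILMYA")

def hydrophobic_in_window(sequence: str) -> int:
    """Count hydrophobic residues at positions 7-12 counted from the C-terminus.

    Position 1 is the C-terminal residue (an assumption about the numbering).
    Sequences shorter than 12 residues contribute only the positions they have.
    """
    window = sequence.upper()[-12:-6]   # positions 12 down to 7 from the C-terminus
    return sum(1 for aa in window if aa in HYDROPHOBIC)
```

A high count in this window suggests replacing some of those residues with polar ones or inserting a Gly or Pro, as described above.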

3. Minimize “difficult” residues

If the sequence contains many Cys, Met, Arg or Trp residues, the peptide will be difficult to synthesize. Cys, Met and Trp are problematic because their side chains are easily oxidized. If possible, avoid these residues in the sequence or make conservative substitutions: for example, replace Cys with Ser, Met with norleucine, and Trp with Tyr, Phe or another hydrophobic residue such as Leu. Lys can be used in place of Arg.
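The substitutions listed above can be collected in a lookup table. The following Python sketch is illustrative only; the names SUBSTITUTIONS and suggest_substitutions are hypothetical, and "Nle" is used here as shorthand for norleucine.

```python
# Conservative substitutions named in the text (one-letter code -> candidates).
SUBSTITUTIONS = {
    "C": ["Ser"],
    "M": ["Nle"],                  # norleucine
    "W": ["Tyr", "Phe", "Leu"],
    "R": ["Lys"],
}

def suggest_substitutions(sequence: str):
    """Return (position, residue, candidates) for each 'difficult' residue.

    Positions are 1-based and counted from the N-terminus.
    """
    suggestions = []
    for position, aa in enumerate(sequence.upper(), start=1):
        if aa in SUBSTITUTIONS:
            suggestions.append((position, aa, SUBSTITUTIONS[aa]))
    return suggestions
```

Whether a given substitution is acceptable depends on the role of the residue in the experiment, so the output is only a list of candidates for discussion.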

Design Protocols for Peptide Sequences to Improve Solubility

1. Change the N or C terminus of the sequence

For acidic peptides (i.e. peptides that are negatively charged under neutral conditions), we recommend the form:

Acetyl-peptide-COOH (acetylated N-terminus, free carboxyl group at the C-terminus), which makes the peptide as negatively charged as possible.

For basic peptides (i.e. peptides that are positively charged under neutral conditions), we recommend the form:

H-peptide-amide (free amino group at the N-terminus, amidated C-terminus), which makes the peptide as positively charged as possible.
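The choice between the two forms can be sketched in Python. This is a simplified illustration, not a charge calculation: it assumes Lys and Arg count as +1 and Asp and Glu as -1 at neutral pH, and it ignores His and the termini. The function names are hypothetical.

```python
from typing import Optional

def side_chain_charge(sequence: str) -> int:
    """Crude side-chain charge at neutral pH: K, R = +1; D, E = -1 (assumption)."""
    seq = sequence.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative

def recommend_termini(sequence: str) -> Optional[str]:
    """Suggest the terminal form recommended in the text, or None if neutral."""
    charge = side_chain_charge(sequence)
    if charge < 0:
        return "Acetyl-peptide-COOH"   # acidic peptide: keep it as negative as possible
    if charge > 0:
        return "H-peptide-amide"       # basic peptide: keep it as positive as possible
    return None                        # no recommendation from this rule
```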

2. Shorten or extend the peptide sequence

If hydrophobic residues (W, F, V, I, L, M, Y, A) make up more than 50% of the sequence, the solubility of the polypeptide is significantly reduced. In this case, extending the sequence with additional polar residues often increases the polarity of the polypeptide. Conversely, shortening the sequence to remove hydrophobic residues can also increase its polarity. In short, the more polar the polypeptide, the better its water solubility.
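The 50% rule above is simple to compute. A minimal Python sketch, using the residue set given in the text (the function names are hypothetical):

```python
# Hydrophobic residues as listed in the text.
HYDROPHOBIC = set("WFVILMYA")

def hydrophobic_fraction(sequence: str) -> float:
    """Fraction of residues in the sequence that are hydrophobic."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for aa in seq if aa in HYDROPHOBIC) / len(seq)

def likely_poorly_soluble(sequence: str) -> bool:
    """True when hydrophobic residues make up more than 50% of the sequence."""
    return hydrophobic_fraction(sequence) > 0.5
```

Comparing the fraction before and after extending or shortening a sequence shows whether the change moves the peptide below the 50% threshold.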

3. Add hydrophilic residues

To improve solubility, polar residues can be added to some polypeptide sequences. We suggest adding Glu-Glu to the N- or C-terminus of acidic peptides and Lys-Lys to the N- or C-terminus of basic peptides. If charged groups are not allowed in the sequence, we recommend adding Ser-Gly-Ser to the N- or C-terminus. Obviously, this method is not suitable if neither end of the polypeptide sequence may be changed.
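The tagging suggestions above can be expressed as a short Python helper. The names TAGS and add_solubility_tag, and the peptide-type labels, are assumptions made for this sketch.

```python
# Tags suggested in the text, written in one-letter code.
TAGS = {
    "acidic": "EE",       # Glu-Glu
    "basic": "KK",        # Lys-Lys
    "uncharged": "SGS",   # Ser-Gly-Ser, when charged groups are not allowed
}

def add_solubility_tag(sequence: str, peptide_type: str, terminus: str = "N") -> str:
    """Return the sequence with the suggested tag added at the chosen terminus."""
    tag = TAGS[peptide_type]          # raises KeyError for an unknown peptide type
    if terminus == "N":
        return tag + sequence
    if terminus == "C":
        return sequence + tag
    raise ValueError("terminus must be 'N' or 'C'")
```

For example, add_solubility_tag("AVLI", "basic", "N") gives "KKAVLI".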

4. Alter the sequence by substituting one or more residues

The solubility of the polypeptide can be improved by changing certain residues in the sequence. A relatively conservative and simple substitution, such as replacing Ala with Gly, can significantly enhance the solubility of the polypeptide.

5. Shift the starting points within a set of overlapping peptides

If a series of consecutive or overlapping polypeptides is to be synthesized, the starting point of each polypeptide should be adjusted to balance the hydrophobic and hydrophilic residues in each sequence. Alternatively, distribute “difficult” residues across different sequences (e.g. place two Cys residues in two separate peptides rather than in the same one).
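One way to explore different starting points is to generate the overlapping set for each possible offset and keep the offset whose most hydrophobic peptide is the least hydrophobic. The Python sketch below is only one possible approach under our own assumptions: fixed peptide length and step, the hydrophobic set (W, F, V, I, L, M, Y, A) from this document, and hypothetical function names.

```python
HYDROPHOBIC = set("WFVILMYA")  # hydrophobic set used elsewhere in this document

def hydrophobic_fraction(peptide: str) -> float:
    """Fraction of hydrophobic residues in a non-empty peptide."""
    return sum(1 for aa in peptide if aa in HYDROPHOBIC) / len(peptide)

def overlapping_peptides(protein: str, length: int, step: int, offset: int = 0):
    """Fixed-length peptides starting at offset, offset + step, and so on."""
    return [protein[i:i + length]
            for i in range(offset, len(protein) - length + 1, step)]

def best_offset(protein: str, length: int, step: int) -> int:
    """Offset (0 <= offset < step) that minimizes the worst hydrophobic fraction."""
    def worst(offset: int) -> float:
        peptides = overlapping_peptides(protein, length, step, offset)
        return max(hydrophobic_fraction(p) for p in peptides)

    # Only consider offsets that yield at least one full-length peptide.
    candidates = [o for o in range(step)
                  if overlapping_peptides(protein, length, step, o)]
    return min(candidates, key=worst)
```

Note that a non-zero offset leaves the first few residues of the parent sequence uncovered, so the result is a starting point for discussion rather than a final design.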

Classification of different properties of amino acids

According to their characteristics, the 20 standard amino acids and some other common amino acids can be divided into different categories. Some common classifications are listed below:

1. 20 amino acids and their abbreviations

A: Ala – Alanine
C: Cys – Cysteine
D: Asp – Aspartic acid
E: Glu – Glutamic acid
F: Phe – Phenylalanine
G: Gly – Glycine
H: His – Histidine
I: Ile – Isoleucine
K: Lys – Lysine
L: Leu – Leucine
M: Met – Methionine
N: Asn – Asparagine
P: Pro – Proline
Q: Gln – Glutamine
R: Arg – Arginine
S: Ser – Serine
T: Thr – Threonine
V: Val – Valine
W: Trp – Tryptophan
Y: Tyr – Tyrosine
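For convenience, the abbreviations above can be held in a lookup table and used to convert between notations. A minimal Python sketch (the names THREE_LETTER and to_three_letter are hypothetical):

```python
# One-letter to three-letter codes for the 20 amino acids listed above.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence to hyphen-separated three-letter codes."""
    # Raises KeyError if the sequence contains a letter outside the table.
    return "-".join(THREE_LETTER[aa] for aa in sequence.upper())
```

For example, to_three_letter("ACD") gives "Ala-Cys-Asp".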

2. Other common amino acids in proteins:

Hydroxyproline (hydroxylated proline – two isomers)
Cystine (oxidized cysteines)
Pyroglutamic acid (cyclized N-terminal glutamic acid)

3. Other amino acids in the polypeptide sequence:

Alpha-amino butyric acid (cysteine replacement)
Beta-amino alanine (straight-chain isomer of alanine)
Norleucine (linear side-chain isomer of leucine)

4. Divide by hydrophilicity/hydrophobicity:

Hydrophilic: D, E, H, K, N, Q, R, S, T, hydroxyproline, pyroglutamic acid
Hydrophobic: A, F, I, L, M, P, V, W, Y, alpha-amino butyric acid, beta-amino alanine, norleucine
Neutral: C, G
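The classification above can be used to summarize the composition of a sequence. The Python sketch below covers only the 20 standard one-letter codes (the non-standard amino acids in the list have no one-letter code here), and the names are hypothetical. Note that this hydrophobic set includes Pro and differs slightly from the set used for the 50% solubility rule.

```python
# Classification of the 20 standard amino acids as listed above.
HYDROPHILIC = set("DEHKNQRST")
HYDROPHOBIC = set("AFILMPVWY")
NEUTRAL = set("CG")

def composition(sequence: str) -> dict:
    """Count residues in each class; unrecognized letters are counted as 'other'."""
    counts = {"hydrophilic": 0, "hydrophobic": 0, "neutral": 0, "other": 0}
    for aa in sequence.upper():
        if aa in HYDROPHILIC:
            counts["hydrophilic"] += 1
        elif aa in HYDROPHOBIC:
            counts["hydrophobic"] += 1
        elif aa in NEUTRAL:
            counts["neutral"] += 1
        else:
            counts["other"] += 1
    return counts
```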

5. Some other amino acid classifications:

Easily oxidized under mild conditions: Cys, Met
Prone to deamidation: Asn, Gln, C-terminal amide
Easily degradable: Met, Trp
Positively charged: Lys, Arg, His, N-terminal amino
Negatively charged: Asp, Glu, C-terminal carboxyl, Tyr (only at high pH)