SUK IST YAR KH UKR PCH BUSHA PODO LSK FR KAL KAZ
SUK (40) 0.000 0.080 0.082 0.085 0.107 0.085 0.080 0.121 0.081 0.053 0.084 0.073
IST (49) 0.080 0.000 0.067 0.062 0.105 0.066 0.073 0.135 0.074 0.049 0.095 0.070
YAR (44) 0.082 0.067 0.000 0.051 0.082 0.057 0.071 0.144 0.057 0.071 0.060 0.056
KH (42) 0.085 0.062 0.051 0.000 0.083 0.023 0.084 0.157 0.077 0.066 0.068 0.058
UKR (30) 0.107 0.105 0.082 0.083 0.000 0.091 0.088 0.133 0.095 0.094 0.076 0.074
PCH (33) 0.085 0.066 0.057 0.023 0.091 0.000 0.083 0.154 0.090 0.070 0.080 0.052
BUSHA (35) 0.080 0.073 0.071 0.084 0.088 0.083 0.000 0.142 0.059 0.071 0.077 0.044
PODO (18) 0.121 0.135 0.144 0.157 0.133 0.154 0.142 0.000 0.165 0.133 0.142 0.127
LSK (39) 0.081 0.074 0.057 0.077 0.095 0.090 0.059 0.165 0.000 0.081 0.080 0.054
FR (43) 0.053 0.049 0.071 0.066 0.094 0.070 0.071 0.133 0.081 0.000 0.073 0.067
KAL (28) 0.084 0.095 0.060 0.068 0.076 0.080 0.077 0.142 0.080 0.073 0.000 0.064
KAZ (40) 0.073 0.070 0.056 0.058 0.074 0.052 0.044 0.127 0.054 0.067 0.064 0.000
Allele frequency data
The first line indicates the number of
populations. In the following lines the population names are shown. Each population name is
shown in one line. Then, allele frequency data are shown.
n populations
1 population1
2 population2
3 population3
For each locus, the allele lines are followed by the number of chromosomes examined in each population.
The line for each allele consists of the number of nucleotide repeats for
microsatellite DNA loci (or the name of the allele for other data) and the allele frequencies of
populations separated by "*".
XX is the number of repeats for microsatellite DNA (not the fragment size) or for the name of
an allele for other kinds of data. frq1, frq2, and frq3 are frequencies of the allele for
populations 1, 2, and 3. After all allele frequencies for one locus are shown, the number of
chromosomes (not the number of individuals) examined is shown. The line for the number of
chromosomes should start with "#".
The number of repeats and the number of chromosomes do not have to be integers. If
you use only the DA, DST, and FST* distances, XX does not have to be the number of repeats for an
allele and can be anything. The numbers of repeats for alleles are necessary for the
computation of the (δµ)² and DSW distances.
Genotype data in GENEPOP (Rousset 2008) format can be used as an input file of
POPTREE2. The input data has the following format.
Title line:
Locus name 1
Locus name 2
…
Pop population name 1
Sample 1
Sample 2
…
POP population name 2
…
pop
…
It has three sections: 1) title line, 2) locus names, and 3) genotype data of samples in different
populations.
1) Title line
The first line is a title line. Any comment can be written in it.
2) Locus names
Following the title line are names of loci. They can be shown in different lines or
separated by a comma.
3) Genotype data of each sample in populations. The data of different populations are
separated by “Pop” tags. Following the “Pop” tag, the name of population can be specified. If
a population name is not specified, Data of each sample are shown in a line. The sample name
can be shown before a comma. Genotypes of loci for a sample are separated by a space or a
tab.
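The three sections described above can be read with a short parser. The following is a minimal Python sketch, not POPTREE2's own reader: the function name parse_genepop, the returned structure, and the sample data are assumptions made for illustration, and it assumes the title line is not blank.

```python
def parse_genepop(text):
    """Split GENEPOP-style text into (title, locus names, populations)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title, rest = lines[0], lines[1:]
    loci, pops = [], []
    for ln in rest:
        if ln.split()[0].lower() == "pop":
            # Anything after the tag is taken as the population name (may be empty).
            pops.append({"name": ln[3:].strip(), "samples": []})
        elif not pops:
            # Before the first Pop tag: locus names, one per line or comma-separated.
            loci.extend(n.strip() for n in ln.split(",") if n.strip())
        else:
            # Sample line: optional sample name before the comma, then genotypes
            # separated by spaces or tabs.
            name, _, genotypes = ln.rpartition(",")
            pops[-1]["samples"].append((name.strip(), genotypes.split()))
    return title, loci, pops


# Hypothetical input, invented for this example.
example = """Example title
LocA, LocB
Pop North
ind1 , 0101 0203
ind2 , 0102 0303
POP South
ind3 , 0202 0103
"""
title, loci, pops = parse_genepop(example)
print(title, loci, [p["name"] for p in pops])
```

The "Pop" tag is matched case-insensitively because the template above shows it written as "Pop", "POP", and "pop".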
Neighbor-joining (NJ) and UPGMA trees
In the NJ method (Saitou and Nei 1987), starting from a star-tree (all branches are
connected to one node), the pair of taxa (populations) which gives the smallest sum of branch
lengths is combined into a cluster and forms a composite taxon. This process is repeated until
an unrooted tree is produced. The branch lengths are computed by the least-squares method in
each step.
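One NJ step can be sketched in Python as follows. This is an illustration under stated assumptions, not POPTREE2's code: it uses the commonly cited Q-criterion, Q(i,j) = (n-2)d(i,j) - r(i) - r(j), which is usually presented as equivalent to choosing the pair with the smallest sum of branch lengths, and the five-taxon matrix is a hypothetical example rather than data from this document.

```python
def nj_pick_pair(names, d):
    """Return the pair to join in one NJ step, with its Q value and branch lengths."""
    n = len(names)
    r = [sum(row) for row in d]  # total distance from each taxon to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # Branch lengths from the two joined taxa to the new internal node.
    len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
    len_j = d[i][j] - len_i
    return names[i], names[j], q, len_i, len_j


# Hypothetical distance matrix for taxa a..e (symmetric, zero diagonal).
names = ["a", "b", "c", "d", "e"]
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
result = nj_pick_pair(names, d)
print(result)
```

Note that the pair chosen here (a, b) is not the pair with the smallest raw distance (d, e), which is the main way NJ differs from UPGMA.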
In the UPGMA (Sneath and Sokal 1973), the pair of taxa with the smallest distance is
combined into one cluster and forms a composite taxon. This process is repeated until a rooted
tree is made. The branch lengths are calculated so that the sum of the branch lengths from a
taxon to the node connecting the two taxa is half the distance between the two taxa. In the UPGMA,
the molecular clock (rate constancy) is implicitly assumed (Chakraborty 1977). If the rate
constancy approximately holds, the UPGMA can be efficient in constructing the correct tree
topology (Takezaki and Nei 1996).
In the bootstrap test (Felsenstein 1985), the loci are resampled with replacement in
POPTREE2. The phylogenetic tree is constructed with the distance values calculated from the
same number of resampled loci as that of the original input dataset in each replication. The
number of replications in which the branch (the grouping of the taxa separated by the branch)
appeared is counted and the proportion of this number in the total replications is shown in
percent on the branch of the tree in the “Phylogeny” window. In the case of the UPGMA tree,
the bootstrap numbers are counted by removing the root of the tree.
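The resampling and counting described above can be sketched in Python. The names bootstrap_support and closest_pair and the three toy loci are hypothetical. The toy tree builder only reports the closest pair of populations from per-locus distances averaged over the resampled loci, which is a simplification for illustration and not how POPTREE2 computes distances or trees.

```python
import random
from collections import Counter


def bootstrap_support(loci, build_groups, n_rep=1000, seed=0):
    """Percentage of replications in which each grouping appears."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_rep):
        # Resample loci with replacement, keeping the original number of loci.
        sample = [rng.choice(loci) for _ in loci]
        # build_groups returns the set of groupings found in this replication.
        counts.update(build_groups(sample))
    return {group: 100.0 * c / n_rep for group, c in counts.items()}


def closest_pair(sample):
    """Toy stand-in for tree building: the pair with the smallest mean distance."""
    pairs = sample[0].keys()
    mean = {p: sum(locus[p] for locus in sample) / len(sample) for p in pairs}
    return {min(mean, key=mean.get)}


# Hypothetical per-locus distances among populations X, Y, and Z.
XY, XZ, YZ = frozenset("XY"), frozenset("XZ"), frozenset("YZ")
loci = [
    {XY: 0.1, XZ: 0.5, YZ: 0.6},
    {XY: 0.2, XZ: 0.4, YZ: 0.5},
    {XY: 0.1, XZ: 0.3, YZ: 0.4},
]
support = bootstrap_support(loci, closest_pair, n_rep=200, seed=1)
print(support)
```

Because X and Y are the closest pair at every locus in this toy dataset, the grouping appears in every replication. With real data, loci disagree and the percentages fall below 100.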
The NJ method and UPGMA produce bifurcating trees. However, in the phylogenetic tree
displayed in the “Phylogeny” window, the branches with length zero or negative values are
treated as though they do not exist. Because of this treatment a multifurcating node sometimes
appears.
This page shows just one method (UPGMA clustering) for calculating phylogenies from molecular comparison data. There are many other methods (bootstrapping, jack-knifing, parsimony, maximum likelihood, and more), and these may be more appropriate to use in given circumstances. The main purpose of this page is simply to demonstrate one approach to calculation of a phylogeny from molecular comparisons.
First, let's look at some typical molecular comparison data. Figure 1 shows some typical cytochrome c comparisons. The numbers in the cells show differences between the cytochrome c molecules of various species: for example, there is only 1 difference in the amino acid sequences between man and monkey, but there are 19 differences between man and turtle.
Figure 1. Selected Cytochrome C comparisons.
In Figure 2, the UPGMA method is applied to the Figure 1 data sample. At each cycle of the method, the smallest entry is located, and the entries intersecting at that cell are "joined." The height of the branch for this junction is one-half the value of the smallest entry. Thus, since the smallest entry at the beginning is 1 (between B=man and F=monkey), B and F are joined with branch heights of 0.5 (=1.0/2). Then, the comparison matrix is reduced by combining cells. These combinations are indicated with colors in Figure 2. For example, the comparisons of A to B (19.0) and A to F (18.0) are consolidated as 18.5 = (19.0+18.0)/2 (red cells), while the comparisons of E to B (36.0) and E to F (35.0) are consolidated as 35.5 = (36.0+35.0)/2 (blue cells).
The process is repeated on the reduced comparison matrix, resulting in a smaller matrix with each cycle. When the matrix is completely reduced, the calculation is finished.
Figure 2. Application of UPGMA Clustering Technique.
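The join-and-reduce cycle can be sketched in Python as follows. The function name upgma and the data layout are assumptions for illustration. The distances involving B and F are the ones quoted in the text; the A to E value (30.0) is a hypothetical placeholder, because the full Figure 1 table is not reproduced here. When a cluster already holds several taxa, the sketch weights the average by cluster size, which for single taxa reduces to the simple averages quoted above.

```python
def upgma(names, dist):
    """Run UPGMA; dist maps frozenset({x, y}) to a distance.

    Returns the merges in order as (cluster label, node height) pairs.
    """
    sizes = {n: 1 for n in names}  # cluster label -> number of taxa in it
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)  # smallest remaining entry
        a, b = sorted(pair)
        new = f"({a},{b})"
        merges.append((new, d[pair] / 2))  # height is half the distance
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            # Reduce the matrix: size-weighted average of the two old entries.
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((new, c))] = (na * dac + nb * dbc) / (na + nb)
        del d[pair]
        sizes[new] = na + nb
    return merges


dist = {
    frozenset("BF"): 1.0,
    frozenset("AB"): 19.0,
    frozenset("AF"): 18.0,
    frozenset("EB"): 36.0,
    frozenset("EF"): 35.0,
    frozenset("AE"): 30.0,  # hypothetical value, not taken from Figure 1
}
merges = upgma(["A", "B", "E", "F"], dist)
for label, height in merges:
    print(label, round(height, 2))
```

The first merge joins B and F at height 0.5, and the second joins A to that cluster at height 9.25, half of the consolidated 18.5.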
The final phylogeny calculated from the Figure 1 data is shown in Figure 3. It is in perfect accord with the fossil record, showing fish ancestral to reptiles, reptiles ancestral to mammals, birds splitting from reptiles after the reptile/mammal split, and so forth. The lengths of branches indicate time since last common ancestry; for example, moths and tuna (18.2 branch length) separated long before turtles and chickens (4.0 branch length).
Figure 3. Results of UPGMA Clustering Technique.
What makes such calculations of phylogenies interesting is the fact that the results so often agree with evolutionary trees developed from other methods (anatomy, fossils, or other proteins or genes). Indeed, molecular comparisons provide ample "repeat experiments" of the hypothesis of evolution.
Date added: 2015-11-14