Number of loci: 15


           SUK   IST   YAR   KH    UKR   PCH   BUSHA PODO  LSK   FR    KAL   KAZ
SUK (40)   0.000 0.080 0.082 0.085 0.107 0.085 0.080 0.121 0.081 0.053 0.084 0.073
IST (49)   0.080 0.000 0.067 0.062 0.105 0.066 0.073 0.135 0.074 0.049 0.095 0.070
YAR (44)   0.082 0.067 0.000 0.051 0.082 0.057 0.071 0.144 0.057 0.071 0.060 0.056
KH (42)    0.085 0.062 0.051 0.000 0.083 0.023 0.084 0.157 0.077 0.066 0.068 0.058
UKR (30)   0.107 0.105 0.082 0.083 0.000 0.091 0.088 0.133 0.095 0.094 0.076 0.074
PCH (33)   0.085 0.066 0.057 0.023 0.091 0.000 0.083 0.154 0.090 0.070 0.080 0.052
BUSHA (35) 0.080 0.073 0.071 0.084 0.088 0.083 0.000 0.142 0.059 0.071 0.077 0.044
PODO (18)  0.121 0.135 0.144 0.157 0.133 0.154 0.142 0.000 0.165 0.133 0.142 0.127
LSK (39)   0.081 0.074 0.057 0.077 0.095 0.090 0.059 0.165 0.000 0.081 0.080 0.054
FR (43)    0.053 0.049 0.071 0.066 0.094 0.070 0.071 0.133 0.081 0.000 0.073 0.067
KAL (28)   0.084 0.095 0.060 0.068 0.076 0.080 0.077 0.142 0.080 0.073 0.000 0.064
KAZ (40)   0.073 0.070 0.056 0.058 0.074 0.052 0.044 0.127 0.054 0.067 0.064 0.000
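The following is a minimal Python sketch. It assumes the matrix above is saved as plain text in that layout: a header row of population names, then one row per population with a number in parentheses. The source does not say what that number counts, so it is only read here, not interpreted. The sketch reads the matrix, checks that it is symmetric, and reports the closest pair of populations, which is the first pair that UPGMA would join. `parse_distance_matrix` and `closest_pair` are illustrative names and are not part of POPTREE2.

```python
MATRIX_TEXT = """\
           SUK   IST   YAR   KH    UKR   PCH   BUSHA PODO  LSK   FR    KAL   KAZ
SUK (40)   0.000 0.080 0.082 0.085 0.107 0.085 0.080 0.121 0.081 0.053 0.084 0.073
IST (49)   0.080 0.000 0.067 0.062 0.105 0.066 0.073 0.135 0.074 0.049 0.095 0.070
YAR (44)   0.082 0.067 0.000 0.051 0.082 0.057 0.071 0.144 0.057 0.071 0.060 0.056
KH (42)    0.085 0.062 0.051 0.000 0.083 0.023 0.084 0.157 0.077 0.066 0.068 0.058
UKR (30)   0.107 0.105 0.082 0.083 0.000 0.091 0.088 0.133 0.095 0.094 0.076 0.074
PCH (33)   0.085 0.066 0.057 0.023 0.091 0.000 0.083 0.154 0.090 0.070 0.080 0.052
BUSHA (35) 0.080 0.073 0.071 0.084 0.088 0.083 0.000 0.142 0.059 0.071 0.077 0.044
PODO (18)  0.121 0.135 0.144 0.157 0.133 0.154 0.142 0.000 0.165 0.133 0.142 0.127
LSK (39)   0.081 0.074 0.057 0.077 0.095 0.090 0.059 0.165 0.000 0.081 0.080 0.054
FR (43)    0.053 0.049 0.071 0.066 0.094 0.070 0.071 0.133 0.081 0.000 0.073 0.067
KAL (28)   0.084 0.095 0.060 0.068 0.076 0.080 0.077 0.142 0.080 0.073 0.000 0.064
KAZ (40)   0.073 0.070 0.056 0.058 0.074 0.052 0.044 0.127 0.054 0.067 0.064 0.000
"""

def parse_distance_matrix(text):
    """Return (names, counts, dist): dist[a][b] is the distance between a and b.

    counts holds the number shown in parentheses after each name (meaning not
    stated in the source; it is only read, not interpreted).
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    names = rows[0]
    counts, dist = {}, {}
    for row in rows[1:]:
        name, count, values = row[0], row[1], row[2:]
        if len(values) != len(names):
            raise ValueError(f"row {name} has {len(values)} values, expected {len(names)}")
        counts[name] = int(count.strip("()"))
        dist[name] = {other: float(v) for other, v in zip(names, values)}
    return names, counts, dist

def closest_pair(names, dist):
    """Check symmetry and return (a, b, d) for the smallest off-diagonal distance."""
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[a][b] != dist[b][a]:
                raise ValueError(f"matrix is not symmetric at {a}/{b}")
            if best is None or dist[a][b] < best[2]:
                best = (a, b, dist[a][b])
    return best

if __name__ == "__main__":
    names, counts, dist = parse_distance_matrix(MATRIX_TEXT)
    print(closest_pair(names, dist))  # ('KH', 'PCH', 0.023)
```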

Allele frequency data

The first line gives the number of populations. The following lines give the population names, one name per line. The allele frequency data come after the names.

n populations
1 population1
2 population2
3 population3

Then, for each locus, the allele frequencies and the number of chromosomes examined in each population are given. Each allele line consists of the number of nucleotide repeats (for microsatellite DNA loci) or the name of the allele (for other data), followed by the allele frequencies of the populations. The fields are separated by "*":

XX*frq1*frq2*frq3

XX is the number of repeats for microsatellite DNA (not the fragment size) or the name of an allele for other kinds of data. frq1, frq2, and frq3 are the frequencies of the allele in populations 1, 2, and 3. After all allele frequencies for a locus have been listed, the number of chromosomes (not the number of individuals) examined in each population is given. The line for the number of chromosomes should start with "#".
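Below is a minimal Python sketch of reading one locus in this layout. It assumes the locus has already been cut out of the file as a list of lines: the allele lines, followed by the "#" line. It also assumes that the "#" line separates its per-population counts the same way the allele lines do, which the text does not state. As a further assumption, whitespace is accepted as a separator alongside "*". The names `parse_locus_block` and `frequency_sums` are hypothetical.

```python
import re

# "*" is the separator given in the text; whitespace is also accepted (assumption).
SEP = re.compile(r"[*\s]+")

def parse_locus_block(lines, n_pops):
    """Parse allele lines 'XX*frq1*frq2*...' up to the '#' chromosome-count line.

    Returns (freqs, chromosomes): freqs maps allele label -> per-population
    frequencies; chromosomes gives the number of chromosomes per population.
    """
    freqs = {}
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.startswith("#"):  # the '#' line closes the locus
            chromosomes = [float(x) for x in SEP.split(text[1:].strip()) if x]
            if len(chromosomes) != n_pops:
                raise ValueError(f"expected {n_pops} chromosome counts, got {len(chromosomes)}")
            return freqs, chromosomes
        allele, *values = [f for f in SEP.split(text) if f]
        if len(values) != n_pops:
            raise ValueError(f"allele {allele}: expected {n_pops} frequencies, got {len(values)}")
        # The label is kept as text: for some distances it "can be anything".
        freqs[allele] = [float(v) for v in values]
    raise ValueError("locus has no '#' chromosome-count line")

def frequency_sums(freqs, n_pops):
    """Total allele frequency per population; each should be close to 1."""
    return [sum(f[k] for f in freqs.values()) for k in range(n_pops)]
```

Counts are parsed as floats because, as the text notes, the number of chromosomes does not have to be an integer.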

The number of repeats and the number of chromosomes do not have to be integers. If you use only the DA, DST, and FST* distances, XX does not have to be the number of repeats for an allele and can be anything. The numbers of repeats for alleles are necessary for computing the (δμ)² and DSW distances.
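The difference between the two kinds of distance can be illustrated in Python. The sketch below follows the formulas as they are usually given in the literature: Nei, Tajima and Tateno (1983) for DA, and Goldstein et al. (1995) for (δμ)². DA only compares allele labels for equality. (δμ)², in contrast, multiplies each label by its frequency to obtain a mean repeat number μ. POPTREE2's own implementation may differ in details such as the handling of missing data. The function names are illustrative.

```python
from math import sqrt

def da_distance(loci):
    """DA = 1 - (1/r) * sum over loci and shared alleles of sqrt(x_i * y_i).

    loci: list of (freqs_x, freqs_y), each mapping allele label -> frequency.
    Labels are only matched, so they can be any strings.
    """
    shared = sum(sum(sqrt(x[a] * y[a]) for a in x.keys() & y.keys()) for x, y in loci)
    return 1 - shared / len(loci)

def delta_mu_squared(loci):
    """(δμ)² = mean over loci of (μx - μy)², with μ = sum(repeat number * frequency).

    Labels must be repeat numbers; a non-numeric label raises ValueError.
    """
    total = 0.0
    for x, y in loci:
        mu_x = sum(float(a) * p for a, p in x.items())
        mu_y = sum(float(a) * p for a, p in y.items())
        total += (mu_x - mu_y) ** 2
    return total / len(loci)
```

For one locus with frequencies {10: 0.5, 12: 0.5} and {10: 1.0}, the mean repeat numbers are 11 and 10, so (δμ)² = 1.0. DA for the same locus is 1 - sqrt(0.5).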

Genotype data in GENEPOP (Rousset 2008) format can also be used as an input file for POPTREE2. The input file has the following format:

Title line
Locus name 1
Locus name 2
…
Pop population name 1
Sample 1
Sample 2
…
POP population name 2
…
pop
…

It has three sections: 1) title line, 2) locus names, and 3) genotype data of samples in different populations.

1) Title line

The first line is a title line. Any comment can be written in it.

2) Locus names

Following the title line are the names of loci. They can be shown on different lines or separated by commas.

3) Genotype data of each sample in populations

The data of different populations are separated by “Pop” tags. The name of the population can be specified after the “Pop” tag. If a population name is not specified, … Data of each sample are shown on one line. The sample name can be given before a comma. Genotypes at different loci for a sample are separated by a space or a tab.
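The following minimal Python sketch reads the layout described above. It then converts the genotypes into the per-population allele frequencies and chromosome counts used by the allele frequency format. It assumes the standard GENEPOP genotype coding, which the text above does not describe. Under that coding, each genotype is written as two allele codes of two (or three) digits each, such as 0102, and a code of 00 marks a missing allele. `parse_genepop` and `allele_frequencies` are hypothetical names.

```python
from collections import Counter

def parse_genepop(text):
    """Return (loci, pops); pops is [(population_name, [(sample, [genotype, ...]), ...]), ...]."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    loci, pops = [], []
    for line in lines[1:]:                      # lines[0] is the free-form title line
        if line.split()[0].lower() == "pop":    # "Pop", "POP" or "pop" starts a population
            pops.append((line[3:].strip(), []))
        elif not pops:                          # before the first Pop tag: locus names
            loci += [name.strip() for name in line.split(",") if name.strip()]
        else:
            name, comma, genos = line.partition(",")
            if not comma:                       # no comma: no sample name was given
                name, genos = "", line
            pops[-1][1].append((name.strip(), genos.split()))
    return loci, pops

def allele_frequencies(loci, pops, digits=2):
    """Map (population, locus) -> ({allele: frequency}, chromosomes counted).

    Assumes each genotype is two allele codes of `digits` digits; code 0 = missing.
    """
    result = {}
    for pop_name, samples in pops:
        for k, locus in enumerate(loci):
            counts = Counter()
            for _, genotypes in samples:
                g = genotypes[k]
                for allele in (g[:digits], g[digits:]):
                    if int(allele) != 0:
                        counts[int(allele)] += 1
            n = sum(counts.values())
            result[(pop_name, locus)] = ({a: c / n for a, c in counts.items()}, n)
    return result
```

Missing alleles are left out of both the frequencies and the chromosome count, so a population with no data at a locus gets an empty frequency table and a count of 0.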

Neighbor-joining (NJ) and UPGMA trees

In the NJ method (Saitou and Nei 1987), the procedure starts from a star tree, in which all branches are connected to one node. The pair of taxa (populations) that gives the smallest sum of branch lengths is combined into a cluster to form a composite taxon. This process is repeated until an unrooted tree is produced. The branch lengths are computed by the least-squares method at each step.
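The following is a minimal Python sketch of NJ. It uses the common Q-criterion formulation, which picks the pair i, j minimizing (n - 2)d(i,j) - R(i) - R(j), where R is the sum of a node's distances to all others. This criterion is known to select the same pair as the smallest-sum-of-branch-lengths rule described above. The sketch then uses the usual closed-form branch-length estimates. It is not POPTREE2's code. The function name and the internal labels "node1", "node2", … are illustrative, and taxon names are assumed not to clash with those labels.

```python
def neighbor_joining(names, dist):
    """Return the edges (node, neighbour, branch_length) of an unrooted NJ tree."""
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    active = list(names)
    edges = []
    k = 0
    while len(active) > 2:
        n = len(active)
        # Net divergence R of each active node (d[a][a] is 0).
        r = {a: sum(d[a][b] for b in active) for a in active}
        # Q-criterion: the minimising pair is the pair whose joining gives
        # the smallest total branch length.
        best = None
        for i, a in enumerate(active):
            for b in active[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        k += 1
        u = f"node{k}"
        # Branch lengths from a and b to the new internal node u.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(a, u, la), (b, u, d[a][b] - la)]
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for c in active:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        active = [c for c in active if c not in (a, b)] + [u]
    a, b = active
    edges.append((a, b, d[a][b]))  # last branch joins the final two nodes
    return edges
```

The four-taxon matrix [[0, 5, 7, 8], [5, 0, 8, 9], [7, 8, 0, 9], [8, 9, 9, 0]] fits the tree ((A, B), (C, D)) exactly. On it, the sketch returns terminal branches of 2, 3, 4 and 5 and an internal branch of 1, which are the lengths that generated the matrix.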

In the UPGMA (Sneath and Sokal 1973), the pair of taxa with the smallest distance is combined into one cluster to form a composite taxon. This process is repeated until a rooted tree is made. The branch lengths are calculated so that the total branch length from each of the two taxa to the node connecting them is half the distance between them. The UPGMA implicitly assumes the molecular clock (rate constancy) (Chakraborty 1977). If rate constancy approximately holds, the UPGMA can be efficient in constructing the correct tree topology (Takezaki and Nei 1996).
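A minimal Python sketch of UPGMA as described above follows, assuming a symmetric distance matrix given as a list of lists. Each new node is placed at half the distance between the two clusters it joins. The distance from a merged cluster to any other cluster is the average over all member pairs, computed as the size-weighted average of the two old rows. The name `upgma` and the nested-tuple tree format are illustrative, not POPTREE2's.

```python
from itertools import combinations

def upgma(names, dist):
    """Return (tree, merges): tree is a nested tuple of names; merges lists
    (left_subtree, right_subtree, node_height) in the order pairs were joined."""
    n = len(names)
    d = {frozenset((i, j)): dist[i][j] for i, j in combinations(range(n), 2)}
    size = {i: 1 for i in range(n)}
    tree = {i: names[i] for i in range(n)}
    active = list(range(n))
    merges = []
    next_id = n
    while len(active) > 1:
        # 1. The closest pair of clusters is joined first.
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        # 2. The node sits at half their distance (molecular-clock assumption).
        height = d[frozenset((a, b))] / 2
        u, next_id = next_id, next_id + 1
        # 3. New distances are averages over all member pairs (size-weighted).
        for c in active:
            if c not in (a, b):
                d[frozenset((u, c))] = (size[a] * d[frozenset((a, c))]
                                        + size[b] * d[frozenset((b, c))]) / (size[a] + size[b])
        size[u] = size[a] + size[b]
        tree[u] = (tree[a], tree[b])
        merges.append((tree[a], tree[b], height))
        active = [c for c in active if c not in (a, b)] + [u]
    return tree[active[0]], merges
```

The branch length from a node to a child is the node's height minus the child's height, where leaves have height 0. For the population matrix at the top of this page, the smallest entry is 0.023, so UPGMA would first join KH and PCH at a height of 0.0115.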

In the bootstrap test (Felsenstein 1985) in POPTREE2, loci are resampled with replacement. In each replication, a tree is constructed from distances calculated from a resampled set containing the same number of loci as the original dataset. The program counts the number of replications in which each branch appears, where a branch means the grouping of taxa that it separates. This number, as a percentage of the total number of replications, is shown on the branch in the “Phylogeny” window. For UPGMA trees, the root is removed before the bootstrap values are counted.
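Below is a minimal Python sketch of the bootstrap over loci. It assumes a caller-supplied function that builds a tree from a list of loci and returns the tree's clades as sets of taxon names. Such a function might, for example, compute distances and run NJ or UPGMA. Each branch is recorded as the side of its bipartition that excludes the first taxon. As a result, the two clades below the root of a rooted UPGMA tree count as one unrooted branch. This is one way to realise "removing the root", not necessarily how POPTREE2 does it. The names `bootstrap_support`, `canonical_split` and `clades_from_loci` are hypothetical.

```python
import random
from collections import Counter

def canonical_split(clade, taxa):
    """Represent a branch by the side of its bipartition that excludes taxa[0]."""
    clade = frozenset(clade)
    return clade if taxa[0] not in clade else frozenset(taxa) - clade

def bootstrap_support(loci, taxa, clades_from_loci, replicates=100, seed=1):
    """Percentage of replicates in which each informative branch appears."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        # Same number of loci as the original data, drawn with replacement.
        sample = [rng.choice(loci) for _ in loci]
        # A set, so a branch is counted at most once per replicate.
        splits = {canonical_split(c, taxa) for c in clades_from_loci(sample)}
        # Keep internal branches only (at least two taxa on each side).
        counts.update(s for s in splits if 2 <= len(s) <= len(taxa) - 2)
    return {s: 100 * c / replicates for s, c in counts.items()}
```

With a rooted four-taxon tree ((A, B), (C, D)), the clades {A, B} and {C, D} map to the same split. That split is therefore reported once, with 100% support if every replicate recovers it.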

The NJ method and UPGMA produce bifurcating trees. However, in the tree displayed in the “Phylogeny” window, branches of zero or negative length are treated as though they do not exist. Because of this treatment, a multifurcating node sometimes appears.
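The following minimal Python sketch shows how removing such branches produces a multifurcation. It assumes a tree stored as (label, [(child_tree, branch_length), ...]), where a leaf has an empty child list. Only internal branches are collapsed. The text does not say how POPTREE2 displays terminal branches of zero or negative length, so the sketch keeps them unchanged. The function name is illustrative.

```python
def collapse_nonpositive(tree):
    """Remove internal branches of length <= 0, attaching their subtrees to the parent."""
    label, children = tree
    kept = []
    for child, length in children:
        child = collapse_nonpositive(child)
        _, grandchildren = child
        if grandchildren and length <= 0:
            kept.extend(grandchildren)   # the child's subtrees now hang from this node
        else:
            kept.append((child, length))
    return (label, kept)
```

For example, a node joining the pair (A, B) by a branch of length 0.0 and leaf C becomes a single node with three children A, B and C.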

This page shows just one method (UPGMA clustering) for calculating phylogenies from molecular comparison data. There are many other methods (bootstrapping, jack-knifing, parsimony, maximum likelihood, and more), and these may be more appropriate to use in given circumstances. The main purpose of this page is simply to demonstrate one approach to calculation of a phylogeny from molecular comparisons.

First, let's look at some typical molecular comparison data. Figure 1 shows some typical cytochrome c comparisons. The numbers in the cells show differences between the cytochrome c molecules of various species. For example, the amino acid sequences of man and monkey differ at only 1 position, whereas those of man and turtle differ at 19 positions.

Figure 1. Selected cytochrome c comparisons.

In Figure 2, the UPGMA method is applied to the Figure 1 data sample. At each cycle of the method, the smallest entry is located, and the two entries (species or clusters) intersecting at that cell are "joined." The height of the branch for this junction is one-half the value of the smallest entry. Thus, since the smallest entry at the beginning is 1 (between B=man and F=monkey), B and F are joined with branch heights of 0.5 (=1.0/2). Then, the comparison matrix is reduced by combining cells. These combinations are indicated with colors in Figure 2. For example, the comparisons of A to B (19.0) and A to F (18.0) are consolidated as 18.5 = (19.0+18.0)/2 (red cells). Similarly, the comparisons of E to B (36.0) and E to F (35.0) are consolidated as 35.5 = (36.0+35.0)/2 (blue cells).
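In the first cycle, both joined entries are single species, so the consolidated value is a plain average. In later cycles the clusters being combined can contain different numbers of species. In the standard UPGMA, each entry is then weighted by the number of species n in its cluster. With n_B = n_F = 1, the formula below reduces to the values given above. The unweighted variant (WPGMA) would instead keep the plain average in every cycle. The text does not say which variant Figure 2 uses.

```latex
d_{k,(ij)} = \frac{n_i\, d_{ki} + n_j\, d_{kj}}{n_i + n_j}, \qquad h_{(ij)} = \frac{d_{ij}}{2}

% First cycle, i = B (man), j = F (monkey), n_B = n_F = 1:
h_{(BF)} = \frac{1.0}{2} = 0.5, \qquad
d_{A,(BF)} = \frac{19.0 + 18.0}{2} = 18.5, \qquad
d_{E,(BF)} = \frac{36.0 + 35.0}{2} = 35.5
```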

The process is repeated on the reduced comparison matrix, resulting in a smaller matrix with each cycle. When the matrix is completely reduced, the calculation is finished.

Figure 2. Application of UPGMA Clustering Technique.

The final phylogeny calculated from the Figure 1 data is shown in Figure 3. It is in perfect accord with the fossil record, showing fish ancestral to reptiles, reptiles ancestral to mammals, birds splitting from reptiles after the reptile/mammal split, and so forth. Under the constant-rate assumption built into UPGMA, the lengths of branches indicate time since last common ancestry. For example, moths and tuna (18.2 branch length) separated long before turtles and chickens (4.0 branch length).

Figure 3. Results of UPGMA Clustering Technique.

What makes such calculations of phylogenies interesting is the fact that the results so often agree with evolutionary trees developed from other methods (anatomy, fossils, or other proteins or genes). Indeed, molecular comparisons provide ample "repeat experiments" of the hypothesis of evolution.

Date added: 2015-11-14.
