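Building Phylogenetic Trees

The main steps in building a phylogenetic tree are sequence alignment, choosing a substitution model, tree construction, and tree evaluation. Given the assessment of tree-building methods above and the practical situation in our laboratory, the following mainly covers the software and procedures for building neighbor-joining (N-J) trees.

1. Building an N-J tree with Clustal X

(1) Open Clustal X and load the source file.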

File-Load sequences- C: empjc.txt.

(2) Sequence alignment

Alignment – Output format options – √ Clustal format; CLUSTALW sequence numbers: ON

Alignment – Do complete alignment

(Output Guide Tree file, C: empjc.dnd;Output Alignment file, C: empjc.aln;)

Align → wait…

The waiting time depends on the length and number of the sequences and on the computer's hardware.

(3) Trim both ends

File-Save Sequence as…

Format: ⊙ CLUSTAL

GDE output case: Lower

CLUSTALW sequence numbers: ON

Save from residue: 39 to 1504 (set by the shortest sequence at each end)

Save sequence as: C: empjc-a.aln

OK

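This trims the ragged start and end of the alignment so that all sequences cover the same region. Because the sequencing primers differ, the aligned sequences start and end at different positions. As a rule, both ends should be trimmed so that the ragged ends are not counted as differences between sequences. The trimmed file is saved in ALN format.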
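The trimming step can also be pictured in code. The following is a minimal sketch, not what Clustal X does internally: it assumes the alignment is already held in a dictionary of equal-length strings, and the function name `trim_alignment` and the toy sequences are made up for illustration.

```python
def trim_alignment(aligned, start, end):
    """Keep alignment columns start..end (1-based, inclusive) for every sequence."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    if not 1 <= start <= end <= lengths.pop():
        raise ValueError("column range lies outside the alignment")
    return {name: seq[start - 1:end] for name, seq in aligned.items()}


# Toy alignment: leading/trailing gaps mimic reads obtained with different primers.
toy = {
    "seqA": "--ACGTACGT--",
    "seqB": "TTACGTACGTAA",
    "seqC": "-TACGAACGTA-",
}
trimmed = trim_alignment(toy, 3, 10)
for name, seq in trimmed.items():
    print(name, seq)
```

With the values used in this tutorial the call would be `trim_alignment(alignment, 39, 1504)`, the range set by the shortest sequence at each end.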

(4) File-Load sequences-Replace existing sequences?-Yes- C: empjc-a.aln

Reload the trimmed sequences.

(5) Trees-Output Format Options

Output Files : √ CLUSTAL format tree √ Phylip format tree √ Phylip distance matrix

Bootstrap labels on: NODE

CLOSE

Trees-Exclude positions with gaps

Trees-Bootstrap N-J Tree :

Random number generator seed(1-1000) : 111

Number of bootstrap trials(1-1000): 1000

SAVE CLUSTAL TREE AS: C: empjc-a.njb

SAVE PHYLIP TREE AS: C: empjc-a.njbphb

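OK → wait…

The waiting time depends on the length and number of the sequences and on the computer's hardware. During this step the tree file *.njbphb is generated; it can be opened and viewed with TreeView.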
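To make clear what one bootstrap trial involves, here is a minimal sketch of column resampling. It only illustrates the general idea (draw alignment columns with replacement until the original length is reached) and is not Clustal X's own code; `bootstrap_alignment` and the toy data are hypothetical.

```python
import random


def bootstrap_alignment(aligned, rng):
    """Return one pseudo-replicate: alignment columns sampled with replacement."""
    names = list(aligned)
    n_sites = len(aligned[names[0]])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are used for every sequence, so columns stay intact.
    return {name: "".join(aligned[name][i] for i in picks) for name in names}


toy = {"seqA": "ACGTACGT", "seqB": "ACGTTCGT", "seqC": "ACGAACGA"}
rng = random.Random(111)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(5)]
print(replicates[0])
```

A tree is then built from each replicate, and the bootstrap value of a branch is the share of replicate trees in which that branch appears.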

(6) Trees-Draw N-J Trees

SAVE CLUSTAL TREE AS: C: empjc-a.nj

SAVE PHYLIP TREE AS: C: empjc-a.njph

SAVE DISTANCE MATRIX AS: C: empjc-a.njphdst

OK

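The report file *.nj generated in this step is quite useful: it lists the pairwise similarity between the aligned sequences, together with the respective numbers of transitions and transversions.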
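For readers unfamiliar with the two substitution classes, the sketch below counts them for one pair of aligned sequences. It is an illustration only and makes no claim about how the *.nj report is computed; the function name and example sequences are assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq1, seq2):
    """Return (compared_sites, transitions, transversions) for two aligned sequences."""
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    return compared, transitions, transversions


sites, ts, tv = count_substitutions("ACGTACGT-A", "ACATTCGTGA")
similarity = (sites - ts - tv) / sites
print(sites, ts, tv, round(similarity, 3))
```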

(7) TreeView

File-Open-C: empjc-a.njbphb

Tree- phylogram (other tree styles: unrooted, slanted cladogram, rectangular cladogram)

Tree- Show internal edge labels (displays the bootstrap values)

Tree- Define outgroup… → ingroup >> outgroup → OK (defines the outgroup)

Tree- Root with outgroup

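The tree usually needs further editing. To do so, first use Edit-Copy to paste it into PowerPoint, then copy it from there into Word and edit the picture. Copying directly into Word produces garbled characters and the tree is not displayed correctly.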

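2. Building a tree with MEGA

Although Clustal X can build trees, its output is rather crude and it is now seldom used for that purpose. Because MEGA is easy to use and produces attractive output, many researchers choose it for tree building.

(1) First align the sequences with Clustal X and trim them to produce the file C: empjc-a.aln (as above);

(2) Open BioEdit and convert the target file to FASTA format: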

File-Open- C: empjc-a.aln,

File-Save As- C: emp jc-b.fas;
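The same conversion can be sketched in a few lines of Python, which may help when many files have to be converted. This is a simplified reader written for illustration, not BioEdit's converter: it assumes a plain CLUSTAL-format file in which each sequence line is `name`, `residues`, and an optional residue count, and the function names are hypothetical.

```python
def read_clustal(text):
    """Parse CLUSTAL-format text into {name: sequence}, joining all blocks."""
    sequences = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue  # blank line, header line, or conservation line
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]  # a third field is the residue count
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


def to_fasta(sequences, width=60):
    """Format {name: sequence} as FASTA text with fixed-width lines."""
    lines = []
    for name, seq in sequences.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy two-block alignment, written inline so the example is self-contained.
example = "\n".join([
    "CLUSTAL X multiple sequence alignment",
    "",
    "seqA      ACGT-ACGT 8",
    "seqB      ACGTTACGT 9",
    "          **** ****",
    "",
    "seqA      GGCC 12",
    "seqB      GGCA 13",
    "          ***",
])
records = read_clustal(example)
fasta_text = to_fasta(records)
print(fasta_text, end="")
```

Gap characters are kept, so the FASTA output is still an alignment.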

(3) Open MEGA, convert the file to MEGA format, and activate the target file:

File-Convert To MEGA Format- C: emp jc-b.fas → C: emp jc-b.meg,

Close the Text Editor window (Do you want to save your changes before closing? - Yes);

Click me to activate a data file- C: empjc-b.meg-OK-

(Protein-coding nucleotide sequence data?-No);

Phylogeny-Neighbor-Joining(NJ)

Distance Options-Models-Nucleotide: Kimura 2-parameter;

√d: Transitions+Transversions;

Include Sites-⊙Pairwise Deletion

Test of Phylogeny-⊙Bootstrap; Replications 1000; Random Seed 64238

OK; the computation starts and the result is displayed;
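The two settings chosen above, the Kimura 2-parameter model and pairwise deletion, can be illustrated with a short sketch. It uses the standard formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences; it is not MEGA's code, and the function name and test sequences are assumptions.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    # Pairwise deletion: drop only the sites where this pair has a gap or an
    # ambiguous base, instead of removing the column for all sequences.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    p = sum(1 for pair in pairs if pair in TRANSITIONS) / n
    q = sum(1 for a, b in pairs if a != b and (a, b) not in TRANSITIONS) / n
    # math.log raises ValueError when the pair is too divergent for the formula.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


d = k2p_distance("ACGTACGTAC", "GCGTTCGTAC")
print(round(d, 4))
```

In the example pair, one of ten sites is a transition and one is a transversion, so P = Q = 0.1.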

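(4) Image-Copy to Clipboard, then paste into a Word document for editing.

In addition, the Subtree menu offers several commands for editing the resulting tree, the left side of the MEGA window provides many shortcuts, and the View menu offers several tree display modes. Only the most commonly used ones are described here:

Subtree-Swap: swap the positions of any two adjacent branches;

-Flip: flip the selected branch by 180 degrees;

-Compress/Expand: collapse/expand multiple branches;

-Root: define the outgroup;

View-Topology: show only the topology of the tree;

-Tree/Branch Style: switch among tree styles;

-Options: adjust many other aspects of the tree.

3. TREECON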

Open Clustal X, File-Load sequences-jc-a.aln, File-Save Sequence as… (Format-PHYLIP; Save from residue-1 to the end; Save sequence as: C: empjc.phy);

Open TREECON.

(1) Distance estimation

Click Distance estimation-Start distance estimation, open the jc.phy file saved above, Sequence Type-Nucleic Acid Sequence, Sequence format-PHYLIP interleaved, Select ALL, OK;

Distance Estimation-Jukes&Cantor (or Kimura), Alignment positions-All, Bootstrap analysis-Yes, Insertions&Deletions-Not taken into account, OK;

Bootstrap samples-1000, OK; the computation runs, wait…

Finished-OK.

(2) Infer tree topology

Click Infer tree topology-Start inferring tree topology, Method-Neighbor-joining, Bootstrap analysis-Yes, OK; the computation runs, wait…

Finished-OK.
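For orientation, the neighbor-joining method selected here can be sketched compactly. The code below is a textbook-style illustration under simplifying assumptions (distances supplied as a dictionary keyed by taxon pairs, ties broken by input order, the last two nodes simply joined). It is not TREECON's implementation, and all names are hypothetical.

```python
def neighbor_joining(names, dist):
    """Return a Newick string for the NJ tree of a symmetric distance matrix.

    names: list of taxon labels; dist: {frozenset({a, b}): distance}.
    """
    nodes = list(names)
    d = dict(dist)

    def get(a, b):
        return 0.0 if a == b else d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(get(a, b) for b in nodes) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda p: (n - 2) * get(*p) - totals[p[0]] - totals[p[1]],
        )
        dab = get(a, b)
        la = 0.5 * dab + (totals[a] - totals[b]) / (2 * (n - 2))  # branch to a
        lb = dab - la                                             # branch to b
        new = f"({a}:{la:.4f},{b}:{lb:.4f})"
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (get(a, c) + get(b, c) - dab)
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return f"({a}:{get(a, b):.4f},{b}:0.0000);"


# Small five-taxon toy matrix (upper triangle only).
taxa = ["a", "b", "c", "d", "e"]
raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
tree = neighbor_joining(taxa, {frozenset(k): v for k, v in raw.items()})
print(tree)
```

The tree is unrooted, so the placement of the final join carries no meaning; rooting is handled separately with an outgroup, as in the next step.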

(3) Root unrooted trees

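Click Root unrooted trees-Start rooting unrooted trees, Outgroup option-single sequence (forced), Bootstrap analysis-Yes, OK;

Select Root-X89947, OK; the computation runs, wait…

Finished-OK.

(4) Draw phylogenetic tree

Click Draw phylogenetic tree, File-Open-(new) tree, Show-Bootstrap values/Distance scale.

File-Copy, paste into a Word document, and edit.

TREECON's procedure looks more cumbersome than MEGA's, and it runs noticeably slower. With identical parameter settings, the tree it produces is almost identical to the one from MEGA, differing only in details such as slightly different bootstrap values on some branches. The parameter options offered by TREECON and MEGA also differ somewhat, but not substantially overall.

4. PHYLIP

PHYLIP is distributed as a compressed archive of many programs; double-clicking it after download extracts it automatically. Once extracted, you will find that PHYLIP is extremely powerful. It covers six main groups of programs: (i) programs for analyzing DNA and protein sequence data; (ii) programs for analyzing distance data derived from sequence data; (iii) programs for analyzing gene frequencies and continuous characters; (iv) programs that analyze sequences with each base/amino acid treated as an independent character having only two states (0 and 1); (v) programs that analyze sequences with the Dollo parsimony algorithm; (vi) programs for drawing and editing trees. Here we mainly describe the programs for DNA sequence analysis and tree construction.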

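(1) Create the PHY-format file

First open the trimmed sequence file C: empjc-a.aln in Clustal X or similar software and save it as C: empjc.phy (use File-Save Sequences As and choose "PHY" under Format). The file can be opened with BioEdit or Notepad.

(2) Open SEQBOOT from the PHYLIP package

seqboot.exe: can't find input file "infile"

Please enter a new file name> C: empjc.phy

Enter the path of the *.PHY file just created; the following is displayed: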

Bootstrapping algorithm, version 3.6a3

Settings for this run:

D Sequence, Morph, Rest., Gene Freqs? Molecular sequences

J Bootstrap, Jackknife, Permute, Rewrite? Bootstrap

B Block size for block-bootstrapping? 1

R How many replicates? 100

W Read weights of characters? No

C Read categories of sites? No

F Write out data sets or just weights? Data sets

I Input sequences interleaved? Yes

0 Terminal type none

1 Print out the data at start of run No

2 Print indications of progress of run Yes

Y to accept these or type the letter for one to change

R

Number of replicates?

1000

0

Settings for this run:

D Sequence, Morph, Rest., Gene Freqs? Molecular sequences

J Bootstrap, Jackknife, Permute, Rewrite? Bootstrap

B Block size for block-bootstrapping? 1

R How many replicates? 1000

W Read weights of characters? No

C Read categories of sites? No

F Write out data sets or just weights? Data sets

I Input sequences interleaved? Yes

0 Terminal type IBM PC

1 Print out the data at start of run No

2 Print indications of progress of run Yes

Y to accept these or type the letter for one to change

Y

Random number seed (must be odd)?

5(any odd number)

completed replicate number 100

completed replicate number 200

completed replicate number 300

completed replicate number 400

completed replicate number 500

completed replicate number 600

completed replicate number 700

completed replicate number 800

completed replicate number 900

completed replicate number 1000

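In the menu above, D, J, R, I, 0, 1 and 2 are options that can be changed; type the character and press Enter to change the corresponding setting. Option D needs no change. Option J selects the resampling method; the choices described here are Bootstrap, Jackknife and Permute (the menu above also lists Rewrite). Option R sets the number of replicates. A replicate is one resampled multiple-sequence data set generated by the bootstrap method, and the number of replicates can be chosen according to the number of sequences in the data set. When the settings are in place, type Y and press Enter. This produces the file outfile: C:Program FilesPhylipexe outfile.

Rename outfile → infile.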
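Because each PHYLIP program reads a file named infile and writes outfile, the renaming between steps is easy to forget. A small helper like the following could do it; this is only a convenience sketch, the function name is made up, and the demo runs in a temporary directory rather than the real PHYLIP folder.

```python
import tempfile
from pathlib import Path


def promote_output(workdir, src="outfile", dst="infile"):
    """Rename workdir/src to workdir/dst, replacing any stale dst."""
    source = Path(workdir) / src
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; did the previous program run?")
    target = Path(workdir) / dst
    source.replace(target)  # overwrites an existing infile from an earlier step
    return target


# Demo in a throwaway directory with a stand-in for SEQBOOT's output.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "outfile").write_text("replicate data\n")
    new_path = promote_output(tmp)
    demo_name = new_path.name
    demo_text = new_path.read_text()
    old_exists = (Path(tmp) / "outfile").exists()
print(demo_name, old_exists)
```

The same helper covers the later steps, for example `promote_output(folder, "outtree", "intree")` before running consense.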

(3) 打开dnadist.exe

Nucleic acid sequence Distance Matrix program, version 3.6a3

Settings for this run:

D Distance ? F84

G Gamma distributed rates across sites? No

T Transition/transversion ratio? 2.0

C One category of substitution rates? Yes

W Use weights for sites? No

F Use empirical base frequencies? Yes

L Form of distance matrix? Square

M Analyze multiple data sets? No

I Input sequences interleaved? Yes

0 Terminal type ?

1 Print out the data at start of run No

2 Print indications of progress of run Yes

Y to accept these or type the letter for one to change

d

D Distance ? Kimura 2-parameter

m

Multiple data sets or multiple weighs? (type D or W)

d

How many data sets?

1000

0

Settings for this run:

D Distance ? Kimura 2-parameter

G Gamma distributed rates across sites? No

T Transition/transversion ratio? 2.0

C One category of substitution rates? Yes

W Use weights for sites? No

F Use empirical base frequencies? Yes

L Form of distance matrix? Square

M Analyze multiple data sets? Yes, 1000 data sets

I Input sequences interleaved? Yes

0 Terminal type ? IBM PC

1 Print out the data at start of run No

2 Print indications of progress of run Yes

Y to accept these or type the letter for one to change

Y

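Option D selects among four distance models, given here as Kimura 2-parameter, Jin/Nei, Maximum-likelihood and Jukes-Cantor (the 3.6a3 menu shown above starts at F84, so the available models may depend on the version). Option T is usually set to a value between 1.5 and 3.0. For option M, enter 1000. The run produces the file C:Program FilesPhylipexe outfile.

Rename outfile → infile.

(4) Open neighbor.exe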

Neighbor-Joining/UPGMA method version 3.6a3

Settings for this run:

N Neighbor-Joining or UPGMA tree? Neighbor-Joining

O Outgroup root? No, Use as outgroup species 1

L Lower-triangular data matrix? No

R Upper-triangular data matrix? No

S Subreplication? No

J Randomize input order of species? No, Use input order

M Analyze multiple data sets? No

0 Terminal type ?

1 Print out the data at start of run No

2 Print indications of progress of run Yes

3 Print out tree Yes

4 Write out trees onto tree file? Yes

Y to accept these or type the letter for one to change

m

How many data sets?

1000

Random number seed (must be odd)?

5

Settings for this run:

N Neighbor-Joining or UPGMA tree? Neighbor-Joining

O Outgroup root? No, Use as outgroup species 1

L Lower-triangular data matrix? No

R Upper-triangular data matrix? No

S Subreplication? No

J Randomize input order of species? Yes

M Analyze multiple data sets? Yes, 1000 sets

0 Terminal type ? IBM PC

1 Print out the data at start of run No

2 Print indications of progress of run Yes

3 Print out tree Yes

4 Write out trees onto tree file? Yes

Y to accept these or type the letter for one to change

Y

This produces the files C:Program FilesPhylipexe outtree and outfile.

Rename outtree → intree; outfile → infile.

(5) Open consense.exe

Consensus tree program, version 3.6a3

Settings for this run:

C Consensus type ? Majority rule (extended)

O Outgroup root? No, use as outgroup species 1

R Trees to be treated as Rooted? No

T Terminal type ?

1 Print out the sets of the species Yes

2 Print indications of progress of run Yes

3 Print out tree Yes

4 Write out trees onto tree file? Yes

Are these settings correct?

R

T

Settings for this run:

C Consensus type ? Majority rule (extended)

R Trees to be treated as Rooted? Yes

T Terminal type ? IBM PC

1 Print out the sets of the species Yes

2 Print indications of progress of run Yes

3 Print out tree Yes

4 Write out trees onto tree file? Yes

Y

This produces the file C:Program FilesPhylipexe outtree.

Rename outtree → jc.tre.
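The idea behind a majority-rule consensus can be sketched as counting how often each group of taxa appears among the replicate trees. The code below is a simplified illustration with hypothetical names and toy data: it assumes every tree has already been reduced to a set of clades, and it keeps only groups found in more than half of the trees, whereas the "Majority rule (extended)" setting used above also adds compatible groups of lower frequency.

```python
from collections import Counter


def clade_support(replicate_clades):
    """Fraction of replicate trees that contain each clade.

    replicate_clades: list with one set per tree, each holding frozensets of taxa.
    """
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    total = len(replicate_clades)
    return {clade: count / total for clade, count in counts.items()}


def majority_clades(support, threshold=0.5):
    """Keep clades present in more than `threshold` of the trees."""
    return {clade: s for clade, s in support.items() if s > threshold}


AB, CD, CE = frozenset("AB"), frozenset("CD"), frozenset("CE")
AC, DE = frozenset("AC"), frozenset("DE")
trees = [{AB, CD}, {AB, CD}, {AB, CE}, {AC, DE}]  # toy replicate trees

support = clade_support(trees)
kept = majority_clades(support)
for clade, s in kept.items():
    print(sorted(clade), s)
```

The support fractions correspond to the bootstrap values shown on the branches when the consensus tree is opened in TreeView.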

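(6) Open TreeView

Open C:Program FilesPhylipexe jc.tre. For the remaining steps, follow the detailed TreeView instructions given earlier.

Copyright notice: This is an original article by ace9, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://www.cnblogs.com/ace9/archive/2011/04/29/2032720.html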