В предложенной статье (Elena Zaslavsky et. Al. Antiviral response dictated by choreographed cascade of transcription factors) был проведен анализ дифференциальной экспресии в дендрических

Отчет

Иголкина Анна

1) Введение

В предложенной статье (Elena Zaslavsky et. Al. Antiviral response dictated by choreographed cascade of transcription factors) был проведен анализ дифференциальной экспресии в дендрических клетках(dendritic cell, DC) под действием вируса (Newcastle Disease Virus, NDV). Авторы сфокусировались на начальном времени ап-регуляции(up-regulation) чтобы идентифицировать гены, которые принадлежат единому регуляторному каскаду. Для этого был выделен полный набор RNA из клеток двух различных доноров через (0,1,2,4,6,8,10,12,14,16,18) и (0,1,2,6,10,18) часов после заражения NDV и контролем соответственно.

RNA были реверс-транскрибированы в cDNA. Соответствующие cRNA были гибридизованы с Affymetrix HU133plus2 Gene Chip Arrays by Mount Sinai Microarray Shared Resource Facility.

2) Исходные данные.

#1 Загружаем данные E-GEOD-18791 из ArrayExpress.

>AEset = ArrayExpress("E-GEOD-18791")

>AEset

AffyBatch objectsize of arrays=1164x1164 features (77 kb)cdf=HG-U133_Plus_2 (54675 affyids)number of samples = 57 number of genes=54675annotation=hgu133plus2notes=c("E-GEOD-18791", "", "", "", "", "", "") c("E-GEOD-18791", "", "", "", "", "", "") c("infect", "time", "", "", "", "", "") c("transcription profiling by array", "", "", "", "", "", "")

#2 Смотрим имена колонок и выделяем Факторы, которые будем изучать.

> colnames(pData(AEset)) [1] "Source.Name" [2] "Characteristics..Organism." [3] "Characteristics..cell.type." [4] "Comment..sample." [5] "Comment..donor." [6] "Comment..Sample_source_name." [7] "Comment..Sample_description." [8] "Protocol.REF" [9] "Protocol.REF.1" [10] "Protocol.REF.2" [11] "Extract.Name" [12] "Material.Type" [13] "Protocol.REF.3" [14] "Labeled.Extract.Name" [15] "Label" [16] "Material.Type.1" [17] "Protocol.REF.4" [18] "Protocol.REF.5" [19] "Hybridization.Name" [20] "Array.Design.REF" [21] "Protocol.REF.6" [22] "Scan.Name" [23] "Array.Data.File" [24] "Comment..ArrayExpress.FTP.file." [25] "Protocol.REF.7" [26] "Derived.Array.Data.Matrix.File" [27] "Comment..Derived.ArrayExpress.FTP.file." [28] "Comment..Derived.ArrayExpress.Data.Retrieval.URI."[29] "Factor.Value..infection." [30] "Factor.Value..time." [31] "Unit.TimeUnit." > fac = colnames(pData(AEset))[grep("Factor", colnames(pData(AEset)))]> fac[1] "Factor.Value..infection." "Factor.Value..time."

3) Проверка качества данных

library(affy)

biocLite("simpleaffy")

library("simpleaffy")

saqc=qc(AEset)

plot(saqc)

RLE and NUSE

The RLE (Relative Log Expression) and NUSE (Normalized Unscaled Standard Error) plots are useful and sensitive measures to assess array quality. Both are derived from a probe-level model (PLM) that computes an expression measure using M-estimator robust regression.

RLE plots are constructed using log-scale estimates for the expression of each probe set on each array. For each probe set and each array, ratios are calculated between the expression of a probe set and the median expression of this probe set across all arrays of the experiment. For each array, these relative expression values are displayed as a box plot. Since it is assumed that in most experiments only relatively few genes are differentially expressed, the boxes should be similar in range and be centered close to 0.

The other graphical representation (NUSE) represents normalized standard error (SE) estimates from the PLM fit. The SE estimates are normalized such that for each probe set, the median standard error across all arrays is equal to 1. A box plot of NUSE values is drawn for each array. On the NUSE plot, arrays with lower quality will have boxes that are centered higher and/or have a larger spread than the other good quality arrays from the same experiment. Typically, boxes centered above 1.1 represent arrays that have quality problems which are often detected in several of the other QC measures presented in this chapter.

biocLite("affyPLM")

library("affyPLM")

dataPLM = fitPLM(AEset) #model

boxplot(dataPLM, main="NUSE", ylim = c(0.95, 1.22), outline = FALSE, col="lightblue", las=3, whisklty=0, staplelty=0)

Mbox(dataPLM, main="RLE", ylim = c(-0.4, 0.4), outline = FALSE, col="mistyrose", las=3, whisklty=0, staplelty=0)

rAEset = rma(AEset)

fAEset = varFilter(rAEset, var.cuto=0.5)

> fAEset

ExpressionSet (storageMode: lockedEnvironment)

assayData: 27337 features, 57 samples

element names: exprs

protocolData

sampleNames: GSM466186 GSM466187... GSM466229 (57 total)

varLabels: ScanDate

varMetadata: labelDescription

phenoData

sampleNames: GSM466186 GSM466187... GSM466229 (57 total)

varLabels: Source.Name Characteristics..Organism.... Unit.TimeUnit.

(31 total)

varMetadata: labelDescription

featureData: none

experimentData: use 'experimentData(object)'

Annotation: hgu133plus2

library("limma")

fac_infection <- colnames(pData(AEset))[grep("infection", colnames(pData(AEset)))]

fac_time <- colnames(pData(AEset))[grep("time", colnames(pData(AEset)))]

> fac_infection

[1] "Factor.Value..infection."

> fac_time

[1] "Factor.Value..time."

groups_infection = pData(fAEset)[, fac_infection]

groups_time = pData(fAEset)[, fac_time]

> groups_infection

[1] "Newcastle disease virus" "Newcastle disease virus"

[3] "Newcastle disease virus" "Newcastle disease virus"

[5] "Newcastle disease virus" "Newcastle disease virus"

[7] "Newcastle disease virus" "Newcastle disease virus"

[9] "Newcastle disease virus" "Newcastle disease virus"

[11] "Newcastle disease virus" "Newcastle disease virus"

[13] "Newcastle disease virus" "Newcastle disease virus"

[15] "Newcastle disease virus" "Newcastle disease virus"

[17] "Newcastle disease virus" "Newcastle disease virus"

[19] "Newcastle disease virus" "Newcastle disease virus"

[21] "Newcastle disease virus" "Newcastle disease virus"

[23] "Newcastle disease virus" "Newcastle disease virus"

[25] "Newcastle disease virus" "Newcastle disease virus"

[27] "Newcastle disease virus" "Newcastle disease virus"

[29] "Newcastle disease virus" "Newcastle disease virus"

[31] "Newcastle disease virus" "Newcastle disease virus"

[33] "Newcastle disease virus" "Newcastle disease virus"

[35] "Newcastle disease virus" "Newcastle disease virus"

[37] "Newcastle disease virus" "Newcastle disease virus"

[39] "Newcastle disease virus" "none"

[41] "none" "none"

[43] "none" "none"

[45] "none" "none"

[47] "none" "none"

[49] "none" "none"

[51] "none" "none"

[53] "none" "none"

[55] "none" "none"

[57] "none"

> groups_time

[1] 1 2 4 6 8 10 12 14 16 1 2 4 6 8 10 12 14 16 18 1 2 4 6 8 10

[26] 12 14 16 18 1 2 4 6 8 10 12 14 16 18 0 6 10 0 1 2 6 10 18 0 1

[51] 2 6 10 18 0 6 10

groups_infection[groups_infection == "none"] = "Ref"

groups_infection[groups_infection == "Newcastle disease virus"] = "Inf"

> groups_infection

[1] "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf"

[13] "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf"

[25] "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf" "Inf"

[37] "Inf" "Inf" "Inf" "Ref" "Ref" "Ref" "Ref" "Ref" "Ref" "Ref" "Ref" "Ref"

[49] "Ref" "Ref" "Ref" "Ref" "Ref" "Ref" "Ref" "Ref" "Ref"

groups = paste0(groups_infection,groups_time)

> groups

[1] "Inf1" "Inf2" "Inf4" "Inf6" "Inf8" "Inf10" "Inf12" "Inf14" "Inf16"

[10] "Inf1" "Inf2" "Inf4" "Inf6" "Inf8" "Inf10" "Inf12" "Inf14" "Inf16"

[19] "Inf18" "Inf1" "Inf2" "Inf4" "Inf6" "Inf8" "Inf10" "Inf12" "Inf14"

[28] "Inf16" "Inf18" "Inf1" "Inf2" "Inf4" "Inf6" "Inf8" "Inf10" "Inf12"

[37] "Inf14" "Inf16" "Inf18" "Ref0" "Ref6" "Ref10" "Ref0" "Ref1" "Ref2"

[46] "Ref6" "Ref10" "Ref18" "Ref0" "Ref1" "Ref2" "Ref6" "Ref10" "Ref18"

[55] "Ref0" "Ref6" "Ref10"

fac = factor(groups)

fac

> fac

[1] Inf1 Inf2 Inf4 Inf6 Inf8 Inf10 Inf12 Inf14 Inf16 Inf1 Inf2 Inf4

[13] Inf6 Inf8 Inf10 Inf12 Inf14 Inf16 Inf18 Inf1 Inf2 Inf4 Inf6 Inf8

[25] Inf10 Inf12 Inf14 Inf16 Inf18 Inf1 Inf2 Inf4 Inf6 Inf8 Inf10 Inf12

[37] Inf14 Inf16 Inf18 Ref0 Ref6 Ref10 Ref0 Ref1 Ref2 Ref6 Ref10 Ref18

[49] Ref0 Ref1 Ref2 Ref6 Ref10 Ref18 Ref0 Ref6 Ref10

16 Levels: Inf1 Inf10 Inf12 Inf14 Inf16 Inf18 Inf2 Inf4 Inf6 Inf8 Ref0... Ref6

строим матрицу дизайна

desmat = model.matrix(~ 0 + fac)

colnames(desmat)=sub("fac","",colnames(desmat))

desmat

> desmat

Inf1 Inf10 Inf12 Inf14 Inf16 Inf18 Inf2 Inf4 Inf6 Inf8 Ref0 Ref1 Ref10 Ref18 Ref2 Ref6

1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

3 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

4 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

5 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0

6 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

7 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

8 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

9 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

10 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

11 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

12 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

13 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

14 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0

15 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

16 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

17 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

18 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

19 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

20 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

21 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

22 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

23 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

24 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0

25 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

26 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

27 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

28 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

29 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

30 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

31 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

32 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

33 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

34 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0

35 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

36 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

37 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

38 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

39 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

40 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0

41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

42 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

43 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0

44 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

46 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

47 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

48 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0

49 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0

50 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

51 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

52 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

53 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

54 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0

55 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0

56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

57 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

attr(,"assign")

[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

attr(,"contrasts")

attr(,"contrasts")$fac

[1] "contr.treatment"

conmat <- makeContrasts(Ref1-Inf1,Ref2-Inf2,Ref6-Inf6, Ref10-Inf10,Ref18-Inf18,levels=desmat)

> conmat

Contrasts

Levels Ref1 - Inf1 Ref2 - Inf2 Ref6 - Inf6 Ref10 - Inf10 Ref18 - Inf18

Inf1 -1 0 0 0 0

Inf10 0 0 0 -1 0

Inf12 0 0 0 0 0

Inf14 0 0 0 0 0

Inf16 0 0 0 0 0

Inf18 0 0 0 0 -1

Inf2 0 -1 0 0 0

Inf4 0 0 0 0 0

Inf6 0 0 -1 0 0

Inf8 0 0 0 0 0

Ref0 0 0 0 0 0

Ref1 1 0 0 0 0

Ref10 0 0 0 1 0

Ref18 0 0 0 0 1

Ref2 0 1 0 0 0

Ref6 0 0 1 0 0

строим модель

fit = lmFit(fAEset, desmat)

fitc <- contrasts.fit(fit, conmat)

fit2 = eBayes(fitc)

строим на дифференциальную экспрессию up и down гены

results <- decideTests(fit2, method="separate", adjust.method="holm", p.value=0.05, lfc=1)

> results

TestResults matrix

Contrasts

Ref1 - Inf1 Ref2 - Inf2 Ref6 - Inf6 Ref10 - Inf10 Ref18 - Inf18

1007_s_at 0 0 0 0 0

1053_at 0 0 0 0 0

117_at 0 0 0 0 0

121_at 0 0 0 0 0

1255_g_at 0 0 0 0 0

27332 more rows...

up <- which(results[,4] == 1)

upGenes <- fit2[up, ]

tup <- upGenes[,5]

down <- which(results[,4] == -1)

downGenes <- fit2[down, ]

tdown <- downGenes[,4]

par(mfrow=c(3,2))

volcanoplot(fit2, coef=1, highlight=15)

volcanoplot(fit2, coef=2, highlight=15)

volcanoplot(fit2, coef=3, highlight=15)

volcanoplot(fit2, coef=4, highlight=15)

volcanoplot(fit2, coef=5, highlight=15)

up <- which(results[, 4] == 1)

upGenes <- fit2[up, ]

tup <- upGenes[,5]

down <- which(results[,4] == -1)

downGenes <- fit2[down, ]

tdown <- downGenes[,4]

# 1: 1

topgenes1 = topTable(fit2, coef = 1, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

ngenes1=dim(topgenes1)[1]

ngenes1

topUPgenes1 = topTable(upGenes, coef = 1, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

nUPgenes1=dim(topUPgenes1)[1]

nUPgenes1

topDOWNgenes1 = topTable(downGenes, coef = 1, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

nDOWNgenes1=dim(topDOWNgenes1)[1]

nDOWNgenes1

# 2: 2

topgenes2 = topTable(fit2, coef = 2, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

ngenes2=dim(topgenes2)[1]

ngenes2

topUPgenes2 = topTable(upGenes, coef = 2, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

nUPgenes2=dim(topUPgenes2)[1]

nUPgenes2

topDOWNgenes2 = topTable(downGenes, coef = 2, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

nDOWNgenes2=dim(topDOWNgenes2)[1]

nDOWNgenes2

# 3: 6

topgenes6 = topTable(fit2, coef = 3, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

ngenes6=dim(topgenes6)[1]

ngenes6

topUPgenes6 = topTable(upGenes, coef = 3, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

nUPgenes6=dim(topUPgenes6)[1]

nUPgenes6

topDOWNgenes6 = topTable(downGenes, coef = 3, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

nDOWNgenes6=dim(topDOWNgenes6)[1]

nDOWNgenes6

# 4: 10

topgenes10 = topTable(fit2, coef = 4, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

ngenes10=dim(topgenes10)[1]

ngenes10

topUPgenes10 = topTable(upGenes, coef = 4, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

nUPgenes10=dim(topUPgenes10)[1]

nUPgenes10

topDOWNgenes10 = topTable(downGenes, coef = 4, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

nDOWNgenes10=dim(topDOWNgenes10)[1]

nDOWNgenes10

# 5: 18

topgenes18 = topTable(fit2, coef = 5, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

ngenes18=dim(topgenes18)[1]

ngenes18

topUPgenes18 = topTable(upGenes, coef = 5, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

nUPgenes18=dim(topUPgenes18)[1]

nUPgenes18

topDOWNgenes18 = topTable(downGenes, coef = 5, adjust = "holm", number = Inf, p.value = 0.05, lfc=1)

nDOWNgenes18=dim(topDOWNgenes18)[1]

nDOWNgenes18

library(xtable)

restable <- data.frame(up=c(nUPgenes1,nUPgenes2,nUPgenes6,

+ nUPgenes10,nUPgenes18),down=c(nDOWNgenes1,nDOWNgenes2,nDOWNgenes6,

+ nDOWNgenes10,nDOWNgenes18),tot=c(ngenes1,ngenes2,ngenes6,

+ ngenes10,ngenes18))

rownames(restable) <- c("1 h", "2 h", "6 h", "10 h", "18 h")

colnames(restable) <- c("N up", "N down", "N tot")

xtable(restable, caption = "Resulting numbers of genes", label = "numtab")

\begin{table}[ht]

\centering

\begin{tabular}{rrrr}

\hline

& N up & N down & N tot \\

\hline

1 h & 0 & 0 & 0 \\

2 h & 0 & 5 & 1 \\

6 h & 0 & 296 & 205 \\

10 h & 256 & 475 & 731 \\

18 h & 184 & 395 & 648 \\

\hline

\end{tabular}

\caption{Resulting numbers of genes}

\label{numtab}

\end{table}

Иерархическая кластеризация

biocLite("amap")

library(amap)

heatcol<-colorRampPalette(c("Green", "Red"))(32)

fAEset1=fAEset[topgenes1[, "ID"], ]

fAEset1=fAEset1[,pData(fAEset1)$Factor.Value..time==1]

fAEset2=fAEset[topgenes2[, "ID"], ]

fAEset2=fAEset2[,pData(fAEset2)$Factor.Value..time==2]

fAEset6=fAEset[topgenes6[, "ID"], ]

fAEset6=fAEset6[,pData(fAEset6)$Factor.Value..time==6]

fAEset10=fAEset[topgenes10[, "ID"], ]

fAEset10=fAEset10[,pData(fAEset10)$Factor.Value..time==10]

fAEset18=fAEset[topgenes18[, "ID"], ]

fAEset18=fAEset18[,pData(fAEset18)$Factor.Value..time==18]

иерархическая кластеризация будет производиться с помощью корреляции пирсона

(по каждому времени)

матрица

m6 = exprs(fAEset6)

clust.genes<-hcluster(x=m6, method="pearson", link="average")

clust.arrays<-hcluster(x=t(m6), method="pearson", link="average")

heatmap(x=as.matrix(m6), Rowv=as.dendrogram(clust.genes),Colv=as.dendrogram(clust.arrays), col=heatcol, margin = c(5, 8))

m10 = exprs(fAEset10)

clust.genes<-hcluster(x=m10, method="pearson", link="average")

clust.arrays<-hcluster(x=t(m10), method="pearson", link="average")

heatmap(x=as.matrix(m10), Rowv=as.dendrogram(clust.genes),Colv=as.dendrogram(clust.arrays), col=heatcol, margin = c(5, 8))

m18 = exprs(fAEset18)

clust.genes<-hcluster(x=m18, method="pearson", link="average")

clust.arrays<-hcluster(x=t(m18), method="pearson", link="average")

heatmap(x=as.matrix(m18), Rowv=as.dendrogram(clust.genes),Colv=as.dendrogram(clust.arrays), col=heatcol, margin = c(5, 8))

m10

m18

Кластеризация К-средних

> plot(2:kmax, km, xlab="K", ylab = "sum(witinss)",

+ type = "b", pch = "o", main = "6 h, < 1%")

> OPTIMALK6

[1] 7

KMAX = 100

kmax <- c(KMAX)

if (nrow(m10) < KMAX)

{

kmax <- nrow(m10)

}

km <- rep(NA, (kmax - 1))

i <- c(2)

OPTIMALK = i

while(i < kmax)

{

km[i] <-sum(kmeans(m10, i, iter.max = 20000, nstart = 10)$withinss)

if(i >= 3 & km[i-1]/km[i] <= 1.01)

{

OPTIMALK = 1

i <- kmax

}

else

{

i <- i+1

}

OPTIMALK10 <- ceiling(OPTIMALK / 3)

OPTIMALK10

plot(2:kmax, km, xlab="K", ylab = "sum(witinss)", type = "b", pch = "o", main = "10 h, < 1%")

> OPTIMALK10

[1] 9

KMAX = 100

kmax <- c(KMAX)

if (nrow(m18) < KMAX)

{

kmax <- nrow(m18)

}

km <- rep(NA, (kmax - 1))

i <- c(2)

OPTIMALK = i

while(i < kmax)

{

km[i] <-sum(kmeans(m18, i, iter.max = 20000, nstart = 18)$withinss)

if(i >= 3 & km[i-1]/km[i] <= 1.01)

{

OPTIMALK = i

i <- kmax

}

else

{

i <- i+1

}

OPTIMALK18 <- ceiling(OPTIMALK / 3)

OPTIMALK18

> OPIMALK18

[1] 13

plot(2:kmax, km, xlab="K", ylab = "sum(witinss)", type = "b", pch = "o", main = "6 h, < 1%")

cl <- kmeans(x = as.matrix(m6), OPTIMALK6, iter.max = 100000, nstart = 10)

maxm <- max(m6)

minm <- min(m6)

par(mfrow = c(ceiling(sqrt(OPTIMALK6)), ceiling(sqrt(OPTIMALK6))))

for (i in 1:OPTIMALK6)

{

matplot(t(m6[cl$cluster == i,]), type = "b",

main = paste("cluster:", i), ylab = "log expression",

col = 1, lty = 1, ylim = c(minm, maxm))

}

cl <- kmeans(x = as.matrix(m10), OPTIMALK10, iter.max = 100000, nstart = 10)

maxm <- max(m10)

minm <- min(m10)

par(mfrow = c(ceiling(sqrt(OPTIMALK10)), ceiling(sqrt(OPTIMALK10))))

for (i in 1:OPTIMALK10)

{

matplot(t(m10[cl$cluster == i,]), type = "b",

main = paste("cluster:", i), ylab = "log expression",

col = 1, lty = 1, ylim = c(minm, maxm))

cl <- kmeans(x = as.matrix(m18), OPTIMALK18, iter.max = 100000, nstart = 10)

maxm <- max(m18)

minm <- min(m18)

par(mfrow = c(ceiling(sqrt(OPTIMALK18)), ceiling(sqrt(OPTIMALK18))))

for (i in 1:OPTIMALK18)

{

matplot(t(m18[cl$cluster == i,]), type = "b",

main = paste("cluster:", i), ylab = "log expression",

col = 1, lty = 1, ylim = c(minm, maxm))

}

Самоорганизующиеся карты

Дата добавления: 2015-11-04; просмотров: 31 | Нарушение авторских прав

<== предыдущая лекция	\|	следующая лекция ==>
Rock in Opposition Avant-rock Experimental music	\|	Refrigerators and Sleeping Coaches

mybiblioteka.su - 2015-2024 год. (0.079 сек.)