How much data should you sequence? How many Gb? How many reads? How do you convert between them?

wangchuang2017


wangchuang2017 · Posted 2018-11-17 14:59:29


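lncRNAs are expressed at low levels, so to see changes in lncRNA expression you need to sequence deeper than for ordinary RNA-seq.

To cover both SNPs and low-expression lncRNAs, you need to sequence deeper still.

So how much data do you actually need?

Let's see how the authoritative ENCODE guidelines assess sequencing depth for RNA-seq: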

Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011)

The ENCODE Consortium

Sequencing depth.

The amount of sequencing needed for a given sample is determined by the goals of the experiment and the nature of the RNA sample. Experiments whose purpose is to evaluate the similarity between the transcriptional profiles of two polyA+ samples may require only modest depths of sequencing (e.g. 30M pair-end reads of length > 30NT, of which 20-25M are mappable to the genome or known transcriptome). Experiments whose purpose is discovery of novel transcribed elements and strong quantification of known transcript isoforms require more extensive sequencing.

The ability to detect reliably low copy number transcripts/isoforms depends upon the depth of sequencing and on a sufficiently complex library. For experiments from a typical mammalian tissue or in which sensitivity of detection is important, a minimum depth of 100-200 M 2 x 76 bp or longer reads is currently recommended.

Specialized studies in which the prevalence of different RNAs has been intentionally altered (e.g. “normalizing” using DSN) as part of sample preparation need more than the read amounts (>30M paired end reads) used for simple comparison (see above). Reasons for this include:

(1) overamplification of inserts as a result of an additional round of PCR after DSN and

(2) much more broad coverage given the nature of A(-) and low abundance transcripts.

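In plain terms, the guidelines say the following:

Sequencing depth depends on the goal of the study:

Goal 1: For libraries built by polyA selection (which capture only polyadenylated transcripts, mostly from protein-coding genes),

comparing the transcriptional profiles of samples needs only about 30M paired-end reads longer than 30 nt, of which 20-25M should map back to the genome or known transcriptome.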

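Goal 2: To discover novel transcripts and quantify known isoforms (a single gene produces multiple isoforms through different alternative splicing patterns),

while still capturing low-abundance transcripts or isoforms, you need 100-200M paired-end reads of 2 x 76 bp or longer.

lncRNA-seq falls into this category.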

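Note: ENCODE sequences human and mouse; other species are not covered by this recommendation.

In addition, miRNA sequencing needs only 10M reads, 50 bp each, single-end.

ChIP-seq needs 20M reads, 50 bp each, single-end.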
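As for converting between reads and Gb: data volume is just the number of reads times the read length. Here is a minimal sketch of that arithmetic. The function names are made up for illustration, and it assumes 1 Gb = 10^9 bases. Note that "100M paired-end reads" can be read either as 100M individual reads or as 100M read pairs; the guidelines quoted above do not spell this out, so the sketch provides both conversions and you should confirm with your sequencing provider which one they mean.

```python
def reads_to_gigabases(n_reads, read_length_bp):
    """Total yield in Gb for n_reads individual reads of the given length.

    Assumes 1 Gb = 1e9 bases and that every read has the same length.
    """
    return n_reads * read_length_bp / 1e9


def pairs_to_gigabases(n_pairs, read_length_bp):
    """Total yield in Gb when the count refers to read pairs (fragments).

    Each pair contributes two reads, so the yield doubles.
    """
    return reads_to_gigabases(2 * n_pairs, read_length_bp)


def gigabases_to_reads(gigabases, read_length_bp):
    """Number of individual reads contained in a given yield in Gb."""
    return round(gigabases * 1e9 / read_length_bp)


if __name__ == "__main__":
    # Goal 2 (lncRNA-seq): 100-200M reads at 76 bp, counted as individual reads
    print(f"{reads_to_gigabases(100_000_000, 76):.1f} Gb")  # 7.6 Gb
    print(f"{reads_to_gigabases(200_000_000, 76):.1f} Gb")  # 15.2 Gb

    # Same lower bound if the count means read pairs (2 x 76 bp)
    print(f"{pairs_to_gigabases(100_000_000, 76):.1f} Gb")  # 15.2 Gb

    # miRNA-seq: 10M single-end reads of 50 bp
    print(f"{reads_to_gigabases(10_000_000, 50):.1f} Gb")   # 0.5 Gb

    # ChIP-seq: 20M single-end reads of 50 bp
    print(f"{reads_to_gigabases(20_000_000, 50):.1f} Gb")   # 1.0 Gb
```

Going the other way, `gigabases_to_reads(6, 150)` gives 40,000,000, so a 6 Gb order at 150 bp per read is 40M individual reads, or 20M pairs.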