
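lncRNAs are expressed at low levels, so to detect changes in lncRNA expression you need to sequence more than you would for ordinary RNA-seq.

If you also want to cover SNPs in low-expression lncRNAs, you need to sequence deeper still.

So how much data do you actually need?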

 

Let's see what the authoritative ENCODE guidelines say about sequencing depth for RNA-seq:

Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011)

The ENCODE Consortium

 

Sequencing depth.

The amount of sequencing needed for a given sample is determined by the goals of the experiment and the nature of the RNA sample. Experiments whose purpose is to evaluate the similarity between the transcriptional profiles of two polyA+ samples may require only modest depths of sequencing (e.g. 30M pair-end reads of length > 30NT, of which 20-25M are mappable to the genome or known transcriptome). Experiments whose purpose is discovery of novel transcribed elements and strong quantification of known transcript isoforms requires more extensive sequencing.

 

The ability to detect reliably low copy number transcripts/isoforms depends upon the depth of sequencing and on a sufficiently complex library. For experiments from a typical mammalian tissue or in which sensitivity of detection is important, a minimum depth of 100-200 M 2 x 76 bp or longer reads is currently recommended.

[Specialized studies in which the prevalence of different RNAs has been intentionally altered (e.g. “normalizing” using DSN) as part of sample preparation need more than the read amounts (>30M paired end reads) used for simple comparison (see above). Reasons for this include:

(1) overamplification of inserts as a result of an additional round of PCR after DSN and

(2) much more broad coverage given the nature of A(-) and low abundance transcripts.]

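In plain terms, the guideline says:

The research goal determines the sequencing depth:

Goal 1: the library is built by capturing polyA tails (so only polyA-tailed transcripts are sequenced, most of them from protein-coding genes),

and the aim is to compare transcriptional profiles between samples. This needs only 30M reads, longer than 30 nt, paired-end, of which 20-25M map back to the genome or the known transcriptome.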

 

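Goal 2: discover novel transcripts and quantify known isoforms (one gene can produce several isoforms through different alternative splicing patterns),

while still covering low-expression transcripts and isoforms. This needs 100-200M reads, 76 bp or longer, paired-end.

lncRNA-seq falls into this category.

Note: ENCODE sequences human and mouse samples; other species are outside the scope of this recommendation.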

 

In addition, miRNA sequencing needs only 10M reads, each 50 bp long, single-end.

ChIP-seq needs 20M reads, each 50 bp long, single-end.

 

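Sales reps only quote how many G you get, not the number of reads. So how do you convert a read count into G?

It depends on the read length:

PE150, or 2*150, means paired-end sequencing with each read 150 bp long.

150 bp × 2 ends × number of reads = data volume

For example, for 50M reads: 150 bp × 2 ends × 50M reads = 15,000 Mb = 15 Gb

Note: in paired-end sequencing, each RNA fragment is sequenced from both ends and yields 2 sequences. The "read" count in the formula above is the number of fragments (read pairs), not the number of individual sequences.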
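Since a quote usually arrives in G rather than in reads, it can help to run the conversion in both directions. The following is a minimal sketch of the arithmetic above; the function names `data_volume_gb` and `fragments_for_gb` are made up for illustration and are not part of any sequencing tool.

```python
def data_volume_gb(n_fragments: float, read_len_bp: int, ends: int = 2) -> float:
    """Total sequenced bases in gigabases (Gb).

    n_fragments counts fragments (read pairs when ends=2), as in the post.
    """
    return n_fragments * read_len_bp * ends / 1e9


def fragments_for_gb(volume_gb: float, read_len_bp: int, ends: int = 2) -> float:
    """Number of fragments implied by a quoted data volume in Gb."""
    return volume_gb * 1e9 / (read_len_bp * ends)


# Forward: the PE150 example from the text (50M fragments).
print(data_volume_gb(50e6, 150, ends=2))

# Reverse: how many fragments a 15 Gb PE150 quote corresponds to, in millions.
print(fragments_for_gb(15, 150, ends=2) / 1e6)
```

The reverse direction is the one to use when checking a quote against the ENCODE read counts.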

 

SE50, or 1*50, means single-end sequencing with each read 50 bp long.

50 bp × 1 end × number of reads = data volume

For example, for 20M reads: 50 bp × 1 end × 20M reads = 1,000 Mb = 1 Gb
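The same formula can be applied to every read count quoted earlier in this post. The sketch below does that. The table `RECOMMENDATIONS` and the helper names are illustrative assumptions, and the read lengths are the minimum lengths mentioned above, so the results are lower bounds rather than what a PE150 run would deliver.

```python
# (assay, fragments, read length in bp, ends), as summarized in this post.
RECOMMENDATIONS = [
    ("polyA+ profile comparison", 30e6, 30, 2),
    ("lncRNA / isoform discovery (low end)", 100e6, 76, 2),
    ("lncRNA / isoform discovery (high end)", 200e6, 76, 2),
    ("miRNA-seq", 10e6, 50, 1),
    ("ChIP-seq", 20e6, 50, 1),
]


def to_gb(fragments: float, read_len_bp: int, ends: int) -> float:
    """read length x ends x fragments, expressed in gigabases."""
    return fragments * read_len_bp * ends / 1e9


def summarize(rows):
    """Map each assay name to its minimum data volume in Gb."""
    return {name: round(to_gb(n, length, ends), 2) for name, n, length, ends in rows}


for name, gb in summarize(RECOMMENDATIONS).items():
    print(f"{name}: {gb} Gb")
```

If the lncRNA library is run as PE150 instead of 2 x 76 bp, `to_gb(100e6, 150, 2)` and `to_gb(200e6, 150, 2)` give 30 Gb and 60 Gb for the same 100-200M fragments.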

 

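One more thing: the G here counts bases (gigabases, Gb), which is not the same as the file size you see on disk (gigabytes, GB).

The files a sequencing company delivers are usually compressed FASTQ, which holds each read's ID, its bases, and a quality score for every base.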
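To make the three parts of a FASTQ record concrete, here is a small sketch that splits one record into ID, bases, and per-base quality. The record itself is made up, and the quality decoding assumes the Phred+33 encoding, which is the common case but should be checked for older data.

```python
# A made-up FASTQ record: header, bases, separator, quality string.
RECORD = """@read_1 hypothetical description
ACGTN
+
IIII#
"""


def parse_fastq_record(text: str):
    """Split one 4-line FASTQ record into (read ID, bases, Phred scores)."""
    header, seq, plus, qual = text.strip().split("\n")
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a 4-line FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("bases and quality string differ in length")
    phred = [ord(c) - 33 for c in qual]  # assumption: Phred+33 encoding
    return header[1:].split()[0], seq, phred


read_id, seq, phred = parse_fastq_record(RECORD)
print(read_id, seq, phred)
```

Because every base carries a quality character and every read carries a header, the file size in bytes is not a direct measure of the number of bases.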

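Xiaoha looked at the file size and felt the data volume was too low, but that is only a guess based on experience. To know exactly how much was sequenced, run FastQC or RSeQC.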
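For a quick check without installing anything, the read and base counts can also be tallied directly from the file. This is a minimal sketch, not a replacement for FastQC: it assumes a gzip-compressed FASTQ with exactly four lines per record, and the file names in the usage comment are placeholders.

```python
import gzip


def count_fastq(path: str):
    """Return (n_reads, n_bases) for a gzip-compressed, 4-line-per-record FASTQ."""
    n_reads = 0
    n_bases = 0
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record holds the bases
                n_reads += 1
                n_bases += len(line.strip())
    return n_reads, n_bases


# Hypothetical usage for a paired-end sample (file names are placeholders):
# r1_reads, r1_bases = count_fastq("sample_R1.fastq.gz")
# r2_reads, r2_bases = count_fastq("sample_R2.fastq.gz")
# print((r1_bases + r2_bases) / 1e9, "Gb")
```

For paired-end data, the bases from both files are added together to get the Gb figure that corresponds to the sales quote.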
