Hi-C data analysis tools and papers
Full text available at:
https://github.com/mdozmorov/HiC_tools
Tools are sorted by publication date, newest on top. Unpublished tools are listed at the end of each section. Related repositories: HiC_data, scHiC_notes. Please, open an issue or a pull request to add other information, tools, or user experience.
Table of contents
Pipelines for Hi-C data processing
Capture-C
HiChIP
4C
Resolution improvement
Simulation
Normalization of Hi-C data
CNV-aware normalization
Reproducibility and QC of Hi-C data
Loop callers
Capture-C peaks
Differential interactions
TAD callers
Differential TAD analysis
Prediction of 3D features
SNP-oriented Hi-C analysis
CNV and Structural variant detection
Visualization
De novo genome scaffolding
3D modeling
Papers
Micro-C
Multi-way interactions
Methodological Reviews
General Reviews
Technology
Normalization
TAD detection
Hi-C prediction
Spectral clustering
URLs
Pipelines
Juicer - Java full pipeline to convert raw reads into Hi-C maps, visualized in Juicebox. Call domains, loops, CTCF binding sites. .hic file format for storing multi-resolution Hi-C data. https://github.com/theaidenlab/juicebox/wiki/Download
Durand, Neva C., Muhammad S. Shamim, Ido Machol, Suhas S. P. Rao, Miriam H. Huntley, Eric S. Lander, and Erez Lieberman Aiden. “Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments.” Cell Systems 3, no. 1 (July 2016)
Rao, Suhas S. P., Miriam H. Huntley, Neva C. Durand, Elena K. Stamenova, Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, et al. “A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping.” Cell 159, no. 7 (December 18, 2014) - Juicer analysis example. TADs (contact domains) are defined by frequent interactions and enriched in CTCF and cohesin members. Five subcompartment types; A1 and A2 are enriched in genes. Chr 19 contains a sixth pattern, B4. Subcompartments differ in histone modification marks. TADs are preserved across cell types, yet differences between GM12878 and IMR90 were detected. Domain boundaries are detected by scanning the contact map image (Arrowhead). Refs to the original paper.
HiC-Pro - Python and command line-based optimized and flexible pipeline for Hi-C data processing, https://github.com/nservant/HiC-Pro
Servant, Nicolas, Nelle Varoquaux, Bryan R. Lajoie, Eric Viara, Chong-Jian Chen, Jean-Philippe Vert, Edith Heard, Job Dekker, and Emmanuel Barillot. “HiC-Pro: An Optimized and Flexible Pipeline for Hi-C Data Processing.” Genome Biology 16 (December 1, 2015) - HiC pipeline, references to other pipelines, comparison. From raw reads to normalized matrices. Normalization methods, fast and memory-efficient implementation of iterative correction normalization (ICE). Data format. Using genotyping information to phase contact maps.
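The ICE step mentioned above can be sketched in a few lines of pure Python. This is a minimal, unoptimized illustration assuming a small dense symmetric matrix stored as a list of lists; the function name `ice_balance` and its parameters are hypothetical, and it is not HiC-Pro's or iced's implementation (which also filter low-coverage bins and operate on sparse matrices). Each iteration divides every entry by the coverage factors of its two bins until all non-empty rows have roughly the same sum.

```python
def ice_balance(matrix, max_iter=200, tol=1e-6):
    """Iteratively correct a symmetric contact matrix so that every
    non-empty row (and, by symmetry, column) has the same sum.
    Returns the balanced matrix and the cumulative per-bin biases."""
    n = len(matrix)
    m = [list(map(float, row)) for row in matrix]  # work on a float copy
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # empty matrix: nothing to balance
        mean_sum = sum(nonzero) / len(nonzero)
        # Per-bin correction factor; empty bins are left untouched.
        factors = [s / mean_sum if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break  # all rows already have (nearly) equal coverage
    return m, bias
```

For example, `ice_balance([[10, 5, 1], [5, 20, 4], [1, 4, 8]])` returns a matrix whose three row sums agree to within the tolerance, plus the bias vector that would reproduce the raw counts.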
HiCExplorer - set of programs to process, normalize, analyze and visualize Hi-C data, Python, .cool format, conversion utilities. https://hicexplorer.readthedocs.io/en/latest/, https://github.com/deeptools/HiCExplorer/
Ramírez, Fidel, Vivek Bhardwaj, Laura Arrigoni, Kin Chung Lam, Björn A. Grüning, José Villaveces, Bianca Habermann, Asifa Akhtar, and Thomas Manke. “High-Resolution TADs Reveal DNA Sequences Underlying Genome Organization in Flies.” Nature Communications 9, no. 1 (December 2018)
Galaxy HiCExplorer - a web server for Hi-C data preprocessing, QC, visualization. Web interface, https://hicexplorer.usegalaxy.eu/, Docker container, https://github.com/deeptools/docker-galaxy-hicexplorer
Wolff, Joachim, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, and Björn A. Grüning. “Galaxy HiCExplorer: A Web Server for Reproducible Hi-C Data Analysis, Quality Control and Visualization.” Nucleic Acids Research 46, no. W1 (July 2, 2018)
HiCUP - Perl-based pipeline, alignment only, output - BAM files. http://www.bioinformatics.babraham.ac.uk/projects/hicup/
Wingett, Steven, Philip Ewels, Mayra Furlan-Magaril, Takashi Nagano, Stefan Schoenfelder, Peter Fraser, and Simon Andrews. “HiCUP: Pipeline for Mapping and Processing Hi-C Data.” F1000Research 4 (2015) - HiCUP pipeline, alignment only, removes artifacts (religations, duplicate reads) creating BAM files. Details about Hi-C sequencing artifacts. Used in conjunction with other pipelines.
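Duplicate removal of the kind HiCUP performs can be illustrated with a simplified sketch, assuming each read pair is a dict with hypothetical keys `chr1`, `pos1`, `strand1`, `chr2`, `pos2`, `strand2`. Pairs whose two mapped ends (position and strand) coincide, in either order, are treated as PCR duplicates and only the first copy is kept. HiCUP's actual filtering (including religation and other artifact classes) is more involved; `remove_duplicates` is a hypothetical name.

```python
def remove_duplicates(pairs):
    """Keep the first read pair for each unique combination of mapped
    positions and strands; later copies are treated as PCR duplicates."""
    seen = set()
    unique = []
    for pair in pairs:
        end1 = (pair["chr1"], pair["pos1"], pair["strand1"])
        end2 = (pair["chr2"], pair["pos2"], pair["strand2"])
        key = tuple(sorted((end1, end2)))  # same key regardless of end order
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique
```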
FAN-C - Python pipeline for Hi-C processing. Input - raw FASTQ (aligned using BWA or Bowtie2, artifact filtering) or pre-aligned BAMs. KR or ICE normalization. Analysis and visualization (contact distance decay, A/B compartment detection, TAD/loop detection, average TADs, triangular heatmaps, comparison of two heatmaps). Automatic or modular. Compatible with .cool and .hic formats. https://github.com/vaquerizaslab/fanc
Kruse, Kai, Clemens B Hug, and Juan M Vaquerizas. “FAN-C: A Feature-Rich Framework for the Analysis and Visualisation of C Data.” Preprint. Genomics, February 4, 2020
GITAR - full Hi-C pre-processing, normalization, TAD detection, and visualization. Python scripts wrapping other tools. Table 1 summarizes the functionality of existing tools. https://www.genomegitar.org/, https://github.com/Zhong-Lab-UCSD/HiCtool
Calandrelli, Riccardo, Qiuyang Wu, Jihong Guan, and Sheng Zhong. “GITAR: An Open Source Tool for Analysis and Visualization of Hi-C Data.” Genomics, Proteomics & Bioinformatics 16, no. 5 (2018): 365–72. https://doi.org/10.1016/j.gpb.2018.06.006.
HiCdat - Hi-C processing pipeline and downstream analysis/visualization. Analyses: normalization, correlation, visualization, comparison, distance decay, PCA, interaction enrichment test, epigenomic enrichment/depletion. https://github.com/MWSchmid/HiCdat
Schmid, Marc W., Stefan Grob, and Ueli Grossniklaus. “HiCdat: A Fast and Easy-to-Use Hi-C Data Analysis Tool.” BMC Bioinformatics 16 (September 3, 2015)
HiC-bench - complete pipeline for Hi-C data analysis. https://github.com/NYU-BFX/hic-bench
Lazaris, Charalampos, Stephen Kelly, Panagiotis Ntziachristos, Iannis Aifantis, and Aristotelis Tsirigos. “HiC-Bench: Comprehensive and Reproducible Hi-C Data Analysis Designed for Parameter Exploration and Benchmarking.” BMC Genomics 18, no. 1 (December 2017)
TADbit - a complete Python library covering all steps to analyze, model and explore 3C-based data. With TADbit, the user can map FASTQ files to obtain raw binned interaction matrices (Hi-C-like matrices), normalize and correct interaction matrices, identify and compare Topologically Associating Domains (TADs), build 3D models from the interaction matrices, and extract structural properties from the models. TADbit is complemented by TADkit for visualizing 3D models. https://github.com/3DGenomes/tadbit
Serra, François, Davide Baù, Mike Goodstadt, David Castillo, Guillaume J. Filion, and Marc A. Marti-Renom. “Automatic Analysis and 3D-Modelling of Hi-C Data Using TADbit Reveals Structural Features of the Fly Chromatin Colors.” PLoS Computational Biology 13, no. 7 (July 2017)
HiC_Pipeline - Python-based pipeline for mapping, filtering, binning, and ICE-correcting Hi-C data, from raw reads (.sra, .fastq) to contact matrices. Also converts matrices to sparse format and performs QC. https://github.com/XiaoTaoWang/HiC_pipeline
HiCpipe - an efficient Hi-C data processing pipeline. Based on Juicer and HiC-Pro, it combines the advantages of both pipelines. HiCpipe is much faster than Juicer and HiC-Pro and can output multiple features of Hi-C maps. https://github.com/ChenFengling/HiCpipe
ENCODE project Data Production and Processing Standard of the Hi-C Mapping Center, PDF
cworld - Perl cworld module and collection of utility/analysis scripts for C data (3C, 4C, 5C, Hi-C). https://github.com/dekkerlab/cworld-dekker
my5C - web-based tools for well-documented analysis and visualization of 5C data, http://my5c.umassmed.edu/
nf-core-hic - Analysis of Chromosome Conformation Capture data (Hi-C and more), Nextflow pipeline. https://github.com/nservant/nf-core-hic. Also, nf-core/hic
distiller-nf - modular Hi-C mapping pipeline for reproducible data analysis, implemented in Nextflow. Alignment, filtering, aggregating Hi-C matrices. https://github.com/mirnylab/distiller-nf
4D Nucleome Hi-C Processing Pipeline, set of scripts wrapped in a Docker image. Works with .hic and .cool files. Overview, https://github.com/4dn-dcic/docker-4dn-hic
Abdennur, Nezar, and Leonid Mirny. “Cooler: Scalable Storage for Hi-C Data and Other Genomically-Labeled Arrays.” BioRxiv, February 22, 2019.
cooler file format for storing Hi-C matrices, sparse, hierarchical, multi-resolution. cooler Python package for data loading, aggregation, merging, normalization (balancing), viewing, exporting data. Together with the text-based “pairs” format and .hic, cooler is accepted by the 4D Nucleome Consortium DAC. https://github.com/mirnylab/cooler, https://cooler.readthedocs.io/en/latest/
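Because cooler stores only non-zero upper-triangle pixels as (bin1_id, bin2_id, count) records, building a coarser resolution amounts to integer-dividing bin IDs and summing counts. The sketch below illustrates that idea in pure Python; it is not the cooler API (the `cooler coarsen` and `cooler zoomify` commands do this for real files), and it assumes a single chromosome whose bins start at ID 0, since genome-wide coarsening needs per-chromosome offsets to avoid merging bins across chromosome boundaries. `coarsen_pixels` is a hypothetical name.

```python
from collections import defaultdict


def coarsen_pixels(pixels, factor):
    """Aggregate upper-triangle pixels (bin1_id, bin2_id, count) into bins
    `factor` times larger, e.g. 10 kb -> 50 kb with factor=5.
    bin1 <= bin2 implies bin1 // factor <= bin2 // factor, so the
    output stays upper-triangular."""
    merged = defaultdict(int)
    for bin1, bin2, count in pixels:
        merged[(bin1 // factor, bin2 // factor)] += count
    return sorted((b1, b2, c) for (b1, b2), c in merged.items())
```

For example, coarsening `[(0, 0, 3), (0, 1, 2), (1, 1, 4), (1, 2, 1), (2, 3, 5), (3, 3, 1)]` by a factor of 2 gives `[(0, 0, 9), (0, 1, 1), (1, 1, 6)]`, preserving the total of 16 contacts.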
cooltools - tools to work with .cool files, Documentation, https://github.com/mirnylab/cooltools
hiclib - Python tools to QC, map, normalize, filter and analyze Hi-C data, https://bitbucket.org/mirnylab/hiclib
hic2cool - Lightweight converter between hic and cool contact matrices. https://github.com/4dn-dcic/hic2cool
pairtools - tools for low-level processing of mapped Hi-C paired reads. https://github.com/mirnylab/pairtools. Documentation
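As an illustration of what low-level pairs processing involves, the sketch below bins contacts from lines in the text-based .pairs format into bin-pair counts. It assumes the first seven whitespace-separated columns are readID, chr1, pos1, chr2, pos2, strand1, strand2 and that header lines start with `#`; the function `bin_pairs` is hypothetical and is not part of pairtools.

```python
from collections import Counter


def bin_pairs(lines, resolution):
    """Count contacts per bin pair from .pairs-style lines.
    Assumed columns: readID chr1 pos1 chr2 pos2 strand1 strand2."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.split()
        chrom1, pos1 = fields[1], int(fields[2])
        chrom2, pos2 = fields[3], int(fields[4])
        key1 = (chrom1, pos1 // resolution)
        key2 = (chrom2, pos2 // resolution)
        # Store each contact once, in a canonical (upper-triangle) order.
        counts[tuple(sorted((key1, key2)))] += 1
    return counts
```

With a 10 kb resolution, two reads linking chr1:1,500 to chr1:12,000 and chr1:1,900 to chr1:15,000 both land in the bin pair `(("chr1", 0), ("chr1", 1))`, which then has a count of 2.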
Capture-C
CaptureCompendium - all-in-one toolkit for the design, analysis and presentation of 3C experiments, combines oligonucleotide design Capsequm2, sequence mapping and extraction CCseqBasic, statistical data presentation and distribution CaptureCompare with Peaky integration, CaptureSee. Allows for multi-way interactions (Tri-C). Overview of previous tools that perform individual steps. http://userweb.molbiol.ox.ac.uk/public/telenius/CaptureCompendium/
Telenius, Jelena M., Damien J. Downes, Martin Sergeant, A. Marieke Oudelaar, Simon McGowan, Jon Kerry, Lars L.P. Hanssen, et al. “CaptureCompendium: A Comprehensive Toolkit for 3C Analysis.” Preprint. Bioinformatics, February 18, 2020.
GOPHER - probe design for Capture Hi-C. All, or selected, promoters, or around GWAS hits. https://github.com/TheJacksonLaboratory/Gopher
Hansen, Peter, Salaheddine Ali, Hannah Blau, Daniel Danis, Jochen Hecht, Uwe Kornak, Darío G. Lupiáñez, Stefan Mundlos, Robin Steinhaus, and Peter N. Robinson. “GOPHER: Generator Of Probes for Capture Hi-C Experiments at High Resolution.” BMC Genomics 20, no. 1 (December 2019).
capC-MAP - Capture-C analysis pipeline. Python and C++, run through a configuration file. Outputs bedGraph. Compared with HiC-Pro, better detects PCR duplicates, identifies more interactions. Normalization tuned for Capture-C data. https://github.com/cbrackley/capC-MAP, https://capc-map.readthedocs.io/
Buckle, Adam, Nick Gilbert, Davide Marenduzzo, and Chris A Brackley. “CapC-MAP: A Software Package for Analysis of Capture-C Data.” Preprint. Genomics, October 30, 2018.
HiChIP
CID - Chromatin Interaction Discovery, calls chromatin interactions from ChIA-PET. Outperforms the ChIA-PET2 and MANGO pipelines, calls more peaks than HICCUPS and hichipper. Java implementation, https://groups.csail.mit.edu/cgs/gem/cid/
Guo, Yuchun, Konstantin Krismer, Michael Closser, Hynek Wichterle, and David K Gifford. “High Resolution Discovery of Chromatin Interactions.” Nucleic Acids Research, February 14, 2019.
HiChIP-Peak - HiChIP peak caller, focus on peaks at re-ligation sites. Peak filtering, then negative binomial model. Differential peak analysis similar to DiffBind. https://github.com/ChenfuShi/HiChIP_peaks
Shi, Chenfu, Magnus Rattray, and Gisela Orozco. “HiChIP-Peaks: A HiChIP Peak Calling Algorithm.” Preprint. Bioinformatics, June 27, 2019.
4C
pipe4C - 4C-seq processing pipeline, R-based, https://github.com/deLaatLab/pipe4C
Krijger, Peter H.L., Geert Geeven, Valerio Bianchi, Catharina R.E. Hilvering, and Wouter de Laat. “4C-Seq from Beginning to End: A Detailed Protocol for Sample Preparation and Data Analysis.” Methods 170 (January 2020)
peakC - an R package for non-parametric peak calling in 4C/Capture-C/PCHiC data. https://github.com/deWitLab/peakC
Geeven, Geert, Hans Teunissen, Wouter de Laat, and Elzo de Wit. “PeakC: A Flexible, Non-Parametric Peak Calling Package for 4C and Capture-C Data.” Nucleic Acids Research 46, no. 15 (September 6, 2018)
4Cseqpipe processing pipeline and a genome-wide 4C primer database, http://compgenomics.weizmann.ac.il/tanay/?page_id=367/
Werken, Harmen J. G. van de, Gilad Landan, Sjoerd J. B. Holwerda, Michael Hoichman, Petra Klous, Ran Chachik, Erik Splinter, et al. “Robust 4C-Seq Data Analysis to Screen for Regulatory DNA Interactions.” Nature Methods 9, no. 10 (October 2012) - 4C technology paper. Two different 4bp cutters to increase resolution. Investigation of beta-globin locus, interchromosomal interactions.
Resolution improvement
HiCSR - enhancement of Hi-C contact maps using a Generative Adversarial Network trained to optimize a custom loss function (weighted adversarial loss, pixel-wise L1 loss, and a feature reconstruction loss). Here, an increase in resolution refers to recovering additional Hi-C contacts, “saturating” downsampled and noisy Hi-C matrices, not increasing the number of pixels. Representation learning with an autoencoder (several convolutional layers and skip connections), which is then used by the generator to create new matrices, with a discriminator classifying them as fake or real. Compared with HiCPlus, HiCNN, hicGAN, DeepHiC. Reproducibility is better by four metrics. Python3 PyTorch implementation https://github.com/PSI-Lab/HiCSR
Dimmick, Michael C., Leo J. Lee, and Brendan J. Frey. “HiCSR: A Hi-C Super-Resolution Framework for Producing Highly Realistic Contact Maps.” Preprint. Genomics, February 25, 2020.
DeepHiC - a generative adversarial network (GAN) for enhancing Hi-C data. Does not change the bin size; enhances the content of Hi-C data. Reconstructs the content from ~1% of the original data. Outperforms Boost-HiC, HiCPlus, HiCNN. Online tool: http://sysomics.com/deephic/, code: https://github.com/omegahh/DeepHiC
Hong, Hao, Shuai Jiang, Hao Li, Cheng Quan, Chenghui Zhao, Ruijiang Li, Wanying Li, et al. “DeepHiC: A Generative Adversarial Network for Enhancing Hi-C Data Resolution.” Preprint. Bioinformatics, July 29, 2019.
hicGAN - improving resolution (saturation) of Hi-C data using Generative Adversarial Networks. Generator - five inner residual blocks to fight vanishing gradients (each block has two convolutional layers and batch normalization) and an outer skip connection. The discriminator has three convolutional blocks. Evaluation metrics: MSE, signal-to-noise ratio, structural similarity index, chromatin loop score. Compared against HiCPlus. Python, TensorFlow implementation
Liu, Qiao, Hairong Lv, and Rui Jiang. “HicGAN Infers Super Resolution Hi-C Data with Generative Adversarial Networks.” Bioinformatics 35, no. 14 (July 15, 2019)
HiCNN - a computational method for resolution enhancement. A modification of the HiCPlus approach, using very deep (54 layers, five types of layers) convolutional neural network. A Hi-C matrix of regular resolution is transformed into the high-resolution but very sparse matrix, HiCNN predicts the missing values. Pearson and MSE evaluation metrics, overlap of Fit-Hi-C-detected significant interactions - perform similar or slightly better than HiCPlus. PyTorch implementation. http://dna.cs.miami.edu/HiCNN/
Liu, Tong, and Zheng Wang. “HiCNN: A Very Deep Convolutional Neural Network to Better Enhance the Resolution of Hi-C Data.” Edited by John Hancock. Bioinformatics, April 9, 2019
Boost-HiC - infer fine-resolution contact frequencies in Hi-C data, performs well even on 0.1% of the raw data. TAD boundaries remain. Better than HiCPlus. It can be used for differential analysis (comparison) of two Hi-C maps. https://github.com/LeopoldC/Boost-HiC
Carron, Leopold, Jean-baptiste Morlot, Vincent Matthys, Annick Lesne, and Julien Mozziconacci. “Boost-HiC: Computational Enhancement of Long-Range Contacts in Chromosomal Contact Maps,” November 18, 2018.
mHi-C - recovering alignment of multi-mapped reads in Hi-C data. Generative model to estimate the probability that a multi-mapped read originates from each of its candidate bin pairs. Reproducibility of contact matrices (stratum-adjusted correlation), reproducibility and number of significant interactions are improved. Novel interactions detected. Enrichment of TAD boundaries in LINE and SINE repetitive elements. Multi-mapping is not sensitive to trimming. Read filtering strategy (Figure 1; supplementary figures are very visual). https://github.com/keleslab/mHiC
Zheng, Ye, Ferhat Ay, and Sunduz Keles. “Generative Modeling of Multi-Mapping Reads with MHi-C Advances Analysis of High Throughput Genome-Wide Conformation Capture Studies,” October 3, 2018.
HIFI - Hi-C Interaction Frequency Inference for restriction fragment-resolution analysis of Hi-C data. Sparsity is resolved by using dependencies between neighboring restriction fragments, with Markov Random Fields performing the best. Better resolves TADs and sub-TADs, significant interactions. CTCF, RAD21, SMC3, ZNF143 are enriched around TAD boundaries. Matrices normalized for fragment-specific biases. https://github.com/BlanchetteLab/HIFI
Cameron, Christopher JF, Josée Dostie, and Mathieu Blanchette. “Estimating DNA-DNA Interaction Frequency from Hi-C Data at Restriction-Fragment Resolution.” Preprint. Bioinformatics, July 25, 2018.
HiCPlus - increasing resolution of Hi-C data using a convolutional neural network with mean squared error as the loss function. Essentially, smooths parts of the Hi-C image, then bins them into smaller parts. Performs better than bilinear/bicubic smoothing. https://github.com/zhangyan32/HiCPlus
Zhang, Yan, Lin An, Ming Hu, Jijun Tang, and Feng Yue. “HiCPlus: Resolution Enhancement of Hi-C Interaction Heatmap,” March 1, 2017.
Simulation
FreeHi-C v.2.0 - simulation of realistic Hi-C matrices with user- or data-driven spike-ins. Spike-ins are introduced on read-level and converted to interaction frequency level. Benchmark of HiCcompare, multiHiCcompare, diffHiC, and Selfish. Assessment of FDR, power, significance order, PRC and AUROC, genomic properties. GM12878 and A549 replicates of experimental Hi-C data. Three simulation settings with varying background distribution of interaction frequencies, spike-in proportions, sequencing depth. Figure 5 - summary of performances for all methods and comparison types. Subjective top performers: multiHiCcompare, HiCcompare, diffHiC, Selfish.
Zheng, Ye, Peigen Zhou, and Sündüz Keleş. “FreeHi-C Spike-in Simulations for Benchmarking Differential Chromatin Interaction Detection.” Methods, July 2020
FreeHi-C - Hi-C data simulation based on properties of experimental Hi-C data. Preserves A/B compartments, TADs, the correlation between replicates (HiCRep), and significant interactions; improves power to detect differential interactions. Robust to sequencing depth changes. Tested on replicates of GM12878, A549 human cancer cells, and the malaria parasite P. falciparum. Compared with Sim3C, which performed poorly. All simulated data are at https://zenodo.org/record/3345896. Python3 implementation https://github.com/keleslab/FreeHiC
Zheng, Ye, and Sündüz Keleş. “FreeHi-C Simulates High-Fidelity Hi-C Data for Benchmarking and Data Augmentation.” Nature Methods 17, no. 1 (January 2020)
Normalization
HiCorr - a method for correcting known (mappability, GC content) and unknown (visibility) biases in Hi-C maps (multiplicative effects, Methods). Easy Hi-C protocol allowing for low-input (~100K cells) Hi-C (in vivo HindIII digestion, in situ proximity ligation, DpnII digestion after lysis and reverse crosslinking, Methods). HiCorr outputs ratio matrices representing enrichment of Hi-C signal, hence loops can be easily extracted. Recovers 65% of HICCUPS loops and more. Chromatin loops are better marks of cell identity than compartments and outperform eQTLs in defining neurological GWAS target genes. Human iPSCs, neural progenitors (NPCs), neurons, fetal cerebellum, adult temporal cortex, data from other studies. https://github.com/JinLabBioinfo/HiCorr
Lu, Leina, Xiaoxiao Liu, Wei-Kai Huang, Paola Giusti-Rodríguez, Jian Cui, Shanshan Zhang, Wanying Xu, et al. “Robust Hi-C Maps of Enhancer-Promoter Interactions Reveal the Function of Non-Coding Genome in Neural Development and Diseases.” Molecular Cell, June 2020
multiHiCcompare - joint normalization of multiple Hi-C datasets using cyclic loess regression through pairs of MD plots (minus-distance). Data-driven normalization accounting for the between-dataset biases. Per-distance edgeR-based testing of significant interactions. https://bioconductor.org/packages/multiHiCcompare/
Stansfield, John C, Kellen G Cresswell, and Mikhail G Dozmorov. “MultiHiCcompare: Joint Normalization and Comparative Analysis of Complex Hi-C Experiments.” Edited by Inanc Birol. Bioinformatics, January 22, 2019.
Binless - a resolution-agnostic normalization method that adapts to the quality and quantity of available data, to detect significant interactions and differences. Negative binomial count regression framework, adapted for ICE normalization. Fused lasso to smooth neighboring signals. TADbit for data processing, details of read filtering. https://github.com/3DGenomes/binless
Spill, Yannick G., David Castillo, Enrique Vidal, and Marc A. Marti-Renom. “Binless Normalization of Hi-C Data Provides Significant Interaction and Difference Detection Independent of Resolution.” Nature Communications 10, no. 1 (26 2019)
HiCcompare - joint normalization of two Hi-C datasets using loess regression through an MD plot (minus-distance). Data-driven normalization accounting for the between-dataset biases. Per-distance permutation testing of significant interactions. https://bioconductor.org/packages/HiCcompare/
Stansfield, John C., Kellen G. Cresswell, Vladimir I. Vladimirov, and Mikhail G. Dozmorov. “HiCcompare: An R-Package for Joint Normalization and Comparison of HI-C Datasets.” BMC Bioinformatics 19, no. 1 (December 2018).
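The MD-plot idea behind HiCcompare can be sketched as follows. For each bin pair present in both datasets, D is the distance between the bins and M is the log2 ratio of the two interaction frequencies; a distance-dependent trend in M is then removed. This is a simplified sketch, not the HiCcompare algorithm: it swaps the loess fit for the median M at each distance, assumes matrices are given as hypothetical `{(i, j): count}` dicts, and `md_normalize` is a hypothetical name.

```python
import math
from collections import defaultdict
from statistics import median


def md_normalize(counts1, counts2):
    """Compute (D, M) for bin pairs observed in both datasets, where
    D = |j - i| (distance in bins) and M = log2(IF2 / IF1), then remove
    the per-distance trend. Simplification: the trend is the median M at
    each distance, not the loess fit that HiCcompare uses."""
    md = {}
    by_distance = defaultdict(list)
    for (i, j), if1 in counts1.items():
        if2 = counts2.get((i, j), 0)
        if if1 <= 0 or if2 <= 0:
            continue  # M is undefined when either count is zero
        d = abs(j - i)
        m = math.log2(if2 / if1)
        md[(i, j)] = (d, m)
        by_distance[d].append(m)
    trend = {d: median(ms) for d, ms in by_distance.items()}
    return {key: (d, m - trend[d]) for key, (d, m) in md.items()}
```

If the second dataset is simply the first one sequenced twice as deeply, every M equals 1 and all adjusted values become 0; only pixels that deviate from their distance's trend keep a non-zero adjusted M.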
HiFive - handling and normalization of pre-aligned Hi-C and 5C data, https://www.taylorlab.org/software/hifive/
Sauria, Michael EG, Jennifer E. Phillips-Cremins, Victor G. Corces, and James Taylor. “HiFive: A Tool Suite for Easy and Efficient HiC and 5C Data Analysis.” Genome Biology 16, no. 1 (December 2015). - HiFive - post-processing of aligned Hi-C and 5C data, three normalization approaches: “Binning” - model-based Yaffe & Tanay’s method, “Express” - matrix-balancing approach, “Probability” - multiplicative probability model. Judging normalization quality by the correlation between matrices.
HiCNorm - removing known biases in Hi-C data (GC content, mappability, fragment length) via Poisson regression, http://www.people.fas.harvard.edu/~junliu/HiCNorm/
Hu, Ming, Ke Deng, Siddarth Selvaraj, Zhaohui Qin, Bing Ren, and Jun S. Liu. “HiCNorm: Removing Biases in Hi-C Data via Poisson Regression.” Bioinformatics (Oxford, England) 28, no. 23 (December 1, 2012) - Poisson normalization. Also tested negative binomial.
CNV-aware normalization
Hi-C data normalization considering CNVs. Extension of matrix-balancing algorithm to either retain the copy-number variation effect (LOIC) or remove them (CAIC). ICE itself can lead to misrepresentation of the contact probabilities between CNV regions. Estimating CNV directly from Hi-C data correcting for GC content, mappability, fragment length using Poisson regression. LOIC - the sum of contacts for a given genomic bin is proportional to CNV. CAIC - raw interaction counts are the product of a CNV bias matrix and the expected contact counts at a given genomic distance. Data, and cancer-hic-norm - Normalization of cancer Hi-C data, scripts for the manuscript. LOIC and CAIC methods are implemented in the iced Python package, https://github.com/hiclib/iced
Servant, Nicolas, Nelle Varoquaux, Edith Heard, Emmanuel Barillot, and Jean-Philippe Vert. “Effective Normalization for Copy Number Variation in Hi-C Data.” BMC Bioinformatics 19, no. 1 (September 6, 2018)
HiCapp - iterative correction-based caICB method to adjust for copy number variants in Hi-C data. Loess-like idea: the problem of removing biases across chromosomes is converted to the problem of minimizing the differences between the count-distance curves of different chromosomes. The method assumes that, in the absence of bias, genomic locus pairs at similar genomic distances on different chromosomes are equally represented in Hi-C maps. https://bitbucket.org/mthjwu/hicapp
Wu, Hua-Jun, and Franziska Michor. “A Computational Strategy to Adjust for Copy Number in Tumor Hi-C Data.” Bioinformatics (Oxford, England) 32, no. 24 (December 15, 2016)
OneD - CNV bias-correction method, addresses the problem of partial aneuploidy. Bin-centric counts are modeled using the negative binomial distribution, and its parameters are estimated using splines. A hidden Markov model is fit to infer the copy number for each bin. Each Hi-C matrix entry is corrected by dividing its value by the square root of the product of CNVs for the corresponding bins. Reproducibility score (eigenvector decomposition and comparison) to measure improvement in the similarity between replicated Hi-C data. https://github.com/qenvio/dryhic
Vidal, Enrique, François le Dily, Javier Quilez, Ralph Stadhouders, Yasmina Cuartero, Thomas Graf, Marc A Marti-Renom, Miguel Beato, and Guillaume J Filion. “OneD: Increasing Reproducibility of Hi-C Samples with Abnormal Karyotypes.” Nucleic Acids Research, January 31, 2018.
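The final OneD correction step described above is simple arithmetic and can be sketched directly. This sketch only covers the division step: it takes per-bin copy-number estimates as input (OneD infers them with a negative binomial model and an HMM, which is not reproduced here), assumes they are relative values with 1.0 as the baseline, and uses the hypothetical name `correct_for_copy_number`.

```python
import math


def correct_for_copy_number(matrix, copy_number):
    """Divide each Hi-C entry (i, j) by sqrt(cn[i] * cn[j]).
    `copy_number` holds per-bin copy-number estimates, assumed here to be
    relative values (1.0 = baseline)."""
    n = len(matrix)
    return [[matrix[i][j] / math.sqrt(copy_number[i] * copy_number[j])
             for j in range(n)]
            for i in range(n)]
```

For instance, if bin 1 is amplified fourfold relative to bin 0, the raw matrix `[[4, 8], [8, 16]]` becomes `[[4.0, 4.0], [4.0, 4.0]]`, removing the inflation caused purely by the extra copies.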
Reproducibility
IDR2D - Irreproducible Discovery Rate that identifies replicable interactions in ChIP-PET, HiChIP, and Hi-C data. Includes the original 1D IDR version (https://github.com/nboley/idr). Resolves multiple pairwise interactions.
Krismer, Konstantin, Yuchun Guo, and David K Gifford. “IDR2D Identifies Reproducible Genomic Interactions.” Preprint. Bioinformatics, July 3, 2019.
3DChromatin_ReplicateQC - Comparison of four Hi-C reproducibility assessment tools, HiCRep, GenomeDISCO, HiC-Spector, QuASAR-Rep. Tested the effects of noise, sparsity, resolution. Spearman correlation doesn’t work well. All tools performed similarly, with performance degrading as expected with increasing noise and sparsity. QuASAR has a QC tool measuring the level of noise. https://github.com/kundajelab/3DChromatin_ReplicateQC
Yardimci, Galip, Hakan Ozadam, Michael E.G. Sauria, Oana Ursu, Koon-Kiu Yan, Tao Yang, Abhijit Chakraborty, et al. “Measuring the Reproducibility and Quality of Hi-C Data,” Genome Biology, March 19, 2019
HiCRep - similarity assessment using the generalized Cochran-Mantel-Haenszel statistic M2. Spearman/Pearson correlation doesn’t work. Two-step procedure: smooth the matrix, then compute the CMH-based statistic. Essentially, split the data into distance strata, compute Pearson correlation on each stratum, and summarize. Simple and well-thought-out stats. Methods: Hi-C datasets with replicates, including 11 ENCODE datasets. R package https://github.com/MonkeyLB/hicrep, and Python implementation
Yang, Tao, Feipeng Zhang, Galip Gurkan Yardimci, Ross C Hardison, William Stafford Noble, Feng Yue, and Qunhua Li. “HiCRep: Assessing the Reproducibility of Hi-C Data Using a Stratum-Adjusted Correlation Coefficient.” Genome Research, August 30, 2017
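The stratification idea behind HiCRep can be sketched in pure Python: correlate the two matrices separately along each diagonal (one genomic distance per stratum) and combine the per-stratum correlations into a weighted average. This is a simplified stand-in, not the hicrep implementation: the published stratum-adjusted correlation coefficient also smooths the matrices with a 2D mean filter first and weights each stratum by its size times a rank-based variance term, whereas this sketch skips smoothing and weights strata only by the number of entries. Function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant stratum has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def stratified_correlation(m1, m2, max_distance):
    """Correlate two square matrices diagonal by diagonal (distance strata)
    and combine per-stratum correlations with weights = stratum size."""
    n = len(m1)
    total, weight = 0.0, 0
    for d in range(max_distance + 1):
        x = [m1[i][i + d] for i in range(n - d)]
        y = [m2[i][i + d] for i in range(n - d)]
        if len(x) < 2:
            continue
        r = pearson(x, y)
        if r is None:
            continue
        total += len(x) * r
        weight += len(x)
    return total / weight if weight else float("nan")
```

Because each diagonal is correlated separately, rescaling one matrix by any positive distance-dependent factor leaves the score at 1, which is exactly the distance effect that makes a plain genome-wide Pearson correlation misleading.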
QuASAR - Hi-C quality and reproducibility measure using spatial consistency between local and regional signals. Finds the maximum useful resolution by comparing quality and replicate scores across replicates. Part of the HiFive pipeline
Sauria, Michael EG, and James Taylor. “QuASAR: Quality Assessment of Spatial Arrangement Reproducibility in Hi-C Data.” BioRxiv, November 14, 2017.
HiC-Spector - reproducibility metric to quantify the similarity between contact maps using spectral decomposition. Decomposes Laplacian matrices and sums the Euclidean distances between eigenvectors. https://github.com/gersteinlab/HiC-spector
Yan, Koon-Kiu, Galip Gürkan Yardimci, Chengfei Yan, William S. Noble, and Mark Gerstein. “HiC-Spector: A Matrix Library for Spectral and Reproducibility Analysis of Hi-C Contact Maps.” Bioinformatics (Oxford, England) 33, no. 14 (July 15, 2017)
localtadsim - Analysis of TAD similarity using a variation of information (VI) metric as a local distance measure. 23 human Hi-C datasets, processed with HiC-Pro into 100kb matrices, Armatus to call TADs. Defining structurally similar and variable regions. Comparison with previous studies of genomic similarity. Cancer-normal comparison - regions containing pan-cancer genes are structurally conserved in normal-normal pairs, not in cancer-cancer pairs. https://github.com/Kingsford-Group/localtadsim
Sauerwald, Natalie, and Carl Kingsford. “Quantifying the Similarity of Topological Domains across Normal and Cancer Human Cell Types.” Bioinformatics (Oxford, England) 34, no. 13 (July 1, 2018)
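Variation of information between two TAD partitions of the same region can be computed from the joint distribution of domain labels, using VI = H(X|Y) + H(Y|X). The sketch below assumes each partition is given as a list with one domain label per bin; it computes a single global VI and does not reproduce localtadsim's search over local subintervals or any normalization it applies. `variation_of_information` is a hypothetical name.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """VI between two partitions of the same bins, each given as a list of
    domain labels (one label per bin). 0 means identical partitions."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # VI = -sum p(a,b) * [log(p(a,b)/p(a)) + log(p(a,b)/p(b))]
        vi -= p_ab * (math.log(p_ab / (count_a[a] / n))
                      + math.log(p_ab / (count_b[b] / n)))
    return vi
```

Splitting four bins into two TADs of two bins each versus one TAD covering all four gives VI = ln 2 (natural log), while identical partitions give 0.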
Loop callers
HiCExplorer’s hicDetectLoops for loop detection. Review and critique of HiCCUPS, HOMER, GOTHiC, cLoops, FastHiC. Models the distance dependence of chromatin interactions with a continuous negative binomial distribution, detects interaction counts with p-values smaller than a threshold, then filters the candidates. https://github.com/deeptools/HiCExplorer/
Wolff, Joachim, Rolf Backofen, and Björn Grüning. “Loop Detection Using Hi-C Data with HiCExplorer.” Preprint. Bioinformatics, March 6, 2020
Chromosight - loop and pattern detection (borders, FIREs, hairpins, and centromeres) in Hi-C maps. Takes in a single, whole-genome contact map in text-based bedGraph2d or binary cool format and ICE-normalizes it. Sliding window, pattern detection using Pearson correlation with the template, then a series of filters. Output - text-based. Outperforms HiCExplorer, HICCUPS, HOMER, cooltools (in order of decreasing F1). Tested on synthetic Hi-C data mimicking the S. cerevisiae genome, benchmark data at https://zenodo.org/record/3742095, Python3 code at https://github.com/koszullab/chromosight
Matthey-Doret, Cyril, Lyam Baudry, Axel Breuer, Rémi Montagne, Nadège Guiglielmoni, Vittore Scolari, Etienne Jean, et al. “Computer Vision for Pattern Detection in Chromosome Contact Maps.” Preprint. Bioinformatics, March 8, 2020.
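The core template-matching step (Pearson correlation between a pattern kernel and each window of the contact map) can be sketched in pure Python. This is a toy version under simplifying assumptions: a small dense matrix, no detrending, no sparse-aware correlation, and none of Chromosight's downstream filters; `match_template` and the example kernel are hypothetical.

```python
import math


def _pearson(x, y):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # flat window or flat template
    return sxy / math.sqrt(sxx * syy)


def match_template(matrix, template):
    """Slide `template` over `matrix`; return {(row, col): r} for every
    window (keyed by its top-left corner) where r is defined."""
    th, tw = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    scores = {}
    for i in range(len(matrix) - th + 1):
        for j in range(len(matrix[0]) - tw + 1):
            window = [matrix[i + a][j + b] for a in range(th) for b in range(tw)]
            r = _pearson(window, flat_t)
            if r is not None:
                scores[(i, j)] = r
    return scores
```

A loop-like kernel such as `[[0, 1, 0], [1, 4, 1], [0, 1, 0]]` scores r = 1 on a window that reproduces that shape exactly, and lower values on windows that only partially overlap it; taking local maxima of these scores above a threshold would give candidate pattern positions.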
SIP - loop caller using image analysis. Regional maxima-based, peaks called in a sliding window. Distance-normalized Hi-C matrices, image adjusted using Gaussian blur, contrast enhancement, White Top-Hat correction, identified peaks then filtered by peak enrichment, empirical FDR, loop decay. Comparison with HiCCUPS and cLoops callers. Robust to noise, sequencing depth, much faster, good agreement, improved detection rate. SIPMeta - average metaplots of loops on bias-corrected images for better representation. Java implementation, works with .hic and .cool files https://github.com/PouletAxel/SIP
Rowley, M. Jordan, Axel Poulet, Michael H. Nichols, Brianna J. Bixler, Adrian L. Sanborn, Elizabeth A. Brouhard, Karen Hermetz, et al. “Analysis of Hi-C Data Using SIP Effectively Identifies Loops in Organisms from C. Elegans to Mammals.” Genome Research 30, no. 3 (March 2020)
Mustache - loop detection from Hi-C and Micro-C maps. Scale-space theory, detection of blob-shaped objects in a multi-scale representation of contact maps, Gaussian kernels with increasing scales. Differences of adjacent Gaussians guide the search for local maxima. Series of filtering steps to minimize false positives. P-values of blobs are corrected for multiple testing. Applied to GM12878 and K562 Hi-C data and HFFc6 Micro-C data, at 5kb resolution. Compared with HiCCUPS, detects similar and more loops flanked by convergent CTCF, RAD21, SMC3; loops are confirmed by ChIA-PET and HiChIP data. Python3 tool, Conda/Docker wrapped, handles .hic/.cool files. https://github.com/ay-lab/mustache
Ardakany, Abbas Roayaei, Halil Tuvan Gezer, Stefano Lonardi, and Ferhat Ay. “Mustache: Multi-Scale Detection of Chromatin Loops from Hi-C and Micro-C Maps Using Scale-Space Representation.” Preprint. Bioinformatics, February 26, 2020.
FitHiC2 - protocol to install/run FitHiC Python3 tool/scripts. Fit of non-increasing cubic splines to distance-interaction frequency decay to identify significant interactions in individual matrices. Accounts for biases derived from KR (ICE, or other) normalization (HiCKRy). Works with fixed-bin- or restriction cut site resolution data. Overview of FitHiC algorithm, accounting for biases. Flexible input options, from HiC-Pro, Juicer, and other tools, validPairs file format. Post-processing to prioritize highly significant interactions supported by the nearby loci, and filter noisy detections. HTML report, flexible BED-derived output format, conversion to formats for WashU epigenome and UCSC browsers. Installable using conda, pip, GitHub. Comparable methods - HiCCUPS, HOMER, GOTHiC, HiC-DC, a brief description of each. Tested on three datasets. GitHub: https://github.com/ay-lab/fithic, Executable on Code Ocean: https://codeocean.com/capsule/4528858/tree/v3, Data: https://zenodo.org/record/3380589
Kaul, Arya, Sourya Bhattacharyya, and Ferhat Ay. “Identifying Statistically Significant Chromatin Contacts from Hi-C Data with FitHiC2.” Nature Protocols, January 24, 2020.
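The distance-decay prior idea can be sketched in plain Python. This is a simplified stand-in, not FitHiC2's code. Instead of a non-increasing cubic spline, it averages contacts at each distance (counting absent pairs as zeros) and enforces monotonicity with a running minimum. It then scores each pair with a binomial tail probability. Bias correction (KR/ICE) and the outlier-free refit are omitted, and all function names are hypothetical.

```python
import math
from collections import defaultdict

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        term = math.exp(log_pmf)
        total += term
        # Past the mode the terms only shrink, so stop once they are negligible
        if i > n * p and term < 1e-17 * total:
            break
    return min(total, 1.0)

def distance_prior(contacts, n_bins):
    """Contact probability per distance (in bins), made non-increasing.

    contacts: {(i, j): count} for intra-chromosomal pairs with i < j;
    zero-count pairs may be omitted, they still enter the denominator.
    """
    total = sum(contacts.values())
    sums = defaultdict(int)
    for (i, j), c in contacts.items():
        sums[j - i] += c
    prior, running = {}, 1.0
    for d in range(1, n_bins):
        if sums[d] > 0:
            # Mean over all n_bins - d pairs at this distance, as a share of all contacts
            running = min(running, sums[d] / ((n_bins - d) * total))
        prior[d] = running  # running minimum stands in for the monotone spline
    return prior, total

def significant_contacts(contacts, n_bins):
    """Binomial p-value of each observed pair under the distance prior."""
    prior, total = distance_prior(contacts, n_bins)
    return {(i, j): binom_sf(c, total, prior[j - i]) for (i, j), c in contacts.items()}
```

With the toy input used in the test, a pair that is strongly over-represented for its distance gets a much smaller p-value than a typical pair at the same distance.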
FIREcaller - an R package to detect frequently interacting regions (FIREs), based on cis interactions shorter than 200Kb. Applies within-sample (HiCNormCis) and cross-sample (quantile) normalization, converts the normalized FIRE counts to Z-scores, and calls bins with significant Z-scores as FIREs. Schmitt data. https://yunliweb.its.unc.edu/FIREcaller/
Crowley, Cheynna, Yuchen Yang, Yunjiang Qiu, Benxia Hu, Hyejung Won, Bing Ren, Ming Hu, and Yun Li. “FIREcaller: An R Package for Detecting Frequently Interacting Regions from Hi-C Data.” Preprint. Bioinformatics, April 29, 2019.
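The counting, quantile normalization, and Z-score steps can be sketched with NumPy as follows. This is not FIREcaller's R code. The HiCNormCis bias regression is skipped. The one-sided cutoff z > 1.645 (p < 0.05) and the function names are assumptions made for illustration.

```python
import numpy as np

def fire_counts(mat, bin_size, max_dist=200_000):
    """Per-bin sum of cis contacts with partners closer than max_dist (diagonal excluded)."""
    n = mat.shape[0]
    w = (max_dist - 1) // bin_size  # largest bin offset that stays below max_dist
    counts = np.zeros(n)
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        counts[i] = mat[i, lo:hi].sum() - mat[i, i]
    return counts

def quantile_normalize(x):
    """Give every column (sample) the same distribution: the mean of the sorted columns."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    mean_sorted = np.sort(x, axis=0).mean(axis=1)
    return mean_sorted[ranks]

def call_fires(counts_by_sample, z_cut=1.645):
    """Z-score each sample after quantile normalization; flag bins above z_cut."""
    qn = quantile_normalize(counts_by_sample)
    z = (qn - qn.mean(axis=0)) / qn.std(axis=0)
    return z, z > z_cut
```

Here `counts_by_sample` has one column per sample, so the same bin can be compared across samples after normalization.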
coolpup.py - Pile-up (aggregation, averaging) analysis of Hi-C data (.cool format) for visualizing and identifying chromatin loops, including in sparse datasets such as single-cell Hi-C. Visualization using the plotpup.py script. Scripts for the paper: https://github.com/Phlya/coolpuppy_paper/tree/master/Nagano, tool: https://github.com/Phlya/coolpuppy
Flyamer, Ilya M., Robert S. Illingworth, and Wendy A. Bickmore. “Coolpup.Py - a Versatile Tool to Perform Pile-up Analysis of Hi-C Data.” BioRxiv, January 1, 2019
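At its core, a pile-up is an average of fixed-size windows centred on a list of loop coordinates. The minimal NumPy sketch below shows only that averaging step. Normalization to expected or control pile-ups is omitted, and the function name is hypothetical rather than part of coolpup.py's API.

```python
import numpy as np

def pileup(mat, loops, flank):
    """Average the (2*flank+1) x (2*flank+1) windows centred on each loop pixel (i, j).

    Returns the averaged window and the number of loops that fit inside the matrix.
    """
    size = 2 * flank + 1
    acc = np.zeros((size, size))
    n_used = 0
    for i, j in loops:
        if min(i, j) - flank < 0 or max(i, j) + flank >= min(mat.shape):
            continue  # window would fall off the matrix edge
        acc += mat[i - flank:i + flank + 1, j - flank:j + flank + 1]
        n_used += 1
    return acc / max(n_used, 1), n_used
```

Averaging many sparse windows is what lets a loop signal emerge even when each individual window holds only a few contacts.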
cLoops - DBSCAN-based algorithm for the detection of chromatin loops in ChIA-PET, Hi-C, HiChIP, Trac-looping data. Local permutation-based estimation of statistical significance, several tests for enrichment over the background. Outperforms diffHiC, Fit-Hi-C, GOTHiC, HiCCUPS, HOMER. https://github.com/YaqiangCao/cLoops
Cao, Yaqiang, Xingwei Chen, Daosheng Ai, Zhaoxiong Chen, Guoyu Chen, Joseph McDermott, Yi Huang, and Jing-Dong J. Han. “Accurate Loop Calling for 3D Genomic Data with CLoops,” November 8, 2018.
FitHiChIP - significant peak caller for HiChIP and PLAC-seq data. Accounts for assay-specific biases as well as for the distance effect. Also detects differential 3D loops. Methods. https://github.com/ay-lab/FitHiChIP
Bhattacharyya, Sourya, Vivek Chandra, Pandurangan Vijayanand, and Ferhat Ay. “FitHiChIP: Identification of Significant Chromatin Contacts from HiChIP Data,” September 10, 2018.
StripeCaller - A toolkit for analyzing architectural stripes. Architectural stripes are created by extensive loading of cohesin near CTCF anchors, with the help of Nipbl and Rad21. Stripes show little overlap between B cells and ESCs. Architectural stripes are sites of tumor-inducing TOP2beta DNA breaks. ATP is required for loop extrusion and cohesin translocation, but not for maintenance; neither replication nor transcription is important for loop extrusion. Zebra algorithm for detecting architectural stripes (image analysis, math in Methods). Human lymphoblastoid cells, mouse ESCs, mouse B cells activated with LPS, and CH12 B lymphoma cells, wild-type or treated with hydroxyurea (blocks DNA replication), flavopiridol (blocks transcription, PolII elongation), or oligomycin (depletes ATP). Hi-C, ChIA-PET, ChIP-seq, ATAC-seq, and more. Data1, Data2.
Vian, Laura, Aleksandra Pękowska, Suhas S.P. Rao, Kyong-Rim Kieffer-Kwon, Seolkyoung Jung, Laura Baranello, Su-Chen Huang, et al. “The Energetics and Physiological Impact of Cohesin Extrusion.” Cell 173, no. 5 (May 2018)
HiC-DC - significant interaction detection using a zero-inflated negative binomial model that accounts for biases such as GC content and mappability. More conservative than Fit-Hi-C. Robust to sequencing depth. Detects significant, biologically relevant interactions at all length scales, including sub-TADs. BWA-MEM alignment (Python script), then processing in R. https://bitbucket.org/leslielab/hic-dc/src/master/
Carty, Mark, Lee Zamparo, Merve Sahin, Alvaro González, Raphael Pelossof, Olivier Elemento, and Christina S. Leslie. “An Integrated Model for Detecting Significant Chromatin Interactions from High-Resolution Hi-C Data.” Nature Communications 8, no. 1 (August 2017)
GOTHiC - R package for peak calling in individual Hi-C datasets while accounting for noise. https://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html
Mifsud, Borbala, Inigo Martincorena, Elodie Darbo, Robert Sugar, Stefan Schoenfelder, Peter Fraser, and Nicholas M. Luscombe. “GOTHiC, a Probabilistic Model to Resolve Complex Biases and to Identify Real Interactions in Hi-C Data.” Edited by Mark Isalan. PLOS ONE 12, no. 4 (April 5, 2017) - The GOTHiC (genome organization through HiC) algorithm uses a simple binomial distribution model to simultaneously remove coverage-associated biases in Hi-C data and detect significant interactions, assuming that the global background interaction frequency of two loci is determined by their coverage. Uses the Benjamini–Hochberg multiple-testing correction to control the false discovery rate.
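The binomial background and Benjamini-Hochberg step can be sketched in plain Python. This sketch assumes that the background probability of a read pair linking loci i and j (i != j) is 2·f_i·f_j, where f is each locus's share of total coverage. That form, the function names, and the toy inputs are assumptions for illustration, not GOTHiC's R implementation.

```python
import math

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                        + i * math.log(p) + (n - i) * math.log1p(-p))
        total += term
        if i > n * p and term < 1e-17 * total:
            break  # terms past the mode are negligible
    return min(total, 1.0)

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running = min(running, pvals[idx] * m / rank)
        adjusted[idx] = running
    return adjusted

def gothic_like(pairs, coverage):
    """BH-adjusted binomial p-values for read-pair counts between loci.

    pairs: {(i, j): read pairs linking loci i != j}; coverage: {locus: reads}.
    Assumed background probability for a pair: 2 * f_i * f_j.
    """
    total_cov = sum(coverage.values())
    n_reads = sum(pairs.values())
    keys = list(pairs)
    pvals = []
    for i, j in keys:
        p = 2 * (coverage[i] / total_cov) * (coverage[j] / total_cov)
        pvals.append(binom_sf(pairs[(i, j)], n_reads, p))
    return dict(zip(keys, bh_adjust(pvals)))
```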
HMRFBayesHiC - a hidden Markov random field-based Bayesian peak caller to identify long-range chromatin interactions from Hi-C data. Borrows information from neighboring loci. Reviews previous peak-calling methods, such as Fit-Hi-C. Uses interactions between enhancers and promoters as a benchmark. https://yunliweb.its.unc.edu/HMRFBayesHiC/
Xu, Zheng, Guosheng Zhang, Fulai Jin, Mengjie Chen, Terrence S. Furey, Patrick F. Sullivan, Zhaohui Qin, Ming Hu, and Yun Li. “A Hidden Markov Random Field-Based Bayesian Method for the Detection of Long-Range Chromosomal Interactions in Hi-C Data.” Bioinformatics (Oxford, England) 32, no. 5 (2016)
FastHiC - hidden Markov random field (HMRF)-based peak caller, fast and well-performing. https://yunliweb.its.unc.edu/fasthic/
Xu, Zheng, Guosheng Zhang, Cong Wu, Yun Li, and Ming Hu. “FastHiC: A Fast and Accurate Algorithm to Detect Long-Range Chromosomal Interactions from Hi-C Data.” Bioinformatics (Oxford, England) 32, no. 17 (2016)
FitHiC - Python tool for detection of significant chromatin interactions, https://noble.gs.washington.edu/proj/fit-hi-c/
Ay, Ferhat, Timothy L. Bailey, and William Stafford Noble. “Statistical Confidence Estimation for Hi-C Data Reveals Regulatory Chromatin Contacts.” Genome Research 24, no. 6 (June 2014) - Fit-Hi-C method: splines to model distance dependence. Models mid-range interaction frequencies and their decay with distance. Biases, normalization methods. Two-step spline fitting: use all points for the first fit, identify and remove outliers, then refit without outliers. Markers of boundaries: insulators, heterochromatin, pluripotency factors. CNVs are enriched in chromatin boundaries. Replication timing data how-to http://www.replicationdomain.com/. Validation Hi-C data. http://chromosome.sdsc.edu/mouse/hi-c/download.html
HiCPeaks - Python CPU-based implementation of BH-FDR and HiCCUPS, two peak-calling algorithms for Hi-C data proposed by Rao et al. 2014. Includes a text-to-cooler Hi-C data converter, two scripts to call peaks, and one for visualization (creates a .png file). https://github.com/XiaoTaoWang/HiCPeaks
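HiCCUPS-style callers compare each pixel with a local "donut" background. The donut is a square around the pixel with the inner square and the pixel's own row and column removed, rescaled by the distance-dependent expected counts. The NumPy sketch below computes only this enrichment ratio. The window sizes w=3 and p=1 and the function names are illustrative assumptions. The Poisson testing and FDR control used by the real algorithm are omitted, and HiCPeaks' own API is not reproduced.

```python
import numpy as np

def donut_mask(w, p):
    """Boolean (2w+1) x (2w+1) mask of the donut background around a centre pixel."""
    size = 2 * w + 1
    mask = np.ones((size, size), dtype=bool)
    mask[w - p:w + p + 1, w - p:w + p + 1] = False  # inner square around the peak
    mask[w, :] = False                              # peak's row
    mask[:, w] = False                              # peak's column
    return mask

def distance_expected(mat):
    """Mean observed count on each diagonal offset (expected by distance)."""
    return np.array([np.diagonal(mat, k).mean() for k in range(mat.shape[0])])

def donut_enrichment(mat, i, j, w=3, p=1):
    """Observed / local expected at pixel (i, j) using the donut background."""
    exp_d = distance_expected(mat)
    mask = donut_mask(w, p)
    obs_sum = exp_sum = 0.0
    for di in range(-w, w + 1):
        for dj in range(-w, w + 1):
            a, b = i + di, j + dj
            if mask[di + w, dj + w] and 0 <= a < mat.shape[0] and 0 <= b < mat.shape[1]:
                obs_sum += mat[a, b]
                exp_sum += exp_d[abs(b - a)]
    # Distance expected at (i, j), rescaled by the donut's observed/expected ratio
    local_expected = exp_d[abs(j - i)] * obs_sum / exp_sum
    return mat[i, j] / local_expected
```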
HOMER - Perl scripts for normalization, visualization, significant interaction detection, motif discovery. Does not correct for bias. http://homer.ucsd.edu/homer/interactions/
Capture-C peaks
Peaky - Bayesian sparse variable selection approach. The model proposes that for any given bait, the expected CHi-C signal at each prey fragment is expressed as a sum of contributions from a set of fragments directly contacting that bait. https://github.com/cqgd/pky
Eijsbouts, Christiaan Q, Oliver S Burren, Paul J Newcombe, and Chris Wallace. “Fine Mapping Chromatin Contacts in Capture Hi-C Data.” BMC Genomics 20, no. 1 (December 2019).
ChiCMaxima - a pipeline for detection and visualization of chromatin loops in Capture Hi-C data. Loess smoothing combined with a background model to detect significant interactions. Comparison with GOTHiC and CHiCAGO. https://github.com/yousra291987/ChiCMaxima
Ben Zouari, Yousra, Anne M Molitor, Natalia Sikorska, Vera Pancaldi, and Tom Sexton. “ChiCMaxima: A Robust and Simple Pipeline for Detection and Visualization of Chromatin Looping in Capture Hi-C,” October 16, 2018.
HiCapTools - A software package that can design sequence capture probes for targeted chromosome capture applications and analyze sequencing output to detect proximities involving targeted fragments. Two probes are designed for each feature while avoiding repeat elements and non-unique regions. The data analysis suite processes alignment files to report genomic proximities for each feature at restriction fragment level and is isoform-aware for gene features. Statistical significance of contact frequencies is evaluated using an empirically derived background distribution. https://github.com/sahlenlab/HiCapTools
Anil, Anandashankar, Rapolas Spalinskas, Örjan Åkerborg, and Pelin Sahlén. “HiCapTools: A Software Suite for Probe Design and Proximity Detection for Targeted Chromosome Conformation Capture Applications.” Bioinformatics 34, no. 4 (February 15, 2018)
CHiCAGO - a Capture Hi-C data processing method that filters out contacts expected by chance given the linear proximity of the interacting fragments on the genome, and takes into account the asymmetric biases introduced by the capture step of the Capture Hi-C approach. Two-component background model (Delaporte distribution): Brownian motion (negative binomial) and technical noise (Poisson). Accounts for distance. https://bioconductor.org/packages/Chicago/, Tweet by Mikhail Spivakov: Running Chicago with data generated w/ a 4-cutter such as DpnII? Default settings were tuned on 6-cutter data (HindIII) & not optimal for this. Our suggested settings for DpnII are: MaxLBrowndist = 75000, binsize = 1500, minFragLen=75, maxFragLen=1200.
Cairns, Jonathan, Paula Freire-Pritchett, Steven W. Wingett, Csilla Várnai, Andrew Dimond, Vincent Plagnol, Daniel Zerbino, et al. “CHiCAGO: Robust Detection of DNA Looping Interactions in Capture Hi-C Data.” Genome Biology 17, no. 1 (2016): 127.
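A Delaporte variable is the sum of a negative binomial count (here, the Brownian component) and an independent Poisson count (the technical component). Its pmf is therefore the convolution of the two pmfs. The plain-Python sketch below evaluates it. In CHiCAGO the two means are estimated from distance and fragment-level biases. Here they are plain numeric inputs, which is an assumption, and the function names are hypothetical.

```python
import math

def nb_pmf(i, mu, size):
    """Negative binomial pmf with mean mu and dispersion (size) parameter."""
    log_p = (math.lgamma(i + size) - math.lgamma(size) - math.lgamma(i + 1)
             + size * math.log(size / (size + mu)) + i * math.log(mu / (size + mu)))
    return math.exp(log_p)

def pois_pmf(i, lam):
    """Poisson pmf with mean lam."""
    return math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))

def delaporte_pmf(k, mu_brown, size, lam_tech):
    """P(N = k) where N = Brownian (NB) count + technical (Poisson) count."""
    return sum(nb_pmf(i, mu_brown, size) * pois_pmf(k - i, lam_tech) for i in range(k + 1))

def delaporte_sf(k, mu_brown, size, lam_tech):
    """P(N >= k): one-sided p-value for an observed read count k."""
    return max(0.0, 1.0 - sum(delaporte_pmf(x, mu_brown, size, lam_tech) for x in range(k)))
```

Because the two components are independent, the mean of the background is simply the sum of the two means (2.5 for mu_brown=2 and lam_tech=0.5).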
Differential interactions
Serpentine - differential analysis of two Hi-C maps using the 2D serpentine-binning method. A serpentine is a subset of connected pixels defined by thresholds in the control and experimental contact maps. Serpentines are then compared using the Mean-Deviation plot. Helps alleviate the effect of sparsity. Uses HiCcompare functionality. Normalization does not help. Python package, currently processes full 1500x1500 matrices. https://github.com/koszullab/serpentine
Baudry, Lyam, Gaël A Millot, Agnes Thierry, Romain Koszul, and Vittore F Scolari. “Serpentine: A Flexible 2D Binning Method for Differential Hi-C Analysis.” Edited by Alfonso Valencia. Bioinformatics 36, no. 12 (June 1, 2020)
multiHiCcompare - joint normalization of multiple Hi-C datasets using cyclic loess regression through pairs of MD plots (minus-distance). Data-driven normalization accounting for the between-dataset biases. Per-distance edgeR-based testing of significant interactions. https://bioconductor.org/packages/multiHiCcompare/
Stansfield, John C, Kellen G Cresswell, and Mikhail G Dozmorov. “MultiHiCcompare: Joint Normalization and Comparative Analysis of Complex Hi-C Experiments.” Edited by Inanc Birol. Bioinformatics, January 22, 2019
Chicdiff - differential interaction detection in Capture Hi-C data. Signal normalization based on the CHiCAGO framework, differential testing using DESeq2. Accounts for the distance effect with the Independent Hypothesis Weighting (IHW) method, which learns distance-based p-value weights to maximize the number of rejected null hypotheses. https://github.com/RegulatoryGenomicsGroup/chicdiff
Cairns, Jonathan, William R. Orchard, Valeriya Malysheva, and Mikhail Spivakov. “Chicdiff: A Computational Pipeline for Detecting Differential Chromosomal Interactions in Capture Hi-C Data.” BioRxiv, January 1, 2019
Selfish - comparative analysis of replicate Hi-C experiments via a self-similarity measure, a local similarity borrowed from image comparison. Checks reproducibility and detects differential interactions. Boolean representation of contact matrices for reproducibility quantification. Deconvolutes local interactions with a Gaussian filter (placing a Gaussian bell around each pixel), then compares derivatives between contact maps for each radius. Compared with FIND on simulated (Zhou method) and real data: better performance, especially at low fold changes, and stronger enrichment of relevant epigenomic features. Matlab implementation https://github.com/ucrbioinfo/Selfish
Roayaei Ardakany, Abbas, Ferhat Ay, and Stefano Lonardi. “Selfish: Discovery of Differential Chromatin Interactions via a Self-Similarity Measure.” BioRxiv, January 1, 2019
HiCcompare - joint normalization of two Hi-C datasets using loess regression through an MD plot (minus-distance). Data-driven normalization accounting for the between-dataset biases. Per-distance permutation testing of significant interactions. https://bioconductor.org/packages/HiCcompare/
Stansfield, John C., Kellen G. Cresswell, Vladimir I. Vladimirov, and Mikhail G. Dozmorov. “HiCcompare: An R-Package for Joint Normalization and Comparison of HI-C Datasets.” BMC Bioinformatics 19, no. 1 (December 2018)
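The MD-plot idea behind HiCcompare uses M = log2(IF2/IF1) and D = distance. Any systematic trend of M along D is treated as between-dataset bias and removed. The NumPy sketch below illustrates this on dense symmetric matrices. As a simplifying assumption, the per-distance mean of M stands in for HiCcompare's loess fit. A pseudocount of 1 avoids taking the log of zero, and the function name is hypothetical.

```python
import numpy as np

def md_normalize(mat1, mat2, pseudo=1.0):
    """Jointly normalize two symmetric contact matrices so the mean M at each distance is 0."""
    n = mat1.shape[0]
    out1 = np.array(mat1, dtype=float)
    out2 = np.array(mat2, dtype=float)
    for d in range(n):
        idx = np.arange(n - d)
        a = np.diagonal(mat1, d).astype(float) + pseudo  # IF1 at distance d
        b = np.diagonal(mat2, d).astype(float) + pseudo  # IF2 at distance d
        bias = np.log2(b / a).mean()                      # fitted M at this D
        # Split the correction evenly between the two datasets
        new1 = a * 2 ** (bias / 2) - pseudo
        new2 = b * 2 ** (-bias / 2) - pseudo
        out1[idx, idx + d] = new1
        out1[idx + d, idx] = new1
        out2[idx, idx + d] = new2
        out2[idx + d, idx] = new2
    return out1, out2
```

Splitting the correction symmetrically keeps the two datasets on a shared scale, instead of forcing one to match the other.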
diffloop - Differential analysis of chromatin loops (ChIA-PET). edgeR framework. https://bioconductor.org/packages/diffloop/
Lareau, Caleb A., and Martin J. Aryee. “Diffloop: A Computational Framework for Identifying and Analyzing Differential DNA Loops from Sequencing Data.” Bioinformatics (Oxford, England), September 29, 2017.
FIND - differential chromatin interaction detection comparing the local spatial dependency between interacting loci. Previous strategies: simple fold-change comparisons, binomial model (HOMER), count-based (edgeR). FIND uses a spatial Poisson process model to detect differential chromatin interactions that show a significant change in their interaction frequency and in the interaction frequency of their adjacent bins. “Variogram” concept. For each point, densities are compared between conditions using Fisher’s test. Explored various multiple-testing correction methods and used the rth ordered p-value (rOP) method. Benchmarked against edgeR in simulated settings: FIND performs better at shorter distances, and edgeR has more false positives at longer distances. Real Hi-C data normalized using KR and MA normalizations. R package https://bitbucket.org/nadhir/find/downloads/
Djekidel, Mohamed Nadhir, Yang Chen, and Michael Q. Zhang. “FIND: DifFerential Chromatin INteractions Detection Using a Spatial Poisson Process.” Genome Research, February 12, 2018. https://doi.org/10.1101/gr.212241.116.
AP - aggregation preference, a parameter to quantify TAD heterogeneity. Calls significant interactions within a TAD, clusters them with DBSCAN, calculates the weighted interaction density within each cluster, and averages. AP measures are reproducible. Comparison of TADs in Gm12878 and IMR90: stable TADs change their aggregation preference, and these changes correlate with LINEs and Lamin B1 signal. Can detect structural changes (block split) in TADs. https://github.com/XiaoTaoWang/TADLib
Wang, X.-T., Dong, P.-F., Zhang, H.-Y., and Peng, C. (2015). “Structural heterogeneity and functional diversity of topologically associating domains in mammalian genomes.” Nucleic Acids Research
diffHiC - Differential contacts using the full pipeline for Hi-C data. Explanation of the technology, binning. MA normalization, edgeR-based. Comparison with HOMER. https://bioconductor.org/packages/diffHic/
Lun, Aaron T. L., and Gordon K. Smyth. “DiffHic: A Bioconductor Package to Detect Differential Genomic Interactions in Hi-C Data.” BMC Bioinformatics 16 (2015)
TAD callers
BHi-Cect - identification of the full hierarchy of chromosomal interactions (TADs). Spectral clustering starting from the whole chromosome, detecting nested BHi-Cect Partition Trees (BPTs), partitioned into non-contiguous and interwoven enclaves, inspired by the fractal globule idea. Variation of information to test the agreement between two clustering results, overlap-based metrics to test correspondence with TADs. Correspondence analysis of the association between enclaves and TF content. Gene enrichment. Different enclaves show different epigenomic and gene expression signatures, bottom enclaves are most crisply defined. Resolution affects what enclave size can be detected. https://github.com/princeps091-binf/BHi-Cect
Kumar, Vipin, Simon Leclerc, and Yuichi Taniguchi. “BHi-Cect: A Top-down Algorithm for Identifying the Multi-Scale Hierarchical Structure of Chromosomes.” Nucleic Acids Research 48, no. 5 (March 18, 2020)
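A single top-down spectral step can be illustrated by splitting the bins of a contact matrix with the Fiedler vector of its graph Laplacian. The Fiedler vector is the eigenvector of the second-smallest eigenvalue. This NumPy sketch shows only one split. BHi-Cect's actual procedure applies splitting recursively with its own stopping rules. The unnormalized Laplacian and the function name are assumptions.

```python
import numpy as np

def spectral_bisect(mat):
    """Split bins into two groups by the sign of the Fiedler vector.

    Uses the unnormalized graph Laplacian L = D - A of the contact matrix A.
    """
    a = np.asarray(mat, dtype=float)
    lap = np.diag(a.sum(axis=1)) - a
    vals, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    fiedler = vecs[:, 1]              # eigenvector of the 2nd-smallest eigenvalue
    return np.where(fiedler >= 0)[0], np.where(fiedler < 0)[0]
```

Applying the same split again inside each resulting group would produce a nested, tree-like partition.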
TADBD - TAD caller using a multi-scale Haar diagonal template (sum of on-diagonal squares minus the sum of off-diagonal squares). Compared with HiCDB, IC-Finder, EAST (also using Haar features), TopDom, and HiCseg using simulated (Forcato) and experimental (K562 and IMR90) data. ICE-normalized data. MCC, Jaccard. Fast. R package https://github.com/bioinfo-lab/TADBD/
Lyu, Hongqiang, Lin Li, Zhifang Wu, Tian Wang, Jiguang Zheng, and Hongda Wang. “TADBD: A Sensitive and Fast Method for Detection of Typologically Associated Domain Boundaries.” BioTechniques, April 7, 2020
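The Haar diagonal template can be sketched as a boundary score between bins i-1 and i. At each window size w, the score is the sum of the upstream and downstream on-diagonal w x w squares minus twice the off-diagonal square, since the matrix is symmetric. The size-normalized scores are then averaged over several w. This NumPy sketch is an illustration only. The window sizes, the normalization, and the function name are assumptions, not TADBD's implementation.

```python
import numpy as np

def haar_boundary_score(mat, sizes=(2, 3, 4)):
    """Multi-scale Haar-like diagonal template score for a boundary before bin i."""
    n = mat.shape[0]
    scores = np.full(n, np.nan)
    for i in range(max(sizes), n - max(sizes) + 1):
        vals = []
        for w in sizes:
            up = mat[i - w:i, i - w:i].sum()    # on-diagonal square before i
            down = mat[i:i + w, i:i + w].sum()  # on-diagonal square from i
            off = mat[i - w:i, i:i + w].sum()   # off-diagonal square (counted twice)
            vals.append((up + down - 2 * off) / (w * w))
        scores[i] = np.mean(vals)
    return scores
```

Peaks of the score mark candidate boundaries. Averaging over several window sizes makes the score less sensitive to TAD size.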
TADpole - hierarchical TAD boundary caller. Preprocessing by filtering sparse rows, transforming the matrix into its Pearson correlation coefficient matrix, running PCA on it and retaining 200 PCs, transforming into a Euclidean distance matrix, clustering using the Constrained Incremental Sums of Squares clustering (rioja::chclust(, coniss)), estimating significance, Calinski-Harabasz index to estimate the optimal number of clusters (chromatin subdivisions). Benchmarking using Zufferey 2018 datasets, mouse limb bud development with genomic inversions from Kraft 2019. Resolution, normalization, sequencing depth. Metrics: the Overlap Score, the Measure of Concordance, all from Zufferey 2018. Enrichment in epigenomic marks. DiffT metric for differential analysis (on binarized TAD/non-TAD matrices). Compared with 22 TAD callers, including hierarchical (CaTCH, GMAP, Matryoshka, PSYCHIC). https://github.com/3DGenomes/TADpole
Soler-Vila, Paula, Pol Cuscó Pons, Irene Farabella, Marco Di Stefano, and Marc A. Marti-Renom. “Hierarchical Chromatin Organization Detected by TADpole.” Preprint. Bioinformatics, July 11, 2019.
HiCDB - TAD boundary detection using local relative insulation (LRI) metric, improved stability, less parameter tuning, cross-resolution, differential boundary detection, lower computations, visualization. Review of previous methods, directionality index, insulation score. Math of LRI. GSEA-like enrichment in genome annotations (CTCF). Differential boundary detection using the intersection of extended boundaries. Compared with Armatus, DI, HiCseg, IC-finder, Insulation, TopDom on 40kb datasets. Accurately detects smaller-scale boundaries. Differential TADs are enriched in cell-type-specific genes. https://github.com/ChenFengling/RHiCDB
Chen, Fengling, Guipeng Li, Michael Q. Zhang, and Yang Chen. “HiCDB: A Sensitive and Robust Method for Detecting Contact Domain Boundaries.” Nucleic Acids Research 46, no. 21 (November 30, 2018)
OnTAD - hierarchical TAD caller, Optimal Nested TAD caller. Sliding window, adaptive local minimum search algorithm, similar to TopDom. C++ implementation. https://github.com/anlin00007/OnTAD. OnTAD for coolers - a Python wrapper to work with .cool files.
An, Lin, Tao Yang, Jiahao Yang, Johannes Nuebler, Qunhua Li, and Yu Zhang. “Hierarchical Domain Structure Reveals the Divergence of Activity among TADs and Boundaries,” July 3, 2018. - Intro about TADs, Dixon’s directionality index, Insulation score. Other hierarchical callers - TADtree, rGMAP, Arrowhead, 3D-Net, IC-Finder. Limitations of current callers - ad hoc thresholds, sensitivity to sequencing depth and mapping resolution, long running time and large memory usage, insufficient performance evaluation. Boundaries are asymmetric - some have more contacts with other boundaries, support for asymmetric loop extrusion model. Performance comparison with DomainCaller, rGMAP, Arrowhead, TADtree. Stronger enrichment of CTCF and two cohesin proteins RAD21 and SMC3. TAD-adjR^2 metric quantifying the proportion of variance in the contact frequencies explained by TAD boundaries. Reproducibility of TAD boundaries - Jaccard index, tested at different sequencing depths and resolutions. Boundaries of hierarchical TADs are more active - more CTCF, epigenomic features, TFBSs, and expressed genes. Super-boundaries - shared by 5 or more TADs, highly active. Rao-Huntley 2014 Gm12878 data. Distance correction - subtracting the mean counts at each distance.
3D-NetMod - hierarchical, nested, partially overlapping TAD detection using graph theory. Community detection based on the maximization of network modularity with a Louvain-like locally greedy algorithm, repeated several (20) times to avoid local maxima, followed by a consensus. Tuning parameters are estimated over a sequence search. Benchmarked against TADtree, directionality index, and Arrowhead. ICE-normalized brain data from the Geschwind (human) and Jiang (mouse) studies. Computationally intensive. Python implementation https://bitbucket.org/creminslab/3dnetmod_method_v1.0_10_06_17
Norton, Heidi K., Daniel J. Emerson, Harvey Huang, Jesi Kim, Katelyn R. Titus, Shi Gu, Danielle S. Bassett, and Jennifer E. Phillips-Cremins. “Detecting Hierarchical Genome Folding with Network Modularity.” Nature Methods 15, no. 2 (February 2018): 119–22. https://doi.org/10.1038/nmeth.4560.
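The quantity that modularity-based methods maximize can be computed directly for a proposed partition of bins. The contact matrix is treated as a weighted graph, and Q is calculated as:

Q = (1/2m) Σ_ij [A_ij - γ k_i k_j / (2m)] δ(c_i, c_j)

This NumPy sketch only evaluates Q. It does not implement 3D-NetMod's Louvain-like search or consensus step. The resolution parameter γ is shown as an assumed example of a tuning parameter that such methods sweep, and the function name is hypothetical.

```python
import numpy as np

def modularity(adj, labels, gamma=1.0):
    """Newman-Girvan modularity Q of a partition of a weighted, symmetric graph."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    k = adj.sum(axis=1)                          # weighted degree of each bin
    two_m = k.sum()                              # total edge weight counted twice
    same = labels[:, None] == labels[None, :]    # delta(c_i, c_j)
    null = gamma * np.outer(k, k) / two_m        # expected weight under the null model
    return float(((adj - null) * same).sum() / two_m)
```

With γ = 1, putting every bin in one community always gives Q = 0. A partition that follows the block structure of the matrix scores higher.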
deDoc - TAD detection minimizing the structural entropy of the Hi-C graph (structural information theory). Detects the optimal resolution (= minimal entropy). Analysis of pooled single-cell Hi-C data (10 cells). Intro about TADs, a brief description of TAD callers, including hierarchical ones. Works best on raw, non-normalized data, highly robust to sparsity (0.1% of the original data is sufficient). Compared with five TAD callers (Armatus, TADtree, Arrowhead, MrTADFinder, Domaincall (DI)) and a classical graph modularity detection algorithm. Enrichment in CTCF, housekeeping genes, H3K4me3, H4K20me1, H3K36me3. Other benchmarks: weighted similarity, number and length of TADs. Detects hierarchy over different passes. Java implementation (won’t run on Mac) https://github.com/yinxc/structural-information-minimisation
Li, Angsheng, Xianchen Yin, Bingxiang Xu, Danyang Wang, Jimin Han, Yi Wei, Yun Deng, Ying Xiong, and Zhihua Zhang. “Decoding Topologically Associating Domains with Ultra-Low Resolution Hi-C Data by Graph Structural Entropy.” Nature Communications 9, no. 1 (2018): 3265. https://doi.org/10.1038/s41467-018-05691-7.
CaTCH - identification of hierarchical TAD structure. Reciprocal insulation (RI) index. Benchmarked against Dixon’s TADs (diTADs). CTCF enrichment as a benchmark, enrichment of TADs in differentially expressed genes. https://github.com/zhanyinx/CaTCH_R
Zhan, Yinxiu, Luca Mariani, Iros Barozzi, Edda G. Schulz, Nils Blüthgen, Michael Stadler, Guido Tiana, and Luca Giorgetti. “Reciprocal Insulation Analysis of Hi-C Data Shows That TADs Represent a Functionally but Not Structurally Privileged Scale in the Hierarchical Folding of Chromosomes.” Genome Research 27, no. 3 (2017)
HiTAD - hierarchical TAD identification, different resolutions, correlation with chromosomal compartments, replication timing, gene expression. Adaptive directionality index approach. Data sources, methods for comparing TAD boundaries, reproducibility. H3K4me3 enriched and H3K4me1 depleted at boundaries. TAD boundaries (but not sub-TADs) separate replication timing, A/B compartments, gene expression. https://github.com/XiaoTaoWang/TADLib
Wang, Xiao-Tao, Wang Cui, and Cheng Peng. “HiTAD: Detecting the Structural and Functional Hierarchies of Topologically Associating Domains from Chromatin Interactions.” Nucleic Acids Research 45, no. 19 (November 2, 2017)
IC-Finder - Segmentations of HiC maps into hierarchical interaction compartments, http://membres-timc.imag.fr/Daniel.Jost/DJ-TIMC/Software.html
Haddad, Noelle, Cedric Vaillant, and Daniel Jost. “IC-Finder: Inferring Robustly the Hierarchical Organization of Chromatin Folding.” Nucleic Acids Research 45, no. 10 (June 2, 2017)
ClusterTAD - A clustering method for identifying topologically associated domains (TADs) from Hi-C data, https://github.com/BDM-Lab/ClusterTAD
Oluwadare, Oluwatosin, and Jianlin Cheng. “ClusterTAD: An Unsupervised Machine Learning Approach to Detecting Topologically Associated Domains of Chromosomes from Hi-C Data.” BMC Bioinformatics 18, no. 1 (November 14, 2017)
EAST - Efficient and Accurate Detection of Topologically Associating Domains from Contact Maps. Uses Haar-like features (rectangles on images) and a function that quantifies TAD properties: contact frequency is high within TADs and low outside, and boundaries must be strong. Objective: find a set of contiguous non-overlapping domains that maximizes the function, restricted by the maximum TAD length. Boundaries are enriched in CTCF, RNA PolII, H3K4me3, H3K27ac. https://github.com/ucrbioinfo/EAST
Abbas Roayaei Ardakany, Stefano Lonardi, and Marc Herbstritt, “Efficient and Accurate Detection of Topologically Associating Domains from Contact Maps” (Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik GmbH, Wadern/Saarbruecken, Germany, 2017)
TADtree - Hierarchical (nested) TAD identification. Two ways of TAD definition: 1D and 2D. Normalization by distance. Enrichment over the background. http://compbio.cs.brown.edu/software/
Weinreb, Caleb, and Benjamin J. Raphael. “Identification of Hierarchical Chromatin Domains.” Bioinformatics (Oxford, England) 32, no. 11 (June 1, 2016)
TopDom - an efficient and deterministic method for identifying topological domains in genomes, based on the general observation that within-TAD interactions are stronger than between-TAD interactions. The binSignal value of a bin is the average contact frequency between its upstream and downstream neighbors; a curve is fitted to binSignal, and local minima are found and tested for significance. Fast, runs in linear time. Detects domains similar to those of HiCseg and Dixon’s directionality index. Found the expected enrichment in CTCF and histone marks. Housekeeping genes and overall gene density are close to TAD boundaries; differentially expressed genes are not. http://zhoulab.usc.edu/TopDom/
Shin, Hanjun, Yi Shi, Chao Dai, Harianto Tjong, Ke Gong, Frank Alber, and Xianghong Jasmine Zhou. “TopDom: An Efficient and Deterministic Method for Identifying Topological Domains in Genomes.” Nucleic Acids Research 44, no. 7 (April 20, 2016)
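The binSignal and local-minimum steps can be sketched with NumPy. The signal at bin i is taken as the mean contact between the w bins ending at i and the w bins after it. A bin is reported if it is the unique minimum of the surrounding ±w window. Requiring a strict minimum is an assumption of this sketch, used so that flat stretches inside TADs are not reported. TopDom's curve fitting and significance test are omitted, and the function names are hypothetical.

```python
import numpy as np

def bin_signal(mat, w):
    """Mean contact between the w bins ending at i and the w bins after i."""
    n = mat.shape[0]
    signal = np.full(n, np.nan)
    for i in range(n - 1):
        up = slice(max(0, i - w + 1), i + 1)
        down = slice(i + 1, min(n, i + w + 1))
        signal[i] = mat[up, down].mean()
    return signal

def local_minima(signal, w):
    """Bins whose signal is the unique minimum of the surrounding +/- w window."""
    mins = []
    for i in range(len(signal)):
        if np.isnan(signal[i]):
            continue
        window = signal[max(0, i - w):i + w + 1]
        window = window[~np.isnan(window)]
        # Require a strict minimum so flat stretches inside TADs are not reported
        if signal[i] == window.min() and np.count_nonzero(window == signal[i]) == 1:
            mins.append(i)
    return mins
```

Only bins whose contacts across them are weak relative to their neighbourhood survive, which is the "within-TAD contacts are stronger" observation expressed directly.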
TADtool - wrapper for directionality index and insulation score TAD callers. https://github.com/vaquerizaslab/tadtool
Kruse, Kai, Clemens B. Hug, Benjamín Hernández-Rodríguez, and Juan M. Vaquerizas. “TADtool: Visual Parameter Identification for TAD-Calling Algorithms.” Bioinformatics (Oxford, England) 32, no. 20 (2016)
Arboretum-Hi-C - a multitask spectral clustering method to identify differences in genomic architecture. Intro about 3D genome organization, TAD differences, and conservation. Assesses different clustering approaches using different distance measures, as well as raw contacts. Clustering quality is judged by enrichment in genomic regulatory signals (histone marks, LADs, early vs. late replication timing, TFs like POLII, TAF, TBP, CTCF, P300, CMYC, cohesin components, SINE, LINE, LTR) and by numerical criteria (Davies-Bouldin index, silhouette score, others). Although spectral clustering on contact counts performed best, spectral clustering with Spearman correlation was chosen. Comparing cell types identifies biologically relevant differences, as quantified by enrichment. Peak counts or average signal within regions were used for enrichment. Data https://zenodo.org/record/49767, and Arboretum-HiC https://bitbucket.org/roygroup/arboretum-hic
Fotuhi Siahpirani, Alireza, Ferhat Ay, and Sushmita Roy. “A Multi-Task Graph-Clustering Approach for Chromosome Conformation Capture Data Sets Identifies Conserved Modules of Chromosomal Interactions.” Genome Biology 17, no. 1 (December 2016).
Armatus - TAD detection at different resolutions, Dynamic programming method. https://github.com/kingsfordgroup/armatus
Filippova, Darya, Rob Patro, Geet Duggal, and Carl Kingsford. “Identification of Alternative Topological Domains in Chromatin.” Algorithms for Molecular Biology 9, no. 1 (2014)
HiCseg - TAD detection by maximization of likelihood based block-wise segmentation model. 2D segmentation rephrased as 1D segmentation - not contours, but borders. Statistical framework, solved with dynamic programming. Dixon data as gold standard. Hausdorff distance to compare segmentation quality. Parameters (from TopDom paper): nb_change_max = 500, distrib = ‘G’ and model = ‘Dplus’. https://cran.r-project.org/web/packages/HiCseg/index.html
Lévy-Leduc, Celine, M. Delattre, T. Mary-Huard, and S. Robin. “Two-Dimensional Segmentation for Analyzing Hi-C Data.” Bioinformatics (Oxford, England) 30, no. 17 (September 1, 2014)
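The Hausdorff distance used to compare segmentations can be computed directly on boundary positions. It is the largest distance from any boundary in one set to the nearest boundary in the other set, taken in both directions. Below is a plain-Python sketch; the function name is hypothetical.

```python
def hausdorff(boundaries_a, boundaries_b):
    """Hausdorff distance (in bins) between two sets of TAD boundary positions."""
    if not boundaries_a or not boundaries_b:
        raise ValueError("both boundary sets must be non-empty")

    def directed(x, y):
        # Worst case, over boundaries in x, of the distance to the closest boundary in y
        return max(min(abs(a - b) for b in y) for a in x)

    return max(directed(boundaries_a, boundaries_b), directed(boundaries_b, boundaries_a))
```

Because it takes the worst case in both directions, a single missing or extra boundary is enough to make the distance large.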
domaincaller - A Python implementation of the original DI domain caller, https://github.com/XiaoTaoWang/domaincaller
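The directionality index (DI) of Dixon et al. (2012) contrasts the upstream contacts A and downstream contacts B of each bin within a window, using E = (A + B) / 2:

DI = sign(B - A) · ((A - E)²/E + (B - E)²/E)

Below is a minimal NumPy sketch. The window w (in bins) and the function name are illustrative assumptions, not domaincaller's API. In the original DI caller, the DI track is then segmented into domains with a hidden Markov model; that step is omitted here.

```python
import numpy as np

def directionality_index(mat, w):
    """Dixon et al. (2012) directionality index for each bin."""
    n = mat.shape[0]
    di = np.zeros(n)
    for i in range(n):
        a = mat[i, max(0, i - w):i].sum()          # upstream contacts
        b = mat[i, i + 1:min(n, i + w + 1)].sum()  # downstream contacts
        e = (a + b) / 2.0                          # expected under no directional bias
        if e == 0 or a == b:
            continue  # no contacts or no bias: DI stays 0
        di[i] = np.sign(b - a) * ((a - e) ** 2 / e + (b - e) ** 2 / e)
    return di
```

Negative DI values at the end of a domain switching to positive values at the start of the next one mark a boundary.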
Differential TAD analysis
TADcompare - R package for differential and time-course TAD boundary analysis. Uses the SpectralTAD score (spectral decomposition of Hi-C matrices) to statistically detect five types of differential TAD boundaries: merge, split, complex, shifted, strength change. In the time-course analysis, detects six types of boundary score changes: highly common, early appearing, late appearing, early disappearing, late disappearing, and dynamic TAD boundaries. Returns genomic coordinates and types of TAD boundary changes in BED format. https://bioconductor.org/packages/TADCompare/
Cresswell, Kellen G., and Mikhail G. Dozmorov. “TADCompare: An R Package for Differential and Temporal Analysis of Topologically Associated Domains.” Frontiers in Genetics 11 (March 10, 2020)
DiffTAD - differential contact frequency in TADs between two conditions. Two tests: a permutation-based test comparing observed vs. expected median interactions, and a parametric test considering the sign of the differences within TADs. Both tests account for distance strata. https://bitbucket.org/rzaborowski/differential-analysis
Zaborowski, Rafal, and Bartek Wilczynski. “DiffTAD: Detecting Differential Contact Frequency in Topologically Associating Domains Hi-C Experiments between Conditions.” BioRxiv, January 1, 2016
dcHiC - Differential Compartment Analysis of Hi-C Datasets, https://github.com/ay-lab/dcHiC
Prediction of 3D features
3DEpiLoop - prediction of 3D interactions from 1D epigenomic profiles using Random Forest trained on CTCF peaks (histone modifications and TFBSs are the most important predictors). https://bitbucket.org/4dnucleome/3depiloop
Al Bkhetan, Ziad, and Dariusz Plewczynski. “Three-Dimensional Epigenome Statistical Model: Genome-Wide Chromatin Looping Prediction.” Scientific Reports 8, no. 1 (December 2018).
SNIPER - 3D subcompartment (A1, A2, B1, B2, B3) identification from low-coverage Hi-C datasets. A neural network based on a denoising autoencoder (9 layers) and a multi-layer perceptron. Sigmoidal activation of inputs, ReLU, softmax on outputs. Dropout, binary cross-entropy. exp(-1/C) transformation of Hi-C matrices. Applied to Gm12878 and 8 additional cell types to compare subcompartment changes. Compared with Rao2014 annotations, outperforms Gaussian HMM and MEGABASE.
Xiong, Kyle, and Jian Ma. “Revealing Hi-C Subcompartments by Imputing High-Resolution Inter-Chromosomal Chromatin Interactions.” BioRxiv, January 1, 2018
TADBoundaryDectector - TAD boundary prediction from sequence only using deep learning models. Twelve architectures were tested; the one with three convolutional layers and an LSTM layer performed best. Methods, implementation in Keras-TensorFlow. Model evaluation using different criteria, 96% accuracy reported. Deep learning outperforms feature-based models, among which Boosted Trees, Random Forest, and elastic net logistic regression are the best performers. Data augmentation (aka feature engineering) by randomly shifting TAD boundary regions by a few base pairs (0-100). Tested on Drosophila data. https://github.com/lincshunter/TADBoundaryDectector
Henderson, John, Vi Ly, Shawn Olichwier, Pranik Chainani, Yu Liu, and Benjamin Soibam. “Accurate Prediction of Boundaries of High Resolution Topologically Associated Domains (TADs) in Fruit Flies Using Deep Learning.” Nucleic Acids Research, May 3, 2019.
SNP-oriented Hi-C analysis
iRegNet3D - Integrated Regulatory Network 3D (iRegNet3D) is a high-resolution regulatory network composed of all known transcription factor (TF)-TF and TF-DNA interaction interfaces, as well as chromatin-chromatin interactions and topologically associating domain (TAD) information from different cell lines. Goal: SNP interpretation. Input: one or several SNPs, as rsIDs or genomic coordinates. Output: for one or two SNPs, on-screen disease-related information, their connections over TF-TF and chromatin interaction networks, and whether they interact in 3D and are located within TADs. For multiple SNPs, the same information is downloadable as text files. http://iregnet3d.yulab.org/index/
Liang, Siqi, Nathaniel D. Tippens, Yaoda Zhou, Matthew Mort, Peter D. Stenson, David N. Cooper, and Haiyuan Yu. “IRegNet3D: Three-Dimensional Integrated Regulatory Network for the Genomic Analysis of Coding and Non-Coding Disease Mutations.” Genome Biology 18, no. 1 (December 2017)
3DSNP - 3DSNP database integrating SNP epigenomic annotations with chromatin loops. Linear closest gene, 3D interacting gene, eQTL, 3D interacting SNP, chromatin states, TFBSs, conservation. For individual SNPs. http://cbportal.org/3dsnp/
Lu, Yiming, Cheng Quan, Hebing Chen, Xiaochen Bo, and Chenggang Zhang. “3DSNP: A Database for Linking Human Noncoding SNPs to Their Three-Dimensional Interacting Genes.” Nucleic Acids Research 45, no. D1 (January 4, 2017)
HUGIn - tissue-specific linear display of Hi-C interactions at and around an anchor position. Overlays gene expression and epigenomic data. Associates SNPs with genes based on Hi-C interactions. http://yunliweb.its.unc.edu/HUGIn/
Martin, Joshua S, Zheng Xu, Alex P Reiner, Karen L Mohlke, Patrick Sullivan, Bing Ren, Ming Hu, and Yun Li. “HUGIn: Hi-C Unifying Genomic Interrogator.” Edited by Inanc Birol. Bioinformatics 33, no. 23 (December 1, 2017)
CNV and Structural variant detection
hicpipe and, alternatively, HiCnorm normalization preserve CNVs in Hi-C data. From Zhang et al., “Local and Global Chromatin Interactions Are Altered by Large Genomic Deletions Associated with Human Brain Development.”
hic_breakfinder - Detection of structural variants (SV) by integrating optical mapping, Hi-C, and WGS. Custom pipeline using LUMPY, Delly, Control-FREEC software. New Hi-C data on 14 cancer cell lines and 21 previously published datasets. Integration of the detected SVs with genomic annotations, including replication timing. Supplementary data with SVs resolved by individual methods and integrative approaches. https://github.com/dixonlab/hic_breakfinder
Dixon, Jesse R., Jie Xu, Vishnu Dileep, Ye Zhan, Fan Song, Victoria T. Le, Galip Gürkan Yardımcı, et al. “Integrative Detection and Analysis of Structural Variation in Cancer Genomes.” Nature Genetics, September 10, 2018.
HiCnv, HiCtrans - CNV and translocation calling from Hi-C data. CNV calling uses an HMM on data quantified per restriction site and 1D-normalized, accounting for low GC content (<0.2) and low mappability (<0.5). Translocation calling on binned inter-chromosomal matrices. https://github.com/ay-lab/HiCnv, https://github.com/ay-lab/HiCtrans
Chakraborty, Abhijit, and Ferhat Ay. “Identification of Copy Number Variations and Translocations in Cancer Cells from Hi-C Data.” Edited by Christina Curtis. Bioinformatics 34, no. 2 (January 15, 2018)
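The GC and mappability thresholds quoted above can be applied before CNV segmentation, as in this sketch. It is a minimal illustration under assumptions: bins are pre-annotated with coverage, GC fraction and mappability, low-GC/low-mappability bins are simply dropped, and scaling by the median coverage stands in for HiCnv's actual 1D normalization. The HMM step is not shown, and the function name is hypothetical.

```python
import statistics


def filter_and_normalize(bins, min_gc=0.2, min_map=0.5):
    """Drop low-GC / low-mappability bins, then scale coverage by the median.

    bins: list of dicts with 'coverage', 'gc' and 'mappability' keys.
    Returns (index, relative_coverage) for retained bins. In a mostly diploid
    genome a value near 1.0 suggests two copies and ~1.5 suggests three.
    """
    kept = [(i, b) for i, b in enumerate(bins)
            if b["gc"] >= min_gc and b["mappability"] >= min_map]
    if not kept:
        return []
    med = statistics.median([b["coverage"] for _, b in kept])
    if med == 0:
        return []
    return [(i, b["coverage"] / med) for i, b in kept]


bins = [
    {"coverage": 100, "gc": 0.40, "mappability": 0.90},
    {"coverage": 95, "gc": 0.38, "mappability": 0.95},
    {"coverage": 150, "gc": 0.45, "mappability": 0.92},
    {"coverage": 5, "gc": 0.10, "mappability": 0.90},    # dropped: GC < 0.2
    {"coverage": 105, "gc": 0.42, "mappability": 0.30},  # dropped: mappability < 0.5
]
res = filter_and_normalize(bins)
print(res)
```

The relative coverage track produced this way is what an HMM with copy-number states would then segment.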
HiNT - CNV and translocation detection from ~10-20% ambiguous chimeric reads in Hi-C data. Three tools: HiNT-Pre - preprocessing of Hi-C data; HiNT-CNV and HiNT-TL - CNV and translocation detection, respectively (accept HiC-Pro output). Tested on K562 (cancer) and Gm12878 (normal) data. Known biases are removed using a Poisson GAM. Outperforms Delly, Meerkat, hic_breakfinder, HiCtrans. Relatively little overlap with CNVs from WGS (BIC-seq2). Gold standard - FISH data from Dixon et al., “Integrative Detection and Analysis of Structural Variation in Cancer Genomes.” https://github.com/parklab/HiNT
Wang, Su, Soohyun Lee, Chong Chu, Dhawal Jain, Geoff Nelson, Jennifer M. Walsh, Burak H. Alver, and Peter J. Park. “HiNT: A Computational Method for Detecting Copy Number Variations and Translocations from Hi-C Data.” Preprint. Bioinformatics, June 3, 2019
TAD fusion score - quantifying the effect of deletions on Hi-C interactions. Introduction to the effect of TAD fusion on genome structure. TAD fusion score - the expected total number of changes in pairwise genomic interactions resulting from the deletion. TAD fusion events are under negative selection. https://github.com/HormozdiariLab/TAD-fusion-score
Huynh, Linh, and Fereydoun Hormozdiari. “Contribution of Structural Variation to Genome Structure: TAD Fusion Discovery and Ranking.” BioRxiv, March 9, 2018.
Visualization
Hi-C data visualization review. Good introduction to 3D genome organization, 115 key references. Table 2 lists Hi-C visualization tools.
Ing-Simmons, Elizabeth, and Juan M. Vaquerizas. “Visualising Three-Dimensional Genome Organisation in Two Dimensions.” Development 146, no. 19 (October 1, 2019)
HiCeekR - Shiny app and GUI for Hi-C data analysis and interpretation. Input - aligned BAM file with marked duplicates, restriction enzyme cutting sites (HRF5), genome in FASTA, optionally ChIP-seq BAM or RNA-seq gene expression (TSV). The workflow includes filtering (PCR artifacts, self-circle, dangling end fragments, using diffHiC) with diagnostic plots, binning interaction matrices in BED (coordinates) and TSV (counts) formats, normalization (ICE, WavSiS, using chromoR), calling A/B compartments (PCA, using HiTC), TADs (directionality index, TopDom, HiCseg), gene expression/epigenomic integration, network analysis and enrichment in GO, KEGG, other databases (using gProfileR). Visualization of zoomable heatmaps, networks (ggplot2, plotly, heatmaply, networkD3). Analysis starts with creating a configuration file. Compared with GITAR, HiCPro, HiC-bench, HiCdat, HiCexplorer, not Juicer or HiGlass. Illustrated using Rao2014 Gm12878 data. 32Gb RAM (minimum 16Gb) is sufficient; preprocessing of BAM files (Hi-C or ChIP-seq) is the longest step. https://github.com/lucidif/HiCeekR
Di Filippo, Lucio, Dario Righelli, Miriam Gagliardi, Maria Rosaria Matarazzo, and Claudia Angelini. “HiCeekR: A Novel Shiny App for Hi-C Data Analysis.” Frontiers in Genetics 10 (November 4, 2019): 1079.
HiCBricks - data format and visualization package. hdf5-based data storage format to handle large Hi-C matrices. Visualization of one or two Hi-C matrices, adding annotations. https://bioconductor.org/packages/HiCBricks/
Pal, Koustav, Ilario Tagliaferri, Carmen M Livi, and Francesco Ferrari. “HiCBricks: Building Blocks for Efficient Handling of Large Hi-C Datasets.” Edited by Inanc Birol. Bioinformatics, November 7, 2019
3D Genome Browser - visualizing existing Hi-C and other chromatin conformation capture data alongside genomic and epigenomic data. User data can be submitted in BUTLR format. http://promoter.bx.psu.edu/hi-c/
Wang, Yanli, Fan Song, Bo Zhang, Lijun Zhang, Jie Xu, Da Kuang, Daofeng Li, et al. “The 3D Genome Browser: A Web-Based Browser for Visualizing 3D Genome Organization and Long-Range Chromatin Interactions.” Genome Biology 19, no. 1 (December 2018)
DNARchitect - a Shiny App for visualizing genomic data (HiC, mRNA, ChIP, ATAC, etc.) in bed, bedgraph, and bedpe formats. Wraps Sushi R package. Web-version, http://shiny.immgen.org/DNARchitect/, GitHub, https://github.com/alosdiallo/DNA_Rchitect
Ramirez, R N, K Bedirian, S M Gray, and A Diallo. “DNA Rchitect: An R Based Visualizer for Network Analysis of Chromatin Interaction Data.” Edited by John Hancock. Bioinformatics, August 2, 2019
GENOVA - GENome Organisation Visual Analytics, an R package for rich visual analysis of Hi-C data. Input - HiC-Pro processed files, BED, text formats. Analysis of one or two experiments. Integration of external annotations, A/B compartments, cis-/trans-interactions, TADs and loops, genes, insulation score heatmap, differences.
HiGlass - visualization server for Google maps-style navigation of Hi-C maps. Overlay genes, epigenomic tracks. http://higlass.io/, https://github.com/higlass/higlass, and many HiGlass-related developments from the author, https://github.com/pkerpedjiev
Kerpedjiev, Peter, Nezar Abdennur, Fritz Lekschas, Chuck McCallum, Kasper Dinkla, Hendrik Strobelt, Jacob M Luber, et al. “HiGlass: Web-Based Visual Comparison And Exploration Of Genome Interaction Maps.” BioRxiv, 2017, 121889.
HiPiler - exploration and comparison of loops and domains as small-multiple snippet heatmaps. https://github.com/flekschas/hipiler
Lekschas, Fritz, Benjamin Bach, Peter Kerpedjiev, Nils Gehlenborg, and Hanspeter Pfister. “HiPiler: Visual Exploration of Large Genome Interaction Matrices with Interactive Small Multiples.” IEEE Transactions on Visualization and Computer Graphics 24, no. 1 (January 2018). TechBlog: HiPiler simplifies chromatin structure analysis
HiCPlotter - Hi-C visualization tool, allows for integrating various data tracks. https://github.com/kcakdemir/HiCPlotter
Akdemir, Kadir Caner, and Lynda Chin. “HiCPlotter Integrates Genomic Data with Interaction Matrices.” Genome Biology 16 (2015)
NAT - the 4D Nucleome Analysis Toolbox, for Hi-C data (text, cool format) normalization (ICE, Toeplitz, CNV-Toeplitz), TAD calling (Directionality index, Armatus, custom), karyotype abnormalities visualization on inter-chromosomal matrices, time-course visualization. Matlab. https://github.com/laseaman/4D_Nucleome_Analysis_Toolbox
Seaman, Laura, and Indika Rajapakse. “4D Nucleome Analysis Toolbox: Analysis of Hi-C Data with Abnormal Karyotype and Time Series Capabilities.” Bioinformatics (Oxford, England) 34, no. 1 (01 2018)
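ICE normalization (offered by NAT and several tools above) assumes each bin carries a multiplicative bias and repeatedly rescales the matrix until all rows have equal coverage. A minimal dense-matrix sketch in plain Python; the function name, iteration cap and tolerance are assumptions, and real implementations (e.g., in cooler or iced) typically also filter low-coverage bins before balancing.

```python
def ice_balance(matrix, n_iter=200, tol=1e-6):
    """ICE-style iterative correction of a symmetric contact matrix.

    Each pass computes s_i = (row sum i) / (mean nonzero row sum) and divides
    M[i][j] by s_i * s_j. Rows summing to zero keep s_i = 1. Returns the
    balanced matrix and the cumulative per-bin biases, so that
    balanced[i][j] == matrix[i][j] / (bias[i] * bias[j]).
    """
    n = len(matrix)
    m = [[float(x) for x in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        s = [x / mean if x > 0 else 1.0 for x in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= s[i] * s[j]
        bias = [b * si for b, si in zip(bias, s)]
        if max(abs(si - 1.0) for si in s) < tol:  # all rows already equal
            break
    return m, bias


balanced, bias = ice_balance([[10, 5, 1], [5, 8, 2], [1, 2, 4]])
print([round(sum(row), 4) for row in balanced])
```

Dividing by s_i * s_j rather than by s_i alone keeps the matrix symmetric at every iteration.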
HiC-3DViewer - an interactive web-based tool that provides an intuitive environment for 3D exploratory analysis of Hi-C data. It is based on Flask and can be run directly or as a Docker container.
Mohamed Nadhir, Djekidel, Wang, Mengjie, Michael Q. Zhang, Juntao Gao. “HiC-3DViewer: a new tool to visualize Hi-C data in 3D space.” Quantitative Biology (2017)
NuChart - gene-centric network of genes interacting in 3D. Integration of epigenomic features. Statistical network analysis. ftp://fileserver.itb.cnr.it/nuchart/
Merelli, Ivan, Pietro Liò, and Luciano Milanesi. “NuChart: An R Package to Study Gene Spatial Neighbourhoods with Multi-Omics Annotations.” PloS One 8, no. 9 (2013): e75146. https://doi.org/10.1371/journal.pone.0075146
HiTC - R package for high-throughput chromosome conformation capture analysis. Imports processed data from TXT/BED into GRanges. Quality control, visualization. Normalization, 45-degree rotation and visualization of triangle TADs, with annotations added at the bottom. PCA to detect A/B compartments. https://bioconductor.org/packages/HiTC/
Servant, Nicolas, Bryan R. Lajoie, Elphège P. Nora, Luca Giorgetti, Chong-Jian Chen, Edith Heard, Job Dekker, and Emmanuel Barillot. “HiTC: Exploration of High-Throughput ‘C’ Experiments.” Bioinformatics (Oxford, England) 28, no. 21 (November 1, 2012)
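The PCA-based A/B compartment detection mentioned for HiTC generally follows three steps: divide each contact by the mean contact at the same genomic distance (observed/expected), correlate rows, and take the leading eigenvector of the correlation matrix. A minimal plain-Python sketch of that general recipe, not HiTC's code; the function names, the power-iteration eigen solver and the toy matrix are assumptions. The eigenvector sign is arbitrary, so in practice it is oriented with gene density or GC content so that positive values mark A.

```python
import math
import random


def observed_over_expected(m):
    """Divide each entry by the mean of its diagonal (distance-dependent expected)."""
    n = len(m)
    expected = [sum(m[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]
    return [[m[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def pearson_matrix(m):
    """Pearson correlation between every pair of rows (0 for constant rows)."""
    n = len(m)
    centered = [[x - sum(row) / len(row) for x in row] for row in m]
    norms = [math.sqrt(sum(x * x for x in r)) for r in centered]
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(centered[i], centered[j]))
                corr[i][j] = dot / (norms[i] * norms[j])
    return corr


def first_eigenvector(c, n_iter=1000, seed=0):
    """Leading eigenvector by power iteration (a correlation matrix is PSD)."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in c]
    for _ in range(n_iter):
        w = [sum(cij * vj for cij, vj in zip(row, v)) for row in c]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def compartment_pc1(m):
    return first_eigenvector(pearson_matrix(observed_over_expected(m)))


labels = [1, 1, 1, -1, -1, -1, 1, 1, -1, -1]  # toy A (+1) / B (-1) assignment
n = len(labels)
toy = [[(2.0 if labels[i] == labels[j] else 1.0) / (1 + abs(i - j)) for j in range(n)]
       for i in range(n)]
pc1 = compartment_pc1(toy)
print(["+" if x > 0 else "-" for x in pc1])
```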
CoolBox - Jupyter notebook-based genomic data visualization toolkit utilizing pyGenomeTracks. https://github.com/GangCaoLab/CoolBox
pyGenomeTracks - python module to plot beautiful and highly customizable genome browser tracks, https://github.com/deeptools/pyGenomeTracks
TADKit - 3D Genome Browser. Main web site, http://sgt.cnag.cat/3dg/tadkit/, and GitHub, https://github.com/3DGenomes/TADkit
De novo genome scaffolding
instaGRAAL - reimplementation of GRAAL genome assembler (chromosome level) for large genomes. Similar MCMC approach, implemented on NVIDIA GPU. Tested, among others, on segments of the human genome. https://github.com/koszullab/instaGRAAL
Baudry, Lyam, Nadège Guiglielmoni, Hervé Marie-Nelly, Alexandre Cormier, Martial Marbouty, Komlan Avia, Yann Loe Mie, et al. “InstaGRAAL: Chromosome-Level Quality Scaffolding of Genomes Using a Proximity Ligation-Based Scaffolder.” Genome Biology 21, no. 1 (December 2020)
HiCAssembler - Hi-C scaffolding tool combining assembly using Hi-C data with scaffolds from regular (short- or long-read) sequencing. Uses strategies from LACHESIS and 3D-DNA. Visual adjustment of scaffolding errors. Automatic and manual misassembly correction. https://github.com/maxplanck-ie/HiCAssembler
Renschler, Gina, Gautier Richard, Claudia Isabelle Keller Valsecchi, Sarah Toscano, Laura Arrigoni, Fidel Ramirez, and Asifa Akhtar. “Hi-C Guided Assemblies Reveal Conserved Regulatory Topologies on X and Autosomes despite Extensive Genome Shuffling.” BioRxiv, March 18, 2019.
bin3C - resolving metagenome-assembled genomes from Hi-C data. Metagenomic assembly using SPAdes. Tested using simulated (Sim3C and MetaART) and real-life data. Performance metrics: adjusted mutual information, weighted Bcubed. Contact matrix where bins are contigs. Infomap method for clustering the whole-contig graph. Compared with ProxiMeta (Phase Genomics). https://github.com/cerebis/bin3C
DeMaere, Matthew Z., and Aaron E. Darling. “Bin3C: Exploiting Hi-C Sequencing Data to Accurately Resolve Metagenome-Assembled Genomes.” Genome Biology 20, no. 1 (December 2019)
3D-DNA - Hi-C genome assembler and its application/validation. Methods are in the supplementary material. https://github.com/theaidenlab/3D-DNA
Dudchenko, Olga, Sanjit S. Batra, Arina D. Omer, Sarah K. Nyquist, Marie Hoeger, Neva C. Durand, Muhammad S. Shamim, et al. “De Novo Assembly of the Aedes Aegypti Genome Using Hi-C Yields Chromosome-Length Scaffolds.” Science (New York, N.Y.) 356, no. 6333 (07 2017)
GRAAL - Genome (Re)Assembly Assessing Likelihood - genome assembly from Hi-C data. Addresses gaps in genome assemblies that can be filled by scaffolding. Superior to Lachesis and dnaTri, which are sensitive to duplications, to the clustering used to initially arrange the scaffolds, and to parameter choice, and whose reliability is unknown. A Bayesian approach: prior assumptions are that cis-contact probabilities follow a power-law decay and that counts in the interaction matrix are Poisson-distributed. Multiple genomic structures are tested using MCMC (Multiple-Try Metropolis algorithm) to maximize the likelihood of the data given a genomic structure. https://github.com/koszullab/GRAAL
Marie-Nelly, Hervé, Martial Marbouty, Axel Cournac, Jean-François Flot, Gianni Liti, Dante Poggi Parodi, Sylvie Syan, et al. “High-Quality Genome (Re)Assembly Using Chromosomal Contact Data.” Nature Communications 5 (December 17, 2014)
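The likelihood GRAAL maximizes can be illustrated by scoring candidate layouts: each cis contact count is treated as Poisson with mean a * s^(-alpha), where s is the distance between two fragments in the candidate layout. A minimal sketch under assumptions: a = 100 and alpha = 1 are arbitrary, trans contacts are ignored, names are hypothetical, and GRAAL itself explores layouts with MCMC rather than comparing two fixed ones.

```python
import math


def expected_contacts(s, a=100.0, alpha=1.0):
    """Power-law expected contact count at genomic distance s (s >= 1)."""
    return a * s ** (-alpha)


def layout_log_likelihood(counts, positions, a=100.0, alpha=1.0):
    """Poisson log-likelihood of observed cis counts under a candidate layout.

    counts: {(i, j): observed contacts} for fragment pairs.
    positions: {fragment: position in the candidate layout}, all distinct.
    """
    ll = 0.0
    for (i, j), n in counts.items():
        lam = expected_contacts(abs(positions[i] - positions[j]), a, alpha)
        ll += n * math.log(lam) - lam - math.lgamma(n + 1)  # log Poisson pmf
    return ll


# Toy data: five fragments whose true order is 0-1-2-3-4.
true_pos = {k: k for k in range(5)}
counts = {(i, j): round(expected_contacts(j - i)) for i in range(5) for j in range(i + 1, 5)}
swapped = {**true_pos, 1: 3, 3: 1}  # candidate layout with fragments 1 and 3 swapped

print(layout_log_likelihood(counts, true_pos), layout_log_likelihood(counts, swapped))
```

The layout matching the true order scores higher because its expected counts sit closest to the observed ones.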
dnaTri - genome scaffolding via probabilistic modeling using two constraints of Hi-C data: distance-dependent decay and the cis-trans ratio. Uses known chromosome scaffolds and de novo assembly. A naive Bayes classifier distinguishes contigs on the same chromosome from contigs on different chromosomes. Average linkage clustering assembles contigs into 23 chromosome groups. Placed 65 previously unplaced contigs. Data, https://github.com/NoamKaplan/dna-triangulation
Kaplan, Noam, and Job Dekker. “High-Throughput Genome Scaffolding from in Vivo DNA Interaction Frequency.” Nature Biotechnology 31, no. 12 (December 2013)
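The distance-dependent decay that dnaTri exploits can be turned into a distance estimate by fitting a power law to contig pairs with known positions and inverting it for a new pair. A minimal sketch, assuming a pure power law fitted by least squares in log-log space; the names are hypothetical, and the naive Bayes and clustering steps of the published method are not shown.

```python
import math


def fit_power_law(distances, contacts):
    """Least-squares fit of log(contacts) = log(a) - b * log(distance); returns (a, b)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in contacts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope


def estimate_distance(contact, a, b):
    """Invert contact = a * distance**(-b) to estimate genomic distance."""
    return (a / contact) ** (1.0 / b)


distances = [1e4, 1e5, 1e6, 1e7]           # toy known separations (bp)
contacts = [5e5 / d for d in distances]    # toy counts following a 1/d decay
a, b = fit_power_law(distances, contacts)
est = estimate_distance(10.0, a, b)
print(round(a), round(b, 3), round(est))
```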
Lachesis - a three-step genome scaffolding tool: 1) graph clustering of scaffolds to chromosome groups, 2) ordering clustered scaffolds (minimum spanning tree, reassembling longest-to-shortest branches), 3) assigning orientation (exact position and the decay of interactions). Duplications and repeat regions may be incorrectly ordered/oriented. Tested on normal human, mouse, and Drosophila genomes, and on the HeLa cancer genome. https://github.com/shendurelab/LACHESIS
Burton, Joshua N., Andrew Adey, Rupali P. Patwardhan, Ruolan Qiu, Jacob O. Kitzman, and Jay Shendure. “Chromosome-Scale Scaffolding of de Novo Genome Assemblies Based on Chromatin Interactions.” Nature Biotechnology 31, no. 12 (December 2013)
3D modeling
TADdyn - studying time-dependent dynamics of chromatin domains during natural and induced cell processes by simulating smooth 3D transitions of chromosome structure. A part of TADbit, developed by the Marti-Renom group. Tested on an in situ Hi-C time-course experiment (reprogramming of murine B cells to pluripotent cells), changes of 21 genomic loci. https://github.com/3DGenomes/TADbit/tree/TADdyn, Data and video, http://sgt.cnag.cat/3dg/datasets/
Di Stefano, Marco, Ralph Stadhouders, Irene Farabella, David Castillo, François Serra, Thomas Graf, and Marc A. Marti-Renom. “Transcriptional Activation during Cell Reprogramming Correlates with the Formation of 3D Open Chromatin Hubs.” Nature Communications 11, no. 1 (December 2020)
StoHi-C - 3D genome reconstruction using t-SNE. Python scripts for 3D embedding and visualization (plotly, matplotlib, Chart Studio). Visually compared with MDS-reconstructed genomes on fission yeast data (wild type, G1-arrested, rad21 mutation, clr4 deletion). https://github.com/kimmackay/StoHi-C
MacKay, Kimberly, and Anthony Kusalik. “StoHi-C: Using t-Distributed Stochastic Neighbor Embedding (t-SNE) to Predict 3D Genome Structure from Hi-C Data.” Preprint. Bioinformatics, January 29, 2020.
Hierarchical3DGenome - high-resolution (5kb) reconstruction of the 3D structure of the genome. Using LorDG (https://github.com/BDM-Lab/LorDG), first, assemble the 3D model at the level of TADs, then inside individual TADs. Gm12878 cell line, Arrowhead for TAD calling, KR and ICE normalization, benchmarking against miniMDS, five tests including comparison with FISH. https://github.com/BDM-Lab/Hierarchical3DGenome
Trieu, Tuan, Oluwatosin Oluwadare, and Jianlin Cheng. “Hierarchical Reconstruction of High-Resolution 3D Models of Large Chromosomes.” Scientific Reports 9, no. 1 (March 21, 2019): 4971.
CSynth - 3D genome interactive modeling on GPU, and visualization. http://csynth.org/
Todd, Stephen, Peter Todd, Simon J McGowan, James R Hughes, Yasutaka Kakui, Frederic Fol Leymarie, William Latham, and Stephen Taylor. “CSynth: A Dynamic Modelling and Visualisation Tool for 3D Chromatin Structure.” BioRxiv, January 1, 2019
GenomeFlow - a complete set of tools for Hi-C data alignment, normalization, 2D visualization, 3D genome modeling and visualization. ClusterTAD for TAD identification. LorDG and 3DMax for 3D genome reconstruction. https://github.com/jianlin-cheng/GenomeFlow
Trieu, Tuan, Oluwatosin Oluwadare, Julia Wopata, and Jianlin Cheng. “GenomeFlow: A Comprehensive Graphical Tool for Modeling and Analyzing 3D Genome Structure.” Bioinformatics (Oxford, England), September 12, 2018.
ShRec3D - shortest-path reconstruction in 3D. Genome reconstruction by translating a Hi-C matrix into a distance matrix, then multidimensional scaling. Uses binary contact maps. https://sites.google.com/site/julienmozziconacci/home/softwares
Lesne, Annick, Julien Riposo, Paul Roger, Axel Cournac, and Julien Mozziconacci. “3D Genome Reconstruction from Chromosomal Contacts.” Nature Methods 11, no. 11 (November 2014): 1141–43.
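The first step of ShRec3D, turning a binary contact map into a distance matrix by shortest paths, can be sketched with one breadth-first search per bin (on a binary map every edge has length 1, so shortest paths are hop counts). A minimal sketch with hypothetical names; the resulting matrix would then be embedded in 3D with any multidimensional scaling routine (e.g., scikit-learn's MDS with a precomputed dissimilarity matrix), after removing disconnected bins whose distance stays infinite.

```python
from collections import deque

INF = float("inf")


def shortest_path_distances(contacts):
    """All-pairs hop distances on the graph defined by a binary contact map.

    contacts: symmetric 0/1 matrix where contacts[i][j] == 1 means bins i and j
    are in contact. Pairs in disconnected components stay at INF.
    """
    n = len(contacts)
    neighbors = [[j for j in range(n) if j != i and contacts[i][j]] for i in range(n)]
    dist = [[INF] * n for _ in range(n)]
    for src in range(n):  # one breadth-first search per source bin
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] == INF:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


contacts = [
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
for row in shortest_path_distances(contacts):
    print(row)
```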
Papers
3D Genome, from technology to visualization - a GitBook by Xingzhao Wen and Sheng Zhong covering biological and computational aspects of 3D genomics and RNA-genome interactions
Methodological Reviews
pipelines_list.csv - A list of available pipelines, URLs, from Miura et al., “Practical Analysis of Hi-C Data”
pipeline_comparison.csv - Available analysis options in each pipeline, from Miura et al., “Practical Analysis of Hi-C Data”
Table summarizing functionality of Hi-C data analysis tools, from Calandrelli et al., “GITAR: An Open Source Tool for Analysis and Visualization of Hi-C Data”
Ay, Ferhat, and William S. Noble. “Analysis Methods for Studying the 3D Architecture of the Genome.” Genome Biology 16 (September 2, 2015) - Hi-C technology and methods review. Table 1 - list of tools. Biases, normalization, matrix balancing. Extracting significant contacts, obs/exp ratio, parametric (power-law, neg binomial, double exponential), non-parametric (splines). 3D enrichment. References. TAD identification, directionality index. Outlook, the importance of comparative analysis
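The directionality index reviewed by Ay and Noble (introduced by Dixon et al., 2012) compares upstream contacts A and downstream contacts B of each bin within a window: DI = ((B - A)/|B - A|) * ((A - E)^2/E + (B - E)^2/E), with E = (A + B)/2. A minimal plain-Python sketch; the window (in bins) is a free parameter, bins with A == B get DI = 0, and the function name is hypothetical. In Dixon et al., the DI track was then segmented with an HMM to call domains.

```python
def directionality_index(matrix, window):
    """Directionality index per bin of a symmetric contact matrix."""
    n = len(matrix)
    di = []
    for i in range(n):
        a = sum(matrix[i][j] for j in range(max(0, i - window), i))       # upstream
        b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))  # downstream
        e = (a + b) / 2.0
        if e == 0 or a == b:
            di.append(0.0)
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


# Toy map: two domains (bins 0-2 and 3-5), 10 contacts within, 1 between.
block = [[0 if i == j else (10 if (i < 3) == (j < 3) else 1) for j in range(6)]
         for i in range(6)]
print([round(x, 2) for x in directionality_index(block, window=2)])
```

DI turns strongly negative at the last bin of a domain and strongly positive at the first bin of the next one, which is what makes the sign switch a boundary marker.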
Chang, Pearl, Moloya Gohain, Ming-Ren Yen, and Pao-Yang Chen. “Computational Methods for Assessing Chromatin Hierarchy.” Computational and Structural Biotechnology Journal 16 (2018) - Review of higher-order (chromatin conformation capture) and primary order (DNAse, ATAC) technologies and analysis tools. Table 1 - technology summaries. Table 2 - tool summaries. Inter-chromosomal calls using binarized contact maps. Visualization. Primary order technologies - details and peak calling.
Nicoletti, Chiara, Mattia Forcato, and Silvio Bicciato. “Computational Methods for Analyzing Genome-Wide Chromosome Conformation Capture Data.” Current Opinion in Biotechnology 54 (December 2018) - 3C-Hi-C tools review. Table 1 lists and categorizes the main tools; Figure 1 displays all steps in technology and analysis (alignment, resolution, normalization including accounting for CNVs, A/B compartments, TAD detection, visualization). A concise description of all tools.
Pal, Koustav, Mattia Forcato, and Francesco Ferrari. “Hi-C Analysis: From Data Generation to Integration.” Biophysical Reviews, December 20, 2018. - Hi-C technology, data, 3D structures, analysis, and tools. Technology improvement and increasing resolution. FASTQ processing steps (“Hi-C data analysis: from FASTQ to interaction maps” section), pipelines, finding minimum resolution, normalization. Downstream analysis: A/B compartment detection, TAD callers, Hierarchical TADs, interaction callers. Data formats (pairix, sparse matrix format, cool, hic, butlr, hdf5, pgl). Hi-C visualization tools. Table 2 - summary and comparison of all tools
Yardımcı, Galip Gürkan, and William Stafford Noble. “Software Tools for Visualizing Hi-C Data.” Genome Biology 18, no. 1 (December 2017). - Hi-C technology, data, and visualization review. Suggestions about graph representation.
Waldispühl, Jérôme, Eric Zhang, Alexander Butyaev, Elena Nazarova, and Yan Cyr. “Storage, Visualization, and Navigation of 3D Genomics Data.” Methods, May 2018 - Review of tools for visualization of 3C-Hi-C data, challenges, analysis (Table 1). Data formats (hic, cool, BUTLR, ccmap). Database to quickly access 3D data. Details of each visualization tool in Section 4
Oluwadare, Oluwatosin, Max Highsmith, and Jianlin Cheng. “An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data.” Biological Procedures Online 21, no. 1 (December 2019) - 3D genome reconstruction review. Intro into equilibrium/fractal globule models. Classification of reconstruction methods: distance-, contact-, and probability-based. Table 1 summarizes many tools, methods, and references.
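Distance-based reconstruction methods convert contact frequencies into target distances, commonly with an inverse power such as d = c^(-alpha), and then search for 3D coordinates whose pairwise distances match them. A minimal sketch of that idea, assuming alpha = 1, an unweighted squared-error loss and plain gradient descent from a random start; it is a generic illustration, not any specific tool from the review, and all names are hypothetical.

```python
import math
import random


def contacts_to_distances(contacts, alpha=1.0):
    """Convert contacts to target distances d = c**(-alpha); zero contacts -> None."""
    return [[c ** (-alpha) if c > 0 else None for c in row] for row in contacts]


def stress(coords, targets):
    """Sum of squared differences between model and target distances (i < j)."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if targets[i][j] is not None:
                total += (math.dist(coords[i], coords[j]) - targets[i][j]) ** 2
    return total


def reconstruct(contacts, alpha=1.0, steps=500, lr=0.01, seed=0):
    """Fit 3D coordinates to target distances by plain gradient descent on stress."""
    targets = contacts_to_distances(contacts, alpha)
    n = len(contacts)
    rng = random.Random(seed)
    coords = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = targets[i][j]
                r = math.dist(coords[i], coords[j])
                if d is None or r < 1e-12:
                    continue
                g = 2.0 * (r - d) / r  # derivative of (r - d)^2 w.r.t. r, over r
                for k in range(3):
                    diff = g * (coords[i][k] - coords[j][k])
                    grad[i][k] += diff
                    grad[j][k] -= diff
        coords = [[x - lr * gx for x, gx in zip(c, gc)] for c, gc in zip(coords, grad)]
    return coords, targets


toy = [[1.0 / abs(i - j) if i != j else 0.0 for j in range(6)] for i in range(6)]
start, targets = reconstruct(toy, steps=0)   # same seed -> the random starting structure
fitted, _ = reconstruct(toy)
print(round(stress(start, targets), 2), "->", round(stress(fitted, targets), 2))
```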
General Reviews
Bouwman, Britta A. M., and Wouter de Laat. “Getting the Genome in Shape: The Formation of Loops, Domains and Compartments.” Genome Biology 16 (August 10, 2015) - TAD/loop formation review. Convergent CTCF, cohesin, mediator, different scenarios of loop formation. Stability and dynamics of TADs. Rich source of references.
Chakraborty, Abhijit, and Ferhat Ay. “The Role of 3D Genome Organization in Disease: From Compartments to Single Nucleotides.” Seminars in Cell & Developmental Biology 90 (June 2019): 104–13. - 3D genome structure and disease. Evolution of technologies from FISH to variants of chromatin conformation capture. Hierarchical 3D organization, Table 1 summarizes each layer and its involvement in disease. Rearrangement of TADs/loops in cancer and other diseases. Specific examples of the biological importance of TADs, loops as means of distal communication.
Zheng, H., and Xie, W. (2019). “The role of 3D genome organization in development and cell differentiation.” Nat. Rev. Mol. Cell Biol. - 3D structure of the genome and its changes during gametogenesis, embryonic development, lineage commitment, differentiation. Changes in developmental disorders and diseases. Chromatin compartments and TADs. Chromatin changes during X chromosome inactivation. Promoter-enhancer interactions established during development are accompanied by gene expression changes. Polycomb-mediated interactions may repress developmental genes. References to many studies.
Yu, Miao, and Bing Ren. “The Three-Dimensional Organization of Mammalian Genomes.” Annual Review of Cell and Developmental Biology 33 (06 2017) - 3D genome structure review. The role of gene promoters, enhancers, and insulators in regulating gene expression. Imaging-based tools, all flavors of chromatin conformation capture technologies. 3D features - chromosome territories, topologically associated domains (TADs), the association of TAD boundaries with replication domains, CTCF binding, transcriptional activity, housekeeping genes, genome reorganization during mitosis. Use of 3D data to annotate noncoding GWAS SNPs. 3D genome structure change in disease.
Fraser, J., C. Ferrai, A. M. Chiariello, M. Schueler, T. Rito, G. Laudanno, M. Barbieri, et al. “Hierarchical Folding and Reorganization of Chromosomes Are Linked to Transcriptional Changes in Cellular Differentiation.” Molecular Systems Biology 11, no. 12 (December 23, 2015) - 3D genome organization parts. Well-written and detailed. References. Technologies: FISH, 3C, 4C, 5C, Hi-C, GCC, TCC, ChIA-PET. Typical resolution - 40bp to 1Mb. LADs - conserved, but some are cell type-specific. Chromosome territories. Cell type-specific. Inter-chromosomal interactions may be important to define cell-specific interactions. A/B compartments identified by PCA. Chromatin loops, marked by CTCF and Cohesin binding, sometimes, with Mediator. Transcription factories
Dekker, Job, Marc A. Marti-Renom, and Leonid A. Mirny. “Exploring the Three-Dimensional Organization of Genomes: Interpreting Chromatin Interaction Data.” Nature Reviews. Genetics 14, no. 6 (June 2013) - 3D genome review. Chromosomal territories, transcription factories. Details of each 3C technology. Power-law decay of interaction frequencies with genomic distance. Box 2: A/B compartments (several Mb), TAD definition, size (hundreds of kb). TADs are largely stable, A/B compartments are tissue-specific. Adjacent TADs are not necessarily of opposing signs, may jointly form A/B compartments. Gene co-expression and enhancer-promoter interactions are confined to TADs. 3D modeling.
Witten, Daniela M., and William Stafford Noble. “On the Assessment of Statistical Significance of Three-Dimensional Colocalization of Sets of Genomic Elements.” Nucleic Acids Research 40, no. 9 (May 2012)
Technology
Review of Hi-C, Capture Hi-C, and Capture-C technologies and their computational preprocessing. Experimental protocols, similarities and differences, types of reads (figures), details of alignment, read orientation, elimination of artifacts, quality metrics. A brief overview of preprocessing tools. Example preprocessing of the three types of data. Diachromatic (Differential Analysis of Chromatin Interactions by Capture), a Java tool for preprocessing all three types of data; GOPHER (Generator Of Probes for capture Hi-C Experiments at high Resolution) for genome digestion and probe design
Hansen, Peter, Michael Gargano, Jochen Hecht, Jonas Ibn-Salem, Guy Karlebach, Johannes T. Roehr, and Peter N. Robinson. “Computational Processing and Quality Control of Hi-C, Capture Hi-C and Capture-C Data.” Genes 10, no. 7 (July 18, 2019): 548.
Chromosome conformation capture technologies, 4C, 5C, Hi-C, ChIP-loop, ChIA-PET. From microscopy observations (constrained movement of genomic loci, LADs, preferential stability of chromosome conformation and its independence from transcription), to technology details (Figure 1). Examples of alpha- and beta-globin locus studies by different technologies, X chromosome inactivation, HOXA-d gene clusters. The future vision of single-cell, single-allele investigation of chromatin interactions.
Wit, E. de, and W. de Laat. “A Decade of 3C Technologies: Insights into Nuclear Organization.” Genes & Development 26, no. 1 (January 1, 2012)
Review of technologies for studying the 3D structure of the genome. From microscopy to 3C techniques revealing CTCF and cohesin as the key proteins for establishing chromatin loops. TADs are unlikely over large distances >>1Mb. Details of 3C, 4C, 5C, Hi-C, ChIP-PET, and other derivatives. A/B compartments and their subdivision. TADs, their conservation, ~35-50% still seem to change. CTCF (directionality of binding important) and cohesin. Diseases and the 3D genome, examples. Key steps in data analysis and interpretation, software, visualization. Hi-C data specifics - chimeric reads, mapping, data representation as fixed or enzyme-sized bins, normalization, detection of A/B compartments and TAD boundaries, significant interactions. Hi-C analysis tools: HiC-Pro, HiCUP, HOMER, Juicer. Tools for 3D modeling.
Denker, Annette, and Wouter de Laat. “The Second Decade of 3C Technologies: Detailed Insights into Nuclear Organization.” Genes & Development 30, no. 12 (June 15, 2016)
scsHi-C - sister-chromatid-sensitive Hi-C to explore interactions between sister chromatids. Distinguishes cis from trans sister contacts based on 4-thio-thymidine (4sT) labeling. Paired organization of sister chromatids in interphase and complete separation in mitosis. TADs that exhibit tight pairing are heterochromatic, marked by H3K27me3. Chromatids are predominantly linked at TAD boundaries and are more flexible within TADs. Investigation of the looping mechanism by NIPBL depletion and Sororin degradation. Jupyter notebooks for each analysis https://github.com/gerlichlab/scshic_analysis
Mitter, Michael, Catherina Gasser, Zsuzsanna Takacs, Christoph C. H. Langer, Wen Tang, Gregor Jessberger, Charlie T. Beales, et al. “Sister-Chromatid-Sensitive Hi-C Reveals the Conformation of Replicated Human Chromosomes.” Preprint. Cell Biology, March 11, 2020.
4C technology, wet-lab protocol, and data analysis and visualization. R-based processing pipeline pipe4C, configuration parameters
Krijger, Peter H.L., Geert Geeven, Valerio Bianchi, Catharina R.E. Hilvering, and Wouter de Laat. “4C-Seq from Beginning to End: A Detailed Protocol for Sample Preparation and Data Analysis.” Methods 170 (January 2020)
SisterC Hi-C technology to test interactions between sister chromatids. Uses BrdU incorporation in S-phase and single-strand degradation by UV/Hoechst treatment to obtain inter-sister or intra-sister interactions. Findings about the alignment of chromatids, strong at centromeres, looser (~35kb spaced) interactions along arms, loops up to 50kb. Tested on the S. cerevisiae genome. distiller-nf, pairtools, cooltools for processing.
“Detecting Chromatin Interactions along and between Sister Chromatids with SisterC,” bioRxiv 2020.03.10
Pore-C chromatin conformation capture using Oxford Nanopore Technologies, direct sequencing of multi-way chromatin contacts without amplification (concatemers, HOLR - high-order and long-range contacts). >18 times higher efficiency as compared with SPRITE, enrichment in concatemers. Tested on the Gm12878 cell line. Hi-C matrix from Pore-C well resembles published data. Concatemers show significantly lower distance decay. Concatemers better resolve complex cancer rearrangements, well-suited for de novo genome scaffolding. Pore-C tools and a Snakemake pipeline, detection of multi-way interactions
Ulahannan, Netha, Matthew Pendleton, Aditya Deshpande, Stefan Schwenk, Julie M. Behr, Xiaoguang Dai, Carly Tyer, et al. “Nanopore Sequencing of DNA Concatemers Reveals Higher-Order Features of Chromatin Structure.” Preprint. Genomics, November 7, 2019.
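Pore-C reads contain several ligated fragments (concatemers), whereas a Hi-C matrix stores pairwise contacts. One straightforward way to build a Hi-C-like matrix from them is to expand each concatemer into all pairs of distinct bins it touches, so a concatemer spanning k bins yields k(k-1)/2 pairs. A minimal sketch under assumptions: each aligned fragment is given as a hypothetical (chromosome, position) tuple, fragments falling into the same bin are counted once, and this is not the Pore-C tools' implementation.

```python
from collections import Counter
from itertools import combinations


def concatemers_to_pairs(concatemers, bin_size):
    """Count pairwise bin contacts implied by multi-way concatemers.

    Each concatemer is a list of (chrom, position) alignments; every pair of
    distinct bins within one concatemer is counted once.
    """
    counts = Counter()
    for conc in concatemers:
        bins = sorted({(chrom, pos // bin_size) for chrom, pos in conc})
        for a, b in combinations(bins, 2):
            counts[(a, b)] += 1
    return counts


conc = [
    [("chr1", 100), ("chr1", 5000), ("chr1", 12000)],  # 3-way contact -> 3 pairs
    [("chr1", 150), ("chr1", 5100)],                   # 2-way contact -> 1 pair
]
pairs = concatemers_to_pairs(conc, bin_size=1000)
for key, value in sorted(pairs.items()):
    print(key, value)
```

The pairwise projection discards which contacts co-occurred in one molecule, which is exactly the higher-order information that makes concatemers useful beyond a standard Hi-C matrix.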
Tiled-C low-input 3C technology, requiring >20,000 cells. Applied for in vivo mouse erythroid differentiation, alpha-globin genes. TADs are pre-existing, regulatory interactions gradually form during differentiation. Integration with scRNA-seq data (CITE-seq technology) and ATAC-seq data. Analyzed using CCseqBasic pipeline and TiledC. Data (Tiled-C, CITE-seq, ATAC-seq)
Oudelaar, A. Marieke, Robert A. Beagrie, Matthew Gosden, Sara de Ornellas, Emily Georgiades, Jon Kerry, Daniel Hidalgo, et al. “Dynamics of the 4D Genome during Lineage Specification, Differentiation and Maturation in Vivo.” Preprint. Genomics, September 10, 2019.
Methyl-HiC - technology combining in situ Hi-C and WGBS. Comparable Hi-C matrices and TADs. Covers 20% fewer CpGs overall, but more CpGs in open chromatin. Methylation of spatially proximal CpGs is correlated irrespective of loop anchors; the correlation is weaker for inter-chromosomal interactions. Applied to single mouse ESCs under different conditions, yielding relevant clusters with cluster-specific genes. Methods for wet-lab and computational processing. Bulk (replicates) and single-cell Methyl-HiC data. Scripts at https://bitbucket.org/dnaase/bisulfitehic/src/master/; Bhmem pipeline to map bisulfite-converted reads, Juicer pipeline for processing, VC normalization, HiCRep for matrix similarity at 1Mb.
Li, Guoqiang, Yaping Liu, Yanxiao Zhang, Naoki Kubo, Miao Yu, Rongxin Fang, Manolis Kellis, and Bing Ren. “Joint Profiling of DNA Methylation and Chromatin Architecture in Single Cells.” Nature Methods, August 5, 2019.
Review of chromatin conformation capture technologies, from image-based methods (FISH), through proximity ligation (3/4/5C, Hi-C, TCC, ChIA-PET, scHi-C), to ligation-free methods (GAM, SPRITE, ChIA-Drop). Details of each technology (Table 1, Figures) and their comparison (Table 2).
Kempfer, Rieke, and Ana Pombo. “Methods for Mapping 3D Chromosome Architecture.” Nature Reviews Genetics, December 17, 2019.
Optimization steps for Hi-C wet-lab protocol. Pitfalls and their effect on the downstream quality. Recommendations for each step.
Golloshi, Rosela, Jacob Sanders, and Rachel Patton McCord. “Iteratively Improving Hi-C Experiments One Step at a Time.” Preprint. Genomics, March 22, 2018.
DLO Hi-C technology (digestion-ligation-only Hi-C). Uses two rounds of digestion and ligation, without biotin labeling and pull-down. Allows early evaluation of Hi-C library quality. Cost-effective, with a high signal-to-noise ratio. Tested on THP-1 (human monocytes) and K562 cells. Data processed with ChIA-PET Tool and normalized with ICE.
Lin, Da, Ping Hong, Siheng Zhang, Weize Xu, Muhammad Jamal, Keji Yan, Yingying Lei, et al. “Digestion-Ligation-Only Hi-C Is an Efficient and Cost-Effective Method for Chromosome Conformation Capture.” Nature Genetics 50, no. 5 (May 2018)
HIPMap - high-throughput imaging and analysis pipeline to map the location of gene loci within the 3D space. FISH in a 384-well plate format, automated imaging.
Shachar, Sigal, Gianluca Pegoraro, and Tom Misteli. “HIPMap: A High-Throughput Imaging Method for Mapping Spatial Gene Positions.” Cold Spring Harbor Symposia on Quantitative Biology 80 (2015)
Hi-C method description, Gm06990 cells, 1Mb and 100kb resolution. Small chromosomes, except chr18, preferentially interact with each other. Compartment A is associated with open chromatin.
Lieberman-Aiden, Erez, Nynke L. van Berkum, Louise Williams, Maxim Imakaev, Tobias Ragoczy, Agnes Telling, Ido Amit, et al. “Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome.” Science (New York, N.Y.) 326, no. 5950 (October 9, 2009)
3C technology: the matrix of interaction frequencies and its use to reveal spatial information, applied to the yeast (S. cerevisiae) genome. Interphase and metaphase chromosomes show different patterns of interactions. Distance-dependent decay of interaction frequencies. Basic observations on chromosome size and inter-chromosomal interactions.
Dekker, Job, Karsten Rippe, Martijn Dekker, and Nancy Kleckner. “Capturing Chromosome Conformation.” Science (New York, N.Y.) 295, no. 5558 (February 15, 2002)
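The distance-dependent decay of interaction frequencies noted above can be computed from any binned, symmetric contact matrix by averaging each diagonal. This is a minimal sketch. It assumes a dense matrix stored as a list of lists, with distances measured in bins. `contact_decay` is a hypothetical helper, not part of any published tool.

```python
from typing import List


def contact_decay(matrix: List[List[float]]) -> List[float]:
    """Mean contact frequency at each genomic separation s (in bins).

    Assumes a square, symmetric contact matrix; diagonal s holds every
    pair of bins (i, i + s) separated by s bins.
    """
    n = len(matrix)
    decay = []
    for s in range(n):
        values = [matrix[i][i + s] for i in range(n - s)]
        decay.append(sum(values) / len(values))
    return decay


# Toy 4x4 matrix whose counts halve with each extra bin of separation.
toy = [
    [8, 4, 2, 1],
    [4, 8, 4, 2],
    [2, 4, 8, 4],
    [1, 2, 4, 8],
]
print(contact_decay(toy))  # prints [8.0, 4.0, 2.0, 1.0]
```

Plotting the result on log-log axes gives the familiar contact probability P(s) curve. Real pipelines usually bin s logarithmically rather than reporting every diagonal.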
Micro-C
Ultra-deep Micro-C maps of human embryonic stem cells and fibroblasts. Compared with in situ Hi-C (DpnII, a 4bp cutter), Micro-C detects ~20,000 additional loops and has an improved signal-to-noise ratio. Similar distance-dependent decay, recovery of A/B compartments, better recovery of close-range interactions. High-resolution interaction boundaries are not created equal - most are CTCF+ and YY1+, some are CTCF- and YY1+, some CTCF- and YY1-, and some completely negative. Multiple weak pause sites of SMC complexes. Distiller-nf processing, ICE normalization, other Mirnylab tools. Data at https://data.4dnucleome.org/
Krietenstein, Nils, Sameer Abraham, Sergey V. Venev, Nezar Abdennur, Johan Gibcus, Tsung-Han S. Hsieh, Krishna Mohan Parsi, et al. “Ultrastructural Details of Mammalian Chromosome Architecture.” Molecular Cell 78, no. 3 (May 2020)
Micro-C (MNase digestion Hi-C) data analysis. Nucleosome-level (~100-200bp) resolution Hi-C that captures all structures seen in regular Hi-C data. Stripes/flames correspond to enhancer-promoter interactions and colocalize with Pol II and CTCF. TADs are further split into “micro TADs” (insulation score). Active boundaries colocalize with CpG islands, promoters, and tRNA genes; inactive boundaries lie in repeats. Micro TADs are subdivided into five subgroups. Tetranucleosome stacks consistent with a two-start zig-zag model. Mouse stem cells, 38 biological replicates.
Hsieh, Tsung-Han S., Claudia Cattoglio, Elena Slobodyanyuk, Anders S. Hansen, Oliver J. Rando, Robert Tjian, and Xavier Darzacq. “Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin Folding.” Molecular Cell, March 2020
Micro-C (MNase digestion Hi-C) technology and basic analysis. Human embryonic stem cells H1-ESC and differentiated human foreskin fibroblasts (HFFc6). Captures standard Hi-C features, with many additional interaction peaks (“dots”). Enrichment of classical marks of TAD boundaries (Fig 3C) - RAD21, TAF1, PHF8, CTCF, TBP, POLR2A, YY1, and more.
Krietenstein, Nils, Sameer Abraham, Sergey Venev, Nezar Abdennur, Johan Gibcus, Tsung-Han Hsieh, Krishna Mohan Parsi, et al. “Ultrastructural Details of Mammalian Chromosome Architecture.” Preprint. Genomics, May 17, 2019.
Micro-C technology - mononucleosome-resolution mapping in yeast, using micrococcal nuclease to fragment chromatin. Yeast does not have TADs, but Micro-C revealed self-associating domains (chromatin interaction domains, CIDs) driven by the number of genes. Enrichment of histone modifications. Data available.
Hsieh, Tsung-Han S., Assaf Weiner, Bryan Lajoie, Job Dekker, Nir Friedman, and Oliver J. Rando. “Mapping Nucleosome Resolution Chromosome Folding in Yeast by Micro-C.” Cell 162, no. 1 (July 2015)
Multi-way interactions
Multi-contact 3C (MC-3C, based on conventional 3C, C-walk, and multi-contact 4C approaches) technology reveals distinct chromosome territories with very little mixing and no entanglement; the same holds for chromosomal compartment domains (A-A and B-B interactions predominate, A-B interactions are minimal to none). Analysis of C-walks (connected paths of pairwise interactions), compared with C-walks generated from Hi-C data. Permutation analysis of the significance of insulated, mixed, and intermediate domains. PacBio sequencing, processed with the SMRT Analysis package and a custom pipeline.
Tavares-Cadete, Filipe, Davood Norouzi, Bastiaan Dekker, Yu Liu, and Job Dekker. “Multi-Contact 3C Data Reveal That the Human Genome Is Largely Unentangled.” Preprint. Genomics, March 4, 2020.
MC-4C multi-way contact technology and computational protocols. Takes ~2 weeks and ~$600/sample; best suited to regions <120kb. Computational protocol with test data included.
Vermeulen, Carlo, Amin Allahyar, Britta A. M. Bouwman, Peter H. L. Krijger, Marjon J. A. M. Verstegen, Geert Geeven, Christian Valdes-Quezada, et al. “Multi-Contact 4C: Long-Molecule Sequencing of Complex Proximity Ligation Products to Uncover Local Cooperative and Competitive Chromatin Topologies.” Nature Protocols 15, no. 2 (February 2020)
MC-4C - multi-way interaction technology using Nanopore MinION (or PacBio) sequencing. Cross-linking, digestion with four-cutter and six-cutter enzymes, circularization, cutting with Cas9 and gRNAs designed to the viewpoint region, and selective amplification of concatemers with viewpoint-specific primers. A rigorous filtering strategy; each read captures interactions from an individual allele. Compared with the genome-wide multi-contact technologies C-walks, SPRITE, GAM, and Tri-C. Applied to the mouse beta-globin locus (fetal liver, where hemoglobin genes are expressed, and brain, where they are silent) and the protocadherin-alpha locus (same tissues, vice versa). Super-enhancers can form hubs that target multiple genes. WAPL deletion in HAP1 (leukemia-derived) cells stimulates the collision of CTCF-anchored domain loops into rosette-like structures. MC-4C processing pipeline, visualization of the analyzed data, raw data, processed data matrices.
Allahyar, Amin, Carlo Vermeulen, Britta A. M. Bouwman, Peter H. L. Krijger, Marjon J. A. M. Verstegen, Geert Geeven, Melissa van Kranenburg, et al. “Enhancer Hubs and Loop Collisions Identified from Single-Allele Topologies.” Nature Genetics 50, no. 8 (August 2018)
GAM (genome architecture mapping) - restriction- and ligation-free chromatin conformation capture technology. Isolates and sequences the DNA content of many ultra-thin (~0.22 µm) cryo-fixed nuclear slices after whole-genome amplification. A ~30kb matrix is built from frequencies of co-occurrence of regions across slices. Applied to mouse ESCs. Normalization using linkage disequilibrium (better than ICE). GAM and Hi-C matrices are highly correlated; A/B compartments and TADs significantly overlap. Enhancers and active genes significantly interact. Multi-way interactions: super-enhancers are enriched in three-way interactions (triplets), confirmed by FISH. SLICE (Statistical Inference of Co-segregation) mathematical model to assign significance of interactions (Supplementary Note 1). Negative binomial modeling of significant interactions, log-normal noise. Supplementary Table 2 - genomic coordinates of triplet (three-way interacting) TADs. Raw data, processed matrices, Python scripts.
Beagrie, Robert A., Antonio Scialdone, Markus Schueler, Dorothee C. A. Kraemer, Mita Chotalia, Sheila Q. Xie, Mariano Barbieri, et al. “Complex Multi-Enhancer Contacts Captured by Genome Architecture Mapping.” Nature 543, no. 7646 (March 23, 2017)
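GAM scores a pair of loci by how often both are detected in the same nuclear slice. The entry above mentions normalization using linkage disequilibrium. The sketch below assumes that normalization is Lewontin's normalized D', although the exact formula in the paper may differ. Per-slice detection calls are taken as boolean lists, and `normalized_ld` is a hypothetical name.

```python
from typing import Sequence


def normalized_ld(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Lewontin's D' for co-detection of two loci across GAM nuclear slices.

    a[k] and b[k] say whether locus A and locus B were detected in slice k.
    Returns a value in [-1, 1]; 0 means the loci co-segregate as expected by chance.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("a and b must be non-empty and of equal length")
    f_a = sum(a) / n                                      # detection frequency of A
    f_b = sum(b) / n                                      # detection frequency of B
    f_ab = sum(1 for x, y in zip(a, b) if x and y) / n    # co-detection frequency
    d = f_ab - f_a * f_b                                  # raw linkage disequilibrium
    # Largest |D| achievable given the two marginal frequencies.
    if d >= 0:
        d_max = min(f_a * (1 - f_b), (1 - f_a) * f_b)
    else:
        d_max = min(f_a * f_b, (1 - f_a) * (1 - f_b))
    return 0.0 if d_max == 0 else d / d_max


slices_a = [True, True, False, False]
print(normalized_ld(slices_a, [True, True, False, False]))   # 1.0: always together
print(normalized_ld(slices_a, [True, False, True, False]))   # 0.0: independent
print(normalized_ld(slices_a, [False, False, True, True]))   # -1.0: never together
```

Dividing by the largest D achievable for the given detection frequencies removes the dependence of raw co-detection counts on how often each locus is captured at all. That dependence is the bias this kind of normalization is meant to address.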
C-walks - genome-wide multi-way technology. TADs organize chromosomal territories. Active and inactive TAD properties. Methods: a good mathematical description of insulation score calculation. TADs smaller than 250kb are filtered out. Inter-chromosomal contacts are rare, ~7-10%. Concatemers (more than two contacts) are unlikely.
Olivares-Chauvet, Pedro, Zohar Mukamel, Aviezer Lifshitz, Omer Schwartzman, Noa Oded Elkayam, Yaniv Lubling, Gintaras Deikus, Robert P. Sebra, and Amos Tanay. “Capturing Pairwise and Multi-Way Chromosomal Conformations Using Chromosomal Walks.” Nature 540, no. 7632 (November 30, 2016)
TM3C - Tethered multiple 3C technology to probe multi-point contacts. Applied to NHEK and KBM7 cells: detected the Philadelphia chromosome, investigated triple contacts in the IGF2-H19 locus at 40kb, detected typical genomic structures (chromosomal compartments, distance decay, TADs), and reconstructed the 3D genome at 1Mb resolution. A two-phase mapping strategy separately maps chimeric subsequences within a single read (Methods). Uses multiple 4-cutter restriction enzymes.
Ay, Ferhat, Thanh H Vu, Michael J Zeitz, Nelle Varoquaux, Jan E Carette, Jean-Philippe Vert, Andrew R Hoffman, and William S Noble. “Identifying Multi-Locus Chromatin Contacts in Human Cells Using Tethered Multiple 3C.” BMC Genomics 16, no. 1 (2015)
INGRID - chromatin conformation capture technology using in-gel replication of interacting DNA segments. Detects multi-way interactions. Protocol, demonstrated on the beta-globin gene locus.
Gavrilov, Alexey A., Helena V. Chetverina, Elina S. Chermnykh, Sergey V. Razin, and Alexander B. Chetverin. “Quantitative Analysis of Genomic Element Interactions by Molecular Colony Technique.” Nucleic Acids Research 42, no. 5 (March 1, 2014)
Normalization
Lyu, Hongqiang, Erhu Liu, and Zhifang Wu. “Comparison of Normalization Methods for Hi-C Data.” BioTechniques 68, no. 2 (2020) - a comprehensive analysis of six Hi-C normalization methods for their ability to remove systematic biases. The introduction provides a good classification and overview of different normalization methods, including the latest methods for cross-sample normalization, such as “multiHiCcompare.” Human and mouse Hi-C data were used, only cis interaction matrices are considered. A systematic protocol for benchmarking is presented. Several benchmarks were performed, including statistical quality, the influence of resolution, consistency of distance-dependent changes in interaction frequency, reproducibility of the 3D architecture. multiHiCcompare is reported as outperforming other methods on a range of performance metrics.
Imakaev, Maxim, Geoffrey Fudenberg, Rachel Patton McCord, Natalia Naumova, Anton Goloborodko, Bryan R. Lajoie, Job Dekker, and Leonid A. Mirny. “Iterative Correction of Hi-C Data Reveals Hallmarks of Chromosome Organization.” Nature Methods 9, no. 10 (October 2012) - ICE - Iterative Correction and Eigenvector decomposition, normalization of Hi-C data. Assumption: all loci should have equal visibility. Decomposition of the corrected maps into eigenvectors/eigenvalues.
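ICE, as summarized above, rests on the equal-visibility assumption: after correction, every bin should have the same total coverage. This is a minimal sketch of the iterative-correction part only, and the eigenvector decomposition step is not shown. It assumes a dense symmetric matrix stored as a list of lists and runs a fixed number of iterations. `ice_balance` is a hypothetical name.

```python
from typing import List, Tuple


def ice_balance(
    matrix: List[List[float]], n_iter: int = 50
) -> Tuple[List[List[float]], List[float]]:
    """Simplified iterative correction of a symmetric contact matrix.

    Repeatedly divides each cell (i, j) by delta[i] * delta[j], where delta
    is the bin coverage relative to the mean coverage, so that all bins end
    up with (approximately) equal total coverage, i.e. equal visibility.
    Returns the balanced matrix and the cumulative per-bin bias, with
    balanced[i][j] == matrix[i][j] / (bias[i] * bias[j]).
    """
    n = len(matrix)
    balanced = [list(map(float, row)) for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        coverage = [sum(row) for row in balanced]
        nonzero = [c for c in coverage if c > 0]
        if not nonzero:
            break
        mean_cov = sum(nonzero) / len(nonzero)
        # Empty bins get delta = 1 so they are left untouched.
        delta = [c / mean_cov if c > 0 else 1.0 for c in coverage]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= delta[i] * delta[j]
        bias = [b * d for b, d in zip(bias, delta)]
    return balanced, bias


raw = [[4, 2, 1], [2, 4, 2], [1, 2, 4]]
balanced, bias = ice_balance(raw)
print([round(sum(row), 4) for row in balanced])  # row sums are now equal
```

Production implementations, such as the balancing in cooler, also filter low-coverage bins and stop once the variance of the marginals drops below a tolerance. This sketch instead runs a fixed number of iterations.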
Yaffe, Eitan, and Amos Tanay. “Probabilistic Modeling of Hi-C Contact Maps Eliminates Systematic Biases to Characterize Global Chromosomal Architecture.” Nature Genetics 43, no. 11 (November 2011) - Sources of biases: 1) non-specific ligation (large distance between pairs); 2) length of each ligated fragment; 3) GC content and nucleotide composition; 4) mappability. Normalization. Enrichment of long-range interactions at active promoters. General aggregation of active chromosomal domains. Chromosomal territories; one high-activity and two low-activity genomic clusters.
TAD detection
Brief description of 22 TAD calling methods. Source: Zufferey et al., “Comparison of Computational Methods for the Identification of Topologically Associating Domains.”
Dali, Rola, and Mathieu Blanchette. “A Critical Assessment of Topologically Associating Domain Prediction Tools.” Nucleic Acids Research 45, no. 6 (April 7, 2017) - TAD definition, tools. Meta-TADs, hierarchy, overlapping TADs. HiCPlotter for visualization. Manual annotation as a gold standard. Sequencing depth and resolution affect TAD calls. Code and manual annotations available.
Forcato, Mattia, Chiara Nicoletti, Koustav Pal, Carmen Maria Livi, Francesco Ferrari, and Silvio Bicciato. “Comparison of Computational Methods for Hi-C Data Analysis.” Nature Methods, June 12, 2017 - Benchmarking of Hi-C processing and TAD-calling tools (Table 1) on simulated (Lun and Smyth method) and real data. Notes on the pros and cons of each tool. TAD reproducibility is higher than that of chromatin interactions and increases with the number of reads. TAD boundaries are consistently enriched in CTCF, irrespective of the TAD caller. Reproducibility of chromatin interactions between Hi-C replicates is poor, only slightly better than random. Supplementary Table 2 - technical details about each program, Supplementary Note 1 - Hi-C preprocessing tools, Supplementary Note 2 - TAD callers, Supplementary Note 3 - how to simulate Hi-C data, Supplementary Note 6 - how to install tools. Tools for TAD comparison, and simulated matrices, https://bitbucket.org/mforcato/hictoolscompare.git
Olivares-Chauvet, Pedro, Zohar Mukamel, Aviezer Lifshitz, Omer Schwartzman, Noa Oded Elkayam, Yaniv Lubling, Gintaras Deikus, Robert P. Sebra, and Amos Tanay. “Capturing Pairwise and Multi-Way Chromosomal Conformations Using Chromosomal Walks.” Nature 540, no. 7632 (November 30, 2016) - TADs organize chromosomal territories. Active and inactive TAD properties. Methods: Good mathematical description of insulation score calculations. Filter TADs smaller than 250kb. Inter-chromosomal contacts are rare, ~7-10%. Concatemers (more than two contacts) are unlikely.
Zufferey, Marie, Daniele Tavernari, Elisa Oricchio, and Giovanni Ciriello. “Comparison of Computational Methods for the Identification of Topologically Associating Domains.” Genome Biology 19, no. 1 (December 10, 2018) - Comparison of 22 TAD callers across different conditions. Callers are classified as linear score-based, statistical model-based, clustering, and graph theory-based. Table 1 and Additional file 1 summarize each caller. The effect of data resolution, normalization, hierarchy. Tested on Rao 2014 data, chromosome 6. ICE or LGF (local genomic feature) normalization. The Measure of Concordance (MoC) to compare TADs. CTCF/cohesin as a measure of biological significance. TopDom, HiCseg, CaTCH, CHDF are the top performers. R scripts, including for calculating MoC, https://github.com/CSOgroup/TAD-benchmarking-scripts
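The Measure of Concordance (MoC) mentioned above compares two partitions P and Q of the same bins. P has I domains and Q has J domains, and F_ij is the number of bins shared by P_i and Q_j. Then MoC = (Σ_ij F_ij² / (|P_i| |Q_j|) - 1) / (sqrt(I J) - 1), and MoC is defined as 1 when both partitions have a single domain.

The sketch below implements that formula. It assumes each TAD call set has already been converted to one domain label per bin; inter-TAD gaps would need their own labels. `measure_of_concordance` is a hypothetical name and is not the function from the linked R scripts.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def measure_of_concordance(p: Sequence[Hashable], q: Sequence[Hashable]) -> float:
    """MoC between two partitions of the same bins given as per-bin domain labels."""
    if len(p) != len(q) or len(p) == 0:
        raise ValueError("partitions must label the same, non-empty set of bins")
    size_p = Counter(p)              # |P_i|, bins per domain in partition P
    size_q = Counter(q)              # |Q_j|, bins per domain in partition Q
    overlap = Counter(zip(p, q))     # F_ij, bins shared by P_i and Q_j
    n_p, n_q = len(size_p), len(size_q)
    if n_p == 1 and n_q == 1:
        return 1.0
    total = sum(f * f / (size_p[i] * size_q[j]) for (i, j), f in overlap.items())
    return (total - 1) / (math.sqrt(n_p * n_q) - 1)


# Two TAD callers on the same 6 bins; labels are arbitrary domain IDs.
caller_a = ["t1", "t1", "t1", "t2", "t2", "t2"]
caller_b = ["d1", "d1", "d1", "d2", "d2", "d2"]
caller_c = ["d1", "d1", "d2", "d2", "d3", "d3"]
print(measure_of_concordance(caller_a, caller_b))  # 1.0: identical domains, different names
print(measure_of_concordance(caller_a, caller_c))  # between 0 and 1
```

Because MoC only looks at which bins share a domain, it does not matter that the two callers name their domains differently.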
Rocha, Pedro P., Ramya Raviram, Richard Bonneau, and Jane A. Skok. “Breaking TADs: Insights into Hierarchical Genome Organization.” Epigenomics 7, no. 4 (2015) - Textbook overview of TADs in 3 pages with key references. 3D organization discovery using FISH, 3C, Hi-C. Discovery of A/B compartments (euchromatin, heterochromatin), TADs as regulatory units conserved even across syntenic regions in different organisms. TADs coordinate gene expression. TAD boundaries are not created equal. Examples of changes of TAD boundaries (Hox gene cluster, ES differentiation). Hierarchy of TADs.
Crane, Emily, Qian Bian, Rachel Patton McCord, Bryan R. Lajoie, Bayly S. Wheeler, Edward J. Ralston, Satoru Uzawa, Job Dekker, and Barbara J. Meyer. “Condensin-Driven Remodelling of X Chromosome Topology during Dosage Compensation.” Nature 523, no. 7559 (July 9, 2015). - Insulation Score to define TADs: a square is slid along the diagonal, aggregating the signal within it. The aggregated score is normalized and used to define TADs and boundaries. See Methods and implementation. matrix2insulation.pl parameters: -is 480000 -ids 320000 -im iqrMean -nt 0 -ss 160000 -yb 1.5 -bmoe 0.
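The insulation-score idea above can be sketched in a few lines. The sketch makes several assumptions, all simplifications of the published implementation:

- The input is a dense, already normalized matrix.
- The square covers contacts between the w bins ending at bin i and the w bins starting at bin i + 1, so w plays roughly the role of -is expressed in bins.
- The score is the mean signal in the square, log2-normalized by its chromosome-wide mean.
- Boundaries are strict local minima, with no -ids delta vector or -bmoe margin.

Function names are hypothetical.

```python
import math
from typing import List, Optional


def insulation_score(matrix: List[List[float]], w: int) -> List[Optional[float]]:
    """log2(mean contacts in the insulation square / chromosome-wide mean).

    The square for bin i spans rows i-w+1..i and columns i+1..i+w, i.e. the
    contacts crossing the edge between bin i and bin i + 1. Bins too close
    to the matrix edge, or with an empty square, get None.
    """
    n = len(matrix)
    raw: List[Optional[float]] = [None] * n
    for i in range(w - 1, n - w):
        square = [
            matrix[a][b]
            for a in range(i - w + 1, i + 1)
            for b in range(i + 1, i + w + 1)
        ]
        raw[i] = sum(square) / len(square)
    valid = [x for x in raw if x is not None and x > 0]
    if not valid:
        return [None] * n
    mean_raw = sum(valid) / len(valid)
    return [math.log2(x / mean_raw) if x is not None and x > 0 else None for x in raw]


def call_boundaries(scores: List[Optional[float]]) -> List[int]:
    """Bins whose insulation score is a strict local minimum."""
    hits = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if None not in (left, mid, right) and mid < left and mid < right:
            hits.append(i)
    return hits


# Toy matrix: two 5-bin TADs (bins 0-4 and 5-9); 10 on the diagonal,
# 5 within a TAD, 1 between TADs.
toy = [[10 if a == b else 5 if (a < 5) == (b < 5) else 1 for b in range(10)] for a in range(10)]
print(call_boundaries(insulation_score(toy, w=2)))  # [4]: boundary after bin 4
```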
“Hierarchical Regulatory Domain Inference from Hi-C Data” - presentation by Bartek Wilczyński about TAD detection, existing algorithms, and the new SHERPA and OPPA methods. Video, PDF, website, and GitHub repository with SHERPA and OPPA code.
Hi-C prediction
HiC-Reg - Predicting Hi-C contact counts from one-dimensional regulatory signals (histone marks, CTCF, RAD21, Tbp, DNase). Random Forest regression. Feature encoding: distance between two regions, pair-concat, window, multi-cell. Works across chromosomes (performance varies by chromosome) and cell lines (Gm12878, K562, Huvec, Hmec, Nhek; can be used to predict interactions in new cell lines). Selection of the most important features using multi-task group LASSO (distance, CTCF, Tbp, H4K20me1, DNase, others). Predicted contacts correspond well to the original contacts (distance-stratified Pearson correlation), define TADs similar to the originals (Jaccard), and define significant contacts (Fit-Hi-C) more enriched in CTCF binding. Validated on HBA1 and PAPPA gene promoters. Hi-C normalization doesn’t have much effect. https://github.com/Roy-lab/HiC-Reg
Zhang, Shilu, Deborah Chasman, Sara Knaack, and Sushmita Roy. “In Silico Prediction of High-Resolution Hi-C Interaction Matrices.” Nature Communications 10, no. 1 (December 2019)
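HiC-Reg is evaluated with distance-stratified Pearson correlation. Both observed and predicted counts decay with distance, so a single correlation over the whole matrix would be inflated by that shared trend. Correlating each diagonal separately avoids this. This is a minimal sketch, assuming dense matrices of equal size and distances in bins; the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None if undefined (fewer than 2 points or zero variance)."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def distance_stratified_corr(
    observed: List[List[float]], predicted: List[List[float]], max_d: int
) -> List[Optional[float]]:
    """Correlation between observed and predicted counts on each diagonal d = 1..max_d."""
    n = len(observed)
    result = []
    for d in range(1, max_d + 1):
        obs = [observed[i][i + d] for i in range(n - d)]
        pred = [predicted[i][i + d] for i in range(n - d)]
        result.append(pearson(obs, pred))
    return result


observed = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 7], [3, 5, 7, 0]]
predicted = [[2 * v + 1 for v in row] for row in observed]   # perfect linear prediction
print(distance_stratified_corr(observed, predicted, max_d=3))  # [1.0, 1.0, None]
```

The last diagonal holds a single pair, so its correlation is undefined and reported as None. In practice, only distances with enough pairs are worth reporting.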
Predicting TAD boundaries by training on known boundaries and making new predictions. Bayesian network (BNFinder method) and random forest vs. basic k-means clustering, ChromHMM, cdBEST. Sequence k-mers and ChIP-seq data from modENCODE used for prediction; CTCF ChIP-seq performs best. Boruta package used for feature selection. The Bayesian network performs best. Their BNFinder method is worth reading about.
Bednarz, Paweł, and Bartek Wilczyński. “Supervised Learning Method for Predicting Chromatin Boundary Associated Insulator Elements.” Journal of Bioinformatics and Computational Biology 12, no. 06 (December 2014)
Spectral clustering
Wang, Y. X. Rachel, Purnamrita Sarkar, Oana Ursu, Anshul Kundaje, and Peter J. Bickel. “Network Modelling of Topological Domains Using Hi-C Data.” - TAD analysis using graph-theoretical (network-based) methods. Treats TADs as “communities” within the network. Shows that naive spectral clustering is generally ineffective, leaving gaps in the data.
Liu, Sijia, Pin-Yu Chen, Alfred Hero, and Indika Rajapakse. “Dynamic Network Analysis of the 4D Nucleome.” BioRxiv, January 1, 2018. - Temporal Hi-C data analysis using graph theory. Integrated with RNA-seq data. Network-based approaches such as von Neumann graph entropy, network centrality, and multilayer network theory are applied to reveal universal patterns of the dynamic genome. Toeplitz normalization. Graph Laplacian matrix. Detailed statistics.
Norton, Heidi K, Harvey Huang, Daniel J Emerson, Jesi Kim, Shi Gu, Danielle S Bassett, and Jennifer E Phillips-Cremins. “Detecting Hierarchical 3-D Genome Domain Reconfiguration with Network Modularity,” November 22, 2016. - Graph theory for TAD identification. Louvain-like local greedy algorithm to maximize network modularity. Varying the resolution parameter gives hierarchical TAD identification. Hierarchical spatial variance minimization method. ROC analysis to quantify performance. Adjusted Rand index to quantify TAD overlap.
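Modularity-based TAD calling scores a partition of bins by how much more contact falls within domains than expected by chance. This is a minimal sketch of the quality function only, not the Louvain-like optimizer. It assumes the standard Newman-Girvan null model with a resolution parameter gamma, Q = (1/2m) Σ_ij [A_ij - gamma k_i k_j / 2m] δ(c_i, c_j); the paper's exact null model may differ. Larger gamma penalizes large domains more strongly, which is how sweeping the resolution yields a hierarchy. `modularity` is a hypothetical name.

```python
from typing import Hashable, List, Sequence


def modularity(adj: List[List[float]], labels: Sequence[Hashable], gamma: float = 1.0) -> float:
    """Newman-Girvan modularity with a resolution parameter gamma.

    adj is a symmetric, non-negative contact matrix treated as a weighted
    graph; labels[i] is the domain assigned to bin i.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]      # k_i, weighted degree of bin i
    two_m = sum(degree)                     # 2m, total edge weight counted twice
    if two_m == 0:
        raise ValueError("matrix has no contacts")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:      # delta(c_i, c_j)
                q += adj[i][j] - gamma * degree[i] * degree[j] / two_m
    return q / two_m


# Two disconnected pairs of bins: {0, 1} and {2, 3}.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(modularity(adj, [0, 0, 1, 1]))  # 0.5: the two pairs as separate domains
print(modularity(adj, [0, 0, 0, 0]))  # 0.0: one domain spanning everything
```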
Chen, Jie, Alfred O. Hero, and Indika Rajapakse. “Spectral Identification of Topological Domains.” Bioinformatics (Oxford, England) 32, no. 14 (July 15, 2016) - Spectral algorithm to define TADs. Iterative Laplacian graph segmentation using the Fiedler vector. Toeplitz normalization to remove the distance effect. Spectral TADs do not overlap with Dixon’s, but overlap better with CTCF. Python implementation https://github.com/shappiron/TAD-Laplacian-Identification
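The core spectral step can be sketched without a linear-algebra library. It splits a region at the sign change of the Fiedler vector, which is the eigenvector of the graph Laplacian L = D - A with the second-smallest eigenvalue. The sketch rests on these assumptions:

- The input is a small, connected contact matrix that has already been Toeplitz-normalized.
- The Fiedler vector is found by power iteration on c·I - L restricted to vectors orthogonal to the all-ones vector, rather than with a full eigensolver.
- Only one split is performed, whereas the method applies it iteratively to obtain all TADs.

Function names are hypothetical.

```python
import math
import random
from typing import List


def fiedler_vector(adj: List[List[float]], n_iter: int = 2000, seed: int = 0) -> List[float]:
    """Approximate Fiedler vector of the Laplacian L = D - A by power iteration."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    # c >= largest Laplacian eigenvalue (Gershgorin bound), so c*I - L is positive
    # semidefinite and its top eigenvector orthogonal to all-ones is the Fiedler vector.
    c = 2 * max(degree)
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(n_iter):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector of L
        # (c*I - L) v = c*v - D v + A v
        v = [
            (c - degree[i]) * v[i] + sum(adj[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            break
        v = [x / norm for x in v]
    return v


def spectral_split(adj: List[List[float]]) -> int:
    """Index of the first bin after the first sign change of the Fiedler vector."""
    v = fiedler_vector(adj)
    for i in range(1, len(v)):
        if (v[i] > 0) != (v[i - 1] > 0):
            return i
    return len(v)  # no sign change: region is not split


# Toy 6-bin region: two 3-bin domains, 5 contacts within and 1 between.
toy = [[0 if a == b else 5 if (a < 3) == (b < 3) else 1 for b in range(6)] for a in range(6)]
print(spectral_split(toy))  # 3: second domain starts at bin 3
```

For larger matrices, a sparse eigensolver would replace the power iteration. The shift c only needs to bound the largest Laplacian eigenvalue.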
URLs
4D Nucleome Protocols - Collection of genomic technologies currently in use or being developed in the 4DN network - links to wet-lab protocols and papers. 4DN portal blog
3d-genome-processing-tutorial - A 3D genome data processing tutorial for ISMB/ECCB 2017. https://github.com/hms-dbmi/3d-genome-processing-tutorial
Workshop on measuring, analyzing, and visualizing the 3D genome with Hi-C data. Presentations (PDFs, PPTX) and Jupyter notebooks. Cooler, HiGlass, HiPiler. https://github.com/hms-dbmi/hic-data-analysis-bootcamp