Clustering of Hi-C contact maps
Shubham Chaturvedi, Pierre Neuvial, Nathalie Vialaneix
2024-10-08
Source:vignettes/hicClust.Rmd
# IMPORTANT: this vignette cannot be built if HiTC is not installed
if (!require("HiTC", quietly = TRUE)) {
  # Skip evaluation of all later chunks when HiTC is missing
  knitr::opts_chunk$set(eval = FALSE)
}
library("adjclust")
Introduction
Hi-C is a sequencing-based molecular assay designed to measure intra- and inter-chromosomal interactions between regions of the DNA molecule. In particular, identifying Topologically Associating Domains (TADs), that is, regions of the genome within which physical interactions are frequent, provides insight into the three-dimensional organization of a genome [2].
Hi-C data take the form of two-dimensional contact maps, i.e., matrices whose entry $(i, j)$ quantifies the intensity of the physical interaction between two genome regions $i$ and $j$ at the DNA level. In this vignette, we demonstrate the use of adjclust::hicClust to perform adjacency-constrained hierarchical agglomerative clustering (HAC) of Hi-C contact maps. The output of this function is a dendrogram, which can be cut to identify TADs. The algorithm used for adjacency-constrained HAC is described in [1,3].
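To make the adjacency constraint concrete, here is a minimal base-R sketch on a made-up 6 x 6 similarity matrix. It is not the adjclust implementation: for simplicity it uses the mean similarity between neighbouring clusters as linkage, whereas adjclust relies on a Ward-type criterion, and the helper name toy_adjacent_hac is hypothetical. The only point is that, at each step, only clusters that are neighbours along the genome may be merged, so every cluster is a run of consecutive bins.

```r
# Toy contact map: two blocks of three bins with strong within-block contacts.
sim <- matrix(1, nrow = 6, ncol = 6)
sim[1:3, 1:3] <- 10
sim[4:6, 4:6] <- 10

# Hypothetical helper: greedy merging restricted to neighbouring clusters.
# Linkage is the mean similarity between the two clusters (an assumption
# made for simplicity; this is NOT the criterion used by adjclust).
toy_adjacent_hac <- function(sim, k) {
  clusters <- as.list(seq_len(nrow(sim)))  # each bin starts alone
  while (length(clusters) > k) {
    n_pairs <- length(clusters) - 1
    link <- vapply(seq_len(n_pairs), function(i) {
      mean(sim[clusters[[i]], clusters[[i + 1]]])
    }, numeric(1))
    best <- which.max(link)  # most similar pair of neighbours
    clusters[[best]] <- c(clusters[[best]], clusters[[best + 1]])
    clusters[[best + 1]] <- NULL
  }
  clusters
}

# Stop when two clusters remain: they play the role of two toy "domains".
domains <- toy_adjacent_hac(sim, k = 2)
print(domains)
```

Stopping at k clusters is the toy counterpart of cutting the dendrogram: the two clusters returned here are the two blocks built into the matrix.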
Loading and displaying a sample Hi-C contact map
The data set hic_imr90_40_XX is an object of class HTCexp, which has been obtained from the HiTC package [4]. It is a contact map corresponding to the first 500 x 500 bins on chromosome X vs chromosome X.
load(system.file("extdata", "hic_imr90_40_XX.rda", package = "adjclust"))
The script used to create this map can be found by executing the following command:
system.file("system/create_hic_chrXchrX.R", package="adjclust")
## [1] "/home/runner/work/_temp/Library/adjclust/system/create_hic_chrXchrX.R"
We now display the contact map:
HiTC::mapC(hic_imr90_40_XX)
Using hicClust
hicClust operates directly on objects of class HTCexp:
fit <- hicClust(hic_imr90_40_XX)
It is also possible to work on binned data. Below, the contact map is first aggregated into larger bins, stored in binned, and then clustered:
fitB <- hicClust(binned)
fitB
##
## Call:
## hicClust(binned)
##
## Cluster method : hicClust
## Number of objects: 205
HiTC::mapC(binned)
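Binning merges consecutive bins into larger ones and aggregates their contacts. HiTC provides binningC for HTCexp objects; the sketch below only illustrates the idea on a plain matrix, by summing counts over groups of consecutive bins. The matrix, the grouping factor of 2 and the helper name bin_matrix are illustrative assumptions, not values taken from this vignette.

```r
# Hypothetical helper: sum a square contact matrix over blocks of
# `factor` consecutive bins (the last block may be smaller).
bin_matrix <- function(m, factor) {
  group <- (seq_len(nrow(m)) - 1) %/% factor + 1  # block index of each bin
  rows_summed <- rowsum(m, group)                 # aggregate rows
  t(rowsum(t(rows_summed), group))                # aggregate columns
}

# Made-up symmetric 4 x 4 contact map.
m <- matrix(c(8, 3, 1, 0,
              3, 9, 2, 1,
              1, 2, 7, 4,
              0, 1, 4, 6), nrow = 4, byrow = TRUE)

coarse <- bin_matrix(m, factor = 2)
print(coarse)
```

Summing keeps the total number of contacts unchanged and preserves symmetry, so the coarser map is still a valid similarity matrix for clustering.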
The output is of class chac. In particular, it can be plotted as a dendrogram, silently relying on the function plot.dendrogram:
plot(fitB, mode = "corrected")
Moreover, the output contains an element named merge, which describes the successive merges of the clustering, and an element gains, which gives the improvement in the criterion optimized by the clustering at each successive merge. The first rows of merge are:
## [,1] [,2]
## [1,] -3 -4
## [2,] -2 1
## [3,] 2 -5
## [4,] -1 3
## [5,] 4 -6
## [6,] -17 -18
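The rows printed above can be read with the convention used by stats::hclust, which chac objects appear to follow (an assumption worth checking against the adjclust documentation): a negative entry -i denotes the single bin i, and a positive entry j denotes the cluster created at step j. The base-R sketch below replays the six rows shown above and lists the bins in each newly formed cluster; replay_merges is a hypothetical helper.

```r
# First six merges, copied from the output above.
merge_head <- matrix(c( -3,  -4,
                        -2,   1,
                         2,  -5,
                        -1,   3,
                         4,  -6,
                       -17, -18), ncol = 2, byrow = TRUE)

# Hypothetical helper: members of the cluster formed at each step.
replay_merges <- function(merge) {
  members <- vector("list", nrow(merge))
  for (step in seq_len(nrow(merge))) {
    parts <- lapply(merge[step, ], function(id) {
      if (id < 0) -id else members[[id]]  # singleton or earlier cluster
    })
    members[[step]] <- sort(unlist(parts))
  }
  members
}

members <- replay_merges(merge_head)
print(members)

# With an adjacency constraint, every cluster is a run of consecutive bins.
is_interval <- vapply(members, function(b) all(diff(b) == 1), logical(1))
print(is_interval)
```

Under this reading, the first five merges progressively build the block of bins 1 to 6, while the sixth starts a separate cluster with bins 17 and 18.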
Other types of input
Contact maps can also be stored as objects of class Matrix::dsCMatrix, or as plain text files. These types of input are also accepted as the first argument of hicClust.
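As an illustration of the sparse-matrix input, the sketch below builds a small symmetric contact map with the Matrix package and checks that it has class dsCMatrix. The counts are made up. The final call to hicClust is guarded so that the block still runs when adjclust is not installed; passing the matrix as the first argument follows the description above, and the result on such a tiny map has no biological meaning.

```r
library(Matrix)

# Made-up contacts for 5 bins: upper-triangle entries only, given as
# (row, column, count) triplets; `symmetric = TRUE` mirrors them.
i <- c(1, 1, 2, 2, 3, 3, 4, 4, 5)
j <- c(1, 2, 2, 3, 3, 4, 4, 5, 5)
x <- c(9, 4, 8, 3, 7, 5, 9, 2, 6)
contacts <- sparseMatrix(i = i, j = j, x = x, dims = c(5, 5), symmetric = TRUE)

# Symmetric sparse matrices in column-compressed form have class dsCMatrix.
print(class(contacts))

# Cluster the sparse map if adjclust is available.
if (requireNamespace("adjclust", quietly = TRUE)) {
  fit_sparse <- adjclust::hicClust(contacts)
  print(fit_sparse)
}
```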
References
[1] Ambroise C., Dehman A., Neuvial P., Rigaill G., and Vialaneix N. (2019). Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics. Algorithms for Molecular Biology, 14, 22.
[2] Dixon J.R., et al. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485(7398), 376.
[3] Randriamihamison N., Vialaneix N., and Neuvial P. (2021). Applicability and interpretability of Ward’s hierarchical agglomerative clustering with or without contiguity constraints. Journal of Classification, 38, 363–389.
[4] Servant N., et al. (2012). HiTC: Exploration of High-Throughput ‘C’ experiments. Bioinformatics, 28(21), 2843–2844.
Session information
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] adjclust_0.6.10 HiTC_1.48.0 GenomicRanges_1.56.1
## [4] GenomeInfoDb_1.40.1 IRanges_2.38.1 S4Vectors_0.42.1
## [7] BiocGenerics_0.50.0
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.34.0 capushe_1.1.2
## [3] gtable_0.3.5 rjson_0.2.23
## [5] xfun_0.48 bslib_0.8.0
## [7] ggplot2_3.5.1 Biobase_2.64.0
## [9] lattice_0.22-6 vctrs_0.6.5
## [11] tools_4.4.1 bitops_1.0-9
## [13] curl_5.2.3 parallel_4.4.1
## [15] fansi_1.0.6 tibble_3.2.1
## [17] highr_0.11 pkgconfig_2.0.3
## [19] Matrix_1.7-0 RColorBrewer_1.1-3
## [21] sparseMatrixStats_1.16.0 desc_1.4.3
## [23] lifecycle_1.0.4 GenomeInfoDbData_1.2.12
## [25] compiler_4.4.1 Rsamtools_2.20.0
## [27] textshaping_0.4.0 Biostrings_2.72.1
## [29] munsell_0.5.1 codetools_0.2-20
## [31] htmltools_0.5.8.1 sass_0.4.9
## [33] RCurl_1.98-1.16 yaml_2.3.10
## [35] pillar_1.9.0 pkgdown_2.1.1
## [37] crayon_1.5.3 jquerylib_0.1.4
## [39] MASS_7.3-60.2 BiocParallel_1.38.0
## [41] cachem_1.1.0 DelayedArray_0.30.1
## [43] viridis_0.6.5 abind_1.4-8
## [45] digest_0.6.37 restfulr_0.0.15
## [47] fastmap_1.2.0 grid_4.4.1
## [49] colorspace_2.1-1 cli_3.6.3
## [51] SparseArray_1.4.8 magrittr_2.0.3
## [53] S4Arrays_1.4.1 utf8_1.2.4
## [55] XML_3.99-0.17 UCSC.utils_1.0.0
## [57] scales_1.3.0 rmarkdown_2.28
## [59] XVector_0.44.0 httr_1.4.7
## [61] matrixStats_1.4.1 gridExtra_2.3
## [63] ragg_1.3.3 evaluate_1.0.0
## [65] knitr_1.48 BiocIO_1.14.0
## [67] viridisLite_0.4.2 rtracklayer_1.64.0
## [69] rlang_1.1.4 dendextend_1.18.0
## [71] Rcpp_1.0.13 glue_1.8.0
## [73] jsonlite_1.8.9 R6_2.5.1
## [75] MatrixGenerics_1.16.0 GenomicAlignments_1.40.0
## [77] systemfonts_1.1.0 fs_1.6.4
## [79] zlibbioc_1.50.0