Takes HOMER heatmap output for regions surrounding peaks or TSSs and reformats it for further analysis and visualization. The number of bins and samples is deduced automatically from the column names that HOMER produces.
ReadHeatmaps(heatmaps.raw, sample.names = NULL, select.cols = NA, test = FALSE, raw = FALSE)
heatmaps.raw: output file from a HOMER heatmap analysis over tag directories
sample.names: names of the samples analyzed (tag directories); make sure these are in the same order as in the data file!
select.cols: the colClasses argument for read.table; use this to select specific columns (samples) from the data, since a large file can crash R. Provide a vector of classes, with "NULL" (in quotes) for every column that you want to exclude. This requires knowing the column composition of the data; see the parameter "test"
test: if TRUE, reads in only the first 5 rows so that the size and composition of the data can be inspected
raw: if TRUE, returns the raw, unprocessed form of the data (essentially the read.table output)
A list of data.frames, each with positions as rows and genes/regions as columns
It is highly recommended to register a parallel backend with doParallel so that the computations run in parallel!
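A minimal sketch of registering a backend before calling the function, assuming the doParallel package is installed. The worker count of 2 is an arbitrary choice for illustration, and how ReadHeatmaps uses the backend internally is not shown here.

```r
# Skipped entirely if doParallel is not installed
if (requireNamespace("doParallel", quietly = TRUE)) {
  # Start 2 worker processes (arbitrary number, adjust to your machine)
  cl <- parallel::makeCluster(2)
  doParallel::registerDoParallel(cl)

  # Confirm how many workers foreach will use
  n.workers <- foreach::getDoParWorkers()

  # ... call ReadHeatmaps() here ...

  # Shut the workers down and fall back to sequential execution
  parallel::stopCluster(cl)
  foreach::registerDoSEQ()
}
```

Stopping the cluster when finished releases the worker processes and the memory they hold.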
Warning: heatmap files take up a lot of disk space and will use about 8x the file size in RAM, which makes very large analyses infeasible. The data come from the following HOMER command:
annotatePeaks.pl tss mm10 -ann $GTF -size 5000 -hist 25 -ghist -d $TAGDIRS
ReadHeatmaps("hm.dat.txt", sample.names = c("A", "B", "C"))
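The column selection described for select.cols can be sketched with base read.table alone. The toy file and its column names (Gene, A_bin1, ...) are hypothetical stand-ins and do not reproduce the real HOMER header layout; the point is only how a colClasses vector with "NULL" entries drops columns at read time.

```r
# Hypothetical toy file: one ID column plus two bins for each of two samples
tmp <- tempfile(fileext = ".txt")
toy <- data.frame(
  Gene   = c("g1", "g2", "g3"),
  A_bin1 = c(0.1, 0.2, 0.3),
  A_bin2 = c(0.4, 0.5, 0.6),
  B_bin1 = c(1.1, 1.2, 1.3),
  B_bin2 = c(1.4, 1.5, 1.6)
)
write.table(toy, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# Peek at the first rows to learn the column layout
# (the same idea as inspecting the data with test = TRUE)
peek <- read.table(tmp, header = TRUE, sep = "\t", nrows = 5)
str(peek)

# One class per column: keep the ID column and sample A, skip sample B
select.cols <- c("character", "numeric", "numeric", "NULL", "NULL")
subset.dat <- read.table(tmp, header = TRUE, sep = "\t",
                         colClasses = select.cols)
names(subset.dat)
```

A vector built this way is what would be supplied as select.cols. It needs exactly one entry per column in the file, which is why inspecting the first rows beforehand is useful.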