Skip to content

LanceOTron Command Line Interface

CircleCI

A bare-bones interface to the trained LanceOTron (LoT) model from the command line.

LoT is an all-in-one peak caller that identifies peak regions from a coverage track and uses a convolutional neural network to classify them based on their shape. This algorithm has a web client available at https://lanceotron.molbiol.ox.ac.uk/ where users can upload coverage tracks, call peaks, and visualize them using multi locus view, a powerful visualization engine. This web client will do most of the heavy lifting for most of the people that want to use the tool, but for those who need to call peaks in batch mode we provide a command line interface in this package. This document details the installation of LoT as well as typical use cases. See the left side of this page for tutorials on the three main modules.

Installation

  1. Clone the repository.
  2. Install dependencies with pip.
  3. Install the package.
  4. Run tests to ensure that everything is working.
git clone git@github.com:Chris1221/lanceotron.git; cd lanceotron # Step 1
pip install -r requirements.txt # Step 2
pip install -e . # Step 3
python -m unittest

Usage

To see available commands, use the --help flag.

lanceotron --help

Call Peaks

To call peaks from a bigWig track, use the callPeaks command.

Call peaks from a coverage track and score them using the LanceOTron model.

Parameters:

Name Type Description Default
file str

Path to bigwig track.

required
threshold int

Initial threshold used for selecting candidate peaks. Defaults to 4.

4
window int

Window size for rolling mean to use for selecting candidate peaks. Defaults to 400.

400
folder str

Output folder. Defaults to "./".

'./'
skipheader bool

Skip writing header. Defaults to False.

False
Source code in lanceotron/modules.py
def find_and_score_peaks(
        file: str, 
        threshold: int = 4,
        window: int = 400, 
        folder: str = "./", 
        skipheader: bool = False) -> None:
    """Call peaks from a coverage track and score them using the LanceOTron model.

    Args:
        file (str): Path to bigwig track.
        threshold (int, optional): Initial threshold used for selecting candidate peaks. Defaults to 4.
        window (int, optional): Window size for rolling mean to use for selecting candidate peaks. Defaults to 400.
        folder (str, optional): Output folder. Defaults to "./".
        skipheader (bool, optional): Skip writing header. Defaults to False.
    """


    bigwig_file = file
    out_folder = make_directory_name(folder)
    initial_threshold = threshold

    min_peak_width = 50
    max_peak_width = 2000
    read_coverage_factor = 10**9
    out_file_name = bigwig_file.split('/')[-1].split('.')[0]+'_L-tron.bed'

    pyBigWig_object = pyBigWig.open(bigwig_file)
    read_coverage_total = pyBigWig_object.header()['sumData']
    read_coverage_rphm = read_coverage_total/read_coverage_factor
    pyBigWig_object.close()

    bigwig_data = Ltron.Bigwig_data(bigwig_file)
    genome_stats_dict = bigwig_data.get_genome_info()
    bed_file_out = []

    for chrom in genome_stats_dict:
        print(chrom)
        coverage_array_smooth = bigwig_data.make_chrom_coverage_map(genome_stats_dict[chrom], smoothing=window)
        enriched_region_coord_list = Ltron.label_enriched_regions_dynamic_threshold_width(coverage_array_smooth, genome_stats_dict[chrom]['chrom_mean']*initial_threshold, genome_stats_dict[chrom]['chrom_mean'], max_peak_width, min_region_size=min_peak_width)
        chrom_file_out = []
        if enriched_region_coord_list:
            wide_path = pkg_resources.resource_filename('lanceotron.static', 'standard_scaler_wide_v5_03.p')
            deep_path = pkg_resources.resource_filename('lanceotron.static', 'standard_scaler_deep_v5_03.p')

            coverage_array = bigwig_data.make_chrom_coverage_map(genome_stats_dict[chrom])/read_coverage_rphm
            X_wide_array, X_deep_array = Ltron.extract_signal_wide_and_deep_chrom(coverage_array, enriched_region_coord_list, read_coverage_rphm)
            standard_scaler_wide = pickle.load(open(wide_path, 'rb'))
            X_wide_array_norm = standard_scaler_wide.transform(X_wide_array)
            X_wide_array_norm = np.expand_dims(X_wide_array_norm, axis=2)
            standard_scaler = StandardScaler()
            X_deep_array_norm_T = standard_scaler.fit_transform(X_deep_array.T)
            standard_scaler_deep = pickle.load(open(deep_path, 'rb'))
            X_deep_array_norm = standard_scaler_deep.transform(X_deep_array_norm_T.T)
            X_deep_array_norm = np.expand_dims(X_deep_array_norm, axis=2)
            model = build_model()
            model_classifications = model.predict([X_deep_array_norm, X_wide_array_norm], verbose=1)
            K.clear_session()
            for i, coord_pair in enumerate(enriched_region_coord_list):
                out_list = [chrom, coord_pair[0], coord_pair[1], model_classifications[0][i][0], model_classifications[1][i][0], model_classifications[2][i][0]]
                X_wide_list = X_wide_array[i][:-1].tolist()
                X_wide_list = [100. if x>10 else x for x in X_wide_list]
                out_list+=X_wide_list
                chrom_file_out.append(out_list)
            bed_file_out+=chrom_file_out

    with open(out_folder+out_file_name, 'w', newline='') as f:
        if not skipheader:
            f.write('chrom\tstart\tend\toverall_peak_score\tshape_score\tenrichment_score\tpvalue_chrom\tpvalue_10kb\tpvalue_20kb\tpvalue_30kb\tpvalue_40kb\tpvalue_50kb\tpvalue_60kb\tpvalue_70kb\tpvalue_80kb\tpvalue_90kb\tpvalue_100kb\n')
        bed_writer = csv.writer(f, delimiter='\t')
        bed_writer.writerows(bed_file_out)

Call Peaks with Input

To call peaks from a bigWig track with an input file, use the callPeaks_Input command.

Call peaks from a coverage track with an input and score them using the LanceOTron model.

Parameters:

Name Type Description Default
file str

Path to bigwig track.

required
input str

Control input track used to calculate Poisson-based significance of peaks.

required
threshold int

Initial threshold used for selecting candidate peaks. Defaults to 4.

4
window int

Window size for rolling mean to use for selecting candidate peaks. Defaults to 400.

400
folder str

Output folder. Defaults to "./".

'./'
skipheader bool

Skip writing header. Defaults to False.

False
Source code in lanceotron/modules.py
def call_peaks_with_input(
        file: str, 
        input: str,
        threshold: int = 4,
        window: int = 400, 
        folder: str = "./", 
        skipheader: bool = False) -> None:
    """Call peaks from a coverage track with an input and score them using the LanceOTron model.

    Args:
        file (str): Path to bigwig track.
        input (str): Control input track used to calculate Poisson-based significance of peaks.
        threshold (int, optional): Initial threshold used for selecting candidate peaks. Defaults to 4.
        window (int, optional): Window size for rolling mean to use for selecting candidate peaks. Defaults to 400.
        folder (str, optional): Output folder. Defaults to "./".
        skipheader (bool, optional): Skip writing header. Defaults to False.
    """

    bigwig_file = file
    control_file = input
    out_folder = make_directory_name(folder)
    initial_threshold = threshold

    min_peak_width = 50
    max_peak_width = 2000
    read_coverage_factor = 10**9
    out_file_name = bigwig_file.split('/')[-1].split('.')[0]+'_L-tron.bed'


    pyBigWig_object = pyBigWig.open(bigwig_file)
    read_coverage_total = pyBigWig_object.header()['sumData']
    read_coverage_rphm = read_coverage_total/read_coverage_factor
    pyBigWig_object.close()

    bigwig_data = Ltron.Bigwig_data(bigwig_file)
    genome_stats_dict = bigwig_data.get_genome_info()
    bed_file_out = []

    for chrom in genome_stats_dict:
        print(chrom)
        coverage_array_smooth = bigwig_data.make_chrom_coverage_map(genome_stats_dict[chrom], smoothing=window)
        enriched_region_coord_list = Ltron.label_enriched_regions_dynamic_threshold_width(coverage_array_smooth, genome_stats_dict[chrom]['chrom_mean']*initial_threshold, genome_stats_dict[chrom]['chrom_mean'], max_peak_width, min_region_size=min_peak_width)
        chrom_file_out = []
        if enriched_region_coord_list:

            wide_path = pkg_resources.resource_filename('lanceotron.static', 'standard_scaler_wide_v5_03.p')
            deep_path = pkg_resources.resource_filename('lanceotron.static', 'standard_scaler_deep_v5_03.p')

            coverage_array = bigwig_data.make_chrom_coverage_map(genome_stats_dict[chrom])/read_coverage_rphm
            X_wide_array, X_deep_array = Ltron.extract_signal_wide_and_deep_chrom(coverage_array, enriched_region_coord_list, read_coverage_rphm)
            standard_scaler_wide = pickle.load(open(wide_path, 'rb'))
            X_wide_array_norm = standard_scaler_wide.transform(X_wide_array)
            X_wide_array_norm = np.expand_dims(X_wide_array_norm, axis=2)
            standard_scaler = StandardScaler()
            X_deep_array_norm_T = standard_scaler.fit_transform(X_deep_array.T)
            standard_scaler_deep = pickle.load(open(deep_path, 'rb'))
            X_deep_array_norm = standard_scaler_deep.transform(X_deep_array_norm_T.T)
            X_deep_array_norm = np.expand_dims(X_deep_array_norm, axis=2)
            model = build_model()
            model_classifications = model.predict([X_deep_array_norm, X_wide_array_norm], verbose=1)
            pyBigWig_input = pyBigWig.open(control_file)
            read_coverage_total_input = pyBigWig_input.header()['sumData']
            read_coverage_rphm_input = read_coverage_total_input/read_coverage_factor
            K.clear_session()
            for i, coord_pair in enumerate(enriched_region_coord_list):
                average_cov = coverage_array[coord_pair[0]:coord_pair[1]].mean()*read_coverage_rphm
                pvalue_input = calculate_pvalue_from_input(chrom, coord_pair[0], coord_pair[1], read_coverage_total, read_coverage_total_input, pyBigWig_input, average_cov)
                out_list = [chrom, coord_pair[0], coord_pair[1], model_classifications[0][i][0], model_classifications[1][i][0], model_classifications[2][i][0], pvalue_input]
                X_wide_list = X_wide_array[i][:-1].tolist()
                X_wide_list = [100. if x>10 else x for x in X_wide_list]
                out_list+=X_wide_list
                chrom_file_out.append(out_list)
            pyBigWig_input.close()
            bed_file_out+=chrom_file_out

    with open(out_folder+out_file_name, 'w', newline='') as f:
        if not skipheader:
            f.write('chrom\tstart\tend\toverall_peak_score\tshape_score\tenrichment_score\tpvalue_input\tpvalue_chrom\tpvalue_10kb\tpvalue_20kb\tpvalue_30kb\tpvalue_40kb\tpvalue_50kb\tpvalue_60kb\tpvalue_70kb\tpvalue_80kb\tpvalue_90kb\tpvalue_100kb\n')
        bed_writer = csv.writer(f, delimiter='\t')
        bed_writer.writerows(bed_file_out)

Score a Bed file

To score the peaks in an existing Bed file, use the scoreBed command.

Score an existing bed file using LanceOTron's model and a coverage track.

Parameters:

Name Type Description Default
file str

Input bigWig track.

required
bed str

Bed file of regions to score.

required
folder str

Output folder. Defaults to "./".

'./'
skipheader bool

Skip writing header. Defaults to False.

False
Source code in lanceotron/modules.py
def score_bed(
    file: str,
    bed: str,
    folder: str = "./",
    skipheader: bool = False) -> None:
    """Score an existing bed file using LanceOTron's model and a coverage track.

    Args:
        file (str): Input bigWig track.
        bed (str): Bed file of regions to score.
        folder (str): Output folder. Defaults to "./".
        skipheader (bool): Skip writing header. Defaults to False.
    """
    bigwig_file = file
    out_folder = make_directory_name(folder)
    bed_file = bed

    read_coverage_factor = 10**9
    out_file_name = bigwig_file.split('/')[-1].split('.')[0]+'_L-tron.bed'

    pyBigWig_object = pyBigWig.open(bigwig_file)
    read_coverage_total = pyBigWig_object.header()['sumData']
    read_coverage_rphm = read_coverage_total/read_coverage_factor
    pyBigWig_object.close()

    bed_list = Ltron.bed_file_to_list(bed_file)
    chroms_in_bed = []
    for bed_entry in bed_list:
        if bed_entry[0] not in chroms_in_bed:
            chroms_in_bed.append(bed_entry[0])

    bigwig_data = Ltron.Bigwig_data(bigwig_file)
    genome_stats_dict = bigwig_data.get_genome_info(include_special_chromosomes=True)
    bed_file_out = []

    for chrom in chroms_in_bed:
        print(chrom)
        enriched_region_coord_list = []
        for bed_entry in bed_list:
            if bed_entry[0]==chrom:
                enriched_region_coord_list.append([bed_entry[1], bed_entry[2]])
        chrom_file_out = []
        if enriched_region_coord_list:
            wide_path = pkg_resources.resource_filename('lanceotron.static', 'standard_scaler_wide_v5_03.p')
            deep_path = pkg_resources.resource_filename('lanceotron.static', 'standard_scaler_deep_v5_03.p')

            coverage_array = bigwig_data.make_chrom_coverage_map(genome_stats_dict[chrom])/read_coverage_rphm
            X_wide_array, X_deep_array = Ltron.extract_signal_wide_and_deep_chrom(coverage_array, enriched_region_coord_list, read_coverage_rphm)
            standard_scaler_wide = pickle.load(open(wide_path, 'rb'))
            X_wide_array_norm = standard_scaler_wide.transform(X_wide_array)
            X_wide_array_norm = np.expand_dims(X_wide_array_norm, axis=2)
            standard_scaler = StandardScaler()
            X_deep_array_norm_T = standard_scaler.fit_transform(X_deep_array.T)
            standard_scaler_deep = pickle.load(open(deep_path, 'rb'))
            X_deep_array_norm = standard_scaler_deep.transform(X_deep_array_norm_T.T)
            X_deep_array_norm = np.expand_dims(X_deep_array_norm, axis=2)
            model = build_model()
            model_classifications = model.predict([X_deep_array_norm, X_wide_array_norm], verbose=1)
            K.clear_session()
            for i, coord_pair in enumerate(enriched_region_coord_list):
                out_list = [chrom, coord_pair[0], coord_pair[1], model_classifications[0][i][0], model_classifications[1][i][0], model_classifications[2][i][0]]
                X_wide_list = X_wide_array[i][:-1].tolist()
                X_wide_list = [100. if x>10 else x for x in X_wide_list]
                out_list+=X_wide_list
                chrom_file_out.append(out_list)
            bed_file_out+=chrom_file_out

    with open(out_folder+out_file_name, 'w', newline='') as f:
        if not skipheader:
            f.write('chrom\tstart\tend\toverall_peak_score\tshape_score\tenrichment_score\tpvalue_chrom\tpvalue_10kb\tpvalue_20kb\tpvalue_30kb\tpvalue_40kb\tpvalue_50kb\tpvalue_60kb\tpvalue_70kb\tpvalue_80kb\tpvalue_90kb\tpvalue_100kb\n')
        bed_writer = csv.writer(f, delimiter='\t')
        bed_writer.writerows(bed_file_out)

Examples

There is a basic bigWig file included in the test subdirectory. To try out the caller, execute it on this file.

lanceotron callPeaks test/chr22.bw -f output_folder

Citation

@article {Hentges2021.01.25.428108,
    author = {Hentges, Lance D. and Sergeant, Martin J. and Downes, Damien J. and Hughes, Jim R. and Taylor, Stephen},
    title = {LanceOtron: a deep learning peak caller for ATAC-seq, ChIP-seq, and DNase-seq},
    year = {2021},
    doi = {10.1101/2021.01.25.428108},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2021/01/27/2021.01.25.428108},
    journal = {bioRxiv}
}

Bug Reports and Improvement Suggestions

Please raise an issue if there is anything you wish to ask or contribute.