Knime4NGS

Knime4NGS is a set of custom KNIME nodes designed for processing large next-generation sequencing (NGS) datasets. The nodes wrap the functionality of popular command-line tools such as BWA, Samtools and GATK, making it easy to build comprehensive NGS workflows in KNIME.

In addition to the quick start guide below, we provide comprehensive documentation and a technical supplement with detailed descriptions and usage examples.

Quick Start

Prerequisites

  • Linux OS (required)
  • KNIME or KNIME SDK (required)
  • Software binaries (can be obtained/configured via preference page)
  • KNIME Cluster Extension (recommended)
  • KNIME Virtual Nodes Extension (for parallel execution only)
Specific prerequisites required for certain nodes:
  • R packages: gplots, ggplot2, reshape, gsalib; for RNA-Seq pipelines: argparse, edgeR, DESeq, Biobase and limma
  • Variant Effect Predictor (VEP) script including the Ensembl Core and Variation APIs (for variant annotation, see Section 2.6). VEP itself requires the following Perl modules: Archive::Extract, Archive::Zip, Test::More, DBI and CGI
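
Before building a workflow, it can help to confirm that the R packages listed above are installed. The following is a minimal sketch, not part of KNIME4NGS: it assumes that `Rscript` is on the PATH, and the function names `build_r_check_expression` and `missing_r_packages` are hypothetical helpers introduced only for illustration.

```python
import shutil
import subprocess

# Package names copied from the prerequisites list above.
R_PACKAGES = ["gplots", "ggplot2", "reshape", "gsalib"]
RNASEQ_R_PACKAGES = ["argparse", "edgeR", "DESeq", "Biobase", "limma"]


def build_r_check_expression(packages):
    """Return an R expression that prints the packages that are not installed."""
    quoted = ", ".join(f'"{name}"' for name in packages)
    return f"cat(setdiff(c({quoted}), rownames(installed.packages())))"


def missing_r_packages(packages):
    """Ask Rscript which of the given packages are missing.

    Returns a list of missing package names, or None if Rscript
    cannot be found on the PATH (assumption: R is installed system-wide).
    """
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run(
        [rscript, "-e", build_r_check_expression(packages)],
        capture_output=True,
        text=True,
        check=True,
    )
    # R prints the missing names separated by spaces; empty output means none.
    return result.stdout.split()
```

Calling `missing_r_packages(R_PACKAGES + RNASEQ_R_PACKAGES)` returns an empty list when every listed package is available. The Perl modules required by VEP could be checked in a similar way, which is not shown here.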

Download and Install Nodes

You can download and install the KNIME4NGS nodes by specifying the KNIME4NGS Update Site as a software repository within KNIME. In order to do that, go to:

Help > Install New Software > Add 
and add http://ibisngs.github.io/knime4ngs/updateSite to the available software sites. After selecting the new software site, the IBIS KNIME Nodes should appear. If this is not the case, please uncheck "Group items by category". Select and install the displayed package. After restarting KNIME, the KNIME4NGS nodes should appear in the KNIME Node Repository.

Configuration (recommended)

Our nodes can be configured by using the KNIME4NGS preference page. Setting the required file paths will speed up workflow creation and node configuration. The KNIME4NGS Preference Page can be found at:
File > Preferences > KNIME > KNIME4NGS
Missing binaries can be obtained by using 'Download missing binaries', or you can search for existing binaries in your file system. This step is necessary because most nodes do not include the binaries they execute. Although binaries can be selected in each node individually, setting global binary paths on the preference page is recommended.
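
If you want to know which tools are already installed before filling in the preference page, a short script can list them. This is only a sketch under stated assumptions: the executable names `bwa` and `samtools` are assumed to match your installation, and `locate_binaries` is a hypothetical helper, not part of KNIME4NGS.

```python
import shutil

# Executable names are assumptions; adjust them to the tools your nodes need.
CANDIDATE_BINARIES = ["bwa", "samtools"]


def locate_binaries(names):
    """Map each executable name to its absolute path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}
```

The paths returned by `locate_binaries(CANDIDATE_BINARIES)` can then be entered on the preference page. Entries with the value `None` are not on the PATH and may need to be downloaded or located manually.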

Test Workflows

To test the node functionality, we provide two example workflows: a typical variant calling workflow that covers all steps from initial quality control and read mapping to variant filtration, and a workflow for differential expression detection. The workflows can be imported into KNIME by selecting:
File > Import KNIME Workflow...
A detailed description of how to configure the workflows is given in the corresponding README.

Documentation

Comprehensive documentation including detailed descriptions and examples can be found here.

Source Code

The complete source code of KNIME4NGS is available at https://github.com/ibisngs/knime4ngs-src.

FAQs

  1. Which nodes can work on compressed input files?

    ⇒ Compatible with compressed FastQ files: FastQC, RawReadManipulator

    ⇒ Compatible with compressed vcf files: VEP

  2. No active session error in parallel chunk environment.

    ⇒ Reset parallel chunk start and re-execute.

  3. The node won't execute and always returns to "orange configure-state" without an error message.

    ⇒ Reset the last successfully executed predecessor node and re-execute the nodes.

  4. Update has encountered a problem.

    ⇒ Restart KNIME before updating.

  5. Node is in EXECUTING state and cannot be canceled although it has been successfully executed on the cluster.

    ⇒ Go to the corresponding workflow folder in the knime-workspace. Edit the xml file of the executing node and replace EXECUTING by IDLE or CONFIGURED.

  6. VQSR: Unable to retrieve results.

    ⇒ Use VQSR in single-threaded mode (Threads 1).

  7. VQSR: No data found.

    ⇒ Reduce maximum number of Gaussians to 4.

  8. After inserting a new node into the workflow (or when using an imported workflow), the nodes show warnings concerning missing binaries although all paths are set by using the preference page.

    ⇒ Open the node dialog and click 'OK'. Afterwards the node should be ready for execution.

  9. How does the included FastQCmod differ from FastQC v0.10.1?

    ⇒ See Link for detailed description.

  10. What is the RawReadManipulator and how does it function?

    ⇒ See Link for detailed description.
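
The manual fix described in FAQ 5 can also be scripted. The sketch below is an illustration only: the path to the node's xml file must be supplied by you, and `reset_node_state` and `reset_node_file` are hypothetical helper names. It replaces every occurrence of EXECUTING in the file, as the FAQ describes, and writes a backup copy first.

```python
import shutil
from pathlib import Path


def reset_node_state(xml_text, new_state="CONFIGURED"):
    """Replace every occurrence of EXECUTING with new_state (IDLE or CONFIGURED)."""
    if new_state not in ("IDLE", "CONFIGURED"):
        raise ValueError("new_state must be 'IDLE' or 'CONFIGURED'")
    return xml_text.replace("EXECUTING", new_state)


def reset_node_file(xml_path, new_state="CONFIGURED"):
    """Back up the node's xml file, then rewrite it with the state reset."""
    path = Path(xml_path)
    # Keep a copy next to the original so the change can be undone.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    original = path.read_text(encoding="utf-8")
    path.write_text(reset_node_state(original, new_state), encoding="utf-8")
```

For example, `reset_node_file("/path/to/node/file.xml", "IDLE")` would rewrite the given file, where the path shown is a placeholder for the xml file of the stuck node in your workspace.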

Contact

knime4ngs@gmail.com