How to handle multiple input files

Sometimes a single sample has multiple input files that need to be merged. A common case is a single library that was spread across multiple sequencing lanes, yielding several input files that must be merged and then run through the pipeline as one sample. Listing the same sample on multiple lines of your sample annotation sheet causes conceptual and analytical challenges, so PEP instead provides two ways to merge these files:

  1. Use shell expansion characters (like ‘*’ or ‘[]’) in your data_source definition or filename. This is good for simple merges.
  2. Specify a merge table, which maps input files to samples for any sample with more than one input file. This is flexible enough to handle more complicated merges.
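
As a rough illustration of what the two approaches amount to, here is a minimal Python sketch using only the standard library. It is not how PEP implements merging: the function names and the merge table column names (sample_name, file) are assumptions made for this example, so check the PEP documentation for the real format.

```python
import csv
import glob
from collections import defaultdict


def expand_pattern(pattern):
    """Approach 1: expand a shell-style pattern into a sorted list of paths."""
    # glob understands the same '*' and '[]' wildcards as the shell.
    return sorted(glob.glob(pattern))


def read_merge_table(handle):
    """Approach 2: group input files by sample from a merge table.

    Assumes a CSV with hypothetical columns 'sample_name' and 'file',
    one row per input file.
    """
    files_by_sample = defaultdict(list)
    for row in csv.DictReader(handle):
        files_by_sample[row["sample_name"]].append(row["file"])
    return dict(files_by_sample)
```

With the first approach, a pattern such as data/frog_1_L00[12].fastq would pick up both lanes of one sample. With the second, every row of the table that shares a sample name contributes one file to that sample, which allows merges that a filename pattern cannot express.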

Dealing with multiple input files is described in detail in the PEP documentation.

Note: different classes of input files, like read1 and read2, are not merged with each other. Handle them as separate derived columns in the main sample annotation sheet, so that each class is passed to the pipeline as its own argument.
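
The sketch below shows the idea behind this note: files are merged within each class, but read1 and read2 stay separate all the way to the pipeline command line. The column names and the --read1/--read2 flags are hypothetical and stand in for whatever your pipeline actually accepts.

```python
def build_arguments(sample):
    """Return a pipeline argument list with read1 and read2 kept apart.

    `sample` maps a column name to the list of files merged for that column.
    The flag names below are assumptions for illustration only.
    """
    args = []
    for column, flag in (("read1", "--read1"), ("read2", "--read2")):
        files = sample.get(column, [])
        if files:
            args.append(flag)
            args.extend(files)
    return args


# One sample sequenced on two lanes, paired-end (hypothetical filenames).
sample = {
    "read1": ["lane1_R1.fastq", "lane2_R1.fastq"],
    "read2": ["lane1_R2.fastq", "lane2_R2.fastq"],
}
print(build_arguments(sample))
```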