Working with Fragment Catalogs
Document Version: $Revision: 1.1 $
To start from scratch, the tool requires a CSV file with a SMILES
column and an activity column.  The file may contain other columns
as well; you identify the two required columns using the
--smiCol and --actCol arguments.
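
As an illustration of the input described above, the following Python sketch pulls the SMILES and activity columns out of CSV text and ignores everything else. The function name read_smiles_and_activities, the zero-based column indices, the header-row handling, and the integer activity values are all assumptions made for this example. They are not taken from the tool, so check its help output for how --smiCol and --actCol are actually interpreted.

```python
import csv
import io


def read_smiles_and_activities(text, smi_col, act_col, has_header=True):
    """Return (smiles, activity) pairs from CSV text.

    smi_col and act_col are zero-based column indices (an assumption of this
    sketch, not necessarily the convention used by --smiCol and --actCol).
    """
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader, None)  # skip the header row
    pairs = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        pairs.append((row[smi_col], int(row[act_col])))
    return pairs


# Made-up example data: the "name" column is extra and is simply ignored.
example = "name,smiles,activity\nethanol,CCO,0\nbenzene,c1ccccc1,1\n"
pairs = read_smiles_and_activities(example, smi_col=1, act_col=2)
```

Here pairs holds one (SMILES, activity) tuple per data row, which is the minimum each of the steps below would need from the input file.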
There are four steps to the process:
-  Build the fragment catalog, command line argument -b
 This loops through a set of molecules and builds a fragment
catalog containing all unique fragments found in the molecules.
  Requirements:
  Important arguments:
 
   -  -n: specifies the maximum number of molecules to be considered
   -  --catalog=[filename]: provides the name of the file to be used to
      store the pickled catalog.
 
-  Score molecules against the catalog, command line argument -s
  Requirements:
  Important arguments:
 
   -  -n: specifies the maximum number of molecules to be considered
   -  --catalog=[filename]: provides the name of the file containing a
      pickled catalog.
   -  --scores=[filename]: provides the name of the file to be used to store
      the pickled compound scores.
   -  --onbits=[filename]: provides the name of the file to be used for
      pickled OnBit lists (lists of the bits set by each molecule screened).
      Providing this option can save a lot of time.
 
-  Calculate information gains for the molecules, command line argument -g
  Requirements:
  Important arguments:
 
   -  --scores=[filename]: provides the name of the file containing the
      pickled compound scores.
   -  --gains=[filename]: provides the name of the file to be used to store
      the gains (a CSV file).
 
-  Display details about the fragments, command line argument -d
  Requirements:
  Important arguments:
 
   -  --nBits=[value]: provides the maximum number of bits on which to report
      (they are presented in order of decreasing gain).
   -  --catalog=[filename]: provides the name of the file containing a
      pickled catalog.
   -  --gains=[filename]: provides the name of the file containing the
      calculated gains (a CSV file).
   -  --details=[filename]: provides the name of the file to be used to store
      the details (a CSV file).
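
The last two steps can be pictured with a small Python sketch: compute an information gain for each bit from the molecules' on-bit lists and activities, then report the bits in order of decreasing gain, keeping at most a given number of them. The sketch uses the textbook entropy-based definition of information gain, which is an assumption here; the tool's own calculation and file formats are not described in this document and may differ. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of activity labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(bit, on_bit_lists, activities):
    """Entropy reduction obtained by splitting the molecules on one bit."""
    on = [act for bits, act in zip(on_bit_lists, activities) if bit in bits]
    off = [act for bits, act in zip(on_bit_lists, activities) if bit not in bits]
    total = len(activities)
    split = (len(on) / total) * entropy(on) + (len(off) / total) * entropy(off)
    return entropy(activities) - split


def top_bits(on_bit_lists, activities, n_bits):
    """Return (bit, gain) pairs in order of decreasing gain, at most n_bits."""
    all_bits = sorted(set().union(*on_bit_lists))
    gains = [(bit, info_gain(bit, on_bit_lists, activities)) for bit in all_bits]
    gains.sort(key=lambda item: item[1], reverse=True)
    return gains[:n_bits]


# Made-up data: four molecules, the bits each one sets, and their activities.
on_bit_lists = [{0, 1}, {0, 2}, {1}, {2}]
activities = [1, 1, 0, 0]
ranked = top_bits(on_bit_lists, activities, n_bits=2)
```

In this toy data set bit 0 is set by exactly the active molecules, so it separates the two activity classes perfectly and ranks first; bits 1 and 2 are set by one active and one inactive molecule each and carry no information about activity.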