ExtractProp 1.0
Note: In some cases, the output of a prediction program does not contain sufficient information to ascribe property values to a specific sequence. In these cases, additional information must be added to the output file wherever the prediction program does not generate the sequence information itself. Examples are included below for the program Mitoprot.

ExtractProp was developed at the Ohio Supercomputer Center with support from the National Science Foundation through a project 2010 award to Drs. Meier and Rose of the Ohio State University. Correspondence regarding the program, including feature suggestions and requests, should be directed to the author at eas@osc.edu.

The application is available for download as a jar file.

System Prerequisites:

Accessing the application:

Application Installation:

Running the application:

1). For Unix/Linux: With ExtractProp.jar in the current directory, the file at /usr/tmp/file.text will be processed and output will be written to standard out. (The -jar option takes the path to the jar file itself; the CLASSPATH variable is not consulted.)

% java -jar ExtractProp.jar /usr/tmp/file.text

If the jar file is not in the current directory, give its full path. The following command processes the same file when ExtractProp.jar is located at /var/Extract/ExtractProp.jar.

% java -jar /var/Extract/ExtractProp.jar /usr/tmp/file.text

2). For Windows: With ExtractProp.jar in the current directory, the file at c:\usr\tmp\file.text will be processed and output will be written to standard out.

C:\> java -jar ExtractProp.jar c:\usr\tmp\file.text

If the jar file is not in the current directory, give its full path. The following command processes the same file when ExtractProp.jar is located at c:\var\Extract\ExtractProp.jar.

C:\> java -jar c:\var\Extract\ExtractProp.jar c:\usr\tmp\file.text

ExtractProp Output Format:

The following tagged fields are generated by the program:

<seqprop></seqprop> : the identifiers for an ExtractProp record

Example:

<seqprop> <key> Q9SU47</key>

The output file is intentionally not fully XML compliant in version 1.0.
To change the current output file to an XML compliant document, simply add a start tag and an end tag of your choice at the beginning and end of the file.

Special processing notes:

MULTICOIL Analysis Input:

The detailed output file is used as input for extracting coiled-coil coverage information from the MULTICOIL program. It is very important that the FASTA file used as input to the MULTICOIL analysis have the sequence identifier as the second token in the header description. This ensures that an identifier is available for use in the detailed output. This second field in the FASTA header will be used as the sequence identifier in the resulting XML output file.

HMMTOP Analysis

Mitoprot Analysis

These three enhancements are required:

1). The string [BEGIN MITOPROT] should be on a separate line at the beginning of each Mitoprot output for a given sequence.
2). The string [SEQUENCE KEY] should be placed after the [BEGIN MITOPROT] line and must be followed by the sequence identifier chosen for the sequence processed.
3). The string [END MITOPROT] signals the end of an individual Mitoprot output file.

An example enhanced output file is shown below. This output was generated for the sequence with the identifier Q9ZWA5:

[BEGIN MITOPROT]
[SEQUENCE KEY] (Q9ZWA5)

Predotar Analysis

The string [BEGIN PREDOTAR] is added at the beginning of the file, and the string [END PREDOTAR] is added after the final output line generated by the application. The additional line

[SEQUENCE ID] [MITOCHONDRIAL] [CHLOROPLAST]

is added as column headers to help the ExtractProp application confirm that the correct values are selected. The following is an example of an enhanced Predotar file.

[BEGIN PREDOTAR]
[END PREDOTAR]

General Format Analysis:

The application also supports a general format. The general format looks like the following:

EXTRACTPROP GENERAL

Fields are whitespace separated, with sequenceid, start, end, property_name, and property_value each using a single token. The property_description field uses all remaining tokens on the line. Multiple property lines may be present in a given file. For example:

EXTRACTPROP GENERAL
The maximum multicoil probability in the domain

represents a property named 'max_multicoil_prob' with a value of .9988 in the range 56 to 88 for the sequence with the identifier Q9AW89.

Multicolumn Format Analysis:

The general format, while flexible, can be verbose and lengthy to create. Consequently, a condensed multicolumn format has been developed to support input from a more compact tabular property representation. Note that this analysis does not retain the property description in the current release. An optional delimiter may also be specified as the field separator, which is particularly useful when field values contain whitespace. An example is shown:

EXTRACTPROP MULTICOLUMN

Here the application will create two property records for identifier QSDV245, one for PROPNAME_A and one for PROPNAME_B. Multiple sequence property combinations can be placed in the same file.

Additional capabilities and formats

Documentation Last Updated – September 10, 2004

Citation: Annkatrin Rose, Sankaraganesh Manikantan, Shannon J. Schraegle, Michael A. Maloy, Eric A. Stahlberg, and Iris Meier (2004). Genome-wide Identification of Arabidopsis Coiled-Coil Proteins and the Establishment of the ARABI-COIL Database. Plant Physiology, 134, 927-939.
ExtractProp is a Java-based application for processing the output of programs that computationally predict biological properties. The program has been designed for high-throughput processing, with an auto-sense capability that detects the output type from a number of prediction programs and formats and processes each appropriately. The resulting output is a tagged, near-XML document containing many properties of interest.
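
As an illustration of the general format described above, the following is a minimal Java sketch of how one property line could be split into its fields. It is not ExtractProp's actual implementation. The class name GeneralLineSketch, the Property holder, and the field order (sequenceid, start, end, property_name, property_value, then property_description) are assumptions taken from the order in which the documentation lists the fields. The sample line is reconstructed from the values quoted in the description of the example, not copied from a real input file.

```java
import java.util.Arrays;

public class GeneralLineSketch {

    /** Hypothetical holder for one property record; the field order is assumed. */
    static final class Property {
        final String sequenceId;
        final int start;
        final int end;
        final String name;
        final String value;        // kept as text so ".9988" is preserved exactly
        final String description;  // all remaining tokens, may be empty

        Property(String sequenceId, int start, int end,
                 String name, String value, String description) {
            this.sequenceId = sequenceId;
            this.start = start;
            this.end = end;
            this.name = name;
            this.value = value;
            this.description = description;
        }
    }

    /** Splits one whitespace-separated property line into its fields. */
    static Property parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 5) {
            throw new IllegalArgumentException("expected at least 5 fields: " + line);
        }
        // The first five fields are single tokens; the rest form the description.
        String description =
                String.join(" ", Arrays.copyOfRange(tokens, 5, tokens.length));
        return new Property(tokens[0],
                Integer.parseInt(tokens[1]),
                Integer.parseInt(tokens[2]),
                tokens[3],
                tokens[4],
                description);
    }

    public static void main(String[] args) {
        // Assumed sample line, rebuilt from the values given in the documentation.
        Property p = parse("Q9AW89 56 88 max_multicoil_prob .9988 "
                + "The maximum multicoil probability in the domain");
        System.out.println(p.sequenceId + " " + p.name + "=" + p.value
                + " [" + p.start + "-" + p.end + "] " + p.description);
    }
}
```

Keeping the property value as text avoids committing to a numeric type, since the documentation does not say whether all property values are numbers.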