Go to IdAlign on the Web or download the executable jar file (new version: March 2015).

Expected file format

The uploaded file is expected to be a tab-separated file. The first line contains the column names; every subsequent line must be either blank or contain exactly the same number of columns as the first line. The column names line must contain a header "Name", which indicates the column of metabolites, and a header "FileName" (case-sensitive), which indicates the name of the "file" from which the row's data was drawn. All other column names are arbitrary, although two or more columns with the same name will lead to undefined results.
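The format rules above can be sketched as a small reader. This is only an illustration, not IdAlign's own code: the function name read_rows is hypothetical, both required headers are matched exactly, and a line holding only whitespace or tabs is assumed to count as blank.

```python
REQUIRED = ("Name", "FileName")  # required headers, matched exactly (assumption)


def read_rows(text):
    """Return (header, rows) from tab-separated text, skipping blank lines."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty file: no column names line")
    header = lines[0].split("\t")
    for col in REQUIRED:
        if col not in header:
            raise ValueError(f"missing required column: {col}")
    rows = []
    for line_no, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # blank lines are allowed and ignored
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} columns, got {len(fields)}"
            )
        rows.append(fields)
    return header, rows
```

A file that lacks either required header, or that has a row of the wrong width, is rejected with a ValueError that names the problem.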

Except for the Name and FileName columns, all columns are scanned to see whether their values can be converted into numbers (either floating point or integers). Currently no attempt is made to infer the meaning of column values from header names. Spaces are stripped; a number can be prefixed by '<' or '>' and can carry a "units" postfix, one of (m/z | scans | %). If a value does not fit this pattern, the entire value is taken to be a text string.
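A minimal sketch of this value scan follows. The name parse_value and the exact number grammar (optional sign, decimal point, exponent) are assumptions, as is discarding the '<' or '>' prefix and the units once the number has been read.

```python
import re

# Optional '<' or '>', a decimal number, then an optional units postfix.
_NUMBER = re.compile(
    r"^[<>]?([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:m/z|scans|%)?$"
)


def parse_value(raw):
    """Return a float if raw looks like a number, otherwise raw unchanged."""
    compact = raw.replace(" ", "")  # spaces are stripped before matching
    match = _NUMBER.match(compact)
    if match is None:
        return raw  # the entire value is kept as a text string
    return float(match.group(1))
```

For example, parse_value("< 12.5 m/z") gives 12.5, while parse_value("12 kg") is returned unchanged as text because "kg" is not one of the recognised units.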


The file is read and a list of metabolites is created. Each metabolite references a map (dictionary) of files, keyed on the filename supplied in the FileName column, and each file entry in turn contains a hash map of named values. Again, the names are supplied by the column header under which the value appeared. A normalizing metabolite is found (initially the first in the list that matches ‘rutenol’).
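The nested structure described above might look like this in outline. The names build_index and find_normalizer are hypothetical, and "matches" is assumed here to mean a case-insensitive substring test, which the text does not specify.

```python
from collections import defaultdict


def build_index(header, rows):
    """Map metabolite name -> file name -> {column name: value}."""
    name_i = header.index("Name")
    file_i = header.index("FileName")
    metabolites = defaultdict(lambda: defaultdict(dict))
    for fields in rows:
        values = metabolites[fields[name_i]][fields[file_i]]
        for col, cell in zip(header, fields):
            if col not in ("Name", "FileName"):
                values[col] = cell
    return metabolites


def find_normalizer(metabolites, pattern="rutenol"):
    """Return the first metabolite name containing pattern, or None."""
    for name in metabolites:  # dicts keep insertion (file) order
        if pattern.lower() in name.lower():
            return name
    return None
```

With this layout, a single value is reached as index[metabolite][filename][column].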

The user can then supply a data name (‘selected Data’) to display in the table. Values that fall below a user-defined minimum are highlighted. Missing values for each column are calculated as half the smallest value found in that file/column (or 0.0 if no values exist).
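As a worked illustration of the missing-value rule and the minimum check, here is a short sketch. The function names are hypothetical and missing entries are represented by None, which is an assumption about the data model.

```python
def missing_value(column):
    """Half the smallest value present in the column, or 0.0 if none exist."""
    present = [v for v in column if v is not None]
    return min(present) / 2 if present else 0.0


def fill_missing(column):
    """Replace each None in the column with the column's missing value."""
    fill = missing_value(column)
    return [fill if v is None else v for v in column]


def below_minimum(column, minimum):
    """Flag the values that would be highlighted for a user-defined minimum."""
    return [v is not None and v < minimum for v in column]
```

For a column [4.0, None, 10.0] the smallest value is 4.0, so the missing entry becomes 2.0.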

XLSX Output

The output is a table containing filenames as columns and metabolite values as rows. The output uses Excel's formula support to scale entire columns by the input of a single value. Multiple worksheets are created. The first sheet presents the table of values selected by the user. Missing values are replaced by a formula referencing a missing-value cell that appears above each column. This value is initially equal to half the smallest value found for that column, or zero if no values exist.

The next worksheet is a table of formulas, viz. normalized!A3 = rawdata!A3/normalized!A1, where A1 is a cell containing the normalization value specified by the user at the web interface. The final worksheet permits whole-table scaling.
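The formula pattern for the normalized sheet can be generated cell by cell. The sketch below only builds formula strings; it does not write a spreadsheet. The helper names are hypothetical, and the sheet names rawdata and normalized are taken from the example formula.

```python
def column_letter(index):
    """Convert a 0-based column index to Excel letters: 0 -> A, 26 -> AA."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def normalized_formula(col_index, row_number):
    """Formula for one cell of the normalized sheet (row_number is 1-based)."""
    cell = f"{column_letter(col_index)}{row_number}"
    # Every cell divides its raw value by the single normalization cell A1.
    return f"=rawdata!{cell}/normalized!A1"
```

Because every formula points at the same normalization cell, changing that one cell in Excel rescales the whole table.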

Known Issues

If two different values exist for the same (metabolite Name, FileName) pair, they are averaged.
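The averaging behaviour can be illustrated as follows. This is a sketch with a hypothetical function name, assuming the records arrive as (name, filename, value) tuples and that a plain arithmetic mean is used.

```python
from collections import defaultdict
from statistics import mean


def average_duplicates(records):
    """Average all values that share the same (name, filename) pair."""
    grouped = defaultdict(list)
    for name, filename, value in records:
        grouped[(name, filename)].append(value)
    return {key: mean(values) for key, values in grouped.items()}
```

Two rows for alanine in a.txt with values 10.0 and 20.0 would therefore appear as a single value of 15.0.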

The webapp stores the uploaded and parsed files server-side in the servlet’s session object. Currently up to ten files are stored per session, with the oldest files being “lost”, and the entire session expires after an idle time of 4 hours. The files are keyed on the filename sent by the browser during upload. This is a point of difference between browsers: for example, Firefox sends only the filename whereas IE7 sends the entire path name. Computational parameters (Data Value to display) specified by the user are stored in the session and applied to each computation and each file.
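The ten-file limit behaves like a small bounded store. The sketch below is not the servlet code; the class FileStore is hypothetical, and it assumes "oldest" means least recently uploaded and that uploading a file again under the same name replaces the earlier copy.

```python
from collections import OrderedDict


class FileStore:
    """Keep at most `capacity` parsed files, dropping the oldest first."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self._files = OrderedDict()

    def put(self, filename, parsed):
        # A repeated upload replaces the old entry and counts as newest.
        if filename in self._files:
            del self._files[filename]
        self._files[filename] = parsed
        while len(self._files) > self.capacity:
            self._files.popitem(last=False)  # discard the oldest entry

    def get(self, filename):
        return self._files.get(filename)

    def names(self):
        return list(self._files)
```

Since the key is whatever name the browser sends, the same file uploaded from two browsers could end up stored under two different keys.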

There are still unresolved issues about file character encoding. These seem to mainly affect the presentation of metabolite names under Firefox and not the computations.

When the software fails - such as when an incorrect file type is uploaded - it fails ungracefully and possibly confusingly. This is a UI issue that the author will improve if time permits.

Update: the webstart, Windows and Mac versions now use Apache POI to generate the Excel spreadsheets.

Undefined results will occur if two or more columns have the same name. Currently the last such column in the row will "overwrite" the earlier columns.
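The overwrite effect is what happens whenever row values are stored in a map keyed by column name. A short illustration, with hypothetical helper names, plus a check that could be run before upload:

```python
from collections import Counter


def row_to_map(header, fields):
    """Key each value by its column name; a repeated name keeps the last value."""
    return dict(zip(header, fields))


def duplicate_columns(header):
    """Return the column names that occur more than once in the header."""
    return [name for name, count in Counter(header).items() if count > 1]
```

With the header Name, FileName, Area, Area and the row alanine, a.txt, 10, 99, the map holds Area = 99 and the value 10 is lost.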

It is assumed that each column has a homogeneous format: either all values parse to a number or all parse to text.
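A file can be checked against this assumption before upload. The sketch below uses a hypothetical function name and assumes the values have already been parsed into numbers or strings, with None standing for an empty cell.

```python
def column_kind(values):
    """Classify a parsed column as 'number', 'text', 'mixed' or 'empty'."""
    kinds = set()
    for value in values:
        if value is None:
            continue  # empty cells do not count either way
        # bool is excluded on purpose: it is a subclass of int in Python
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        kinds.add("number" if is_number else "text")
    if not kinds:
        return "empty"
    return kinds.pop() if len(kinds) == 1 else "mixed"
```

A column reported as 'mixed' breaks the homogeneity assumption and is worth cleaning up first.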
