Functions#
analysis is the main Agatta function, which can be used to perform a standard cladistic or biogeographic analysis from hierarchical matrices or newick files. The analysis function can also return the following two interpretation tools:
The global retention index, together with the retention index per character, per state, and per free-paralogy subtree.
The procedure for testing character states in the context of taxonomic homology proposed by Cao (2008).
The following figure summarises the Agatta pipeline for the analysis function.
The following functions can be used when the user wants to perform only a specific part of the analysis pipeline:
hmatrix: converts a hierarchical matrix into a newick file.
tripdec: decompose input trees into three-item statements and compute their weights.
standardisation: replace the terminals of trees by others. Generally used in biogeographic analysis to replace OTU in phylogenetic trees by areas of endemism.
convert: compute a triplet matrix readable by an external software (PAUP*, TNT, WQFM, wTREE-QMC).
fp: compute a free-paralogy analysis for character trees with repeated leaves.
consensus: compute a consensus tree.
support: compute the retention index (global, per character, per state, per free-paralogy subtree).
chartest: compute the character state test procedure for all character states in three-item analysis.
Finally, the describetree function is separate and can be used to obtain simple information about trees, their number of leaves, polytomies, etc. This can be useful when manipulating trees before and after analysis.
The following section explains how each function works, as also shown using the help function, for example agatta help analysis.
Analysis#
Main function of the Agatta python package. It performs a three-item
analysis, e.g., in the context of phylogenetic systematics using
hierarchical characters, or in cladistic biogeography. The cladogram(s)
obtained by congruence are the tree(s) that maximise the amount of hypotheses
of cladistic relationships (i.e., the three-item statements) deduced from
the input trees (the characters).
The analysis can be performed using one or several text files containing a
hierarchical matrix (see the mandatory parameters section below for
information about the format), rooted trees encoded in a newick,
nexus, or nexml file, or a mix of input formats.
There are no constraints on the input trees except that they must
be rooted to perform the analysis. If there are repeated leaves
(polymorphism), they are automatically removed (several methods
are implemented, see below).
Several options are available for analysing the results: the user can
compute a consensus when several cladograms are optimal, a specific
character-state testing procedure can be used to test whether each
character state (i.e., each informative node of a tree) is a synapomorphy
or a homoplasy, and finally a retention index can be computed to
obtain the proportion of phylogenetic relationships of characters
that have been retained in the optimal cladogram.
Usage:
agatta analysis <file>... [-s -v --analysis=<type> --chartest
--consensus=<type> --parallel=<int>
--pdf --prefix=<file>
--repetitions=<type> --replicates=<int>
--ri --rosetta=<file> --softpath=<path>
--software=<type> --taxarep1=<path>
--weighting=<type> --detailed_tripdec]
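As an illustration of the usage pattern above, the following sketch assembles an `agatta analysis` command line from a few of the documented flags. The helper name `build_analysis_command` and the input file name `characters.hmatrix` are hypothetical; the sketch only builds and prints the argument list and does not run Agatta.

```python
import shlex


def build_analysis_command(files, prefix=None, weighting=None,
                           consensus=None, ri=False, chartest=False):
    """Assemble an argument list for `agatta analysis` (hypothetical helper)."""
    cmd = ["agatta", "analysis", *files]
    if prefix is not None:
        cmd.append(f"--prefix={prefix}")
    if weighting is not None:
        cmd.append(f"--weighting={weighting}")
    if consensus is not None:
        cmd.append(f"--consensus={consensus}")
    if ri:
        cmd.append("--ri")
    if chartest:
        cmd.append("--chartest")
    return cmd


# 'characters.hmatrix' is a placeholder input file name.
command = build_analysis_command(["characters.hmatrix"], prefix="run1",
                                 weighting="FW", consensus="strict", ri=True)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` on a machine where Agatta is installed.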
Mandatory parameters:
<file> Path(s) of the file(s) containing the character trees.
The trees can be encoded for Agatta in a file in several ways:
- a hierarchical matrix,
- A newick file with a single newick tree on each line,
- A nexus file (extension in .nex),
- A nexml file (extension in .nexml).
Mixing input file format is allowed.
The following url gives more information on how to build input
files: https://vrineau.github.io/AgattaDocs/Input%20files.html.
Optional parameters:
-s Silent mode.
-v Verbose mode.
--analysis=<type> Type of tree search analysis between an exact
branch and bound ('bandb') or a heuristic tree search
('heuristic'). The heuristic search is only available through
PAUP*, TNT, WQFM, or wTREE-QMC, so it is mandatory to add the flag
--software=tnt, --software=paup, --software=wqfm, or
--software=wtree-qmc (and the flag --softpath accordingly).
By default the analysis is in branch and bound below 15 terminals
and heuristic otherwise.
--chartest Tests and locates all character states on the cladogram.
Each character state is a synapomorphy if the hypothesis of sameness
is accepted or a homoplasy if it is rejected.
Two output files are written: prefix.chartest gives all locations
of the states on the cladogram and whether the test is passed or not,
sorted by state, and prefix.chartest_node gives the same
information sorted by node.
--consensus=<type> Compute a consensus which can be a strict
consensus ('strict') or a reduced cladistic consensus ('rcc',
Wilkinson 1994) which is able to detect more common information
in subtrees. By default, the flag without argument produces a
strict consensus. The output file is prefix.constrict or
prefix.rcc.
--parallel=<type> Option for choosing whether the analysis is made
using multiprocessing or not. This argument can be:
- 'not' if the user does not want to use multiprocessing.
- 'auto' for automatic detection of the number of cpu.
- any integer corresponding to the number of cpu allowed.
By default, the analysis is made in parallel using all available
cpu. This option can be used in the case of very large character
trees that can saturate the RAM if too many parallel processes are
active.
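The three kinds of value accepted by --parallel can be pictured with a small sketch that turns the option into a number of worker processes. This is not Agatta's own code: the function name `resolve_workers` is hypothetical, and accepting both 'not' and 'no' is an assumption made here because the help texts spell the keyword both ways.

```python
import os


def resolve_workers(parallel="auto"):
    """Translate a --parallel style value into a worker count (sketch)."""
    value = str(parallel).strip().lower()
    if value in ("not", "no"):
        return 1  # no multiprocessing: a single process
    if value == "auto":
        # os.cpu_count() may return None, so fall back to one worker.
        return os.cpu_count() or 1
    workers = int(value)  # raises ValueError for anything that is not an integer
    if workers < 1:
        raise ValueError("the number of cpu must be at least 1")
    return workers


print(resolve_workers("not"), resolve_workers("4"))
```

Capping the worker count this way is what keeps memory use under control for very large character trees.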
--pdf Compute a pdf file to visualise character states on
the cladogram if --chartest is used. One page is written for each
state. The file is named prefix.pdf.
--prefix=<file> Prefix of all saving files. The complete path can
be used. By default, the prefix is 'agatta_out' and all files are
saved in the directory of the first <file>.
--repetitions=<type> The removal of repeated leaves in character
trees is made using the method of free-paralogy subtree.
Two algorithms are implemented, the original one from Nelson and
Ladiges (1996) ('FPS') for dealing with paralogy in cladistic
biogeography, and the algorithm of Rineau et al. (2021) ('TMS')
designed for all cases of repetitions (not only paralogy).
If the flag is not used and if repetitions are detected, they are
automatically removed using the algorithm of Rineau et al.
The repetition-free character trees are written in prefix.poly and
each new tree receives an id, e.g. 1.2 corresponds to the second
repetition-free subtree computed from the 1st original character
tree.
--replicates=<int> State the number of replicates in case
of a heuristic tree search. By default the heuristic search is
launched with 1000 replicates.
--ri Compute the retention index of the resulting cladogram and
a retention index for each character, which states the percentage
of phylogenetic information retained in the optimal cladogram.
The results are written in prefix.ri.
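One plausible reading of the retention index described above is the share of a character's triplet weight that is also found in the cladogram. The sketch below follows that reading only; it is not taken from Agatta's source, and the triplet encoding `(a, b, c)`, meaning "a and b are closer to each other than to c", is an assumption of this example.

```python
from fractions import Fraction


def retention_index(character_triplets, cladogram_triplets):
    """Share of the character's triplet weight retained in the cladogram.

    character_triplets: dict mapping a triplet to its weight.
    cladogram_triplets: set of triplets displayed by the cladogram.
    """
    total = sum(character_triplets.values(), Fraction(0))
    if total == 0:
        return None  # a character without triplets carries no information
    retained = sum((weight for triplet, weight in character_triplets.items()
                    if triplet in cladogram_triplets), Fraction(0))
    return retained / total


character = {("A", "B", "C"): Fraction(1),
             ("A", "B", "D"): Fraction(1),
             ("A", "C", "D"): Fraction(1, 2)}
cladogram = {("A", "B", "C"), ("A", "B", "D")}
ri = retention_index(character, cladogram)
print(ri, float(ri))
```

Under the same reading, a global index would pool the weights of all characters before dividing.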
--rosetta=<file> If the input tree leaves are parts, cladistics
requires a standardisation step in which parts are replaced by
wholes. It is especially important in cladistic biogeography where
terminal taxa are replaced by biogeographic areas. The rosetta flag
replaces leaves of input trees according to a csv file with its
path given as argument. The csv file is a table with two columns,
one with the names of the tree leaves and the second with the
corresponding names that replace them. The results of the
standardisation are written in the file prefix.stand.
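The leaf replacement driven by the two-column table can be sketched as follows. This is an illustration rather than Agatta's implementation: the comma delimiter, the absence of a header row, the absence of branch lengths, and the helper names `load_rosetta` and `replace_leaves` are all assumptions, and the handling of leaves mapped to several areas is left out.

```python
import csv
import io
import re


def load_rosetta(handle):
    """Read a two-column table: leaf name, then replacement name."""
    return {row[0].strip(): row[1].strip()
            for row in csv.reader(handle) if len(row) >= 2}


def replace_leaves(newick, mapping):
    """Swap every label found in the mapping; other text is left untouched."""
    # A label is taken here as a run of characters other than newick punctuation.
    return re.sub(r"[^(),;:\s]+",
                  lambda match: mapping.get(match.group(0), match.group(0)),
                  newick)


# An in-memory table stands in for the csv file given to --rosetta.
table = io.StringIO("sp1,Africa\nsp2,Asia\nsp3,Europe\n")
mapping = load_rosetta(table)
areagram = replace_leaves("(sp1,(sp2,sp3));", mapping)
print(areagram)  # (Africa,(Asia,Europe));
```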
--softpath=<path> Path of the software declared in --software.
--software=<type> Choose how to perform the three-item analysis.
The analysis can be performed using the built-in branch and bound
in Agatta ('agatta'). 'paup' and 'tnt' can be used for branch and
bound or heuristic search. 'wqfm' and 'wtree-qmc' are conceived to
perform heuristic searches only.
By default the analysis is made using built-in branch and bound.
However it works only with very few terminals.
Users should consider switching software if the
analysis time appears to be too long. A prefix.nex file is
generated if 'paup', a prefix.tnt file for 'tnt', a
prefix.wqfm for 'wqfm', and a prefix.wtqmc for 'wtree-qmc'.
--taxarep1=<path> If the user wants to replace identifiers by real
leaf names in the result files, this flag can be used with a path
to a csv file with two columns, the first with the identifiers in
the actual newick strings and the other with the names the user
wants.
--weighting=<type> Weighting scheme to use on triplets. The
type of weighting will change the results of the analysis.
The following schemes can be used:
- FW: Fractional weighting from Rineau et al. (2021),
- FWNL: Fractional weighting from Nelson and Ladiges (1992),
- UW: Uniform weighting from Nelson and Ladiges (1992),
- MW: Minimal weighting from Wilkinson et al. (2004),
- AW: Additive weighting : the weight of a triplet in additive
weighting corresponds to the number of trees in which the
triplet is present,
- NW: No weighting (all triplets have a weight of 1).
By default 'FW' is used.
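Of the schemes listed above, additive weighting (AW) and no weighting (NW) are fully defined by their one-line descriptions, so they can be sketched directly. The fractional, uniform, and minimal schemes depend on the cited papers and are not reproduced here. The triplet encoding `(a, b, c)` and the function names are assumptions of this example.

```python
from collections import Counter


def additive_weights(triplets_per_tree):
    """AW: weight of a triplet = number of trees in which it is present."""
    counts = Counter()
    for triplets in triplets_per_tree:
        counts.update(set(triplets))  # each tree counts at most once per triplet
    return dict(counts)


def no_weights(triplets_per_tree):
    """NW: every triplet found in at least one tree gets a weight of 1."""
    return {triplet: 1 for triplets in triplets_per_tree for triplet in triplets}


# Two character trees, already decomposed into triplets.
trees = [{("A", "B", "C"), ("A", "B", "D")}, {("A", "B", "C")}]
aw = additive_weights(trees)
nw = no_weights(trees)
print(aw[("A", "B", "C")], nw[("A", "B", "C")])  # 2 1
```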
--detailed_tripdec Compute a detailed csv table showing the
link between triplet weights and character trees. Each column
corresponds to one character (same order as <file>). Each line
corresponds to a triplet. The last column and line give the sum
of all weights of the column or line, respectively.
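The layout of the detailed table (one line per triplet, one column per character, sums in the last column and last line) can be pictured with the sketch below. The triplet labels such as `AB|C`, the `sum` label, and the function name `detailed_table` are placeholders chosen for this example, not the format Agatta writes.

```python
import csv
import sys


def detailed_table(weights_per_character):
    """Build rows: triplet label, one weight per character, then the row sum."""
    triplets = sorted({t for weights in weights_per_character for t in weights})
    rows = []
    for triplet in triplets:
        # A triplet absent from a character contributes a weight of 0.
        cells = [weights.get(triplet, 0) for weights in weights_per_character]
        rows.append([triplet, *cells, sum(cells)])
    column_sums = [sum(weights.values()) for weights in weights_per_character]
    rows.append(["sum", *column_sums, sum(column_sums)])
    return rows


# Two characters, given in the same order as the input files.
characters = [{"AB|C": 1, "AB|D": 1}, {"AB|C": 2}]
table = detailed_table(characters)
csv.writer(sys.stdout).writerows(table)
```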
--nsupport Compute node support for each resulting cladogram based
on triplets (amount of weighted triplets compatible with each node).
Output:
Four output files are always written when using the analysis
command, in addition to optional output files.
The files are:
- prefix.log is a log file with all the parameters of the analysis.
- prefix.triplet is a file with all triplets deduced from the input
character trees. Each row corresponds to one triplet with its
weight as a fraction and as a float.
- prefix.taxabloc is a table file with the correspondence between
leaf identifiers and names given in the input.
- prefix.tre is a newick file recording all the optimal cladograms
found during the analysis.
hmatrix#
hmatrix converts one or several hierarchical matrices into rooted trees.
Each column of the matrix corresponds to one tree.
Usage:
agatta hmatrix <file>... [-s -v --chardec --prefix=<file>]
Mandatory parameters:
<file> Hierarchical matrix (one or several).
A complete guide on the hierarchical matrix format is available
here:
https://vrineau.github.io/AgattaDocs/Input%20files.html.
Optional parameters:
-s Silent mode.
-v Verbose mode.
--chardec Decompose each tree into components (one subtree for
each informative node).
--prefix=<file> Prefix of the saving file. The complete path can
be used. By default, the prefix is 'agatta_out' and all files are
saved in the directory of the first <file>.
Output:
One output file prefix.tre with the trees deduced from the hierarchical
matrix.
Tripdec#
tripdec
Decomposes rooted tree(s) into minimal cladistic statements (triplets
or three-item statements) stating that two leaves are closer to each
other than to a third. During decomposition the weight of each triplet is
computed according to a specific weighting scheme. In three-item analysis,
the weighted triplets are then analysed to compute the cladogram that is
in agreement with the maximum amount of them.
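The decomposition itself can be sketched without the weighting step. In the example below a rooted tree is a nested tuple of leaf names, which is an assumption of this sketch and not Agatta's input format. For every internal node, each pair of leaves inside the node is combined with each leaf outside it.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def triplets(tree):
    """Return all (a, b, c) with a and b closer to each other than to c."""
    all_leaves = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        inside = leaves(node)
        # Every pair inside the node is opposed to every leaf outside it.
        for a, b in combinations(sorted(inside), 2):
            for outside in all_leaves - inside:
                found.add((a, b, outside))
        for child in node:
            visit(child)

    visit(tree)
    return found


tree = ("D", ("C", ("A", "B")))
result = triplets(tree)
print(sorted(result))
```

The root contributes nothing because no leaf lies outside it, which is why only informative nodes produce triplets.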
Usage:
agatta tripdec <file>... [-s -v --parallel=<int> --prefix=<file>
--taxarep1=<path> --weighting=<type>
--repetitions=<type> --detailed_tripdec]
Mandatory parameters:
<file> Path(s) of the file(s) containing the character trees.
The trees can be encoded for Agatta in a file in several ways:
- a hierarchical matrix,
- A newick file with a single newick tree on each line,
- A nexus file (extension in .nex),
- A nexml file (extension in .nexml).
The following url gives more information on how to build input
files: https://vrineau.github.io/AgattaDocs/Input%20files.html.
Optional parameters:
-s Silent mode.
-v Verbose mode.
--parallel=<type> Option for choosing whether the analysis is made
using multiprocessing or not. This argument can be:
- 'not' if the user does not want to use multiprocessing.
- 'auto' for automatic detection of the number of cpu.
- any integer corresponding to the number of cpu allowed.
By default, the analysis is made in parallel using all available
cpu. This option can be used in the case of very large character
trees that can saturate the RAM if too many parallel processes are
active.
--prefix=<file> Prefix of all saving files. The complete path can
be used. By default, the prefix is 'agatta_out' and all files are
saved in the directory of the first <file>.
--taxarep1=<path> If the user wants to replace identifiers by real
leaf names in the result files, this flag can be used with a path
to a csv file with two columns, the first with the identifiers in
the actual newick strings and the other with the names the user
wants.
--weighting=<type> Weighting scheme to use on triplets. The
type of weighting will change the results of the analysis.
The following schemes can be used:
- FW: Fractional weighting from Rineau et al. (2021),
- FWNL: Fractional weighting from Nelson and Ladiges (1992),
- UW: Uniform weighting from Nelson and Ladiges (1992),
- MW: Minimal weighting from Wilkinson et al. (2004),
- AW: Additive weighting : the weight of a triplet in additive
weighting corresponds to the number of trees in which the
triplet is present,
- NW: No weighting (all triplets have a weight of 1).
By default 'FW' is used.
--detailed_tripdec Compute a detailed csv table showing the
link between triplet weights and character trees. Each column
corresponds to one character (same order as <file>). Each line
corresponds to a triplet. The last column and line give the sum
of all weights of the column or line, respectively.
--repetitions=<type> The removal of repeated leaves in character
trees is made using the method of free-paralogy subtree.
Two algorithms are implemented, the original one from Nelson and
Ladiges (1996) ('FPS') for dealing with paralogy in cladistic
biogeography, and the algorithm of Rineau et al. (2021) ('TMS')
designed for all cases of repetitions (not only paralogy).
If the flag is not used and if repetitions are detected, they are
automatically removed using the algorithm of Rineau et al.
The repetition-free character trees are written in prefix.poly and
each new tree receives an id, e.g. 1.2 corresponds to the second
repetition-free subtree computed from the 1st original character
tree.
Output:
Two files are written after decomposition. The main one is
prefix.triplet, a file containing all triplets deduced from the input
character trees. Each row corresponds to one triplet with its
weight as an integer (if UW, AW, or NW), or as a fraction and as a
float otherwise. The second file, prefix.taxabloc, is a table file with
the correspondence between leaf identifiers and names given in the
input trees.
Standardisation#
standardisation
This command is used for standardisation of characters in the context of
cladistic biogeography, i.e., replacement of leaves using a correspondence
table. In cladistic theory, the standardisation is the construction of
character trees (hypotheses of kinship relationships between bearers) based
on homologies (hypotheses of kinship relationships between parts). The
standardisation is used to convert phylogenies into areagrams in cladistic
biogeography. The only currently implemented option for managing MAST
(multiple areas on a single terminal) is their automatic deletion (i.e.
MAST do not bear any unambiguous biogeographic information).
Usage:
agatta standardisation <file> <file>... [-s -v --prefix=<file>]
Mandatory parameters:
At least two <file> arguments are required by the
standardisation command.
The first <file> argument is the path of a csv table with two
columns, the first column corresponds to the leaf names of input
trees, and the second column to corresponding names for
replacement. For example, in cladistic biogeography, taxa are in
the left column and corresponding areas in the right column.
The other(s) <file> correspond to the path(s) of the file(s)
containing the character trees. The trees can be encoded for Agatta
in a file in several ways:
- a hierarchical matrix,
- A newick file with a single newick tree on each line,
- A nexus file (extension in .nex),
- A nexml file (extension in .nexml).
The following url gives more information on how to build input
files: https://vrineau.github.io/AgattaDocs/Input%20files.html.
Optional parameters:
-s Silent mode.
-v Verbose mode.
--prefix=<file> Prefix of the saving files. The complete path can
be used. By default, the prefix is 'agatta_out' and all files are
saved in the directory of the first <file>.
Output:
One output newick file prefix.stand containing the trees after
leaf replacement.
Convert#
convert
The convert command is intended to compute a triplet matrix readable by an
external software (currently PAUP*, TNT, WQFM, wTREE-QMC are implemented)
from a file containing a hierarchical matrix, a list of trees, or a list of
triplets.
Usage:
agatta convert <file>... [-s -v --analysis=<type> --filetype=<type>
--log --parallel=<int> --prefix=<file>
--replicates=<int> --software=<type>
--taxarep1=<path> --weighting=<type>
--repetitions=<type>]
Mandatory parameters:
<file> Path(s) of the file(s) containing character trees or
triplets.
The trees can be encoded for Agatta in a file in several ways:
- a hierarchical matrix,
- A newick file with a single newick tree on each line,
- A nexus file (extension in .nex),
- A nexml file (extension in .nexml).
The following url give more information on how to build input
files: https://vrineau.github.io/AgattaDocs/Input%20files.html.
The user can also use as input file a triplet file generated from
the Agatta tripdec command.
Optional parameters:
-s Silent mode.
-v Verbose mode.
--analysis=<type> Type of tree search analysis between an exact
branch and bound ('bandb') or a heuristic tree search
('heuristic'). A line is written accordingly in the output file.
By default the analysis is in branch and bound below 15 terminals
and heuristic otherwise.
--filetype=<type> The input file can be a classic file containing
newick trees (newick file, nexus file, or nexml file) ('trees'), or
a list of triplets ('triplets') generated using the Agatta tripdec
command.
--log Add a line in the output file to generate a log file during
the PAUP* or TNT analysis.
--parallel=<type> Option for choosing whether the triplet decomposition
is made using multiprocessing or not. This argument can be:
- 'no' if the user does not want to use multiprocessing.
- 'auto' for automatic detection of the number of cpu.
- any integer corresponding to the number of cpu allowed.
By default, the analysis is made in parallel using all available
cpu. This option can be used in the case of very large character
trees that can saturate the RAM if too many parallel processes are
active.
--prefix=<file> Prefix of saving .nex or .tnt file. The complete
path can be used. By default, the prefix is 'agatta_out' and output
file is saved in the directory of <file>.
--replicates=<int> State the number of replicates in case
of a heuristic tree search. By default the heuristic search is
launched with 1000 replicates.
--software=<type> Choose how to perform the three-item analysis.
The analysis can be performed using the built-in branch and bound
in Agatta ('agatta'). 'paup' and 'tnt' can be used for branch and
bound or heuristic search. 'wqfm' and 'wtree-qmc' can be used for
heuristic search only. By default the analysis is made using
built-in branch and bound. However it works only with very few
terminals. Users should consider switching software if the
analysis time appears to be too long. A prefix.nex file is
generated if 'paup', a prefix.tnt file for 'tnt', a prefix.wqfm
for 'wqfm', and a prefix.wtqmc for 'wtree-qmc'.
--taxarep1=<path> If the user wants to replace identifiers by real
leaf names in the result files, this flag can be used with a path
to a csv file with two columns, the first with the identifiers in
the actual newick strings and the other with the names the user
wants.
--weighting=<type> Weighting scheme to use on triplets. The
type of weighting will change the results of the analysis.
The following schemes can be used:
- FW: Fractional weighting from Rineau et al. (2021),
- FWNL: Fractional weighting from Nelson and Ladiges (1992),
- UW: Uniform weighting from Nelson and Ladiges (1992),
- MW: Minimal weighting from Wilkinson et al. (2004),
- AW: Additive weighting : the weight of a triplet in additive
weighting corresponds to the number of trees in which the
triplet is present,
- NW: No weighting (all triplets have a weight of 1).
By default 'FW' is used.
--repetitions=<type> The removal of repeated leaves in character
trees is made using the method of free-paralogy subtree.
Two algorithms are implemented, the original one from Nelson and
Ladiges (1996) ('FPS') for dealing with paralogy in cladistic
biogeography, and the algorithm of Rineau et al. (2021) ('TMS')
designed for all cases of repetitions (not only paralogy).
If the flag is not used and if repetitions are detected, they are
automatically removed using the algorithm of Rineau et al.
The repetition-free character trees are written in prefix.poly and
each new tree receives an id, e.g. 1.2 corresponds to the second
repetition-free subtree computed from the 1st original character
tree.
Output:
A triplet matrix file with weights that can be analysed using
PAUP* or TNT or a triplet file with weights for WQFM or wTREE-QMC.
fp#
fp
The free-paralogy analysis is a method used to manage repetitions
(polymorphism in phylogenetics) in the context of cladistic analysis
using hierarchical characters. Free-paralogy subtree analysis builds
subtrees to avoid repetition of leaves. Two distinct algorithms for building
subtrees are currently implemented in Agatta (see --repetitions).
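Before building subtrees, a tree must be recognised as containing repeated leaves. The sketch below only performs that detection step on a newick string; it does not implement the FPS or TMS algorithms. It assumes a tree without branch lengths or internal node labels, and the function name `repeated_leaves` is hypothetical.

```python
import re
from collections import Counter


def repeated_leaves(newick):
    """Return the leaf labels that occur more than once, with their counts."""
    # A label is taken here as a run of characters other than newick punctuation.
    labels = re.findall(r"[^(),;:\s]+", newick)
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n > 1}


duplicates = repeated_leaves("((A,B),(A,(C,D)));")
print(duplicates)  # {'A': 2}
```

A tree for which this function returns an empty dictionary needs no free-paralogy treatment.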
Usage:
agatta fp <file>... [-s -v --prefix=<file> --repetitions=<type>
--taxarep1=<path>]
Mandatory parameters:
<file> Path(s) of the file(s) containing the character trees.
The trees can be encoded for Agatta in a file in several ways:
- a hierarchical matrix,
- A newick file with a single newick tree on each line,
- A nexus file (extension in .nex),
- A nexml file (extension in .nexml).
The following url gives more information on how to build input
files: https://vrineau.github.io/AgattaDocs/Input%20files.html.
Optional parameters:
-s Silent mode.
-v Verbose mode.
--prefix=<file> Prefix of all saving files. The complete path can
be used. By default, the prefix is 'agatta_out' and all files are
saved in the directory of the first <file>.
--repetitions=<type> Two algorithms are implemented, the original
one from Nelson and Ladiges (1996) ('FPS') for dealing with
paralogy in cladistic biogeography, and the algorithm of
Rineau et al. (2021) ('TMS') designed for all cases of
repetitions (not only paralogy).
If the flag is not used and if repetitions are detected, they are
automatically removed using the algorithm of Rineau et al.
--taxarep1=<path> If the user wants to replace identifiers by real
leaf names in the result files, this flag can be used with a path
to a csv file with two columns, the first with the identifiers in
the actual newick strings and the other with the names the user
wants.
Output:
One newick file prefix.poly with a list of subtrees free of repeated
leaves. Each subtree is labelled with the number of the original tree
and the number of the subtree generated from the original tree,
e.g. 1.2 corresponds to the second repetition-free subtree computed
from the 1st original character tree.
consensus#
consensus
Compute a consensus of several equally optimal trees. Two types of
consensus are available: a strict consensus, which displays only the clades
common to all trees, and a reduced cladistic consensus (rcc), which displays
subtrees that are common to all trees (in other words, the rcc
generates all triplets common to all optimal trees and combines them
into the biggest trees).
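The strict consensus described above keeps only the clades found in every tree. The following sketch shows that idea on nested tuples of leaf names, an encoding assumed for this example. It supposes that all trees share the same leaf set, and it does not cover the reduced cladistic consensus.

```python
def clades(tree):
    """Return (leaf set of the tree, set of leaf sets of its internal nodes)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leafset, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leafset |= child_leaves
        found |= child_clades
    found.add(leafset)
    return leafset, found


def strict_consensus(trees):
    """Build the tree made of the clades present in every input tree."""
    shared = set.intersection(*(clades(tree)[1] for tree in trees))

    def build(leafset):
        inner = [c for c in shared if c < leafset]
        # Keep only the largest shared clades nested directly in this one.
        maximal = [c for c in inner if not any(c < d for d in inner)]
        covered = frozenset().union(*maximal)
        parts = [build(c) for c in sorted(maximal, key=sorted)]
        parts += sorted(leafset - covered)
        return tuple(parts)

    return build(clades(trees[0])[0])


t1 = ("D", ("C", ("A", "B")))
t2 = ("C", ("D", ("A", "B")))
consensus = strict_consensus([t1, t2])
print(consensus)  # (('A', 'B'), 'C', 'D')
```

Only the clade (A, B) is common to both trees, so C and D collapse into a polytomy at the root.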
Usage:
agatta consensus <file> [-s -v --consensus=<type> --prefix=<file>
--taxarep1=<path>]
Mandatory parameters:
<file> Path of the file containing the equally optimal trees.
The trees can be encoded for Agatta in a file in several ways:
- a hierarchical matrix,
- A newick file with a single newick tree on each line,
- A nexus file (extension in .nex),
- A nexml file (extension in .nexml).
The following url gives more information on how to build input
files: https://vrineau.github.io/AgattaDocs/Input%20files.html.
Optional parameters:
-s Silent mode.
-v Verbose mode.
--consensus=<type> Compute a consensus which can be a strict
consensus ('strict') or a reduced cladistic consensus ('rcc',
Wilkinson 1994) which is able to detect more common information
in subtrees. By default, the flag without argument produces a
strict consensus.
--prefix=<file> Prefix of all saving files. The complete path can
be used. By default, the prefix is 'agatta_out' and all files are
saved in the directory of the first <file>.
--taxarep1=<path> If the user wants to replace identifiers by real
leaf names in the result files, this flag can be used with a path
to a csv file with two columns, the first with the identifiers in
the actual newick strings and the other with the names the user
wants.
Output:
One newick file containing the strict consensus prefix.constrict or
a list of common subtrees (the rcc profile) prefix.rcc.
Support#
support
The support command gathers several triplet metrics and indices that are
intended to compare trees with one another. The retention index measures the
amount of cladistic relationships from the input characters retained in
the optimal cladogram (or a consensus). The triplet distance measures
the distance in terms of triplets between two trees. The ITRI compares,
in terms of triplets, a tree with a reference tree (used to
measure the efficiency of methods using simulations).
All these metrics handle weighting schemes.
Usage:
agatta support <file> <file>... [-s -v --index=<type> --prefix=<file>
--taxarep1=<path> --taxarep2=<path>
--weighting=<type> --repetitions=<type>]
Mandatory parameters:
At least two <file> arguments are required, each being the path
of a tree file. The first is the cladogram, the others are
character files.
The required files depend on the --index flag:
--index=ri: the retention index compares the cladogram to its
characters. The first <file> contains one tree considered as
the optimal cladogram (or a consensus); the second <file>
contains the input character trees used to construct the
cladogram and can be newick files or a hierarchical matrix.
--index=tripdistance: compute various measures for comparison
between only two trees t1 and t2 (one in each file).
Optional parameters:
-s Silent mode.
-v Verbose mode.
--index The index flag specifies which measure to use to compare
rooted trees: retention index ('ri'), or inter-tree
retention index/triplet distance ('tripdistance'). More
information on the input files is given in the mandatory parameters section.
--taxarep1=<path> If the user wants to replace identifiers by real
leaf names in the result files, this flag can be used with a path
to a csv file with two columns, the first with the identifiers in
the actual newick strings and the other with the names the user
wants.
--taxarep2=<path> Same as --taxarep1 for the second <file>.
--weighting=<type> Weighting scheme to use on triplets. The
type of weighting will change the results of the analysis.
The following schemes can be used:
- FW: Fractional weighting from Rineau et al. (2021),
- FWNL: Fractional weighting from Nelson and Ladiges (1992),
- UW: Uniform weighting from Nelson and Ladiges (1992),
- MW: Minimal weighting from Wilkinson et al. (2004),
- AW: Additive weighting : the weight of a triplet in additive
weighting corresponds to the number of trees in which the
triplet is present,
- NW: No weighting (all triplets have a weight of 1).
By default 'FW' is used.
--repetitions=<type> The removal of repeated leaves in character
trees is made using the method of free-paralogy subtree.
Two algorithms are implemented, the original one from Nelson and
Ladiges (1996) ('FPS') for dealing with paralogy in cladistic
biogeography, and the algorithm of Rineau et al. (2021) ('TMS')
designed for all cases of repetitions (not only paralogy).
If the flag is not used and if repetitions are detected, they are
automatically removed using the algorithm of Rineau et al.
The repetition-free character trees are written in prefix.poly and
each new tree receives an id, e.g. 1.2 corresponds to the second
repetition-free subtree computed from the 1st original character
tree.
Output:
One output file is written with the results of the tree comparisons:
- For ri, a global retention index stating the overall information
content of all character trees retained in the cladogram is written,
plus a ri for each character, a ri for each character state, and a
ri for each subtree when polymorphism has led to character
trees being cut into subtrees.
- For tripdistance, several values are computed given two trees t1
(or reference/true tree) and t2 (or reconstructed tree) ('number of
triplets' can refer to a sum of triplet weights depending on the
weighting scheme chosen). The interpretation differs if one wants to
compare two equal topologies or if the comparison involves a
reference tree and a tree to be compared with (in this case, the
significance is added in parentheses):
* Number of triplets in t1 (relevant elements)
* Number of triplets in t2 (retrieved elements)
* Number of triplets both in t1 and t2 (true positives).
Note that t1 triplets present in t2 and t2 triplets also in
t1 may differ because of the weighting.
* Number of triplets in t2 but not in t1 (false positives)
* Number of triplets in t1 but not in t2 (false negatives)
* ITRI(t1,t2) (Precision: (t2 triplets present in t1)/t2, the
amount of triplets from the reconstructed tree that are true)
* ITRI(t2,t1) (Recall: (t1 triplets present in t2)/t1, the amount
of true triplets that are present in the reconstructed tree)
* Triplet distance (F1-score: harmonic mean of precision and
recall, (2 * Precision * Recall) / (Precision + Recall))
All calculations are made using a specific weighting scheme
(option --weighting).
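The precision, recall, and F1-score formulas listed above can be worked through on weighted triplet sets. The sketch below follows those formulas only; the triplet encoding `(a, b, c)`, the weights, and the function name `compare_triplets` are assumptions of this example, and both trees are assumed to yield at least one triplet.

```python
from fractions import Fraction


def compare_triplets(t1, t2):
    """Return (precision, recall, F1) for weighted triplets of t1 and t2.

    t1: dict triplet -> weight for the reference tree.
    t2: dict triplet -> weight for the reconstructed tree.
    """
    shared = t1.keys() & t2.keys()
    # Shared triplets are summed with the weights of the tree being scored,
    # which is why the two true-positive totals may differ.
    precision = Fraction(sum(t2[t] for t in shared)) / sum(t2.values())
    recall = Fraction(sum(t1[t] for t in shared)) / sum(t1.values())
    if precision + recall == 0:
        return precision, recall, Fraction(0)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = {("A", "B", "C"): 1, ("A", "B", "D"): 1,
             ("A", "C", "D"): 1, ("B", "C", "D"): 1}
reconstructed = {("A", "B", "C"): 1, ("A", "B", "D"): 1, ("C", "D", "A"): 1}
scores = compare_triplets(reference, reconstructed)
print(scores)
```

Here two of the three reconstructed triplets are true and two of the four reference triplets are recovered, giving 2/3, 1/2, and an F1-score of 4/7.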
Chartest#
chartest
The chartest command applies the character state testing
procedure for hierarchical characters. Each character state
(an informative node in a rooted tree) is tested against a cladogram:
if the test fails, the character state hypothesis (a hypothesis of clade)
is rejected, and the state is a homoplasy; otherwise, if the state passes
the test, it is a synapomorphy that supports a specific node of the
cladogram.
Usage:
agatta chartest <file> <file>... [-s -v --pdf --prefix=<file>
--taxarep1=<path> --taxarep2=<path>
--repetitions=<type>]
Mandatory parameters:
<file> The first <file> contains only one tree: the optimal cladogram
or the consensus tree resulting from the analysis of a set of
character trees. The other <file> arguments are the path(s) of the
file(s) containing that set of character trees.
The trees can be encoded for Agatta in a file in several ways:
- a hierarchical matrix,
- A newick file with a single newick tree on each line,
- A nexus file (extension in .nex),
- A nexml file (extension in .nexml).
The following url gives more information on how to build input
files: https://vrineau.github.io/AgattaDocs/Input%20files.html.
Optional parameters:
-s Silent mode.
-v Verbose mode.
--pdf Compute a pdf file to visualise character states on
the cladogram. One page is written for each state.
The file is named prefix.pdf.
--prefix=<file> Prefix of all saving files. The complete path can
be used. By default, the prefix is 'agatta_out' and all files are
saved in the directory of the first <file>.
--taxarep1=<path> If the user wants to replace identifiers by real
leaf names in the result files, this flag can be used with a path
to a csv file with two columns, the first with the identifiers in
the actual newick strings and the other with the names the user
wants.
--taxarep2=<path> Same as --taxarep1 for the second <file>.
--repetitions=<type> The removal of repeated leaves in character
trees is made using the method of free-paralogy subtree.
Two algorithms are implemented, the original one from Nelson and
Ladiges (1996) ('FPS') for dealing with paralogy in cladistic
biogeography, and the algorithm of Rineau et al. (2021) ('TMS')
designed for all cases of repetitions (not only paralogy).
If the flag is not used and if repetitions are detected, they are
automatically removed using the algorithm of Rineau et al.
The repetition-free character trees are written in prefix.poly and
each new tree receives an id, e.g. 1.2 corresponds to the second
repetition-free subtree computed from the 1st original character
tree.
Output:
Two output files are written: prefix.chartest gives all locations
of the states on the cladogram and whether the test is passed or not,
sorted by state, and prefix.chartest_node gives the same
information sorted by node. For better visualisation, the flag --pdf
generates a pdf file with a page for each character state showing
the resulting location and results.
describetree#
The describetree command outputs basic information on a list of
rooted trees. The information written relates to the number of
nodes, terminals, internal nodes, symmetric nodes, and apical nodes, to the
resolution of the tree, and to the number of dichotomies and polytomies.
Usage:
agatta describetree <file> [-s -v --prefix=<file> --showtaxanames]
Mandatory parameters:
<file> Path of the file containing rooted trees.
The trees can be encoded for Agatta in a file in several ways:
- a hierarchical matrix,
- A newick file with a single newick tree on each line,
- A nexus file (extension in .nex),
- A nexml file (extension in .nexml).
The following url gives more information on how to build input
files: https://vrineau.github.io/AgattaDocs/Input%20files.html.
Optional parameters:
-s Silent mode.
-v Verbose mode.
--prefix=<file> Prefix of the saving file. The complete path can
be used. By default, the prefix is 'agatta_out' and all files are
saved in the directory of the first <file>.
--showtaxanames The list of all tree leaves is written in the
output file.
Output:
One output file prefix.dt with descriptors for each tree.
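Some of the descriptors mentioned above can be computed with a short traversal. The sketch below counts terminals, internal nodes, dichotomies, and polytomies on a nested tuple of leaf names, an encoding assumed for this example. Counting the root as an internal node is also an assumption, and the descriptors for symmetric nodes, apical nodes, and resolution are not reproduced.

```python
def describe(tree):
    """Count basic descriptors of a rooted tree given as nested tuples."""
    stats = {"terminals": 0, "internal_nodes": 0,
             "dichotomies": 0, "polytomies": 0}

    def visit(node):
        if isinstance(node, str):
            stats["terminals"] += 1
            return
        stats["internal_nodes"] += 1
        if len(node) == 2:
            stats["dichotomies"] += 1  # exactly two descendants
        elif len(node) > 2:
            stats["polytomies"] += 1  # more than two descendants
        for child in node:
            visit(child)

    visit(tree)
    stats["nodes"] = stats["terminals"] + stats["internal_nodes"]
    return stats


summary = describe((("A", "B"), "C", ("D", "E", "F")))
print(summary)
```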
References#
Cao, N. (2008). Analyse à trois éléments et anatomie du bois des Fagales Engl (Doctoral dissertation, Paris, Muséum national d’histoire naturelle).