Usage
You can use refchooser to select a good reference from a list of assemblies. You need either a directory of assemblies or a file listing the paths to the assemblies. refchooser prints a table with one row of metrics per assembly.
The captured metrics are:
- N50
- N90
- Number of contigs
- Assembly length
- Mean Mash distance to all other assemblies
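For readers unfamiliar with the contiguity metrics above: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly, and N90 is the same with a 90% threshold. The following is a minimal sketch of that standard definition, not refchooser's own implementation; the helper name `nx` and the contig lengths are made up for illustration.

```python
def nx(contig_lengths, fraction):
    """Return the Nx statistic; fraction=0.5 gives N50, fraction=0.9 gives N90.

    Hypothetical helper illustrating the standard definition only.
    """
    total = sum(contig_lengths)
    running = 0
    # Walk the contigs from longest to shortest, accumulating length
    # until the requested fraction of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    raise ValueError("no contigs given")


lengths = [100, 80, 60, 40, 20]  # made-up contig lengths, 300 bases in total
n50 = nx(lengths, 0.5)
n90 = nx(lengths, 0.9)
print(n50, n90)
```

A larger N50 or N90 means the assembly is made of longer pieces, which is why these values sit alongside the contig count as measures of assembly quality.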
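The results can be sorted by any of these metrics. By default, assemblies are sorted by a simple score, the ratio of N50 to mean Mash distance (N50 / Mean_Distance), so a higher score indicates a more contiguous assembly that is, on average, closer to the other assemblies.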
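The default score can be reproduced by hand from the N50 and Mean_Distance columns. The sketch below assumes nothing beyond that ratio; the function name `score` is hypothetical, and the three input rows are copied from the example output in this section. Because the printed Mean_Distance values are rounded, recomputed scores may differ slightly from the Score column in the last digits.

```python
def score(n50, mean_distance):
    """Ratio of N50 to mean Mash distance (hypothetical helper name)."""
    return n50 / mean_distance


# (assembly, N50, Mean_Distance) taken from the example tables in this section
rows = [
    ("SRR1645597", 226490, 4.611227e-04),
    ("SRR7439260", 775386, 4.927176e-04),
    ("SRR7439240", 774109, 4.658887e-04),
]

# Highest score first, matching the default ordering shown in the examples
ranked = sorted(rows, key=lambda r: score(r[1], r[2]), reverse=True)
for name, n50, dist in ranked:
    print(f"{name}\t{score(n50, dist):.6e}")
```

Note how SRR1645597 has the smallest mean distance of the three but ranks last, because its N50 is less than a third of the others.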
To print the table of assemblies sorted by N50 (largest first):
# Choose the top 10 from a collection of 900 assemblies
refchooser metrics --sort N50 --top 10 assembly_paths.txt sketch_directory
Assembly    N50     N90     Contigs  Length   Mean_Distance  Path                    Score
SRR5868281  791291  119061  30       4845891  7.045173e-04   fasta/SRR5868281.fasta  1.123168e+09
SRR7439260  775386  146629  24       4815254  4.927176e-04   fasta/SRR7439260.fasta  1.573693e+09
SRR7906469  775033  432700  18       4779049  6.352714e-04   fasta/SRR7906469.fasta  1.220003e+09
SRR6949545  774499  146519  21       4814882  5.308503e-04   fasta/SRR6949545.fasta  1.458978e+09
SRR6949610  774132  105140  33       4888983  8.929775e-04   fasta/SRR6949610.fasta  8.669110e+08
SRR7426190  774120  146629  30       4820457  5.317999e-04   fasta/SRR7426190.fasta  1.455660e+09
SRR7426155  774120  146449  29       4775352  6.618484e-04   fasta/SRR7426155.fasta  1.169633e+09
SRR7441818  774120  146519  25       4797608  5.506614e-04   fasta/SRR7441818.fasta  1.405800e+09
SRR7439259  774120  146629  25       4815750  4.911346e-04   fasta/SRR7439259.fasta  1.576187e+09
SRR7439242  774120  146519  32       4803747  5.681594e-04   fasta/SRR7439242.fasta  1.362505e+09
To print the table of assemblies sorted by mean Mash distance (smallest first):
# Choose the top 10 from a collection of 900 assemblies
refchooser metrics --sort Mean_Distance --top 10 assembly_paths.txt sketch_directory
Assembly    N50     N90     Contigs  Length   Mean_Distance  Path                    Score
SRR1645597  226490  55522   55       4803421  4.611227e-04   fasta/SRR1645597.fasta  4.911708e+08
SRR1965968  237440  55508   59       4804728  4.614244e-04   fasta/SRR1965968.fasta  5.145805e+08
SRR1963305  166064  47353   61       4804588  4.618624e-04   fasta/SRR1963305.fasta  3.595530e+08
SRR1646405  226711  56774   58       4800826  4.629222e-04   fasta/SRR1646405.fasta  4.897389e+08
SRR1967694  287598  63327   54       4800251  4.637451e-04   fasta/SRR1967694.fasta  6.201639e+08
SRR7458586  216691  68846   48       4802064  4.642351e-04   fasta/SRR7458586.fasta  4.667700e+08
SRR7439539  333102  76943   45       4796679  4.646953e-04   fasta/SRR7439539.fasta  7.168180e+08
SRR5584738  216691  54594   62       4797573  4.649960e-04   fasta/SRR5584738.fasta  4.660061e+08
SRR7439240  774109  146629  34       4814374  4.658887e-04   fasta/SRR7439240.fasta  1.661575e+09
SRR8691682  216324  75764   47       4795530  4.659022e-04   fasta/SRR8691682.fasta  4.643121e+08
To print the table of assemblies sorted by the default score, the N50/Mean_Distance ratio (highest first):
# Choose the top 10 from a collection of 900 assemblies
refchooser metrics --top 10 assembly_paths.txt sketch_directory
Assembly    N50     N90     Contigs  Length   Mean_Distance  Path                    Score
SRR7439240  774109  146629  34       4814374  4.658887e-04   fasta/SRR7439240.fasta  1.661575e+09
SRR7439252  774092  146519  26       4810069  4.722259e-04   fasta/SRR7439252.fasta  1.639241e+09
SRR5237981  749843  146968  30       4811975  4.681156e-04   fasta/SRR5237981.fasta  1.601833e+09
SRR7439259  774120  146629  25       4815750  4.911346e-04   fasta/SRR7439259.fasta  1.576187e+09
SRR7439260  775386  146629  24       4815254  4.927176e-04   fasta/SRR7439260.fasta  1.573693e+09
SRR7140222  773996  105140  27       4810707  4.939575e-04   fasta/SRR7140222.fasta  1.566928e+09
SRR7426191  774120  146629  28       4813941  5.066112e-04   fasta/SRR7426191.fasta  1.528036e+09
SRR7347002  774109  146519  34       4822419  5.137192e-04   fasta/SRR7347002.fasta  1.506872e+09
SRR1793292  774006  119046  56       4833765  5.247949e-04   fasta/SRR1793292.fasta  1.474873e+09
SRR6945020  774120  146519  21       4813673  5.303144e-04   fasta/SRR6945020.fasta  1.459738e+09
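If you want to apply a selection rule of your own to the top candidates, the printed table can be parsed with a few lines of Python. This is a rough sketch that assumes the output is whitespace-delimited with the header row shown above and that assembly paths contain no spaces; `parse_metrics` is a hypothetical helper, not part of refchooser, and the embedded text is the first three rows of the score-sorted example.

```python
# First rows of the example output, pasted in as a string for illustration
table = """\
Assembly N50 N90 Contigs Length Mean_Distance Path Score
SRR7439240 774109 146629 34 4814374 4.658887e-04 fasta/SRR7439240.fasta 1.661575e+09
SRR7439252 774092 146519 26 4810069 4.722259e-04 fasta/SRR7439252.fasta 1.639241e+09
SRR5237981 749843 146968 30 4811975 4.681156e-04 fasta/SRR5237981.fasta 1.601833e+09
"""


def parse_metrics(text):
    """Turn the printed table into a list of dicts keyed by column name.

    Assumes whitespace-delimited columns and no spaces inside any field.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


records = parse_metrics(table)

# Example rule: among the top-scoring candidates, prefer the fewest contigs
fewest_contigs = min(records, key=lambda r: int(r["Contigs"]))
print(fewest_contigs["Assembly"], fewest_contigs["Path"])
```

All values are kept as strings by the parser, so convert them with `int` or `float` before comparing, as the `min` call does for the Contigs column.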