On the main limitations of assembly theory methods and their classification of molecular biosignatures
Biosignatures & Paleobiology
A recently introduced approach called “assembly theory”, featuring a computable index based on the basic principles of statistical compression, has been claimed as a new and superior approach to classifying and distinguishing living from non-living systems and the complexity of molecular biosignatures.
Here, we demonstrate that the assembly pathway method underlying this index is a suboptimal, restricted version of Huffman coding (Shannon-Fano type), widely adopted in computer science in the 1950s, whose performance is comparable (or inferior) to that of other popular statistical and computable compression schemes. We show how simple modular instructions can mislead the assembly index, leaving it unable to capture subtleties beyond trivial statistical properties that are unrealistic in biological systems.
We present cases whose low complexity can deviate arbitrarily from the randomness to which the assembly pathway method would assign arbitrarily high statistical significance, and show that the method fails in simple cases (synthetic or natural). Our theoretical and empirical results imply that the assembly index, whose computability we show is not a benefit in itself, offers no substantial advantage over existing concepts and methods, computable or not. Alternatives are discussed.
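For readers unfamiliar with the compression scheme named above, the following is a minimal Python sketch of ordinary symbol-level Huffman coding. It is not the assembly pathway method and not the authors' code; the function names `huffman_code_lengths` and `encoded_bits` are hypothetical, chosen only for illustration. It shows what a purely frequency-based statistical code measures.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code
    built from the symbol counts of `text` (illustrative helper)."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tiebreak id, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one level deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(text):
    """Total number of bits needed to encode `text` with its own Huffman code."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[s] for s in text)


if __name__ == "__main__":
    print(huffman_code_lengths("aaaabbc"))
    print(encoded_bits("ab" * 8))
```

In this sketch, a string such as `"ab" * 8` costs one bit per symbol under the symbol-level code even though a very short rule generates it. A code built only from symbol frequencies sees frequencies, not the generating rule.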
Abicumaran Uthamacumaran, Felipe S. Abrahão, Narsis A. Kiani, Hector Zenil
Comments: 32 pages with appendix, 3 figures
Subjects: Information theory (cs.IT)
Cite as: arXiv:2210.00901 [cs.IT] (or arXiv:2210.00901v2 [cs.IT] for this version)
From: Hector Zenil
[v1] Fri Sep 30 2022 11:19:53 UTC (1,113 KB)
[v2] Sun Oct 09 2022 00:33:31 UTC (557 KB)