On the main limitations of assembly theory methods and their classification of molecular biosignatures

Biosignatures & Paleobiology

Press release

cs.IT

October 13, 2022

Correlation plot between “molecular assembly” (MA) and compression algorithms. The strongest positive correlation was found between MA and 1D-RLE coding (R = 0.9001), which is one of the most basic compression schemes and among the most similar to the original definition of MA. Other compression algorithms, including Huffman coding (R = 0.896), also show a strong positive correlation with MA. As shown in the whisker plot, the compression values of the 1D-RLE and 1D-Huffman encodings have overlapping, almost identical medians (horizontal line in the center) and ranges. Our analysis reveals a similarity in behavior between MA and popular lossless statistical compression algorithms that are based on the same counting principles.
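To make the comparison in the caption concrete, the following is a minimal sketch of how two such statistical measures can be computed and correlated. It does not compute MA and does not use the paper's data: the random four-letter strings, the helper names (`rle_length`, `huffman_bits`, `pearson`, `demo`), and every parameter are assumptions made purely for illustration.

```python
import heapq
import random
from collections import Counter
from itertools import groupby


def rle_length(s):
    """Number of (symbol, run) pairs in a run-length encoding of s."""
    return sum(1 for _ in groupby(s))


def huffman_bits(s):
    """Total bits to encode s with a Huffman code built from its own symbol counts."""
    counts = list(Counter(s).values())
    if len(counts) == 1:
        return len(s)  # assumption: a lone symbol still costs 1 bit per occurrence
    heapq.heapify(counts)
    total = 0
    # Each merge adds one bit to every symbol beneath the merged node,
    # so the encoded length is the sum of all merged weights.
    while len(counts) > 1:
        merged = heapq.heappop(counts) + heapq.heappop(counts)
        total += merged
        heapq.heappush(counts, merged)
    return total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def demo(seed=0, n_strings=40, length=200):
    """Correlate the two measures over synthetic strings of increasing symbol bias."""
    rng = random.Random(seed)
    rle, huff = [], []
    for i in range(n_strings):
        p = 0.3 + 0.65 * i / (n_strings - 1)  # probability of the dominant symbol
        weights = [p] + [(1 - p) / 3] * 3
        s = "".join(rng.choices("ACGT", weights=weights, k=length))
        rle.append(rle_length(s))
        huff.append(huffman_bits(s))
    return pearson(rle, huff)


if __name__ == "__main__":
    print(f"R(RLE, Huffman) on synthetic strings: {demo():.3f}")
```

Both functions only count symbols and runs, which is the sense in which the caption calls these schemes "based on the same counting principles"; the correlation printed here applies to the synthetic strings only and is not comparable to the R values reported in the figure.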


A recently introduced approach called “assembly theory”, featuring a computable index based on the basic principles of statistical compression, has been claimed to be a novel and superior approach to classifying and distinguishing living from non-living systems and to characterizing the complexity of molecular biosignatures.

Here, we demonstrate that the assembly pathway method underlying this index is a suboptimal, restricted version of Huffman coding (of the Shannon-Fano type), widely adopted in computer science in the 1950s, that is comparable (or inferior) to other popular statistical and computable compression schemes. We show how simple modular instructions can mislead the assembly index, leading to a failure to capture subtleties beyond trivial statistical properties that are not realistic in biological systems.

We present cases whose low complexity can diverge arbitrarily from their random-like appearance, to which the assembly pathway method would assign arbitrarily high statistical significance, and we show that the method fails in simple cases (synthetic or natural). Our theoretical and empirical results imply that the assembly index, whose computable nature we show is not an advantage, offers no substantial advantage over existing computable or uncomputable concepts and methods. Alternatives are discussed.
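As a hedged illustration of the gap between simple generating rules and purely statistical measures, the sketch below builds the Thue-Morse sequence, which comes from a one-line rule yet looks incompressible to run-length and symbol-frequency counting. This example is an assumption chosen for clarity: it is not taken from the paper, and it says nothing about how the assembly index itself would score this sequence.

```python
from collections import Counter
from itertools import groupby


def thue_morse(n):
    """First n bits: bit i is the parity of the number of 1s in the binary form of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def run_count(s):
    """Number of runs a run-length encoder would have to store."""
    return sum(1 for _ in groupby(s))


def longest_run(s):
    """Length of the longest block of identical consecutive symbols."""
    return max((len(list(g)) for _, g in groupby(s)), default=0)


if __name__ == "__main__":
    n = 1024
    s = thue_morse(n)
    # Symbol counts are perfectly balanced, so an order-0 Huffman code
    # cannot do better than 1 bit per symbol.
    print("symbol counts:", dict(Counter(s)))
    # No run is longer than 2, so run-length encoding saves nothing.
    print("longest run:", longest_run(s))
    print("runs:", run_count(s), "out of", n, "symbols")
```

The generating rule fits in a single expression, while both counting-based measures treat the output much like a fair coin sequence; measures that only count symbols or runs cannot see the rule.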

Abicumaran Uthamacumaran, Felipe S. Abrahão, Narsis A. Kiani, Hector Zenil

Comments: 32 pages with appendix, 3 figures
Subjects: Information Theory (cs.IT)
Cite as: arXiv:2210.00901 [cs.IT] (or arXiv:2210.00901v2 [cs.IT] for this version)
https://doi.org/10.48550/arXiv.2210.00901
Submission history
From: Hector Zenil
[v1] Fri, 30 Sep 2022 11:19:53 UTC (1,113 KB)
[v2] Sun, 9 Oct 2022 00:33:31 UTC (557 KB)
https://arxiv.org/abs/2210.00901
Astrobiology


Sharon D. Cole