The Evolution of Science: From Descartes to Generative AI
“The belief that science means observable and repeatable experiments, which began with Descartes in the seventeenth century, is, to this extent, over. Global understanding is, by contrast, based on computational models supported by a knowledge infrastructure.” — Nicholas Mirzoeff, How to See the World
The Arabic numeral system made numbers easy to manipulate, which in turn made mathematics the tool for validating science. In fact, the history of science is in large part the history of new mathematics validating new fundamentals in science. Today we have a new set of “mathematical” tools: the advances in Artificial Intelligence (AI) and Machine Learning (ML) of the last five years. This new ML is not really about generating text or art; it is about validating new science at a more fundamental level than we have ever explored before. The purpose of this article is to examine ML as that new tool for validating science.
Modern science began with the work of Newton and Descartes. Newton gave us the first accurate understanding of physics, and he is also credited (alongside Leibniz) with developing differential calculus. This pairing of physical science and mathematics continues to shape research to this day, particularly in the application of partial differential equations to multivariable problems in engineering and physics. Descartes is credited with developing analytic geometry, the use of algebra to describe geometry: a geometric shape could be captured by a series of equations, whereby coordinates located a point, points determined lines, and lines determined planes and shapes. This algebra supported Descartes’ view of science as a top-down inspection of the tangible from the macro to the micro level, with a focus on matter, structure, and linear, deterministic causality. Descartes, not surprisingly, was a rationalist philosopher in addition to a mathematician and scientist, and his “natural” philosophy shaped science for the next two hundred years and continues to influence it today.
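As a one-line illustration of Descartes’ idea (my example, not his), a point is a coordinate pair, and the geometry of lines and planes reduces to algebra:

```latex
% The line through the points (x_1, y_1) and (x_2, y_2) satisfies
y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)
% and in three dimensions a plane is likewise a single linear equation:
ax + by + cz = d
```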
The next big advance in science was quantum theory. Much of the underpinning of quantum physics was new mathematics. First, Ludwig Boltzmann gave us statistical mechanics, which introduced probability and uncertainty into the study of physical science. Maxwell, Poincaré, Heisenberg, Schrödinger, Bohr, Planck, and Einstein all used mathematics to build on Boltzmann’s work. This left us with a new understanding of reality built on sub-atomic, invisible particles behaving in a probabilistic fashion. We could not have ended up any further from Descartes’ natural philosophy: science now focused on the invisible. Fortunately, the next breakthrough in math and science, chaos theory, helped us bridge the uncertainty of quantum physics with the natural world we see every day.
In 1963 MIT professor Edward Lorenz introduced the idea of deterministic chaos (his famous “butterfly effect” talk followed in 1972). Benoît Mandelbrot, a researcher at IBM, advanced the work of Lorenz, established “a mathematical basis of pattern formation in nature”[1], and showed that deterministic, nonlinear systems with sensitive dependence on initial conditions (SDIC) could be modeled on a computer. Not only did Mandelbrot explain a part of natural science that was previously little understood, but he introduced the concept of “fractals” to describe the patterns that repeat consistently across all of nature. With the patterns documented, the math came easily, and computerization greatly facilitated further research in modeling chaotic phenomena in fields such as meteorology, geology, and biology. Whatever was left of Descartes’ metaphysics and epistemology after quantum physics, the new understanding of natural patterns revealed by chaos theory demonstrated another way for math to explain previously unexplained science. Chaos theory also demonstrated a perhaps more significant point: science could be advanced by applying computer modeling to look for patterns in systems. This focus on systems was then applied to another category of systems: complexity science.
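SDIC is easy to see on a computer. Here is a minimal Python sketch (my illustration, not Lorenz’s original code) that integrates the Lorenz system from two starting points differing by one part in a billion and watches the trajectories fly apart:

```python
# Minimal demonstration of sensitive dependence on initial conditions (SDIC)
# using the Lorenz system with simple Euler integration.

def lorenz_step(x, y, z, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system one Euler step."""
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return x + dx * dt, y + dy * dt, z + dz * dt

# Two trajectories that differ by one part in a billion at t = 0.
a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-9, 1.0, 1.0)

for step in range(5000):
    a = lorenz_step(*a)
    b = lorenz_step(*b)
    if step % 1000 == 0:
        gap = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        print(f"step {step:5d}: separation = {gap:.2e}")
# The separation grows by many orders of magnitude: the rules are fully
# deterministic, yet long-run behavior is unpredictable in practice.
```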
In 1984 physics Nobel laureate Murray Gell-Mann, along with a group of other distinguished scientists and scholars, founded the Santa Fe Institute to explore complex systems. As M. Mitchell Waldrop recounts, Gell-Mann argued that “what we should look for were great syntheses that were emerging today, that were highly interdisciplinary”[2]. Some were already well on their way: molecular biology, nonlinear science, cognitive science. But surely there were other emerging syntheses out there, he said, and this new institute should seek them out. In contrast to chaotic systems, complex systems are not deterministic. Deterministic systems exhibit “unique evolution”, wherein “a given state of a model is always followed by the same history of state transitions”.[3]
Nonlinearity, the property that a “system need not change proportionally to the change in a variable”[4], provides the flexibility to capture mathematically the idea that all natural and man-made systems are networks that include feedback loops. This connectivity, with different networked variables in different states at different points in time, explains the non-deterministic nature of complex systems, their multi-variable character, and their emergent quality. Emergence is a system feature whereby the characteristics of the whole cannot be explained additively by the components; water turning into ice is a familiar example. What complexity showed us was another type of system, explained by principles that went far beyond Cartesian science.
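A toy Python sketch (my example, not from the complexity literature) shows what “need not change proportionally” means once a feedback loop is present:

```python
# Nonlinearity in a feedback loop: the logistic map feeds the state back
# into itself, so doubling the input does not double the output.

def run(pop, r=2.5, steps=20):
    """Iterate x -> r * x * (1 - x) from an initial population fraction."""
    for _ in range(steps):
        pop = r * pop * (1 - pop)
    return pop

print(run(0.01))  # small seed population  -> settles near 0.6
print(run(0.02))  # doubled seed           -> still settles near 0.6
# The response is not proportional to the input: both seeds are pulled
# to the same attractor by the feedback loop.
```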
One characteristic of complex systems explains why ML has made so much progress as a tool for science: complex systems are hierarchical from the bottom up. Quantum particles join to form atoms, which become molecules, then cells, organs, and eventually whole organisms. Herbert Simon, the Nobel laureate economist, called this combining of components synthesis[5], and it is the basis of human creativity and of evolution. Like a slot machine, the combinatorial process produces a different result with every pull of the lever: some outcomes at any level of the hierarchy enhance survivability, and other variations do not. Whether the process is synthetic or natural, this combinatorial play creates the variety that can improve outcomes, and it is the intellectual foundation of computational biology, chemistry, and physics.
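A minimal sketch of this variation-plus-selection loop (my toy example, not Simon’s formulation) makes the slot-machine analogy concrete:

```python
import random

# Toy combinatorial search: random variation plus "keep the survivors".
# Fitness here is just the number of 1-bits; any scoring function would do.

random.seed(0)

def fitness(genome):
    return sum(genome)

def mutate(genome):
    """Flip one randomly chosen component: the 'pull of the lever'."""
    g = list(genome)
    i = random.randrange(len(g))
    g[i] = 1 - g[i]
    return g

genome = [0] * 20
for generation in range(200):
    candidate = mutate(genome)
    if fitness(candidate) >= fitness(genome):  # keep variations that survive
        genome = candidate

print(fitness(genome), genome)  # variety + selection improves the outcome
```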
E. O. Wilson, the legendary Harvard biologist, explained it well:
“We are drowning in information, while starving for wisdom. The world henceforth will be run by synthesizers, people able to put together the right information at the right time, think critically about it, and make important choices wisely.” (Consilience, 1998)
Based on this thinking, Wilson helped inspire what came to be called computational biology, the application of computation and ML to biological investigation. In biology we had not only the animals and species but all the genomes and their multi-layered hierarchy of components to consider. As the datasets grew, the application of ML expanded from descriptive analytics to predictive and prescriptive analytics, and from biology to medical science, agriculture, materials science, and cyber-physical applications. ML was the perfect tool for pattern recognition across a wide range of disciplines. Eventually we realized that ML could be used for more than analyzing data: it could propose solutions to problems in medicine, materials science, agriculture, and many other fields. ML could analyze the synthetic combinations of components to identify the best theoretical solutions. No longer did we have to evaluate thousands upon thousands of candidates by hand; the ML prescreened them, reducing the workload and, more importantly, shortening the time to market for life-saving solutions.
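The prescreening pattern is simple to sketch. The Python below is a generic illustration under made-up data, not any particular lab’s pipeline: train a model on measured examples, then rank a large pool of untested candidates so only the most promising reach the bench.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Stand-in data: feature vectors for compounds already measured in the lab.
# In a real pipeline these would be molecular descriptors and assay results.
X_measured = rng.normal(size=(500, 16))
y_measured = X_measured[:, 0] - 0.5 * X_measured[:, 1] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_measured, y_measured)

# Prescreen ten thousand hypothetical candidates in silico...
X_candidates = rng.normal(size=(10_000, 16))
scores = model.predict(X_candidates)

# ...and send only the top 50 to the slow, expensive wet lab.
top_50 = np.argsort(scores)[-50:]
print("candidates to synthesize:", top_50[:10], "...")
```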
Mathematician Hannah Fry explains what had really happened:
“Mathematics is about abstracting away from reality, not about replicating it. And it offers real value in the process. By allowing yourself to view the world from an abstract perspective, you create a language that is uniquely able to capture and describe the patterns and mechanisms that would otherwise remain hidden. And, as any scientist or engineer of the past 200 years will tell you, understanding these patterns is the first step toward being able to exploit them.”[6]
ML, as Wilson anticipated, became the best tool in history for mathematical manipulation through the use of algorithms for pattern recognition. As the complexity economist W. Brian Arthur explains, “With equations we manipulate the system to arrive at some form we are seeking: some expression of a solution, some formula, some necessary condition, some mathematical structure, some sought-after demonstration of a truth contained in the system.” … “Algorithms give us the possibility to study formation. The researcher studies what generative process produces a given pattern and how this might vary with different algorithmic designs. So there is a back and forth between the pattern or structure formed and the algorithm that has formed it. The style becomes experimental: an algorithm produces some structure, and that structure feeds back to querying the algorithm that produced it.”[7] The next step in the evolution of ML was to repurpose this “generative process”.
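Arthur’s “back and forth” fits in a few lines of Python (my illustration, not his): a one-line rule generates a structure, and the structure invites questions about the rule that produced it. Rule 90, an elementary cellular automaton, grows a fractal Sierpinski pattern from a single live cell:

```python
# Rule 90 elementary cellular automaton: each cell becomes the XOR of its
# two neighbors. A one-line generative rule produces a fractal pattern.

width, steps = 63, 32
row = [0] * width
row[width // 2] = 1  # single live cell in the middle

for _ in range(steps):
    print("".join("#" if c else " " for c in row))
    row = [row[i - 1] ^ row[(i + 1) % width] for i in range(width)]
```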
As ML grew in popularity and usefulness, cloud computing flourished; Synergy Research forecasts cloud revenues of over $1 trillion by 2026/2027.[8] Cloud computing, combined with better database technology, supported ever-larger datasets for any particular problem. As the database technology improved, so did the assortment of available ML algorithms. One set of algorithms to emerge was Generative AI, which garnered much attention for producing original writing and art from text and image data. The more significant development, however, was the use of Generative AI in science.
Generative models can be trained in many ways: unsupervised, supervised, and with reinforcement learning. Regardless of the training style, the models produce synthetic data, either as finished output in the form of writing or art, or as new training data to refine other algorithms. Synthetic training data has many uses, including anonymizing user data. However, I think the more exciting discovery is the one explained by Daphne Koller, computer scientist, MacArthur genius grant recipient, and CEO of the early-stage biomedicine company Insitro. Using synthetic data, Insitro has found new features in medical datasets that were previously unknown to researchers: the algorithms saw patterns unseen by humans and reproduced them in the new synthetic data. Koller believes that such new features, recurring in future synthetic datasets, could take the study of medicine to a whole new level of fundamental medical science.[9] The same logic could be applied in almost any computational field of natural science to open up the study of new fundamentals.
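The basic mechanics of synthetic data are easy to sketch. The Python below assumes a simple tabular setting with invented numbers, nothing like Insitro’s far more sophisticated models: fit a generative model to real records, then sample new, statistically similar records that correspond to no actual patient.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)

# Stand-in for a real medical dataset (rows = patients, columns = features).
real_data = np.column_stack([
    rng.normal(70, 10, size=1000),    # e.g. resting heart rate (bpm)
    rng.normal(120, 15, size=1000),   # e.g. systolic blood pressure (mmHg)
])

# Fit a simple generative model to the real records...
gen_model = GaussianMixture(n_components=5, random_state=0).fit(real_data)

# ...and sample synthetic records: same statistical patterns, no real patient.
synthetic_data, _ = gen_model.sample(2000)
print(real_data.mean(axis=0), synthetic_data.mean(axis=0))
```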
Marc Andreessen, cofounder of the VC firm a16z, makes the point in a recent podcast that new technology allows us to “revisit the fundamentals”. Scientists have historically been limited by the tools within experimental reach. Generative AI holds the potential to transform science at the fundamental level. The modern history of science was shaped initially by empirical data analysis and validated by mathematics. Today, with synthetic data, we are on the verge of the math performing the entire process of discovery, with scientists doing only the clinical validation. As the VCs at Air Street Capital say, “AI-first by design”. This “AI-first by design” approach is explained well in the Journal of Cheminformatics:
“The use of artificial intelligence and machine learning (AI/ML) in drug discovery has increased rapidly in recent years, providing AI-aided design tools for drug design projects. The strengths of AI lie in finding patterns from vast amount of data from heterogeneous sources, at its best augmenting humans’ abilities in challenging tasks such as molecular optimization. Advances in de novo molecular design tools enable automation of the design step in in silico design-make-test-analyze (DMTA) cycles of drug design.”[10]
Advanced researchers now use ML to accelerate these DMTA cycles, increasing the rate at which new chemicals and drugs are developed across a wide range of industries. Researchers will continue to improve the algorithms to optimize the process, but much of the scientific investigation has already moved to computational models that are revolutionizing biology, chemistry, and medical science.
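In code, a DMTA cycle reduces to a loop. The Python below is a schematic sketch only; the generator, the property predictor, and the “make/test” step are all hypothetical toy stand-ins for what would be real models and lab work:

```python
import random

random.seed(1)

def design(prior_results, n=100):
    """Design: propose candidate 'molecules' (here, just random feature
    vectors; a real generator would condition on prior_results)."""
    return [[random.random() for _ in range(8)] for _ in range(n)]

def predict(candidate):
    """Analyze (in silico): stand-in for a trained property-prediction model."""
    return sum(candidate)

def make_and_test(candidate):
    """Make/Test: stand-in for synthesis plus a noisy lab assay."""
    return sum(candidate) + random.gauss(0, 0.5)

results = []
for cycle in range(3):                       # three DMTA iterations
    candidates = design(results)             # Design
    ranked = sorted(candidates, key=predict, reverse=True)
    for c in ranked[:5]:                     # only top predictions are made
        results.append((c, make_and_test(c)))
    print(f"cycle {cycle}: best measured = {max(r[1] for r in results):.2f}")
# Each cycle's measurements would normally be fed back to retrain `predict`,
# closing the design-make-test-analyze loop.
```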
Emad Mostaque, founder of Stability AI (the company behind Stable Diffusion), highlighted the point in the MIT Technology Review (Feb 2023):
“Google and Microsoft are going all in with generative AI as core to their future. There is no “we’re still early here”, trillion-dollar companies are shifting their whole strategy and focus. I can’t ever recall a technology and strategy shift as fast and meaningful as this.”
To compare Generative AI to Excel or the iPhone is to understate the potential impact of this new technology. Generative AI’s effect may well be equivalent to electricity or Shannon’s Information Theory. Generative AI will be a Superpower![11]
“In reality we know nothing, for truth is in the depths.” — Democritus
[1] https://bu.ac.bd/uploads/BUJ1V5I12/6.%20Hena%20Rani%20Biswas.pdf
M. Mitchell Waldrop, Complexity: The Emerging Science at the Edge of Order and Chaos
[3] https://www.statisticshowto.com/deterministic-function-nondeterministic/
[4] https://www.statisticshowto.com/deterministic-function-nondeterministic/
[5] https://monoskop.org/images/9/9c/Simon_Herbert_A_The_Sciences_of_the_Artificial_3rd_ed.pdf
[6] Hannah Fry, The Mathematics of Love
[7] https://beijer.kva.se/wp-content/uploads/2020/03/Disc269_Arthur_2020.pdf
[8] https://www.nextplatform.com/2023/01/26/cloud-spending-to-top-1-trillion-in-four-years/
[9] https://www.mckinsey.com/industries/life-sciences/our-insights/it-will-be-a-paradigm-shift-daphne-koller-on-machine-learning-in-drug-discovery
[10] https://jcheminf.biomedcentral.com/articles/10.1186/s13321-022-00667-8
[11] Many have used this phrase. It is not clear to me who deserves the credit.