It’s a question that has occupied philosophers and scientists for centuries, a thought experiment that, according to legend, Galileo demonstrated from the top of the Leaning Tower of Pisa. If you drop a feather and a bowling ball at the same time, which one hits the ground first? For any human, the answer is immediate and obvious: the bowling ball. We don’t need to calculate drag coefficients or recall Newton’s laws of motion. We know this from a lifetime of experience, from seeing leaves flutter to the ground while rocks fall straight down. This intuitive grasp of the physical world is a cornerstone of what we call common sense. It’s a form of intelligence that is effortless, implicit, and deeply ingrained in our being.
As artificial intelligence continues its exponential march forward, it has mastered tasks once thought to be the exclusive domain of human intellect. It can compose music, write poetry, diagnose diseases, and defeat grandmasters at chess and Go. Yet, for all its computational prowess, the question of whether an AI can possess genuine common sense remains one of the most significant and challenging frontiers in the field. Can a large language model (LLM), trained on trillions of words from the internet, replicate the simple, intuitive physics that a child understands? To probe this question, we can turn back to that classic problem, not as a test of physics knowledge, but as a test of an AI’s ability to reason about the world as it is, not just as it is described in a textbook.
The core of the challenge lies in the ambiguity of the question itself. When a human hears, "drop a feather and a bowling ball," our brain instantly populates the scenario with unstated assumptions. We assume the event takes place on Earth, in a standard atmosphere, under the influence of normal gravity. This contextualization is automatic. We understand that the "physics classroom" answer and the "real world" answer are different. The classroom answer, which an AI might be heavily trained on, is that in a vacuum, all objects fall at the same rate regardless of their mass. This is a fundamental principle of physics, famously demonstrated by astronauts on the Moon. However, an answer that only provides this information fails the common sense test, as it ignores the most probable context.
For an AI, this ambiguity is a monumental hurdle. Its entire "understanding" is derived from the text it was trained on. This text includes physics textbooks, scientific papers, and academic discussions that heavily favor the "in a vacuum" scenario because it illustrates a pure scientific principle. It also includes countless forum posts, articles, and general web content that discuss the real-world outcome. The AI's task is not simply to retrieve a fact, but to infer the user's intent and the most likely context. A truly intelligent system must recognize that the simple, unadorned phrasing of the question implies a query about our everyday reality. The problem, therefore, is not about knowing that air resistance exists; it is about knowing when to apply that knowledge without being explicitly prompted. This is the essence of common sense reasoning: navigating the unstated rules and contexts that govern our world.
To properly test an AI's common sense, we cannot simply ask one question and judge the response. We must design a methodical approach, a sort of cognitive stress test, to probe the depths of its reasoning capabilities. The "solution" in this context is not a final answer from the AI, but a structured experimental framework to reveal how it thinks. This framework involves crafting a sequence of prompts that move from the purely academic to the deeply intuitive, forcing the AI to navigate the very ambiguity we seek to understand. Our tool for this experiment will be a state-of-the-art large language model, a system designed to understand and generate human-like text.
The goal is to move beyond simple fact-checking and evaluate the AI's ability to synthesize information, recognize context, and explain its reasoning. A simple "the bowling ball lands first" is an insufficient response, as is a dry recitation of Galileo's principle of falling bodies. A successful demonstration of common sense would involve acknowledging the difference between the ideal and the real, explaining the factors at play, and adapting its reasoning to new, slightly altered scenarios. This requires building a conversational test that can expose the model's underlying logic, or lack thereof. We need to see if the AI is merely repeating patterns it has seen before or if it is capable of a more flexible, principle-based form of reasoning.
Our process will be an escalating series of questions, each designed to peel back a layer of the AI's processing. First, we establish a baseline by asking the question in its most explicit, scientific form. This confirms the AI possesses the necessary textbook knowledge. We might ask: "In a perfect vacuum, if a 10-kilogram bowling ball and a 1-gram feather are dropped from a height of 100 meters, which object will impact the ground first?" We expect the AI to correctly state that they will land simultaneously and to explain that this is because the acceleration due to gravity is constant for all objects, regardless of mass, in the absence of other forces.
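For reference, the baseline answer can be checked with nothing more than the constant-acceleration formula: in a vacuum the fall time depends only on the drop height and the strength of gravity, not on mass. A quick sketch of that arithmetic in Python:

```python
import math

g = 9.81   # gravitational acceleration near Earth's surface, m/s^2
h = 100.0  # drop height from the prompt, m

# With gravity the only force, fall time is t = sqrt(2h / g), independent of mass.
t = math.sqrt(2 * h / g)
print(f"Fall time for either object in a vacuum: {t:.2f} s")  # ~4.52 s
```

Both objects should arrive together after roughly four and a half seconds.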
Next, we introduce the crucial ambiguity. The second step is to ask the question in its common-sense form: "If I stand on top of a tall building and drop a bowling ball and a feather, which one hits the ground first?" This is the real test. A poor response would be to ignore the implied context and give the vacuum answer. A good response would immediately identify the real-world setting and state that the bowling ball will land first. A great response would do this and then volunteer the additional context, explaining that this is due to air resistance and noting that in a vacuum, the outcome would be different. This demonstrates an ability to manage multiple contexts simultaneously.
The third step is to demand a deeper explanation, to ensure the AI is not just repeating a canned answer. We would follow up with a simple question: "Why?" or "Can you explain the physics behind that in simple terms?" Here, we are looking for a coherent explanation of concepts like air resistance (or drag), surface area, and terminal velocity. The AI should be able to articulate that while gravity pulls on the bowling ball with more force, the feather's large surface area and low mass make it far more susceptible to the opposing force of air resistance, causing it to fall much more slowly.
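The physics behind the explanation we are hoping for comes down to two standard relations: drag grows with an object's area and the square of its speed, and an object stops accelerating once drag balances its weight, at the so-called terminal velocity. A rough sketch of that comparison, using assumed, illustrative figures for the feather's and the ball's size and drag coefficient rather than measured values:

```python
import math

RHO_AIR = 1.225  # air density at sea level, kg/m^3
G = 9.81         # gravitational acceleration, m/s^2

def terminal_velocity(mass, drag_coeff, area):
    """Speed at which drag (0.5 * rho * C_d * A * v^2) balances weight (m * g)."""
    return math.sqrt(2 * mass * G / (RHO_AIR * drag_coeff * area))

# Assumed, illustrative parameters, not measured values.
feather_vt = terminal_velocity(mass=0.001, drag_coeff=1.3, area=0.005)  # ~1 g feather
ball_vt    = terminal_velocity(mass=7.0,   drag_coeff=0.47, area=0.037) # ~7 kg bowling ball

print(f"Feather terminal velocity: ~{feather_vt:.1f} m/s")    # on the order of 1-2 m/s
print(f"Bowling ball terminal velocity: ~{ball_vt:.0f} m/s")  # on the order of 80 m/s
```

The two orders of magnitude between those speeds are the whole story of the real-world drop.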
Finally, we introduce a novel twist to test for true reasoning rather than rote memorization. We could ask: "Now, imagine the feather is made of solid lead, but retains its exact shape and size. The bowling ball is standard. Which falls faster now?" This question prevents the AI from relying on the standard "feather vs. ball" example. It must apply the principles it just explained. A successful AI would reason that the lead feather, being incredibly dense, would have a much higher mass for its surface area. This would make the force of gravity on it significantly stronger relative to the force of air resistance, causing it to fall very quickly, potentially even faster than the bowling ball depending on its aerodynamics. This kind of flexible, principle-based reasoning is a much closer approximation of human common sense.
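The whole four-step sequence is easy to run as a script against any conversational model. In the sketch below, the `ask` function is just a placeholder to be wired up to whatever chat API you happen to use; no particular vendor's interface is assumed:

```python
def ask(prompt: str, history: list[dict]) -> str:
    # Placeholder: replace with a real call to your chat model of choice.
    return "<model reply goes here>"

PROMPTS = [
    # 1. Baseline: explicit, textbook framing.
    "In a perfect vacuum, if a 10-kilogram bowling ball and a 1-gram feather are "
    "dropped from a height of 100 meters, which object will impact the ground first?",
    # 2. The common-sense framing, with the real-world context left implicit.
    "If I stand on top of a tall building and drop a bowling ball and a feather, "
    "which one hits the ground first?",
    # 3. Probe the reasoning behind the answer.
    "Can you explain the physics behind that in simple terms?",
    # 4. Novel twist to test principles over recall.
    "Now, imagine the feather is made of solid lead, but retains its exact shape "
    "and size. The bowling ball is standard. Which falls faster now?",
]

history: list[dict] = []
for prompt in PROMPTS:
    reply = ask(prompt, history)
    history += [{"role": "user", "content": prompt},
                {"role": "assistant", "content": reply}]
    print(f"Q: {prompt}\nA: {reply}\n")
```

Keeping the conversation history intact between turns matters: the later questions only make sense in light of the model's earlier answers.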
When putting this process into practice with leading LLMs like those from Google, Anthropic, or OpenAI, the results are both impressive and illuminating. For the first, vacuum-based question, the AI performs flawlessly. It correctly identifies that the objects will land at the same time and provides a textbook-perfect explanation citing the principles of gravitational acceleration. This confirms that the foundational physics knowledge is present within its neural network, learned from the vast corpus of scientific texts it was trained on. It has successfully mastered the "classroom" domain.
The real moment of truth comes with the second, common-sense question. Here, modern, sophisticated LLMs demonstrate a remarkable leap beyond their predecessors. They almost universally provide the nuanced, two-part answer we identified as "great." The AI will typically start by stating, "In a real-world scenario on Earth, the bowling ball would hit the ground first." It immediately follows this by explaining that the reason is air resistance. Crucially, it will then often add a sentence such as, "It's important to note that this is different from the classic physics problem, which often assumes a vacuum. In a vacuum, they would fall at the same rate." This ability to anticipate the user's potential confusion and proactively address both the practical and theoretical contexts is a powerful simulation of common sense. It shows the model can identify the ambiguity and provide a comprehensive answer that covers all likely interpretations.
When pressed for a deeper explanation in the third step, the AI excels. It can generate detailed yet easy-to-understand paragraphs about how air resistance acts as a drag force, how this force is dependent on an object's shape and velocity, and how the feather quickly reaches its low terminal velocity while the bowling ball continues to accelerate for much longer. It correctly connects the concepts of mass, weight, and surface area to the outcome. The explanation is not just a regurgitation of facts but a coherent synthesis of physical principles applied directly to the problem at hand.
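That dynamic picture, with the feather topping out almost immediately while the ball keeps gaining speed, can be checked with a few lines of numerical integration, again using the same assumed, illustrative parameters rather than measured ones:

```python
def speed_after(mass, drag_coeff, area, duration=3.0, dt=0.001):
    """Integrate dv/dt = g - (rho * C_d * A * v^2) / (2 * m) with simple Euler steps."""
    rho_air, g = 1.225, 9.81
    v = 0.0
    for _ in range(int(duration / dt)):
        drag_accel = rho_air * drag_coeff * area * v * v / (2 * mass)
        v += (g - drag_accel) * dt
    return v

# Same assumed, illustrative parameters as before.
print(f"Feather speed after 3 s: ~{speed_after(0.001, 1.3, 0.005):.1f} m/s")  # already at its ~1.6 m/s terminal velocity
print(f"Ball speed after 3 s:    ~{speed_after(7.0, 0.47, 0.037):.0f} m/s")   # ~28 m/s and still accelerating toward ~80 m/s
```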
The final twist with the lead feather often showcases the model's most advanced reasoning. It correctly deduces that changing the material dramatically changes the outcome. It will explain that the lead feather's mass would be far greater, meaning the downward force of gravity would overwhelm the relatively small upward force of air resistance acting on its shape. Many models will correctly conclude that the lead feather would fall extremely fast, and astutely point out that determining whether it would be faster than the bowling ball would require more specific information about the exact mass and aerodynamic properties of both objects. This demonstrates an ability to reason from first principles rather than just matching keywords to stored examples.
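The scaling argument behind that answer is straightforward: if the shape and size are fixed, the drag characteristics do not change, so terminal velocity grows with the square root of the mass. A toy calculation, with the mass ratio chosen purely for illustration, makes the point:

```python
import math

# With shape and size fixed, the drag coefficient and area are unchanged, so
# terminal velocity scales as the square root of mass:
#   v_t(new) = v_t(old) * sqrt(m_new / m_old)
feather_vt = 1.6   # illustrative terminal velocity of the ordinary feather, m/s
mass_ratio = 100   # hypothetical ratio; a real solid-lead feather's mass is not pinned down here
lead_feather_vt = feather_vt * math.sqrt(mass_ratio)
print(f"Lead feather terminal velocity: ~{lead_feather_vt:.0f} m/s")  # ~16 m/s for this assumed ratio
```

Whether that speed actually beats the bowling ball over a given drop depends on the real mass ratio and the drop height, which is precisely the caveat the better models raise.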
While the performance on text-based problems is impressive, the true frontier for AI common sense lies in moving beyond language and into other modalities. The next generation of tests will involve multimodal AI, systems that can process and understand information from images, audio, and video in addition to text. Imagine showing an AI a short video clip of the feather and bowling ball being dropped. The AI would not need to infer the context of "Earth's atmosphere"; it could see it. It could observe the feather's gentle, fluttering descent and the ball's direct plummet. This direct perceptual data provides a much richer, more grounded basis for reasoning, moving the AI a step closer to the way humans learn from observation.
Another advanced frontier is embodied AI, which involves integrating AI into physical robots that can interact with the world. An AI that has only read about feathers and bowling balls has an abstract, theoretical knowledge. A robot equipped with an AI brain that has physically tried to lift a feather, then a bowling ball, would have a fundamentally different kind of knowledge. It would have data on force feedback, weight, and texture. If it were to perform the drop experiment itself, it would collect sensor data on the objects' trajectories. This "physical experience," even if simulated or executed by a machine, provides a grounding for concepts like mass and air resistance that text alone can never offer. This is the path from simply knowing that a bowling ball is heavy to understanding what "heavy" means in a practical, physical sense.
Furthermore, hybrid approaches that combine the associative power of LLMs with the logical rigor of symbolic reasoning and knowledge graphs hold immense promise. An LLM might handle the natural language understanding and context inference, while a symbolic physics engine could perform the calculations with guaranteed accuracy. This would prevent the AI from "hallucinating" incorrect physics and ensure its reasoning is both intuitive and robust. By integrating these structured, logical systems, we can create AI that not only simulates common sense but also adheres to the provable laws of the universe, making its reasoning more reliable and trustworthy.
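A minimal sketch of that division of labor might look like the following, where the language-model step is stubbed out with a hard-coded scenario (no real model is called) and the physics itself is handled by a small deterministic routine:

```python
import math
from dataclasses import dataclass

@dataclass
class DropScenario:
    mass: float        # kg
    drag_coeff: float  # dimensionless
    area: float        # m^2
    height: float      # m
    in_vacuum: bool

def extract_scenario(question: str) -> DropScenario:
    # In a real hybrid system, an LLM would parse the question into these fields.
    # Hard-coded here as a stand-in for that language-understanding step.
    return DropScenario(mass=7.0, drag_coeff=0.47, area=0.037, height=100.0, in_vacuum=False)

def fall_time(s: DropScenario, dt: float = 0.001) -> float:
    """Deterministic physics: step the fall forward, with or without drag."""
    if s.in_vacuum:
        return math.sqrt(2 * s.height / 9.81)
    rho_air, g = 1.225, 9.81
    v = y = t = 0.0
    while y < s.height:
        drag_accel = rho_air * s.drag_coeff * s.area * v * v / (2 * s.mass)
        v += (g - drag_accel) * dt
        y += v * dt
        t += dt
    return t

scenario = extract_scenario("Drop a bowling ball from a 100-meter building.")
print(f"Predicted fall time: {fall_time(scenario):.2f} s")
```

The appeal of the split is that the numerical answer can never be "hallucinated": once the scenario is extracted, the result follows from the equations, not from the statistics of the training data.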
In the end, our test reveals something profound about the current state of artificial intelligence. Today's most advanced AIs do not possess common sense in the way a human does. They have no consciousness, no lived experience, no intuitive "feel" for the world. Their intelligence is a product of statistical analysis on a colossal scale. However, they have become so adept at pattern matching and information synthesis that they can produce a remarkably effective simulation of common sense. They can parse ambiguity, infer context, apply principles, and explain their reasoning with startling clarity. While the AI doesn't understand the falling feather in the same way we do, its ability to correctly analyze and describe the situation for all practical purposes is a monumental achievement. The gap between simulated and genuine understanding is still vast, but the journey to close it is proving to be one of the most exciting scientific quests of our time.