Why do we (still) talk about symbolism?
LLM without a formal prover champions the IMO
“At IMO 2024, AlphaGeometry and AlphaProof required experts to first translate problems from natural language into domain-specific languages, such as Lean, and vice-versa for the proofs … This year, our advanced Gemini model operated end-to-end in natural language…”
From: DeepMind
The “reasoning model”
How many vowels are there in the word “Goooooooooooooooooooooooooooooal”?
OpenAI-o1 Preview’s performance
| #trial | answer | #trial | answer |
|---|---|---|---|
| 1 | 30 ✔️ | 6 | 31 ❌ |
| 2 | 31 ❌ | 7 | 31 ❌ |
| 3 | 31 ❌ | 8 | 29 ❌ |
| 4 | 33 ❌ | 9 | Choose between 32 ❌ and 31 ❌ |
| 5 | 24 ❌ | 10 | 29 ❌ |
Subitizing
Subitizing is the rapid, accurate, and effortless ability to perceive small quantities of items in a set, typically when there are four or fewer items, without relying on linguistic or arithmetic processes. The term refers to the sensation of instantly knowing how many objects are in the visual scene when their number falls within the subitizing range.
o4-mini-high - solved by writing Python ✔️
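The counting task that trips up the "reasoning model" is trivial once it is done symbolically, which is exactly the route o4-mini-high took. A minimal sketch, reconstructing the slide's word with 29 'o's (consistent with the accepted answer of 30 vowels):

```python
def count_vowels(word: str) -> int:
    # Exact symbolic counting: no perception or guessing involved.
    return sum(ch.lower() in "aeiou" for ch in word)

# Reconstruct the slide's word: 29 'o's plus one 'a' gives 30 vowels,
# matching the answer marked correct in trial 1.
word = "G" + "o" * 29 + "al"
print(count_vowels(word))  # 30
```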
General Problem Solving (1950s)
General problems (subsymbolic) → Formalisation (symbolic) → Formal reasoning / computing
Count the Fruits - GPT-5 Agent
Summary of GPT-5 Agent’s Strategy
- Mask the “green” and “orange” areas, then count the contiguous connected regions.
- 8 green areas / 3 orange areas
- Fit Hough circles, count their colours.
- 7 circles / 7 apples / 13 oranges
- Fit Hough circles again with different parameters, and cluster them.
- 18 clusters / 6 apples / 12 oranges
- Sample points, and cluster them.
- 13 clusters / 1 apple / 12 oranges
- Image segmentation with watershed, then count connected regions.
- 10 green areas (34 pre-count) / 3 orange areas (5 pre-count)
- “After analyzing the components, I manually counted the fruits and confirmed that there are 7 green apples and 10 oranges, making a total of 17 fruits. Morphological detection predictions were unstable. I’ll deliver this final count to the user.” (7 apples / 10 oranges, still wrong!)
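The agent's first strategy (mask a colour, count connected regions) can be sketched without any vision library via a flood fill over a binary mask; the mask below is a hypothetical toy image, not the slide's photo:

```python
def count_regions(mask):
    # Count 4-connected regions of 1s in a binary grid via flood fill.
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                regions += 1
                stack = [(i, j)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y][x] and not seen[y][x]:
                        seen[y][x] = True
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return regions

# Toy "green" mask with two separate blobs -> 2 regions.
green = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
print(count_regions(green))  # 2
```

This also makes the instability visible: two touching fruits form one connected region, so region counts systematically under- or over-shoot the true fruit count.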
What’s wrong?
In GPT-5’s formalisation, the concepts:
- “Apple” is equated with green connected regions, green circles, segments that are mostly green, clusters of green points, …
- “Orange” is equated with orange connected regions, orange circles, segments that are mostly orange, clusters of orange points, …
They could be anything definable with an existing Python function, just NOT apples and oranges.
- The problem lies in the limitation of its formal language.
- And, this is also why it can solve the word-counting problem!
AI still needs humans …
General problems (subsymbolic) → Formalisation (symbolic)\(^\dagger\) → Formal reasoning / computing
\(^\dagger\)Usually requires manual effort, because we can abstract the tasks and create formalisations tailored for them:
- Logic Programs, Probabilistic Graphical Models, Probabilistic Programming, etc.
- SVMs (kernel tricks, convex optimisation), ANNs (Transformers, CNNs, GNNs), Planning (STRIPS), etc.
- Agentic foundation models (LangChain, MCP workflows, etc.)
Why do we (still) talk about symbolism?
Because problem solving involves formal procedures, which are symbolically represented!
NeSy can count
A NeSy solution:
- Implement neuro-predicates apple and orange with YOLO
- Input: raw image;
- Output: a list of all bounding boxes of apples / oranges
- Calculate the length of each output list
count_apple(Img, N) :- apple_yolo(Img, List), length(List, N).
count_orange(Img, N) :- orange_yolo(Img, List), length(List, N).
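The same split reads naturally in Python: a neural detector produces symbolic objects, and the count is exact symbolic computation over its output. The detector below is a stub standing in for a trained YOLO model (an assumption for illustration), not a real network:

```python
def apple_yolo(img):
    # Stub standing in for a trained YOLO apple detector: a real one
    # would map the raw image to bounding boxes (x, y, w, h).
    return [(10, 12, 30, 30), (55, 40, 28, 29)]  # hypothetical detections

def count_apples(img):
    # The symbolic half of the NeSy program: exact once detection is done,
    # mirroring length/2 in the Prolog clause above.
    return len(apple_yolo(img))

print(count_apples("raw-image-bytes"))  # 2, given the stubbed detections
```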
Wait, do we need to manually create those NeSy programs every time (like the good old-fashioned expert systems)?
The existence of symbols
… these approaches miss an important consequence of uncertainty in a world of things: uncertainty about what things are in the world. Real objects seldom wear unique identifiers or preannounce their existence like the cast of a play. In areas such as vision, language understanding, … , the existence of objects must be inferred from raw data (pixels, strings, and so on) that contain no explicit object references.
The real-world challenge for us
Can we learn a program / plan / grammar / graph / …
from sensory raw data only,
and without pre-defined primitive symbols,
that works in a world that has only sensory raw inputs and allows only low-level motions as outputs?
Learning to Abstract from Scratch
An Example of Abstraction
- Environment: Minigrid
- Task: Reach the goal
- Low-level Inputs: Raw images
- Low-level Actions: turn left, turn right, move forward, pick up, drop, toggle
- Reward: { -1 (fail), 1 (success) }
Z. Wang et al., From End-to-end to Step-by-step: Learning to Abstract via Abductive Reinforcement Learning, IJCAI 2025.
Original Problem (Ground MDP)
An end-to-end reinforcement learning (RL) task is an MDP in a sub-symbolic environment: \(\langle\mathcal{S}, \mathcal{A}, P, R, \gamma\rangle\)
- \(\mathcal{S}\): raw images (the original observations)
- \(\mathcal{A}\): low-level actions
- \(P\): state transitions of low-level actions
- \(R\): the sparse reward \(\{-1, 1\}\); \(\gamma\): discount factor
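A minimal sketch of the ground MDP's ingredients, assuming a toy 1-D corridor in place of MiniGrid (the real \(\mathcal{S}\) is pixel observations, so this is purely illustrative):

```python
import random

# Toy stand-in for the ground MDP <S, A, P, R, gamma>.
STATES = list(range(5))        # positions; state 4 is the goal
ACTIONS = ["left", "right"]    # stand-ins for the low-level actions
GAMMA = 0.99

def P(s, a):
    # Deterministic low-level transition function.
    return max(0, s - 1) if a == "left" else min(len(STATES) - 1, s + 1)

def R(s):
    # Sparse reward as on the slide: 1 on success, -1 on failure.
    return 1 if s == len(STATES) - 1 else -1

s = 0
for _ in range(20):            # a random rollout: no abstraction, no curriculum
    s = P(s, random.choice(ACTIONS))
print("final state:", s, "reward:", R(s))
```

The sparse terminal reward is what makes end-to-end RL here so hard: a random rollout almost never sees a positive signal.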
Ground-truth abstraction
The abstraction of the task is a sub-task decomposition:
- If there are propositional symbols:
- \(K\) - has key; \(U\) - door unlocked; \(G\) - reached to the goal
- \(\neg K\wedge \neg U\wedge\neg G \Longrightarrow K\wedge \neg U\wedge\neg G \Longrightarrow K\wedge U\wedge\neg G \Longrightarrow K\wedge U\wedge G\)
- How to learn such a task abstraction without symbols for objects, and even without the definition of grids?
- Very difficult! So we only discover discrete, abstract states and learn the state transitions
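The ground-truth decomposition above is just a tiny finite-state machine over the propositions K, U, G. A sketch, with naming that is illustrative rather than the paper's:

```python
# Abstract states as sets of true propositions (K = has key,
# U = door unlocked, G = reached goal).
s0 = frozenset()                  # ~K & ~U & ~G
s1 = frozenset({"K"})             #  K & ~U & ~G
s2 = frozenset({"K", "U"})        #  K &  U & ~G
s3 = frozenset({"K", "U", "G"})   #  K &  U &  G

# The sub-task decomposition as abstract transitions; in the paper each
# edge would be realised by an atomic policy over a sub-MDP.
transitions = {s0: s1, s1: s2, s2: s3}

state = s0
while state in transitions:       # run the chain of sub-tasks
    state = transitions[state]
print(sorted(state))  # ['G', 'K', 'U']
```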
Minsky’s Example (The Society of Mind, 1986)
Functions can define concepts
The concept of “chair” can be described as state transition functions, such as:
Learning to abstract
Our work learns to abstract via impasse-driven discovery [Unruh and Rosenbloom, 1989], which is implemented based on the idea of Abductive Learning (ABL).
- Meeting an impasse → Exploring and gathering successful trajectories.
- Trajectories → Abductive learning to get \(\sigma_{\text{new}}\) and the transition \(\tau_{\text{new}\rightarrow \text{old}}\).
- The Abstract State Machine is updated → Training atomic policies in sub-MDPs.
- (A sub-MDP is a subalgebra of the original MDP, defined based on abstract states.)
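The three-step loop can be sketched as a runnable toy in which a hidden sub-task chain plays the role of the environment; all names and the abduction rule are placeholders for illustration, not the paper's implementation:

```python
# Toy impasse-driven discovery: the machine initially knows only the final
# abstract state; each impasse abduces one missing predecessor state.
goal_chain = ["start", "has_key", "door_open", "at_goal"]  # hidden ground truth

machine = {"known_states": ["at_goal"], "transitions": {}}

def meets_impasse(machine):
    # Impasse: the earliest known abstract state is still not "start",
    # so the agent cannot plan all the way from its initial situation.
    return machine["known_states"][0] != "start"

def abduce(machine):
    # Stand-in for abductive learning over successful trajectories:
    # hypothesise the state immediately preceding the earliest known one.
    earliest = machine["known_states"][0]
    sigma_new = goal_chain[goal_chain.index(earliest) - 1]
    return sigma_new, (sigma_new, earliest)

while meets_impasse(machine):
    sigma_new, tau_new = abduce(machine)
    machine["known_states"].insert(0, sigma_new)
    machine["transitions"][tau_new[0]] = tau_new[1]
    # (here one would train an atomic policy for the new sub-MDP)

print(machine["known_states"])  # ['start', 'has_key', 'door_open', 'at_goal']
```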
Experimental results
Abductive state abstraction vs. vanilla end-to-end reinforcement learning vs. hierarchical reinforcement learning with a ground-truth subtask hierarchy
Out-of-distribution Generalization
Trained on 5 maps
→
Tested on 50 unseen maps
Out-of-distribution Generalization
1st col.: training reward on 5 random maps. End-to-end DRL still cannot converge;
2nd col.: testing reward on 50 random maps. The abstracted model does extrapolate;
3rd-4th cols.: continual learning on 50 random maps requires much less training data.