From End-to-End to Step-by-Step

Or: How I Learned to Stop Worrying and Love DNNs



Wang-Zhou Dai

School of Intelligence Science & Technology,
Nanjing University
Sep 13, 2025

https://daiwz.net

Why do we (still) talk about symbolism?

DNNs work pretty well

“Next-token prediction”

Credit: @cwolferesearch

LLM without a formal prover wins the IMO

“At IMO 2024, AlphaGeometry and AlphaProof required experts to first translate problems from natural language into domain-specific languages, such as Lean, and vice-versa for the proofs … This year, our advanced Gemini model operated end-to-end in natural language…”

From: DeepMind

LLM without a formal prover wins the IMO

From: DeepMind

The “reasoning model” 🍓

How many vowels are there in the word “Goooooooooooooooooooooooooooooal”?

The “reasoning model” 🍓

OpenAI-o1 Preview’s performance

#trial  answer     #trial  answer
1       30 ✔️       6       31 ❌
2       31 ❌       7       31 ❌
3       31 ❌       8       29 ❌
4       33 ❌       9       chose between 32 and 31, both ❌
5       24 ❌       10      29 ❌

Subitizing

Subitizing is the rapid, accurate, and effortless ability to perceive small quantities of items in a set, typically when there are four or fewer items, without relying on linguistic or arithmetic processes. The term refers to the sensation of instantly knowing how many objects are in the visual scene when their number falls within the subitizing range. 

o4-mini-high - solved by writing Python ✔️
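Once formalised, the question is trivial: counting is a one-line program. A minimal sketch of the kind of script such a model might write (the word below is constructed with 29 'o's, matching the correct answer of 30):

```python
# Count vowels symbolically instead of guessing token by token.
word = "G" + "o" * 29 + "al"  # constructed to match the slide's word

vowels = set("aeiouAEIOU")
count = sum(1 for ch in word if ch in vowels)
print(count)  # 30: twenty-nine 'o's plus one 'a'
```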

General Problem Solving (1950s)

General problems (subsymbolic) โ†’ Formalisation (symbolic)
โ†’ Formal reasoning / computing

Count the Fruits - GPT 5 Agent

Summary of GPT 5 Agent’s Strategy

  1. Mask “green” and “orange” areas and compute contiguous connected regions.
    • 8 green areas / 3 orange areas
  2. Fit Hough circles, count their colours.
    • 7 circles / 7 apples / 13 oranges
  3. Fit Hough circles again with different parameters, and cluster them.
    • 18 clusters / 6 apples / 12 oranges
  4. Sample points, and cluster them.
    • 13 clusters / 1 apple / 12 oranges
  5. Image segmentation with watershed, then count connected regions.
    • 10 green areas (preliminary count: 34) / 3 orange areas (preliminary count: 5)
  6. “After analyzing the components, I manually counted the fruits and confirmed that there are 7 green apples and 10 oranges, making a total of 17 fruits. Morphological detection predictions were unstable. I’ll deliver this final count to the user.” (7 apples / 10 oranges, still wrong!)

What’s wrong?

In GPT-5’s formalisation, the concepts:

  • “Apple” is equated with green connected regions, green circles, mostly-green segments, or clusters of green points;
  • “Orange” is equated with orange connected regions, orange circles, mostly-orange segments, or clusters of orange points.

They could be anything definable with an existing Python function, just NOT apples and oranges.

  • The problem lies in the limitation of its formal language.
  • And, this is also why it can solve the word-counting problem!

AI still needs humans …

General problems (subsymbolic) โ†’ Formalisation (symbolic)\(^\dagger\)
โ†’ Formal reasoning / computing
\(^\dagger\)Usually requires manual effort, because we can abstract the tasks and create formalisations tailored to them:

  • Logic Programs, Probabilistic Graphical Models, Probabilistic Programming, etc.
  • SVMs (kernel tricks, convex optimisation), ANNs (Transformers, CNNs, GNNs), Planning (STRIPS), etc.
  • Agentic foundation models (LangChain, MCP workflows, etc.)

Why do we (still) talk about symbolism?

Because problem solving involves formal procedures, which are symbolically represented!

NeSy can count

A NeSy solution:

  • Implement neuro-predicate apple and orange with YOLO
    • Input: raw image;
    • Output: a list of all bounding boxes of apples / oranges
  • Calculate the length of each output list
% N is the number of objects found by the YOLO neuro-predicate
count_apple(Img, N) :- apple_yolo(Img, List), length(List, N).
count_orange(Img, N) :- orange_yolo(Img, List), length(List, N).
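The same pipeline can be sketched in Python; `detect_apples` below is a stub standing in for the trained YOLO neuro-predicate (an assumption for illustration, not a real API):

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)

def detect_apples(img) -> List[Box]:
    # Placeholder for a neural detector: maps a raw image to bounding boxes.
    return [(10, 12, 30, 30), (55, 20, 28, 31), (90, 40, 33, 29)]

def count_apples(img) -> int:
    # The symbolic part is just length/2 from the Prolog version.
    return len(detect_apples(img))

print(count_apples(None))  # 3 with the stub detector
```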

Wait, do we need to manually create those NeSy programs every time (like the good old fashioned expert systems)?

The existence of symbols

 … these approaches miss an important consequence of uncertainty in a world of things: uncertainty about what things are in the world. Real objects seldom wear unique identifiers or preannounce their existence like the cast of a play. In areas such as vision, language understanding, … , the existence of objects must be inferred from raw data (pixels, strings, and so on) that contain no explicit object references. 

The real-world challenge for us

Can we learn a program / plan / grammar / graph / …
from raw sensory data only,
without pre-defined primitive symbols,
that works in a world which offers only raw sensory inputs and allows only low-level motions as outputs?

Well, monkeys can

Monkeys vs Pacman

Q. Yang et al., A language model of problem solving in humans and macaque monkeys, Current Biology, 2025.

Monkeys learn “grammars”

Q. Yang et al., A language model of problem solving in humans and macaque monkeys, Current Biology, 2025.

Humans also learn “grammars”

Q. Yang et al., A language model of problem solving in humans and macaque monkeys, Current Biology, 2025.

Good players learn better “grammars”

Q. Yang et al., A language model of problem solving in humans and macaque monkeys, Current Biology, 2025.

Symbolic abstraction and decision making

  1. Symbolic abstraction ability is related to problem-solving ability.
    • Proposing new concepts, summarizing new rules;
    • Higher intelligence ≅ more complex rule systems;
  2. Behind decision-making is formal reasoning and learning based on abstract structures.
    • This enhances both the interpretability of decisions and the reliability of their execution;
    • The underlying execution of plans, i.e., translating high-level intentions into low-level actions, requires a large amount of empirical training;
  3. Most importantly, symbolic abstraction (rules, concepts) can be learned from sub-symbolic environments.
    • Discrete, symbolic representations can emerge naturally through experiential learning.

Learning to Abstract from Scratch

An Example of Abstraction

  • Environment: Minigrid
  • Task: Reach the goal (🟩)
  • Low-level Inputs: Raw images
  • Low-level Actions: ⬆️ (forward), ↩️ (turn left), ↪️ (turn right), 🫴 (pick up), 🫳 (drop), ⏻ (toggle)
  • Reward: { -1 (fail), 1 (success) }

Z. Wang et al., From End-to-end to Step-by-step: Learning to Abstract via Abductive Reinforcement Learning, IJCAI 2025.

Original Problem (Ground MDP)

An end-to-end reinforcement learning (RL) task is an MDP in a sub-symbolic environment: \(\langle\mathcal{S}, \mathcal{A}, P, R, \gamma\rangle\)

  • \(\mathcal{S}\): raw images
  • \(\mathcal{A}\): low-level actions
  • \(P\): state transitions under low-level actions
  • \(R\): sparse reward, -1 (fail) / 1 (success)
  • \(\gamma\): discount factor
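For reference, the five components can be written down as a schematic container (a toy sketch, not the paper's implementation; all names are assumptions):

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class GroundMDP:
    states: Any           # S: raw images (any observation space)
    actions: List[str]    # A: low-level actions
    transition: Callable  # P(s' | s, a)
    reward: Callable      # R(s, a): sparse, -1 on failure / 1 on success
    gamma: float          # discount factor

# A toy instance with the slide's sparse reward.
mdp = GroundMDP(
    states=None,
    actions=["forward", "turn_left", "turn_right", "pickup", "drop", "toggle"],
    transition=lambda s, a: s,
    reward=lambda s, a: 1 if s == "goal" else -1,
    gamma=0.99,
)
print(mdp.reward("goal", "forward"))  # 1
```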

Z. Wang et al., From End-to-end to Step-by-step: Learning to Abstract via Abductive Reinforcement Learning, IJCAI 2025.

Ground-truth abstraction

The abstraction of the task is a sub-task decomposition:

  • If there are propositional symbols:
    • \(K\) - has key; \(U\) - door unlocked; \(G\) - reached the goal
  • \(\neg K\wedge \neg U\wedge\neg G \Longrightarrow K\wedge \neg U\wedge\neg G \Longrightarrow K\wedge U\wedge\neg G \Longrightarrow K\wedge U\wedge G\)
  • How to learn such a task abstraction without object symbols, and even without the definition of grids?
  • Very difficult! So we only discover discrete, abstract states and learn the state transitions.
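One toy way to picture discovering discrete states without predefined symbols: quantise raw observations and tally which discovered state follows which. The quantiser below is an assumption for illustration, not the method of the paper:

```python
from collections import defaultdict

def quantise(obs, grid=0.5):
    # Map a raw observation (here a float vector) to a discrete state id.
    return tuple(round(x / grid) for x in obs)

def learn_transitions(trajectory):
    # Count transitions between the discovered abstract states.
    counts = defaultdict(int)
    states = [quantise(o) for o in trajectory]
    for s, s_next in zip(states, states[1:]):
        if s != s_next:  # only record changes of abstract state
            counts[(s, s_next)] += 1
    return counts

traj = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.1), (1.0, 1.1)]
print(dict(learn_transitions(traj)))
```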

Z. Wang et al., From End-to-end to Step-by-step: Learning to Abstract via Abductive Reinforcement Learning, IJCAI 2025.

Minsky’s Example (The Society of Mind, 1986)

Functions can define concepts

The concept of “chair” can be described as state transition functions, such as:
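A toy rendering of the idea: define “chair” by the state transition it affords to an agent, not by its parts. All predicates and objects below are invented for this sketch:

```python
# A "chair" is anything whose sit affordance maps a standing agent to a
# seated, supported agent -- a state-transition definition of the concept.
def sit_on(obj, agent):
    if obj.get("supports_weight") and obj.get("seat_height_cm", 0) > 20:
        return {**agent, "posture": "seated"}
    return agent  # transition fails: the object does not act as a chair

stool = {"supports_weight": True, "seat_height_cm": 45}
puddle = {"supports_weight": False}

agent = {"posture": "standing"}
print(sit_on(stool, agent)["posture"])   # seated
print(sit_on(puddle, agent)["posture"])  # standing
```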

Learning to abstract

Our work learns to abstract via impasse-driven discovery [Unruh and Rosenbloom, 1989], implemented based on the idea of Abductive Learning (ABL).

  • Meeting an impasse → exploring and gathering successful trajectories.
  • Trajectories → abductive learning yields a new abstract state \(\sigma_{\text{new}}\) and transition \(\tau_{\text{new}\rightarrow \text{old}}\).
  • The Abstract State Machine is updated → training atomic policies in sub-MDPs.
    • (A sub-MDP is a subalgebra of the original MDP, defined over the abstract states.)
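The loop can be caricatured as growing a finite state machine one abstract state at a time; names such as `add_abstract_state` are assumptions for illustration:

```python
class AbstractStateMachine:
    def __init__(self):
        self.states = {"s0"}   # start with a single abstract state
        self.transitions = {}  # (src, dst) -> name of the atomic policy

    def add_abstract_state(self, new_state, old_state):
        # Abduction proposes sigma_new plus a transition tau_{new -> old};
        # each new edge gets its own sub-MDP / atomic policy to train.
        self.states.add(new_state)
        self.transitions[(new_state, old_state)] = f"pi_{new_state}_{old_state}"

asm = AbstractStateMachine()
asm.add_abstract_state("has_key", "s0")
asm.add_abstract_state("door_open", "has_key")
print(sorted(asm.states))  # ['door_open', 'has_key', 's0']
```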

Z. Wang et al., From End-to-end to Step-by-step: Learning to Abstract via Abductive Reinforcement Learning, IJCAI 2025.


Experimental results

Abductive state abstraction vs. vanilla end-to-end reinforcement learning vs. hierarchical reinforcement learning with the ground-truth subtask hierarchy

Out-of-distribution Generalization

Trained on 5 maps

⮕

Tested on 50 unseen maps

Out-of-distribution Generalization

1st col.: training reward on 5 random maps; end-to-end DRL still fails to converge.
2nd col.: testing reward on 50 unseen maps; the abstracted model does extrapolate.
3rd-4th cols.: continual learning on the 50 maps requires much less training data.
Z. Wang et al., From End-to-end to Step-by-step: Learning to Abstract via Abductive Reinforcement Learning, IJCAI 2025.

Summary

Learning ⬌ Abstraction ⬌ Reasoning

“The era of experience” (Silver and Sutton, 2025)

Abstracting raw traces?

Open problem: Program (grammar) induction with neuro-predicate (concept) invention.

For an under-trained agent, the traces look like this:

  • We want to model it like \(a^*b^*c^*\), but which segments correspond to \(a\), \(b\) and \(c\)?
    • Needs to learn state perception models \(\phi_a: \mathbb{R}^n \rightarrow \{0,1\},\,\phi_b: \mathbb{R}^n \rightarrow \{0,1\},\ldots\)
    • while allowing the alphabet \(\{a, b, c, \ldots\}\) to increase during learning
    • and induce rules like \(c\leftarrow a\wedge b\) for high-level planning
    • meanwhile, train low-level action models \(\pi_a:\mathbb{R}^n \rightarrow\mathbb{R}^m , \ldots\) to execute the plan
  • A possible solution: combining abduction, induction, and deep / statistical learning.
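A sketch of the segmentation side of the problem: per-step perception functions \(\phi\) label each raw observation, and the labelled trace is checked against \(a^*b^*c^*\). The threshold classifiers below stand in for learned models (an assumption for illustration):

```python
import re

# Stand-ins for learned perception models phi_a, phi_b, phi_c: each maps a
# raw (here 1-D) observation to {0, 1}.
phi = {
    "a": lambda x: x < 1.0,
    "b": lambda x: 1.0 <= x < 2.0,
    "c": lambda x: x >= 2.0,
}

def label_trace(trace):
    # Assign each raw step the first symbol whose perception model fires.
    return "".join(next(s for s in "abc" if phi[s](x)) for x in trace)

trace = [0.2, 0.4, 1.3, 1.7, 2.5]
symbols = label_trace(trace)
print(symbols, bool(re.fullmatch(r"a*b*c*", symbols)))  # aabbc True
```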

Dai and Muggleton, Abductive Knowledge Induction From Raw Data, IJCAI 2021.

Job Opportunity @ Nanjing

  • Nanjing University is seeking talented faculty and researchers to pursue advances in classical symbolic AI, statistical relational AI, and neuro-symbolic AI.
  • We offer positions at all levels, from postdoc to full professor.
  • AI research @ Nanjing is led by top-notch scholars such as Stephen Muggleton FREng, Tan Tieniu FREng, and Zhi-Hua Zhou (current President of the IJCAI Board of Trustees).