Connecting Machine Learning and Logical Reasoning
Joint work with Zhi-Hua Zhou$^*$, Stephen Muggleton$^\dagger$, Yang Yu$^*$, Qiu-Ling Xu$^*$, Yu-Xuan Huang$^*$, and Le-Wen Cai$^*$.
$^\dagger$Department of Computing, Imperial College London
$^*$Department of Computer Science, Nanjing University
Adding data to improve the model:
Supervised Learning
Data (alone) could never be enough…
Human: How many eyes does a giraffe have?
GPT-3: A giraffe has two eyes.
Human: How many eyes does my foot have?
GPT-3: Your foot has two eyes.
Human: How many eyes does a spider have?
GPT-3: A spider has eight eyes.
Human: How many eyes does the sun have?
GPT-3: The sun has one eye.
Human: How many eyes does a blade of grass have?
GPT-3: A blade of grass has one eye.
Adding knowledge to speed up problem solving and learning:
e.g., Logic Programs:
Examples: Background Knowledge (Primitive Predicates): Learned Hypothesis:
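A hypothetical kinship-style illustration of these three ingredients (all predicate names and facts below are illustrative, not from the slides):

% Background knowledge (primitive predicates), illustrative only:
father(abe, homer).
father(homer, bart).
mother(mona, homer).
parent(X, Y) :- father(X, Y).
parent(X, Y) :- mother(X, Y).
% Examples: positive grandparent(abe, bart); negative grandparent(bart, abe).
% A hypothesis an ILP system could learn from these examples:
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).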
Cannot solve problems like this…
Yoshua Bengio: From System 1 Deep Learning to System 2 Deep Learning (NeurIPS 2019 keynote).
Neural Theorem Prover (Rocktäschel and Riedel, 2017)
Difficult to extrapolate:
(Trask et al., 2018)
Representation | Statistical / Neural | Symbolic |
---|---|---|
Examples | Many | Few |
Data | Tables | Programs / Logic programs |
Hypotheses | Propositional/functions | First/higher-order relations |
Explainability | Difficult | Possible |
Knowledge transfer | Difficult | Easy |
Training examples: $\langle x, y \rangle$.
ABL (abductive learning) is a framework where machine learning and logical reasoning can be entangled and mutually beneficial.
Task: image sequences labeled only with the correctness of the equation they depict, e.g., 1+1=10, 1+0=1, … (add); 1+1=0, 0+1=1, … (xor).
Background knowledge: the equation structure X+Y=Z as a DCG grammar over the symbols 0 and 1. The binary operation itself (my_op/3 in the trace below) is unknown and must be learned.
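A minimal DCG sketch of such an equation grammar (an assumed encoding; the slides' actual grammar may differ, e.g., in how multi-digit operations and carries are composed):

% Equation structure X+Y=Z as a definite clause grammar (illustrative):
equation --> digits(X), [+], digits(Y), [=], digits(Z), { my_op(X, Y, Z) }.
digits([D]) --> digit(D).
digits([D|Ds]) --> digit(D), digits(Ds).
digit(0) --> [0].
digit(1) --> [1].
% e.g., phrase(equation, [1,+,1,=,1,0]) holds once my_op([1],[1],[1,0]) is known.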
%%%%%%%%%%%%%% LENGTH: 7 to 8 %%%%%%%%%%%%%%
This is the CNN's current label:
[[1, 2, 0, 1, 0, 1, 2, 0], [1, 1, 0, 1, 0, 1, 3, 3], [1, 1, 0, 1, 0, 1, 0, 3], [2, 0, 2, 1, 0, 1, 2], [1, 1, 0, 0, 0, 1, 2], [1, 0, 1, 1, 0, 1, 3, 0], [1, 1, 0, 3, 0, 1, 1], [0, 0, 2, 1, 0, 1, 1], [1, 3, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 1, 3, 3]]
****Consistent instance:
consistent examples: [6, 8, 9]
mapping: {0: '+', 1: 0, 2: '=', 3: 1}
Current model's output:
00+1+00 01+0+00 0+00+011
Abduced labels:
00+1=00 01+0=00 0+00=011
Consistent percentage: 0.3
****Learned Rules:
rules: ['my_op([0],[0],[0,1])', 'my_op([1],[0],[0])', 'my_op([0],[1],[0])']
Train pool size is : 22
...
This is the CNN's current label:
[[1, 1, 0, 1, 2, 1, 3, 3], [1, 3, 0, 3, 2, 1, 3], [1, 0, 1, 1, 2, 1, 3, 3], [1, 1, 0, 1, 0, 1, 3, 3], [1, 0, 1, 1, 2, 1, 3, 3], [1, 1, 0, 1, 0, 1, 3, 3], [1, 0, 3, 3, 2, 1, 1], [1, 1, 0, 1, 2, 1, 3, 3], [1, 1, 0, 1, 2, 1, 3, 3], [3, 0, 1, 1, 2, 1, 1]]
****Consistent instance:
consistent examples: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
mapping: {0: '+', 1: 0, 2: '=', 3: 1}
Current model's output:
00+0=011 01+1=01 0+00=011 00+0=011 0+00=011 00+0=011 0+01=00 00+0=011 00+0=011 1+00=00
Abduced labels:
00+0=011 01+1=01 0+00=011 00+0=011 0+00=011 00+0=011 0+01=00 00+0=011 00+0=011 1+00=00
Consistent percentage: 1.0
****Learned feature:
Rules: ['my_op([1],[0],[0])', 'my_op([0],[1],[0])', 'my_op([1],[1],[1])', 'my_op([0],[0],[0,1])']
Train pool size is : 77
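The trace above interleaves perception and abduction: the CNN's raw labels are mapped to symbols under a candidate mapping, checked for consistency against the equation grammar, and minimally revised when inconsistent. A toy sketch of that check (consistent/3 and lookup/3 are assumed helper names, not the authors' code):

% Map raw CNN labels to symbols, then test them against the grammar:
consistent(RawLabels, Mapping, Symbols) :-
    maplist(lookup(Mapping), RawLabels, Symbols),
    phrase(equation, Symbols).
lookup(Mapping, Raw, Sym) :- member(Raw-Sym, Mapping).
% e.g., with Mapping = [0-(+), 1-0, 2-(=), 3-1], the raw labels
% [1,1,0,1,2,1,3,3] map to 00+0=011 as in the trace; abduction then
% searches for a minimal revision of the labels (and for my_op/3 itself)
% under which every training equation parses as correct.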
(Figure: test accuracy vs. equation length)
Task: predict punishment from court records (text).
Method | F1 |
---|---|
BERT-10 | 0.811±0.010 |
PL-10 | 0.814±0.006 |
Tri-10 | 0.812±0.016 |
ABL-10 | 0.824±0.014 |
SS-ABL-10 | 0.862±0.005 |
BERT-50 | 0.857±0.006 |
PL-50 | 0.858±0.010 |
Tri-50 | 0.861±0.007 |
ABL-50 | 0.860±0.003 |
SS-ABL-50 | 0.865±0.007 |
BERT-100 | 0.863±0.003 |
ABL-100 | 0.867±0.008 |
Method | MAE | MSE |
---|---|---|
BERT-10 | 0.8668±0.0320 | 1.2044±0.1233 |
PL-10 | 0.8616±0.0346 | 1.1548±0.1073 |
Tri-10 | 0.8402±0.0659 | 1.1548±0.1073 |
ABL-10 | 0.8728±0.1016 | 1.3756±0.2168 |
SS-ABL-10 | 0.8239±0.0174 | 1.1459±0.0487 |
BERT-50 | 0.8300±0.0198 | 1.0654±0.0443 |
PL-50 | 0.8316±0.0346 | 1.0448±0.1543 |
Tri-50 | 0.8102±0.0213 | 0.9944±0.0461 |
ABL-50 | 0.8416±0.0294 | 1.0821±0.1097 |
SS-ABL-50 | 0.7876±0.0272 | 0.9591±0.0910 |
BERT-100 | 0.8213±0.0141 | 1.0114±0.0312 |
ABL-100 | 0.8223±0.0302 | 1.0065±0.0931 |
A task from Neural arithmetic logic units (Trask et al., NeurIPS 2018).
:- Z = [7,3,5], Y = 15, prog(Z, Y).
% true.
\[ \underset{\theta}{\operatorname{arg\,max}}\sum_z \sum_H P(y,z,H \mid B,x,\theta) \]
where $x$ is the raw input, $y$ its label, $z$ the abduced pseudo-labels, $H$ the abduced hypothesis (logic program), $B$ the background knowledge, and $\theta$ the perception model's parameters.
#= is a CLP(Z) predicate for expressing arithmetic constraints: X+Y#=3, [X,Y] ins 0..9, label([X,Y]) enumerates X=0, Y=3; X=1, Y=2; …
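A runnable version of that query (SWI-Prolog's library(clpfd); library(clpz) in Scryer/SICStus provides the same operators):

:- use_module(library(clpfd)).  % or library(clpz)
% Enumerate all pairs of digits that sum to 3:
?- X + Y #= 3, [X,Y] ins 0..9, label([X,Y]).
% X = 0, Y = 3 ;
% X = 1, Y = 2 ;
% ...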
Domain Knowledge | End-to-end Models | \(Meta_{Abd}\) |
---|---|---|
Recurrence | LSTM & RNN | Prolog's list operations |
Arithmetic functions | NAC & NALU (Trask et al., 2018) | Predicates add, mult and eq |
Permutation | Permutation matrix \(P_{sort}\) (Grover et al., 2019) | Prolog's permutation |
Sorting | sort operator (Grover et al., 2019) | Predicate s (learned as sub-task) |
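Much of the right-hand column comes for free in Prolog; a minimal sketch using standard built-ins:

% Permutations are enumerable on backtracking:
?- permutation([2,1,3], P).   % P = [2,1,3] ; P = [2,3,1] ; ...
% Recurrence is ordinary structural recursion over lists:
list_sum([], 0).
list_sum([X|Xs], S) :- list_sum(Xs, S0), S is S0 + X.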
Learned Programs:
%% Accumulative Sum
f(A,B):-add(A,C), f(C,B).
f(A,B):-eq(A,B).
%% Accumulative Product
f(A,B):-mult(A,C), f(C,B).
f(A,B):-eq(A,B).
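These programs fold the background predicates over the input list. A plausible reading of those primitives (assumed definitions for illustration, not the authors' code):

% Assumed background definitions, illustration only:
add([X,Y|T], [Z|T])  :- Z is X + Y.  % replace the first two elements by their sum
mult([X,Y|T], [Z|T]) :- Z is X * Y.  % ... or by their product
eq([X], X).                          % a singleton list yields its element
% With these, the earlier query's f([7,3,5], 15) succeeds:
% [7,3,5] -> [10,5] -> [15], then eq([15], 15) closes the recursion.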
Learned Programs:
% Sub-task: Sorted
s(A):-s_1(A,B),s(B).
s(A):-tail(A,B),empty(B).
s_1(A,B):-nn_pred(A),tail(A,B).
%% Bogosort by reusing s/1
f(A,B):-permute(A,B,C),s(C).
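A plausible reading of the sorting primitives (stand-ins for illustration; in the real system nn_pred/1 is a neural predicate over raw MNIST images):

% Assumed stand-ins, illustration only:
tail([_|T], T).
empty([]).
nn_pred([X,Y|_]) :- X =< Y.  % first two items are in order
% Then s/1 accepts exactly the sorted lists:
% ?- s([1,2,3]).  % true
% ?- s([2,1,3]).  % false
% and f/2 sorts by generating permutations until s/1 accepts one (bogosort).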
sorted/1 uses the subset of sorted MNIST sequences in the training data as examples.
References: