Adding data to improve the model:
Adding knowledge to speed up problem solving and learning:
Examples: Background Knowledge (Primitive Predicates): Learned Hypothesis:
Yoshua Bengio: From System 1 Deep Learning to System 2 Deep Learning.
(NeurIPS’2019 Keynote)
Representation | Statistical / Neural | Symbolic |
---|---|---|
Examples | Many | Few |
Data | Tables | Programs / Logic programs |
Hypotheses | Propositional/functions | First/higher-order relations |
Learning | Parameter optimisation | Combinatorial search |
Explainability | Difficult | Possible |
Knowledge transfer | Difficult | Easy |
A Boolean algebra is a complemented lattice with binary operators \(\wedge\), \(\vee\) and a unary operator \(\neg\) and elements \(1\) and \(0\) s.t. commutative, associative and distributive laws hold.
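For reference, the laws in question are the standard Boolean identities (stated here for completeness, not specific to these slides), e.g.:
\[ x \wedge y = y \wedge x,\quad x \vee (y \vee z) = (x \vee y) \vee z,\quad x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z),\quad x \vee \neg x = 1,\quad x \wedge \neg x = 0. \]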
Finite set \(W_m=\{\frac{k}{m-1}\mid 0\leq k\leq m-1\}\) or infinite set \(W_\infty =[0,1]\)
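Instantiating the definition, \(m=2\) recovers the classical Boolean truth values and \(m=3\) gives a three-valued logic:
\[ W_2=\{0,1\}, \qquad W_3=\{0,\tfrac{1}{2},1\}. \]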
Semantics: Logical relations \(\Leftrightarrow\) Statistical correlations.
Neural Theorem Prover (Rocktäschel and Riedel, 2017)
(Wu et al., A Comprehensive Survey on Graph Neural Networks. arXiv 2019)
Difficult to extrapolate:
(Trask et al., 2018)
Semantics: A probability distribution over propositions. Assuming all ground probabilistic facts (\(f_i\)) are independent:
Probability of atom \(a\) being \(True\):
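Under these assumptions the standard success probability of the distribution semantics applies (a sketch of the usual formulation, stated here for completeness): with ground probabilistic facts \(\mathcal{F}\), rules \(R\) and \(p_i = P(f_i)\),
\[ P(a) \;=\; \sum_{F \subseteq \mathcal{F},\; F \cup R \,\models\, a} \;\prod_{f_i \in F} p_i \prod_{f_i \notin F} (1 - p_i), \]
and when \(a\) can be derived from several independent rule groundings this reduces to a noisy-or, \(P(a) = 1 - \prod_i \bigl(1 - P(f_i)\bigr)\).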
ProbLog (De Raedt & Kimmig, 2015): an example with noisy-or rules:
%% Logic rules
0.3::stress(X) :- person(X).
0.2::influences(X,Y) :- person(X), person(Y).
smokes(X) :- stress(X).
smokes(X) :- friend(X,Y), influences(Y,X), smokes(Y).
0.4::asthma(X) :- smokes(X).
%% Facts
person(1). person(2). person(3). person(4).
friend(1,2). friend(4,2). friend(2,1). friend(2,4).
friend(3,2).
%% Observed facts
evidence(smokes(2),true). evidence(influences(4,2),false).
%% unknown facts
query(smokes(1)). query(smokes(3)). query(smokes(4)).
query(asthma(1)). query(asthma(2)). query(asthma(3)). query(asthma(4)).
Inference result:
prob(smokes(1), 0.50877). prob(smokes(3), 0.44). prob(smokes(4), 0.44).
prob(asthma(1), 0.203508). prob(asthma(2), 0.4). prob(asthma(3), 0.176).
prob(asthma(4), 0.176).
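As a sanity check on these numbers (my own derivation from the model above): the probabilistic rule 0.4::asthma(X) :- smokes(X) adds an independent coin per grounding, so each asthma marginal is \(0.4\) times the corresponding smokes marginal (smokes(2) is observed true), which matches the output:
\[ 0.4 \cdot 1 = 0.4, \qquad 0.4 \cdot 0.44 = 0.176, \qquad 0.4 \cdot 0.50877 \approx 0.2035. \]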
Weighted Model Counting: Compile logic formulae to a DAG then count possible worlds.
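A toy illustration (my own example, not from the slides): for \(\varphi = a \vee b\) over independent facts \(0.3{::}a\) and \(0.2{::}b\), weighted model counting sums the weights of the possible worlds that satisfy \(\varphi\):
\[ \mathrm{WMC}(\varphi) = 0.3\cdot 0.2 + 0.3\cdot 0.8 + 0.7\cdot 0.2 = 0.44. \]
Compiling \(\varphi\) into a DAG (e.g. a d-DNNF or SDD) lets this sum be evaluated in time linear in the size of the compiled circuit.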
Define the structure of logic rules:
%% Rules
t(_)::stress(X) :- person(X).
t(_)::influences(X,Y) :- person(X), person(Y).
smokes(X) :- stress(X).
smokes(X) :- friend(X,Y), influences(Y,X), smokes(Y).
%% Facts
person(1). person(2). person(3). person(4).
friend(1,2). friend(4,2). friend(2,1). friend(2,4).
friend(3,2).
%% examples
evidence(smokes(2),false).
evidence(smokes(4),true).
evidence(influences(1,2),false).
evidence(influences(4,2),false).
evidence(influences(2,3),true).
evidence(stress(1),true).
Learned weights:
0.666666666666667::stress(X) :- person(X).
0.339210385615516::influences(X,Y) :- person(X), person(Y).
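The learned weight for stress/1 can be checked by hand, assuming the standard EM update used in learning from interpretations: the evidence makes stress(1) true, forces stress(2) false (smokes(2) is false) and stress(4) true (smokes(4) is true but cannot be derived via smokes(2)), while stress(3) remains latent with expected value \(p\). The fixed point of the update
\[ p = \frac{1 + 0 + 1 + p}{4} \]
is \(p = \tfrac{2}{3} \approx 0.667\), matching the learned weight above; the influences/2 weight is estimated by the same EM procedure.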
Cannot solve problems like this…
Training examples: \(\langle x, y\rangle\).
Task: image sequences labelled only with the equation’s correctness, e.g. 1+1=10, 1+0=1, … (add); 1+1=0, 0+1=1, … (xor).
%%%%%%%%%%%%%% LENGTH: 7 to 8 %%%%%%%%%%%%%%
This is the CNN's current label:
[[1, 2, 0, 1, 0, 1, 2, 0], [1, 1, 0, 1, 0, 1, 3, 3], [1, 1, 0, 1, 0, 1, 0, 3], [2, 0, 2, 1, 0, 1, 2], [1, 1, 0, 0, 0, 1, 2], [1, 0, 1, 1, 0, 1, 3, 0], [1, 1, 0, 3, 0, 1, 1], [0, 0, 2, 1, 0, 1, 1], [1, 3, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 1, 3, 3]]
****Consistent instance:
consistent examples: [6, 8, 9]
mapping: {0: '+', 1: 0, 2: '=', 3: 1}
Current model's output:
00+1+00 01+0+00 0+00+011
Abduced labels:
00+1=00 01+0=00 0+00=011
Consistent percentage: 0.3
****Learned Rules:
rules: ['my_op([0],[0],[0,1])', 'my_op([1],[0],[0])', 'my_op([0],[1],[0])']
Train pool size is : 22
...
This is the CNN's current label:
[[1, 1, 0, 1, 2, 1, 3, 3], [1, 3, 0, 3, 2, 1, 3], [1, 0, 1, 1, 2, 1, 3, 3], [1, 1, 0, 1, 0, 1, 3, 3], [1, 0, 1, 1, 2, 1, 3, 3], [1, 1, 0, 1, 0, 1, 3, 3], [1, 0, 3, 3, 2, 1, 1], [1, 1, 0, 1, 2, 1, 3, 3], [1, 1, 0, 1, 2, 1, 3, 3], [3, 0, 1, 1, 2, 1, 1]]
****Consistent instance:
consistent examples: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
mapping: {0: '+', 1: 0, 2: '=', 3: 1}
Current model's output:
00+0=011 01+1=01 0+00=011 00+0=011 0+00=011 00+0=011 0+01=00 00+0=011 00+0=011 1+00=00
Abduced labels:
00+0=011 01+1=01 0+00=011 00+0=011 0+00=011 00+0=011 0+01=00 00+0=011 00+0=011 1+00=00
Consistent percentage: 1.0
****Learned feature:
Rules: ['my_op([1],[0],[0])', 'my_op([0],[1],[0])', 'my_op([1],[1],[1])', 'my_op([0],[0],[0,1])']
Train pool size is : 77
(Figure: test accuracy vs. equation length)
A task from Neural arithmetic logic units (Trask et al., NeurIPS 2018).
:- Z = [7,3,5], Y = 15, prog(Z, Y).
% true.
How to write the prog?
\[ \underset{\theta}{\operatorname{arg max}}\sum_z \sum_H P(y,z,H|B,x,\theta) \]
where \(x\) is the raw input (e.g. an image sequence), \(y\) its label, \(z\) the abduced pseudo-labels, \(H\) the induced hypothesis (logic program), \(B\) the background knowledge and \(\theta\) the parameters of the perception model.
(The induced hypothesis \(H\) corresponds to the prog.)
#= is a CLP(Z) predicate for representing arithmetic constraints: the query X+Y #= 3, [X,Y] ins 0..9, label([X,Y]) will output X=0, Y=3; X=1, Y=2; ...
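A minimal sketch (an illustrative assumption, not the actual \(Meta_{Abd}\) implementation) of how a cumulative-sum prog written with #= can run both forwards and backwards, e.g. to abduce an unknown digit:
%% Sketch: cumulative sum with CLP constraints
%% (library(clpfd) in SWI-Prolog; library(clpz) offers the same predicates)
:- use_module(library(clpfd)).
prog([X], X).                                     % a single element is the sum
prog([X,Y|T], S) :- Z #= X + Y, prog([Z|T], S).   % fold the first two elements
%% Forward:  ?- prog([7,3,5], Y).                                  gives Y = 15.
%% Backward: ?- L = [A,3,5], L ins 0..9, prog(L, 15), label(L).    gives A = 7.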
Domain Knowledge | End-to-end Models | \(Meta_{Abd}\) |
---|---|---|
Recurrence | LSTM & RNN | Prolog’s list operations |
Arithmetic functions | NAC & NALU (Trask et al., 2018) | Predicates add, mult and eq |
Permutation | Permutation matrix \(P_{sort}\) (Grover et al., 2019) | Prolog’s permutation |
Sorting | sort operator (Grover et al., 2019) | Predicate s (learned as sub-task) |
Learned Programs:
%% Accumulative Sum
f(A,B):-add(A,C), f(C,B).
f(A,B):-eq(A,B).
%% Accumulative Product
f(A,B):-mult(A,C), f(C,B).
f(A,B):-eq(A,B).
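One way to read these clauses is with background predicates like the following (an illustrative assumption; the actual definitions belong to \(Meta_{Abd}\)'s background knowledge):
%% Assumed background predicates (illustration only)
:- use_module(library(clpfd)).
add([X,Y|T], [Z|T])  :- Z #= X + Y.   % replace the first two elements by their sum
mult([X,Y|T], [Z|T]) :- Z #= X * Y.   % replace the first two elements by their product
eq([X], X).                           % a single remaining element is the result
%% With these definitions the accumulative-sum program yields, e.g., f([7,3,5], 15).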
Learned Programs:
% Sub-task: Sorted
s(A):-s_1(A,B),s(B).
s(A):-tail(A,B),empty(B).
s_1(A,B):-nn_pred(A),tail(A,B).
%% Bogosort by reusing s/1
f(A,B):-permute(A,B,C),s(C).
The sub-task sorted/1 uses the subset of sorted MNIST sequences in the training data as examples.
References: