Adding data to improve the model:
Adding knowledge to speed-up problem solving and learning:
Examples:
Background Knowledge (Primitive Predicates):
Learned Hypothesis:
Yoshua Bengio: From System 1 Deep Learning to System 2 Deep Learning.
(NeurIPS’2019 Keynote)
| Representation | Statistical / Neural | Symbolic | 
|---|---|---|
| Examples | Many | Few | 
| Data | Tables | Programs / Logic programs | 
| Hypotheses | Propositional/functions | First/higher-order relations | 
| Learning | Parameter optimisation | Combinatorial search | 
| Explainability | Difficult | Possible | 
| Knowledge transfer | Difficult | Easy | 
A Boolean algebra is a complemented lattice with binary operators \(\wedge\) and \(\vee\), a unary operator \(\neg\), and elements \(1\) and \(0\), such that the commutative, associative and distributive laws hold.
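Written out, the laws referred to above are the standard identities (listed here only for reference):
\[
x \vee y = y \vee x,\quad x \wedge y = y \wedge x,\quad
x \vee (y \vee z) = (x \vee y) \vee z,\quad x \wedge (y \wedge z) = (x \wedge y) \wedge z,
\]
\[
x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z),\quad
x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z),\quad
x \vee \neg x = 1,\quad x \wedge \neg x = 0.
\]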
Finite set \(W_m=\{\frac{k}{m-1}\mid 0\leq k\leq m-1\}\) or infinite set \(W_\infty =[0,1]\)
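For instance, instantiating the definition above:
\[
W_2=\{0,1\},\qquad W_3=\{0,\frac{1}{2},1\},\qquad W_\infty=[0,1],
\]
so \(m=2\) recovers the two Boolean truth values, while larger \(m\) and \(W_\infty\) give many-valued (fuzzy) truth values.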
Semantics: Logical relations \(\Leftrightarrow\) Statistical correlation. e.g.,
can be interpreted as:
Semantics:
Semantics:
Neural Theorem Prover (Rocktäschel and Riedel, 2017)
(Wu et al., A Comprehensive Survey on Graph Neural Networks. arXiv 2019)
Difficult to extrapolate:
(Trask et al., 2018)
Semantics: Distribution on propositions. Assuming all groundings (\(f_i\)) are independent:
Probability of atom \(a\) being \(True\):
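In the standard ProbLog (distribution-semantics) notation, with probabilistic facts \(f_i\) having probabilities \(p_i\) and possible worlds \(\omega\):
\[
P(\omega)=\prod_{f_i\in\omega} p_i \prod_{f_i\notin\omega} (1-p_i),
\qquad
P(a=True)=\sum_{\omega\,\models\, a} P(\omega).
\]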
ProbLog (De Raedt & Kimmig, 2015): an example of noisy-or rules:
%% Logic rules
0.3::stress(X) :- person(X).
0.2::influences(X,Y) :- person(X), person(Y).
smokes(X) :- stress(X).
smokes(X) :- friend(X,Y), influences(Y,X), smokes(Y).
0.4::asthma(X) :- smokes(X).
%% Facts
person(1).   person(2).   person(3).   person(4).
friend(1,2). friend(4,2). friend(2,1). friend(2,4).
friend(3,2).
%% Observed facts
evidence(smokes(2),true).  evidence(influences(4,2),false).
%% unknown facts
query(smokes(1)).  query(smokes(3)).  query(smokes(4)).
query(asthma(1)).  query(asthma(2)).  query(asthma(3)).  query(asthma(4)).
Inference result:
prob(smokes(1), 0.50877).  prob(smokes(3), 0.44).  prob(smokes(4), 0.44).
prob(asthma(1), 0.203508). prob(asthma(2), 0.4).   prob(asthma(3), 0.176).
prob(asthma(4), 0.176).
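As a sanity check on the noisy-or reading (hand computation for person 3, whose relevant facts stress(3) and influences(2,3) are independent of the observed evidence): smokes(3) holds if stress(3) holds (probability 0.3), or if influences(2,3) holds (probability 0.2), given the fact friend(3,2) and the observed smokes(2), so
\[
P(\mathit{smokes}(3)) = 1-(1-0.3)(1-0.2)=0.44,
\qquad
P(\mathit{asthma}(3)) = 0.4\times 0.44 = 0.176,
\]
matching the inference result above.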
Weighted Model Counting: compile the logic formulae into a DAG, then count the weighted possible worlds.
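For reference, the weighted model count of a formula \(\varphi\) with literal weights \(w(\cdot)\) (here, the fact probabilities and their complements) is
\[
\mathrm{WMC}(\varphi)=\sum_{\omega\,\models\,\varphi}\;\prod_{l\in\omega} w(l),
\]
which the compiled DAG allows to be evaluated efficiently.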
Parameter learning: define only the structure of the logic rules; the weights marked t(_) are to be learned from the evidence:
%% Rules
t(_)::stress(X) :- person(X).
t(_)::influences(X,Y) :- person(X), person(Y).
smokes(X) :- stress(X).
smokes(X) :- friend(X,Y), influences(Y,X), smokes(Y).
%% Facts
person(1).   person(2).   person(3).   person(4).
friend(1,2). friend(4,2). friend(2,1). friend(2,4).
friend(3,2).
%% examples
evidence(smokes(2),false).
evidence(smokes(4),true).
evidence(influences(1,2),false).
evidence(influences(4,2),false).
evidence(influences(2,3),true).
evidence(stress(1),true).
Learned weights:
0.666666666666667::stress(X) :- person(X).
0.339210385615516::influences(X,Y) :- person(X), person(Y).
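These weights come from learning from interpretations: roughly, ProbLog chooses the fact probabilities that maximise the likelihood of the partial interpretations supplied as evidence,
\[
\hat{\theta}=\underset{\theta}{\operatorname{arg max}}\prod_{m} P(I_m\mid\theta),
\]
typically optimised with an EM-style algorithm over the unobserved atoms.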
Cannot solve problems like this…
Training examples: \(\langle x, y\rangle\).
Task: image sequences labelled only with the correctness of the equations they depict, e.g.,
1+1=10, 1+0=1,…(add); 1+1=0, 0+1=1,…(xor).
%%%%%%%%%%%%%% LENGTH:  7  to  8 %%%%%%%%%%%%%%
This is the CNN's current label:
[[1, 2, 0, 1, 0, 1, 2, 0], [1, 1, 0, 1, 0, 1, 3, 3], [1, 1, 0, 1, 0, 1, 0, 3], [2, 0, 2, 1, 0, 1, 2], [1, 1, 0, 0, 0, 1, 2], [1, 0, 1, 1, 0, 1, 3, 0], [1, 1, 0, 3, 0, 1, 1], [0, 0, 2, 1, 0, 1, 1], [1, 3, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 1, 3, 3]]
****Consistent instance:
consistent examples: [6, 8, 9]
mapping: {0: '+', 1: 0, 2: '=', 3: 1}
Current model's output:
00+1+00 01+0+00 0+00+011
Abduced labels:
00+1=00 01+0=00 0+00=011
Consistent percentage: 0.3
****Learned Rules:
rules:  ['my_op([0],[0],[0,1])', 'my_op([1],[0],[0])', 'my_op([0],[1],[0])']
Train pool size is : 22
...
This is the CNN's current label:
[[1, 1, 0, 1, 2, 1, 3, 3], [1, 3, 0, 3, 2, 1, 3], [1, 0, 1, 1, 2, 1, 3, 3], [1, 1, 0, 1, 0, 1, 3, 3], [1, 0, 1, 1, 2, 1, 3, 3], [1, 1, 0, 1, 0, 1, 3, 3], [1, 0, 3, 3, 2, 1, 1], [1, 1, 0, 1, 2, 1, 3, 3], [1, 1, 0, 1, 2, 1, 3, 3], [3, 0, 1, 1, 2, 1, 1]]
****Consistent instance:
consistent examples: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
mapping: {0: '+', 1: 0, 2: '=', 3: 1}
Current model's output:
00+0=011 01+1=01 0+00=011 00+0=011 0+00=011 00+0=011 0+01=00 00+0=011 00+0=011 1+00=00
Abduced labels:
00+0=011 01+1=01 0+00=011 00+0=011 0+00=011 00+0=011 0+01=00 00+0=011 00+0=011 1+00=00
Consistent percentage: 1.0
****Learned feature:
Rules:  ['my_op([1],[0],[0])', 'my_op([0],[1],[0])', 'my_op([1],[1],[1])', 'my_op([0],[0],[0,1])']
Train pool size is : 77
Test accuracy vs. equation length
A task from Neural arithmetic logic units (Trask et al., NeurIPS 2018).
:- Z = [7,3,5], Y = 15, prog(Z, Y).
% true.
How do we write prog/2?
\[ \underset{\theta}{\operatorname{arg max}}\sum_z \sum_H P(y,z,H|B,x,\theta) \]
#= is a CLP(Z) predicate for representing arithmetic constraints:
X+Y #= 3, [X,Y] ins 0..9 will output X=0, Y=3; X=1, Y=2; … (a runnable sketch is given after the table below).

| Domain Knowledge | End-to-end Models | \(Meta_{Abd}\) |
|---|---|---|
| Recurrence | LSTM & RNN | Prolog’s list operations |
| Arithmetic functions | NAC & NALU (Trask et al., 2018) | Predicates add, mult and eq |
| Permutation | Permutation matrix \(P_{sort}\) (Grover et al., 2019) | Prolog’s permutation |
| Sorting | sort operator (Grover et al., 2019) | Predicate s (learned as sub-task) |
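A minimal runnable sketch of the #= example above, using SWI-Prolog's clpfd library in place of CLP(Z); the predicate name pair/2 is just for illustration, and label/1 is what enumerates the concrete solutions:
:- use_module(library(clpfd)).
%% Enumerate all X, Y in 0..9 such that X + Y = 3.
pair(X, Y) :-
    X + Y #= 3,
    [X, Y] ins 0..9,
    label([X, Y]).
%% ?- pair(X, Y).
%% X = 0, Y = 3 ;
%% X = 1, Y = 2 ;
%% ...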
Learned Programs:
%% Accumulative Sum
f(A,B):-add(A,C), f(C,B).
f(A,B):-eq(A,B).
%% Accumulative Product
f(A,B):-mult(A,C), f(C,B).
f(A,B):-eq(A,B).
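The earlier query (:- Z = [7,3,5], Y = 15, prog(Z, Y)) succeeds with the accumulative-sum program above (f plays the role of prog) once the primitives are defined. The sketch below uses plausible placeholder definitions of add, mult and eq, not necessarily the exact primitives used by \(Meta_{Abd}\):
%% Hypothetical primitive predicates (illustration only)
add([A,B|T], [C|T]) :- C is A + B.   %% replace the first two elements by their sum
mult([A,B|T], [C|T]) :- C is A * B.  %% replace the first two elements by their product
eq([X], X).                          %% a single remaining element is the output
%% Learned accumulative sum (as above)
f(A, B) :- add(A, C), f(C, B).
f(A, B) :- eq(A, B).
%% ?- f([7,3,5], Y).
%% Y = 15.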
Learned Programs:
% Sub-task: Sorted
s(A):-s_1(A,B),s(B).
s(A):-tail(A,B),empty(B).
s_1(A,B):-nn_pred(A),tail(A,B).
%% Bogosort by reusing s/1
f(A,B):-permute(A,B,C),s(C).
The sub-task predicate s/1 (sorted) uses the subset of sorted MNIST sequences in the training data as its examples.
References: