By Owain Evans from Oxford at #NIPS2017 Aligned Artificial Intelligence Workshop


SJ (slow judgement): takes a long time due to serial reasoning, meta-reasoning, running experiments, discussing with experts

e.g. AlphaGo

Optimizing for slow judgements

  • agent takes the actions with the highest approval/reward after slow judgement (e.g. thinking for 5 days)
  • slow judgement considers short-term progress on the task, safety constraints, long-term considerations
  • standard RL vs Christiano's “approval-directed” agents


AI-complete: involves math reasoning, scientific experiments, moral deliberation
missing data: slow judgements are intrinsically expensive


  • cheap signals: quick human judgement, ad revenue, stock price, hand-engineered shaped reward
  • use cheap signals (not sparse) to optimize

Alternative: optimize for cheap signals

  • no need to collect expensive data
  • easier to apply standard ML
  • cheap signals are correlated with slow judgements
    • optimizing for cheap signals might initially do well on slow judgements (but will ultimately diverge)
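
A toy sketch of the divergence claim above, under made-up assumptions: `slow_judgement` and `cheap_signal` are hypothetical functions (not from the talk), chosen only so that they agree early on and disagree later. Hill-climbing on the cheap signal first improves the slow-judgement score, then makes it worse.

```python
def slow_judgement(x: float) -> float:
    """Hypothetical 'true' value: improves at first, then degrades."""
    return x - 0.1 * x * x


def cheap_signal(x: float) -> float:
    """Hypothetical proxy: correlated early on, but keeps rewarding larger x."""
    return x


def hill_climb(score, x=0.0, step=1.0, iters=10):
    """Greedy ascent on `score`; returns every x visited, including the start."""
    trajectory = [x]
    for _ in range(iters):
        candidate = x + step
        if score(candidate) > score(x):
            x = candidate
        trajectory.append(x)
    return trajectory


# Optimize the cheap signal, then score each visited point by slow judgement.
xs = hill_climb(cheap_signal)
values = [slow_judgement(x) for x in xs]
best_step = values.index(max(values))

for step, (x, v) in enumerate(zip(xs, values)):
    print(f"step {step:2d}  x={x:4.1f}  slow_judgement={v:5.2f}")
```

In this toy the slow-judgement score peaks partway through the run (`best_step`) and falls back afterwards, even though the cheap signal rises at every step.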

Research goal

  • Make optimizing for slow judgements computationally competitive (to achieve practical tasks e.g. Siri)
  • competitive = similar speed when deployed, training similar size, training not much longer

Problem statement

Frame “Predicting slow judgement” as classification problem:
$$x_{i} \Rightarrow h^{*}(x_{i})$$
where $x_{i}$ is the input (feature vector) and $h^{*}(x_{i})$ is the target

$h^{*}(x_{i})$: Alice’s judgement about $x_{i}$ after long deliberation and research (e.g. 5 days)
(5 days may seem absurd)

JB note: wow this is a dream. Can RL help us get 5 days worth of effort in evaluating politics?

Guiding example:

  • figure out what this means
    “When I was governor of Massachusetts $CLAIM”
    JB note: I abbreviated this.


  • a learned predictor could do poorly due to few $h^{*}$ labels and AI-completeness
  • Cheap signals at train time (mitigate few $h^{*}$ labels)
  • Cheap signals at test time (mitigate AI-completeness)
    Data we have: some human expert made a judgement already

Consider cheap signals where Alice makes judgements after a series of timestamps (e.g. 20s, 60s)
They get more expensive over time.

Then we can get judgements from Bob and also Alice.
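
One way to picture this data, as a rough sketch: each judgement is a record of (item, judge, time budget, answer), and the records pivot into a sparse item × (judge, time) table. The `Judgement` class, the field names, and the example rows are all hypothetical. Treating cost as seconds spent is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Column = Tuple[str, int]  # (judge, seconds of deliberation)


@dataclass(frozen=True)
class Judgement:
    item: str      # question or statement id
    judge: str     # e.g. "Alice" or "Bob"
    seconds: int   # time budget used before answering
    answer: bool   # the judge's verdict at that budget


def to_matrix(judgements: List[Judgement]):
    """Pivot records into item -> {(judge, seconds): answer or None}."""
    columns: List[Column] = sorted({(j.judge, j.seconds) for j in judgements})
    matrix: Dict[str, Dict[Column, Optional[bool]]] = {}
    for j in judgements:
        row = matrix.setdefault(j.item, {c: None for c in columns})
        row[(j.judge, j.seconds)] = j.answer
    return columns, matrix


# Made-up example rows, for illustration only.
judgements = [
    Judgement("q1", "Alice", 20, True),
    Judgement("q1", "Alice", 60, False),
    Judgement("q1", "Bob", 20, True),
    Judgement("q2", "Alice", 20, False),
]
columns, matrix = to_matrix(judgements)

# Assumption: cost is proportional to deliberation time.
total_seconds = sum(j.seconds for j in judgements)
```

Missing cells stay `None`, so most of the table is empty for the expensive columns.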

JB Note: wow would be fun to play with diversity in ensembles of judges!

What now?

Collect more of the cheapest signal

Relation to standard ML problem:

  • most training data is unlabeled => semi-supervised learning
  • for unlabeled data, we have noisy/biased labels => weak supervision
  • item-user (sparse) matrix => collaborative filtering
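
A minimal sketch of the weak-supervision view, not the method from the talk: on the few items that have both a quick and a slow judgement, estimate how often the slow judgement is "true" given the quick one, then use that as a soft label for items that only have the quick judgement. The function names and the example pairs are hypothetical; add-one (Laplace) smoothing is my choice.

```python
from typing import Dict, Iterable, List, Tuple


def fit_calibration(pairs: Iterable[Tuple[bool, bool]]) -> Dict[bool, float]:
    """Estimate P(slow=True | quick) from (quick, slow) pairs, with add-one smoothing."""
    # counts[quick] = [slow-true count + 1, total count + 2]
    counts = {True: [1, 2], False: [1, 2]}
    for quick, slow in pairs:
        counts[quick][1] += 1
        if slow:
            counts[quick][0] += 1
    return {quick: pos / total for quick, (pos, total) in counts.items()}


def soft_labels(quick_only: List[bool], calibration: Dict[bool, float]) -> List[float]:
    """Turn quick judgements into probabilistic stand-ins for slow judgements."""
    return [calibration[q] for q in quick_only]


# Made-up labelled pairs: (quick judgement, slow judgement).
labelled = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True),
]
calibration = fit_calibration(labelled)

# Items with only a quick judgement get a soft label.
weak = soft_labels([True, False, True], calibration)
```

A real model would also condition on the input features and on which judge answered. This only shows how noisy cheap labels can stand in for missing slow ones.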

Datasets for PSJ

Creating two:

  1. Fermi: fermi estimation comparisons (no research)
    e.g. weight of bush pig in kg < 99
  2. Politifact: judge truth of political statement using Google
    e.g. number of letters in the human language


It was hard to get Mechanical Turk (MT) workers to think hard about these.
Base rate correctness:

  1. Fermi 51%
  2. Politifact 52% (hard to get them to do more research)

Modeling the data

pilot study: 500 questions

fix total cost
snap_and_answer - mostly quick
same_spend - mix of quick and slower
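
To make the fixed-cost comparison concrete, here is a sketch with invented numbers. The budget, the per-judgement costs, and the share of budget that goes to quick judgements under each strategy are all assumptions; the notes only say "mostly quick" and "mix of quick and slower".

```python
def allocate(total_cost: int, quick_cost: int, slow_cost: int, quick_percent: int):
    """Split a fixed budget: `quick_percent` % on quick judgements, the rest on slow ones.

    Returns (number of quick judgements, number of slow judgements).
    """
    quick_n = (total_cost * quick_percent) // (100 * quick_cost)
    remaining = total_cost - quick_n * quick_cost
    slow_n = remaining // slow_cost
    return quick_n, slow_n


# Hypothetical costs: a quick judgement costs 1 unit, a slow one costs 5.
TOTAL_COST, QUICK_COST, SLOW_COST = 600, 1, 5

# Assumed shares, for illustration only.
strategies = {
    "snap_and_answer": 90,  # mostly quick
    "same_spend": 50,       # mix of quick and slower
}

plan = {
    name: allocate(TOTAL_COST, QUICK_COST, SLOW_COST, pct)
    for name, pct in strategies.items()
}

for name, (quick_n, slow_n) in plan.items():
    print(f"{name}: {quick_n} quick, {slow_n} slow")
```

With these made-up numbers, `snap_and_answer` buys many more judgements in total, while `same_spend` buys fewer but includes more slow ones.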

conclusion - time for research/calculation helps models