Markov Decision processes

A tuple – (S,s1,A,P,R)

S – finite set of states.

s1 – initial state.

A – finite set of actions.

P – Given a state s1 and action a, what is the probability of ending up at a particular state s2? This information is provided by P. This is called a State transition probability matrix. S*A*S

R – What is the reward for taking an action a at the state s?

Pi – Policy, what action to take at which state? A mapping from state space to action space. The optimal policy is the policy with the maximum reward.

Value function – Expected reward starting from state s and following policy pi.

Bellman equations –


Ways to find an optimal policy:

Optimal policy when starting in state s is the policy whose value function is maximum starting from state s.

Think about the case where a particular policy is always better than another policy for all initial states. Then first policy is greater than second policy and this is called partial ordering.


There always exists a policy such that the partial ordering of it with all other policies is greater or equal. Such a policy/policies is called optimal policy.

Iterative methods:

  1. Policy iteration:
    1. At each state, pick the action with max value function.
    2. you get the new policy.
    3. Again go to step 1 and loop until the new policy and old policy are same.
  2. Value iteration:
    1. Finding an optimal value function rather than explicit policy.
    2. For every iteration improve value vector.
    3. once you converge to a particular value vector, use it to find the optimal policy.

Model-free methods:

Reinforcement learning is Markov decision process with unknown transition model and/or reward distribution.

Agent can observe samples and induce transition model and reward distribution from that.


Uses Q-values instead of the value function.

To be continued……….

Random number Generator

The ideal random number generator is impossible. So, we usually have a pseudo-random generator which will be seeded by a number initially and then calculated subsequent random numbers using elements like current system time etc. For debugging stochastic programs, we have random.seed() in python which will generate the same sequence of random numbers again and again as we seed it with the same number.

Decision trees

What are decision trees?

Decision trees are classification models built using a tree where a particular feature is selected for branching at different levels down the tree.

Decision tree

The primary choice is which feature to use for splitting first?

We use info gain or Gini to make that decision.

Info gain:


but what does the info gain represent?

Info gain of a particular feature(work experience in above example) represents the predictability gain as we go down one level using that feature.

So, How is this predictability gain calculated?

This is measured by difference in predictability at parent node and expected sum of predictabilities in child nodes.

What is predictability at a node?

If number of positive and negative examples at a particular node is almost equal, then it is equally likely for a new example at that node to be either positive or negative.

If number of positive examples is far more than the number of negative examples at a particular node, then we can say with much more confidence that a new example at this node is more likely to be positive.

Second case is more predictable and contains less surprise, whereas first case is less predictable and has more surprise.

Now we use Entropy to quantify surprise at any node. How?


  1. How to train?
    1. How to select the next best feature for branching?
      1. Calculate info-gain for all the features and select the one with maximum.
  2. How to predict?
    1. Just pass the test example down the tree and you have your classification at the leaf node.
  3. How to write a decision tree in python using scikit?
    1. writing the classifierdtclf
    2. visualizing the decision treevizdt

When to use decision trees?

  1. well-suited for categorical data.

How much training data do you need?

Nature of decision boundary

Decision trees are interpretable unlike neural networks. Here you can exactly understand why a classifier makes a decision.

They are weak learners and hence good for ensemble methods.

They can handle unbalanced data sets, 1 % positive examples and 99 % negative examples.

This is Univariate – does not combine features.



Hash tables


  1. Random access.
  2. Fixed size.

Linked list:

  1. Not Random access.
  2. Not fixed size.

What we need is something that has random access and not fixed size.

Solution: Hash table


Uses a hash function to determine where to store a given key value.

Same hash function is used later to search for where a given key value is stored.


If the hash output of two keys point to the same location.

Solution: each location has a linked list of values.

Jeff Hawkins On intelligence

There are some basic principles by which the neo-cortex operates in Brain. These principles are responsible for all different kinds of cognition like vision, audition, touch etc.

The Brain is not a computing machine, it doesn’t compute the movement of the arm when it needs to move its arm. It just remembers the pattern it has acquired in its experience.

The Brain is very sparse in the sense that only very few cells are active at any point of time.

Sensory data invokes models we have previously developed about or related to them.

It’s not a computational problem, it’s a memory problem. Its a complex memory system involving hierarchies, sparse distributions etc.

If you want to capture intelligence, you don’t necessarily have to implement other activities of Brain like emotions(amygdala)

Creativity is pretty much analogy of patterns.

Induction is key for AGI

Induction -> models -> prediction -> decision -> action

For example, finding a number in the telephone directory.

Inducts that numbers are sorted by alphabetical order of names from the data.

Models or imagines a straight line domain for all data.

Predicts at what part of the book the number can be found.

Takes decision and action to open that part.

Again infers from data and updates its predictions, decisions, and acts accordingly with the new information.


It is not necessary for a theory to be computable in general, at least in most of the sciences. But as the Artificial intelligence sits as a branch of computer science, most of AI researchers looks for the theory of AI from the perspective of its computability. MARCUS HUTTER suggests that at least a small chunk of researchers should look for theory of intelligence without considering computational resource constraints and then we can approximate that theory to computability once we find it.

MML principle

Solomonoff’s theory of induction – Algorithmic probability and Kolmogorov complexity.