Ideas to work on

Understanding customer personality from user data.


Objective: Learn the user's habits from their digital activities, reflect them back to the user for retrospection, and predict their behavior the next day.

what data?

Digital data: which apps and sites does the user check, and with what frequency?

Time stats:

Financial stats:

Data about the lifetime of a product would be very valuable for recommending another product of that category at the right time.

Input modifier:

Suppose you have a function add with arguments a and b. We usually have certain assumptions or requirements about what format the input has to be in; in this case, it might be numbers. What if they are strings? The function will throw an error. But if we give the same input to a human, he would probably concatenate the two strings, or map each character to an integer and add the results, or something of that sort.

So, an input modifier is an imaginative agent that manipulates the input we have been given to fit the function's argument requirements. This would allow more collaboration among the different functionalities at our disposal and may open the door to creativity in computation. The trade-off is that it introduces a little uncertainty.

When you give circle and square arguments to a put-together function, the input modifier can interpret it as one inside the other, one after the other, and many other possibilities. This uncertainty is inevitable even in human cognition.
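As a toy sketch of the idea, an input modifier for the add example might look like the following (all names here, including `input_modifier` and its fallback interpretations, are hypothetical illustrations, not an existing API):

```python
def add(a, b):
    return a + b

def input_modifier(func, a, b):
    """Try the call as-is; on a type mismatch, try plausible
    reinterpretations of the inputs, the way a human might."""
    try:
        return func(a, b)
    except TypeError:
        # Interpretation 1: treat both arguments as strings (concatenation).
        try:
            return func(str(a), str(b))
        except TypeError:
            pass
        # Interpretation 2: map each character to an integer and add.
        return func(sum(ord(c) for c in str(a)),
                    sum(ord(c) for c in str(b)))

print(input_modifier(add, 2, 3))    # 5
print(input_modifier(add, "2", 3))  # prints 23 (string concatenation)
```

The uncertainty mentioned above shows up in the ordering of interpretations: a different ordering would give different answers for the same mismatched input.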

Universal evaluation of each person’s impact on the world and fairly distributing wealth based on that.

Principal component analysis for text summarization.

App for updating customers with the latest IPOs and their prospectuses.

App for local retail: buying things based on their proximity for faster delivery.






Design and Analysis of algorithms

The most important principle for an algorithm designer is not to be content.

Asymptotic analysis:




The master method for analyzing the running time of divide and conquer algorithms:

  1. A black-box solution for the running time: you just input a few parameters related to the recurrence.
  2. Assumption: all subproblems are of equal size.

n - original problem size.

a - number of recursive calls made in each instance; the rate at which subproblems proliferate. (Evil)

b - the factor by which the input size is divided in each recursive call.

d - polynomial exponent of the remaining work needed to merge subproblem solutions into the final solution.

b^d - rate of work shrinkage per subproblem. (Good)

Case 1: a = b^d (same work at each level), so T(n) = O(n^d log n).

Case 2: a < b^d (more work at the root level), so T(n) = O(n^d).

Case 3: a > b^d (more work at the leaf level), so T(n) = O(n^(log_b a)).
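The three cases can be wrapped in a small helper (a sketch; the string formatting of the bounds is my own convention):

```python
import math

def master_method(a, b, d):
    """Asymptotic bound for T(n) = a*T(n/b) + O(n^d)."""
    if a == b ** d:          # Case 1: same work at each level
        return f"O(n^{d} log n)"
    elif a < b ** d:         # Case 2: the root dominates
        return f"O(n^{d})"
    else:                    # Case 3: the leaves dominate
        return f"O(n^{math.log(a, b):g})"

print(master_method(2, 2, 1))  # merge sort: O(n^1 log n)
print(master_method(1, 2, 0))  # binary search: O(n^0 log n)
print(master_method(4, 2, 1))  # naive recursive multiplication: O(n^2)
```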

Algorithm Design paradigms:

  1. Divide and conquer:
    1. Write the base case.
    2. Use recursion to solve the subproblems of the problem.
    3. Combine the subproblem solutions.
    4. Store the results of subproblems in a hashmap and use them to trim other repeating recursive paths. (Dynamic programming / memoization)
  2. Randomization: as in quicksort.
    1. Decomposition principle:
  3. Greedy algorithms.
  4. Dynamic programming.



n - number of vertices, m - number of edges

Graph partitioning: Cuts of a graph, Minimum cut


  1. Adjacency matrix. O(n^2)
  2. Adjacency lists. O(n+m)
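The space trade-off between the two representations can be sketched on a tiny undirected graph (the edge list below is made up for illustration):

```python
# n = 4 vertices (0..3); edges given as pairs.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: O(n^2) space regardless of how many edges exist.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1  # undirected: symmetric entries

# Adjacency lists: O(n + m) space, proportional to vertices plus edges.
adj = {v: [] for v in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[0][1])  # 1: edge (0, 1) exists
print(adj[2])        # [0, 1, 3]: neighbors of vertex 2
```

The matrix gives O(1) edge lookups; the lists win on space and on iterating over neighbors, which is why graph-search algorithms usually use them.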

Strongly connected components: maximal regions of a directed graph where you can get from any node A to any other node B (and back).

Kosaraju’s 2-pass strongly connected components algorithm:

One of the smartest and most beautiful algorithms.
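A compact sketch of the two passes (iterative DFS, variable names my own): pass 1 computes a finishing order on the reversed graph; pass 2 runs DFS on the original graph in reverse finishing order, and each DFS tree is one SCC.

```python
from collections import defaultdict

def kosaraju(n, edges):
    """Kosaraju's 2-pass SCC algorithm for vertices 0..n-1."""
    graph, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        rev[v].append(u)

    # Pass 1: finishing order via DFS on the reversed graph.
    visited, order = set(), []
    for s in range(n):
        if s in visited:
            continue
        visited.add(s)
        stack = [(s, iter(rev[s]))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(rev[nxt])))
                    advanced = True
                    break
            if not advanced:          # all neighbors done: node finishes
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the original graph in reverse finishing order.
    visited.clear()
    sccs = []
    for s in reversed(order):
        if s in visited:
            continue
        component, stack = [], [s]
        visited.add(s)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append(nxt)
        sccs.append(sorted(component))
    return sccs

# Two SCCs: the cycle {0,1,2} and the cycle {3,4}, bridged by 2 -> 3.
print(sorted(kosaraju(5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)])))
# [[0, 1, 2], [3, 4]]
```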

The structure of internet:

Structure of internet.PNG

Further reading: Networks, crowds and markets.



arcs – Directed edges, ordered pair.

Markov Decision processes

Markov decision process.PNG

A tuple – (S,s1,A,P,R)

S – finite set of states.

s1 – initial state.

A – finite set of actions.

P – Given a state s1 and an action a, what is the probability of ending up in a particular state s2? This information is provided by P, called the state transition probability matrix, of dimensions S*A*S.

R – What is the reward for taking action a in state s?

Pi – Policy: which action to take in which state? A mapping from the state space to the action space. The optimal policy is the policy with the maximum expected reward.

Value function – Expected cumulative reward starting from state s and following policy pi.

Bellman equations –
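For a deterministic policy π and a discount factor γ (γ is an assumption here, since the tuple above does not list it), the Bellman expectation equation is:

V^π(s) = R(s, π(s)) + γ · Σ_{s'} P(s' | s, π(s)) · V^π(s')

and the Bellman optimality equation is:

V*(s) = max_a [ R(s, a) + γ · Σ_{s'} P(s' | s, a) · V*(s') ]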


Ways to find an optimal policy:

The optimal policy when starting in state s is the policy whose value function, starting from s, is maximal.

Think about the case where a particular policy is at least as good as another policy for all initial states. Then the first policy is greater than or equal to the second, and this ordering over policies is called a partial ordering.


There always exists a policy that is greater than or equal to every other policy under this partial ordering. Such a policy (there may be more than one) is called an optimal policy.

Iterative methods:

  1. Policy iteration:
    1. At each state, pick the action that maximizes the value function.
    2. This gives you a new policy.
    3. Go back to step 1 and loop until the new policy and the old policy are the same.
  2. Value iteration:
    1. Find an optimal value function rather than an explicit policy.
    2. In every iteration, improve the value vector.
    3. Once you converge to a particular value vector, use it to read off the optimal policy.
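The value-iteration loop can be sketched on a toy two-state MDP (all transition probabilities, rewards, and the discount factor below are made-up illustration values):

```python
states = [0, 1]
actions = [0, 1]
gamma = 0.9
# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.0, 1: 2.0}}

# Repeatedly apply the Bellman optimality update to the value vector.
V = {s: 0.0 for s in states}
for _ in range(200):
    V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in actions)
         for s in states}

# Once converged, read off the greedy (optimal) policy from the values.
policy = {s: max(actions,
                 key=lambda a: R[s][a] + gamma * sum(p * V[s2]
                                                     for s2, p in P[s][a]))
          for s in states}
print(policy)  # {0: 1, 1: 1}: action 1 is best in both states
```

Here action 1 always pays more and leads to the higher-reward state, so the converged values (V[1] approaches 2 / (1 - 0.9) = 20) make action 1 optimal everywhere.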

Model-free methods:

Reinforcement learning is a Markov decision process with an unknown transition model and/or an unknown reward distribution.

The agent can observe samples and induce the transition model and reward distribution from them.


Q-learning uses Q-values (state-action values) instead of the state value function.

To be continued……….

Credit assignment problem:

Random number Generator

The ideal random number generator is impossible in software. So we usually have a pseudo-random generator, which is seeded with a number initially (often derived from something like the current system time) and then computes subsequent numbers deterministically from that seed. For debugging stochastic programs, Python's random.seed() lets us generate the same sequence of random numbers again and again by seeding with the same number.
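For example, reseeding with the same value reproduces the sequence exactly:

```python
import random

# Seeding with the same value reproduces the same "random" sequence,
# which is what makes stochastic programs debuggable.
random.seed(42)
first_run = [random.randint(0, 100) for _ in range(5)]

random.seed(42)
second_run = [random.randint(0, 100) for _ in range(5)]

print(first_run == second_run)  # True: identical sequences
```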

Decision trees

What are decision trees?

Decision trees are classification models built as a tree where a particular feature is selected for branching at each level down the tree.

They help you create non-linear decision boundaries with a series of linear questions.

Decision tree

The primary choice to make is: which feature do we use for the first split?

The objective in choosing the order of splits is to narrow down the possibilities as fast as possible. For example: Animal? Person? Famous? Living? Singer? Female? Michael Jackson?

If you had started with "Michael Jackson?" and the answer was yes, that's great, but if the answer was no, you have effectively narrowed down almost nothing.

We use information gain or the Gini index to make that decision.

Info gain:


but what does the info gain represent?

The information gain of a particular feature (work experience in the example above) represents the predictability gained by going down one level using that feature.

So, how is this predictability gain calculated?

It is measured as the difference between the predictability at the parent node and the expected (size-weighted) sum of predictabilities at the child nodes.

What is predictability at a node?

If the numbers of positive and negative examples at a particular node are almost equal, then a new example at that node is equally likely to be positive or negative.

If the number of positive examples is far greater than the number of negative examples at a particular node, then we can say with much more confidence that a new example at this node is likely to be positive.

The second case is more predictable and contains less surprise, whereas the first case is less predictable and has more surprise.

Now we use Entropy to quantify surprise at any node. How?


If all the samples belong to the same class, the entropy is 0. If the samples are evenly distributed among the classes, the entropy is maximal (1 bit in the two-class case).
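Entropy and information gain can be computed directly; a small plain-Python sketch:

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return sum(-p * math.log2(p) for p in probs)

def info_gain(parent, children):
    """Entropy at the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

print(entropy(["+", "+", "-", "-"]))  # 1.0: maximally unpredictable
print(entropy(["+", "+", "+", "+"]))  # 0.0: fully predictable
# A perfect split recovers the full bit of information:
print(info_gain(["+", "+", "-", "-"], [["+", "+"], ["-", "-"]]))  # 1.0
```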

  1. How to train?
    1. How to select the next best feature for branching?
      1. Calculate the information gain for all features and select the one with the maximum.
  2. How to predict?
    1. Just pass the test example down the tree, and you have your classification at the leaf node.
  3. How to write a decision tree in Python using scikit-learn?
    1. Writing the classifier. (dtclf)
    2. Visualizing the decision tree. (vizdt)
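A minimal version of those two steps, assuming scikit-learn is installed (the toy features and labels below are made up for illustration):

```python
from sklearn import tree

# Toy training data: two features, two classes.
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]

# Writing the classifier.
clf = tree.DecisionTreeClassifier(criterion="entropy", min_samples_split=2)
clf = clf.fit(features, labels)
print(clf.predict([[160, 0]]))  # [1]

# Visualizing the tree: export its structure as text
# (tree.export_graphviz produces a graphical .dot file instead).
print(tree.export_text(clf))
```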

When to use decision trees?

  1. Well-suited for categorical data.

How much training data do you need?

Nature of decision boundary

Decision trees are interpretable, unlike neural networks. Here you can exactly understand why a classifier makes a decision.

They are weak learners and hence good for ensemble methods.

They can handle unbalanced data sets, 1 % positive examples and 99 % negative examples.

Splits are univariate: a single split does not combine features.

Expressiveness among all possible decision trees in n-attributes:

How to deal with continuous attributes? (age, weight, distance)

It does not make sense to repeat a discrete attribute along a path of the tree, since asking it again adds no new information. If attributes are continuous, you can reuse the same attribute by asking different questions about it (for example, different thresholds).

What to do to avoid overfitting in decision tree?

  1. As usual, using cross-validation.
  2. Check with cross-validation every time you expand the tree; if the CV score falls below a particular threshold, stop the expansion.
  3. Or first expand the whole tree, then prune back the leaves, checking the CV score at each step.


ID3: Top down

Tuning parameters in sklearn:

  1. min_samples_split (default=2): a node containing fewer than this many samples will not be split any further.
  2. criterion = gini or entropy.

Decision trees are generally prone to overfitting.

You can build bigger classifiers like ensemble methods using decision trees.





Hash tables


Array:

  1. Random access.
  2. Fixed size.

Linked list:

  1. Not Random access.
  2. Not fixed size.

What we need is something that has random access and not fixed size.

Solution: Hash table


It uses a hash function to determine where to store a given key's value.

The same hash function is used later to find where a given key's value is stored.


Collision: the hash outputs of two keys point to the same location.

Solution: each location holds a linked list of entries (chaining).
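A toy chained hash table illustrating both points, the shared hash function and per-bucket chains (Python lists stand in for linked lists here; the class is a sketch, not a production structure):

```python
class ChainedHashTable:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Same hash function for both storing and searching.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key (or collision): extend chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.put("a", 1)
table.put("b", 2)
print(table.get("a"), table.get("b"), table.get("missing"))  # 1 2 None
```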

Jeff Hawkins On intelligence

There are some basic principles by which the neocortex operates in the brain. These principles are responsible for all the different kinds of cognition: vision, audition, touch, etc.

The brain is not a computing machine; it doesn't compute the movement of the arm when it needs to move the arm. It just recalls the pattern it has acquired through experience.

The Brain is very sparse in the sense that only very few cells are active at any point of time.

Sensory data invokes models we have previously developed about or related to them.

It's not a computational problem; it's a memory problem. It's a complex memory system involving hierarchies, sparse distributions, etc.

If you want to capture intelligence, you don't necessarily have to implement the brain's other activities, like emotions (amygdala).

Creativity is pretty much making analogies between patterns.