- Think of reading as an active pursuit rather than being passive.
- Don’t get bogged down in details, try to sprint through them and look for the gist of the author.
- Read with the intention of finding out the gist and pace yourself accordingly.
- Learn to adjust your speed based on the complexity of the content.
- Skim through the content if you know what he is going to say.
- Stop vocalization during reading.
- Don’t be a word by word reader, be a phrase reader.
- Try to speed up yourself in your reading, that will help you more concentrated on the content.
- Develop your vocabulary.
- Like any other skill, more you practice, better you become. Make sure you practice with the above principles.
- Finally, Don’t consider it as an ephemeral course but as a lifelong journey of betterment.
Let’s say we have a machine that understands all the high-level abstractions and patterns in data. Let’s say it has built on its own models to predict what’s coming next. Data science ends here presenting models and predictions to Entrepreneurs or executives to make a decision. But Is it intelligent at this stage? No, it has to be able to make decisions on its own to be truly intelligent.
So, what guides decision making process?
A company makes its decision to maximize its profit. Profit is their value and that guides their decision-making process.
A man’s value of his survival deeply coded in his DNA motivates him to make proper decisions that could help him get proper food, shelter, clothes, sex etc.
Evolution has managed to deduce very rich and distinct emotions that could help us identify reinforcements in the real world. This makes it an imperative for any machine to be hard coded with these rules of value system to make use of reinforcement learning or other latest tech meme of that kind.
Whether it is chess or Go, if there is something it should definitely know before starting the game, it has to be the winning state and possibly evaluation functions for all different states possible in the Game. But that’s a small world to code valuations/reinforcements. How would you go about writing such codes for a machine to evaluate the humongous set of states, instances, objects in the real world? This makes me believe that modeling the value system or giving to a machine ability to build its own value system is at the core of AI. Too early to ignore Minsky’s rule-based systems.
Understanding customer personality from user data.
Objective: Learn the habits of the user from his digital activities. Reflect it on him for retro. predict his behavior next day.
Digital data: what apps and sites he checks with what frequency?
Data about the lifetime of a product would be very valuable to recommend another product of that category at the right time.
Suppose you have a function add with arguments a and b. We usually have certain assumptions or requirements about what format that input has to be in. In this case, it might be some numbers. what if they are strings? it will through an error. but of we give same input to a human, he would probably concatenate two strings or map each char to an integer and add the result or something of that sort.
So, input modifier is an imaginative agent which would manipulate input we have/given to fit into the function argument requirements. This would allow for more collaboration among different functionalities at our disposal and may open the door for creativity in computation. However, there would be a trade-off of little uncertainty in it.
When you give circle and square arguments to put together function, input modifier can think of it as one inside the other, one after the other and so many other possibilities. this uncertainty is inevitable even in human cognition.
Universal evaluation of each person’s impact on the world and fairly distributing wealth based on that.
Principle component analysis for text summarization.
App for updating customers with latest IPOs and their prospectus.
App for local retail – buying things based on their closeness for faster delivery.
Train a classifier to identify a student’s stage in Flow state graph in Udacity.
The most important principle to be an algorithm designer is not to be content.
The master method for analyzing the running time of divide and conquer algorithms:
- Black box solution for running time of you input few parameters related to the recurrence.
- Assumptions: All recurrences are of equal size.
n- original problem size
a- number of recurrences called in each instance, rate at which subproblems proliferate. (Evil)
b- The factor by which the input size is divided for calling recurrences.
d- Polynomial exponent of the remaining work needed to merge solutions for the final solution.
b^d – Rate of work shrinkage per subproblem. (good)
Case 1; Same work at each level.
Case 2: More work at the root level.
Case 3: More work at leaf level.
Algorithm Design paradigms:
- Divide and conquer:
- write the base case.
- Using recursion to solve subproblems of a problem.
- Combine subproblems
- Store the results of subproblems in a hashmap and use them to trim other repeating recursive paths. (Dynamic programming)
- Randomization: Like in quick sort.
- Decomposition principle:
- Greedy algorithms.
- Dynamic programming.
n-num of vertices, m-num of edges
Graph partitioning: Cuts of a graph, Minimum cut
- Adjacency matrix. O(n^2)
- Adjacency lists. O(n+m)
Strongly connected components: Regions where you can go from any node A to any node B in the directed graph.
Kosaraju’s 2-pass strongly connected components algorithm:
One of the smartest and beautiful algorithms.
The structure of internet:
Further reading: Networks, crowds and markets.
arcs – Directed edges, ordered pair.
A tuple – (S,s1,A,P,R)
S – finite set of states.
s1 – initial state.
A – finite set of actions.
P – Given a state s1 and action a, what is the probability of ending up at a particular state s2? This information is provided by P. This is called a State transition probability matrix. S*A*S
R – What is the reward for taking an action a at the state s?
Pi – Policy, what action to take at which state? A mapping from state space to action space. The optimal policy is the policy with the maximum reward.
Value function – Expected reward starting from state s and following policy pi.
Bellman equations –
Ways to find an optimal policy:
Optimal policy when starting in state s is the policy whose value function is maximum starting from state s.
Think about the case where a particular policy is always better than another policy for all initial states. The first policy is greater than second policy and this is called partial ordering.
There always exists a policy such that the partial ordering of it with all other policies is greater or equal. Such a policy/policies is called optimal policy.
- Policy iteration:
- At each state, pick the action with max value function.
- you get the new policy.
- Again go to step 1 and loop until the new policy and old policy are same.
- Value iteration:
- Finding an optimal value function rather than explicit policy.
- For every iteration improve value vector.
- once you converge to a particular value vector, use it to find the optimal policy.
- This is cheap compared to policy iteration.
Reinforcement learning is Markov decision process with unknown transition model and/or reward distribution.
Agent can observe samples and induce transition model and reward distribution from that.
Uses Q-values instead of the value function.
To be continued……….
Utility: Utility of the policy at a state is what happens if we start running from that state.
Reward gives us immediate feedback but utility gives us long-term feedback. utilities allow you to take short-term negatives for long-term positives.
Credit assignment problem:
The ideal random number generator is impossible. So, we usually have a pseudo-random generator which will be seeded by a number initially and then calculated subsequent random numbers using elements like current system time etc. For debugging stochastic programs, we have random.seed() in python which will generate the same sequence of random numbers again and again as we seed it with the same number.
What are decision trees?
Decision trees are classification models built using a tree where a particular feature is selected for branching at different levels down the tree.
They help you to create non-linear boundaries with series of linear questions.
The primary choice to make is which feature to use for splitting first?
The objective in choosing the order of split is to further narrow down the possibilities faster. For example, Animal? Person? famous? living? singer? female? Michael jackson?
If you have started with Michael, if I said yes, then that’s good, but if I said no, you effectively narrowed down almost nothing.
We use info gain or Gini to make that decision.
but what does the info gain represent?
Info gain of a particular feature(work experience in above example) represents the predictability gain as we go down one level using that feature.
So, How is this predictability gain calculated?
This is measured by difference in predictability at parent node and expected sum of predictabilities in child nodes.
What is predictability at a node?
If number of positive and negative examples at a particular node is almost equal, then it is equally likely for a new example at that node to be either positive or negative.
If number of positive examples is far more than the number of negative examples at a particular node, then we can say with much more confidence that a new example at this node is more likely to be positive.
Second case is more predictable and contains less surprise, whereas first case is less predictable and has more surprise.
Now we use Entropy to quantify surprise at any node. How?
If all the samples belong to the same class, entropy is 0. If the samples are evenly distributed among all classes, then the entropy is 1.
- How to train?
- How to select the next best feature for branching?
- Calculate info-gain for all the features and select the one with maximum.
- How to select the next best feature for branching?
- How to predict?
- Just pass the test example down the tree and you have your classification at the leaf node.
- How to write a decision tree in python using scikit?
When to use decision trees?
- well-suited for categorical data.
How much training data do you need?
Nature of decision boundary
Decision trees are interpretable, unlike neural networks. Here you can exactly understand why a classifier makes a decision.
They are weak learners and hence good for ensemble methods.
They can handle unbalanced data sets, 1 % positive examples and 99 % negative examples.
This is Univariate – does not combine features.
Expressiveness among all possible decision trees in n-attributes:
How to deal with continuous attributes? Age,weight,distance
It does make sense to repeat an attribute along a path of a tree if attributes are discrete. If attributes are continuous, then you ask different questions for same attributes.
What to do to avoid overfitting in decision tree?
- like usually, using cross validation.
- Checking with cross validation at every time you expand the tree and the cv score is less than a particular threshold, then stop the expansion.
- Or first, expand the whole tree and start pruning back the leaves and check cv score at each step.
ID3: Top down
Tuning parameters in sklearn:
- min_samples_split(default=2): If node contains only n samples, then it will not be split any further.
Decision trees are generally prone to overfitting.
You can build bigger classifiers like ensemble methods using decision trees.