Fractal dimension, roughness

The scaling factor (SF):

Scale a shape to one-half or any other fraction of its original size.

The mass scaling factor (MSF):

The ratio of the mass of the new shape to the mass of the original shape, or equivalently the ratio of the number of pixels covered by the new shape to the number covered by the original.

SF^D = MSF

D is the fractal dimension, used as a measure of the roughness of the shape. Solving for D gives D = log(MSF) / log(SF).

Usually, natural objects have fractional (fractal) dimensions, while man-made objects tend to have integer dimensions.
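
To make this concrete, here is a minimal sketch of computing D from SF and MSF, using the standard Sierpinski-triangle example (my choice of example, not from the original notes):

import math

# Example: the Sierpinski triangle. Scaling it to half size (SF = 1/2)
# captures one of its three self-similar copies, so MSF = 1/3.
SF, MSF = 0.5, 1 / 3
D = math.log(MSF) / math.log(SF)  # solve SF**D == MSF for D
print(D)  # ~1.585, a non-integer (fractal) dimension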

Techniques in Math

When you see a curve that looks like a power law y = a·x^k and you don't know the exponent, take the logarithm of both sides and plot it; the slope of the resulting straight line is your exponent.
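
A minimal sketch of the technique (made-up power-law data; numpy assumed):

import numpy as np

# Suppose y follows an unknown power law y = a * x**k
x = np.linspace(1, 100, 50)
y = 3.0 * x ** 2.5
# Take logs on both sides: log y = log a + k * log x, a straight line in log-log
k, log_a = np.polyfit(np.log(x), np.log(y), 1)  # slope and intercept
print(k)  # ~2.5, the unknown exponent recovered as the slope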

Logarithms of the same number in two different bases differ only by a constant factor (log_b x = log_a x / log_a b).

If we want to find the extremum of a function subject to constraints, Lagrange multipliers are useful. They let us fold the constraints and the function into a single new function, so that we no longer need to think about the constraints: finding the extremum of this new function suffices.
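
A small sketch with sympy, on a made-up problem (extremize f = x·y subject to x + y = 10):

import sympy as sp

x, y, lam = sp.symbols('x y lam')
f = x * y                # function to extremize
g = x + y - 10           # constraint, written as g = 0
L = f - lam * g          # the new function: the Lagrangian
# Setting all partial derivatives to zero handles the constraint automatically
sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], (x, y, lam), dict=True)
print(sol)  # [{x: 5, y: 5, lam: 5}] -> extremum at x = y = 5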

Architecture for AGI

This article is mainly organised along the following dimensions.

  1. Perceiving the state of the world and state of yourself in the world.
    1. The power of discriminating and encapsulating different material and immaterial concepts, which are often loosely defined, fuzzy, context-based and have no rigid boundaries.
      1. Natural language processing.
      2. Abstractions: a clustering problem. Identifying the level of clustering needed for the current problem domain; the importance of the similarity measure in clustering. A rudimentary kind of clustering is grouping objects by how they can be useful: things to eat as one cluster, things to talk to, things to use as tools, or which body part to use for which object. Related techniques: the k-means algorithm, naive Bayes (inferring classes from the models and models from the classes), the EM algorithm.
    2. pattern recognition.
    3. proprioception and self-referential systems.
    4. object recognition (https://www.youtube.com/watch?v=5yeusVF42K4)
    5. Donald Hoffman's theory of perception: perceptual systems tuned to fitness outcompete those tuned to truth/reality.
  2. Goals and priorities in them.
    1. Switching between goals based on the timeframes and dependencies of each goal.
    2. Inbuilt goals like sex, food and staying alive; goals we choose for ourselves through culture and experience; and the interaction between the two kinds. Ethical issues of giving AI systems the power to create their own goals.
    3. Decision-making.
  3. Relevance measure for objects perceived in the world to the current goal.
    1. Prioritising perception or attention based on relevance to goal.
    2. Adaptivity of attention based on skill level (acquired by repetition of the same process).
  4. Identifying actions or a sequence of actions to impact objects for achieving our goals (planning).
    1. An element of stochasticity or randomness in finding the sequence is necessary for creativity when the sequence is unknown or not obvious.
    2. This can happen through analogies, empathy and copying, or pattern recognition.
    3. Considering optimisation in the face of multiple paths to a goal.
    4. Related to imagination.
  5. A parallel Emotion machine with continuous feedback of utility or happiness on your present state in the world.
    1. Decision-making.
    2. operant conditioning.
  6. Platform knowledge: what information do we have at birth about the world we are surrounded by, and how do we build it into a machine?
    1. Evolution.
    2. Jealous baby.
    3. A young baby doesn’t get surprised if a teddy bear passes behind a screen and reemerges as an aeroplane, but a one-year-old does.
    4. No categorical classification at the very beginning.
    5. As a child grows, the number of neurons decreases but the number of synapses increases.
  7. Knowledge representation, search and weights for retention (what you don't use, you lose).
    1. Brain plasticity.
    2. Repetition and skill.
    3. Priming and classical conditioning.
    4. The DRM effect
    5. Do elements with similar structure trigger each other? (Hofstadter's Danny in the Grand Canyon); pattern recognition by analogy.
    6. Reconstruction of data as memory in the hippocampus, integrated with already existing memory.
    7. Procedural, episodic, semantic.
    8. Representation of ideas that could be independent of language, which could allow for generalized reasoning to draw conclusions from combining ideas.
  8. Imagination at the core of knowledge, emotion machine and goals.
    1. Demis Hassabis's research on imagination and memory.

What this article doesn’t talk about is:

  1. Whether a machine can have qualia even though it acts as if it has general intelligence. Mary's room. How do we test qualia for a machine? Is the Turing test enough for that?
  2. When did consciousness first evolve, and how?
  3. Moral implications of AGI and the singularity.

Organising the interaction of all the above:

Why do I think I am talking about AGI? Because we should be able to deduce any human endeavour from the interactions of the above dimensions, at least from infinitely long interactions and their orderings. Connections missed in the diagram below:

  1. Goals to emotion machine.

[Diagram: my-agi-architecture]

Deep dive (one at a time):

  1. Perception:
    1. Why is object recognition difficult?
      1. Objects are defined by their purpose rather than their look or structure. We need to coordinate with modules 3 and 7 to overcome this.
      2. We need to identify an object even when its viewpoint changes. When the viewpoint changes, we face the problem of dimension hopping in training neural networks to recognize the object: the inputs of a neural network for image recognition are usually pixels, but when the viewpoint changes, the information that appeared at one pixel in one training instance appears at a different pixel in another training instance. This is dimension hopping. #viewpoint_invariance
    2. True language understanding is impossible without internally modeling the world.
  2. Goals and priorities:
  3. Relevance:
  4. Planning (identifying action sequences):
  5. Emotion machine and utility:
  6. Platform knowledge: Jean Piaget, a child development psychologist, argued that children create a mental model of the world around them and continuously modify that model as they interact with reality and receive feedback. He called this phenomenon equilibration.
    1. Infant reflexes.
    2. A child's brain is less focused, with a higher learning rate across a multitude of things, whereas an adult's brain is more focused, with better self-control but a lower learning rate.
    3. It seems true that humans come into the world with some innate abilities. Some examples by Steven Pinker:
      1. When a human and a pet are both exposed to speech, the human acquires the language whereas the pet doesn't, presumably because of some innate difference between them.
      2. Sexual preferences of men and women vary.
      3. Experimental studies on identical twins separated at birth and examined at later stages of life show astonishing similarities.
      4. Do we have a moral sense at birth?
    4. The above discusses some facts about human nature at birth, but our main focus is the question: is there a seed meta-program that we are endowed with at birth which makes all other learning and behaviour possible? If it exists, what is it? Systems built this way are called constructivist systems (Thórisson).

Existing Cognitive Architectures: Two types

  1. Uniformity first:
    1. Soar Architecture
  2. Diversity first:
    1. ACT-R: intelligent tutoring systems are its application. John Anderson.

Autocatalytic Endogenous Reflective Architecture (AERA):

Sigma architecture:

DeepMind (Demis Hassabis):

Deep learning + Reinforcement learning

Miscellaneous:

  1. Art is partly skill and partly stochastic.
  2. Poetry at the intersection of analogy and utility.
  3. How much of general and/or human intelligence has to do with reason? If intelligence is fully correlated with reason and morality is built on reason alone, then we and AGI have no conflict of interest, as more intelligence would only mean more morality. #moralitynintelligence
  4. Value system guiding attention, attention setting the problem domain, problem domain looking for heuristics, heuristics guiding the search path.

Glossary:

  1. Distributed representation: many neurons are involved in representing one concept, and one neuron is involved in representing many concepts.

Robust software development process

At scratch:

Document the requirements.

Divide and conquer: break the implementation of the project into small chunks of independent, interacting software modules.

Design the proper architecture for these interacting modules.

Document the design.


Development:

  1. Build a module.
  2. Write any simulators needed to give it input and observe the output (optional).
  3. Test the module completely.

Repeat above three steps until all the modules are completed.

Put modules together in a proper interaction flow.

Write test cases and automate testing.


Customize:

Read new requirements and add them to the requirements document for tracking; identify whether reported scenarios are bugs or unexpected behaviour (requiring new implementation).

Implement the requirement and test the implementation.

Do regression testing, update the regression test suite with test cases for the new implementation, and modify existing test cases if necessary as per the requirement.


The process is complete.

Let's talk about the main elements of debugging for a developer:

  1. search (ctrl+f).
  2. breakpoints.
  3. call stack.
  4. call hierarchy.
  5. watch.
  6. data breakpoints and conditional breakpoints.
  7. logs.

Learn system design:

https://www.educative.io/collection/5668639101419520/5649050225344512

Python

Sorting a list of tuples by multiple conditions:

sorted_by_length = sorted(list_,
                         key=lambda x: (x[0], len(x[1]), float(x[1])))

To get the lower-bound integer in Python, int() truncates toward zero:

int(1.6) == 1

int(1.4) == 1

Note that int(-1.6) == -1; use math.floor(-1.6) == -2 for a true floor.


Function definition:

def function_name(i):
    return i  # body goes here


Modules:

Write a Python file with a .py extension and add definitions and statements in it.

'import filename' in another file.

Access definitions inside that file using filename.nameofdef.


tup = ()

Concatenating to a tuple: tup = tup + (a[i],)

list1 = []  # an ordered sequence

list1.append(1)

You can't mutate a string; item assignment raises an error.


Aliasing problem in python:

If you have an object, say a list, and you bind it to the name list1, and then write list2 = list1, then list2 points to the same object as list1. If you add an element to list2, the change will be reflected in list1 as well, which you might not have intended.

So how to take a copy of list1 without having any impact on list1 if I do changes to the copy?

list2 = list1[:]  # cloned

Sort the original list in place: list1.sort()

Take a sorted copy of the original list into another object:

list2 = sorted(list1)

http://stackoverflow.com/q/2739552/5819704 – with [[1,2]]*3, all three inner lists refer to the same object. This doesn't match intuition at first.

Tuples and strings don't support item assignment (e.g. s[1] = 5 raises an error); only lists do.
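
A small runnable demonstration of aliasing versus cloning:

list1 = [3, 1, 2]
list2 = list1        # alias: both names point to the same object
list2.append(4)
print(list1)         # [3, 1, 2, 4] -- the change shows up through list1 too

list3 = list1[:]     # clone: a new list with the same elements
list3.append(5)
print(list1)         # [3, 1, 2, 4] -- the original is unaffected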


Functions as objects:

Sending functions as arguments to other functions.

Higher-order programming: map.
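
A minimal sketch of both ideas (apply_to_all is a hypothetical helper, not a built-in):

def double(x):
    return 2 * x

def apply_to_all(func, items):    # a function that receives another function
    return [func(i) for i in items]

print(apply_to_all(double, [1, 2, 3]))   # [2, 4, 6]
print(list(map(double, [1, 2, 3])))      # the built-in map does the same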

Dictionaries: a collection of key/value pairs.

my_dict = {'Ana': 'B', 'Vivek': 'Ex'}

'Vivek' in my_dict  # -> True

my_dict['Vivek'] = 'Ex'

my_dict.keys()

my_dict.values()

Memoization: storing already-computed values in a dictionary and looking them up in case we need them again. This saves recomputing something that has already been computed.
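
A minimal sketch, using Fibonacci numbers as the example:

fib_cache = {}

def fib(n):
    if n in fib_cache:            # look it up instead of recomputing
        return fib_cache[n]
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    fib_cache[n] = result         # store the computed value for reuse
    return result

print(fib(100))  # instant; the naive recursion would take ages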


Debugging:

  1. Testing:
    1. Unit testing-
      1. validate each piece of the program.
      2. testing each function separately.
    2. Regression testing-
      1. After each fix, you need to retest to confirm that already-tested modules are not broken by the latest fix.
    3. Integration testing-
      1. Does overall program work?
    4. Black box testing-
      1. without looking at the code.
      2. Done by someone other than the implementer, to avoid implementer biases.
    5. Glass box testing:
      1. Path complete.
      2. Design of test cases based on code.
  2. Defensive programming:
    1. specifications for functions.
    2. Modularize programs.
    3. check conditions on inputs/outputs.
  3. Eliminate the source of bugs:

try:
    # code that might raise an exception
except Exception:
    # runs if an exception was raised
else:
    # runs only if no exception was raised
finally:
    # always runs, e.g. for cleanup

raise ValueError('message')  # signal an error yourself

assert condition  # to check whether the assumptions are met


Classes and objects:

Class data attributes are associated with the class itself, not with an instance/object.

Generator: use yield inside a function. The function runs up to the yield and stops; calling gen.__next__() again runs up to the next yield and stops.
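
A small sketch:

def count_up_to(n):
    i = 1
    while i <= n:
        yield i          # runs up to here, returns i, and pauses
        i += 1

gen = count_up_to(3)
print(gen.__next__())    # 1
print(gen.__next__())    # 2 -- execution resumed after the previous yield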


Lisp and Python are programming languages that allow you to implement reflection.


Using Pylab:

import pylab as plt

plt.figure('figurename')  # to plot in a new figure

plt.clf()

plt.ylim(0, 1000)

plt.title('Linear')

plt.xlabel('sample points')

plt.ylabel('linear function')

plt.subplot(211)

plt.plot(xVals, yVals, 'b-', label='linear')

plt.yscale('log')

model = pylab.polyfit(observedX, observedY, n)

Finds the coefficients of a polynomial of degree n that provides the best least-squares fit for the observed data. It returns n+1 values.

estYvalues = pylab.polyval(model, xVals)
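
A minimal end-to-end sketch tying these together (made-up data; degree 2 is an assumption about that data):

import pylab as plt

observedX = [1, 2, 3, 4, 5]
observedY = [1.1, 3.9, 9.2, 15.8, 25.3]       # roughly quadratic data
model = plt.polyfit(observedX, observedY, 2)  # degree-2 fit -> 3 coefficients
estYvalues = plt.polyval(model, observedX)
plt.plot(observedX, observedY, 'bo', label='observed')
plt.plot(observedX, estYvalues, 'r-', label='fit')
plt.legend()
plt.show()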

 


Anonymous function – lambda:

f1 = lambda x, y: x + y

f1(2, 3)  # -> 5


Credits: Jake Vanderplas

An abstract class in Python can't be instantiated.
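
A small sketch with the abc module:

from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        ...

# Shape() raises TypeError: can't instantiate an abstract class
class Circle(Shape):
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.14159 * self.r ** 2

print(Circle(2).area())  # concrete subclasses can be instantiated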


Scikit:


Pandas:


 

Statistics

The science and art of collecting, analyzing and interpreting data.

Qualitative data:

Types:

Nominal:

Ordinal:

Quantitative data:

Types:

Discrete:

Continuous:

Summarizing data (Summary statistics):

  1. By a typical value (averages):
    1. Median – more robust than the mean: if one element of the data is corrupted or extreme, it can have a large impact on the mean but not a significant impact on the median.
    2. Mean

The mean is affected by outliers (skewing), while the median isn't.

An average doesn't usefully summarize the data set in the case of a bimodal distribution.

  2. By how different the values of the data set are from this typical value (variability):
    1. Range (max - min): outliers have a big impact.
    2. Interquartile range: resistant to outliers.
    3. Variance and standard deviation.

Visualizing data:


Law of Large numbers/Bernoulli’s law:

(X1 + X2 + ... + Xn)/n -> E[X] as n -> infinity

For a random variable, if you take a large number of samples and average them, then as the number of samples n tends to infinity, the average tends to the expectation of that random variable.

Gambler’s fallacy (wrong):

The law of large numbers doesn't imply that if a deviation from expected behavior occurs, these deviations will be evened out by opposite occurrences in the future.

Example: He is due for a hit because he hasn’t had any.

Regression to the mean (correct):

Following an extreme random event, the next random event is likely to be less extreme.

Central Limit Theorem:

You have a random variable with an arbitrary distribution.

Pick n samples from that distribution and average them.

Do this multiple times and plot those averages of n samples of the original distribution.

This new plot/distribution will be a normal distribution, irrespective of the shape of the original distribution.

If you increase the sample size n, the variance of the new distribution decreases by a factor of n (it is σ²/n, where σ² is the variance of the original distribution).
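
A quick simulation sketch of this (the uniform original distribution is an arbitrary choice):

import random

# Original distribution: uniform on [0, 1]; its variance is 1/12
def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

n = 50
means = [sample_mean(n) for _ in range(10000)]
mu = sum(means) / len(means)
var = sum((m - mu) ** 2 for m in means) / len(means)
print(mu, var)   # mean ~0.5, variance ~(1/12)/50; a histogram of means looks normal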

Monte Carlo simulation:

Stratified sampling:

Bigger sample size -> smaller standard deviation -> smaller confidence interval -> narrower error bar.

Glossary:

Statistically significant:

When confidence intervals don’t overlap.

When you conduct an experiment, you may come up with a hypothesis from the data. How do you know this result or hypothesis didn't come about just by chance?

This is what we do to find out:

We form a null hypothesis, under which the intended element has no effect. Then we simulate the experiment under the null hypothesis many times and plot the results.

What is the probability of the original result appearing in this null-hypothesis data/plot? If the original result falls outside the 95% confidence interval of the null-hypothesis plot, then we can say the experiment is statistically significant and favor the original hypothesis that there is an effect.
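
A minimal simulation sketch of this procedure (made-up experiment: 63 heads in 100 flips, null hypothesis = fair coin):

import random

observed = 63   # heads seen in 100 flips of the coin under test
# Simulate the experiment many times under the null hypothesis (fair coin)
sims = [sum(random.random() < 0.5 for _ in range(100)) for _ in range(10000)]
# How often does the null alone produce a result at least this extreme?
p_value = sum(s >= observed for s in sims) / len(sims)
print(p_value)  # ~0.006 -- outside the 95% band, so statistically significant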


Standard error:

The standard deviation of the sampling distribution of the sample mean is also called the standard error of the mean.


Always take independent random samples.


Statistical fallacies and morals:

Statistics about the data is not the same as data.

Use visualization tools to look at the data.

Look at the axis labels and scales.

Are things being compared really comparable?

Artificial Intelligence

Definition: The study and design of intelligent agents, where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.


Agent and environment:

Rational agent: the agent that selects the action that maximizes its performance measure, given its inputs or perceptions and its built-in knowledge.

PEAS: Performance, Environment, Actuators and Sensors

Types of Environment:

  1. Fully vs partially observable:
  2. Deterministic (vs stochastic): the next state of the environment is completely determined by the current state and the agent's action.
  3. Episodic vs sequential: Each perception-action pair can be considered as an episode.
  4. static vs dynamic:
  5. discrete vs continuous:
  6. single-agent vs multi-agent:
  7. known vs unknown:

Types of Agents in the order of increasing generality:

  1. Simple reflex agents: based on the current percept only; look up a table of percepts and actions.
  2. Model-based reflex agents: based on percept history; maintain a model of the world, and condition-action rules decide the action of the agent.
  3. Goal-based agents: contain goal information in addition to the model-based agent's world model. Goals decide the actions of the agent.
  4. Utility-based agents: the utility function is the agent's performance measure; they try to maximize happiness/utility.
  5. Learning agents:

Agent’s organisation:

  1. Atomic representation: Consider each state as a black box.
  2. Factored representation: Each state has attribute value properties.
  3. Structured representation: Relationship between objects inside a state.

Search Agents

General example problems from AI textbook:

  1. Travelling salesman problem.
  2. Route finding problem
  3. Robot navigation
  4. protein design

State space, search space and search tree.

Measures:

  1. Completeness: Can the algorithm find a solution when there exists one?
  2. Optimality: Can the algorithm find the least cost path to the solution?
  3. Time complexity: How long does it take to complete the search?
  4. Space complexity: How much memory is needed to complete the search?

Two types of search:

  1. Uninformed: No information about the domain.
    1. Breadth-first: FIFO
      1. Applications: computing shortest paths when all the edge lengths are the same.
    2. Depth-first: LIFO
      1. Depth-limited search is not optimal because it might return a high-cost path in the left subtree without considering other low-cost paths in the right subtree.
      2. Applications: topological ordering of directed acyclic graphs.
    3. Depth-limited: depth-first with a depth limit
    4. Iterative deepening: depth-limited with an increasing limit; combines the benefits of depth-first and breadth-first. First do a depth-first search to a depth limit n, then increase the limit and do a depth-first search again.
    5. Uniform cost: expand the least-cost node first, using a priority queue ordered by path cost
      1. Dijkstra's algorithm is a variant of UCS, except that UCS finds the shortest path from the start node to a goal node, whereas Dijkstra finds the shortest path from the start node to every other node. Used for computing shortest paths when edge lengths differ; use a heap data structure for speed.
      2. I doubt whether Google uses plain UCS for its route finding in Maps; the cost between nodes can be a mix of distance, traffic, weather etc.
  2. Informed: some information about the domain and goals.
    1. Heuristic search (h(n)): determine how close we are to the goal based on a heuristic function. An admissible heuristic never overestimates: its estimate of the cost to the solution is less than or equal to the actual cost.
      1. Greedy search expands the node that appears to be closest to the goal. It uses a heuristic function that estimates the cost from a particular node to the target node.
      2. A* search: considers both the heuristic cost from a node to the target node and the cost already paid to reach that node (f(n) = g(n) + h(n)). A* search will revise its decisions, whereas greedy search will not. This property makes greedy search incomplete and A* complete. (A sketch follows this list.)
      3. IDA*:
  3. Local search: iterative improvement algorithms driven by an optimization/fitness function.
    1. Hill climbing (steepest descent/ascent, greedy local search). Some variants:
      1. Sideways moves – to escape plateaus.
      2. Random restarts – to overcome local maxima.
      3. Stochastic – choose randomly among the uphill moves.
    2. Simulated annealing: from statistical physics.
    3. Local beam search – k states instead of a single state.
    4. Genetic algorithms: from evolutionary biology.
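
A minimal sketch of A* (referenced in the list above), assuming the graph is an adjacency dict and the heuristic h is admissible:

import heapq

def a_star(graph, h, start, goal):
    # frontier entries: (f = g + h, g = cost so far, node, path)
    frontier = [(h[start], 0, start, [start])]
    visited = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in visited:
            continue
        visited.add(node)
        for nbr, cost in graph.get(node, []):
            if nbr not in visited:
                heapq.heappush(frontier, (g + cost + h[nbr], g + cost, nbr, path + [nbr]))
    return None, float('inf')

graph = {'S': [('A', 1), ('B', 4)], 'A': [('G', 5)], 'B': [('G', 1)]}
h = {'S': 3, 'A': 4, 'B': 1, 'G': 0}   # admissible: never overestimates
print(a_star(graph, h, 'S', 'G'))      # (['S', 'B', 'G'], 5)

With h set to zero everywhere, this degenerates to uniform-cost search.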

Adversarial search and games:

Multiagent environment, uncontrollable opponent.

Not a sequence of actions, but a strategy or policy.

Approximation is the key ingredient in the face of combinatorial explosion.

[Diagram: types of games]

(perfect information, deterministic) – Zero sum games.

At any stage, one agent will try to pick the move with maximum utility out of all legal moves, and the opponent will pick the move with minimum utility.

Embedded thinking or backward reasoning:

Thinking through the consequences and potential actions of the opponent before making the decision.

Minimax: two opponents playing optimally, one trying to maximize and the other trying to minimize the utility.

Alpha-Beta pruning:

Consider the agent that is trying to find the max utility over its children. You start with alpha = -infinity and update alpha whenever you find a child with utility better than the current alpha. During this process, if you come across any second-generation child with utility less than alpha, you don't need to look at the utilities of the other second-generation children of that particular first-generation child. Why? Because that first-generation child is trying to find a minimum; once its value goes below alpha, the zeroth-generation node (which is trying to find the max) knows that this first-generation child can no longer contribute to it. In this context, the ordering of child nodes matters a lot: the right ordering can save you from exploring many unnecessary children/paths.
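
A minimal sketch of minimax with alpha-beta pruning, on a hypothetical toy game tree:

def alphabeta(node, alpha, beta, maximizing, children, value):
    kids = children(node)
    if not kids:                      # leaf: return its utility
        return value(node)
    if maximizing:
        best = float('-inf')
        for child in kids:
            best = max(best, alphabeta(child, alpha, beta, False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:
                break                 # prune: MIN above will never allow this
        return best
    best = float('inf')
    for child in kids:
        best = min(best, alphabeta(child, alpha, beta, True, children, value))
        beta = min(beta, best)
        if alpha >= beta:
            break                     # prune: MAX above already has better
    return best

tree = {'max': ['m1', 'm2'], 'm1': ['a', 'b'], 'm2': ['c', 'd']}
vals = {'a': 3, 'b': 5, 'c': 2, 'd': 9}
print(alphabeta('max', float('-inf'), float('inf'), True,
                lambda n: tree.get(n, []), lambda n: vals[n]))  # 3; leaf d is pruned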

Stochastic/Non-deterministic games:

A chance/random element involved.

Expectiminimax.

Machine Learning:

Data collection -> Data preparation -> Exploratory data analysis (picture the data using different graphs and charts) -> Machine learning techniques -> Visualization -> Decision

  1. Decision trees:
  2. Rule induction:
  3. Neural networks:
  4. SVMs
  5. Clustering method
    1. K-means
    2. Gaussian mixtures
    3. Hierarchical clustering
    4. Spectral clustering.
  6. Association rules
  7. Feature selection
  8. Visualization
  9. Graphical models
  10. Genetic algorithm

Statistics:

  1. Hypothesis testing
  2. Experimental design
  3. ANOVA
  4. Linear regression
  5. Logistic regression:
    1. It is a linear classifier, why?
      1. Because the decision boundary where the probability=0.5 is linear.
  6. GLM
  7. PCA

Definition of Machine learning:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

By its very definition, machine learning is generally confined to a particular class of tasks and is not generic enough for general intelligence.

K-nearest neighbors:

Prediction costs O(n*d), where n is the number of training examples and d is the number of features.

K is usually chosen to be an odd value (to avoid ties).

z-scaling – each feature is rescaled to have a mean of 0 and a standard deviation of 1.

Applications:

Recommender systems (collaborative filtering).

Breast cancer diagnosis

Handwritten character classification

Information retrieval
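
A minimal numpy sketch of the O(n*d) prediction step described above (made-up data):

import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # O(n*d): one distance computation per training example, d features each
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]                 # indices of the k closest examples
    return np.bincount(y_train[nearest]).argmax()   # majority vote

X = np.array([[1, 1], [1, 2], [8, 8], [9, 8]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([2, 1])))  # 0: the nearest neighbors are class 0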

To avoid overfitting:

Reduce the number of features manually, or do feature selection.

Do model selection.

Use regularization (keep the features but reduce their importance by setting small parameter values).

Do cross-validation to estimate the test error.

The training set is used for learning the model.

The validation set is used to tune the parameters, usually to control overfitting.

Linear Models:

  1. Linear regression.
    1. Using least square loss
  2. Linear classification:
    1. Perceptron: the data has to be linearly separable. (A sketch follows this list.)
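
A minimal numpy sketch of the perceptron update rule, on toy AND data (which is linearly separable):

import numpy as np

def perceptron(X, y, epochs=20, lr=1.0):
    # y in {-1, +1}; learns w, b for sign(w.x + b) on linearly separable data
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:     # misclassified: nudge the boundary
                w += lr * yi * xi
                b += lr * yi
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])              # logical AND
w, b = perceptron(X, y)
print(np.sign(X @ w + b))                  # [-1. -1. -1.  1.]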

Classification: Logistic regression

Decision Trees: Tree classifiers

Start with a set of all our examples at the top root node.

Create branches according to a splitting criterion, and repeat this to create further branches as you go down.

Write the final decisions at the leaves of the tree based on the training examples, and use this tree to classify new test data.

First, we calculate the entropy at a node, then compare the information gains for splitting that node along different dimensions. We choose to split on the attribute with the highest information gain.
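
A small sketch of the entropy and information-gain computation (made-up labels):

import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(l) / n) * math.log2(labels.count(l) / n)
                for l in set(labels))

def info_gain(labels, split_groups):
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in split_groups)
    return entropy(labels) - remainder

labels = ['+', '+', '+', '-', '-']
# Splitting on some attribute partitions the examples into two pure groups:
print(info_gain(labels, [['+', '+', '+'], ['-', '-']]))  # 0.971, a perfect split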

Pruning strategies to escape overfitting:

  1. Stop growing the tree earlier.
  2. Grow a complex tree and then prune it back: remove a subtree and check against a validation set; if performance doesn't degrade, remove the subtree from the original.

Cart method:

Another method to build a decision tree.

  1. Adopt same greedy, top-down algorithm.
  2. Binary splits instead of multiway splits.
  3. Uses Gini index instead of information entropy.

Gini = 1 - (p+)^2 - (p-)^2, where p+ is the proportion of positive examples and p- is the proportion of negative examples.

Bayes Rule:

P(A|B) = P(B|A) * P(A) / P(B)

Discriminative Algorithms:

Idea: model p(y|x), the conditional distribution of y given x.

Find a decision boundary that separates positive from negative examples.

Generative Algorithms:

Idea: model p(x|y) and p(y), then use Bayes' rule to compute p(y|x).

Naive Bayes assumes that feature values are conditionally independent given the label.

Naive Bayes classifier is linear.

Ensemble Methods:

An Ensemble method combines the predictions of many individual classifiers by majority voting.

Such individual classifiers, called weak learners, are required to perform slightly better than random.

  1. Majority voting:
    1. Condorcet’s Jury theorem:
      1. Assumptions:
        1. Each individual makes the right choice with a probability p.
        2. The votes are independent.
      2. If p>0.5, then adding more voters increases the probability that the majority decision is correct. If p<0.5, then adding more voters makes things worse.
  2. Boosting:
    1. One of the most popular Machine learning methods.
    2. AdaBoost.M1-popular algorithm.
    3. Weak learners can be trees, perceptrons, decision stumps etc.
    4. The predictions of all weak learners are combined with a weighted majority voting.
    5. Idea: train the weak learners on weighted training examples.
      1. AdaBoost with Decision stumps lead to a form of feature selection.
      2. Bootstrapping is a resampling technique for training weak learners from m resamples of the original training data.
  3. Bagging:
    1. Bagging on trees works better than building one big tree on the whole data.
  4. Random forests: build a lot of decision trees, each trained on a random subset of the training data and considering a random subset of all the features. Use the majority vote of the different decision trees to determine the final outcome. This relies on the principle that the majority of decision trees will compensate for the trees that classify wrongly on particular instances.
    1. Quora uses random forests for determining whether two questions are duplicates.

A practical guide to applying support vector classification:

Clustering examples:

  1. Clustering of the population by their demographics.
  2. Audio signal separation.
  3. Image segmentation.
  4. Clustering of geographic objects
  5. Clustering of stars.

K-means:
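
A minimal numpy sketch of the algorithm (toy 2-D data; empty-cluster handling omitted for brevity):

import numpy as np

def kmeans(X, k, iters=100):
    rng = np.random.default_rng(0)
    centroids = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(iters):
        # Assign each point to its nearest centroid
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(axis=2), axis=1)
        # Move each centroid to the mean of the points assigned to it
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

X = np.array([[1.0, 1], [1.5, 2], [8, 8], [8.5, 9.5]])
print(kmeans(X, 2)[0])   # two clusters: [0 0 1 1] (label order may swap)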

K-means fails for:

A solution for how to select K:

G-means:

How to evaluate?

Internal evaluation: high intra-cluster similarity, low inter-cluster similarity.

External evaluation: mutual information, entropy, adjusted Rand index.

Other clustering methods:

  1. Spectral clustering
  2. DBSCAN
  3. BIRCH

Association Rules:

Supp(A->C): P(A^C)

Probabilistic interpretation:

R: A->C

conf(A->C) = P(C|A) = P(A^C)/P(A)

Though this looks like a valid measure, it doesn’t take into account P(C).

interest(A->C) = P(A^C)/(P(A)*P(C)) = conf(A->C)/supp(C)

interest(R) = 1: A and C are independent.

interest(R) > 1: A and C are positively dependent.

interest(R) < 1: A and C are negatively dependent.
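
A small sketch computing all three measures on made-up market baskets (each transaction is a set of items):

def rule_stats(transactions, A, C):
    n = len(transactions)
    p_both = sum(A <= t and C <= t for t in transactions) / n  # P(A ^ C)
    p_a = sum(A <= t for t in transactions) / n                # P(A)
    p_c = sum(C <= t for t in transactions) / n                # P(C)
    return p_both, p_both / p_a, p_both / (p_a * p_c)          # supp, conf, interest

baskets = [{'milk', 'bread'}, {'milk', 'bread', 'eggs'}, {'bread'}, {'eggs'}]
supp, conf, interest = rule_stats(baskets, {'milk'}, {'bread'})
print(supp, conf, interest)   # 0.5, 1.0, 1.33 -> milk and bread positively dependent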

Applications:

Collaborative filtering, customer behavior analysis, web organization, affinity promotion, cross-selling

Multidimensional rules:

Read more about Quantitative Association rules as well.

CONSTRAINT SATISFACTION PROBLEMS:

Care about the goal rather than the path.

Three elements:

  1. A set of variables.
  2. A set of domains for each variable.
  3. A set of constraints that specify allowable combinations of values.

Types of consistency:

  1. Node consistency – unary constraints.
  2. Arc consistency (constraint propagation) – binary constraints. X->Y is arc-consistent if, for every value that can be assigned to X, there exists some consistent value that can be assigned to Y. AC-3 is the algorithm that makes a CSP arc-consistent, in O(n^2 * d^3).
  3. Path consistency – n-ary constraints.

Forward checking propagates information from assigned to unassigned variables only, and does not check interactions between unassigned variables. Arc consistency checks all arcs and keeps doing so whenever the domain of a variable is reduced. In this sense, AC is more general and an improvement over FC.

Two types of CSPs:

  1. Search based
    1. BFS
    2. DFS
    3. BTS
  2. Inference

Choose the variable with the minimum remaining values to fill first.

Fill it with the least constraining value first – the one that rules out the fewest values in the remaining variables.

Can work on independent subproblems separately based on the problem structure.

Backtracking:

d = size of domain, n = number of variables.

O(d^n)
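
A minimal backtracking sketch on a made-up map-coloring CSP (three mutually adjacent regions):

def backtrack(assignment, variables, domains, consistent):
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment) and backtrack(assignment, variables, domains, consistent):
            return assignment
        del assignment[var]
    return None

variables = ['A', 'B', 'C']
domains = {v: ['red', 'green', 'blue'] for v in variables}
edges = [('A', 'B'), ('B', 'C'), ('A', 'C')]   # all regions touch each other

def consistent(a):
    return all(a[x] != a[y] for x, y in edges if x in a and y in a)

print(backtrack({}, variables, domains, consistent))
# {'A': 'red', 'B': 'green', 'C': 'blue'} -- worst case explores O(d^n) assignments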

Reinforcement learning:

Learning behaviour or actions based on rewards/feedback from the environment.

The science of sequential decision making.

  1. Markov decision process – Formal model for sequential decision making.
    1. Markov property – the future is independent of the past given the current state.
    2. Applying the reinforcement framework to a Markov process.
    3. Theorem – due to the Markov property, a discounted MDP always has a stationary optimal policy (pi).
    4. A policy is better than another policy if its value is greater than the other's, calculated from all starting states.
    5. One useful fact: there always exists an optimal policy which is better than or equal to all other policies.
    6. Bellman equations.
    7. There is no closed-form solution to the Bellman equations for the optimal policy.
    8. Iterative solution approaches:
      1. Policy iteration.
      2. Value iteration.
    9. Think of different policies to traverse the MDP; calculate the value of each policy.
    10. Q-value iteration.
    11. Epsilon-greedy exploration – with probability epsilon, take a random action; with probability (1 - epsilon), take the greedy action. (A sketch follows this list.)
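
A minimal sketch of epsilon-greedy action selection over a Q-table (made-up Q-values):

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    if random.random() < epsilon:                      # explore
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))   # exploit

Q = {('s0', 'left'): 1.0, ('s0', 'right'): 2.5}
print(epsilon_greedy(Q, 's0', ['left', 'right']))      # usually 'right'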

Logical agents or knowledge-based agents:

Mainly based on knowledge representation (domain-specific) and inference (domain-independent).

  1. Using propositional logic.
  2. Using first-order logic – used in personalized assistants.

Pros and cons of logical agents:

  1. They don't handle uncertainty; probability does.
  2. Rule-based systems don't use data; ML does.
  3. It is hard to model every aspect of the world.
  4. Models are encoded explicitly.

Resolution algorithm

Natural Language Processing:

Text classification (spam filtering)- Naive Bayes classifier.

Sentiment analysis

Naive Bayes classifier – assumes the features are independent given the class.

It classifies a new example based on the probabilities of different classes given the features of that particular example.

m-estimate of the probability:

  1. Augment the sample size by m virtual examples, distributed according to prior p.

Chain rule of probability:

  1. To estimate joint probability

N-gram model: Look N-1 words in the past to predict next word.

Bigrams capture syntactic dependencies, such as a noun coming after "eat" and a verb coming after "to", etc.

3-grams and 4-grams are common in practice.
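
A minimal bigram (N = 2) sketch on a made-up corpus:

from collections import Counter, defaultdict

corpus = 'i like to eat and i like to code'.split()
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):   # count each adjacent word pair
    bigrams[w1][w2] += 1

def predict_next(word):
    counts = bigrams[word]               # look 1 word into the past (N-1 = 1)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next('like'))   # 'to'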

Perplexity – the higher the conditional probability, the lower the perplexity.

Great progress:

Tagged text – which word is of what type: noun, adjective, verb.

Named entity recognition – yesterday (time), I (person), five (quantity).

Good progress:

Parsing – Exhibit the grammatical structure of a sentence.

Sentiment analysis – Does it identify sarcasm?

Machine translation

Information extraction.

Work in progress:

Text summarization

Question/Answering

Dialog systems: Siri, echo etc.


Deep learning:

  1. Less or no feature engineering required.
  2. Need to choose number of layers in the stack and number of nodes in each layer.
  3. Needs to be sensitive to minute details to distinguish between two breeds of dog and, at the same time, invariant to large irrelevant variations like background, lighting, pose etc.
  4. ReLU usually learns much faster than sigmoid in networks with many layers.
  5. With the revival of deep learning, pre-training turned out to be needed only for small data sets.
  6. A convnet combined with a recurrent net can generate image captions: a vision-to-language interface.
  7. Handwritten characters, spoken words and faces were the most successful applications.
  8. Theorem: no more than 2 hidden layers are needed to represent any arbitrary region (assuming a sufficient number of neurons or units).
  9. The number of neurons in each hidden layer is found by cross-validation.
  10. The Mammalian visual cortex is hierarchical with 5-10 levels just for the visual system.
  11. Examples of Deep neural networks – Restricted Boltzmann machines, convolutional NN, autoencoders etc.
  12. Software for Deep learning:
    1. Tensorflow – Python
    2. Theano – Python
    3. Torch – LUA
    4. Caffe – Suited for vision

Robotics:

Path planning:

  1. Visibility graph (VGRAPH): A* search through the nodes between obstacles. Guaranteed to give the shortest path in 2D, but doesn't scale well to 3D, and the path runs close to obstacles (nodes are at vertices).
  2. Voronoi path planning:
  3. Potential field path planning: consider both the robot and the obstacles as positive charges repelling each other, and the goal as a negative charge attracting the robot. Take a random walk in case you reach a local minimum.
  4. Probabilistic roadmap planner: first sample the space and detect collision-free nodes; connect each node to its k nearest neighbors; then do a graph search from start to goal over these connected nodes. You can't be sure that all nodes will be connected.

Glossary:

  • Discriminative models learn the (hard or soft) boundary between classes
  • Generative models model the distribution of the individual classes.
  • Confusion matrix.
  • ply – each move by one of the players.
  • policy – State to action mapping
  • stationary policy – a particular action for a particular state, all the time.
  • Representation learning – automatically discovers the representations required to classify or predict; not much feature engineering required. Example: deep learning.
  • Saddle point / minimax point – where the derivative of the function is zero, but the point is a max along one axis and a min along another; shaped like a saddle on a horse.
  • Unsupervised pre-training – making features before feeding data into the learning network.
  • Configuration space (C-space): the set of parameters that completely describe the robot's state. Proprioception.

CREDITS: Columbia University