Naive Bayes

Bayes' theorem helps us incorporate new evidence/information into our model.

Code from sklearn

from sklearn.naive_bayes import GaussianNB
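A minimal usage sketch of GaussianNB (the feature values and labels below are made up for illustration):

```python
from sklearn.naive_bayes import GaussianNB

# Toy training data: two well-separated classes in 2-D
X = [[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]]
y = [0, 0, 1, 1]

clf = GaussianNB()
clf.fit(X, y)

# Points near each cluster should get that cluster's label
pred = clf.predict([[1.1, 2.0], [4.0, 4.0]])
print(list(pred))
```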


Bayesian Learning:

Bayesian learning.PNG

Naive Bayes


Linear in the number of variables.

Naive Bayes assumes conditional independence across attributes and doesn’t capture inter-relationship among attributes.

Gaussian naive Bayes assumes the continuous values associated with each class are distributed in a Gaussian fashion.

Even if the probability of just one of the attributes given the label becomes zero, the whole product ends up being zero; Laplace (additive) smoothing is commonly used to avoid this.


Machine learning

Pandas and numpy:

numpy.mean(df[df['gold'] > 0]['bronze'])  # filter column assumed; original note only had (>0)

olympic_medal_counts = {'country_name': countries,
                        'gold': Series(gold),
                        'silver': Series(silver),
                        'bronze': Series(bronze)}
df = DataFrame(olympic_medal_counts)

del df['country_name']
avg_medal_count = df.apply(lambda x: numpy.mean(x))




df['points'] = df[['gold', 'silver', 'bronze']].dot([4, 2, 1])
olympic_points_df = df[['country_name', 'points']]
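Pieced together, the snippets above form a runnable sketch (the country names and medal counts are made up):

```python
import numpy
from pandas import DataFrame, Series

countries = ['US', 'China', 'Kenya']
gold, silver, bronze = [10, 8, 0], [5, 6, 1], [4, 2, 3]

olympic_medal_counts = {'country_name': countries,
                        'gold': Series(gold),
                        'silver': Series(silver),
                        'bronze': Series(bronze)}
df = DataFrame(olympic_medal_counts)

# Mean bronze count among countries that won at least one gold medal
mean_bronze = numpy.mean(df[df['gold'] > 0]['bronze'])

# Weighted points: 4 per gold, 2 per silver, 1 per bronze
df['points'] = df[['gold', 'silver', 'bronze']].dot([4, 2, 1])
olympic_points_df = df[['country_name', 'points']]
print(mean_bronze, list(olympic_points_df['points']))
```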


For highly skewed feature distributions such as 'capital-gain' and 'capital-loss', it is common practice to apply a logarithmic transformation so that very large and very small values do not negatively affect the performance of a learning algorithm. A logarithmic transformation significantly reduces the range of values caused by outliers. Care must be taken when applying this transformation, however: the logarithm of 0 is undefined, so we must translate the values by a small amount above 0 to apply the logarithm successfully.
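A minimal sketch of that translation, assuming a shift of 1 (the column names follow the text; the values themselves are made up):

```python
import numpy as np
import pandas as pd

# Made-up skewed columns; zeros would break a plain log transform
data = pd.DataFrame({'capital-gain': [0, 500, 100000],
                     'capital-loss': [0, 0, 2000]})

skewed = ['capital-gain', 'capital-loss']
log_data = data.copy()
# Shift by 1 so log(0) becomes log(1) = 0
log_data[skewed] = data[skewed].apply(lambda x: np.log(x + 1))
print(log_data['capital-gain'].round(2).tolist())
```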


In addition to performing transformations on features that are highly skewed, it is often good practice to perform some type of scaling on numerical features. Applying a scaling does not change the shape of each feature's distribution (such as 'capital-gain' or 'capital-loss' above); however, normalization ensures that each feature is treated equally when applying supervised learners. Note that once scaling is applied, observing the data in its raw form will no longer have the same original meaning.

We will use sklearn.preprocessing.MinMaxScaler for this.

If the data is not normally distributed, especially if the mean and median vary significantly (indicating a large skew), it is most often appropriate to apply a non-linear scaling, particularly for financial data. One way to achieve this is the Box-Cox transformation, which calculates the best power transformation of the data to reduce skewness. A simpler approach that works in most cases is applying the natural logarithm.

Identifying Outliers:

Tukey’s method
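Tukey's method flags points lying more than 1.5 × IQR outside the quartiles. A minimal sketch with made-up data:

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 95])  # 95 is an obvious outlier

# Quartiles and the 1.5 * IQR step of Tukey's method
q1, q3 = np.percentile(data, 25), np.percentile(data, 75)
step = 1.5 * (q3 - q1)

# Anything outside [q1 - step, q3 + step] is flagged
outliers = data[(data < q1 - step) | (data > q3 + step)]
print(outliers.tolist())
```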



from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

R2 score:

r2 score.PNG

If $\bar{y}$ is the mean of the observed data:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

then the variability of the data set can be measured using three sums-of-squares formulas:

$$SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$$
$$SS_{\text{reg}} = \sum_i (f_i - \bar{y})^2$$
$$SS_{\text{res}} = \sum_i (y_i - f_i)^2 = \sum_i e_i^2$$

The most general definition of the coefficient of determination is

$$R^2 \equiv 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$$


Accuracy = (true positives + true negatives) / total

from sklearn.metrics import accuracy_score


Accuracy is not the right metric when the data is skewed.
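A small illustration of why: on a made-up 95/5 class split, a classifier that always predicts the majority class still scores 95% accuracy while catching no positives at all:

```python
from sklearn.metrics import accuracy_score

y_true = [0] * 95 + [1] * 5   # only 5% positives
y_pred = [0] * 100            # a useless "always negative" classifier

print(accuracy_score(y_true, y_pred))  # high score despite missing every positive
```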







f beta.PNG



Underfitting: error due to bias. Oversimplified model; performs badly on both training and testing data.

Overfitting: error due to variance. Overcomplicated model; too specific to the training data, performs badly on testing data.

Model complexity graph:

Model complexity graph.PNG

Training set: for training model.

Cross-validation set: for choosing the right parameters, like the degree of the polynomial. Useful to check whether the trained model is overfitting: if the trained model performs poorly on the cross-validation set, it is overfitting.

Testing set: For final testing.


Divide all your data into k buckets and iteratively build models, choosing one bucket as the testing set and the remaining buckets for training.

Average the results of these k runs for the final evaluation.
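A sketch of this k-fold procedure with scikit-learn's cross_val_score (the dataset and model are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# cv=5 splits the data into 5 buckets; each serves once as the test set
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

# The 5 per-fold scores are averaged for the final estimate
print(len(scores), scores.mean())
```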



If you increase training points on different models:


Grid Search CV:
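A minimal GridSearchCV sketch (the estimator and parameter grid below are arbitrary choices for illustration): it exhaustively tries every parameter combination with cross-validation and keeps the best one.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every (kernel, C) combination is evaluated with 5-fold cross-validation
param_grid = {'kernel': ['linear', 'rbf'], 'C': [0.1, 1, 10]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)  # the winning combination
```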


Plot learning curves to identify when to stop collecting data.

Supervised learning:

If there is an order in the output data, go for a continuous model. Ex: income, age.

If there is no order, go for a discrete model. Ex: phone numbers, persons.

Algorithms to minimize sum of squared errors:

  1. Ordinary least squares: sklearn LinearRegression.
  2. Gradient descent.

There can be multiple lines that minimize the sum of |error|, but only one that minimizes the sum of error².

Instance based learning:


  1. Remembers.
  2. Fast; there is no learning (training) phase.
  3. simple.
  4. No generalization.
  5. Sensitive to noise.


  1. Look up:

In k-nn all features matter equally because when we calculate distance, all features are treated equally.
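A minimal k-NN sketch on made-up 2-D points; the majority vote among the k nearest neighbors decides the label, with every feature contributing equally to the distance:

```python
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated toy clusters
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Each query point takes the majority label of its 3 nearest neighbors
print(list(knn.predict([[0.5, 0.5], [5.5, 5.5]])))
```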


Locally weighted regression (evolved from k-nearest neighbors):

Naive Bayes:

A powerful tool for creating classifiers from labeled data.

Expectation maximization:

expectation maximization.PNG

This is very similar to k-means clustering.

EM performs soft clustering, which is useful when there is ambiguity about which cluster a data point belongs to.
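A soft-clustering sketch using scikit-learn's GaussianMixture, which is fit with EM (the 1-D data is made up): instead of a hard assignment, each point gets a probability of belonging to each cluster.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two obvious groups of 1-D points
X = np.array([[0.0], [0.2], [0.1], [5.0], [5.2], [5.1]])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Soft membership: one row per point, one column per cluster, rows sum to 1
probs = gmm.predict_proba(X)
print(probs.shape)
```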

Which supervised classifiers are suitable for numerical as well as categorical data?

The data you have is called 'mixed data' because it has both numerical and categorical values, and since you have class labels, it is a classification problem. One option is to go with decision trees, which you already tried. Another possibility is naive Bayes, where you model numeric attributes with, for example, a Gaussian distribution. You can also employ a minimum-distance or KNN-based approach; however, the cost function must be able to handle both types of data together. If these approaches don't work, try ensemble techniques: bagging with decision trees, or Random Forest, which combines bagging and random subspaces. With mixed data, choices are limited and you need to be cautious and creative.

Feature scaling:

To give equal importance to all the different features, we can normalize them to the range 0 to 1 before applying the learning algorithm.


from sklearn.preprocessing import MinMaxScaler
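A minimal usage sketch (the column values are made up): each feature is rescaled to [0, 1] independently.

```python
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]

# Each column is mapped to [0, 1]: (x - min) / (max - min)
scaled = MinMaxScaler().fit_transform(data)
print(scaled.tolist())
```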



Feature rescaling is useful in k-means clustering and RBF SVMs, where we calculate distances, but not so much in decision trees or linear regression.

Feature selection:


  1. Knowledge discovery, interpretability, and insight: to identify which features actually matter among all of them.
  2. Curse of dimensionality – the amount of data you need for training grows exponentially with the number of features you have.


Feature selection:

Filtering(fast) and Wrapping(slow):

k-NN suffers from the curse of dimensionality because it doesn't know which features are important. So we can use a decision tree as a filtering mechanism to determine the important features and then pass them on to k-NN for learning.

Wrapper methods evaluate subsets of variables, which, unlike filter approaches, allows them to detect possible interactions between variables.


  1.  A feature is said to be strongly relevant if the Bayes optimal classifier’s performance is strongly affected by the absence of this feature.
  2. A feature is weakly relevant if there exists some other feature that can serve the same purpose.
  3. Depends on how much information a feature provides.


  1. Depends on error/model/learner.

Composite feature:

composite feature.PNG

When to use PCA?

  1. When some latent features are driving the patterns in the data.
  2. Dimensionality reduction, reduce noise, better visualization.
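A minimal PCA sketch on made-up, strongly correlated 2-D data; one latent direction captures nearly all the variance, so a single component suffices:

```python
import numpy as np
from sklearn.decomposition import PCA

# Nearly collinear points: one latent feature drives both columns
X = np.array([[1, 2], [2, 4.1], [3, 5.9], [4, 8]])

pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)  # projection onto the main direction

print(X_reduced.shape, round(pca.explained_variance_ratio_[0], 3))
```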

Feature transformation:

Independent Component analysis: cocktail party problem.
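A cocktail-party sketch with FastICA (the two source signals and the mixing matrix below are made up): two independent sources are recovered from two mixed recordings.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent source signals (the "speakers")
t = np.linspace(0, 8, 2000)
s1, s2 = np.sin(2 * t), np.sign(np.sin(3 * t))
S = np.c_[s1, s2]

# Each "microphone" hears a different mixture of the sources
A = np.array([[1.0, 0.5], [0.5, 1.0]])
X = S @ A.T

# ICA unmixes the recordings back into independent components
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
print(S_est.shape)
```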


Other feature transformations:

  1. Random component analysis – Fast and it usually works.
  2. Linear discriminant analysis –

Design patterns


A design that is more automated and requires less maintenance is the standard.

Getters and setters:

Setters can be used for validation, constraining the value being set to a particular range.

Getters can be used for returning a default value if the variable is not set yet, or for lazy instantiation.

You can't make objects out of an abstract class. An abstract class can have some non-abstract members.

Interfaces have only abstract methods.




  1. Singleton:
    1. Can only have one instance of that particular class.
    2. President of a country, System in java.
    3. Private constructor, singleton using enum.
    4. @Singleton annotation.
    5. Difficult to unit test – why?
  2. Factory:
    1. Having a logic to return a particular subclass object, when asked for a class object.
  3. Abstract Factory:
  4. Builder:
    1. Separates object construction from its representation.
    2. interfaces.
  5. Prototype:
    1. Chess game initial setup.
    2. Copying/cloning the initial setup rather than creating it every time you need it; reduces redundant work.
    3. Copy a fully initialized instance.
    4. Link to code.

How to create objects?


Inheritance? Interface? etc.

How are different classes related?

How are objects composed?

  1. Adapter:
    1. Match interfaces of different classes; helps them communicate.
  2. Composite:
  3. Proxy:
    1. An object representing another object, like credit card as a proxy of bank account.
    2. Remote object and Home object(proxy).
  4. Flyweight:
    1. Reuse the same objects by resetting their values appropriately instead of creating new objects every time.
  5. Facade:
    1. Event managers, process, execute, group many steps into a single step.
  6. Bridge:
  7. Decorator:
    1. Add responsibilities to objects dynamically.
    2. Ex: adding different Toppings for different pizzas, adding discounts to different orders.


Interactions between different objects.

  1. Template method:
  2. Mediator:
    1. Instead of applications talking to each other directly, they communicate through an enterprise service bus (the mediator).
  3. Chain of responsibility:
    1. Passing a request through different objects.
  4. Observer:
    1. A way of notifying a change to a number of classes.
    2. This pattern is built into Java (java.util.Observable/Observer).
    3. The subject extends Observable.
    4. Whoever wants to listen implements Observer and registers with the subject.
  5. Strategy:
    1. change the implementation/strategy of an interface at a later point in time.
    2. Pass whatever implementation needs to be used as an argument.
  6. Command:
    1. Encapsulate a command request as an object.
    2. Java threads with java.lang.Runnable are implemented like this.
  7. State:
  8. Visitor:
    1. Adding new operations to a particular class without inheritance and without modifying the class itself.
  9. Iterator:
    1. Sequentially access the elements of a collection.
  10. Interpreter:
  11. Memento:
    1. Saving states of something as objects so they can be restored at a future point in time if necessary.
    2. Undo/Redo operations.

Strategy pattern:



Strategy design pattern.PNG


Strategy pattern - when.PNG
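A minimal sketch of the strategy idea in Python (the class and function names here are made up): the implementation to use is passed in as an argument and can be swapped at runtime.

```python
# Two interchangeable discount strategies
def half_off(total):
    return total * 0.5

def flat_discount(total):
    return total - 5

class Order:
    def __init__(self, total, discount_strategy):
        self.total = total
        self.discount_strategy = discount_strategy  # strategy injected by the caller

    def final_price(self):
        return self.discount_strategy(self.total)

# Same class, different behavior, no subclassing needed
print(Order(100, half_off).final_price())
print(Order(100, flat_discount).final_price())
```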

Observer pattern:


Observer pattern - when


Observer pattern
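A minimal sketch of the observer idea in Python (the class names are made up): the subject keeps a list of registered observers and notifies each of them on a change.

```python
class Subject:
    def __init__(self):
        self._observers = []

    def register(self, observer):
        self._observers.append(observer)

    def notify(self, event):
        # Push the change to every registered listener
        for obs in self._observers:
            obs.update(event)

class Logger:
    def __init__(self):
        self.events = []

    def update(self, event):
        self.events.append(event)

subject = Subject()
logger = Logger()
subject.register(logger)       # the listener registers with the subject
subject.notify('state changed')
print(logger.events)
```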

Factory pattern:

Factory pattern - when

Factory pattern

Abstract Factory pattern:

Singleton pattern:

Singleton pattern.PNG
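A minimal sketch of the singleton idea in Python (in Java this is done with a private constructor, as noted above): `__new__` hands back the same instance every time.

```python
class President:
    _instance = None  # the one and only instance

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

# Every "construction" yields the same object
a, b = President(), President()
print(a is b)
```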

 Builder pattern:

Builder pattern.PNG

Prototype pattern:


Dependency injection

dependency injection1.PNG

What is dependency injection?

Instead of instantiating the object we depend on, we take the help of a dependency injection framework, which pushes the ready-made object into the dependent class at runtime.

In the above picture we are not initializing hotDrink with new hotDrink(); instead, we are getting it as a constructor parameter.

Why is it useful?

This is useful to decouple two different code packages and remove direct dependency.

If you have an interface with multiple implementations, a dependency injection framework lets you choose which implementation to run at runtime. Suppose you want to isolate and test a particular package: you can provide mock implementations of all the other interfaces it depends on.

If you had to do the same without dependency injection and interfaces, you would need to change code in a lot of places, substituting the mock object everywhere the original object was called in order to test.

It is useful when you have dependencies depending on other dependencies and so on.
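A constructor-injection sketch without any framework (the class names are made up, echoing the hotDrink example above): the dependent class never builds its dependency, so a test can pass in a mock with no code changes.

```python
class Tea:
    def serve(self):
        return 'tea'

class MockDrink:  # a stand-in used only for testing
    def serve(self):
        return 'mock'

class CoffeeMaker:
    def __init__(self, hot_drink):
        # Dependency injected by the caller, not created with new/constructor here
        self.hot_drink = hot_drink

    def brew(self):
        return self.hot_drink.serve()

print(CoffeeMaker(Tea()).brew())        # production wiring
print(CoffeeMaker(MockDrink()).brew())  # test wiring, same class untouched
```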

Implementing with Google Guice:

Guice di

Guice di2

You should usually initialize the injector where you bootstrap the program.

bind() inside a Guice module's configure() helps you define which implementation of an interface to use.

The @ImplementedBy annotation can be used instead of configure() and bind().

@ImplementedBy can be used as a default.

If a Guice module binding and @ImplementedBy compete with different implementations of the same interface, the Guice module binding wins.


Wanna have conditional logic to pick implementations?


Dependency injection without interface:


Dependency injection

Enable a class to be generic by relieving it of the responsibility of creating the objects it depends on; other classes define the input object for it.

With dependency injection, if you want to change the input object, you don't need to change the original class.

dependency injection

A Spring container contains a set of objects, or beans.

Essence of Rich Dad Poor Dad

Identify what are assets and what are liabilities.

Increase asset column.

Don’t work for money, make money work for you.

Rich buy luxuries late.

A rich person's income statement and balance sheet.


How corporations help rich with taxes.


Corporations earn, spend and then pay taxes on the rest.

Individuals earn, get taxed and then spend.

Listing some assets:

  1. Businesses that do not require my presence.
  2. Stocks – fortunes are made in new stock issues (new stocks are tax-free).
  3. Bonds.
  4. Income generating real estate.
  5. Notes (IOUs).
  6. Royalties from intellectual property.

Dimensions of financial literacy:

  1. Accounting:
  2. Investing:
  3. Understanding markets:
  4. The law: