[1]:
%run ../initscript.py
HTML("""
<div id="popup" style="padding-bottom:5px; display:none;">
    <div>Enter Password:</div>
    <input id="password" type="password"/>
    <button onclick="done()" style="border-radius: 12px;">Submit</button>
</div>
<button onclick="unlock()" style="border-radius: 12px;">Unlock</button>
<a href="#" onclick="code_toggle(this); return false;">show code</a>
""")

Advice to Managers

An old statistics joke

A physicist, an engineer, and a statistician are on a hunting trip. They are walking through the woods when they spot a deer in the clearing.

The physicist calculates the distance to the target, the velocity and drop of the bullet, adjusts, and fires, missing the deer by five feet to the left.

The engineer looks frustrated. “You forgot to account for the wind. Give it here.” After licking a finger to determine the wind speed and direction, the engineer snatches the rifle and fires, missing the deer by five feet to the right.

Suddenly, without firing a shot, the statistician cheers, “Woo hoo! We got it!”

[2]:
hide_answer()
  • Being perfectly correct on average can still mean being wrong every single time. A regression can keep missing several feet to the left, then several feet to the right. Even if the misses average out to the correct answer, it may never actually hit the target.

  • Many machine learning methods, unlike ordinary regression, accept predictions that may be somewhat wrong on average, but when they miss, they often don’t miss by much. Statisticians describe this as accepting some bias in exchange for reduced variance.
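The joke’s point can be made concrete with a little arithmetic. The sketch below uses made-up misses, in feet, for two hypothetical shooters: one who is right on average but always five feet off, and one who is always one foot to the right. Mean squared error (the average squared miss) equals bias squared plus variance, so the biased but consistent shooter ends up with the much smaller typical miss. `summarize` is an illustrative helper, not part of any library.

```python
from statistics import fmean, pvariance


def summarize(misses):
    """Return (bias, variance, mse) for a list of signed misses (illustrative helper)."""
    bias = fmean(misses)                 # average signed miss (negative = left)
    variance = pvariance(misses)         # spread of the misses around that average
    mse = fmean(m * m for m in misses)   # average squared miss = bias**2 + variance
    return bias, variance, mse


# Hypothetical data: negative = left of the deer, positive = right.
unbiased_wide = [-5.0, 5.0, -5.0, 5.0]   # right on average, never on target
biased_tight = [1.0, 1.0, 1.0, 1.0]      # always a little to the right

for name, misses in [("unbiased, high variance", unbiased_wide),
                     ("biased, low variance", biased_tight)]:
    bias, variance, mse = summarize(misses)
    print(f"{name}: bias={bias}, variance={variance}, mse={mse}")
```

Here the unbiased shooter has bias 0 but a mean squared error of 25, while the biased shooter has bias 1 and a mean squared error of only 1.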

Correlation vs Causation

  • Correlation is not causation. Recall the sales prediction case. It’s easy to say that there is a correlation between rain and monthly sales as shown by the regression. However, unless you are selling umbrellas, it might be difficult to prove that there is cause and effect.

  • “Petabytes allow us to say: ‘Correlation is enough.’” (Chris Anderson). On this view causality is dead: given enough statistical evidence, it is no longer necessary to understand why things happen; we need only know what things happen together.
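The rain-and-sales example is a case where a third factor could be driving both variables. The simulation below is a sketch under made-up assumptions: a hypothetical hidden seasonal factor (`season`) raises both rainfall and sales, and rainfall has no effect on sales at all. The two still come out clearly correlated, and the correlation largely disappears once the seasonal factor’s contribution is removed. `pearson` is a small illustrative helper.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(42)   # fixed seed so the run is repeatable
n = 10_000

# Hypothetical data-generating process: `season` drives both variables,
# and rainfall has NO causal effect on sales.
season = [rng.gauss(0, 1) for _ in range(n)]
rainfall = [s + rng.gauss(0, 1) for s in season]
sales = [s + rng.gauss(0, 1) for s in season]

print("corr(rainfall, sales):", round(pearson(rainfall, sales), 2))

# We built the data, so we can subtract the seasonal effect exactly.
# With real data the confounder would have to be measured and modeled.
rain_resid = [r - s for r, s in zip(rainfall, season)]
sales_resid = [y - s for y, s in zip(sales, season)]
print("after removing season:", round(pearson(rain_resid, sales_resid), 2))
```

Under these assumptions the population correlation between rainfall and sales is 0.5, even though changing the rainfall would not change sales at all.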

What do you think?

[3]:
hide_answer()
  • Correlation is not causation. When a regression analysis shows a correlation, you cannot assume it reflects cause and effect. You have to go out and see what is happening in the real world: what physical mechanism is causing the relationship? For example, go out and observe consumers buying your product in the rain, talk to them, and find out what is actually causing them to make the purchase. The goal is NOT to figure out what is going on in the data but to figure out what is going on in the world.

  • Correlation is enough: the beer-and-diapers story. Some time ago, Wal-Mart ran queries on its point-of-sale systems covering 1.2 million market baskets in all. Many correlations appeared, and some of them were obvious.

    However, one correlation stood out like a sore thumb because it was so unexpected. Those queries revealed that, between 5pm and 7pm, customers tended to co-purchase beer and diapers. By moving these two items closer together, Wal-Mart reportedly saw the sales of both items increase geometrically.
  • The key question is “Can we take action on the basis of a correlation finding?” The answer depends primarily on two factors:

    • Confidence that the correlation will reliably recur in the future. The higher that confidence level, the more reasonable it is to take action in response.

    • The tradeoff between the risk and reward of acting. If the risk of acting and being wrong is extremely high, for example, acting on even a strong correlation may be a mistake.
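The beer-and-diapers finding is the kind of pattern that market-basket analysis measures with three standard association-rule metrics: support, confidence, and lift. The sketch below computes them for the rule “beer → diapers” on a handful of made-up baskets (not Wal-Mart data); `rule_stats` is a hypothetical helper. Note that “confidence” here is the association-rule metric, the share of beer baskets that also contain diapers, not the broader confidence that a pattern will recur discussed above.

```python
def rule_stats(baskets, a, b):
    """Return (support, confidence, lift) for the rule 'a -> b' (illustrative helper)."""
    n = len(baskets)
    n_a = sum(1 for basket in baskets if a in basket)
    n_b = sum(1 for basket in baskets if b in basket)
    n_ab = sum(1 for basket in baskets if a in basket and b in basket)
    support = n_ab / n              # share of all baskets containing both items
    confidence = n_ab / n_a         # share of a-baskets that also contain b
    lift = confidence / (n_b / n)   # > 1: bought together more often than chance
    return support, confidence, lift


# Made-up baskets for illustration only.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
    {"milk", "wipes"},
]

support, confidence, lift = rule_stats(baskets, "beer", "diapers")
print(f"support={support:.3f} confidence={confidence:.2f} lift={lift:.2f}")
```

On these baskets, 3 of 8 contain both items (support 0.375), 3 of the 4 beer baskets contain diapers (confidence 0.75), and since diapers appear in half of all baskets, the lift is 1.5. A lift above 1 in one period does not by itself show that the pattern will recur, so recomputing it on later data is a reasonable check before acting.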

How do managers use it?

Regression analysis is the go-to method in analytics. It helps us figure out what we can do. Managers often use it to

[4]:
hide_answer()
  • explain a phenomenon they want to understand (e.g. why did customer service calls drop last month?)

  • predict things about the future (e.g. what will sales look like over the next six months?)

  • or decide what to do (e.g. should we go with this promotion or a different one?)
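All three uses rest on the same fitted equation. As a minimal sketch, assuming a single predictor and made-up monthly figures (rainfall in inches, sales in thousands of units), the code below fits an ordinary least-squares line, reports R² (the share of the variation in sales that the line explains), and uses the line to predict a new month. `fit_line` and `r_squared` are illustrative helpers, not a particular library’s API.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in ys explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical monthly data.
rainfall = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_line(rainfall, sales)
print(f"sales = {a:.2f} + {b:.2f} * rainfall, R^2 = {r_squared(rainfall, sales, a, b):.3f}")
print("predicted sales at 6 inches:", round(a + b * 6.0, 2))  # forecast for a new month
```

The slope describes the association (explain), the prediction covers forecasting (predict), and comparing predictions under different scenarios supports a choice (decide). The fitted slope is still only a correlation; it does not establish that rain causes sales.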

What mistakes do managers make?

[5]:
hide_answer()
  • Managers need to scope the problem. It is the manager’s job to identify the factors that may have an impact and ask the analyst to look at those. If you ask a data analyst to tell you something you don’t know, you deserve what you get, which is bad analysis. It is the same principle as flipping a coin: flip it enough times and you will eventually think you see something interesting, like a bunch of heads in a row.

  • Regression analysis is very sensitive to bad data, so be careful about how the data is collected and whether to act on it.

  • Don’t make the mistake of ignoring the error term. If the regression explains 90% of the variation in the outcome, that is great. But if it explains only 10% and you act as if it explains 90%, you will be misled.

  • Don’t let data replace your intuition and judgment.
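The coin-flipping point above is easy to check by simulation. The sketch below, using a hypothetical `longest_run` helper and a fixed random seed, flips a fair coin 100 times, repeats that 1,000 times, and reports how often a streak of five or more heads appears. Such streaks turn out to be common even though every flip is pure chance, which is what makes undirected searches through data prone to “finding” patterns that are only noise.

```python
import random


def longest_run(flips, value="H"):
    """Length of the longest consecutive run of `value` in `flips`."""
    best = current = 0
    for f in flips:
        current = current + 1 if f == value else 0
        best = max(best, current)
    return best


rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 1_000
runs = []
for _ in range(trials):
    flips = [rng.choice("HT") for _ in range(100)]   # 100 fair coin flips
    runs.append(longest_run(flips))

share = sum(r >= 5 for r in runs) / trials
print(f"share of 100-flip sequences with a run of 5+ heads: {share:.0%}")
```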