[2]:
%run loadmlfuncs.py

Current Status of Deep Learning

Achievements

Deep learning has achieved the following breakthroughs, all in historically difficult areas of machine learning:

  • Near-human-level image classification

  • Near-human-level speech recognition

  • Near-human-level handwriting transcription

  • Improved machine translation

  • Improved text-to-speech conversion

  • Digital assistants such as Google Now and Amazon Alexa

  • Near-human-level autonomous driving

  • Improved ad targeting, as used by Google, Baidu, and Bing

  • Improved search results on the web

  • Ability to answer natural-language questions

  • Superhuman Go playing

Hardware

Although your laptop can run small deep-learning models, the models typically used in computer vision or speech recognition require orders of magnitude more computational power.

Throughout the 2000s, companies such as NVIDIA and AMD invested billions of dollars in developing fast, massively parallel chips, graphical processing units (GPUs), to power the graphics of increasingly photorealistic video games. These chips are cheap, single-purpose supercomputers designed to render complex 3D scenes on screen in real time.

At the end of 2015, the NVIDIA TITAN X, a gaming GPU that cost $1,000, could perform 6.6 trillion float32 operations per second, about 350 times more than what you can get out of a modern laptop. Meanwhile, large companies train deep-learning models on clusters of hundreds of GPUs of a type developed specifically for the needs of deep learning, such as the NVIDIA Tesla K80. The sheer computational power of such clusters would never have been possible without modern GPUs.

The deep-learning industry is starting to go beyond GPUs and is investing in increasingly specialized, efficient chips for deep learning. In 2016, at its annual I/O convention, Google revealed its tensor processing unit (TPU) project: a new chip design developed from the ground up to run deep neural networks, which is reportedly 10 times faster and far more energy efficient than top-of-the-line GPUs.

If you don’t already have a GPU that you can use for deep learning, then running deep-learning experiments in the cloud is a simple, low-cost way to get started without buying any additional hardware. But if you’re a heavy user of deep learning, this setup isn’t sustainable in the long term, or even for more than a few weeks.
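Before deciding between the two, it can help to check whether your machine exposes a usable GPU at all. The snippet below is only an example check using TensorFlow's device listing; other frameworks have their own equivalents.

import tensorflow as tf

# List the GPUs that TensorFlow can see on this machine
# (the list is empty on a CPU-only laptop).
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", len(gpus))
for gpu in gpus:
    print(" -", gpu.name)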

Investment

As deep learning became the new state of the art for computer vision and eventually for all perceptual tasks, industry leaders took note. What followed was a gradual wave of industry investment far beyond anything previously seen in the history of AI.

  • In 2011 (right before deep learning took the spotlight), the total venture capital investment in AI was around $19 million.

  • By 2014, the total venture capital investment in AI had risen to $394 million.

    • Google acquired the deep-learning startup DeepMind for a reported $500 million — the largest acquisition of an AI company in history.

    • Baidu started a deep-learning research center in Silicon Valley, investing $300 million in the project.

    • Intel acquired the deep-learning hardware startup Nervana Systems for over $400 million.

There are currently no signs that this uptrend will slow any time soon.

Cases

As entrepreneurs of AI start-ups, Alice and Bob received similar amounts of investment and competed in the same market:

  • Alice spent a lot of money hiring top engineers in the AI field.

  • Bob hired only mediocre engineers and spent most of his money obtaining a larger amount of higher-quality data.

Whom would you invest in, and why?

[3]:
hide_answer()
# Compare a big model trained on a quarter of the data against a small model
# trained on all of it (net_compare is defined in loadmlfuncs.py).
acc1 = net_compare(512, .25)  # 512 hidden nodes, 25% of the training data
acc2 = net_compare(128, 1)    # 128 hidden nodes, all of the training data
print('Accuracy of the complicated model (512 nodes) trained on one quarter of the data:', acc1)
print('Accuracy of the simple model (128 nodes) trained on the full data:', acc2)
print('The improvement (relative error reduction) is {}%!'.format(round((acc2 - acc1) / (1 - acc1) * 100, 2)))
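net_compare itself lives in loadmlfuncs.py and is not reproduced in this notebook. As a rough, hypothetical stand-in, the sketch below shows what a comparison like net_compare(units, fraction) could look like with Keras and the MNIST digits, where units is the hidden-layer width and fraction is the share of the training set used; the dataset and architecture here are assumptions, not the notebook's actual implementation.

from tensorflow import keras

def net_compare_sketch(units, fraction, epochs=5):
    # Load and flatten the MNIST digits (an assumed stand-in dataset).
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255
    x_test = x_test.reshape(-1, 784).astype("float32") / 255
    n = int(len(x_train) * fraction)  # keep only a fraction of the training set
    model = keras.Sequential([
        keras.Input(shape=(784,)),
        keras.layers.Dense(units, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train[:n], y_train[:n], epochs=epochs, batch_size=128, verbose=0)
    return model.evaluate(x_test, y_test, verbose=0)[1]  # test accuracy

# e.g. compare net_compare_sketch(512, 0.25) against net_compare_sketch(128, 1.0)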

Suppose you’re trying to develop a model that can take as input images of a clock

[figure: two clock-face images showing different times]

and can output the time of day. What machine-learning approach would you use?

[4]:
hide_answer()
  • If you choose to use the raw pixels of the image as input data, then you have a difficult machine-learning problem on your hands. You’ll need a convolutional neural network to solve it, and you’ll have to expend quite a bit of computational resources to train the network.

  • But if you already understand the problem at a high level, you can write a five-line Python script to follow the black pixels of the clock hands and output the \((x, y)\) coordinates of the tip of each hand. Then a simple machine-learning algorithm can learn to associate these coordinates with the appropriate time of day. For example, the long hand has \((x=0.7, y=0.7)\) and the short hand has \((x=0.5, y=0.0)\) in the first image, and the long hand has \((x=0.0, y=1.0)\) and the short hand has \((x=-0.38, y=0.32)\) in the second image.

  • You can go even further: do a coordinate change and express the \((x, y)\) coordinates as the angle of each clock hand. For example, in the first image the long hand is at \(45\) degrees and the short hand at \(0\) degrees, and in the second image the long hand is at \(90\) degrees and the short hand at \(140\) degrees. At this point, your features make the problem so easy that no machine learning is required; a simple rounding operation and a dictionary lookup are enough to recover the approximate time of day (see the sketch below).
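A minimal sketch of that last step, assuming the hand-tip coordinates have already been extracted and that angles are measured counterclockwise from the 3 o'clock direction, as in the examples above. The function names are illustrative, not from the notebook, and the dictionary lookup collapses here into modular arithmetic.

import math

def hand_angle(x, y):
    # Angle of a hand in degrees, counterclockwise from the 3 o'clock direction.
    return math.degrees(math.atan2(y, x)) % 360

def read_clock(long_xy, short_xy):
    # Convert both hand angles to clockwise angles from 12 o'clock, then round
    # to the nearest minute/hour mark: the "rounding plus lookup" step.
    long_cw = (90 - hand_angle(*long_xy)) % 360    # minute (long) hand
    short_cw = (90 - hand_angle(*short_xy)) % 360  # hour (short) hand
    minute = round(long_cw / 6) % 60
    hour = round(short_cw / 30) % 12
    return (hour if hour else 12), minute

# First image: long hand at (0.7, 0.7), short hand at (0.5, 0.0)
print(read_clock((0.7, 0.7), (0.5, 0.0)))  # -> (3, 8)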