[2]:
%run loadmlfuncs.py

Current Status of Deep Learning

Achievements

Deep learning has achieved the following breakthroughs, all in historically difficult areas of machine learning:

  • Near-human-level image classification

  • Near-human-level speech recognition

  • Near-human-level handwriting transcription

  • Improved machine translation

  • Improved text-to-speech conversion

  • Digital assistants such as Google Now and Amazon Alexa

  • Near-human-level autonomous driving

  • Improved ad targeting, as used by Google, Baidu, and Bing

  • Improved search results on the web

  • Ability to answer natural-language questions

  • Superhuman Go playing

Hardware

Although your laptop can run small deep-learning models, the models typically used in computer vision or speech recognition require orders of magnitude more computational power.

Throughout the 2000s, companies such as NVIDIA and AMD invested billions of dollars in developing fast, massively parallel chips, graphical processing units (GPUs), to power the graphics of increasingly photorealistic video games. These chips are cheap, single-purpose supercomputers designed to render complex 3D scenes on screen in real time.

At the end of 2015, the NVIDIA TITAN X, a gaming GPU that cost $1,000, could perform 6.6 trillion float32 operations per second, about 350 times more than what you can get out of a modern laptop. Meanwhile, large companies train deep-learning models on clusters of hundreds of GPUs of a type developed specifically for the needs of deep learning, such as the NVIDIA Tesla K80. The sheer computational power of such clusters would never have been possible without modern GPUs.

The deep-learning industry is starting to go beyond GPUs and is investing in increasingly specialized, efficient chips for deep learning. In 2016, at its annual I/O convention, Google revealed its tensor processing unit (TPU) project: a new chip design developed from the ground up to run deep neural networks, which is reportedly 10 times faster and far more energy efficient than top-of-the-line GPUs.

If you don’t already have a GPU that you can use for deep learning, then running deep-learning experiments in the cloud is a simple, low-cost way to get started without buying any additional hardware. But if you’re a heavy user of deep learning, this setup isn’t sustainable in the long term, or even for more than a few weeks.
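Before deciding between the two, it can help to check whether your machine exposes a usable GPU at all. The snippet below is only an example check using TensorFlow's device listing; other frameworks have their own equivalents.

import tensorflow as tf

# List the GPUs that TensorFlow can see on this machine
# (the list is empty on a CPU-only laptop).
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", len(gpus))
for gpu in gpus:
    print(" -", gpu.name)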

Investment

As deep learning became the new state of the art for computer vision and eventually for all perceptual tasks, industry leaders took note. What followed was a gradual wave of industry investment far beyond anything previously seen in the history of AI.

  • In 2011 (right before deep learning took the spotlight), the total venture capital investment in AI was around $19 million.

  • By 2014, the total venture capital investment in AI had risen to $394 million.

    • Google acquired the deep-learning startup DeepMind for a reported $500 million — the largest acquisition of an AI company in history.

    • Baidu started a deep-learning research center in Silicon Valley, investing $300 million in the project.

    • Intel acquired the deep-learning hardware startup Nervana Systems for over $400 million.

There are currently no signs that this uptrend will slow any time soon.

Cases

As entrepreneurs of AI start-ups, Alice and Bob received similar amounts of investment and competed in the same market:

  • Alice spent a lot of money hiring top engineers in the AI field.

  • Bob hired only mediocre engineers and spent most of his money obtaining a larger amount of higher-quality data.

Whom would you invest in, and why?

[3]:
hide_answer()
# Compare a big model trained on a quarter of the data against a small model
# trained on all of it (net_compare is defined in loadmlfuncs.py).
acc1 = net_compare(512, .25)  # 512 hidden nodes, 25% of the training data
acc2 = net_compare(128, 1)    # 128 hidden nodes, all of the training data
print('Accuracy of the complicated model (512 nodes) trained on one quarter of the data:', acc1)
print('Accuracy of the simple model (128 nodes) trained on the full data:', acc2)
print('The improvement (relative error reduction) is {}%!'.format(round((acc2 - acc1) / (1 - acc1) * 100, 2)))
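net_compare itself lives in loadmlfuncs.py and is not reproduced in this notebook. As a rough, hypothetical stand-in, the sketch below shows what a comparison like net_compare(units, fraction) could look like with Keras and the MNIST digits, where units is the hidden-layer width and fraction is the share of the training set used; the dataset and architecture here are assumptions, not the notebook's actual implementation.

from tensorflow import keras

def net_compare_sketch(units, fraction, epochs=5):
    # Load and flatten the MNIST digits (an assumed stand-in dataset).
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255
    x_test = x_test.reshape(-1, 784).astype("float32") / 255
    n = int(len(x_train) * fraction)  # keep only a fraction of the training set
    model = keras.Sequential([
        keras.Input(shape=(784,)),
        keras.layers.Dense(units, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train[:n], y_train[:n], epochs=epochs, batch_size=128, verbose=0)
    return model.evaluate(x_test, y_test, verbose=0)[1]  # test accuracy

# e.g. compare net_compare_sketch(512, 0.25) against net_compare_sketch(128, 1.0)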

Suppose you’re trying to develop a model that can take as input images of a clock

[figure: two clock-face images showing different times]

and can output the time of day. What machine-learning approach would you use?

[4]:
hide_answer()
  • If you choose to use the raw pixels of the image as input data, then you have a difficult machine-learning problem on your hands. You’ll need a convolutional neural network to solve it, and you’ll have to expend quite a bit of computational resources to train the network.

  • But if you already understand the problem at a high level, you can write a five-line Python script to follow the black pixels of the clock hands and output the \((x, y)\) coordinates of the tip of each hand. Then a simple machine-learning algorithm can learn to associate these coordinates with the appropriate time of day. For example, the long hand has \((x=0.7, y=0.7)\) and the short hand has \((x=0.5, y=0.0)\) in the first image, and the long hand has \((x=0.0, y=1.0)\) and the short hand has \((x=-0.38, y=0.32)\) in the second image.

  • You can go even further: do a coordinate change and express the \((x, y)\) coordinates as the angle of each clock hand. For example, in the first image the long hand is at \(45\) degrees and the short hand at \(0\) degrees, and in the second image the long hand is at \(90\) degrees and the short hand at \(140\) degrees. At this point, your features make the problem so easy that no machine learning is required; a simple rounding operation and a dictionary lookup are enough to recover the approximate time of day (see the sketch below).
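A minimal sketch of that last step, assuming the hand-tip coordinates have already been extracted and that angles are measured counterclockwise from the 3 o'clock direction, as in the examples above. The function names are illustrative, not from the notebook, and the dictionary lookup collapses here into modular arithmetic.

import math

def hand_angle(x, y):
    # Angle of a hand in degrees, counterclockwise from the 3 o'clock direction.
    return math.degrees(math.atan2(y, x)) % 360

def read_clock(long_xy, short_xy):
    # Convert both hand angles to clockwise angles from 12 o'clock, then round
    # to the nearest minute/hour mark: the "rounding plus lookup" step.
    long_cw = (90 - hand_angle(*long_xy)) % 360    # minute (long) hand
    short_cw = (90 - hand_angle(*short_xy)) % 360  # hour (short) hand
    minute = round(long_cw / 6) % 60
    hour = round(short_cw / 30) % 12
    return (hour if hour else 12), minute

# First image: long hand at (0.7, 0.7), short hand at (0.5, 0.0)
print(read_clock((0.7, 0.7), (0.5, 0.0)))  # -> (3, 8)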