Theano¶
Pointers¶
Announcements mailing list: http://groups.google.com/group/theano-announce
User mailing list: http://groups.google.com/group/theano-users
Deep Learning Tutorials: http://www.deeplearning.net/tutorial/
Installation: https://deeplearning.net/software/theano/install.html
Description¶
Mathematical symbolic expression compiler
Dynamic C/CUDA code generation
Efficient symbolic differentiation
Theano computes derivatives of functions with one or many inputs.
Speed and stability optimizations
Gives the right answer for log(1+x) even if x is really tiny (see the sketch after this list)
Works on Linux, Mac and Windows
Transparent use of a GPU
float32 only for now (working on other data types)
Still in experimental state on Windows
On GPU data-intensive calculations are typically between 6.5x and 44x faster. We’ve seen speedups up to 140x
Extensive unit-testing and self-verification
Detects and diagnoses many types of errors
On CPU, common machine learning algorithms are 1.6x to 7.5x faster than competitive alternatives
including specialized implementations in C/C++, NumPy, SciPy, and Matlab
Expressions mimic NumPy’s syntax & semantics
Statically typed and purely functional
Some sparse operations (CPU only)
The project was started by James Bergstra and Olivier Breuleux
For the past 1-2 years, I have replaced Olivier as lead contributor
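To make the log(1+x) claim above concrete, here is a minimal sketch (the value 1e-20 and the variable names are our own choice):

import theano
import theano.tensor as T

x = T.dscalar("x")
f = theano.function([x], T.log(1 + x))   # the stability optimization rewrites this so tiny x is handled correctly
print f(1e-20)                           # ~1e-20, whereas plain numpy.log(1 + 1e-20) returns 0.0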
Simple example¶
>>> import theano
>>> a = theano.tensor.vector("a") # declare symbolic variable
>>> b = a + a**10 # build symbolic expression
>>> f = theano.function([a], b) # compile function
>>> f([0,1,2])
array([ 0., 2., 1026.])
[Figure: the unoptimized graph and the optimized graph of the compiled function]
Symbolic programming = Paradigm shift: people need to use it to understand it.
Exercise 1¶
import theano
a = theano.tensor.vector()      # declare symbolic variable
out = a + a**10                 # build symbolic expression
f = theano.function([a], out)   # compile function
print f([0, 1, 2])
# prints `array([ 0., 2., 1026.])`
theano.printing.pydotprint_variables(out, outfile="f_unoptimized.png", var_with_name_simple=True)
theano.printing.pydotprint(f, outfile="f_optimized.png", var_with_name_simple=True)
Modify and execute the example to do this expression: a**2 + b**2 + 2*a*b
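One possible solution sketch, using two input vectors (the variable names are our choice, not necessarily the official answer):

import theano
import theano.tensor as T

a = T.vector("a")
b = T.vector("b")
out = a**2 + b**2 + 2*a*b          # equals (a + b)**2
f = theano.function([a, b], out)
print f([1, 2], [3, 4])            # prints [ 16.  36.]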
Real example¶
Logistic Regression
GPU-ready
Symbolic differentiation
Speed optimizations
Stability optimizations
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats), rng.randint(size=N,low=0, high=2))
training_steps = 10000
# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")
print "Initial model:"
print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability that target = 1
prediction = p_1 > 0.5 # The prediction thresholded
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01*(w**2).sum() # The cost to minimize
gw,gb = T.grad(cost, [w,b])
# Compile
train = theano.function(
        inputs=[x, y],
        outputs=[prediction, xent],
        updates={w: w - 0.1*gw, b: b - 0.1*gb})
predict = theano.function(inputs=[x], outputs=prediction)
# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])
print "Final model:"
print w.get_value(), b.get_value()
print "target values for D:", D[1]
print "prediction on D:", predict(D[0])
Optimizations: where are these optimizations applied?

log(1+exp(x))
1 / (1 + T.exp(var)) (sigmoid)
log(1-sigmoid(var)) (softplus, stabilisation)
GEMV (matrix-vector multiply from BLAS)
Loop fusion
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b))
# 1 / (1 + T.exp(var)) -> sigmoid(var)
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1)
# log(1-sigmoid(var)) -> -softplus(var)
prediction = p_1 > 0.5
cost = xent.mean() + 0.01*(w**2).sum()
gw,gb = T.grad(cost, [w,b])
train = theano.function(
        inputs=[x,y],
        outputs=[prediction, xent],
        # w-0.1*gw: GEMV with the dot in the grad
        updates={w:w-0.1*gw, b:b-0.1*gb})
Theano flags¶
Theano can be configured with flags. They can be set in two ways:

With an environment variable:
THEANO_FLAGS="profile=True,profile_memory=True"

With a configuration file, which defaults to ~/.theanorc (example below)
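For example, a minimal ~/.theanorc might look like the following (the particular values are illustrations only, not recommendations):

[global]
floatX = float32
device = gpu
profile = True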
Exercise 2¶
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
     rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy
cost = xent.mean() + 0.01*(w**2).sum() # The cost to optimize
gw,gb = T.grad(cost, [w,b])
# Compile expressions to functions
train = theano.function(
        inputs=[x, y],
        outputs=[prediction, xent],
        updates={w: w - 0.01*gw, b: b - 0.01*gb},
        name="train")
predict = theano.function(inputs=[x], outputs=prediction,
                          name="predict")

if any([x.op.__class__.__name__ == 'Gemv' for x in
        train.maker.fgraph.toposort()]):
    print 'Used the cpu'
elif any([x.op.__class__.__name__ == 'GpuGemm' for x in
          train.maker.fgraph.toposort()]):
    print 'Used the gpu'
else:
    print 'ERROR, not able to tell if theano used the cpu or the gpu'
    print train.maker.fgraph.toposort()

for i in range(training_steps):
    pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
# Print the graph used in the slides
theano.printing.pydotprint(predict,
                           outfile="pics/logreg_pydotprint_predic.png",
                           var_with_name_simple=True)
theano.printing.pydotprint_variables(prediction,
                                     outfile="pics/logreg_pydotprint_prediction.png",
                                     var_with_name_simple=True)
theano.printing.pydotprint(train,
                           outfile="pics/logreg_pydotprint_train.png",
                           var_with_name_simple=True)
Modify and execute the example to run on CPU with floatX=float32.
You will need to use theano.config.floatX and ndarray.astype() with the dtype given as a string (a sketch follows).
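A small sketch of those two pieces (the array shape and variable name are just for illustration):

import numpy
import theano

data = numpy.random.randn(400, 784)        # float64 by default
data = data.astype(theano.config.floatX)   # cast to the configured dtype, e.g. float32
print data.dtype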
GPU¶
Only 32 bit floats are supported (being worked on)
Only 1 GPU per process
Use the Theano flag device=gpu to request the GPU device.
Use device=gpu{0, 1, ...} to specify which GPU if you have more than one.
Shared variables with float32 dtype are by default moved to the GPU memory space.
Use the Theano flag floatX=float32.
Be sure to use floatX (theano.config.floatX) in your code.
Cast inputs before putting them into a shared variable.
Cast "problem": int32 combined with float32 upcasts to float64.
A new casting mechanism is being developed.
Insert a manual cast in your code or use [u]int{8,16}.
Insert a manual cast around the mean operator (which involves a division by the length, which is an int64!); see the sketch below.
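As a sketch of the last point (variable names are ours; T.cast is the Theano helper for explicit casts):

import theano
import theano.tensor as T

x = T.fmatrix("x")                          # float32 matrix
m = T.cast(x.mean(), theano.config.floatX)  # without the cast, the mean can end up in float64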
Exercise 3¶
Modify and execute the example of Exercise 2 to run with floatX=float32 on GPU
Time with:
time python file.py
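For instance, combining the flags seen earlier with the timing command (adjust the device to your setup):

THEANO_FLAGS=floatX=float32,device=gpu time python file.py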
Symbolic variables¶
Dimensions: T.scalar, T.vector, T.matrix, T.tensor3, T.tensor4
Dtype: T.[fdczbwil]vector (float32, float64, complex64, complex128, int8, int16, int32, int64)
T.vector defaults to the floatX dtype.
floatX: configurable dtype that can be float32 or float64.
Custom variables: all of the above are shortcuts for
T.tensor(dtype, broadcastable=[False]*nd)
Other dtypes: uint[8,16,32,64], floatX (examples below)
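A few creation examples to make the shortcuts concrete (the variable names are arbitrary):

import theano.tensor as T

v = T.vector("v")         # dtype = theano.config.floatX
vf = T.fvector("vf")      # float32 vector
vi = T.ivector("vi")      # int32 vector
m = T.matrix("m")         # 2-d, floatX
t = T.tensor("int32", broadcastable=[False, False])   # the generic form: here an int32 matrix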
Creating symbolic variables: Broadcastability
Remember what I said about broadcasting?
How do you add a row to all rows of a matrix?
How do you add a column to all columns of a matrix? (See the sketch after this list.)
Details regarding symbolic broadcasting:
Broadcastability must be specified when creating the variable.
The only shortcuts with broadcastable dimensions are T.row and T.col.
For all others, use
T.tensor(dtype, broadcastable=...), where broadcastable has one True/False entry per dimension.
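A sketch answering the two questions above with the T.row and T.col shortcuts (data and names are our choice):

import numpy
import theano
import theano.tensor as T

m = T.matrix("m")
r = T.row("r")   # broadcastable pattern (True, False): shape 1 x columns
c = T.col("c")   # broadcastable pattern (False, True): shape rows x 1

add_row = theano.function([m, r], m + r)   # r is broadcast over every row of m
add_col = theano.function([m, c], m + c)   # c is broadcast over every column of m

mat = numpy.zeros((3, 4), dtype=theano.config.floatX)
row = numpy.arange(4, dtype=theano.config.floatX).reshape(1, 4)
col = numpy.arange(3, dtype=theano.config.floatX).reshape(3, 1)
print add_row(mat, row)
print add_col(mat, col)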
Differentiation details¶
>>> gw, gb = T.grad(cost, [w, b])
T.grad works symbolically: takes and returns a Theano variable
T.grad can be compared to a macro: it can be applied multiple times (see the sketch after this list)
T.grad takes scalar costs only
A simple recipe allows vector x Jacobian and vector x Hessian products to be computed efficiently
We are working on the missing optimizations to compute the full Jacobian, the full Hessian, and Jacobian x vector products efficiently
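A tiny sketch of the "macro" behaviour, applying T.grad to its own output (the toy function and names are ours):

import theano
import theano.tensor as T

x = T.dscalar("x")
y = x ** 3
gy = T.grad(y, x)            # symbolic dy/dx = 3*x**2
ggy = T.grad(gy, x)          # applied again: d2y/dx2 = 6*x
f = theano.function([x], [gy, ggy])
print f(3.0)                 # [array(27.0), array(18.0)]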
Benchmarks¶
Multi-Layer Perceptron:
60x784 matrix times 784x500 matrix, tanh, times 500x10 matrix, elemwise, then all in reverse for backpropagation

Convolutional Network:
256x256 images convolved with 6 7x7 filters, downsampled to 6x50x50, tanh, convolution with 16 6x7x7 filters, elementwise tanh, matrix multiply, elementwise softmax, then all in reverse
[Figure: Convolutional Network benchmark results]
Elemwise (all on CPU): solid blue = Theano, dashed red = numexpr (without MKL)
[Figure: elementwise benchmark results]