
Performance

Theano uses several tricks to obtain good performance:
  • common sub-expression elimination
  • custom-generated C code for many operations
  • pre-allocation of temporary storage
  • loop fusion, i.e. merging consecutive element-wise loops into a single pass (which gcc normally can’t do)
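Two of these tricks can be illustrated with a toy pure-Python sketch. It is not Theano's implementation: the tuple-based graph, `OPS`, `evaluate`, `unfused` and `fused` are all hypothetical names chosen for illustration. The sketch counts operations rather than measuring speed. The cache makes a repeated sub-expression such as `exp(x*y)` get computed once, and the fused loop produces the same result as two element-wise passes without building a temporary list.

```python
import math

# Hypothetical toy expression graph (not Theano's classes):
# a node is ("var", name) or (op, child, ...).
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "exp": math.exp,
}


def evaluate(node, env, counter, cache=None):
    """Evaluate `node`; if `cache` is a dict, identical sub-expressions are computed once."""
    if cache is not None and node in cache:
        return cache[node]  # common sub-expression: reuse the earlier result
    if node[0] == "var":
        value = env[node[1]]
    else:
        args = [evaluate(child, env, counter, cache) for child in node[1:]]
        counter["ops"] += 1  # count operations actually executed
        value = OPS[node[0]](*args)
    if cache is not None:
        cache[node] = value
    return value


def unfused(xs, ys):
    """Two element-wise passes with a temporary list in between."""
    tmp = [x * y for x, y in zip(xs, ys)]  # loop 1 writes a temporary
    return [t + 1.0 for t in tmp]          # loop 2 reads it back


def fused(xs, ys):
    """One pass computing x * y + 1 per element; no temporary is stored."""
    return [x * y + 1.0 for x, y in zip(xs, ys)]


x, y = ("var", "x"), ("var", "y")
# exp(x*y) + exp(x*y), with the two subtrees built separately
expr = ("add", ("exp", ("mul", x, y)), ("exp", ("mul", x, y)))
env = {"x": 0.5, "y": 2.0}

naive = {"ops": 0}
with_cse = {"ops": 0}
v1 = evaluate(expr, env, naive)
v2 = evaluate(expr, env, with_cse, cache={})
print(naive["ops"], with_cse["ops"])  # 5 3
```

Because tuples compare by structure, the two separately built `exp(x*y)` subtrees hit the same cache entry. In compiled code the benefit of fusion comes from making one pass over memory instead of two and from skipping the intermediate buffer.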

In the neural-net experiments for my course projects, Theano gave me roughly a 10x speedup over plain NumPy. [More specific speed tests would be nice.]

With a little work, Theano could also implement more sophisticated optimizations:

  • automatic ordering of matrix multiplications
  • profile-based memory layout decisions (e.g. row-major vs. column-major)
  • gcc intrinsics for MMX/SSE2 (SIMD) parallelism, to speed up element-wise arithmetic
  • conditional expressions
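Automatic ordering of matrix multiplications is the classic matrix-chain problem. Here is a minimal sketch of the standard dynamic-programming solution. It assumes only the matrix shapes are known and cost is counted in scalar multiplications. It is not something Theano is documented to do, and `matrix_chain_order` is a hypothetical helper name.

```python
from functools import lru_cache


def matrix_chain_order(dims):
    """Matrix i has shape dims[i] x dims[i + 1].

    Returns (min scalar multiplications, parenthesization string).
    """
    n = len(dims) - 1  # number of matrices in the chain

    @lru_cache(maxsize=None)
    def best(i, j):
        # Cheapest way to compute the product of matrices i..j inclusive.
        if i == j:
            return 0, f"M{i}"
        options = []
        for k in range(i, j):  # try splitting after matrix k
            left_cost, left = best(i, k)
            right_cost, right = best(k + 1, j)
            # multiplying a (dims[i] x dims[k+1]) by a (dims[k+1] x dims[j+1]) matrix
            cost = left_cost + right_cost + dims[i] * dims[k + 1] * dims[j + 1]
            options.append((cost, f"({left} {right})"))
        return min(options)

    return best(0, n - 1)


# M0: 10x100, M1: 100x5, M2: 5x50
print(matrix_chain_order([10, 100, 5, 50]))  # (7500, '((M0 M1) M2)')
```

For this chain, `(M0 M1) M2` costs 7,500 multiplications while `M0 (M1 M2)` costs 75,000. The shapes alone decide the better order, so a compiler that knows them can pick it automatically.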