We recommend that everybody update to this version.
0.6rc4 was skipped for a technical reason.
Committers since 0.5:
Frederic Bastien, Pascal Lamblin, Ian Goodfellow, Olivier Delalleau, Razvan Pascanu, abalkin, Arnaud Bergeron, Nicolas Bouchard +, Jeremiah Lowin +, Matthew Rocklin, Eric Larsen +, James Bergstra, David Warde-Farley, John Salvatier +, Vivek Kulkarni +, Yann N. Dauphin, Ludwig Schmidt-Hackenberg +, Gabe Schwartz +, Rami Al-Rfou’ +, Guillaume Desjardins, Caglar +, Sigurd Spieckermann +, Steven Pigeon +, Bogdan Budescu +, Jey Kottalam +, Mehdi Mirza +, Alexander Belopolsky +, Ethan Buchman +, Jason Yosinski, Nicolas Pinto +, Sina Honari +, Ben McCann +, Graham Taylor, Hani Almousli, Ilya Dyachenko +, Jan Schlüter +, Jorg Bornschein +, Micky Latowicki +, Yaroslav Halchenko +, Eric Hunsberger +, Amir Elaguizy +, Hannes Schulz +, Huy Nguyen +, Ilan Schnell +, Li Yao, Misha Denil +, Robert Kern +, Sebastian Berg +, Vincent Dumoulin +, Wei Li +, XterNalz +
A total of 51 people contributed to this release. People with a “+” by their names contributed a patch for the first time.
We recommend that everybody update to this version.
We plan to release 0.6 in one week if no problems are introduced with this release candidate.
Theano 0.6rc4 was skipped due to a problem with PyPI.
Committers for this rc5 only:
Frederic Bastien, Pascal Lamblin, Arnaud Bergeron, abalkin, Olivier Delalleau, John Salvatier, Razvan Pascanu, Jeremiah Lowin, Ludwig Schmidt-Hackenberg +, Vivek Kulkarni, Matthew Rocklin, Gabe Schwartz, James Bergstra, Sigurd Spieckermann +, Bogdan Budescu +, Mehdi Mirza +, Nicolas Bouchard, Ethan Buchman +, Guillaume Desjardins, Ian Goodfellow, Jason Yosinski, Sina Honari +, Ben McCann +, David Warde-Farley, Ilya Dyachenko +, Jan Schlüter +, Micky Latowicki +, Yaroslav Halchenko +, Alexander Belopolsky, Hannes Schulz +, Huy Nguyen +, Robert Kern +, Sebastian Berg +, Vincent Dumoulin +, Wei Li +, XterNalz +
A total of 36 people contributed to this release. People with a “+” by their names contributed a patch for the first time.
Windows related fixes.
Speed-ups.
Crash fixes.
A few small interface changes.
GPU memory leak fix.
A few corner-case fixes with no practical impact.
More deterministic behavior in Theano.
tensor.{dot,tensordot} are more complete, faster, and more GPU-friendly.
tensor.tensordot now supports Rop/Lop.
tensor.dot supports n-dimensional inputs, as in NumPy.
Committers for this rc3 only: Frederic Bastien, Ian Goodfellow, Pascal Lamblin, Jeremiah Lowin, abalkin, Olivier Delalleau, Razvan Pascanu, Rami Al-Rfou’, Vivek Kulkarni, Guillaume Desjardins, David Warde-Farley, Eric Hunsberger, Amir Elaguizy, James Bergstra
tensor.tensordot now supports Rop/Lop (Jeremiah Lowin). This removes the TensorDot and TensorDotGrad classes; the Dot/Elemwise ops are used instead.
tensor.dot supports n-dimensional inputs, as in NumPy (Jeremiah Lowin). It works on the GPU too.
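A minimal sketch of the NumPy-style n-dimensional dot behavior (variable names are illustrative; assumes a working Theano install):

    import numpy as np
    import theano
    import theano.tensor as T

    # dot of a 3-d tensor with a matrix follows NumPy's rule: contract the
    # last axis of the first argument with the first axis of a 2-d second
    # argument.
    x = T.tensor3('x')   # shape (a, b, c)
    y = T.matrix('y')    # shape (c, d)
    z = T.dot(x, y)      # shape (a, b, d), as with numpy.dot
    f = theano.function([x, y], z)

    a = np.random.rand(2, 3, 4).astype(theano.config.floatX)
    b = np.random.rand(4, 5).astype(theano.config.floatX)
    assert np.allclose(f(a, b), np.dot(a, b))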
The Theano flag nvcc.flags now accepts -ftz=true, --prec-div=false and --prec-sqrt=false as values. (Frederic B.) To enable all of them, use the Theano flag nvcc.flags=--use_fast_math.
New op theano.sparse.ConstructSparseFromList. (Rami Al-Rfou’, Vivek Kulkarni)
Make Theano work with Anaconda on Windows. (Pascal L.)
Add tensor_var.diagonal and theano.tensor.{diag,diagonal}. (abalkin)
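A small illustrative sketch of the new diagonal helpers (names are only examples):

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    d = x.diagonal()               # equivalent to theano.tensor.diagonal(x)
    f = theano.function([x], d)

    m = np.arange(9).reshape(3, 3).astype(theano.config.floatX)
    print(f(m))                    # the diagonal: 0, 4, 8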
AdvancedSubtensor1 can now have a sparse gradient. (Rami Al-Rfou’, Vivek Kulkarni)
Implemented GpuContiguous.grad. (Ian G.)
c_code for SpecifyShape op. (Frederic B.)
The cross-entropy optimization now works when specify_shape is used. (Pascal L.)
The Scan optimizations ScanSaveMem and PushOutDot1 are now applied more frequently. (Razvan P., reported by abalkin) Previously, a skipped-optimization warning was printed.
dot(vector, vector) is now faster with some BLAS implementations. (Eric Hunsberger) OpenBLAS, and possibly others, did not call {s,d}dot internally when we called {s,d}gemv; MKL did.
Compilation speed-up: take the compiledir lock only for ops that generate c_code. (Frederic B.)
c_code for theano.sparse.AddSD. (Rami Al-Rfou’, Vivek Kulkarni)
Committers for this rc2 only: Razvan Pascanu, Pascal Lamblin, Frederic Bastien, Ian Goodfellow, Jeremiah Lowin, Caglar Gulcehre, Jey Kottalam, Matthew Rocklin, abalkin
Fix a crash related to scan.grad due to the new mechanism. (Ian G.)
Fix an optimization warning. Now it gets optimized. (Frederic B.)
Fix crash introduced in 0.6rc1 in theano.grad (Ian G.)
Fix crash introduced in 0.6rc1 in the grad of scan (Razvan P.)
Fix crash introduced in 0.6rc1 in the grad of clip (Ian G.). The gradient with respect to the min/max bounds is now implemented as well.
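A small illustrative sketch of taking the clip gradient, including the gradients with respect to the bounds (variable names are only examples):

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.vector('x')
    lo = T.scalar('lo')
    hi = T.scalar('hi')
    cost = T.clip(x, lo, hi).sum()

    # Gradients are also defined for the min/max bounds.
    gx, glo, ghi = T.grad(cost, [x, lo, hi])
    f = theano.function([x, lo, hi], [gx, glo, ghi])

    xv = np.array([-1.0, 0.5, 2.0], dtype=theano.config.floatX)
    print(f(xv, np.asarray(0.0, dtype=theano.config.floatX),
            np.asarray(1.0, dtype=theano.config.floatX)))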
Fix crash in the grad of tensor.switch for int (Ian G.)
Fix crash when mixing shared variable on the GPU and sparse dot. (Pascal L.)
Fix a crash: sparse.dot would sometimes return a dtype number that is equivalent to, but not the same as, the expected one. (Pascal L., reported by Rami Al-Rfou)
Better error messages. (Ian G.)
Move all sparse random functions back to the sandbox, as they do not have a state inside Theano. (Pascal L.) They had been moved out of the sandbox in 0.6rc1.
LoadFromDisk is now restricted to the memmap modes it supports. (Pascal L.) Other modes were causing errors, segmentation faults, or wrong results.
Fix a crash during optimization when taking a subtensor of a constant with a non-constant index. (Ian G.)
Better handling of, and error messages for, gradients with respect to integer inputs. (Ian G.)
Fixed a crash where Scan assumed all TypeErrors raised by the grad function were due to undefined gradients (Ian G.)
Outputs of Scan nodes could contain corrupted values: some parts of the output would be repeated a second time, instead of the correct values. It happened randomly, and quite infrequently, but the bug has been present (both in Python and Cython) since April 2011. (Pascal L.)
In Sparse sandbox, fix the grad of theano.sparse.sandbox.sp.row_scale. It did not return the right number of elements. (Frederic B.)
set_subtensor(x[int vector], new_value) when moved to the GPU was transformed into inc_subtensor on the GPU. Now we have a correct (but slow) GPU implementation. Note 1: set_subtensor(x[slice[,...]], new_value) was working correctly in all cases as well as all inc_subtensor. Note 2: If your code was affected by the incorrect behavior, we now print a warning by default (Frederic B.)
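A hedged sketch of the pattern in question, set_subtensor with an integer-vector index (variable names are illustrative):

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    idx = T.ivector('idx')            # integer vector of row indices
    new_value = T.matrix('new_value')

    # set_subtensor with an integer-vector index: the pattern whose GPU
    # transfer previously could be turned into inc_subtensor.
    y = T.set_subtensor(x[idx], new_value)
    f = theano.function([x, idx, new_value], y)

    a = np.zeros((4, 3), dtype=theano.config.floatX)
    rows = np.array([0, 2], dtype='int32')
    vals = np.ones((2, 3), dtype=theano.config.floatX)
    print(f(a, rows, vals))           # rows 0 and 2 are set to ones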
Fixed an issue whereby config values were used as default arguments, with those defaults then stuck at old values if the config variables were changed during program execution. (David W-F)
Fixed many subtle bugs involving mutable default arguments which may have led to unexpected behavior, such as objects sharing instance variables they were not supposed to share. (David W-F)
Correctly record the GPU device number used when we let the driver select it. (Frederic B.)
Min, max with NaN in inputs did not return the right output. (Pascal L.)
The grad of TensorDot was returning the wrong shape for some combinations of axes. We now raise NotImplementedError in those cases. (Frederic B.)
The theano.sparse.CSMGrad op (generated by the grad of CSM) did not correctly handle unsorted input, or a gradient that is sparser than the input; in those cases a wrong result was returned. This could only happen when a sparse input of a Theano function was not sorted, which happens for example with sparse advanced indexing from SciPy, and usually ended up producing NaN in the graph. (Yann Dauphin)
UsmmCSCDense, the optimized version of theano.sparse._dot(CSC matrix, dense), did not correctly handle non-contiguous inputs/outputs. (Pascal L.)
Fix a corner case in CVM updates. (Pascal L.) This happened when, after optimization, the update expression for a shared variable was the shared variable itself. The CVM was not used by default.
Fix the view_map of sparse.Transpose and sparse.sandbox.sp.RowScale. (Frederic B.) This probably did not cause problems, as only the UsmmCscDense op (the optimized call to Usmm with a CSC matrix) could interfere with them.
Convolution on the GPU now checks the generation of the card to make it faster in some cases (especially with medium/big output images). (Frederic B.) We had hard-coded 512 as the maximum number of threads per block; newer cards support up to 1024 threads per block.
Faster GpuAdvancedSubtensor1, GpuSubtensor, GpuAlloc (Frederic B.)
We now pass the GPU architecture to nvcc when compiling (Frederic B.)
Now we use the GPU function async feature by default. (Frederic B.) Set the environment variable CUDA_LAUNCH_BLOCKING to 1 to disable this for profiling or debugging.
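One way to disable the asynchronous launches for profiling or debugging is to set the environment variable before the CUDA context is initialized; a sketch (it can equally be exported from the shell):

    import os

    # Must be set before the GPU is initialized, i.e. before importing
    # theano with device=gpu* or importing theano.sandbox.cuda.
    os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

    import theano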
Faster creation of CudaNdarray objects (Frederic B.)
Now some Max reductions are implemented on the GPU. (Ian G.)
sparse.remove0 (Frederic B., Nicolas B.)
sparse.{col_scale,row_scale,ensure_sorted_indices,clean} (Nicolas B.)
sparse.{diag,square_diagonal} (Nicolas B.)
Support for uint* dtype.
Implement theano.sparse.mul(sparse1, sparse2) when both inputs don't have the same sparsity pattern. (Frederic B.) A usage sketch follows the sparse entries below.
New Ops: sparse.{expm1,deg2rad,rad2deg,trunc} (Nicolas B.)
New Ops: sparse.{sqrt,sqr,log1p,floor,ceil,sgn,round_half_to_even} (Nicolas B.)
New Ops: sparse.{arctanh,tanh,arcsinh,sinh,arctan,arcsin,tan,sin} (Nicolas B.)
New Op: sparse.mul_s_v multiplication of sparse matrix by broadcasted vector (Yann D.)
New Op class: SamplingDot. (Yann D., Nicolas B.) Optimized versions: SamplingDotCsr, StructuredDotCSC. Optimizations that insert the optimized versions: local_sampling_dot_csr, local_structured_add_s_v.
New Ops: sparse.{Multinomial,Poisson,Binomial} (Yann D., Nicolas B.)
Implement the CSMProperties grad method (Yann Dauphin)
Move optimizations to theano/sparse/opt.py (Nicolas B.)
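A minimal sketch of element-wise multiplication of two sparse matrices with different sparsity patterns, using theano.sparse.mul as described above (values and names are illustrative; assumes SciPy is installed):

    import numpy as np
    import scipy.sparse as sp
    import theano
    import theano.sparse

    x = theano.sparse.csr_matrix('x')
    y = theano.sparse.csr_matrix('y')
    z = theano.sparse.mul(x, y)       # element-wise product
    f = theano.function([x, y], z)

    a = sp.csr_matrix(np.array([[1, 0], [0, 2]], dtype=theano.config.floatX))
    b = sp.csr_matrix(np.array([[3, 4], [0, 0]], dtype=theano.config.floatX))
    print(f(a, b).toarray())          # [[3. 0.] [0. 0.]]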
New flag profile_optimizer. (Frederic B.) When profile=True, it also prints the time spent in each optimizer. Useful to find optimization bottlenecks.
New flag cmodule.remove_gxx_opt. (Frederic B.) If True, removes the -O* parameters passed to g++. This is useful for debugging modules compiled by Theano in gdb. The -g parameter is passed to g++ by default.
New flag cmodule.compilation_warning. If True, prints compilation warnings.
New flag allow_gc. (Frederic B.) When False, intermediate results are not garbage-collected when they are no longer needed. This uses more memory, but allocates memory less frequently, so it is faster.
New flag vm.lazy. (Frederic B.) Useful only for the vm linkers. When lazy is None, auto-detect whether lazy evaluation is needed and use the appropriate version. If lazy is True/False, force the version used between Loop/LoopGC and Stack.
New flag cxx. This is the C++ compiler to use. If empty, C code is not compiled. (Frederic B.)
New flag print_active_device, which defaults to True. (Matthew R.)
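A hedged sketch of setting several of these flags through the THEANO_FLAGS environment variable before importing Theano (the values shown are only examples):

    import os

    # Any subset of flags can be combined, separated by commas.
    os.environ['THEANO_FLAGS'] = ','.join([
        'profile=True',
        'profile_optimizer=True',       # also time each optimizer
        'cmodule.remove_gxx_opt=True',  # drop -O* so gdb sees useful symbols
        'allow_gc=False',               # trade memory for fewer allocations
    ])

    import theano
    print(theano.config.profile_optimizer)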
Do not try to use the BLAS library when blas.ldflags is manually set to an empty string (Frederic B., Pascal L.)
Fix a problem when importing theano on a computer without a GPU, with the Theano flag ‘device’ or ‘init_gpu_device’ set to gpu*. (Frederic B., reported by Luo Heng)
An optimization printed a useless error when SciPy was not available. (Frederic B.)
GPU conv crash/slowdown on newer hardware (James B.)
Better error handling in GPU conv (Frederic B.)
GPU optimization that moves element-wise Ops to the GPU. Crash happened in a particular execution order of this optimization and the element-wise fusion optimization when upcasting some inputs to float32 (to compute them on the GPU). (Frederic B., reported by Sander Dieleman)
GpuReshape in some particular case when the input is not contiguous (Frederic B., reported by Sander Dieleman)
GpuSoftmaxWithBias with shape (0, N) with N > 1. (Frederic B., reported by Razvan P.)
Fix crash under 64-bit Windows, when taking subtensors of the form a[n:] (Pascal L., reported by Simon McGregor)
Fixed issue with the MaxAndArgmax Op not properly preserving broadcastable dimensions, which could typically result in optimization crashes (Olivier D.)
Fixed crash when concatenating some arrays with specific broadcasting patterns (Olivier D.)
Work around a known issue with nvcc 4.1 on MacOS X. (Graham Taylor)
In advanced indexing, if some inputs are constant, no need to call constant(...) on their value any more. (Pascal L., reported by John Salvatier)
Fix crash on GPU when GpuSubtensor did not set the right stride when the resulting tensor had a dimension of size 1. (Pascal L., reported by Graham T.)
Fix scan crash that made it not run on the GPU in one case. (Guillaume D.)
No longer crash when taking the grad with respect to a random state. (Razvan P.)
GpuDownsampleFactorMax and its grad with input dimensions 0 and 1 bigger than 65535. (Frederic B., reported by Gabe Schwartz)
Potential crash due to parallel compilation when importing theano.sandbox.cuda (Olivier D.)
Crash fix on python 2.4 with slicing. (Pascal L.)
grad of argmin and argmax (Razvan P.)
Don't compute the Rop for shared variables with updates (mostly random ones). We don't use them and they caused crashes. (Razvan P.)
MaxAndArgmax.grad() when one of the gradients it receives is None. (Razvan P., reported by Mark Fenner)
Fix crash in GpuSum when some dimension's shape was 0. (Frederic B.)