Committers for this rc2 only: Razvan Pascanu, Pascal Lamblin, Frederic Bastien, Ian Goodfellow, Jeremiah Lowin, Caglar Gulcehre, Jey Kottalam, Matthew Rocklin, abalkin
Fix a crash related to scan.grad due to the new mechanism. (Ian G.)
Fix an optimization warning; the affected case now gets optimized. (Frederic B.)
Fix crash introduced in 0.6rc1 in theano.grad (Ian G.)
Fix crash introduced in 0.6rc1 in the grad of scan (Razvan P.)
Fix crash introduced in 0.6rc1 in the grad of clip; also implement the gradient with respect to the min/max bounds. (Ian G.)
Fix crash in the grad of tensor.switch for integer types (Ian G.)
Fix crash when mixing a shared variable on the GPU with sparse dot. (Pascal L.)
Fix a crash caused by sparse.dot sometimes returning a dtype that was equivalent to, but not the same as, the expected one. (Pascal L., reported by Rami Al-Rfou)
Better error messages (Ian G.)
Move all sparse random functions back to the sandbox, as they do not have a state inside Theano. (Pascal L.) They had been moved out of the sandbox in 0.6rc1.
LoadFromDisk is now restricted to supporting only some memmap modes. (Pascal L.) Other modes were causing errors, segmentation faults, or wrong results.
Fix a crash during optimization when taking a subtensor of a constant with a non-constant index. (Ian G.)
Better handling of, and error messages for, gradients with respect to integer-typed variables. (Ian G.)
Fix a crash where Scan assumed all TypeErrors raised by the grad function were due to undefined gradients. (Ian G.)
Outputs of Scan nodes could contain corrupted values: some parts of the output would be repeated a second time, instead of the correct values. It happened randomly, and quite infrequently, but the bug has been present (both in Python and Cython) since April 2011. (Pascal L.)
In Sparse sandbox, fix the grad of theano.sparse.sandbox.sp.row_scale. It did not return the right number of elements. (Frederic B.)
set_subtensor(x[int vector], new_value), when moved to the GPU, was transformed into inc_subtensor on the GPU. We now have a correct (but slow) GPU implementation. Note 1: set_subtensor(x[slice[,...]], new_value) was working correctly in all cases, as were all inc_subtensor cases. Note 2: if your code was affected by the incorrect behavior, we now print a warning by default. (Frederic B.)
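A minimal sketch of the affected pattern, with illustrative variable names (not taken from the release notes): set_subtensor with an integer-vector index, which now also has a correct GPU implementation:

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix("x")
    idx = T.ivector("idx")              # the "int vector" index mentioned above
    new_value = T.matrix("new_value")

    # Replace the rows of x selected by idx with new_value.
    y = T.set_subtensor(x[idx], new_value)
    f = theano.function([x, idx, new_value], y)

    out = f(np.zeros((4, 3), dtype=theano.config.floatX),
            np.array([0, 2], dtype="int32"),
            np.ones((2, 3), dtype=theano.config.floatX))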
Fixed an issue whereby config values were used as default arguments, with those defaults then stuck at old values if the config variables were changed during program execution. (David W-F)
Fixed many subtle bugs involving mutable default arguments which may have led to unexpected behavior, such as objects sharing instance variables they were not supposed to share. (David W-F)
Correctly record the GPU device number used when we let the driver select it. (Frederic B.)
Min, max with NaN in inputs did not return the right output. (Pascal L.)
The grad of TensorDot was returning the wrong shape for some combinations of axes. We now raise NotImplementedError in those cases. (Frederic B.)
The theano.sparse.CSMGrad op (generated by the grad of CSM) did not correctly handle unsorted inputs or gradients that are sparser than the input; in those cases a bad result was returned. This could only happen when a sparse input of a Theano function was not sorted, which happens for example with sparse advanced indexing from scipy, and it usually resulted in NaN in the graph. (Yann Dauphin)
UsmmCSCDense, the optimized version of theano.sparse._dot(CSC matrix, dense), did not correctly handle non-contiguous inputs/outputs. (Pascal L.)
Fix a corner case in CVM updates. (Pascal L.) This happened when, after optimization, the update to a shared variable was the shared variable itself. The CVM was not used by default.
Fix the view_map of sparse.Transpose and sparse.sandbox.sp.RowScale. (Frederic B.) This probably did not cause problems, as only the UsmmCscDense op (used when Usmm is called with a CSC matrix) could interfere with them.
Faster GpuAdvancedSubtensor1, GpuSubtensor, GpuAlloc (Frederic B.)
We now pass the GPU architecture to nvcc when compiling (Frederic B.)
We now use the GPU's asynchronous kernel-launch feature by default. (Frederic B.) Set the environment variable CUDA_LAUNCH_BLOCKING to 1 to disable this for profiling or debugging.
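A minimal sketch of how to disable the asynchronous launches for profiling or debugging; it assumes the environment variable is set before Theano and its GPU module are imported:

    import os

    # Force synchronous kernel launches so timings and GPU errors are attributed
    # to the op that actually triggered them.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import theano  # import (and GPU initialization) happens after the variable is set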
Faster creation of CudaNdarray objects (Frederic B.)
Now some Max reductions are implemented on the GPU. (Ian G.)
sparse.remove0 (Frederic B., Nicolas B.)
sparse.{col_scale,row_scale,ensure_sorted_indices,clean} (Nicolas B.)
sparse.{diag,square_diagonal} (Nicolas B.)
Support for uint* dtype.
Implement theano.sparse.mul(sparse1, sparse2) for the case where the two inputs do not have the same sparsity pattern. (Frederic B.)
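A minimal usage sketch of this, with illustrative values: it builds two CSR inputs with different sparsity patterns and multiplies them element-wise:

    import numpy as np
    import scipy.sparse
    import theano
    import theano.sparse as sparse

    x = sparse.csr_matrix("x")
    y = sparse.csr_matrix("y")

    # Element-wise multiplication of two sparse matrices whose sparsity
    # patterns differ.
    f = theano.function([x, y], sparse.mul(x, y))

    a = scipy.sparse.csr_matrix(
        np.array([[1, 0], [0, 2]], dtype=theano.config.floatX))
    b = scipy.sparse.csr_matrix(
        np.array([[0, 3], [0, 4]], dtype=theano.config.floatX))
    print(f(a, b).toarray())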
New Ops: sparse.{expm1,deg2rad,rad2deg,trunc} (Nicolas B.)
New Ops: sparse.{sqrt,sqr,log1p,floor,ceil,sgn,round_half_to_even} (Nicolas B.)
New Ops: sparse.{arctanh,tanh,arcsinh,sinh,arctan,arcsin,tan,sin} (Nicolas B.)
New Op: sparse.mul_s_v, multiplication of a sparse matrix by a broadcasted vector (Yann D.)
Op class: SamplingDot (Yann D., Nicolas B.). Optimized version: SamplingDotCsr, StructuredDotCSC. Optimizations to insert the optimized version: local_sampling_dot_csr, local_structured_add_s_v.
New Ops: sparse.{Multinomial,Poisson,Binomial} (Yann D., Nicolas B.)
Implement the CSMProperties grad method (Yann Dauphin)
Move optimizations to theano/sparse/opt.py (Nicolas B.)
new flag “profile_optimizer” (Frederic B.): when profile=True, this also prints the time spent in each optimizer. Useful to find optimization bottlenecks.
new flag “cmodule.remove_gxx_opt” (Frederic B.): if True, remove the -O* parameters passed to g++. This is useful for debugging modules compiled by Theano in gdb. The -g parameter is passed to g++ by default.
new flag cmodule.compilation_warning: if True, print compilation warnings.
new flag allow_gc (Frederic B.): when False, do not garbage collect intermediate results when they are no longer needed. This uses more memory, but allocates memory less frequently, so it is faster.
new flag vm.lazy (Frederic B.): useful only for the vm linkers. When lazy is None, automatically detect whether lazy evaluation is needed and use the appropriate version. If lazy is True/False, force the version used between Loop/LoopGC and Stack.
new flag cxx: the C++ compiler to use. If empty, do not compile C code. (Frederic B.)
New flag print_active_device that defaults to True. (Matthew R.)
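A minimal sketch of how a couple of the new flags above could be used; it assumes the flags are set through the THEANO_FLAGS environment variable before Theano is imported, and the computation itself is only illustrative:

    import os

    os.environ["THEANO_FLAGS"] = ",".join([
        "profile=True",            # enable the profiler
        "profile_optimizer=True",  # new flag: also report time spent per optimizer
        "allow_gc=False",          # new flag: keep intermediate buffers allocated
    ])

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix("x")
    f = theano.function([x], T.exp(x).sum())
    f(np.ones((2, 2), dtype=theano.config.floatX))  # profiling summary is printed at process exit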
Do not try to use the BLAS library when blas.ldflags is manually set to an empty string (Frederic B., Pascal L.)
Fix a crash when importing theano on a computer without a GPU, with the Theano flags ‘device’ or ‘init_gpu_device’ set to gpu* (Frederic B., reported by Luo Heng)
Fix an optimization that printed a useless error when scipy was not available. (Frederic B.)
Fix a GPU conv crash/slowdown on newer hardware (James B.)
Better error handling in GPU conv (Frederic B.)
Fix a crash in the GPU optimization that moves element-wise Ops to the GPU. The crash happened with a particular execution order of this optimization and the element-wise fusion optimization, when upcasting some inputs to float32 (to compute them on the GPU). (Frederic B., reported by Sander Dieleman)
Fix GpuReshape in a particular case when the input is not contiguous (Frederic B., reported by Sander Dieleman)
Fix GpuSoftmaxWithBias with shape (0, N), N > 1. (Frederic B., reported by Razvan P.)
Fix crash under 64-bit Windows, when taking subtensors of the form a[n:] (Pascal L., reported by Simon McGregor)
Fixed issue with the MaxAndArgmax Op not properly preserving broadcastable dimensions, which could typically result in optimization crashes (Olivier D.)
Fixed crash when concatenating some arrays with specific broadcasting patterns (Olivier D.)
Work around a known issue with nvcc 4.1 on MacOS X. (Graham Taylor)
In advanced indexing, if some inputs are constant, no need to call constant(...) on their value any more. (Pascal L., reported by John Salvatier)
Fix crash on GPU when GpuSubtensor did not set the right stride for result tensors with a dimension of size 1. (Pascal L., reported by Graham T.)
Fix scan crash that made it not run on the GPU in one case. (Guillaume D.)
Don’t crash when taking the grad of a random state again. (Razvan P.)
Fix GpuDownsampleFactorMax and its grad when input dimensions 0 and 1 are bigger than 65535. (Frederic B., reported by Gabe Schwartz)
Fix a potential crash due to parallel compilation when importing theano.sandbox.cuda (Olivier D.)
Crash fix on python 2.4 with slicing. (Pascal L.)
Fix the grad of argmin and argmax (Razvan P.)
Don’t compute the Rop for shared variables with updates (mostly random ones). We don’t use them, and they caused crashes. (Razvan P.)
Fix MaxAndArgmax.grad() when one of the gradients it receives is None. (Razvan P., reported by Mark Fenner)
Fix crash of GpuSum when some dimensions shape was 0. (Frederic B.)