[WIP] A different approach to benchmarking (#9)
* mLSTM baseline
Signed-off-by: Edward Yang <ezyang@fb.com>
* Use the CUDA event API for timing
Signed-off-by: Edward Yang <ezyang@fb.com>
* More robust CPU pinning and CPU frequency governor checking in the script.
Signed-off-by: Edward Yang <ezyang@fb.com>
* Final benchmarks.
Signed-off-by: Edward Yang <ezyang@fb.com>
* More variants
Signed-off-by: Edward Yang <ezyang@fb.com>
* Clean up
Signed-off-by: Edward Yang <ezyang@fb.com>
* Add some missing files
Signed-off-by: Edward Yang <ezyang@fb.com>
* Slightly updated INSTALL instructions.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Minor bugfixes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Experimental LSTM using Variable from C++; not all ops are implemented yet.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Switch benchmarks to scale by sequence length and report microseconds (us) rather than milliseconds (ms).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
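
For reference, the per-timestep normalization in the last item comes down to the arithmetic below. This is a minimal sketch, not code from the benchmark scripts: the helper name `per_step_us` is hypothetical, and it assumes the raw timing for one sequence is available in milliseconds (the unit CUDA event elapsed times are reported in).

```python
def per_step_us(total_ms: float, seq_len: int) -> float:
    """Hypothetical helper: convert the total time (ms) spent on one
    sequence into microseconds per time step."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    # 1 ms = 1000 us; dividing by seq_len makes runs with different
    # sequence lengths directly comparable.
    return total_ms * 1000.0 / seq_len


# Example: 12.5 ms for a 100-step sequence is 125 us per step.
print(per_step_us(12.5, 100))
```

Reporting per-step microseconds keeps numbers for short and long sequences on the same scale, which is the point of scaling by sequence length.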