Bandicoot: Efficient GPU linear algebra via C++ template metaprogramming