Draft: Gemv kernel optimizations

m.hrywniak requested to merge mh/gemv_kernel_opts into master

Allows using fewer than MAXM bytes. Shared memory (smem) is used only as a manual cache, with no sharing across threads. BLOCKSIZE is passed via the template argument BS.
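As a rough host-side analogue of the idea (not the kernel from this merge request), the sketch below shows how a compile-time block size can size a private staging buffer. Only the name `BS` comes from the description; `dot_chunked`, the chunking scheme, and the use of a plain local array in place of shared memory are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>

// BS is a compile-time block size (name taken from the description;
// everything else in this sketch is hypothetical).
template <std::size_t BS>
float dot_chunked(const float* a, const float* x, std::size_t n) {
    float cache[BS];  // staging buffer sized by BS, private to this call
    float acc = 0.0f;
    for (std::size_t start = 0; start < n; start += BS) {
        const std::size_t len = std::min(BS, n - start);
        // Stage one chunk of x, then read it back from the local buffer.
        std::copy(x + start, x + start + len, cache);
        for (std::size_t k = 0; k < len; ++k) {
            acc += a[start + k] * cache[k];
        }
    }
    return acc;
}
```

Because `BS` is known at compile time, the buffer size is fixed per instantiation, so a caller such as `dot_chunked<128>(a, x, n)` only reserves as much staging space as its block size needs rather than a global maximum.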

Casts the flat buffer into a shaped (multi-dimensional) array for easier index arithmetic.
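A minimal sketch of this kind of cast, assuming a row-major matrix with a compile-time column count. The names `gemv_flat`, `ROWS`, and `COLS` are hypothetical and do not come from the merge request; the pointer-to-array cast is a common idiom in C++ and CUDA code rather than a statement of what the actual change does.

```cpp
#include <cstddef>

// Hypothetical sizes chosen for illustration only.
constexpr std::size_t ROWS = 3;
constexpr std::size_t COLS = 4;

// Computes y = A * x, with A stored row-major in a flat buffer.
void gemv_flat(const float* a_flat, const float* x, float* y) {
    // View the flat buffer as ROWS rows of COLS floats each.
    const float (*a)[COLS] = reinterpret_cast<const float (*)[COLS]>(a_flat);
    for (std::size_t i = 0; i < ROWS; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < COLS; ++j) {
            acc += a[i][j] * x[j];  // same element as a_flat[i * COLS + j]
        }
        y[i] = acc;
    }
}
```

With the shaped view, `a[i][j]` replaces the hand-written offset `a_flat[i * COLS + j]`, which keeps the indexing in one place and makes stride mistakes less likely.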
