Willsch, Dennis (95e93340) at 13 Apr 15:53
Add transmon population <m> and fix [ -> [0: and abs
Willsch, Dennis (06caa594) at 08 Dec 12:16
Test on GeForce GTX 1050 Ti with NVHPC 21.9
Willsch, Dennis (b8685c23) at 14 Oct 16:57
Check cuTENSOR version after merge
... and 7 more commits
Willsch, Dennis (a914c31b) at 23 Aug 18:53
Merge branch 'dressedbasis-to-double' of jugit.fz-juelich.de:qip/ju...
... and 1 more commit
Willsch, Dennis (48102488) at 23 Aug 18:47
Pass 4000 nanoseconds dressedbasis check
Willsch, Dennis (71cf30db) at 23 Aug 16:47
Update makefile
Willsch, Dennis (78e20853) at 20 Aug 14:57
Merge branch 'dressedbasis-to-double-cutensor' into dressedbasis-to...
... and 4 more commits
Willsch, Dennis (fd64b3b6) at 12 Aug 11:34
Update makefile
Willsch, Dennis (0d7eb644) at 11 Aug 17:01
Add support for sorting and verifying dressed states if energy is g...
... and 1 more commit
Willsch, Dennis (123a28e8) at 11 Aug 10:44
Change dressed basis from double complex to double
Willsch, Dennis (97b21a7d) at 10 Aug 22:16
Add support for dressed basis
This contains the NVTX ranges to get more structure into the profiles. The snippets and macros are from the blog post at https://developer.nvidia.com/blog/cuda-pro-tip-generate-custom-application-profile-timelines-nvtx/
I marked this as Draft as I just noticed that it probably makes sense to exclude the profile commands text file. In any case now you can already see the code changes so we can discuss at our call.
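For readers who have not seen the blog post, the pattern looks roughly like the sketch below. This is not the code from this merge request: the USE_NVTX switch follows the blog post, while init_state, time_step and run are hypothetical placeholders. Without -DUSE_NVTX the macros expand to nothing, so the file also builds on a machine without the CUDA toolkit.

```cpp
#include <cassert>
#include <vector>

#ifdef USE_NVTX
#include "nvToolsExt.h"                        // NVTX C API; link with -lnvToolsExt
#define PUSH_RANGE(name) nvtxRangePushA(name)  // open a named, nestable range
#define POP_RANGE()      nvtxRangePop()        // close the innermost open range
#else
#define PUSH_RANGE(name) ((void)0)             // no-ops when NVTX is disabled
#define POP_RANGE()      ((void)0)
#endif

// Hypothetical phase 1: set the state vector to (1, 0, 0, ...).
static void init_state(std::vector<double>& psi) {
    PUSH_RANGE("init_state");
    assert(!psi.empty());
    for (double& x : psi) x = 0.0;
    psi[0] = 1.0;
    POP_RANGE();
}

// Hypothetical phase 2: one "time step" that just rescales the state.
static void time_step(std::vector<double>& psi, double factor) {
    PUSH_RANGE("time_step");
    for (double& x : psi) x *= factor;
    POP_RANGE();
}

// The outer range "run" encloses the nested ranges of both phases,
// which is what gives the profiler timeline its structure.
static double run(int num_steps) {
    std::vector<double> psi(8);
    PUSH_RANGE("run");
    init_state(psi);
    for (int s = 0; s < num_steps; ++s) time_step(psi, 0.5);
    POP_RANGE();
    return psi[0];
}
```

Every PUSH_RANGE needs a matching POP_RANGE on the same thread, otherwise the ranges in the timeline do not close properly.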
Thanks for your help, Markus. The newer version and X-forwarding with ssh -Y (both through the VPN and through the intermediate gateway at JSC with ssh -A -Y -t jscgw ssh -A -Y juwels-booster) worked for me now.
Allows using less than MAXM bytes. Shared memory (smem) is used only as a manual cache, with no sharing across threads. BLOCKSIZE is passed via the template argument BS.
Cast flat buffer into shape for easier arithmetic.
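A minimal host-side sketch of this idiom, assuming a row-major buffer whose inner dimension is known at compile time. NCOLS and row_sum are hypothetical names chosen for illustration, not identifiers from the kernels.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t NCOLS = 4;  // hypothetical compile-time inner dimension

// Sum of row r, addressing the flat buffer as buf2d[r][c]
// instead of computing flat[r * NCOLS + c] by hand.
double row_sum(double* flat, std::size_t nrows, std::size_t r) {
    assert(r < nrows);
    // View the contiguous buffer as rows of NCOLS doubles each.
    // This cast is a widely used idiom in C and CUDA code; it relies on
    // the buffer being contiguous and laid out row-major.
    double (*buf2d)[NCOLS] = reinterpret_cast<double (*)[NCOLS]>(flat);
    double s = 0.0;
    for (std::size_t c = 0; c < NCOLS; ++c) {
        s += buf2d[r][c];  // same element as flat[r * NCOLS + c]
    }
    return s;
}
```

The shaped view changes only how the index arithmetic is written, not the memory layout, so it should cost nothing at run time.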
I played around with the optimizations and merged most of them into master. I don't fully understand why the strided kernels are faster than the non-strided ones (given that non-strided = strided with stride=1) but I verified that all changes you made indeed improved the performance. Thanks again!
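To make the equivalence concrete, here is a small host-side sketch. It assumes the stride applies to the elements of the input vector x; what the stride refers to in the actual kernels is not stated in this thread, and gemv and gemv_strided are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = sum_j A[i*n + j] * x[j * stride], with A stored row-major (m x n).
std::vector<double> gemv_strided(const std::vector<double>& A,
                                 const std::vector<double>& x,
                                 std::size_t m, std::size_t n,
                                 std::size_t stride) {
    assert(A.size() == m * n);
    assert(n == 0 || x.size() > (n - 1) * stride);  // last access is in range
    std::vector<double> y(m, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            y[i] += A[i * n + j] * x[j * stride];
        }
    }
    return y;
}

// Plain gemv reading x contiguously.
std::vector<double> gemv(const std::vector<double>& A,
                         const std::vector<double>& x,
                         std::size_t m, std::size_t n) {
    assert(A.size() == m * n);
    assert(x.size() >= n);
    std::vector<double> y(m, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            y[i] += A[i * n + j] * x[j];
        }
    }
    return y;
}
```

Both functions return the same result for stride 1, so any performance difference between the corresponding GPU kernels would have to come from how the compiler generates code for them, which this sketch does not attempt to explain.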
Willsch, Dennis (d611da8e) at 21 Jul 12:49
Use thrust::complex for CUDA kernels as in mh/gemv_kernel_opts
Willsch, Dennis (83746614) at 21 Jul 11:54
Merge some optimizations from mh/gemv_kernel_opts into master
... and 1 more commit
If I'm understanding correctly, you're using the standalone module on the Booster (error msg you posted) and the profiler version that came with the HPC SDK on your own machine, is that correct?
I'm asking because I don't think the errors share the same root cause. Can you try, on your local machine, to install the latest version of e.g. nsys and try opening the report file? https://docs.nvidia.com/hpc-sdk/archive/21.3/hpc-sdk-release-notes/index.html shows that 21.3 ships with nsys 2020.5, which is quite old.
A quick way to install the newer version is via the repos, e.g. https://developer.download.nvidia.com/devtools/repos/ubuntu2004/amd64/. Otherwise, the install page at https://developer.nvidia.com/nsight-compute has more detailed instructions.
In general, newer versions of the profilers can open older report files, but not the other way round. Recent(-ish) versions of nsys should also show an error message if the report is too new for them.
For the other issue of remote launching the apps, X-forwarding works fine for me with the modules you listed above. Did you connect with ssh -Y? That's needed on my end, though I run a somewhat non-standard WSL-with-XServer setup on Windows.
Some warnings are to be expected as the head nodes don't have GPUs so software fallback is needed. Sample output:
[hrywniak1@jwlogin24 ~]$ nsys-ui
libk5crypto.so.3 requires EVP_KDF_ctrl. Switching to system OpenSSL libraries
OpenGL version is too low (0). Falling back to Mesa software rendering.
OpenGL version: "3.1 Mesa 18.1.9 (git-f57f37f3ba)"
[hrywniak1@jwlogin24 ~]$ ncu-ui
libk5crypto.so.3 requires EVP_KDF_ctrl. Switching to system OpenSSL libraries
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
Could not get current OpenGL version!
Warning: OpenGL Version check failed. Falling back to Mesa software rendering.
[hrywniak1@jwlogin24 ~]$ nsys --version; ncu --version
NVIDIA Nsight Systems version 2021.2.1.58-642947b
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2021 NVIDIA Corporation
Version 2021.2.0.0 (build 30066266) (public-release)
Using the modules you posted above, I get this output and can open the reports directly on the Booster filesystem. Ping @d.willsch because I'm not sure you'll get the notification otherwise.
Great, I was able to verify more than 2x speedup for the gemv kernels. I will play with it a bit more tomorrow and then merge some of it into the master branch. Thanks a lot Markus and Jiri!
Thanks Markus, I was able to generate qdrep and ncu-rep profiles. But I still couldn't open them using nv-nsight-cu, neither with ml Nsight-Systems/2021.1.1 Nsight-Compute/2020.3.0
nor with
ml use $OTHERSTAGES
ml Stages/Devel-2020
ml GCCcore/.10.3.0
ml Nsight-Systems/2021.2.1
ml Nsight-Compute/2021.2.0
I always get errors like this, both on the Booster (SSH with X forwarding) and on my local machine using NVHPC 21.3.
/p/software/juwelsbooster/stages/Devel-2020/software/Nsight-Compute/2021.2.0-GCCcore-10.3.0/host/linux-desktop-glibc_2_11_3-x64/ncu-ui: line 27: 10871 Aborted (core dumped) "$NV_AGORA_PATH/CrashReporter" "NVIDIA Nsight Compute" "NVIDIA Nsight Compute" "2021.2.0.0 (build 30066266) (public-release)" "$NV_AGORA_PATH/ncu-ui.bin" "$@"
How do you open and X-forward nv-nsight-cu remotely? Do you use the remote desktop from jupyter jsc or similar?