| Commit message | Author | Age | Files | Lines |
| |
Specify `__CUDA_ARCH_LIST__` explicitly so the CUB namespace stays the same across all nvcc invocations
commit_hash:2100ccb2307100378bcead498fd34cd11e44c566
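A minimal sketch of how an explicit `__CUDA_ARCH_LIST__` value could be derived from the full set of target architectures, so that every nvcc invocation receives the same list. The helper names `cuda_arch_list_value` and `arch_list_define` are hypothetical, and passing the value as a `-D` define is an assumption, not necessarily what this commit does. The value format (comma-separated `__CUDA_ARCH__`-style numbers in ascending order) follows the nvcc documentation.

```python
import re


def cuda_arch_list_value(architectures):
    """Return a comma-separated, ascending list of __CUDA_ARCH__-style values.

    Hypothetical helper: 'sm_80' maps to 800, 'sm_90a' to 900, 'sm_100f' to 1000.
    """
    values = set()
    for name in architectures:
        # Accept real or virtual architecture names with an optional a/f suffix.
        match = re.fullmatch(r"(?:sm|compute)_(\d+)[af]?", name)
        if match is None:
            raise ValueError(f"unrecognized architecture name: {name!r}")
        values.add(int(match.group(1)) * 10)
    return ",".join(str(v) for v in sorted(values))


def arch_list_define(architectures):
    """Build one define that every per-architecture invocation could share (assumption)."""
    return f"-D__CUDA_ARCH_LIST__={cuda_arch_list_value(architectures)}"
```

Because the value is computed from the whole architecture set rather than from the single architecture a given invocation compiles, any identifier derived from it would be the same in every invocation.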
|
nvcc disables them implicitly
ISSUE:
commit_hash:0b68decce1f030902bd770b8b98fc8102c97e738
|
IDs generated by different cicc invocations should match
ISSUE:
commit_hash:7cd593cee44b31875e7166709d7614dcfa3f1f14
|
features
E.g. sm_90a or sm_100f
ISSUE:
commit_hash:250df064a8abcac925db676565582b5ef05401bb
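A minimal sketch of how architecture names such as `sm_90a` or `sm_100f` might be split into a base version and a feature suffix. The `CudaArch` type and `parse_cuda_arch` function are hypothetical names introduced for illustration. The suffix meanings (`a` for architecture-specific, `f` for family-specific features) follow NVIDIA's naming convention.

```python
import re
from typing import NamedTuple


class CudaArch(NamedTuple):
    """Hypothetical parsed form of an architecture name like 'sm_90a'."""

    version: int  # e.g. 90 for sm_90a
    suffix: str   # "", "a", or "f"

    @property
    def kind(self):
        # Map the suffix to a readable description of the feature set.
        return {
            "": "baseline",
            "a": "architecture-specific",
            "f": "family-specific",
        }[self.suffix]


def parse_cuda_arch(name):
    """Parse 'sm_<digits>[a|f]' and reject anything else."""
    match = re.fullmatch(r"sm_(\d+)([af]?)", name)
    if match is None:
        raise ValueError(f"unrecognized architecture name: {name!r}")
    return CudaArch(version=int(match.group(1)), suffix=match.group(2))
```

For example, `parse_cuda_arch("sm_100f")` yields version 100 with the family-specific suffix, while `parse_cuda_arch("sm_80")` yields a baseline architecture.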
|
commit_hash:f73df3ec27f0695b21e7047ee465b15b201ea06b
|
compilation
Instead of a single graph node launching NVCC to compile a .cu file for both the host and all device architectures,
CUDA_SRCS now generates multiple nodes:
- one node per device architecture, producing PTX and CUBIN
- a node merging all PTX and CUBIN files into a single FATBIN blob
- a node producing a .cpp file with the host code
- a node compiling the host .cpp with the embedded FATBIN blob
The CUDA_ARCHITECTURES variable determines the list of architectures to compile device code for.
ISSUE:
commit_hash:0a4c2797dd238ae062482af30694df6978301278
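The node layout described in this message can be sketched as a small build plan. This is an illustrative model only: the `Node` type, the `plan_cuda_nodes` function, the node names, and the output file naming are all hypothetical, and no actual tool command lines are modeled.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Node:
    """Hypothetical build-graph node: a name plus its input and output files."""

    name: str
    inputs: list
    outputs: list


def plan_cuda_nodes(source, architectures):
    """Plan the nodes for one .cu file; `architectures` stands in for CUDA_ARCHITECTURES."""
    stem = PurePosixPath(source).stem
    nodes = []
    device_outputs = []

    # One node per device architecture, producing PTX and CUBIN.
    for arch in architectures:
        outputs = [f"{stem}.{arch}.ptx", f"{stem}.{arch}.cubin"]
        nodes.append(Node(f"device:{arch}", [source], outputs))
        device_outputs.extend(outputs)

    # One node merging all PTX and CUBIN files into a single FATBIN blob.
    fatbin = f"{stem}.fatbin"
    nodes.append(Node("fatbin", device_outputs, [fatbin]))

    # One node producing a .cpp file with the host code.
    host_cpp = f"{stem}.host.cpp"
    nodes.append(Node("host-source", [source], [host_cpp]))

    # One node compiling the host .cpp with the embedded FATBIN blob.
    nodes.append(Node("host-compile", [host_cpp, fatbin], [f"{stem}.o"]))
    return nodes
```

In this model the per-architecture nodes depend only on the source file, so they can run in parallel, and a source compiled for N architectures yields N + 3 nodes.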
|