|
|
compilation
Instead of a single graph node that launches NVCC to compile a .cu file for the host and all device architectures,
CUDA_SRCS generates multiple nodes:
- one node per device architecture, producing PTX and CUBIN
- a node merging all PTX and CUBIN files into a single FATBIN blob
- a node producing a .cpp file with host code
- a node compiling the host .cpp with the embedded FATBIN blob
The CUDA_ARCHITECTURES variable determines the list of architectures for which device code is compiled.
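
The node layout described above can be pictured with a small Python model. This is only an illustrative sketch, not the build system's actual implementation: the Node class, the plan_cuda_nodes function, the node names, and the output file naming are all hypothetical, and the architecture list stands in for whatever CUDA_ARCHITECTURES contains.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One build-graph node: a named step with input and output files."""
    name: str
    inputs: list
    outputs: list


def plan_cuda_nodes(src, architectures):
    """Return a hypothetical node list for one .cu source file.

    `architectures` plays the role of CUDA_ARCHITECTURES; all file
    names below are made up for illustration.
    """
    stem = src.rsplit(".", 1)[0]
    nodes = []
    device_outputs = []

    # One node per device architecture, producing PTX and CUBIN.
    for arch in architectures:
        outs = [f"{stem}.{arch}.ptx", f"{stem}.{arch}.cubin"]
        nodes.append(Node(f"device:{arch}", [src], outs))
        device_outputs.extend(outs)

    # One node merging all PTX and CUBIN files into a single FATBIN blob.
    fatbin = f"{stem}.fatbin"
    nodes.append(Node("fatbin", device_outputs, [fatbin]))

    # One node producing a .cpp file with host code.
    host_cpp = f"{stem}.host.cpp"
    nodes.append(Node("host-codegen", [src], [host_cpp]))

    # One node compiling the host .cpp with the embedded FATBIN blob.
    nodes.append(Node("host-compile", [host_cpp, fatbin], [f"{stem}.o"]))
    return nodes


if __name__ == "__main__":
    for node in plan_cuda_nodes("kernel.cu", ["sm_70", "sm_80"]):
        print(node.name, node.inputs, "->", node.outputs)
```

In this model the per-architecture nodes depend only on the source file, so they can run in parallel, and changing the architecture list adds or removes device nodes without touching the host code generation node.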
ISSUE:
commit_hash:0a4c2797dd238ae062482af30694df6978301278
|