

CUDA


Technology for computing on Nvidia graphics processors

  • Introduction
  • Hardware
  • Software
  • Performance
  • Maple example
  • Sources of information

    Introduction



    Hardware

    See the Wikipedia article on Nvidia Tesla.
    TECHNICAL SPECIFICATIONS
    Tesla A30 on THEOR4 (micro-architecture: Ampere GA100)
    • Peak FP64: 5.2 TF
    • Peak FP64 Tensor Core: 10.3 TF
    • Peak FP32: 10.3 TF
    • TF32 Tensor Core: 82 TF | 165 TF*
    • BFLOAT16 Tensor Core: 165 TF | 330 TF*
    • Peak FP16 Tensor Core: 165 TF | 330 TF*
    • Peak INT8 Tensor Core: 330 TOPS | 661 TOPS*
    • Peak INT4 Tensor Core: 661 TOPS | 1321 TOPS*
    • Media engines:
      1 optical flow accelerator (OFA)
      1 JPEG decoder (NVJPEG)
      4 Video decoders (NVDEC)
    • GPU Memory 24GB HBM2
    • GPU Memory Bandwidth 933GB/s
    • Interconnect PCIe Gen4: 64GB/s
      Third-gen NVIDIA® NVLINK® 200GB/s**
    • Form Factor 2-slot, full height, full length (FHFL)
    • Max thermal design power (TDP): 165W
    • Multi-Instance GPU (MIG): 4 MIGs @ 6GB each, 2 MIGs @ 12GB each, 1 MIG @ 24GB
    • Virtual GPU (vGPU) software support: NVIDIA AI Enterprise, NVIDIA Virtual Compute Server
    Tesla P100 on THEOR3 (micro-architecture: Pascal GP100)
    RTX A2000 on i9a (micro-architecture: Ampere GA106)
    • Form factor → PCIe x16 form factor
    • Number of CUDA cores → 3328
    • Number of Tensor cores → 104
    • CUDA compute capability → 8.6
    • Frequency of CUDA cores → up to 1.2 GHz
    • Double precision floating point performance (peak) → 249 Gflops
    • Single precision floating point performance (peak) → 7.987 Tflops
    • Total dedicated memory → 12GB GDDR6*
    • Memory speed → 1.5 GHz
    • Memory interface → 192-bit
    • Memory bandwidth → 288 GB/sec
    • Power consumption → 70W TDP
    • System interface → PCIe x16
    Tesla C2075 on Theor2 (micro-architecture: Fermi GF100)
    • Form factor → 9.75-inch PCIe x16 form factor
    • Number of CUDA cores → 448
    • Frequency of CUDA cores → 1.15 GHz
    • Double precision floating point performance (peak) → 515 Gflops
    • Single precision floating point performance (peak) → 1.03 Tflops
    • Total dedicated memory → 6GB GDDR5*
    • Memory speed → 1.5 GHz
    • Memory interface → 384-bit
    • Memory bandwidth → 144 GB/sec
    • Power consumption → 225W TDP
    • System interface → PCIe x16 Gen2
    • Thermal solution → Active Fansink
    • Display support → 1 Dual-Link DVI-I (maximum display resolution 1600x1200)



    Software

    NVIDIA CUDA Toolkit

    Note: the Tesla C2075 GPU is not supported by modern software.
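
    Whether the installed toolkit can actually use a given card is easiest to
    check from the CUDA runtime itself. The sketch below (an illustrative
    example, not part of the original page) lists every visible device with its
    compute capability; recent toolkits have dropped Fermi-class devices such
    as the C2075 (compute capability 2.0).

    /* Minimal device query; compile with, e.g., nvcc devquery.cu -o devquery
       (the file name is illustrative). */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int n = 0;
        cudaError_t err = cudaGetDeviceCount(&n);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int i = 0; i < n; i++) {
            struct cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            /* p.major/p.minor is the compute capability: 2.0 for the
               Fermi-based Tesla C2075, 8.6 for the GA106-based RTX A2000. */
            printf("device %d: %s, compute capability %d.%d, %.1f GB\n",
                   i, p.name, p.major, p.minor,
                   p.totalGlobalMem / 1073741824.0);
        }
        return 0;
    }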


    Performance

    Performance is estimated by timing a naive matrix multiplication:

    for (i = 0; i < MatrixSize; i++)
        for (j = 0; j < MatrixSize; j++)
            for (k = 0; k < MatrixSize; k++)
                C[j][i] += A[j][k] * B[k][i];

    GFlops = 2 × MatrixSize³ / 10⁹ / ExecutionTime
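
    Putting the loop and the formula together, a self-contained version of the
    benchmark might look as follows (a sketch: MatrixSize = 512 is an arbitrary
    illustrative choice, and the timer is the portable clock() from <time.h>):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define MatrixSize 512

    /* Static arrays keep the example short; about 6 MB in total. */
    static double A[MatrixSize][MatrixSize];
    static double B[MatrixSize][MatrixSize];
    static double C[MatrixSize][MatrixSize];

    int main(void)
    {
        int i, j, k;

        /* Fill A and B with pseudo-random values in [0, 1). */
        for (i = 0; i < MatrixSize; i++)
            for (j = 0; j < MatrixSize; j++) {
                A[i][j] = rand() / (double)RAND_MAX;
                B[i][j] = rand() / (double)RAND_MAX;
                C[i][j] = 0.0;
            }

        clock_t start = clock();
        for (i = 0; i < MatrixSize; i++)
            for (j = 0; j < MatrixSize; j++)
                for (k = 0; k < MatrixSize; k++)
                    C[j][i] += A[j][k] * B[k][i];
        double ExecutionTime = (double)(clock() - start) / CLOCKS_PER_SEC;

        /* One multiply and one add per innermost iteration: 2*N^3 operations. */
        printf("time = %.3f s, GFlops = %.2f\n", ExecutionTime,
               2.0 * MatrixSize * MatrixSize * MatrixSize / 1e9 / ExecutionTime);
        return 0;
    }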


    Maple example

    theor2:> maple test_cuda.mpl
        |\^/|     Maple 16 (X86 64 LINUX)
    ._|\|   |/|_. Copyright (c) Maplesoft, a division of Waterloo Maple Inc. 2012
     \  MAPLE  /  All rights reserved. Maple is a trademark of
     <____ ____>  Waterloo Maple Inc.
          |       Type ? for help.
    > CUDA:-IsEnabled();
    false

    > CUDA:-Enable(true);
    false

    > CUDA:-IsEnabled();
    true

    >
    > CUDA:-HasDoubleSupport();
    table([0 = true])

    >
    > with(LinearAlgebra):
    > M:=RandomMatrix(4000,outputoptions=[datatype=float[4]]);
             [ 4000 x 4000 Matrix ]
        M := [ Data Type: float[4] ]
             [ Storage: rectangular ]
             [ Order: Fortran_order ]

    > N:=RandomMatrix(4000,outputoptions=[datatype=float[4]]);
    memory used=124.1MB, alloc=126.0MB, time=0.88
             [ 4000 x 4000 Matrix ]
        N := [ Data Type: float[4] ]
             [ Storage: rectangular ]
             [ Order: Fortran_order ]

    >
    > time[real](MatrixMatrixMultiply(M,N));
    memory used=185.2MB, alloc=187.1MB, time=0.92
    0.617

    > CUDA:-Enable(false);
    true

    > time[real](MatrixMatrixMultiply(M,N));
    5.623

    >
    >
    > CUDA:-Enable(true);
    false

    > M:=RandomMatrix(4000,outputoptions=[datatype=float[8]]);
    memory used=368.4MB, alloc=248.1MB, time=7.48
             [ 4000 x 4000 Matrix ]
        M := [ Data Type: float[8] ]
             [ Storage: rectangular ]
             [ Order: Fortran_order ]

    > N:=RandomMatrix(4000,outputoptions=[datatype=float[8]]);
    memory used=490.6MB, alloc=370.2MB, time=7.88
             [ 4000 x 4000 Matrix ]
        N := [ Data Type: float[8] ]
             [ Storage: rectangular ]
             [ Order: Fortran_order ]

    >
    > time[real](MatrixMatrixMultiply(M,N));
    1.640

    >
    > CUDA:-Enable(false);
    true

    >
    > time[real](MatrixMatrixMultiply(M,N));
    10.614

    >
    > CUDA:-Properties();
    [table(["Max Threads Dimensions" = [1024, 1024, 64], "Clock Rate" = 1147000,
        "Max Grid Size" = [65535, 65535, 65535], "Memory Pitch" = 2147483647,
        "Max Threads Per Block" = 1024, "Warp Size" = 32,
        "Kernel Exec Timeout Enabled" = false, "Registers Per Block" = 32768,
        "ID" = 0, "Texture Alignment" = 512, "Minor" = 0,
        "MultiProcessor Count" = 14, "Shared Memory Per Block" = 49152,
        "Total Global Memory" = 4294967295, "Major" = 2, "Name" = "Tesla C2075",
        "Total Constant Memory" = 65536, "Device Overlap" = 1])]

    > quit
    memory used=734.8MB, alloc=614.3MB, time=20.10
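
    By the formula from the Performance section, one 4000 × 4000 multiplication
    costs 2 × 4000³ / 10⁹ = 128 GFlop, so the timings above correspond to
    roughly 128/0.617 ≈ 207 GFlops on the GPU versus 128/5.623 ≈ 23 GFlops on
    the CPU in single precision (float[4]), and 128/1.640 ≈ 78 GFlops versus
    128/10.614 ≈ 12 GFlops in double precision (float[8]). Since time[real]
    also includes Maple overhead and host-device transfers, these figures are
    lower bounds on the raw throughput of the Tesla C2075 (peak 1.03 Tflops
    single, 515 Gflops double precision).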


    Sources of information



    ЛТФ Computer Group

    February 20, 2013

    e-mail: super@theor.jinr.ru, telepuzik@theor.jinr.ru, yoda@theor.jinr.ru, godzilla@theor.jinr.ru

    Last updated: 2025-11-27 21:25:34

