DeepSpeed
645639bc - Rearrange inference OPS and stop using builder.load (#5490)

Commit
266 days ago
Rearrange inference OPS and stop using builder.load (#5490)

This PR mainly handles all places where InferenceBuilder is used to access an op or a specific implementation of an op. Instead, an op is defined, its proper implementation is picked inside it, and the usage is transparent to the user.

What was done in the PR:
1) Added missing ops (added a py file with a fallback mechanism).
2) Added missing fallback implementations for existing ops.
3) Removed all usages of builder.load and replaced them with ops.
4) Added a workspace op and inferenceContext, which contains all workspace-related functions; inferenceContext is the Python fallback of inferenceContext in CUDA.
5) Made a small change to the softmax_context signature to fit the fallback signature.

---------

Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
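
The pattern described in the commit message (an op object that picks its own implementation and falls back to Python when no compiled kernel is available) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `BiasAddOp`, the loader `_load_compiled_module`, the module name `_hypothetical_inference_kernels`, and the use of plain Python lists instead of tensors are all assumptions made for the example.

```python
from typing import Callable, List, Optional


def _load_compiled_module() -> Optional[object]:
    """Hypothetical loader: return a compiled kernel module, or None if unavailable."""
    try:
        # Assumed module name for illustration only; it is not a real package.
        import _hypothetical_inference_kernels as mod
        return mod
    except ImportError:
        return None


class BiasAddOp:
    """Illustrative op: selects an implementation once, then exposes one call."""

    def __init__(self, module: Optional[object] = None) -> None:
        if module is None:
            module = _load_compiled_module()
        kernel = getattr(module, "bias_add", None)
        # Prefer the compiled kernel when present; otherwise use the Python fallback.
        self._impl: Callable[[List[float], List[float]], List[float]] = (
            kernel if kernel is not None else self._fallback
        )
        self.uses_fallback = kernel is None

    @staticmethod
    def _fallback(activation: List[float], bias: List[float]) -> List[float]:
        # Pure-Python reference implementation: element-wise addition.
        return [a + b for a, b in zip(activation, bias)]

    def __call__(self, activation: List[float], bias: List[float]) -> List[float]:
        return self._impl(activation, bias)


# The caller never checks which implementation is in use.
op = BiasAddOp()
result = op([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Because the selection happens in the constructor, call sites look the same whether a compiled kernel or the fallback runs, which is the transparency the commit message refers to.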
  • csrc/transformer/inference/csrc
    • pt_binding.cpp
  • deepspeed
    • inference
      • engine.py
    • model_implementations/transformers
      • ds_llama2.py
      • ds_transformer.py
    • ops/transformer/inference
      • config.py
      • diffusers_attention.py
      • diffusers_transformer_block.py
      • ds_attention.py
      • moe_inference.py
      • op_binding
        • bias_add.py
        • bias_gelu.py
        • bias_relu.py
        • bias_residual.py
        • einsum_sec_sm_ecm.py
        • gated_activation.py
        • gelu_gemm.py
        • layer_norm.py
        • mlp_gemm.py
        • moe_res_matmul.py
        • pad_transform.py
        • pre_rms_norm.py
        • qkv_gemm.py
        • residual_add.py
        • rms_norm.py
        • softmax.py
        • softmax_context.py
        • vector_add.py
        • vector_matmul.py
        • workspace.py
      • triton
        • attention.py
        • ops.py
    • runtime
      • hybrid_engine.py
  • op_builder/hpu
    • __init__.py
    • transformer_inference.py
  • tests/unit/ops/transformer/inference
    • test_bias_add.py
    • test_bias_geglu.py
    • test_bias_gelu.py
    • test_bias_relu.py
    • test_gelu.py
    • test_layer_norm.py
    • test_moe_res_matmult.py
    • test_residual_add.py
    • test_rms_norm.py
    • test_softmax.py