Prefix match in first iteration of beam search OP (#10231)
* Add BeamSearch op schema
* Add ONNX conversion for beam search
* remove attention_mask and change input order
* add option to run baseline
* add check for NULL data type
* apply VerifyNodeAndOpMatch to subgraph
* update input_ids shape
* Add node name for Cast node
* expose API for topk
* parse parameters
* Add beam search scorer
* output results
* fix typo
* use C++ template and format Python
* fix build pipeline errors
* symbolic shape inference of input ONNX model
* output scores
* add kernel def hash
* Handle vocab_mask; move CheckSubgraph
* undo insert_cast_transformer.cc and fusion_utils.py
* fix typo
* fix merge
* update doc
* add repetition penalty
* refactoring: add GptSubgraph class
* move BeamSearchState from .h to .cc file
* adjust logits processor order
* add batch generation example
* fix repetition penalty for dup words in sequence
* Add test
* Add no repeat ngram processor
* refactoring: move logits processor to classes
* fix build warning
* show latency
* use allocator in beam state
* use allocator in sequences
* fix build error
* move next_positions to beam state
* Changes for prefix matching
* remove debug output
* remove more debug output
* clean up
* clean up
* cpu doc updated
* Updated docs
* updated prefix_vocab_mask dimension in convert script
* changes to support bxs prefix_vocab_mask in BeamSearch op kernel
* doc update
* OperatorKernels.md updated
* matching docs from artifacts
* minor change in logits processor
* Addressing comments
* Correct the prefix_vocab_mask usage
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>