text-generation-inference
Choosing input/total tokens automatically based on available VRAM?
#2673

Merged

Narsil merged 13 commits into main from auto_length

drbh commented on 2024-10-21

Choosing input/total tokens automatically based on available VRAM?

a1aac784

Update doc.

79469f5f

Narsil force pushed from b2272ab7 to 79469f5f 1 year ago

Remove generated files.

a31db047

Trying to fix non chunking targets.

0a01dde9

Attempt #2

5c3efbc7

fix.

82a6cb82

QuantLinear is rocm compatible.

849d8821

Much simpler logic after the overhead.

10534511

Updating logic + non flash.

6994fa12

Revert doc text.

cacaba64

Simple updates.

199973cc

Fix integration mt0 (transformers update).

e3db5259

drbh dismissed these changes on 2024-10-25

OlivierDehaene requested a review from OlivierDehaene 1 year ago

Merge branch 'main' into auto_length

c3fb2ecd

Narsil dismissed their stale review via c3fb2ecd 1 year ago

Narsil merged 0c9b6cdd into main 1 year ago

Narsil deleted the auto_length branch 1 year ago

Reviewers

drbh

OlivierDehaene
