Fix wrong input shapes in model docstrings (#37729)
* Fix wrong position_ids shape in doc
Supported by ClvpDecoder.forward, line 1212--1215:
src/transformers/models/clvp/modeling_clvp.py:
1212 if inputs_embeds is None:
1213 inputs_embeds = self.input_embeds_layer(input_ids)
1214 position_embeds = self.position_embeds_layer(position_ids)
1215 inputs_embeds = inputs_embeds + position_embeds
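A minimal standalone sketch of the shape reasoning above (toy sizes and throwaway embedding tables, not the actual Clvp configuration): the two lookups are summed elementwise, so `position_ids` has to follow the same `(batch_size, sequence_length)` layout as `input_ids` (or broadcast to it).
```python
import torch
import torch.nn as nn

batch_size, sequence_length, hidden_size = 2, 5, 8        # toy sizes, not the real config
input_embeds_layer = nn.Embedding(100, hidden_size)       # stand-in for self.input_embeds_layer
position_embeds_layer = nn.Embedding(512, hidden_size)    # stand-in for self.position_embeds_layer

input_ids = torch.randint(0, 100, (batch_size, sequence_length))
position_ids = torch.arange(sequence_length).unsqueeze(0).expand(batch_size, -1)

inputs_embeds = input_embeds_layer(input_ids)           # (batch_size, sequence_length, hidden_size)
position_embeds = position_embeds_layer(position_ids)   # (batch_size, sequence_length, hidden_size)
hidden_states = inputs_embeds + position_embeds         # elementwise sum requires matching layouts
assert hidden_states.shape == (batch_size, sequence_length, hidden_size)
```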
* Fix possibly wrong input_ids shape in doc
Since `input_ids_length` is mentioned immediately after the shape `(batch_size, sequence_length)`, the documented shape looks inconsistent; `input_ids` ought to have shape `(batch_size, input_ids_length)` instead.
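To make the `input_ids_length` wording concrete, a tiny arithmetic sketch (hypothetical numbers) of one common case where the two lengths differ, cached generation: after the first step only the new tokens are passed as `input_ids`, so its second dimension is `input_ids_length`, not the full `sequence_length`.
```python
# Hypothetical numbers, purely to illustrate the naming.
past_length = 7                                    # tokens already stored in past_key_values
input_ids_length = 1                               # new tokens fed as input_ids on this step
sequence_length = past_length + input_ids_length   # full length the model attends over

# input_ids then has shape (batch_size, input_ids_length), e.g. (batch_size, 1) here,
# while (batch_size, sequence_length) would describe the whole sequence including the cache.
assert input_ids_length != sequence_length
```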
* Fix possibly wrong inputs_embeds shape in doc
Supported by CTRLModel.forward, line 448--449:
src/transformers/models/ctrl/modeling_ctrl.py:
448 if inputs_embeds is None:
449 inputs_embeds = self.w(input_ids)
This fix is prompted by commit 6f36b56497828642b65f54ea26aa4064186de57a.
* Fix possibly wrong token_type_ids shape in doc
Supported by CTRLModel.forward, line 441--460:
src/transformers/models/ctrl/modeling_ctrl.py:
441 if token_type_ids is not None:
442 token_type_ids = token_type_ids.view(-1, input_shape[-1])
443 token_type_embeds = self.w(token_type_ids)
444 token_type_embeds *= np.sqrt(self.d_model_size)
445 else:
446 token_type_embeds = 0
447
448 if inputs_embeds is None:
449 inputs_embeds = self.w(input_ids)
450 # inputs_embeds = embedded.unsqueeze(0) if len(input_ids.shape)<2 else embedded
451 seq_len = input_shape[-1]
452 mask = torch.triu(torch.ones(seq_len + past_length, seq_len + past_length), 1).to(device)
453
454 inputs_embeds *= np.sqrt(self.d_model_size)
455
456 # `self.pos_encoding` won't be sent to the correct device along the model, so we do it manually.
457 self.pos_encoding = self.pos_encoding.to(device)
458 pos_embeds = self.pos_encoding[position_ids, :]
459
460 hidden_states = inputs_embeds + pos_embeds + token_type_embeds
This fix is prompted by commit 6f36b56497828642b65f54ea26aa4064186de57a.
* Fix possibly wrong position_ids shape in doc
Supported by CTRLModel.forward, line 448--460:
src/transformers/models/ctrl/modeling_ctrl.py:
448 if inputs_embeds is None:
449 inputs_embeds = self.w(input_ids)
450 # inputs_embeds = embedded.unsqueeze(0) if len(input_ids.shape)<2 else embedded
451 seq_len = input_shape[-1]
452 mask = torch.triu(torch.ones(seq_len + past_length, seq_len + past_length), 1).to(device)
453
454 inputs_embeds *= np.sqrt(self.d_model_size)
455
456 # `self.pos_encoding` won't be sent to the correct device along the model, so we do it manually.
457 self.pos_encoding = self.pos_encoding.to(device)
458 pos_embeds = self.pos_encoding[position_ids, :]
459
460 hidden_states = inputs_embeds + pos_embeds + token_type_embeds
This fix is prompted by commit 6f36b56497828642b65f54ea26aa4064186de57a.
* Fix wrong token_type_ids shape in doc
Supported by TFCTRLMainLayer.call, line 376--394:
src/transformers/models/ctrl/modeling_tf_ctrl.py:
376 if token_type_ids is not None:
377 token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
378 token_type_embeds = self.w(token_type_ids)
379 token_type_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, dtype=token_type_embeds.dtype))
380 else:
381 token_type_embeds = tf.constant(0.0)
382 position_ids = tf.reshape(position_ids, [-1, shape_list(position_ids)[-1]])
383
384 if inputs_embeds is None:
385 check_embeddings_within_bounds(input_ids, self.w.input_dim)
386 inputs_embeds = self.w(input_ids)
387 seq_len = input_shape[-1]
388 mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
389
390 inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
391
392 pos_embeds = tf.gather(self.pos_encoding, position_ids)
393 pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
394 hidden_states = inputs_embeds + pos_embeds + token_type_embeds
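A small self-contained TensorFlow sketch of the pattern quoted above (toy sizes, a throwaway embedding layer standing in for `self.w`): `token_type_ids` is reshaped along its last axis and embedded with the same table as `input_ids`, and the final sum only broadcasts when it shares the `(batch_size, sequence_length)` layout.
```python
import tensorflow as tf

batch_size, sequence_length, d_model_size = 2, 4, 8   # toy sizes, not the real config
w = tf.keras.layers.Embedding(100, d_model_size)      # stand-in for the shared self.w

input_ids = tf.random.uniform((batch_size, sequence_length), maxval=100, dtype=tf.int32)
token_type_ids = tf.zeros((batch_size, sequence_length), dtype=tf.int32)

token_type_ids = tf.reshape(token_type_ids, [-1, token_type_ids.shape[-1]])
token_type_embeds = w(token_type_ids) * tf.math.sqrt(tf.cast(d_model_size, tf.float32))
inputs_embeds = w(input_ids) * tf.math.sqrt(tf.cast(d_model_size, tf.float32))

# The sum broadcasts because both id tensors used the (batch_size, sequence_length) layout.
hidden_states = inputs_embeds + token_type_embeds
assert hidden_states.shape == (batch_size, sequence_length, d_model_size)
```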
* Fix wrong position_ids shape in doc
Supported by TFCTRLMainLayer.call, line 384--394:
src/transformers/models/ctrl/modeling_tf_ctrl.py:
384 if inputs_embeds is None:
385 check_embeddings_within_bounds(input_ids, self.w.input_dim)
386 inputs_embeds = self.w(input_ids)
387 seq_len = input_shape[-1]
388 mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
389
390 inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
391
392 pos_embeds = tf.gather(self.pos_encoding, position_ids)
393 pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
394 hidden_states = inputs_embeds + pos_embeds + token_type_embeds
* Fix wrong inputs_embeds shape in doc
Supported by TFCTRLMainLayer.call, line 384--394:
src/transformers/models/ctrl/modeling_tf_ctrl.py:
384 if inputs_embeds is None:
385 check_embeddings_within_bounds(input_ids, self.w.input_dim)
386 inputs_embeds = self.w(input_ids)
387 seq_len = input_shape[-1]
388 mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
389
390 inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
391
392 pos_embeds = tf.gather(self.pos_encoding, position_ids)
393 pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
394 hidden_states = inputs_embeds + pos_embeds + token_type_embeds
* Fix wrong inputs_embeds shape in doc
Supported by ClvpDecoder.forward, line 1212--1213:
src/transformers/models/clvp/modeling_clvp.py:
1212 if inputs_embeds is None:
1213 inputs_embeds = self.input_embeds_layer(input_ids)
* Fix wrong position_ids shape in doc
Supported by FlaxGemmaPreTrainedModel.__call__, line 502--508:
src/transformers/models/gemma/modeling_flax_gemma.py:
502 batch_size, sequence_length = input_ids.shape
503
504 if position_ids is None:
505 if past_key_values is not None:
506 raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
507
508 position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))
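For reference, evaluating the quoted default in isolation (toy sizes) shows the shape it produces, which is the layout an explicitly supplied `position_ids` is expected to match.
```python
import jax.numpy as jnp

batch_size, sequence_length = 2, 5  # toy sizes
position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))
print(position_ids.shape)  # (2, 5), i.e. (batch_size, sequence_length)
```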
* Fix wrong position_ids shape in doc
Supported by FlaxGPT2PreTrainedModel.__call__, line 482--488:
src/transformers/models/gpt2/modeling_flax_gpt2.py:
482 batch_size, sequence_length = input_ids.shape
483
484 if position_ids is None:
485 if past_key_values is not None:
486 raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
487
488 position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))
* Fix wrong position_ids shape in doc
Supported by GPT2Model.forward, line 918--921:
src/transformers/models/gpt2/modeling_gpt2.py:
918 if inputs_embeds is None:
919 inputs_embeds = self.wte(input_ids)
920 position_embeds = self.wpe(position_ids)
921 hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)
* Fix wrong inputs_embeds shape in doc
Supported by GPT2Model.forward, line 918--919:
src/transformers/models/gpt2/modeling_gpt2.py:
918 if inputs_embeds is None:
919 inputs_embeds = self.wte(input_ids)
* Fix wrong labels shape in doc
Supported by GPT2LMHeadModel.forward, line 1156--1157:
src/transformers/models/gpt2/modeling_gpt2.py:
1156 Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
1157 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
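A minimal sketch of what "labels **are shifted** inside the model" means (plain PyTorch with toy sizes, not the actual GPT2LMHeadModel code): because the shift happens internally, `labels` can simply be `input_ids` and therefore carries the same `(batch_size, sequence_length)` shape.
```python
import torch
import torch.nn.functional as F

batch_size, sequence_length, vocab_size = 2, 5, 10                     # toy sizes
logits = torch.randn(batch_size, sequence_length, vocab_size)          # model output
labels = torch.randint(0, vocab_size, (batch_size, sequence_length))   # e.g. labels = input_ids

# Shift so that the logits at position t are scored against the token at position t + 1;
# positions labelled -100 are ignored by the loss.
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = labels[:, 1:].contiguous()
loss = F.cross_entropy(shift_logits.view(-1, vocab_size), shift_labels.view(-1), ignore_index=-100)
```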
* Fix wrong labels shape in doc
Supported by GPT2DoubleHeadsModel.forward, line 1314--1315:
src/transformers/models/gpt2/modeling_gpt2.py:
1314 Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
1315 `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size - 1]`. All labels set to
* Fix wrong token_type_ids shape in doc
Supported by TFGPT2MainLayer.call, line 486--500:
src/transformers/models/gpt2/modeling_tf_gpt2.py:
486 if inputs_embeds is None:
487 check_embeddings_within_bounds(input_ids, self.config.vocab_size)
488 inputs_embeds = self.wte(input_ids)
489
490 position_embeds = self.wpe(position_ids)
491
492 if token_type_ids is not None:
493 token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
494 token_type_embeds = self.wte(token_type_ids)
495 else:
496 token_type_embeds = tf.constant(0.0)
497
498 position_embeds = tf.cast(position_embeds, dtype=inputs_embeds.dtype)
499 token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
500 hidden_states = inputs_embeds + position_embeds + token_type_embeds
* Fix wrong position_ids shape in doc
Supported by TFGPT2MainLayer.call, line 486--500:
src/transformers/models/gpt2/modeling_tf_gpt2.py:
486 if inputs_embeds is None:
487 check_embeddings_within_bounds(input_ids, self.config.vocab_size)
488 inputs_embeds = self.wte(input_ids)
489
490 position_embeds = self.wpe(position_ids)
491
492 if token_type_ids is not None:
493 token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
494 token_type_embeds = self.wte(token_type_ids)
495 else:
496 token_type_embeds = tf.constant(0.0)
497
498 position_embeds = tf.cast(position_embeds, dtype=inputs_embeds.dtype)
499 token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
500 hidden_states = inputs_embeds + position_embeds + token_type_embeds
* Fix wrong inputs_embeds shape in doc
Supported by TFGPT2MainLayer.call, line 486--488:
src/transformers/models/gpt2/modeling_tf_gpt2.py:
486 if inputs_embeds is None:
487 check_embeddings_within_bounds(input_ids, self.config.vocab_size)
488 inputs_embeds = self.wte(input_ids)
* Fix wrong position_ids shape in doc
Supported by GPTBigCodeModel.forward, line 962--965:
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
962 if inputs_embeds is None:
963 inputs_embeds = self.wte(input_ids)
964 position_embeds = self.wpe(position_ids)
965 hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)
* Fix wrong inputs_embeds shape in doc
Supported by GPTBigCodeModel.forward, line 962--963:
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
962 if inputs_embeds is None:
963 inputs_embeds = self.wte(input_ids)
* Fix wrong labels shape in doc
Supported by GPTBigCodeForCausalLM.forward, line 1158--1159:
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
1158 Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
1159 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
* Fix wrong position_ids shape in doc
Supported by FlaxGPTNeoModule.__call__, line 549--552:
src/transformers/models/gpt_neo/modeling_flax_gpt_neo.py:
549 input_embeds = self.wte(input_ids.astype("i4"))
550 position_embeds = self.wpe(position_ids.astype("i4"))
551
552 hidden_states = input_embeds + position_embeds
* Fix wrong position_ids shape in doc
Supported by GPTNeoModel.forward, line 685--720:
src/transformers/models/gpt_neo/modeling_gpt_neo.py:
685 if inputs_embeds is None:
686 inputs_embeds = self.wte(input_ids)
687
688 # kept for BC (non `Cache` `past_key_values` inputs)
689 return_legacy_cache = False
690 if use_cache and not isinstance(past_key_values, Cache):
691 return_legacy_cache = True
692 if past_key_values is None:
693 past_key_values = DynamicCache()
694 else:
695 past_key_values = DynamicCache.from_legacy_cache(past_key_values)
696 logger.warning_once(
697 "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and "
698 "will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class "
699 "(https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)"
700 )
701
702 seq_length = inputs_embeds.shape[1]
703 if cache_position is None:
704 past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
705 cache_position = torch.arange(past_seen_tokens, past_seen_tokens + seq_length, device=inputs_embeds.device)
706
707 if position_ids is None:
708 position_ids = cache_position.unsqueeze(0)
709
710 causal_mask = self._update_causal_mask(
711 attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
712 )
713
714 # Prepare head mask if needed
715 # 1.0 in head_mask indicate we keep the head
716 # attention_probs has shape bsz x num_heads x N x N
717 # head_mask has shape n_layer x batch x num_heads x N x N
718 head_mask = self.get_head_mask(head_mask, self.config.num_layers)
719 position_embeds = self.wpe(position_ids)
720 hidden_states = inputs_embeds + position_embeds
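The default built from `cache_position` in the excerpt can be reproduced on its own (toy numbers): it yields a single row of positions of shape `(1, seq_length)` that broadcasts over the batch before being fed to `self.wpe`.
```python
import torch

past_seen_tokens, seq_length = 4, 3  # toy numbers
cache_position = torch.arange(past_seen_tokens, past_seen_tokens + seq_length)
position_ids = cache_position.unsqueeze(0)  # shape (1, seq_length), broadcast over the batch
print(position_ids.shape)  # torch.Size([1, 3])
```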
* Fix wrong inputs_embeds shape in doc
Supported by GPTNeoModel.forward, line 685--686:
src/transformers/models/gpt_neo/modeling_gpt_neo.py:
685 if inputs_embeds is None:
686 inputs_embeds = self.wte(input_ids)
* Fix wrong labels shape in doc
Supported by GPTNeoForCausalLM.forward, line 968--969:
src/transformers/models/gpt_neo/modeling_gpt_neo.py:
968 Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
969 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
* Fix wrong position_ids shape in doc
Supported by FlaxGPTJPreTrainedModel.__call__, line 455--461:
src/transformers/models/gptj/modeling_flax_gptj.py:
455 batch_size, sequence_length = input_ids.shape
456
457 if position_ids is None:
458 if past_key_values is not None:
459 raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
460
461 position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))
* Fix wrong token_type_ids shape in doc
Supported by TFGPTJMainLayer.call, line 482--493:
src/transformers/models/gptj/modeling_tf_gptj.py:
482 if inputs_embeds is None:
483 check_embeddings_within_bounds(input_ids, self.wte.vocab_size)
484 inputs_embeds = self.wte(input_ids, mode="embedding")
485
486 if token_type_ids is not None:
487 token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
488 token_type_embeds = self.wte(token_type_ids, mode="embedding")
489 else:
490 token_type_embeds = tf.constant(0.0)
491
492 token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
493 hidden_states = inputs_embeds + token_type_embeds
* Fix wrong position_ids shape in doc
Supported by TFGPTJMainLayer.call, line 434--449:
src/transformers/models/gptj/modeling_tf_gptj.py:
434 elif input_ids is not None:
435 input_shape = shape_list(input_ids)
436 input_ids = tf.reshape(input_ids, [-1, input_shape[-1]])
437 elif inputs_embeds is not None:
438 input_shape = shape_list(inputs_embeds)[:-1]
439 else:
440 raise ValueError("You have to specify either input_ids or inputs_embeds")
441
442 if past_key_values is None:
443 past_length = 0
444 past_key_values = [None] * len(self.h)
445 else:
446 past_length = shape_list(past_key_values[0][0])[-2]
447
448 if position_ids is None:
449 position_ids = tf.expand_dims(tf.range(past_length, input_shape[-1] + past_length), axis=0)
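Evaluating the quoted default on its own (toy numbers; `new_tokens` stands in for `input_shape[-1]`) shows what it builds: a single row of positions, offset by the cache length, that is later gathered and broadcast over the batch.
```python
import tensorflow as tf

past_length, new_tokens = 3, 2  # toy numbers; new_tokens plays the role of input_shape[-1]
position_ids = tf.expand_dims(tf.range(past_length, new_tokens + past_length), axis=0)
print(position_ids.shape)  # (1, 2): one row of positions starting after the cached tokens
```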
* Fix wrong inputs_embeds shape in doc
Supported by TFGPTJMainLayer.call, line 482--484:
src/transformers/models/gptj/modeling_tf_gptj.py:
482 if inputs_embeds is None:
483 check_embeddings_within_bounds(input_ids, self.wte.vocab_size)
484 inputs_embeds = self.wte(input_ids, mode="embedding")
* Fix wrong labels shape in doc
Supported by TFGPTJForCausalLM.call, line 812--813:
src/transformers/models/gptj/modeling_tf_gptj.py:
812 Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
813 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
* Fix possibly wrong input_ids shape in doc
Since `input_ids_length` is mentioned immediately after the shape `(batch_size, sequence_length)`, the documented shape looks inconsistent; `input_ids` ought to have shape `(batch_size, input_ids_length)` instead.
* Fix possibly wrong token_type_ids shape in doc
Supported by ImageGPTModel.forward, line 773--780:
src/transformers/models/imagegpt/modeling_imagegpt.py:
773 if inputs_embeds is None:
774 inputs_embeds = self.wte(input_ids)
775 position_embeds = self.wpe(position_ids)
776 hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)
777
778 if token_type_ids is not None:
779 token_type_embeds = self.wte(token_type_ids)
780 hidden_states = hidden_states + token_type_embeds
This fix is prompted by commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.
* Fix possibly wrong position_ids shape in doc
Supported by ImageGPTModel.forward, line 773--776:
src/transformers/models/imagegpt/modeling_imagegpt.py:
773 if inputs_embeds is None:
774 inputs_embeds = self.wte(input_ids)
775 position_embeds = self.wpe(position_ids)
776 hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)
This fix is prompted by commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.
* Fix possibly wrong inputs_embeds shape in doc
Supported by ImageGPTModel.forward, line 773--774:
src/transformers/models/imagegpt/modeling_imagegpt.py:
773 if inputs_embeds is None:
774 inputs_embeds = self.wte(input_ids)
This fix is prompted by commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.
* Fix possibly wrong labels shape in doc
Supported by ImageGPTForCausalImageModeling.forward, line 923--924:
src/transformers/models/imagegpt/modeling_imagegpt.py:
923 Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
924 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
This fix is prompted by commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.
* Fix possibly wrong labels shape in doc
Supported by ImageGPTModel.forward, line 665--666:
src/transformers/models/imagegpt/modeling_imagegpt.py:
665 Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
666 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
This fix is prompted by commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.
* Fix wrong position_ids shape in doc
Supported by FlaxLlamaPreTrainedModel.__call__, line 484--490:
src/transformers/models/llama/modeling_flax_llama.py:
484 batch_size, sequence_length = input_ids.shape
485
486 if position_ids is None:
487 if past_key_values is not None:
488 raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
489
490 position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))
* Fix wrong position_ids shape in doc
Supported by FlaxMistralPreTrainedModel.__call__, line 478--484:
src/transformers/models/mistral/modeling_flax_mistral.py:
478 batch_size, sequence_length = input_ids.shape
479
480 if position_ids is None:
481 if past_key_values is not None:
482 raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
483
484 position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))