  1. OpenAI API: How do I count tokens before(!) I send an API request?

    Mar 21, 2023 · How do I count tokens before (!) I send an API request? As stated in the official OpenAI article: To further explore tokenization, you can use our interactive Tokenizer tool, …
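
    One way to count tokens locally before a request goes out is sketched below. It assumes the tiktoken package, whose get_encoding(...).encode(...) returns a list of token ids. Which encoding matches a given model is something to check for yourself. The encoder is passed in as a plain callable so the logic can be tested without a network download. count_tokens, check_budget, and the context_limit / reserved_for_reply numbers are hypothetical names and caller-supplied values, not anything from the OpenAI docs.

```python
from typing import Callable, List

# Any function that turns text into a list of token ids, e.g. tiktoken's Encoding.encode.
Encoder = Callable[[str], List[int]]


def count_tokens(text: str, encode: Encoder) -> int:
    """Number of tokens `encode` produces for `text`."""
    return len(encode(text))


def check_budget(prompt: str, encode: Encoder, context_limit: int, reserved_for_reply: int) -> int:
    """Return how many tokens remain after the prompt.

    Raises ValueError when fewer than `reserved_for_reply` tokens would be left,
    so the prompt can be trimmed before the request is ever sent.
    """
    used = count_tokens(prompt, encode)
    remaining = context_limit - used
    if remaining < reserved_for_reply:
        raise ValueError(
            f"prompt uses {used} tokens; {remaining} left, {reserved_for_reply} needed"
        )
    return remaining
```

    With tiktoken installed, enc = tiktoken.get_encoding("cl100k_base") followed by check_budget(prompt, enc.encode, ...) would plug in a real encoder. Chat-style requests add a few formatting tokens per message on top of the raw text, so treat this count as an approximation.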

  2. Looking for a clear definition of what a "tokenizer", "parser" and ...

    Mar 28, 2018 · A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines). A lexer is basically a tokenizer, but it usually attaches extra context …
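
    The distinction drawn above can be sketched with the standard library. The first function is a tokenizer in the narrow, whitespace-splitting sense. The second is a small regex-based lexer that attaches extra context (a token kind and a character position) to each piece. The token kinds and patterns are illustrative assumptions, not taken from any particular language.

```python
import re
from typing import Iterator, List, NamedTuple


class Token(NamedTuple):
    kind: str   # the extra context a lexer attaches
    text: str
    pos: int    # character offset in the source


def whitespace_tokenize(source: str) -> List[str]:
    """Tokenizer in the narrow sense: split on tabs, spaces and newlines."""
    return source.split()


# Order matters: earlier patterns win when several could match at the same spot.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source: str) -> Iterator[Token]:
    """Lexer: same splitting job, but each token carries its kind and position."""
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r} at {match.start()}")
        yield Token(kind, match.group(), match.start())
```

    On "x = 3+4", the whitespace splitter returns "3+4" as a single token. The lexer splits it into NUMBER, OP, NUMBER, which is the typed stream a parser would then consume.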

  3. Unable to get the tokenizer of Gemma-3 - Stack Overflow

    Mar 22, 2025 · I am trying to get the tokenizer using the Hugging Face AutoTokenizer class, but I am unable to fetch it. Is there any other way to get it? Where am I going wrong?

  4. How to do Tokenizer Batch processing? - HuggingFace

    Jun 7, 2023 · In the Tokenizer documentation from Hugging Face, the __call__ function accepts List[List[str]] and says: text (str, List[str], List[List[str]], optional) — The sequence or batch of …
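
    The three documented input shapes, and the batching that usually surrounds them, are sketched below. As we read the Hugging Face docs, a plain string is one sequence and a List[str] is a batch of sequences. A list of strings counts as a pre-tokenized sequence only when is_split_into_words=True is passed. describe_text_input and batched are hypothetical helpers. The "ambiguous" answer for nested lists without the flag is this sketch's own choice, not documented library behavior.

```python
from typing import Iterator, List, Sequence


def describe_text_input(text, is_split_into_words: bool = False) -> str:
    """Hypothetical helper: name the shape of `text`, following the documented input types."""
    if isinstance(text, str):
        return "one sequence"
    if not text:
        return "empty batch"
    if isinstance(text[0], str):
        # A flat list of strings is either a batch of texts or one pre-tokenized text.
        return "one pre-tokenized sequence" if is_split_into_words else "batch of sequences"
    # A list of lists of strings: a batch of pre-tokenized sequences, if flagged as such.
    if is_split_into_words:
        return "batch of pre-tokenized sequences"
    return "ambiguous: pass is_split_into_words=True"


def batched(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])
```

    Each chunk from batched could then be passed to the tokenizer as a List[str], for example tokenizer(chunk, padding=True, truncation=True). The batch size itself is yours to tune for memory.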

  5. What does Keras Tokenizer method exactly do? - Stack Overflow

    On occasion, circumstances require us to do the following:

        from keras.preprocessing.text import Tokenizer
        tokenizer = Tokenizer(num_words=my_max)

    Then, invariably, we chant this mantra: …
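
    Below is a simplified, stdlib-only re-creation of what Tokenizer(num_words=...) appears to do. It is not Keras's code; MiniTokenizer is a hypothetical name, and oov_token, char_level and custom split characters are left out. fit_on_texts ranks words by frequency, starting at index 1. word_index keeps every word it has seen. texts_to_sequences then drops any word whose index is >= num_words, so only the num_words - 1 most frequent words survive.

```python
from collections import Counter
from typing import Dict, List, Optional

# Characters stripped before splitting (mirrors the Keras default filter set).
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_TABLE = str.maketrans({c: " " for c in FILTERS})


class MiniTokenizer:
    def __init__(self, num_words: Optional[int] = None):
        self.num_words = num_words
        self.word_counts: Counter = Counter()
        self.word_index: Dict[str, int] = {}

    @staticmethod
    def _words(text: str) -> List[str]:
        # Lowercase, replace filtered characters with spaces, split on spaces.
        return [w for w in text.lower().translate(_TABLE).split(" ") if w]

    def fit_on_texts(self, texts: List[str]) -> None:
        for text in texts:
            self.word_counts.update(self._words(text))
        # Most frequent first; ties keep first-seen order because sorted() is stable.
        ranked = sorted(self.word_counts.items(), key=lambda kv: kv[1], reverse=True)
        # Index 0 is reserved, so real words start at 1. num_words does NOT shrink this dict.
        self.word_index = {word: i for i, (word, _) in enumerate(ranked, start=1)}

    def texts_to_sequences(self, texts: List[str]) -> List[List[int]]:
        sequences = []
        for text in texts:
            seq = []
            for word in self._words(text):
                i = self.word_index.get(word)
                if i is None:
                    continue  # unseen word: dropped (no oov_token in this sketch)
                if self.num_words and i >= self.num_words:
                    continue  # too rare: outside the num_words cut-off
                seq.append(i)
            sequences.append(seq)
        return sequences
```

    With num_words=3, only indices 1 and 2 survive texts_to_sequences, even though word_index still lists every word.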

  6. python - AutoTokenizer.from_pretrained fails to load locally saved ...

        from transformers import AutoTokenizer, AutoConfig
        tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')
        config = …

  7. How to add new tokens to an existing Huggingface tokenizer?

    May 8, 2023 · And then it points to the train_new_from_iterator() function in Chapter 7, but I can't find any reference to how to use it to extend the tokenizer without retraining it.
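
    train_new_from_iterator trains a fresh tokenizer from a corpus. Extending an existing one is usually done with tokenizer.add_tokens(...), which returns how many tokens were actually new. That call is followed by model.resize_token_embeddings(len(tokenizer)), so the embedding matrix has a row for every id. Both methods exist in transformers. The wrapper below is a hypothetical helper written against those two calls only, so it runs without downloading a model.

```python
from typing import Iterable


def extend_vocabulary(tokenizer, model, new_tokens: Iterable[str]) -> int:
    """Add tokens the tokenizer does not know yet and keep the model's embeddings in sync.

    `tokenizer` should behave like a Hugging Face tokenizer (add_tokens, len()) and
    `model` like a transformers model (resize_token_embeddings). Nothing is retrained:
    existing ids keep their meaning, and new tokens get fresh ids after the old vocabulary.
    """
    added = tokenizer.add_tokens(list(new_tokens))
    if added:
        # The new embedding rows are not learned yet; fine-tune before relying on them.
        model.resize_token_embeddings(len(tokenizer))
    return added
```

    With real objects this would be called as extend_vocabulary(tokenizer, model, ["my_new_token"]) after loading both from the same checkpoint.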

  8. How do the max_length, padding and truncation arguments work in ...

    Dec 11, 2020 · I am working on a text classification problem where I want to use the BERT model as the base, followed by Dense layers. I want to know how the three arguments work. …
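
    A simplified model of the three arguments is sketched below, applied to token ids that have already been encoded. It assumes right-side padding and ignores special tokens like [CLS]/[SEP]. In Hugging Face, padding=True behaves like "longest" and padding=False like "do_not_pad". Unlike the real tokenizer, which falls back to the model's maximum length, this sketch simply requires max_length whenever truncation=True is set. pad_and_truncate is a hypothetical name. pad_id=0 matches [PAD] in the bert-base-uncased vocabulary.

```python
from typing import List, Optional, Tuple


def pad_and_truncate(
    batch: List[List[int]],
    max_length: Optional[int] = None,
    padding: str = "longest",
    truncation: bool = False,
    pad_id: int = 0,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (input_ids, attention_mask) after truncating and padding `batch`.

    truncation=True cuts every sequence to max_length.
    padding="longest" pads to the longest sequence in the batch,
    padding="max_length" pads to max_length, padding="do_not_pad" leaves lengths alone.
    The mask marks real tokens with 1 and padding with 0.
    """
    if (truncation or padding == "max_length") and max_length is None:
        raise ValueError("max_length is required for truncation or padding='max_length'")
    if truncation:
        batch = [ids[:max_length] for ids in batch]
    if padding == "longest":
        target = max((len(ids) for ids in batch), default=0)
    elif padding == "max_length":
        target = max_length  # longer, untruncated sequences stay longer
    elif padding == "do_not_pad":
        target = 0  # no sequence is shorter than 0, so nothing is added
    else:
        raise ValueError(f"unknown padding strategy {padding!r}")
    # A negative repeat count yields an empty list, so long sequences get no padding.
    input_ids = [ids + [pad_id] * (target - len(ids)) for ids in batch]
    attention_mask = [[1] * len(ids) + [0] * (target - len(ids)) for ids in batch]
    return input_ids, attention_mask
```

    For a fixed-size input to Dense layers, the usual combination is padding="max_length" with truncation=True, so every row has exactly max_length ids.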

  9. How to use tiktoken on an offline computer - Stack Overflow

    Apr 26, 2023 · First, let's grab the tokenizer blob URL from the source on your remote machine. If we trace the get_encoding function, we find it calls a function from tiktoken_ext.openai_public …
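
    Once you have the blob URL, the offline trick is to place the downloaded file where tiktoken's cache lookup expects it. The sketch below mirrors the lookup order that tiktoken's load.py appears to use in the versions we have read. It checks TIKTOKEN_CACHE_DIR, then DATA_GYM_CACHE_DIR, then <tempdir>/data-gym-cache, and names the file with the SHA-1 hex digest of the URL string. This is an internal detail, not a public API, so check it against your installed version. tiktoken_cache_path is a hypothetical helper.

```python
import hashlib
import os
import tempfile
from typing import Optional


def tiktoken_cache_path(blob_url: str, cache_dir: Optional[str] = None) -> str:
    """Path where tiktoken is believed to look for a cached copy of `blob_url`."""
    if cache_dir is None:
        # Same precedence as the environment checks in tiktoken's load.py (assumed).
        if "TIKTOKEN_CACHE_DIR" in os.environ:
            cache_dir = os.environ["TIKTOKEN_CACHE_DIR"]
        elif "DATA_GYM_CACHE_DIR" in os.environ:
            cache_dir = os.environ["DATA_GYM_CACHE_DIR"]
        else:
            cache_dir = os.path.join(tempfile.gettempdir(), "data-gym-cache")
    # The cache file name is the SHA-1 of the URL string itself, not of the file contents.
    return os.path.join(cache_dir, hashlib.sha1(blob_url.encode()).hexdigest())
```

    On the connected machine, download the blob and save it under the file name this returns. Copy it into a directory on the offline machine and set TIKTOKEN_CACHE_DIR to that directory before calling tiktoken.get_encoding. Read the URL itself from tiktoken_ext/openai_public.py in your installed version.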

  10. What's the meaning of "Using bos_token, but it is not set yet."

    Dec 21, 2020 · The __call__ method of the tokenizer has a parameter, add_special_tokens, which defaults to True. This means the BOS (beginning-of-sentence) token is added at the beginning …
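
    A toy illustration of what add_special_tokens toggles for a tokenizer that uses BOS/EOS tokens follows. Which special tokens are actually added depends on the model's tokenizer; BERT, for instance, uses [CLS] and [SEP] instead. A bos_id of None stands in for "no BOS token is set", in which case there is nothing to prepend. That is the situation the warning is reporting. build_inputs and the ids used are hypothetical.

```python
from typing import List, Optional, Sequence


def build_inputs(
    token_ids: Sequence[int],
    bos_id: Optional[int] = None,
    eos_id: Optional[int] = None,
    add_special_tokens: bool = True,
) -> List[int]:
    """Wrap `token_ids` with BOS/EOS ids when add_special_tokens is True."""
    ids = list(token_ids)
    if not add_special_tokens:
        return ids
    if bos_id is not None:  # None means the tokenizer has no BOS token set
        ids.insert(0, bos_id)
    if eos_id is not None:
        ids.append(eos_id)
    return ids
```

    Passing add_special_tokens=False returns the raw ids unchanged, which is the usual fix when you manage special tokens yourself.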