# Sentence Chunker

> Split text into chunks while preserving sentence boundaries

The `SentenceChunker` splits text into chunks made of complete sentences, so every chunk starts and ends on a sentence boundary and keeps its local context.

## API Reference

To use the `SentenceChunker` via the API, check out the [API reference documentation](../../api/chunkers/sentence-chunker).

## Installation

`SentenceChunker` is included in the base installation of Chonkie. No additional dependencies are required. For installation instructions, see the [Installation Guide](/oss/installation).

## Initialization

```python
from chonkie import SentenceChunker

# Basic initialization with default parameters
chunker = SentenceChunker(
    tokenizer="character",      # Default tokenizer (or use "gpt2", etc.)
    chunk_size=2048,            # Maximum tokens per chunk
    chunk_overlap=128,          # Overlap between chunks
    min_sentences_per_chunk=1   # Minimum sentences in each chunk
)
```

```javascript
import { SentenceChunker } from "@chonkiejs/core";

// Basic initialization with default parameters
const chunker = await SentenceChunker.create({
  tokenizer: "character",   // Default tokenizer
  chunkSize: 2048,          // Maximum tokens per chunk
  chunkOverlap: 128,        // Overlap between chunks
  minSentencesPerChunk: 1   // Minimum sentences in each chunk
});
```

## Parameters

* `tokenizer`: Tokenizer to use. Can be a string identifier ("character", "word", "byte", "gpt2", etc.) or a tokenizer instance.
* `chunk_size`: Maximum number of tokens per chunk.
* `chunk_overlap`: Number of overlapping tokens between chunks.
* `min_sentences_per_chunk`: Minimum number of sentences to include in each chunk.
* `min_characters_per_sentence`: Minimum number of characters per sentence.
* `approximate`: Use approximate token counting for faster processing. This field is deprecated and will be removed in future versions.
* `delim`: Delimiters to split sentences on.
* `include_delim`: Specify whether to include the delimiter with the previous or next chunk.

## Usage

### Single Text Chunking

```python
text = """This is the first sentence. This is the second sentence. And here's a third one with some additional context."""

chunks = chunker.chunk(text)

for chunk in chunks:
    print(f"Chunk text: {chunk.text}")
    print(f"Token count: {chunk.token_count}")
```

```javascript
const text = `This is the first sentence. This is the second sentence. And here's a third one with some additional context.`;

const chunks = await chunker.chunk(text);

for (const chunk of chunks) {
  console.log(`Chunk text: ${chunk.text}`);
  console.log(`Token count: ${chunk.tokenCount}`);
}
```

### Batch Chunking

```python
texts = [
    "First document. With multiple sentences.",
    "Second document. Also with sentences. And more context."
]

# Returns one list of chunks per input document
batch_chunks = chunker.chunk_batch(texts)

for doc_chunks in batch_chunks:
    for chunk in doc_chunks:
        print(f"Chunk: {chunk.text}")
```

```javascript
const texts = [
  "First document. With multiple sentences.",
  "Second document. Also with sentences. And more context."
];

// Returns one array of chunks per input document
const batchChunks = await chunker.chunkBatch(texts);

for (const docChunks of batchChunks) {
  for (const chunk of docChunks) {
    console.log(`Chunk: ${chunk.text}`);
  }
}
```

### Using as a Callable

```python
# Single text
chunks = chunker("First sentence. Second sentence.")

# Multiple texts
batch_chunks = chunker(["Text 1. More text.", "Text 2. More."])
```

## Supported Tokenizers

`SentenceChunker` supports multiple tokenizer backends:

* **TikToken** (Recommended)

  ```python
  import tiktoken

  tokenizer = tiktoken.get_encoding("gpt2")
  ```

* **AutoTikTokenizer**

  ```python
  from autotiktokenizer import AutoTikTokenizer

  tokenizer = AutoTikTokenizer.from_pretrained("gpt2")
  ```

* **Hugging Face Tokenizers**

  ```python
  from tokenizers import Tokenizer

  tokenizer = Tokenizer.from_pretrained("gpt2")
  ```

* **Transformers**

  ```python
  from transformers import AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  ```

## Return Type

`SentenceChunker` returns chunks as `Chunk` objects:

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Chunk:
    text: str          # The chunk text
    start_index: int   # Starting position in original text
    end_index: int     # Ending position in original text
    token_count: int   # Number of tokens in chunk
    context: Optional[str] = None  # Optional overlap context text
    embedding: Union[list[float], "np.ndarray", None] = None  # Optional embedding vector
```
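
To make the sentence-boundary behavior and the index fields more concrete, here is a minimal, dependency-free sketch of greedy sentence packing. It is an illustration only, not Chonkie's implementation: `SketchChunk`, `split_sentences`, and `sketch_sentence_chunks` are hypothetical names, tokens are assumed to be single characters (similar in spirit to the "character" tokenizer), the delimiter set is an assumption, and overlap is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class SketchChunk:
    """Hypothetical stand-in with four of the fields shown on Chunk."""
    text: str
    start_index: int
    end_index: int
    token_count: int


def split_sentences(text: str, delims=(". ", "! ", "? ", "\n")) -> list[str]:
    """Split after each delimiter, keeping the delimiter with the preceding sentence."""
    sentences, start, i = [], 0, 0
    while i < len(text):
        for d in delims:
            if text.startswith(d, i):
                i += len(d)
                sentences.append(text[start:i])
                start = i
                break
        else:
            i += 1  # no delimiter at this position, move on
    if start < len(text):
        sentences.append(text[start:])  # trailing text without a delimiter
    return sentences


def sketch_sentence_chunks(text: str, chunk_size: int = 40) -> list[SketchChunk]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters.

    A single sentence longer than chunk_size is kept whole in its own chunk.
    """
    chunks, current, start = [], "", 0
    for sentence in split_sentences(text):
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) > chunk_size:
            chunks.append(SketchChunk(current, start, start + len(current), len(current)))
            start += len(current)
            current = ""
        current += sentence
    if current:
        chunks.append(SketchChunk(current, start, start + len(current), len(current)))
    return chunks


text = "First sentence. Second sentence. A third, longer sentence here."
chunks = sketch_sentence_chunks(text, chunk_size=40)
for c in chunks:
    # The indices point back into the original text.
    assert text[c.start_index:c.end_index] == c.text
    print(repr(c.text), c.token_count)
```

In this sketch no sentence is ever split across chunks, and concatenating the chunk texts reproduces the input. When working with the real `Chunk` objects, `start_index` and `end_index` can be used in a similar way to locate a chunk in the source text.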