Python iterators provide a standardized way to traverse elements of a collection one at a time without exposing the underlying representation. They are commonly used to iterate over lists, dictionaries, sets, or even files and streams, making them essential for memory-efficient programming.
The iterator protocol in Python requires an object to implement the __iter__() method, which returns the iterator object itself, and the __next__() method, which returns the next element. This design allows for lazy evaluation, meaning elements are computed only when needed, which is crucial for handling large datasets.
Iterators are often combined with generators to simplify code and reduce memory usage. Generators allow developers to define iterators using functions with the yield keyword, producing items on-the-fly instead of storing the entire sequence in memory.
An iterable in Python is any object capable of returning its elements one at a time, such as a list, tuple, or string. Iterables implement the __iter__() method, which returns an iterator object.
An iterator, on the other hand, is the object returned by calling __iter__() on an iterable. It maintains an internal state and implements the __next__() method, which produces the next value when called, raising StopIteration when the sequence ends.
In practice, you loop over an iterable directly with a for loop. Internally, the loop calls iter() on the iterable to obtain an iterator and then calls __next__() repeatedly until StopIteration is raised.
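The difference between an iterable and an iterator can be checked directly in an interpreter. The following is a minimal sketch; the variable names are illustrative only.

```python
numbers = [1, 2, 3]                  # a list is an iterable, not an iterator

it = iter(numbers)                   # ask the iterable for a fresh iterator
is_same_object = it is numbers       # False: the iterator is a separate object
iter_returns_self = iter(it) is it   # True: an iterator's __iter__() returns itself

first = next(it)                     # advances the iterator's internal state
second = next(it)

# A second call to iter() on the list gives an independent iterator
# that starts again from the beginning.
fresh = iter(numbers)
first_again = next(fresh)
```

The list itself is never consumed; only the iterator objects created from it carry a position.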
To implement a custom iterator, a class must define the __iter__() and __next__() methods. __iter__() typically returns self, while __next__() calculates the next value and maintains the iteration state.
For a class generating squares of numbers, you can maintain a counter that increments with each call to __next__(), returning the square of the counter until a predefined limit is reached, after which StopIteration is raised.
This approach allows the class to be used in any context that expects an iterator, such as for loops or comprehension expressions, providing a clean and memory-efficient iteration mechanism.
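The squares iterator described above could look roughly like this. It is a sketch under two assumptions not stated in the text: the class name Squares is illustrative, and the limit is taken to mean the largest counter value whose square is returned.

```python
class Squares:
    """Iterator yielding 1**2, 2**2, ..., limit**2 (illustrative name)."""

    def __init__(self, limit):
        self.limit = limit
        self.counter = 0

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.counter >= self.limit:
            raise StopIteration
        self.counter += 1
        return self.counter ** 2


squares = list(Squares(5))   # works anywhere an iterable is expected
total = sum(Squares(3))      # 1 + 4 + 9
```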
Python iterators enable lazy evaluation, generating each element only when needed rather than storing the entire dataset in memory. This is crucial for large data streams where holding all elements at once is infeasible.
For example, reading a multi-gigabyte log file line by line can be efficiently handled using an iterator, whereas loading all lines into a list could consume excessive memory and degrade performance.
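A small sketch of line-by-line processing follows. To keep it self-contained, io.StringIO stands in for a real log file; this is an assumption made for illustration, and a file object returned by open() is iterated in the same way.

```python
import io

# io.StringIO stands in for a large log file opened with open(path);
# both are iterators that produce one line per step.
log_file = io.StringIO(
    "INFO service started\n"
    "ERROR disk full\n"
    "INFO request handled\n"
    "ERROR timeout\n"
)

error_count = 0
for line in log_file:            # only the current line is held at a time
    if line.startswith("ERROR"):
        error_count += 1

print(error_count)
```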
Iterators also integrate with generators and itertools to create complex, composable pipelines. This combination allows for filtering, mapping, and batching operations on-the-fly without materializing intermediate sequences, enhancing scalability and reducing memory footprint.
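One possible shape for such a pipeline is sketched below. The helper in_batches is hypothetical and written here only for illustration; the stages are chained generator expressions, so nothing is computed until the final consumer asks for values.

```python
from itertools import count, islice


def in_batches(iterable, size):
    """Yield lists of up to `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


source = count(1)                            # infinite stream: 1, 2, 3, ...
evens = (n for n in source if n % 2 == 0)    # filter stage, lazy
squared = (n * n for n in evens)             # map stage, lazy
batches = in_batches(squared, 3)             # batching stage, lazy

# Only the values needed for two batches are ever produced.
first_two_batches = list(islice(batches, 2))
```

Even though the source is infinite, the pipeline terminates because the consumer requests only two batches.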
All iterators are inherently iterables because they implement the __iter__() method, allowing them to be used in a for loop or any context requiring an iterable.
Calling iter() on an iterator returns the iterator itself, as the iterator protocol requires. Iterators generally do not hold a copy of all elements, and once exhausted they cannot be reused; a new iterator must be created.
iter() converts any iterable into an iterator. Custom classes implementing __iter__() and __next__() can also be iterators.
Generator functions with yield implicitly create an iterator object when called. List comprehensions create lists, which are iterables but not iterators.
Iterators internally track the current position, allowing successive calls to __next__() to return the next element.
When an iterator is exhausted, next() raises StopIteration. Iterators cannot reset automatically; they must be recreated.
itertools.chain() combines multiple iterables into a single iterator without storing all elements in memory.
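As a brief illustration of itertools.chain(), the sketch below joins a list, a tuple, and a generator. The generator numbered is a hypothetical source invented for this example.

```python
from itertools import chain


def numbered(prefix, n):
    """Hypothetical lazy source yielding prefix0, prefix1, ..."""
    for i in range(n):
        yield f"{prefix}{i}"


combined = chain([1, 2], ("a", "b"), numbered("x", 2))
is_iterator = iter(combined) is combined   # chain objects are iterators
result = list(combined)                    # inputs are read one after another
```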
The class maintains a current value starting from 0 and increments by 2 with each call to __next__().
When the current value exceeds the specified limit, StopIteration is raised, signaling the end of iteration.
This design allows iteration over even numbers up to a given limit without precomputing a list, saving memory and providing lazy evaluation.
// Python
class EvenNumbers:
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.limit:
            raise StopIteration
        value = self.current
        self.current += 2
        return value

# Usage
for num in EvenNumbers(10):
    print(num)
The generator maintains two variables a and b to track consecutive Fibonacci numbers.
The yield statement produces each Fibonacci number on the fly, allowing iteration without storing the entire sequence.
This is efficient for generating long sequences where memory usage is a concern.
// Python
def fibonacci(n):
    a, b = 0, 1
    count = 0
    while count < n:
        yield a
        a, b = b, a + b
        count += 1

# Usage
for num in fibonacci(7):
    print(num)
The iterator uses a stack to manage elements, processing nested lists in a LIFO manner to flatten them.
Lists encountered during iteration are reversed and extended onto the stack to preserve order in the flattened sequence.
This approach allows traversal of arbitrarily nested lists without recursion, making it suitable for large or deeply nested data structures.
// Python
class FlattenIterator:
    def __init__(self, nested_list):
        self.stack = nested_list[::-1]

    def __iter__(self):
        return self

    def __next__(self):
        while self.stack:
            top = self.stack.pop()
            if isinstance(top, list):
                self.stack.extend(top[::-1])
            else:
                return top
        raise StopIteration

# Usage
nested = [1, [2, [3, 4], 5], 6]
for val in FlattenIterator(nested):
    print(val)
The iterator maintains an index and wraps around using modulo arithmetic, producing an infinite repeating sequence.
StopIteration is raised only if the input list is empty; otherwise, iteration continues indefinitely.
This pattern is useful in applications such as round-robin scheduling or repeated simulations where cyclic access is required.
// Python
class InfiniteCycle:
    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if not self.data:
            raise StopIteration
        value = self.data[self.index]
        self.index = (self.index + 1) % len(self.data)
        return value

# Usage
cycle_iter = InfiniteCycle([1, 2, 3])
for i in range(10):
    print(next(cycle_iter))
Iterators process data one element at a time instead of loading the entire dataset into memory. When working with multi-gigabyte database exports, audit logs, or event streams, this significantly reduces memory consumption and startup time.
A common production pattern is reading records from a file, transforming them, and sending them to another system. Using iterators allows each record to be processed immediately after it is read, creating a streaming pipeline rather than a batch-loading approach.
This design also improves scalability because memory usage remains relatively constant regardless of the size of the source data. As datasets grow, iterator-based solutions typically remain stable while list-based approaches may encounter memory pressure or performance degradation.
Iterators are stateful objects. Every call to next() advances the iterator. A for loop repeatedly calls next() internally, consuming elements until StopIteration is raised.
Converting an iterator to a list also consumes all remaining elements. Most iterator objects do not support len() because the total number of remaining elements may be unknown or expensive to determine.
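The behaviour described above can be observed with a few lines. This is an illustrative sketch; the variable names are not from the original text.

```python
it = iter(["a", "b", "c", "d"])

first = next(it)        # consumes "a"
rest = list(it)         # consumes everything that is left
leftover = list(it)     # the iterator is now exhausted

# Most iterators do not define __len__(), so len() raises TypeError.
try:
    len(iter([1, 2, 3]))
    len_supported = True
except TypeError:
    len_supported = False
```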
Batch processing is a common requirement when sending records to APIs, databases, or message queues. Instead of handling one record at a time, the iterator returns groups of records.
The iterator maintains an index and slices the underlying collection on each iteration. This pattern is frequently used in ETL and integration workloads where systems impose batch size limits.
// Python
class BatchIterator:
    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        batch = self.data[self.index:self.index + self.batch_size]
        self.index += self.batch_size
        return batch

records = list(range(1, 11))
for batch in BatchIterator(records, 3):
    print(batch)
Iterators maintain internal state. When multiple consumers share the same iterator, each consumer advances the iterator position. This can result in missing records, inconsistent processing, or difficult-to-debug behavior.
For example, if one component reads five records before another component starts processing, those five records are no longer available to the second consumer. Unlike lists, iterators do not automatically provide independent views of the same data.
In production systems, it is often safer to create separate iterators from the original iterable, or to use itertools.tee() when a one-shot iterator must be traversed independently. Note that tee() buffers every item that one consumer has read and another has not, so memory use grows when the consumers advance at very different rates.
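The following sketch contrasts a shared iterator with independent views created by itertools.tee(). The generator records is a hypothetical one-shot source used only for illustration.

```python
from itertools import tee


def records():
    """Hypothetical one-shot source of records."""
    for i in range(1, 6):
        yield i


# Shared iterator: the second consumer misses what the first one took.
shared = records()
consumer_a = [next(shared), next(shared)]
consumer_b = list(shared)

# tee() gives each consumer its own view of the same source.
view_a, view_b = tee(records(), 2)
all_a = list(view_a)    # tee buffers these items until view_b reads them
all_b = list(view_b)
```

After calling tee(), the original iterator should not be advanced directly, because the views would not see the items it consumed.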
StopIteration is the mechanism used by the iterator protocol to indicate exhaustion. When no additional values are available, __next__() raises StopIteration.
For loops catch this exception internally and terminate the loop gracefully. Developers usually interact with it indirectly through iteration constructs.
This example demonstrates the low-level iterator protocol that powers every for loop in Python.
The iter() function creates an iterator, and next() retrieves successive values until StopIteration signals completion. Understanding this behavior helps when debugging custom iterators.
// Python
numbers = (10, 20, 30)
iterator = iter(numbers)
try:
    while True:
        print(next(iterator))
except StopIteration:
    print('Iteration completed')
Large files can be processed incrementally using chunk-based iteration. This approach avoids loading the entire file into memory.
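The iterator reads a fixed number of characters during each iteration and stops when the end of the file is reached, closing the file at that point. If a consumer abandons the loop early, this class does not close the file, so the caller should do so explicitly. Similar patterns are commonly used for log processing and file transfer systems.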
// Python
class FileChunkIterator:
    def __init__(self, filename, chunk_size=1024):
        self.file = open(filename, 'r', encoding='utf-8')
        self.chunk_size = chunk_size

    def __iter__(self):
        return self

    def __next__(self):
        chunk = self.file.read(self.chunk_size)
        if not chunk:
            self.file.close()
            raise StopIteration
        return chunk

# Usage
# for chunk in FileChunkIterator('sample.txt', 512):
#     print(chunk)
map(), filter(), and zip() produce lazy iterators that generate values on demand. This allows large datasets to be processed efficiently.
sorted() is different because it immediately creates and returns a list containing all sorted elements.
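A quick way to see the difference is to inspect what each built-in returns. The sketch below uses illustrative names and small inputs.

```python
values = [3, 1, 2]

mapped = map(lambda x: x * 10, values)
filtered = filter(lambda x: x > 1, values)
zipped = zip(values, "abc")
ordered = sorted(values)

# map, filter and zip objects are iterators; sorted() returns a list.
lazy_flags = [hasattr(obj, "__next__") for obj in (mapped, filtered, zipped)]
sorted_is_list = isinstance(ordered, list)

# Values are only computed when the iterators are consumed.
mapped_values = list(mapped)
filtered_values = list(filtered)
zipped_values = list(zipped)
```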
Generators are usually preferred when iteration logic is straightforward and does not require complex state management. They provide the same lazy behavior while significantly reducing boilerplate code.
A generator can often replace dozens of lines of iterator class implementation with a few yield statements. This improves readability and maintainability without sacrificing performance.
Custom iterator classes become more valuable when multiple state variables, configuration options, resource management requirements, or specialized behaviors need to be encapsulated within a reusable object.
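To make the comparison concrete, here is a generator that behaves like the EvenNumbers class shown earlier. The function name even_numbers is illustrative; the sketch assumes the same inclusive limit as the class.

```python
def even_numbers(limit):
    """Yield 0, 2, 4, ... up to and including limit."""
    current = 0
    while current <= limit:
        yield current       # execution pauses here until the next request
        current += 2


evens = list(even_numbers(10))
```

The state that the class kept in self.current is now an ordinary local variable, and StopIteration is raised automatically when the function returns.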
The generator evaluates each status code lazily and yields only successful HTTP responses. This avoids creating unnecessary intermediate collections.
Similar filtering pipelines are frequently used in API monitoring, integration platforms, observability systems, and event-processing applications where millions of records may pass through a workflow.
// Python
def successful_responses(status_codes):
    for code in status_codes:
        if 200 <= code < 300:
            yield code

responses = [200, 404, 201, 500, 204, 301]
for code in successful_responses(responses):
    print(code)
The itertools module provides a suite of tools for building complex iterators that perform combinations, permutations, chaining, grouping, and infinite iteration without creating intermediate collections.
For example, itertools.cycle() can be used for round-robin scheduling, and itertools.islice() allows slicing an iterator efficiently, which is especially useful for large datasets or streaming data.
By leveraging itertools with custom iterators, developers can create memory-efficient pipelines for ETL, batch processing, or analytics tasks without the overhead of storing all intermediate results in memory.
itertools.count() generates an infinite iterator of numbers. permutations() and combinations() produce iterators over all possible arrangements and selections, respectively.
sum() consumes its input immediately and returns a single number, not an iterator, so it does not support lazy iteration.
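The itertools functions mentioned above can be tried out as follows. This is a small sketch with made-up worker names; infinite iterators are always bounded by islice() or by zipping with a finite range.

```python
from itertools import combinations, count, cycle, islice, permutations

# Round-robin assignment: cycle() repeats the workers indefinitely,
# and zip() stops when the finite range of task ids runs out.
workers = cycle(["w1", "w2", "w3"])
assignments = list(zip(range(5), workers))

# islice() takes a bounded slice of an infinite counter.
first_odds = list(islice(count(1, 2), 4))

perms = list(permutations("ab"))
pairs = list(combinations([1, 2, 3], 2))

# sum() is eager: it consumes its input and returns a plain number.
total = sum(islice(count(1), 4))
```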
The iterator keeps track of both the current key and index within the list associated with that key.
It moves to the next key when the inner list is exhausted, flattening the dictionary into a stream of key-value tuples.
This approach is useful when iterating over structured data from APIs, configuration files, or nested datasets in a memory-efficient manner.
// Python
class DictListFlattener:
    def __init__(self, data):
        self.items = list(data.items())
        self.outer_index = 0
        self.inner_index = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Loop rather than recurse so that many empty lists cannot
        # exhaust the call stack.
        while self.outer_index < len(self.items):
            key, lst = self.items[self.outer_index]
            if self.inner_index < len(lst):
                value = lst[self.inner_index]
                self.inner_index += 1
                return (key, value)
            # Inner list exhausted: move on to the next key.
            self.outer_index += 1
            self.inner_index = 0
        raise StopIteration

# Usage
data = {'a': [1, 2], 'b': [3, 4]}
for k, v in DictListFlattener(data):
    print(k, v)
A generator expression uses lazy evaluation, creating an iterator that yields values one at a time, whereas a list comprehension evaluates immediately and returns a complete list.
Generator expressions are memory-efficient for large sequences because they produce items on-the-fly without storing the entire result in memory.
In practice, generator expressions are preferred when processing streams of data, while list comprehensions are convenient for small collections where immediate access to all items is required.
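The memory difference can be observed with sys.getsizeof(), which reports the size of the container object itself. The sketch below is illustrative; exact byte counts vary between Python versions and platforms, so only the relative comparison is meaningful.

```python
import sys

n = 100_000
as_list = [i * i for i in range(n)]    # all values computed and stored now
as_gen = (i * i for i in range(n))     # nothing computed yet

list_bytes = sys.getsizeof(as_list)    # grows with n
gen_bytes = sys.getsizeof(as_gen)      # small and independent of n

total_from_gen = sum(as_gen)           # values are produced one at a time
total_from_list = sum(as_list)
```

Note that the generator expression can be consumed only once, whereas the list can be iterated repeatedly.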
For loops, list(), and sum() internally iterate through the iterator, consuming elements as they go.
reversed() requires an object that defines __reversed__() or supports the sequence protocol (__len__() and __getitem__()), so it cannot operate directly on generic iterators such as generators.
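The restriction on reversed() is easy to demonstrate. In this sketch, letters is a hypothetical generator; materializing it into a list first is one workaround when the data fits in memory.

```python
def letters():
    """Hypothetical generator used as a generic iterator."""
    yield from "abc"


# A generator has neither __reversed__() nor __len__()/__getitem__().
try:
    reversed(letters())
    reversed_ok = True
except TypeError:
    reversed_ok = False

# Workaround: build a sequence first, then reverse it.
backwards = list(reversed(list(letters())))
```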
The generator iterates from 2 up to the specified limit and checks each number for primality by testing divisibility up to its square root.
Using yield allows each prime number to be produced on demand, avoiding storage of all primes in memory and supporting efficient processing of large limits.
// Python
def primes_up_to(limit):
    for n in range(2, limit + 1):
        is_prime = True
        for i in range(2, int(n**0.5) + 1):
            if n % i == 0:
                is_prime = False
                break
        if is_prime:
            yield n

for p in primes_up_to(20):
    print(p)
Standard iterators are synchronous and block until each element is available. When dealing with asynchronous streams, you must use async iterators and async for loops.
Python provides the __aiter__() and __anext__() methods for asynchronous iteration, allowing integration with async generators, network I/O, or event-driven streams without blocking the event loop.
This separation ensures that large-scale real-time data processing, such as consuming messages from a queue or streaming logs, can be efficiently handled using iterator patterns while maintaining non-blocking concurrency.
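A minimal asynchronous iterator might look like the sketch below. The Ticker class and the collect coroutine are hypothetical names, and asyncio.sleep() stands in for real network or queue I/O.

```python
import asyncio


class Ticker:
    """Hypothetical async iterator yielding 0..n-1 with a simulated wait."""

    def __init__(self, n, delay=0.01):
        self.n = n
        self.delay = delay
        self.i = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.i >= self.n:
            # The async counterpart of StopIteration.
            raise StopAsyncIteration
        await asyncio.sleep(self.delay)   # other tasks can run during the wait
        value = self.i
        self.i += 1
        return value


async def collect(n):
    # "async for" drives __anext__() and awaits each result.
    return [value async for value in Ticker(n)]


ticks = asyncio.run(collect(3))
```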
Iterators are stateful, so sharing them across threads or reusing them after exhaustion can lead to lost data or inconsistent processing.
Functions that consume iterators fully can leave the caller with an empty iterator, which may not be expected unless the developer explicitly accounts for it.
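The second pitfall can be reproduced in a few lines. The helper count_errors is hypothetical; the point is only that it drains whatever iterator it receives.

```python
def count_errors(lines):
    """Hypothetical helper; it consumes `lines` completely."""
    return sum(1 for line in lines if "ERROR" in line)


lines = iter(["INFO ok", "ERROR disk", "ERROR net"])
errors = count_errors(lines)
remaining = list(lines)         # nothing is left for the caller

# One remedy when the data fits in memory: pass a list, which can be
# traversed again after the helper has finished.
snapshot = ["INFO ok", "ERROR disk", "ERROR net"]
errors_again = count_errors(snapshot)
still_available = len(snapshot)
```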
The generator maintains a current value and increments it by the step size on each iteration.
Because it is infinite, it never raises StopIteration, and elements are produced lazily as needed.
This pattern is useful in simulations, scheduling, or generating predictable sequences in streaming applications.
// Python
def arithmetic_sequence(start=0, step=1):
    current = start
    while True:
        yield current
        current += step

seq = arithmetic_sequence(5, 3)
for _ in range(5):
    print(next(seq))
The iterator maintains the next element from each input iterator and always yields the smaller one, advancing the corresponding iterator.
This allows efficient, memory-friendly merging of two sorted sequences without creating intermediate lists.
Such iterators are widely used in external sorting, merging logs, or streaming sorted datasets from multiple sources.
// Python
class MergeSortedIterators:
    def __init__(self, iter1, iter2):
        self.iter1 = iter(iter1)
        self.iter2 = iter(iter2)
        self.next1 = next(self.iter1, None)
        self.next2 = next(self.iter2, None)

    def __iter__(self):
        return self

    def __next__(self):
        if self.next1 is None and self.next2 is None:
            raise StopIteration
        if self.next1 is None:
            result, self.next2 = self.next2, next(self.iter2, None)
        elif self.next2 is None:
            result, self.next1 = self.next1, next(self.iter1, None)
        else:
            if self.next1 <= self.next2:
                result, self.next1 = self.next1, next(self.iter1, None)
            else:
                result, self.next2 = self.next2, next(self.iter2, None)
        return result

# Usage
for val in MergeSortedIterators([1, 3, 5], [2, 4, 6]):
    print(val)