InterviewQAs

Python Generators

Python generators are widely used in data engineering, API integrations, ETL pipelines, log processing systems, and streaming applications where loading everything into memory is impractical. They provide a way to produce values lazily, allowing applications to handle large datasets efficiently.

Experienced engineers often rely on generators when working with files, database records, message queues, and network streams. Instead of returning complete collections, generators yield data incrementally, reducing memory consumption and improving scalability.

Generators become particularly valuable in enterprise systems where data volumes are unpredictable. They help create processing pipelines that transform, filter, and aggregate data on demand while maintaining a small memory footprint.

Beyond memory efficiency, generators improve application responsiveness. Consumers can begin processing results immediately without waiting for the entire dataset to be produced. This behavior is useful in reporting systems, event-driven architectures, and real-time integrations.

Understanding generator execution flow, state preservation, delegation, exception handling, and advanced features such as send(), throw(), and yield from is important for technical interviews and practical software development. The questions below focus on realistic scenarios encountered by professional Python developers.

Question 01

Why would a data engineer choose a generator instead of returning a list when processing a large CSV file containing millions of records?

EASY

A generator allows records to be processed one at a time instead of loading the entire file into memory. In a large CSV containing millions of rows, creating a list could consume several gigabytes of RAM and potentially cause application failures or severe performance degradation.

With a generator, each row is read, transformed, and consumed only when requested by the caller. This creates a streaming workflow where memory usage remains relatively constant regardless of file size.

In production ETL systems, generators are often combined with filtering, validation, and transformation stages. Each stage processes records incrementally, enabling efficient pipelines that can handle datasets much larger than available system memory.
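The streaming workflow described above can be sketched as follows. This is a minimal illustration, not a production loader: the `iter_rows` and `high_value` helpers and the `id`/`amount` columns are assumptions made for the example, and an in-memory `io.StringIO` stands in for a large file.

```python
import csv
import io


def iter_rows(handle):
    """Yield one parsed CSV row (as a dict) at a time."""
    for row in csv.DictReader(handle):
        yield row


def high_value(rows, minimum):
    """Lazily keep rows whose 'amount' column is at least `minimum`."""
    for row in rows:
        if float(row["amount"]) >= minimum:
            yield row


# Stand-in for a large file; a real job would pass open(path, newline="").
sample = io.StringIO("id,amount\n1,10.5\n2,250.0\n3,99.9\n4,1000\n")

matching_ids = [row["id"] for row in high_value(iter_rows(sample), 100)]
print(matching_ids)  # ['2', '4']
```

Only one row is held by the pipeline at any moment; the list at the end exists purely to display the result.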

Question 02

Which statements about Python generators are correct?

MEDIUM
  • A Generators preserve local variable state between yields.
  • B Generators automatically execute all code when created.
  • C Generators can be iterated only once unless recreated.
  • D Generators always consume more memory than lists.

A generator pauses execution at each yield statement and retains local variables, execution position, and internal state. When iteration resumes, execution continues from the exact point where it stopped.

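Creating a generator does not execute its body immediately; execution begins only when iteration starts, so option B is incorrect. A generator can also be consumed only once: after it is exhausted, a new generator object must be created to iterate again. Option D is incorrect as well, because generators typically use far less memory than lists by producing values on demand rather than storing them all at once.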
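A short sketch can make the lazy start and single-use behavior visible. The `countdown` function and the `events` list are illustrative names introduced here, not part of the question.

```python
events = []


def countdown(start):
    events.append("body started")   # runs only once iteration begins
    while start > 0:
        yield start
        start -= 1


gen = countdown(3)
created_without_running = (events == [])   # nothing has executed yet

first_pass = list(gen)     # runs the body and drains the generator
second_pass = list(gen)    # already exhausted, so nothing is produced

print(created_without_running, first_pass, second_pass)
# True [3, 2, 1] []
```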

Question 03

Create a generator that streams log entries from a list and returns only ERROR messages.

EASY

This generator filters records lazily and yields only entries matching a specific condition. The consumer receives values one at a time as iteration progresses.

A similar approach is commonly used in monitoring platforms where log streams can be extremely large. Rather than storing all matching records, the application processes them incrementally.

# Python
logs = [
    "INFO Application started",
    "ERROR Database connection failed",
    "INFO User authenticated",
    "ERROR Invalid token"
]


def error_logs(log_entries):
    for entry in log_entries:
        if entry.startswith("ERROR"):
            yield entry


for log in error_logs(logs):
    print(log)

Question 04

How does 'yield from' simplify generator-based processing pipelines?

MEDIUM

The 'yield from' statement delegates iteration to another iterable or generator. Instead of manually looping through values and yielding them individually, a generator can hand control to another generator using a single statement.

This delegation makes complex processing pipelines easier to read and maintain. Nested generators become cleaner because intermediate forwarding code disappears.

In enterprise integration systems, 'yield from' is useful when combining multiple data sources or processing stages. Each stage remains independent while still participating in a larger streaming workflow.
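To illustrate the difference, the sketch below combines two sources first with a manual inner loop and then with delegation. The `read_source` helper and the source names are hypothetical.

```python
def read_source(name, count):
    """Hypothetical source that yields labelled records."""
    for i in range(1, count + 1):
        yield f"{name}-{i}"


def combined_manual(sources):
    # Forwarding by hand: an explicit inner loop per source.
    for source in sources:
        for record in source:
            yield record


def combined_delegating(sources):
    # Same result, with the inner loop replaced by delegation.
    for source in sources:
        yield from source


manual = list(combined_manual([read_source("crm", 2), read_source("erp", 1)]))
delegated = list(combined_delegating([read_source("crm", 2), read_source("erp", 1)]))
print(delegated)  # ['crm-1', 'crm-2', 'erp-1']
```

Beyond saving a loop, delegation also forwards values passed through send() and exceptions passed through throw() to the inner generator, which a hand-written loop does not do.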

Question 05

Which operations can be performed on an active generator object?

HARD
  • A send()
  • B throw()
  • C close()
  • D append()

Generators support advanced control mechanisms through send(), throw(), and close(). These methods allow callers to pass values into generators, inject exceptions, or terminate execution gracefully.

append() is a list operation and has no meaning for generator objects. Understanding generator control methods is important when building coroutine-style workflows and event-processing systems.
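The three control methods can be seen together in a small sketch. The `worker` generator below is an illustrative example, assuming a caller that wants to observe what happened through a shared log list.

```python
def worker():
    log = []
    try:
        while True:
            try:
                item = yield log
                log.append(f"got {item}")
            except ValueError as exc:
                # Raised by the caller through throw(); handled here.
                log.append(f"recovered from {exc}")
    finally:
        # Runs when close() raises GeneratorExit at the paused yield.
        log.append("cleaned up")


gen = worker()
log = next(gen)                       # prime: run to the first yield
gen.send("a")                         # resumes with item == "a"
gen.throw(ValueError("bad input"))    # raises inside the generator
gen.close()                           # triggers the finally block
print(log)  # ['got a', 'recovered from bad input', 'cleaned up']
```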

Question 06

Write a generator that reads records in batches from a dataset and yields fixed-size chunks.

MEDIUM

Batch processing is a common requirement when interacting with APIs, databases, and message brokers. This generator accumulates records until the desired batch size is reached and then yields the batch.

The final partial batch is also yielded, ensuring that no records are lost. This pattern is frequently used when loading data into Salesforce, Snowflake, or other enterprise platforms.

# Python

def batch_generator(records, batch_size):
    batch = []

    for record in records:
        batch.append(record)

        if len(batch) == batch_size:
            yield batch
            batch = []

    if batch:
        yield batch


records = range(1, 11)

for chunk in batch_generator(records, 3):
    print(chunk)

Question 07

What are the tradeoffs of using generators extensively in a production application?

HARD

Generators provide excellent memory efficiency, but they can introduce debugging complexity. Since execution is suspended and resumed repeatedly, understanding program flow may become more difficult than with traditional functions.

Another consideration is that generators are consumable streams. Once exhausted, they cannot be reused unless recreated. This behavior can lead to subtle bugs when multiple consumers expect access to the same data.

In large systems, developers should balance memory savings against maintainability. Generators are highly effective for streaming workloads, but excessive chaining of generators can make troubleshooting and observability more challenging.

Question 08

What will happen when a generator function reaches the end of its execution without another yield statement?

MEDIUM
  • A A StopIteration exception is raised internally.
  • B The generator automatically restarts.
  • C Iteration ends normally.
  • D The generator converts itself into a list.

When execution reaches the end of a generator function, Python raises StopIteration internally to signal completion. Iteration constructs such as for loops handle this exception automatically.

The generator does not restart itself and does not convert into another data structure. Understanding termination behavior helps developers build reliable streaming workflows.
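The termination signal can be observed directly by calling next() by hand. This is a small sketch with an illustrative `two_values` generator.

```python
def two_values():
    yield "first"
    yield "second"
    # Falling off the end here finishes the generator.


gen = two_values()
collected = [next(gen), next(gen)]

try:
    next(gen)               # no yield left to reach
    finished = False
except StopIteration:
    finished = True

# A for loop (or comprehension) absorbs the same signal and simply stops.
looped = [value for value in two_values()]
print(collected, finished, looped)
# ['first', 'second'] True ['first', 'second']
```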

Question 09

Demonstrate how values can be sent into a generator using send().

HARD

The generator acts like a stateful processor that accepts incoming values through send(). Each received value updates the running total while preserving state between executions.

This pattern is useful in streaming analytics, event aggregation, and real-time monitoring systems where data arrives incrementally rather than as a complete collection.

# Python

def running_total():
    total = 0

    while True:
        value = yield total
        if value is not None:
            total += value


gen = running_total()

print(next(gen))     # prime the generator: run to the first yield (prints 0)
print(gen.send(10))  # 10
print(gen.send(5))   # 15
print(gen.send(20))  # 35

Question 10

Build a multi-stage generator pipeline that reads numbers, filters even values, and transforms them into squares.

HARD

This example demonstrates a streaming pipeline where each generator performs a single responsibility. Data flows through multiple stages without creating intermediate collections.

The design mirrors many real-world ETL and integration workloads. Records are sourced, filtered, enriched, and delivered progressively, allowing the application to process large volumes of data with minimal memory usage.

# Python

def source():
    for number in range(1, 11):
        yield number


def filter_even(numbers):
    for number in numbers:
        if number % 2 == 0:
            yield number


def square(numbers):
    for number in numbers:
        yield number ** 2


pipeline = square(filter_even(source()))

for value in pipeline:
    print(value)

Question 11

How do generator expressions differ from list comprehensions, and when would you prefer one over the other?

MEDIUM

Generator expressions create values lazily, while list comprehensions build the entire collection immediately in memory. This distinction becomes significant when working with large datasets where memory consumption matters.

For example, calculating aggregates over millions of records can often be performed using a generator expression because values are consumed one at a time. A list comprehension would first allocate memory for every element before processing begins.

List comprehensions remain useful when data must be accessed multiple times, indexed, or reused across operations. Generator expressions are generally preferred for one-time sequential processing where memory efficiency is a priority.
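A rough comparison, assuming CPython, might look like the sketch below. Note that `sys.getsizeof` reports only the container itself, not the integers it references, so the real gap is larger than the numbers suggest.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]   # every value exists now
squares_gen = (i * i for i in range(n))    # nothing computed yet

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(gen_size < list_size)  # True

total_from_gen = sum(squares_gen)      # values consumed one at a time
total_from_list = sum(squares_list)
print(total_from_gen == total_from_list)  # True

print(squares_list[10])          # 100: lists support indexing and reuse
second_total = sum(squares_gen)  # 0: the generator is already exhausted
print(second_total)
```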

Question 12

Which of the following statements about generator expressions are correct?

MEDIUM
  • A Generator expressions use parentheses instead of square brackets.
  • B Generator expressions immediately allocate memory for all values.
  • C Generator expressions return generator objects.
  • D Generator expressions support iteration.

Generator expressions produce generator objects that generate values on demand. They are created using parentheses and participate in normal iteration protocols.

Unlike lists, they do not allocate memory for every generated value at creation time. This makes them suitable for processing large streams of data.

Question 13

Create a generator that yields lines from a file while skipping empty lines.

MEDIUM

This generator streams file content one line at a time while filtering blank records. Only meaningful data is returned to the consumer.

Such patterns are frequently used when processing log files, CSV exports, configuration files, and large audit datasets where loading the entire file into memory would be inefficient.

# Python

def read_non_empty_lines(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            cleaned = line.strip()
            if cleaned:
                yield cleaned


for line in read_non_empty_lines('sample.txt'):
    print(line)

Question 14

What happens if an exception occurs inside a generator during iteration?

HARD

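When an exception occurs inside a generator and is not handled there, it propagates to the caller at the point where the next value is requested. The generator is finished at that moment: even if the caller catches the exception, the same generator cannot be resumed, and further next() calls raise StopIteration.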

This behavior allows generators to communicate processing failures directly to consumers. The caller can decide whether to retry, log the error, skip records, or terminate processing.

In production systems, it is often beneficial to handle expected exceptions within the generator and continue processing, while allowing unexpected failures to propagate for visibility and troubleshooting.
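The split between expected and unexpected failures might look like the following sketch. The `parse_amounts` function is a hypothetical example: malformed text is skipped inside the generator, while any other error reaches the caller.

```python
def parse_amounts(raw_values):
    """Skip values that are not numeric text; let other errors propagate."""
    for raw in raw_values:
        try:
            amount = float(raw)
        except ValueError:
            continue          # expected: malformed text, skip the record
        yield amount


clean = list(parse_amounts(["10", "oops", "2.5"]))
print(clean)  # [10.0, 2.5]

gen = parse_amounts(["1", None, "3"])
first = next(gen)
try:
    next(gen)                 # float(None) raises TypeError, not ValueError
    error_seen = None
except TypeError as exc:
    error_seen = type(exc).__name__

remaining = list(gen)         # the generator ended with the exception
print(first, error_seen, remaining)  # 1.0 TypeError []
```

The trailing "3" is never produced, which shows why recoverable errors are better handled inside the generator when processing should continue.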

Question 15

Which statement best describes the purpose of the yield keyword?

EASY
  • A It terminates the Python interpreter.
  • B It returns a value and preserves execution state.
  • C It converts a generator into a list.
  • D It creates a thread.

The yield keyword pauses execution, returns a value to the caller, and preserves local variables and execution position.

When iteration resumes, execution continues immediately after the yield statement. This capability is what enables lazy evaluation and stateful iteration.

Question 16

Write a generator that continuously produces sequential invoice numbers starting from a specified value.

MEDIUM

This generator maintains internal state and produces sequential invoice numbers indefinitely. Each call to next() retrieves the next value; numbers are unique only within a single generator instance.

The pattern is useful for simulations, testing environments, event identifiers, and scenarios requiring sequential value generation without storing large ranges.

# Python

def invoice_generator(start_number):
    current = start_number

    while True:
        yield current
        current += 1


generator = invoice_generator(1000)

print(next(generator))
print(next(generator))
print(next(generator))

Question 17

Why can generators improve response time in data processing applications?

MEDIUM

Generators allow consumers to begin processing data immediately rather than waiting for an entire dataset to be created. This reduces perceived latency and improves responsiveness.

Consider a reporting service that retrieves records from multiple sources. A generator can start delivering records as soon as they become available, enabling downstream components to begin work earlier.

This incremental processing model is especially valuable in streaming systems, integration platforms, and large-scale analytics workloads where data production may take significant time.
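One way to see the effect without timing anything is to count how much work has been done when the first result arrives. In this sketch, `slow_source` is a hypothetical producer and the `produced` list stands in for expensive fetches.

```python
produced = []


def slow_source(count):
    """Hypothetical source; each record stands for an expensive fetch."""
    for i in range(count):
        produced.append(i)      # note that the work for item i was done
        yield i


stream = slow_source(1_000_000)
first = next(stream)
work_done_before_first_result = len(produced)
print(first, work_done_before_first_result)  # 0 1
```

A function that built and returned a list would have performed all one million fetches before the consumer saw anything.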

Question 18

Which situations are strong candidates for generator-based solutions?

HARD
  • A Streaming large API responses.
  • B Processing multi-gigabyte log files.
  • C Frequently accessing random elements by index.
  • D Building data transformation pipelines.

Generators excel when data can be processed sequentially and incrementally. Streaming APIs, large files, and transformation pipelines are ideal use cases because they benefit from lazy evaluation.

Random indexed access is not a strength of generators because values are produced sequentially and are not stored in a structure that supports direct indexing.

Question 19

Create a generator that retries failed operations and yields successful results only.

HARD

This generator demonstrates how processing logic and retry behavior can be combined into a streaming workflow. Records are yielded only after successful execution.

Similar designs appear in integration platforms that communicate with external APIs where transient failures may occur due to rate limits, network interruptions, or temporary service outages.

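# Python
import random


def process_records(records, max_retries=3):
    for record in records:
        for _attempt in range(max_retries):
            try:
                # Simulated flaky operation; a real call would go here.
                if random.choice([True, False]):
                    raise RuntimeError('Temporary failure')
                result = f'Success: {record}'
            except RuntimeError:
                continue  # transient failure: try again
            else:
                # Yield outside the try body so exceptions thrown into the
                # generator at this point are not treated as retries.
                yield result
                break
        # A record that fails every attempt is skipped without notice.


for result in process_records(['A', 'B', 'C']):
    print(result)

Question 20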

Implement a generator that merges multiple sorted streams into a single sorted output.

HARD

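The generator uses a priority queue to merge multiple sorted streams while maintaining ordering. Only one pending value per stream is held in memory at a time. The standard library function heapq.merge provides the same behavior and is usually the simpler choice outside an interview setting.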

This technique is commonly used in distributed processing systems, log aggregation platforms, and large-scale data pipelines where multiple sorted inputs must be combined into a unified stream.

# Python
import heapq


def merge_streams(*streams):
    heap = []

    for index, stream in enumerate(map(iter, streams)):
        try:
            value = next(stream)
            heapq.heappush(heap, (value, index, stream))
        except StopIteration:
            pass

    while heap:
        value, index, stream = heapq.heappop(heap)
        yield value

        try:
            next_value = next(stream)
            heapq.heappush(heap, (next_value, index, stream))
        except StopIteration:
            pass


stream1 = iter([1, 4, 7])
stream2 = iter([2, 5, 8])
stream3 = iter([3, 6, 9])

for item in merge_streams(stream1, stream2, stream3):
    print(item)

Question 21

How can generators help reduce memory pressure when consuming paginated REST API responses?

MEDIUM

A generator can request one page at a time and yield records incrementally instead of collecting all pages into a single list. This approach keeps memory usage stable regardless of the total number of records returned by the API.

The consumer processes each record as it arrives, allowing downstream transformations, validations, or database writes to begin immediately. This often reduces overall processing latency.

In enterprise integrations, paginated APIs can return millions of records. Using generators prevents memory spikes and makes long-running synchronization jobs more reliable.
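A cursor-based version of this idea can be sketched as below. Everything about the API is an assumption for illustration: `fetch_page`, the `items` and `next` keys, and the `FAKE_API` table stand in for a real HTTP client and response format.

```python
# Hypothetical paginated API: each page names the cursor of the next one.
FAKE_API = {
    None: {"items": [1, 2, 3], "next": "p2"},
    "p2": {"items": [4, 5], "next": "p3"},
    "p3": {"items": [6], "next": None},
}
requests_made = []


def fetch_page(cursor):
    """Stand-in for an HTTP call returning one page."""
    requests_made.append(cursor)
    return FAKE_API[cursor]


def iter_records(fetch):
    """Yield records one by one, requesting pages only when needed."""
    cursor = None
    while True:
        page = fetch(cursor)
        yield from page["items"]
        cursor = page["next"]
        if cursor is None:
            break


stream = iter_records(fetch_page)
first_two = [next(stream), next(stream)]
pages_fetched_so_far = len(requests_made)
rest = list(stream)
print(first_two, pages_fetched_so_far, rest)  # [1, 2] 1 [3, 4, 5, 6]
```

At most one page is held in memory, and later pages are requested only if the consumer keeps iterating.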

Question 22

Which characteristics are true for a Python generator function?

MEDIUM
  • A It contains at least one yield statement.
  • B Calling it immediately executes all statements.
  • C It returns a generator object.
  • D Its state can be resumed after yielding.

A function containing a yield statement becomes a generator function. Invoking it returns a generator object without immediately executing the function body.

Execution begins only when iteration starts. The generator preserves its state and continues from the last yield point when the next value is requested.
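The standard `inspect` module can be used to observe these properties. The `ticker` function is an illustrative example.

```python
import inspect


def ticker():
    yield 1
    yield 2


is_gen_func = inspect.isgeneratorfunction(ticker)
gen = ticker()                                      # body has not run yet
state_created = inspect.getgeneratorstate(gen)

next(gen)                                           # paused at first yield
state_suspended = inspect.getgeneratorstate(gen)

list(gen)                                           # drain the remainder
state_closed = inspect.getgeneratorstate(gen)

print(is_gen_func, state_created, state_suspended, state_closed)
# True GEN_CREATED GEN_SUSPENDED GEN_CLOSED
```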

Question 23

Write a generator that streams database records in pages and yields individual records to the consumer.

MEDIUM

The generator hides pagination details from consumers. Records are exposed as a continuous stream even though the source returns data in pages.

This pattern is frequently used when integrating with databases, cloud services, and SaaS platforms that enforce pagination limits.

# Python

def fetch_pages():
    yield [101, 102, 103]
    yield [104, 105, 106]
    yield [107, 108]


def stream_records():
    for page in fetch_pages():
        for record in page:
            yield record


for record in stream_records():
    print(record)

Question 24

What are the implications of passing the same generator object to multiple consumers?

HARD

A generator represents a single execution stream. Every consumer shares the same internal state, meaning values consumed by one consumer are no longer available to another.

This behavior often surprises developers who expect generators to behave like reusable collections. Once a value has been produced and consumed, it cannot be retrieved again from the same generator instance.

When multiple consumers need access to identical data, consider recreating the generator, materializing results into a collection, or using utilities such as itertools.tee while carefully evaluating memory tradeoffs.
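The sketch below contrasts a shared generator with `itertools.tee`. The `readings` generator and the consumer names are illustrative.

```python
import itertools


def readings():
    yield from [1, 2, 3, 4]


# Two consumers pulling from the same object split the stream between them.
shared = readings()
consumer_a = next(shared)
consumer_b = next(shared)
print(consumer_a, consumer_b)  # 1 2

# tee() gives each consumer its own iterator over the same values.
left, right = itertools.tee(readings())
left_values = list(left)
right_values = list(right)
print(left_values, right_values)  # [1, 2, 3, 4] [1, 2, 3, 4]
```

tee() buffers every value that one iterator has consumed and the other has not, so draining one side completely before the other, as done here, stores the whole stream. The original generator should not be used after it has been passed to tee().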

Question 25

Which built-in function is commonly used to retrieve the next value from a generator?

EASY
  • A next()
  • B yield()
  • C iterate()
  • D resume()

The next() function advances a generator to its next yield statement and returns the produced value.

If no more values remain, StopIteration is raised to indicate that iteration has completed.
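next() also accepts a default as a second argument, which is returned instead of raising StopIteration. A small sketch, using a hypothetical `first_error` helper:

```python
def first_error(lines):
    """Return the first line starting with ERROR, or None if there is none."""
    return next((line for line in lines if line.startswith("ERROR")), None)


found = first_error(["INFO a", "ERROR b", "ERROR c"])
missing = first_error(["INFO a"])
print(found, missing)  # ERROR b None
```

Because the generator expression is lazy, the search stops at the first match rather than scanning every line.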

Question 26

Create a generator that yields only unique values from a stream while preserving order.

MEDIUM

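The generator maintains a set of previously encountered values and emits each value only once. Values must be hashable, and the set grows with the number of distinct values seen, so memory use is bounded by the number of unique items rather than the length of the stream.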

This approach is useful when processing event streams, customer identifiers, or transaction feeds where duplicate records must be filtered without disrupting order.

# Python

def unique_stream(values):
    seen = set()

    for value in values:
        if value not in seen:
            seen.add(value)
            yield value


items = [10, 20, 10, 30, 20, 40]

for item in unique_stream(items):
    print(item)

Question 27

Why are generators often preferred for building multi-stage ETL pipelines?

MEDIUM

Generators allow each stage of an ETL pipeline to process records incrementally. Extraction, transformation, validation, and loading can occur continuously without waiting for complete datasets.

This streaming architecture minimizes memory consumption and reduces the delay between data ingestion and availability. Problems can also be detected earlier because records are processed immediately.

Large enterprise systems commonly use generator-based pipelines to move millions of records through multiple transformation stages while maintaining predictable resource usage.

Question 28

Which scenarios can lead to unexpected behavior when working with generators?

HARD
  • A Attempting to iterate over an exhausted generator again.
  • B Expecting random index access.
  • C Sharing a generator between multiple consumers.
  • D Using a generator inside a for loop.

Generators are sequential streams and do not support random access. Once exhausted, they cannot be restarted without creating a new generator instance.

Sharing a generator among consumers can also cause confusion because all consumers advance the same internal execution state.

Question 29

Implement a generator that calculates a moving average from an incoming stream of numbers.

HARD

The generator maintains only the values needed for the current window and produces averages incrementally. Until the window fills, each average covers only the values seen so far.

Streaming analytics systems frequently use this pattern to calculate rolling metrics without storing entire datasets in memory.

# Python
from collections import deque


def moving_average(values, window_size):
    window = deque(maxlen=window_size)

    for value in values:
        window.append(value)
        yield sum(window) / len(window)


numbers = [10, 20, 30, 40, 50]

for avg in moving_average(numbers, 3):
    print(avg)

Question 30

Create a generator that monitors incoming events and yields an alert whenever a threshold is exceeded.

HARD

This generator continuously evaluates incoming events and emits alerts only when specific conditions are met.

The pattern is common in observability platforms, infrastructure monitoring systems, fraud detection pipelines, and operational dashboards where real-time notifications are required.

# Python

def threshold_monitor(events, threshold):
    for event in events:
        if event > threshold:
            yield f'ALERT: {event} exceeded threshold {threshold}'


metrics = [40, 60, 95, 30, 120]

for alert in threshold_monitor(metrics, 80):
    print(alert)