Prompt engineering isn’t about creating elaborate prompts. It’s about developing the judgment to choose the right structure, logic, and level of control for a given task.
This article gives you 40 scenario-based questions and answers that reflect real decisions you make when working with LLMs in production. Try answering each question before revealing the solution. The explanations focus on why one approach works better than the others in the given scenario.
Question 1. You need to route incoming customer support tickets into a fixed set of categories, where accuracy and consistency matter most. Which approach should you choose?
A. Use a Generative AI model to decide creatively
B. Use a supervised classification model trained on labelled data
C. Use an LLM with high temperature
D. Ask the LLM to explain first and then decide
Correct Answer: B
Supervised classification models are designed for fixed-label problems where accuracy and consistency matter. Training on labeled ticket data allows the model to learn clear decision boundaries and apply them deterministically. Generative AI is less reliable for strict categorization because it may introduce variability or creative interpretations, which are undesirable in customer support routing.
Question 2. A marketing team wants multiple headline variations that explore different emotional angles while staying on message. Which approach fits best?
A. Rule-based text templates
B. Traditional ML classification
C. Generative AI with controlled creativity
D. A deterministic decision tree
Correct Answer: C
Generative AI with controlled creativity is ideal for producing multiple headline variations. By tuning creativity parameters, the model can explore different emotional angles while staying on message. Rule-based or classification approaches lack variation, while deterministic models cannot generate diverse outputs needed for marketing experiments.
Question 3. You need to predict future revenue from historical financial data, and the results must be statistically grounded and auditable. What should you use?
A. Prompt an LLM to estimate revenue based on trends
B. Use a multimodal LLM with charts as input
C. Ask the LLM to summarize historical revenue patterns
D. Use a time-series forecasting or regression model
Correct Answer: D
Revenue prediction is a numeric forecasting task that requires statistical grounding and auditability. Time-series and regression models are purpose-built for this type of structured financial data. LLMs can describe trends but are unreliable for precise numeric forecasts.
Question 4. You are automating email responses. Some requests are predictable and repetitive, while others are open-ended. What is the right approach?
A. Use traditional automation for predictable requests and GenAI for open-ended ones
B. Use Generative AI for all emails
C. Use rule-based systems for all emails
D. Avoid automation because requirements differ
Correct Answer: A
Predictable questions benefit from deterministic automation, while open-ended queries require flexibility. A hybrid approach uses the strengths of both traditional automation and Generative AI. Applying one method to all cases would either reduce accuracy or increase risk.
Question 5. Why is it risky to let an LLM decide healthcare insurance claims on its own?
A. LLMs are too slow for healthcare use
B. LLMs may hallucinate or inconsistently apply fixed decision rules
C. LLMs cannot read policy documents
D. LLMs are too expensive for classification
Correct Answer: B
Insurance decisions rely on strict, consistently applied rules. Generative AI models may hallucinate or interpret policies inconsistently. This creates unacceptable risk in regulated healthcare workflows.
Question 6. A model returns summaries in inconsistent formats, but you need every summary as a bullet-point list. What is the best fix?
A. Increase temperature so the model explores formats
B. Remove examples to reduce confusion
C. Add a strict formatting instruction with a bullet-point template
D. Shorten the input text
Correct Answer: C
The problem is output structure, not creativity. Adding a strict formatting instruction with a clear template constrains the model effectively. Temperature changes do not reliably enforce format.
Question 7. You need a model to analyze contracts and organize its findings the same way every time. Which prompting technique is most reliable?
A. Zero-shot prompting
B. One-shot prompting
C. High-temperature sampling
D. Few-shot prompting with structured examples
Correct Answer: D
Consistent contract analysis requires predictable structure. Few-shot prompting with structured examples shows the model exactly how to organize its output. This is more reliable than zero-shot or high-temperature approaches.
Question 8. A downstream system must parse the model’s extraction output programmatically. Which instruction works best?
A. “Return the output as a JSON object with fixed keys.”
B. “Extract the key information carefully.”
C. “Be precise and thorough.”
D. “Summarize the product details.”
Correct Answer: A
Downstream systems require predictable, machine-readable output. Explicitly requesting a JSON object with fixed keys enforces structure. Vague extraction instructions lead to inconsistent formats.
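As a minimal sketch, an instruction like the following gives the parser a stable contract (the key names here are illustrative):

```python
# Hypothetical extraction prompt; the fixed keys are made up for illustration.
prompt = (
    "Extract the product details from the text below. "
    "Return the output as a JSON object with exactly these keys: "
    '"name", "price", "category". '
    "Output the JSON object only, with no surrounding text."
)
```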
Question 9. Where should you place a rule that the assistant must never reveal sensitive data?
A. In the user prompt
B. In the system message as a non-negotiable rule
C. In a few-shot example
D. In post-processing only
Correct Answer: B
This is a hard security rule that must never be violated. System messages have the highest priority and cannot be overridden. That makes them the correct place for enforcing sensitive constraints.
Question 10. An assistant gives overly confident answers, and you want consistently cautious, qualified responses across every interaction. What is the most effective change?
A. Lower the temperature
B. Add examples of cautious responses
C. Redefine the assistant’s role as a compliance-focused advisor
D. Ask users to be more specific
Correct Answer: C
Changing the assistant’s role influences behavior across all responses. A compliance-focused role naturally leads to cautious, qualified answers. This is more effective than tuning randomness parameters.
Question 11. A classification prompt keeps producing labels outside your category list. What is the best fix?
A. Increase temperature to explore alternatives
B. Ask the model to explain its reasoning in detail
C. Shorten the input text
D. Explicitly list the allowed categories and forbid any others
Correct Answer: D
The model invents labels because the output space is underspecified. Explicitly listing allowed categories constrains responses. This prevents invalid outputs.
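A minimal sketch of such a constraint (the category names are made up):

```python
# Hypothetical classification prompt with a closed label set.
prompt = (
    "Classify the support ticket into exactly one of these categories: "
    "billing, technical, account. "
    "If none apply, return 'other'. Never invent a new category."
)
```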
Question 12. You need the assistant to keep a consistent tone across all interactions, regardless of user input. Where should the tone instruction live?
A. Include the tone instructions as a persistent system rule
B. Add the instruction once in a user message
C. Rely on the model to infer tone from context
D. Increase the number of examples dynamically
Correct Answer: A
Tone consistency requires persistence across interactions. System-level instructions are always applied regardless of user input. This makes them the most reliable option.
Question 13. What is the main benefit of adding one high-quality example to a prompt?
A. Faster response time
B. Better adherence to format and expectations
C. Lower token usage
D. Higher creativity
Correct Answer: B
Examples clarify expectations better than instructions alone. A high-quality example demonstrates both format and level of detail. This improves adherence to desired outputs.
Question 14. An assistant in a regulated environment must never guess when a request falls outside its scope. Which technique matters most?
A. Persona anchoring
B. Higher temperature with self-consistency
C. Explicit refusal conditions and constraints
D. Open-ended role definition
Correct Answer: C
The assistant must know exactly when to refuse. Explicit refusal conditions define clear boundaries. This prevents guessing in regulated environments.
Question 15. A model’s output formatting varies from run to run. What is the least disruptive first fix?
A. Increase max tokens
B. Remove all examples
C. Increase top-p
D. Lower temperature
Correct Answer: D
Temperature directly controls randomness in outputs. Lowering it reduces variability and stabilizes formatting. This is the least disruptive first fix.
Question 16. In a typical chat-completion API response, how do you access the generated text in Python?
A. response.choices[0].message.content
B. response["text"]
C. Response.message.content
D. response.output.text
Correct Answer: A
LLM APIs return outputs as a list of choices. The generated text is stored in the first choice’s message content. Accessing it directly retrieves the correct output.
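A minimal sketch using the OpenAI Python SDK’s chat-completions shape (the model name and prompt are placeholders); note that it also applies the low-temperature fix from the previous question:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize: ..."}],
    temperature=0,  # low temperature stabilizes formatting
)
print(response.choices[0].message.content)  # the generated text
```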
Question 17. A DataFrame logs_df has columns user_id and request_count. Which expression selects the rows where request_count is greater than 50?
A. logs_df.loc["request_count" > 50]
B. logs_df[logs_df["request_count"] > 50]
C. logs_df.iloc[logs_df["request_count"] > 50]
D. logs_df["request_count"].filter(>50)
Correct Answer: B
Boolean filtering in pandas must be applied column-wise. This syntax correctly selects rows meeting the condition. Other options misuse indexing methods.
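For reference, a runnable version of the correct option on toy data:

```python
import pandas as pd

logs_df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "request_count": [10, 75, 51],
})
heavy_users = logs_df[logs_df["request_count"] > 50]  # boolean mask keeps matching rows
print(heavy_users)
```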
Question 18. You append the same list object to a snapshots collection at several points in a program, then modify the list. What happens to the earlier snapshots?
A. Each snapshot will remain unchanged
B. Python automatically deep copies nested lists
C. Older snapshots will reflect the most recent changes
D. Only the latest snapshot is affected
Correct Answer: C
All snapshots reference the same mutable object. When the list changes, all snapshots reflect the update. Older states are therefore lost.
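A small demonstration of the aliasing problem, plus the deep-copy fix:

```python
import copy

state = [1, 2]
snapshots = [state, state]     # both entries alias the same list object
state.append(3)
print(snapshots[0])            # [1, 2, 3]: the "old" snapshot changed too

safe = [copy.deepcopy(state)]  # a deep copy freezes the current state
state.append(4)
print(safe[0])                 # still [1, 2, 3], unaffected by later changes
```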
Question 19. You receive API results as a list of JSON records and need to analyze them. What is the most efficient approach?
A. Manually loop and print values
B. Save JSON to Excel before analysis
C. Use a pivot table without a DataFrame
D. Convert the JSON list directly into a pandas DataFrame
Correct Answer: D
Pandas operates on tabular data structures. Converting JSON directly into a DataFrame enables efficient analysis. Intermediate formats add unnecessary complexity.
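A minimal sketch (the field names are illustrative):

```python
import pandas as pd

records = [
    {"user_id": 1, "tokens_used": 120},
    {"user_id": 2, "tokens_used": 80},
]
df = pd.DataFrame(records)  # keys become columns, one row per record
print(df["tokens_used"].sum())
```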
Question 20. A DataFrame df has columns user_id and tokens_used, with one row per API call. You need the total tokens used, counting only users whose number of calls exceeds a threshold. Which approach is correct?
(The four code options for this question did not survive formatting; the correct pattern is sketched after the explanation.)
Correct Answer: C
The task requires filtering users by call count before aggregation. Separating the steps ensures the condition is applied correctly. This avoids incorrect totals.
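Since the original options were lost, here is a hedged sketch of the correct filter-then-aggregate pattern, reusing a DataFrame df with user_id and tokens_used columns; the five-call threshold is illustrative:

```python
calls_per_user = df.groupby("user_id").size()       # number of calls per user
active = calls_per_user[calls_per_user > 5].index   # users above the threshold
total_tokens = df.loc[df["user_id"].isin(active), "tokens_used"].sum()
```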
Question 21. Given the list of API responses below, you need the total cost of the successful calls only. Which implementation is correct?

```python
responses = [
    {"status": "success", "cost": 0.02},
    {"status": "error", "cost": 0.00},
    {"status": "success", "cost": 0.05}
]
```
(The four code options for this question did not survive formatting; the correct pattern is shown after the explanation.)
Correct Answer: B
Only successful responses should contribute to cost. A loop with a conditional check enforces this explicitly. Other options either fail or hard-code values.
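A runnable version of the correct pattern, using the responses list from the question:

```python
total_cost = 0.0
for r in responses:
    if r["status"] == "success":  # only successful calls count toward cost
        total_cost += r["cost"]
print(round(total_cost, 2))  # 0.07
```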
Question 22. A DataFrame has a column of text responses, and you need the word count for each row. Which approach is correct?
(The four code options for this question did not survive formatting; the correct pattern is sketched after the explanation.)
Correct Answer: A
Word count must be computed per row. Applying a function row-wise achieves this correctly. The other options misuse vector operations.
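A hedged sketch of the row-wise computation; the column name response_text is illustrative:

```python
df["word_count"] = df["response_text"].apply(lambda text: len(text.split()))
```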
Question 23. A DataFrame has columns status and latency_ms, and you need the rows that satisfy conditions on both columns at once. Which expression is correct?
(The four code options for this question did not survive formatting; the correct pattern is sketched after the explanation.)
Correct Answer: D
Pandas requires element-wise logical operators: parentheses and & ensure both conditions are evaluated per row, while Python’s plain and does not work on a Series.
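A hedged sketch of the correct expression; the status value and the 1000 ms threshold are illustrative:

```python
slow_errors = df[(df["status"] == "error") & (df["latency_ms"] > 1000)]
```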
Question 24. A DataFrame has columns user_id and response_code. You need the IDs of every user who hit at least one 500 error. Which approach is correct?
(The four code options for this question did not survive formatting; the correct pattern is sketched after the explanation.)
Correct Answer: A
The requirement is to find users with at least one 500 error. Filtering first and then extracting unique user IDs directly answers this. Other options compute unrelated aggregates.
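A hedged sketch of the correct filter-then-extract pattern:

```python
affected_users = df.loc[df["response_code"] == 500, "user_id"].unique()
```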
Question 25. Which parameter limits how long a model’s output can be?
A. Temperature
B. max_tokens
C. top_p
D. stop_sequence
Correct Answer: B
Output length is controlled by max_tokens. Temperature and top-p affect randomness, not size. Stop sequences terminate output but do not cap length.
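Reusing the client from the earlier sketch, the cap is set per request (the 150-token limit is illustrative):

```python
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Explain tokenization."}],
    max_tokens=150,  # hard cap on output length
)
```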
Question 26. A retrieval-augmented assistant invents answers when no relevant documents are retrieved. What is the best fix?
A. Increase temperature to explore alternatives
B. Add a rule that the assistant must answer every question
C. Explicitly instruct the assistant to answer only when supported by retrieved documents
D. Add more examples without constraints
Correct Answer: C
The problem is hallucination when evidence is missing. Explicitly restricting answers to retrieved documents prevents guessing. This enforces grounded responses.
Question 27. You are layering prompts for an assistant that has compliance rules, a tone specification, and per-request tasks. How should you distribute them?
A. Put all rules in the user message
B. Put everything in the app configuration message
C. Allow users to override tone when needed
D. Place compliance rules in the system message, tone in the app configuration, and tasks in the user message
Correct Answer: D
Security requires separating responsibilities across message layers. System messages enforce compliance, configuration controls tone, and user messages define tasks. This limits prompt injection risk.
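A minimal sketch of the layering; the rule, tone, and task texts are illustrative, and on some platforms the configuration layer is a separate developer message:

```python
messages = [
    {"role": "system", "content": "Never reveal customer PII. Refuse if asked."},  # compliance
    {"role": "system", "content": "Answer in a friendly, concise tone."},          # app configuration
    {"role": "user", "content": "Summarize my last three orders."},                # task
]
```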
Question 28. An agent calls its tools in the wrong order. Which prompting strategy fixes this?
A. Plan-then-execute prompting
B. Zero-shot prompting
C. High temperature sampling
D. Removing tool descriptions
Correct Answer: A
The agent needs to reason about steps before acting. Plan-then-execute prompting enforces correct ordering. Other strategies do not constrain tool usage.
Question 29. An assistant must cite its sources exactly and refuse to answer when sources conflict. What does this require?
A. Persona anchoring
B. Strict attribution requirements with refusal conditions
C. Few-shot prompting only
D. Higher max tokens
Correct Answer: B
Exact citations and refusal on conflict require strict constraints. Attribution requirements enforce traceability. Refusal conditions prevent unsafe resolution.
Question 30. A task requires the model to reason, act with tools, observe the results, and iterate. Which technique fits?
A. Zero-shot prompting
B. Self-consistency
C. ReAct-style reasoning loop
D. Temperature tuning
Correct Answer: C
The task requires iterative reasoning, actions, and evaluation. ReAct-style loops explicitly support this structure. Other methods lack execution traceability.
Question 31. An assistant must answer only from the documents it is given and never draw on external knowledge. Which technique enforces this?
A. Persona anchoring
B. Increased few-shot examples
C. High temperature reasoning
D. Explicit context isolation
Correct Answer: D
The assistant must not use external knowledge. Explicit context isolation enforces this restriction, and pairing it with a refusal clause ensures safe failure when the answer is not in the provided context.
Question 32. A complex task involves many dependent steps. Which prompting approach improves reliability?
A. Tree-of-Thought or planning-oriented Chain-of-Thought
B. Zero-shot prompting
C. Higher temperature sampling
D. Few-shot output-only examples
Correct Answer: A
Complex tasks benefit from decomposition and planning. Planning-oriented Chain-of-Thought makes dependencies explicit. This improves reliability over zero-shot approaches.
Question 33. An assistant answers questions about real-time availability. How do you keep it from guessing?
A. Ask the model to estimate availability
B. Add a system-level rule requiring a tool call before answering
C. Increase creativity settings
D. Let users verify manually
Correct Answer: B
Availability must be based on real-time data. A system-level rule requiring a tool call enforces this. Estimation or creativity introduces risk.
Question 34. You need full visibility into an agent’s reasoning and tool calls in order to debug failures. Which approach supports this?
A. Few-shot prompting
B. Zero-shot prompting
C. ReAct-style reasoning with action-observation loops
D. Output-only prompting
Correct Answer: C
Full debugging requires visibility into reasoning and tool calls. ReAct-style loops expose each action and observation. This supports auditing and diagnosis.
Question 35. An agent sometimes skips a required approval step during execution. What is the best structural fix?
A. Increase max tokens
B. Hard-code approvals in the user prompt
C. Raise temperature to encourage exploration
D. Require the agent to explicitly plan and validate each step before execution
Correct Answer: D
The agent skips validation because it is optional. Forcing explicit planning and validation embeds the check into execution. This structurally prevents bypassing approvals.
Question 36. An assistant answering policy questions must not blend its own assumptions into the policy text. What should you apply?
A. Explicit context isolation with refusal conditions
B. Persona anchoring
C. Increased creativity through temperature
D. Few-shot summarization
Correct Answer: A
The risk is blending assumptions with policy. Context isolation restricts the knowledge source. Refusal conditions prevent unsafe extrapolation.
Question 37. A coding assistant should plan its approach internally but show users only the final code. Which technique fits?
A. Zero-shot prompting
B. Chain-of-Thought with hidden or tagged reasoning
C. Few-shot prompting
D. High temperature sampling
Correct Answer: B
Planning improves code correctness. Hidden Chain-of-Thought allows internal reasoning without exposing it. Users receive only the final code.
Question 38. An agent must orchestrate several tools whose calls depend on one another. Which technique produces reliable, executable plans?
A. Self-consistency
B. Zero-shot prompting
C. Planning-oriented Chain-of-Thought (Least-to-Most or Tree-of-Thought)
D. Constitutional AI
Correct Answer: C
Tool orchestration requires dependency-aware planning. Planning-oriented Chain-of-Thought produces executable plans. Other methods lack structure.
Question 39. A tool returns raw JSON. How should the assistant present the result to the user?
A. Send raw JSON directly
B. Ignore tool output
C. Re-run the tool automatically
D. Translate tool output into a user-friendly explanation and offer further help
Correct Answer: D
Raw tool output is not user-friendly. Translating it into clear language improves usability. This preserves correctness without leaking system data.
Question 40. Where should a non-negotiable security rule live?
A. System message
B. User message
C. Few-shot examples
D. Output post-processing only
Correct Answer: A
A non-negotiable security rule belongs in the system message. Only system messages are fully non-overridable, so placing the rule there guarantees enforcement.
If you scored well on these (more than 30 correct), you’re already thinking beyond prompts and into system design.
If some questions surprised you, they’ve served their purpose: prompt engineering isn’t memorization, it’s judgment. Rather than brute-forcing every possible answer, focus on building a sound understanding of each problem.
And if the questions felt too hard to get through, consider starting with a free prompt engineering course.