Prompt Engineering

Jed Lee
9 min read · Jan 31, 2024

Explain it to me like I am an eight-year-old…

Photo generated by ChatGPT

Content of this Article

  1. Introduction to Prompt Engineering
  2. Zero Shot and Few Shot Prompting
  3. Chain of Thought (CoT) Prompting
  4. Self Consistency Prompting
  5. Directional Stimulus Prompting
  6. Retrieval Augmented Generation
  7. Generated Knowledge Prompting
  8. ReAct Prompting
  9. Conclusion

Introduction to Prompt Engineering

With the inception of ChatGPT in 2022, we have seen how this innovative technology has not only transformed how we interact with AI but has also set new standards in natural language processing and user experience.

Glancing at LLM app adoption in 2023, we see OpenAI dominating across the board.

Screenshot from https://state-of-llm.streamlit.app/

However, one pertinent question about using ChatGPT remains:

How can we communicate with ChatGPT so that it returns more accurate and relevant responses?

That brings me to Prompt Engineering.

Prompt Engineering, also known as In-Context Prompting, refers to methods for communicating with an LLM to steer its behavior toward your desired outcomes without updating the model weights. It involves crafting inputs, or prompts, in a way that guides the AI to provide the most accurate and relevant outputs.

Do note that this post only focuses on prompt engineering for autoregressive language models, so nothing about image generation or multimodal models. At its core, prompt engineering is about alignment and model steerability.

I will go through seven Prompt Engineering techniques in this article. Let's jump straight into it~!

Zero Shot and Few Shot Prompting

Zero-Shot and Few-Shot Prompting are the two most basic approaches for prompting a model, and they are commonly used for benchmarking LLM performance (e.g., comparing Gemini's performance with GPT-4's).

Zero Shot Prompting

Zero-Shot Prompting simply feeds the task text to the model and asks for results.

All the Sentiment Analysis examples are from the Stanford Sentiment Treebank (SST-2).

Zero Shot Prompt:

Text: you ‘ll find yourself wishing that you and they were in another movie .
Sentiment:
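
To make this concrete, here is a minimal zero-shot sketch in Python. The call_llm helper, the model name, and the instruction line are my own placeholder choices (assuming the OpenAI Python SDK's chat completions API), not something prescribed in this article.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def call_llm(prompt: str, temperature: float = 0.0) -> str:
    """Thin wrapper around a chat completion call (hypothetical helper)."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content.strip()

# Zero-shot: feed only the task text, with no examples.
zero_shot_prompt = (
    "Classify the sentiment of the text as Positive or Negative.\n\n"
    "Text: you 'll find yourself wishing that you and they were in another movie .\n"
    "Sentiment:"
)
print(call_llm(zero_shot_prompt))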

Few Shot Prompting

Few-Shot Prompting provides a few examples, each consisting of both input and desired output, on the target task. As the model first sees good examples, it can better understand human intention and criteria for what kinds of answers are wanted.

Therefore, few-shot learning often leads to better performance than zero-shot. However, it comes at the cost of more token consumption and may hit the context length limit when input and output text are long.

Few Shot Prompt:

Text: will find little of interest in this film , which is often preachy and poorly acted
Sentiment: Negative

Text: warm water under a red bridge is a celebration of feminine energy , a tribute to the power of women to heal .
Sentiment: Positive

Text: a fine , rousing , g-rated family film , aimed mainly at little kids but with plenty of entertainment value to keep grown-ups from squirming in their seats .
Sentiment: Positive

Text: i 'm not sure which half of dragonfly is worse : the part where nothing 's happening , or the part where something 's happening
Sentiment:
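
As a rough sketch, few-shot prompting just means prepending a handful of labeled demonstrations before the query. This reuses the hypothetical call_llm wrapper from the zero-shot sketch above; the helper names are my own.

# Assumes the call_llm(prompt, temperature) helper from the zero-shot sketch above.
examples = [
    ("will find little of interest in this film , which is often preachy and poorly acted", "Negative"),
    ("warm water under a red bridge is a celebration of feminine energy , a tribute to the power of women to heal .", "Positive"),
    ("a fine , rousing , g-rated family film , aimed mainly at little kids but with plenty of entertainment value to keep grown-ups from squirming in their seats .", "Positive"),
]

def build_few_shot_prompt(query: str) -> str:
    """Prepend the labeled demonstrations so the model can infer the task format."""
    demos = "\n\n".join(f"Text: {text}\nSentiment: {label}" for text, label in examples)
    return f"{demos}\n\nText: {query}\nSentiment:"

query = "i 'm not sure which half of dragonfly is worse : the part where nothing 's happening , or the part where something 's happening"
print(call_llm(build_few_shot_prompt(query)))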

Many studies have looked into how to construct in-context examples, and they observed that the choice of prompt format, training examples, and the order of the examples can lead to dramatically different performance, ranging from near-random guessing to near SoTA.

Do check out Lilian’s Blog Post for tips on Example Selection and Example Ordering!

Chain of Thought (CoT) Prompting

Chain-of-Thought prompting was proposed by Google researchers in 2022. As the authors write:

We explore how generating a chain of thought — a series of intermediate reasoning steps — significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting.

Chain-of-Thought (CoT) Prompting (Wei et al. 2022) generates a sequence of short sentences that describe the reasoning logic step by step, known as reasoning chains or rationales, before eventually arriving at the final answer. The benefit of CoT is more pronounced for complicated reasoning tasks when using large models (e.g. with more than 50B parameters).

Types of CoT prompts

There are 2 main types of CoT prompting:

  • Few-Shot Chain-of-Thought (CoT) Prompting: To prompt the model with a few examples, each containing manually written or model-generated reasoning.
Image from Paper: (Wei et al. 2022)
  • Zero-Shot Chain-of-Thought (CoT) Prompting: To use a natural language statement like "Let's think step by step" to explicitly encourage the model to first generate reasoning chains, and then to prompt with "Therefore, the answer is" to produce the answer (Kojima et al. 2022). A similar statement is "Let's work this out in a step by step way to be sure we have the right answer" (Zhou et al. 2022). A minimal sketch of this two-stage pattern follows after this list.
Image from Paper: (Kojima et al. 2022)
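
Here is the promised minimal sketch of zero-shot CoT, following the two-stage pattern from Kojima et al. (2022): first elicit a reasoning chain, then ask for the final answer. The question text is purely illustrative, and call_llm is the hypothetical wrapper from the zero-shot sketch earlier.

# Assumes the call_llm(prompt, temperature) helper from the zero-shot sketch above.
question = (
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

# Stage 1: elicit the reasoning chain.
reasoning = call_llm(f"Q: {question}\nA: Let's think step by step.")

# Stage 2: extract the final answer from the generated rationale.
answer = call_llm(
    f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
    "Therefore, the answer is"
)
print(answer)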

Do check out Lilian’s Blog Post for tips on various CoT Tips and Extensions!

Self-Consistency Sampling

As the authors of Self-Consistency Improves Chain of Thought Reasoning in Language Models write:

… we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths.

Self-Consistency Sampling (Wang et al. 2022) samples multiple reasoning paths using Few-Shot Chain-of-Thought Prompting with temperature > 0, and then selects the best answer out of these candidates. The criteria for selecting the best candidate can vary from task to task; a general solution is to pick the majority (most consistent) vote.

This method resembles Ensemble Learning.

Image from Paper: Wang et al. (2022)
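
As a rough illustration, the loop below samples several reasoning paths at temperature > 0, pulls a final answer out of each, and takes the majority vote. The answer-extraction regex and the call_llm helper (from the zero-shot sketch above) are my own simplifications, not part of the paper.

import re
from collections import Counter

# Assumes the call_llm(prompt, temperature) helper from the zero-shot sketch above.
def self_consistent_answer(cot_prompt: str, n_samples: int = 10) -> str:
    """Sample diverse reasoning paths and return the most frequent final answer."""
    answers = []
    for _ in range(n_samples):
        rationale = call_llm(cot_prompt, temperature=0.7)  # temperature > 0 for diverse paths
        numbers = re.findall(r"-?\d+\.?\d*", rationale)    # naive: treat the last number as the answer
        if numbers:
            answers.append(numbers[-1])
    return Counter(answers).most_common(1)[0][0] if answers else ""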

Directional Stimulus Prompting

Proposed by Microsoft researchers in 2023, as Li et al. (2023) write:

The directional stimulus serves as hints or cues for each input query to guide LLMs toward the desired output, such as keywords that the desired summary should include for summarization. We utilize a small tunable model (e.g., T5) to generate such directional stimulus for each query, allowing us to optimize black-box LLMs by optimizing a small policy model.

They proposed a new prompting technique to better guide the LLM in generating the desired summary. A tuneable policy Language Model is trained to generate the stimulus/hint.

This is especially useful because the stimulus/hint essentially acts as directions for the LLM when you want it to surface specific information from the task you are performing.

The figure below shows how Directional Stimulus Prompting compares with standard prompting. The policy Language Model can be small and optimized to generate the hints that guide a black-box frozen LLM.

Image from Paper: Li et al., (2023)
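
The sketch below only mimics the shape of the idea: a small "policy" step produces keyword hints, which are then inserted into the prompt given to the frozen LLM. In the paper the hints come from a tunable T5 policy model optimized with supervised fine-tuning and reinforcement learning; here generate_hints is a hard-coded placeholder of my own, and call_llm is the hypothetical wrapper from earlier.

# Assumes the call_llm(prompt, temperature) helper from the zero-shot sketch above.
def generate_hints(article: str) -> str:
    """Placeholder for the small tunable policy model (e.g., T5) that produces
    keyword hints for the input; hard-coded here purely for illustration."""
    return "keyword one; keyword two; keyword three"

def summarize_with_stimulus(article: str) -> str:
    hints = generate_hints(article)
    prompt = (
        f"Article: {article}\n\n"
        f"Keywords the summary should include: {hints}\n\n"
        "Write a short summary of the article that covers the keywords above."
    )
    return call_llm(prompt)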

Retrieval Augmented Generation (RAG)

RAG combines the strengths of two major components: a Neural Retrieval Mechanism and a Sequence-to-Sequence model.

In simpler terms, RAG combines an information retrieval component with a text generator model.

This combination allows RAG to extend the capabilities of LLMs like ChatGPT and Google Bard by supplementing them with additional knowledge and up-to-date external data, making LLMs adept at answering questions and providing explanations where facts can evolve over time.

Screenshot from Jerry Liu’s talk on LlamaIndex (2023)
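
A toy retrieve-then-generate sketch, under the same assumptions as before: the keyword-overlap retriever and in-memory document list stand in for a real embedding model and vector store, and call_llm is the hypothetical wrapper from the zero-shot sketch.

# Assumes the call_llm(prompt, temperature) helper from the zero-shot sketch above.
documents = [
    "Spark driver memory is configured via spark.driver.memory.",
    "Spark executor memory is configured via spark.executor.memory.",
    # ... the rest of your knowledge base ...
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    A real RAG system would use embeddings and a vector database."""
    q = set(query.lower().split())
    return sorted(documents, key=lambda d: -len(q & set(d.lower().split())))[:k]

def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return call_llm(prompt)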

Coincidentally, I wrote a Medium article about RAG just a couple of days back. Give it a read if you want to find out more!

Generated Knowledge Prompting

Generated Knowledge Prompting for Commonsense Reasoning proposes an idea similar to RAG. Instead of retrieving additional contextual information from an external database, as RAG does, the authors suggest using an LLM to generate its own knowledge and then incorporating that knowledge into the prompt to improve commonsense reasoning.

This “internal retrieval” technique has the model itself generate knowledge, which is then used as part of the prompt.

Image from Paper: Generated Knowledge Prompting for Commonsense Reasoning

Prompt 1:

Input: Explain to me briefly the difference between spark driver memory and spark executor memory.
Response: In Apache Spark, a distributed computing framework, the concepts of Spark Driver Memory and Spark Executor Memory are essential for managing the application’s memory usage. Here’s a brief explanation of each: …

Prompt 2:

Input: Can I set my driver memory the same as my executor memory? Yes or No?
Response: …

With the model-generated knowledge, prompt the LLM further to get the answer you want.
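
A minimal sketch of that two-step flow, reusing the hypothetical call_llm wrapper from earlier: first ask the model to generate relevant background knowledge, then feed that knowledge back in alongside the actual question.

# Assumes the call_llm(prompt, temperature) helper from the zero-shot sketch above.
def answer_with_generated_knowledge(question: str) -> str:
    # Step 1: "internal retrieval", i.e. the model generates its own background knowledge.
    knowledge = call_llm(f"Generate some background knowledge relevant to this question: {question}")

    # Step 2: answer the question with the generated knowledge included in the prompt.
    prompt = (
        f"Knowledge: {knowledge}\n\n"
        f"Question: {question}\n"
        "Answer the question using the knowledge above."
    )
    return call_llm(prompt)

print(answer_with_generated_knowledge(
    "Can I set my Spark driver memory the same as my executor memory? Yes or No?"
))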

ReAct Prompting

ReAct is inspired by the synergies between “acting” and “reasoning”, which allow humans to learn new tasks and make decisions through reasoning.

It draws similarities from Chain-of-Thought (CoT) Prompting as well as Retrieval Augmented Generation (RAG).

In CoT, you break down the steps involved in arriving at a response. ReAct goes one step further: beyond reasoning, it acts based on what else is necessary to arrive at the response.

While both RAG and ReAct can use private databases (external data sources/knowledge bases), ReAct also has the ability to go to public resources like Wikipedia through an API to bring in even more information to complete the task.

Here’s an example of how ReAct reacts:

Assume you are a top executive at Deloitte.

You give a prompt like: “Who are Deloitte’s biggest competitors? What is our Client Retention Rate compared to the industry in 2023?”

When prompted, the chain execution looks like this:

> Entering new AgentExecutor chain…
I need to find out who are Deloitte’s biggest competitors and then compare our Client Retention Rate with the industry in 2023.
Action: Search [Public Database Search]
Action Input: “Deloitte’s biggest competitors”
Observation: Deloitte’s competitors and similar companies include: Ernst & Young, KPMG, PricewaterhouseCoopers (PwC), Accenture, McKinsey & Company, Boston Consulting Group...
Thought: I need to search for our own internal Client Retention Rate in 2023.
Action: Search [Private Database Search]
Action Input: “Client Retention Rate in 2023”
Observation: 86.4%
Thought: I need to search for the industry’s Client Retention Rate in 2023.
Action: Search [Public Database Search]
Action Input: Professional Services Network’s Client Retention Rate in 2023
Observation: 81.9%

Thought: I now know the final answer.
Final Answer: Deloitte’s biggest competitors include Ernst & Young, KPMG, PricewaterhouseCoopers (PwC), Accenture, McKinsey & Company, and Boston Consulting Group. In 2023, our Client Retention Rate was 86.4%, which is notably higher than the industry average for Professional Services Networks, which stood at 81.9%. This indicates that Deloitte has maintained a strong competitive edge in client retention compared to its primary competitors in the industry.
> Finished chain.
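
Under the hood, an agent runtime (like the AgentExecutor chain shown above) runs a loop roughly like the sketch below: the model emits a Thought and an Action, the runtime executes the matching tool, and the Observation is appended to the transcript until a Final Answer appears. The tool functions, the parsing regex, and call_llm are my own simplified placeholders, not the actual implementation of any framework.

import re

# Assumes the call_llm(prompt, temperature) helper from the zero-shot sketch above.
def public_search(query: str) -> str:
    """Placeholder for a public search tool (e.g., a Wikipedia API lookup)."""
    return "stub public result for: " + query

def private_search(query: str) -> str:
    """Placeholder for a search over your private/internal knowledge base."""
    return "stub private result for: " + query

TOOLS = {"Public Database Search": public_search, "Private Database Search": private_search}

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\nThought:"
    for _ in range(max_steps):
        step = call_llm(transcript)  # model emits Thought/Action, or a Final Answer
        transcript += " " + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        match = re.search(r"Action: Search \[(.+?)\]\s*Action Input: (.+)", step)
        if match:
            tool, tool_input = match.group(1), match.group(2).strip().strip('"')
            observation = TOOLS.get(tool, public_search)(tool_input)  # run the selected tool
            transcript += f"\nObservation: {observation}\nThought:"
    return transcript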

Conclusion

In conclusion, this article has highlighted Prompt Engineering’s evolution from basic techniques like Zero Shot and Few Shot Prompting to more sophisticated methods such as Chain-of-Thought, Self Consistency, and Directional Stimulus Prompting. We have also looked at the integration of external data in Retrieval Augmented Generation, the creative possibilities with Generated Knowledge Prompting, and the nuanced interactions enabled by ReAct Prompting.

Thanks so much for reading my article!!! Feel free to drop me any comments, suggestions, and follow me on LinkedIn!

Many thanks to:

Weng, Lilian. (Mar 2023). Prompt Engineering. Lil’Log. https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/

Prompt Engineering Guide.
https://www.promptingguide.ai/techniques


Jed Lee

Passionate about AI & NLP. Based in Singapore. Currently a Data Scientist at PatSnap.