Package 'batchLLM' reference manual

Title:	Batch Process LLM Text Completions Using a Data Frame
Description:	Batch process large language model (LLM) text completions using data frame rows, with support for OpenAI's 'GPT' (<https://chat.openai.com>), Anthropic's 'Claude' (<https://claude.ai>), and Google's 'Gemini' (<https://gemini.google.com>). Includes features such as local storage, metadata logging, API rate limiting delays, and a 'shiny' app addin.
Authors:	Dylan Pieper [aut, cre, cph]
Maintainer:	Dylan Pieper <[email protected]>
License:	MIT + file LICENSE
Version:	0.2.0
Built:	2025-03-12 06:30:57 UTC
Source:	https://github.com/dylanpieper/batchllm

Batch Process LLM Text Completions Using a Data Frame

Description

Batch process large language model (LLM) text completions by looping across the rows of a data frame column. The package currently supports OpenAI's GPT, Anthropic's Claude, and Google's Gemini models, with built-in delays for API rate limiting. The package provides advanced text processing features, including automatic logging of batches and metadata to local files, side-by-side comparison of outputs from different LLMs, and integration of a user-friendly Shiny App Addin. Use cases include natural language processing tasks such as sentiment analysis, thematic analysis, classification, labeling or tagging, and language translation.

Usage

batchLLM(
  df,
  df_name = NULL,
  col,
  prompt,
  LLM = "openai",
  model = "gpt-4o-mini",
  temperature = 0.5,
  max_tokens = 500,
  batch_delay = "random",
  batch_size = 10,
  case_convert = NULL,
  sanitize = FALSE,
  attempts = 1,
  log_name = "batchLLM-log",
  hash_algo = "crc32c",
  ...
)
batchLLM(
  df,
  df_name = NULL,
  col,
  prompt,
  LLM = "openai",
  model = "gpt-4o-mini",
  temperature = 0.5,
  max_tokens = 500,
  batch_delay = "random",
  batch_size = 10,
  case_convert = NULL,
  sanitize = FALSE,
  attempts = 1,
  log_name = "batchLLM-log",
  hash_algo = "crc32c",
  ...
)

Arguments

`df`	A data frame that contains the input data.
`df_name`	An optional string specifying the name of the data frame to log. This is particularly useful in Shiny applications or when the data frame is passed programmatically rather than explicitly. Default is NULL.
`col`	The name of the column in the data frame to process.
`prompt`	A system prompt for the LLM model.
`LLM`	A string for the name of the LLM with the options: "openai", "anthropic", and "google". Default is "openai".
`model`	A string for the name of the model from the LLM. Default is "gpt-4o-mini".
`temperature`	A temperature for the LLM model. Default is .5.
`max_tokens`	A maximum number of tokens to generate before stopping. Default is 500.
`batch_delay`	A string for the batch delay with the options: "random", "min", and "sec". Numeric examples include "1min" and "30sec". Default is "random" which is an average of 10.86 seconds (n = 1,000 simulations).
`batch_size`	The number of rows to process in each batch. Default is 10.
`case_convert`	A string for the case conversion of the output with the options: "upper", "lower", or NULL (no change). Default is NULL.
`sanitize`	Extract the LLM text completion from the model's response by returning only content in `<result>` XML tags. Additionally, remove all punctuation. This feature prevents unwanted text (e.g., preamble) or punctuation from being included in the model's output. Default is FALSE.
`attempts`	The maximum number of loop retry attempts. Default is 1.
`log_name`	A string for the name of the log without the `.rds` file extension. Default is "batchLLM-log".
`hash_algo`	A string for a hashing algorithm from the 'digest' package. Default is `crc32c`.
`...`	Additional arguments to pass on to the LLM API function.

Value

Returns the input data frame with an additional column containing the text completion output. The function also writes the output and metadata to the log file after each batch in a nested list format.

Examples

## Not run: 
library(batchLLM)

# Set API keys
Sys.setenv(OPENAI_API_KEY = "your_openai_api_key")
Sys.setenv(ANTHROPIC_API_KEY = "your_anthropic_api_key")
Sys.setenv(GEMINI_API_KEY = "your_gemini_api_key")

# Define LLM configurations
llm_configs <- list(
  list(LLM = "openai", model = "gpt-4o-mini"),
  list(LLM = "anthropic", model = "claude-3-haiku-20240307"),
  list(LLM = "google", model = "1.5-flash")
)

# Apply batchLLM function to each configuration
beliefs <- lapply(llm_configs, function(config) {
  batchLLM(
    df = beliefs,
    col = statement,
    prompt = "classify as a fact or misinformation in one word",
    LLM = config$LLM,
    model = config$model,
    batch_size = 10,
    batch_delay = "1min",
    case_convert = "lower"
  )
})[[length(llm_configs)]]

# Print the updated data frame
print(beliefs)

## End(Not run)
## Not run: 
library(batchLLM)

# Set API keys
Sys.setenv(OPENAI_API_KEY = "your_openai_api_key")
Sys.setenv(ANTHROPIC_API_KEY = "your_anthropic_api_key")
Sys.setenv(GEMINI_API_KEY = "your_gemini_api_key")

# Define LLM configurations
llm_configs <- list(
  list(LLM = "openai", model = "gpt-4o-mini"),
  list(LLM = "anthropic", model = "claude-3-haiku-20240307"),
  list(LLM = "google", model = "1.5-flash")
)

# Apply batchLLM function to each configuration
beliefs <- lapply(llm_configs, function(config) {
  batchLLM(
    df = beliefs,
    col = statement,
    prompt = "classify as a fact or misinformation in one word",
    LLM = config$LLM,
    model = config$model,
    batch_size = 10,
    batch_delay = "1min",
    case_convert = "lower"
  )
})[[length(llm_configs)]]

# Print the updated data frame
print(beliefs)

## End(Not run)

Interact with batchLLM via a Shiny Gadget

Description

This function provides a user interface using Shiny to interact with the batchLLM package. It allows users to configure and execute batch processing through an interactive dashboard.

Usage

batchLLM_shiny()
batchLLM_shiny()

Value

No return value. Launches a Shiny Gadget that allows users to interact with the batchLLM package.

Beliefs Dataset

Description

The beliefs dataset consists of 20 statements representing opposing views on various scientific, environmental, and societal topics.

Usage

beliefs
beliefs

Format

A data frame with 20 rows and 1 variable:

statement: A character string with a statement representing a belief.

Examples

head(beliefs)
head(beliefs)

Interact with Anthropic's Claude API

Description

This function provides an interface to interact with Claude AI models via Anthropic's API, allowing for flexible text generation based on user inputs. This function was adapted from the claudeR repository by yrvelez on GitHub (MIT License).

Usage

claudeR(
  prompt,
  model = "claude-3-5-sonnet-20240620",
  max_tokens = 500,
  stop_sequences = NULL,
  temperature = 0.7,
  top_k = -1,
  top_p = -1,
  api_key = NULL,
  system_prompt = NULL
)
claudeR(
  prompt,
  model = "claude-3-5-sonnet-20240620",
  max_tokens = 500,
  stop_sequences = NULL,
  temperature = 0.7,
  top_k = -1,
  top_p = -1,
  api_key = NULL,
  system_prompt = NULL
)

Arguments

`prompt`	A string vector for Claude-2, or a list for Claude-3 specifying the input for the model.
`model`	The model to use for the request. Default is the latest Claude-3 model.
`max_tokens`	A maximum number of tokens to generate before stopping.
`stop_sequences`	Optional. A list of strings upon which to stop generating.
`temperature`	Optional. Amount of randomness injected into the response.
`top_k`	Optional. Only sample from the top K options for each subsequent token.
`top_p`	Optional. Does nucleus sampling.
`api_key`	Your API key for authentication.
`system_prompt`	Optional. An optional system role specification.

Value

The resulting completion up to and excluding the stop sequences.

Examples

## Not run: 
library(batchLLM)

# Set API in the env or use api_key parameter in the claudeR call
Sys.setenv(ANTHROPIC_API_KEY = "your_anthropic_api_key")

# Using Claude-2
response <- claudeR(
  prompt = "What is the capital of France?",
  model = "claude-2.1",
  max_tokens = 50
)
cat(response)

# Using Claude-3
response <- claudeR(
  prompt = list(
    list(role = "user", content = "What is the capital of France?")
  ),
  model = "claude-3-5-sonnet-20240620",
  max_tokens = 50,
  temperature = 0.8
)
cat(response)

# Using a system prompt
response <- claudeR(
  prompt = list(
    list(role = "user", content = "Summarize the history of France in one paragraph.")
  ),
  system_prompt = "You are a concise summarization assistant.",
  max_tokens = 500
)
cat(response)

## End(Not run)
## Not run: 
library(batchLLM)

# Set API in the env or use api_key parameter in the claudeR call
Sys.setenv(ANTHROPIC_API_KEY = "your_anthropic_api_key")

# Using Claude-2
response <- claudeR(
  prompt = "What is the capital of France?",
  model = "claude-2.1",
  max_tokens = 50
)
cat(response)

# Using Claude-3
response <- claudeR(
  prompt = list(
    list(role = "user", content = "What is the capital of France?")
  ),
  model = "claude-3-5-sonnet-20240620",
  max_tokens = 50,
  temperature = 0.8
)
cat(response)

# Using a system prompt
response <- claudeR(
  prompt = list(
    list(role = "user", content = "Summarize the history of France in one paragraph.")
  ),
  system_prompt = "You are a concise summarization assistant.",
  max_tokens = 500
)
cat(response)

## End(Not run)

Get Batches

Description

Get batches of generated output in a single data frame from the .rds log file.

Usage

get_batches(df_name = NULL, log_name = "batchLLM-log")
get_batches(df_name = NULL, log_name = "batchLLM-log")

Arguments

`df_name`	A string to match the name of a processed data frame.
`log_name`	A string specifying the name of the log without the `.rds` file extension. Default is "batchLLM-log".

Value

A data frame containing the generated output.

Examples

## Not run: 
library(batchLLM)

# Assuming you have a log file with data for "beliefs_40a3012b" (see batchLLM example)
batches <- get_batches("beliefs_40a3012b")
head(batches)

# Using a custom log file name
custom_batches <- get_batches("beliefs_40a3012b", log_name = "custom-log.rds")
head(custom_batches)

## End(Not run)
## Not run: 
library(batchLLM)

# Assuming you have a log file with data for "beliefs_40a3012b" (see batchLLM example)
batches <- get_batches("beliefs_40a3012b")
head(batches)

# Using a custom log file name
custom_batches <- get_batches("beliefs_40a3012b", log_name = "custom-log.rds")
head(custom_batches)

## End(Not run)

Scrape Metadata

Description

Scrape metadata from the .rds log file.

Usage

scrape_metadata(df_name = NULL, log_name = "batchLLM-log")
scrape_metadata(df_name = NULL, log_name = "batchLLM-log")

Arguments

`df_name`	Optional. A string to match the name of a processed data frame.
`log_name`	A string specifying the name of the log file without the extension. Default is "batchLLM-log".

Value

A data frame containing metadata.

Examples

library(batchLLM)

# Scrape metadata for all data frames in the default log file
all_metadata <- scrape_metadata()
head(all_metadata)

# Scrape metadata for a specific data frame
specific_metadata <- scrape_metadata("beliefs_40a3012b")
head(specific_metadata)

# Use a custom log file name
custom_metadata <- scrape_metadata(log_name = "custom-log")
head(custom_metadata)
library(batchLLM)

# Scrape metadata for all data frames in the default log file
all_metadata <- scrape_metadata()
head(all_metadata)

# Scrape metadata for a specific data frame
specific_metadata <- scrape_metadata("beliefs_40a3012b")
head(specific_metadata)

# Use a custom log file name
custom_metadata <- scrape_metadata(log_name = "custom-log")
head(custom_metadata)

Package 'batchLLM'

Help Index

Batch Process LLM Text Completions Using a Data Frame

Description

Usage

Arguments

Value

Examples

Interact with batchLLM via a Shiny Gadget

Description

Usage

Value

Beliefs Dataset

Description

Usage

Format

Examples

Interact with Anthropic's Claude API

Description

Usage

Arguments

Value

Examples

Get Batches

Description

Usage

Arguments

Value

Examples

Scrape Metadata

Description

Usage

Arguments

Value

Examples