KeMontielOleaNesbit2024

Documentation for KeMontielOleaNesbit2024.

KeMontielOleaNesbit2024.RawDocs
KeMontielOleaNesbit2024.algo1_only_store_draws
KeMontielOleaNesbit2024.algo2_range
KeMontielOleaNesbit2024.compute_functional_from_nmf_draws
KeMontielOleaNesbit2024.credible_set90_range
KeMontielOleaNesbit2024.do_plots
KeMontielOleaNesbit2024.find_NMF_given_solution
KeMontielOleaNesbit2024.gen_NMF
KeMontielOleaNesbit2024.generate_tf_only_matrix
KeMontielOleaNesbit2024.plot_word_cloud
KeMontielOleaNesbit2024.preprocess
KeMontielOleaNesbit2024.run
KeMontielOleaNesbit2024.run_tests
KeMontielOleaNesbit2024.simulation_plots
KeMontielOleaNesbit2024.vb_estimate

KeMontielOleaNesbit2024.RawDocs — Type

RawDocs

Structure to hold document data, including tokens, stems, and metadata.

Fields:

docs: raw document strings
tokens: tokenized words
stems: stemmed tokens
sw_set: set of stopwords
...

KeMontielOleaNesbit2024.algo1_only_store_draws — Method

algo1_only_store_draws(gamma1, lam1, gamma2, lam2, eps, T, save_folder; post_draw_num, beta, random_seed)

Draws posterior samples of B and Θ from the two gamma/lambda priors (gamma1, lam1 and gamma2, lam2), solves the NMF problem, and stores the outputs as .jld2 files in save_folder.
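
As an illustration of the sampling step only, the sketch below draws row-stochastic matrices from Dirichlet distributions by normalizing gamma variates. It assumes that each row of a gamma or lambda matrix holds Dirichlet parameters, which is the usual LDA convention but is not confirmed by this page. The names rand_gamma, draw_dirichlet_rows, lam_demo, and gamma_demo are hypothetical, and the package may orient or sample its matrices differently.

```julia
using Random

# Draw one Gamma(shape, 1) variate with the Marsaglia-Tsang method (stdlib only).
function rand_gamma(rng::AbstractRNG, shape::Float64)
    if shape < 1.0
        # Boost the shape above 1, then scale the draw back down.
        return rand_gamma(rng, shape + 1.0) * rand(rng)^(1.0 / shape)
    end
    d = shape - 1.0 / 3.0
    c = 1.0 / sqrt(9.0 * d)
    while true
        x = randn(rng)
        v = (1.0 + c * x)^3
        if v > 0.0
            u = rand(rng)
            if log(u) < 0.5 * x^2 + d - d * v + d * log(v)
                return d * v
            end
        end
    end
end

# Treat each row of `params` as Dirichlet parameters (an assumption) and
# draw one probability vector per row.
function draw_dirichlet_rows(rng::AbstractRNG, params::Matrix{Float64})
    draws = map(a -> rand_gamma(rng, a), params)
    return draws ./ sum(draws, dims=2)
end

rng = MersenneTwister(1)
lam_demo = [5.0 1.0 0.5; 0.5 2.0 4.0]       # hypothetical: 2 topics x 3 words
gamma_demo = [3.0 1.0; 0.2 6.0; 1.0 1.0]    # hypothetical: 3 documents x 2 topics
B_draw = draw_dirichlet_rows(rng, lam_demo)
Theta_draw = draw_dirichlet_rows(rng, gamma_demo)
```

Each row of B_draw and Theta_draw is nonnegative and sums to one, so each draw is a valid pair of probability matrices for a subsequent NMF step.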

KeMontielOleaNesbit2024.algo2_range — Method

algo2_range(N, I_sim)

Simulates and plots the approximation quality of Algorithm 2 over a range of possible n11 values.

KeMontielOleaNesbit2024.compute_functional_from_nmf_draws — Method

compute_functional_from_nmf_draws(FOMC_sec, func, prior_post_draw_name, NMF_draw_folder_name)

Computes a statistic over the posterior NMF draws for a specific FOMC section.

Arguments

FOMC_sec: 1 or 2
func: a function (e.g. HHIpercentdiff) to apply to Herfindahl indices
prior_post_draw_name: path to posterior draw .jld2 file
NMF_draw_folder_name: path to folder with NMF iteration draws

Returns

H_diff_percent: average percent change for each draw
lambda_lower_percent, lambda_upper_percent: lower/upper bounds from NMF path draws
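
A functional passed as func might look like the sketch below, which computes the percent change in the mean Herfindahl index between two groups of documents. The name hhi_percent_diff, its two-vector signature, and the sample values are assumptions made for illustration; the signature of the package's HHIpercentdiff is not shown on this page.

```julia
using Statistics

# Hypothetical functional: percent change in the mean Herfindahl index
# from one group of documents (`before`) to another (`after`).
function hhi_percent_diff(before::AbstractVector{<:Real}, after::AbstractVector{<:Real})
    return 100 * (mean(after) - mean(before)) / mean(before)
end

h_before = [0.5, 0.5]    # hypothetical per-document Herfindahl indices
h_after = [1.0, 0.5]
change = hhi_percent_diff(h_before, h_after)   # 50.0
```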

KeMontielOleaNesbit2024.credible_set90_range — Method

credible_set90_range(N, I_sim)

Generates 90% robust credible intervals for the function lambda_example using simulation draws.
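
One common way to build such an interval, sketched here under assumptions, is to take a low quantile of the per-draw lower bounds and a high quantile of the per-draw upper bounds. The function name robust_credible_interval and its inputs are hypothetical, and the package's own construction may differ.

```julia
using Statistics

# Hypothetical inputs: one lower and one upper bound of the functional per draw.
function robust_credible_interval(lower::AbstractVector{<:Real},
                                  upper::AbstractVector{<:Real}; level=0.90)
    tail = (1 - level) / 2
    # Lower endpoint from the lower bounds, upper endpoint from the upper bounds.
    return quantile(lower, tail), quantile(upper, 1 - tail)
end

lower_draws = collect(0.0:0.01:1.0)    # 101 evenly spaced lower bounds
upper_draws = lower_draws .+ 0.5       # each upper bound sits 0.5 above its lower bound
lo, hi = robust_credible_interval(lower_draws, upper_draws)
```

With these inputs the interval is approximately (0.05, 1.45): the 5% quantile of the lower bounds and the 95% quantile of the upper bounds.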

KeMontielOleaNesbit2024.do_plots — Method

do_plots()

Generates all plots related to NMF posterior means and functional measures. Assumes that NMF output has already been generated and saved.

KeMontielOleaNesbit2024.find_NMF_given_solution — Method

find_NMF_given_solution(B_init, Theta_init, beta, T, eps; maxit, verbose, random_seed)

Solves a posterior NMF decomposition by iteratively applying Algorithm 1.

Returns

A tuple of lists: (B_list, Theta_list) with matrices from each iteration.

KeMontielOleaNesbit2024.gen_NMF — Method

gen_NMF()

Runs the full NMF draw generation pipeline. It:

Estimates OnlineLDA for FOMC1 and FOMC2
Draws posterior B and Θ samples
Applies NMF
Saves all outputs into the NMF_draws_folder

Includes both posterior and prior-based NMF draw scenarios.

KeMontielOleaNesbit2024.generate_tf_only_matrix — Function

generate_tf_only_matrix(tf_idf_threshold::Vector{Int}, additional_stop_words::Vector{String}, option)

Generates term-frequency-only matrices for each FOMC section and saves them as Excel and JSON files.

Arguments

tf_idf_threshold: max number of words to retain per section
additional_stop_words: extra stopwords to exclude
option: selects the return value; one of "matrix", "text", or nothing

Returns

Depending on option, returns term-document matrices or tokenized meeting texts
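
The idea of a term-frequency-only matrix can be sketched as follows: count tokens, drop stopwords, keep the most frequent terms, and tabulate counts per document. The function tf_matrix is hypothetical and takes a single word limit, whereas generate_tf_only_matrix takes one threshold per section and also writes Excel and JSON files, which this sketch omits.

```julia
# Build a term-document count matrix from tokenized documents, keeping the
# `max_words` most frequent terms and dropping stopwords.
function tf_matrix(docs::Vector{Vector{String}}, max_words::Int, stop_words::Vector{String})
    stops = Set(stop_words)
    totals = Dict{String,Int}()
    for doc in docs
        for tok in doc
            if !(tok in stops)
                totals[tok] = get(totals, tok, 0) + 1
            end
        end
    end
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ranked = sort(collect(keys(totals)); by = w -> (-totals[w], w))
    vocab = ranked[1:min(max_words, length(ranked))]
    index = Dict(w => i for (i, w) in enumerate(vocab))
    counts = zeros(Int, length(vocab), length(docs))
    for (j, doc) in enumerate(docs)
        for tok in doc
            i = get(index, tok, 0)
            if i > 0
                counts[i, j] += 1
            end
        end
    end
    return vocab, counts
end

docs = [["rate", "inflation", "the", "rate"], ["inflation", "growth", "the"]]
vocab, counts = tf_matrix(docs, 2, ["the"])
# vocab  == ["inflation", "rate"]
# counts == [1 1; 2 0]   (rows are terms, columns are documents)
```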

KeMontielOleaNesbit2024.plot_word_cloud — Method

plot_word_cloud(text::Vector{Vector{String}}, filename::String)

Generates and saves a word cloud plot using all tokens from the provided text.

Arguments

text: nested vector of tokenized words (one subvector per document)
filename: name of the output PNG file, saved under PLOT_PATH

KeMontielOleaNesbit2024.preprocess — Method

preprocess()

Runs the main preprocessing pipeline for FOMC data:

Loads raw data
Applies speaker-based separation
Tokenizes and stems the content
Finds collocations
Outputs cleaned data to Excel

KeMontielOleaNesbit2024.run — Method

run()

Runs the full project pipeline:

Extracts and preprocesses FOMC meeting transcripts,
Generates TF-only matrices,
Creates word clouds,
Performs variational Bayes topic modeling using OnlineLDA,
Generates and saves NMF posterior draws,
Plots results and simulation outputs.

Outputs are saved to configured CACHE_PATH, MATRIX_PATH, and PLOT_PATH.

KeMontielOleaNesbit2024.run_tests — Method

run_tests()

Executes the complete test suite in the /test directory, covering both unit and integration tests.

KeMontielOleaNesbit2024.simulation_plots — Method

simulation_plots()

Runs all simulation visualizations:

Algorithm 2 range plots
Robust credible set plots
Monte Carlo illustrations

Each plot is saved as a PNG in PLOT_PATH.

KeMontielOleaNesbit2024.vb_estimate — Method

vb_estimate(section::String; onlyTF, K, alpha, eta, tau, kappa, docs_idx_list, random_seed)

Performs OnlineLDA variational Bayes estimation on preprocessed text.

Arguments

section: "FOMC1" or "FOMC2"
onlyTF: whether to use the TF-only dictionary
K: number of topics
alpha, eta: Dirichlet prior parameters
tau, kappa: learning-rate parameters
docs_idx_list: optional subset of documents
random_seed: seed for reproducibility

Returns

herfindahl: vector of Herfindahl indices per document
posterior_mean: normalized gamma matrix
gamma: document-topic matrix
lambda: topic-word matrix
model: fitted OnlineLDA instance
text1: input corpus as a vector of strings
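
The relationship between gamma, posterior_mean, and herfindahl can be sketched as below, assuming gamma has one row per document and one column per topic and that the Herfindahl index is the sum of squared topic shares. The matrix gamma_small is made-up data, and the package's exact normalization may differ.

```julia
# Hypothetical document-topic matrix: 2 documents x 2 topics.
gamma_small = [3.0 1.0; 2.0 2.0]

# Normalize each row so the topic shares of a document sum to one.
posterior_mean = gamma_small ./ sum(gamma_small, dims=2)

# Herfindahl index per document: sum of squared topic shares.
hhi_per_doc = vec(sum(abs2, posterior_mean, dims=2))
# posterior_mean == [0.75 0.25; 0.5 0.5]
# hhi_per_doc    == [0.625, 0.5]
```

A document concentrated on a single topic has an index of 1, while a document spread evenly over K topics has an index of 1/K.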