KeMontielOleaNesbit2024

Documentation for KeMontielOleaNesbit2024.

KeMontielOleaNesbit2024.RawDocs (Type)
RawDocs

Structure to hold document data, including tokens, stems, and metadata.

Fields:

  • docs: raw document strings
  • tokens: tokenized words
  • stems: stemmed tokens
  • sw_set: set of stopwords
  • ...

KeMontielOleaNesbit2024.algo1_only_store_draws (Method)
algo1_only_store_draws(gamma1, lam1, gamma2, lam2, eps, T, save_folder; post_draw_num, beta, random_seed)

Draws posterior samples of B and Θ for two gamma/lambda parameter pairs, (gamma1, lam1) and (gamma2, lam2), solves the NMF problem, and stores the outputs as .jld2 files in save_folder.

KeMontielOleaNesbit2024.compute_functional_from_nmf_draws (Method)
compute_functional_from_nmf_draws(FOMC_sec, func, prior_post_draw_name, NMF_draw_folder_name)

Computes a statistic over the posterior NMF draws for a specific FOMC section.

Arguments

  • FOMC_sec: FOMC section to analyze, 1 or 2
  • func: a function (e.g. HHIpercentdiff) to apply on Herfindahl indices
  • prior_post_draw_name: path to posterior draw .jld2 file
  • NMF_draw_folder_name: path to folder with NMF iteration draws

Returns

  • H_diff_percent: average percent change for each draw
  • lambda_lower_percent, lambda_upper_percent: lower/upper bounds from NMF path draws
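
One way to read the lower/upper bounds is as the extremes of a functional evaluated along a path of NMF iterates. The sketch below illustrates that idea only; it is not the package's implementation. The names `hhi`, `mean_hhi`, and `functional_bounds` are hypothetical, and the K x D layout of `Theta` (one column of topic shares per document) is an assumption.

```julia
using Statistics

# Herfindahl index of one document's topic shares (shares sum to 1).
hhi(shares::AbstractVector{<:Real}) = sum(abs2, shares)

# Hypothetical functional: mean Herfindahl index across documents.
# Theta is ASSUMED to be K x D, one column of topic shares per document.
mean_hhi(Theta::AbstractMatrix{<:Real}) = mean(hhi(col) for col in eachcol(Theta))

# Apply `func` to every matrix along a path of NMF iterates and return the
# smallest and largest values as illustrative lower/upper bounds.
function functional_bounds(func, Theta_path)
    vals = [func(Theta) for Theta in Theta_path]
    return minimum(vals), maximum(vals)
end

Theta_path = [
    [0.5 0.5; 0.5 0.5],   # both documents split evenly over two topics
    [1.0 0.0; 0.0 1.0],   # each document concentrated on a single topic
]
lower, upper = functional_bounds(mean_hhi, Theta_path)
```

Any function of a share matrix can be passed in place of `mean_hhi`, which mirrors how the `func` argument is described above.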

KeMontielOleaNesbit2024.do_plots (Method)
do_plots()

Generates all plots related to NMF posterior means and functional measures. Assumes that NMF output has already been generated and saved.

KeMontielOleaNesbit2024.find_NMF_given_solution (Method)
find_NMF_given_solution(B_init, Theta_init, beta, T, eps; maxit, verbose, random_seed)

Solves a posterior NMF decomposition by iteratively applying Algorithm 1.

Returns

  • A tuple of lists: (B_list, Theta_list) with matrices from each iteration.

KeMontielOleaNesbit2024.gen_NMF (Method)
gen_NMF()

Runs the full NMF draw generation pipeline. It:

  • Estimates OnlineLDA for FOMC1 and FOMC2
  • Draws posterior B and Θ samples
  • Applies NMF
  • Saves all outputs into the NMF_draws_folder

Includes both posterior and prior-based NMF draw scenarios.

KeMontielOleaNesbit2024.generate_tf_only_matrix (Function)
generate_tf_only_matrix(tf_idf_threshold::Vector{Int}, additional_stop_words::Vector{String}, option)

Generates term-frequency-only matrices for each FOMC section and saves them as Excel and JSON files.

Arguments

  • tf_idf_threshold: max number of words to retain per section
  • additional_stop_words: extra stopwords to exclude
  • option: selects what is returned: "matrix", "text", or nothing

Returns

  • Depending on option, returns term-document matrices or tokenized meeting texts
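
As a rough illustration of a term-frequency-only matrix, the sketch below counts tokens per document, drops stopwords, and keeps only the most frequent terms. It is a simplified stand-in rather than the package's code: `tf_matrix_sketch` is a hypothetical name, it takes a single integer cap `max_terms` for one section (the real `tf_idf_threshold` is a vector), and it writes no Excel or JSON files.

```julia
# Build a term-document count matrix from tokenized documents, keeping only
# the `max_terms` most frequent terms and skipping stopwords.
function tf_matrix_sketch(docs::Vector{Vector{String}}, max_terms::Int,
                          stop_words::Vector{String})
    stops = Set(stop_words)
    totals = Dict{String,Int}()
    for doc in docs, tok in doc
        tok in stops && continue
        totals[tok] = get(totals, tok, 0) + 1
    end
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sort(collect(keys(totals)); by = t -> (-totals[t], t))
    vocab = ranked[1:min(max_terms, length(ranked))]
    index = Dict(term => i for (i, term) in enumerate(vocab))
    # Rows are retained terms, columns are documents.
    counts = zeros(Int, length(vocab), length(docs))
    for (j, doc) in enumerate(docs), tok in doc
        i = get(index, tok, 0)
        i > 0 && (counts[i, j] += 1)
    end
    return vocab, counts
end

docs = [["rate", "inflation", "rate"], ["inflation", "the", "growth"]]
vocab, counts = tf_matrix_sketch(docs, 2, ["the"])
```

Here "the" is removed as a stopword and "growth" falls outside the two-term cap, so only "inflation" and "rate" remain as rows.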

KeMontielOleaNesbit2024.plot_word_cloud (Method)
plot_word_cloud(text::Vector{Vector{String}}, filename::String)

Generates and saves a word cloud plot using all tokens from the provided text.

Arguments

  • text: nested vector of tokenized words (one subvector per document)
  • filename: file name to save the output PNG plot under PLOT_PATH

KeMontielOleaNesbit2024.preprocess (Method)
preprocess()

Runs the main preprocessing pipeline for FOMC data:

  • Loads raw data
  • Applies speaker-based separation
  • Tokenizes and stems the content
  • Finds collocations
  • Outputs cleaned data to Excel

KeMontielOleaNesbit2024.run (Method)
run()

Runs the full project pipeline:

  • Extracts and preprocesses FOMC meeting transcripts
  • Generates TF-only matrices
  • Creates word clouds
  • Performs variational Bayes topic modeling using OnlineLDA
  • Generates and saves NMF posterior draws
  • Plots results and simulation outputs

Outputs are saved to the configured CACHE_PATH, MATRIX_PATH, and PLOT_PATH.

KeMontielOleaNesbit2024.vb_estimate (Method)
vb_estimate(section::String; onlyTF, K, alpha, eta, tau, kappa, docs_idx_list, random_seed)

Performs OnlineLDA variational Bayes estimation on preprocessed text.

Arguments

  • section: "FOMC1" or "FOMC2"
  • onlyTF: whether to use TF-only dictionary
  • K: number of topics
  • alpha, eta: Dirichlet prior parameters
  • tau, kappa: learning-rate parameters
  • docs_idx_list: optional subset of documents
  • random_seed: seed for reproducibility
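
For context on tau and kappa: in the online variational Bayes algorithm for LDA of Hoffman, Blei and Bach (2010), the step size at update t is (tau + t)^(-kappa). The sketch below assumes this package's OnlineLDA follows that standard schedule, which the documentation here does not confirm; `learning_rate` and `update_lambda` are hypothetical helper names.

```julia
# Standard online-LDA step size: rho_t = (tau + t)^(-kappa).
# `float` avoids a DomainError when all arguments are integers.
learning_rate(t::Integer, tau::Real, kappa::Real) = float(tau + t)^(-kappa)

# Blend the current topic-word parameter with a minibatch estimate,
# weighting the new estimate by the step size rho.
update_lambda(lambda, lambda_hat, rho) = (1 - rho) .* lambda .+ rho .* lambda_hat

rho0 = learning_rate(0, 1.0, 0.5)   # first update: full weight on the minibatch
rho3 = learning_rate(3, 1.0, 0.5)   # later updates move more slowly
blended = update_lambda([2.0, 4.0], [4.0, 0.0], rho3)
```

A larger tau down-weights early minibatches, while a larger kappa makes the influence of later minibatches decay faster.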

Returns

  • herfindahl: vector of Herfindahl indices per document
  • posterior_mean: normalized gamma matrix
  • gamma: document-topic matrix
  • lambda: topic-word matrix
  • model: fitted OnlineLDA instance
  • text1: input corpus as a vector of strings
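
The relationship between gamma, posterior_mean, and herfindahl can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: it assumes gamma is D x K (documents in rows, topics in columns) and that normalization means dividing each row by its sum. The helper names `normalize_rows` and `herfindahl_per_doc` are hypothetical.

```julia
# Normalize each row of a document-topic matrix so it sums to one.
# `gamma` is ASSUMED to be D x K (documents in rows, topics in columns).
normalize_rows(gamma::AbstractMatrix{<:Real}) = gamma ./ sum(gamma; dims = 2)

# Herfindahl index per document: the sum of squared topic shares.
herfindahl_per_doc(shares::AbstractMatrix{<:Real}) = vec(sum(abs2, shares; dims = 2))

gamma = [1.0 1.0; 3.0 1.0]
posterior_mean = normalize_rows(gamma)
herfindahl = herfindahl_per_doc(posterior_mean)
```

A document spread evenly over K topics has index 1/K, and a document concentrated on one topic has index 1, so higher values indicate more concentrated discussion.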