Statistical Metrics
Statistical metrics provide hypothesis tests and significance assessments for word associations: they ask whether a node and a collocate co-occur more often than chance alone would predict.
using TextAssociations, DataFrames
text = """
Machine learning uses algorithms to find patterns.
Deep learning is a subset of machine learning.
Algorithms process data to extract patterns.
"""
ct = ContingencyTable(text, "learning"; windowsize=3, minfreq=1)
nothing
Chi-Square Test
Theory
The chi-square test compares observed co-occurrence frequencies with the frequencies expected under independence:
\[\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\]
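As a quick sanity check, the statistic can be computed by hand from a 2×2 table of co-occurrence counts. The cell values below are invented for illustration and are not output from TextAssociations:

# Hand computation of χ² for an illustrative 2×2 co-occurrence table.
# Rows: node present / absent; columns: collocate present / absent.
O = [30.0 70.0;
     20.0 880.0]
N = sum(O)
E = sum(O, dims=2) * sum(O, dims=1) ./ N   # expected counts under independence
chi2 = sum((O .- E) .^ 2 ./ E)
println("χ² = ", round(chi2, digits=2))    # compare with 3.84 (p < 0.05, df = 1)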
Implementation
using TextAssociations
text = """
Statistical analysis requires careful statistical methods.
Statistical significance indicates statistical relationships.
Random patterns show no statistical correlation.
"""
ct = ContingencyTable(text, "statistical"; windowsize=4, minfreq=1)
results = assoc_score([ChiSquare, Tscore, Zscore], ct)
println("Statistical Tests for 'statistical':")
for row in eachrow(results)
    # Chi-square critical values (df=1)
    chi_sig = if row.ChiSquare > 10.83
        "p < 0.001"
    elseif row.ChiSquare > 6.63
        "p < 0.01"
    elseif row.ChiSquare > 3.84
        "p < 0.05"
    else
        "not significant"
    end
    println("\n$(row.Collocate):")
    println(" χ² = $(round(row.ChiSquare, digits=2)) ($chi_sig)")
    println(" t-score = $(round(row.Tscore, digits=2))")
    println(" z-score = $(round(row.Zscore, digits=2))")
end
Statistical Tests for 'statistical':
analysis:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
careful:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
correlation:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
indicates:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
methods:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
no:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
patterns:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
random:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
relationships:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
requires:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
show:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
significance:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
statistical:
χ² = 0.0 (not significant)
t-score = 0.0
z-score = 0.0
T-Score
Theory
The t-score measures how confidently the observed co-occurrence frequency can be said to differ from the frequency expected under independence:
\[t = \frac{O - E}{\sqrt{O}}\]
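Because only the observed and expected counts are involved, the formula is easy to verify by hand. The counts below are again invented for illustration:

# Hand computation of the t-score; counts are invented for illustration.
O = 30.0   # observed co-occurrences of node and collocate
E = 12.5   # co-occurrences expected under independence
t = (O - E) / sqrt(O)
println("t = ", round(t, digits=2))   # compare with 1.96 (95% confidence)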
Application
using TextAssociations
# T-score for collocation strength
t_results = assoc_score(Tscore, ct)
println("\nT-score interpretation:")
for row in eachrow(t_results)
    confidence = if abs(row.Tscore) > 2.576
        "99% confidence"
    elseif abs(row.Tscore) > 1.96
        "95% confidence"
    elseif abs(row.Tscore) > 1.645
        "90% confidence"
    else
        "not confident"
    end
    println(" $(row.Collocate): t=$(round(row.Tscore, digits=2)) ($confidence)")
end
T-score interpretation:
Z-Score
Theory
The z-score standardizes the difference between observed and expected frequencies, where σ is the standard deviation under the null hypothesis (commonly approximated by √E):
\[z = \frac{O - E}{\sigma}\]
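With the same illustrative counts as in the t-score sketch, and σ approximated by √E (a common choice in collocation statistics), the calculation looks like this:

# Hand computation of the z-score with σ approximated by √E;
# counts are invented for illustration.
O = 30.0
E = 12.5
z = (O - E) / sqrt(E)
println("z = ", round(z, digits=2))   # exceeds the t-score for the same counts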
Comparison with T-score
using TextAssociations
# Compare T-score and Z-score
both = assoc_score([Tscore, Zscore], ct)
println("\nT-score vs Z-score:")
println("T-score: Uses observed frequency in denominator")
println("Z-score: Uses theoretical standard deviation")
for row in eachrow(first(both, 3))
    println("\n$(row.Collocate):")
    println(" T-score: $(round(row.Tscore, digits=2))")
    println(" Z-score: $(round(row.Zscore, digits=2))")
    if abs(row.Zscore) > abs(row.Tscore)
        println(" → Z-score shows stronger evidence")
    else
        println(" → T-score shows stronger evidence")
    end
end
T-score vs Z-score:
T-score: Uses observed frequency in denominator
Z-score: Uses theoretical standard deviation
Fisher's Exact Test
While not directly implemented, Fisher's exact test is important for small samples:
using TextAssociations
function explain_fisher_test()
    println("Fisher's Exact Test:")
    println(" Use when: Sample sizes are small")
    println(" Advantage: Exact p-values (not asymptotic)")
    println(" Disadvantage: Computationally intensive")
    println("\nRule of thumb: Use Fisher's when any expected frequency < 5")
    # Check whether Fisher's would be recommended for a given table
    function needs_fisher(ct::ContingencyTable)
        data = cached_data(ct.con_tbl)
        if !isempty(data)
            # Check every expected-frequency cell against the threshold
            expected = vcat(data.E₁₁, data.E₁₂, data.E₂₁, data.E₂₂)
            return any(expected .< 5)
        end
        return false
    end
    return needs_fisher
end
checker = explain_fisher_test()
# Check our contingency table
if checker(ct)
    println("\n⚠ This data might benefit from Fisher's exact test")
else
    println("\n✓ Chi-square test is appropriate for this data")
end
Fisher's Exact Test:
Use when: Sample sizes are small
Advantage: Exact p-values (not asymptotic)
Disadvantage: Computationally intensive
Rule of thumb: Use Fisher's when any expected frequency < 5
✓ Chi-square test is appropriate for this data
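If an exact p-value is needed, a one-sided Fisher test can be computed from the 2×2 cell counts via the hypergeometric distribution. The sketch below assumes the Distributions.jl package (not a dependency of the examples above), uses invented cell counts, and the helper name fisher_exact_pvalue is purely illustrative:

using Distributions

# One-sided Fisher's exact test for a 2×2 table [a b; c d].
# Illustrative sketch: cell counts and helper name are not from TextAssociations.
function fisher_exact_pvalue(a::Int, b::Int, c::Int, d::Int)
    # X ~ Hypergeometric(successes, failures, draws): the rows give the
    # population split, the first column margin gives the number of draws.
    dist = Hypergeometric(a + b, c + d, a + c)
    return ccdf(dist, a - 1)   # P(X ≥ a)
end

println("p = ", round(fisher_exact_pvalue(4, 1, 1, 14), digits=4))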
Critical Values and P-values
Reference Table
using TextAssociations, DataFrames
critical_values = DataFrame(
Test = ["Chi-square (df=1)", "T-score", "Z-score"],
p_0_05 = [3.84, 1.96, 1.96],
p_0_01 = [6.63, 2.576, 2.576],
p_0_001 = [10.83, 3.291, 3.291],
p_0_0001 = [15.13, 3.891, 3.891]
)
println("Critical Values for Statistical Tests:")
for row in eachrow(critical_values)
println("\n$(row.Test):")
println(" p < 0.05: $(row.p_0_05)")
println(" p < 0.01: $(row.p_0_01)")
println(" p < 0.001: $(row.p_0_001)")
println(" p < 0.0001: $(row.p_0_0001)")
end
Critical Values for Statistical Tests:
Chi-square (df=1):
p < 0.05: 3.84
p < 0.01: 6.63
p < 0.001: 10.83
p < 0.0001: 15.13
T-score:
p < 0.05: 1.96
p < 0.01: 2.576
p < 0.001: 3.291
p < 0.0001: 3.891
Z-score:
p < 0.05: 1.96
p < 0.01: 2.576
p < 0.001: 3.291
p < 0.0001: 3.891
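Instead of comparing against tabulated critical values, asymptotic p-values can be computed directly from the test statistics. This sketch assumes the Distributions.jl package; the helper names chi2_pvalue and z_pvalue are illustrative, not part of TextAssociations:

using Distributions

# Asymptotic p-values from test statistics (df = 1; two-sided for z).
chi2_pvalue(x) = ccdf(Chisq(1), x)
z_pvalue(z) = 2 * ccdf(Normal(), abs(z))

println("χ² = 6.63  → p ≈ ", round(chi2_pvalue(6.63), digits=4))
println("z = 2.576 → p ≈ ", round(z_pvalue(2.576), digits=4))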
Effect Size vs Significance
Important Distinction
using TextAssociations
# High significance doesn't always mean large effect
text_large = repeat("word1 word2 ", 1000) # Large sample
text_small = "word1 word2 word1 word2" # Small sample
ct_large = ContingencyTable(text_large, "word1"; windowsize=2, minfreq=1)
ct_small = ContingencyTable(text_small, "word1"; windowsize=2, minfreq=1)
println("Statistical Significance vs Effect Size:")
# Large sample
if !isempty(assoc_score(ChiSquare, ct_large))
large_chi = first(assoc_score(ChiSquare, ct_large)).ChiSquare
large_pmi = first(assoc_score(PMI, ct_large)).PMI
println("\nLarge sample (n=2000):")
println(" χ² = $(round(large_chi, digits=2)) (high significance)")
println(" PMI = $(round(large_pmi, digits=2)) (effect size)")
end
# Small sample
if !isempty(assoc_score(ChiSquare, ct_small))
small_chi = first(assoc_score(ChiSquare, ct_small)).ChiSquare
small_pmi = first(assoc_score(PMI, ct_small)).PMI
println("\nSmall sample (n=4):")
println(" χ² = $(round(small_chi, digits=2)) (low significance)")
println(" PMI = $(round(small_pmi, digits=2)) (effect size)")
end
println("\n→ Large samples can show significance for small effects")
println("→ Always report both significance AND effect size")
Statistical Significance vs Effect Size:
→ Large samples can show significance for small effects
→ Always report both significance AND effect size
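The same point can be made with a hand calculation: scaling every cell of a 2×2 table by a constant multiplies χ² by that constant but leaves PMI unchanged. The counts below are invented for illustration, and PMI is computed here with the natural log, so its base may differ from the package's convention:

# Scaling a 2×2 table multiplies χ² but leaves PMI unchanged.
# Cell counts are invented for illustration; natural log is used for PMI.
function chi2_and_pmi(O::Matrix{Float64})
    N = sum(O)
    E = sum(O, dims=2) * sum(O, dims=1) ./ N
    return sum((O .- E) .^ 2 ./ E), log(O[1, 1] / E[1, 1])
end

small = [15.0 15.0; 15.0 55.0]
large = small .* 100   # same proportions, 100 times the data

for (label, table) in (("small", small), ("large", large))
    chi2, pmi = chi2_and_pmi(table)
    println("$label: χ² = $(round(chi2, digits=2)), PMI = $(round(pmi, digits=2))")
end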
Multiple Testing Correction
Bonferroni Correction
using TextAssociations
function bonferroni_correction(p_values::Vector{Float64}, alpha::Float64=0.05)
n_tests = length(p_values)
corrected_alpha = alpha / n_tests
println("Bonferroni Correction:")
println(" Number of tests: $n_tests")
println(" Original α: $alpha")
println(" Corrected α: $(round(corrected_alpha, digits=4))")
# Which tests remain significant?
significant = p_values .< corrected_alpha
println(" Significant after correction: $(sum(significant))/$n_tests")
return corrected_alpha
end
# Example with multiple comparisons
p_values = [0.001, 0.01, 0.02, 0.03, 0.04]
bonferroni_correction(p_values)
0.01
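In a real analysis the p-values would come from the association scores themselves. The sketch below reuses ct from the chi-square example above and assumes the Distributions.jl package for the χ²-to-p conversion; the workflow is illustrative rather than a built-in feature of TextAssociations:

using TextAssociations, Distributions

# Sketch: convert χ² scores to asymptotic p-values, then apply Bonferroni.
# Distributions.jl is assumed for the conversion and is not required by
# TextAssociations itself.
scores = assoc_score(ChiSquare, ct)
if !isempty(scores)
    pvals = ccdf.(Chisq(1), scores.ChiSquare)
    corrected_alpha = 0.05 / length(pvals)
    kept = scores.Collocate[pvals .< corrected_alpha]
    println("Significant after Bonferroni: ", isempty(kept) ? "none" : join(kept, ", "))
end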
Best Practices
1. Combining Statistical Tests
using TextAssociations, DataFrames
# Create a sample contingency table first
text = "Statistical analysis requires careful statistical methods"
ct = ContingencyTable(text, "statistical"; windowsize=3, minfreq=1)
using TextAssociations
function comprehensive_statistical_test(ct::ContingencyTable)
    # Use multiple tests
    results = assoc_score([ChiSquare, LLR, Tscore, Zscore], ct)
    if !isempty(results)
        # Add consensus column
        results.Consensus = zeros(Int, nrow(results))
        for i in 1:nrow(results)
            consensus = 0
            consensus += results[i, :ChiSquare] > 10.83 ? 1 : 0 # p < 0.001
            consensus += results[i, :LLR] > 10.83 ? 1 : 0
            consensus += abs(results[i, :Tscore]) > 3.291 ? 1 : 0
            consensus += abs(results[i, :Zscore]) > 3.291 ? 1 : 0
            results.Consensus[i] = consensus
        end
        # Filter by consensus
        strong = filter(row -> row.Consensus >= 3, results)
        println("Statistical Consensus (≥3/4 tests significant at p<0.001):")
        for row in eachrow(strong)
            println(" $(row.Collocate): $(row.Consensus)/4 tests agree")
        end
    end
end
comprehensive_statistical_test(ct)
Statistical Consensus (≥3/4 tests significant at p<0.001):
2. Sample Size Considerations
# Minimum sample sizes for reliable statistical tests
const MIN_SAMPLES = Dict(
:chisquare => 20, # All expected frequencies > 5
:tscore => 10, # Reasonable minimum
:zscore => 30, # Central limit theorem
:fisher => 0 # Works for any size
)
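A small helper can turn these thresholds into a quick check. The function below is an illustrative sketch rather than part of TextAssociations, and treats n as the number of co-occurrence observations:

# Illustrative helper: report which tests meet their rough minimum sample
# size, using the MIN_SAMPLES thresholds defined above.
function recommended_tests(n::Int)
    suitable = sort([test for (test, threshold) in MIN_SAMPLES if n >= threshold])
    println("With n = $n observations, suitable tests: ", join(suitable, ", "))
end

recommended_tests(15)   # only :fisher and :tscore pass their thresholds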
3. Reporting Guidelines
using TextAssociations, DataFrames
# Create contingency table for analysis
text = "Statistical significance requires proper statistical testing methods"
ct = ContingencyTable(text, "statistical"; windowsize=3, minfreq=1)
function report_statistical_results(results::DataFrame)
    println("Statistical Analysis Report")
    println("="^40)
    for row in eachrow(first(results, 3))
        println("\nCollocate: $(row.Collocate)")
        println(" Frequency: $(row.Frequency)")
        if hasproperty(results, :ChiSquare)
            chi_p = row.ChiSquare > 10.83 ? "***" :
                    row.ChiSquare > 6.63 ? "**" :
                    row.ChiSquare > 3.84 ? "*" : "ns"
            println(" χ²(1) = $(round(row.ChiSquare, digits=2)) $chi_p")
        end
        if hasproperty(results, :Tscore)
            println(" t = $(round(row.Tscore, digits=2))")
        end
        # Report effect size alongside
        if hasproperty(results, :PMI)
            println(" Effect size (PMI) = $(round(row.PMI, digits=2))")
        end
    end
    println("\n" * "="^40)
    println("* p < 0.05, ** p < 0.01, *** p < 0.001")
end
results_full = assoc_score([ChiSquare, Tscore, PMI], ct)
report_statistical_results(results_full)
Statistical Analysis Report
========================================
Collocate: methods
Frequency: 1
χ²(1) = 0.0 ns
t = 0.0
Effect size (PMI) = -0.69
Collocate: proper
Frequency: 1
χ²(1) = 0.0 ns
t = 0.0
Effect size (PMI) = 0.0
Collocate: requires
Frequency: 1
χ²(1) = 0.0 ns
t = 0.0
Effect size (PMI) = 0.0
========================================
* p < 0.05, ** p < 0.01, *** p < 0.001
Next Steps
- Learn about Similarity Metrics for symmetric measures
- Explore Effect Size Metrics for practical significance
- Review Choosing Metrics for selection guidance