Skip to content

Comments

Fixes state_abbr is not globally unique#19

Closed
jsm222 wants to merge 1 commit intomidwire:developfrom
jsm222:pr1
Closed

Fixes state_abbr is not globally unique#19
jsm222 wants to merge 1 commit intomidwire:developfrom
jsm222:pr1

Conversation

@jsm222
Copy link

@jsm222 jsm222 commented Oct 30, 2021

And fixes cases where countries do not have a state,
by using country as state.

  • I wrote specs to cover new or modified code in this PR
  • I ran rake spec locally and ALL specs pass

Related Issue: #18
Description: See comments in #18

And fixes cases where countries do not have a state,
by using country as state.
@jsm222 jsm222 changed the title Fixes state_abbr is not globlly unique Fixes state_abbr is not globally unique Oct 31, 2021
@midwire
Copy link
Owner

midwire commented Feb 14, 2026

PR #19 Review Summary: "Fixes state_abbr is not globally unique"

Files: 4 changed (+22/-6) | Branch: pr1develop
Purpose: Scopes state lookups by country to fix non-unique state abbreviations; adds fallback for countries without states.


Critical Issues (3 found — must fix before merge)

1. SQL Injection / SQL Breakage: Unescaped state_name in 3rd fallback query

lib/free_zipcode_data/db_table.rb — third if(res == nil) block

The third fallback query interpolates state_name without escape_single_quotes. State names like "Cote d'Ivoire" will produce malformed SQL, crashing the pipeline. Additionally, state_abbr is never escaped in any of the three queries.

2. Pipeline crash: NoMethodError on unknown country codes

lib/free_zipcode_data/state_table.rb — new if block in write

row[:state] = country_lookup_table[row[:country]][:name]

If row[:country] isn't in country_lookup_table.yml, this calls [:name] on nil, crashing the entire ETL pipeline with no useful error message. This is especially likely in "all countries" mode — the exact use case this PR enables.

3. Unique index on state_name not scoped by country_id

lib/free_zipcode_data/state_table.rb — existing build method

The state_name unique index is ON (name COLLATE NOCASE ASC) without country_id. When the PR maps countries-without-states to their country name, any two countries sharing a state name will silently lose data via duplicate constraint violations. This undermines the core fix.


Important Issues (5 found — should fix)

4. No test coverage — existing tests will break

The PR modifies zero test files. The PR checklist confirms specs were not written and rake spec was not run. At least 5 distinct test failures will occur:

  • 3x ArgumentError in db_table_spec.rb (get_state_id now takes 3 args, tests pass 2)
  • state_table_spec.rb "returns nil when short_state is nil" will fail (behavior changed)
  • county_table_spec.rb and zipcode_table_spec.rb rows lack :country key, silently breaking state lookups
  • seed_counties helper also lacks :country, causing cascading failures

5. Silent data loss: no logging in fallback chain

lib/free_zipcode_data/db_table.rbget_state_id

When all 3 fallback queries return nil, the row is silently dropped. No logging at any fallback level means users have no way to know how many rows were discarded or why.

6. Mutation of shared Kiba row hash

lib/free_zipcode_data/state_table.rbwrite

The row hash is mutated in-place (row[:short_state] = row[:country]). Since Kiba passes the same object to all destinations, this creates an undocumented ordering dependency between StateTable and downstream CountyTable/ZipcodeTable.

7. Multiple RuboCop violations

  • Missing spaces after commas: get_state_id(country,state_abbr, state_name)
  • Non-idiomatic nil checks: if(res == nil) → should be if res.nil?
  • Explicit return res → should be implicit res
  • Inconsistent indentation (10-space vs 8-space in if blocks)
  • Lines likely exceed 110-char max

8. Non-standard SQL style

  • Uses == instead of = for comparisons (SQLite accepts it, but inconsistent with existing codebase)
  • Double space in and s.name
  • String literals instead of heredocs for multi-line SQL (inconsistent with project convention)

Strengths

  • Correctly identifies a real bug: state abbreviations are not globally unique across countries
  • The approach of disambiguating by country via JOIN is sound
  • The 3-step fallback (abbr+name+country → abbr+country → name+country) handles real-world data quality issues gracefully
  • Addresses the root cause reported in issue state abbreviations are not globally unique #18

Recommended Action

Request changes. This PR should not be merged in its current state. Priority order:

  1. Fix the unescaped SQL interpolation (escape all values consistently)
  2. Add nil guard for country_lookup_table access
  3. Update state_name unique index to include country_id
  4. Update all existing tests for the new 3-argument signature and add :country to row hashes
  5. Add test coverage for the 3-step fallback chain and nil/empty short_state fallback
  6. Fix RuboCop violations (spaces, idiomatic Ruby, indentation)
  7. Normalize SQL style (= not ==, heredocs, consistent escaping)

@midwire
Copy link
Owner

midwire commented Feb 17, 2026

Closing in favor of #29

@midwire midwire closed this Feb 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants