diff --git a/packages/deparser/QUOTING-RULES.md b/packages/deparser/QUOTING-RULES.md new file mode 100644 index 00000000..ec8208f3 --- /dev/null +++ b/packages/deparser/QUOTING-RULES.md @@ -0,0 +1,376 @@ +# PostgreSQL Identifier Quoting Rules + +This document describes the identifier quoting strategy used by the pgsql-deparser package. It explains the underlying PostgreSQL rules, our implementation choices, and guidance for contributors. + +## Goals and Non-Goals + +### Goals + +The deparser aims to emit valid PostgreSQL SQL that can be re-parsed by PostgreSQL (and by libpg-query). Additionally, we aim to emit minimally-quoted SQL when safe, especially for qualified names, in order to reduce noisy diffs. For example, we prefer `faker.float` over `faker."float"` when the grammar allows it. + +### Non-Goals + +We do not preserve the user's original quoting style. If the input SQL used `"mycolumn"` but `mycolumn` is safe to emit unquoted, we may emit it unquoted. + +We do not preserve identifier case unless quoting is required. Unquoted identifiers are folded to lowercase by PostgreSQL, so `MyFunc` and `myfunc` are equivalent when unquoted. If case preservation is needed, the identifier must be quoted. + +We do not implement PostgreSQL's `quote_all_identifiers` mode (used by pg_dump for deterministic dumps). This could be added as a deparser option in the future if needed. + +## PostgreSQL Background: What Quoting Means + +### Identifier Folding + +PostgreSQL folds unquoted identifiers to lowercase. This means: + +- `SELECT * FROM MyTable` is equivalent to `SELECT * FROM mytable` +- `SELECT * FROM "MyTable"` preserves the mixed case and refers to a table literally named `MyTable` + +If you need to preserve uppercase letters, spaces, or special characters in an identifier, you must quote it. + +### When Quotes Are Required + +An identifier must be quoted if any of the following are true: + +1. **First character is not a lowercase letter or underscore**: Identifiers starting with uppercase letters, digits, or special characters must be quoted. + +2. **Contains characters outside the safe set**: Only lowercase letters (`a-z`), digits (`0-9`), and underscores (`_`) are considered safe. Note that `$` is explicitly excluded even though PostgreSQL allows it in some contexts; our implementation is conservative. + +3. **Is a SQL keyword**: Keywords must be quoted to be used as identifiers, with one exception: `UNRESERVED_KEYWORD` tokens can be used as identifiers without quotes in most contexts. + +4. **Contains embedded double quotes**: These must be escaped by doubling them (`"` becomes `""`). + +### Keyword Categories in PostgreSQL + +PostgreSQL classifies keywords into four categories (defined in `kwlist.h` / our `kwlist.ts`): + +| Category | Description | Quoting Required? | +|----------|-------------|-------------------| +| `UNRESERVED_KEYWORD` | Can be used as identifiers in most contexts | No | +| `COL_NAME_KEYWORD` | Reserved in some contexts, allowed as column names | Yes (in strict mode) | +| `TYPE_FUNC_NAME_KEYWORD` | Reserved in some contexts, allowed as type/function names | Yes (in strict mode) | +| `RESERVED_KEYWORD` | Fully reserved, cannot be used as identifiers without quotes | Yes | + +Examples: +- `abort`, `absolute`, `access` are `UNRESERVED_KEYWORD` - no quoting needed +- `float`, `interval`, `boolean` are `COL_NAME_KEYWORD` - quoting depends on context +- `left`, `right`, `join` are `TYPE_FUNC_NAME_KEYWORD` - quoting depends on context +- `select`, `from`, `where`, `table` are `RESERVED_KEYWORD` - always need quoting as identifiers + +## The Strict Quoting Algorithm + +The `QuoteUtils.quoteIdentifier()` function implements PostgreSQL's `quote_identifier()` algorithm from `ruleutils.c`. This is the "strict" or "canonical" quoting policy. + +### Algorithm + +``` +function quoteIdentifier(ident): + if ident is empty: + return ident + + safe = true + + // Rule 1: First character must be lowercase letter or underscore + if first_char not in [a-z_]: + safe = false + + // Rule 2: All characters must be in safe set + for each char in ident: + if char not in [a-z0-9_]: + safe = false + + // Rule 3: Must not be a keyword (except UNRESERVED_KEYWORD) + if safe: + kwKind = keywordKindOf(ident) + if kwKind != NO_KEYWORD and kwKind != UNRESERVED_KEYWORD: + safe = false + + if safe: + return ident // No quoting needed + + // Build quoted identifier with escaped embedded quotes + result = '"' + for each char in ident: + if char == '"': + result += '"' // Escape " as "" + result += char + result += '"' + + return result +``` + +### Examples + +| Input | Output | Reason | +|-------|--------|--------| +| `mytable` | `mytable` | All lowercase, not a keyword | +| `my_table_2` | `my_table_2` | Safe characters only | +| `MyTable` | `"MyTable"` | Contains uppercase | +| `my-table` | `"my-table"` | Contains hyphen | +| `my table` | `"my table"` | Contains space | +| `2fast` | `"2fast"` | Starts with digit | +| `float` | `"float"` | COL_NAME_KEYWORD | +| `select` | `"select"` | RESERVED_KEYWORD | +| `abort` | `abort` | UNRESERVED_KEYWORD | +| `say"hello` | `"say""hello"` | Contains embedded quote | + +## Why a Second Policy Exists: Grammar-Slot Sensitivity + +PostgreSQL's `quote_identifier()` function is intentionally conservative and context-free. It doesn't know where the identifier will be used, so it quotes anything that might cause problems in any context. + +However, PostgreSQL's grammar is context-sensitive. Different syntactic positions accept different sets of tokens. The key insight is: + +**Identifiers that appear after a dot (in qualified names) are in highly permissive grammar positions that accept all keyword categories, including `RESERVED_KEYWORD`.** + +This means that while `float()` as a standalone function call would fail to parse (because `float` is a `COL_NAME_KEYWORD`), `faker.float()` parses successfully because the `float` appears after a dot. + +### Empirical Verification + +We verified this behavior by testing with libpg-query: + +| SQL | Parses? | Reason | +|-----|---------|--------| +| `SELECT float()` | No | `float` is COL_NAME_KEYWORD, conflicts with type name | +| `SELECT faker.float()` | Yes | After dot, all keywords accepted | +| `SELECT select FROM t` | No | `select` is RESERVED_KEYWORD | +| `SELECT t.select FROM t` | Yes | After dot, all keywords accepted | +| `SELECT * FROM interval` | No | `interval` is COL_NAME_KEYWORD | +| `SELECT * FROM myschema.interval` | Yes | After dot, all keywords accepted | + +### The Deparser's Opportunity + +Since we know that identifiers after a dot are in permissive positions, we can emit them without keyword-based quoting. This produces cleaner output: + +- Instead of `faker."float"()` we emit `faker.float()` +- Instead of `pg_catalog."substring"()` we emit `pg_catalog.substring()` +- Instead of `t."select"` we emit `t.select` + +This is an intentional deviation from PostgreSQL's `quote_identifier()` behavior, designed to produce minimal quoting while still emitting valid SQL. + +## The Relaxed Quoting Algorithm (After-Dot) + +The `QuoteUtils.quoteIdentifierAfterDot()` function implements "lexical-only" quoting for identifiers in permissive after-dot positions. + +### Algorithm + +``` +function quoteIdentifierAfterDot(ident): + if ident is empty: + return ident + + safe = true + + // Rule 1: First character must be lowercase letter or underscore + if first_char not in [a-z_]: + safe = false + + // Rule 2: All characters must be in safe set + for each char in ident: + if char not in [a-z0-9_]: + safe = false + + // NOTE: No keyword check! Keywords are allowed after dots. + + if safe: + return ident // No quoting needed + + // Build quoted identifier with escaped embedded quotes + result = '"' + for each char in ident: + if char == '"': + result += '"' + result += char + result += '"' + + return result +``` + +### Key Difference from Strict Quoting + +The only difference is that `quoteIdentifierAfterDot()` does not check for keywords. It still quotes for: +- Uppercase letters (case preservation) +- Special characters (hyphens, spaces, etc.) +- Leading digits +- Embedded quotes + +### Examples + +| Input | quoteIdentifier() | quoteIdentifierAfterDot() | +|-------|-------------------|---------------------------| +| `mytable` | `mytable` | `mytable` | +| `MyTable` | `"MyTable"` | `"MyTable"` | +| `float` | `"float"` | `float` | +| `select` | `"select"` | `select` | +| `interval` | `"interval"` | `interval` | +| `my-col` | `"my-col"` | `"my-col"` | + +## Composition Helpers + +### quoteDottedName(parts: string[]) + +This helper applies the appropriate quoting policy to each part of a dotted name: + +- **First part**: Uses strict quoting (`quoteIdentifier()`) because the leading identifier often appears in less-permissive grammar slots +- **Subsequent parts**: Uses relaxed quoting (`quoteIdentifierAfterDot()`) because they appear after dots in permissive slots + +```typescript +static quoteDottedName(parts: string[]): string { + if (!parts || parts.length === 0) return ''; + if (parts.length === 1) { + return QuoteUtils.quoteIdentifier(parts[0]); + } + return parts.map((part, index) => + index === 0 + ? QuoteUtils.quoteIdentifier(part) + : QuoteUtils.quoteIdentifierAfterDot(part) + ).join('.'); +} +``` + +### Examples + +| Input Parts | Output | +|-------------|--------| +| `['mytable']` | `mytable` | +| `['myschema', 'mytable']` | `myschema.mytable` | +| `['faker', 'float']` | `faker.float` | +| `['pg_catalog', 'substring']` | `pg_catalog.substring` | +| `['select', 'from']` | `"select".from` | +| `['MySchema', 'MyTable']` | `"MySchema"."MyTable"` | + +### quoteQualifiedIdentifier(qualifier, ident) + +A convenience wrapper for two-part qualified names: + +```typescript +static quoteQualifiedIdentifier( + qualifier: string | null | undefined, + ident: string +): string { + if (qualifier) { + return `${QuoteUtils.quoteIdentifier(qualifier)}.${QuoteUtils.quoteIdentifierAfterDot(ident)}`; + } + return QuoteUtils.quoteIdentifier(ident); +} +``` + +## Deparser Integration Rules + +### The Problem with the String Visitor + +The deparser's `String` visitor processes individual string nodes from the AST. It cannot reliably determine whether a string is: +- A standalone identifier (needs strict quoting) +- Part of a qualified name (first part needs strict, rest needs relaxed) +- A string literal (needs single-quote escaping, not identifier quoting) + +Therefore, **dotted-name quoting must be done at the call sites that have the list of parts**, not in the String visitor. + +### Anti-Pattern: Don't Do This + +```typescript +// WRONG: This applies per-part quoting without slot context +const name = funcname.map(n => this.visit(n, context)).join('.'); +``` + +This pattern sends each part through the String visitor, which uses strict quoting for all parts. The result is over-quoted output like `faker."float"`. + +### Correct Pattern: Do This Instead + +```typescript +// CORRECT: Extract raw string parts and use quoteDottedName +const funcnameParts = funcname.map((n: any) => + n.String?.sval || n.String?.str || '' +).filter((s: string) => s); +const name = QuoteUtils.quoteDottedName(funcnameParts); +``` + +This extracts the raw string values and applies the correct quoting policy per position. + +### Where to Apply quoteDottedName + +The following AST node handlers should use `quoteDottedName()` for their name components: + +| Handler | Field | Description | +|---------|-------|-------------| +| `FuncCall` | `funcname` | Function name (e.g., `pg_catalog.substring`) | +| `CreateFunctionStmt` | `funcname` | Function being created | +| `ColumnRef` | `fields` | Column reference (e.g., `t.column`) | +| `RangeVar` | `catalogname`, `schemaname`, `relname` | Table reference | +| `TypeName` | `names` | Type name (e.g., `pg_catalog.int4`) | +| `CollateClause` | `collname` | Collation name | + +### The quoteIfNeeded Helper + +The `deparser.ts` file contains a `quoteIfNeeded()` method that routes to `QuoteUtils.quoteIdentifier()`. This is the strict quoting policy and should be used for standalone identifiers, not for dotted-name tails. + +## String Comparison Pitfall + +When the deparser needs to check for specific function names (e.g., to apply special SQL syntax for `pg_catalog.substring`), comparisons should be done carefully. + +### Potential Issue + +If you compare against the quoted output: + +```typescript +const name = QuoteUtils.quoteDottedName(funcnameParts); +if (name === 'pg_catalog.substring') { // This works because neither part needs quoting + // Special handling +} +``` + +This works for `pg_catalog.substring` because neither `pg_catalog` nor `substring` requires quoting. But it would fail for a hypothetical `pg_catalog.Select` because the output would be `pg_catalog."Select"`. + +### Recommended Approach + +For robustness, compare against the raw parts before quoting: + +```typescript +const funcnameParts = funcname.map((n: any) => n.String?.sval || ''); +if (funcnameParts.length === 2 && + funcnameParts[0] === 'pg_catalog' && + funcnameParts[1] === 'substring') { + // Special handling +} +``` + +Or create a helper that normalizes for comparison: + +```typescript +const rawName = funcnameParts.join('.'); // Unquoted for comparison +if (rawName === 'pg_catalog.substring') { + // Special handling +} +``` + +## Keyword Table Accuracy + +The correctness of the strict quoting algorithm depends on the accuracy of the keyword table in `kwlist.ts`. This table must match PostgreSQL's keyword classifications for the target PostgreSQL version. + +If the keyword table diverges from upstream PostgreSQL: +- Keywords missing from the table may be emitted unquoted when they should be quoted +- Keywords with incorrect categories may be quoted unnecessarily or insufficiently + +When updating to support new PostgreSQL versions, ensure `kwlist.ts` is synchronized with PostgreSQL's `kwlist.h`. + +## Summary: Which Function to Use + +| Scenario | Function | Example | +|----------|----------|---------| +| Standalone identifier | `quoteIdentifier()` | Column name in SELECT list | +| Dotted name (multi-part) | `quoteDottedName()` | `schema.table`, `schema.function` | +| Two-part qualified name | `quoteQualifiedIdentifier()` | `schema.table` | +| After-dot component only | `quoteIdentifierAfterDot()` | Indirection field access | +| String literal | `escape()` or `formatEString()` | String values in SQL | + +## Test Fixtures + +The quoting behavior is verified by test fixtures in `__fixtures__/kitchen-sink/pretty/`: + +- `quoting-1.sql` through `quoting-7.sql`: Test cases for `faker.float`, `faker.interval`, `faker.boolean`, and `pg_catalog.substring` + +The corresponding snapshots in `__tests__/pretty/__snapshots__/quoting-pretty.test.ts.snap` demonstrate the expected output with minimal quoting. + +## References + +- PostgreSQL `quote_identifier()`: [ruleutils.c](https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/ruleutils.c) +- PostgreSQL keyword list: [kwlist.h](https://github.com/postgres/postgres/blob/master/src/include/parser/kwlist.h) +- PostgreSQL documentation on identifiers: [SQL Syntax - Lexical Structure](https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS) diff --git a/packages/deparser/__tests__/pretty/__snapshots__/quoting-pretty.test.ts.snap b/packages/deparser/__tests__/pretty/__snapshots__/quoting-pretty.test.ts.snap index 2a7f41dc..5eee750a 100644 --- a/packages/deparser/__tests__/pretty/__snapshots__/quoting-pretty.test.ts.snap +++ b/packages/deparser/__tests__/pretty/__snapshots__/quoting-pretty.test.ts.snap @@ -1,7 +1,7 @@ // Jest Snapshot v1, https://jestjs.io/docs/snapshot-testing exports[`non-pretty: pretty/quoting-1.sql 1`] = ` -"CREATE FUNCTION faker."float"(min double precision DEFAULT 0, max double precision DEFAULT 100) RETURNS double precision AS $$ +"CREATE FUNCTION faker.float(min double precision DEFAULT 0, max double precision DEFAULT 100) RETURNS double precision AS $$ BEGIN RETURN min + random() * (max - min); END; @@ -9,7 +9,7 @@ $$ LANGUAGE plpgsql" `; exports[`non-pretty: pretty/quoting-2.sql 1`] = ` -"CREATE FUNCTION faker."float"(min double precision DEFAULT 0, max double precision DEFAULT 100) RETURNS double precision AS $$ +"CREATE FUNCTION faker.float(min double precision DEFAULT 0, max double precision DEFAULT 100) RETURNS double precision AS $$ BEGIN RETURN min + random() * (max - min); END; @@ -17,7 +17,7 @@ $$ LANGUAGE plpgsql" `; exports[`non-pretty: pretty/quoting-3.sql 1`] = ` -"CREATE FUNCTION faker."interval"(min int, max int) RETURNS interval AS $$ +"CREATE FUNCTION faker.interval(min int, max int) RETURNS interval AS $$ BEGIN RETURN make_interval(secs => (min + floor(random() * (max - min + 1)))::int); END; @@ -25,7 +25,7 @@ $$ LANGUAGE plpgsql" `; exports[`non-pretty: pretty/quoting-4.sql 1`] = ` -"CREATE FUNCTION faker."interval"(min int, max int) RETURNS interval AS $$ +"CREATE FUNCTION faker.interval(min int, max int) RETURNS interval AS $$ BEGIN RETURN make_interval(secs => (min + floor(random() * (max - min + 1)))::int); END; @@ -33,7 +33,7 @@ $$ LANGUAGE plpgsql" `; exports[`non-pretty: pretty/quoting-5.sql 1`] = ` -"CREATE FUNCTION faker."boolean"() RETURNS boolean AS $$ +"CREATE FUNCTION faker.boolean() RETURNS boolean AS $$ BEGIN RETURN random() < 0.5; END; @@ -41,17 +41,17 @@ $$ LANGUAGE plpgsql" `; exports[`non-pretty: pretty/quoting-6.sql 1`] = ` -"CREATE FUNCTION faker."boolean"() RETURNS boolean AS $$ +"CREATE FUNCTION faker.boolean() RETURNS boolean AS $$ BEGIN RETURN random() < 0.5; END; $$ LANGUAGE plpgsql" `; -exports[`non-pretty: pretty/quoting-7.sql 1`] = `"CREATE DOMAIN origin AS text CHECK (value = pg_catalog."substring"(value, '^(https?://[^/]*)'))"`; +exports[`non-pretty: pretty/quoting-7.sql 1`] = `"CREATE DOMAIN origin AS text CHECK (value = pg_catalog.substring(value, '^(https?://[^/]*)'))"`; exports[`pretty: pretty/quoting-1.sql 1`] = ` -"CREATE FUNCTION faker."float"(min double precision DEFAULT 0, max double precision DEFAULT 100) RETURNS double precision AS $$ +"CREATE FUNCTION faker.float(min double precision DEFAULT 0, max double precision DEFAULT 100) RETURNS double precision AS $$ BEGIN RETURN min + random() * (max - min); END; @@ -59,7 +59,7 @@ $$ LANGUAGE plpgsql" `; exports[`pretty: pretty/quoting-2.sql 1`] = ` -"CREATE FUNCTION faker."float"(min double precision DEFAULT 0, max double precision DEFAULT 100) RETURNS double precision AS $$ +"CREATE FUNCTION faker.float(min double precision DEFAULT 0, max double precision DEFAULT 100) RETURNS double precision AS $$ BEGIN RETURN min + random() * (max - min); END; @@ -67,7 +67,7 @@ $$ LANGUAGE plpgsql" `; exports[`pretty: pretty/quoting-3.sql 1`] = ` -"CREATE FUNCTION faker."interval"(min int, max int) RETURNS interval AS $$ +"CREATE FUNCTION faker.interval(min int, max int) RETURNS interval AS $$ BEGIN RETURN make_interval(secs => (min + floor(random() * (max - min + 1)))::int); END; @@ -75,7 +75,7 @@ $$ LANGUAGE plpgsql" `; exports[`pretty: pretty/quoting-4.sql 1`] = ` -"CREATE FUNCTION faker."interval"(min int, max int) RETURNS interval AS $$ +"CREATE FUNCTION faker.interval(min int, max int) RETURNS interval AS $$ BEGIN RETURN make_interval(secs => (min + floor(random() * (max - min + 1)))::int); END; @@ -83,7 +83,7 @@ $$ LANGUAGE plpgsql" `; exports[`pretty: pretty/quoting-5.sql 1`] = ` -"CREATE FUNCTION faker."boolean"() RETURNS boolean AS $$ +"CREATE FUNCTION faker.boolean() RETURNS boolean AS $$ BEGIN RETURN random() < 0.5; END; @@ -91,7 +91,7 @@ $$ LANGUAGE plpgsql" `; exports[`pretty: pretty/quoting-6.sql 1`] = ` -"CREATE FUNCTION faker."boolean"() RETURNS boolean AS $$ +"CREATE FUNCTION faker.boolean() RETURNS boolean AS $$ BEGIN RETURN random() < 0.5; END; @@ -100,5 +100,5 @@ $$ LANGUAGE plpgsql" exports[`pretty: pretty/quoting-7.sql 1`] = ` "CREATE DOMAIN origin AS text - CHECK (value = pg_catalog."substring"(value, '^(https?://[^/]*)'))" + CHECK (value = pg_catalog.substring(value, '^(https?://[^/]*)'))" `; diff --git a/packages/deparser/src/deparser.ts b/packages/deparser/src/deparser.ts index 19ab64bb..9cdf9db3 100644 --- a/packages/deparser/src/deparser.ts +++ b/packages/deparser/src/deparser.ts @@ -1317,7 +1317,7 @@ export class Deparser implements DeparserVisitor { if (node.indirection && node.indirection.length > 0) { const indirectionStrs = ListUtils.unwrapList(node.indirection).map(item => { if (item.String) { - return `.${QuoteUtils.quoteIdentifier(item.String.sval || item.String.str)}`; + return `.${QuoteUtils.quoteIdentifierAfterDot(item.String.sval || item.String.str)}`; } return this.visit(item, context); }); @@ -1335,7 +1335,7 @@ export class Deparser implements DeparserVisitor { if (node.indirection && node.indirection.length > 0) { const indirectionStrs = ListUtils.unwrapList(node.indirection).map(item => { if (item.String) { - return `.${QuoteUtils.quoteIdentifier(item.String.sval || item.String.str)}`; + return `.${QuoteUtils.quoteIdentifierAfterDot(item.String.sval || item.String.str)}`; } return this.visit(item, context); }); @@ -1421,7 +1421,8 @@ export class Deparser implements DeparserVisitor { FuncCall(node: t.FuncCall, context: DeparserContext): string { const funcname = ListUtils.unwrapList(node.funcname); const args = ListUtils.unwrapList(node.args); - const name = funcname.map(n => this.visit(n, context)).join('.'); + const funcnameParts = funcname.map((n: any) => n.String?.sval || n.String?.str || '').filter((s: string) => s); + const name = QuoteUtils.quoteDottedName(funcnameParts); // Handle special SQL syntax functions like XMLEXISTS and EXTRACT if (node.funcformat === 'COERCE_SQL_SYNTAX' && name === 'pg_catalog.xmlexists' && args.length >= 2) { @@ -2018,9 +2019,9 @@ export class Deparser implements DeparserVisitor { if (node.catalogname) { tableName = QuoteUtils.quoteIdentifier(node.catalogname); if (node.schemaname) { - tableName += '.' + QuoteUtils.quoteIdentifier(node.schemaname); + tableName += '.' + QuoteUtils.quoteIdentifierAfterDot(node.schemaname); } - tableName += '.' + QuoteUtils.quoteIdentifier(node.relname); + tableName += '.' + QuoteUtils.quoteIdentifierAfterDot(node.relname); } else if (node.schemaname) { tableName = QuoteUtils.quoteQualifiedIdentifier(node.schemaname, node.relname); } else { @@ -5490,7 +5491,8 @@ export class Deparser implements DeparserVisitor { } if (node.funcname && node.funcname.length > 0) { - const funcName = node.funcname.map((name: any) => this.visit(name, context)).join('.'); + const funcnameParts = node.funcname.map((name: any) => name.String?.sval || name.String?.str || '').filter((s: string) => s); + const funcName = QuoteUtils.quoteDottedName(funcnameParts); if (node.parameters && node.parameters.length > 0) { const params = node.parameters diff --git a/packages/deparser/src/utils/quote-utils.ts b/packages/deparser/src/utils/quote-utils.ts index a2668690..853572a9 100644 --- a/packages/deparser/src/utils/quote-utils.ts +++ b/packages/deparser/src/utils/quote-utils.ts @@ -52,7 +52,6 @@ export class QuoteUtils { static quoteIdentifier(ident: string): string { if (!ident) return ident; - let nquotes = 0; let safe = true; // Check first character: must be lowercase letter or underscore @@ -70,9 +69,6 @@ export class QuoteUtils { // okay } else { safe = false; - if (ch === '"') { - nquotes++; - } } } @@ -104,18 +100,88 @@ export class QuoteUtils { return result; } + /** + * Quote an identifier that appears after a dot in a qualified name. + * + * In PostgreSQL's grammar, identifiers that appear after a dot (e.g., schema.name, + * table.column) are in a more permissive position that accepts all keyword categories + * including RESERVED_KEYWORD. This means we only need to quote for lexical reasons + * (uppercase, special characters, leading digits) not for keyword reasons. + * + * Empirically verified: `myschema.select`, `myschema.float`, `t.from` all parse + * successfully in PostgreSQL without quotes. + */ + static quoteIdentifierAfterDot(ident: string): string { + if (!ident) return ident; + + let safe = true; + + const firstChar = ident[0]; + if (!((firstChar >= 'a' && firstChar <= 'z') || firstChar === '_')) { + safe = false; + } + + for (let i = 0; i < ident.length; i++) { + const ch = ident[i]; + if ((ch >= 'a' && ch <= 'z') || + (ch >= '0' && ch <= '9') || + (ch === '_')) { + // okay + } else { + safe = false; + } + } + + if (safe) { + return ident; + } + + let result = '"'; + for (let i = 0; i < ident.length; i++) { + const ch = ident[i]; + if (ch === '"') { + result += '"'; + } + result += ch; + } + result += '"'; + + return result; + } + + /** + * Quote a dotted name (e.g., schema.table, catalog.schema.table). + * + * The first part uses strict quoting (keywords are quoted), while subsequent + * parts use relaxed quoting (keywords allowed, only quote for lexical reasons). + * + * This reflects PostgreSQL's grammar where the first identifier in a statement + * may conflict with keywords, but identifiers after a dot are in a more + * permissive position. + */ + static quoteDottedName(parts: string[]): string { + if (!parts || parts.length === 0) return ''; + if (parts.length === 1) { + return QuoteUtils.quoteIdentifier(parts[0]); + } + return parts.map((part, index) => + index === 0 ? QuoteUtils.quoteIdentifier(part) : QuoteUtils.quoteIdentifierAfterDot(part) + ).join('.'); + } + /** * Quote a possibly-qualified identifier * - * This is a TypeScript port of PostgreSQL's quote_qualified_identifier() function from ruleutils.c - * https://github.com/postgres/postgres/blob/fab5cd3dd1323f9e66efeb676c4bb212ff340204/src/backend/utils/adt/ruleutils.c#L13139-L13156 + * This is inspired by PostgreSQL's quote_qualified_identifier() function from ruleutils.c + * but uses relaxed quoting for the tail component since PostgreSQL's grammar accepts + * all keywords in qualified name positions. * * Return a name of the form qualifier.ident, or just ident if qualifier * is null/undefined, quoting each component if necessary. */ static quoteQualifiedIdentifier(qualifier: string | null | undefined, ident: string): string { if (qualifier) { - return `${QuoteUtils.quoteIdentifier(qualifier)}.${QuoteUtils.quoteIdentifier(ident)}`; + return `${QuoteUtils.quoteIdentifier(qualifier)}.${QuoteUtils.quoteIdentifierAfterDot(ident)}`; } return QuoteUtils.quoteIdentifier(ident); }