Skip to content

Conversation

@Joannall
Copy link
Collaborator

@Joannall Joannall commented Dec 17, 2025

PR Type

Enhancement


Description

  • Implement intelligent column type inference from Excel data

  • Add column analysis with distinct values and type statistics

  • Generate SQLite indexes for all table columns

  • Refactor cell formatting logic for improved maintainability


Diagram Walkthrough

flowchart LR
  A["Excel Sheet"] -->|"Parse columns"| B["CollectColumnDistinctValues"]
  B -->|"Analyze data types"| C["ColumnAnalysis"]
  C -->|"Infer types"| D["InferColumnTypes"]
  D -->|"Generate DDL"| E["CreateDBTableSqlString"]
  E -->|"With indexes"| F["SQLite Table"]
  C -->|"Generate summary"| G["Analysis Report"]
Loading

File Walkthrough

Relevant files
Enhancement
ColumnAnalysis.cs
New ColumnAnalysis model for type inference                           

src/Plugins/BotSharp.Plugin.ExcelHandler/Models/ColumnAnalysis.cs

  • New model class to store column analysis metadata
  • Tracks distinct values, null counts, and type-specific counts
  • Supports numeric, date, boolean, and integer type detection
+14/-0   
SqliteService.cs
Implement type inference and indexing for SQLite                 

src/Plugins/BotSharp.Plugin.ExcelHandler/Services/SqliteService.cs

  • Add CollectColumnDistinctValues method to analyze column data types
    and collect statistics
  • Implement InferColumnTypes method to determine optimal SQLite column
    types based on data analysis
  • Add GetCellValueAndType helper to detect cell types (NULL, NUMERIC,
    DATE, BOOLEAN, TEXT)
  • Add FormatCellForSql method to format cell values for SQL insertion
    with proper type handling
  • Add IsBooleanString and IsInteger helper methods for type detection
  • Add GenerateColumnAnalysisSummary method to create human-readable
    analysis reports
  • Refactor ParseSheetData to use new FormatCellForSql method, removing
    inline switch statement
  • Enhance CreateDBTableSqlString to generate CREATE INDEX statements for
    all columns
  • Update SqlCreateTableFn to collect and log column analysis before
    table creation
  • Include column analysis summary in table creation success message
+155/-32

@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
SQL injection

Description: SQL statements are constructed via string interpolation using untrusted Excel-derived
identifiers (e.g., sheet.SheetName as tableName and header-derived column names) in
CreateDBTableSqlString without safely quoting/validating the table name and index name,
enabling SQL injection or malformed SQL (e.g., a sheet name like x; DROP TABLE ... would
be embedded into CREATE TABLE IF NOT EXISTS {tableName} and CREATE INDEX ...
idx_{tableName}_{col} ...).
SqliteService.cs [90-291]

Referred Code
    _tableName = sheet.SheetName;
    _headerColumns = ParseSheetColumn(sheet);

    // Collect column distinct values for type inference
    var columnAnalyses = CollectColumnDistinctValues(sheet);
    var inferredTypes = InferColumnTypes(columnAnalyses);

    // generate column summary
    var analysisSummary = GenerateColumnAnalysisSummary(columnAnalyses);
    _logger.LogInformation("Column Analysis:\n{Summary}", analysisSummary);

    string createTableSql = CreateDBTableSqlString(_tableName, _headerColumns, inferredTypes);
    var rowCount = ExecuteSqlQueryForInsertion(createTableSql);

    // Get table schema using sqlite query
    var schema = GenerateTableSchema();
    return (true, $"Table `{_tableName}` has been successfully created in {Provider}. Table schema:\r\n{schema}\r\n\r\nColumn Analysis:\r\n{analysisSummary}");
}
catch (Exception ex)
{
    return (false, ex.Message);


 ... (clipped 181 lines)
Sensitive data exposure

Description: The PR logs and returns a "Column Analysis" summary that includes up to 10 distinct cell
values per column (GenerateColumnAnalysisSummary), which can expose sensitive data from
spreadsheets into application logs and API/UX responses when processing confidential Excel
files.
SqliteService.cs [98-107]

Referred Code
    var analysisSummary = GenerateColumnAnalysisSummary(columnAnalyses);
    _logger.LogInformation("Column Analysis:\n{Summary}", analysisSummary);

    string createTableSql = CreateDBTableSqlString(_tableName, _headerColumns, inferredTypes);
    var rowCount = ExecuteSqlQueryForInsertion(createTableSql);

    // Get table schema using sqlite query
    var schema = GenerateTableSchema();
    return (true, $"Table `{_tableName}` has been successfully created in {Provider}. Table schema:\r\n{schema}\r\n\r\nColumn Analysis:\r\n{analysisSummary}");
}
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Leaks internal errors: The catch block returns ex.Message directly, which can expose internal database/stack
details to callers instead of a generic user-facing error with details only in secure
logs.

Referred Code
catch (Exception ex)
{
    return (false, ex.Message);

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Logs raw data: The new column analysis logging includes sampled distinct cell values, which may contain
sensitive data (PII/PHI/secrets) and is written to logs without redaction or opt-out
controls.

Referred Code
var analysisSummary = GenerateColumnAnalysisSummary(columnAnalyses);
_logger.LogInformation("Column Analysis:\n{Summary}", analysisSummary);

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
SQL injection risk: SQL statements are constructed via string interpolation using Excel-controlled identifiers
(sheet.SheetName as tableName and header names for index names) without strict
allowlisting/escaping, enabling SQL injection or malformed SQL (especially in CREATE TABLE
and CREATE INDEX).

Referred Code
    _tableName = sheet.SheetName;
    _headerColumns = ParseSheetColumn(sheet);

    // Collect column distinct values for type inference
    var columnAnalyses = CollectColumnDistinctValues(sheet);
    var inferredTypes = InferColumnTypes(columnAnalyses);

    // generate column summary
    var analysisSummary = GenerateColumnAnalysisSummary(columnAnalyses);
    _logger.LogInformation("Column Analysis:\n{Summary}", analysisSummary);

    string createTableSql = CreateDBTableSqlString(_tableName, _headerColumns, inferredTypes);
    var rowCount = ExecuteSqlQueryForInsertion(createTableSql);

    // Get table schema using sqlite query
    var schema = GenerateTableSchema();
    return (true, $"Table `{_tableName}` has been successfully created in {Provider}. Table schema:\r\n{schema}\r\n\r\nColumn Analysis:\r\n{analysisSummary}");
}
catch (Exception ex)
{
    return (false, ex.Message);


 ... (clipped 181 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Non-descriptive variables: The new type inference lambda uses short variable names (a, n) that reduce readability and
self-documentation for non-trivial logic.

Referred Code
private List<string> InferColumnTypes(Dictionary<string, ColumnAnalysis> columnAnalyses)
    => _headerColumns.Select(col =>
    {
        var a = columnAnalyses[col];
        var n = a.TotalCount - a.NullCount;
        return n == 0 ? "TEXT"
             : a.DateCount == n ? "DATE"
             : a.BooleanCount == n ? "BOOLEAN"
             : a.NumericCount == n ? (a.IntegerCount == a.NumericCount ? "INTEGER" : "REAL")
             : "TEXT";
    }).ToList();

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Unvalidated identifiers: New DDL generation uses Excel-derived tableName/column names to build SQL without robust
validation/quoting, which can fail on reserved characters and edge-case names and may
surface confusing runtime errors.

Referred Code
private string CreateDBTableSqlString(string tableName, List<string> headerColumns, List<string>? columnTypes = null)
{
    _columnTypes = columnTypes.IsNullOrEmpty() ? headerColumns.Select(x => "TEXT").ToList() : columnTypes;

    var sanitizedColumns = headerColumns.Select(x => x.Replace(" ", "_").Replace("#", "_")).ToList();
    var columnDefs = sanitizedColumns.Select((col, i) => $"`{col}` {_columnTypes[i]}");
    var createTableSql = $"CREATE TABLE IF NOT EXISTS {tableName} ( Id INTEGER PRIMARY KEY AUTOINCREMENT, {string.Join(", ", columnDefs)} );";

    // Create index for each column
    var indexSql = sanitizedColumns.Select(col => $"CREATE INDEX IF NOT EXISTS idx_{tableName}_{col} ON {tableName} (`{col}`);");

    return createTableSql + "\n" + string.Join("\n", indexSql);
}

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Security
Prevent SQL injection in index creation

To prevent SQL injection, quote the tableName in both the CREATE TABLE and
CREATE INDEX SQL statements.

src/Plugins/BotSharp.Plugin.ExcelHandler/Services/SqliteService.cs [279-291]

     private string CreateDBTableSqlString(string tableName, List<string> headerColumns, List<string>? columnTypes = null)
     {
 ...
         var sanitizedColumns = headerColumns.Select(x => x.Replace(" ", "_").Replace("#", "_")).ToList();
         var columnDefs = sanitizedColumns.Select((col, i) => $"`{col}` {_columnTypes[i]}");
-        var createTableSql = $"CREATE TABLE IF NOT EXISTS {tableName} ( Id INTEGER PRIMARY KEY AUTOINCREMENT, {string.Join(", ", columnDefs)} );";
+        var quotedTableName = $"`{tableName.Replace("`", "``")}`";
+        var createTableSql = $"CREATE TABLE IF NOT EXISTS {quotedTableName} ( Id INTEGER PRIMARY KEY AUTOINCREMENT, {string.Join(", ", columnDefs)} );";
 
         // Create index for each column
-        var indexSql = sanitizedColumns.Select(col => $"CREATE INDEX IF NOT EXISTS idx_{tableName}_{col} ON {tableName} (`{col}`);");
+        var indexSql = sanitizedColumns.Select(col => $"CREATE INDEX IF NOT EXISTS `idx_{tableName}_{col}` ON {quotedTableName} (`{col}`);");
 
         return createTableSql + "\n" + string.Join("\n", indexSql);
     }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 10

__

Why: This suggestion correctly identifies a critical SQL injection vulnerability in the CREATE TABLE and CREATE INDEX statements and provides a comprehensive fix that sanitizes the tableName input, significantly improving security.

High
High-level
Avoid indexing every single column

Avoid creating a database index for every column from the Excel sheet, as this
harms write performance and increases storage. Instead, do not create indexes by
default or use a more selective strategy.

Examples:

src/Plugins/BotSharp.Plugin.ExcelHandler/Services/SqliteService.cs [288-290]
        var indexSql = sanitizedColumns.Select(col => $"CREATE INDEX IF NOT EXISTS idx_{tableName}_{col} ON {tableName} (`{col}`);");

        return createTableSql + "\n" + string.Join("\n", indexSql);

Solution Walkthrough:

Before:

function CreateDBTableSqlString(tableName, headerColumns, ...) {
  // ... logic to create table ...
  var createTableSql = $"CREATE TABLE ...";

  // Create an index for EVERY column
  var indexSql = headerColumns.Select(col => 
    $"CREATE INDEX ON {tableName} (`{col}`);"
  );

  return createTableSql + "\n" + string.Join("\n", indexSql);
}

After:

function CreateDBTableSqlString(tableName, headerColumns, ...) {
  // ... logic to create table ...
  var createTableSql = $"CREATE TABLE ...";

  // Do not create indexes by default.
  // Indexing should be an optional, more targeted feature.
  // For example, only index columns specified by the user
  // or those with high cardinality.

  return createTableSql;
}
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a significant performance and design flaw, as indexing every column negatively impacts write performance and storage, which is a poor default behavior.

High
Process the Excel sheet only once

Combine the two separate passes over the Excel data—one for analysis
(CollectColumnDistinctValues) and one for data insertion (ParseSheetData)—into a
single pass to improve performance.

Examples:

src/Plugins/BotSharp.Plugin.ExcelHandler/Services/SqliteService.cs [94]
            var columnAnalyses = CollectColumnDistinctValues(sheet);
src/Plugins/BotSharp.Plugin.ExcelHandler/Services/SqliteService.cs [114-133]
    private string ParseSheetData(ISheet singleSheet)
    {
        var stringBuilder = new StringBuilder();

        for (int rowIdx = 1; rowIdx < _excelRowSize + 1; rowIdx++)
        {
            IRow row = singleSheet.GetRow(rowIdx);
            stringBuilder.Append('(');
            for (int colIdx = 0; colIdx < _excelColumnSize; colIdx++)
            {

 ... (clipped 10 lines)

Solution Walkthrough:

Before:

function SqlCreateTableFn(sheet) {
  // First pass: Analyze columns
  var columnAnalyses = CollectColumnDistinctValues(sheet);
  var inferredTypes = InferColumnTypes(columnAnalyses);
  CreateDBTableSqlString(..., inferredTypes);
  // ... create table ...
}

function SqlInsertDataFn(sheet) {
  // Second pass: Generate INSERT statements
  var dataSql = ParseSheetData(sheet);
  // ... insert data ...
}

After:

function ProcessSheet(sheet) {
  // Single pass over the sheet
  var columnAnalyses = new ...;
  var insertValues = new StringBuilder();

  for each row in sheet {
    // 1. Analyze cell types and update columnAnalyses
    // 2. Format cell values and append to insertValues
  }

  var inferredTypes = InferColumnTypes(columnAnalyses);
  CreateDBTableSqlString(..., inferredTypes);
  // ... create table ...

  // ... use insertValues to insert data ...
}
Suggestion importance[1-10]: 8

__

Why: This is a valid and important performance optimization, as reading the Excel file twice is inefficient for large datasets; combining the analysis and data preparation into a single pass would significantly improve performance.

Medium
Possible issue
Avoid potential null reference exception

In GetCellValueAndType, store the result of cell.StringCellValue?.Trim() in a
variable to avoid a potential NullReferenceException and redundant calls.

src/Plugins/BotSharp.Plugin.ExcelHandler/Services/SqliteService.cs [219-231]

     private (string value, string type) GetCellValueAndType(ICell cell)
-        => cell.CellType switch
+    {
+        switch (cell.CellType)
         {
-            CellType.String => IsBooleanString(cell.StringCellValue?.Trim())
-                ? (cell.StringCellValue.Trim(), "BOOLEAN")
-                : (cell.StringCellValue?.Trim() ?? "", "TEXT"),
-            CellType.Numeric => DateUtil.IsCellDateFormatted(cell)
-                ? (cell.DateCellValue?.ToString("yyyy-MM-dd HH:mm:ss") ?? "", "DATE")
-                : (cell.NumericCellValue.ToString(), "NUMERIC"),
-            CellType.Boolean => (cell.BooleanCellValue.ToString(), "BOOLEAN"),
-            CellType.Blank => ("", "NULL"),
-            _ => (cell.ToString() ?? "", "TEXT")
-        };
+            case CellType.String:
+                var trimmedValue = cell.StringCellValue?.Trim();
+                return IsBooleanString(trimmedValue)
+                    ? (trimmedValue, "BOOLEAN")
+                    : (trimmedValue ?? "", "TEXT");
+            case CellType.Numeric:
+                return DateUtil.IsCellDateFormatted(cell)
+                    ? (cell.DateCellValue?.ToString("yyyy-MM-dd HH:mm:ss") ?? "", "DATE")
+                    : (cell.NumericCellValue.ToString(), "NUMERIC");
+            case CellType.Boolean:
+                return (cell.BooleanCellValue.ToString(), "BOOLEAN");
+            case CellType.Blank:
+                return ("", "NULL");
+            default:
+                return (cell.ToString() ?? "", "TEXT");
+        }
+    }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a potential NullReferenceException if cell.StringCellValue is null and provides a robust fix, preventing a runtime crash.

Medium
General
Format numbers with invariant culture

In GetCellValueAndType, use CultureInfo.InvariantCulture when converting numeric
cell values to strings to avoid issues with locale-specific decimal separators.

src/Plugins/BotSharp.Plugin.ExcelHandler/Services/SqliteService.cs [225-227]

 CellType.Numeric => DateUtil.IsCellDateFormatted(cell)
     ? (cell.DateCellValue?.ToString("yyyy-MM-dd HH:mm:ss") ?? "", "DATE")
-    : (cell.NumericCellValue.ToString(), "NUMERIC"),
+    : (cell.NumericCellValue.ToString(System.Globalization.CultureInfo.InvariantCulture), "NUMERIC"),
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: This is a valuable suggestion that prevents potential SQL syntax errors by ensuring numeric values are formatted consistently, regardless of the system's locale, which improves the robustness of the data import logic.

Medium
  • More

@Oceania2018 Oceania2018 merged commit d3eea1f into SciSharp:master Dec 18, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants