twineconvert

Why your CSV looks broken in Excel (and how to fix it without re-exporting)

Commas in fields, leading zeros stripped from IDs, dates auto-formatted, encoding garbage. The four most common Excel CSV failures and the import path that fixes all of them.

3 min read

Double-clicking a CSV in Excel is the worst way to open it. Excel makes a series of automatic guesses about how to parse the file, and on a meaningful fraction of real CSVs, those guesses are wrong. The data is fine; Excel's interpretation is broken.

Here are the four most common failures, why they happen, and the one Excel feature that bypasses all of them.

Failure 1: leading zeros stripped from IDs

You have a CSV with an ID column containing values like 0012345. Excel opens it and shows 12345. The leading zeros are gone.

This happens because Excel auto-detects the column as a number, and numbers do not have leading zeros. Zip codes ("01234"), numeric Bates numbers ("0000123"), all-digit SKUs ("0004567"): anything that is digits-only gets mangled this way. (Identifiers containing a letter, like "ABC0000123", survive, because Excel treats them as text.)
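If you end up scripting around Excel, the fix is the same principle: never let anything type-convert the column. Python's csv module, for example, returns every field as a string, so leading zeros survive untouched. A minimal sketch, with an inline sample standing in for a real export:

```python
import csv
import io

# Hypothetical export with all-digit IDs (inline here for illustration)
raw = "id,name\n0012345,Alice\n0099007,Bob\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# csv never guesses types: every field comes back as a string,
# so the leading zeros are intact
print([r["id"] for r in rows])  # ['0012345', '0099007']
```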

Failure 2: dates auto-converted

A column with values like 1-2, 1-3, 1-4 (referring to chapter sections) gets parsed as January 2, January 3, January 4 of the current year. Now your data looks like a list of dates instead of section numbers.

The classic horror story is gene names: the gene SEPT2 (Septin 2) gets converted to "2-Sep" because Excel thinks it is a date. Surveys of published genetics papers found supplementary spreadsheets riddled with exactly this error, and the damage was widespread enough that the genes were eventually renamed (SEPT2 is now SEPTIN2) specifically to dodge Excel.
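If you inherit data that may already have been through Excel, you can at least detect the damage. A rough heuristic: values shaped like "2-Sep" or "14-Mar" in a column that should not contain dates are suspects. A sketch (the pattern is a hypothetical starting point, not a complete check):

```python
import re

# Values Excel has already date-mangled look like "2-Sep", "14-Mar", etc.
mangled = re.compile(r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$")

values = ["SEPT2", "2-Sep", "MARCH1", "1-Mar", "TP53"]
suspect = [v for v in values if mangled.match(v)]
print(suspect)  # ['2-Sep', '1-Mar']
```

Detection is the easy half; the original value ("2-Sep" could have been SEPT2, sept-2, or a genuine date) is usually unrecoverable, which is why preventing the conversion matters so much.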

Failure 3: UTF-8 characters show as garbage

Your CSV has Café in a cell. Excel opens it and shows CafÃ©. The file is fine; Excel just opened it as Windows-1252 instead of UTF-8.

This is a default-encoding mismatch. Excel on Windows defaults to the system codepage (usually Windows-1252 in English locales), and CSVs from the modern world default to UTF-8. The bytes get misinterpreted.
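You can reproduce the failure in two lines, and one practical workaround is to rewrite the file with a UTF-8 byte-order mark ("utf-8-sig" in Python), which Excel uses as an encoding hint even on double-click. A sketch:

```python
# The failure in miniature: UTF-8 bytes decoded as Windows-1252
utf8_bytes = "Café".encode("utf-8")      # b'Caf\xc3\xa9'
garbled = utf8_bytes.decode("windows-1252")
print(garbled)  # CafÃ©

# Workaround: re-save with a UTF-8 BOM so Excel detects the encoding
text = utf8_bytes.decode("utf-8")
bom_bytes = text.encode("utf-8-sig")     # starts with EF BB BF
```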

Failure 4: commas inside quoted fields breaking columns

Your CSV has an address column: "123 Main St, Apt 4". The comma is inside the quoted string, so it should not split the column. Excel mostly handles this correctly, but for older Excel versions or files with non-standard quoting, the address can split into two columns and shift everything after it.
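For reference, this is what a standards-compliant parser does with that line: the quoted comma is data, not a separator. Python's csv module follows the RFC 4180 convention:

```python
import csv
import io

line = '"123 Main St, Apt 4",Springfield\n'
row = next(csv.reader(io.StringIO(line)))

# The comma inside the quotes does not split the field
print(row)  # ['123 Main St, Apt 4', 'Springfield']
```

If Excel splits that address into two columns, the file's quoting (or the Excel version) is the problem, not the comma.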

The one fix that solves all four

Do NOT double-click the file. Instead:

  1. Open Excel with a blank workbook
  2. Go to Data → From Text/CSV (in newer Excel) or Data → Get External Data → From Text (in older versions)
  3. Pick the CSV file
  4. In the preview dialog:
    • Set File Origin to 65001: Unicode (UTF-8) to fix encoding
    • Set Delimiter to Comma
    • Click Edit or Transform Data to open Power Query
  5. In Power Query:
    • Right-click each problem column → Change Type → Text (this prevents the number/date auto-detection)
    • Click Close & Load

The data now imports without Excel applying any of its destructive guesses.

When the import wizard is not enough

A few cases where even the wizard misbehaves:

TSV files saved as .csv

Tab-separated files sometimes get saved with a .csv extension. Excel will guess the delimiter and usually gets it right, but if you see all your data smushed into one column, change Delimiter to Tab in the wizard.
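The swap itself is mechanical: read with a tab delimiter, write with a comma delimiter, and let the writer add quotes only where RFC 4180 requires them. A minimal sketch with inline sample data:

```python
import csv
import io

# A tab-separated file masquerading as .csv (inline for illustration)
tsv = "name\taddress\nAlice\t123 Main St, Apt 4\n"

reader = csv.reader(io.StringIO(tsv), delimiter="\t")
out = io.StringIO()
# QUOTE_MINIMAL quotes only fields that need it (embedded commas, quotes, newlines)
writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerows(reader)

print(out.getvalue())
# name,address
# Alice,"123 Main St, Apt 4"
```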

If you have a TSV that needs to be a CSV, our TSV to CSV converter does the swap with proper quoting per RFC 4180.

CSV with semicolons instead of commas

In European locales (German, French, Spanish), CSV files often use ; as the separator because the comma is used as the decimal point. Excel's auto-detection sometimes guesses wrong on these. Force the delimiter to Semicolon in the wizard.
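If you handle such a file in code, you need both halves of the locale convention: semicolon as separator and comma as decimal point. A sketch with illustrative data:

```python
import csv
import io

# European-locale export: semicolon separator, comma as decimal point
raw = "product;price\nwidget;3,50\n"

rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
# The decimal comma needs its own conversion step
price = float(rows[1][1].replace(",", "."))
print(price)  # 3.5
```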

Truly broken CSVs

If a CSV has inconsistent quoting (some fields quoted, some not, no clear rule), even the wizard will produce ugly results. Often the cleanest fix is to convert the CSV to JSON first (using our CSV to JSON converter), inspect the structured output for issues, then convert back to a clean CSV with consistent quoting.
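The round-trip described above can be sketched in a few lines: parse into records, inspect as JSON, then write back with every field quoted so the output has one consistent rule. Field names here are illustrative:

```python
import csv
import io
import json

# A file with inconsistent quoting (inline for illustration)
messy = 'id,note\n0012345,"has, comma"\n0099007,plain\n'

# CSV → JSON: a list of dicts you can inspect for structural problems
records = list(csv.DictReader(io.StringIO(messy)))
as_json = json.dumps(records, indent=2)

# JSON → CSV with QUOTE_ALL: every field quoted, one consistent rule
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "note"],
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writeheader()
writer.writerows(json.loads(as_json))
print(out.getvalue())
# "id","note"
# "0012345","has, comma"
# "0099007","plain"
```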

What I do

For any CSV that matters (anything with IDs, dates, or non-ASCII characters), I always use Data → From Text/CSV, never double-click. It takes ten seconds longer than double-clicking and prevents most of the silent corruption behind "but the numbers were right when I exported."