Issue with reading a csv file with all columns as string using polars

I have the Python code below using Polars, and I do not want Polars to auto-parse values as dates or integers unless I explicitly ask for it. Passing schema_overrides doesn't prevent the automatic conversion either.


import polars as pl

# Read the CSV file with all columns as strings using schema_overrides
file_path = "./xyz.csv"
df = pl.read_csv(file_path, schema_overrides={'*': pl.Utf8})

# Display the DataFrame
print(df)

I get the following error: polars.exceptions.ComputeError: could not parse p35038 as dtype i64 at column 'Employee ID' (column number 3)

Answer

This is what infer_schema=False is for.

When False, the schema is not inferred and will be pl.String if not specified in schema or schema_overrides.

pl.read_csv(b"""a,b,c
 1,2,3""")

# shape: (1, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 2   ┆ 3   │
# └─────┴─────┴─────┘
pl.read_csv(b"""a,b,c
1,2,3""", infer_schema=False)

# shape: (1, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ str │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 2   ┆ 3   │
# └─────┴─────┴─────┘

"*" in your example is taken literally, it is not treated as a "Wildcard".
