Issue with reading a csv file with all columns as string using polars

python
Ethan Jackson

I have the Python code below, which uses polars. I do not want polars to automatically parse values as dates or integers unless I explicitly say so. Passing schema_overrides does not prevent the automatic conversion either.

import polars as pl

# Read the CSV file with all columns as strings using schema_overrides
file_path = "./xyz.csv"
df = pl.read_csv(file_path, schema_overrides={'*': pl.Utf8})

# Display the DataFrame
print(df)

I get the following error:

polars.exceptions.ComputeError: could not parse p35038 as dtype i64 at column 'Employee ID' (column number 3)

Answer

This is what infer_schema=False is for.

When False, the schema is not inferred and will be pl.String if not specified in schema or schema_overrides.

pl.read_csv(b"""a,b,c
1,2,3""")
# shape: (1, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 2   ┆ 3   │
# └─────┴─────┴─────┘

pl.read_csv(b"""a,b,c
1,2,3""", infer_schema=False)
# shape: (1, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ str │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 2   ┆ 3   │
# └─────┴─────┴─────┘

"*" in your example is taken literally, it is not treated as a "Wildcard".
