How to Compare Two Tables and Output Mismatched Records Side by Side in DolphinDB SQL?

How to Compare Two Tables and Output Mismatched Records Side by Side in DolphinDB SQL?
typescript
Ethan Jackson

Sharing a common DolphinDB use case and solution for data processing.

I need to compare data between two tables (e.g., Table A and Table A1) to check if their field values match.

The solution should:

  • Allow specifying which fields to compare (or compare all fields by default).
  • Identify mismatched records and output the differences, including the field names and their differing values.
  • Consolidate discrepancies into a single result table, showing both the original records (from Table A and Table A1) side by side for easy comparison.

For example: If ID=1 in Table A has Name="John" but Table A1 has Name="Jon", the output should display both records together.

How can I implement this to identify the differences between the two tables?

Answer

The following custom function can be used to compare tables tb1 and tb2, outputting records with differing values:

/** * table1: First table * table2: Second table * compareCols: Column names to compare * Returns records where specified column values differ between the tables */ def findDiff(table1, table2, compareCols) { t1 = table(table1[compareCols]).rename!("t1_" + compareCols) t2 = table(table2[compareCols]).rename!("t2_" + compareCols) return join(t1, t2).at(rowAnd(eqObj(t1.values(), t2.values())) == false) }

Usage Example

First create tb1 with 100,000 rows:

n = 100000 id = rand(`APPL`MSFT`GOOG`SAP`UBER`TSLA, n) col1 = rand(50, n) col2 = rand(10.0, n) col3 = -rand(50, n) col4 = -rand(10.0, n) // Create tb1 tb1 = table(id, col1, col2, col3, col4)

Then generate tb2 by randomly setting 5 values in each column to 0 or NULL:

id[rand(n, 5)-1] = string(NULL) col1[rand(n, 5)-1] = 0 col2[rand(n, 5)-1] = 0 col3[rand(n, 5)-1] = 0 col4[rand(n, 5)-1] = 0 tb2 = table(id, col1, col2, col3, col4)

Compare tables with specifying fields:

findDiff(tb1, tb2, `id) // output: t1_id t2_id 0 UBER 1 UBER 2 UBER 3 UBER 4 TSLA

Compare tables with all fields:

// Method1: Explicit column listing findDiff(tb1, tb2, [`id, `col1, `col2, `col3, `col4]) // Method2: Using columnNames() function findDiff(tb1, tb2, tb1.columnNames()) // output: t1_id t1_col1 t1_col2 t1_col3 t1_col4 t2_id t2_col1 t2_col2 t2_col3 t2_col4 0 MSFT 12 1.1322767165029561 -11 -6.598268286304129 MSFT 12 1.1322767165029561 -11 0 1 TSLA 34 0.48924599456841555 -49 -2.662222705938644 TSLA 0 0.48924599456841555 -49 -2.662222705938644 2 MSFT 17 1.6391161118263726 -14 -7.77336316754374 MSFT 17 0 -14 -7.77336316754374 3 UBER 35 3.0947287484132158 -34 -8.46694418469189 35 3.0947287484132158 -34 -8.46694418469189 4 GOOG 11 7.340453306747864 -27 -0.7505336981170869 GOOG 0 7.340453306747864 -27 -0.7505336981170869 ...

Related Articles