How to Compare Two Tables and Output Mismatched Records Side by Side in DolphinDB SQL?

How to Compare Two Tables and Output Mismatched Records Side by Side in DolphinDB SQL?

Sharing a common DolphinDB use case and solution for data processing.

I need to compare data between two tables (e.g., Table A and Table A1) to check if their field values match.

The solution should:

  • Allow specifying which fields to compare (or compare all fields by default).
  • Identify mismatched records and output the differences, including the field names and their differing values.
  • Consolidate discrepancies into a single result table, showing both the original records (from Table A and Table A1) side by side for easy comparison.

For example: If ID=1 in Table A has Name="John" but Table A1 has Name="Jon", the output should display both records together.

How can I implement this to identify the differences between the two tables?

Answer

The following custom function can be used to compare tables tb1 and tb2, outputting records with differing values:

/**
 * table1: First table
 * table2: Second table
 * compareCols: Column names to compare
 * Returns records where specified column values differ between the tables
 */
def findDiff(table1, table2, compareCols) {
    t1 = table(table1[compareCols]).rename!("t1_" + compareCols)
    t2 = table(table2[compareCols]).rename!("t2_" + compareCols)
    return join(t1, t2).at(rowAnd(eqObj(t1.values(), t2.values())) == false)
}

Usage Example

First create tb1 with 100,000 rows:

n = 100000
id = rand(`APPL`MSFT`GOOG`SAP`UBER`TSLA, n)
col1 = rand(50, n)
col2 = rand(10.0, n)
col3 = -rand(50, n)
col4 = -rand(10.0, n)
// Create tb1
tb1 = table(id, col1, col2, col3, col4)

Then generate tb2 by randomly setting 5 values in each column to 0 or NULL:

id[rand(n, 5)-1] = string(NULL)
col1[rand(n, 5)-1] = 0
col2[rand(n, 5)-1] = 0
col3[rand(n, 5)-1] = 0
col4[rand(n, 5)-1] = 0
tb2 = table(id, col1, col2, col3, col4)

Compare tables with specifying fields:

findDiff(tb1, tb2, `id)

// output:
    t1_id   t2_id
0   UBER    
1   UBER    
2   UBER    
3   UBER    
4   TSLA    

Compare tables with all fields:

// Method1: Explicit column listing
findDiff(tb1, tb2, [`id, `col1, `col2, `col3, `col4])

// Method2: Using columnNames() function
findDiff(tb1, tb2, tb1.columnNames())

// output:
    t1_id   t1_col1 t1_col2 t1_col3 t1_col4 t2_id   t2_col1 t2_col2 t2_col3 t2_col4
0   MSFT    12  1.1322767165029561  -11 -6.598268286304129  MSFT    12  1.1322767165029561  -11 0
1   TSLA    34  0.48924599456841555 -49 -2.662222705938644  TSLA    0   0.48924599456841555 -49 -2.662222705938644
2   MSFT    17  1.6391161118263726  -14 -7.77336316754374   MSFT    17  0   -14 -7.77336316754374
3   UBER    35  3.0947287484132158  -34 -8.46694418469189       35  3.0947287484132158  -34 -8.46694418469189
4   GOOG    11  7.340453306747864   -27 -0.7505336981170869 GOOG    0   7.340453306747864   -27 -0.7505336981170869
...

Enjoyed this question?

Check out more content on our blog or follow us on social media.

Browse more questions