How to Compare Two Tables and Output Mismatched Records Side by Side in DolphinDB SQL?

Sharing a common DolphinDB use case and solution for data processing.
I need to compare data between two tables (e.g., Table A and Table A1) to check if their field values match.
The solution should:
- Allow specifying which fields to compare (or compare all fields by default).
- Identify mismatched records and output the differences, including the field names and their differing values.
- Consolidate discrepancies into a single result table, showing both the original records (from Table A and Table A1) side by side for easy comparison.
For example: If ID=1 in Table A has Name="John" but Table A1 has Name="Jon", the output should display both records together.
How can I implement this to identify the differences between the two tables?
Answer
The following custom function can be used to compare tables tb1 and tb2, outputting records with differing values:
/**
* table1: First table
* table2: Second table
* compareCols: Column names to compare
* Returns records where specified column values differ between the tables
*/
def findDiff(table1, table2, compareCols) {
t1 = table(table1[compareCols]).rename!("t1_" + compareCols)
t2 = table(table2[compareCols]).rename!("t2_" + compareCols)
return join(t1, t2).at(rowAnd(eqObj(t1.values(), t2.values())) == false)
}
Usage Example
First create tb1 with 100,000 rows:
n = 100000
id = rand(`APPL`MSFT`GOOG`SAP`UBER`TSLA, n)
col1 = rand(50, n)
col2 = rand(10.0, n)
col3 = -rand(50, n)
col4 = -rand(10.0, n)
// Create tb1
tb1 = table(id, col1, col2, col3, col4)
Then generate tb2 by randomly setting 5 values in each column to 0 or NULL:
id[rand(n, 5)-1] = string(NULL)
col1[rand(n, 5)-1] = 0
col2[rand(n, 5)-1] = 0
col3[rand(n, 5)-1] = 0
col4[rand(n, 5)-1] = 0
tb2 = table(id, col1, col2, col3, col4)
Compare tables with specifying fields:
findDiff(tb1, tb2, `id)
// output:
t1_id t2_id
0 UBER
1 UBER
2 UBER
3 UBER
4 TSLA
Compare tables with all fields:
// Method1: Explicit column listing
findDiff(tb1, tb2, [`id, `col1, `col2, `col3, `col4])
// Method2: Using columnNames() function
findDiff(tb1, tb2, tb1.columnNames())
// output:
t1_id t1_col1 t1_col2 t1_col3 t1_col4 t2_id t2_col1 t2_col2 t2_col3 t2_col4
0 MSFT 12 1.1322767165029561 -11 -6.598268286304129 MSFT 12 1.1322767165029561 -11 0
1 TSLA 34 0.48924599456841555 -49 -2.662222705938644 TSLA 0 0.48924599456841555 -49 -2.662222705938644
2 MSFT 17 1.6391161118263726 -14 -7.77336316754374 MSFT 17 0 -14 -7.77336316754374
3 UBER 35 3.0947287484132158 -34 -8.46694418469189 35 3.0947287484132158 -34 -8.46694418469189
4 GOOG 11 7.340453306747864 -27 -0.7505336981170869 GOOG 0 7.340453306747864 -27 -0.7505336981170869
...
Enjoyed this question?
Check out more content on our blog or follow us on social media.
Browse more questions