Explode nested JSON to Dataframe

Explode nested JSON to Dataframe
typescript
Ethan Jackson

There are loads of answers on this topic, but for the life of me I cannot a solution to my issue.

Say I have a JSON like

json_2_explode = [{'scalar': '43', 'units': 'm', 'parameter': [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}], 'name': 'Foo'}, {'scalar': '54.1', 'units': 's', 'parameter': [{'no_1': '78', 'no_2': '103', 'no_3': '356'}], 'name': 'Yoo'}, {'scalar': '1123.1', 'units': 'Hz', 'parameter': [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}], 'name': 'Baz'}]

documenting some readings for attributes Foo, Yoo and Baz. For each I detail a number, that is, the value itself, some parameters, and the name.

Say this JSON is a column in a dataframe,

df = pd.DataFrame(data = {'col1': [11, 9, 23, 1], 'col2': [7, 3, 1, 12], 'col_json' : [json_2_explode, json_2_explode, json_2_explode, json_2_explode]}, index=[0, 1, 2, 3])
col1 col2 col_json 0 11 7 [{'scalar': '43', 'units': 'MPa', 'parameter':... 1 9 3 [{'scalar': '43', 'units': 'MPa', 'parameter':... 2 23 1 [{'scalar': '43', 'units': 'MPa', 'parameter':... 3 1 12 [{'scalar': '43', 'units': 'MPa', 'parameter':...

The issue I have is that if I try

df = pd.json_normalize(df['col_json'].explode())

I get

scalar units parameter name 0 43 m [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo 1 54.1 s [{'no_1': '78', 'no_2': '103', 'no_3': '356'}] Yoo 2 1123.1 Hz [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}] Baz 3 43 m [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo 4 54.1 s [{'no_1': '78', 'no_2': '103', 'no_3': '356'}] Yoo 5 1123.1 Hz [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}] Baz 6 43 m [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo 7 54.1 s [{'no_1': '78', 'no_2': '103', 'no_3': '356'}] Yoo 8 1123.1 Hz [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}] Baz 9 43 m [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo 10 54.1 s [{'no_1': '78', 'no_2': '103', 'no_3': '356'}] Yoo 11 1123.1 Hz [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}] Baz

So it is exploding each JSON in 3 rows (admitteldy each JSON does contain 3 sub-dicts, so to say).
I actually would like Foo, Yoo and Baz to be documented in the same row, adding columns. Is there maybe solution not involving manually manipulating rows/piercing it as desired? I would love to see one of your fancy one-liners, thanks so much for your help

Answer

json_normalize may work after adding prefixes, I didn't try but this puts all your data on one line.

from flatten_json import flatten prefix_json_2_explode = {} for d in json_2_explode: prefix_json_2_explode.update({d['name'] + '_' + key: value for key, value in d.items()}) print(prefix_json_2_explode) dict_flattened = (flatten(d, '.') for d in [prefix_json_2_explode]) df = pd.DataFrame(dict_flattened) df
Foo_scalar Foo_units Foo_parameter.0.no_1 Foo_parameter.0.no_2 \ 0 43 m 45 1038 Foo_parameter.0.no_3 Foo_name Yoo_scalar Yoo_units Yoo_parameter.0.no_1 \ 0 356 Foo 54.1 s 78 Yoo_parameter.0.no_2 Yoo_parameter.0.no_3 Yoo_name Baz_scalar Baz_units \ 0 103 356 Yoo 1123.1 Hz Baz_parameter.0.no_1 Baz_parameter.0.no_2 Baz_parameter.0.no_3 Baz_name 0 21 43 3577 Baz

Related Articles