There are loads of answers on this topic, but for the life of me I cannot a solution to my issue.
Say I have a JSON like
json_2_explode = [{'scalar': '43',
'units': 'm',
'parameter': [{'no_1': '45',
'no_2': '1038',
'no_3': '356'}],
'name': 'Foo'},
{'scalar': '54.1',
'units': 's',
'parameter': [{'no_1': '78',
'no_2': '103',
'no_3': '356'}],
'name': 'Yoo'},
{'scalar': '1123.1',
'units': 'Hz',
'parameter': [{'no_1': '21',
'no_2': '43',
'no_3': '3577'}],
'name': 'Baz'}]
documenting some readings for attributes Foo
, Yoo
and Baz
. For each I detail a number, that is, the value itself, some parameters, and the name.
Say this JSON is a column in a dataframe,
df = pd.DataFrame(data = {'col1': [11, 9, 23, 1],
'col2': [7, 3, 1, 12],
'col_json' : [json_2_explode,
json_2_explode,
json_2_explode,
json_2_explode]}, index=[0, 1, 2, 3])
col1 col2 col_json
0 11 7 [{'scalar': '43', 'units': 'MPa', 'parameter':...
1 9 3 [{'scalar': '43', 'units': 'MPa', 'parameter':...
2 23 1 [{'scalar': '43', 'units': 'MPa', 'parameter':...
3 1 12 [{'scalar': '43', 'units': 'MPa', 'parameter':...
The issue I have is that if I try
df = pd.json_normalize(df['col_json'].explode())
I get
scalar units parameter name
0 43 m [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo
1 54.1 s [{'no_1': '78', 'no_2': '103', 'no_3': '356'}] Yoo
2 1123.1 Hz [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}] Baz
3 43 m [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo
4 54.1 s [{'no_1': '78', 'no_2': '103', 'no_3': '356'}] Yoo
5 1123.1 Hz [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}] Baz
6 43 m [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo
7 54.1 s [{'no_1': '78', 'no_2': '103', 'no_3': '356'}] Yoo
8 1123.1 Hz [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}] Baz
9 43 m [{'no_1': '45', 'no_2': '1038', 'no_3': '356'}] Foo
10 54.1 s [{'no_1': '78', 'no_2': '103', 'no_3': '356'}] Yoo
11 1123.1 Hz [{'no_1': '21', 'no_2': '43', 'no_3': '3577'}] Baz
So it is exploding each JSON in 3 rows (admitteldy each JSON does contain 3 sub-dicts, so to say).
I actually would like Foo
, Yoo
and Baz
to be documented in the same row, adding columns.
Is there maybe solution not involving manually manipulating rows/piercing it as desired? I would love to see one of your fancy one-liners, thanks so much for your help
Answer
json_normalize may work after adding prefixes, I didn't try but this puts all your data on one line.
from flatten_json import flatten
prefix_json_2_explode = {}
for d in json_2_explode:
prefix_json_2_explode.update({d['name'] + '_' + key: value for key, value in d.items()})
print(prefix_json_2_explode)
dict_flattened = (flatten(d, '.') for d in [prefix_json_2_explode])
df = pd.DataFrame(dict_flattened)
df
Foo_scalar Foo_units Foo_parameter.0.no_1 Foo_parameter.0.no_2 \
0 43 m 45 1038
Foo_parameter.0.no_3 Foo_name Yoo_scalar Yoo_units Yoo_parameter.0.no_1 \
0 356 Foo 54.1 s 78
Yoo_parameter.0.no_2 Yoo_parameter.0.no_3 Yoo_name Baz_scalar Baz_units \
0 103 356 Yoo 1123.1 Hz
Baz_parameter.0.no_1 Baz_parameter.0.no_2 Baz_parameter.0.no_3 Baz_name
0 21 43 3577 Baz