I'm not able to get organized data in my scraper

I'm making a scraper to capture events from a website, with title, date, location and link, so I can put them into a dataframe.
The scraper worked, but some events that have two dates are coming up wrong. For example:
[['Concerto | Clamor Pela Paz',
'04/04/2025',
'Menu',
'https://www.theatromunicipal.org.br/evento/concerto-clamor-pela-paz/'],
['Concerto | Clamor Pela Paz',
'a 05/04/2025',
'Theatro Municipal - Sala de Espetáculos',
'https://www.theatromunicipal.org.br/evento/concerto-clamor-pela-paz/'],
Notice that it's the same event, but split across two lists. The date is also broken: the first date is in the first list, and in the middle there is a "Menu" that shouldn't be there. The second date ends up in another list, with an "a" in front, which shouldn't be there either.
What could be causing this error?
In the website's HTML, the dates are inside the same tag and the same class, but in different lists.
I captured the dates this way:
datas = sopa.findAll('span', class_='elementor-icon-list-text elementor-post-info__item elementor-post-info__item--type-custom')
And I wrote the for loop this way:
lista_eventos = []
for titulo, data, local, link in list(zip(nome_evento, datas, local_evento, link_evento)):  # Changed data_evento to datas
    titulo = titulo.text.strip()
    data = data.text.strip() if hasattr(data, 'text') else data
    local = local.text.strip()
    link = link.get('href')
    lista_eventos.append([titulo, data, local, link])
Colab link: Read more
What am I doing wrong?
Answer
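A likely reason the rows come out misaligned: zip() pairs items purely by position, so if one selector matches an extra element (such as "Menu"), or an event contributes two date spans, every later row shifts. The sketch below reproduces that effect with made-up lists. The event names, dates and locations are illustrative assumptions, not data taken from the site.

```python
# Hypothetical data: two events, but the first one has two date spans
# and the location selector also matched an unrelated "Menu" element.
titles = ["Event A", "Event B"]
dates = ["04/04/2025", "a 05/04/2025", "10/04/2025"]
locations = ["Menu", "Hall A", "Hall B"]

# zip() pairs by index and stops at the shortest list.
rows = [list(row) for row in zip(titles, dates, locations)]

for row in rows:
    print(row)

# A quick sanity check before zipping: unequal lengths mean the
# selectors are not returning exactly one item per event.
lengths = {len(titles), len(dates), len(locations)}
aligned = len(lengths) == 1
print("Lists aligned:", aligned)
```

Here "Event B" ends up with the second date of "Event A" and the wrong location, which is the same kind of breakage as in your output. Comparing the list lengths before zipping is a cheap way to catch it.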
For the dates of the show, all you need is this line of code (soup is the parsed BeautifulSoup object, named sopa in your code).
datas = soup.find_all("div", {"class": "jet-listing-dynamic-field__content"})
The datas list will look like this (based on your example):
[<div class="jet-listing-dynamic-field__content">20h</div>,
<div class="jet-listing-dynamic-field__content">sexta-feira 04/04/25</div>,
<div class="jet-listing-dynamic-field__content">17h</div>,
<div class="jet-listing-dynamic-field__content">sábado 05/04/25</div>]
The times and the dates alternate in that list, so step through it two items at a time to get the schedule:
horario = []
for i in range(0, len(datas), 2):
    if i + 1 < len(datas):
        hora = datas[i].text.strip()
        dia = datas[i+1].text.strip()
        horario.append((dia, hora))
for dia, hora in horario:
    print(f"Dia: {dia}, Hora: {hora}")
Output
Dia: sexta-feira 04/04/25, Hora: 20h
Dia: sábado 05/04/25, Hora: 17h
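Since the goal is a dataframe, it may help to turn each (dia, hora) pair into a real datetime. This is a minimal sketch, assuming the date is always the last word of the day string in dd/mm/yy form and the time looks like "20h" or "20h30". The helper name parse_schedule is hypothetical, and the horario list is hard-coded from the output above so the snippet runs on its own.

```python
from datetime import datetime


def parse_schedule(dia: str, hora: str) -> datetime:
    """Combine strings like 'sexta-feira 04/04/25' and '20h' into a datetime."""
    # Keep only the last word, dropping the weekday name.
    date_part = dia.split()[-1]
    # Split '20h30' into ('20', 'h', '30'); '20h' gives an empty minutes part.
    hours, _, minutes = hora.partition("h")
    parsed_date = datetime.strptime(date_part, "%d/%m/%y")
    return parsed_date.replace(hour=int(hours), minute=int(minutes) if minutes else 0)


# Same pairs as produced by the loop above.
horario = [("sexta-feira 04/04/25", "20h"), ("sábado 05/04/25", "17h")]
parsed = [parse_schedule(dia, hora) for dia, hora in horario]

for moment in parsed:
    print(moment.isoformat())
```

A list of datetime objects like this can go straight into a dataframe column, and it sorts and filters correctly, unlike the raw strings.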