How do I web scrape this basic webpage to CSV for use with RStudio?

https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html

So far I have:

scrape1<-read_html('https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html')

My lecturer taught me to use:

scrape1_nodes <- scrape1 %>% html_nodes("p")
head(scrape1_nodes)

However, this method doesn't seem to be working. Is there an easy way to find the CSV, or to direct R to the data in the HTML page?

Answer

One option is to retrieve the text of all DIVs with class=psovde-listing and then split the resulting character vector on \n into a data.frame:

library(rvest)

res <- read_html('https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html') %>%
  html_elements("div.psovde-listing") %>% 
  html_text2() 

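# Skip the first element, split each remaining string on newlines,
# and bind the pieces row-wise into a data.frame
df <- data.frame(do.call(rbind, strsplit(res[-1], "\n", fixed = TRUE)))
colnames(df) <- c("Day", "CruiseLine", "Ship", "Times", "Passengers")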

giving

     Day CruiseLine          Ship         Times Passengers
Wed 16          🧍       AIDAsol a 1000 d 2000       2174
Thu 17          🧍   Renaissance a 0700 d 1800       1358
Wed 23          🧍       AIDAsol a 1000 d 1900       2174
Tue 29          🧍         Amera a 0800 d 2000        834
Sat 3          🧍      AIDAluna a 0800 d 1800       2050
Tue 6          🧍 Mein Schiff 3 a 0730 d 1900       2506
...

You can then write the data frame to CSV with write.csv(df, "cruises.csv").
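To try the split-and-write step without hitting the website, here is a minimal sketch in base R. The `sample_res` vector is made-up stand-in data shaped like what `html_text2()` might return (an assumption, not real scraped output), and the file is written to a temporary path. Passing `row.names = FALSE` keeps R's row numbers out of the CSV.

```r
# Hypothetical stand-in for the scraped text: one string per listing,
# fields separated by newlines (values are illustrative only).
sample_res <- c(
  "Wed 16\nLine A\nAIDAsol\na 1000 d 2000\n2174",
  "Thu 17\nLine B\nRenaissance\na 0700 d 1800\n1358"
)

# Split each string on "\n" and bind the pieces into rows.
parts <- strsplit(sample_res, "\n", fixed = TRUE)
df <- data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
colnames(df) <- c("Day", "CruiseLine", "Ship", "Times", "Passengers")

# Store the passenger count as a number rather than text.
df$Passengers <- as.integer(df$Passengers)

# Write without row names, then read the file back to check the round trip.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
check <- read.csv(csv_path)
```

Note that do.call(rbind, ...) only gives a clean table when every listing splits into the same number of fields. If some rows have more or fewer pieces, rbind recycles values and warns, so it is worth checking lengths(parts) first.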

Notes

One slight difficulty is that the cruise lines on the website are shown as images rather than text. This can be improved upon. What data exactly do you want to scrape?
Looking at the network traffic, I can't find any API or simple POST/GET request that fetches the source data.
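If the cruise-line images carry a descriptive alt attribute, one way to recover the names is to read that attribute instead of the text. The sketch below runs on inline HTML built with rvest's minimal_html(). The inner markup (an img with alt text and a span for the ship) and the "Line A" / "Line B" values are assumptions for illustration, so the selectors would need adjusting to whatever the real page contains.

```r
library(rvest)

# Hypothetical markup: the real page's inner structure may differ.
page <- minimal_html('
  <div class="psovde-listing"><img src="a.png" alt="Line A"><span>AIDAsol</span></div>
  <div class="psovde-listing"><img src="b.png" alt="Line B"><span>Renaissance</span></div>
')

rows <- html_elements(page, "div.psovde-listing")

# html_element() (singular) returns exactly one result per row,
# so the two vectors stay aligned even if a row has no image.
line_names <- rows %>% html_element("img") %>% html_attr("alt")
ships <- rows %>% html_element("span") %>% html_text2()

lookup <- data.frame(CruiseLine = line_names, Ship = ships)
```

If the images have no alt text, the src file name (via html_attr("src")) may be the only available hint for the cruise line.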
