https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html
So far I have:
scrape1<-read_html('https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html')
My lecturer taught me to use:
scrape1_nodes <- scrape1 %>% html_nodes("p")
head(scrape1_nodes)
However, this method doesn't seem to be working. Is there an easy way to find the CSV, or to direct R to the data in the HTML page? Regards,
Answer
One option could be to retrieve the text from all DIVs with class=psovde-listing and then split this vector into a data.frame by \n
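library(rvest)

# Read the page and pull the text of every div with class psovde-listing
res <- read_html('https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html') %>%
  html_elements("div.psovde-listing") %>%
  html_text2()

# Drop the first element (res[-1]), then split each remaining string on newlines into columns
df <- data.frame(do.call(rbind, strsplit(res[-1], "\n", fixed = TRUE)))
colnames(df) <- c("Day", "CruiseLine", "Ship", "Times", "Passengers")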
giving
Day CruiseLine Ship Times Passengers
1 Wed 16 🧍 AIDAsol a 1000 d 2000 2174
2 Thu 17 🧍 Renaissance a 0700 d 1800 1358
3 Wed 23 🧍 AIDAsol a 1000 d 1900 2174
4 Tue 29 🧍 Amera a 0800 d 2000 834
5 Sat 3 🧍 AIDAluna a 0800 d 1800 2050
6 Tue 6 🧍 Mein Schiff 3 a 0730 d 1900 2506
...
You can then write this to CSV with write.csv(df, "cruises.csv")
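If the Times column is more useful as separate fields, it could be split before writing the CSV. This is only a sketch: it assumes Times always looks like "a 1000 d 2000" as in the printed rows, and that "a" and "d" mean arrival and departure, which the page does not confirm here. The small df below is a stand-in built from three of the rows shown, and the Arrival and Departure column names are made up for illustration.

```r
# Stand-in for the scraped data.frame, using values from the printed output
df <- data.frame(
  Ship = c("AIDAsol", "Renaissance", "Mein Schiff 3"),
  Times = c("a 1000 d 2000", "a 0700 d 1800", "a 0730 d 1900"),
  Passengers = c("2174", "1358", "2506"),
  stringsAsFactors = FALSE
)

# Assumed format: "a HHMM d HHMM"
pattern <- "^a ([0-9]{4}) d ([0-9]{4})$"
ok <- grepl(pattern, df$Times)

# Rows that do not match the assumed format get NA instead of a wrong value
df$Arrival <- ifelse(ok, sub(pattern, "\\1", df$Times), NA_character_)
df$Departure <- ifelse(ok, sub(pattern, "\\2", df$Times), NA_character_)

# html_text2() returns text, so passenger counts arrive as character
df$Passengers <- as.integer(df$Passengers)
```

The times are kept as character on purpose, so that leading zeros such as "0700" survive.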
Notes
- One slight difficulty is that the cruise lines on the website are shown as images rather than text. This can be improved upon. What data do you want to scrape exactly?
- Looking at the network traffic, I can't find any API or simple POST / GET request that fetches the source data.
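On the first note, one possible way to recover the cruise line is to read an attribute of the image in each row. The sketch below runs on made-up markup, not the real page: whether the site's images carry an alt attribute (or a title, or only a file name in src) is an assumption you would need to check in the browser inspector first.

```r
library(rvest)

# Hypothetical markup: the real page's structure and attributes may differ
page <- minimal_html('
  <div class="psovde-listing"><img src="logo1.png" alt="AIDA Cruises"> AIDAsol</div>
  <div class="psovde-listing"><img src="logo2.png"> Renaissance</div>
')

rows <- html_elements(page, "div.psovde-listing")

# html_element() (singular) returns exactly one result per row,
# so the vector stays aligned with the rows even when something is missing
cruise_line <- rows %>%
  html_element("img") %>%
  html_attr("alt")

# Rows whose image has no alt attribute come back as NA
cruise_line
```

Using html_element() rather than html_elements() here keeps one value per listing, so the result can be assigned straight into a column of df.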