Webscraping Images

Now let’s scrape some images from mtgimage.com to see how we can work with images in R itself. We will need the RCurl and jpeg packages installed and loaded to do this. RCurl gives us functions for fetching content from the web, while jpeg lets R read .jpeg image files into a form it can work with.

install.packages('RCurl')
library(RCurl)
install.packages('jpeg')
library(jpeg)

Using grep(), which matches a regular expression against each value, we’ll pick out just the creatures with the Beast subtype from our extracted JSON data and get their ID codes.
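
The subsetting used for this step can be tried on a small made-up data frame first. This is only a sketch: the `cards` table and all of its values are invented for illustration, and only the `subtype` and `ID` column names mirror the ones used in this post.

```r
# Hypothetical stand-in for the MRD data frame; all values are invented.
cards <- data.frame(
  ID      = c("101", "102", "103", "104"),
  name    = c("Card A", "Card B", "Card C", "Card D"),
  subtype = c("Beast", "Human Wizard", "Cat Beast", "Golem"),
  stringsAsFactors = FALSE
)

# grep() returns the positions of the rows whose subtype contains "Beast".
hits <- grep("Beast", cards$subtype)

# Use those positions to keep the matching rows, then pull out the ID column.
beast_ids <- cards[hits, ]$ID
print(beast_ids)
```

Because the pattern can match anywhere in the string, a combined subtype such as "Cat Beast" is picked up as well as a plain "Beast".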

codes <- MRD[grep("Beast", MRD$subtype),]$ID
codes
[1] "48436" "46556" "46123" "48603" "46115" "48600" "48083"

Since mtgimage.com is designed to be easy to reference, it is trivial to use this information to build the URLs that we will fetch later. We use the paste() function to drop each multiverse ID into the middle of some text. The only thing we have to do is tell it that the separator is an empty string (by default it is a space).
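
To make the separator behaviour concrete, here is a minimal sketch using invented IDs and a placeholder example.com address, neither of which comes from the card data. It compares the default separator, sep = "", and paste0(), which is base R's shorthand for the same thing.

```r
# Invented IDs, used only to show how paste() treats its separator.
ids <- c("111", "222")

# The default separator is a single space, which would break the URL.
with_space <- paste("http://example.com/", ids, ".jpg")

# sep = "" glues the pieces together with nothing in between.
no_space <- paste("http://example.com/", ids, ".jpg", sep = "")

# paste0() is shorthand for paste(..., sep = "").
short_form <- paste0("http://example.com/", ids, ".jpg")

print(with_space)
print(no_space)
print(identical(no_space, short_form))
```

Note that paste() is vectorised: the fixed pieces of text are recycled, so one call produces one URL per ID.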

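After building the URLs, we loop over them, downloading each image with getURLContent() and decoding it with readJPEG().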

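# Build one URL per multiverse ID
urls <- paste("http://mtgimage.com/multiverseid/", codes, "-crop.jpg", sep = "")

# Download each image and decode it into an array
jpglist <- vector("list", length(urls))
for (i in seq_along(urls)) {
	jpglist[[i]] <- readJPEG(getURLContent(urls[i]))
}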
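
Downloads can fail, and a single error inside a loop like this one stops it before the remaining images are read. One possible way to guard against that, sketched here under assumptions, is to wrap each read in tryCatch(). The names `safe_read` and `fake_fetch` are hypothetical; `fake_fetch` stands in for the network call so the sketch runs without a connection.

```r
# Hypothetical helper: apply `reader` to each URL and return NULL for any
# URL that fails, instead of stopping the whole loop.
safe_read <- function(urls, reader) {
  lapply(urls, function(u) {
    tryCatch(reader(u), error = function(e) NULL)
  })
}

# Stand-in for the real download step; it fails for one made-up URL.
fake_fetch <- function(u) {
  if (grepl("bad", u)) stop("download failed")
  nchar(u)  # placeholder result in place of image data
}

results <- safe_read(
  c("http://example.com/ok.jpg", "http://example.com/bad.jpg"),
  fake_fetch
)

# Keep only the entries that were read successfully.
ok <- Filter(Negate(is.null), results)
print(length(ok))
```

With the real packages loaded, the reader passed in would be something like function(u) readJPEG(getURLContent(u)).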

Now we have a list of raster arrays (i.e. bitmaps) and the built-in rasterImage() function to visualize them with. The simplest way to show one is to open a new plot and draw the image on it.

plot.new()
rasterImage(jpglist[[1]],0,0,1,1)
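
It can help to see what rasterImage() is actually given. The sketch below builds a tiny synthetic array in the height by width by channel layout that readJPEG() returns for a colour image, with values between 0 and 1, and draws it. The array `img` is made up for illustration, and the `asp = 1` setting is just one possible way to keep the picture from being stretched to fill the plot.

```r
# Synthetic 2 x 3 RGB image: dimensions are height, width, channels.
img <- array(0, dim = c(2, 3, 3))
img[1, , 1] <- 1   # top row: pure red
img[2, , 3] <- 1   # bottom row: pure blue

h <- dim(img)[1]
w <- dim(img)[2]

# Set up an empty plot whose coordinate range matches the pixel grid;
# asp = 1 keeps the pixels square so the image is not stretched.
plot(c(0, w), c(0, h), type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "")

# The arguments after the image are xleft, ybottom, xright, ytop.
# interpolate = FALSE shows the raw pixels rather than smoothing them.
rasterImage(img, 0, 0, w, h, interpolate = FALSE)
```

The same indexing works on a downloaded image, so dim(jpglist[[1]]) reports its height, width, and number of channels.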

Why would you want to do this with R?

To be perfectly honest, it isn’t something that is called for often with statistical software, but it is a great chance to practice how things are referenced in R.
