Switching from SPSS to R: Save scripts, not workspaces!


I’m back with a quick lesson that I learned while switching to R for data analysis (if you are curious about why I am doing so, I have a list of reasons here). This was a bit of a painful lesson that cost me a lot of time: SAVE SCRIPTS, NOT WORKSPACES.

What does that mean?

I am going to explain this in non-technical terms (sorry R experts), mainly because I don’t know the technical lingo.

In SPSS, your data file is a tangible thing. You can make changes to it and save it and then go back to the actual file and boom, there is the data just as you left it.

In R things work a bit differently. All changes to data (and analysis, and charts, and everything else) are executed through scripts. You write a block of code that does something. You save this script, and each time you open R you should re-run it. Objects and data frames aren’t “real” as they are in SPSS: they only live in memory for the current session, and the script is what brings them back.
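To make that concrete, here is a minimal sketch of what such a script might look like. The file name and object names are made up for illustration, and I am using the built-in mtcars data as a stand-in for your own data file.

```r
# analysis.R (hypothetical file name)
# Everything the analysis needs is created here, starting from the raw data.

# Stand-in for loading your own file, e.g. with read.csv()
raw <- mtcars

# Cleaning step: keep only the columns the analysis uses
clean <- raw[, c("mpg", "wt", "cyl")]

# Derived variable: flag the heavier cars
clean$heavy <- clean$wt > 3

# Summary table: average mpg for lighter vs. heavier cars
mpg_by_weight <- aggregate(mpg ~ heavy, data = clean, FUN = mean)
print(mpg_by_weight)
```

Because every object (raw, clean, mpg_by_weight) is built by the script itself, running it from top to bottom in a new session gets you back to exactly where you were.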

Like most R users, I use RStudio. RStudio is amazing and awesome and I love it. But by default it automatically saves your workspace and re-loads it the next time you start the program. Amazing! Or so I thought. That default was letting me keep a bad habit I learned from SPSS: treating objects as “real” instead of re-running my script each time to make sure it included everything my analysis needed.

I have been working through the book R for Data Science (a great book which is FREE by the way) and in the workflow section the authors make this point very clearly: save scripts, not workspaces. I didn’t really get why this was so important. It was so much easier just to open RStudio and have my previous workspace waiting for me.

Unfortunately I learned first-hand why this is so important.

I manually cleared my workspace because I thought I was done with my analysis (and I was sure my script had everything the analysis needed). Turns out my script was missing something pretty important. When I had to go back to my analysis to change something, lo and behold, the code that created a few of my objects was missing from my script. I had to manually re-create them from memory.

It wasn’t the end of the world since I was able to do that, but it cost me a lot of time. And what if I had to go back to that analysis a year later? My memory would have certainly faded. If I had been working solely from scripts the entire time this error would have been caught right away (or not have occurred in the first place).
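One way to catch this kind of gap early is to run the script in a brand-new R session that cannot see your current workspace. The sketch below is only an illustration under my own assumptions: the two tiny scripts are hypothetical stand-ins written to temporary files, and it assumes the Rscript program that ships with R is available on your machine.

```r
# Path to the Rscript program that comes with this R installation
rscript <- file.path(R.home("bin"), "Rscript")

# A script that creates everything it needs
complete <- tempfile(fileext = ".R")
writeLines(c(
  "clean <- mtcars[, c('mpg', 'wt')]",
  "model <- lm(mpg ~ wt, data = clean)"
), complete)

# A script that quietly relies on an object left over in the workspace
incomplete <- tempfile(fileext = ".R")
writeLines("model <- lm(mpg ~ wt, data = clean)", incomplete)

# Run each one in a fresh session; --vanilla skips any saved workspace.
# system2() returns the exit status: 0 means the script ran without errors.
status_complete <- system2(rscript, c("--vanilla", shQuote(complete)))
status_incomplete <- system2(rscript, c("--vanilla", shQuote(incomplete)))
```

The second script fails in the fresh session because nothing in it creates clean, which is the same kind of hole my own script had. Restarting R and re-running your script from the top should give you the same check by hand.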

Thankfully you can change the default setting in RStudio so that it doesn’t save your workspace and enable this bad habit. Instructions are here. Don’t repeat my mistake!
