Data Manipulation

Data manipulation is the work of getting raw data sets into the form required for statistical analysis. In a large data set, much of the content may be unrelated to what you are trying to accomplish. Data manipulation can be quite complex, but it is essential to achieving the goals of the analysis.

Data manipulation covers a wide variety of tasks, such as:

  • getting data from text files, spreadsheets, databases and other sources and importing it into an appropriate statistical package
  • manipulating date/time and character data
  • aggregating and reshaping data
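
As a rough illustration of these tasks, here is a minimal sketch in Python using only the standard library (the resources listed below work in R and SAS instead). The data, the column names (site, visit_date, weight_kg) and the helper functions are all invented for the example; a real project would read from an actual file and would more likely use a dedicated data-manipulation package.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical raw data standing in for a text file; the column names and
# values are invented for illustration only.
RAW = """site,visit_date,weight_kg
 north ,2021-03-01,10.5
North,2021-04-15,11.5
south,2021-03-02,8.0
SOUTH ,2021-04-10,9.0
"""


def load_rows(text):
    """Import: read CSV text and convert each field to a usable type."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            # Character manipulation: trim whitespace and normalise case.
            "site": rec["site"].strip().lower(),
            # Date/time manipulation: parse the ISO-formatted date string.
            "visit_date": datetime.strptime(rec["visit_date"], "%Y-%m-%d").date(),
            "weight_kg": float(rec["weight_kg"]),
        })
    return rows


def mean_weight_by_site(rows):
    """Aggregate: compute the mean weight for each site."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["site"]].append(row["weight_kg"])
    return {site: sum(vals) / len(vals) for site, vals in groups.items()}


def reshape_wide(rows):
    """Reshape long to wide: one entry per site, one key per month."""
    wide = defaultdict(dict)
    for row in rows:
        month = row["visit_date"].strftime("%Y-%m")
        wide[row["site"]][month] = row["weight_kg"]
    return dict(wide)


rows = load_rows(RAW)
means = mean_weight_by_site(rows)
wide = reshape_wide(rows)
```

Cleaning the site labels before aggregating matters here: without the trim and lower-casing step, " north " and "North" would be treated as two different sites.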

Some recommended resources

  • Wickham, H. and Grolemund, G. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly, CA.
  • Spector, P. 2008. Data Manipulation with R. Springer, New York.
  • Nolan, D. and Temple Lang, D. 2014. XML and Web Technologies for Data Sciences with R. Springer, New York.
  • Cody, R. 2008. Cody's Data Cleaning Techniques Using SAS, 2nd Edition. SAS Institute, Cary, NC.
  • DataCamp courses:
    • Introduction to the Tidyverse
    • Import Data into R (Parts 1 and 2)
    • Cleaning Data in R
    • Importing and Cleaning Data in R: Case Studies
    • Data Manipulation in R using dplyr
    • Joining Data in R with dplyr