Beyond Spreadsheets with R

Book description

Beyond Spreadsheets with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. You’ll build on simple programming techniques like loops and conditionals to create your own custom functions. You’ll come away with a toolkit of strategies for analyzing and visualizing data of all sorts using R and RStudio.

About the Technology

Spreadsheets are powerful tools for many tasks, but when you need to interpret, interrogate, and present data, they can feel like the wrong tool for the job. That’s when R programming is the way to go. The R programming language provides a comfortable environment for handling all types of data. And within the open source RStudio development suite, you have easy-to-use ways to simplify complex manipulations and create reproducible processes for analysis and reporting.

About the Book

With Beyond Spreadsheets with R, you’ll learn how to go from raw data to meaningful insights using R and RStudio. Each carefully crafted chapter covers a different way to wrangle data, from understanding individual values to interacting with complex collections of data, including data you scrape from the web.

About the Reader

If you’re comfortable writing formulas in Excel, you’re ready for this book.

About the Author

Dr Jonathan Carroll is a data science consultant providing R programming services. He holds a PhD in theoretical physics.

Quotes
A useful guide to facilitate graduating from spreadsheets to more serious data wrangling with R.
- John D. Lewis, DDN

An excellent book to help you understand how stored data can be used.
- Hilde Van Gysel, Trebol Engineering

A great introduction to a data science programming language. Makes you want to learn more!
- Jenice Tom, CVS Health

Handy to have when your data spreads beyond a spreadsheet.
- Danil Mironov, Luxoft Poland

Table of contents

  1. Titlepage
  2. Copyright
  3. preface
  4. acknowledgments
  5. about this book
    1. Who needs this book?
    2. How to read this book
      1. Formatting
      2. Structure
    1. 1.1 Data: What, where, how?
      1. 1.1.1 What is data?
      2. 1.1.2 Seeing the world as data sources
      3. 1.1.3 Data munging
      4. 1.1.4 What you can do with well-handled data
      5. 1.1.5 Data as an asset
      6. 1.1.6 Reproducible research and version control
      1. 1.2.1 The origins of R
      2. 1.2.2 What R is and what it isn’t
      1. 1.4.1 Working with R within RStudio
      2. 1.4.2 Built-in packages (data and functions)
      3. 1.4.3 Built-in documentation
      4. 1.4.4 Vignettes
    1. 2.1 Types of data
      1. 2.1.1 Numbers
      2. 2.1.2 Text (strings)
      3. 2.1.3 Categories (factors)
      4. 2.1.4 Dates and times
      5. 2.1.5 Logicals
      6. 2.1.6 Missing values
      1. 2.2.1 Naming data (variables)
      2. 2.2.2 Unchanging data
      3. 2.2.3 The assignment operators (
    1. 3.1 Basic mathematics
    2. 3.2 Operator precedence
    3. 3.3 String concatenation (joining)
    4. 3.4 Comparisons
    5. 3.5 Automatic conversion (coercion)
    6. 3.6 Try it yourself
    7. Terminology
    8. Summary
    1. 4.1 Functions
      1. 4.1.1 Under the hood
      2. 4.1.2 Function template
      3. 4.1.3 Arguments
      4. 4.1.4 Multiple arguments
      5. 4.1.5 Default arguments
      6. 4.1.6 Argument name matching
      7. 4.1.7 Partial matching
      8. 4.1.8 Scope
      1. 4.2.1 Installing packages
      2. 4.2.2 How does R (not) know about this function?
      3. 4.2.3 Namespaces
      1. 4.3.1 Creating messages, warnings, and errors
      2. 4.3.2 Diagnosing messages, warnings, and errors
    1. 5.1 Simple collections
      1. 5.1.1 Coercion
      2. 5.1.2 Missing values
      3. 5.1.3 Attributes
      4. 5.1.4 Names
      1. 5.2.1 Vector functions
      2. 5.2.2 Vector math operations
      1. 5.3.1 Naming dimensions
      1. 5.6.1 The tibble class
      2. 5.6.2 Structures as function arguments
    1. 6.1 Text processing
      1. 6.1.1 Text matching
      2. 6.1.2 Substrings
      3. 6.1.3 Text substitutions
      4. 6.1.4 Regular expressions
      1. 6.2.1 Vectors
      2. 6.2.2 Lists
      3. 6.2.3 Matrices
      1. 6.4.1 dplyr verbs
      2. 6.4.2 Non-standard evaluation
      3. 6.4.3 Pipes
      4. 6.4.4 Subsetting data.frame the hard way
      1. 6.9.1 Solutions (no peeking)
    1. 7.1 Tidy data principles
      1. 7.1.1 The working directory
      2. 7.1.2 Stored data formats
      3. 7.1.3 Reading data into R
      4. 7.1.4 Scraping data
      5. 7.1.5 Inspecting data
      6. 7.1.6 Dealing with odd values in data (sentinel values)
      7. 7.1.7 Converting to tidy data
    1. 8.1 Looping
      1. 8.1.1 Vectorization
      2. 8.1.2 Tidy repetition: Looping with purrr
      3. 8.1.3 for loops
      1. 8.2.1 while loops
      1. 8.3.1 if conditions
      2. 8.3.2 ifelse conditions
    1. 9.1 Data preparation
      1. 9.1.1 Tidy data, revisited
      2. 9.1.2 Importance of data types
      1. 9.2.1 General construction
      2. 9.2.2 Adding points
      3. 9.2.3 Style aesthetics
      4. 9.2.4 Adding lines
      5. 9.2.5 Adding bars
      6. 9.2.6 Other types of plots
      7. 9.2.7 Scales
      8. 9.2.8 Facetting
      9. 9.2.9 Additional options
    1. 10.1 Writing your own packages
      1. 10.1.1 Creating a minimal package
      2. 10.1.2 Documentation
      1. 10.2.1 Unit testing
      2. 10.2.2 Profiling
      1. 10.3.1 Regression
      2. 10.3.2 Clustering
      3. 10.3.3 Working with maps
      4. 10.3.4 Interacting with APIs
      5. 10.3.5 Sharing your package
    1. Windows
    2. Mac
    3. Linux
    4. From source
    1. Installing RStudio
    2. Packages used in this book

Product information

• Title: Beyond Spreadsheets with R
• Author(s): Jonathan Carroll
• Release date: January 2019
• Publisher(s): Manning Publications
• ISBN: 9781617294594