Importing Data, Checking the Imported Data and Working With Data in R; Dataset: https://goo.gl/tJj5XG
More Statistics and R Programming Tutorials: https://goo.gl/4vDQzT
How to import a dataset into R, how to make sure the data was imported correctly, and how to begin working with the imported data in R.
▶︎We will learn to use the read.table function (which reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file), and some of its arguments, such as the header and sep arguments.
▶︎We will learn to use the file.choose function to choose a file interactively.
▶︎We will discuss how to use the Menu options in RStudio to import data into R.
▶︎We will check that the data was imported correctly using: the dim function, which retrieves the dimensions of an object and tells you the number of rows and columns of the imported data; the head function (head()), which returns the first part of a vector, matrix, table or data frame and lets you see the first several rows of the data; the tail function (tail()), which shows the last several rows of the data; square brackets, which let you select or subset data from a vector, matrix, array, list or data frame; and the names function, which returns the names of an object.
▶︎▶︎ Download the dataset here:
▶︎Export Data from R (CSV , TXT and other formats): https://bit.ly/2PWS84w
▶︎Graphs and Descriptive Statistics in R: https://bit.ly/2PkTneg
▶︎Probability Distributions in R: https://bit.ly/2AT3wpI
▶︎Bivariate Analysis in R: https://bit.ly/2SXvcRi
▶︎Linear Regression in R: https://bit.ly/1iytAtm
▶︎Intro to Statistics Course: https://bit.ly/2SQOxDH
◼︎ Topics in the video:
0:00:07 How to read a dataset into R using read.table function and save it as an object
0:00:27 How to access the help menu in R
0:01:02 How to let R know that the first row of our data is headers by using header argument
0:01:14 How to let R know how the observations are separated by using sep argument
0:02:03 How to specify the path to the file using file.choose function
0:03:15 How to use Menu options in RStudio to import data into R
0:05:23 How to prepare the Excel data for importing into R
0:06:15 How to know the dimensions (the number of rows and columns) of the data in R using the dim function
0:06:35 How to see the first several rows of the data using the head command in R
0:06:45 How to see the last several rows of the data in R using the tail function
0:07:18 How to check if the data was read correctly into R using square brackets and subsetting data
0:08:21 How to check the variable names in R using the names function
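The steps in the topic list can be sketched roughly as follows. This is a minimal sketch, not the code from the video: it assumes a small made-up tab-separated file in place of the real dataset, and the object name demo and all values are hypothetical.

```r
# Write a tiny tab-separated file so the example runs without the real dataset.
# Column names and values are invented for illustration only.
tmp <- tempfile(fileext = ".txt")
writeLines(c("LungCap\tAge\tSmoke",
             "6.475\t6\tno",
             "10.125\t18\tyes",
             "9.550\t16\tno"), tmp)

# header = TRUE: the first row holds the variable names.
# sep = "\t": the observations are separated by tabs.
demo <- read.table(file = tmp, header = TRUE, sep = "\t")

# In an interactive session, file.choose() could replace the path:
# demo <- read.table(file.choose(), header = TRUE, sep = "\t")

dim(demo)          # number of rows, then number of columns
head(demo, 2)      # first rows of the data
tail(demo, 2)      # last rows of the data
demo[c(1, 3), ]    # rows 1 and 3, all columns
names(demo)        # variable names
```

Comparing the output of dim, head and tail against the original file is a quick way to confirm that the header and separator were specified correctly.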
This video is a tutorial for programming in R Statistical Software for beginners, using RStudio.
Content Creator: Mike Marin (B.Sc., MSc.) Senior Instructor at UBC.
Producer and Creative Manager: Ladan Hamadani (B.Sc., BA., MPH)
The #RTutorial is created by #marinstatslectures to support the statistics course (SPPH400 #IntroductoryStatistics) at The University of British Columbia (UBC), although we make all videos available to everyone everywhere for free!
Thanks for watching! Have fun and remember that statistics is almost as beautiful as a unicorn!
you use the *row.names* argument within the "read.table" command. here is the explanation from R-help, as they say it more clearly than i would:
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names.
If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if row.names is missing, the rows are numbered.
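A minimal sketch of the three behaviours described in that help text, assuming two small made-up comma-separated files; the column name id and the object names are hypothetical.

```r
# A file whose first column ("id") should become the row names.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,height,weight",
             "a1,150,52",
             "b2,160,61"), tmp)

# row.names as a single number: the column position holding the row names.
by_number <- read.table(tmp, header = TRUE, sep = ",", row.names = 1)

# row.names as a character string: the name of that column.
by_name <- read.table(tmp, header = TRUE, sep = ",", row.names = "id")

# A header with one fewer field than the data rows:
# the first column is then used for the row names automatically.
tmp2 <- tempfile(fileext = ".csv")
writeLines(c("height,weight",
             "a1,150,52",
             "b2,160,61"), tmp2)
auto <- read.table(tmp2, header = TRUE, sep = ",")

rownames(by_number)
rownames(auto)
```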
Hi, it is hard to tell from a distance, but my best guess is that you are trying to import a file that you don't have "permission" to open. for example, the file may be in an "administrator account" and you are working in a different user account that doesn't have permission to access that file.
great job mate... i would love to watch all your videos... better thn ANY fu*k*ng online tuts available out there.. i wl donate someday soon aftr i get the job thru ur videos.. wish u can make videos on SAS , Python, SPSS
When i try to use the built in import function Rstudio says " preparing data import requires an updated version of the Rcpp package. "
After installing, i keep getting the same prompt. Anyone having this problem?
Not that it matters too much because of the file.choose(read... command but still would be nice to know why it happens.
Thanks Mike for some great learning material.
... got the same issue as Chris Powley (3 days ago): 'Smoke', 'Gender' and 'Caesarean' are being converted to CHARACTERS instead of FACTORS in my LungCapData (which I have imported as csv - not able to import as txt.). Any suggestions?
So many thanks!
Hi +Carsten Grube , you can tell R to treat it as a factor instead...just use *Gender <- as.factor(Gender)* if you've attached the data, and if you haven't (and are working with $ instead), then use *LungCapData$Gender <- as.factor(LungCapData$Gender)*
Hi +Chris Powley , you can tell R to treat it as a factor instead...just use *Gender <- as.factor(Gender)* if you've attached the data, and if you haven't (and are working with $ instead), then use *LungCapData$Gender <- as.factor(LungCapData$Gender)*
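A minimal sketch of the conversion, assuming a small made-up file rather than the real LungCapData; the object names lung_demo and lung_demo2 are hypothetical. Since R 4.0.0, read.table leaves text columns as character by default, which is likely why viewers see characters where the video shows factors.

```r
# Small made-up file with text columns, standing in for the real dataset.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Age,Smoke,Gender",
             "6,no,male",
             "18,yes,female",
             "16,no,female"), tmp)

lung_demo <- read.table(tmp, header = TRUE, sep = ",")

# Convert one variable after import, using $ to reach it.
lung_demo$Gender <- as.factor(lung_demo$Gender)
levels(lung_demo$Gender)

# Or ask read.table to convert every text column while importing.
lung_demo2 <- read.table(tmp, header = TRUE, sep = ",",
                         stringsAsFactors = TRUE)
class(lung_demo2$Smoke)
```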
Thanks for your tutorial, but I have an issue when running tail(Lung..): in your video it shows the numbers of the last six rows of the table, such as 720-725, but mine keeps giving 1-6. This even happens when I use LungCapData[c(5,6,7,8,9), ] to retrieve specific rows: it gives me 1,2,3,4,5 as the row numbers instead of 5,6,7,8,9. Do you have any idea about that? Thanks.
Hi Marin, while importing data I got an error like this. Can you please help fix it?
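A minimal sketch of how row labels behave for a plain data frame, using a made-up 10-row data frame (the names demo and renumbered are hypothetical). A data frame created by read.table keeps the original row numbers when subset. One possible cause of seeing 1-6 instead, which is only a guess here, is that the data was imported as a tibble (for instance through an import menu that uses the readr package), since tibbles do not keep row names.

```r
# Made-up data frame with 10 rows and default row labels 1..10.
demo <- data.frame(id = 101:110, value = seq(0.5, 5, by = 0.5))

# tail() and square-bracket subsetting keep the original row labels.
tail_labels <- rownames(tail(demo))
subset_labels <- rownames(demo[c(5, 6, 7, 8, 9), ])
tail_labels
subset_labels

# Labels only restart at 1 if they are reset explicitly.
renumbered <- demo[c(5, 6, 7, 8, 9), ]
rownames(renumbered) <- NULL
rownames(renumbered)
```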
Error: '\U' used without hex digits in character string starting ""C:\U"
So I'm playing around with data imports in excel, and I'm trying to extract data from rows 1, 3, and 5. When I do this, the values are correct, but R labels the output as 1, 2, 3. Is there a fix?
Hi Shashim, you can find the link to the dataset in the description of the video ('show more') or the info in each video: if you press the little "i" on the top right corner in the video you will be able to find the link to the dataset. you can also visit our website for the datasets : http://www.statslectures.com/index.php
Hey Marin, I have a problem. When I import your file LungCapData, the variables Smoke, Gender, Caesarean are in a character form. I cannot calculate means here. How can I change this to a factor variable (0,1)?
Hi, just a quick question re the usefulness of R. I am under the impression R is good at handling "large" data. By large, I mean in excess of 1 million records, i.e. something Excel has problems with. What I don't understand is your preferred method of getting data into R. You state you like Excel (so do I!), so you typically get the data in Excel, save it as a csv, then import it into R using read.table. But that would mean the size of the data you want to analyse is limited to what Excel can handle? Suppose you want to analyse data that contains 2 million records (and is currently stored in Access or SQL Server). How would you import that into R?
Hi +YOON CHOI , I'm sorry but i have no idea what the issue is...especially without knowing what commands you've entered, or what the error message is. but i would suggest to do one of the two following things: 1) just import the data using the *import data* tab that is in the top right corner of RStudio. or 2) in the read.table command, instead of specifying the path to the file, just enter: *data <- read.table(file.choose(), header=T,...)*, and this will let you select the data from a menu instead of specifying the path to it. i show how to do both of these in this and/or following videos.
Hello. I keep getting an error message - Object "Data" not found. The data showed up but I am unable to do a box plot or anything else because it does not recognize the variable. I have gone through and renamed my data, but it is not recognized no matter what name I change it to in the data set. Any advice?
Hi +Lynette 3132 , it's hard to know the error without knowing what commands you've entered into R. but here's a few things that may help. 1) have you attached the data, (e.g.) *attach(data)* , (with the name of the data in place of 'data') ? if you haven't, then you won't be able to call on variables by their name. (in the video that follows this one in the series, i explain the use of the 'attach' command). 2) when importing the data, have you stored it into something? (e.g.) if you just entered a command like *read.table(....)* then it will read in the data, but it won't be stored in anything. you have to store it in something, (e.g.) *data <- read.table(.....)*, and then it will be stored in the object 'data'. 3) have you done something like save it in 'data' but then try to call on it using 'Data'? R is case sensitive, so capitals/lower-case make a difference.
i hope that helps...
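Points 2) and 3) above can be sketched as follows. This is a minimal sketch assuming a small made-up file; the object name lung_demo is hypothetical.

```r
# Small made-up file so the example is self-contained.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Age\tHeight",
             "6\t62.1",
             "18\t74.7"), tmp)

# Reads and prints the data, but nothing is stored for later use.
read.table(tmp, header = TRUE, sep = "\t")

# Stores the imported data in an object so it can be used afterwards.
lung_demo <- read.table(tmp, header = TRUE, sep = "\t")

# R is case sensitive: only the exact spelling refers to the stored object.
stored_exact <- exists("lung_demo")   # spelled as stored
stored_caps  <- exists("Lung_Demo")   # different capitalisation
stored_exact
stored_caps

# Variables are reached through the stored object.
mean(lung_demo$Age)
```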
Hi +Boves Bova , i would recommend to take a look at the data in a spreadsheet application (something like Excel), if you want to view the data as it is. R is better for data analysis and manipulation.
Thanks a lot for these wonderful tutorial series. I would really appreciate if you upload some tutorials on analyses of big data (metagenomics or transcriptomics or metabolomics).....once again thank you very much for all the tutorials.
Hey! so I'm brand new to R, I've installed it and been able to do some stuff... But how do I get my window to be in that format? I only have the console showing. How do I get the two windows on the right showing?
Hi....how do you access a .rda file located in the 'data' folder (made with save(vector, file = "MyData.rda") ) ? I thought the point of these files is that I can load them without having to specify a path - for example when it is part of a package. I tried the data("MyData") but it wouldn't load the data. How are these .rda files used in a function?
Hi....I think I finally found a way to do just that:
First I save the vector with:
save(lookuptable,file = "C:/tmp/lookuptable.rda")
and copy the resulting .rda file into the R-data directory.
Then I can get it back without path, this way:
lookuptable <- get("lookuptable", pos=globalenv())
It seems to matter that the name of the file is the same (including upper/lower caps) as the vector I read this into. Took a long time to find this.
Thanks, however, when I need this data for one of the functions in the package (behind the scene - like resource data in a C# method), how do I load it without the user of my package being bothered with specifying paths or anything at all. What I have is a lookup table that one of my functions needs. I thought this is what .rda files can be used for in the data folder. But I am not able to read it in - at least not without specifying MY environment specific path. How do other people accomplish that I wonder. Thanks again for any hint. Much appreciated!
Hi +fanobennemsi , you can usually load those using the load command: *load("MyData.rda")* . you shouldn't need to specify the path to the file, as long as the file is saved in your current working directory. by default, it will be saved in the current working directory, but if you move the file to a new location, or if you change your working directory, then you will have to specify the path to the file.
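A minimal sketch of a save and load round trip, assuming a made-up lookup table written to a temporary folder rather than the working directory (the values and the rda_path name are hypothetical).

```r
# Made-up lookup table standing in for the one in the question.
lookuptable <- c(a = 1, b = 2, c = 3)

# save() needs the file given by name through the file argument.
rda_path <- file.path(tempdir(), "lookuptable.rda")
save(lookuptable, file = rda_path)

# Remove the object to show that load() really restores it.
rm(lookuptable)

# load() recreates the object under its original name and
# returns the names of everything it restored.
loaded_names <- load(rda_path)
loaded_names
lookuptable
```

Because load restores objects under the names they were saved with, the file name itself does not have to match the object name. For data that a package's own functions need behind the scenes, the Writing R Extensions manual describes storing it in R/sysdata.rda, which may be closer to what the question is after.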
if you want to use the variables by name you have to first *attach* the data (e.g.) *attach(data_name)*. if you don't attach, then you have to use the $ to extract the variables (e.g.) *class(data_name$variable_name)*
i explain all of these in my series of videos. id recommend to watch the series of videos, in order, as they will address most of the questions you have or will have.
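A minimal sketch of the two ways to reach a variable, using a made-up data frame (the name demo and its values are hypothetical). The with() call is an extra base R option not mentioned in the reply above.

```r
# Made-up data frame for illustration.
demo <- data.frame(Age = c(6, 18, 16), Smoke = c("no", "yes", "no"))

# Without attaching: use $ to extract the variable.
age_class <- class(demo$Age)

# After attaching: variables can be called by name.
attach(demo)
age_class_attached <- class(Age)
detach(demo)   # detach again to avoid name clashes later

# with() evaluates an expression inside the data frame without attaching it.
mean_age <- with(demo, mean(Age))

age_class
age_class_attached
mean_age
```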
I figured this out. But now I am having a problem in reading the variables. I used names to get the names of variables from my data set, but when I try to use them, like class(variable name), it is not reading the variables.
Hi Mike. Hope you are doing well. I am trying to do some cost-effectiveness analysis for my PhD project. Can you suggest some R packages for cost-effectiveness analysis? I am also interested in meta-analysis with R. Thank you.
+Yiqun Lin thanks, things are good, hope you're doing well, and progressing with your work! those aren't areas of specialty for me, so i can only recommend things second hand, but not based on my own personal experience with them. for cost-effectiveness, id say to check out "BCEA" or "ICEinfer" , and for meta analysis, there are tons of packages that do different things...check out this link, as it explains a bunch of different packages, and what meta-analysis tools they have: http://cran.r-project.org/web/views/MetaAnalysis.html
Hi +Ramanan R. V. , yes, all commands will be the same using just the R console. the only difference with RStudio is that it takes the console, the plot window, etc, and organizes them a bit more neatly.
The videos are hugely helpful. But i am having problem opening a file directly from folder as shown in the video
"data1 = read.table(file="C:\user\Dominator\Desktop\ratTAB.txt", header=T, sep="\t")"
and i am getting an error as Error: '\u' used without hex digits in character string starting ""C:\u" Not sure what's wrong, can you help please.
Hi +Srijan Ghosh , it's always hard to troubleshoot these things from a distance, but my guess is that you're working in Windows, and you will need to double each backslash in the path, (e.g.) C:\\user\\Dominator\\... , because R reads a single backslash as the start of an escape sequence. Or, i know for my mac, it uses forward slashes, not back-slash, and R accepts forward slashes on Windows too. what i think is easiest is to go to the file you want to import and look at the "properties" of it, and somewhere there it should give you the location of the file, and you can see how to specify the path to the file from there... hope that helps you sort it out
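A minimal sketch of two ways to write the Windows path from the question so that R does not read \u as the start of an escape sequence. The file does not need to exist for this demonstration, and the variable names are hypothetical.

```r
# Option 1: double every backslash in the path.
escaped <- "C:\\user\\Dominator\\Desktop\\ratTAB.txt"

# Option 2: use forward slashes, which R also accepts on Windows.
forward <- "C:/user/Dominator/Desktop/ratTAB.txt"

# Both spellings describe the same location once the separators match.
same_path <- identical(gsub("\\", "/", escaped, fixed = TRUE), forward)
same_path

# Either string could then be passed to read.table, for example:
# data1 <- read.table(file = forward, header = TRUE, sep = "\t")
```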
You do not need to calculate payroll premium if your policy is on a stipulated billing cycle.
Sample Payroll Report.
However, if your policy is not yet on a stipulated billing cycle, this is typically what you will see when we send you a payroll report.
A split payroll report is sent when there is an Anniversary Rating Date on your policy, which is the month and day that rates, rating plans and rating systems are initially applied to a policy in force and each annual anniversary thereafter. Your payroll will need to be annotated for each period specified.
Sample Split Payroll Report.
We will need a complete employee job description before we add the classification to the policy. Please do not report payroll in the new classification until it has been reviewed and endorsed to your policy.
Job Duties Questionnaire.
We recommend you keep a copy of your previous payroll reports and payroll records for at least seven years, as you would your tax records.
Submitting Payroll Reports.
There are three different ways to submit your payroll reports.
State Compensation Insurance Fund P.O. Box 7441 San Francisco, CA 94120-7441.
Free payroll reports.