Posts

redditApiR

Daniel Tafmizi Dr. Friedman Lis 4370 redditApiR
Github: DanielDataGit/RedditApiR: redditApiRPackage

Authenticating Reddit Account:
1. Create or log into an account.
2. Ensure the account is verified by going to account settings-email-verify.
3. Go to preferences (reddit.com).
4. Create an App (personal use script for data collection needs).
5. Use the following script:

    library(httr)

    # Initialize parameters (replace each placeholder with your own value)
    client_id <- "Code under 'personal use script'"
    client_secret <- "next to secret"
    user_agent <- "app name"
    username <- "Account username"
    password <- "Account Password"

    # Get the access token
    response <- POST(
      "https://www.reddit.com/api/v1/access_token",
      authenticate(client_id, client_secret),
      body = list(
        grant_type = "password",
        username = username,
        password = password
      ),
      encode = "form",
      user_agent(user_agent)
    )

    # Extract the access token
    token <- content(response)$acc...

RMD

Daniel Tafmizi Dr. Friedman Lis 4370 Module 12
Github: daniel.R/Work.R/LIS4370Rprog/mod12.Rmd at main · DanielDataGit/daniel.R

I have been using the R Markdown structure in my Data and Text Mining course. With this practice, I have become familiar with the structure and the HTML output. One new thing I learned was the kable function, which offers a visually appealing way to structure R table output. I am excited to test more knitr functions while creating vignettes for my final project. I think R Markdown documents are a great way to teach users how a package works.
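As a rough illustration of the kable function mentioned above, here is a minimal sketch. It assumes the knitr package is installed and uses the built-in mtcars data as a stand-in; the caption text and column choice are arbitrary, not taken from the module.

```r
library(knitr)

# Stand-in data: first rows and three columns of the built-in mtcars set
cars_small <- head(mtcars[, c("mpg", "cyl", "hp")])

# kable() turns a data frame into a formatted table; inside an R Markdown
# chunk the result renders as a table in the knitted output
tbl <- kable(cars_small, digits = 1, caption = "First rows of mtcars")
print(tbl)
```

Outside of a knitted document the table prints as plain text, which is a quick way to check the layout before knitting.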

Debugging in R

Daniel Tafmizi Dr. Friedman Lis 4370 Module 11
Github: daniel.R/Work.R/LIS4370Rprog/debugR.R at main · DanielDataGit/daniel.R

When debugging, I came across two errors, both from the line "outliers[, j] <- outliers[, j] && tukey.outlier(x[, j])". First, && expects single logical values, but here it is applied to whole columns. Second, no tukey.outlier function is defined. To resolve this, I wrote "outliers[j] <- all(outliers[, j])". This removed both the logical coercion problem and the call to the missing tukey.outlier function. Since we have no method for detecting outliers, this just makes the answer TRUE for all.
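For comparison, here is one way the loop could work if an outlier test were available. This is only a sketch under assumptions: tukey.outlier below is a hypothetical stand-in that uses the common 1.5 * IQR rule, and tukey_multiple and the sample matrix are made up for illustration. The key change is the element-wise & in place of &&.

```r
# Hypothetical stand-in for the missing helper: flags values that lie more
# than 1.5 * IQR outside the first and third quartiles
tukey.outlier <- function(x) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Flags rows that are outliers in every column (function name is assumed)
tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # `&` compares element by element; `&&` only accepts single values
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  apply(outliers, 1, all)
}

# Made-up data: the last row is extreme in both columns
m <- cbind(a = c(1, 2, 3, 4, 100), b = c(10, 11, 12, 13, 500))
flags <- tukey_multiple(m)
print(flags)
```

With a real detection rule in place, the result depends on the data instead of being TRUE for every row.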

Description File

Daniel Tafmizi Dr. Friedman Lis 4370 Module 10
Github: daniel.R/Work.R/LIS4370Rprog/finalproject/DESCRIPTION at main · DanielDataGit/daniel.R

For my final project, I would like to incorporate my fellowship project. In it, I will determine the architecture of the fellowship project and start to compile techniques for ML modelling and the corresponding visualizations. I plan to define techniques for supervised ML, broken down into classification and regression. I hope to make this as beginner-friendly as possible, as I have had, and continue to have, my fair share of trouble with the ML jargon found in professional sources. Overall, I hope to mimic the scikit-learn Python library in a more R- and beginner-friendly way. If the fellowship project heads in a different direction, this project will still be a great step toward understanding how I can frame different elements for ML visualization.

R Visualization

Daniel Tafmizi Dr. Friedman Lis 4370 Module 9
Github: daniel.R/Work.R/LIS4370Rprog/wineClusterViz.R at main · DanielDataGit/daniel.R

I did some clustering analysis with the wine dataset, which includes country, alcohol as liters of wine, deaths per 100,000, heart disease per 100,000, and liver disease per 100,000. Several correlations stand out. Unsurprisingly, heart disease and death are positively correlated. Alcohol and liver disease have a positive correlation. Interestingly, heart disease and alcohol have a negative correlation.

I chose to use k-means as opposed to knn because of the small dataset. I think the algorithm did a great job of creating clusters. I found that three clusters resulted in the most uniformity. It is interesting to see where each country ends up on the map.

I thought this was a really cool visualization. It incorporates a dendrogram into the heatmap, showing us the clusters in a different style. This shows us that there are 5 cluster hi...
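The general workflow described above (correlations, k-means with three clusters, and a heatmap with a dendrogram) can be sketched in base R. This is not the script from the repository: the wine data is not reproduced here, so the built-in USArrests dataset serves as a stand-in, and the seed and nstart values are arbitrary choices.

```r
set.seed(1)  # k-means starts from random centers, so fix the seed

# Stand-in data: USArrests replaces the wine dataset from the post
scaled <- scale(USArrests)  # put every column on a comparable scale

# Pairwise correlations between the variables
print(round(cor(USArrests), 2))

# k-means with three clusters; nstart reruns the algorithm from several
# random starts and keeps the best solution
km <- kmeans(scaled, centers = 3, nstart = 25)
print(table(km$cluster))

# Heatmap with row and column dendrograms from hierarchical clustering
heatmap(scaled, scale = "none")
```

Scaling first matters because k-means uses distances, so a column with large units would otherwise dominate the clusters.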

Mod 8 Input/Output

Daniel Tafmizi Dr. Friedman Lis 4370 Module 8
Github: daniel.R/Work.R/LIS4370Rprog/Mod8input_output.R at main · DanielDataGit/daniel.R

After looking at the step-by-step solutions, I see that I misunderstood the first prompt. I used group_by to summarize by sex and give the mean age and grade. For the second command, I used the approach that was listed to extract the [iI] names. The subset with grepl worked perfectly: it kept only the names containing the letter i or I. I find the syntax a little weird. Using "[iI]" is not what I would expect. I was thinking it would be something like c("i", "I") or the | operator.

Step 1:
     Sex    Age   Grade
1 Female 21.375 86.9375
2   Male 21.250 80.2500

Step 2:
        Name Age    Sex Grade
3      Lauri  21 Female    90
4     Leonie  21 Female    91
6    Mikaela  20 Female    69
8       Aiko  24 Female    97
9   Tiffaney  21 Female    78
10    Corina  23 Female    81
11 Petronila  23 Female    98
12    Alecia  20 Female    87
13 ...
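To make the "[iI]" syntax less mysterious: in a regular expression, square brackets define a character class that matches any one of the listed characters. The sketch below uses a small made-up data frame (not the assignment data) and base R's aggregate in place of the dplyr group_by mentioned above, so it runs without extra packages. It also shows that the | operator does work inside the pattern.

```r
# Made-up stand-in data; names and values are for illustration only
students <- data.frame(
  Name  = c("Ana", "Ivan", "Bo", "Mila", "Tess", "Omar"),
  Age   = c(21, 22, 20, 23, 21, 24),
  Sex   = c("Female", "Male", "Male", "Female", "Female", "Male"),
  Grade = c(90, 80, 70, 95, 85, 75)
)

# Step 1: mean Age and Grade per Sex (base R alternative to group_by)
means <- aggregate(cbind(Age, Grade) ~ Sex, data = students, FUN = mean)
print(means)

# Step 2: keep names containing i or I; three equivalent patterns
with_i   <- subset(students, grepl("[iI]", Name))                   # character class
same_alt <- subset(students, grepl("i|I", Name))                    # alternation
same_ic  <- subset(students, grepl("i", Name, ignore.case = TRUE))  # ignore case
print(with_i)
```

A vector such as c("i", "I") does not work as the pattern because grepl expects a single pattern string.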

Mod 7 OOP

Daniel Tafmizi Dr. Friedman Lis 4370 Module 7

1. How do you tell what OO system (S3 vs. S4) an object is associated with?
Calling isS4() on the object returns TRUE if it was designed under the S4 system. Base R has no isS3() function; an object is S3 if is.object() returns TRUE and isS4() returns FALSE.

2. How do you determine the base type (like integer or list) of an object?
typeof() returns the base (storage) type. Two related functions answer different questions: class() gives the class used for method dispatch, and mode() gives a coarser grouping of types.

3. What is a generic function?
A function that can be applied to different data and class types. A generic function has many methods for differing data and class types and calls the appropriate one as needed.

4. What are the main differences between S3 and S4?
Many differences lie in how operations are called in either system. These differences arise from the fact that S4 objects require explicit class, structure, and inheritance definitions, while S3 objects do not. I like S4 objects because they remind me of classes in ...
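The answers above can be checked with a small sketch. All class, generic, and helper names here (dog, Cat, speak, speakS4, is_s3) are invented for illustration; is_s3 in particular is a hypothetical helper, since base R does not provide one.

```r
library(methods)  # S4 tools; loaded explicitly so the script also runs via Rscript

# S3: the class is just an attribute, and methods are named generic.class
dog <- structure(list(name = "Rex"), class = "dog")
speak <- function(x, ...) UseMethod("speak")          # S3 generic
speak.dog <- function(x, ...) paste(x$name, "says woof")

# S4: class, slots, generic, and method are all declared formally
setClass("Cat", representation(name = "character"))
setGeneric("speakS4", function(x) standardGeneric("speakS4"))
setMethod("speakS4", "Cat", function(x) paste(x@name, "says meow"))
cat_obj <- new("Cat", name = "Tom")

# Hypothetical helper: an object with a class attribute that is not S4
is_s3 <- function(x) is.object(x) && !isS4(x)

print(c(dog_is_s3 = is_s3(dog), cat_is_s4 = isS4(cat_obj)))
print(typeof(dog))   # base type of the S3 object
print(class(dog))    # class used for dispatch
print(speak(dog))
print(speakS4(cat_obj))
```

The S3 object is an ordinary list with a class attribute attached, while the S4 object can only be created through new() against a declared class.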