Statistics

A strange (if very French!) debate is taking place these days in the French main chamber, where some socialist deputies are contesting an incoming change in the regulation of university studies that would allow some courses to be taught in… English! Quelle horreur!!! Since this option has already been implemented by many universities, incl. Dauphine, it means that we are all acting outside the law! I do not fear in the least being indicted for teaching R and Bayesian statistics in English… However, I find that these deputies are missing the point: just like most other Western countries, we need to attract bright students from emerging countries in order to keep our departments open. It is unrealistic to think that those students will agree to learn French in addition to English, just because our universities are that attractive (and they are not!). Plus, our own students are asking for courses in English, as they realise that their English level is not that great and that this training is more efficient than regular English courses… This position was better expressed in a Le Monde op-ed (tribune) a few days ago, signed by several university professors, incl. Cédric Villani.
Filed under: Kids, Travel, University life Tagged: English, French, French universities, Le Monde, loi Toubon
about 6 hours ago
(This article was first published on lukemiller.org » R-project, and kindly contributed to R-bloggers) In the previous post I outlined how to query the XTide software with R and parse the results into a handy-dandy data frame. The biggest hurdle with that method is getting XTide up and running on your computer. The code outlined here works entirely within R, so you don’t need XTide installed on your computer. [...] To leave a comment for the author, please follow the link and comment on his blog: lukemiller.org » R-project. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...
about 9 hours ago
From Concrete Mathematics: Incidentally, when we’re faced with a “prove or disprove,” we’re usually better off trying first to disprove with a counterexample, for two reasons: A disproof is potentially easier (we just need one counterexample); and nit-picking arouses our creative juices. Even if the given assertion is true, our search for a counterexample often leads us to a proof, as soon as we see why a counterexample is impossible. Besides, it’s healthy to be skeptical.
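As a small illustration of the "try to disprove first" advice, here is a minimal Python sketch of a brute-force counterexample search. The claim being tested (that n² + n + 41 is prime for every non-negative integer n) is an example chosen for this sketch and is not taken from the quoted passage; the helper names `is_prime` and `first_counterexample` are likewise hypothetical.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers tested here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def first_counterexample(claim, candidates):
    """Return the first candidate for which the claim fails, or None."""
    for n in candidates:
        if not claim(n):
            return n
    return None


# Illustrative claim (an assumption of this sketch): n^2 + n + 41 is always prime.
counterexample = first_counterexample(
    lambda n: is_prime(n * n + n + 41), range(1000)
)
print(counterexample)  # 40, since 40^2 + 40 + 41 = 1681 = 41^2
```

A single failing value settles the question, and seeing why it fails (the expression becomes a multiple of 41) is the kind of insight the quote says such a search tends to produce.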
about 10 hours ago
(This article was first published on a modeler's tribulations, gopi goteti's web log, and kindly contributed to R-bloggers) If you want to create rainfall maps for the whole world in R, there is no readily available code or package to do this. Moreover, data publicly available from research institutions is generally not in plain text or other familiar formats. Hydrological and climatological studies sometimes require rainfall data over the entire world for long periods of time. The Climate Prediction Center (CPC), with daily data from 1979 to present, is a good resource. This data is available at CPC’s ftp site (ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/). I created R code to download rain/snow (or precipitation, to be scientific) data from the CPC’s ftp site and plot it. All my code is available at my GitHub site - https://github.com/RationShop/cpcRain. The Rmarkdown document showing some examples is at my RPubs site - http://rpubs.com/RationShop/cpcRain. To leave a comment for the author, please follow the link and comment on his blog: a modeler's tribulations, gopi goteti's web log.
about 11 hours ago
(This article was first published on plausibel, and kindly contributed to R-bloggers) I just pushed the most recent version of the PSID panel data builder introduced a little while ago. Got some user feedback and made some improvements. The package is hosted on GitHub.
News:
I added a reproducible example using artificial data which you can run by calling 'example(build.panel)'. This means you can try out the package before bothering to download anything, and it provides a simple test of the main function.
I've included a suggestion to use the R survey package to analyse this dataset and made it explicit in the examples how to obtain the desired weights for each wave. Note that your results are invalid in the majority of cases if you ignore the survey design (i.e. the weights).
I got some useful comments from Anthony Damico (thanks!) and integrated the SAScii package (check out his tutorials at http://www.asdfree.com/). This allows one to download the data directly from the PSID server into R, thereby removing any dependency on Stata or SAS to preprocess the raw data. (As is common with large datasets, the raw data come in ASCII format that needs to be fixed up into rows and columns.) The downside is that downloading directly takes a rather long time: downloading FAM1985ER, FAM1986ER and the index IND2009ER took three and a half hours.
Hopefully I can get another round of feedback before submitting to CRAN (particularly from a Windows user: working on a unix system, I could not test that all the paths are written correctly on Windows).
flo
To leave a comment for the author, please follow the link and comment on his blog: plausibel.
about 12 hours ago
Filed under: Kids, pictures, Travel Tagged: E15 highway, graffitis, Paris suburbs, tags, taxi
about 17 hours ago
(This article was first published on TRinker's R Blog » R, and kindly contributed to R-bloggers) I started working with R 2 1/2 years ago. I remember opening R, closing it, and thinking it was the dumbest thing ever (a command line is not inviting to a non-programmer). Now it’s my constant friend. From the beginning I took notes to remind myself of all the things I learned and relearned. They’ve been invaluable to me in learning. They are not particularly well arranged, nor do they credit sources properly. There are likely bad or outdated practices in there, but I figured they may be helpful to others learning the language and so I’m sharing. Note that: 1) they are poorly arranged, 2) they may have mistakes, 3) they don’t credit others’ work properly or at all. They were for me, but now I think maybe others will find them useful, so here they are: click here *Note that the file is large: ~7000KB and 274 pages. To leave a comment for the author, please follow the link and comment on his blog: TRinker's R Blog » R.
1 day ago
One of the movies I watched during my hospitalisation is detachment, by Tony Kaye, with Adrien Brody as the lead actor. My daughter brought it to me as she remembered I was interested in it. detachment is a strong and highly original movie about the U.S. school system and the complete lack of prospects for the students in deprived suburbs. I have seen several movies of that kind in the past, some of them rather good and keeping away from the fairy tale that an exceptional teacher is enough to rescue a class cohort or even a single student from a bleak future. This one is however the most pessimistic of all, with no happy ending of any sort (except for the last minute, which should have been cut). The plot is not flawless, e.g. the main teacher’s redemption of the young prostitute is just too unrealistic, but the burnout of the teachers, the newspeak preaching of the administration, the nihilism of the high school students, the bullying of unusual students, and the complete absence of the parents (unless I am confused, we only see one [screaming] mother once, no parent shows up at parents’ night, and the bullying father is only a voice…) make up for those flaws. Adrien Brody delivers a superb performance in a great movie, sadly about a terrible issue with our educational system(s)…
Filed under: Books, Kids Tagged: Adrian Brody, detachment, high school, movie review
1 day ago
(This article was first published on Econometrics_Help, and kindly contributed to R-bloggers) Every month I see one or more new R-based web server solutions coming onto the market. After looking at some of them, I thought of sharing one of my old architecture maps, presented to a client back in early 2009 (it is good to see a scalable and customizable open source statistical computing tool spreading so quickly in the market). To leave a comment for the author, please follow the link and comment on his blog: Econometrics_Help.
1 day ago
(This article was first published on bayesianbiologist » Rstats, and kindly contributed to R-bloggers) I am currently working on a validation metric for binary prediction models. That is, models which make predictions about outcomes that can take on either of two possible states (e.g. dead/not dead, heads/tails, cat in picture/no cat in picture, etc.). The most commonly used metric for this class of models is AUC, which assesses the relative error rates (false positive, false negative) across the whole range of possible decision thresholds. The result is a curve, the Receiver Operating Characteristic (ROC) curve [figure: example ROC curve], and the area under it is some value between 0 and 1. The higher this value, the better your model is said to perform. The problem with this metric, as many authors have pointed out, is that a model can perform very well in terms of AUC, but be completely miscalibrated in terms of the actual probabilities placed on each outcome. A model which distinguishes perfectly between positive and negative cases (AUC=1) by placing a probability of 0.01 on positive cases and 0.001 on negative cases may be very far off in terms of the actual probability of a positive case. For instance, positive cases may actually occur with probability 0.6 and negative cases with 0.2. In most real situations, our models will predict a whole range of different probabilities with a unique prediction for each data point, but the general idea remains. If your goal is simply to distinguish between cases, you may not care whether the probabilities are correct. However, if your model is purporting to quantify risk, then you very much want to know if you are placing the probabilistically true predictions on cases that are yet to be observed. Which raises the question: What is probabilistic truth? This question appears, at least at first, to be rather simple.
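The gap between discrimination and calibration described above can be checked numerically. The following is a minimal Python sketch, not the author's method: it simulates two groups of cases with true event probabilities 0.6 and 0.2 (the figures from the text), scores them once with the true probabilities and once with the shrunken values 0.01 and 0.001, and compares AUC with the Brier score. The Brier score is brought in here only as one common calibration-sensitive measure and is an assumption of this sketch, as are all function and variable names. Because the outcomes are simulated as random draws, the AUC is below 1, unlike the idealised AUC=1 case in the text.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(scores, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)


random.seed(1)
n_cases = 1000

# Half the cases are "high risk" (p = 0.6), half "low risk" (p = 0.2).
true_p = [0.6 if i % 2 == 0 else 0.2 for i in range(n_cases)]
labels = [1 if random.random() < p else 0 for p in true_p]

calibrated = true_p                                        # predicts the true probabilities
shrunk = [0.01 if p == 0.6 else 0.001 for p in true_p]     # same ranking, wrong scale

auc_calibrated, auc_shrunk = auc(calibrated, labels), auc(shrunk, labels)
brier_calibrated, brier_shrunk = brier_score(calibrated, labels), brier_score(shrunk, labels)
observed_rate = sum(labels) / n_cases

print("AUC:   calibrated", auc_calibrated, "shrunk", auc_shrunk)
print("Brier: calibrated", brier_calibrated, "shrunk", brier_shrunk)
print("observed rate", observed_rate,
      "mean prediction: calibrated", sum(calibrated) / n_cases,
      "shrunk", sum(shrunk) / n_cases)
```

Both sets of predictions rank the cases identically, so their AUC values coincide, while the shrunken predictions sit far from the observed event rate and score worse on the Brier score.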
A frequentist definition would say that the probability is correct, or true, if the predicted probability is equal to the long-run frequency of outcomes. Think of a die rolled over and over, counting the number of times a one is rolled. We would compare this frequency to our predicted probability of rolling a one (1/6 for a fair six-sided die) and would say that our predicted probability was true if this frequency matched 1/6. But what about situations where we can’t re-run an experiment over and over again? How then would we evaluate the probabilistic truth of our predictions? I’ll be working through this problem in a series of posts in the coming weeks. Stay tuned! To leave a comment for the author, please follow the link and comment on his blog: bayesianbiologist » Rstats.
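The die-rolling thought experiment above is easy to simulate. Here is a minimal Python sketch, assuming a fair six-sided die and using a hypothetical helper named `long_run_frequency`; it shows the observed frequency of ones being compared with the predicted probability 1/6 as the number of rolls grows.

```python
import random


def long_run_frequency(n_rolls, face=1, seed=0):
    """Fraction of n_rolls simulated fair-die rolls that show `face`."""
    rng = random.Random(seed)
    hits = sum(rng.randint(1, 6) == face for _ in range(n_rolls))
    return hits / n_rolls


predicted = 1 / 6
for n_rolls in (100, 10_000, 200_000):
    observed = long_run_frequency(n_rolls)
    print(n_rolls, "rolls: observed", round(observed, 4),
          "vs predicted", round(predicted, 4))

freq = long_run_frequency(200_000)
```

This check only works because the experiment can be repeated as often as we like, which is exactly what is unavailable in the one-off situations the post goes on to ask about.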
1 day ago