Biostatistician/Professor/Founding Chair of Biostatistics, Vanderbilt U. Blog: Statistical Thinking: fharrell.com. @f2harrell on bsky.social

Nashville, TN
We lost one of the greatest statisticians of all time: Sir David Cox, developer of both the binary logistic regression model (1958) and the Cox proportional hazards model (1972), but so much more. rss.org.uk/news-publication/… @vandy_biostat #Statistics #biostat
13
260
911
My close colleague Sam Nwosu @vandy_biostat flabbergasted me today with the gift of a batch of biostatistics cookies made by his wife Brionni, including a cookie version of my book! Much easier to digest than the paper version ...
25
97
801
Announcing that my free text Biostatistics for Biomedical Research has been significantly updated and turned into a 22 chapter reproducible e-book thanks to @rstudio 's Quarto : hbiostat.org/bbr #bbrcourse #Statistics #clinicaltrials #RStats @vandy_biostat @EdgeforScholars
11
203
801
100 pages of online course notes added for Regression Modeling Strategies: intro to survival analysis and parametric survival models: hbiostat.org/doc/rms.pdf now 504 pages #bbrcourse #Statistics @vandy_biostat
14
204
797
What an 11 days. Positive stress echo test on July 15, cardiac cath on 19th, coronary bypass surgery on 20th, home 24th. Incredible medical care from cardiologist Dr See, surgeon Dr Shah, teams @VUMChealth, support and care from my md wife Liana & family/friends/colleagues.
111
5
741
Toying with the idea of hosting an almost-weekly one-hour live webinar (with student participation) on applied statistics/biostatistics. Some topic choices would come from twitter polls. Please respond to the poll in the next tweet if you are definitely interested.
77
58
713
Statistical thought of the day: bad statistical practice is so pervasive in so much of observational research that correlation is not correlation.
12
157
690
This t-shirt says it all
8
89
610
Charlotte Briggs Harrell 1926-2021 Today I lost my dear mom, two weeks after her 95th birthday. I feel a great emptiness but I also feel wonder at how could I have been this lucky to have had her as a mom.
69
4
531
#rstats discovery of the day: patchwork: elegant grammar for combining plots: github.com/thomasp85/patchwo… by @thomasp85
14
104
463
R Workflow is now an e-book at hbiostat.org/rflow - it's a result of 31 years of using R & its precursor S in reproducible biomedical research, and capitalizes on Quarto, data.table, ggplot2, Hmisc,... #Rstats @vandy_biostat @VUDataScience #DataScience #Statistics @rstudio
10
122
447
Regression Modeling Strategies course notes have been significantly expanded, updated, and converted into a free #quarto e-book at hbiostat.org/rmsc #rmscourse @vandy_biostat @VUDataScience #RStats @quarto_pub @EdgeforScholars #Statistics
2
108
398
Galben Harrell - one of the saddest days of my life to lose him to Cushing's disease today. He was about 12 years old. A finer friend and companion I could not imagine.
40
1
377
New R package rmsb: Bayesian counterpart to the rms (regression modeling strategies) package now on CRAN. Uses pre-compiled @mcmc_stan code. Information including lots of examples at hbiostat.org/R/rmsb #rmscourse @vandy_biostat @VUDataScience
3
82
361
Clinicians: please give me hope as a teacher by telling me that you understand that NNT does not apply to an individual patient unless the average baseline risk in the data used to compute NNT just happened to equal the baseline risk of the individual.
32
105
376
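A minimal sketch of the NNT point above, assuming (purely for illustration) that treatment multiplies every patient's odds of the outcome by the same odds ratio. The function names and the 0.7 odds ratio are hypothetical, not taken from any study:

```python
def risk_under_treatment(baseline_risk: float, odds_ratio: float) -> float:
    """Convert baseline risk to odds, apply the odds ratio, convert back to risk."""
    treated_odds = baseline_risk / (1 - baseline_risk) * odds_ratio
    return treated_odds / (1 + treated_odds)


def nnt(baseline_risk: float, odds_ratio: float) -> float:
    """Number needed to treat = 1 / absolute risk reduction."""
    absolute_risk_reduction = baseline_risk - risk_under_treatment(baseline_risk, odds_ratio)
    return 1 / absolute_risk_reduction


if __name__ == "__main__":
    # Same relative effect (assumed OR = 0.7), very different NNT by baseline risk.
    for risk in (0.02, 0.10, 0.30):
        print(f"baseline risk {risk:.2f}: NNT = {nnt(risk, 0.7):.1f}")
```

Under this assumption the NNT shrinks as baseline risk grows, so a single NNT computed at the average baseline risk describes only patients who happen to sit at that average.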
Most straightforward definition of p-value I've been able to write: the probability that someone else's data are more extreme than mine if their data were generated with my H0 in effect. p-values tell nothing more than that. fivethirtyeight.com/features…
15
115
333
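The p-value definition above can be made concrete with a coin-flip sketch. This assumes H0 is a fair coin and reads "more extreme" as "at least as far from the expected count as what I observed"; the function name is illustrative only:

```python
from math import comb


def fair_coin_p_value(heads: int, flips: int) -> float:
    """Probability, with H0 (fair coin) in effect, that another experimenter's
    head count lands at least as far from flips/2 as the observed count."""
    observed_distance = abs(heads - flips / 2)
    return sum(
        comb(flips, k) * 0.5 ** flips
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_distance
    )


if __name__ == "__main__":
    print(fair_coin_p_value(heads=8, flips=10))
```

Nothing in the calculation refers to the probability that H0 itself is true, which is the limitation the post is pointing at.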
Announcing datamethods.org - a place for discussions about data-related methods where methodologists meet clinical, translational, health researchers to discuss design, analysis, measurement, interpretation, articles, and more. Rationale: fharrell.com/post/disc
7
166
341
Biostatistics for Biomedical Research - 472 pages developed from collaborations with basic+clinical researchers. All my teaching materials not related to predictive modeling or Bayes. Clinicians: let me know what needs to be added. [source on Github] fharrell.com/doc/bbr.pdf
11
137
327
#Statistics thought of the day: Seek probability of a real difference given data, not prob. of data given no real difference. Be bold. Embrace Bayes. Embrace transparent criticizable use of prior beliefs & operate in a predictive actionable mode. fharrell.com/post/journey
6
79
317
Seeing the surgeon authors call us "trolls" reminds me of this: Apparently surgeons can practice statistics with zero training, but were I to practice surgery I would be arrested.
Statisticians clamor for retraction of paper by Harvard researchers they say uses a “nonsense statistic” @ADAlthousePhD @RetractionWatch retractionwatch.com/2019/06/…
14
70
296
The DeGroot prize is a huge honor in the #Statistics field. Congratulations Richard! Statistical Rethinking is a masterpiece. Stunningly intuitive and currently the most influential stats text of all.
The folks at @ISBA_events tell me my book Statistical Rethinking has won the 2024 DeGroot Prize for its contributions to "statistical inference, decision theory and statistical applications". This is a huge honor, especially given the previous winners, who have influenced me so much
1
38
302
21,255
Forget deep learning. We need to study deep stupidity.
Hundreds rally to preserve right not to vaccinate children amid measles outbreak cbsn.ws/2I3F2j1
11
69
284
#Statistics thought of the day: #MachineLearning is not so much for high dimensional data but for data where interrelationships among predictors and outcome are too complex to model with a statistical model that assumes additivity by default. @VUDataScience #rmscourse
6
36
266
I've greatly expanded my chapter on Bayesian clinical trial design with examples of Bayesian power and sample size simulations for time-to-event and ordinal outcomes, incorporating uncertainty in effect size to detect ... hbiostat.org/bayes/bet/desig… @vandy_biostat
6
58
278
34,332
Biostatistics for Biomedical Research almost-weekly web course registration is now open. Go to hbiostat.org/bbr for course details and registration link. @EdgeforScholars #bbrcourse @vandy_biostat @VUMChealth #StatThink
14
114
280
1/2 @drjohnm has a wonderful piercing commentary about an incredibly harmful paper published in @CircAHA. The paper authors' use of the phrase "real world" is repugnant. (A problem w/ substack: can't add comments there unless you pay). sensiblemed.substack.com/p/a… @EdgeforScholars
18
63
288
175,664
#Statistics thought of the day: If I voiced as many clinical opinions as some clinicians voice statistical opinions I'd be in hot water.
11
35
257
Wake-up call: #MachineLearning performance in time series forecasting: simple statistical methods outperformed complex algorithms: journals.plos.org/plosone/ar…
9
112
252
The ethics of not randomizing convalescent plasma needs serious consideration. The negative consequences on public health and science are potentially huge.
6
69
227
Idea for replacing our nearly broken journal and peer review systems: academics create their own electronic journals/archives, peer reviews are open and authored, and work with universities to give credit for peer review equivalent to (1/k) × writing a paper, for suitable k.
22
73
239
How to convince me of a subgroup effect: (1) don't do "subgroup analysis"; use model-based estimates on the whole sample; (2) show strong evidence for a pre-specified interaction adjusted for all main effects; (3) demonstrate smooth dose-response effect of interacting factor.
4
73
251
Statistical graphics resource suggestion of the day: I just stumbled upon the online #rstats plotly book. Terrific methods for graph construction, use of html widgets, linking graphs, and more: plotly-book.cpsievert.me . Highly recommend plotly graphics model for html reports.
3
79
239
The R Hmisc package, started in 1991, just underwent the biggest update in its history with version 5, now on CRAN. Many new functions and no longer loads other packages at startup: hbiostat.org/R/Hmisc #rstats @vandy_biostat @VUDataScience
4
32
228
30,494
For researchers using the wonderful REDCap electronic data capture/research data management system, Chapter 5 of hbiostat.org/rflow has new sections on automatic interfaces between REDCap and R. #rstats
3
57
237
26,996
Our checklist for authors for statistical issues in study design, analysis, and reporting has been updated and has a new home on datamethods. It is a wiki so that others can improve the content, in addition to posting suggestions as replies. discourse.datamethods.org/t/… #bbrcourse
4
105
231
Soon to release version 6.0 of the R rms package (a 29-year project). With much help from @mcmc_stan guru Ben Goodrich, it now has blrm for Bayesian binary/ordinal logistic models w/ random effects. Nomograms and other model graphics. Bayes is getting easier: hbiostat.org/R/rms/blrm.html
5
62
234
If @CDCgov is trying to gain back credibility they won't do it with crappy research like this @vandy_biostat @EdgeforScholars #COVID19
A few thoughts on the CDC's newest "science" 👇👇👇 vinayprasadmdmph.substack.co…
10
50
227
Starting with a job as a research aide as an 18 year old, I'm celebrating my 50th consecutive year working in cardiology. What a great field in which to collaborate, with great researchers! #cardiotwitter #Cardio #Cardiovascular @califf001 @DanMarkMD @boback @CMichaelGibson
5
8
231
Language to use to get favorable peer review in #medicine: We will use AI to describe heterogeneity of treatment effect, leading to #PrecisionMedicine and optimizing the number needed to treat, while making startling discoveries about pt's microbiome effects on medical decisions.
19
36
223
Outstanding paper! Especially suited for non-statisticians who want to get started with relaxing linearity assumptions without resorting to dreaded categorization techniques ... #Statistics
NEW PAPER in @bmj_latest "Dealing with continuous variables and modelling non-linear associations in healthcare data: practical guide" --> bmj.com/content/390/bmj-2024… #methodologymatters
2
55
248
19,753
Best book advertisement I could get. Thanks @ChelseaParlett and use hbiostat.org/rmsc to see several new case studies. hbiostat.org/bbr has several simpler case studies. #rmscourse #bbrcourse
📕Regression Modeling Strategies @f2harrell is the G.O.A.T. This book is like an encyclopedia for all the regression models you’re dying to use in your work. From ordinal models, to survival analysis…this book has it all. And endless case studies to see them in action.
8
32
213
35,229
Vaccine denier: one who has an understanding of benefits vs. risks that is so poor that when he is offered a parachute in a plane about to crash he declines the parachute because of an allergy to nylon.
7
66
224
Nice piece. Lack of proper normalization is the tip of the "avoiding #Statistics" iceberg: Forbes: How Data Scientists Turned Against Statistics. forbes.com/sites/kalevleetar… via @GoogleNews
6
82
212
It's easy to create a statistical checklist of what NOT to do - we've had this for years: biostat.mc.vanderbilt.edu/Ma…
The statistical checklist: Could there be a list of guidelines to help analysts do better work? andrewgelman.com/2018/07/17/…
6
70
212
Still believe in p-values? If so, you need to know exactly what you're getting: fharrell.com/post/pval-litan…
8
95
209
New major release of R rms package coming soon. Includes Bayesian Stan-based binary and ordinal logistic regression allowing for the rms capabilities such as partial effect plots, nomograms, etc. Examples here: hbiostat.org/R/rms/blrm.html #rstats @vandy_biostat @VUDataScience
7
45
204
Biostatistics for Biomedical Research web course is likely to start Sept 27. Voting for time of day is too close to call at present. First session will cover Chapter 3 of hbiostat.org/doc/bbr.pdf up to Random Variables.
11
63
216
The #RStats Hmisc package on CRAN is 30 years old and still getting a lot of enhancements and bug fixes. @vandy_biostat @VUDataScience
2
18
203
Optimum decision making in presence of uncertainty comes from probabilistic thinking. The relevant probs. are of a predictive nature: P(the unknown | the known). Thresholds are not helpful and are completely dependent on the utility/cost/loss function. fharrell.com/post/backwards-…
6
72
218
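To illustrate why a probability threshold depends entirely on the loss function, here is a small sketch. The two-action setup and the cost names are assumptions made for illustration, not something stated in the post:

```python
def expected_loss(prob_disease: float, treat: bool,
                  cost_needless_treatment: float, cost_missed_disease: float) -> float:
    """Expected loss of an action given P(disease | what is known)."""
    if treat:
        # Treating is only a loss when the patient turns out disease-free.
        return (1 - prob_disease) * cost_needless_treatment
    # Not treating is only a loss when the patient has the disease.
    return prob_disease * cost_missed_disease


def should_treat(prob_disease: float,
                 cost_needless_treatment: float, cost_missed_disease: float) -> bool:
    """Pick the action with the smaller expected loss."""
    return (expected_loss(prob_disease, True, cost_needless_treatment, cost_missed_disease)
            < expected_loss(prob_disease, False, cost_needless_treatment, cost_missed_disease))


if __name__ == "__main__":
    # Same predicted probability, different costs, different decisions.
    print(should_treat(0.2, cost_needless_treatment=1, cost_missed_disease=9))
    print(should_treat(0.2, cost_needless_treatment=1, cost_missed_disease=1))
```

In this setup the implied cutoff is cost_needless_treatment / (cost_needless_treatment + cost_missed_disease), so any fixed threshold such as 0.5 silently assumes the two errors cost the same.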
#Statistics thought of the day: #MachineLearning is to statistical models as #PrecisionMedicine (including biomarker-guided therapy and PRS) is to using standard clinical information. Neither ML nor precision med is living up to its hype.
9
50
199
New blog article to help in the choice between developing statistical models and #Machine_Learning algorithms (especially in #medicine): fharrell.com/post/stat-ml
8
93
198
#Statistics thought of the day: Of all the statistical assumptions that are routinely violated that matter the most, the linearity assumption is near the top of the list @vandy_biostat @VUDataScience #rmscourse @EdgeforScholars #bbrcourse
8
55
202
Big update to R Workflow includes Consort and Mermaid diagrams, analyzing data about the data, missing data patterns, descriptive graphics for discrete & continuous longitudinal data & time-to-event data. fharrell.com/post/rflow #RStats @vandy_biostat @VUDataScience @rstudio
3
38
200
Best advice for drawing an ROC curve: use invisible ink. The visibility would then match its utility.
6
38
191
#Statistics thought of the day: to relax the linearity assumption in regression, don't categorize continuous variables. Use cubic splines, which only categorize the 3rd derivative (jolt) of Y vs X #rmscourse #bbrcourse @vandy_biostat @VUDataScience
7
47
204
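The remark that cubic splines "only categorize the 3rd derivative" can be checked with a short sketch. It assumes an unrestricted cubic spline in the truncated power basis (the rms package uses restricted cubic splines, which add linear tail constraints); the coefficients and knots below are made up:

```python
def cubic_spline(x, poly_coefs, knot_coefs, knots):
    """f(x) = b0 + b1*x + b2*x^2 + b3*x^3 + sum_j c_j * max(x - t_j, 0)^3."""
    b0, b1, b2, b3 = poly_coefs
    value = b0 + b1 * x + b2 * x ** 2 + b3 * x ** 3
    for c, t in zip(knot_coefs, knots):
        value += c * max(x - t, 0.0) ** 3
    return value


def third_derivative(x, poly_coefs, knot_coefs, knots):
    """f'''(x) = 6*b3 + 6 * (sum of c_j over knots below x): a step function of x."""
    return 6 * poly_coefs[3] + sum(6 * c for c, t in zip(knot_coefs, knots) if x > t)


if __name__ == "__main__":
    poly, knot_c, knots = (0.0, 1.0, 0.0, 0.5), (2.0, -1.0), (1.0, 3.0)
    for x in (0.0, 1.5, 2.5, 4.0):
        print(x, cubic_spline(x, poly, knot_c, knots), third_derivative(x, poly, knot_c, knots))
```

The fitted curve and its first two derivatives stay continuous at the knots; only the third derivative jumps, which is why the cutpoints are invisible in the curve itself.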
The #RStats Hmisc package has another major update. One of the biggest changes is new output options for describe() including interactive sparklines for spike histograms. hbiostat.org/R/Hmisc @vandy_biostat
5
34
195
21,814
Surprising fact of the day for clinicians: often treatment efficacy estimates from narrowly focused RCTs are more relevant to clinical practice than "real world" estimates from diverse populations, because of systematic bias in the latter. fharrell.com/post/ehrs-rcts
5
70
192
#Statistics thought of the day: Witnessing the continued hype of #AI by academic medicine and industry makes me think that we should spend more time on #RHI (Real Human Intelligence). @vandy_biostat @VUDataScience #StatThink @MaartenvSmeden
12
34
182
To biomedical researchers: It’s not too late to learn from the medical #Statistics giant Doug Altman. Had his advice been heeded decades ago perhaps we would not have the scandal of poor research we see today.
Your yearly reminder that “We need less research, better research, and research done for the right reasons” #OTD 1948 Douglas Altman b (d 3 June 2018)🇬🇧 A brilliant statistics educator & “one of the most influential medical statisticians of the past 50 years” /5
5
44
188
19,593
Are you a fan of point null hypothesis testing in medical research? Save a lot of time and money---unless you are studying homeopathy, most dietary supplements, or acupuncture, you can safely assume all null hypotheses are false. fharrell.com/post/journey
8
57
189
#Statistics #clinicaltrial thought of the day: a randomized trial on a patient sample differing much from the target population provides a much better estimate of effectiveness for the target pop. than an observational study done ON the target pop. fharrell.com/post/rct-mimic
11
59
197
Sometimes I wish we had introduced ordinal variable values as letters of the alphabet so people would not be tempted to use non-interval-scaled ones as numeric. This will be a great presentation by @rlmcelreath . And the R brms package can handle ordinal X and ordinal Y.
Likert scores are not integers and they cannot be subdued by pretense. Stop pretending and meet me in the warm 3rd circle of stats hell and learn about ordered categorical models. Lecture: piped.video/watch?v=VVQaIkom…
4
26
186
24,798
Highly probable statistical fact of the day: the gains in predictive ability in medical research claimed by #MachineLearning that are validated are less than the gains that would be achieved by applying best statistical practice to statistical modeling #StatThink
3
73
180
Wonderful Mother's day with younger brother Bill (aka Moose, also a Vandy guy) and inspirational 94 year old mom Charlotte
4
1
177
Significant update to the R Hmisc package to version 4.2-0. Hmisc is now > 25 years old! Changes are described here: cran.r-project.org/web/packa… . Many of the changes relate to html report writing and plotly graphics.
5
31
178
When two employees are fired for reporting sexual harassment and the alleged harasser is not, the culture and actions of a company are worth a closer look.
2
44
173
This is one of the best introductions to Bayesian inference for non-statisticians I've ever seen, plus a great overview of frequentist #Statistics: jmir.org/2018/10/e10873 @jmirpub @marcusbendtsen #bbrcourse #Bayesian
6
45
183
Statistical thought for the day - why classification is seldom what is needed for decision making and why probabilistic thinking is helpful: fharrell.com/post/classifica….
3
66
185
A lot of BS is being pushed to the scientific community in the form of dressing up inadequate design so that one can pretend to answer a question with inadequate data. Watch out.
You know it’s true
9
44
194
26,509
For anyone manipulating longitudinal data, the data.table package in #rstats is invaluable. Here is a new example, of regularizing irregularly-timed longitudinal measurements: hbiostat.org/rflow/long#sec-… @vandy_biostat #Statistics #DataScience
42
176
17,186
Extremely important methods comparison: multiple different analyses of the same dataset
Published! "Many Analysts, One Dataset: Making transparent how variations in analytical choices affect results" 65 of us led by Raphael Silberzahn demonstrate the contingencies of analytic decisions on observed outcomes. doi.org/10.1177/251524591774… (OA: psyarxiv.com/qkwst/)
6
58
163
Nice discussion of how to collapse/reduce a large number of levels in a categorical predictor: stats.stackexchange.com/ques…
3
48
177
Final results are in. Thanks to 1422 voters! Friday mornings 10am US ET works for most people (sorry Australia!) for live stream of free BBR biostatistics course. More details about planning are at fharrell.com#teaching and datamethods.org/t/bbr-video-…
If interested in participating in an almost-weekly 1 hour applied stat/biostat series please respond:
12
61
174
#Statistics thought of the day: If sponsors knew how much money was wasted with fixed sample size designs, and how much earlier Bayesian sequential designs would have bailed out on ineffective treatments, they'd be shocked. hbiostat.org/bayes/bet/desig…
8
38
175
25,196
This is a beautiful display - so many claims about machine learning are exaggerated, and so many comparisons with statistical models have used only trivial out-of-date statistical models. #Statistics
Logistic regression still the GOAT😤: When you fix the problems in papers claiming "ML predicts XYZ amazingly well", you end up with ML ~= LR cell.com/patterns/fulltext/S…
1
39
180
16,080
The Last Walk - noble friend Galben before going for surgery, complications of which he could not overcome (caught by security cam):
14
157
Turned off by statistical significance? Afraid that clinical significance cannot be derived from null hypothesis testing? Worried about choice of non-inferiority margins? Bayesian posteriors provide evidence for all possible effect magnitudes: bmjopen.bmj.com/content/bmjo…
4
43
170
This is a must-see on many levels. While watching it I became frightened at how things are so similar in my field of #Statistics especially related to @skdh 's comment "They just wanted to write papers", plus how fad-driven is #Statistics.
How I fell out of love with academia (this video was an accidental publication/scheduling blunder😬😬 but well uh, happy Friday I guess) piped.video/watch?v=LKiBlGDf…
8
28
179
47,143
#Statistics thought of the day: If age is a strong prognostic factor, a hazard ratio that doesn't adjust for age is effectively comparing some of the younger patients on treatment A with some of the older patients on treatment B even with perfect covariate balance. #bbrcourse
4
33
164
An honor to have Ellie Murray @EpiEllie visit @vandy_biostat and to be able to attend a great seminar and chat with her, plus to take part in a @casualinfer podcast recording with Ellie and @LucyStats . #epitwitter
5
9
168
#Statistics thought of the day: One of the most misleading ideas perpetuated by some #MachineLearning advocates is that a ML method requiring 𝗺𝗼𝗿𝗲 parameters to be estimated can get by with 𝘀𝗺𝗮𝗹𝗹𝗲𝗿 sample sizes. stats.stackexchange.com/ques… @vandy_biostat @VUDataScience
7
43
165
#Statistics thought of the day: If you must use any cutpoints for continuous variables, only use them for the 3rd derivative of how the variable relates to outcome, i.e., use cubic splines #statstwitter @vandy_biostat @VUDataScience hbiostat.org/rmsc/genreg.htm…
2
27
165
Giving a seminar @Stanford @StanfordMed yesterday and having the incomparable Brad Efron in attendance was a deep honor. Not to mention the amazing @HeartBobH (background), Rob Tibshirani, Trevor Hastie, @goodmanmetrics and so many others I revere ... daunting! @vandy_biostat
6
12
156
Relative risk was never a good measure; it's perceived to be interpretable precisely because it is misinterpreted. This excellent paper helps to put the nail in the coffin with data and math. @TChivese #bbrcourse #rmscourse @vandy_biostat
#openaccess Questionable utility of the relative risk in clinical research: A call for change to practice jclinepi.com/article/S0895-4…
9
41
155
Statistical quote of the day. Stepwise variable selection has done incredible damage to science. How did we statisticians let this happen?
A journey of a thousand hypotheses begins with a single stepwise regression
7
61
150
New chapter in Regression Modeling Strategies #rms: ordinal regression generalizes Wilcoxon, log-rank, Kaplan-Meier, Cox PH model, and all common survival time analyses. Semiparametric regression is a unifying concept. hbiostat.org/rmsc/ordsurv #rstats #Statistics
2
37
174
10,118
Clinical trialists: A major inefficiency in randomized trials, resulting in inflation of needed sample size, is the belief by investigators in dichotomizing individual patient responses to be in line with how you want to interpret the study. Not needed! hbiostat.org/proj/covid19/st…
2
47
166
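A rough sketch of the sample-size inflation from dichotomizing, under assumptions that are mine rather than the post's: a normally distributed continuous response, a two-arm trial, and the standard normal-approximation sample size formulas. The effect size and cutoff are hypothetical:

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_sum(alpha: float = 0.05, power: float = 0.9) -> float:
    """z_{1-alpha/2} + z_{power}, shared by both sample size formulas."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)


def n_per_arm_continuous(delta: float, sd: float = 1.0) -> int:
    """Per-arm n to detect a mean difference delta on the continuous response."""
    return ceil(2 * (z_sum() * sd / delta) ** 2)


def n_per_arm_dichotomized(delta: float, cutoff: float, sd: float = 1.0) -> int:
    """Per-arm n when each response is first reduced to 'above cutoff: yes/no'."""
    p_control = 1 - NormalDist(0.0, sd).cdf(cutoff)
    p_treated = 1 - NormalDist(delta, sd).cdf(cutoff)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil(z_sum() ** 2 * variance / (p_treated - p_control) ** 2)


if __name__ == "__main__":
    delta = 0.5  # assumed true effect, in SD units
    print("continuous:  ", n_per_arm_continuous(delta))
    print("dichotomized:", n_per_arm_dichotomized(delta, cutoff=delta / 2))
```

Even with the cutoff placed at the most favorable point (midway between the two arm means), the dichotomized analysis needs roughly half again as many patients under these assumptions; poorly placed cutoffs cost more.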