It's time to stop making t-SNE & UMAP plots. In a new preprint w/ Tara Chari we show that while they display some correlation with the underlying high-dimensional data, they don't preserve local or global structure & are misleading. They're also arbitrary. 🧵 biorxiv.org/content/10.1101/…
You have to hand it to Lex Fridman. His grift is not an amateur job. Take his Twitter photo. A professor standing in front of a blackboard with some math. Right?
Community note
Lex Fridman is a research scientist at MIT.
mit.edu/directory/?id=…
lexfridman.co
This recently published figure by @Sarah_E_Ancheta et al. is very disturbing and should lead to some deep introspection in the single-cell genomics community (I doubt it will).
It demonstrates complete disagreement among 5 widely used "RNA velocity" methods 1/
Aristotle was the first to notice honeybees dancing. In 1927 Karl von Frisch decoded the waggle. How it works was "explained" by MV Srinivasan AM FRS in the 1990s. Except @NeuroLuebbert found his papers are junk. A 🧵 about her discovery & our report: arxiv.org/abs/2405.12998 1/
The choice of whether to use Seurat or Scanpy for single-cell RNA-seq analysis typically comes down to a preference of R vs. Python. But do they produce the same results? In biorxiv.org/content/10.1101/… w/ @Josephmrich et al. we take a close look. The results are 👀 1/🧵
This is the paper that the terrorist who killed 10 people in Buffalo cited. @sapinker described the genetic variants as "collectively predict[ing] a big chunk of variance in educational attainment", which is false. 1/2
I've noticed it's becoming increasingly common in genomics to report results of regressions with ridiculously low correlation as "significant" based on a tiny p-value (for the hypothesis that the slope = 0).
Can you guess R^2, the p-value, and where the data below was published?
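How a regression can be "significant" and still explain almost nothing is easy to reproduce. The following is a minimal sketch on simulated data, not the published data referred to above: the sample size, slope, and seed are hypothetical, and the p-value uses a normal approximation to the t distribution, which is reasonable only because n is large.

```python
import math
import random

random.seed(0)   # fixed seed so the sketch is reproducible
n = 100_000      # hypothetical (large) sample size
slope = 0.05     # hypothetical, very weak true effect

xs = [random.gauss(0, 1) for _ in range(n)]
ys = [slope * x + random.gauss(0, 1) for x in xs]

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Pearson correlation and the fraction of variance explained.
r = sxy / math.sqrt(sxx * syy)
r_squared = r * r

# Test statistic for H0: slope = 0, with a two-sided p-value from
# the normal approximation (valid here because n is large).
t_stat = r * math.sqrt((n - 2) / (1 - r_squared))
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"R^2 = {r_squared:.4f}, p = {p_value:.2e}")
```

The p-value shrinks with sample size while R^2 does not, so with enough points almost any nonzero slope clears a significance threshold even though x explains well under 1% of the variance in y.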
A friend (who does not work in science) asked me today whether it is true that "protein folding has been solved". My short answer:
The AlphaFold method produced very impressive results on CASP14. Protein folding is not a solved problem.
Kind of weird to see genomics people here today celebrating the log-fold-change of 0.0007371 in the top two times for the 100m dash at the olympics, but also throwing out any result where the log-fold-change is less than 1.
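A back-of-the-envelope check of that number, as a minimal sketch. The two finishing times (9.784 s and 9.789 s) are an assumption, since the post does not state them, and base 2 is assumed because it is the usual genomics convention for fold changes.

```python
import math

# Assumed finishing times in seconds (not given in the post).
first, second = 9.784, 9.789

log2_fold_change = math.log2(second / first)

# A common differential-expression cutoff: |log2 fold change| >= 1,
# i.e. at least a two-fold difference.
passes_cutoff = abs(log2_fold_change) >= 1

print(f"log2 fold change = {log2_fold_change:.7f}, passes cutoff: {passes_cutoff}")
```

Under these assumptions the difference that decided the race sits roughly three orders of magnitude below the cutoff.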
Interesting analysis by @jsm2334 of the Israeli #covid19 data revealing that intuition about vaccine efficacy has been misguided due to the Yule-Simpson effect (also known as Simpson's paradox). h/t @jbakcoleman covid-datascience.com/post/i…
Funny that in the interview you linked to James Watson didn't mention Rosalind Franklin. He did talk about the scientist whose work he stole at other times. Some quotes for your readers:
I've received numerous requests from bench biologists asking for bioinformatics tutorials to work through. In response @sinabooeshaghi and I will teach a #scRNAseq @zoom_us workshop (for up to 300) on Thursday March 26th @ 1pm PST. Join us at caltech.zoom.us/j/315126162
I've posted the notes/slides for my computational biology class at github.com/pachterlab/Bi-BE-…
Topics were chosen based on appearing in >=3 bio areas, although for focus examples are all drawn from #scRNAseq.
Homeworks include both theory and exploration of data (via @GoogleColab).
This reminded me of my first computational biology conference. I didn't know anybody and was terrified. Therefore at the banquet I sat next to my advisor at the time. As I was talking to him security came & hauled him off. Someone thought he was a homeless person crashing the event.
While it’s fun to banter about what constitutes a good lab, the part of this that is uncomfortable to discuss is that leaving a bad lab is in many cases near impossible. Few universities offer much support and PIs can and do retaliate, in some cases ending careers.
Last night the genomics community applauded a disgraceful normalization of racism and sexism as Jim Watson was toasted on his 90th birthday. nitter.app/markjcowley/status/995… /1
I'm teaching an introduction (to an introduction) to single-cell RNA-seq today and making use of slides that others might find useful: figshare.com/articles/Introd… #scRNAseq
The blackboard Lex is standing in front of has basic calculus left over from an actual real MIT calculus class. It has nothing to do with what he is "teaching". The stuff he is presenting is a joke. You can listen in to some of his rubbish here: piped.video/-6INDaLcuJY?t=1409
Challenge accepted. Here are a few comments on the paper after starting to wade through its massive content. The paper in question is nature.com/articles/s41586-0… 1/🧵
Critiquing a paper for the number of figures, ext figures & support figures is really weird. Just read the paper & point out any actual issues you find. Anyone can peer review a paper or preprint post "publication".
The exciting reveal of Ultima Genomics last week was accompanied by the publication of four preprints. Intrigued by the potential of the technology, @sinabooeshaghi & I decided to take a look at the data. A 🧵 about our findings & a preprint we posted: biorxiv.org/content/10.1101/… 1/
Please stop using TopHat scholar.google.com.mx/schola… Cole and I developed the method in *2008*. It was greatly improved in TopHat2 then HISAT & HISAT2. There is no reason to use it anymore. I have been saying this for years yet it has more citations this year than last #methodsmatter
I've been reading some of the #COVID2019 preprints that have been coming out and I can say with confidence that many academics would be better off watching @netflix during their quarantine.
I highly recommend "How to read (single-cell RNA-seq) PCA plots" (by @vallens): nxn.se/valent/2017/6/12/how-… #gi2019
He notes that "A particular danger [in interpreting T-shapes] is that it is tempting to interpret this as a bifurcation in the data." (it almost never is). #gi2019
Maybe it's worth considering that this paper, and others like it, that search in the weeds for "significant" PRS scores and ignore numerous important caveats, should not be published. They don't have any scientific value, and they only serve as material for manipulation. 2/2
I've gotten several requests recently for permission to use the notes from my computational biology class:
github.com/pachterlab/BI-BE-…
A reminder that they're licensed under CC BY 4.0: you're free to share and adapt, just give appropriate credit and indicate if changes were made.
Well first of all, this was an MIT IAP class. IAP is a short period in January when students get to take fun classes on various topics that can be taught by anyone (many by students). I once sat in on a brain dissection. You can learn how to count cards. web.mit.edu/willma/www/mit15…
I am a judge today for a high school science fair and was just reviewing posters. About half described their data as being problematic for drawing conclusions for various reasons (small sample size, inaccurate instrument measurement, etc.) If only my colleagues were this honest.
I have a few things to say about this tweet attacking @mbeisen and subtweeting me. Specifically, I want to talk about cancel culture gone mad... nitter.app/MLevitt_NP2013/status/… 1/14
If you work w/ single-cell RNA-seq & are performing RNA velocity analyses, you might find this @GorinGennady et al. preprint w/ Meichen Fang & Tara Chari of interest. It's a deep dive into the method, and navigation of the 67 pages may be aided w/ this🧵1/
biorxiv.org/content/10.1101/…
An appropriate response to this from @uwcse / @UW would be to ban @pmddomingos from all promotion / tenure decisions (of men and women) because he is clearly not qualified to judge the work of others.
He clearly doesn't know what he's talking about. This explanation of L1 and L2 is 😭 He is standing in front of the formula 1-cos(2θ). I doubt he could tell you what the cosine of an angle is. piped.video/-6INDaLcuJY?t=2293
I watched all this crap so you don't have to.
One of the interesting things about biology is that it’s so complex that we don’t have the slightest idea why some brains can discover new biology, while other brains can tweet this.
Question for you @lexfridman:
Why does your 25 popular episodes guest list contain only one BIPOC and no women at all? There are even more speakers named "Stephen"!
Do you believe that science and technology is for men only?
It really irks me when #bitcoin articles refer to mining as “solving complex math problems”. There is no solving, there is nothing complex, it’s not math, and the only problem is the size of the carbon footprint.
A tiny minority of the “sacrifices” of animals for biology research are actually needed. There is an enormous amount of unneeded murder of animals for research of poor quality, and many researchers don’t take ethical considerations / guidelines, e.g. the 3 Rs, seriously.
A thread on how to analyze *any* single-cell RNA-seq dataset *without a computer* !! This 🤯 magic made possible thanks to insights of @vntranos, @sinabooeshaghi and @pmelsted + work on kallisto | bustools 🐻🚌🛠️ by the authors of biorxiv.org/content/10.1101/… 1/12
This figure provides a glimpse of the future of epidemiology. Contact tracing coupled to genome sequencing for understanding and controlling a pandemic. Incredible work by the computational biologists at deCODE genetics. medrxiv.org/content/10.1101/…
The recommendation paradox:
1958: every student is above average
1998: every student is in the top 10%
2018: every student is in the top 5%
2028: every student is in the top 1%
2033: every student is in the top 0.1%
2035: every student is the best who ever lived and very social.
The @humancellatlas lung atlas that was published today is impressive, but with 2% Asian samples when 60% of the world is Asian, it seems that the initial goal of keeping "ethnic diversity in mind"... may have escaped the mind.
Of course it’s good advice to tell students to choose labs carefully. But the advice that’s really needed is for PIs, not students. The advice is to create environments in their departments where students don’t have to choose labs carefully, because all labs are “good”.
A few months ago @AndrewYNg tweeted that radiologists were on the verge of being obsolete because AI: nitter.app/AndrewYNg/status/93093…. Andrew has >300K twitter followers so his tweet made the rounds (>2,000 likes) /1
Should radiologists be worried about their jobs? Breaking news: We can now diagnose pneumonia from chest X-rays better than radiologists. stanfordmlgroup.github.io/pr…
"...two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates [when identifying differentially expressed genes between two conditions using human population RNA-seq samples]."
tl;dr use the Wilcoxon rank-sum test. genomebiology.biomedcentral.…
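For readers who have not used it, the rank-sum test compares two groups using only the ordering of the values. Below is a minimal sketch, not the paper's pipeline: the function name and the expression values are hypothetical, and the p-value uses the plain normal approximation with no continuity or tie correction, so a statistics library is the better choice for real analyses.

```python
import math

def rank_sum_test(a, b):
    """Mann-Whitney U (equivalent to the Wilcoxon rank-sum statistic)
    with a two-sided p-value from the normal approximation."""
    # U counts pairs where a value from `a` exceeds a value from `b`;
    # ties contribute one half.
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Hypothetical expression values for one gene in two conditions.
condition_a = [1, 2, 3, 4, 5]
condition_b = [6, 7, 8, 9, 10]

u_stat, p_value = rank_sum_test(condition_a, condition_b)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

Because only ranks enter the statistic, a single extreme outlier cannot dominate the result the way it can in a parametric model.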
This speech by @FareedZakaria is a litany of misinformation. There is much to improve at US universities, but his claims are false and unhelpful.
A rebuttal: 1/🧵
Lex doesn't quite lie, but obviously he is far from telling the whole truth. His "research position" is a whole other bunch of bull (for another time). And he uses this MIT mirage to great advantage, creating the perception that he is effectively a professor there.
The edgeR differential analysis tool has been updated to version 4.0, and this update features support for isoform-level DE, which is important functionality that can be used for #scRNAseq (via pseudobulk). Great to see that transcript-level analysis has become mainstream. 1/🧵
edgeR 4.0: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and ... biorxiv.org/cgi/content/shor… #biorxiv_bioinfo
"These methods can order a set of individual cells along a path, and assign a pseudotime value to each cell that represents where the cell is along that path. This can be a starting point for further analysis to determine gene expression programs driving cell phenotypes."
So I wrote a thread about the lack of representation of women on the SciPy and NumPy papers, and the implications thereof. In return I was blocked by one of the core developers. This is not how one builds open source communities.
The reality is that there is no current decline for men. Rather, there has simply been an increase in women getting degrees after, you know, they were allowed to actually attend many universities in the... wait for it... 1970s. 🤯
Meet the new iPad Pro: the thinnest product we’ve ever created, the most advanced display we’ve ever produced, with the incredible power of the M4 chip. Just imagine all the things it’ll be used to create.
I'm honored and humbled to announce that after carefully studying a UMAP of an integrated single-cell RNA-seq atlas I've discovered a new cell type in the medial prefrontal cortex. I'm happy to share the data upon reasonable request.
It’s one thing to celebrate *science*, e.g. by toasting the discovery of the structure of DNA by Crick, Franklin and Watson. It’s an entirely different matter to toast a man whose actions have directly harmed not only our colleagues, but society at large. /10
Happy to announce that we've just posted a preprint on "Modular and efficient pre-processing of single-cell RNA-seq". Highlights: process #scRNAseq on a laptop, 10x processing up to 51 times faster than Cell Ranger. A new efficient RNA velocity workflow. biorxiv.org/content/10.1101/…
Today in Science History: In 1962, Dr. Rosalind Franklin, whose work was key to determining the double-helix molecular structure of DNA and its significance for information transfer in living material, did not win the Nobel Prize for Medicine and Physiology.
I have never seen a paper where the authors have formulated and experimentally tested a hypothesis based on a discovery made with RNA velocity.
It's totally possible I have just missed this in the literature, and if so I'd love to see a reference. Thanks!
The @biorxivpreprint has been weaponized by many labs who use it to plant priority flags, not to accelerate research (accomplished by withholding methods and/or data). This does not serve the interests of either the scientific community, or the public.
A beautiful essay by Michael Jordan on AI calling for us to "broaden our scope, tone down the hype and recognize the serious challenges ahead." Essential reading medium.com/@mijordan3/5e1d58…
I respectfully disagree. I think what is currently "democratizing" science much more than "high impact" journals is @PubPeer. So let's take a tour of the @PubPeer comments for some of this author's papers in "high impact" journals: 1/
I'll go further, Itai... "glam" or actually "high impact" journals are actually very much "democratizing" science. If you take them out everything will be based on pedigree, "fame", X-followers etc... so I would be very skeptical of the idea of getting rid of journals...
You know what else sucks? When high profile machine learning people oversell their results to the public. It leaves everyone worse off… because how can us mere mortals publish a paper if we haven’t rendered an entire profession obsolete with our results? /4
I've checked this paper out, as instructed. I was also interested in the main result for personal reasons: I'm 51 years old. Is it true that I've just gone through a major change? And that another one awaits me in just a few years?
Some comments on the paper in this thread 1/🧵
If you think articles should be valued for their content rather than the journals they were published in, use the citation form "Author(s), year, DOI" on slides and omit the journal name.
So apparently in 2023 principal component analysis latent spaces can be said to have an "internal world model" (figure from @jnovembre et al., 2008). Turns out that the "singular" in singular value decomposition refers to the singularity!
UMAP and t-SNE are widely used in single-cell genomics to identify features of interest, and visually explore data. In a new paper w/ Tara Chari we find that extensive distortions and inconsistent practices make such embeddings counter-productive. 🧵 journals.plos.org/ploscompbi… 1/
BREAKING NEWS:
On the basis of vote counts from the first 11 rounds of voting for the Speaker of the United States House of Representatives WE CAN NOW PREDICT that Kevin McCarthy will receive zero votes in the 676th round of voting which will take place on June 14th, 2023.
🌌The virial theorem relates time-averaged kinetic energy of objects to their potential energy.
🧬The Price equation relates change in a trait over time in subpopulations to their fitness.
In arxiv.org/abs/2312.06114 we observe that the virial theorem is the Price equation. 1/🧵
Science has been moving very fast, but it's about to move MUCH faster.
In this example, Gemini compiles an up-to-date list of GWAS variants from the literature.
piped.video/watch?v=sPiOP_CB…
Isaac Ben-Israel is an Israeli going around saying the pandemic will end in 70 days. He has a "paper" in Hebrew (I read Hebrew). Below is one of the graphs on which he is basing his prediction.
HE FIT A SIXTH ORDER POLYNOMIAL TO THE DATA
😱😱😱😱😱😱
(c=2E21 😱)
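Why this is alarming, in a minimal sketch: a degree-6 polynomial has seven free coefficients, so it can pass exactly through any seven points and still say nothing sensible about the eighth. The case counts below are hypothetical, not the data from the graph, and exact Lagrange interpolation stands in for whatever fitting procedure was actually used.

```python
from fractions import Fraction

def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs) - 1
    that passes through all points (xs[i], ys[i])."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# Hypothetical daily new cases over one week (a simple rise and fall).
days = list(range(7))
cases = [1, 3, 8, 12, 8, 3, 1]

# The degree-6 polynomial reproduces every observed day exactly...
fitted = [lagrange_eval(days, cases, d) for d in days]

# ...but its "forecast" one day past the data is unconstrained.
next_day = lagrange_eval(days, cases, 7)

print("fit matches data:", fitted == cases)
print("predicted cases on day 7:", next_day)
```

A perfect fit inside the observed range is therefore no evidence that the curve describes the process, and extrapolating it can produce impossible values such as negative case counts.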
This photo (see RHS of image below) is from what he calls his "MIT course" on Deep Learning for Self-Driving Cars. Sounds like good stuff. CS, math, self driving cars. #broheaven. So what is the problem? He is standing in front of the blackboard.
I disagree that there is a "gulf between [James Watson's] scientific brilliance and his views on race." Yes, he won a Nobel prize but winning this prize does not make one scientifically brilliant. He is scientifically bankrupt and a racist. nytimes.com/2019/01/01/scien…
I think this paper is a Denial Of Peer Review Attack (DOPRA). It's kind of like a DoS (denial of service) attack. There is so much data, so many methods, so much code, so many figures, so many panels, so much supplement, so much text, that it is overwhelming. 18/
I’m not sure I understand the guy with the “no mRNA” sign. I mean who hasn’t been against proteins at one point or another… but a complete and total ban on mRNA seems a bit extreme.