Software performance expert. Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers. Father, husband.

Montreal, Quebec
Quebec is lifting its universal mask mandate tonight at midnight. The mandate lasted 666 days.
918
2,607
10,862
Actual email exchange I just had.
98
935
10,286
1,632,370
Some engineer writes to me to object that my Go library is missing a key feature that it used to have. Turns out that the library never had this feature, but ChatGPT thought so. Actual email exchange:
78
221
3,920
246,444
Floating-point Number Parsing w/Perfect Accuracy at GB/sec piped.video/AVXgvlMeIm4 Talk by myself at Go Systems SF. I don't know what happened to my hair that day.
8
41
3,554
Your operating system, your browser, your database engine, your web server, git, your JavaScript engine, your Python interpreter… all of these are likely written in C/C++. I repeatedly encourage people to learn many programming languages. I do not (at all) think that C or C++ should be used systematically. I use different programming languages. If you can, use Go, Python, JavaScript, C#, Java, Zig, etc.... they are often better choices than C. I literally just published a new book on Python programming (see Amazon). It is a great language. However, if you are proudly broadcasting your belief that safe and performant complex systems cannot be written in C or C++, you are showing your ignorance and you are signalling that you are unable to match what countless programmers have done over decades. And no, C programming is not elitist. Generations of teenagers taught themselves C. Working with C strings does not require a Ph.D.
people here talking about how C/C++ is unsafe, no good, don't learn it Your browser was written in C. Your OS was written in C. AAA games are in C. game engines are in C. "safe" languages are (or were) in C. Your art tools, written in C. Audio tools, C.
93
222
2,892
808,118
When Java became popular, people (me included) claimed that it was massively better than C/C++. This was highly controversial and people mocked me for using Java. I was hammered by the referees during my first grant application for picking Java as my language of choice. In some ways, Java is great... We have lots of big data software written in Java and Java-like languages... Lucene, Elasticsearch, and so forth. It is not at all obvious that you could just rewrite them in C and make them faster. You can, but it requires skill. If you task Joe the intern with rewriting your Java code in C, the result will almost assuredly be buggier and slower code. But the same thing happens in reverse... You can write code in Java that will give the average C/C++ system a run for its money. We have an implementation of Roaring Bitmaps in Java, and one in C. When I last benchmarked them, the Java implementation was sometimes faster... Of course, it means that the C version could and maybe should be further optimized... but it is surprisingly easy to write fast code in a higher level language... My own view is that, most of the time, you can write fast and effective software in just about any programming language. What is more important is the social component. Some programming languages attract some people and some problem areas. The Go people are not like the C# people. And the social differences are more important in practice. In some sense, if you are offered a job, and they tell you "we code in C#", they tell you a lot. It is not about the syntax or the tools primarily. It tells you what kind of philosophy they have. Ultimately, all languages s*ck. They all have annoying limitations and you eventually hit them. There is no free lunch.
193
230
2,608
743,005
We write command line utilities in languages like C, not in Java, Ruby or Python for a reason: start-up time matters.
138
88
2,596
274,556
One of my pull requests got the following comment on GitHub. What comes to mind when you read it?
405
39
2,341
385,563
"Data Engineering at the Speed of Your Disk", my talk at the 3rd Performance Summit (supposedly in Seattle but I was in my bedroom with my cat) piped.video/watch?v=p6X8BGSr…
3
109
2,260
Recently, there was a clash between the popular @FFmpeg project, a low-level multimedia library found everywhere… and Google. A Google AI agent found a bug in FFmpeg. FFmpeg is a far-ranging library, supporting niche multimedia files, often through reverse-engineering. It is entirely the result of volunteers and a marvellous piece of technology. For people who have never been on the receiving end of ‘security researchers’, it is difficult to understand why there is a pushback against them. Think about the commons. In Quebec, these are pieces of land where farmers send their cows during the summer. It is collectively owned, like FFmpeg. Everyone is responsible to care for the commons if they are using it. If you are not using it, you are supposed to stay away. Now, imagine a rich corporation comes in and sends its well-paid agents into the commons to find issues with it. Maybe a broken barrier or a dangerous hole. So far so good… But instead of fixing the issues, the corporation says “you have a month to fix the issue or else I will report you to the government”. How much love would the big corporation get in this context? Why do the security researchers insist on disclosing the issue without having contributed to fixing it? So that they can get credit for it. That's their entire scheme: find issues, irrespective of whether they affect the use case of their employer... after all, all issues no matter how small can be potentially significant at some point... and then brag about it without doing the hard work of trying to fix it. Let me be clear that not everyone working in security behaves this way. Many are good actors. But there are enough 'security researchers' behaving as parasites that it has become a recognizable pattern. « But Daniel, who should be fixing the bugs then? » If you are paying for commercial support, then get in touch with the folks you are paying. If you are not paying, then it is on you. It says so in the licenses.
It is part of the moral code of open source. It is part of the legal framework. Let me be clear. You do not get to bite back at Linus Torvalds if a bug in the Linux kernel crashes your server. What you do is that you identify the issue, narrow it down and propose a fix. If you cannot do it, then you pay someone to do it. Or you just do not use Linux.
59
290
2,401
206,458
Knuth did it: The Art of Computer Programming, Volume 4, Fascicle 5 is out!!! (Was supposed to come out in 1960.) amazon.com/dp/0134671791/
24
675
2,010
It is hard to overstate how strong the push for object-oriented programming was. It even bled out into other fields like education (look up "learning objects"). You had to organize your programming projects into hierarchical classes and you would be ridiculed if you did not. Java and C# are a reflection of this era. It took 25 years for the obsession to die down. Basically, the gurus had to be given time to retire. Object-oriented programming can work… but there are serious pitfalls that will make your projects harder to maintain and optimize. Deep inheritance is almost always a disaster. The lesson is: don’t blindly embrace the latest things even if everyone is. Masses will lead you astray. Be critical.
The older you get, the harder to resist saying "I told you so." When OO programming came in, it made no sense to me, and I've never used it. Everyone said I was too old to understand. Thirty years later, everyone's snapping out of it and wondering wtf they'd been thinking.
116
204
1,918
741,956
FreeBSD is a well-established operating system, akin to Linux but with distinct origins. It powers critical systems such as Sony's PlayStation and Netflix's infrastructure. In FreeBSD, the standard C function for generating a random integer within a specified range, like between 0 and 10, is `arc4random_uniform`. Recently, developer @FUZxxl enhanced this core function by optimizing the number of divisions required per function call. This optimization resulted in significant performance improvements. The takeaway from this development is that even highly established functions can benefit from further optimization. Reference: - Fast Random Integer Generation in an Interval, ACM Transactions on Modeling and Computer Simulation, Volume 29, Issue 1, 2019.
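A minimal sketch of the idea in the referenced paper (multiply, then reject only in a rare case), written in Python for readability. This is not the FreeBSD source: the function name `bounded_random` and the `next_u32` parameter are hypothetical, and Python integers stand in for the 32-bit and 64-bit words a C version would use.

```python
import random

def bounded_random(s, next_u32=lambda: random.getrandbits(32)):
    """Return an unbiased integer in [0, s), for 0 < s < 2**32.

    next_u32 is a hypothetical source of uniform 32-bit words.
    """
    x = next_u32()
    m = x * s                   # a 64-bit product in C
    low = m & 0xFFFFFFFF        # low 32 bits of the product
    if low < s:                 # rare path: only here do we pay for a division
        threshold = (2**32 - s) % s   # equals 2**32 mod s
        while low < threshold:        # reject the few values that cause bias
            x = next_u32()
            m = x * s
            low = m & 0xFFFFFFFF
    return m >> 32              # high 32 bits: the result in [0, s)
```

In the common case the function performs one multiplication and no division at all; the remainder is computed only when the low bits fall below `s`.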
43
193
1,948
197,907
The fast JavaScript runtime Bun is much faster than Node.js 22 at decoding Base64 inputs. By much faster, I mean *several times* faster. But they both rely on the same underlying library (simdutf) for the actual decoding. So what gives? The problem is that Node.js needs to interact with v8, the underlying JavaScript engine (from Google)... and doing so is not trivial. Before we can start decoding the string, we need to grab the string... so, in this instance, we call String::Value... In turn, this allocates an array inside Node.js and asks v8 to copy the content to it... In an ideal world, we would avoid the trouble entirely and just ask v8 to give us direct access to how it stores the string... and we try to do that if we can... but let me come back to it... How bad can this be, right? Just a copy. Well. Let us do some profiling... So you see, the base64 decoding itself is about 1/5 of the running time, but the copy takes half of it. What is up with this CopyChars function? Well, it is mostly just a wrapper around the standard high level C++ function std::copy_n as far as I can tell. (see v8/src/utils/memcopy.h) But we are copying from an 8-bit input to a 16-bit output... why is that? Base64 is pure ASCII... and v8 can store ASCII using 8-bit per character. We get there because both IsExternalOneByte() and IsOneByte() are false (see node/src/node_buffer.cc)... We have fast paths for these cases. If IsExternalOneByte() is true, we just get the bytes and everything is great. Unfortunately, it does not always work. So we have a v8 string that is really pure ASCII, but, seemingly, we can't tell that it is the case from Node.js, and so we have to convert it to UTF-16 needlessly, using a function that is maybe not very well optimized... and then we do the base64 decoding of an ASCII string from the UTF-16 input. It is not great.
To be fair, this is just one string, created as 'Buffer.alloc(size, "latin1").toString("base64")', basically the base64 encoded version of the string "latin1latin1latin1...". In actual applications, we might have better luck. Yet. Yet. I am telling this complicated story for a reason. The story illustrates why our software is slower than it should be. We have layers of abstractions to fight against. Sometimes you win, sometimes you lose. These layers are there for a reason, but they are not free. To make matters worse... these abstraction layers often thicken over time... and the friction goes up. To be clear, I do not claim that the Node.js code is optimal. In fact, I know it can be better. But it is not trivial to make it go fast. I sometimes hear people say... "well, it is C++ and C++ is hard". No. The C++ part is easy relatively speaking. The difficulty is at a higher level. It is not a matter of syntax. It is a matter of architecture.
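To make the extra work concrete, here is a small Python sketch of the two paths described above. It is only an illustration: the function names are hypothetical, nothing here is Node.js or v8 code, and the narrowing step stands in for a decoder that reads UTF-16 input directly.

```python
import base64

def widen_to_utf16(ascii_bytes):
    """Copy each 8-bit character into a 16-bit little-endian slot."""
    out = bytearray(2 * len(ascii_bytes))
    out[0::2] = ascii_bytes      # low byte of each 16-bit unit; high byte stays 0
    return bytes(out)

def decode_direct(ascii_bytes):
    """Fast path: the base64 text is already available as one byte per character."""
    return base64.b64decode(ascii_bytes)

def decode_via_utf16(ascii_bytes):
    """Slow path: an extra copy to 16-bit units happens before decoding."""
    wide = widen_to_utf16(ascii_bytes)   # the needless copy, twice the size
    narrow = wide[0::2]                  # stand-in for decoding from UTF-16
    return base64.b64decode(narrow)
```

Both paths return the same bytes; the second one touches three times as much memory before the decoder even starts.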
In the next version of Bun `Buffer.from(str, "base64")` gets 6x - 30x faster on large input, thanks to @lemire's simdutf
33
234
1,643
556,789
How much memory does a call to ‘malloc’ allocate? In C, we allocate memory on the heap using the malloc function. Other programming languages like C++ or Zig often call on malloc underneath so it is important to understand how malloc works. In theory, you could allocate just one byte by calling malloc(1). How much memory does this actually allocate? On modern systems, the request allocates virtual memory which may or may not be actual (physical) memory. On many systems (e.g., Linux), the physical memory tends to be allocated lazily, as late as possible. Other systems such as Windows are more eager to allocate physical memory. It is also common and easy to provide your own memory allocators, so the behavior varies quite a bit. But how much virtual memory does my call to malloc(1) typically allocate? There is likely some fixed overhead per allocation: you can expect 8 bytes of metadata per allocation although it could be less or more depending on the allocator. You cannot use this overhead in your own program: it is consumed by the system to keep track of the memory allocations. If you ask for 1 byte, you are probably getting a larger chunk of usable memory: maybe between 16 bytes and 24 bytes. Indeed, most memory allocations are aligned (rounded up) and there is a minimum size that you may get. And, indeed, the C language has a function called realloc which can be used to extend a memory allocation, often for free because the memory is already available. You can ask how much memory is available. Under Linux, you can use the malloc_usable_size while under FreeBSD and macOS, you can use malloc_size. So I can write a small program that inquires how much (virtual) memory was actually granted given a request. For one byte, my macOS laptop gives out 16 bytes while my x64 Linux server seemingly gives 24 bytes. If I plot the memory actually granted versus the memory requested, you see a staircase where, on average, you get 8 extra bytes of memory.
Thus you probably should avoid allocating on the heap tiny blocks of memory (i.e., smaller than 16 bytes). Furthermore, you may not want to optimize the allocation size down to a few bytes since it gets rounded up in any case. Finally, you should make use of realloc if you can as you can often extend a memory region, at least by a few bytes, for free.
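The staircase can be sketched with a small model. The Python function below does not query any allocator: it assumes an 8-byte header, 16-byte alignment and a 32-byte minimum chunk, which is my understanding of a typical 64-bit Linux allocator and is consistent with the 24 bytes reported above. Other allocators will differ, and all names and parameters are hypothetical.

```python
def modeled_usable_size(requested, header=8, alignment=16, min_chunk=32):
    """Model of the usable bytes granted for a request of `requested` bytes.

    Assumption (not measured): each chunk holds `header` bytes of metadata,
    is rounded up to a multiple of `alignment`, and is at least `min_chunk`.
    """
    rounded = (requested + header + alignment - 1) // alignment * alignment
    chunk = max(min_chunk, rounded)
    return chunk - header

# Print a few steps of the staircase: requested size, modeled usable size.
for n in (1, 8, 24, 25, 40, 41, 100):
    print(n, modeled_usable_size(n))
```

Under this model, any request from 1 to 24 bytes costs the same, which is why shaving a few bytes off a small allocation rarely saves anything.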
30
172
1,588
217,187
This UC Berkeley professor is now telling his Computer Science students to « be good at a lot of different things because we don’t know what the future holds. » We should always tell young people: learn to get good at more than one thing, as quickly as possible. It does not matter what the young person is studying or where they live. When you have decades of experience, you are likely to ‘be good at different things’ because that’s the natural trajectory of most decent careers: you end up working on several different problems. Most successful people in their forties or fifties have two or more areas of expertise. When you are young and inexperienced, your most glaring fault is that you haven’t had time to get good at many things. Too often, young people focus narrowly on one specific expertise not realizing the downsides. The first downside, obviously, is that employers may not specifically need your one expertise. But the second, equally important downside, is that you will lack critical thinking if all you know is one specific technology or technique. The mistake young people make is to think that if they get really, really good at this ‘one thing’ then they are set for life. And that’s true for some, but it is statistically false. Very few people have a good career with one specific expertise. There are many reasons why this strategy fails. One of them is that it is really, really hard to get really, really good at any one thing. That is, it is far easier to be in the top 1% in two areas of expertise, than to be in the top 0.1% in one area. Expertise follows a Pareto distribution: it is relatively easy when you get started, and the deeper you dig, the harder it gets. The second reason is that the landscape is dynamic. By the time you have become an expert at this one thing, the demand can fall.
31
192
1,421
116,995
People believe that if they could somehow walk into a research lab, they could then 'steal the knowledge'. It does not work like that. Innovation is illegible. You can go, right now, on many campuses all over the world, just walk in and talk to the professors, the students, and so forth. Typically, they will be forthcoming about what they are thinking and what they are thinking about. You know all these fancy tech companies? If you are an engineer and you meet with one of their engineers, they will often tell you anything you'd like about their systems. But you have to be already at their level to be able to understand what is going on. And if you are at their level, you can often catch up on your own just as fast as you would by reading their notes. Secretive people often want to hide primarily the fact that they have little to hide.
60
177
1,297
378,047
Intel is extending its instruction set. "Intel® APX doubles the number of general-purpose registers (GPRs) from 16 to 32. This allows the compiler to keep more values in registers; as a result, APX-compiled code contains 10% fewer loads and more than 20% fewer stores than the same code compiled for an Intel® 64 baseline. Register accesses are not only faster, but they also consume significantly less dynamic power than complex load and store operations."
25
175
1,283
499,581
In C and C++, we should agree to use UTF-8 by default. It is time.
69
35
1,282
74,927
Faster remainders when the divisor is a constant: beating compilers and libdivide lemire.me/blog/2019/02/08/fa… paper: arxiv.org/abs/1902.01961 code: github.com/lemire/fastmod
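A minimal sketch of the technique for 32-bit unsigned values, as I understand it from the paper, written in Python. The function names are hypothetical and are not claimed to match the fastmod library's API; the masks emulate the 64-bit and 128-bit arithmetic a C version would use.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def compute_magic(d):
    """Precompute M = ceil(2**64 / d) once for a fixed divisor 0 < d < 2**32."""
    return MASK64 // d + 1

def fastmod_u32(a, magic, d):
    """Return a % d for 0 <= a < 2**32 using two multiplications, no division."""
    lowbits = (magic * a) & MASK64    # fractional part of a / d, scaled by 2**64
    return (lowbits * d) >> 64        # scale the fraction back up by d

magic = compute_magic(10)
print(fastmod_u32(123456789, magic, 10), 123456789 % 10)
```

The division happens once, when the constant is computed; every later remainder costs only multiplications and a shift.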
5
86
1,115
I'd say that Python 3.14.16 would be a pretty good place to stop.
32
15
1,267
77,474
Replying to @AstraKernel
There is no need to mention Rust because 50 people will do it in the comments.
13
34
1,193
34,067
The reason many programmers say that you do not need mathematics to be a programmer... is that they don't know what mathematics is. It should be entirely uncontroversial that programmers do mathematics all day long. It just does not feel like mathematics because there is no Greek letter, no theorem and so forth. However, there is no doubt that Newton would have considered modern programming to be mathematics. Anyhow, how do I know that you need to master some mathematics to learn programming? Because I have taught programming to hundreds of students over two decades... and invariably, you get the folks who just 'do not get it'... you drill it down... and then you find that they can't reason mathematically... the pattern is undeniable. They often object 'my friend told me that I did not need to be good at math'. Well. Your friend is good at math, he just does not know it. Here is a simple test. Joe always gets to the office after Jill. Joe is at the office. Is Jill at the office? There is a good percentage of folks who just can't answer this question. They get totally confused. These people cannot program software. They are often doing just fine in other advanced university classes. This little puzzle is mathematics. If you do not see it as a mathematical puzzle to be solved... it is because your innate mathematical abilities are such that you do not even need to make an effort... You are like a fish in water... you do not see the water anymore. But for some people, they cannot relate Jill and Joe after I have said "Joe always gets to the office after Jill." They just don't know. They can't tell.
Software engineers, how many of you are good at math ? I’m curious because they say you need to be good at math to become a great programmer. Is that true?
99
127
1,159
143,852
Replying to @julianhyde
Maybe ChatGPT will write it for me too!
3
3
1,057
57,986
Don’t EVER make the mistake that you can design something better than what you get from ruthless massively parallel trial-and-error with a feedback cycle. That's giving your intelligence much too much credit. —Linus Torvalds
20
159
1,098
93,152
I like how the top Microsoft employees give presentations using macbooks. It seems like Microsoft has given up on making laptops that their own employees want to use. I don’t understand.
80
26
1,047
51,966
There was recently a harsh back and forth between @elonmusk and @ylecun about the role of publications in science. My stance is that, at least as far as computer science is concerned, the number of publications is irrelevant. Let me give a specific example of someone who did not focus too much on formal publications but is still in the top 0.00001%. Chris Lattner (@clattner_llvm) should get the Turing award for his work on LLVM. (The Turing Award is the equivalent to the Nobel prize in computer science.) Though Chris should not get nearly all of the credit, the impact that LLVM has had is... deep. And I don't want to hear anything about "yeah, but it is engineering, not actual 'computer science'". Nonsense. Go read Chris' 2005 thesis and dare tell me that it is not science. Let me be blunt. The 'regular' (non-LLVM) C++ compiler under Visual Studio should be considered legacy at this point. You will typically get better code generation with LLVM: faster code, fewer compiler bugs. And yes, you can turn it on with a single flag (look for clang/llvm in the Visual Studio documentation). The price to pay for switching to LLVM is longer builds. However, it is an acceptable trade-off in almost all cases. I would not be surprised if Microsoft were to retire their C++ compiler in favour of LLVM (making it the default) in a future version of Visual Studio. It would probably be a good move. We are doing a great disservice to the new generation by telling them that science is all about publishing 80 papers in 2 years.
47
104
995
332,976
You can learn C in a few weeks. Do it. Most people do not become full-time C programmers, and that's a good thing... But C is still in the top 5 most used programming languages and it has been the case for the last 40 years. It is hard to beat as a track record. Let us say you use Python. Python is arguably the most popular programming language today, and with good reasons. Can you guess how most people extend Python when performance is critical? C extensions. Fairly easy to write, low overhead. Let us say you decide to learn Go next. I love Go. Go is basically C with garbage collection and stricter typing. Very similar syntax. If you know C, learning Go takes a day or so. And knowing C will make you a better Go programmer. Let us say you decide to learn Swift. Swift interoperates with C "perfectly" which makes it trivial to call standard Linux libraries without overhead from Swift. Suppose you decide to learn C++... C++ is almost a superset of C. And the most commonly used part of C++ in fields such as gaming is the C subset. And so forth.
I remember buying "Learn C in 21 days" when I was 17. Used a couple of more weeks, so yeah. But useful? Nah. Took a decade before I had any use for it (compilers), and that's 13 years ago. Being proficient enough to do something useful is a different thing than learning ofc
56
108
921
275,889
There is sometimes confusion about what a PhD is. The main signal that you should derive from the fact that someone has a PhD is that they are well suited to the university campus environment. Having a PhD is not overly uncommon. Over 3% of the population in a country like Switzerland has a PhD. Hence, in a city of 1 million people, you may have 30,000 people with a PhD. In Germany, that would be 15,000 people. A lot of people have a PhD! Do PhDs lead to great jobs and higher incomes? Not really. If you do get your PhD and secure a good job and keep it for a long time (so no early retirement), then you can do a bit better than the *average person* who stopped with a professional degree. But even that is misleading because we don't know how the person smart enough to outcompete 100 other PhDs for a prestigious job would have done had they not gone for the PhD. You know, Joe, who has a PhD and has become a full professor at Prestigious University... well... it is likely that Joe has been working week-ends and nights for years. Joe is excessively well connected. Joe was always smarter than anyone else in his classes. Joe can sit down and write a great 20-page scientific essay without ChatGPT and without much effort. Joe can navigate politics better than most. Sometimes young people think... well, it is terrible to be job hunting at 22 without experience. Right, but try job hunting at 30 with a PhD and no actual job experience. It is worse! Much worse! Given these facts, do you really want to be 'as smart as a PhD'? « In the short run, pursuing a PhD entails substantial opportunity costs. Early-career earnings for PhD graduates are significantly lower than those of individuals with master’s or professional degrees. (...) Over the lifecycle, earnings do eventually recover but only under specific conditions. The most favourable long-run outcomes are concentrated among those who secure academic employment and remain in full-time work late into life. (...)
However, the structure of this system increasingly resembles a tournament: the payoff remains high for those who reach the top, but the odds of doing so have declined. Our analysis documents that the economic outcomes of recent PhD graduates have worsened over time. The bottom of the earnings distribution has grown more populated, and early-career returns have declined even as aggregate statistics appear stable due to rising returns among older cohorts. » (Benjamin et al., 2025) Note: I got a PhD and I did fine. I am a full professor. But I also work 60 hours a week, nights and week-ends.
I'm glad that LLMs achieving "PhD level" abilities has taught a lot of people that "PhD level" isn't very impressive.
46
115
943
147,743
C++23:
47
72
868
162,942
Daniel : what made you think that parsing JSON could be a bottleneck ? Me listening to @gwenshap in San Francisco, explaining how JSON was ruining our performance...
Replying to @yagiznizipli
"JSON parsing is never a bottleneck"
31
24
869
126,781
Redis is a popular in-memory data structure store that can be used as a database, cache, and message broker. A few hours ago, Redis adopted the fast_float C++ library for faster number parsing. It led to performance gains of up to 25% in some instances. Number parsing comes up ridiculously often as a bottleneck. The fast_float C++ library is used by WebKit/Safari, GCC (libstdc++), Chromium (Chrome and other browsers). Other systems have adopted the same algorithm with an independent implementation: LLVM, Rust, C# (.NET), Go and many other systems. Reference: Number Parsing at a Gigabyte per Second, Software: Practice and Experience 51 (8), 2021
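One ingredient of fast number parsing is the classic fast path, usually credited to Clinger: when the decimal significand and the power of ten are both exactly representable as doubles, a single multiplication or division gives the correctly rounded result. The Python sketch below shows only that fast path, under the assumption of a non-negative integer significand `w` and a decimal exponent `q`. It is not the fast_float implementation, the name `fast_path` is hypothetical, and the harder cases that the referenced paper addresses are simply reported as `None`.

```python
# 10**0 .. 10**22 are all exactly representable as 64-bit floats.
POWERS_OF_TEN = [float(10**k) for k in range(23)]

def fast_path(w, q):
    """Return the double nearest to w * 10**q, or None if the fast path does not apply.

    Assumes w is a non-negative integer (the decimal digits) and q an integer.
    """
    if w > 2**53 or not -22 <= q <= 22:
        return None                           # needs the full algorithm
    if q < 0:
        return float(w) / POWERS_OF_TEN[-q]   # one correctly rounded division
    return float(w) * POWERS_OF_TEN[q]        # one correctly rounded multiplication

print(fast_path(12345, -2) == float("123.45"))
```

Because both operands are exact, the single floating-point operation rounds only once, which is why the result matches a correctly rounded parser.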
6
88
849
49,055
C23: a slightly better C. One of the established and most popular programming languages is the C programming language. It is relatively easy to learn, and highly practical. Maybe surprisingly, the C programming language keeps evolving, slowly and carefully. If you have GCC 13 or LLVM (Clang) 16, you already have a compiler with some of the features from the latest standard (C23). lemire.me/blog/2024/01/21/c2…
47
143
813
163,362
Upcoming AMD chips will have 1GB of cache. nextplatform.com/2023/06/14/…
19
62
796
189,621
In C++, a function can return *3* values. Not just two. 😜
67
21
668
151,638
We are being flooded by ‘research papers’. It is not a joke. As an editor of Software: Practice and Experience… I review over a thousand papers a year. Increasingly, papers have been written by AI. My students hand in work done by AI. I have recently got a PhD thesis that the student admitted was AI generated. So what do we do about it? We leave the Soviet model of science: science as a bureaucracy. Focus on doing great and useful work, and then (only then) write about it. Do you do work so that you can publish it? Bring back your focus to doing great work. It should always have been thus. To put it another way, the research paper is not the end point of the research. It has a supportive role. At this point, some people object: “nobody cares about my work, how will they know that it is good if I don’t have a peer-reviewed paper?” Oh brother: if you cure cancer, they will care. And if they don’t care about your work, they won’t read your paper. And your paper may have been generated by AI, and if nobody cares about it, it may as well have been AI generated. Daniel Lemire. 2024. Will AI Flood Us with Irrelevant Papers? Commun. ACM 67, 9 (September 2024), 9. doi.org/10.1145/3673649
48
100
710
73,570
In one of my classes, I always use 42 as a seed. Today, a student asked what was special about 42. Kids don’t know anything.

ALT 42 Hitchhikers Guide To The Galaxy GIF

73
27
658
54,410
You cannot compress random data. In fact, that is what random means… if you can compress purely random data, then it is not random.
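A quick experiment makes the point tangible. The Python sketch below compresses pseudo-random bytes and repetitive bytes with zlib and compares the output sizes. The helper name `compressed_ratio` is hypothetical, and a seeded pseudo-random generator only stands in for a truly random source, on the assumption that a general-purpose compressor cannot exploit its structure.

```python
import random
import zlib

def compressed_ratio(data):
    """Size after zlib compression divided by the original size."""
    return len(zlib.compress(data, 9)) / len(data)

rng = random.Random(42)   # fixed seed so the experiment is repeatable
random_data = bytes(rng.getrandbits(8) for _ in range(100_000))
repetitive_data = b"latin1" * (100_000 // 6)

print("random:", compressed_ratio(random_data))
print("repetitive:", compressed_ratio(repetitive_data))
```

The repetitive input shrinks to a small fraction of its size, while the random-looking input does not shrink: the compressor finds no pattern to exploit and its framing adds a few bytes.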
80
34
658
98,943
Scan HTML faster with SIMD instructions: .NET/C# Edition. Recently, the two major Web engines (WebKit and Chromium) adopted fast SIMD routines to scan HTML content. The key insight is to use vectorized classification (Langdale and Lemire, 2019): you load blocks of characters and identify the characters you seek using a few instructions. In particular, we use ‘SIMD instructions’, special instructions that are available on practically all modern processors and can process 16 bytes or more at once. The problem that WebKit and Chromium solve is to jump to the next relevant characters: one of <, &, \r and \0. Thus we must identify quickly whether we have found one of these characters in a block. On my Apple MacBook, a fast SIMD-based approach can scan an HTML page at about 7 GB/s, with code written in C/C++. But what about C#? The recent C# runtime (.NET 8) supports fast SIMD instructions. Let us first consider a simple version of the function. This function just visits each character, one by one, and it compares it against the target characters. If one target character is found, we return. Let us consider a SIMD version of the same function. It is slightly more complicated. The function takes two pointers (ref byte* start and byte* end) that mark the beginning and end of the byte array. The main loop continues as long as start is at least 16 bytes away from end. This ensures there’s enough data for vectorized operations. We load in the variable ‘data’ 16 bytes from the memory pointed to by start. We use a vectorized lookup table and a comparison to quickly identify the target characters. The code checks if any element in matchesones is not zero. If there’s a match, then we locate the first one (out of 16 characters), we advance start and return. If no match is found, we advance by 16 characters and repeat. We conclude with a fallback loop that processes the leftover data (less than 16 bytes).
I wrote a benchmark (in C#) that you can run if you have an ARM-based processor. It would be trivial to extend the benchmark to x64 processors. Incredibly, the SIMD-based function is 15 times faster than the conventional function in these tests, and the accelerated C# function is just as fast, if not faster, than the equivalent C/C++ code. In other words, .NET/C# allows you to write very fast code using SIMD instructions.
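The structure of the two functions can be sketched in Python. Python has no SIMD instructions, so the per-block list comprehension below merely stands in for what a vector lookup and a vector comparison would do in one step each. The low-nibble table is one plausible way to realise the "lookup table and comparison" described above, not a transcription of the C# code, and all names are hypothetical.

```python
TARGETS = b"<&\r\0"

def naive_scan(data, start=0):
    """Visit bytes one by one; return the index of the first target, or len(data)."""
    for i in range(start, len(data)):
        if data[i] in TARGETS:
            return i
    return len(data)

# Table indexed by the low 4 bits of a byte. The four targets have distinct
# low nibbles (0xC, 0x6, 0xD, 0x0), so each slot holds at most one target.
# Unused slots hold 0, which can only equal a byte whose low nibble is 0.
LOW_NIBBLE_TABLE = [0] * 16
for t in TARGETS:
    LOW_NIBBLE_TABLE[t & 0xF] = t

def block_scan(data, start=0):
    """Same result as naive_scan, processing 16 bytes per iteration."""
    i = start
    while i + 16 <= len(data):
        block = data[i:i + 16]
        # Lookup plus comparison: a byte is a target iff table[low nibble] == byte.
        matches = [LOW_NIBBLE_TABLE[b & 0xF] == b for b in block]
        if any(matches):
            return i + matches.index(True)   # position of the first match
        i += 16
    return naive_scan(data, i)               # fallback loop for the leftover bytes
```

For example, `block_scan(b"some plain text before the tag <p>")` returns the index of the `<` character, exactly as `naive_scan` does.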
19
94
673
95,124
Learn C. You can learn it in weeks.
“You dont need to learn C, I didnt” Anyone telling you that is limiting you in favor of their ego Most of the worlds software is in C/C++ Operating systems, browsers, game engines, your favorite programming language Is most likely written in C/C++ It certainly helps to understand how a computer works and what your programs are doing Especially starting off It will make you a better programmer C syntax is simple and there are a ton of resources on learning it Do a couple projects in C Then move onto a a higher language
45
34
625
216,747
Replying to @w2k_
I reported them.
1
638
27,340
I have co-written three textbooks using Microsoft Word. Two of these books, I need to update from time to time because technology changes. E.g., my Python book did not cover 'uv' initially and described venv. I had to change that. The Java book is still mostly Java 8 and does not cover all the neat stuff added between Java 8 and Java 25. Updating a complex Word document is a terrible experience. Maintaining consistency is needlessly difficult. There is no perfect solution when writing technical books... But if you want to self-publish and you do not want to pay for an editing team... Microsoft Word is easily the worst option in my opinion. I only ended up with Microsoft Word because that's what my great co-author (Godin) used at the time. Microsoft Word has no built-in programming code highlighting. There are plugins that 'sort of' work, but they are clunky. My latest book and, I hope, all my future technical books, will be written in Markdown. It is not perfect. My friend @pshufb spotted various technical issues in the first draft of the book. However, I can correct most issues with a reusable script. I think that there is a valid use for Microsoft Word: short documents that you are going to throw away later or simple text-only documents. For nearly everything else, there are better solutions if you are tech savvy.
since when did Word become like this
55
21
643
70,595
In 10 years, you will all be unemployed but you’ll also live forever disease-free. … a prediction brought to you by people who never cured a significant disease and work for corporations hiring endless streams of software engineers with eye-watering salaries. Don’t pay attention to what people say. Pay attention to what they do. I’ll pay attention when Google is downsizing its staff down to a skeleton crew. I’ll pay attention when AI can make a mouse live to be 10 years old.
DeepMind CEO believes all diseases will be cured in about 10 years ACCELERATE
18
49
610
83,484
Microsoft makes over 15 billion dollars of profit per year. And they won't bother to provide a clean, user-friendly system. And nobody cares. It explains so much.
I present to you, Windows.
47
26
574
43,868
I was recently asked why we do not teach students to write code in Python... completely skipping the perils of memory management and pointers. Some prestigious computer science departments have taken this route. A cynical voice in my head warns that it is because the professors themselves can't write code anymore. Even though it is not 'economical', I argue that you still should write a lot of your own code from scratch. In the 1980s, writing software meant building everything from scratch. Needed a sorting function? You coded it. Developers relied on recipe books or shared snippets of code. Software was lean, running on minimal memory and disk space. It was easy (easier than today) to switch on a pixel on your screen... but if you wanted a hash table, you had to provide it. Looking back, games like Doom—built with just 10,000 lines of code and 5 MB of disk space—seem remarkably efficient compared to modern games that span gigabytes. The original Linux kernel was about 10,000 lines of code too! Today, programming feels like magic. Complex algorithms like red-black trees come for free, backed by stacks of intricate code handling dynamic programming, Unicode, and more. A teenager could recreate Doom in weeks, if not days. We cannot—and would not want to—return to the MS-DOS era. Programming today is far more accessible, which is a net positive. But complexity looms. As Joel Spolsky noted, “all abstractions are leaky.” When systems grow so intricate that no one fully grasps the foundations, we risk becoming like children inheriting a world we do not understand—where things work, or do not, and no one knows why. Jonathan Blow (@Jonathan_Blow ) has warned of this potential collapse, urging us to stay grounded in the fundamentals. The web’s complexity, with its sprawling standards and stacks, is particularly troubling. Engineers like Andreas Kling (@awesomekling ), building browser engines from scratch, are unsung heroes navigating this modern software chaos. 
Keep in mind how remarkable it is that, except for Andreas's work, we have roughly two Web browser engines: Firefox's engine and WebKit/Chromium. Though WebKit and Chromium are now distinct, one is roughly a fork of the other. How many people understand how they are built and how they work? This mirrors broader concerns. Just as we value international trade but worry about losing the ability to build critical technologies like drones, software’s growing reliance on dependencies raises similar risks. We need both dependencies and trade, but unchecked complexity could lead to fragility. Balance is required to avoid collapse. In concrete terms, if you value culture (and you should), you want people to write high-quality software from scratch around you. You do not want to be reliant on a few super large teams inside massive corporations. In the short term, there is a trade-off between culture and efficiency. But a rich culture has deep benefits in the long run. So buy a beer for the guys writing a browser engine from scratch!
On my machine, the ada library builds slightly faster (4 s vs 5 s) than the competing solution. The ada crate has two dependencies: derive_more and serde. We also have development dependencies, but these should not be relevant if you are only using the lib. We have build time dependencies (cc and regex), but they are rather standard. Modern URL parsing is somewhat of a daunting task. You might be interested in our paper... please see the expected state machine required for parsing URLs. Yagiz Nizipli, Daniel Lemire, Parsing Millions of URLs per Second, Software: Practice and Experience 54 (5), 2024 onlinelibrary.wiley.com/doi/… It is, of course, possible to write a parser that is very simple and will work 'most of the time'. Yet a lot of people prefer a parser that just follows the modern spec (WHATWG URL) with high computing efficiency. They prefer not to have to worry about Unicode issues.
38
110
597
57,392
To those who say that software performance is not of primary importance… I have one word: deepseek.
15
45
594
32,074
Firms would need to spend 3.5 times more on software than they currently do if open source did not exist. The top six programming languages in our sample comprise 84% of the demand-side value of OSS. Further, 96% of the demand-side value is created by only 5% of OSS developers. Hoffman et al., 2024
13
137
546
85,835
Dijkstra was a critical thinker. It is a shame that he did not have more influence on computer science. The world would be better today.
Replying to @lemire
Fun fact: Even Dijkstra was unable to prevent his uni board from replacing Haskell (fp language) with Java( oop language) in the undergraduate curriculum. cs.utexas.edu/users/EWD/Othe…
25
61
514
77,728
The Windows Dev Kit 2023 is great. You get this tiny box running a mobile ARM processor (8 cores) with 32 GB of RAM, and 512GB fast NVMe. You get a full Windows: you could not tell that it is not Intel. Microsoft is ready for 64-bit ARM systems. microsoft.com/en-us/d/window… I have not tested with games and I don’t do heavy multimedia work. The Microsoft Store does not appear to work well with ARM yet.
44
39
518
224,402
In C++26, reading uninitialized variables is no longer undefined behaviour... and more. Great talk by @herbsutter: Peering Forward - C++’s Next Decade - Herb Sutter - CppCon 2024.
13
62
519
46,259
On-demand JSON: A better way to parse documents? JSON is a popular standard for data interchange on the Internet. Ingesting JSON documents can be a performance bottleneck. A popular parsing strategy consists in converting the input text into a tree-based data structure—sometimes called a Document Object Model or DOM. We designed and implemented a novel JSON parsing interface—called On-Demand—that appears to the programmer like a conventional DOM-based approach. However, the underlying implementation is a pointer iterating through the content, only materializing the results (objects, arrays, strings, numbers) lazily. On recent commodity processors, an implementation of our approach provides superior performance in multiple benchmarks. To ensure reproducibility, our work is freely available as open source software. Several systems use On Demand: for example, Apache Doris, the Node.js JavaScript runtime, Milvus, and Velox. On-demand JSON: A better way to parse documents? Software: Practice and Experience 54 (6), 2024. doi.org/10.1002/spe.3313 Software: github.com/simdjson/simdjson
30
63
512
121,644
Learning from the object-oriented mania Back when I started programming professionally, every expert and every software engineering professor would swear by object-oriented programming. Resistance was futile. History had spoken: the future was object-oriented. It is hard to overstate how strong the mania was. In education, we started calling textbooks and videos ‘learning objects’. Educators would soon ‘combine learning objects and reuse them’. A competitor to a client I was working for at the time had written a server in C. They had to pay lip service to object-oriented programming, so they said that their code was ‘object-oriented’. I once led a project to build an image compression system. They insisted that before we even wrote a single line of code, we planned it out using ‘UML’. It had to be object-oriented from the start, you see. You had to know your object-oriented design patterns, or you could not be taken seriously. People rewrote their database engines so that they would be object-oriented. More than 25 years later, we can finally say, without needing much courage, that it was insane, outrageous, and terribly wasteful. Yet, even today, the pressure remains on. Students are compelled to write simple projects using multiple classes. Not just learn the principles of object-oriented programming, which is fair enough, but we still demand that they embrace the ideology. To be fair, some of the basic principles behind object-oriented programming can be useful. At least, you should know about them. But the mania was unwarranted and harmful. The lesson you should draw is not that object-oriented is bad, but rather that whatever the current trendy technique or idea is, it is likely grossly overrated. The social mechanism is constantly in action, though it is no longer acting for object-oriented programming. It takes many forms. Not long ago, you had to wear a mask to attend a conference. 
Everyone ‘knew’ that masks stopped viruses and had no side-effect… just like everyone just knew that object-oriented programming makes better and more maintainable software, without negative side-effects. You can recognize such a social contagion by its telltale signs. Rapid Spread: A social contagion spreads quickly through a group or community, much like a wildfire. One day everyone is talking about the latest object-oriented pattern, and the next day, everyone is putting it into practice. Amplification: You often observe the emergence of ‘influencers’, people who gain high social status and use their newly found position to push further the hype. The object-oriented mania was driven by many key players who made a fortune in the process. They appeared in popular shows, magazines, and so forth. Peer Influence: Social contagion often relies on peer influence. E.g., everyone around you starts talking about object-oriented programming. Conformity: People often mimic the behaviors or attitudes of others in their group, leading to a conformity effect. People who do not conform are often excluded, either explicitly or implicitly. For example, object-oriented started to appear in job ads and was promoted by government agencies. Aggressive Behavior: You see a significant change from usual behavior as irrationality creeps in. If you criticize object-oriented programming, something is wrong with you! Grandiose Beliefs or Delusions: Claims that object-oriented programming would forever change the software industry for the better were everywhere. You could just easily reuse your objects and classes from one project to the other. Never mind that none of these claims could ever be sustained. Risky Behavior: Entire businesses bet their capital on projects trying to reinvent some established tool in an object-oriented manner. People kept throwing caution to the wind: let us rebuild everything the one true way, what is the worst that can happen?
70
101
499
95,168
I have been asked by some organization in France to send a signed letter in three copies. I asked whether I could send a scanned letter by email instead. They agreed. They insisted that it be in 3 copies. So I sent three emails and asked whether it was ok. They said I am good.
21
60
477
In C/C++, passing a function as a parameter is often free (highly performant) as long as the definitions are all available. In this example, the function 'g' ought to be just as fast as 'add'.
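The example the post refers to is not reproduced here. A minimal sketch of the idea follows, assuming a hypothetical helper named call_with (only 'add' and 'g' are named in the post):

```cpp
#include <cassert>

// 'add' and 'g' are named in the post; 'call_with' is a hypothetical helper.
inline int add(int x, int y) { return x + y; }

// The callable is a template parameter, so its definition is visible where
// it is used and the compiler is free to inline the call.
template <typename F>
inline int call_with(F f, int x, int y) { return f(x, y); }

// 'g' passes 'add' as a parameter; with optimizations enabled it typically
// compiles down to the same addition as calling 'add' directly.
inline int g(int x, int y) { return call_with(add, x, y); }

// Usage: both paths must agree.
inline void check_g() { assert(g(2, 3) == add(2, 3)); }
```

Whether the call is actually inlined depends on the compiler and the optimization level, so it is worth checking the generated assembly if it matters.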
23
21
486
49,675
I gave entire talks on git. I use git every single day. I have written book chapters on git. Yet I estimate that I understand only a fraction of git.
"you can learn git in 30 minutes" This sentence tells me everything i need to know about you.
27
16
487
40,336
21st Century C++ By Bjarne Stroustrup 1. We are moving to modules (replacing the headers and includes). 2. Use std::span instead of pointer addressing. 3. Adopt concepts. 4. Profiles!
17
49
474
80,059
I wasted almost half a day of research yesterday because I trusted ChatGPT's answer too much. I asked ChatGPT how to do X using the programming language Y (it is not a secret, but details are irrelevant). The answer looked quite credible and there were references to human experts proposing the answer. So I blindly trusted it. After much frustration, I finally tested it and found the answer entirely incorrect.
87
50
443
165,148
I created a new GitHub organization dedicated to fast compression routines. So far, we have C, C++ and Java libraries. It is open to all. Other programming languages are invited.
12
48
466
27,131
One time, a student was presenting his PhD thesis: 15 minutes in, Windows launches a software update. We had to actually pause in the middle of the presentation of the thesis to wait for Windows to complete its software update.
Replying to @SheriefFYI
It will force reboot to install Windows updates every week, while you are playing games.
16
18
456
26,368
Several system administrators in charge of mission-critical systems effectively gave root access to CrowdStrike. Every one of these administrators made a major mistake. My impression is that a lot of so-called 'security experts' are in cargo cults. They are not thinking, they are following trends and adopting blindly tools and standards that they don't fully understand. It makes no sense to have a vendor deploy a patch live on your critical servers all at once without any validation whatsoever on your part.
"Professional programmers" focusing on CrowdStrike disassembly/language is a coping mechanism that protects them from realizing that there is a remotely updated 3rd party kernel module that is deployed on significant part of the world. That is why real postmortems are important.
34
82
451
69,647
The man who invented the hash table never studied computer science. In fact, he never went to college. Stop making excuses and build stuff.
The German-American inventor Hans Peter Luhn (IBM): - invented hash tables - invented document summarization - invented modern knowledge management - invented KWIC index - invented modulo-10 checksums - invented publish-subscribe alert services AKA "SDI"; all in the 1950s!
7
58
437
48,453
Casey gave a great talk on object oriented programming. He was kind to object oriented programming… but he couldn’t « not say » that Alan Kay’s vision failed. I found his hypothesis for what motivated Kay credible (i.e., biological cells as self-organizing objects). As a professor, I get to teach object oriented programming… and, yes, there is a shape class which you can derive from to create a rectangle in my lecture notes and manuals. It is all very terrible. Don’t do this in the real world. But you have to teach it just like you have to teach about socialism: it is a catastrophe in practice but you will encounter it everywhere. Better learn about it by someone who has a minimum of real-world experience. I would never ask students to solve a difficult problem using inheritance, except maybe as a lesson on what to avoid. His talk made me realize that, in addition to other strategies that do not involve inheritance, interfaces and composition, we should probably teach the entity-component-system as a good example of what to do instead. It is now part of my lecture notes.
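For readers who have not met the entity-component-system pattern, here is a minimal sketch. All names (World, Position, Velocity, move_system) are hypothetical and chosen for illustration; this is not the material from the lecture notes or the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Components are plain data, stored in arrays indexed by entity id.
struct Position { float x, y; };
struct Velocity { float dx, dy; };

struct World {
  std::vector<Position> position;
  std::vector<Velocity> velocity;
  std::vector<bool> has_velocity; // not every entity moves

  // An entity is just an index into the component arrays.
  size_t create(Position p) {
    position.push_back(p);
    velocity.push_back(Velocity{0, 0});
    has_velocity.push_back(false);
    return position.size() - 1;
  }
  void set_velocity(size_t e, Velocity v) {
    assert(e < position.size());
    velocity[e] = v;
    has_velocity[e] = true;
  }
};

// A system is a function that updates every entity that has the components
// it needs. There is no base class and no virtual call.
inline void move_system(World &w, float dt) {
  for (size_t e = 0; e < w.position.size(); e++) {
    if (!w.has_velocity[e]) continue;
    w.position[e].x += w.velocity[e].dx * dt;
    w.position[e].y += w.velocity[e].dy * dt;
  }
}

// Usage: a rock stays put, a ball moves along x.
inline float demo_ball_x() {
  World w;
  w.create(Position{1, 1}); // rock, no velocity
  size_t ball = w.create(Position{0, 0});
  w.set_velocity(ball, Velocity{2, 0});
  move_system(w, 0.5f);
  return w.position[ball].x;
}
```

Behavior lives in the systems and data lives in the component arrays, so adding a new kind of object means adding data, not deriving a new class.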
[1/3] I'm really happy with how well The Big OOPs has been received. It was a hard talk to put together due to the volume and complexity of the material, and it took a lot of time to weave everything into a single coherent presentation that could flow well while being accurate.
23
25
444
47,411
As a teenager and young adult, I was ridiculously successful by academic metrics. I easily got As and A+s. I completed my undergraduate on a full scholarship which is uncommon in Canada. I effectively completed my PhD in less than 2 years. I taught myself assembly programming when I was 13. I thought I was really smart. Was I smart? I have learned that it is excessively easy to overestimate your own competence. I was not so smart. I was rather mediocre. What matters, and the only thing that should matter, is your ability to solve problems in the real world. And I had little of that when I was young. I simply 'overfitted' my ability to do really well on whatever I was tested on. But it turns out that schools don't test the right skills. - I did not know how to start and run a business. I did not know how to negotiate a deal. I would later learn about my own incompetence. - I thought I was this great programmer, but my software was always limited to toy cases that would not scale in complexity. I was a bad programmer. - I knew a lot of science, from the books I had read, but I was terrible at applying it in the real world. My advice to younger folks is: go out there and do real projects. Don't pay too much attention to your academic results. If you can solve real problems, you will do ok. Learn to start and run a business. It is really useful!
18
59
443
22,268
The long tail is longer than people imagine. There are more people doing odd things with computers than you would think. And they are doing weirder things than you imagine. 1. I repeatedly get reports from people who are disappointed that I do not provide accelerated functions for POWER processors prior to POWER 7. POWER 7 was released in 2010. I know that IBM is still releasing POWER processors (up to 11) and that Apple once used POWER processors... but who are these people running POWER processors from the beginning of the century? 2. I repeatedly get reports from people using GCC under Windows. GCC under Windows is quite a beast, and it has significant critical bugs. There is Unix/Linux-specific software that may require GCC (i.e., neither LLVM nor Visual Studio can work)... but, as far as I can tell, it should only be used for such legacy support work, not for new projects. 3. I get reports from Solaris users every few months. I am not sure who these Solaris users are. I suspect that most people in the software industry today don't know what Solaris is. 4. I regularly get emails and comments from people who want to build software for mainframe computers. 5. A recurrent theme, but one that is slowly fading, is the folks who build 32-bit software for Windows. I still get reports from these people, to this day... and probably for the next 10 years.
32
31
430
24,769
The laws of Physics are not what is limiting the performance of Microsoft Teams.
They are way beyond saving at this point, but if I were Satya N, I would cull at least 30% of engineers to start, then tell people more is coming if they don’t learn to do their jobs.
6
12
404
14,354
Apple's A12 reaches 3.7 instructions per cycle on a bitmap decoding test, that's 40% better than an Intel Skylake. Put another way, the 2.5 GHz A12 retires instructions like a 3.5 GHz Intel Skylake. cc @vielmetti @thecomp1ler @geofflangdale @trav_downs @stephentyrone
13
112
380
You are not depressed because your work is hard technically or intellectually. However, you might be depressed because you are lying all the time.
Replying to @lemire
b4 I started in business I thought engineering-burnout had to do with constantly being faced with hard engineering problems & cracking under pressure doing business realized I craved hard engineering problems & was burned out by meetings/politics/lies/backstabbing/profitteers
11
11
400
62,103
Intel wants to go full 64 bits and drop legacy modes… cdrdv2-public.intel.com/7766…
22
70
386
110,131
Scan HTML even faster with SIMD instructions (C++ and C#) Earlier this year, both major Web engines (WebKit/Safari and Chromium/Chrome/Edge/Brave) accelerated HTML parsing using SIMD instructions. These ‘SIMD’ instructions are special instructions that are present in all our processors that can process multiple bytes at once (e.g., 16 bytes). The problem that WebKit and Chromium solve is to jump to the next target character as fast as possible: one of <, &, \r and \0. In C++, you can use std::find_first_of for this problem whereas in C#, you might use the SearchValues class. But we can do better with a bit of extra work. The WebKit and Chromium engineers use vectorized classification (Langdale and Lemire, 2019): we load blocks of bytes and identify the characters you seek using a few instructions. Then we identify the first such character loaded, if any. If no target character is found, we just move forward and load another block of characters. I reviewed the new methods from the Web engines in C++ and C# along with minor optimizations. The results are good, and it suggests that WebKit and Chromium engineers were right to adopt this optimization. But can we do better? Let us consider an example to see how the WebKit/Chromium approach works. Suppose that my HTML file is as follows: <!doctype html><html itemscope="" itemtype="http://schema.o'... We load the first 16 bytes… <!doctype html>< We find the target characters: the first character and the 16th character (<!doctype html><). We move to the first index. Later we start again from this point, this time loading slightly more data… !doctype html><h And this time, we will identify the 15th character as a target character, and move there. Can you see why it is potentially wasteful? We have loaded the same data twice, and identified the same character again as a target character. Instead, we can adopt an approach similar to that used by systems like simdjson. We load non-overlapping blocks of 64 bytes. 
Each block of 64 bytes is turned into a 64-bit word where each bit corresponds to a loaded character. If the character is a match, then the corresponding bit is set to 1. The computed 64-bit word serves as an index for the 64 characters. Once we have used it up, we can load another block and so forth. Let us build a small C++ structure to solve this problem. We focus on ARM NEON, but the concept is general. ARM NEON is available on your mobile phone, on some servers and on your MacBook. Our constructor initializes the neon_match64 object with a character range defined by start and end. We define three public methods: get(): Returns a pointer to the current position within the character range. consume(): Increments the offset and shifts the matches value right by 1. advance(): Advances the position within the range. To iterate over the target characters, we might use the class as follows: The class code might look like the following: The most complicated function is the update function because ARM NEON makes it a bit difficult. We use vld1q_u8(buffer) to load 16 bytes (128 bits) of data from the memory location pointed to by buffer into the data1 variable. Similarly, data2, data3, and data4 load subsequent 16-byte chunks from buffer + 16, buffer + 32, and buffer + 48, respectively. The expression vandq_u8(data1, v0f) performs a bitwise AND operation between data1 and a vector v0f (which contains the value 0xF in each lane). The expression vqtbl1q_u8(low_nibble_mask, ...) uses the low nibbles produced by the AND operation as indexes into the low_nibble_mask table. The result is stored in lowpart1, lowpart2, lowpart3, and lowpart4. The expression vceqq_u8(lowpart1, data1) compares each lane of lowpart1 with data1. If equal, the corresponding lane in the result is set to all ones (0xFF). These lanes correspond to target characters. You repeat the same computation four times. We then use bitwise AND with bit_mask(...) 
followed by several applications of pairwise sums vpaddq_u8(...) to compute the 64-bit word. I have added this C++ structure to an existing benchmark. Furthermore, I have ported it to C# for good measure. How well does it do? I use a small C++ benchmark where I scan the HTML of the Google home page. I run the benchmark on my Apple M2 laptop (LLVM 15). The numbers speak for themselves: the 64-bit vectorized classification approach is the clear winner. It requires few instructions and it allows retiring many instructions per cycle on average. What about C#? Again, the 64-bit vectorized-classification approach is the clear winner. The SearchValues class provided by .NET gets an honorable mention for its good performance. I am not exactly sure why the C# code is running at half the speed of the corresponding C++ code in this instance. But 15 GB/s is already fantastic. I have not yet ported the code for x64 processors, but I expect equally good results.
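The class itself is not reproduced in this post. To make the 64-bit index idea concrete, here is a portable scalar sketch with the same three public methods. It is an assumption-laden stand-in, not the NEON code: the names match64, is_target and target_positions are hypothetical, and the per-byte loop in update() replaces the table lookups and pairwise sums described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Scalar stand-in for the vectorized classification step.
inline bool is_target(char c) {
  return c == '<' || c == '&' || c == '\r' || c == '\0';
}

// Hypothetical portable analogue of the neon_match64 structure.
struct match64 {
  match64(const char *s, const char *e)
      : start(s), end(e), offset(0), matches(0) {
    assert(s <= e);
    update();
  }
  // Pointer to the current position within the character range.
  const char *get() const { return start + offset; }
  // Step past the current character: bump the offset, shift the index by 1.
  void consume() {
    offset++;
    matches >>= 1;
  }
  // Move to the next target character; returns false at the end of the range.
  bool advance() {
    while (matches == 0) {
      if (size_t(end - start) <= 64) { // no further block to load
        offset = size_t(end - start);
        return false;
      }
      start += 64; // non-overlapping blocks: each byte is classified once
      offset = 0;
      update();
    }
    // Real code would use a count-trailing-zeros instruction here.
    while ((matches & 1) == 0) {
      matches >>= 1;
      offset++;
    }
    return true;
  }

private:
  // Build the 64-bit index for the current block (bit i <-> start[i]).
  void update() {
    size_t remaining = size_t(end - start);
    size_t n = remaining < 64 ? remaining : 64;
    matches = 0;
    for (size_t i = 0; i < n; i++) {
      if (is_target(start[i])) matches |= uint64_t(1) << i;
    }
  }
  const char *start;
  const char *end;
  size_t offset;
  uint64_t matches;
};

// Usage: collect the position of every target character.
inline std::vector<size_t> target_positions(const std::string &s) {
  const char *begin = s.data();
  match64 m(begin, begin + s.size());
  std::vector<size_t> positions;
  while (m.advance()) {
    positions.push_back(size_t(m.get() - begin));
    m.consume();
  }
  return positions;
}
```

On the sample input from the post, the two '<' characters are reported at indexes 0 and 15, and each is identified only once because the blocks never overlap.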
8
48
391
59,936
Java first came out with a suite of tools that was ahead of its time. You could package your software, document it, and so forth. Over time, some of the very best environments were built to serve the Java ecosystem. But it turned into a curse. Yesterday, I was setting up a simple benchmark for my blog post using JMH. Sounds like child's play, right? Wrong. What followed was an epic saga of build system wizardry. I spent 10x more time on the build setup than on the code. Look at the screenshot, and pay attention to the necessary `<arg>-processor</arg>`. Want to build and run... mvn clean install && java -jar target/benchmarks.jar Why the arcane ritual? Gradle came along, promising a new dawn. But alas, it seems just as complicated. Let us compare with C#/.NET. We find the same XML... but I can hand edit these files with relative ease (except for the silly hash codes). And once you get it working, you can just do... dotnet build, dotnet test, dotnet format, dotnet run. Adding a dependency? dotnet add package. You know what the equivalent is in Go? Want to write a benchmark? Write the code. No config file. No dependency. Just 'go test -bench .'. I'm no novice. I can write my own CMake build files for C++, I've set up Cargo for Rust, programmed in Python, JavaScript, OCaml. It always seems simpler, faster, better. Yes, even Python seems better despite all the complaints. Basically, almost everyone spent a whole lot of effort simplifying the builds so that it is easy, reproducible, understandable, debuggable. You should not need hundreds of lines of XML code to build a project unless it is a massive project, like your own Web browser or operating system. But Java? It stands alone. Instead of simplifying the build, it sought to hide it under graphical interfaces. So, when you need to peek under the hood, Java's build systems feel like a labyrinth designed by a mad architect. 
It's not the autoconf/automake nightmare, but in this age of instant build setup, Java's build ecosystem feels like a relic, a monument to complexity overthrown by the very simplicity it once championed. Maven and Gradle are good at failing with mysterious error messages that only a C++ compiler can surpass.
62
50
380
53,144
Scan HTML faster with SIMD instructions: Chrome edition Modern processors have instructions to process several bytes at once. Effectively all processors have the capability of processing 16 bytes at once. These instructions are called SIMD, for single instruction, multiple data. It was once an open question whether these instructions could be useful to accelerate common tasks such as parsing HTML or JSON. However, the work on JSON parsing, as in the simdjson parser, has shown rather decisively that SIMD instructions could, indeed, be helpful in breaking speed records. Inspired by such work, the engine under the Google Chrome browser (Chromium) has adopted SIMD parsing of the HTML inputs. It is the result of the excellent work by a Google engineer, Anton Bikineev. The approach is used to quickly jump to four specific characters: <, &, \r and \0. You can implement something that looks a lot like it using regular C++ code as follows: void NaiveAdvanceString(const char *&start, const char *end) { for (;start < end; start++) { if(*start == '<' || *start == '&' || *start == '\r' || *start == '\0') { return; } } } A ‘naive’ approach using the SIMD instructions available on ARM processors looks as follows. Basically, you just do more or less the same thing as the naive regular/scalar approach, except that instead of taking one character at a time, you take 16 characters at a time. You can do slightly better if you use an approach we call vectorized classification (see Langdale and Lemire, 2019). Basically, you combine a SIMD approach with vectorized table lookups to classify the characters. The ARM NEON version using two table lookups looks as follows: This version is close to Bikineev’s code as it appears in the Google Chrome engine, except that I use standard intrinsics while Google engineers prefer to use the excellent highway SIMD library by Jan Wassenberg. We can do slightly better in this instance because Bikineev only needs to distinguish between four characters. 
A single table lookup is needed. We did not elaborate in Langdale and Lemire (2019) but vectorized classification works using one, two, three or more table lookups, depending on the complexity of the target set. The simpler version might look as follows: How do these three techniques compare? I wrote a small benchmark where I scan the HTML of the Google home page. I ran the benchmark on my Apple M2 laptop (LLVM 15). The results follow my expectations: the simplest vectorized classification routine has the best performance. However, you may observe that even a rather naive SIMD approach can be quite fast in this instance. If you have an old SSE2 PC, only the simple SIMD approach is available. My results suggest that it might be good enough to get good results.
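The SIMD listings are not reproduced in this post. To show why a single table lookup suffices for these four characters, here is a scalar sketch that applies the same logic one byte at a time. It is an illustration under assumptions, not the NEON routine: the names low_nibble_table, is_target_lookup, TableAdvanceString and count_mismatches are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// A 16-entry table indexed by the low nibble of a byte. Each entry holds the
// target character with that low nibble, if there is one:
// '\0' = 0x00, '&' = 0x26, '<' = 0x3c, '\r' = 0x0d.
// The four low nibbles (0, 6, c, d) are distinct, so one table is enough.
// Unused entries hold 0, which can never equal a byte with a nonzero low
// nibble.
static const uint8_t low_nibble_table[16] = {0x00, 0, 0, 0, 0,    0,    0x26, 0,
                                             0,    0, 0, 0, 0x3c, 0x0d, 0,    0};

// A byte is a target exactly when the table entry selected by its low nibble
// is equal to the byte itself (lookup, then compare).
inline bool is_target_lookup(uint8_t c) {
  return low_nibble_table[c & 0xf] == c;
}

// Scalar stand-in for the SIMD routine: move 'start' to the next target.
inline void TableAdvanceString(const char *&start, const char *end) {
  for (; start < end; start++) {
    if (is_target_lookup(uint8_t(*start))) return;
  }
}

// Counts the byte values on which the lookup disagrees with the plain
// comparison chain used by NaiveAdvanceString.
inline int count_mismatches() {
  int mismatches = 0;
  for (int c = 0; c < 256; c++) {
    bool expected = c == '<' || c == '&' || c == '\r' || c == '\0';
    if (is_target_lookup(uint8_t(c)) != expected) mismatches++;
  }
  return mismatches;
}
```

A vectorized version performs the lookup and the comparison on 16 bytes per instruction; the per-byte logic is the same.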
13
63
393
47,286
Performance tip: avoid unnecessary copies Copying data in software is cheap, but it is not at all free. As you start optimizing your code, you might find that copies become a performance bottleneck. Let me be clear that copies really are cheap. It is often more performant to copy the data than to track the same memory across different threads. The case I am interested in is when copies turn a trivial operation into one that is relatively expensive. Recently, the fast JavaScript runtime (Bun) optimized its base64 decoding routines. Base64 is a technique used to represent binary data, like images or files, as ASCII text. It is ubiquitous on the Internet (email formats, XML, JSON, HTML and so forth). In Node.js and in Bun, you can decode a base64 string with Buffer.from(s, "base64"). Importantly, when decoding base64 strings, you can be almost certain that you are dealing with a simple ASCII string. And systems like Node.js and Bun have optimized string representations for the ASCII case, using one byte per character. We had optimized base64 decoding in Node.js some time ago (credit to Yagiz Nizipli for his work)… but I was surprised to learn that Bun was able to beat Node.js by a factor of three. Because both Node.js and Bun use the same base64 decoding, I was confused. I mistakenly thought, based on quick profiling, that Node.js would copy the base64 data from an ASCII format (one byte per character) to a UTF-16 format (two bytes per character) despite our best attempt at avoiding copies. It turns out that the copy was not happening as part of base64 decoding but in a completely separate function. There is an innocuous function in Node.js called StringBytes::Size which basically must provide an upper bound on the memory needed by the Buffer.from function. Since the early versions of Node.js, it would allocate memory to compute the size of the output: I install the latest version of Bun (bun upgrade --canary). I compare Node.js 22 with a patched version. 
I use my Apple MacBook for testing (ARM-based M2 processor). You can see that by simply avoiding the unnecessary copy, I boosted the base64 decoding from 2.5 GB/s to 6.0 GB/s. Not bad for removing a single line of code. Sometimes people observe at this point that the performance of Node.js 18 was already fine: 1.3 GB/s is plenty fast. It might be fast enough, but you must take into account that we are measuring a single operation that is likely part of a string of operations. In practice, you do not just ingest base64 data. You do some work before and some work after. Maybe you decoded a JPEG image that was stored in base64, and next you might need to decode the JPEG and push it to the screen. And so forth. To have an overall fast system, every component should be fast. You may observe that Bun is still faster than Node.js, even after I claim to have patched this issue. But there are likely other architecture issues that Bun does not have. Remember that both Node.js and Bun are using the same library in this instance: simdutf. It is maybe interesting to review Bun’s code (in Zig): It is far simpler than the equivalent in Node where memory is allocated in one function, and then the resulting buffer is passed to another function where the decoding finally happens. It is likely that Bun is faster because it has a simpler architecture where it is easier to see where the unnecessary work happens.
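The Node.js and Bun listings are not reproduced in this post. The pattern at issue can be illustrated with a small sketch. This is not the Node.js code: the function names are hypothetical, and the bound used (at most 3 output bytes for every 4 base64 characters) is a simple safe upper bound chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Upper bound on the number of bytes produced by decoding 'length' base64
// characters: every group of up to 4 characters yields at most 3 bytes.
inline size_t base64_decoded_size_bound(size_t length) {
  return (length + 3) / 4 * 3;
}

// Wasteful variant (hypothetical, for illustration): widen the ASCII input
// to 16-bit units just to learn how long it is, then discard the copy.
inline size_t size_with_copy(const std::string &ascii) {
  std::vector<char16_t> copy(ascii.begin(), ascii.end()); // allocate + copy
  return base64_decoded_size_bound(copy.size());
}

// Copy-free variant: the length is already known, no allocation is needed.
inline size_t size_without_copy(const std::string &ascii) {
  return base64_decoded_size_bound(ascii.size());
}
```

Both functions return the same value; the first one pays for an allocation and a pass over the data that contributes nothing to the result.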
The fast JavaScript runtime Bun is much faster than Node.js 22 at decoding Base64 inputs. By much faster, I mean *several times* faster. But they both rely on the same underlying library (simdutf) for the actual decoding. So what gives? The problem is that Node.js needs to interact with v8, the underlying JavaScript engine (from Google)... and doing so is not trivial. Before we can start decoding the string, we need to grab the string... so, in this instance, we call String::Value... In turn, this allocates an array inside Node.js and asks v8 to copy the content to it... In an ideal world, we would avoid the trouble entirely and just ask v8 to give us direct access to how it stores the string... and we try to do that if we can... but let me come back to it... How bad can this be, right? Just a copy. Well. Let us do some profiling... So you see, the base64 decoding itself is about 1/5 of the running time, but the copy takes half of it. What is up with this CopyChars function? Well, it is mostly just a wrapper around the standard high level C++ function std::copy_n as far as I can tell. (see v8/src/utils/memcopy.h) But we are copying from an 8-bit input to a 16-bit output... why is that? Base64 is pure ASCII... and v8 can store ASCII using 8 bits per character. We get there because both IsExternalOneByte() and IsOneByte() are false (see node/src/node_buffer.cc)... We have fast paths for these cases. If IsExternalOneByte() is true, we just get the bytes and everything is great. Unfortunately, it does not always work. So we have a v8 string that is really pure ASCII, but, seemingly, we can't tell that it is the case from Node.js, and so we have to convert it to UTF-16 needlessly, using a function that is maybe not very well optimized... and then we do the base64 decoding of an ASCII string from the UTF-16 input. It is not great.
To be fair, this is just one string, created as 'Buffer.alloc(size, "latin1").toString("base64")', basically the base64 encoded version of the string "latin1latin1latin1...". In actual applications, we might have better luck. Yet. Yet. I am telling this complicated story for a reason. The story illustrates why our software is slower than it should be. We have layers of abstractions to fight against. Sometimes you win, sometimes you lose. These layers are there for a reason, but they are not free. To make matters worse... these abstraction layers often thicken over time... and the friction goes up. To be clear, I do not claim that the Node.js code is optimal. In fact, I know it can be better. But it is not trivial to make it go fast. I sometimes hear people say... "well, it is C++ and C++ is hard". No. The C++ part is easy relatively speaking. The difficulty is at a higher level. It is not a matter of syntax. It is a matter of architecture.
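As a rough illustration of what the widening costs in memory, here is a small Python sketch. It rebuilds a test string like the one described above (base64 text of a buffer filled with the repeating pattern "latin1") and compares a one-byte-per-character encoding with UTF-16. The helper name and the size of 6000 bytes are my own choices for the example; this says nothing about how v8 stores strings internally.

```python
import base64


def make_base64_text(size: int) -> str:
    """Base64 text of `size` bytes filled with the repeating pattern b"latin1"."""
    pattern = b"latin1"
    raw = (pattern * (size // len(pattern) + 1))[:size]
    return base64.b64encode(raw).decode("ascii")


text = make_base64_text(6000)

# Base64 text only uses ASCII characters, so one byte per character is enough.
one_byte = text.encode("latin-1")
# The widened form spends two bytes on every character.
two_byte = text.encode("utf-16-le")

print(text.isascii())
print(len(text), len(one_byte), len(two_byte))

# Decoding from the compact form recovers the original 6000 bytes.
print(len(base64.b64decode(one_byte)))
```

The UTF-16 copy is twice the size of the input it was made from, and it carries no extra information.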
8
37
385
55,908
When nearly the entire world required face coverings, Sweden did not. When nearly all countries closed schools, they did not. Many predicted that Sweden would be a catastrophe. Today we know that Sweden has one of the lowest excess mortality rates in the world, same as Japan and South Korea. Those who predicted the worst for Sweden are now turning to post hoc: "we knew Sweden would be fine because it is rich" or "we knew they had enough measures". But it is all post hoc: look at their words from 2020/2021 and you find that they predicted the worst for Sweden. They are now working backward, to avoid facing their mistakes, keeping their faulty model instead. It is and was an emotional response: they "know" that granting all powers to the state keeps you safe. Individual freedom is "dangerous". They celebrated the censorship of their opponents: they called individual freedom a value for "right wing nuts". Sweden was the moderate, while many others went mad. Yet they demonized Sweden. The twentieth century is a long lesson teaching us to renounce overbearing governments, to fight for individual freedom, to preserve free speech. When we fail, millions die or suffer, we stagnate. Progress and happiness require freedom and strength.
The USA would have had 1.50 million fewer deaths if it had the performance of Sweden during the pandemic. medrxiv.org/content/10.1101/…
19
79
382
131,460
The ARM-based servers (e.g., graviton 3) on AWS appear to also default to 4 kB page sizes. That's unfortunate because it makes memory allocation and page faults more expensive/likely. With Linux, you can use transparent huge pages, however, and this can bring tremendous benefits in some cases. lemire.me/blog/2022/06/07/me…
people like to talk about the M1’s CPU performance but ignore the perf benefits from page size grow from 4096 -> 16 KB Page faults are expensive and not much desktop software uses < 16KB anyway
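The arithmetic behind the page-size argument can be sketched in a few lines of Python. The 64 MiB buffer and the 2 MiB huge-page size are assumptions picked for illustration (huge-page sizes vary by system); the function name is hypothetical. Fewer pages for the same buffer means fewer page faults when the memory is first touched.

```python
import mmap


def pages_needed(num_bytes: int, page_size: int) -> int:
    """Number of pages required to back a buffer of num_bytes (rounded up)."""
    return -(-num_bytes // page_size)


# The page size reported for the machine running this script.
print("page size on this machine:", mmap.PAGESIZE)

buffer_size = 64 * 1024 * 1024  # 64 MiB, an arbitrary example
for page_size in (4 * 1024, 16 * 1024, 2 * 1024 * 1024):
    print(f"{page_size:>8} bytes per page -> {pages_needed(buffer_size, page_size)} pages")
```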
12
39
377
175,555
If you think that performance does not matter, watch this video by @jarredsumner comparing the performance of his fast JavaScript runtime with the standard Node.js runtime. You see why we care? cc @yagiznizipli
Calling fetch(url) 15,000 times, in batches of 50 left: Bun v1.1.7 right: Node v22
28
23
375
162,057
Safer than Rust, Newer than Zig, Prettier than Python. Unlike Cython, it has a corporate website. Go Modular!
New C++ successor language just dropped 🔥 modular.com/mojo
19
22
369
184,668
For the C++ folks out there: Node.js is now built with C++20. @yagiznizipli (among others) used the opportunity to simplify the code. C++20 is underrated. It is really a step forward.
Node.js v23.0.0 is out! 💚 This version will replace Node.js 22 as the ‘Current’ release line when Node.js 22 enters long-term support (LTS) later this month. Release notes: nodejs.org/en/blog/release/v…
17
28
365
31,085
Today I learned that some programming languages reorder fields to save bytes. :-/
People do ask me why Odin's struct fields are ordered like C rather than reordered to minimize padding (which is what a lot of new languages like Rust do). This comment from Russ Cox in 2015 is a good answer as to why Go doesn't do any reordering either: github.com/golang/go/issues/…
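To see what field order does to a C-style layout, here is a small sketch using Python's ctypes module, which follows the platform's native C layout rules. The struct and field names are made up for the example. The same three fields take a different number of bytes depending on the order in which they are declared, which is the saving that reordering languages go after.

```python
import ctypes


class DeclarationOrder(ctypes.Structure):
    """Small field, large field, small field: padding is needed around `value`."""
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("value", ctypes.c_uint64),
        ("kind", ctypes.c_uint8),
    ]


class LargestFirst(ctypes.Structure):
    """Same fields, largest first: the two small fields share the tail."""
    _fields_ = [
        ("value", ctypes.c_uint64),
        ("flag", ctypes.c_uint8),
        ("kind", ctypes.c_uint8),
    ]


# Sizes depend on the platform ABI, so they are printed rather than hard-coded.
print("declaration order:", ctypes.sizeof(DeclarationOrder), "bytes")
print("largest first:    ", ctypes.sizeof(LargestFirst), "bytes")
```

On common 64-bit platforms I would expect something like 24 bytes versus 16, but the exact numbers depend on the platform's alignment rules.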
41
7
364
73,110
Redis is a data structure store that can be used as a database, cache, and message broker. It is used by a wide range of major corporations. Recently, Redis has adopted our library (fast_float) and they report a substantial performance boost (30%).
6
18
365
20,388
Our paper, "Parsing millions of URLs per second", written with @yagiznizipli, is one of the 20 most read articles in the last five years in Software: Practice and Experience according to the publisher (Wiley). Link as a comment.
4
39
368
19,103
A question I often ask top software engineers I meet is: “How often do you read research papers?” The most common answer is a silent grunt. Top engineers do read some research papers occasionally, but they mostly ignore 99.9% of the papers that are supposed to mark advancements in their own field of activity. This even extends to engineers who hold a PhD and are actively doing industrial research. I am sure this is not limited to computing and extends to most businesses. What should you make of my observation? Firstly, we risk overestimating the contribution that a formal research paper makes to the advancement of knowledge and technology. Secondly, we likely underestimate the value of informal knowledge—the kind you find when chatting with people over coffee or on random websites. Who you know and who you read matters. Thus the best model for formal publications is that they serve as archives more than communication. They are like comments you leave on your git commits. Few people read them but they may serve a purpose when someone needs to understand the history of an idea.
63
25
363
49,571
In the ada C++ library, we are currently stuck with a GCC puzzle. We cannot seem to return an std::vector from a function without stack buffer overflow. We have been at this for over a week. Link to the issue in the comments.
11
14
348
55,849
Both can be true. 1. Some software engineers cleared out their work in record time due to AI. 2. AI is useless at what @Jonathan_Blow considers non trivial. How come? Because, evidently, many software engineers mostly do trivial work. I am not saying that they are bad or lazy. Maybe that’s all they are allowed to do.
(It should of course be something that is not set up beforehand, so, randomly chosen in an unpredictable way). Until then, the burden of proof is on those who keep making the obviously false claims.
31
8
350
46,306
Learn C programming. Write a small program. Change the world.
12
26
327
28,815
"But Daniel, how do I divide two integers in C++?" I got you covered...
Apparently, it is how you compute the integer division of 455 by 10 in JavaScript. The answer was upvoted 27 times. I don't know what to think at this point.
12
17
341
71,918
Reminder: open-source projects are not a source of free technical support. If there is no documentation, offer to write some. If some function is missing, offer to write it. Do not demand that functionality be added, that documentation be provided… unless you are willing to pay.
9
67
322