Dataframes powered by a multithreaded, vectorized query engine, written in Rust.

Amsterdam
Polars 0.16.0 is out! github.com/pola-rs/polars/re… A few breaking changes that lead to more strictness and more consistency. It feels good to ditch mistakes!
37
276
52,310
Happy New Year! With a new Python Polars release: github.com/pola-rs/polars/re… A new Rust Polars release: github.com/pola-rs/polars/re… And a new release for Polars plugins: github.com/pola-rs/pyo3-pola… Oh.. And polars now supports plotting😎
6
29
250
28,348
Hugging Face just merged Polars integration 🐻‍❄️
Rust 🦀 is sooo fast for querying data :o We just merged support for Polars and it's a blast ! Explore any dataset on 🤗 at ⚡ speed:
2
22
200
15,000
Almost 20,000 stars?!🤯👀
1
8
183
15,561
🤗 x Polars! Polars now supports native reading from @huggingface datasets. Check out our latest blog to learn more about it: pola.rs/posts/polars-hugging…
4
35
188
18,254
You can read the pre-release of the first chapters of the upcoming O'Reilly polars book. Join our discord to download the PDF: discord.com/invite/4UfP5cf
The first two chapters of our book Python Polars: The Definitive Guide have just been released! It ain't much, but it's honest work. @thijsnieuwdorp and I are working hard on the remaining 16 chapters. We'll keep you posted.
6
32
161
24,272
Polars 0.15 is released.🐻‍❄️ Parts of the query engine have been rewritten, and we have seen up to a 20% performance increase because of it, plus huge decreases in RAM usage. The next releases will increase the performance of parquet reading. 🚀 github.com/pola-rs/polars/re…
26
162
Python polars 0.20.4 is released. Highlights: - Full support for the DataFrame interchange protocol. - A lot of extra SQL functions - Better hive partition pruning - Many new expressions for the Array datatype See the full changelog here: github.com/pola-rs/polars/re…
2
25
157
11,035
Polars 0.16.10 is out. Only 5 days in the making, but you wouldn't say that... - full out-of-core sort - writing to Excel - writing to databases - streamable UDFs - skipping of entire parquet pages on `is_in` and equality queries. - decimal datatypes ... github.com/pola-rs/polars/re…
2
19
153
10,802
Polars backed time series ML library. According to the authors it is the fastest in the world👀
1
12
125
15,855
The new polars release ships with native cloud readers for `scan_parquet` AND support for reading hive partitioned datasets (with pushed down optimizations). This is in beta-release, so give it a spin and do provide feedback, so that in next releases all will run smoothly. :)
7
18
126
10,432
Polars will start a new phase!
I am very excited to announce that Polars raised a $4M seed round! Chiel Peters and I co-founded Polars the company. Read more on what we will build! pola.rs/posts/company-announ…
1
13
117
8,920
With python polars release 0.19.9 you can compile and link your own custom expressions into the default polars engine. Those user provided expressions will enjoy: - Rust speed - Parallelism - Query optimizations Read more here: pola-rs.github.io/polars/use…
1
20
107
36,637
Polars 0.16.14 has landed! github.com/pola-rs/polars/re… Most string functions have been optimized with performance increases ranging from 2-15x! And the import time has been reduced drastically to < 50 ms. For comparison: - numpy: 104ms - pandas: 520ms
5
19
118
9,178
We are happy to share more about what we are building and our goal to run Polars on any dataset size! A managed Distributed Polars compute cluster to ensure a single DataFrame API for all your needs. pola.rs/posts/polars-cloud-w…
13
107
6,780
A new release of Polars in Aggregate! TLDR: - Full plan CSE - Benchmark of the updated CSV writer - Dead expression elimination - Ad-hoc SQL on DataFrame and LazyFrame - Support for export to PyTorch and JAX pola.rs/posts/polars-in-aggr…
2
12
106
4,851
New python polars release. Interesting changes since last week: - Zero-copy interchange between numpy and polars. Moving data between polars and numpy will now only move pointers around.
4
9
100
13,276
Polars 1.19 comes with support for arbitrary predicates in join_where. This means that inequality joins are now more flexible than ever! Here is a small example of something you couldn't do before:
1
10
97
4,651
Polars supports dynamic aggregations based on time windows via the function `group_by_dynamic`. To use it, you specify a date(time) column to group by, and then determine the windows over which values are aggregated. Note: data points can fall within 2+ windows 👇
3
5
95
4,627
Over the last few months we've made a lot of improvements to the Polars in-memory engine. And it shows in the latest TPC-H benchmark run! pola.rs/posts/benchmarks/
4
91
6,219
The team at @dbsystel (subsidiary of Deutsche Bahn) optimizing Germany’s train schedules realized 20x speedups after switching to Polars. Lowering cloud costs and supporting the organization's sustainability goals: pola.rs/posts/case-db-systel…
5
7
86
5,429
Try duckdb AND polars.
DuckDB 0.7.0 "Labradorius" released with #JSON support, parallel and partitioned export to CSV and Parquet, UPSERT, @DataPolars integration, and much more in our release announcement blog post: duckdb.org/2023/02/13/announ…
9
85
15,940
Latest polars vs pandas 2.0 with pyarrow datatypes. Queries as a whole tell a more realistic story.
Pandas 2.0 vs Polars 0.16.10 I see people drawing the wrong conclusions from this microbenchmark. Here are the results of pandas with arrow backed datatypes vs polars on TPCH queries. Micro benchmarks don't show the whole query runtime. Code here: github.com/pola-rs/tpch/pull…
1
9
80
14,901
Check rewrote their airflow DAGs from pandas to Polars and: - Got rid of their out of memory SIGTERMs - Made their architecture simpler - Reduced the ETL time by ~50% - Scaled down their cluster, saving 25% in cloud bills. pola.rs/posts/case-check-tec…
3
84
5,396
Polars' latest release comes with FULL plan Common Subplan Elimination (CSE). Duplicate parts of the query will automatically be cached/shared. Here is the difference on TPCH q2:
1
9
73
6,999
With the new #fugue release, #polars can run on a #spark, #dask or #ray cluster! And the benchmark done by the fugue team looks super promising! Read more: medium.com/fugue-project/ben…
3
10
70
12,565
The Polars team has been hard at work! We've just published a post with an overview of some of the additions and changes in Polars from 1.3.0 to 1.15.0. We cover performance improvements to I/O, the addition of inequality joins, and much more! pola.rs/posts/polars-in-aggr…
1
8
72
2,425
Always wondered what happens when you run a query in Polars? In our next blog post "A bird's eye view of Polars" by Chiel Peters we take a look at the internals of Polars. pola.rs/posts/polars_birds_e…
4
15
71
7,969
Why is there a `struct` data type? A single expression produces a single column, so expressions like `value_counts` need to output structs to map the values to their counts. With that said, do you understand why `.struct.unnest` doesn't break the 1 expr = 1 column principle?
1
5
68
3,345
Today we are launching the first open Crash Course training sessions with a limited time discount. These instructor-led sessions are open to everyone looking to get up and running with Polars. Find a date and sign up via our Academy: pola.rs/academy/
1
8
62
4,580
Pandera just shipped Polars support! pandera--1373.org.readthedoc…
2
6
53
5,804
Polars 0.17.15! This week's release adds: - Arrow's FixedSizeList (we call them Array) - writing to delta tables - UNION, EXPLAIN, CASE, EXCLUDE to its SQL vocab - SIMD acceleration for all json parsing See all changes: github.com/pola-rs/polars/re…
11
61
6,485
Can't remember how many days each month has? (Me neither!) Memorise this Polars snippet instead. Using some calendar-aware functions, we can get the answer in a tidy dataframe, as the diagram below shows.
4
5
57
3,810
That's `pl.scan_csv(..).sink_parquet(..)`
While building Data Engineering or Machine Learning pipelines using CSV data, you should first: 🔥Convert the CSV data to parquet and save it using @DataPolars. Using Parquet + @ApacheArrow combination with almost any data engine is significantly faster than CSV
3
5
56
7,658
The `filter` context keeps only the rows of a dataframe that satisfy some conditions. Within an aggregation, you can also use `filter` to select values from the aggregated groups. In this example we ignore unverified times when computing the current record.
2
11
56
3,683
In our latest blog we uncover predicate pushdown, a query optimization technique. Read more on what it is and how you can apply at the link below. pola.rs/posts/predicate-push…
4
52
3,870
Watch Marco Gorelli's Pydata London talk about Polars plugins. A thorough guide on how to get started and bring your custom expressions powered by crates.io! piped.video/j2N_YD5vbOs?si=-N9e…
7
52
3,881
Polars has essentially 18 different data types. If you are unsure what each type is, the conversion table below might help you. Each Polars data type is presented next to the **most similar** Python type.
3
4
47
2,377
Replying to @vboykis
Have you heard of us? 🐻‍❄️
2
2
48
4,929
The expression `clip` is pretty straightforward: You provide a lower and an upper bound, and Polars makes sure all values fall within those bounds. If a value is too small/too large, it's replaced by the bound. Bounds can be literals, other columns, or arbitrary expressions.
5
46
2,569
Extend Polars and run your custom functions as fast as native expressions. Double River re-implemented two key models as plugins and achieved 10x speedups with half the memory usage. pola.rs/posts/case-double-ri…
4
45
2,542
The expression `over` can be used to compute expressions within isolated groups. This means you can do computations per group without having to group first and then explode after. In this example, we rank swimmers based on their time, but within their race type.
7
46
2,441
A polars LazyFrame knows its schema before running it. That means you can assert the output data types and catch errors before running your query.
1
2
44
#Python polars 0.14.19 is released. This is the first of many releases where we will improve the capabilities to process larger than RAM data. This release on that topic: - first parts of streaming engine - batched csv reader github.com/pola-rs/polars/re…
1
9
42
You want to join two tables on their ID column, but only when the dates in one table fall within the range of the other table. Polars lets you do that with `join_where`, which supports inequality joins through the use of inequality predicates. Here's an example 👇
3
38
1,661
Polars support in PyCharm! 💯
📣 PyCharm 2023.2 EAP 2 is now out! Try these new features: 🥁 Live templates for Django forms and models 🥁 Support for @DataPolars DataFrames 🥁 Initial @GitLab integration Read more and download 👉 jb.gg/t1xv3m
1
37
2,936
#Python #polars 0.13.52 is just released. This is the first python release that has NO required external dependencies. Thanks to @stinodego we finally removed almost all numpy requirements and most operations will run on the polars engine now!
1
1
32
Scaling up any Polars (or other) query made easy with Coiled! This blog shows how easy it is to run something on a big VM.
New blog post on how to process large datasets with @DataPolars in the cloud. Polars is a great tool for querying large datasets. When you need more memory, you can use Coiled serverless functions to run Polars on a big VM in the cloud. medium.com/coiled-hq/process…
3
31
4,361
How to “expand” ranges like "3-5" to three different rows with the integers 3, 4, and 5, and all other variables are the same?
2
2
33
1,959
Join our webinar with @nvidia on January 28 for an in-depth session on how the GPU engine works, from collecting your query to parallel execution on the GPU. Sign up at info.nvidia.com/nvidia-polar…
1
3
33
1,978
#Python #polars 0.14.27 is released. github.com/pola-rs/polars/re… This release launches the alpha version of our streaming engine. Need to process 250GB on your laptop? Try: my_query().collect(allow_streaming=True) Next release we will be able to | the result to different files!
5
33
We are looking for a Rust Engineer based in the Netherlands. Are you interested in the job, or do you know someone who is? Please reach out. hiring.pola.rs/o/rust-softwa…
2
13
31
7,251
Try us now and you will feel 10 years younger. 💁
I don't know if you believe in magic, but I use Polars instead of Pandas now and believe me, it's magic. I think it managed to slow down my ageing process considerably.
5
27
4,271
Polars and Narwhals are now temporarily visible as street art in Amsterdam. Big shoutout to @Anopsy
*Polars* on the streets of Amsterdam
2
5
28
3,527
Congratulations Matt🎉 Today is the pre-sale of "Effective Polars". A great boost for Polars content!
🥧 Happy Pi Day, Data Enthusiasts! 📊 I hope you are enjoying (or will enjoy) some literal pie, and perhaps you are enjoying some "pie"thon today! 🐍 While I love math, today marks another big day for me. The presale launch of "Effective Polars". Folks have been clamoring for this book for a while. I expect to have it released next week. 🚀 Why dive into "Effective Polars"? • Speed and Efficiency: Learn how to leverage Polars for high-speed data processing, transforming how you handle large datasets. • Real-World Applications: Packed with practical examples, this book guides you through implementing Polars in your projects, making data work for you. • Expert Insights: Benefit from my journey with Polars, sharing the nuances, best practices, and tips that will elevate your data analysis skills to new heights. 🎉 Exclusive Pi Day Presale Offer: I’m offering an exclusive discount for the Pi Day presale to celebrate. There are a few options available. If you buy a bundle, the video course is already available. 🔗 Secure your copy now store.metasnake.com/effectiv… Use the code PIDAY for a 31% discount today only!
1
27
3,291
Researchers and engineers at G-Research are using Polars in their projects to handle data more efficiently. Reporting 150x speedups over earlier implementations. Read our latest case study: pola.rs/posts/case-gresearch…
1
6
27
5,558
Joins can produce very large outputs. This week's #python #polars release pushes slices down to the join level for most join types, including cross joins. This join would produce over 4 billion rows, but because we slice a much smaller portion of it, we can finish it in under a ms.
2
2
29
New article on the Polars blog! “Breaking the rules with expression expansion” delves into how `.struct.unnest` seems to break one of Polars' most fundamental principles but doesn't: A single expression must always produce a single column as a result. 👉 pola.rs/posts/breaking-the-r…
3
27
1,675
Win a Polars swag box in the @posit_pbc 2024 Table Contest. The top table entry that involves Polars will win the box, containing a Polars hoodie, stickers and more. Find out how to participate: posit.co/blog/announcing-the…
1
4
24
4,154
Replying to @acbass49
And you are in luck! @braaannigan has made his course free today! udemy.com/course/data-analys…
1
3
24
1,078
Polars core algorithms have large materialization speed-ups. This will most likely be significant on tables in the millions of rows.
The re-run of the db-benchmark made me look at our groupby and join materialization again and I have made some solid performance improvements in the last 2 weeks. Some queries are ~2x faster!
23
2,434
Today we added an @kapa_ai LLM Plugin in our docs that has RAG access to our reference and user guide. This drastically improves Polars code generation! docs.pola.rs/api/python/stab…
2
3
24
2,677
Who gets the 10k?
4
2
21
Extend python polars with UDFs compiled in rust!
I just added pyo3 bindings to polars `Series` and `DataFrame`. Extending python polars with your own rust functions is now super easy! crates.io/crates/pyo3-polars
20
4,058
Replying to @RexDouglass
Hi, nice to meet you🙂
1
20
1,560
We ran the first 7 TPCH queries on scale factor 10 to compare different #python dataframe solutions on a single machine. This shows that scaling #pandas doesn't always make it faster. See that small purple bar? That's how much faster polars is!
3
5
21
JetBrains DataSpell has polars support!
DataSpell 2023.2 EAP 1 is out! Highlights include: - Polars column-name completion - Interactive tables for Polars DataFrames - Easier column data type identification Learn more and download DataSpell here: jb.gg/b026ov
3
1
22
2,266
In the current #python #polars 0.13.60 release we added support for memory mapping arrow feather files! You can now run multiple queries against disk and your OS might cache a lot of columns in RAM.
4
21
50 ms that is! 😁
1
18
1,000
Replying to @Victorgoba
That's probably bottlenecked by sqlalchemy. Did you try engine='adbc'
1
17
634
Great example of plotting with polars.
I find fascinating: - How beautiful @DataPolars pipelines look like (and how fast they are) - How nicely @plotlygraphs works with #Polars - That all real estate data in #Dubai is open (also other data published by @DigitalDubai)
5
14
2,691
📢 Hot off the press, new blog article: “Understanding Polars data types” Many dtypes you already know or understand intuitively. This article will push your intuition further, so you can confidently pick the appropriate data types when you are working with data in Polars.
1
14
1,268
Interested in machine learning pipelines with polars? @marktenenholtz's course covers that!
Replying to @marktenenholtz
The “Advanced Topics” week of my course lets students build a whole feature engineering pipeline with it. Everyone who tried it *loved* the library and a few said they were immediately bringing it to their work projects. Check out the course here: corise.com/go/forcasting-wit…
1
10
1,820