From the course: Using Large Datasets with pandas


Big data systems


- [Instructor] Let's have a look at some of the big data systems, just in case you need to know them and maybe switch to them. By far the one people use most these days is Spark, an open source project from the Apache Foundation. A nice thing about Spark is that, for one, you don't have to run it yourself: there are commercial companies such as Databricks that will host Spark for you. Another thing is that there is PySpark, which lets you write Python when you work with Spark, and it has data frames and other familiar APIs. There's also the pandas API on Spark, which tries to be as familiar as possible, but there's still operational overhead, and it's not exactly the same. The other thing is databases. Google has BigQuery, and other cloud providers have their own versions of big databases. They can work on trillions of rows and more and run computations for you, and they'll spin up as many machines as you need to run a…
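
To make the "databases run computations for you" idea concrete, here is a minimal sketch of pushing an aggregation into the database so that only the small result comes back to Python. It is not BigQuery code: the standard library's sqlite3 module stands in for a hosted warehouse, and the `rides` table, its `zone` and `fare` columns, and the sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical sample data. sqlite3 is only a local stand-in here for a
# hosted big-data database such as BigQuery.
rows = [
    ("north", 12.5),
    ("north", 7.5),
    ("south", 20.0),
    ("south", 10.0),
    ("south", 6.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (zone TEXT, fare REAL)")
conn.executemany("INSERT INTO rides VALUES (?, ?)", rows)

# The database does the grouping and averaging. Only one row per zone is
# returned to Python, not the full table.
query = """
    SELECT zone, COUNT(*) AS n_rides, AVG(fare) AS mean_fare
    FROM rides
    GROUP BY zone
    ORDER BY zone
"""
summary = conn.execute(query).fetchall()
conn.close()

for zone, n_rides, mean_fare in summary:
    print(f"{zone}: {n_rides} rides, mean fare {mean_fare:.2f}")
```

The same pattern is what makes a big database attractive when the data no longer fits in memory: you send the query to where the data lives and load only the summary, which is small enough to hand to pandas if you still want a data frame.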
