Dplython 介绍
dplython是使用Python语言的dplyr。dplyr是一个使用R语言快速分析数据的库。
dplyr的理念是在一些最常见的任务中限制数据操作的部分功能。这种映射思维更接近编码思维,帮助您在分析数据时提高“思维速度”。
安装:
pip install git+https://github.com/dodger487/dplython.git
使用:
from dplython import * diamonds >> select(X.carat, X.cut, X.price) >> head(5) """
# Filter out rows using dfilter diamonds >> dfilter(X.carat > 4) >> select(X.carat, X.cut, X.depth, X.price) """
# Sample with sample_n or sample_frac, sort with arrange (diamonds >> sample_n(10) >> arrange(X.carat) >> select(X.carat, X.cut, X.depth, X.price))"""
# You can: # add columns with mutate (referencing other columns!) # group rows into dplyr-style groups with group_by # collapse rows into single rows using sumarize (diamonds >> mutate(carat_bin=X.carat.round()) >> group_by(X.cut, X.carat_bin) >> summarize(avg_price=X.price.mean()))"""
# If you have column names that don't work as attributes, you can use an # alternate "get item" notation with X. diamonds["column w/ spaces"] = range(len(diamonds)) diamonds >> select(X["column w/ spaces"]) >> head() """
# It's possible to pass the entire dataframe using X._ diamonds >> sample_n(6) >> select(X.carat, X.price) >> X._.T """
# To pass the DataFrame or columns into functions, apply @DelayFunction @DelayFunctiondef PairwiseGreater(series1, series2): index = series1.index newSeries = pandas.Series([max(s1, s2) for s1, s2 in zip(series1, series2)]) newSeries.index = index return newSeries diamonds >> PairwiseGreater(X.x, X.y)# Passing entire dataframe and plotting with ggplotfrom ggplot import *ggplot = DelayFunction(ggplot) # Simple installationdiamonds = dplyFrame(pandas.read_csv('./diamonds.csv')) # Masked in ggplot pkg(diamonds >> ggplot(aes(x="carat", y="price", color="cut"), data=X._) + geom_point() + facet_wrap("color"))
(diamonds >> dfilter((X.clarity == "I1") | (X.clarity == "IF")) >> ggplot(aes(x="carat", y="price", color="color"), X._) + geom_point() + facet_wrap("clarity"))
# Matplotlib works as well! import pylab as pl pl.scatter = DelayFunction(pl.scatter) diamonds >> sample_frac(0.1) >> pl.scatter(X.carat, X.price)
Dplython 官网
https://github.com/dodger487/dplython
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 [email protected] 举报,一经查实,本站将立刻删除。