Crab: an excellent recommender system engine for Python

木戈


木戈 · Published 2014-04-07 23:23:34

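Today, while browsing the web, I came across a recommender system engine called crab. It is an open-source Python package that already provides the overall architecture of a recommender system, while leaving the recommendation algorithms for you to define. Doing recommendation-algorithm research on top of this framework can save a lot of effort.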

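1. Installing crab
(1) Before installing crab, a few Python packages and other prerequisites are needed: numpy, scipy, setuptools, scikits.learn, the Python development headers, and a working C++ compiler. On Debian/Ubuntu they can be installed with: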

sudo apt-get install python-dev python-numpy python-numpy-dev python-setuptools python-scipy libatlas-dev g++

To get the latest version of scikits.learn, run:

pip install -U scikits.learn
or
easy_install -U scikits.learn

So that the examples bundled with the package run smoothly, also install matplotlib:

sudo apt-get install python-matplotlib

(2) Install crab
It can be installed with pip:

pip install -U crab
or:
easy_install -U crab

This is the quickest way to install it.

Alternatively, you can install from source.
Download: https://github.com/muricoca/crab/downloads

Unpack the archive, cd into the extracted directory, and run:
python setup.py install

The installation is now complete.
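As a quick sanity check after installing, one could confirm that the prerequisites and crab itself can be imported. The sketch below is only an illustration: the helper name `is_importable` is made up here, and the module names are the ones used elsewhere in this post.

```python
import importlib


def is_importable(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Report the status of the prerequisites and of crab itself.
for name in ("numpy", "scipy", "scikits.crab"):
    status = "OK" if is_importable(name) else "missing"
    print("%s: %s" % (name, status))
```

Any module reported as missing points back to the corresponding installation step above.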

2. Usage
(1) Getting help
In a terminal, run:
python
help("scikits.crab")
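This displays the package's help text.
The help output shows that the package contains the following subpackages:
base, datasets, metrics, models, recommenders, similarities, tests, utils
datasets holds a sample movie dataset and a sample song dataset
models contains several data models
recommenders holds the recommendation algorithms, including knn and svd; these are the parts we are expected to rewrite ourselves
similarities holds the similarity computations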

(2) A worked example
This example comes from the official tutorial:
http://muricoca.github.io/crab/tutorial.html

First, load the sample datasets:
>>> from scikits.crab import datasets
>>> movies = datasets.load_sample_movies()
>>> songs = datasets.load_sample_songs()

We can print the contents of a dataset; pay attention to its format:
>>> print movies.data

{1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
5: {2: 4.5, 3: 1.0, 4: 4.0},
6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}}
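The printed structure is a nested dictionary that maps each user ID to that user's {item ID: rating} dictionary. As a minimal sketch in plain Python (no crab required; the `data` literal is copied from the output above, and the helper name `unrated_items` is made up for illustration), this is how one might look up a single rating and list the items a user has not rated yet:

```python
# Ratings copied from the printed movies.data: {user_id: {item_id: rating}}
data = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def unrated_items(data, user_id):
    """Return the sorted item IDs present in the data but not rated by user_id."""
    all_items = set()
    for ratings in data.values():
        all_items.update(ratings)
    return sorted(all_items - set(data[user_id]))


print(data[5][2])              # the rating user 5 gave to item 2
print(unrated_items(data, 5))  # candidate items for recommending to user 5
```

The unrated items are exactly the ones a recommender can suggest to that user.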

The users can be printed on their own:
>>> print movies.user_ids
{1: 'Jack Matthews',
2: 'Mick LaSalle',
3: 'Claudia Puig',
4: 'Lisa Rose',
5: 'Toby',
6: 'Gene Seymour',
7: 'Michael Phillips'}

The items can be printed on their own as well:
>>> print movies.item_ids
{1: 'Lady in the Water',
2: 'Snakes on a Planet',
3: 'You, Me and Dupree',
4: 'Superman Returns',
5: 'The Night Listener',
6: 'Just My Luck'}
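Since the ratings are keyed by numeric IDs, these two lookup tables are what make them readable. A small sketch, assuming the dictionaries shown above (the literals are copied from the printed output, `toby_ratings` is user 5's entry from movies.data, and the helper `named_ratings` is hypothetical):

```python
# Item names copied from the printed movies.item_ids.
item_ids = {
    1: 'Lady in the Water',
    2: 'Snakes on a Planet',
    3: 'You, Me and Dupree',
    4: 'Superman Returns',
    5: 'The Night Listener',
    6: 'Just My Luck',
}

# Ratings of user 5 (Toby), copied from the printed movies.data.
toby_ratings = {2: 4.5, 3: 1.0, 4: 4.0}


def named_ratings(ratings, item_ids):
    """Replace item IDs with item names, keeping the ratings in item-ID order."""
    return [(item_ids[item_id], rating)
            for item_id, rating in sorted(ratings.items())]


for title, rating in named_ratings(toby_ratings, item_ids):
    print("%s: %.1f" % (title, rating))
```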

Import the models package and wrap the dataset in a specific data model:
>>> from scikits.crab.models import MatrixPreferenceDataModel
>>> #Build the model
>>> model = MatrixPreferenceDataModel(movies.data)

Import the metrics and similarities packages and build the similarity:
>>> from scikits.crab.metrics import pearson_correlation
>>> from scikits.crab.similarities import UserSimilarity
>>> #Build the similarity
>>> similarity = UserSimilarity(model, pearson_correlation)
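To make the idea behind this step concrete, here is a minimal plain-Python sketch of the Pearson correlation between two users, computed over the items both of them rated. It illustrates the textbook formula only; crab's own `pearson_correlation` may treat edge cases or normalization differently, and the function name `pearson` and its zero fallback are assumptions made for this example.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two {item_id: rating} dicts over their co-rated items.

    Returns 0.0 when there are fewer than two co-rated items or when either
    user's co-rated ratings have no variance (an assumption of this sketch).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Ratings of users 5, 4 and 7, copied from the printed movies.data.
toby = {2: 4.5, 3: 1.0, 4: 4.0}
lisa = {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0}
michael = {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}

print(pearson(toby, lisa))     # close to +1: very similar tastes
print(pearson(toby, michael))  # negative: opposite tastes on the shared items
```

A value near +1 means the two users rank their shared items in the same way, which is what a user-based recommender looks for in a neighbor.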

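Import the algorithm package. Here we import the user-based collaborative filtering algorithm; of course, this is the algorithm you would rewrite yourself:
>>> from scikits.crab.recommenders.knn import UserBasedRecommender
>>> #Build the User based recommender
>>> recommender = UserBasedRecommender(model, similarity, with_preference=True)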

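Here we recommend items for user 5. The result shows that the system recommends three items for user 5, each returned as an (item ID, estimated preference) pair, highest score first: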
>>> #Recommend items for the user 5 (Toby)
>>> recommender.recommend(5)
[(5, 3.3477895267131013), (1, 2.8572508984333034), (6, 2.4473604699719846)]
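For readers who want to see what a user-based recommender does internally, the following is a simplified plain-Python sketch, not crab's implementation. It scores each item the target user has not rated by a similarity-weighted average of other users' ratings, using only neighbors with positive Pearson similarity. The function names and the choice to ignore non-positive similarities are assumptions of this sketch, so the scores it produces need not match crab's output exactly.

```python
from math import sqrt

# Ratings copied from the printed movies.data: {user_id: {item_id: rating}}
data = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def pearson(a, b):
    """Pearson correlation over co-rated items; 0.0 if it is undefined."""
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def recommend(data, user_id, top_n=3):
    """Return up to top_n (item_id, estimated rating) pairs, best first."""
    own = data[user_id]
    totals = {}       # item_id -> sum of similarity * rating
    weight_sums = {}  # item_id -> sum of similarities
    for other_id, other_ratings in data.items():
        if other_id == user_id:
            continue
        sim = pearson(own, other_ratings)
        if sim <= 0:
            continue  # this sketch ignores dissimilar users
        for item_id, rating in other_ratings.items():
            if item_id in own:
                continue  # only score items the user has not rated
            totals[item_id] = totals.get(item_id, 0.0) + sim * rating
            weight_sums[item_id] = weight_sums.get(item_id, 0.0) + sim
    scored = [(item_id, totals[item_id] / weight_sums[item_id])
              for item_id in totals]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


print(recommend(data, 5))
```

Swapping in a different similarity function or a different weighting rule changes only `pearson` and the accumulation loop, which is the kind of customization the crab framework is meant to make convenient.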

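The architecture of this recommender engine is shown in the figure below:

[Figure: architecture of the crab recommender engine]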
