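Article ID: 1007-757X(2011)01-0044-03
Design of a Healthy Eating Analysis and Recommender System Based on Web Data Mining
Li Xiaocheng, Zhang Zengjie, Xia Yongming, Qian Songrong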
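Abstract: As the pace of life in the information age quickens, fast food has become increasingly popular, and the dietary health problems that come with it are drawing growing attention. To address this problem, we apply web data mining techniques and propose a scheme for an online healthy eating analysis and recommender system. The system tracks a user's eating habits and recommends foods that can improve the user's health and reduce the risk of disease. We first introduce the basics of web data mining, then present a design method for a data-mining-based online dining analysis and recommender system, and finally give an implementation scheme for the system.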
Keywords: web data mining; healthy eating; e-commerce
CLC number: TP311    Document code: A
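0 Introduction
As the pace of modern life quickens, people pay less and less attention to healthy eating, and fast food has become increasingly popular as a result, with negative consequences for health. To address this problem, we rely on the well-developed Internet platform and use web data mining to propose a design for an online dining system.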
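Web data mining is the process of using data mining techniques to automatically discover and extract information from web documents and services. We propose an online healthy eating analysis and recommender system based on web data mining that tracks users' eating habits, recommends diets that can improve their health, and lowers their risk of disease. Reference [1] proposes a web-data-mining-based e-commerce solution that discovers hidden patterns and business strategies from customer and web data, and designs a new data-mining-based framework for building web page recommender systems. That recommendation framework can serve as the base architecture of our analysis and recommender system.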
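In our solution, the system architecture has three parts: dietary data acquisition, data mining, and healthy diet recommendation. First, the scheme requires a C2C e-commerce platform on which users can order meals online; the system tracks each user's dietary record by reading the user's data stored in the database. The platform must also let users enter dietary data directly through a web page. We then use data mining algorithms such as classification and association rules to extract useful information about users' eating habits, and use it for health assessment and diet recommendation.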
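1 Web Data Mining
Web data mining builds on the analysis of large volumes of web data. Using suitable data mining algorithms within a specific application model, it performs data extraction, data filtering, data transformation, data mining, and pattern analysis, and finally makes inductive inferences and predicts customers' individual behavior and habits, which supports decision making and management and reduces decision risk. In [2], the web data mining process is divided into five functional modules: data collection, data preprocessing, data mining, analysis and evaluation, and knowledge representation.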
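1.1 Data Collection
Functionally, the data collection module selectively acquires data from the external web environment, providing material and resources for later mining. The data sources available on the web include web content data, web structure data, and web usage data. The module consists of three relatively independent processes: data search, data selection, and data recording.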
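[Figure: boxes labeled data collection, data preprocessing, data mining, knowledge representation, and analysis and evaluation]
Figure 1 Web data mining process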
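1.2 Data Preprocessing
Most web data is incomplete, inconsistent, and dirty, so it either cannot be mined directly or yields unsatisfactory results. Data preprocessing techniques were developed to improve the quality of data mining. They include data cleaning, data integration, data transformation, and data reduction. Applied before mining, these techniques greatly improve the quality of the mined patterns and reduce the time the actual mining takes.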
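As a minimal sketch of the cleaning, transformation, and reduction steps named above, the following Python snippet normalizes a few hypothetical dietary records. The field names (`user`, `food`, `grams`) and the rules for discarding rows are illustrative assumptions and are not specified in this paper.

```python
def clean_records(raw_records):
    """Drop incomplete rows, normalize text, convert quantities, remove duplicates."""
    cleaned = []
    seen = set()
    for rec in raw_records:
        # Data transformation: trim whitespace and lowercase food names.
        user = (rec.get("user") or "").strip()
        food = (rec.get("food") or "").strip().lower()
        try:
            grams = float(rec.get("grams"))
        except (TypeError, ValueError):
            continue  # data cleaning: quantity missing or not numeric
        if not user or not food or grams <= 0:
            continue  # data cleaning: incomplete or implausible row
        key = (user, food, grams)
        if key in seen:
            continue  # data reduction: exact duplicate after normalization
        seen.add(key)
        cleaned.append({"user": user, "food": food, "grams": grams})
    return cleaned


# Hypothetical raw input, as it might arrive from a web form.
raw = [
    {"user": "u1", "food": " Rice ", "grams": "150"},
    {"user": "u1", "food": "rice", "grams": 150},    # duplicate after normalization
    {"user": "u2", "food": "beef", "grams": None},   # missing quantity
    {"user": "", "food": "tofu", "grams": 80},       # missing user
    {"user": "u2", "food": "Tofu", "grams": "80.5"},
]
records = clean_records(raw)
```

Only the first and last rows survive in this example; a real deployment would also need data integration across the order database and the logs, which this sketch does not cover.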
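1.3 Data Mining
Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random real-world data. Common analysis methods include classification, regression analysis, clustering, association rules, characterization, and change and deviation analysis, each of which mines the data from a different angle [3].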
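Classification finds the common characteristics of a set of data objects in a database and divides them into classes according to a classification pattern; its goal is to map each data item in the database to one of a given set of classes through a classification model.
Cluster analysis divides a set of data into several groups according to similarity and difference; its goal is to make the similarity among data in the same group as large as possible and the similarity between data in different groups as small as possible.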
———————————
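About the authors: Li Xiaocheng, master's student, School of Information Science and Engineering, Fudan University, Shanghai 200433
Zhang Zengjie, master's student, School of Information Science and Engineering, Fudan University, Shanghai 200433
Xia Yongming, lecturer, School of Information Science and Engineering, Fudan University, Shanghai 200433
Qian Songrong, professor, School of Information Science and Engineering, Fudan University, Shanghai 200433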
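An association rule describes a relationship between data items in a database: from the presence of certain items in a transaction, one can infer that other items also appear in the same transaction. Such rules capture associations or correlations hidden in the data.
1.4 Analysis and Evaluation
The analysis and evaluation stage assesses the credibility and validity of the knowledge models obtained through data mining and draws evaluative conclusions that support management and user decisions.
1.5 Knowledge Representation
The knowledge representation module takes the knowledge models that the data mining tools extract from web data and presents them in a suitable form, so that users can readily understand and exchange them.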
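2 Design of the Healthy Dining System
2.1 Functional Description
Our healthy eating analysis and recommender scheme consists of three parts: dietary data collection, the data mining process, and health recommendation. Figure 2 describes the functions of these three parts.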
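1. Data entry: time, food name, food quantity, food composition, ...
2. Data analysis: nutrient content, deficient components, excess components, health grade rating, ...
3. Recommendations: recipes, food combinations, potential diseases, health tips, ...
Figure 2 System function description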
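The three stages in Figure 2 can be illustrated with a small Python sketch: a record type for data entry, a function that totals nutrients, and a function that flags deficient and excess components. The nutrient table and the target ranges below are made-up placeholder numbers chosen only to make the example run; they are assumptions, not values from this paper and not nutritional guidance.

```python
from dataclasses import dataclass

# Hypothetical nutrient content per 100 g (illustrative values only).
NUTRIENTS_PER_100G = {
    "rice": {"fat": 0.5, "protein": 2.5, "carbohydrate": 28.0},
    "beef": {"fat": 15.0, "protein": 26.0, "carbohydrate": 0.0},
    "potato": {"fat": 0.1, "protein": 2.0, "carbohydrate": 17.0},
}

# Hypothetical daily target ranges in grams: (lower bound, upper bound).
TARGET_RANGE = {
    "fat": (40.0, 70.0),
    "protein": (50.0, 100.0),
    "carbohydrate": (250.0, 400.0),
}


@dataclass
class DietEntry:
    """One data-entry record: time, food name, and quantity in grams."""
    time: str
    food: str
    grams: float


def total_nutrients(entries):
    """Sum each nutrient over all entries of one day."""
    totals = {name: 0.0 for name in TARGET_RANGE}
    for entry in entries:
        per_100g = NUTRIENTS_PER_100G[entry.food]
        for name in totals:
            totals[name] += per_100g[name] * entry.grams / 100.0
    return totals


def assess(totals):
    """Label each nutrient as deficient, excess, or ok against the target range."""
    report = {}
    for name, (low, high) in TARGET_RANGE.items():
        if totals[name] < low:
            report[name] = "deficient"
        elif totals[name] > high:
            report[name] = "excess"
        else:
            report[name] = "ok"
    return report


day = [
    DietEntry("08:00", "rice", 200),
    DietEntry("12:00", "beef", 400),
    DietEntry("18:00", "potato", 300),
]
totals = total_nutrients(day)
report = assess(totals)
```

For this sample day the sketch reports fat as within range, protein as excess, and carbohydrate as deficient; the recommendation stage would then start from such a report.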
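1) Data collection
The system first needs an online dining e-commerce website as the basis of its services. Users can order dishes over the network on the platform just as they would in a restaurant; they can also enter their own dietary data into the site database through a web page. The system reads users' dietary data from the database to track their dietary records.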
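2) Data mining
The system uses data mining algorithms such as classification, clustering, and association rules to extract useful information about eating habits from user data. It first analyzes the nutritional composition of each food and then computes how much fat, protein, vitamins, and other nutrients a menu contains. It then applies a classification algorithm to this combined data, assigns users to different health grades, and gives the corresponding evaluation.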
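3) Healthy diet recommendation
The data mining step above yields a good deal of useful information, such as which nutrients are deficient, which are in excess, and which diseases the user may be at risk of. Based on each user's situation, the system can then recommend healthy menus, lifestyle tips, and similar information. These recommendations are intended to improve the user's diet and health. In addition, we track users' personal preferences in real time, use association rules to discover individual tastes, and recommend matching dishes, which allows more personalized service.
2.2 Healthy Dining System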
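The healthy dining analysis and recommender system consists of four parts: the integration layer, the data layer, the recommendation layer, and the user interface. Reference [1] describes the basic framework of an e-commerce recommender system in detail; building on it, we constructed the online dining analysis and recommender system whose structure is shown in Figure 3.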
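[Figure: components grouped into the user interface, recommendation layer, data layer, and integration layer. Labels: data entry, data analysis, recommendations, XML logs, web logs, user logs, XQuery/XSLT, web log parsing, heuristics and data cleaning, health model, personalized mining, data mining algorithms, database/warehouse/cube, basic prior knowledge, OLAP]
Figure 3 Structure of the healthy eating analysis and recommender system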
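(1) The integration layer covers the preparatory steps for data mining, such as data extraction, data cleaning, data transformation, and data loading. It uses XQuery/XSLT/XML mechanisms to populate the data store, for example a relational or native XML database. A log parsing component parses the ASCII files generated by the web server and converts them into a standard database format. This layer also maintains the associations between user sessions and web services and web pages, which is essential for analyzing how users use web services through their sessions.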
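(2) The data layer is the store for the dietary data that users enter and the system outputs. It also holds preprocessed logs, e-commerce sessions, and information about web service execution.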
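(3) The core of the recommendation layer is the data mining engine, which loads large volumes of XML data from the database, executes SQL commands, and runs the data mining algorithms. These algorithms classify the users' menu records in the database and discover associations between the composition of a user's diet and potential diseases; based on these correlations, the system recommends different meal combinations to improve the user's diet.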
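The system uses a decision tree classification algorithm to process users' dietary data and produce a health analysis. The goal of a decision tree is to build a model that predicts the value of a target variable from several input variables. We represent the set of nutritional attributes as a vector $X = (X_1, X_2, X_3, \ldots, X_k)$, where each $X_i$ is a nutritional attribute such as fat, protein, carbohydrate, vitamin, or mineral content. The class attribute is likewise represented as a vector $C = (c_1, c_2, c_3, \ldots, c_n)$, whose $n$ classes stand for different health conditions. The system uses the decision tree as a classifier that implements the mapping $f: X \rightarrow C$; its rules can be trained from prior knowledge of scientific nutritional structure.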
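To make the mapping from a nutrient vector to a health class concrete, here is a minimal sketch of a decision tree learner in plain Python. The paper does not say which decision tree algorithm is used, so this example assumes a CART-style tree with Gini impurity and binary threshold splits. The three features (daily fat, protein, and sugar in grams), the class names, and the training rows are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature index, threshold) that lowers weighted Gini impurity most, or None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best


def build_tree(rows, labels):
    """Recursively build a tree; leaves are class labels, inner nodes are tuples."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))


def classify(tree, x):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


# Hypothetical training data: (fat, protein, sugar) in grams per day.
rows = [
    (50, 70, 40), (55, 80, 35), (60, 65, 45),
    (95, 75, 40), (110, 60, 50), (100, 85, 30),
    (45, 20, 40), (58, 25, 35), (52, 30, 50),
]
labels = ["healthy"] * 3 + ["high_fat"] * 3 + ["low_protein"] * 3
tree = build_tree(rows, labels)
new_user_class = classify(tree, (105, 70, 40))
```

On this toy data the learned tree separates the three classes using only the fat and protein thresholds. In the proposed system the training labels would have to come from the nutritional prior knowledge mentioned above rather than from invented rows.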
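The recommendation model can use association rule learning. The original definition and description of association rules in data mining are discussed in detail in [4]. For example, the association rule {onions, potatoes} ⇒ {beef}, found in a supermarket's sales history, indicates that a customer who buys onions and potatoes together is also likely to buy beef. In the same way, we can find users' personal preferences in their dining history and recommend different menus according to each user's tastes, improving the level of personalized service.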
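The usual measures for such a rule are support and confidence. The sketch below computes both for the onions, potatoes, and beef example and enumerates rules by brute force; it is an illustration only, not the Apriori algorithm of [4], and the five transactions and the thresholds are invented for the example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules A => {c} with a single-item consequent.

    min_support must be greater than 0 so that every antecedent has
    non-zero support. Suitable only for very small item sets.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s < min_support:
                continue
            for c in itemset:
                a = tuple(i for i in itemset if i != c)
                conf = confidence(a, (c,), transactions)
                if conf >= min_confidence:
                    rules.append((a, c, s, conf))
    return rules


# Hypothetical order history: one frozenset of items per transaction.
orders = [frozenset(t) for t in (
    {"onions", "potatoes", "beef"},
    {"onions", "potatoes", "beef", "rice"},
    {"onions", "potatoes"},
    {"beef", "rice"},
    {"onions", "rice"},
)]

rule_support = support({"onions", "potatoes", "beef"}, orders)
rule_confidence = confidence({"onions", "potatoes"}, {"beef"}, orders)
rules = mine_rules(orders, min_support=0.4, min_confidence=0.6)
```

In these five orders the rule {onions, potatoes} ⇒ {beef} has support 2/5 and confidence 2/3. For a real order history, a frequent-itemset algorithm such as the one described in [4] would replace the exhaustive enumeration.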
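(4) User interface
The user interface is an abstract concept that connects users to the web server. Through it, users can order dishes online or enter their own dietary data directly. The interface also returns health analyses and personalized recommendations to the user.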
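3 System Implementation Scheme
To implement the system, we first need an online meal-ordering e-commerce platform as the source of raw user data. The e-commerce platform should provide the functions shown in Figure 4.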
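1) E-commerce platform
At the outset of building the system, one problem we face is where first-hand user data will come from. A C2C e-commerce platform addresses this practical problem: it connects users with restaurants and at the same time creates the opportunity to obtain users' dining data.
The platform should let customers order dishes online; the platform server then sends the order to the restaurant, and the customer can go directly to the restaurant to dine.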
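[Figure: components labeled data entry, online ordering, website, restaurants, database, log files, data mining, and recommendation model]
Figure 4 System implementation scheme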
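2) User interface
For users who do not order online, the system must also let them enter their data through the interface; the interface must also return the health status and recommendations.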
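3) Log information
The server should record log information about customers' actions in the browser, which can then serve as a data source for further data mining.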
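4) Back-end data processing
The data processing engine is an indispensable part of the platform. Back-end processing first collects users' dietary data from the database and the log information, then analyzes this first-hand data with specific data mining algorithms, and finally produces analysis results and recommendations.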
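The log-collection step can be sketched as follows. The paper only says that the web server writes ASCII log files, so this example assumes the NCSA Common Log Format and a 30-minute idle timeout for splitting sessions; the sample paths `/menu` and `/order` are hypothetical. Month names are parsed with `%b`, which assumes an English locale.

```python
import re
from datetime import datetime, timedelta

# Assumed format: NCSA Common Log Format, e.g.
# host ident user [05/Jan/2011:12:00:00 +0800] "GET /menu HTTP/1.1" 200 512
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Convert one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group each host's requests into sessions separated by idle gaps longer than timeout."""
    sessions = {}
    for rec in sorted(records, key=lambda r: (r["host"], r["time"])):
        host_sessions = sessions.setdefault(rec["host"], [])
        if host_sessions and rec["time"] - host_sessions[-1][-1]["time"] <= timeout:
            host_sessions[-1].append(rec)   # continue the current session
        else:
            host_sessions.append([rec])     # start a new session
    return sessions


lines = [
    '10.0.0.1 - - [05/Jan/2011:12:00:00 +0800] "GET /menu HTTP/1.1" 200 512',
    '10.0.0.1 - - [05/Jan/2011:12:05:00 +0800] "POST /order HTTP/1.1" 302 0',
    '10.0.0.1 - - [05/Jan/2011:18:30:00 +0800] "GET /menu HTTP/1.1" 200 512',
    'malformed line',
]
records = [r for r in map(parse_line, lines) if r is not None]
sessions = sessionize(records)
```

Here the malformed line is skipped and the three valid requests fall into two sessions for the same host. Identifying users by host address is a simplification; a deployed platform would more likely key sessions on a login or cookie.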
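4 Conclusion
In this paper we proposed a healthy eating analysis and recommender system based on data mining. Functionally, the system has three parts: collection of users' dining data, data mining, and healthy dining recommendation. First, the scheme requires a C2C e-commerce platform on which users can order meals online just as they would in a physical restaurant; the system tracks each user's dietary record by reading the user's data stored in the database. The platform must also let users enter dietary data directly through a web page. We then use data mining algorithms such as classification and association rules to extract useful information about users' eating habits, and use it for health assessment and diet recommendation.
The scheme proposed here is a first step toward a healthy eating analysis and recommender system; more work remains on implementing and validating the system.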
References
[1] Sun Jinhua, Xie Yanqi. A Web Data Mining Framework for E-commerce Recommender Systems[C]. International Conference on Computational Intelligence and Software Engineering, 11-13 Dec. 2009, pp. 1-4.
[2] Xinlin Zhang, Xiangdong Yin. Design of an Information Intelligent System based on Web Data Mining[C]. International Conference on Computer Science and Information Technology, 2008, pp. 88-91.
[3] Hand D J, Mannila H, Smyth P. Principles of Data Mining[M]. MIT Press, 2000.
[4] Agrawal R, Imielinski T, Swami A. Mining Association Rules between Sets of Items in Large Databases[C]. SIGMOD Conference, 1993, pp. 207-216.
[5] Xinlin Zhang, Xiangdong Yin. Design of an Information Intelligent System based on Web Data Mining[C]. International Conference on Computer Science and Information Technology, 2008, pp. 88-91.
[6] Chen Ting, Niu Xiao, Yang Weiping. The Application of Web Data Mining Technique in Competitive Intelligence System of Enterprise based on XML[C]. Third International Symposium on Intelligent Information Technology Application, 2009, pp. 396-399.
(Received: 2010-04-06)
Design of a Healthy Eating Analysis and Recommender System Based on Web Data Mining
Li Xiaocheng, Zhang Zengjie, Qian Songrong, Xia Yongming (Department of Communication Science and Engineering, Fudan University, Shanghai 200433, China)
Abstract: As the pace of life accelerates, fast food is becoming more and more popular in daily life, which can lead to unhealthy eating habits. To address this problem, this paper presents a proposal for a healthy eating analysis and recommender system based on web data mining, which tracks a user's eating habits, recommends foods that improve the user's health, and steers the user away from foods that raise the risk of illness. The paper first introduces some basic knowledge of web data mining. It then presents a web-based data mining solution for the healthy eating analysis and recommender system. Finally, it gives an implementation proposal for the system.
Key words: Web Data Mining; Healthy Eating; E-commerce